Re: Jumbo Frames (was Re: MAE-EAST Moving? from Tysons corner to reston VA. )
On Mon, 19 June 2000, "Bora Akyol" wrote:
As long as most end users are running Ethernet, Fast Ethernet, DSL, or cable modems, what is the point of jumbo frames/packets other than transferring BGP tables really fast? Did anyone look into how many packets are moved through an OC-48 in one second? (Approx. 6 million 40-byte packets.) I think even without jumbo frames, this bandwidth will saturate most CPUs.
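[A quick sanity check on that packet-rate figure. This is my own back-of-envelope arithmetic, not the poster's: it assumes roughly 2.4 Gbit/s of usable OC-48 payload after SONET overhead and about 8 bytes of PPP/HDLC framing per packet on a POS link.]

```python
# Back-of-envelope check of the "~6 million 40-byte packets" claim for OC-48.
# Assumptions (mine): ~2.4 Gbit/s effective payload after SONET overhead,
# ~8 bytes of PPP/HDLC framing per packet on Packet-over-SONET.
line_rate_bps = 2.4e9
pkt_payload = 40          # bytes: a minimal TCP ACK (IP + TCP headers only)
pos_framing = 8           # bytes: assumed per-packet PPP/HDLC overhead
bits_per_pkt = (pkt_payload + pos_framing) * 8
pps = line_rate_bps / bits_per_pkt
print(f"{pps / 1e6:.2f} Mpps")  # ~6.25 Mpps, consistent with the post
```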
Jumbo frames are pointless until most of the Internet end users switch to a jumbo frame based media.
Yes, they look cool on the feature list (we support it as well). Yes, they are marginally more efficient than 1500-byte MTUs (40/1500 vs 40/9000). But in reality, 99% or more of the traffic out there is less than 1500 bytes. In terms of packet counts, the last time I looked, 50% of the packets were around 40 bytes (ACKs), with another 40% or so at approximately 576 bytes.
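[The "40/1500 vs 40/9000" shorthand above is header overhead for a full-size segment: 40 bytes of IPv4 + TCP headers per packet. A minimal sketch of the comparison:]

```python
# Header overhead for a full-size TCP segment at the two MTUs under discussion.
# 40 bytes = IPv4 header (20) + TCP header (20), no options.
hdr = 40
for mtu in (1500, 9000):
    print(f"MTU {mtu}: {hdr / mtu:.2%} header overhead")
# MTU 1500: 2.67%    MTU 9000: 0.44%
```

Which is the basis for calling the per-byte savings "marginal" — the real argument for jumbo frames, made later in the thread, is per-packet cost, not per-byte overhead.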
What is the big, clear advantage of supporting jumbo frames?
When 1500-byte frames from the customer's LAN enter the customer's router and then some form of IP tunnel, a core fabric which supports larger-than-1500-byte frames will not cause fragmentation. It's not necessary to do full jumbo-size frames; I suspect that supporting two levels of encapsulation will be enough in 99.9% of cases. For the sake of argument, what would be the downside of using a 2000-byte MTU as the minimum MTU in your core?
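[To put numbers on "two levels of encapsulation": a sketch, assuming plain GRE-over-IPv4 tunnels (20-byte outer IPv4 header, 4-byte GRE header with no options) — other tunnel types add different amounts, but the same order of magnitude.]

```python
# Core MTU needed to carry a 1500-byte customer frame through
# one and two levels of IP tunneling without fragmentation.
# Assumptions: IPv4 outer header = 20 bytes, basic GRE header = 4 bytes.
lan_mtu = 1500
ip_hdr, gre_hdr = 20, 4
one_level = lan_mtu + ip_hdr + gre_hdr        # 1524 bytes
two_levels = one_level + ip_hdr + gre_hdr     # 1548 bytes
print(one_level, two_levels)  # both fit easily in a 2000-byte core MTU
```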
When the next end-user upgrade is deployed and everyone has devices which can support larger MTUs, wouldn't it be a shame if they said "if only the Internet core ran at larger MTUs, we could negotiate higher MTUs and make everyone happier"? Also, it is far more than "marginally more efficient". For every packet you deal with, there is a great amount of work doing routing lookups, dealing with memory management, and handling interrupts. Copying another few bytes of data is easy in comparison.

Since we are asking GigE to act in a server and backbone role, we should acknowledge that the requirements will be different from the average end-user Ethernet. One of those requirements is that the backbone should be able to pass larger packets it may encounter without resorting to fragmentation (which only gets harder as we get into higher speeds). Aside from that, and the fact that there is nothing harmful in supporting larger packets through your network, there is the fact that if we want people to support standards we KNOW are good for them (even if they don't), we have to actually ask for it.

Imagine an Internet with a reliable MTU negotiation mechanism, which can take advantage of improved throughput, much lower CPU usage, zero copy, page-flipping, DMA transfers, and all those other lovely things. These are important for many reasons. Without these techniques, we can't even do line-rate GigE on "commonplace" servers, let alone have any CPU left over to do more than just send packets. It's easy to just say "we'll throw a server farm at it" or "we'll just get a faster processor", but as higher-speed links become more common, and as GigE becomes common in servers (when servers can actually use it effectively) and 10GigE becomes commonplace for backbone links, we'll start to see these things matter. Why engineer ourselves into a corner of shortsightedness which only gets harder and harder to fix, because it's "easier" to do nothing?
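[The per-packet-cost argument above is easy to quantify. A minimal sketch: the packet rate needed to fill a GigE link at various packet sizes (ignoring Ethernet preamble and inter-frame gap for simplicity) — every one of those packets costs a routing lookup, buffer management, and interrupt handling.]

```python
# Packets per second required to fill a 1 Gbit/s link at various packet sizes.
# Each packet incurs roughly constant per-packet work (lookup, interrupt, etc.),
# so a 6x larger MTU means ~6x fewer operations at the same bit rate.
line_rate = 1e9  # bits/s
for size in (40, 576, 1500, 9000):
    pps = line_rate / (size * 8)
    print(f"{size:5d} bytes -> {pps / 1e3:9.1f} kpps")
# 1500 bytes needs ~83.3 kpps; 9000 bytes only ~13.9 kpps.
```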
(sorry Michael, just using your msg as a good point to reply :P) -- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/humble PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
Richard A. Steenbergen: Monday, June 19, 2000 2:57 AM
don't), we have to actually ask for it. Imagine an Internet with a reliable MTU negotiation mechanism, which can take advantage of improved throughput, much lower CPU usage, zero copy, page-flipping, DMA transfers, and all those other lovely things.
Yeah ... <heavy breathing>
These are important for many reasons. Without these techniques, we can't even do line-rate GigE on "commonplace" servers, let alone have any CPU left over to do more than just send packets.
Actually, my testing shows a failure to utilize even 100baseTX fully. Even in a switched FDX environment (no collisions) I can't achieve line rate without bumping the packet size up. Considering that the smallest box is a quad-CPU SMP machine (550MHz), I don't think that there is a CPU shortage <grin>.
It's easy to just say "we'll throw a server farm at it" or "we'll just get a faster processor", but as higher-speed links become more common, and as GigE becomes common in servers (when servers can actually use it effectively)
In this case, the common problem is the RDBMS host. It is very difficult to cluster them, due to limitations with most RDBMSs. The result is that this host sources most of the packets and with MTU=1500 it is throttled at about 40% of line-rate, or less, depending on transfer size.
and 10GigE becomes commonplace for backbone links, we'll start to see these things matter. Why engineer ourselves into a corner of shortsightedness which only gets harder and harder to fix, because it's "easier" to do nothing?
I don't have a 10gig-E system, but I wonder about going there when I can't even get gig-E to work efficiently. If vendors want to sell 10gig-E, they should be concerned about exactly this point. Joe SOHO isn't going to buy it anyway. Joe Enterprise isn't going to spend the extra money unless he can see some real benefit, and Joe dot-com ain't going to do it unless it is measurably faster than gig-E (which it won't be with MTU=1500). I can aggregate 3-5 gig-E links to get the same throughput, by adjusting MTU, and not pay the 10gig-E meal-ticket. BTW, the selling feature on gig-E is link aggregation, built into the spec (over Fast-E); there is no similar feature enhancement for 10gig-E, AFAICT. Even so, it is still limited by MTU size.
2000-06-19-12:19:20 Roeland Meyer (E-mail):
Actually, my testing shows a failure to utilize even 100baseTX fully.
I'm unsurprised. For most purposes I continue to spec simple 100baseT for server<-->switch connects, 10baseT for normal clients, and quad-100baseT etherchannel for the occasional really badass server (e.g. a NetApp).
I don't have a 10gig-E system, but I wonder about going there when I can't even get gig-E to work efficiently.
Well, that's you --- and me too. But that's sure not most customers. Even some folks whose expertise I generally respect have completely bought into gig-E, and try to apply it for host connects, without attempting to measure whether the host is capable of saturating even 100BaseT with their traffic.
If vendors want to sell 10gig-E they should be concerned about exactly this point.
Not true, no more than the belief that if vendors wanna be able to charge more for GHz CPUs, they must have sufficiently balanced systems so those CPUs really get work done faster than the previous generation. Turns out they don't; the vast majority, that manufacturers care about, that determine the success or failure of a product or marketing strategy, those masses don't care about measured performance, they care about bigger numbers and proud boasts. RAMBUS, anyone?
Joe SOHO isn't going to buy it anyway. Joe Enterprise isn't going to spend the extra money unless he can see some real benefit, and Joe dot-com ain't going to do it unless it is measurably faster than gig-E (which it won't be with MTU=1500).
I disagree once again. Joe Enterprise buys whatever the last salescritter to buy him lunch-with-drinks tells him to buy. No other explanation for all those "SANs" they're inflicting upon themselves, poor boobs. Maybe Joe dot-com will be forced to get a little smarter if his Wall St. bank stays cagey, but I wouldn't bet on that outcome; I figure it won't be long before the dotcoms once again have far, far more money than clue to use it, and so buy whatever sounds gaudiest. -Bennett
Sez "Roeland Meyer (E-mail)" <rmeyer@mhsc.com>
Actually, my testing shows a failure to utilize even 100baseTX fully. Even in a switched FDX environment (no collisions) I can't achieve line rate without bumping the packet size up. Considering that the smallest box is a quad-CPU SMP machine (550MHz), I don't think that there is a CPU shortage <grin>.
It's a rare event to see any server get line rate on any media. If you bump up the media speed, you'll generally see more throughput. A box which can't saturate a FE link can often manage >100mbit/s on a GE link. The equivalent was true way back when FE was new.
I don't have a 10gig-E system, but I wonder about going there when I can't even get gig-E to work efficiently.
Perhaps you should talk to your server/app vendors about that.
If vendors want to sell 10gig-E they should be concerned about exactly this point.
The point is exactly the same point that FE and GE had. Initially TGE will be used as an aggregation point for lots of lower-speed user/server ports. As servers speed up to match the network's capabilities, TGE will migrate out to the servers and the network core will progress to an even faster technology (Hundred Gig E?).
Joe SOHO isn't going to buy it anyway.
Joe SOHO is generally still using 10mb hubs and isn't relevant here.
Joe Enterprise isn't going to spend the extra money unless he can see some real benefit,
That's funny, the biggest cry for TGE I hear is coming from Joe Enterprise, who is complaining that a GE network isn't fast enough to push his hundred-TByte backups and GE-connected workstations.
and Joe dot-com ain't going to do it unless it is measurably faster than gig-E (which it won't be with MTU=1500).
It will be 10x as fast, regardless of the MTU. If Joe's servers can't keep up, that doesn't change how fast TGE runs.
I can aggregate 3-5 gig-E links to get the same throughput, by adjusting MTU, and not pay the 10gig-E meal-ticket.
If you want to go 10x as fast as GE, you will need either (a) 10 GE links with perfect loadsharing or (b) TGE. If you only care to go 5Gbit/s, that's not "the same throughput" simply because your servers can't keep up with TGE.
BTW, the selling feature on gig-E is link aggregation, built into the spec (over Fast-E); there is no similar feature enhancement for 10gig-E, AFAICT.
To what link aggregation feature do you refer? I'm not aware of any functional difference between the aggregation capabilities of GE and FE.
Even so, it is still limited by MTU size.
Hindered, not limited. There are devices that can fill a GE link with 64-byte frames; hardware to do the same will undoubtedly appear for TGE. A 1500-byte MTU is definitely a problem for server vendors, but claiming that faster media is pointless because of it is hardly realistic. Perhaps those vendors will get involved in the IEEE TGE process and get jumbo frames standardized. -- Stephen Sprunk, K5SSS, CCIE #3723, Network Design Consultant, HCOE, 14875 Landmark Blvd #400; Dallas, TX. Email: ssprunk@cisco.com
participants (4)
-
Bennett Todd
-
Richard A. Steenbergen
-
Roeland Meyer (E-mail)
-
Stephen Sprunk