Re: MAE-EAST Moving? from Tysons corner to reston VA.
On Sat, 17 Jun 2000, RJ Atkinson wrote:
Sounds like a filter on which product one buys. :-)
Based on those who don't support a non-standard extension? At any rate, people will buy them, and problems will ensue. :P
That fact aside, several OS drivers for GigE NICs do not currently support jumbo frames.
Which OSs don't yet support this ?
Not OS, drivers. Pick your favorite OS with GigE support and grep for jumbo in the drivers section. In a few cases the unix drivers support jumbo frames and the reference vendor drivers do not; in a couple it's the other way around. I see it's getting better though, there is more support than there was the last time I looked.
The point is that unless everyone makes these changes, any attempt to run a higher MTU along a non-supporting path without a reliable PMTU-D mechanism will result in bloody horror.
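(A quick illustration, Linux-specific and not from any of the posts here: on the host side, "reliable PMTU-D" boils down to something like the sketch below, setting DF on the socket and asking the kernel what path MTU it has learned. The horror shows up when ICMP "fragmentation needed" gets filtered somewhere along the path and that learned value never converges.)

/* Hedged sketch, Linux-only: enable path MTU discovery on a TCP socket
 * via IP_MTU_DISCOVER, then read back the path MTU the kernel has
 * learned with IP_MTU (valid once the socket is connected). */
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int open_pmtud_socket(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    /* Set DF on outgoing packets and let the kernel track the path MTU. */
    int val = IP_PMTUDISC_DO;
    if (setsockopt(s, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* After connect(), ask what the kernel currently believes the path MTU
 * to be; if large segments vanish while this value never drops, you are
 * looking at a PMTU-D black hole. */
int current_path_mtu(int s)
{
    int mtu = 0;
    socklen_t len = sizeof(mtu);
    if (getsockopt(s, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        return -1;
    return mtu;
}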
For content replication, as differentiated from content distribution, the larger MTU should be safe. A larger than 1518 MTU won't be safe for content distribution anytime soon because the Cable Modem standards specify a 1518 MTU and cable modems are a leading way to provide high performance networking to homes.
The real number ought to be >= 9180 IP bytes + Ethernet overhead, so that hosts can do page-flipping and other optimisations. Without Jumbo-grams, neither NT nor any common UNIX can run full line-rate over TCP/IP with GigE today. With Jumbo-Grams, both NT and UNIX can achieve 1 Gbps of throughput over TCP/IP/GigE.
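(For a rough sense of the numbers behind that claim, here is my own back-of-the-envelope arithmetic, not anything from the post: assume 38 bytes of per-frame Ethernet overhead on the wire and 40 bytes of TCP/IP header per packet.)

/* Back-of-the-envelope check (my assumptions, not the poster's data):
 * packets per second and TCP payload rate needed to fill GigE at a given
 * MTU.  Per-frame Ethernet overhead is taken as 14 (header) + 4 (FCS) +
 * 8 (preamble) + 12 (inter-frame gap) = 38 bytes. */
#include <stdio.h>

int main(void)
{
    const double line_rate = 1e9;        /* GigE, bits per second        */
    const double eth_overhead = 38.0;    /* bytes on the wire per frame  */
    const double tcpip_hdr = 40.0;       /* bytes of TCP+IP header       */
    const int mtus[] = { 1500, 4470, 9000, 9180 };

    for (int i = 0; i < 4; i++) {
        double frame_bits = (mtus[i] + eth_overhead) * 8.0;
        double pps = line_rate / frame_bits;
        double goodput = pps * (mtus[i] - tcpip_hdr) * 8.0;
        printf("MTU %5d: %8.0f pkt/s, ~%4.0f Mbps TCP payload\n",
               mtus[i], pps, goodput / 1e6);
    }
    return 0;
}

The interesting column is packets per second: roughly 81k/s at MTU 1500 versus about 14k/s at 9000, which is where the per-packet host overhead discussed further down comes in.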
I've been able to get DAMN NEAR line rate with sendfile(2) zero-copy transfer, and some optimizations for the entire packet output path, using FreeBSD and some AMD K7 800s, without using jumbo frames.
Curious. sendfile(2) is using TCP ? Any TCP extensions present ? This was off-the-shelf FreeBSD ? Which version ? Which GigE card (NetGear GA-620 ?) ?
Based off 4.0-STABLE, using some work I'm doing on the FreeBSD TCP/IP stack (mainly cleaner code and a few trivial optimizations at this point, nothing earth-shattering); some additional optimizations and shortcuts through the stack based on IP Flow, which I'm writing; back-to-back NetGear GA620s; 512k send/recvspace buffers and a 1MB socket buffer; and a really quick in-kernel ack-and-discard on the receiving end. The last time I tried it with standard userland transfers was on back-to-back P3 500s, which pulled about 450Mbps between a GA620 and an Intel GE.
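(Not the poster's code, just a rough FreeBSD-flavoured sketch of the send path being described: oversized socket buffers plus sendfile(2)'s zero-copy hand-off. The 512k send/recvspace values above are presumably the net.inet.tcp.sendspace/recvspace sysctls; the function name and the 1MB figure below are placeholders.)

/* Rough sketch of a FreeBSD zero-copy send path: a large SO_SNDBUF plus
 * sendfile(2), which hands file pages to the NIC without copying them
 * through userland.  Error handling is minimal on purpose. */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>        /* FreeBSD sendfile(2) prototype */
#include <fcntl.h>
#include <unistd.h>

int blast_file(int sock, const char *path)
{
    int bufsz = 1 << 20;                 /* 1MB socket buffer            */
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bufsz, sizeof(bufsz));

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    off_t sent = 0;
    /* FreeBSD sendfile(2): file fd, socket, offset, length, optional
     * header/trailer vectors, bytes-sent out-parameter, flags. */
    int rc = sendfile(fd, sock, 0, (size_t)st.st_size, NULL, &sent, 0);

    close(fd);
    return rc;
}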
It would certainly help to be pushing fewer packets per unit time though, as this is a (the?) major bottleneck.
Packet processing overhead and memory manipulation are generally the bottlenecks in hosts. There is substantial literature to this effect.
Isn't that the truth. I think a lot of it is poorly optimized and poorly planned code, though.

--
Richard A Steenbergen <ras@e-gerbil.net>  http://www.e-gerbil.net/humble
PGP Key ID: 0x138EA177 (67 29 D7 BC E8 18 3E DA B2 46 B3 D8 14 36 FE B6)
Richard A. Steenbergen: Saturday, June 17, 2000 7:55 AM
On Sat, 17 Jun 2000, RJ Atkinson wrote:
Sounds like a filter on which product one buys. :-)
Based on those who don't support a non-standard extension? At any rate, people will buy them, and problems will ensue. :P
Currently, there are places in internal sites that are doing exactly this. What happens is that, in the interests of invoice standardization, the same filter gets applied to the externally visible equipment. My personal nightmare is some tech putting the wrong NIC in the wrong machine and the architecture (my responsibility area) being erroneously shown to fail. It is far safer to spec the same equipment/capability homogeneously and eliminate that failure mode <grin>.
I see it's getting better though, there is more support than there was the last time I looked.
Uneven distribution is always an issue during technology roll-out/adoption. I expect it to take years, with some extensions being orphaned, and it could even cause some market re-alignments and new vendors to pop up... normal stuff, prior to feature commoditization.
The point is that unless everyone makes these changes, any attempt to run a higher MTU along a non-supporting path without a reliable PMTU-D mechanism will result in bloody horror.
For content replication, as differentiated from content distribution, the larger MTU should be safe. A larger than 1518 MTU won't be safe for content distribution anytime soon because the Cable Modem standards specify a 1518 MTU and cable modems are a leading way to provide high performance networking to homes.
Does anyone know if this same restriction applies to any form of DSL? If so, then this capability will rapidly become data-center-internal-only. That will also restrict its usage on the co-lo trunks. Otherwise, we need to point at the cable-modem specs and get that changed. As alluded to previously, there are some who take the MTU=1500 issue with religious zeal; they will resist all rational argument on this. MTU values should remain a configuration parameter and should not be spec'd in the protocol... ever!
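(For what it's worth, on the host side the MTU already is just a run-time configuration knob. The sketch below uses the standard SIOCSIFMTU socket ioctl; it is purely illustrative, "ge0" is a made-up interface name, and nothing here is specific to cable modems.)

/* Illustrative only: set an interface MTU at run time via the standard
 * SIOCSIFMTU ioctl (requires root).  Include paths are Linux-flavoured;
 * BSDs keep the ioctl in <sys/sockio.h>. */
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int set_if_mtu(const char *ifname, int mtu)
{
    struct ifreq ifr;
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name) - 1);
    ifr.ifr_mtu = mtu;

    int rc = ioctl(s, SIOCSIFMTU, &ifr);
    close(s);
    return rc;
}

/* e.g. set_if_mtu("ge0", 9000); */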
Based off 4.0-STABLE, using some work I'm doing on the FreeBSD TCP/IP stack (mainly cleaner code and a few trivial optimizations at this point, nothing earth-shattering); some additional optimizations and shortcuts through the stack based on IP Flow, which I'm writing; back-to-back NetGear GA620s; 512k send/recvspace buffers and a 1MB socket buffer; and a really quick in-kernel ack-and-discard on the receiving end.
<g> You will release open-source so the Linux community can use <please>?
The last time I tried it with standard userland transfers was on back-to-back P3 500s, which pulled about 450Mbps between a GA620 and an Intel GE.
This is much better than what I'm seeing (MTU=1500). With MTU=4096+40 I get closer. I've not tried higher MTU values because not all of my equipment supports them. Have you done any analysis of MTU=<value> vs. throughput? If so, at what increment?
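(In case it helps frame the question: below is the sort of trivial measurement harness I'd use for that analysis; it is mine, not anything the posters ran. Time a fixed transfer over an already-connected TCP socket, repeat once per interface MTU setting, and compare.)

/* Minimal throughput probe: push total_bytes down an already-connected
 * TCP socket and report the achieved rate in Mbps.  Run once per MTU
 * setting (1500, 4500, 9000, ...) on otherwise idle hosts. */
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

double measure_mbps(int sock, long long total_bytes)
{
    static char buf[256 * 1024];                 /* 256k writes          */
    memset(buf, 0xA5, sizeof(buf));

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);

    long long left = total_bytes;
    while (left > 0) {
        size_t chunk = left > (long long)sizeof(buf) ? sizeof(buf)
                                                     : (size_t)left;
        ssize_t n = send(sock, buf, chunk, 0);
        if (n <= 0)
            return -1.0;                         /* error or peer closed */
        left -= n;
    }

    gettimeofday(&t1, NULL);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    return (double)total_bytes * 8.0 / secs / 1e6;
}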
It would certainly help to be pushing fewer packets per unit time though, as this is a (the?) major bottleneck.
Packet processing overhead and memory manipulation are generally the bottlenecks in hosts. There is substantial literature to this effect.
Isn't that the truth. I think a lot of it is poorly optimized and poorly planned code, though.
In defense of the programmer, code dealing with a fixed MTU value can be much more efficient than code that has to discover the MTU value at run-time. I suspect that this may also be the case for cable modems. It would have a direct effect on CPU cost and therefore COGm; at a scale of 10M units, $0.001 of COGm can add up to a lot of money. However, this is a transient benefit, as CPU/RAM gets cheaper and faster. Ascend found this out with the Pipeline 25 (a lot of which they've had to replace with Pipeline 75s under warranty, resulting in reduced profitability overall [to Ascend's credit, they actually did it, where warranted, at no additional consumer cost]). I don't intend this to be a defense of a fixed MTU value; I am only postulating a probable cause for its appearance in the cable-modem spec (that they are not completely irrational <g>).
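(A toy illustration of that efficiency point, entirely mine and deliberately simplistic: with the MTU nailed down in the spec, the frame buffer and its bounds check are compile-time constants; once the MTU has to be negotiated, the size becomes state that must be allocated for and carried around.)

/* Toy contrast only.  A fixed, spec-mandated MTU lets the receive buffer
 * live in static storage with a constant bounds check, which is cheap on
 * a small embedded CPU.  A negotiated MTU pushes the size out to run
 * time: heap allocation, a failure path, and a variable bound. */
#include <stdlib.h>
#include <string.h>

#define SPEC_MTU 1518                    /* hard-wired maximum frame size */

static unsigned char fixed_frame[SPEC_MTU];

int rx_fixed(const unsigned char *wire, size_t len)
{
    if (len > SPEC_MTU)                  /* constant, trivially checked   */
        return -1;
    memcpy(fixed_frame, wire, len);
    return 0;
}

/* Variable-MTU variant: buffer sized only after link configuration. */
struct rx_ctx {
    unsigned char *frame;
    size_t mtu;
};

int rx_ctx_init(struct rx_ctx *ctx, size_t negotiated_mtu)
{
    ctx->frame = malloc(negotiated_mtu);
    ctx->mtu = negotiated_mtu;
    return ctx->frame ? 0 : -1;
}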
On Sat, 17 Jun 2000, RJ Atkinson wrote:
Which OSs don't yet support this ?
Not OS, drivers. Pick your favorite OS with GigE support and grep for jumbo in the drivers section. In a few cases the unix drivers support jumbo frames and the reference vendor drivers do not; in a couple it's the other way around. I see it's getting better though, there is more support than there was the last time I looked.
you'd be surprised how many vendors aren't even considering supporting jumbo frames, or worse, don't understand why you'd want to. several vendors of optical gear (dwdm) i've run into lately weren't even going to do it and didn't know why they should. this only applies to vendors doing native GE, not vendors going with true transparent optics.

-b
Same question once again: as long as most end users are running Ethernet, Fast Ethernet, DSL, or cable modems, what is the point of jumbo frames/packets other than transferring BGP tables really fast? Did anyone look into how many packets are moved through an OC-48 in one second (approx. 6 million 40-byte packets)? I think even without jumbo frames, this bandwidth will saturate most CPUs. Jumbo frames are pointless until most Internet end users switch to a jumbo-frame-based medium. Yes, they look cool on the feature list (we support them as well). Yes, they are marginally more efficient than 1500-byte MTUs (40/1500 vs. 40/9000). But in reality, 99% or more of the traffic out there is less than 1500 bytes. In terms of packet counts, last time I looked, 50% of the packets were around 40 bytes (ACKs), with another 40% or so at approx. 576 bytes. What is the big, clear advantage of supporting jumbo frames?

Bora
participants (4)
- Bora Akyol
- brett watson
- Richard A. Steenbergen
- Roeland Meyer (E-mail)