Re: Strange public traceroutes return private RFC1918 addresses
Which (as discussed previously) breaks things like Path MTU Discovery, traceroute,
If RFC1918 addresses are used only on interfaces with jumbo MTUs on the order of 9000 bytes, then it doesn't break PMTUD in a 1500 byte Ethernet world. And it doesn't break traceroute. We just lose the DNS hint about the router location.

A more important question is what will happen as we move out of the 1500 byte Ethernet world into the jumbo gigE world. It's only a matter of time before end users will be running gigE networks and want to use jumbo MTUs on their Internet links.

Could we all agree on a hierarchy of jumbo MTU sizes, with the largest sizes in the core and the smallest sizes at the edge? The increment in sizes should allow for a layer or two of encapsulation, and peering routers should use the largest size MTU. Thoughts?

--Michael Dillon
A more important question is what will happen as we move out of the 1500 byte Ethernet world into the jumbo gigE world. It's only a matter of time before end users will be running gigE networks and want to use jumbo MTUs on their Internet links.
The performance gain achieved by using jumbo frames outside of very specific LAN scenarios is highly questionable, and they're still not standardized. Are "jumbo" Internet MTUs seen as a pressing issue by ISPs and vendors these days? -Terry
A more important question is what will happen as we move out of the 1500 byte Ethernet world into the jumbo gigE world. It's only a matter of time before end users will be running gigE networks and want to use jumbo MTUs on their Internet links.
The performance gain achieved by using jumbo frames outside of very specific LAN scenarios is highly questionable, and they're still not standardized. Are "jumbo" Internet MTUs seen as a pressing issue by ISPs and vendors these days?
-Terry
for some, yes. running 1ge is fairly common and 10ge is maturing. bleeding edge 40ge is available ... and 1500byte mtu is -not- an option. --bill
bill wrote:
for some, yes. running 1ge is fairly common and 10ge is maturing. bleeding edge 40ge is available ... and 1500byte mtu is -not- an option.
Me wonders why people ask for 40 byte packets at linerate if the mtu is supposedly larger? Pete
bill wrote:
for some, yes. running 1ge is fairly common and 10ge is maturing. bleeding edge 40ge is available ... and 1500byte mtu is -not- an option.
Me wonders why people ask for 40 byte packets at linerate if the mtu is supposedly larger?
Pete
got me... although I could fabricate a rationale. 40 byte packets @ 40Gig is a wonder to contemplate. the whole ATM argument (53 byte "cells" over 100Meg) being an egregious overhead expense for segmentation/reassembly is amplified here. --bill
bill wrote:
got me... although I could fabricate a rationale.
40 byte packets @ 40Gig is a wonder to contemplate. the whole ATM argument (53 byte "cells" over 100Meg) being an egregious overhead expense for segmentation/reassembly is amplified here.
There are more cell-based fixed access links running IP than all other technologies combined. (fixed == not counting dialup) So cells must be good for you. At least they sell well :) Pete
In a message written on Tue, Feb 03, 2004 at 08:15:13AM -0600, Terry Baranski wrote:
The performance gain achieved by using jumbo frames outside of very specific LAN scenarios is highly questionable, and they're still not standardized. Are "jumbo" Internet MTUs seen as a pressing issue by ISPs and vendors these days?
While the rate of requests is still very low, I would say we get more and more requests for jumbo frames every day. The pressing application today is "larger" frames; that is, don't think of two hosts talking 9000 MTU frames to each other, but rather think of IPSec or other tunneling boxes talking 1600 byte packets to each other so they don't have to split 1500 byte Ethernet packets in half.

Since most POS is 4470, adding a jumbo frame GigE edge makes this application work much more efficiently, even if it doesn't enable jumbo (9k) frames end to end. The interesting thing here is that it means there absolutely is a PMTU issue: a 9K edge with a 4470 core.

There is also a lot of work going on in academic networks that uses jumbo frames. I suspect in a few more years this will make it into more common applications.

In a message written on Tue, Feb 03, 2004 at 04:40:15PM +0200, Petri Helenius wrote:
Me wonders why people ask for 40 byte packets at linerate if the mtu is supposedly larger?
This is a problem that is going to get worse. If you support IP, you have to support a 40 byte packet. As long as that exists, DDoS tools will use 40 byte packets, knowing more lookups are harder on the software/hardware in routers.

At the same time, I suspect software is going to continue to slowly move to larger and larger packets, because at the higher data rates (e.g., 40 GigE) it makes a huge difference in host usage. You can fit 6 times the data in a 9K packet that you can fit in a 1500 byte packet, which means 1/6th the interrupts, DMA transfers, ACL checks, etc., etc., etc.

-- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/ Read TMBG List - tmbg-list-request@tmbg.org, www.tmbg.org
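[The "1/6th" figure above is easy to check with back-of-the-envelope arithmetic. A quick illustrative sketch in Python, counting only the minimal 20-byte IP and 20-byte TCP headers; real NICs coalesce interrupts, so the ratio of per-packet work is the point, not the absolute counts:]

```python
# Packets needed to move 1 GB of application data at two MTUs,
# assuming 20-byte IP and 20-byte TCP headers per packet.
IP_TCP_HEADERS = 40
TRANSFER = 10**9  # 1 GB of application data

for mtu in (1500, 9000):
    payload = mtu - IP_TCP_HEADERS
    packets = -(-TRANSFER // payload)  # ceiling division
    print(f"MTU {mtu}: {packets} packets")

# MTU 1500 needs 684932 packets; MTU 9000 needs 111608,
# roughly a 6.1x reduction in per-packet work.
```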
Leo Bicknell wrote:
because at the higher data rates (eg 40 gige) it makes a huge difference in host usage. You can fit 6 times in the data in a 9K packet that you can in a 1500 byte packet, which means 1/6th the interrupts, DMA transfers, ACL checks, etc, etc, etc.
This is wrong. Interrupt moderation has been there for quite a while, and DMA is chained and predictive. ACL checks I can agree on, but if you are optimizing the system, what do you need ACLs for anyway, when you can make the applications secure in the first place? Pete
Leo Bicknell wrote:
because at the higher data rates (eg 40 gige) it makes a huge difference in host usage. You can fit 6 times in the data in a 9K packet that you can in a 1500 byte packet, which means 1/6th the interrupts, DMA transfers, ACL checks, etc, etc, etc.
* pete@he.iki.fi (Petri Helenius) [Tue 03 Feb 2004, 19:47 CET]:
This is wrong. Interrupt moderation has been there for quite a while, DMA is chained and predictive.
Just like the extra chopping up of the data you want to send into more packets, these are things you have to do a few extra times. That takes time. There is no way around this. What Leo wrote is in no way wrong.
ACL checks I can agree on, but if you are optimizing the system, what do you need ACLs for anyway, when you can make the applications secure in the first place?
You're trolling, right? -- Niels. -- Blessed are the Watchmakers, for they shall inherit the earth.
Niels Bakker wrote:
Just like the extra chopping up of the data you want to send into more packets, it's things you have to do a few extra times. That takes time. There is no way around this. What Leo wrote is in no way wrong.
Maybe we need to define what the expression "huge difference" means in this context. Previously it has been defined as a 1.4% difference, which in my opinion qualifies as the understatement of the day. If we were talking about a 20% or greater difference here, the pain from a larger MTU might be tolerable.
ACL checks I can agree on, but if you are optimizing the system, what do you need ACLs for anyway, when you can make the applications secure in the first place?
You're trolling, right?
No. I'll trust my digital signatures over the source IP filters any day. Pete
Leo Bicknell wrote:
Since most POS is 4470, adding a jumbo frame GigE edge makes this application work much more efficiently, even if it doesn't enable jumbo (9k) frames end to end. The interesting thing here is it means there absolutely is a PMTU issue, a 9K edge with a 4470 core.
This brings up the question of what other MTUs are common on the Internet, as well as which ones are simply defaults (i.e., could easily be increased) and which ones are the result of device/protocol limitations. And why 4470 for POS? Did everyone borrow a vendor's FDDI-like default, or is there a technical reason? PPP seems able to use 64k packets (as can the frame-based version of GFP, incidentally POS's likely replacement). -Terry
On Tue, Feb 03, 2004 at 11:02:16AM -0500, Leo Bicknell wrote:
While the rate of request is still very low, I would say we get more and more requests for jumbo frames everyday. The pressing application today is "larger" frames; that is don't think two hosts talking 9000 MTU frames to each other, but rather think IPSec or other tunneling boxes talking 1600 byte packets to each other so they don't have to split 1500 byte Ethernet packets in half. Since most POS is 4470, adding a jumbo frame GigE edge makes this application work much more efficiently, even if it doesn't enable jumbo (9k) frames end to end. The interesting thing here is it means there absolutely is a PMTU issue, a 9K edge with a 4470 core.
9k isn't an absolute necessity, especially for x86. I believe the original reason for 9k as picked by Alteon was to support the 8192 byte page size on the Alpha. As long as there is enough room to squeeze in an x86 memory page (4096 bytes of payload) plus some room for headers, the important goal of jumbo frames (which is NOT to lower the packet/sec count; that is only a mild by-product for those who are still doing things wrong) is achieved.

This would also eliminate the problems of IPSec, GRE, and other forms of tunneling (which may or may not be applied) breaking things where PMTUD is blocked, since the "standard" payload packet for TCP would only be 4136 octets (leaving plenty for other "stuff"). The 4470 MTU of POS meets this requirement perfectly, and the world of end to end connectivity would be an infinitely better place if everyone could expect to pass 4470 through the Internet.

But alas, there are probably too many people running GigE in the core, which doesn't support jumbo frames (let alone a standardized jumbo frame size, due to various vendor hijinks), to truly make use of POS's MTU these days.

-- Richard A Steenbergen <ras@e-gerbil.net> http://www.e-gerbil.net/ras GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
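[The page-size arithmetic above is easy to reproduce; a small sketch, assuming the minimal 20-byte IPv4 and TCP headers:]

```python
# One x86 memory page of payload plus minimal TCP/IP headers:
PAGE = 4096    # x86 page size in bytes
IP_HDR = 20    # minimal IPv4 header
TCP_HDR = 20   # minimal TCP header

packet = PAGE + IP_HDR + TCP_HDR
print(packet)          # 4136 octets, the "standard" payload packet
print(4470 - packet)   # 334 bytes of headroom left under the 4470 POS MTU
```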
* pete@he.iki.fi (Petri Helenius) [Tue 03 Feb 2004, 15:42 CET]:
Me wonders why people ask for 40 byte packets at linerate if the mtu is supposedly larger?
Support for the worst-case scenario. The same reason you spec support for a BIGINT-line ACL without excessive impact on forwarding capacity. -- Niels. -- Blessed are the Watchmakers, for they shall inherit the earth.
Niels Bakker wrote:
* pete@he.iki.fi (Petri Helenius) [Tue 03 Feb 2004, 15:42 CET]:
Me wonders why people ask for 40 byte packets at linerate if the mtu is supposedly larger?
Support for the worst-case scenario. The same reason you spec support for a BIGINT-line ACL without excessive impact on forwarding capacity.
Why a large MTU then? Most modern ethernet controllers don't care if you're sending 1500 or 9000 byte packets (with proper drivers taking advantage of the features there). If you're paying for 40 byte packets anyway, there is no incentive to ever go beyond a 1500 byte MTU. Pete
In a message written on Tue, Feb 03, 2004 at 08:40:22PM +0200, Petri Helenius wrote:
If you're paying for 40 byte packets anyway, there is no incentive to ever go beyond 1500
With a 20 byte IP header: a 40 byte packet is 50% data, a 1500 byte packet is 98.7% data, and a 9000 byte packet is 99.7% data. Anyone who pays by the bit should like large packets better than small packets, as you pay for less "overhead" bandwidth.

Note that a 1500 byte IP in IP packet becomes 1520, and then gets fragmented into a 1500 byte packet and a 40 byte packet (20 data, 20 header). That's only 97.3% efficient, whereas a single 1520 byte packet, if it could be carried, is 98.7% efficient. Obviously we're talking in smaller numbers, but to a lot of VPN vendors a 1.4% improvement in bandwidth usage, bus usage, or avoiding the path through the device that fragments a packet in the first place is a big win.

-- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/ Read TMBG List - tmbg-list-request@tmbg.org, www.tmbg.org
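[The percentages above fall out of a one-line ratio, counting only the 20-byte IP header as the post does; a quick illustrative sketch (9000 bytes actually works out to 99.8%, which the post rounds down):]

```python
# Fraction of an IP packet that is data, counting only the 20-byte IP header.
IP_HDR = 20

def data_fraction(packet_size: int) -> float:
    """Bytes past the IP header, as a fraction of the whole packet."""
    return (packet_size - IP_HDR) / packet_size

for size in (40, 1500, 9000):
    print(f"{size:5d} bytes: {100 * data_fraction(size):.1f}% data")
```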
Why a large MTU then? Most modern ethernet controllers don't care if you're sending 1500 or 9000 byte packets (with proper drivers taking advantage of the features there). If you're paying for 40 byte packets anyway, there is no incentive to ever go beyond a 1500 byte MTU.
I think it's partially due to the removal of overhead and the improvements you get out of TCP (bearing in mind it uses windowing and slow start). Here's a bit of data on this, from a link I googled up: http://www-iepm.slac.stanford.edu/monitoring/bulk/10ge/20030303/tests.html
Stephen J. Wilcox wrote:
Why a large MTU then? Most modern ethernet controllers don't care if you're sending 1500 or 9000 byte packets (with proper drivers taking advantage of the features there). If you're paying for 40 byte packets anyway, there is no incentive to ever go beyond a 1500 byte MTU.
I think it's partially due to the removal of overhead and the improvements you get out of TCP (bearing in mind it uses windowing and slow start)
Sure, if you control both endpoints. If you don't, and receivers have small (4k, 8k or 16k) window sizes, your performance will suffer. Maybe we should define whether we're talking about record-breaking attempts or real operationally useful things here. Pete
In a message written on Tue, Feb 03, 2004 at 09:53:30PM +0200, Petri Helenius wrote:
Sure, if you control both endpoints. If you don't and receivers have small (4k, 8k or 16k) window sizes, your performance will suffer.
Maybe we should define whether we're talking about record-breaking attempts or real operationally useful things here.
Google and Akamai are just two examples of companies with hundreds of thousands of machines where they move large amounts of data between them and have control of both ends. Many corporations are now moving off-site backup data over the Internet, in large volumes between two end points they control. The Internet is not just web servers feeding dial-up clients. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/ Read TMBG List - tmbg-list-request@tmbg.org, www.tmbg.org
Leo Bicknell wrote:
Google and Akamai are just two examples of companies with hundreds of thousands of machines where they move large amounts of data between them and have control of both ends. Many corporations are now moving off-site backup data over the Internet, in large volumes between two end points they control.
Makes me wonder if either one of the companies mentioned wants to take on the operational and support burden of increasing the MTU across maybe the most diverse set of paths in any environment. If I were either of them, I would probably never send even a 1500 byte packet, but would live somewhere in the low-1400 range. Pete
On Tue, 3 Feb 2004, Petri Helenius wrote:
Stephen J. Wilcox wrote:
Why a large MTU then? Most modern ethernet controllers don't care if you're sending 1500 or 9000 byte packets (with proper drivers taking advantage of the features there). If you're paying for 40 byte packets anyway, there is no incentive to ever go beyond a 1500 byte MTU.
I think it's partially due to the removal of overhead and the improvements you get out of TCP (bearing in mind it uses windowing and slow start)
Sure, if you control both endpoints. If you don't and receivers have small (4k, 8k or 16k) window sizes, your performance will suffer.
Maybe we should define whether we're talking about record-breaking attempts or real operationally useful things here.
By definition, this discussion about using a large MTU assumes that packets are arriving at more than 1500 bytes, and therefore that we do have control of the endpoints and that they are set to use jumbos. Steve
On Tue, 3 Feb 2004, Terry Baranski wrote:
A more important question is what will happen as we move out of the 1500 byte Ethernet world into the jumbo gigE world. It's only a matter of time before end users will be running gigE networks and want to use jumbo MTUs on their Internet links.
The performance gain achieved by using jumbo frames outside of very specific LAN scenarios is highly questionable, and they're still not standardized. Are "jumbo" Internet MTUs seen as a pressing issue by ISPs and vendors these days?
Being in a position to use a default mtu larger than 1500 would be nice, given the number of tunnels of various kinds that have to fragment because the packets going into them are themselves 1500 bytes... 4352 and 4470 are fairly common in the internet today... edge networks that are currently jumbo enabled for the most part do just fine when talking to the rest of the internet since they can do path mtu discovery... non-jumbo enabled devices on the same subnet as jumbo devices become a big problem, since they end up black-holed from the hosts. adoption in the core of networks is likely easier than at the end-user edges...
-- -------------------------------------------------------------------------- Joel Jaeggli Unix Consulting joelja@darkwing.uoregon.edu GPG Key Fingerprint: 5C6E 0104 BAF0 40B0 5BD3 C38B F000 35AB B67F 56B2
On Tue, 03 Feb 2004 06:39:33 PST, Joel Jaeggli said:
edge networks that are currently jumbo enabled for the most part do just fine when talking to the rest of the internet since they can do path mtu discovery...
Well, until you hit one of these transit providers that uses 1918 addresses for their links. :)
Michael.Dillon@radianz.com wrote:
If RFC1918 addresses are used only on interfaces with jumbo MTUs on the order of 9000 bytes then it doesn't break PMTUD in a 1500 byte Ethernet world. And it doesn't break traceroute. We just lose the DNS hint about the router location.
I'm confused about your traceroute comment. You're assuming a packet with an RFC1918 source address won't be dropped. In many cases, it will be, and should be.

Each organization is permitted to use the RFC1918 address space internally for any purpose they see fit. This often means they don't want people outside the organization to be able to generate packets with source addresses for machines they consider to be internal. It makes sense to drop such packets as they come into your AS.

Assuming that a packet with an RFC1918 source address will get dropped as it crosses into a new AS, this will break traceroute hops, Path MTU Discovery, Network/Host unreachable, or any other ICMP that needs to be generated from a router with an RFC1918 address.

Is everyone filtering RFC1918 at their edge? No. But my impression is that more and more places are. Certainly anyone who uses Team Cymru's Bogon services or similar services (doesn't Cisco now do this in IOS as well?) will be blocking them...

Bob
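[The edge filtering described above can be expressed in a few lines; a hypothetical sketch using Python's standard ipaddress module. The ranges are the three RFC 1918 blocks; the helper name is invented for illustration:]

```python
import ipaddress

# The three RFC 1918 private ranges that edge filters commonly drop
# when seen as source addresses on incoming packets.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(addr: str) -> bool:
    """True if addr falls in RFC 1918 space (an edge filter would drop it)."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)

# An ICMP "fragmentation needed" sourced from a private router interface
# would be dropped by such a filter, breaking PMTUD as described above:
print(is_rfc1918("10.1.2.3"))    # True
print(is_rfc1918("192.0.2.1"))   # False (TEST-NET, not RFC 1918)
```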
On 3-feb-04, at 11:47, Michael.Dillon@radianz.com wrote:
Which (as discussed previously) breaks things like Path MTU Discovery, traceroute,
If RFC1918 addresses are used only on interfaces with jumbo MTUs on the order of 9000 bytes then it doesn't break PMTUD in a 1500 byte Ethernet world. And it doesn't break traceroute.
You mean if they use 9000 bytes + RFC 1918 for the internal links and 1500 + real addresses for the external links there are no problems, even when people filter the RFC 1918 addresses? That would be correct in the case where this is a single-organization network. But if it's a service provider network, there may be customers somewhere that connect over links with MTUs larger than 1500 bytes. (And never mind the fact that firewall admins are incredibly paranoid and also often filter RFC 1918 sources.)
A more important question is what will happen as we move out of the 1500 byte Ethernet world into the jumbo gigE world.
Not as much as I'd hoped. My powerbook has gigabit ethernet but it's limited to 1500 byte frames.
It's only a matter of time before end users will be running gigE networks and want to use jumbo MTUs on their Internet links.
The internet has always been a network with a variable MTU size. Even today, under the iron rule of ether, there are many systems with MTUs that aren't 1500. And yes, obviously people will want larger MTUs. I had the opportunity to work with a couple of boxes with 10 gigabit ethernet interfaces today. Unfortunately, I was unable to squeeze more than 1.5 gbit out of them over TCP. That's 125000 packets per second at 1500 bytes, which makes no sense any which way you slice it. (And the driver did actually do 125k interrupts per second, which probably explains the poor throughput.)
Could we all agree on a hierarchy of jumbo MTU sizes, with the largest sizes in the core and the smallest sizes at the edge? The increment in sizes should allow for a layer or two of encapsulation, and peering routers should use the largest size MTU.
No need. Simply always use the largest possible MTU and make sure path MTU discovery works. If you have a range of maximum MTU sizes that is pretty close (9000 and 9216 are both common), it could make sense to standardize on the lowest in the range to avoid successive PMTUD drops, but apart from that there is little to be gained by over-designing.

Oh yes: there were some calculations in other postings which were quite misleading, as they only looked at the 20 byte IP overhead. There's also TCP overhead (20 bytes), often a timestamp option (12 bytes), and of course the ethernet overhead, which is considerable: 8 byte preamble, 14 byte header, 4 byte FCS and an inter-frame gap that is equivalent to 12 bytes. So a 1500 byte IP packet takes up 1538 bytes on the wire while it only has a 1460 byte payload (94.9% efficiency). A 9000 byte IP packet takes up 9038 bytes and delivers an 8960 byte payload (99.1%).

1520 bytes in a single packet would be 95% efficiency, but fragmenting this packet would create a new IP packet with a 24 byte payload for a total of 44 bytes, which is padded to 46 because of the ethernet minimum packet size, for a total bandwidth use on the wire of 1618 bytes, making for an efficiency rating of 91.5%. (Fragmenting 1520 into 1496 and 44 is pretty stupid by the way; 768 and 772 would be much better. Thinking of the reasons why is left as an exercise for the reader.)
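[The on-the-wire figures above can be reproduced directly; a quick sketch in Python, ignoring the TCP timestamp option as the 94.9%/99.1% numbers do:]

```python
# Ethernet per-frame overhead: 8-byte preamble, 14-byte header, 4-byte FCS,
# and an inter-frame gap equivalent to 12 bytes.
ETH_OVERHEAD = 8 + 14 + 4 + 12   # 38 bytes around every IP packet
IP_TCP = 20 + 20                 # minimal IP + TCP headers

def wire_efficiency(ip_packet: int) -> float:
    """TCP payload delivered per byte occupied on the wire."""
    return (ip_packet - IP_TCP) / (ip_packet + ETH_OVERHEAD)

print(f"{wire_efficiency(1500):.1%}")   # 94.9%
print(f"{wire_efficiency(9000):.1%}")   # 99.1%
```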
participants (12)
- bill
- Bob Snyder
- Iljitsch van Beijnum
- Joel Jaeggli
- Leo Bicknell
- Michael.Dillon@radianz.com
- Niels Bakker
- Petri Helenius
- Richard A Steenbergen
- Stephen J. Wilcox
- Terry Baranski
- Valdis.Kletnieks@vt.edu