I don't mind letting the client-premises routers break down 9000-byte packets. My ISP controls end-to-end connectivity. 80% of people even let our techs change settings on their computers, so this would let me offer a ~5% increase in speed and less network congestion for end users, as a one-time $60 service many people would want. It's also where the internet should be heading... Not to beat a dead horse (re: IPv6), but why hasn't the entire internet just moved to a 9000 (or 9600 L2) byte MTU? It was created for the jump to gigabit... That's 4 orders of magnitude ago. The internet backbone shouldn't be shuffling around 1500-byte packets at 1 Tbps. That means if you want to route that data at layer 3, you need a router capable of forwarding more than half a billion packets/s. On the other hand, with even just a 9000-byte MTU, TCP/IP overhead is reduced sixfold, and forwarding needs just 100 or so Mpps of capacity. Routers that forward at that rate can be found for less than $2k.

On 18 January 2018 at 23:31, Vincent Bernat <bernat@luffy.cx> wrote:
❦ 18 January 2018 22:06 -0700, Michael Crapse <michael@wi-fiber.io>:
Why though? If I could get the major CDNs all inside my network willing to run 9000-byte packets, my routers just got that much cheaper and less loaded. The routing capacity of x86 is hindered only by forwarding capacity (PPS), not data line rate.
Unless your clients use a 9000-byte MTU, you won't see a difference, but you'll have to deal with broken PMTUD (or have your routers fragment).

--
Many a writer seems to think he is never profound except when he can't understand his own meaning.
        -- George D. Prentice
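The packet-rate and overhead argument in this exchange can be checked with a little arithmetic. The following is a rough sketch only: it assumes every packet is full-size, plain Ethernet framing (38 bytes of per-frame overhead, no VLAN tag) and IPv4 plus TCP headers without options (40 bytes). The function names are made up for illustration.

```python
# Back-of-the-envelope comparison of 1500-byte and 9000-byte MTUs.
# Assumptions (not from the thread): all packets are full-size, Ethernet
# framing without VLAN tags, IPv4 + TCP headers without options.

ETH_OVERHEAD = 38    # preamble/SFD 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40  # IPv4 header 20 + TCP header 20


def packets_per_second(link_bps: float, mtu: int) -> float:
    """Packets per second needed to fill a link with full-size packets."""
    return link_bps / ((mtu + ETH_OVERHEAD) * 8)


def goodput_fraction(mtu: int) -> float:
    """Share of the line rate that carries TCP payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


if __name__ == "__main__":
    link = 1e12  # 1 Tbps
    for mtu in (1500, 9000):
        mpps = packets_per_second(link, mtu) / 1e6
        print(f"MTU {mtu}: {mpps:.1f} Mpps, goodput {goodput_fraction(mtu):.2%}")
    gain = goodput_fraction(9000) / goodput_fraction(1500) - 1
    print(f"goodput gain from 1500 to 9000: {gain:.1%}")
```

The result depends strongly on the assumed packet size: a traffic mix with many small packets pushes the packet rate well above the full-size figure, and the goodput gain only applies to flows that can actually fill 9000-byte packets end to end.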
On Thu, 18 Jan 2018, Michael Crapse wrote:
I don't mind letting the client-premises routers break down 9000-byte packets. My ISP controls end-to-end connectivity. 80% of people even let our techs change settings on their computers, so this would let me offer a ~5% increase in speed and less network congestion for end users, as a one-time $60 service many people would want. It's also where the internet should be heading... Not to beat a dead horse (re: IPv6), but why hasn't the entire internet just moved to a 9000 (or 9600 L2) byte MTU? It was created for the jump to gigabit... That's 4 orders of magnitude ago. The internet backbone shouldn't be shuffling around 1500-byte packets at 1 Tbps. That means if you want to route that data at layer 3, you need a router capable of forwarding more than half a billion packets/s. On the other hand, with even just a 9000-byte MTU, TCP/IP overhead is reduced sixfold, and forwarding needs just 100 or so Mpps of capacity. Routers that forward at that rate can be found for less than $2k.
As usual, there are 5-10 (or more) factors playing into this. Some, in random order:

1. IEEE hasn't standardised > 1500 byte Ethernet packets.
2. DSL/Wi-Fi chips typically don't support > ~2300 because reasons.
3. Because of 2, most SoC Ethernet chips don't either.
4. There is no standardised way to understand/probe the L2 MTU to your next hop (ARP/ND and probing if the value actually works).
5. PMTUD doesn't always work.
6. PLPMTUD generally hasn't been implemented in either protocols or hosts.
7. Some implementations have been optimized to work on packets < 2000 bytes and actually perform worse if they have to support larger packets (they will allocate 2k buffer memory per packet); 9k is ill-fitting across 2^X values.
8. Because of all the above reasons, mixed-MTU LAN doesn't work, and it's going to be mixed-MTU unless you control all devices (which is typically not the case outside of the datacenter).
9. The PPS problem in hosts and routers was solved by hardware offloading to NICs and forwarding NPUs/ASICs with very high lookup speeds, so PPS is no longer a big problem.

On the value to choose for "large MTU", 9000 for edge and 9180 for core is what I advocate, after a non-trivial amount of looking into this. All major core routing platforms work with 9180 (with JunOS only supporting this after 2015 or something). So if we'd want to standardise on an MTU that all devices should support, then it's 9180, but we'd typically use 9000 in RA to send to devices.

If we want a higher MTU to be deployable across the Internet, we need to make it incrementally deployable. Some key things to achieve that:

1. Get something like https://tools.ietf.org/html/draft-van-beijnum-multi-mtu-05 implemented.
2. Go to the IETF and get a document published that advises all protocols to support PLPMTUD (RFC 4821).

1 is to enable mixed-MTU LANs. 2 is to enable large-MTU hosts to actually be able to communicate when PMTUD doesn't work.

With this in place (wait ~10 years), larger MTU is now incrementally deployable, which means it'll be deployable on the Internet, and IEEE might actually accept to standardise > 1500 byte packets for Ethernet.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
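The idea behind PLPMTUD (RFC 4821) is that the sender does not wait for ICMP errors; it sends probes of different sizes and treats an acknowledged probe as proof that the size fits the path. The sketch below illustrates only that idea. The `probe` callable, the function name, the binary-search strategy and the 1280/9000 bounds are assumptions for illustration, not the procedure from the RFC, and a real implementation also has to cope with probe loss caused by congestion rather than size.

```python
from typing import Callable


def search_path_mtu(probe: Callable[[int], bool],
                    low: int = 1280, high: int = 9000) -> int:
    """Return the largest size in [low, high] for which probe(size) succeeds.

    Assumes probe(low) succeeds and that any size below a working size
    also works (hypothetical, idealised path behaviour).
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid        # probe was acknowledged: this size fits
        else:
            high = mid - 1   # probe was lost: assume it was too large
    return low


if __name__ == "__main__":
    # Hypothetical path that silently drops anything larger than 8996 bytes.
    def fake_path(size: int) -> bool:
        return size <= 8996

    print(search_path_mtu(fake_path))
```

Because the search relies only on whether probes are acknowledged, it keeps working when ICMP errors are filtered, rate-limited or delivered to the wrong node.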
Other than people improperly blocking ICMP, when does PMTUD not work? Honest question, not troll.

-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com

Midwest-IX
http://www.midwest-ix.com

----- Original Message -----
From: "Mikael Abrahamsson" <swmike@swm.pp.se>
To: "Michael Crapse" <michael@wi-fiber.io>
Cc: "NANOG list" <nanog@nanog.org>
Sent: Friday, January 19, 2018 1:22:02 AM
Subject: Re: MTU to CDN's
On Fri, 19 Jan 2018, Mike Hammett wrote:
Other than people improperly blocking ICMP, when does PMTUD not work? Honest question, not troll.
Mismatch of MTU interface settings between interfaces, mismatch of MTU between L3 devices and intermediate L2 devices, anycast services, ECMP-based services where the ICMP error is delivered to the wrong node.

So yes, there are plenty of reasons that PMTUD doesn't work without anyone doing it because of ill will or incompetence.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
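The ECMP and anycast case can be illustrated with a toy model. A load-balancing stage picks a node by hashing packet header fields; the ICMP "packet too big" error carries the intermediate router's address as its source, so it hashes independently of the client's flow and may land on a node that knows nothing about that connection. Everything below is a hypothetical sketch: the node names, the addresses (documentation ranges) and the hash over source and destination only are assumptions, and real devices typically hash more fields.

```python
import zlib

# Hypothetical nodes behind one anycast / ECMP service address.
BACKENDS = ["node-a", "node-b", "node-c", "node-d"]


def ecmp_pick(src_ip: str, dst_ip: str, n: int = len(BACKENDS)) -> int:
    """Pick a backend index from a deterministic hash of (src, dst)."""
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % n


client = "198.51.100.7"   # end user
service = "203.0.113.10"  # service address shared by all backends
router = "192.0.2.1"      # router on the path that emits the ICMP error

# Packets of the client's flow always reach the same backend.
flow_node = ecmp_pick(client, service)

# The ICMP error is addressed to the service but sourced from the router,
# so it is hashed as a different "flow".
icmp_node = ecmp_pick(router, service)

if __name__ == "__main__":
    print("flow handled by:", BACKENDS[flow_node])
    print("ICMP error delivered to:", BACKENDS[icmp_node])
```

Whenever the two picks differ, the node holding the connection never sees the error and keeps sending packets that are too large.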
Wouldn't those situations be causing issues now, given the likelihood that someone with a less than 1,500 byte MTU is communicating with you now?

-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com

Midwest-IX
http://www.midwest-ix.com

----- Original Message -----
From: "Mikael Abrahamsson" <swmike@swm.pp.se>
To: "Mike Hammett" <nanog@ics-il.net>
Cc: "NANOG list" <nanog@nanog.org>
Sent: Friday, January 19, 2018 8:05:17 AM
Subject: Re: MTU to CDN's
On Jan 19, 2018, at 9:07 AM, Mike Hammett <nanog@ics-il.net> wrote:
Wouldn't those situations be causing issues now, given the likelihood that someone with a less than 1,500 byte MTU is communicating with you now?
Tends to be more localized and less visible in many cases. I’m aware of at least one regional network that has duplicate packet issues going on and they’ve yet to understand the root cause. This can have performance impacts that are not always understood. Things get harder to diagnose when there are multiple paths, etc. involved. Many folks these days just fail away from a seemingly problematic link quickly and don’t always identify the root cause.

- jared
"Many folks these days just fail away from a seemingly problematic link quickly and don’t always identify the root cause." Agreed. ----- Mike Hammett Intelligent Computing Solutions http://www.ics-il.com Midwest-IX http://www.midwest-ix.com ----- Original Message ----- From: "Jared Mauch" <jared@puck.nether.net> To: "Mike Hammett" <nanog@ics-il.net> Cc: "NANOG list" <nanog@nanog.org> Sent: Friday, January 19, 2018 8:13:02 AM Subject: Re: MTU to CDN's
On Fri, 19 Jan 2018, Mike Hammett wrote:
Wouldn't those situations be causing issues now, given the likelihood that someone with a less than 1,500 byte MTU is communicating with you now?
If the issue is that you're letting 8996-byte packets through but not 9000-byte packets, then no.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
❦ 19 January 2018 08:07 -0600, Mike Hammett <nanog@ics-il.net>:
Wouldn't those situations be causing issues now, given the likelihood that someone with a less than 1,500 byte MTU is communicating with you now?
Those situations are causing issues now. If you have an MTU less than 1500 bytes, it is likely some destinations are unreachable to you if you only rely on PMTUD. People usually rely on TCP MSS for those cases.

--
I'll burn my books.
        -- Christopher Marlowe
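Relying on TCP MSS means the maximum segment size advertised in the SYN is set (or rewritten by a device on the path, often called clamping) so that full segments already fit the smallest known MTU, and no ICMP feedback is needed for TCP. A minimal sketch of the arithmetic, assuming IPv4 or IPv6 headers without extension headers and a TCP header without options; the function names are made up for illustration:

```python
IPV4_HEADER = 20  # bytes, no IP options
IPV6_HEADER = 40  # bytes, no extension headers
TCP_HEADER = 20   # bytes, no TCP options


def tcp_mss(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


def clamp_mss(advertised_mss: int, link_mtu: int, ipv6: bool = False) -> int:
    """MSS a clamping device would forward: never more than the link allows."""
    return min(advertised_mss, tcp_mss(link_mtu, ipv6))


if __name__ == "__main__":
    # A host on a 1500-byte LAN advertises 1460; a link with a 1492-byte
    # MTU (a common PPPoE value) would clamp that to 1452.
    print(clamp_mss(tcp_mss(1500), link_mtu=1492))
```

This only helps TCP, and only for links whose MTU the clamping device knows about; other protocols still depend on PMTUD or on their own probing.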
On Fri, Jan 19, 2018 at 9:07 AM, Mike Hammett <nanog@ics-il.net> wrote:
Wouldn't those situations be causing issues now, given the likelihood that someone with a less than 1,500 byte MTU is communicating with you now?
Hi Mike,

They do. These are the people calling your support line with the complaint that they can't get to your web site from home, but can from work (or vice versa). Your web site is "obviously" working and the calls are infrequent, so support advises there's a problem with the customer's ISP.

Regards,
Bill Herrin

--
William Herrin ................ herrin@dirtside.com  bill@herrin.us
Dirtside Systems ......... Web: <http://www.dirtside.com/>
And also:

When the router generates the ICMP by punting the packet to its CPU and such traffic is - legitimately - rate-limited to avoid crashing the router.

When the ICMP is sourced from a private IP on the router for various legitimate reasons (not enough public IPv4 addresses, from within a VRF, or whatever), while packets from private IPs are legitimately filtered when entering the target network.

On 19 Jan 2018, at 15:05, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
On Fri, 19 Jan 2018, Mike Hammett wrote:
Other than people improperly blocking ICMP, when does PMTUD not work? Honest question, not troll.
Mismatch of MTU interface settings between interfaces, mismatch of MTU between L3 devices and intermediate L2 devices, anycast services, ECMP based services where the ICMP error is delivered to the wrong node.
So yes, there are plenty of reasons that PMTUD doesn't work without anyone doing it because of ill will or incompetence.
On 19 January 2018 at 13:48, Mike Hammett <nanog@ics-il.net> wrote:
Other than people improperly blocking ICMP, when does PMTUD not work? Honest question, not troll.
It can break under _certain_ scenarios with anycast.
It can break under _certain_ scenarios in v6 with ECMP.
It can break across an LB in L4 mode, when a real server behind the LB has an unexpected MSS.

None of these scenarios is the norm, obviously; however, PMTUD does have some edge cases.

/Ruairi
On Fri, Jan 19, 2018 at 8:48 AM, Mike Hammett <nanog@ics-il.net> wrote:
Other than people improperly blocking ICMP, when does PMTUD not work? Honest question, not troll.
Hi Mike,

One common scenario: the router's interface is numbered with an RFC 1918 private IP address. The packet is dropped because it tries to enter an adjacent system with a source address that isn't valid for the transit.

Another common scenario: the packet is encapsulated in MPLS when it reaches the segment which can't handle the large packet. That particular router is not set up to decapsulate the MPLS packet and act on the IPv4 packet inside.

A third scenario: asymmetric routing. A particular router is capable of moving packets to your destination but either intentionally or due to a configuration error is unable to route packets back to the source.

A fourth scenario: for security reasons (part of defense in depth), a host is only permitted to communicate with whitelisted IP addresses. Random Internet routers are not on the whitelist.

PMTUD's routine failure demonstrates the wisdom of the end-to-end principle. It's the one critical place in base IPv4 that doesn't follow it.

Regards,
Bill Herrin

--
William Herrin ................ herrin@dirtside.com  bill@herrin.us
Dirtside Systems ......... Web: <http://www.dirtside.com/>
participants (8)

- Jared Mauch
- Michael Crapse
- Mikael Abrahamsson
- Mike Hammett
- Olivier Benghozi
- Ruairi Carroll
- Vincent Bernat
- William Herrin