Hi

What is best practice regarding choosing MTU on transit links?

Until now we have used the default of 1500 bytes. I now have a project where we peer directly with another small ISP. However we need a backup, so we figured a GRE tunnel on a common IP transit carrier would work. We want to avoid the troubles you get by having an effective MTU smaller than 1500 inside the tunnel, so the IP transit carrier agreed to configure an MTU of 9216.

Obviously I only need to increase my MTU by the size of the GRE header. But I am thinking: is there any reason not to go all in and ask every peer to go to whatever max MTU they can support? My own equipment will do an MTU of 9600 bytes.

On the other hand, none of my customers will see any actual difference, because they are end users with CPE equipment that expects a 1500-byte MTU. Trying to deliver jumbo frames to the end users is probably going to end badly.

Regards,

Baldur
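(For readers following along: a minimal sketch of the arrangement Baldur describes, in Cisco IOS-style syntax; interface names, addresses and values are hypothetical and other platforms use different knobs. Plain GRE over IPv4 adds 24 bytes of overhead, 20 for the outer IPv4 header plus 4 for the GRE header, so the transit-facing link only needs roughly 1524 bytes of IP MTU to carry 1500-byte payloads inside the tunnel; 9216 leaves plenty of headroom.)

    ! Hedged IOS-style sketch; names, addresses and values are placeholders.
    interface TenGigabitEthernet0/0/0
     description Link to IP transit carrier (carrier side configured for 9216)
     mtu 9216
    !
    interface Tunnel0
     description Backup path to the peer ISP via GRE over transit
     ip address 192.0.2.1 255.255.255.252
     ip mtu 1500                  ! full 1500-byte payloads fit inside the tunnel
     tunnel source TenGigabitEthernet0/0/0
     tunnel destination 198.51.100.1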
On 22/Jul/16 14:01, Baldur Norddahl wrote:
Obviously I only need to increase my MTU by the size of the GRE header. But I am thinking is there any reason not to go all in and ask every peer to go to whatever max MTU they can support? My own equipment will do MTU of 9600 bytes.
See below:

http://mailman.nanog.org/pipermail/nanog/2016-March/084598.html

You can reliably run jumbo frames in your own network core, and also to another network that can guarantee you the same (which would typically be under some form of commercial, private arrangement like an NNI).

Across the Internet, 1,500 bytes is still safest, simply because that is pretty much the standard. Trying to achieve jumbo frames across an Internet link (which includes links to your upstreams, links to your peers and links to your customers) is an exercise in pain.

Mark.
This topic seems to come up more often lately, much like it did during IPsec-related deployments. I standardize on 9,000 as an easy number so I don't have to split hairs over the slightly different maximums (9,214 vs. 9,216) that some vendors have. My experience has been that making a few phone calls and agreeing on 9,000 is simple enough.

I've only experienced one situation in which the MTU must match, and that is OSPF neighbor relationships, for which John T. Moy's book (OSPF: Anatomy of an Internet Routing Protocol) clearly explains why MTU became an issue during development of that protocol.

As more and more of us choose, or are forced, to support jumbo frames to accommodate Layer 2 extensions (DCI, Data Center Interconnects), I find myself helping my customers work with their carriers to ensure that jumbo frames are supported, and frequently reminding them to verify that jumbos are enabled not only on the primary path(s) but on any possible backup path as well. I've had customers experience DCI-related outages because their provider performed maintenance on the primary path and the re-route was sent across a path that did not support jumbo frames.

As always, YMMV, but I personally feel that having the discussions and implementation with your internal network team as well as all of your providers is time well spent.

Later,
-chris

On Fri, Jul 22, 2016 at 8:53 AM, Mark Tinka <mark.tinka@seacom.mu> wrote:
On 22/Jul/16 14:01, Baldur Norddahl wrote:
Obviously I only need to increase my MTU by the size of the GRE header. But I am thinking is there any reason not to go all in and ask every peer to go to whatever max MTU they can support? My own equipment will do MTU of 9600 bytes.
See below: http://mailman.nanog.org/pipermail/nanog/2016-March/084598.html
You can reliably run Jumbo frames in your own network core, and also to another network that can guarantee you the same (which would typically be under some form of commercial, private arrangement like an NNI).
Across the Internet, 1,500 bytes is still safest, simply because that is pretty much the standard. Trying to achieve Jumbo frames across an Internet link (which includes links to your upstreams, links to your peers and links to your customers) is an exercise in pain.
Mark.
--
Chris Kane
CCIE 14430
614 329 1906
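(A side note on the OSPF point Chris raises above: OSPF neighbors advertise their interface MTU in Database Description packets, and a mismatch typically leaves the adjacency stuck in ExStart/Exchange. A hedged IOS-style illustration, with hypothetical interface names and values, of the two ways this is usually handled:)

    ! Option 1 (preferred): make the MTU actually match on both ends of the link.
    interface GigabitEthernet0/1
     mtu 9000
     ip ospf 1 area 0
    !
    ! Option 2 (workaround): tell OSPF to ignore the MTU field in DBD packets.
    ! This only papers over the mismatch; oversized LSAs can still be dropped.
    interface GigabitEthernet0/1
     ip ospf mtu-ignore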
On 22/Jul/16 15:42, Chris Kane wrote:
My experience has been that making a few phone calls and agreeing on 9,000 is simple enough. I've only experienced one situation in which the MTU must match, and that is OSPF neighbor relationships, for which John T. Moy's book (OSPF: Anatomy of an Internet Routing Protocol) clearly explains why MTU became an issue during development of that protocol. As more and more of us choose, or are forced, to support jumbo frames to accommodate Layer 2 extensions (DCI, Data Center Interconnects), I find myself helping my customers work with their carriers to ensure that jumbo frames are supported, and frequently reminding them to verify that jumbos are enabled not only on the primary path(s) but on any possible backup path as well. I've had customers experience DCI-related outages because their provider performed maintenance on the primary path and the re-route was sent across a path that did not support jumbo frames.
DCI links tend to be private in nature, and 100% on-net or off-net with guarantees (NNI). The question here is about the wider Internet.
As always, YMMV but I personally feel having the discussions and implementation with your internal network team as well as all of your providers is time well spent.
I don't disagree. The issue is that networks beyond your provider (their providers/peers, and those providers'/peers' own providers/peers in turn) are something you cannot control. This falls into the same category as the "Can QoS markings be honored across the Internet?" question.

Mark.
❦ 22 July 2016 14:01 CEST, Baldur Norddahl <baldur.norddahl@gmail.com>:
Until now we have used the default of 1500 bytes. I now have a project where we peer directly with another small ISP. However we need a backup, so we figured a GRE tunnel on a common IP transit carrier would work. We want to avoid the troubles you get by having an effective MTU smaller than 1500 inside the tunnel, so the IP transit carrier agreed to configure an MTU of 9216.
Obviously I only need to increase my MTU by the size of the GRE header. But I am thinking is there any reason not to go all in and ask every peer to go to whatever max MTU they can support? My own equipment will do MTU of 9600 bytes.
You should always match the MTU of the remote end. So, if your transit carrier configures 9216 on its side, you should do the same on yours. There is no MTU discovery at layer 2: if you set the MTU of your interface to 9600 and you happen to route a 9500-byte packet, it will be silently dropped by your transit carrier.
--
Test input for validity and plausibility.
- The Elements of Programming Style (Kernighan & Plauger)
On Fri, Jul 22, 2016 at 8:01 AM, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
What is best practice regarding choosing MTU on transit links?
Hi Baldur,

On a link containing only routers, you can safely increase the MTU to any mutually agreed value, with these caveats:

1. Not all equipment behaves well with large packets. It's supposed to, but you know what they say.

2. No protocol guarantees that every device on the link has the same MTU. It's a manual configuration task on each device, and if the maximum receive unit on any device should happen to be less than the maximum transmit unit on any other, you will be intermittently screwed. This includes virtual links like the GRE tunnel. If you can guarantee the GRE tunnel travels a 9k path, you can set a slightly smaller MTU on the tunnel itself.

The MTU should never be increased above 1500 on a link containing workstations and servers unless you know for certain that packets emitted on that link will never traverse the public Internet. Path MTU discovery on the Internet is broken. It was a poor design (it broke the end-to-end principle), and over the years we've misimplemented it so badly that it has no serious production-level reliability.

Where practical, it's actually a good idea to detune your servers to a 1460-byte or lower packet size in order to avoid problems transiting those parts of the Internet which have allowed themselves to fall beneath a 1500-byte MTU. This is often accomplished by asking the firewall to adjust the TCP MSS value in flight.

Regards,
Bill Herrin

--
William Herrin ................ herrin@dirtside.com bill@herrin.us
Owner, Dirtside Systems ......... Web: <http://www.dirtside.com/>
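(Bill's last suggestion, rewriting the TCP MSS in flight, looks roughly like this in IOS-style syntax; the interface name and the 1460-byte value are placeholders, and firewalls or Linux boxes have equivalent MSS-clamping knobs:)

    ! Hedged sketch: clamp the MSS of TCP SYNs passing through this interface so
    ! end hosts never negotiate segments that would need a >1500-byte packet.
    interface GigabitEthernet0/2
     description Customer/server-facing interface
     ip tcp adjust-mss 1460      ! 1500 - 20 (IPv4 header) - 20 (TCP header)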
On 2016-07-22 15:57, William Herrin wrote:
On a link containing only routers, you can safely increase the MTU to any mutually agreed value with these caveats:
What I noticed a few years ago was that BGP convergence time was faster with a higher MTU. A full BGP table load took about half the time at an MTU of 9192 compared to 1500. Of course, BGP has to be allowed to use the higher MTU.

Anyone else observed something similar?

--
Grzegorz Janoszka
On Jul 22, 2016, at 1:37 PM, Grzegorz Janoszka <Grzegorz@Janoszka.pl> wrote:
What I noticed a few years ago was that BGP convergence time was faster with a higher MTU. A full BGP table load took about half the time at an MTU of 9192 compared to 1500. Of course, BGP has to be allowed to use the higher MTU.
Anyone else observed something similar?
I have read about others experiencing this, and did some testing a few months back. My experience was that for low-latency links there was a measurable but not huge difference. For high-latency links, with Juniper anyway, the difference was negligible, because the TCP window size is hard-coded at something small (16384?), so that ends up being the limit more than the TCP slow-start issues that a larger MTU helps with.

With that said, we run an MTU of >9000 on all of our transit links and all of our internal links, with no problems. Make sure to test by sending pings with do-not-fragment set at the maximum size configured, and without do-not-fragment slightly larger than the maximum size configured, to make sure that there are no configuration mismatches due to vendor differences.

Best Regards,
-Phil Rosenthal
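(Phil's testing advice, translated into IOS-style exec commands; the address and sizes are placeholders, and platforms differ on whether the configured MTU counts layer-2 headers, so adjust the numbers to your gear:)

    ! Should succeed: a maximum-size packet with DF set must cross unfragmented.
    ping 203.0.113.1 size 9216 df-bit
    ! Should also succeed (via fragmentation): one byte over the limit, DF clear.
    ! If it fails, some hop is dropping rather than fragmenting oversized packets,
    ! which usually points at an MTU mismatch somewhere on the path.
    ping 203.0.113.1 size 9217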
On 2016-07-22 20:20, Phil Rosenthal wrote:
On Jul 22, 2016, at 1:37 PM, Grzegorz Janoszka <Grzegorz@Janoszka.pl> wrote:
What I noticed a few years ago was that BGP convergence time was faster with a higher MTU. A full BGP table load took about half the time at an MTU of 9192 compared to 1500. Of course, BGP has to be allowed to use the higher MTU.
Anyone else observed something similar?
I have read about others experiencing this, and did some testing a few months back -- my experience was that for low latency links, there was a measurable but not huge difference. For high latency links, with Juniper anyway, there was a very negligible difference, because the TCP Window size is hard-coded at something small (16384?), so that ends up being the limit more than the tcp slow-start issues that MTU helps with.
I tested a Cisco CRS-1 (or maybe already upgraded to CRS-3) to a Juniper MX480 or MX960 on a link with about 10 ms of latency. It was iBGP carrying internal routes plus the full BGP table (both ways). I think the bottleneck was CPU on the CRS side, and maxing out the MSS helped a lot. I recall later doing tests Juniper to Juniper, and indeed the gain was not that big, but it was still visible. The Juniper command 'show system connections' showed an MSS of around 9 kB. I haven't checked the TCP window size.

--
Grzegorz Janoszka
On 7/22/16, Phil Rosenthal <pr@isprime.com> wrote:
On Jul 22, 2016, at 1:37 PM, Grzegorz Janoszka <Grzegorz@Janoszka.pl> wrote:
What I noticed a few years ago was that BGP convergence time was faster with a higher MTU. A full BGP table load took about half the time at an MTU of 9192 compared to 1500. Of course, BGP has to be allowed to use the higher MTU.
Anyone else observed something similar?
I have read about others experiencing this, and did some testing a few months back -- my experience was that for low latency links, there was a measurable but not huge difference. For high latency links, with Juniper anyway, there was a very negligible difference, because the TCP Window size is hard-coded at something small (16384?), so that ends up being the limit more than the tcp slow-start issues that MTU helps with.
I think the Cisco default window size is 16 KB, but you can change it with

    ip tcp window-size NNN

Lee
With that said, we run MTU at >9000 on all of our transit links, and all of our internal links, with no problems. Make sure to do testing to send pings with do-not-fragment at the maximum size configured, and without do-not-fragment just slightly larger than the maximum size configured, to make sure that there are no mismatches on configuration due to vendor differences.
Best Regards, -Phil Rosenthal
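(For completeness, the command Lee mentions above is a global configuration knob on IOS; it affects TCP sessions the router itself originates or terminates, BGP included. The value below is just an example:)

    ! Hypothetical example value; the default window is small (Lee suggests 16 KB,
    ! some releases use even less), which limits BGP transfer rates on
    ! high-latency sessions.
    ip tcp window-size 65535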
On 22 Jul 2016, at 19:37, Grzegorz Janoszka <Grzegorz@Janoszka.pl> wrote:
On 2016-07-22 15:57, William Herrin wrote:
On a link containing only routers, you can safely increase the MTU to any mutually agreed value, with these caveats:
What I noticed a few years ago was that BGP convergence time was faster with a higher MTU. A full BGP table load took about half the time at an MTU of 9192 compared to 1500. Of course, BGP has to be allowed to use the higher MTU.
This is fairly obvious when you consider that BGP on Cisco and Juniper will by default use up to the maximum allowed 4k message size per packet, which for typical unicast IPv4/IPv6 lets it pack many prefixes together with their attributes into each message. This not only lowers CPU load on the sending side but also on the receiving end, and helps with routing convergence.

There was a draft to allow up to 9k for BGP messages, but I believe it's buried somewhere on the outskirts of the town called "our current version RFC".

--
Łukasz Bromirski
On 22/Jul/16 19:37, Grzegorz Janoszka wrote:
What I noticed a few years ago was that BGP convergence time was faster with a higher MTU. A full BGP table load took about half the time at an MTU of 9192 compared to 1500. Of course, BGP has to be allowed to use the higher MTU.
Anyone else observed something similar?
Yes, of course. A larger MSS for BGP updates means fewer packets are needed to carry them, so convergence can complete sooner. The problem is that eBGP sessions are generally run between different networks, where co-ordinating MTU can be an issue.

Mark.
On Jul 22, 2016, at 1:37 PM, Grzegorz Janoszka <Grzegorz@Janoszka.pl> wrote:
On 2016-07-22 15:57, William Herrin wrote:
On a link containing only routers, you can safely increase the MTU to any mutually agreed value with these caveats:
What I noticed a few years ago was that BGP convergence time was faster with a higher MTU. A full BGP table load took about half the time at an MTU of 9192 compared to 1500. Of course, BGP has to be allowed to use the higher MTU.
Anyone else observed something similar?
This has been well known for years:

http://morse.colorado.edu/~epperson/courses/routing-protocols/handouts/bgp_s...

You have to adjust the MTU, input queues and such. The default TCP stack is very conservative.

- Jared
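(A hedged illustration of the kind of knobs Jared is referring to, in IOS-style syntax; the values are placeholders and the right numbers depend on platform and scale:)

    interface TenGigabitEthernet0/0/0
     mtu 9216                 ! larger frames, so fewer packets per BGP update
     hold-queue 1000 in       ! deeper input queue to absorb bursts of updates
    !
    ip tcp window-size 65535  ! loosen the conservative default TCP window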
On Fri 2016-Jul-22 14:01:36 +0200, Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
Hi
What is best practice regarding choosing MTU on transit links?
Until now we have used the default of 1500 bytes. I now have a project where we peer directly with another small ISP. However we need a backup, so we figured a GRE tunnel on a common IP transit carrier would work. We want to avoid the troubles you get by having an effective MTU smaller than 1500 inside the tunnel, so the IP transit carrier agreed to configure an MTU of 9216.
Obviously I only need to increase my MTU by the size of the GRE header. But I am thinking is there any reason not to go all in and ask every peer to go to whatever max MTU they can support? My own equipment will do MTU of 9600 bytes.
If you're just doing this for the GRE overhead and given that you're talking about backup over transit and possibly $deity-knows-where paths, TBH I might just lean towards pinning your L3 MTU inside the tunnel to 1500 bytes and configuring IP fragmentation post-encap. Not pretty, but probably fewer chances for WTF moments than trying to push >1500 on a transit path. This *might* be coloured by my past fights with having to force GRE through a 1500-byte path and trying to make that transparent to transit traffic, but there you have it...
On the other hand, none of my customers will see any actual difference because they are end users with CPE equipment that expects a 1500 byte MTU. Trying to deliver jumbo frames to the end users is probably going to end badly.
Regards,
Baldur
-- Hugo Slabbert | email, xmpp/jabber: hugo@slabnet.com pgp key: B178313E | also on Signal
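(A rough IOS-style sketch of what Hugo describes, with hypothetical names and addresses: keep a 1500-byte IP MTU on the tunnel and let the roughly 1524-byte encapsulated packets be fragmented at the outer IP layer when the transit path only carries 1500 bytes. On IOS the DF bit is typically not copied into the GRE outer header unless tunnel path-mtu-discovery is enabled, so the outer packets can fragment; reassembly load lands on the far tunnel endpoint, which is part of why this isn't pretty.)

    interface Tunnel0
     ip address 192.0.2.1 255.255.255.252
     ip mtu 1500                 ! transit traffic keeps its full 1500 bytes
     ip tcp adjust-mss 1460      ! optional: spare TCP flows from fragmentation
     tunnel source GigabitEthernet0/0
     tunnel destination 198.51.100.1
     ! no "tunnel path-mtu-discovery": outer GRE packets may be fragmented in transit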
* Baldur Norddahl
What is best practice regarding choosing MTU on transit links?
Until now we have used the default of 1500 bytes. I now have a project where we peer directly with another small ISP. However we need a backup, so we figured a GRE tunnel on a common IP transit carrier would work. We want to avoid the troubles you get by having an effective MTU smaller than 1500 inside the tunnel, so the IP transit carrier agreed to configure an MTU of 9216.
Your use case as described above puzzles me. You should already see your peer's routes being advertised to you via the transit provider, and vice versa. If your direct peering fails, the traffic should start flowing via the transit provider automatically. So unless there's something else going on here that you're not telling us, there should be no need for the GRE tunnel.

That said, it should work, as long as the MTU is increased on both ends and the transit network guarantees it will transport the jumbos.

We're doing something similar, actually. We have multiple sites connected with either dark fibre or DWDM, but not always in a redundant fashion. So instead we run GRE tunnels through transit (with increased MTU) between selected sites to achieve full redundancy. This has worked perfectly so far. It's only used for our intra-AS IP/MPLS traffic though, not for eBGP like you're considering.
Obviously I only need to increase my MTU by the size of the GRE header. But I am thinking is there any reason not to go all in and ask every peer to go to whatever max MTU they can support? My own equipment will do MTU of 9600 bytes.
I'd say it's not worth the trouble unless you know you're actually going to use it for something. If I were your peer, I'd certainly need you to give me a good reason why I should deviate from my standard templates first...
On the other hand, none of my customers will see any actual difference because they are end users with CPE equipment that expects a 1500 byte MTU. Trying to deliver jumbo frames to the end users is probably going to end badly.
Depends on the end user, I guess. Residential? Agreed. Business? Who knows - maybe they would like to run fat GRE tunnels through your network?

In any case: 1500 by default, other values only by request.

Tore
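(For reference, the kind of arrangement Tore describes, intra-AS MPLS over a GRE tunnel through transit with an increased MTU, looks roughly like this in IOS-style syntax; interface names, addresses and the MTU value are hypothetical, and Junos equivalents differ:)

    interface Tunnel1
     description GRE between our own sites via transit (transit path supports jumbos)
     ip address 192.0.2.5 255.255.255.252
     ip mtu 9100              ! headroom for MPLS labels and the payload's own overhead
     mpls ip                  ! run LDP/MPLS over the tunnel so VPLS/L2VPN traffic can use it
     tunnel source TenGigabitEthernet0/0/0
     tunnel destination 198.51.100.5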
On 23 July 2016 at 10:28, Tore Anderson <tore@fud.no> wrote:
* Baldur Norddahl
What is best practice regarding choosing MTU on transit links?
Until now we have used the default of 1500 bytes. I now have a project where we peer directly with another small ISP. However we need a backup, so we figured a GRE tunnel on a common IP transit carrier would work. We want to avoid the troubles you get by having an effective MTU smaller than 1500 inside the tunnel, so the IP transit carrier agreed to configure an MTU of 9216.
Your use case as described above puzzles me. You should already see your peer's routes being advertised to you via the transit provider, and vice versa. If your direct peering fails, the traffic should start flowing via the transit provider automatically. So unless there's something else going on here that you're not telling us, there should be no need for the GRE tunnel.
I did not say we were doing internet peering... In case you are wondering, we are actually running L2VPN tunnels over MPLS.

Regards,
Baldur
* Baldur Norddahl
I did not say we were doing internet peering...
Uhm. When you say that you peer with another ISP (and keep in mind what the "I" in ISP stands for), while giving no further details, then folks are going to assume that you're talking about a standard eBGP peering with inet/inet6 unicast NLRIs.
In case you are wondering, we are actually running L2VPN tunnels over MPLS.
Okay. Well, I see no reason why using GRE tunnels for this purpose shouldn't work; it does for us (using mostly VPLS and Martini tunnels).

That said, I've never tried extending our MPLS backbone outside of our own administrative domain or autonomous system. That sounds like a really scary prospect to me, but I'll admit I've never given serious consideration to such an arrangement before. Hopefully you know what you're doing.

Tore
On 23/Jul/16 13:32, Tore Anderson wrote:
That said, I've never tried extending our MPLS backbone outside of our own administrative domain or autonomous system. That sounds like a really scary prospect to me, but I'll admit I've never given serious consideration to such an arrangement before. Hopefully you know what you're doing.
Well, you can extend your MPLS-based services outside your domain through an NNI. Fair point, you generally won't run MPLS with your NNI partner, but they will carry your services across their own MPLS network toward their destination on the B-end.

With such an arrangement, the two networks can co-ordinate so that their capabilities are mirrored, even though neither NNI partner has end-to-end control.

Mark.
participants (12)
- Baldur Norddahl
- Chris Kane
- Grzegorz Janoszka
- Hugo Slabbert
- Jared Mauch
- Lee
- Mark Tinka
- Phil Rosenthal
- Tore Anderson
- Vincent Bernat
- William Herrin
- Łukasz Bromirski