Jumbo frame Question
Hi,

Does anyone have experience designing / implementing a jumbo-frame-enabled network?

I am working on a project to better utilize a fiber link between the east coast and the west coast with Juniper devices. Based on the default TCP windows in Linux / Windows, the latency between the coasts (~80 ms), and the default MTU of 1500, the maximum throughput of a single TCP session is around ~3 Mbps, which is too slow for backing up the huge amount of data between the two sites.

The following is the topology that we are using right now:

Host A NIC (MTU 9000) <--- GigLAN ---> (MTU 9216) Juniper EX4200 (MTU 9216) <--- GigLAN ---> (MTU 9018) J-6350 cluster A (MTU 9018) <--- fiber link across site ---> (MTU 9018) J-6350 cluster B (MTU 9018) <--- GigLAN ---> (MTU 9216) Juniper EX4200 (MTU 9216) <--- GigLAN ---> (MTU 9000) NIC - Host B

I tried to test connectivity from Host A to J-6350 cluster A using an ICMP ping with size 8000 and the DF bit set, but the ping failed.

Does anyone have experience with this? Please advise.

Thanks :-)
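A back-of-the-envelope sketch of where that single-stream figure comes from: throughput is bounded by window size divided by RTT. The window sizes below are illustrative (an old Windows-style default, the classic 64 KB limit without window scaling, and a tuned 4 MB window), not values measured on the poster's hosts.

rtt = 0.080                      # seconds, roughly east coast <-> west coast
[17_520, 65_535, 4_194_304].each do |window|
  mbps = window * 8 / rtt / 1_000_000
  printf("window %9d bytes -> max ~%.1f Mbit/s\n", window, mbps)
end

With an un-scaled window the stream tops out in the single-digit Mbit/s range regardless of MTU, which is the problem being described.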
TCP maximum window sizes. Application socket buffer sizes. Fix those and re-test!

Adrian
-- - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support - - $24/pm+GST entry-level VPSes w/ capped bandwidth charges available in WA -
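For a concrete illustration of the first point, a minimal, Linux-specific sketch (the /proc paths are a Linux assumption) that compares the kernel's maximum TCP buffer sizes against what a single 1 Gbit/s, 80 ms stream would need:

needed = (1_000_000_000 / 8 * 0.080).to_i     # ~10 MB for 1 Gbit/s at 80 ms RTT
%w[tcp_rmem tcp_wmem].each do |knob|
  # third field of tcp_rmem / tcp_wmem is the per-socket maximum in bytes
  max = File.read("/proc/sys/net/ipv4/#{knob}").split.map(&:to_i).last
  status = max >= needed ? "ok" : "raise to at least #{needed}"
  puts "#{knob}: max #{max} bytes (#{status})"
end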
This helps tons. speedguide.net has some registry 'tweaks' for different versions of Windows. Win7 also has the ability to turn on a FAST TCP-style congestion management scheme called Compound TCP. I haven't tried the Windows version, so YMMV, but I have had great success changing the congestion-avoidance algorithm on other devices.

-wil

On Nov 25, 2010, at 4:19 PM, Adrian Chadd <adrian@creative.net.au> wrote:
TCP maximum window sizes.
Application socket buffer sizes.
Fix those and re-test!
Adrian
MTU is only one issue. System tuning and a clean path are also critical. Getting good data streams between two systems that far apart is not easy, but with reasonable effort you can get 300 to 400 Mbps.

If an 8000-byte ping fails, that says that SOMETHING is not jumbo enabled, but it's hard to tell what. This assumes that no firewall or other device is blocking ICMP, but I assume that 1400-byte pings work. Try hop-by-hop tests.

I should also mention that some DWDM gear needs to be configured to handle jumbos. We've been bitten by that. You tend to assume that layer 1 gear won't care about layer 2 issues, but the input is an Ethernet interface.

Finally, host tuning is critical. You talk about the "default" window size, but modern stacks auto-tune window size. For lots of information on tuning and congestion management, see http://fasterdata.es.net. We move terabytes of data between CERN and the US and have to make sure that the 10GE links run at close to capacity and that streams of more than a Gbps will work. (It's not easy.)

--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net  Phone: +1 510 486-8634
Key fingerprint: 059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
On Thu, Nov 25, 2010 at 4:26 PM, Kevin Oberman <oberman@es.net> wrote:
MTU is only one issue. System tuning and a clean path are also critical. Getting good data streams between two systems that far apart is not easy, but with reasonable effort you can get 300 to 400 Mbps.
If an 8000 byte ping fails, that says that SOMETHING is not jumbo enabled, but it's hard to tell what. This assumes that no firewall or other device is blocking ICMP, but I assume that 1400 byte pings work. Try hop-by-hop tests.
I should also mention that some DWDM gear needs to be configured to handle jumbos. We've been bitten by that. You tend to assume that layer 1 gear won't care about layer 2 issues, but the input is an Ethernet interface.
Finally, host tuning is critical. You talk about the "default" window size, but modern stacks auto-tune window size. For lots of information on tuning and congestion management, see http://fasterdata.es.net. We move terabytes of data between CERN and the US and have to make sure that the 10GE links run at close to capacity and that streams of more than a Gbps will work. (It's not easy.)
We move hundreds of TB around from one side of the planet to the other on a regular basis. Kevin's link has some really good resources listed on it.

I can't stress enough the requirement for doing BOTH OS-level kernel tuning (make sure that RFC 1323 extensions are enabled, and make sure you have big enough maximum send and receive buffers; if your OS does auto-tuning, make sure the maximum parameters are set big enough to support all the data you'll want to have in flight at any one time) AND application-level adjustments.

One of the biggest stumbling blocks we run across is people who have done their OS tuning but then try to use stock SSH/SCP for moving files around. It doesn't matter how much tuning you do in the OS; if your application only has a 1 MB or 64 KB buffer for data handling, you just won't get the throughput you're looking for.

But with proper OS and application layer tuning, you can move a lot of data even over stock 1500-byte frames; don't be distracted by jumbo frames, it's a red herring when it comes to actually moving large volumes of data around. (Yes, yes, it's not completely irrelevant, for the pedants in the audience -- but it's not required by any means.)

Matt
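A minimal sketch of the application-side half of this: explicitly requesting socket buffers sized to an assumed 1 Gbit/s x 80 ms bandwidth-delay product (about 10 MB). The kernel's net.core.rmem_max / wmem_max limits still cap what the application actually gets, which is why the OS-level maximums have to be raised first.

require 'socket'

bdp = (1_000_000_000 / 8 * 0.080).to_i       # ~10 MB of data in flight (assumed path)
sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_SNDBUF, bdp)
sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_RCVBUF, bdp)
granted = sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_SNDBUF).int
puts "asked for #{bdp} bytes of send buffer, kernel granted #{granted}"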
You might want to read this: http://kb.pert.geant.net/PERTKB/JumboMTU

-Hank
Hi
Does anyone have experience on design / implementing the Jumbo frame enabled network?
I am working on a project to better utilize a fiber link across east coast and west coast with the Juniper devices.
Based on the default TCP windows in Linux / Windows and the latency between east coast and west coast (~80ms) and the default MTU size 1500, the maximum throughput of a single TCP session is around ~3Mbps but it is too slow for us to backing-up the huge amount of data across 2 sites.
There are a lot of stack tweaks you can make, but the real answer is larger MTU sizes in addition to those tweaks. Our network is completely 9000 MTU internally. We don't deploy any servers anymore with MTU 1500. MTU 1500 is just plain stupid on any network faster than 100 Mb Ethernet.
The following is the topology that we are using right now.
Host A NIC (MTU 9000) <--- GigLAN ---> (MTU 9216) Juniper EX4200 (MTU 9216) <---GigLAN ---> (MTU 9018) J-6350 cluster A (MTU 9018) <--- fiber link across site ---> (MTU 9018) J-6350 cluster B (MTU 9018) <--- GigLAN ---> (MTU 9216) Juniper EX4200 (MTU 9216) <---GigLAN ---> (MTU 9000) NIC - Host B
I was trying to test the connectivity from Host A to the J-6350 cluster A by using ICMP-Ping with size 8000 and DF bit set but it was failed to ping.
Does anyone have experience on it? please advise.
Thanks :-)
You might have some transport in the path (SONET?) that can't carry 8000. I would try starting at 3000 and working up to find where your limit is.

Your description of "fiber link across site" is vague. Who is the vendor, and what kind of service is it?
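One way to do that stepping, sketched under the assumption of Linux iputils ping (-M do forbids fragmentation, -s sets the ICMP payload) and a placeholder target address:

host = ARGV[0] || "192.0.2.1"                 # placeholder; use the real next hop
(1472..8972).step(500) do |size|
  # size + 28 = payload plus IP (20) and ICMP (8) headers on the wire
  ok = system("ping -c 1 -W 2 -M do -s #{size} #{host} >/dev/null 2>&1")
  printf("payload %4d (%4d on the wire): %s\n", size, size + 28, ok ? "ok" : "no reply / too big")
end

Wherever the replies stop is the hop or transport segment to look at more closely.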
Where would the world be if we weren't stuck at 1500 MTU? I've always kind of wondered what things would look like if it had been larger from the start.

We keep getting faster switch ports, but the MTU is still 1500! I'm sure someone has done some testing with a 10/100 switch with jumbo frames enabled versus a 10/100/1000 switch using the regular 1500 MTU and compared the performance...
On Fri, 26 Nov 2010, Brandon Kim wrote:
We keep getting faster switch ports, but the MTU is still 1500! I'm sure someone has done some testing with a 10/100 switch with jumbo frames enabled versus a 10/100/1000 switch using the regular 1500 MTU and compared the performance...
1500 MTU made sense when networks were 10 megabit/s. Now that we have gig and 10GE (and soon general availability of 100GE), I don't understand why 9000 makes people excited. If we're going to make a serious effort towards a larger MTU, let's make it 150000 (100x), or at least 64k. A 6x size difference isn't that much, and it's going to involve a lot of work to make it happen, so if we're going to do that work, do it properly.

--
Mikael Abrahamsson    email: swmike@swm.pp.se
Is there anyone on the list from facebook? Email me directly please. George Roettger
1500 MTU made sense when network was 10 megabit/s.
Now that we have gig and 10GE (and soon general availability of 100GE), I don't understand why 9000 makes people excited, if we're going to do a serious effort towards larger MTU, let's make it 150000 then (100x) or at least 64k.
The reason the IEEE has not allowed upping of the frame size is that the CRC is at its prudent limits at 1500. Yes, we do another check above the frame (uh, well, UDP over IPv4 may not), but the Ethernet spec cannot count on that.

randy
the reason ieee has not allowed upping of the frame size is that the crc is at the prudent limits at 1500. yes, we do another check above the frame (uh, well, udp4 may not), but the ether spec can not count on that.
randy
The CRC loses its effectiveness at around 12K bytes, so yeah, 64K bytes would probably require a change to detect all possible double-bit errors. But 9K bytes is still within the effective range of the current CRC algorithm.
From Dykstra:

"'Jumbo frames' extends ethernet to 9000 bytes. Why 9000? First because ethernet uses a 32 bit CRC that loses its effectiveness above about 12000 bytes. And secondly, 9000 was large enough to carry an 8 KB application datagram (e.g. NFS) plus packet header overhead. Is 9000 bytes enough? It's a lot better than 1500, but for pure performance reasons there is little reason to stop there. At 64 KB we reach the limit of an IPv4 datagram, while IPv6 allows for packets up to 4 GB in size. For ethernet however, the 32 bit CRC limit is hard to change, so don't expect to see ethernet frame sizes above 9000 bytes anytime soon."

But it actually washes out: if you have a larger packet size, you have fewer packets, so while you might have a higher "false pass" rate on the larger packets, since fewer packets are involved, the actual false-pass rate for a given amount of data is virtually unchanged.

http://staff.psc.edu/mathis/MTU/arguments.html#crc
On Fri, 26 Nov 2010, Randy Bush wrote:
the reason ieee has not allowed upping of the frame size is that the crc is at the prudent limits at 1500. yes, we do another check above the frame (uh, well, udp4 may not), but the ether spec can not count on that.
<http://staff.psc.edu/mathis/MTU/arguments.html#crc> seems to disagree? -- Mikael Abrahamsson email: swmike@swm.pp.se
On Fri, 26 Nov 2010 15:24:57 -0500 Randy Bush <randy@psg.com> wrote:
the reason ieee has not allowed upping of the frame size is that the crc is at the prudent limits at 1500. yes, we do another check above the frame (uh, well, udp4 may not), but the ether spec can not count on that.
I wasn't there, but I paid some attention to the discussion of jumbos when it would frequently pop up on comp.dcom.lans.ethernet. Rich Seifert, who was involved, would jeer jumbos and point out the potential problems. A search in that group for his name and jumbo frames should bring up some useful background.

In a nutshell, as I recall, one of the prime motivating factors for not standardizing jumbos was interoperability issues with the installed base, which penalizes other parts of the network (e.g. routers having to perform fragmentation) for the benefit of a select few (e.g. modern server-to-server comms).

I also seem to recall Rich once said something to the effect that it might have been nice if larger frames had been supported at the onset of Ethernet's initial development, but alas, such is life and it's simply too late now. The "installed base defeats us".

John
On 11/29/2010 1:10 PM, John Kristoff wrote:
In a nutshell, as I recall, one of the prime motivating factors for not standardizing jumbos was interoperability issues with the installed base, which penalizes other parts of the network (e.g. routers having to perform fragmentation) for the benefit of a select few (e.g. modern server to server comms).
Given that IPv6 doesn't support routers performing fragmentation, and many packets are sent with the DF bit set anyway, standardized jumbos would be nice. Just because the Internet as a whole may not support them, and Ethernet cards themselves may not exceed 1500 by default, doesn't mean that a standard shouldn't be written for those instances where jumbo frames would be desired.

Let's be honest, there are huge deployments of baby giants out there. Verizon for one requires 1600-byte support for cell towers (tested at 1600 bytes for them, so slightly larger for transport gear, depending on what wrappers are placed over that). None of this indicates larger-than-1500-byte IP, but it does indicate a larger L2 MTU.

There are many in-house setups which use jumbo frames, and having a standard for interoperability of those devices would be welcome. I'd personally love to see standards across the board for MTU from logical to physical, supporting even tiered MTU with future-proof overheads for VLANs, MPLS, PPP, intermixed in a large number of ways and layers (IP MTU support for X sizes, overhead support for Y sizes).

Jack
The level of undetected errors with TCP or UDP checksums can be high. The summation scheme is remarkably vulnerable to bus-related bit errors, where as much as 2% of parallel-bus bit errors might go undetected. Use of SCTP, TLS, or IPsec can supplant the weak TCP/UDP summation error-detection schemes. While jumbo frames reduce the serial error-detection strength of the IEEE CRC, which SCTP's CRC32c restores for jumbo frames, serial detection is less of a concern when compared to bus-related bit-error detection rates. CRC32c addresses both the bus and jumbo-frame error-detection problems and is found in 10 Gb/s NICs and math coprocessors.

-Doug
10/100 switches and NICs pretty much universally do not support jumbos.

Joel's widget number 2

On Nov 26, 2010, at 8:02, Brandon Kim <brandon.kim@brandontek.com> wrote:
Where would the world be if we weren't stuck at 1500 MTU? I've always kinda thought, what if that was larger from the start....
We keep getting faster switchports, but the MTU is still 1500 MTU! I'm sure someone has done some testing with a 10/100 switch with jumbo frames enables versus a 10/100/1000 switch using regular 1500 MTU and compared the performance.....
On (2010-11-25 21:14 -0800), George Bonser wrote:

Hey George,
9000 MTU internally. We don't deploy any servers anymore with MTU 1500. MTU 1500 is just plain stupid with any network >100mb ethernet.
I'm a big proponent of high MTU, to facilitate a user MTU of 1500 while adding, say, GRE or IPsec overhead. But calling it plain stupid to run an MTU of 1500 is quite the overstatement.

irb(main):001:0> 1460.0/(38+1500)
=> 0.949284785435631
irb(main):002:0> 8960.0/(38+9000)
=> 0.991369772073468

You are theoretically winning 4.2%, which works only internally in your network, so maybe you'll be able to capitalize on that 4.2% on backup traffic or so. Doesn't seem like that critical a win, to be honest.

--
 ++ytti
On Fri, 26 Nov 2010 19:26:30 +0200, Saku Ytti said:
You are theoretically winning 4.2%, which works only internally in your network, so maybe you'll be able to capitalize on that 4.2% on backup traffic or so. Doesn't seem like that critical win to be honest.
That's only half the calculation. The *other* half is if you have gear that has a packets-per-second limitation: if you go to 9000 MTU, you can move 6 times as much data at the same packets-per-second. Anybody who's ever had to trim a complicated ACL because it saturated the CPU knows what I mean.
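The arithmetic behind the roughly 6x figure, assuming 40 bytes of IP+TCP header and ignoring Ethernet framing overhead:

rate_bps = 1_000_000_000
[1500, 9000].each do |mtu|
  payload = mtu - 40                          # assumed IP + TCP header size
  pps = rate_bps / 8.0 / payload
  printf("MTU %4d -> ~%d packets/s to carry 1 Gbit/s of payload\n", mtu, pps)
end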
On (2010-11-26 12:39 -0500), Valdis.Kletnieks@vt.edu wrote:
That's only half the calculation. The *other* half is if you have gear that has a packets-per-second issue - if you go to 9000 MTU, you can move 6 times as much data in the same packets-per-second. Anybody who's ever had to trim a complicated ACL list because it saturated the CPU knows what I mean.
Academically speaking it's an interesting topic; of course the actual time to copy the packet is not constant, so you are not going to see a linear increase in bandwidth. It would be very nice to see a graph of, say, a VXR with an ACL long enough to cap the 1500B rate very low, and then see results for packet sizes of 3000, 6000, and 9000.

If this is something you regularly need to consider operationally, do you happen to have such numbers, and if not, would it be too much work for you to produce them? In my world, we've been running hardware lookup engines since 2003, so we really don't need to care about features affecting lookup speed.

--
 ++ytti
I have the "opposite problem". I use iperf to test WAN and VPN throughput and packet loss, but find that the sending Linux system starts out with the expected MTU / MSS but then ramps up the packet size to way beyond 1500. The result is that network equipment must fragment the packets. On higher-bandwidth circuits there are a lot of retransmits that mask any real packet loss that might exist in the path. I have tried multiple methods to clamp the MTU, but nothing has worked so far.

This leads me to wonder how often real bulk-transfer applications start using jumbo packets that just end up getting fragmented downstream. The jumbo packets from iperf occur on various versions of the Linux kernel and different distributions. It might only happen on GigE. Suggestions on clamping the MTU are welcome.

Thanks,

Jon
Jon,

Do you have something blocking Path MTU Discovery? Unless I'm off base on this, shouldn't that take care of your issue?

-Richard
participants (19)
- Adrian Chadd
- Brandon Kim
- Douglas Otis
- Geo.
- George Bonser
- Hank Nussbacher
- Harris Hui
- Jack Bates
- Joel Jaeggli
- John Kristoff
- Jon Meek
- Kevin Oberman
- Matthew Petach
- Mikael Abrahamsson
- Randy Bush
- Richard Graves (RHT)
- Saku Ytti
- Valdis.Kletnieks@vt.edu
- Wil Schultz