IP Fragmentation - Not reliable over the Internet?
I am trolling for information/community wisdom. What is the probability that a random path between two Internet hosts will traverse a middlebox that drops or otherwise barfs on fragmented IPv4 packets?

If anyone has any data or anecdotes, please feel free to send an off-list email or whatever. Thanks!

-----------------------------------------------------------
Christopher.Palmer@microsoft.com
Program Manager
Windows Networking Core - Client Technologies
On Tue, 27 Aug 2013 00:01:45 -0000, Christopher Palmer said:
What is the probability that a random path between two Internet hosts will traverse a middlebox that drops or otherwise barfs on fragmented IPv4 packets?
The fact you're posting indicates that you already know the practical answer: "Often enough that you need to take defensive measures". But there's really several separate questions here:

1) What is the probability that a given path ends up fragging a packet because it isn't MTU 1500 end-to-end?

2) What is the probability that a frag needed is detected by a router that then botches it?

2a) What is the probability that the router does it right but the source node shoots itself in the foot by requesting PMTUD, but then blocks inbound ICMP for "security reasons"?

3) What is the probability that one router correctly frags a packet, but a subsequent box (most likely a firewall or target host) botches the re-assembly or other handling?

4) When confronted with the fact that there's a very high correlation between the level of technical clue that results in procuring and deploying a broken device, and the level of technical clue available to resolve the problem when you try to contact them, what's the appropriate beverage?
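Questions 1 and 2a can at least be probed from a single Linux host. A minimal sketch, assuming stock iputils ping and tracepath, with 192.0.2.1 standing in as a placeholder for the far end:

  # Question 1: is the path 1500 end-to-end? tracepath reports the path MTU
  # hop by hop and needs no root privileges.
  tracepath -n 192.0.2.1

  # Question 2a: send a packet that exactly fills a 1500 MTU with DF set
  # (1472 data + 8 ICMP header + 20 IP header = 1500). On a smaller-MTU path
  # a healthy setup prints the ICMP "frag needed" error; silent 100% loss
  # with no error message suggests a PMTUD black hole (filtered ICMP).
  ping -c 3 -M do -s 1472 192.0.2.1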
On Aug 26, 2013, at 22:02 , Valdis.Kletnieks@vt.edu wrote:
On Tue, 27 Aug 2013 00:01:45 -0000, Christopher Palmer said:
What is the probability that a random path between two Internet hosts will traverse a middlebox that drops or otherwise barfs on fragmented IPv4 packets?
The fact you're posting indicates that you already know the practical answer: "Often enough that you need to take defensive measures".
But there's really several separate questions here:
1) What is the probability that a given path ends up fragging a packet because it isn't MTU 1500 end-to-end?
2) What is the probability that a frag needed is detected by a router that then botches it?
2a) What is the probability that the router does it right but the source node shoots itself in the foot by requesting PMTUD, but then blocks inbound ICMP for "security reasons"?
3) What is the probability that one router correctly frags a packet, but a subsequent box (most likely a firewall or target host) botches the re-assembly or other handling?
4) When confronted with the fact that there's a very high correlation between the level of technical clue that results in procuring and deploying a broken device, and the level of technical clue available to resolve the problem when you try to contact them, what's the appropriate beverage?
That's a lot of questions he didn't ask. As I read it, the question he asked is:

If I send a packet out as a legitimate series of fragments, what is the chance that they will get dropped somewhere in the middle of the path between the emitting host and the receiving host?

To my thinking, the answer to that question is basically "pretty close to 0 and if that changes in the core, very bad things will happen."

Owen
On Tue, 27 Aug 2013 00:34:57 -0700, Owen DeLong said:
That's a lot of questions he didn't ask.
This isn't your first rodeo. You should know by now that the question actually asked, the question *meant* to be asked, and the question that actually needed answering are often 3 different things.
If I send a packet out as a legitimate series of fragments, what is the chance that they will get dropped somewhere in the middle of the path between the emitting host and the receiving host?
To my thinking, the answer to that question is basically "pretty close to 0 and if that changes in the core, very bad things will happen."
Saku Ytti and Emile Aben have numbers that say otherwise. And there must be a significantly bigger percentage of failures than "pretty close to 0", or Path MTU Discovery wouldn't have a reputation of being next to useless.
And then you have other issues like networks that arbitrarily set DF on all packets passing through them. That burnt a good three days of my life back in the day.

-Blake

On Tue, Aug 27, 2013 at 9:33 AM, <Valdis.Kletnieks@vt.edu> wrote:
On Tue, 27 Aug 2013 00:34:57 -0700, Owen DeLong said:
That's a lot of questions he didn't ask.
This isn't your first rodeo. You should know by now that the question actually asked, the question *meant* to be asked, and the question that actually needed answering are often 3 different things.
If I send a packet out as a legitimate series of fragments, what is the chance that they will get dropped somewhere in the middle of the path between the emitting host and the receiving host?
To my thinking, the answer to that question is basically "pretty close to 0 and if that changes in the core, very bad things will happen."
Saku Ytti and Emile Aben have numbers that say otherwise. And there must be a significantly bigger percentage of failures than "pretty close to 0", or Path MTU Discovery wouldn't have a reputation of being next to useless.
On Aug 27, 2013, at 07:33 , Valdis.Kletnieks@vt.edu wrote:
On Tue, 27 Aug 2013 00:34:57 -0700, Owen DeLong said:
That's a lot of questions he didn't ask.
This isn't your first rodeo. You should know by now that the question actually asked, the question *meant* to be asked, and the question that actually needed answering are often 3 different things.
If I send a packet out as a legitimate series of fragments, what is the chance that they will get dropped somewhere in the middle of the path between the emitting host and the receiving host?
To my thinking, the answer to that question is basically "pretty close to 0 and if that changes in the core, very bad things will happen."
Saku Ytti and Emile Aben have numbers that say otherwise. And there must be a significantly bigger percentage of failures than "pretty close to 0", or Path MTU Discovery wouldn't have a reputation of being next to useless.
No, their numbers describe what happens to single packets of differing sizes. Nothing they did describes results of actually fragmented packets. Owen
* Owen DeLong
On Aug 27, 2013, at 07:33 , Valdis.Kletnieks@vt.edu wrote:
Saku Ytti and Emile Aben have numbers that say otherwise. And there must be a significantly bigger percentage of failures than "pretty close to 0", or Path MTU Discovery wouldn't have a reputation of being next to useless.
No, their numbers describe what happens to single packets of differing sizes.
Nothing they did describes results of actually fragmented packets.
Yes, it did.

Hint: 1473 + 8 + 20

Tore
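To spell the hint out: 1473 bytes of ICMP data plus 8 bytes of ICMP header plus 20 bytes of IPv4 header is 1501 bytes, one byte over a 1500-byte MTU, so such a ping leaves the sender as two fragments. A minimal way to watch it happen on a Linux host (the interface name and the 192.0.2.1 target are placeholders):

  # Terminal 1 (as root): show ICMP and any non-first fragments on the wire;
  # 'ip[6:2] & 0x1fff != 0' matches packets with a non-zero fragment offset.
  tcpdump -n -i eth0 'host 192.0.2.1 and (icmp or (ip[6:2] & 0x1fff != 0))'

  # Terminal 2: 1473 + 8 + 20 = 1501 bytes, so the kernel emits two fragments,
  # one carrying the ICMP header plus 1472 data bytes, one carrying the last byte.
  ping -c 1 -s 1473 192.0.2.1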
On 28/08/2013 08:05, Tore Anderson wrote:
* Owen DeLong
On Aug 27, 2013, at 07:33 , Valdis.Kletnieks@vt.edu wrote:
Saku Ytti and Emile Aben have numbers that say otherwise. And there must be a significantly bigger percentage of failures than "pretty close to 0", or Path MTU Discovery wouldn't have a reputation of being next to useless.
No, their numbers describe what happens to single packets of differing sizes.
Nothing they did describes results of actually fragmented packets.
Yes, it did.
Hint: 1473 + 8 + 20
For Saku: yes.

For me: that was my intention, but later I discovered the Atlas ping does include the ICMP header in its 'size' parameter, so what I did in effect was 1473 + 20 = 1493 (and not the 1501 I intended).

Redid the tests to a "known good" destination where I knew the interface MTU (1500) and could tcpdump, which confirmed that I was looking at fragmentation. I also took an offline recommendation to do different packet sizes to try to distinguish fragmentation issues from general corruption-based packet loss.

Results:
size  = ICMP packet size, add 20 for IPv4 packet size
fail% = % of vantage points where 5 packets were sent, 0 were received.

#size fail% vantage points
100    0.88 2963
300    0.77 3614
500    0.88 1133
700    1.07 3258
900    1.13 3614
1000   1.04 770
1100   2.04 3525
1200   1.91 3303
1300   1.76 681
1400   2.06 3014
1450   2.53 3597
1470   3.01 2192
1470   3.12 3592
1473   4.96 3566
1475   4.96 3387
1480   6.04 679
1480   4.93 3492 [*]
1481   9.86 3489
1482   9.81 3567
1483   9.94 3118

There is a ~5% difference going up from 1480 to 1481.

As to interpreting this: Leo Bicknell's observations (this is to a "known good" host, and the RIPE Atlas vantage points may very well have a clueful-operator bias) stand, so interpret with care. Also: roughly 2/3 of these vantage points are behind NATs that may also have some firewall(ish) behaviour.

Hope this data point helps interpreting the magnitude of IPv4 fragmentation problems.

Emile Aben
RIPE NCC

[*] redid the 'size 1480' experiment because the first time around it had significantly fewer vantage points.
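The jump between 1480 and 1481 matches the arithmetic above: an ICMP packet size of 1480 plus the 20-byte IPv4 header is exactly 1500 bytes and still fits a 1500 MTU, while 1481 gives 1501 bytes and has to be fragmented. A rough single-path version of the same sweep from a Linux host, as a minimal sketch (iputils ping assumed; its -s counts ICMP data only, so -s 1472 is a 1500-byte IPv4 packet; 192.0.2.1 is a placeholder):

  # Loss that only appears once the IPv4 packet exceeds 1500 bytes on the
  # wire (data size 1473 and up) points at fragment handling rather than
  # plain size-correlated loss.
  for size in 100 700 1400 1472 1473 1480; do
      printf '%-5s ' "$size"
      ping -q -c 5 -W 2 -s "$size" 192.0.2.1 | grep 'packet loss'
  done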
Has the path MTU been measured for all vantage point pairs? Is it known to be 1500 or just the end-point MTUs? That could affect your results very differently.

Owen

On Aug 28, 2013, at 02:26 , Emile Aben <emile.aben@ripe.net> wrote:
On 28/08/2013 08:05, Tore Anderson wrote:
* Owen DeLong
On Aug 27, 2013, at 07:33 , Valdis.Kletnieks@vt.edu wrote:
Saku Ytti and Emile Aben have numbers that say otherwise. And there must be a significantly bigger percentage of failures than "pretty close to 0", or Path MTU Discovery wouldn't have a reputation of being next to useless.
No, their numbers describe what happens to single packets of differing sizes.
Nothing they did describes results of actually fragmented packets.
Yes, it did.
Hint: 1473 + 8 + 20
For Saku: yes. For me: that was my intention, but later I discovered the Atlas ping does include the ICMP header in its 'size' parameter, so what I did in effect was 1473 + 20 = 1493 (and not the 1501 I intended).
Redid the tests to a "known good" destination where I knew interface MTU (1500) and could tcpdump which confirmed that I was looking at fragmentation. I also took an offline recommendation to do different packet sizes to try to distinguish fragmentation issues from general corruption-based packet loss.
Results:
size  = ICMP packet size, add 20 for IPv4 packet size
fail% = % of vantage points where 5 packets were sent, 0 were received.

#size fail% vantage points
100    0.88 2963
300    0.77 3614
500    0.88 1133
700    1.07 3258
900    1.13 3614
1000   1.04 770
1100   2.04 3525
1200   1.91 3303
1300   1.76 681
1400   2.06 3014
1450   2.53 3597
1470   3.01 2192
1470   3.12 3592
1473   4.96 3566
1475   4.96 3387
1480   6.04 679
1480   4.93 3492 [*]
1481   9.86 3489
1482   9.81 3567
1483   9.94 3118
There is a ~5% difference going up from 1480 to 1481.
As to interpreting this: Leo Bicknell's observations (this is to a "known good" host, and the RIPE Atlas vantage points may very well have a clueful-operator bias) stand, so interpret with care. Also: roughly 2/3 of these vantage points are behind NATs that may also have some firewall(ish) behaviour.
Hope this data point helps interpreting the magnitude of IPv4 fragmentation problems.
Emile Aben RIPE NCC
[*] redid the 'size 1480' experiment because the first time around it had significantly fewer vantage points.
On 29/08/2013 04:22, Owen DeLong wrote:
Has the path MTU been measured for all vantage point pairs?
I didn't, but see http://www.nlnetlabs.nl/downloads/publications/pmtu-black-holes-msc-thesis.p... Fig 23 (page 24) for path MTU data from roughly a year ago (thanks Benno for posting that link). Emile
On Aug 27, 2013, at 12:34 AM, Owen DeLong <owen@delong.com> wrote:
If I send a packet out as a legitimate series of fragments, what is the chance that they will get dropped somewhere in the middle of the path between the emitting host and the receiving host?
To my thinking, the answer to that question is basically "pretty close to 0 and if that changes in the core, very bad things will happen."
I mostly agree. I will argue that the actual path of an IP datagram is end to end, so the question is not the core, but the end to end path.

That said, with today's congestion control algorithms, TCP does pretty badly with an other-than-negligible loss rate, so end to end, fragmented messages have a negligible probability of being dropped, so the probability of sending a message that is fragmented and having it arrive at the intended destination is a negligibly small probability smaller than the probability of sending an unfragmented message and having it arrive.

The primary argument against that is firewall behavior, in which firewalls are programmed to drop fragments with high probability.

If we had a protocol that sat atop IP and did what fragmentation does that we could expect all non-TCP/SCTP protocols to use, I would have a very different viewpoint. But, playing the ball where it lies, the primary change I would recommend would be to support any firewall rule that permitted dropping the first fragment of a fragmented datagram in which the first fragment did NOT include the entire IP header and the entire subsequent header, and expecting a host to keep a fragment of a datagram no more than some stated number of seconds (I might pick "two") with express permission to drop it more rapidly should the need arise. I would *not* support a rule that simply dropped fragments, or a protocol change that disallowed them.
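On the "keep a fragment no more than some stated number of seconds" point: Linux already exposes its IPv4 reassembly timeout as a sysctl, so a hedged illustration of what Fred's two-second figure would look like on a receiving host (just the knob, not a recommendation):

  # Current IPv4 fragment reassembly timeout in seconds (Linux default is 30).
  sysctl net.ipv4.ipfrag_time

  # Something like the two-second suggestion, applied to this host only;
  # overly short timeouts on slow or lossy paths cause reassembly failures.
  sysctl -w net.ipv4.ipfrag_time=2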
On Sep 1, 2013, at 23:11 , "Fred Baker (fred)" <fred@cisco.com> wrote:
On Aug 27, 2013, at 12:34 AM, Owen DeLong <owen@delong.com> wrote:
If I send a packet out as a legitimate series of fragments, what is the chance that they will get dropped somewhere in the middle of the path between the emitting host and the receiving host?
To my thinking, the answer to that question is basically "pretty close to 0 and if that changes in the core, very bad things will happen."
I mostly agree. I will argue that the actual path of an IP datagram is end to end, so the question is not the core, but the end to end path.
That said, with today's congestion control algorithms, TCP does pretty badly with an other-than-negligible loss rate, so end to end, fragmented messages have a negligible probability of being dropped, so the probability of sending a message that is fragmented and having it arrive at the intended destination is a negligibly small probability smaller than the probability of sending an unfragmented message and having it arrive.
Yes, the path is end-to-end and things happening near the end-points can be bad for a particular conversation. My point is that if somewhere in the core starts doing bad things to fragments on a regular basis, it will be very bad for massive numbers of users and not just the localized damage one would expect from something closer to the edge. Otherwise, we are saying the same thing.
The primary argument against that is firewall behavior, in which firewalls are programmed to drop fragments with high probability.
Which fortunately tend to be located at the edge and not in the core.
If we had a protocol that sat atop IP and did what fragmentation does that we could expect all non-TCP/SCTP protocols to use, I would have a very different viewpoint. But, playing the ball where it lies, the primary change I would recommend would be to support any firewall rule that permitted dropping the first fragment of a fragmented datagram in which the first fragment did NOT include the entire IP header and the entire subsequent header, and expecting a host to keep a fragment of a datagram no more than some stated number of seconds (I might pick "two") with express permission to drop it more rapidly should the need arise. I would *not* support a rule that simply dropped fragments, or a protocol change that disallowed them.
I think I mostly agree, but I'd need to think it through a bit more than I can at the moment. Owen
On (2013-08-27 00:01 +0000), Christopher Palmer wrote:
If anyone has any data or anecdotes, please feel free to send an off-list email or whatever.
[ytti@ytti.fi ~]% ssh ring ring-all -t90 ping -s 1473 -c2 -w3 ip.fi|pastebinit
http://p.ip.fi/KA7N
[ytti@sci ~]% curl -s http://p.ip.fi/KA7N|grep transmitted|wc -l
224
[ytti@sci ~]% curl -s http://p.ip.fi/KA7N|grep "0 received"|wc -l
10

UUOC wc, but that's how I roll.

224 vantage points, 10 failed.

--
  ++ytti
On 27/08/2013 08:55, Saku Ytti wrote:
On (2013-08-27 00:01 +0000), Christopher Palmer wrote:
If anyone has any data or anecdotes, please feel free to send an off-list email or whatever.
[ytti@ytti.fi ~]% ssh ring ring-all -t90 ping -s 1473 -c2 -w3 ip.fi|pastebinit http://p.ip.fi/KA7N
[ytti@sci ~]% curl -s http://p.ip.fi/KA7N|grep transmitted|wc -l
224
[ytti@sci ~]% curl -s http://p.ip.fi/KA7N|grep "0 received"|wc -l
10
UUOC wc, but that's how I roll.
224 vantage points, 10 failed.
Same tests from RIPE Atlas pings towards nl-ams-as3333.anchors.atlas.ripe.net today:

48 byte ping:   42 out of 3406 vantage points fail (1.0%)
1473 byte ping: 180 out of 3540 vantage points fail (5.1%)

Of the 180 vantage points that failed for the 1473 byte ping, 142 were successful in receiving at least 1 reply for the 48 byte ping.

Measurement IDs in RIPE Atlas are 1019675 and 1019676.

Emile Aben
RIPE NCC
On (2013-08-27 10:45 +0200), Emile Aben wrote:
224 vantage points, 10 failed.
48 byte ping:   42 out of 3406 vantage points fail (1.0%)
1473 byte ping: 180 out of 3540 vantage points fail (5.1%)
Nice, it's starting to almost sound like data rather than anecdote; both tests implicate 4-5% having fragmentation issues.

Much larger number than I intuitively had in mind.

--
  ++ytti
On Aug 27, 2013, at 6:24 AM, Saku Ytti <saku@ytti.fi> wrote:
On (2013-08-27 10:45 +0200), Emile Aben wrote:
224 vantage points, 10 failed.
48 byte ping: 42 out of 3406 vantage points fail (1.0%) 1473 byte ping: 180 out of 3540 vantage points fail (5.1%)
Nice, it's starting to almost sound like data rather than anecdote; both tests implicate 4-5% having fragmentation issues.
Much larger number than I intuitively had in mind.
I'm pretty sure the failure rate is higher, and here's why.

The #1 cause of fragments being dropped is firewalls. Too many admins configuring a firewall do not understand fragments or how to properly put them in the rules.

Where do firewalls exist? Typically protecting things with public IP space, that is (some) corporate networks and banks of content servers in data centers. This also includes on-box firewalls for Internet servers, ipfw or iptables on the server is just as likely to be part of the problem.

Now, where are RIPE probes? Most RIPE probes are probably either with somewhat clueful ISP operators, or at Internet Clueful engineer's personal connectivity (home, or perhaps a box in a colo). RIPE probes have already significantly self-selected for people who like non-broken connectivity. What's more, the ping test was probably to some "known good" host(s), rather than a broad selection of Internet hosts, so effectively it was only testing the probe end, not both ends.

Basically, I see RIPE probes as an almost best-case scenario for this sort of broken behavior. I bet the ISC Netalyzer folks have somewhat better data, perhaps skewed a bit towards broken connections as people run Netalyzer when their connection is broken! I suspect reality is somewhere between those two book ends.

--
Leo Bicknell - bicknell@ufp.org - CCIE 3440
    PGP keys at http://www.ufp.org/~bicknell/
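As a concrete illustration of the host-firewall problem Leo describes, a hedged iptables sketch of the usual foot-gun versus the usual stateful approach; note that whether the fragment match even sees fragments depends on whether connection tracking (which reassembles first) is loaded:

  # The classic mistake: silently drop every non-first fragment
  # (-f matches second and further fragments only).
  iptables -A INPUT -f -j DROP

  # With connection tracking loaded, the kernel reassembles fragments before
  # the filter rules run, so a stateful ruleset normally needs no special
  # fragment rule at all.
  iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT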
On 8/27/2013 10:04 AM, Leo Bicknell wrote:
On Aug 27, 2013, at 6:24 AM, Saku Ytti <saku@ytti.fi> wrote:
On (2013-08-27 10:45 +0200), Emile Aben wrote:
224 vantage points, 10 failed.
48 byte ping: 42 out of 3406 vantage points fail (1.0%) 1473 byte ping: 180 out of 3540 vantage points fail (5.1%)
Nice, it's starting to almost sound like data rather than anecdote, both tests implicate 4<5% having fragmentation issues.
Much larger number than I intuitively had in mind.
I'm pretty sure the failure rate is higher, and here's why.
The #1 cause of fragments being dropped is firewalls. Too many admins configuring a firewall do not understand fragments or how to properly put them in the rules.
Where do firewalls exist? Typically protecting things with public IP space, that is (some) corporate networks and banks of content servers in data centers. This also includes on-box firewalls for Internet servers, ipfw or iptables on the server is just as likely to be part of the problem.
It's not just firewalls.... border-routers are also apt to have ACLs like these[1]:

ip access-list extended BORDER-IN
 10 deny tcp any any fragments
 20 deny udp any any fragments
 30 deny icmp any any fragments
 40 deny ip any any fragments

I see these a *LOT* on customer routers, before the packets even get to the firewall....

Regards,
dtb

1. I found it most recently at http://hurricanelabs.com/blog/cisco-security-routers/ but I know there are many other "guides" that include these as part of their ACL.
On 8/27/13 4:04 PM, Leo Bicknell wrote:
I'm pretty sure the failure rate is higher, and here's why.
The #1 cause of fragments being dropped is firewalls. Too many admins configuring a firewall do not understand fragments or how to properly put them in the rules.
Where do firewalls exist? Typically protecting things with public IP space, that is (some) corporate networks and banks of content servers in data centers. This also includes on-box firewalls for Internet servers, ipfw or iptables on the server is just as likely to be part of the problem.
In a study using the RIPE Atlas probes, we have used a heuristic to figure out where the fragments were dropped. And from the Atlas probes where IP fragments did not arrive, there is a high likelihood the problem is with the last hop to the Atlas probe. All other situations are with the router just before the last hop. We did not find any problems in the core. Of course this was a rather limited study using the RIPE Atlas probes in a certain setting. See for the full report "Discovering Path MTU Black Holes on the Internet Using the RIPE Atlas", http://www.nlnetlabs.nl/downloads/publications/pmtu-black-holes-msc-thesis.p....
Now, where are RIPE probes? Most RIPE probes are probably either with somewhat clueful ISP operators, or at Internet Clueful engineer's personal connectivity (home, or perhaps a box in a colo). RIPE probes have already significantly self-selected for people who like non-broken connectivity. What's more, the ping test was probably to some "known good" host(s), rather than a broad selection of Internet hosts, so effectively it was only testing the probe end, not both ends.
With help from RIPE NCC (many thanks), we did measurements both ways.

Cheers,

-- Benno

--
Benno J. Overeinder
NLnet Labs
http://www.nlnetlabs.nl/
In a study using the RIPE Atlas probes, we have used a heuristic to figure out where the fragments were dropped. And from the Atlas probes where IP fragments did not arrive, there is a high likelihood the problem is with the last hop to the Atlas probe.
i wonder if this is correlated with the high number of probes being behind nats. randy
On 08/30/2013 01:58 PM, Randy Bush wrote:
In a study using the RIPE Atlas probes, we have used a heuristic to figure out where the fragments were dropped. And from the Atlas probes where IP fragments did not arrive, there is a high likelihood the problem is with the last hop to the Atlas probe.
i wonder if this is correlated with the high number of probes being behind nats.
That would be a viable explanation, although we have not tried to fingerprint the probes to figure out if this was true.

If we rerun the experiments in the future, we should spend more effort on identifying the router/middlebox that is giving the IP fragmentation problems (drops or blocking PMTUD ICMP).

-- Benno

--
Benno J. Overeinder
NLnet Labs
http://www.nlnetlabs.nl/
On 30/08/2013 16:36, Benno Overeinder wrote:
On 08/30/2013 01:58 PM, Randy Bush wrote:
In a study using the RIPE Atlas probes, we have used a heuristic to figure out where the fragments were dropped. And from the Atlas probes where IP fragments did not arrive, there is a high likelihood the problem is with the last hop to the Atlas probe.
i wonder if this is correlated with the high number of probes being behind nats.
That would be a viable explanation, although we have not tried to fingerprint the probes to figure out if this was true.
If we rerun the experiments in the future, we should spend more effort on identifying the router/middlebox that is giving the IP fragmentation problems (drops or blocking PMTUD ICMP).
Maybe this provides a bit of insight:
From a test last week from all RIPE Atlas probes to a single "known good" MTU 1500 host I compared probes where I had both a ping test with ipv4.len 1020 and ipv4.len 1502.

behind NAT probes: 12%   1020 bytes ping worked while 1502 failed
non-NATted probes:  6%   ""
hth, Emile Aben RIPE NCC
i wonder if this is correlated with the high number of probes being behind nats.
Maybe this provides a bit of insight:

From a test last week from all RIPE Atlas probes to a single "known good" MTU 1500 host I compared probes where I had both a ping test with ipv4.len 1020 and ipv4.len 1502.

behind NAT probes: 12%   1020 bytes ping worked while 1502 failed
non-NATted probes:  6%   ""
this needs publication on your adventure game of a web site, please. it will seriously 'inform' some discussion going back and forth on ietf lists. randy
On 31/08/2013 13:09, Randy Bush wrote:
i wonder if this is correlated with the high number of probes being behind nats.
Maybe this provides a bit of insight:

From a test last week from all RIPE Atlas probes to a single "known good" MTU 1500 host I compared probes where I had both a ping test with ipv4.len 1020 and ipv4.len 1502.

behind NAT probes: 12%   1020 bytes ping worked while 1502 failed
non-NATted probes:  6%   ""
this needs publication on your adventure game of a web site, please. it will seriously 'inform' some discussion going back and forth on ietf lists.
This is now published on RIPE Labs. For the adventurous: https://labs.ripe.net/Members/emileaben/ripe-atlas-packet-size-matters regards, Emile Aben RIPE NCC
this needs publication on your adventure game of a web site, please. it will seriously 'inform' some discussion going back and forth on ietf lists.
This is now published on RIPE Labs. For the adventurous: https://labs.ripe.net/Members/emileaben/ripe-atlas-packet-size-matters
some hours back, i posted the url to the ietf list arguing frag thanks a million randy
On 31/08/2013 13:13, Randy Bush wrote:
could you please test with ipv6?
This is what I see for various IPv6 payloads (large ICMPv6 echo requests) from all RIPE Atlas probes that were available at the time to a single "known good" MTU 1500 destination:

plen  fail%  nr_probes
100    9.64  1266
500    9.34  1039
1000   9.94  1298
1240   9.94  1308
1241  11.62  1300
1440  12.70  890
1441  14.70  1306
1460  15.18  1304
1461  19.84  1290
1462  22.02  1294

plen:  IPv6 payload length (ie. not including 40byte IPv6 header)
fail%: percentage of probes that didn't get any of the 5 pkts that were sent.

Note that there is a large baseline failure rate in IPv6 on RIPE Atlas probes [1], which would explain the ~10% failure rate for the smaller packets.

I plan to do more analysis and start writing this up on RIPE Labs over the next few days.

cheers,
Emile Aben
RIPE NCC

[1] https://labs.ripe.net/Members/stephane_bortzmeyer/how-many-atlas-probes-beli...
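The steps in that table line up with familiar MTU boundaries once the 40-byte IPv6 header is added back: plen 1241 is a 1281-byte packet (just over the 1280-byte IPv6 minimum MTU), 1441 is 1481 bytes (just over the 1480 MTU common on 6in4 tunnels), and 1461 is 1501 bytes (just over Ethernet's 1500). A minimal single-path check from a Linux host (recent iputils assumed, where "ping -6" works; older systems spell it ping6; 2001:db8::1 is a placeholder):

  # In IPv6 only the sender fragments. Send the largest echo that fits the
  # local 1500 MTU with fragmentation forbidden:
  # 1452 data + 8 ICMPv6 header + 40 IPv6 header = 1500 bytes.
  # Replies mean the path is 1500 clean; an ICMPv6 "Packet Too Big" means
  # PMTUD works across a smaller-MTU hop; silence suggests the PTB is filtered.
  ping -6 -c 3 -M do -s 1452 2001:db8::1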
I know I'm digging up an old thread here, but I've spent some time analyzing some of the significant changes that Apple has made to the FaceTime protocol, apparently with a huge focus on IP packet size to avoid fragmentation issues:

http://blog.krisk.org/2013/09/apples-new-facetime-sip-perspective.html

I'm betting they've had HUGE issues with IP+UDP MTU issues over the last three years...

On Sun, Sep 1, 2013 at 4:34 PM, Emile Aben <emile.aben@ripe.net> wrote:
On 31/08/2013 13:13, Randy Bush wrote:
could you please test with ipv6?
This is what I see for various IPv6 payloads (large ICMPv6 echo requests) from all RIPE Atlas probes that were available at the time to a single "known good" MTU 1500 destination:
plen  fail%  nr_probes
100    9.64  1266
500    9.34  1039
1000   9.94  1298
1240   9.94  1308
1241  11.62  1300
1440  12.70  890
1441  14.70  1306
1460  15.18  1304
1461  19.84  1290
1462  22.02  1294
plen: IPv6 payload length (ie. not including 40byte IPv6 header) fail%: percentage of probes that didn't get any of the 5 pkts that were sent. Note that there is a large baseline failure rate in IPv6 on RIPE Atlas probes [1], which would explain the ~10% failure rate for the smaller packets.
I plan to do more analysis and start writing this up on RIPE Labs over the next few days.
cheers, Emile Aben RIPE NCC
[1] https://labs.ripe.net/Members/stephane_bortzmeyer/how-many-atlas-probes-beli...
-- Kristian Kielhofner
Christopher Palmer <Christopher.Palmer@microsoft.com> wrote:
What is the probability that a random path between two Internet hosts will traverse a middlebox that drops or otherwise barfs on fragmented IPv4 packets?
This question is important for large EDNS packets so you'll find some recent practical investigations from the perspective of people interested in DNSSEC. For instance, a couple of presentations from Roland van Rijswijk:

https://ripe64.ripe.net/presentations/91-20120418_-_RIPE64_-_Ljubljana_-_DNS...
http://toronto45.icann.org/meetings/toronto2012/presentation-dnssec-fragment...

Tony.
--
f.anthony.n.finch <dot@dotat.at> http://dotat.at/
Forties, Cromarty: East, veering southeast, 4 or 5, occasionally 6 at first. Rough, becoming slight or moderate. Showers, rain at first. Moderate or good, occasionally poor at first.
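For readers who want to reproduce the DNS side of this, the usual check is to request a DNSSEC-signed answer with a large EDNS buffer over UDP and compare with TCP. A minimal dig sketch; the org zone is only an example whose signed DNSKEY response has historically been well over 1500 bytes, so adjust to taste:

  # Advertise a 4096-byte EDNS buffer so the server answers over UDP rather
  # than truncating; +ignore stops dig from quietly retrying over TCP.
  dig +dnssec +bufsize=4096 +ignore DNSKEY org

  # If the UDP query times out but TCP works, fragment or PMTUD trouble on
  # the UDP path is the prime suspect.
  dig +dnssec +tcp DNSKEY org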
Christopher Palmer <Christopher.Palmer@microsoft.com> wrote:
>
> What is the probability that a random path between two Internet hosts
> will traverse a middlebox that drops or otherwise barfs on fragmented
> IPv4 packets?

This question is important for large EDNS packets so you'll find some recent practical investigations from the perspective of people interested in DNSSEC. For instance, a couple of presentations from Roland van Rijswijk:

https://ripe64.ripe.net/presentations/91-20120418_-_RIPE64_-_Ljubljana_-_DNS...
http://toronto45.icann.org/meetings/toronto2012/presentation-dnssec-fragment...

Related to this and maybe of interest is the following blog post: <https://www.nlnetlabs.nl/blog/2013/06/04/pmtud4dns/>.

        jaap
On Mon, Aug 26, 2013 at 8:01 PM, Christopher Palmer <Christopher.Palmer@microsoft.com> wrote:
What is the probability that a random path between two Internet hosts will traverse a middlebox that drops or otherwise barfs on fragmented IPv4 packets?
Hi Christopher,

I think there might be three rather different questions here:

1. If I originate IP packet fragments, such as an 8000 byte NFS packet broken into 1500 byte fragments, what's the probability of some host before the other endpoint dropping one or all of those fragments?

2. If I send an IP packet that's too large for the path and *don't* set the don't-fragment bit, what's the chance that the router with the too-small next hop will fail to correctly fragment that packet (or that the correctly fragmented packet will fall into trap #1 above)?

3. If I send an IP packet that's too large for the path and *do* set the don't-fragment bit, what's the chance of failing to receive the "packet too big" message it causes the intermediate router to send?

Are you after the answer to one in particular?

Regards,
Bill Herrin

--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
This is what I'm concerned about:

"""
1. If I originate IP packet fragments, such as an 8000 byte NFS packet broken into 1500 byte fragments, what's the probability of some host before the other endpoint dropping one or all of those fragments?
"""

Big thanks to everyone who has sent thoughts already, really quite helpful.

-----Original Message-----
From: wherrin@gmail.com [mailto:wherrin@gmail.com] On Behalf Of William Herrin
Sent: Tuesday, August 27, 2013 10:45 AM
To: Christopher Palmer
Cc: North American Network Operators' Group
Subject: Re: IP Fragmentation - Not reliable over the Internet?

On Mon, Aug 26, 2013 at 8:01 PM, Christopher Palmer <Christopher.Palmer@microsoft.com> wrote:
What is the probability that a random path between two Internet hosts will traverse a middlebox that drops or otherwise barfs on fragmented IPv4 packets?
Hi Christopher,

I think there might be three rather different questions here:

1. If I originate IP packet fragments, such as an 8000 byte NFS packet broken into 1500 byte fragments, what's the probability of some host before the other endpoint dropping one or all of those fragments?

2. If I send an IP packet that's too large for the path and *don't* set the don't-fragment bit, what's the chance that the router with the too-small next hop will fail to correctly fragment that packet (or that the correctly fragmented packet will fall into trap #1 above)?

3. If I send an IP packet that's too large for the path and *do* set the don't-fragment bit, what's the chance of failing to receive the "packet too big" message it causes the intermediate router to send?

Are you after the answer to one in particular?

Regards,
Bill Herrin

--
William D. Herrin ................ herrin@dirtside.com  bill@herrin.us
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004
In message <a708ea6a03eb4ca7a14f5b16e4ce8dda@BN1PR03MB171.namprd03.prod.outlook.com>, Christopher Palmer writes:
This is what I'm concerned about:
""" 1. If I originate IP packet fragments, such as an 8000 byte NFS packet broken into 1500 byte fragments, what's the probability of some host before the other endpoint dropping one or all of those fragments? """
For wide area NFS I would be using TCP not UDP. If you can't use TCP you should ensure that the firewalls at both ends pass fragmented UDP packets. NFS is generally not open to the world so fragmentation and NFS is essentially a local issue. Fragments don't get routinely dropped in the core.

Ensure that the firewalls at both ends pass ICMP/ICMPv6 PTB. Only idiots block all ICMP/ICMPv6. Yes there are a lot of idiots in the world.
Big thanks to everyone who has sent thoughts already, really quite helpful.
-----Original Message----- From: wherrin@gmail.com [mailto:wherrin@gmail.com] On Behalf Of William Herrin Sent: Tuesday, August 27, 2013 10:45 AM To: Christopher Palmer Cc: North American Network Operators' Group Subject: Re: IP Fragmentation - Not reliable over the Internet?
On Mon, Aug 26, 2013 at 8:01 PM, Christopher Palmer <Christopher.Palmer@microsoft.com> wrote:
What is the probability that a random path between two Internet hosts will traverse a middlebox that drops or otherwise barfs on fragmented IPv4 packets?
Hi Christopher,
I think there might be three rather different questions here:
1. If I originate IP packet fragments, such as an 8000 byte NFS packet broken into 1500 byte fragments, what's the probability of some host before the other endpoint dropping one or all of those fragments?
2. If I send an IP packet that's too large for the path and *don't* set the don't-fragment bit, what's the chance that the router with the too-small next hop will fail to correctly fragment that packet (or that the correctly fragmented packet will fall into trap #1 above)?
3. If I send an IP packet that's too large for the path and *do* set the don't-fragment bit, what's the chance of failing to receive the "packet too big" message it causes the intermediate router to send?
Are you after the answer to one in particular?
Regards, Bill Herrin
-- William D. Herrin ................ herrin@dirtside.com bill@herrin.us 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/> Falls Church, VA 22042-3004
-- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org
On Aug 29, 2013, at 18:15 , Mark Andrews <marka@isc.org> wrote:
In message <a708ea6a03eb4ca7a14f5b16e4ce8dda@BN1PR03MB171.namprd03.prod.outlook .com>, Christopher Palmer writes:
This is what I'm concerned about:
""" 1. If I originate IP packet fragments, such as an 8000 byte NFS packet broken into 1500 byte fragments, what's the probability of some host before the other endpoint dropping one or all of those fragments? """
For wide area NFS I would be using TCP not UDP. If you can't use TCP you should ensure that the firewalls at both ends pass fragmented UDP packet. NFS is generally not open to the world so fragmentation and NFS is essentially a local issue. Fragments don't get routinely dropped in the core.
However, passing fragmented UDP packets has its own (undesirable) set of security implications. Of course, running NFS over an unencrypted path in the wild is, well, something with an additional (undesirable) set of security implications. (IOW, this should be happening inside a VPN)
Ensure that the firewalls at both ends pass ICMP/ICMPv6 PTB. Only idiots block all ICMP/ICMPv6. Yes there are a lot of idiots in the world.
+1 This cannot be stressed enough. Owen
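A minimal sketch of what "pass ICMP/ICMPv6 PTB" can look like on a Linux host firewall (iptables/ip6tables assumed; border gear has its own syntax):

  # IPv4: allow "fragmentation needed and DF set" (ICMP type 3, code 4)
  # so Path MTU Discovery can work.
  iptables -A INPUT -p icmp --icmp-type fragmentation-needed -j ACCEPT

  # IPv6: Packet Too Big (ICMPv6 type 2) is not optional on a working host.
  ip6tables -A INPUT -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT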
Mark Andrews wrote:
Ensure that the firewalls at both ends pass ICMP/ICMPv6 PTB. Only idiots block all ICMP/ICMPv6. Yes there are a lot of idiots in the world.
The worst idiots are people who designed ICMPv6 [RFC2463] as:

   (e.2) a packet destined to an IPv6 multicast address (there are two
   exceptions to this rule: (1) the Packet Too Big Message - Section 3.2
   - to allow Path MTU discovery to work for IPv6 multicast, and (2) the
   Parameter Problem Message, Code 2 - Section 3.4 - reporting an
   unrecognized IPv6 option that has the Option Type highest-order two
   bits set to 10), or

which makes it necessary, unless you are idiots, to filter ICMPv6 PTB against certain packets, including but not limited to, multicast ones.

Masataka Ohta
participants (18)

- Benno Overeinder
- Blake Dunlap
- Christopher Palmer
- Dave Brockman
- Emile Aben
- Fred Baker (fred)
- Jaap Akkerhuis
- Kristian Kielhofner
- Leo Bicknell
- Mark Andrews
- Masataka Ohta
- Owen DeLong
- Randy Bush
- Saku Ytti
- Tony Finch
- Tore Anderson
- Valdis.Kletnieks@vt.edu
- William Herrin