level3.net in Chicago - high packet loss?!?

Network Fortius

6 Sep 2005

2:58 a.m.

Anybody having any idea why such a high packet loss on level3's network, in Chicago?

Stef:~ scm$ mtr -r www.yahoo.com
...
tbr1-p010802.cgcil.ip.att.net          0%   16   16   15.12   24.21   49.26
ggr2-p310.cgcil.ip.att.net             0%   16   16   13.18   42.66  118.99
so-1-1-0.edge1.chicago1.level3.net     0%   16   16   14.48   35.84  126.48
so-2-1-0.bbr1.chicago1.level3.net     63%    6   16   14.44   43.74   79.97
                                   ^^^^^^^
as-1-0.bbr2.sanjose1.level3.net        0%   16   16   61.95   80.64  176.01
ge-10-2.ipcolo3.sanjose1.level3.net    0%   16   16   63.37   95.61  148.46
unknown.level3.net                     0%   16   16   63.34   86.46  168.62
unknown-66-218-82-217.yahoo.com        0%   16   16   62.09   88.91  127.58
p4.www.scd.yahoo.com                   0%   16   16   64.51   89.96  183.79

TIA,
Stef
Network Fortius, LLC

Robert E.Seastrom

6 Sep

12:45 p.m.

Network Fortius <netfortius@gmail.com> writes:

...

Anybody having any idea why such a high packet loss on level3's network, in Chicago?

End-user misinterpreting output from MTR. This network does not appear to have any packet loss end-to-end.

---Rob


Network Fortius

2:25 p.m.

And how exactly would you interpret the number returned by net_loss (int), in a column called "LOSS", in reference to reachability of a "hop" between two end points:

int net_loss(int at)
{
    if ((host[at].xmit - host[at].transit) == 0)
        return 0;
    /* times extra 1000 */
    return 1000*(100 - (100.0 * host[at].returned / (host[at].xmit - host[at].transit)) );
}

?

Thanks,
Stef
Network Fortius, LLC

On Sep 6, 2005, at 7:45 AM, Robert E.Seastrom wrote:

...

Network Fortius <netfortius@gmail.com> writes:

...
Anybody having any idea why such a high packet loss on level3's network, in Chicago?

End-user misinterpreting output from MTR. This network does not appear to have any packet loss end-to-end.

---Rob


Joe Maimon

2:35 p.m.

If the hop(s) following the one you see loss for shows no loss, then disregard the loss for that hop, obviously whatever it is, it does not affect transit, which is what you really want to know.

Is that correct?

Network Fortius wrote:

...

And how exactly would you interpret the number returned by net_loss (int), in a column called "LOSS", in reference to reachability of a "hop" between two end points:

int net_loss(int at)
{
    if ((host[at].xmit - host[at].transit) == 0)
        return 0;
    /* times extra 1000 */
    return 1000*(100 - (100.0 * host[at].returned / (host[at].xmit - host[at].transit)) );
}

?

Thanks, Stef Network Fortius, LLC

On Sep 6, 2005, at 7:45 AM, Robert E.Seastrom wrote:

...
Network Fortius <netfortius@gmail.com> writes:

...
Anybody having any idea why such a high packet loss on level3's network, in Chicago?

End-user misinterpreting output from MTR. This network does not appear to have any packet loss end-to-end.

---Rob


chip

2:52 p.m.

On 9/6/05, Joe Maimon <jmaimon@ttec.com> wrote:

...

If the hop(s) following the one you see loss for shows no loss, then disregard the loss for that hop, obviously whatever it is, it does not affect transit, which is what you really want to know.

Is that correct?

This is one of the most misunderstood concepts in properly reading output from a traceroute (mtr, visualtraceroute, whatever). Basically you are seeing loss of packets destined directly *TO* that router, not THRU it. Most often this is caused by 1) the router having ratelimits applied to these packets so as not to bog down the CPU while it's trying to perform its main function...forwarding packets or 2) the router is already busy and places a low priority on responding to those packets so as to leave CPU available for forwarding packets.

You can see from the trace that hops after that don't show any loss. If that router was actually causing loss then you would see the loss continue thru the rest of the trace. Since you don't, you can assume that the router is experiencing one of the cases above. Of course there are always exceptions but 99.9% of the time this is the case.

This same concept applies to latency as well. If you see only a single hop with a high response time and everything afterwards is normal, it's the same situation but it's taking the router a longer time to respond to you rather than it ignoring you. You can test this by simply pinging the end destination...do you see the same loss and/or high latency? If not, you can disregard it.

And while we're on the subject of reading this output, remember that traces only show you the forward path, not the reverse. Thanks to the wonders of asymmetric routing, at times it could be the return path that actually has the loss on it; the loss in the forward path only gives you an idea of where to begin troubleshooting.

--chip

-- 
Just my $.02, your mileage may vary, batteries not included, etc....
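The rule of thumb above (loss at one hop only matters if it carries through the rest of the trace) can be sketched as a small check in C. This is only an illustration under assumptions: the function name and the array of per-hop loss percentages are hypothetical, not part of mtr, and the sample values in the usage note are the loss column from the trace at the top of the thread.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of mtr.
 * loss_pct[i] is the loss percentage reported for hop i of a trace.
 * Returns 1 if the loss seen at hop `at` is also present at every later
 * hop (which would suggest real forwarding loss at or after that hop),
 * and 0 if any later hop answers without loss (which suggests the hop is
 * merely slow or rate-limited in answering the probes itself).
 * Note: for the final hop there are no later hops, so any loss there is
 * reported as persisting; that value is the end-to-end figure anyway.
 */
int loss_persists_downstream(const double *loss_pct, size_t hops, size_t at)
{
    if (at >= hops || loss_pct[at] <= 0.0)
        return 0;                       /* nothing to explain at this hop */

    for (size_t i = at + 1; i < hops; i++) {
        if (loss_pct[i] <= 0.0)
            return 0;                   /* a later hop is clean */
    }
    return 1;
}
```

With the loss column from the original trace, `{0, 0, 0, 63, 0, 0, 0, 0, 0}`, calling the helper for index 3 returns 0, which matches the reading given in the replies: the 63% does not show up end-to-end. As noted above, there are exceptions, so a check like this is a starting point for troubleshooting rather than a verdict.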

Christopher L. Morrow

5:09 p.m.

On Tue, 6 Sep 2005, chip wrote:

...

On 9/6/05, Joe Maimon <jmaimon@ttec.com> wrote:

...
If the hop(s) following the one you see loss for shows no loss, then disregard the loss for that hop, obviously whatever it is, it does not affect transit, which is what you really want to know.

Is that correct?

This is one of the most misunderstood concepts in properly reading output from a traceroute (mtr, visualtraceroute, whatever). Basically you are seeing loss of packets destined directly *TO* that router, not THRU it. Most

no... not destined TO the router, destined THROUGH the router that happen to TTL=0 ON that router.

which is also misunderstood by just about everyone :( but anyway... 'not affecting transit' for reasons cited by yourself and min and adam already, yes.
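A tiny model of the point being made here, as a hedged sketch in C: every probe is addressed to the final destination, and the only thing that varies is the starting TTL, which decides which router on the way ends up discarding it and (if it chooses to) reporting back. The function name and the path length are hypothetical, used for illustration only.

```c
/*
 * Hypothetical model of one traceroute/mtr probe.
 * The probe is addressed to the far-end host; `routers_on_path` routers
 * sit between the sender and that host. Each router decrements the TTL
 * before forwarding and discards the packet when the TTL reaches zero.
 *
 * Returns the 1-based index of the router at which the probe expires,
 * or 0 if the probe makes it all the way to the destination.
 */
int expiring_hop(int ttl, int routers_on_path)
{
    for (int hop = 1; hop <= routers_on_path; hop++) {
        ttl--;                  /* router decrements the TTL */
        if (ttl <= 0)
            return hop;         /* this router drops the packet */
    }
    return 0;                   /* TTL was large enough to reach the host */
}
```

On an assumed path of 8 routers, a probe sent with TTL 4 expires at router 4, while one sent with TTL 9 reaches the destination. A "loss" figure for hop 4 therefore counts probes that expired at router 4 without a reply coming back; it says nothing by itself about packets that router forwarded.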

sdb＠stewartb.com

8:08 p.m.

On Tue, 6 Sep 2005, Christopher L. Morrow wrote:

...

On Tue, 6 Sep 2005, chip wrote:

...
On 9/6/05, Joe Maimon <jmaimon@ttec.com> wrote:

...
If the hop(s) following the one you see loss for shows no loss, then disregard the loss for that hop, obviously whatever it is, it does not affect transit, which is what you really want to know.

Is that correct?

This is one of the most misunderstood concepts in properly reading output from a traceroute (mtr, visualtraceroute, whatever). Basically you are seeing loss of packets destined directly *TO* that router, not THRU it. Most

no... not destined TO the router, destined THROUGH the router that happen to TTL=0 ON that router.

Very true. Most backbone kit on a tier 1 network is designed to switch packets in a distributed fashion, shifting packets between ports/cards over a backplane of some sort. On such kit, generating things such as a TTL-exceeded packet is usually punted to a central processor (whose primary task is to build route tables to hand off to the cards), which deals with the task in a much slower and much lower priority way than packets which transit the routing device. You also don't want your central processor to have to deal with too much of this sort of thing, which is (at least one of the reasons) why it's often rate limited.

...

which is also misunderstood by just about everyone :( but anyway... 'not affecting transit' for reasons cited by yourself and min and adam already, yes.

Agreed.

SB

-- 
Stewart Bamford (Posting as an individual)
Level3 Snr IP Engineer
*** Views expressed are my own and not necessarily those of Level3 ***
Primary email stewart@whoever.com
Secondary email me@stewartb.com
Personal website http://www.stewartb.com/
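As a rough illustration of the rate limiting described above, here is a minimal token-bucket sketch in C. It is an assumption-laden model, not any vendor's implementation: the function name, the rate, and the burst size are all made up for the example, and nothing here reflects how Level3's routers were actually configured.

```c
/*
 * Hypothetical token-bucket model of a router that limits how many
 * TTL-exceeded replies its central processor will generate.
 *
 * arrival_ms[]  times (in milliseconds, non-decreasing) at which expiring
 *               probes reach the router, from all sources combined
 * rate_per_sec  replies the limiter allows per second on average
 * burst         replies allowed back-to-back when the bucket is full
 *
 * Returns how many of the n probes get a reply. Forwarded (transit)
 * traffic never touches this limiter, so it is unaffected.
 */
int replies_allowed(const long *arrival_ms, int n, long rate_per_sec, long burst)
{
    const long full = burst * 1000;     /* 1000 units = one reply */
    long tokens = full;                 /* bucket starts full */
    long last_ms = (n > 0) ? arrival_ms[0] : 0;
    int answered = 0;

    for (int i = 0; i < n; i++) {
        /* refill: elapsed ms * replies/s gives thousandths of a reply */
        tokens += (arrival_ms[i] - last_ms) * rate_per_sec;
        if (tokens > full)
            tokens = full;
        last_ms = arrival_ms[i];

        if (tokens >= 1000) {           /* enough credit for one reply */
            tokens -= 1000;
            answered++;
        }
    }
    return answered;
}
```

With an assumed limiter of 1 reply per second and a burst of 6, sixteen expiring probes that arrive in the same millisecond get 6 replies, which a tool would report as roughly 63% loss at that hop even though no transit packet was dropped. Sixteen probes spaced one second apart would all be answered under the same settings.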

Christopher L. Morrow

7 Sep

2:01 a.m.

On Tue, 6 Sep 2005 sdb@stewartb.com wrote:

...

On Tue, 6 Sep 2005, Christopher L. Morrow wrote:

...
On Tue, 6 Sep 2005, chip wrote:

...
On 9/6/05, Joe Maimon <jmaimon@ttec.com> wrote:

...
If the hop(s) following the one you see loss for shows no loss, then disregard the loss for that hop, obviously whatever it is, it does not affect transit, which is what you really want to know.

Is that correct?

This is one of the most misunderstood concepts in properly reading output from a traceroute (mtr, visualtraceroute, whatever). Basically you are seeing loss of packets destined directly *TO* that router, not THRU it. Most

no... not destined TO the router, destined THROUGH the router that happen to TTL=0 ON that router.

Very true. Most backbone kit on a tier 1 network is designed to switch

I was really just pointing out that 'traceroute' or 'mtr' send packets with increasing TTL to show 'loss' or 'delay' from place to place, I wasn't trying to debate the ever-changing reasons why backbone equipment might or might not answer 'ttl-expired' or 'unreachable' (or any 'exception traffic' really) in a 'timely' fashion. That issue changes with the wind/os/hardware/model.... :)

nice to see L3 sending in the answer police though :) Thanks!

-Chris

sdb＠stewartb.com

11:46 a.m.

On Wed, 7 Sep 2005, Christopher L. Morrow wrote:

...

On Tue, 6 Sep 2005 sdb@stewartb.com wrote:

...
On Tue, 6 Sep 2005, Christopher L. Morrow wrote:

...
On Tue, 6 Sep 2005, chip wrote:

...
On 9/6/05, Joe Maimon <jmaimon@ttec.com> wrote:

...
If the hop(s) following the one you see loss for shows no loss, then disregard the loss for that hop, obviously whatever it is, it does not affect transit, which is what you really want to know.

Is that correct?

This is one of the most misunderstood concepts in properly reading output from a traceroute (mtr, visualtraceroute, whatever). Basically you are seeing loss of packets destined directly *TO* that router, not THRU it. Most

no... not destined TO the router, destined THROUGH the router that happen to TTL=0 ON that router.

Very true. Most backbone kit on a tier 1 network is designed to switch

I was really just pointing out that 'traceroute' or 'mtr' send packets with increasing TTL to show 'loss' or 'delay' from place to place, I wasn't trying to debate the ever-changing reasons why backbone equipment might or might not answer 'ttl-expired' or 'unreachable' (or any 'exception traffic' really) in a 'timely' fashion. That issue changes with the wind/os/hardware/model.... :)

Yeah, it was a sweeping generalisation, hence the excessive use of words such as "usually" and "most" :) I was trying to put the point across as to why things are like this, for those that might be wondering why. The main point was actually that the ability of a device (router, web server etc) to deal with stuff _like_ ICMP message generation does not reflect its ability to perform its main task.

...

nice to see L3 sending in the answer police though :) Thanks!

Thanks :)

SB

-- 
Stewart Bamford (Posting as an individual)
Level3 Snr IP Engineer
*** Views expressed are my own and not necessarily those of Level3 ***
Primary email stewart@whoever.com
Secondary email me@stewartb.com
Personal website http://www.stewartb.com/

Steven M. Bellovin

6 Sep

2:36 p.m.

In message <38D1AE7F-52FA-4BCF-A32F-CF40FA958DC6@gmail.com>, Network Fortius writes:

...

And how exactly would you interpret the number returned by net_loss (int), in a column called "LOSS", in reference to reachability of a "hop" between two end points:

The hop not sending out the ICMP message for those packets, either because its CPU is overloaded or because of a configuration option to rate-limit them.

--Steven M. Bellovin, http://www.cs.columbia.edu/~smb
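To see what the quoted net_loss() arithmetic yields for the hop in question, here is a self-contained sketch in C. The struct and function names are hypothetical stand-ins for mtr's internal host[] table, the meaning of `transit` (probes still in flight) is an assumption, and the numbers come from reading the trace columns as loss, received, sent, which is also an assumption (though 6 of 16 does match the 63% shown).

```c
/*
 * Hypothetical stand-in for one entry of the host[] table used by the
 * quoted net_loss(); field names are copied from that snippet.
 */
struct hop_counters {
    int xmit;       /* probes sent whose TTL expires at this hop */
    int returned;   /* replies that came back from this hop */
    int transit;    /* probes not yet accounted for (assumed meaning) */
};

/*
 * Same arithmetic as the quoted net_loss(): the percentage of completed
 * probes that got no reply, multiplied by 1000.
 */
int loss_milli_percent(struct hop_counters h)
{
    int completed = h.xmit - h.transit;

    if (completed == 0)
        return 0;
    /* times extra 1000 */
    return (int)(1000 * (100 - (100.0 * h.returned / completed)));
}
```

For so-2-1-0.bbr1.chicago1.level3.net, 16 sent and 6 returned with nothing in flight gives 1000 * (100 - 37.5) = 62500, i.e. 62.5%, shown rounded as 63%. The value counts replies that this hop did not send back; on its own it does not distinguish between a router that dropped forwarded traffic and one that simply declined to answer.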

Network Fortius

3:09 p.m.

On Sep 6, 2005, at 9:36 AM, Steven M. Bellovin wrote:

...

In message <38D1AE7F-52FA-4BCF-A32F-CF40FA958DC6@gmail.com>, Network Fortius writes:

...
And how exactly would you interpret the number returned by net_loss (int), in a column called "LOSS", in reference to reachability of a "hop" between two end points:

The hop not sending out the ICMP message for those packets, either because its CPU is overloaded or because of a configuration option to rate-limit them.

--Steven M. Bellovin, http://www.cs.columbia.edu/~smb

Thank you. The former seems close to what may have happened, with a possible impact beyond ICMP, as once having moved my client over to their Broadwing connection, their processing from Yahoo's site seems to have come back to where it was a few days ago. Thanks again, Stef Network Fortius, LLC

Alexander Koch

3:16 p.m.

On Tue, 6 September 2005 10:09:17 -0500, Network Fortius wrote: [..]

...

Thank you. The former seems close to what may have happened, with a possible impact beyond ICMP, as once having moved my client over to their Broadwing connection, their processing from Yahoo's site seems to have come back to where it was a few days ago.

Oh, yes. Is it just me, or did this read as: "Bah! But I was right anyway, all you suckers!" ? Go on, amuse the crowd. Alexander

Network Fortius

3:35 p.m.

On Sep 6, 2005, at 10:16 AM, Alexander Koch wrote:

...

On Tue, 6 September 2005 10:09:17 -0500, Network Fortius wrote: [..]

...
Thank you. The former seems close to what may have happened, with a possible impact beyond ICMP, as once having moved my client over to their Broadwing connection, their processing from Yahoo's site seems to have come back to where it was a few days ago.

Oh, yes. Is it just me, or did this read as: "Bah! But I was right anyway, all you suckers!"

?

Go on, amuse the crowd.

Alexander

I am sorry, if you interpret it this way. I do not have much choice, when it comes to servicing people asking for immediate resolution, so it is either trying to determine (via the wrong tool, in this case?!?) if there is something going on, during Labor Day, when the client still accepts "toying around", or the knee-jerk reaction to move them first thing Tuesday morning, then trying to philosophize around the problem. Again - please accept my apologies - it must have been just a coincidence that my lack of properly interpreting the tool output, combined with something actually having happened on the client's side, led to the wrong assumption that things were wrong in a place that the tool's output should not have been indicative of. Stef Network Fortius, LLC

Scott Altman

5:09 p.m.

Apparently it's just you. Nice job on taking an on-topic discussion and then flaming the original poster who (based on my personal subjective read of the responses) was genuinely asking for assistance and appears to have gleaned some value from this thread and got her client's service restored.

Consider the value of your post prior to doing it. For those asking me to do the same, the value is to bring more attention to the lack of moderated influence on nanog. While recognizing it is nobody's full-time job to baby-sit this forum, I find that posts from the steering committee are generally heeded.

Further, attacking those looking for help in no way aids the network community as a whole and you present yourself as elitist. People may be misguided and may be off topic, but helping them get where they are trying to go is our job on many levels.

My apologies for providing troll food.

- Scott

On 9/6/05, Alexander Koch <koch@tiscali.net> wrote:

...

Oh, yes. Is it just me, or did this read as: "Bah! But I was right anyway, all you suckers!"

?

Go on, amuse the crowd.

Alexander

Adam Rothschild

2:39 p.m.

On 2005-09-06-10:25:28, Network Fortius <netfortius@gmail.com> wrote:

...

And how exactly would you interpret the number returned by net_loss (int), in a column called "LOSS", in reference to reachability of a "hop" between two end points [...]

I'd interpret it to mean you're hitting a control plane policer or somesuch, with no actual bearing on end-to-end performance, judging from the diagnostic output you've graciously provided us with. I find myself giving this lecture several times a week to random "gamer" customers upset that intermediary routers don't reply to their pings at full line rate; I'd expect slightly better critical thinking skills from the posters on this list, but I've been wrong before. :) -a

Jay R. Ashworth

4:53 p.m.

On Tue, Sep 06, 2005 at 10:39:12AM -0400, Adam Rothschild wrote:

...

On 2005-09-06-10:25:28, Network Fortius <netfortius@gmail.com> wrote:

...
And how exactly would you interpret the number returned by net_loss (int), in a column called "LOSS", in reference to reachability of a "hop" between two end points [...]

I'd interpret it to mean you're hitting a control plane policer or somesuch, with no actual bearing on end-to-end performance, judging from the diagnostic output you've graciously provided us with.

I find myself giving this lecture several times a week to random "gamer" customers upset that intermediary routers don't reply to their pings at full line rate; I'd expect slightly better critical thinking skills from the posters on this list, but I've been wrong before. :)

And yet, his client had a problem, with that link, and did not have a problem with some other link, which, presumably, did *not* show that indication.

Correlation does not imply causation, given, but it's certainly a datapoint.

Best Practices of wide-area diagnosis, anyone?

Cheers,
-- jra

-- 
Jay R. Ashworth                                    jra@baylink.com
Designer              +-Internetworking------+----------+   RFC 2100
Ashworth & Associates |  Best Practices Wiki |          |   '87 e24
St Petersburg FL USA  http://bestpractices.wikicities.com   +1 727 647 1274

If you can read this... thank a system administrator. Or two. --me

andrew2＠one.net

5:16 p.m.

owner-nanog@merit.edu wrote:

...

Best Practices of wide-area diagnosis, anyone?

I'd be interested in a discussion of this as well. To answer a slightly different question, I usually point the "ping and traceroute" geeks to Karl's wonderful treatise on the subject: http://www.iwl.com/Resources/Papers/icmp-echo_print.html. Andrew Cruse

Jared Mauch

8:26 p.m.

On Tue, Sep 06, 2005 at 01:16:59PM -0400, andrew2@one.net wrote:

...

owner-nanog@merit.edu wrote:

...
Best Practices of wide-area diagnosis, anyone?

I'd be interested in a discussion of this as well. To answer a slightly different question, I usually point the "ping and traceroute" geeks to Karl's wonderful treatise on the subject: http://www.iwl.com/Resources/Papers/icmp-echo_print.html.

i've found it useful to use a simple udp probe tool to test networks in the past. You can test end-to-end loss and get something reasonable.

The following expects you to know:

1) GCC/Makefiles
2) how to ensure you link in your resolver and socket/nsl functions
3) tweak your cpu compile options for your host..

but..

ftp://puck.nether.net/pub/jared/rtt-0.12.tar.gz

If your clocks are accurately synced, you can even get unidirectional delay.

I usually run it like this:

./rtt -v <host>

you will need to run ./rtt_resp on the far end host.

You can also use iperf or similar tools to help customers diagnose network problems, but an easy/lightweight daemon on a few hosts is always fairly easy to play with in a quick-and-dirty way...

- jared

-- 
Jared Mauch | pgp key available via finger from jared@puck.nether.net
clue++; | http://puck.nether.net/~jared/ My statements are only mine.
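The kind of summary an end-to-end probe makes possible can be sketched as follows in C. This is not the rtt tool's code or output format; the struct names and the sample layout are hypothetical, and the one-way delay figure is only meaningful under the condition stated above, that sender and responder clocks are accurately synced.

```c
/*
 * Hypothetical record for one end-to-end probe: when the sender
 * transmitted it and when the far-end host received it, both in seconds
 * on a shared (synced) clock. recv_s < 0 means the probe never arrived.
 */
struct probe_sample {
    double sent_s;
    double recv_s;
};

struct probe_summary {
    double loss_pct;         /* share of probes that never arrived */
    double mean_one_way_s;   /* average sender-to-receiver delay */
};

/* Summarize n probes; returns zeros when there is nothing to measure. */
struct probe_summary summarize_probes(const struct probe_sample *s, int n)
{
    struct probe_summary out = {0.0, 0.0};
    double total_delay = 0.0;
    int received = 0;

    for (int i = 0; i < n; i++) {
        if (s[i].recv_s >= 0.0) {
            received++;
            total_delay += s[i].recv_s - s[i].sent_s;
        }
    }
    if (n > 0)
        out.loss_pct = 100.0 * (n - received) / n;
    if (received > 0)
        out.mean_one_way_s = total_delay / received;
    return out;
}
```

Unlike the per-hop numbers from mtr, these figures come from packets that actually crossed the whole path, so they are not affected by how willing any intermediate router is to generate ICMP replies.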


participants (13)

Adam Rothschild
Alexander Koch
andrew2＠one.net
chip
Christopher L. Morrow
Jared Mauch
Jay R. Ashworth
Joe Maimon
Network Fortius
Robert E.Seastrom
Scott Altman
sdb＠stewartb.com
Steven M. Bellovin