Anyone from Verizon/TATA on here? Possible Packet Loss
Hi guys,
We host a web application for a client and they've been complaining that it's been slow since yesterday. It seems fast from the locations I've tested and the system looks fine, so I suspected there was packet loss going on somewhere between them and our colo facility.
I did a few trace routes from our firewall to the client's IP and most of the time they look fine, however I occasionally see some packet loss.
Good trace route:
 1 65.61.0.97 0 msec 0 msec 0 msec
 2 107.1.118.217 0 msec 10 msec 0 msec
 3 69.139.194.21 0 msec 0 msec 0 msec
 4 68.86.147.129 10 msec 10 msec 20 msec
 5 68.86.94.169 20 msec 30 msec 20 msec
 6 68.86.86.26 20 msec 20 msec 10 msec
 7 216.6.87.97 10 msec 20 msec 20 msec
 8 216.6.87.34 10 msec 20 msec 10 msec
 9 152.63.34.22 20 msec 10 msec 20 msec
10 130.81.28.255 30 msec 30 msec 20 msec
Traceroutes with packet loss (8th hop):
 1 65.61.0.97 0 msec 0 msec 0 msec
 2 107.1.118.217 0 msec 10 msec 0 msec
 3 69.139.194.21 0 msec 0 msec 0 msec
 4 68.86.147.129 20 msec 10 msec 10 msec
 5 68.86.94.169 20 msec 20 msec 30 msec
 6 68.86.86.26 20 msec 20 msec 10 msec
 7 216.6.87.97 10 msec 20 msec 30 msec
 8 216.6.87.34 10 msec * 10 msec
 9 152.63.34.22 140 msec 110 msec 20 msec
10 130.81.28.255 20 msec 30 msec 30 msec

 1 65.61.0.97 10 msec 0 msec 0 msec
 2 107.1.118.217 0 msec 10 msec 0 msec
 3 69.139.194.21 0 msec 0 msec 0 msec
 4 68.86.147.129 20 msec 20 msec 10 msec
 5 68.86.94.169 30 msec 20 msec 20 msec
 6 68.86.86.26 20 msec 20 msec 20 msec
 7 216.6.87.97 20 msec 10 msec 20 msec
 8 216.6.87.34 20 msec 40 msec *
 9 152.63.34.22 20 msec 10 msec 10 msec
10 130.81.28.255 30 msec 30 msec 20 msec
It appears the 8th hop occasionally has packet loss.
Thanks, Derek
This is not the proper way to interpret traceroute information. Also, 3 pings is not sufficient to determine levels of packet loss statistically.
I suggest searching the archives regarding traceroute, or googling how to interpret them in regards to packet loss, as what you posted does not indicate what you think it does.
-Blake
On Wed, Sep 26, 2012 at 11:59 AM, Derek Ivey <derek@derekivey.com> wrote:
It appears the 8th hop occasionally has packet loss.
Thanks, Derek
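To put a number on why three probes per hop say so little about loss, here is a minimal Python sketch. It is not from the thread: the function name `wilson_interval` and the 95% z-value of 1.96 are assumptions chosen for illustration. It computes a Wilson score interval for the true loss rate given how many probes were sent and how many went unanswered.

```python
import math
from typing import Tuple


def wilson_interval(lost: int, sent: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval for the true loss rate (Wilson score)."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= lost <= sent:
        raise ValueError("lost must be between 0 and sent")
    p = lost / sent                      # observed loss rate
    denom = 1 + z * z / sent
    centre = (p + z * z / (2 * sent)) / denom
    half = z * math.sqrt(p * (1 - p) / sent + z * z / (4 * sent * sent)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Same observed loss rate (one in three), very different certainty.
    for lost, sent in [(1, 3), (100, 300)]:
        low, high = wilson_interval(lost, sent)
        print(f"{lost}/{sent} lost: plausible loss rate {low:.1%} to {high:.1%}")
```

With one missed reply out of three, the interval works out to roughly 6% to 79%, which is too wide to conclude anything. The same observed rate over 300 probes narrows it considerably.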
On Wed, Sep 26, 2012 at 1:10 PM, Blake Dunlap <ikiris@gmail.com> wrote:
This is not the proper way to interpret traceroute information. Also, 3 pings is not sufficient to determine levels of packet loss statistically.
I suggest searching the archives regarding traceroute, or googling how to interpret them in regards to packet loss, as what you posted does not indicate what you think it does.
Agreed. Derek should read "A Practical Guide to (Correctly) Troubleshooting with Traceroute": http://www.nanog.org/meetings/nanog45/presentations/Sunday/RAS_traceroute_N4...
-- Darius Jahandarie
Thanks guys. That was an informative read. I will do some more troubleshooting.
Derek
On Sep 26, 2012, at 1:16 PM, Darius Jahandarie <djahandarie@gmail.com> wrote:
Agreed. Derek should read "A Practical Guide to (Correctly) Troubleshooting with Traceroute": http://www.nanog.org/meetings/nanog45/presentations/Sunday/RAS_traceroute_N4...
-- Darius Jahandarie
After some further troubleshooting, I believe I have narrowed down the issue to one of Verizon's routers (130.81.28.255).
ping 130.81.28.255 repeat 100
Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 130.81.28.255, timeout is 2 seconds:
?!!!!!!!!?!!!!!!!?!!!!!!!!?!!!!!!!!!!!!!!!?!!!!!!!!!!!!!!?!!!!!!!!!!!?
!!!!!!!!!!!!!!!!!!!!!!?!!!?!!!
Success rate is 91 percent (91/100), round-trip min/avg/max = 20/26/30 ms
I had my client send me the output of the ping command (100 pings) and a trace route.
Their 5th hop is 130.81.28.254 and one of the response times in their trace route was 175ms so the issue seems to be around there.
I asked them to open a ticket with Verizon to take a look.
Thanks, Derek
On Sep 26, 2012, at 1:54 PM, Derek Ivey <derek@derekivey.com> wrote:
Thanks guys. That was an informative read. I will do some more troubleshooting.
Derek
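A progress string like the one in the ping output above can be tallied mechanically. The sketch below is illustrative only. It assumes, as the 91/100 summary suggests, that `!` marks a reply and `?` a timeout; the function name `summarize_ping_marks` is made up for this example. Besides the loss percentage, it reports the spacing between timeouts, which helps tell bursty loss from evenly spread loss.

```python
from typing import Dict, List


def summarize_ping_marks(marks: str) -> Dict[str, object]:
    """Tally a ping progress string, assuming '!' = reply and '?' = timeout."""
    probes = [ch for ch in marks if ch in "!?"]  # ignore spaces and newlines
    lost_at: List[int] = [i + 1 for i, ch in enumerate(probes) if ch == "?"]
    sent = len(probes)
    return {
        "sent": sent,
        "received": sent - len(lost_at),
        "loss_pct": 100.0 * len(lost_at) / sent if sent else 0.0,
        "lost_at": lost_at,  # 1-based probe numbers that timed out
        # Distance between consecutive timeouts; similar values mean evenly spread loss.
        "gaps": [b - a for a, b in zip(lost_at, lost_at[1:])],
    }


if __name__ == "__main__":
    # Synthetic 20-probe example, not the output from the thread.
    print(summarize_ping_marks("?!!!!!!!!?\n!!!!!!!!?!"))
```

Pasting the two progress lines from the real output into `summarize_ping_marks` gives the same kind of summary for the 100-probe run.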
Many (most?) routers deprioritize ICMP messages. Direct pings against the router are not informative regarding transit failures.
On Sep 26, 2012, at 11:37 AM, Derek Ivey wrote:
After some further troubleshooting, I believe I have narrowed down the issue to one of Verizon's routers (130.81.28.255).
-- Jo Rhett
Net Consonance : net philanthropy to improve open source and internet projects.
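Jo's point can be illustrated with a toy model. The thread does not say how any particular router treats ICMP. The sketch below simply assumes that a router limits the replies it generates itself with a token bucket, one common form of rate limiting. The function name, the reply rate, and the burst size are all invented for the example. Forwarded (transit) traffic is not modelled at all, which is the point: the missing replies say nothing about it.

```python
from typing import List


def simulate_icmp_rate_limit(probe_times_ms: List[int],
                             replies_per_sec: int,
                             burst: int) -> List[bool]:
    """Return True for each probe the simulated router answers.

    Token bucket kept in thousandths of a token so the arithmetic stays in integers.
    """
    capacity = burst * 1000
    tokens = capacity            # bucket starts full
    previous = None
    answered = []
    for now in probe_times_ms:
        if previous is not None:
            # Refill: elapsed milliseconds times replies/second gives milli-tokens.
            tokens = min(capacity, tokens + (now - previous) * replies_per_sec)
        previous = now
        if tokens >= 1000:
            tokens -= 1000
            answered.append(True)
        else:
            answered.append(False)  # reply suppressed; transit traffic is unaffected
    return answered


if __name__ == "__main__":
    # Hypothetical numbers: 100 probes, one every 100 ms, limiter at 9 replies/s.
    probes = [i * 100 for i in range(100)]
    result = simulate_icmp_rate_limit(probes, replies_per_sec=9, burst=2)
    print(f"answered {sum(result)} of {len(result)} probes")
```

A limiter set slightly below the probe rate produces a steady trickle of unanswered pings even though nothing is being dropped in transit.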
That router might be experiencing a high CPU load and so be unable to reply to ICMP in a timely manner, or QoS policies may be affecting it, depending on the kind of traffic the router deals with.
If packets are only being delayed/lost on that segment, I would start my analysis there.
On 09/26/2012 04:02 PM, Jo Rhett wrote:
Many (most?) routers deprioritize ICMP messages. Direct pings against the router are not informative regarding transit failures.
I'm at home now. I also have Verizon FiOS and believe I am seeing the same thing our client saw.
So you guys are saying that the response times in traceroutes might not always be accurate because routers deprioritize ICMP messages. Does that mean values from MTR aren't accurate?
I fired up MTR and took 2 screenshots (http://imgur.com/a/RDyXO). What do you guys think? Most of the time the ping times seem fairly low, however I occasionally see these spikes. It seems sporadic...
My boss also has FiOS and he is seeing the same thing. Pages load quickly most of the time and sometimes take a while to load.
Thanks, Derek
On 9/26/2012 3:19 PM, Pellitteri Alexis wrote:
That router might be experiencing a high CPU load and so be unable to reply to ICMP in a timely manner, or QoS policies may be affecting it, depending on the kind of traffic the router deals with.
If packets are only being delayed/lost on that segment, I would start my analysis there.
----- Original Message -----
From: "Derek Ivey" <derek@derekivey.com>
I'm at home now. I also have Verizon FiOS and believe I am seeing the same thing our client saw. So you guys are saying that the response times in traceroutes might not always be accurate because routers deprioritize ICMP messages. Does that mean values from MTR aren't accurate? I fired up MTR and took 2 screenshots (http://imgur.com/a/RDyXO). What do you guys think? Most of the time the ping times seem fairly low, however I occasionally see these spikes. It seems sporadic...
To recap, traceroute, mtr, and similar utilities work by talking to each successive router along a path.
Because this is so, and because Any Given Router may be too busy with "real" traffic to deal with such packets (most routers handle data packets on the line cards, while they may have to expend actual CPU on things like ICMP), it's possible for a path with perfect connectivity to show some intermediate hops completely missing -- No Reply At All, you might say -- to diagnostic tools.
The traces you show look pretty decent; I've seen much worse on links with fine interactive shell session response. The time you have to worry is when one router *and everything past it* shows packet loss of roughly the same amount, or when ping times jump markedly at a given spot (by which I mean, say, from 32 to 800ms, rather than from 32 to 125).
The short version, though, which most people are being uncharacteristically too nice to say (:-) is that this is still a tier 1 problem, and NANOG is generally tier 3 or 4. :-)
You're welcome to take the issue up over on outages@outages.org, if you like...
Cheers,
-- jra
--
Jay R. Ashworth Baylink jra@baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA #natog +1 727 647 1274
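Jay's rule of thumb (worry when a hop and everything past it shows loss, or when latency jumps sharply at one spot) can be written down as a small check over per-hop statistics such as those mtr reports. This is only a sketch. The function names, the 1% loss threshold, and the 500 ms jump threshold are assumptions picked for illustration, and it does not compare whether the loss amounts at later hops are "roughly the same".

```python
from typing import List, Optional


def first_persistent_loss_hop(loss_pct: List[float],
                              threshold: float = 1.0) -> Optional[int]:
    """Return the 1-based hop where loss begins and continues through the
    final hop, or None if the final hop itself shows no loss."""
    start = None
    # Walk backwards from the destination while every hop still shows loss.
    for i in range(len(loss_pct) - 1, -1, -1):
        if loss_pct[i] >= threshold:
            start = i + 1
        else:
            break
    return start


def rtt_jump_hops(avg_rtt_ms: List[float], min_jump_ms: float = 500.0) -> List[int]:
    """Return 1-based hops whose average RTT exceeds the previous hop's
    by at least min_jump_ms."""
    return [i + 2
            for i, (prev, cur) in enumerate(zip(avg_rtt_ms, avg_rtt_ms[1:]))
            if cur - prev >= min_jump_ms]


if __name__ == "__main__":
    # Hypothetical per-hop loss: only hop 8 misses replies, later hops are clean.
    print(first_persistent_loss_hop([0, 0, 0, 0, 0, 0, 0, 33, 0, 0]))
    # Hypothetical per-hop loss that carries through to the destination.
    print(first_persistent_loss_hop([0, 0, 5, 6, 5]))
    print(rtt_jump_hops([10, 32, 800, 810]))
```

Loss confined to a single intermediate hop, as in the first example, is reported as `None`: the hops behind it answer normally, so traffic is evidently getting through.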
Thanks guys. Sorry for the noise...
Derek
On 9/26/2012 9:11 PM, Jay Ashworth wrote:
You're welcome to take the issue up over on outages@outages.org, if you like...
Cheers, -- jra
participants (6)
- Blake Dunlap
- Darius Jahandarie
- Derek Ivey
- Jay Ashworth
- Jo Rhett
- Pellitteri Alexis