Consumer networking head scratcher
Hi everyone,

I've got a real head scratcher that I came across after replacing the router on my home network. I thought I'd share because it is a fascinating issue to me.

At random times, my Windows machines (Win 7 and Win 10, attached to the network via WiFi, 5GHz) lose connectivity to the Internet. They can continue to access internal resources, such as the router's admin interface. Other devices, including Macs, iPhones, Android phones, and Rokus, never have this issue.

I realized that on the Windows machines, when the connection drops, if I run a traceroute, it dies at a certain hop every time (out in the network of Comcast, my ISP), even though a Mac sitting right next to it is able to go all the way through to the destination.

The even stranger thing I discovered last night is that if I trace to the hop before the hop that it dies at, it then dies at the hop before that (and as I trace to closer and closer hops, it dies at the hop before that!).

This is illustrated in the traces I've captured here: http://pastebin.com/raw/R1UHLi0U

For what it's worth, the router is a Linksys EA7300 that I just picked up. I can't even imagine what would cause this issue at this point. If anyone has any thoughts, I'd love to hear them! I'm going to start studying some packet captures to see if I can spot an issue.

Best,
Ryan
That's strange... it's like the TTL on all Windows IP packets is decrementing more and more as time goes on, causing you to get fewer and fewer hops into the internet.

I wonder if it's a bug/virus/malware affecting only your Windows computers.

-Aaron
The issue doesn't happen with my previous router, and I've tested multiple computers (one that isn't mine). It doesn't seem like it decrements over time; it just dies sooner as I trace further up the path. I can consistently die at the 7th hop if I try to go to Google, but if I trace to the 6th hop, it'll die at the 5th hop!
What's the old router make/model? What's the new router make/model?

-Aaron
Hi Ryan,

Windows tracert uses ICMP echo-request packets to trace the path. It expects an ICMP time-exceeded message, an ICMP destination-unreachable message, or an ICMP echo reply to come back. The final hop in the trace will return an echo reply or a destination unreachable (for example, administratively prohibited). The ones prior to the final hop will return a time-exceeded if they return anything at all.

If the destination does not respond to ping, if those pings are dropped, or if it responds with an unreachable that's dropped, you will not receive a response and the tracert will not find its end. That's why you're seeing the "decrementing" behavior you describe.

I have no information about whether Comcast blocks pings to its routers.

Regards,
Bill Herrin

--
William Herrin ................ herrin@dirtside.com bill@herrin.us
Owner, Dirtside Systems ......... Web: <http://www.dirtside.com/>
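To make the mechanism above concrete, here is a minimal Python sketch that simulates an ICMP-style trace over a made-up path. It sends no packets; the `Hop` type, the `simulate_tracert` function, and the 192.0.2.x addresses are all hypothetical and only model the logic: intermediate hops answer with time-exceeded, while the target must answer the echo request itself for the trace to finish.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hop:
    addr: str
    sends_time_exceeded: bool = True  # replies when a probe's TTL expires here
    answers_echo: bool = True         # replies to a ping addressed to itself


def simulate_tracert(path: List[Hop], target_index: int, max_hops: int = 8) -> List[str]:
    """Return one entry per TTL: the replying address, or "*" for a timeout."""
    lines = []
    for ttl in range(1, max_hops + 1):
        position = ttl - 1
        if position < target_index:
            # TTL expires at an intermediate router: time-exceeded, if it replies at all.
            hop = path[position]
            lines.append(hop.addr if hop.sends_time_exceeded else "*")
        else:
            # The probe reaches the target, which must answer the echo request itself.
            target = path[target_index]
            if target.answers_echo:
                lines.append(target.addr)
                break
            lines.append("*")
    return lines


# Hypothetical path: the two middle routers forward traffic and report expired
# TTLs, but ignore pings addressed to themselves.
PATH = [
    Hop("192.0.2.1"),
    Hop("192.0.2.2", answers_echo=False),
    Hop("192.0.2.3", answers_echo=False),
    Hop("192.0.2.4"),
]

print(simulate_tracert(PATH, target_index=3, max_hops=6))  # trace to the far end
print(simulate_tracert(PATH, target_index=2, max_hops=6))  # trace to a middle router
```

In this model, tracing to a router that ignores pings always shows the hop before it as the last responder, which would look like the trace "dying one hop earlier" each time the target is moved closer.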
I see what you're saying, and that could explain the decrementing behavior I'm seeing which ultimately is not a real indicator of the problem I am having. So in that case, I would be back to my original issue where I stop being able to pass traffic to the Internet, and when that happens my traceroute always dies at the same hop. After disconnecting and reconnecting, the same traceroute will go all the way through. Thanks for the thoughts.
On Wed, Mar 1, 2017 at 2:31 PM, Ryan Pugatch <rpug@lp0.org> wrote:
So in that case, I would be back to my original issue where I stop being able to pass traffic to the Internet, and when that happens my traceroute always dies at the same hop. After disconnecting and reconnecting, the same traceroute will go all the way through.
Hi Ryan,

Next step: run Wireshark and see what you see during the traceroutes. Are they leaving with a reasonable TTL? Is it certain that nothing returns? Are the packets going to the Ethernet MAC address you expect them to?

I had a fun problem once when I cloned some VMs but neglected to change the source MAC address. They all seemed to work under light load but get two downloading at once and suddenly they both experienced major packet loss.

Regards,
Bill

--
William Herrin ................ herrin@dirtside.com bill@herrin.us
Owner, Dirtside Systems ......... Web: <http://www.dirtside.com/>
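As a rough illustration of the fields those questions refer to, the Python sketch below pulls the destination MAC and the IPv4 TTL out of the raw bytes of one captured frame. The `summarize_frame` helper and the sample frame are made up for this example; it assumes an untagged Ethernet II frame carrying IPv4 (no VLAN tag), and in practice Wireshark shows the same fields directly in its packet details.

```python
import struct


def _mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def _ipv4(raw: bytes) -> str:
    return ".".join(str(b) for b in raw)


def summarize_frame(frame: bytes) -> dict:
    """Extract the MAC addresses and, for IPv4, the TTL, protocol, and addresses."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    info = {"dst_mac": _mac(dst), "src_mac": _mac(src), "ethertype": ethertype}
    if ethertype == 0x0800 and len(frame) >= 34:  # IPv4 with at least a 20-byte header
        ip = frame[14:34]
        info["ttl"] = ip[8]        # byte 8 of the IPv4 header
        info["protocol"] = ip[9]   # 1 means ICMP
        info["src_ip"] = _ipv4(ip[12:16])
        info["dst_ip"] = _ipv4(ip[16:20])
    return info


# Hand-built sample: a probe leaving a LAN host with TTL 3 (all values invented).
SAMPLE = (
    bytes.fromhex("020000000001")      # destination MAC (expected: the router)
    + bytes.fromhex("020000000002")    # source MAC
    + struct.pack("!H", 0x0800)        # EtherType: IPv4
    + struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 3, 1, 0,
                  bytes([192, 168, 1, 10]), bytes([192, 0, 2, 4]))
)

print(summarize_frame(SAMPLE))
```

Comparing `dst_mac` against the router's LAN MAC, and `ttl` against the hop number being probed, answers the first and third questions; whether anything returns still has to be read from the capture itself.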
Definitely the direction I'm going. Even aside from the traceroutes, I'm going to capture some regular web traffic to see what is happening. I'm planning to send traffic to a machine I control to see if any packets are actually making it through at all.

I'm not sure if this new Linksys router has any packet capture ability that is exposed to the end user, but I'd also love to be able to see what's actually going through the router itself.

Thanks,
Ryan
On many non-Windows OSes (Mac OS X, Linux, FreeBSD, etc.) you can specify ICMP traceroute using -I:

    traceroute -I google.com

I wonder if this would replicate your experience with Windows tracert.
This all goes away when he reconnects his old router, from what I remember...

If that is the case, then I would concentrate my effort on the new router and its functionality (or lack thereof). It could be something simple that you are missing as a setting on it, or an assumption that it works a certain way when it does not. Sometimes these devices can be counterintuitive.
Just a quick sanity check here, since I know we can occasionally overlook the simple things.

Have you updated the firmware to the latest available version? Have you checked for any odd services like QoS, parental controls, or an IDS? Have you tried wiping it to factory defaults and reconfiguring it?

What happens if you give the affected machine a new IP? Could it be some service on the device affecting that specific IP?
On 3/1/2017 11:28 AM, Ryan Pugatch wrote:
At random times, my Windows machines (Win 7 and Win 10, attached to the network via WiFi, 5GHz) lose connectivity to the Internet. They can continue to access internal resources, such as the router's admin interface.

To the point of Windows reporting no internet access, MS does two things to determine if the machine has internet access, as outlined here: https://technet.microsoft.com/en-us/library/cc766017(v=ws.10).aspx (I think that's still valid).

From a console, can these two machines do the HTTP request and the DNS lookup when they tell you they're offline? Can the other machines do these two things when the Windows machines can't, or when the Windows machines report offline?
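A minimal Python sketch of running those two checks by hand follows. The probe URL, DNS name, and expected values are my recollection of what the linked article describes for Vista/Windows 7 and should be verified against it (Windows 10 is understood to use different probe hosts); the function names are invented, and `ncsi_verdict` only reports which probe worked rather than reproducing Windows' own decision logic.

```python
import socket
import urllib.request

# Assumed probe values; confirm against the linked Microsoft article.
PROBE_URL = "http://www.msftncsi.com/ncsi.txt"
EXPECTED_BODY = "Microsoft NCSI"
PROBE_DNS_NAME = "dns.msftncsi.com"
EXPECTED_DNS_ADDR = "131.107.255.255"


def fetch_probe_body(url=PROBE_URL, timeout=5.0):
    """Return the body of the HTTP probe, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("ascii", errors="replace").strip()
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return None


def resolve_probe_name(name=PROBE_DNS_NAME):
    """Return the IPv4 address the probe name resolves to, or None on failure."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # covers socket.gaierror
        return None


def ncsi_verdict(body, addr):
    """Summarize which of the two probes returned the expected answer."""
    http_ok = body == EXPECTED_BODY
    dns_ok = addr == EXPECTED_DNS_ADDR
    if http_ok and dns_ok:
        return "both probes succeeded"
    if http_ok:
        return "HTTP probe succeeded, DNS probe failed"
    if dns_ok:
        return "DNS probe succeeded, HTTP probe failed"
    return "both probes failed"
```

Running `print(ncsi_verdict(fetch_probe_body(), resolve_probe_name()))` on an affected Windows machine and on a working Mac at the same moment would show whether only one of the two lookups is failing.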
On 2017-03-01 11:28, Ryan Pugatch wrote:
At random times, my Windows machines (Win 7 and Win 10, attached to the network via WiFi, 5GHz) lose connectivity to the Internet.
For what it's worth, the router is a Linksys EA7300 that I just picked up.
Way back when, I had a Netgear router. It ended up having a limit on its NAT translation table, and when I had too many connections going at the same time (or not yet timed out), I would lose connectivity. There was an unofficial patch to the firmware (literally a patch to the code that defined the table size) to increase that table to 1000, as I recall.

Does the Linksys have a means to display the NAT translation table, so you can see whether connections are lost when that table is full and lots of connections have not yet timed out?
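The failure mode described here can be sketched with a toy model in Python. The `NatTable` class, its capacity, and its idle timeout are invented for illustration and say nothing about how the Netgear or Linksys firmware actually works; the point is only that a fixed-size table with slow expiry refuses new flows while existing ones keep working.

```python
from typing import Dict, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


class NatTable:
    """Toy fixed-size translation table with idle-timeout expiry."""

    def __init__(self, capacity: int, idle_timeout: float) -> None:
        self.capacity = capacity
        self.idle_timeout = idle_timeout
        self._last_seen: Dict[Flow, float] = {}

    def _expire(self, now: float) -> None:
        # Drop entries that have been idle for at least idle_timeout seconds.
        stale = [f for f, t in self._last_seen.items() if now - t >= self.idle_timeout]
        for flow in stale:
            del self._last_seen[flow]

    def admit(self, flow: Flow, now: float) -> bool:
        """Return True if the packet is translated, False if the table is full."""
        self._expire(now)
        if flow in self._last_seen or len(self._last_seen) < self.capacity:
            self._last_seen[flow] = now  # create or refresh the entry
            return True
        return False

    def __len__(self) -> int:
        return len(self._last_seen)


table = NatTable(capacity=3, idle_timeout=60.0)
flows = [("192.168.1.10", 50000 + i, "192.0.2.4", 443) for i in range(4)]
print([table.admit(f, now=0.0) for f in flows])  # the fourth new flow is refused
print(table.admit(flows[3], now=60.0))           # admitted once idle entries expire
```

Traffic to the router's own LAN address is not translated, so in this model the admin interface would stay reachable even while new outbound connections are being refused.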
participants (9)
- Aaron Gould
- Dann Schuler
- David Bass
- iamzam@gmail.com
- Jean-Francois Mezei
- Mark Wiater
- Ryan Pugatch
- valdis.kletnieks@vt.edu
- William Herrin