On Thu, 22 Dec 2022 at 08:41, William Herrin <bill@herrin.us> wrote:
Suppose you have a loose network cable between your Linux server and a switch. Layer 1. That RJ45 just isn't quite solid. It's mostly working but not quite right. What does it look like at layer 2? One thing it can look like is a periodic carrier flash where the NIC thinks it has no carrier, then immediately thinks it has enough of a carrier to negotiate speed and duplex. How does layer 3 respond to that?
Agreed. But once resolution completes and Linux flushes the queued pings out, the responses would come back almost immediately. So the delta between successive RTTs would stay at the send interval, in this case 1s. What we see instead is the RTT decreasing as if a buffer were being drained, until it seems to fill again, up to 5s or so. I don't exclude the rationale; I just think it's unlikely given the latencies observed. In any case, with so little data, my confidence in including or excluding any specific explanation is low.
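The send-interval argument can be written out as a short derivation. As a sketch, assume pings are sent every $\Delta$ seconds at $t_k = t_0 + k\Delta$, all wait in the queue until address resolution completes at $t_a$, and each then needs a fixed network round trip $r$ (these symbols are introduced here for illustration only):

```latex
\begin{aligned}
\mathrm{RTT}_k &= (t_a - t_k) + r, \\
\mathrm{RTT}_k - \mathrm{RTT}_{k+1} &= t_{k+1} - t_k = \Delta .
\end{aligned}
```

Under these assumptions the RTTs of a flushed queue step down by exactly the send interval, regardless of how long resolution took.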
1s: send ping toward default router
1.1s: ping response from remote server
2s: send ping toward default router
2.1s: ping response from remote server
2.5s: carrier down
2.501s: carrier up
3s: queue ping, ARP for default router, no response
4s: queue ping, ARP for default router, no response
5s: queue ping, ARP for default router, no response
6s: queue ping, ARP for default router, no response
7s: queue ping, ARP for default router
7.01s: ARP response; send all 5 queued pings, but note that the earliest is more than 4 seconds old.
7.1s: responses to all 5 queued pings.
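The timeline can be replayed in a few lines to show the RTT pattern it predicts. This is a minimal sketch under stated assumptions: `simulate_rtts` and its parameters are hypothetical, the 0.09 s network round trip is chosen so the queued replies land at 7.1s, and the model assumes every queued ping survives (a real kernel bounds how much it will hold for an unresolved neighbor and may drop packets).

```python
# Hypothetical model of the carrier-flash timeline: pings go out once per
# second; any ping sent after the flash but before ARP resolves waits in the
# neighbor queue, then everything queued is sent at once.
# Assumption: nothing is dropped from the queue while waiting.
from typing import List


def simulate_rtts(send_times: List[float], carrier_flash_at: float,
                  arp_resolved_at: float, wire_rtt: float) -> List[float]:
    """Return the RTT observed for each ping sent at send_times (seconds)."""
    rtts = []
    for t in send_times:
        if carrier_flash_at < t < arp_resolved_at:
            # Queued: leaves only when ARP resolves, then needs one round trip.
            rtt = (arp_resolved_at - t) + wire_rtt
        else:
            # Next hop already resolved: the ping is sent immediately.
            rtt = wire_rtt
        rtts.append(round(rtt, 2))  # round away floating-point noise
    return rtts


if __name__ == "__main__":
    sends = [1, 2, 3, 4, 5, 6, 7]  # one ping per second, as in the timeline
    rtts = simulate_rtts(sends, carrier_flash_at=2.5,
                         arp_resolved_at=7.01, wire_rtt=0.09)
    for t, rtt in zip(sends, rtts):
        print(f"ping sent at {t}s: rtt {rtt:.2f}s")
    # Differences between consecutive queued pings' RTTs.
    queued = rtts[2:]
    print("steps:", [round(a - b, 2) for a, b in zip(queued, queued[1:])])
```

With these inputs the five queued pings come back with RTTs of roughly 4.1, 3.1, 2.1, 1.1 and 0.1 s, each step equal to the 1s send interval, which is the signature the reply above expects from this explanation.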
Cable still isn't right though, so in a few seconds or a few minutes you're going to get another carrier flash and the pattern will repeat.
I've also seen some cheap switches get stuck doing this even after the faulty cable connection is repaired, not clearing until a reboot.
Regards, Bill Herrin
-- For hire. https://bill.herrin.us/resume/
-- ++ytti