On Thu, Sep 17, 2009 at 03:35:37PM -0700, Charles Wyble wrote:
> Random failures of a single port's connectivity.... bizarre and annoying.
> Whole switches? Seen it. Whole panels? Seen it. Whole blades? Seen it.
> Single port on a switch or patch panel? Never.
You've never seen a single port go bad on a switch? I can't even count the number of times I've seen that happen. Not that I'm suggesting the OP wasn't the victim of a human error, like someone unplugging the wrong port and then lying to him about it; that happens even more often.

My favorite bizarre random failure story is a toss-up between these two:

Story 1. Had a customer report that they weren't able to transfer one particular file over their connection. The transfer would start, and then at a certain point the TCP session would just lock up. After a lot of head scratching, it turned out that for 8 ports on a 24-port FastE switch blade, a certain combination of bytes caused the packet to be dropped on this otherwise perfectly normal and functioning card, stalling the TCP session while leaving everything around it unaffected. If you moved them to a different port outside this group of 8, or used https, or uuencoded the file, it would go through fine.

Story 2. Had a customer report that they were getting extremely slow transfers to another network, despite not being able to find any packet loss. Shifting the traffic to a different port to reach the same network resolved the problem. After removing the traffic and attempting to ping the far side, I got the following:

<drop>
64 bytes from x.x.x.x: icmp_seq=1 ttl=61 time=0.194 ms
64 bytes from x.x.x.x: icmp_seq=2 ttl=61 time=0.196 ms
64 bytes from x.x.x.x: icmp_seq=3 ttl=61 time=0.183 ms
64 bytes from x.x.x.x: icmp_seq=0 ttl=61 time=4.159 ms
<drop>
64 bytes from x.x.x.x: icmp_seq=5 ttl=61 time=0.194 ms
64 bytes from x.x.x.x: icmp_seq=6 ttl=61 time=0.196 ms
64 bytes from x.x.x.x: icmp_seq=7 ttl=61 time=0.183 ms
64 bytes from x.x.x.x: icmp_seq=4 ttl=61 time=4.159 ms

After a little more testing, it turned out that every 4th packet sent to the peer's router was being queued until another "4th packet" came along and knocked it out.
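
The behavior in Story 2 can be mimicked with a toy simulation. This is only a sketch of the observed symptom, not of whatever the hardware was actually doing: the function name `simulate_stuck_queue`, the one-slot holding buffer, and the zero transit time for normal packets are all assumptions made for illustration.

```python
def simulate_stuck_queue(num_packets, interval, period=4):
    """Return (seq, delay) pairs in the order packets leave the faulty port.

    Hypothetical model: every `period`-th packet is parked in a one-slot
    buffer and is released only when the next such packet arrives and
    takes its place. All other packets pass straight through.
    """
    delivered = []
    held = None  # (seq, send_time) of the packet currently stuck
    for seq in range(num_packets):
        now = seq * interval  # packets are sent at a fixed interval
        if seq % period == 0:
            if held is not None:
                held_seq, sent_at = held
                # The newcomer knocks the stuck packet free.
                delivered.append((held_seq, now - sent_at))
            held = (seq, now)
        else:
            delivered.append((seq, 0.0))
    return delivered


if __name__ == "__main__":
    for seq, delay in simulate_stuck_queue(9, interval=1.0):
        print(f"icmp_seq={seq} extra_delay={delay}")
```

Under these assumptions the delivery order is 1, 2, 3, 0, 5, 6, 7, 4, the same order as in the ping output above, and the time a stuck packet waits is `period * interval`, so it grows with the spacing between packets. The last parked packet never leaves unless more traffic arrives.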
If you increased the ping interval, you would see the time the packet spent in the queue increase. At one point I had it up to over 350 seconds (not milliseconds) that the packet sat in the other router's queue before the next 4th packet came along and knocked it free. I suspect it could have gone higher, but random scanning traffic from the internet kept arriving and knocking it loose. When there was a lot of traffic on the interface you would never see packet loss, just reordering of every 4th packet and thus slow TCP transfers. :)

-- 
Richard A Steenbergen <ras@e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)