Single IP routing problems through Level3
We're seeing some really weird issues with connections that go through / to Level3 IP space. Basically, certain "pairs" of IPs (particular L3 IPs coupled with particular IPs of ours) have dodgy/nonexistent connectivity, but if you change the IP at either end everything's hunky dory.

I've sniffed (from both ends) pings going from a host in L3 space to our end and seen the pings arrive at our end and head back in the direction of L3, but they never get to their destination. Traceroutes from L3 stop at the next-to-last hop, while traceroutes back get to the hop before L3 space and stop.

All of this behaviour is source/dest *pair* specific -- if I ping/traceroute from another address (in the same netblock as the problematic IP, so all the same equipment is involved) at either end, or to another address (again, same netblock) at either end, it all works again.

I've got two questions:

1) Has anyone else seen similar behaviour from L3 (or other providers, even), so I know I'm not going mad?

2) What sort of configuration problem or software bug would cause this sort of problem to occur? If it was an IP blacklist (or even a block routing issue) anywhere along the line, surely it wouldn't be sensitive to changing the other end's address to another one in the same /24?

Any insight/anecdotes/etc would be greatly appreciated, as it's starting to do my head in. Just knowing I'm not alone with this insanity would be nice at this point. <grin>

If it makes any difference, the blocks I'm working from at my end are Internap, in 74.201.254.0/23 (we don't have all of it, just most of it), while the far end is 8.12.35.0/24.

- Matt
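A minimal sketch of the kind of pair-by-pair test being described, assuming a Linux host with iputils ping and several of the local /23 addresses bound to an interface; the addresses used below are placeholders, not the actual hosts involved:

#!/usr/bin/env python3
"""Sweep (source, destination) address pairs and report which ones black-hole."""
import subprocess

# Placeholder addresses -- substitute the real local and remote IPs under test.
SOURCES = ["74.201.254.10", "74.201.254.11", "74.201.254.12"]
DESTS = ["8.12.35.20", "8.12.35.21", "8.12.35.22"]

def pair_ok(src: str, dst: str) -> bool:
    """True if at least one of three pings from src to dst gets a reply."""
    result = subprocess.run(
        ["ping", "-I", src, "-c", "3", "-W", "2", dst],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

if __name__ == "__main__":
    for src in SOURCES:
        for dst in DESTS:
            print(f"{src:>16} -> {dst:<16} {'ok' if pair_ok(src, dst) else 'FAIL'}")

A grid of results like that makes it easy to see whether the failures really are pair-specific rather than tied to a single address at either end.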
1) I've seen this behavior before; you are not alone in the universe.

2) Most likely there is a balanced channel on the path, either L3 or L2, and one of the links in the bundle is dead but has not been detected as such.

Rubens
On Sun, Jun 15, 2008 at 11:12:25AM -0300, Rubens Kuhl Jr. wrote:
1) I've seen this behavior before; you are not alone in the universe.
Thank $DEITY for that. <grin>
2) Most likely there is a balanced channel on the path, either L3 or L2, and one of the links in the bundle is dead but has not been detected as such.
A multiple-link bundle which is load balanced by source/destination pair with an undetected dud link? I hadn't thought of that, but it does make an *awful* lot of sense. (Although, not being a big-network transit kinda person, I don't know if such a thing actually exists. <grin>) I'll mention it (or ask about it) as a possibility next time I talk to the relevant people, though.

Thanks,
- Matt
Matt Palmer wrote:
A multiple-link bundle which is load balanced by source/destination pair with an undetected dud link? I hadn't thought of that, but it does make an *awful* lot of sense....
I've also seen interesting OSPF misconfigurations that resulted in a router doing path-wise load balancing between the live link and an unroutable destination address that went into the bit bucket.

On the Cisco boxes I was using at the time, the hallmark of load-balancing into a dead path was that every other IP address worked, but then every some number of addresses (12, as I remember, but I'm not 100% sure) the polarity flipped, so that whichever of odd vs. even had worked before was now the one that didn't for the next N addresses. Artifact of the hash algorithm in use, no doubt.

Matthew Kaufman
matthew@eeph.com
http://www.matthew.at
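A toy illustration of that hash artifact, making no claim about the actual algorithm on any particular box: a deterministic hash over the source/destination pair, folded down to one of two equal-cost paths with one path discarding traffic, produces the kind of every-other-address pattern described, while a hash mixing more bits can also flip polarity every so many addresses:

#!/usr/bin/env python3
"""Toy hash over (src, dst) into two equal-cost paths, one of which black-holes."""
import ipaddress

DEAD_PATH = 1  # pretend the second path silently drops everything

def toy_path(src: str, dst: str) -> int:
    """Fold src XOR dst down to a path index (0 or 1). A real hash mixes
    more bits, which is where longer runs and periodic polarity flips
    can come from."""
    x = int(ipaddress.ip_address(src)) ^ int(ipaddress.ip_address(dst))
    x ^= x >> 16
    x ^= x >> 8
    return x & 1

if __name__ == "__main__":
    src = "74.201.254.10"                      # placeholder source address
    base = int(ipaddress.ip_address("8.12.35.0"))
    for i in range(16):                        # walk 16 consecutive destinations
        dst = str(ipaddress.ip_address(base + i))
        path = toy_path(src, dst)
        print(f"{src} -> {dst}: path {path}"
              f" ({'black-holed' if path == DEAD_PATH else 'delivered'})")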
We commonly see this sort of problem with Layer 2 or Layer 3 bonded etherchannel (LACP also). One member of the channel is failing for one reason or another and dropping traffic.

The channel is really not a load-balance mechanism but a frame-distribution mechanism. The distribution of frames uses the source and destination IP addresses to hash out to a particular channel member, and that distribution provides a rough balance. The problems noted can affect traffic in each direction differently, as the hashing is likely asymmetric across the channel; only traffic across the ailing member will be impacted.

The above can present itself anywhere in the path if channeling is used.

Regards,

Tim Peiffer
Networking and Telecommunications Services
University of Minnesota/NorthernLights GigaPOP
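A sketch of that frame-distribution behaviour under an invented hash (the real distribution algorithm varies by vendor and configuration): with one dead member in a hypothetical four-link bundle, only the pairs that hash onto that member suffer, and changing either end of the pair generally moves the traffic onto a live member, which matches the original "swap one address and it works" symptom:

#!/usr/bin/env python3
"""Etherchannel-style distribution: hash (src, dst) to a bundle member,
with one member silently dropping the frames it is handed."""
import ipaddress

MEMBERS = 4      # hypothetical four-link bundle
DEAD_MEMBER = 2  # pretend this member has failed without being detected

def member_for(src: str, dst: str) -> int:
    """Pick a bundle member from the source/destination pair (toy hash)."""
    return (int(ipaddress.ip_address(src)) ^ int(ipaddress.ip_address(dst))) % MEMBERS

def check(src: str, dst: str) -> None:
    m = member_for(src, dst)
    print(f"{src} -> {dst}: member {m}"
          f" ({'DROPPED on dead member' if m == DEAD_MEMBER else 'forwarded'})")

if __name__ == "__main__":
    check("74.201.254.10", "8.12.35.20")   # this pair happens to land on the dead member
    check("74.201.254.11", "8.12.35.20")   # change the source: different member, traffic flows
    check("74.201.254.10", "8.12.35.21")   # change the destination: different member, traffic flows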
participants (4):
- Matt Palmer
- Matthew Kaufman
- Rubens Kuhl Jr.
- Tim Peiffer