Consistent asymmetric latency in monitoring?
Although the implementation is Cisco-specific, this feels more appropriate for NANOG.

We've started rolling out a statewide monitoring system based on Cisco's "IP SLA" feature set. Out of 5 sites deployed so far (different locations, different providers), we are consistently seeing one-way latency mirror the opposite direction: as source-to-destination latency goes up, destination-to-source latency goes down, and vice versa.

The monitoring team and I have ripped apart the OIDs, the IP SLA configuration, and the monitoring system. We've also built an ad-hoc system to compare the results, and the behavior is still consistent. It's not a true mirror; there is definitely variation in the collected data, but at the 10,000-foot level there is an obvious and consistent mirror.

The network topology consists of independent service providers, all providing backhaul to a local Ethernet exchange.

Has anybody seen this type of behavior? We are solidly convinced that we are using the proper OIDs and making the proper transformations of the data. The two remaining candidate causes are "natural behavior of the links" and "an artifact in the IP SLA mechanism" (or both).

Any ideas? Thanks!
Rick Ernst wrote:
Although the implementation is Cisco-specific, this feels more appropriate for NANOG.
We've started rolling out a state-wide monitoring system based on Cisco's "IP SLA" feature set. Out of 5 sites deployed so far (different locations, different providers), we are consistently seeing one-way latency mirror the opposite direction. As source-destination latency goes up, destination-source latency goes down and vice versa.
The monitoring team and I have ripped apart the OIDs, IP SLA configuration, and monitoring system. We've also built an ad-hoc system to compare the results. It's still consistent behavior. It's not a true mirror; there is definitely variation in the data collection, but at the 10,000 foot level, there is an obvious and consistent mirror to the data.
The network topology is independent service providers all providing backhaul to a local Ethernet exchange.
Has anybody seen this type of behavior? We are solidly convinced that we are using the proper OIDs and making the proper transformations of the data. The two remaining causes appear to be either "natural behavior of the links" and/or "artifact in the IP SLA mechanism".
Any ideas?
Having never used Cisco's IP SLA (or even read about it), take this with a sack of salt. I assume this product works by having a packet with a timestamp sent from the source to the destination, where it is timestamped again and either sent back, or another packet is sent in the other direction. The difference between the two timestamps gives you the latency in that direction. Now, how are your clocks synchronised? Are they synchronised using NTP, or something better (GPS)? If one of your clocks is drifting with respect to the other, you'll see this effect. Does your clock drift because NTP fails to keep it well synchronised when its connection to its parent NTP server is saturated?
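The drifting-clock explanation can be made concrete with a little arithmetic. This is a minimal sketch, assuming each one-way value is computed simply as receive timestamp minus send timestamp on two different clocks; `offset_ms` is a hypothetical parameter for how far the destination clock runs ahead of the source clock, and the model is illustrative, not a description of IP SLA internals. The offset is added to the source-to-destination figure and subtracted from the destination-to-source figure, so as it drifts the two one-way values move in opposite directions while their sum (the round trip) is unaffected.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sd_ms: float  # measured source -> destination delay
    ds_ms: float  # measured destination -> source delay


def measure(true_sd_ms: float, true_ds_ms: float, offset_ms: float) -> Sample:
    """Model one-way delays measured with unsynchronised clocks.

    offset_ms (hypothetical): how far the destination clock is ahead of
    the source clock at the time of the measurement.
    """
    # Forward: sent at source time t, arrives at true time t + true_sd,
    # but the destination clock reads t + true_sd + offset.
    sd = true_sd_ms + offset_ms
    # Reverse: the destination stamps the reply on its (fast) clock,
    # the source reads arrival on its own clock, so the offset subtracts.
    ds = true_ds_ms - offset_ms
    return Sample(sd, ds)


if __name__ == "__main__":
    # Links are perfectly stable at 10 ms each way; only the clock offset drifts.
    for offset in (-3.0, -1.0, 0.0, 1.0, 3.0):
        s = measure(10.0, 10.0, offset)
        print(f"offset={offset:+.1f}  sd={s.sd_ms:5.1f}  ds={s.ds_ms:5.1f}  "
              f"rtt={s.sd_ms + s.ds_ms:5.1f}")
```

Running it, the `sd` and `ds` columns move in opposite directions while `rtt` stays at 20 ms, which is the "mirror" pattern described in the original post, produced without any change in the links themselves.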
On 22/10/2009, at 2:31 PM, Perry Lorier wrote:
I assume this product works by having a packet with a timestamp sent from the source to the destination where it is timestamped again and either sent back, or another packet is sent in the other direction. The difference between the two timestamps gives you the latency in that direction.
I believe a packet is sent, and the target router responds with a timestamp. But yeah, timestamps are being compared. I'm with Perry though - sounds like your clocks are drifting. -- Nathan Ward
On Wed, 21 Oct 2009, Rick Ernst wrote:
Has anybody seen this type of behavior? We are solidly convinced that we are using the proper OIDs and making the proper transformations of the data. The two remaining causes appear to be either "natural behavior of the links" and/or "artifact in the IP SLA mechanism".
I've been using IP SLA for years (currently under 12.4) and I have not seen behaviour that mirrors what you see. I often see one-way latency go up in one direction without the other direction doing so. You should start by looking at "show ip sla (monitor) op" to see what values the router itself reports. That might tell you more about where the problem lies: in your polling system, or in the IP SLA agent actually reporting what you see. -- Mikael Abrahamsson email: swmike@swm.pp.se
participants (4)
- Mikael Abrahamsson
- Nathan Ward
- Perry Lorier
- Rick Ernst