Re: Consistent asymmetric latency on monitoring?
On Wed, Oct 21, 2009 at 8:01 PM, Rick Ernst <ernst@shreddedmail.com> wrote:
Resent, since I responded from the wrong address: --- The basic operation of IP SLA is as surmised; payload with timestamps and other telemetry data is sent to a 'responder' which manipulates the payload, including adding its own timestamps, and returns the altered payload.
I had to do a mental walk-through, but I think I see how drift can cause this. I'm going to generate some artificial data, graph it, and see if it matches the general waveshape I'm seeing.
I purposefully have the traffic generators ntp syncing against the responders. I thought that would keep the clocks more closely in sync. I don't necessarily care if the time is 'right', just that it's the same. What kind of difference should I expect if I sync both generators and responders against the same source, or not sync the responder? I'm thinking that having one source with constant drift may be better than both devices trying to walk/correct the time.
Thanks for the input!
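The artificial-data experiment described above could be sketched roughly as follows. This is only a minimal model under stated assumptions: the responder clock drifts at a constant rate relative to the sender and is snapped back to zero offset at a fixed interval (real ntp usually slews rather than steps), and the true path delay is constant and symmetric. The function name and all parameter values are made up for illustration.

```python
def simulate(n_probes=600, interval_s=1.0, drift_ppm=50.0,
             resync_every_s=120.0, true_fwd_ms=10.0, true_rev_ms=10.0):
    """Return a list of (time_s, apparent_fwd_ms, apparent_rev_ms) tuples."""
    rows = []
    for i in range(n_probes):
        t = i * interval_s
        # Assumption: offset grows linearly since the last idealised resync.
        since_sync = t % resync_every_s
        offset_ms = drift_ppm * 1e-6 * since_sync * 1000.0
        # The responder's stamp is taken on the drifting clock, so the
        # forward leg gains the offset and the reverse leg loses it.
        fwd = true_fwd_ms + offset_ms
        rev = true_rev_ms - offset_ms
        rows.append((t, fwd, rev))
    return rows


if __name__ == "__main__":
    for t, fwd, rev in simulate()[::30]:
        print(f"t={t:6.1f}s  fwd={fwd:6.2f}ms  rev={rev:6.2f}ms  rtt={fwd + rev:6.2f}ms")
```

In this model the two one-way figures form mirror-image sawtooths while their sum (the round trip) stays flat, which is one waveshape to compare against the real graphs.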
On Wednesday, October 21, 2009, Nathan Ward <nanog@daork.net> wrote:
On 22/10/2009, at 2:31 PM, Perry Lorier wrote:
I assume this product works by having a packet with a timestamp sent from the source to the destination where it is timestamped again and either sent back, or another packet is sent in the other direction. The difference between the two timestamps gives you the latency in that direction.
I believe a packet is sent, and the target router responds with a timestamp.
But yeah, timestamps are being compared.
I'm with Perry though - sounds like your clocks are drifting.
-- Nathan Ward
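To make the timestamp comparison concrete, here is a small sketch of how per-direction latency falls out of four timestamps. The names t1..t4 are illustrative only and are not taken from the actual IP SLA packet format; the example values assume a true delay of 10 ms each way, 1 ms of responder processing, and a responder clock running 3 ms ahead.

```python
def one_way_delays(t1, t2, t3, t4):
    """Compute apparent per-direction delays and the round-trip time.

    t1: probe sent       (sender clock)
    t2: probe received   (responder clock)
    t3: reply sent       (responder clock)
    t4: reply received   (sender clock)
    """
    forward = t2 - t1              # mixes two clocks: includes clock offset
    reverse = t4 - t3              # mixes two clocks: includes minus the offset
    rtt = (t4 - t1) - (t3 - t2)    # each difference uses one clock only
    return forward, reverse, rtt


if __name__ == "__main__":
    # All values in milliseconds; responder clock is 3 ms ahead.
    print(one_way_delays(0, 13, 14, 21))
```

The path in this example is perfectly symmetric, yet the result reads as 13 ms one way and 7 ms the other. The round trip is unaffected because the offset cancels out.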
Rick Ernst wrote:
Resent, since I responded from the wrong address: --- The basic operation of IP SLA is as surmised; payload with timestamps and other telemetry data is sent to a 'responder' which manipulates the payload, including adding its own timestamps, and returns the altered payload.
Yup :) It's the obvious way to do it :)
I had to do a mental walk-through, but I think I see how drift can cause this. I'm going to generate some artificial data, graph it, and see if it matches the general waveshape I'm seeing.
I purposefully have the traffic generators ntp syncing against the responders. I thought that would keep the clocks more closely in sync. I don't necessarily care if the time is 'right', just that it's the same.
This causes major problems. What you're actually measuring here is how well ntp can keep the clock sync'd under asymmetric latency. ntp is trying to do its own measurements of one-way delay, without the help of already-synchronised clocks, and to measure clock drift as well. As you can see from your graphs, ntp is not coping[1].
You are far better off having each end sync to a local stratum 1 or stratum 2 ntp source, preferably one reached over a different link from the one under test. If you don't have a local stratum 1/2 time source at each end, you might be able to find one over a local exchange or other less congested link. If this is very important to you then you should consider running your own stratum 1 clocks at each end, synchronised off something like GPS, CDMA or a T1 clock.
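A rough illustration of why syncing across the link under test goes wrong: the standard ntp on-wire calculation estimates the offset as ((t2 - t1) + (t3 - t4)) / 2, which is only exact when the two directions take equal time. The sketch below applies that formula to made-up timestamps; the function name is hypothetical, and it ignores the filtering and clock discipline a real ntp daemon layers on top.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire estimates from one request/response exchange.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Both clocks agree perfectly; values in milliseconds.
    # Symmetric path, 20 ms each way: estimated offset is zero.
    print(ntp_offset_and_delay(0, 20, 20, 40))
    # Asymmetric path, 30 ms out and 10 ms back: a phantom offset appears.
    print(ntp_offset_and_delay(0, 30, 30, 40))
```

In the second case the clocks are identical, but the estimate comes out at half the difference between the two directions, so the client would steer its clock away from the correct time.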
What kind of difference should I expect if I sync both generators and responders against the same source, or not sync the responder? I'm thinking that having one source with constant drift may be better than both devices trying to walk/correct the time.
Most hardware clocks in PCs/routers/switches etc. have pretty atrocious amounts of drift if left to free run[2], sometimes in the order of seconds or occasionally minutes per week. To get useful numbers you really do need to synchronise them to /something/. Synchronising them to each other causes problems because ntp, I think (I could be wrong), assumes mostly symmetrical latency, and if the latency isn't symmetric it assumes that's because one clock is running fast/slow and will alter the clock's speed to account for it. The great thing about ntp stratum 1 servers is that by definition they have more or less the same time no matter where they are, so synchronising each end against a local ntp server will be a much, much better solution. If possible you should consider peering with at least 3, preferably 4(!)[3], other upstream ntp servers.
[1]: To be fair, it's a hard problem. Anything that involves time just gets more and more complicated the more you look at it. ntp is extremely clever and probably knows more about time than I'd ever want to know, but you're making its job hard.
[2]: http://vancouver-webpages.com/time/ / http://vancouver-webpages.com/time/ltmhist.png
[3]: http://twiki.ntp.org/bin/view/Support/SelectingOffsiteNTPServers#Section_5.3....
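For a sense of scale, the "seconds or minutes per week" figures can be converted into a drift rate and then into the error a free-running clock would add to a one-way latency reading. This is plain arithmetic with hypothetical helper names; the inputs are the figures quoted above, not measurements from any particular device.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800


def drift_ppm(seconds_per_week):
    """Convert a drift of N seconds per week into parts per million."""
    return seconds_per_week / SECONDS_PER_WEEK * 1e6


def accumulated_offset_ms(ppm, elapsed_s):
    """Clock error in milliseconds after elapsed_s seconds of free running."""
    return ppm * 1e-6 * elapsed_s * 1000.0


if __name__ == "__main__":
    for per_week in (1.0, 60.0):
        ppm = drift_ppm(per_week)
        hourly = accumulated_offset_ms(ppm, 3600)
        print(f"{per_week:5.1f} s/week = {ppm:6.2f} ppm, "
              f"about {hourly:7.2f} ms of error per hour")
```

Even the milder figure adds several milliseconds of error per hour, which is the same order as the latencies being measured on many links.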
Lots of good info, and a nice mind-dump that gives me a whole host of other things that need to be looked at... Umm. "thanks" :)
On Oct 21, 2009, at 11:03 PM, Rick Ernst wrote:
I thought that would keep the clocks more closely in sync. I don't necessarily care if the time is 'right', just that it's the same.
ntp is a pretty basic operational requirement for any network, irrespective of the use of IP SLA, is it not?
-----------------------------------------------------------------------
Roland Dobbins <rdobbins@arbor.net> // <http://www.arbornetworks.com>
Sorry, sometimes I mistake your existential crises for technical insights.
-- xkcd #625
participants (3)
- Perry Lorier
- Rick Ernst
- Roland Dobbins