NTP Issues Today

Van Wolfe

19 Nov 2012

11:21 p.m.

Hello, Did anyone else experience issues with NTP today? We had our server times update to the year 2000 at around 3:30 MT, then revert back to 2012. Thanks, Van


Mark Andrews

20 Nov 2012

1:41 a.m.

In message <CAMeggd4cDQwhxQE_JbvpNR-PKKe9LXqA+KzJ97anHFonjwZhdQ@mail.gmail.com> , Van Wolfe writes:

...

Hello,

Did anyone else experience issues with NTP today? We had our server times update to the year 2000 at around 3:30 MT, then revert back to 2012.

Thanks, Van

NTP should be immune from this sort of behaviour unless you did an ntpdate at the wrong moment. The clocks should have been marked as insane.

Mark -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org

Wallace Keith

2:08 a.m.

Just got paged with a pbx alarm that had 1970 as the year. By the time I logged in, it was showing 2012. Using GPS for time and date.

-----Original Message-----
From: Mark Andrews [mailto:marka@isc.org]
Sent: Monday, November 19, 2012 8:42 PM
To: Van Wolfe
Cc: nanog@nanog.org
Subject: Re: NTP Issues Today

In message <CAMeggd4cDQwhxQE_JbvpNR-PKKe9LXqA+KzJ97anHFonjwZhdQ@mail.gmail.com> , Van Wolfe writes:

...

Hello,

Did anyone else experience issues with NTP today? We had our server times update to the year 2000 at around 3:30 MT, then revert back to 2012.

Thanks, Van

George Herbert

3:28 a.m.

crossreplying to outages list.

Is anyone ELSE seeing GPS issues? This could well have been an unrelated issue on that particular PBX.

If this was real, then the mother of all infrastructure attacks might be underway...

One glitch on tick and tock and one malfunctioning PBX is not sufficient evidence of pattern - much less hostile activity - to induce panic, but it would perhaps be a wise time to check time-related logs?

-george

On Mon, Nov 19, 2012 at 6:08 PM, Wallace Keith <kwallace@pcconnection.com> wrote:

...

Just got paged with a pbx alarm that had 1970 as the year. By the time I logged in , it was showing 2012. Using GPS for time and date.

-----Original Message----- From: Mark Andrews [mailto:marka@isc.org] Sent: Monday, November 19, 2012 8:42 PM To: Van Wolfe Cc: nanog@nanog.org Subject: Re: NTP Issues Today

In message <CAMeggd4cDQwhxQE_JbvpNR-PKKe9LXqA+KzJ97anHFonjwZhdQ@mail.gmail.com> , Van Wolfe writes:

...
Hello,

Did anyone else experience issues with NTP today? We had our server times update to the year 2000 at around 3:30 MT, then revert back to 2012.

Thanks, Van

NTP should be immune from this sort of behaviour unless you did a ntpdate at the wrong moment. The clocks should have been marked as insane.

Mark -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org

-- -george william herbert george.herbert@gmail.com

Sid Rao

3:58 a.m.

We had multiple servers synchronized with Windows/MS time change their clock to the year 2000 today. It broke many things, including AD authentication.

These servers had been properly synchronized for years.

They were synchronized with Microsoft and NIST NTP servers.

This may not be isolated.

Sid Rao | CTI Group | +1 (317) 262-4677

On Nov 19, 2012, at 10:29 PM, "George Herbert" <george.herbert@gmail.com> wrote:

...

crossreplying to outages list.

Is anyone ELSE seeing GPS issues? This could well have been an unrelated issue on that particular PBX.

If this was real, then the mother of all infrastructure attacks might be underway...

One glitch on tick and tock and one malfunctioning PBX is not sufficient evidence of pattern - much less hostile activity - to induce panic, but it would perhaps be a wise time to check time-related logs?

-george

On Mon, Nov 19, 2012 at 6:08 PM, Wallace Keith <kwallace@pcconnection.com> wrote:

...
Just got paged with a pbx alarm that had 1970 as the year. By the time I logged in , it was showing 2012. Using GPS for time and date.

-----Original Message----- From: Mark Andrews [mailto:marka@isc.org] Sent: Monday, November 19, 2012 8:42 PM To: Van Wolfe Cc: nanog@nanog.org Subject: Re: NTP Issues Today

In message <CAMeggd4cDQwhxQE_JbvpNR-PKKe9LXqA+KzJ97anHFonjwZhdQ@mail.gmail.com> , Van Wolfe writes:

...
Hello,

Did anyone else experience issues with NTP today? We had our server times update to the year 2000 at around 3:30 MT, then revert back to 2012.

Thanks, Van

NTP should be immune from this sort of behaviour unless you did a ntpdate at the wrong moment. The clocks should have been marked as insane.

Mark -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org

-- -george william herbert george.herbert@gmail.com

Mike Lyon

4:17 a.m.

New subject: [outages] NTP Issues Today

Anyone check out the NIST GPS Archive?

http://www.nist.gov/pml/div688/grp40/gpsarchive.cfm

-Mike

On Mon, Nov 19, 2012 at 7:58 PM, Sid Rao <srao@ctigroup.com> wrote:

...

We had multiple servers synchronized with Windows/MS time change their clock to the year 2000 today. It broke many things, including AD authentication.

These servers had been properly synchronized for years.

They were synchronized with Microsoft and NIST NTP servers.

This may not be isolated.

Sid Rao | CTI Group | +1 (317) 262-4677

On Nov 19, 2012, at 10:29 PM, "George Herbert" <george.herbert@gmail.com> wrote:

...
crossreplying to outages list.

Is anyone ELSE seeing GPS issues? This could well have been an unrelated issue on that particular PBX.

If this was real, then the mother of all infrastructure attacks might be underway...

One glitch on tick and tock and one malfunctioning PBX is not sufficient evidence of pattern - much less hostile activity - to induce panic, but it would perhaps be a wise time to check time-related logs?

-george

On Mon, Nov 19, 2012 at 6:08 PM, Wallace Keith <kwallace@pcconnection.com> wrote:

...
Just got paged with a pbx alarm that had 1970 as the year. By the time I logged in , it was showing 2012. Using GPS for time and date.

-----Original Message----- From: Mark Andrews [mailto:marka@isc.org] Sent: Monday, November 19, 2012 8:42 PM To: Van Wolfe Cc: nanog@nanog.org Subject: Re: NTP Issues Today

In message < CAMeggd4cDQwhxQE_JbvpNR-PKKe9LXqA+KzJ97anHFonjwZhdQ@mail.gmail.com> , Van Wolfe writes:

...
Hello,

Did anyone else experience issues with NTP today? We had our server times update to the year 2000 at around 3:30 MT, then revert back to

...
...
...
Thanks, Van

NTP should be immune from this sort of behaviour unless you did a ntpdate at the wrong moment. The clocks should have been marked as insane.

Mark -- Mark Andrews, ISC 1 Seymour St., Dundas Valley, NSW 2117, Australia PHONE: +61 2 9871 4742 INTERNET: marka@isc.org

-- -george william herbert george.herbert@gmail.com

_______________________________________________ Outages mailing list Outages@outages.org https://puck.nether.net/mailman/listinfo/outages

-- Mike Lyon 408-621-4826 mike.lyon@gmail.com http://www.linkedin.com/in/mlyon

Seth Mattinen

5:58 p.m.

On 11/19/12 6:08 PM, Wallace Keith wrote:

...

Just got paged with a pbx alarm that had 1970 as the year. By the time I logged in , it was showing 2012. Using GPS for time and date.

I use GPS for my NTP server and didn't notice anything, but it's PPS disciplined after initial sync so it doesn't matter as long as the pulse keeps going.

ntp0# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          12 l   10   64  377    0.000    0.000   0.015
+216.171.124.36  .ACTS.           1 u  167 1024  377   26.801    2.387   0.015
+127.127.20.0    .GPS.            0 l   45   64  377    0.000   -0.048   0.015
o127.127.22.0    .PPS.            0 l   27   64  377    0.000   -0.048   0.015

~Seth

Leo Bicknell

4:38 p.m.

In a message written on Mon, Nov 19, 2012 at 04:21:55PM -0700, Van Wolfe wrote:

...

Did anyone else experience issues with NTP today? We had our server times update to the year 2000 at around 3:30 MT, then revert back to 2012.

I'm surprised the various time geeks aren't all posting their logs, so I'll kick off:

/tmp/parse-peerstats.pl peerstats.20121119
56250 76367.354 192.5.41.41 91b4 -378691200.312258363 0.088274002 0.014835425 0.263515353
56250 77391.354 192.5.41.41 91b4 -378691200.312258363 0.088274002 0.018668790 0.263749719
56250 78204.354 192.5.41.40 90b4 -378691200.785377324 0.088179350 0.014812585 0.263668835
56250 78416.355 192.5.41.41 91b4 -378691200.785974681 0.088312507 0.014832943 0.209966600
56250 79229.355 192.5.41.40 90b4 -378691200.785377324 0.088179350 0.018668723 378691200.785523713
56250 79442.355 192.5.41.41 91b4 -378691200.785974681 0.088312507 0.018689918 378691200.786114931

Or in more human readable form:

/tmp/parse-peerstats.pl peerstats.20121119
192.5.41.41 off by -378691200.312258363
192.5.41.41 off by -378691200.312258363
192.5.41.40 off by -378691200.785377324
192.5.41.41 off by -378691200.785974681
192.5.41.40 off by -378691200.785377324
192.5.41.41 off by -378691200.785974681

The script, if you want to run against your own stats:

#!/usr/bin/perl
# Print every peerstats line whose offset is more than 10 seconds out.
while (<>) {
    chomp;
    ($day, $second, $addr, $status, $offset, $delay, $disp, $skew) = split;
    if (($offset > 10) || ($offset < -10)) {
        # print "$addr off by $offset\n";   # More human friendly
        print "$_\n";                        # Full details
    }
}

It just looks for servers off by more than 10 seconds and then prints the line.

378691200 seconds is ~12 years, which lines up with the year 2000 dates some are reporting. The IPs are tick.usno.navy.mil and tock.usno.navy.mil.

I can confirm from my vantage point that tick and tock both went about 12 years wrong on Nov 19th for a bit. I can also report that my NTP server with sufficient sources correctly determined they were haywire and ignored them.

If your machines switched dates yesterday it probably means your NTP infrastructure is insufficiently peered and diversified.

-- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
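A minimal Python sketch of the same filter, for anyone who would rather not run Perl, together with the arithmetic behind the "~12 years" figure. The function name `wild_peers` is made up for illustration, and the field order is assumed to match the sample lines in the post (day, second, address, status, offset, delay, dispersion, skew).

```python
SECONDS_PER_DAY = 86400

def wild_peers(lines, threshold=10.0):
    """Yield (address, offset) for peerstats lines off by more than `threshold` seconds."""
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete peerstats record
        addr, offset = fields[2], float(fields[4])
        if abs(offset) > threshold:
            yield addr, offset

# Two of the lines from the post above.
sample = [
    "56250 76367.354 192.5.41.41 91b4 -378691200.312258363 0.088274002 0.014835425 0.263515353",
    "56250 78204.354 192.5.41.40 90b4 -378691200.785377324 0.088179350 0.014812585 0.263668835",
]

for addr, offset in wild_peers(sample):
    days = abs(offset) / SECONDS_PER_DAY
    print(f"{addr} off by {days:.0f} days (~{days / 365.25:.1f} years)")
```

378691200 seconds divides evenly into 4383 days, which is 12 years including three leap days, so the peers were reporting a time almost exactly 12 years in the past.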

Steve Meuse

5:02 p.m.

On Tue, Nov 20, 2012 at 11:38 AM, Leo Bicknell <bicknell@ufp.org> wrote:

...

If your machines switched dates yesterday it probably means your NTP infrastructure is insufficiently peered and diversified.

If you take anything away from this thread, this is it.... -Steve

Leo Bicknell

7 p.m.

After some private replies, I'm going to reply to my own post with some information here. It appears many people don't understand how the NTP protocol works.

I suspect many people have configured a "primary" and a "backup" NTP server on many of their devices. It turns out this is the _WORST_ possible configuration if you want accurate time:

http://support.ntp.org/bin/view/Support/SelectingOffsiteNTPServers#Section_5....

To protect against two falseticking servers (tick and tock, as we saw on the 19th) you need _FIVE_ servers minimum configured if they are both in the list. More importantly, if you want to protect against a source (GPS, CDMA, IRIG, WWV, ACTS, etc) false ticking, you need a minimum of _FOUR_ different source technologies in the list as well.

It's not hard: the box I posted the logs from peers with 18 servers using 8 source technologies, all freely available on the Internet...

-- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/
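The five-server figure is consistent with a simple majority argument: if a strict majority of sources must agree, tolerating f falsetickers takes at least 2f + 1 servers. The sketch below illustrates this with a simplified interval-intersection sweep in the style of Marzullo's algorithm. It is an illustration only, not ntpd's actual selection code, and the function names, the 50 ms error bound and the "good" offsets are assumptions made up for the example.

```python
def best_agreement(intervals):
    """Return (count, lo, hi): the most intervals that overlap, and where they overlap."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))   # interval opens
        edges.append((hi, +1))   # interval closes
    edges.sort()                 # at equal values, opens sort before closes
    best = count = 0
    best_lo = best_hi = None
    for i, (value, kind) in enumerate(edges):
        count -= kind            # +1 on an open, -1 on a close
        if count > best:
            # A new maximum is always reached on an open, so a next edge exists.
            best, best_lo, best_hi = count, value, edges[i + 1][0]
    return best, best_lo, best_hi

def as_intervals(offsets, err=0.05):
    """Turn offsets (seconds) into [offset - err, offset + err] intervals."""
    return [(o - err, o + err) for o in offsets]

good = [0.002, -0.001, 0.001]            # hypothetical well-behaved servers
bad = [-378691200.31, -378691200.78]     # offsets like those logged for tick and tock

five = best_agreement(as_intervals(good + bad))
two = best_agreement(as_intervals([good[0], bad[0]]))
print(five[0], "of 5 agree;", two[0], "of 2 agree")
```

With three good servers and two bad ones, the three good intervals overlap near zero and outvote the rest. With only a "primary" and a "backup", no two sources agree, so there is no way to tell which one is lying.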

Jay Ashworth

7:28 p.m.

----- Original Message -----

...

From: "Leo Bicknell" <bicknell@ufp.org>

...

To protect against two falseticking servers (tick and tock, as we saw on the 19th) you need _FIVE_ servers minimum configured if they are both in the list. More importantly, if you want to protect against a source (GPS, CDMA, IRIG, WWV, ACTS, etc) false ticking, you need a minimum of _FOUR_ different source technologies in the list as well.

It's not hard, my box that I posted the logs from peers with 18 servers using 8 source technologies, all freely available on the Internet...

I'm curious, Leo, what your internal setup looks like. Do you have an internal pair of masters, all slaved to those externals and one another, with your machines homed to them? Full mesh? Or something else?

In my last big gig, it was recommended to me that I have all the machines which had to speak to my DBMS NTP *to it*, and have only it connect to the rest of my NTP infrastructure. It coming unstuck was of less operational impact than *pieces of it* going out of sync with one another...

Cheers, -- jra

-- Jay R. Ashworth Baylink jra@baylink.com Designer The Things I Think RFC 2100 Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII St Petersburg FL USA #natog +1 727 647 1274

Jared Mauch

7:39 p.m.

On Nov 20, 2012, at 2:28 PM, Jay Ashworth <jra@baylink.com> wrote:

...

----- Original Message -----

...
From: "Leo Bicknell" <bicknell@ufp.org>

...
To protect against two falseticking servers (tick and tock, as we saw on the 19th) you need _FIVE_ servers minimum configured if they are both in the list. More importantly, if you want to protect against a source (GPS, CDMA, IRIG, WWV, ACTS, etc) false ticking, you need a minimum of _FOUR_ different source technologies in the list as well.

It's not hard, my box that I posted the logs from peers with 18 servers using 8 source technologies, all freely available on the Internet...

I'm curious, Leo, what your internal setup looks like. Do you have an internal pair of masters, all slaved to those externals and one another, with your machines homed to them? Full mesh? Or something else?

In my last big gig, it was recommended to me that I have all the machines which had to speak to my DBMS NTP *to it*, and have only it connect to the rest of my NTP infrastructure. It coming unstuck was of less operational impact than *pieces of it* going out of sync with one another...

here's a sample ntp config from one of my systems.

-- snip --
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server 3.fedora.pool.ntp.org
#
server 0.us.pool.ntp.org iburst maxpoll 9
server 1.us.pool.ntp.org iburst maxpoll 9
server 2.us.pool.ntp.org iburst maxpoll 9
server 129.250.35.250 iburst maxpoll 9
server 129.250.35.251 iburst maxpoll 9
-- snip --

You can audit its operation like this:

nat:~$ ntpq -p -n -c ass
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-129.250.35.250  164.244.221.197  2 u   68  512  377   19.248   -0.135   3.195
+129.250.35.251  192.5.41.40      2 u  439  512  377   41.817    1.109  15.660
-206.57.44.17    204.123.2.5      2 u  126  512  377   37.133   -6.443   9.631
+4.53.160.75     209.81.9.7       2 u   48  512  377   25.209    1.551   8.804
-64.73.32.135    192.5.41.41      2 u  349  512  377   23.418   -0.703   1.721
*50.116.38.157   64.250.177.145   2 u  380  512  377   43.021    1.267   2.136
+208.87.221.228  10.0.22.49       2 u  517  512  377   92.000    0.974   0.678
-206.212.242.132 128.252.19.1     2 u  323  512  377   21.781   -2.873   1.304
+38.229.71.1     204.123.2.72     2 u  211  512  377   21.977   -0.055   2.274

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 39973  931a   yes   yes  none   outlyer    sys_peer  1
  2 39974  941a   yes   yes  none candidate    sys_peer  1
  3 39975  9324   yes   yes  none   outlyer   reachable  2
  4 39976  942a   yes   yes  none candidate    sys_peer  2
  5 39977  931a   yes   yes  none   outlyer    sys_peer  1
  6 39978  961a   yes   yes  none  sys.peer    sys_peer  1
  7 39979  9414   yes   yes  none candidate   reachable  1
  8 39980  931a   yes   yes  none   outlyer    sys_peer  1
  9 39981  941a   yes   yes  none candidate    sys_peer  1

What you would have seen is a falseticker from the impacted clocks.

This is a fairly reasonable setup.
I've also been looking at an item like this: http://www.netburnerstore.com/ProductDetails.asp?ProductCode=PK70EX-NTP which is about $300 + misc parts. Should be well worth it to avoid a 'major outage' that some folks had with needing to reboot their servers, etc. - Jared

George Herbert

8:52 p.m.

On Nov 20, 2012, at 11:39 AM, Jared Mauch <jared@puck.nether.net> wrote:

...

I've also been looking at an item like this:

http://www.netburnerstore.com/ProductDetails.asp?ProductCode=PK70EX-NTP

which is about $300 + misc parts.

Should be well worth it to avoid a 'major outage' that some folks had with needing to reboot their servers, etc.

- Jared

Caution - that Netburner device is just GPS synced, so if GPS ever does go insane you're out of luck. It doesn't list a precision internal clock part. I am not sure what all is in the dev kit version, but I know the company owner and can ask if anyone cares.

George William Herbert
Sent from my iPhone

Leo Bicknell

8:15 p.m.

In a message written on Tue, Nov 20, 2012 at 02:28:19PM -0500, Jay Ashworth wrote:

...

I'm curious, Leo, what your internal setup looks like. Do you have an internal pair of masters, all slaved to those externals and one another, with your machines homed to them? Full mesh? Or something else?

My particular internal setup is a tad weird, and so rather than answer your question, I'm going to answer with some generalities. The right answer of course depends a lot on how important it is that boxes have the right time.

If you have 4 or more physical sites, I believe the right answer is to have on the order of 8 NTP servers. 2 each in 4 sites reaches the minimum nicely with redundancy. These boxes can have GPS, CDMA or other technologies if you want, but MUST peer with at least 10 stratum-1 sources outside of your network. Of course if you have more sites, one server in each of 8 sites is peachy. Those on a budget could probably get by with 4 servers total, but never less!

All "critical" devices should then be synced to the full set of internal servers. 4 boxes minimum, 8-10 preferred. NTP will only use the 10 best servers in its calculations, so there is a steep dropoff of diminishing returns beyond 10. For most ISPs I would include all routers in this list.

For the "non-critical" devices? Well, there it gets more complex. For most I would only configure one server, their default gateway router. Of course, pushing out a set of 4+ to them if that is easy is a great thing to do.

The interesting thing here is that no devices except for your NTP servers should ever peer with anything outside of your network. Why? Let's say your NTP servers all go crazy together. The outside world is cut off, GPS is spoofed, the world is ending. All that you have left is that all of your devices are in time with each other... so at least your logs still correlate and such. So having every device under your master set of NTP servers is important. One guy with an external peer may choose to use that, and leave the hive mind, so to speak.

For small players, less than 4 sites, typically just use the NTP pool servers, configuring 4 per box minimum. If you want the same protection I just outlined in the paragraph before, make 4 of your servers talk to the outside world, and make everything else talk to those.

Want to give back to the community? Get a GPS/CDMA/Whatever box and make it part of the NTP pool. Want to step up your game (which is what I do), reach out to various Stratum-1's on the net (or find free, open ones) and peer up 8-20 of them.
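One way to keep every client inside the "hive mind" is to generate client configs from a single list of internal servers and refuse to emit a config with fewer than four of them. The following is a rough sketch under that assumption; the function name `client_ntp_conf` and all hostnames are hypothetical, and only the `server ... iburst` directive itself is standard ntp.conf syntax.

```python
MIN_SERVERS = 4  # "4 servers total, but never less"

def client_ntp_conf(internal_servers):
    """Render ntp.conf server lines for a client that syncs only to internal servers."""
    servers = list(dict.fromkeys(internal_servers))  # drop duplicates, keep order
    if len(servers) < MIN_SERVERS:
        raise ValueError(
            f"need at least {MIN_SERVERS} internal servers, got {len(servers)}"
        )
    return "\n".join(f"server {host} iburst" for host in servers) + "\n"

# Hypothetical hostnames: two internal servers in each of two sites.
print(client_ntp_conf([
    "ntp1.site-a.example.net", "ntp2.site-a.example.net",
    "ntp1.site-b.example.net", "ntp2.site-b.example.net",
]))
```

Because the list contains only internal hosts, no client can wander off to an outside peer, which is the property the message argues for.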

...

In my last big gig, it was recommended to me that I have all the machines which had to speak to my DBMS NTP *to it*, and have only it connect to the rest of my NTP infrastructure. It coming unstuck was of less operational impact than *pieces of it* going out of sync with one another...

Yep, a prime example of the scenario I described above. Depending on your level of network redundancy, number of NTP servers, and so on, this is a fine solution. With one NTP server (the DBMS) the downstream will always use it, and stay in sync. It's a valid and good config in many situations. -- Leo Bicknell - bicknell@ufp.org - CCIE 3440 PGP keys at http://www.ufp.org/~bicknell/

Darius Jahandarie

9 p.m.

On Tue, Nov 20, 2012 at 3:15 PM, Leo Bicknell <bicknell@ufp.org> wrote:

...

For small players, less than 4 sites, typically just use the NTP pool servers, configuring 4 per box minimum. If you want the same protection I just outlined in the paragraph before, make 4 of your servers talk to the outside world, and make everything else talk to those. Want to give back to the community? Get a GPS/CDMA/Whatever

Choosing the first four servers is usually pretty straightforward: *.CC.pool.ntp.org

But beyond that, I'm honestly rather curious what server selections are a good idea. A first thought would be an adjacent country, but maybe there is a benefit to picking things outside of the pool.ntp.org selection entirely?

I see that Jared used *.fedora.pool.ntp.org -- I wonder if there was a specific reason for that or if my questions are even worth thinking about at all :-).

Happy to hear thoughts.

-- Darius Jahandarie

Mike Lyon

9:04 p.m.

I usually use time.nist.gov.

On Tue, Nov 20, 2012 at 1:00 PM, Darius Jahandarie <djahandarie@gmail.com> wrote:

...

On Tue, Nov 20, 2012 at 3:15 PM, Leo Bicknell <bicknell@ufp.org> wrote:

...
For small players, less than 4 sites, typically just use the NTP pool servers, configuring 4 per box minimum. If you want the same protection I just outlined in the paragraph before, make 4 of your servers talk to the outside world, and make everything else talk to those. Want to give back to the community? Get a GPS/CDMA/Whatever

Choosing the first four servers is usually pretty straightforward: *.CC.pool.ntp.org

But beyond that, I'm honestly rather curious what server selections are a good idea. A first thought would be an adjacent country, but maybe there is a benefit to picking things outside of the pool.ntp.org selection entirely?

I see that Jared used *.fedora.pool.ntp.org -- I wonder if there was a specific reason for that or if my questions are even worth thinking about at all :-).

Happy to hear thoughts.

-- Darius Jahandarie

-- Mike Lyon 408-621-4826 mike.lyon@gmail.com http://www.linkedin.com/in/mlyon

Jared Mauch

9:21 p.m.

On Nov 20, 2012, at 4:00 PM, Darius Jahandarie <djahandarie@gmail.com> wrote:

...

Choosing the first four servers is usually pretty straightforward: *.CC.pool.ntp.org

But beyond that, I'm honestly rather curious what server selections are a good idea. A first thought would be an adjacent country, but maybe there is a benefit to picking things outside of the pool.ntp.org selection entirely?

I see that Jared used *.fedora.pool.ntp.org -- I wonder if there was a specific reason for that or if my questions are even worth thinking about at all :-).

I'm by no means a time geek, but... I have some ideas about what you want and can tell you why I picked the settings I did...

1) The 129.250 ones are my employer run clocks. It is a good idea to know how accurate they are.

2) The pool ones, some were default (e.g.: fedora) from my OS distro on the machine I took the example from. You will see freebsd, centOS and others based on your settings. You may even see time.apple.com if you are on Mac OS.

3) CC ntp pool were selected to provide additional clock diversity.

4) You want low jitter to your clocks. This will allow you to have an accurate timing source. This means don't congest that path. If you want something very reliable, don't run it on a server with the other "misc" functions you need (e.g.: DNS, etc). If it's important, dedicate some hardware to it. If it is of passing importance, use a fair number of peers.

I was playing with the OWAMP software. Having consistent clocks is important for that (even if they are all off by a few ms). It can be fun to play with and measure things...

http://www.internet2.edu/performance/owamp/index.html

5) Monitor your NTP setup periodically. You may see clocks be rejected or outliers. Depending on how close your clocks are, you may see a fair number be unusable.

Take this output:

nat:~$ ntpq -n -p -c ass
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*129.250.35.250  164.244.221.197  2 u  507  512  377   18.883    0.196  18.311
+129.250.35.251  209.51.161.238   2 u  366  512  377   41.349    0.429   2.184
-206.57.44.17    204.123.2.5      2 u   91  512  377   35.884   -5.982   7.099
-4.53.160.75     209.81.9.7       2 u    5  512  377   24.250    1.522   1.353
+64.73.32.135    164.67.62.194    2 u  296  512  377   26.405   -0.956  11.244
+50.116.38.157   64.250.177.145   2 u  897 1024  377   42.978    0.685   1.211
-208.87.221.228  10.0.22.51       2 u  390  512  377   83.858   -2.717   0.814
-206.212.242.132 128.252.19.1     2 u  262  512  377   22.278   -1.640   1.150
+38.229.71.1     204.123.2.72     2 u   95  512  377   20.688    0.113   1.878

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 39973  961a   yes   yes  none  sys.peer    sys_peer  1
  2 39974  941a   yes   yes  none candidate    sys_peer  1
  3 39975  9324   yes   yes  none   outlyer   reachable  2
  4 39976  932a   yes   yes  none   outlyer    sys_peer  2
  5 39977  941a   yes   yes  none candidate    sys_peer  1
  6 39978  941a   yes   yes  none candidate    sys_peer  1
  7 39979  9314   yes   yes  none   outlyer   reachable  1
  8 39980  931a   yes   yes  none   outlyer    sys_peer  1
  9 39981  941a   yes   yes  none candidate    sys_peer  1

Only 5/9 clocks are 'candidate' for usage, or the actual reference clock. The jitter on the reference clock is nearly equal to the delay (!). This is on a business class internet link/tier, but from one of the 'usual suspects' that offers residential services as well. I haven't been able to find them operating any customer accessible clocks, but they may exist.

My config, or one resembling it, will give you a fair amount of diversity of clocks. Syncing to just one can easily result in being lied to and resetting the clock, as everyone who went back to 2000 observed.

- Jared
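The periodic check in point 5 can be automated by counting the tally character at the start of each `ntpq -p` peer line. Below is a minimal sketch, assuming the billboard layout shown in the message and treating `*` (sys.peer), `+` (candidate) and `o` (PPS peer) as usable. The function name `tally_counts` is made up for illustration, and the sample is the peer table from the message.

```python
from collections import Counter

USABLE = {"*", "+", "o"}  # sys.peer, candidate, PPS peer

def tally_counts(billboard):
    """Count peers by the tally character that starts each ntpq -p peer line."""
    counts = Counter()
    for line in billboard.splitlines():
        if not line.strip() or line.lstrip().startswith("remote") or line.startswith("="):
            continue  # blank line, column header, or separator
        # A line that starts directly with the address has no tally code.
        code = line[0] if not line[0].isalnum() else " "
        counts[code] += 1
    return counts

sample = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*129.250.35.250  164.244.221.197  2 u  507  512  377   18.883    0.196  18.311
+129.250.35.251  209.51.161.238   2 u  366  512  377   41.349    0.429   2.184
-206.57.44.17    204.123.2.5      2 u   91  512  377   35.884   -5.982   7.099
-4.53.160.75     209.81.9.7       2 u    5  512  377   24.250    1.522   1.353
+64.73.32.135    164.67.62.194    2 u  296  512  377   26.405   -0.956  11.244
+50.116.38.157   64.250.177.145   2 u  897 1024  377   42.978    0.685   1.211
-208.87.221.228  10.0.22.51       2 u  390  512  377   83.858   -2.717   0.814
-206.212.242.132 128.252.19.1     2 u  262  512  377   22.278   -1.640   1.150
+38.229.71.1     204.123.2.72     2 u   95  512  377   20.688    0.113   1.878
"""

counts = tally_counts(sample)
usable = sum(counts[c] for c in USABLE)
total = sum(counts.values())
print(f"{usable} of {total} peers usable")
```

On this sample the count comes to 5 of 9, matching the "5/9" figure in the message. An alert when the usable count drops below a majority of configured peers would be a natural next step.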

Jay Ashworth

9:53 p.m.

New subject: Picking outside NTP servers (Re: NTP Issues Today)

----- Original Message -----

...

From: "Darius Jahandarie" <djahandarie@gmail.com>

...

Choosing the first four servers is usually pretty straightforward: *.CC.pool.ntp.org

But beyond that, I'm honestly rather curious what server selections are a good idea. A first thought would be an adjacent country, but maybe there is a benefit to picking things outside of the pool.ntp.org selection entirely?

I see that Jared used *.fedora.pool.ntp.org -- I wonder if there was a specific reason for that or if my questions are even worth thinking about at all :-).

Ah; the question that has plagued mankind since the beginning of.. time. :-)

There are a couple of documents on this topic at ntp.org, and there's the traditional list -- of questionable accuracy at this point -- of open-access Strat 1 and 2 servers.

For myself, I usually pick the first three in us.pool.ntp.org, tick and tock, time.nist.gov, and a couple of regionally appropriate large universities.

I have always aimed for 6 to 8 outside servers, and a pair inside, preferably in different locations, both talking to one another.

If your site is in Internet Business, you should probably peer with your business partners. If you deal with Google Docs or AWS, you should probably peer with them, if they have servers for that.

Cheers, -- jra

-- Jay R. Ashworth Baylink jra@baylink.com Designer The Things I Think RFC 2100 Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII St Petersburg FL USA #natog +1 727 647 1274

George Herbert

10:07 p.m.

New subject: Picking outside NTP servers (Re: NTP Issues Today)

On Tue, Nov 20, 2012 at 1:53 PM, Jay Ashworth <jra@baylink.com> wrote:

...

.... For myself, I usually pick the first three in us.pool.ntp.org, tick and tock, time.nist.gov, and a couple of regionally appropriate large universities.

As this week indicated, perhaps tick and tock are not sufficiently far apart to be a good redundancy choice from a geographical failover point of view or common mode failure point of view. As part of a set of 8 servers as you indicate later, perhaps ok, but I fear for people who think "Ok, I want redundancy, so... Tick and Tock." Which, it turns out, was significant quantities. -- -george william herbert george.herbert@gmail.com

Majdi S. Abbas

11:22 p.m.

New subject: Picking outside NTP servers (Re: NTP Issues Today)

On Tue, Nov 20, 2012 at 04:53:39PM -0500, Jay Ashworth wrote:

...

For myself, I usually pick the first three in us.pool.ntp.org, tick and tock, time.nist.gov, and a couple of regionally appropriate large universities.

I'd advise going through the RR for a while, and pick servers close to you. ntpd won't select a server that's more than 128ms away. It also degrades accuracy.

Select for minimum latency, as well as a diverse set of sources. [Watch their refid over time, and make sure they aren't slaving to the same set of servers, as well as others you may be using.]

It requires a bit of effort, but over time you get an idea what public time servers are close to each of your locations, and diverse from each other.

--msa
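The refid check can be sketched as a grouping problem: collect (remote, refid) pairs from `ntpq -pn` and flag any refid that more than one of your peers is following. A minimal sketch follows; the function name `shared_refids` is made up for illustration. The first three pairs come from the ntpq output earlier in the thread, and the last pair is a hypothetical peer on a documentation address.

```python
from collections import defaultdict

def shared_refids(peers):
    """Map each refid followed by more than one peer to the peers that share it."""
    by_refid = defaultdict(list)
    for remote, refid in peers:
        by_refid[refid].append(remote)
    return {refid: remotes for refid, remotes in by_refid.items() if len(remotes) > 1}

peers = [
    ("129.250.35.251", "192.5.41.40"),
    ("64.73.32.135", "192.5.41.41"),
    ("206.57.44.17", "204.123.2.5"),
    ("192.0.2.10", "192.5.41.40"),   # hypothetical peer, not from the thread
]

print(shared_refids(peers))
```

Distinct refids are not a guarantee of independence: 192.5.41.40 and 192.5.41.41 are different addresses, yet both are the USNO servers that failed together, so it can help to group upstreams by operator as well.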

Ask Bjørn Hansen

21 Nov 2012

10:06 p.m.

On Nov 20, 2012, at 13:00, Darius Jahandarie <djahandarie@gmail.com> wrote:

Hi everyone,

I run the NTP Pool system - http://www.pool.ntp.org/ - so I have some opinions on some of this. :-)

...

But beyond that, I'm honestly rather curious what server selections are a good idea. A first thought would be an adjacent country, but maybe there is a benefit to picking things outside of the pool.ntp.org selection entirely?

First of all: None of the ~3800 servers in the NTP Pool system were affected by this as far as I can tell from the (copious) monitoring data.

The big benefit to adding some non-pool servers is that you wouldn't be depending basically on a bunch of volunteers (and to a large extent me) for your time keeping. Though likely you'd just be depending on another group of volunteers.

In addition to depending on the server operators who run the ntpd servers you also depend on:

1) The monitoring system keeping accurate time.
2) The monitoring system does its job catching bad servers.
3) The process updating and distributing the DNS data working.
4) The DNS servers working (and not being under a DoS attack or similar).
5) Anything I haven't thought of!

Empirically I believe we've done a better job than just about anyone with a similar scale, but past performance is no promise of the future.

...

I see that Jared used *.fedora.pool.ntp.org -- I wonder if there was a specific reason for that or if my questions are even worth thinking about at all :-).

The servers for x.fedora.pool.ntp.org are in the same "group" as x.pool.ntp.org. If you are in a country with many servers in the pool then you'll very likely get different IPs for the two. If you are in a country with few servers your odds for that aren't so good and it'd be a bit pointless. Anyone using the NTP Pool in a default configuration (like Fedora does) must get a "vendor zone" setup - http://www.pool.ntp.org/en/vendors.html - so we have at least a little bit of a chance to monitor and mitigate problems. It also allows us to change what servers are selected, how many IPs are returned etc for a particular vendor. For example if Fedora in the future changes to use 'pool' instead of 'server' in the configuration we could optimize for that. Ask -- http://askask.com/

Jimmy Hess

12:49 a.m.

On 11/19/12, Van Wolfe <vanwolfe@gmail.com> wrote:

...

Did anyone else experience issues with NTP today? We had our server times update to the year 2000 at around 3:30 MT, then revert back to 2012.

Are you sure that you are actually using NTP to set your clock? For you to sync with 2000, you should have had multiple confused peers from multiple time sources; possibly a false radio signal....

NTP by default has a panic threshold of 1000 seconds.

This _should_ have caused NTP to execute a panic shutdown, instead of setting the clock back 30 million seconds.

...

Thanks, Van -- -JH

Darius Jahandarie

12:56 a.m.

On Tue, Nov 20, 2012 at 7:49 PM, Jimmy Hess <mysidia@gmail.com> wrote:

...

Are you sure that you are actually using NTP to set your clock? For you to sync with 2000, you should have had multiple confused peers from multiple time sources; possibly a false radio signal....

NTP by default has a panic threshold of 1000 seconds.

This _should_ have caused NTP to execute a panic shutdown, instead of setting the clock back 30 million seconds.

For VMware at least, their official recommendation[1] for NTP is to

tinker panic 0

for suspend/resume reasons. I've seen it as the default in some places.

[1] http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427

-- Darius Jahandarie

Blake Dunlap

1:03 a.m.

That's what happens when you just follow vendor recommendations blindly. If you do follow that on vm's (which can actually be a good practice), make sure they pull from your own time infrastructure, and not just the world at large, and that those servers behave in a sane fashion with regard to time jumps.

George Herbert

1:14 a.m.

As a reminder - time infrastructure is not recommended for virtualization. Make them physicals.

-- -george william herbert george.herbert@gmail.com

Robert E. Seastrom

12:20 p.m.

Blake Dunlap <ikiris@gmail.com> writes:

...

That's what happens when you just follow vendor recommendations blindly. If you do follow that on vm's (which can actually be a good practice), make sure they pull from your own time infrastructure, and not just the world at large, and that those servers behave in a sane fashion with regard to time jumps.

Emphatically disagree on the "pull from your own infrastructure" point. You probably don't have the budget even in a big company for sufficient diversity of sources [*] for your NTP server and even if you do the NTP servers will probably be run by the same person/organization. Mills has called the latter practice out as bad in the past. As Leo pointed out, the key is having a large diverse set so that if a couple of them go nuts they can be voted off the island.

If you have a requirement for super low jitter or holdover if you lose network, you're looking at on-site devices with OCXO or Rb frequency standards in them. That doesn't mean you shouldn't be talking to the rest of the world too though. What if your on-site sources go nuts? This happens periodically, say every 10 years or so, because of crappy implementations and worst-current-practices.

A re-read of https://groups.google.com/forum/?fromgroups=#!search/mills$20ntp$20byzantine... may prove instructive. (reading list also includes http://www.amazon.com/dp/1439814635/ )

In my experience NTP beats out even DNS for "blatantly wrong configs in the wild that nevertheless seem to work well enough that dilettante tech folks don't notice".

I might have replied to this thread yesterday but I was blissfully unaware of any problems:

    rs@bifrost [8] % ntpq -c peers | egrep -v '(===|remote)' | wc -l
    11
    rs@bifrost [9] %

-r

[*] particularly due to shortsighted US federal government choices on LORAN, GOES, WWVB time format, etc
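The "voted off the island" idea can be illustrated with a toy filter. This is a deliberately simplified sketch, not the selection algorithm ntpd actually uses: it keeps the sources whose offset lies near the median. The source names, the offsets, and the `tolerance_s` parameter are all made up for the example.

```python
from statistics import median


def reject_outliers(offsets_s, tolerance_s=1.0):
    """Keep only sources whose offset is within tolerance_s of the median.

    offsets_s maps a source name to its measured offset in seconds.
    """
    mid = median(offsets_s.values())
    return {name: off for name, off in offsets_s.items()
            if abs(off - mid) <= tolerance_s}


# Hypothetical measurements: nine sane sources, two reporting the year 2000.
offsets = {f"sane{i}": 0.001 * i for i in range(9)}
offsets["bad1"] = -378_691_200.0
offsets["bad2"] = -378_691_200.0

survivors = reject_outliers(offsets)
print(sorted(survivors))  # only the nine sane sources remain

# With a single source there is nobody to outvote it.
print(reject_outliers({"bad1": -378_691_200.0}))
```

With eleven sources the two bad ones are outvoted; with a single source, or with several that all share one upstream, the bad value becomes the median and survives.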

Damian Menscher

12:59 a.m.

On Tue, Nov 20, 2012 at 4:49 PM, Jimmy Hess <mysidia@gmail.com> wrote:

...

...
On 11/19/12, Van Wolfe <vanwolfe@gmail.com> wrote:

Did anyone else experience issues with NTP today? We had our server times update to the year 2000 at around 3:30 MT, then revert back to 2012.

Are you sure that you are actually using NTP to set your clock? For you to sync with 2000, you should have had multiple confused peers from multiple time sources; possibly a false radio signal....

NTP by default has a panic threshold of 1000 seconds.

This _should_ have caused NTP to execute a panic shutdown, instead of setting the clock back 30 million seconds.

...

From logs various people have posted, it appears ntpd saw the excessive time shift and took the reasonable(?) step of killing itself. The OS detected ntpd's death and took the reasonable step of restarting it. On startup, ntpd can be reasonably(?) configured with the -g option to bypass the 1000s limit to set the starting time before going into the regular ntpd time adjustment code. In this case that would have set them back to 2000....

It's a good lesson on how a chain of reasonable decisions can lead to a bad outcome, so you really need to understand the interactions of the whole system.

Damian
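The chain of events described above can be walked through in a small simulation. It is a sketch under stated assumptions, not a model of any real init system or of ntpd internals: `run_daemon` and `supervise` are hypothetical names, clocks are plain second counters, and the supervisor is assumed to restart the daemon with the -g behaviour (one unrestricted initial step).

```python
PANIC_THRESHOLD_S = 1000  # the 1000s limit from the thread


class PanicExit(Exception):
    """Raised when the daemon refuses an offset and exits."""


def run_daemon(clock_s, server_s, allow_initial_step):
    """Simulate one daemon lifetime and return the resulting clock.

    allow_initial_step models a start with -g: the first correction may be
    of any size. Without it, a correction above the threshold is fatal.
    """
    offset = server_s - clock_s
    if abs(offset) > PANIC_THRESHOLD_S and not allow_initial_step:
        raise PanicExit(offset)
    return clock_s + offset


def supervise(clock_s, server_s):
    """Model a supervisor that restarts the daemon with -g after it exits."""
    events = []
    try:
        # Long-running daemon: its one-time -g allowance is already used up.
        clock_s = run_daemon(clock_s, server_s, allow_initial_step=False)
        events.append("in sync")
    except PanicExit:
        events.append("panic exit")
        clock_s = run_daemon(clock_s, server_s, allow_initial_step=True)
        events.append("restarted with -g, clock stepped")
    return clock_s, events


# Hypothetical counters: local clock is 378,691,200 s ahead of the bad server.
print(supervise(378_691_200, 0))
# (0, ['panic exit', 'restarted with -g, clock stepped'])
```

Each step is defensible on its own, yet the combination lets the bad time through, which is the point being made about whole-system interactions.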

Alvaro Pereira

1:01 a.m.

Looks like something bad has happened:

Behind the Random NTP Bizarreness of Incorrect Year Being Set
https://isc.sans.edu/diary.html?n&storyid=14548

---
"A few people have written in within the past 18 hours about their NTP server/clients getting set to the year 2000. The cause of this behavior is that an NTP server at the US Naval Observatory (pretty much the authoritative time source in the US) was rebooted and somehow reverted to the year 2000. This, then, propagated out for a limited time and downstream time sources also got this value. It's a transient problem and should already be rectified.

Not much really to report except an error at the top of the food chain causing problems to the layers below. If you have a problem, just fix the year or resync your NTP server.

Just goes to show how reliant NTP is that it is all but a "fire and forget" service once configured until "bad things happen".

John Bambenek"
---

Alvaro Pereira

Chuck Church

1:28 p.m.

-----Original Message-----

...

From: Jimmy Hess [mailto:mysidia@gmail.com]
Sent: Tuesday, November 20, 2012 7:50 PM
To: Van Wolfe
Cc: nanog@nanog.org
Subject: Re: NTP Issues Today

...

This _should_ have caused NTP to execute a panic shutdown, instead of setting the clock back 30 million seconds.

...

-- -JH

Sounds like SNTP might have been on the client. It doesn't do much, if any, sanity checking. Windows used to use that, and was more than happy to change the time by years if it received a bad time. Not sure if that is still the case.

Chuck

Greg Ihnen

1:50 p.m.

It sounds like the Navy and whoever else they partner with (NIST?) need some egress filtering on their NTP servers to catch and prevent events like this.


