On Nov 20, 2012, at 2:28 PM, Jay Ashworth <jra@baylink.com> wrote:
----- Original Message -----
From: "Leo Bicknell" <bicknell@ufp.org>
To protect against two falseticking servers (tick and tock, as we saw on the 19th) you need _FIVE_ servers minimum configured if they are both in the list. More importantly, if you want to protect against a source (GPS, CDMA, IRIG, WWV, ACTS, etc.) falseticking, you need a minimum of _FOUR_ different source technologies in the list as well.
It's not hard; the box I posted the logs from peers with 18 servers using 8 source technologies, all freely available on the Internet...
I'm curious, Leo, what your internal setup looks like. Do you have an internal pair of masters, all slaved to those externals and one another, with your machines homed to them? Full mesh? Or something else?
In my last big gig, it was recommended to me that I have all the machines which had to speak to my DBMS NTP *to it*, and have only it connect to the rest of my NTP infrastructure. It coming unstuck was of less operational impact than *pieces of it* going out of sync with one another...
Here's a sample ntp config from one of my systems.

-- snip --
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 0.fedora.pool.ntp.org
server 1.fedora.pool.ntp.org
server 2.fedora.pool.ntp.org
server 3.fedora.pool.ntp.org
#
server 0.us.pool.ntp.org iburst maxpoll 9
server 1.us.pool.ntp.org iburst maxpoll 9
server 2.us.pool.ntp.org iburst maxpoll 9
server 129.250.35.250 iburst maxpoll 9
server 129.250.35.251 iburst maxpoll 9
-- snip --

You can audit its operation like this:

nat:~$ ntpq -p -n -c ass
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-129.250.35.250  164.244.221.197  2 u   68  512  377   19.248   -0.135   3.195
+129.250.35.251  192.5.41.40      2 u  439  512  377   41.817    1.109  15.660
-206.57.44.17    204.123.2.5      2 u  126  512  377   37.133   -6.443   9.631
+4.53.160.75     209.81.9.7       2 u   48  512  377   25.209    1.551   8.804
-64.73.32.135    192.5.41.41      2 u  349  512  377   23.418   -0.703   1.721
*50.116.38.157   64.250.177.145   2 u  380  512  377   43.021    1.267   2.136
+208.87.221.228  10.0.22.49       2 u  517  512  377   92.000    0.974   0.678
-206.212.242.132 128.252.19.1     2 u  323  512  377   21.781   -2.873   1.304
+38.229.71.1     204.123.2.72     2 u  211  512  377   21.977   -0.055   2.274

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 39973  931a   yes   yes  none   outlyer    sys_peer  1
  2 39974  941a   yes   yes  none candidate    sys_peer  1
  3 39975  9324   yes   yes  none   outlyer   reachable  2
  4 39976  942a   yes   yes  none candidate    sys_peer  2
  5 39977  931a   yes   yes  none   outlyer    sys_peer  1
  6 39978  961a   yes   yes  none  sys.peer    sys_peer  1
  7 39979  9414   yes   yes  none candidate   reachable  1
  8 39980  931a   yes   yes  none   outlyer    sys_peer  1
  9 39981  941a   yes   yes  none candidate    sys_peer  1

What you would have seen is a falseticker from the impacted clocks. This is a fairly reasonable setup.
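If you want to check this from a script rather than by eye, something like the following rough sketch could work. It reads the `ntpq -p -n` peer listing shown above and counts peers by the tally character in the first column. The names `TALLY`, `SAMPLE` and `parse_peers` are made up for illustration, the tally meanings are as I understand them from the ntpq documentation, and the sample rows are copied from the output above. It assumes the usual ten-column layout.

```python
from collections import Counter

# Tally character in column 0 of each peer row (per the ntpq documentation).
TALLY = {
    "*": "sys.peer",
    "+": "candidate",
    "-": "outlier",
    "x": "falseticker",
    ".": "excess",
    "#": "backup",
    "o": "pps.peer",
    " ": "reject",
}

# Peer rows copied from the ntpq output quoted in this message.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-129.250.35.250  164.244.221.197  2 u   68  512  377   19.248   -0.135   3.195
+129.250.35.251  192.5.41.40      2 u  439  512  377   41.817    1.109  15.660
-206.57.44.17    204.123.2.5      2 u  126  512  377   37.133   -6.443   9.631
+4.53.160.75     209.81.9.7       2 u   48  512  377   25.209    1.551   8.804
-64.73.32.135    192.5.41.41      2 u  349  512  377   23.418   -0.703   1.721
*50.116.38.157   64.250.177.145   2 u  380  512  377   43.021    1.267   2.136
+208.87.221.228  10.0.22.49       2 u  517  512  377   92.000    0.974   0.678
-206.212.242.132 128.252.19.1     2 u  323  512  377   21.781   -2.873   1.304
+38.229.71.1     204.123.2.72     2 u  211  512  377   21.977   -0.055   2.274
"""


def parse_peers(text):
    """Return one dict per peer row: remote address, condition, offset in ms."""
    peers = []
    for line in text.splitlines():
        # Skip blank lines, the column header, and the "=====" ruler.
        if not line.strip() or line.lstrip().startswith(("remote", "=")):
            continue
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue  # not a peer row in the assumed ten-column layout
        peers.append({
            "remote": fields[0],
            "condition": TALLY.get(tally, "unknown"),
            "offset_ms": float(fields[8]),
        })
    return peers


if __name__ == "__main__":
    counts = Counter(p["condition"] for p in parse_peers(SAMPLE))
    for condition, n in sorted(counts.items()):
        print(f"{condition}: {n}")
    if counts["falseticker"]:
        print("WARNING: at least one peer is marked as a falseticker")
```

On a box that had been peering with the impacted clocks, the warning line is what you would want to alert on. In practice you would feed it the live output of `ntpq -p -n` instead of the pasted sample.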
I've also been looking at an item like this:

http://www.netburnerstore.com/ProductDetails.asp?ProductCode=PK70EX-NTP

which is about $300 + misc parts. Should be well worth it to avoid a 'major outage' that some folks had with needing to reboot their servers, etc.

- Jared