Erroneous Leap Second Introduced at 2014-06-30 23:59:59 UTC


Tim Heckman

1 Jul 2014

12:33 a.m.

Hey Everyone,

I just was alerted to one of the systems I managed having a time skew greater than 100ms from NTP sources. Upon further investigation it seemed that the time was off by almost exactly 1 second.

Looking back over our NTP monitoring, it would appear that this system had a large time adjust at approximately 00:00 UTC:

- http://puu.sh/9Rs6O/a514ad7c97.png (times are in Pacific in these graphs, sorry about that)

A few of our systems did alert early this morning, indicating they were going to be receiving a leap second today. However, I was unable to determine the exact cause for NTP believing a leap second should be added. And after some time a few of the systems were no longer indicating that a leap second would be introduced.

This specific system is hosted in AWS US-WEST-2C and uses the 0.amazon.pool.ntp.org pool.

Has anyone else seen any erroneous leap seconds being added to their system?

Cheers!
-Tim Heckman


Majdi S. Abbas

1 Jul 2014

2:27 a.m.

On Mon, Jun 30, 2014 at 05:33:52PM -0700, Tim Heckman wrote:

...

I just was alerted to one of the systems I managed having a time skew greater than 100ms from NTP sources. Upon further investigation it seemed that the time was off by almost exactly 1 second.

Looking back over our NTP monitoring, it would appear that this system had a large time adjust at approximately 00:00 UTC:

Okay. Do you have any logging configured (peerstats, etc?) for ntpd?

...

A few of our systems did alert early this morning, indicating they were going to be receiving a leap second today. However, I was unable to determine the exact cause for NTP believing a leap second should be added. And after some time a few of the systems were no longer indicating that a leap second would be introduced.

This can happen if a server is either passing along a leap notification that it received, or is configured to use a leapseconds file that is incorrect.
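
For readers following along: the leap notification a server passes along travels in the 2-bit Leap Indicator (LI) field at the top of the first byte of every NTP packet header (RFC 5905). A minimal Python sketch that decodes that first byte; the example header bytes below are made up for illustration, not captured from any server in this thread.

```python
# Leap Indicator (LI) values as defined in RFC 5905, section 7.3.
LI_MEANINGS = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "unknown (clock unsynchronized)",
}

NTP_HEADER_LEN = 48  # fixed part of an NTP packet header, in bytes


def _first_byte(packet: bytes) -> int:
    if len(packet) < NTP_HEADER_LEN:
        raise ValueError("an NTP header is at least 48 bytes")
    return packet[0]


def leap_indicator(packet: bytes) -> int:
    """Top two bits of byte 0: the LI field."""
    return _first_byte(packet) >> 6


def version(packet: bytes) -> int:
    """Next three bits: the version number (VN)."""
    return (_first_byte(packet) >> 3) & 0x7


def mode(packet: bytes) -> int:
    """Low three bits: the association mode (4 = server)."""
    return _first_byte(packet) & 0x7


# Hypothetical NTPv4 server reply with LI=1 (insert a leap second): 0b01_100_100.
reply = bytes([0x64]) + bytes(47)
print(leap_indicator(reply), LI_MEANINGS[leap_indicator(reply)])
```

A client that trusts a single upstream will simply copy whatever LI that upstream sends, which is how one misconfigured server can propagate a phantom leap second.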

...

This specific system is hosted in AWS US-WEST-2C and uses the 0.amazon.pool.ntp.org pool.

0 is just one server in the pool (whichever you draw by rotation); is this the only server you have configured?

--msa

Daniël W. Crompton

10:52 a.m.

That's strange, as I remember reading this yesterday:

NO leap second will be introduced at the end of June 2014.

http://hpiers.obspm.fr/iers/bul/bulc/bulletinc.dat

D.

Oplerno is built upon empowering faculty and students
--
Daniël W. Crompton <daniel.crompton@gmail.com>
http://specialbrands.net/
http://twitter.com/webhat
http://www.facebook.com/webhat
http://plancast.com/webhat
http://www.linkedin.com/in/redhat

Tim Heckman

7:20 p.m.

On Mon, Jun 30, 2014 at 7:27 PM, Majdi S. Abbas <msa@latt.net> wrote:

...

On Mon, Jun 30, 2014 at 05:33:52PM -0700, Tim Heckman wrote:

...
I just was alerted to one of the systems I managed having a time skew greater than 100ms from NTP sources. Upon further investigation it seemed that the time was off by almost exactly 1 second.

Looking back over our NTP monitoring, it would appear that this system had a large time adjust at approximately 00:00 UTC:

Okay. Do you have any logging configured (peerstats, etc?) for ntpd?

Our systems all have loopstats and peerstats logging enabled. I have those log files available if interested. However, when I searched over the files I wasn't able to find anything that seemed to indicate this was the peer who told the system to introduce a leap second. That said, I might just not know what to look for in the logs.
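
Since loopstats is already being logged, one hypothetical way to spot the event Tim describes is to scan it for clock offsets close to one second. This sketch assumes the loopstats line layout from the ntpd monitoring documentation (MJD, seconds past UTC midnight, clock offset in seconds, then frequency, jitter, wander and time constant) and uses an arbitrary 100 ms tolerance; the sample lines are invented.

```python
from typing import Iterable, Iterator, Tuple


def near_one_second(
    lines: Iterable[str], tolerance: float = 0.1
) -> Iterator[Tuple[int, float, float]]:
    """Yield (mjd, seconds, offset) for loopstats samples whose offset
    magnitude is within `tolerance` seconds of exactly 1 s."""
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        mjd, secs, offset = int(fields[0]), float(fields[1]), float(fields[2])
        # Check the magnitude only, so a step in either direction is caught.
        if abs(abs(offset) - 1.0) <= tolerance:
            yield mjd, secs, offset


# Hypothetical loopstats lines around 2014-07-01 00:00 UTC (MJD 56839).
sample = [
    "56838 86300.000 0.000210 12.345 0.000300 0.010 6",
    "56839 64.000 -0.998765 12.345 0.000300 0.010 6",
]
for hit in near_one_second(sample):
    print(hit)
```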

...

...
A few of our systems did alert early this morning, indicating they were going to be receiving a leap second today. However, I was unable to determine the exact cause for NTP believing a leap second should be added. And after some time a few of the systems were no longer indicating that a leap second would be introduced.

This can happen if a server is either passing along a leap notification that it received, or is configured to use a leapseconds file that is incorrect.

Correct, I was hoping to determine which peer it was so I can reach out to them to make sure this doesn't bleed in to the pool at the end of the year. I was also more-or-less curious how wide-spread of an issue this was, but I'm starting to think I may have been the only person to catch it in the act. :)

...

...
This specific system is hosted in AWS US-WEST-2C and uses the 0.amazon.pool.ntp.org pool.

0 is just one server in the pool (whichever you draw by rotation); is this the only server you have configured?

We use 0.amazon.pool.ntp.org, 1.amazon.pool.ntp.org, and 2.amazon.pool.ntp.org. As with the other widely-used pool hostnames, each of these is a round-robin DNS entry with 4 hosts and a TTL of 150s.

...

--msa

Thank you for getting back to me. Cheers! -Tim

Majdi S. Abbas

7:35 p.m.

On Tue, Jul 01, 2014 at 12:20:12PM -0700, Tim Heckman wrote:

...

Our systems all have loopstats and peerstats logging enabled. I have those log files available if interested. However, when I searched over the files I wasn't able to find anything that seemed to indicate this was the peer who told the system to introduce a leap second. That said, I might just not know what to look for in the logs.

Look at the status word in peerstats; if the high bit is set, that's your huckleberry. See: http://www.eecis.udel.edu/~mills/ntp/html/decode.html
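
A rough sketch of that check, under stated assumptions: it assumes the peerstats line layout from the ntpd monitoring documentation (MJD, seconds past UTC midnight, source address, hex status word, offset, delay, dispersion, jitter) and the peer status word layout from the decode page above, where the low 4 bits hold the most recent peer event code and code 9 is leap_armed. It keys on that event code rather than on the high bit. The sample lines and addresses are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator

MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)  # Modified Julian Day 0
LEAP_ARMED = 0x9  # peer event code 9 ("leap_armed") per the decode page


@dataclass
class PeerStat:
    when: datetime
    source: str
    status: int
    offset: float

    @property
    def event_code(self) -> int:
        # Low nibble of the peer status word: most recent event code.
        return self.status & 0xF

    @property
    def event_count(self) -> int:
        # Next nibble up: count of events since the code last changed.
        return (self.status >> 4) & 0xF


def parse_peerstats_line(line: str) -> PeerStat:
    fields = line.split()
    mjd, secs = int(fields[0]), float(fields[1])
    when = MJD_EPOCH + timedelta(days=mjd, seconds=secs)
    return PeerStat(when, fields[2], int(fields[3], 16), float(fields[4]))


def leap_armed_entries(lines: Iterable[str]) -> Iterator[PeerStat]:
    """Yield every peerstats record whose last event was leap_armed."""
    for line in lines:
        if not line.strip():
            continue
        stat = parse_peerstats_line(line)
        if stat.event_code == LEAP_ARMED:
            yield stat


sample = [
    "56838 43200.000 192.0.2.10 9619 0.000123 0.025 0.001 0.0002",
    "56838 43260.000 192.0.2.11 9614 -0.000050 0.030 0.001 0.0003",
]
for stat in leap_armed_entries(sample):
    print(stat.when.isoformat(), stat.source)
```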

...

Correct, I was hoping to determine which peer it was so I can reach out to them to make sure this doesn't bleed in to the pool at the end of the year. I was also more-or-less curious how wide-spread of an issue this was, but I'm starting to think I may have been the only person to catch it in the act. :)

You might want to upgrade to current 4.2.7 development code, wherein a majority rule is used to qualify the leap indicator. Cheers, --msa
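
The thread does not show how ntpd's majority rule is actually written. The sketch below is only a generic strict-majority vote over peers' LI values, meant to illustrate why a single misbehaving server (say, one of the four hosts behind a pool name) would not be enough to arm a leap second on its own. It is not ntpd's implementation, and excluding LI=3 (unsynchronized) peers from the vote is an assumption.

```python
from collections import Counter
from typing import Iterable

# LI values from RFC 5905.
NO_WARNING, INSERT, DELETE, UNSYNC = 0, 1, 2, 3


def vote_leap(indicators: Iterable[int]) -> int:
    """Arm a leap second only if a strict majority of voting peers agree.

    Assumption: unsynchronized peers (LI=3) do not get a vote.
    """
    votes = [li for li in indicators if li != UNSYNC]
    if not votes:
        return NO_WARNING
    li, count = Counter(votes).most_common(1)[0]
    if li in (INSERT, DELETE) and count * 2 > len(votes):
        return li
    return NO_WARNING


# One rogue peer out of four is outvoted; three out of four wins.
print(vote_leap([INSERT, NO_WARNING, NO_WARNING, NO_WARNING]))
print(vote_leap([INSERT, INSERT, INSERT, NO_WARNING]))
```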

Tim Heckman

2 Jul 2014

2:19 a.m.

On Tue, Jul 1, 2014 at 12:35 PM, Majdi S. Abbas <msa@latt.net> wrote:

...

On Tue, Jul 01, 2014 at 12:20:12PM -0700, Tim Heckman wrote:

...
Our systems all have loopstats and peerstats logging enabled. I have those log files available if interested. However, when I searched over the files I wasn't able to find anything that seemed to indicate this was the peer who told the system to introduce a leap second. That said, I might just not know what to look for in the logs.

Look at the status word in peerstats; if the high bit is set, that's your huckleberry.

See: http://www.eecis.udel.edu/~mills/ntp/html/decode.html

I've taken a look at all of the peerstats available for this host, and surprisingly none of them show code 09 (leap_armed). I'm also fairly certain that I know when some of my systems armed the leap second (within a 60-120s window) based on our monitoring, and around those times everything seems normal according to peerstats.

I am running Ubuntu 10.04 on this box, which is ntp v4.2.4p8. I'll need to look into whether the printing of this flag was added later; otherwise, it would seem some of my systems picked up a phantom leap second from an unknown source, with one of them actually executing it.

Thanks for the decoder ring. My Google-fu wasn't hitting the right keywords.

...

...
Correct, I was hoping to determine which peer it was so I can reach out to them to make sure this doesn't bleed in to the pool at the end of the year. I was also more-or-less curious how wide-spread of an issue this was, but I'm starting to think I may have been the only person to catch it in the act. :)

You might want to upgrade to current 4.2.7 development code, wherein a majority rule is used to qualify the leap indicator.

We have some system refreshes coming up soon, so that may be something we'll need to look at. I didn't realize this was happening as part of the 4.2.7 development branch. Definitely an interesting feature, especially after this. :p

...

Cheers,

--msa

Thanks again, Majdi. Cheers! -Tim


participants (3)

Daniël W. Crompton
Majdi S. Abbas
Tim Heckman