So, probably not a failure "caused by GPS", rather one caused by poor design (only two clock sources) combined with unsupported and buggy devices.
100% correct. From the PDF:
4.31 JT summarised its findings in relation to the ‘Panic Timer’ on the Cisco IOS XR NTP Client, namely that: JT’s efforts in understanding the root cause, and mitigation steps to take to avoid future incidents have focused on the Cisco NTP Client behaviour, and notably Cisco’s decision to not implement the ‘Panic Timer’ on their IOS XR operating system. Arguably, whilst the NTP server injected an invalid time into the network, it is the NTP Clients filtering and selection algorithms which are responsible for detecting and disregarding falsetickers, and it was the Cisco NTP Clients failure to appropriately handle this which triggered the network incident. 43 […] Further detailed soak testing, log analysis and debug analysis corroborated that the Cisco IOS XR NTP Client did not implement the ‘Panic Timer’ that would normally cause an NTP Client to ignore an NTP Server exceeding 1000 seconds variance.
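To make the 'Panic Timer' idea concrete, here is a minimal sketch of the kind of sanity check the report describes. It is not Cisco's or ntpd's actual code; the function name and the 20:00 UTC time of day are my assumptions, and only the 1000 second figure and the two dates come from the report.

```python
from datetime import datetime, timezone

# Threshold quoted in the report; the name is illustrative only.
PANIC_THRESHOLD_S = 1000.0


def accept_offset(offset_seconds, panic_threshold=PANIC_THRESHOLD_S):
    """Return True if a measured server offset is small enough to act on."""
    return abs(offset_seconds) <= panic_threshold


# Time of day is assumed; the report only gives the dates.
true_time = datetime(2020, 7, 12, 20, 0, tzinfo=timezone.utc)
bad_time = datetime(2000, 11, 27, 20, 0, tzinfo=timezone.utc)

# Offset a client would have measured against the faulty server.
offset = (bad_time - true_time).total_seconds()

print(accept_offset(offset))  # the faulty server's jump of roughly 19.6 years
print(accept_offset(0.5))     # an ordinary sub-second correction
```

A client with a check like this would have refused the jump back to 2000 rather than stepping its clock.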
On Wed, Aug 16, 2023 at 10:50 AM Mel Beckman <mel@beckman.org> wrote:
So, probably not a failure "caused by GPS", rather one caused by poor design (only two clock sources) combined with unsupported and buggy devices.
-mel beckman
On Aug 16, 2023, at 3:51 AM, Matthew Richardson <matthew-l@itconsult.co.uk> wrote:
Mel Beckman wrote:-
Do you have a citation for your Jersey event? I doubt GPS caused the problem, but I'd like to see the documentation.
The event took place on the evening of Sunday 12 July 2020, and seems NOT to have been due to an issue caused directly by GPS, but rather to misbehaviour of a GPS NTP server relating to week numbers. Our regulator subsequently issued the following comprehensive document:-
https://www.jcra.je/media/598397/t-027-jt-july-2020-outage-decision-directio...
By way of summary, JT operated two GPS-derived NTP servers, with all of their routers pointing to both. On the evening in question, one of the two reset its clock back to 27 November 2000.
Their interior routing protocol, used amongst their mesh of routers, was IS-IS with authentication. The authentication [section 4.19] was described as having a "password validity start date" of 01 July 2012. Thus, any routers which had picked up the time from the faulty source no longer had valid IS-IS authentication and were isolated.
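As a rough illustration of why the bad time broke authentication, the check below models a key with a validity start date. It is a sketch only: the function and variable names are mine, the time of day is assumed, and real IS-IS key-chain handling is vendor specific.

```python
from datetime import datetime, timezone

# "Password validity start date" from section 4.19 of the report.
KEY_VALID_FROM = datetime(2012, 7, 1, tzinfo=timezone.utc)


def key_is_valid(router_clock, valid_from=KEY_VALID_FROM, valid_until=None):
    """Return True if the router's own clock falls inside the key's lifetime."""
    if router_clock < valid_from:
        return False
    return valid_until is None or router_clock <= valid_until


# Time of day is assumed; the report only gives the dates.
good_clock = datetime(2020, 7, 12, 20, 0, tzinfo=timezone.utc)
bad_clock = datetime(2000, 11, 27, 20, 0, tzinfo=timezone.utc)

print(key_is_valid(good_clock))  # router that kept the correct time
print(key_is_valid(bad_clock))   # router that followed the faulty server
```

A router that believes it is November 2000 sees a key that does not become valid for another eleven years, so it cannot authenticate to its neighbours.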
Whilst only 15% of their routers were affected, this was enough to cause an almost total failure in their network, affecting telephony (fixed & mobile) and Internet. For foreign readers (this is NANOG!) "999" calls refer to the emergency services in these parts, where any failures attract the attention of our regulator.
The details of why the clock "failed" start at section 4.23, and seem to relate to a GPS week number rollover.
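For anyone unfamiliar with the rollover: the legacy GPS navigation message carries the week number in a 10-bit field, so it wraps every 1024 weeks, and a receiver that picks the wrong 1024-week era reports a date about 19.6 years out. The arithmetic below is only a sketch of that effect (the function name is mine), not a reconstruction of what the JT server's firmware actually did.

```python
from datetime import date, timedelta

WEEK_BITS = 10                    # width of the legacy GPS week-number field
ROLLOVER_WEEKS = 2 ** WEEK_BITS   # 1024 weeks per era


def misread_date(true_date, rollovers=1):
    """Date a receiver would report if it is wrong by whole 1024-week eras."""
    return true_date - timedelta(weeks=ROLLOVER_WEEKS * rollovers)


incident = date(2020, 7, 12)
print(misread_date(incident))  # 2000-11-26
```

One era back from 12 July 2020 lands within a day of the 27 November 2000 quoted in the report, which is consistent with a rollover-type fault.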
So, probably not a failure "caused by GPS", rather one caused by poor design (only two clock sources) combined with unsupported and buggy devices.
One curious aspect is that some routers followed the "bad" time, which is alluded to in section 4.31.
Something not discussed in that report is that JT's email failed during the incident despite its being hosted on Office365. The reason was that the two authoritative DNS servers for jtglobal.com were hosted in Jersey inside their network. As that network was wholly disconnected, there was no DNS and hence no email. Despite my having raised this since with their senior management, their DNS remains hosted in this way:-
matthew@m88:~$ dig +norec +noedns +nocmd +nostats -t ns jtglobal.com @ns1.jtibs.net
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20462
;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 4
;; QUESTION SECTION:
;jtglobal.com. IN NS
;; ANSWER SECTION:
jtglobal.com. 60 IN NS ns2.jtibs.net.
jtglobal.com. 60 IN NS ns1.jtibs.net.
;; ADDITIONAL SECTION:
ns1.jtibs.net. 60 IN A 212.9.0.135
ns2.jtibs.net. 60 IN A 212.9.0.136
ns1.jtibs.net. 60 IN AAAA 2a02:c28::d1
ns2.jtibs.net. 60 IN AAAA 2a02:c28::d2
Ridiculously (and again despite my agitation to their management) our government domain gov.je has similar DNS fragility:-
matthew@m88:~$ dig +norec +noedns +nocmd +nostats -t ns gov.je @ns1.gov.je
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4249
;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; QUESTION SECTION:
;gov.je. IN NS
;; ANSWER SECTION:
gov.je. 3600 IN NS ns2.gov.je.
gov.je. 3600 IN NS ns1.gov.je.
;; ADDITIONAL SECTION:
ns2.gov.je. 3600 IN A 212.9.21.137
ns1.gov.je. 3600 IN A 212.9.21.9
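A crude way to spot this sort of fragility is to check whether all of a zone's nameserver addresses fall in the same prefix. The sketch below uses the IPv4 addresses from the two dig outputs above; the /24 grouping is my own rough heuristic (a proper check would compare origin ASNs and physical location), and the function name is illustrative.

```python
import ipaddress


def shared_prefixes(addresses, prefix_len=24):
    """Return the set of distinct networks the given addresses fall into."""
    return {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }


# IPv4 glue records taken from the dig output above.
jtglobal_ns = ["212.9.0.135", "212.9.0.136"]
gov_je_ns = ["212.9.21.137", "212.9.21.9"]

for zone, addrs in (("jtglobal.com", jtglobal_ns), ("gov.je", gov_je_ns)):
    nets = shared_prefixes(addrs)
    status = "single prefix" if len(nets) == 1 else "diverse"
    print(zone, sorted(str(n) for n in nets), status)
```

Both zones come back with a single /24, i.e. no network diversity between the two nameservers at that granularity.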
-- Best wishes, Matthew
------
From: Mel Beckman <mel@beckman.org>
To: Matthew Richardson <matthew-l@itconsult.co.uk>
Cc: Nanog <nanog@nanog.org>
Date: Tue, 8 Aug 2023 15:12:29 +0000
Subject: Re: NTP Sync Issue Across Tata (Europe)
Until the Internet NTP network can be made secure, no. Do you have a citation for your Jersey event? I doubt GPS caused the problem, but I'd like to see the documentation.
Using GPS for time sync is simple risk management: the risk of Internet NTP with known, well documented vulnerabilities and many security incidents, versus the risk of some theoretical GPS-based vulnerability, for which mitigations such as geographic diversity are readily available. Sure, you could use Internet NTP as a last resort should GPS fail globally (perhaps due to a theoretical - but conceivable - meteor storm). But that would be a fall-back. I would not mix the systems.
-mel
On Aug 8, 2023, at 1:36 AM, Matthew Richardson <matthew-l@itconsult.co.uk> wrote:
Mel Beckman wrote:-
It's a problem that has received a lot of attention in both NTP and
aviation navigation circles. What is hard to defend against is total signal
suppression via high powered jamming. But that you can do with a
geographically diverse GPS NTP network.
Whilst looking forward to being corrected, GPS (even across multiple
locations) seems to be a SINGLE source of time. You seem (have I
misunderstood?) to be a proponent of using GPS exclusively as the external
clock source.
Might it be preferable to have a mixture of GPS (perhaps with another GNSS)
together with carefully selected Internet-based NTP servers?
I recall an incident over here in Jersey (the one they named New Jersey
after!) where our primary telco had a substantial time shift on one of
their two GPS synced servers. This managed to adjust the clock on enough
of their routers that the certificate-based OSPF authentication considered
the certificates invalid, and caused a failure of almost their whole
network.
This is, of course, not to say that GPS is not a very good clock source,
but rather to wonder whether more diversity would be preferable than using
it as a single source.
--
Best wishes,
Matthew
------
From: Mel Beckman <mel@beckman.org>
To: "Forrest Christian (List Account)" <lists@packetflux.com>
Cc: Nanog <nanog@nanog.org>
Date: Mon, 7 Aug 2023 14:03:30 +0000
Subject: Re: NTP Sync Issue Across Tata (Europe)
Forrest,
GPS spoofing may work with a primitive Raspberry Pi-based NTP server, but commercial industrial NTP servers have specific anti-spoofing mitigations. There are also antenna diversity strategies that vendors support to ensure the signal being relied upon is coming from the right direction. It's a problem that has received a lot of attention in both NTP and aviation navigation circles. What is hard to defend against is total signal suppression via high powered jamming. But that you can do with a geographically diverse GPS NTP network.
-mel
On Aug 7, 2023, at 1:39 AM, Forrest Christian (List Account) <lists@packetflux.com> wrote:
The problem with relying exclusively on GPS to do time distribution is the ease with which one can spoof the GPS signals.
With a budget of around $1K, not including a laptop, anyone with decent technical skills could convince a typical GPS receiver it was at any position and was at any time in the world. All it takes is a decent directional antenna, some SDR hardware, and depending on the location and directivity of your antenna maybe a smallish amplifier. There is much discussion right now in the PNT (Position, Navigation and Timing) community as to how best to secure the GNSS network, but right now one should consider the data from GPS to be no more trustworthy than some random NTP server on the internet.
In order to build a resilient NTP server infrastructure you need multiple sources of time distributed by multiple methods - typically both via satellite (GPS) and by terrestrial (NTP) methods. NTP does a pretty good job of sorting out multiple time servers and discarding sources that are lying. But to do this you need multiple time sources. A common recommendation is to run a couple/few NTP servers which only get time from a GPS receiver and only serve time to a second tier of servers that pull from both those in-house GPS-timed-NTP servers and other trusted NTP servers. I'd recommend selecting the time servers to gain geographic diversity, e.g. poll NIST servers in Maryland or Colorado, and possibly both.
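To illustrate why the number of sources matters, here is a deliberately simplified outlier filter. It is not NTP's actual intersection/selection algorithm, just a median-based stand-in; the source labels, the offsets, and the 0.128 s tolerance are all made up for the example.

```python
from statistics import median


def select_truechimers(offsets, tolerance=0.128):
    """Keep sources whose offset (seconds) is within tolerance of the median.

    Simplified stand-in for NTP's selection step, for illustration only.
    """
    mid = median(offsets.values())
    return {name: off for name, off in offsets.items()
            if abs(off - mid) <= tolerance}


# Hypothetical offsets: one GPS server has jumped back roughly 19.6 years.
four_sources = {
    "gps-a": 0.002,
    "gps-b": -619_228_800.0,
    "nist-md": 0.004,
    "nist-co": -0.003,
}
two_sources = {"gps-a": 0.002, "gps-b": -619_228_800.0}

print(sorted(select_truechimers(four_sources)))  # ['gps-a', 'nist-co', 'nist-md']
print(sorted(select_truechimers(two_sources)))   # []
```

With four sources the liar is out-voted and dropped. With only two, nothing can tell which one is lying, so this filter rejects both.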
Note that NIST will exchange a set of keys with you (via mail) so that you can use authenticated NTP with their servers. See https://www.nist.gov/pml/time-and-frequency-division/time-services/nist-auth... .
On Sun, Aug 6, 2023 at 8:36 PM Mel Beckman <mel@beckman.org> wrote:
GPS Selective Availability did not disrupt the timing chain of GPS, only the ephemeris (position information). But a government-disrupted timebase scenario has never occurred, while hackers are a documented threat.
DNS has DNSSEC, which, while not deployed as broadly as we might like, at least lets us know which servers we can trust.
Your own atomic clocks still have to be synced to a common standard to be useful. To what are they sync'd? GPS, I'll wager.
I sense hand-waving :)
-mel via cell
On Aug 6, 2023, at 7:04 PM, Rubens Kuhl <rubensk@gmail.com> wrote:
On Sun, Aug 6, 2023 at 8:20 PM Mel Beckman <mel@beckman.org> wrote:
Or one can read recent research papers that thoroughly document the incredible fragility of the existing NTP hierarchy and soberly consider their recommendations for remediation:
The paper suggests the compromise of critical infrastructure. So, besides not using NTP, why not stop using DNS? Just populate a hosts file with all you need.
BTW, the stratum-0 source you suggested is known to have been manipulated in the past (https://www.gps.gov/systems/gps/modernization/sa/), so you need to bet on that specific state actor not returning to old habits.
OTOH, 4 of the 5 servers I suggested have their own atomic clock, and you can keep using GPS as well. If GPS goes bananas on timing, that source will just be disregarded (one of the features of the NTP architecture that has been pointed out over and over in this thread and you keep ignoring it).
Rubens