Forrest seems to have posted a good general overview and perspectives about "good enough for the use case" while others continue to be pedantic about nuances that don't seem to be relevant to most use cases.

-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com

Midwest-IX
http://www.midwest-ix.com

----- Original Message -----
From: "Forrest Christian (List Account)" <lists@packetflux.com>
To: "nanog list" <nanog@nanog.org>
Sent: Monday, August 14, 2023 2:07:14 AM
Subject: Re: NTP Sync Issue Across Tata (Europe)

I've responded in bits and pieces to this thread and haven't done an excellent job expressing my overall opinion. This is probably because my initial goal was to point out that GPS-transmitted time is no less subject to being attacked than your garden variety NTP-transmitted time. Since this thread has evolved, I'd like to describe my overall position to be a bit clearer.

To start, we need a somewhat simplified version of how UTC is created so I can refer to it later:

Across the globe, approximately 85 research and standards institutions run a set of freestanding atomic clocks that contribute to UTC. The number of atomic clocks across all these institutions totals around 450.

Each institution also produces a version of UTC based on its own set of atomic clocks. In the international timekeeping world, this is designated as UTC(Laboratory), where Laboratory is replaced with the abbreviation for the lab producing that version of UTC. So UTC(NIST) is the version that NIST produces at Boulder, Colorado, NICT produces UTC(NICT) in Tokyo, and so on.

Because no clock is perfectly accurate, all of these versions of UTC drift in relation to each other, and you could have significant differences in time between different labs. As a result, there has to be a way to synchronize them.

Each month, the standards organization BIPM collects relative time measurements and other statistics from each institution described above.
This data is then used to determine the actual value of UTC. BIPM then produces a report detailing each organization's difference from the correct representation of UTC. Each institution uses this data to adjust its UTC representation, and the cycle repeats the next month. In this way, all of the representations of UTC end up being pretty close to each other.

The document BIPM produces is titled "Circular T." The most recent version indicates that most of the significant standards institutions maintain a UTC version that differs by less than 10ns from the official version of UTC. Note that 10ns is far more accurate than we need for NTP, so most of the UTC representations can be considered identical as far as this discussion goes. Still, it is essential to realize that UTC(NIST) is generated separately from UTC(USNO) or other UTC implementations. For example, a UTC(NIST) failure should not cause UTC(USNO) to fail as they utilize separate hardware and systems.

Each of these versions of UTC is also disseminated in various ways. UTC(NIST) goes out via the "WWV" radio stations, NTP, and other esoteric methods. GPS primarily distributes UTC(USNO), which is also available directly via NTP. UTC(SU) is the timescale for GLONASS. And so on.

So, back to NTP and the accuracy required:

Most end users (people running everyday web applications or streaming video or similar) don't need precisely synchronized time. The most sensitive application I'm aware of in this space is likely TOTP, which often needs time on the server and time on the client (or hardware key) within 90 seconds of each other. In addition, having NTP time fail usually isn't the end of the world for these users. The best way to synchronize their computers (including desktop and server systems) to UTC is to point their computer time synchronization service (whatever that is) at pool.ntp.org, time.windows.com, their ISP's time server, or similar.
Or, with modern OSes, you can leave the time configured to whatever server the OS manufacturer preconfigured. As an aside, one should note that historically Windows ticked at 15ms or so, so trying to synchronize most Windows systems closer than 15ms was futile.

On the other hand, large ISPs or other service providers (including content providers) see real benefits to having systems synchronized to fractions of seconds of UTC. Comparing logs and traces becomes much easier when you know that something logged at 10:02:23.1 on one device came before something logged at 10:02:23.5 on another. Various server-to-server protocols and software implementations need time to be synchronized to sub-second intervals since they rely on timestamps to determine the latest copy of data, and so on. In addition, as an ISP, you'll often provide time services to downstream customers who demand more accuracy and reliability than is strictly necessary.

As a result, one wants to ensure that all time servers are synchronized within some reasonable standard of accuracy. Within 100ms is acceptable for most applications but a goal of under 50ms is better. If you have local GPS receivers, times down to around 1ms are achievable with careful design. Beyond that, you're chasing unnecessary accuracy. Note that loss of precision is somewhat cumulative here - running a time server synchronized to within 100ms will ensure that no client can be synchronized to better than within 100ms from that server. Generally, you'll want your time server to be synchronized much better than needed to avoid the time server being the limiting factor.

In a perfect world with no bad actors and where all links ran perfectly, one could set up an NTP server that pulled from pool.ntp.org or used GPS and essentially acted as a proxy. Unfortunately, we don't live in this world. So one has to ask how you build a system that meets at least the following goals:

* Synchronized to UTC within 50ms, with lower being better.
* Not subject to a reasonable set of attacks (typical DoS attacks, RF signal attacks, spoofing, etc).
* Able to be run by typical network operations staff

In addition, an ideal server setup would be made up of redundant servers in case one piece of hardware fails. I will ignore this part, as it's usually just setting up multiple copies of the same thing.

The two most straightforward options are using a GPS-based NTP appliance or installing an NTP server and pointing it at pool.ntp.org. Under normal circumstances, both options will be synchronized to UTC with enough accuracy for most applications, and both are easy to run by typical network operations staff. This assumes reasonably consistent network latency in the NTP case and a good sky view in the GPS case. The GPS-based appliance is, however, subject to spoofing or jamming, as I've discussed earlier. The NTP server is at the mercy of the quality of the servers it picked from pool.ntp.org and is also subject to various outside attacks (spoofing, etc.). One must decide how critical time is to them before deciding whether this option is valid.

The other end of the scale is the "develop your own offline version of UTC using atomic clocks" methodology. This fixes the attack issue but introduces several others. The main one is that you are now relying on the clock's accuracy. Admittedly rubidium and especially cesium clocks tend to be sufficiently reliable and stable. However, one has to ensure the frequency is accurate initially and stays that way. You must also wire the clock to an NTP server and calibrate the initial UTC offset. If the clock goes haywire or is less accurate than is required, your in-house version of UTC will drift in relation to real UTC. This means you may need 2 or 3 or more atomic clocks to be sufficiently reliable. You'll then need to regularly take an average, compare it to UTC, and adjust if it's drifted too much.
This quickly becomes more of a science project than something you want network operations staff to deal with on an ongoing basis.

To be clear: If you need robust time not subject to outside forces and have or can obtain the skill set to pull this off internally, I won't argue that this is a bad option. However, I feel this isn't the type of service most providers want to run internally. So, looking at some middle-ground options that trade a bit of robustness for ease of use is reasonable.

My lowest cost preference has always been to use a set of in-house NTP servers pointed at a carefully curated collection of NTP servers. Your curation strategy should depend on network connectivity, the reliability of the time sources, etc. In North America, picking one or two NIST servers from each NIST location is a good starting point. That is, one or two from each of Maryland, Fort Collins, Boulder, and the University of Colorado. One may want to add some servers from other timekeeping organizations (such as USNO). Note that there is one commonality: These time servers are run by organizations listed in Circular T as contributing to UTC, and the servers are tied to the atomic clocks. That way, we ensure that the servers are not subject to inaccuracies caused by time transfer from an authoritative source for UTC.

What is left is any potential attack on the time transfer over NTP itself. I would argue that with a curated list of enough NTP servers, this risk can be pushed down to where it is low enough for many use cases. A lot will depend on the quantity and quality of NTP servers you select and the robustness of the network path to those servers. If the packets between your NTP server and the NTP servers you choose traverse a relatively secure and short path with plenty of bandwidth, and the paths to differing NTP servers are diverse, many attacks will become harder to implement.
In addition, the more NTP servers you add, the more likely it is that NTP will be able to correctly pick the servers providing the correct time, even if an attacker is successfully spoofing one or more sources.

In some cases it may make sense to add additional servers which are run by third parties if it gains additional robustness based on network architecture. This is especially true if you're closely connected network-wise with the third party and they run a good quality NTP service as well.

As I've mentioned, a good middle-of-the-road solution is adding various sources of time derived via GPS. Note I said, "to add." Start with the carefully curated NTP server set, then install one or more GPS-based NTP servers polled by your NTP server. Adding these GPS time sources to your NTP servers does three things: First, it provides another source of time NTP can use to determine the correct time. Second, we're now using a different time transmission method with different vulnerabilities. And finally, it will significantly improve the accuracy of the time the NTP server produces as NTPd will generally prefer it to do the final trimming to UTC.

The strength of the combination of both terrestrial transmitted time via NTP and the precision of RF-transmitted GPS time ensures that time is both correct and precise. There are still attack vectors here, but as you add more time sources, the complexity of pulling off a successful attack increases. This is especially true if you can monitor the NTP server for signs of stress, such as time servers that are not telling the correct time or GPS signals which are inconsistent with the NTP-derived time. A successful attack would require simultaneous NTP (network) and GPS (RF) attacks.

Other options or blends of options are also possible. With a reasonably large network, putting enough GPS receivers into place would significantly reduce the possibility of a spoofer or jammer taking out your entire GPS infrastructure.
Reducing or eliminating external NTP time sources might be reasonable in that case. The theory is that attacking GPS receivers at one location is easy. Doing it at dozens simultaneously is much more difficult. To use an exaggeration to make a point: If you had 100 different GPS receivers spread across 100 widely geographically diverse locations, and all of your NTP servers were able to poll all of them for time, the chances that an attacker would be able to take out or spoof enough GPS receivers to make a difference would be close to zero. Your failure point becomes UTC(USNO) and the GPS constellation itself. The same argument would apply to NTP servers regarding quantity and diversity.

Other options involve adding additional technologies. For example, some appliances use GPS to discipline (adjust) an internal atomic clock. Once the atomic clock is locked to UTC, the GPS can fail for extended periods without affecting NTP output. In addition, some of these will filter updates from the GPS based on the appliance's internal atomic time. That way, a spoofer would be ignored, jammers would have to continue for hours or days, and so on. Of course, these solutions' reliability depends on the implementation quality. If I had the budget to implement something like this in a network, I'd likely scatter a few of these around the network and then still use garden variety NTPd servers which would be pointed at these appliances. I might even consider buying solutions from multiple vendors to ensure a bug in one implementation was filtered out and ignored.

I can't cover every option here, but balancing security, cost, operational complexity, and application needs is the key. Some solutions are cheap and easy but not robust. Some are highly robust but expensive and not easy. Somewhere in the middle is probably where most real implementations should lie.
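
As a rough illustration of the "more sources makes a successful attack harder" argument, here is a minimal Python sketch of the kind of sanity check one could run when monitoring for sources that disagree with the rest. To be clear about the assumptions: this is not how NTPd actually selects its sources (its selection and clustering logic is considerably more involved), the source names and offsets are made up, and the 50ms tolerance is just the accuracy goal mentioned earlier reused as a threshold.

```python
from statistics import median


def split_sources(offsets_ms, tolerance_ms=50.0):
    """Split time sources into those near the median offset and outliers.

    offsets_ms maps a source name to its measured offset from the local
    clock, in milliseconds. The median stays trustworthy as long as fewer
    than half of the sources are wrong, which is the point of adding more
    (and more diverse) sources.
    """
    if not offsets_ms:
        raise ValueError("need at least one time source")
    mid = median(offsets_ms.values())
    agreeing = {}
    suspect = {}
    for name, offset in offsets_ms.items():
        if abs(offset - mid) <= tolerance_ms:
            agreeing[name] = offset
        else:
            suspect[name] = offset
    return mid, agreeing, suspect


# Hypothetical measurements (names and values are illustrative only).
sample = {
    "nist-a": 1.2,
    "nist-b": -0.8,
    "usno-a": 0.5,
    "gps-1": 0.1,
    "gps-2": 400.0,  # e.g. a receiver being spoofed
}

mid, agreeing, suspect = split_sources(sample)
print("median offset (ms):", mid)
print("suspect sources:", sorted(suspect))
```

With one spoofed source out of five, the median still comes from the honest sources and the spoofed receiver is flagged. An attacker would have to shift more than half of the sources, across both network and RF paths, to drag the median along.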

Now, to address a couple of specific items:

1) Additional GPS and commercial time distribution systems will likely improve reliability. However, only GPS and GALILEO are available for free in the US. I'm ignoring GLONASS for various legal and political reasons. GALILEO is a valid option but it lives in the same band as GPS, so jamming GPS will usually also jam GALILEO. Utilizing GNSS receivers that use the civilian signals in the newer bands would also help. Some commercial solutions are available that don't require GNSS, but they're relatively new and not as commonly available as one would like.

2) For running my own time servers in a service-provider environment, I'd rather specifically designate the exact NTP server I want to utilize and not rely on a third party to give me a pool of servers. It's more about ensuring the server I use is running a trusted server, and if I delegate the server selection, I lose this ability. On the other hand, where I'm not running an NTP server that is critical for many clients, I'll just point it at pool.ntp.org or north-america.pool.ntp.org and skip all of the recommendations that I've made above. I would be cautious about requesting pool.ntp.org add entries for "stratum of server" or "origin of time" as this seems like it would tend to overload the stratum one servers in the pool with people "optimizing" their configuration to use only stratum one servers. Remember that pool.ntp.org is generally intended as an end-user-device service, and providing methods that end users can bypass the robustness that a fully distributed pool will provide is probably not a great idea.

3) This all should hopefully sort itself out over the next few years. GPS and GALILEO are flying new birds that have changes designed to improve attack resilience by using cryptography to ensure authentic transmissions (which may rely on ground transmission of cryptographic keys).
NTP already supports manual cryptographic keys that work, but NTS is a pain in the rear. Hopefully, NTPv5 will have a better security mechanism. Other, more secure, time sources are on the horizon as the cybersecurity crowd is aware of the issues.

And finally, as a sort of a tl;dr; Summary:

Each operator needs to decide how critical time is to their network and pick a solution that works for them and fits the organization's budget. Some operators might point everything at pool.ntp.org and not run their own servers. Others might run their own time lab and use that time to provide NTP time and precision time and frequency via various methods. Most will be somewhere in between. But regardless of which you choose, please be aware that GPS isn't 100% secure, and neither is NTP. If attack resilience matters to you, you should think about all of the attack vectors and design something that is robust enough to meet your use case.