NTP: The Protocol That Keeps Time from Collapsing
Most engineers think about time as a number returned by now().
This is acceptable until your TLS certificates appear to be from the future, your database replicas reject writes, your cron jobs run twice, and your forensic timeline reads like fiction.
Time is infrastructure. NTP is the part of that infrastructure that keeps distributed systems from arguing with physics.
The Supreme Leader classifies NTP as a strategic protocol: invisible when correct, catastrophic when neglected.
I. The Problem: Every Clock Lies Differently
Quartz drift is real. Thermal effects are real. Virtualized clocks are chaotic under load. Suspend/resume events produce nonsense. Hardware clocks and kernel clocks do not agree by default.
Without synchronization, machines in the same fleet diverge. With enough divergence, timestamps can no longer tell you which event came “before” and which came “after”, and you tend to discover this in the middle of an incident.
NTP addresses this by estimating offset and network delay across multiple sources, then disciplining local time gradually.
It does not make clocks perfect. It makes clocks coherent enough for civilization.
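The offset and delay estimate mentioned above comes from four timestamps per request/response exchange. A minimal sketch of that arithmetic, assuming the standard on-wire calculation from RFC 5905; the `Exchange` type and the sample numbers are illustrative, not taken from any real daemon:

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    t1: float  # client transmit time, read from the client clock
    t2: float  # server receive time, read from the server clock
    t3: float  # server transmit time, read from the server clock
    t4: float  # client receive time, read from the client clock


def offset_and_delay(x: Exchange) -> tuple[float, float]:
    """Return (offset, delay) in seconds for one exchange.

    offset > 0 means the server clock is ahead of the client clock.
    The estimate assumes the network path is roughly symmetric.
    """
    offset = ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2.0
    delay = (x.t4 - x.t1) - (x.t3 - x.t2)
    return offset, delay


# Illustrative numbers: client clock 0.5 s behind, 20 ms each way,
# 1 ms of server processing time.
sample = Exchange(t1=100.000, t2=100.520, t3=100.521, t4=100.041)
offset, delay = offset_and_delay(sample)
print(f"offset={offset:.3f}s delay={delay:.3f}s")
```

The symmetry assumption is the weak point: if the outbound and return paths differ, half of that difference shows up as offset error, which is one reason a single exchange with a single server is never trusted on its own.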
II. Official History
NTP was developed by David L. Mills. Early work appears in the 1980s, with RFC 958 (1985) and RFC 1059 (1988), followed by later revisions culminating in NTPv4 (RFC 5905, 2010).
The protocol survived the transition from research networks to the global commercial Internet because it solved a universal systems problem: independent machines need a common time base without blindly trusting a single box.
NTP’s longevity is not nostalgia. It is proof that the failure model was understood correctly.
III. How NTP Works in Practice
NTP runs primarily over UDP/123 and organizes sources by strata.
| Stratum | Meaning | Typical source |
|---|---|---|
| 0 | Reference clocks (not directly network-served) | GPS, atomic clocks, radio time receivers |
| 1 | Directly attached to stratum 0 | Time servers in labs, IXPs, providers |
| 2+ | Downstream synchronized servers | Enterprise, ISP, cloud, campus servers |
A client usually queries multiple upstreams, then applies selection and filtering to reject outliers and converge on a stable offset.
Useful telemetry fields include:
- Offset: how far the local clock is from the selected source
- Delay: round-trip network delay to the source
- Jitter: variation between successive timing measurements
- Root distance/dispersion: accumulated uncertainty through the upstream chain back to the reference clock
You do not want “the lowest latency server.” You want stable, sane, and diverse sources.
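To make "selection and filtering" concrete, here is a deliberately simplified sketch. Real NTP implementations use an intersection algorithm plus clustering (RFC 5905); this version only drops sources whose offset sits far from the median, which is an assumption made for illustration. The source names and the 10 ms tolerance are hypothetical.

```python
from statistics import median_low


def select_sources(offsets: dict[str, float], tolerance: float = 0.010) -> dict[str, float]:
    """Keep sources whose offset is within `tolerance` seconds of the median.

    median_low returns an actual sample, so at least one source
    always survives when the input is non-empty.
    """
    if not offsets:
        return {}
    mid = median_low(offsets.values())
    return {name: off for name, off in offsets.items() if abs(off - mid) <= tolerance}


def combined_offset(offsets: dict[str, float], tolerance: float = 0.010) -> float:
    """Average the offsets of the surviving sources."""
    survivors = select_sources(offsets, tolerance)
    if not survivors:
        raise ValueError("no sources to combine")
    return sum(survivors.values()) / len(survivors)


# Hypothetical measurements in seconds; source "d" is a falseticker.
measured = {"a": 0.0021, "b": 0.0018, "c": 0.0025, "d": 0.4100}
print(sorted(select_sources(measured)))
print(f"{combined_offset(measured):.4f}")
```

Note what the sketch cannot do: if three of four sources share one broken upstream, the median follows the broken majority. Filtering protects against a bad source, not against a lack of diversity.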
IV. Minimal Sane Configuration
A practical chrony baseline:
    # Use diverse pools/sources
    pool pool.ntp.org iburst maxsources 4
    server time.cloudflare.com iburst nts
    # Step the clock if the offset exceeds 1 second, but only during
    # the first 3 updates after startup; slew after that
    makestep 1.0 3
    # Keep RTC in sync with system clock
    rtcsync
Operational checks:
    chronyc tracking
    chronyc sources -v
Interpretation discipline matters more than green status icons. A source list is not health if all sources share one hidden failure domain.
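For monitoring, the output of `chronyc tracking` can be reduced to a few numbers worth alerting on. The sketch below parses the human-readable `key : value` layout. The sample text is illustrative, the exact field layout may differ between chrony versions, and the sign convention (negative means the local clock is behind) is a choice made here. Root distance is approximated as half the root delay plus the root dispersion.

```python
SAMPLE = """\
Reference ID    : CB00710F (foo.example.net)
Stratum         : 3
Ref time (UTC)  : Fri Jan 27 09:49:17 2017
System time     : 0.000006523 seconds slow of NTP time
Last offset     : -0.000006747 seconds
RMS offset      : 0.000035822 seconds
Frequency       : 3.225 ppm slow
Residual freq   : -0.000 ppm
Skew            : 0.129 ppm
Root delay      : 0.013639022 seconds
Root dispersion : 0.001100737 seconds
Update interval : 64.2 seconds
Leap status     : Normal
"""


def parse_tracking(text: str) -> dict[str, str]:
    """Split each 'key : value' line at the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(text: str) -> dict[str, object]:
    f = parse_tracking(text)
    offset = float(f["System time"].split()[0])
    if "slow" in f["System time"]:
        offset = -offset  # convention used here: negative = local clock behind
    root_delay = float(f["Root delay"].split()[0])
    root_dispersion = float(f["Root dispersion"].split()[0])
    return {
        "stratum": int(f["Stratum"]),
        "offset_s": offset,
        "root_distance_s": root_delay / 2 + root_dispersion,
        "leap_status": f["Leap status"],
    }


print(summarize(SAMPLE))
```

In a real check, the text would come from something like `subprocess.run(["chronyc", "tracking"], capture_output=True, text=True).stdout`, and thresholds on `offset_s` and `root_distance_s` would be set per tier rather than globally.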
V. Incidents That Taught the Industry to Respect Time
| Date | Incident | Mechanism | Impact |
|---|---|---|---|
| 2012-06-30 | Leap second event | Kernel/userspace timing bugs triggered high CPU loops on many Linux systems | Widespread service instability across major platforms |
| 2016-12-31 / 2017-01-01 | Cloudflare leap-second DNS incident | DNS server code assumed time never goes backwards; the leap second produced a negative time difference that the code did not handle | Partial DNS resolution failures until mitigation |
| Ongoing (2010s) | NTP amplification abuse | Small spoofed queries to abusable modes such as `monlist` drew much larger responses aimed at the victim (DDoS reflection) | Major volumetric attacks, hardening campaigns followed |
These were not “NTP is bad” stories. They were “time handling assumptions were naive” stories.
The Supreme Leader notes that temporal bugs are politically similar to paperwork bugs: harmless until suddenly constitutional.
VI. Leap Seconds: Tiny Unit, Large Blast Radius
UTC occasionally inserts leap seconds to remain aligned with Earth’s rotation.
Software stacks historically assume time moves forward at a uniform cadence. A leap second violates that assumption. If components disagree on whether to step, slew, smear, or ignore, event ordering can break.
Common strategies:
- Step: apply one-second correction abruptly
- Slew: adjust gradually over time
- Leap smear: spread the correction over a window to avoid discontinuity
None is universally “correct” in isolation. Consistency across your own fleet matters more than ideological preference.
If half your systems smear and half step without design intent, you have built a temporal split-brain.
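The split-brain is easy to quantify. Below is a sketch of a linear smear, assuming a 24-hour window centered on the leap second; the window length and the linear shape are assumptions for illustration, and real deployments pick their own parameters. The function returns how much of the inserted second a smearing host has already absorbed, which is also how far it can disagree with a host that steps.

```python
from datetime import datetime, timedelta, timezone


def smear_fraction(now: datetime, leap: datetime,
                   window: timedelta = timedelta(hours=24)) -> float:
    """Fraction (0.0 to 1.0) of the leap second absorbed at `now`.

    Assumes a linear smear over `window`, centered on `leap`.
    """
    start = leap - window / 2
    end = leap + window / 2
    if now <= start:
        return 0.0
    if now >= end:
        return 1.0
    return (now - start) / window


leap = datetime(2017, 1, 1, tzinfo=timezone.utc)

# Six hours before the leap second, a smearing host has absorbed a
# quarter of it; a stepping host has absorbed none.
print(smear_fraction(leap - timedelta(hours=6), leap))
```

For the whole window, the two populations disagree by up to half a second in each direction. That is invisible in a dashboard and very visible in anything that compares timestamps across hosts.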
VII. Security and Trust in Time Distribution
Classic NTP was not designed for today’s threat assumptions.
Security improvements now include:
- Better default hardening in modern daemons
- Restrictive query modes to reduce reflection abuse
- NTS (Network Time Security, RFC 8915) for authenticated time exchange, with keys established over TLS
Reality check:
- Authenticated time still depends on PKI, network reachability, and sane upstream selection.
- If your time sources are all in one provider and that provider has a routing/control event, your trust chain narrows dangerously.
Time diversity is as important as DNS diversity and BGP policy hygiene.
VIII. Why Product Teams Should Care
NTP errors leak directly into product behavior:
- JWT and OAuth token validity windows fail unexpectedly
- Certificate checks break
- Log ordering becomes unreliable for incident response
- Cache eviction and TTL behavior drift
- Distributed consensus and leader election become unstable
“It’s just one second” is an amateur statement in distributed systems.
One second is enough to violate invariants you did not know you had.
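A small illustration of the token case, as a sketch only: `token_is_valid` is a hypothetical helper and not the API of any JWT library, and the timestamps are made up. It shows a freshly issued token being rejected by a verifier whose clock runs one second behind the issuer, and accepted once a leeway is configured.

```python
def token_is_valid(now: float, not_before: float, expires_at: float,
                   leeway: float = 0.0) -> bool:
    """Check a validity window, tolerating `leeway` seconds of clock skew."""
    return (not_before - leeway) <= now < (expires_at + leeway)


# Issuer clock reads 1000.0 when it issues a 5-minute token.
not_before, expires_at = 1000.0, 1300.0

# Verifier clock is 1 second behind the issuer.
verifier_now = 999.0

print(token_is_valid(verifier_now, not_before, expires_at))              # strict check
print(token_is_valid(verifier_now, not_before, expires_at, leeway=5.0))  # with leeway
```

Leeway hides small skew; it does not fix it. A fleet that needs a large leeway to function is reporting a time synchronization problem through its authentication layer.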
The Decree
NTP is not optional plumbing. It is temporal governance.
If you cannot answer these quickly, your platform is operating on hope:
- Which daemons run time sync on each tier?
- What is our leap-second policy (step/slew/smear) and is it consistent?
- Do we have source diversity across providers and paths?
- Are we monitoring offset/jitter/root distance, not just process up/down?
The Internet tolerates many kinds of sloppiness. It does not tolerate clocks that disagree for long.
Tomorrow: we can shift to virtio or Unicode history, unless you want the filesystem branch first.
— Kim Jong Rails, Supreme Leader of the Republic of Derails