Version: v0.9.0a2

Hybrid Logical Clocks

4 min read · Protocol implementer · Node operator · Spec-01-Fact-Model.4

What this page is

How Stigmem orders facts causally across distributed nodes without a central coordinator — combining wall-clock time with a logical counter into a single monotone identifier.

The problem

Distributed nodes need to agree on the order of events. If Node A asserts "Alice is CEO" and Node B asserts "Alice is CTO" during a network partition, which came first? The answer determines which fact is the latest when the partition heals.

Wall clocks seem like the obvious answer — just compare timestamps. But wall clocks drift. NTP synchronization has millisecond-level jitter, VM clocks can jump, and a node whose clock runs fast will appear to win every conflict. You need a clock that is both causally correct and close to real time.

Naive approaches and why they fail

| Approach | Failure mode | Why it doesn't work |
| --- | --- | --- |
| Wall clocks alone | Silent loss | Two nodes with clocks 50ms apart will disagree on ordering for any events within that window. A clock that jumps backward (e.g., NTP correction) can cause a fact written later to appear older than one written earlier. |
| Pure logical clocks (Lamport) | Opaque counter | Correct causal ordering, but the counter tells you nothing about when it happened. You can't answer "what did the graph look like last Tuesday?" without mapping logical values back to wall time. |
| Vector clocks | O(N) state | Per-node counters detect concurrent events but state is O(N) per event where N is the number of nodes. In a federation with hundreds of peers, every fact would carry a vector of hundreds of counters. Impractical. |
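
To make the wall-clock failure concrete, here is a minimal sketch in which a backward clock correction makes a later write sort before an earlier one. The timestamps and descriptions are invented for illustration.

```python
# Hypothetical write log: (wall_clock_ms, description), listed in true write order.
# Between the two writes the system clock is corrected backward by 30 ms.
writes_in_true_order = [
    (1_000_050, "first write"),
    (1_000_020, "second write, after a backward clock correction"),
]

# Ordering by raw wall-clock timestamp puts the second write first,
# so the first write is wrongly treated as the latest.
sorted_by_wall_clock = sorted(writes_in_true_order)
latest_by_wall_clock = sorted_by_wall_clock[-1][1]
print(latest_by_wall_clock)
```

The write that actually happened last is silently treated as stale, which is the "silent loss" failure mode in the table.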

Our model

Stigmem uses a Hybrid Logical Clock (HLC), combining wall-clock time with a logical counter:

HLC = "{wall_ms_utc}.{counter}"

For example: "1746230400000.003" — wall time 1746230400000ms (UTC) with counter 3.
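
As a minimal sketch of reading this format, the string can be split at the dot into two integers. The names `HLC` and `parse_hlc` are hypothetical, not part of Stigmem. Whether the counter is zero-padded to a fixed width is not stated here, so the sketch parses both parts as integers rather than relying on any particular width.

```python
from typing import NamedTuple


class HLC(NamedTuple):
    """Hypothetical in-memory form of an HLC string."""
    wall_ms: int
    counter: int


def parse_hlc(text: str) -> HLC:
    """Split "{wall_ms_utc}.{counter}" into two integers."""
    wall_part, sep, counter_part = text.partition(".")
    parts_are_digits = all(
        part.isascii() and part.isdigit() for part in (wall_part, counter_part)
    )
    if not sep or not parts_are_digits:
        raise ValueError(f"not a valid HLC string: {text!r}")
    # int() drops leading zeros, so "003" and "3" both give counter 3.
    return HLC(int(wall_part), int(counter_part))


print(parse_hlc("1746230400000.003"))
```

Parsing to a tuple of integers also makes later comparisons independent of how the counter is written.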

Advance rules

The HLC advances according to two rules.

| Rule | Trigger | Behavior |
| --- | --- | --- |
| Rule 1: Local write | Local event | Set wall_ms = max(now_ms, last_hlc.wall_ms). If wall_ms is unchanged, increment the counter. Otherwise reset the counter to 0. |
| Rule 2: Federated ingest | Remote event | Set wall_ms = max(now_ms, received_hlc.wall_ms). Same counter logic as Rule 1. Ensures the receiving node's clock never goes backward relative to a fact it has just ingested. |
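
The two rules can be sketched as a small class. This is an illustration under stated assumptions, not Stigmem's implementation: the class and method names are hypothetical, the output string uses an unpadded counter, and Rule 2 here also keeps the node's own last wall_ms in the max so that an ingest can never move the local clock backward. "Same counter logic as Rule 1" is read literally, so the received counter is not used.

```python
import time
from dataclasses import dataclass
from typing import Optional


def _system_now_ms() -> int:
    """Current wall-clock time in milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


@dataclass
class HybridLogicalClock:
    """Hypothetical per-node HLC state: one (wall_ms, counter) pair."""
    wall_ms: int = 0
    counter: int = 0

    def _advance(self, candidate_wall_ms: int) -> str:
        # Assumption: the last issued wall_ms always takes part in the max.
        new_wall_ms = max(candidate_wall_ms, self.wall_ms)
        if new_wall_ms == self.wall_ms:
            self.counter += 1   # wall_ms unchanged: increment the counter
        else:
            self.wall_ms = new_wall_ms
            self.counter = 0    # wall_ms advanced: reset the counter
        # Assumption: counter rendered without zero padding.
        return f"{self.wall_ms}.{self.counter}"

    def local_write(self, now_ms: Optional[int] = None) -> str:
        """Rule 1: advance for a local event."""
        if now_ms is None:
            now_ms = _system_now_ms()
        return self._advance(now_ms)

    def ingest(self, received_wall_ms: int, now_ms: Optional[int] = None) -> str:
        """Rule 2: advance for a fact received from a peer."""
        if now_ms is None:
            now_ms = _system_now_ms()
        return self._advance(max(now_ms, received_wall_ms))


clock = HybridLogicalClock()
first = clock.local_write(now_ms=1000)                       # "1000.0"
second = clock.local_write(now_ms=1000)                      # "1000.1"
ingested = clock.ingest(received_wall_ms=1007, now_ms=1002)  # "1007.0"
after = clock.local_write(now_ms=1003)                       # "1007.1"
print(first, second, ingested, after)
```

Passing `now_ms` explicitly keeps the sketch deterministic; omitting it falls back to the system clock.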

Causal ordering

Two facts a and b are causally ordered, a before b, if a.hlc < b.hlc (compare wall_ms first, then counter, both as integers rather than as strings). Equal HLCs on different nodes indicate concurrent writes; these are handled by the contradiction policy (Spec-15-Fact-Semantics).
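
A minimal sketch of that comparison follows. The function names are hypothetical, and the sketch compares HLC values only; it does not check that equal HLCs actually came from different nodes.

```python
from typing import Tuple


def hlc_key(hlc: str) -> Tuple[int, int]:
    """Turn "wall_ms.counter" into an integer pair that sorts correctly."""
    wall_ms, counter = hlc.split(".")
    return int(wall_ms), int(counter)


def ordering(a: str, b: str) -> str:
    """Compare wall_ms first, then counter."""
    key_a, key_b = hlc_key(a), hlc_key(b)
    if key_a < key_b:
        return "a before b"
    if key_a > key_b:
        return "b before a"
    return "concurrent"


print(ordering("1000.0", "1000.1"))  # counter decides within one millisecond
print(ordering("999.0", "1000.0"))   # wall_ms decides
print("999.0" < "1000.0")            # raw string comparison gets this one wrong
```

The last line shows why the values are compared as integers: as strings, "999.0" sorts after "1000.0".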

Worked example

Consider two nodes during and after a partition.

| Time | Node A · clock accurate | Node B · clock 20ms ahead |
| --- | --- | --- |
| T=0 | Asserts fact. HLC: 1000.0 | |
| T=10 | | Asserts fact. HLC: 1030.0 (clock is ahead). |
| T=50 | Receives B's fact. max(1050, 1030) = 1050. HLC: 1050.0 | |
| T=51 | Local write. max(1051, 1050) = 1051. HLC: 1051.0 | |

Node A's HLC tracks real time closely. When it ingests Node B's fact with wall_ms = 1030, it recognizes that its own clock (1050) is already ahead and uses that. The counter stays at 0 because wall_ms advanced. No causal information is lost, and the ordering is deterministic.

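If a node writes twice within the same millisecond, the counter orders its second write after the first. Suppose both nodes write at millisecond 1000 and Node A then writes again within that millisecond:

| Node A | Node B |
| --- | --- |
| 1000.0 | 1000.0 |
| 1000.1 | (no second write) |

1000.1 > 1000.0, so Node A's second write is ordered after both first writes. The two first writes share the HLC 1000.0 and are therefore concurrent.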

Why this is non-obvious

HLC looks like a wall clock, but isn't

The wall_ms component tracks real time closely but is not a wall-clock timestamp. It can only advance forward, never backward. HLC values are monotonically increasing on a single node, even if the system clock is corrected backward by NTP.
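
The following sketch applies Rule 1 to a sequence of system clock readings that includes a backward correction. The function name and the readings are made up for illustration.

```python
from typing import List, Tuple

HlcPair = Tuple[int, int]  # (wall_ms, counter)


def local_write(last: HlcPair, now_ms: int) -> HlcPair:
    """Rule 1: wall_ms = max(now_ms, last wall_ms); bump or reset the counter."""
    last_wall_ms, last_counter = last
    wall_ms = max(now_ms, last_wall_ms)
    if wall_ms == last_wall_ms:
        return (wall_ms, last_counter + 1)
    return (wall_ms, 0)


# The system clock is corrected backward from 1001 to 990, then catches up.
system_clock_readings = [1000, 1001, 990, 995, 1002]

hlc: HlcPair = (0, 0)
issued: List[HlcPair] = []
for now_ms in system_clock_readings:
    hlc = local_write(hlc, now_ms)
    issued.append(hlc)

print(issued)
# [(1000, 0), (1001, 0), (1001, 1), (1001, 2), (1002, 0)]
```

While the system clock reads 990 and 995, wall_ms stays pinned at 1001 and only the counter moves, so every issued value is still strictly greater than the one before it.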

O(1) state vs. O(N) state

Unlike vector clocks, HLC requires only a single (wall_ms, counter) pair per node — constant state regardless of federation size.

Equal HLCs are concurrent, not identical

Two facts with the same HLC from different nodes are not duplicates; they are concurrent writes that happened to occur at the same logical instant. They are handled by the contradiction policy.

What it costs

Clock skew tolerance

HLC absorbs clock skew by advancing to max(local, remote). A node whose clock runs far ahead will "pull" every peer's HLC forward permanently. Ensure NTP is configured and monitor for excessive HLC drift.
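
One way to monitor drift is to compare the HLC's wall_ms with the local system clock. The sketch below is only an illustration: the function names and the 500 ms threshold are hypothetical and are not taken from the spec.

```python
MAX_DRIFT_MS = 500  # hypothetical alert threshold, not a spec value


def hlc_drift_ms(hlc_wall_ms: int, now_ms: int) -> int:
    """How far the HLC's wall_ms is ahead of the local system clock."""
    return max(0, hlc_wall_ms - now_ms)


def drift_is_excessive(
    hlc_wall_ms: int, now_ms: int, max_drift_ms: int = MAX_DRIFT_MS
) -> bool:
    """True when the HLC has been pulled further ahead than the threshold."""
    return hlc_drift_ms(hlc_wall_ms, now_ms) > max_drift_ms


# A peer with a fast clock pulled this node's HLC 60 seconds ahead.
print(hlc_drift_ms(1_060_000, 1_000_000))        # 60000
print(drift_is_excessive(1_060_000, 1_000_000))  # True
```

A node would evaluate this periodically against its current HLC and raise an alert when it returns True.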

No true simultaneity detection

HLC can detect causal ordering and concurrent writes, but it cannot distinguish "truly simultaneous" from "happened within the clock skew window." Both are treated as concurrent and surfaced as contradictions if they conflict.

Counter overflow in theory

The counter is an integer with no spec-defined upper bound. Because the counter resets to 0 whenever wall_ms advances, overflow would require enough writes to exhaust the integer range while wall_ms stays fixed, which is not a realistic concern.

References