R-HLC-DRIFT

Runbook for on-call operators

When to use

Use this runbook when a peer sends Hybrid Logical Clock (HLC) timestamps outside your allowed skew window. Triggering alerts: peer_hlc_drift_high, peer_hlc_drift_critical. Default critical threshold: a single peer sends a timestamp more than 300 seconds ahead of local time.
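The threshold check compares how far ahead of local time the peer's timestamp is. A minimal bash sketch, assuming drift is measured in whole seconds as peer time minus local time. The 300-second critical default comes from this runbook; the 60-second high threshold and the function name classify_drift are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# classify_drift: sketch of the alert thresholds (not the node's real code).
# Usage: classify_drift DRIFT_SECONDS [HIGH_SECONDS] [CRITICAL_SECONDS]
#   DRIFT_SECONDS    peer timestamp minus local timestamp, whole seconds
#                    (positive = peer is ahead of us, i.e. "in the future")
#   HIGH_SECONDS     hypothetical warning threshold (default 60, an assumption)
#   CRITICAL_SECONDS critical threshold (default 300, per this runbook)
classify_drift() {
  local drift=$1
  local high=${2:-60}
  local critical=${3:-300}

  if (( drift > critical )); then
    echo "critical"   # would map to peer_hlc_drift_critical
  elif (( drift > high )); then
    echo "high"       # would map to peer_hlc_drift_high
  else
    echo "ok"         # inside the window; past-dated drift is not flagged here
  fi
}

classify_drift 45     # prints "ok"
classify_drift 120    # prints "high"
classify_drift 301    # prints "critical"
```

"More than 300 seconds" is a strict comparison, so a drift of exactly 300 seconds stays below critical in this sketch.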

Identify

Gather recent HLC anomaly events:

curl -s "https://your-node.example.com/v1/federation/audit?limit=200" \
  -H "Authorization: Bearer $STIGMEM_ADMIN_KEY" \
  | jq '.[] | select(.event_type == "peer_hlc_anomaly")'

Record the peer entity URI, the observed HLC, the local HLC, the drift in seconds, and whether the fact was rejected or admitted.
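If an event carries raw HLC values but no precomputed drift, you can derive drift seconds yourself. The sketch below assumes a common packing in which the upper bits hold wall-clock milliseconds and the lower 16 bits hold the logical counter. That layout is an assumption, so confirm your node's actual HLC encoding first. The function names are hypothetical.

```bash
# Hypothetical layout: HLC = (wall_clock_ms << 16) | logical_counter
hlc_physical_ms() { echo $(( $1 >> 16 )); }
hlc_logical()     { echo $(( $1 & 0xFFFF )); }

# drift_seconds OBSERVED_HLC LOCAL_HLC
# Positive result: the peer's clock is ahead of ours.
# Bash integer division truncates toward zero.
drift_seconds() {
  local observed_ms local_ms
  observed_ms=$(hlc_physical_ms "$1")
  local_ms=$(hlc_physical_ms "$2")
  echo $(( (observed_ms - local_ms) / 1000 ))
}

# Example values (made up): peer is 420 s ahead of the local node.
local_hlc=$(( (1700000000000 << 16) | 7 ))
observed_hlc=$(( (1700000420000 << 16) | 0 ))
drift_seconds "$observed_hlc" "$local_hlc"   # prints 420
```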

Contain

If drift is critical or repeated, pause pulls from the peer.
Do not relax skew limits for normal live federation traffic.
If you must relax the skew bound for an intentional archival backfill, isolate that backfill from normal peer traffic and restore the bound as soon as the backfill completes.

Investigate

Determine whether this is honest clock skew or malicious/manipulated input:

NTP status

Ask the peer operator for current NTP status and system time.

Drift direction

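Future-dated timestamps are higher risk than old backfill data. In the standard HLC algorithm, a node that receives a timestamp ahead of its own clock adopts it. One future-dated fact can therefore push every later local event forward and distort ordering until wall-clock time catches up. A past-dated timestamp does not move the local clock.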
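To see why drift direction matters, here is a sketch of the standard HLC receive rule (wall-clock part l, logical counter c) with a bounded-skew rejection in front of it. This illustrates the general technique rather than this node's implementation. The variable and function names, and the idea of rejecting before the update, are assumptions. The 300000 ms bound mirrors the 300-second critical default.

```bash
# Hypothetical HLC state for one node: wall-clock ms + logical counter.
HLC_L=0
HLC_C=0
MAX_SKEW_MS=300000   # mirrors the 300 s default critical threshold

# hlc_receive PHYSICAL_NOW_MS MSG_L MSG_C
# Rejects (returns 1) a message whose wall-clock part is more than
# MAX_SKEW_MS ahead of now; otherwise applies the standard receive rule.
hlc_receive() {
  local pt=$1 ml=$2 mc=$3
  if (( ml - pt > MAX_SKEW_MS )); then
    return 1   # admitting it would drag HLC_L into the future
  fi
  local old_l=$HLC_L
  local new_l=$old_l
  (( ml > new_l )) && new_l=$ml   # l = max(l_old, l_msg, pt)
  (( pt > new_l )) && new_l=$pt
  if (( new_l == old_l && new_l == ml )); then
    HLC_C=$(( (HLC_C > mc ? HLC_C : mc) + 1 ))
  elif (( new_l == old_l )); then
    HLC_C=$(( HLC_C + 1 ))
  elif (( new_l == ml )); then
    HLC_C=$(( mc + 1 ))
  else
    HLC_C=0
  fi
  HLC_L=$new_l
}

now=1700000000000
hlc_receive "$now" $(( now + 120000 )) 0    # 2 min ahead: admitted
echo "$HLC_L"    # prints 1700000120000: local HLC is now 2 min ahead too
hlc_receive "$now" $(( now - 3600000 )) 0   # 1 h old: admitted
echo "$HLC_L"    # prints 1700000120000: an old timestamp does not move it
hlc_receive "$now" $(( now + 600000 )) 0 || echo "rejected"   # 10 min ahead
```

The admitted two-minute-ahead message leaves the local HLC ahead of physical time for every later event, while the hour-old message only bumps the logical counter.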

Relation patterns

Check whether facts around the anomaly share unusual relations or scopes. A cluster of unusual facts points toward targeted manipulation rather than a drifting clock.

Concurrent violations

Look for concurrent replay or capability-violation events.
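One way to look for concurrent events is to correlate the audit log around the anomaly's timestamp. A minimal bash sketch, assuming audit events have already been flattened into "EPOCH_SECONDS EVENT_TYPE" lines. That line format, the function name concurrent_violations, and event type names such as replay_rejected and capability_violation are hypothetical, because this runbook does not list the real replay and capability-violation event types.

```bash
# concurrent_violations ANOMALY_EPOCH_S WINDOW_S
# Reads "EPOCH_SECONDS EVENT_TYPE" lines (hypothetical format) on stdin and
# prints events within WINDOW_S seconds of the anomaly whose type mentions
# replay or capability. Adjust the pattern to your real event_type names.
concurrent_violations() {
  local anchor=$1 window=$2 ts type delta
  while read -r ts type; do
    [[ -z $ts ]] && continue
    delta=$(( ts - anchor ))
    (( delta < 0 )) && delta=$(( -delta ))   # absolute distance in seconds
    (( delta > window )) && continue
    case $type in
      *replay*|*capability*) echo "$ts $type" ;;
    esac
  done
}

printf '%s\n' \
  "1700000000 peer_hlc_anomaly" \
  "1700000030 replay_rejected" \
  "1700000090 capability_violation" \
  "1700009999 replay_rejected" \
  | concurrent_violations 1700000000 60
# prints: 1700000030 replay_rejected
```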

Recover

For honest clock skew:

Ask the peer operator to fix NTP/system time.
Resume pulls after their clock is stable.
Manually pull a small batch and confirm no new anomalies.
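The last step can be scripted as a gate: note the time before the manual pull, then count anomaly events logged at or after it. A minimal bash sketch, assuming audit events flattened into "EPOCH_SECONDS EVENT_TYPE" lines. That format is hypothetical, so you would need to map the audit API's real timestamp field into it. The function name new_hlc_anomalies and the fact_admitted event type are also hypothetical.

```bash
# new_hlc_anomalies SINCE_EPOCH_S
# Reads "EPOCH_SECONDS EVENT_TYPE" lines (hypothetical format) on stdin,
# prints how many peer_hlc_anomaly events occurred at or after SINCE_EPOCH_S,
# and returns non-zero if any did, so it can gate resuming pulls.
new_hlc_anomalies() {
  local since=$1 n=0 ts type
  while read -r ts type; do
    [[ -z $ts ]] && continue
    if (( ts >= since )) && [[ $type == peer_hlc_anomaly ]]; then
      n=$(( n + 1 ))
    fi
  done
  echo "$n"
  (( n == 0 ))
}

pull_started=1700001000
printf '%s\n' \
  "1700000500 peer_hlc_anomaly" \
  "1700001200 fact_admitted" \
  | new_hlc_anomalies "$pull_started" && echo "no new anomalies"
# prints 0, then "no new anomalies" (the earlier anomaly predates the pull)
```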

For suspicious drift:

Keep the peer disabled.
Retract any admitted facts whose ordering could affect decisions.
Treat the peer as compromised until the operator proves control of the node.

Communicate

Send the peer operator the drift seconds, timestamps, and affected fact IDs.

If your deployment uses HLC order for compliance/audit workflows, notify affected internal stakeholders before relying on time-ordered reports from the incident window.
