Version: v0.9.0a2

Federation DNSSEC Origin Trust

8 min read · Node operator · v0.9.0aN

What this runbook covers

Publish the DNSSEC-signed binding record that lets a downstream node re-derive your origin key when your node is unreachable, enable the relay-path DNSSEC trust tier, and work the operator-confirm queue for origins that DNSSEC cannot anchor.

Audience: operators running a multi-hop relay (federation_relay_enabled=true) who want a relayed origin's facts to remain trustable when the origin node is offline but its DNS zone is DNSSEC-signed. See also: Federation Peer Setup, Federation trust.

What problem this solves

In a multi-hop relay (A → B → C), node C may receive a fact that originated at A but was relayed through B. C must verify A's per-fact origin signature against A's key. Before DNSSEC origin trust, the available anchors are an operator pin, a stored first-party binding, or a trust-on-first-use (TOFU) fetch of A's manifest. All three require A to be reachable or already known. If A is both unreachable and unknown, C fails closed (relay_origin_unanchored).

DNSSEC origin trust adds one more anchor after those three: A publishes a DNSSEC-signed TXT record that binds A's entity_uri to A's key fingerprint. Because A's DNS is independent of A's node, C can re-derive A's key (and re-check its recency and revocation status) without ever contacting A. A's node can stay offline while its DNS keeps answering.

The tier is default-OFF and strictly additive: with the flag off, the relay path is byte-identical to before (it fails closed exactly as it did, never calling a resolver).

Step 1: Publish the binding TXT record

Publish a DNSSEC-signed TXT record at _stigmem-fed._key.<canonical-host>, where <canonical-host> is the host of your node's entity_uri (for https://memory.acme.example/ the qname is _stigmem-fed._key.memory.acme.example). Your zone must be DNSSEC-signed: the record is only consulted when the full chain to the IANA root validates.
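As a rough illustration of the qname rule above, the sketch below derives the record name from an entity_uri. The helper name binding_qname is hypothetical, and the canonicalization shown (lowercasing, dropping the port and any trailing root dot) is an assumption rather than the node's documented behavior.

```python
from urllib.parse import urlsplit


def binding_qname(entity_uri: str) -> str:
    """Return the binding TXT qname for the host of an entity_uri.

    Hypothetical helper: the real canonicalization rules may differ.
    """
    # urlsplit().hostname is lowercased and excludes any port.
    host = urlsplit(entity_uri).hostname
    if not host:
        raise ValueError(f"entity_uri has no host: {entity_uri!r}")
    # Assumption: a trailing root dot is not part of the canonical host.
    return f"_stigmem-fed._key.{host.rstrip('.')}"


print(binding_qname("https://memory.acme.example/"))
```

Running this for the example URI prints the qname given in the text, which makes it a quick check before editing the zone file.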

The record grammar is the v=stigmem1 sentinel followed by semicolon-separated key=value pairs. There are two forms:

# active binding
v=stigmem1; fpr=<key_fpr>; epoch=<n>; prev_fpr=<or-empty>; prev_until=<or-empty>

# revocation tombstone (withdraws all keys for the host)
v=stigmem1; status=revoked; epoch=<n>; fpr=

| Field | Required | Meaning |
| --- | --- | --- |
| v=stigmem1 | yes (first token) | Version sentinel. Must be the first token or the record is rejected. |
| fpr | active: yes | The bound key fingerprint (the same fingerprint format the manifest publishes). Empty/omitted on a revocation tombstone. |
| epoch | yes | A monotonic non-negative integer. A record at an epoch below the floor a downstream node has already seen is a rollback and is rejected. Bump it on every rotation/revocation. |
| prev_fpr | no | During a rotation, the retiring key's fingerprint. A relayed fact still signed by the retiring key is honored while inside the grace window. |
| prev_until | no | ISO-8601 deadline for the prev_fpr grace. When omitted, the downstream derives a grace from federation_key_rotation_grace_hours. A present-but-unparseable value fails closed (no grace). |

Unknown key=value pairs are ignored, so adding a field in a future version is a routine zone re-sign rather than a breaking change.
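The grammar above can be sketched as a small parser. This is a minimal illustration, not the node's implementation: the function name parse_binding and the returned dictionary shape are hypothetical, and rejecting a token that lacks an "=" is an assumption the text does not confirm.

```python
def parse_binding(txt: str) -> dict:
    """Parse a v=stigmem1 TXT value; raise ValueError if it is malformed."""
    tokens = [t.strip() for t in txt.split(";") if t.strip()]
    # The version sentinel must be the first token.
    if not tokens or tokens[0] != "v=stigmem1":
        raise ValueError("version sentinel must be the first token")

    fields = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep:
            # Assumption: a bare token is treated as malformed.
            raise ValueError(f"not a key=value pair: {token!r}")
        fields[key.strip()] = value.strip()

    epoch_text = fields.get("epoch", "")
    # epoch is required and must be a non-negative integer (ASCII digits only).
    if not (epoch_text.isascii() and epoch_text.isdigit()):
        raise ValueError("epoch must be a non-negative integer")

    revoked = fields.get("status") == "revoked"
    fpr = fields.get("fpr", "")
    if not revoked and not fpr:
        raise ValueError("an active binding requires fpr")

    # Unknown keys are simply not read, so they are ignored.
    return {
        "revoked": revoked,
        "fpr": fpr or None,
        "epoch": int(epoch_text),
        "prev_fpr": fields.get("prev_fpr") or None,
        "prev_until": fields.get("prev_until") or None,
    }
```

For example, parse_binding("v=stigmem1; status=revoked; epoch=4; fpr=") yields a revoked entry at epoch 4 with no fingerprint, while a record whose first token is not v=stigmem1 raises ValueError.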

Re-sign the record on your normal zone cadence. A downstream node treats an aged RRSIG on the relay path as a hard reject (it cannot run a mid-relay operator-confirm). Keep the binding RRSIG fresh relative to federation_dnssec_max_rrsig_age (default 7 days).
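A freshness check of this kind might look like the sketch below. It assumes that "age" is measured from the RRSIG inception time, which the text does not state, and the function name rrsig_is_fresh is hypothetical. Only the 7-day default comes from the documentation.

```python
from datetime import datetime, timedelta, timezone

# Documented default for federation_dnssec_max_rrsig_age.
MAX_RRSIG_AGE = timedelta(days=7)


def rrsig_is_fresh(inception: datetime, now: datetime,
                   max_age: timedelta = MAX_RRSIG_AGE) -> bool:
    """Return True if the signature is no older than max_age.

    Assumption: age is (now - RRSIG inception).
    """
    return now - inception <= max_age


now = datetime(2030, 1, 10, tzinfo=timezone.utc)
print(rrsig_is_fresh(datetime(2030, 1, 5, tzinfo=timezone.utc), now))
print(rrsig_is_fresh(datetime(2030, 1, 1, tzinfo=timezone.utc), now))
```

Under that assumption, a re-sign interval comfortably shorter than the maximum age keeps the binding acceptable on the relay path.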

Rotating your key

  1. Generate the new key and update your manifest (see Federation Peer Setup).
  2. Re-sign the binding TXT at a strictly higher epoch, with fpr=<new> and prev_fpr=<old> plus a prev_until covering the in-flight window.
  3. Once the grace window has elapsed, re-sign again dropping prev_fpr/prev_until.
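The two records published during a rotation can be sketched as follows. The helper names and the placeholder fingerprints are hypothetical. Bumping the epoch again in step 3 is an assumption: the text only requires a strictly higher epoch in step 2, and a further bump simply avoids any risk of looking like a rollback.

```python
def binding_record(fpr: str, epoch: int,
                   prev_fpr: str = "", prev_until: str = "") -> str:
    """Render an active binding in the documented field order."""
    return (f"v=stigmem1; fpr={fpr}; epoch={epoch}; "
            f"prev_fpr={prev_fpr}; prev_until={prev_until}")


def rotation_sequence(old_fpr: str, new_fpr: str,
                      current_epoch: int, prev_until: str) -> list:
    """Return the records to publish during and after the grace window."""
    # Step 2: strictly higher epoch, new key bound, old key in grace.
    during = binding_record(new_fpr, current_epoch + 1, old_fpr, prev_until)
    # Step 3 (assumed to bump the epoch again): drop the retiring key.
    after = binding_record(new_fpr, current_epoch + 2)
    return [during, after]


for record in rotation_sequence("sha256:OLD", "sha256:NEW", 7,
                                "2030-01-01T00:00:00Z"):
    print(record)
```

Each rendered string is the TXT value to place in the zone before re-signing.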

Revoking your key

Publish the tombstone form (status=revoked; fpr=) at a higher epoch. A downstream re-check that resolves the tombstone hard-rejects the relayed fact (relay_origin_revoked). This works while your node is offline, because revocation lives in your DNS, not your node.

Step 2: Enable the relay-path DNSSEC trust tier

DNSSEC origin trust is a sub-feature of multi-hop relay. It is only meaningful when relay is also on.

# both flags must be ON for the DNSSEC tier to run
export STIGMEM_FEDERATION_RELAY_ENABLED=true
export STIGMEM_FEDERATION_DNSSEC_TRUST_ENABLED=true
# restart the node

With federation_dnssec_trust_enabled=false (the default) the tier is inert: no resolver is constructed, no DNS is queried, and an unanchored unreachable origin fails closed exactly as before.
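The gating described above amounts to a logical AND of the two flags. The sketch below is illustrative only: the function name is hypothetical, and the way the node actually parses boolean environment values (here, a case-insensitive "true") is an assumption.

```python
import os


def dnssec_tier_active(env=os.environ) -> bool:
    """Return True only when both relay and DNSSEC trust are switched on."""
    def on(name: str) -> bool:
        # Assumption: only the literal "true" (any case) enables a flag.
        return env.get(name, "false").strip().lower() == "true"

    return (on("STIGMEM_FEDERATION_RELAY_ENABLED")
            and on("STIGMEM_FEDERATION_DNSSEC_TRUST_ENABLED"))


print(dnssec_tier_active({"STIGMEM_FEDERATION_RELAY_ENABLED": "true"}))
```

With only the relay flag set, the check is False, matching the default-OFF behavior in which no resolver is constructed.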

How a relayed origin key is trusted

When C receives a relayed fact from an unreachable, unpinned, unknown origin A whose carried manifest supplies a candidate fingerprint, C runs the first-trust ladder:

  1. operator-pin: a human-confirmed anchor wins outright.
  2. DNSSEC: resolve _stigmem-fed._key.<host>; the validated record's fingerprint must equal the candidate. On success C pins the binding.
  3. operator-confirm: an unsigned/insecure delegation, an authenticated absence on a never-signed host, or a slow-resigning (aged) signature parks the candidate in a queue for a human (Step 3).
  4. fail-closed: anything else (revoked, rollback, bogus chain, unvalidatable) is rejected.
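The ladder above can be sketched as a single decision function. The enum members, function name, and return strings are hypothetical labels for the outcomes the list describes. Treating a validated record whose fingerprint does not match the candidate as a fail-closed case is an assumption drawn from the "anything else" rule.

```python
from enum import Enum, auto


class Dnssec(Enum):
    """Hypothetical labels for what resolving the binding record returned."""
    VALIDATED = auto()
    INSECURE = auto()             # unsigned/insecure delegation
    ABSENT_NEVER_SIGNED = auto()  # authenticated absence, never-signed host
    AGED_SIGNATURE = auto()       # slow-resigning zone
    REVOKED = auto()
    ROLLBACK = auto()
    BOGUS = auto()
    UNVALIDATABLE = auto()


QUEUEABLE = {Dnssec.INSECURE, Dnssec.ABSENT_NEVER_SIGNED, Dnssec.AGED_SIGNATURE}


def first_trust(operator_pinned: bool, outcome: Dnssec,
                record_fpr, candidate_fpr: str) -> str:
    """Walk the first-trust ladder from the highest rung down."""
    if operator_pinned:
        return "trust:operator-pin"
    if (outcome is Dnssec.VALIDATED and record_fpr is not None
            and record_fpr == candidate_fpr):
        return "trust:dnssec"
    if outcome in QUEUEABLE:
        return "queue:operator-confirm"
    # Revoked, rollback, bogus, unvalidatable, or (assumed) a mismatch.
    return "reject"


print(first_trust(False, Dnssec.INSECURE, None, "sha256:abc"))
```

The order of the checks is the point of the sketch: a human pin short-circuits DNS entirely, and only the three queueable outcomes ever reach an operator.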

Recency / revocation re-check

A relayed DNSSEC key is honored only after a relay-path recency re-check confirms the binding is still current. The re-check cadence is clamp(record_DNS_TTL, floor, cap) measured from the pin's last genuine DNS validation:

  • STIGMEM_FEDERATION_DNSSEC_RECHECK_FLOOR_SECONDS (default 300): anti-storm floor.
  • STIGMEM_FEDERATION_DNSSEC_RECHECK_CAP_SECONDS (default 3600): re-resolve at least this often.
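The clamp can be worked through with the documented defaults. In this sketch the function names are hypothetical, and treating the exact boundary as still within the cadence is an assumption.

```python
def recheck_interval(record_ttl: int, floor: int = 300, cap: int = 3600) -> int:
    """clamp(record_DNS_TTL, floor, cap), in seconds."""
    return max(floor, min(record_ttl, cap))


def recheck_due(now: float, last_validated: float, record_ttl: int,
                floor: int = 300, cap: int = 3600) -> bool:
    """True once the cadence since the last genuine validation has passed."""
    return now - last_validated > recheck_interval(record_ttl, floor, cap)


print(recheck_interval(60))     # TTL below the floor
print(recheck_interval(900))    # TTL inside the band
print(recheck_interval(86400))  # TTL above the cap
```

A 60-second TTL is raised to the 300-second floor, a 900-second TTL is used as-is, and a one-day TTL is lowered to the 3600-second cap.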

Within the cadence the pinned key is honored with no DNS egress (re-checks are cached per-origin, not per-fact). Past it, C re-resolves once and applies asymmetric semantics:

| Re-check result | Disposition | Audit event |
| --- | --- | --- |
| active, fingerprint still matches | honor | none |
| rotation (higher epoch, new fpr; prior key in grace) | honor + advance pin | none |
| status=revoked tombstone | reject | relay_origin_revoked |
| epoch below the seen floor (rollback) | reject | relay_origin_rolled_back |
| aged RRSIG (operator-confirm is first-trust-only) | reject | relay_origin_recheck_stale |
| no validatable answer (transport/SERVFAIL/insecure/absent) | honor within grace, else reject | relay_origin_recheck_unreachable |

A positive withdrawal (revoked or rollback) is hard-rejected: an attacker cannot forge a DNSSEC-validated answer, so a positive answer is proof. Suppression (no positive proof) is honored only up to min(STIGMEM_FEDERATION_DNSSEC_UNREACHABLE_GRACE_SECONDS, STIGMEM_FEDERATION_DNSSEC_UNREACHABLE_TTL_MULTIPLE × cap), measured from the last genuine DNS validation. It is never treated as a revocation and never extended by relay activity.
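The asymmetric semantics can be sketched as a lookup plus one time-bounded branch. The result labels and function names are hypothetical. The documentation gives no defaults for the grace and multiple settings, so the example values below are made up. Emitting relay_origin_recheck_unreachable both when the key is honored within grace and when it is rejected is also an assumption.

```python
def suppression_grace(grace_seconds: int, ttl_multiple: int,
                      cap_seconds: int = 3600) -> int:
    """min(UNREACHABLE_GRACE_SECONDS, UNREACHABLE_TTL_MULTIPLE x cap)."""
    return min(grace_seconds, ttl_multiple * cap_seconds)


# Positive proof of withdrawal or staleness: always a hard reject.
REJECTS = {
    "revoked": "relay_origin_revoked",
    "rollback": "relay_origin_rolled_back",
    "aged_rrsig": "relay_origin_recheck_stale",
}


def recheck_disposition(result: str, since_validation: float, grace: float):
    """Return (disposition, audit_event) for one re-check result."""
    if result == "active":
        return ("honor", None)
    if result == "rotation":
        return ("honor+advance-pin", None)
    if result in REJECTS:
        return ("reject", REJECTS[result])
    if result == "unreachable":
        # Suppression: honored only inside the grace measured from the
        # last genuine DNS validation, never extended by relay activity.
        verdict = "honor" if since_validation <= grace else "reject"
        return (verdict, "relay_origin_recheck_unreachable")
    raise ValueError(f"unknown re-check result: {result!r}")


grace = suppression_grace(grace_seconds=86400, ttl_multiple=4)  # example values
print(grace)
print(recheck_disposition("unreachable", 7200, grace))
print(recheck_disposition("unreachable", 20000, grace))
```

With these example values the multiple-times-cap term (14400 seconds) is the smaller bound, so a resolver outage is tolerated for four hours after the last genuine validation and rejected afterwards.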

Step 3: Work the operator-confirm queue

Origins that DNSSEC can neither anchor nor reject (unsigned delegation, authenticated absence on a never-signed host, slow-resigning zone) are parked for an out-of-band human confirm. List, confirm, or reject pending candidates from the CLI (each subcommand calls the local node's admin API; provide --node-url or STIGMEM_NODE_URL, and an admin:federation --api-key):

# list quarantined candidates
stigmem federation dnssec pending

# paste-to-confirm a candidate (the fingerprint must match exactly)
stigmem federation dnssec confirm \
--entity-uri https://memory.acme.example/ \
--node-id stigmem://node-a-... \
--key-fpr sha256:...

# reject a candidate without trusting it
stigmem federation dnssec reject \
--entity-uri https://memory.acme.example/ \
--node-id stigmem://node-a-...

The same operations are available on the admin API:

  • GET /v1/federation/dnssec/pending: list the queue.
  • POST /v1/federation/dnssec/pending/confirm: paste-to-confirm (the body's pasted fingerprint must match the quarantined candidate).
  • POST /v1/federation/dnssec/pending/reject: clear a pending row without trusting it.

The per-peer queue is bounded by federation_dnssec_pending_confirm_cap (default 100) so an untrusted relay cannot flood it; rows expire after federation_dnssec_pending_confirm_ttl (default 7 days).
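One way to picture a per-peer bounded queue with expiry and paste-to-confirm is the sketch below. The class and its methods are hypothetical and do not reflect the node's storage. Refusing new rows once a peer reaches the cap, keeping an existing row unchanged when the same candidate is offered again, and leaving a row in place after a mismatched confirm are all assumptions.

```python
class PendingQueue:
    """Illustrative per-peer operator-confirm queue with a cap and a TTL."""

    def __init__(self, cap: int = 100, ttl_seconds: int = 7 * 24 * 3600):
        self.cap = cap
        self.ttl = ttl_seconds
        # peer -> {(entity_uri, node_id): (key_fpr, enqueued_at)}
        self.rows = {}

    def _live_rows(self, peer: str, now: float) -> dict:
        rows = self.rows.setdefault(peer, {})
        # Drop rows whose TTL has elapsed.
        for key in [k for k, (_, t) in rows.items() if now - t >= self.ttl]:
            del rows[key]
        return rows

    def offer(self, peer, entity_uri, node_id, key_fpr, now) -> bool:
        """Park a candidate; return False if the peer's queue is full."""
        rows = self._live_rows(peer, now)
        key = (entity_uri, node_id)
        if key not in rows and len(rows) >= self.cap:
            return False  # assumption: new rows are refused at the cap
        rows.setdefault(key, (key_fpr, now))  # do not refresh existing rows
        return True

    def confirm(self, peer, entity_uri, node_id, pasted_fpr, now) -> bool:
        """Paste-to-confirm: succeed only on an exact fingerprint match."""
        rows = self._live_rows(peer, now)
        row = rows.get((entity_uri, node_id))
        if row is None or row[0] != pasted_fpr:
            return False
        del rows[(entity_uri, node_id)]
        return True
```

Because the cap is tracked per peer, one noisy relay filling its own quota does not stop candidates that arrive through a different peer from being parked.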

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Relayed fact fails closed, audit shows relay_origin_unanchored | flag off / no candidate | Confirm STIGMEM_FEDERATION_DNSSEC_TRUST_ENABLED=true AND the relay carried the origin's manifest (the candidate key the binding fingerprint is matched against). |
| relay_origin_revoked for a key you did not revoke | stale tombstone | Check the served TXT: a leftover status=revoked record withdraws the key. Re-sign an active record at a higher epoch. |
| relay_origin_rolled_back | epoch went backwards | The served epoch is below one a downstream already pinned. Always bump epoch monotonically; never reuse a lower value. |
| relay_origin_recheck_stale | aged RRSIG | Re-sign the binding TXT; the RRSIG is older than federation_dnssec_max_rrsig_age. |
| Candidate stuck in the operator-confirm queue | unsigned/absent/aged | DNSSEC could not anchor it. Verify the candidate out-of-band, then run stigmem federation dnssec confirm (or sign the zone and let the binding re-resolve). |