Version: v0.9.0a2

Spec

Security Hardening Components

10 min read · Spec contributor · Node operator · Spec-09 + Spec-10 + Spec-11 + Spec-12

What this page is

Rendered compatibility entry point for the four hardening component specs: Spec-09-Audit-Log, Spec-10-Hardening, Spec-11-Replay-Protection, and Spec-12-HLC-Bounded-Skew. It covers mTLS federation, key rotation, the audit log, per-principal quotas, replay protection, the container baseline, and the transparency log self-hosting decision.

Authoritative source: spec/stigmem-spec-v0.9.0a1.md

Section body

Legacy §22 anchors are retained for existing links while the maintained hardening prose lives in the modular component specs listed above.

§22.1 mTLS federation transport

§22.1.1 Scope

This section specifies mutual TLS requirements for all transport connections between federated Stigmem nodes. The spec otherwise treats the federation wire protocol as transport-agnostic (§6); §22.1 narrows that flexibility for deployments connecting more than one node.

§22.1.2 Normative requirements

1. All federation transport connections between distinct Stigmem nodes MUST use mutual TLS (mTLS): both the dialing node and the accepting node MUST present a valid X.509 certificate and MUST verify the peer's certificate before data exchange begins.
2. The TLS version floor is TLS 1.3. Nodes MUST NOT negotiate TLS 1.2 or earlier on federation ports. Implementations MUST configure their TLS stack to refuse downgrade to TLS < 1.3.
3. Implementations MUST support the TLS 1.3 cipher suites TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384, and TLS_CHACHA20_POLY1305_SHA256. Operators MAY restrict the enabled set to a subset of these but MUST NOT enable cipher suites outside this list without board-level security approval documented in the ops runbook.
4. Node certificate Subject Alternative Names (SANs) MUST include the node's canonical entity_uri as a URI SAN. Verifying nodes MUST check that the peer's SAN matches the entity_uri declared in the peer's org manifest (§19.1.2) before accepting the connection as authenticated.
5. Nodes MUST reject any federation connection from a peer whose certificate chain cannot be verified against a locally configured trust root or whose SAN does not match the expected entity_uri.
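A minimal sketch of node-side TLS setup for the requirements above, using Python's standard `ssl` module. The function names `make_federation_context` and `peer_san_matches` are hypothetical, and certificate and trust-root paths are left to the caller. Python's `ssl` module cannot select TLS 1.3 cipher suites (`set_ciphers` only affects TLS 1.2 and below), so this sketch relies on OpenSSL's default TLS 1.3 suite list, which consists of the three suites listed above; restricting to a subset would be done in the OpenSSL configuration.

```python
import ssl
from typing import Optional


def make_federation_context(server: bool,
                            cafile: Optional[str] = None,
                            certfile: Optional[str] = None,
                            keyfile: Optional[str] = None) -> ssl.SSLContext:
    """TLS context for a federation link: TLS 1.3 only, peer certificate required."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER if server else ssl.PROTOCOL_TLS_CLIENT)
    # Refuse TLS 1.2 and earlier: the handshake fails rather than downgrading.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    if not server:
        # Peers are identified by a URI SAN, not a DNS name, so the built-in
        # hostname check is replaced by peer_san_matches() after the handshake.
        ctx.check_hostname = False
    # Mutual TLS: both sides must present a certificate chaining to the trust root.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cafile:
        ctx.load_verify_locations(cafile=cafile)  # locally configured trust root
    if certfile:
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)  # this node's cert
    return ctx


def peer_san_matches(peer_cert: dict, expected_entity_uri: str) -> bool:
    """True if the verified peer certificate carries expected_entity_uri as a URI SAN."""
    return ("URI", expected_entity_uri) in peer_cert.get("subjectAltName", ())
```

After the handshake, a node would call `peer_san_matches(conn.getpeercert(), expected_uri)` with the entity_uri taken from the peer's org manifest and close the connection if it returns `False`.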

Reverse-proxy deployments

If a reverse proxy (nginx, Caddy, Envoy) terminates TLS in front of the Stigmem node process, the node never sees the peer certificate, so mTLS peer certificate validation is bypassed. Set STIGMEM_MTLS_REQUIRED=true to force the node to reject any connection without a verified peer certificate, even behind a proxy. Verify this configuration in staging before enabling federation.

§22.1.3 Cert rotation hook into §19 manifest

When a node rotates its mTLS node certificate:

1. The node MUST generate a new X.509 certificate for the new key pair.
2. The new certificate's public key fingerprint MUST be recorded in the node's org manifest (§19.1) as a new RotationEvent (§19.1.4) alongside the Ed25519 key rotation, or in a dedicated tls_cert_fingerprint field on the manifest if the TLS key is distinct from the Ed25519 signing key. Implementations MUST NOT rotate the mTLS certificate silently: every rotation MUST produce a manifest update.
3. The updated manifest MUST be re-signed and re-published to /.well-known/stigmem-manifest.json (§19.1.6) before the new certificate is put into service.
4. The updated manifest MUST be submitted to the transparency log (§19.2) as part of the rotation event. Nodes MUST NOT activate the new certificate until the submission has been acknowledged (i.e., until a LogEntry is received), with one exception: nodes SHOULD retry the submission for up to 24 hours, and if rotation then proceeds without an acknowledgement (e.g., during a Rekor maintenance window), the node MUST record a pending_log_submission: true flag in the manifest and MUST complete the submission as soon as the log is reachable.
5. During the transition window (see §22.2.2 for the dual-trust period), nodes MUST accept both the old and new TLS certificates from the rotating peer. The transition window MUST NOT exceed the dual-trust period defined in §22.2.
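The activation gate in the transparency-log step can be sketched as a pure decision function. The 24-hour retry budget comes from the SHOULD above; treating budget exhaustion as the point at which rotation may proceed is one reading of the text, and `decide_cert_activation` and `ActivationDecision` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

# Retry budget from the SHOULD in the transparency-log step.
LOG_RETRY_BUDGET = timedelta(hours=24)


class ActivationDecision(NamedTuple):
    activate: bool                # may the new certificate go into service now?
    pending_log_submission: bool  # must the manifest carry the pending flag?


def decide_cert_activation(log_acknowledged: bool,
                           first_submit_at: datetime,
                           now: datetime) -> ActivationDecision:
    """Gate activation of a rotated mTLS certificate on the transparency log ack."""
    if log_acknowledged:
        # Normal path: a LogEntry was received.
        return ActivationDecision(activate=True, pending_log_submission=False)
    if now - first_submit_at < LOG_RETRY_BUDGET:
        # Inside the retry budget: keep retrying and keep serving the old cert.
        return ActivationDecision(activate=False, pending_log_submission=False)
    # Budget exhausted (e.g. a Rekor maintenance window): proceed, but flag the
    # manifest so the submission is completed once the log is reachable.
    return ActivationDecision(activate=True, pending_log_submission=True)
```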

§22.1.4 Client certificate provisioning

Nodes SHOULD use short-lived mTLS client certificates (≤ 24 hours) issued by a local certificate authority dedicated to federation transport. Operators MAY use longer-lived certificates (≤ 90 days) provided they implement automated rotation (e.g., via cert-manager or equivalent). Long-lived certificates MUST be listed in the node's org manifest as described in §22.1.3.
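A hypothetical helper that classifies a client certificate's validity period against the limits above. It only inspects lifetimes; listing long-lived certificates in the manifest is a separate obligation, and the returned labels are illustrative.

```python
from datetime import datetime, timedelta

SHORT_LIVED_MAX = timedelta(hours=24)  # SHOULD: short-lived client certificates
LONG_LIVED_MAX = timedelta(days=90)    # MAY, only with automated rotation


def classify_client_cert(not_before: datetime, not_after: datetime,
                         automated_rotation: bool) -> str:
    """Classify an mTLS client certificate's validity period (illustrative labels)."""
    lifetime = not_after - not_before
    if lifetime <= SHORT_LIVED_MAX:
        return "short-lived"
    if lifetime <= LONG_LIVED_MAX and automated_rotation:
        # Permitted, but must also be listed in the org manifest.
        return "long-lived"
    return "outside-policy"
```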

§22.2 Key rotation

§22.2.1 Scope

This section applies to two key types:

| Key type | Use | Reference |
|---|---|---|
| Ed25519 node signing keys | manifests + capability tokens | §19.1, §19.3 |
| Capability issuer keys | live capability token issuers | Subset of node signing keys |

§22.2.2 Rollover window and dual-trust period

The rollover window begins when a new key pair is generated and ends when all previously issued capability tokens signed by the old key have expired or been explicitly revoked.
During the rollover window, nodes MUST maintain a dual-trust period: both the old and new public keys are simultaneously trusted for signature verification. The dual-trust period MUST cover at least the maximum outstanding capability token lifetime from the time rotation is initiated. Since capability tokens MUST NOT exceed 90 days (§19.3.2), the dual-trust period MUST be at least 90 days unless all outstanding tokens are explicitly revoked before rotation completes.
Nodes MUST reject tokens signed by a retired key once its dual-trust period has elapsed and it is no longer in the org manifest's rotation chain.
The rollover window MUST be recorded in the org manifest via a RotationEvent (§19.1.4). The transparency log MUST receive a separate log entry for the rotation event with event_type: "key_rotation" and a dual_trust_expires_at field indicating when the old key's trust period ends.
During the dual-trust period, verifiers SHOULD consult the org manifest rotation chain (§19.1.4) to identify which historic key signed a given token, rather than assuming the current manifest key.
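A sketch of how a verifier might resolve the signing key for a token during the dual-trust period, assuming the token names its signer's key_id and that the rotation chain has been loaded from the org manifest into a list of `ChainKey` records. `ChainKey` and `select_verification_key` are hypothetical; the record is not the §19.1.4 schema, and signature verification itself is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class ChainKey:
    key_id: str
    public_key: bytes
    # None for the current key; for a retired key, taken from its RotationEvent.
    dual_trust_expires_at: Optional[datetime] = None


def select_verification_key(chain: Sequence[ChainKey], token_key_id: str,
                            now: datetime) -> Optional[ChainKey]:
    """Return the trusted key named by the token, or None if it must be rejected."""
    for key in chain:
        if key.key_id != token_key_id:
            continue
        if key.dual_trust_expires_at is None or now < key.dual_trust_expires_at:
            return key  # current key, or a retired key still in dual trust
        return None     # retired key whose dual-trust period has elapsed
    return None         # key_id not in the rotation chain
```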

§22.2.3 Transparency log entry on rotation

Every key rotation MUST produce a transparency log entry so that federation peers and auditors can verify the chain of identity across key transitions. The entry is signed by the old (retiring) key — this anchors the new key to the prior identity and prevents a compromised new key from fabricating a rotation event.

KeyRotationLogEntry:
  event_type:           "key_rotation"
  entity_uri:           URI         // the rotating node/org
  old_key_id:           hex         // key_id of the retiring key
  new_key_id:           hex         // key_id of the new key
  rotated_at:           RFC3339
  dual_trust_expires_at: RFC3339    // old key trusted until this time
  manifest_log_index:   integer     // log index of the updated manifest submission
  rotation_sig:         base64url   // Ed25519 sig over RFC 8785 JCS encoding, signed by OLD key

The rotation_sig MUST verify under the public key identified by old_key_id. The signed byte sequence MUST be the RFC 8785 JSON Canonicalization Scheme (JCS) serialisation of every field except rotation_sig: keys sorted lexicographically, no whitespace, UTF-8 encoding, no trailing newline.
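For this particular record shape, the canonical bytes can be produced with the standard library. This is a sketch, not a general JCS implementation: `json.dumps` with sorted keys and compact separators agrees with RFC 8785 here only because every key is ASCII and the values are strings plus one small integer. The function names are hypothetical, and the unpadded base64url form in `b64url_nopad` is an assumption the spec does not state. The resulting bytes would then be signed with the retiring Ed25519 key using any Ed25519 library.

```python
import base64
import json


def rotation_signing_bytes(entry: dict) -> bytes:
    """Bytes the retiring key signs: every field except rotation_sig, canonicalised.

    Sorted keys, no whitespace, UTF-8, no trailing newline. This matches RFC 8785
    only for simple records like KeyRotationLogEntry; general JSON needs a real
    JCS library (JCS sorts by UTF-16 code units and fixes number formatting).
    """
    unsigned = {k: v for k, v in entry.items() if k != "rotation_sig"}
    text = json.dumps(unsigned, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def b64url_nopad(raw: bytes) -> str:
    """base64url without padding (assumed wire form for rotation_sig)."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
```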

The manifest submission (§22.1.3.4) MUST be acknowledged by the transparency log before the KeyRotationLogEntry is submitted; the returned log index MUST be recorded as manifest_log_index.

§22.2.4 Rotation cadence

| Key type | SHOULD cadence | Notes |
|---|---|---|
| Ed25519 node signing keys | ≤ 365 days | Operators MAY define shorter cadences. |
| Capability issuer keys | ≤ 90 days | Matches the maximum token lifetime. |

Cadence MUST be documented in the node's operational runbook and MAY be declared in the node's /.well-known/stigmem advertisement.

§22.3 Audit log surface

§22.3.1 Required event types

Every Stigmem node MUST emit structured audit log events for the following operations. Each event MUST be written to the audit log before the operation's response is returned to the caller (write-ahead semantics).

| Event type | Trigger | Minimum fields |
|---|---|---|
| fact_write | assert/retract | event_type, timestamp, hlc, actor_entity, fact_id, scope, verb |
| fact_read | recall returning ≥1 fact | event_type, timestamp, actor_entity, scope_filter, fact_ids_returned[], query_strategy |
| capability_token_issue | token issued | token_id, issuer, subject, verb, object, expiry |
| capability_token_revoke | token revoked | token_id, issuer, revoked_at, reason |
| manifest_publish | manifest published or updated | entity_uri, key_id, manifest_hash |
| key_rotation | Ed25519 or mTLS key rotated | entity_uri, old_key_id, new_key_id, dual_trust_expires_at |
| federation_connect | peer connection accepted/rejected | peer_entity_uri, peer_cert_fingerprint, outcome, reject_reason? |
| quarantine_admit | fact admitted to quarantine | fact_id, source, admit_reason |
| quarantine_release | fact released from quarantine | fact_id, actor_entity, decision |
| quota_breach | per-principal quota ceiling hit | principal, quota_dimension, ceiling, actual |
| admin_action | any admin API call | actor_entity, action, resource, outcome |
| replay_rejected | capability token replay | token_id, nonce, reject_reason |
| instruction_audit | lazy instruction preload/recall | agent_id, chunk_id, load_trigger, outcome. MUST be emitted if the instruction recall layer is active; nodes not implementing the lazy instruction layer are exempt. |
| instruction_quarantined | instruction-namespace quarantine | fact_id, actor_entity, source, reason |
| instruction_promoted | quarantined instruction promoted | fact_id, actor_entity, quarantine_garden_id, target_garden_id? |

Implementations MUST NOT omit required fields. Optional fields (marked ?) SHOULD be included when available.
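The required-field rule can be enforced at the emission point. This sketch transcribes the minimum fields from the table into a lookup and reports what an event is missing; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names. The table lists event_type and timestamp explicitly only for fact_write and fact_read, and the transcription mirrors that.

```python
from typing import Mapping, Set

# Minimum fields per event type, transcribed from the table above.
# Optional fields (marked ?) are left out on purpose.
REQUIRED_FIELDS = {
    "fact_write": {"event_type", "timestamp", "hlc", "actor_entity", "fact_id", "scope", "verb"},
    "fact_read": {"event_type", "timestamp", "actor_entity", "scope_filter",
                  "fact_ids_returned", "query_strategy"},
    "capability_token_issue": {"token_id", "issuer", "subject", "verb", "object", "expiry"},
    "capability_token_revoke": {"token_id", "issuer", "revoked_at", "reason"},
    "manifest_publish": {"entity_uri", "key_id", "manifest_hash"},
    "key_rotation": {"entity_uri", "old_key_id", "new_key_id", "dual_trust_expires_at"},
    "federation_connect": {"peer_entity_uri", "peer_cert_fingerprint", "outcome"},
    "quarantine_admit": {"fact_id", "source", "admit_reason"},
    "quarantine_release": {"fact_id", "actor_entity", "decision"},
    "quota_breach": {"principal", "quota_dimension", "ceiling", "actual"},
    "admin_action": {"actor_entity", "action", "resource", "outcome"},
    "replay_rejected": {"token_id", "nonce", "reject_reason"},
    "instruction_audit": {"agent_id", "chunk_id", "load_trigger", "outcome"},
    "instruction_quarantined": {"fact_id", "actor_entity", "source", "reason"},
    "instruction_promoted": {"fact_id", "actor_entity", "quarantine_garden_id"},
}


def missing_fields(event: Mapping[str, object]) -> Set[str]:
    """Required fields absent from event; raises on an unknown event_type."""
    event_type = event.get("event_type")
    if event_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown audit event_type: {event_type!r}")
    return REQUIRED_FIELDS[event_type] - set(event)
```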

§22.3.2 Ordering guarantee

Audit log events MUST be totally ordered by a monotonically increasing sequence number within a single node. Events SHOULD include the node's HLC tick (§2.4) alongside the wall-clock timestamp to allow cross-node ordering reconstruction. The sequence MUST NOT reset across node restarts.
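One way to keep the sequence monotonic across restarts is to persist each value durably before handing it out. The file-backed `PersistentSequence` class below is a hypothetical single-process sketch; a production node would more likely allocate seq in the same transaction that appends the event, and would also fsync the containing directory after the rename.

```python
import os


class PersistentSequence:
    """Monotonic audit sequence that survives restarts (single-process sketch)."""

    def __init__(self, path: str):
        self._path = path
        try:
            with open(path, encoding="ascii") as fh:
                self._last = int(fh.read().strip() or 0)
        except FileNotFoundError:
            self._last = 0  # first start: no events issued yet

    def allocate(self) -> int:
        """Durably record and return the next sequence number."""
        value = self._last + 1
        tmp = self._path + ".tmp"
        with open(tmp, "w", encoding="ascii") as fh:
            fh.write(str(value))
            fh.flush()
            os.fsync(fh.fileno())  # on disk before anyone sees the number
        os.replace(tmp, self._path)  # atomic swap: never a half-written counter
        self._last = value
        return value
```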

§22.3.3 Retention contract

Minimum 90 days

Audit logs MUST be retained for at least 90 days.

Recommended 1 year

Operators SHOULD retain logs for 1 year for forensic purposes.

Append-only storage

Logs MUST be stored in a medium that is append-only with respect to normal operational access. Ordinary application processes MUST NOT be able to overwrite or delete log entries.

Separate from fact store

Logs MUST NOT be stored exclusively in the same database that serves the production fact store unless that database provides an independent, append-only audit trail mechanism (e.g., PostgreSQL audit extension with restricted DDL access).

§22.3.4 Admin export shape

Admins MUST be able to export audit logs via the following HTTP route.

GET /v1/admin/audit-log
Authorization: Bearer <admin-token>
Query parameters:
  after:      RFC3339   // events after this timestamp (exclusive); omit for all
  before:     RFC3339   // events before this timestamp (exclusive); omit for open end
  event_type: string    // filter to one event type; repeatable for multiple types
  limit:      integer   // max events per page; default 500; max 5000
  cursor:     string    // opaque pagination cursor from prior response

Response:

{
  "events": [
    {
      "seq":        12345,
      "event_type": "fact_write",
      "timestamp":  "2026-05-04T12:00:00Z",
      "hlc":        "1746360000000-0001-a1b2",
      ...
    }
  ],
  "next_cursor": "opaque-cursor-string",
  "has_more":   true
}

Admin-scoped token required

The export route MUST require an admin-scoped token.

Ascending `seq` order

Events MUST be returned in ascending seq order.

Streaming support

The route MUST support streaming for large time ranges (chunked transfer or cursor pagination with has_more).

CLI wrapper

Operators SHOULD provide a CLI wrapper for this endpoint that writes NDJSON to stdout.
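A sketch of the core of such a CLI wrapper: it builds the query string (repeating event_type for multiple types), follows next_cursor until has_more is false, writes NDJSON, and checks the ascending-seq guarantee. The function names are hypothetical, and `fetch_page` is a caller-supplied function that would issue the authenticated GET with the admin bearer token and return the decoded JSON body.

```python
import json
from typing import Callable, Dict, IO, Iterable, Optional
from urllib.parse import urlencode


def build_export_query(after: Optional[str] = None, before: Optional[str] = None,
                       event_types: Iterable[str] = (), limit: int = 500,
                       cursor: Optional[str] = None) -> str:
    """Query string for GET /v1/admin/audit-log; event_type repeats per type."""
    if not 1 <= limit <= 5000:
        raise ValueError("limit must be between 1 and 5000")
    params = [("limit", str(limit))]
    if after:
        params.append(("after", after))
    if before:
        params.append(("before", before))
    params += [("event_type", t) for t in event_types]
    if cursor:
        params.append(("cursor", cursor))
    return urlencode(params)


def export_ndjson(fetch_page: Callable[[Optional[str]], Dict], out: IO[str]) -> int:
    """Follow next_cursor until has_more is false; write one event per line."""
    cursor, written, last_seq = None, 0, None
    while True:
        page = fetch_page(cursor)
        for event in page["events"]:
            if last_seq is not None and event["seq"] <= last_seq:
                raise ValueError("export not in ascending seq order")
            last_seq = event["seq"]
            out.write(json.dumps(event, separators=(",", ":")) + "\n")
            written += 1
        if not page.get("has_more"):
            return written
        cursor = page["next_cursor"]
```

A CLI would call `export_ndjson(fetch, sys.stdout)` so the output can be piped straight into other tools.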

§22.4 Per-principal quotas

§22.4.1 Model

Stigmem implements per-principal rate limiting using a token-bucket model. Each (principal, dimension) pair maintains an independent token bucket. The principal is the actor_entity URI derived from the authenticated caller's capability token or API key.

TokenBucket:
  principal:   URI      // entity URI of the caller
  dimension:   string   // quota dimension (see §22.4.2)
  capacity:    integer  // bucket size (max burst)
  rate:        float    // refill rate in tokens/second
  current:     float    // current token count (updated on each request)
  last_refill: RFC3339  // timestamp of last refill computation

The bucket refills continuously at rate tokens/second up to capacity. Each qualifying request consumes 1 token unless otherwise specified per dimension.
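A minimal in-process token bucket following the model above. Two assumptions: a new bucket starts full, and time is read from a monotonic clock in seconds rather than stored as an RFC3339 last_refill. The class name is hypothetical; a node would keep one instance per (principal, dimension) pair, e.g. in a dict keyed by that tuple.

```python
import time
from typing import Callable


class TokenBucket:
    """One bucket per (principal, dimension): capacity = max burst, rate = tokens/s."""

    def __init__(self, capacity: int, rate: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self._clock = clock
        self.current = float(capacity)  # assumption: a new bucket starts full
        self.last_refill = clock()

    def _refill(self) -> None:
        """Add rate * elapsed tokens, never exceeding capacity."""
        now = self._clock()
        elapsed = max(0.0, now - self.last_refill)
        self.current = min(float(self.capacity), self.current + elapsed * self.rate)
        self.last_refill = now

    def try_consume(self, cost: float = 1.0) -> bool:
        """Spend cost tokens if available; False means answer with HTTP 429."""
        self._refill()
        if self.current >= cost:
            self.current -= cost
            return True
        return False
```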

§22.4.2 Quota dimensions and default ceilings

| Dimension | Capacity | Rate | Description |
|---|---|---|---|
| fact_write | 100 | 10/s | Fact assertions and retractions. |
| fact_read | 500 | 50/s | Recall and query operations. |
| token_issue | 20 | 0.33/min | Capability token issuance. |
| federation_pull | 30 | 0.5/min | Outbound federation pull calls. |
| admin_action | 10 | 0.17/min | Admin API calls. |
| subscription_event | 200 | 20/s | Outbound subscription event deliveries. |
| audit_export | 10000 | 167/min | Rows returned from the audit export endpoint. |

Default ceilings MUST be applied unless overridden by an admin-configured QuotaPolicy document for the principal. Overrides MUST be stored persistently and survive node restarts.
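The default ceilings can be expressed as data, with per-minute rates converted to tokens/second so they match the unit of the TokenBucket model in §22.4.1. The override lookup is a hypothetical stand-in for a persisted QuotaPolicy document, whose schema this section does not define.

```python
from typing import Dict, Mapping, Tuple

# (capacity, refill rate in tokens/second), transcribed from the table above;
# per-minute rates are divided by 60.
DEFAULT_CEILINGS: Dict[str, Tuple[int, float]] = {
    "fact_write": (100, 10.0),
    "fact_read": (500, 50.0),
    "token_issue": (20, 0.33 / 60),
    "federation_pull": (30, 0.5 / 60),
    "admin_action": (10, 0.17 / 60),
    "subscription_event": (200, 20.0),
    "audit_export": (10000, 167 / 60),
}


def effective_ceiling(dimension: str,
                      overrides: Mapping[str, Tuple[int, float]]) -> Tuple[int, float]:
    """Use the principal's QuotaPolicy override when present, else the default."""
    if dimension in overrides:
        return overrides[dimension]
    return DEFAULT_CEILINGS[dimension]
```

Dividing capacity by rate shows the pattern behind the defaults: the per-minute dimensions refill from empty in roughly an hour (for example, 30 / (0.5/60) = 3600 seconds for federation_pull), while the per-second dimensions refill in 10 seconds.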

§22.4.3 Backpressure response shape

When a principal's token bucket is exhausted:

The node MUST return HTTP 429 Too Many Requests with the body shape below. retry_after is a float number of seconds until the bucket refills enough to accept one more request at the current rate. Implementations MUST compute it as (1 - current) / rate, with rate in tokens/second (the seconds needed to earn one token).
The node MUST include a Retry-After HTTP header with the integer ceiling of retry_after.
The node MUST emit a quota_breach audit log event (§22.3.1) for every request that hits the ceiling.
Nodes SHOULD propagate quota pressure upstream to federated callers via the X-Stigmem-Replication-Lag header (§6.7) when federation_pull quota is exhausted.
Callers MUST honour Retry-After and MUST implement exponential backoff with jitter after two consecutive 429 responses from the same node.

{
  "error":        "quota_exceeded",
  "dimension":    "fact_write",
  "principal":    "stigmem://org/my-agent",
  "retry_after":  3.2
}
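The server- and client-side arithmetic from the list above, as a sketch with hypothetical function names. `quota_exceeded_response` applies the (1 - current) / rate formula with rate in tokens/second and rounds the header up; rounding the body value to milliseconds is an assumption. `client_delay` adds full-jitter exponential backoff on top of Retry-After from the second consecutive 429 onward; `base` and `cap` are illustrative defaults, not spec values.

```python
import math
import random
from typing import Dict, Tuple


def quota_exceeded_response(dimension: str, principal: str, current: float,
                            rate_per_s: float) -> Tuple[int, Dict[str, str], dict]:
    """Status, headers and body for a request that found its bucket empty."""
    retry_after = (1.0 - current) / rate_per_s  # seconds to earn one token
    headers = {"Retry-After": str(math.ceil(retry_after))}  # integer ceiling
    body = {"error": "quota_exceeded", "dimension": dimension,
            "principal": principal,
            "retry_after": round(retry_after, 3)}  # ms rounding is an assumption
    return 429, headers, body


def client_delay(retry_after_header: str, consecutive_429s: int,
                 base: float = 0.5, cap: float = 60.0) -> float:
    """Seconds to wait before retrying: Retry-After, plus jitter from the 2nd 429."""
    delay = float(retry_after_header)  # always honour Retry-After
    if consecutive_429s >= 2:
        # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^(n-1))].
        delay += random.uniform(0.0, min(cap, base * 2 ** (consecutive_429s - 1)))
    return delay
```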

§22.5 Replay protection

§22.5.1 Scope

This section extends §19.3.5 (capability token nonce) with normative clock-skew bounds and a unified replay protection model applicable to both capability tokens and federation handshake messages.

§22.5.2 Nonce and timestamp window

Every capability token MUST include a nonce of 32 cryptographically random bytes (§19.3.5). Every federation handshake message MUST include an independent nonce of 32 cryptographically random bytes.
The timestamp acceptance window is ± 5 minutes from the verifier's local clock. Tokens or messages whose issued_at timestamp falls outside this window MUST be rejected with timestamp_future_dated or timestamp_stale (§22.5.3), even if the nonce is fresh.
The nonce cache MUST retain seen nonces for at least the duration of the acceptance window plus the maximum token lifetime (5 minutes + 90 days for capability tokens; 5 minutes + session duration for handshake messages). Implementations MUST NOT prune nonces from the cache before this window elapses.
Nonces MUST be stored in a persistent cache (survives node restarts within the retention window). An in-memory-only nonce cache MUST NOT be used in production; a brief restart MUST NOT create a replay window.
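A persistent nonce cache can be built on SQLite, using the primary-key constraint to detect replays atomically. This is a hypothetical sketch: the table layout, the `NonceCache` name, and storing retention deadlines as epoch seconds are assumptions. Callers pass the maximum lifetime for the message class (90 days for capability tokens, the session duration for handshakes), and `prune` only removes rows whose full retention window has elapsed.

```python
import sqlite3
from datetime import datetime, timedelta

ACCEPT_WINDOW = timedelta(minutes=5)  # the ± 5 minute acceptance window


class NonceCache:
    """Persistent replay cache: one row per nonce, kept until its window elapses."""

    def __init__(self, path: str):
        self._db = sqlite3.connect(path)  # a file path, never ":memory:" in production
        with self._db:
            self._db.execute("CREATE TABLE IF NOT EXISTS nonces ("
                             "nonce BLOB PRIMARY KEY, retain_until REAL NOT NULL)")

    def check_and_store(self, nonce: bytes, max_lifetime: timedelta,
                        now: datetime) -> bool:
        """Record a fresh nonce and return True; return False if already seen."""
        if len(nonce) != 32:
            raise ValueError("nonce must be 32 bytes")
        retain_until = (now + ACCEPT_WINDOW + max_lifetime).timestamp()
        try:
            with self._db:  # commit on success, roll back on error
                self._db.execute("INSERT INTO nonces VALUES (?, ?)",
                                 (nonce, retain_until))
        except sqlite3.IntegrityError:
            return False  # primary-key clash: a replay (token_replay)
        return True

    def prune(self, now: datetime) -> int:
        """Delete only nonces whose full retention window has elapsed."""
        with self._db:
            cur = self._db.execute("DELETE FROM nonces WHERE retain_until < ?",
                                   (now.timestamp(),))
        return cur.rowcount
```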

§22.5.3 Clock-skew bounds

| Condition | Case | Behavior on violation |
|---|---|---|
| issued_at > verifier clock + 5 min | future-dated | Reject: timestamp_future_dated. |
| issued_at < verifier clock − 5 min | stale | Reject: timestamp_stale. |
| expiry < verifier clock | expired | Reject: token_expired. |
| expiry > issued_at + 90 days | excessive lifetime | Reject: token_lifetime_exceeded. |

Nodes MUST synchronise their system clocks via NTP (or equivalent). Operators SHOULD configure alerts if clock drift exceeds 30 seconds.
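The table translates directly into an ordered series of checks. The order in which the checks run is an assumption (the table does not define precedence), and `check_token_times` is a hypothetical name; all four outcomes map to HTTP 401 per §22.5.4.

```python
from datetime import datetime, timedelta
from typing import Optional

SKEW = timedelta(minutes=5)
MAX_LIFETIME = timedelta(days=90)


def check_token_times(issued_at: datetime, expiry: datetime,
                      now: datetime) -> Optional[str]:
    """Return the error code for the first failing time check, or None."""
    if issued_at > now + SKEW:
        return "timestamp_future_dated"
    if issued_at < now - SKEW:
        return "timestamp_stale"
    if expiry < now:
        return "token_expired"
    if expiry > issued_at + MAX_LIFETIME:
        return "token_lifetime_exceeded"
    return None  # times acceptable; the nonce check (token_replay) comes next
```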

§22.5.4 Error codes

| HTTP | Code | Class | Condition |
|---|---|---|---|
| 401 | timestamp_future_dated | replay | issued_at more than 5 minutes in the future. |
| 401 | timestamp_stale | replay | issued_at more than 5 minutes in the past. |
| 401 | token_expired | lifecycle | Token expiry has passed. |
| 401 | token_lifetime_exceeded | policy | Token expiry − issued_at > 90 days. |
| 401 | token_replay | replay | Nonce already seen within the retention window. |

§22.6 Container baseline

§22.6.1 Scope

This section specifies the normative security posture for reference operator container images published by Eidetic Labs. Third-party operators running Stigmem from source SHOULD adopt the same baseline.

v0.9.0a1 status

The Docker / Docker Compose requirements in this section apply to the supported v0.9.0a1 deployment surface. Requirements that reference Helm charts or Kubernetes manifests apply conditionally: in v0.9.0a1 those deployment surfaces are deferred to experimental/deploy-helm/ and unsupported until they pass the ADR-008 reintroduction gates.

§22.6.2 Distroless image

Reference operator images MUST be built FROM a distroless base (e.g., gcr.io/distroless/cc-debian12 or equivalent). Images MUST NOT include a shell (sh, bash) in the production layer.
Multi-stage builds MUST be used: build dependencies and tools MUST be confined to a builder stage and MUST NOT appear in the final image layer.
The image MUST contain only the Stigmem node binary and its minimal runtime dependencies (shared libraries, CA bundle, tzdata).

§22.6.3 Non-root user

The container MUST run as a non-root user. The Dockerfile MUST include a USER directive setting a non-zero UID (SHOULD use UID 1000) in the final stage.
The container MUST NOT be run with --privileged or with CAP_SYS_ADMIN. Operators MUST NOT grant any Linux capabilities beyond the minimum required: if the node listens on a port below 1024, grant only CAP_NET_BIND_SERVICE. Nodes SHOULD listen on a port ≥ 1024, which requires no added capabilities.
Kubernetes / container runtime manifests for reference deployments MUST include:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false

§22.6.4 Read-only root filesystem

The container's root filesystem MUST be mounted read-only (readOnlyRootFilesystem: true in Kubernetes). All writable state (database files, log buffers, temporary files) MUST be mounted as explicit volumes or emptyDir mounts.
Reference Helm charts MUST configure readOnlyRootFilesystem: true by default and MUST document which volumes require write access.
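A hypothetical pre-deployment lint for a container's Kubernetes securityContext, covering the non-root and read-only root filesystem rules above. Capability names follow the Kubernetes convention of omitting the CAP_ prefix. It checks only the fields shown and is not a substitute for an admission policy.

```python
from typing import Any, Dict, List


def security_context_violations(ctx: Dict[str, Any]) -> List[str]:
    """Lint a container securityContext against the non-root and read-only rules."""
    problems = []
    if ctx.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot must be true")
    if ctx.get("runAsUser", 0) == 0:  # this lint treats an unset UID as root
        problems.append("runAsUser must be a non-zero UID (1000 recommended)")
    if ctx.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation must be false")
    if ctx.get("privileged", False):
        problems.append("privileged mode is forbidden")
    if ctx.get("readOnlyRootFilesystem") is not True:
        problems.append("readOnlyRootFilesystem must be true")
    added = set(ctx.get("capabilities", {}).get("add", []))
    if "SYS_ADMIN" in added:
        problems.append("CAP_SYS_ADMIN is forbidden")
    extra = added - {"NET_BIND_SERVICE", "SYS_ADMIN"}
    if extra:
        problems.append(f"capabilities beyond NET_BIND_SERVICE: {sorted(extra)}")
    return problems
```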

§22.6.5 Seccomp profile

Reference images MUST ship a seccomp profile that allows only the syscalls required by the Stigmem node binary. The profile MUST deny ptrace, process_vm_readv, process_vm_writev, kexec_load, and perf_event_open at a minimum.
Kubernetes deployments MUST apply the profile via seccompProfile.type: Localhost with localhostProfile: profiles/stigmem-node.json, or type: RuntimeDefault where a restrictive runtime default is confirmed equivalent. Unconfined MUST NOT be used in production.
The seccomp profile MUST be published alongside each release in deploy/seccomp/stigmem-node.json and versioned with the binary.
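A sketch of a release-time check for the published profile, assuming the OCI/Docker JSON seccomp format (a defaultAction plus a syscalls list of names and actions). It is deliberately simple and the function name is hypothetical: it flags a permissive default and any required-deny syscall appearing in an allow rule, and it ignores argument filters.

```python
import json
from typing import List

MUST_DENY = {"ptrace", "process_vm_readv", "process_vm_writev",
             "kexec_load", "perf_event_open"}


def seccomp_violations(profile_json: str) -> List[str]:
    """Check a seccomp profile for the minimum required denials."""
    profile = json.loads(profile_json)
    problems = []
    if profile.get("defaultAction") == "SCMP_ACT_ALLOW":
        problems.append("defaultAction must deny (allowlist profile expected)")
    allowed = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") == "SCMP_ACT_ALLOW":
            allowed.update(rule.get("names", []))
    for name in sorted(MUST_DENY & allowed):
        problems.append(f"{name} must not be allowed")
    return problems
```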

§22.6.6 Image signing

Reference images MUST be signed using Sigstore Cosign and the signature MUST be pushed to the same registry. Operators SHOULD verify the image signature before deployment using cosign verify. Image digests (not mutable tags) MUST be used in all reference Kubernetes manifests and Helm chart values.yaml defaults.

§22.7 Transparency log own-instance decision memo

§22.7.1 Purpose

§19.2.2 permits but does not require operating a self-hosted Rekor instance. This section provides normative decision criteria so that operators can determine whether self-hosting is appropriate, and records the Eidetic Labs reference deployment position.

§22.7.2 Decision criteria

An operator SHOULD self-host a Rekor instance if and only if ALL of the following criteria are met.

| Criterion | Class | Rationale |
|---|---|---|
| Private network without external egress | connectivity | Public Rekor requires egress to rekor.sigstore.dev. |
| Federation peers are all internal | topology | Public log provides independent verifiability for external peers; a private log is acceptable for closed meshes. |
| Commit to ≥ 99.9% uptime | operations | Federation peers depend on the log for manifest verification in trust_mode: strict. |
| Independent peer accessibility | protocol SHOULD | §19.2.2: the log SHOULD be independently accessible to all peers. |
| Dedicated ops team / automation | key ceremony | Rekor key rotation is operationally complex and MUST NOT be performed ad hoc. |

If any criterion is not met, the operator SHOULD use the public Rekor instance at https://rekor.sigstore.dev (or a hosted equivalent). Operators MUST NOT self-host without documented answers to each criterion in their ops runbook.

§22.7.3 Reference deployment position (Eidetic Labs)

The Eidetic Labs reference deployment uses the public Rekor instance (https://rekor.sigstore.dev).

| Criterion | Status | Notes |
|---|---|---|
| Private network without egress | Not met | Reference node targets public deployments. |
| Internal-only federation | Not met | External federation is a core use case. |
| Ops commitment ≥ 99.9% | Not evaluated | Would require dedicated SRE investment. |
| Independent peer accessibility | Not evaluated | Moot given the above. |
| Dedicated key ceremony team | Not evaluated | Moot given the above. |

Decision: defer self-hosted Rekor to backlog.

A self-hosted Rekor instance for the Eidetic Labs reference deployment does not meet the minimum decision criteria at this phase. Reconsider when (a) a private-network deployment tier is productised, or (b) a dedicated SRE function is established.

§22.7.4 Configuration

STIGMEM_TRANSPARENCY_LOG_URL=https://rekor.sigstore.dev
STIGMEM_TRANSPARENCY_LOG_PUBLIC_KEY=<base64-encoded ECDSA key from GET /api/v1/log/publicKey>

The public key is pinned explicitly rather than discovered at runtime, so the node always verifies log entries against a known trust anchor even if the Rekor URL is compromised. In production, STIGMEM_TRANSPARENCY_LOG_PUBLIC_KEY MUST be pinned; key discovery via the URL alone MUST NOT be the sole trust anchor.
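A fail-closed startup check for the pinned key, as a sketch. It assumes the variable holds base64-encoded key bytes; defaulting the URL to the public Rekor instance mirrors the reference deployment but is also an assumption, and `load_transparency_log_config` and `TransparencyLogConfig` are hypothetical names.

```python
import base64
import os
from typing import Mapping, NamedTuple


class TransparencyLogConfig(NamedTuple):
    url: str
    public_key_bytes: bytes


def load_transparency_log_config(env: Mapping[str, str] = os.environ) -> TransparencyLogConfig:
    """Fail closed: refuse to start without an explicitly pinned log key."""
    url = env.get("STIGMEM_TRANSPARENCY_LOG_URL", "https://rekor.sigstore.dev")
    pinned = env.get("STIGMEM_TRANSPARENCY_LOG_PUBLIC_KEY", "").strip()
    if not pinned:
        raise RuntimeError("STIGMEM_TRANSPARENCY_LOG_PUBLIC_KEY must be pinned; "
                           "runtime key discovery is not an acceptable trust anchor")
    return TransparencyLogConfig(url, base64.b64decode(pinned, validate=True))
```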

§22.7.5 Transparency log public-key rotation

The Sigstore/Rekor root signing key is subject to rotation (a root key rotation occurred in 2022). Operators pinning STIGMEM_TRANSPARENCY_LOG_PUBLIC_KEY MUST have a documented procedure for updating the pin.

Operators SHOULD subscribe to Sigstore transparency log key rotation announcements (the sigstore-announce mailing list and the CT log transparency dashboard) and SHOULD update STIGMEM_TRANSPARENCY_LOG_PUBLIC_KEY within 30 days of a published rotation.
A node MUST NOT treat a persistent transparency log key verification failure as a permanent misconfiguration without first checking whether a Rekor root key rotation has occurred. On repeated verification failures, the node SHOULD emit a transparency_log_key_mismatch audit log event and surface an operator alert before entering a degraded-verification state.
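The escalation path for repeated verification failures can be modeled as a small counter. The threshold, the `consecutive_failures` field, the callback shapes, and the `LogKeyFailureTracker` name are hypothetical; the point of the sketch is ordering, since the audit event and operator alert are emitted before the node enters its degraded-verification state.

```python
from typing import Callable


class LogKeyFailureTracker:
    """Escalate repeated transparency-log key verification failures."""

    def __init__(self, threshold: int, emit_audit: Callable[[dict], None],
                 alert: Callable[[str], None]):
        self.threshold = threshold  # hypothetical: failures tolerated before escalation
        self.consecutive = 0
        self.degraded = False
        self._emit_audit = emit_audit
        self._alert = alert

    def record(self, verified: bool) -> None:
        """Feed one verification result; escalate once at the threshold."""
        if verified:
            self.consecutive, self.degraded = 0, False
            return
        self.consecutive += 1
        if self.consecutive >= self.threshold and not self.degraded:
            # Audit event and operator alert come first, then degraded mode.
            self._emit_audit({"event_type": "transparency_log_key_mismatch",
                              "consecutive_failures": self.consecutive})
            self._alert("transparency log key mismatch: check for a Rekor key rotation")
            self.degraded = True
```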

Subsection anchors

Anchors below are provided so docs links to specific subsections always resolve, even when the subsection text lives only in earlier spec drafts.

§22.1 mTLS federation transport

§22.1.1 Scope

§22.1.2 Normative requirements

§22.1.3 Cert rotation hook into §19 manifest

§22.1.4 Client certificate provisioning

§22.2 Key rotation

§22.2.1 Scope

§22.2.2 Rollover window and dual-trust period

§22.2.3 Transparency log entry on rotation

§22.2.4 Rotation cadence

§22.3 Audit log surface

§22.3.1 Required event types

§22.3.2 Ordering guarantee

§22.3.3 Retention contract

Minimum 90 days

Recommended 1 year

Append-only storage

Separate from fact store

§22.3.4 Admin export shape

Admin-scoped token required

Ascending seq order

Streaming support

CLI wrapper

§22.4 Per-principal quotas

§22.4.1 Model

§22.4.2 Quota dimensions and default ceilings

§22.4.3 Backpressure response shape

§22.5 Replay protection

§22.5.1 Scope

§22.5.2 Nonce and timestamp window

§22.5.3 Clock-skew bounds

§22.5.4 Error codes

§22.6 Container baseline

§22.6.1 Scope

§22.6.2 Distroless image

§22.6.3 Non-root user

§22.6.4 Read-only root filesystem

§22.6.5 Seccomp profile

§22.6.6 Image signing

§22.7 Transparency log own-instance decision memo

§22.7.1 Purpose

§22.7.2 Decision criteria

§22.7.3 Reference deployment position (Eidetic Labs)

§22.7.4 Configuration

§22.7.5 Transparency log public-key rotation

Subsection anchors

§22.1.2.3

§22.1.2.4

§22.1.3.4

§22.1.3.5
