Security Hardening Components
What this page is
Rendered compatibility entry point for the four hardening component specs: Spec-09-Audit-Log, Spec-10-Hardening, Spec-11-Replay-Protection, and Spec-12-HLC-Bounded-Skew. It covers mTLS federation, key rotation, the audit log, per-principal quotas, replay protection, and the container baseline.
Authoritative source:
spec/stigmem-spec-v0.9.0a1.md
Legacy §22 anchors are retained for existing links while the maintained hardening prose lives in the modular component specs listed above.
§22.1 mTLS federation transport
§22.1.1 Scope
This section specifies mutual TLS requirements for all transport connections between federated Stigmem nodes. The spec otherwise treats the federation wire protocol as transport-agnostic (§6); §22.1 narrows that flexibility for deployments connecting more than one node.
§22.1.2 Normative requirements
- All federation transport connections between distinct Stigmem nodes MUST use mutual TLS (mTLS): both the dialing node and the accepting node MUST present a valid X.509 certificate and MUST verify the peer's certificate before data exchange begins.
- The TLS version floor is TLS 1.3. Nodes MUST NOT negotiate TLS 1.2 or earlier on federation ports. Implementations MUST configure their TLS stack to refuse downgrade to TLS < 1.3.
- The cipher suite floor for TLS 1.3 connections MUST include at a minimum TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384, and TLS_CHACHA20_POLY1305_SHA256. Operators MAY restrict to a subset but MUST NOT add cipher suites outside this list without board-level security approval documented in the ops runbook.
- Node certificate Subject Alternative Names (SANs) MUST include the node's canonical entity_uri (as a URI SAN). Verifying nodes MUST check that the peer's SAN matches the entity_uri declared in the peer's org manifest (§19.1.2) before accepting the connection as authenticated.
- Nodes MUST reject any federation connection from a peer whose certificate chain cannot be verified against a locally configured trust root or whose SAN does not match the expected entity_uri.
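As an illustration only, the version floor, mutual authentication, and SAN check can be sketched with Python's standard ssl module. The function names and file-path parameters are hypothetical and not part of the spec; a real node would also compare the expected entity_uri against the peer's org manifest.

```python
import ssl


def build_federation_context(ca_file=None, cert_file=None, key_file=None):
    """Server-side TLS context that requires a verified client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # TLS version floor: refuse anything below TLS 1.3.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # Mutual TLS: the handshake fails unless the peer presents a certificate
    # that chains to a locally configured trust root.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def peer_san_matches(peer_cert, expected_entity_uri):
    """Check that the peer certificate carries the expected URI SAN.

    `peer_cert` is the dict returned by SSLSocket.getpeercert(), whose
    subjectAltName entry is a tuple of (type, value) pairs.
    """
    sans = peer_cert.get("subjectAltName", ())
    return any(kind == "URI" and value == expected_entity_uri
               for kind, value in sans)
```

A caller would run peer_san_matches on the accepted socket's getpeercert() result and close the connection when it returns False.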
If a reverse proxy (nginx, Caddy, Envoy) terminates TLS before the
stigmem node process, mTLS peer certificate validation is bypassed.
Set STIGMEM_MTLS_REQUIRED=true to force the node to reject any
connection without a verified peer certificate, even behind a proxy.
Verify this configuration in staging before enabling federation.
§22.1.3 Cert rotation hook into §19 manifest
When a node rotates its mTLS node certificate:
- The node MUST generate a new X.509 certificate for the new key pair.
- The new certificate's public key fingerprint MUST be recorded in the node's org manifest (§19.1) as a new RotationEvent (§19.1.4) alongside the Ed25519 key rotation, or in a dedicated tls_cert_fingerprint field on the manifest if the TLS key is distinct from the Ed25519 signing key. Implementations MUST NOT rotate the mTLS certificate silently: every rotation MUST produce a manifest update.
- The updated manifest MUST be re-signed and re-published to /.well-known/stigmem-manifest.json (§19.1.6) before the new certificate is put into service.
- The updated manifest MUST be submitted to the transparency log (§19.2) as part of the rotation event. Nodes MUST NOT activate the new certificate until the transparency log submission has been acknowledged (i.e., until a LogEntry is received). Nodes SHOULD retry the transparency log submission for up to 24 hours before proceeding with rotation. If rotation proceeds without a log acknowledgement (e.g., due to a Rekor maintenance window), the node MUST record a pending_log_submission: true flag in the manifest and MUST complete the submission as soon as the log is reachable.
- During the transition window (see §22.2.2 for dual-trust period), nodes MUST accept both the old and new TLS certificates from the rotating peer. The transition window MUST NOT exceed the dual-trust period defined in §22.2.
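One possible reading of the activation gate above is sketched below. The function name, its parameters, and the returned tuple are hypothetical; the sketch assumes the 24-hour retry budget is measured from the first log submission attempt, which the text does not state explicitly.

```python
from datetime import timedelta

LOG_RETRY_BUDGET = timedelta(hours=24)  # retry window from the text above


def rotation_decision(manifest_published, log_acknowledged,
                      first_submission_at, now):
    """Return (activate_new_cert, pending_log_submission)."""
    if not manifest_published:
        # The re-signed manifest must be published before the new cert is used.
        return (False, False)
    if log_acknowledged:
        # A LogEntry was received: normal activation path.
        return (True, False)
    if now - first_submission_at < LOG_RETRY_BUDGET:
        # Still inside the retry window: keep retrying, do not activate yet.
        return (False, False)
    # Retry window exhausted: proceed, but flag the manifest so the
    # submission is completed once the log is reachable again.
    return (True, True)
```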
§22.1.4 Client certificate provisioning
Nodes SHOULD use short-lived mTLS client certificates (≤ 24 hours) issued by a local certificate authority dedicated to federation transport. Operators MAY use longer-lived certificates (≤ 90 days) provided they implement automated rotation (e.g., via cert-manager or equivalent). Long-lived certificates MUST be listed in the node's org manifest as described in §22.1.3.
§22.2 Key rotation
§22.2.1 Scope
This section applies to two key types:
§22.2.2 Rollover window and dual-trust period
- The rollover window begins when a new key pair is generated and ends when all previously issued capability tokens signed by the old key have expired or been explicitly revoked.
- During the rollover window, nodes MUST maintain a dual-trust period: both the old and new public keys are simultaneously trusted for signature verification. The dual-trust period MUST cover at least the maximum outstanding capability token lifetime from the time rotation is initiated. Since capability tokens MUST NOT exceed 90 days (§19.3.2), the dual-trust period MUST be at least 90 days unless all outstanding tokens are explicitly revoked before rotation completes.
- Nodes MUST reject tokens signed by a key older than the dual-trust period (i.e., keys for which the dual-trust period has elapsed and which are no longer in the org manifest's rotation chain).
- The rollover window MUST be recorded in the org manifest via a RotationEvent (§19.1.4). The transparency log MUST receive a separate log entry for the rotation event with event_type: "key_rotation" and a dual_trust_expires_at field indicating when the old key's trust period ends.
- During the dual-trust period, verifiers SHOULD consult the org manifest rotation chain (§19.1.4) to identify which historic key signed a given token, rather than assuming the current manifest key.
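The minimum dual-trust period can be worked through as a short sketch. The helper names are hypothetical, and the sketch covers only the default case in which outstanding tokens are not revoked early.

```python
from datetime import datetime, timedelta, timezone

MAX_TOKEN_LIFETIME = timedelta(days=90)  # token lifetime ceiling (§19.3.2)


def dual_trust_expires_at(rotation_initiated_at):
    """Earliest time at which trust in the old key may end."""
    return rotation_initiated_at + MAX_TOKEN_LIFETIME


def old_key_trusted(now, rotation_initiated_at):
    """True while the old key is still inside the dual-trust period."""
    return now < dual_trust_expires_at(rotation_initiated_at)


rotated = datetime(2026, 5, 4, 12, 0, tzinfo=timezone.utc)
print(dual_trust_expires_at(rotated).isoformat())
```

For a rotation initiated on 2026-05-04, the old key stays trusted until at least 2026-08-02.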
§22.2.3 Transparency log entry on rotation
Every key rotation MUST produce a transparency log entry so that federation peers and auditors can verify the chain of identity across key transitions. The entry is signed by the old (retiring) key — this anchors the new key to the prior identity and prevents a compromised new key from fabricating a rotation event.
KeyRotationLogEntry:
  event_type:            "key_rotation"
  entity_uri:            URI        // the rotating node/org
  old_key_id:            hex        // key_id of the retiring key
  new_key_id:            hex        // key_id of the new key
  rotated_at:            RFC3339
  dual_trust_expires_at: RFC3339    // old key trusted until this time
  manifest_log_index:    integer    // log index of the updated manifest submission
  rotation_sig:          base64url  // Ed25519 sig over RFC 8785 JCS encoding, signed by OLD key
The rotation_sig MUST verify under the old_key_id public key.
The byte sequence signed MUST be the RFC 8785 JSON Canonicalization
Scheme (JCS) serialisation of the other fields: keys lexicographically
sorted, no whitespace, UTF-8 encoding, no trailing newline.
The manifest submission (§22.1.3.4) MUST be acknowledged by the
transparency log before the KeyRotationLogEntry is submitted; the
returned log index MUST be recorded as manifest_log_index.
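The signing input can be approximated with Python's json module. This is a sketch, not a full RFC 8785 implementation: it assumes every key is ASCII and every value is a string or integer, as in KeyRotationLogEntry. The function name and sample values are hypothetical, and the Ed25519 signature itself is left out because it needs a third-party library.

```python
import json


def rotation_signing_input(entry):
    """Bytes the old key would sign: every field except rotation_sig."""
    unsigned = {k: v for k, v in entry.items() if k != "rotation_sig"}
    return json.dumps(
        unsigned,
        sort_keys=True,          # lexicographically sorted keys
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # emit UTF-8 rather than \u escapes
    ).encode("utf-8")            # no trailing newline is appended


entry = {
    "event_type": "key_rotation",
    "entity_uri": "stigmem://org/example",  # hypothetical value
    "old_key_id": "aa11",
    "new_key_id": "bb22",
    "rotated_at": "2026-05-04T12:00:00Z",
    "dual_trust_expires_at": "2026-08-02T12:00:00Z",
    "manifest_log_index": 42,
    "rotation_sig": "placeholder",
}
print(rotation_signing_input(entry).decode("utf-8"))
```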
§22.2.4 Rotation cadence
Cadence MUST be documented in the node's operational runbook and MAY
be declared in the node's /.well-known/stigmem advertisement.
§22.3 Audit log surface
§22.3.1 Required event types
Every Stigmem node MUST emit structured audit log events for the following operations. Each event MUST be written to the audit log before the operation's response is returned to the caller (write-ahead semantics).
- fact_write: event_type, timestamp, hlc, actor_entity, fact_id, scope, verb.
- fact_read: event_type, timestamp, actor_entity, scope_filter, fact_ids_returned[], query_strategy.
- capability_token_issue: token_id, issuer, subject, verb, object, expiry.
- capability_token_revoke: token_id, issuer, revoked_at, reason.
- manifest_publish: entity_uri, key_id, manifest_hash.
- key_rotation: entity_uri, old_key_id, new_key_id, dual_trust_expires_at.
- federation_connect: peer_entity_uri, peer_cert_fingerprint, outcome, reject_reason?.
- quarantine_admit: fact_id, source, admit_reason.
- quarantine_release: fact_id, actor_entity, decision.
- quota_breach: principal, quota_dimension, ceiling, actual.
- admin_action: actor_entity, action, resource, outcome.
- replay_rejected: token_id, nonce, reject_reason.
- instruction_audit: agent_id, chunk_id, load_trigger, outcome. MUST emit if the instruction recall layer is active; nodes not implementing the lazy instruction layer are exempt.
- instruction_quarantined: fact_id, actor_entity, source, reason.
- instruction_promoted: fact_id, actor_entity, quarantine_garden_id, target_garden_id?.
Implementations MUST NOT omit required fields. Optional fields (marked ?) SHOULD be included when available.
§22.3.2 Ordering guarantee
Audit log events MUST be totally ordered by a monotonically increasing sequence number within a single node. Events SHOULD include the node's HLC tick (§2.4) alongside the wall-clock timestamp to allow cross-node ordering reconstruction. The sequence MUST NOT reset across node restarts.
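A minimal sketch of a sequence counter that does not reset across restarts follows. The class name and the single-file storage layout are assumptions for illustration; a production node would more likely derive the sequence from the audit store itself.

```python
import os


class SequenceCounter:
    """Monotonic counter persisted to a file so restarts do not reset it."""

    def __init__(self, path):
        self.path = path
        self.value = 0
        if os.path.exists(path):
            with open(path, "r", encoding="ascii") as f:
                self.value = int(f.read().strip() or 0)

    def next(self):
        self.value += 1
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="ascii") as f:
            f.write(str(self.value))
            f.flush()
            os.fsync(f.fileno())    # make the new value durable first
        os.replace(tmp, self.path)  # then swap it in atomically
        return self.value
```

Persisting the value before the event is emitted keeps the sequence strictly increasing even if the process dies between two events.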
§22.3.3 Retention contract
Minimum 90 days
Audit logs MUST be retained for at least 90 days.
Recommended 1 year
Operators SHOULD retain logs for 1 year for forensic purposes.
Append-only storage
Logs MUST be stored in a medium that is append-only with respect to normal operational access. Ordinary application processes MUST NOT be able to overwrite or delete log entries.
Separate from fact store
Logs MUST NOT be stored exclusively in the same database that serves the production fact store unless that database provides an independent, append-only audit trail mechanism (e.g., PostgreSQL audit extension with restricted DDL access).
§22.3.4 Admin export shape
Admins MUST be able to export audit logs via the following HTTP route.
GET /v1/admin/audit-log
Authorization: Bearer <admin-token>
Query parameters:
after: RFC3339 // events after this timestamp (exclusive); omit for all
before: RFC3339 // events before this timestamp (exclusive); omit for open end
event_type: string // filter to one event type; repeatable for multiple types
limit: integer // max events per page; default 500; max 5000
cursor: string // opaque pagination cursor from prior response
Response:
{
  "events": [
    {
      "seq": 12345,
      "event_type": "fact_write",
      "timestamp": "2026-05-04T12:00:00Z",
      "hlc": "1746360000000-0001-a1b2",
      ...
    }
  ],
  "next_cursor": "opaque-cursor-string",
  "has_more": true
}
Admin-scoped token required
The export route MUST require an admin-scoped token.
Ascending seq order
Events MUST be returned in ascending seq order.
Streaming support
The route MUST support streaming for large time ranges (chunked transfer or cursor pagination with has_more).
CLI wrapper
Operators SHOULD provide a CLI wrapper for this endpoint that writes NDJSON to stdout.
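The cursor loop behind such a CLI wrapper might look like the sketch below. fetch_page is a hypothetical callable standing in for the HTTP request to GET /v1/admin/audit-log; it takes the cursor (None for the first page) and returns the decoded response body. The ordering check mirrors the ascending seq requirement.

```python
import io
import json


def export_audit_log(fetch_page, out):
    """Write every event to `out` as NDJSON, following cursors to the end."""
    cursor = None
    last_seq = None
    count = 0
    while True:
        page = fetch_page(cursor)
        for event in page["events"]:
            # Events are expected in ascending seq order across pages.
            if last_seq is not None and event["seq"] <= last_seq:
                raise ValueError("audit events out of order")
            last_seq = event["seq"]
            out.write(json.dumps(event, separators=(",", ":")) + "\n")
            count += 1
        if not page.get("has_more"):
            return count
        cursor = page["next_cursor"]


# Usage with an in-memory stand-in for the HTTP call.
pages = {
    None: {"events": [{"seq": 1}, {"seq": 2}], "next_cursor": "c1", "has_more": True},
    "c1": {"events": [{"seq": 3}], "next_cursor": None, "has_more": False},
}
buf = io.StringIO()
total = export_audit_log(lambda cursor: pages[cursor], buf)
```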
§22.4 Per-principal quotas
§22.4.1 Model
Stigmem implements per-principal rate limiting using a
token-bucket model. Each (principal, dimension) pair maintains
an independent token bucket. The principal is the actor_entity URI
derived from the authenticated caller's capability token or API key.
TokenBucket:
principal: URI // entity URI of the caller
dimension: string // quota dimension (see §22.4.2)
capacity: integer // bucket size (max burst)
rate: float // refill rate in tokens/second
current: float // current token count (updated on each request)
last_refill: RFC3339 // timestamp of last refill computation
The bucket refills continuously at rate tokens/second up to
capacity. Each qualifying request consumes 1 token unless
otherwise specified per dimension.
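The refill rule can be sketched as follows. For brevity the sketch tracks time as float seconds rather than the RFC3339 last_refill field and omits the principal and dimension keys; the method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: int       # bucket size (max burst)
    rate: float         # refill rate in tokens/second
    current: float      # current token count
    last_refill: float  # seconds; the spec field is an RFC3339 timestamp

    def refill(self, now):
        """Add the tokens earned since last_refill, capped at capacity."""
        elapsed = max(0.0, now - self.last_refill)
        self.current = min(self.capacity, self.current + elapsed * self.rate)
        self.last_refill = now

    def try_consume(self, now, cost=1.0):
        """Consume `cost` tokens if available; return whether it succeeded."""
        self.refill(now)
        if self.current >= cost:
            self.current -= cost
            return True
        return False
```

Refilling lazily on each request avoids a background timer: the bucket only needs the elapsed time since it was last touched.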
§22.4.2 Quota dimensions and default ceilings
- fact_write
- fact_read
- token_issue
- federation_pull
- admin_action
- subscription_event
- audit_export
Default ceilings MUST be applied unless overridden by an admin-configured QuotaPolicy document for the principal. Overrides MUST be stored persistently and survive node restarts.
§22.4.3 Backpressure response shape
When a principal's token bucket is exhausted:
- The node MUST return HTTP 429 Too Many Requests with the body shape below.
- retry_after is a float number of seconds until the bucket refills sufficiently to accept one more request at the current rate. Implementations MUST compute this as (1 - current) / rate (seconds to earn 1 token).
- The node MUST include a Retry-After HTTP header with the integer ceiling of retry_after.
- The node MUST emit a quota_breach audit log event (§22.3.1) for every request that hits the ceiling.
- Nodes SHOULD propagate quota pressure upstream to federated callers via the X-Stigmem-Replication-Lag header (§6.7) when federation_pull quota is exhausted.
- Callers MUST honour Retry-After and MUST implement exponential backoff with jitter after two consecutive 429 responses from the same node.
{
  "error": "quota_exceeded",
  "dimension": "fact_write",
  "principal": "stigmem://org/my-agent",
  "retry_after": 3.2
}
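As a worked example of the retry_after rule, a bucket holding 0.2 tokens that refills at 0.25 tokens/second needs (1 - 0.2) / 0.25 = 3.2 seconds to earn one token, so the Retry-After header is 4. The function below is a hypothetical sketch of building that response; only the formula and the body fields come from the text above.

```python
import math


def backpressure_response(principal, dimension, current, rate):
    """Build the (status, headers, body) triple for an exhausted bucket."""
    # Seconds until the bucket has earned one whole token.
    retry_after = max(0.0, (1 - current) / rate)
    body = {
        "error": "quota_exceeded",
        "dimension": dimension,
        "principal": principal,
        "retry_after": retry_after,
    }
    # The header carries the integer ceiling of retry_after.
    headers = {"Retry-After": str(math.ceil(retry_after))}
    return 429, headers, body


status, headers, body = backpressure_response(
    "stigmem://org/my-agent", "fact_write", current=0.2, rate=0.25)
```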
§22.5 Replay protection
§22.5.1 Scope
This section extends §19.3.5 (capability token nonce) with normative clock-skew bounds and a unified replay protection model applicable to both capability tokens and federation handshake messages.
§22.5.2 Nonce and timestamp window
- Every capability token MUST include a nonce of 32 cryptographically random bytes (§19.3.5). Every federation handshake message MUST include an independent nonce of 32 cryptographically random bytes.
- The timestamp acceptance window is ± 5 minutes from the verifier's local clock. Tokens or messages with an issued_at timestamp outside this window MUST be rejected with a timestamp_out_of_window error, even if the nonce is fresh.
- The nonce cache MUST retain seen nonces for at least the duration of the acceptance window plus the maximum token lifetime (5 minutes + 90 days for capability tokens; 5 minutes + session duration for handshake messages). Implementations MUST NOT prune nonces from the cache before this window elapses.
- Nonces MUST be stored in a persistent cache (survives node restarts within the retention window). An in-memory-only nonce cache MUST NOT be used in production; a brief restart MUST NOT create a replay window.
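A persistent nonce cache could be sketched with the standard sqlite3 module, as below. The class name, table layout, and use of float seconds for timestamps are assumptions; the retention constant uses the capability token case (5 minutes + 90 days).

```python
import sqlite3
from datetime import timedelta

ACCEPTANCE_WINDOW = timedelta(minutes=5)
MAX_TOKEN_LIFETIME = timedelta(days=90)
RETENTION_SECONDS = (ACCEPTANCE_WINDOW + MAX_TOKEN_LIFETIME).total_seconds()


class NonceCache:
    """Disk-backed set of seen nonces, so a restart opens no replay window."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS nonces ("
            "nonce BLOB PRIMARY KEY, seen_at REAL NOT NULL)"
        )
        self.db.commit()

    def check_and_record(self, nonce, now):
        """Return True if the nonce is fresh, False if it is a replay."""
        try:
            self.db.execute(
                "INSERT INTO nonces (nonce, seen_at) VALUES (?, ?)",
                (nonce, now),
            )
            self.db.commit()
            return True
        except sqlite3.IntegrityError:
            # Primary-key conflict: this nonce has been seen before.
            self.db.rollback()
            return False

    def prune(self, now):
        """Drop only nonces older than the full retention period."""
        self.db.execute(
            "DELETE FROM nonces WHERE seen_at < ?",
            (now - RETENTION_SECONDS,),
        )
        self.db.commit()
```

Relying on the primary-key constraint makes the check and the insert a single atomic step, so two concurrent requests carrying the same nonce cannot both pass.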
§22.5.3 Clock-skew bounds
- issued_at > verifier clock + 5 min: timestamp_future_dated.
- issued_at < verifier clock − 5 min: timestamp_stale.
- expiry < verifier clock: token_expired.
- expiry > issued_at + 90 days: token_lifetime_exceeded.
Nodes MUST synchronise their system clocks via NTP (or equivalent). Operators SHOULD configure alerts if clock drift exceeds 30 seconds.
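The four bounds map directly onto a small classifier. The function name and the order in which the checks run are assumptions; the text does not say which error wins when several conditions hold at once.

```python
from datetime import datetime, timedelta, timezone

SKEW = timedelta(minutes=5)
MAX_LIFETIME = timedelta(days=90)


def classify_timestamps(issued_at, expiry, now):
    """Return the matching error code, or None if all bounds hold."""
    if issued_at > now + SKEW:
        return "timestamp_future_dated"
    if issued_at < now - SKEW:
        return "timestamp_stale"
    if expiry < now:
        return "token_expired"
    if expiry > issued_at + MAX_LIFETIME:
        return "token_lifetime_exceeded"
    return None


now = datetime(2026, 5, 4, 12, 0, tzinfo=timezone.utc)
result = classify_timestamps(now - timedelta(minutes=6),
                             now + timedelta(days=1), now)
```

Here result is "timestamp_stale", because issued_at lies six minutes behind the verifier clock.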
§22.5.4 Error codes
- timestamp_future_dated: issued_at more than 5 minutes in the future.
- timestamp_stale: issued_at more than 5 minutes in the past.
- token_expired: expiry has passed.
- token_lifetime_exceeded: expiry − issued_at > 90 days.
- token_replay
§22.6 Container baseline
§22.6.1 Scope
This section specifies the normative security posture for reference operator container images published by Eidetic Labs. Third-party operators running Stigmem from source SHOULD adopt the same baseline.
The Docker / Docker Compose requirements in this section apply to
the supported v0.9.0a1 deployment surface. Requirements that
reference Helm charts or Kubernetes manifests apply
conditionally: in v0.9.0a1 those deployment surfaces are deferred to
experimental/deploy-helm/
and unsupported until they pass the
ADR-008 reintroduction gates.
§22.6.2 Distroless image
- Reference operator images MUST be built FROM a distroless base (e.g., gcr.io/distroless/cc-debian12 or equivalent). Images MUST NOT include a shell (sh, bash) in the production layer.
- Multi-stage builds MUST be used: build dependencies and tools MUST be confined to a builder stage and MUST NOT appear in the final image layer.
- The image MUST contain only the Stigmem node binary and its minimal runtime dependencies (shared libraries, CA bundle, tzdata).
§22.6.3 Non-root user
- The container MUST run as a non-root user. The Dockerfile MUST include a USER directive setting a non-zero UID (SHOULD use UID 1000) in the final stage.
- The container MUST NOT be run with --privileged or with CAP_SYS_ADMIN. Operators MUST NOT grant any Linux capabilities beyond the minimum required (if port < 1024, use CAP_NET_BIND_SERVICE; SHOULD use a port ≥ 1024).
- Kubernetes / container runtime manifests for reference deployments MUST include:
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
§22.6.4 Read-only root filesystem
- The container's root filesystem MUST be mounted read-only (readOnlyRootFilesystem: true in Kubernetes). All writable state (database files, log buffers, temporary files) MUST be mounted as explicit volumes or emptyDir mounts.
- Reference Helm charts MUST configure readOnlyRootFilesystem: true by default and MUST document which volumes require write access.
§22.6.5 Seccomp profile
- Reference images MUST ship a seccomp profile that allows only the syscalls required by the Stigmem node binary. The profile MUST deny ptrace, process_vm_readv, process_vm_writev, kexec_load, and perf_event_open at a minimum.
- Kubernetes deployments MUST apply the profile via seccompProfile.type: Localhost with localhostProfile: profiles/stigmem-node.json, or type: RuntimeDefault where a restrictive runtime default is confirmed equivalent. Unconfined MUST NOT be used in production.
- The seccomp profile MUST be published alongside each release in deploy/seccomp/stigmem-node.json and versioned with the binary.
§22.6.6 Image signing
Reference images MUST be signed using
Sigstore Cosign and the
signature MUST be pushed to the same registry. Operators SHOULD
verify the image signature before deployment using cosign verify.
Image digests (not mutable tags) MUST be used in all reference
Kubernetes manifests and Helm chart values.yaml defaults.
§22.7 Transparency log own-instance decision memo
§22.7.1 Purpose
§19.2.2 permits but does not require operating a self-hosted Rekor instance. This section provides normative decision criteria so that operators can determine whether self-hosting is appropriate, and records the Eidetic Labs reference deployment position.
§22.7.2 Decision criteria
An operator SHOULD self-host a Rekor instance if and only if ALL of the following criteria are met.
rekor.sigstore.dev.
trust_mode: strict.
If any criterion is not met, the operator SHOULD use the public Rekor instance at https://rekor.sigstore.dev (or a hosted equivalent). Operators MUST NOT self-host without documented answers to each criterion in their ops runbook.
§22.7.3 Reference deployment position (Eidetic Labs)
The Eidetic Labs reference deployment uses the public Rekor
instance (https://rekor.sigstore.dev).
Decision: defer self-hosted Rekor to backlog.
A self-hosted Rekor instance for the Eidetic Labs reference deployment does not meet the minimum decision criteria at this phase. Reconsider when (a) a private-network deployment tier is productised, or (b) a dedicated SRE function is established.
§22.7.4 Configuration
STIGMEM_TRANSPARENCY_LOG_URL=https://rekor.sigstore.dev
STIGMEM_TRANSPARENCY_LOG_PUBLIC_KEY=<base64-encoded ECDSA key from GET /api/v1/log>
The public key is pinned explicitly rather than discovered at
runtime — this ensures the node always verifies log entries against
a known trust anchor even if the Rekor URL is compromised.
STIGMEM_TRANSPARENCY_LOG_PUBLIC_KEY MUST be pinned explicitly; key
discovery via the URL alone MUST NOT be the sole trust anchor in
production.
§22.7.5 Transparency log public-key rotation
The Sigstore/Rekor root signing key is subject to rotation (a root
key rotation occurred in 2022). Operators pinning
STIGMEM_TRANSPARENCY_LOG_PUBLIC_KEY MUST have a documented
procedure for updating the pin.
- Operators SHOULD subscribe to Sigstore transparency log key rotation announcements (the sigstore-announce mailing list and the CT log transparency dashboard) and SHOULD update STIGMEM_TRANSPARENCY_LOG_PUBLIC_KEY within 30 days of a published rotation.
- A node MUST NOT treat a persistent transparency log key verification failure as a permanent misconfiguration without first checking whether a Rekor root key rotation has occurred. On repeated verification failures, the node SHOULD emit a transparency_log_key_mismatch audit log event and surface an operator alert before entering a degraded-verification state.
Subsection anchors
Anchors below are provided so docs links to specific subsections always resolve, even when the subsection text lives only in earlier spec drafts.