Spec-21-Content-Addressed-IDs
What this spec defines
The content-addressed fact identifier (CID): deterministic, tamper-evident identifiers for the canonical body of a fact.
Extraction status
This component spec extracts the content-addressed fact ID material
that previously lived in the monolithic stigmem-spec-v0.9.0a1.md
lineage.
CIDs are core Stigmem behavior per ADR-017.
They are not an experimental plugin feature, and a conforming default node MUST compute CIDs for new facts.
Purpose
A CID is a deterministic, tamper-evident identifier for the canonical body of a fact. Unlike a UUID-style fact id, a CID can be recomputed independently from the fact payload.
CIDs provide:
- Content integrity checks for stored facts.
- Deduplication for identical assertions.
- Stable provenance references across nodes and transports.
- Defense-in-depth as a core layer for storage immutability.
CID format
A CID MUST use the sha256: prefix followed by 64 lowercase
hexadecimal characters:
sha256:<64 lowercase hex chars>
The CID MUST be computed as:
CID = "sha256:" + hex_lowercase(SHA-256(canonical_fact_body_bytes))
Only lowercase hexadecimal output is valid for sha256: CIDs.
Strings beginning with sha256: that do not match this format MUST
be treated as malformed CIDs.
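A minimal sketch of the format rule, assuming a regular expression is an acceptable way to check CID syntax. The names `CID_PATTERN`, `looks_like_cid`, and `is_wellformed_cid` are hypothetical and do not come from the reference node.

```python
import re

# "sha256:" followed by exactly 64 lowercase hexadecimal characters.
CID_PATTERN = re.compile(r"sha256:[0-9a-f]{64}")


def looks_like_cid(value: str) -> bool:
    """Return True when a string claims to be a CID by its prefix."""
    return value.startswith("sha256:")


def is_wellformed_cid(value: str) -> bool:
    """Return True only when the whole string matches the CID format."""
    # fullmatch rejects trailing characters, including a trailing newline.
    return CID_PATTERN.fullmatch(value) is not None
```

A string for which `looks_like_cid` is true and `is_wellformed_cid` is false is a malformed CID under the rule above.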
Canonical fact body
The canonical fact body for v0.9.0aN CID computation contains exactly these fields:
{
  "confidence": 1.0,
  "entity": "stigmem://example/entity",
  "relation": "memory:prefers",
  "scope": "local",
  "source": "agent:example",
  "value_type": "string",
  "value_v": "dark mode"
}
The canonical body MUST be serialized as compact UTF-8 JSON with
deterministic lexicographic key ordering and no insignificant
whitespace. The reference node uses JSON sorted keys with compact
separators and ensure_ascii=false.
All seven canonical fields are CID-sensitive. Changing any of them MUST produce a different CID.
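The serialization and hashing rules can be sketched with the Python standard library. This is an illustration under stated assumptions, not the reference node's code: the helper names are hypothetical, and number formatting (for example `1.0` rather than `1`) is whatever Python's `json` module emits, which this section does not pin down.

```python
import hashlib
import json

# The seven canonical fields listed in the example body above.
CANONICAL_FIELDS = (
    "confidence", "entity", "relation", "scope",
    "source", "value_type", "value_v",
)


def canonical_bytes(body: dict) -> bytes:
    """Serialize with sorted keys, compact separators, and no ASCII escaping."""
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def compute_cid(body: dict) -> str:
    """Return "sha256:" plus the lowercase hex SHA-256 of the canonical bytes."""
    if set(body) != set(CANONICAL_FIELDS):
        raise ValueError("canonical body must contain exactly the seven canonical fields")
    return "sha256:" + hashlib.sha256(canonical_bytes(body)).hexdigest()


example = {
    "confidence": 1.0,
    "entity": "stigmem://example/entity",
    "relation": "memory:prefers",
    "scope": "local",
    "source": "agent:example",
    "value_type": "string",
    "value_v": "dark mode",
}
cid = compute_cid(example)
```

Because keys are sorted before hashing, the insertion order of the dictionary does not affect the CID, while a change to any field value does.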
Excluded fields
The following fields MUST NOT participate in CID computation:
- id / fact_id
- cid
- timestamp / created_at
- hlc
- valid_until
- derived_from
- attestation_chain / signature
- source_trust
- reason
A matching CID is not proof that excluded metadata is trustworthy.
Excluded fields may still be security-relevant. Implementations MUST validate those fields through their owning specs.
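One way to keep excluded fields out of the hash is to project a stored fact onto an allow-list of canonical fields rather than deleting known metadata keys. The sketch below assumes a fact is available as a flat dictionary; `canonical_body` is a hypothetical helper name.

```python
# The seven CID-sensitive fields; everything else is ignored for hashing.
CANONICAL_FIELDS = (
    "confidence", "entity", "relation", "scope",
    "source", "value_type", "value_v",
)


def canonical_body(fact: dict) -> dict:
    """Return only the canonical fields of a fact, failing if any is absent."""
    missing = [name for name in CANONICAL_FIELDS if name not in fact]
    if missing:
        raise KeyError(f"fact is missing canonical fields: {missing}")
    return {name: fact[name] for name in CANONICAL_FIELDS}
```

An allow-list means a metadata field added later cannot leak into the CID by accident, which a deny-list of excluded names would not guarantee.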
Storage contract
The fact storage model MUST support:
- A nullable cid column, nullable only for legacy rows pending backfill.
- A fact_cid_aliases table mapping stored fact ids to CIDs.
- A unique CID index for efficient CID lookup.
- A fact id index for alias maintenance.
Every new fact write MUST persist the computed CID on the fact row and insert the corresponding alias row in the same transaction.
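The storage contract could be sketched in SQLite as follows. Only the `cid` column and the `fact_cid_aliases` table are named by this spec; the `facts` table name, the `body` column, the index names, and the choice to place the unique CID index on the alias table are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE facts (
    id   TEXT PRIMARY KEY,
    cid  TEXT,            -- nullable only for legacy rows pending backfill
    body TEXT NOT NULL    -- assumed: canonical fact body stored as JSON text
);
CREATE TABLE fact_cid_aliases (
    cid     TEXT NOT NULL,
    fact_id TEXT NOT NULL REFERENCES facts(id)
);
-- Unique CID index for lookup, plus a fact id index for alias maintenance.
CREATE UNIQUE INDEX idx_fact_cid_aliases_cid ON fact_cid_aliases(cid);
CREATE INDEX idx_fact_cid_aliases_fact_id ON fact_cid_aliases(fact_id);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

A multi-tenant node would likely scope the unique index by tenant as well; that detail is left out of this sketch.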
Write path and deduplication
On local assertion, a node MUST:
- Normalize the assertion fields according to their owning specs.
- Compute the CID before writing the fact row.
- Persist the CID with all other fact fields.
- Insert the CID alias row in the same transaction.
A CID collision MUST NOT overwrite the existing record.
If the computed CID already exists for the same tenant, the node SHOULD return the existing record instead of creating a duplicate fact. If an implementation detects the same CID for a different canonical body, it MUST treat that as a CID collision.
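The write path and deduplication rule might look like the sketch below. It is single-tenant, skips the normalization step (owned by other specs), and uses an assumed SQLite schema with a `body` column; `assert_fact`, `open_store`, and `CidCollision` are hypothetical names.

```python
import hashlib
import json
import sqlite3
import uuid


class CidCollision(Exception):
    """Raised when one CID maps to two different canonical bodies."""


def canonicalize(body: dict) -> tuple[str, str]:
    """Return (cid, canonical JSON text) for a canonical fact body."""
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest(), text


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE facts (id TEXT PRIMARY KEY, cid TEXT, body TEXT NOT NULL);
        CREATE TABLE fact_cid_aliases (cid TEXT NOT NULL, fact_id TEXT NOT NULL);
        CREATE UNIQUE INDEX idx_alias_cid ON fact_cid_aliases(cid);
        CREATE INDEX idx_alias_fact_id ON fact_cid_aliases(fact_id);
    """)
    return conn


def assert_fact(conn: sqlite3.Connection, body: dict) -> str:
    """Store a fact and return its id, reusing the existing row on a CID match."""
    cid, text = canonicalize(body)  # compute the CID before any write
    with conn:  # both inserts commit together or roll back together
        row = conn.execute(
            "SELECT f.id, f.body FROM fact_cid_aliases a "
            "JOIN facts f ON f.id = a.fact_id WHERE a.cid = ?",
            (cid,),
        ).fetchone()
        if row is not None:
            if row[1] != text:
                raise CidCollision(cid)  # never overwrite the existing record
            return row[0]  # duplicate assertion: return the existing fact
        fact_id = str(uuid.uuid4())
        conn.execute("INSERT INTO facts (id, cid, body) VALUES (?, ?, ?)", (fact_id, cid, text))
        conn.execute("INSERT INTO fact_cid_aliases (cid, fact_id) VALUES (?, ?)", (cid, fact_id))
    return fact_id
```

Asserting the same canonical body twice returns the same fact id and leaves a single row in each table.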
Dual addressing
The single-fact read route MUST accept either a UUID-style fact id or a CID:
GET /v1/facts/{cid_or_fact_id}
When the path value starts with sha256:, the node MUST validate
CID syntax and resolve the fact through the CID alias index.
Malformed CIDs MUST return a validation error. Well-formed but
unknown CIDs MUST return not found.
Fact responses SHOULD include the stored cid field. Legacy facts
that have not yet been backfilled MAY return cid: null.
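The resolution rules for the dual-addressed route can be sketched independently of any web framework. The in-memory dictionaries stand in for the alias index and fact store, and the exception names are hypothetical; a real node would map them to its validation-error and not-found responses.

```python
import re

CID_PATTERN = re.compile(r"sha256:[0-9a-f]{64}")


class MalformedCid(ValueError):
    """Path value starts with sha256: but is not a well-formed CID."""


class FactNotFound(LookupError):
    """No fact matches the given fact id or CID."""


def resolve_fact_id(path_value: str, aliases: dict, facts: dict) -> str:
    """Resolve a path value (fact id or CID) to a stored fact id."""
    if path_value.startswith("sha256:"):
        if CID_PATTERN.fullmatch(path_value) is None:
            raise MalformedCid(path_value)  # validation error
        fact_id = aliases.get(path_value)   # lookup through the CID alias index
        if fact_id is None:
            raise FactNotFound(path_value)  # well-formed but unknown
        return fact_id
    if path_value not in facts:
        raise FactNotFound(path_value)
    return path_value
```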
CID verification
Nodes MUST expose an integrity check that recomputes the CID from the stored fact body and compares it with the stored CID:
POST /v1/facts/{fact_id}/verify-cid
The response MUST include:
- cid_valid
- computed_cid
- stored_cid
- mismatch_reason: populated when cid_valid is false.
A false result SHOULD trigger operator investigation. It may indicate data corruption, storage tampering, a legacy row pending backfill, or a canonicalization bug.
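The integrity check itself reduces to recomputing and comparing. The sketch below returns a dictionary with the four response fields; the wording of the `mismatch_reason` strings is an assumption, as this section does not define their values.

```python
import hashlib
import json
from typing import Optional


def compute_cid(body: dict) -> str:
    """CID over compact, key-sorted, UTF-8 JSON of the canonical body."""
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_cid(stored_cid: Optional[str], body: dict) -> dict:
    """Recompute the CID from the stored body and compare it with the stored CID."""
    computed = compute_cid(body)
    if stored_cid is None:
        reason = "stored cid is null (legacy row pending backfill)"  # hypothetical text
    elif stored_cid != computed:
        reason = "stored cid differs from recomputed cid"  # hypothetical text
    else:
        reason = None
    return {
        "cid_valid": reason is None,
        "computed_cid": computed,
        "stored_cid": stored_cid,
        "mismatch_reason": reason,
    }
```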
Backfill
Nodes MUST provide a backfill path for legacy rows whose cid is
null. The backfill process MUST:
- Iterate over facts with cid IS NULL.
- Recompute each CID from the canonical fact body.
- Update the fact row and insert the alias row.
- Be idempotent.
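The steps above can be sketched against an assumed SQLite schema in which each fact row stores its canonical body as JSON text in a `body` column. The schema, `backfill_cids`, and `compute_cid` are hypothetical names for illustration; this is not the reference node's backfill-cids command.

```python
import hashlib
import json
import sqlite3

# Assumed schema for the sketch; only cid and fact_cid_aliases come from the spec.
SCHEMA = """
CREATE TABLE IF NOT EXISTS facts (id TEXT PRIMARY KEY, cid TEXT, body TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS fact_cid_aliases (cid TEXT NOT NULL, fact_id TEXT NOT NULL);
CREATE UNIQUE INDEX IF NOT EXISTS idx_alias_cid ON fact_cid_aliases(cid);
"""


def compute_cid(body: dict) -> str:
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def backfill_cids(conn: sqlite3.Connection) -> int:
    """Fill in cid for legacy rows and return how many rows were processed."""
    pending = conn.execute("SELECT id, body FROM facts WHERE cid IS NULL").fetchall()
    for fact_id, body_text in pending:
        cid = compute_cid(json.loads(body_text))
        with conn:  # row update and alias insert share one transaction
            # The "cid IS NULL" guard and OR IGNORE make a re-run a no-op.
            conn.execute(
                "UPDATE facts SET cid = ? WHERE id = ? AND cid IS NULL", (cid, fact_id)
            )
            conn.execute(
                "INSERT OR IGNORE INTO fact_cid_aliases (cid, fact_id) VALUES (?, ?)",
                (cid, fact_id),
            )
    return len(pending)
```

Running the function a second time finds no rows with a null cid and changes nothing. If two legacy rows share a canonical body, this sketch keeps only the first alias row; how such duplicates should be handled is not covered here.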
The reference node exposes a backfill-cids CLI command and this
status route:
GET /v1/admin/cid-backfill/status
The status response MUST include:
- total_facts
- backfilled_facts: facts that have a cid.
- pending_facts: facts that do not yet have a cid.
- backfill_complete: true when pending_facts is zero.
Federation use
Receiving nodes SHOULD recompute the CID from the inbound canonical body and reject payloads whose declared CID does not match.
Federation payloads SHOULD carry CIDs when fact records cross node
boundaries. Legacy CID-null rows may exist during migration and
backfill windows. Federation policy for accepting or rejecting
CID-null inbound facts is owned by Spec-05-Federation-Trust; this
spec defines the CID format and computation needed to perform that
validation.
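A receiving node's check might be sketched as below. The function takes the declared CID and the inbound canonical body as separate arguments because the federation payload shape is not defined in this section; the function name is hypothetical, and the returned string reuses the cid_mismatch error meaning from this spec.

```python
import hashlib
import json
from typing import Optional


def compute_cid(body: dict) -> str:
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def inbound_cid_error(declared_cid: Optional[str], body: dict) -> Optional[str]:
    """Return "cid_mismatch" when the declared CID is wrong, otherwise None."""
    if declared_cid is None:
        # CID-null inbound facts are a policy question for Spec-05-Federation-Trust;
        # this sketch does not decide whether to accept them.
        return None
    if compute_cid(body) != declared_cid:
        return "cid_mismatch"
    return None
```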
Error conditions
Nodes SHOULD use these stable error meanings:
- cid_malformed: a sha256: path value is not followed by 64 lowercase hex characters.
- fact_not_found
- cid_mismatch
- cid_collision_detected
Out of scope
This spec does not define:
- Federation CID-null policy or full trust policy for legacy facts.
- Hash algorithm rotation beyond the sha256: prefix shape.
- Provenance graph semantics for derived_from.
- Tombstone / time-travel or source-attestation behavior.