Version: v0.9.0a2

Spec-21-Content-Addressed-IDs

5 min read · Spec contributor · Node operator · Draft · v0.9.0aN

What this spec defines

The content-addressed fact identifier (CID): deterministic, tamper-evident identifiers for the canonical body of a fact.

Extraction status

This component spec extracts the content-addressed fact ID material that previously lived in the monolithic stigmem-spec-v0.9.0a1.md lineage.

CIDs are core Stigmem behavior per ADR-017.

They are not an experimental plugin feature, and a conforming default node MUST compute CIDs for new facts.

Purpose

A CID is a deterministic, tamper-evident identifier for the canonical body of a fact. Unlike a UUID-style fact id, a CID can be recomputed independently from the fact payload.

CIDs provide:

  - Content integrity checks for stored facts.
  - Deduplication for identical assertions.
  - Stable provenance references across nodes and transports.
  - Defense-in-depth: a core layer for storage immutability.

CID format

A CID MUST use the sha256: prefix followed by 64 lowercase hexadecimal characters:

sha256:<64 lowercase hex chars>

The CID MUST be computed as:

CID = "sha256:" + hex_lowercase(SHA-256(canonical_fact_body_bytes))

Only lowercase hexadecimal output is valid for sha256: CIDs.

Strings beginning with sha256: that do not match this format MUST be treated as malformed CIDs.
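The format rule can be illustrated with a minimal Python sketch. The names cid_from_bytes, is_well_formed_cid, and CID_PATTERN are hypothetical and not taken from the reference node; only the sha256: prefix and the 64 lowercase hex characters come from this spec.

```python
import hashlib
import re

# Assumption: a direct transcription of the format rule above.
# "sha256:" followed by exactly 64 lowercase hexadecimal characters.
CID_PATTERN = re.compile(r"sha256:[0-9a-f]{64}")


def cid_from_bytes(canonical_fact_body_bytes: bytes) -> str:
    """Return "sha256:" plus the lowercase hex SHA-256 digest of the bytes."""
    # hashlib's hexdigest() already produces lowercase hexadecimal output.
    return "sha256:" + hashlib.sha256(canonical_fact_body_bytes).hexdigest()


def is_well_formed_cid(value: str) -> bool:
    """True only on an exact match; uppercase hex or a wrong length is malformed."""
    return CID_PATTERN.fullmatch(value) is not None
```

Using fullmatch rather than match keeps trailing characters, including a trailing newline, from being accepted as part of a valid CID.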

Canonical fact body

The canonical fact body for v0.9.0aN CID computation contains exactly these fields:

{
  "confidence": 1.0,
  "entity": "stigmem://example/entity",
  "relation": "memory:prefers",
  "scope": "local",
  "source": "agent:example",
  "value_type": "string",
  "value_v": "dark mode"
}

The canonical body MUST be serialized as compact UTF-8 JSON with deterministic lexicographic key ordering and no insignificant whitespace. The reference node serializes with sorted keys, compact separators, and ensure_ascii=false, so non-ASCII characters are emitted as literal UTF-8 rather than as \uXXXX escapes.

All seven canonical fields are CID-sensitive. Changing any of them MUST produce a different CID.
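As a rough sketch of the serialization rule, the following Python uses the standard json module with sorted keys, compact separators, and ensure_ascii=False. The function names are hypothetical, and rejecting missing or extra fields is one possible reading of "exactly these fields". Number formatting (for example 1.0 versus 1) follows Python's json module here; this section does not say how other implementations should format numbers.

```python
import hashlib
import json

# The seven CID-sensitive fields listed in this section.
CANONICAL_FIELDS = (
    "confidence", "entity", "relation", "scope",
    "source", "value_type", "value_v",
)


def canonical_bytes(body: dict) -> bytes:
    """Serialize the canonical body as compact, key-sorted UTF-8 JSON."""
    missing = [name for name in CANONICAL_FIELDS if name not in body]
    extra = [name for name in body if name not in CANONICAL_FIELDS]
    if missing or extra:
        # Assumption: treat any deviation from the seven fields as an error.
        raise ValueError(f"missing={missing} extra={extra}")
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def compute_cid(body: dict) -> str:
    """CID = "sha256:" + lowercase hex SHA-256 of the canonical bytes."""
    return "sha256:" + hashlib.sha256(canonical_bytes(body)).hexdigest()
```

Because keys are sorted before hashing, two dictionaries with the same fields in a different insertion order yield the same CID, while a change to any one of the seven values yields a different one.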

Excluded fields

The following fields MUST NOT participate in CID computation:

| Field | Reason | Notes |
| --- | --- | --- |
| id / fact_id | self-reference | Cannot be part of its own CID. |
| cid | self-reference | Cannot be self-referential. |
| timestamp / created_at | write-time metadata | Write time is not part of the assertion. |
| hlc | node-local | Logical clock metadata. |
| valid_until | policy | Expiry policy, not the assertion body. |
| derived_from | provenance | References can create circularity. |
| attestation_chain / signature | transport | Transport or attestation metadata. |
| source_trust | local | Locally derived trust score. |
| reason | audit context | Operator or audit context, not the assertion body. |

A matching CID is not proof that excluded metadata is trustworthy.

Excluded fields may still be security-relevant. Implementations MUST validate those fields through their owning specs.
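One way to keep excluded fields out of the hash is to select the seven canonical fields by name instead of removing known metadata. The sketch below assumes a stored fact is a flat dictionary; canonical_body and compute_cid are hypothetical helper names.

```python
import hashlib
import json

CANONICAL_FIELDS = (
    "confidence", "entity", "relation", "scope",
    "source", "value_type", "value_v",
)


def canonical_body(fact: dict) -> dict:
    """Keep only the seven CID-sensitive fields; all other keys are ignored."""
    # Raises KeyError if a canonical field is absent from the record.
    return {name: fact[name] for name in CANONICAL_FIELDS}


def compute_cid(fact: dict) -> str:
    """Hash the canonical body only, so excluded metadata cannot affect the CID."""
    text = json.dumps(
        canonical_body(fact), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
```

An allow-list has the advantage that a metadata field added later cannot leak into the hash by accident. Under this sketch, two records that differ only in created_at or reason produce the same CID, which is why a matching CID says nothing about whether that metadata is trustworthy.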

Storage contract

The fact storage model MUST support:

  - A nullable cid column, nullable only for legacy rows pending backfill.
  - A fact_cid_aliases table mapping stored fact ids to CIDs.
  - A unique CID index for efficient CID lookup.
  - A fact id index for alias maintenance.

Every new fact write MUST persist the computed CID on the fact row and insert the corresponding alias row in the same transaction.
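The storage contract can be pictured with an illustrative SQLite layout. This spec does not define DDL, so the facts table, its columns, and the index names below are all assumptions; only fact_cid_aliases and the cid column are named in the text. Placing the unique index on the alias table is also an assumption, and tenant scoping is left out for brevity.

```python
import sqlite3

# Hypothetical schema for illustration only; real engines and layouts may differ.
SCHEMA = """
CREATE TABLE facts (
    id   TEXT PRIMARY KEY,
    body TEXT NOT NULL,
    cid  TEXT              -- nullable only for legacy rows pending backfill
);
CREATE TABLE fact_cid_aliases (
    cid     TEXT NOT NULL,
    fact_id TEXT NOT NULL REFERENCES facts(id)
);
CREATE UNIQUE INDEX idx_alias_cid ON fact_cid_aliases(cid);
CREATE INDEX idx_alias_fact_id ON fact_cid_aliases(fact_id);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def write_fact(conn: sqlite3.Connection, fact_id: str, body_json: str, cid: str) -> None:
    """Persist the fact row and its alias row in one transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute(
            "INSERT INTO facts (id, body, cid) VALUES (?, ?, ?)",
            (fact_id, body_json, cid),
        )
        conn.execute(
            "INSERT INTO fact_cid_aliases (cid, fact_id) VALUES (?, ?)",
            (cid, fact_id),
        )
```

Because both inserts share one transaction, a failure on the alias insert also rolls back the fact row, so a fact is never stored without its alias.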

Write path and deduplication

On local assertion, a node MUST:

  1. Normalize the assertion fields according to their owning specs.
  2. Compute the CID before writing the fact row.
  3. Persist the CID with all other fact fields.
  4. Insert the CID alias row in the same transaction.

A CID collision MUST NOT overwrite the existing record.

If the computed CID already exists for the same tenant, the node SHOULD return the existing record instead of creating a duplicate fact. If an implementation detects the same CID for a different canonical body, it MUST treat that as a CID collision.
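The write-path rules above can be sketched with an in-memory store keyed by tenant and CID. InMemoryFactStore and CidCollisionDetected are hypothetical names. The sketch keeps the canonical bytes alongside each record so that a true collision (same CID, different canonical body) can be told apart from a duplicate assertion.

```python
import hashlib
import json
import uuid


class CidCollisionDetected(Exception):
    """Hypothetical exception standing in for cid_collision_detected."""


def canonical_bytes(body: dict) -> bytes:
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


class InMemoryFactStore:
    def __init__(self) -> None:
        self._by_cid = {}  # (tenant, cid) -> stored record

    def assert_fact(self, tenant: str, body: dict) -> dict:
        """Compute the CID first, then either reuse or create the record."""
        raw = canonical_bytes(body)
        cid = "sha256:" + hashlib.sha256(raw).hexdigest()
        existing = self._by_cid.get((tenant, cid))
        if existing is not None:
            if existing["canonical"] != raw:
                # Same CID, different body: never overwrite the existing record.
                raise CidCollisionDetected(cid)
            return existing  # duplicate assertion within the tenant
        record = {"id": str(uuid.uuid4()), "cid": cid, "canonical": raw}
        self._by_cid[(tenant, cid)] = record
        return record
```

In this sketch deduplication is scoped per tenant: the same assertion made under two tenants yields two records that share a CID but have different fact ids.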

Dual addressing

The single-fact read route MUST accept either a UUID-style fact id or a CID:

GET /v1/facts/{cid_or_fact_id}

When the path value starts with sha256:, the node MUST validate CID syntax and resolve the fact through the CID alias index. Malformed CIDs MUST return a validation error. Well-formed but unknown CIDs MUST return not found.

Fact responses SHOULD include the stored cid field. Legacy facts that have not yet been backfilled MAY return cid: null.
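A minimal sketch of the dual-addressing lookup follows. It assumes two hypothetical in-memory maps, one from fact id to fact and one from CID to fact id, and uses hypothetical exception classes in place of whatever HTTP errors a real node would return.

```python
import re

CID_PATTERN = re.compile(r"sha256:[0-9a-f]{64}")


class MalformedCid(Exception):
    """Hypothetical stand-in for a validation error (cid_malformed)."""


class FactNotFound(Exception):
    """Hypothetical stand-in for a not-found error (fact_not_found)."""


def resolve_fact(path_value: str, facts_by_id: dict, fact_id_by_cid: dict) -> dict:
    """Resolve the {cid_or_fact_id} path segment to a stored fact."""
    if path_value.startswith("sha256:"):
        # CID branch: validate syntax before touching the alias index.
        if CID_PATTERN.fullmatch(path_value) is None:
            raise MalformedCid(path_value)
        fact_id = fact_id_by_cid.get(path_value)
        if fact_id is None:
            raise FactNotFound(path_value)
    else:
        # Anything else is treated as a UUID-style fact id.
        fact_id = path_value
    fact = facts_by_id.get(fact_id)
    if fact is None:
        raise FactNotFound(path_value)
    return fact
```

Validating before lookup means a malformed value such as an uppercase digest is reported as a syntax problem and never as a missing fact.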

CID verification

Nodes MUST expose an integrity check that recomputes the CID from the stored fact body and compares it with the stored CID:

POST /v1/facts/{fact_id}/verify-cid

The response MUST include:

| Field | Required | Meaning |
| --- | --- | --- |
| cid_valid | yes | Whether the stored CID matches the recomputed CID. |
| computed_cid | yes | CID computed from the stored canonical body. |
| stored_cid | nullable | Stored CID, or null for legacy rows pending backfill. |
| mismatch_reason | conditional | Human-readable reason when cid_valid is false. |

A false result SHOULD trigger operator investigation. It may indicate data corruption, storage tampering, a legacy row pending backfill, or a canonicalization bug.
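The verification response could be assembled roughly as follows. The function name verify_cid and the wording of the mismatch reasons are assumptions; the four response fields are the ones this section requires.

```python
import hashlib
import json

CANONICAL_FIELDS = (
    "confidence", "entity", "relation", "scope",
    "source", "value_type", "value_v",
)


def compute_cid(fact: dict) -> str:
    body = {name: fact[name] for name in CANONICAL_FIELDS}
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_cid(fact: dict) -> dict:
    """Recompute the CID from the stored body and compare it with the stored CID."""
    computed = compute_cid(fact)
    stored = fact.get("cid")  # None for legacy rows pending backfill
    result = {
        "cid_valid": stored == computed,
        "computed_cid": computed,
        "stored_cid": stored,
    }
    if not result["cid_valid"]:
        # Hypothetical reason strings; the spec only requires them to be human-readable.
        if stored is None:
            result["mismatch_reason"] = "stored cid is null (legacy row pending backfill)"
        else:
            result["mismatch_reason"] = "stored cid does not match recomputed cid"
    return result
```

Reporting the legacy-row case with its own reason lets an operator separate expected migration noise from results that may point to corruption or tampering.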

Backfill

Nodes MUST provide a backfill path for legacy rows whose cid is null. The backfill process MUST:

  1. Iterate over facts with cid IS NULL.
  2. Recompute each CID from the canonical fact body.
  3. Update the fact row and insert the alias row.
  4. Be idempotent.

The reference node exposes a backfill-cids CLI command and this status route:

GET /v1/admin/cid-backfill/status

The status response MUST include:

| Field | Type | Meaning |
| --- | --- | --- |
| total_facts | integer | Total facts visible to the status query. |
| backfilled_facts | integer | Facts with non-null cid. |
| pending_facts | integer | Facts still missing cid. |
| backfill_complete | boolean | Whether pending_facts is zero. |
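The backfill steps and the status fields can be sketched over an in-memory list of fact dictionaries. This is not the reference node's backfill-cids command; backfill_cids and backfill_status are hypothetical functions, and the aliases mapping (fact id to CID) stands in for the fact_cid_aliases table.

```python
import hashlib
import json

CANONICAL_FIELDS = (
    "confidence", "entity", "relation", "scope",
    "source", "value_type", "value_v",
)


def compute_cid(fact: dict) -> str:
    body = {name: fact[name] for name in CANONICAL_FIELDS}
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def backfill_cids(facts: list, aliases: dict) -> int:
    """Fill in missing CIDs; returns how many rows were updated."""
    updated = 0
    for fact in facts:
        if fact.get("cid") is not None:
            continue  # already backfilled, so a rerun changes nothing
        fact["cid"] = compute_cid(fact)
        aliases[fact["id"]] = fact["cid"]
        updated += 1
    return updated


def backfill_status(facts: list) -> dict:
    """Build the four status fields from the current fact rows."""
    total = len(facts)
    pending = sum(1 for fact in facts if fact.get("cid") is None)
    return {
        "total_facts": total,
        "backfilled_facts": total - pending,
        "pending_facts": pending,
        "backfill_complete": pending == 0,
    }
```

Idempotence here comes from selecting only rows whose cid is null: a second run finds nothing to update and leaves existing CIDs and aliases untouched.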

Federation use

Receiving nodes SHOULD recompute the CID from the inbound canonical body and reject payloads whose declared CID does not match.

Federation payloads SHOULD carry CIDs when fact records cross node boundaries. Legacy CID-null rows may exist during migration and backfill windows. Federation policy for accepting or rejecting CID-null inbound facts is owned by Spec-05-Federation-Trust; this spec defines the CID format and computation needed to perform that validation.
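On the receiving side, the recompute-and-compare check might look like the sketch below. It assumes the inbound payload is a flat dictionary carrying the seven canonical fields plus an optional cid. The names are hypothetical. A missing CID is only signalled to the caller, since the accept or reject decision for CID-null facts belongs to Spec-05-Federation-Trust.

```python
import hashlib
import json
from typing import Optional

CANONICAL_FIELDS = (
    "confidence", "entity", "relation", "scope",
    "source", "value_type", "value_v",
)


class CidMismatch(Exception):
    """Hypothetical exception standing in for the cid_mismatch error."""


def compute_cid(fact: dict) -> str:
    body = {name: fact[name] for name in CANONICAL_FIELDS}
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_inbound_fact(payload: dict) -> Optional[str]:
    """Return the verified CID, or None when the payload declares no CID."""
    declared = payload.get("cid")
    if declared is None:
        return None  # CID-null policy is decided elsewhere, not in this sketch
    computed = compute_cid(payload)
    if computed != declared:
        raise CidMismatch(f"declared {declared}, computed {computed}")
    return computed
```

The receiver never trusts the declared value on its own: the CID it keeps is the one it computed locally from the inbound canonical body.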

Error conditions

Nodes SHOULD use these stable error meanings:

| Error | Condition | Notes |
| --- | --- | --- |
| cid_malformed | syntax | A sha256: path value is not followed by 64 lowercase hex characters. |
| fact_not_found | lookup | Fact id or CID does not resolve to a readable fact. |
| cid_mismatch | integrity | A recomputed or inbound CID does not match the declared/stored CID. |
| cid_collision_detected | integrity | Two different canonical fact bodies produce the same CID. |

Out of scope

This spec does not define:

  - Federation CID-null policy; full trust policy for legacy facts.
  - Hash algorithm rotation beyond the sha256: prefix shape.
  - Provenance graph semantics for derived_from.
  - Tombstone / time-travel or source-attestation behavior.
  - Storage-engine DDL syntax.