Version: v0.9.0a2

Content-Addressed Fact IDs (CIDs)

5 min read · Integrator · Federation operator · Spec-21-Content-Addressed-IDs

What this page is

A practical guide to computing and using CIDs. Every fact in Stigmem receives a CID: a deterministic SHA-256 hash of its canonical body. Two facts with the same entity, relation, value (type and content), source, and scope always produce the same CID, regardless of when or where they were asserted. For the design rationale, see Content Addressing concepts.

CID format

sha256:<64 hex chars>

Example:

sha256:a3f2b8c901d4e5f6789012345678abcdef0123456789abcdef0123456789abcd
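
Before sending an identifier to a node, a client may want to check that it has the shape shown above. The following is a minimal sketch, not part of Stigmem itself; the name `is_valid_cid` is hypothetical, and accepting only lowercase hex is an assumption.

```python
import re

# "sha256:" followed by exactly 64 hex characters.
# Restricting to lowercase hex is an assumption of this sketch.
CID_PATTERN = re.compile(r"sha256:[0-9a-f]{64}")

def is_valid_cid(value: str) -> bool:
    """Return True if value has the documented CID shape."""
    return CID_PATTERN.fullmatch(value) is not None

example = "sha256:a3f2b8c901d4e5f6789012345678abcdef0123456789abcdef0123456789abcd"
print(is_valid_cid(example))            # True
print(is_valid_cid("sha256:a3f2b8c9"))  # False (too short)
```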

Canonical body

The CID is computed over a JSON object with exactly 6 fields in lexicographic key order, serialized with RFC 8785 (JCS) determinism:

{
  "entity": "user:alice",
  "relation": "memory:role",
  "scope": "local",
  "source": "agent:assistant",
  "value_type": "string",
  "value_v": "engineer"
}

Excluded fields

| Field | Class | Why excluded |
| --- | --- | --- |
| fact_id, cid | circular | The CID cannot include itself. |
| created_at / timestamp | temporal metadata | Same assertion at different times shares one CID. |
| confidence | mutable signal | Confidence can change; CID should not. |
| valid_until | operational | Expiry is operational, not content-defining. |
| derived_from | provenance chain | Requires independent validation. |
| attestation_chain | security | Security-relevant; validated separately. |
| source_trust | contextual | Trust score is context-dependent. |
| signature | authenticity | Validated independently of content identity. |
| reason | metadata | Retraction/tombstone reason. |
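
To illustrate the effect of these exclusions, here is a small sketch that hashes only the six canonical fields of a record. The flat record layout, the helper name `cid_of`, and the sample timestamps and confidence values are assumptions made for illustration.

```python
import hashlib
import json

# The six canonical fields; every other key in a record is ignored.
CANONICAL_FIELDS = ("entity", "relation", "scope", "source", "value_type", "value_v")

def cid_of(record: dict) -> str:
    """Hash only the canonical fields of a flat fact record (hypothetical layout)."""
    body = {key: record[key] for key in CANONICAL_FIELDS}
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()

base = {
    "entity": "user:alice", "relation": "memory:role", "scope": "local",
    "source": "agent:assistant", "value_type": "string", "value_v": "engineer",
}
# Two records that differ only in excluded fields.
first = {**base, "created_at": "2025-01-01T00:00:00+00:00", "confidence": 0.6}
later = {**base, "created_at": "2025-06-01T00:00:00+00:00", "confidence": 0.9}

print(cid_of(first) == cid_of(later))  # True
```

Changing any canonical field, for example `value_v`, yields a different CID.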

Computing a CID

Python

import hashlib
import json

def compute_cid(entity, relation, value_type, value_v, source, scope):
    body = {
        "entity": entity,
        "relation": relation,
        "scope": scope,
        "source": source,
        "value_type": value_type,
        "value_v": value_v,
    }
    # Sorted keys, no whitespace, and raw UTF-8 output give a deterministic
    # serialization for this flat object of strings.
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    digest = hashlib.sha256(canonical).hexdigest()
    return f"sha256:{digest}"

cid = compute_cid(
    entity="user:alice",
    relation="memory:role",
    value_type="string",
    value_v="engineer",
    source="agent:assistant",
    scope="local",
)
print(cid)  # sha256:...

TypeScript

import { createHash } from "crypto";

function computeCid(
  entity: string, relation: string,
  valueType: string, valueV: string,
  source: string, scope: string,
): string {
  const body = { entity, relation, scope, source, value_type: valueType, value_v: valueV };
  // An array replacer fixes both the set and the order of serialized keys.
  const canonical = JSON.stringify(body, Object.keys(body).sort());
  const digest = createHash("sha256").update(canonical, "utf8").digest("hex");
  return `sha256:${digest}`;
}

Go

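import (
	"bytes"
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

func computeCID(entity, relation, valueType, valueV, source, scope string) string {
	body := map[string]string{
		"entity": entity, "relation": relation, "scope": scope,
		"source": source, "value_type": valueType, "value_v": valueV,
	}
	// encoding/json emits map keys in sorted order. HTML escaping is disabled
	// because json.Marshal would rewrite <, >, and & as \u003c, \u003e, \u0026,
	// producing a different hash than the Python and TypeScript versions.
	// Caveat: Go still escapes U+2028 and U+2029, which RFC 8785 does not.
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	_ = enc.Encode(body) // encoding a map[string]string does not fail
	canonical := bytes.TrimSuffix(buf.Bytes(), []byte("\n")) // Encode appends a newline
	digest := sha256.Sum256(canonical)
	return fmt.Sprintf("sha256:%x", digest)
}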

Dual addressing: UUID and CID

Every fact has both a UUID (id) and a CID. You can fetch a fact by either.

# By UUID
curl -s http://localhost:8765/v1/facts/550e8400-e29b-41d4-a716-446655440000 \
-H "Authorization: Bearer $TOKEN"

# By CID
curl -s http://localhost:8765/v1/facts/sha256:a3f2b8c9... \
-H "Authorization: Bearer $TOKEN"

The node resolves CIDs via the fact_cid_aliases table, which maps each CID to its UUID.
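
As a rough illustration of dual addressing, the sketch below resolves either kind of reference to a fact UUID. A plain dictionary stands in for the fact_cid_aliases table; the function name `resolve_fact_id` and the sample CID are hypothetical, and the node's actual lookup logic may differ.

```python
import uuid
from typing import Dict, Optional

def resolve_fact_id(ref: str, cid_aliases: Dict[str, str]) -> Optional[str]:
    """Map a UUID or CID reference to a fact UUID, or None if it cannot be resolved."""
    if ref.startswith("sha256:"):
        # CID path: look up the CID -> UUID alias.
        return cid_aliases.get(ref)
    try:
        # UUID path: parse and return the canonical lowercase form.
        return str(uuid.UUID(ref))
    except ValueError:
        return None

sample_cid = "sha256:" + "ab" * 32  # made-up CID for illustration
aliases = {sample_cid: "550e8400-e29b-41d4-a716-446655440000"}

print(resolve_fact_id(sample_cid, aliases))
# 550e8400-e29b-41d4-a716-446655440000
```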

Write-path deduplication

When you assert a fact, the node:

  1. Computes the CID from the 6 canonical fields.
  2. Checks fact_cid_aliases for an existing fact with the same CID.
  3. If found, returns the existing fact (idempotent write).
  4. If not found, inserts the new fact, stores its CID, and creates the alias.

Asserting the same (entity, relation, value, source, scope) tuple twice returns the same fact record, so no duplicate is created.

# First assertion: creates the fact
curl -s -X POST http://localhost:8765/v1/facts \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"entity":"user:alice","relation":"memory:role","value":{"type":"string","v":"engineer"},"source":"agent:assistant","scope":"local"}' \
  | jq '{id, cid}'

# Second identical assertion: returns the same fact
curl -s -X POST http://localhost:8765/v1/facts \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"entity":"user:alice","relation":"memory:role","value":{"type":"string","v":"engineer"},"source":"agent:assistant","scope":"local"}' \
  | jq '{id, cid}'
# Same id and cid as above
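
The four write-path steps can be modelled in memory as follows. This is only a sketch of the idea: the `FactStore` class, its dictionaries, and the use of random UUIDs are assumptions, not the node's storage code.

```python
import hashlib
import json
import uuid

class FactStore:
    """In-memory stand-in for a node's fact table and CID alias table."""

    def __init__(self):
        self.facts = {}        # UUID -> fact record
        self.cid_aliases = {}  # CID -> UUID

    def assert_fact(self, entity, relation, value_type, value_v, source, scope):
        body = {
            "entity": entity, "relation": relation, "scope": scope,
            "source": source, "value_type": value_type, "value_v": value_v,
        }
        canonical = json.dumps(
            body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
        ).encode("utf-8")
        cid = "sha256:" + hashlib.sha256(canonical).hexdigest()  # step 1
        existing_id = self.cid_aliases.get(cid)                   # step 2
        if existing_id is not None:
            return self.facts[existing_id]                        # step 3
        fact_id = str(uuid.uuid4())                               # step 4
        self.facts[fact_id] = {"id": fact_id, "cid": cid, **body}
        self.cid_aliases[cid] = fact_id
        return self.facts[fact_id]

store = FactStore()
args = ("user:alice", "memory:role", "string", "engineer", "agent:assistant", "local")
a = store.assert_fact(*args)
b = store.assert_fact(*args)
print(a["id"] == b["id"], len(store.facts))  # True 1
```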

CID verification

The node can verify that a stored fact's CID matches a freshly computed CID:

curl -s http://localhost:8765/v1/facts/sha256:a3f2b8c9.../verify \
-H "Authorization: Bearer $TOKEN"

If the stored CID does not match the CID recomputed from the stored canonical body, the node emits a cid_mismatch audit event.
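
The same check can be approximated client-side. In this sketch the flat fact layout, the `verify_fact` helper, and the shape of the audit record are all assumptions; only the `cid_mismatch` event name comes from the text above.

```python
import hashlib
import json

CANONICAL_FIELDS = ("entity", "relation", "scope", "source", "value_type", "value_v")

def cid_of(fact: dict) -> str:
    """Recompute the CID from the canonical fields of a flat fact record."""
    body = {key: fact[key] for key in CANONICAL_FIELDS}
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()

def verify_fact(fact: dict, audit_log: list) -> bool:
    """Compare the stored CID with a fresh one; record a mismatch if they differ."""
    recomputed = cid_of(fact)
    if recomputed == fact.get("cid"):
        return True
    # Hypothetical audit record shape.
    audit_log.append(
        {"event": "cid_mismatch", "stored": fact.get("cid"), "recomputed": recomputed}
    )
    return False

fact = {
    "entity": "user:alice", "relation": "memory:role", "scope": "local",
    "source": "agent:assistant", "value_type": "string", "value_v": "engineer",
}
fact["cid"] = cid_of(fact)
log = []
print(verify_fact(fact, log))                            # True
print(verify_fact({**fact, "value_v": "manager"}, log))  # False
print(log[0]["event"])                                   # cid_mismatch
```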

Federation and tamper detection

Federation envelopes carry the CID for each fact. The receiving node:

  1. Computes the CID independently from the envelope's canonical fields.
  2. Compares it against the envelope's declared CID.
  3. Rejects the fact if they diverge, which detects tampering in transit.

Facts with cid: null whose created_at falls after CID enforcement begins are rejected. Legacy pre-CID facts with cid: null are accepted during the backfill window.
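
A receiving node's decision could look roughly like the sketch below. The envelope field layout, the `accept_federated_fact` name, and the `ENFORCEMENT_START` date are hypothetical; the closing of the backfill window is not modelled.

```python
import hashlib
import json
from datetime import datetime, timezone

CANONICAL_FIELDS = ("entity", "relation", "scope", "source", "value_type", "value_v")
# Hypothetical cut-over; the real enforcement date is not stated in this guide.
ENFORCEMENT_START = datetime(2026, 1, 1, tzinfo=timezone.utc)

def cid_of(fact: dict) -> str:
    """Recompute the CID from the canonical fields carried in the envelope."""
    body = {key: fact[key] for key in CANONICAL_FIELDS}
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()

def accept_federated_fact(fact: dict) -> bool:
    declared = fact.get("cid")
    if declared is None:
        # Legacy facts without a CID pass only if created before enforcement.
        created = datetime.fromisoformat(fact["created_at"])
        return created < ENFORCEMENT_START
    # Steps 1-3: recompute, compare, reject on divergence.
    return cid_of(fact) == declared

fact = {
    "entity": "user:alice", "relation": "memory:role", "scope": "local",
    "source": "agent:assistant", "value_type": "string", "value_v": "engineer",
    "created_at": "2026-03-01T00:00:00+00:00",
}
signed = {**fact, "cid": cid_of(fact)}
print(accept_federated_fact(signed))                            # True
print(accept_federated_fact({**signed, "value_v": "manager"}))  # False (tampered)
print(accept_federated_fact({**fact, "cid": None}))             # False (no CID after cut-over)
```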

CID backfill

Existing facts created before the pre-reset design window do not have CIDs. The node backfills them in the background.

# Check backfill progress
curl -s http://localhost:8765/v1/admin/cid-backfill/status \
-H "Authorization: Bearer $ADMIN_TOKEN" | jq .

Response:

{
  "total_facts": 12500,
  "backfilled_facts": 11200,
  "pending_facts": 1300,
  "backfill_complete": false
}

The backfill runs concurrently with live writes. The migration window is 12 months. After the window closes, all facts must have CIDs.
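
For monitoring, the status response can be reduced to a single progress figure. This sketch works on the sample response shown above; the `backfill_progress` helper is a hypothetical name, and treating an empty node as fully backfilled is an assumption.

```python
import json

# Sample status response from the backfill endpoint, as shown above.
status = json.loads("""
{
  "total_facts": 12500,
  "backfilled_facts": 11200,
  "pending_facts": 1300,
  "backfill_complete": false
}
""")

def backfill_progress(status: dict) -> float:
    """Fraction of facts that already have a CID (1.0 when there are no facts)."""
    total = status["total_facts"]
    if total == 0:
        return 1.0
    return status["backfilled_facts"] / total

print(f"{backfill_progress(status):.1%}")  # 89.6%
```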

External citation

CIDs are stable, content-derived identifiers that work across nodes. Use them to cite facts in external systems.

stigmem://node.example.com/facts/sha256:a3f2b8c901d4e5f6...

Because the CID is derived from content, the same fact on two federated nodes has the same CID, which makes cross-node references unambiguous.
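
An external system that stores such citations may need to split them back into node and CID. The sketch below assumes the URI layout shown above (`stigmem://<node>/facts/<cid>`); `parse_citation` and the sample CID are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit

def parse_citation(uri: str) -> Tuple[str, str]:
    """Split a stigmem:// citation into (node host, CID)."""
    parts = urlsplit(uri)
    prefix = "/facts/"
    if parts.scheme != "stigmem" or not parts.path.startswith(prefix):
        raise ValueError(f"not a stigmem fact citation: {uri!r}")
    return parts.netloc, parts.path[len(prefix):]

sample_cid = "sha256:" + "ab" * 32  # made-up CID for illustration
node, cid = parse_citation(f"stigmem://node.example.com/facts/{sample_cid}")
print(node)               # node.example.com
print(cid == sample_cid)  # True
```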

See also