Version: v0.9.0a2


Experimental feature

This feature is Experimental. Breaking changes may occur before it reaches Stable. See the feature status definitions.

§20. Recall & Graph

Spec contributor · Recall implementer · Experimental · future plugin line

What this section covers

Graph adjacency index, embedding storage, recall API, memory cards, and causal/derivation link lifecycle. The subscription primitive that previously lived in §20.5 is now colocated under Spec-X7-Subscriptions.

Status: Experimental / dormant source package. Archived source material was drafted as normative, but this spec is deferred from the supported v0.9.0aN surface and must pass ADR-008 gates before reintroduction.

Source material: Archived evolutionary spec snapshots. This page is the maintained Spec-X home for recall graph semantics.

Section body

Each subsection below shows the most recent normative text from the spec source.

Depends on: §2 (fact shape), §5 (wire format), §17 (memory garden), §18 (source attestation), §19 (federation trust).

§20.1 Graph Index

§20.1.1 Purpose

The facts table is a flat relation keyed by entity URI. Entity-to-entity connections exist implicitly: any fact whose value.type = "ref" and whose value URI denotes a known entity constitutes a directed edge from the subject entity to the referenced entity.

Without a materialized adjacency structure, each hop of a multi-hop traversal requires a full scan of the facts table, so a k-hop recall query costs O(k × |F|), where |F| is the number of facts.

§20 mandates a materialized entity_edges table to enable efficient bounded-depth BFS.

§20.1.2 Schema

CREATE TABLE IF NOT EXISTS entity_edges (
    id              TEXT PRIMARY KEY,      -- edge UUID (= source fact id)
    subject         TEXT NOT NULL,         -- normalized entity URI ("from" node)
    relation        TEXT NOT NULL,         -- predicate / edge label
    object          TEXT NOT NULL,         -- normalized entity URI ("to" node)
    scope           TEXT NOT NULL,
    confidence      REAL NOT NULL,         -- mirrors fact.confidence; updated by decay sweeper
    source_trust    REAL,                  -- cached t(fact.source) per §19.4; nullable
    decay_epoch     INTEGER,               -- Unix ms of last decay sweep touch
    created_at      INTEGER NOT NULL       -- Unix ms
);

CREATE INDEX IF NOT EXISTS idx_edges_subject     ON entity_edges (subject,  scope, confidence);
CREATE INDEX IF NOT EXISTS idx_edges_object      ON entity_edges (object,   scope, confidence);
CREATE INDEX IF NOT EXISTS idx_edges_subject_rel ON entity_edges (subject,  relation, scope);

Implementations MUST create this table and all three indexes before accepting PUT /v1/facts calls that could produce ref-type values.

§20.1.3 Adjacency Invariants

Insert on ref fact. An entity_edges row MUST be inserted whenever a fact is persisted with value.type = "ref" and the v field passes entity-URI validation. The id MUST equal the source fact's id. The object MUST be the normalized form of the ref target URI.
Decay sweep propagation. When the decay sweeper updates a fact's confidence, it MUST update the corresponding entity_edges row's confidence and decay_epoch in the same transaction.
Retraction soft-delete. Retraction MUST set confidence = 0.0 on both the facts row and the entity_edges row instead of hard-deleting either, and MUST insert a row into fact_retractions(fact_id, retracted_at). The fact_retractions record is authoritative for time-travel queries.
Garden scope. entity_edges rows inherit the fact's scope. Cross-garden traversal is governed by the caller's garden ACL, which is checked at the application layer before results are returned.
Consistency. An entity_edges row MUST NOT outlive the deletion of its source fact. Implementations SHOULD enforce this with a foreign-key cascade or an equivalent constraint.
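The invariants above can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not a reference implementation: the facts columns, the is_entity_uri and normalize_uri helpers, and the function names are hypothetical, and only entity_edges and fact_retractions follow this section. Each mutation runs in one `with db:` transaction, and the ON DELETE CASCADE clause is one way to meet the Consistency SHOULD; SQLite only enforces it after PRAGMA foreign_keys = ON.

```python
import sqlite3

# Hypothetical minimal facts table; entity_edges follows §20.1.2 plus an FK cascade.
SCHEMA = """
CREATE TABLE facts (
    id TEXT PRIMARY KEY, entity TEXT NOT NULL, relation TEXT NOT NULL,
    value_type TEXT NOT NULL, v TEXT NOT NULL, scope TEXT NOT NULL,
    confidence REAL NOT NULL, created_at INTEGER NOT NULL
);
CREATE TABLE entity_edges (
    id TEXT PRIMARY KEY REFERENCES facts(id) ON DELETE CASCADE,
    subject TEXT NOT NULL, relation TEXT NOT NULL, object TEXT NOT NULL,
    scope TEXT NOT NULL, confidence REAL NOT NULL, source_trust REAL,
    decay_epoch INTEGER, created_at INTEGER NOT NULL
);
CREATE TABLE fact_retractions (fact_id TEXT NOT NULL, retracted_at INTEGER NOT NULL);
"""


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    db.executescript(SCHEMA)
    return db


def is_entity_uri(v):
    # Hypothetical stand-in for the spec's entity-URI validation.
    return v.startswith(("https://", "http://"))


def normalize_uri(v):
    # Hypothetical stand-in normalizer: trim whitespace and a trailing slash.
    return v.strip().rstrip("/")


def put_fact(db, fact_id, entity, relation, value_type, v, scope,
             confidence, now_ms, source_trust=None):
    with db:  # fact row and edge row commit (or roll back) together
        db.execute("INSERT INTO facts VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                   (fact_id, entity, relation, value_type, v, scope, confidence, now_ms))
        if value_type == "ref" and is_entity_uri(v):
            db.execute("INSERT INTO entity_edges VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
                       (fact_id, normalize_uri(entity), relation, normalize_uri(v),
                        scope, confidence, source_trust, None, now_ms))


def apply_decay(db, fact_id, new_confidence, now_ms):
    with db:  # decay propagation happens in the same transaction
        db.execute("UPDATE facts SET confidence = ? WHERE id = ?", (new_confidence, fact_id))
        db.execute("UPDATE entity_edges SET confidence = ?, decay_epoch = ? WHERE id = ?",
                   (new_confidence, now_ms, fact_id))


def retract(db, fact_id, now_ms):
    with db:  # soft-delete both rows and record the retraction
        db.execute("UPDATE facts SET confidence = 0.0 WHERE id = ?", (fact_id,))
        db.execute("UPDATE entity_edges SET confidence = 0.0 WHERE id = ?", (fact_id,))
        db.execute("INSERT INTO fact_retractions VALUES (?, ?)", (fact_id, now_ms))
```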

§20.1.4 Edge Metadata Fields

| Field | Type | Description |
|---|---|---|
| id | TEXT (UUID) | Primary key; equals the source fact's id. |
| subject / object | TEXT (URI) | Normalized "from" / "to" entity URIs. |
| relation | TEXT | Predicate label from the source fact. |
| scope | TEXT | Garden or global scope identifier. |
| confidence | REAL [0,1] | Current confidence; mirrors facts.confidence. |
| source_trust | REAL [0,1] | Cached t(fact.source). MAY be null for pre-Phase-9 data. |
| decay_epoch | INTEGER | Unix ms of the last decay sweep update. |
| created_at | INTEGER | Unix ms of row creation (= fact insertion time). |

§20.1.5 `neighbors()` Query Semantics

The neighbors() traversal is the primitive used by the recall pipeline's graph expansion stage and is also exposed directly via GET /v1/graph/neighbors.

GET /v1/graph/neighbors
  ?entity={entity_uri}
  &depth={k}            // integer 1–3; default 1; MUST reject > 3
  &relation_filter={rel_pattern}  // optional; comma-separated relation labels or glob patterns
  &scope={scope}        // required; MUST NOT traverse across scope boundaries
  &min_confidence={c}   // optional; default 0.1
  &min_trust={t}        // optional; default 0.0
  &cursor={opaque}      // pagination cursor
  &page_size={n}        // default 20; max 200

Depth cap. Implementations MUST cap depth at 3. Requests with depth > 3 MUST return HTTP 400 graph_depth_exceeded.
Prune early. Implementations SHOULD prune edges with confidence < min_confidence or source_trust < min_trust before expanding them in the BFS, not after traversal, to reduce fanout.
Glob only. relation_filter MAY use * as a wildcard suffix. Implementations MUST NOT evaluate it as a full regex; only prefix-glob matching is supported.
Shortest-path dedup. Duplicate paths to the same neighbor MUST be de-duplicated, and the shortest path (fewest hops) is reported.
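A minimal in-memory sketch of the neighbors() rules, assuming edges are already loaded as Python objects and traversal follows outgoing edges only. The Edge class, the exception names, applying relation_filter to every hop, and letting a NULL source_trust pass the min_trust check are all assumptions. Pruning happens while building the adjacency list, so rejected edges never enter the frontier, and BFS reaches each node first along a fewest-hop path, which satisfies the dedup rule directly.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

MAX_DEPTH = 3


class GraphDepthExceeded(ValueError):
    """Maps to HTTP 400 graph_depth_exceeded."""


class InvalidRelationFilter(ValueError):
    """Maps to HTTP 400 invalid_relation_filter."""


@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    object: str
    scope: str
    confidence: float
    source_trust: Optional[float]


def parse_relation_filter(raw):
    patterns = [p.strip() for p in (raw or "").split(",") if p.strip()]
    for p in patterns:
        if "*" in p[:-1]:  # only a single trailing * is allowed
            raise InvalidRelationFilter(p)
    return patterns


def relation_matches(relation, patterns):
    if not patterns:
        return True
    return any(relation.startswith(p[:-1]) if p.endswith("*") else relation == p
               for p in patterns)


def neighbors(edges, entity, scope, depth=1, relation_filter=None,
              min_confidence=0.1, min_trust=0.0):
    if depth > MAX_DEPTH:
        raise GraphDepthExceeded(depth)
    if depth < 1:
        raise ValueError("depth must be at least 1")
    patterns = parse_relation_filter(relation_filter)
    # Prune before BFS: out-of-scope, low-confidence, low-trust and
    # non-matching edges never enter the adjacency list.
    adj = {}
    for e in edges:
        if e.scope != scope or e.confidence < min_confidence:
            continue
        if e.source_trust is not None and e.source_trust < min_trust:
            continue  # assumption: NULL trust is not pruned
        if not relation_matches(e.relation, patterns):
            continue
        adj.setdefault(e.subject, []).append(e)
    # BFS visits each node first via a fewest-hop path, giving the dedup rule.
    hops = {entity: 0}
    queue = deque([entity])
    found = []
    while queue:
        node = queue.popleft()
        if hops[node] == depth:
            continue
        for e in adj.get(node, []):
            if e.object not in hops:
                hops[e.object] = hops[node] + 1
                found.append((e.object, hops[e.object], e.relation))
                queue.append(e.object)
    return found
```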

§20.1.6 Pagination and Cursor Stability

Opaque. Cursors are opaque to callers and Base64url-encoded.
Stable. A cursor that was valid before a new fact was inserted MUST continue to work and MUST NOT skip or re-return neighbors that were present at cursor issuance.
TTL. Cursors expire after STIGMEM_CURSOR_TTL_S seconds (default 300). An expired cursor returns HTTP 400 cursor_expired.

The server MUST include a next_cursor field in the response only when more pages exist.
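One possible cursor design, sketched under assumptions: the cursor is base64url-encoded JSON carrying the last returned key, a snapshot time, and an issue time (field names hypothetical), and rows are (hops, neighbor_uri, created_at_ms) tuples. Filtering on created_at ≤ snapshot keeps later inserts out of an in-flight walk, and keyset paging over the total order (hops, uri) avoids skips and repeats. A production cursor would likely also be signed so callers cannot tamper with it; that is omitted here.

```python
import base64
import json
import time

CURSOR_TTL_S = 300  # default for STIGMEM_CURSOR_TTL_S


class CursorExpired(ValueError):
    """Maps to HTTP 400 cursor_expired."""


def encode_cursor(last_key, snapshot_ms, issued_ms):
    payload = {"k": list(last_key), "snap": snapshot_ms, "iat": issued_ms}
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_cursor(cursor, now_ms):
    padded = cursor + "=" * (-len(cursor) % 4)  # restore stripped padding
    data = json.loads(base64.urlsafe_b64decode(padded))
    if now_ms - data["iat"] > CURSOR_TTL_S * 1000:
        raise CursorExpired("cursor_expired")
    return tuple(data["k"]), data["snap"]


def page(rows, page_size, cursor=None, now_ms=None):
    """rows: iterable of (hops, neighbor_uri, created_at_ms) neighbor rows."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if cursor is None:
        after, snap = None, now_ms
    else:
        after, snap = decode_cursor(cursor, now_ms)
    # Snapshot: ignore rows created after the first page was issued, then
    # order by the total key (hops, uri) so keyset paging never skips or repeats.
    visible = sorted((h, uri) for h, uri, created in rows if created <= snap)
    if after is not None:
        visible = [r for r in visible if r > after]
    items = visible[:page_size]
    response = {"results": items}
    if len(visible) > page_size:  # next_cursor only when more pages exist
        response["next_cursor"] = encode_cursor(items[-1], snap, now_ms)
    return response
```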

§20.1.7 Federation Integrity

The entity_edges table is local-node state.

When facts are received from a federated peer, the receiving node MUST apply the same insert/retract/decay invariants to its local entity_edges table. Edges derived from federated facts MUST record the peer's source_trust so that cross-node traversal paths carry trust provenance. Nodes MUST NOT return federated-source edges in neighbors() results when the caller's capability token lacks cross-federation read scope.

§20.2 Embedding Storage

§20.2.1 Vector Table

Implementations MUST use sqlite-vec for vector storage. The virtual table schema is:

CREATE VIRTUAL TABLE IF NOT EXISTS vec_facts USING vec0(
    id       TEXT PRIMARY KEY,
    embedding FLOAT[768]         -- default dimensionality; see §20.2.4
);

The id column is the source fact's id for per-fact embeddings, or the string "card:{entity_uri}:{scope}" for memory card embeddings.

§20.2.2 Embedding Unit

Each live fact (confidence > STIGMEM_EMBED_MIN_CONFIDENCE, default 0.1) MUST be embedded as the composed string:

"{entity_display} {relation} {value_text}"

where entity_display is the last path segment of the entity URI, relation is the fact's relation label, and value_text is the typed value's textual representation.

All embeddings MUST be L2-normalized to unit length on insertion.

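Because every vector has unit length, cosine similarity reduces to a dot product, and squared L2 distance equals 2 − 2 × cosine similarity, so a nearest-neighbor search in sqlite-vec returns the same ranking under either metric. The 1-to-1 mapping (one embedding per fact row) ensures that vector ANN retrieval returns individual, attributable facts rather than entity-level blobs.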
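A sketch of the embedding unit and normalization step. The model call itself is out of scope; entity_display, embed_text, and the use of str(value["v"]) as value_text are illustrative assumptions. The final helper supports a check of why unit length matters: for unit vectors, squared Euclidean distance equals 2 − 2 × cosine similarity, so distance-based and similarity-based nearest-neighbor searches rank candidates identically.

```python
import math
from urllib.parse import urlparse


def entity_display(entity_uri):
    # Last non-empty path segment; falls back to the whole URI if there is no path.
    path = urlparse(entity_uri).path.rstrip("/")
    return path.rsplit("/", 1)[-1] if path else entity_uri


def embed_text(entity_uri, relation, value):
    # Hypothetical: uses str(value["v"]) as the typed value's textual form.
    return f"{entity_display(entity_uri)} {relation} {value['v']}"


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    # Equals cosine similarity when both inputs are unit length.
    return sum(x * y for x, y in zip(a, b))
```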

§20.2.3 Default Model

The default embedding model is nomic-embed-text-v1.5 (768 dimensions, Apache-2.0, runnable offline via Ollama).

| Provider | Model · dimensions | Notes |
|---|---|---|
| ollama (default) | nomic-embed-text · 768 | Offline; Matryoshka-capable. |
| ollama | mxbai-embed-large · 1024 | Higher recall; larger memory footprint. |
| openai | text-embedding-3-small · 1536 | Cloud opt-in; requires OPENAI_API_KEY. |
| voyage | voyage-3-lite · 512 | Cloud opt-in; requires VOYAGE_API_KEY. |

§20.2.4 Dimensionality Declaration

Each node MUST record its configured embedding dimensionality in the /.well-known/stigmem response. truncated_dimensions MAY be set to a smaller integer when using Matryoshka-capable models; the floor for nomic-embed-text-v1.5 is 64 dimensions.

Implementations MUST refuse to mix embeddings of different dimensionalities in the same vec_facts table.

If STIGMEM_EMBED_DIMENSIONS changes after facts have been indexed, the node MUST refuse to start and emit a vec_facts dimensionality mismatch error. Re-indexing is performed by draining and re-inserting all rows.
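A sketch of Matryoshka truncation and the two dimensionality guards. It assumes the stored dimensionality is recorded somewhere the node can read at startup (the stored_dims argument is hypothetical). Truncation keeps the leading components and re-normalizes; Nomic's model card example for nomic-embed-text-v1.5 also applies a layer norm before truncating, which this sketch omits.

```python
import math

NOMIC_V15_MIN_DIMS = 64


class DimensionalityMismatch(RuntimeError):
    pass


def truncate_matryoshka(vec, dims, floor=NOMIC_V15_MIN_DIMS):
    """Keep the first `dims` components and re-normalize to unit length."""
    if not floor <= dims <= len(vec):
        raise ValueError(f"truncated_dimensions must be in [{floor}, {len(vec)}]")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated vector is all zeros")
    return [x / norm for x in head]


def check_startup(configured_dims, stored_dims):
    """stored_dims: dimensionality recorded when vec_facts was built, or None if empty."""
    if stored_dims is not None and stored_dims != configured_dims:
        raise DimensionalityMismatch(
            f"vec_facts dimensionality mismatch: stored={stored_dims}, configured={configured_dims}")


def check_insert(vec, configured_dims):
    # Never mix dimensionalities in one vec_facts table.
    if len(vec) != configured_dims:
        raise DimensionalityMismatch(f"got {len(vec)} dims, table expects {configured_dims}")
```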

§20.2.5 Embedding Lifecycle

| Event | Action | Notes |
|---|---|---|
| Fact inserted with confidence above threshold | Embed and insert | Insert into vec_facts. |
| Value change | Re-embed | Update the vec_facts row. |
| Confidence falls below tombstone threshold | Delete | Default 0.1. Stale low-confidence vectors pollute ANN results. |
| Confidence restored | Re-insert | Re-embed and re-insert. |
| Hard delete | Delete (same transaction) | Remove the vec_facts row in the same transaction as the fact row. |

§20.2.6 Contradiction Interaction

Both contradicting facts retain their embeddings.

The contradiction penalty is applied at ranking time, not by modifying stored vectors. Implementations MUST NOT delete or modify the embedding of a contradicted fact.

§20.3 Recall API

§20.3.1 Route

GET  /v1/recall
POST /v1/recall   (preferred when query text is long)

POST is preferred when the query exceeds 1000 characters, to avoid URI length limits. The MCP tool recall wraps the same endpoint with identical semantics.

§20.3.2 Request Shape

| Parameter | Required · Default | Description |
|---|---|---|
| query | yes · none | Natural-language or structured query string. |
| token_budget | yes · none | Maximum tokens in the response payload. |
| depth | no · 1 (max 2) | Graph expansion depth; capped below neighbors()'s maximum of 3 to bound recall latency. |
| weights | no · {lex:0.30, vec:0.50, graph:0.20} | Stage weights; MUST sum to 1.0 within ±0.001. Provisional: operators SHOULD re-tune against a held-out probe set. |
| include_low_trust | no · false | If false, facts with effective confidence < 0.2 are excluded. |
| entity / relation | no · none | An entity URI enables entity-centric recall; a relation filter skips memory card lookup. |
| lambda_mmr | no · 0.7 | MMR relevance/diversity tradeoff; 1.0 = pure relevance. |
| min_confidence | no · 0.1 | Minimum effective confidence for candidate inclusion. |

Validation errors: token_budget < 1 → invalid_token_budget; depth > 2 → recall_depth_exceeded; weights not summing to 1.0 → invalid_weights.
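A sketch of request validation that returns the error codes listed above and fills in the documented defaults. The RecallError class is hypothetical, and so are two choices the section leaves open: how a missing query is reported, and treating a missing token_budget as invalid_token_budget.

```python
DEFAULT_WEIGHTS = {"lex": 0.30, "vec": 0.50, "graph": 0.20}
MAX_RECALL_DEPTH = 2


class RecallError(ValueError):
    def __init__(self, code):
        super().__init__(code)
        self.code = code  # error code returned with HTTP 400


def validate_recall(req):
    """Validate a recall request dict and fill in defaults."""
    if not req.get("query"):
        raise ValueError("query is required")  # error code not specified in this section
    budget = req.get("token_budget")
    # Assumption: a missing budget is reported as invalid_token_budget.
    if not isinstance(budget, int) or budget < 1:
        raise RecallError("invalid_token_budget")
    depth = req.get("depth", 1)
    if depth > MAX_RECALL_DEPTH:
        raise RecallError("recall_depth_exceeded")
    weights = req.get("weights", DEFAULT_WEIGHTS)
    if abs(sum(weights.values()) - 1.0) > 0.001:
        raise RecallError("invalid_weights")
    return {
        "query": req["query"],
        "token_budget": budget,
        "depth": depth,
        "weights": dict(weights),
        "include_low_trust": req.get("include_low_trust", False),
        "lambda_mmr": req.get("lambda_mmr", 0.7),
        "min_confidence": req.get("min_confidence", 0.1),
        "entity": req.get("entity"),
        "relation": req.get("relation"),
    }
```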

§20.3.3 Ranking Pipeline

The recall pipeline runs three stages then fuses their candidate sets: Stage 1 (Lexical / BM25), Stage 2 (Dense ANN), Stage 3 (Graph expansion BFS).

Stage 2 (ANN) scope enforcement MUST be applied via the join to facts.

vec_facts carries no scope column. Implementations MUST NOT pass ANN results to fusion before this join filter; doing so risks cross-scope leakage. Implementations MUST ALSO verify the caller's garden ACL for each Stage 2 candidate. Stage 3 seed entities MUST have their garden ACL verified before BFS expansion begins.

Stage 3 edge score:

graph_score(f at entity e via edge x) =
  (1 / (1 + hops)) × edge.confidence / log(1 + out_degree(x.subject))

The log(1 + out_degree) denominator is the hub-bias guard — it penalizes hub entities whose facts would otherwise dominate graph expansion regardless of query relevance.
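The edge score translates directly; the sketch below assumes the natural logarithm, since the base is not stated. A different base multiplies every score by the same constant, which a scale-invariant norm() such as min-max scaling would cancel out.

```python
import math


def graph_score(hops, edge_confidence, out_degree):
    """Stage 3 score for a fact reached via an edge whose subject has `out_degree` edges."""
    if out_degree < 1:
        raise ValueError("the edge's subject has at least one outgoing edge")
    # Natural log assumed; the base only rescales scores.
    return (1.0 / (1 + hops)) * edge_confidence / math.log(1 + out_degree)
```

With hops = 1 and edge confidence 0.9, a leaf subject (out_degree 1) scores about ten times higher than a hub with 1000 outgoing edges, which is the intended hub-bias correction.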

Fusion formula, where α, β, and γ are the lex, vec, and graph entries of weights:

raw_score(f) = α · norm(bm25(f)) + β · norm(cosine_sim(f)) + γ · norm(graph_score(f))

salience(f)  = recency(f)
             × confidence_weight(f)
             × access_freq_weight(f)
             × contradiction_weight(f)
             × garden_tier(f)

score(f)     = raw_score(f) × salience(f) × source_trust_multiplier(f.source_trust)

Salience signals:

| Signal | Formula | Range |
|---|---|---|
| recency(f) | exp(-0.01 × age_days) | (0, 1] |
| confidence_weight(f) | f.confidence | [0, 1] |
| access_freq_weight(f) | log-normalized within the candidate set | [0, 1] |
| contradiction_weight(f) | 1.0 if no unresolved contradiction; 0.7 if unresolved | {0.7, 1.0} |
| garden_tier(f) | configurable per garden | default 1.0; quarantine default 0.2 |
| source_trust_multiplier(t) | 0.5 + 0.5 × t | [0.5, 1]; 1.0 when trust_mode = off |
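The fusion and salience formulas can be combined as below. This is a sketch under stated assumptions: norm() is taken to be min-max scaling over the candidate set, a stage that did not return a fact contributes 0, a NULL source_trust gets multiplier 1.0, and access_freq_weight uses (1 + ln(1 + count)) / (1 + ln(1 + max count)). That last choice is one reading of "log-normalized"; a plain ln(1 + count) / ln(1 + max count) would give never-accessed facts a weight of 0 and zero out their whole score under the multiplicative salience.

```python
import math

W = {"lex": 0.30, "vec": 0.50, "graph": 0.20}


def min_max(values):
    # Assumption: norm() is min-max scaling; all-equal positive scores map to 1.0.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 if hi > 0 else 0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def access_freq_weights(counts):
    # Assumption: stays in (0, 1] so never-accessed facts keep a nonzero weight.
    top = math.log1p(max(counts))
    return [(1.0 + math.log1p(c)) / (1.0 + top) for c in counts]


def source_trust_multiplier(t, trust_mode_on=True):
    if not trust_mode_on or t is None:  # NULL trust -> 1.0 is an assumption
        return 1.0
    return 0.5 + 0.5 * t


def fuse(cands, weights, trust_mode_on=True):
    """cands: dicts with bm25, cos, graph (0 if a stage did not return the fact),
    age_days, confidence, access_count, contradicted, garden_tier, source_trust."""
    n_lex = min_max([c["bm25"] for c in cands])
    n_vec = min_max([c["cos"] for c in cands])
    n_graph = min_max([c["graph"] for c in cands])
    acc = access_freq_weights([c["access_count"] for c in cands])
    scores = []
    for i, c in enumerate(cands):
        raw = weights["lex"] * n_lex[i] + weights["vec"] * n_vec[i] + weights["graph"] * n_graph[i]
        salience = (math.exp(-0.01 * c["age_days"])          # recency
                    * c["confidence"]                        # confidence_weight
                    * acc[i]                                 # access_freq_weight
                    * (0.7 if c["contradicted"] else 1.0)    # contradiction_weight
                    * c["garden_tier"])                      # garden_tier
        scores.append(raw * salience * source_trust_multiplier(c["source_trust"], trust_mode_on))
    return scores
```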

access_count MUST be incremented each time a fact appears in a recall response. Implementations SHOULD batch these increments, flushing at least every 30 s.

§20.3.4 Token-Budget Packing (MMR)

The scored candidate set is packed using Maximal Marginal Relevance:

next = argmax_{f ∈ R \ selected} [
    λ_mmr · score(f)
  - (1 − λ_mmr) · max_{f_j ∈ selected} cosine_sim(embed(f), embed(f_j))
]

The loop runs until the remaining token budget cannot accommodate the next candidate.

token_cost(f) = 40 + ceil(len(value_text_utf8) / 4)

Small-budget edge case: if the budget cannot fit even the first candidate, return empty results with truncated: true rather than HTTP 400.

The caller controls the budget. When entity is specified (entity-centric recall), MMR MUST be disabled; all facts for that entity in scope are returned sorted by score descending.
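A sketch of the packing loop, assuming candidates arrive as (fact_id, score, unit_embedding, value_text) tuples with scores from the fusion stage. It follows the stopping rule as written: packing ends at the first MMR pick that does not fit, rather than skipping to a smaller candidate. A budget too small for the first pick yields an empty selection with truncated set, matching the edge case above.

```python
import math


def token_cost(value_text):
    return 40 + math.ceil(len(value_text.encode("utf-8")) / 4)


def mmr_pack(cands, token_budget, lambda_mmr=0.7):
    """cands: list of (fact_id, score, unit_embedding, value_text).
    Returns (selected_ids, tokens_used, truncated)."""
    remaining = list(cands)
    selected = []
    used = 0

    def mmr_value(c):
        # Embeddings are unit length, so cosine similarity is a dot product.
        redundancy = max((sum(x * y for x, y in zip(c[2], s[2])) for s in selected),
                         default=0.0)
        return lambda_mmr * c[1] - (1.0 - lambda_mmr) * redundancy

    while remaining:
        i = max(range(len(remaining)), key=lambda j: mmr_value(remaining[j]))
        cost = token_cost(remaining[i][3])
        if used + cost > token_budget:
            break  # the next pick no longer fits: stop packing
        selected.append(remaining.pop(i))
        used += cost
    return [c[0] for c in selected], used, bool(remaining)
```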

§20.3.5 Response Shape

{
  "query": "what is Alice's current role?",
  "token_budget": 512,
  "tokens_used": 340,
  "results": [
    {
      "id":          "3f7a…",
      "entity":      "https://example.com/entity/alice",
      "relation":    "memory:role",
      "value":       { "type": "text", "v": "CEO" },
      "confidence":  0.97,
      "source_trust": 0.90,
      "score":       0.843,
      "hops":        0,
      "contradicted": false,
      "card_stale":  false
    }
  ],
  "memory_card": null,
  "truncated": false,
  "scores_debug": null
}

memory_card is populated for entity-centric queries. scores_debug MAY be populated when debug=true; MUST be null in production responses.

§20.3.6 `include_low_trust` Behavior

When include_low_trust = false (default), facts with effective_confidence = fact.confidence × source_trust < 0.2 MUST be excluded from all three stages before fusion. When true, they are included but the source_trust_multiplier still applies, so they rank lower.
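The admission rule is small enough to state as code. Treating a NULL source_trust as 1.0 is an assumption (pre-Phase-9 data is not covered here), and admit is a hypothetical name.

```python
LOW_TRUST_THRESHOLD = 0.2


def effective_confidence(confidence, source_trust):
    # Assumption: a NULL source_trust (pre-Phase-9 data) counts as 1.0.
    return confidence * (1.0 if source_trust is None else source_trust)


def admit(fact, include_low_trust=False):
    """Candidate filter applied in every stage before fusion."""
    if include_low_trust:
        return True  # kept; source_trust_multiplier still lowers its score
    return effective_confidence(fact["confidence"], fact["source_trust"]) >= LOW_TRUST_THRESHOLD
```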

§20.4 Memory Cards

§20.4.1 Card Definition

A memory card is a per-entity synthesized text summary stored as a fact with:

entity:   {entity-uri}
relation: stigmem:memory:card
value:    { "type": "text", "v": {card_markdown} }
source:   "system:stigmem:card-generator"
scope:    {same scope as constituent facts}
confidence: 1.0

confidence = 1.0 expresses confidence in the card's existence, not its content accuracy.

Cards are NOT subject to the fact decay sweeper.

§20.4.2 Card Schema

The value.v field is structured Markdown containing entity metadata, current facts table, contradictions list, and source summary.

Effective confidence ≥ 0.3. The card MUST include every live fact whose effective confidence (fact.confidence × source_trust) is at least 0.3.
Sort by relation, then HLC. Rows MUST be sorted by (relation ASC, hlc DESC) so the most recent assertion per relation appears first.
Surface contradictions. Cards MUST surface both values and their confidences; they MUST NOT silently resolve contradictions.
4000-token cap. When the card would exceed 4000 tokens, include the highest-confidence facts and append "… {n_omitted} lower-confidence facts omitted".
Single (entity, scope, garden_id). Each card covers exactly one (entity, scope, garden_id); the card generator MUST NOT mix garden-scoped facts into a cross-garden card.

The card is also embedded as a unit for entity-level semantic search; its vec_facts key is "card:{entity_uri}:{scope}".
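A sketch of card assembly covering the threshold, cap, and ordering rules. The CardFact shape, integer HLC values, reusing the recall token_cost formula for rows, and the Markdown layout are assumptions; the layout renders only the facts table, not the metadata, contradictions list, or source summary. Facts are chosen by effective confidence first and then re-sorted for display, so the cap drops the weakest facts while the output still lists each relation's newest assertion first.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class CardFact:
    relation: str
    value_text: str
    confidence: float
    source_trust: Optional[float]
    hlc: int  # hypothetical: HLC timestamp encoded as a sortable integer


def effective(f):
    return f.confidence * (1.0 if f.source_trust is None else f.source_trust)


def token_cost(f):
    # Assumption: reuse the recall token_cost formula for card rows.
    return 40 + math.ceil(len(f.value_text.encode("utf-8")) / 4)


def render_card(entity, facts, token_cap=4000):
    live = [f for f in facts if effective(f) >= 0.3]
    kept, used = [], 0
    for f in sorted(live, key=effective, reverse=True):  # highest confidence first
        if used + token_cost(f) > token_cap:
            break
        kept.append(f)
        used += token_cost(f)
    n_omitted = len(live) - len(kept)
    kept.sort(key=lambda f: (f.relation, -f.hlc))  # relation ASC, hlc DESC
    lines = [f"## {entity}", "", "| relation | value | confidence |", "|---|---|---|"]
    lines += [f"| {f.relation} | {f.value_text} | {f.confidence:.2f} |" for f in kept]
    if n_omitted:
        lines.append(f"… {n_omitted} lower-confidence facts omitted")
    return "\n".join(lines)
```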

§20.4.3 Refresh Policy

Cards MUST NOT be subject to confidence decay. They are invalidated and queued for async refresh on these triggers:

| Trigger | Action | Notes |
|---|---|---|
| New fact for entity | Async refresh | Invalidate the card; enqueue a background refresh. |
| Decay sweep touch | Async refresh | A constituent fact's confidence changed. |
| Card age > STIGMEM_CARD_MAX_AGE_S | Async refresh | Default 86400 s. |
| Contradiction resolved | Async refresh | Via POST /v1/conflicts/:id/resolve. |

During refresh, the stale card remains readable and is served with card_stale: true.

When force_refresh = true, card regeneration is synchronous and MUST complete within 500 ms. If exceeded, the stale card (or raw facts if no card exists) MUST be returned with card_stale: true and force_refresh_timeout: true.
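A sketch of the synchronous path using asyncio.wait_for. regenerate is a hypothetical coroutine function that builds a card, and the returned dict shape is illustrative. wait_for cancels the regeneration when the 500 ms budget expires; an implementation that wants the work to finish anyway could wrap it in asyncio.shield and hand it to the async refresh queue.

```python
import asyncio

FORCE_REFRESH_TIMEOUT_S = 0.5


async def force_refresh_card(entity, regenerate, stale_card, raw_facts):
    """regenerate: hypothetical coroutine function that builds a fresh card."""
    try:
        card = await asyncio.wait_for(regenerate(entity), timeout=FORCE_REFRESH_TIMEOUT_S)
        return {"memory_card": card, "card_stale": False}
    except asyncio.TimeoutError:
        # wait_for has cancelled the regeneration; fall back to the stale card,
        # or to raw facts when no card exists yet.
        body = {"memory_card": stale_card}
        if stale_card is None:
            body["results"] = raw_facts
        return {**body, "card_stale": True, "force_refresh_timeout": True}
```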

§20.4.4 Recall Integration

| Condition | Behavior | Notes |
|---|---|---|
| Entity-centric, card exists | Card + raw facts | Return the card as memory_card and the top-N raw facts as results. |
| Relation filter | Skip card | Return raw facts for that relation. |
| Card stale, no force | Stale card + top-10 | Return the stale card with card_stale: true plus the top-10 raw facts. |
| Contradictions + include flag | Card + pairs | Return the card plus the raw fact pair for each contradiction. |
| No card | Raw facts + async generation | Return raw facts; trigger async card generation. |
| Generation in flight | No block | Return raw facts immediately; do not block. |
| Not entity-centric | Skip card | Run the full hybrid pipeline on raw facts. |

Implementations MUST verify the caller's garden ACL against the card's garden_id before including the card in a recall response. Cards in unauthorized gardens MUST be excluded; the fallback is raw facts from authorized gardens only.

§20.4.5 Divergence Policy

Implementations MUST NOT serve a card whose content is known to be inconsistent with live facts.

When raw facts contradict the card's synthesized summary, the card MUST be invalidated immediately and the divergent fact MUST be included in the results array with card_stale: true.

§20.5 Subscriptions

The subscription primitive has been extracted into the colocated experimental spec Spec-X7-Subscriptions. Recall and graph implementations that emit card-refresh or fact-change notifications depend on that spec for event delivery semantics.

§20.6 Causal / Derivation Links

§20.6.1 `derived_from` Lifecycle

The derived_from field on a fact is a JSON array of FactHash references identifying the source facts from which this fact was inferred or synthesized.

Each entry MUST be a 64-character lowercase hex string (SHA-256 of the referenced fact's canonical wire representation).
The derivation graph formed by derived_from references MUST be acyclic. The PUT /v1/facts handler MUST verify acyclicity before persisting and MUST reject cycles with HTTP 400 provenance_cycle_detected.
derived_from references MAY point to facts that no longer exist. Dangling references are valid — they preserve audit lineage.
Implementations MUST NOT alter derived_from after the fact is created. PATCH MUST reject with HTTP 422 derived_from_immutable.
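A sketch of the write-path checks. parents_of is a hypothetical lookup that returns a stored fact's derived_from list, or None for a dangling reference. Because the stored graph is already acyclic, adding a fact creates a cycle exactly when its own hash is reachable from one of its parents, so one depth-first walk suffices. The error for a malformed hash is not named in this section, so the sketch raises a plain ValueError; check_patch takes the strict reading that any PATCH carrying derived_from is rejected.

```python
import re

HASH_RE = re.compile(r"[0-9a-f]{64}")


class ProvenanceError(Exception):
    def __init__(self, status, code):
        super().__init__(code)
        self.status, self.code = status, code


def validate_derived_from(new_hash, derived_from, parents_of):
    """parents_of(hash) -> stored fact's derived_from list, or None if the fact is absent."""
    for h in derived_from:
        if not HASH_RE.fullmatch(h):
            raise ValueError(f"not a 64-char lowercase hex FactHash: {h!r}")
    # A cycle exists exactly when new_hash is reachable from its own parents.
    stack, seen = list(derived_from), set()
    while stack:
        h = stack.pop()
        if h == new_hash:
            raise ProvenanceError(400, "provenance_cycle_detected")
        if h in seen:
            continue
        seen.add(h)
        stack.extend(parents_of(h) or [])  # None = dangling reference, allowed


def check_patch(patch_body):
    # derived_from is immutable; any PATCH that carries it is rejected.
    if "derived_from" in patch_body:
        raise ProvenanceError(422, "derived_from_immutable")
```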

§20.6.2 Provenance Walk

GET /v1/facts/:id/provenance
  ?depth={k}      // max 5; default 3
  &scope={scope}

Within the walk, a fact the caller cannot read MUST be indistinguishable from a missing fact, to prevent cross-scope inference attacks.

Implementations MUST verify caller read access to the root fact's scope and garden_id before executing the walk. Unauthorized root facts MUST return HTTP 403 with no node or edge data. Facts in unauthorized scopes or gardens MUST be represented as { "hash": "…", "exists": false } — identical to genuinely absent facts.
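A sketch of the redaction rules, assuming facts are keyed by hash and the caller's access check is a function of (scope, garden_id); the response body shape (nodes and edges lists) is hypothetical. Unreadable parents are emitted in exactly the same form as absent ones and are not expanded further, so the walk reveals nothing beyond the hashes already present in a readable fact's derived_from.

```python
from collections import deque

MAX_PROVENANCE_DEPTH = 5


def provenance_walk(root_hash, facts, can_read, depth=3):
    """facts: hash -> {"scope", "garden_id", "derived_from"}; can_read(scope, garden_id) -> bool.
    Returns (http_status, body)."""
    if not 1 <= depth <= MAX_PROVENANCE_DEPTH:
        raise ValueError("depth must be between 1 and 5")  # error code not specified here
    root = facts.get(root_hash)
    if root is None:
        return 404, {"error": "fact_not_found"}
    if not can_read(root["scope"], root["garden_id"]):
        return 403, {}  # no node or edge data at all
    nodes, edges = [{"hash": root_hash, "exists": True}], []
    seen, queue = {root_hash}, deque([(root_hash, 0)])
    while queue:
        h, d = queue.popleft()
        if d == depth:
            continue
        for parent in facts[h]["derived_from"]:
            edges.append((h, parent))
            if parent in seen:
                continue
            seen.add(parent)
            f = facts.get(parent)
            if f is None or not can_read(f["scope"], f["garden_id"]):
                # Unreadable and genuinely absent facts get the identical shape.
                nodes.append({"hash": parent, "exists": False})
            else:
                nodes.append({"hash": parent, "exists": True})
                queue.append((parent, d + 1))
    return 200, {"nodes": nodes, "edges": edges}
```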

§20.6.3 Recall Integration

When GET /v1/recall returns a derived fact, the result object MUST include its derived_from hashes. Implementations SHOULD also include the immediate parent facts (depth = 1) in the results array when their token cost fits, annotated with "provenance_of": "{derived_fact_id}". A parent fact that does not fit the remaining budget MUST be omitted, not truncated; the derived_from hashes allow a follow-up provenance walk.

Derivation depth adds a further ranking discount: each additional derivation hop multiplies the fact's confidence_weight salience signal by 0.9.

§20.6.4 Derivation Link and Federation

When a derived fact is replicated to a peer via federation, the derived_from hashes MUST be transmitted in the wire format. The receiving node MUST store them as-is; it MUST NOT attempt to resolve hashes that it does not have locally. Dangling hashes on the receiving node are valid and MUST NOT prevent the fact from being persisted.

§20.7 Schema Migrations

The following migrations MUST be applied when upgrading to the pre-reset graph & recall design (v1.1 spec compliance):

-- Graph index
CREATE TABLE IF NOT EXISTS entity_edges ( ... );  -- full column list in §20.1.2
CREATE INDEX IF NOT EXISTS idx_edges_subject     ON entity_edges (subject, scope, confidence);
CREATE INDEX IF NOT EXISTS idx_edges_object      ON entity_edges (object,  scope, confidence);
CREATE INDEX IF NOT EXISTS idx_edges_subject_rel ON entity_edges (subject, relation, scope);

-- Vector table (sqlite-vec required)
CREATE VIRTUAL TABLE IF NOT EXISTS vec_facts USING vec0(
    id        TEXT PRIMARY KEY,
    embedding FLOAT[768]
);

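-- Access frequency tracking
-- SQLite's ALTER TABLE ... ADD COLUMN has no IF NOT EXISTS clause;
-- run each statement only if PRAGMA table_info(facts) lacks the column.
ALTER TABLE facts ADD COLUMN access_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE facts ADD COLUMN last_accessed_at INTEGER;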

-- Subscription storage is owned by Spec-X7-Subscriptions.
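SQLite's ALTER TABLE ... ADD COLUMN has no IF NOT EXISTS clause, so an idempotent migration has to inspect the table first. A minimal sketch with the standard sqlite3 module, reading column names from PRAGMA table_info; migrate_access_tracking is a hypothetical name, and the table name is interpolated only because it is a fixed constant.

```python
import sqlite3


def column_names(db, table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1] for row in db.execute(f"PRAGMA table_info({table})")}


def migrate_access_tracking(db):
    cols = column_names(db, "facts")
    with db:
        if "access_count" not in cols:
            db.execute("ALTER TABLE facts ADD COLUMN access_count INTEGER NOT NULL DEFAULT 0")
        if "last_accessed_at" not in cols:
            db.execute("ALTER TABLE facts ADD COLUMN last_accessed_at INTEGER")
```

Running the function twice is safe: the second call finds both columns and issues no ALTER statements.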

§20.8 Error Reference

| HTTP | Error code | Condition |
|---|---|---|
| 400 | graph_depth_exceeded | neighbors() depth > 3 (recall depth > 2 returns recall_depth_exceeded). |
| 400 | cursor_expired | Pagination cursor TTL exceeded. |
| 400 | invalid_token_budget | token_budget < 1. |
| 400 | recall_depth_exceeded | depth > 2 on a recall request. |
| 400 | invalid_weights | weights values do not sum to 1.0 ± 0.001. |
| 400 | provenance_cycle_detected | derived_from graph contains a cycle. |
| 400 | invalid_relation_filter | relation_filter uses regex syntax beyond prefix-glob. |
| 422 | derived_from_immutable | Attempt to modify derived_from on an existing fact. |
| 422 | embed_dimensionality_mismatch | vec_facts configured dimensions differ from stored. |
| 404 | fact_not_found | Provenance walk root fact not found. |
