§20. Recall & Graph
What this section covers
Graph adjacency index, embedding storage, recall API, memory cards,
and causal/derivation link lifecycle. The subscription primitive
that previously lived in §20.5 is now colocated under
Spec-X7-Subscriptions.
Status: Experimental / dormant source package. Archived source material was drafted as normative, but this spec is deferred from the supported v0.9.0aN surface and must pass ADR-008 gates before reintroduction.
Source material: Archived evolutionary spec snapshots. This page is the maintained Spec-X home for recall graph semantics.
Each subsection below shows the most recent normative text from the spec source. When earlier spec drafts also contained text for the same subsection, those revisions are collapsed under a Revisions accordion beneath it — open one to see what changed. Subsections that only appear in one draft render as plain text with no accordion.
Depends on: §2 (fact shape), §5 (wire format), §17 (memory garden), §18 (source attestation), §19 (federation trust).
§20.1 Graph Index
§20.1.1 Purpose
The facts table is a flat relation keyed by entity URI. Entity-to-entity connections exist implicitly: any fact whose value.type = "ref" and whose value URI denotes a known entity constitutes a directed edge from the subject entity to the referenced entity.
Without a materialized adjacency structure, multi-hop traversal requires O(k × |F|) full table scans per recall query.
§20 mandates a materialized entity_edges table to enable
efficient bounded-depth BFS.
§20.1.2 Schema
CREATE TABLE IF NOT EXISTS entity_edges (
id TEXT PRIMARY KEY, -- edge UUID (= source fact id)
subject TEXT NOT NULL, -- normalized entity URI ("from" node)
relation TEXT NOT NULL, -- predicate / edge label
object TEXT NOT NULL, -- normalized entity URI ("to" node)
scope TEXT NOT NULL,
confidence REAL NOT NULL, -- mirrors fact.confidence; updated by decay sweeper
source_trust REAL, -- cached t(fact.source) per §19.4; nullable
decay_epoch INTEGER, -- Unix ms of last decay sweep touch
created_at INTEGER NOT NULL -- Unix ms
);
CREATE INDEX IF NOT EXISTS idx_edges_subject ON entity_edges (subject, scope, confidence);
CREATE INDEX IF NOT EXISTS idx_edges_object ON entity_edges (object, scope, confidence);
CREATE INDEX IF NOT EXISTS idx_edges_subject_rel ON entity_edges (subject, relation, scope);
Implementations MUST create this table and all three indexes before accepting PUT /v1/facts calls that could produce ref-type values.
§20.1.3 Adjacency Invariants
- Insert on ref fact. An entity_edges row MUST be inserted whenever a fact is persisted with value.type = "ref" and the v field passes entity-URI validation. The id MUST equal the source fact's id. The object MUST be the normalized form of the ref target URI.
- Decay sweep propagation. When the decay sweeper updates a fact's confidence, it MUST update the corresponding entity_edges row's confidence and decay_epoch in the same transaction.
- Retraction soft-delete. Set confidence = 0.0 on the facts row AND on the entity_edges row (not hard-delete), AND insert a row into fact_retractions(fact_id, retracted_at). The fact_retractions record is authoritative for time-travel queries.
- Garden scope. entity_edges rows inherit the fact's scope. Cross-garden traversal is governed by the caller's garden ACL checked at the application layer before returning results.
- Consistency. An entity_edges row MUST NOT outlive the deletion of its source fact. Implementations SHOULD use a foreign-key cascade or equivalent constraint.
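The insert, retraction, and consistency invariants above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration and not the reference implementation: the reduced facts table, the helper names (open_db, put_fact, retract_fact, is_entity_uri), and the URI check are all hypothetical stand-ins, and real URI validation and normalization are defined elsewhere in the spec.

```python
import sqlite3
import time


def now_ms() -> int:
    return int(time.time() * 1000)


def is_entity_uri(value: str) -> bool:
    # Hypothetical placeholder for the spec's entity-URI validation.
    return value.startswith(("http://", "https://"))


def open_db() -> sqlite3.Connection:
    # In-memory database with a reduced, hypothetical facts table.
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE
    db.executescript("""
        CREATE TABLE facts (
            id TEXT PRIMARY KEY, entity TEXT NOT NULL, relation TEXT NOT NULL,
            value_type TEXT NOT NULL, value_v TEXT NOT NULL,
            scope TEXT NOT NULL, confidence REAL NOT NULL
        );
        CREATE TABLE entity_edges (
            id TEXT PRIMARY KEY REFERENCES facts(id) ON DELETE CASCADE,
            subject TEXT NOT NULL, relation TEXT NOT NULL, object TEXT NOT NULL,
            scope TEXT NOT NULL, confidence REAL NOT NULL, source_trust REAL,
            decay_epoch INTEGER, created_at INTEGER NOT NULL
        );
        CREATE TABLE fact_retractions (
            fact_id TEXT NOT NULL, retracted_at INTEGER NOT NULL
        );
    """)
    return db


def put_fact(db, fact_id, entity, relation, value_type, value_v, scope, confidence):
    # Fact row and edge row are written in one transaction.
    with db:
        db.execute(
            "INSERT INTO facts VALUES (?,?,?,?,?,?,?)",
            (fact_id, entity, relation, value_type, value_v, scope, confidence),
        )
        if value_type == "ref" and is_entity_uri(value_v):
            db.execute(
                "INSERT INTO entity_edges "
                "(id, subject, relation, object, scope, confidence, created_at) "
                "VALUES (?,?,?,?,?,?,?)",
                (fact_id, entity, relation, value_v, scope, confidence, now_ms()),
            )


def retract_fact(db, fact_id):
    # Soft-delete: zero the confidence on both rows and record the retraction.
    with db:
        db.execute("UPDATE facts SET confidence = 0.0 WHERE id = ?", (fact_id,))
        db.execute("UPDATE entity_edges SET confidence = 0.0 WHERE id = ?", (fact_id,))
        db.execute("INSERT INTO fact_retractions VALUES (?, ?)", (fact_id, now_ms()))
```

The foreign key with ON DELETE CASCADE is one way to satisfy the consistency invariant; an equivalent trigger or application-level delete would serve the same purpose.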
§20.1.4 Edge Metadata Fields
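- id: edge UUID; equals the source fact's id.
- subject / object: normalized entity URIs (the "from" and "to" nodes).
- relation: predicate / edge label.
- scope: inherited from the source fact.
- confidence: mirrors facts.confidence.
- source_trust: cached t(fact.source). MAY be null for pre-Phase-9 data.
- decay_epoch: Unix ms of the last decay sweep touch.
- created_at: Unix ms.
§20.1.5 neighbors() Query Semantics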
The neighbors() traversal is the primitive used by the recall pipeline's graph expansion stage and is also exposed directly via GET /v1/graph/neighbors.
GET /v1/graph/neighbors
?entity={entity_uri}
&depth={k} // integer 1–3; default 1; MUST reject > 3
&relation_filter={rel_pattern} // optional; comma-separated relation labels or glob patterns
&scope={scope} // required; MUST NOT traverse across scope boundaries
&min_confidence={c} // optional; default 0.1
&min_trust={t} // optional; default 0.0
&cursor={opaque} // pagination cursor
&page_size={n} // default 20; max 200
Depth cap
MUST cap depth at 3. Requests with depth > 3 MUST return HTTP 400 graph_depth_exceeded.
Prune early
SHOULD prune edges with confidence < min_confidence or source_trust < min_trust before BFS, not after, to reduce fanout.
Glob only
relation_filter MAY use * as a wildcard suffix. MUST NOT evaluate as full regex; only prefix-glob is supported.
Shortest-path dedup
Duplicate paths to the same neighbor MUST be de-duplicated; the shortest path (fewest hops) is reported.
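The four rules above (depth cap, early pruning, prefix-glob filtering, shortest-path de-duplication) can be sketched as an in-memory BFS. This is an illustrative sketch under stated assumptions: the Edge class and function names are hypothetical, the edge list is assumed to be already restricted to one scope, only outgoing edges are followed, the relation filter is applied at every hop, and a null source_trust is treated as passing the min_trust check. None of those choices is fixed by the text above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

MAX_DEPTH = 3


@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    object: str
    confidence: float
    source_trust: Optional[float] = None


def relation_matches(relation: str, patterns: Optional[List[str]]) -> bool:
    # Prefix-glob only: a trailing "*" matches any suffix; no regex evaluation.
    if not patterns:
        return True
    for pattern in patterns:
        if pattern.endswith("*"):
            if relation.startswith(pattern[:-1]):
                return True
        elif relation == pattern:
            return True
    return False


def neighbors(edges: Iterable[Edge], entity: str, depth: int = 1,
              relation_filter: Optional[List[str]] = None,
              min_confidence: float = 0.1,
              min_trust: float = 0.0) -> Dict[str, int]:
    """Return {neighbor_uri: fewest_hops} for neighbors within `depth` hops."""
    if depth > MAX_DEPTH:
        raise ValueError("graph_depth_exceeded")
    if depth < 1:
        raise ValueError("depth must be at least 1")

    # Prune before BFS so low-confidence or low-trust edges never add fanout.
    adjacency: Dict[str, List[Edge]] = {}
    for edge in edges:
        trust = 1.0 if edge.source_trust is None else edge.source_trust
        if edge.confidence < min_confidence or trust < min_trust:
            continue
        if not relation_matches(edge.relation, relation_filter):
            continue
        adjacency.setdefault(edge.subject, []).append(edge)

    # BFS visits nodes in hop order, so the first visit is the shortest path.
    hops = {entity: 0}
    queue = deque([entity])
    while queue:
        node = queue.popleft()
        if hops[node] == depth:
            continue
        for edge in adjacency.get(node, []):
            if edge.object not in hops:
                hops[edge.object] = hops[node] + 1
                queue.append(edge.object)
    del hops[entity]
    return hops
```

Because BFS reaches each node first along a fewest-hops path, recording the hop count only on first visit is enough to implement the de-duplication rule.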
§20.1.6 Pagination and Cursor Stability
Opaque
Cursors are Base64url-encoded and opaque to callers.
Stable
A cursor that was valid before a new fact was inserted MUST continue to work and MUST NOT skip or re-return neighbors that were present at cursor issuance.
TTL 300 s
Invalidated after STIGMEM_CURSOR_TTL_S seconds. Expired cursor returns HTTP 400 cursor_expired.
The server MUST include a next_cursor field in the response only when more pages exist.
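One way to meet the opacity, stability, and TTL requirements is a keyset cursor that encodes the last returned sort key plus an issue timestamp. The sketch below is an assumption about how this could be done, not the spec's wire format: the payload field names, the CursorExpired exception, and the page helper are hypothetical, and CURSOR_TTL_S stands in for STIGMEM_CURSOR_TTL_S.

```python
import base64
import json
import time

CURSOR_TTL_S = 300  # stands in for STIGMEM_CURSOR_TTL_S


class CursorExpired(Exception):
    """Maps to HTTP 400 cursor_expired."""


def encode_cursor(last_key, issued_at=None):
    payload = {"k": last_key, "t": time.time() if issued_at is None else issued_at}
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    # Base64url without padding keeps the cursor URL-safe.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_cursor(cursor, now=None):
    padded = cursor + "=" * (-len(cursor) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    now = time.time() if now is None else now
    if now - payload["t"] > CURSOR_TTL_S:
        raise CursorExpired("cursor_expired")
    return payload["k"]


def page(sorted_keys, page_size, cursor=None, now=None):
    """Return one page of keys that sort strictly after the cursor position."""
    start_after = decode_cursor(cursor, now) if cursor else None
    remaining = [k for k in sorted_keys if start_after is None or k > start_after]
    items = remaining[:page_size]
    body = {"results": items}
    # next_cursor is present only when more pages exist.
    if len(remaining) > page_size:
        body["next_cursor"] = encode_cursor(items[-1], issued_at=now)
    return body
```

Resuming from a key rather than an offset means a row inserted between requests cannot shift earlier rows across a page boundary, which is what the stability rule asks for. A production cursor would likely also be signed or encrypted so that callers cannot forge one.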
§20.1.7 Federation Integrity
The entity_edges table is local-node state.
When facts are received from a federated peer, the receiving node
MUST apply the same insert/retract/decay invariants to its local
entity_edges table. Edges derived from federated facts MUST record
the peer's source_trust so that cross-node traversal paths carry
trust provenance. Nodes MUST NOT return federated-source edges in
neighbors() results when the caller's capability token lacks
cross-federation read scope.
§20.2 Embedding Storage
§20.2.1 Vector Table
Implementations MUST use sqlite-vec for vector storage. The virtual table schema is:
CREATE VIRTUAL TABLE IF NOT EXISTS vec_facts USING vec0(
id TEXT PRIMARY KEY,
embedding FLOAT[768] -- default dimensionality; see §20.2.4
);
The id column is the source fact's id for per-fact embeddings, or the string "card:{entity_uri}:{scope}" for memory card embeddings.
§20.2.2 Embedding Unit
Each live fact (confidence > STIGMEM_EMBED_MIN_CONFIDENCE, default 0.1) MUST be embedded as the composed string:
"{entity_display} {relation} {value_text}"
where entity_display is the last path segment of the entity URI, relation is the fact's relation label, and value_text is the typed value's textual representation.
All embeddings MUST be L2-normalized to unit length on insertion.
Cosine similarity reduces to a dot product, enabling sqlite-vec's native dot-product acceleration. The 1-to-1 mapping (one embedding per fact row) ensures that vector ANN retrieval returns individual, attributable facts rather than entity-level blobs.
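The composition and normalization rules can be illustrated in a few lines. This is a sketch with hypothetical helper names (embed_text, l2_normalize, dot); the embedding model call itself is out of scope here, so the vectors are plain lists.

```python
import math


def embed_text(entity_uri: str, relation: str, value_text: str) -> str:
    # entity_display is the last path segment of the entity URI.
    entity_display = entity_uri.rstrip("/").rsplit("/", 1)[-1]
    return f"{entity_display} {relation} {value_text}"


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Full cosine formula, shown only to compare against the dot product.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

For unit-length vectors the denominator in cosine() is 1, so dot(l2_normalize(a), l2_normalize(b)) gives the same value as cosine(a, b) up to floating-point rounding.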
§20.2.3 Default Model
The default embedding model is nomic-embed-text-v1.5 (768 dimensions, Apache-2.0, runnable offline via Ollama).
- ollama (default): ollama
- openai: OPENAI_API_KEY.
- voyage: VOYAGE_API_KEY.
§20.2.4 Dimensionality Declaration
Each node MUST record its configured embedding dimensionality in the /.well-known/stigmem response. truncated_dimensions MAY be set to a smaller integer when using Matryoshka-capable models; the floor for nomic-embed-text-v1.5 is 64 dimensions.
Implementations MUST refuse to mix embeddings of different dimensionalities in the same vec_facts table.
If STIGMEM_EMBED_DIMENSIONS changes after facts have been indexed, the node MUST refuse to start and emit an embed_dimensionality_mismatch error (§20.8). Re-indexing is performed by draining and re-inserting all rows.
§20.2.5 Embedding Lifecycle
vec_facts. vec_facts row.
§20.2.6 Contradiction Interaction
Both contradicting facts retain their embeddings.
The contradiction penalty is applied at ranking time, not by modifying stored vectors. Implementations MUST NOT delete or modify the embedding of a contradicted fact.
§20.3 Recall API
§20.3.1 Route
GET /v1/recall
POST /v1/recall (preferred when query text is long)
POST is preferred when query exceeds 1000 characters to avoid URI length limits. The MCP tool recall wraps the same endpoint with identical semantics.
§20.3.2 Request Shape
- query
- token_budget
- depth: capped at 2, below neighbors()'s max of 3, to bound recall latency.
- weights
- include_low_trust
- entity / relation
- lambda_mmr
- min_confidence
Validation errors: token_budget < 1 → invalid_token_budget; depth > 2 → recall_depth_exceeded; weights not summing to 1.0 → invalid_weights.
§20.3.3 Ranking Pipeline
The recall pipeline runs three stages then fuses their candidate sets: Stage 1 (Lexical / BM25), Stage 2 (Dense ANN), Stage 3 (Graph expansion BFS).
Stage 2 (ANN) scope enforcement MUST be applied via the join to facts.
vec_facts carries no scope column. Implementations MUST NOT pass
ANN results to fusion before this join filter; doing so risks
cross-scope leakage. Implementations MUST ALSO verify the caller's
garden ACL for each Stage 2 candidate. Stage 3 seed entities MUST
have their garden ACL verified before BFS expansion begins.
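Since vec_facts has no scope column, the scope check for Stage 2 has to come from the facts table. A minimal sketch of that filter, using Python's sqlite3 and a hypothetical helper name (scope_filter), is shown below. It assumes the ANN stage has already produced an ordered list of candidate ids, and it does not cover the separate garden ACL check.

```python
import sqlite3


def scope_filter(db: sqlite3.Connection, candidate_ids, scope):
    """Keep only ANN candidates whose facts row is in the caller's scope."""
    if not candidate_ids:
        return []
    placeholders = ",".join("?" * len(candidate_ids))
    rows = db.execute(
        f"SELECT id FROM facts WHERE scope = ? AND id IN ({placeholders})",
        [scope, *candidate_ids],
    ).fetchall()
    allowed = {row[0] for row in rows}
    # Preserve the ANN ranking order of the surviving candidates.
    return [cid for cid in candidate_ids if cid in allowed]
```

Ids that have no facts row at all (for example memory card embeddings keyed as "card:...") are dropped by this filter, so a real pipeline would need to handle those separately.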
Stage 3 edge score:
graph_score(f at entity e via edge x) =
(1 / (1 + hops)) × x.confidence / log(1 + out_degree(x.subject))
The log(1 + out_degree) denominator is the hub-bias guard — it penalizes hub entities whose facts would otherwise dominate graph expansion regardless of query relevance.
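A direct transcription of the edge score makes the hub penalty easy to see. The function below is a sketch: the name and argument layout are hypothetical, and the natural logarithm is an assumption because the formula does not state a base.

```python
import math


def graph_score(hops: int, edge_confidence: float, out_degree: int) -> float:
    # An edge's subject always has out-degree >= 1, so log(1 + d) >= log(2) > 0.
    if out_degree < 1:
        raise ValueError("out_degree must be at least 1")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return (1.0 / (1.0 + hops)) * edge_confidence / math.log(1.0 + out_degree)
```

With hops = 1 and confidence 0.9, a subject with a single outgoing edge scores roughly 0.65, while a hub with 100 outgoing edges scores roughly 0.10, so the same edge confidence counts for far less when it comes from a hub.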
Fusion formula:
raw_score(f) = α · norm(bm25(f)) + β · norm(cosine_sim(f)) + γ · norm(graph_score(f))
salience(f) = recency(f)
× confidence_weight(f)
× access_freq_weight(f)
× contradiction_weight(f)
× garden_tier(f)
score(f) = raw_score(f) × salience(f) × source_trust_multiplier(f.source_trust)
Salience signals:
- recency(f): exp(-0.01 × age_days)
- confidence_weight(f): f.confidence
- access_freq_weight(f)
- contradiction_weight(f)
- garden_tier(f)
- source_trust_multiplier(t): 0.5 + 0.5 × t; trust_mode = off.
access_count MUST be incremented each time a fact appears in a recall response. Implementations SHOULD batch increments (flush interval ≤ 30 s).
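The fusion formula can be sketched as two small functions. Several pieces here are assumptions: norm() is not defined in this section, so min-max normalization over the candidate set is used as a stand-in; the access_freq_weight, contradiction_weight, and garden_tier values are taken as plain inputs because their definitions are not given here; and the function names are hypothetical.

```python
import math


def min_max_norm(values):
    # Assumed normalization: scale to [0, 1] over a non-empty candidate set.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def raw_scores(bm25, cosine_sim, graph, weights):
    """weights = (alpha, beta, gamma); the three lists are aligned by candidate."""
    alpha, beta, gamma = weights
    if abs(alpha + beta + gamma - 1.0) > 0.001:
        raise ValueError("invalid_weights")
    nb, nc, ng = min_max_norm(bm25), min_max_norm(cosine_sim), min_max_norm(graph)
    return [alpha * b + beta * c + gamma * g for b, c, g in zip(nb, nc, ng)]


def recency(age_days: float) -> float:
    return math.exp(-0.01 * age_days)


def source_trust_multiplier(t: float) -> float:
    return 0.5 + 0.5 * t


def final_score(raw, age_days, confidence, source_trust,
                access_freq_weight=1.0, contradiction_weight=1.0, garden_tier=1.0):
    salience = (recency(age_days) * confidence * access_freq_weight
                * contradiction_weight * garden_tier)
    return raw * salience * source_trust_multiplier(source_trust)
```

Because every salience term is a multiplier, a fact with zero confidence scores zero regardless of its lexical or vector match, while a fully untrusted source (t = 0) only halves the score.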
§20.3.4 Token-Budget Packing (MMR)
The scored candidate set is packed using Maximal Marginal Relevance:
next = argmax_{f ∈ R \ selected} [
λ_mmr · score(f)
- (1 − λ_mmr) · max_{f_j ∈ selected} cosine_sim(embed(f), embed(f_j))
]
The loop runs until the remaining token budget cannot accommodate the next candidate.
token_cost(f) = 40 + ceil(len(value_text_utf8) / 4)
Empty-budget edge case: return empty results with truncated: true, NOT HTTP 400.
The caller controls budget. When entity is specified
(entity-centric recall), MMR MUST be disabled; all facts for that
entity in scope are returned sorted by score descending.
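The packing loop can be sketched as a greedy selection. This is an illustration under assumptions: candidates are plain dicts with hypothetical keys (id, score, embedding, value_text), embeddings are already unit length so the dot product stands in for cosine similarity, and the loop stops at the first candidate that does not fit, which is how the sentence about the remaining budget is read here.

```python
import math


def token_cost(value_text: str) -> int:
    return 40 + math.ceil(len(value_text.encode("utf-8")) / 4)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mmr_pack(candidates, token_budget: int, lambda_mmr: float):
    """Return (selected, tokens_used, truncated)."""
    selected = []
    remaining = list(candidates)
    budget = token_budget
    truncated = False

    def mmr_value(candidate):
        # Redundancy is the highest similarity to anything already selected.
        redundancy = max(
            (dot(candidate["embedding"], s["embedding"]) for s in selected),
            default=0.0,
        )
        return lambda_mmr * candidate["score"] - (1.0 - lambda_mmr) * redundancy

    while remaining:
        best = max(remaining, key=mmr_value)
        cost = token_cost(best["value_text"])
        if cost > budget:
            truncated = True  # budget cannot accommodate the next candidate
            break
        selected.append(best)
        remaining.remove(best)
        budget -= cost
    return selected, token_budget - budget, truncated
```

With a budget too small for even one fact, the function returns an empty selection with truncated set, matching the empty-budget edge case rather than raising an error.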
§20.3.5 Response Shape
{
"query": "what is Alice's current role?",
"token_budget": 512,
"tokens_used": 340,
"results": [
{
"id": "3f7a…",
"entity": "https://example.com/entity/alice",
"relation": "memory:role",
"value": { "type": "text", "v": "CEO" },
"confidence": 0.97,
"source_trust": 0.90,
"score": 0.843,
"hops": 0,
"contradicted": false,
"card_stale": false
}
],
"memory_card": null,
"truncated": false,
"scores_debug": null
}
memory_card is populated for entity-centric queries. scores_debug MAY be populated when debug=true; MUST be null in production responses.
§20.3.6 include_low_trust Behavior
When include_low_trust = false (default), facts with effective_confidence = fact.confidence × source_trust < 0.2 MUST be excluded from all three stages before fusion. When true, they are included but the source_trust_multiplier still applies, so they rank lower.
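The exclusion rule reduces to a single comparison per fact. The sketch below uses hypothetical names and dict-shaped facts, and assumes source_trust is always present as a number; how a null source_trust should be treated is not stated here.

```python
LOW_TRUST_THRESHOLD = 0.2


def effective_confidence(confidence: float, source_trust: float) -> float:
    return confidence * source_trust


def filter_low_trust(facts, include_low_trust: bool = False):
    """Drop facts below the threshold unless the caller opted in."""
    if include_low_trust:
        return list(facts)
    return [
        f for f in facts
        if effective_confidence(f["confidence"], f["source_trust"]) >= LOW_TRUST_THRESHOLD
    ]
```

For example, a fact with confidence 0.5 from a source with trust 0.3 has an effective confidence of 0.15 and is excluded by default, while the example fact from §20.3.5 (0.97 × 0.90) is kept.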
§20.4 Memory Cards
§20.4.1 Card Definition
A memory card is a per-entity synthesized text summary stored as a fact with:
entity: {entity-uri}
relation: stigmem:memory:card
value: { "type": "text", "v": {card_markdown} }
source: "system:stigmem:card-generator"
scope: {same scope as constituent facts}
confidence: 1.0
confidence = 1.0 expresses confidence in the card's existence, not its content accuracy.
Cards are NOT subject to the fact decay sweeper.
§20.4.2 Card Schema
The value.v field is structured Markdown containing entity metadata, current facts table, contradictions list, and source summary.
Effective confidence ≥ 0.3
MUST include all live facts whose effective confidence (fact.confidence × source_trust) is ≥ 0.3.
Sort relation/HLC
MUST sort rows by (relation ASC, hlc DESC) so the most recent assertion per relation appears first.
Surface contradictions
MUST surface both values and confidences. Cards MUST NOT silently resolve contradictions.
4000 token cap
Include the highest-confidence facts and append … {n_omitted} lower-confidence facts omitted.
Single (entity, scope, garden_id)
The card generator MUST NOT mix garden-scoped facts into a cross-garden card.
The card is also embedded as a unit for entity-level semantic search; its vec_facts key is "card:{entity_uri}:{scope}".
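The row selection and ordering rules for the card's facts table can be sketched as a filter followed by a two-key sort. The dict keys (relation, hlc, confidence, source_trust) and the function name are hypothetical, hlc is assumed to be a value that compares in timestamp order, and the 4000-token cap is not handled in this sketch.

```python
CARD_MIN_EFFECTIVE_CONFIDENCE = 0.3


def card_rows(facts):
    """Return live facts ordered by (relation ASC, hlc DESC)."""
    live = [
        f for f in facts
        if f["confidence"] * f["source_trust"] >= CARD_MIN_EFFECTIVE_CONFIDENCE
    ]
    # Python's sort is stable: sort by the secondary key first, then the primary.
    live.sort(key=lambda f: f["hlc"], reverse=True)
    live.sort(key=lambda f: f["relation"])
    return live
```

Sorting twice relies on sort stability and avoids having to negate the hlc value, which matters if hlc is a string rather than a number.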
§20.4.3 Refresh Policy
Cards MUST NOT be subject to confidence decay. They are invalidated and queued for async refresh on these triggers:
- STIGMEM_CARD_MAX_AGE_S
- POST /v1/conflicts/:id/resolve.
During refresh, the stale card remains readable and is served with card_stale: true.
When force_refresh = true, card regeneration is synchronous and
MUST complete within 500 ms. If exceeded, the stale card (or raw
facts if no card exists) MUST be returned with card_stale: true
and force_refresh_timeout: true.
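The force_refresh deadline can be sketched with a worker thread and a bounded wait. This is only one possible shape: the get_card function, its arguments, and the returned dict layout are hypothetical, and `fallback` stands for either the stale card or the raw facts when no card exists.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

FORCE_REFRESH_TIMEOUT_S = 0.5

# Shared pool so a timed-out regeneration can finish in the background.
_executor = ThreadPoolExecutor(max_workers=2)


def get_card(generate, fallback, timeout_s: float = FORCE_REFRESH_TIMEOUT_S):
    """Run `generate()` with a deadline; serve `fallback` if it is missed."""
    future = _executor.submit(generate)
    try:
        card = future.result(timeout=timeout_s)
    except FutureTimeout:
        return {
            "memory_card": fallback,
            "card_stale": True,
            "force_refresh_timeout": True,
        }
    return {"memory_card": card, "card_stale": False}
```

The timed-out job is left running rather than cancelled, so its result could still replace the stale card once it completes; whether to do that is a design choice the text above leaves open.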
§20.4.4 Recall Integration
- memory_card; top-N raw facts as results.
- card_stale: true + top-10 raw facts.
Implementations MUST verify the caller's garden ACL against the card's garden_id before including the card in a recall response. Cards in unauthorized gardens MUST be excluded; the fallback is raw facts from authorized gardens only.
§20.4.5 Divergence Policy
Implementations MUST NOT serve a card whose content is known to be inconsistent with live facts.
When raw facts contradict the card's synthesized summary, the card
MUST be invalidated immediately and the divergent fact MUST be
included in the results array with card_stale: true.
§20.5 Subscriptions
The subscription primitive has been extracted into the colocated experimental spec Spec-X7-Subscriptions. Recall and graph implementations that emit card-refresh or fact-change notifications depend on that spec for event delivery semantics.
§20.6 Causal / Derivation Links
§20.6.1 derived_from Lifecycle
The derived_from field on a fact is a JSON array of FactHash references identifying the source facts from which this fact was inferred or synthesized.
- Each entry MUST be a 64-character lowercase hex string (SHA-256 of the referenced fact's canonical wire representation).
- derived_from arrays MUST NOT contain cycles. The PUT /v1/facts handler MUST verify acyclicity before persisting. Cycles MUST be rejected with HTTP 400 provenance_cycle_detected.
- derived_from references MAY point to facts that no longer exist. Dangling references are valid; they preserve audit lineage.
- Implementations MUST NOT alter derived_from after the fact is created. PATCH MUST reject with HTTP 422 derived_from_immutable.
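The format and acyclicity checks performed at write time can be sketched as follows. The function name and the `parents_of` lookup (a mapping from a stored fact's hash to its derived_from list) are hypothetical; a real handler would read parents from storage instead of a dict.

```python
import re

FACT_HASH_RE = re.compile(r"[0-9a-f]{64}")


def validate_derived_from(new_hash, derived_from, parents_of):
    """Raise ValueError if an entry is malformed or the new fact closes a cycle."""
    for entry in derived_from:
        if not FACT_HASH_RE.fullmatch(entry):
            raise ValueError("derived_from entries must be 64-char lowercase hex")

    # Walk ancestors depth-first; reaching new_hash means a cycle.
    stack = list(derived_from)
    seen = set()
    while stack:
        current = stack.pop()
        if current == new_hash:
            raise ValueError("provenance_cycle_detected")
        if current in seen:
            continue
        seen.add(current)
        # Dangling references have no stored parents and simply end the walk.
        stack.extend(parents_of.get(current, []))
```

Since dangling references are allowed, a stored fact may already name a hash that does not exist yet; the walk catches the case where the fact being written has exactly that hash and would therefore become its own ancestor.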
§20.6.2 Provenance Walk
GET /v1/facts/:id/provenance
?depth={k} // max 5; default 3
&scope={scope}
The response MUST be indistinguishable from a missing fact to prevent cross-scope inference attacks.
Implementations MUST verify caller read access to the root fact's
scope and garden_id before executing the walk. Unauthorized root
facts MUST return HTTP 403 with no node or edge data. Facts in
unauthorized scopes or gardens MUST be represented as
{ "hash": "…", "exists": false } — identical to genuinely absent
facts.
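The masking rule can be sketched as a breadth-first walk that treats unreadable facts exactly like missing ones. Everything about the shapes here is an assumption: `facts` is a dict from hash to a record with scope and derived_from, `can_read` is a caller-supplied predicate standing in for the scope and garden check, and the nodes/edges response layout is hypothetical.

```python
MAX_PROVENANCE_DEPTH = 5


def provenance_walk(root_hash, facts, can_read, depth=3):
    if not 1 <= depth <= MAX_PROVENANCE_DEPTH:
        raise ValueError("depth must be between 1 and 5")
    root = facts.get(root_hash)
    if root is None:
        raise KeyError("fact_not_found")
    if not can_read(root):
        # Unauthorized root: HTTP 403 with no node or edge data.
        raise PermissionError("forbidden")

    nodes = [{"hash": root_hash, "exists": True}]
    edges = []
    seen = {root_hash}
    frontier = [root_hash]
    for _ in range(depth):
        next_frontier = []
        for child in frontier:
            for parent in facts[child]["derived_from"]:
                edges.append({"from": child, "to": parent})
                if parent in seen:
                    continue
                seen.add(parent)
                fact = facts.get(parent)
                if fact is None or not can_read(fact):
                    # Absent and unauthorized facts get the same representation,
                    # and the walk never expands past them.
                    nodes.append({"hash": parent, "exists": False})
                else:
                    nodes.append({"hash": parent, "exists": True})
                    next_frontier.append(parent)
        frontier = next_frontier
    return {"nodes": nodes, "edges": edges}
```

Not expanding masked nodes matters as much as hiding their content: listing the parents of an unauthorized fact would reveal that it exists.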
§20.6.3 Recall Integration
When GET /v1/recall returns a derived fact, its derived_from hashes MUST be included in the result object. Implementations SHOULD include the immediate parent facts (depth=1) in the results array when their token cost fits, annotated with "provenance_of": "{derived_fact_id}". If the budget is tight, parent facts MUST be omitted (not truncated); the derived_from hashes allow a follow-up provenance walk.
Derivation depth contributes to the graph_score discount: each additional derivation hop applies a multiplier of 0.9 to the fact's confidence_weight salience signal.
§20.6.4 Derivation Link and Federation
When a derived fact is replicated to a peer via federation, the derived_from hashes MUST be transmitted in the wire format. The receiving node MUST store them as-is; it MUST NOT attempt to resolve hashes that it does not have locally. Dangling hashes on the receiving node are valid and MUST NOT prevent the fact from being persisted.
§20.7 Schema Migrations
The following migrations MUST be applied when upgrading to the pre-reset graph & recall design (v1.1 spec compliance):
-- Graph index
CREATE TABLE IF NOT EXISTS entity_edges ( ... );
CREATE INDEX IF NOT EXISTS idx_edges_subject ON entity_edges (subject, scope, confidence);
CREATE INDEX IF NOT EXISTS idx_edges_object ON entity_edges (object, scope, confidence);
CREATE INDEX IF NOT EXISTS idx_edges_subject_rel ON entity_edges (subject, relation, scope);
-- Vector table (sqlite-vec required)
CREATE VIRTUAL TABLE IF NOT EXISTS vec_facts USING vec0(
id TEXT PRIMARY KEY,
embedding FLOAT[768]
);
-- Access frequency tracking
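-- SQLite's ALTER TABLE has no ADD COLUMN IF NOT EXISTS clause; guard each
-- statement with a PRAGMA table_info(facts) check to keep the migration idempotent.
ALTER TABLE facts ADD COLUMN access_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE facts ADD COLUMN last_accessed_at INTEGER;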
-- Subscription storage is owned by Spec-X7-Subscriptions.
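On SQLite, ALTER TABLE ... ADD COLUMN does not accept an IF NOT EXISTS clause, so a migration runner has to check for the column itself if the access-tracking step is to be re-runnable. A minimal sketch with Python's sqlite3 follows; the helper name is hypothetical, and the table and column names are assumed to be trusted constants because they are interpolated into the SQL text.

```python
import sqlite3


def add_column_if_missing(db: sqlite3.Connection, table: str,
                          column: str, ddl: str) -> bool:
    """Add `column` to `table` unless it already exists. Returns True if added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in db.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    db.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    return True


def migrate_access_tracking(db: sqlite3.Connection) -> None:
    add_column_if_missing(db, "facts", "access_count", "INTEGER NOT NULL DEFAULT 0")
    add_column_if_missing(db, "facts", "last_accessed_at", "INTEGER")
```

Running migrate_access_tracking twice leaves the schema unchanged on the second call, and existing rows pick up access_count = 0 from the column default.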
§20.8 Error Reference
- graph_depth_exceeded: neighbors() or recall depth > max allowed.
- cursor_expired
- invalid_token_budget: token_budget < 1.
- recall_depth_exceeded: depth > 2 on recall request.
- invalid_weights: weights values do not sum to 1.0 ± 0.001.
- provenance_cycle_detected: derived_from graph contains a cycle.
- invalid_relation_filter: relation_filter uses unsupported regex beyond prefix-glob.
- derived_from_immutable: derived_from on an existing fact.
- embed_dimensionality_mismatch: vec_facts configured dimensions differ from stored.
- fact_not_found