Version: v0.9.0a2
Operators

3 min read · Self-hosting operators · SREs · Handbook overview

What this handbook covers

Everything you need to run a Stigmem node in production: picking a storage backend, deploying, federating with peers, backing up, monitoring, and debugging recall latency.

Audience: self-hosting operators, infrastructure engineers, SREs.


In this section

| Page | Topic | What you'll find |
| --- | --- | --- |
| Choose your backend (experimental) | storage | Decision tree: SQLite vs libSQL vs Postgres. |
| Deploy runbooks | deploy | Step-by-step runbooks for Fly, Compose, Helm, systemd, and PaaS. |
| Federation peer setup | federation | Key generation, pinning, and source-trust tuning. |
| Operator validation soak | validation | 30-day external validation checklist, weekly digest shape, and finding triage. |
| Backup & restore | DR | Signed snapshot workflow and cloud PITR. |
| Monitoring & debugging | observability | Health checks, metrics, and recall-latency diagnosis. |
| Peer compromise response | incident | Containment and recovery when a federation peer is suspicious or compromised. |
| Worm detection response | incident | Response path for automated cross-peer or agent-to-agent propagation. |
| Manifest failure response | incident | What to do when peer manifest or key-rotation verification fails. |
| Rekor unavailable response | incident | How to handle delayed fact-chain transparency-log checkpoints. |
| HLC drift response | incident | How to handle peers sending timestamps outside allowed skew. |
| Key expiry response | incident | Recovery from expired API, federation, issuer, or encryption keys. |
| Immutability & attestation | hardening | R-23 hardening stack, WORM evidence, and TEE deployment options. |

Operator helper scripts

The public repo keeps reusable operator helpers in scripts/:

import_markdown_tree.py

Imports a markdown index and linked markdown files into a Stigmem node as facts. Useful for bootstrapping runbooks, team wikis, or personal knowledge bases.
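As a rough illustration of the "index plus linked files" idea, here is a minimal sketch of walking a markdown index and collecting the local markdown files it links to. It is not the code of import_markdown_tree.py: the names `linked_markdown_files` and `load_tree` are hypothetical, and the sketch assumes the index uses inline `[title](path.md)` links with paths relative to the index file. It stops before writing anything to a node, because the write API is not described on this page.

```python
import re
from pathlib import Path

# Matches inline markdown links whose target ends in .md, e.g. [Title](docs/page.md).
# Assumption: the index uses inline links, not reference-style links.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+\.md)\)")


def linked_markdown_files(index_text: str) -> list[tuple[str, str]]:
    """Return (title, relative_path) pairs for local .md links in an index."""
    pairs = []
    for title, target in LINK_RE.findall(index_text):
        # Skip remote links; only local files can be read from disk.
        if "://" in target:
            continue
        pairs.append((title, target))
    return pairs


def load_tree(index_path: Path) -> dict[str, str]:
    """Read the index and every linked file that exists, keyed by relative path."""
    index_text = index_path.read_text(encoding="utf-8")
    docs = {index_path.name: index_text}
    for _title, target in linked_markdown_files(index_text):
        candidate = index_path.parent / target
        if candidate.is_file():
            docs[target] = candidate.read_text(encoding="utf-8")
    return docs
```

Calling `load_tree(Path("runbooks/index.md"))` (a hypothetical path) would return a mapping from relative path to file text, which an importer could then turn into facts one document at a time.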

stigmem-snapshot.sh

Creates a human-readable markdown export of selected facts and contradiction metrics. Complements (but does not replace) the signed stigmem snapshot backup format.


Quick orientation

A production Stigmem node has four operational concerns:

┌─────────────────────────────────────────────────────┐
│               Stigmem reference node                │
│                                                     │
│  ┌────────────┐  ┌──────────────┐  ┌────────────┐   │
│  │  Storage   │  │  Federation  │  │  Recall /  │   │
│  │  backend   │  │  peer mesh   │  │  embedding │   │
│  └────────────┘  └──────────────┘  └────────────┘   │
│        ↕                ↕                ↕          │
│  ┌──────────────────────────────────────────────┐   │
│  │              Operational layer               │   │
│  │  backup/restore · key rotation · monitoring  │   │
│  └──────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────┘

Start here if you haven't deployed yet:

  1. Choose your backend: picks your persistence strategy.
  2. Deploy runbooks: gets the node running in your environment.
  3. Federation peer setup: connects your node to peers.

Day-two operations:

  - Backup & restore: protect against data loss.
  - Monitoring & debugging: observe and diagnose.
  - Incident runbooks: respond to critical alerts (federation, manifest, HLC, worm, key-expiry).

Planning a deployment? The cost calculator helps you estimate storage growth, egress, embedding spend, and operator time before you commit to infrastructure.

Joining external validation? Start with the Operator validation soak checklist so public findings, weekly digests, and future hardened-core exit evidence are traceable from the first day.