Version: v0.9.0a2
Operator

Audit Log & Per-Principal Quotas

6 min read · Operator · SIEM integrator · Spec-09 + Spec-10

What this page covers

This page covers two pre-reset hardening security features. The first is the structured audit log (Spec-09): how to mint an audit.read API key and query the log. The second is per-principal quotas (Spec-10): the 7 token-bucket quota dimensions, their defaults, and how to tune them via environment variables.

Audit log surface (Spec-09-Audit-Log)

The audit.read capability

audit.read is a permission string, not a separate admin role.

A key can hold any combination of read, write, and audit.read. Access to /v1/admin/audit is gated on audit.read.

# Mint a dedicated audit key (Python SDK)
import os

from stigmem_client import StigmemClient

# Admin key read from the environment, matching $ADMIN_KEY in the curl example
ADMIN_KEY = os.environ["ADMIN_KEY"]

client = StigmemClient(base_url="https://node.example.com", api_key=ADMIN_KEY)
audit_key = client.create_api_key(
    entity_uri="stigmem://your-org.example.com/siem-reader",
    permissions=["audit.read"],  # read/write not required for audit access
)
print(audit_key.key)  # store securely

# Equivalent curl: POST /v1/admin/api-keys
curl -s -X POST https://node.example.com/v1/admin/api-keys \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "entity_uri": "stigmem://your-org.example.com/siem-reader",
    "permissions": ["audit.read"]
  }' | jq '.key'

Requests to GET /v1/admin/audit without audit.read return 403:

{"detail": "audit.read capability required"}

Querying the audit log

GET /v1/admin/audit
Authorization: Bearer <audit.read key>

Query parameters

| Parameter | Type | Description |
| --- | --- | --- |
| since | RFC3339 | Include events with ts >= since (inclusive). |
| until | RFC3339 | Include events with ts <= until (inclusive). |
| principal | string | Filter by entity_uri of the acting principal. |
| event_type | string | Filter to one event type (e.g. quota_breach). |
| cursor | integer | Opaque seq-based cursor for forward pagination. |
| limit | integer | Page size; range 1–1000, default 200. |
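As a rough illustration of how these parameters combine, the following Python sketch assembles a query URL using only the standard library. The helper name `build_audit_url` is hypothetical and not part of the Stigmem SDK, and the base URL is simply the example host used on this page. It assumes `since` and `until` are passed as timezone-aware datetimes.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE = "https://node.example.com/v1/admin/audit"  # example host from this page


def _rfc3339(dt):
    """Format a timezone-aware datetime as an RFC3339 UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_audit_url(since=None, until=None, principal=None,
                    event_type=None, cursor=None, limit=200):
    """Hypothetical helper: build a GET /v1/admin/audit URL from the filters."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be in the range 1-1000")
    params = {"limit": limit}
    if since is not None:
        params["since"] = _rfc3339(since)
    if until is not None:
        params["until"] = _rfc3339(until)
    if principal is not None:
        params["principal"] = principal
    if event_type is not None:
        params["event_type"] = event_type
    if cursor is not None:
        params["cursor"] = cursor
    # urlencode percent-encodes ':' and '/' so the values survive as query text
    return f"{BASE}?{urlencode(params)}"
```

Calling `build_audit_url(event_type="quota_breach", limit=500)` yields the same query string that the shell pagination example on this page builds for its first request.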

Response schema

{
  "entries": [
    {
      "seq": 12345,
      "id": "018f1a2b-...",
      "event_type": "quota_breach",
      "entity_uri": "stigmem://your-org.example.com/my-agent",
      "oidc_sub": "108...",
      "fact_id": null,
      "source": "API caller",
      "attested_key_id": null,
      "ts": "2026-05-03T12:00:00Z",
      "tenant_id": "default",
      "detail": "{\"dimension\": \"fact_write\", \"retry_after\": 3.2}"
    }
  ],
  "total": 200,
  "next_cursor": 12346
}

seq

Monotonically increasing integer per database; use it as the cursor value for the next page.

next_cursor

Absent when the current page is the last page.

detail

JSON-encoded map of event-specific fields; schema varies by event_type (see Spec-09 event types).
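Because detail arrives as a JSON string nested inside the JSON response, consumers need a second decoding step. A minimal Python sketch of that step follows; `decode_entry` is a hypothetical helper name, and treating a missing or empty detail as an empty map is an assumption rather than documented behaviour.

```python
import json


def decode_entry(entry):
    """Return a copy of an audit entry with `detail` parsed into a dict.

    Assumption: a missing, null, or empty detail is treated as {}.
    """
    decoded = dict(entry)
    raw = entry.get("detail")
    decoded["detail"] = json.loads(raw) if raw else {}
    return decoded


entry = {
    "seq": 12345,
    "event_type": "quota_breach",
    "detail": "{\"dimension\": \"fact_write\", \"retry_after\": 3.2}",
}
print(decode_entry(entry)["detail"]["dimension"])
```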

Tenant isolation

The API automatically scopes results to the caller's tenant_id. Cross-tenant entries are never returned regardless of credentials.

Pagination example

AUDIT_KEY="<your audit.read key>"
BASE="https://node.example.com/v1/admin/audit"
CURSOR=""

while true; do
  PARAMS="limit=500&event_type=quota_breach"
  [[ -n "$CURSOR" ]] && PARAMS="${PARAMS}&cursor=${CURSOR}"

  RESP=$(curl -s "${BASE}?${PARAMS}" -H "Authorization: Bearer $AUDIT_KEY")
  echo "$RESP" | jq '.entries[] | {seq, ts, principal: .entity_uri}'

  NEXT=$(echo "$RESP" | jq -r '.next_cursor // empty')
  [[ -z "$NEXT" ]] && break
  CURSOR="$NEXT"
done
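The same cursor loop can be sketched in Python for SIEM integrations that prefer not to shell out. This is an illustrative sketch, not SDK code: `iter_audit_entries` and `make_http_fetch` are hypothetical names, and the transport is passed in as a callable so the loop can be exercised without a live node.

```python
import json
import urllib.parse
import urllib.request


def iter_audit_entries(fetch_page, **filters):
    """Yield every audit entry, following next_cursor until it is absent.

    fetch_page is any callable that takes a dict of query parameters and
    returns the decoded JSON response body as a dict.
    """
    params = dict(filters)
    while True:
        page = fetch_page(dict(params))  # pass a copy so callers can keep it
        yield from page.get("entries", [])
        next_cursor = page.get("next_cursor")
        if next_cursor is None:  # absent on the last page
            break
        params["cursor"] = next_cursor


def make_http_fetch(base_url, audit_key):
    """Build a fetch_page callable backed by urllib (standard library only)."""
    def fetch_page(params):
        request = urllib.request.Request(
            f"{base_url}?{urllib.parse.urlencode(params)}",
            headers={"Authorization": f"Bearer {audit_key}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch_page
```

A caller would typically combine the two, for example `iter_audit_entries(make_http_fetch(BASE, AUDIT_KEY), limit=500, event_type="quota_breach")`, where `BASE` and `AUDIT_KEY` hold the same values as in the shell example.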

Audit event types

Sixteen event types are emitted to fact_audit_log. All are returned by GET /v1/admin/audit unless filtered by event_type.

| Event type | Trigger | Notes |
| --- | --- | --- |
| fact_write | mutation | Fact assertion or retraction. |
| fact_read | query | Recall / query returning ≥ 1 fact. |
| capability_token_issue | capability | Capability token issued. |
| capability_token_revoke | capability | Capability token revoked. |
| manifest_publish | federation | Org manifest updated. |
| key_rotation | lifecycle | API key rotated. |
| federation_connect | federation | Federation peer connection attempt. |
| quarantine_admit | quarantine | Fact held in quarantine. |
| quarantine_release | quarantine | Quarantined fact accepted or rejected. |
| quota_breach | rate limit | Per-principal quota ceiling hit. |
| admin_action | admin | Admin API call. |
| replay_rejected | replay | Token replay nonce collision. |
| instruction_audit | instruction | Lazy instruction chunk loaded. |
| instruction_quarantined | instruction | Instruction-namespace fact placed in quarantine pending approval. |
| instruction_promoted | instruction | Quarantined instruction-namespace fact promoted by an operator. |
| api_key_rehashed | key migration | Legacy SHA-256 API key row migrated to Argon2id after successful authentication. |

Write-ahead ordering

Audit events are persisted before the HTTP response is sent.

If the node crashes after writing the event but before responding, the event is still in the log.
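The ordering guarantee can be pictured with a small Python sketch using an in-memory SQLite database. This is not Stigmem's implementation: the `audit_demo` table, its columns, and the `handle_request` function are all hypothetical, chosen only to show the "commit the audit row, then respond" sequence.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with a hypothetical audit table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE audit_demo ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, event_type TEXT, entity_uri TEXT)"
    )
    return db


def handle_request(db, event_type, entity_uri, build_response):
    """Persist the audit row first, then build the response."""
    # Step 1: write and commit the audit event. `with db:` commits on success.
    with db:
        db.execute(
            "INSERT INTO audit_demo (event_type, entity_uri) VALUES (?, ?)",
            (event_type, entity_uri),
        )
    # Step 2: only now produce the response. A failure here cannot undo step 1.
    return build_response()
```

If `build_response` raises (standing in for a crash before the response is sent), the row inserted in step 1 is already committed and remains queryable.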

Retention

Operator minimum target

90 days.

Recommended for forensics

1 year.

Table is append-only

Rows are never modified or purged by the node.

No local 90-day enforcement

Operators who must prove a retention window should keep the local database, signed snapshots, or immutable audit exports for at least that long.

Audit evidence posture by trust mode

STIGMEM_TRUST_MODE changes how strongly the node enforces federation trust decisions, but it does not disable ordinary fact_audit_log or federation_audit writes.

| Trust mode | Enforcement | Audit evidence posture |
| --- | --- | --- |
| strict | fail closed | Federation trust decisions fail closed where required; low-trust or unverifiable inputs are rejected, quarantined, or downgraded. Best high-assurance mode. Preserve fact_audit_log, federation_audit, fact-chain checkpoints, and transparency-log evidence together. |
| relaxed | warnings, not failures | Trust signals computed; warnings emitted; some paths accept data with warning evidence instead of failing closed. Suitable for staged federation rollout. Treat warnings as review items and export audit rows before local retention expiry. |
| off | checks skipped | Source-trust scoring and related federation trust checks are skipped. Operational audit rows still exist, but trust-enforcement evidence is intentionally absent. Do not treat this as a production assurance posture. |

For production deployments, use STIGMEM_TL_BACKEND=rekor when available, or place the local transparency-log file on append-only storage. Export both audit tables to WORM-capable storage on a schedule that is shorter than your local retention target.

Per-principal quotas (Spec-10-Hardening rate limits)

Model

Stigmem uses a token-bucket rate limiter, one bucket per (entity_uri, tenant_id, dimension) triple. Every inbound request consumes one token from the relevant dimension bucket. Tokens refill continuously at a fixed rate up to the bucket capacity.
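To make the model concrete, here is a minimal token-bucket sketch in Python. It illustrates the general technique under stated assumptions and is not Stigmem's implementation: the class and method names are hypothetical, new buckets are assumed to start full, and the two default values are taken from the dimensions table on this page.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float
    refill_per_second: float
    tokens: float
    last_refill: float  # timestamp in seconds

    def try_consume(self, now, cost=1.0):
        """Refill for the elapsed time, then try to take `cost` tokens.

        Returns (allowed, retry_after_seconds).
        """
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        # Time until enough tokens have refilled to cover the shortfall
        return False, (cost - self.tokens) / self.refill_per_second


# (capacity, refill per second) for two of the dimensions listed on this page
DEFAULTS = {"fact_write": (100.0, 10.0), "fact_read": (500.0, 50.0)}


class QuotaLimiter:
    """One bucket per (entity_uri, tenant_id, dimension) triple."""

    def __init__(self):
        self._buckets = {}

    def check(self, entity_uri, tenant_id, dimension, now):
        key = (entity_uri, tenant_id, dimension)
        bucket = self._buckets.get(key)
        if bucket is None:
            capacity, rate = DEFAULTS[dimension]
            # Assumption: a principal's first request sees a full bucket
            bucket = TokenBucket(capacity, rate, tokens=capacity, last_refill=now)
            self._buckets[key] = bucket
        return bucket.try_consume(now)
```

Passing `now` explicitly keeps the sketch deterministic. The capacity bounds the burst a principal can send at once, while the refill rate bounds its sustained throughput.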

The 7 dimensions and their defaults

| Dimension | Capacity · Refill | Covers |
| --- | --- | --- |
| fact_write | 100 · 10/s | POST /v1/facts, DELETE /v1/facts/* |
| fact_read | 500 · 50/s | GET /v1/facts/*, GET /v1/recall* |
| token_issue | 20 · 0.33/s | POST /v1/federation/capability-tokens |
| federation_pull | 30 · 0.5/s | Outbound pull replication. |
| admin_action | 10 · 0.17/s | POST /v1/admin/* |
| subscription_event | 200 · 20/s | Outbound subscription deliveries. |
| audit_export | 10,000 · 167/s | Rows returned by GET /v1/admin/audit. |

Federation peer requests (using peer-token auth) and unauthenticated health-check paths are exempt from quota.

Tuning via environment variables

The legacy per-hour env vars control the effective rate for fact_write and fact_read. At startup they are converted to token-bucket parameters:

rate_per_second = ceiling / 3600

| Variable | Default | Effect |
| --- | --- | --- |
| STIGMEM_RATE_LIMIT_WRITE_PER_HOUR | 1000 | Sets fact_write refill rate. |
| STIGMEM_RATE_LIMIT_READ_PER_HOUR | 5000 | Sets fact_read refill rate. |

# Example: allow 10,000 writes/hour for a high-throughput ingest node
STIGMEM_RATE_LIMIT_WRITE_PER_HOUR=10000

# Disable quotas entirely (dev/test only; do not use in production)
STIGMEM_RATE_LIMIT_WRITE_PER_HOUR=0
STIGMEM_RATE_LIMIT_READ_PER_HOUR=0

Never set quota env vars to 0 in production.

It removes the per-principal backpressure that protects the node from noisy neighbours and credential misuse. When both read and write limits are 0, Stigmem emits a startup SECURITY WARNING because quota enforcement is disabled; treat that warning as acceptable only in isolated local/dev/test environments.
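The per-hour conversion described above can be sketched in a few lines of Python. The function names are hypothetical, and returning `None` for a ceiling of 0 is an assumption made here to represent "enforcement disabled"; how the node represents that state internally is not specified on this page.

```python
import os


def refill_rate_per_second(per_hour_ceiling):
    """Apply rate_per_second = ceiling / 3600.

    Assumption: 0 is modelled as None, meaning enforcement is disabled.
    """
    if per_hour_ceiling < 0:
        raise ValueError("per-hour ceiling must not be negative")
    if per_hour_ceiling == 0:
        return None
    return per_hour_ceiling / 3600


def rate_from_env(variable, default):
    """Read a per-hour ceiling from the environment and convert it."""
    return refill_rate_per_second(int(os.environ.get(variable, default)))


write_rate = rate_from_env("STIGMEM_RATE_LIMIT_WRITE_PER_HOUR", 1000)
read_rate = rate_from_env("STIGMEM_RATE_LIMIT_READ_PER_HOUR", 5000)
print(write_rate, read_rate)
```

For example, a ceiling of 10,000 writes per hour works out to roughly 2.78 tokens per second under this formula.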

Bucket state (current tokens, last refill timestamp) is stored in the quota_buckets SQLite table. It persists across restarts.

429 response shape and Retry-After semantics

When a request exceeds the bucket:

HTTP 429 Too Many Requests

{
  "error": "quota_exceeded",
  "dimension": "fact_write",
  "principal": "stigmem://your-org.example.com/my-agent",
  "retry_after": 3.2
}

The Retry-After header is set to the integer ceiling of retry_after (seconds). Clients should honour it and not immediately retry.
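On the client side, honouring the header might look like the following Python sketch. `seconds_to_wait` is a hypothetical helper, not part of the Stigmem SDK. It prefers the Retry-After header and falls back to the ceiling of the body's retry_after field; the one-second fallback when neither is present is an assumption.

```python
import math


def seconds_to_wait(status, headers, body):
    """Return how many whole seconds to wait before retrying, or None.

    headers is a plain dict here, so the lookup is case-sensitive; real HTTP
    libraries usually provide a case-insensitive mapping.
    """
    if status != 429:
        return None
    header = headers.get("Retry-After")
    if header is not None and header.strip().isdigit():
        return int(header.strip())
    # Fall back to the JSON body; default of 1 second is an assumption
    return math.ceil(float(body.get("retry_after", 1)))


body = {"error": "quota_exceeded", "dimension": "fact_write", "retry_after": 3.2}
print(seconds_to_wait(429, {"Retry-After": "4"}, body))
```

A caller would sleep for the returned number of seconds (for example with `time.sleep`) before reissuing the request, ideally with some added jitter when many clients share a principal.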

quota_breach audit events

Every 429 response generates a quota_breach audit event before the response is sent (write-ahead). The detail field contains:

{
  "dimension": "fact_write",
  "path": "/v1/facts",
  "method": "POST",
  "retry_after": 3.2
}

Query recent quota breaches across all principals:

curl -s "https://node.example.com/v1/admin/audit?event_type=quota_breach&limit=100" \
-H "Authorization: Bearer $AUDIT_KEY" \
| jq '.entries[] | {ts, principal: .entity_uri, detail: (.detail | fromjson)}'

Operator checklist

Audit key management

  1. Mint a dedicated audit.read key for each monitoring or SIEM consumer. Never reuse application keys for audit access: a compromised app key would also expose the audit trail.
  2. Store audit keys in your secrets manager (Vault, AWS Secrets Manager, 1Password) with a rotation schedule.
  3. If your SIEM supports it, configure it to pull from /v1/admin/audit on a schedule and forward quota_breach and replay_rejected events to your alerting pipeline.

Retention

  1. Verify that fact_audit_log rows older than 90 days have not been deleted. The node does not auto-purge; implement retention via your own SQLite maintenance job or by exporting to cold storage.
  2. For compliance use cases, export audit log rows to an immutable store (S3 Object Lock, GCS object versioning, WORM-enabled storage) before the 90-day window elapses.

-- Check oldest event in the log
SELECT min(ts), count(*) FROM fact_audit_log;
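The same check can be scripted so that a monitoring job reports how far back the log reaches. The Python sketch below runs the query above through the standard sqlite3 module. It assumes ts is stored as an ISO 8601 UTC string like the one shown in the response schema; the storage format is not confirmed on this page, and `oldest_event_age_days` is a hypothetical name.

```python
import sqlite3
from datetime import datetime, timezone


def oldest_event_age_days(db, now=None):
    """Return (age_in_days_of_oldest_event, row_count) for fact_audit_log.

    Assumption: ts is an ISO 8601 UTC string such as 2026-05-03T12:00:00Z.
    Returns (None, 0) when the table is empty.
    """
    oldest, count = db.execute(
        "SELECT min(ts), count(*) FROM fact_audit_log"
    ).fetchone()
    if oldest is None:
        return None, 0
    now = now or datetime.now(timezone.utc)
    # Replace the trailing Z so fromisoformat also works before Python 3.11
    oldest_dt = datetime.fromisoformat(oldest.replace("Z", "+00:00"))
    return (now - oldest_dt).days, count
```

On a node that has been running longer than the retention target, an age below 90 days suggests that rows were removed and is worth investigating.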

Prometheus metrics

When prometheus_client is installed, the node exposes a /metrics endpoint with quota and audit counters.

| Metric | Type | Description |
| --- | --- | --- |
| stigmem_quota_breach_total | counter | By (principal, tenant, dimension). |
| stigmem_audit_event_total | counter | By (event_type, tenant). |

Install the optional dependency:

pip install prometheus_client

No additional configuration is required; the endpoint is registered automatically if the package is importable at startup.

Quota baselines

  1. After 24 hours of representative traffic, query quota_breach events and identify principals that breach repeatedly. Increase STIGMEM_RATE_LIMIT_WRITE_PER_HOUR / STIGMEM_RATE_LIMIT_READ_PER_HOUR if defaults are too tight for legitimate workloads, or investigate the offending principal.
  2. Set a Prometheus alert on stigmem_quota_breach_total to catch runaway clients before they degrade the node for other principals.
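The first baseline step, finding principals that breach repeatedly, can be sketched as a small aggregation over audit entries that have already been fetched. `top_breachers` is a hypothetical helper, and the threshold of two breaches is an arbitrary illustrative default rather than a recommended value.

```python
import json
from collections import Counter


def top_breachers(entries, min_breaches=2):
    """Count quota_breach events per (principal, dimension) pair.

    Returns pairs with at least min_breaches events, most frequent first.
    """
    counts = Counter()
    for entry in entries:
        if entry.get("event_type") != "quota_breach":
            continue
        # detail is a JSON-encoded string; decode it to read the dimension
        detail = json.loads(entry.get("detail") or "{}")
        counts[(entry.get("entity_uri"), detail.get("dimension"))] += 1
    return [(key, n) for key, n in counts.most_common() if n >= min_breaches]
```

Each returned pair identifies which principal hit which dimension, which helps decide whether to raise the relevant per-hour ceiling or to investigate the client.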