N-node Relay Backpressure
What this page is
How relay nodes signal replication lag to downstream peers in multi-hop federation chains. Lets D distinguish "C is up to date" from "C is 8 minutes behind A" — even when C is serving D at full rate.
N-node backpressure patterns are covered by Spec-05-Federation-Trust relay-backpressure guidance. These behaviors are SHOULD (recommended), not MUST — conformant nodes that omit them will interoperate but downstream nodes may receive stale data without warning.
The problem · silent stale data in relay chains
In a two-node federation (A ← B), replication lag is directly
observable: B can see its own pull latency against A.
In an N-node chain (A ← B ← C ← D), node C can fall behind B's
replication while continuing to serve D at full rate — meaning D
receives facts that are stale by both B's lag and C's additional
lag, with no signal that anything is wrong.
Backpressure signal · X-Stigmem-Replication-Lag
When a node's inbound replication lag exceeds
STIGMEM_FEDERATION_RELAY_LAG_WARNING_MS (default 60 seconds), it
includes a warning header on pull responses:
X-Stigmem-Replication-Lag: 87450
The value is the maximum lag in milliseconds across all inbound peers. Downstream nodes receiving this header SHOULD:
Log and alert
Log the lag and alert an operator if it persists.
Treat synthesized facts as stale
Treat synthesized facts from this node as potentially stale.
Optionally back off pull interval
To reduce load on the lagging node.
Hard throttle · HTTP 503
When lag exceeds STIGMEM_FEDERATION_RELAY_LAG_HARD_MS (default 5
minutes), the node returns HTTP 503 on pull requests:
HTTP/1.1 503 Service Unavailable
Retry-After: 120
{
"error": "relay_lag_exceeded",
"lag_ms": 342000,
"retry_after_s": 120
}
Downstream nodes MUST respect the retry_after_s value and not retry before that window.
Discovery via /.well-known/stigmem
Nodes expose their current relay lag in the well-known endpoint so operators can observe it without triggering a pull:
curl $STIGMEM_URL/.well-known/stigmem | jq .replication_lag_ms
# 87450
The replication_lag_ms field is:
Maximum lag
Across all inbound peers.
Omitted if leaf node
No inbound federation, or lag is within warning bounds.
Environment variables
STIGMEM_FEDERATION_RELAY_LAG_WARNING_MS60000X-Stigmem-Replication-Lag warning header.STIGMEM_FEDERATION_RELAY_LAG_HARD_MS300000STIGMEM_FEDERATION_RELAY_ENABLEDtruefalse on leaf nodes to disable relay behavior.Set STIGMEM_FEDERATION_RELAY_ENABLED=false on leaf nodes (nodes
with no downstream peers) to suppress relay headers and avoid
unnecessary lag computation.
Example · 4-node topology
A (origin) ← B (relay) ← C (relay) ← D (leaf)
RELAY_ENABLED=false.replication_lag_ms in well-known; serves C.X-Stigmem-Replication-Lag on responses from C; RELAY_ENABLED=false.D's agent should check X-Stigmem-Replication-Lag on every pull
response from C and surface it if lag is significant:
# Check for lag header on pull response
response=$(curl -si $NODE_C_URL/v1/federation/pull \
-H "Authorization: Bearer $NODE_C_PEER_TOKEN")
lag=$(echo "$response" | grep -i 'x-stigmem-replication-lag' | awk '{print $2}' | tr -d '\r')
if [ -n "$lag" ] && [ "$lag" -gt 30000 ]; then
echo "Warning: C is ${lag}ms behind A. Facts may be stale."
fi
Operator checklist
- Set
STIGMEM_FEDERATION_RELAY_ENABLED=falseon all leaf nodes. - Configure
RELAY_LAG_WARNING_MSandRELAY_LAG_HARD_MSfor your topology's acceptable freshness window. - Monitor
/.well-known/stigmem#replication_lag_mson relay nodes. - Alert if any relay node exceeds warning threshold for more than 5 minutes.
- Downstream agents: check
X-Stigmem-Replication-Lagat context-injection time and log when stale.