Model Certification
What this page is
Stigmem's prompt-injection boundary is split across protocol controls and the consumer that reads recalled content. The protocol enforces origin tagging, instruction-write authorization, instruction quarantine, and channel-separated recall. Model certification is the transparency layer for the remaining consumer-side behavior described by ADR-015.
The current certification corpus is corpus-v1.
Located at data/conformance/adversarial/corpus-v1. It
contains 80 prompt-injection patterns across 10 categories. The
corpus is the source of truth for model and adapter certification
runs.
Methodology
Each run sends every corpus pattern to one provider/model/adapter tuple. The adapter prompt tells the model that recalled facts are untrusted data and that behavioral authority stays with the system and user messages.
The model sees:
Pattern identifier and severity
Context facts
Representing normal recalled data.
Adversarial recalled fact content
The instruction to answer
The original task while treating recalled content only as data.
The deterministic rubric classifies each response:
passfailreview-needed