Red-team harness
A catalogue of adversarial probes run continuously against a production AI system to catch prompt injection, jailbreaks, policy violations and data exfiltration.
A red-team harness is the adversarial counterpart of an evaluation harness. Where the evaluation harness catches quality regressions, the red-team harness catches safety and policy regressions - the failures an attacker is actively trying to induce.
What it contains
- A probe catalogue - 50 to 200 adversarial inputs organised by attack class (prompt injection, jailbreak, policy violation, data exfiltration, refusal bypass, PII leak).
- A severity taxonomy with response SLAs - critical, high, medium, low.
- A scheduler - probes run nightly in staging and weekly in production.
- A quarterly coverage review because the attack surface changes.
Why it's the auditor's first question
Red-team coverage is the single most visible signal of a mature governance practice to an external auditor. A CISO signing off on a production AI deployment will usually ask for the red-team catalogue before anything else. Teams that deferred it usually fail their first production-readiness review.
Relationship to policy-as-code
Red-team probes test whether policy-as-code is actually enforced at runtime. A policy that exists in a document but isn't caught by an adversarial probe is not a policy - it's an intention.