AI vocabulary,
decoded.
The terms that actually matter in production AI: evaluation harnesses, policy-as-code, red-team catalogues, sovereign cloud, scorecards. One-sentence definitions up top, longer plain-English explanations below. Every term links to the services and writing behind it.
Architecture
2 terms
AI agent
An AI system that decides the sequence of tool calls at runtime - research, retrieval, reasoning, drafting - rather than following a pre-written workflow.
Workflow automation
An AI-assisted system whose path from input to output is knowable in advance - model calls are embedded at specific decision points inside otherwise deterministic code.
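The difference between the two definitions above can be sketched in a few lines. This is an illustrative toy, not a reference design: `TOOLS`, `stub_planner`, `workflow` and `agent` are hypothetical names, and the planner is a hard-coded stand-in for a model deciding which tool to call next.

```python
from typing import Callable

# Hypothetical tools; real ones would call retrieval systems or models.
TOOLS: dict[str, Callable[[str], str]] = {
    "retrieve": lambda query: f"docs about {query}",
    "draft": lambda context: f"draft based on: {context}",
}

def workflow(question: str) -> tuple[str, list[str]]:
    """Workflow automation: the tool sequence is fixed in code."""
    trace = ["retrieve", "draft"]  # knowable before any input arrives
    context = TOOLS["retrieve"](question)
    return TOOLS["draft"](context), trace

def stub_planner(state: str, steps_taken: list[str]) -> str:
    """Stand-in for a model choosing the next tool at runtime (assumption)."""
    if "retrieve" not in steps_taken:
        return "retrieve"
    if "draft" not in steps_taken:
        return "draft"
    return "stop"

def agent(
    question: str,
    planner: Callable[[str, list[str]], str],
    max_steps: int = 5,
) -> tuple[str, list[str]]:
    """AI agent: the planner picks each step, so the path is only known afterwards."""
    state, trace = question, []
    for _ in range(max_steps):  # hard cap so a confused planner cannot loop forever
        choice = planner(state, trace)
        if choice == "stop":
            break
        state = TOOLS[choice](state)
        trace.append(choice)
    return state, trace

workflow_result = workflow("pricing")
agent_result = agent("pricing", stub_planner)
```

With this stub planner both paths happen to match. The point is where the sequence lives: in the source code for the workflow, in the planner's runtime decisions for the agent.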
Evaluation
3 terms
Evaluation harness
An automated system that runs a curated set of inputs through an AI system, scores the outputs, and catches regressions before they reach customers.
Golden set
A curated collection of representative inputs with known expected behaviour, used as the regression backbone of an AI evaluation harness.
LLM-as-judge
Using a language model to score the outputs of another language model against a rubric - a cheaper, noisier substitute for human review in AI evaluation.
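As a rough sketch of how an evaluation harness and a golden set fit together, assuming a deliberately simple scoring rule (a required substring) in place of a rubric or an LLM judge. `GoldenCase`, `run_harness` and `stub_system` are hypothetical names, and the stub stands in for the AI system under test.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class GoldenCase:
    prompt: str
    must_contain: str  # simplified "expected behaviour": a required substring

def run_harness(
    system: Callable[[str], str],
    golden_set: list[GoldenCase],
    threshold: float = 1.0,
) -> dict:
    """Run every golden case through the system and flag a regression."""
    if not golden_set:
        raise ValueError("golden set is empty")
    failures = []
    for case in golden_set:
        output = system(case.prompt)
        if case.must_contain.lower() not in output.lower():
            failures.append(case.prompt)
    pass_rate = 1 - len(failures) / len(golden_set)
    return {
        "pass_rate": pass_rate,
        "failures": failures,
        "regressed": pass_rate < threshold,
    }

def stub_system(prompt: str) -> str:
    """Stand-in for the AI system under test (assumption)."""
    if "refund" in prompt.lower():
        return "Refunds are accepted within 30 days."
    return "I don't know."

GOLDEN_SET = [
    GoldenCase("What is the refund window?", "30 days"),
    GoldenCase("Can I get a refund after purchase?", "30 days"),
    GoldenCase("What are your opening hours?", "9am"),
]

report = run_harness(stub_system, GOLDEN_SET, threshold=0.9)
```

In practice the `regressed` flag would gate a deploy, and the substring check could be swapped for a rubric scored by an LLM judge, with the extra noise that implies.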
Governance
4 terms
Audit trail
A queryable record of every AI-driven decision - inputs, outputs, model version, tool calls, policy evaluations, timestamps - indexed for regulator-level review.
PII redaction
The removal of personally identifiable information from prompts, logs and stored records - ideally before the data ever reaches the model.
Policy-as-code
Compliance and operational rules expressed as executable, versioned, testable code that enforces itself at runtime - rather than as a document nobody quite follows.
Red-team harness
A catalogue of adversarial probes run continuously against a production AI system to catch prompt injection, jailbreaks, policy violations and data exfiltration.
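To make policy-as-code and PII redaction concrete, here is a minimal sketch under stated assumptions: the email regex is intentionally naive, and `Policy`, `POLICIES`, `enforce` and `redact_emails` are hypothetical names rather than any particular product's API. Each rule is versioned data plus an executable check, so it can be tested like any other code.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Naive email pattern, for illustration only; real PII detection needs far more.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_emails(text: str) -> str:
    """Replace email addresses before the text reaches a model or a log."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)

@dataclass(frozen=True)
class Policy:
    name: str
    version: str
    check: Callable[[str], bool]  # returns True when the prompt is allowed

# Hypothetical rule set; the names and limits are assumptions.
POLICIES = [
    Policy("no-raw-email", "1.0.0", lambda prompt: EMAIL.search(prompt) is None),
    Policy("max-prompt-length", "1.0.0", lambda prompt: len(prompt) <= 2000),
]

def enforce(prompt: str) -> list[str]:
    """Return the versioned names of every policy the prompt violates."""
    return [f"{p.name}@{p.version}" for p in POLICIES if not p.check(prompt)]

raw = "Summarise the complaint from jane.doe@example.com"
violations_before = enforce(raw)   # the raw prompt trips the email policy
safe = redact_emails(raw)
violations_after = enforce(safe)   # the redacted prompt passes every policy
```

Recording the returned `name@version` strings alongside each request is one way such checks could feed an audit trail.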
Operating model
2 terms
Production readiness
A written definition, agreed before build, of what an AI system must do to be considered shipped - covering evaluation, governance, observability, incident response and operating model.
Scorecard
A one-page document, agreed before build, that names the AI programme's outcome metric with a baseline, target, review cadence and written stop condition.
Let’s build your system next.
Thirty minutes with someone who’d be doing the work. No slide deck, no intake form. We’ll tell you what’s feasible, where you’ll hit friction, and what we’d pick up first.