Defensive AI engineering — guardrails, hardening, response.
Engineering-focused coverage of defensive AI: guardrail architecture, classifier ensembles, model hardening, output filtering, refusal training, and the response patterns that hold under adversarial pressure in production systems.
How LLM Guardrails Work: Architecture, Detection, and Trade-offs
A technical breakdown of how LLM guardrails work — the six pipeline layers, classifier mechanics, latency costs, and the residual risks that no single control eliminates.
Archive
-
Choosing Runtime Guardrails for LLM Apps: A Decision Framework
There is no single 'best' LLM guardrail. A decision framework for selecting runtime guardrails by threat, placement, and latency budget — comparing rules, classifiers, LLM-as-judge, and safety models, mapped to the OWASP LLM Top 10 risks they mitigate.
-
Securing the ML Model Supply Chain: Provenance, Signing, and Verification
Model weights are often distributed as unauthenticated binaries, and pickle-based formats can execute arbitrary code on load. This is a practical guide to securing the ML supply chain with model signing, Sigstore, SLSA provenance, and load-time verification, including the failure modes that make scanning insufficient on its own.
-
Monitoring LLM Outputs in Production: Anomalies and Drift
How to build a production observability stack for LLM outputs — covering anomaly detection pipelines, latency threshold alerting, output drift signals, and concrete alerting logic you can deploy today.
-
Output Filtering Architecture for Production LLMs: A Blueprint
How to architect a multi-layer output filtering pipeline for production LLMs — covering deterministic guards, ML classifiers, schema validation, and async sequencing patterns to minimize latency while maximizing coverage.
-
Output Filtering Architecture for Production LLMs
A deep-dive into layered output filtering for production LLMs — combining semantic classifiers, regex scrubbing, and LLM-as-judge techniques to catch harmful, policy-violating, and hallucinated content before it reaches users or downstream systems.
-
Prompt Injection Prevention: Defense-in-Depth for LLM Systems
A systems-level guide to preventing prompt injection attacks in production LLMs — covering defense-in-depth layering, structural prompt architecture, privilege separation, and continuous adversarial validation with concrete implementation patterns.
-
Prompt Injection Prevention: Hardening and Privilege Separation
A technical guide to preventing prompt injection attacks in production LLMs — covering system prompt hardening, privilege-separated architectures, instruction hierarchy, and defense-in-depth patterns with vulnerable vs. hardened code examples.
-
Implementing Rate Limiting and Abuse Detection for AI APIs
A practical engineering guide to rate limiting, quota enforcement, and abuse detection for AI API endpoints — covering token-bucket algorithms, per-user quotas, fingerprinting, and behavioral anomaly detection for LLM services.
-
Building an Internal Adversarial Testing Pipeline for LLMs
How to build an internal adversarial testing pipeline for LLM applications using garak, promptfoo, and custom probes — with a CI integration pattern that catches security regressions before they reach production.
Trusted by researchers across the AI security community
AI Defense is part of a 26-site editorial network covering adversarial ML, AI governance, defensive tooling, and ops engineering — all open access.