All posts
-
How LLM Guardrails Work: Architecture, Detection, and Trade-offs
A technical breakdown of how LLM guardrails work — the six pipeline layers, classifier mechanics, latency costs, and the residual risks that no single control eliminates.
-
Choosing Runtime Guardrails for LLM Apps: A Decision Framework
There is no single 'best' LLM guardrail. A decision framework for selecting runtime guardrails by threat, placement, and latency budget — comparing rules, classifiers, LLM-as-judge, and safety models, mapped to the OWASP LLM Top 10 risks they mitigate.
-
Securing the ML Model Supply Chain: Provenance, Signing, and Verification
Model weights are often distributed as unauthenticated binaries, and some serialization formats can execute code on load. This is a practical guide to securing the ML supply chain with model signing, Sigstore, SLSA provenance, and load-time verification, along with the failure modes that make scanning insufficient on its own.
-
Monitoring LLM Outputs in Production: Anomalies and Drift
How to build a production observability stack for LLM outputs — covering anomaly detection pipelines, latency threshold alerting, output drift signals, and concrete alerting logic you can deploy today.
-
Output Filtering Architecture for Production LLMs: A Blueprint
How to architect a multi-layer output filtering pipeline for production LLMs — covering deterministic guards, ML classifiers, schema validation, and async sequencing patterns to minimize latency while maximizing coverage.
-
Output Filtering Architecture for Production LLMs
A deep-dive into layered output filtering for production LLMs — combining semantic classifiers, regex scrubbing, and LLM-as-judge techniques to catch harmful, policy-violating, and hallucinated content before it reaches users or downstream systems.
-
Prompt Injection Prevention: Defense-in-Depth for LLM Systems
A systems-level guide to preventing prompt injection attacks in production LLMs — covering defense-in-depth layering, structural prompt architecture, privilege separation, and continuous adversarial validation with concrete implementation patterns.
-
Prompt Injection Prevention: Hardening and Privilege Separation
A technical guide to preventing prompt injection attacks in production LLMs — covering system prompt hardening, privilege-separated architectures, instruction hierarchy, and defense-in-depth patterns with vulnerable vs. hardened code examples.
-
Implementing Rate Limiting and Abuse Detection for AI APIs
A practical engineering guide to rate limiting, quota enforcement, and abuse detection for AI API endpoints — covering token-bucket algorithms, per-user quotas, fingerprinting, and behavioral anomaly detection for LLM services.
-
Building an Internal Adversarial Testing Pipeline for LLMs
How to build an internal adversarial testing pipeline for LLM applications using garak, promptfoo, and custom probes — with a CI integration pattern that catches security regressions before they reach production.
-
AI Defense Techniques for LLMs: A Practitioner's Guide
A technical breakdown of proven AI defense techniques for LLMs — from input guardrails and prompt hardening to dual-model architectures and red teaming, mapped to OWASP and NIST frameworks.
-
LLM Guardrails Implementation: A Guide to Production Controls
How to implement LLM guardrails across input validation, output filtering, and runtime enforcement — with concrete patterns, tooling comparisons, and latency trade-offs for production deployments.
-
What this site is for
AI Defense covers defensive AI engineering — guardrails, content filters, and shipping AI features without shipping liability.