Securing the ML Model Supply Chain: Provenance, Signing, and Verification
Model weights often ship as unauthenticated binaries, and several common formats execute code on load. This is a practical guide to securing the ML supply chain with model signing, Sigstore, SLSA provenance, and load-time verification, including the failure modes that make scanning insufficient on its own.
Most teams treat a downloaded model the way they would treat a trusted internal artifact: pull the weights, load them, serve. The assumption baked into that workflow is that the file on disk is the file the publisher intended to ship. For software packages, an ecosystem of signing, lockfiles, and provenance exists to back that assumption. For model weights, until recently, nothing did. A serialized checkpoint pulled from a hub was an unauthenticated binary — and in several common formats, loading it executes code. This post covers how to close that gap with model signing, provenance, and load-time verification, and where each control stops being sufficient on its own.
Why the Model Is a Supply-Chain Attack Surface
The core problem is that a model file is not inert data. The Python serialization format (pickle) that still backs a large fraction of published checkpoints encodes a sequence of opcodes that can include arbitrary callable invocation. Fed attacker-influenced bytes, torch.load (with weights_only=False, which was the default before PyTorch 2.6), joblib.load, and equivalent deserializers are remote code execution primitives. The model “loads,” and somewhere in that deserialization an attacker’s payload runs with the privileges of your serving process.
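A minimal, harmless illustration of why deserialization is code execution, using only the standard library. The class defines __reduce__, which tells pickle to serialize "call this function with these arguments" instead of plain data. The name harmless_side_effect is a hypothetical stand-in for an attacker payload; it only records that it ran.

```python
import pickle
import pickletools

CALLS: list[str] = []


def harmless_side_effect(msg: str) -> str:
    # Stand-in for an attacker payload: it only records that it ran.
    CALLS.append(msg)
    return msg


class Payload:
    def __reduce__(self):
        # Instead of describing data, instruct the unpickler to
        # import harmless_side_effect and call it with these args.
        return (harmless_side_effect, ("executed on load",))


blob = pickle.dumps(Payload())

# The byte stream itself carries the call instruction (the REDUCE opcode).
opcode_names = [op.name for op, _arg, _pos in pickletools.genops(blob)]
print("REDUCE" in opcode_names)  # True

# Merely deserializing runs the callable; the caller never invokes it.
obj = pickle.loads(blob)
print(CALLS)  # ['executed on load']
print(obj)    # executed on load
```

Restricted loaders such as torch.load with weights_only=True, and data-only formats like safetensors, avoid this by refusing to resolve arbitrary callables in the first place.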
This is not theoretical. In February 2025, ReversingLabs documented malicious models hosted on Hugging Face that carried reverse-shell payloads in their serialized data. The technique, which the researchers named “nullifAI”, deliberately broke the serialization stream after the payload: the platform’s picklescan scanner failed to flag the files, yet the payload still executed on deserialization, because it ran before the deserializer reached the broken section. Two models were affected, and both were taken down after disclosure. The significant part is not the count; it is the demonstration that the artifact and the scanner designed to vet it can be defeated by the same input. Scanning is a useful layer, but it inspects content the attacker controls, using assumptions the attacker can study.
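The ordering that makes a broken stream dangerous can be shown with the standard library alone: the unpickler is a streaming interpreter that executes each opcode as it reads it, so a call placed before a corrupted section runs even though loading ultimately fails. This is a minimal sketch of that general mechanism, not a reproduction of the actual malicious files; the function and class names are hypothetical.

```python
import pickle

CALLS: list[str] = []


def harmless_side_effect(msg: str) -> str:
    # Stand-in for a payload; it only records that it ran.
    CALLS.append(msg)
    return msg


class Payload:
    def __reduce__(self):
        return (harmless_side_effect, ("ran before the stream broke",))


# Protocol 2 keeps the stream simple (no framing); its last byte is STOP (b".").
valid = pickle.dumps(Payload(), protocol=2)
assert valid.endswith(b".")

# Drop STOP and append a byte that is not a valid opcode: the stream is "broken".
broken = valid[:-1] + b"\xff"

try:
    pickle.loads(broken)
    load_failed = False
except pickle.UnpicklingError as exc:
    load_failed = True
    print(f"load failed: {exc}")

# ...but the call had already been made before the parser hit the bad byte.
print(CALLS)  # ['ran before the stream broke']
```

A scanner that gives up when it hits the corruption may report nothing, while the loader has already executed everything that preceded it.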
The attack surface is broader than the weights file. It includes:
- The checkpoint itself — deserialization RCE, or a backdoored model that behaves normally except on a trigger input.
- The retrieval path — a typosquatted repo, a hijacked maintainer account, or a man-in-the-middle on an unauthenticated download.
- The conversion and quantization steps — every transform between the published artifact and what you serve is an opportunity for tampering if it isn’t verified end to end.
- The dependency graph: the loader libraries (transformers, vllm) and any custom code in a repo’s *.py files, which trust_remote_code=True will happily execute.
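Before any loader runs, it is cheap to inventory which files in a downloaded model directory could execute code at all. A minimal sketch, assuming the common convention that custom modeling code ships as *.py files alongside the weights; the extension list for pickle-based checkpoints is a heuristic, and the reviewed set is a hypothetical allowlist your team would maintain.

```python
from pathlib import Path

# Heuristic, not exhaustive: extensions that commonly hold pickle-based
# checkpoints, which can execute code when deserialized.
PICKLE_LIKE = {".bin", ".pt", ".pth", ".pkl", ".pickle", ".ckpt", ".joblib"}


def code_bearing_files(model_dir: str) -> dict[str, list[str]]:
    """Group files in a model directory by how they could run code on load."""
    root = Path(model_dir)
    found: dict[str, list[str]] = {"python_source": [], "pickle_like": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        suffix = path.suffix.lower()
        if suffix == ".py":
            # Executed if the loader is called with trust_remote_code=True.
            found["python_source"].append(rel)
        elif suffix in PICKLE_LIKE:
            found["pickle_like"].append(rel)
    return found


def require_review(model_dir: str, reviewed: set[str]) -> None:
    """Refuse to proceed if any code-bearing file has not been explicitly reviewed."""
    flagged = code_bearing_files(model_dir)
    unreviewed = [f for files in flagged.values() for f in files if f not in reviewed]
    if unreviewed:
        raise PermissionError(f"unreviewed code-bearing files: {unreviewed}")
```

For a directory holding model.safetensors, config.json, modeling_custom.py, and pytorch_model.bin, only the last two are flagged, and require_review blocks the load until both appear in the reviewed set.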
Defense-in-depth here means answering two distinct questions at load time: is this the artifact the publisher actually produced (integrity and authenticity), and was it produced by a process I trust (provenance). Signing answers the first. Provenance attestations answer the second.
Model Signing: Authenticating the Artifact
Model signing binds a cryptographic signature to a model’s content so that a verifier can confirm, before loading, that the bytes match what a known identity signed. In 2025 this moved from ad hoc to standardized: the OpenSSF AI/ML Working Group released Model Signing v1.0, a library and CLI authored by engineers from Google, NVIDIA, and HiddenLayer that signs and verifies models of any format and any size by hashing the constituent files into a manifest and producing a signature over that manifest.
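The hash-then-sign structure is what lets one signature cover a model of any size and layout. A minimal sketch of the idea using only the standard library: hash every file, record relative paths and digests in a canonically ordered manifest, and produce one digest over that manifest, which is what a signer would sign. This illustrates the concept only; it is not the model-signing library's actual manifest or serialization format, and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream the file so multi-gigabyte shards never sit fully in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(model_dir: str) -> dict[str, str]:
    root = Path(model_dir)
    # Relative POSIX paths keep the manifest independent of where the model lives.
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def manifest_digest(manifest: dict[str, str]) -> str:
    # Canonical JSON (sorted keys, no whitespace): the same files always
    # yield the same bytes, and therefore the same digest to sign.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_files(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    # Any added, removed, or modified file shows up here.
    return sorted(
        name for name in expected.keys() | actual.keys()
        if expected.get(name) != actual.get(name)
    )
```

Verification is the same computation run again: rebuild the manifest from the files on disk, compare its digest with the signed one, and use changed_files to report which file diverged.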
The default backend is Sigstore, and it is the more interesting design choice. Traditional signing requires the publisher to manage a long-lived private key, which is exactly the thing that gets leaked, committed to a repo, or stolen. Sigstore’s keyless flow instead binds the signature, through a short-lived certificate, to an OpenID Connect identity (a CI workload identity, or a developer’s federated identity) and records the signing event in a public, append-only transparency log. There is no long-lived secret to rotate or exfiltrate, and the transparency log makes a signature over a malicious artifact auditable after the fact rather than invisible.
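The log's auditability rests on a Merkle tree: a client holding a signed tree head can check, from a short list of sibling hashes, that a specific entry is included. Sigstore's log (Rekor) uses RFC 6962-style hashing for this. The sketch below implements RFC 6962 leaf and node hashing and the inclusion-proof verification algorithm as written in RFC 9162; tree_root and inclusion_proof exist only to produce test proofs, and none of this is the Sigstore client's own code.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962: leaves are domain-separated with a 0x00 prefix.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes use 0x01, so a leaf cannot be passed off as a node.
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    # Largest power of two strictly less than n (requires n > 1).
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def tree_root(leaves: list[bytes]) -> bytes:
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(tree_root(leaves[:k]), tree_root(leaves[k:]))


def inclusion_proof(index: int, leaves: list[bytes]) -> list[bytes]:
    # Audit path (RFC 6962, section 2.1.1), ordered from leaf to root.
    if len(leaves) == 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return inclusion_proof(index, leaves[:k]) + [tree_root(leaves[k:])]
    return inclusion_proof(index - k, leaves[k:]) + [tree_root(leaves[:k])]


def verify_inclusion(entry: bytes, index: int, tree_size: int,
                     proof: list[bytes], root: bytes) -> bool:
    # Verification algorithm from RFC 9162, section 2.1.3.2.
    if index >= tree_size:
        return False
    fn, sn, r = index, tree_size - 1, leaf_hash(entry)
    for p in proof:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


entries = [f"signing-event-{i}".encode() for i in range(7)]
root = tree_root(entries)
proof = inclusion_proof(5, entries)
print(len(proof), verify_inclusion(entries[5], 5, len(entries), proof, root))  # 3 True
```

Because every leaf and node hash feeds into the root, an entry cannot be silently removed or altered later without changing the tree head that monitors have already seen.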
The model-signing project (sigstore/model-transparency) supports several PKI backends: Sigstore keyless, your own public/private keypair, X.509 certificates, and PKCS#11 hardware tokens, all producing the same signature format. The CLI usage is straightforward:
```bash
# Sign a model directory with Sigstore keyless (prompts for an OIDC identity)
model_signing sign sigstore ./my-model/

# Verify before loading: exits non-zero if the signature
# does not match the expected identity (the address below is a placeholder)
model_signing verify sigstore ./my-model/ \
  --identity "releaser@example.com" \
  --identity-provider "https://accounts.google.com"
```
The discipline that matters is where you verify. The model-signing project’s guidance is to check signatures at every boundary the artifact crosses: when it is uploaded to a hub, when it is selected for deployment, and when it is consumed as input by another model. A signature checked once at download and never again does not protect against tampering in your own conversion pipeline.
```python
# Verification belongs in the load path, not in a one-time setup script.
import subprocess


def load_model_verified(model_dir: str, expected_identity: str) -> "Model":
    result = subprocess.run(
        ["model_signing", "verify", "sigstore", model_dir,
         "--identity", expected_identity,
         "--identity-provider", "https://token.actions.githubusercontent.com"],
        capture_output=True,
        text=True,  # decode stdout/stderr to str
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"Model signature verification FAILED for {model_dir}: "
            f"{result.stderr}"
        )
    # Only now do we touch the bytes. Even so, prefer safetensors over the
    # legacy serialization so a verification gap is not also an RCE.
    # (Model and _load_safetensors are placeholders for your own loader.)
    return _load_safetensors(model_dir)
```
Two caveats keep this honest. First, a valid signature proves authenticity, not safety — a publisher can sign a backdoored model, and the signature will verify correctly. Signing tells you who produced the artifact and that it hasn’t changed since; it does not tell you the artifact is benign. Second, signing is only as meaningful as your identity policy. Verifying against “any identity that signed” is nearly worthless; verifying against a specific, expected releaser identity is the control.
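The gap between "any identity" and "the expected releaser" is easy to lose in policy code. A hedged sketch, assuming the verifier has already extracted two values from the signing certificate: the signer identity and the OIDC issuer. For GitHub Actions keyless signing, the identity is a workflow URL ending in @<git ref> and the issuer is https://token.actions.githubusercontent.com. The org, repo, and workflow names below are hypothetical; the point is exact matching on issuer and workflow plus an explicit rule for acceptable refs, rather than substring or prefix checks.

```python
from dataclasses import dataclass

GITHUB_ACTIONS_ISSUER = "https://token.actions.githubusercontent.com"


@dataclass(frozen=True)
class ReleaserPolicy:
    issuer: str
    workflow: str                   # exact workflow URL, without the @ref suffix
    ref_prefix: str = "refs/tags/"  # only accept signatures made from tags


def identity_allowed(identity: str, issuer: str, policy: ReleaserPolicy) -> bool:
    """Exact-match policy for an identity of the form '<workflow URL>@<git ref>'."""
    if issuer != policy.issuer:
        return False
    workflow, sep, ref = identity.rpartition("@")
    if not sep:
        return False
    return workflow == policy.workflow and ref.startswith(policy.ref_prefix)


def weak_identity_check(identity: str) -> bool:
    # Anti-pattern: a substring match accepts any repo whose URL contains the org name.
    return "example-org" in identity


policy = ReleaserPolicy(
    issuer=GITHUB_ACTIONS_ISSUER,
    workflow="https://github.com/example-org/models/.github/workflows/release.yml",
)
lookalike = ("https://github.com/example-org-mirror/models/"
             ".github/workflows/release.yml@refs/tags/v1.0.0")
print(weak_identity_check(lookalike))                              # True
print(identity_allowed(lookalike, GITHUB_ACTIONS_ISSUER, policy))  # False
```

Restricting refs to tags is one possible policy choice; the important part is that every component of the identity is pinned rather than pattern-matched.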
Provenance: Trusting the Process, Not Just the Publisher
Signing authenticates the artifact. Provenance authenticates how it was built. SLSA (Supply-chain Levels for Software Artifacts) was developed at Google, contributed to the OpenSSF in 2021, and is now vendor-neutral. It defines incrementally adoptable levels of build integrity, expressed as machine-readable attestations in the in-toto format.
For models, the relevant SLSA idea is build provenance: a signed statement describing what was built, from what source, by what build process, in what environment. The levels are coarse but useful as a target:
- Level 1 — provenance exists and describes how the artifact was built. This alone answers “where did this checkpoint even come from” questions.
- Level 2 — provenance is signed by the build platform, so post-build tampering is detectable.
- Level 3 — the build runs on a hardened, isolated builder that resists tampering during the build, raising the bar against a compromised build step injecting a backdoor.
A model that ships with a SLSA provenance attestation lets a consumer answer questions a signature alone cannot: which training data manifest and code commit produced these weights, whether the build ran in an ephemeral isolated environment, and whether the chain from source to artifact is unbroken. For internal pipelines, generating provenance for your own fine-tunes is often the higher-leverage move — it gives you an audit trail when a model starts misbehaving in production and you need to know exactly what produced the artifact you’re serving.
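What a consumer actually checks in a provenance attestation can be sketched against the in-toto Statement v1 layout with a SLSA v1 provenance predicate. The sketch assumes the attestation's signature has already been verified and its payload decoded to JSON; it then confirms the statement describes this artifact (subject digest), that it came from the expected builder, and returns the recorded inputs such as the source commit. The builder ID, repository URL, buildType, and externalParameters contents are hypothetical, since SLSA leaves their exact shape to each build type.

```python
import hashlib

IN_TOTO_STATEMENT_V1 = "https://in-toto.io/Statement/v1"
SLSA_PROVENANCE_V1 = "https://slsa.dev/provenance/v1"


def check_provenance(statement: dict, artifact_sha256: str,
                     expected_builder: str) -> list:
    """Validate a decoded provenance statement against one artifact digest.

    Assumes the signature over the statement was already verified.
    Returns the recorded resolvedDependencies on success.
    """
    if statement.get("_type") != IN_TOTO_STATEMENT_V1:
        raise ValueError("not an in-toto v1 statement")
    if statement.get("predicateType") != SLSA_PROVENANCE_V1:
        raise ValueError("not SLSA v1 provenance")
    # The statement must name this exact artifact by digest.
    subject_digests = {s.get("digest", {}).get("sha256")
                       for s in statement.get("subject", [])}
    if artifact_sha256 not in subject_digests:
        raise ValueError("provenance does not describe this artifact")
    predicate = statement["predicate"]
    # Only accept provenance produced by the builder you trust.
    builder_id = predicate["runDetails"]["builder"]["id"]
    if builder_id != expected_builder:
        raise ValueError(f"unexpected builder: {builder_id}")
    return predicate["buildDefinition"].get("resolvedDependencies", [])


# Hypothetical statement for a fine-tuned checkpoint.
weights = b"\x00" * 64  # stand-in for the real file contents
example = {
    "_type": IN_TOTO_STATEMENT_V1,
    "subject": [{"name": "model.safetensors",
                 "digest": {"sha256": hashlib.sha256(weights).hexdigest()}}],
    "predicateType": SLSA_PROVENANCE_V1,
    "predicate": {
        "buildDefinition": {
            "buildType": "https://builder.example/finetune/v1",
            "externalParameters": {"config": "configs/finetune.yaml"},
            "resolvedDependencies": [
                {"uri": "git+https://github.com/example-org/train",
                 "digest": {"gitCommit": "0123456789abcdef0123456789abcdef01234567"}},
            ],
        },
        "runDetails": {"builder": {"id": "https://builder.example/isolated-runner"}},
    },
}

deps = check_provenance(example, hashlib.sha256(weights).hexdigest(),
                        "https://builder.example/isolated-runner")
print(deps[0]["digest"]["gitCommit"])
```

For real checkpoints the digest would come from a streaming hash of the file rather than an in-memory bytes object.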
Putting the Layers Together
No single control is sufficient, which is the recurring theme of defensible AI architecture. A practical model supply-chain posture stacks:
- Format hygiene. Prefer safetensors over the executable-capable serialization wherever the ecosystem allows; it cannot carry executable payloads. Treat trust_remote_code=True as a privileged operation requiring review, not a convenience flag.
- Scanning. Run a model scanner (picklescan, ModelScan, or a commercial equivalent) as a fast first filter, knowing that it inspects attacker-controlled content and can be evaded, per nullifAI.
- Signature verification at every boundary. Verify against a specific expected identity in the load path, not once at download.
- Provenance attestation. Require SLSA-style build provenance for artifacts you produce, and prefer published models that ship it.
- Runtime isolation. Load and run models in a sandbox with least privilege and no outbound network by default, so that a verification gap or a signed-but-backdoored model has a constrained blast radius. This is the same containment logic that makes output-side controls hold when an upstream layer is bypassed.
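Why safetensors cannot carry a payload is visible in its layout: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then raw tensor bytes. There are no opcodes or callables anywhere in the format. A minimal stdlib validator sketch assuming that layout; it checks structure only (not, for example, that tensor ranges are non-overlapping), covers only a subset of dtypes, and uses an arbitrary header-size cap. Real loading should still go through the safetensors library itself.

```python
import json
import struct

# Byte widths for a subset of safetensors dtypes (the format defines more).
DTYPE_SIZES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2,
               "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1}
MAX_HEADER_BYTES = 100 * 1024 * 1024  # illustrative cap, not a spec value


def validate_safetensors(blob: bytes) -> dict:
    """Structurally check a safetensors buffer and return its tensor index."""
    if len(blob) < 8:
        raise ValueError("too short")
    (header_len,) = struct.unpack("<Q", blob[:8])
    if header_len > MAX_HEADER_BYTES or 8 + header_len > len(blob):
        raise ValueError("bad header length")
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data_len = len(blob) - 8 - header_len
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    for name, info in tensors.items():
        size = DTYPE_SIZES.get(info["dtype"])
        if size is None:
            raise ValueError(f"{name}: unsupported dtype {info['dtype']}")
        count = 1
        for dim in info["shape"]:
            count *= dim
        start, end = info["data_offsets"]  # relative to the start of tensor data
        if not 0 <= start <= end <= data_len or end - start != count * size:
            raise ValueError(f"{name}: offsets do not match dtype and shape")
    return tensors


def build_safetensors(name: str, dtype: str, shape: list[int], data: bytes) -> bytes:
    # Tiny single-tensor writer, for illustration only.
    header = {name: {"dtype": dtype, "shape": shape, "data_offsets": [0, len(data)]}}
    encoded = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(encoded)) + encoded + data


blob = build_safetensors("w", "F32", [2, 2], struct.pack("<4f", 1.0, 2.0, 3.0, 4.0))
print(validate_safetensors(blob)["w"]["shape"])  # [2, 2]
```

A pickle file fed to the same function fails immediately, because its first eight bytes decode to a header length that does not fit the buffer.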
The order matters: cheap filters first, cryptographic verification before the bytes are touched, and isolation as the assumption-of-failure backstop. The supply-chain controls are now mature enough — Model Signing v1.0, Sigstore, SLSA — that the question for most teams is not whether the tooling exists but whether the load path actually invokes it before the deserializer runs.
Sources
- Launch of Model Signing v1.0, OpenSSF AI/ML Working Group: announcement of the v1.0 release (April 2025), authored by contributors from Google, NVIDIA, and HiddenLayer; covers the library/CLI, supported PKI backends, and the verify-at-every-boundary model.
- sigstore/model-transparency (GitHub): the model-signing library and CLI; documents Sigstore keyless, keypair, certificate, and PKCS#11 signing and the manifest/DSSE signature format.
- Malicious ML models discovered on Hugging Face, ReversingLabs: primary reporting on the nullifAI technique (February 2025), covering reverse-shell payloads in corrupted serialization files that evaded picklescan.
- SLSA (Supply-chain Levels for Software Artifacts): the framework specification, including the level definitions and the in-toto provenance attestation format.