
Supply Chain Security: Did You Build That Model?

Dec 30, 2025 • 20 min read

SolarWinds taught the software world that supply chain attacks are devastatingly effective: rather than attacking the hardened target, attackers compromise the build process. In ML, this risk is amplified. A model weight file is just a large binary blob — you can't audit it visually. A malicious actor who compromises your model registry, dataset hosting, or training infrastructure could inject backdoors: specific trigger phrases that cause the model to output malicious instructions, exfiltrate data, or bypass safety filters. This post walks through how Sigstore and Cosign bring cryptographic provenance to ML artifacts: container images, model weights, and SBOMs.

1. The ML Supply Chain Attack Surface

  • Training data poisoning: Inject malicious samples into the training dataset that create backdoor triggers in the final model
  • Weight replacement: Replace model.safetensors in the model registry with a backdoored version after the legitimate team pushes
  • Docker image tampering: Modify the serving container between the data science team's push and the production deployment
  • Dependency confusion: Publish a malicious package to PyPI with the same name as an internal training library
  • Pickle exploits: PyTorch's .pt/.pkl format can execute arbitrary Python on load — a vector for code injection

The safetensors format mitigates the pickle exploit (it stores raw tensors and metadata and cannot execute code on load), but it doesn't solve provenance: how do you know the safetensors file you just downloaded is the one your team produced?
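As a baseline before any signing infrastructure, you can at least pin an artifact to a known hash before loading it. This detects tampering in transit, though unlike a signature it says nothing about who produced the file. A minimal sketch (the expected hash is assumed to come from a trusted channel, e.g. your deployment config):

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so multi-GB weight files never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_pinned_hash(path: str, expected_sha256: str) -> None:
    """Raise if the file on disk doesn't match the pinned hash."""
    actual = sha256_file(path)
    if actual != expected_sha256:
        raise RuntimeError(f"Hash mismatch for {path}: got {actual}")
```

This is integrity without provenance; the sections below add the "who signed it" half with Cosign.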

2. Cosign: Signing Container Images

# Install cosign
brew install cosign  # macOS
# Or download from: github.com/sigstore/cosign/releases

# Method 1: Long-lived key pair (good for offline/air-gapped environments)
cosign generate-key-pair
# Creates: cosign.key (private, keep secret) + cosign.pub (public, safe to share)

# Build and push your AI serving container
docker build -t registry.example.com/my-llm-server:v1.2 .
docker push registry.example.com/my-llm-server:v1.2

# Sign the image digest (signs the SHA256, not the mutable tag)
cosign sign --key cosign.key registry.example.com/my-llm-server:v1.2
# The signature is stored in the OCI registry alongside the image
# → registry.example.com/my-llm-server:sha256-abc123...sig

# Verify before deployment
cosign verify --key cosign.pub registry.example.com/my-llm-server:v1.2
# Output if valid:
# [{"critical":{"identity":{"docker-reference":"registry.example.com/my-llm-server"},
#   "image":{"docker-manifest-digest":"sha256:abc123..."},"type":"cosign container image signature"},
#   "optional":null}]
# Exit code 0 = valid, exit code 1 = invalid/tampered

# Method 2: Keyless signing with GitHub Actions OIDC (no key management!)
cosign sign --yes registry.example.com/my-llm-server:v1.2
# In a GitHub Actions workflow, cosign automatically authenticates using the
# OIDC token from the GitHub Actions environment — no secret key to manage!
# Signature is recorded in the public Rekor transparency log
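Wired into CI, keyless signing needs only the `id-token: write` permission so cosign can request an OIDC token from GitHub. A hedged sketch of a GitHub Actions job (registry host, image name, and action versions are placeholders for your own setup):

```yaml
name: build-sign-push
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write   # lets cosign obtain a short-lived OIDC identity token
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: sigstore/cosign-installer@v3
      - name: Build and push
        run: |
          docker build -t registry.example.com/my-llm-server:${GITHUB_SHA} .
          docker push registry.example.com/my-llm-server:${GITHUB_SHA}
      - name: Sign (keyless, no key to manage)
        run: cosign sign --yes registry.example.com/my-llm-server:${GITHUB_SHA}
```

The resulting signature is tied to this repository and workflow identity, which is exactly what the keyless `identities` block in section 3 matches against.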

3. Kubernetes Admission Controller with Policy Controller

# Sigstore Policy Controller: Kubernetes webhook that rejects unsigned images
# Install via Helm
helm repo add sigstore https://sigstore.github.io/helm-charts
helm install policy-controller sigstore/policy-controller \
    --namespace cosign-system \
    --create-namespace

# Define which images must be signed with which key
# ClusterImagePolicy.yaml:
apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: require-ml-signatures
spec:
  images:
    - glob: "registry.example.com/my-llm-server**"  # Match all versions
    - glob: "registry.example.com/embedding-model**"
  authorities:
    - key:
        kms: "azurekms://your-vault/cosign-key"  # Key in Azure Key Vault
    # OR for keyless:
    # - keyless:
    #     identities:
    #       - issuer: https://token.actions.githubusercontent.com
    #         subject: https://github.com/your-org/your-repo/.github/workflows/build.yml@refs/heads/main

# Apply the policy
kubectl apply -f ClusterImagePolicy.yaml

# Now any Pod that tries to use an unsigned image will be rejected:
# Error: admission webhook "policy.sigstore.dev" denied the request:
# Image registry.example.com/my-llm-server:v1.2 is not signed with a valid key
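Belt and suspenders: even with the policy in place, reference images by digest in your manifests so a mutable tag can't be silently repointed between verification and deployment. A sketch (the digest shown is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-server
spec:
  containers:
    - name: server
      # Pinning by digest means the image content is immutable;
      # replace the placeholder with the real sha256 from your registry.
      image: registry.example.com/my-llm-server@sha256:abc123...
```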

4. Signing Model Weight Files

# Sign individual model files (safetensors, GGUF, ONNX, etc.)
# This is critical for model zoos and HuggingFace Hub artifacts

# Sign a model file (the bundle carries the signature + certificate chain;
# note: a trailing comment after a backslash continuation would break the command)
cosign sign-blob \
    --key cosign.key \
    --bundle model_v1.2.bundle \
    model.safetensors
# Creates: model_v1.2.bundle (JSON file with signature + public key info)

# Distribute the .bundle file alongside the model weights on your model registry

# Verify before loading the model
cosign verify-blob \
    --key cosign.pub \
    --bundle model_v1.2.bundle \
    model.safetensors
# Exit 0 = verified, Exit 1 = tampered or unsigned

# Integrate into your model loading code:
import subprocess


class SecurityError(Exception):
    """Raised when a model artifact fails signature verification."""


def secure_load_model(model_path: str, bundle_path: str, pubkey_path: str):
    """Load model only after verifying its cryptographic signature."""
    result = subprocess.run([
        "cosign", "verify-blob",
        "--key", pubkey_path,
        "--bundle", bundle_path,
        model_path,
    ], capture_output=True, text=True)

    if result.returncode != 0:
        raise SecurityError(f"Model signature verification FAILED: {result.stderr}")

    print(f"✓ Model signature verified for {model_path}")
    # Only load the model after successful verification.
    # load_weights is whatever loader your framework provides,
    # e.g. safetensors.torch.load_file
    return load_weights(model_path)

5. Software Bill of Materials (SBOM) for Models

# SBOM = complete inventory of what went into building a model
# Includes: training data hashes, Python package versions, hardware used, etc.

# Generate SBOM for a Docker container image
syft registry.example.com/my-llm-server:v1.2 -o spdx-json > sbom.spdx.json

# Attach SBOM to the container image in the registry
cosign attach sbom --sbom sbom.spdx.json registry.example.com/my-llm-server:v1.2

# Sign the SBOM itself
cosign sign --attachment sbom --key cosign.key registry.example.com/my-llm-server:v1.2

# Python: Generate a model-specific SBOM (tracks training provenance)
import hashlib
from pathlib import Path

def create_model_sbom(model_path: str, training_config: dict, data_sources: list):
    """Create an SPDX-style SBOM for an ML model documenting its provenance.

    data_sources: list of dicts, each with a "hash" key identifying a dataset.
    """
    model_hash = hashlib.sha256(Path(model_path).read_bytes()).hexdigest()

    sbom = {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "name": "ML Model SBOM",
        "packages": [{
            "name": Path(model_path).name,
            # SPDX packages carry a list of checksum objects
            "checksums": [{"algorithm": "SHA256", "checksumValue": model_hash}],
            "comment": "Trained model weights",
        }],
        "relationships": [{
            "spdxElementId": "model",
            "relationshipType": "GENERATED_FROM",
            "relatedSpdxElement": f"dataset:{ds['hash']}",
        } for ds in data_sources],
        # Strict SPDX uses a list of annotation objects; this free-form dict
        # is a pragmatic extension for internal tooling.
        "annotations": {
            "trainingConfig": training_config,
            "trainingEnvironment": {
                # Record the actual environment at training time;
                # these values are illustrative placeholders.
                "pythonVersion": "3.11.0",
                "torchVersion": "2.1.0",
                "hardwareType": "NVIDIA A100 80GB",
            },
            "trainedBy": "data-science-team@company.com",
        },
    }
    return sbom
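On the consuming side, the recorded checksum lets you confirm that the artifact you downloaded is the one the SBOM documents. A minimal sketch, assuming the SBOM stores the model hash in its first package entry (it accepts either a single `checksum` object or an SPDX-style `checksums` list):

```python
import hashlib
import json
from pathlib import Path


def verify_model_against_sbom(model_path: str, sbom_path: str) -> bool:
    """Return True iff the model file's SHA256 matches the SBOM's record."""
    sbom = json.loads(Path(sbom_path).read_text())
    pkg = sbom["packages"][0]
    # Tolerate both a single checksum object and a checksums list
    entries = pkg.get("checksums") or [pkg["checksum"]]
    recorded = entries[0]["checksumValue"]
    actual = hashlib.sha256(Path(model_path).read_bytes()).hexdigest()
    return actual == recorded
```

Pair this with a `cosign verify-blob` of the SBOM file itself, so the checksum you're trusting is also signed.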

Frequently Asked Questions

Is Sigstore the same as GPG signing?

Both provide cryptographic signatures, but Sigstore is significantly more developer-friendly. GPG requires key management, key distribution, Web of Trust, and manual expiry handling — a significant operational burden. Sigstore's keyless signing uses short-lived certificates issued by a trusted CA (Fulcio) tied to your OIDC identity (GitHub, Google, Microsoft account), with signatures stored in a public transparency log (Rekor). This provides non-repudiation without long-lived key management. For ML teams, keyless signing in CI/CD pipelines means zero key rotation overhead.

What's the risk of using torch.load()?

PyTorch's default torch.load() uses Python's pickle format, which can execute arbitrary code during deserialization. A malicious actor who replaces your model.pt file can achieve remote code execution on every machine that loads the model. The fix: use torch.load(path, weights_only=True) (available since PyTorch 1.13 and the default from 2.6), which restricts deserialization to tensors and a small allowlist of types, or migrate to the safetensors format entirely. Never load .pkl or .pt files from untrusted sources without signature verification.
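The danger is easy to demonstrate with the standard library alone: unpickling executes whatever callable the payload names via `__reduce__`. A minimal, deliberately harmless sketch:

```python
import pickle


class Malicious:
    """A payload whose deserialization executes an attacker-chosen callable."""

    def __reduce__(self):
        # A real attack would name something like os.system with a shell
        # command; here we just prove that code runs during loading.
        return (print, ("pwned: this ran inside pickle.loads()",))


payload = pickle.dumps(Malicious())
result = pickle.loads(payload)  # executes print(...); no Malicious object comes back
```

Note that `pickle.loads` never returns a `Malicious` instance at all; it returns the result of the injected call, which is exactly why a signed or safetensors-based pipeline matters.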

Conclusion

ML supply chain security is an emerging discipline but the tooling is mature. Cosign and Sigstore provide cryptographic signing for both containers and model weight files. Kubernetes Policy Controller enforces that only signed images run in your cluster. SBOMs document model provenance for compliance and incident response. The minimum viable secure ML pipeline: always use weights_only=True or safetensors, sign your container images in CI/CD with keyless Cosign, and implement Policy Controller to prevent unsigned images from running in production.

Written by

Vivek

AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.

GPT-4o · LangChain · Next.js · Vector DBs · RAG · Vercel AI SDK