opncrafter

Zero Trust AI: Who is this Agent?

Dec 30, 2025 • 20 min read

In a microservice architecture, API keys authenticate services. In an agentic architecture, agents invoke other agents, call tools, and access databases without a human in the loop. Hardcoding static API keys into every agent is a dangerous security pattern: a leaked key grants access until someone notices and revokes it, key rotation requires redeploying every agent, and you can't determine which agent actually made a specific API call. The solution is workload identity: short-lived cryptographic credentials that identify each agent by its attested deployment context rather than by a hardcoded secret.

1. The Problem with Static API Keys in Agent Systems

  • Key sprawl: 10 agents × 5 APIs each = 50 static secrets to manage, rotate, and audit
  • Over-privileged access: A leaked key for the Agent-A → Database connection gives an attacker everything that key permits, which is often full database access
  • Painful rotation: Rotating keys means redeploying every agent that holds them, so in practice rotation is rare
  • No attribution: "Which agent made this database query at 3am?" is impossible to answer when agents share API keys
  • Lateral movement risk: If an agent is compromised, the attacker gets all of its stored credentials

2. SPIFFE: The Standard for Workload Identity

SPIFFE (Secure Production Identity Framework for Everyone) is a CNCF standard that assigns every workload (container, pod, function) a verified, short-lived cryptographic identity:

# SPIFFE Identity URI format:
# spiffe://<trust-domain>/<path-identifying-the-workload>

# Examples:
spiffe://company.ai/production/agents/research-agent
spiffe://company.ai/production/agents/code-execution-agent
spiffe://company.ai/staging/tools/web-search-tool

# Each identity is backed by an SVID (SPIFFE Verifiable Identity Document)
# An SVID is an X.509 certificate containing the SPIFFE URI in the SAN field
# Properties:
# - Short-lived: Expires in 1 hour by default (auto-rotated by SPIRE agent)
# - Attested: Issued only after the platform vouches for the workload
#   (e.g. Kubernetes ServiceAccount; nodes can additionally be attested via TPM)
# - No secrets stored in code: Certificate fetched from local SPIRE agent
# - Fine-grained: Each agent gets its own distinct identity
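
To make the URI format concrete, here is a minimal sketch that splits a SPIFFE ID into its trust domain and workload path using only the standard library. The `parse_spiffe_id` function and `ParsedSpiffeId` class are hypothetical names for illustration, and the checks are a simplified subset of what the SPIFFE ID specification requires.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ParsedSpiffeId:
    trust_domain: str
    path: str


def parse_spiffe_id(uri: str) -> ParsedSpiffeId:
    """Split a SPIFFE ID into trust domain and path (simplified validation)."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID (scheme is {parts.scheme!r}): {uri}")
    if not parts.netloc:
        raise ValueError(f"missing trust domain: {uri}")
    # SPIFFE IDs carry no userinfo, port, query, or fragment.
    if "@" in parts.netloc or ":" in parts.netloc or parts.query or parts.fragment:
        raise ValueError(f"unexpected URI component in SPIFFE ID: {uri}")
    return ParsedSpiffeId(trust_domain=parts.netloc, path=parts.path)


sid = parse_spiffe_id("spiffe://company.ai/production/agents/research-agent")
print(sid.trust_domain)  # company.ai
print(sid.path)          # /production/agents/research-agent
```

Splitting the ID this way is handy for coarse checks such as "only accept callers from my own trust domain" before applying finer-grained policy.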

3. SPIRE: The Runtime Implementation

# SPIRE (SPIFFE Runtime Environment) has two components:
# - SPIRE Server: Central CA — issues certificates
# - SPIRE Agent: Runs on each node — fetches certs for local workloads

# Install on Kubernetes with Helm
helm repo add spiffe https://spiffe.github.io/helm-charts/
helm install spire spiffe/spire \
    --namespace spire \
    --set "global.spire.trustDomain=company.ai"

# Register your AI agent workload
kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://company.ai/production/agents/research-agent \
    -parentID spiffe://company.ai/ns/spire/sa/spire-agent \
    -selector k8s:pod-label:app:research-agent \
    -selector k8s:ns:production
# SPIRE will now issue certificates to any pod in the 'production' namespace
# with the label 'app: research-agent'
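
The two `-selector` flags combine with AND: a workload only receives the identity if every selector on the entry was attested for it. A minimal sketch of that matching rule follows; `entry_matches` and the example selector sets are hypothetical and only model the logic, they are not SPIRE code.

```python
def entry_matches(entry_selectors: set, attested_selectors: set) -> bool:
    """An entry applies only if all of its selectors were attested for the workload."""
    return entry_selectors <= attested_selectors


# Selectors from the registration entry above.
entry = {"k8s:pod-label:app:research-agent", "k8s:ns:production"}

# Hypothetical selector sets the SPIRE agent might attest for two pods.
prod_pod = {"k8s:ns:production", "k8s:pod-label:app:research-agent", "k8s:sa:default"}
staging_pod = {"k8s:ns:staging", "k8s:pod-label:app:research-agent"}

print(entry_matches(entry, prod_pod))     # True
print(entry_matches(entry, staging_pod))  # False
```

This is why the same label in the staging namespace does not get the production identity: the namespace selector is missing from what was attested.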

4. Fetching SVID Identity in Python Agents

import json
import ssl
import urllib.request

from pyspiffe.workloadapi import DefaultWorkloadApiClient

# No hardcoded credentials! SPIRE agent on localhost handles auth.
def get_my_identity():
    """Fetch this agent's X.509 SVID from the local SPIRE agent."""
    client = DefaultWorkloadApiClient("unix:///tmp/spire-agent/public/api.sock")
    context = client.fetch_x509_context()
    svid = context.default_svid
    
    print(f"My SPIFFE ID: {svid.spiffe_id}")
    # → spiffe://company.ai/production/agents/research-agent
    
    print(f"Certificate expires: {svid.leaf.not_valid_after}")
    # The SPIRE agent renews the SVID well before this time
    # (by default around half of its lifetime)
    
    return svid

# Use SVID for mTLS (mutual TLS) when calling other agents
def call_agent_with_mtls(target_url: str, payload: dict):
    svid = get_my_identity()
    
    # Create SSL context with our certificate + trust bundle.
    # Python's ssl module needs PEM file paths. The three *_pemfile attributes
    # below are placeholders, not pyspiffe API: export the SVID chain, private
    # key, and trust bundle to files first (accessor names, and whether they are
    # properties or methods, differ between pyspiffe versions).
    ssl_context = ssl.create_default_context()
    ssl_context.load_cert_chain(
        certfile=svid.leaf_pemfile,     # Our identity certificate
        keyfile=svid.private_key_pemfile,
    )
    ssl_context.load_verify_locations(cafile=svid.bundle_pemfile)
    # NOTE: SVIDs identify peers by a URI SAN. Unless the target's SVID also
    # carries a DNS SAN, default hostname checking will reject it; in that case
    # the peer's SPIFFE ID should be verified instead of its hostname.
    
    request = urllib.request.Request(
        target_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, context=ssl_context) as response:
        return json.loads(response.read())
    # Both agents authenticate each other, so no API keys are needed!
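
mTLS proves that the peer holds a certificate from your trust domain, but the receiving side still has to check which workload it is. As a rough sketch, the SPIFFE ID can be read from the URI SAN in the dictionary that `ssl.SSLSocket.getpeercert()` returns for a verified peer. The helper names `spiffe_id_from_peercert` and `require_peer` are hypothetical.

```python
def spiffe_id_from_peercert(peercert: dict) -> str:
    """Extract the SPIFFE ID from a getpeercert()-style dict.

    An X.509-SVID carries exactly one URI SAN, and it must be a spiffe:// URI.
    """
    uris = [value for kind, value in peercert.get("subjectAltName", ()) if kind == "URI"]
    if len(uris) != 1 or not uris[0].startswith("spiffe://"):
        raise ValueError(f"expected exactly one spiffe:// URI SAN, got {uris}")
    return uris[0]


def require_peer(peercert: dict, expected_id: str) -> str:
    """Raise PermissionError unless the peer presents the expected SPIFFE ID."""
    actual = spiffe_id_from_peercert(peercert)
    if actual != expected_id:
        raise PermissionError(f"unexpected peer identity: {actual}")
    return actual


# Shape of the dict returned by SSLSocket.getpeercert() (trimmed to the SAN field).
cert = {"subjectAltName": (("URI", "spiffe://company.ai/production/agents/research-agent"),)}
print(require_peer(cert, "spiffe://company.ai/production/agents/research-agent"))
```

Note that `getpeercert()` returns an empty dict when the peer certificate was not validated, in which case this helper raises rather than silently accepting the connection.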

5. JWT-SVIDs for Service Authorization

# JWT-SVIDs: use when you need to pass identity through HTTP headers,
# e.g. when a load balancer or L7 proxy terminates TLS so end-to-end mTLS is not possible.
# A JWT-SVID is a bearer token, so still send it only over TLS.

from pyspiffe.workloadapi import DefaultWorkloadApiClient

# Calling agent: get a JWT for the target audience
def get_jwt_svid(audience: str) -> str:
    """Get a JWT token identifying this agent to a specific target service."""
    client = DefaultWorkloadApiClient("unix:///tmp/spire-agent/public/api.sock")
    jwt_svid = client.fetch_jwt_svid(audiences=[audience])
    return jwt_svid.token  # Standard JWT string; short-lived (SPIRE's default JWT-SVID TTL is 5 minutes)

# Usage: agent calling the database API
import requests

query = {"sql": "SELECT 1"}  # hypothetical payload, for illustration only
token = get_jwt_svid(audience="spiffe://company.ai/production/database-api")
headers = {"Authorization": f"Bearer {token}"}
response = requests.post("https://database-api/query", headers=headers, json=query, timeout=10)

# Receiving agent: validate the JWT
from pyspiffe.workloadapi import DefaultWorkloadApiClient

def validate_request(token: str) -> str:
    """Validate incoming JWT and return caller's SPIFFE ID."""
    client = DefaultWorkloadApiClient("unix:///tmp/spire-agent/public/api.sock")
    
    # Validates signature against SPIRE trust bundle (auto-fetched)
    jwt_svid = client.validate_jwt_svid(
        token=token,
        audience="spiffe://company.ai/production/database-api",
    )
    
    caller_identity = str(jwt_svid.spiffe_id)  # convert to str to match the return type
    print(f"Request from: {caller_identity}")
    # → "spiffe://company.ai/production/agents/research-agent"
    
    return caller_identity
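
When debugging a rejected token, it helps to see what the JWT-SVID actually claims. The sketch below decodes the payload with the standard library only and checks the `aud` claim; it does not verify the signature, so it is for inspection and never a substitute for `validate_jwt_svid`. The function names and the sample token are hypothetical.

```python
import base64
import json


def peek_jwt_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (debugging only)."""
    segments = token.split(".")
    if len(segments) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = segments[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def audience_matches(claims: dict, expected: str) -> bool:
    """The aud claim may be a single string or a list of strings."""
    aud = claims.get("aud", [])
    if isinstance(aud, str):
        aud = [aud]
    return expected in aud


def _b64(obj: dict) -> str:
    """Helper to build an unsigned sample token for this example."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


sample = ".".join([
    _b64({"alg": "ES256", "typ": "JWT"}),
    _b64({
        "sub": "spiffe://company.ai/production/agents/research-agent",
        "aud": ["spiffe://company.ai/production/database-api"],
        "exp": 1767139200,
    }),
    "signature-placeholder",
])
claims = peek_jwt_claims(sample)
print(claims["sub"])
print(audience_matches(claims, "spiffe://company.ai/production/database-api"))
```

In a JWT-SVID the `sub` claim holds the caller's SPIFFE ID, and a mismatch between `aud` and the audience the receiver validates against is a common cause of rejected tokens.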

6. Policy Enforcement with Open Policy Agent (OPA)

# OPA policy: which agents can call which APIs?
# policy.rego (Rego language)

package agentauth

# Enables the `if` and `in` keywords (OPA 0.59+; default syntax in OPA 1.0)
import rego.v1

# Deny everything by default (zero-trust principle)
default allow := false

# Allow research-agent to query the database (read-only)
allow if {
    input.caller == "spiffe://company.ai/production/agents/research-agent"
    input.resource == "database"
    input.action == "query"
}

# Allow code-execution-agent to run code sandboxes only
allow if {
    input.caller == "spiffe://company.ai/production/agents/code-execution-agent"
    input.resource == "code-sandbox"
    input.action in ["run", "terminate"]
}

# Evaluate policy from Python
import requests

def check_permission(caller_spiffe_id: str, resource: str, action: str) -> bool:
    response = requests.post("http://localhost:8181/v1/data/agentauth/allow", json={
        "input": {
            "caller": caller_spiffe_id,
            "resource": resource,
            "action": action,
        }
    })
    return response.json().get("result", False)
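
One design choice worth making explicit is what happens when the policy engine is unreachable. In a zero-trust setup the safe answer is to deny. Here is a minimal sketch of a fail-closed wrapper; `authorize` is a hypothetical helper, and its `check` parameter stands in for any function with the same shape as `check_permission`.

```python
from typing import Callable


def authorize(
    caller: str,
    resource: str,
    action: str,
    check: Callable[[str, str, str], bool],
) -> bool:
    """Return True only on an explicit allow; any error or odd result means deny."""
    try:
        return check(caller, resource, action) is True
    except Exception:
        # Policy engine down, timeout, malformed response: fail closed.
        return False


def always_allow(caller: str, resource: str, action: str) -> bool:
    return True


def broken_engine(caller: str, resource: str, action: str) -> bool:
    raise ConnectionError("policy engine unreachable")


agent = "spiffe://company.ai/production/agents/research-agent"
print(authorize(agent, "database", "query", always_allow))   # True
print(authorize(agent, "database", "query", broken_engine))  # False
```

The `is True` comparison is deliberate: a truthy but unexpected value, such as a non-empty error string, is treated as a denial.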

Frequently Asked Questions

Is SPIFFE/SPIRE overkill for small agent systems?

For fewer than 5 agents running in a controlled environment, SPIFFE adds operational complexity that may not be justified. A simpler alternative is per-agent API keys with short TTLs, issued from a secrets manager (AWS Secrets Manager, HashiCorp Vault). SPIFFE becomes essential when you have 10+ agents, need automatic certificate rotation, must provide cryptographic attribution of every action for compliance, or run agents across multiple Kubernetes clusters.

How does SPIFFE work for locally-run agents (not Kubernetes)?

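The SPIRE agent exposes the Workload API over a Unix domain socket (/tmp/spire-agent/public/api.sock in the examples above), so nothing in the client code is Kubernetes-specific. For local development, run a SPIRE server and agent on your machine and register workloads with the unix workload attestor, whose selectors match on Unix user or group, binary path, or the binary's SHA-256 hash. Native processes work this way without Kubernetes; for Docker containers, SPIRE provides a separate docker workload attestor that matches on container labels and image ID.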
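
Because the socket path differs between a laptop and a cluster, it is worth resolving it from the environment instead of hardcoding it. The SPIFFE Workload Endpoint specification defines the `SPIFFE_ENDPOINT_SOCKET` environment variable for this purpose. The sketch below is one way to read it; `workload_api_address` is a hypothetical helper, and the fallback path is simply the one used in this article.

```python
import os
from typing import Mapping, Optional

DEFAULT_SOCKET = "unix:///tmp/spire-agent/public/api.sock"  # path used in this article


def workload_api_address(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the Workload API address from SPIFFE_ENDPOINT_SOCKET, with a fallback."""
    env = os.environ if env is None else env
    address = env.get("SPIFFE_ENDPOINT_SOCKET", DEFAULT_SOCKET)
    # The spec allows unix:// and tcp:// endpoint addresses.
    if not address.startswith(("unix://", "tcp://")):
        raise ValueError(f"unsupported Workload API address: {address}")
    return address


print(workload_api_address({}))
print(workload_api_address({"SPIFFE_ENDPOINT_SOCKET": "unix:///run/spire/sockets/agent.sock"}))
```

The resolved address can then be passed to the Workload API client, so the same agent code runs unchanged locally and in Kubernetes.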

Conclusion

Zero Trust architecture for AI agents replaces the fragile pattern of hardcoded API keys with cryptographic workload identities that automatically rotate, provide fine-grained attribution, and enable policy-based access controls. As agent systems grow in complexity — multiple autonomous agents calling each other, accessing external tools, and making consequential decisions — the security model must be proportionally robust. SPIFFE/SPIRE provides the CNCF-standardized foundation that production AI systems need as they move beyond prototype.

👨‍💻
Written by

Vivek

AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.

GPT-4o, LangChain, Next.js, Vector DBs, RAG, Vercel AI SDK