opncrafter

Dockerizing AI Agents

Dec 30, 2025 • 20 min read

"Works on my machine" is a death sentence for production software, and AI agents carry extra risk: dependency conflicts between vector libraries, CUDA versions, Python versions, and system packages. Docker eliminates all of that by packaging your agent, its dependencies, and its runtime environment into a single portable artifact that runs identically everywhere from your laptop to AWS ECS to Google Cloud Run.

1. Why Containerizing AI Agents Is Harder Than Normal Apps

Standard web apps have straightforward dependency trees. AI agents often require:

  • System-level C++ libraries: ChromaDB, FAISS, and other vector libraries need build-essential, cmake, and sometimes libopenblas-dev
  • Large models at startup: If your agent loads a local embedding model (like BGE-M3), container startup can take 30-90 seconds
  • GPU support (optional): CUDA-enabled images are 5-10x larger than CPU images — only add it if you actually need local GPU inference
  • Secrets from multiple providers: OpenAI keys, Pinecone keys, database URLs — all must be injected securely at runtime

2. The Production Python Dockerfile

# Dockerfile (Python AI Agent)
# Stage 1: Build dependencies (heavier image, not deployed)
FROM python:3.11-slim AS builder

WORKDIR /app

# Install system dependencies required by vector/AI libs
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    libopenblas-dev \
    pkg-config \
    && rm -rf /var/lib/apt/lists/*  # Clean up to reduce layer size

# Install Python deps into a virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: Runtime image (smaller, only what's needed to run)
FROM python:3.11-slim AS runtime

WORKDIR /app

# Copy the pre-built virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Copy only application code (not build tools)
COPY src/ ./src/
COPY main.py .

# Run as non-root user for security
RUN useradd --create-home appuser
USER appuser

# Health check: lets orchestrators detect unhealthy containers
# (assumes `requests` is in requirements.txt and the app serves /health)
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:8000/health', timeout=5).raise_for_status()"

EXPOSE 8000
CMD ["python", "main.py"]
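
The HEALTHCHECK above assumes main.py serves a /health endpoint on port 8000. If your framework doesn't already provide one, here is a minimal stdlib-only sketch; the handler class and JSON body are illustrative choices, not a required shape:

```python
# Minimal /health endpoint using only the standard library (illustrative)
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = b'{"status": "ok"}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        # Silence per-request access logs so container logs stay readable
        pass


if __name__ == "__main__":
    # Bind 0.0.0.0 so the port is reachable from outside the container
    HTTPServer(("0.0.0.0", 8000), HealthHandler).serve_forever()
```

In a real agent you would run this on a background thread (or use your web framework's route) so the health check stays responsive while the agent works.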

3. Managing Secrets Properly

Never bake secrets into your Docker image. If you run docker history my-agent, every layer command is visible — including any ENV API_KEY=sk-... you set during build. Instead:

For Local Development: .env files

# .env (never commit this file)
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...
DATABASE_URL=postgresql://...

# docker-compose.yml
services:
  agent:
    build: .
    env_file: .env        # Injects all variables at runtime
    ports:
      - "8000:8000"
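
However the variables are injected (env_file locally, a secrets manager in production), the agent reads them the same way. A fail-fast loader sketch, where require_env is an illustrative helper and the variable names match the .env above:

```python
import os


def require_env(name: str) -> str:
    """Read a required setting, failing loudly at startup instead of mid-request."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


if __name__ == "__main__":
    # Validate everything at boot: a crash here is far easier to debug
    # than a 401 from the OpenAI API deep inside an agent run
    openai_key = require_env("OPENAI_API_KEY")
    pinecone_key = require_env("PINECONE_API_KEY")
    database_url = require_env("DATABASE_URL")
```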

For Production: AWS Secrets Manager or GCP Secret Manager

# ECS Task Definition — inject from AWS Secrets Manager
"secrets": [
  {
    "name": "OPENAI_API_KEY",
    "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:prod/openai-key"
  }
]
# The ECS agent retrieves the secret at container startup
# Your app reads it as a normal environment variable

4. Multi-Stage Builds for Smaller Images

Multi-stage builds are critical for AI apps. A naive Python image with PyTorch and transformers can easily hit 8-15 GB. A well-crafted multi-stage build gets the same app under 2 GB:

| Approach | Image Size | Build Time |
|---|---|---|
| python:3.11 (base, no optimizations) | ~8 GB with ML libs | 10+ min (cold) |
| python:3.11-slim + multi-stage | ~1.2 GB | 5-7 min (cold) |
| python:3.11-slim + multi-stage + cache | ~1.2 GB | 90s (warm cache) |
| distroless + binary wheels | ~600 MB | 8 min (cold) |
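
The "+ cache" row assumes BuildKit cache mounts, which persist pip's wheel cache on the build host without it ever entering an image layer. A sketch of the builder-stage change (note you drop --no-cache-dir here, since the point is to keep pip's cache between builds):

```dockerfile
# syntax=docker/dockerfile:1
# Requires BuildKit (the default builder in recent Docker versions).
# The cache mount lives on the build host, not in any layer, so repeat
# builds skip re-downloading large wheels like torch.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```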

5. Docker Compose for Local Multi-Service Development

# docker-compose.yml — full local stack
services:
  agent:
    build: .
    ports:
      - "8000:8000"
    env_file: .env
    depends_on:
      chroma:
        condition: service_healthy
      redis:
        condition: service_healthy
    volumes:
      - ./src:/app/src   # Hot-reload code changes in dev

  # ChromaDB vector database
  chroma:
    image: chromadb/chroma:latest
    ports:
      - "8001:8000"
    volumes:
      - chroma_data:/chroma/chroma
    healthcheck:
      # Verify curl exists in the Chroma image you pin; the heartbeat path varies by version
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/heartbeat"]
      interval: 10s
      timeout: 5s
      retries: 5

  # Redis for rate limiting and session caching
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s

volumes:
  chroma_data:
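
depends_on with condition: service_healthy orders startup, but the agent can still race its dependencies after a container restart. A defensive startup sketch using only the standard library; wait_for_port is an illustrative helper, and the hosts/ports match the compose file above:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP port accepts connections, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(0.5)
    return False


if __name__ == "__main__":
    # Service names resolve via Docker's internal DNS on the compose network.
    # Note chroma is port 8000 *inside* the network; 8001 is only the host mapping.
    for service, port in [("chroma", 8000), ("redis", 6379)]:
        if not wait_for_port(service, port):
            raise SystemExit(f"{service}:{port} never became reachable")
```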

6. Node.js Agent Dockerfile

# Dockerfile (Node.js AI Agent — e.g., LangChain.js)
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev   # production deps only (npm's --only=production flag is deprecated)

FROM node:20-alpine AS runtime
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY src/ ./src/
COPY package.json .

RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

HEALTHCHECK --interval=30s --timeout=10s \
    CMD wget -qO- http://localhost:3000/health || exit 1

EXPOSE 3000
CMD ["node", "src/index.js"]

7. Deploying to AWS ECS (Fargate)

# Build and push to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS \
  --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com

docker build -t my-agent .
docker tag my-agent:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/my-agent:latest
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/my-agent:latest

# ECS handles: container scheduling, health checks, auto-restart on failure,
# rolling deployments (zero-downtime), and auto-scaling rules

8. Optimizing Startup Time for Large Models

If your agent loads an embedding model at startup (e.g., for local embedding without the OpenAI API), cold start time can be 30-90 seconds. Strategies to reduce this:

  • Pre-bake the model into the image: COPY models/ ./models/ in your Dockerfile — avoids downloading at runtime but increases image size
  • EFS / persistent volumes: Mount a pre-populated EFS volume in ECS — the model files are cached on the volume, not re-downloaded each deployment
  • Increase start-period in health checks: Set --start-period=120s so ECS doesn't kill the container before the model loads
  • Use OpenAI embeddings in production: Offload embedding to the API and avoid loading models locally entirely
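
Pre-baking and EFS control where the weights live; you can also control when they load. A lazy-singleton sketch in which load_model is a stand-in for whatever heavy loader you actually use (e.g. a sentence-transformers model), not a real API:

```python
import functools
import time


def load_model():
    """Stand-in for a slow loader (downloading + initializing model weights).
    Replace with your real loading code; the sleep simulates the cost."""
    time.sleep(0.1)
    return object()


@functools.lru_cache(maxsize=1)
def get_embedder():
    """Lazy singleton: the first call pays the load cost, later calls are free.
    The container can pass its health check before any model is in memory."""
    return load_model()
```

The trade-off: lazy loading moves the cold-start cost from boot to the first request. If you would rather pay it at boot, call get_embedder() once at startup and raise --start-period accordingly.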

Frequently Asked Questions

Should I use Docker or serverless for AI agents?

Serverless (AWS Lambda, Vercel Functions) is great for stateless, short-lived API calls. Docker/ECS is better when your agent: loads large models at startup, maintains in-memory state, needs persistent background processes, or runs for more than 15 minutes. For most RAG-based chatbots, serverless is sufficient.

How do I handle CUDA/GPU in Docker?

Use the official NVIDIA CUDA base images (e.g. nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04) and require the NVIDIA Container Toolkit on the host. For AWS deployment, use GPU-enabled ECS instances (p3 or g4dn families). This is only worth the complexity if you're running local LLM inference; otherwise use the OpenAI API.
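
With the toolkit installed on the host, GPU access is granted per container at run time. A quick smoke test (the image tag is an example; pin one matching your CUDA version):

```
# Expose all host GPUs to the container and print what the driver sees
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```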

Conclusion

Containerizing your AI agent is a one-time investment that pays off every deploy. Multi-stage builds keep images lean, proper secrets management keeps them secure, and health checks ensure orchestrators can detect and recover from failures automatically. Once your agent is containerized, deploying to any cloud becomes a solved problem.

Written by Vivek, AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.

GPT-4o · LangChain · Next.js · Vector DBs · RAG · Vercel AI SDK