AWS ECS (Fargate): Serverless Containers
Dec 30, 2025 • 22 min read
You've Dockerized your AI agent. Now where do you run it? EC2 is too manual — you're managing OS patches, instance types, and SSH keys. Kubernetes (EKS) is powerful but requires a dedicated platform engineer. AWS ECS Fargate is the sweet spot: serverless containers where you define your task, and AWS handles server provisioning, scaling, and health checks automatically. You pay only for vCPU and memory consumed.
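To make "pay only for vCPU and memory consumed" concrete, here is a back-of-envelope cost sketch. The per-hour rates are assumptions based on published us-east-1 on-demand Fargate pricing at the time of writing; check the current pricing page before budgeting on them:

```python
# Rough Fargate cost sketch. Rates are assumptions (us-east-1 on-demand,
# Linux/x86); verify against current AWS pricing before relying on them.
VCPU_PER_HOUR = 0.04048   # USD per vCPU-hour
GB_PER_HOUR = 0.004445    # USD per GB of memory per hour

def monthly_cost(vcpu: float, memory_gb: float, tasks: int, hours: float = 730) -> float:
    """Estimated monthly bill for `tasks` always-on tasks of a given size."""
    hourly = tasks * (vcpu * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR)
    return round(hourly * hours, 2)

# The 1 vCPU / 2 GB service configured in this article, at desired_count=2:
print(monthly_cost(vcpu=1, memory_gb=2, tasks=2))
```

Roughly $70-75/month for the two-task service in this article, before NAT gateway and load balancer charges, which are billed separately.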
1. ECS Architecture Overview
| Component | Rough equivalent | Purpose |
|---|---|---|
| Cluster | Kubernetes cluster | Logical grouping of services |
| Task Definition | Pod spec / Dockerfile | Defines container image, CPU, memory, env vars |
| Task | Running Pod | Single running instance of your container |
| Service | Deployment + ReplicaSet | Keeps N tasks running, handles failures + scaling |
| Target Group | Kubernetes Service | Load balancer routes traffic to healthy tasks |
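To make the mapping concrete, here is a minimal Fargate task definition skeleton, shown as a Python dict mirroring the JSON you would pass to `register-task-definition`. All names and values are illustrative, not a working deployment:

```python
# Minimal Fargate task definition skeleton (illustrative values only).
task_definition = {
    "family": "ai-agent",                     # Logical name; revisions increment under it
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",                  # Required for Fargate tasks
    "cpu": "1024",                            # 1 vCPU (Fargate expects strings here)
    "memory": "2048",                         # 2 GB
    "containerDefinitions": [
        {
            "name": "agent",
            "image": "<account>.dkr.ecr.us-east-1.amazonaws.com/ai-agent:latest",
            "portMappings": [{"containerPort": 8080}],
            "essential": True,                # Task stops if this container dies
        }
    ],
}
```

You rarely write this JSON by hand; the CDK code in the next section generates and registers it for you.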
2. Infrastructure as Code with AWS CDK
Never click buttons in the AWS Console for production resources. Use CDK (Python) to define your entire stack as code:
pip install aws-cdk-lib constructs
# agent_stack.py
from aws_cdk import (
    Stack, Duration,
    aws_ec2 as ec2,
    aws_ecs as ecs,
    aws_ecs_patterns as ecs_patterns,
    aws_ssm as ssm,
    aws_logs as logs,
)
from constructs import Construct

class AgentStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Step 1: VPC — 2 AZs for high availability, NAT gateway for outbound calls to OpenAI
        vpc = ec2.Vpc(self, "AgentVPC",
            max_azs=2,
            nat_gateways=1,  # Required for containers to reach external APIs
            subnet_configuration=[
                ec2.SubnetConfiguration(name="Public", subnet_type=ec2.SubnetType.PUBLIC),
                ec2.SubnetConfiguration(name="Private", subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            ],
        )

        # Step 2: ECS Cluster
        cluster = ecs.Cluster(self, "AgentCluster",
            vpc=vpc,
            cluster_name="ai-agents-prod",
            container_insights=True,  # Enables CloudWatch Container Insights
        )

        # Step 3: Application Load Balanced Fargate Service (batteries included)
        service = ecs_patterns.ApplicationLoadBalancedFargateService(
            self, "AgentService",
            cluster=cluster,
            cpu=1024,               # 1 vCPU (valid values: 256, 512, 1024, 2048, 4096)
            memory_limit_mib=2048,  # 2 GB RAM
            desired_count=2,        # Run 2 tasks for HA
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=ecs.ContainerImage.from_asset("./"),  # Builds from local Dockerfile
                container_port=8080,
                log_driver=ecs.LogDrivers.aws_logs(
                    stream_prefix="agent",
                    log_retention=logs.RetentionDays.ONE_WEEK,
                ),
            ),
            public_load_balancer=True,
        )
3. Secrets Management with SSM Parameter Store
Never hardcode API keys in your Dockerfile or environment variables in CDK code. Use SSM Parameter Store and inject at runtime:
# Store secrets first (one-time setup from CLI):
aws ssm put-parameter \
--name "/agents/prod/OPENAI_API_KEY" \
--type SecureString \
--value "sk-..."
aws ssm put-parameter \
--name "/agents/prod/ANTHROPIC_API_KEY" \
--type SecureString \
--value "sk-ant-..."
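Once ECS injects these parameters at task startup (wired up in the CDK snippet below), your application reads them as ordinary environment variables. A minimal fail-fast sketch; `require_env` is a hypothetical helper, not part of any library:

```python
import os

def require_env(name: str) -> str:
    """Read a required secret injected by ECS; fail fast at startup if missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# At app startup:
# openai_key = require_env("OPENAI_API_KEY")
```

Failing at startup is deliberate: a task that boots without its keys will fail its health checks anyway, and a loud error in the logs is easier to diagnose than a 401 deep inside an agent run.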
# In CDK: inject secrets as environment variables at task startup
task_definition = ecs.FargateTaskDefinition(self, "TaskDef", cpu=1024, memory_limit_mib=2048)
container = task_definition.add_container("AgentContainer",
    image=ecs.ContainerImage.from_asset("./"),
    secrets={
        # Container gets OPENAI_API_KEY env var, pulled from SSM at startup
        "OPENAI_API_KEY": ecs.Secret.from_ssm_parameter(
            ssm.StringParameter.from_secure_string_parameter_attributes(
                self, "OpenAIKey",
                parameter_name="/agents/prod/OPENAI_API_KEY",
                version=1,
            )
        ),
        "ANTHROPIC_API_KEY": ecs.Secret.from_ssm_parameter(
            ssm.StringParameter.from_secure_string_parameter_attributes(
                self, "AnthropicKey",
                parameter_name="/agents/prod/ANTHROPIC_API_KEY",
                version=1,
            )
        ),
    },
)
4. Autoscaling Configuration
# Add autoscaling to the service
scaling = service.service.auto_scale_task_count(
    min_capacity=1,   # Always keep 1 task running (avoids cold starts)
    max_capacity=10,
)

# Scale on CPU utilization — good for compute-bound agent workloads
scaling.scale_on_cpu_utilization("CpuScaling",
    target_utilization_percent=70,  # Scale up when CPU > 70%
    scale_in_cooldown=Duration.seconds(60),
    scale_out_cooldown=Duration.seconds(30),
)
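Target tracking sizes the fleet proportionally: desired capacity is roughly the current task count scaled by how far the metric is from its target, rounded up. A sketch of that arithmetic (an approximation of what Application Auto Scaling computes, not its exact internals):

```python
import math

def desired_tasks(current_tasks: int, current_cpu: float, target_cpu: float = 70.0) -> int:
    """Approximate target-tracking math: scale task count proportionally to the metric."""
    return math.ceil(current_tasks * current_cpu / target_cpu)

# Two tasks pinned at 95% CPU against a 70% target scale out to three.
print(desired_tasks(2, 95.0))  # → 3
```

The cooldowns below dampen this: scale-out reacts fast (30s) so traffic spikes don't queue, while scale-in waits longer so a brief lull doesn't kill warm tasks.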
# Scale on request count — better for API gateway patterns
scaling.scale_on_request_count("RequestScaling",
    requests_per_target=100,  # Target ~100 requests per task per minute
    target_group=service.target_group,
    scale_in_cooldown=Duration.seconds(120),
    scale_out_cooldown=Duration.seconds(30),
)
5. Health Checks & Graceful Shutdown
# In your FastAPI/Flask app: implement a health check endpoint
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health_check():
    """The load balancer probes GET /health (every 30s per the target group
    config below). Return 200 = healthy; anything else counts as a failure."""
    # `model` is whatever global your app loads at startup
    return {"status": "healthy", "model_loaded": model is not None}
# CDK: configure the target group health check
service.target_group.configure_health_check(
    path="/health",
    healthy_http_codes="200",
    interval=Duration.seconds(30),
    timeout=Duration.seconds(5),
    unhealthy_threshold_count=3,  # 3 consecutive failed checks = task replaced
)
# Graceful shutdown: ECS sends SIGTERM, then SIGKILL after stopTimeout (30s by default).
# Uvicorn already traps SIGTERM and stops accepting new connections, so hook FastAPI's
# shutdown event instead of installing a raw signal handler (calling
# run_until_complete from a handler fails while the server's loop is running):
@app.on_event("shutdown")
async def graceful_shutdown():
    """Complete in-flight requests before the task stops."""
    print("SIGTERM received, finishing in-flight requests...")
    await shutdown_handler()  # your cleanup coroutine: flush state, close clients
6. Deploying and Updating
# First deploy
cdk bootstrap # One-time: creates CDK S3 bucket for assets
cdk deploy # Builds Docker image, pushes to ECR, deploys stack
# Update (triggers rolling deployment — zero downtime)
# Just push a new commit and re-run:
cdk deploy
# Rolling update process (automatic):
# 1. Build and push new image to ECR
# 2. Register new Task Definition revision
# 3. Start new tasks with the new image (the LB gradually shifts traffic to them)
# 4. Wait for health checks to pass on new tasks
# 5. Stop old tasks — no downtime!
# Monitor deployment
aws ecs describe-services --cluster ai-agents-prod --services AgentService \
  --query 'services[0].{Running:runningCount,Pending:pendingCount,Desired:desiredCount}'
Frequently Asked Questions
ECS Fargate vs EKS — which should I choose?
Choose Fargate when: your team has fewer than 5 engineers, you don't have Kubernetes expertise, you want simple scaling without node management, and you don't need GPU instances. Choose EKS when: you need GPU workloads (Fargate doesn't support GPUs), you have existing Kubernetes expertise, or you need advanced networking/service mesh features that ECS doesn't support.
How do I handle long-running agent tasks (5+ minutes)?
ECS Fargate tasks can run indefinitely — there's no timeout (unlike Lambda's 15-minute limit). For very long agent tasks, implement task queuing: use SQS to queue agent jobs, and have a separate ECS batch service pull and process jobs. This decouples request acceptance from agent execution and makes scaling and retry logic simpler.
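The queue-worker split described above can be sketched with Python's stdlib queue standing in for SQS. In production you would swap in boto3 `receive_message`/`delete_message` calls; `run_agent_job` is a hypothetical handler, not a real API:

```python
import queue

def run_agent_job(job: dict) -> str:
    """Hypothetical stand-in for a long-running agent task."""
    return f"done:{job['id']}"

def worker_loop(jobs: "queue.Queue[dict]", results: list) -> None:
    """Pull jobs until the queue is drained. In SQS terms: receive, process,
    delete. A job that fails is simply never deleted, so SQS redelivers it
    after the visibility timeout (retry logic for free)."""
    while True:
        try:
            job = jobs.get_nowait()   # SQS: receive_message (with long polling)
        except queue.Empty:
            break
        results.append(run_agent_job(job))
        jobs.task_done()              # SQS: delete_message on success

jobs: "queue.Queue[dict]" = queue.Queue()
for i in range(3):
    jobs.put({"id": i})
results: list = []
worker_loop(jobs, results)
print(results)  # → ['done:0', 'done:1', 'done:2']
```

The API-facing service only enqueues and returns a job ID immediately; the worker service scales on queue depth instead of request count.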
Conclusion
AWS ECS Fargate with CDK is the fastest path from a Dockerized AI agent to a production-grade, auto-scaling deployment. The CDK patterns shown here — VPC configuration, SSM secrets injection, health checks, autoscaling — are the same patterns used by production AI companies. Importantly, everything is code: your entire infrastructure is reproducible, version-controlled, and deployable with a single command.