AWS ECS (Fargate): Serverless Containers
Dec 30, 2025 • 22 min read
You've Dockerized your AI agent. Now where do you run it? EC2 is too manual — you're managing OS patches, instance types, and SSH keys. Kubernetes (EKS) is powerful but requires a dedicated platform engineer. AWS ECS Fargate is the sweet spot: serverless containers where you define your task, and AWS handles server provisioning, scaling, and health checks automatically. You pay only for vCPU and memory consumed.
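To make "pay only for vCPU and memory consumed" concrete, here is a back-of-envelope cost sketch. The per-hour rates are assumptions based on published us-east-1 on-demand Fargate pricing at the time of writing; check the current pricing page before budgeting on them:

```python
# Rough Fargate cost sketch. Rates are assumptions (us-east-1 on-demand,
# Linux/x86); verify against current AWS pricing before relying on them.
VCPU_PER_HOUR = 0.04048   # USD per vCPU-hour
GB_PER_HOUR = 0.004445    # USD per GB of memory per hour

def monthly_cost(vcpu: float, memory_gb: float, tasks: int, hours: float = 730) -> float:
    """Estimated monthly bill for `tasks` always-on tasks of a given size."""
    hourly = tasks * (vcpu * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR)
    return round(hourly * hours, 2)

# The 1 vCPU / 2 GB service configured in this article, at desired_count=2:
print(monthly_cost(vcpu=1, memory_gb=2, tasks=2))
```

Roughly $70-75/month for the two-task service in this article, before NAT gateway and load balancer charges, which are billed separately.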
1. ECS Architecture Overview
| Component | Rough equivalent | Purpose |
|---|---|---|
| Cluster | Kubernetes cluster | Logical grouping of services |
| Task Definition | Pod spec / Dockerfile | Defines container image, CPU, memory, env vars |
| Task | Running Pod | Single running instance of your container |
| Service | Deployment + ReplicaSet | Keeps N tasks running, handles failures + scaling |
| Target Group | Kubernetes Service | Load balancer routes traffic to healthy tasks |
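To make the mapping concrete, here is a minimal Fargate task definition skeleton, shown as a Python dict mirroring the JSON you would pass to `register-task-definition`. All names and values are illustrative, not a working deployment:

```python
# Minimal Fargate task definition skeleton (illustrative values only).
task_definition = {
    "family": "ai-agent",                     # Logical name; revisions increment under it
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",                  # Required for Fargate tasks
    "cpu": "1024",                            # 1 vCPU (Fargate expects strings here)
    "memory": "2048",                         # 2 GB
    "containerDefinitions": [
        {
            "name": "agent",
            "image": "<account>.dkr.ecr.us-east-1.amazonaws.com/ai-agent:latest",
            "portMappings": [{"containerPort": 8080}],
            "essential": True,                # Task stops if this container dies
        }
    ],
}
```

You rarely write this JSON by hand; the CDK code in the next section generates and registers it for you.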
2. Infrastructure as Code with AWS CDK
Never click buttons in the AWS Console for production resources. Use CDK (Python) to define your entire stack as code:
pip install aws-cdk-lib constructs
# agent_stack.py
from aws_cdk import (
    Stack, Duration,
    aws_ec2 as ec2,
    aws_ecs as ecs,
    aws_ecs_patterns as ecs_patterns,
    aws_ssm as ssm,
    aws_logs as logs,
)
from constructs import Construct

class AgentStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Step 1: VPC — 2 AZs for high availability, NAT gateway for outbound calls to OpenAI
        vpc = ec2.Vpc(self, "AgentVPC",
            max_azs=2,
            nat_gateways=1,  # Required for containers to reach external APIs
            subnet_configuration=[
                ec2.SubnetConfiguration(name="Public", subnet_type=ec2.SubnetType.PUBLIC),
                ec2.SubnetConfiguration(name="Private", subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            ],
        )

        # Step 2: ECS Cluster
        cluster = ecs.Cluster(self, "AgentCluster",
            vpc=vpc,
            cluster_name="ai-agents-prod",
            container_insights=True,  # Enables CloudWatch Container Insights
        )

        # Step 3: Application Load Balanced Fargate Service (batteries included)
        service = ecs_patterns.ApplicationLoadBalancedFargateService(
            self, "AgentService",
            cluster=cluster,
            cpu=1024,               # 1 vCPU (valid values: 256, 512, 1024, 2048, 4096)
            memory_limit_mib=2048,  # 2 GB RAM
            desired_count=2,        # Run 2 tasks for HA
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=ecs.ContainerImage.from_asset("./"),  # Builds from local Dockerfile
                container_port=8080,
                log_driver=ecs.LogDrivers.aws_logs(
                    stream_prefix="agent",
                    log_retention=logs.RetentionDays.ONE_WEEK,
                ),
            ),
            public_load_balancer=True,
        )
3. Secrets Management with SSM Parameter Store
Never hardcode API keys in your Dockerfile or environment variables in CDK code. Use SSM Parameter Store and inject at runtime:
# Store secrets first (one-time setup from CLI):
aws ssm put-parameter \
--name "/agents/prod/OPENAI_API_KEY" \
--type SecureString \
--value "sk-..."
aws ssm put-parameter \
--name "/agents/prod/ANTHROPIC_API_KEY" \
--type SecureString \
--value "sk-ant-..."
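Once ECS injects these parameters at task startup (wired up in the CDK snippet below), your application reads them as ordinary environment variables. A minimal fail-fast sketch; `require_env` is a hypothetical helper, not part of any library:

```python
import os

def require_env(name: str) -> str:
    """Read a required secret injected by ECS; fail fast at startup if missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# At app startup:
# openai_key = require_env("OPENAI_API_KEY")
```

Failing at startup is deliberate: a task that boots without its keys will fail its health checks anyway, and a loud error in the logs is easier to diagnose than a 401 deep inside an agent run.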
# In CDK: inject secrets as environment variables at task startup
task_definition = ecs.FargateTaskDefinition(self, "TaskDef", cpu=1024, memory_limit_mib=2048)
container = task_definition.add_container("AgentContainer",
    image=ecs.ContainerImage.from_asset("./"),
    secrets={
        # Container gets OPENAI_API_KEY env var, pulled from SSM at startup
        "OPENAI_API_KEY": ecs.Secret.from_ssm_parameter(
            ssm.StringParameter.from_secure_string_parameter_attributes(
                self, "OpenAIKey",
                parameter_name="/agents/prod/OPENAI_API_KEY",
                version=1,
            )
        ),
        "ANTHROPIC_API_KEY": ecs.Secret.from_ssm_parameter(
            ssm.StringParameter.from_secure_string_parameter_attributes(
                self, "AnthropicKey",
                parameter_name="/agents/prod/ANTHROPIC_API_KEY",
                version=1,
            )
        ),
    },
)
4. Autoscaling Configuration
# Add autoscaling to the service
scaling = service.service.auto_scale_task_count(
    min_capacity=1,   # Always keep 1 task running (avoids cold starts)
    max_capacity=10,
)

# Scale on CPU utilization — good for compute-bound agent workloads
scaling.scale_on_cpu_utilization("CpuScaling",
    target_utilization_percent=70,  # Scale up when CPU > 70%
    scale_in_cooldown=Duration.seconds(60),
    scale_out_cooldown=Duration.seconds(30),
)
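Target tracking sizes the fleet proportionally: desired capacity is roughly the current task count scaled by how far the metric is from its target, rounded up. A sketch of that arithmetic (an approximation of what Application Auto Scaling computes, not its exact internals):

```python
import math

def desired_tasks(current_tasks: int, current_cpu: float, target_cpu: float = 70.0) -> int:
    """Approximate target-tracking math: scale task count proportionally to the metric."""
    return math.ceil(current_tasks * current_cpu / target_cpu)

# Two tasks pinned at 95% CPU against a 70% target scale out to three.
print(desired_tasks(2, 95.0))  # → 3
```

The cooldowns below dampen this: scale-out reacts fast (30s) so traffic spikes don't queue, while scale-in waits longer so a brief lull doesn't kill warm tasks.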
# Scale on request count — better for API gateway patterns
scaling.scale_on_request_count("RequestScaling",
    requests_per_target=100,  # Target ~100 requests per task per minute
    target_group=service.target_group,
    scale_in_cooldown=Duration.seconds(120),
    scale_out_cooldown=Duration.seconds(30),
)
5. Health Checks & Graceful Shutdown
# In your FastAPI/Flask app: implement a health check endpoint
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health_check():
    """The load balancer probes GET /health (every 30s per the target group
    config below). Return 200 = healthy; anything else counts as a failure."""
    # `model` is whatever global your app loads at startup
    return {"status": "healthy", "model_loaded": model is not None}
# CDK: configure the target group health check
service.target_group.configure_health_check(
    path="/health",
    healthy_http_codes="200",
    interval=Duration.seconds(30),
    timeout=Duration.seconds(5),
    unhealthy_threshold_count=3,  # 3 consecutive failed checks = task replaced
)
# Graceful shutdown: ECS sends SIGTERM, then SIGKILL after stopTimeout (30s by default).
# Uvicorn already traps SIGTERM and stops accepting new connections, so hook FastAPI's
# shutdown event instead of installing a raw signal handler (calling
# run_until_complete from a handler fails while the server's loop is running):
@app.on_event("shutdown")
async def graceful_shutdown():
    """Complete in-flight requests before the task stops."""
    print("SIGTERM received, finishing in-flight requests...")
    await shutdown_handler()  # your cleanup coroutine: flush state, close clients
6. Deploying and Updating
# First deploy
cdk bootstrap # One-time: creates CDK S3 bucket for assets
cdk deploy # Builds Docker image, pushes to ECR, deploys stack
# Update (triggers rolling deployment — zero downtime)
# Just push a new commit and re-run:
cdk deploy
# Rolling update process (automatic):
# 1. Build and push new image to ECR
# 2. Register new Task Definition revision
# 3. Start new tasks with the new image (the LB gradually shifts traffic to them)
# 4. Wait for health checks to pass on new tasks
# 5. Stop old tasks — no downtime!
# Monitor deployment
aws ecs describe-services --cluster ai-agents-prod --services AgentService \
  --query 'services[0].{Running:runningCount,Pending:pendingCount,Desired:desiredCount}'
Frequently Asked Questions
ECS Fargate vs EKS — which should I choose?
Choose Fargate when: your team has fewer than 5 engineers, you don't have Kubernetes expertise, you want simple scaling without node management, and you don't need GPU instances. Choose EKS when: you need GPU workloads (Fargate doesn't support GPUs), you have existing Kubernetes expertise, or you need advanced networking/service mesh features that ECS doesn't support.
How do I handle long-running agent tasks (5+ minutes)?
ECS Fargate tasks can run indefinitely — there's no timeout (unlike Lambda's 15-minute limit). For very long agent tasks, implement task queuing: use SQS to queue agent jobs, and have a separate ECS batch service pull and process jobs. This decouples request acceptance from agent execution and makes scaling and retry logic simpler.
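The queue-worker split described above can be sketched with Python's stdlib queue standing in for SQS. In production you would swap in boto3 `receive_message`/`delete_message` calls; `run_agent_job` is a hypothetical handler, not a real API:

```python
import queue

def run_agent_job(job: dict) -> str:
    """Hypothetical stand-in for a long-running agent task."""
    return f"done:{job['id']}"

def worker_loop(jobs: "queue.Queue[dict]", results: list) -> None:
    """Pull jobs until the queue is drained. In SQS terms: receive, process,
    delete. A job that fails is simply never deleted, so SQS redelivers it
    after the visibility timeout (retry logic for free)."""
    while True:
        try:
            job = jobs.get_nowait()   # SQS: receive_message (with long polling)
        except queue.Empty:
            break
        results.append(run_agent_job(job))
        jobs.task_done()              # SQS: delete_message on success

jobs: "queue.Queue[dict]" = queue.Queue()
for i in range(3):
    jobs.put({"id": i})
results: list = []
worker_loop(jobs, results)
print(results)  # → ['done:0', 'done:1', 'done:2']
```

The API-facing service only enqueues and returns a job ID immediately; the worker service scales on queue depth instead of request count.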
Conclusion
AWS ECS Fargate with CDK is the fastest path from a Dockerized AI agent to a production-grade, auto-scaling deployment. The CDK patterns shown here — VPC configuration, SSM secrets injection, health checks, autoscaling — are the same patterns used by production AI companies. Importantly, everything is code: your entire infrastructure is reproducible, version-controlled, and deployable with a single command.