Orchestrating Swarms: Multi-Agent Systems
Dec 29, 2025 • 22 min read
Single agents are powerful but limited. They hallucinate, get stuck, and lack specialized knowledge. The future of AI is Swarm Intelligence: small, specialized agents working together.
1. The "Supervisor" Architecture
The most robust pattern for 2025 is the Supervisor Pattern: one "root" LLM (the Supervisor) that does no work itself; it only delegates tasks to specialized workers (tools or simpler LLMs).
The Supervisor
"I need a report on AAPL stock." → Delegates to Researcher.
The Workers
Researcher: Fetches data.
Writer: Drafts blog.
2. Code: Building a Swarm with LangGraph
We can implement this by giving the Supervisor a special tool that lets it "route" execution to other nodes.
# 1. Define the Supervisor Prompt
system_prompt = (
"You are a supervisor tasked with managing a conversation between the"
" following workers: {members}. Given the following user request,"
" respond with the worker to act next. Each worker will perform a"
" task and respond with their results and status. when finished,"
" respond with FINISH."
)
# 2. Define the graph
workflow = StateGraph(AgentState)
workflow.add_node("Supervisor", supervisor_node)
workflow.add_node("Researcher", researcher_node)
workflow.add_node("Coder", coder_node)

# 3. Add edges: every worker reports back to the Supervisor
for member in members:
    workflow.add_edge(member, "Supervisor")

# The Supervisor's 'next' field decides who acts next (or FINISH)
workflow.add_conditional_edges(
    "Supervisor",
    lambda x: x["next"],
    {
        "Researcher": "Researcher",
        "Coder": "Coder",
        "FINISH": END,
    },
)

workflow.set_entry_point("Supervisor")
graph = workflow.compile()
3. Shared State vs Isolated State
How do agents communicate? Two main approaches:
- Blackboard Architecture (Shared): All agents read and write a single central object (the State). Everyone sees everything. Good for small teams (<5 agents).
- Message Passing (Isolated): Agent A sends a DM to Agent B. Agent C knows nothing. Essential for security and preventing token context overflow.
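The two approaches can be contrasted in a minimal, framework-free sketch (the class and field names here are illustrative, not from any particular library):

```python
from typing import List, TypedDict

# Blackboard (shared state): every agent reads and writes one object.
class SharedState(TypedDict):
    messages: List[str]  # visible to all agents

def researcher(state: SharedState) -> SharedState:
    state["messages"].append("Researcher: AAPL closed at ...")
    return state

# Message passing (isolated state): each agent owns a private inbox.
class Agent:
    def __init__(self, name: str):
        self.name = name
        self.inbox: List[str] = []

    def send(self, other: "Agent", msg: str) -> None:
        other.inbox.append(f"{self.name}: {msg}")  # only 'other' sees it

a, b, c = Agent("A"), Agent("B"), Agent("C")
a.send(b, "draft ready")
print(b.inbox)  # ['A: draft ready']
print(c.inbox)  # [] -- C knows nothing, keeping its context small
```

The tradeoff is visible even at this scale: the blackboard grows linearly with every agent's output (and so does every agent's token context), while message passing keeps each inbox small at the cost of explicit routing.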
4. The Problem of "Infinite Loops"
When you put two agents in a room, they often get stuck complimenting each other.
Agent B: "Great code! Anything else?"
Agent A: "No, I am glad you liked it."
Agent B: "I am glad too. Let me know if you need help."
Solution: Implement a termination_condition. For example, checking if the string "TERMINATE" is present, or setting a hard max_turns=10 limit in your Graph.
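A termination check along those lines can be sketched as a plain function (the `TERMINATE` sentinel and `MAX_TURNS` value are the conventions described above, not a library API):

```python
MAX_TURNS = 10  # hard cap: stop even if agents never say the magic word

def should_terminate(messages: list[str], turn: int) -> bool:
    """Stop when an agent emits TERMINATE or the turn budget is spent."""
    if turn >= MAX_TURNS:
        return True
    return bool(messages) and "TERMINATE" in messages[-1]

# The compliment loop gets cut off either way:
assert should_terminate(["Great code! TERMINATE"], turn=3)   # sentinel seen
assert should_terminate(["I am glad too."], turn=10)         # budget spent
assert not should_terminate(["I am glad too."], turn=2)      # keep going
```

Checking only the most recent message (rather than the whole transcript) matters: an agent quoting an earlier "TERMINATE" mid-conversation should not end the run.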
5. Production Patterns for Multi-Agent Systems
Pattern 1: The Validator Loop
Before any agent's output reaches the user, add a dedicated Validator Agent that checks the output against a rubric. This is especially important for code generation (does it actually run?) and factual claims (are they grounded in sources?).
# Validator node in LangGraph
def validator_node(state: AgentState):
    output = state["last_output"]
    verdict = llm.invoke(
        f"""Check if this output meets quality standards.
Output: {output}
Reply PASS if correct, or FAIL: <reason> if not."""
    ).content  # chat models return a message; read its .content
    if verdict.startswith("FAIL"):
        # Route back to the Researcher with the feedback attached
        return {"next": "Researcher", "feedback": verdict}
    return {"next": "FINISH"}
Pattern 2: Parallelization with Map-Reduce
Some tasks can be broken into independent parallel workstreams. For example, researching 5 different companies simultaneously. LangGraph supports "fan-out" edges that spawn parallel agent paths, then a "fan-in" aggregation node that combines results:
import asyncio

# Fan-out: spawn parallel research agents
tasks = []
for company in ["AAPL", "GOOGL", "MSFT", "AMZN", "NVDA"]:
    tasks.append(research_agent.ainvoke({"company": company}))

# Fan-in: aggregate all results
results = await asyncio.gather(*tasks)
final_report = aggregator_agent.invoke({"data": results})
This reduces a 5-company research task from five sequential LLM calls (~25 seconds) to one parallel batch (~8 seconds), roughly a 3x speedup with identical output quality.
Pattern 3: Human-in-the-Loop Checkpoints
Not every decision should be fully automated. For high-stakes actions (sending emails, making API calls, publishing content), pause the workflow and request explicit human approval before proceeding. LangGraph's interrupt_before creates checkpoints where agents pause and wait:
# Pause before executing the action
app = workflow.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["send_email", "make_purchase", "delete_files"]
)

# Agent pauses here; human reviews and resumes
app.invoke(task, config)
# Human reviews in your UI, then resume with: app.invoke(None, config)
6. Real-World Multi-Agent Deployments
AI-Powered Due Diligence (Finance)
Investment firms use 5-10 agent swarms to automate M&A due diligence. A Document Retriever agent pulls relevant sections from thousands of contract pages. A Risk Analyst agent identifies liability clauses and red flags. A Financial Modeler agent extracts numbers and builds projections. A Report Writer agent synthesizes findings. A task that took 3 analysts 2 weeks now takes 4 hours with human oversight at key decision points.
Content Production Pipeline (Media)
Digital publishers use multi-agent pipelines where a Trend Scout identifies viral topics, a Researcher gathers supporting data and quotes, a Writer drafts the article, an Editor refines for style and tone, an SEO Agent optimizes headers and keywords, and a Fact-Checker verifies all claims. What took 6 hours per article now takes 45 minutes with human review at the end.
Software Bug Resolution (Engineering)
Engineering teams use agent swarms for automated debugging. A Log Analyzer agent reads error logs and stack traces. A Code Searcher agent finds the relevant functions in the codebase. A Root Cause Agent hypothesizes the failure mode. A Fix Generator writes a patch. A Test Runner executes the test suite and verifies the fix. For known bug patterns, this resolves P3/P4 tickets end-to-end without any human involvement.
7. Choosing Your Framework
| Framework | Best For | Learning Curve | Production Ready |
|---|---|---|---|
| LangGraph | Complex stateful workflows | High | ✅ Yes |
| CrewAI | Role-based task delegation | Low | ✅ Yes |
| AutoGen | Code execution & debugging | Low | ✅ Yes |
| AutoGPT | General autonomous tasks | Medium | ⚠️ Limited |
| Custom | Full control, unique patterns | Very High | ✅ If done right |
8. Monitoring and Observability
Multi-agent systems are notoriously hard to debug because failures can occur anywhere in the chain. Production deployments need:
- LangSmith tracing: Records every LLM call, tool invocation, and agent transition with latency and cost data. Essential for debugging why an agent made a wrong decision.
- Step-level logging: Log the input and output of every agent node. When something goes wrong, you need to replay the exact sequence.
- Cost tracking: Each agent invocation costs money. Multi-agent workflows can run 10-50 LLM calls per task. Set budget limits (e.g. max_cost_per_run=$0.50) to prevent runaway costs.
- Human review queues: For low-confidence outputs (flagged by the Validator), route to a human review queue instead of blocking the workflow.
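A per-run budget guard like the one above can be sketched in a few lines. Everything here is hypothetical (the class, the `max_cost_per_run` parameter, and the flat per-token rate are illustrative; real pricing varies by model and by input vs output tokens):

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run spends past its cost ceiling."""

class CostTracker:
    """Illustrative per-run cost guard; names and rates are assumptions."""

    def __init__(self, max_cost_per_run: float = 0.50,
                 usd_per_1k_tokens: float = 0.01):
        self.max_cost = max_cost_per_run
        self.rate = usd_per_1k_tokens
        self.spent = 0.0

    def record(self, tokens: int) -> None:
        """Call after every LLM response; abort the run if over budget."""
        self.spent += tokens / 1000 * self.rate
        if self.spent > self.max_cost:
            raise BudgetExceeded(
                f"run cost ${self.spent:.2f} exceeds ${self.max_cost:.2f}"
            )

tracker = CostTracker(max_cost_per_run=0.05)
tracker.record(3000)      # fine: $0.03 spent so far
try:
    tracker.record(5000)  # total $0.08 > $0.05 ceiling
except BudgetExceeded as e:
    print(e)
```

In practice you would wire this into your agent loop (or a LangChain callback) so every node's token usage is recorded before the next node runs, turning a runaway 50-call loop into a clean, loggable failure.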
Frequently Asked Questions
How many agents is too many?
Most production systems use 3-7 agents. Beyond 10 agents, coordination overhead (each agent needs context about what others are doing) starts to dominate. If you find yourself designing a 15-agent system, step back and ask if some agents can be combined or if the workflow can be simplified. Complexity is the enemy of reliability.
Can agents run in parallel or must they be sequential?
Both patterns are valid. LangGraph supports parallel execution via "fan-out" edges. Sequential is simpler to debug and reason about. Use parallel execution when tasks are genuinely independent (researching 5 companies simultaneously). Use sequential when each step depends on the previous output.
What's the cost of running a multi-agent system vs a single agent?
Multi-agent systems typically cost 3-10x more per task than single-agent approaches, because each agent invocation uses LLM tokens. The tradeoff is quality and reliability. For complex tasks requiring research + writing + review, the quality improvement justifies the cost. For simple tasks, use a single well-prompted agent.
How do I prevent agents from taking dangerous actions?
Use human-in-the-loop checkpoints for any action with real-world consequences (sending emails, making purchases, modifying databases). Add validation agents that check actions against an allow-list before execution. Sandbox code execution in Docker containers. Never give agents API credentials with write permissions unless absolutely necessary—use read-only tokens where possible.
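The allow-list check described above can be a single gate in front of every proposed tool call. This is a minimal sketch with made-up tool names; the three-way verdict (auto-run, human review, block) mirrors the checkpoint pattern from Section 5:

```python
ALLOWED_TOOLS = {"search_web", "read_file"}       # read-only, auto-approved
NEEDS_HUMAN = {"send_email", "make_purchase"}     # pause for approval

def guard(tool_name: str) -> str:
    """Return 'run', 'review', or 'block' for a proposed tool call."""
    if tool_name in ALLOWED_TOOLS:
        return "run"
    if tool_name in NEEDS_HUMAN:
        return "review"  # route to a human-in-the-loop checkpoint
    return "block"       # deny-by-default: unlisted tools never execute

assert guard("search_web") == "run"
assert guard("send_email") == "review"
assert guard("delete_files") == "block"
```

The key design choice is deny-by-default: an agent that hallucinates a tool name gets "block", not a crash or, worse, an execution.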
Conclusion
Multi-agent systems let you solve problems that are too complex for a single prompt. By breaking a task into research, writing, and review, you get reliability well beyond what any single prompt can achieve. The teams winning in AI-powered products aren't those with the best single prompts; they're those who've mastered orchestration, validation loops, and human-in-the-loop workflows that combine AI speed with human judgment at the right moments.
Vivek
AI Engineer
Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning — no fluff, just working code and real-world context.