Multi-Agent Conversation: Microsoft AutoGen
Dec 29, 2025 · 10 min read
AutoGen by Microsoft Research is a framework for building multi-agent AI applications where agents talk to each other, write code, execute it, and iterate until they solve complex tasks. It's particularly famous for enabling self-correcting code generation workflows that would be impossible with a single LLM call.
Why AutoGen? The Power of Conversational Agents
Most AI frameworks focus on a single agent doing all the work. AutoGen's core insight is different: complex problems are solved more reliably when multiple specialized agents collaborate, critique each other's work, and iterate toward a solution.
Consider writing and debugging a data analysis script. A single LLM might write code, but it can't run it. It can't observe the error output. It can't actually verify the chart looks correct. AutoGen solves this by coupling an AssistantAgent (the brain that writes code) with a UserProxyAgent (the hands that execute code in a sandbox and report back results). This loop continues until the task is complete, automatically, without human intervention.
Real-world AutoGen deployments at Microsoft and partner organizations have shown that this "write → run → fix → repeat" loop can resolve coding tasks in 3-5 iterations that would otherwise require 30+ minutes of manual debugging. The key breakthrough is that AutoGen agents aren't just generating text; they're running experiments and learning from the results.
1. Core Agent Types
AssistantAgent: The Brain
Powered by an LLM, this agent understands tasks, writes code, plans solutions, and responds to feedback from the UserProxyAgent. It does NOT execute code directly.
UserProxyAgent: The Hands
Represents the human (or automated system) in the conversation. Crucially, it has a code_execution_config that allows it to actually run Python code in a Docker container or local directory and capture the output.
GroupChatManager: The Orchestrator
Manages conversations between 3+ agents, deciding which agent should speak next. Enables complex multi-agent coordination where different specialists handle different parts of a task.
2. Basic Setup and Code Execution
```python
import os

from autogen import AssistantAgent, UserProxyAgent

# Configure LLM
config_list = [{
    "model": "gpt-4o",
    "api_key": os.environ["OPENAI_API_KEY"]
}]

# The AI brain - writes code and plans
assistant = AssistantAgent(
    name="data_scientist",
    llm_config={"config_list": config_list},
    system_message="""You are a data scientist. When asked to analyze data,
    you write Python code using pandas and matplotlib.
    Always verify your output visually."""
)

# The executor - runs code and reports results
user_proxy = UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",  # Fully automated
    code_execution_config={
        "work_dir": "./workspace",
        "use_docker": False  # Use True in production for isolation
    },
    max_consecutive_auto_reply=10  # Max iterations before stopping
)

# Start the conversation
user_proxy.initiate_chat(
    assistant,
    message="Download AAPL stock data and create a 90-day price chart with volume bars."
)
# AutoGen will: write code → execute it → see any errors → fix → re-execute → repeat
```

3. GroupChat: Three or More Agents
For complex tasks, you can create a panel of specialists that work together:
```python
from autogen import AssistantAgent, GroupChat, GroupChatManager, UserProxyAgent

# Reuse the LLM configuration from the previous example
llm_config = {"config_list": config_list}

# Define specialist agents
planner = AssistantAgent("planner", llm_config=llm_config,
    system_message="Break tasks into clear steps. Don't write code yourself.")
coder = AssistantAgent("coder", llm_config=llm_config,
    system_message="Write clean Python code. Focus on correctness.")
reviewer = AssistantAgent("reviewer", llm_config=llm_config,
    system_message="Review code for bugs and security issues. Be critical.")
executor = UserProxyAgent("executor", human_input_mode="NEVER",
    code_execution_config={"work_dir": "workspace"})

# Create the group
group_chat = GroupChat(
    agents=[planner, coder, reviewer, executor],
    messages=[],
    max_round=20  # Maximum conversation rounds
)
manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)

# The manager decides who talks next automatically
executor.initiate_chat(manager, message="Build a web scraper for Hacker News top stories")
```

4. Real-World Use Cases
1. Automated Data Analysis Pipelines
Business intelligence teams use AutoGen to automate repetitive analysis. Instead of a data analyst spending 2 hours writing matplotlib code, they describe what they need in plain English. AutoGen's AssistantAgent writes the data processing code, the UserProxyAgent executes it against a live database, and the system iterates until the chart or report is correct. Teams report reducing routine analysis time by 70%.
2. Automated Bug Fixing
Engineering teams use AutoGen to automatically fix failing tests. The workflow: feed failing test output to AssistantAgent → it reads the code and error → writes a fix → UserProxyAgent runs the tests → if still failing, AssistantAgent tries again with new information. For common error patterns (off-by-one errors, null pointer issues, type mismatches), AutoGen resolves 40-60% of bugs without human intervention.
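The first step of that workflow, packaging the failing test output into the opening message for the AssistantAgent, could be sketched like this. Note that `build_fix_prompt` is a hypothetical helper for illustration, not part of AutoGen:

```python
# Hypothetical helper (not an AutoGen API): package a failing test run
# into the first message of the write -> run -> fix loop.
def build_fix_prompt(test_output: str, source_path: str, source_code: str) -> str:
    return (
        f"The test suite failed with the following output:\n\n{test_output}\n\n"
        f"Here is the current content of {source_path}:\n\n{source_code}\n\n"
        "Diagnose the root cause, then reply with a corrected version of the file."
    )
```

The resulting string would be passed as the `message` argument to `user_proxy.initiate_chat(assistant, message=...)`, after which the loop described above takes over.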
3. Research Paper Summarization
Research organizations use multi-agent AutoGen setups where: a Reader Agent extracts key claims from PDFs, a Critic Agent identifies potential flaws in the methodology, a Summarizer writes an executive summary, and a Fact-Checker verifies all statistics cited. The combined output is a structured critique that would take a human researcher 3-4 hours per paper.
4. DevOps Automation
Infrastructure teams use AutoGen to automate incident response. When an alert fires (high CPU, disk full, service down), an AutoGen workflow: reads system logs → identifies root cause → writes a remediation script → executes it in a sandboxed environment → reports the outcome. For known failure patterns, the system resolves incidents end-to-end without paging any engineer.
5. AutoGen vs. CrewAI vs. LangGraph
| Dimension | AutoGen | CrewAI | LangGraph |
|---|---|---|---|
| Primary Pattern | Conversational | Role-based tasks | State machines |
| Code Execution | Built-in | Via tool | Via node |
| Human Approval | Native | Manual | Interrupt |
| Best For | Code & data tasks | Content pipelines | Complex agent loops |
| Learning Curve | Low | Very Low | High |
| Flexibility | Medium | Low | Very High |
Frequently Asked Questions
Is AutoGen safe to run? Can agents delete my files?
By default, UserProxyAgent executes code in the current directory. For production use, always set "use_docker": True to run code in an isolated Docker container. This prevents agents from accessing your host filesystem, network, or system resources. Never run AutoGen with human_input_mode="NEVER" and Docker disabled in production.
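Assuming a 0.2-era `pyautogen` install, the sandboxed setup differs from the quickstart only in the `code_execution_config` values; whether `use_docker` accepts a specific image name (rather than just `True`) depends on your installed version, so treat that key's string form as an assumption to verify:

```python
# Sandboxed execution: agent-written code runs inside a Docker container
# instead of directly on the host.
code_execution_config = {
    "work_dir": "./workspace",          # directory shared with the container
    "use_docker": "python:3.11-slim",   # or simply True for the default image
    "timeout": 60,                      # kill runaway scripts after 60 seconds
}
```

This dict is passed to `UserProxyAgent(code_execution_config=...)` exactly as in the quickstart example.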
What happens when agents disagree?
In GroupChat mode, the GroupChatManager (also an LLM) decides which agent speaks next. When agents produce conflicting suggestions, the manager typically defers to the most authoritative agent for that domain (e.g., the reviewer over the coder for quality decisions). You can also configure custom speaker selection strategies.
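Recent `pyautogen` releases let `GroupChat`'s `speaker_selection_method` be a plain function as well as the built-in `"auto"`, `"round_robin"`, `"random"`, and `"manual"` modes. Under that assumption, a deterministic routing strategy for the planner/coder/reviewer/executor group might look like:

```python
# Sketch: deterministic routing for a planner/coder/reviewer/executor group.
# Returning an agent forces that agent to speak next; returning None is used
# here to mean "no forced choice" (check your version's docs for the exact
# fallback behavior).
ROUTE = {"planner": "coder", "coder": "reviewer",
         "reviewer": "executor", "executor": "planner"}

def select_next_speaker(last_speaker, groupchat):
    next_name = ROUTE.get(last_speaker.name)
    if next_name is None:
        return None  # defer to the manager's default selection
    return next(a for a in groupchat.agents if a.name == next_name)

# Passed at construction time:
# group_chat = GroupChat(agents=[...], messages=[],
#                        speaker_selection_method=select_next_speaker)
```

The function only needs each agent's `name` attribute, so it stays easy to unit-test without spinning up any LLM calls.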
How do I prevent agents from getting stuck in infinite loops?
Set max_consecutive_auto_reply on UserProxyAgent and max_round on GroupChat. Also configure an is_termination_msg function that returns True when the task is complete (e.g., when the last message contains "TASK COMPLETE").
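A minimal termination check keyed on a sentinel phrase could look like the sketch below. The "TASK COMPLETE" sentinel is a convention you establish in the agent's system message, not an AutoGen built-in:

```python
# Stop the auto-reply loop once a message contains the sentinel phrase.
# Messages are dicts; "content" may be missing or None, so guard for that.
def is_termination_msg(message: dict) -> bool:
    content = message.get("content") or ""
    return "TASK COMPLETE" in content

# Wired up at construction time:
# user_proxy = UserProxyAgent(..., is_termination_msg=is_termination_msg)
```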
Can AutoGen work with local models like Ollama?
Yes. AutoGen supports any OpenAI-compatible API. Configure your config_list with your local server URL:
```python
config_list = [{
    "model": "llama3.2",
    "base_url": "http://localhost:11434/v1",
    "api_key": "ollama"  # Required but not used
}]
```

Next Steps
- Run the Quickstart: Try the stock chart example above with your own OpenAI keyβit's the fastest way to see AutoGen's power in action.
- Add Docker Isolation: Set up Docker Desktop and switch use_docker to True for safe, isolated code execution.
- Build a GroupChat: Create a 3-agent crew (Planner + Coder + Reviewer) for a coding task you care about.
- Explore AutoGen Studio: Microsoft's visual interface for building and testing AutoGen workflows without writing code.
Vivek
AI Engineer. Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning: no fluff, just working code and real-world context.