
Multi-Agent Conversation: Microsoft AutoGen

Dec 29, 2025 • 10 min read

AutoGen by Microsoft Research is a framework for building multi-agent AI applications where agents talk to each other, write code, execute it, and iterate until they solve complex tasks. It's particularly famous for enabling self-correcting code generation workflows that would be impossible with a single LLM call.

Why AutoGen? The Power of Conversational Agents

Most AI frameworks focus on a single agent doing all the work. AutoGen's core insight is different: complex problems are solved more reliably when multiple specialized agents collaborate, critique each other's work, and iterate toward a solution.

Consider writing and debugging a data analysis script. A single LLM might write code, but it can't run it. It can't observe the error output. It can't actually verify the chart looks correct. AutoGen solves this by coupling an AssistantAgent (the brain that writes code) with a UserProxyAgent (the hands that execute code in a sandbox and report back results). This loop continues until the task is complete: automatically, without human intervention.

In practice, this "write → run → fix → repeat" loop can resolve in a few iterations coding tasks that would otherwise require 30+ minutes of manual debugging. The key difference is that AutoGen agents aren't just generating text: they're running experiments and learning from the results.

1. Core Agent Types

AssistantAgent – The Brain

Powered by an LLM, this agent understands tasks, writes code, plans solutions, and responds to feedback from the UserProxyAgent. It does NOT execute code directly.

UserProxyAgent – The Hands

Represents the human (or automated system) in the conversation. Crucially, it has a code_execution_config that allows it to actually run Python code in a Docker container or local directory and capture the output.

GroupChatManager – The Orchestrator

Manages conversations between 3+ agents, deciding which agent should speak next. Enables complex multi-agent coordination where different specialists handle different parts of a task.

2. Basic Setup and Code Execution

import os

from autogen import UserProxyAgent, AssistantAgent

# Configure LLM
config_list = [{
    "model": "gpt-4o",
    "api_key": os.environ["OPENAI_API_KEY"]
}]

# The AI brain - writes code and plans
assistant = AssistantAgent(
    name="data_scientist",
    llm_config={"config_list": config_list},
    system_message="""You are a data scientist. When asked to analyze data,
    you write Python code using pandas and matplotlib. 
    Always verify your output visually."""
)

# The executor - runs code and reports results
user_proxy = UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",  # Fully automated
    code_execution_config={
        "work_dir": "./workspace",
        "use_docker": False  # Use True in production for isolation
    },
    max_consecutive_auto_reply=10  # Max iterations before stopping
)

# Start the conversation
user_proxy.initiate_chat(
    assistant,
    message="Download AAPL stock data and create a 90-day price chart with volume bars."
)
# AutoGen will: write code → execute it → see any errors → fix → re-execute → repeat

3. GroupChat: Three or More Agents

For complex tasks, you can create a panel of specialists that work together:

from autogen import GroupChat, GroupChatManager

# Reuse the model config from the previous example
llm_config = {"config_list": config_list}

# Define specialist agents
planner = AssistantAgent("planner", llm_config=llm_config,
    system_message="Break tasks into clear steps. Don't write code yourself.")

coder = AssistantAgent("coder", llm_config=llm_config,
    system_message="Write clean Python code. Focus on correctness.")

reviewer = AssistantAgent("reviewer", llm_config=llm_config,
    system_message="Review code for bugs and security issues. Be critical.")

executor = UserProxyAgent("executor", human_input_mode="NEVER",
    code_execution_config={"work_dir": "workspace"})

# Create the group
group_chat = GroupChat(
    agents=[planner, coder, reviewer, executor],
    messages=[],
    max_round=20  # Maximum conversation rounds
)

manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)

# The planner decides who talks next automatically
executor.initiate_chat(manager, message="Build a web scraper for Hacker News top stories")

4. Real-World Use Cases

1. Automated Data Analysis Pipelines

Business intelligence teams use AutoGen to automate repetitive analysis. Instead of a data analyst spending 2 hours writing matplotlib code, they describe what they need in plain English. AutoGen's AssistantAgent writes the data processing code, the UserProxyAgent executes it against a live database, and the system iterates until the chart or report is correct. Teams report reducing routine analysis time by 70%.

2. Automated Bug Fixing

Engineering teams use AutoGen to automatically fix failing tests. The workflow: feed failing test output to AssistantAgent → it reads the code and error → writes a fix → UserProxyAgent runs the tests → if still failing, AssistantAgent tries again with new information. For common error patterns (off-by-one errors, null pointer issues, type mismatches), AutoGen resolves 40-60% of bugs without human intervention.
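As an illustrative sketch, seeding this loop is just a matter of putting the test output in the opening message. The file, test name, and error below are hypothetical placeholders, and the initiate_chat call assumes the assistant/user_proxy pair from the earlier setup:

```python
# Hedged sketch: seed a fix-the-failing-test loop with pytest output.
# The test name and assertion error here are hypothetical placeholders.
failing_output = (
    "FAILED tests/test_cart.py::test_total - AssertionError: assert 29 == 30"
)

task_message = (
    "This test is failing:\n"
    f"{failing_output}\n"
    "Read tests/test_cart.py and the module it tests in the workspace, "
    "propose a fix, and re-run pytest until it passes."
)

# Kick off the loop (requires the assistant and user_proxy defined earlier):
# user_proxy.initiate_chat(assistant, message=task_message)
```

From here the normal write → run → fix cycle takes over; the executor agent runs pytest and feeds the new output back to the assistant each round.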

3. Research Paper Summarization

Research organizations use multi-agent AutoGen setups where: a Reader Agent extracts key claims from PDFs, a Critic Agent identifies potential flaws in the methodology, a Summarizer writes an executive summary, and a Fact-Checker verifies all statistics cited. The combined output is a structured critique that would take a human researcher 3-4 hours per paper.

4. DevOps Automation

Infrastructure teams use AutoGen to automate incident response. When an alert fires (high CPU, disk full, service down), an AutoGen workflow: reads system logs → identifies root cause → writes a remediation script → executes it in a sandboxed environment → reports the outcome. For known failure patterns, the system resolves incidents end-to-end without paging any engineer.

5. AutoGen vs. CrewAI vs. LangGraph

Dimension          AutoGen              CrewAI               LangGraph
Primary Pattern    Conversational       Role-based tasks     State machines
Code Execution     ✅ Built-in          ⚠️ Via tool          ⚠️ Via node
Human Approval     ✅ Native            ⚠️ Manual            ✅ Interrupt
Best For           Code & data tasks    Content pipelines    Complex agent loops
Learning Curve     Low                  Very Low             High
Flexibility        Medium               Low                  Very High

Frequently Asked Questions

Is AutoGen safe to run? Can agents delete my files?

By default (with use_docker disabled), UserProxyAgent executes code directly on your machine in the configured work_dir. For production use, always set "use_docker": True to run code in an isolated Docker container, which limits agents' access to your host filesystem and system resources to the mounted workspace. Never run AutoGen with human_input_mode="NEVER" and Docker disabled in production.
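As a minimal sketch, a production-leaning execution config might look like the following. The key names match the classic pyautogen code_execution_config (use_docker, work_dir, timeout); verify them against your installed version:

```python
# Hedged sketch: a production-leaning code_execution_config for UserProxyAgent.
# In pyautogen, "use_docker" may be True (default image) or a specific image name;
# "timeout" caps the runtime of each executed snippet.
safe_exec_config = {
    "work_dir": "./workspace",  # only this directory is shared with the container
    "use_docker": True,         # run generated code inside Docker, not on the host
    "timeout": 60,              # seconds before a runaway script is killed
}

# Wiring (requires autogen installed and the Docker daemon running):
# user_proxy = UserProxyAgent(
#     name="executor",
#     human_input_mode="NEVER",
#     code_execution_config=safe_exec_config,
# )
```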

What happens when agents disagree?

In GroupChat mode, the GroupChatManager (also an LLM) decides which agent speaks next. When agents produce conflicting suggestions, the manager typically defers to the most authoritative agent for that domain (e.g., the reviewer over the coder for quality decisions). You can also configure custom speaker selection strategies.
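For example, recent pyautogen releases accept a callable for GroupChat's speaker_selection_method. This hypothetical sketch forces the reviewer to speak right after the coder and otherwise defers to the LLM-driven default (agent names follow the GroupChat example above):

```python
# Hedged sketch: a custom speaker-selection function for GroupChat.
# AutoGen calls it with the last speaker and the group chat; it may return
# an agent object or a built-in strategy name such as "auto".
def reviewer_after_coder(last_speaker, groupchat):
    """If the coder just spoke, hand the floor to the reviewer; else let the LLM choose."""
    if last_speaker.name == "coder":
        return next(a for a in groupchat.agents if a.name == "reviewer")
    return "auto"  # fall back to LLM-driven selection

# group_chat = GroupChat(agents=[planner, coder, reviewer, executor],
#                        messages=[], max_round=20,
#                        speaker_selection_method=reviewer_after_coder)
```

A deterministic rule like this guarantees every piece of code gets reviewed, while still letting the manager improvise for the rest of the conversation.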

How do I prevent agents from getting stuck in infinite loops?

Set max_consecutive_auto_reply on UserProxyAgent and max_round on GroupChat. Also configure an is_termination_msg function that returns True when the task is complete (e.g., when the last message contains "TASK COMPLETE").
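A minimal sketch of such a predicate (the "TASK COMPLETE" sentinel is just a convention you instruct your agents to emit):

```python
# Hedged sketch: a termination predicate for AutoGen's is_termination_msg hook.
# AutoGen calls it with the latest message dict; returning True ends the chat.
def is_task_complete(message: dict) -> bool:
    content = (message.get("content") or "").strip()
    return content.endswith("TASK COMPLETE")

# Wiring (requires autogen installed):
# user_proxy = UserProxyAgent(
#     name="executor",
#     human_input_mode="NEVER",
#     max_consecutive_auto_reply=10,
#     is_termination_msg=is_task_complete,
# )
```

Guarding against a None content field matters in practice, since tool-call messages can arrive without any text.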

Can AutoGen work with local models like Ollama?

Yes. AutoGen supports any OpenAI-compatible API. Configure your config_list with your local server URL:

config_list = [{
    "model": "llama3.2",
    "base_url": "http://localhost:11434/v1",
    "api_key": "ollama"  # Required but not used
}]

Next Steps

  • Run the Quickstart: Try the stock chart example above with your own OpenAI key: it's the fastest way to see AutoGen's power in action.
  • Add Docker Isolation: Set up Docker Desktop and switch use_docker to True for safe, isolated code execution.
  • Build a GroupChat: Create a 3-agent crew (Planner + Coder + Reviewer) for a coding task you care about.
  • Explore AutoGen Studio: Microsoft's visual interface for building and testing AutoGen workflows without writing code.

Written by

Vivek

AI Engineer

Full-stack AI engineer with 4+ years building LLM-powered products, autonomous agents, and RAG pipelines. I've shipped AI features to production for startups and worked hands-on with GPT-4o, LangChain, LlamaIndex, and the Vercel AI SDK. I started OpnCrafter to share everything I wish I had when learning: no fluff, just working code and real-world context.

GPT-4o · LangChain · Next.js · Vector DBs · RAG · Vercel AI SDK