Multi-Agent Orchestration: A Practical Guide for 2026
You've decided your system needs multiple agents. Good — for the right problem, multi-agent architectures dramatically outperform single agents. Now comes the hard part: how do they talk to each other, who decides what runs when, and what happens when an agent fails?
Multi-agent orchestration is the engineering of coordination. It's the difference between three brilliant people in a room shouting over each other and three brilliant people in a room with a clear agenda and a facilitator.
This guide covers the four patterns that dominate production multi-agent systems in 2026, with real code using OpenAI Agents SDK and LangGraph. We'll also cover the error handling that most tutorials skip — the part that determines whether your system works in production or only in demos.
The Four Orchestration Patterns
Every multi-agent system you'll encounter maps to one (or a combination) of these patterns:
1. Triage (Router) — One agent classifies the request and hands off to a specialist
2. Fan-out / Fan-in (Parallel) — Multiple agents work simultaneously, results are merged
3. Supervisor (Hierarchical) — A central agent delegates tasks and reviews results
4. Pipeline (Sequential) — Agents execute in a fixed order, each building on the previous output
Let's build each one.
Pattern 1: Triage Agent (The Router)
The triage pattern is the most common in production. A lightweight classifier agent receives every request, determines what kind of task it is, and hands the entire conversation to a specialized agent.
This is how customer support systems, coding assistants, and enterprise AI deployments work at scale. OpenAI's Agents SDK was designed around this pattern.
OpenAI Agents SDK implementation
from agents import Agent, Runner

# Specialist agents
billing_agent = Agent(
    name="Billing Agent",
    instructions="""You handle billing questions: invoices,
    payments, refunds, pricing. Be precise with numbers.""",
    model="gpt-5.1"
)

technical_agent = Agent(
    name="Technical Agent",
    instructions="""You handle technical support: bugs,
    configuration, API issues, error codes.""",
    model="gpt-5.1"
)

general_agent = Agent(
    name="General Agent",
    instructions="""You handle general questions: product info,
    feature requests, account management.""",
    model="gpt-5-nano"  # Cheaper model for simple queries
)

# Triage agent with handoffs
triage_agent = Agent(
    name="Triage Agent",
    instructions="""You are a routing agent. Classify the user's
    request and hand off to the right specialist:
    - Billing questions → Billing Agent
    - Technical issues → Technical Agent
    - Everything else → General Agent
    Do NOT answer the question yourself. Always hand off.""",
    model="gpt-5-nano",  # Fast, cheap classifier
    handoffs=[billing_agent, technical_agent, general_agent]
)

# Run
result = await Runner.run(triage_agent, "My invoice shows the wrong amount")
print(f"Handled by: {result.last_agent.name}")
print(result.final_output)
# Handled by: Billing Agent
Key design decisions
- Use the cheapest model for triage. The router doesn't need to be smart — it needs to be fast and cheap. GPT-5-nano or Claude Haiku at the gate, heavy models behind it.
- Handoffs transfer the full conversation. In the OpenAI SDK, when a handoff occurs, the receiving agent gets the complete conversation history. You can use nest_handoff_history=True to compress prior history into a single message if context is getting large.
- Declare handoff targets explicitly. The triage agent can only hand off to agents listed in its handoffs array. This prevents rogue routing and makes the system auditable.
When to use triage
Triage works best when:
- Requests fall into clear, non-overlapping categories
- Each category needs different expertise or instructions
- Volume is high enough that routing overhead pays for itself
- You want to use different models per category (expensive for complex, cheap for simple)
Pattern 2: Fan-Out / Fan-In (Parallel Execution)
When a task can be decomposed into independent subtasks, run them in parallel. A coordinator splits the work, fires off agents simultaneously, waits for all results, and merges them.
This pattern is ideal for research tasks, multi-source analysis, and any workflow where subtasks don't depend on each other.
LangGraph implementation
import asyncio
import operator
from typing import TypedDict, Annotated

from langgraph.graph import StateGraph, START, END

# `llm` is assumed to be any async client exposing generate(prompt) -> str;
# swap in your model binding of choice.

class ResearchState(TypedDict):
    query: str
    results: Annotated[list, operator.add]  # Merge results from parallel nodes
    summary: str

async def search_academic(state: ResearchState) -> dict:
    """Agent 1: Search academic papers."""
    result = await llm.generate(
        f"Find academic research on: {state['query']}. "
        "Return key findings with citations."
    )
    return {"results": [{"source": "academic", "content": result}]}

async def search_industry(state: ResearchState) -> dict:
    """Agent 2: Search industry reports."""
    result = await llm.generate(
        f"Find industry analysis on: {state['query']}. "
        "Return market data and trends."
    )
    return {"results": [{"source": "industry", "content": result}]}

async def search_news(state: ResearchState) -> dict:
    """Agent 3: Search recent news."""
    result = await llm.generate(
        f"Find recent news about: {state['query']}. "
        "Return latest developments and announcements."
    )
    return {"results": [{"source": "news", "content": result}]}

async def synthesize(state: ResearchState) -> dict:
    """Merge all results into a coherent summary."""
    all_results = "\n\n".join(
        f"[{r['source']}]: {r['content']}" for r in state["results"]
    )
    summary = await llm.generate(
        f"Synthesize these research findings into a coherent report:\n{all_results}"
    )
    return {"summary": summary}

# Build the graph
graph = StateGraph(ResearchState)
graph.add_node("academic", search_academic)
graph.add_node("industry", search_industry)
graph.add_node("news", search_news)
graph.add_node("synthesize", synthesize)

# Fan-out: START → all three in parallel
graph.add_edge(START, "academic")
graph.add_edge(START, "industry")
graph.add_edge(START, "news")

# Fan-in: all three → synthesize
graph.add_edge("academic", "synthesize")
graph.add_edge("industry", "synthesize")
graph.add_edge("news", "synthesize")
graph.add_edge("synthesize", END)

app = graph.compile()
result = await app.ainvoke({"query": "AI agent market size 2026", "results": []})
Why LangGraph for parallel patterns
LangGraph models workflows as directed graphs with typed state. Each node is a function (or agent), edges define flow, and state is passed between nodes. The framework handles:
- Parallel execution — Nodes with no dependency between them run concurrently
- State merging — The Annotated[list, operator.add] pattern automatically concatenates results from parallel branches
- Conditional routing — Add routing functions via add_conditional_edges to branch on intermediate state (see the sketch below)
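Here's a minimal sketch of that third point, assuming you swap the fixed news → synthesize edge for a routed one; the early-exit rule itself is illustrative, not part of the example above:

def route_after_news(state: ResearchState) -> str:
    # Skip synthesis entirely if the fan-out came back empty.
    return "synthesize" if state["results"] else END

graph.add_conditional_edges("news", route_after_news, ["synthesize", END])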
This is more flexible than the OpenAI SDK's handoff model for workflows that aren't strictly linear. The trade-off: more configuration, steeper learning curve.
Pattern 3: Supervisor (The Manager)
The supervisor pattern puts one agent in charge. It receives the task, breaks it into subtasks, delegates to workers, reviews their output, and decides whether to accept, revise, or reassign.
This is the most powerful pattern — and the most expensive, because the supervisor makes an LLM call for every delegation and review decision. For cost management strategies, see our prompt caching guide — caching can cut multi-agent costs by 50–90%.
Supervisor with tool-calling
The cleanest way to implement a supervisor: make worker agents available as tools.
from agents import Agent, Runner, function_tool

# Heads-up: the Claude model strings below assume you've wired up non-OpenAI
# models (e.g. via the SDK's LiteLLM integration and a "litellm/"-prefixed
# model name); out of the box the SDK targets OpenAI models.

@function_tool
async def research(query: str) -> str:
    """Delegate a research task to the research agent."""
    researcher = Agent(
        name="Researcher",
        instructions="You are a thorough researcher. Search and summarize.",
        model="gpt-5.1"
    )
    result = await Runner.run(researcher, query)
    return result.final_output

@function_tool
async def write_code(spec: str) -> str:
    """Delegate a coding task to the coding agent."""
    coder = Agent(
        name="Coder",
        instructions="You write clean, tested Python code.",
        model="claude-sonnet-4-20250514"
    )
    result = await Runner.run(coder, spec)
    return result.final_output

@function_tool
async def review_code(code: str) -> str:
    """Delegate a code review to the review agent."""
    reviewer = Agent(
        name="Reviewer",
        instructions="Review code for bugs, security, and style.",
        model="gpt-5.1"
    )
    result = await Runner.run(reviewer, f"Review this code:\n{code}")
    return result.final_output

supervisor = Agent(
    name="Supervisor",
    instructions="""You are a project manager. Break tasks into subtasks
    and delegate to specialists. Review all results before presenting
    to the user. If a result is unsatisfactory, explain what's wrong
    and delegate again.""",
    model="claude-opus-4-20250514",  # Smartest model for coordination
    tools=[research, write_code, review_code]
)

result = await Runner.run(
    supervisor,
    "Build a Python function that fetches weather data from Open-Meteo API"
)
Supervisor vs triage: when to use which
| | Triage | Supervisor |
|---|---|---|
| Control flow | One handoff, specialist runs to completion | Supervisor maintains control, iterates |
| Task complexity | Single-domain requests | Multi-step, multi-domain tasks |
| Cost | Low (1 routing call + 1 specialist) | High (supervisor call per delegation + review) |
| Error recovery | Specialist handles own errors | Supervisor can reassign or retry |
Use triage for high-volume, single-intent routing. Use supervisor for complex, multi-step tasks where quality matters more than speed.
Pattern 4: Pipeline (Sequential Processing)
Pipelines pass output from one agent to the next in a fixed order. Each agent transforms or enriches the data before passing it forward.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

# `llm` is the same assumed async client as in the fan-out example.

class ArticleState(TypedDict):
    topic: str
    outline: str
    draft: str
    edited: str
    final: str

async def outliner(state: ArticleState) -> dict:
    outline = await llm.generate(
        f"Create a detailed outline for an article about: {state['topic']}"
    )
    return {"outline": outline}

async def writer(state: ArticleState) -> dict:
    draft = await llm.generate(
        f"Write a full article based on this outline:\n{state['outline']}"
    )
    return {"draft": draft}

async def editor(state: ArticleState) -> dict:
    edited = await llm.generate(
        f"Edit this draft for clarity, accuracy, and style:\n{state['draft']}"
    )
    return {"edited": edited}

async def formatter(state: ArticleState) -> dict:
    final = await llm.generate(
        f"Format this article with proper headings, meta description, "
        f"and SEO optimization:\n{state['edited']}"
    )
    return {"final": final}

graph = StateGraph(ArticleState)
graph.add_node("outliner", outliner)
graph.add_node("writer", writer)
graph.add_node("editor", editor)
graph.add_node("formatter", formatter)

graph.add_edge(START, "outliner")
graph.add_edge("outliner", "writer")
graph.add_edge("writer", "editor")
graph.add_edge("editor", "formatter")
graph.add_edge("formatter", END)

app = graph.compile()
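Running it end to end (the topic string is illustrative):

result = await app.ainvoke({"topic": "Multi-agent orchestration patterns"})
print(result["final"])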
Pipelines are simple and predictable. Each agent has a clear input and output contract. The downside: if step 2 produces bad output, steps 3 and 4 build on a rotten foundation. Combine with quality gates (conditional edges that loop back on failure) for robustness.
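Here's one shape a quality gate can take, as a sketch with stated assumptions: an approved flag and a revisions counter added to the state (both hypothetical fields), an LLM-based check, and a hard cap on revision loops so a persistent failure can't spin forever. It replaces the fixed editor → formatter edge.

class GatedArticleState(ArticleState):
    approved: bool
    revisions: int

async def quality_gate(state: GatedArticleState) -> dict:
    verdict = await llm.generate(
        f"Reply APPROVED or REJECTED. Is this draft publishable?\n{state['edited']}"
    )
    return {"approved": "APPROVED" in verdict,
            "revisions": state.get("revisions", 0) + 1}

def route_after_gate(state: GatedArticleState) -> str:
    # Accept, or loop back to the writer; give up after two revisions.
    if state["approved"] or state["revisions"] >= 2:
        return "formatter"
    return "writer"

# Rebuild with the gated state, inserting the gate between editor and formatter:
graph = StateGraph(GatedArticleState)
# ... add the four nodes and earlier edges as before, plus:
graph.add_node("quality_gate", quality_gate)
graph.add_edge("editor", "quality_gate")
graph.add_conditional_edges("quality_gate", route_after_gate, ["formatter", "writer"])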
Error Handling: The Part Most Tutorials Skip
According to GitHub's engineering team, most multi-agent workflows fail not because agents produce bad output, but because errors propagate unchecked between agents. Here's the production playbook:
1. Timeouts on every agent call
import asyncio

from agents import Runner

async def call_agent_with_timeout(agent, message, timeout_seconds=30):
    try:
        result = await asyncio.wait_for(
            Runner.run(agent, message),
            timeout=timeout_seconds
        )
        return result.final_output
    except asyncio.TimeoutError:
        return {"error": "Agent timed out", "agent": agent.name}
Without timeouts, a single stuck agent stalls your entire pipeline indefinitely.
2. Retries with exponential backoff
import random

async def call_with_retry(agent, message, max_retries=3):
    for attempt in range(max_retries):
        try:
            result = await call_agent_with_timeout(agent, message)
            # Success returns a plain string; failures return an error dict.
            if not (isinstance(result, dict) and "error" in result):
                return result
        except Exception:
            if attempt == max_retries - 1:
                raise
        wait = (2 ** attempt) + random.uniform(0, 1)  # Jitter
        await asyncio.sleep(wait)
    return {"error": "Max retries exceeded", "agent": agent.name}
3. Circuit breakers
If an agent fails repeatedly, stop calling it. This prevents cascading failures where Agent A's error triggers Agent B's error handler, which calls Agent A again — creating an infinite retry loop.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure = 0
        self.state = "closed"  # closed = healthy, open = broken

    def can_call(self) -> bool:
        if self.state == "closed":
            return True
        if time.time() - self.last_failure > self.reset_timeout:
            self.state = "half-open"  # Allow one probe call through
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        self.last_failure = time.time()
        if self.failures >= self.threshold:
            self.state = "open"
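A usage sketch tying the breaker to the retry helper above (call_agent_safely is a name introduced here, not an SDK function):

breaker = CircuitBreaker()

async def call_agent_safely(agent, message):
    # Refuse outright while the circuit is open.
    if not breaker.can_call():
        return {"error": "Circuit open, agent temporarily disabled", "agent": agent.name}
    result = await call_with_retry(agent, message)
    if isinstance(result, dict) and "error" in result:
        breaker.record_failure()
    else:
        breaker.record_success()
    return result

In practice, keep one breaker per agent so a single flaky specialist doesn't trip the circuit for everyone.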
4. Structured handoff contracts
Treat inter-agent communication like API contracts. Define what each agent expects as input and produces as output using schemas. Schema violations should be caught immediately — not three agents downstream.
from pydantic import BaseModel

class ResearchResult(BaseModel):
    summary: str
    sources: list[str]
    confidence: float  # 0-1

class CodeResult(BaseModel):
    code: str
    language: str
    tests_passed: bool
When an agent produces output that doesn't match its schema, retry, repair, or escalate — but never pass malformed data to the next agent.
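A validate-and-repair sketch, assuming the agent has been instructed to emit JSON and using Pydantic v2's model_validate_json (run_with_schema is a name introduced here):

from pydantic import ValidationError

async def run_with_schema(agent, message, schema: type[BaseModel], max_repairs=1):
    result = await Runner.run(agent, message)
    for attempt in range(max_repairs + 1):
        try:
            # Validate at the boundary, before anything flows downstream.
            return schema.model_validate_json(result.final_output)
        except ValidationError as err:
            if attempt == max_repairs:
                raise  # Escalate; never pass malformed data onward
            result = await Runner.run(
                agent,
                f"Your previous output failed validation:\n{err}\n"
                f"Return only JSON matching this schema:\n{schema.model_json_schema()}"
            )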
Framework Comparison: OpenAI Agents SDK vs LangGraph
| Feature | OpenAI Agents SDK | LangGraph |
|---|---|---|
| Orchestration model | Handoff-based (agent transfer) | Graph-based (state machine) |
| Best for | Triage, linear handoffs | Parallel, conditional, loops |
| Model support | OpenAI natively; other models via LiteLLM integration | Any model via LangChain |
| State management | Conversation history | Typed state dict |
| Built-in features | Handoffs, guardrails, tracing | Checkpointing, human-in-the-loop, streaming |
| Complexity | Low — opinionated, simple | Higher — flexible, more config |
| Scale limit | Gets unwieldy at 8-10 agent types | Handles complex graphs well |
Rule of thumb: If your workflow is handoff-based and you're in the OpenAI ecosystem, use the Agents SDK. If you need parallel execution, conditional branching, or model-agnostic support, use LangGraph.
Other frameworks worth evaluating: CrewAI (role-based crews, good for structured team workflows), Google ADK (hierarchical agent trees, tight Gemini integration), and AutoGen/AG2 (conversational GroupChat pattern). See our single-agent vs multi-agent comparison for guidance on whether you need multi-agent at all.
Production Checklist
Before deploying a multi-agent system, verify:
- [ ] Every agent call has a timeout — No unbounded waits
- [ ] Retries have limits and backoff — No infinite loops
- [ ] Circuit breakers are in place — Failed agents get isolated
- [ ] Inter-agent data has schemas — Catch malformed output early
- [ ] Observability is enabled — Trace every agent call, handoff, and error
- [ ] Costs are bounded — Set token/call budgets per request to prevent runaway loops (see the budget sketch after this list)
- [ ] Fallback paths exist — If the specialist fails, the system degrades gracefully
- [ ] Context is managed — Handoffs pass relevant state, not entire histories (context engineering matters here)
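On the cost item: a minimal per-request budget guard. BudgetGuard is a name introduced here and the limits are illustrative; in practice you'd read token counts from your SDK's usage metadata.

class BudgetGuard:
    """Abort the agent loop once a request exceeds its call or token budget."""

    def __init__(self, max_agent_calls=20, max_tokens=200_000):
        self.calls = 0
        self.tokens = 0
        self.max_calls = max_agent_calls
        self.max_tokens = max_tokens

    def charge(self, tokens_used: int):
        self.calls += 1
        self.tokens += tokens_used
        if self.calls > self.max_calls or self.tokens > self.max_tokens:
            raise RuntimeError("Per-request budget exceeded, aborting agent loop")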
FAQ
What's the difference between agent handoffs and agent-as-tool?
In a handoff, the first agent transfers control completely — the receiving agent takes over the conversation and the original agent exits. In agent-as-tool, the parent agent invokes the child agent as a function call, gets back a result, and maintains control. Handoffs are simpler for linear routing; agent-as-tool is better when the parent needs to coordinate multiple agents or review results before continuing.
How many agents should a multi-agent system have?
Start with the minimum that solves your problem — usually 2-4. The OpenAI Agents SDK gets unwieldy above 8-10 agent types. LangGraph handles more complex graphs, but debugging difficulty scales superlinearly with agent count. Add agents only when a single agent demonstrably can't handle the domain. As Anthropic's research shows, sometimes one well-prompted agent outperforms five mediocre ones.
Do all agents in a multi-agent system need the same LLM?
No — and mixing models is a best practice. Use cheap, fast models (GPT-5-nano, Claude Haiku) for routing and classification. Use capable models (GPT-5.1, Claude Sonnet 4) for specialist work. Use the strongest model (Claude Opus 4) for supervision and review. This optimizes both cost and quality.
How do I debug a multi-agent system?
Tracing is essential. Both the OpenAI Agents SDK (built-in tracing) and LangGraph (LangSmith integration) provide end-to-end trace visualization showing every agent call, tool invocation, handoff, and state mutation. Log structured data at every agent boundary. Treat schema violations as bugs, not warnings.
What's the biggest mistake teams make with multi-agent systems?
Building multi-agent when single-agent would suffice. According to Towards Data Science's "Multi-Agent Trap" analysis, the most common failure is adding coordination overhead without proportional capability gains. Start single-agent, measure where it fails, then split only the failing parts into separate agents. The other big mistake: no error handling between agents. An agent failure that silently propagates bad data downstream is worse than a crash.
Can I combine orchestration patterns?
Absolutely — most production systems do. A common architecture: a triage agent at the front (Pattern 1) routes to a supervisor (Pattern 3) that coordinates specialist workers using fan-out (Pattern 2) for independent subtasks and pipelines (Pattern 4) for sequential processing. Start with one pattern, prove it works, then compose.
> 💡 Local model tip: Run specialist workers on Qwen 3.5 14B (~$0.20/hr on Vast.ai via vLLM) — 10-50× cheaper than API calls for high-volume agent tasks.
How do multi-agent systems handle context across handoffs?
The OpenAI Agents SDK passes the full conversation history on handoff by default. For large conversations, set nest_handoff_history=True to compress prior history into a single assistant message. In LangGraph, you control state explicitly — pass only the fields each agent needs. The best practice: send structured context objects (task description, key findings, constraints) rather than raw conversation history. This reduces latency and prevents context window bloat.
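As a sketch, such a structured context object might look like this (HandoffContext is a hypothetical name, not an SDK type):

from pydantic import BaseModel

class HandoffContext(BaseModel):
    task: str                # What the receiving agent must do
    key_findings: list[str]  # Distilled results so far, not raw transcripts
    constraints: list[str]   # Budgets, style rules, hard requirements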