Multi-Agent Orchestration: A Practical Guide for 2026
You've decided your system needs multiple agents. Good — for the right problem, multi-agent architectures dramatically outperform single agents. Now comes the hard part: how do they talk to each other, who decides what runs when, and what happens when an agent fails?
Multi-agent orchestration is the engineering of coordination. It's the difference between three brilliant people in a room shouting over each other and three brilliant people in a room with a clear agenda and a facilitator.
This guide covers the four patterns that dominate production multi-agent systems in 2026, with real code using OpenAI Agents SDK and LangGraph. We'll also cover the error handling that most tutorials skip — the part that determines whether your system works in production or only in demos.
The Four Orchestration Patterns
Every multi-agent system you'll encounter maps to one (or a combination) of these patterns:
1. Triage (Router) — One agent classifies the request and hands off to a specialist
2. Fan-out / Fan-in (Parallel) — Multiple agents work simultaneously, results are merged
3. Supervisor (Hierarchical) — A central agent delegates tasks and reviews results
4. Pipeline (Sequential) — Agents execute in a fixed order, each building on the previous output
Let's build each one.
Pattern 1: Triage Agent (The Router)
The triage pattern is the most common in production. A lightweight classifier agent receives every request, determines what kind of task it is, and hands the entire conversation to a specialized agent.
This is how customer support systems, coding assistants, and enterprise AI deployments work at scale. OpenAI's Agents SDK was designed around this pattern.
OpenAI Agents SDK implementation
from agents import Agent, Runner

# Specialist agents
billing_agent = Agent(
    name="Billing Agent",
    instructions="""You handle billing questions: invoices,
    payments, refunds, pricing. Be precise with numbers.""",
    model="gpt-5.1"
)

technical_agent = Agent(
    name="Technical Agent",
    instructions="""You handle technical support: bugs,
    configuration, API issues, error codes.""",
    model="gpt-5.1"
)

general_agent = Agent(
    name="General Agent",
    instructions="""You handle general questions: product info,
    feature requests, account management.""",
    model="gpt-5-nano"  # Cheaper model for simple queries
)

# Triage agent with handoffs
triage_agent = Agent(
    name="Triage Agent",
    instructions="""You are a routing agent. Classify the user's
    request and hand off to the right specialist:
    - Billing questions → Billing Agent
    - Technical issues → Technical Agent
    - Everything else → General Agent
    Do NOT answer the question yourself. Always hand off.""",
    model="gpt-5-nano",  # Fast, cheap classifier
    handoffs=[billing_agent, technical_agent, general_agent]
)

# Run
result = await Runner.run(triage_agent, "My invoice shows the wrong amount")
print(f"Handled by: {result.last_agent.name}")
print(result.final_output)
# Handled by: Billing Agent
Key design decisions
- Use the cheapest model for triage. The router doesn't need to be smart — it needs to be fast and cheap. GPT-5-nano or Claude Haiku at the gate, heavy models behind it.
- Handoffs transfer the full conversation. In the OpenAI SDK, when a handoff occurs, the receiving agent gets the complete conversation history. You can use nest_handoff_history=True to compress prior history into a single message if context is getting large.
- Declare handoff targets explicitly. The triage agent can only hand off to agents listed in its handoffs array. This prevents rogue routing and makes the system auditable.
When to use triage
Triage works best when:
- Requests fall into clear, non-overlapping categories
- Each category needs different expertise or instructions
- Volume is high enough that routing overhead pays for itself
- You want to use different models per category (expensive for complex, cheap for simple)
Pattern 2: Fan-Out / Fan-In (Parallel Execution)
When a task can be decomposed into independent subtasks, run them in parallel. A coordinator splits the work, fires off agents simultaneously, waits for all results, and merges them.
This pattern is ideal for research tasks, multi-source analysis, and any workflow where subtasks don't depend on each other.
LangGraph implementation
import asyncio
import operator
from typing import TypedDict, Annotated

from langgraph.graph import StateGraph, START, END

# `llm` is assumed to be any async client exposing generate(prompt) -> str;
# swap in your model binding of choice.

class ResearchState(TypedDict):
    query: str
    results: Annotated[list, operator.add]  # Merge results from parallel nodes
    summary: str

async def search_academic(state: ResearchState) -> dict:
    """Agent 1: Search academic papers."""
    result = await llm.generate(
        f"Find academic research on: {state['query']}. "
        "Return key findings with citations."
    )
    return {"results": [{"source": "academic", "content": result}]}

async def search_industry(state: ResearchState) -> dict:
    """Agent 2: Search industry reports."""
    result = await llm.generate(
        f"Find industry analysis on: {state['query']}. "
        "Return market data and trends."
    )
    return {"results": [{"source": "industry", "content": result}]}

async def search_news(state: ResearchState) -> dict:
    """Agent 3: Search recent news."""
    result = await llm.generate(
        f"Find recent news about: {state['query']}. "
        "Return latest developments and announcements."
    )
    return {"results": [{"source": "news", "content": result}]}

async def synthesize(state: ResearchState) -> dict:
    """Merge all results into a coherent summary."""
    all_results = "\n\n".join(
        f"[{r['source']}]: {r['content']}" for r in state["results"]
    )
    summary = await llm.generate(
        f"Synthesize these research findings into a coherent report:\n{all_results}"
    )
    return {"summary": summary}

# Build the graph
graph = StateGraph(ResearchState)
graph.add_node("academic", search_academic)
graph.add_node("industry", search_industry)
graph.add_node("news", search_news)
graph.add_node("synthesize", synthesize)

# Fan-out: START → all three in parallel
graph.add_edge(START, "academic")
graph.add_edge(START, "industry")
graph.add_edge(START, "news")

# Fan-in: all three → synthesize
graph.add_edge("academic", "synthesize")
graph.add_edge("industry", "synthesize")
graph.add_edge("news", "synthesize")
graph.add_edge("synthesize", END)

app = graph.compile()
result = await app.ainvoke({"query": "AI agent market size 2026", "results": []})
Why LangGraph for parallel patterns
LangGraph models workflows as directed graphs with typed state. Each node is a function (or agent), edges define flow, and state is passed between nodes. The framework handles:
- Parallel execution — Nodes with no dependency between them run concurrently
- State merging — The Annotated[list, operator.add] pattern automatically concatenates results from parallel branches
- Conditional routing — Add routing functions via add_conditional_edges to branch on intermediate state (see the sketch below)
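Here's a minimal sketch of that third point, assuming you swap the fixed news → synthesize edge for a routed one; the early-exit rule itself is illustrative, not part of the example above:

def route_after_news(state: ResearchState) -> str:
    # Skip synthesis entirely if the fan-out came back empty.
    return "synthesize" if state["results"] else END

graph.add_conditional_edges("news", route_after_news, ["synthesize", END])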
This is more flexible than the OpenAI SDK's handoff model for workflows that aren't strictly linear. The trade-off: more configuration, steeper learning curve.
Pattern 3: Supervisor (The Manager)
The supervisor pattern puts one agent in charge. It receives the task, breaks it into subtasks, delegates to workers, reviews their output, and decides whether to accept, revise, or reassign.
This is the most powerful pattern — and the most expensive, because the supervisor makes an LLM call for every delegation and review decision. For cost management strategies, see our prompt caching guide — caching can cut multi-agent costs by 50–90%.
Supervisor with tool-calling
The cleanest way to implement a supervisor: make worker agents available as tools.
from agents import Agent, Runner, function_tool

# Heads-up: the Claude model strings below assume you've wired up non-OpenAI
# models (e.g. via the SDK's LiteLLM integration and a "litellm/"-prefixed
# model name); out of the box the SDK targets OpenAI models.

@function_tool
async def research(query: str) -> str:
    """Delegate a research task to the research agent."""
    researcher = Agent(
        name="Researcher",
        instructions="You are a thorough researcher. Search and summarize.",
        model="gpt-5.1"
    )
    result = await Runner.run(researcher, query)
    return result.final_output

@function_tool
async def write_code(spec: str) -> str:
    """Delegate a coding task to the coding agent."""
    coder = Agent(
        name="Coder",
        instructions="You write clean, tested Python code.",
        model="claude-sonnet-4-20250514"
    )
    result = await Runner.run(coder, spec)
    return result.final_output

@function_tool
async def review_code(code: str) -> str:
    """Delegate a code review to the review agent."""
    reviewer = Agent(
        name="Reviewer",
        instructions="Review code for bugs, security, and style.",
        model="gpt-5.1"
    )
    result = await Runner.run(reviewer, f"Review this code:\n{code}")
    return result.final_output

supervisor = Agent(
    name="Supervisor",
    instructions="""You are a project manager. Break tasks into subtasks
    and delegate to specialists. Review all results before presenting
    to the user. If a result is unsatisfactory, explain what's wrong
    and delegate again.""",
    model="claude-opus-4-20250514",  # Smartest model for coordination
    tools=[research, write_code, review_code]
)

result = await Runner.run(
    supervisor,
    "Build a Python function that fetches weather data from Open-Meteo API"
)
Supervisor vs triage: when to use which
| | Triage | Supervisor |
|---|---|---|
| Control flow | One handoff, specialist runs to completion | Supervisor maintains control, iterates |
| Task complexity | Single-domain requests | Multi-step, multi-domain tasks |
| Cost | Low (1 routing call + 1 specialist) | High (supervisor call per delegation + review) |
| Error recovery | Specialist handles own errors | Supervisor can reassign or retry |
Use triage for high-volume, single-intent routing. Use supervisor for complex, multi-step tasks where quality matters more than speed.
Pattern 4: Pipeline (Sequential Processing)
Pipelines pass output from one agent to the next in a fixed order. Each agent transforms or enriches the data before passing it forward.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END

# `llm` is the same assumed async client as in the fan-out example.

class ArticleState(TypedDict):
    topic: str
    outline: str
    draft: str
    edited: str
    final: str

async def outliner(state: ArticleState) -> dict:
    outline = await llm.generate(
        f"Create a detailed outline for an article about: {state['topic']}"
    )
    return {"outline": outline}

async def writer(state: ArticleState) -> dict:
    draft = await llm.generate(
        f"Write a full article based on this outline:\n{state['outline']}"
    )
    return {"draft": draft}

async def editor(state: ArticleState) -> dict:
    edited = await llm.generate(
        f"Edit this draft for clarity, accuracy, and style:\n{state['draft']}"
    )
    return {"edited": edited}

async def formatter(state: ArticleState) -> dict:
    final = await llm.generate(
        f"Format this article with proper headings, meta description, "
        f"and SEO optimization:\n{state['edited']}"
    )
    return {"final": final}

graph = StateGraph(ArticleState)
graph.add_node("outliner", outliner)
graph.add_node("writer", writer)
graph.add_node("editor", editor)
graph.add_node("formatter", formatter)

graph.add_edge(START, "outliner")
graph.add_edge("outliner", "writer")
graph.add_edge("writer", "editor")
graph.add_edge("editor", "formatter")
graph.add_edge("formatter", END)

app = graph.compile()
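Running it end to end (the topic string is illustrative):

result = await app.ainvoke({"topic": "Multi-agent orchestration patterns"})
print(result["final"])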
Pipelines are simple and predictable. Each agent has a clear input and output contract. The downside: if step 2 produces bad output, steps 3 and 4 build on a rotten foundation. Combine with quality gates (conditional edges that loop back on failure) for robustness.
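Here's one shape a quality gate can take, as a sketch with stated assumptions: an approved flag and a revisions counter added to the state (both hypothetical fields), an LLM-based check, and a hard cap on revision loops so a persistent failure can't spin forever. It replaces the fixed editor → formatter edge.

class GatedArticleState(ArticleState):
    approved: bool
    revisions: int

async def quality_gate(state: GatedArticleState) -> dict:
    verdict = await llm.generate(
        f"Reply APPROVED or REJECTED. Is this draft publishable?\n{state['edited']}"
    )
    return {"approved": "APPROVED" in verdict,
            "revisions": state.get("revisions", 0) + 1}

def route_after_gate(state: GatedArticleState) -> str:
    # Accept, or loop back to the writer; give up after two revisions.
    if state["approved"] or state["revisions"] >= 2:
        return "formatter"
    return "writer"

# Rebuild with the gated state, inserting the gate between editor and formatter:
graph = StateGraph(GatedArticleState)
# ... add the four nodes and earlier edges as before, plus:
graph.add_node("quality_gate", quality_gate)
graph.add_edge("editor", "quality_gate")
graph.add_conditional_edges("quality_gate", route_after_gate, ["formatter", "writer"])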
Error Handling: The Part Most Tutorials Skip
According to GitHub's engineering team, most multi-agent workflows fail not because agents produce bad output, but because errors propagate unchecked between agents. Here's the production playbook:
1. Timeouts on every agent call
import asyncio

from agents import Runner

async def call_agent_with_timeout(agent, message, timeout_seconds=30):
    try:
        result = await asyncio.wait_for(
            Runner.run(agent, message),
            timeout=timeout_seconds
        )
        return result.final_output
    except asyncio.TimeoutError:
        return {"error": "Agent timed out", "agent": agent.name}
Without timeouts, a single stuck agent stalls your entire pipeline indefinitely.
2. Retries with exponential backoff
import random

async def call_with_retry(agent, message, max_retries=3):
    for attempt in range(max_retries):
        try:
            result = await call_agent_with_timeout(agent, message)
            # Success returns a plain string; failures return an error dict.
            if not (isinstance(result, dict) and "error" in result):
                return result
        except Exception:
            if attempt == max_retries - 1:
                raise
        wait = (2 ** attempt) + random.uniform(0, 1)  # Jitter
        await asyncio.sleep(wait)
    return {"error": "Max retries exceeded", "agent": agent.name}
3. Circuit breakers
If an agent fails repeatedly, stop calling it. This prevents cascading failures where Agent A's error triggers Agent B's error handler, which calls Agent A again — creating an infinite retry loop.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure = 0
        self.state = "closed"  # closed = healthy, open = broken

    def can_call(self) -> bool:
        if self.state == "closed":
            return True
        if time.time() - self.last_failure > self.reset_timeout:
            self.state = "half-open"  # Allow one probe call through
            return True
        return False

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self):
        self.failures += 1
        self.last_failure = time.time()
        if self.failures >= self.threshold:
            self.state = "open"
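A usage sketch tying the breaker to the retry helper above (call_agent_safely is a name introduced here, not an SDK function):

breaker = CircuitBreaker()

async def call_agent_safely(agent, message):
    # Refuse outright while the circuit is open.
    if not breaker.can_call():
        return {"error": "Circuit open, agent temporarily disabled", "agent": agent.name}
    result = await call_with_retry(agent, message)
    if isinstance(result, dict) and "error" in result:
        breaker.record_failure()
    else:
        breaker.record_success()
    return result

In practice, keep one breaker per agent so a single flaky specialist doesn't trip the circuit for everyone.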
4. Structured handoff contracts
Treat inter-agent communication like API contracts. Define what each agent expects as input and produces as output using schemas. Schema violations should be caught immediately — not three agents downstream.
from pydantic import BaseModel

class ResearchResult(BaseModel):
    summary: str
    sources: list[str]
    confidence: float  # 0-1

class CodeResult(BaseModel):
    code: str
    language: str
    tests_passed: bool
When an agent produces output that doesn't match its schema, retry, repair, or escalate — but never pass malformed data to the next agent.
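A validate-and-repair sketch, assuming the agent has been instructed to emit JSON and using Pydantic v2's model_validate_json (run_with_schema is a name introduced here):

from pydantic import ValidationError

async def run_with_schema(agent, message, schema: type[BaseModel], max_repairs=1):
    result = await Runner.run(agent, message)
    for attempt in range(max_repairs + 1):
        try:
            # Validate at the boundary, before anything flows downstream.
            return schema.model_validate_json(result.final_output)
        except ValidationError as err:
            if attempt == max_repairs:
                raise  # Escalate; never pass malformed data onward
            result = await Runner.run(
                agent,
                f"Your previous output failed validation:\n{err}\n"
                f"Return only JSON matching this schema:\n{schema.model_json_schema()}"
            )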
Framework Comparison: OpenAI Agents SDK vs LangGraph
| Feature | OpenAI Agents SDK | LangGraph |
|---|---|---|
| Orchestration model | Handoff-based (agent transfer) | Graph-based (state machine) |
| Best for | Triage, linear handoffs | Parallel, conditional, loops |
| Model support | OpenAI natively; other models via LiteLLM integration | Any model via LangChain |
| State management | Conversation history | Typed state dict |
| Built-in features | Handoffs, guardrails, tracing | Checkpointing, human-in-the-loop, streaming |
| Complexity | Low — opinionated, simple | Higher — flexible, more config |
| Scale limit | Gets unwieldy at 8-10 agent types | Handles complex graphs well |
Rule of thumb: If your workflow is handoff-based and you're in the OpenAI ecosystem, use the Agents SDK. If you need parallel execution, conditional branching, or model-agnostic support, use LangGraph.
Other frameworks worth evaluating: CrewAI (role-based crews, good for structured team workflows), Google ADK (hierarchical agent trees, tight Gemini integration), and AutoGen/AG2 (conversational GroupChat pattern). See our single-agent vs multi-agent comparison for guidance on whether you need multi-agent at all.
Production Checklist
Before deploying a multi-agent system, verify:
- [ ] Every agent call has a timeout — No unbounded waits
- [ ] Retries have limits and backoff — No infinite loops
- [ ] Circuit breakers are in place — Failed agents get isolated
- [ ] Inter-agent data has schemas — Catch malformed output early
- [ ] Observability is enabled — Trace every agent call, handoff, and error
- [ ] Costs are bounded — Set token/call budgets per request to prevent runaway loops (see the budget sketch after this list)
- [ ] Fallback paths exist — If the specialist fails, the system degrades gracefully
- [ ] Context is managed — Handoffs pass relevant state, not entire histories (context engineering matters here)
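On the cost item: a minimal per-request budget guard. BudgetGuard is a name introduced here and the limits are illustrative; in practice you'd read token counts from your SDK's usage metadata.

class BudgetGuard:
    """Abort the agent loop once a request exceeds its call or token budget."""

    def __init__(self, max_agent_calls=20, max_tokens=200_000):
        self.calls = 0
        self.tokens = 0
        self.max_calls = max_agent_calls
        self.max_tokens = max_tokens

    def charge(self, tokens_used: int):
        self.calls += 1
        self.tokens += tokens_used
        if self.calls > self.max_calls or self.tokens > self.max_tokens:
            raise RuntimeError("Per-request budget exceeded, aborting agent loop")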
FAQ
What's the difference between agent handoffs and agent-as-tool?
In a handoff, the first agent transfers control completely — the receiving agent takes over the conversation and the original agent exits. In agent-as-tool, the parent agent invokes the child agent as a function call, gets back a result, and maintains control. Handoffs are simpler for linear routing; agent-as-tool is better when the parent needs to coordinate multiple agents or review results before continuing.
How many agents should a multi-agent system have?
Start with the minimum that solves your problem — usually 2-4. The OpenAI Agents SDK gets unwieldy above 8-10 agent types. LangGraph handles more complex graphs, but debugging difficulty scales superlinearly with agent count. Add agents only when a single agent demonstrably can't handle the domain. As Anthropic's research shows, sometimes one well-prompted agent outperforms five mediocre ones.
Do all agents in a multi-agent system need the same LLM?
No — and mixing models is a best practice. Use cheap, fast models (GPT-5-nano, Claude Haiku) for routing and classification. Use capable models (GPT-5.1, Claude Sonnet 4) for specialist work. Use the strongest model (Claude Opus 4) for supervision and review. This optimizes both cost and quality.
How do I debug a multi-agent system?
Tracing is essential. Both the OpenAI Agents SDK (built-in tracing) and LangGraph (LangSmith integration) provide end-to-end trace visualization showing every agent call, tool invocation, handoff, and state mutation. Log structured data at every agent boundary. Treat schema violations as bugs, not warnings.
What's the biggest mistake teams make with multi-agent systems?
Building multi-agent when single-agent would suffice. According to Towards Data Science's "Multi-Agent Trap" analysis, the most common failure is adding coordination overhead without proportional capability gains. Start single-agent, measure where it fails, then split only the failing parts into separate agents. The other big mistake: no error handling between agents. An agent failure that silently propagates bad data downstream is worse than a crash.
Can I combine orchestration patterns?
Absolutely — most production systems do. A common architecture: a triage agent at the front (Pattern 1) routes to a supervisor (Pattern 3) that coordinates specialist workers using fan-out (Pattern 2) for independent subtasks and pipelines (Pattern 4) for sequential processing. Start with one pattern, prove it works, then compose.
> 💡 Local model tip: Run specialist workers on Qwen 3.5 14B (~$0.20/hr on Vast.ai via vLLM) — 10-50× cheaper than API calls for high-volume agent tasks.
How do multi-agent systems handle context across handoffs?
The OpenAI Agents SDK passes the full conversation history on handoff by default. For large conversations, set nest_handoff_history=True to compress prior history into a single assistant message. In LangGraph, you control state explicitly — pass only the fields each agent needs. The best practice: send structured context objects (task description, key findings, constraints) rather than raw conversation history. This reduces latency and prevents context window bloat.
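As a sketch, such a structured context object might look like this (HandoffContext is a hypothetical name, not an SDK type):

from pydantic import BaseModel

class HandoffContext(BaseModel):
    task: str                # What the receiving agent must do
    key_findings: list[str]  # Distilled results so far, not raw transcripts
    constraints: list[str]   # Budgets, style rules, hard requirements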