Use this when one job needs more than one agent working together: an orchestrator dispatching workers, a pipeline of specialists, a peer handoff, or a critic checking a producer. Covers the decision (should this be multi-agent at all), the topology, the handoff contract between agents, shared state, and how you keep cost, latency, and failure under control across the whole system. If a single agent with the right tools can do the job, you do not need this skill; use /ai-agent-design instead. Most jobs do not need multi-agent.
Related skills: Design each agent first with /ai-agent-design. Evaluate the system end to end with /agent-eval-harness. Plan tool and data access with /mcp-integration-plan. Monitor it in production with /llm-observability-plan.
The hard part most teams miss
Adding agents feels like adding capacity. It is usually adding surface area to fail on.
- Multi-agent multiplies cost, latency, and failure surface, and the coordination tax often eats the benefit. Two agents are not twice the work; they are the work plus every round trip between them, every duplicated context window, and every chance one stalls while the other waits. If a single agent with the right tools clears the bar, ship that. Reach for multiple agents only when the job genuinely needs parallel breadth or separated context that one agent cannot hold at once.
- The handoff contract is the real failure point, not the agents. Each agent can be individually excellent and the system still produces garbage, because agent A handed agent B something B did not expect. What A passes and what B must return is the interface, and an unspecified interface is where every multi-agent system breaks. Pin the contract before you tune any single agent.
- One place has to own termination and total cost, or the system runs away. Per-agent step caps do not bound the system; a parent can re-dispatch workers forever under its own cap. You need a single owner of the global budget and the global stop condition. Without it, a loop you cannot see spends money you did not approve.
Process
Step 1: Gather inputs
Ask the user:
- What is the job, end to end? (One or two sentences. The outcome, not the agents.)
- Why can't one agent do it? (Be specific: parallel breadth, context that won't fit one window, genuinely distinct skills, or an independent check. If you can't answer, it is probably a single agent.)
- What are the distinct roles? (Each agent's job and its one responsibility. If two roles blur, they are one agent.)
- What does "done" look like for the whole system? (A checkable success condition for the job, not per agent.)
- What is the total budget? (Across all agents: rough tool-call ceiling, time, and dollar cost before the system must stop.)
- What is the cost of a wrong final answer? (Reversible and cheap, or irreversible and expensive. This sets how hard the critic or human gate must be.)
Step 2: Confirm it should be multi-agent
Single agent is the default. Go multi-agent only if at least one of these conditions holds and no cheaper tier (a single agent or a plain workflow) clears the bar:
- Parallel breadth: the job splits into independent sub-tasks that genuinely run at once and shorten wall-clock time.
- Context separation: the work needs more focused context than one window can hold well, so isolated agents each carry their own slice.
- Distinct expertise: sub-tasks need materially different tools, prompts, or models, not just different phrasing.
- Independent verification: the answer needs a separate critic that did not produce it.
If none hold, drop to a single agent (/ai-agent-design) or a plain workflow. Say so plainly. The coordination tax is real and the cheaper tier usually wins.
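The Step 2 gate can be written down as a small check. This is a minimal sketch, not a prescribed implementation: the `MultiAgentCase` fields and the `recommend_tier` function are hypothetical names that simply mirror the four conditions above and the "cheaper tier clears the bar" test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiAgentCase:
    """Answers to the four Step 2 conditions (illustrative field names)."""
    parallel_breadth: bool = False
    context_separation: bool = False
    distinct_expertise: bool = False
    independent_verification: bool = False
    # True when a single agent or a plain workflow is good enough.
    cheaper_tier_clears_bar: bool = True


def recommend_tier(case: MultiAgentCase) -> str:
    """Recommend multi-agent only if a condition holds and no cheaper tier works."""
    conditions = (
        "parallel_breadth",
        "context_separation",
        "distinct_expertise",
        "independent_verification",
    )
    holds = [name for name in conditions if getattr(case, name)]
    if holds and not case.cheaper_tier_clears_bar:
        return "multi-agent: " + ", ".join(holds)
    return "single agent or plain workflow"
```

Note that the default case, and any case where a cheaper tier still works, falls through to the single-agent recommendation, which matches the "single agent is the default" rule.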
Step 3: Choose the topology
Pick the simplest shape that fits. Name it explicitly.
- Orchestrator-worker: a lead agent decomposes the job, dispatches workers (often in parallel), and synthesizes their results. The common production pattern, and the right default when sub-tasks are independent and the lead can judge the whole.
- Sequential pipeline: agents run in a fixed order, each consuming the prior output. Use when stages have a hard dependency order. Cheapest to reason about; no real parallelism.
- Peer handoff: control passes between agents by role (triage hands to a specialist, which hands back). Use for routing-style work where the next owner depends on content.
- Debate / critic: a producer and a critic (or several) iterate until the critic passes or a cap is hit. Use when the cost of a wrong answer justifies an independent check. Cap the rounds hard.
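As one illustration of a topology with a hard cap, the debate / critic shape can be sketched as a bounded loop. This assumes the producer and critic are plain callables; `critic_loop`, `stub_producer`, and `stub_critic` are hypothetical stand-ins for real agent calls.

```python
from typing import Callable, Optional, Tuple


def critic_loop(
    produce: Callable[[Optional[str]], str],
    critique: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Iterate producer and critic until the critic passes or the cap is hit.

    produce(feedback) returns a draft. critique(draft) returns None to pass,
    or a feedback string to request another round.
    Returns (last_draft, passed, rounds_used).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = produce(feedback)
        feedback = critique(draft)
        if feedback is None:
            return draft, True, round_no
    # Cap reached without a pass: report it instead of looping on.
    return draft, False, max_rounds


# Stub agents standing in for real model calls (assumptions for the demo).
def stub_producer(feedback: Optional[str]) -> str:
    return "draft" if feedback is None else "draft with sources"


def stub_critic(draft: str) -> Optional[str]:
    return None if "sources" in draft else "cite sources"


result = critic_loop(stub_producer, stub_critic)
# result == ("draft with sources", True, 2)
```

Returning the `passed` flag alongside the draft lets the caller decide what an unpassed result means (escalate to a human, fail the run) rather than silently shipping the last draft.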
Step 4: Pin the handoff contract
This is the load-bearing step. For every edge between agents, define the interface so neither side guesses:
- What the sender passes: the exact payload, its shape, and what is required versus optional. Pass the result, not the full transcript.
- What the receiver must return: the expected output shape and the success or failure signal the caller branches on.
- What "bad input" looks like and who handles it: the receiver validates what it got and rejects clearly, rather than improvising on a malformed payload.
- Shared state vs passed state: decide what lives in shared memory all agents read, and what is passed point to point. Keep shared state small and name its single writer; many writers corrupt it silently.
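A handoff contract for one edge might look like the following. It is a sketch under stated assumptions: the research-worker payload, the field names, and the three status values are all invented for illustration, and the worker body is a stub rather than a real model call.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ResearchRequest:
    """What the sender passes (hypothetical payload for a research worker)."""
    run_id: str            # required
    question: str          # required
    max_sources: int = 5   # optional, with a default


@dataclass(frozen=True)
class WorkerResult:
    """What the receiver must return; the caller branches on `status`."""
    run_id: str
    status: str            # "ok", "rejected", or "failed"
    findings: List[str] = field(default_factory=list)
    error: Optional[str] = None


def parse_request(payload: Dict[str, Any]) -> ResearchRequest:
    """Validate the incoming payload; raise ValueError rather than improvise."""
    missing = [key for key in ("run_id", "question") if not payload.get(key)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    max_sources = payload.get("max_sources", 5)
    if isinstance(max_sources, bool) or not isinstance(max_sources, int) or max_sources < 1:
        raise ValueError("max_sources must be a positive integer")
    return ResearchRequest(str(payload["run_id"]), str(payload["question"]), max_sources)


def handle(payload: Dict[str, Any]) -> WorkerResult:
    """Receiver entry point: reject bad input clearly, otherwise do the work."""
    try:
        request = parse_request(payload)
    except ValueError as exc:
        return WorkerResult(run_id=str(payload.get("run_id", "")),
                            status="rejected", error=str(exc))
    # A real worker would call its model and tools here; this stub only echoes.
    return WorkerResult(run_id=request.run_id, status="ok",
                        findings=[f"stub finding for: {request.question}"])
```

The payload carries the question and limits, not the sender's transcript, and a malformed payload comes back as a `rejected` result the caller can branch on.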
Step 5: Control cost, latency, and failure across the system
Per-agent limits are not enough. Bound the whole thing:
- Global termination: one owner holds the system stop condition: success met, total budget hit, or unrecoverable failure. This sits on top of each agent's own step cap; it does not replace it.
- Total budget: a single ceiling on tool calls, time, and spend across all agents. A parent that re-dispatches workers can blow past every per-agent cap while staying inside each one.
- Latency: parallel work is bounded by the slowest worker plus synthesis. Set per-worker timeouts and decide whether the lead proceeds on partials or fails the run.
- Failure isolation: one worker's failure must not corrupt the run. Return its error to the orchestrator as a result it can route around, retry, or drop, never as a crash that takes the system down.
- Per-agent observability: trace each agent and each handoff separately with a shared run id, so "the system is broken" resolves to a specific agent or a specific edge. See /llm-observability-plan.
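The system-level controls above can be combined in one orchestrator sketch using Python's standard `asyncio`. Everything named here (`GlobalBudget`, `run_worker`, `orchestrate`, the status strings) is hypothetical, and the budget counts worker dispatches as a simple stand-in for tool calls, time, and spend.

```python
import asyncio
import uuid
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, List


class BudgetExceeded(Exception):
    """Raised when a dispatch would exceed the system-wide ceiling."""


@dataclass
class GlobalBudget:
    """Single owner of the total ceiling across all agents."""
    max_calls: int
    calls_used: int = 0

    def charge(self, calls: int = 1) -> None:
        if self.calls_used + calls > self.max_calls:
            raise BudgetExceeded(f"budget of {self.max_calls} calls exhausted")
        self.calls_used += calls


async def run_worker(
    name: str,
    task: Callable[[], Awaitable[Any]],
    budget: GlobalBudget,
    timeout_s: float,
    run_id: str,
) -> Dict[str, Any]:
    """Run one worker; every outcome comes back as a result, never a crash."""
    base = {"run_id": run_id, "agent": name}
    try:
        budget.charge()  # charged against the global ceiling before any work
        output = await asyncio.wait_for(task(), timeout=timeout_s)
        return {**base, "status": "ok", "output": output}
    except asyncio.TimeoutError:
        return {**base, "status": "timeout"}
    except BudgetExceeded as exc:
        return {**base, "status": "budget_exceeded", "error": str(exc)}
    except Exception as exc:  # failure isolation: contain any worker error
        return {**base, "status": "failed", "error": repr(exc)}


async def orchestrate(
    tasks: Dict[str, Callable[[], Awaitable[Any]]],
    budget: GlobalBudget,
    timeout_s: float = 1.0,
) -> List[Dict[str, Any]]:
    """Dispatch workers in parallel under one budget and one shared run id."""
    run_id = uuid.uuid4().hex
    results = await asyncio.gather(
        *(run_worker(name, task, budget, timeout_s, run_id)
          for name, task in tasks.items())
    )
    return list(results)
```

The lead then applies its partial policy to the returned list: proceed on the `ok` results, retry the `timeout` ones if budget remains, or fail the run. Because every result carries the same `run_id` and its own `agent` name, a bad outcome can be traced to a specific worker.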
Step 6: Output the orchestration design
# Multi-Agent Orchestration: {{system_name}}
**Job:** {{one sentence}}
**Done means:** {{checkable system-level success condition}}
**Why multi-agent:** {{which Step 2 condition holds, and why a single agent fails}}
**Topology:** {{orchestrator-worker / pipeline / peer handoff / debate}}
## Agents
| Agent | Responsibility | Model/tools | Step cap |
|---|---|---|---|
## Handoff contracts
| Edge (A -> B) | A passes | B returns | Bad-input handling |
|---|---|---|---|
## Shared state
- What is shared: {{fields}}
- Single writer: {{who}}
- What is passed point to point: {{payloads}}
## System control
- Global termination owner: {{who}}
- Total budget (calls / time / spend): {{values}}
- Per-worker timeout + partial policy: {{value, proceed-on-partial or fail}}
- Failure isolation: {{how a worker failure is contained}}
- Observability: {{shared run id, per-agent + per-edge traces}}
## Open questions
- {{unresolved decisions}}
Step 7: Review
Ask the user:
- Could a single agent with these tools do this instead? (If yes, build that.)
- For each handoff, what happens when the sender passes something malformed?
- Who owns the global stop, and what is the worst-case total spend before it fires?
- When one worker hangs or fails, does the system degrade or die?
- Can you tell which agent or which edge caused a bad result, from the traces alone?
Anti-patterns
| Anti-pattern | Why it fails | Do instead |
|---|---|---|
| Multi-agent where one agent fits | Pays the coordination tax for breadth you never needed | Default to a single agent; go multi only when Step 2 holds |
| Unspecified handoff contract | Each agent works, the system still produces garbage at the seam | Pin payload in and result out for every edge before tuning agents |
| Only per-agent caps | A parent re-dispatches workers and blows the system budget while each cap holds | One owner of global termination and total spend |
| Passing the full transcript downstream | Context and cost balloon as every agent carries every other's history | Pass results, not transcripts; keep shared state small |
| Shared state with many writers | Agents overwrite each other and the corruption is invisible | One named writer per field; others read only |
| A worker error crashes the run | One failure kills work the system could have routed around | Return errors to the orchestrator as results to handle |
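The "one named writer per field" fix from the table can be enforced in code rather than by convention. The sketch below is one possible shape, assuming shared state is a small in-process dictionary; `SharedState` and its methods are hypothetical names.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping


class SharedState:
    """Shared fields where each field has exactly one named writer."""

    def __init__(self, writers: Mapping[str, str]) -> None:
        # Maps each field to the only agent allowed to write it.
        self._writers: Dict[str, str] = dict(writers)
        self._data: Dict[str, Any] = {}

    def write(self, agent: str, field: str, value: Any) -> None:
        owner = self._writers.get(field)
        if owner is None:
            raise KeyError(f"unknown shared field: {field}")
        if agent != owner:
            raise PermissionError(f"{agent} may not write {field}; owner is {owner}")
        self._data[field] = value

    def read(self) -> Mapping[str, Any]:
        """Return a read-only snapshot that any agent may inspect."""
        return MappingProxyType(dict(self._data))


state = SharedState({"plan": "orchestrator"})
state.write("orchestrator", "plan", ["search", "summarize"])
```

A write from any other agent raises `PermissionError`, so an overwrite shows up as a loud failure at the call site instead of silent corruption.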
Output location
Present the orchestration design as formatted text in the conversation for the user to copy into their design doc.