
Runtime Subagents: Orchestration as Code

I often ask coding agents to do work that sounds like this:

Check each of these five areas in parallel. Follow up on anything that looks risky. Compare the results, and make a recommendation.

There is a fan-out. There is a join. There is a conditional second pass. There is aggregation at the end. If I describe that in prose, the model has to hold the whole orchestration plan in its head while it is also reading code, choosing tools, tracking partial results, deciding what to launch next, and remembering how the branches relate to each other.

The point of runtime subagents is to make that orchestration explicit: give the model run(), join(), and cancel(), and let it write the workflow as JavaScript.

Prose Is The Wrong Runtime

Ask a model to narrate a ten-branch, two-wave, race-and-cancel workflow and it will try. But prose is just a bad medium for concurrent control flow. Code has already solved this, and the model knows code.

Cloudflare's Code Mode made the same observation at the tool layer: models write code more naturally than they call tools. They know Promise.all, async/await, and typed interfaces; the tool-call format is comparatively synthetic. Cloudflare's fix was to convert the MCP schema into a TypeScript API and let the model write code.

The same argument extends to orchestration. The model writes an async body with three primitives: run() to spawn a child, join() to collect its result, and cancel() to stop work. Once run is non-blocking, basic fan-out is just JavaScript:

const agents = await Promise.all([
  run({ prompt: "Find security risks" }),
  run({ prompt: "Find database risks" }),
  run({ prompt: "Find frontend risks" }),
]);

try {
  // Collect all outputs within a strict 15-second deadline
  const pendingJoins = agents.map(a => join(a.id, { timeout: 15000 }));
  return await Promise.all(pendingJoins);
} catch (err) {
  // A join failed or ran past the deadline: kill every branch to save tokens
  await Promise.all(agents.map(a => cancel(a.id)));
  throw new Error("Scan failed or timed out.", { cause: err });
}

The plan is now in code and not working memory. Each branch runs in an isolated context. The parent collects clean outputs instead of absorbing every intermediate token from every child.
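The fan-out above stops at the join. The conditional second pass and the aggregation from the opening example can be written the same way. What follows is a minimal sketch, not code from the extension: reviewAreas, createFakePrimitives, the "RISKY" marker, the prompts, and the report shape are all assumptions made for illustration, and the fake primitives exist only so the sketch runs without a harness.

```javascript
// Two-wave orchestration: fan out, join, follow up only on risky areas, then aggregate.
// `run` and `join` mirror the primitives from the post; everything else is hypothetical.
async function reviewAreas({ run, join }, areas) {
  // Wave 1: one child per area, launched in parallel.
  const first = await Promise.all(
    areas.map(async (area) => {
      const { id } = await run({ prompt: `Check ${area} for risks` });
      return { area, result: await join(id) };
    }),
  );

  // Conditional second pass: only areas whose output was flagged get a follow-up.
  const risky = first.filter(({ result }) => /RISKY/.test(result.output ?? ""));
  const followUps = await Promise.all(
    risky.map(async ({ area, result }) => {
      const { id } = await run({ prompt: `Dig deeper into ${area}:\n${result.output}` });
      return { area, result: await join(id) };
    }),
  );

  // Aggregation: the caller gets one compact report, not every child transcript.
  return {
    checked: first.map(({ area }) => area),
    escalated: followUps.map(({ area, result }) => ({ area, detail: result.output })),
  };
}

// Stand-in primitives (assumption): canned outputs keyed by the prompt's prefix.
function createFakePrimitives() {
  const outputs = new Map();
  let nextId = 1;
  return {
    async run({ prompt }) {
      const id = nextId++;
      let output = "nothing notable";
      if (prompt.startsWith("Check auth")) output = "RISKY: tokens never expire";
      if (prompt.startsWith("Dig deeper")) output = "token reuse confirmed";
      outputs.set(id, output);
      return { id };
    },
    async join(id) {
      return { output: outputs.get(id) };
    },
  };
}

reviewAreas(createFakePrimitives(), ["auth", "billing", "search"]).then((report) =>
  console.log(JSON.stringify(report)),
);
// prints {"checked":["auth","billing","search"],"escalated":[{"area":"auth","detail":"token reuse confirmed"}]}
```

The second wave is an ordinary filter followed by another Promise.all, so the decision about what to launch next is a line of code rather than something the model has to remember between turns.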

The Tiny API

I tested this in Pi, a minimal agent harness similar to Claude Code. It is intentionally small, inspectable, and easy to extend (Mario Zechner's design post covers the architecture), which makes it a good place to try my weird orchestration idea.

The Extension

The extension (called pi-dispatch hereafter) registers a single tool that accepts a code string. The model writes an async JavaScript body, and the runtime evaluates it with one injected object, sa, which exposes five methods:

sa.run({ prompt, model?, ...})   // -> { id }
sa.join(id)                      // -> { ..., output?, error? }
sa.cancel(id)                    // kill a running child
sa.status(id)                    // inspect one run
sa.list()                        // inspect all runs

When sa.run(spec) fires, it launches a child Pi process in JSON-event mode. The parent reads that process's stdout, tracks the events, and resolves a handle when the child exits cleanly. run is non-blocking by design. That's what makes Promise.all over a set of handles ordinary JavaScript rather than a special case.

sa.cancel is the primitive that changes what the parent can express. Killing a running process is a small thing to implement, but without it, a parallel dispatch is just fan-out and wait. With it, the model can write a race: launch several approaches, take the first answer that comes back good, and stop paying tokens on everything else.
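A race of that kind might look like the sketch below. It is illustrative only: raceAndCancel and createFakeSa are hypothetical names, the delayMs and output fields exist only in the fake, and the sketch assumes that cancelling a run that has already finished is harmless. The fake stands in for child processes with timers so the example runs on its own.

```javascript
// In-memory stand-in for `sa` (assumption). Each "child" is a timer that
// resolves with a canned output after `delayMs`.
function createFakeSa() {
  const runs = new Map();
  let nextId = 1;
  return {
    async run({ prompt, delayMs, output }) {
      const id = nextId++;
      const entry = { prompt, state: "running" };
      entry.done = new Promise((resolve) => {
        // Settle the run exactly once, either on completion or on cancel.
        entry.finish = (state, result) => {
          if (entry.state !== "running") return;
          clearTimeout(entry.timer);
          entry.state = state;
          resolve(result);
        };
        entry.timer = setTimeout(() => entry.finish("done", { id, output }), delayMs);
      });
      runs.set(id, entry);
      return { id };
    },
    join: (id) => runs.get(id).done,
    async cancel(id) {
      runs.get(id).finish("cancelled", { id, error: "cancelled" });
    },
    status: (id) => runs.get(id).state,
  };
}

// Race: launch every approach, keep the first result that passes `isGood`,
// then cancel whatever is still running.
async function raceAndCancel(sa, specs, isGood) {
  const handles = await Promise.all(specs.map((spec) => sa.run(spec)));
  const attempts = handles.map(async ({ id }) => {
    const result = await sa.join(id);
    if (result.error || !isGood(result)) throw new Error(`run ${id} rejected`);
    return result;
  });
  try {
    // Promise.any fulfills with the first fulfilled attempt; it rejects with an
    // AggregateError only if every attempt rejects.
    return await Promise.any(attempts);
  } finally {
    await Promise.all(handles.map(({ id }) => sa.cancel(id)));
  }
}

const sa = createFakeSa();
raceAndCancel(
  sa,
  [
    { prompt: "Try the quick patch", delayMs: 10, output: "" }, // fast but empty
    { prompt: "Try the refactor", delayMs: 30, output: "refactor ok" }, // first good answer
    { prompt: "Try the rewrite", delayMs: 5000, output: "rewrite ok" }, // still running, gets cancelled
  ],
  (result) => result.output.length > 0,
).then((winner) => console.log(winner.output, sa.status(3)));
// prints "refactor ok cancelled"
```

Promise.any rather than Promise.race is the deliberate choice here: a fast but bad answer rejects its attempt and is skipped, instead of ending the race.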

The orchestration body runs inside a new AsyncFunction("sa", code) call, which means sa is the only named parameter and await works natively throughout. The scoping is deliberate but not a security boundary: a function built this way is created in the global scope, so the model's code cannot see the extension's module-level bindings (its imports, its local variables, a CommonJS require), but globals such as process are still reachable. The intent is to hand the model the one interface it actually needs, not to sandbox it.
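The evaluation step can be sketched in a few lines. This is an assumption-laden illustration rather than the extension's source: runOrchestration and fakeSa are hypothetical names, and the fake sa returns canned values instead of launching children. Only the AsyncFunction("sa", code) call itself comes from the description above.

```javascript
// AsyncFunction is not a global; it is reachable through an async function's prototype.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Evaluate a model-written async body with `sa` as its only named parameter.
// Hypothetical helper: the post specifies only the AsyncFunction("sa", code) call.
async function runOrchestration(code, sa) {
  const body = new AsyncFunction("sa", code);
  return body(sa);
}

// Stand-in `sa` for the demo (assumption); a real one would spawn child processes.
const fakeSa = {
  async run({ prompt }) {
    return { id: prompt.length };
  },
  async join(id) {
    return { output: `result for run ${id}` };
  },
};

// The kind of string the model would submit through the tool call.
const code = `
  const { id } = await sa.run({ prompt: "hello" });
  const result = await sa.join(id);
  return result.output;
`;

runOrchestration(code, fakeSa).then((output) => console.log(output));
// prints "result for run 5"
```

Whatever the body returns becomes the tool result, which is how a single final value ends up being the only thing that reaches the parent.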

Five methods and a code string. That's the whole runtime.

Why Not Just Use The Built-In Subagent Tools?

Both Claude and Codex provide the pieces: an Agent tool for spawning subagents, run_in_background for concurrent dispatch, TaskStop to kill a running agent, and Monitor to watch process output. You could approximate a race-and-cancel: spin up background agents, monitor their output, and stop the losers when one wins. The right operations exist.

But look at what that requires.

Between each step, the parent has to hold the orchestration state as running text between turns: read what the monitor surfaced, decide which branch won, call TaskStop, reconcile the result. The workflow isn't written anywhere. It lives as working memory across turns. If a branch fails at step four, there is no checkpoint.

Benchmark: Same Task, Two Harnesses

Codex is the control: native tool-call orchestration, no external runtime. pi-dispatch is the runtime harness: the model writes JS, fan-out and join happen server-side, only the final return comes back to the parent. The variable is how much context the parent holds while the orchestration runs.

We send the same task to both harnesses, running gpt-5.5:medium: evaluate five strategies in parallel, stop early on hard blockers, and synthesize a ranked recommendation from the strategies that survive.

The pi-dispatch orchestration follows a small shape: fan out five runs, join them, parse one verdict line from each result, and synthesize only the survivors. The interesting part is the contract the parent creates with its children.

Each subagent is told to end with exactly one machine-readable line:

FINAL_VERDICT: BLOCKED
or
FINAL_VERDICT: PROCEED

The parent parses that line, splits the results into blocked and surviving, then sends only the surviving analyses into the synthesis pass. Codex arrives at the same split by reading prose output and deciding what sounds blocked. pi-dispatch makes the split deterministic code.
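The parsing side of that contract is small enough to sketch. The helpers below are hypothetical (parseVerdict, splitByVerdict, and the { name, output } result shape are names chosen for illustration); only the FINAL_VERDICT line format comes from the text above. Treating a child that omits the verdict line as blocked is also an assumption.

```javascript
// Return "BLOCKED", "PROCEED", or null when the output does not end with a
// well-formed verdict line.
function parseVerdict(output) {
  const lines = String(output ?? "").trim().split(/\r?\n/);
  const last = lines[lines.length - 1].trim();
  const match = /^FINAL_VERDICT: (BLOCKED|PROCEED)$/.exec(last);
  return match ? match[1] : null;
}

// Split joined results into surviving and blocked strategies.
// Assumption: a child that breaks the contract (no verdict line) counts as blocked.
function splitByVerdict(results) {
  const surviving = [];
  const blocked = [];
  for (const result of results) {
    (parseVerdict(result.output) === "PROCEED" ? surviving : blocked).push(result);
  }
  return { surviving, blocked };
}

const { surviving, blocked } = splitByVerdict([
  { name: "cache", output: "Looks viable.\nFINAL_VERDICT: PROCEED" },
  { name: "rewrite", output: "Needs a schema migration.\nFINAL_VERDICT: BLOCKED" },
  { name: "shard", output: "No verdict given." },
]);
console.log(surviving.map((r) => r.name)); // [ 'cache' ]
console.log(blocked.map((r) => r.name)); // [ 'rewrite', 'shard' ]
```

Only the surviving entries would be passed into the synthesis prompt, so a blocked strategy never costs synthesis tokens.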

What The Runtime Buys

With Codex, every subagent return lands in the parent's context. The parent sits in the data path for every intermediate value. With pi-dispatch, the orchestration body runs server-side: subagent outputs flow JS-variable to JS-variable, and only the final synthesis comes back.

Metric                pi-dispatch   Codex
Parent context used   2.2%          5%
Wall clock            1m 47s        2m 32s

Results are aggregated over three runs.

pi-dispatch holds less than half the context because the parent sees the JS body plus one synthesis return, not five subagent returns and the prose reasoning between them.

The wall-clock result surprised me. I expected parity since both fan out in parallel, but the Codex loop spends turns deciding what to do next between returns; pi-dispatch fans out once and waits.

The useful part is how small the runtime is: any harness with subagents is one small layer away from making orchestration executable.

The full extension is just 200 lines of TypeScript. (link)

The Pattern Converged

A few weeks after I wrote this post, Anthropic shipped Dynamic Workflows in Claude Code, and it's the same idea. Claude writes a script against a small set of primitives: agent() spawns a subagent and hands back its result, parallel() runs a batch at once and waits for all of them, pipeline() streams items through stages without stopping at each one, and phase() groups the run into labeled stages you can watch. It's a higher-level take on my run() / join() / cancel(), but the design lines up:

  • Plan. It lives in code, not the transcript, so the model doesn't have to carry it in its head from one turn to the next.
  • Context. Intermediate results stay out of it. Their docs describe them living in “script variables,” so “Claude's context holds only the final answer.” Same variable-to-variable flow that helped shrink the parent context here.
  • Runtime. It “executes the script in an isolated environment, separate from your conversation,” which is just the orchestration body running harness-side.

It's not a perfect match. There's no script-level cancel(), so you can't write the race-and-cancel I made a fuss about earlier. That part didn't make the jump.

None of this was a prediction. Once you feel the friction of bloated context, you end up here anyway. pi-dispatch was 200 lines on a hobby harness; Dynamic Workflows is that idea built at scale.

Nidhish Shah / © 2026 / carpe diem