~/sidharth.dev
I Turned My Agent Orchestration Into an RPG (And It Made Everything Click)

8 min · 1,548 words

Most agent dashboards are spinners pretending to be alive. I built mine as a 3D voxel office — every agent is a blocky character with a name, a room, and a status badge floating over its head. It's unironic, it's silly, and it's the best dev-tooling decision I've made this year.

#ai-agents #agent-orchestration #multi-agent-systems #developer-tools #visualization #three-js
Hero — every squad in motion, REVIEW/WORKING/DONE badges visible across the lobby.

The core idea: agent orchestration isn't a software problem, it's a coordination problem. Coordination is something humans have built tools for since cities — org charts, war rooms, kanban boards. They all encode the same trick: turn invisible parallel work into something you can point at. A 3D voxel office is just the latest version of that trick.

The hook

Most agent orchestration UIs look like this: a tree of nested boxes, a stack of JSON logs, or a kanban board pretending to be alive. Stare at it long enough and you stop seeing agents — you start seeing a build pipeline with extra steps.

So I built mine as a 3D voxel office. Each agent is a little blocky character. They live in themed rooms. They walk to a council chamber when it's review time. A label floating over each head says WORKING, REVIEW, or DONE.

It is, unironically, the best dev-tooling decision I've made this year.

Why an RPG?

Three things break in normal agent dashboards:

  1. You can't feel parallelism. A spinner is a spinner. Twelve spinners is twelve spinners. Your brain refuses to model it.
  2. You can't tell who's idle. Logs only fire when something happens. Silence is ambiguous — busy, blocked, or dead?
  3. Roles blur into prompts. Every agent ends up looking like "the one with the long system prompt." There's no identity, just config.

A spatial, character-driven interface fixes all three at once. Position is parallelism. Stillness is idleness. A face and a name is identity.

The cast

I split the agents into squads, each in its own color-coded room. Naming was deliberate — short, slightly silly, easy to say out loud during a debug session. Boring names produce boring mental models.

Engineering (the green room)

  • archi — architect. Designs before anyone else writes a line. Owns the "does this fit the system" call.
  • fronto — frontend. Components, routes, client state.
  • backo — backend. APIs, services, data layer.
  • testo — tests. Writes them, runs them, refuses to ship without them.

Growth (the pink room)

  • buzzy — distribution and reach. Knows where audiences live.
  • growthy — experiments and funnels. Owns the "did this actually move a number" question.
  • wordy — copy and narrative. Translates features into language.

Operations (the amber room)

  • shipper — release engineer. Cuts builds, runs deploys, watches the rollout.
  • guardy — security and compliance. The friendly paranoid.
  • scaley — infra and capacity. Wakes up if a graph turns red.

Product (the blue room)

  • pixie — design. Tokens, layouts, motion.
  • prio — prioritization. Argues about what matters this week.
  • clause — contracts, policy, risk language. Often working alone, which feels right.

The Council (the purple room)

  • nova, sage, blaze, vera — the reviewers. They don't build. They critique. They show up when work is ready and stamp DONE (or send it back).

That's it. Seventeen named characters. Every prompt I send routes to one of them.

Every squad in motion — clusters by department, status badges across the floor.

Engineering room

Engineering — archi DONE, fronto/backo WORKING with "Endpoint wired — tests next", testo DONE.

Growth room

Growth — buzzy, growthy, wordy each WORKING in the pink-themed room.

Operations room

Operations — shipper and guardy WORKING, scaley DONE under amber lighting.

Product room

Product — pixie and prio paired up, both WORKING.
Legal — clause working solo, gold-themed empty room.

The architecture under the costume

Strip the voxel skin off and the system is straightforward:

[ Orchestrator ] ── dispatches task ──▶ [ Squad room ]
        │                                     │
        │                                ┌────┴────┐
        │                                ▼         ▼
        │                            [ Agent ]  [ Agent ]   ← parallel
        │                                │         │
        │                                ▼         ▼
        │                              writes work artifacts
        │                                │
        ▼                                ▼
[ Council room ] ◀── ready-for-review ──┘
        │
        ├──▶ nova   (one critique lens)
        ├──▶ sage   (another lens)
        ├──▶ blaze  (another lens)
        └──▶ vera   (another lens)
                │
                ▼
        merged verdict → back to squad or → DONE

Each room is a topic-scoped workspace. Each agent is a system prompt + a small toolset + a state machine with three states: WORKING, REVIEW, DONE. The orchestrator is just a router with a queue.

The 3D scene is a thin client over that state. Position = which room. Animation = current state. The label is literally agent.state. Nothing fancy — but rendering it as a place instead of a table changes how you reason about it.
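The thin-client part fits in one pure function. A sketch, assuming hypothetical room coordinates and animation clip names; in a Three.js scene you'd copy `position` onto the character's mesh and look the clip up by name, but that wiring is left out so the mapping runs on its own.

```typescript
type AgentState = "WORKING" | "REVIEW" | "DONE";

interface AgentView {
  name: string;
  room: string;
  state: AgentState;
}

// Hypothetical room anchors on the office floor as (x, z) in world units.
const ROOM_ORIGIN: Record<string, [number, number]> = {
  engineering: [0, 0],
  growth: [12, 0],
  operations: [0, 12],
  product: [12, 12],
  council: [6, 24],
};

// Hypothetical animation clip names, one per state.
const CLIP: Record<AgentState, string> = {
  WORKING: "typing",
  REVIEW: "waiting",
  DONE: "stretch",
};

interface SceneProps {
  position: [number, number, number];
  clip: string;
  label: string;
}

// Everything the renderer needs is derived from agent state alone:
// position = which room, animation = current state, label = agent.state.
// `slot` spreads agents that share a room on a 3-wide grid so they don't overlap.
function toSceneProps(agent: AgentView, slot: number): SceneProps {
  const [x, z] = ROOM_ORIGIN[agent.room] ?? [0, 0];
  return {
    position: [x + (slot % 3) * 2, 0, z + Math.floor(slot / 3) * 2],
    clip: CLIP[agent.state],
    label: agent.state,
  };
}
```

Because the function has no side effects, the renderer can call it every frame and the scene can never drift from the underlying state.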

The Council pattern

This is the part I'm most happy with.

Reviews used to be a single agent reading the output of another agent and rubber-stamping it. Boring, weak, and prone to "looks fine to me" failure.

Now I have four reviewers with deliberately different personalities:

  • nova — looks for what's new and risky. Asks "what could break that didn't exist before?"
  • sage — looks for what's missing. Asks "what would an experienced person have done that this doesn't?"
  • blaze — looks for speed and decisiveness. Pushes to ship if the work is good enough.
  • vera — looks for truth and evidence. Demands tests, links, receipts.

When work hits review, all four run in parallel. Their critiques get merged into a single verdict. They disagree often. The disagreements are the most useful signal in the entire system — they surface tradeoffs the original agent didn't think about.

Visually: the four reviewers walk to the council chamber, line up, and one by one their labels flip from REVIEW to DONE. When all four are green, the work is released.

Council in session — nova and sage say "ship it", blaze and vera want more time. Two DONE, two still in COUNCIL.

When the four reviewers agree, every badge flips to green and the work is released:

Council resolved — all four DONE, work released.
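Under the walking and badge-flipping, the council step is a fan-out and a merge. A minimal sketch, assuming each reviewer is an async function that returns an approve/revise vote with notes. `Reviewer`, `Vote`, and the stub reviewers are hypothetical stand-ins for four differently prompted model calls, and the merge rule (release only on a unanimous approve) is an assumption taken from "when all four are green."

```typescript
interface Vote {
  reviewer: string;
  approve: boolean;
  notes: string;
}

type Reviewer = (work: string) => Promise<Vote>;

interface Verdict {
  release: boolean;
  // Reviewers who asked for changes, with their notes. The disagreement
  // is kept as signal for the squad instead of being averaged away.
  objections: Vote[];
}

// Assumed merge rule: release only when every reviewer approves.
function mergeVerdict(votes: Vote[]): Verdict {
  const objections = votes.filter((v) => !v.approve);
  return { release: objections.length === 0, objections };
}

// Run all reviewers concurrently, then merge their votes into one verdict.
async function runCouncil(work: string, council: Reviewer[]): Promise<Verdict> {
  const votes = await Promise.all(council.map((review) => review(work)));
  return mergeVerdict(votes);
}

// Stub reviewers; real ones would each call a model with their own lens.
const stub =
  (reviewer: string, check: (work: string) => boolean, notes: string): Reviewer =>
  async (work) => ({ reviewer, approve: check(work), notes });

const council: Reviewer[] = [
  stub("nova", () => true, "no new risk surface"),
  stub("sage", () => true, "nothing obvious missing"),
  stub("blaze", () => true, "good enough, ship it"),
  stub("vera", (work) => work.includes("tests:"), "show me the tests"),
];
```

Keeping the objections in the verdict, instead of collapsing them to a yes/no, is what preserves the disagreement for the squad that has to act on it.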

Why this beats a dashboard

A few patterns I didn't expect:

Idle becomes obvious. When a squad's room is empty or everyone's standing still, you immediately notice. A dashboard would show "0 active jobs" and you'd shrug. A still room is unsettling in exactly the right way — it makes you ask "why isn't anyone working on growth right now?"

Marketing room mid-shift — three agents at their desks, no badges, total quiet. The wrongness is immediate.
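Making stillness visible takes one field that logs don't give you: when each agent last did something. A hypothetical sketch, assuming every agent records `lastUpdateMs` on each state change or output, and treating two minutes of silence as "standing still" (an arbitrary threshold):

```typescript
type AgentState = "WORKING" | "REVIEW" | "DONE";

interface AgentPulse {
  name: string;
  room: string;
  state: AgentState;
  lastUpdateMs: number; // epoch ms of the last state change or output
}

const STILL_AFTER_MS = 2 * 60 * 1000; // assumed threshold

// A room counts as still when no agent in it is WORKING with recent
// activity. Rooms with no agents at all count as still too, which is
// why the full room list is passed in rather than derived from agents.
function stillRooms(rooms: string[], agents: AgentPulse[], nowMs: number): string[] {
  const busy = new Set(
    agents
      .filter((a) => a.state === "WORKING" && nowMs - a.lastUpdateMs < STILL_AFTER_MS)
      .map((a) => a.room),
  );
  return rooms.filter((room) => !busy.has(room));
}
```

The returned rooms are the ones to dim, tint, or otherwise make feel wrong; silence becomes something the renderer draws instead of something you infer.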

Bottlenecks become spatial. If three agents are stuck on REVIEW and the council chamber is empty, you can see the queue waiting. You don't need to read a metric.

Five agents tagged REVIEW across the floor — work is queued for sign-off.
…and the council chamber, while all that work waits. No badges, no urgency, just empty desks.
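If you also want that queue as a line in a text trace, the pattern reduces to a small check. A sketch with hypothetical names, assuming the scene already knows how many council members are currently in session:

```typescript
type AgentState = "WORKING" | "REVIEW" | "DONE";

interface AgentStatus {
  name: string;
  state: AgentState;
}

interface Bottleneck {
  waiting: string[]; // agents whose work is queued for sign-off
  stalled: boolean; // true when work is waiting and no reviewer is active
}

// Flags the "queue outside an empty council chamber" pattern.
// `activeReviewers` is an assumed count of council members reviewing now.
function reviewBottleneck(agents: AgentStatus[], activeReviewers: number): Bottleneck {
  const waiting = agents.filter((a) => a.state === "REVIEW").map((a) => a.name);
  return { waiting, stalled: waiting.length > 0 && activeReviewers === 0 };
}
```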

Naming agents makes you write better prompts. "What should backo know that fronto doesn't?" is a sharper design question than "how should I split the system prompts?" The character forces specificity.

Status broadcasts become storytelling. When an agent finishes a task, it can pop a speech bubble: "Briefed Growth team on 3 tasks." That single line is better changelog than most teams produce.
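A sketch of one way to get both the bubble and the changelog from a single event. The `StatusEvent` shape, the log format, and the agent and timestamp in the usage line are assumptions for illustration:

```typescript
interface StatusEvent {
  agent: string;
  atMs: number; // epoch milliseconds
  message: string;
}

type ShowBubble = (agent: string, text: string) => void;

// Flat, append-only trace: the "changelog" half of every broadcast.
const changelog: string[] = [];

// One event, two consumers: a speech bubble in the scene and a plain
// text line you can grep later.
function broadcast(event: StatusEvent, showBubble: ShowBubble): void {
  showBubble(event.agent, event.message);
  changelog.push(`${new Date(event.atMs).toISOString()} ${event.agent}: ${event.message}`);
}

// Usage (agent name and timestamp are made up):
broadcast(
  { agent: "prio", atMs: Date.UTC(2025, 0, 6, 9, 30), message: "Briefed Growth team on 3 tasks." },
  (agent, text) => console.log(`[bubble over ${agent}] ${text}`),
);
// changelog[0] === "2025-01-06T09:30:00.000Z prio: Briefed Growth team on 3 tasks."
```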

Things that went wrong (and stayed wrong)

  • Cute names hide capability. wordy sounds like a single-purpose copywriter. It's actually doing narrative architecture. I've had to fight my own naming a few times.
  • The Council can stall. Four reviewers means four chances for someone to nitpick. I had to add a tiebreaker rule and a max-iterations cap.
  • Spatial UIs don't scale linearly. Seventeen agents fit on one screen. Fifty wouldn't. At some point I'll need camera controls or floor levels. Tomorrow's problem.
  • It's harder to debug than logs. When something breaks, you still want a flat text trace. The 3D view is for steady-state awareness, not postmortems. Build both.
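The post doesn't spell out the tiebreaker or the cap, so this is only one plausible shape: a hypothetical `reviseUntilReleased` loop that releases early on a unanimous vote, stops after `maxRounds`, and on the final round lets a 3-of-4 majority release the work. The council is synchronous here for brevity, and both defaults are assumptions.

```typescript
interface Vote {
  reviewer: string;
  approve: boolean;
}

type Council = (work: string) => Vote[];
type Revise = (work: string, objections: Vote[]) => string;

interface Outcome {
  released: boolean;
  rounds: number;
  work: string;
}

// Review, revise, repeat. Unanimity releases early. On the last round an
// assumed majority tiebreaker decides; if even that fails, the outcome is
// reported as not released so someone else can make the call.
function reviseUntilReleased(
  work: string,
  council: Council,
  revise: Revise,
  maxRounds = 3,
  majority = 3,
): Outcome {
  for (let round = 1; round <= maxRounds; round++) {
    const votes = council(work);
    const approvals = votes.filter((v) => v.approve).length;
    if (approvals === votes.length) return { released: true, rounds: round, work };
    if (round === maxRounds) {
      return { released: approvals >= majority, rounds: round, work };
    }
    // Send only the objections back to the squad for the next round.
    work = revise(work, votes.filter((v) => !v.approve));
  }
  return { released: false, rounds: maxRounds, work };
}
```

Without the cap, a reviewer who never approves keeps the loop spinning forever; with it, the worst case is bounded at `maxRounds` review passes.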

What I'd tell someone copying this

  1. Pick a metaphor and commit. Office, dungeon, starship, kitchen — doesn't matter. Pick one and design every element to fit. Mixed metaphors are worse than no metaphor.
  2. Three states, not seven. WORKING / REVIEW / DONE is enough. Every state you add is a bucket of edge cases.
  3. Make idle visible. If your UI only renders activity, you'll never notice when nothing is happening — and "nothing happening" is the most common failure mode of agent systems.
  4. Reviewers should disagree by design. One reviewer is a stamp. Four reviewers with different lenses is a critique.
  5. The character matters more than the prompt. A vivid persona produces sharper prompts naturally. Start with "who is this agent" and the system prompt writes itself.
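A sketch of "start with who this agent is": a hypothetical `Persona` record and a template that derives the system prompt from it. The field names and wording are illustrative; archi's role and question come from the roster above, and the context line is invented.

```typescript
interface Persona {
  name: string;
  role: string;
  owns: string; // the call this agent is accountable for
  asks: string; // the question it brings to every task
  knowsThatOthersDont: string[];
}

// Derive the system prompt from the character, not the other way round.
function systemPrompt(p: Persona): string {
  return [
    `You are ${p.name}, the ${p.role}.`,
    `You own ${p.owns}.`,
    `Before you finish any task, ask: "${p.asks}"`,
    "Context the other agents do not have:",
    ...p.knowsThatOthersDont.map((k) => `- ${k}`),
  ].join("\n");
}

// Role and question from the roster; the context line is made up.
const archi: Persona = {
  name: "archi",
  role: "architect",
  owns: 'the "does this fit the system" call',
  asks: "Does this fit the system?",
  knowsThatOthersDont: ["Where the current module boundaries are and why"],
};

console.log(systemPrompt(archi));
```

Filling in `knowsThatOthersDont` for backo and then for fronto is the "what should backo know that fronto doesn't?" exercise in code form.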

The takeaway

Agent orchestration isn't a software problem. It's a coordination problem — and coordination is something humans have been building tools for since cities. Org charts, war rooms, situation tables, kanban boards. They all encode the same idea: turn invisible parallel work into something you can point at.

A 3D voxel office is just the latest version of that. It's not serious. It's not enterprise. It's a toy.

But the toy lets me see seventeen agents work in parallel without losing the thread, and that's worth more than any dashboard I've ever built.

Sidharth Satapathy

AI Engineer & Builder. 8+ years shipping at scale. Building AI-native tools with Claude Code, MCP servers, and agentic workflows.
