Routing Agents

In a post a few days ago, I wrote about how there's not yet a unified AI assistant. As I was thinking about agents, I realized the production-ready world is even more challenging than I'd thought.

Even if you scope your assistant to "workflow bots" or "agents" (whatever you want to call them), there still is no one-size-fits-all product out there.

To level set, let's loosely define an agent as an LLM with reasoning and tool-running capabilities. Users issue requests, and the LLM chats with itself and runs tools in order to arrive at some fulfillment of the request.

Many of these products are already in production in some form or another. Let's look at how they're organized.

Zapier's Agent builder encourages not one, but many agents. Each agent has its own tools, system instructions, and multi-chat history.

Many Agents

Notably, there's no single-chatbot-overlord which has access to all agent capabilities. Users always select a specific agent to run, and then create a new chat with it.

OpenAI has a very different architecture than Zapier, but the same user-level design.

OpenAI Many Agents

OpenAI users start by selecting an agent, and only then begin to converse with it and use its capabilities.

With Poe, which has a radically different underlying architecture once again, it's the same thing.

Poe many agents

As far as I can tell, the only products out there that approach universal agent capabilities are Cursor and Claude Desktop. Using MCP, developers can inject tools into their chat client like plugins. In theory, if you just add All The Tools, you'll have a universal agent.

All the tools

But it doesn't quite work out that way yet. First, MCP is clearly a non-consumer feature, even in Claude Desktop. Engaging with it requires editing a JSON file.

Cursor many agents

And secondly, there are limits to how many MCP servers (read: tools) can be added before hitting serious drawbacks. Anecdotally, I've heard from folks at foundation model companies that things can start to get weird after 10-20 MCP servers.

At that point:

Tool listings alone may begin eating up your context window, depending on your model.
Tools from different providers may begin competing with each other - both as "verbs" and as the set of "nouns" they consume.

How should my bot -- which of course knows every pizza shop in town -- engage with the separate choices of orderFood, orderPizza, order, submitOrder, and oneClickOrder when asked to get me a pizza?

...in a way that's simple enough to be viable as a consumer's daily driver? That is to say: it's got to be easier than the Domino's App.
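To make the "competing verbs" problem concrete, here's a rough sketch that splits tool names into words and reports which words are shared by more than one tool. The helper names (`split_camel`, `overlapping_tools`) are made up for illustration, and matching on names alone is a crude stand-in for what a real agent would have to do with descriptions and schemas.

```python
import re
from collections import defaultdict


def split_camel(name):
    """Split a simple camelCase name into lowercase words (no acronym handling)."""
    return [part.lower() for part in re.findall(r"[A-Za-z][a-z]*", name)]


def overlapping_tools(tool_names):
    """Map each word to the tools that use it, keeping only words shared by 2+ tools."""
    by_word = defaultdict(list)
    for name in tool_names:
        for word in set(split_camel(name)):
            by_word[word].append(name)
    return {word: names for word, names in by_word.items() if len(names) > 1}


# The five tools from the pizza example above.
TOOLS = ["orderFood", "orderPizza", "order", "submitOrder", "oneClickOrder"]
collisions = overlapping_tools(TOOLS)
```

Here every one of the five tools lands under the single verb "order", which is exactly the ambiguity the model is left to sort out on its own.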

Compartmentalization feels essential

I think there's a case to be made that even if we assume magic-grade LLMs, compartmentalization will be necessary.

The control surface of my digital life just has too many nouns and verbs with only hair-splitting differences between them.

Do I use Credit Card A or Credit Card B?
Do I save files to Downloads, Desktop, or Documents?
Do I share that photo via WhatsApp, iMessage, or LINE?

Users don't want to babysit an AI that keeps pestering them with choices that feel, to be honest, stupid. But users also don't want to adopt the risk of the AI taking initiative. Otherwise your boss emails to ask why there's a lingerie purchase on your corporate card.

Routing Between Agents

The existing design of production agent platforms hints at one path forward.

Curate user-contributed special-purpose agents
Help users route between them

Today, that routing involves choosing and clicking in the UI. Announcing your intent as a precursor to beginning a chat.

But it's easy to see how that could evolve into automated routing. Given an utterance like "help me build a website", a router-level agent could do the following:

Compare the request to its library of pre-packaged agents
Pick a default one based on reviews or prior user selections
Ask the user if they want to engage with that agent

The router-LLM then cedes control to the sub-agent until some mechanism signals it's time to take it back.
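Here's a minimal sketch of those three steps. Everything in it is an assumption for illustration: the `Agent` fields, the keyword matching (a real router would presumably use an LLM or embeddings), the choice to rank prior selections ahead of reviews, and the `confirm` callback standing in for asking the user. It stops at the hand-off and doesn't model how control comes back to the router.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative filler words to ignore when matching; not a real stopword list.
STOPWORDS = {"a", "an", "the", "me", "my", "to", "and", "for", "help"}


@dataclass
class Agent:
    name: str
    description: str
    rating: float  # hypothetical average review score


def keywords(text):
    """Lowercase the words in text, dropping punctuation and filler words."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def route(request, library, prior_picks, confirm) -> Optional[Agent]:
    # Step 1: compare the request to the library of pre-packaged agents.
    wanted = keywords(request)
    matches = [a for a in library if wanted & keywords(a.description)]
    if not matches:
        return None
    # Step 2: pick a default, favoring prior user selections, then reviews.
    default = max(matches, key=lambda a: (prior_picks.get(a.name, 0), a.rating))
    # Step 3: ask the user before ceding control to the sub-agent.
    return default if confirm(default) else None


# A made-up agent library for the example.
LIBRARY = [
    Agent("SiteBuilder", "Build and deploy a website", 4.2),
    Agent("WebWizard", "Design a website from a prompt", 4.8),
    Agent("PizzaBot", "Order pizza from local shops", 4.9),
]

chosen = route("help me build a website", LIBRARY, {"SiteBuilder": 3}, lambda a: True)
```

With a history of picking SiteBuilder, the router offers it first even though WebWizard has better reviews. With no history, it falls back to the higher-rated match.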

Standardized Intents

A second great simplifier of "annoying user interventions" is building a standardized set of intents that the user can issue preferences on once, which then apply to all systems.

I might tell my OS that my order of message preferences is always: iMessage, WhatsApp, and then LINE, for example. Or I might tell it that my preference for purchases is always my Chase Visa card.

Mobile phones have mastered this to enable app-to-app interactions, and I suspect we'll see this style of intent brokering appear in agent platforms as well.

Even in a world with a strictly compartmentalized set of agents, those agents will need to access core atomic actions (pay, send message, save file) that differ from user to user. At scale, it doesn't make sense for me to separately configure every single agent to prefer my Gmail account over my Outlook one.
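As a rough sketch of what that brokering could look like, here's a tiny shared preference store that any agent can consult. The `IntentBroker` class, its methods, and the intent names are all hypothetical. This isn't how any existing platform does it, just one way to picture "set it once, apply it everywhere".

```python
class IntentBroker:
    """Holds one user's ordered provider preferences per intent (hypothetical design)."""

    def __init__(self):
        self._preferences = {}

    def set_preference(self, intent, providers):
        """Record the user's providers for an intent, most preferred first."""
        self._preferences[intent] = list(providers)

    def resolve(self, intent, available):
        """Return the most-preferred provider that the calling agent supports."""
        for provider in self._preferences.get(intent, []):
            if provider in available:
                return provider
        return None  # no stored preference applies, so the agent has to ask


# The user states preferences once...
broker = IntentBroker()
broker.set_preference("send_message", ["iMessage", "WhatsApp", "LINE"])
broker.set_preference("pay", ["Chase Visa"])

# ...and any agent resolves against whatever it happens to support.
channel = broker.resolve("send_message", {"WhatsApp", "LINE"})
card = broker.resolve("pay", {"Chase Visa", "Amex"})
```

An agent that can't use iMessage falls through to WhatsApp without bothering me, and an intent I've never configured comes back as `None` so the agent knows it has to ask.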