Why a unified chat assistant is hard
I've been thinking about what it would take to build a single, unified chat assistant. The kind that will one day drive our interactions at the operating system level.
From the outside, there's pressure for an assistant to appear as a single chat entity like Siri or Alexa. But on the inside, chat assistants span several categories with vastly different UX and infrastructure needs.
Here are five distinct roles an OS-level assistant might play:
1. Impersonal Chat Assistants
These require a frontier LLM and web search. The model is tuned to respond generally and neutrally across diverse questions. Personal information might be stored in chat history but isn't leveraged—the LLM doesn't necessarily know who you are. Example: ChatGPT Classic
2. Personal Companions
These need chat history compactification and user profile management. The LLM is tuned for empathy; its job is to support you and your thinking, which slightly conflicts with perfect neutrality. Personal information is actively extracted and curated for future use. Examples: Replika, OpenAI's memory feature
3. Personal Information Assistants
These require document processing and retrieval capabilities. The LLM is tuned to run search tools and synthesize coherent, cited answers. Latency suffers: while casual conversation is possible, the LLM prioritizes deep thinking and research. Example: Microsoft 365 Copilot
4. In-App Concierges
These demand careful integration between the LLM and its host application. The model translates user requests into contextual application actions. Issues like dead-ends and unwanted side-effects must be managed to prevent accidental breakage. The LLM needs minimal conversational ability within the application's domain. Examples: Cursor, AI onboarding agents
5. Workflow Bots
These need diverse tools and models of how to use them. The LLM needn't be conversational as long as it can translate instructions into workflow plans. Higher-level concepts like async task management, loops, schedules, notifications, and error handling become necessary to meet user expectations. Example: Zapier Agent
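One way to see why these roles resist unification is to write them down as sets of required capabilities. This is a hypothetical sketch (the capability names are my own shorthand for the requirements listed above), but it makes the point concrete: the roles share essentially nothing beyond the base model.

```python
# Hypothetical sketch: the five assistant roles above, expressed as sets of
# required capabilities. The names are illustrative shorthand, not a real API.

ROLES = {
    "impersonal_chat": {"frontier_llm", "web_search"},
    "personal_companion": {"history_compaction", "user_profile", "empathy_tuning"},
    "information_assistant": {"document_indexing", "retrieval", "citation_synthesis"},
    "in_app_concierge": {"app_integration", "action_translation", "side_effect_guards"},
    "workflow_bot": {"tool_use", "task_scheduling", "error_handling"},
}

# Capabilities required by *every* role: empty, as the essay argues.
shared = set.intersection(*ROLES.values())
print(shared)  # → set()
```

The empty intersection is the whole problem in miniature: there is no common core a platform can ship once and reuse everywhere.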
The Platform Challenge
Now back to the challenge facing platform companies.
Customers see that Spotlight-style chat hovering above their screen and they expect a single, unified assistant.
But in reality, no such thing exists yet.
You could ship top-tier workflow automation on a phone, but not a complete chat assistant (at least today). You could instrument any single app with LLM concierge support, but users wouldn't want to watch a general-purpose LLM fumble through learning a new app in real time.
So what's a platform company to ship?
You must pick—and stick to—a single type of assistant, then clearly communicate that choice to users. Otherwise, you'll get tangled in conflicting design and technical constraints.
For all the industry jokes about OpenAI model names, this likely explains part of their strategy. Each awkwardly named model (really "augmented model" at this point) excels at a different style of chat-based assistance. None is universally superior; they target different capabilities: deep reasoning, rapid chit-chat, instruction-following, etc.
And then what?
Then you build your second (augmented) model and figure out how to switch between the two.
- Maybe users explicitly choose which one they're talking to, like with OpenAI.
- Maybe you use OS context to choose for the user.
- Maybe you use chat context to transition between models.
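The chat-context option can be sketched as a small router that scores the latest message against each assistant type. Everything here is hypothetical (the route names, the keyword heuristic, the scoring); a real system would likely use a trained classifier or the LLM itself, but the shape is the same: classify the turn, then hand it to the matching augmented model.

```python
# Hypothetical sketch: route a chat turn to one of several specialized
# "augmented models" using a naive keyword heuristic. Route names and
# keywords are illustrative only.

ROUTES = {
    "workflow": {"schedule", "automate", "every", "notify"},
    "retrieval": {"find", "document", "search", "cited"},
    "companion": {"feel", "worried", "advice"},
}

def route(message: str) -> str:
    """Return the best-matching route, falling back to a general model."""
    words = set(message.lower().split())
    best, best_score = "general", 0
    for name, keywords in ROUTES.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = name, score
    return best
```

The interesting design question isn't the classifier itself but what happens at the seams: whether the user sees the handoff, and whether conversation state carries across it.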
OpenAI's Canvas appears to facilitate collaborative discussion between user and assistant about larger documents. But I think it's also introducing a new form of model interaction, where users converse with an entire chorus of models that swap in and out, providing different UIs as the conversation evolves.
This will be the design reality until AGI
It seems reasonable to claim that until we reach AGI, we'll deal with models that excel in some areas but not others.
That's presumably the G part of AGI.
If so, platform-level chat assistants must strategically select narrow areas of competence, then evolve UX mechanisms that let users seamlessly switch between new models as they become available.