Context windows are hard limits in a fuzzy paradigm.

So much of agent engineering is learning to steer a world of fuzzy, emergent behavior. But at a certain point, you hit a hard limit and have to make choices about what should and shouldn't remain in context.

To my knowledge, nobody has found a gold standard for doing this yet.

  • Compression / Compaction leverages the LLM to make decisions about what to keep and drop, but anyone who has used a coding agent knows the moment of compaction is often the moment where quality drops off a cliff.
  • History Editing leverages human-drafted policies about what kinds of messages can be dropped, and in what situations (see the sketch after this list). It's a bit like taking out the garbage. But that garbage was also providing the LLM subtle context about work that had already happened. Too much history ablation and the agent can become attracted to Sisyphean doom loops.
  • Persistent Memory can act as a non-narrative complement to compression, a scratchpad a tool call away. But this is the L2 cache to the chat history's L1 cache. A valuable component, but fundamentally subject to the same limits over time.
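
To make the history-editing tradeoff concrete, here's a minimal sketch of a rule-based editing pass in Python. The message schema and the age threshold are invented for illustration; real policies tend to be far more situational.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Message:
    role: str          # "user", "assistant", or "tool"
    content: str
    created_at: float = field(default_factory=time.time)

def edit_history(history: list[Message], max_tool_age_s: float = 600) -> list[Message]:
    """Drop stale tool outputs; keep all user and assistant turns.

    The risk: those dropped outputs were also subtle evidence of work
    already done, so over-aggressive editing invites repeated work.
    """
    now = time.time()
    return [
        m for m in history
        if m.role != "tool" or (now - m.created_at) < max_tool_age_s
    ]
```

The policy is trivially simple on purpose: the hard part isn't writing the filter, it's knowing which dropped messages were quietly load-bearing.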

My best guess is that we'll come to think of context window management the same way we think of memory architecture in a processor:

A few tiers of memory storage, with extremely well defined access & cleanup patterns, that work well enough to become standardized so that we can forget they exist.

  • The ChatHistory itself as L1
  • A MemoryTool as L2
  • A Filesystem specific to the agent instance as L3
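
As a strawman, here's what those three tiers might look like collapsed into a single interface. Every name here is invented for illustration; none of this is any particular framework's API.

```python
from pathlib import Path

class TieredContext:
    """Three tiers of agent memory, smallest and fastest first."""

    def __init__(self, workdir: Path):
        self.chat_history: list[dict] = []   # L1: lives inside the context window
        self.memory: dict[str, str] = {}     # L2: structured notes, a tool call away
        self.workdir = workdir               # L3: filesystem scoped to this agent instance

    def remember(self, key: str, note: str) -> None:
        # L2 write: cheap and structured; survives compaction of L1.
        self.memory[key] = note

    def persist(self, relpath: str, blob: str) -> None:
        # L3 write: bulk artifacts that would bloat L1 or L2.
        path = self.workdir / relpath
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(blob)
```

The cleanup patterns matter as much as the writes: L1 gets compacted, L2 gets pruned, L3 gets garbage-collected when the instance dies.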

By definition, L2 and L3 are orders of magnitude larger than L1, which means they'll need access patterns that support random, semantic addressing:

  • Unix-style access: grep, glob
  • RAG-like embedding search
  • Semantic addressing (e.g., for coding agents: AST-style address)
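
Here's a sketch of the Unix-style option over the L3 tier: glob to find files, a grep-style scan to find lines. The function name and defaults are invented; embedding search would sit alongside it with the same shape, query in, addressed snippets out.

```python
from pathlib import Path

def grep_l3(workdir: Path, pattern: str, glob: str = "**/*.md") -> list[tuple[Path, int, str]]:
    """Return (file, line_number, line) hits for a substring pattern."""
    hits = []
    for path in workdir.glob(glob):
        if not path.is_file():
            continue
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if pattern in line:
                hits.append((path, lineno, line.strip()))
    return hits
```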

We already have most of these in place today; they're just not standardized across the agent rigs we're all using.
