Token caching will shape agentic innovation.

OpenAI recently published a list of customers who had consumed over one trillion tokens. At current GPT-5 prices, that’s:

  • $1,250,000 in input tokens, or
  • $125,000 in cached input tokens
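The arithmetic behind those two figures, as a quick sanity check — assuming GPT-5's list price of $1.25 per million input tokens and cached input billed at one-tenth of that rate:

```python
# Back-of-envelope cost of one trillion input tokens.
# Assumed prices: $1.25 per 1M input tokens; cached input at 1/10 the rate.
TOKENS = 1_000_000_000_000            # one trillion tokens
PRICE_PER_M_INPUT = 1.25              # USD per million input tokens (assumed)

input_cost = TOKENS / 1_000_000 * PRICE_PER_M_INPUT
cached_cost = input_cost / 10         # 90% discount for cache hits (assumed)

print(f"${input_cost:,.0f}")          # $1,250,000
print(f"${cached_cost:,.0f}")         # $125,000
```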

Now imagine you’re the technical lead at one of these companies, responsible for developing Agentic AI. By engineering your context to be append-only and turning on caching, you can save your company one million dollars overnight.
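The mechanism, in sketch form: prompt caches match on an exact prefix of the request, so a context that only ever grows keeps its cached prefix valid, while any edit to earlier messages forces a full-price re-read. The agent loop below is illustrative, not any particular provider's API:

```python
# Minimal sketch of an append-only agent context (hypothetical message
# format). Prompt caches key on an exact request prefix, so appending
# preserves cache hits; rewriting earlier messages invalidates them.

context = [
    {"role": "system", "content": "You are a coding agent."},  # stable prefix
]

def add_turn(role: str, content: str) -> None:
    """Cache-friendly: only ever append; never rewrite earlier messages."""
    context.append({"role": role, "content": content})

# Cache-hostile moves (deliberately absent here): summarizing old turns in
# place, reordering tool results, or injecting fresh data into the system
# prompt -- each rewrites the prefix and throws away the cached tokens.

add_turn("user", "Fix the failing test.")
add_turn("assistant", "Running the test suite...")
add_turn("tool", "2 passed, 1 failed")
```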

Do you do it?

On the one hand, who doesn’t want to save their company a million dollars?

On the other hand, using cached tokens comes at a heavy cost in experimental freedom. Committing to the required immutable completion prefixes takes a myriad of context engineering tricks off the table. I’m hesitant to even give examples, since these have become the Coca-Cola formula of our time.

Andrej Karpathy, one of the founding members of OpenAI, recently said he believes Agentic computing will take another ten years of experimentation to solve. That experimentation will involve a lot of creative context engineering — exactly what cached tokens constrain.

My guess? The cost savings are too great to ignore. As Jobs once said: “Real artists ship,” and cost control is a part of shipping.

For many, that means the path toward Agentic Computing will be one explored with an extreme bias toward append-only agent contexts.

Not necessarily good or bad. Just an interesting example of how the medium can drive the message.

© 2025 Edward Benson. All rights reserved.