How should AI be priced?

March 20, 2026

AI assistants are quickly becoming an essential tool of modern life, whether users are writing code, drafting legal memos, or planning a holiday. But why do companies like Anthropic, Google, OpenAI or DeepSeek charge for access to large language models (LLMs) in ways that seem so haphazard and opaque? Is there a logic behind the patchwork of usage caps, subscription tiers, and bundles of credits? In a new study, TSE’s Alex Smolin identifies the optimal pricing strategies for providers of this critical new technology.

Why is pricing LLMs such a challenge?

LLMs are unusual products because they are used and valued differently by each consumer. A programmer might rely on an AI assistant to generate code, a lawyer might use it to analyze documents, while a student might ask it to explain a concept or summarize a textbook. Even for the same user, the value of AI varies. Some requests are trivial while others may save hours of work, so each consumer has their own mix of tasks and willingness to pay.
This creates a daunting level of complexity for any provider designing a pricing system. At the same time, once someone buys LLM access, the provider cannot see how they use it across different tasks. Economists describe this combination of hidden preferences and hidden behavior as adverse selection and moral hazard.

How do ‘token budgets’ help to solve this problem?

Most LLM services measure usage in tokens, which loosely correspond to fragments of text. When you type a prompt and receive a response, the system processes input tokens from your prompt and output tokens generated by the model.
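To make this concrete, here is a toy sketch of token-based billing in Python. The per-token rates are invented placeholders, not any provider's actual prices.

    # Toy sketch of token-based billing. The rates below are invented
    # placeholders, not any provider's actual prices.
    INPUT_PRICE_PER_1K = 0.003   # dollars per 1,000 input tokens (hypothetical)
    OUTPUT_PRICE_PER_1K = 0.015  # dollars per 1,000 output tokens (hypothetical)

    def request_cost(input_tokens: int, output_tokens: int) -> float:
        """Cost of one prompt/response pair under simple linear pricing."""
        return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
            + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

    # A 500-token prompt that draws a 2,000-token answer:
    print(f"${request_cost(500, 2000):.4f}")  # -> $0.0315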

A key insight of our research is that the value created by tokens tends to scale in a predictable way. This means that even if LLM users perform many different types of tasks, their behavior can often be summarized by a single number: their overall demand for AI computing capacity. So instead of needing to know every task a user performs, providers can focus on how many tokens they are likely to consume in total.
In our framework, this property allows firms to design simple menus of token budgets. Users then select the plan that best fits their expected demand, revealing information about themselves through their choice.
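A minimal sketch of this self-selection logic might look as follows; the plans and the user's per-token valuation are made up for illustration.

    # Hypothetical menu of token budgets: (name, monthly fee, token allowance).
    # All numbers are invented for illustration.
    MENU = [
        ("Basic", 20.0, 1_000_000),
        ("Pro", 100.0, 8_000_000),
        ("Max", 200.0, 20_000_000),
    ]

    def best_plan(value_per_token: float, expected_tokens: int) -> str:
        """Pick the plan that maximizes the user's surplus: the value of
        the tokens they can actually use, minus the monthly fee."""
        def surplus(plan):
            _, fee, allowance = plan
            return value_per_token * min(expected_tokens, allowance) - fee
        return max(MENU, key=surplus)[0]

    # Users reveal their demand through their choice:
    print(best_plan(2e-5, 15_000_000))  # heavy user -> "Max"
    print(best_plan(2e-5, 1_200_000))   # light user -> "Basic"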

So how should providers price LLM access?

Our framework suggests that several pricing mechanisms can emerge as optimal. The first is a maximum-spend plan, in which users buy a wallet of credits and spend them as they like until the balance runs out. The second is a minimum-spend plan that commits users to spending at least a certain amount per month in exchange for lower per-token prices. The third is a two-part tariff that combines a flat subscription fee with a usage-based price per token.
In general, the efficient price of an AI token is equal to its marginal cost (the cost of processing an additional token). But firms trying to maximize profit will typically use nonlinear pricing. Heavy users pay more overall, yet their average price per token may be lower than that paid by lighter users.
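The sketch below, with invented numbers, computes a monthly bill under each of the three mechanisms and shows how the average price falls with usage under a two-part tariff.

    # Monthly bills under the three mechanisms. All prices are invented
    # placeholders, not real market rates.

    def maximum_spend_bill(tokens: int, p: float, wallet: float) -> float:
        """Prepaid wallet of credits: the user pays for the wallet up front
        and can consume tokens until tokens * p exhausts the balance."""
        assert tokens * p <= wallet, "wallet balance exhausted"
        return wallet

    def minimum_spend_bill(tokens: int, p: float, floor: float) -> float:
        """Commit to spending at least `floor` per month in exchange
        for a lower per-token price p."""
        return max(floor, tokens * p)

    def two_part_tariff_bill(tokens: int, fee: float, p: float) -> float:
        """Flat subscription fee plus a usage-based per-token price."""
        return fee + tokens * p

    # Under a two-part tariff the average price per token falls with usage:
    fee, p = 20.0, 1e-5
    for q in (1_000_000, 5_000_000, 20_000_000):
        bill = two_part_tariff_bill(q, fee, p)
        print(f"{q:>10} tokens: ${bill:7.2f} total, "
              f"{bill / q * 1e6:5.1f} $/M tokens")

The heavy user's bill is larger, yet their average price per token is lower, which is exactly the nonlinear-pricing pattern described above.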

Anthropic, OpenAI and Google each offer a range of models. Does your theory match their pricing structures?

Many pricing schemes in today’s LLM market closely resemble the strategies predicted by our theoretical model.
Anthropic’s tier structure, from Pro at $20 per month to Max 20x at $200, gives all paid users access to the same family of models but differentiates plans by how much compute the user can consume. More advanced models drain this budget faster: an Opus query, for example, may consume a user’s allowance several times more quickly than a Sonnet query.
OpenAI’s pricing menu combines two screening strategies. Higher tiers unlock both larger usage allowances and access to more powerful models. For instance, the company reserves its most advanced reasoning models for the $200 premium subscription tier.
Our theory also accounts for platforms like Quora’s Poe and GitHub Copilot, which resell access to AI models developed by other providers. Both give subscribers a monthly budget of “compute points” they can use across many different models, but Copilot also allows users to exceed this allowance at a fixed per-request price.
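In rough code, such a compute-point scheme might look like the following sketch. The point costs, the monthly allowance, and the overage price are all invented; the differing point costs also capture why a query to a more advanced model drains an allowance faster.

    # Hypothetical compute-point scheme in the spirit of these platforms.
    # Point costs, the monthly allowance, and the overage price are invented.
    POINT_COST = {"fast-model": 20, "frontier-model": 300}  # points per request
    MONTHLY_POINTS = 100_000
    OVERAGE_PRICE = 0.04  # dollars per request beyond the allowance

    def monthly_bill(requests: list[str], subscription: float = 20.0) -> float:
        """The subscription covers the point budget; once it is spent,
        further requests are billed at a fixed per-request price."""
        points_used, overage = 0, 0
        for model in requests:
            cost = POINT_COST[model]
            if points_used + cost <= MONTHLY_POINTS:
                points_used += cost
            else:
                overage += 1
        return subscription + overage * OVERAGE_PRICE

    # 400 frontier-model requests blow through the 100,000-point budget:
    print(f"{monthly_bill(['frontier-model'] * 400):.2f}")  # -> 22.68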

What about LLM access for developers?

Developers typically access LLMs through Application Programming Interfaces (APIs), which allow software to interact directly with a model. In these markets, pricing looks very different. Anthropic, OpenAI and Google charge developers a simple linear price per token, with no subscription fees, volume discounts, or bundled allocations.
Our theory predicts this too, but for a different reason. Linear pricing corresponds to a strategy that maximizes total value, suggesting providers are prioritizing adoption and growth over short-term profits. Fast-falling API prices support this reading: the cost of GPT-4-class capability fell by around 90% in its first 16 months on the market.

How do open-source entrants change the game?

The entry of DeepSeek last year, offering high-performance AI models at a fraction of the cost, demonstrated how competition from open-source models is likely to reshape today’s pricing strategies. 
Our model suggests that low-value users will adopt the cheap open-source alternative while high-value users will stick to premium models. To reduce the temptation for intermediate users to defect or split demand across providers, the optimal response for a proprietary firm is to lower prices for mid-range users.
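As a stylized illustration, with all numbers invented: suppose the open-source model delivers only a fraction of the premium model’s value but at a much lower per-token price. A user then simply compares surpluses.

    # Stylized choice between a cheap open-source model and a premium one.
    # The quality discount and both prices are invented for illustration.
    QUALITY_OPEN = 0.7      # open model yields 70% of the premium value
    PRICE_OPEN = 2e-6       # dollars per token (hypothetical)
    PRICE_PREMIUM = 1.5e-5  # dollars per token (hypothetical)

    def choice(v: float) -> str:
        """Compare per-token surpluses: value minus price for each option."""
        premium_surplus = v - PRICE_PREMIUM
        open_surplus = QUALITY_OPEN * v - PRICE_OPEN
        return "premium" if premium_surplus > open_surplus else "open-source"

    for v in (1e-5, 3e-5, 1e-4):  # low-, mid-, and high-value users
        print(f"value {v:.0e} per token -> {choice(v)}")

At these particular numbers the mid-value user defects to the open-source model; cutting the premium price for that segment is what wins them back, which is the price response our model predicts.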


KEY TAKEAWAYS

• AI pricing follows economic logic – Real-world subscription and token pricing schemes closely match the strategies predicted by economic theory.
• Tokens simplify complexity – Even though users perform many different tasks, their behavior can be summarized by overall demand for AI capacity.
• Heavy users pay less per token – Nonlinear pricing means large customers often pay a lower average price, even though they spend more overall.
• Inflection point – As the LLM industry shifts from rapid growth to profitability, economic research can highlight the most efficient pricing strategies.


FURTHER READING

Since its inception in 2018, researchers at the TSE Digital Center have been leading efforts to understand the economics of digital platforms, Big Data, and AI. “Menu Pricing of Large Language Models” (coauthored by Dirk Bergemann and Alessandro Bonatti) and other publications by Alex Smolin are available to read on the TSE website.


Article published in TSE Reflect, March 2026