LLM Token Costs in 2026: Pricing Every Model, Hidden Multipliers, and Margin Math

Input vs output token pricing across GPT, Claude, and Gemini, the context-window cost trap, how caching and batching cut bills 40–80%, and the real per-user margin most AI apps miss.

Every AI app I review has the same problem in its margin model: the founder priced the product before they understood what a token costs, and they're either bleeding money on heavy users or sandbagging themselves on light ones. LLM economics aren't hard — they're just unintuitive, because input and output tokens price differently, context windows compound, and provider price lists move every couple of months. This guide is the durable framework.

The pricing primitives nobody explains

A token is roughly 0.75 of an English word — 100 tokens ≈ 75 words. Pricing is quoted per 1 million tokens, split into:

Input tokens — everything you send to the model: the user's message, the system prompt, retrieved context, tool definitions, and the full prior conversation. These carry the lower per-token rate, but they are resent on every call.
Output tokens — what the model generates back. These are typically 3–5× the price of input tokens.
Cached input tokens — if the same system prompt or long context is reused across calls, providers typically charge 10–25% of the normal input rate for the cached portion. This is the single largest optimization most teams miss.
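The three primitives above can be folded into one small cost function. This is a minimal sketch, not any provider's billing logic: the function names, the default 10% cache fraction, and the rates in the example call are illustrative assumptions, and real tokenizers will deviate from the 0.75 words-per-token rule of thumb.

```python
def estimate_tokens(word_count: int) -> int:
    """Rough English token estimate from the ~0.75 words-per-token rule of thumb."""
    return round(word_count / 0.75)


def call_cost(input_tokens, output_tokens, input_rate, output_rate,
              cached_tokens=0, cache_fraction=0.10):
    """Dollar cost of one call. Rates are USD per 1M tokens.

    cached_tokens is the part of input_tokens served from the provider cache.
    cache_fraction is the share of the input rate charged for cached tokens
    (assumed here; the article quotes a 10-25% range).
    """
    fresh_tokens = input_tokens - cached_tokens
    total = fresh_tokens * input_rate
    total += cached_tokens * input_rate * cache_fraction
    total += output_tokens * output_rate
    return total / 1_000_000


# Hypothetical call: 10k input tokens (8k of them cached), 500 output tokens,
# on an assumed $3 / M input, $15 / M output model.
print(f"${call_cost(10_000, 500, 3.0, 15.0, cached_tokens=8_000):.4f}")
```

Keeping cached and fresh input as separate terms makes it easy to see how much of a bill comes from context that never changes between calls.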

Frontier-tier models in mid-2026 sit roughly at:

Premium reasoning ("Pro" tier): $3–15 / M input, $15–60 / M output
Balanced general ("Standard"): $0.50–3 / M input, $2.50–12 / M output
Fast / cheap ("Mini" / "Flash"): $0.05–0.40 / M input, $0.20–2.00 / M output

Always check the provider's current pricing page before locking your margin model. The ratios between tiers are more stable than the absolute numbers.

The context-window trap

A 200k-token context window doesn't mean you can stuff 200k tokens into every call without thinking. It means you'll pay for every token, every turn, because the full context is resent with each request. A 10-turn chat that carries 50k tokens of retrieved context therefore bills 500k input tokens, roughly 10× a single-turn call, before counting the growing conversation history. On a $3 / M input model that's $1.50 per conversation: fine for an enterprise tool, lethal for a consumer chatbot at $20/month.
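The compounding is easy to check in a few lines. The sketch below is illustrative only: the function name and the optional `tokens_added_per_turn` parameter (history that accumulates as the chat grows) are assumptions added for the example, not part of any provider API.

```python
def conversation_input_cost(context_tokens, turns, input_rate,
                            tokens_added_per_turn=0):
    """Input cost in USD when the context is resent on every turn.

    input_rate is USD per 1M input tokens. tokens_added_per_turn models
    history that piles up on top of the fixed context (0 = ignore it).
    """
    total_tokens = 0
    for turn in range(turns):
        total_tokens += context_tokens + turn * tokens_added_per_turn
    return total_tokens * input_rate / 1_000_000


single = conversation_input_cost(50_000, 1, 3.0)
ten_turns = conversation_input_cost(50_000, 10, 3.0)
print(f"single turn: ${single:.2f}, ten turns: ${ten_turns:.2f}")
# prints: single turn: $0.15, ten turns: $1.50
```

Setting `tokens_added_per_turn` to a non-zero value shows that the true figure sits above 10× once the raw history is carried along as well.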

Three concrete fixes:

Summarize the conversation after every 4–6 turns and discard raw history. A 90% token reduction is normal.
Tier your model. Send classification and routing calls to the cheap tier; reserve premium models for the final generation. Many production stacks see a 60–80% cost reduction.
Cache aggressively. Pin your system prompt and retrieved chunks in the provider's cache; you'll pay 10–25% of the normal input rate for those tokens on every subsequent call that hits the cache.
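To get a feel for the size of these savings, here is a rough sketch of the tiering and caching fixes applied to the 50k-token, 10-turn conversation on a $3 / M input model. Everything in it is an assumption for illustration: the function names, the $0.20 / M cheap-tier rate, the 80% share of calls routed to it, and the simplification that cache writes cost the normal rate and the cache never expires.

```python
def cached_conversation_cost(context_tokens, turns, input_rate,
                             cache_fraction=0.10):
    """Input cost in USD when turn 1 pays full rate and later turns hit the cache.

    Simplification: ignores any cache-write surcharge and cache expiry.
    """
    if turns < 1:
        return 0.0
    first_turn = context_tokens * input_rate
    later_turns = context_tokens * input_rate * cache_fraction * (turns - 1)
    return (first_turn + later_turns) / 1_000_000


def blended_input_rate(cheap_rate, premium_rate, cheap_share):
    """Average input rate when cheap_share of tokens go to the cheap tier.

    Assumes calls to both tiers carry similar token counts.
    """
    return cheap_share * cheap_rate + (1 - cheap_share) * premium_rate


uncached = 50_000 * 10 * 3.0 / 1_000_000
for fraction in (0.10, 0.25):
    cached = cached_conversation_cost(50_000, 10, 3.0, fraction)
    print(f"cache at {fraction:.0%} of input rate: ${cached:.3f} vs ${uncached:.2f}")

blended = blended_input_rate(0.20, 3.0, 0.8)
print(f"blended rate: ${blended:.2f} / M, reduction: {1 - blended / 3.0:.0%}")
```

Under these assumptions, caching alone brings the conversation from $1.50 to well under $0.50, and routing most calls to a cheap tier lands inside the 60–80% reduction range quoted above.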

A worked margin example

Consumer AI writing app, $20/month plan. Median user runs 200 sessions/month, each session is 3 turns, each turn averages 800 input tokens and 400 output tokens on a $2 / M input, $8 / M output model:

Input cost: 200 × 3 × 800 × $2/M = $0.96
Output cost: 200 × 3 × 400 × $8/M = $1.92
Per-user model cost: $2.88
Margin at $20: ~86%

Now the same app for a power user running 2,000 sessions a month: per-user cost = $28.80, which is a loss of $8.80 on a $20 plan (a margin of about -44%). This is the moment naive "unlimited" pricing kills the business. Model your own scenarios in the LLM inference cost calculator and AI agent unit economics calculator.
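The same arithmetic as a small script, using the numbers from the example above. The function names are made up for illustration, and the break-even figure at the end is a derived number that follows from those inputs rather than one stated in the example.

```python
def monthly_model_cost(sessions, turns_per_session, input_tokens,
                       output_tokens, input_rate, output_rate):
    """Monthly model cost in USD for one user. Rates are USD per 1M tokens."""
    turns = sessions * turns_per_session
    cost_per_turn = input_tokens * input_rate + output_tokens * output_rate
    return turns * cost_per_turn / 1_000_000


def margin(price, cost):
    """Gross margin as a fraction of price (negative means a loss)."""
    return (price - cost) / price


PRICE = 20.0
median_cost = monthly_model_cost(200, 3, 800, 400, 2.0, 8.0)
power_cost = monthly_model_cost(2_000, 3, 800, 400, 2.0, 8.0)

print(f"median user: ${median_cost:.2f}, margin {margin(PRICE, median_cost):.1%}")
# prints: median user: $2.88, margin 85.6%
print(f"power user:  ${power_cost:.2f}, margin {margin(PRICE, power_cost):.1%}")
# prints: power user:  $28.80, margin -44.0%

# Sessions per month at which model cost equals the plan price.
break_even = PRICE / monthly_model_cost(1, 3, 800, 400, 2.0, 8.0)
print(f"break-even: about {break_even:.0f} sessions/month")
# prints: break-even: about 1389 sessions/month
```

The break-even line is the useful one for pricing: any user above roughly 1,400 sessions a month loses money on this plan under these assumptions.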

The four pricing patterns that actually work

Per-message or per-token credits. Aligns cost with usage one-to-one. Best for power-user-heavy products.
Tiered usage caps. $20 = 500 messages, $50 = 2,000. Predictable margin, easy to communicate.
Fair-use unlimited. Marketed as unlimited but rate-limited at the 99th percentile of usage. Works if your median user is 50× cheaper to serve than a user at the cap.
Bring-your-own-key. User pays the LLM provider directly; you charge a flat product fee. Eliminates the margin problem entirely. Common in dev tools.
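For the tiered-cap pattern, the number worth checking is the margin when a user burns the entire cap. The sketch below assumes one message equals one turn with the worked example's averages (800 input and 400 output tokens on a $2 / M input, $8 / M output model); that mapping and the function name are assumptions for illustration.

```python
def worst_case_margin(price, message_cap, cost_per_message):
    """Margin as a fraction of price when the user consumes the full cap."""
    return (price - message_cap * cost_per_message) / price


# Assumed per-message cost: 800 input + 400 output tokens at $2 / $8 per 1M.
cost_per_message = (800 * 2.0 + 400 * 8.0) / 1_000_000

for price, cap in [(20, 500), (50, 2_000)]:
    worst = worst_case_margin(price, cap, cost_per_message)
    print(f"${price} plan, {cap} messages: worst-case margin {worst:.1%}")
```

Because the cap bounds usage, the worst case is known in advance, which is what makes this pattern's margin predictable.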

The honest planning advice

Model your AI margin against the 99th-percentile user, not the median. Cache your system prompt from day one. Pick a sensible model tier for each step in your pipeline rather than routing everything to the premium tier "just to be safe." And re-pull provider pricing once a quarter — the per-token rates on every major model have dropped 60–95% over the last 18 months, and your margin model should benefit from that, not your competitor's.
