LLM Inference Cost Calculator

Estimate monthly OpenAI, Anthropic, Google, and open-source LLM bills from input/output tokens, requests per day, and per-million-token pricing. The calculator also models cache hits, batch discounts, and gross margin at a given retail price.

Disclaimer: Vendor pricing changes frequently. Plug in current per-million-token prices from your provider's pricing page for production budgets. Estimates exclude embedding costs, fine-tuning fees, and image/audio multimodal charges.

Requests per day: 5,000

Avg input tokens / request: 1,200

System prompt + user message + retrieved context. RAG apps often 2–8k.

Avg output tokens / request: 400

Input price ($/1M tokens): $2.50

GPT-4o ≈ $2.50, Claude Sonnet ≈ $3, Gemini 2.5 Pro ≈ $1.25, Haiku/Mini ≈ $0.15–0.25.

Output price ($/1M tokens): $10.00

Output is usually 3–5× input. GPT-4o ≈ $10, Claude Sonnet ≈ $15, o1 ≈ $60.

Prompt cache hit %: 30%

Cached input tokens cost ~10–25% of normal. Big system prompts benefit most.

Retail price you charge per request ($): $0.05

Set to 0 if you bundle into a subscription.

Monthly LLM API spend

$942

$0.0063/req · $11,461/year · 87% gross margin

At 5,000 requests/day with 1,200 input + 400 output tokens each, you're spending $0.0063 per request: $942/month, $11,461/year. Cache hits at 30% shave roughly 24% off your input cost. If you charge $0.050/request, gross margin is 87.4% ($6,558/month gross profit). The two biggest levers most teams ignore: (1) route trivial requests to Haiku/Mini/Flash, since 80% of traffic rarely needs your premium model, and (2) trim system prompts ruthlessly. At these prices, an uncached 4,000-token system prompt adds $0.01 to every request, as much as generating 1,000 output tokens. Watch output tokens harder than input: output is usually 4× the price and many apps run with no max_tokens cap.

Cost per request: $0.0063

Daily spend: $31

Monthly spend: $942

Annual spend: $11,461

Input cost per request: $0.0023

Output cost per request: $0.0040

Cost per 1M requests: $6,280

Monthly revenue (at retail): $7,500

Gross margin %: 87.4%
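
The rollup figures above follow from the per-request cost. Here is a minimal Python sketch of that step, assuming a 30-day month and a 365-day year. Those conventions are inferred from the numbers shown, not stated by the calculator, and the function and key names are illustrative.

```python
def rollup(cost_per_request, requests_per_day, retail_price):
    """Scale a per-request cost to daily, monthly, and annual figures.

    Assumes a 30-day month and a 365-day year (inferred, not documented).
    """
    daily = cost_per_request * requests_per_day
    monthly = daily * 30
    annual = daily * 365
    revenue = retail_price * requests_per_day * 30
    # Gross margin is undefined when nothing is charged per request.
    margin = (revenue - monthly) / revenue if revenue > 0 else None
    return {
        "daily": daily,
        "monthly": monthly,
        "annual": annual,
        "monthly_revenue": revenue,
        "gross_margin": margin,
    }


summary = rollup(cost_per_request=0.00628, requests_per_day=5_000, retail_price=0.05)
for name, value in summary.items():
    print(f"{name}: {value:,.4f}")
```

Passing a retail price of 0 returns None for the margin, which mirrors the "bundled into a subscription" case.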

Formula used

LLM cost formula

Almost all frontier APIs price per million tokens. Input and output are billed separately; output is typically 3–5× the input rate. Prompt caching (Anthropic, OpenAI, Gemini) discounts cached input tokens by 75–90%.

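Cost/req = (InTok/1M × InPrice × CacheMult) + (OutTok/1M × OutPrice)

where CacheMult = (1 − CacheHit) + CacheHit × CachedRate, and CachedRate is the price of a cached input token as a fraction of the normal input price. The figures on this page correspond to CachedRate = 0.2.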

Output / input price: ~3–5×

Cache discount: 75–90%

Mini-tier price: ~10–20% of frontier
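
The formula can be sketched in a few lines of Python. This is an illustration, not the calculator's actual source. It assumes the cache multiplier is a blend of full-price and cached tokens, with cached tokens billed at 20% of the normal input rate (the rate the worked example on this page uses). The function and parameter names are made up for the sketch.

```python
def cost_per_request(in_tokens, out_tokens, in_price, out_price,
                     cache_hit=0.0, cached_rate=0.2):
    """Dollar cost of one request; prices are per 1M tokens.

    cached_rate is the cached-token price as a fraction of the normal
    input price. The 0.2 default is an assumption, not a vendor quote.
    """
    # Share of input billed at full price plus share billed at the cached rate.
    cache_mult = (1 - cache_hit) + cache_hit * cached_rate
    input_cost = in_tokens / 1_000_000 * in_price * cache_mult
    output_cost = out_tokens / 1_000_000 * out_price
    return input_cost + output_cost


default = cost_per_request(1_200, 400, 2.50, 10.00, cache_hit=0.30)
print(f"${default:.5f} per request")
```

With the default inputs this lands at about $0.00628 per request, which matches the $6,280 per 1M requests shown in the results.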

Cite this calculator

Writing about this topic? Grab a citation — every link helps keep these tools free.

APA

RevenueLab. (2026). LLM Inference Cost Calculator. Retrieved from https://revenuelab.fyi/llm-inference-cost-calculator

HTML

<p>Source: <a href="https://revenuelab.fyi/llm-inference-cost-calculator" target="_blank" rel="noopener">LLM Inference Cost Calculator — RevenueLab</a> (2026).</p>

Markdown

Source: [LLM Inference Cost Calculator — RevenueLab](https://revenuelab.fyi/llm-inference-cost-calculator) (2026).

Token math, in plain English

One token ≈ 0.75 English words, ≈ 4 characters. A 1,000-word doc ≈ 1,300 tokens. A 'long' chat reply ≈ 500–800 tokens. Vendors price per 1,000,000 tokens, so $2.50/1M means $0.0025 per 1,000 tokens, or about $0.000625 per 1,000 characters (≈ 250 tokens). Multiply by request volume and the rounding error becomes a real budget line fast.

• Input is what you send (system + user + retrieved context + tool definitions).
• Output is what the model generates (the assistant message + any tool args).
• Tool/function-calling tokens count as output. Long JSON tool schemas count as input on every call.
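
For a quick budget check before wiring up a real tokenizer, the ~4 characters per token rule of thumb can be turned into a rough estimator. This is a heuristic sketch only. Actual counts depend on the model's tokenizer and on the language of the text, and both helper names are hypothetical.

```python
def estimate_tokens(text):
    """Rough token count using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / 4))


def price_for_tokens(tokens, price_per_million):
    """Dollar cost of a token count at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


doc = "word " * 1_000  # stand-in for a 1,000-word document (5,000 characters)
tokens = estimate_tokens(doc)
price = price_for_tokens(tokens, 2.50)
print(tokens, f"${price:.6f}")
```

For production budgets, replace the estimator with token counts reported by the provider's API responses.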

Why your bill explodes faster than your traffic

LLM bills scale with tokens × requests, not just requests. Three things compound: (1) context bloat — devs add 'just one more example' to the system prompt and quietly 4× input tokens; (2) RAG retrieval — pulling top-20 chunks instead of top-3 multiplies input cost without measurably better answers; (3) reasoning models — o1/o3/Gemini Thinking burn thousands of hidden reasoning tokens billed as output. Track $/successful-task as your north star, not $/token.
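
The compounding effect is easy to see with a small Python sketch. The scenario is hypothetical: traffic doubles while context bloat quadruples input tokens, and 80% of tasks succeed. The success rate and both function names are assumptions made for illustration, and caching is left out to keep the arithmetic visible.

```python
def monthly_spend(requests_per_day, in_tokens, out_tokens,
                  in_price, out_price, days=30):
    """Monthly API spend in dollars, ignoring caching; prices are per 1M tokens."""
    per_request = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return per_request * requests_per_day * days


def cost_per_successful_task(spend, tasks, success_rate):
    """Spend divided by the number of tasks that actually succeeded."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return spend / (tasks * success_rate)


before = monthly_spend(5_000, 1_200, 400, 2.50, 10.00)
after = monthly_spend(10_000, 4_800, 400, 2.50, 10.00)  # 2x traffic, 4x input
print(f"traffic x2.00, bill x{after / before:.2f}")
print(f"${cost_per_successful_task(after, 10_000 * 30, 0.8):.4f} per successful task")
```

Under these assumptions the bill grows about 4.6× on 2× traffic, which is why $/successful-task is a more honest metric than $/token.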

Rex's Notes

Most teams discover their LLM bill three months after shipping, when finance forwards a $40k AWS-style invoice and asks what happened. The honest answer is almost always the same: nobody modeled cost per request before launch. This calculator does that math in 30 seconds — input tokens, output tokens, cache hit rate, and the retail price you charge — and tells you whether your unit economics survive contact with real traffic.

What each input means

Get these inputs right and the output is reliable. Get them wrong and the calculator just multiplies bad assumptions.

Requests per day

Total inference calls hitting the API, including retries and background jobs.

Typical range: 1k–10k for early SaaS; 100k–1M for consumer apps; 10M+ for embedded classifiers.

Avg input tokens / request

System prompt + user message + retrieved RAG chunks + tool/function definitions. Use a real distribution, not a guess.

Typical range: 500–2,000 for chat; 4k–12k for RAG; 200–600 for classification.

Avg output tokens / request

Assistant reply + any tool-call arguments. Reasoning models can burn 2–10x more tokens on hidden reasoning, which is billed as output.

Typical range: 100–500 for chat; 500–2,000 for code/long-form; 4k+ for o1/o3/Thinking models.

Input / output price ($/1M)

Vendor pricing per million tokens. Output is almost always 3–5x input.

Typical range: Frontier: $2.50/$10 (GPT-4o), $3/$15 (Sonnet). Mini-tier: $0.15/$0.60 (Haiku, Mini, Flash).

Prompt cache hit %

Share of input tokens served from prompt cache. Big shared system prompts benefit most; one-off requests don't.

Typical range: 10–30% for typical apps; 50–80% for high-volume RAG with stable system prompts.

Retail price per request

What the end customer effectively pays per call. Set to 0 if bundled into a flat subscription and check unit margin separately.

Typical range: $0.001–$0.05 for chat; $0.10–$2 for agent runs; $0 for free or freemium tiers.

Worked examples

Real scenarios with the math walked through line by line.

Example

B2B chatbot SaaS, GPT-4o

Scenario: 5,000 requests/day, 1,500 input tokens, 500 output tokens, $2.50/$10 pricing, 40% cache hit, $0.05/request retail.

Math: Cached input cost = 1,500/1M × $2.50 × (0.6 + 0.4×0.2) = $0.00255. Output = 500/1M × $10 = $0.005. Cost/req = $0.00755 ≈ $0.0076. Daily ≈ $38. Monthly ≈ $1,133. Monthly revenue = $7,500. Gross margin ≈ 85%.

Outcome: Healthy. Without caching, input would cost $0.00375 per request, so the cache discount saves about $180/mo. Watch for prompt bloat: adding 2k tokens to the system message at the same 40% hit rate raises cost per request to about $0.011 and cuts margin from 85% to roughly 78%.
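
The same scenario can be checked in Python. The sketch assumes a 30-day month and cached tokens billed at 20% of the normal input rate, as in the math line above. It carries unrounded values through, so results can differ by a few dollars from figures rounded at each step.

```python
requests_per_day = 5_000
in_tokens, out_tokens = 1_500, 500
in_price, out_price = 2.50, 10.00      # dollars per 1M tokens
cache_hit, cached_rate = 0.40, 0.20    # cached tokens bill at 20% of normal
retail = 0.05                          # dollars charged per request

# 60% of input at full price, 40% at the cached rate.
cache_mult = (1 - cache_hit) + cache_hit * cached_rate
input_cost = in_tokens / 1_000_000 * in_price * cache_mult
output_cost = out_tokens / 1_000_000 * out_price
cost_per_req = input_cost + output_cost

monthly_cost = cost_per_req * requests_per_day * 30
monthly_revenue = retail * requests_per_day * 30
margin = (monthly_revenue - monthly_cost) / monthly_revenue

print(f"cost/req ${cost_per_req:.5f}, monthly ${monthly_cost:,.2f}, margin {margin:.1%}")
```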

Example

High-volume classifier on Haiku

Scenario: 1M requests/day, 400 input + 80 output tokens, $0.15/$0.60 pricing, 10% cache, $0.001/request retail.

Math: Input = 400/1M × $0.15 × (0.9 + 0.1×0.2) ≈ $0.000055. Output = 80/1M × $0.60 = $0.000048. Cost/req ≈ $0.000103. Daily cost ≈ $103. Monthly ≈ $3,096. Monthly revenue = $30,000. Margin ≈ 90%.

Outcome: Classifier economics work because you picked the right model. Routing this to GPT-4o would 17x your cost and erase the business.
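
The model-choice comparison can be sketched by pricing the same request shape at both tiers. The sketch assumes the same 10% cache hit rate on both models and cached tokens billed at 20% of the normal input rate. The helper name is illustrative.

```python
def cost_per_request(in_tokens, out_tokens, in_price, out_price,
                     cache_hit, cached_rate=0.2):
    """Dollar cost of one request; prices are per 1M tokens."""
    cache_mult = (1 - cache_hit) + cache_hit * cached_rate
    return (in_tokens * in_price * cache_mult + out_tokens * out_price) / 1_000_000


retail = 0.001
mini = cost_per_request(400, 80, 0.15, 0.60, cache_hit=0.10)
frontier = cost_per_request(400, 80, 2.50, 10.00, cache_hit=0.10)

ratio = frontier / mini
frontier_margin = (retail - frontier) / retail
print(f"frontier is {ratio:.1f}x the mini-tier cost")
print(f"margin on the frontier model at ${retail}/request: {frontier_margin:.0%}")
```

Because both the input and output prices differ by the same factor here, the ratio does not depend on the token mix. On the frontier price the per-request cost exceeds the retail price, so the margin turns negative.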

Common mistakes

Where this calculation usually goes wrong in the real world.

• Modeling cost from a single 'typical' request instead of a token distribution. Real traffic has a long tail of huge requests that dominate the bill.
• Ignoring output tokens. Output is 3–5x the price of input, and most apps run with no max_tokens cap.
• Counting cache discounts before measuring them. Caches only help if the same prefix gets reused within the TTL; chatty apps with unique users often see <10% hit rates.
• Forgetting reasoning tokens. o1, o3, and Gemini Thinking models bill hidden reasoning as output. A 'short' answer can secretly burn 8,000 output tokens.
• Comparing vendors on input price alone. Always combine input + output × your real ratio before deciding.
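
The first mistake can be made concrete with a small Python sketch. The token sample below is invented for illustration: mostly small requests plus a thin tail of very large ones. It compares the cost implied by the median ("typical") request with the cost implied by the mean, which is what the bill actually follows.

```python
from statistics import mean, median

# Hypothetical sample of input tokens per request: 90 small, 8 medium, 2 huge.
input_tokens = [800] * 90 + [3_000] * 8 + [40_000] * 2
in_price = 2.50  # dollars per 1M input tokens


def input_cost(tokens):
    """Dollar input cost for a given token count."""
    return tokens / 1_000_000 * in_price


typical = input_cost(median(input_tokens))  # what a 'typical request' model predicts
actual = input_cost(mean(input_tokens))     # what the average request really costs
tail_share = sum(t for t in input_tokens if t >= 40_000) / sum(input_tokens)

print(f"median-based ${typical:.4f}, mean-based ${actual:.4f}")
print(f"top 2% of requests account for {tail_share:.0%} of input tokens")
```

In this made-up sample the mean-based cost is 2.2× the median-based one, so a budget built on the typical request would miss more than half the input spend.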

When to use this calculator

• Pricing a new AI feature before launch: set retail price > 4x worst-case cost/request.
• Deciding whether to switch from frontier to mini-tier models (route 80% of traffic, keep 20% premium).
• Building an investor pitch that needs defensible gross margin claims.
• Negotiating an enterprise deal: model per-seat token usage at the customer's expected volume.
• Justifying engineering time spent on prompt caching, RAG chunk reduction, or function-calling cleanup.
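
The pricing rule in the first item can be sketched as a price floor. The worst-case token counts below are hypothetical, the function name is illustrative, and caching is ignored on purpose because a worst-case request should not count on cache hits.

```python
def min_retail_price(worst_in_tokens, worst_out_tokens,
                     in_price, out_price, multiple=4.0):
    """Price floor: a multiple of the uncached worst-case cost per request."""
    worst_cost = (worst_in_tokens * in_price + worst_out_tokens * out_price) / 1_000_000
    return multiple * worst_cost


# Hypothetical worst case: 8,000 input and 2,000 output tokens at $2.50/$10.
floor = min_retail_price(8_000, 2_000, 2.50, 10.00)
margin_at_floor = 1 - 1 / 4.0
print(f"charge at least ${floor:.2f} per request")
print(f"worst-case gross margin at that price: {margin_at_floor:.0%}")
```

A 4× multiple means even the worst-case request still clears a 75% gross margin.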

Glossary

Token

The unit LLMs bill on. ~0.75 English words or ~4 characters per token. 1,000 tokens ≈ 750 words.

Context window

Maximum input + output tokens a model accepts in a single request. Pricing is per token used, not per context window size.

Prompt caching

Provider feature that stores shared input prefixes (system prompts, tool definitions) and bills cached tokens at 10–25% of normal.

Reasoning tokens

Hidden chain-of-thought tokens that reasoning models (o1, o3, Gemini Thinking) generate before the visible answer. Billed as output.

Gross margin

(Revenue − direct API cost) ÷ revenue. AI features below 60% gross margin usually can't survive scaling support and infra overhead.

Related guides

Long-form playbooks on the same topic, written by the RevenueLab editorial team.

Guide · 11 min read

LLM Token Costs in 2026: Pricing Every Model, Hidden Multipliers, and Margin Math

Input vs output token pricing across GPT, Claude, and Gemini, the context-window cost trap, how caching and batching cut bills 40–80%, and the real per-user margin most AI apps miss.

Methodology last reviewed: 2026-05 by the RevenueLab editorial team.

FAQ

How are LLM API costs usually billed?

Per million tokens, split into input (prompt) and output (completion) prices. Output is typically 3–5× the input rate. Some vendors also bill separately for cached input, tool-use tokens, and image/audio inputs (per image or per second of audio).

How much do prompt caching discounts actually save?

Anthropic and OpenAI charge ~10–25% of normal price for cached input tokens, with a small write-cost on first use. For an app with a 4k-token system prompt and high request volume, cache hits commonly cut total input cost by 60–80%. Output tokens are never cached.
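
The relationship between hit rate and savings can be sketched in Python. This simplification ignores the cache-write surcharge mentioned above, and both function names are illustrative. Savings on input cost equal the hit rate times the discount on cached tokens.

```python
def input_savings(hit_rate, cached_rate):
    """Fraction of input cost saved; cached_rate is cached price / normal price."""
    return hit_rate * (1 - cached_rate)


def hit_rate_needed(target_savings, cached_rate):
    """Hit rate required for a target saving; above 1.0 means unreachable."""
    return target_savings / (1 - cached_rate)


print(f"{input_savings(0.30, 0.20):.0%} saved at a 30% hit rate, cached at 20%")
for target in (0.60, 0.80):
    needed = hit_rate_needed(target, cached_rate=0.10)
    print(f"{target:.0%} savings needs a {needed:.0%} hit rate when cached tokens cost 10%")
```

Under these assumptions, cutting input cost by 60–80% takes hit rates of roughly 67–89%, even at the cheapest cached rate. At a 25% cached rate the maximum possible saving is 75%.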

Should I use GPT-4 / Claude Sonnet for everything?

No. The cheapest reliable model that hits your quality bar usually wins. Route classification, extraction, and simple chat to Haiku, Mini, or Flash (10–20× cheaper). Reserve frontier models for reasoning-heavy work where output quality drives real revenue.
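
A routing split can be priced with a short Python sketch. The 80/20 split, the 1,200/400 token shape, and the $0.15/$0.60 and $2.50/$10 price points are taken from figures quoted elsewhere on this page. Applying one token shape to both tiers and ignoring caching are simplifying assumptions, and the function names are illustrative.

```python
def per_request(in_tokens, out_tokens, in_price, out_price):
    """Uncached dollar cost of one request; prices are per 1M tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000


def blended_cost(mini_share, mini_cost, premium_cost):
    """Average cost per request when mini_share of traffic goes to the mini tier."""
    return mini_share * mini_cost + (1 - mini_share) * premium_cost


premium = per_request(1_200, 400, 2.50, 10.00)
mini = per_request(1_200, 400, 0.15, 0.60)
blended = blended_cost(0.80, mini, premium)
savings = 1 - blended / premium
print(f"premium ${premium:.5f}, blended ${blended:.6f}, savings {savings:.0%}")
```

In this sketch, routing 80% of traffic to the mini tier cuts average cost per request by about 75%. The real saving depends on how often the router has to fall back to the premium model.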

What about self-hosted open-source models?

Hosting Llama/Mistral/Qwen on your own GPUs only beats hosted APIs above roughly 10–50M tokens/day of sustained traffic, depending on model size. Below that, hosted APIs almost always win on total cost of ownership when you include reliability, scaling, and engineering time.
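
A break-even estimate can be sketched as fixed self-hosting cost divided by what the same tokens would cost on a hosted API. Every number in the example is hypothetical: a $4,500/month all-in cost for GPUs and operations, and a blended hosted price of $5 per 1M tokens. The sketch also ignores per-token costs of self-hosting such as power and autoscaling headroom.

```python
def breakeven_tokens_per_day(monthly_fixed_cost, api_price_per_million, days=30):
    """Daily token volume at which hosted API spend equals the fixed self-host cost."""
    daily_fixed = monthly_fixed_cost / days
    return daily_fixed / api_price_per_million * 1_000_000


# Hypothetical inputs; substitute real GPU, staffing, and API quotes.
tokens_per_day = breakeven_tokens_per_day(monthly_fixed_cost=4_500,
                                          api_price_per_million=5.00)
print(f"break-even at about {tokens_per_day / 1e6:.0f}M tokens/day")
```

The result is very sensitive to the hosted price used for comparison. Against a mini-tier price the break-even volume rises by an order of magnitude or more.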

How this calculator is built

Independently maintained

Written by Sam Doshi and the RevenueLab editorial team. We don't sell the data feeds this tool is built on.

Sourced from primary data

Benchmarks come from public AdSense / Stripe / IRS disclosures and reader-submitted data — never third-party "$X per view" claims. Full methodology.

Last reviewed

July 2026. We re-check every figure on the platform on a rolling quarterly cycle.

Editorial standards

See our editorial policy and disclaimer. Results are estimates, not advice.
