AI Model Cost Calculator

Estimate monthly API costs for GPT-5, Claude Sonnet, Gemini Pro, and other LLMs. Model input/output tokens, request volume, and break-even pricing for your AI feature or product.


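Monthly requests: 100,000

Average input tokens: 1,200
1 token ≈ 4 characters. A typical chat prompt, including the system prompt, runs 800–2,000 tokens.

Average output tokens: 400

Input price per 1M tokens: $1.25
GPT-5: $1.25, Claude Sonnet 4: $3, Gemini 2.5 Flash: $0.30.

Output price per 1M tokens: $10.00
GPT-5: $10, Claude Sonnet 4: $15, Gemini 2.5 Flash: $2.50.

Caching discount: 50%
Prompt caching cuts the cost of repeated input tokens by 50–90%.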

Formula used

LLM cost formula

Output tokens cost several times more than input tokens: 5–8× at the prices listed on this page. Cap max_tokens aggressively, and stream responses so unneeded generations can be cut off early.

Monthly cost = requests × [(input tokens ÷ 1M × input $/1M) + (output tokens ÷ 1M × output $/1M)]
GPT-5 input: $1.25/M
GPT-5 output: $10/M
Cache savings: Up to 90%
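As a minimal sketch, the formula can be written as a small Python function. The function and parameter names are illustrative rather than taken from the calculator's own code, and the sketch ignores caching and batch discounts.

```python
def monthly_cost(requests, input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Monthly spend in dollars, before any caching or batch discount."""
    input_cost = requests * input_tokens / 1_000_000 * input_price_per_m
    output_cost = requests * output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Calculator defaults: 100,000 requests, 1,200 in / 400 out, GPT-5 rates.
default_bill = monthly_cost(100_000, 1_200, 400, 1.25, 10.00)
print(f"${default_bill:,.2f}")  # $150 input + $400 output
```

With the default inputs, output tokens account for $400 of the $550 total even though each request sends three times as many input tokens as it receives.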
Backlink-friendly embed

Embed this calculator

Free to embed on any site. Inputs preserved, link back to RevenueLab. Each format trades polish for SEO juice.

<iframe src="https://revenuelab.fyi/embed/ai-model-cost-calculator?requests=100000&inputTokens=1200&outputTokens=400&inputPrice=1.25&outputPrice=10&cachingDiscount=50" width="100%" height="680" style="border:0;border-radius:12px;max-width:100%" loading="lazy" title="AI Model Cost Calculator"></iframe>
<p style="font:12px/1.4 system-ui;color:#666;margin:6px 0 0">Calculator by <a href="https://revenuelab.fyi/ai-model-cost-calculator?requests=100000&inputTokens=1200&outputTokens=400&inputPrice=1.25&outputPrice=10&cachingDiscount=50" target="_blank" rel="noopener">RevenueLab</a></p>

Easiest to install — passes referral traffic and a referring-domain signal.

Cite this calculator

Writing about this topic? Grab a citation — every link helps keep these tools free.

APA
RevenueLab. (2026). AI Model Cost Calculator. Retrieved from https://revenuelab.fyi/ai-model-cost-calculator
HTML
<p>Source: <a href="https://revenuelab.fyi/ai-model-cost-calculator" target="_blank" rel="noopener">AI Model Cost Calculator — RevenueLab</a> (2026).</p>
Markdown
Source: [AI Model Cost Calculator — RevenueLab](https://revenuelab.fyi/ai-model-cost-calculator) (2026).

Where AI costs actually leak

Most teams under-estimate output tokens by 3–5×. A 'short answer' from an unconstrained model averages 400–800 tokens. The fix is hard system-prompt limits ('respond in <= 80 words') and aggressive max_tokens, not just choosing a cheaper model.

  • Prompt caching cuts repeated system-prompt cost by 50–90%.
  • Batch API endpoints are 50% cheaper for non-realtime work.
  • Cheaper models often need 1.5× more retries — net cost can be higher.
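The retry point is easiest to judge as cost per successful answer rather than cost per call. The sketch below assumes a 1,200-in / 400-out request, the list prices quoted on this page, 1.5 attempts per success for the cheaper model, and a larger model that succeeds on the first try. Those attempt counts are illustrative assumptions, not measurements.

```python
def cost_per_attempt(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one request at per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_per_success(attempt_cost, attempts_per_success):
    """Expected spend to get one acceptable answer."""
    return attempt_cost * attempts_per_success


cheap = cost_per_attempt(1_200, 400, 0.30, 2.50)    # Gemini 2.5 Flash rates
large = cost_per_attempt(1_200, 400, 1.25, 10.00)   # GPT-5 rates

cheap_with_retries = cost_per_success(cheap, 1.5)   # assumed 1.5 attempts
large_first_try = cost_per_success(large, 1.0)      # assumed to succeed once

# Attempts per success at which the cheap model stops being cheaper.
break_even_attempts = large / cheap

print(f"cheap: ${cheap_with_retries:.5f}  large: ${large_first_try:.5f}")
print(f"break-even attempts: {break_even_attempts:.1f}")
```

Under these assumptions a 1.5× retry penalty does not flip the comparison. The cheaper model only loses once it needs about four attempts per success. Longer retry prompts or human review time would lower that threshold.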

Pricing your AI feature

If a user runs ~100 requests/mo at $0.02 cost/request, that user costs you $2/mo. A 70% gross margin then requires a price of $2 ÷ (1 − 0.70) ≈ $6.67/mo, so charge $7+/mo. Free tiers should hard-cap requests, not tokens: request caps are easier to communicate and harder to abuse.
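The margin arithmetic can be sketched as a one-line helper. The function name and arguments are illustrative. The sketch treats model spend as the only cost of goods sold, which is a simplifying assumption.

```python
def min_price_for_margin(requests_per_user, cost_per_request, target_margin):
    """Lowest monthly price that still hits the target gross margin."""
    if not 0 <= target_margin < 1:
        raise ValueError("target_margin must be in [0, 1)")
    monthly_cost = requests_per_user * cost_per_request
    return monthly_cost / (1 - target_margin)


price = min_price_for_margin(100, 0.02, 0.70)
print(f"${price:.2f}/mo")  # $2 of cost divided by 0.30
```

Rounding the result up to $7 leaves a small cushion for users who exceed the average request count.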

Rex's Notes

Comparing GPT-5, Claude, and Gemini API pricing on input tokens alone can miss 60% or more of the real cost: output tokens are 5–8× more expensive at the list prices above, prompt caching changes the economics entirely, and tool-calling rounds multiply your token count. This calculator models blended cost across providers so you can pick the right model per workload, not per blog post.

What each input means

Get these inputs right and the output is reliable. Get them wrong and the calculator just multiplies bad assumptions.

Monthly requests

Total inference calls sent to the model each month.

Typical range: Highly workload-specific — measure with logging, don't guess.

Average input tokens

System prompt + user message + RAG context.

Typical range: 500–2,000 chat; 5,000–20,000 RAG; 20,000+ for code-context agents.
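If logs are not available yet, the page's "1 token ≈ 4 characters" heuristic gives a rough first estimate. The sketch below is an approximation for English text only. A provider's real tokenizer will give different counts, and the helper names and placeholder strings are illustrative.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token count from character length (heuristic, not a tokenizer)."""
    return round(len(text) / chars_per_token)


def estimate_input_tokens(system_prompt, user_message, rag_context=""):
    """Sum the three parts that make up a request's input."""
    parts = (system_prompt, user_message, rag_context)
    return sum(estimate_tokens(p) for p in parts)


# Placeholder strings standing in for real prompt text.
system_prompt = "s" * 3_200    # about 800 tokens
user_message = "u" * 400       # about 100 tokens
rag_context = "r" * 20_000     # about 5,000 tokens
print(estimate_input_tokens(system_prompt, user_message, rag_context))
```

In this made-up request the RAG context is most of the input, which is why the RAG range above sits well above the chat range.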

Average output tokens

Model response length.

Typical range: 200–800 chat; 1,000–4,000 long-form; 100–500 structured extraction.

Cache hit rate (if supported)

Share of input tokens served from prompt cache.

Typical range: 30–70% on assistants with stable system prompts; 0% if every request is unique.

Worked examples

Real scenarios with the math walked through line by line.

Example

Chat product, GPT-5 mini

Scenario: 500,000 requests/mo, 1,500 input tokens, 600 output tokens, no cache. Pricing: $0.30 input / $2.40 output per 1M tokens.

Math: Input cost = 500k × 1,500 / 1M × $0.30 = $225. Output = 500k × 600 / 1M × $2.40 = $720. Total ≈ $945/mo.

Outcome: Predictable scaling; output dominates as expected. At GPT-5 standard rates ($1.25 / $10), the same workload would cost about $3,940/mo, roughly 4× this.

Example

RAG product, Claude Sonnet 4 with caching

Scenario: 200,000 requests, 12,000 input tokens (8,000 cacheable), 800 output. Pricing ≈ $3 input / $0.30 cached / $15 output per 1M.

Math: Cacheable input = 200k × 8,000 / 1M × $0.30 = $480. Non-cached = 200k × 4,000 / 1M × $3 = $2,400. Output = 200k × 800 / 1M × $15 = $2,400. Total ≈ $5,280.

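Outcome: Without caching, the same workload would cost ~$9,600 (input 200k × 12,000 / 1M × $3 = $7,200, plus $2,400 output), so caching cuts the bill by 45%. This assumes every cacheable token is served from cache.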
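The same split can be checked in a few lines of Python. This sketch mirrors the example's simplifications: every cacheable token is a cache hit and cache writes carry no surcharge. The function and parameter names are illustrative.

```python
def rag_monthly_cost(requests, cached_tokens, uncached_tokens, output_tokens,
                     input_price, cached_price, output_price):
    """Monthly dollars with input split into cached and uncached tokens."""
    millions_of_requests = requests / 1_000_000
    cached = millions_of_requests * cached_tokens * cached_price
    uncached = millions_of_requests * uncached_tokens * input_price
    output = millions_of_requests * output_tokens * output_price
    return cached + uncached + output


with_cache = rag_monthly_cost(200_000, 8_000, 4_000, 800, 3.00, 0.30, 15.00)
# No caching: all 12,000 input tokens are billed at the full input rate.
without_cache = rag_monthly_cost(200_000, 0, 12_000, 800, 3.00, 0.30, 15.00)
savings = 1 - with_cache / without_cache

print(round(with_cache), round(without_cache), f"{savings:.0%}")
```

Because output is billed the same either way, the 90% discount on cached input translates into a 45% saving on the whole bill here, not 90%.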

Common mistakes

Where this calculation usually goes wrong in the real world.

  • Pricing on input only. Output tokens drive 60–80% of typical chat costs.
  • Forgetting tool-calling round trips. Each tool call is a separate request with its own token billing.
  • Comparing models on identical prompts. Different models need different prompt structures — cost-per-task is the real metric.
  • Skipping prompt caching where available. 30–80% bill reduction on workloads with stable system prompts.
  • Modeling at peak token count. Most chat sessions are far below max context.
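The tool-calling mistake is easy to underestimate because each round usually resends the conversation so far. The sketch below assumes the full context is resent on every call, no caching, and a fixed number of tokens added per round. All numbers are hypothetical.

```python
def agent_input_tokens(base_prompt_tokens, tokens_added_per_round, tool_rounds):
    """Total billed input tokens when each call resends the growing context."""
    total = 0
    context = base_prompt_tokens
    for _ in range(tool_rounds + 1):  # tool rounds plus the final answer call
        total += context
        context += tokens_added_per_round  # tool call and result get appended
    return total


single_call = agent_input_tokens(1_200, 500, 0)
three_rounds = agent_input_tokens(1_200, 500, 3)
print(single_call, three_rounds, three_rounds / single_call)
```

Three tool rounds turn a 1,200-token prompt into 7,800 billed input tokens (1,200 + 1,700 + 2,200 + 2,700), so a per-request estimate based on the first prompt alone would be off by 6.5×.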

When to use this calculator

  • Estimating monthly bill before scaling a feature.
  • Comparing GPT-5/Claude/Gemini for a specific workload.
  • Modeling the savings from prompt caching or batching.
  • Setting per-user cost budgets for a freemium AI feature.

Glossary

Term

Token

Roughly ¾ of a word in English. Models bill by tokens, not characters or words.

Term

Prompt cache

Server-side reuse of repeated prompt prefixes. Cuts input billing by 50–90% on the cached portion.

Term

Context window

Maximum tokens the model can process in a single request (input + output combined).

More questions answered

Is GPT-5 or Claude cheaper per task?

Depends on the task. GPT-5 standard runs $1.25 input / $10 output per million tokens; Claude Sonnet 4 is $3 / $15. For pure cost, GPT-5 mini and Claude Haiku undercut both at <$0.50 / $3. Choose on quality-per-dollar for your specific workload — run 100 representative prompts against 3 candidate models and grade outputs before choosing on price alone.

When should I use Gemini Flash vs Pro?

Flash for high-volume, latency-sensitive workloads (chat, classification, simple extraction), where it is typically several times cheaper than Pro per token. Pro for tasks requiring strong reasoning, long-context analysis, or complex multimodal input. Many production systems route 80%+ of traffic to Flash and reserve Pro for the 10–20% of requests that fail Flash's quality bar.
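A rough way to model this routing is a blended cost per request. The sketch below uses stand-in per-request costs of $0.0014 for the cheap tier and $0.0055 for the premium tier. These are illustrative values, not Gemini Pro list prices. It also distinguishes routing up front from escalating only after the cheap model has already been tried.

```python
def blended_cost(cheap_cost, premium_cost, cheap_share, escalate=False):
    """Average cost per request when traffic is split across two models.

    escalate=False: a router sends each request to exactly one model.
    escalate=True: every request tries the cheap model first, and the
    remainder (1 - cheap_share) is retried on the premium model.
    """
    premium_share = 1 - cheap_share
    if escalate:
        return cheap_cost + premium_share * premium_cost
    return cheap_share * cheap_cost + premium_share * premium_cost


routed = blended_cost(0.0014, 0.0055, 0.80)
escalated = blended_cost(0.0014, 0.0055, 0.80, escalate=True)
print(f"routed: ${routed:.5f}  escalated: ${escalated:.5f}")
```

With these stand-in numbers both strategies stay well under the premium-only cost of $0.0055. Escalation costs slightly more than up-front routing because failed requests pay for both models.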

How much can prompt caching realistically save?

Workload-dependent: chatbots with stable system prompts see 50–70% cost reduction; RAG with reusable document context sees 40–60%; code agents with shared file context can hit 80%+. Anthropic and OpenAI both support prompt caching. Enabling it typically takes less than a day of engineering, which makes it the highest-ROI optimization for workloads above $1k/mo.

Methodology last reviewed: 2026-05 by the RevenueLab editorial team.

FAQ

How much does GPT-5 cost per request?

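At ~1,200 input / 400 output tokens (typical chat), GPT-5 costs about $0.0055 per request before caching: $0.0015 for input plus $0.004 for output. Caching only discounts the input side, so even with 70% of input tokens cached the total stays near $0.0045. GPT-5's input rate is roughly half of GPT-4o's.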

Is Claude or Gemini cheaper than GPT-5?

Gemini 2.5 Flash is the cheapest tier ($0.30 in / $2.50 out per 1M). Claude Sonnet 4 is the most expensive of the three ($3 / $15). GPT-5 sits in the middle ($1.25 / $10).

How do I lower my AI bill?

Three biggest levers: (1) enable prompt caching, (2) cap max_tokens hard, (3) use a cheaper model for routing/classification and the big model only for final output.

How this calculator is built

Independently maintained

Written by Sam Doshi and the RevenueLab editorial team. We don't sell the data feeds this tool is built on.

Sourced from primary data

Benchmarks come from public AdSense / Stripe / IRS disclosures and reader-submitted data, never third-party "$X per view" claims.

Last reviewed

June 2026. We re-check every figure on the platform on a rolling quarterly cycle.

Editorial standards

See our editorial policy and disclaimer. Results are estimates, not advice.