Where AI costs actually leak
Most teams underestimate output tokens by 3–5×. A 'short answer' from an unconstrained model averages 400–800 tokens. The fix is a hard system-prompt limit ('respond in <= 80 words') plus an aggressive max_tokens cap, not just switching to a cheaper model.
- Prompt caching cuts repeated system-prompt cost by 50–90%.
- Batch API endpoints are 50% cheaper for non-realtime work.
- Cheaper models often need 1.5× more retries, so net cost can end up higher.
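The word limit and the max_tokens cap should agree with each other. A minimal sketch of the conversion, assuming roughly 1.33 tokens per English word and a 25% slack factor (both rough assumptions; calibrate against your own tokenizer and traffic):

```python
def word_cap_to_max_tokens(words: int, tokens_per_word: float = 1.33, slack: float = 1.25) -> int:
    # tokens_per_word (~1.33 for English) and the 25% slack are rough
    # assumptions -- measure both against your own tokenizer and traffic.
    return round(words * tokens_per_word * slack)

# A 'respond in <= 80 words' system prompt pairs with roughly:
print(word_cap_to_max_tokens(80))  # 133
```

Setting the cap too close to the word limit truncates answers mid-sentence, which is why the slack factor exists.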
Pricing your AI feature
If a user runs ~100 requests/mo at $0.02 cost per request ($2/mo in model spend), you need to charge at least $7/mo to clear a 70% gross margin. Free tiers should hard-cap requests, not tokens: requests are easier to communicate and harder to abuse.
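The margin arithmetic above is just price = cost / (1 - margin). A minimal sketch using the figures from the example:

```python
def min_price(monthly_requests: int, cost_per_request: float, target_margin: float) -> float:
    # gross margin = (price - cost) / price  =>  price = cost / (1 - margin)
    monthly_cost = monthly_requests * cost_per_request
    return monthly_cost / (1 - target_margin)

# 100 requests/mo at $0.02 each, targeting a 70% gross margin:
print(round(min_price(100, 0.02, 0.70), 2))  # 6.67 -> charge $7+/mo
```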
Related guides
Long-form playbooks on the same topic, written by the RevenueLab editorial team.
FAQ
How much does GPT-5 cost per request?
At ~1200 input / 400 output tokens (typical chat), GPT-5 costs about $0.005 per request before caching, or $0.003 with 70% input caching. Roughly half the price of GPT-4o.
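The per-request figure follows directly from per-token rates. A minimal sketch, assuming illustrative list prices of $1.25 per 1M input tokens and $10 per 1M output tokens (assumptions; plug in current published pricing):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate_per_m: float, out_rate_per_m: float) -> float:
    # Rates are dollars per 1M tokens; result is dollars per request.
    return (input_tokens * in_rate_per_m + output_tokens * out_rate_per_m) / 1_000_000

# ~1200 input / 400 output tokens at the assumed rates:
print(round(request_cost(1200, 400, 1.25, 10.0), 4))  # 0.0055
```

Note that output tokens dominate here even though there are three times fewer of them, which is why capping max_tokens moves the bill more than trimming the prompt.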
Is Claude or Gemini cheaper than GPT-5?
Gemini 2.5 Flash is the cheapest of the three ($0.30 input / $2.50 output per 1M tokens). Claude Sonnet 4.5 is the most expensive ($3 / $15 per 1M). GPT-5 sits in the middle.
How do I lower my AI bill?
Three biggest levers: (1) enable prompt caching, (2) set a hard max_tokens cap, (3) use a cheaper model for routing/classification and reserve the big model for final output.