
AI / GPU Cloud Cost Calculator

Estimate monthly GPU cloud spend on AWS, GCP, Azure, Lambda, CoreWeave, RunPod, and Modal. Models per-GPU hourly rate, utilization, storage, egress, and on-demand vs reserved pricing.

Disclaimer: Cloud GPU pricing changes constantly and varies by region, contract terms, and discount tier. Use this calculator for budgeting and vendor comparison — confirm specific pricing in your provider's quote before committing.

Default inputs

  • GPU count: 8
  • Hourly price per GPU: $2.50. H100 on-demand ≈ $2.50–$4.50, A100 ≈ $1.50–$3, L40S ≈ $1–$1.80, RTX 4090 ≈ $0.40–$0.80.
  • Utilization: 70%. On-demand jobs rarely hit 100%. Reserved/committed-use often does.
  • Storage: 5 TB of datasets, checkpoints, and model weights.
  • Storage price: $0.08/GB/mo. NVMe ~$0.10/GB/mo, object storage ~$0.02/GB/mo.
  • Egress: 2 TB/mo.
  • Egress price: $0.09/GB. AWS/GCP/Azure egress ≈ $0.08–$0.12/GB. Cloudflare/some GPU clouds = $0.
  • Reserved discount: 0%. 1-yr commit typically saves 30–40%, 3-yr 50–65%.

Formula used

GPU cloud cost formula

GPUs are the headline number, but storage and egress regularly add 10–30% on top. Reserved or committed-use pricing reshapes the entire bill — model both before signing a multi-year contract.

Monthly = (GPUs × $/hr × 730 × Util%) × (1 − Reserved%) + Storage + Egress

  • H100 on-demand: $2.50–$4.50/hr
  • Reserved discount: 30–65%
  • Hyperscaler egress: $0.08–$0.12/GB

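A minimal Python sketch of the monthly formula, useful for sanity-checking the calculator's output or scripting vendor comparisons. It assumes 1 TB = 1,000 GB (the same conversion the worked examples use), flat per-GB storage and egress rates with no volume tiers, and a reserved discount that applies to compute only, as the formula shows. The function and parameter names loosely mirror the embed URL's query parameters but are otherwise hypothetical.

```python
# Sketch of: Monthly = (GPUs x $/hr x 730 x Util%) x (1 - Reserved%) + Storage + Egress
# Assumptions: 1 TB = 1,000 GB; flat per-GB storage and egress rates (no tiers).

HOURS_PER_MONTH = 730  # the formula's hour count (8,760 h / 12 months)
GB_PER_TB = 1_000      # assumption: decimal terabytes


def monthly_cost(
    gpu_count: int,
    hourly_per_gpu: float,
    utilization_pct: float,
    storage_tb: float,
    storage_price_per_gb: float,
    egress_tb: float,
    egress_price_per_gb: float,
    reserved_discount_pct: float,
) -> dict[str, float]:
    """Return each line item and the total, in dollars per month."""
    compute = (
        gpu_count * hourly_per_gpu * HOURS_PER_MONTH * (utilization_pct / 100)
    ) * (1 - reserved_discount_pct / 100)  # discount applies to compute only
    storage = storage_tb * GB_PER_TB * storage_price_per_gb
    egress = egress_tb * GB_PER_TB * egress_price_per_gb
    return {
        "compute": compute,
        "storage": storage,
        "egress": egress,
        "total": compute + storage + egress,
    }


if __name__ == "__main__":
    # The calculator's default inputs: 8 GPUs at $2.50/hr, 70% utilization,
    # 5 TB storage at $0.08/GB, 2 TB egress at $0.09/GB, no reserved discount.
    bill = monthly_cost(8, 2.50, 70, 5, 0.08, 2, 0.09, 0)
    for item, dollars in bill.items():
        print(f"{item:>8}: ${dollars:,.2f}")
```

With the defaults, compute dominates (about $10,220 of a roughly $10,800 monthly bill), which is why the reserved discount input moves the total far more than storage or egress at this scale.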
Backlink-friendly embed

Embed this calculator

Free to embed on any site. Inputs preserved, link back to RevenueLab. Each format trades polish for SEO juice.

<iframe src="https://revenuelab.fyi/embed/ai-gpu-cloud-cost-calculator?gpuCount=8&hourlyPerGpu=2.5&utilizationPct=70&storageTb=5&storagePricePerGb=0.08&egressTb=2&egressPricePerGb=0.09&reservedDiscountPct=0" width="100%" height="680" style="border:0;border-radius:12px;max-width:100%" loading="lazy" title="AI / GPU Cloud Cost Calculator"></iframe>
<p style="font:12px/1.4 system-ui;color:#666;margin:6px 0 0">Calculator by <a href="https://revenuelab.fyi/ai-gpu-cloud-cost-calculator?gpuCount=8&hourlyPerGpu=2.5&utilizationPct=70&storageTb=5&storagePricePerGb=0.08&egressTb=2&egressPricePerGb=0.09&reservedDiscountPct=0" target="_blank" rel="noopener">RevenueLab</a></p>

Easiest to install — passes referral traffic and a referring-domain signal.

Cite this calculator

Writing about this topic? Grab a citation — every link helps keep these tools free.

APA
RevenueLab. (2026). AI / GPU Cloud Cost Calculator. Retrieved from https://revenuelab.fyi/ai-gpu-cloud-cost-calculator
HTML
<p>Source: <a href="https://revenuelab.fyi/ai-gpu-cloud-cost-calculator" target="_blank" rel="noopener">AI / GPU Cloud Cost Calculator — RevenueLab</a> (2026).</p>
Markdown
Source: [AI / GPU Cloud Cost Calculator — RevenueLab](https://revenuelab.fyi/ai-gpu-cloud-cost-calculator) (2026).

On-demand vs reserved vs spot

On-demand is the most expensive way to buy GPU-hours. 1-year commits typically save 30–40%; 3-year saves 50–65%. Spot/preemptible can shave another 50–70% on top but get evicted mid-run — only viable for fault-tolerant training and batch inference. Most production inference clusters land on 1-year reserved for the steady-state baseline plus on-demand burst for traffic spikes.

  • On-demand: zero commitment, full hourly rate. Best for evaluation and bursty workloads.
  • Reserved: 30–65% discount in exchange for 1–3 year commitment. Best for steady inference.
  • Spot/preemptible: 50–70% additional discount, eviction risk. Best for distributed training with checkpointing.
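The "reserved baseline plus on-demand burst" pattern can be sketched as below. All inputs are illustrative placeholders (8 always-on GPUs, 1,000 burst GPU-hours, a hypothetical $4.00/hr list price, and a 35% one-year discount from the 30–40% range above); the sketch assumes reserved GPUs bill every hour of the month while burst capacity bills only for the hours it runs.

```python
# Hypothetical sketch: reserved baseline + on-demand burst vs. everything on-demand.

HOURS_PER_MONTH = 730


def blended_monthly_cost(
    baseline_gpus: int,
    burst_gpu_hours: float,
    on_demand_rate: float,
    reserved_discount_pct: float,
) -> float:
    """Reserved GPUs bill all 730 hours at the discounted rate; burst bills on-demand."""
    reserved_rate = on_demand_rate * (1 - reserved_discount_pct / 100)
    reserved = baseline_gpus * reserved_rate * HOURS_PER_MONTH
    burst = burst_gpu_hours * on_demand_rate
    return reserved + burst


def all_on_demand_cost(
    baseline_gpus: int, burst_gpu_hours: float, on_demand_rate: float
) -> float:
    """The same GPU-hours, all billed at the on-demand rate."""
    return (baseline_gpus * HOURS_PER_MONTH + burst_gpu_hours) * on_demand_rate


if __name__ == "__main__":
    blended = blended_monthly_cost(8, 1_000, 4.00, 35)
    on_demand = all_on_demand_cost(8, 1_000, 4.00)
    print(f"blended:   ${blended:,.0f}/mo")
    print(f"on-demand: ${on_demand:,.0f}/mo")
```

The discount only pays off on the baseline because those GPUs are busy around the clock; pushing the spiky portion into the reservation would mean paying for idle committed hours.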

Why neoclouds keep winning workloads

CoreWeave, Lambda Labs, Crusoe, RunPod, Modal, and Together typically charge 40–70% less per GPU-hour than AWS/GCP/Azure for the same hardware. They win on price-per-hour and lose on the managed-service surface area (no Bedrock, no Vertex, fewer compliance attestations). For pure GPU workloads, the math usually pencils in their favor; for tightly-integrated multi-service apps, the hyperscaler premium can still be worth it.

Rex's Notes

GPU pricing is the most opaque corner of cloud infrastructure. The same H100 costs $2.49/hr on Lambda, $4.20/hr on other neoclouds, $8.39/hr on AWS on-demand, and $3.50/hr on a 1-year AWS reserved commit, all for identical silicon. This calculator translates GPU type, utilization, and provider into a real monthly bill, then layers reserved discounts and egress so you can compare apples to apples before signing a $200k contract.
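To make the spread concrete, here is a short sketch that turns each hourly H100 rate quoted in this note into a per-GPU monthly bill at 730 hours. The rates are the note's figures, not live quotes; the dictionary labels are just descriptive keys.

```python
# Per-GPU monthly cost at the H100 rates quoted above (730-hour month, always on).

HOURS_PER_MONTH = 730

H100_RATES = {
    "Lambda": 2.49,
    "Neoclouds": 4.20,
    "AWS on-demand": 8.39,
    "AWS 1-yr reserved": 3.50,
}


def monthly_per_gpu(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of one GPU billed for every hour of the month."""
    return hourly_rate * hours


if __name__ == "__main__":
    cheapest = min(H100_RATES.values())
    for provider, rate in sorted(H100_RATES.items(), key=lambda kv: kv[1]):
        print(
            f"{provider:<18} ${monthly_per_gpu(rate):>9,.2f}/GPU-mo "
            f"({rate / cheapest:.2f}x cheapest)"
        )
```

Multiply any row by your GPU count to get the compute line before storage, egress, and discounts.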

What each input means

Get these inputs right and the output is reliable. Get them wrong and the calculator just multiplies bad assumptions.

GPU count

Active GPUs you need running concurrently at peak: training jobs × GPUs per job, plus the serving fleet.

Typical range: 1–8 for fine-tuning; 8–64 for serious training; 4–32 for production inference clusters.
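One reading of the GPU-count rule as code, assuming "replicas" means the GPUs each training job occupies and the serving fleet is sized as replicas × GPUs per replica. The function name and the example mix (two 8-GPU jobs, four 2-GPU serving replicas) are hypothetical.

```python
def peak_gpu_count(
    training_jobs: list[int], serving_replicas: int, gpus_per_replica: int
) -> int:
    """Concurrent GPUs at peak: each training job's GPUs plus the serving fleet."""
    return sum(training_jobs) + serving_replicas * gpus_per_replica


if __name__ == "__main__":
    # Hypothetical mix: two 8-GPU training jobs plus 4 serving replicas of 2 GPUs each.
    print(peak_gpu_count([8, 8], serving_replicas=4, gpus_per_replica=2))  # 24
```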

Hourly price per GPU

On-demand list price. Apply reserved or spot savings through the Reserved discount % input, not here. H100 is the modern reference point.

Typical range: H100: $1.99–$3 (Lambda/neoclouds), $3.50–$5 (1yr reserved hyperscaler), $7–$12 (on-demand hyperscaler). A100: 40–60% of H100. L40S: $1.10–$2.50.

Utilization %

Share of paid hours the GPU is actually crunching. Idle GPUs still bill at full rate.

Typical range: 30–60% for serving with bursty traffic; 70–95% for training; <20% means you bought too many.
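Because idle GPUs still bill at full rate, utilization acts as a multiplier on the real price of work done. A small sketch, assuming cost per useful GPU-hour = hourly rate ÷ utilization; the $2.49/hr rate is the neocloud H100 figure used in the first worked example, and the function name is hypothetical.

```python
def cost_per_useful_gpu_hour(hourly_rate: float, utilization_pct: float) -> float:
    """Spread the bill for every paid hour over only the hours the GPU is busy."""
    if not 0 < utilization_pct <= 100:
        raise ValueError("utilization_pct must be in (0, 100]")
    return hourly_rate / (utilization_pct / 100)


if __name__ == "__main__":
    for util in (20, 55, 90):
        price = cost_per_useful_gpu_hour(2.49, util)
        print(f"{util:>3}% util -> ${price:.2f} per useful GPU-hour")
```

At 20% utilization the effective price is five times the sticker rate, which is why sub-20% fleets usually signal over-provisioning.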

Reserved discount %

Discount from list price for 1–3yr commitments. Spot pricing can be deeper but with eviction risk.

Typical range: 1yr reserved: 30–45%. 3yr: 50–65%. Spot: 60–90% off but interruptions kill long training jobs.

Egress TB/month

Data leaving the cloud — model artifacts, dataset downloads, inference responses going to other clouds.

Typical range: 100GB–10TB for inference apps; 50–500TB during data prep for training.

Worked examples

Real scenarios with the math walked through line by line.

Example

Inference cluster, 8× H100 on a neocloud

Scenario: 8 H100s at $2.49/hr on Lambda or similar, 55% utilization, no reserved commit, 5TB egress.

Math: Compute = 8 × $2.49 × 730h = $14,542/mo. Egress = 5,000 × $0.08 ≈ $400. Total ≈ $15k/mo. Per useful GPU-hour (at 55% util) = $4.53.

Outcome: Solid floor for an inference cluster. Moving to AWS on-demand would push this to ~$50k/mo for the same silicon.
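The first example's arithmetic as a script. It follows the example's own approach: compute bills for all 730 hours, and the 55% utilization only feeds the per-useful-GPU-hour figure. The 1 TB = 1,000 GB conversion matches the example's 5,000 GB.

```python
HOURS_PER_MONTH = 730
GB_PER_TB = 1_000

gpus, rate, util_pct = 8, 2.49, 55   # 8x H100 at $2.49/hr, 55% busy
egress_tb, egress_per_gb = 5, 0.08   # 5 TB out at $0.08/GB

compute = gpus * rate * HOURS_PER_MONTH         # every hour bills, busy or idle
egress = egress_tb * GB_PER_TB * egress_per_gb  # 5,000 GB x $0.08
total = compute + egress
per_useful_hour = rate / (util_pct / 100)       # cost spread over busy hours only

print(f"compute ${compute:,.0f}, egress ${egress:,.0f}, total ${total:,.0f}/mo")
print(f"per useful GPU-hour ${per_useful_hour:.2f}")
```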

Example

Training run on hyperscaler with 1yr reserved

Scenario: 16 H100s at $4.50/hr (1yr reserved AWS), 88% utilization, 80TB egress for evals.

Math: Compute = 16 × $4.50 × 730h = $52,560/mo. Egress = 80,000 × $0.05 (committed tier) = $4,000. Total = $56,560/mo, or $678,720/yr (≈$679k).

Outcome: Acceptable if this is a production training pipeline running 70%+ of the year. If it's two big runs per year, switch to on-demand or spot — reservations punish part-time use.
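The second example as a script, using its reserved rate and committed-tier egress price. As in the example, the reserved GPUs bill every hour, so the 88% utilization does not reduce the bill.

```python
HOURS_PER_MONTH = 730
GB_PER_TB = 1_000

gpus, reserved_rate = 16, 4.50       # 1-yr reserved H100 rate from the example
egress_tb, egress_per_gb = 80, 0.05  # committed-tier egress from the example

compute = gpus * reserved_rate * HOURS_PER_MONTH  # reserved capacity bills all hours
egress = egress_tb * GB_PER_TB * egress_per_gb
monthly = compute + egress
annual = monthly * 12

print(f"monthly ${monthly:,.0f}, annual ${annual:,.0f}")
```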

Common mistakes

Where this calculation usually goes wrong in the real world.

  • Comparing GPUs on $/hour without normalizing for throughput. An H100 doing 2.5x the tokens/sec of an A100 at 1.6x the price is cheaper per useful unit of work.
  • Forgetting egress. AWS egress at $0.09/GB can add $9k/month to a 100TB pipeline — sometimes more than compute on training jobs.
  • Over-reserving. Reservation breakeven is usually 60–70% sustained utilization across the term. Bursty workloads lose money on long commits.
  • Counting on spot without checkpointing. A 72-hour training run on spot is likely to be evicted at least once; without checkpoints you restart from scratch.
  • Ignoring storage and networking line items. Premium IOPS for dataset access and InfiniBand for multi-node training often add 10–20% to compute.
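Three of these mistakes reduce to one-line arithmetic, sketched below with the list's own figures. The breakeven function assumes on-demand instances are shut down when idle (so you pay only for utilized hours) while a reservation bills every hour at the discounted rate; setting u × rate = (1 − d) × rate gives a breakeven utilization of 1 − d. The egress helper uses a flat rate, whereas real hyperscaler egress is tiered by volume. Function names are hypothetical.

```python
GB_PER_TB = 1_000  # assumption: decimal terabytes


def relative_cost_per_unit(price_ratio: float, throughput_ratio: float) -> float:
    """Cost per unit of work for GPU B vs. GPU A, given B's price and throughput as multiples of A's."""
    return price_ratio / throughput_ratio


def egress_cost(tb_per_month: float, price_per_gb: float) -> float:
    """Flat-rate egress estimate; tiered pricing will differ at high volumes."""
    return tb_per_month * GB_PER_TB * price_per_gb


def reservation_breakeven_util_pct(discount_pct: float) -> float:
    """Sustained utilization (%) at which a reservation costs the same as on-demand."""
    return 100 - discount_pct


if __name__ == "__main__":
    # H100 at 1.6x the price and 2.5x the throughput of an A100
    print(f"H100 cost per token vs A100: {relative_cost_per_unit(1.6, 2.5):.2f}x")
    print(f"100 TB egress at $0.09/GB: ${egress_cost(100, 0.09):,.0f}/mo")
    for d in (30, 40):
        print(f"{d}% discount breaks even at {reservation_breakeven_util_pct(d):.0f}% utilization")
```

A ratio below 1.0 from `relative_cost_per_unit` means the pricier GPU is cheaper per unit of work; the 30–40% one-year discounts map to the 60–70% breakeven range cited above.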

When to use this calculator

  • Choosing between hyperscaler reserved, neocloud on-demand, and bare metal for a new training pipeline.
  • Building a make-vs-buy case for self-hosted open-source models vs. hosted LLM APIs.
  • Negotiating committed-use discounts — model breakeven before the sales call.
  • Sizing a Series A budget request for the AI compute line item.
  • Comparing serving cost across H100, A100, L40S, and H200 for the same model.

Glossary

On-demand

Pay-as-you-go GPU pricing with no commitment. Highest hourly rate but full flexibility.

Reserved instance

1–3 year commitment to a specific GPU type for 30–65% off on-demand. Charged whether you use it or not.

Spot / preemptible

Deeply discounted GPUs (60–90% off) that the provider can reclaim with 30s–2min notice. Great for fault-tolerant batch work.

Neocloud

GPU-specialist providers (Lambda, CoreWeave, Crusoe, Together) that undercut hyperscalers by 40–70% on H100/H200 hourly rates.

Egress

Data transferred out of the cloud. Charged per GB; varies 5–20x across providers and destinations.

More questions answered

Are neoclouds actually safe to run production on?

For inference and training, yes: Lambda, CoreWeave, Crusoe, and Together AI serve production workloads for major labs. The real tradeoffs are fewer managed services (nothing as broad as S3, IAM, or RDS), thinner regional coverage, and less mature support SLAs. Hybrid setups (compute on neocloud, storage/auth on AWS) are common.

How do I decide between H100 and A100 for serving?

Math, not vibes. For most LLMs in the 7B–70B range, H100 delivers ~2–2.5x the throughput of A100 at ~1.5–1.8x the price — net win per token served. For older models with smaller batch sizes, A100 sometimes ties on $/token. Benchmark your actual model with vLLM or TensorRT-LLM before committing.

What about TPUs and AMD MI300X?

TPUs (v5p, Trillium) on GCP can beat H100 on $/token for certain workloads if your stack supports JAX or PyTorch/XLA. MI300X has more memory than H100 (192GB vs 80GB), which simplifies serving very large models on a single device. Both options require real engineering investment — don't switch for a 15% paper saving.

How much should AI infra be of total cloud spend at scale?

At AI-native startups post-product-market-fit, GPU compute typically runs 40–70% of total cloud spend. Below 30% suggests you're not actually GPU-bound; above 80% means you should be aggressively pursuing reserved capacity or self-hosting.

Methodology last reviewed: 2026-05 by the RevenueLab editorial team.

FAQ

How much does an H100 actually cost per hour?

On-demand: $4–$5/hr on AWS/GCP/Azure, $2.50–$3.50/hr on CoreWeave/Lambda/RunPod, $1.90–$2.50/hr on lesser-known neoclouds. Reserved 1-year drops hyperscaler pricing to ~$2.50–$3/hr; 3-year to ~$1.80–$2.20/hr.

Should I train on the cloud or buy GPUs?

Buying becomes cheaper than cloud reserved pricing at roughly 18–30 months of continuous utilization, depending on GPU and electricity costs. Most teams stay on cloud because they need elasticity, don't want to manage hardware, and lack the colo footprint.
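A rough buy-vs-rent breakeven can be sketched as up-front capex divided by the monthly savings of owning (cloud bill minus the running cost of owned hardware). The $30,000-per-GPU purchase price and $300/GPU-month power-and-colo cost below are placeholder assumptions, not quotes; the $2.50/hr cloud rate is the low end of the 1-year reserved hyperscaler range in the FAQ above. The model ignores resale value, financing, depreciation, and staff time.

```python
import math

HOURS_PER_MONTH = 730


def buy_breakeven_months(
    capex_per_gpu: float, opex_per_gpu_month: float, cloud_hourly: float
) -> float:
    """Months of continuous use before owning a GPU beats renting one.

    Renting costs cloud_hourly x 730 per month; owning costs capex up front plus
    monthly opex (power, colo, ops). Breakeven = capex / (cloud monthly - opex).
    """
    monthly_savings = cloud_hourly * HOURS_PER_MONTH - opex_per_gpu_month
    if monthly_savings <= 0:
        return math.inf  # owning never catches up
    return capex_per_gpu / monthly_savings


if __name__ == "__main__":
    # Placeholder inputs: $30,000 per GPU, $300/GPU-month to run it, $2.50/hr reserved cloud.
    months = buy_breakeven_months(30_000, 300, 2.50)
    print(f"breakeven after ~{months:.1f} months")
```

Lower utilization stretches the breakeven proportionally, since owned hardware costs the same whether or not it is busy.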

How big is egress on real workloads?

On a hyperscaler, 10 TB/month of egress at $0.09/GB = $900/month. Cross-region data transfer (e.g., training in us-east-1, serving in eu-west-1) can dwarf the GPU bill on data-heavy pipelines. Cloudflare R2, Backblaze B2, and several neoclouds offer free or near-free egress as a deliberate counter to hyperscaler pricing.

What's a healthy GPU utilization target?

Inference: 60–80% steady-state utilization is excellent; under 30% means you're over-provisioned. Training: 90%+ wall-clock during the run, with auto-shutdown when the job completes. Always tag instances and set hard auto-stop policies — orphaned GPUs are the #1 cause of mystery cloud bills.