LLM Cost Calculator

Free tool · 4 min read · Updated June 13, 2026

This free LLM cost calculator estimates what you will pay to run a large language model in production. Pick two models, enter your input and output tokens (or words, which it converts for you) and your monthly call volume, then toggle prompt caching and batch processing to see the savings. It shows per-call and monthly cost for both models side by side using current 2026 list prices, so you can size a budget or pick the cheapest model for the job before you write a line of code. Everything runs in your browser: no sign-up, no API key, nothing leaves your device.

The tool

First model
Rate per million: $5.00 in / $25.00 out
Cost per call: $0.0225
Cost per month: $22.50
Input cost per call: $0.01
Output cost per call: $0.0125

Second model (cheaper)
Rate per million: $3.00 in / $15.00 out
Cost per call: $0.0135
Cost per month: $13.50
Input cost per call: $0.006
Output cost per call: $0.0075

Estimates use list prices as of 2026 and may not reflect current rates. Your real bill depends on your exact usage. This is a planning estimate, not a live quote.

About this tool

How LLM pricing actually works

Almost every LLM API charges per token, not per request, and quotes the rate per million tokens. A token is roughly four characters of English text, so 1,000 tokens is about 750 words. Your bill for one call is the number of input tokens you send times the input rate, plus the number of output tokens the model generates times the output rate, with each per-million rate divided by 1,000,000 to match your actual token counts. The calculator above does that division for you and multiplies the result by your monthly call volume to project a monthly cost.
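If you want to check the arithmetic by hand, here is a minimal Python sketch of the per-call and monthly formula described above. The function names are hypothetical, and the inputs (2,000 input tokens, 500 output tokens, 1,000 calls per month) are illustrative assumptions, not values taken from the calculator's code.

```python
# Rule of thumb from the text: about 750 words per 1,000 tokens.
TOKENS_PER_WORD = 1000 / 750


def words_to_tokens(words: float) -> int:
    """Rough word-to-token conversion for English text (an approximation)."""
    return round(words * TOKENS_PER_WORD)


def cost_per_call(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost of one call in USD, given rates quoted per million tokens."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


def monthly_cost(per_call: float, calls_per_month: int) -> float:
    """Project the per-call cost over a month of traffic."""
    return per_call * calls_per_month


# Illustrative inputs (assumed): 2,000 tokens in, 500 tokens out,
# at USD 5 in / USD 25 out per million, 1,000 calls per month.
per_call = cost_per_call(2_000, 500, 5.00, 25.00)
print(f"${per_call:.4f} per call, ${monthly_cost(per_call, 1_000):.2f} per month")
```

With these assumed inputs the sketch works out to $0.0225 per call and $22.50 per month. Swap in your own token counts and rates to size a different workload.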

Why input and output are priced differently

Output tokens almost always cost several times more than input tokens, because generating text is more compute-intensive than reading it. On the Claude models in 2026, for example, output is five times the input rate (about USD 5 input and USD 25 output per million tokens for Opus, about USD 3 and USD 15 for Sonnet, and about USD 1 and USD 5 for Haiku). This is why a chatbot that returns long answers can cost far more than one that returns short ones, and why trimming output length is often the biggest single lever on your bill. The calculator splits the two sides so you can see exactly where the money goes.
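To see how much of a call's cost comes from the output side, a rough sketch like the one below can help. It uses the approximate per-million rates quoted above; the `RATES_PER_MILLION` table, the `cost_breakdown` helper, and the token counts are hypothetical choices made for illustration.

```python
# Approximate list prices from the text, in USD per million tokens:
# (input rate, output rate). Treat these as assumptions, not live quotes.
RATES_PER_MILLION = {
    "Opus": (5.00, 25.00),
    "Sonnet": (3.00, 15.00),
    "Haiku": (1.00, 5.00),
}


def cost_breakdown(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Split one call's cost into input and output parts, in USD."""
    input_rate, output_rate = RATES_PER_MILLION[model]
    input_cost = input_tokens * input_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000
    total = input_cost + output_cost
    return {
        "input": input_cost,
        "output": output_cost,
        "total": total,
        # Fraction of the bill that comes from generated tokens.
        "output_share": output_cost / total if total else 0.0,
    }


# Compare a 500-token answer with a trimmed 250-token answer
# for the same 2,000-token prompt (illustrative numbers).
for output_tokens in (500, 250):
    b = cost_breakdown("Sonnet", 2_000, output_tokens)
    print(f"{output_tokens} output tokens: total ${b['total']:.5f}, "
          f"output share {b['output_share']:.0%}")
```

In this example the 500 output tokens are a quarter of the prompt's length yet account for more than half of the call's cost, which is why halving the answer length moves the total so noticeably.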

Prompt caching and batch discounts

Two features can cut your cost dramatically, and the toggles above model both. Prompt caching reuses a large, unchanging prefix (a system prompt, tool definitions, or retrieved documents) so that repeated input is billed at roughly a tenth of the normal input rate; it only affects the input side, which is why the calculator discounts input alone. Batch processing runs non-urgent jobs asynchronously for about half price on both input and output, in exchange for a slower, best-effort turnaround. If your workload reuses context or can tolerate latency, these two settings often matter more than which model you pick.
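The two discounts can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the calculator's actual code: the multipliers are the approximate figures from the paragraph above, `cached_fraction` is a hypothetical parameter for the share of input tokens served from cache, any extra charge for writing to the cache is ignored, and the batch discount is assumed to stack with caching.

```python
CACHE_READ_MULTIPLIER = 0.10  # cached input billed at roughly a tenth (assumed)
BATCH_MULTIPLIER = 0.50       # batch jobs billed at about half price (assumed)


def discounted_cost_per_call(input_tokens: int, output_tokens: int,
                             input_rate_per_m: float, output_rate_per_m: float,
                             cached_fraction: float = 0.0,
                             batch: bool = False) -> float:
    """Per-call cost in USD with optional caching and batch discounts."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")

    # Caching only touches the input side: the cached part is discounted,
    # the fresh part is billed at the normal input rate.
    cached_tokens = input_tokens * cached_fraction
    fresh_tokens = input_tokens - cached_tokens
    input_cost = ((fresh_tokens + cached_tokens * CACHE_READ_MULTIPLIER)
                  * input_rate_per_m / 1_000_000)
    output_cost = output_tokens * output_rate_per_m / 1_000_000

    total = input_cost + output_cost
    if batch:
        # Batch pricing applies to both input and output.
        total *= BATCH_MULTIPLIER
    return total


# Illustrative call: 2,000 tokens in (75% cacheable), 500 out, USD 3 / 15 rates.
base = discounted_cost_per_call(2_000, 500, 3.00, 15.00)
cached = discounted_cost_per_call(2_000, 500, 3.00, 15.00, cached_fraction=0.75)
both = discounted_cost_per_call(2_000, 500, 3.00, 15.00,
                                cached_fraction=0.75, batch=True)
print(f"base ${base:.6f}, cached ${cached:.6f}, cached + batch ${both:.6f}")
```

Providers differ on cache-write pricing and on whether the two discounts combine, so check the current pricing page for your provider before relying on the stacked figure.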

Picking the cheapest model that still works

The cheapest model is not always the smartest choice: a weaker model that needs three retries can cost more than a stronger one that gets it right the first time. The honest approach is to start with the smallest model that can do the task reliably, measure its real token usage, and only move up when quality clearly demands it. Use this calculator alongside our Opus vs Sonnet vs Haiku comparison and our Choosing an AI Model article to match the model to the task, then estimate the bill before you commit. For high-volume work, combine a capable lead model with a cheaper one for narrow side tasks.
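The retry point is easy to check with a back-of-the-envelope sketch. The example below assumes every retry resends the same prompt and produces an answer of similar length, which real workloads may not match. The helper names and token counts are hypothetical, and the rates are the approximate figures quoted earlier on this page.

```python
def cost_per_attempt(input_tokens: int, output_tokens: int,
                     input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost of a single attempt in USD, with rates quoted per million tokens."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


def cost_per_task(per_attempt: float, retries: int) -> float:
    """Total cost of one finished task: the first attempt plus each retry."""
    if retries < 0:
        raise ValueError("retries cannot be negative")
    return per_attempt * (1 + retries)


# Illustrative task: 2,000 tokens in, 500 tokens out per attempt.
# Smaller model at about USD 1 / 5 per million, needing three retries.
small = cost_per_task(cost_per_attempt(2_000, 500, 1.00, 5.00), retries=3)
# Larger model at about USD 3 / 15 per million, right the first time.
large = cost_per_task(cost_per_attempt(2_000, 500, 3.00, 15.00), retries=0)

print(f"small model with retries: ${small:.4f} per task")
print(f"large model, first try:   ${large:.4f} per task")
```

Under these assumptions the smaller model ends up costing more per finished task, before counting the extra latency of the retries. Measure your own retry rate before drawing conclusions for a real workload.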

Next step

Ready to put AI to work as a real workflow?

Start with the foundations course, keep your progress locally, and sync everything to your free account whenever you like.