How is LLM API cost calculated?

Cost is per token, quoted per million tokens. For one call you pay your input tokens times the input rate plus your output tokens times the output rate. Multiply by your monthly call volume for a monthly estimate. This calculator does that math for you using current 2026 list prices.

Why does output cost more than input?

Generating text is more compute-intensive than reading it, so output tokens are priced higher, often several times the input rate. On the Claude models in 2026, output is about five times the input rate. Shortening responses is usually the biggest single way to lower a bill.

How much do prompt caching and batch processing save?

Prompt caching bills repeated input at roughly a tenth of the normal input rate, so it helps most when a large prefix like a system prompt or documents is reused. Batch processing runs non-urgent jobs at about half price on both input and output in exchange for slower turnaround. Toggle both above to see your savings.

Are these LLM prices accurate?

The figures are list prices as of 2026 and are kept consistent with our model comparisons, but provider pricing changes over time and your real bill depends on your exact usage. Treat the result as a planning estimate, not a live quote, and confirm current rates on the provider pricing page before you commit.

Is this LLM cost calculator free and private?

Yes. It is completely free, needs no sign-up and no API key, and runs entirely in your browser. Nothing you type is sent to a server, so your numbers stay private.

Free LLM API Cost Calculator (2026)

The tool

Model A

Cost per call: $0.0225
Cost per month: $22.50
Input cost per call: $0.01
Output cost per call: $0.0125
Rate per million: $5.00 in / $25.00 out

Model B (cheaper)

Cost per call: $0.0135
Cost per month: $13.50
Input cost per call: $0.006
Output cost per call: $0.0075
Rate per million: $3.00 in / $15.00 out

Estimates use list prices as of 2026 and may not reflect current rates. Your real bill depends on your exact usage. This is a planning estimate, not a live quote.

How LLM pricing actually works

Almost every LLM API charges per token, not per request, and quotes the rate per million tokens. A token is roughly four characters of English text, so 1,000 tokens is about 750 words. Your bill for one call is the input tokens you send times the input rate plus the output tokens the model generates times the output rate, with each per-million rate divided by 1,000,000 to get a per-token price. The calculator above does that division for you and multiplies by your monthly call volume to project a monthly cost.
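As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name and the workload (2,000 input tokens, 500 output tokens, 1,000 calls per month) are assumptions chosen because they reproduce the Model A figures shown in the tool; this is not the calculator's actual source code.

```python
TOKENS_PER_MILLION = 1_000_000


def cost_per_call(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Return (input_cost, output_cost, total) in USD for a single call.

    Rates are quoted per million tokens, so each side is scaled down
    to the actual token count.
    """
    input_cost = input_tokens * input_rate_per_m / TOKENS_PER_MILLION
    output_cost = output_tokens * output_rate_per_m / TOKENS_PER_MILLION
    return input_cost, output_cost, input_cost + output_cost


# Assumed workload: 2,000 tokens in, 500 tokens out, 1,000 calls per month,
# priced at $5.00 in / $25.00 out per million tokens.
input_cost, output_cost, per_call = cost_per_call(2_000, 500, 5.00, 25.00)
monthly = per_call * 1_000

print(f"Input cost per call:  ${input_cost:.4f}")
print(f"Output cost per call: ${output_cost:.4f}")
print(f"Cost per call:        ${per_call:.4f}")
print(f"Cost per month:       ${monthly:.2f}")
```

Swapping in $3.00 and $15.00 for the two rates gives the Model B figures under the same assumed workload.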

Why input and output are priced differently

Output tokens almost always cost several times more than input tokens, because generating text is more compute-intensive than reading it. On the Claude models in 2026, for example, output is five times the input rate (about USD 5 input and USD 25 output per million tokens for Opus, about USD 3 and USD 15 for Sonnet, and about USD 1 and USD 5 for Haiku). This is why a chatbot that returns long answers can cost far more than one that returns short ones, and why trimming output length is often the biggest single lever on your bill. The calculator splits the two sides so you can see exactly where the money goes.
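To make the effect of answer length concrete, the sketch below compares a short and a long reply to the same prompt at the approximate Sonnet rates quoted above. The token counts are hypothetical and the helper name is an assumption for illustration.

```python
def split_cost(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Return (input_cost, output_cost) in USD for one call."""
    input_cost = input_tokens * input_rate_per_m / 1_000_000
    output_cost = output_tokens * output_rate_per_m / 1_000_000
    return input_cost, output_cost


# Approximate rates from the text: USD 3 in / USD 15 out per million tokens.
IN_RATE, OUT_RATE = 3.00, 15.00

# Hypothetical workload: same 2,000-token prompt, short vs long answer.
for label, output_tokens in [("short answer", 300), ("long answer", 1_500)]:
    input_cost, output_cost = split_cost(2_000, output_tokens, IN_RATE, OUT_RATE)
    total = input_cost + output_cost
    share = output_cost / total
    print(f"{label}: ${total:.4f} per call, output is {share:.0%} of the cost")

# short answer: $0.0105 per call, output is 43% of the cost
# long answer: $0.0285 per call, output is 79% of the cost
```

In this example the prompt is identical in both cases, yet the longer reply nearly triples the per-call cost because every extra output token is billed at five times the input rate.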

Prompt caching and batch discounts

Two features can cut your cost dramatically, and the toggles above model both. Prompt caching reuses a large, unchanging prefix (a system prompt, tool definitions, or retrieved documents) so that repeated input is billed at roughly a tenth of the normal input rate; it only affects the input side, which is why the calculator discounts input alone. Batch processing runs non-urgent jobs asynchronously for about half price on both input and output, in exchange for a slower, best-effort turnaround. If your workload reuses context or can tolerate latency, these two settings often matter more than which model you pick.
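One way to model the two discounts is sketched below. It rests on several assumptions that should be checked against your provider: cached input is billed at exactly 0.10 times the input rate, batch pricing is exactly 0.50 times the total, the two discounts stack, and any extra charge for writing a prefix to the cache the first time is ignored. The function and parameter names are hypothetical.

```python
CACHE_READ_MULTIPLIER = 0.10  # assumption: "roughly a tenth" of the input rate
BATCH_MULTIPLIER = 0.50       # assumption: "about half price" on both sides


def discounted_call_cost(input_tokens, output_tokens,
                         input_rate_per_m, output_rate_per_m,
                         cached_input_tokens=0, batch=False):
    """Estimate the USD cost of one call with optional caching and batch pricing."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")

    # Caching only discounts the reused part of the input; output is unaffected.
    fresh_tokens = input_tokens - cached_input_tokens
    billable_input = fresh_tokens + cached_input_tokens * CACHE_READ_MULTIPLIER
    input_cost = billable_input * input_rate_per_m / 1_000_000
    output_cost = output_tokens * output_rate_per_m / 1_000_000

    total = input_cost + output_cost
    if batch:
        # Batch pricing applies to both input and output.
        total *= BATCH_MULTIPLIER
    return total


# Hypothetical call: 2,000 input tokens (1,500 of them a reused prefix),
# 500 output tokens, at $5.00 in / $25.00 out per million tokens.
baseline = discounted_call_cost(2_000, 500, 5.00, 25.00)
cached = discounted_call_cost(2_000, 500, 5.00, 25.00, cached_input_tokens=1_500)
both = discounted_call_cost(2_000, 500, 5.00, 25.00, cached_input_tokens=1_500, batch=True)

print(f"baseline: ${baseline:.6f}")
print(f"cached:   ${cached:.6f}")
print(f"cached + batch: ${both:.6f}")
```

Because caching touches only the input side, its benefit shrinks as output comes to dominate the bill; the batch multiplier, by contrast, scales the whole call.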

Picking the cheapest model that still works

The cheapest model is not always the smartest choice: a weaker model that needs three retries can cost more than a stronger one that gets it right the first time. The honest approach is to start with the smallest model that can do the task reliably, measure its real token usage, and only move up when quality clearly demands it. Use this calculator alongside our Opus vs Sonnet vs Haiku comparison and our Choosing an AI Model article to match the model to the task, then estimate the bill before you commit. For high-volume work, combine a capable lead model with a cheaper one for narrow side tasks.
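The retry point can be checked with simple arithmetic. The sketch below assumes every retry resends the same prompt and produces a similar-length answer, and it uses the approximate Haiku and Sonnet rates mentioned earlier; the token counts and retry counts are hypothetical, so measure your own before drawing conclusions.

```python
def attempt_cost(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """USD cost of a single attempt at the task."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


def task_cost(cost_per_attempt, retries):
    """Total cost when the first attempt is followed by `retries` more attempts."""
    return cost_per_attempt * (1 + retries)


# Hypothetical task: 2,000 tokens in, 500 tokens out per attempt.
weak_attempt = attempt_cost(2_000, 500, 1.00, 5.00)     # approx. Haiku rates
strong_attempt = attempt_cost(2_000, 500, 3.00, 15.00)  # approx. Sonnet rates

weak_total = task_cost(weak_attempt, retries=3)      # needs three retries
strong_total = task_cost(strong_attempt, retries=0)  # right the first time

print(f"weaker model, 3 retries:   ${weak_total:.4f}")
print(f"stronger model, 0 retries: ${strong_total:.4f}")
```

Under these assumptions the cheaper model ends up costing more per completed task, which is why measured retry rates belong in the estimate alongside list prices.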
