---
title: "LLM Cost Calculator"
description: "Free LLM API cost calculator. Estimate per-call and monthly cost for Claude, GPT and Gemini, compare two models, and model prompt caching and batch savings."
type: "tool"
locale: "en"
category: "Free tool"
canonical: "https://agenticschool.dev/tools/llm-cost-calculator"
dateModified: "2026-06-13"
---

# LLM Cost Calculator

- Category: Free tool
- Updated: 2026-06-13
- Keywords: llm cost calculator, claude api cost calculator, token cost calculator, gpt api pricing calculator, llm pricing
- Canonical URL: https://agenticschool.dev/tools/llm-cost-calculator
- Locale: en

> Free LLM API cost calculator. Estimate per-call and monthly cost for Claude, GPT and Gemini, compare two models, and model prompt caching and batch savings.

This free LLM cost calculator estimates what you will pay to run a large language model in production. Pick two models, enter your input and output tokens (or words, which it converts for you) and your monthly call volume, then toggle prompt caching and batch processing to see the savings. It shows per-call and monthly cost for both models side by side using current 2026 list prices, so you can size a budget or pick the cheapest model for the job before you write a line of code. Everything runs in your browser: no sign-up, no API key, nothing leaves your device.

## About this tool

### How LLM pricing actually works

Almost every LLM API charges per token, not per request, and quotes the rate per million tokens. A token is roughly four characters of English text, so 1,000 tokens is about 750 words. Your bill for one call is the input tokens you send times the input rate, plus the output tokens the model generates times the output rate, with each rate divided by one million to turn it into a per-token price. The calculator above does that division for you and multiplies the per-call figure by your monthly call volume to project a monthly cost.
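The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's own code: the function names are hypothetical, and the rates in the example (USD 3 input and USD 15 output per million tokens) are placeholders you would swap for your provider's current prices.

```python
def words_to_tokens(words: int) -> int:
    """Rough conversion using the rule of thumb that 1,000 tokens is about 750 words."""
    return round(words * 1000 / 750)


def cost_per_call(input_tokens: float, output_tokens: float,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost of one call in USD, given rates quoted per million tokens."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


def monthly_cost(per_call: float, calls_per_month: int) -> float:
    """Project a monthly bill from a per-call cost and a call volume."""
    return per_call * calls_per_month


# Example: a 1,500-word prompt, a 500-token answer, 100,000 calls a month.
prompt_tokens = words_to_tokens(1500)  # 2,000 tokens
per_call = cost_per_call(prompt_tokens, 500, 3.0, 15.0)
print(f"per call: ${per_call:.4f}")                           # per call: $0.0135
print(f"monthly:  ${monthly_cost(per_call, 100_000):,.2f}")   # monthly:  $1,350.00
```

Note how small the per-call figure looks next to the monthly one: at high volume, fractions of a cent per call become the number that matters.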

### Why input and output are priced differently

Output tokens almost always cost several times more than input tokens, because generating text is more compute-intensive than reading it. On the Claude models in 2026, for example, output is five times the input rate (about USD 5 input and USD 25 output per million tokens for Opus, about USD 3 and USD 15 for Sonnet, and about USD 1 and USD 5 for Haiku). This is why a chatbot that returns long answers can cost far more than one that returns short ones, and why trimming output length is often the biggest single lever on your bill. The calculator splits the two sides so you can see exactly where the money goes.
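To see how the input and output sides divide a bill, here is a small sketch using the approximate Claude rates quoted above. The `RATES` table and the `cost_split` helper are illustrative names, not part of any provider SDK, and the prices are the rounded figures from this page rather than a live quote.

```python
# Approximate USD rates per million tokens (input, output), as quoted above.
RATES = {
    "opus": (5.0, 25.0),
    "sonnet": (3.0, 15.0),
    "haiku": (1.0, 5.0),
}


def cost_split(model: str, input_tokens: int, output_tokens: int):
    """Return (input_cost, output_cost, output_share) for one call in USD."""
    input_rate, output_rate = RATES[model]
    input_cost = input_tokens * input_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000
    total = input_cost + output_cost
    output_share = output_cost / total if total else 0.0
    return input_cost, output_cost, output_share


# Equal token counts on both sides: output still dominates the bill.
inp, out, share = cost_split("sonnet", 1000, 1000)
print(f"total ${inp + out:.4f}, output share {share:.1%}")  # total $0.0180, output share 83.3%

# Halving the answer length removes most of the savings available on this call.
inp, out, share = cost_split("sonnet", 1000, 500)
print(f"total ${inp + out:.4f}, output share {share:.1%}")  # total $0.0105, output share 71.4%
```

With a 5x output multiplier, a call that sends and receives the same number of tokens spends about five sixths of its cost on output, which is why response length is worth tuning first.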

### Prompt caching and batch discounts

Two features can cut your cost dramatically, and the toggles above model both. Prompt caching reuses a large, unchanging prefix (a system prompt, tool definitions, or retrieved documents) so that repeated input is billed at roughly a tenth of the normal input rate; it only affects the input side, which is why the calculator discounts input alone. Batch processing runs non-urgent jobs asynchronously for about half price on both input and output, in exchange for a slower, best-effort turnaround. If your workload reuses context or can tolerate latency, these two settings often matter more than which model you pick.
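The two discounts can be modeled roughly as follows. This sketch makes several simplifying assumptions that are not taken from any provider's documentation: cached input is billed at exactly 0.1x the input rate, batch is a flat 0.5x on the whole call, the two discounts multiply when combined, and any extra charge a provider may apply when the cache is first written is ignored. The `cached_fraction` parameter (the share of input tokens served from cache) is a hypothetical knob for illustration.

```python
CACHE_READ_MULTIPLIER = 0.1  # "roughly a tenth" of the normal input rate
BATCH_MULTIPLIER = 0.5       # "about half price" on both input and output


def discounted_cost(input_tokens: float, output_tokens: float,
                    input_rate_per_m: float, output_rate_per_m: float,
                    cached_fraction: float = 0.0, batch: bool = False) -> float:
    """Per-call cost in USD with optional caching and batch discounts applied."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")

    # Caching only touches the input side: cached tokens are billed at a reduced rate.
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_cost = (fresh + cached * CACHE_READ_MULTIPLIER) * input_rate_per_m / 1_000_000
    output_cost = output_tokens * output_rate_per_m / 1_000_000

    total = input_cost + output_cost
    # Batch applies to both sides (assumed here to stack with caching).
    return total * BATCH_MULTIPLIER if batch else total


# A 10,000-token prompt where 90% is a reusable prefix, plus a 500-token answer.
base = discounted_cost(10_000, 500, 3.0, 15.0)
cached = discounted_cost(10_000, 500, 3.0, 15.0, cached_fraction=0.9)
both = discounted_cost(10_000, 500, 3.0, 15.0, cached_fraction=0.9, batch=True)
print(f"${base:.4f} -> ${cached:.4f} -> ${both:.4f}")  # $0.0375 -> $0.0132 -> $0.0066
```

In this example the same call drops to under a fifth of its list cost without changing models, which matches the point above about these settings often mattering more than model choice.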

### Picking the cheapest model that still works

The cheapest model is not always the smartest choice: a weaker model that needs three retries can cost more than a stronger one that gets it right the first time. The honest approach is to start with the smallest model that can do the task reliably, measure its real token usage, and only move up when quality clearly demands it. Use this calculator alongside our Opus vs Sonnet vs Haiku comparison and our Choosing an AI Model article to match the model to the task, then estimate the bill before you commit. For high-volume work, combine a capable lead model with a cheaper one for narrow side tasks.
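The retry point can be checked with simple arithmetic. The sketch below assumes every attempt re-sends the full prompt and generates a full-length answer, and it uses the approximate Haiku and Sonnet rates quoted earlier on this page; `cost_per_task` is a hypothetical helper, and real retry behavior will vary by workload.

```python
def cost_per_task(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float,
                  attempts: int = 1) -> float:
    """Cost in USD to finish one task when it takes `attempts` full calls."""
    per_attempt = (input_tokens * input_rate_per_m
                   + output_tokens * output_rate_per_m) / 1_000_000
    return per_attempt * attempts


# Same task: 1,000 input tokens, 500 output tokens.
# Cheaper model needs the first try plus three retries; stronger model succeeds once.
weaker = cost_per_task(1000, 500, 1.0, 5.0, attempts=4)
stronger = cost_per_task(1000, 500, 3.0, 15.0, attempts=1)
print(f"weaker x4: ${weaker:.4f}, stronger x1: ${stronger:.4f}")
# weaker x4: $0.0140, stronger x1: $0.0105
```

Measuring the real attempt count on your own workload is what makes this comparison meaningful; the list price alone does not tell you which model is cheaper per finished task.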

## FAQ

### How is LLM API cost calculated?

Cost is per token, quoted per million tokens. For one call you pay your input tokens times the input rate plus your output tokens times the output rate. Multiply by your monthly call volume for a monthly estimate. This calculator does that math for you using current 2026 list prices.

### Why does output cost more than input?

Generating text is more compute-intensive than reading it, so output tokens are priced higher, often at several times the input rate. On the Claude models in 2026, output is about five times the input rate. Shortening responses is often the single most effective way to lower a bill.

### How much do prompt caching and batch processing save?

Prompt caching bills repeated input at roughly a tenth of the normal input rate, so it helps most when a large prefix like a system prompt or documents is reused. Batch processing runs non-urgent jobs at about half price on both sides in exchange for slower turnaround. Toggle both above to see your savings.

### Are these LLM prices accurate?

The figures are list prices as of 2026 and are kept consistent with our model comparisons, but provider pricing changes over time and your real bill depends on your exact usage. Treat the result as a planning estimate, not a live quote, and confirm current rates on the provider pricing page before you commit.

### Is this LLM cost calculator free and private?

Yes. It is completely free, needs no sign-up and no API key, and runs entirely in your browser. Nothing you type is sent to a server, so your numbers stay private.
