How much does prompt caching save?

On the Claude API a cache read costs roughly a tenth of normal input price, about a 90 percent discount on the cached portion. Writing the cache costs a small premium (around 1.25x input for a 5 minute lifetime), which the reads quickly repay.

When should I use prompt caching?

Use it when many requests share a large, identical prefix, such as a long system prompt, fixed tool definitions, a reference document, or the stable early turns of a conversation. It does not help unique, one-off prompts.

Why did my cache not hit?

Usually because the prefix changed. The match is exact: if even one token before the breakpoint differs, or the lifetime expired, you get a miss and pay full price. Keep the cached prefix byte-for-byte identical and put variable content after it.

What Is Prompt Caching? How It Saves Cost

Q: What is prompt caching?

It is a feature that stores the processed prefix of a prompt so repeated requests reuse it instead of reprocessing it. This lowers cost and latency, with cache reads on the Claude API costing about 90 percent less than normal input.

How prompt caching works

You mark a point in your prompt as a cache breakpoint. The provider stores the encoded state of everything up to that point, and the next request that begins with the exact same bytes reads from the cache rather than recomputing. The match must be exact: if even one token in the prefix differs, you get a cache miss and pay the full price. The order matters too, since the prompt is hashed as tools, then system prompt, then messages.

Put the stable, repeated content first (tools, system prompt, long documents).
Mark a cache breakpoint after it; the prefix up to there gets cached.
A later request with the identical prefix reads the cache cheaply.

What it costs and saves

There is a small premium to write the cache and a large saving to read it. On the Claude API a cache write costs about 1.25x normal input for a 5 minute lifetime (or 2x for a 1 hour lifetime), while a cache read costs about 0.1x input, a roughly 90 percent saving. The cache is ephemeral with a short time-to-live that resets on each read, so a busy conversation keeps its cache warm without paying the write cost again.

When to use it

Prompt caching pays off whenever you reuse a large, stable prefix across many calls: a long system prompt, a fixed set of tool definitions, a knowledge document, or a multi-turn chat where the early context stays the same. It does not help one-off prompts that never repeat. Because the saving applies only to the unchanged prefix, structure your prompts so the constant parts come first and the variable parts come last.

What Is Prompt Caching?

In short

How prompt caching works

What it costs and saves

When to use it

Frequently asked questions

Ready to put AI to work as a real workflow?

In short

How prompt caching works

What it costs and saves

When to use it

Frequently asked questions