What is a token in simple terms?

A token is a common chunk of characters, roughly four characters or three quarters of an English word. Models read and are billed in tokens, not words, so long or unusual words and code cost more tokens than short common ones.

What is a context window?

It is the maximum number of tokens a model can consider at once, including your prompt, any pasted files, the conversation so far and the answer it is writing. When it fills up, the oldest content is usually the first to be dropped, so the model effectively forgets it.

Why do long prompts give worse answers?

Because of the performance cliff. As the context window fills, quality drops, especially for information buried in the middle of a long input. A short prompt with exactly the right context beats a giant prompt almost every time.

Are 1 million token context models better?

Not automatically. They can accept far more text but quality still degrades as the window fills, so a packed huge window often answers worse than a tight focused prompt. Treat large windows as insurance, not a workspace.

How LLMs Work: Tokens, Context and the Cliff

In short

A large language model does one thing remarkably well: it predicts the next token given everything it has seen. Once you understand tokens, the context window and the performance cliff that hits long inputs, working with any model stops feeling like guesswork. This guide explains all three in plain language, with the few numbers that actually matter in 2026, so you can drive any model well and stop blaming the tool for behaviour that is entirely predictable.

Tokens, not words

A model never sees words the way you do. Your text is first split into tokens, which are common chunks of characters, roughly four characters or three quarters of a word in English. Two things are measured in tokens: the price you pay and the amount a model can hold at once. That is why a cheap model can become expensive on long documents, and why code or text in other languages usually costs more tokens than the same idea in plain English. Pricing is quoted per million tokens and split into input and output, with output usually several times more expensive than input.
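To make the arithmetic concrete, here is a minimal Python sketch. It assumes the rough four-characters-per-token rule above (real tokenizers vary by model and language, so use your provider's own token counter for anything that matters) and uses hypothetical prices of $3 per million input tokens and $15 per million output tokens; check your provider's price list for real numbers.

```python
# Rough rule of thumb from the text: about 4 characters per token in English.
# Real tokenizers differ by model and language; treat this as an estimate only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate token count by rounding characters / 4 up to a whole token."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars when prices are quoted per million tokens, split by input and output."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical prices: $3 per million input tokens, $15 per million output tokens.
# A long document (100,000 input tokens) with a short answer (2,000 output tokens):
print(estimate_cost(100_000, 2_000, 3.0, 15.0))  # prints 0.33
```

Even with output priced five times higher in this example, the $0.30 of input dwarfs the $0.03 of output, which is how long documents make a cheap model expensive.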

The context window

The context window is the maximum number of tokens a model can consider at once: your instructions, the files you pasted, the conversation history and the answer it is writing, all added together. Think of it as the model's desk. Everything relevant has to fit on the desk at the same time, and when the desk is full, something falls off and is effectively forgotten. This is why a long chat starts losing track of instructions you gave near the start. In 2026 a strong model typically has a window of around 200,000 tokens, with some advertising a million or more.
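The "falls off the desk" behaviour can be sketched in a few lines of Python. This is a hypothetical helper, not how any particular product works: it pins a system prompt, reserves room for the answer, then keeps only the most recent turns that still fit, using the same rough four-characters-per-token estimate. Real chat apps may truncate, summarise or reject over-long input instead. The window here is deliberately tiny so the effect is visible.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 4 characters per token in English. Real tokenizers vary.
    return (len(text) + 3) // 4


def fit_to_window(system: str, turns: list[str], window: int,
                  reserve_for_answer: int) -> list[str]:
    """Hypothetical helper: keep the system prompt plus the newest turns that fit.

    The answer also needs room on the desk, so it is reserved up front.
    """
    budget = window - reserve_for_answer - estimate_tokens(system)
    if budget < 0:
        raise ValueError("system prompt and answer reserve exceed the window")
    kept: list[str] = []
    # Walk backwards from the newest turn; stop at the first one that won't fit,
    # so what survives is always a contiguous run of recent history.
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    kept.reverse()  # restore chronological order
    return [system] + kept


history = ["Always answer in French.", "What is a token?", "And a context window?"]
print(fit_to_window("You are a helpful assistant.", history,
                    window=20, reserve_for_answer=5))
# prints ['You are a helpful assistant.', 'And a context window?']
```

Because the loop starts from the newest message, the oldest turns disappear first, and that is exactly where an early instruction like "Always answer in French." tends to live.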

The performance cliff

Bigger context is not the same as better answers. As you fill a context window, quality degrades long before you hit the hard limit. Models attend best to the start and end of a long input and get fuzzy in the middle, a pattern often called "lost in the middle". A million-token window sounds amazing, but answer quality on a packed window is often worse than on a tight, well-chosen prompt. This is the performance cliff, and the lesson is blunt: relevance beats volume every time.
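One common way to work with "lost in the middle" rather than against it is to put the most relevant material at the edges of the prompt. The Python sketch below assumes you have already ranked your chunks from most to least relevant (how you score them is up to you and is not shown) and alternates them between the front and the back, so the weakest material lands in the middle. It is a mitigation, not a cure; sending fewer, better chunks still matters more.

```python
def edge_order(chunks_by_relevance: list[str]) -> list[str]:
    """Place the most relevant chunks at the start and end, least relevant in the middle.

    Input must already be sorted from most to least relevant.
    """
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(chunks_by_relevance):
        # Alternate: 1st most relevant to the front, 2nd to the back, and so on.
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the 2nd most relevant chunk ends up as the very last item.
    return front + back[::-1]


# "A" is the most relevant chunk, "E" the least.
print(edge_order(["A", "B", "C", "D", "E"]))  # prints ['A', 'C', 'E', 'D', 'B']
```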

Why huge context windows disappoint

It is tempting to assume that models advertising enormous context windows are strictly better. In practice they often disappoint, for exactly the reason above. A model can technically accept a million tokens and still answer worse than a focused prompt, because quality falls as the window fills. Treat a huge window as occasional insurance for a genuinely large document, not as permission to stop curating what you send.

How to use this in practice

The practical takeaways are simple. Send less, but send the right less. Start fresh conversations rather than piling onto long ones. When an answer is bad, ask two questions first: is the context too big, and is the relevant information near the top or bottom rather than buried in the middle? On a workflow that runs thousands of times, trimming a bloated prompt can cut your bill dramatically and improve the answers at the same time.
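A worked example of the savings, with hypothetical numbers: a workflow that runs 10,000 times a month, a prompt trimmed from 20,000 to 4,000 tokens, and an input price of $3 per million tokens. Only the input side is counted, on the assumption that trimming the prompt leaves the answer length roughly unchanged.

```python
def input_cost(prompt_tokens: int, runs: int, price_per_m: float) -> float:
    """Input-token cost in dollars for a prompt sent `runs` times, price quoted per million tokens."""
    return prompt_tokens * runs * price_per_m / 1_000_000


RUNS = 10_000       # hypothetical monthly run count
PRICE_PER_M = 3.0   # hypothetical input price, dollars per million tokens

bloated = input_cost(20_000, RUNS, PRICE_PER_M)  # 600.0
trimmed = input_cost(4_000, RUNS, PRICE_PER_M)   # 120.0
print(f"Saved ${bloated - trimmed:,.2f} per month ({1 - trimmed / bloated:.0%})")
# prints Saved $480.00 per month (80%)
```

The saving scales linearly with the run count, which is why prompt discipline matters most on the workflows you run most often.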

Why this matters for your business

Tokens are money and context discipline is quality. A team that understands this writes tighter prompts, picks cheaper models for simple tasks, and gets more reliable output, which means less rework. Understanding the cliff is the single highest-leverage thing a non-technical founder can learn before spending on AI at scale, because it changes every downstream decision about models, prompts and agents.