In short
A context window is the maximum amount of text an AI model can consider at once, measured in tokens. It includes everything in play: the system prompt, the files or data you paste in, the conversation so far, and the answer the model is writing. Think of it as the model's working memory: anything inside the window can influence the response, and once the window is full the oldest content effectively falls out of view. Knowing the size of a model's context window, and managing what you put in it, is central to getting good, affordable results.
Why the context window matters
The context window sets a hard ceiling on how much the model can take into account in one go. If your instructions, code and history exceed it, something has to be dropped or summarised, and the model can lose track of details you gave earlier. This is why long, sprawling chats start forgetting things, and why a fresh, focused conversation often beats piling onto an old one.
- Everything counts: system prompt, pasted files, history and the output share the window.
- When it fills up, the oldest content is dropped or compacted and can be forgotten.
- Bigger is not always better: a stuffed window can still bury the key detail.
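To make the shared budget concrete, here is a minimal sketch in Python of how the pieces add up. It assumes a rough rule of thumb of about four characters per token for English text; real tokenizers vary by model, so treat the numbers as estimates. The names estimate_tokens and remaining_budget are hypothetical helpers made up for this illustration, not part of any library.

```python
import math

# Assumption: roughly 4 characters per token for English text.
# Real tokenizers differ by model and language, so this is only an estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; not a real tokenizer."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def remaining_budget(window_tokens, system_prompt, files, history,
                     reserved_output_tokens):
    """Return how many tokens are left once every part of the window is counted.

    A negative result means something has to be dropped or summarised.
    """
    used = estimate_tokens(system_prompt)
    used += sum(estimate_tokens(f) for f in files)
    used += sum(estimate_tokens(turn) for turn in history)
    # The answer the model writes also lives in the window, so reserve room for it.
    used += reserved_output_tokens
    return window_tokens - used


if __name__ == "__main__":
    left = remaining_budget(
        window_tokens=1000,
        system_prompt="a" * 40,          # about 10 tokens
        files=["b" * 400],               # about 100 tokens
        history=["c" * 80, "d" * 80],    # about 20 tokens each
        reserved_output_tokens=200,
    )
    print(left)
```

Reserving output tokens up front is a deliberate choice in this sketch: it makes it obvious that a prompt which "just fits" leaves the model no room to answer.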
Sizes and the "lost in the middle" problem
Context windows have grown large: in 2026, leading models offer hundreds of thousands of tokens and some reach a million, enough to hold a small or medium-sized codebase. But more room is not a free lunch: models can pay less attention to information buried in the middle of a long context, an effect often called "lost in the middle". So putting the most important context near the start or end, and keeping it relevant, still beats dumping everything in.
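One way to act on this is to order your context so the most relevant pieces sit at the edges and the least relevant end up in the middle. The sketch below assumes you already have a relevance score for each chunk (how you get those scores is up to you), and order_for_edges is a hypothetical helper written for this example. It is a heuristic, not a guarantee: how much placement matters depends on the model.

```python
def order_for_edges(scored_chunks):
    """Arrange chunks so the highest-scoring ones sit at the start and end.

    scored_chunks is a list of (text, score) pairs; higher score means
    more relevant. The least relevant chunks end up in the middle.
    """
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    front, back = [], []
    for i, (text, _score) in enumerate(ranked):
        # Alternate: best chunk to the front, second best to the back, and so on.
        if i % 2 == 0:
            front.append(text)
        else:
            back.append(text)
    # Reverse the back half so the second-best chunk is the very last item.
    return front + back[::-1]


if __name__ == "__main__":
    chunks = [("C", 0.5), ("A", 0.9), ("E", 0.1), ("B", 0.8), ("D", 0.2)]
    print(order_for_edges(chunks))  # ['A', 'C', 'E', 'D', 'B']
```

Ordering only helps if the chunks are relevant in the first place, so it makes sense to filter out low-scoring chunks before arranging what is left.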
Managing the context window
Agent harnesses spend a lot of effort here: they compact older turns into summaries, trim irrelevant content, and offload noisy side work to subagents so the main window stays clean. The practical rule for you is the same as for cost: send less text, and make what you send more relevant. Good context management is its own discipline, sometimes called context engineering, and it is one of the highest-leverage skills when building with agents.
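Compaction can be sketched in a few lines: keep the most recent turns word for word and replace everything older with a single summary. This is an illustration of the idea, not how any particular harness does it. The names compact_history and naive_summary are hypothetical, and the placeholder summariser simply truncates each turn; in practice you would likely ask a model to write the summary.

```python
from typing import Callable, List


def naive_summary(turns: List[str]) -> str:
    """Placeholder summariser: keeps the first 40 characters of each turn.

    Assumption: a real system would call a model here instead.
    """
    return "Summary of earlier turns: " + " | ".join(t[:40] for t in turns)


def compact_history(history: List[str], keep_recent: int = 2,
                    summarize: Callable[[List[str]], str] = naive_summary
                    ) -> List[str]:
    """Replace all but the last keep_recent turns with one summary entry."""
    split = len(history) - keep_recent
    if split <= 0:
        # Nothing old enough to compact; return a copy unchanged.
        return list(history)
    older, recent = history[:split], history[split:]
    return [summarize(older)] + recent


if __name__ == "__main__":
    turns = ["t1", "t2", "t3", "t4"]
    print(compact_history(turns, keep_recent=2))
    # ['Summary of earlier turns: t1 | t2', 't3', 't4']
```

The trade-off is that a summary loses detail, so anything the model must remember exactly (a file path, a constraint, a decision) is safer restated in a recent turn than left to the summary.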
