Tokens & Context Windows Explained: Why AI 'Forgets' (2026)
Tokens and context windows sound technical but explain everyday AI behavior — why a chatbot loses track of a long conversation, why it has limits, and how to work around them. Plain English, with practical tips.

TL;DR. A token is a chunk of text (~¾ of a word). A context window is how much text an AI can hold in mind at once — its working memory for one conversation, measured in tokens. Everything counts toward it: your messages, its replies, pasted documents. When a chat gets longer than the window, the oldest parts drop out — which is exactly why AI seems to "forget." Knowing this tells you how to avoid it.
Two words come up constantly once you use AI seriously — tokens and context window — and they sound far more technical than they are. They actually explain some of the most common, confusing AI behavior. Here's the plain version.
Tokens: how AI measures text
AI models don't read in words exactly; they read in tokens — small chunks, usually a word or a piece of one. "Cat" is one token; "unbelievable" might be two or three. The useful rule of thumb:
1,000 tokens ≈ 750 words. By that rule, a typical single-spaced page (about 500 words) is roughly 650 to 700 tokens.
Why care? Because everything is measured in tokens — the text you send and the text the AI sends back. It's why usage limits and developer (API) pricing are quoted in tokens rather than words. For everyday use you'll rarely count them, but the unit explains the next idea.
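If you ever want a quick estimate, the rule of thumb translates into a few lines of Python. This is a rough sketch, not how any real model counts: the function name `estimate_tokens` is made up for illustration, and actual tokenizers vary by model and language, so real counts will differ.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the 1,000 tokens ≈ 750 words rule of thumb.

    Assumption: words are whatever is separated by whitespace. Real
    tokenizers split text differently, so treat this as a ballpark figure.
    """
    word_count = len(text.split())
    # 1,000 tokens per 750 words works out to about 4/3 tokens per word.
    return round(word_count * 1000 / 750)


if __name__ == "__main__":
    sample = "Tokens and context windows sound technical but are not."
    print(estimate_tokens(sample))
```

For anything where the exact number matters (such as API billing), use the token counter provided by the model's vendor instead of an estimate like this.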
The context window: AI's working memory
The context window is the amount of text an AI can keep in mind at one time — its short-term memory for a single conversation, measured in tokens. Modern models have large windows (hundreds of thousands of tokens, equivalent to several books), but they're still finite.
Here's the key part: everything in the conversation counts toward the window — every message you've sent, every reply it's given, and any documents you've pasted in. It's a budget, and a long conversation slowly spends it.
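The budget idea can be sketched as simple arithmetic. The example below is an illustration under stated assumptions: `estimate_tokens`, `tokens_used`, and `tokens_remaining` are hypothetical helper names, the token counts come from the rough words-to-tokens rule rather than a real tokenizer, and `window_size` is a number you supply, not a figure for any particular model.

```python
def estimate_tokens(text: str) -> int:
    """Ballpark token count: about 4/3 tokens per whitespace-separated word."""
    return round(len(text.split()) * 1000 / 750)


def tokens_used(messages: list[str]) -> int:
    """Total estimated tokens across everything in the conversation.

    `messages` holds every piece of text in the chat: your messages,
    the AI's replies, and any pasted documents.
    """
    return sum(estimate_tokens(message) for message in messages)


def tokens_remaining(messages: list[str], window_size: int) -> int:
    """Estimated tokens left in the window (never below zero)."""
    return max(0, window_size - tokens_used(messages))


if __name__ == "__main__":
    conversation = [
        "Please summarize the attached report for me.",
        "Here is a summary of the report you shared.",
    ]
    # window_size is an assumed value chosen only for this demo.
    print(tokens_remaining(conversation, window_size=200_000))
```

Every new message, reply, or pasted document adds to `tokens_used`, which is the sense in which a long conversation slowly spends the budget.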
Why AI "forgets"
This is the payoff. When a conversation grows longer than the context window, the oldest content falls out of the model's view. It's not being lazy or broken — that early text is simply no longer in its working memory. So:
- You give detailed instructions at the start of a long chat, and an hour later it stops following them.
- You paste a long document, keep chatting, and eventually it "loses track" of the document's details.
- A marathon brainstorming session starts contradicting things it said near the beginning.
All the same cause: the start of the conversation scrolled out of the window.
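Here is a simplified model of that scrolling-out behavior. It is a sketch of the idea described above, not how any specific product works: real apps handle overflow in different ways (some trim, some summarize, some stop the chat), and the names `visible_messages` and `window_size`, along with the word-based token estimate, are assumptions made for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Ballpark token count: about 4/3 tokens per whitespace-separated word."""
    return round(len(text.split()) * 1000 / 750)


def visible_messages(messages: list[str], window_size: int) -> list[str]:
    """Return the most recent messages that fit inside the window.

    Walks backward from the newest message and stops as soon as adding
    an older one would exceed `window_size`, so the oldest text is the
    first to fall out of view.
    """
    kept: list[str] = []
    used = 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > window_size:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore oldest-to-newest order
    return kept


if __name__ == "__main__":
    chat = [
        "Always answer in formal English.",  # early instruction
        "Here is my first question.",
        "Here is my second question.",
    ]
    # With a tiny assumed window, the early instruction no longer fits.
    print(visible_messages(chat, window_size=14))
```

In the demo, the instruction given at the start is the message that gets dropped, which matches the first symptom in the list above.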
How to work around it
- Re-state the important stuff. If a long chat starts drifting, paste the key facts or instructions again so they're back in the window.
- Summarize and restart. For a sprawling conversation, ask the AI to summarize what matters, then start a fresh chat with that summary. A clean window full of only the relevant context beats a huge window cluttered with old back-and-forth.
- One conversation per task. Don't run your whole week through a single endless thread. Separate chats keep each window focused.
- Put the most important context near your latest message. Recent text is freshest in the model's "mind."
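The first and last tips can be combined into one habit: put a short reminder of the key facts directly in front of your newest message. The sketch below only builds the text you would paste into a chat. The function name `restate_key_facts` and the reminder wording are made up for illustration, not a feature of any AI tool.

```python
def restate_key_facts(key_facts: list[str], latest_message: str) -> str:
    """Build a message that repeats the key facts right before the new request.

    Placing the reminder next to the latest message keeps the important
    context among the most recent text in the conversation.
    """
    reminder = "\n".join(f"- {fact}" for fact in key_facts)
    return f"Reminder of what matters:\n{reminder}\n\n{latest_message}"


if __name__ == "__main__":
    message = restate_key_facts(
        ["Budget is $500", "Tone: formal"],
        "Draft the email.",
    )
    print(message)
```

The same text works as the opening message of a fresh chat when you use the summarize-and-restart approach.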
A note on bigger windows
Models keep getting larger context windows, and that genuinely helps: you can hand them whole reports. But two cautions apply. First, bigger isn't a substitute for relevant. A window stuffed with marginally related text can actually dilute the model's focus, so curating what you include still matters. Second, large contexts cost more time and (for developers) money to process. More room is good; using it deliberately is better.
Once "tokens" and "context window" click, AI stops feeling mysterious. It's a system with a working memory of a certain size — feed it the right things, keep the important parts fresh, and start clean when a thread gets bloated.
Frequently asked questions
What is a token in AI?
A token is a small chunk of text — roughly a word or part of a word — that an AI reads and writes in. As a rough rule, 1,000 tokens is about 750 words. AI models measure everything (what you send and what they reply) in tokens, which is why limits and API pricing are described in tokens rather than words.
What is a context window?
A context window is how much text an AI can hold in mind at once — its working memory for a single conversation, measured in tokens. Everything in the conversation (your messages, its replies, any documents you pasted) counts toward it. When a conversation gets longer than the window, the oldest parts fall out of view, which is why a chatbot can seem to 'forget' what you said earlier.
Why does AI forget what I told it earlier?
Because the conversation grew past its context window. The model can only 'see' a fixed amount of recent text at once. Once you exceed it, the earliest messages drop out of its working memory. The fix is to re-paste the key facts, summarize the important points, or start a fresh conversation with just what matters.