Tokens & Context Windows Explained: Why AI 'Forgets' (2026)

TL;DR. A token is a chunk of text (~¾ of a word). A context window is how much text an AI can hold in mind at once — its working memory for one conversation, measured in tokens. Everything counts toward it: your messages, its replies, pasted documents. When a chat gets longer than the window, the oldest parts drop out — which is exactly why AI seems to "forget." Knowing this tells you how to avoid it.

Two words come up constantly once you use AI seriously — tokens and context window — and they sound far more technical than they are. They actually explain some of the most common, confusing AI behavior. Here's the plain version.

Tokens: how AI measures text

AI models don't read in words exactly; they read in tokens — small chunks, usually a word or a piece of one. "Cat" is one token; "unbelievable" might be two or three. The useful rule of thumb:

1,000 tokens ≈ 750 words. A typical page of text is roughly 500 tokens (about 375 words).
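To make the rule of thumb concrete, here is a minimal Python sketch that converts between words and tokens using the 750-words-per-1,000-tokens ratio. The function names are made up for illustration, and the result is only an estimate: real tokenizers differ by model and by language.

```python
# Rule of thumb from the text: 1,000 tokens is roughly 750 words.
WORDS_PER_TOKEN = 0.75  # approximation; real tokenizers differ by model


def estimate_tokens(text: str) -> int:
    """Estimate a token count from the whitespace-separated word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def estimate_words(tokens: int) -> int:
    """Estimate how many words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


print(estimate_words(1000))            # 750
print(estimate_tokens("word " * 750))  # 1000
```

For an exact count you would need the tokenizer that belongs to the specific model; the estimate is good enough for judging whether something is "a little" or "a lot."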

Why care? Because everything is measured in tokens — the text you send and the text the AI sends back. It's why usage limits and developer (API) pricing are quoted in tokens rather than words. For everyday use you'll rarely count them, but the unit explains the next idea.
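As a rough illustration of token-based pricing, the sketch below works out the cost of a single request. Many providers price input and output tokens separately, so it takes two rates. The function name and the prices are hypothetical, chosen only to make the arithmetic easy to follow; they are not any real provider's rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Return the cost of one request, in the same currency as the prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices, NOT any real provider's rates.
cost = estimate_cost(
    input_tokens=10_000,   # e.g. a pasted report plus your question
    output_tokens=1_000,   # the model's reply
    input_price_per_million=2.00,
    output_price_per_million=8.00,
)
print(f"{cost:.3f}")  # 0.028
```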

The context window: AI's working memory

The context window is the amount of text an AI can keep in mind at one time: its short-term memory for a single conversation, measured in tokens. Modern models have large windows, often hundreds of thousands of tokens, the equivalent of one or more full-length books (at 750 words per 1,000 tokens, 200,000 tokens is about 150,000 words). But they're still finite.

Here's the key part: everything in the conversation counts toward the window — every message you've sent, every reply it's given, and any documents you've pasted in. It's a budget, and a long conversation slowly spends it.
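The budget idea can be sketched in a few lines of Python. This is a toy model, not how any particular app keeps count: the 200,000-token window and the function names are assumptions for illustration, and token counts are estimated with the 750-words-per-1,000-tokens rule of thumb.

```python
CONTEXT_WINDOW = 200_000  # hypothetical window size, in tokens


def estimate_tokens(text: str) -> int:
    """Rough estimate: 1,000 tokens per 750 words."""
    return round(len(text.split()) / 0.75)


def tokens_used(conversation: list[str]) -> int:
    """Total estimated tokens across every message, reply, and pasted document."""
    return sum(estimate_tokens(message) for message in conversation)


conversation = [
    "word " * 300,     # your first message
    "word " * 600,     # the AI's reply
    "word " * 75_000,  # a long pasted document
]

used = tokens_used(conversation)
print(used)                   # 101200
print(CONTEXT_WINDOW - used)  # 98800
```

One pasted document here spends about half the budget, which is why long attachments shorten how far a conversation can run.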

Why AI "forgets"

This is the payoff. When a conversation grows longer than the context window, something has to give. Typically the oldest content is trimmed or condensed to make room, so it falls out of the model's view. The AI isn't being lazy or broken; that early text is simply no longer in its working memory. So:

You give detailed instructions at the start of a long chat, and an hour later it stops following them.
You paste a long document, keep chatting, and eventually it "loses track" of the document's details.
A marathon brainstorming session starts contradicting things it said near the beginning.

All the same cause: the start of the conversation scrolled out of the window.
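The "scrolling out" effect can be simulated with a short Python sketch. It assumes one simple strategy, keeping the newest messages that fit and dropping the rest; real apps may trim, summarize, or refuse differently. The function names and the tiny 150-token window are made up for the demonstration.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: 1,000 tokens per 750 words."""
    return round(len(text.split()) / 0.75)


def visible_messages(conversation: list[str], window: int) -> list[str]:
    """Keep the newest messages that fit in the window; older ones are dropped."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest, stopping once the budget would be exceeded.
    for message in reversed(conversation):
        cost = estimate_tokens(message)
        if used + cost > window:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


conversation = [
    "Always answer in formal English.",  # instructions at the start
    "filler " * 60,                      # long back-and-forth
    "filler " * 60,
    "What tone should you use?",         # latest question
]

visible = visible_messages(conversation, window=150)  # toy window size
print(len(visible))                # 2
print(conversation[0] in visible)  # False
```

The opening instruction is the first thing to go, which matches the first symptom in the list: the model stops following instructions it can no longer see.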

How to work around it

Re-state the important stuff. If a long chat starts drifting, paste the key facts or instructions again so they're back in the window.
Summarize and restart. For a sprawling conversation, ask the AI to summarize what matters, then start a fresh chat with that summary. A clean window full of only the relevant context beats a huge window cluttered with old back-and-forth.
One conversation per task. Don't run your whole week through a single endless thread. Separate chats keep each window focused.
Put the most important context near your latest message. Recent text is freshest in the model's "mind."
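The "summarize and restart" tactic, combined with keeping key instructions next to the latest message, might look like this in Python. It is only a sketch: `fresh_chat` is a hypothetical helper, the summary is assumed to be text you already got from the AI, and token counts use the 750-words-per-1,000-tokens estimate.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: 1,000 tokens per 750 words."""
    return round(len(text.split()) / 0.75)


def fresh_chat(summary: str, key_instructions: str, next_message: str) -> list[str]:
    """Start a new conversation that holds only what still matters."""
    return [
        "Summary of the earlier conversation: " + summary,
        # Key instructions sit right next to the latest message.
        key_instructions + "\n\n" + next_message,
    ]


# A bloated thread: one instruction followed by twenty long exchanges.
old_thread = ["Always answer in formal English."] + ["filler " * 600] * 20

new_thread = fresh_chat(
    summary="We compared three pricing plans and chose the annual one.",
    key_instructions="Always answer in formal English.",
    next_message="Draft the announcement email.",
)

print(sum(estimate_tokens(m) for m in old_thread))  # 16007
print(sum(estimate_tokens(m) for m in new_thread))  # 32
```

The fresh thread carries the same decisions and instructions in a tiny fraction of the tokens, leaving the rest of the window free for new work.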

A note on bigger windows

Models keep getting larger context windows, and that genuinely helps: you can hand them whole reports. Two cautions, though. First, bigger isn't a substitute for relevant. A window stuffed with marginally related text can dilute the model's focus, so curating what you include still matters. Second, large contexts take more time and (for developers) more money to process. More room is good; using it deliberately is better.

Once "tokens" and "context window" click, AI stops feeling mysterious. It's a system with a working memory of a certain size — feed it the right things, keep the important parts fresh, and start clean when a thread gets bloated.

This article is educational.

Frequently asked questions

What is a token in AI?

A token is a small chunk of text — roughly a word or part of a word — that an AI reads and writes in. As a rough rule, 1,000 tokens is about 750 words. AI models measure everything (what you send and what they reply) in tokens, which is why limits and API pricing are described in tokens rather than words.

What is a context window?

A context window is how much text an AI can hold in mind at once — its working memory for a single conversation, measured in tokens. Everything in the conversation (your messages, its replies, any documents you pasted) counts toward it. When a conversation gets longer than the window, the oldest parts fall out of view, which is why a chatbot can seem to 'forget' what you said earlier.

Why does AI forget what I told it earlier?

Because the conversation grew past its context window. The model can only 'see' a fixed amount of recent text at once. Once you exceed it, the earliest messages drop out of its working memory. The fix is to re-paste the key facts, summarize the important points, or start a fresh conversation with just what matters.

