The Developer's Guide to AI Context Windows: Why Your Prompts Keep Failing
Understanding context windows is the most underrated skill in AI-assisted development. Learn how tokens work, why your prompts degrade mid-session, and how to structure context for maximum accuracy.
You've been in a great debugging session for 40 minutes. The AI understood your schema, remembered your naming conventions, and was making solid suggestions. Then, around message 25, something changes. The model starts hallucinating function names. It forgets the constraint you mentioned earlier. The responses feel generic.
You haven't changed models. You haven't done anything wrong. What happened?
You ran into the limits of the context window. And you didn't even know it.
What Is a Context Window?
A context window is the total amount of text — measured in tokens — that a language model can "see" at once. Everything inside the window is available to the model for reasoning. Everything outside it is completely invisible, as if it never existed.
Think of it like RAM for the model's attention. The context window holds your entire conversation: your system prompt, every message you've sent, every response the model has given, and any documents you've pasted in.
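When the conversation exceeds the context window, something has to give. Most chat interfaces drop or compress the oldest content, while a raw API call typically rejects the oversized request with an error. In a chat interface, the model doesn't warn you. It doesn't say "I've forgotten what you told me earlier." It just quietly stops knowing things it used to know.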
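To make the "oldest content gets dropped" behavior concrete, here is a minimal sketch of how a chat client might trim history to fit a token budget. The `Message` shape, the `roughTokens` helper, and the 4-characters-per-token ratio are illustrative assumptions, not any vendor's actual implementation.

```typescript
// Hypothetical message shape; real chat clients differ.
type Message = { role: "system" | "user" | "assistant"; content: string };

// Assumption: ~4 characters per token. This is a rough estimate, not a tokenizer.
const roughTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep system messages, then keep the newest messages that still fit the budget.
function trimToWindow(messages: Message[], maxTokens: number): Message[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");

  let used = system.reduce((sum, m) => sum + roughTokens(m.content), 0);
  const kept: Message[] = [];

  // Walk from newest to oldest; stop at the first message that does not fit.
  for (const m of rest.slice().reverse()) {
    const cost = roughTokens(m.content);
    if (used + cost > maxTokens) break;
    used += cost;
    kept.unshift(m);
  }
  return system.concat(kept);
}
```

Note that nothing in this loop signals which messages were discarded, which is why the forgetting feels silent from the user's side.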
Token math every developer should know
| Model | Context window | Rough character equivalent |
|---|---|---|
| GPT-4o | 128K tokens | ~512,000 characters |
| Claude 3.5 Sonnet | 200K tokens | ~800,000 characters |
| Claude 3 Opus | 200K tokens | ~800,000 characters |
| Gemini 1.5 Pro | 1M tokens | ~4,000,000 characters |
One token is roughly 4 characters or ¾ of a word in English. A typical source file is 500–2,000 tokens. A long conversation with lots of code is 10,000–50,000 tokens.
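The arithmetic above can be sketched as a small estimator. It assumes the rough 4-characters-per-token ratio for English text; real tokenizers vary by model and by content (code and non-English text often tokenize differently), so treat the result as a ballpark figure only. The function names are hypothetical.

```typescript
// Assumption: ~4 characters per token, the rough English-text ratio.
const CHARS_PER_TOKEN = 4;

// Ballpark token count for a piece of text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Fraction of a context window (given in tokens) that the text would occupy.
function windowShare(text: string, windowTokens: number): number {
  return estimateTokens(text) / windowTokens;
}
```

For example, a 512,000-character paste gives `windowShare(text, 128_000)` of 1.0 under this estimate, which matches the GPT-4o row in the table.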
Why Context Degradation Happens Before the Hard Limit
Here's the counterintuitive part: you don't have to hit the hard limit for context to degrade. Even well within the window, models show attention decay — they weight recent tokens more heavily than distant ones.
If you explain your architecture at the beginning of a long conversation, and then ask a question 15,000 tokens later, the model may partially or fully ignore that early explanation. Not because it was dropped — it's still in the window — but because attention mechanisms naturally prioritize recency.
This is related to the "lost in the middle" effect described in LLM research: models tend to use information near the beginning and end of a long context more reliably than information buried in the middle.
What this means in practice
- Put your most important constraints at the end of your system/context block, not the beginning
- Repeat critical constraints in your follow-up messages ("as a reminder, we're using App Router, not Pages Router")
- Use short, dense context blocks rather than long, discursive explanations
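The placement and repetition advice can be sketched as two small helpers. Both function names are hypothetical, and the approach assumes you assemble prompts as plain strings before sending them.

```typescript
// Background first, constraints last, so the rules sit closest to the question.
function buildContextBlock(background: string[], constraints: string[]): string {
  const rules = constraints.map((c) => `- ${c}`);
  return background.concat(rules).join("\n");
}

// Append critical constraints to a follow-up message as a short reminder.
function withReminder(question: string, criticalConstraints: string[]): string {
  if (criticalConstraints.length === 0) return question;
  return `${question}\n\nAs a reminder: ${criticalConstraints.join("; ")}.`;
}
```

A call such as `withReminder("How do I add a route?", ["we're using App Router, not Pages Router"])` costs only a handful of tokens per message.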
How to Structure Your Context for Maximum Accuracy
The goal is to pack maximum signal into minimum tokens, and to place the highest-priority information where the model will attend to it most reliably.
The ATLAS context format
This is the structure we recommend at ATLAS and that thousands of developers use daily:
## Stack
Next.js 14 (App Router) · TypeScript 5.3 · Tailwind 3.4 · Supabase · tRPC v11 · Zod · Vitest
## Architecture
- Monorepo (Turborepo): apps/web, apps/api, packages/types, packages/ui
- Auth: Supabase Auth + RLS, session cookie strategy
- API: tRPC routers in apps/api/src/routers/, types shared via packages/types
## Conventions
- Functional components only · Server Components default · kebab-case files · PascalCase components
- No barrel files · Collocated tests in __tests__/ · Prefer zod.parse() over manual validation
## Current session
[What you're working on right now]
Notice what this format does:
- Stack line uses · separators to pack info densely (fewer tokens than bullet points)
- Architecture section is structural facts only: no adjectives, no explanation
- Conventions are rules, stated imperatively
- Current session is at the end, so it gets the highest attention weight
This format typically uses 150–250 tokens — cheap enough to include in every single message if needed.
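If you keep this information in code rather than in a text file, a block in this shape can be rendered from a typed object. The sketch below is illustrative only: `ProjectContext` and `renderContext` are hypothetical names, not part of any ATLAS API.

```typescript
// Hypothetical shape for illustration; this is not an ATLAS API.
type ProjectContext = {
  stack: string[];
  architecture: string[];
  conventions: string[];
  currentSession: string;
};

function renderContext(ctx: ProjectContext): string {
  const bullets = (items: string[]): string =>
    items.map((item) => `- ${item}`).join("\n");

  return [
    "## Stack",
    ctx.stack.join(" · "), // dense single line instead of bullets
    "## Architecture",
    bullets(ctx.architecture),
    "## Conventions",
    bullets(ctx.conventions),
    "## Current session", // kept last, matching the format above
    ctx.currentSession,
  ].join("\n");
}
```

Rendering from data keeps the section order fixed, so the current-session line always lands at the end no matter who edits the project facts.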
The "Context Reset" Problem in Long Sessions
There's a second context problem that's distinct from the window limit: session boundaries.
By default, every new conversation with ChatGPT or Claude starts with zero context, unless you've enabled a memory feature. Not degraded context: zero. Your previous session, no matter how productive, is completely gone. All the decisions you made, all the constraints you established, all the code you reviewed together: gone.
This is the problem ATLAS was built to solve. Instead of re-establishing context from scratch in every new session, ATLAS maintains a persistent context store for each of your projects. When you start a new conversation, ATLAS injects your current project context automatically.
The result: every new session feels like a continuation of the last one, not a cold start.
Practical Techniques for Managing Context in Long Sessions
1. The checkpoint summary trick
Every 20–30 messages in a long session, ask the model to summarize what's been decided:
"Summarize the key decisions we've made so far in this session as a compact context block I can paste into a new conversation."
Paste that summary at the start of your next session. This effectively extends your working context across multiple conversations.
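If you drive the model through your own client, the checkpoint routine can be sketched as follows. The prompt text is the one quoted above; the 25-message default is an assumed midpoint of the 20 to 30 range, and the function names and the summary heading are hypothetical.

```typescript
// Prompt text from the article.
const CHECKPOINT_PROMPT =
  "Summarize the key decisions we've made so far in this session as a compact context block I can paste into a new conversation.";

// True every `interval` messages. The default of 25 is an assumption.
function isCheckpointDue(messageCount: number, interval = 25): boolean {
  return messageCount > 0 && messageCount % interval === 0;
}

// Prepend a saved summary to the first message of the next session.
function startNextSession(summary: string, firstMessage: string): string {
  return `## Previous session summary\n${summary}\n\n${firstMessage}`;
}
```

When `isCheckpointDue` returns true, send `CHECKPOINT_PROMPT`, save the reply, and pass it to `startNextSession` when you open the next conversation.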
2. The "IMPORTANT:" prefix
For constraints the model keeps forgetting, prefix them explicitly:
"IMPORTANT: We are using the App Router, NOT the Pages Router. Every file path you suggest must be inside app/, not pages/."
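The emphasis isn't purely cargo cult: in practice, explicit importance markers often make a model more likely to follow a constraint, though the effect varies by model and weakens if you mark everything as important.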
3. Code as context, not description
Instead of describing your data model in prose, paste the actual TypeScript types:
// Paste this, not a description of it
type User = {
  id: string;
  email: string;
  role: "admin" | "member" | "viewer";
  orgId: string;
};
Types are dense, unambiguous, and use fewer tokens than natural language descriptions.
4. Trim before you extend
Before pasting a large file for analysis, trim irrelevant sections. The model doesn't need your entire package.json — it needs the dependencies object. The more surgical you are, the more reliable the responses.
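The package.json case can be sketched in a few lines. `trimPackageJson` is a hypothetical helper; it assumes the input is a valid JSON object and keeps only the top-level keys you name.

```typescript
// Keep only the top-level fields that matter for the question being asked.
function trimPackageJson(raw: string, keys: string[] = ["dependencies"]): string {
  const pkg = JSON.parse(raw) as Record<string, unknown>;
  const trimmed: Record<string, unknown> = {};
  for (const key of keys) {
    if (key in pkg) trimmed[key] = pkg[key];
  }
  return JSON.stringify(trimmed, null, 2);
}
```

Pass `["dependencies", "devDependencies"]` as the second argument when the question involves build tooling as well.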
When to Start a New Session vs. Continue an Old One
A useful rule of thumb: start a new session when:
- You're switching to a different sub-problem or feature
- The model has started hallucinating or forgetting constraints
- The conversation has passed ~15,000 tokens (check via the API, or estimate based on message count)
- You want to approach the problem from a different angle without prior framing
Continue the same session when:
- You're iterating on the same piece of code
- The model has established useful context (your schema, your API shape) that took effort to set up
- You're in a flow state and the responses are accurate
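The rule of thumb above can be written down as a simple check. The `SessionState` fields are hypothetical signals you would track yourself, and the 15,000-token default is this article's threshold, not a property of any model.

```typescript
// Hypothetical session signals, tracked by you or your tooling.
type SessionState = {
  estimatedTokens: number;
  switchedTopic: boolean;
  modelForgettingConstraints: boolean;
  wantFreshFraming: boolean;
};

// Any single trigger is enough to recommend a fresh session.
function shouldStartNewSession(s: SessionState, tokenThreshold = 15000): boolean {
  return (
    s.switchedTopic ||
    s.modelForgettingConstraints ||
    s.wantFreshFraming ||
    s.estimatedTokens > tokenThreshold
  );
}
```

When none of the triggers fire, the check returns false, which corresponds to the "continue the same session" cases.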
Summary
- Context windows are token-limited RAM — when full, the oldest content silently disappears
- Attention decay happens before the hard limit: info in the "middle" is underweighted
- Structure your context densely, imperatively, and with the highest-priority info last
- Every new session starts at zero — use ATLAS to persist context across sessions automatically
- Use checkpoint summaries, IMPORTANT: prefixes, and code-as-context techniques for long sessions