Context Engineering in Practice: How the Shift from Prompting Actually Happens
The practitioner's transformation from prompt-first to context-first AI work — with real before/after artifacts, four progressive stages, and a discrimination test for knowing when each approach is right.
You spend ten minutes crafting the perfect prompt. Product context, team standards, architectural constraints — everything the AI agent needs, carefully worded. The agent gives you a decent response. Tomorrow, same task. You rewrite the prompt from scratch because you can't remember exactly what worked yesterday. Next week, same pattern. A month in, you have a folder of prompt variations and no compounding returns.
You're on the prompt treadmill.
You know there's a better approach. Maybe you've read about context engineering, or you've named the principles — less is more, knowledge gardening, structure over retrieval. But when you sit down to work, you still reach for the prompt. The theory is clear. The habit hasn't changed.
This article is about making that change. Not another definition of context engineering — that's been written a hundred times. Not the principles behind it — we covered that. This is the transformation itself: what actually shifts in your daily work when you stop optimizing prompts and start engineering context. A specific before/after from my own workflow, four progressive stages of the shift, and a discrimination test for knowing when prompting is still the right call.
The Prompt Treadmill: Going Beyond Prompt Engineering
Prompt engineering works. For simple tasks — summarize this document, reformat this data, draft a quick email — a well-crafted prompt gets the job done. The feedback loop is immediate: write a prompt, get a response, refine if needed. No setup required.
The problem emerges as complexity grows. There's a curve that every practitioner discovers through experience: prompt effectiveness plateaus as tasks become multi-step, require prior context, or build on previous work.
Consider a backlog refinement session. A prompt-first approach means writing something like: "You are a senior product owner. Here is our product's architecture [paste architecture]. Here are our team standards [paste standards]. Here is our Definition of Ready [paste DoR]. Here is the user story to refine: [paste story]. Challenge the scope and identify missing acceptance criteria." Four hundred words of context, rewritten for every session. It works — the AI produces reasonable feedback. But you're hand-carrying all the knowledge the agent needs, every single time. Nothing compounds.
Now consider the same task next sprint. Your architecture hasn't changed. Your standards haven't changed. Your DoR is the same. But you rewrite the prompt because that's what prompts require — they're ephemeral by design. As Andrej Karpathy has framed it, the LLM is like a new kind of operating system, and the prompt is just a command you type into it. If you're spending all your time crafting commands while ignoring the operating system's configuration, you're optimizing the wrong layer.
This isn't a skill issue. It's a structural limitation. Prompt engineering operates on a single interaction. It has no mechanism for persistence (knowledge that survives across sessions), structure (knowledge that's typed and traversable), or maintenance (knowledge that stays current). When tasks require any of those three properties, prompt engineering can't scale — not because you're prompting wrong, but because prompts don't have those capabilities. You're asking a screwdriver to be a power drill.
Larger context windows don't solve this. At 200K tokens, you could paste your entire knowledge base into the prompt — but that creates its own problem. Token efficiency degrades as context grows: the agent spreads attention across everything, relevant and irrelevant alike. Tobi Lütke framed context engineering as "providing all the context an LLM needs" — but the operative word is "needs," not "has." The constraint isn't the window size. It's the curation of what goes inside it.
The counter-argument is real: "Prompt engineering is still evolving — chain of thought, few-shot examples, system prompts." True. And context engineering contains all of those techniques. Every prompting pattern works better with good context underneath. The shift from prompt engineering to context engineering isn't replacement. It's elevation. The prompt still matters — it just matters less than the information system feeding it.
Before and After: What Context Engineering in Practice Actually Looks Like
Most articles about the shift from prompting to context engineering explain the difference abstractly. Tables comparing "single interaction vs. system design." Diagrams with arrows. Conceptual frameworks. What none of them show is the actual artifacts — the specific files that change when a practitioner makes the shift.
Here's mine.
Before (prompt-first): Every refinement session started with a 500-word prompt. I'd open Claude, paste in the product context, the team's standards, our architectural constraints, and the specific stories to refine. The prompt was good — I'd refined it over weeks. But I rewrote it from scratch every session because context from the previous session didn't carry forward. Sometimes I forgot a constraint. Sometimes I included an outdated decision. The AI didn't know the difference, and neither did I until the output was wrong.
After (context-first): A 50-line CLAUDE.md file that references structured vault notes. Product axioms in axiom/. Active decisions in decision/. Refinement standards in principle/. The agent reads this context automatically — no prompt engineering needed. When I start a refinement session, the agent already knows my product, my standards, and my recent decisions. My prompt went from 500 words to something like: "Refine these three stories against our Definition of Ready."
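For illustration, here's a compressed sketch of what such a file can look like. The folder names (axiom/, decision/, principle/) follow the vault layout above; the specific entries are illustrative placeholders, not my real notes:

```markdown
<!-- CLAUDE.md — illustrative sketch, not the actual file -->
# Product context for the PO agent

## Read first
- axiom/product-core.md: what the product is and who it serves
- principle/definition-of-ready.md: the DoR every story must meet
- principle/refinement-standards.md: how we challenge scope

## Decisions
- Check decision/ for active decisions before proposing scope changes.
- Flag any story that contradicts an active decision, citing the note.

## Behavior
- Challenge scope; don't just validate it.
- Ground responses in the notes above, not general best practice.
```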
The efficiency gain is obvious: 20 minutes of prompt engineering per session, gone. But the deeper change isn't efficiency — it's quality. The context-first version produces fundamentally different output because structured context enables reasoning that flat prompts can't support.
In one session, the PO agent flagged a story that contradicted decision/defer-mobile-q3.md — a decision I'd forgotten I'd made. With the prompt-first approach, that decision was buried in a paragraph I sometimes remembered to include. With the context-first approach, it's a structured note the agent traverses automatically. The difference isn't "same result, less effort." It's a qualitatively different interaction — as described in our guide to AI-augmented backlog refinement, where the agent challenges your assumptions because it has structured access to your decisions, not because you wrote a clever instruction.
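The note itself is small: a typed header plus a few lines of content. Here's a reconstruction of what decision/defer-mobile-q3.md could look like (the filename is real; the body is an illustrative stand-in):

```markdown
---
type: decision
status: active
supersedes: none
---
# Defer mobile support to Q3

Native mobile work is deferred until Q3. Any story that assumes a
mobile client before then contradicts this decision and should be
flagged during refinement.

Related: axiom/product-core.md, principle/refinement-standards.md
```

The `type` field is what makes the note traversable: the agent can ask "which active decisions touch this story?" instead of hoping the right paragraph made it into the prompt.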
Anthropic's documentation on CLAUDE.md files describes this pattern at the tool level: a system prompt that persists across sessions, that agents read automatically, that you maintain rather than rewrite. It's context engineering implemented as a file — persistent, structured, maintained. The practical implementation of the shift everyone talks about in the abstract.
The Four Stages of the Context Engineering Transformation
The articles that present the shift from prompting to context engineering as a binary — old way vs. new way, before vs. after — are missing something important. The transformation isn't a switch you flip. It's a progression through identifiable stages, each with observable signals.
Understanding the stages matters because it removes the pressure to transform overnight. You don't need to build a complete knowledge system before you benefit. Each stage is an improvement over the last.
Stage 1: Awareness. You notice you're on the prompt treadmill. You recognize the rewriting, the lack of compound returns, the ceiling on complex tasks. You've read about context engineering — maybe you can name the principles. But your daily behavior hasn't changed. You still open the chat, write the prompt, get the response. This is where most people stay, not from lack of understanding but from lack of a clear next step.
Stage 2: First experiment. You create your first persistent context. Maybe a CLAUDE.md file with your project standards. Maybe a single vault note with your product's core axioms. You use it in one session and notice the agent's responses are different — sharper, more grounded, referencing knowledge you didn't manually provide. You use it in a second session and realize: the knowledge survived. You didn't rewrite anything. Something compounded.
Stage 3: Systematic context. You build the system. A structured knowledge vault with typed notes — axioms, principles, decisions, rules. A CLAUDE.md that references structured knowledge instead of containing it. An agent configured for a specific role with access to tools, structured memory, and curated context. You design different memory types — persistent knowledge in vault notes, session history for episodic context, standard procedures the agent follows. You apply the principles: less is more (focused context, not a knowledge dump), knowledge gardening (maintained context, not a static archive), structure over retrieval (deterministic context instead of RAG search results).
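Concretely, Stage 3 usually means a small directory convention. One representative layout, with names that are illustrative rather than prescriptive:

```
vault/
├── axiom/        # what is always true about the product
├── principle/    # how we decide: refinement standards, DoR
├── decision/     # dated, active choices (e.g. defer-mobile-q3.md)
└── rule/         # hard constraints the agent must not violate
CLAUDE.md         # ~50 lines that point into the vault, not a dump
```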
Stage 4: Context-first instinct. When the agent gives bad output, you check the context before you rewrite the prompt. Is the knowledge current? Is it focused? Is the right information accessible? You diagnose problems as context-level (missing, stale, diluted knowledge) or prompt-level (unclear instruction) automatically. The shift is complete — not because you decided to change, but because the habit formed through repetition.
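That diagnosis can be written down as a triage order. A sketch of the habit, in checklist form:

```markdown
Bad output? Triage the context first:
1. Current?    Is any referenced note stale or superseded?
2. Focused?    Is the context diluted with irrelevant material?
3. Accessible? Could the agent actually reach the note it needed?
Only when all three pass, rework the prompt wording.
```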
Dan Shipper's writing on AI workflows documents a similar progression — the move from "using AI for tasks" to "building systems with AI" follows comparable stages where each level unlocks capabilities the previous one couldn't reach. The pattern is consistent: practitioners who build persistent context systems report that the shift, once it happens, feels irreversible. You can't go back to rewriting prompts for every session any more than you could go back to manual deployment after setting up CI/CD.
The key insight: you don't need Stage 4 to benefit. Stage 2 — your first persistent context — already eliminates the rewriting problem. Stage 3 adds reasoning capability (the agent can traverse your knowledge, not just read it). Stage 4 is the instinct, the internalized habit. Each stage compounds because persistent context compounds: every session builds on the last instead of starting from zero.
From Prompts to Context Engineering: The Discrimination Test
Context engineering isn't always the answer. Prompt engineering is a real skill, and for many interactions, a well-crafted prompt is exactly right. The transformation isn't about abandoning prompts — it's about knowing when each approach serves you.
Four questions form the discrimination test:
Is the task repeatable? If you'll do this task (or a variant) again next week or next sprint, context engineering compounds. The investment in persistent context pays dividends across every repetition. If the task is genuinely one-off — a single analysis, a quick summary, an ad hoc question — a prompt is fine. Don't build infrastructure for something you'll do once.
Does it build on prior work? If the task requires knowledge from previous sessions — decisions made, constraints established, standards defined, context accumulated — then context engineering persists that knowledge. Prompts start from zero. The refinement example is clear: refinement happens every sprint, builds on prior decisions, and requires consistent standards. That's context engineering territory.
Is the complexity multi-step? Single-turn tasks work well with prompts: summarize this, reformat that, translate this. Multi-turn, multi-step tasks — refine this feature against our standards, review this architecture against our principles, challenge this scope given our constraints — need structured context because the agent must hold multiple knowledge frames simultaneously and maintain coherence across turns. The more steps, the more context matters.
Will others need the same context? If teammates will do similar work with AI agents, structured context is shareable and consistent. Everyone reads the same CLAUDE.md, the same vault notes, the same standards. Prompt libraries are fragile — they drift across copies, go stale independently, and embed assumptions that aren't visible.
If two or more answers are yes: context engineering. If one or none is: prompting is fine. Most product work — refinement, planning, architecture review, scope negotiation — hits at least three of the four criteria.
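Scored against two concrete tasks, the test looks like this (the second task is a made-up counterexample):

```markdown
Sprint backlog refinement
- Repeatable?            yes (every sprint)
- Builds on prior work?  yes (decisions, DoR, standards)
- Multi-step?            yes (scope, criteria, constraints)
- Shared?                yes (the whole team refines)
→ 4/4: engineer the context

One-off: summarize a vendor contract
- Repeatable? no · Prior work? no · Multi-step? no · Shared? no
→ 0/4: just write a prompt
```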
This test prevents over-engineering. Not every AI interaction needs a vault, a CLAUDE.md, and a configured agent. Sometimes you just need to ask a question. The discipline isn't in always engineering context — it's in knowing when to invest and when to prompt.
Make the Shift Today
You don't need to wait for Stage 4. Start where you are:
- Identify your treadmill. What AI task do you repeat weekly, rewriting context each time? That's your first candidate for persistent context. Write down the context you rewrite — product information, team standards, constraints — and notice how much stays the same between sessions.
- Create one persistent artifact. A CLAUDE.md file with your project's core context. Start with 15-20 lines: what the product does, what standards apply, what the agent should know. Not comprehensive — focused. Apply less is more from the start. (A starter sketch follows this list.)
- Use it twice. The shift crystallizes the second time you use persistent context and realize you didn't rewrite anything. That's the compound return — the moment the treadmill stops.
- Run the discrimination test. Before your next AI interaction, ask the four questions. If two or more are yes, invest in context. If not, prompt freely. Build the habit of asking before you act.
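Here's a starter sketch inside that 15-20 line budget. Everything in angle brackets is a placeholder to replace with your own context; the headings are one reasonable convention, not a required format:

```markdown
# CLAUDE.md — starter

## Product
<one sentence: what the product does, and for whom>

## Standards
- Stories must meet our Definition of Ready before refinement ends.
- <two or three non-negotiable team standards>

## Constraints
- <the architectural or scope constraint you keep re-typing into prompts>

## Agent behavior
- Challenge scope against the constraints above; flag contradictions.
```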
When you get bad AI output, do you instinctively check the context system rather than rewording the prompt? If you find yourself diagnosing problems as context-level ("the agent didn't know about our Q3 decision") rather than prompt-level ("I should have phrased that differently"), the transformation is taking hold.
You've built the system — a structured knowledge vault. You've applied it to a real workflow — AI-augmented refinement. You've understood why it works — named principles. And now you've made the shift from prompt-first to context-first. The Foundation is internalized. Next: the tactical setup — your first Claude Code workflow where context engineering stops being an idea and becomes daily muscle memory.
Related Reading
Your First Claude Code Workflow: From Context Engineering to Daily Practice
The PO-specific guide to setting up Claude Code with CLAUDE.md as the interface to your knowledge vault — not a developer tutorial, but your first real workflow where context engineering becomes daily muscle memory.
The Axiom-Principle-Rule Framework: Codifying How You Think for AI Agents
A composable three-tier decision hierarchy — axiom, principle, rule — that codifies your judgment into structured context AI agents can traverse. Distilled from real PO practice, not academic theory.