Context Engineering Principles: The Mental Models I Use Every Day with AI Agents
Named principles for context engineering — less is more, knowledge gardening, structure over retrieval — discovered through daily agent usage, not academic theory.
You've built a structured knowledge system. Maybe you set up an Obsidian vault with typed frontmatter where axioms, principles, and decisions are explicit and traversable. Maybe you configured a PO agent with a CLAUDE.md file and ran an AI-augmented refinement session where the agent actually challenged your assumptions. It worked — better than generic prompts, better than pasting context into ChatGPT.
But you can't explain why it works.
Someone asks you what "context engineering" means and you hesitate. Is it just better prompting? Is it RAG? You've heard that Shopify's CEO popularized the term. Anthropic published a technical guide. Every AI blog has a "what is context engineering" explainer. You've read three of them and still can't articulate what you actually do differently.
That's because most context engineering content defines the concept from the outside. What follows is different: three named principles I use every day, each discovered through a specific failure or debugging session with AI agents. Not definitions — mental models. Each one changes how you evaluate whether a context change will improve or degrade your agent's performance.
Why Context Engineering Is Not Prompt Engineering
The most common misconception: context engineering is an evolution of prompt engineering. Better prompts, more context, same discipline. This framing is wrong, and it leads people to apply prompt-engineering intuitions to context-engineering problems — with predictable results.
Prompt engineering asks: "How do I phrase this instruction to get the output I want?" It operates on a single interaction. You craft a prompt, get a response, refine the prompt. The feedback loop is immediate. The scope is one conversation.
Context engineering asks a fundamentally different question: "What does the agent need to know — in what structure, persisted how, maintained by whom — to perform reliably across every interaction?" The scope isn't a conversation. It's an information system.
Three dimensions separate context engineering from prompt engineering, and none of them appear in a typical prompting guide:
Persistence. Prompts are ephemeral — written for one interaction, discarded after. Context engineering designs knowledge that survives across sessions. Your CLAUDE.md file, your vault's axiom and decision notes, your team's Definition of Ready — these persist. The agent accesses them every time, not just when you remember to paste them.
Structure. Prompts are flat text — natural language instructions. Context engineering creates typed, traversable knowledge. An axiom has different authority than a decision. A principle operates differently from a rule. When your vault uses typed frontmatter, the agent can reason about your knowledge — not just search through it.
Maintenance. Prompts are fire-and-forget — write once, use once. Context requires active curation. Decisions change. Constraints expire. If you don't maintain your context, the agent reasons from stale information and produces confident, wrong output. There is no prompt-engineering equivalent of this maintenance burden because prompts don't persist.
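To make the Structure dimension concrete, here is a minimal sketch of how typed frontmatter might separate an axiom from a decision. The specific field names and note contents are illustrative assumptions, not a prescribed schema:

```markdown
<!-- axiom/quality-over-speed.md (hypothetical note) -->
---
type: axiom          # highest authority; rarely changes
scope: product
derives: [principle/definition-of-ready]
---
We do not trade release quality for sprint velocity.

<!-- decision/architecture-pattern.md (hypothetical contents) -->
---
type: decision       # lower authority; expected to change over time
status: active
supersedes: decision/microservices-boundary
---
We build as a modular monolith until team size forces a split.
```

The type field is what lets an agent treat these differently: an axiom is something to apply, a decision is something to check for staleness before relying on it.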
Anthropic's guide on effective context engineering for AI agents covers the engineering dimension well — system prompts, tool design, retrieval patterns, long-horizon techniques like compaction and multi-agent architectures. It's the authoritative technical reference, and it correctly identifies different memory types (episodic memory for conversations, semantic memory for facts, procedural memory for learned behaviors) as core to context design. But it's written for developers building agent infrastructure. What's missing from the conversation is the practitioner's perspective: mental models for people who use agents daily, not build them. As Simon Willison has observed, the practical gap between "understanding how LLMs work" and "using them effectively every day" is where most people get stuck.
Tobi Lütke framed context engineering as "the art of providing all the context, information and tools an LLM needs." That definition is useful but incomplete. Providing context is step one. Structuring it, maintaining it, and knowing when to remove it — those are the principles that determine whether your agent is reliable or fragile.
If you've built a typed knowledge vault, that vault isn't a "better prompt." It's an information system. If you've configured a PO agent for refinement, it didn't work because of clever instructions — it worked because it had structured access to your decisions, axioms, and standards. The principles below explain why.
Less Is More: The Counter-Intuitive Context Engineering Best Practice
This is the one that surprises people. The intuition says: give the agent more context, get better output. More product knowledge, more decision history, more architectural constraints. The context window is 200K tokens — fill it up.
The reality is the opposite. More context often degrades performance.
I discovered this during a specific debugging session. My CLAUDE.md had grown to roughly 800 lines — every product decision, every architectural constraint, every team standard I could think of. Comprehensive. The agent's responses were vague, generic, and contradictory. It would reference a constraint from line 400 that conflicted with a decision on line 650, then hedge with "it depends on your specific situation."
I cut the CLAUDE.md to 200 lines. Focused. Only the context relevant to the agent's current role. The agent's responses became sharper, more opinionated, and more useful — immediately.
Why this happens is well-documented. Anthropic calls it "context rot" — performance degradation as context grows. The attention mechanism in large language models spreads across all tokens. More context means each relevant token gets proportionally less attention. Research on the "Lost in the Middle" phenomenon shows that LLM accuracy can drop by 20-30% when relevant information is buried in the middle of a long context.
The practical implication: for every piece of context you add to an agent's working set, ask "does this improve the next 10 interactions?" Not "is this information correct?" (it probably is) but "does having it here make the agent better at its current job?"
Notice how the PO agent's CLAUDE.md from the refinement guide was roughly 15 lines. Product context, refinement standards, challenge rules, knowledge references. Fifteen lines — and it challenged scope, referenced prior decisions, and applied foundational axioms. It didn't need 800 lines of product knowledge pre-loaded. It needed the right 15 lines, with a path to the vault for everything else.
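For reference, here is a hedged sketch of what a CLAUDE.md at that scale could look like. Everything specific in it (the product, the standards, the paths) is a placeholder; the shape is what matters: a few lines each of product context, refinement standards, challenge rules, and pointers into the vault.

```markdown
# CLAUDE.md (PO agent) - illustrative sketch, not the exact file from the guide

## Product context
B2B workflow tool; primary persona: operations leads at mid-size companies.

## Refinement standards
Every story states a user outcome, acceptance criteria, and one explicit non-goal.

## Challenge rules
Push back on any story that touches more than one bounded context.
Ask which decision note supports a constraint before accepting it.

## Knowledge references
Axioms: vault/axiom/    Decisions: vault/decision/    Rules: vault/rule/
Read the relevant note before recommending, and cite it by filename.
```

Everything the agent needs beyond this lives in the vault, one file read away.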
The larger context windows get, the more this principle matters — not less. A 200K token window doesn't solve the dilution problem. It makes it worse. You can fit more irrelevant context alongside the relevant context, and the agent will try to honor all of it. The constraint isn't the window size. It's your discipline in curating what goes inside it.
The principle in practice: Start with the minimum context the agent needs. Add only when you observe a specific gap. Remove anything the agent doesn't reference in its outputs. Treat your context like code — every line should earn its place.
Knowledge Gardening: The Context Engineering Principle Nobody Teaches
Building a knowledge system is the easy part. Maintaining it is where most people fail.
I noticed this pattern on a Monday morning. The PO agent recommended deferring a feature to "maintain our microservices boundary" — referencing a constraint note in the vault. The problem: we had shifted to a modular monolith two weeks earlier. The constraint note was stale. The agent was reasoning from outdated information with complete confidence, and its recommendation was wrong.
This is worse than no context. No context produces generic output. Stale context produces specific, confident, wrong output — the kind you're more likely to follow because it sounds authoritative.
Knowledge gardening is the practice of actively maintaining your context — pruning stale notes, updating changed decisions, retiring constraints that no longer apply. Not a quarterly reorganization project. A daily micro-maintenance habit, like tending an actual garden.
Three gardening actions handle most maintenance needs:
Update. A decision changed — update the note. You shifted from microservices to modular monolith? Update decision/architecture-pattern.md the same day. Not next week. Not "when I have time." The cost of a stale decision note compounding across multiple agent interactions is far higher than the two minutes it takes to update it.
Retire. A constraint no longer applies — archive the note. You dropped the offline-support requirement? Move the constraint note to _archive/. Don't delete it (you may need the history), but get it out of the agent's active context. Retired knowledge that stays in the working set is noise that dilutes signal.
Prune. A note grew beyond its purpose — split or trim it. Your "product strategy" note expanded from a focused axiom into a rambling essay? It's lost its atomic quality. Break it back into focused pieces: one axiom, one principle, one decision. The vault's typed structure makes this tractable — you know what each note's type should be, so you know when it's outgrown its purpose.
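In frontmatter terms, Update and Retire can be as small as the sketch below. The status, updated, and supersedes fields are assumptions for illustration; the point is that the change is recorded in the note the agent actually reads, the same day the real-world change happens.

```markdown
<!-- decision/architecture-pattern.md, updated the day the architecture changed -->
---
type: decision
status: active
updated: 2025-03-10            # illustrative date
supersedes: decision/microservices-boundary
---
We run a modular monolith; extracting a service requires a new decision note.

<!-- _archive/constraint/offline-support.md, retired but kept for history -->
---
type: constraint
status: retired                # out of the agent's active context
---
Offline support is no longer a requirement; see the superseding decision.
```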
The reason knowledge gardening is absent from most context engineering content is simple: it's not technically interesting. There's no algorithm. No framework. No tool. It's discipline — 5 to 10 minutes daily reviewing what changed, what's stale, and what's no longer true. Typed frontmatter makes it manageable: axioms rarely change (review monthly), decisions change regularly (review weekly), rules derive from principles (when a principle updates, check its rules).
This maintenance discipline is what separates a knowledge system that compounds from one that decays. Without gardening, your vault becomes a liability after about a month — enough time for real-world changes to make the agent's context subtly wrong.
The principle in practice: Spend 5-10 minutes daily reviewing notes touched by recent work. Weekly, check active decisions for staleness. Monthly, review axioms and principles for relevance. Treat it like inbox zero — not a project, a habit.
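One way to make that cadence checkable rather than purely a memory exercise, assuming you are willing to add a review field the article does not prescribe, is to let each note record when it was last confirmed:

```markdown
---
type: decision         # decisions get the weekly sweep; axioms the monthly one
status: active
reviewed: 2025-03-03   # bump whenever you confirm the note is still true
---
```

Sorting or filtering notes by that property during the weekly and monthly passes is enough to surface what has quietly gone stale.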
Structure Over Retrieval: Why Pre-Built Context Beats AI Search
Most context engineering content assumes you'll use RAG — retrieval-augmented generation. Embed your knowledge into vectors, build a retrieval pipeline, and let the system find relevant context dynamically. For large-scale systems with millions of documents, this makes sense. For a practitioner's knowledge system, it's over-engineering.
The principle: when your knowledge system is small enough to curate — hundreds of notes, not millions — pre-structured knowledge beats retrieval. Every time.
Consider the difference. With RAG, you ask the agent about your architectural constraints. A retrieval system searches your notes, returns the chunks with the highest semantic similarity, and hopes it found the right ones. Maybe it returns your architecture axiom. Maybe it returns a meeting note where someone mentioned architecture. Maybe it chunks your axiom across two pieces and loses the nuance. You don't know what the agent sees. The retrieval is a black box.
With structured knowledge, the agent reads axiom/architecture-not-ml.md directly. You know exactly what it sees. The note has typed frontmatter — it's an axiom, it has a specific scope, it links to derived principles. The agent can traverse from the axiom to its child principles to the rules they generate. Deterministic. Auditable. Maintainable.
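A sketch of what that traversable note might look like. The article does not quote the note's contents, so the body and the linked principle names below are placeholders; the frontmatter shape, typed, scoped, and linked to derived principles, is the part that matters.

```markdown
<!-- axiom/architecture-not-ml.md (frontmatter shape only; body is a placeholder) -->
---
type: axiom
scope: architecture
principles:
  - "[[principle/modular-boundaries]]"   # hypothetical derived principle
  - "[[principle/buy-before-build]]"     # hypothetical derived principle
---
(The axiom statement itself goes here.)
```

Each linked principle note can carry a rules key of its own, so the agent walks axiom to principle to rule deterministically, with no vector search in between.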
RAG introduces three failure modes that structure avoids:
Retrieval noise. The vector search returns semantically similar but contextually wrong chunks. A note about "architecture reviews" has high cosine similarity to a query about "architectural constraints" but answers a completely different question.
Chunk boundaries. Your axiom is 150 words. The chunking algorithm splits it at 100 words. The agent gets half an axiom and a quarter of the next note. The original meaning is lost.
Opacity. You can't easily audit what the retrieval system returned. When the agent produces wrong output, debugging requires inspecting vector search results, re-ranking scores, and chunk reconstruction. With structured notes, you read the note the agent read. Debugging is direct.
This isn't a universal argument against RAG. Retrieval excels for ephemeral, high-volume context: customer support tickets, codebase search, external research. The boundary is clear: if you curate it, structure it. If you query it, retrieve it. A product owner's knowledge system — axioms, principles, decisions, rules — is curated. It belongs in structured notes, not a vector database.
The Obsidian vault approach is structure over retrieval in practice. The PO agent from the refinement workflow didn't retrieve your refinement standards from a search — it read them from CLAUDE.md, which pointed to structured vault notes. That directness is why the agent could challenge your assumptions credibly. It wasn't guessing at relevant context. It was reading the specific notes you structured for it.
The principle in practice: Use structured notes for persistent, curated knowledge (your product context, decisions, operating principles). Use retrieval for ephemeral, high-volume data (customer queries, codebase search, market research). When in doubt, start with structure — you can always add retrieval later if volume demands it.
Apply These Context Engineering Principles Today
You don't need to rebuild anything. These principles apply to whatever you've already built.
- Audit your context size. Open your CLAUDE.md or agent configuration. How long is it? If it's more than a page, apply less is more. Cut anything the agent doesn't need for its primary role. Move reference material to the vault where the agent can access it on demand, not pre-loaded.
- Start the gardening habit. Set a daily 5-minute reminder. Review notes touched by yesterday's work. Is anything stale? Has a decision changed? Does a constraint still apply? Update, retire, or prune as needed. The first week will feel unnecessary. By the third week, you'll catch a stale note that would have caused a wrong recommendation.
- Check your retrieval dependency. Are you using RAG for knowledge that could be structured? If your "knowledge base" is hundreds of curated notes (not thousands of raw documents), consider replacing the retrieval pipeline with structured notes and direct references. You'll gain auditability and lose a dependency.
- Name your principles. The biggest benefit of named principles isn't the principles themselves — it's the vocabulary. When you can say "that violates less is more" or "this note needs gardening," you have a shared language for context quality. That language compounds across every agent interaction and every team conversation about AI workflow.
Can you explain your context engineering principles by name and give a concrete example of each from your own work? If you can name less is more, knowledge gardening, and structure over retrieval — and point to specific moments where they changed your agent's output — you've internalized the mental models that make the Foundation work.
You've built the system (a structured knowledge vault), applied it to a real workflow (AI-augmented refinement), and now understand why it works (these principles). The Foundation is complete. Next: the transformation itself — what changes in your daily work when you shift from prompt-first to context-first thinking across every tool and workflow you use.
Related Reading
Building a Second Brain for AI Agents
How to architect an Obsidian vault with typed frontmatter so AI agents can reason with your knowledge, not just search it.
AI-Augmented Backlog Refinement: How I Run Every Session with a PO Agent
How to configure Claude Code as a PO agent with session types and run AI-augmented backlog refinement that challenges your assumptions, not just generates boilerplate.