Why you should not trust an untested schema at full scale
In L-0288 you learned that the most reliable way to test a schema is to act on it and observe the results. But there is a question that lesson left unanswered: how much of the schema should you test at once?
The answer, drawn from converging evidence across epistemology, software engineering, innovation research, and decision theory, is almost always the same: test the smallest piece first. Validate incrementally. Build confidence in parts before you trust the whole.
This principle sounds obvious. In practice, it is the one people skip most often. You read a compelling framework for productivity, and instead of testing it on one morning, you reorganize your entire week. You develop a theory about what motivates your team, and instead of checking it against one person's behavior, you redesign the incentive structure. You adopt a mental model for how markets work, and instead of paper-trading one position, you rebalance your portfolio. In each case, the failure is the same: you bet on the whole schema before verifying any of its parts. When it breaks — and complex schemas usually break somewhere — you cannot tell which piece failed, because you changed everything simultaneously.
Incremental validation is the antidote. It is a discipline of proportional commitment: invest trust in a schema at the same rate you accumulate evidence for it. Test the atom before you trust the molecule. Test the molecule before you trust the compound. This lesson explains why that discipline works, where it comes from, and how to make it operational in your own thinking.
The epistemological foundation: Popper's piecemeal criticism
The philosophical case for incremental validation begins with Karl Popper's falsificationism. Popper argued that scientific knowledge advances not by confirming theories but by attempting to refute them — by searching for the conditions under which a theory fails. But Popper made a crucial methodological point that is often overlooked: all scientific criticism must be piecemeal. You cannot question every aspect of a theory simultaneously, because certain elements of what he called "background knowledge" must be held constant while you test others (Popper, 1963).
This is not a concession to laziness. It is a logical necessity. If you change three variables at once and get an unexpected result, you have no way to attribute the result to any particular variable. The test is uninformative. Popper extended this principle beyond the laboratory into social reform, advocating what he called "piecemeal social engineering" — the practice of modifying one institution at a time, testing the results, and iterating, rather than attempting to revolutionize an entire social system based on an untested blueprint (Popper, 1945).
The parallel to personal schema validation is direct. Your schemas — your mental models for how things work — are theories about the world. Popper's insight says: do not test your entire theory of how your career works by quitting your job and trying a completely different approach. Test one prediction. Does your schema say that networking events lead to opportunities? Attend one. Measure what happens. Does your schema say that morning routines improve your focus? Try one morning. A single, contained test gives you diagnostic information. A total overhaul gives you noise.
The engineering proof: unit testing and TDD
Software engineering arrived at the same principle through decades of painful experience. The practice of unit testing — testing the smallest isolable component of a system before integrating it into larger structures — is now so fundamental that shipping a codebase without it is widely considered professionally irresponsible.
Kent Beck formalized this discipline in his development of Test-Driven Development (TDD), documented in Test-Driven Development: By Example (2002). The TDD cycle is deliberately incremental: write a test for the smallest behavior you can specify, write just enough code to make that test pass, refactor, then repeat. You never write a large system and test it afterward. You build confidence one passing test at a time, each test verifying one small claim about how the system behaves.
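The cycle can be shown in miniature. This is a toy sketch, not Beck's own code; the `slugify` function and its test are invented here purely to make the red-green rhythm concrete:

```python
# Step 1 (red): write a test for the smallest behavior you can
# specify, before any implementation exists.
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Little Bets") == "little-bets"

# Step 2 (green): write just enough code to make that test pass.
def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")

# Step 3: run the test. If it fails, the failure points at exactly
# one unit — high diagnostic resolution, by construction.
test_slugify_lowercases_and_hyphenates()
print("test passed")
```

Each pass through the loop verifies one small claim about the system; confidence in the whole is assembled from verified parts.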
The reason is not just practical efficiency. It is epistemic. When a unit test fails, you know exactly what broke — the unit under test. When an integration test fails on a system you have never unit-tested, you know something broke, but you may spend hours or days tracking down where. The smaller the unit of validation, the higher the diagnostic resolution of the failure. This is Beck's deeper insight: testing is not just about catching errors. It is about producing actionable information when errors occur.
Translate this to personal schemas. If you test your entire theory of time management by overhauling your schedule, calendar system, and task manager simultaneously, and your productivity drops, what do you learn? Almost nothing — the failure could be in any component or in their interaction. But if you test one element — say, time-blocking your mornings — while holding everything else constant, a failure tells you something specific about time-blocking in your context. You have generated a unit test for your schema.
The innovation evidence: little bets
Peter Sims documented the same pattern in creative and entrepreneurial contexts in Little Bets: How Breakthrough Ideas Emerge from Small Discoveries (2011). Sims studied innovators across domains — Pixar, Amazon, the comedy circuit, military strategy, architecture — and found a consistent pattern: breakthrough results came not from grand plans executed faithfully, but from systematic series of small, low-cost experiments.
A little bet, as Sims defines it, is "a low-risk action taken to discover, develop, and test an idea." The key properties are that it is affordable (you can absorb the loss if it fails), concrete (it produces observable results), and fast (you learn something quickly enough to iterate). Chris Rock, for example, does not write a comedy special in isolation and perform it for the first time in a packed theater. He tests individual jokes in small comedy clubs over six months to a year, iterating based on audience response. Each joke is a unit test for a comedic hypothesis. Only jokes that survive incremental validation at small scale make it into the finished product.
Sims's research demonstrates that incremental validation is not just epistemically sound — it is emotionally manageable. Small bets keep the stakes low enough that failure is informative rather than devastating. You can afford to be wrong about a joke at a small club. You cannot afford to be wrong about a joke in front of twenty thousand people. The same logic applies to your schemas. Testing your belief about what motivates your team in a single one-on-one conversation is a small bet. Restructuring the entire incentive system is an all-in wager. The incremental path produces the same information with a fraction of the risk.
The financial theory: real options and progressive commitment
Decision theory formalizes this intuition through the concept of real options. Stewart Myers coined the term in 1977 to describe a specific class of strategic choice: the option to make a small investment now that preserves the right to make a larger investment later, once you have more information (Myers, 1977).
The core insight of real options theory is that under uncertainty, the ability to commit progressively — to invest in stages rather than all at once — has measurable value. An option to expand is worth something even before you decide whether to expand, because it preserves your ability to respond to what you learn. The pharmaceutical industry operates on this principle: drug development proceeds through Phase I, Phase II, and Phase III trials not because regulators are cautious by nature, but because each phase produces information that determines whether the next phase is worth the investment. At each stage, you can continue, modify, or abandon — and the option to abandon is itself valuable.
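The value of the abandonment option can be seen in a toy expected-value calculation. All the numbers below are invented for illustration, and the sketch assumes the cheap first stage perfectly reveals whether the hypothesis holds, which real trials never do:

```python
# Assumed toy numbers: a project costs 100 in total and pays 300 if
# its core hypothesis is true, which happens with probability 0.3.
p_true, payoff, total_cost = 0.3, 300.0, 100.0

# Committing all at once: pay everything before learning anything.
ev_all_at_once = p_true * payoff - total_cost          # 0.3*300 - 100 = -10.0

# Committing in stages: a cheap first phase (cost 10) reveals whether
# the hypothesis holds; pay the remaining 90 only if it does.
phase1_cost = 10.0
ev_staged = -phase1_cost + p_true * (payoff - (total_cost - phase1_cost))
                                                       # -10 + 0.3*210 = 53.0

# The difference is the value of the option to abandon.
option_value = ev_staged - ev_all_at_once              # 63.0
print(ev_all_at_once, ev_staged, option_value)
```

Under these assumptions, the same project is a losing bet when funded all at once and a winning one when funded in stages — the entire difference comes from being allowed to stop.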
For your schemas, the parallel is this: every schema you hold is an investment of cognitive commitment. You are betting that this model of reality is accurate enough to act on. Real options theory says: do not make that bet all at once. Make it in stages. Test the schema in a low-stakes context first (Phase I). If it survives, test it in a moderately complex context (Phase II). If it still holds, apply it to a high-stakes decision (Phase III). At each stage, you have the option to revise or abandon. That option has value — it is the value of not being locked into a schema that reality has already falsified.
Nassim Nicholas Taleb extends this logic in Antifragile (2012) with his barbell strategy: combine extreme caution in the domain of potential ruin with numerous small, speculative bets in the domain of potential gain. The philosophy is directly applicable to schema validation. Protect your core commitments — do not overhaul your life based on an unvalidated theory. But make many small bets — test individual components of the schema in contained contexts where failure is cheap and informative. You gain the upside of validated knowledge without the downside of catastrophic error.
The knowledge management principle: incremental formalization
The personal knowledge management community has independently discovered the same pattern. Frank Shipman and Catherine Marshall, in their influential paper "Formality Considered Harmful" (1999), identified a fundamental problem with knowledge systems that demand formal structure upfront. They found that users resist premature formalization for four reasons: cognitive overhead (specifying structure is extraneous to the primary task), tacit knowledge (people cannot always articulate what they know), premature structure (people resist committing to a formalism before its utility is clear), and situational structure (useful structures vary across contexts).
Their proposed solution was incremental formalization — enter information informally first, then add structure gradually as the appropriate formalisms become clear and immediately useful. Do not force your knowledge into a rigid schema before you understand what schema it needs. Let the structure emerge from use, validated by its actual utility in actual contexts.
This principle applies directly to how you build and validate personal schemas. A schema for "how I make good decisions" should not start as a formal ten-step framework. It should start as a rough observation: "I seem to decide better when I sleep on it." Test that. If it holds, add specificity: "I decide better when I sleep on it and the decision involves more than two stakeholders." Test that. The schema becomes more formal, more structured, and more reliable as each increment survives validation. You are not building the schema and then testing it. You are building it by testing it, one increment at a time.
AI and the Third Brain: step-by-step verification
The convergence of incremental validation and AI is visible in one of the most important developments in large language model reasoning: chain-of-thought prompting.
Wei et al. demonstrated in 2022 that large language models produce dramatically better results on complex reasoning tasks when prompted to show their work step by step, rather than jumping directly to a final answer. The technique — chain-of-thought (CoT) prompting — decomposes a problem into intermediate reasoning steps, each of which can be individually verified. The improvement is not marginal. On challenging mathematical reasoning benchmarks, chain-of-thought prompting roughly doubled accuracy compared to standard prompting.
The deeper development came from Lightman et al. (2023), whose paper "Let's Verify Step by Step" showed that supervising each reasoning step individually — process supervision — significantly outperformed supervising only the final answer. When you reward an AI for getting each step right rather than just the conclusion, the model produces more reliable reasoning and fewer compounding errors. This is incremental validation applied to machine cognition: verify the atoms of reasoning before trusting the molecule.
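Process supervision can be mimicked at toy scale: decompose a worked answer into individual steps and check each one, so a failure pinpoints exactly where the reasoning broke. The marble problem and the `steps` list here are invented for illustration:

```python
# A worked answer to "a jar holds 3 red and 5 blue marbles; you add
# 4 red; what fraction are now red?" decomposed into checkable steps.
# Each tuple: (claim as stated, independently computed value, claimed value).
steps = [
    ("3 red + 4 added red = 7 red", 3 + 4, 7),
    ("7 red + 5 blue = 12 marbles total", 7 + 5, 12),
    ("fraction red = 7/12 ~ 0.5833", round(7 / 12, 4), 0.5833),
]

# Verify the atoms of reasoning before trusting the molecule: a bad
# step is flagged at the step, not discovered in the final answer.
for description, computed, claimed in steps:
    status = "ok" if computed == claimed else "FAILED"
    print(f"{status}: {description}")
```

Outcome supervision would only check the final fraction; checking every tuple is the process-supervised version of the same verification.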
For your Third Brain — the AI-augmented knowledge infrastructure you are building through this curriculum — the implication is practical. When you use an AI system to help validate your schemas, do not ask "Is my theory of team motivation correct?" That is whole-schema validation, and the AI will produce a plausible-sounding answer regardless. Instead, decompose: "Here is one specific prediction my schema makes. Does the evidence support it?" Then another prediction. Then another. You are running unit tests on your schema through the AI, and each test produces higher-resolution information than a holistic assessment ever could.
The same discipline applies to AI-generated schemas. When an AI proposes a framework for how something works, do not adopt it wholesale. Extract one testable prediction and check it. Then another. Incremental validation is your defense against the AI's tendency to produce confident, coherent, and occasionally wrong frameworks. The coherence makes wholesale adoption tempting. The incrementalism makes selective adoption safe.
Protocol: the incremental validation ladder
Here is a concrete protocol for validating any schema incrementally:
Step 1: Decompose the schema into testable claims. Most schemas are compound — they contain multiple assertions bundled together. "Morning routines improve my productivity" contains at least three hidden claims: that mornings are a distinct productivity window, that routines (versus improvised mornings) are the active ingredient, and that the specific activities in the routine matter. Separate them.
Step 2: Rank the claims by testability. Some claims are easy to test in isolation. Others require setup or depend on other claims being true. Start with the most independently testable claim — the one you can check with the least overhead and fewest confounding variables.
Step 3: Design a minimal test. The test should be small enough to run in a single day or a single interaction. One morning with a routine versus one morning without. One conversation using the new communication framework versus one using your default. One decision made with the new model versus one made without it. The smaller the test, the cleaner the signal.
Step 4: Run the test and record the result. Document what happened. Not just "it worked" or "it didn't" — what specifically did you observe? Where did the schema's prediction match reality? Where did it diverge? What surprised you?
Step 5: Update and escalate. If the claim survived the minimal test, design a slightly larger test. If it failed, revise the claim before testing further. Never escalate a claim that failed at a smaller scale — fix it first, then retest.
Step 6: Repeat until the schema earns your confidence. Confidence is not binary. It accumulates through survived tests. A schema that has survived five incremental tests across different contexts has earned more of your trust than one that has survived zero tests but sounds compelling.
The bridge to distinguishing validation from confirmation
Incremental validation gives you a powerful method for testing schemas safely and informatively. But there is a subtle trap waiting inside the method itself: the temptation to design your incremental tests so they can only confirm what you already believe.
If you test your schema about morning routines only on mornings when you slept well and had no interruptions, you are not validating — you are confirming. If you test your communication framework only with the colleague who is easiest to talk to, you are selecting for success. The incremental approach protects you from the cost of large-scale failure, but it does not automatically protect you from the bias of self-serving test design.
That is what L-0290 addresses directly: the difference between validation and confirmation. Validation seeks to discover whether a schema is true. Confirmation seeks to reassure you that it is. They feel similar from the inside — both involve testing — but they produce radically different epistemic outcomes. Incremental validation is the method. Honest test design is the discipline that makes the method trustworthy. You need both.
Sources
- Beck, K. (2002). Test-Driven Development: By Example. Addison-Wesley.
- Lightman, H., Kosaraju, V., Burda, Y., Edwards, H., Baker, B., Lee, T., Leike, J., Schulman, J., Sutskever, I., & Cobbe, K. (2023). Let's verify step by step. arXiv preprint arXiv:2305.20050.
- Myers, S. C. (1977). Determinants of corporate borrowing. Journal of Financial Economics, 5(2), 147-175.
- Popper, K. (1945). The Open Society and Its Enemies. Routledge.
- Popper, K. (1963). Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge.
- Shipman, F. M., & Marshall, C. C. (1999). Formality considered harmful: Experiences, emerging themes, and directions on the use of formal representations in interactive systems. Computer Supported Cooperative Work, 8(4), 333-352.
- Sims, P. (2011). Little Bets: How Breakthrough Ideas Emerge from Small Discoveries. Free Press.
- Taleb, N. N. (2012). Antifragile: Things That Gain from Disorder. Random House.
- Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.