Not all schemas are created equal
You carry hundreds of schemas — mental models of how the world works, why people behave the way they do, what causes what, what to expect next. L-0321 introduced the idea that you can build schemas about your schemas. L-0322 asked you to observe how you create them. This lesson asks the harder question: how do you know whether a schema is any good?
Because most of your schemas were never evaluated. They were absorbed from parents, inherited from corporate culture, inferred from a handful of experiences, or borrowed from a persuasive book. They feel true — that is, they feel familiar — and familiarity masquerades as validity. You have never subjected most of your operating models to any systematic quality check. You wouldn't run production software that had never been tested. You are running your decisions on untested schemas every day.
The good news: the question of what makes a model good has been rigorously studied across philosophy of science, software engineering, machine learning, and pragmatist epistemology. These fields converge on a surprisingly consistent set of criteria. Once you know them, you can evaluate any schema you hold — and stop treating gut-feeling plausibility as a substitute for quality.
The six criteria
Thomas Kuhn, in his 1977 essay "Objectivity, Value Judgment, and Theory Choice," identified five values that scientists use when choosing between competing theories: accuracy, consistency, scope, simplicity, and fruitfulness. These were not arbitrary preferences. Kuhn argued they are "necessarily permanent, for abandoning them would be abandoning science." They represent the minimum conditions a model must meet to be considered genuinely explanatory rather than merely narrative.
I adapt Kuhn's five into six criteria for evaluating personal schemas, adding one that Karl Popper would insist on and that Kuhn implicitly assumed: falsifiability. Here is each criterion, what it means for your thinking, and how to test for it.
1. Accuracy
A schema is accurate when its predictions match observed reality. This is the most basic requirement — the schema says X will happen, and X does happen. Kuhn's gloss on accuracy: "consequences deducible from a theory should be in demonstrated agreement with the results of existing experiments and observations."
For personal schemas, accuracy means: does this model correctly describe what has already happened? If your schema says "people leave companies because of bad managers," check it against the actual departures you have witnessed. How many were genuinely about the manager versus compensation, career growth, relocation, or burnout? A schema that explains only 30% of the cases it claims to explain is not accurate. It is a narrative with anecdotes attached.
The trap is survivorship counting: you remember the cases that confirm the schema and forget the ones that don't. Accuracy requires looking at all the data, not just the data that fits. In machine learning, this is the difference between training accuracy and test accuracy — a model can perfectly memorize its training data while failing on anything new. The term for this is overfitting, and your brain does it constantly. A schema built from three vivid experiences may be perfectly "accurate" on those three experiences and wrong everywhere else.
2. Predictive power
Accuracy looks backward — did the model explain what already happened? Predictive power looks forward — can the model tell you what will happen next, before you observe it?
This is the criterion that separates genuinely useful schemas from post-hoc storytelling. Imre Lakatos, building on both Kuhn and Popper, made this the centerpiece of his methodology of scientific research programmes. A research programme is "progressive" when it makes novel predictions and at least some of those predictions are confirmed. It is "degenerative" when it only explains what has already happened and generates no new, testable predictions about the future.
Apply this directly to your schemas. Your model of what makes a product successful — does it predict which new products will succeed before launch, or does it only explain why past successes worked? Your theory of how your manager makes decisions — can it tell you what she will say in next week's meeting, or does it only rationalize what she said last week?
William James, one of the founders of philosophical pragmatism, put the test bluntly: "What is the truth's cash-value in experiential terms?" For James, true ideas are those "that we can assimilate, validate, corroborate and verify." Ideas become true "just in so far as they help us to get into satisfactory relation with other parts of our experience." An idea that produces no predictions you can verify has no cash-value. It is an intellectual decoration.
The machine learning analogy is exact. A model is evaluated not on training data but on a held-out test set — data it has never seen. Accuracy on training data tells you the model memorized the past. Accuracy on test data tells you the model learned a pattern that generalizes. Your schemas need the same test: do they perform on new data, or only on the data they were built from?
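The train/test distinction can be made concrete with a toy sketch. Here a "memorizer" schema recalls its training cases perfectly but defaults to a guess on anything new, while a simple "rule" schema commits to one pattern. The scenario, data, and threshold are all invented for illustration:

```python
import random

random.seed(0)

# Toy world: the outcome is 1 when effort exceeds 0.6, with 10% label noise.
def sample(n):
    data = []
    for _ in range(n):
        x = random.random()
        y = 1 if x > 0.6 else 0
        if random.random() < 0.1:      # noise: some labels are flipped
            y = 1 - y
        data.append((x, y))
    return data

train, test = sample(20), sample(1000)

# "Memorizer" schema: perfect recall of training cases, a default elsewhere.
memory = {x: y for x, y in train}
def memorizer(x):
    return memory.get(x, 0)            # unseen inputs get the default guess

# "Rule" schema: one committed pattern (the threshold is assumed known here).
def rule(x):
    return 1 if x > 0.6 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memorizer, train))      # 1.0 -- perfect on what it memorized
print(accuracy(rule, train))           # ~0.9 -- misses the noisy labels
print(accuracy(memorizer, test))       # poor -- unseen inputs all get the default
print(accuracy(rule, test))            # ~0.9 -- the pattern generalizes
```

The memorizer "wins" on training data and collapses on held-out data; the rule gives up a little training accuracy and keeps it on new data. That is the overfitting trade in miniature.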
3. Scope
A schema with broad scope explains many phenomena. A schema with narrow scope explains one specific thing. All else being equal, broader scope is better — not because bigger is always better, but because a schema that works across contexts reveals a deeper pattern than one that works in only one situation.
Kuhn described scope as consequences that "should extend beyond the data a theory is required to explain." Newton's laws didn't just explain falling apples — they explained planetary orbits, tidal patterns, and the trajectories of cannonballs. The scope is what made the theory powerful: the same set of rules governed wildly different phenomena.
For your personal schemas: does your model of "how people resist change" work only in your current team, or does it also explain resistance in other teams, organizations, or contexts you've observed? A schema that works only in one narrow context may be an accurate local description rather than a genuine model. Local descriptions are useful, but they don't transfer. And transfer — the ability to apply a schema in a new context and have it work — is where the real leverage lives.
The caution on scope is overreach. A schema that claims to explain everything ("It's all about incentives" or "Everything is power dynamics") may actually explain nothing — it has become so broad that it cannot be wrong, which means it cannot be tested, which means it has no predictive power. Scope must increase alongside accuracy, not at its expense.
4. Simplicity
Between two schemas that explain the same facts with the same accuracy, prefer the simpler one. This is Occam's Razor, attributed to the 14th-century philosopher William of Ockham: "Entities must not be multiplied beyond necessity."
The principle is not that reality is simple. The principle is that unnecessary complexity is a liability. Every additional element in a schema is an additional place it can break, an additional assumption that must be true, an additional thing you must track. Karl Popper argued that simpler theories are preferable "because their empirical content is greater; and because they are better testable." A simple theory applies to more cases and is more easily falsified — meaning it makes stronger, more specific claims about the world.
Software engineering encodes this as a design principle. Good code has high cohesion (each module does one thing well) and low coupling (modules don't depend excessively on each other). McCabe's cyclomatic complexity metric literally counts the number of independent paths through a program — higher complexity means more paths means more places for bugs to hide. The same is true of schemas: a model with fifteen interacting variables and eight conditional exceptions is not sophisticated. It is fragile. It will break in ways you cannot predict because you cannot hold its full structure in working memory.
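McCabe's metric can be approximated in a few lines: complexity is 1 plus the number of decision points. This sketch uses Python's `ast` module and a simplified node list (real tools count more constructs, so treat this as a rough illustration):

```python
import ast

# Rough McCabe approximation: 1 + one per decision point in the source.
DECISIONS = (ast.If, ast.For, ast.While, ast.IfExp,
             ast.ExceptHandler, ast.BoolOp, ast.comprehension)

def cyclomatic(source: str) -> int:
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISIONS) for node in ast.walk(tree))

simple = "def f(x):\n    return x + 1\n"
branchy = (
    "def g(a, b, c):\n"
    "    if a:\n"                      # decision 1
    "        for i in range(b):\n"     # decision 2
    "            if i % 2 and c:\n"    # decisions 3 and 4 (if + and)
    "                b += i\n"
    "    return b\n"
)

print(cyclomatic(simple))    # 1 -- a single path through the code
print(cyclomatic(branchy))   # 5 -- four decision points, five paths
```

The point transfers directly: every branch you add to a schema ("...unless it's Q4", "...except with senior people") is another path you must keep correct.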
But simplicity is not the same as simplism. A model that is too simple — that ignores real complexity in the phenomenon it claims to explain — is not parsimonious. It is wrong. The Akaike Information Criterion (AIC) in statistics formalizes this tradeoff: it measures model quality as a function of goodness of fit penalized by complexity. The best model is not the simplest possible model. It is the simplest model that still captures the real structure of the data. Underfitting — making a model too simple — is just as much a failure as overfitting.
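The AIC tradeoff is easy to compute. For a least-squares fit with Gaussian errors (constants dropped), AIC = n·ln(RSS/n) + 2k, where n is the sample count, RSS the residual sum of squares, and k the parameter count; lower is better. The three candidate "schemas" and their RSS values below are invented for illustration:

```python
import math

# AIC for a least-squares fit (Gaussian errors, additive constants dropped):
#   AIC = n * ln(RSS / n) + 2k   -- fit reward vs. complexity penalty
def aic(rss: float, n: int, k: int) -> float:
    return n * math.log(rss / n) + 2 * k

n = 50
# Hypothetical fits to the same data: more parameters, smaller residuals.
candidates = {
    "2-variable schema":  (12.0, 2),   # (RSS, parameter count)
    "5-variable schema":  (8.5, 5),
    "15-variable schema": (8.1, 15),
}

for name, (rss, k) in candidates.items():
    print(f"{name}: AIC = {aic(rss, n, k):.1f}")
# The 5-variable model wins: the 15-variable fit's tiny RSS improvement
# does not pay for its ten extra parameters.
```

Going from 5 to 15 variables barely reduces the residuals, so the complexity penalty dominates. That is the formal version of "the additional complexity must earn its keep."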
5. Fruitfulness
A good schema generates new questions, new investigations, new insights you would not have reached without it. Kuhn called this fruitfulness: a theory "should disclose new phenomena or previously unnoted relationships among those already known."
This criterion is underappreciated. A schema can be accurate, predictive, broad, and simple — and still be sterile. It explains what it explains and that is the end of it. A fruitful schema, by contrast, opens doors. It makes you notice things you weren't looking for. It suggests experiments you hadn't considered. It connects domains that seemed unrelated.
Lakatos built his entire framework around this: a progressive research programme is one that keeps producing novel predictions and new discoveries. A degenerative programme just patches its existing explanations with ad hoc modifications. The same test works for personal schemas. Your model of effective leadership — does it keep generating new insights as you encounter new situations? Does it suggest experiments you can run with your team? Or have you been explaining the same observations with the same model for five years without learning anything new?
Fruitfulness is the growth criterion. A schema that is no longer producing new insights has reached its ceiling. It may still be useful as a stable tool, but it is no longer driving learning. And a schema portfolio where nothing is fruitful is a cognitive infrastructure that has stopped evolving.
6. Falsifiability
A schema is falsifiable when there exists, in principle, an observation that would prove it wrong. This was Popper's central contribution to the philosophy of science and it remains the sharpest tool in your quality kit.
If no possible observation could disprove your schema, it is not a model of reality — it is a belief system. "Everything happens for a reason" is unfalsifiable. So is "the market is always right" (because you can redefine "right" after any outcome). So is "people are fundamentally selfish" (because any apparently altruistic behavior can be reframed as secretly self-serving).
Falsifiability does not mean the schema is false. It means the schema is making a claim specific enough that reality could, in principle, disagree. "Teams with unclear ownership of decisions will have longer cycle times" is falsifiable — you can measure ownership clarity and cycle time and check the relationship. "Culture matters" is not falsifiable — it is too vague to test.
The machine learning parallel: a model that can explain any possible output is not a good model. It has too many parameters relative to its training data. It has memorized rather than learned. Falsifiability is the antidote — it demands that your schema rule out some possible outcomes, committing to a specific claim about which observations should and should not occur.
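A schema can be modeled as a predicate over observations: falsifiable means some possible observation makes the predicate false. This sketch uses the ownership/cycle-time claim from above with invented data and an arbitrary 8-day threshold:

```python
# Observations are (ownership_clear, cycle_time_days) pairs -- invented data.
observations = [
    (True, 4), (True, 6), (False, 12), (False, 9), (True, 5), (False, 3),
]

def falsifiable_schema(obs):
    """Claim: clear ownership implies cycle time under 8 days."""
    clear, days = obs
    return days < 8 if clear else True   # commits only where it makes a claim

def unfalsifiable_schema(obs):
    """Claim: 'culture matters'. Compatible with every observation."""
    return True

# A schema survives the data only if no observation contradicts it.
print(all(falsifiable_schema(o) for o in observations))    # True -- so far
print(falsifiable_schema((True, 14)))    # False -- this would refute it
print(unfalsifiable_schema((True, 14)))  # True -- nothing can refute it
```

The falsifiable schema names the observation that would kill it; the unfalsifiable one returns `True` no matter what, which is exactly why it teaches you nothing.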
How the criteria interact
These six criteria are not a checklist where you need a perfect score on all six. They are in tension with each other, and managing that tension is the real skill.
Scope versus accuracy. Broadening scope often reduces accuracy. A model that explains everything explains nothing precisely. The art is finding the right level of generality — broad enough to transfer across contexts, specific enough to make testable claims in each.
Simplicity versus scope. A simpler model typically has narrower scope. Adding variables expands what the model can explain but increases complexity. The question is always: does this additional complexity earn its keep in additional explanatory or predictive power?
Predictive power versus simplicity. Sometimes you need a more complex model to make better predictions. Machine learning has demonstrated this clearly — deep neural networks are vastly more complex than linear regression, but for problems like image recognition, the complexity is justified by dramatically superior prediction. The criterion is not "always be simple." It is "never be more complex than the prediction demands."
Fruitfulness versus accuracy. A speculative schema with low current accuracy but high fruitfulness — one that generates novel, testable predictions — can be more valuable than an accurate but sterile schema. Lakatos would say the speculative schema is in a progressive phase while the accurate-but-static one is degenerating.
Kuhn himself acknowledged this tension. He noted that different scientists can weight these criteria differently and arrive at different theory choices without either being irrational. The same is true for your personal schemas. But you need to make the weighting explicit. If you're choosing a schema because it feels right, you're not using criteria — you're using comfort. The whole point of having criteria is to override the default where emotional resonance substitutes for quality.
Applying the criteria to a real schema
Take a common schema: "Hard work leads to success."
- Accuracy: Mixed. Hard work correlates with success in many domains, but survivorship bias hides the many people who worked extremely hard and did not succeed. Accuracy is moderate, not high.
- Predictive power: Low. The schema cannot tell you which specific hard-working person will succeed. It cannot distinguish productive effort from unproductive effort. It generates no specific predictions about when hard work will fail.
- Scope: Superficially broad — it seems to apply everywhere. But its breadth comes from vagueness ("hard work" and "success" are both undefined), not from genuine explanatory reach.
- Simplicity: High. Two variables, one relationship. But the simplicity is achieved by omitting everything that matters: what kind of work, in what context, with what resources, toward what end.
- Fruitfulness: Low. This schema has generated no novel insights for you in years. It does not make you curious. It does not suggest experiments. It explains the same things the same way it always has.
- Falsifiability: Near zero. Any failure can be attributed to "not working hard enough." Any success confirms the schema. It is functionally unfalsifiable.
Score: high simplicity, moderate accuracy, poor on everything else. This is not a useful operating schema. It is a cultural platitude wearing the costume of a model. A better schema might be: "Focused effort on high-leverage activities, in a domain where your skills match the opportunity, with feedback loops that allow course correction, leads to disproportionate outcomes." That schema is more complex, but it is also more accurate, more predictive, more falsifiable, and more fruitful. The additional complexity earns its keep.
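The verdict above can be turned into a rough scorecard, the kind L-0324 will have you build systematically. The 0-to-5 ratings here are hypothetical, transcribed from the worked example; the weighting function just makes your criterion weights explicit instead of implicit:

```python
# Illustrative scorecard: rate each criterion 0-5 (ratings are hypothetical).
CRITERIA = ["accuracy", "predictive_power", "scope",
            "simplicity", "fruitfulness", "falsifiability"]

schemas = {
    "hard work -> success":         [3, 1, 1, 5, 1, 0],
    "focused high-leverage effort": [4, 3, 3, 2, 4, 4],
}

def score(ratings, weights=None):
    weights = weights or [1] * len(ratings)   # equal weights unless stated
    return sum(r * w for r, w in zip(ratings, weights))

for name, ratings in schemas.items():
    print(name, dict(zip(CRITERIA, ratings)), "total:", score(ratings))
# The refined schema scores higher despite losing points on simplicity.
```

Passing a custom `weights` list is how you make Kuhn's point operational: different weightings are legitimate, but they should be written down, not felt.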
Why this matters for your cognitive infrastructure
You are making decisions every day based on schemas you have never evaluated. Some of those schemas are excellent — battle-tested, predictive, refined over years. Some are junk — absorbed uncritically, never tested, running on emotional resonance alone. You cannot tell the difference without criteria.
The six criteria give you a standard. Not a perfect one — reasonable people can weight the criteria differently. But a standard that is vastly better than the default, which is no standard at all beyond "it feels true" and "I've always thought this."
L-0322 asked you to notice how you form schemas. This lesson gives you the tools to evaluate what you've formed. The next lesson — L-0324, Schema inventory — asks you to apply these criteria systematically: to catalog your most important schemas, rate them on these dimensions, and identify which ones are running your decisions with no empirical support. The scorecard you build becomes the foundation for deliberate schema maintenance — upgrading, replacing, or retiring the models that no longer serve clear thinking.
You have quality criteria now. Use them on the schemas that matter most.