Every schema is a prediction engine
A schema that does not make predictions is not a schema. It is a label.
When you hold a mental model of anything — a person, a market, a process, a relationship — that model implicitly claims to represent how the thing works. And if you know how something works, you should be able to anticipate what it will do next. This is not a high bar. It is the minimum requirement for any model that claims to be useful. A map that cannot tell you what is around the next corner is not a map. It is decoration.
Karl Popper formalized this insight in Conjectures and Refutations (1963) when he argued that the distinguishing mark of a scientific theory is not that it can be verified but that it can be falsified. A theory earns its status by sticking its neck out — by making claims about what will happen that could, in principle, turn out to be wrong. A theory that is compatible with every possible observation tells you nothing about the world. It is, in Popper's phrase, irrefutable — and therefore empty.
Your personal schemas work the same way. If your model of a colleague, a market trend, or your own behavior cannot generate specific predictions about what will happen next, it is not doing cognitive work. It is sitting in your mind occupying space, giving you the feeling of understanding without any of the substance. The test is simple: what does your schema predict? If you cannot answer that question, you do not have a working model. You have an impression.
What prediction actually tests
When you derive a prediction from a schema and check it against reality, you are not just testing whether that specific prediction is correct. You are testing the entire structure of assumptions, inferences, and causal claims that produced the prediction. This is what makes prediction so epistemically powerful — and so different from mere observation.
Observation tells you what happened. Prediction tells you whether your understanding of why things happen is accurate. Consider the difference. You observe that your team missed a deadline. That is a fact, but it tells you nothing about your model. Now suppose you had predicted, based on your schema of how the team works, that they would miss the deadline because the requirements were ambiguous. When you check the actual cause, you discover they missed it because a key dependency was delivered late and the requirements were perfectly clear. Your prediction was correct in outcome but wrong in mechanism. Your schema needs revision — not because it failed to predict the event, but because the causal story it told was inaccurate.
This distinction between outcome accuracy and mechanism accuracy is critical. A schema can get the right answer for the wrong reasons, and those lucky hits will eventually fail you. Philip Tetlock's research, spanning two decades and culminating in Superforecasting: The Art and Science of Prediction (2015), demonstrated that the most accurate forecasters are distinguished not by their ability to guess outcomes but by the quality of their causal reasoning. In the Good Judgment Project, a multi-year tournament involving over 20,000 participants forecasting geopolitical events, the top performers — dubbed "superforecasters" — outperformed professional intelligence analysts with access to classified information by approximately 30 percent. What set them apart was not superior information. It was superior model quality: they held more nuanced schemas, updated them more frequently, and tested them against reality more rigorously.
Calibration: the meta-prediction
There is a prediction about predictions that matters as much as any individual forecast: how well-calibrated are you?
Calibration is the correspondence between your stated confidence and actual outcomes. If you say you are 70 percent confident in something, and you make a hundred such predictions, roughly seventy of them should come true. If ninety come true, you are underconfident — your schemas are better than you think. If only forty come true, you are overconfident — your schemas are worse than you think. Both miscalibrations are informative. Both point to systematic distortions in how you use your models.
The Brier score, the standard measure used in forecasting research, captures this precisely. It calculates the mean squared difference between your predicted probabilities and the actual binary outcomes. A perfect Brier score is 0 (every prediction was exactly right at 100 percent confidence); the worst possible score is 2. Random guessing on binary events yields a score of 0.5. Tetlock's superforecasters consistently achieved Brier scores around 0.20 — a level of accuracy that, sustained over hundreds of predictions, requires both good schemas and honest self-assessment about how good those schemas are.
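As a concrete sketch, the two-outcome form of the Brier score described above, plus a basic calibration check, might look like this in Python (the sample data is illustrative):

```python
from collections import defaultdict

def brier_score(forecasts):
    """Mean squared difference between predicted probabilities and binary
    outcomes, in the two-outcome form used in forecasting tournaments:
    0 is perfect, 2 is the worst possible, and coin-flipping scores 0.5."""
    return sum(2 * (p - o) ** 2 for p, o in forecasts) / len(forecasts)

def calibration(forecasts):
    """Group forecasts by stated confidence (rounded to the nearest 10
    percent) and return the observed hit rate at each confidence level."""
    buckets = defaultdict(list)
    for p, o in forecasts:
        buckets[round(p, 1)].append(o)
    return {conf: sum(outs) / len(outs) for conf, outs in sorted(buckets.items())}

# Guessing 50 percent on every binary event scores exactly 0.5:
print(brier_score([(0.5, 1), (0.5, 0)]))  # 0.5

# Ten predictions made at 70 percent confidence, seven of which came true,
# are perfectly calibrated at the 0.7 level:
log = [(0.7, 1)] * 7 + [(0.7, 0)] * 3
print(calibration(log))  # {0.7: 0.7}
```

Comparing each row of the calibration table against its key is the "roughly seventy out of a hundred" test in executable form.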
The lesson for personal epistemology is that calibration is a learnable skill. Research from Atanasov, Mellers, Tetlock, and colleagues, published in Organizational Behavior and Human Decision Processes (2020), analyzed over 80,000 forecasts from 515 forecasters and found that the most accurate forecasters updated their beliefs in small, frequent increments. They did not wait for dramatic disconfirmation. They made minor adjustments continuously as evidence accumulated — exactly the way a well-functioning schema should revise itself. In contrast, poor forecasters either stubbornly confirmed their initial judgments or made rare, dramatic reversals. The pattern is clear: calibration comes from treating each prediction as a data point about your model, not as a verdict on your intelligence.
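The "small, frequent increments" pattern can be illustrated with a toy odds-form Bayesian update; the numbers here are illustrative, not drawn from the study:

```python
def bayes_update(prior, likelihood_ratio):
    """Odds-form Bayes rule: posterior odds = prior odds x likelihood ratio."""
    odds = prior / (1 - prior)
    posterior_odds = odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Ten small updates, each on weak evidence (likelihood ratio 1.2), move a
# 50 percent belief to roughly 86 percent -- no single dramatic reversal.
p = 0.5
for _ in range(10):
    p = bayes_update(p, 1.2)
print(round(p, 2))  # 0.86
```

Each step barely moves the belief, but the accumulated shift is large — the arithmetic counterpart of minor adjustments made continuously as evidence arrives.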
The predictive processing brain
The idea that prediction is central to cognition is not a metaphor. It is the dominant framework in contemporary neuroscience.
Andy Clark's predictive processing theory, developed across his 2013 paper "Whatever Next?" and his 2023 book The Experience Machine, proposes that the brain is fundamentally a prediction machine. Rather than passively receiving sensory input and building up a picture of the world, the brain generates cascading top-down predictions about what it expects to perceive. Incoming sensory signals are compared against these predictions, and only the mismatches — the prediction errors — are propagated upward for processing. What you consciously experience is not raw reality. It is your brain's best prediction of reality, corrected by whatever does not match.
This architecture has a direct implication for schema validation. Your schemas are, in neural terms, the generative models that produce predictions. When a prediction error occurs — when reality does not match what your schema expected — that error signal is precisely the information that drives model updating. A schema that never generates prediction errors is either perfectly accurate (unlikely) or so vague that it never makes contact with specific sensory evidence (far more likely). The brain itself treats prediction failure as the primary signal for learning. Your conscious practice of schema validation through prediction is the deliberate, reflective version of what your neural circuits do automatically.
The precision of the prediction matters. In Bayesian terms, each prediction carries not just a point estimate but a precision: a weight that encodes how certain you are of it. A more precise prior is harder to update because it takes stronger evidence to shift it. This is why, as research on the Bayesian brain has shown, overconfident schemas are epistemically dangerous: they resist updating in the face of contradictory evidence, not because the evidence is weak but because the schema's self-assessed precision is too high. When you make a prediction and assign it a confidence level, you are not just testing the schema. You are testing your meta-schema — your model of how good your model is.
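The precision dynamic can be made concrete with a standard conjugate Gaussian update (the numbers are illustrative):

```python
def precision_weighted_update(prior_mean, prior_precision, obs, obs_precision):
    """Conjugate Gaussian update: the posterior mean is a precision-weighted
    average of the prior and the observation. Precision is inverse variance."""
    post_precision = prior_precision + obs_precision
    post_mean = (prior_precision * prior_mean + obs_precision * obs) / post_precision
    return post_mean, post_precision

# The same surprising observation, fed to a humble and an overconfident schema:
humble_mean, _ = precision_weighted_update(0.0, 1.0, obs=10.0, obs_precision=1.0)
confident_mean, _ = precision_weighted_update(0.0, 100.0, obs=10.0, obs_precision=1.0)
print(humble_mean)     # 5.0 -- the belief moves halfway toward the evidence
print(confident_mean)  # ~0.1 -- the overconfident prior barely moves
```

The evidence is identical in both calls; only the schema's self-assessed precision differs, and that alone determines how far the belief moves.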
From theory to practice: the prediction log
The bridge between theory and personal epistemology is a concrete tool: the prediction log.
A prediction log is exactly what it sounds like — a dated record of specific predictions derived from your schemas, written down before the relevant events occur, and scored after the outcomes are known. This practice is simple, but its effects compound. Over weeks and months, a prediction log reveals patterns that no single observation can: which domains you predict accurately, where you are systematically overconfident, where your schemas degrade under specific conditions, and which models you should trust and which you should treat as provisional.
The mechanics matter. A useful prediction log requires three properties:
Specificity. "The project will go well" is not a prediction. "The first deliverable will be completed within one week of the deadline, but the second will slip by at least two weeks because the second team has an unresolved dependency" is a prediction. The more specific your prediction, the more precisely it tests your schema. Vague predictions pass vague schemas.
Timestamping. You must record the prediction before the outcome is known. This is non-negotiable. Human memory is reconstructive — after the fact, you will remember having predicted whatever actually happened. This is hindsight bias, and it is universal. The timestamp is your defense against it. If the prediction is not written down before the event, it is not a prediction. It is a narrative.
Honest scoring. After the outcome, you must assess the prediction against what you actually wrote, not against what you now think you meant. This is where most prediction practices break down. The temptation to reinterpret a failed prediction as "basically right" or "right in spirit" is overwhelming. Resist it. A prediction that was wrong was wrong. That wrongness is the data.
The superforecasters in Tetlock's research practiced exactly this discipline. They did not predict casually. They predicted deliberately, recorded their confidence levels, tracked outcomes, and used the results to recalibrate their models. This was not natural talent. It was a practice — a specific, repeatable behavior that anyone can adopt.
Why accountability sharpens prediction
Prediction markets — systems like Metaculus and Polymarket where participants trade contracts on future events — demonstrate a principle that matters for personal practice: accuracy improves when something is at stake. Research shows that prediction markets forecast replication outcomes in social science with 73 percent accuracy and outperform election polls at longer horizons. The mechanism is skin in the game. When you must bet on your beliefs, you cannot casually overstate your confidence.
You do not need a prediction market to import this principle. When your prediction log is private and no one ever sees it, the temptation to fudge scores is high. When you share predictions with a colleague or use them to make real decisions, the quality improves because the cost of being wrong becomes tangible. Tetlock's superforecasters were not forecasting for entertainment. They were competing in a tournament where accuracy was publicly tracked. The environment enforced the honesty that personal practice must create deliberately.
AI and the Third Brain: prediction as infrastructure
Large language models are, at their core, prediction machines. They predict the next token in a sequence based on patterns learned from training data. But this narrow predictive ability obscures a more useful capability for your epistemic practice: LLMs can help you generate, evaluate, and track predictions from your schemas.
Consider the workflow. You hold a schema about how your industry will evolve over the next year. Instead of testing it in isolation, you can articulate the schema to an LLM and ask it to derive specific, falsifiable predictions from your model. The LLM does not know whether your schema is correct — but it is very good at tracing the logical implications of a set of assumptions. If your schema implies that a certain market segment will contract, the LLM can help you identify the specific observable indicators you should watch for, the timeline over which the contraction should become visible, and the alternative explanations you should consider if the indicators do not materialize.
This is not outsourcing your thinking. It is instrumenting it. The schema remains yours. The predictions remain yours. But the process of deriving rigorous, specific, timestamped predictions from a model — which most people skip because it is cognitively effortful — becomes dramatically easier when you can externalize the derivation step to a system that does not get tired, does not get emotionally attached to the schema, and does not unconsciously avoid predictions that might disconfirm a cherished belief.
The Third Brain pattern here is straightforward: your schemas are the model layer, your prediction log is the evaluation layer, and AI is the derivation engine that connects the two. Over time, this creates a feedback loop — schema generates predictions, predictions meet reality, reality updates schema — that runs with a rigor and consistency that unaided cognition cannot sustain.
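A hypothetical skeleton of that loop — every name here is illustrative, and the `derive` step is where an LLM might plug in:

```python
def validation_loop(schema, derive, observe, revise, rounds):
    """Run the schema -> predictions -> reality -> revision cycle.
    derive:  schema -> list of falsifiable predictions (derivation engine)
    observe: prediction -> actual outcome (contact with reality)
    revise:  (schema, scored results) -> updated schema (model layer)"""
    log = []  # the evaluation layer: every prediction paired with its outcome
    for _ in range(rounds):
        predictions = derive(schema)
        results = [(p, observe(p)) for p in predictions]
        log.extend(results)
        schema = revise(schema, results)
    return schema, log
```

The structure is the point: predictions are always recorded before revision happens, so the log preserves exactly what each version of the schema committed to.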
The epistemics of prediction failure
There is one more thing that prediction testing reveals, and it is perhaps the most important.
When a prediction fails, it does not just tell you that your schema is wrong. It tells you where your schema is wrong. The pattern of failures is diagnostic. If your predictions about a colleague's behavior fail specifically when the situation involves public visibility but succeed in private settings, the failure pattern is pointing at a specific feature your schema is missing. If your predictions about a market fail in the short term but succeed in the long term, the failure pattern is telling you that your schema captures structural forces but misses timing dynamics.
This diagnostic power is what makes prediction testing categorically superior to simple observation. Observation tells you what happened. Prediction testing tells you which piece of your model is broken. And that diagnostic specificity is exactly what you need to repair your schemas efficiently rather than discarding them wholesale and starting over.
L-0285 will explore this further — the principle that failed predictions are data, not failures. But the foundation is here: predictions test schemas because predictions force schemas to commit. A schema that has committed to a specific claim about the future can be evaluated, calibrated, and refined. A schema that has never committed to anything is unfalsifiable — and therefore, by Popper's criterion and by practical experience, useless.
Make your schemas predict. Then watch what happens.
Sources
- Atanasov, P., Witkowski, J., Ungar, L., Mellers, B., & Tetlock, P. (2020). Small steps to accuracy: Incremental belief updaters are better forecasters. Organizational Behavior and Human Decision Processes, 160, 19-35.
- Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181-204.
- Clark, A. (2023). The Experience Machine: How Our Minds Predict and Shape Reality. Pantheon.
- Mellers, B., Stone, E., Atanasov, P., Rohrbaugh, N., Metz, S. E., Ungar, L., Bishop, M. M., Horowitz, M., Merkle, E., & Tetlock, P. (2015). The psychology of intelligence analysis: Drivers of prediction accuracy in world politics. Journal of Experimental Psychology: Applied, 21(1), 1-14.
- Popper, K. (1963). Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge.
- Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.