You didn't find those categories. You built them.
Open any knowledge management system, any database schema, any org chart, and you will find categories presented as though they were features of the territory — natural joints in reality that someone merely noticed and labeled. Projects are "active" or "archived." Tasks are "bugs" or "features." People are "technical" or "non-technical." Food is "healthy" or "unhealthy."
These divisions feel obvious. They feel discovered. And that feeling is precisely the problem.
Every category you use was constructed by someone, for some purpose, in some context. The question is never "is this the right category?" The question is "right for what?" Once you see this, you cannot unsee it — and your ability to build, evaluate, and improve classification systems transforms permanently.
The philosophical foundation: categories as human acts
The insight that categories are constructed rather than discovered is not new, but its implications are still underappreciated. Three thinkers in particular laid the groundwork that makes this lesson operational.
Michel Foucault opened The Order of Things (1966) with a passage from a Jorge Luis Borges essay describing a fictitious Chinese encyclopedia that divides animals into categories including "belonging to the Emperor," "embalmed," "trained," "sucking pigs," "sirens," "drawn with a very fine camelhair brush," and "that from a long way off look like flies." Foucault wrote that this passage "shattered all the familiar landmarks of my thought" — not because the categories were wrong, but because they exposed the arbitrariness of every classification system, including the ones we treat as natural. His central argument: each historical period operates within an episteme, a largely invisible set of rules that determines what can be categorized, how, and by whom. During the Classical era, Western knowledge organized itself through taxonomy and representation — identity and difference. During the Modern era, new disciplines like biology, economics, and linguistics emerged with entirely different classification regimes. The categories didn't change because reality changed. They changed because the purposes and assumptions of the classifiers changed.
Ian Hacking, in his landmark essay "Making Up People" (1986), went further. He showed that when we create categories of people — "the obese," "the autistic," "the gifted" — the categories don't just describe a pre-existing population. They create new ways of being a person. Hacking called this dynamic nominalism: the idea that naming interacts with the named. Unlike quarks, which do not care what physicists call them, human beings respond to being categorized. A person classified as "learning disabled" in the 1970s didn't just receive a label — they received a new identity, a new set of institutional pathways, and a new set of possible life trajectories. Hacking called this feedback loop the looping effect: the classification changes the classified, which in turn changes the meaning of the classification, which changes the classified again. Categories of people are never stable because the people inside them are always responding to the category itself.
Geoffrey Bowker and Susan Leigh Star, in Sorting Things Out: Classification and Its Consequences (1999), brought this from philosophy into infrastructure. They studied the classification systems that undergird institutions — the International Classification of Diseases, the Nursing Interventions Classification, and most viscerally, the racial classification system of apartheid South Africa. Their core argument: classification systems are not neutral technical infrastructure. They are moral and political acts. "Each standard and each category valorizes some point of view and silences another." The apartheid regime required every person to be classified into one of four racial groups — Europeans, Asiatics, Coloureds, and Natives — and those classifications determined where you could live, work, go to school, and travel. The system was one-dimensional and could not account for the variability of human ethnicity, but its consequences were devastating and lasted four decades. Bowker and Star's point is not that all classification is oppressive — it is that all classification has consequences, and treating categories as "just how things are" blinds you to those consequences.
The case that broke the illusion: race
No example demonstrates the construction of categories more starkly than race. For centuries, Western science treated racial categories as biological discoveries — natural divisions in the human species that taxonomy merely identified and labeled. The assumption was so deep that it shaped medicine, law, education, and daily life.
The genomic evidence destroyed that assumption. Studies of complete genomes across global populations have shown that there is not a single absolute genetic difference between continental groups — no single variant where all members of one "race" have one allele and all members of another have a different one. Genetic variation within any so-called racial group far exceeds the variation between groups. As a 2023 article in Scientific American summarized the scientific consensus: "Race is a social construct without biological meaning." There is no accepted scientific method for classifying humans into races.
But here is the crucial nuance: saying race is constructed does not mean it is unreal. Race as a social category has profound biological consequences. Being classified as a particular race affects exposure to environmental toxins, access to healthcare, rates of incarceration, neighborhood quality, and chronic stress — all of which produce measurable physiological differences. The category was constructed, but once constructed, it shaped the reality it claimed merely to describe. This is Hacking's looping effect operating at civilizational scale.
The lesson for your own epistemic practice is direct: any time you hear "that's just what it is" about a category, you are hearing someone who has forgotten — or never learned — that the category was built for a purpose, and you should ask what that purpose was and whether it still serves.
Categories in engineering: the schema shapes the world
If you work with data, you make category decisions every day — and every one of those decisions constrains what questions can be asked downstream.
Consider a simple example. You are designing a customer database. You create a field called customer_type with two values: individual and business. That binary seems natural. But now a sole proprietor signs up — are they individual or business? A nonprofit registers — business? A government agency — neither? Your two-category schema has created a world in which some real entities cannot be accurately represented. Every report, every query, every machine learning model trained on this data will inherit the assumptions baked into those two values.
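A minimal sketch makes the failure concrete. All names here are hypothetical, invented for illustration: the moment an entity arrives that the two values cannot represent, the schema itself rejects reality.

```python
from enum import Enum

class CustomerType(Enum):
    INDIVIDUAL = "individual"
    BUSINESS = "business"

def classify(entity: str) -> CustomerType:
    # The schema forces every real entity into one of two boxes.
    mapping = {
        "freelancer": CustomerType.INDIVIDUAL,  # sole proprietor: forced fit
        "acme_corp": CustomerType.BUSINESS,
        # "food_bank": ?      a nonprofit has no accurate value
        # "city_agency": ?    a government agency has no accurate value
    }
    if entity not in mapping:
        raise KeyError(f"{entity!r} cannot be represented in this schema")
    return mapping[entity]
```

Every downstream consumer of this field inherits the forced fits and the exclusions alike.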
This is not a theoretical problem. Database schema decisions are foundational choices that cascade through the entire data engineering and analysis pipeline. Choosing a star schema over a snowflake schema, for instance, is a design decision that directly affects performance, maintainability, and the long-term usability of your analytics environment. But long before you reach schema topology, you have already made the more consequential choice: what categories exist at all?
When a product team decides that user feedback falls into the categories "bug," "feature request," and "question," they have constructed a world in which "confusion caused by our own documentation" has no natural home. When an HR system classifies employees as "full-time," "part-time," and "contractor," it has constructed a world in which the growing population of fractional executives, project-based specialists, and portfolio workers is invisible. The schema is not describing reality. It is carving reality into shapes that the system can process — and everything that doesn't fit those shapes gets distorted or dropped.
The engineering discipline here is to treat every category in your data model as a hypothesis rather than a fact. The field status: active | inactive is a hypothesis that these two states are sufficient to capture the lifecycle of the entity. The enum priority: low | medium | high | critical is a hypothesis about the granularity of urgency that serves your team's triage process. When you treat categories as hypotheses, you build in the expectation that they will need revision — and you design your systems to accommodate that revision rather than resist it.
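One way to operationalize "category as hypothesis" is to represent the category set as data the system can extend, rather than as a frozen type. A sketch, with illustrative names not drawn from any real system:

```python
# Hypothesis v1: two states suffice to capture the entity's lifecycle.
VALID_STATUSES = {"active", "inactive"}

def set_status(record: dict, status: str) -> dict:
    # Validate against the current hypothesis instead of a hard-coded enum.
    if status not in VALID_STATUSES:
        raise ValueError(
            f"unknown status {status!r}; if the lifecycle has grown, "
            "revise VALID_STATUSES"
        )
    record["status"] = status
    return record

# When the two-state hypothesis fails (records can now be suspended),
# revision is a one-line data change, not a type migration:
VALID_STATUSES.add("suspended")
```

The design choice is that the check names the revision path in its own error message, so a failed classification prompts a review of the hypothesis rather than a workaround.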
AI and the construction of machine categories
The construction of categories becomes especially consequential — and especially invisible — in machine learning. When an AI system classifies images, text, or behavior, it does so using categories that humans constructed during the labeling phase. The AI does not discover what a "cat" is. It learns to replicate the category boundaries that thousands of human annotators drew when they labeled training images.
ImageNet, the dataset that catalyzed the deep learning revolution, was built by scraping millions of images from the internet and having workers on Amazon Mechanical Turk classify them according to categories drawn from WordNet, a hierarchical lexical database. The process involved 49,000 workers from 167 countries verifying over 14 million images. The category structure was not neutral. WordNet categories were designed for linguistic research, not visual classification. Some categories — particularly those describing people — translated poorly from verbal terms to visual recognition, retrieving only the most stereotypical image search results.
Kate Crawford and Trevor Paglen exposed this in their 2019 project "Excavating AI," which examined ImageNet's "person" categories and found labels like "slattern," "alcoholic," and "failure" applied to photographs of real people. Their central argument: "Datasets aren't simply raw materials to feed algorithms, but are political interventions." There is no neutral, natural, or apolitical vantage point from which training data can be constructed.
A 2021 study by researchers at MIT and Amazon found systematic labeling errors averaging 3.4% across ten major benchmark datasets — with error rates reaching 10% in some cases. In ImageNet specifically, 5.83% of test labels were incorrect. One breed of dog was confused with another. A baby was labeled as a nipple. These are not random mistakes — they are artifacts of the category construction process: the ambiguity of the categories themselves, the speed and context of the labeling work, and the assumptions embedded in the taxonomic structure.
The implication for anyone building or using AI systems is that the model's categories are your categories. If you label customer support tickets as "positive" or "negative" to train a sentiment classifier, the model will replicate whatever implicit theory of sentiment your labelers used. If the category boundaries were unclear, the model's boundaries will be unclear. If the categories were biased, the model will be biased. The AI does not correct for constructed categories — it amplifies them.
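You can often detect unclear category boundaries before training by measuring how much annotators disagree. A minimal sketch, with illustrative data: compute the fraction of items on which every annotator chose the same label.

```python
def agreement_rate(annotations: list[list[str]]) -> float:
    """Fraction of items where all annotators chose the same label.

    Low agreement signals that the category boundaries themselves are
    unclear -- a problem no model trained on these labels can fix.
    """
    unanimous = sum(1 for labels in annotations if len(set(labels)) == 1)
    return unanimous / len(annotations)

# Three annotators label four support tickets as positive/negative:
tickets = [
    ["positive", "positive", "positive"],
    ["negative", "negative", "positive"],  # boundary case
    ["negative", "negative", "negative"],
    ["positive", "negative", "positive"],  # boundary case
]
print(agreement_rate(tickets))  # 0.5
```

If half your items are contested, the constructed category, not the model, is the first thing to revise.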
Your Third Brain: categories as infrastructure you maintain
If you use any kind of personal knowledge system — a note-taking app, a task manager, a personal wiki, a journal — you are maintaining a classification infrastructure. And the same principles apply. The categories in your system were constructed, not discovered. They serve purposes. And when those purposes change, the categories need to change too.
When you work with AI as a thinking partner — what this curriculum calls your Third Brain — the categories you use become the interface between your thinking and the machine's capabilities. If you ask an AI to "organize these notes by topic," the AI will construct categories based on surface patterns in your text. Those categories may or may not match your actual epistemic purposes. If you instead provide your own category structure — "classify these notes as observations, hypotheses, or open questions" — you are constructing categories that serve your thinking process, and the AI becomes a tool for applying your classification rather than imposing its own.
This is the operational difference between someone who treats categories as found objects and someone who treats them as designed infrastructure. The first person accepts whatever categories their tools, their culture, or their AI provides. The second person asks: what am I trying to see? What am I trying to do? What categories would serve that purpose? And then builds accordingly.
A protocol for constructing categories deliberately
Here is a five-step protocol you can apply any time you create or inherit a classification system:
1. Name the purpose. Every category should have an answer to the question: "This category exists so that [who] can [do what]." If you cannot complete that sentence, the category is inherited furniture, not functional infrastructure.

2. Identify what it hides. Every categorization scheme makes some things visible and others invisible. A project labeled "on track" or "at risk" hides projects that are technically on schedule but strategically pointless. A person labeled "senior engineer" hides the difference between someone with ten years of experience and someone with one year of experience repeated ten times. Ask: what reality does this category make hard to see?

3. Check for looping effects. If you are categorizing people — team members, customers, students, yourself — ask whether the category is changing the behavior of the categorized. Labeling someone a "low performer" changes how you interact with them, which changes their performance, which confirms the label. This is Hacking's looping effect in your own system.

4. Design for revision. If you know categories are constructed, you know they will need to change when purposes change. Build your systems — your databases, your note structures, your mental models — with reclassification as a first-class operation, not an emergency procedure. This sets up the principle explored in L-0233: reclassification is not failure.

5. Distinguish constructed from arbitrary. Saying categories are constructed does not mean all categorizations are equally good. A constructed bridge is not arbitrary — it serves a purpose, obeys constraints, and can be evaluated against criteria. Your categories should be evaluated the same way: do they serve the purpose they were built for? Do they handle the cases that actually arise? Do they produce useful distinctions rather than noise?
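The five steps above can be captured as a record you fill in whenever you create or inherit a category. A hypothetical sketch; the field names mirror the steps and every example value is invented:

```python
from dataclasses import dataclass

@dataclass
class CategoryAudit:
    name: str
    purpose: str        # 1. "exists so that [who] can [do what]"
    hides: list[str]    # 2. what this category makes hard to see
    looping_risk: bool  # 3. does it change the behavior of the categorized?
    revision_plan: str  # 4. how reclassification happens, routinely
    evaluation: str     # 5. criteria the category can be judged against

    def is_inherited_furniture(self) -> bool:
        # A category with no statable purpose fails step 1.
        return not self.purpose.strip()

audit = CategoryAudit(
    name="at risk",
    purpose="so that leads can triage projects needing intervention",
    hides=["projects on schedule but strategically pointless"],
    looping_risk=True,
    revision_plan="review labels quarterly against outcomes",
    evaluation="does flagging actually change project trajectories?",
)
```

An audit you cannot complete is itself the finding: the blank fields tell you which step the category fails.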
What this changes
When you internalize that categories are constructed, three things shift in your epistemic practice:
You stop arguing about "the right category" and start asking "right for what purpose." Disagreements about how to classify something — is this a bug or a feature request? is this person an introvert or an extrovert? is this task urgent or important? — become productive design conversations instead of factual disputes.
You gain the ability to reclassify without crisis. If categories are constructed, then changing them is maintenance, not failure. Your project management categories should evolve as your projects evolve. Your personal knowledge taxonomy should evolve as your understanding deepens. Rigidity in classification is not a sign of clarity. It is a sign of forgetting that you built the system in the first place.
You see the politics of classification. Every category in an institutional system — medical diagnosis codes, HR job levels, content moderation policies, credit scores — reflects someone's priorities and someone's blind spots. You can evaluate those systems, advocate for changes, and design alternatives. Not because all categories are bad, but because all categories are choices, and choices can be made better.
The previous lesson established that classification is how you carve reality into categories. This lesson establishes the more radical point: the carving is yours. You are not discovering joints in nature. You are building joints that serve your purposes. The next lesson — that explicit categories beat implicit ones — follows directly. If categories are constructed, then the quality of your thinking depends on constructing them deliberately rather than inheriting them unconsciously.