Your categories are only as good as what breaks them
You have a system for organizing your work. Maybe it's a project management board with labels like "bug," "feature," and "improvement." Maybe it's a mental model that divides people into "technical" and "non-technical." Maybe it's a personal rule that distinguishes "productive time" from "wasted time."
Whatever the system, here is the question that tests whether it actually works: what do you do with the thing that doesn't fit?
The bug report that is also a feature request. The colleague who codes fluently but can't explain what they built. The hour you spent reading a research paper that might never apply to anything but fundamentally changed how you think about a problem.
These aren't annoyances. They are the most valuable data your classification system will ever produce. Every item that resists clean categorization is pointing directly at a boundary that needs refinement, a dimension you haven't accounted for, or an assumption you didn't know you were making.
Boundary cases don't break your categories. They reveal that your categories were already broken. You just hadn't noticed yet.
The heap problem: where every boundary fails
Philosophers have a name for the oldest boundary problem in recorded thought. The Sorites paradox — from the Greek soros, meaning "heap" — asks a question that sounds trivial until you try to answer it: how many grains of sand make a heap?
One grain is not a heap. If you add a single grain to something that is not a heap, it seems impossible that one grain could transform a non-heap into a heap. Therefore two grains are not a heap. By the same logic, three grains are not a heap. Follow this reasoning grain by grain and you arrive at an absurd conclusion: no number of grains of sand can ever constitute a heap.
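The argument can be made executable, which makes its bite obvious. In this minimal sketch (the threshold value is arbitrary, which is exactly the point), any programmatic definition of "heap" must pick a sharp cutoff, and that cutoff recreates the implausible step the paradox forbids: a single grain that transforms a non-heap into a heap.

```python
# Any executable definition of "heap" needs a sharp cutoff. The value below
# is arbitrary, and that is the point: whatever we pick, one grain becomes
# decisive, which is exactly the step the paradox says is absurd.
HEAP_THRESHOLD = 10_000

def is_heap(grains: int) -> bool:
    return grains >= HEAP_THRESHOLD

# Walk grain by grain and find the flip point.
flip = next(n for n in range(1, 2 * HEAP_THRESHOLD)
            if is_heap(n) and not is_heap(n - 1))
print(flip)                              # 10000: one grain "creates" the heap
print(is_heap(flip - 1), is_heap(flip))  # False True
```

No matter what value replaces `HEAP_THRESHOLD`, the flip point exists, and at that point 9,999 grains are "not a heap" while 10,000 are. The code does not resolve the paradox; it demonstrates that sharpening the boundary is what produces the absurdity.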
The paradox works because the concept "heap" has no sharp boundary. There is no specific number n at which n - 1 grains fail to be a heap and n grains succeed. The category is vague not because we haven't thought about it carefully enough, but because vagueness is intrinsic to how the concept works. The Stanford Encyclopedia of Philosophy describes the Sorites as "generated by vague terms, viz., terms with unclear ('blurred' or 'fuzzy') boundaries of application" — and notes that "bald," "tall," "old," and "blue" all suffer from the same structural problem.
This isn't an abstract philosophical curiosity. Every category you use in daily thinking has the same vulnerability. When does a "small" project become a "medium" one? When does a "junior" developer become "senior"? When does "being cautious" become "being avoidant"? You use these categories constantly, and every one of them dissolves under the same grain-by-grain pressure that destroys the heap.
The epistemically honest response is not to stop using vague categories — they are indispensable for practical reasoning. The response is to expect that your boundaries will fail at the edges and to build systems that handle that failure gracefully.
When the Supreme Court had to classify a tomato
If boundary cases seem theoretical, consider that the United States Supreme Court once had to decide whether a tomato is a fruit or a vegetable. The answer had nothing to do with biology and everything to do with where you draw a line.
In Nix v. Hedden (1893), the Nix family, who imported tomatoes from the West Indies, argued that their goods should enter the country duty-free because tomatoes are botanically a fruit — the mature ovary of a flowering plant, containing seeds. Under the Tariff Act of 1883, fruits were exempt from import duties. Vegetables were not.
Justice Horace Gray, writing for a unanimous Court, ruled that tomatoes are vegetables. Not because the botany was wrong, but because the Tariff Act used words "in their ordinary meaning" rather than their technical scientific definition. In common usage, people eat tomatoes as part of a main course, not as dessert. Therefore: vegetable.
This case is famous because it sounds absurd, but the underlying principle is serious and generalizable. The tomato didn't change. The purpose of the classification changed — and when the purpose changed, the correct categorization changed with it. The tomato sat at the boundary between two categories because those categories were designed for different questions. Botany asks "what is the reproductive structure of this plant?" Tariff law asks "how do people use this product at the dinner table?"
Every boundary case you encounter in your own thinking carries this same lesson. When something doesn't fit your categories, the first question to ask is not "where should I force this item?" but "what question were my categories designed to answer — and is that still the right question?"
Boundary objects: things that live between worlds
The sociologists Susan Leigh Star and James R. Griesemer formalized this insight in 1989, in one of the most-cited papers ever published in Social Studies of Science. Studying how amateur naturalists and professional zoologists collaborated at Berkeley's Museum of Vertebrate Zoology, they identified a class of artifacts they called "boundary objects": things that sit at the intersection of multiple communities, adaptable to different viewpoints yet robust enough to maintain identity across them.
A field specimen, for instance, meant different things to the amateur collector (a trophy of observation skill), the professional taxonomist (a data point in a classification system), and the museum administrator (an inventory item justifying funding). The specimen didn't fit neatly into any single category because it simultaneously inhabited all three. And rather than causing problems, this category-spanning quality was exactly what made collaboration possible. The boundary object served as a translation layer between groups that would otherwise have had no common vocabulary.
Star and Griesemer identified four types of boundary objects: repositories (libraries, museums), ideal types (diagrams, atlases), coincident boundaries (objects with the same edges but different contents depending on who's looking), and standardized forms (shared templates that each group fills in differently). All four share the defining property: they are "weakly structured in common use" but "become strongly structured in individual-site use."
The practical takeaway is that some items resist categorization not because your system is broken but because they occupy a genuinely productive position between categories. Forcing them into one box doesn't solve a problem — it destroys information. The boundary object's ambiguity is its function.
Software engineers already know this
If there is one discipline that has turned boundary-case thinking into a formal methodology, it is software testing. Boundary value analysis (BVA) is a standard testing technique built on a single empirical observation: bugs cluster at the edges of input ranges, not in the middle.
When a function accepts an integer between 1 and 100, the bugs will not typically appear when you pass 50. They appear at 0, 1, 2, 99, 100, and 101 — the values on or just outside the boundary of the valid range. Industry data suggests that over 70 percent of input-related errors occur at boundaries, which is why BVA is one of the first testing techniques taught to new engineers.
The methodology is systematic. For every constrained input, you test the minimum valid value, the maximum valid value, one step inside each boundary, and one step outside each boundary. You don't test the middle because the middle is boring — it's where things work. The boundaries are where assumptions get exposed.
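The recipe above can be sketched in a few lines. Here `accepts` is a hypothetical validator standing in for the function under test, assumed to take an integer in the inclusive range [1, 100]:

```python
# Hypothetical validator standing in for the function under test:
# accepts an integer in the inclusive range [1, 100].
def accepts(n: int) -> bool:
    return 1 <= n <= 100

def boundary_values(lo: int, hi: int) -> list[int]:
    """Classic BVA picks: just outside, on, and just inside each boundary."""
    return [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]

# Note what is absent: nothing near 50. The middle is where things work.
for n in boundary_values(1, 100):
    print(f"{n:>4} -> {accepts(n)}")
```

Running this exercises exactly the six values named above (0, 1, 2, 99, 100, 101); an off-by-one bug such as `1 < n <= 100` would fail at `n = 1` immediately, while a test at `n = 50` would never notice it.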
This principle transfers directly to epistemic categories. If you have a definition of "urgent" for your task system, the most informative items are not the ones that are obviously urgent or obviously not urgent. They are the ones that sit at the edge — the task that might be urgent depending on how you define the term. Test your categories at the boundaries, not in the comfortable center, and you will find the weaknesses before they cause failures.
Neural networks fail at exactly the same boundaries
In 2014, Ian Goodfellow and colleagues at Google demonstrated something disturbing about artificial intelligence: neural networks are catastrophically vulnerable to boundary cases. Using a technique called the Fast Gradient Sign Method (FGSM), they showed that adding imperceptible noise to an image could cause a state-of-the-art classifier to completely misidentify it.
The most famous example: an image correctly classified as a "panda" with 57 percent confidence was modified by adding a tiny perturbation — invisible to the human eye — and the same network reclassified it as a "gibbon" with 99.3 percent confidence. The panda still looked exactly like a panda to any human observer. But the perturbation pushed the image's mathematical representation across the classifier's decision boundary, from one category into another.
These adversarial examples work because they exploit the exact boundary between categories in the model's learned representation space. The neural network has drawn a line in high-dimensional space that separates "panda" from "gibbon," and adversarial attacks find inputs that sit right on that line — then nudge them a fraction to the wrong side. The line itself is the vulnerability.
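The mechanism can be sketched with a toy linear classifier rather than a real network (Goodfellow's own paper explains the effect through this linear lens). The assumption being illustrated: in high dimensions, a per-coordinate change of eps shifts the score by roughly eps times the dimension, enough to carry an input across the decision boundary while remaining imperceptible per pixel.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100_000                          # high-dimensional input, like an image
w = rng.choice([-1.0, 1.0], size=d)  # toy linear classifier: score = w @ x

def label(x: np.ndarray) -> str:
    return "panda" if w @ x >= 0 else "gibbon"

# Build an input the classifier gets right, with a modest margin of 5.
x = rng.normal(0.0, 1.0, size=d)
x -= w * (w @ x) / d                 # remove the component along w
x += w * (5.0 / d)                   # now w @ x == 5 (up to float error)

# FGSM-style step: move every coordinate a tiny amount (eps) against the
# gradient of the score, which for a linear model is just w itself.
eps = 0.001
x_adv = x - eps * np.sign(w)

print(label(x))                      # panda
print(label(x_adv))                  # gibbon: score shifted by eps * d = 100
print(np.max(np.abs(x_adv - x)))     # 0.001 per coordinate, imperceptible
```

The margin of 5 survives small noise but not a coordinated nudge of 0.001 in all 100,000 dimensions, which moves the score by 100. The perturbation is tiny everywhere and decisive in aggregate: that is the boundary vulnerability in miniature.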
Real-world consequences have followed. Researchers have demonstrated that small stickers placed on road signs can cause self-driving systems to misread stop signs as speed limit signs. Slight modifications to 3D-printed objects can make them unrecognizable to computer vision systems. In every case, the attack targets the boundary — the exact region where the classifier's categories are most fragile.
This is the same principle as the heap problem, expressed in linear algebra instead of philosophy. The neural network's categories, like yours, are only as reliable as their boundaries. And the boundaries are always the weakest point.
Nature doesn't respect your species boundaries either
Perhaps the most powerful demonstration of boundary-case fragility comes from biology, where the most fundamental category — "species" — dissolves under boundary-case pressure.
The Ensatina salamander (Ensatina eschscholtzii) in California is a textbook ring species. Starting from a common ancestor population in Oregon, populations spread south along both sides of the Central Valley — too dry and hot for salamanders to cross. As the pioneering populations moved south, they evolved new color patterns and adaptations for different environments. Adjacent populations along each side of the ring can interbreed freely. But where the two branches meet again in Southern California, the terminal populations — subspecies eschscholtzii and klauberi — have diverged so much that they behave as separate species. They no longer interbreed.
Here is the boundary problem: if you sample populations sequentially around the ring, each adjacent pair is clearly the same species. They interbreed. There is no single point where "one species" becomes "two species." Yet at the endpoints, you have two populations that are reproductively isolated — the standard criterion for being different species.
The Ensatina ring demonstrates that "species" is not a binary category with a sharp boundary. It is a continuum, and the boundary is an artifact of the fact that most intermediate populations went extinct. When the intermediates survive, as in a ring species, the boundary disappears, and the category breaks.
Hybridization zones — regions where distinct species overlap and interbreed, producing fertile hybrids — create the same problem at larger scales. As Taylor et al. noted in a 2015 Journal of Heredity paper, "boundaries between species in sympatry are maintained by intrinsic barriers to gene exchange; these boundaries may not be uniform in space, in time, or across the genome." Species boundaries are not walls. They are permeable membranes that vary in strength depending on when, where, and which genes you examine.
The diagnostic power of what doesn't fit
Each of these domains — philosophy, law, sociology, software engineering, artificial intelligence, biology — arrives at the same conclusion through different evidence: boundaries are where the real information is.
The Sorites paradox reveals that vague categories lack sharp boundaries by design. Nix v. Hedden reveals that the "correct" boundary depends on which question you are asking. Boundary objects reveal that some items are most useful precisely because they span multiple categories. Boundary value analysis reveals that errors cluster at category edges. Adversarial examples reveal that classifiers are most vulnerable at their decision boundaries. Ring species reveal that even nature's most fundamental category dissolves under boundary pressure.
The common lesson: if you want to understand a classification system — yours, someone else's, or one built by a machine — don't look at the items in the center of each category. Look at the items at the edges. They will tell you things the well-classified items never can:
- Which dimensions your categories neglect. The tomato exposed that "fruit vs. vegetable" conflates botanical structure with culinary function.
- Where your definitions are vague by necessity. The heap shows that some boundaries cannot be sharpened without destroying the concept's utility.
- Which items serve as bridges, not problems. Boundary objects reveal that some things work because they resist clean classification.
- Where your system will fail under stress. The adversarial panda shows that the boundary is where the classifier is most confident and most wrong simultaneously.
Protocol: stress-testing your categories
Here is a systematic approach for finding and using boundary cases in any classification system you maintain:
Step 1: Identify your active categories. Write down the categories you actually use — your task labels, your mental model of your team, your definition of "good work," your criteria for "done." Most people use categories they've never made explicit.
Step 2: For each category, find the boundary. Ask: "What would be the most ambiguous case I could encounter?" Not an item that clearly belongs or clearly doesn't — one that makes you hesitate. If you can't find one, your category is either too broad (everything fits) or too narrow (nothing fits).
Step 3: Diagnose the boundary failure. For each ambiguous item, determine why it resists classification. Is the category missing a dimension? Is the boundary vague by necessity? Is the item a boundary object that serves multiple purposes? Is the classification question itself wrong?
Step 4: Decide what to do about it. Some boundaries should be sharpened (add a dimension, split a category). Some should be left vague (the concept requires flexibility). Some items should be allowed to span categories (they are boundary objects, and forcing them destroys information). The diagnosis from Step 3 tells you which response is appropriate.
Step 5: Log and review. Keep a running log of items that resist classification. After accumulating five to ten entries, review the log for patterns. If the same boundary keeps failing, that boundary is telling you something structural about your system.
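For anyone who keeps this log in code rather than on paper, the protocol maps onto a small data structure. This is a sketch of one possible schema, not a prescribed format: the diagnosis tags mirror Step 3, and `review` implements the Step 5 pattern check.

```python
from collections import Counter
from dataclasses import dataclass

# Diagnosis tags mirroring Step 3. The schema is an illustrative suggestion.
DIAGNOSES = {"missing_dimension", "necessarily_vague",
             "boundary_object", "wrong_question"}

@dataclass
class Misfit:
    item: str
    category_tried: str
    diagnosis: str

log: list[Misfit] = []

def log_misfit(item: str, category_tried: str, diagnosis: str) -> None:
    assert diagnosis in DIAGNOSES, f"unknown diagnosis: {diagnosis}"
    log.append(Misfit(item, category_tried, diagnosis))

def review(entries: list[Misfit], min_entries: int = 5) -> Counter:
    """Step 5: after enough entries accumulate, surface recurring failures."""
    if len(entries) < min_entries:
        return Counter()
    return Counter((m.category_tried, m.diagnosis) for m in entries)

# Example entries drawn from the misfits discussed earlier in this piece.
for item, cat, why in [
    ("bug report that requests a feature", "bug", "missing_dimension"),
    ("crash that only matters for a planned feature", "bug", "missing_dimension"),
    ("hour spent on a research paper", "wasted time", "wrong_question"),
    ("shared spec document", "feature", "boundary_object"),
    ("task urgent only under one definition", "urgent", "necessarily_vague"),
]:
    log_misfit(item, cat, why)

print(review(log).most_common(1))  # the boundary that keeps failing
```

Here the review surfaces that the "bug" category has repeatedly failed for the same structural reason, which is the signal Step 5 is designed to catch.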
Your Third Brain and the boundary advantage
AI classification systems are simultaneously the most powerful and the most boundary-fragile tools ever built. Large language models classify text into categories at superhuman speed, but their decision boundaries are opaque and exploitable. Image classifiers outperform human radiologists on average but fail catastrophically on adversarial inputs that a child would handle effortlessly.
When you use AI as a thinking partner — your Third Brain — boundary cases become a critical calibration tool. If you ask a language model to classify your tasks, your notes, or your ideas, test the classification with the items that you find ambiguous. If the model classifies them confidently, that confidence is a flag, not a reassurance. It means the model has drawn a sharp boundary where reality has a fuzzy one.
The productive workflow is: let AI handle the clear cases (the items in the center of each category, where classification is unambiguous) and reserve human judgment for the boundary cases (where classification requires understanding purpose, context, and tradeoffs that the model cannot access). This is not a limitation of AI — it is a division of labor that plays to the strengths of each system. The machine is faster in the middle. You are wiser at the edges.
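That division of labor can be written down as a routing rule. This is a minimal sketch under stated assumptions: the confidence threshold is illustrative, and `human_flagged_ambiguous` stands in for whatever signal marks an item you yourself found hard to place.

```python
def triage(model_label: str, confidence: float,
           human_flagged_ambiguous: bool, threshold: float = 0.9) -> str:
    """Route a classification. Threshold and flag names are illustrative."""
    # Clear case in the center of the category: let the machine handle it.
    if confidence >= threshold and not human_flagged_ambiguous:
        return f"auto:{model_label}"
    # Boundary case, or an ambiguous item the model is suspiciously sure
    # about: sharp confidence on a fuzzy item is a flag, not a reassurance.
    return f"review:{model_label}"

print(triage("urgent", 0.97, human_flagged_ambiguous=False))  # auto:urgent
print(triage("urgent", 0.97, human_flagged_ambiguous=True))   # review:urgent
print(triage("urgent", 0.55, human_flagged_ambiguous=False))  # review:urgent
```

Note the second case: high model confidence does not earn automation when the human has already marked the item as a boundary case, which operationalizes the point that confident classification of an ambiguous item is itself the warning sign.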
Goodfellow's adversarial panda is the permanent reminder: any classifier, human or artificial, is only as reliable as its weakest boundary. The items that don't fit are not noise to be suppressed. They are signal to be studied.
Boundary cases are not exceptions to your categories. They are audits of your categories. Every item that resists classification carries information about where your system is incomplete, where your definitions are vague, or where your question is wrong. The instinct to force the misfit into the nearest box is the instinct to ignore the most useful data you have. Resist that instinct. Log the misfit. Study the boundary. Refine the system.
Your categories will never be perfect. But they can be honest about where they fail — and that honesty is what makes them improvable.