Behavioral Science & Systems Thinking

The Extrapolation Paradox: Predictive Processing, Bayesian Brains, and the Illusion of Genius

Angga Conni Saputra
•
April 20, 2026

There is a classic joke circulating in hacker and academic circles that serves as a perfect litmus test for human cognition. It goes like this: "There are two kinds of people in the world: 1. Those who can extrapolate from incomplete data." And then... it simply stops. There is no number two. 🤔

That is precisely why it is funny. The reader is forced to deduce the "second kind of person" in real-time. Those who can extrapolate from incomplete data instantly bridge the gap and understand the punchline (the second kind is, naturally, those who can't). Those who lack this ability are left waiting in confusion for the second bullet point.

It is a brilliant, self-executing intellectual trap. But beneath the humor lies a profound epistemological truth about how the human brain (and increasingly, Artificial Intelligence) actually processes reality. We live in a universe where data is never 100% complete. If you cannot extrapolate, you cannot function.

1. The Epistemology of the Incomplete

By biological and physical design, human beings operate on partial information. You see smoke, you infer fire. You see a colleague sitting silently at their desk, you infer exhaustion or frustration. You look at a two-page CV, and you infer the arc of a candidate's entire professional competence.

Why is this cognitive leap not just justified, but absolutely critical for survival? Because in the real world, no decision waits for perfect data.

If we waited for empirical certainty, the world would grind to a halt. The core of human intelligence is not calculating known variables; it is pattern recognition, probability mapping, and statistical inference. We navigate the fog of reality using a built-in mathematical engine.

The Mathematics of Guessing: Bayes' Theorem

When we extrapolate, we are subconsciously executing Bayesian updating. We assess the likelihood of an event based on imperfect, unfolding evidence.

P(A|B) = [P(B|A) × P(A)] / P(B)

Posterior Belief = (Likelihood × Prior Probability) / Evidence

We do not measure truth in absolutes; we update our priors (what we already know) with new evidence (the incomplete data) to form a posterior (our extrapolated conclusion). It is the elegant mathematics of navigating uncertainty.
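
To make this concrete, here is a minimal sketch in Python of the smoke-and-fire inference from earlier. The probabilities are illustrative assumptions, not measured values; the point is only to show a prior being updated into a posterior.

```python
# Minimal Bayesian update: "you see smoke, you infer fire."
# All probabilities are illustrative assumptions.

def bayes_posterior(prior, likelihood, false_alarm_rate):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) expanded by total probability."""
    evidence = likelihood * prior + false_alarm_rate * (1.0 - prior)  # P(B)
    return likelihood * prior / evidence

prior_fire      = 0.01  # P(fire): assumed base rate
p_smoke_if_fire = 0.90  # P(smoke | fire): assumed likelihood
p_smoke_no_fire = 0.05  # P(smoke | no fire): barbecues, fog, dust (assumed)

posterior = bayes_posterior(prior_fire, p_smoke_if_fire, p_smoke_no_fire)
print(f"P(fire | smoke) = {posterior:.1%}")  # ~15.4%: well above the 1% prior, yet far from certainty
```

One imperfect cue moved the belief by an order of magnitude without settling it; that is extrapolation in miniature.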

2. The Dark Side of Inference: When Extrapolation Becomes Prejudice

If extrapolation is our greatest cognitive asset, why is it also our most destructive flaw? Because the neurological mechanism that allows a grandmaster to predict a chess match is the exact same mechanism that generates bias and prejudice.

Extrapolation without rigorous self-awareness rapidly devolves into wild assumptions. A colleague is late, so we extrapolate that they are fundamentally lazy. Sales dip for one quarter, and we extrapolate that the entire product line is doomed. We take a single, isolated data point regarding an individual's background and extrapolate an entire stereotypic profile about their character.

This is the domain of Confirmation Bias and Overgeneralization. The human brain, in its desperate attempt to conserve metabolic energy, rushes to fill in the blanks using the easiest, most accessible stereotypes available. We stop searching for truth and start searching for data that supports our initial guess.

The Thin Line of Inference

The cognitive boundary between brilliant insight and destructive bias.

[Figure 1: a 2×2 matrix. Vertical axis: pattern library depth, from Low (novice state) to High (domain expertise). Horizontal axis: evidence, from Low (incomplete data) to High (complete data). Quadrants: Expert Insight (high library, low evidence): accurate prediction from minimal cues. Empirical Ground Truth (high library, high evidence): data confirms the rich model. Prejudice / Bias (low library, low evidence): wild assumptions based on nothing. Data Paralysis (low library, high evidence): has the data, but no framework to read it. A "Gossip Trajectory" is traced across the matrix.]

Figure 1: The margin between an elite analyst and the office gossip is astonishingly thin. It depends entirely on the depth of the prior pattern library.

3. The Plot Twist: Nobody Extrapolates from "Nothing"

Here is where the joke requires a philosophical correction. If we are brutally honest about cognitive science, the punchline is incomplete.

There are no "those who can't." Because all human beings are pattern-dependent reasoners.

When someone successfully "extrapolates from incomplete data," they are not pulling a conclusion out of thin air. What we call genius, intuition, or "connecting the dots" is actually a neurological process called Predictive Processing. The brain does not passively receive the world and try to figure it out. Instead, the brain is a prediction engine. It houses a massive, compressed library of past experiences, schemas, and models. When it encounters incomplete data (A), it instantly searches its library, finds the closest match, and aggressively predicts the rest of the sequence (B).

We do not discover reality. We autocomplete it.

This is exactly how Large Language Models (LLMs) and generative AI work. They do not "understand" a prompt in a conscious sense. They draw on an incomprehensibly large body of prior training data, calculate the most probable continuations, and "autocomplete" the response. Human beings function on remarkably similar underlying principles.
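
As a caricature (not a faithful model of a transformer's internals), here is a toy bigram "autocompleter" in Python: it predicts a continuation purely from frequency counts over a tiny made-up corpus, and fails whenever its pattern library has no matching entry.

```python
from collections import Counter, defaultdict

# Toy "autocomplete": a bigram model built from frequency counts over a tiny corpus.
# Real LLMs use learned neural weights, but the principle is the same:
# probabilistic continuation drawn from prior data.
corpus = "the brain predicts the world the brain predicts the future".split()

counts = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    counts[current_word][next_word] += 1

def autocomplete(word):
    """Return the most frequent continuation seen so far, if any."""
    if word not in counts:
        return "<no pattern in the library>"
    return counts[word].most_common(1)[0][0]

print(autocomplete("the"))       # 'brain' -- the most common continuation in this corpus
print(autocomplete("predicts"))  # 'the'
print(autocomplete("tumor"))     # '<no pattern in the library>' -- extrapolation fails without priors
```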

The More Brutal Binary Joke

If the extrapolation joke tests your logic, there is an even crueler version that tests your domain expertise:

"There are 10 types of people in the world: those who understand binary, and those who don't."

In binary code, "10" represents the decimal number 2. The joke works entirely on pattern compression. If you lack the specific pattern library (computer science), the extrapolation fails completely. This explains why "geniuses" can be incredibly awkward outside their fields. Sherlock Holmes can deduce a murder from a speck of mud, but might be entirely incapable of reading a room's emotional tension. The genius is not universal; it is domain-specific pattern density.
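
The compression can be checked in a line of Python: the string "10" interpreted in base 2 is the decimal number 2.

```python
# The binary joke, verified: the string "10" read in base 2 is decimal 2.
print(int("10", 2))  # 2 -- "two types of people"
```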

4. The Biological Blueprint: How Neurons Wire Themselves

To understand why human and AI training are not so different, we need to look at the hardware. In 1949, Canadian psychologist Donald Hebb articulated one of the most consequential ideas in neuroscience: "Neurons that fire together, wire together." Known as Hebbian Learning, this principle describes how repeated co-activation of two neurons gradually strengthens the synaptic connection between them.

Think of it as the brain physically remodeling itself around its own experience. A child who practices piano every day does not merely get better at piano; their motor cortex literally grows denser networks dedicated to finger coordination. A taxi driver who navigates a complex city for decades demonstrates measurably enlarged hippocampal volume compared to non-drivers. The brain is not a fixed computer; it is a living, architecture-shifting learning machine. This property is called neuroplasticity.
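
As a back-of-the-envelope sketch (the activity patterns and learning rate below are invented), Hebb's rule can be written as a single multiply-and-accumulate: a weight grows only when the two neurons it connects are active at the same time.

```python
# Minimal Hebbian update: delta_w = eta * pre * post.
# Activity patterns and learning rate are illustrative assumptions.
eta = 0.1                      # learning rate
w = 0.0                        # synaptic weight between a "pre" and a "post" neuron

# 1 = neuron fires, 0 = silent. The pair co-fires on 6 of the 8 trials.
pre_activity  = [1, 1, 0, 1, 1, 0, 1, 1]
post_activity = [1, 1, 1, 1, 1, 0, 1, 1]

for pre, post in zip(pre_activity, post_activity):
    w += eta * pre * post      # only joint firing changes the weight

print(round(w, 2))             # 0.6 -- strengthened in proportion to how often the two fired together
```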

Hebb's Rule vs. Gradient Descent: The Same Idea, Different Hardware

🧠 Human Brain
  • Synaptic weights strengthened by co-activation (Hebb)
  • Error signals via dopamine & prediction error
  • Forgetting via synaptic pruning (regularization)
  • Sleep consolidates and generalizes memory
  • Emotion modulates learning rate (stress, curiosity)
🤖 Artificial Neural Network
  • Synaptic weights updated via backpropagation
  • Error signals via loss function (cross-entropy, MSE)
  • Forgetting via dropout & weight decay (regularization)
  • Batch normalization generalizes across data
  • Learning rate scheduler mimics attention modulation

The vocabulary is different. The mathematics, at a conceptual level, is hauntingly similar. Both systems learn by adjusting the strength of connections in response to error: the gap between what was predicted and what actually happened.

5. Training Loops: How Humans and AI Learn From Mistakes

Here is the uncomfortable truth that bridges cognitive science and machine learning: neither humans nor AI learn from correct answers alone. Both learn primarily from errors.

In machine learning, the training process works as follows: the model makes a prediction, that prediction is compared against the known correct answer (the ground truth label), a numerical error (the "loss") is computed, and that error signal is propagated backwards through every layer of the network, adjusting each connection weight by a tiny amount to reduce future error. This process, called backpropagation with gradient descent, was formalized by Rumelhart, Hinton, and Williams in their landmark 1986 Nature paper.
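
Here is what that loop looks like in code, as a deliberately tiny sketch: a single linear weight trained by gradient descent on a made-up task (y = 3x). It is not backpropagation through a deep network, but the predict-error-adjust cycle has the same shape.

```python
import numpy as np

# Predict -> measure error -> adjust -> repeat, on a made-up task where the hidden rule is y = 3x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 6.0, 9.0, 12.0])   # ground-truth labels

w = 0.0        # initial connection weight: the model starts out wrong
lr = 0.01      # learning rate: how big each adjustment step is

for epoch in range(200):
    y_pred = w * x                   # forward pass: make a prediction
    error = y_pred - y               # prediction error
    grad = 2 * np.mean(error * x)    # gradient of the mean squared error w.r.t. w
    w -= lr * grad                   # weight update: nudge w to reduce future error

print(round(w, 3))                   # ~3.0 -- the rule hidden in the data
```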

Now consider how a child learns to walk. The child attempts a step, falls (prediction error), the vestibular system fires a strong signal (error gradient), and the cerebellum adjusts the motor commands for the next attempt. This loop (attempt → error → adjustment → reattempt) is identical in structure to the machine learning training loop. The child does not read a manual about center-of-gravity physics. They backpropagate their falls.

Figure 2: The Universal Learning Loop

Structurally identical processes across biological and artificial systems.

[Figure 2: a loop diagram: INPUT (data / experience) → PREDICTION (output / action) → ERROR SIGNAL (loss / dopamine drop) → WEIGHT UPDATE (synapse / gradient step), closing back to input via feedback / backpropagation. 🧠 Human: stumble → fall → adjust posture → try again. 🤖 AI: forward pass → loss → backprop → gradient descent.]

Figure 2: The four-stage learning loop is universal. Only the substrate differs: biological tissue vs. silicon.

6. The Role of the Teacher: Supervised, Unsupervised, and Reinforcement

One of the most illuminating parallels between human pedagogy and AI training is the taxonomy of how we learn. Machine learning researchers divide training paradigms into three categories, and educators have been using equivalent frameworks for centuries without knowing the labels.

Supervised Learning

AI version: The model receives labeled input-output pairs. It learns by minimizing the difference between its prediction and the known correct label.

Human equivalent: A teacher marks an exam. The student sees the red "✗" next to the wrong answer. The explicit label ("correct answer: B") adjusts future behavior. This is the entire architecture of formal schooling.
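
A minimal, toy rendering of this "marked exam" dynamic (the data, threshold model, and learning rate are all invented for illustration): a one-parameter student adjusts a pass/fail threshold only when the teacher's label says the answer was wrong.

```python
# Supervised learning in miniature: labeled (input, answer) pairs plus explicit corrections.
# Data and learning rate are illustrative.
labeled_data = [(2.0, 0), (3.5, 0), (5.5, 1), (7.0, 1), (9.0, 1)]  # (hours studied, passed?)

threshold = 0.0     # current belief about how many hours are "enough" to pass
lr = 0.5

for _ in range(20):
    for hours, passed in labeled_data:
        prediction = 1 if hours >= threshold else 0
        error = passed - prediction      # the teacher's red mark: +1, 0, or -1
        threshold -= lr * error          # a wrong answer moves the threshold

print(threshold)    # 4.0 -- a boundary separating the fail examples from the pass examples
```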

Unsupervised Learning

AI version: The model receives raw, unlabeled data and must discover latent patterns, clusters, and structure entirely on its own: no teacher, no correct answer sheet.

Human equivalent: A toddler acquiring language. No grammar textbook is opened. The child absorbs thousands of hours of ambient speech and self-organizes the phonological, syntactic, and semantic rules of their native language from scratch.
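
A toy counterpart for the no-teacher case (the numbers are invented): one-dimensional k-means with two clusters, which discovers that the raw data fall into two groups without ever being told so.

```python
# Unsupervised learning in miniature: no labels, only raw data; structure is discovered.
data = [1.0, 1.2, 0.8, 9.7, 10.1, 10.4, 1.1, 9.9]   # illustrative measurements
centers = [0.0, 5.0]                                 # arbitrary starting guesses

for _ in range(10):
    clusters = ([], [])
    for point in data:
        nearest = 0 if abs(point - centers[0]) <= abs(point - centers[1]) else 1
        clusters[nearest].append(point)              # assign each point to its closest center
    centers = [sum(c) / len(c) for c in clusters]    # move each center to its cluster's mean

print([round(c, 1) for c in centers])                # roughly [1.0, 10.0]: two latent groups found
```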

Reinforcement Learning

AI version: An agent acts in an environment and receives a reward signal (positive or negative). It learns a policy that maximizes cumulative reward over time: no labeled data, only consequences.

Human equivalent: Every human navigating a social environment. Complimenting a colleague generates a positive social signal (reward); interrupting the CEO in a meeting generates a negative one (punishment). We learn the rules of social physics through trial, consequence, and recalibration.
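
And a toy version of learning from consequences alone (the reward values, learning rate, and exploration rate are invented): an agent tries two social "actions," receives only reward or punishment, and gradually settles on the one that pays off.

```python
import random

# Reinforcement learning in miniature: no labels, only consequences.
random.seed(42)
rewards = {"compliment_colleague": 1.0, "interrupt_ceo": -1.0}   # hidden environment (assumed)
value   = {"compliment_colleague": 0.0, "interrupt_ceo": 0.0}    # the agent's learned estimates
lr, epsilon = 0.1, 0.2

for _ in range(200):
    if random.random() < epsilon:                    # explore: try something at random
        action = random.choice(list(value))
    else:                                            # exploit: do what has worked so far
        action = max(value, key=value.get)
    reward = rewards[action]                         # consequence delivered by the environment
    value[action] += lr * (reward - value[action])   # nudge the estimate toward the outcome

print(max(value, key=value.get))                     # 'compliment_colleague'
```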

The philosopher and developmental psychologist Jean Piaget called this process assimilation and accommodation: we first try to fit new information into our existing mental schemas (assimilation), and when that fails catastrophically, we restructure the schema itself (accommodation). In ML terminology, this is the difference between fine-tuning an existing model on new data versus retraining it from scratch with a new architecture.

7. The Zone of Proximal Development = Curriculum Learning

In 1934, Soviet psychologist Lev Vygotsky introduced the concept of the Zone of Proximal Development (ZPD): the gap between what a learner can do independently and what they can do with guidance. The optimal learning zone is neither the trivially easy nor the impossibly hard; it is the challenging, scaffolded middle ground.

In 2009, AI researchers Bengio, Louradour, Collobert, and Weston formalized an almost identical idea under the name Curriculum Learning. They demonstrated empirically that training neural networks on examples ordered from easy to hard, rather than randomly, produces significantly better generalization performance. The model, like the child, benefits from a structured progression of difficulty. You do not teach a child calculus before arithmetic. You do not train a neural network on adversarial images before it has learned basic edge detection.
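
A sketch of the scheduling idea, with a hypothetical difficulty score standing in for whatever measure a real pipeline would use, and a placeholder where the actual training step would go:

```python
# Curriculum learning in miniature: order training examples from easy to hard
# before feeding them to the learner. The difficulty scores are invented stand-ins.
training_examples = [
    {"task": "2 + 2",                  "difficulty": 1},
    {"task": "integrate x * sin(x)",   "difficulty": 9},
    {"task": "12 * 12",                "difficulty": 3},
    {"task": "solve x^2 - 5x + 6 = 0", "difficulty": 6},
]

curriculum = sorted(training_examples, key=lambda example: example["difficulty"])

for stage, example in enumerate(curriculum, start=1):
    # a real pipeline would call its training step here, e.g. train_step(model, example)
    print(f"stage {stage}: {example['task']}")
```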

The Scaffolding Principle

Both Vygotsky's scaffolding and ML's curriculum learning share the same core insight: the sequence of training matters as much as the content of training. Brilliant material presented in the wrong order will produce a confused learner, human or artificial. A mediocre dataset presented in a carefully graduated sequence can outperform a larger, richer dataset presented randomly. This has profound implications for both classroom design and ML pipeline architecture.

8. Transfer Learning: The Genius Who Reads Between Fields

One of the most impressive feats of human intelligence is the ability to take expertise from one domain and apply it productively to a completely different one. The physicist who designs economic models. The chess grandmaster who becomes an exceptional military strategist. The jazz musician who writes unusually creative software architecture. This is transfer learning, and it is one of the hottest research areas in modern AI.

In 2018, the publication of BERT (Bidirectional Encoder Representations from Transformers) by Google's research team marked a watershed moment. A single model, pre-trained on massive amounts of general text data, could be fine-tuned with minimal additional examples to perform remarkably well on dozens of specialized downstream tasks: legal document analysis, medical question answering, sentiment analysis, code generation. The model had built a rich, general-purpose pattern library, and then transferred it.

This is exactly what happens in human expertise transfer. The underlying cognitive mechanisms (analogical reasoning, schema abstraction, and structural mapping) allow a deeply experienced professional to walk into an unfamiliar domain and achieve competence far faster than a complete novice. They are not starting from zero; they are fine-tuning a pre-trained model.
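
A numerical sketch of that idea (all weights and data are random stand-ins, and a least-squares fit replaces the gradient-based fine-tuning a real system would use): a frozen, "pre-trained" feature extractor is reused, and only a small new head is fitted to a handful of examples from the unfamiliar domain.

```python
import numpy as np

# Transfer learning in miniature: freeze a "pre-trained" feature extractor,
# fit only a small new head on a few domain-specific examples.
# The pre-trained weights here are random stand-ins for years of prior training.
rng = np.random.default_rng(1)

W_pretrained = rng.normal(size=(16, 4))     # frozen: the general-purpose pattern library

def extract_features(x):
    return np.tanh(x @ W_pretrained)        # reused representation, never retrained

# A tiny new task: only four labeled examples in an unfamiliar domain
X_new = rng.normal(size=(4, 16))
y_new = np.array([0.0, 1.0, 1.0, 0.0])

features = extract_features(X_new)
head, *_ = np.linalg.lstsq(features, y_new, rcond=None)   # fit only the new head

print(np.round(features @ head, 2))         # approximately [0. 1. 1. 0.] from just four examples
```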

Figure 3: The Parallel Architecture of Learning

A structural comparison between human cognitive development and AI model training pipelines.

| Learning Concept | 🧠 Human Implementation | 🤖 AI Implementation |
| --- | --- | --- |
| Foundation Training | Childhood development; sensory exposure; first language acquisition | Pre-training on massive unlabeled corpora (e.g., Common Crawl, Wikipedia) |
| Error-Driven Adjustment | Dopaminergic prediction error; negative feedback from teacher or environment | Backpropagation; minimizing the loss function via gradient descent |
| Memory Consolidation | Slow-wave sleep; hippocampal replay of daily experiences to neocortex | Epoch training; repeated passes over the dataset to stabilize weights |
| Overfitting | Rigid thinking; inability to generalize; expert who fails outside their narrow field | Model memorizes training set; fails on unseen data; poor generalization |
| Regularization | Forgetting irrelevant details; sleep pruning; deliberate unlearning of bad habits | Dropout layers; weight decay (L2 regularization); early stopping |
| Transfer Learning | Applying expertise from chess to military strategy; jazz to software design | Fine-tuning a pre-trained foundation model (GPT, BERT) on domain-specific tasks |
| Curriculum / Scaffolding | Vygotsky's ZPD; graded difficulty in education; Montessori method | Curriculum learning; progressive training from easy to hard samples |

9. Overfitting: The Danger of Learning Too Well

In machine learning, overfitting is a catastrophic failure mode: a model performs perfectly on its training data but fails dramatically on any new, unseen input. It has memorized specifics instead of learning generalizable rules. It has mistaken noise for signal.

We live with the human equivalent of overfitting every single day, and we call it professional rigidity, cultural bias, and ideological echo chambers. The seasoned executive who succeeded with one market strategy in 2005 and refuses to adapt in 2025 is overfitting. The specialist physician who anchors every diagnosis to their most frequently seen condition is overfitting. The algorithm, biological or silicon, trained too hard on too narrow a dataset.

The solution in ML is regularization: techniques that deliberately constrain the model's complexity, forcing it to find simpler, more generalizable patterns. The human equivalent is deliberate exposure to diverse experiences, the intellectual discipline of seeking disconfirming evidence, and what psychologists call cognitive flexibility. Both fight the same enemy: a system that mistakes familiarity for truth.
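
As a toy illustration (the numbers are invented, and a crude averaged slope stands in for what regularization actually enforces): a model that merely memorizes its training set has perfect recall and zero ability to generalize, while a compressed rule extrapolates sensibly.

```python
# Overfitting in miniature: perfect memorization of the training set vs. a compressed rule.
# The data roughly follow y = 2x plus noise; all values are illustrative.
train = {1: 2.1, 2: 3.9, 3: 6.2, 4: 7.8}    # x -> y

def memorizer(x):
    return train[x]                          # 100% training accuracy, no rule at all

slope = sum(y / x for x, y in train.items()) / len(train)   # crude compressed rule: y ~ 2x

def generalizer(x):
    return slope * x

print(round(generalizer(7), 1))              # ~14.1: a sensible extrapolation from the rule
try:
    print(memorizer(7))                      # the memorizer has never seen 7...
except KeyError:
    print("memorizer fails on unseen input") # ...so it fails on anything outside its training set
```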

The Overfit Expert Problem

Research by organizational psychologist Adam Grant in Think Again (2021) documents a well-established phenomenon: domain experts often become worse at predicting outcomes in their own field than informed generalists, particularly in volatile environments. They are overfit. Their training set (past experience) no longer matches the distribution of the real world. This is not a failure of intelligence; it is a failure of adaptive regularization: the refusal to prune outdated weights.

10. Where the Analogy Breaks: The Limits of the Comparison

Intellectual honesty demands that we acknowledge where the human brain and artificial neural networks diverge, sometimes profoundly.

Data efficiency: A three-year-old child can learn the concept "dog" from seeing perhaps a dozen examples across a few weeks. A state-of-the-art image recognition system may need tens of thousands of labeled images. Human cognition achieves a form of few-shot learning that remains beyond the current frontier of most AI architectures. This gap is narrowing with models like GPT-4 and Gemini, but the biological brain remains vastly more data-efficient.

Embodiment: Human learning is fundamentally embodied, anchored in proprioception, pain, pleasure, hunger, and the social feedback of faces and voices. AI models learn from text, images, and numerical data representations. The richness of biological sensorimotor experience has no current AI equivalent.

Motivation and emotion: Human learning is modulated by cortisol, dopamine, oxytocin, and serotonin (neurochemical systems that assign differential salience to experiences based on survival, social bonding, and reward). There is no artificial equivalent of fear, love, or boredom shaping how an AI weights its training examples.

Consciousness: And of course, the deepest difference: we do not know if there is anything it is like to be a language model processing a prompt. Human learning occurs inside a subjective, phenomenological experience. Whether AI systems have any form of inner experience remains one of the most contested and unanswered questions in philosophy of mind.

Conclusion: We Think From Memory

The ability to extrapolate is not pure magic, nor is it a sign of infallible intelligence. It is the result of exposure, practice, and a meticulously curated internal library of patterns.

The expert surgeon looks at a faint shadow on an X-ray and "sees" a tumor not because they have better eyesight, but because they have looked at ten thousand X-rays before. They are not thinking from the void; they are thinking from memory.

And a large language model, asked to complete a sentence, is doing something structurally equivalent: retrieving, recombining, and probabilistically continuing patterns from its vast compressed training library. The mechanism is different. The logic is the same.

So, the next time you find yourself quickly jumping to a conclusion, whether about a dataset, a market trend, or a co-worker, pause and ask yourself: Am I exhibiting expert insight, or am I just autocompleting reality with my own prejudice? Are my "priors" actually up to date, or am I running a biased algorithm?

And the next time an AI system gives you a confident, fluent, and utterly wrong answer, recognize the mirror it holds up: it is doing precisely what we do. It is extrapolating from its training. It is thinking from its memory.

Genius is often just compression. We don't think from nothing. We think from memory. And so does the machine.

#PredictiveProcessing #Neuroscience #MentalModels #SystemsThinking #BayesTheorem #CognitiveBias #ArtificialIntelligence #MachineLearning #DeepLearning #NeuralNetworks #HebbianLearning #TransferLearning #Neuroplasticity

Scientific Citations & References

1. Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204. (The foundational text on how the human brain functions as a prediction machine rather than a passive receiver of data.)

2. Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138. (Explains the mathematical/Bayesian necessity of the brain extrapolating incomplete data to minimize surprise and metabolic cost.)

3. Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux. (The seminal work on System 1 heuristics vs. System 2 analysis, outlining how "jumping to conclusions" creates cognitive bias and stereotyping.)

4. Bayes, T. (1763). An Essay towards solving a Problem in the Doctrine of Chances. Philosophical Transactions of the Royal Society of London, 53, 370–418. (The original articulation of Bayesian probability inference.)

5. Gobet, F., & Simon, H. A. (1996). Recall of random and distorted chess positions: Implications for the theory of expertise. Memory & Cognition, 24(4), 493–503. (Empirical evidence that domain expertise relies on pattern chunking and memory, not raw calculative power.)

6. Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536. (The landmark paper that formalized backpropagation, the foundational algorithm of modern deep learning, directly analogous to error-driven learning in biological neural systems.)

7. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444. (The landmark review article that established the scientific foundations of deep learning, authored by the three researchers awarded the 2018 Turing Award, widely considered the Nobel Prize of computing.)

8. Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. Wiley. (Introduced Hebbian Learning, "neurons that fire together, wire together," the biological precedent to synaptic weight updates in artificial neural networks.)

9. Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. Proceedings of the 26th International Conference on Machine Learning (ICML 2009), 41–48. (The first formal study showing that training neural networks from easy-to-hard examples, mirroring Vygotsky's scaffolded pedagogy, significantly improves generalization.)

10. Maguire, E. A., Gadian, D. G., Johnsrude, I. S., Good, C. D., Ashburner, J., Frackowiak, R. S. J., & Frith, C. D. (2000). Navigation-related structural change in the hippocampi of taxi drivers. PNAS, 97(8), 4398–4403. (Landmark neuroimaging study demonstrating measurable structural brain change as a direct result of experiential training; definitive evidence of neuroplasticity in professional expertise.)

11. Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of NAACL-HLT 2019, 4171–4186. (The paper introducing BERT, Google's transfer learning breakthrough, demonstrating that a single pre-trained model can achieve state-of-the-art performance across 11 NLP tasks, the AI equivalent of cross-domain expertise transfer.)

12. Vygotsky, L. S. (1978). Mind in Society: The Development of Higher Psychological Processes. Harvard University Press. (Introduced the Zone of Proximal Development, the pedagogical principle that optimal learning occurs at the boundary of current capability and guided challenge; the conceptual predecessor to ML curriculum learning.)

13. Grant, A. (2021). Think Again: The Power of Knowing What You Don't Know. Viking. (Documents the "overfit expert" problem in organizational and scientific contexts: experienced professionals whose rigid mental models make them less accurate predictors than informed generalists in dynamic environments.)
