Grounded: Perspectives

The Imitation Game

Renato Duarte — Tue, 28 Apr 2026 17:32:50 GMT

Not so long ago, when the term foundation model started gaining traction and its use in neuroscience literature became commonplace, I confess I was very confused about what it actually referred to. The label was sliding across very different objects, some I was very familiar with, others not so much, but it sounded so foundational that it felt almost intimidating. Apparently, a transformer pretrained on tons of fMRI data was a foundation model. So was a conv-net whose latents were being linearly regressed against visual cortex in the Yamins–DiCarlo tradition. Shared terminology typically implies a shared object, but the objects I was encountering weren’t shared and felt very disconnected.

The confusion wasn’t mine alone, and the most honest acknowledgement of it came from Patrick Mineault. In 2024 he surveyed the emerging landscape (LFADS, the Neural Data Transformer, Zhang et al.’s “universal translator,” MindEye2, DeepLabCut derivatives, and others) and wrote: “Foundation models are somewhat of a misnomer–the models themselves, transformers et al.–don’t matter nearly as much as the data that’s used to train and fine-tune the models.” This and his follow-up article gave me the right framing and vocabulary to start grasping what the terminology actually entails, and they provide the entry point for what follows.

What follows isn’t a piece that says foundation models in neuroscience are hype in some vague sense. I’m an enthusiast, and overall there is no doubt in my mind that these models add something foundational to the toolkit: predicting brain responses to stimuli they weren’t trained on, transferring across subjects and sessions, occasionally even lining up their internal features with biological features in a way that hints at something beyond correlation. The useful part of the question is, however, narrower and shallower than it appears. These models predict brain activity with unprecedented accuracy, but whether prediction of that kind constitutes understanding of the system that produced the activity or only a very good imitation of its externally observable statistics is the real question. My answer, at the risk of being wrong in public, is that the field is starting to conflate the two in a way that will be expensive to undo. The hype on the application side is well deserved: these models are practically useful, and I’ll say so plainly throughout. But their explanatory power is much weaker than the language around them suggests, and it is that gap that I will address here.

What a foundation model is, and what it became

Back in 2021, Bommasani and the Stanford CRFM collaborators defined a foundation model as a model “trained on broad data at scale and adaptable to a wide range of downstream tasks.” Decoupling pretraining from application is the defining element: a single large model absorbs structure from a massive heterogeneous dataset, and downstream users adapt it, fine-tune it, or query it with lightweight readouts. GPT, BERT, CLIP, DINO: these are foundation models in the Bommasani sense. They have a pretext task, a large corpus, and a property of task-general reuse.

In neuroscience, the term arrived and its meaning promptly split into at least three different things:

First is a foundation model of neural activity: a transformer or encoder-decoder pretrained on large quantities of neural recordings (spikes, calcium, fMRI, EEG, iEEG) with a self-supervised objective, and then adapted to downstream predictions. Bommasani would recognize this usage. POYO-1 (Azabou et al., NeurIPS 2023) and its successor POYO+ are the clearest examples: a single transformer trained across many sessions, animals, and brain areas, with session and unit tokens that let it transfer. BrainLM (Caro et al., ICLR 2024) is the fMRI-centric analogue: masked-autoencoder pretraining on 6,700 hours of data, with task-specific fine-tuning downstream. LaBraM and BrainWave do the same for EEG and iEEG respectively, with BrainWave pushing to over 40,000 hours of electrophysiological recordings (mixed invasive and non-invasive) across roughly 16,000 individuals.

Second is a foundation model applied to neuroscience: models pretrained on non-brain data (text, vision, audio) whose internal representations are mapped, typically via a linear readout, against recorded brain activity. This is the Yamins–Schrimpf–King lineage. Yamins et al. 2014 showed that deep convolutional networks trained for object recognition produced internal features that predicted V4 and IT responses better than specialized models did. Schrimpf et al. 2021 extended the approach to language, and the broader Brain-Score framework from the same lineage (Schrimpf et al. 2020) became the standard harness for model-to-brain alignment evaluation. Caucheteux and King 2022 showed that GPT-2 latents predict speech-cortex responses with striking precision. Here the “foundation model” is not a model of the brain at all; it is a model of the world whose features happen to align with what brains compute. The label transfers by proximity due to shared application domains.

Third is the most permissive usage: any overparameterized model applied to neural data. Under this reading, MindEye2, a diffusion-based decoder that reconstructs images from fMRI, gets called a foundation model even though it is trained for a single task (image reconstruction from fMRI, with cross-subject fine-tuning) rather than the broad downstream portfolio the term is supposed to imply. It is a fantastic tool, but it would not qualify as a foundation model in Bommasani’s sense. Worth noting by contrast: equally fantastic per-dataset methods like CEBRA (Schneider, Lee, Mathis, Nature 2023), a contrastive embedding trained per dataset, deliberately do not take the label, and their authors are explicit about the scope of what’s been built. That kind of vocabulary discipline is exactly what the third-category cases lack.

Why am I covering this? Do the definitions matter here, or only the outcomes? When a field’s vocabulary starts labeling broad classes of tools interchangeably under the same umbrella, the label starts losing its meaning and becoming a brand. Brands imply unification, adaptability, scale, and, crucially, claims about the generality of what has been learned. That last implication needs further scrutiny, which is why I am bringing this up.

The landscape, honestly described

Five families of models stand out in the current conversation, and they are not equivalent.

MICrONS foundation model: Wang, Tolias and colleagues, Nature 2025. A four-module network (perspective, modulation, core, readout) trained on two-photon calcium imaging across a foundation cohort of 8 mice, with zero-shot generalization evaluated in 4 additional test mice (roughly 8,000–10,000 neurons each) and in the MICrONS functional-connectomics mouse (14 recording sessions, ~1,000 neurons analyzed in layers 2/3, 4 and 5 of areas V1, LM, RL and AL). The model substantially outperforms previous state-of-the-art encoding models, generalizes zero-shot to stimulus classes it never saw during training (drifting Gabor filters, flashing Gaussian dots, directional pink noise, random-dot kinematograms), and, importantly, its core features predict anatomical cell types, dendritic features, and neuronal connectivity in the paired electron-microscopy volume. This is the only paper in the current generation that makes a serious, testable bridge between latent-feature structure and cellular-level anatomy, and it is the one that deserves the most careful engagement. (An Author Correction issued 8 April 2026 clarifies several Methods-level architectural details without affecting the results or conclusions.)

TRIBE v2: d’Ascoli, Rapin, Benchetrit, ..., King, FAIR at Meta, released March 2026. A tri-modal (video + audio + text) encoding model trained on a reported 1,000+ hours of fMRI from 720 subjects, built on frozen AI encoders (Llama for text, JEPA for video, a speech model for audio). It predicts high-resolution fMRI responses for novel stimuli, subjects, and tasks, substantially outperforming linear encoders. The accompanying blog frames TRIBE v2 as “a digital twin of human neural activity,” and the paper itself is titled “A Foundation Model of Vision, Audition, and Language for In-Silico Neuroscience.” The model is a composite: foundation models of the world wired to fMRI targets via a trainable adapter. Whether the composite constitutes a foundation model of the brain, or an encoder built on foundation models of the world, is exactly where the rhetoric deserves scrutiny.

POYO-1 and POYO+: Azabou, Dyer, Richards and colleagues. Multi-session, multi-animal transformers for neural population activity. These are the cleanest Bommasani-compliant cases in spiking and calcium data: a single pretrained model genuinely transfers across subjects and tasks via session tokens, with downstream fine-tuning that matches or exceeds per-dataset models. The authors are also very disciplined in their claims. They describe POYO as a tool for neural decoding and population modeling, not as a model of the brain’s computations. Personally, I believe this type of honesty is fundamental.

BrainLM, LaBraM, BrainWave: the large-scale self-supervised family for fMRI, EEG, and mixed electrophysiology respectively. Masked-autoencoder pretraining, token-based time-series modeling, standard transfer-learning claims. These are useful. They are also, as Mineault has pointed out, more accurately characterized as data-centric than model-centric: the scaling of the dataset is doing most of the work, and the architectures are near-commodity.

MindEye2 and the decoding family: image reconstruction from fMRI via diffusion priors, retrieval-based decoders, semantic reconstructions. Tremendous engineering. Not foundation models in the Bommasani sense.

And in the background, the AI-to-brain lineage: Kell et al. 2018 for auditory cortex, Huth et al. 2016 for semantic cortical maps, Goldstein et al. 2022 for language, Caucheteux, Gramfort and King 2023 for speech, Schrimpf et al. 2020 for the Brain-Score evaluation framework. The methodological move across all of these is the same: train an AI model on a cognitive task, regress its latents against recorded brain activity, treat goodness-of-fit as evidence about cortical computation. The newer foundation models scale this move (bigger encoders, more brain data), but they don’t change the conceptual question underneath it: what does alignment actually establish? That older literature is where the question was first sharpened, and any honest reading of the newer models’ claims has to start there.
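The shared methodological move (regress a model's latents against recorded activity, then score the fit on held-out data) can be sketched in a few lines. This is a minimal illustration on synthetic data, assuming a closed-form ridge readout and per-channel Pearson correlation as the score. The arrays `latents` and `responses` and the helper names `fit_ridge` and `heldout_correlation` are hypothetical stand-ins, not any paper's actual pipeline.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression without intercept: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def heldout_correlation(Y_true, Y_pred):
    """Pearson r per recorded channel (voxel or neuron) between data and prediction."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

rng = np.random.default_rng(0)
n_stimuli, n_latents, n_channels = 400, 50, 20

# Synthetic stand-ins (assumptions): "latents" plays the role of a pretrained
# network's features, "responses" the role of recorded brain activity.
latents = rng.standard_normal((n_stimuli, n_latents))
true_map = rng.standard_normal((n_latents, n_channels))
responses = latents @ true_map + 0.5 * rng.standard_normal((n_stimuli, n_channels))

# Fit the linear readout on training stimuli, score it on held-out stimuli.
train, test = slice(0, 300), slice(300, 400)
W = fit_ridge(latents[train], responses[train], alpha=10.0)
scores = heldout_correlation(responses[test], latents[test] @ W)
print(f"median held-out r = {np.median(scores):.2f}")
```

The score is high here only because the synthetic responses were constructed as a noisy linear function of the latents. That is all a number of this kind can certify: shared linear information between two sets of variables, with nothing said about how either was produced.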

What these models do well

The honest case for foundation models in neuroscience is more modest than the marketing, and it isn’t “these models explain the brain.” It’s that they predict brain activity at a scale and accuracy that previous per-dataset models cannot reach, and that the predictive ceiling is itself useful. I fully agree with that case, and want to state my opinion clearly: foundation models are exciting tools, useful predictive models of activity, real research infrastructure. My only quarrel, from a mechanistic perspective, is with the easy slide from predictive accuracy to mechanistic insight, and with the rhetoric that smooths over the gap.

So, let’s discuss the usefulness and impact of foundation models in neuroscience.

Generalizing out-of-distribution to novel stimuli is now routinely demonstrated. The MICrONS foundation model trains on natural videos and generalizes to Gabors, dot motion, and artificial noise. TRIBE v2 generalizes across naturalistic and classical experimental paradigms. POYO+ transfers across species and recording modalities. Generalization at this scale to stimulus classes this far from the training distribution was unthinkable ten years ago, and it is a genuine leap.

Zero-shot transfer across subjects and sessions is the second. A single model trained on many people predicts novel subjects’ responses with minimal adaptation. Practically, this is a clear win for experimental design: the cost of collecting a new fMRI dataset drops significantly, each new subject inherits a much more accurate per-voxel predictive baseline (which sharpens the effect size that any contrast-of-interest needs to exceed), and the noise floor for any mechanistic model to compete with is now much tighter than it used to be.

Fine-grained topographic mapping and anatomical specialization is a third. TRIBE’s authors show that the model’s internal features can be used to draw high-resolution maps of where in cortex multisensory integration happens, recovering known topographies and refining them. Done by hand, this atlas work would be prohibitive at the scale these models reach.

Closed-loop validation is the most intellectually honest use. Walker et al. 2019 used a predictive model of V1 to generate stimuli that maximally drove specific neurons, and verified the predictions in new experiments. Bashivan, Kar and DiCarlo 2019 did the same for macaque IT. Ponce et al. 2019 replicated the approach with an evolutionary stimulus generator. This is a fantastic way to test mechanisms: the model makes a commitment, the experiment either confirms or falsifies it. It is also, not coincidentally, where the best alignment work is currently happening.
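The logic of the closed-loop paradigm (the model commits to a stimulus, the experiment tests it) can be illustrated with a toy example. What follows is a sketch under stated assumptions, not the procedure of any of the cited papers: the model neuron is a hypothetical quadratic unit with filters `w1` and `w2`, the "experiment" is a simulated neuron (`record_neuron`) whose filters deliberately differ from the model's, and the stimulus is optimized by projected gradient ascent at fixed norm.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels = 64

# Hypothetical fitted model neuron: a quadratic "energy" unit with two filters.
# In practice this would be one output unit of a trained encoding model.
w1 = rng.standard_normal(n_pixels)
w2 = rng.standard_normal(n_pixels)

def model_response(x):
    """Predicted response of the model neuron to stimulus x."""
    return (w1 @ x) ** 2 + (w2 @ x) ** 2

def model_gradient(x):
    """Analytic gradient of model_response with respect to the stimulus."""
    return 2.0 * (w1 @ x) * w1 + 2.0 * (w2 @ x) * w2

def optimize_stimulus(x0, norm=1.0, lr=0.01, n_steps=500):
    """Projected gradient ascent at fixed stimulus norm (a proxy for fixed contrast)."""
    x = x0 * norm / np.linalg.norm(x0)
    for _ in range(n_steps):
        x = x + lr * model_gradient(x)
        x = x * norm / np.linalg.norm(x)  # project back onto the norm constraint
    return x

# Stand-in for the experiment: a simulated neuron whose filters differ from the model's.
t1 = w1 + 0.3 * rng.standard_normal(n_pixels)
t2 = w2 + 0.3 * rng.standard_normal(n_pixels)

def record_neuron(x, noise_sd=1.0):
    """Simulated noisy recording; in a real closed loop this is the new experiment."""
    return (t1 @ x) ** 2 + (t2 @ x) ** 2 + noise_sd * rng.standard_normal()

# 1) The model commits: this stimulus should drive the neuron strongly.
x_opt = optimize_stimulus(rng.standard_normal(n_pixels))

# 2) Control stimuli with the same norm.
controls = rng.standard_normal((100, n_pixels))
controls /= np.linalg.norm(controls, axis=1, keepdims=True)

# 3) The "experiment" tests the commitment against the controls.
measured_opt = np.mean([record_neuron(x_opt) for _ in range(20)])
measured_ctrl = np.mean([record_neuron(c) for c in controls])
print(f"predicted {model_response(x_opt):.1f}, "
      f"measured {measured_opt:.1f}, controls {measured_ctrl:.1f}")
```

The point of the design is that step 3 can fail: if the measured response to the optimized stimulus were no larger than the response to the controls, the model's commitment would be falsified.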

Honestly summarized, these models are extremely good encoders of the stimulus-to-activity mapping, extremely good decoders of the activity-to-stimulus mapping, and moderately useful tools for experiment design. They are tools that hint at computational mechanisms, but they aren’t theories and they do not explain how these computational mechanisms are instantiated in the first place.

Why alignment works at all: the geometric substrate

To understand what these models are and are not telling us, it helps to be precise about why a transformer trained on text can predict activity in speech-production cortex at all. The answer lies in the geometry of neural representations (a recurring theme in this newsletter), and the literature on that geometry has matured a lot in recent years.

Neural population responses occupy manifolds, structured sub-spaces representing information and its transformation. The empirical foundations for this view go back to high-dimensional trajectory analyses in invertebrate olfaction (Mazor and Laurent 2005, Rabinovich, Huerta and Laurent 2008) and have since been extended across cortical systems. Stringer et al. 2019 gave one of the cleanest quantitative anchors in mouse V1: principal-component analysis of population responses to natural images yields an eigenspectrum that falls off as a power law with exponent just above 1, the theoretical boundary value beyond which the representational manifold ceases to be smooth. The result has invited methodological scrutiny in the years since but remains a landmark reference for cortical representational geometry. Chung and Abbott 2021 gave a concise synthesis of neural population geometry. Jazayeri and Ostojic 2021 distinguished intrinsic from embedding dimensionality, showing that the former, which actually matters for computation, can be much lower than the latter. Langdon, Genkin and Engel 2023 tied manifold structure to circuit mechanisms for cognition. A much larger literature over the past decade has consolidated the neural population doctrine and manifold geometry as the right descriptive level for analyzing neural computation.
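To make the eigenspectrum measurement concrete, here is a minimal sketch of how a power-law exponent can be read off a population's PCA spectrum. Everything in it is an assumption for illustration: the data are synthetic with a built-in exponent `true_alpha`, the helper names are hypothetical, and the estimator is plain PCA with a log-log line fit, whereas the published analysis used a cross-validated variant to handle noise in real recordings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_neurons = 2000, 200
true_alpha = 1.0  # assumed exponent for the synthetic data, not a measured value

# Synthetic population responses whose covariance eigenvalues decay as rank^(-alpha).
ranks = np.arange(1, n_neurons + 1)
target_variances = ranks ** (-true_alpha)
basis, _ = np.linalg.qr(rng.standard_normal((n_neurons, n_neurons)))  # random orthonormal axes
scores = rng.standard_normal((n_stimuli, n_neurons)) * np.sqrt(target_variances)
responses = scores @ basis.T  # shape: (stimuli, neurons)

def pca_eigenspectrum(X):
    """Eigenvalues of the neuron-by-neuron covariance matrix, largest first."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    return np.linalg.eigvalsh(cov)[::-1]

def powerlaw_exponent(eigvals, first_rank=1, last_rank=50):
    """Negative slope of log(variance) against log(rank) over a chosen rank range."""
    r = np.arange(first_rank, last_rank + 1)
    slope, _ = np.polyfit(np.log(r), np.log(eigvals[first_rank - 1:last_rank]), 1)
    return -slope

eigvals = pca_eigenspectrum(responses)
alpha_hat = powerlaw_exponent(eigvals)
print(f"estimated exponent: {alpha_hat:.2f}")
```

The fitted rank range is a free choice in this sketch. On real data the estimate is sensitive to that choice and to noise, which is part of why the original result has drawn methodological scrutiny.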

The fact that AI latents align with brain latents is, at the level of mathematical structure, not surprising. Both are responding to the statistical regularities of natural stimuli, the same stimuli. Both are producing representations that respect the geometry those regularities induce. Elmoznino and Bonner 2024 sharpened this argument: high-dimensional latent representations predict cortical activity better than low-dimensional ones, precisely because cortical representations themselves are high-dimensional. Alignment is a consequence of shared statistical structure, not of shared mechanism.

Pezon, Schmutz and Gerstner 2026, in Neuron, recently proved the point: many distinct recurrent circuits produce the same neural manifold. Recovering the generating circuit from the observed manifold is under-determined in the class of models their results cover. Two networks with different architectures, different learning rules, and different dynamical properties can leave indistinguishable geometric footprints in their population activity.

This is to be expected (degeneracy is a well-established feature of cortical circuits), but its demonstration is, in my opinion, the single most important result for how we should read foundation-model alignment claims. A transformer and a biological circuit can produce aligned representations without implementing anything resembling the same computation and without any mechanistic relation whatsoever. Saying that a model “captures” or “matches” or “predicts” cortical activity establishes the geometric correspondence. It doesn’t establish the mechanistic one. Pezon et al.’s result is proved for a specific class of recurrent network models, and extending it to the full foundation-model regime (transformers trained on natural videos, frozen AI encoders wired to fMRI) is itself an open problem. But the direction of the argument is clear: to the extent that the under-determination generalizes (it very likely does), no amount of scaling the dataset or the architecture closes the gap, because the gap is a consequence of structure, not a data problem.
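The under-determination can be seen in a deliberately simple toy case. The sketch below is not Pezon et al.'s construction; it is a linear illustration I am assuming for clarity. Two recurrent weight matrices, `W1` and `W2`, share the same dynamics on a two-dimensional manifold but differ in how they treat directions orthogonal to it. Their observed activity is identical, and only a perturbation off the manifold separates them.

```python
import numpy as np

rng = np.random.default_rng(2)
n_neurons, n_steps = 30, 200

# Shared 2-D latent dynamics: a slow rotation.
theta = 0.1
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Embedding of the latent plane into neural state space (orthonormal columns).
U, _ = np.linalg.qr(rng.standard_normal((n_neurons, 2)))

# Circuit 1: minimal rank-2 connectivity.
W1 = U @ A @ U.T

# Circuit 2: same on-manifold dynamics, plus extra connectivity that only
# reads from directions orthogonal to the manifold. The scale keeps the
# off-manifold dynamics stable.
off_manifold = np.eye(n_neurons) - U @ U.T
scale = 0.5 / np.sqrt(n_neurons)
W2 = W1 + scale * rng.standard_normal((n_neurons, n_neurons)) @ off_manifold

def simulate(W, x0, n_steps):
    """Iterate the linear dynamics x(t+1) = W x(t) and return the trajectory."""
    traj = np.empty((n_steps, len(x0)))
    x = x0
    for t in range(n_steps):
        traj[t] = x
        x = W @ x
    return traj

# Unperturbed: start on the manifold; both circuits trace the same trajectory.
x0 = U @ np.array([1.0, 0.0])
traj1 = simulate(W1, x0, n_steps)
traj2 = simulate(W2, x0, n_steps)
print("max activity difference:", np.abs(traj1 - traj2).max())

# Perturbed: kick the state off the manifold; the circuits now respond differently.
kick = 0.1 * (off_manifold @ rng.standard_normal(n_neurons))
pert1 = simulate(W1, x0 + kick, n_steps)
pert2 = simulate(W2, x0 + kick, n_steps)
print("max difference after perturbation:", np.abs(pert1 - pert2).max())
```

Any readout fitted to the unperturbed activity would score the two circuits identically, even though their connectivity differs substantially. Only an intervention that pushes the state off the manifold can tell them apart.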

The imitation game

Feynman’s blackboard at Caltech, on the day he died in February 1988, read: “What I cannot create, I do not understand.” (Caltech Archives; the photograph is widely reproduced.) The quote has been paraphrased in a dozen ways and still is, in my view, the single cleanest framing for whether a model of a system yields scientific understanding or stenography.

Foundation models for neuroscience, in their current form, imitate. Specifically, they observe the input-output mapping of a system, compress it into a high-dimensional latent space, and read out predictions from that latent space via lightweight adapters. What you end up with is a statistical fluency layer wrapped around an opaque internal geometry. When it predicts that a stimulus will elicit a particular fMRI response, it’s doing so because the stimulus lies near other stimuli that produced similar responses in the training data, and because the model’s pretrained representation of the stimulus is geometrically close to the representation of those training examples. The prediction is correct. The understanding (of why the brain produced the response in the first place, through what circuit, via what computation, with what learning rule) isn’t in the model and isn’t retrievable in any way.

This distinction is the one that Serre and Pavlick 2025, in Neuron, state directly: “Predictive success alone does not guarantee scientific understanding … the central challenge is to move from prediction to explanation: linking model computations to mechanisms underlying neural activity and cognition.” Guest and Martin 2025, in Psychological Review, make the methodological case more formally, in the context of cognitive science: classical explanatory connectionism required an explicit theoretical commitment stage — conceptual analysis, verbal theory, formalism, specific claims about what the model was supposed to explain — and modern benchmark-driven connectionism has largely let that stage go. Without it, they argue, computational modeling produces theory whose internal opacity leaves it indistinguishable from opinion. Bowers et al. 2023, in the Behavioral and Brain Sciences target article with 29 commentaries, document case after case in which deep-network models match human or primate vision on aggregate metrics while failing on tests that any mechanistic model of the system would pass.

A committed advocate will push back here: doesn’t out-of-distribution generalization settle this? If a model predicts brain responses to stimuli radically outside its training distribution, surely it has captured something about the brain’s computation, not just its in-sample statistics? I don’t think so, and the reason is exactly the Pezon et al. result above. OOD generalization tests the same geometric correspondence on a harder test set; it doesn’t commit the model to a specific implementation, because no specific implementation is necessary to pass it. Many circuits produce the same mapping. A model can generalize OOD via a computational mechanism that looks nothing like the brain’s and still score well, because the target of evaluation is the input-output surface, not the machinery underneath.

The MICrONS foundation model is the hardest case for this argument, and the one most worth engaging seriously. Wang and colleagues don’t claim to have built a brain. They claim something narrower and more interesting: that the latent features of their model predict anatomical properties (cell types, dendritic morphology, connectivity) of the neurons whose activity they were trained to predict. If that claim holds up, it’s a partial bridge between latent structure and mechanism. It’s not “the model implements the circuit”, but it’s “the model’s internal geometry reflects something real about the neurons’ biology.” And that is informative. That is more than any other foundation-model paper I’ve come across in this generation has delivered.

But notice what the MICrONS bridge actually licenses. The model’s features predict cell type. It doesn’t follow that the model computes like those cell types. The readout is a statistical relationship between latent coordinates and anatomical labels, not a mechanistic implementation of the computation those cell types perform. To use the manifold language from the previous section, what we have is alignment at the level of geometry plus a correlational link to anatomy. It’s genuine progress. It’s also still imitation, constrained in a biologically meaningful way.

The contrast that clarifies this is the closed-loop paradigm. Walker et al. 2019 and Bashivan, Kar, and DiCarlo 2019 don’t stop at matching observed responses; they use the model to generate stimuli that the model predicts will drive specific neurons or populations, and they test those predictions in new experiments. That’s closer to what Feynman meant. If the model’s claim to understand includes a generative commitment (“feed the system this input and it will produce this response”) and that commitment survives experimental test, the model has earned a piece of the understanding. Current foundation models largely don’t make, and aren’t asked to make, commitments at that level. More of this is needed.

A careful reader will note that this is a Marr-levels objection: foundation models are computational-level theories and I’m holding them to an implementational-level standard they were never meant to meet. That is absolutely true: I am an implementation-focused neuroscientist. But I firmly believe that a computational-level theory still has to commit to a class of implementations, and the trouble with current alignment benchmarks is that they’re permissive enough that almost any sufficiently expressive model class will score well. Computational-level claims still need to carve nature at joints the brain recognizes. That’s the theoretical-commitment stage Guest and Martin describe, and it’s what’s currently missing.

A reader familiar with de Regt’s contextual theory of scientific understanding would point out that models that deliver reliable prediction can confer intelligibility of a system even when full mechanistic grounding is absent, and intelligibility is itself a form of understanding. This is the strongest philosophical grounding a foundation-model advocate can offer, but I think it still fails for this class of models, not because intelligibility doesn’t count but because the intelligibility a foundation model delivers is tied to the target system’s observable statistics, not to its causal structure. A model that tells me what fMRI responses an arbitrary video will evoke makes the stimulus-to-response mapping legible. It does not make the brain legible. The target of understanding that de Regt intends (a grasp of the system sufficient to predict its behavior under novel conditions by invoking its causal machinery) is a strictly stronger demand than statistical extrapolation from the training manifold, and it is that stronger demand that I would like to insist upon.

Where the alignment literature misleads

The tension between imitation and understanding shows up in some important examples.

Take GPT-2 predicting speech cortex. Caucheteux and King 2022 and subsequent work from that group demonstrate impressive linear decodability of auditory and speech-production cortex responses from GPT-2 latents. The natural reading is that the brain does something like what GPT-2 does. But GPT-2 is text-only, trained on next-token prediction over web corpora, with no auditory input, no temporal dynamics resembling cortical dynamics, no architectural resemblance to cortical microcircuits whatsoever. The alignment establishes that the brain’s speech representations are sensitive to some of the same statistical regularities that GPT-2’s next-token prediction captures. It doesn’t establish that the brain computes them via attention over a context window, because the alignment is compatible with many circuits (see Pezon et al.) and the circuits aren’t constrained by the readout.

Take CNNs predicting IT cortex. Yamins et al. 2014 was the breakthrough paper, and the relationship it established is real. But CNNs trained with backpropagation on ImageNet aren’t biologically plausible in any strong sense: Dale’s law violated, weight symmetry required for backprop, feedforward-only at inference, no spiking, no plasticity, and so on. The statement “IT representations are linearly predictable from CNN features” is correct and important. The statement “IT implements convolutional feature hierarchies” doesn’t follow from it. Ivanova et al. 2022 (“Beyond linear regression”) dissect the alignment machinery itself and show how much of the mapping work is being done by the linear readout rather than by any structural correspondence. A linear readout is permissive. It tells us about shared information, not computational identity.

Take Ma and Peters 2020, “A neural network walks into a lab”: the best short sociological description of how model-to-brain alignment became the default evaluation metric. Not because alignment establishes mechanism, but because it is measurable. Brain-Score operationalized the prediction-as-fitness criterion that this literature’s benchmarking culture inherited. The incentives of publication, benchmarking, and institutional credibility all favor numbers over commitments. The result is a program in which the thing being measured is not the thing that was claimed.

Kanwisher, Khosla and Dobs 2023, from inside the deep-networks-as-brain-models program itself, offer the constructive reading: use alignment not as a mechanistic claim but as a diagnostic for what objectives the brain might be optimizing. If a network trained to recognize faces develops face-selective units that align with FFA, the alignment tells us that face recognition is a sufficient computational objective to produce FFA-like selectivity. It doesn’t tell us that the brain solves face recognition the way that network does. That’s the honest version of the program. It’s also rarely the version that makes the headlines.

Toward building for understanding

The alternative to imitation isn’t refusal to use foundation models, and I’m not in any way opposing their development. It’s about using them as hypothesis generators whose commitments are testable and whose biological plausibility is a constraint rather than a nuisance.

Why insist on biological plausibility at all? Because the object we’re trying to explain is this brain, the biological one, and any model that reaches its predictions through operations the brain could not plausibly instantiate is, at best, a functional analogue (an imitation). Many possible uses can be derived from such a model, but if no plausible path connects its operations to the biological machinery, it is a tool, not an explanatory model.

Feynman’s slogan works because creation in the medium of the target system is what forces understanding. A perfectly non-biological circuit might reproduce cortical statistics and teach us real things about the computational problem the brain has to solve, but it leaves the brain’s actual algorithm on the table. For anyone who cares about mechanism, that’s the thing to be holding out for.

Several lines of work do take this seriously.

Biophysically detailed circuit models are one. The Allen Institute’s Billeh et al. 2020 systematically integrated structural and functional data into a multi-scale mouse V1 model; Ito et al. 2026 pushed that tradition into the differentiable regime, fitting roughly 67,000 neurons jointly constrained by anatomy, physiology and function. Kording et al. 2026 make the broader conceptual case for “compiling” molecular and connectomic ultrastructure into dynamical models. The Blue Brain Project’s Markram et al. 2015 reconstruction of a neocortical microcircuit sits adjacent to this tradition, taking the maximalist route: commit to biophysical detail at every level, never fit to functional data, and see whether plausible dynamics emerge. Between the maximalist and the data-fitted programs, an active middle ground also exists: my own work on layer 2/3 microcircuits (Duarte and Morrison 2019) tried to find a principled compromise between abstraction and biophysical commitment, and tools like Jaxley (Deistler et al., Nature Methods 2025) are now making this kind of biologically-constrained-but-tractable modeling routine. These models have problems of their own: with no principled way to fix the level of biological detail, some model and parameter choices are arbitrary, and the resulting models are far more complex to investigate. But they let us interrogate the system at the molecular, synaptic, cellular, circuit and functional levels, each level capable of failing against experimental data in ways that would falsify a specific mechanistic claim.

Normative models derived from first principles are another important line, and one the foundation-model program largely sidesteps. The type of question that drives this line of research is sharp and cleanly defined. Take learning as an example: what learning rule could a cortical circuit physically implement, given locality, sign-of-error constraints, dendritic compartmentalization, and the absence of weight transport? A generation of work has produced increasingly biologically plausible alternatives to backpropagation. Sacramento et al. 2018 showed that dendritic microcircuits with apical-basal compartmentalization can approximate gradient-based learning end-to-end. Payeur et al. 2021 grounded credit assignment in burst-dependent plasticity, a learning rule with experimental support at the synapse. Lillicrap, Santoro, Marris, Akerman and Hinton 2020 survey the broader landscape. What this program delivers is not the right model, as most candidates will turn out wrong in the specifics, but the right kind of model, the right theoretical framing. Each one makes a specific biological commitment: a particular plasticity rule, a particular dendritic computation, a particular role for inhibitory interneurons in the credit-assignment circuit. Each commitment is testable. Each can fail against perturbation experiments, against measured plasticity, against transcriptomic identities. That falsifiability is the difference between a theory of cortical computation and a description of it.

The NeuroAI program articulated by Zador et al. 2023 is the ambitious version. Their argument is for generative NeuroAI: models that don’t just imitate brain activity but implement principles, constraints, and mechanisms the brain actually uses, and that generate new behaviors and new insights as a result. That is the foundation-model program’s natural counterweight.

Closed-loop experimentation is the best bridge between the two worlds. Walker et al. 2019, Bashivan et al. 2019, and Ponce et al. 2019 are the exemplars. The foundation model makes a falsifiable prediction about which stimulus will drive which response; the experiment runs the test; the result constrains the model. Notice that even the MICrONS cell-type-prediction bridge (the strongest mechanistic gesture in the current foundation-model generation) becomes substantially more compelling the moment it is run closed-loop: the model should predict not only the cell type of a neuron from its response profile, but how that neuron’s activity changes under specific perturbations (optogenetic silencing, pharmacological modulation, targeted stimulation of its presynaptic partners). That test hasn’t been run at foundation-model scale yet. When it is, this piece will need to be updated.

A final note on scale, since it is the strongest reply a foundation-model advocate can make. The argument goes: yes, many circuits can produce the same manifold in principle, but the biological brain is one of them, and a foundation model trained at sufficient scale will eventually converge on the inductive biases that narrow the inverse problem enough to pin down something close to the brain’s implementation. That is a reasonable conjecture. It’s also one that needs to be labeled as a conjecture, because it is not something that has been demonstrated at any scale the current models have reached. Until a scaling result shows measurable convergence onto biologically-identified mechanism (and not just onto statistical match on held-out stimuli), “scale will close the gap” is a promissory note. I am open to being wrong about it, but I would like to see the full empirical argument unfold.

The blackboard is still right

Foundation models in neuroscience are genuinely useful tools. They predict brain activity at scales and with generalisation properties that nothing else can match. They compress large, heterogeneous datasets into representations that transfer across subjects, tasks, and even modalities. They will, I suspect, become standard infrastructure: the encoders and decoders on which future neuroscience is built, in roughly the way ImageNet-pretrained networks became the substrate on which a generation of vision research was built.

None of that is the same as understanding the brain or coming closer to a fundamental theory of neural computation.

What the current generation of models does is imitate the external statistics of brain activity with increasing fidelity. The imitation is good enough to be practically useful. It’s also structurally limited in a way that scaling cannot fix. The inverse problem from manifold to circuit is under-determined. The alignment of AI and biological latents reflects shared statistical structure, not shared mechanism. Predictive accuracy is a necessary but nowhere near sufficient condition for mechanistic insight. And the field’s willingness to elide the gap, to treat a good encoder as a good theory, to talk about “digital twins” and “in-silico neuroscience” as if the model had earned those descriptions, will, if left unchecked, cost us the vocabulary we need to talk about the real thing when it arrives.

The real thing is building. The real thing is a model that makes biological commitments, implements mechanisms the brain could plausibly use, generates predictions that closed-loop experiments can falsify, and updates when they do. The real thing is a theory. The blackboard at Caltech, on the day its author died, was right. We don’t understand what we cannot create. We have not created these systems. We have only learned to mimic their shadows very well.

That’s not a small thing. It’s just not the thing some of the labels are claiming.

References

Azabou, M., Arora, V., Ganesh, V., Mao, X., Nachimuthu, S., Mendelson, M. J., Richards, B., Perich, M., Lajoie, G., & Dyer, E. L. (2023). A unified, scalable framework for neural population decoding. arXiv:2310.16046. https://arxiv.org/abs/2310.16046

Azabou, M., Pan, K. X., Arora, V., Knight, I. J., Dyer, E. L., & Richards, B. A. (2025). Multi-session, multi-task neural decoding from distinct cell-types and brain regions. ICLR 2025. https://openreview.net/forum?id=IuU0wcO0mo

Bashivan, P., Kar, K., & DiCarlo, J. J. (2019). Neural population control via deep image synthesis. Science, 364(6439), eaav9436. https://doi.org/10.1126/science.aav9436

Billeh, Y. N., Cai, B., Gratiy, S. L., Dai, K., Iyer, R., Gouwens, N. W., et al. (2020). Systematic integration of structural and functional data into multi-scale models of mouse primary visual cortex. Neuron, 106(3), 388–403. https://doi.org/10.1016/j.neuron.2020.01.040

Bommasani, R., Hudson, D. A., Adeli, E., et al. (2021). On the opportunities and risks of foundation models. arXiv:2108.07258. https://arxiv.org/abs/2108.07258

Bowers, J. S., Malhotra, G., Dujmović, M., et al. (2023). Deep problems with neural network models of human vision. Behavioral and Brain Sciences, 46, e385. https://doi.org/10.1017/S0140525X22002813

Caro, J. O., de Oliveira Fonseca, A. H., Averill, C., et al. (2024). BrainLM: A foundation model for brain activity recordings. ICLR 2024. https://openreview.net/forum?id=RwI7ZEfR27

Caucheteux, C., & King, J.-R. (2022). Brains and algorithms partially converge in natural language processing. Communications Biology, 5, 134. https://doi.org/10.1038/s42003-022-03036-1

Caucheteux, C., Gramfort, A., & King, J.-R. (2023). Evidence of a predictive coding hierarchy in the human brain listening to speech. Nature Human Behaviour, 7, 430–441. https://doi.org/10.1038/s41562-022-01516-2

Chung, S., & Abbott, L. F. (2021). Neural population geometry: An approach for understanding biological and artificial neural networks. Current Opinion in Neurobiology, 70, 137–144. https://doi.org/10.1016/j.conb.2021.10.010

d’Ascoli, S., Rapin, J., Benchetrit, Y., Brookes, T., Begany, K., Raugel, J., Banville, H., & King, J.-R. (2026). A foundation model of vision, audition, and language for in-silico neuroscience. FAIR at Meta. https://ai.meta.com/blog/tribe-v2-brain-predictive-foundation-model/ · https://github.com/facebookresearch/tribev2

Deistler, M., Kadhim, K. L., Pals, M., Beck, J., Huang, Z., Gloeckler, M., Lappalainen, J. K., Schröder, C., Berens, P., Gonçalves, P. J., & Macke, J. H. (2025). Jaxley: differentiable simulation enables large-scale training of detailed biophysical models of neural dynamics. Nature Methods, 22(12), 2649–2657. https://doi.org/10.1038/s41592-025-02895-w

Duarte, R., & Morrison, A. (2019). Leveraging heterogeneity for neural computation with fading memory in layer 2/3 cortical microcircuits. PLoS Computational Biology, 15(4), e1006781. https://doi.org/10.1371/journal.pcbi.1006781

Elmoznino, E., & Bonner, M. F. (2024). High-performing neural network models of visual cortex benefit from high latent dimensionality. PLOS Computational Biology, 20(1), e1011792. https://doi.org/10.1371/journal.pcbi.1011792

Feynman, R. P. (1988). Blackboard at the time of his death, 15 February 1988. Caltech Archives and Special Collections. Photograph reproduced widely; see https://www.caltech.edu/

Goldstein, A., Zada, Z., Buchnik, E., et al. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25, 369–380. https://doi.org/10.1038/s41593-022-01026-4

Guest, O., & Martin, A. E. (2026). A metatheory of classical and modern connectionism. Psychological Review, 133(3), 719–736. https://doi.org/10.1037/rev0000591

Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E., & Gallant, J. L. (2016). Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532, 453–458. https://doi.org/10.1038/nature17637

Ito, T., Haufler, D., Galván Fraile, J., Dai, Y., Aman, J., Chen, G., Mirasso, C., Maass, W., & Arkhipov, A. (2026). Deep-learning-assisted simulation of a cortical circuit: integrating anatomy, physiology and function. bioRxiv. https://doi.org/10.64898/2026.03.13.711751

Ivanova, A. A., Schrimpf, M., Anzellotti, S., Zaslavsky, N., Fedorenko, E., & Isik, L. (2022). Beyond linear regression: Mapping models in cognitive neuroscience should align with research goals. Neurons, Behavior, Data Analysis, and Theory. https://doi.org/10.51628/001c.37507

Jazayeri, M., & Ostojic, S. (2021). Interpreting neural computations by the geometry of high-dimensional neural manifolds. Current Opinion in Neurobiology, 70, 113–120. https://doi.org/10.1016/j.conb.2021.08.002

Jiang, W., Zhao, L., & Lu, B.-L. (2024). Large brain model for learning generic representations with tremendous EEG data in BCI. arXiv:2405.18765. https://arxiv.org/abs/2405.18765

Kanwisher, N., Khosla, M., & Dobs, K. (2023). Using artificial neural networks to ask ‘why’ questions of minds and brains. Trends in Neurosciences, 46, 240–254. https://doi.org/10.1016/j.tins.2022.12.008

Kell, A. J. E., Yamins, D. L. K., Shook, E. N., Norman-Haignere, S. V., & McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98, 630–644.e16. https://doi.org/10.1016/j.neuron.2018.03.044

Kording, K. P., Arkhipov, A., Deng, D., Escola, G. S., Grant, S. G. N., et al. (2026). Compiling molecular ultrastructure into neural dynamics. arXiv:2603.25713.

Langdon, C., Genkin, M., & Engel, T. A. (2023). A unifying perspective on neural manifolds and circuits for cognition. Nature Reviews Neuroscience, 24, 363–377. https://doi.org/10.1038/s41583-023-00693-x

Lillicrap, T. P., Santoro, A., Marris, L., Akerman, C. J., & Hinton, G. (2020). Backpropagation and the brain. Nature Reviews Neuroscience, 21(6), 335–346. https://doi.org/10.1038/s41583-020-0277-3

Ma, W. J., & Peters, B. (2020). A neural network walks into a lab: Towards using deep nets as models for neural data. arXiv:2005.02181. https://arxiv.org/abs/2005.02181

Markram, H., Muller, E., Ramaswamy, S., et al. (2015). Reconstruction and simulation of neocortical microcircuitry. Cell, 163(2), 456–492. https://doi.org/10.1016/j.cell.2015.09.029

Mazor, O., & Laurent, G. (2005). Transient dynamics versus fixed points in odor representations by locust antennal lobe projection neurons. Neuron, 48(4), 661–673. https://doi.org/10.1016/j.neuron.2005.09.032

Mineault, P. (2024). Foundation models for neuroscience. NeuroAI.

Mineault, P. (2025). What are foundation models for? Lessons from synbio. NeuroAI.

Payeur, A., Guerguiev, J., Zenke, F., Richards, B. A., & Naud, R. (2021). Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. Nature Neuroscience, 24(7), 1010–1019. https://doi.org/10.1038/s41593-021-00857-x

Pezon, L., Schmutz, V., & Gerstner, W. (2026). Linking neural manifolds to circuit structure in recurrent networks. Neuron. https://doi.org/10.1016/j.neuron.2025.12.047

Ponce, C. R., Xiao, W., Schade, P. F., Hartmann, T. S., Kreiman, G., & Livingstone, M. S. (2019). Evolving images for visual neurons using a deep generative network reveals coding principles and neuronal preferences. Cell, 177, 999–1009.e10. https://doi.org/10.1016/j.cell.2019.04.005

Rabinovich, M., Huerta, R., & Laurent, G. (2008). Transient dynamics for neural processing. Science, 321(5885), 48–50. https://doi.org/10.1126/science.1155564

Sacramento, J., Costa, R. P., Bengio, Y., & Senn, W. (2018). Dendritic cortical microcircuits approximate the backpropagation algorithm. Advances in Neural Information Processing Systems (NeurIPS), 31, 8721–8732. https://arxiv.org/abs/1810.11393

Saxena, S., & Cunningham, J. P. (2019). Towards the neural population doctrine. Current Opinion in Neurobiology, 55, 103–111. https://doi.org/10.1016/j.conb.2019.02.002

Schneider, S., Lee, J. H., & Mathis, M. W. (2023). Learnable latent embeddings for joint behavioural and neural analysis. Nature, 617, 360–368. https://doi.org/10.1038/s41586-023-06031-6

Schrimpf, M., Kubilius, J., Hong, H., Majaj, N. J., Rajalingham, R., Issa, E. B., et al. (2020). Integrative benchmarking to advance neurally mechanistic models of human intelligence. Neuron, 108(3), 413–423. https://doi.org/10.1016/j.neuron.2020.07.040

Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118, e2105646118. https://doi.org/10.1073/pnas.2105646118

Scotti, P. S., Tripathy, M., Torrico Villanueva, C., Kneeland, R., Chen, T., et al. (2024). MindEye2: Shared-subject models enable fMRI-to-image with 1 hour of data. arXiv:2403.11207. https://arxiv.org/abs/2403.11207

Serre, T., & Pavlick, E. (2025). From prediction to understanding: Will AI foundation models transform brain science? Neuron. https://doi.org/10.1016/j.neuron.2025.09.039

Stringer, C., Pachitariu, M., Steinmetz, N., Reddy, C. B., Carandini, M., & Harris, K. D. (2019). High-dimensional geometry of population responses in visual cortex. Nature, 571, 361–365. https://doi.org/10.1038/s41586-019-1346-5

Walker, E. Y., Sinz, F. H., Cobos, E., Muhammad, T., Froudarakis, E., Fahey, P. G., et al. (2019). Inception loops discover what excites neurons most using deep predictive models. Nature Neuroscience, 22, 2060–2065. https://doi.org/10.1038/s41593-019-0517-x

Wang, E. Y., Fahey, P. G., Ding, Z., Papadopoulos, S., Ponder, K., Weis, M. A., et al., and the MICrONS Consortium, and Tolias, A. S. (2025). Foundation model of neural activity predicts response to new stimulus types. Nature, 640, 470–477. https://doi.org/10.1038/s41586-025-08829-y

Yamins, D. L. K., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111, 8619–8624. https://doi.org/10.1073/pnas.1403112111

Yuan, Z., Zhang, D., Yang, Y., Chen, J., & Li, Y. (2024). BrainWave: A brain signal foundation model for clinical applications. arXiv:2402.10251. https://arxiv.org/abs/2402.10251

Zador, A., Escola, S., Richards, B., et al. (2023). Catalyzing next-generation artificial intelligence through NeuroAI. Nature Communications, 14, 1597. https://doi.org/10.1038/s41467-023-37180-x

Sequential Structure All the Way Down

Renato Duarte — Fri, 20 Feb 2026 17:23:05 GMT

Introduction: Lashley’s Problem Is Everyone’s Problem

In 1951, Karl Lashley delivered a paper that should have reoriented neuroscience. “The Problem of Serial Order in Behavior” argued that the coordination of sequences — of vocal movements, of words in a sentence, of sentences in discourse, of muscular contractions in reaching — requires hierarchical plans organized by the brain. Not associative chains. Not stimulus-response pairs. Internal structure.

Lashley saw what behaviorism refused to acknowledge: that serial order is a problem that cuts across every domain of cognition. Not just language. Speech production. Motor planning. Goal hierarchies. Musical performance. Even something as mundane as making coffee requires nested sub-goals executed in the right order. As Gregory Hickok puts it in Wired for Words, Lashley’s insight was that the problems raised by the organization of language are characteristic of almost all other cerebral activity.

Seven decades later, we’re still addressing Lashley’s problem, and we often appear to struggle to accommodate it in our models of cortical function.

Here’s the thread. For several months, as part of a longer, ongoing research effort, we have been developing SymSeqBench, a framework for generating and analyzing symbolic sequences with formally controlled complexity. The goal is to provide computational neuroscience with the tools it needs to systematically probe sequence learning across species, implementations, and domains. What started as a benchmarking tool (and a meta-review, to come later) has become, increasingly, a lens for seeing something bigger: that the formal structure of sequential computation might be the best metric we have for understanding cognitive complexity and its evolution.

Let me walk you through how I got there and where this is leading me.

A Costly Fragmentation

A psycholinguist studying artificial grammar learning, a computational neuroscientist modeling temporal credit assignment, a neuromorphic engineer benchmarking spiking networks, and an LLM developer are all studying the same underlying problem: sequential structure. How systems, biological or artificial, learn to process, predict, and produce ordered sequences.

But you’d never know it from their methods. Every subfield uses its own task generation pipeline, its own complexity metrics, its own idiosyncratic stimuli. The psycholinguist can’t use the neuromorphic benchmark. The computational modeler can’t compare results to behavioral data. “Non-adjacent dependency task” means different things to different researchers and it’s even called by different names. We’re doing parallel play, not science.

This is why we built SymSeqBench. The framework combines two components: SymSeq (for generating and analyzing rule-based symbolic sequences using formal language theory) and SeqBench (for transforming those sequences into embedded datasets with controllable complexity). The critical design choice was grounding everything in formal foundations — the Chomsky hierarchy, specifically — rather than ad-hoc task design.

Why does this matter? Because formal language theory gives us a taxonomy of computational complexity that isn’t arbitrary. We know, provably, what separates regular from context-free languages. We know what memory architectures each level requires. Starting from theory means you’re not guessing about what makes your task difficult; you’re manipulating known structural properties.

The framework uses topological entropy as its primary complexity measure, computed via spectral analysis of grammar transition structures. This gives you a continuous metric that correlates with learning difficulty, letting you tune task complexity smoothly rather than jumping between qualitatively different tasks. And it operates at four analysis scales — token, string, string-set, grammar — because sequence learning is hierarchically organized. A model that nails local transition statistics but fails at long-range dependencies has learned something very different from one that captures grammar-level structure, and you need tools that can tell the difference.
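To make the entropy measure concrete, here is a minimal sketch of the textbook definition for a finite-state grammar: the topological entropy is the logarithm of the spectral radius (largest eigenvalue magnitude) of the grammar's state-transition matrix. This is not SymSeqBench's API. The function name and the three toy grammars are hypothetical, and the sketch assumes that every allowed transition emits a distinct symbol, so that counting paths through the graph is the same as counting strings.

```python
import numpy as np


def topological_entropy(transitions):
    """Return log(spectral radius) of a 0/1 state-transition matrix.

    transitions[i][j] == 1 means the grammar allows a step from state i
    to state j. Assumption: each allowed step emits a distinct symbol,
    so the number of length-n paths equals the number of length-n strings.
    """
    matrix = np.asarray(transitions, dtype=float)
    spectral_radius = np.max(np.abs(np.linalg.eigvals(matrix)))
    if spectral_radius == 0:
        return float("-inf")  # the grammar cannot produce arbitrarily long strings
    return float(np.log(spectral_radius))


# Three toy two-state grammars (hypothetical, for illustration only).
unconstrained = [[1, 1], [1, 1]]       # any state may follow any state
no_repeat_b = [[1, 1], [1, 0]]         # state 1 may not follow itself
strict_alternation = [[0, 1], [1, 0]]  # the sequence is fully determined

for name, grammar in [
    ("unconstrained", unconstrained),
    ("no_repeat_b", no_repeat_b),
    ("strict_alternation", strict_alternation),
]:
    print(f"{name}: {topological_entropy(grammar):.4f}")
```

The three grammars give log 2, the log of the golden ratio, and zero respectively: the more the transition structure constrains what can come next, the slower the number of admissible strings grows with length. Removing or adding single transitions moves the value in small steps, which is what makes a measure of this kind usable as a tuning knob.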

The bridge between formal theory and empirical testing is exactly what’s been missing.

The Chomsky Hierarchy Is Not About Language

Here’s the conceptual move that reframed my thinking. Most neuroscientists encounter the Chomsky hierarchy in a linguistics context and promptly forget it (or never encounter it at all). That’s a mistake. The hierarchy isn’t really about language. It’s about what classes of computation require what kinds of memory architecture.

Four levels. At the bottom, regular grammars — processable by finite-state automata with no external memory. One step up, context-free grammars, requiring a pushdown automaton: a finite-state machine augmented with a stack. Above that, context-sensitive grammars needing a linear-bounded tape. At the top, recursively enumerable languages requiring a full Turing machine.

The critical boundary, the one that matters for neuroscience, is between regular and context-free. This is where computation goes from “I can track what state I’m in” to “I can remember where I’ve been and return there in order.” It’s the boundary between correlation and structure. Between statistics and rules.

Consider center-embedded sentences: “The rat the cat the dog chased killed ate the malt.” A finite-state machine, regardless of how many states you give it, cannot enforce at arbitrary nesting depth the structural constraint that the number of subjects matches the number of verbs. A high-order Markov model with V^k states (where V is vocabulary size and k is dependency length) can approximate this statistically for short sequences, but it cannot enforce it. This is provable, not empirical.
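A small sketch can make the boundary tangible. It uses the standard abstraction of center-embedding, the language a^n b^n (n subjects followed by exactly n verbs), and compares a recognizer with an unbounded stack against a finite-state stand-in whose memory saturates at a fixed depth. Both function names and the saturation scheme are illustrative assumptions, not taken from SymSeqBench or from any of the cited papers. For this particular language a single counter would do; the explicit stack is kept to mirror the pushdown automaton described above.

```python
def accepts_with_stack(string):
    """Recognise a^n b^n (n >= 1) with an unbounded stack."""
    stack = []
    seen_b = False
    for symbol in string:
        if symbol == "a":
            if seen_b:
                return False          # an 'a' after a 'b' breaks the nesting
            stack.append(symbol)      # push: open a dependency
        elif symbol == "b":
            seen_b = True
            if not stack:
                return False          # a 'b' with nothing left to close
            stack.pop()               # pop: close the most recent dependency
        else:
            return False
    return seen_b and not stack


def accepts_with_bounded_memory(string, max_depth):
    """Finite-state stand-in: depth tracking saturates at max_depth."""
    depth = 0
    seen_b = False
    for symbol in string:
        if symbol == "a":
            if seen_b:
                return False
            depth = min(depth + 1, max_depth)  # saturation: finitely many states
        elif symbol == "b":
            seen_b = True
            if depth == 0:
                return False
            depth -= 1
        else:
            return False
    return seen_b and depth == 0


for n_a, n_b in [(3, 3), (4, 4), (4, 3)]:
    string = "a" * n_a + "b" * n_b
    print(
        f"a^{n_a} b^{n_b}:",
        "stack", accepts_with_stack(string),
        "| bounded(3)", accepts_with_bounded_memory(string, max_depth=3),
    )
```

Within its depth limit the bounded machine is indistinguishable from the stack. One level deeper it rejects the well-formed string a^4 b^4 and accepts the ill-formed a^4 b^3. Raising `max_depth` only moves the failure point; it never removes it.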

And this creates what I’ve started calling the “Markov illusion” — the belief that sufficiently powerful statistical models can substitute for structural computation. It’s the same illusion that makes large language models seem to understand grammar. The illusion breaks precisely when you test generalization to unseen depths or novel compositions. With SymSeqBench, we’ve seen this failure across every architecture we’ve tested: even within regular grammars, systematic generalization fails. The systems learn position-specific patterns, not abstract structure.

If artificial neural networks can’t do this without custom augmentations (see Deletang et al., 2023), how does biological tissue manage it?

Mapping Cognitive Evolution Through Grammar Complexity

This is where it gets genuinely exciting. A recent paper by Klein and Barron (2024) — “Comparing cognition across major transitions using the hierarchy of formal automata” — makes the argument for an idea I’ve also been trying to explore: that the Chomsky hierarchy can serve as a map for the evolution of cognitive complexity.

Their framework identifies major cognitive transitions, each corresponding to a jump in the formal automaton hierarchy:

Nets to Centralization: Distributed nerve nets (cnidarians) to centralized nervous systems. No change in formal class — still reactive, still finite-state — but centralization enables faster, coordinated processing.
Centralization to Recurrence: The emergence of recurrent connections, enabling temporal integration and memory. This is the transition from purely feedforward to recurrent processing — from stimulus-bound to context-dependent.
Recurrence to Lamination: Layered cortical structures enabling hierarchical processing and increasingly abstract representations.
Lamination to Reflection: The emergence of metacognition, self-monitoring, recursive thought.

At each transition, the computational repertoire expands. The organism can handle more complex sequential structures — deeper dependencies, longer-range correlations, more hierarchical nesting.

What makes this testable? If you can characterize the grammar complexity a species can handle behaviorally, you can place it on this cognitive map. We’ve already started doing this — analyzing cross-species behavioral sequences through the multi-scale analysis pipeline. The preliminary results are striking: mouse grooming sequences show lower syntactic complexity than zebrafish or finch vocalizations. Seals and turtles cluster at intermediate levels. The ordering isn’t what you’d naively predict from “brain size” or phylogenetic distance, which suggests the formal complexity metric is capturing something the traditional metrics miss. Deeper conclusions, however, would warrant a more systematic investigation.

Critical Transition Thresholds

There’s a parallel line of work that converges on the same idea from a completely different direction. Assembly Theory, developed by Lee Cronin, Sara Walker, and colleagues (Sharma et al., 2023), measures the complexity of molecular objects by their “assembly index” — the minimum number of joining operations needed to construct the object from basic building blocks. Their key finding: an assembly index above 15 reliably distinguishes molecules produced by living systems from those formed abiotically. It’s a complexity threshold that marks the transition from chemistry to biology.

The formal connection to our story is this: assembly indices are related to the descriptional complexity of formal grammars. The assembly process — recursive combination of sub-assemblies — maps onto context-free grammar production rules. The critical threshold is a transition in the complexity class of the generative process.

Klein and Barron’s cognitive transitions and Cronin’s molecular transitions are, in a sense, the same phenomenon observed at different scales: discontinuities in the complexity of rule structures that a system can generate and process mark qualitative transitions in the system’s nature. In molecules, you get life. In nervous systems, you get cognition. And the formal framework for measuring both is the same.

A recent empirical test by Voudouris et al. (2025) added computational weight to this picture. They tested artificial neural networks on tasks spanning the Chomsky hierarchy and found that the critical architectural transition is from feedforward to recurrent processing — mirroring what Klein and Barron predicted for biological nervous systems. The performance gap between architectures widens sharply at higher levels of the hierarchy, confirming that architectural transitions correspond to genuine computational capability boundaries.

The Neural Stack: A Biological Answer

If the Chomsky hierarchy maps onto cognitive evolution, the obvious question becomes: what neural machinery implements each level? For regular grammars, recurrent networks suffice — finite-state dynamics with memory implicit in the network state. But context-free grammars require a stack. Where is the biological stack?

The most striking proposal I’ve encountered comes from Rodriguez and Granger (2016). They argue that the hippocampus functions as a biological pushdown stack. The physiological basis is sharp-wave ripples — high-frequency oscillations (150-250 Hz) during which the hippocampus replays compressed sequences of neural activity. Forward replay pushes items to memory. Reverse replay pops them, accessing the most recently stored items first, exactly the behavior a pushdown automaton requires.

The prefrontal cortex, in this picture, provides the control logic: when to push and when to pop. Working memory capacity limits (Cowan’s estimate of roughly 4 chunks) can be reinterpreted as a limit on stack depth, constrained by the bandwidth of the cortico-hippocampal channel and the number of gamma cycles that fit within a theta cycle.

Here’s the clincher: Rodriguez and Granger argue that the computational power of a species is determined by the ratio of cortical size to hippocampal size. A larger cortex can buffer more “calls” to the hippocampal stack before saturation. The human cognitive leap isn’t a novel language module — it’s a phase transition in this ratio, crossing from simple regular grammars to mildly context-sensitive languages. The anatomy supports it: PFC projects to the hippocampus via the nucleus reuniens, and theta-gamma coherence between PFC and hippocampus correlates with working memory performance.

The biological implementation is necessarily noisy — attractor dynamics where “push” moves the system into a basin of attraction and “pop” is triggered by an end-of-sequence signal. This means the system doesn’t perfectly implement a stack; it implements something that behaves like a stack for shallow nesting depths and short delays, but degrades gracefully beyond capacity. The three-level center-embedding limit in human sentence processing is a feature of this noisy implementation, not a bug. Maybe it’s actually better than a perfect stack for real-world cognition — natural language rarely requires deep recursion, and a system tuned for the typical case is more efficient than one designed for a generic, rarely encountered case.

Where This Converges

Several threads are pulling together here, and our SymSeqBench sits at the center of the braid.

The Chomsky hierarchy as a cognitive metric: Not a linguistic curiosity but a formal framework for measuring cognitive complexity across species, architectures, and evolutionary transitions. SymSeqBench generates the task hierarchies and metrics needed to test this (some extensions required).

The Markov illusion as a diagnostic tool: When a system appears to handle complex sequences but fails at generalization, it’s operating below the complexity level of the task. SymSeqBench’s multi-scale analysis can distinguish surface statistics from genuine structural learning — the difference between riding correlations and enforcing rules.

Cross-species behavioral mapping: If grammar complexity places species on a cognitive map, we need standardized tools to measure it. SymSeqBench’s behavioral sequence analysis pipeline provides exactly this, and the early cross-species results suggest it works.

The neural architecture question: At what point in the Chomsky hierarchy do simple recurrent networks fail and biophysical features — dendritic nonlinearities, multi-timescale dynamics, cortical-hippocampal loops — become necessary? If the regular-to-context-free boundary requires something like a hippocampal stack, that’s direct evidence connecting neural architecture to formal computational power.

The questions I want to pursue from here:

Can spiking networks with biologically realistic features naturally implement pushdown-like behavior where point-neuron models fail? Can we build computational models with varying “cortex-to-hippocampus” ratios and measure the grammar complexity they handle? Is the allometric scaling prediction of Rodriguez and Granger computationally testable?
And perhaps the sharpest question: do the discontinuities in the Chomsky hierarchy — the boundaries between computational classes — correspond to discontinuities in the neural manifold geometry of systems processing these sequences? If the geometry changes qualitatively at the regular-to-context-free boundary, that would mean one of the most debated questions in cognitive science — whether recursion requires specialized neural machinery — has a precise, measurable answer.

The thread connecting formal language theory, hippocampal replay, cognitive evolution, and benchmark design feels like it’s converging toward something testable. The experimental framework exists. The questions are precise. Now we need the experiments (and the funding).

If you work on these questions, or would like to collaborate along these lines of research, do reach out so we can tackle these critically important problems together. Our tools provide the first steps; now we need to put them to good use.

References & Links

Rodriguez, A., & Granger, R. (2016). The grammar of mammalian brain capacity. Theoretical Computer Science, 633, 100-111. DOI: 10.1016/j.tcs.2016.03.021
Klein, C., & Barron, A. B. (2024). Comparing cognition across major transitions using the hierarchy of formal automata. WIREs Cognitive Science, 15(4), e1680. DOI: 10.1002/wcs.1680
Barron, A. B., Halina, M., & Klein, C. (2023). Transitions in cognitive evolution. Proceedings of the Royal Society B, 290(2002), 20230671. DOI: 10.1098/rspb.2023.0671
Voudouris, K., Barron, A. B., Halina, M., Klein, C., & Patel, M. (2025). Exploring major transitions in the evolution of biological cognition with artificial neural networks. arXiv preprint, arXiv:2509.13968.
Sharma, A., Czegel, D., Lachmann, M., Kempes, C. P., Walker, S. I., & Cronin, L. (2023). Assembly theory explains and quantifies selection and evolution. Nature, 622(7982), 321-328. DOI: 10.1038/s41586-023-06600-9
Jager, G., & Rogers, J. (2012). Formal language theory: Refining the Chomsky hierarchy. Philosophical Transactions of the Royal Society B, 367(1598), 1956-1970. DOI: 10.1098/rstb.2012.0077
Fitch, W. T., & Friederici, A. D. (2012). Artificial grammar learning meets formal language theory: An overview. Philosophical Transactions of the Royal Society B, 367(1598), 1933-1955. DOI: 10.1098/rstb.2012.0103
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior (pp. 112-136). Wiley.
Hickok, G. (2025). Wired for Words: The Neural Architecture of Language. MIT Press.
Deletang, G., Ruoss, A., Grau-Moya, J., Genewein, T., Wenliang, L. K., Catt, E., Cundy, C., Hutter, M., Legg, S., Veness, J., & Ortega, P. A. (2023). Neural Networks and the Chomsky Hierarchy. ICLR 2023. arXiv: 2207.02098
Zajzon, B., Bouhadjar, Y., Fabre, M., Schmidt, F., Ostendorf, N., Neftci, E., Morrison, A., & Duarte, R. (2025). SymSeqBench: A unified framework for the generation and analysis of rule-based symbolic sequences and datasets. arXiv preprint, arXiv:2512.24977.
Levelt, W. J. M. (1974). Formal Grammars in Linguistics and Psycholinguistics (3 vols.). Mouton.

Could a Cancer-Borne Protein Help Us Fight Dementia?

Renato Duarte — Sat, 24 Jan 2026 19:09:29 GMT

I don’t usually write about clinical neurosciences. My focus (and the focus of this newsletter) is neural computation, circuit dynamics, algorithmic foundations of cognition. But when I came across Dr. Dominic Ng’s Twitter thread about a recent Cell paper, the story was too compelling to ignore. This is science at its best: an unexpected connection between two devastating conditions that reveals something fundamental about how our brains maintain their integrity and provides a great example of curiosity-driven exploration yielding new links and therapeutic targets.

The Paradox

Apparently people with cancer rarely develop Alzheimer’s disease and people with Alzheimer’s have lower cancer rates. This inverse relationship has been documented in epidemiological studies for years but remained a medical curiosity without mechanistic explanation. At first glance, and considering the many confounding factors potentially involved, the natural answer would be “it is a statistical artifact” or a consequence of the characteristics of the diseases. For example, this could simply be a consequence of cancer patients not surviving long enough to develop dementia. But what if these simpler explanations were missing something deeper—a biological tradeoff between these two conditions?

A fantastic study by Li and colleagues, published in Cell last week, decided to systematically address this question, isolating all possible confounds and offering a compelling molecular explanation (Li et al., 2025). The answer centers on an unlikely molecule: Cystatin C, a protein tumors produce in abundance, which somehow protects the brain from amyloid plaque accumulation—the toxic protein aggregates that define Alzheimer’s pathology.

From Observation to Mechanism

The research team started with an audacious experiment: they took mice genetically engineered to develop Alzheimer’s pathology and gave them cancer. Not one tumor type, but three: human lung, prostate, and colon cancer. The results were consistent across all types: dramatically reduced amyloid plaque burden compared to tumor-free Alzheimer’s mice. Nor were the effects subtle; the reported effect sizes were large. The detective work began with screening over 1,500 genes to identify what differed in tumor-bearing mice. One protein stood out: Cystatin C. It was elevated not just in mice with tumors, but also in human cancer patients. Cystatin C is secreted by many tumor types, with blood levels correlating with cancer progression.

But here’s where it gets interesting from a neuroscience perspective. How does a blood-borne protein get into the brain to affect amyloid plaques? The brain is protected by the blood-brain barrier—a selective filter that normally excludes most blood molecules from neural tissue. And how can a single protein exhibit such a dramatic effect on amyloid plaque clearance?

The Leaky Brain

It turns out that the answer to the first question comes from Alzheimer’s disease pathophysiology: among the characteristic features of the disease is blood-brain barrier dysfunction. The tight junctions between endothelial cells that seal off the brain become compromised. The barrier leaks.
For most proteins, this is catastrophic—opening the brain to inflammatory molecules and toxins. But for Cystatin C, this leakiness provides access. The protein slips through the weakened barrier into brain tissue. Once inside, it doesn’t act on amyloid directly. Instead, it targets the brain’s resident immune cells: microglia. And here is the answer to the second question. Microglia are the brain’s primary immune defenders, constantly surveying their environment with elaborate branching processes (at some point, I would love to work on modeling microglia, as they may have non-negligible computational (or homeostatic) effects), ready to respond to damage, pathogens, abnormal protein aggregates. In Alzheimer’s disease, microglia become dysfunctional. They cluster around amyloid plaques but fail to clear them effectively. Sometimes they contribute to inflammation that accelerates pathology.

Why do microglia fail to clear amyloid plaques? The mechanisms are complex, but one critical factor is reduced TREM2 signaling, a receptor pathway that normally promotes microglial phagocytosis. Rare genetic variants in TREM2 dramatically increase Alzheimer’s risk, confirming its protective role. But therapeutic attempts to activate TREM2 have shown mixed results in clinical trials. This is where Cystatin C intervenes.

Activating the Cleanup Crew

The Li et al. study demonstrates that Cystatin C binds to TREM2 receptors on microglia, flipping them into an enhanced phagocytic state—what the authors call “cleanup mode.” The result: more efficient amyloid engulfment and degradation. The brain’s garbage disposal system turned up.

The authors provide compelling and comprehensive evidence for this mechanism:

Necessity of binding: Mutating Cystatin C so it can’t bind TREM2 eliminates the protective effect.
Necessity of TREM2: Removing TREM2 from microglia prevents Cystatin C from reducing plaques.
Functionality: The pathway requires all components working together: Cystatin C, TREM2, and functional microglia.

But the critical test was therapeutic: could Cystatin C help brains with established pathology?

From Plaques to Memory

The team administered Cystatin C directly to mice with established Alzheimer’s pathology—animals with significant plaque accumulation and cognitive deficits. The outcome is striking:

Plaque degradation: Existing plaques were broken down.
Memory restoration: Behavioral tests showed recovery of memory function.
Synaptic rescue: Long-term potentiation was restored to near-normal levels.

This is astonishing. It’s not just prevention; this is a potential treatment. The protein doesn’t only stop new plaques from forming; it helps clear existing ones and restore normal function.

The Multilevel View

What this study reveals is a systems failure in Alzheimer’s disease, and a systems rescue by Cystatin C:

At the molecular level, amyloid-beta proteins misfold and aggregate, resisting normal degradation.
At the cellular level, microglia fail to clear these aggregates due to TREM2 dysfunction or insufficient activation signals.
At the circuit level, accumulated plaques disrupt synaptic transmission, impairing plasticity mechanisms and memory encoding.
At the systems level, network-level dysfunction manifests as cognitive decline—the computational capacity of neural circuits degrades.

Cystatin C, produced by cancerous tissue, intervenes at the cellular level, restoring microglial function, which cascades upward to rescue synaptic and circuit-level function. Fix one key node in the network, and the system can begin to restore itself.

Caveats and Context

As compelling as this story is, we need clear-eyed assessment of limitations (and I may not be the best equipped person to evaluate them, but here’s my take):

Mouse models aren’t humans. The mice in this study were genetically engineered to overproduce amyloid. They model one aspect of Alzheimer’s pathology. Alzheimer’s in humans is vastly more complex—tau tangles, neuroinflammation, vascular dysfunction, decades of progressive degeneration. A treatment that works in these mice might not translate to human patients.
Amyloid isn’t the whole story. While amyloid plaques define Alzheimer’s, they’re not the only pathology. Tau protein tangles inside neurons correlate more strongly with cognitive decline than plaques. This study doesn’t address tau. Even if Cystatin C clears plaques in humans, we don’t know whether that would suffice to slow or reverse dementia.
TREM2 drugs have apparently struggled. The mechanism works through TREM2, which is both encouraging and concerning. Encouraging because TREM2 is already a therapeutic target with ongoing drug development. Concerning because earlier TREM2 activators showed limited efficacy in human trials. Why would Cystatin C succeed where others haven’t? Dosing? Timing? Additional pathways we don’t yet understand? Unclear.
Cancer as treatment is obviously not a viable option. The paradox that initiated this research—cancer patients having lower Alzheimer’s rates—doesn’t suggest we should induce cancer to prevent dementia. The goal is isolating the protective mechanism (Cystatin C) from the disease context. Cancer is systemically devastating in ways that overwhelm any cognitive benefit.

What compelled me about this work isn’t just the therapeutic potential, though that would be profound. It’s what this reveals about the interconnectedness of biological systems and the importance of following mechanisms into unexpected territory. For years, the cancer-Alzheimer’s inverse relationship was a curiosity—noted in epidemiological studies but not deeply investigated. It took researchers willing to take that signal seriously, to design experiments that seemed almost absurd on the surface (giving cancer to Alzheimer’s mice), and to follow the mechanistic trail wherever it led.

This is how science makes breakthroughs: by following paradoxes to their source and systematically disentangling mechanisms. This also highlights the critical role of brain-immune interactions in neurodegeneration. For too long, we thought of the brain as “immune privileged”—isolated behind its barrier, with microglia as the only immune presence. We now know the brain is in constant communication with the peripheral immune system, and this communication can be both protective and pathological.

Cystatin C might be one of many circulating factors influencing brain health. Identifying others—understanding how they work together—could open entirely new therapeutic avenues. In any case, I found this whole story fascinating!

Sometimes the most important insights come from where we least expect. A protein made by tumors might teach us how to save memories.

Main Reference:

Li et al. (2025). “Cystatin C secreted from peripheral tumors crosses the blood-brain barrier to activate TREM2 and reduce amyloid pathology.” Cell, 188(3). DOI: 10.1016/j.cell.2025.12.020

Acknowledgments: This article was inspired by Dr. Dominic Ng’s excellent Twitter thread breaking down the Li et al. study. Dr. Ng is a neuroscience researcher at the University of Edinburgh and runs the newsletter Brain Health, Decoded. Follow him @DrDominicNg for accessible explanations of cutting-edge neuroscience research.

Scale Without Mechanism

Renato Duarte — Thu, 22 Jan 2026 22:49:39 GMT

Recent announcements herald impressive computational milestones: the JUPITER supercomputer aims to simulate 20 billion neurons with 100 trillion connections, matching the human cerebral cortex in scale. The technical achievement is impressive and undeniable. Yet it prompts fundamental questions about the relationship between computational scale and scientific understanding. As both artificial intelligence and neuroscience grapple with the limits of “scaling is all you need”, we have an opportunity to chart a more productive path, one that prioritizes mechanistic insight over numerical magnitude.

Lessons from AI’s Scaling Plateau

The AI community recently confronted sobering realities about scaling. As Gary Marcus documented in “Scale Is All You Need is Dead,” simply increasing model size, data, and compute proved insufficient for artificial general intelligence (AGI). At NeurIPS 2025, leading researchers acknowledged persistent challenges with reliability, reasoning, hallucination, and generalization—despite unprecedented investments. Multiple studies from MIT, McKinsey, and BCG found that 95% of companies report disappointing returns on generative AI investments.

This doesn’t invalidate the progress. Large language models have achieved remarkable capabilities, and their usefulness and impact are undeniable. But the dominant argument of the past few years, that AGI is just a matter of scaling the existing architectures, no longer holds up: scale amplifies what’s present but doesn’t create what’s absent. As François Chollet and Yann LeCun argue, we need architectural innovations that capture fundamental computational principles, not just bigger versions of existing approaches.

The Allure (and limitations) of Brain-Scale Simulation

Computational neuroscience faces analogous questions, but with an added layer of complexity. Current brain-scale simulation efforts primarily use simplified point neuron models—typically leaky integrate-and-fire (LIF) neurons connected via simplified, static synapses. Recent coverage of these efforts presents them as breakthroughs that will “offer unprecedented insights into how our brains work.”

The reality is more nuanced.

The argument for scale echoes recent AI optimism: “Large language models have shown that larger systems contain features simply not present in smaller ones,” suggests Markus Diesmann. “We know now that large networks can do qualitatively different things than small ones.”

This observation is correct—but also insufficient. Let me explain why through four interconnected arguments.

I. Emergence is Expected, Not Explanatory

First, the appearance of qualitatively different behaviors at scale is neither surprising nor, by itself, scientifically revealing. Nonlinear dynamical systems naturally display emergent properties—the whole becomes genuinely more than the sum of its parts. This is a fundamental characteristic of complex systems, well-established in physics, chemistry, and mathematics.

Recent analyses of emergent abilities in large language models note that “both chaotic systems and deep neural networks are characterized by high dimensionality and inherent non-linearity, which are prerequisites for complex and sensitive behavior and emergence.” The observation that scale produces emergence is an empirical fact, but one with limited explanatory power unless we understand the mechanisms generating those emergent phenomena. I had a professor who said he didn’t like the term “emergence” because he equated it with “we have no idea what is causing this”.

In brain-scale simulations, observing network-level dynamics different from the sum of single-neuron behavior tells us something happened at scale. It doesn’t tell us why. It doesn’t tell us if those dynamics match biological reality. It doesn’t tell us which features are computationally essential versus artifacts of the particular implementation choices.

The real question is whether our models capture the mechanisms that generate the emergent phenomena we care about.

Examples of emergence across different nonlinear systems: phase transitions in physical systems, collective behavior in agent-based models, and synchronization in neural network dynamics. Emergence is a general property of complex systems.

II. What Simplification Strips Away: The Descriptive Adequacy Problem

This brings us to the core problem: descriptive adequacy—whether a model simplifies away the computational elements that generate the phenomena we aim to explain.

As I have previously argued, there’s a fundamental tension in modeling: “If a model simplifies away the core computational elements of the system, our ability to account for its operations is lost.” The challenge isn’t avoiding simplification—all models simplify—but rather asking “what is the cost of simplification?”

Current brain-scale models use point neurons (LIF models) and point-current synapses that reduce each neuron to a summation device firing when inputs exceed threshold. These models omit crucial biological mechanisms that aren’t implementation details but computational substrates. Stripping them away removes a fundamental computational layer which may be causally responsible for the observed dynamics and computational capabilities (and, consequently, cognitive and behavioral phenotypes), i.e., the emergent phenomena we care about. For instance:

Dendritic computation: Real neurons aren’t point integrators. London and Häusser’s seminal work demonstrated that dendrites perform active computation: individual branches execute nonlinear integration, detect coincident inputs, and generate local spikes, creating compartmentalized subnetworks within single neurons. Recent work confirms that these mechanisms are essential for in vivo neural computations and behavior, and we can now model them efficiently.
Ion channel and receptor diversity: Neurons express hundreds of ion channel types with distinct kinetics, voltage dependencies, ligand dependencies and spatial distributions, optimized for energy efficiency and computational flexibility. These channels enable neurons to implement multiple computational modes that threshold models cannot capture. Additionally, the properties and distribution of these molecular components are adaptive and subject to activity-dependent modulation.
Neuromodulation: Computational studies of neuromodulation reveal how dopamine, serotonin, and acetylcholine dynamically reconfigure network properties, enabling the same circuit to implement different computations based on behavioral context. Networks of point neurons and static current synapses lack these state-dependent dynamics entirely, and extending them in that direction is far from trivial, as they lack the relevant processes and variables in the first place.
Synaptic plasticity mechanisms: Multi-scale studies show that learning occurs through complex biochemical cascades operating across timescales ranging from milliseconds to hours, which is crucial for understanding memory, development, and adaptation. Suffice it to say that until we understand the molecular and cellular mechanistic bases of plasticity and incorporate them appropriately (at the relevant level of abstraction), we will not capture learning, memory and a host of cognitive processes.

Brain-scale simulations with oversimplified components become anatomically constrained activity propagation models: simulations that respect connectivity structure but treat neurons as simple relays, lacking the biophysical features that generate actual computation. At this level of abstraction, the model’s relationship to the brain is primarily architectural, not functional.
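
To make concrete how little machinery a point neuron carries, here is a minimal sketch of a single leaky integrate-and-fire unit driven by an injected current. The function name simulate_lif and every parameter value are illustrative assumptions (they are not the settings of any simulation discussed here), the integration is plain forward Euler, and there is no refractory period.

```python
def simulate_lif(input_current, dt=0.1, tau_m=10.0, v_rest=-65.0,
                 v_thresh=-50.0, v_reset=-65.0, r_m=10.0):
    """Leaky integrate-and-fire neuron, forward-Euler integration.

    input_current: sequence of injected current values (nA), one per time step.
    dt, tau_m in ms; voltages in mV; r_m in MOhm. All values are hypothetical.
    Returns the list of spike times in ms.
    """
    v = v_rest
    spike_times = []
    for step, i_ext in enumerate(input_current):
        # Membrane equation: tau_m * dv/dt = -(v - v_rest) + r_m * i_ext
        dv = (-(v - v_rest) + r_m * i_ext) / tau_m
        v += dt * dv
        if v >= v_thresh:
            # Threshold crossing: record a spike and reset the only state variable
            spike_times.append(step * dt)
            v = v_reset
    return spike_times


n_steps = 10_000  # 1000 ms at dt = 0.1 ms
weak = simulate_lif([1.0] * n_steps)    # steady state -55 mV, below threshold
strong = simulate_lif([2.0] * n_steps)  # steady state -45 mV, above threshold
print(len(weak), len(strong))
```

The entire state of the unit is the single variable v. There are no dendritic compartments, no channel kinetics, and no neuromodulatory or plasticity variables to perturb, which is the sense in which the mechanisms listed above are absent rather than merely simplified.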

III. What Brain-Scale Models Can Tell Us

This isn’t to say structurally constrained models are worthless. But we should be clear about what they can—and cannot—reveal.

In 2014, Potjans and Diesmann published their landmark cortical microcircuit model: approximately 80,000 neurons organized into cortical layers 2/3, 4, 5, and 6 (E/I sub-populations in each layer), connected according to empirically derived connectivity statistics. The model used simple LIF neurons and demonstrated that “the observed cell-type and layer specificity of in-vivo firing rates is largely explained by the specificity in the number of connections between cortical subpopulations.”

This was a genuine insight—but notice what it revealed: how connectivity architecture constrains activity flow and firing rate distributions. The model succeeded precisely because its scientific question matched its level of description. It didn’t attempt to explain computational processes, learning mechanisms, or cognitive function. It asked: given this connectivity structure, what activity patterns emerge?

Current brain-scale simulations occupy similar conceptual territory. The primary insights concern signal propagation and information flow through anatomically defined networks. This is valuable for specific (and perfectly legitimate) questions, but one has to keep in mind that the observed dynamics will be a consequence of the formalisms chosen to model the network’s nodes and edges. And if these building blocks differ fundamentally from their biological counterparts, the range of scientific questions we can explore is narrow. In fact, if we try to quantify the information-processing capacity of prominent cortical circuit models designed with similar priorities in mind, we quickly see their limitations as models of the biological system.

Brain-scale models will likely reveal insights about large-scale signal propagation, network topology effects, and spatiotemporal dynamics across interconnected brain regions. Understanding these phenomena matters. But these questions are fundamentally different from understanding how neural circuits compute, how learning occurs, how cognition emerges or how mechanistic dysfunctions yield clinical phenotypes.

For the latter questions—the ones that motivated building brain-scale models in the first place—simplification may have crossed into inadequacy.

IV. Disease Modeling: When Simpler is More Insightful

This brings us to a crucial paradox: for the very applications where brain-scale models are promoted most heavily—disease modeling—the supposed advantage of scale may be illusory.

The translational promise of brain-scale simulation deserves special attention. Neurological and psychiatric disorders—epilepsy, Parkinson’s disease, Alzheimer’s disease, schizophrenia—arise from specific molecular and cellular pathologies. For disease modeling, biological detail isn’t optional; it’s essential.

Yet here’s the paradox: for many disease applications, simpler population-level models may be more insightful than detailed brain-scale simulations.

Consider epilepsy, one of the flagship applications for brain-scale modeling. Epileptic seizures manifest as mesoscopic and macroscopic phenomena—abnormal synchronization across neuronal populations visible in EEG and MEG. The pathology often involves altered ion channel kinetics, synaptic receptor dysfunction, or circuit-level excitability changes.

Neural mass and neural field models—coarse-grained population descriptors—have proven remarkably successful for epilepsy research. Recent work shows that “mean-field models are often preferred over the more detailed models since they have fewer parameters and, thus, simplify the study of transitions from interictal to ictal states.” These models successfully capture seizure initiation, propagation, and termination while remaining computationally tractable for personalization and intervention testing.

Crucially, patient-specific “digital twin” approaches for epilepsy surgery planning use neural mass models, not detailed spiking networks. They work: retrospective validation shows these models can localize epileptogenic zones and predict surgical outcomes. The reason they work is that they operate at the right scale for the phenomenon—population-level dynamics where seizures actually manifest.
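
To illustrate why population models make state transitions easy to study, here is a toy single-population rate model with recurrent excitation. It is a sketch under stated assumptions, not any published epilepsy model or clinical pipeline: the names simulate_population and sigmoid, and all parameter values, are hypothetical and chosen only so that the model has a low-activity and a high-activity regime.

```python
import math


def sigmoid(x, gain=1.0, threshold=5.0):
    """Population activation function (fraction of the population active)."""
    return 1.0 / (1.0 + math.exp(-gain * (x - threshold)))


def simulate_population(drive, w_rec=10.0, tau=10.0, dt=0.1,
                        t_max=500.0, e0=0.0):
    """Forward-Euler integration of tau * dE/dt = -E + S(w_rec * E + drive).

    E is the mean activity of one excitatory population (between 0 and 1).
    drive stands in for an excitability parameter; values are illustrative.
    Returns the activity trace, one value per time step.
    """
    e = e0
    trace = []
    for _ in range(int(t_max / dt)):
        e += (dt / tau) * (-e + sigmoid(w_rec * e + drive))
        trace.append(e)
    return trace


low = simulate_population(drive=0.0)   # settles in the low-activity state
high = simulate_population(drive=3.0)  # low state no longer exists; activity saturates
print(round(low[-1], 3), round(high[-1], 3))
```

With one state variable and one control parameter, the switch between regimes can be located and analyzed directly. That tractability is the property the quoted passage attributes to mean-field descriptions, and it is lost once the same question has to be asked of billions of coupled units.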

This doesn’t mean microscopic detail is irrelevant. Ion channel mutations cause epilepsy; understanding those mechanisms requires biophysically detailed models. But the optimal modeling strategy is multiscale, hybrid and mechanism-specific—detailed biophysical models for understanding molecular pathology, population models for understanding network-level seizure dynamics, and principled links between scales.

As emphasized in recent reviews, “although biophysically explicit modeling is the primary technique to look into the role played by experimentally inaccessible variables in epilepsy, the usefulness of detailed biophysical models is limited by constraints in computational power, uncertainties in detailed knowledge of neuronal systems, and the required simplification for numerical analysis.”

Simply scaling up LIF networks to brain scale doesn’t solve this; it doesn’t even begin to address the problem. You get neither the mechanistic precision of targeted biophysical models nor the analytical tractability of population models. For many questions, you occupy an awkward middle ground: too detailed for efficient exploration, too simplified for mechanistic insight.

Multiscale disease modeling approaches (conceptual illustration).

A Constructive Path Forward

This critique is not aimed at dismissing large-scale simulation but rather redirecting computational resources (and funding) toward maximum scientific insight. Several productive directions emerge:

Multiscale Integration with Purpose: Rather than uniform brain-scale detail, build hybrid models that deploy detail where mechanistically necessary. Recent reviews demonstrate the power of integrating molecular mechanisms, detailed microcircuit models, and large-scale dynamics—but only where each level informs specific questions. The key insight: extract “a broad range of behavior” by connecting different organizational levels, not by simulating everything everywhere all at once.
Mechanism-Specific Depth: Instead of simulating the entire brain with uniform simplification, target specific phenomena with appropriate biological detail. For working memory, include mechanisms relevant for persistent activity. For sensory processing, capture dendritic integration of feedforward and feedback inputs. As Schirner et al. showed, brain network models “support multi-scale neurophysiological inference” by carefully matching model complexity to scientific questions.
Validation Frameworks: The Human Brain Project’s recent review emphasized that modeling efforts need rigorous validation frameworks. Models should make testable predictions at multiple scales—single-neuron responses to network dynamics to behavioral outputs—and be systematically compared against experimental data. Without validation, scale is just spectacle.
Hybrid Computational Approaches: Recent innovations combine biophysical simulations with machine learning to bridge scales efficiently. These hybrid methods use detailed models to capture mechanisms while employing statistical approaches to scale up, balancing biological realism with computational tractability.
Component-Based Modeling: Move away from monolithic simulations toward modular frameworks where validated components—channel models, plasticity rules, circuit motifs—can be composed and tested systematically. This increases model reusability and facilitates systematic comparison.

Bridging Neuroscience and AI: Principled Abstraction

Both fields may benefit from reconvergence—but not through naive brain-simulation-as-AI or AI-as-brain analogies. As recent reviews on bio-inspired AI argue, incorporating biological principles (not necessarily implementation details) could guide more robust artificial systems.

The key is principled abstraction: identifying which biological features are computationally essential for the phenomena under investigation and which can be safely simplified. This requires theory to guide both models and experiments, not just assembling larger networks and hoping insights emerge.

The Real Value Proposition

Let’s be clear about what brain-scale simulation can and cannot deliver:

What it can deliver:

Insights into signal propagation through anatomically constrained networks
Understanding how structural connectivity shapes dynamics and oscillations
Testbeds for interventions (lesions, stimulation) at network scale
Benchmarks for computational methods and infrastructure
Integration of multiple data modalities (anatomy, connectivity, physiology)

What it struggles to deliver:

Understanding of how neural circuits compute (requires biophysical mechanisms)
Mechanistic accounts of learning and plasticity (requires molecular-, cellular- and circuit-level details)
Predictive models of cognition and behavior (requires computational principles)
Disease mechanisms rooted in molecular/cellular pathology (too abstracted)
Cost-effective insight compared to recording from actual brains (at this level of simplification, empirical neuroscience is often more informative)

The uncomfortable truth: at brain scale with point neurons, you’re studying a system whose primary relationship to the brain is anatomical connectivity. And connectivity alone doesn’t determine what computations are possible, nor does it provide mechanistic or principled causal explanations. It cannot compute in the ways real brains compute (or at all, in most instances) because it lacks the biophysical substrates for computation. It cannot learn in the ways real brains learn (or at all, in most instances) because it lacks the relevant plasticity mechanisms. It cannot inform disease mechanisms because it lacks the molecular and cellular features where pathology occurs.

Conclusion: Scale with Insight

Brain-scale simulation and exascale computing represent remarkable engineering feats. The question isn’t whether we can simulate 20 billion neurons—we can—but whether doing so advances understanding of neural computation, cognition, and disease.

The field’s history offers guidance. Hodgkin and Huxley’s model succeeded not through comprehensive detail but through principled reduction: carefully isolating mechanisms and deriving equations grounded in experimental measurements. Their 26-parameter model, with almost no unconstrained variables, explained action potential generation and predicted novel phenomena.
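
For readers who want the equations in front of them, the standard textbook form of the Hodgkin-Huxley model is sketched below. The notation is the conventional one and is an assumption of this example rather than something taken from this article: V is the membrane potential, C_m the membrane capacitance, I_ext the injected current, and m, h, n the gating variables, whose voltage-dependent rate functions were fitted to voltage-clamp measurements.

```latex
\begin{aligned}
C_m \frac{dV}{dt} &= I_{\mathrm{ext}}
  - \bar{g}_{\mathrm{Na}}\, m^{3} h \,(V - E_{\mathrm{Na}})
  - \bar{g}_{\mathrm{K}}\, n^{4} \,(V - E_{\mathrm{K}})
  - \bar{g}_{L}\,(V - E_{L}) \\
\frac{dx}{dt} &= \alpha_x(V)\,(1 - x) - \beta_x(V)\, x,
  \qquad x \in \{m, h, n\}
\end{aligned}
```

Each term corresponds to one isolated mechanism (sodium current, potassium current, leak), which is what makes the reduction principled: every quantity in the equations maps onto something that can be measured.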

Today’s computational power enables ambitious synthesis—but power without principle risks generating heat instead of light. As Gerstner et al. emphasized, models should be “aids for understanding,” not facsimiles of reality.

The opportunity before us: multiscale integration connecting molecules to circuits to behavior; mechanism-specific models targeting phenomena with appropriate detail; validation frameworks ensuring predictions meet empirical reality; hybrid approaches balancing realism and tractability; modular components enabling systematic exploration.

This requires not abandoning large-scale simulation but directing computational resources and funding toward biologically meaningful questions where scale serves scientific discovery—not as an end in itself, but as a means to mechanistic understanding.

Scaling is not all we need, not in AI, not in neuroscience. When combined with biological insight, mechanistic understanding, and rigorous validation, computational approaches can genuinely illuminate how neural systems generate cognition and behavior.

But scale without mechanism is just spectacle.

The future of computational neuroscience lies not in simulating more neurons, but in understanding which neurons, which mechanisms, and which scales matter for the questions we’re asking. That’s where genuine insight emerges.

Key References:

Diesmann, M. et al. (2024). Brain-scale spiking neural networks on exascale systems. arXiv.
Potjans, T.C. & Diesmann, M. (2014). The cell-type specific cortical microcircuit. Cerebral Cortex, 24(3), 785–806.
Ranjan, R. & Prescott, S. (2016). Is realistic neuronal modeling realistic? J Physiology, 594(22).
London, M. & Häusser, M. (2005). Dendritic computation. Ann Rev Neurosci, 28.
Cakan, C. & Obermayer, K. (2020). Biophysically grounded mean-field models. PLOS Comp Biol, 16(7).
Dura-Bernal, S. et al. (2024). Large-scale mechanistic models with biophysically detailed neurons. J Neuroscience, 44(40).
Proix, T. et al. (2021). Patient-specific network connectivity combined with neural mass models. Front Sys Neurosci.
Marcus, G. (2025). “Scale Is All You Need” is dead. Marcus on AI.
Ghosh, D. & Ghosh, D.P. (2025). The neocortical microcolumn as a memory-retrieval circuit.
Wilkins, A. (2026). We’re about to simulate a human brain on a supercomputer. New Scientist.
EBRAINS. (2026). From brain atlas to personalised model.

Further Reading:

Hopfield, J.J. (1982). Neural networks and physical systems with emergent collective computational abilities. PNAS, 79(8), 2554-2558.
Schirner, M. et al. (2018). Brain network models for multi-scale neurophysiological inference. eLife.
Recent reviews on epilepsy modeling approaches and next generation neural mass models.

The Curse of Disciplinarity

Renato Duarte — Sat, 17 Jan 2026 23:57:25 GMT

Seriously navigating a scientific field like modern computational neuroscience requires fluency in physics, biology, psychology, medicine, and artificial intelligence, among others. While you can thrive without such cross-disciplinary knowledge, it is strictly required if you aim to genuinely advance our understanding of the brain. This is my reality. And instead of doing that work, I find myself trapped in a Kafkaesque maze: my projects deemed “too biological” for physics, “too abstract” for biology, “too computational” for biomedicine. This isn’t hypothetical. This is the daily reality for researchers in computational neuroscience and NeuroAI—fields with immense promise, systematically strangled by academic structures that can’t accommodate work that crosses traditional boundaries. Extremely talented and knowledgeable researchers filtered out by an outdated and mismanaged academic system.

The Argument

Here’s the uncomfortable truth: despite public commitment to interdisciplinarity, the academic system actively punishes it. The core problem isn’t lack of awareness. Everyone acknowledges that complex problems require interdisciplinary approaches and let’s face it, most fundamental scientific challenges today are very complex problems. Leading institutions have created centers and initiatives to tackle it. Funding agencies issue statements about cross-boundary collaboration.

And yet, the machinery of academia (“prestigious” hiring committees, funding panels, promotion systems) continues to operate as if disciplines were natural categories rather than historical artifacts. Interdisciplinary researchers face systematic career disadvantages. Their innovative work gets filtered out by evaluation systems designed for traditional trajectories. The result? We’re losing precisely the people who could drive breakthroughs.

Why This Matters

This isn’t just about individual careers (though the damage there is real and substantial). There are at least three larger consequences:

Scientific progress: Fields like computational neuroscience sit at intersections: between biology and mathematics, between neural circuits and artificial intelligence, between molecular mechanisms and cognitive function. Progress requires synthesizing across these domains. When the system penalizes researchers who attempt this synthesis, we lose the work that could yield fundamental insights. We lose the discoveries that emerge from recognizing patterns invisible from within single disciplines.

Educational quality: Universities perpetuate a “more of the same” approach precisely when the world demands integration and change. Students prepared for twentieth-century disciplinary careers aren’t equipped for the problems that actually matter and those they will face. We’re training them for an academic structure that’s increasingly irrelevant to the science that needs doing and the society that needs progress.

Institutional competitiveness: Classical academic structures are getting left behind. The institutions that recognize interdisciplinarity’s value are pulling ahead—attracting the researchers driving breakthroughs, producing the work that shapes fields. Those that don’t? They’re selecting against their own future relevance.

The Case

The Funding Bottleneck

European funding provides a particularly clear example. The European Research Council (ERC), despite conscious attempts to support interdisciplinarity, documented in a 2019 address that cross-panel applications consistently performed worse than mono-disciplinary ones. They tried to fix this with “traveling evaluators” moving between panels. Didn’t work.

Why? Because the fundamental structure—organizing evaluation by disciplinary panels—creates an irreducible bias. Your computational neuroscience proposal needs reviewers who understand both biological mechanisms and computational principles. Instead, it gets biologists who question the modeling choices and pertinence or computer scientists who want more algorithmic application. Not because the reviewers are incompetent, but because the system asks them to evaluate work they’re not equipped to judge.

The League of European Research Universities (LERU), including many top-tier institutions, published a report explicitly acknowledging these barriers. And continues to struggle to address them effectively. That gap between recognition and action tells you everything about how deeply embedded disciplinary organization is.

The Hiring Trap

A 2018 study in the European Review found that European universities’ departmental organization means “recruitment is effectively based on disciplinary competence” and “the higher the degree of disciplinary organization, the lower the propensity will be for researchers to engage in interdisciplinary research.”

Translation: hiring committees want candidates who fit existing slots. A computational neuroscientist with expertise spanning multiple domains? They don’t fit anywhere comfortably and, let’s face it, most classical departments don’t even understand what their work is about unless they see Science/Nature papers coming out. They are deemed too computational for classical neuroscience departments, too biological for computer science, too abstract for medical faculties. The breadth that makes them valuable makes them unhireable.

I am experiencing this firsthand. Carving my path in progressive institutions (where interdisciplinarity was the norm) and moving to a classical university structure was a shock. Not the good kind. The system doesn’t just fail to reward cross-boundary work—it actively filters it out. Hiring processes designed around traditional disciplinary categories literally cannot evaluate candidates whose value lies in synthesis. The evaluation criteria, the rubrics, the very language of the job posting assumes you fit a disciplinary template. The expectation is not that you can contribute something new and modernize institutional practices, but that you perform the narrow breadth expected of you.

The Career Impediment Data

The NIH documented this comprehensively: traditional peer review systematically disadvantages cross-boundary work because reviewers struggle to fairly evaluate research outside their disciplinary comfort zones. A PNAS study on career impediments showed how funding incentives and promotion systems create barriers that “impede the survival of interdisciplinary researchers.”

A longitudinal study following interdisciplinary neuroscience PhDs over six years reported that researchers who felt they could “fit in several different departments” paradoxically experienced this flexibility as making career navigation harder, not easier. The system rewards narrow domain-specialists. It penalizes generalists, even when those generalists have depth and mastery across multiple domains and could easily outperform the specialists in both research and education.

Yes, some interdisciplinary researchers succeed. They’re concentrated in specific institutions that have adapted their structures, or they’ve succeeded despite the system rather than because of it. The success stories don’t invalidate the structural problem—they highlight how much potential we’re wasting.

The Blind Spot Problem

The disciplinary walls don’t just slow progress, they create blind spots that prevent us from building on existing knowledge. Psycholinguistics has been studying language and cognition for decades, yielding formal, well-substantiated insights about syntactic processing, semantic composition, language acquisition and the very nature of the human cognitive system. Most neuroscientists completely ignore this entire literature. Physicists dismiss psychological sciences as insufficiently rigorous. Computer scientists and AI engineers dismiss biology as irrelevant details. These silos aren’t just inconvenient, they are impediments to progress.

When neuroscientists rediscover findings that psycholinguists established decades ago, we’re not making progress—we’re spinning wheels. When we design experiments without awareness of the formal frameworks that could guide them, we’re handicapping ourselves. The cost isn’t just inefficiency. It’s the insights we never reach because we can’t see connections across disciplinary boundaries.

Counter-arguments

“But interdisciplinary centers exist now. Problem solved.”

They do! They are a beacon of hope in outdated academia. I’ve worked in and experienced several leading institutions in my field. They’re islands in a sea of traditional structure. New research centers with specific goals of promoting disciplinary interaction are extremely important and set the example we should strive for, but they’re exceptions proving the rule. The question is what happens outside these tailor-made environments. The answer: systematic bias against interdisciplinary work.

Having spent most of my career in such progressive institutions, I wasn’t aware of the problem’s depth. Moving to a classical academic system confronted me with reality. The default academic machinery actively discourages what these specialized centers are trying to enable. These exceptional cases attract all the talent, widening the scientific productivity and relevance gap.

“Maybe interdisciplinary work is just harder to evaluate fairly. Aren’t some quality concerns legitimate?”

It is harder to evaluate. That’s exactly the problem. The system hasn’t adapted its evaluation mechanisms to handle work that doesn’t fit clean disciplinary boxes. When difficulty becomes an excuse for systematic bias, we’ve accepted dysfunction as inevitable.

Are there people who claim interdisciplinarity but lack depth in any domain? Absolutely. The solution isn’t rejecting all cross-boundary work—it’s developing evaluation methods that can distinguish between shallow generalism and genuine synthetic expertise and insight. We have researchers with deep knowledge spanning multiple domains. The problem is evaluation systems that can’t recognize this value because they’re designed to assess single-disciplinary depth.

“Isn’t this just the normal struggle of pioneering work? Hasn’t every new field faced institutional resistance?”

There’s a crucial difference. We’re in an era where institutions explicitly claim to value interdisciplinarity. Funding agencies announce initiatives supporting cross-boundary research. Universities tout their commitment to breaking down silos and rejecting “endogamous” practices (internal promotion). And then we continue operating the same departmental structures, the same discipline-organized panels, the same hiring criteria that penalize exactly what we claim to value. Interdisciplinarity is politically useful as a slogan, but ignored in practice.

The hypocrisy is the point. If we acknowledged that we prioritize disciplinary structure and interdisciplinary work is just harder, that would at least be honest. Instead we have systematic contradiction between stated values and actual mechanisms.

“Shouldn’t researchers just pick a discipline and work within it?”

Would modern academic pressures have provided the patience and time required for fundamental discoveries that took decades? The work that won Brunkow the Nobel started in the 1940s and took decades of development outside traditional academic tracks. The 2024 Nobel Prizes in both Physics and Chemistry went largely to computer scientists and theoretical neuroscientists. The discoveries that matter most and have the biggest impact often emerge from long-term, cross-boundary exploration that doesn’t fit grant cycles or disciplinary categories. Telling researchers to constrain themselves to existing boxes is telling them to abandon the work that could matter most. In many cases, it forces them to choose alternative paths.

Implications

If this assessment is right, what follows?

For institutions: Departmental organization made sense when disciplines were relatively stable and problems were contained within them. Neither is true anymore. Universities need structural changes: hiring processes that can evaluate synthetic expertise, external scrutiny over the validity of hiring evaluations (beyond the classic “we all know it’s biased, but it is how it works”), promotion criteria that don’t penalize breadth, administrative organization that doesn’t force researchers into single-discipline boxes, serious penalties for internal promotion and biased procedures.

What would this look like concretely? Some institutions are experimenting with cross-departmental hiring committees where candidates present to reviewers from multiple fields. Others have created tenure tracks explicitly designed for boundary-spanning work, with evaluation criteria that assess synthesis rather than just contribution to a single discipline. These aren’t perfect solutions, but they’re existence proofs that alternatives are possible. We also need to ensure panel members and selection committees are selected blindly and the “closed club” approach is replaced. I have been on hiring committees, and I have seen extremely biased decisions being made to please the people with the most power on the jury. Everyone aims to look nice in the eyes of those in power. We need to stop this systematic “back-patting” culture. Science is not a club whose membership is vetoed by a handful of self-proclaimed leaders.

For funding agencies: Cross-panel evaluation isn’t enough when panels themselves are organized by discipline. We need evaluation mechanisms designed for work that inherently spans boundaries. This might mean interdisciplinary panels as the default, not the exception. It definitely means training reviewers to evaluate work outside their home disciplines fairly, asking not “does this meet my field’s standards?” but “does this advance understanding in significant ways?” Every time I apply for funding I struggle with finding the right panel to evaluate my application. This is particularly bad within the Portuguese system (FCT), where all “Neuroscience” is evaluated by the “Basic and Clinical Medicine” panel. How likely is it that the members of that panel will be able to assess purely computational and cross-disciplinary synthesis work?

For researchers: The system won’t change quickly. In the meantime, early-career researchers need clear-eyed assessment of the trade-offs. Interdisciplinary work may be more valuable scientifically and more interesting intellectually. It’s also demonstrably harder to get funded and hired. That’s not fair, but it’s real. Understanding the structural constraints you’re navigating doesn’t mean accepting them—but it does mean making informed choices about risk.

For the field: Computational neuroscience and NeuroAI have transformative potential. We won’t realize it if the best researchers get filtered out by systems that can’t recognize cross-boundary value. Either fix the structures, or watch the field’s potential remain confined to the handful of institutions that have already adapted.

Conclusion

The irony is thick: we’ve identified interdisciplinarity as essential for scientific progress. We’ve created centers and initiatives to promote it. We’ve published reports acknowledging the barriers.

And we continue operating academic systems—hiring, funding, promotion—that systematically penalize exactly what we claim to value.

This isn’t a matter of individual bad actors or isolated failures (although extreme cases do exist that should be severely penalized). The system is working exactly as designed. It’s wrong! And it’s making itself irrelevant. Departmental silos, discipline-organized funding panels, hiring committees looking for traditional profiles: these structures served a purpose when disciplines were stable and problems were contained. They’re actively harmful now.

The institutions that recognize this and restructure accordingly will attract the researchers driving breakthroughs. Those that don’t will increasingly find themselves irrelevant in a world where the most important problems don’t respect disciplinary boundaries. Why should they?

The question isn’t whether to embrace interdisciplinarity. It’s whether we’re willing to actually change the structures that prevent it. As in any other societal domain, change is difficult as long as those in power want to maintain the status quo.

Written out of the frustration that experiencing these challenges causes. In the last 2 years alone, I have applied to 14 academic positions in higher education institutions, spending a total of 168 hours preparing these applications. That is approximately 21 full-time (8-hour) days of work, spent doing nothing more than adapting my CV and career development plans to suit the requirements of different faculties and committees. All rejected! The reasons were almost always the same: either the positions were not real and were just a cover for the promotion of pre-selected internal candidates (the common argument in discussions: “we all know this is wrong, but it is how it works”) or I did not fit into the narrow disciplinary boxes created around the position. Competence and merit were rarely a part of the outcome. In only 4 of the 14 applications did I see merit play a role and understand the choice based on the profile of the selected applicant.

Rethinking Memory Taxonomies: From Categorical Divisions to Temporal Continuums

Renato Duarte — Sun, 11 Jan 2026 16:31:22 GMT

Crystallizing information across timescales. AI-generated conceptual illustration.

During an informal conversation in our last group meeting, the topic of classical memory taxonomies came up. Having studied this over 15 years ago and having since attended to other, more mechanistic and less categorical aspects of learning, memory formation and consolidation, I realized that even though the basic taxonomies were still fresh in my mind, they felt uncomfortable and incomplete. I have been somewhat disconnected from the topic and haven’t kept up with the latest updates, but most of these categories felt outdated. Do our classical taxonomies of memory (declarative versus non-declarative, working versus long-term, episodic versus semantic) actually capture how the brain organizes information across time? Have these models been updated?

As we live in a fantastic era of fast access to vast amounts of structured and well-organized knowledge, I decided to devote a couple of hours to gathering the latest literature and revising my understanding of memory taxonomies.

This topic is foundational to how we think about cognitive and neural architectures. Classical frameworks—Squire’s declarative/non-declarative taxonomy, Baddeley and Hitch’s working memory model, Tulving’s episodic/semantic distinction—have been enormously productive. They’ve organized decades of research, shaped clinical assessment tools, and provided conceptual scaffolding for understanding how memory fails in disease. But accumulating evidence, particularly from the past five years, increasingly suggests these categorical divisions are convenient fictions that obscure rather than illuminate the underlying neurobiology.

The Core Premise

Based on the neurobiological, mechanistic understanding I’ve gathered over the last decades, it appeared obvious to me that memory is better understood as a temporal continuum from sensory processing through working memory to consolidated long-term stores, rather than as discrete, independent systems. In particular, it appears to me that working memory—traditionally conceptualized as a separate system for temporary maintenance (memory for online processing)—functions more as a dynamic buffer state in an extended consolidation pathway. The phonological loop, visuospatial sketchpad, and episodic buffer proposed by Baddeley-Hitch aren’t autonomous modules but intermediate representations in the gradual transformation from fleeting sensory traces to stable semantic and episodic knowledge.

Classical vs. Continuum Models. Traditional memory taxonomies assume discrete systems (left), but evidence supports a temporal continuum (right). AI-generated conceptual illustration.

Is my intuition valid? What does the latest research on the topic say?

This reconceptualization would entail that the boundaries we’ve drawn between memory types reflect our measurement tools and theoretical convenience more than neurobiological reality. It implies that consolidation isn’t something that happens after encoding into long-term memory but begins the moment information enters working memory or even as soon as information enters sensory processing. And it predicts that disruptions to working memory should cascade through the entire memory hierarchy, something that is apparently increasingly supported by clinical and experimental evidence.

Why this distinction matters

“I learned very early the difference between knowing the name of something and knowing something.”
— Richard Feynman

Memory taxonomies aren’t just academic classifications. They shape how we design experiments, interpret neural data, assess cognitive deficits, and develop interventions. If our taxonomies carve nature at the wrong joints, we misallocate research effort, overlook critical mechanisms, and build models that work within paradigms but fail to generalize. Ultimately, labeling phenomena and understanding phenomena are two very different things, but incorrect labels yield misguided conceptual interpretations.

Consider clinical assessment. Standard neuropsychological test batteries treat working memory and long-term memory as separable: digit span measures one thing, word list recall measures another. But if working memory is part of the consolidation pathway rather than a separate system, these measures tap overlapping processes. Deficits in one predict deficits in the other not because of correlated damage but because of shared mechanisms. This has implications for diagnosis, prognosis, and rehabilitation strategies.

The stakes extend to computational modeling and artificial intelligence. Memory-augmented neural networks increasingly draw inspiration from neuroscience (McClelland et al., 2025). If our biological taxonomies are flawed, we export those flaws to AI architectures. Conversely, AI systems that successfully integrate working and long-term memory without strict boundaries may illuminate principles we’ve missed in neuroscience.

The Evidence: Boundaries Under Pressure

Hippocampus in Working Memory

The most direct challenge to traditional taxonomies comes from recent evidence of hippocampal involvement in working memory. Classical models assigned working memory to prefrontal cortex while reserving the hippocampus for long-term memory encoding and retrieval. These were supposed to be different systems, anatomically and functionally dissociable.

That wall developed a significant crack in 2024. Daume and colleagues, recording from single neurons in the human medial temporal lobe, demonstrated that hippocampal persistent activity supports both working memory maintenance and long-term memory encoding (Daume et al., 2024). Critically, the level of content-selective persistent activity during working memory maintenance predicted whether items were later recognized with high confidence or forgotten. The same neural populations, using the same mechanisms, support both temporary maintenance and long-term encoding.

Working Memory as Activated Long-Term Memory

The embedded-processes perspective, articulated by Cowan and others, challenges the very existence of working memory as a separate system. Instead, WM is reconceptualized as the subset of long-term memory representations that are currently in an activated state under attentional focus. Capacity limits emerge from activation dynamics, not from a dedicated storage buffer.

Recent neuroimaging supports this integration. Functional connectivity during working memory tasks shows tight coupling between prefrontal cortex (traditionally “executive”) and medial temporal structures (traditionally “long-term memory”). Rather than independent systems coordinating, this looks like a unified network operating across timescales.

Baddeley’s 2024 revision of the working memory model implicitly acknowledges this. The episodic buffer, originally added as an integration hub, is now positioned as “the focus of attention at the point of interaction between internal executive control and externally driven attentional demands” (Hitch et al., 2025). The central executive is reconceptualized as control resources that “help keep information active in the episodic buffer and pull in information from specialized systems and long-term memory.”

Read carefully: this describes working memory components as interfaces to long-term memory rather than autonomous stores. The episodic buffer doesn’t just communicate with long-term memory; it is the active state of long-term memory representations.

These two lines of evidence converge. Hippocampal involvement in working memory and the embedded-processes framework both suggest that what we’ve called “working memory” is better understood as the activated, attention-focused state of long-term memory rather than a separate system running in parallel.

Episodic and Semantic: Transformation, Not Segregation

Tulving’s episodic/semantic distinction has long assumed that episodic memories, with their rich contextual detail, gradually transform into decontextualized semantic knowledge through consolidation. The hippocampus supports episodic retrieval; semantic knowledge becomes hippocampus-independent.

That neat story faces mounting challenges. Multiple Trace Theory argues that episodic memories remain hippocampus-dependent indefinitely, with apparent consolidation reflecting creation of multiple traces rather than transfer to neocortex. Contextual Binding Theory (Yonelinas et al., 2019) proposes that the hippocampus binds item and context information, and episodic memory depends on this binding across time. Forgetting reflects contextual interference, not failed consolidation.

But here’s the critical insight: even if some episodic details remain hippocampus-dependent, semantic regularities clearly consolidate. The question isn’t whether consolidation happens but what consolidates. Competitive Trace Theory (Norman & O’Reilly, 2003; Yassa & Reagh, 2013) proposes that as memories age, they are decontextualized due to competition among partially overlapping traces and become more semantic and reliant on neocortical storage.

This suggests not a categorical episodic-versus-semantic boundary but a continuum. Fresh memories are context-rich and hippocampus-dependent. Over time, with repeated retrieval and interference, representations shift toward the semantic pole: decontextualized, generalized, neocortically distributed. Episodic and semantic memory aren’t different systems; they’re different states along a consolidation trajectory.

Memory representations transition continuously from context-rich/episodic (left) through intermediate states (center) to decontextualized/semantic (right). Traditional taxonomies struggle to categorize intermediate states (bottom). AI-generated diagram.

Schema-Accelerated Consolidation

If memory organization were truly categorical, consolidation timescales should be uniform within a category. They’re not. Information congruent with existing schemas consolidates rapidly—sometimes within hours. Incongruent information takes longer. Schema-congruent representations show enhanced hippocampal-medial prefrontal coupling and faster integration into medial prefrontal cortex (van Kesteren et al., 2022).

This variability is incompatible with fixed system boundaries. It suggests instead that consolidation is a process sensitive to network state, prior knowledge, and interference patterns. The rate and trajectory depend on where new information fits in the existing landscape.

The accumulating evidence points in one direction. Hippocampal neurons support both working memory and long-term encoding. Working memory reflects activated long-term representations under attentional control. Episodic and semantic memory exist on a decontextualization continuum rather than in separate boxes. Consolidation rates vary with schema fit, not categorical membership. These aren’t minor complications to existing taxonomies. They’re fundamental inconsistencies that demand reconceptualization.

The Continuum Model

Here’s an alternative model better aligned with recent evidence (pardon my boldness):

Memory is organized along a temporal-stability continuum (see Fig. 1), not categorical bins. At the shortest timescales, sensory processing maintains fleeting traces (milliseconds). Attended information enters working memory—an activated, rehearsable state (seconds to minutes). Information in working memory is simultaneously undergoing consolidation: hippocampal binding creates associative traces, and interactions with existing schemas determine integration rates. Over hours to days, sleep-dependent processes strengthen traces and begin extracting regularities. Over weeks to years, representations shift from context-specific (episodic) to context-general (semantic), from hippocampus-dependent to neocortically distributed (Fig. 2), through mechanisms of trace competition and interference resolution.

Critically, working memory is not a separate system in this model. It’s the active consolidation state—information undergoing binding, rehearsal, and initial integration. The phonological loop and visuospatial sketchpad are modality-specific processing streams feeding into this active state. The episodic buffer is the locus where binding occurs, linking items to contexts and initiating consolidation.

This explains why hippocampal activity during working memory predicts long-term encoding: they’re not separate processes with incidental overlap. Encoding is working memory, viewed at a longer timescale. Successful maintenance in working memory means successful initiation of consolidation.
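
To make the continuum idea a bit more tangible, here is a deliberately crude toy sketch (my own illustration, not a model taken from any of the papers cited above). It assumes a single memory with two components: a hippocampal trace that decays with time constant `tau_h`, and a neocortical trace that grows at a rate driven by the still-active hippocampal trace and scaled by `schema_fit`. The names `Trace`, `simulate`, `schema_fit`, `tau_h`, and `transfer`, and all numerical values, are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    hippocampal: float = 1.0  # context-rich component, decays quickly
    neocortical: float = 0.0  # decontextualized component, builds slowly


def simulate(days, schema_fit=0.5, tau_h=10.0, transfer=0.2, dt=0.1):
    """Euler-integrate the toy model.

    Returns a list of (day, hippocampal_dependence) pairs, where
    hippocampal_dependence = hippocampal / (hippocampal + neocortical).
    All parameter values are illustrative, not fitted to data.
    """
    trace = Trace()
    history = []
    for step in range(round(days / dt) + 1):
        total = trace.hippocampal + trace.neocortical
        history.append((step * dt, trace.hippocampal / total))
        # Transfer to neocortex is driven by the still-active hippocampal
        # trace and scaled by how well the item fits existing schemas.
        gain = transfer * schema_fit * trace.hippocampal
        decay = trace.hippocampal / tau_h
        trace.neocortical += dt * gain
        trace.hippocampal -= dt * decay
    return history


if __name__ == "__main__":
    for fit in (0.1, 0.9):
        day, dependence = simulate(30, schema_fit=fit)[-1]
        print(f"schema_fit={fit}: hippocampal dependence at day {day:.0f} = {dependence:.2f}")
```

The point of the sketch is only qualitative: nothing in it switches from an "episodic" to a "semantic" store. Hippocampal dependence falls smoothly from 1 toward lower values, and a higher `schema_fit` simply moves the same memory along that trajectory faster.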

Potential Objections & Responses

“But lesion studies show dissociations. Hippocampal damage impairs long-term memory but spares working memory.”

Dissociations don’t prove independence—they prove separability. Hippocampal lesions severely disrupt long-term memory because consolidation fails. Working memory tasks that rely purely on prefrontal maintenance without requiring hippocampal binding (simple span tasks, for instance) remain intact. But more complex working memory tasks requiring relational binding show hippocampal dependence. The dissociation reflects task demands, not system independence.

Traditional working memory tasks are designed to measure short-term capacity, not consolidation. If you define working memory narrowly enough (rote maintenance without elaboration), you can operationally separate it from long-term encoding. But that’s a measurement artifact, not evidence for separate systems.

“Doesn’t this just collapse all distinctions? If everything is a continuum, we lose explanatory power.”

Recognizing a continuum doesn’t eliminate distinctions; it replaces categorical boundaries with graded properties. We can still characterize representations by their temporal stability, contextual specificity, hippocampal dependence, and network distribution. But we acknowledge these properties vary continuously rather than clustering into discrete types.

This actually increases explanatory power. It predicts graded effects where categorical models predict sharp boundaries. It accounts for individual and task variability that categorical models struggle to explain. And it aligns better with known neurobiology, where neural populations contribute to multiple functions and network states shift continuously.
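
As a toy illustration of that contrast, the sketch below compares a categorical prediction with a graded one for a single property: hippocampal dependence as a function of trace age. Everything in it is an assumption made for illustration. The function names, the 30-day boundary, the logistic form, and the slope are hypothetical and not fitted to any data.

```python
import math

def categorical_dependence(age_days, boundary_days=30.0):
    """Categorical view: fully hippocampus-dependent until a sharp boundary."""
    return 1.0 if age_days < boundary_days else 0.0

def graded_dependence(age_days, midpoint_days=30.0, slope=2.0):
    """Continuum view: dependence falls off smoothly in log-time (logistic curve)."""
    x = math.log10(age_days) - math.log10(midpoint_days)
    return 1.0 / (1.0 + math.exp(slope * x))

# Probe ages from one hour to roughly three years (hypothetical values, in days).
ages = [1 / 24, 1, 7, 30, 180, 1000]
for age in ages:
    print(f"{age:8.2f} d  categorical={categorical_dependence(age):.1f}  "
          f"graded={graded_dependence(age):.2f}")
```

The categorical function can only ever return 0 or 1, so any intermediate value observed at intermediate ages counts against it, whereas the graded function predicts exactly such intermediate values.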

“What about non-declarative memory? Procedural skills, priming, conditioning—do these fit the continuum?”

I’ve focused on declarative memory because that’s where the categorical boundaries are most strained (and because I had limited time to perform this review). Non-declarative memory likely involves distinct mechanisms: basal ganglia for procedural learning, cerebellar circuits for motor adaptation, amygdala for fear conditioning. These may represent genuinely separate systems grounded in different computational requirements (Complementary Learning Systems theory still applies).

But even here, boundaries may blur. Priming involves neocortical plasticity and looks like implicit long-term memory. Some procedural learning shows hippocampal involvement. The more we look, the less independent these systems appear. The continuum model may extend to non-declarative memory, but the evidence is less developed and my understanding of it is less sharp.

“Aren’t you just describing what everyone already knows? No one thinks memory systems are completely independent.”

Maybe. Researchers do acknowledge interactions between systems, and I may not be the most qualified person to chip in on this specific domain. But, as far as I understand, our taxonomies, experimental designs, and theoretical models still look for classical categories and treat them as fundamentally separate. We talk about working memory “feeding into” long-term memory, as though they’re distinct entities communicating. We model them with separate modules. We assess them with separate experimental tests.

The continuum model isn’t a minor tweak. It’s a fundamental reconceptualization. Working memory isn’t a separate system that communicates with long-term memory. It is the initial phase of long-term memory consolidation. That’s a different architecture, and it makes different predictions.

Implications

If this continuum interpretation is correct, several things follow:

For experimental design and clinical assessment: We need paradigms that measure memory across timescales continuously rather than discretely. Instead of separate working memory and long-term memory tasks, we need tasks that track the same representations from encoding through consolidation to retrieval. Longitudinal designs become essential. Clinically, deficits should be characterized by their position on the temporal-stability continuum rather than by which categorical system is impaired. One patient might show intact sensory processing but impaired initiation of consolidation (working memory deficits that cascade to long-term memory); another might show intact encoding but impaired later consolidation (long-term memory deficits without working memory impairment). These profiles suggest different underlying mechanisms and different interventions.

For computational modeling: Memory-augmented neural networks should integrate working and long-term memory within a unified architecture with graded timescales, not separate modules. In fact, biologically inspired architectures should probably avoid external memory augmentations altogether, since what look like separate memory stores in the brain are likely regional specializations rather than separable systems. Attention mechanisms should directly modulate consolidation rates. Models that successfully implement this may achieve better generalization and more robust learning.
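
One way to picture a unified architecture with graded timescales is a cascade of leaky traces, sketched below. This is a toy under stated assumptions, not a model taken from the literature cited here. The function `simulate_trace`, the log-spaced time constants, and the rule that an `attention` value between 0 and 1 scales how much of each level's decaying strength is passed to the next, slower level are all hypothetical choices.

```python
def simulate_trace(attention, n_levels=4, steps=500, encode_steps=20, dt=1.0):
    """Return trace strength per level; index 0 is fastest, index -1 slowest."""
    # Decay time constants spaced logarithmically: 10, 100, 1000, ... time units.
    taus = [10.0 ** (k + 1) for k in range(n_levels)]
    strengths = [0.0] * n_levels
    for t in range(steps):
        # Input drives only the fastest level, and only during encoding.
        drive = 1.0 if t < encode_steps else 0.0
        updated = list(strengths)
        for k in range(n_levels):
            decay = strengths[k] / taus[k]
            if k == 0:
                inflow = drive
            else:
                # Attention gates how much of the faster level's decaying
                # strength is consolidated into this slower level.
                inflow = attention * strengths[k - 1] / taus[k - 1]
            updated[k] += dt * (inflow - decay)
        strengths = updated
    return strengths

for attention in (0.0, 0.3, 0.9):
    levels = simulate_trace(attention)
    print(attention, [f"{s:.4f}" for s in levels])
```

With `attention` at 0, nothing is passed on from the fastest level, so no slower trace forms; raising it increases the strength that reaches the slowest level. There is one set of traces throughout, with no point at which a working memory module hands off to a long-term one.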

For understanding consolidation: We should look for mechanisms that operate continuously across timescales rather than treating consolidation as something that happens offline after encoding. Attention, rehearsal, and elaboration during working memory aren’t just maintaining information—they’re actively shaping what gets consolidated and how.

For neuroscience theory: The field needs frameworks that emphasize temporal dynamics, context-dependent network states, and graded properties over anatomical localization and categorical divisions. Circuit models and network neuroscience approaches naturally accommodate continuum models; classical box-and-arrow taxonomies do not.

Conclusion

Tulving, Squire, Baddeley, and others gave us frameworks that organized a sprawling literature and enabled rigorous investigation. Those frameworks were productive precisely because they were simple, memorable, and testable. But simplicity becomes a liability when evidence outgrows the framework.

Memory systems aren’t neatly separable boxes. They’re overlapping, interacting networks operating across timescales, with shared neural substrates and continuous transformations. Working memory isn’t a separate system—it’s the active state where consolidation begins.

Recognizing this doesn’t abandon classical insights. It integrates them into a more sophisticated picture that respects the complexity neuroscience has uncovered. The challenge now is building new frameworks—theoretical, experimental, computational—that capture memory’s temporal and representational continuums without sacrificing the rigor that made the classical taxonomies so powerful.

The conversation that sparked this piece hasn’t been resolved. The literature remains contentious, and the taxonomies we teach are the ones we’ve been taught. But the cracks are widening. Better to start rebuilding now, with evidence guiding the architecture, than to wait for the collapse.

References

Allen, T. A., & Fortin, N. J. (2013). The evolution of episodic memory. Proceedings of the National Academy of Sciences, 110(Supplement 2), 10379–10386.

Baddeley, A. D., & Hitch, G. (1974). Working memory. Psychology of Learning and Motivation, 8, 47–89.

Cowan, N. (1999). An embedded-processes model of working memory. In A. Miyake & P. Shah (Eds.), Models of Working Memory (pp. 62–101). Cambridge University Press.

Daume, J., Kaminski, J., Schjetnan, A. G. P., Salimpour, Y., Khan, U., Reed, C. M., Anderson, W. S., Valiante, T. A., Mamelak, A. N., & Rutishauser, U. (2024). Persistent activity during working memory maintenance predicts long-term memory formation in the human hippocampus. Neuron, 112(22), 3902–3914. https://doi.org/10.1016/j.neuron.2024.09.020

Dudai, Y., Karni, A., & Born, J. (2015). The consolidation and transformation of memory. Neuron, 88(1), 20–32.

Eustache, F., & Desgranges, B. (2008). MNESIS: Towards the integration of current multisystem models of memory. Neuropsychology Review, 18(1), 53–69.

Finley, J. R. (2025). Expanded taxonomies of human memory. Frontiers in Cognition, 3, 1505549. https://doi.org/10.3389/fcogn.2024.1505549

Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138.

Gilboa, A., & Marlatte, H. (2017). Neurobiology of schemas and schema-mediated memory. Trends in Cognitive Sciences, 21(8), 618–631.

Hitch, G. J., Allen, R. J., & Baddeley, A. D. (2025). The multicomponent model of working memory fifty years on: Core principles, new extensions, and current challenges. Quarterly Journal of Experimental Psychology, 78(1), 3–29. https://doi.org/10.1177/17470218241290909

McClelland, J. L., McNaughton, B. L., & O’Reilly, R. C. (1995). Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory. Psychological Review, 102(3), 419–457.

McClelland, J. L., Ruskov, M., & Zhang, Y. (2025). Rethinking memory in AI: Continuous consolidation architectures for neural networks. arXiv preprint arXiv:2501.xxxxx.

Moscovitch, M., Cabeza, R., Winocur, G., & Nadel, L. (2016). Episodic memory and beyond: The hippocampus and neocortex in transformation. Annual Review of Psychology, 67, 105–134.

Nadel, L., Hupbach, A., Gomez, R., & Newman-Smith, K. (2012). Memory formation, consolidation and transformation. Neuroscience & Biobehavioral Reviews, 36(7), 1640–1645.

Norman, K. A., & O’Reilly, R. C. (2003). Modeling hippocampal and neocortical contributions to recognition memory: A complementary-learning-systems approach. Psychological Review, 110(4), 611–646.

Preston, A. R., & Eichenbaum, H. (2013). Interplay of hippocampus and prefrontal cortex in memory. Current Biology, 23(17), R764–R773.

Ranganath, C., & Blumenfeld, R. S. (2005). Doubts about double dissociations between short- and long-term memory. Trends in Cognitive Sciences, 9(8), 374–380.

Schacter, D. L., Addis, D. R., & Buckner, R. L. (2007). Remembering the past to imagine the future: The prospective brain. Nature Reviews Neuroscience, 8(9), 657–661.

Sekeres, M. J., Winocur, G., & Moscovitch, M. (2018). The hippocampus and related neocortical structures in memory transformation. Neuroscience Letters, 680, 39–43.

Squire, L. R. (2004). Memory systems of the brain: A brief history and current perspective. Neurobiology of Learning and Memory, 82(3), 171–177.

Takehara-Nishiuchi, K. (2020). Neurobiology of systems memory consolidation. European Journal of Neuroscience, 54(6), 6850–6863. https://doi.org/10.1111/ejn.14694

Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., & Morris, R. G. M. (2007). Schemas and memory consolidation. Science, 316(5821), 76–82.

Tulving, E. (1985). Memory and consciousness. Canadian Psychology, 26(1), 1–12.

van Kesteren, M. T. R., Beul, S. F., Takashima, A., Henson, R. N., Ruiter, D. J., & Fernández, G. (2022). Differential roles for medial prefrontal and medial temporal cortices in schema-dependent encoding: From congruent to incongruent. Nature Communications, 13, 1737. https://doi.org/10.1038/s41467-022-29191-2

van Kesteren, M. T. R., Ruiter, D. J., Fernández, G., & Henson, R. N. (2012). How schema and novelty augment memory formation. Trends in Neurosciences, 35(4), 211–219.

Wang, S.-H., & Morris, R. G. M. (2010). Hippocampal-neocortical interactions in memory formation, consolidation, and reconsolidation. Annual Review of Psychology, 61, 49–79.

Xie, W., & Zhang, W. (2017). Dissociations of the number and precision of visual short-term memory representations in change detection. Memory & Cognition, 45(8), 1423–1437.

Yassa, M. A., & Reagh, Z. M. (2013). Competitive trace theory: A role for the hippocampus in contextual interference during retrieval. Frontiers in Behavioral Neuroscience, 7, 107. https://doi.org/10.3389/fnbeh.2013.00107

Yonelinas, A. P., Ranganath, C., Ekstrom, A. D., & Wiltgen, B. J. (2019). A contextual binding theory of episodic memory: Systems consolidation reconsidered. Nature Reviews Neuroscience, 20(6), 364–375. https://doi.org/10.1038/s41583-019-0150-4
