In neuroscience, predictive coding (also known as predictive processing) is a theory of brain function which postulates that the brain is constantly generating and updating a "mental model" of the environment. According to the theory, such a mental model is used to predict input signals from the senses, which are then compared with the actual input signals from those senses. Predictive coding is a member of a wider set of theories that follow the Bayesian brain hypothesis.

Origins

Theoretical ancestors of predictive coding date back as early as 1860, with Helmholtz's concept of unconscious inference. Unconscious inference refers to the idea that the human brain fills in visual information to make sense of a scene. For example, if something is relatively smaller than another object in the visual field, the brain uses that information as a likely cue of depth, such that the perceiver ultimately (and involuntarily) experiences depth. The understanding of perception as the interaction between sensory stimuli (bottom-up) and conceptual knowledge (top-down) was developed further by Jerome Bruner, who, starting in the 1940s, studied the ways in which needs, motivations and expectations influence perception, research that came to be known as "New Look" psychology.

In 1981, McClelland and Rumelhart examined the interaction between the processing of features (lines and contours), which form letters, and of letters, which in turn form words. While the features suggest the presence of a word, they found that when letters were situated in the context of a word, people were able to identify them faster than when they were situated in a non-word without semantic context. McClelland and Rumelhart's parallel processing model describes perception as the meeting of top-down (conceptual) and bottom-up (sensory) elements.

In the late 1990s, the idea of top-down and bottom-up processing was translated into a computational model of vision by Rao and Ballard. Their paper demonstrated that there could be a generative model of a scene (top-down processing), which would receive feedback via error signals (how much the visual input varied from the prediction), which would subsequently lead to updating the prediction. The computational model was able to replicate well-established receptive field effects, as well as less understood extra-classical receptive field effects such as end-stopping.

In 2004, Rick Grush proposed a model of neural perceptual processing according to which the brain constantly generates predictions based on a generative model (what Grush called an "emulator") and compares each prediction to the actual sensory input. The difference, or "sensory residual", would then be used to update the model so as to produce a more accurate estimate of the perceived domain. On Grush's account, the top-down and bottom-up signals would be combined in a way sensitive to the expected noise (i.e., uncertainty) in the bottom-up signal, so that in situations in which the sensory signal was known to be less trustworthy, the top-down prediction would be given greater weight, and vice versa. The emulation framework was also shown to be hierarchical, with modality-specific emulators providing top-down expectations for sensory signals as well as higher-level emulators providing expectations of the distal causes of those signals. Grush applied the theory to visual perception, visual and motor imagery, language, and theory of mind phenomena.

General framework

Predictive coding was initially developed as a model of the sensory system, where the brain solves the problem of modelling distal causes of sensory input through a version of Bayesian inference. It assumes that the brain maintains active internal representations of the distal causes, which enable it to predict the sensory inputs. A comparison between predictions and sensory input yields a difference measure (e.g. prediction error, free energy, or surprise) which, if it is sufficiently large beyond the levels of expected statistical noise, will cause the internal model to update so that it better predicts sensory input in the future. If, instead, the model accurately predicts driving sensory signals, activity at higher levels cancels out activity at lower levels, and the internal model remains unchanged. Thus, predictive coding inverts the conventional view of perception as a mostly bottom-up process, suggesting that it is largely constrained by prior predictions, where signals from the external world only shape perception to the extent that they are propagated up the cortical hierarchy in the form of prediction error.

Prediction errors can be used not only for inferring distal causes, but also for learning them via neural plasticity. Here the idea is that the representations learned by cortical neurons reflect the statistical regularities in the sensory data. This idea is also present in many other theories of neural learning, such as sparse coding, with the central difference being that in predictive coding not only the connections to sensory inputs are learned (i.e., the receptive field), but also top-down predictive connections from higher-level representations. This makes predictive coding similar to some other models of hierarchical learning, such as Helmholtz machines and deep belief networks, which however employ different learning algorithms. Thus, the dual use of prediction errors for both inference and learning is one of the defining features of predictive coding.
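The dual use of prediction errors can be illustrated with a minimal sketch. It assumes a single level with a linear generative model (prediction = weights times causes) and plain gradient steps; the function names, step sizes and the linear form are illustrative assumptions, not a description of any specific published model.

```python
def predict(weights, causes):
    """Top-down prediction of the input: one value per input channel."""
    return [sum(w * c for w, c in zip(row, causes)) for row in weights]


def prediction_error(inputs, weights, causes):
    """Bottom-up error: actual input minus predicted input."""
    return [x - p for x, p in zip(inputs, predict(weights, causes))]


def infer(inputs, weights, n_causes, steps=200, rate=0.1):
    """Inference: adjust the estimated causes to reduce the error for one input."""
    causes = [0.0] * n_causes
    for _ in range(steps):
        err = prediction_error(inputs, weights, causes)
        for j in range(n_causes):
            # Each cause moves in the direction that shrinks the squared error.
            causes[j] += rate * sum(err[i] * weights[i][j] for i in range(len(inputs)))
    return causes


def learn(weights, inputs, causes, rate=0.05):
    """Learning: adjust the weights using the same error signal."""
    err = prediction_error(inputs, weights, causes)
    return [[w + rate * e * c for w, c in zip(row, causes)]
            for row, e in zip(weights, err)]


def squared_error(inputs, weights, causes):
    """Summed squared prediction error, the quantity both steps reduce."""
    return sum(e * e for e in prediction_error(inputs, weights, causes))


if __name__ == "__main__":
    sensory_input = [2.0, 1.0]
    weights = [[0.5], [0.5]]          # deliberately poor initial model
    causes = infer(sensory_input, weights, n_causes=1)
    print("error before learning:", squared_error(sensory_input, weights, causes))
    weights = learn(weights, sensory_input, causes)
    print("error after learning: ", squared_error(sensory_input, weights, causes))
```

In this sketch the same error vector drives both the fast update of the causes (inference) and the slow update of the weights (learning), which is the feature the paragraph above singles out.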

Precision weighting

The precision of incoming sensory input is its predictability, based on signal noise and other factors. Estimates of precision are crucial for effectively minimizing prediction error, because they allow sensory inputs and predictions to be weighted according to their reliability. For instance, the noise in the visual signal varies between dawn and dusk, such that greater conditional confidence is assigned to sensory prediction errors in broad daylight than at nightfall. Similar approaches are successfully used in other algorithms performing Bayesian inference, e.g., for Bayesian filtering in the Kalman filter. It has also been proposed that such weighting of prediction errors in proportion to their estimated precision is, in essence, attention, and that the process of devoting attention may be neurobiologically accomplished by ascending reticular activating systems (ARAS) optimizing the "gain" of prediction error units. However, it has also been argued that precision weighting can only explain "endogenous spatial attention", but not other forms of attention.
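As a rough illustration of precision weighting, the sketch below fuses a prior prediction with a noisy observation, treating both as Gaussian and taking precision to be inverse variance. This is the scalar form of the update used in a Kalman filter; the function name and the numbers standing in for "daylight" and "nightfall" are hypothetical.

```python
def precision_weighted_update(prior_mean, prior_var, observation, obs_var):
    """Combine a prediction and an observation according to their precisions."""
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    # Gain: the share of the prediction error that is allowed to move the estimate.
    gain = obs_precision / (prior_precision + obs_precision)
    error = observation - prior_mean
    posterior_mean = prior_mean + gain * error
    posterior_var = 1.0 / (prior_precision + obs_precision)
    return posterior_mean, posterior_var, gain


if __name__ == "__main__":
    # Same prediction and same observation, different sensory noise levels.
    daylight = precision_weighted_update(0.0, 1.0, 2.0, obs_var=0.25)
    nightfall = precision_weighted_update(0.0, 1.0, 2.0, obs_var=4.0)
    print("daylight  (low noise): ", daylight)
    print("nightfall (high noise):", nightfall)
```

With low sensory noise the gain is about 0.8 and the estimate moves most of the way towards the observation; with high noise the gain is about 0.2 and the prior prediction dominates.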

Active inference

The same principle of prediction error minimization has been used to provide an account of behavior in which motor actions are not commands but descending proprioceptive predictions. In this scheme of active inference, classical reflex arcs are coordinated so as to selectively sample sensory input in ways that better fulfill predictions, thereby minimizing proprioceptive prediction errors. Indeed, Adams et al. (2013) review evidence suggesting that this view of hierarchical predictive coding in the motor system provides a principled and neurally plausible framework for explaining the agranular organization of the motor cortex. This view suggests that "perceptual and motor systems should not be regarded as separate but instead as a single active inference machine that tries to predict its sensory input in all domains: visual, auditory, somatosensory, interoceptive and, in the case of the motor system, proprioceptive."
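The idea that action fulfills a prediction, rather than the prediction being revised, can be sketched with a toy one-dimensional limb. The descending prediction is held fixed and a reflex-like step changes the limb position to reduce the proprioceptive prediction error. The names, the gain value and the one-dimensional setting are all illustrative assumptions.

```python
def reflex_step(limb_position, predicted_position, gain=0.3):
    """Move the limb a fraction of the way towards the predicted position."""
    proprioceptive_error = predicted_position - limb_position
    return limb_position + gain * proprioceptive_error


def act_to_fulfil_prediction(limb_position, predicted_position, steps=30, gain=0.3):
    """Repeat the reflex step; the prediction stays fixed, the body changes."""
    trajectory = [limb_position]
    for _ in range(steps):
        limb_position = reflex_step(limb_position, predicted_position, gain)
        trajectory.append(limb_position)
    return trajectory


if __name__ == "__main__":
    path = act_to_fulfil_prediction(limb_position=0.0, predicted_position=1.0)
    errors = [abs(1.0 - p) for p in path]
    print("first error:", errors[0], "last error:", errors[-1])
```

In contrast to perceptual inference, where the estimate is updated to match the input, here the input is changed by action until it matches the prediction.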

Neural theory in predictive coding

Much of the early work that applied a predictive coding framework to neural mechanisms came from sensory processing, particularly in the visual cortex. These theories assume that the cortical architecture can be divided into hierarchically stacked levels, which correspond to different cortical regions. Every level is thought to house (at least) two types of neurons: "prediction neurons", which aim to predict the bottom-up inputs to the current level, and "error neurons", which signal the difference between input and prediction. These neurons are thought to be mainly non-superficial and superficial pyramidal neurons, respectively, while interneurons take up different functions.

Within cortical regions, there is evidence that different cortical layers may facilitate the integration of feedforward and feedback projections across hierarchies. These cortical layers have therefore been assumed to be central in the computation of predictions and prediction errors, with the basic unit being a cortical column. A common view is that:

* error neurons reside in supragranular layers 2 and 3, since these neurons show sparse activity and tend to respond to unexpected events
* prediction neurons reside in deep layer 5, where many neurons exhibit dense responses
* precision weighting might be implemented through diverse mechanisms, such as neuromodulators or long-range projections from other brain areas (e.g., the thalamus)

However, thus far there is no consensus on how the brain most likely implements predictive coding. Some theories, for example, propose that supragranular layers contain not only error neurons but also prediction neurons. It is also still debated through which mechanisms error neurons might compute the prediction error. Since prediction errors can be both negative and positive, but biological neurons can only show positive activity, more complex error coding schemes are required. To circumvent this problem, more recent theories have proposed that error computation might take place in neural dendrites instead. The neural architecture and computations proposed in these dendritic theories are similar to what has been proposed in the hierarchical temporal memory theory of cortex.
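To make the sign problem concrete, the sketch below shows one possible coding scheme, assumed here purely for illustration: the signed error is split across two non-negative channels, one active when the input exceeds the prediction and one active when the prediction exceeds the input. The text above does not commit to this scheme, and the function names are hypothetical.

```python
def rectify(value):
    """A firing rate cannot be negative, so clip at zero."""
    return max(0.0, value)


def error_channels(sensory_input, prediction):
    """Encode a signed prediction error with two non-negative activities."""
    positive = rectify(sensory_input - prediction)   # input larger than predicted
    negative = rectify(prediction - sensory_input)   # input smaller than predicted
    return positive, negative


def signed_error(positive, negative):
    """Recover the signed error from the two channels."""
    return positive - negative


if __name__ == "__main__":
    for x, p in [(1.0, 0.25), (0.25, 1.0), (0.5, 0.5)]:
        pos, neg = error_channels(x, p)
        print(f"input={x} prediction={p} -> positive={pos} negative={neg}")
```

At most one of the two channels is active for any given input, and their difference recovers the signed error, so no single unit needs a negative activity.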

Applying predictive coding

Perception

The empirical evidence for predictive coding is most robust for perceptual processing. As early as 1999, Rao and Ballard proposed a hierarchical visual processing model in which higher-order visual cortical areas send down predictions and the feedforward connections carry the residual errors between the predictions and the actual lower-level activities. According to this model, each level in the hierarchical model network (except the lowest level, which represents the image) attempts to predict the responses at the next lower level via feedback connections, and the error signal is used to correct the estimate of the input signal at each level concurrently. Emberson et al. demonstrated top-down modulation in infants using a cross-modal audiovisual omission paradigm, determining that even infant brains have expectations about future sensory input that are carried downstream from visual cortices and are capable of expectation-based feedback. Functional near-infrared spectroscopy (fNIRS) data showed that the infant occipital cortex responded to unexpected visual omission (with no visual information input) but not to expected visual omission. These results support the view that, in a hierarchically organized perception system, higher-order neurons send down predictions to lower-order neurons, which in turn send back up the prediction error signal.
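The hierarchical scheme described above can be sketched with a deliberately small two-level chain of scalar units. Each level predicts the level below through a fixed weight, the resulting errors are computed at every level, and both estimates are updated concurrently. The scalar units, the fixed weights and the step size are simplifying assumptions made here and are not taken from the Rao and Ballard model itself.

```python
def settle(image, w1=2.0, w2=0.5, steps=2000, rate=0.1):
    """Relax a two-level hierarchy until the predictions explain the input.

    Level 1 predicts the image as w1 * r1; level 2 predicts r1 as w2 * r2.
    """
    r1, r2 = 0.0, 0.0
    for _ in range(steps):
        e0 = image - w1 * r1      # error between the image and level 1's prediction
        e1 = r1 - w2 * r2         # error between level 1 and level 2's prediction
        # Concurrent updates: level 1 is pushed by the error from below and
        # pulled by the error from above; level 2 only sees the error below it.
        r1, r2 = r1 + rate * (w1 * e0 - e1), r2 + rate * (w2 * e1)
    return r1, r2, image - w1 * r1, r1 - w2 * r2


if __name__ == "__main__":
    r1, r2, e0, e1 = settle(image=4.0)
    print("level 1 estimate:", r1)
    print("level 2 estimate:", r2)
    print("residual errors: ", e0, e1)
```

Once the estimates have settled, both residual errors are close to zero, which corresponds to the case in which top-down predictions fully account for the lower-level activity.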

Interoception

There have been several competing models for the role of predictive coding in interoception. In 2013, Anil Seth proposed that our subjective feeling states, otherwise known as emotions, are generated by predictive models that are actively built out of causal interoceptive appraisals. In relation to how we attribute internal states of others to causes, Sasha Ondobaka, James Kilner, and Karl Friston (2015) proposed that the free energy principle requires the brain to produce a continuous series of predictions with the goal of reducing the amount of prediction error that manifests as "free energy". These errors are then used to model anticipatory information about what the state of the outside world will be and attributions of causes of that world state, including understanding of the causes of others' behavior. This is especially necessary because, to create these attributions, our multimodal sensory systems need interoceptive predictions to organize themselves. Therefore, Ondobaka and colleagues posit that predictive coding is key to understanding other people's internal states.

In 2015, Lisa Feldman Barrett and W. Kyle Simmons proposed the Embodied Predictive Interoception Coding model, a framework that unifies Bayesian active inference principles with a physiological framework of corticocortical connections. Using this model, they posited that agranular visceromotor cortices are responsible for generating predictions about interoception, thus defining the experience of interoception. Contrary to the inductive notion that emotion categories are biologically distinct, Barrett later proposed the theory of constructed emotion, according to which a biological emotion category is constructed from a conceptual category, that is, an accumulation of instances sharing a goal.

In a predictive coding model, Barrett hypothesizes that, in interoception, our brains regulate our bodies by activating "embodied simulations" (full-bodied representations of sensory experience) to anticipate what our brains predict that the external world will throw at us sensorially and how we will respond to it with action. These simulations are either preserved if, based on our brain's predictions, they prepare us well for what subsequently occurs in the external world, or they, and our predictions, are adjusted to compensate for their error in comparison to what actually occurs in the external world and how well-prepared we were for it. Then, in a trial-error-adjust process, our bodies find similarities in goals among certain successful anticipatory simulations and group them together under conceptual categories. Every time a new experience arises, our brains use this past trial-error-adjust history to match the new experience to the category of accumulated corrected simulations with which it shares the most similarity. Then, they apply the corrected simulation of that category to the new experience in the hope of preparing our bodies for the rest of the experience. If it does not, the prediction, the simulation, and perhaps the boundaries of the conceptual category are revised in the hope of higher accuracy next time, and the process continues.

Barrett hypothesizes that, when prediction error for a certain category of simulations for x-like experiences is minimized, what results is a correction-informed simulation that the body will reenact for every x-like experience, resulting in a correction-informed full-bodied representation of sensory experience: an emotion. In this sense, Barrett proposes that we construct our emotions because the conceptual category framework our brains use to compare new experiences, and to pick the appropriate predictive sensory simulation to activate, is built on the go.

Computer science

With the rising popularity of representation learning, the theory has also been actively pursued and applied in machine learning and related fields.

Challenges

One of the biggest challenges in testing predictive coding has been the lack of precision about exactly how prediction error minimization works. In some studies, an increase in BOLD signal has been interpreted as an error signal, while in others it is taken to indicate changes in the input representation. A crucial question that needs to be addressed is what exactly constitutes an error signal and how it is computed at each level of information processing. Another challenge that has been posed is predictive coding's computational tractability. According to Kwisthout and van Rooij, the subcomputation in each level of the predictive coding framework potentially hides a computationally intractable problem, which amounts to "intractable hurdles" that computational modelers have yet to overcome. Future research could focus on clarifying the neurophysiological mechanism and computational model of predictive coding.
