Understanding dopamine and reinforcement learning: The dopamine reward prediction error hypothesis | PNAS

A number of recent advances have been achieved in the study of midbrain dopaminergic neurons. Understanding these advances and how they relate to one another requires a deep understanding of the computational models that serve as an explanatory framework and guide ongoing experimental inquiry. This intertwining of theory and experiment now suggests very clearly that the phasic activity of the midbrain dopamine neurons provides a global mechanism for synaptic modification. These synaptic modifications, in turn, provide the mechanistic underpinning for a specific class of reinforcement learning mechanisms that now seem to underlie much of human and animal behavior. This review describes both the critical empirical findings that are at the root of this conclusion and the fantastic theoretical advances from which this conclusion is drawn.

The theory and data available today indicate that the phasic activity of midbrain dopamine neurons encodes a reward prediction error used to guide learning throughout the frontal cortex and the basal ganglia. Activity in these dopaminergic neurons is now believed to signal that a subject’s estimate of the value of current and future events is in error and indicate the magnitude of this error. This is a kind of combined signal that most scholars active in dopamine studies believe adjusts synaptic strengths in a quantitative manner until the subject’s estimate of the value of current and future events is accurately encoded in the frontal cortex and basal ganglia. Although some confusion remains within the larger neuroscience community, very little data exist that are incompatible with this hypothesis. This review provides a brief overview of the explanatory synergy between behavioral, anatomical, physiological, and biophysical data that has been forged by recent computational advances. For a more detailed treatment of this hypothesis, refer to Niv and Montague (1) or Dayan and Abbott (2).
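The error-driven adjustment described above can be illustrated with a minimal temporal-difference sketch. This is not code from the review: the three-step trial, the learning rate `alpha`, the discount factor `gamma`, and the function name `td_learning` are all assumptions chosen for illustration. The quantity `delta` plays the role of the reward prediction error, and each value estimate is nudged by a fraction of it until the estimates stop being in error.

```python
def td_learning(rewards, alpha=0.1, gamma=1.0, episodes=500):
    """Tabular TD(0) over a fixed sequence of time steps within one trial.

    rewards[t] is the reward delivered at step t (hypothetical task layout).
    Returns the learned value estimates and the per-trial prediction errors.
    """
    n = len(rewards)
    values = [0.0] * (n + 1)  # values[n] is the end of the trial, fixed at 0
    errors = []
    for _ in range(episodes):
        trial_errors = []
        for t in range(n):
            # Prediction error: what happened (reward plus the discounted
            # estimate for the next step) minus what was expected.
            delta = rewards[t] + gamma * values[t + 1] - values[t]
            # Adjust the estimate in proportion to the error.
            values[t] += alpha * delta
            trial_errors.append(delta)
        errors.append(trial_errors)
    return values, errors


# Assumed toy trial: two unrewarded steps followed by a reward of 1.
rewards = [0.0, 0.0, 1.0]
values, errors = td_learning(rewards)
first, last = errors[0], errors[-1]
print("first-trial errors:", first)
print("last-trial errors:", last)
print("learned values:", values[:3])
```

On the first trial the only nonzero error occurs at the rewarded step, because nothing yet predicts the reward. After repeated trials the estimates for all three steps approach 1 and the errors approach 0, which is the sense in which the error signal drives the estimates toward accuracy.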

Tags: basal-ganglia, dopamine, frontal-cortex, neuroscience, reinforcement-learning, reward-prediction-error, synapses, temporal-difference-learning

Format: Scientific Paper

Creator(s): Paul W. Glimcher (https://med.nyu.edu/faculty/paul-w-glimcher)

Publication Date: 2011, Mar 9th


View Resource:
https://www.pnas.org/content/108/Supplement_3/15647
