REWARD
- Definition and Relationship to Reinforcement
- Historical Context in Behaviorism
- Neural Mechanisms of Reward: The Dopaminergic System
- Types of Rewards: Primary and Secondary
- Applications in Learning and Motivation
- Distinction: Reward vs. Punishment vs. Reinforcement
- The Role of Expectation and Prediction Error
- Intrinsic versus Extrinsic Rewards
- Clinical and Societal Implications
Definition and Relationship to Reinforcement
The term “reward” in common parlance describes any stimulus or outcome that is perceived as desirable, pleasant, or satisfying. In the technical lexicon of psychology, particularly within behaviorism and learning theory, reward is closely related to, and often used synonymously with, positive reinforcement. While a layperson might define a reward by its hedonic quality (how good it feels), the psychological definition, especially in experimental contexts, centers strictly on the stimulus’s operational function. A reward is a stimulus presented after a specific behavior that increases the future frequency or probability of that behavior under similar circumstances. This distinction is critical: reinforcement names the process (the increase in behavior), whereas reward typically refers to the consequence (the object or event) delivered.
The operational definition clarifies that a stimulus is only functioning as a true reward (a positive reinforcer) if it demonstrably increases the target behavior. If a researcher introduces a piece of chocolate following a rat’s lever press, and the rat subsequently presses the lever more often, the chocolate is confirmed as a reward and, by extension, a positive reinforcer. Conversely, if the introduction of the chocolate does not alter or even decreases the lever-pressing frequency, then despite its perceived pleasantness, it fails the operational test and is not acting as a reinforcer in that specific context. This emphasis on observable behavioral consequences, rather than subjective internal states, forms the bedrock of the behaviorist perspective pioneered by figures like B.F. Skinner, who meticulously avoided mentalistic terminology like “satisfaction” or “pleasure” in favor of measurable contingencies.
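The operational test described above can be sketched in a few lines. This is a minimal illustration rather than a standard analysis: the function name `is_positive_reinforcer`, the `min_increase` margin, and the sample counts are all hypothetical, and a real experiment would apply a proper statistical comparison rather than a raw difference of means.

```python
from statistics import mean

def is_positive_reinforcer(baseline_rates, contingent_rates, min_increase=0.0):
    """Return True if responding rose once the stimulus followed the behavior.

    baseline_rates: responses per session before the stimulus was introduced.
    contingent_rates: responses per session while the stimulus followed each response.
    min_increase: hypothetical margin the mean must rise by (an assumption).
    """
    return mean(contingent_rates) > mean(baseline_rates) + min_increase

# Hypothetical lever-press counts per session.
print(is_positive_reinforcer([4, 5, 6], [9, 10, 11]))  # True: pressing increased
print(is_positive_reinforcer([4, 5, 6], [4, 5, 5]))    # False: no increase
```

The function deliberately ignores how pleasant the stimulus seems; only the change in behavior decides the label.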
Furthermore, the effectiveness of any potential reward depends heavily on the individual’s state of deprivation and motivational context. A large meal may serve as a powerful reward for an individual who has fasted for twenty-four hours, but it will likely hold little reinforcing value for someone who has just eaten to satiety. This variability necessitates the inclusion of motivating operations (antecedent conditions that temporarily alter the effectiveness of a consequence) when analyzing the true impact of a reward. Therefore, while “reward” serves as a useful and intuitive shorthand for a desirable outcome, its rigorous scientific application demands confirmation that it serves as an effective positive reinforcer, capable of driving learning and increasing the probability of the behaviors it follows.
Historical Context in Behaviorism
The psychological study of rewards finds its earliest formal roots in the work of Edward Thorndike, specifically his formulation of the Law of Effect at the turn of the 20th century. Thorndike postulated that responses immediately followed by satisfaction or a “satisfying state of affairs” would be more likely to recur, while responses followed by discomfort or annoyance would be less likely to recur. This early conceptualization introduced the fundamental idea that consequences shape behavior, laying the groundwork for all subsequent research on reinforcement and reward. Although Thorndike used subjective terms like “satisfaction,” his emphasis on the automatic strengthening of stimulus-response connections based on outcomes was revolutionary and empirical.
Working independently and at roughly the same time, Ivan Pavlov demonstrated through his research on classical conditioning how previously neutral stimuli could acquire motivational significance through association. In Pavlovian conditioning, the unconditioned stimulus (UCS), such as food, inherently possesses rewarding properties and elicits an unconditioned response (UCR), such as salivation. When a neutral stimulus (e.g., a bell) is reliably paired with this UCS, the bell becomes a conditioned stimulus (CS) capable of eliciting a conditioned response (CR). In this framework, the UCS functions as the primary reward that drives the association: it is the biologically significant event, and the CS comes to signal it and prepare the organism for its delivery. This illustrates a basic mechanism by which environmental cues gain predictive reward value.
The most defining historical contribution came from B.F. Skinner, who formalized the principles of operant conditioning. Skinner held that a reward, in its technical sense of a positive reinforcer, is defined solely by its ability to increase the rate of the behavior it follows. He developed precise experimental procedures using the operant chamber (the “Skinner box”) to study the contingencies between response and consequence. His rigorous approach moved the field away from inferring internal states and toward the reliable prediction and control of behavior through the manipulation of consequences. In this vocabulary, reinforcement is the technical term for the process, and the reinforcing stimulus corresponds to what is colloquially called the reward. This work cemented the understanding that rewards are tools for shaping and maintaining complex behavioral repertoires.
Neural Mechanisms of Reward: The Dopaminergic System
Modern neuroscience has identified a circuit central to processing reward, motivation, and goal-directed behavior, commonly referred to as the mesolimbic dopamine pathway. This pathway originates in the Ventral Tegmental Area (VTA) and projects extensively to key forebrain structures, most notably the Nucleus Accumbens (NAc), which is considered a central hub of the brain’s reward circuitry. When an organism encounters a rewarding stimulus (food, water, social interaction, or a psychoactive substance), dopamine neurons in the VTA are activated, releasing dopamine into the NAc and other target areas such as the amygdala and prefrontal cortex. This surge of dopamine does not necessarily register as pure pleasure (the ‘liking’ component); rather, it serves as a powerful signal for motivational salience and ‘wanting,’ tagging the preceding actions and stimuli as worthy of future pursuit.
The role of dopamine in this system is multifaceted, but it is less about the consumption of the reward and more about learning and anticipation. Dopamine signals help determine whether an event is salient enough to justify behavioral effort, effectively functioning as a learning signal that motivates approach behavior. For instance, studies have shown that once a cue has been learned as a predictor of reward, dopamine release often peaks not when the reward is received but when the organism encounters that cue (the conditioned stimulus). This anticipatory function underscores dopamine’s role in translating reward information into action, driving the organism to seek out resources and engage in behaviors necessary for survival and reproduction.
While dopamine drives the motivational ‘wanting,’ the actual subjective experience of pleasure or ‘liking’ involves different neurochemical systems, primarily endogenous opioids (endorphins and enkephalins) and endocannabinoids, which act within the NAc and the ventral pallidum. These systems modulate the hedonic impact of the reward once it is consumed. The dissociation between ‘wanting’ (dopamine/motivation) and ‘liking’ (opioids/hedonics) is fundamental to understanding reward pathologies, such as addiction, where intense wanting for a substance persists long after the capacity to derive pleasure from it has diminished, demonstrating a hijacking of the motivational drive irrespective of the hedonic outcome.
Types of Rewards: Primary and Secondary
Rewards can be broadly categorized into two fundamental types based on their origin and dependency on prior learning: primary rewards and secondary rewards. Primary rewards, also known as unconditioned reinforcers, are those that possess inherent reinforcing properties because they satisfy essential biological needs necessary for survival. These include stimuli such as food, water, optimal temperature, sexual contact, and relief from pain. Their effectiveness does not rely on any prior association with other reinforcers; they naturally elicit approach behavior and are potent determinants of behavior across species, reflecting their fundamental importance in evolutionary adaptation.
In contrast, secondary rewards, or conditioned reinforcers, are initially neutral stimuli that acquire their reinforcing power through consistent pairing with primary rewards or with previously established secondary rewards. Secondary rewards are learned, and many are culturally dependent. Examples include tokens, grades, praise, recognition, and most notably, money. Money, for instance, has no intrinsic biological value, but because it is systematically paired with the ability to acquire primary rewards (food, shelter) and countless other secondary rewards, it becomes one of the most powerful and pervasive reinforcers in human society. The process by which a neutral stimulus gains this reinforcing capacity is called conditioned reinforcement; when the pairing is with an already established conditioned stimulus rather than with a primary reward, it is termed higher-order conditioning.
A particularly powerful subtype of secondary reward is the generalized conditioned reinforcer. These are secondary rewards that have been paired with a wide variety of other reinforcers, making them effective across almost all motivational states and contexts. Money is the quintessential example of a generalized conditioned reinforcer because it can be exchanged for virtually any primary reward (e.g., food when hungry, shelter when cold) or other secondary rewards. Because generalized reinforcers are not dependent on a single state of deprivation, they maintain their effectiveness across diverse situations, making them exceptionally useful in behavioral intervention programs like token economies established in clinical and educational settings.
Applications in Learning and Motivation
The application of rewards is central to understanding and manipulating complex learning processes, particularly through the use of schedules of reinforcement. Rather than delivering a reward after every single instance of the desired behavior (continuous reinforcement), intermittent schedules are typically employed to maintain behavior and make it highly resistant to extinction. These schedules are categorized along two dimensions: whether reward delivery depends on the number of responses (ratio schedules) or on the passage of time (interval schedules), and whether that requirement is fixed or variable.
The four classical schedules, Fixed Ratio (FR), Variable Ratio (VR), Fixed Interval (FI), and Variable Interval (VI), produce distinct and predictable patterns of responding. For example, a Fixed Ratio schedule, which rewards a subject after a specific, unchanging number of responses, tends to produce a high rate of response followed by a brief pause immediately after the reward is delivered. Conversely, the Variable Ratio schedule, where the number of responses required for reward varies unpredictably (as with slot machines), generates the highest and steadiest rate of responding and is notoriously resistant to extinction, because any given response might be the one that pays off. Fixed Interval schedules typically yield a “scalloped” pattern, with responding accelerating as the end of the interval approaches, while Variable Interval schedules yield a moderate, steady rate. Understanding these schedules allows educators, clinicians, and managers to optimize motivational structures for long-term behavioral maintenance.
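The four schedules can be expressed as small rule objects that decide whether a given response earns a reward. The sketch below is illustrative only: the class names, the `respond(t)` interface, and the uniform distributions used for the variable schedules are assumptions made for this example. It models the delivery rules alone, not the response patterns that organisms produce under them.

```python
import random

class FixedRatio:
    """Reward every n-th response."""
    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """Reward after a varying number of responses that averages mean_n."""
    def __init__(self, mean_n, rng):
        self.mean_n = mean_n
        self.rng = rng
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Uniform on 1 .. 2*mean_n - 1, so the average requirement is mean_n.
        return self.rng.randint(1, 2 * self.mean_n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False

class FixedInterval:
    """Reward the first response made after a fixed time has elapsed."""
    def __init__(self, interval):
        self.interval = interval
        self.available_at = interval

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False

class VariableInterval:
    """Reward the first response after a varying delay averaging mean_interval."""
    def __init__(self, mean_interval, rng):
        self.mean_interval = mean_interval
        self.rng = rng
        self.available_at = self._draw()

    def _draw(self):
        # Uniform on 0 .. 2*mean_interval, so the average delay is mean_interval.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t):
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False

# One response per time step for 10 steps on FR-5: rewards on the 5th and 10th.
fr = FixedRatio(5)
fr_rewards = [fr.respond(t) for t in range(10)]
print(fr_rewards.count(True))  # 2
```

Passing a seeded `random.Random` instance to the variable schedules keeps a simulation reproducible; the `t` argument is ignored by the ratio schedules and is kept only so that all four share one interface.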
Furthermore, reward is essential in the technique of shaping, a powerful tool used to teach behaviors that are not currently in the organism’s repertoire. Shaping involves differentially reinforcing successive approximations of the target behavior. Initially, any behavior that vaguely resembles the final goal is rewarded. As the subject learns, the criterion for receiving the reward is incrementally raised, requiring behaviors that are progressively closer to the desired outcome. This systematic use of positive reinforcement, or reward, allows trainers to bridge the gap between simple, spontaneous actions and highly complex behaviors, ranging from teaching an animal to perform a complex trick to teaching a child nuanced social skills.
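The logic of successive approximations can be sketched as a loop in which the reward criterion rises as the learner improves. This is a toy model under loudly stated assumptions: behavior is reduced to a single number, the learner's "tendency" drifts toward whatever was rewarded, and the function name `shape` and all its parameters are hypothetical rather than drawn from any established model.

```python
import random

def shape(target, start=0.0, noise=1.0, learning_rate=0.5, max_trials=1000, rng=None):
    """Toy shaping loop: reward approximations, then raise the criterion.

    Returns the final tendency and the list of criteria used along the way.
    """
    rng = rng or random.Random()
    tendency = start    # the behavior the learner typically emits (assumption)
    criterion = start   # what currently earns a reward
    criteria = [criterion]
    for _ in range(max_trials):
        if tendency >= target:
            break  # target behavior is now typical; shaping is complete
        # Emitted behavior varies randomly around the current tendency.
        behavior = tendency + rng.gauss(0, noise)
        if behavior >= criterion:
            # Rewarded: the tendency moves toward the rewarded behavior...
            tendency += learning_rate * (behavior - tendency)
            # ...and the criterion is raised to the new typical level.
            criterion = tendency
            criteria.append(criterion)
    return tendency, criteria

final_tendency, criteria = shape(target=5.0, rng=random.Random(1))
print(final_tendency >= 5.0)  # whether the target was reached within max_trials
```

Because only behavior at or above the current criterion is rewarded, the criterion never decreases, which mirrors the incremental raising of the requirement described above.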
Distinction: Reward vs. Punishment vs. Reinforcement
A frequent source of confusion in lay discourse is the interchangeable use of the terms reward, punishment, and reinforcement. Psychologically, these terms are strictly defined by their effect on behavior frequency. Reinforcement is any consequence that increases the future likelihood of the behavior it follows, while punishment is any consequence that decreases it. The term “reward” is best reserved for the specific stimulus used in one type of reinforcement, namely positive reinforcement.
This framework yields four fundamental consequence types, defined by the interaction of two dimensions: whether a stimulus is presented (Positive) or removed (Negative), and whether the behavior increases (Reinforcement) or decreases (Punishment):
- Positive Reinforcement: The presentation of a desirable stimulus (the reward) following a behavior, which increases the future frequency of that behavior. Example: Giving a child praise (reward) after they clean their room, leading to more cleaning.
- Negative Reinforcement: The removal of an aversive stimulus following a behavior, which increases the future frequency of that behavior. Example: Taking an aspirin (behavior) to remove a headache (aversive stimulus), leading to increased aspirin use in the future.
- Positive Punishment: The presentation of an aversive stimulus following a behavior, which decreases the future frequency of that behavior. Example: Receiving a painful electric shock (aversive stimulus) for touching a wrong object, leading to less touching of that object.
- Negative Punishment: The removal of a desirable stimulus (the reward) following a behavior, which decreases the future frequency of that behavior. Example: Taking away a teenager’s cell phone (removing a reward) after they break curfew, leading to fewer curfew violations.
Therefore, the psychological term “reward” functions strictly as the stimulus added in the procedure of positive reinforcement. It is a desirable stimulus used to strengthen a response. It is critical to recognize that negative reinforcement, while increasing behavior, does not involve a reward; it involves the termination of an aversive condition. Similarly, punishment, which aims to suppress behavior, involves the addition of aversive stimuli or the removal of existing rewards. Maintaining this precise nomenclature is essential for the accurate analysis and modification of behavior.
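The two-by-two framework above maps directly onto a small lookup. The sketch below is only an illustration of that classification; the enum and function names are hypothetical choices made for this example.

```python
from enum import Enum

class StimulusChange(Enum):
    PRESENTED = "positive"   # a stimulus is added after the behavior
    REMOVED = "negative"     # a stimulus is taken away after the behavior

class BehaviorChange(Enum):
    INCREASES = "reinforcement"
    DECREASES = "punishment"

def consequence_type(stimulus, behavior):
    """Name the consequence from the two dimensions of the framework."""
    return f"{stimulus.value} {behavior.value}"

def presents_reward(stimulus, behavior):
    """Only positive reinforcement delivers a reward in the technical sense."""
    return stimulus is StimulusChange.PRESENTED and behavior is BehaviorChange.INCREASES

# Praise presented after cleaning, and cleaning increases.
print(consequence_type(StimulusChange.PRESENTED, BehaviorChange.INCREASES))  # positive reinforcement
# A headache removed after taking aspirin, and aspirin use increases.
print(consequence_type(StimulusChange.REMOVED, BehaviorChange.INCREASES))    # negative reinforcement
```

Keeping "positive" and "negative" tied to the stimulus change, and never to whether the outcome is pleasant, is the point of the encoding.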
The Role of Expectation and Prediction Error
Moving beyond early behaviorist models, cognitive and neuroscientific research introduced the concept that rewards are not processed in isolation but are evaluated relative to the organism’s expectations. This led to the highly influential concept of Reward Prediction Error (RPE), which posits that the crucial learning signal provided by a reward is not the reward itself, but the discrepancy between the expected reward and the actual reward received. This model is integral to understanding how the brain updates its associative learning based on experience.
The RPE can be written as: RPE = actual reward received − expected reward. Dopaminergic neurons in the midbrain, located in the VTA and the neighboring substantia nigra pars compacta, are believed to encode this error signal and broadcast it to targets such as the NAc. When a reward is better than expected (a positive RPE), dopamine neurons fire strongly, providing a robust signal that strengthens the association between the preceding cues and the action taken. This encourages repetition of the behavior. Conversely, if an expected reward is omitted or is worse than expected (a negative RPE), dopamine firing momentarily dips below baseline, signaling a failure to meet expectations and prompting the organism to adjust its predictive model or change its behavior.
When the outcome perfectly matches the expectation (RPE equals zero), the dopamine neurons continue firing at their baseline rate, indicating that no new learning is required. This mechanism demonstrates that the primary function of the reward system is not merely to signal pleasure, but to serve as a sophisticated teaching mechanism that constantly refines the organism’s predictive model of the environment. By signaling prediction errors, rewards support adaptive decision-making, helping an organism allocate its limited resources toward actions that yield the best available outcomes in a dynamic world.
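The three cases (positive, negative, and zero error) can be illustrated with a short numerical sketch. The difference `actual - expected` comes from the definition above; the update rule with a `learning_rate` is an added assumption, a simple error-driven rule chosen for illustration and not something the definition itself specifies.

```python
def prediction_error(actual_reward, expected_reward):
    """RPE = actual reward received minus expected reward."""
    return actual_reward - expected_reward

def update_expectation(expected_reward, actual_reward, learning_rate=0.2):
    """Move the expectation a fraction of the way toward the outcome.

    The learning_rate value is a hypothetical parameter for this sketch.
    """
    return expected_reward + learning_rate * prediction_error(actual_reward, expected_reward)

# A reward of 1.0 is delivered on every trial; the expectation starts at 0.
expected = 0.0
print(prediction_error(1.0, expected))      # 1.0: large positive error on the first trial
for trial in range(50):
    expected = update_expectation(expected, actual_reward=1.0)

# After many pairings the reward is almost fully predicted, so the error is near zero.
print(abs(prediction_error(1.0, expected)) < 1e-3)  # True
# Omitting the now-expected reward yields a negative error.
print(prediction_error(0.0, expected) < 0)          # True
```

The same reward thus produces a large, a negligible, or a negative signal depending only on what was expected.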
Intrinsic versus Extrinsic Rewards
Rewards can also be classified based on whether the source of motivation is internal or external to the activity itself, leading to the distinction between intrinsic rewards and extrinsic rewards. Extrinsic rewards are tangible or intangible incentives that are external to the task being performed, such as money, grades, prizes, or social recognition. They are given to control or motivate the performance of a behavior that might otherwise be unappealing or neutral.
Intrinsic rewards, however, are inherently satisfying; they arise directly from the execution of the task itself. These internal rewards are driven by feelings of competence, autonomy, enjoyment, mastery, and the achievement of personal goals. An individual is intrinsically rewarded when they engage in an activity purely for the satisfaction derived from the process, such as solving a complex puzzle for the challenge or engaging in a hobby for pleasure. Intrinsic motivation is generally considered more robust and sustainable over the long term than reliance solely on external incentives.
A significant finding concerning this dichotomy is the Overjustification Effect, which suggests that introducing excessive extrinsic rewards for an activity that is already intrinsically rewarding can paradoxically undermine the internal motivation. If a child enjoys drawing (intrinsic reward) but is then heavily rewarded with money for every picture they complete (extrinsic reward), the child may reattribute their motivation from internal enjoyment to external gain. When the extrinsic reward is withdrawn, the child may stop drawing altogether, demonstrating that the external reward has “overjustified” the behavior and diminished the inherent satisfaction of the task. This highlights the delicate balance required when designing motivational systems, particularly in educational and organizational settings.
Clinical and Societal Implications
The neurobiological understanding of reward has profound implications for clinical psychology, particularly in the study and treatment of addiction. Addictive substances act by producing large surges of dopamine in the mesolimbic pathway, often exceeding the levels produced by natural rewards, and addictive behaviors such as gambling are thought to engage the same pathway. This intense signal hijacks the brain’s natural learning mechanism, leading to rapid and powerful conditioning in which cues associated with the substance become intensely motivating. Over time, the brain down-regulates its dopamine receptors, so that higher doses are needed to achieve the same effect. The motivational state shifts from seeking pleasure to compulsively seeking relief from withdrawal, perpetuating the cycle of dependence.
In therapeutic settings, rewards are consciously employed to modify behavior. Token economies are a classic example, especially effective in institutional settings like psychiatric hospitals or specialized classrooms. Patients earn tokens (secondary rewards) for engaging in desired behaviors (e.g., self-care, social interaction), which can later be exchanged for backup reinforcers (primary rewards or privileges). This systematic application of reward principles allows clinicians to structure the environment to promote adaptive behaviors and facilitate rehabilitation.
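The earn-and-exchange cycle of a token economy can be sketched as a small ledger. Everything here is hypothetical: the class name, the behaviors, the token values, and the prices are invented for illustration and do not describe any particular clinical program.

```python
class TokenEconomy:
    """Minimal ledger: tokens are earned for behaviors and spent on backup reinforcers."""

    def __init__(self, earn_rules, prices):
        self.earn_rules = dict(earn_rules)  # behavior -> tokens earned
        self.prices = dict(prices)          # backup reinforcer -> token cost
        self.balances = {}                  # person -> current token balance

    def record_behavior(self, person, behavior):
        """Credit tokens for a desired behavior; unlisted behaviors earn nothing."""
        tokens = self.earn_rules.get(behavior, 0)
        self.balances[person] = self.balances.get(person, 0) + tokens
        return tokens

    def exchange(self, person, reinforcer):
        """Spend tokens on a backup reinforcer; return False if the balance is too low."""
        price = self.prices[reinforcer]
        if self.balances.get(person, 0) < price:
            return False
        self.balances[person] -= price
        return True

# Hypothetical rules and prices.
economy = TokenEconomy(
    earn_rules={"self-care": 2, "social interaction": 3},
    prices={"extra recreation time": 4},
)
economy.record_behavior("patient_a", "self-care")
print(economy.exchange("patient_a", "extra recreation time"))  # False: only 2 tokens so far
economy.record_behavior("patient_a", "social interaction")
print(economy.exchange("patient_a", "extra recreation time"))  # True: 5 tokens, 4 spent
print(economy.balances["patient_a"])                           # 1
```

Separating earning from exchange reflects why tokens work as generalized reinforcers: their value comes from what they can later be traded for, not from the token itself.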
Societally, the principles of reward underpin diverse systems, including educational grading, economic incentive structures, and public health campaigns. Grading systems utilize grades and degrees as powerful secondary rewards to motivate academic effort. In economics, bonuses, commissions, and performance incentives are extrinsic rewards designed to align individual behavior with organizational goals. Understanding the potency and appropriate scheduling of rewards is therefore crucial not only for individual learning but also for designing functional, ethical, and motivating societal structures.