INTERMITTENT REINFORCEMENT
- Intermittent Reinforcement: A Detailed Review
- The Mechanism of Behavioral Persistence and Durability
- Schedules of Intermittent Reinforcement: Fixed Schedules
- Schedules of Intermittent Reinforcement: Variable Schedules
- Historical Research and Key Findings
- Applications and Implications in Psychology and Society
- Conclusion
- References
Intermittent Reinforcement: A Detailed Review
Intermittent reinforcement, often referred to as partial reinforcement, is a cornerstone concept in behavioral psychology, specifically in operant conditioning. It involves delivering a reinforcer (reward) after a desired behavior, but, critically, not every time the behavior occurs. Unlike continuous reinforcement (CRF), where every correct response is rewarded, intermittent reinforcement schedules make reward delivery unpredictable or infrequent. Reinforcement is varied in this way because the resulting behaviors are markedly more durable and more resistant to extinction than behaviors established under CRF. The primary goal of intermittent reinforcement is therefore not to initiate a behavior but to maintain it over the long term, which makes it a powerful tool in contexts ranging from psychological experimentation to animal training and therapeutic interventions.
The application of intermittent reinforcement highlights a counterintuitive principle of behavioral science: the less consistently a behavior is rewarded, the more persistently it is often maintained. When an organism, whether human or animal, learns that a reward is available only some of the time, it keeps responding through periods of non-reinforcement, as if anticipating the eventual, unpredictable reward. Continuous reinforcement works differently. CRF establishes a behavior rapidly, but because every response has been rewarded, the organism readily detects when reinforcement stops, and the behavior tends to extinguish quickly. Intermittent schedules, by contrast, build non-reinforced trials directly into the learning process, conditioning the organism to persist despite temporary failures to receive a reward.
While the term “intermittent reinforcement” broadly covers any schedule that is not continuous, the specific patterns and timing of reinforcement delivery are crucial determinants of the resulting behavioral outcomes. These patterns, known as schedules of reinforcement, dictate the rate, pattern, and tenacity of the learned response. The foundational research by B.F. Skinner and his colleagues meticulously mapped out these schedules, classifying them primarily based on whether the reinforcement depends on the number of responses made (ratio schedules) or the elapsed time since the last reinforcement (interval schedules), and whether these requirements are fixed or variable. Understanding these nuances is essential for predicting and controlling behavior across diverse environments, from controlled laboratory settings to complex social interactions.
The Mechanism of Behavioral Persistence and Durability
The effectiveness of intermittent reinforcement rests on the behavioral persistence it produces, particularly when reinforcement is withdrawn entirely (extinction). The core mechanism lies in the organism’s conditioning history. When a behavior is reinforced only occasionally, the organism learns that periods of non-reinforcement are a normal, expected part of the process leading to the eventual reward. Therefore, when an extinction phase begins and the reward is permanently discontinued, the first non-reinforced responses resemble just another temporary gap in the schedule rather than a definitive signal that the contingency has ended. This ambiguity delays the recognition of extinction, so the organism continues to emit the learned response for an extended period. At the onset of extinction, responding may even briefly increase in rate or intensity, often accompanied by emotional behavior; this is known as an extinction burst.
This durability is formally described as the Partial Reinforcement Extinction Effect (PREE): in general, the less frequently a behavior is reinforced during acquisition, the longer it takes to extinguish once reinforcement is removed entirely. Two prominent theories attempt to explain the PREE. The first, Discrimination Theory, holds that the organism has difficulty discriminating between the training condition (intermittent reinforcement) and the extinction condition (no reinforcement), because both involve stretches of non-reward. The second, Frustration Theory, proposes that during intermittent training the organism learns to keep responding even while frustrated by the absence of an expected reward. During extinction, the frustration elicited by non-reward therefore acts as a cue for continued responding rather than for quitting, fueling persistence.
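The discrimination account can be made concrete with a toy model. The sketch below is a hypothetical illustration, not a published formulation of the theory: it assumes the organism keeps responding in extinction until its current run of non-reinforced responses exceeds a multiple (`tolerance`, an arbitrary illustrative parameter) of the longest run it experienced during training. Because CRF training contains no non-reinforced runs, the model quits almost immediately, whereas partial-reinforcement training makes long runs of non-reward look normal.

```python
def longest_unreinforced_run(outcomes: list[bool]) -> int:
    """Length of the longest streak of consecutive non-reinforced responses."""
    longest = current = 0
    for reinforced in outcomes:
        current = 0 if reinforced else current + 1
        longest = max(longest, current)
    return longest


def extinction_responses(training_outcomes: list[bool], tolerance: float = 2.0,
                         max_responses: int = 10_000) -> int:
    """Responses emitted in extinction before the toy organism quits.

    Hypothetical stopping rule: quit once the current unreinforced run exceeds
    tolerance * (longest unreinforced run seen in training), with a floor of 1.
    """
    threshold = max(1, round(tolerance * longest_unreinforced_run(training_outcomes)))
    run = 0
    for n in range(1, max_responses + 1):
        run += 1  # in extinction, no response is ever reinforced
        if run > threshold:
            return n
    return max_responses


crf_training = [True] * 20                      # every response reinforced
partial_training = [False, False, True, False, True,
                    False, False, False, True]  # hypothetical partial schedule

print(extinction_responses(crf_training))      # 2
print(extinction_responses(partial_training))  # 7
```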
Unpredictable rewards also appear to be especially motivating. The unpredictability inherent in variable schedules, particularly the Variable Ratio schedule, is thought to engage the brain’s reward system, with dopamine release linked to anticipation and seeking behavior. This uncertainty keeps the organism engaged with the task, because any given response might be the one that is reinforced. Fixed schedules differ: responding often pauses immediately after reinforcement and resumes only later, abruptly and at a high rate under ratio schedules, or with gradual acceleration as the interval nears its end under interval schedules. The power of intermittent reinforcement lies in converting uncertainty into a potent driver of sustained engagement, making the behavior less about the immediate reward and more about the prospect of future reinforcement.
Schedules of Intermittent Reinforcement: Fixed Schedules
Fixed schedules of reinforcement are characterized by a predictable, unvarying requirement for reinforcement delivery. These schedules produce specific, identifiable patterns of response that differ markedly from the steady rates seen in variable schedules. The two main types of fixed schedules are Fixed Ratio (FR) and Fixed Interval (FI). In a Fixed Ratio (FR) schedule, reinforcement is delivered only after a specific, predetermined number of responses has been performed. For example, an FR-10 schedule means the organism must respond ten times to receive one reward. This schedule typically results in a very high rate of responding because the reward is directly contingent upon the effort (the number of responses).
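To make the FR contingency concrete, here is a minimal Python sketch. The `FixedRatio` class and its `respond()` method are hypothetical names for illustration: `respond()` is called once per response and returns whether that response is reinforced, and the response counter resets after each reinforcer.

```python
class FixedRatio:
    """Fixed Ratio (FR-n): reinforce every n-th response."""

    def __init__(self, ratio: int):
        if ratio < 1:
            raise ValueError("ratio must be at least 1")
        self.ratio = ratio
        self.count = 0  # responses since the last reinforcer

    def respond(self) -> bool:
        """Record one response; return True if it earns a reinforcer."""
        self.count += 1
        if self.count == self.ratio:
            self.count = 0  # the next ratio starts from zero
            return True
        return False


fr10 = FixedRatio(10)
outcomes = [fr10.respond() for _ in range(30)]
reinforced_at = [i + 1 for i, hit in enumerate(outcomes) if hit]
print(reinforced_at)  # [10, 20, 30]
```

Setting `ratio=1` reinforces every response, so continuous reinforcement is simply the limiting case FR-1.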
A defining characteristic of the FR schedule is the post-reinforcement pause. Immediately after receiving reinforcement, the organism typically stops responding for a while, then switches abruptly to the high-rate responding needed to complete the next ratio; this pause-then-burst sequence is often called the “break-and-run” pattern. The pause generally lengthens as the ratio requirement grows, so larger ratios produce longer pauses, a relationship often interpreted as a response to the size of the upcoming work requirement. Piecework pay in industrial settings, where a worker is paid a fixed amount for completing a specific quantity of items, is a common example of an FR schedule: it encourages high effort but often produces short breaks after each batch is completed.
In contrast, a Fixed Interval (FI) schedule reinforces the first response made after a fixed amount of time has elapsed since the last reinforcement. For example, on an FI-5 minute schedule, responses during the five minutes after a reinforcement have no effect, and the first response after that point is reinforced. This schedule produces a distinctive response pattern known as the fixed-interval scallop. Immediately after reinforcement, responding is slow or absent, and it then accelerates progressively as the end of the interval approaches. The organism learns to estimate the passage of time, concentrating its effort close to the moment the reward becomes available. Commonly cited examples include studying, where students often cram just before a scheduled exam, and checking the mail more often as the expected delivery time approaches.
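A comparable sketch for the FI contingency follows, again with hypothetical names (`FixedInterval`, `respond(t)`). It assumes times are given in seconds in non-decreasing order and that each interval is timed from the previous reinforcer, or from the start of the session for the first one.

```python
class FixedInterval:
    """Fixed Interval (FI-t): reinforce the first response made at least
    `interval` seconds after the previous reinforcer (or after session start)."""

    def __init__(self, interval: float, start_time: float = 0.0):
        if interval <= 0:
            raise ValueError("interval must be positive")
        self.interval = interval
        self.last_reinforcer = start_time

    def respond(self, t: float) -> bool:
        """Record a response at time t (seconds); return True if reinforced."""
        if t - self.last_reinforcer >= self.interval:
            self.last_reinforcer = t  # the next interval starts now
            return True
        return False  # too early: the response has no programmed consequence


fi5 = FixedInterval(300)  # FI-5 minute, expressed in seconds
response_times = [60, 299, 300, 301, 610]
results = [fi5.respond(t) for t in response_times]
print(results)  # [False, False, True, False, True]
```

The response at 301 seconds earns nothing: extra responding early in an interval has no programmed consequence, which is consistent with the low rates at the start of the scallop.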
Schedules of Intermittent Reinforcement: Variable Schedules
Variable schedules introduce an element of unpredictability that largely eliminates the post-reinforcement pauses and other characteristic patterns of fixed schedules. Because the organism never knows precisely when the next reinforcement will occur, it tends to respond at a consistent rate, which maximizes its chances of receiving the reward. This uncertainty arguably makes variable schedules the most powerful for maintaining steady and highly persistent behavior. The two main types are Variable Ratio (VR) and Variable Interval (VI).
The Variable Ratio (VR) schedule delivers reinforcement after an average number of responses, but the exact number required varies randomly from one reinforcement to the next. For instance, on a VR-10 schedule reinforcement is delivered after ten responses on average, but the actual requirement might be three responses, then seventeen, then eight, and so on. This schedule generates the highest and steadiest rates of responding of the basic schedules and is notoriously resistant to extinction. The high response rate is maintained because the next reward could always be just one more response away, which creates a powerful drive to keep responding. The classic real-world example of the VR schedule is gambling, particularly the slot machine, where payoffs occur unpredictably around a mean number of plays, driving highly persistent and sometimes compulsive play.
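The sketch below implements a VR schedule under one stated assumption: each new response requirement is drawn uniformly from 1 to 2n - 1, a distribution whose mean is n. Laboratory studies use various progressions, and a closely related random-ratio arrangement instead reinforces each response with a fixed probability of 1/n. The `VariableRatio` class is a hypothetical illustration; a seeded `random.Random` is passed in so runs are reproducible.

```python
import random


class VariableRatio:
    """Variable Ratio (VR-n): the number of responses required changes after
    every reinforcer but averages n."""

    def __init__(self, mean_ratio: int, rng: random.Random):
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = rng
        self.count = 0
        self.requirement = self._draw()

    def _draw(self) -> int:
        # Assumption: requirements are uniform on 1..(2n - 1), whose mean is n.
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self) -> bool:
        """Record one response; return True if it earns a reinforcer."""
        self.count += 1
        if self.count >= self.requirement:
            self.count = 0
            self.requirement = self._draw()  # next requirement is unpredictable
            return True
        return False


vr10 = VariableRatio(10, random.Random(1))
runs, since_last = [], 0
for _ in range(100_000):
    since_last += 1
    if vr10.respond():
        runs.append(since_last)  # responses it took to earn this reinforcer
        since_last = 0

print(runs[:5])               # varies from reinforcer to reinforcer
print(sum(runs) / len(runs))  # close to 10
```

The individual requirements jump around, yet their average settles near the nominal value of 10, which is what the label VR-10 describes.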
The Variable Interval (VI) schedule reinforces the first response made after a period of time has elapsed, but the length of that period varies randomly around an average. For example, on a VI-5 minute schedule reinforcement becomes available on average every five minutes, but the actual intervals might be two minutes, seven minutes, four minutes, and so on. Because the availability of reward is tied to time rather than to the number of responses, responding faster earns little extra, and the VI schedule produces a steady, moderate rate of responding. Unlike the FI schedule, it shows little or no post-reinforcement pausing or scalloping, because the organism cannot predict when the next opportunity for reinforcement will arise.
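A VI schedule can be sketched the same way. The hypothetical `VariableInterval` class below assumes intervals drawn uniformly between 0 and twice the mean, so a mean of 300 seconds corresponds to VI-5 minute; the helper `reinforcers_earned` (also hypothetical) compares a responder who responds once per second with one who responds every ten seconds over the same session.

```python
import random


class VariableInterval:
    """Variable Interval (VI-t): reinforcement becomes available after an
    interval that varies around a mean of t seconds; the first response after
    that point is reinforced."""

    def __init__(self, mean_interval: float, rng: random.Random, start_time: float = 0.0):
        if mean_interval <= 0:
            raise ValueError("mean_interval must be positive")
        self.mean_interval = mean_interval
        self.rng = rng
        self.available_at = start_time + self._draw()

    def _draw(self) -> float:
        # Assumption: intervals are uniform on [0, 2t], whose mean is t.
        return self.rng.uniform(0, 2 * self.mean_interval)

    def respond(self, t: float) -> bool:
        """Record a response at time t (seconds); return True if reinforced."""
        if t >= self.available_at:
            self.available_at = t + self._draw()  # next availability is unpredictable
            return True
        return False


def reinforcers_earned(seconds_between_responses: int, session_length: int, seed: int) -> int:
    """Count reinforcers for a responder who responds at a constant pace."""
    vi = VariableInterval(300, random.Random(seed))  # VI-5 minute
    return sum(vi.respond(t) for t in range(0, session_length, seconds_between_responses))


fast = reinforcers_earned(1, 300_000, seed=7)   # one response per second
slow = reinforcers_earned(10, 300_000, seed=7)  # one response every 10 s
print(fast, slow)  # similar counts despite a tenfold difference in effort
```

Because reward availability is governed by the clock, tenfold more responding yields only slightly more reinforcers here, which illustrates why VI schedules tend to sustain moderate rather than maximal response rates.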
Examples of VI schedules are common in everyday life. Checking email or social media is a prime example: reinforcement (a new message or notification) arrives unpredictably over time, so the user checks frequently and consistently, albeit at a moderate rate, never knowing exactly when the next reinforcing item will appear. Fishing is often described the same way; the angler keeps casting steadily because fish bite at unpredictable intervals. The stable response rates produced by VI schedules make them useful for maintaining behaviors that do not require intense bursts of effort but must be performed reliably over time.
Historical Research and Key Findings
The systematic study of intermittent reinforcement originates largely with the work of B.F. Skinner in the 1930s. In his seminal 1938 work, “The Behavior of Organisms: An Experimental Analysis,” Skinner established the foundational distinction between continuous and intermittent schedules. His experiments with rats pressing a lever (and, in later work, pigeons pecking a key) showed that while continuous reinforcement was most effective for the rapid acquisition of a behavior, behaviors maintained by intermittent reinforcement were markedly more durable and persistent, especially after the reward was withdrawn. This early research set the stage for understanding how different patterns of reward shape an organism’s motivation and response patterns.
The most comprehensive analysis of schedules of reinforcement was provided by Ferster and Skinner in their 1957 book, “Schedules of Reinforcement.” This monumental work systematically documented the distinctive and predictable behavioral outcomes associated with each of the four primary schedules (FR, FI, VR, VI). Using cumulative recorders, which plot the running total of responses against time, they mapped the characteristic response patterns, from the high, steady rates of VR to the scallop pattern of FI, supporting the conclusion that the structure of the schedule, rather than the particular reinforcer used, was the primary determinant of the organism’s behavioral output. Their findings provided the empirical framework needed for applying these principles to complex behaviors in humans and animals.
Later research continued to explore these durability effects. A study by Catania and Reynolds (1968) compared continuous and intermittent reinforcement schedules and reported that behaviors maintained on intermittent schedules were highly resistant to extinction. These and other findings strengthened the view that conditioning an organism to tolerate non-reinforcement is key to behavioral persistence. Studies of social influences on learning in particular species, such as that by Zentall and Galef (1988), likewise supported the cross-species generality of reinforcement principles, strengthening the case that intermittent reinforcement operates across a wide range of species and environmental contexts.
Applications and Implications in Psychology and Society
The principles of intermittent reinforcement have profound implications across many fields, showing how the timing of rewards can shape complex behaviors. Perhaps the most prominent and problematic example is the maintenance of addictive behaviors. The VR schedule, in particular, is widely regarded as a core mechanism behind the persistence of gambling addiction. Slot machines pay out unpredictably, on an approximately variable-ratio basis, creating a potent drive to continue playing despite losses, because the next play might be the one that pays off. Drug-seeking behavior is often described in similar terms: the euphoric effect is not guaranteed every time the drug is used, and this unpredictability may strengthen seeking behavior and contribute to relapse even after prolonged abstinence.
In educational and therapeutic settings, intermittent reinforcement is used constructively to maintain positive changes. Once a new behavior is established with continuous reinforcement, shifting to an intermittent schedule is crucial for reducing dependence on the reward and ensuring that the behavior lasts in the natural environment. In animal training, for instance, continuous reinforcement (e.g., giving a treat every time a dog sits) is used initially, but trainers soon shift to rewarding the dog unpredictably, after a varying number of successful sits (sometimes the second, sometimes the fifth), so that the dog responds reliably even when no treat follows immediately. This shift produces more reliable and durable obedience.
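The shift from continuous to intermittent reinforcement described here, often called schedule thinning, can be sketched as a sequence of phases. Everything in this example is an illustrative assumption: the phase ratios, the fixed number of responses (here, successful sits) per phase, and the random-ratio rule, under which each response is reinforced with probability 1/n. A practical program would more likely advance to a leaner phase only once the behavior remains reliable.

```python
import random
from collections.abc import Iterator


def thinned_schedule(phase_ratios: list[int], responses_per_phase: int,
                     rng: random.Random) -> Iterator[tuple[int, bool]]:
    """Yield (mean_ratio, reinforced) for each successive response.

    Each phase uses a random-ratio rule: every response is reinforced with
    probability 1 / mean_ratio, so a mean_ratio of 1 is continuous reinforcement.
    """
    for mean_ratio in phase_ratios:
        for _ in range(responses_per_phase):
            yield mean_ratio, rng.random() < 1 / mean_ratio


# Hypothetical training plan: CRF first, then progressively leaner schedules.
plan = [1, 2, 3, 5]
rewards_by_phase = {ratio: 0 for ratio in plan}
for ratio, reinforced in thinned_schedule(plan, 2000, random.Random(3)):
    rewards_by_phase[ratio] += reinforced

print(rewards_by_phase)  # phase 1 is always 2000; later phases earn fewer on average
```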
Intermittent reinforcement also plays a crucial, though often unintentional, role in shaping human relationships and family dynamics. Parents or caregivers may accidentally reinforce undesirable behaviors in children, such as whining or temper tantrums, on an intermittent basis. If a parent ignores a child’s whining most of the time but unpredictably gives in, say about once in every ten episodes, just for peace, the child learns that persisting through non-reinforcement eventually pays off. This accidental VR schedule powerfully entrenches the undesirable behavior and makes it highly resistant to extinction, because the child has been conditioned to tolerate long stretches of non-reinforcement before success. Understanding this mechanism is vital for effective parenting and behavior management.
In professional contexts, variable schedules can be effective motivators. Sales commissions approximate a VR schedule, because each sale, and the commission it brings, follows an unpredictable number of calls or pitches; this arrangement often drives very high effort and persistence among sales staff. Similarly, performance appraisals and bonuses delivered unpredictably, rather than on a fixed, predictable calendar, resemble a VI schedule and can maintain a steady, moderate level of effort, because employees must stay vigilant and perform well at all times without knowing when the next opportunity for reward will appear.
Conclusion
Intermittent reinforcement is a fundamental and powerful behavioral phenomenon that governs the durability and persistence of learned behaviors. By systematically varying reward delivery according to fixed and variable ratio and interval schedules, behaviorists can predict and shape an organism’s response patterns and resilience. The Partial Reinforcement Extinction Effect shows that enduring behaviors are built not on constant reward but on a learned expectation of reward despite frequent failure. This understanding is indispensable not only for laboratory research but also for tackling challenging social issues such as addiction, and for designing effective strategies in education, training, and clinical psychology aimed at maintaining long-term behavioral change.
References
- Catania, A. C., & Reynolds, G. S. (1968). A comparison of continuous and intermittent reinforcement. Journal of the Experimental Analysis of Behavior, 11(2), 203-210.
- Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. New York, NY: Appleton-Century-Crofts.
- Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. New York, NY: Appleton-Century.
- Zentall, T. R., & Galef, B. G. (1988). Social influences on lever-pressing in pigeons. Animal Learning & Behavior, 16(3), 268-275.