SUMMATED RATINGS METHOD
- Introduction to the Summated Ratings Method
- Historical Context and the Likert Scale
- The Process of Item Generation and Selection
- Scaling and Response Formats
- Calculation and Interpretation of Scores
- Psychometric Properties: Reliability and Validity
- Advantages of Summated Rating Scales
- Limitations and Methodological Criticisms
- Comparison with Other Attitude Scaling Techniques
Introduction to the Summated Ratings Method
The Summated Ratings Method, often synonymous with the widely utilized Likert scaling technique, represents a cornerstone in the field of psychological and social measurement, serving primarily as a robust procedure for quantifying complex human attitudes and beliefs. This methodology is fundamentally designed to construct a sophisticated attitude measuring scale where respondents indicate their level of agreement or disagreement across a series of carefully developed declarative statements related to a specific attitude object. Unlike simpler dichotomous response formats, the summated rating scale captures the intensity and direction of an individual’s feeling, providing a more granular and nuanced depiction of their underlying disposition towards the subject under investigation, whether it involves political ideology, consumer preferences, or organizational commitment. The power of this method lies in its assumption that the underlying attitude being measured is a single, latent trait, and that summing the scores across multiple items provides a reliable and precise estimate of that trait, mitigating the measurement error inherent in relying on single indicators.
Central to the implementation of the Summated Ratings Method is the principle of accumulation. An individual’s total attitude score is derived by aggregating the numerical values assigned to their responses across all the relevant items in the scale, provided the items have been statistically verified to measure the same underlying construct. This composite score, which can range from the number of items multiplied by the lowest response value up to the number of items multiplied by the highest, serves as the operational definition of the respondent’s attitude strength and valence. The methodology requires meticulous attention during the item generation phase to ensure that the statements are unambiguous, clearly focused on the attitude object, and represent the full spectrum of possible opinions regarding that object. Furthermore, the inclusion of both positively and negatively worded statements is standard practice within this framework; it helps mitigate response set biases such as acquiescence bias, the tendency of some respondents to agree with all items regardless of content.
The resulting attitude scale constructed through this process is invaluable for researchers aiming to categorize populations, track changes in attitudes over time, or correlate specific attitudes with behavioral outcomes. Because the method yields scores that are usually treated as interval data—meaning the difference between two scores is meaningful and consistent across the scale—it allows for the application of powerful parametric statistical techniques, including correlation, regression analysis, and analysis of variance (ANOVA). This statistical flexibility, combined with the relative ease of scale construction compared to alternative scaling methods, solidifies the Summated Ratings Method as the most frequently employed approach for developing psychological instruments across diverse domains, from clinical assessment to educational research and market analysis.
Historical Context and the Likert Scale
The conceptual foundation of the Summated Ratings Method owes its enduring legacy primarily to Rensis Likert, who published his seminal work on the technique in 1932. Prior to Likert’s development, attitude measurement was often reliant upon more cumbersome and resource-intensive methods, such as the Thurstone Equal-Appearing Intervals technique, which required extensive pre-testing by panels of expert judges to assign definitive scale values to individual statements. Likert sought a more practical, yet still psychometrically sound, approach. His innovation was the realization that attitudes could be effectively measured by simply having respondents react to a series of statements using a standardized response format, such as the now-ubiquitous five-point scale ranging from “Strongly Disagree” to “Strongly Agree,” and then summing these responses. This simplification dramatically reduced the time and labor required for scale construction, democratizing attitude research and making large-scale surveys far more feasible for researchers across disciplines.
The rapid adoption of the Likert scale was driven largely by its compelling balance of simplicity and statistical rigor. Unlike earlier methods that attempted to fix the absolute position of each item on a measurement continuum, the Summated Ratings Method focuses on the cumulative effect of a set of items, assuming that the random error in responses to any individual item tends to cancel out when responses are aggregated. Likert argued that if a set of statements all tap into the same underlying attitude, the total score derived from summing the responses provides a highly reliable index of that attitude. Decades of empirical research have shown that well-constructed summated scales routinely exhibit strong internal consistency, meaning that the items correlate well with one another. High internal consistency is compatible with the items measuring a single construct, although on its own it does not prove unidimensionality. This historical shift marked a pivotal moment where efficiency was effectively integrated with psychometric quality in survey design.
While Summated Ratings Method is the formal methodological name, the term Likert Scale has become the common vernacular reference, and it is often used to describe only the response format itself (e.g., the five-point agreement continuum). It is important to note, however, that the broader methodology encompasses the entire process: the generation of a pool of potential items, the empirical selection of the most discriminating items based on item analysis (usually involving item-total correlations), the standardization of the response format, and the final scoring protocol. The historical contribution of Likert was not merely the response format, but the development of a coherent, efficient, and statistically defensible methodology for attitude scale construction that fundamentally changed the landscape of social psychology and survey research, allowing researchers to explore nuanced psychological constructs previously difficult to operationalize.
The Process of Item Generation and Selection
The successful application of the Summated Ratings Method hinges critically upon the thoroughness and precision of the item generation phase. This process begins with the establishment of a clear conceptual definition of the attitude object or construct being measured. A large initial pool of declarative statements, often two to three times the number intended for the final scale, must be created. These statements must be relevant to the attitude object, express a clear and unambiguous opinion, and collectively cover the entire domain of the construct. Expert judgment is frequently utilized at this stage, where domain specialists review the items for clarity, relevance, and representativeness, helping to maximize the content validity of the eventual scale. Statements must avoid complex jargon, double negatives, and double-barreled wording (asking about two different things simultaneously).
Following the initial generation and refinement of the item pool, the critical step of item analysis is performed using data collected from a pilot study administered to a representative sample. The primary goal of item analysis is to discard weak or non-discriminating items and retain only those that strongly contribute to the measurement of the intended latent construct. The key metric employed here is the corrected item-total correlation: the correlation between the score on a single item and the total score summed over all the other items in the scale (leaving the item itself out of the total prevents it from inflating its own correlation). Items with low correlations (typically below 0.30) are usually removed, because they appear not to measure the same underlying construct as the rest of the scale, and retaining them would lower the scale’s internal consistency and reliability. Items that correlate very highly with the total may be redundant, but low correlations are the primary indicator of poor item quality.
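A minimal Python sketch of the corrected item-total correlation, assuming the pilot responses are held in a NumPy array (rows are respondents, columns are items) and that negatively worded items have already been reverse-scored. The function name `corrected_item_total` and the pilot data are hypothetical, and the 0.30 cut-off is the rule of thumb mentioned above rather than a fixed standard.

```python
import numpy as np

def corrected_item_total(responses):
    """Correlate each item with the sum of all *other* items.

    responses: rows = respondents, columns = items, already reverse-scored.
    """
    X = np.asarray(responses, dtype=float)
    total = X.sum(axis=1)
    r = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        rest = total - X[:, j]                 # total score with item j removed
        r[j] = np.corrcoef(X[:, j], rest)[0, 1]
    return r

# Hypothetical pilot data: 6 respondents x 4 items on a 1-5 scale.
pilot = [
    [5, 4, 5, 2],
    [4, 4, 4, 3],
    [2, 1, 2, 4],
    [3, 3, 3, 1],
    [1, 2, 1, 3],
    [4, 5, 4, 2],
]
r = corrected_item_total(pilot)
weak_items = [j for j, rj in enumerate(r) if rj < 0.30]   # -> [3]
```

In this toy data the fourth item (index 3) moves against the other three and is flagged for removal, while the first three items correlate strongly with the rest of the scale.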
Furthermore, item selection also involves examining the discriminatory power of each statement, often assessed through a comparison of the mean scores on that item between high scorers (the top 25% of the total score distribution) and low scorers (the bottom 25%). A strong item should successfully differentiate between individuals who possess a high level of the attitude and those who possess a low level. If an item fails to show a significant difference between these extreme groups, it suggests the item is not sensitive enough to the underlying attitude variation and should be considered for exclusion. By rigorously applying these statistical criteria—item-total correlations, internal consistency checks, and discriminatory indices—the researcher ensures that the final set of items forms a cohesive, internally consistent, and psychometrically sound measurement instrument, ready for widespread application in the definitive study.
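The extreme-groups comparison can be sketched in the same style. This is an illustration under stated assumptions: it takes the top and bottom 25% of respondents by total score, reports each item's mean difference between the two groups, and computes Welch's t statistic (a t-test that does not assume equal variances) by hand with NumPy. The function name and data are hypothetical; in practice a library routine such as SciPy's `ttest_ind` with `equal_var=False` would also supply p-values.

```python
import numpy as np

def extreme_group_discrimination(responses, fraction=0.25):
    """Compare item means between the highest and lowest scoring respondents.

    Returns (mean difference, Welch t) per item; t is NaN when both
    groups have zero variance on an item.
    """
    X = np.asarray(responses, dtype=float)
    g = max(2, int(round(fraction * X.shape[0])))     # size of each extreme group
    order = np.argsort(X.sum(axis=1), kind="stable")  # ascending total score
    low, high = X[order[:g]], X[order[-g:]]
    diff = high.mean(axis=0) - low.mean(axis=0)
    se = np.sqrt(high.var(axis=0, ddof=1) / g + low.var(axis=0, ddof=1) / g)
    t = np.divide(diff, se, out=np.full_like(diff, np.nan), where=se > 0)
    return diff, t

# Hypothetical data: 12 respondents x 3 items on a 1-5 scale.
data = [
    [5, 5, 2], [4, 5, 4], [5, 4, 3],                      # high scorers
    [3, 3, 3], [3, 4, 2], [4, 3, 2], [2, 3, 4], [3, 2, 3], [3, 3, 2],
    [1, 2, 3], [2, 1, 4], [1, 1, 2],                      # low scorers
]
diff, t = extreme_group_discrimination(data)
```

The first two items separate the extreme groups sharply, while the third item has the same mean (3.0) in both groups, so it fails to discriminate and would be a candidate for exclusion.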
Scaling and Response Formats
The standardized response format is perhaps the most recognizable feature of the Summated Ratings Method. This format, often called a Likert-type scale, dictates how respondents express their attitude intensity. The most common format is the five-point scale, structured symmetrically around a neutral midpoint: Strongly Disagree, Disagree, Neither Agree nor Disagree, Agree, and Strongly Agree. The labels are intended to present the options as roughly evenly spaced, so that the numerical assignment (e.g., 1 to 5) can be treated as a gradient of attitude strength. Variations exist, including three-point, seven-point, and even nine-point scales, each chosen based on the desired sensitivity and the nature of the population being studied. For instance, a seven-point scale may offer finer discrimination, while a three-point scale might be used for populations with lower literacy or when the cognitive load on respondents must be kept low.
A critical decision in implementing the Summated Ratings Method involves the inclusion or exclusion of a neutral midpoint. When an odd number of response options is used (e.g., five or seven), a neutral option is provided, allowing respondents who are genuinely undecided, indifferent, or perceive the statement as equally positive and negative to select a middle ground. Conversely, some researchers opt for an even number of options (e.g., a four-point or six-point scale), thereby forcing a choice, compelling respondents to lean either toward agreement or disagreement. This technique, known as a forced-choice scale, is often utilized when researchers suspect that a large proportion of the sample might default to the neutral option without possessing a genuinely neutral attitude, potentially obscuring meaningful variance in the data. The choice between an odd or even number of points depends heavily on the theoretical construct and the perceived necessity of capturing true neutrality versus forcing differentiation.
Regardless of the number of points employed, it is standard practice that all response categories must be clearly labeled and mutually exclusive, ensuring that respondents understand precisely what each option signifies. The numerical assignment must be consistent across all items, though the direction of scoring will depend on whether the item is positively or negatively worded. For positively worded items (where agreement indicates a positive attitude), the highest numerical value (e.g., 5) is assigned to Strongly Agree. For negatively worded items (where agreement indicates a negative attitude), the scoring must be reversed (e.g., 5 assigned to Strongly Disagree) prior to summation; on a 1-to-5 scale, the reversed score is simply 6 minus the original score. This reverse scoring ensures that all items contribute to the final attitude index in the same direction, consistent with the assumption that the scale is unidimensional.
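Reverse scoring reduces to one line of arithmetic: on a scale running from `low` to `high`, a reversed score is `low + high - x`, which maps 5 to 1 and 4 to 2 on a five-point scale. The helper names below are hypothetical, and the sketch assumes responses are stored as integers in item order, with negatively worded items identified by their zero-based positions.

```python
def reverse_score(x, low=1, high=5):
    """Mirror a response around the scale midpoint (e.g. 5 -> 1 on a 1-5 scale)."""
    if not low <= x <= high:
        raise ValueError(f"response {x} is outside {low}..{high}")
    return low + high - x

def apply_keying(responses, negative_items, low=1, high=5):
    """Reverse-score the negatively worded items; leave the others unchanged."""
    return [reverse_score(x, low, high) if i in negative_items else x
            for i, x in enumerate(responses)]

keyed = apply_keying([5, 1, 4, 2], negative_items={1, 3})   # -> [5, 5, 4, 4]
```

The same functions work for seven-point or other scales by passing different `low` and `high` values; for example, `reverse_score(2, 1, 7)` returns 6.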
Calculation and Interpretation of Scores
The ultimate goal of the Summated Ratings Method is to derive a single, composite score that represents the respondent’s overall attitude toward the measured object. The calculation process is straightforward but requires careful adherence to the scoring protocol, especially concerning reverse-scored items. Once the numerical weights have been assigned to every response (e.g., 1 to 5), the scores across all selected items are simply added together. For example, if a researcher uses a 10-item scale with a five-point response format (1 to 5), the minimum possible total score would be 10 (10 items × 1 point/item), and the maximum possible score would be 50 (10 items × 5 points/item). This yields 41 possible totals, a fine-grained measure of the attitude in which higher scores consistently indicate a stronger or more favorable attitude, assuming proper reverse scoring was executed for all applicable statements.
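The 10-item example can be reproduced in a few lines. This sketch assumes a hypothetical 10-item scale scored 1 to 5 in which items 2, 4, and 7 (zero-based positions 1, 3, and 6) are negatively worded; the item positions, function names, and response vector are invented for illustration.

```python
def summated_score(responses, negative_items, low=1, high=5):
    """Reverse-score negatively worded items, then sum all item scores."""
    return sum(low + high - x if i in negative_items else x
               for i, x in enumerate(responses))

def score_range(n_items, low=1, high=5):
    """Smallest and largest possible totals for n_items items."""
    return n_items * low, n_items * high

answers = [4, 2, 5, 1, 4, 4, 2, 5, 3, 4]                  # one hypothetical respondent
total = summated_score(answers, negative_items={1, 3, 6})  # -> 42
bounds = score_range(10)                                   # -> (10, 50)
```

After reversal the item scores become 4, 4, 5, 5, 4, 4, 4, 5, 3, 4, which sum to 42, well inside the possible range of 10 to 50.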
The interpretation of the final score is context-dependent and typically relative, not absolute. Because the scores are often treated as interval data, researchers can calculate the mean score for a group, compare means between different groups (e.g., using t-tests or ANOVA), or correlate the attitude score with other variables (e.g., demographic data or behavioral outcomes). For instance, a researcher might find that the average attitude score of one population segment is 42, while another segment scores 35. If a significance test shows that this difference is unlikely to be due to sampling error alone, it indicates a genuine divergence in the two segments’ attitudes. However, interpreting what constitutes a “high” or “low” score often relies on establishing norms or referencing the theoretical midpoint. In the 10-item, 1-to-5 scale example, the theoretical neutral midpoint would be 30 (10 items × the neutral value of 3); scores significantly above 30 suggest a generally positive attitude, while scores significantly below 30 suggest a generally negative attitude.
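Both interpretive steps can be sketched with Python’s standard library: the theoretical midpoint is the number of items times the middle response value, and two groups can be compared with Welch’s t statistic. The group scores below are hypothetical values chosen to give means of 42 and 35, and the function names are illustrative; obtaining a p-value would additionally require the t distribution, for example via SciPy’s `ttest_ind` with `equal_var=False`.

```python
from math import sqrt
from statistics import mean, variance

def theoretical_midpoint(n_items, low=1, high=5):
    """Total score of a respondent who chooses the middle option on every item."""
    return n_items * (low + high) / 2

def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))   # sample variances
    return (mean(a) - mean(b)) / se

segment_a = [44, 40, 42, 45, 39]     # hypothetical totals, mean 42
segment_b = [33, 36, 35, 38, 33]     # hypothetical totals, mean 35
midpoint = theoretical_midpoint(10)  # -> 30.0
t = welch_t(segment_a, segment_b)
```

With these made-up scores the t statistic is about 4.7, a large value relative to the spread within each group, and both segment means sit above the neutral midpoint of 30.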
It is crucial to acknowledge the statistical implications of summing discrete ordinal responses. While the individual items themselves yield ordinal data (ordered categories where differences between ranks are not necessarily equal), the convention in the Summated Ratings Method is that a composite formed by summing a sufficient number of items can be treated as approximately interval-level data. This convention rests mainly on the assumption that the underlying psychological trait is continuous. A related argument appeals to the central limit theorem: a sum of many items takes on many possible values and tends toward an approximately normal distribution, although this concerns the shape of the score distribution rather than whether the intervals between categories are truly equal. This assumption allows researchers to employ powerful parametric statistics, providing a robust analytical framework. However, researchers must always be mindful of this underlying assumption and ensure that their scale demonstrates sufficient reliability and validity, as failure to do so undermines the statistical inferences drawn from the composite scores.
Psychometric Properties: Reliability and Validity
The utility and trustworthiness of any scale constructed via the Summated Ratings Method are entirely dependent upon its psychometric properties, specifically its reliability and validity. Reliability refers to the consistency of the measurement, that is, the extent to which the scale yields the same results under consistent conditions. The most common measure of reliability for a summated rating scale is internal consistency, typically assessed using Cronbach’s Alpha ($\alpha$). A high Cronbach’s Alpha value (generally 0.70 or higher in early research, and preferably 0.80 or higher in established scales) indicates that the items within the scale are substantially intercorrelated, which is consistent with their measuring a common underlying construct. Alpha alone does not establish unidimensionality, however, because it also rises as items are added. It is therefore best read as a check on the item selection process, indicating whether the sum of the items provides a stable measure of the latent attitude.
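Cronbach’s Alpha is computed as $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_X^2}\right)$, where $k$ is the number of items, $s_i^2$ is the variance of item $i$, and $s_X^2$ is the variance of the summed total score. Below is a minimal NumPy sketch, assuming complete data (no missing responses) with negatively worded items already reverse-scored; the function name and the tiny data set are illustrative.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items array (no missing values)."""
    X = np.asarray(responses, dtype=float)
    k = X.shape[1]
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = X.var(axis=0, ddof=1)        # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)    # variance of the summed score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

alpha = cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]])   # approximately 0.75
```

Because alpha increases with $k$ for a fixed average inter-item correlation, a long scale can reach a high value even when its items tap more than one dimension, which is why alpha is usually paired with factor-analytic checks of dimensionality.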
Validity, conversely, addresses whether the scale actually measures what it purports to measure. Several types of validity are essential when utilizing the Summated Ratings Method. Content validity ensures that the items adequately represent the full domain of the attitude construct, often established through expert panel review during the initial item generation phase. Criterion validity assesses the scale’s ability to predict a relevant external criterion, either concurrently (e.g., correlating the attitude score with simultaneous behavior) or predictively (e.g., using the attitude score to forecast future behavior). For instance, scores on a job satisfaction scale would be expected to correlate negatively with employee turnover, since more satisfied employees should be less likely to leave.
Perhaps the most complex form of validation for summated rating scales is construct validity, which confirms that the scale measures the theoretical construct it was designed to assess. This is typically established through two sub-forms: convergent validity, where the scale scores show strong correlations with scores from other measures theoretically related to the construct; and discriminant validity, where the scale scores show low or negligible correlations with measures of constructs that are theoretically unrelated. Often, sophisticated techniques like factor analysis are employed to empirically verify the hypothesized structure of the scale. Factor analysis determines whether the item responses cluster together in a way that aligns with the researcher’s theoretical understanding, confirming that the items indeed load onto a single, dominant factor (unidimensionality) or multiple distinct factors (multidimensionality) as intended by the scale design.
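Full factor analysis is best left to dedicated software, but a quick screen of dimensionality can be sketched by inspecting the eigenvalues of the inter-item correlation matrix. This is a principal-components heuristic swapped in for illustration, not factor analysis proper: if the first eigenvalue dwarfs the second, a single dominant dimension is plausible, while two comparable large eigenvalues suggest two factors. The function names and simulation settings (500 respondents, noise level, rounding to a 1-5 scale) are arbitrary assumptions.

```python
import numpy as np

def eigen_screen(responses):
    """Eigenvalues of the inter-item correlation matrix (largest first)
    and the ratio of the first to the second."""
    R = np.corrcoef(np.asarray(responses, dtype=float), rowvar=False)
    eig = np.linalg.eigvalsh(R)[::-1]     # eigvalsh returns ascending order
    return eig, eig[0] / eig[1]

rng = np.random.default_rng(0)

def simulate(factors, items_per_factor, n=500):
    """Hypothetical 1-5 responses driven by independent latent traits."""
    cols = []
    for _ in range(factors):
        trait = rng.normal(size=n)
        for _ in range(items_per_factor):
            raw = 3 + trait + 0.5 * rng.normal(size=n)
            cols.append(np.clip(np.round(raw), 1, 5))
    return np.column_stack(cols)

_, ratio_one = eigen_screen(simulate(factors=1, items_per_factor=6))
_, ratio_two = eigen_screen(simulate(factors=2, items_per_factor=3))
```

With one underlying trait the first eigenvalue is many times larger than the second, whereas two independent traits produce two eigenvalues of similar size, the pattern a factor analysis would then examine more rigorously.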
Advantages of Summated Rating Scales
The widespread appeal of the Summated Ratings Method stems from several compelling advantages over alternative scaling techniques. Foremost among these is the relative simplicity and efficiency of construction. Unlike the Thurstone method, which demands hundreds of hours of judging time by numerous experts to establish interval values for items, the Summated Ratings Method requires only that the researcher generate a pool of relevant statements and then use basic statistical item analysis (item-total correlations) to refine the set. This efficiency makes it the preferred choice for rapid scale development and for researchers operating under time or resource constraints. This ease of creation does not typically compromise psychometric quality; well-constructed Likert scales routinely achieve high reliability indices.
Another significant advantage is the superior reliability typically achieved by summated scales. Because the method relies on aggregating responses across multiple items, the random measurement error associated with any single item is largely averaged out in the total score. The inclusion of numerous items targeting the same construct results in high internal consistency, often yielding reliability coefficients (such as Cronbach’s Alpha) that exceed the reliability of single-item measures or simpler scaling techniques. Furthermore, the standardized response format is easy for respondents to understand and complete, minimizing ambiguity and administrative errors, thus contributing to cleaner and more reliable data collection, particularly in large-scale survey settings.
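The gain from aggregation can be quantified with the Spearman-Brown prophecy formula, $\rho_n = \frac{n\rho}{1 + (n-1)\rho}$, which predicts the reliability of a measure lengthened by a factor of $n$ from the reliability $\rho$ of the original. The formula assumes the added items are parallel to the existing ones (equally reliable and measuring the same construct), an idealization real items only approximate; the single-item reliability of 0.40 below is an illustrative assumption, and the function name is hypothetical.

```python
def spearman_brown(rho, n):
    """Predicted reliability when a measure with reliability rho is lengthened n-fold."""
    if not 0 <= rho <= 1 or n <= 0:
        raise ValueError("need 0 <= rho <= 1 and n > 0")
    return n * rho / (1 + (n - 1) * rho)

single_item = 0.40   # assumed reliability of one item
for k in (1, 5, 10, 20):
    print(k, round(spearman_brown(single_item, k), 3))
```

Under these assumptions the predicted reliability climbs from 0.40 for one item to roughly 0.77, 0.87, and 0.93 for 5, 10, and 20 items, with diminishing returns as the scale grows.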
Finally, the Summated Ratings Method offers high discriminatory power. By using a graded response scale (e.g., five or seven points) instead of simple agreement/disagreement, the scale is capable of capturing fine distinctions in attitude intensity among respondents. This allows researchers to detect subtle shifts in attitudes over time or small but significant differences between groups, which might be missed by less sensitive measures. The ability to treat the resulting total scores as interval data further enhances the analytical advantages, permitting the application of the full range of sophisticated inferential statistics, thereby maximizing the statistical power and interpretability of the research findings.
Limitations and Methodological Criticisms
Despite its popularity, the Summated Ratings Method is subject to several important methodological limitations and criticisms. A primary concern revolves around the susceptibility of these scales to various response biases. One pervasive bias is acquiescence response set, the tendency for some respondents to agree with statements regardless of content, which can artificially inflate the internal consistency of the scale and skew the mean scores. This issue is typically mitigated by the inclusion of both positively and negatively worded items, requiring careful reverse scoring, but the bias itself remains a threat. Another major bias is social desirability bias, where respondents answer in a way they believe is socially acceptable rather than truthfully reflecting their actual attitude, particularly when dealing with sensitive topics.
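A small hypothetical example shows what balanced keying does, and does not do, about acquiescence. A respondent who answers Agree (4) to every statement on a 10-item, 1-to-5 scale scores 40 if all items are positively worded, but exactly the neutral midpoint of 30 if half the items are negatively worded and reverse-scored. The bias no longer inflates the score, yet the resulting total still says nothing about the respondent’s real attitude. The item positions and function name are illustrative.

```python
def summated_score(responses, negative_items, low=1, high=5):
    """Sum item scores after reverse-scoring the negatively worded items."""
    return sum(low + high - x if i in negative_items else x
               for i, x in enumerate(responses))

yea_sayer = [4] * 10                                        # "Agree" to every statement
all_positive = summated_score(yea_sayer, set())             # -> 40
balanced = summated_score(yea_sayer, {1, 3, 5, 7, 9})       # -> 30
```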
A fundamental criticism is the assumption of equal intervals, which underpins the treatment of the total score as interval data. Although the method assigns equidistant numerical weights (e.g., 1, 2, 3, 4, 5) to the categories, there is no definitive psychological proof that the subjective distance between “Disagree” and “Neither Agree nor Disagree” is precisely the same as the distance between “Agree” and “Strongly Agree.” If the intervals are truly unequal, then the statistical operations applied (like calculating means and standard deviations) may violate the assumptions of the chosen parametric tests, potentially leading to inaccurate statistical inferences. While this practical approximation is widely accepted in social science research, it remains a serious theoretical limitation that researchers must acknowledge when interpreting their findings.
Furthermore, the inclusion or exclusion of the neutral midpoint presents a dilemma. While the neutral option allows for genuine indifference, it also serves as an easy escape route for respondents who wish to avoid thinking deeply about the item or expressing a true opinion. High rates of neutral responding can reduce the variance in the data, limiting the scale’s ability to correlate significantly with other variables and potentially masking true relationships. The debate over forced-choice versus neutral-option scales highlights the inherent difficulty in designing a standardized format that perfectly captures the complex, continuous nature of human attitudes without introducing artificial response patterns. These methodological concerns necessitate careful design choices and thorough pilot testing to minimize extraneous influences on the final attitude measure.
Comparison with Other Attitude Scaling Techniques
To fully appreciate the Summated Ratings Method, it is useful to contrast it with its primary historical and contemporary competitors, namely the Thurstone and Guttman scaling techniques. The Thurstone Equal-Appearing Intervals Method (developed by L. L. Thurstone in the late 1920s, shortly before Likert’s work) is characterized by its rigorous requirement for absolute item scaling. It involves judges sorting statements into categories based on the degree of favorableness they express, so that each item receives a scale value that is fixed in advance, independently of the responses of the research subjects. While Thurstone scaling aims to give items interval properties (the intervals are “equal-appearing” to the judges), its construction is time-consuming and labor-intensive, making it impractical for most modern research. The Likert method triumphed historically precisely because it demonstrated that highly reliable and valid attitude scores could be generated without this extensive pre-testing phase.
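For contrast, Thurstone scoring can be sketched as follows, based on the procedure as it is commonly described: each statement’s scale value is the median of the judges’ placements on an 11-category favorableness continuum, and a respondent’s score is the median scale value of the statements they endorse. The judge ratings, statement labels, and function name are hypothetical. Note that the score depends on which statements are endorsed, not on how strongly, which is the key difference from summing graded Likert responses.

```python
from statistics import median

# Hypothetical placements by five judges (1 = most unfavorable, 11 = most favorable).
judge_ratings = {
    "S1": [2, 1, 2, 3, 2],
    "S2": [6, 5, 6, 7, 6],
    "S3": [10, 9, 10, 11, 10],
}

# Scale values are fixed by the judges before any respondent answers.
scale_values = {s: median(r) for s, r in judge_ratings.items()}   # S1: 2, S2: 6, S3: 10

def thurstone_score(endorsed):
    """Median scale value of the statements a respondent agrees with."""
    return median(scale_values[s] for s in endorsed)

score = thurstone_score(["S2", "S3"])   # median of 6 and 10 -> 8.0
```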
In contrast, the Guttman Cumulative Scaling Method (also known as scalogram analysis) focuses on establishing a perfectly ordered, cumulative set of items. A Guttman scale assumes that if a respondent agrees with a specific item, they will automatically agree with all less extreme items. For example, if a person agrees to undergo major surgery, they should also agree to take a pill for the same condition. This method aims to establish a perfect hierarchy of attitudes, resulting in an ordinal scale where the total score directly corresponds to the pattern of responses. While Guttman scales offer a stringent test of unidimensionality, finding a set of items that perfectly fits the cumulative model is rare in practice, limiting its applicability largely to highly specific, often behavioral, domains.
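The cumulative logic of a Guttman scale can be checked mechanically. In this sketch, which illustrates the model rather than performing a full scalogram analysis, each respondent’s yes/no answers are listed from the least to the most extreme item (taking a pill before undergoing major surgery). A pattern fits the model only if every accepted item precedes every rejected one, and in that case the total score alone reproduces the whole pattern. The function names are hypothetical.

```python
def is_cumulative(pattern):
    """True if 0/1 answers, ordered from least to most extreme item,
    are a run of 1s followed only by 0s (a perfect Guttman pattern)."""
    return all(earlier >= later for earlier, later in zip(pattern, pattern[1:]))

def pattern_from_score(score, n_items):
    """The perfect Guttman pattern implied by a total score."""
    return [1] * score + [0] * (n_items - score)

# Items ordered from mildest (e.g. take a pill) to most extreme (major surgery).
observed = [1, 1, 0, 0]       # accepts the two mildest items only
inconsistent = [1, 0, 1, 0]   # accepts item 3 after rejecting the milder item 2
print(is_cumulative(observed), is_cumulative(inconsistent))   # True False
print(pattern_from_score(sum(observed), 4) == observed)       # True
```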
The Summated Ratings Method occupies the middle ground, offering a practical compromise. It retains the essential requirement of unidimensionality and statistical rigor found in Guttman scaling, yet it avoids the prohibitive complexity of Thurstone scaling. The ability to efficiently construct a scale that yields composite scores approximating interval data, which can then be subjected to powerful statistical analysis, ensures its continued dominance. While Thurstone focuses on precise item placement and Guttman focuses on perfect response patterns, the Likert-based Summated Ratings Method prioritizes the highly reliable aggregation of multiple indicators to achieve a stable and flexible measure of the underlying latent attitude.