
Somers’ D: Mastering Ordinal Association in Psychology



Somers’ D: Asymmetric Measure of Association

The Core Definition of Somers’ D

Somers’ D is a statistical tool widely used in psychology and the social sciences. It is an asymmetric measure of association between two variables measured on an ordinal scale. Unlike symmetric measures, which treat both variables equally, Somers’ D explicitly distinguishes between an independent variable (the predictor) and a dependent variable (the response), quantifying the extent to which the independent variable predicts the relative ordering of the dependent variable. Its value ranges from -1.0 to +1.0: zero indicates no association, positive values indicate that higher ranks on one variable tend to accompany higher ranks on the other, and negative values indicate an inverse relationship between the rankings. This directional nature makes it useful for researchers testing predictive hypotheses, or causal hypotheses justified on other grounds, when the data are ordered but not continuous.

The core principle behind Somers’ D is its focus on prediction and directionality. It is often given a proportional reduction in error (PRE) style interpretation: among pairs of observations that differ on the independent variable, it is the proportion ordered the same way on the dependent variable minus the proportion ordered the opposite way. Because the coefficient conditions only on differences in X, and not on differences in Y, D(Y|X) generally differs from D(X|Y) whenever the two variables have different patterns of ties. This asymmetry is the defining characteristic that separates Somers’ D from related symmetric coefficients, allowing researchers to reflect the directional relationships hypothesized in many psychological theories, such as the prediction of behavioral outcomes from measured attitudes.

Furthermore, the mechanism of Somers’ D is rooted in the comparison of pairs of observations. It assesses whether pairs of data points are ordered consistently (concordantly) or inconsistently (discordantly) across both variables. By calculating the difference between these concordant and discordant pairs and normalizing this difference by the total number of pairs that are not tied on the independent variable, the measure provides a robust estimate of the directional association. This reliance on rank order rather than numerical distance ensures that it remains highly applicable and appropriate for common psychological metrics like Likert scales, survey responses, and institutional rankings, which inherently possess ordered categories but lack true interval properties.

Historical Development and Robert H. Somers

The measure known as Somers’ D was formally introduced by the American sociologist Robert H. Somers in 1962, published in the seminal paper titled “A New Asymmetric Measure of Association for Ordinal Variables.” At the time of its development, statistical methodology was rapidly evolving, and researchers frequently encountered difficulties applying traditional parametric measures, which assume interval or ratio data and normal distributions, to the categorical yet ordered data common in sociology and psychology. While other nonparametric measures, such as Gamma and Kendall’s Tau, existed, they were primarily symmetric and did not provide the necessary framework for testing directional hypotheses.

Somers’ objective was to create a coefficient that could explicitly handle the distinction between a predictor and a criterion variable while maintaining the robustness required for ordinal data. This need arose specifically from the challenges of analyzing contingency tables where the rows and columns represented ordered categories, and the analyst needed to hypothesize a clear causal pathway. His innovation was to adapt the logic of concordant and discordant pairs—the foundation of other rank-order statistics—and adjust the denominator to account only for pairs that were untied on the independent variable. This specific normalization step is what imbues Somers’ D with its crucial asymmetric property, filling a significant methodological gap in the social sciences during the mid-20th century.

The introduction of Somers’ D marked a significant step toward developing sophisticated nonparametric tools suitable for testing theoretical models in the behavioral sciences. By providing a clear directional coefficient, it enabled researchers to move beyond simply stating that two variables were associated, allowing them instead to quantify the strength and direction of the predicted influence. This methodological advancement was rapidly integrated into psychometrics and survey analysis, contributing to more rigorous testing of theories involving variables like socioeconomic status, educational attainment, attitude strength, and various measures of psychological well-being, all of which are frequently operationalized using ordinal scales.

Practical Application in Psychological Research

To illustrate the utility of Somers’ D, consider a common research scenario in educational psychology focused on the relationship between student motivation and academic achievement. A researcher hypothesizes that higher levels of intrinsic motivation lead to better academic rankings. Motivation is measured using a comprehensive survey resulting in four ordered categories: “Very Low,” “Low,” “Moderate,” and “High” (the Independent Variable, X). Academic achievement is measured by the student’s final class percentile ranking, categorized into four ordered groups: “Bottom 25%,” “Second 25%,” “Third 25%,” and “Top 25%” (the Dependent Variable, Y). Since both variables are strictly ordinal, Somers’ D is the appropriate measure to test the directional hypothesis D(Y|X).

The “How-To” of applying this principle involves comparing every possible pair of students in the dataset. For instance, if Student A has “High” motivation and falls into the “Top 25%” ranking, and Student B has “Low” motivation and falls into the “Bottom 25%” ranking, this pair is considered a concordant pair, supporting the hypothesis that higher motivation leads to higher achievement. Conversely, if Student C has “High” motivation but falls into the “Bottom 25%” ranking, and Student D has “Low” motivation but falls into the “Top 25%” ranking, this pair is discordant, contradicting the hypothesis. Crucially, if two students are tied on the motivation variable (X), the pair is excluded from the denominator of D(Y|X), because a tie on the independent variable provides no predictive information about the dependent variable’s ranking. A pair that differs on motivation but is tied on achievement stays in the denominator and counts as neither concordant nor discordant, which pulls the coefficient toward zero.
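The pair comparisons described above can be sketched in a few lines of Python. The integer codes assigned to the categories and the function name `classify_pair` are assumptions made for illustration; only the category labels and the four example students come from the text.

```python
# Hypothetical integer codes for the ordered categories in the example.
MOTIVATION = {"Very Low": 0, "Low": 1, "Moderate": 2, "High": 3}
ACHIEVEMENT = {"Bottom 25%": 0, "Second 25%": 1, "Third 25%": 2, "Top 25%": 3}


def classify_pair(first, second):
    """Classify a pair of (motivation, achievement) observations for D(Y|X)."""
    dx = MOTIVATION[first[0]] - MOTIVATION[second[0]]
    dy = ACHIEVEMENT[first[1]] - ACHIEVEMENT[second[1]]
    if dx == 0:
        return "tied on X"       # dropped from the D(Y|X) denominator
    if dy == 0:
        return "tied on Y only"  # kept in the denominator, adds 0 to the numerator
    return "concordant" if dx * dy > 0 else "discordant"


student_a = ("High", "Top 25%")
student_b = ("Low", "Bottom 25%")
student_c = ("High", "Bottom 25%")
student_d = ("Low", "Top 25%")

print(classify_pair(student_a, student_b))  # concordant
print(classify_pair(student_c, student_d))  # discordant
print(classify_pair(student_a, student_c))  # tied on X
```

Only the sign of each difference matters here, so any other order-preserving coding of the categories would classify every pair the same way.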

Upon calculating the coefficient, the resulting value provides a clear directional metric. If Somers’ D is +0.65, then among pairs of students who differ in motivation, the proportion of concordant pairs exceeds the proportion of discordant pairs by 0.65. This is a strong positive association: knowing which of two students is more motivated substantially improves the prediction of which one ranks higher academically. The result is consistent with the directional hypothesis that motivation influences achievement ranking, although the coefficient alone cannot establish causation. If, however, the researcher were to calculate D(X|Y), the degree to which achievement ranking predicts motivation level, the coefficient would generally be different, emphasizing the importance of correctly specifying the independent and dependent variables based on the underlying psychological theory being tested. This rigor in directional specification is paramount for building robust psychological models.
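To make the difference between D(Y|X) and D(X|Y) concrete, here is a minimal sketch on a tiny hypothetical dataset (four students, invented codes). The helper `somers_d` is an illustrative brute-force implementation, not a library function.

```python
from itertools import combinations


def somers_d(x, y):
    """Return D(Y|X), treating x as the predictor and y as the response."""
    concordant = discordant = untied_on_x = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue  # tied on the predictor: excluded from the denominator
        untied_on_x += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    if untied_on_x == 0:
        raise ValueError("every pair is tied on the predictor")
    return (concordant - discordant) / untied_on_x


# Hypothetical ordinal codes: motivation (X) and achievement (Y)
motivation = [0, 0, 1, 1]
achievement = [0, 1, 1, 2]

d_y_given_x = somers_d(motivation, achievement)  # 3 / 4
d_x_given_y = somers_d(achievement, motivation)  # 3 / 5
print(d_y_given_x, d_x_given_y)  # 0.75 0.6
```

Both directions share the same numerator (three concordant pairs, no discordant pairs). They differ only because two pairs are tied on motivation while one pair is tied on achievement, so the two denominators are 4 and 5.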

Significance and Utility in Data Analysis

Somers’ D matters in psychological data analysis primarily because it addresses the pervasive issue of ordinal data in psychological measurement. Many common psychological constructs, such as attitudes, personality traits, and clinical severity ratings, are captured using scales (like Likert items) that are ordinal in nature. Applying parametric statistics (like Pearson’s r) to such data treats the category codes as equally spaced numbers, an assumption that may not hold and that can lead to inaccurate conclusions. Somers’ D provides a statistically sound, nonparametric alternative that respects the ordered nature of the data while still expressing a directional relationship, thus lending greater validity to research findings.

Its primary utility today lies in its use alongside regression methods, particularly logistic regression and other generalized linear models. There, Somers’ D is commonly reported as a rank-based index of predictive discrimination between a model’s predicted values and the observed outcomes. For instance, in evaluating risk prediction models in clinical psychology, Somers’ D, or the closely related c-statistic (the area under the ROC curve), measures the model’s ability to correctly order positive and negative outcomes. When the outcome is binary, the two are linked by D = 2c – 1, so c = 0.5 (chance-level ranking) corresponds to D = 0 and c = 1 (perfect ranking) corresponds to D = 1. A higher D value indicates better performance in ranking individuals according to their predicted risk, which is a critical evaluation criterion in psychometrics and diagnostics.
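The link between the c-statistic and Somers’ D for a binary outcome can be checked with a short sketch. The predicted risks and outcomes below are invented for illustration, and the function name `c_statistic_and_d` is hypothetical. Both quantities are computed over pairs consisting of one positive and one negative case, with tied scores counting as half a concordance in c.

```python
from itertools import product


def c_statistic_and_d(scores, outcomes):
    """Return (c, D) over all positive/negative pairs of cases."""
    positives = [s for s, o in zip(scores, outcomes) if o == 1]
    negatives = [s for s, o in zip(scores, outcomes) if o == 0]
    concordant = discordant = tied = 0
    for pos, neg in product(positives, negatives):
        if pos > neg:
            concordant += 1   # the positive case received the higher score
        elif pos < neg:
            discordant += 1
        else:
            tied += 1         # tied scores: half credit in c, zero in D
    n_pairs = len(positives) * len(negatives)
    c = (concordant + 0.5 * tied) / n_pairs
    d = (concordant - discordant) / n_pairs
    return c, d


# Hypothetical predicted risks and observed binary outcomes
scores = [0.1, 0.4, 0.35, 0.8, 0.4]
outcomes = [0, 0, 1, 1, 1]

c, d = c_statistic_and_d(scores, outcomes)
print(c, d)  # 0.75 0.5
```

Here four of the six positive/negative pairs are concordant, one is discordant, and one is tied, so c = 4.5 / 6 and D = 3 / 6, which satisfies D = 2c – 1.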

Furthermore, Somers’ D serves a vital role in validating survey instruments and psychometric scales. By calculating the measure of association between scale items and an external criterion variable, researchers can confirm the predictive validity and the directional alignment of their measurements. This allows for rigorous testing of psychological theories that propose a clear sequence of events or influence, such as the prediction of job satisfaction (ordinal outcome) based on management style ratings (ordinal predictor). The clear, directional coefficient derived from Somers’ D ensures that the statistical evidence directly addresses the theoretical claim of influence, making it a cornerstone for evidence-based practice and theoretical refinement in psychology.

Somers’ D belongs to a family of rank correlation coefficients that includes Gamma (Goodman and Kruskal’s Gamma) and Kendall’s Tau. All these measures rely on the fundamental comparison of concordant (C) and discordant (D) pairs of observations. However, they differ significantly in how they handle tied pairs in the denominator, which determines their symmetry and interpretation. Gamma is the most liberal, ignoring all tied pairs (on X and Y) in its calculation, making it symmetric and often yielding the highest magnitude coefficient among the three. Kendall’s Tau-b, a symmetric measure, incorporates ties on both X and Y into the denominator, providing a more conservative estimate of association.

The defining connection and distinction between Somers’ D and these related measures lies in its treatment of ties. For D(Y|X), pairs tied only on the dependent variable (Y) remain in the denominator, while every pair tied on the independent variable (X) is excluded. This asymmetric approach ensures that the coefficient measures the predictive power of X over Y without being diluted by pairs in which X provides no differential information. Somers’ D is also algebraically related to Kendall’s Tau-b: Tau-b is the geometric mean of the two asymmetric coefficients (carrying their common sign), so that Tau-b squared equals D(Y|X) multiplied by D(X|Y). Because each version of Somers’ D keeps its own denominator, its interpretation remains strictly directional, making it the preferred choice when directionality is hypothesized.
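The different treatment of ties can be seen by computing all of these coefficients from one set of pair counts. The sketch below uses a small invented dataset; the function name `pair_counts` and the variable names are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def pair_counts(x, y):
    """Count concordant, discordant, X-only-tied and Y-only-tied pairs."""
    conc = disc = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2 and y1 == y2:
            continue              # tied on both: ignored by every measure here
        if x1 == x2:
            tied_x_only += 1
        elif y1 == y2:
            tied_y_only += 1
        elif (x1 - x2) * (y1 - y2) > 0:
            conc += 1
        else:
            disc += 1
    return conc, disc, tied_x_only, tied_y_only


# Hypothetical ordinal codes
x = [0, 0, 1, 1, 1]
y = [0, 1, 1, 2, 0]

conc, disc, tx, ty = pair_counts(x, y)
net = conc - disc
gamma = net / (conc + disc)               # ignores all ties
d_yx = net / (conc + disc + ty)           # keeps pairs tied only on Y
d_xy = net / (conc + disc + tx)           # keeps pairs tied only on X
tau_b = net / sqrt((conc + disc + ty) * (conc + disc + tx))

print(gamma, d_yx, d_xy, tau_b)
```

For this dataset the counts are 3 concordant, 1 discordant, 4 tied only on X, and 2 tied only on Y, so Gamma (2/4) is the largest in magnitude, Tau-b lies between the two Somers’ D values (2/6 and 2/8), and Tau-b squared equals their product.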

The broader category of psychology to which Somers’ D most prominently applies is Psychometrics and Nonparametric Statistics. Psychometrics, the science of psychological measurement, relies heavily on ordinal data for constructing and validating scales. Nonparametric statistics provide the robust mathematical framework necessary to analyze this data without restrictive assumptions about population distribution. Somers’ D is a key tool within this framework, bridging theory and measurement by providing a directional test of association that is mathematically sound for the type of data most frequently generated by psychological assessments, contributing significantly to fields ranging from cognitive psychology (measuring reaction time rankings) to clinical psychology (ranking symptom severity).

Calculating Somers’ D: Step-by-Step Methodology

The calculation of Somers’ D requires a systematic comparison of all possible pairs of observations within the dataset, specifically focusing on the number of concordant pairs (C) and discordant pairs (D). A concordant pair is one where the subject ranked higher on the independent variable (X) also ranks higher on the dependent variable (Y). A discordant pair is one where the subject ranked higher on X ranks lower on Y. The difference between C and D forms the numerator, representing the net predictive agreement across all pairs.

The formula for Somers’ D, specifically D(Y|X), where Y is the dependent variable and X is the independent variable, is D(Y|X) = (C – D) / N_0. On the right-hand side, C and D are the counts of concordant and discordant pairs (this D should not be confused with the coefficient itself), and N_0 is the total number of pairs that are not tied on the independent variable (X). N_0 is calculated as the total number of pairs minus the number of pairs tied on X (T_x), where T_x includes pairs tied on both variables. By excluding pairs tied on X, the coefficient reflects only the predictive power derived from differences in the independent variable. This mathematical restriction is the source of the measure’s asymmetry.

  1. Identify the Independent (X) and Dependent (Y) Variables: Determine which variable is hypothesized to be the predictor.
  2. Calculate Concordant Pairs (C): Systematically compare every pair of observations, counting how many pairs show the same ranking direction on both X and Y.
  3. Calculate Discordant Pairs (D): Count how many pairs show opposite ranking directions on X and Y.
  4. Calculate Ties on X (T_x): Count the number of pairs where both members have the same value on the independent variable (X).
  5. Determine the Denominator (N_0): N_0 is calculated as the total number of pairs minus T_x. The total number of pairs is N(N-1)/2, where N is the total sample size.
  6. Calculate Somers’ D: Apply the formula D(Y|X) = (C – D) / N_0. The result is the directional coefficient, indicating the strength and direction of X’s predictive influence on Y.
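
The six steps above can be sketched as a single function that returns every intermediate quantity. The data are eight hypothetical students with invented motivation and achievement codes, and the function name `somers_d_steps` is an assumption for illustration.

```python
from collections import Counter
from itertools import combinations


def somers_d_steps(x, y):
    """Follow steps 1-6 for D(Y|X); x is the predictor, y the response."""
    n = len(x)

    # Steps 2 and 3: count concordant (C) and discordant (D) pairs.
    c = d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            c += 1
        elif direction < 0:
            d += 1

    # Step 4: pairs tied on X; a category with k members yields k(k-1)/2 pairs.
    t_x = sum(k * (k - 1) // 2 for k in Counter(x).values())

    # Step 5: denominator = all pairs minus pairs tied on X.
    n_0 = n * (n - 1) // 2 - t_x

    # Step 6: the coefficient itself.
    return {"C": c, "D": d, "T_x": t_x, "N_0": n_0, "somers_d": (c - d) / n_0}


# Step 1: motivation is the predictor (X), achievement the response (Y).
motivation = [0, 0, 1, 1, 2, 2, 3, 3]
achievement = [0, 1, 0, 2, 1, 3, 2, 3]

result = somers_d_steps(motivation, achievement)
print(result)
```

With these data there are 28 pairs in total, 4 of them tied on X, so N_0 = 24. The counts are C = 17 and D = 3, giving D(Y|X) = 14 / 24, or about 0.58. The four pairs tied only on achievement stay in N_0 but contribute nothing to the numerator.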

Limitations and Considerations for Use

While Somers’ D is a powerful tool for ordinal data, researchers must be aware of its inherent limitations. Firstly, like all rank correlation measures, Somers’ D is sensitive only to rank order, not to the magnitude of the differences between categories. If one participant is rated “Slightly Motivated” and another “Highly Motivated” (a large gap), while a third is rated “Moderately Motivated” (a smaller gap from “Highly Motivated”), the measure records only which member of each pair ranks higher, potentially obscuring meaningful differences in the strength of the underlying construct. This limitation requires careful interpretation: the coefficient describes how consistently the two orderings agree, not how large the differences between categories are.
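A quick way to see this rank-only sensitivity is to stretch the numeric codes of one variable while preserving their order. The sketch below uses invented data and an illustrative helper `somers_d`; the coefficient is unchanged because only the sign of each pairwise difference enters the calculation.

```python
from itertools import combinations


def sign(value):
    return (value > 0) - (value < 0)


def somers_d(x, y):
    """Return D(Y|X) from the signs of pairwise differences."""
    pairs = [(a, b) for a, b in combinations(zip(x, y), 2) if a[0] != b[0]]
    net = sum(sign((a[0] - b[0]) * (a[1] - b[1])) for a, b in pairs)
    return net / len(pairs)


# Hypothetical predictor and response codes
x = [0, 0, 1, 1, 2]
y_even = [1, 2, 2, 3, 3]

# Order-preserving recoding that makes the top category look "far away"
recode = {1: 1, 2: 2, 3: 10}
y_stretched = [recode[v] for v in y_even]

d_even = somers_d(x, y_even)
d_stretched = somers_d(x, y_stretched)
print(d_even == d_stretched)  # True
```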

Secondly, the interpretation of the coefficient must always be framed within the context of the asymmetry. A strong D(Y|X) does not imply a strong D(X|Y). If the researcher incorrectly identifies the dependent variable, or if the relationship is truly symmetric, using Somers’ D might lead to an incomplete or misleading conclusion about the relationship structure. It is crucial that the choice of independent and dependent variables is grounded firmly in established psychological theory or clear temporal ordering, rather than simply maximizing the resulting coefficient value.

Finally, although Somers’ D accommodates ties on the dependent variable, extensive ties can substantially reduce the magnitude of the coefficient, potentially underestimating the underlying association. Pairs tied only on Y enter the denominator of D(Y|X) but add nothing to the numerator, so each such pair pulls the coefficient toward zero. This is mathematically sound, since numerous ties mean the ranking carries less information, but it calls for careful consideration of whether the ordinal scale is fine-grained enough to capture the necessary variance. Researchers should ensure that their measurement instruments provide enough categories to limit this compression, thereby preserving the rank ordering on which Somers’ D depends.
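The effect of a coarse response scale can be illustrated with a minimal sketch. The four observations are invented, and `somers_d` is an illustrative helper: the same perfectly ordered data are scored once on a four-category response and once after collapsing it to two categories.

```python
from itertools import combinations


def somers_d(x, y):
    """Return D(Y|X); pairs tied on x are excluded from the denominator."""
    net = untied_on_x = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        if x1 == x2:
            continue
        untied_on_x += 1
        direction = (x1 - x2) * (y1 - y2)
        net += (direction > 0) - (direction < 0)  # +1, -1, or 0 for a Y tie
    return net / untied_on_x


x = [1, 2, 3, 4]
y_fine = [1, 2, 3, 4]    # four response categories, perfectly ordered with x
y_coarse = [1, 1, 2, 2]  # the same responses collapsed into two categories

d_fine = somers_d(x, y_fine)      # 6 concordant pairs out of 6
d_coarse = somers_d(x, y_coarse)  # 4 concordant pairs, 2 tied on Y, out of 6
print(d_fine, round(d_coarse, 3))  # 1.0 0.667
```

Nothing about the ordering of the students changed between the two runs; the drop from 1.0 to about 0.67 comes entirely from the two pairs that became tied on the response.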