BANDING



Introduction and Definition of Banding

Banding, in the context of psychometrics and organizational psychology, refers to a specific methodology employed in personnel selection and high-stakes testing designed to mitigate the inherent unreliability of test scores. Rather than treating small, numerical differences in scores as significant indicators of differential ability, banding groups scores into predetermined ranges, or score bands. All scores that fall within a single band are subsequently considered statistically and functionally equivalent for the purpose of initial selection decisions. This approach fundamentally shifts the focus from strict rank-ordering, where a score of 88 is definitively superior to a score of 87, to a system where both scores might fall within the same range (e.g., 85–90) and are thus treated identically in terms of measured aptitude. The primary objective of banding is to acknowledge and account for the inevitable measurement error present in standardized tests, thereby promoting a system of selection perceived as fairer and more defensible, particularly when dealing with diverse applicant pools.

Banding is typically applied after initial testing has been completed and raw scores have been standardized. Once the score bands are established, often based on statistical measures such as the Standard Error of Measurement (SEM), the selection process proceeds by focusing on the highest available band first. Only after all candidates within the top band have been considered, and either selected or passed over, do selectors move to the next lower band. This staged approach ensures that candidates in the highest-scoring band are always considered first. The decision to implement banding often stems from an organization's desire to balance the need for high-quality candidates, as indicated by test scores, with broader organizational goals, such as increasing workforce diversity or allowing for the consideration of non-cognitive factors that testing might not capture accurately.

It is crucial to differentiate banding from simple fixed cutoff scores. While a fixed cutoff score establishes a minimum acceptable performance level, often eliminating scores below that threshold entirely, banding is a tool for determining equivalence among acceptable scores. If a fixed cutoff is 70, a score of 71 is accepted and 69 rejected. Conversely, if banding is used, all scores above the cutoff, perhaps ranging from 70 to 100, might be divided into three or four distinct bands. Within the highest band, all candidates are placed on an equal footing, meaning secondary criteria must be utilized to make final selection decisions. This nuanced methodology requires sophisticated psychometric expertise to implement correctly, ensuring that the chosen band widths are statistically sound and defensible against legal challenge.

Theoretical Foundation: Standard Error of Measurement (SEM)

The theoretical cornerstone supporting the practice of banding is the concept of the Standard Error of Measurement (SEM). Classical Test Theory posits that any observed test score is composed of two components: the true score (the hypothetical, error-free measure of the trait) and the measurement error. Because no psychological test is perfectly reliable, there is always some degree of error associated with the observed score. The SEM provides a quantitative estimate of the typical distance between the observed score and the true score, essentially quantifying the margin of error inherent in the test instrument itself. It follows that scores that are numerically close may, in reality, represent the same true level of ability, with the observed difference resulting only from random measurement error.

When scores are closely clustered, particularly within a range defined by the SEM, treating them as strictly rank-ordered creates an illusion of precision that the test instrument cannot actually support. For example, if the SEM for a selection test is 3 points, a candidate scoring 85 and a candidate scoring 87 are statistically indistinguishable even at a lenient confidence level (e.g., 68%), because the two-point gap is smaller than a single SEM. The difference is plausibly attributable to random fluctuation: perhaps one candidate was distracted, guessed correctly on one item, or simply experienced test-day variability. Banding directly addresses this psychometric reality by establishing band widths that typically correspond to one or two SEM units. By defining the band width using this statistically derived margin of error, the system ensures that only score differences large enough to exceed the expected measurement error lead to differential treatment of candidates.
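One way to make the 85-versus-87 comparison concrete is the standard error of the difference between two observed scores, which under classical test theory equals SEM × √2 when both scores come from the same test and their errors are independent. The sketch below applies that check. The function names and the default multiplier of 1.0 (roughly a 68% level) are illustrative assumptions, not a prescribed banding procedure.

```python
import math


def standard_error_of_difference(sem: float) -> float:
    """Standard error of the difference between two observed scores
    that share the same SEM and have independent errors."""
    return sem * math.sqrt(2.0)


def differ_reliably(score_a: float, score_b: float, sem: float, z: float = 1.0) -> bool:
    """Return True if the gap between two scores exceeds z standard
    errors of the difference (z = 1.0 is about 68%, z = 1.96 about 95%)."""
    return abs(score_a - score_b) > z * standard_error_of_difference(sem)


# The example from the text: SEM of 3 points, scores of 85 and 87.
print(differ_reliably(85, 87, sem=3.0))           # False
# A ten-point gap on the same test, checked at the 95% level.
print(differ_reliably(85, 95, sem=3.0, z=1.96))   # True
```

With an SEM of 3, the standard error of the difference is about 4.24 points, so a two-point gap falls well inside the expected noise.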

Using the SEM to define bands lends statistical rigor to the banding approach. Without grounding the band widths in a measure of test reliability, the process would appear arbitrary or subjectively determined. Calculation of the SEM requires knowledge of the test’s standard deviation and its reliability coefficient. A test with higher reliability will have a smaller SEM, resulting in narrower bands and finer distinctions between scores. Conversely, a test with lower reliability will yield a larger SEM and consequently wider bands, reflecting the greater uncertainty about the true ability levels of the test takers. This direct relationship ensures that the structure of the bands reflects the measurement precision of the specific assessment tool being used for personnel decisions.
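The relationship between reliability and SEM can be sketched with the classical formula SEM = SD × √(1 − r), where r is the reliability coefficient. The function name and the standard deviation of 10 are assumptions chosen purely for illustration.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Same score spread (SD = 10), decreasing reliability: the SEM grows.
for r in (0.95, 0.90, 0.80):
    print(r, round(standard_error_of_measurement(10.0, r), 2))
```

Holding the standard deviation constant, lowering reliability from 0.95 to 0.80 roughly doubles the SEM, which is why less reliable tests call for wider bands.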

Methodology and Calculation of Score Bands

The implementation of banding involves a systematic process, starting with the calculation of the SEM and concluding with the selection strategy. The first step, and the most critical decision in the process, is determining the band width. Some organizations set the width at one SEM unit; others take a confidence-interval approach and set it at about two SEM units, corresponding to a higher confidence level (e.g., 95%) that scores within the band do not reflect reliably different true scores. The SEM is computed from the standard deviation of the test scores (SD) and the test’s reliability coefficient (r): SEM = SD × √(1 − r). Once the SEM is calculated, the band width is set, defining the numerical range within which scores are deemed equivalent.
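A minimal sketch of the band-width calculation follows, assuming the classical formula SEM = SD × √(1 − r). The `multiplier` argument covers the one-SEM and two-SEM choices described above. The optional `use_sed` flag, which scales by √2 to work with the standard error of the difference between two scores, reflects a variant found in the banding literature and is included here as an assumption rather than as the required method. All names and the example values are hypothetical.

```python
import math


def band_width(sd: float, reliability: float,
               multiplier: float = 2.0, use_sed: bool = False) -> float:
    """Width of a score band in test-score units.

    multiplier: number of standard-error units spanned by the band
                (e.g., 1.0, 2.0, or 1.96 for a 95% level).
    use_sed:    if True, use the standard error of the difference
                (SEM * sqrt(2)) instead of the SEM itself.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    sem = sd * math.sqrt(1.0 - reliability)
    unit = sem * math.sqrt(2.0) if use_sed else sem
    return multiplier * unit


# Illustrative test: SD = 10, reliability = 0.90.
print(round(band_width(10.0, 0.90, multiplier=1.0), 2))                 # 3.16
print(round(band_width(10.0, 0.90, multiplier=2.0), 2))                 # 6.32
print(round(band_width(10.0, 0.90, multiplier=1.96, use_sed=True), 2))  # 8.77
```

In practice the resulting width is usually rounded to whole score points before bands are drawn; the rounding rule is itself a policy choice that should be documented.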

Two primary types of banding methodologies exist: fixed bands and sliding bands. Fixed banding establishes static ranges from the outset. For example, if the test ranges from 0 to 100 and the band width is 5 points, the bands might be 96–100, 91–95, 86–90, and so forth. Under fixed banding, the selection team works exclusively within the highest established band until all viable candidates in that band are selected or rejected based on secondary criteria. The downside is that a score marginally below the band cutoff (e.g., 90) is treated as significantly inferior to a score marginally above it (e.g., 91), potentially reintroducing the arbitrary cutoff problem that banding sought to avoid at the micro-level.
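The fixed-band example above (bands of 96–100, 91–95, 86–90 on a 0–100 test) can be sketched as a simple lookup. The function name, the integer-score assumption, and the handling of the lowest band are illustrative choices.

```python
def fixed_band(score: int, top: int = 100, width: int = 5, floor: int = 0) -> tuple:
    """Return the (low, high) limits of the fixed band containing an
    integer score, counting bands downward from the maximum score."""
    if not floor <= score <= top:
        raise ValueError("score outside the test's range")
    index = (top - score) // width       # 0 = highest band
    high = top - index * width
    low = max(high - width + 1, floor)   # the lowest band may be truncated
    return low, high


for s in (100, 96, 91, 90):
    print(s, fixed_band(s))
```

Scores of 91 and 90 land in different bands, (91, 95) and (86, 90), even though they differ by a single point. This is the boundary discontinuity described above.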

Conversely, sliding banding offers greater flexibility and is often preferred in practice. In sliding banding, the top score achieved by any candidate sets the upper limit of the initial band. For instance, if the highest score is 98 and the band width is 5 points, the initial band is 94–98. Candidates may be selected from anywhere within this band. Once the highest scorer has been selected or removed from consideration, the band “slides” down: the next highest score among the remaining candidates becomes the new upper limit, and the lower limit drops by the same amount, bringing candidates who fell just below the previous band into consideration. This dynamic approach minimizes the artificial discontinuity inherent in fixed bands and ensures that the selection process always considers the highest performing pool of candidates currently available, while still acknowledging the statistical equivalence of scores within the SEM range.
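A minimal sketch of a sliding-band procedure is given below. It assumes that the band is re-anchored on the highest remaining score after every selection, so it only moves when the current top scorer leaves the pool, and that the choice within a band is delegated to a caller-supplied function standing in for the secondary criteria. The candidate names, scores, and interview ratings are hypothetical.

```python
from typing import Callable, Dict, List


def sliding_band_select(scores: Dict[str, int], n_hires: int, width: int,
                        pick: Callable[[List[str]], str]) -> List[str]:
    """Select candidates using a sliding band over integer scores.

    scores: candidate name -> test score
    width:  number of score points in a band (98 with width 5 -> 94-98)
    pick:   chooses one name from the current band (secondary criteria)
    """
    remaining = dict(scores)
    hired: List[str] = []
    while remaining and len(hired) < n_hires:
        top = max(remaining.values())    # highest score still available
        lower = top - width + 1          # inclusive lower limit of the band
        band = [name for name, s in remaining.items() if s >= lower]
        chosen = pick(band)
        hired.append(chosen)
        del remaining[chosen]            # band is recomputed on the next pass
    return hired


scores = {"A": 98, "B": 96, "C": 94, "D": 93, "E": 90}
interview = {"A": 5, "B": 3, "C": 4, "D": 5, "E": 2}   # hypothetical ratings

hires = sliding_band_select(scores, n_hires=3, width=5,
                            pick=lambda band: max(band, key=interview.get))
print(hires)  # ['A', 'D', 'C']
```

In this run the first band is 94–98, and A is chosen. Because A was the top scorer, the band slides to 92–96, which brings D into consideration. D is then chosen ahead of B and C on the interview rating.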

Application in Personnel Selection

The primary organizational context for the use of banding is high-volume, high-stakes personnel selection, particularly in public sector organizations where fairness and defensibility are paramount. Selection processes often involve multiple steps, and banding is typically applied to the results of cognitive ability tests or standardized knowledge tests, which tend to have high predictive validity but also high potential for adverse impact against certain demographic groups if utilized for strict rank-ordering. By establishing bands, the organization creates a pool of equally qualified individuals from which managers can exercise discretion using criteria other than the test score itself.

Once the top band is identified, the selection process transitions from a purely psychometric exercise to a holistic assessment. Since all candidates in the band are statistically equivalent in terms of measured aptitude, organizations introduce secondary criteria to differentiate among them. These secondary criteria may include structured interviews, assessment center performance, relevant work experience, educational credentials, or specific diversity considerations if permissible under law. The critical requirement is that these secondary criteria must be job-related, transparently applied, and demonstrably valid predictors of job success, thereby ensuring that the final selection decision maintains predictive utility while achieving broader organizational goals.

A significant benefit for human resource management is the increased flexibility afforded by banding. Under strict rank-ordering, a manager is expected, and in some systems formally required, to hire the highest-scoring candidate regardless of other factors. Banding allows the hiring manager to look at a group of highly qualified candidates and select the individual who possesses the best combination of test scores (which place them in the top band) and fit with the team culture, specific experiential knowledge, or other subjective but job-relevant attributes. This discretion ensures that the selection process retains human judgment and organizational context, rather than relying solely on an automated numerical ranking.

Advantages and Arguments for Fairness

One of the most compelling arguments for the adoption of banding relates to organizational fairness and the reduction of adverse impact. Cognitive ability tests, despite their strong predictive validity, often exhibit mean score differences across various demographic groups. Strict rank-ordering based on these scores tends to disproportionately favor one group over others, leading to significant adverse impact, which can result in legal scrutiny and reduced workforce diversity. Banding addresses this issue by allowing organizations, where the law permits, to select minority candidates from within the top band, whose scores are treated as equivalent to the others in that band, before moving to the next band, provided those candidates meet the secondary criteria. This method allows organizations to pursue diversity objectives with little sacrifice in workforce quality, as all selected candidates have scored within the statistically top tier.

Furthermore, banding embodies a more realistic and ethically responsible view of psychological measurement. By acknowledging that small score differences may reflect measurement error rather than true ability differences, banding upholds the principle of psychometric justice. It prevents the organization from making high-stakes decisions based on noise inherent in the measurement instrument. The candidate who scored 92 and the candidate who scored 90 are both deemed excellent performers; treating the 92 scorer as definitively superior ignores the fact that, on another day, the 90 scorer might have achieved the 92. This perspective can promote greater trust in the selection system among applicants who recognize that the process accounts for the inevitable imperfection of standardized testing.

The administrative benefits are also notable. Banding simplifies the selection decision once the top tier is established, allowing hiring managers to focus their resources on detailed qualitative assessments (e.g., intensive interviews, background checks) of a smaller, highly qualified pool. Key advantages include:

  • Mitigating Legal Risk: Providing a statistically defensible rationale for hiring individuals who are not the absolute top scorer, especially important when addressing adverse impact claims.
  • Promoting Holistic Review: Forcing the use of multiple, diverse criteria (interviews, experience) alongside test scores, leading to better overall hiring decisions.
  • Candidate Relations: Reducing perceived arbitrariness in the selection process by assuring candidates that differences within a band are not weighted differentially.

Criticisms and Potential Drawbacks

Despite its strong theoretical underpinnings in psychometrics, banding faces significant criticism, primarily centered on the practical implications of intentionally discarding numerical information. Critics argue that even if small score differences are not statistically significant, they still represent the best available estimate of true ability. In situations where hundreds of individuals score within the top band, ignoring the rank-order within that band means potentially bypassing the candidate who scored 98 in favor of a candidate who scored 90, simply because both scores fell within the 90–100 range. This practice is criticized for reducing the overall validity and utility of the selection process by failing to maximize the predictive power of the test instrument.

Another major concern involves the application of secondary criteria within the band. While banding intends to introduce fairness, critics contend that it merely shifts the potential for bias from the test score stage to the secondary criteria stage. If hiring managers are given discretion to use non-test factors—such as subjective interview ratings or perceived cultural fit—to differentiate among statistically equivalent candidates, the system may inadvertently introduce subjective biases or unfair preferences. If the organization uses banding primarily to achieve diversity targets, and the secondary criteria are not rigorously job-related, the practice could be challenged as a form of reverse discrimination or quota setting, undermining the credibility of the entire selection system.

Furthermore, the choice between fixed and sliding bands introduces specific methodological vulnerabilities. Fixed banding is criticized for creating artificial cliffs at band boundaries, while sliding banding, though more psychometrically sound, can be administratively complex and may necessitate continuous monitoring and recalculation as candidates are selected. Organizations must also contend with the perception among high-scoring candidates who fall near the top of the band that their superior performance has been effectively ignored. If Candidate A scores 99 and Candidate B scores 90 (both in the top band), Candidate A may feel unfairly treated if Candidate B is selected first based on a secondary, less reliable criterion. This requires careful communication and transparency to manage applicant expectations effectively.

Legal and Ethical Considerations

In jurisdictions such as the United States, the use of banding is heavily scrutinized under Equal Employment Opportunity (EEO) laws. Organizations that adopt banding must be prepared to legally defend the methodology, particularly if they are using it to address adverse impact. The core legal defense rests on demonstrating that the test is valid and job-related, and that the SEM is correctly calculated and applied. Crucially, the organization must show that the resulting bands are based on sound psychometric principles and not simply arbitrary cutoffs designed to manipulate selection outcomes for non-job-related reasons, such as meeting specific demographic quotas, which is generally illegal.

Ethically, transparency is paramount. Organizations have an obligation to clearly communicate to applicants how scores are interpreted and how the banding process works. Failure to explain why a higher-scoring candidate might be bypassed in favor of a lower-scoring candidate within the same band can lead to confusion, frustration, and potential legal challenges based on lack of procedural fairness. The ethical implementation of banding requires that the secondary criteria used for internal differentiation are equally valid, reliable, and demonstrably linked to successful job performance, thus maintaining the overall utility of the selection process even when discretion is applied.

Legal precedent surrounding banding is complex and often context-dependent. While courts generally accept the psychometric principle of measurement error, the practical implementation of banding, especially when tied to diversity initiatives, can draw intense legal review. Organizations must meticulously document:

  1. The methodology used to calculate the SEM and set band widths.
  2. The specific, job-related criteria used to differentiate candidates within a band.
  3. Evidence that the selection process, even with banding, maintains the predictive validity necessary to ensure the competence of the workforce.

Without robust documentation and clear justification for the chosen methodology, banding risks being perceived not as a scientifically sound way to account for error, but as a mechanism for circumventing meritocratic principles.

Alternative Scoring Methods and Comparison

Banding is one of several methods organizations use to interpret and apply standardized test scores, and it contrasts sharply with other widely adopted techniques. The most traditional alternative is Strict Rank-Ordering, often termed top-down selection. In this method, candidates are selected strictly in descending order of their test scores until all positions are filled. While maximizing the correlation between test scores and selection decisions, this method often maximizes adverse impact, as every marginal point difference is assumed to represent a true ability difference. Banding is specifically designed as a compromise between the predictive utility of rank-ordering and the social utility of reducing adverse impact.

Another common alternative is the use of Fixed Cutoff Scores. This approach establishes a minimum acceptable score; candidates above the cutoff are considered qualified, and those below are eliminated. Once qualified, candidates might be treated equally or ranked based on other criteria. Cutoff scores are simpler to administer than banding but suffer from the arbitrary nature of the cutoff point itself—a score marginally above the cutoff is treated as vastly superior to a score marginally below it, despite the small numerical difference often falling within the SEM. Banding, by contrast, uses the SEM to define ranges of equivalence among the qualified population, offering a more statistically informed approach to group formation than a single fixed cutoff.

Finally, banding must be evaluated in the context of Utility Analysis, which assesses the economic benefit of various selection methods. While strict rank-ordering generally offers the highest theoretical utility (maximizing average job performance), banding seeks to achieve high utility while minimizing negative organizational costs associated with adverse impact, such as litigation risks and reduced diversity. The decision to use banding is ultimately a trade-off: accepting a marginal reduction in predicted job performance (by occasionally selecting a slightly lower scorer) in exchange for significant gains in fairness, diversity, and legal defensibility. Organizations must determine if the utility lost by abandoning strict rank-ordering is outweighed by the utility gained through organizational flexibility and enhanced equity.
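The trade-off can be illustrated with a toy calculation that uses the mean test score of those hired as a crude stand-in for predicted performance. The candidate scores and the banded outcome below are hypothetical, and a real utility analysis would also involve validity coefficients and monetary estimates that this sketch omits.

```python
from statistics import mean
from typing import Dict, List


def top_down(scores: Dict[str, int], n: int) -> List[str]:
    """Strict rank-ordering: take the n highest scorers."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def mean_test_score(scores: Dict[str, int], hires: List[str]) -> float:
    """Average test score of the selected candidates."""
    return mean(scores[name] for name in hires)


scores = {"A": 98, "B": 96, "C": 94, "D": 93, "E": 90}

strict_hires = top_down(scores, 3)   # ['A', 'B', 'C']
banded_hires = ["A", "D", "C"]       # hypothetical outcome of a banded process

gap = mean_test_score(scores, strict_hires) - mean_test_score(scores, banded_hires)
print(gap)  # one point lower on average under banding in this example
```

Whether a gap of this size is acceptable depends on how strongly test scores predict performance in the job at hand and on the value the organization places on the flexibility gained.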