Interrater Agreement: Definition, History, and Characteristics

Interrater agreement, also known as interobserver or intercoder agreement, is the extent to which two or more raters (individuals or groups) assess the same situation in the same way. It quantifies how closely their observations, ratings, or decisions coincide. Consistent measurement of the same phenomenon is necessary for the results of a study, assessment, or evaluation to be reliable, and it supports, though does not by itself establish, their validity.


Systematic attempts to measure agreement between two or more observers have been traced back to the mid-19th century. A landmark in the later development of the field was the kappa coefficient, introduced by Jacob Cohen in 1960 (Cohen, 1960), which corrects the observed agreement between two raters for the agreement expected by chance. It remains one of the most widely used measures of interrater agreement in psychological and medical research.


Interrater agreement is characterised by three related qualities: accuracy, consistency, and reliability. Accuracy is the degree to which an observer’s ratings or decisions match the true state of the situation. Consistency is the degree to which two or more observers agree on a given rating or decision. Reliability, in the sense used here, is the degree to which an observer’s ratings or decisions remain stable over time, which is often called intrarater reliability.
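
To make the notion of consistency between two observers concrete, the sketch below computes two common summaries for categorical ratings: raw percent agreement, and Cohen's kappa, which is (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the proportion expected by chance from each rater's category frequencies. The function names and the ten yes/no ratings are illustrative assumptions, not data from any study, and the code is a minimal sketch for exactly two raters rating the same items.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which the two raters chose the same category."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters and nominal categories."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)  # observed agreement
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories (a Counter returns 0 for unseen categories).
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings of ten items by two observers (illustrative only).
rater_a = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", "no"]
rater_b = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]

print(f"percent agreement: {percent_agreement(rater_a, rater_b):.2f}")  # 0.80
print(f"Cohen's kappa:     {cohen_kappa(rater_a, rater_b):.2f}")        # 0.60
```

In this made-up example the observers agree on 8 of 10 items, but because each uses "yes" and "no" half the time, half of all items would be expected to match by chance alone, so kappa is noticeably lower than the raw agreement.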

Interrater agreement is an important indicator of the quality of research results, especially in research that relies on subjective ratings or observations. High interrater agreement indicates that the ratings are reproducible across raters, which is a prerequisite for valid results. It does not guarantee validity, because raters can agree with one another and still share the same bias.


Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46.

Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378-382.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174.

Mellenbergh, G. J. (2008). Inter-rater agreement, reliability and generalizability theory. In G. J. Mellenbergh & J. J. Meyer (Eds.), Advances in contemporary methodology and statistics (pp. 41-72). Amsterdam: Elsevier Science.
