
Another Look at Inter-Rater Agreement

Author: Zwick, Rebecca J.
Publication Year:
Report Number:
Document Type: ETS Research Report
Page Count: 26
Subject/Key Words: National Institute of Mental Health, Interrater Reliability, Research Methodology, Research Problems


Most currently used measures of inter-rater agreement for the nominal case incorporate a correction for "chance agreement." The definition of chance agreement is not the same for all coefficients, however. Three chance-corrected coefficients are Cohen's κ, Scott's π, and the S index of Bennett, Goldstein, and Alpert, which has reappeared in many guises. For all three measures, chance is defined to include independence between raters. Scott's π involves a further assumption of homogeneous rater marginals under chance. For the S coefficient, uniform marginals for both raters under chance are assumed. Because of these disparate formulations, κ, π, and S can lead to different conclusions about rater agreement. Consideration of the properties of these measures leads to the recommendation that a test of marginal homogeneity be conducted as a first step in the assessment of rater agreement. Rejection of the hypothesis of homogeneity is sufficient to conclude that agreement is poor. If the homogeneity hypothesis is retained, π can be used as an index of agreement. (26pp.)
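The three coefficients share the same chance-correction form, (P_o − P_e)/(1 − P_e), and differ only in how the chance-agreement term P_e is defined. A minimal sketch of that contrast, assuming a square two-rater contingency table of nominal classifications (the function and variable names below are illustrative, not from the report):

```python
def agreement_indices(table):
    """Cohen's kappa, Scott's pi, and Bennett et al.'s S from a k x k table.

    table[i][j] = number of items rater 1 placed in category i
    and rater 2 placed in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p_obs = sum(table[i][i] for i in range(k)) / n  # observed agreement

    # Marginal category proportions for each rater.
    row = [sum(table[i][j] for j in range(k)) / n for i in range(k)]  # rater 1
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater 2

    # Each index uses a different chance-agreement term p_e:
    pe_kappa = sum(row[i] * col[i] for i in range(k))            # independence, each rater's own marginals
    pe_pi = sum(((row[i] + col[i]) / 2) ** 2 for i in range(k))  # independence + homogeneous marginals
    pe_s = 1.0 / k                                               # independence + uniform marginals

    def chance_correct(pe):
        return (p_obs - pe) / (1 - pe)

    return {"kappa": chance_correct(pe_kappa),
            "pi": chance_correct(pe_pi),
            "S": chance_correct(pe_s)}
```

For example, `agreement_indices([[20, 5], [10, 15]])` has observed agreement 0.70 but heterogeneous marginals (rater 1 uses the categories 50/50, rater 2 uses them 60/40), so the three indices already diverge (π ≈ 0.39 versus κ = S = 0.40 here), illustrating why the report recommends testing marginal homogeneity first.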
