
Computing Reader Agreement for the GRE Writing Assessment

Powers, Donald E.
ETS Research Memorandum
Subject/Key Words: Constructed-Response Tests, Writing Evaluation, Scoring, Interrater Reliability, Test Reliability, Kappa Coefficient, Essay Tests, Graduate Record Examinations (GRE), GRE Writing Assessment


Many kinds of constructed-response measures, including essay tests, require subjective judgments of the quality of examinee responses. For these measures, readers (i.e., judges or raters) constitute a potential source of error variance. It is important, therefore, to have some means of monitoring the degree to which readers disagree in the scores they assign. For many writing assessments, the tradition has been simply to report the rates of agreement (either exact or within one point) for pairs of readers. Using data from approximately the first 1,500 examinees to take the GRE Writing Assessment, this paper shows the extent to which reader agreement may arise by chance alone. It also compares various agreement statistics for these data. It concludes that, because they correct for chance agreement, statistics such as Cohen's kappa are more appropriate ways of expressing inter-reader agreement than are simple percentage-agreement statistics. Such chance-corrected statistics should therefore be computed routinely to supplement the simpler ones.
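
As an illustration of the statistics the abstract contrasts, the following minimal sketch computes exact agreement, within-one-point agreement, and Cohen's kappa for one pair of readers. It uses the standard unweighted definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed exact agreement and p_e is the agreement expected by chance from each reader's marginal score distribution. The function name `agreement_stats` and the ten score pairs are hypothetical and chosen only for illustration; they are not taken from the report, and the report's own computations may differ (for example, by using a weighted kappa).

```python
from collections import Counter


def agreement_stats(reader_a, reader_b):
    """Return (exact agreement, within-one-point agreement, Cohen's kappa).

    Both arguments are equal-length sequences of integer essay scores,
    one score per essay from each of two readers.
    """
    if len(reader_a) != len(reader_b) or len(reader_a) == 0:
        raise ValueError("need two equal-length, non-empty score lists")

    n = len(reader_a)
    pairs = list(zip(reader_a, reader_b))

    # Observed proportions of exact and adjacent (within one point) agreement.
    exact = sum(a == b for a, b in pairs) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / n

    # Chance agreement: probability that two independent readers with these
    # marginal score distributions assign the same score to an essay.
    count_a = Counter(reader_a)
    count_b = Counter(reader_b)
    expected = sum(count_a[s] * count_b[s] for s in count_a) / (n * n)

    # Kappa is undefined when chance agreement is already perfect.
    if expected == 1.0:
        return exact, adjacent, float("nan")

    kappa = (exact - expected) / (1.0 - expected)
    return exact, adjacent, kappa


# Hypothetical scores for ten essays (illustrative only, not GRE data).
scores_a = [4, 4, 3, 5, 4, 3, 4, 5, 4, 4]
scores_b = [4, 3, 3, 5, 4, 4, 4, 4, 4, 5]

exact, adjacent, kappa = agreement_stats(scores_a, scores_b)
print(f"exact={exact:.2f} within_one={adjacent:.2f} kappa={kappa:.2f}")
```

For these made-up scores, the two readers agree exactly on 60% of essays and within one point on 100%, yet kappa is only about 0.29, because both readers assign a score of 4 so often that 44% exact agreement would be expected by chance alone.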
