
Examining the Calibration Process for Raters of the GRE General Test

Wendler, Cathy; Glazer, Nancy; Cline, Frederick
ETS Research Report
Subject/Key Words: General Test (GRE), Graduate Record Examination (GRE), Constructed-Response Scoring, Human Raters, Rater Performance, Calibration


One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift refers to changes in how raters interpret and apply established scoring criteria when assigning essay scores. Calibration is a process used to help control rater drift and, as such, serves as a form of quality control during CR scoring. Calibration sets are designed to provide sufficient evidence that raters have understood and internalized the rubrics and can score accurately across all points of the score scale. This study examined the calibration process used to qualify raters to score essays from the GRE® Analytical Writing measure. A total of 46 experienced raters participated, and each rater scored up to 630 essays from 1 of 2 essay prompt types. Two research questions were evaluated: (a) Does calibration influence scoring accuracy? and (b) Does reducing the frequency of calibration affect scoring accuracy? Although the distribution of score points represented by the essays used in the study did not necessarily reflect what raters see during operational scoring, results suggest that the influence of calibration on Day 1 remains with raters through at least 3 scoring days. Results further suggest that scoring accuracy may be moderated by prompt type. Nevertheless, the results indicate that daily calibration for GRE prompt types may not be necessary and that reducing the frequency of calibration is unlikely to reduce scoring accuracy.
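The abstract does not say how performance on a calibration set is judged. As a minimal sketch only, the check below assumes that accuracy is measured as exact and adjacent (within one point) agreement with reference scores on a hypothetical 1 to 6 integer scale, and it uses hypothetical pass thresholds (`min_exact`, `min_adjacent`) that are not ETS values. The function name `evaluate_calibration` is likewise illustrative. It also rejects a calibration set that does not span every score point, echoing the design goal of showing accuracy across the whole scale.

```python
from dataclasses import dataclass

# Hypothetical score scale (assumption): integer score points 1 through 6.
SCORE_POINTS = set(range(1, 7))


@dataclass
class CalibrationResult:
    exact_agreement: float     # share of essays scored identically to the reference
    adjacent_agreement: float  # share of essays within one point of the reference
    passed: bool               # whether both hypothetical thresholds were met


def evaluate_calibration(rater_scores, reference_scores,
                         min_exact=0.6, min_adjacent=0.95):
    """Compare one rater's scores with reference scores on a calibration set.

    min_exact and min_adjacent are hypothetical thresholds for illustration.
    """
    if not rater_scores or len(rater_scores) != len(reference_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    # A calibration set should cover every score point on the (assumed) scale.
    missing = SCORE_POINTS - set(reference_scores)
    if missing:
        raise ValueError(f"calibration set lacks score points: {sorted(missing)}")

    pairs = list(zip(rater_scores, reference_scores))
    n = len(pairs)
    exact = sum(r == ref for r, ref in pairs) / n
    adjacent = sum(abs(r - ref) <= 1 for r, ref in pairs) / n
    passed = exact >= min_exact and adjacent >= min_adjacent
    return CalibrationResult(exact, adjacent, passed)


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5, 6, 3, 4, 4, 5]
    rater = [1, 2, 3, 4, 5, 6, 4, 4, 3, 5]
    print(evaluate_calibration(rater, reference))
```

With the sample data, the rater matches the reference on 8 of 10 essays and is within one point on all 10, so the check passes under the assumed thresholds. Operational programs may instead use other agreement statistics (for example, weighted kappa) and different criteria.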
