
Comparing the Validity of Automated and Human Essay Scoring

Powers, Donald E.; Burstein, Jill; Chodorow, Martin; Fowles, Mary E.; Kukich, Karen
Publication Year:
Report Number: GREB-98-08aR, RR-00-10
Document Type: ETS Research Report
Page Count:
Subject/Key Words: Automated Scoring; Automated Scoring and Natural Language Processing; Computer-Based Testing (CBT); Essay Scoring; Graduate Record Examinations (GRE); Graduate Record Examinations Board; GRE Writing Assessment; Validity; Writing Skills


This study sought to provide further evidence of the validity of automated, or computer-based, scores on complex performance assessments, such as direct tests of writing skill that require examinees to construct responses rather than select them from a set of multiple-choice options. While several studies have examined agreement between human raters and automated scoring systems, only a few have provided evidence of the relationship of automated scores to other, independent indicators of writing skill. This study examined the relationships of two sets of Graduate Record Examinations (GRE) Writing Assessment scores (those given by human raters and those generated by e-rater, the system being researched for possible application in a variety of assessments that require natural language responses) to several independent, nontest indicators of writing skill, such as academic, outside, and perceived success with writing. Analyses revealed significant, but modest, correlations between the nontest indicators and each of the two methods of scoring. That automated and human scores exhibited reasonably similar relations with the nontest indicators was taken as evidence that the two methods of scoring reflect similar aspects of writing proficiency. These relations were, however, somewhat weaker for automated scores than for scores awarded by humans.
