Comparing the Validity of Automated and Human Essay Scoring
- Author(s):
- Powers, Donald E.; Burstein, Jill; Chodorow, Martin; Fowles, Mary E.; Kukich, Karen
- Publication Year:
- 2014
- Source:
- Wendler, Cathy; Bridgeman, Brent (eds.) with assistance from Chelsea Ezzo. The Research Foundation for the GRE revised General Test: A Compendium of Studies. Princeton, NJ: Educational Testing Service, 2014, pp. 4.2.1–4.2.4
- Document Type:
- Chapter
- Page Count:
- 4
- Subject/Key Words:
- Graduate Record Examination (GRE), Revised GRE, Test Design, Test Revision, Automated Essay Scoring (AES), Validity, Human Scoring, e-rater, Scoring Models
Abstract
Reports on a study that evaluated several aspects of the validity of automated essay scoring. Although agreement between machine and human scores is one component of the validity argument for automated scoring, evidence of relationships to other measures of writing ability is also critical. In this study, data were collected from approximately 1,700 prospective graduate students at 26 colleges and universities across the United States; half of the group wrote one issue essay and one argument essay, while the other half wrote either two issue essays or two argument essays. Indicators of writing skill, such as course writing samples and test takers' perceptions of their own writing skill, were compared with the scores on the issue and argument essays generated by e-rater and with those generated by human raters. Ratings based on the combination of one human rater and e-rater correlated with the external criteria to almost the same extent as ratings based on two human raters.
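The comparison described above amounts to correlating two composite essay scores, one formed from two human ratings and one formed from a human rating plus e-rater, with the same external indicator of writing skill. The sketch below is an illustrative reconstruction of that kind of analysis, not the study's actual procedure; the function and variable names are hypothetical, and the composites are assumed here to be simple averages.

```python
import numpy as np

def criterion_correlations(human1, human2, erater, criterion):
    """Correlate two composite essay scores with an external criterion.

    human1, human2 : scores assigned by two independent human raters
    erater         : scores produced by the automated scoring model
    criterion      : external indicator of writing skill
                     (e.g., a rated course writing sample)
    Returns (r for two-human composite, r for human-plus-machine composite).
    """
    human1 = np.asarray(human1, dtype=float)
    human2 = np.asarray(human2, dtype=float)
    erater = np.asarray(erater, dtype=float)
    criterion = np.asarray(criterion, dtype=float)

    # Composite scores: average of two human ratings vs. one human
    # rating averaged with the automated score (an assumption of this sketch).
    two_human = (human1 + human2) / 2.0
    human_plus_machine = (human1 + erater) / 2.0

    # Pearson correlations of each composite with the external criterion.
    r_two_human = np.corrcoef(two_human, criterion)[0, 1]
    r_human_machine = np.corrcoef(human_plus_machine, criterion)[0, 1]
    return r_two_human, r_human_machine
```

Comparing the two returned correlations mirrors the study's finding that the human-plus-e-rater composite related to external criteria nearly as strongly as the two-human composite.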