
Evaluations of Automated Scoring Systems in Practice

Rotou, Ourania; Rupp, Andre A.
Publication Year:
Report Number:
ETS Research Report
Document Type:
Page Count:
Subject/Key Words:
Human Raters, Constructed-Response Tests, Automated Scoring, Natural Language Processing, Large-Scale Assessment, Evaluation Design


This research report describes the process of evaluating the “deployability” of automated scoring (AS) systems from the perspective of large-scale educational assessments in operational settings. It discusses a comprehensive psychometric evaluation whose analyses take into account the specific purpose of AS, the test design, the quality of human scores, the data collection design needed to train and evaluate the AS model, and the application of statistics and evaluation criteria. Finally, it notes that an effective evaluation of an AS system requires professional judgment coupled with statistical and psychometric knowledge and an understanding of risk assessment and business metrics.
