
Evaluating a Prototype Essay Scoring Procedure Using Off-the-Shelf Software

Burstein, Jill; Kaplan, Bruce A.; Kaplan, Randy M.; Lu, Chi; Rock, Donald A.; Trenholm, Harriet; Wolff, Susanne
ETS Research Report
Page Count: 80
Subject/Key Words:
Automation, Computer Software, Constructed Responses, Essay Tests, Models, Scoring, Automated Scoring and Natural Language Processing


Constructed-response items, whose responses consist of words, phrases, sentences, paragraphs, or essays, are among the most difficult and costly items to score. The increased use of constructed-response items such as essays creates a need for tools that can score these responses partially or fully automatically. This study explores one approach to analyzing essay-length, natural-language constructed responses: we develop and evaluate a decision model for scoring essays that uses off-the-shelf software for checking English grammar and style. The first part of the study evaluates several commercial grammar-checking programs, and from this evaluation we select the best-performing program(s) to build the decision model. The second part of the study uses data produced by the selected grammar-checking program(s) to decide the score for each essay. Through statistical and linguistic methods, we analyze the performance of the decision model to assess its usefulness and practicality in a production scoring setting. (80 pp.)
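To make the idea of a "decision model" fed by grammar-checker output concrete, here is a minimal sketch. It is not the model developed in the report. The `CheckerReport` fields, the 1 to 6 score scale, the 150-word cutoff, and the error-rate thresholds are all hypothetical and chosen only for illustration. For the evaluation step, the sketch assumes exact and adjacent (within one point) agreement with human scores. These are common statistics for judging automated essay scoring, but this section does not say they are the measures the authors used.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CheckerReport:
    """Hypothetical summary of one essay's grammar/style-checker output."""
    word_count: int
    grammar_flags: int  # errors flagged by the checker (assumed field)
    style_flags: int    # style warnings flagged by the checker (assumed field)


def score_essay(report: CheckerReport) -> int:
    """Assign a 1-6 score using illustrative, hand-set thresholds (hypothetical)."""
    if report.word_count == 0:
        return 1
    # Normalize flag counts per 100 words so length alone does not drive the score.
    flags_per_100 = 100 * (report.grammar_flags + report.style_flags) / report.word_count
    if report.word_count < 150:
        # Very short essays are capped low regardless of their error rate.
        return 1 if flags_per_100 > 5 else 2
    if flags_per_100 <= 1:
        return 6
    if flags_per_100 <= 2:
        return 5
    if flags_per_100 <= 4:
        return 4
    if flags_per_100 <= 6:
        return 3
    return 2


def agreement(machine: Sequence[int], human: Sequence[int]) -> tuple[float, float]:
    """Return (exact, adjacent) agreement rates between machine and human scores."""
    if len(machine) != len(human) or not machine:
        raise ValueError("score lists must be non-empty and of equal length")
    pairs = list(zip(machine, human))
    exact = sum(m == h for m, h in pairs) / len(pairs)
    adjacent = sum(abs(m - h) <= 1 for m, h in pairs) / len(pairs)
    return exact, adjacent


# Usage with made-up checker output and made-up human scores.
reports = [
    CheckerReport(word_count=400, grammar_flags=2, style_flags=1),  # 0.75 flags per 100 words
    CheckerReport(word_count=300, grammar_flags=6, style_flags=3),  # 3.0 flags per 100 words
    CheckerReport(word_count=100, grammar_flags=8, style_flags=0),  # short essay, 8.0 per 100
]
machine_scores = [score_essay(r) for r in reports]
human_scores = [5, 4, 2]
print(machine_scores)                            # [6, 4, 1]
print(agreement(machine_scores, human_scores))   # (0.3333333333333333, 1.0)
```

The fixed thresholds keep the decision logic transparent and easy to audit. A production model would more plausibly derive its cutoffs from essays that human raters have already scored, which is the kind of evaluation the report describes.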
