How the e-rater Engine Works

When scoring essays, the e-rater® engine will:

validate that the features are not only predictive of a reader's score but also have some logical relevance to the writing prompt
automatically flag responses that are off-topic or inconsistent, so that they can be set aside for review
combine the scoring features in a statistical model to produce a final score estimate
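
The last step above, combining features in a statistical model, can be sketched in a few lines. This is only an illustration that assumes a simple weighted linear combination clipped to a 1-6 scale; the feature names, weights, intercept, and score range are hypothetical and are not taken from the e-rater engine.

```python
# Hypothetical weights for a handful of features, each scaled to 0..1.
# These names and numbers are illustrative assumptions, not e-rater internals.
WEIGHTS = {
    "organization": 2.0,
    "development": 1.5,
    "vocabulary": 1.0,
    "mechanics": 0.5,
}


def estimate_score(features, weights=WEIGHTS, intercept=1.0, low=1.0, high=6.0):
    """Combine feature values into one score estimate via a weighted sum."""
    raw = intercept + sum(weights[name] * features[name] for name in weights)
    # Keep the estimate inside the reporting scale.
    return max(low, min(high, raw))


# A made-up essay whose features all sit in the middle of their range.
essay_features = {
    "organization": 0.5,
    "development": 0.5,
    "vocabulary": 0.5,
    "mechanics": 0.5,
}
print(estimate_score(essay_features))
```

In a sketch like this, the weights would be estimated from essays that human readers have already scored, rather than set by hand.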

The e-rater engine is continually being developed and improved, with the aim of extending its ability to model important and challenging aspects of writing proficiency. Ongoing research aims to enhance the e-rater engine's capabilities so that it can identify and evaluate the structure of an argument in an essay, as well as assess the creative use of language in student and test-taker writing.

e-rater features

The features used for e-rater scoring are the result of nearly two decades of natural language processing research at ETS, and each feature may be composed of independent sub-features. Work has also been done to establish a vertically linked scale of K–12 writing scores across grades based on the e-rater engine, known as the Developmental Writing Scale.

The features of the e-rater scoring engine currently include:

content analysis based on vocabulary measures
lexical complexity/diction
proportion of grammar, usage and mechanics errors
proportion of style comments
organization and development scores
rewards for idiomatic phraseology

The adjustment of features used to assign a total score to an essay can be tailored to a specific prompt or made in a "generic" fashion, allowing the same e-rater model to score responses to a variety of prompts.
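
One way to picture the difference between prompt-specific and generic models is a lookup that falls back to shared weights. The sketch below is an assumption about how such a choice could be organized, not a description of the e-rater engine; the prompt identifier, feature names, and weights are all made up.

```python
# Hypothetical "generic" weights shared across prompts (illustrative values).
GENERIC_WEIGHTS = {"organization": 2.0, "development": 1.5, "vocabulary": 1.0}

# Hypothetical prompt-specific weights, keyed by a made-up prompt identifier.
PROMPT_SPECIFIC_WEIGHTS = {
    "prompt-017": {"organization": 1.5, "development": 1.5, "vocabulary": 2.0},
}


def weights_for(prompt_id):
    """Return prompt-specific weights if they exist, else the generic ones."""
    return PROMPT_SPECIFIC_WEIGHTS.get(prompt_id, GENERIC_WEIGHTS)


print(weights_for("prompt-017"))   # tailored to this prompt
print(weights_for("new-prompt"))   # no tailored model, so generic weights
```

The generic route trades some prompt-level tailoring for the ability to score a new prompt without fitting a separate model for it.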

Score agreement

For tasks that are appropriate for the e-rater engine (essay-length writing tasks that are scored for writing quality rather than for the correctness of claims made in the response), agreement with human raters can be very strong. In Automated Essay Writing with e-rater v2.0 (PDF), Attali, Bridgeman & Trapani (2010) found that the e-rater engine's agreement with a human rater on the TOEFL® Independent and GRE® Issue tasks was higher than the agreement between two independent human raters.
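
Agreement between two sets of scores can be quantified in several ways. The sketch below computes two statistics commonly used for essay scores, the exact agreement rate and quadratically weighted kappa; the text above does not say which statistic the cited study used, so the choice here is an assumption, and the score lists are invented for illustration.

```python
def exact_agreement(scores_a, scores_b):
    """Fraction of essays on which the two raters gave the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return matches / len(scores_a)


def quadratic_weighted_kappa(scores_a, scores_b, min_score, max_score):
    """Chance-corrected agreement that penalizes large disagreements more."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("the scale needs at least two score points")
    k = max_score - min_score + 1
    n = len(scores_a)
    # Observed counts: rows are rater A's scores, columns are rater B's.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(scores_a, scores_b):
        observed[a - min_score][b - min_score] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            disagreement += weight * observed[i][j]
            expected_disagreement += weight * row_totals[i] * col_totals[j] / n
    if expected_disagreement == 0:
        return 1.0  # both raters gave one identical score to every essay
    return 1.0 - disagreement / expected_disagreement


# Invented scores on a 1-4 scale, purely for illustration.
human_scores = [1, 2, 3, 4]
engine_scores = [1, 2, 3, 3]
print(exact_agreement(human_scores, engine_scores))
print(quadratic_weighted_kappa(human_scores, engine_scores, 1, 4))
```

Running the same two functions on a human-human pair and a human-engine pair would give the kind of side-by-side comparison described above.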
