Automated Scoring of Writing Quality

The e-rater® automated scoring engine is ETS's proven capability for automated scoring of expository, persuasive and summary essays, and it is currently used in multiple high-stakes programs. The e-rater engine is used in combination with human raters to score the Writing sections of the TOEFL® and GRE® tests, as psychometric research has demonstrated that this combination is superior to either machine or human scoring alone. It is also used as the sole score in lower-stakes contexts, such as formative use in a classroom setting with ETS's Criterion® online essay evaluation system, where it also generates the individualized feedback provided to users. The e-rater engine addresses the need for essay scoring that is reliable, valid, fast and flexible as more and more testing programs, including large-volume state testing, move to online delivery and adopt essay-based tasks for writing assessment.
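
To make concrete how machine and human scores are commonly combined in programs of this kind, here is a minimal sketch in Python under assumed rules; the discrepancy threshold and the averaging policy below are illustrative assumptions, not ETS's actual adjudication procedure.

    def combine_scores(machine_score, human_score, max_discrepancy=1.0):
        """Illustrative check-score workflow: average the two scores when they agree
        closely, otherwise flag the response for adjudication by another human rater.
        The threshold and the averaging rule are assumptions, not ETS policy."""
        if abs(machine_score - human_score) <= max_discrepancy:
            return {"score": (machine_score + human_score) / 2, "adjudicate": False}
        return {"score": None, "adjudicate": True}

    print(combine_scores(4.0, 4.0))   # {'score': 4.0, 'adjudicate': False}
    print(combine_scores(2.0, 5.0))   # {'score': None, 'adjudicate': True}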

ETS has conducted more than a decade of groundbreaking research in natural language processing on the automated identification of text features characteristic of developing writers. The patented e-rater engine, a platform that automatically provides a rich set of underlying linguistic representations related to writing quality in addition to scores, represents the culmination of this research to date.

The e-rater engine predicts essay scores based on features related to writing quality, including:

  • grammar errors (e.g., subject-verb agreement)
  • usage errors (e.g., preposition selection)
  • mechanics errors (e.g., capitalization)
  • stylistic weaknesses (e.g., repetitious word use)
  • discourse structure (e.g., presence of a thesis statement and main points)
  • vocabulary usage (e.g., relative sophistication of vocabulary)

The e-rater engine also includes features related to vocabulary, content appropriateness, organization and development. Its score predictions have been shown to correlate strongly with the scores of human raters and with other measures of writing ability. The engine can also automatically detect responses that are off-topic or otherwise anomalous and that therefore should not be scored.
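
To make the idea of feature-based score prediction concrete, here is a deliberately simplified sketch in Python. The features and weights below are stand-ins invented for this example; the e-rater engine's actual features are far richer, and its weights are estimated from human-scored training data.

    import re

    def extract_features(essay):
        """Compute a few shallow, illustrative writing-quality features."""
        words = re.findall(r"[A-Za-z']+", essay)
        sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
        unique = {w.lower() for w in words}
        return {
            "length": len(words),                                   # development proxy
            "avg_sentence_length": len(words) / max(len(sentences), 1),
            "type_token_ratio": len(unique) / max(len(words), 1),   # vocabulary variety
            "long_word_ratio": sum(len(w) >= 7 for w in words) / max(len(words), 1),
        }

    # Hypothetical weights; in practice they would be estimated by fitting
    # the features to human holistic scores on a training set.
    WEIGHTS = {"length": 0.004, "avg_sentence_length": 0.05,
               "type_token_ratio": 2.0, "long_word_ratio": 3.0}

    def predict_score(essay):
        features = extract_features(essay)
        raw = sum(WEIGHTS[name] * value for name, value in features.items())
        return max(1.0, min(6.0, 1.0 + raw))   # clamp to a 1-6 holistic scale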

ETS maintains an active research agenda that investigates new genres (digital writing formats, such as blogs) and develops linguistic features for modeling aspects of content and argumentation that reflect additional components of writing quality, such as:

  • metrics of text coherence
  • organization of claims and evidence
  • the writer's stance toward the test question
  • the identification of particular topics addressed in the response
  • the use of supporting facts from external sources

This ongoing research aims to expand the range of writing genres that can be addressed; to develop new genres and features that support the writing of English learners (e.g., correct use of articles, prepositions and collocations); and to advance the state of the art in evaluating the quality of argumentation across different genres and modes of discourse.
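
As a toy example of the learner-focused checks just mentioned, the sketch below flags a bare singular countable noun that appears without a determiner. It is illustrative only; practical detectors of article, preposition and collocation errors rely on large corpora and statistical models rather than hand-written word lists like these.

    import re

    # Tiny hand-picked word lists, purely for illustration.
    SINGULAR_COUNTABLE = {"book", "essay", "student", "teacher", "computer", "question"}
    DETERMINERS = {"a", "an", "the", "this", "that", "my", "your", "his",
                   "her", "its", "our", "their", "each", "every"}

    def flag_missing_articles(sentence):
        """Return (position, noun) pairs where a singular countable noun appears
        without a preceding determiner, e.g. 'I wrote essay yesterday.'"""
        tokens = re.findall(r"[A-Za-z]+", sentence.lower())
        flags = []
        for i, token in enumerate(tokens):
            if token in SINGULAR_COUNTABLE:
                previous = tokens[i - 1] if i > 0 else ""
                if previous not in DETERMINERS:
                    flags.append((i, token))
        return flags

    print(flag_missing_articles("I wrote essay about computer yesterday"))
    # [(2, 'essay'), (4, 'computer')]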

Featured Publications

Below are some recent or significant publications that our researchers have authored on the subject of automated scoring of writing quality.

2010

  • Using Parse Features for Preposition Selection and Error Detection
    J. Tetreault, J. Foster, & M. Chodorow
    Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010)
    Association for Computational Linguistics

    This paper evaluates the effect of adding parse-based features that aim to improve the detection of preposition errors in the writing of speakers of English as a second language.

  • Rethinking Grammatical Error Annotation and Evaluation with the Amazon Mechanical Turk
    J. Tetreault, E. Filatova, & M. Chodorow
    Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications (BEA-5)
    Association for Computational Linguistics

    This paper presents the results of two pilot studies that show that using the Amazon Mechanical Turk for preposition error annotation is as effective as using trained raters, but at a fraction of the time and cost.

  • Progress and New Directions in Technology for Automated Essay Evaluation
    J. Burstein & M. Chodorow
    The Oxford Handbook of Applied Linguistics, 2nd Edition, pp. 487–497
    Editor: R. Kaplan
    Oxford University Press

    This ETS-authored work is part of a 39-chapter volume that covers topics in applied linguistics with the goal of providing a survey of the field, showing the many connections among its sub-disciplines, and exploring likely directions of its future development.

  • Unsupervised Prompt Expansion for Off-Topic Essay Detection
    A. Louis & D. Higgins
    Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications (BEA-5)
    Association for Computational Linguistics

    This paper addresses the problem of getting software based on natural language processing technology to predict, without having previously analyzed essays as training data, whether an essay is "off-topic," that is, irrelevant to the given prompt or question. A simplified sketch of this idea appears after the 2010 entries.

  • Using Entity-Based Features to Model Coherence in Student Essays
    J. Burstein, J. Tetreault, & S. Andreyev
    Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 681–684
    Association for Computational Linguistics

    This paper describes a study in which researchers combined an algorithm that tracks what computational linguists call entities (nouns and pronouns) across a text with natural language processing features related to grammar errors and word usage, with the aim of building applications that can evaluate evidence of coherence in essays.
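
In the spirit of the entity-based study described above, but much simplified, the following sketch measures how often adjacent sentences in an essay share at least one content word, one crude signal of local coherence. No part-of-speech tagger is used here, so ordinary content words stand in for the entities the paper tracks.

    import re

    FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
                      "was", "were", "that", "this", "it", "on", "for", "with",
                      "as", "by", "therefore"}

    def content_words(sentence):
        """Crude stand-in for entity extraction: lowercase non-function words."""
        return {w for w in re.findall(r"[a-z']+", sentence.lower())
                if w not in FUNCTION_WORDS}

    def adjacent_overlap(essay):
        """Fraction of adjacent sentence pairs sharing at least one content word."""
        sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
        if len(sentences) < 2:
            return 1.0
        pairs = list(zip(sentences, sentences[1:]))
        shared = sum(bool(content_words(a) & content_words(b)) for a, b in pairs)
        return shared / len(pairs)

A similarly simplified version of the idea behind the off-topic detection paper above compares the essay's word-count vector with the prompt's and flags the essay when the cosine similarity falls below a threshold. The paper itself goes further by expanding the prompt's vocabulary in an unsupervised way; the stopword list and threshold below are assumptions made for this example.

    import math
    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
                 "that", "it", "on", "for", "was", "my", "with", "should"}

    def word_vector(text):
        words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
        return Counter(words)

    def cosine(v1, v2):
        dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
        norm = (math.sqrt(sum(c * c for c in v1.values()))
                * math.sqrt(sum(c * c for c in v2.values())))
        return dot / norm if norm else 0.0

    def looks_off_topic(prompt, essay, threshold=0.1):
        """Flag the essay when it shares too little vocabulary with the prompt."""
        return cosine(word_vector(prompt), word_vector(essay)) < threshold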

2005

  • Online Assessment in Writing
    N. Horkay, R. E. Bennett, N. Allen, & B. Kaplan
    Online Assessment in Mathematics and Writing: Reports from the NAEP Technology-Based Assessment Project
    NCES Report No. 2005-457
    U.S. Department of Education, National Center for Education Statistics

    The 2002 Writing Online (WOL) study is the second of three field investigations in the Technology-Based Assessment project, which explores the use of new technology in administering the National Assessment of Educational Progress (NAEP).
