Automated Scoring of Writing Quality

The e-rater® automated scoring engine is ETS's proven capability for automated scoring of expository, persuasive and summary essays, and it is currently used in multiple high-stakes programs. The engine is used in combination with human raters to score the Writing sections of the TOEFL® and GRE® tests, as psychometric research has demonstrated that this combination is superior to either machine or human scoring alone. It is also used as the sole score in lower-stakes contexts, such as formative use in a classroom setting with ETS's Criterion® online essay evaluation system, where it generates the individualized feedback provided to users. The e-rater engine addresses the need for essay scoring that is reliable, valid, fast and flexible as more and more testing programs, including large-volume state testing programs, move to online delivery and adopt essay-based writing tasks.

ETS has conducted over a decade of ground-breaking research in natural language processing related to the automated identification of text features characteristic of developing writers. The patented e-rater engine — a platform that automatically provides a rich set of underlying linguistic representations related to writing quality, in addition to scores — represents the culmination of this research to date.

The e-rater engine predicts essay scores based on features related to writing quality, including:

  • errors in grammar (e.g., subject-verb agreement)
  • errors in usage (e.g., preposition selection)
  • errors in mechanics (e.g., capitalization)
  • errors in style (e.g., repetitious word use)
  • discourse structure (e.g., presence of a thesis statement and main points)
  • vocabulary usage (e.g., relative sophistication of vocabulary)

The e-rater engine also includes features related to content appropriateness, organization and development. Its score predictions have been shown to correlate strongly with the scores of human raters and with other measures of writing ability. The engine can also automatically detect responses that are off-topic or otherwise anomalous and that therefore should not be scored.
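
To make the feature-based approach concrete, here is a minimal, hypothetical sketch in Python (not ETS's implementation): it extracts a handful of toy writing-quality features from an essay and fits a linear model to predict human-assigned scores, analogous in spirit to how a feature-based scoring engine combines feature values into a score prediction. All feature definitions, data and weights below are illustrative assumptions.

    # Illustrative sketch only: toy features loosely analogous to e-rater's
    # feature classes, combined with a linear model to predict essay scores.
    import re
    from sklearn.linear_model import LinearRegression

    def extract_features(essay):
        """Return a small, hypothetical feature vector for an essay."""
        words = re.findall(r"[A-Za-z']+", essay)
        sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
        n_words = max(len(words), 1)
        return [
            len(words) / max(len(sentences), 1),        # development proxy: words per sentence
            len({w.lower() for w in words}) / n_words,  # vocabulary diversity (type-token ratio)
            sum(len(w) > 7 for w in words) / n_words,   # vocabulary sophistication proxy
            essay.count(",") / max(len(sentences), 1),  # sentence-complexity proxy
        ]

    # Hypothetical training data: essays paired with human scores on a 1-6 scale.
    train_essays = [
        "Short simple essay. It is brief.",
        "A considerably more developed response with varied vocabulary, "
        "subordinate clauses, and clearly organized supporting points.",
    ]
    train_scores = [2.0, 5.0]

    model = LinearRegression()
    model.fit([extract_features(e) for e in train_essays], train_scores)

    new_essay = "The author presents an argument that is supported by relevant evidence."
    print(round(float(model.predict([extract_features(new_essay)])[0]), 2))

In a production system such as the e-rater engine, the features are far richer (grammar, usage, mechanics, style, discourse and vocabulary measures) and the model is trained and validated against large pools of human-scored essays; the sketch above only shows the general feature-extraction-plus-regression pattern.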

ETS has an active research agenda that investigates new genres (digital writing formats, such as blogs) and the development of linguistic features suitable for modeling aspects of content and argumentation that reflect additional components of writing quality, such as:

  • metrics of text coherence
  • organization of claims and evidence
  • the writer's stance toward the test question
  • the identification of particular topics addressed in the response
  • the use of supporting facts from external sources

As this research continues, the agenda aims to grow the array of writing genres that can be addressed; to work with new genres and features that support the writing of English learners (e.g., correct use of articles, prepositions and collocations); and to advance the state of the art in evaluating the quality of argumentation across different genres and modes of discourse.
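
As one concrete illustration of the coherence direction listed above, the following hypothetical Python sketch scores a response by how strongly adjacent sentences overlap in the content words they mention, a crude stand-in for entity-based coherence measures. The tokenization, stop list and overlap measure are all simplifying assumptions, not ETS's actual features.

    # Toy coherence metric: average lexical overlap between adjacent sentences.
    import re

    STOP = {"the", "a", "an", "and", "but", "or", "of", "to", "in",
            "is", "are", "it", "this", "that", "with", "for"}

    def content_words(sentence):
        """Rough stand-in for entity/content-word detection."""
        return {w.lower() for w in re.findall(r"[A-Za-z]+", sentence)
                if w.lower() not in STOP}

    def coherence_score(text):
        """Mean Jaccard overlap of content words between adjacent sentences."""
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        if len(sentences) < 2:
            return 0.0
        overlaps = []
        for prev, curr in zip(sentences, sentences[1:]):
            a, b = content_words(prev), content_words(curr)
            overlaps.append(len(a & b) / len(a | b) if (a | b) else 0.0)
        return sum(overlaps) / len(overlaps)

    print(coherence_score(
        "Recycling reduces waste. Waste reduction lowers landfill use. "
        "Landfills also emit methane."))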

Featured Publications

Below are some recent or significant publications that our researchers have authored on the subject of automated scoring of writing quality.

2014

  • Using Writing Process and Product Features to Assess Writing Quality and Explore How Those Features Relate to Other Literacy Tasks
    P. Deane
    ETS Research Report No. RR-14-03

    This report explores automated methods for measuring features of student writing and for determining their relationship to writing quality and to other aspects of literacy, such as reading test scores. The e-rater® automatic essay scoring system and keystroke logging are a central focus. View citation record >

  • Predicting Grammaticality on an Ordinal Scale
    M. Heilman, A. Cahill, N. Madnani, M. Lopez, M. Mulholland, & J. Tetreault
    Paper in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), pp. 174–180

    This paper describes a system for predicting the grammaticality of sentences on an ordinal scale. Such a system could be used in educational applications such as essay scoring. View citation record >

  • An Explicit Feedback System for Preposition Errors based on Wikipedia Revisions
    N. Madnani & A. Cahill
    Paper in Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 79–88

    In this paper, the authors describe a novel tool they developed to provide automated explicit feedback to language learners based on data mined from Wikipedia revisions. They demonstrate how the tool works for the task of identifying preposition selection errors. View citation record >

  • Difficult Cases: From Data to Learning and Back
    B. Beigman Klebanov & E. Beigman
    Paper in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), pp. 390–396

    This paper addresses cases in annotated datasets that are difficult to annotate reliably. Using a semantic annotation task, the authors provide empirical evidence that difficult cases can thwart supervised machine learning on the one hand and provide valuable insights into the characteristics of the data representation chosen for the task on the other. View citation record >

  • Different Texts, Same Metaphors: Unigrams and Beyond
    B. Beigman Klebanov, C. Leong, M. Heilman, & M. Flor
    Paper in Proceedings of the Second Workshop on Metaphor in NLP, pp. 11–17

    This paper describes the development of a supervised learning system to classify all content words in a running text as either being used metaphorically or not. View citation record >

  • Applying Argumentation Schemes for Essay Scoring
    Y. Song, M. Heilman, B. Beigman Klebanov, & P. Deane
    Paper in Proceedings of the First Workshop on Argumentation Mining, pp. 69–78

    In this paper, the authors develop an annotation approach based on the theory of argumentation schemes to analyze the structure of arguments and implement an NLP system for automatically predicting where critical questions are raised in essays. View citation record >

2013

  • Handbook of Automated Essay Evaluation: Current Applications and New Directions
    M. D. Shermis & J. Burstein

    This comprehensive, interdisciplinary handbook reviews the latest methods and technologies used in automated essay evaluation (AEE). New York: Routledge. View citation record >

  • Robust Systems for Preposition Error Correction Using Wikipedia Revisions
    A. Cahill, N. Madnani, J. Tetreault, & D. Napolitano
    In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 507–517, Atlanta, Ga.

    This paper addresses the lack of generalizability of preposition error correction systems across different test sets. The authors present a large new annotated corpus for training such systems and illustrate its use by evaluating systems trained on it against three separate test sets. View citation record >

  • Detecting Missing Hyphens in Learner Text
    A. Cahill, M. Chodorow, S. Wolff, & N. Madnani
    In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 300–305, Atlanta, Ga.

    This paper presents a method for automatically detecting missing hyphens in English text. View citation record >

  • Automated Scoring of a Summary-Writing Task Designed to Measure Reading Comprehension
    N. Madnani, J. Burstein, J. Sabatini, & T. O'Reilly
    In Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, Atlanta, Ga., June 13, 2013

    This paper introduces a cognitive framework for measuring reading comprehension that includes the use of novel summary writing tasks. View citation record >

  • The e-rater® Automated Essay Scoring System 
    J. Burstein, J. Tetreault, & N. Madnani
    In M. D. Shermis & J. Burstein (Eds.), Handbook of Automated Essay Evaluation: Current Applications and New Directions. New York: Routledge.

    This handbook chapter includes a description of the e-rater automated essay scoring system and its NLP-centered approach, and a discussion of the system's applications and development efforts for current and future educational settings. View citation record >

2010

  • Using Parse Features for Preposition Selection and Error Detection
    J. Tetreault, J. Foster, & M. Chodorow
    Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010)
    Association for Computational Linguistics

    This paper evaluates the effect of adding parse features aimed at improving the detection of preposition errors in the writing of speakers of English as a second language. View citation record >

  • Progress and New Directions in Technology for Automated Essay Evaluation
    J. Burstein & M. Chodorow
    In R. Kaplan (Ed.), The Oxford Handbook of Applied Linguistics (2nd ed., pp. 487–497). Oxford University Press.

    This ETS-authored work is part of a 39-chapter volume that covers topics in applied linguistics with the goal of providing a survey of the field, showing the many connections among its subdisciplines, and exploring likely directions of its future development. View citation record >

  • Using Entity-Based Features to Model Coherence in Student Essays
    J. Burstein, J. Tetreault, & S. Andreyev
    Human language technologies: The 2010 Annual Conference of the North American Chapter of the ACL, pp. 681–684
    Association for Computational Linguistics

    This paper describes a study in which researchers combined an algorithm that tracks what computational linguists refer to as entities (nouns and pronouns) with natural language processing features related to grammar errors and word usage, with the aim of creating applications that can evaluate evidence of coherence in essays. View citation record >

2006

  • Automated Essay Scoring With e-rater v.2.0
    Y. Attali & J. Burstein
    Journal of Technology, Learning, and Assessment, Vol. 4, No. 3

    This article describes Version 2 of ETS's e-rater essay scoring engine. The authors present evidence on the validity and reliability of the scores that the system generates. View citation record >

Find More Articles

View more research publications related to automated scoring of writing quality.

Read More from Our Researchers

View a list of current ETS researchers and their work.