
Automated Scoring of Writing Quality

The e-rater® automated writing evaluation engine is ETS's patented capability for automated evaluation of expository, persuasive and summary essays. Multiple assessment programs use the engine, which works in combination with human raters to score the writing sections of the TOEFL iBT® and GRE® tests.

The e-rater engine is also used as the sole source of scores in learning contexts, such as formative use in a classroom setting with ETS's Criterion® online essay evaluation system. In the Criterion application, the engine generates individualized feedback for students, addressing an increasingly important need for automated essay evaluation that is reliable, valid, fast and flexible.

The e-rater engine features related to writing quality include:

  • errors in grammar (e.g., subject-verb agreement)
  • usage (e.g., preposition selection)
  • mechanics (e.g., capitalization)
  • style (e.g., repetitious word use)
  • discourse structure (e.g., presence of a thesis statement, main points)
  • vocabulary usage (e.g., relative sophistication of vocabulary)
  • sentence variety
  • source use
  • discourse coherence quality
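
To make one of these feature types concrete, the following is a minimal sketch of a style check for repetitious word use. It is only an illustration and is not how the e-rater engine computes its features: the function name, the stop-word list and both thresholds are assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this sketch).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
    "that", "this", "was", "for", "on", "with", "as", "are", "be",
}


def repeated_words(text, min_ratio=0.05, min_count=3):
    """Return content words that occur at least min_count times and make up
    at least min_ratio of all content words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    content = [t for t in tokens if t not in STOP_WORDS]
    if not content:
        return {}
    counts = Counter(content)
    return {
        word: count
        for word, count in counts.items()
        if count >= min_count and count / len(content) >= min_ratio
    }


essay = "The test was good. The test was long. The test was fair and the test was hard."
print(repeated_words(essay))
```

A production system would likely normalize word forms and weigh repetition against essay length and topic vocabulary; this sketch only counts surface tokens.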

The e-rater engine can also automatically detect responses that are off-topic or otherwise anomalous and, therefore, should not be scored.
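
As a rough illustration of off-topic detection, the sketch below compares the vocabulary of a response with that of the prompt using cosine similarity over word counts. This is a generic technique assumed for the example, not a description of the e-rater engine's method; the function names and the threshold value are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def looks_off_topic(prompt, response, threshold=0.1):
    """Flag a response whose vocabulary overlap with the prompt falls below
    an assumed threshold."""
    similarity = cosine_similarity(bag_of_words(prompt), bag_of_words(response))
    return similarity < threshold


prompt = "Should schools require students to wear uniforms?"
on_topic = "Schools should not require uniforms because students express themselves through clothing."
off_topic = "My favorite recipe uses tomatoes, basil, and garlic."
print(looks_off_topic(prompt, on_topic), looks_off_topic(prompt, off_topic))
```

Simple word overlap misses paraphrase and can be fooled by copying the prompt, so a flag like this is best read as a signal for closer review rather than a final decision.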

ETS has an active research agenda that investigates new automated scoring features for genres of writing beyond traditional essays. This work now includes the source-based and argumentative writing tasks found on assessments, as well as lab reports and social science papers.

Featured Publications

Below are some recent or significant publications that our researchers have authored that highlight research in automated writing evaluation.

  • Content Importance Models for Scoring Writing From Sources
    B. Beigman Klebanov, N. Madnani, J. Burstein, & S. Somasundaran
    Paper in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), pp. 247–252

    This paper describes an integrative summarization task used in an assessment of English proficiency for nonnative speakers applying to higher education institutions in the United States. Researchers evaluate a variety of content importance models that help predict which parts of the source material the test taker would need to include in a successful response.

  • Using Writing Process and Product Features to Assess Writing Quality and Explore How Those Features Relate to Other Literacy Tasks
    P. Deane
    ETS Research Report No. RR-14-03

    This report explores automated methods for measuring features of student writing and determining their relationship to writing quality and other features of literacy, such as reading test scores. The e-rater automated essay-scoring system and keystroke logging are a central focus.

  • Predicting Grammaticality on an Ordinal Scale
    M. Heilman, A. Cahill, N. Madnani, M. Lopez, M. Mulholland, & J. Tetreault
    Paper in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), pp. 174–180

    This paper describes a system for predicting the grammaticality of sentences on an ordinal scale. Such a system could be used in educational applications such as essay scoring.

  • An Explicit Feedback System for Preposition Errors based on Wikipedia Revisions
    N. Madnani & A. Cahill
    Paper in Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 79–88

    In this paper, the authors describe a novel tool they developed to provide automated explicit feedback to language learners based on data mined from Wikipedia revisions. They demonstrate how the tool works for the task of identifying preposition selection errors.

  • Difficult Cases: From Data to Learning and Back
    B. Beigman Klebanov & E. Beigman
    Paper in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), pp. 390–396

    This paper addresses cases in annotated datasets that are difficult to annotate reliably. Using a semantic annotation task, the authors provide empirical evidence that difficult cases can thwart supervised machine learning on the one hand and provide valuable insights into the characteristics of the data representation chosen for the task on the other.

  • Different Texts, Same Metaphors: Unigrams and Beyond
    B. Beigman Klebanov, C. Leong, M. Heilman, & M. Flor (2014)
    Paper in Proceedings of the Second Workshop on Metaphor in NLP, pp. 11–17

    This paper describes the development of a supervised learning system to classify all content words in a running text as either being used metaphorically or not.

  • Lexical Chaining for Measuring Discourse Coherence Quality in Test-taker Essays
    S. Somasundaran, J. Burstein, & M. Chodorow
    Paper in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 950–961 (Dublin, Ireland, August 23–29, 2014)

    Researchers investigated a technique known as lexical chaining for measuring discourse coherence quality in test-taker essays. In this paper, they describe the contexts in which they achieved the best system performance.

  • Applying Argumentation Schemes for Essay Scoring
    Y. Song, M. Heilman, B. Beigman Klebanov, & P. Deane
    Paper in Proceedings of the First Workshop on Argumentation Mining, pp. 69–78

    In this paper, the authors develop an annotation approach based on the theory of argumentation schemes to analyze the structure of arguments and implement an NLP system for automatically predicting where critical questions are raised in essays.

  • Using Parse Features for Preposition Selection and Error Detection
    J. Tetreault, J. Foster, & M. Chodorow
    Proceedings of the 2010 Association for Computational Linguistics (ACL 2010)
    Association for Computational Linguistics

    This paper evaluates the effect of adding features that aim to improve the detection of preposition errors in writing from speakers of English as a second language.

  • Progress and New Directions in Technology for Automated Essay Evaluation
    J. Burstein & M. Chodorow
    The Oxford Handbook of Applied Linguistics, 2nd Edition, pp. 487–497
    Oxford University Press

    This ETS-authored work is part of a 39-chapter volume that covers topics in applied linguistics with the goal of providing a survey of the field, showing the many connections among its subdisciplines, and exploring likely directions of its future development.

  • Using Entity-Based Features to Model Coherence in Student Essays
    J. Burstein, J. Tetreault, & S. Andreyev
    Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL, pp. 681–684
    Association for Computational Linguistics

    This paper describes a study in which researchers combined an algorithm for tracking what computational linguists refer to as entities (nouns and pronouns) with natural language processing features related to grammar errors and word usage, with the aim of creating applications that can evaluate evidence of coherence in essays.



