Automated Scoring of Written Content

When test takers respond correctly to multiple-choice questions about a concept or a passage they have read, they show that they can recognize and select the answer from a list of options. It is hard to say with certainty, however, whether they could have produced the correct answer if it had not been offered as one of the options. Asking students to write a sentence or to "fill in the blank" tests their understanding more directly. So, if the costs of grading and the time needed to return results can be kept under control, short written answers of a few sentences are often preferable to multiple-choice responses.

ETS has developed a technology, the c-rater™ automated scoring engine, that can accurately score the content of short written responses. The c-rater engine has been validated on responses from multiple testing programs and in many different content areas, including science, reading comprehension and history.

The c-rater engine uses natural language processing to assess whether a student response contains text that corresponds to the concepts listed in the rubric for an item. To identify these concepts, the engine applies a sequence of natural language processing steps (illustrated in the sketch after this list), including:

  • correcting students' spelling
  • determining the grammatical structure of each sentence
  • resolving pronoun reference
  • reasoning about words and their senses
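
The sketch below walks a toy response through these steps in order. It is only an illustration of the kind of processing involved, not the c-rater engine's implementation: the function bodies are deliberately simple stand-ins, and the example response and rubric concept are invented.

    # Illustrative pipeline skeleton. The stages mirror the steps listed above,
    # but each implementation is a simple stand-in for the real component.
    import re

    def correct_spelling(text):
        # Stand-in: a real system would consult a spelling model or dictionary.
        corrections = {"becuase": "because", "recieve": "receive"}
        return " ".join(corrections.get(w, w) for w in text.split())

    def parse_sentences(text):
        # Stand-in for syntactic parsing: split into sentences and tokens.
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        return [s.split() for s in sentences if s]

    def resolve_pronouns(tokens, antecedent):
        # Stand-in: replace third-person pronouns with a supplied antecedent.
        pronouns = {"it", "they", "this"}
        return [antecedent if t.lower() in pronouns else t for t in tokens]

    def expresses_concept(tokens, concept_keywords):
        # Stand-in for reasoning about words and senses: keyword coverage.
        return concept_keywords <= {t.lower().strip(".,") for t in tokens}

    response = "The plant grows becuase it gets more light."
    rubric_concept = {"plant", "light"}          # invented rubric concept

    text = correct_spelling(response)
    for sentence in parse_sentences(text):
        sentence = resolve_pronouns(sentence, "plant")
        if expresses_concept(sentence, rubric_concept):
            print("Concept found in:", " ".join(sentence))

In a production engine each of these stand-ins would be replaced by a full natural language processing component; the point of the sketch is only the order and division of labor among the steps.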

The c-rater engine's deep linguistic analysis allows it to avoid being misled by responses that use the right words in the wrong context, a kind of response that students commonly produce. Purely word-based statistical approaches, such as latent semantic analysis, lack the grammatical information needed to make this distinction and so are frequently misled.
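
To make the contrast concrete, the toy comparison below (not ETS code, and not how the c-rater engine represents responses internally) shows two answers that use exactly the same words but reverse the roles of "oxygen" and "carbon dioxide." A purely word-based comparison scores them as identical, while a representation that keeps verb-argument structure exposes the difference.

    # Two responses with identical words but opposite meanings.
    from collections import Counter
    from math import sqrt

    key      = "plants absorb carbon dioxide and release oxygen"
    response = "plants absorb oxygen and release carbon dioxide"   # same words, wrong science

    def cosine(a, b):
        # Cosine similarity between two bags of words.
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        return dot / (sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values())))

    print(cosine(Counter(key.split()), Counter(response.split())))   # 1.0 -- indistinguishable

    # A grammar-aware representation keeps (verb, object) relations,
    # so the reversal becomes visible.
    key_relations      = {("absorb", "carbon dioxide"), ("release", "oxygen")}
    response_relations = {("absorb", "oxygen"), ("release", "carbon dioxide")}
    print(key_relations == response_relations)                       # False -- mismatch detected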

Current research focuses on extending the range of applications for automated short-answer assessment. In the classroom or in online courses, the computer can provide not only a score, but also feedback on particular aspects of the student's performance. Because this feedback will not always be perfect, it must be used judiciously, but studies have shown that automated feedback can be a valuable complement to the work of a human instructor.
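
As a rough sketch of how such feedback could be assembled (the concept names and messages below are invented for illustration, not drawn from any ETS rubric), a scoring engine that reports which rubric concepts it found can map the missing ones to targeted prompts:

    # Hypothetical concept-level feedback: map each rubric concept that the
    # engine did not detect in the response to a short revision prompt.
    RUBRIC_FEEDBACK = {
        "light_needed":   "Say what the plant needs in order to grow.",
        "role_of_leaves": "Explain what part the leaves play in the process.",
    }

    def feedback_for(detected_concepts):
        # One message per rubric concept the response did not cover.
        return [message for concept, message in RUBRIC_FEEDBACK.items()
                if concept not in detected_concepts]

    # Example: the engine detected only the "light_needed" concept.
    for message in feedback_for({"light_needed"}):
        print(message)

A table like this keeps the pedagogical wording under the control of content experts while the detection of concepts remains automated.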

Featured Publications

Below are some recent or significant publications that our researchers have authored on the subject of automated scoring of written content.

2012

  • Towards Effective Tutorial Feedback for Explanation Questions: A Dataset and Baselines
    M. O. Dzikovska, R. D. Nielsen, and C. Brew
    Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 200–210
    Association for Computational Linguistics

    The authors propose a new shared task for grading student answers, where the goal is to enable targeted and flexible feedback in the form of a tutorial dialogue. They suggest that this corpus will be of interest to researchers working on textual entailment and will stimulate new developments both in natural language processing for tutorial dialogue systems and in textual entailment, contradiction detection, and other techniques relevant to a variety of computational linguistics tasks.

  • Identifying High-Level Organizational Elements in Argumentative Discourse
    N. Madnani, M. Heilman, J. Tetreault, and M. Chodorow
    Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)
    Association for Computational Linguistics

    This paper discusses argumentative discourse and the benefit of differentiating between language that expresses claims and evidence and language that organizes those claims and pieces of evidence. The authors propose an automated approach to detecting high-level organizational elements in argumentative discourse that combines a rule-based system with a probabilistic sequence model.

  • Measuring the Use of Factual Information in Test-Taker Essays
    B. Beigman-Klebanov and D. Higgins
    Proceedings of the 7th Workshop on the Innovative Use of NLP for Building Educational Applications, pp. 63–72
    Association for Computational Linguistics

    The authors study how to measure the use of factual information in test-taker essays and how effective such measures are in predicting essay scores. The article also discusses implications for the development of automated essay scoring systems.

2009

  • Automating Model Building in c-rater
    J. Z. Sukkarieh and S. Stoyanchev
    Proceedings of TextInfer: The ACL/IJCNLP 2009 Workshop on Applied Textual Inference, pp. 61–69
    Association for Computational Linguistics

    In this paper, the authors describe an approach to automating the tasks involved in building the response models used by ETS's c-rater engine. The approach saves, on average, 12 hours of human intervention per item.

  • Towards Agile and Test-Driven Development in NLP Applications
    J. Z. Sukkarieh and J. Kamal
    Proceedings of the NAACL HLT Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pp. 42–44
    Association for Computational Linguistics

    In this paper, the authors describe an ideal environment for developing natural language processing applications, drawing on agile and test-driven development practices and using ETS's c-rater engine as an example.

  • c-rater: Automatic Content Scoring for Short Constructed Responses
    J. Z. Sukkarieh and J. Blackmore
    Proceedings of the 22nd International FLAIRS Conference
    Association for the Advancement of Artificial Intelligence

    This paper describes some of the major improvements made during the development of ETS's c-rater automated content scoring engine.
