Automated Scoring of Written Content
When test takers correctly answer multiple-choice questions designed to gauge whether they understand a concept or the ideas in a passage they have read, they have shown that they can recognize and select the answer from a list of options. It is hard to say with certainty, however, whether they could have produced the correct answer had it not been among the options. Allowing students to write a sentence or to "fill in the blank" tests their understanding more directly. So, if the cost of grading and the time needed to return results can be controlled, short written answers consisting of a few sentences of text would often be preferable to multiple-choice responses.
ETS has developed a technology, the c-rater™ automated scoring engine, that can accurately score the content of short written responses. The c-rater engine has been validated on responses from multiple testing programs and in many different content areas, including science, reading comprehension and history.
The c-rater engine's technology uses natural language processing to assess whether a student response contains text that corresponds to the concepts listed in the rubric for an item. To identify these concepts, the c-rater engine applies a sequence of natural language processing steps, including:
- correcting students' spelling
- determining the grammatical structure of each sentence
- resolving pronoun reference
- reasoning about words and their senses
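As a rough illustration of how steps like these can feed a rubric match, here is a minimal Python sketch. It is not the c-rater engine's implementation: the rubric, the synonym table, the function names and the 0.75 similarity cutoff are all hypothetical, and only spelling correction and a crude form of word-sense normalization are shown. Grammatical analysis and pronoun resolution are left out entirely.

```python
import difflib

# Hypothetical rubric: each concept lists the canonical terms it requires.
RUBRIC = {
    "plants_need_light": {"plant", "need", "light"},
    "light_gives_energy": {"light", "provide", "energy"},
}

# Hypothetical synonym table mapping surface words to one canonical term.
CANONICAL = {
    "plants": "plant", "plant": "plant",
    "need": "need", "needs": "need", "require": "need", "requires": "need",
    "light": "light", "sunlight": "light",
    "provide": "provide", "provides": "provide",
    "give": "provide", "gives": "provide",
    "energy": "energy",
}


def correct_spelling(word, vocabulary, cutoff=0.75):
    """Return the closest known word, or the word itself if nothing is close."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def normalize(response):
    """Lowercase, strip simple punctuation, fix spelling, map to canonical terms."""
    tokens = response.lower().replace(".", " ").replace(",", " ").split()
    vocabulary = list(CANONICAL)
    corrected = [correct_spelling(token, vocabulary) for token in tokens]
    return {CANONICAL[token] for token in corrected if token in CANONICAL}


def matched_concepts(response):
    """List the rubric concepts whose required terms all appear in the response."""
    terms = normalize(response)
    return sorted(name for name, required in RUBRIC.items() if required <= terms)


if __name__ == "__main__":
    print(matched_concepts("Plants requre sunlite."))
    print(matched_concepts("Sunlight gives plants energy"))
```

Because this sketch compares sets of words, it ignores word order and sentence structure, which is why a real system also needs the parsing and pronoun-resolution steps listed above.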
The c-rater engine's use of deep linguistic analysis allows it to avoid being misled by responses that use the right words in the wrong context, and students commonly produce responses of exactly this type. Purely statistical word-based approaches, such as latent semantic analysis, do not have access to the grammatical information that is needed, so they will frequently be misled.
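The point about right words in the wrong context can be made concrete with a small Python sketch. The two example sentences and the toy `subject_verb_object` function are assumptions made for illustration; the function only handles sentences of the form "the SUBJECT VERB the OBJECT" and stands in for a real syntactic parser, which is not shown here.

```python
from collections import Counter


def bag_of_words(sentence):
    """Word counts only: all information about word order is discarded."""
    return Counter(sentence.lower().split())


def subject_verb_object(sentence):
    """Toy stand-in for a parser; assumes 'the SUBJECT VERB the OBJECT'."""
    words = [word for word in sentence.lower().split() if word != "the"]
    subject, verb, obj = words
    return (subject, verb, obj)


rubric_answer = "the earth orbits the sun"
wrong_answer = "the sun orbits the earth"

# A word-count comparison cannot tell the two responses apart.
same_words = bag_of_words(rubric_answer) == bag_of_words(wrong_answer)

# A comparison that keeps grammatical roles can.
same_structure = (
    subject_verb_object(rubric_answer) == subject_verb_object(wrong_answer)
)

if __name__ == "__main__":
    print("identical bags of words:", same_words)
    print("identical subject-verb-object triples:", same_structure)
```

The two responses contain exactly the same words, so any method that looks only at word counts treats them as equivalent, while the role-aware comparison separates the correct response from the incorrect one.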
Current research focuses on extending the range of applications for automated short-answer assessment. In the classroom, or in online classes, the computer can be set up to provide not only a score but also feedback on particular aspects of the student's performance. Because the feedback will not always be perfect, it must be used judiciously, but studies have shown that automated feedback can be a valuable complement to the work of a human instructor.
Below are some recent or significant publications that our researchers have authored on the subject of automated scoring of written content.
Towards Effective Tutorial Feedback for Explanation Questions: A Dataset and Baselines
M. O. Dzikovska, R. D. Nielsen, and C. Brew
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 200–210
Association for Computational Linguistics
The authors propose a new shared task for grading student answers, where the goal is to enable targeted and flexible feedback in the form of a tutorial dialogue. They suggest that this corpus will be of interest to researchers working in textual entailment and will stimulate new developments both in natural language processing for tutorial dialogue systems and in textual entailment, contradiction detection and other techniques of interest for a variety of computational linguistics tasks.
Identifying High-Level Organizational Elements in Argumentative Discourse
N. Madnani, M. Heilman, J. Tetreault, and M. Chodorow
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)
Association for Computational Linguistics
This paper discusses argumentative discourse and the benefit of differentiating between language that expresses claims and evidence and language that is used to organize such claims and pieces of evidence. The authors propose an automated approach, combining a rule-based system with a probabilistic sequence model, for detecting high-level organizational elements in argumentative discourse.
Measuring the Use of Factual Information in Test-Taker Essays
B. Beigman-Klebanov and D. Higgins
Proceedings of the 7th Workshop on the Innovative Use of NLP for Building Educational Applications, pp. 63–72
The authors studied how to measure the use of factual information in test-taker essays and how to assess its effectiveness when predicting essay scores. The article also discusses implications for development of automated essay scoring systems.
Effect of Immediate Feedback and Revision on Psychometric Properties of Open-Ended GRE® Subject Test Items
Y. Attali and D. Powers
ETS Research Report No. RR-08-21
In this study, registered examinees for the GRE® Subject Tests in Biology and Psychology participated in a Web-based experiment where they answered open-ended questions that were automatically scored by the c-rater™ scoring engine. Study participants received immediate feedback and an opportunity to revise their answers.
Leveraging c-rater's Automated Scoring Capability for Providing Instructional Feedback for Short Constructed Responses
J. Z. Sukkarieh and E. Bolge
Lecture Notes in Computer Science: Vol. 5091. Proceedings of the 9th International Conference on Intelligent Tutoring Systems, ITS 2008, pp. 779–783
This paper describes ETS's c-rater engine and considers its potential as an instructional tool.
c-rater: Scoring of Short-Answer Questions
C. Leacock and M. Chodorow
Computers and the Humanities, Vol. 37, pp. 389–405
In this article, the authors describe the c-rater engine's use in two studies, one involving the National Assessment of Educational Progress (NAEP) and the other a statewide assessment in Indiana.