Automated Scoring of Written Content
When test takers answer a multiple-choice question correctly, whether it asks about a concept or about ideas in a passage they have read, they have shown only that they can recognize and select the answer from a list of options. It is hard to say with certainty whether they could have produced the correct answer if it had not been offered to them. Asking students to write a sentence or fill in a blank provides a more direct test of their understanding. So, if the costs of grading and the time needed to return results can be kept under control, short written answers of a few sentences would often be preferable to multiple-choice responses.
ETS has developed a technology, the c-rater™ automated scoring engine, that can accurately score the content of short written responses. The c-rater engine has been validated on responses from multiple testing programs and in many different content areas, including science, reading comprehension and history.
The c-rater engine uses natural language processing to assess whether a student response contains text that corresponds to the concepts listed in the rubric for an item. To identify these concepts, the c-rater engine applies a sequence of natural language processing steps (illustrated in the sketch after this list), including:
- correcting students' spelling
- determining the grammatical structure of each sentence
- resolving pronoun reference
- reasoning about words and their senses
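To make the overall flow concrete, the sketch below shows a drastically simplified pipeline of this kind in Python. It is not ETS's c-rater implementation: it assumes spaCy's small English model for parsing, uses a toy spelling dictionary and a single hypothetical rubric concept, and omits pronoun resolution and word-sense reasoning entirely.

```python
# A toy illustration of the kind of pipeline described above; it is not
# ETS's c-rater implementation. The rubric format, the example concept,
# and all function names are hypothetical.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English model: tagging + dependency parsing

# Hypothetical rubric concept "heat causes expansion", written as
# (subject lemma, verb lemma, object lemma).
RUBRIC_CONCEPTS = [("heat", "cause", "expansion")]

def correct_spelling(text: str) -> str:
    """Stand-in for a real spelling-correction step (toy replacement dictionary)."""
    fixes = {"expantion": "expansion", "becuase": "because"}
    return " ".join(fixes.get(word.lower(), word) for word in text.split())

def expresses_concept(sent, concept) -> bool:
    """True if the parsed sentence contains a subject-verb-object triple matching the concept."""
    subj_lemma, verb_lemma, obj_lemma = concept
    for token in sent:
        if token.pos_ == "VERB" and token.lemma_ == verb_lemma:
            subjects = {c.lemma_ for c in token.children if c.dep_ in ("nsubj", "nsubjpass")}
            objects = {c.lemma_ for c in token.children if c.dep_ in ("dobj", "obj", "attr")}
            if subj_lemma in subjects and obj_lemma in objects:
                return True
    return False

def score_response(response: str) -> int:
    """Score = number of rubric concepts expressed somewhere in the response."""
    doc = nlp(correct_spelling(response))
    return sum(
        any(expresses_concept(sent, concept) for sent in doc.sents)
        for concept in RUBRIC_CONCEPTS
    )

print(score_response("The heat causes expantion of the metal."))  # expected 1: concept found
print(score_response("The expansion causes heat in the metal."))  # expected 0: same words, wrong roles
```

Even this toy version illustrates the key design choice: a rubric concept is matched against grammatical relations in the parse, not against the mere presence of the right words.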
The c-rater engine's use of deep linguistic analysis helps it avoid being misled by responses that use the right words in the wrong context, a kind of response that students commonly produce. Purely statistical, word-based approaches such as latent semantic analysis do not have access to the grammatical information needed to make this distinction, and so are frequently misled.
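The following minimal illustration (plain Python, with hypothetical example sentences) shows why word-based representations struggle here: two responses that use identical words with opposite grammatical roles receive identical bag-of-words vectors, and latent semantic analysis, which starts from such word-occurrence counts before applying dimensionality reduction, cannot tell them apart either.

```python
# A small, self-contained illustration (not a description of any production
# system): two responses with the same words but opposite roles are
# indistinguishable to a purely word-based representation.
import math
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercased word counts, ignoring punctuation at the ends of words."""
    return Counter(word.strip(".,").lower() for word in text.split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

correct_response = "Heat causes the expansion of the metal."
reversed_roles = "The expansion of the metal causes heat."

similarity = cosine(bag_of_words(correct_response), bag_of_words(reversed_roles))
print(similarity)  # 1.0: the word-count vectors are identical
```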
Current research focuses on extending the range of applications for automated short-answer assessment. In the classroom or in online courses, the computer can provide not only a score but also feedback on particular aspects of the student's performance. Because the feedback will not always be perfect, it must be used judiciously, but studies have shown that automated feedback can be a valuable complement to the work of a human instructor.
Featured Publications
Below are some recent or significant publications that our researchers have authored on the subject of automated scoring of written content.
2012
- Towards Effective Tutorial Feedback for Explanation Questions: A Dataset and Baselines
  M. O. Dzikovska, R. D. Nielsen, and C. Brew
  Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 200–210
  Association for Computational Linguistics
  The authors propose a new shared task for grading student answers, with the goal of enabling targeted and flexible feedback in the form of a tutorial dialogue. They suggest that the corpus will be of interest to researchers working on textual entailment and will stimulate new developments both in natural language processing for tutorial dialogue systems and in textual entailment, contradiction detection, and related techniques relevant to a variety of computational linguistics tasks.
- Identifying High-Level Organizational Elements in Argumentative Discourse
  N. Madnani, M. Heilman, J. Tetreault, and M. Chodorow
  Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)
  Association for Computational Linguistics
  This paper discusses argumentative discourse and the benefit of distinguishing language that expresses claims and evidence from language that organizes those claims and pieces of evidence. The authors propose an automated approach to detecting high-level organizational elements in argumentative discourse that combines a rule-based system with a probabilistic sequence model.
- Measuring the Use of Factual Information in Test-Taker Essays
  B. Beigman Klebanov & D. Higgins
  Proceedings of the 7th Workshop on the Innovative Use of NLP for Building Educational Applications, pp. 63–72
  The authors studied how to measure the use of factual information in test-taker essays and how to assess its effectiveness in predicting essay scores. The article also discusses implications for the development of automated essay scoring systems.
2010
- Building a Textual Entailment Suite for Evaluating Content Scoring Technologies
  J. Z. Sukkarieh & E. Bolge
  Proceedings of the Seventh International Conference on Language Resources and Evaluation, pp. 3149–3156
  European Language Resources Association
  This paper describes a methodology and tools that ETS designed to evaluate and compare results among different automated scoring technologies.
- Maximum Entropy for the Automatic Content Scoring of Free-Text Responses
  J. Z. Sukkarieh
  Proceedings of the 30th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2010)
  American Institute of Physics
  This paper describes the application of a technique known as maximum entropy modeling, or MaxEnt, to the task of using natural language processing to evaluate the content of writing.
2009
- Automating Model Building in c-rater
  J. Z. Sukkarieh & S. Stoyanchev
  Proceedings of TextInfer: The ACL/IJCNLP 2009 Workshop on Applied Textual Inference, pp. 61–69
  Association for Computational Linguistics
  In this paper, researchers describe an approach to automating tasks related to the modeling of different test responses used in ETS's c-rater engine. This approach saves, on average, 12 hours of human intervention per item.
- Towards Agile and Test-Driven Development in NLP Applications
  J. Z. Sukkarieh & J. Kamal
  Proceedings of the NAACL HLT Workshop on Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pp. 42–44
  Association for Computational Linguistics
  In this paper, the authors describe the ideal environment for developing natural language processing applications, using ETS's c-rater engine as an example.
- c-rater: Automatic Content Scoring for Short Constructed Responses
  J. Z. Sukkarieh & J. Blackmore
  Proceedings of the 22nd International FLAIRS Conference
  Association for the Advancement of Artificial Intelligence
  This paper describes some of the major improvements in the development of ETS's c-rater automated content scoring engine.
2008
- Effect of Immediate Feedback and Revision on Psychometric Properties of Open-Ended GRE® Subject Test Items
  Y. Attali & D. Powers
  ETS Research Report No. RR-08-21
  In this study, registered examinees for the GRE® Subject Tests in Biology and Psychology participated in a Web-based experiment in which they answered open-ended questions that were automatically scored by the c-rater™ scoring engine. Participants received immediate feedback and an opportunity to revise their answers.
- Leveraging c-rater's Automated Scoring Capability for Providing Instructional Feedback for Short Constructed Responses
  J. Z. Sukkarieh & E. Bolge
  Lecture Notes in Computer Science: Vol. 5091. Proceedings of the 9th International Conference on Intelligent Tutoring Systems, ITS 2008, pp. 779–783
  Springer
  This paper describes ETS's c-rater engine and considers its potential as an instructional tool.
2003
- c-rater: Scoring of Short-Answer Questions
  C. Leacock & M. Chodorow
  Computers and the Humanities, Vol. 37, pp. 389–405
  In this article, the authors describe the c-rater engine's use in two studies, one involving the National Assessment of Educational Progress (NAEP) and the other a statewide assessment in Indiana.