Proven Automated Scoring Capabilities
There is a long history of research on automated scoring of constructed-response (CR) and performance tasks. In recent years, the technology has begun to have a widespread impact on large-scale assessment, with its introduction into the GRE® test, the TOEFL® test and many state assessments. Automated scoring promises to drive fundamental improvements in the speed, cost and scalability of performance assessment, but a knowledgeable partner is needed to get the most out of the technology. ETS has been at the forefront of research into new techniques in automated scoring of performance tasks and can bring this knowledge base to bear in the real-time scoring of PARCC's and Smarter Balanced's large-scale assessments.
Based on more than 15 years of ETS-supported research and development, we currently offer a suite of automated scoring applications that can consistently score items, including the:
- e-rater® engine, which evaluates the quality of essays written on the computer
- c-rater™ engine, which detects the presence of particular expected concepts in a student response
- m-rater engine, which scores CR mathematics items for which the response is a number, an equation or mathematical expression, or a graph
- SpeechRater℠ engine, which scores spoken responses for pronunciation, fluency, vocabulary usage and prosodic features
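At a high level, feature-based scoring engines like these combine many automatically extracted measures of a response into a single score. The toy sketch below, with invented feature names and weights that reflect no actual ETS model, shows the general shape of such a weighted-sum scoring step:

```python
# Illustrative sketch only: a toy linear scoring model in the spirit of
# feature-based automated scoring. All feature names and weights here are
# invented for illustration and do not reflect any ETS engine.

def score_essay(features, weights, intercept=0.0):
    """Combine numeric writing features into a raw score via a weighted sum."""
    return intercept + sum(weights[name] * value for name, value in features.items())

# Hypothetical features extracted from an essay.
features = {
    "avg_word_length": 4.6,
    "grammar_errors_per_100_words": 1.2,
    "log_essay_length": 5.8,
}
# Hypothetical weights; note that error counts contribute negatively.
weights = {
    "avg_word_length": 0.5,
    "grammar_errors_per_100_words": -0.8,
    "log_essay_length": 0.9,
}

raw_score = score_essay(features, weights)
```

In practice, the raw score would then be mapped onto the item's reporting scale, and the weights would be estimated from human-scored responses rather than set by hand.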
ETS pursues a broad research and development agenda to continually improve its automated scoring technologies, drawing on new findings and techniques in fields such as natural language processing (NLP), machine learning and educational measurement. The agenda includes both long-term research goals to deepen the measurement of important constructs and near-term development goals for improving and enhancing existing capabilities.
Development Goals for Improving and Enhancing Existing Capabilities
Improvements to the e-rater engine that are currently under development include special scoring and feedback for English-language learners, improvement of grammatical error analysis features and the incorporation of context-sensitive spelling error identification.
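Context-sensitive spelling error identification means flagging a word that is spelled correctly but wrong for its context (for example, "their is" where "there is" was intended). The toy rule below is invented purely for illustration; production systems rely on statistical language models rather than hand-written patterns like this one:

```python
# Toy sketch of context-sensitive spelling detection. The single hand-written
# rule below is invented for illustration only; it is not how any ETS engine
# identifies such errors.

def flag_confusions(words):
    """Flag likely misuses such as 'their is' (should be 'there is').

    Returns a list of (index, suggestion) pairs.
    """
    flags = []
    for i, w in enumerate(words):
        # "their" followed by a verb of being is usually a misuse of "there".
        if w == "their" and i + 1 < len(words) and words[i + 1] in {"is", "are", "was"}:
            flags.append((i, "there"))
    return flags

# flag_confusions("their is a dog".split()) flags word 0 and suggests "there";
# "their dog barks" raises no flag, since "their" fits that context.
```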
One immediate focus of c-rater development is an update to its core NLP components to improve, for example, its ability to identify correct content despite the presence of spelling and grammar errors. Another area of interest is the creation of a method to identify references to points found in the stimulus, but not incorporated in the conceptual rubric.
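One simple way an engine can recognize an expected concept despite spelling errors is approximate string matching. The sketch below is a minimal illustration with an invented matching rule, not c-rater's actual method: it detects a rubric keyword whenever some word of the response is within a small edit distance of it.

```python
# Illustrative sketch: detecting an expected concept keyword in a student
# response while tolerating small spelling errors. The matching rule and
# tolerance are invented for illustration.

def edit_distance(a, b):
    """Levenshtein distance between two strings, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def concept_present(response, keyword, tolerance=2):
    """True if any word of the response is within `tolerance` edits of the keyword."""
    return any(edit_distance(w.lower(), keyword.lower()) <= tolerance
               for w in response.split())

# "fotosynthesis" is two edits from "photosynthesis", so the concept is detected.
```

A real content-scoring engine would layer syntactic and semantic analysis on top of this kind of lexical matching, but the sketch shows why minor misspellings need not block concept identification.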
Enhancements to the m-rater engine will include upgrades to the interface components into which equations and graphs can be entered. Besides general updates designed to improve interface usability and extend the range of mathematics that can be assessed, we will modify the graph editor so that users can set the viewing window and enter labels for the axes. (These new features will be scoreable.) We also will add support for m-rater advisories, which will flag cases in which a response is syntactically ill-formed, perhaps due to a typographical error by the student, and may need to be routed for special processing.
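To illustrate the two behaviors described above, equivalence checking and advisories for ill-formed input, the hypothetical sketch below compares a student expression to a key by evaluating both at sample values, and returns an advisory when the response cannot be parsed at all. It is a toy model, not m-rater's implementation:

```python
# Illustrative sketch only: numeric equivalence checking for a one-variable
# expression, plus an "advisory" outcome for syntactically ill-formed input.
# This is invented for illustration and is not m-rater's implementation.
import ast

def check_expression(student_expr, key_expr, samples=(0.5, 1.0, 2.0, 3.7)):
    """Return 'correct', 'incorrect', or 'advisory' for an expression in x."""
    try:
        ast.parse(student_expr, mode="eval")
    except SyntaxError:
        # Ill-formed response (perhaps a typo): route for special processing.
        return "advisory"
    for x in samples:
        # eval() is acceptable only in this toy sketch; a real system would
        # use a safe parser/evaluator for untrusted student input.
        if abs(eval(student_expr, {"x": x}) - eval(key_expr, {"x": x})) > 1e-9:
            return "incorrect"
    return "correct"

# "2*(x+1)" agrees with the key "2*x + 2" at every sample, so it is scored
# correct even though it is written in a different form.
```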
Finally, ETS's immediate development goals for the SpeechRater engine include generalizing it for use with a range of item types and extending it to provide formative feedback as well as summative scores.
Beyond these near-term development goals for our capabilities, ETS remains deeply involved in fundamental research intended to improve not only our own capabilities but the state of the art in NLP and speech analysis for assessment purposes. One strand of this research, which is particularly important given the emphasis that the CCSS place on student ability to demonstrate critical thinking in productive language, is next-generation content scoring. Current methods of scoring student writing, to the extent they are able to address written content at all, do so with methods that rely on meaningful units no larger than single words.
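A minimal example of such a word-level method is bag-of-words similarity, in which a response and a reference text are compared only through their individual word counts. The sketch below is invented for illustration; it also shows the limitation that motivates next-generation content scoring: two texts containing the same words in any order receive the same score, so meaning carried by phrasing and structure is lost.

```python
# Illustrative sketch: a bag-of-words content similarity measure, the kind of
# single-word-unit method the passage describes. Details are invented here.
from collections import Counter
from math import sqrt

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "the dog bit the man" and "the man bit the dog" use identical words, so a
# word-level measure scores them as identical despite opposite meanings.
```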
Numerous publications, authored or co-authored by ETS staff, discuss automated scoring in greater detail:
- Automated Scoring for the Assessment of Common Core Standards by multiple authors, including David M. Williamson, Randy E. Bennett and Stephen Lazer, discusses automated scoring as a means for helping to achieve valid and efficient measurement of abilities that are best measured by CR items.
- Automated Scoring of Constructed-Response Literacy and Mathematics Items by Randy E. Bennett identifies potential uses and challenges around automated scoring to help the consortia make better-informed planning and implementation decisions.
- A Validity-Based Approach to Quality Control and Assurance of Automated Scoring by Isaac Bejar proposes a validity-based approach to ensure the quality of scores based on automated means.
- A Three-Stage Approach to the Automated Scoring of Spontaneous Spoken Responses by Derrick Higgins, Klaus Zechner, Xiaoming Xi and David M. Williamson presents a description and evaluation of SpeechRater, a system for automated scoring of nonnative speakers' spoken English proficiency.
The K–12 Center at ETS offers a variety of resources on the assessment consortia, including summaries of their designs and future plans, videos and presentations.
ETS has assisted the NAEP program in introducing numerous psychometric and assessment design innovations over the years.