Automated Scoring of Speech
ETS's SpeechRater℠ engine is the only spoken response scoring application that is used to score spontaneous responses, in which the range of valid responses is open-ended rather than narrowly determined by the item stimulus. Test takers preparing to take the TOEFL® test have had their responses scored by the SpeechRater engine as part of the TOEFL Practice Online test since 2006. Competing capabilities focus on assessing low-level aspects of speech production such as pronunciation by using restricted tasks in order to increase reliability. The SpeechRater engine, by contrast, is based on a broad conception of the construct of English speaking proficiency, encompassing aspects of speech delivery (such as pronunciation and fluency), grammatical facility and higher-level abilities related to topical coherence and the progression of ideas.
The SpeechRater engine processes each response with an automated speech recognition system specially adapted for use with nonnative English. Based on the output of this system, natural language processing is used to calculate a set of features that define a "profile" of the speech on a number of linguistic dimensions, including fluency, pronunciation, vocabulary usage and prosody. A model of speaking proficiency is then applied to these features in order to assign a final score to the response. While the structure of this model is informed by content experts, it is also trained on a database of previously observed responses scored by human raters, in order to ensure that SpeechRater's scoring emulates human scoring as closely as possible. Furthermore, if the response is found to be unscorable due to audio quality or other issues, the SpeechRater engine can set it aside for special processing.
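The stages described above (recognition output, feature extraction, a filter for unscorable responses, and a scoring model) can be illustrated with a minimal sketch. This is not the SpeechRater implementation: the `Word` record, the three features, the thresholds, the weights and the 1–4 score range are all hypothetical placeholders chosen for illustration. In a real system the weights would be estimated from responses scored by human raters rather than set by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Word:
    """One recognized word (hypothetical stand-in for recognizer output)."""
    text: str
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # assumed recognizer confidence in [0, 1]


def is_scorable(words: List[Word], min_words: int = 5,
                min_confidence: float = 0.5) -> bool:
    """Flag responses that are too short or too poorly recognized to score."""
    if len(words) < min_words:
        return False
    mean_conf = sum(w.confidence for w in words) / len(words)
    return mean_conf >= min_confidence


def extract_features(words: List[Word]) -> Dict[str, float]:
    """Compute a small illustrative feature profile (needs >= 2 words)."""
    duration = max(words[-1].end - words[0].start, 1e-6)
    pauses = [b.start - a.end for a, b in zip(words, words[1:])]
    vocabulary = {w.text.lower() for w in words}
    return {
        "words_per_second": len(words) / duration,      # fluency proxy
        "mean_pause": sum(pauses) / len(pauses),        # fluency proxy
        "type_token_ratio": len(vocabulary) / len(words),  # vocabulary proxy
    }


# Placeholder linear model; real weights would be fit to human scores.
INTERCEPT = 1.0
WEIGHTS = {"words_per_second": 0.6, "mean_pause": -1.5, "type_token_ratio": 1.0}


def score_response(words: List[Word]) -> Optional[float]:
    """Return a score clipped to [1, 4], or None if the response is set aside."""
    if not is_scorable(words):
        return None
    features = extract_features(words)
    raw = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return max(1.0, min(4.0, raw))


if __name__ == "__main__":
    tokens = ["i", "think", "cities", "offer", "more", "jobs"]
    response = [Word(t, i * 0.5, i * 0.5 + 0.4, 0.9) for i, t in enumerate(tokens)]
    print(score_response(response))
    print(score_response(response[:2]))  # too short, so it is set aside
```

A response that fails the filter returns `None` instead of a score, mirroring the idea that unscorable responses are routed to special processing rather than scored.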
ETS's research agenda related to automated scoring of speech includes the development of more extensive Natural Language Processing (NLP) features to represent grammatical competencies and the discourse structure of spoken responses. The core capability is also being extended to a range of item types used in different assessments of English proficiency, from very restricted item types (such as passage read-alouds), through less restrictive items (such as summarization tasks), to fully open-ended items.
Below are some recent or significant publications that our researchers have authored on the subject of automated scoring of speech.
HALEF: An Open-Source Standard-Compliant Telephony-Based Modular Spoken Dialog System – A Review and an Outlook
D. Suendermann-Oeft, V. Ramanarayanan, M. Teckenbrock, F. Neutatz, & D. Schmidt
Paper in Proceedings of the IWSDS 2015, International Workshop on Spoken Dialog Systems
This paper describes completed and ongoing research on HALEF, a telephony-based open-source spoken dialog system. The system can be deployed toward a versatile range of potential applications, including intelligent tutoring, language learning and assessment.
Performance of a Trialogue-based Prototype System for English Language Assessment for Young Learners
K. Evanini, Y. So, J. Tao, D. Zapata-Rivera, C. Luce, L. Battistini, & X. Wang
Paper in Proceedings of the Interspeech Workshop on Child Computer Interaction (WOCCI 2014)
This paper describes a trialogue-based system for assessing the spoken language abilities of young learners of English. Specifically, the system employs spoken dialogue system components in interactive, conversation-based assessment tasks involving the test taker and two virtual interlocutors.
Automatic Detection of Plagiarized Spoken Responses
K. Evanini & X. Wang
Paper in Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 22–27
This paper addresses the task of automatically detecting plagiarized responses in the context of a test of spoken English proficiency for nonnative speakers. A corpus of spoken responses containing plagiarized content was collected from a high-stakes assessment of English proficiency for nonnative speakers.
Similarity-Based Non-Scorable Response Detection for Automated Speech Scoring
S. Y. Yoon & S. Xie
Paper in Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 116–123
This paper describes a method that filters out spoken responses from test takers who try to game the system using diverse strategies, such as speaking in their native languages or citing memorized responses for unrelated topics.
Automated Speech Scoring for Non-native Middle School Students with Multiple Task Types
K. Evanini & X. Wang
Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Lyon, France, Aug. 25–29, 2013, pp. 2435–2439
The authors present the results of applying automated speech scoring technology to English spoken responses provided by nonnative children in the context of an English proficiency assessment for middle school students. The challenges of using automated spoken language assessment for children are discussed and directions for future improvements are proposed.
Applying Unsupervised Learning To Support Vector Space Model Based Speaking Assessment
Conference of the North American Association for Computational Linguistics and Human Language Technologies (NAACL-HLT-2013)
The author shows that machine-generated scores can effectively approximate the scores of human raters for use in model-building for automated speech assessment.
Coherence Modeling for the Automated Assessment of Spontaneous Spoken Responses
X. Wang, K. Evanini, & K. Zechner
Conference of the North American Association for Computational Linguistics and Human Language Technologies (NAACL-HLT)
This paper describes a system for automatically evaluating discourse coherence in spoken responses.
Prompt-based content scoring for automated spoken language assessment
K. Evanini, S. Xie, & K. Zechner
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 157–162, Atlanta, Ga.
This paper investigates the use of prompt-based content features for the automated assessment of spontaneous speech in a spoken language proficiency assessment.
Automated Content Scoring of Spoken Responses in an Assessment for Teachers of English
K. Zechner & X. Wang
Proceedings of the Eighth Workshop on the Innovative Use of Natural Language Processing for Building Educational Applications (BEA-8), Conference of the North American Association for Computational Linguistics and Human Language Technologies (NAACL-HLT-2013), Atlanta, Ga.
This paper presents and evaluates approaches to automatically score the content correctness of spoken responses in a new language test for teachers of English as a foreign language who are nonnative speakers of English.
A Comparison of Two Scoring Methods for an Automated Speech Scoring System
X. Xi, D. Higgins, K. Zechner, & D. Williamson
Language Testing, Vol. 29, No. 3, pp. 371–394
In this paper, researchers compare two alternative scoring methods for an automated scoring system for speech. The authors discuss tradeoffs between multiple regression and classification tree models.
Exploring Content Features for Automated Speech Scoring
S. Xie, K. Evanini, & K. Zechner
Proceedings of the 2012 Meeting of the North American Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)
Association for Computational Linguistics
This paper explores content features for the automated scoring of unrestricted spontaneous speech. It compares content features based on three similarity measures in order to understand how well such features represent the accuracy of the content of a spoken response.
Assessment of ESL Learners' Syntactic Competence Based on Similarity Measures
S. Yoon & S. Bhat
Paper in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
In this paper, researchers present a method that measures the syntactic competence of English language learners (ELLs) for use in automated speech scoring systems. The authors discuss the advantages of measures based on current natural language processing techniques and corpora over conventional ELL measures.
Using Automatic Speech Recognition to Assess the Reading Proficiency of a Diverse Sample of Middle School Students
K. Zechner, K. Evanini, & C. Laitusis
Paper in Proceedings of the Interspeech Workshop on Child, Computer, and Interaction
The authors describe a study exploring automated assessment of reading proficiency, in terms of oral reading and reading comprehension, for a middle school population including students with reading disabilities and low reading proficiency, utilizing automatic speech recognition technology.
A Three-Stage Approach to the Automated Scoring of Spontaneous Spoken Responses
D. Higgins, K. Zechner, X. Xi, & D. Williamson
Computer Speech & Language, Vol. 25, No. 2, pp. 282–306
This paper presents a description and evaluation of SpeechRater, a system for automated scoring of nonnative speakers' spoken English proficiency. The system evaluates proficiency based on assessment tasks that elicit spontaneous monologues on particular topics.
Automatic Scoring of Non-Native Spontaneous Speech in Tests of Spoken English
K. Zechner, D. Higgins, X. Xi, & D. Williamson
Speech Communication, Vol. 51, No. 10, pp. 883–895
This paper presents the first version of the SpeechRater system, reviewing the automated scoring engine's use in the context of the TOEFL Practice Online test.