ETS's SpeechRater® engine is the world's most advanced spoken-response scoring application designed to score spontaneous responses, in which the range of valid responses is open-ended rather than narrowly determined by the item stimulus. Test takers preparing to take the TOEFL® test have had their responses scored by the SpeechRater engine as part of the TOEFL Practice Online (TPO™) practice tests since 2006. Competing capabilities focus on assessing low-level aspects of speech production, such as pronunciation, using restricted tasks in order to increase reliability. The SpeechRater engine, by contrast, is based on a broad conception of the construct of English-speaking proficiency, encompassing aspects of speech delivery (such as pronunciation and fluency), grammatical facility, and higher-level abilities related to topical coherence and the progression of ideas.
The SpeechRater engine processes each response with an automated speech recognition system specially adapted for use with nonnative English. Based on the output of this system, natural language processing (NLP) and speech-processing algorithms are used to calculate a set of features that define a "profile" of the speech on a number of linguistic dimensions, including fluency, pronunciation, vocabulary usage, grammatical complexity and prosody. A model of speaking proficiency is then applied to these features in order to assign a final score to the response. While this model is trained on previously observed data scored by human raters, it is also reviewed by content experts to maximize its validity. Furthermore, if the response is found to be unscorable due to audio quality or other issues, the SpeechRater engine can set it aside for special processing.
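To make this scoring flow concrete, here is a minimal sketch of the pipeline just described: a feature "profile" computed from ASR output, a simple linear scoring model, and an unscorability check. All feature names, weights, and thresholds below are invented for illustration; they are not the SpeechRater engine's actual features or model.

```python
# Minimal sketch of the scoring flow described above. Every feature name,
# weight, and threshold here is a hypothetical illustration.
from dataclasses import dataclass

@dataclass
class SpeechProfile:
    """Feature 'profile' computed from ASR output (illustrative subset)."""
    words_per_second: float      # fluency
    pronunciation_score: float   # normalized acoustic score, 0-1
    type_token_ratio: float      # vocabulary usage
    mean_clause_depth: float     # grammatical complexity
    audio_quality: float         # used only to flag unscorable responses

# Hypothetical weights, as if trained on human-scored responses and then
# reviewed by content experts.
WEIGHTS = {
    "words_per_second": 0.8,
    "pronunciation_score": 2.0,
    "type_token_ratio": 1.5,
    "mean_clause_depth": 0.7,
}
INTERCEPT = 0.5
AUDIO_QUALITY_FLOOR = 0.3  # below this, route to special processing

def score_response(profile: SpeechProfile) -> float | None:
    """Return a score on a hypothetical 1-4 scale, or None if unscorable."""
    if profile.audio_quality < AUDIO_QUALITY_FLOOR:
        return None  # set aside for special (e.g., human) processing
    raw = INTERCEPT + sum(
        weight * getattr(profile, name) for name, weight in WEIGHTS.items()
    )
    return min(max(raw, 1.0), 4.0)  # clip to the reporting scale
```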
ETS's research agenda related to automated scoring of speech includes the development of more extensive NLP features to represent pragmatic competencies and the discourse structure of spoken responses. The core capability has also been extended to apply across the range of item types used in different assessments of English proficiency, from very restricted item types (such as passage read-alouds) to less restricted items (such as summarization tasks).
Featured Publications
Below are some recent or significant publications that our researchers have authored on the subject of automated scoring of speech, spoken dialog systems, and multimodal assessments.
Automated Scoring of Speech
2018
-
Monitoring the Performance of Human and Automated Scores for Spoken Responses
Z. Wang, K. Zechner, & Y. Sun
Language Testing, Vol. 35, No. 1, pp. 101–120
The authors discuss procedures for ongoing monitoring of the performance of automated and human scores for spoken constructed responses.
2017
-
Approaches to Automated Scoring of Speaking for K–12 English Language Proficiency Assessments
K. Evanini, M. C. Hauck, & K. Hakuta
ETS Research Report No. RR-17-18
This research report provides recommendations to stakeholders (including score users, policymakers, and administrators) about best practices for using automated speech scoring in K–12 English language proficiency assessments.
-
Combining Human and Automated Scores for the Improved Assessment of Non-Native Speech
S.-Y. Yoon & K. Zechner
Speech Communication, Vol. 93, pp. 43–52
The authors describe a hybrid approach for combining human and automated scores for an assessment of spoken English, in which human scores are obtained for responses that are flagged by the automated system as being difficult to score (see the routing sketch at the end of this year's listings).
-
Comparative Evaluation of Automated Scoring of Syntactic Competence of Non-Native Speakers
K. Zechner, S.-Y. Yoon, S. Bhat, & C. W. Leong
Computers in Human Behavior, Vol. 76, pp. 672–682
This article presents an evaluation of several automated metrics for assessing an English learner's syntactic competencies.
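As an illustration of the hybrid human–machine scoring idea in the Yoon & Zechner article above, the following sketch routes responses that the automated system flags as hard to score to a human rater. The confidence measure and threshold are hypothetical stand-ins for the flagging criteria described in the paper.

```python
# Illustrative routing for hybrid scoring: responses the automated system
# flags as hard to score are deferred to a human rater. `machine_scorer`
# is assumed to return a (score, confidence) pair; the 0.7 threshold is
# an invented placeholder, not a value from the paper.
def final_score(response_audio, machine_scorer, human_rater, threshold=0.7):
    score, confidence = machine_scorer(response_audio)
    if confidence < threshold:
        return human_rater(response_audio)  # flagged: defer to a human
    return score
```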
2016
-
Automated Scoring Across Different Modalities
A. Loukina & A. Cahill
Paper in Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications, pp. 130–135
This article investigates the application of automated scoring systems that were originally developed for scoring text (short answers and essays) to non-native spoken English, in combination with the SpeechRater automated scoring service.
-
Self-Adaptive DNN for Improving Spoken Language Proficiency Assessment
Y. Qian, X. Wang, K. Evanini, & D. Suendermann-Oeft
Paper in Proceedings of the 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), pp. 3122–3126
This article shows how a self-adaptive DNN trained with i-vectors on a corpus of non-native speech can improve the performance of an automated speech recognizer and an automated speech scoring system.
2015
-
Automatic Assessment of Syntactic Complexity for Spontaneous Speech Scoring
S. Bhat & S.-Y. Yoon
Speech Communication, Vol. 67, pp. 42–57
The article presents a study of new measures of syntactic complexity, which automated scoring systems can use when assessing second-language spontaneous speech.
-
Feature Selection for Automated Speech Scoring
A. Loukina, K. Zechner, L. Chen, & M. Heilman
Paper in Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 12–19
The authors compared different ways of selecting features in automated scoring systems that evaluate spoken or written responses in language assessments.
-
Automated Scoring of Speaking Tasks in the Test of English-for-Teaching (TEFT™)
K. Zechner, L. Chen, L. Davis, K. Evanini, C. M. Lee, C. W. Leong, X. Wang, & S.-Y. Yoon
ETS Research Report No. RR-15-31
The report summarizes research and development efforts on models for automatically scoring spoken item responses in a pilot administration of a test for nonnative English teachers and teacher candidates.
2014
-
Performance of a Trialogue-based Prototype System for English Language Assessment for Young Learners
K. Evanini, Y. So, J. Tao, D. Zapata-Rivera, C. Luce, L. Battistini, & X. Wang
Paper in Proceedings of the Interspeech Workshop on Child Computer Interaction (WOCCI 2014), pp. 79–84
This paper describes a trialogue-based system for assessing the spoken language abilities of young learners of English. Specifically, the system employs spoken dialogue system components in interactive, conversation-based assessment tasks involving the test taker and two virtual interlocutors.
-
Automatic Detection of Plagiarized Spoken Responses
K. Evanini & X. Wang
Paper in Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 22–27
This paper addresses the task of automatically detecting plagiarized responses in the context of a test of spoken English proficiency for nonnative speakers, using a corpus of spoken responses containing plagiarized content collected from a high-stakes assessment.
-
Similarity-Based Non-Scorable Response Detection for Automated Speech Scoring
S.-Y. Yoon & S. Xie
Paper in Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 116–123
This paper describes a method that filters out spoken responses from test takers who try to game the system using diverse strategies, such as speaking in their native languages or reciting memorized responses on unrelated topics.
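In the spirit of the Yoon & Xie paper above, the sketch below flags responses whose recognized words overlap too little with vocabulary expected for the prompt, which would catch strategies like speaking in another language or reciting an off-topic memorized answer. The bag-of-words cosine similarity and the 0.2 threshold are simplifications, not the paper's actual features.

```python
# Flag a response as non-scorable when its recognized words have low
# cosine similarity to the vocabulary expected for the prompt.
# The threshold and bag-of-words representation are illustrative only.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def is_scorable(asr_words: list[str], prompt_vocab: list[str],
                threshold: float = 0.2) -> bool:
    return cosine(Counter(asr_words), Counter(prompt_vocab)) >= threshold
```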
2013
-
Automated Speech Scoring for Non-native Middle School Students with Multiple Task Types
K. Evanini & X. Wang
Paper in Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), pp. 2435–2439
The authors present the results of applying automated speech-scoring technology to spoken English responses provided by nonnative children in the context of an English proficiency assessment for middle school students. The challenges of using an automated spoken-language assessment with children are discussed, and directions for future improvements are proposed.
-
Applying Unsupervised Learning To Support Vector Space Model Based Speaking Assessment
L. Chen
Paper in Proceedings of the Eighth Workshop on the Innovative Use of NLP for Building Educational Applications, pp. 58–62
The author shows that machine-generated scores can effectively approximate the scores of human raters for use in model-building for automated speech assessment.
-
Coherence Modeling for the Automated Assessment of Spontaneous Spoken Responses
X. Wang, K. Evanini, & K. Zechner
Paper in Proceedings of NAACL-HLT 2013, pp. 814–819
This paper describes a system for automatically evaluating discourse coherence in spoken responses.
-
Prompt-Based Content Scoring for Automated Spoken Language Assessment
K. Evanini, S. Xie, & K. Zechner
Paper in Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 157–162
This paper investigates the use of prompt-based content features for the automated assessment of spontaneous speech in a spoken language proficiency assessment.
-
Automated Content Scoring of Spoken Responses in an Assessment for Teachers of English
K. Zechner & X. Wang
Paper in Proceedings of the Eighth Workshop on the Innovative Use of Natural Language Processing for Building Educational Applications, pp. 73–81
This paper presents and evaluates approaches to automatically scoring the content correctness of spoken responses in a new language test for teachers of English as a foreign language who are nonnative speakers of English.
2012
-
A Comparison of Two Scoring Methods for an Automated Speech Scoring System
X. Xi, D. Higgins, K. Zechner, & D. Williamson
Language Testing, Vol. 29, No. 3, pp. 371–394
In this paper, researchers compare two alternative scoring methods for an automated scoring system for speech. The authors discuss tradeoffs between multiple regression and classification tree models.
-
Exploring Content Features for Automated Speech Scoring
S. Xie, K. Evanini, & K. Zechner
Paper in Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 103–111
Researchers explore content features for the automated scoring of unrestricted spontaneous speech. The paper compares content features based on three similarity measures in order to understand how well content features represent the accuracy of the content of a spoken response (see the sketch following the 2012 listings).
-
Assessment of ESL Learners' Syntactic Competence Based on Similarity Measures
S. Yoon & S. Bhat
Paper in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 600–608
In this paper, researchers present a method that measures English language learners' syntactic competence for use in automated speech scoring systems. The authors discuss the advantages of measures based on current natural language processing techniques and corpora over conventional measures.
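As a sketch of the similarity-based content features discussed in the Xie, Evanini & Zechner entry above, the snippet below computes tf-idf cosine similarities between a response transcript and pools of exemplar responses at each score point; such similarities can then serve as features in a scoring model. The exemplar pools and score points are assumptions for illustration, not the paper's exact formulation.

```python
# Compute one family of content features: tf-idf cosine similarity
# between a response transcript and exemplar responses at each score
# point. Requires scikit-learn; exemplar texts are invented inputs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def content_features(transcript: str,
                     exemplars_by_score: dict[int, str]) -> dict[int, float]:
    """Return {score_point: similarity to that score point's exemplars}."""
    score_points = sorted(exemplars_by_score)
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(
        [exemplars_by_score[s] for s in score_points] + [transcript]
    )
    similarities = cosine_similarity(matrix[-1], matrix[:-1])[0]
    return dict(zip(score_points, similarities))
```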
2011
-
Using Automatic Speech Recognition to Assess the Reading Proficiency of a Diverse Sample of Middle School Students
K. Zechner, K. Evanini, & C. Laitusis
Paper in Proceedings of the Interspeech Workshop on Child Computer Interaction (WOCCI 2012), pp. 45–52
The authors describe a study that uses automatic speech recognition technology to assess reading proficiency, in terms of oral reading and reading comprehension, for a middle school population that includes students with reading disabilities and low reading proficiency.
-
A Three-Stage Approach to the Automated Scoring of Spontaneous Spoken Responses
D. Higgins, X. Xi, K. Zechner, & D. Williamson
Computer Speech & Language, Vol. 25, No. 2, pp. 282–306
This paper presents a description and evaluation of SpeechRater, a system for automated scoring of nonnative speakers' spoken English proficiency. The system evaluates proficiency based on assessment tasks that elicit spontaneous monologues on particular topics.
2009
-
Automatic Scoring of Non-Native Spontaneous Speech in Tests of Spoken English
K. Zechner, D. Higgins, X. Xi, & D. Williamson
Speech Communication, Vol. 51, No. 10, pp. 883–895
This paper presents the first version of the SpeechRater system, reviewing the automated scoring engine's use in the context of the TOEFL Practice Online test.
Spoken Dialog Systems
2017
-
A Modular, Multimodal Open-Source Virtual Interviewer Dialog Agent
K. Cofino, V. Ramanarayanan, P. Lange, D. Pautler, D. Suendermann-Oeft, & K. Evanini
Paper in Proceedings of ICMI 2017, 19th ACM International Conference on Multimodal Interaction, pp. 520–521
We present an open-source multimodal dialog system equipped with a virtual human avatar interlocutor. To demonstrate the capabilities of the system, we designed and implemented a conversational job interview scenario in which the avatar plays the role of an interviewer and responds to user input in real time to provide an immersive user experience.
-
Exploring ASR-Free End-to-End Modeling to Improve Spoken Language Understanding in a Cloud-Based Dialog System
Y. Qian, R. Ubale, V. Ramanarayanan, P. Lange, D. Suendermann-Oeft, K. Evanini, & E. Tsuprun
Paper in Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 569–576
This paper proposes an automatic speech recognition (ASR)-free, end-to-end modeling approach to spoken language understanding for a cloud-based, modular spoken dialog system. Experimental results show that the approach is particularly promising in situations with low ASR accuracy (see the schematic sketch at the end of this year's listings).
-
Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human-Machine Spoken Dialog Interactions
V. Ramanarayanan, P. Lange, K. Evanini, H. Molloy, & D. Suendermann-Oeft
Paper in Proceedings of Interspeech 2017: 18th Annual Conference of the International Speech Communication Association, pp. 1711–1715
We analyzed human-rated scores of recorded dialog data on three different scoring dimensions critical to the delivery of conversational English (fluency, pronunciation, and intonation/stress) and examined the efficacy of automatically extracted, hand-curated speech features in predicting each of these subscores.
-
Jee haan, I'd Like Both, Por Favor: Elicitation of a Code-Switched Corpus of Hindi-English and Spanish-English Human-Machine Dialog
V. Ramanarayanan & D. Suendermann-Oeft
Paper in Proceedings of Interspeech 2017: 18th Annual Conference of the International Speech Communication Association, pp. 47–51
We present a database of code-switched conversational human–machine dialog in English–Hindi and English–Spanish. We leveraged HALEF, an open-source standards-compliant cloud-based dialog system, to capture audio and video of bilingual crowd workers as they interacted with the system.
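The sketch below gives a schematic of the ASR-free idea from the Qian et al. paper above: acoustic features are mapped directly to a dialog intent, with no transcription step. The pooled-feature representation, logistic regression classifier, and intent labels are simplifications standing in for the paper's end-to-end neural model.

```python
# Schematic ASR-free spoken language understanding: map frame-level
# acoustic features directly to an intent label, skipping transcription.
# The pooling, classifier, and label set are illustrative simplifications.
import numpy as np
from sklearn.linear_model import LogisticRegression

INTENTS = ["confirm", "deny", "other"]  # hypothetical label set

def pool(frames: np.ndarray) -> np.ndarray:
    """Average frame-level features (n_frames x n_dims) into one vector."""
    return frames.mean(axis=0)

def train_slu(train_utterances: list, intent_ids: list):
    """Fit a classifier on pooled acoustic features and intent labels."""
    features = np.stack([pool(u) for u in train_utterances])
    return LogisticRegression(max_iter=1000).fit(features, intent_ids)

def predict_intent(model, utterance: np.ndarray) -> str:
    return INTENTS[int(model.predict(pool(utterance)[None, :])[0])]
```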
2016
-
LVCSR System on a Hybrid GPU-CPU Embedded Platform for Real-Time Dialog Applications
A. V. Ivanov, P. Lange, & D. Suendermann-Oeft
Paper in Proceedings of the SIGDIAL 2016 Conference, pp. 220–223
We present the implementation of a large-vocabulary continuous speech recognition (LVCSR) system on NVIDIA's Tegra K1 hybrid GPU-CPU embedded platform. Because the system runs in real time and consumes less than 7.5 watts at peak, it is well suited to fast but precise offline spoken dialog applications, such as in robotics, portable gaming devices, or in-car systems.
-
Bootstrapping Development of a Cloud-Based Spoken Dialog System in the Educational Domain From Scratch Using Crowdsourced Data
V. Ramanarayanan, D. Suendermann-Oeft, P. Lange, A. Ivanov, K. Evanini, Z. Yu, E. Tsuprun, & Y. Qian
ETS Research Report No. RR-16-16
We propose a crowdsourcing-based framework to iteratively and rapidly bootstrap a dialog system from scratch for a new domain. We leverage the open-source modular HALEF dialog system to deploy dialog applications. We illustrate the usefulness of this framework using four different prototype dialog items with applications in the educational domain and present initial results and insights from this endeavor.
2015
-
Automated Speech Recognition Technology for Dialogue Interaction with Non-Native Interlocutors
A. Ivanov, V. Ramanarayanan, D. Suendermann-Oeft, M. Lopez, K. Evanini, & J. Tao
Paper in Proceedings of SIGDIAL 2015, 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 134–138
We present a comparative study of various approaches to speech recognition in a non-native context. Comparing systems in terms of their accuracy and real-time factor, we find that a Kaldi-based Deep Neural Network Acoustic Model (DNN-AM) system with online speaker adaptation by far outperforms other available methods.
-
A Distributed Cloud-Based Dialog System For Conversational Application Development
V. Ramanarayanan, D. Suendermann-Oeft, A. V. Ivanov, & K. Evanini
Paper in Proceedings of SIGDIAL 2015, 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 432–434
In this paper, we extend the infrastructure of HALEF, an open-source spoken dialog system, to be cloud-based, and thus truly distributed and scalable. The system can be accessed both via telephone interfaces and through web clients with WebRTC/HTML5 integration, allowing in-browser access to potentially multimodal dialog applications.
-
HALEF: An Open-Source Standard-Compliant Telephony-Based Modular Spoken Dialog System – A Review and an Outlook
D. Suendermann-Oeft, V. Ramanarayanan, M. Teckenbrock, F. Neutatz, & D. Schmidt
Paper in Proceedings of the IWSDS 2015, International Workshop on Spoken Dialog Systems, pp. 1–9
This paper describes completed and ongoing research on HALEF, a telephony-based open-source spoken dialog system. The system can be deployed toward a versatile range of potential applications, including intelligent tutoring, language learning, and assessment.
Multimodal Assessments
2018
-
Interview With an Avatar: A Real-Time Cloud-Based Virtual Dialog Agent for Educational and Job Training Applications
V. Ramanarayanan, D. Pautler, P. Lange, & D. Suendermann-Oeft
ETS Research Memorandum No. RM-18-02
We present a multimodal dialog system equipped with a virtual human avatar interlocutor. In this scenario, the avatar plays the role of an interviewer and responds to user input in real time to provide an immersive user experience.
2017
-
MAP: Multimodal Assessment Platform for Interactive Communication Competency
S. Khan, D. Suendermann-Oeft, K. Evanini, D. Williamson, S. Paris, Y. Qian, Y. Huang, P. Bosch, S. D'Mello, A. Loukina, & L. Davis
Paper in S. Shehata & J. P.-L. Tan (Eds.), Practitioner Track Proceedings of the 7th International Learning Analytics & Knowledge Conference, pp. 6–12
In this paper, we describe a prototype system for automated, interactive human communication assessment. The system processes multimodal data captured in a variety of human-human and human-computer interactions, integrates speech and face recognition-based biometric capabilities, and ranks and indexes large collections of assessment content.
-
Crowdsourcing Ratings of Caller Engagement in Thin-Slice Videos of Human-Machine Dialog: Benefits and Pitfalls
V. Ramanarayanan, C. Leong, D. Suendermann-Oeft, & K. Evanini
Paper in Proceedings of ICMI 2017, 19th ACM International Conference on Multimodal Interaction, pp. 281–287
We analyze the efficacy of different crowds of naïve human raters in rating engagement during human–machine dialog interactions. Each rater viewed multiple 10-second thin-slice videos of native and non-native English speakers interacting with a computer-assisted language learning (CALL) system and rated how engaged and disengaged those callers were while interacting with the automated agent.
-
Crowdsourcing Multimodal Dialog Interactions: Lessons Learned From the HALEF Case
V. Ramanarayanan, D. Suendermann-Oeft, H. Molloy, E. Tsuprun, P. Lange, & K. Evanini
Paper in Proceedings of the Workshop on Crowdsourcing, Deep Learning and Artificial Intelligence Agents at the Thirty-First AAAI Conference on Artificial Intelligence, pp. 423–431
We present a retrospective on collecting data of human interactions with multimodal dialog systems (“dialog data”) using crowdsourcing techniques. This is largely based on our experience using the HALEF multimodal dialog system to deploy education-domain conversational applications on the Amazon Mechanical Turk crowdsourcing platform.
-
An Open-Source Dialog System With Real-Time Engagement Tracking for Job Interview Training Applications
Z. Yu, V. Ramanarayanan, P. Lange, & D. Suendermann-Oeft
Paper in Proceedings of IWSDS 2017, International Workshop on Spoken Dialog Systems, pp. 1–9
We designed and implemented a dialog system that tracks and reacts to a user's state, such as engagement, in real time, and built a conversational job interview task on top of this framework. The system acts as an interviewer and reacts to a user's disengagement in real time with positive feedback strategies designed to re-engage the user in the job interview process.
2016
-
Assembling the Jigsaw: How Multiple Open Standards Are Synergistically Combined in the HALEF Multimodal Dialog System
V. Ramanarayanan, D. Suendermann-Oeft, P. Lange, R. Mundkowsky, A. Ivanov, Z. Yu, Y. Qian, & K. Evanini
Chapter in D. Dahl (Ed.), Multimodal Interaction With W3C Standards: Towards Natural User Interfaces to Everything, Springer, pp. 295–310
In this chapter, we examine how an open source, modular, multimodal dialog system—HALEF—can be seamlessly assembled, much like a jigsaw puzzle, by putting together multiple distributed components that are compliant with the W3C recommendations or other open industry standards.
-
Multimodal HALEF: An Open-Source Modular Web-Based Multimodal Dialog Framework
Z. Yu, V. Ramanarayanan, R. Mundkowsky, P. Lange, A. Ivanov, A. Black, & D. Suendermann-Oeft
Paper in Proceedings of the IWSDS 2016, International Workshop on Spoken Dialog Systems, pp. 1–11
We describe recent developments and preliminary research results on extending the HALEF spoken dialog system to other modalities, in particular the capture of video feeds in web browser sessions. This technology enables the roll-out of multimodal dialog systems to a massive user base, as exemplified by the use of Amazon Mechanical Turk for data collection using Multimodal HALEF.
2015
-
Evaluating Speech, Face, Emotion and Body Movement Time-Series Features for Automated Multimodal Presentation Scoring
V. Ramanarayanan, C. W. Leong, L. Chen, G. Feng, & D. Suendermann-Oeft
Paper in Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pp. 23–30
This paper describes a new approach to extracting multimodal features for automatically scoring presentation performances. Histograms of co-occurrences capture how different prototypical body postures or facial configurations co-occur within different time lags of each other over the evolution of the multimodal, multivariate time series (see the sketch at the end of the 2015 listings).
-
A Modular Open-Source Standard-Compliant Dialog System Framework With Video Support
V. Ramanarayanan, Z. Yu, R. Mundkowsky, P. Lange, A. Ivanov, A. Black, & D. Suendermann-Oeft
Paper in Proceedings of ASRU 2015, 14th IEEE Automatic Speech Recognition and Understanding Workshop, pp. 1–2
We present HALEF (Help Assistant–Language-Enabled and Free), an open-source cloud-compatible multimodal dialog system that can be used with different plug-and-play backend application modules and includes support for video interfacing via web browser.
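To make the histogram-of-co-occurrences idea from the 2015 presentation-scoring paper above concrete: given a time series of prototype labels (say, per-frame cluster IDs of body postures), one counts how often each pair of labels occurs at a given time lag. The lag set and normalization below are arbitrary choices for illustration, not the paper's exact formulation.

```python
# Histogram-of-co-occurrence features over a sequence of prototype labels
# (e.g., per-frame posture cluster IDs): count, for each time lag, how
# often each ordered pair of labels co-occurs. Lags are arbitrary here.
from collections import Counter
from itertools import product

def cooccurrence_histogram(labels: list[int], n_prototypes: int,
                           lags=(1, 5, 10)) -> list[float]:
    features = []
    for lag in lags:
        # Pair each label with the label `lag` frames later.
        pair_counts = Counter(zip(labels[:-lag], labels[lag:]))
        total = max(len(labels) - lag, 1)
        features.extend(
            pair_counts[(a, b)] / total
            for a, b in product(range(n_prototypes), repeat=2)
        )
    return features
```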
Find More Articles
View more research publications related to automated scoring of speech.