Automated Scoring of Speech

ETS's SpeechRater℠ engine is the only spoken-response scoring application used to score spontaneous responses, in which the range of valid responses is open-ended rather than narrowly determined by the item stimulus. Test takers preparing for the TOEFL® test have had their responses scored by the SpeechRater engine as part of the TOEFL Practice Online test since 2006. Competing capabilities focus on assessing low-level aspects of speech production, such as pronunciation, and rely on restricted tasks in order to increase reliability. The SpeechRater engine, by contrast, is based on a broad conception of the construct of English speaking proficiency, encompassing aspects of speech delivery (such as pronunciation and fluency), grammatical facility, and higher-level abilities related to topical coherence and the progression of ideas.

The SpeechRater engine processes each response with an automated speech recognition system specially adapted for use with nonnative English. Based on the output of this system, natural language processing is used to calculate a set of features that define a "profile" of the speech on a number of linguistic dimensions, including fluency, pronunciation, vocabulary usage and prosody. A model of speaking proficiency is then applied to these features in order to assign a final score to the response. While the structure of this model is informed by content experts, it is also trained on a database of previously observed responses scored by human raters, in order to ensure that SpeechRater's scoring emulates human scoring as closely as possible. Furthermore, if the response is found to be unscorable due to audio quality or other issues, the SpeechRater engine can set it aside for special processing.
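
As a rough sketch of the scoring stage described above, the example below maps a small feature profile to a score using linear weights of the kind that might be fit to human ratings, and sets aside responses whose recognizer confidence suggests they are unscorable. The feature names, weights, threshold, and score scale are illustrative assumptions, not the actual SpeechRater feature set or model.

```python
# Illustrative sketch only: a feature "profile" is mapped to a score by a
# model whose weights stand in for one fit to human ratings. All names and
# numbers here are hypothetical, not SpeechRater's actual features or model.
from dataclasses import dataclass


@dataclass
class FeatureProfile:
    words_per_second: float      # simple fluency feature
    pronunciation_score: float   # e.g., normalized acoustic score, 0-1
    type_token_ratio: float      # vocabulary diversity, 0-1
    asr_confidence: float        # recognizer confidence, flags unscorable audio


# Hypothetical weights, as if estimated from a corpus of responses
# previously scored by human raters.
WEIGHTS = {
    "words_per_second": 0.9,
    "pronunciation_score": 1.6,
    "type_token_ratio": 1.1,
}
INTERCEPT = 0.4


def score_response(profile: FeatureProfile) -> float | None:
    """Map a feature profile to a score on a 1-4 scale, or None if unscorable."""
    # Responses with very low recognizer confidence are set aside for
    # special processing rather than scored.
    if profile.asr_confidence < 0.3:
        return None
    raw = INTERCEPT + sum(
        weight * getattr(profile, name) for name, weight in WEIGHTS.items()
    )
    return max(1.0, min(4.0, raw))  # clamp to the reporting scale


print(score_response(FeatureProfile(2.1, 0.7, 0.55, 0.9)))  # -> 4.0
```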

ETS's research agenda related to automated scoring of speech includes the development of more extensive natural language processing (NLP) features to represent grammatical competencies and the discourse structure of spoken responses. The core capability is also being extended to apply across the range of item types used in different assessments of English proficiency, from very restricted items (such as passage read-alouds), through less restricted items (such as summarization tasks), to fully open-ended items.

Featured Publications

Below are some recent or significant publications that our researchers have authored on the subject of automated scoring of speech.

2012

  • Assessment of ESL Learners' Syntactic Competence Based on Similarity Measures
    S. Yoon & S. Bhat
    Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

    In this paper, researchers present a method that measures English language learners' syntactic competence for use in automated speech scoring systems. The authors discuss the advantages of measures based on current natural language processing techniques and corpora over conventional ELL measures. Download the full paper.

  • Using Automatic Speech Recognition to Assess the Reading Proficiency of a Diverse Sample of Middle School Students
    K. Zechner, K. Evanini, & C. Laitusis
    Proceedings of the Interspeech Workshop on Child, Computer, and Interaction

    The authors describe a study that uses automatic speech recognition technology to explore automated assessment of reading proficiency, in terms of both oral reading and reading comprehension, for a middle school population that includes students with reading disabilities and low reading proficiency. Download the full paper.

  • A Comparison of Two Scoring Methods for an Automated Speech Scoring System
    X. Xi, D. Higgins, K. Zechner, & D. Williamson
    Language Testing, Vol. 29, No. 3, pp. 371–394

    In this paper, researchers compare two alternative scoring methods for an automated scoring system for speech. The authors discuss tradeoffs between multiple regression and classification tree models. View the full abstract or order this report.

  • Exploring Content Features for Automated Speech Scoring
    S. Xie, K. Evanini, & K. Zechner
    Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT)
    Association for Computational Linguistics

    In this paper, researchers explore content features for the automated scoring of unrestricted spontaneous speech. The paper compares content features based on three similarity measures in order to understand how well such features represent the accuracy of the content of a spoken response. Download the full report.
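
To make the idea of a content feature concrete, the sketch below computes one simple, hypothetical variant: cosine similarity between an ASR transcript of a response and a pool of high-scoring responses to the same prompt. The paper above compares several similarity measures; this example is not drawn from it and only illustrates the general approach.

```python
# A minimal, hypothetical content-similarity feature: how close a spoken
# response (as transcribed) is to the vocabulary of high-scoring responses
# to the same prompt. Transcripts below are made up.
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_feature(response: str, high_scoring_responses: list[str]) -> float:
    """Similarity of a response to a pool of high-scoring exemplar responses."""
    response_vec = Counter(response.lower().split())
    reference_vec = Counter(
        token for r in high_scoring_responses for token in r.lower().split()
    )
    return cosine_similarity(response_vec, reference_vec)


exemplars = [
    "the professor explains the theory with two examples",
    "the lecture gives examples that support the theory",
]
print(content_feature("the professor gives two examples of the theory", exemplars))
```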

2010

  • Towards Using Structural Events to Assess Non-Native Speech
    L. Chen, J. Tetreault, & X. Xi
    NAACL-HLT 2010: Proceedings of the 5th Workshop on Building Educational Applications (BEA-5)
    Association for Computational Linguistics

    In this study, researchers investigated the usefulness of "structural events" in speech — for example, clause and disfluency structure — as a way of predicting holistic measures of speaking proficiency. Download the full report.

  • Using Amazon Mechanical Turk for Transcription of Non-Native Speech
    K. Evanini, D. Higgins, & K. Zechner
    Proceedings of the Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, NAACL-HLT 2010
    Association for Computational Linguistics

    In this study, researchers found speech transcriptions by Amazon Mechanical Turk workers to be as accurate as an individual expert transcriber for one type of test response and only slightly less accurate for another. Download the full report.
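
The transcription comparison in the study above depends on measuring how far one transcript diverges from another; word error rate is a standard measure of that kind, sketched below with made-up transcripts. This is not the study's evaluation code, only an illustration of the underlying comparison.

```python
# Word error rate (WER): word-level edit distance between a reference
# transcript and a hypothesis transcript, divided by the reference length.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


expert = "i think the professor made a good point about the experiment"
turker = "i think the professor made good point about experiment"
print(f"WER: {word_error_rate(expert, turker):.2f}")  # -> WER: 0.18
```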
