
Analysis of Proposed Revisions of the Test of Spoken English (TSE)

Henning, Grant; Schedl, Mary A.; Suomi, Barbara K.
Publication Year:
Report Number:
ETS Research Report
Document Type:
Page Count: 50
Subject/Key Words: Construct Validity, Interrater Reliability, Test Construction, Test Content, Test Format, Test of Spoken English (TSE)


This research was conducted to compare a prototype revised form of the TSE® (Test of Spoken English™) with the current version of the same test. The study compared interrater reliability, frequency of rater discrepancy at all score levels, component task adequacy, scoring efficacy, and other concurrent and construct validity evidence, including oral proficiency interview correlations for a subset of the examinee sample. The study employed a representative examinee sample of 342 nonnative speakers of English, purposely sampled from two professional domains: prospective university teaching assistants (N = 184) and prospective licensed medical professionals (N = 158). One somewhat unusual component of the study was the attempt to involve persons most at risk in the judgment process. Thus, in addition to employing the usual group of trained raters to score the current and prototype versions of the test, 16 naive adult raters were purposely selected for having limited exposure to foreign languages and cultures (eight first-year university students from four broad academic disciplines and eight nondegreed prospective medical outpatients within four broad age levels). These 16 naive raters (eight females and eight males) provided concurrent judgments of the comprehensibility and communicative effectiveness of a subset of 40 recorded prototype examinations. In general, the comparative evidence gathered appeared to underscore the psychometric quality of the prototype revised TSE and to support conclusions of its adequacy as an instrument used to make judgments of the oral English language proficiency of nonnative speakers in the targeted populations. Some additional suggestions are provided on ways to implement the scoring of the prototype version of the test. (50 pp.)
