
Dependability of Scores for a New ESL Speaking Test: Evaluating Prototype Tasks

Lee, Yong-Won
Publication Year:
Report Number:
ETS Research Memorandum
Document Type:
Page Count:
Subject/Key Words:
Dependability Index, English as a Foreign Language (EFL), English as a Second Language (ESL), Generalizability Coefficient, Generalizability Theory, Independent Tasks, Integrated Tasks, Rating Design, Score Dependability, Speaking Assessment, Task Generalizability, Variance Components


A new multitask speaking measure is expected to be an important component of a new version of the Test of English as a Foreign Language (TOEFL). This study considered two critical issues concerning the score dependability of the new speaking measure: How much would score dependability be affected by (a) combining scores on different task types into a composite score and (b) rating each task only once? To answer these questions, the study used generalizability theory (G-theory) procedures to examine (a) the relative effects of tasks and raters on examinees' speaking scores and (b) the impact of the numbers of tasks and raters per speech sample and of subsection lengths on the dependability of speaking section scores. Univariate and multivariate G-theory analyses were conducted on rating data collected for 261 examinees. The univariate analyses indicated that increasing the number of tasks would be more efficient than increasing the number of ratings per speech sample for maximizing score dependability. The multivariate G-theory analyses also revealed that (a) the universe scores among the task-type subsections were very highly correlated and (b) slightly larger gains in composite score reliability would result from increasing the number of listening-speaking tasks for fixed section lengths.
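
The trade-off between tasks and ratings described in the abstract can be illustrated with the standard G-theory formulas for a fully crossed person x task x rater random-effects design. The sketch below is only an illustration under that assumption: the report's actual rating design may differ, the function name `g_coefficients` is hypothetical, and the variance components are made-up values chosen for demonstration, not estimates from the study.

```python
def g_coefficients(var, n_tasks, n_raters):
    """Return (generalizability coefficient, dependability index).

    Assumes a fully crossed p x t x r random-effects design (an
    assumption of this sketch, not necessarily the report's design).
    `var` maps effect names to variance components:
    p, t, r, pt, pr, tr, ptr_e (residual).
    """
    nt, nr = n_tasks, n_raters
    # Relative error: only effects that interact with persons.
    rel_error = var["pt"] / nt + var["pr"] / nr + var["ptr_e"] / (nt * nr)
    # Absolute error: adds task and rater main effects and their interaction.
    abs_error = rel_error + var["t"] / nt + var["r"] / nr + var["tr"] / (nt * nr)
    e_rho2 = var["p"] / (var["p"] + rel_error)  # generalizability coefficient
    phi = var["p"] / (var["p"] + abs_error)     # dependability index
    return e_rho2, phi


# HYPOTHETICAL variance components for illustration only.
# These are NOT estimates reported in the study.
ILLUSTRATIVE_VAR = {
    "p": 0.50, "t": 0.02, "r": 0.01,
    "pt": 0.20, "pr": 0.02, "tr": 0.01, "ptr_e": 0.25,
}

if __name__ == "__main__":
    # (5, 2) and (10, 1) both require ten ratings per examinee.
    for n_tasks, n_raters in [(5, 1), (5, 2), (10, 1)]:
        e_rho2, phi = g_coefficients(ILLUSTRATIVE_VAR, n_tasks, n_raters)
        print(f"tasks={n_tasks:2d} raters={n_raters}: "
              f"E(rho^2)={e_rho2:.3f}  Phi={phi:.3f}")
```

With these illustrative values, where the person-by-task component is large relative to the rater-related components, spending ten ratings per examinee on ten singly rated tasks yields a higher dependability index than spending them on five doubly rated tasks. This mirrors the direction of the univariate finding, but the magnitude depends entirely on the assumed components.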
