Dependability of Scores for a New ESL Speaking Test: Evaluating Prototype Tasks

Author(s):
Lee, Yong-Won
Publication Year:
2005
Report Number:
RM-04-07, TOEFL-MS-28
Subject/Key Words:
Dependability index; EFL; ESL (English as a foreign/second language); generalizability coefficients; generalizability theory; independent tasks; integrated tasks; rating design; score dependability; speaking assessment; task generalizability; variance components

Abstract

A new multitask speaking measure is expected to be an important component of a new version of the TOEFL® test (Test of English as a Foreign Language™). This study considered two critical issues concerning the score dependability of the new speaking measure: How much would score dependability be affected by (a) combining scores on different task types into a composite score and (b) rating each task only once? To answer these questions, the study used generalizability theory (G-theory) procedures to examine (a) the relative effects of tasks and raters on examinees' speaking scores and (b) the impact of the number of tasks, the number of raters per speech sample, and subsection lengths on the dependability of speaking section scores. Univariate and multivariate G-theory analyses were conducted on rating data collected from 261 examinees. The univariate analyses showed that increasing the number of tasks would be a more efficient way to maximize score dependability than increasing the number of ratings per speech sample. The multivariate G-theory analyses also revealed that (a) the universe scores for the task-type subsections were very highly correlated and (b) for fixed section lengths, slightly larger gains in composite score reliability would result from increasing the number of listening-speaking tasks.
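The univariate finding (adding tasks buys more dependability than adding ratings per speech sample) can be illustrated with the standard G-theory decision-study formulas. The following is a minimal sketch, assuming a fully crossed persons × tasks × raters (p × t × r) random-effects design; the report's actual rating design may differ, and the variance components in `VC` are hypothetical values chosen for illustration, not estimates from the study. The names `VarianceComponents`, `phi`, and `g_coefficient` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """Variance components for an ASSUMED fully crossed p x t x r random design."""
    p: float      # persons (universe-score variance)
    t: float      # tasks
    r: float      # raters
    pt: float     # person-by-task interaction
    pr: float     # person-by-rater interaction
    tr: float     # task-by-rater interaction
    ptr_e: float  # three-way interaction confounded with residual error


def relative_error(vc: VarianceComponents, n_t: int, n_r: int) -> float:
    """Relative error variance: only terms that interact with persons."""
    return vc.pt / n_t + vc.pr / n_r + vc.ptr_e / (n_t * n_r)


def absolute_error(vc: VarianceComponents, n_t: int, n_r: int) -> float:
    """Absolute error variance: relative error plus task, rater, and t x r main effects."""
    return (vc.t / n_t + vc.r / n_r + vc.tr / (n_t * n_r)
            + relative_error(vc, n_t, n_r))


def g_coefficient(vc: VarianceComponents, n_t: int, n_r: int) -> float:
    """Generalizability coefficient (relative decisions)."""
    return vc.p / (vc.p + relative_error(vc, n_t, n_r))


def phi(vc: VarianceComponents, n_t: int, n_r: int) -> float:
    """Dependability index Phi (absolute decisions)."""
    return vc.p / (vc.p + absolute_error(vc, n_t, n_r))


# HYPOTHETICAL variance components (not the study's estimates): the
# person-by-task interaction is large relative to the rater-related terms.
VC = VarianceComponents(p=0.50, t=0.02, r=0.01, pt=0.20, pr=0.02,
                        tr=0.005, ptr_e=0.15)

# Compare a baseline with two ways of doubling the total number of ratings.
for n_t, n_r in [(5, 1), (10, 1), (5, 2)]:
    print(f"tasks={n_t:2d} raters={n_r}  "
          f"G={g_coefficient(VC, n_t, n_r):.3f}  Phi={phi(VC, n_t, n_r):.3f}")
```

Under these assumed components, doubling the tasks (10 tasks, 1 rating each) yields a higher Phi than doubling the ratings (5 tasks, 2 ratings each), even though both designs use 10 ratings per examinee. This is because the large person-by-task variance is divided only by the number of tasks. With different variance components, the comparison could come out differently.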
