Dependability of New ESL Writing Test Scores: Evaluating Prototype Tasks and Alternative Rating Schemes

Author(s): Lee, Yong-Won; Kantor, Robert
Publication Year: 2005
Report Number: RR-05-14
Source: ETS Research Report
Document Type: Report
Page Count: 76
Subject/Key Words: Absolute Error, Dependability Index, English as a Second Language (ESL), Generalizability Coefficient, Generalizability Theory, Integrated Task, Rating Design, Relative Error, Score Dependability, Task Generalizability, Variance Components, Writing Assessment, Test of English as a Foreign Language (TOEFL)

Abstract

Prototype integrated and independent tasks were pilot tested for the writing section of a new generation of the TOEFL (Test of English as a Foreign Language) examination. This study uses generalizability theory (G-theory) to examine how various rating designs, the number of tasks, and the number of raters affect the reliability of writing scores based on integrated and independent tasks. Both univariate and multivariate G-theory analyses were conducted. The results showed that (a) to maximize score reliability, it would be more efficient to increase the number of tasks than the number of ratings per essay; (b) two single-rating designs in which different tasks for the same examinee are rated by different raters [p x (R:T), R:(p x T)] achieved relatively higher score reliabilities than other single-rating designs; and (c) a somewhat larger gain in composite score reliability was achieved when the number of listening-writing tasks was larger than the number of reading-writing tasks.
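
To illustrate the kind of trade-off behind finding (a), the following is a minimal sketch of a G-theory decision study for a fully crossed person x task x rater design, using the standard formulas for the generalizability coefficient (relative error) and the dependability index (absolute error). The variance components in `HYPOTHETICAL` are invented for illustration and are not taken from the report; the class and function names are also assumptions. The nested single-rating designs mentioned in finding (b) need different error terms and are not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarianceComponents:
    """Variance components for a crossed person x task x rater design."""
    p: float      # persons (universe-score variance)
    t: float      # tasks
    r: float      # raters
    pt: float     # person x task
    pr: float     # person x rater
    tr: float     # task x rater
    ptr_e: float  # person x task x rater, confounded with residual error


def g_coefficient(vc: VarianceComponents, n_tasks: int, n_raters: int) -> float:
    """Generalizability coefficient, based on relative error variance."""
    rel_error = (vc.pt / n_tasks
                 + vc.pr / n_raters
                 + vc.ptr_e / (n_tasks * n_raters))
    return vc.p / (vc.p + rel_error)


def phi_coefficient(vc: VarianceComponents, n_tasks: int, n_raters: int) -> float:
    """Dependability index, based on absolute error variance."""
    abs_error = (vc.t / n_tasks
                 + vc.r / n_raters
                 + vc.tr / (n_tasks * n_raters)
                 + vc.pt / n_tasks
                 + vc.pr / n_raters
                 + vc.ptr_e / (n_tasks * n_raters))
    return vc.p / (vc.p + abs_error)


# Hypothetical values chosen only to show the mechanics (NOT from the report).
# A large person x task component relative to person x rater makes adding
# tasks more effective than adding raters.
HYPOTHETICAL = VarianceComponents(
    p=0.50, t=0.02, r=0.01, pt=0.20, pr=0.02, tr=0.01, ptr_e=0.25
)

if __name__ == "__main__":
    for n_tasks, n_raters in [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1)]:
        g = g_coefficient(HYPOTHETICAL, n_tasks, n_raters)
        phi = phi_coefficient(HYPOTHETICAL, n_tasks, n_raters)
        print(f"tasks={n_tasks} raters={n_raters}  G={g:.3f}  Phi={phi:.3f}")
```

With these made-up components, two tasks rated once each yield a higher coefficient than one task rated twice, because the person x task term shrinks only when tasks are added. Whether the same holds for real data depends entirely on the estimated variance components.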

http://dx.doi.org/10.1002/j.2333-8504.2005.tb01991.x
