Using a New Statistical Model for Testlets to Score the TOEFL® Test

Author(s):
Wainer, Howard; Wang, Xiaohui
Publication Year:
2001
Report Number:
RR-01-09, TOEFL-TR-16
Source:
Document Type:
Subject/Key Words:
Gibbs sampling; local dependence; listening comprehension; reading comprehension; Bayesian model fitting

Abstract

Standard item response theory (IRT) models fit to examination responses ignore the fact that sets of items (testlets) are often tied to a single common stimulus (e.g., a reading comprehension passage). In this setting, the items given to an examinee are unlikely to all be conditionally independent (given examinee proficiency). Models that assume conditional independence will overestimate the precision with which examinee proficiency is measured. Overstatement of precision may lead to inaccurate inferences, and it may also end an examination prematurely when the stopping rule is based on the estimated standard error of examinee proficiency (e.g., in an adaptive test). The standard three-parameter IRT model was modified to include an additional random effect for items nested within the same testlet (Wainer, Bradlow, & Du, 2000). This parameter, γ (gamma), characterizes the amount of local dependence in a testlet. We fit 86 TOEFL® testlets (50 reading comprehension and 36 listening comprehension) with the new model and obtained a value for the variance of γ for each testlet. We compared the standard parameters [discrimination (a), difficulty (b), and guessing (c)] with those obtained through traditional modeling. We found that difficulties were well estimated either way, but estimates of both a and c were biased when conditional independence was incorrectly assumed. Of greater import, we found that test information was substantially overestimated when conditional independence was incorrectly assumed.
