
Using a New Statistical Model for Testlets to Score TOEFL

Wainer, Howard; Wang, Xiaohui
Publication Year:
Report Number:
ETS Research Report
Document Type:
Page Count:
Subject/Key Words:
Test of English as a Foreign Language (TOEFL), Reading Comprehension, Listening Comprehension, Item Response Theory (IRT)


Standard item response theory (IRT) models fit to examination responses ignore the fact that sets of items (testlets) are often matched with a single common stimulus (e.g., a reading comprehension passage). In this setting, the items given to an examinee are unlikely to be conditionally independent (given examinee proficiency). Models that assume conditional independence will overestimate the precision with which examinee proficiency is measured. Overstating precision may lead to inaccurate inferences and to prematurely ending an examination whose stopping rule is based on the estimated standard error of examinee proficiency (e.g., an adaptive test). The standard three-parameter IRT model was modified to include an additional random effect for items nested within the same testlet. This parameter, γ (gamma), characterizes the amount of local dependence in a testlet. We fit 86 TOEFL testlets (50 reading comprehension and 36 listening comprehension) with the new model and obtained a value for the variance of γ for each testlet. We compared the estimates of the standard parameters [discrimination (a), difficulty (b), and guessing (c)] with those obtained through traditional modeling. We found that difficulties were well estimated either way, but estimates of both a and c were biased when conditional independence was incorrectly assumed. Of greater import, we found that test information was substantially overestimated when conditional independence was incorrectly assumed.
