
Estimating the Effects of Test Length and Test Time on Parameter Estimation Using the HYBRID Model

Yamamoto, Kentaro
Publication Year:
Report Number:
ETS Research Report
Document Type:
Page Count:
Subject/Key Words:
HYBRID, Item Response Theory (IRT), Parameter Estimation, Test Speededness, Test of English as a Foreign Language (TOEFL)


When individuals perform tasks, they differ from each other not only in their ability to perform the tasks correctly but also in their speed. The traditional indicator of test speededness, missing responses, clearly signals a lack of time to respond. However, it is inadequate for evaluating speededness in a multiple-choice test scored as number correct, and it underestimates test speededness: because a wrong answer costs no more than an omitted one under number-correct scoring, examinees who run out of time tend to guess rather than leave items blank. Conventional IRT parameter estimation ignores this mixture of random responses during calibration; consequently, the estimated parameters are biased. The HYBRID model (Yamamoto, 1989) was extended (Yamamoto, 1990) to characterize the point at which each examinee switches from an ability-based response strategy to a strategy of responding randomly. The extended model allows test speededness to be evaluated by estimating the proportion of examinees who switch strategies at each possible point in the test. IRT parameters estimated under the HYBRID model were more accurate than those from an ordinary IRT-only analysis. Applying the extended HYBRID model to data from an experimental form of the TOEFL® test, we found that 1) test length had a small impact on the proportion of examinees affected by speededness, 2) a greater proportion of examinees were affected by speededness under a 50-minute time limit than under a 55- or 60-minute time limit, and 3) the difference between the proportions of examinees affected by speededness under the 55- and 60-minute time limits was small. However, nearly 20% of the examinees were affected by speededness after completing 80% of the test. In other words, the last 20% of the responses of 20% of the examinees did not represent their true ability.
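The switching idea behind the extended HYBRID model can be illustrated with a small sketch: an examinee answers items 1..k with an ability-based IRT strategy and items k+1..n at random, and the likelihood of a response pattern is a mixture over all possible switch points k. This is a simplified illustration under stated assumptions, not the report's estimation procedure: it assumes a 2PL item response function, a single known ability value θ (the actual model estimates item parameters and integrates over ability), one constant chance rate of 0.25 for random responses (as if every item had four options), and a prior over switch points supplied by the user. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def irt_prob(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response (assumed model form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def bernoulli(p: float, x: int) -> float:
    """Probability of observing response x (1 = correct, 0 = wrong)."""
    return p if x == 1 else 1.0 - p


def switch_likelihoods(responses: Sequence[int], theta: float,
                       a: Sequence[float], b: Sequence[float],
                       chance: float = 0.25) -> List[float]:
    """L[k] = likelihood if items 0..k-1 are answered by ability and
    items k..n-1 at random; k = n means the examinee never switches."""
    n = len(responses)
    # Prefix products: ability-based likelihood of the first k items.
    ability = [1.0]
    for i in range(n):
        p = irt_prob(theta, a[i], b[i])
        ability.append(ability[-1] * bernoulli(p, responses[i]))
    # Suffix products: random-response likelihood of items k..n-1.
    rand_suffix = [1.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        rand_suffix[i] = rand_suffix[i + 1] * bernoulli(chance, responses[i])
    return [ability[k] * rand_suffix[k] for k in range(n + 1)]


def switch_posterior(responses: Sequence[int], theta: float,
                     a: Sequence[float], b: Sequence[float],
                     prior: Sequence[float],
                     chance: float = 0.25) -> Tuple[List[float], float]:
    """Posterior over switch points and the mixture likelihood
    sum_k prior[k] * L[k] of the whole response pattern."""
    lik = switch_likelihoods(responses, theta, a, b, chance)
    joint = [p * l for p, l in zip(prior, lik)]
    total = sum(joint)
    return [j / total for j in joint], total


if __name__ == "__main__":
    # Five hypothetical items; the examinee misses only the last two.
    post, total = switch_posterior([1, 1, 1, 0, 0], theta=1.0,
                                   a=[1.0] * 5, b=[0.0] * 5,
                                   prior=[1 / 6] * 6)
    for k, p in enumerate(post):
        print(f"switch after item {k}: posterior {p:.3f}")
```

Summing the prior-weighted likelihoods over switch points is what lets a model of this kind separate speeded random responding from low ability; an IRT-only calibration would instead treat the trailing random responses as ability-based, which is the source of the bias described above.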
