Application of a New Goodness-of-Fit Plot Procedure to SAT and TOEFL Item Type Data

Eignor, Daniel R.; Golub-Smith, Marna L.; Wingersky, Marilyn S.
Equated Scores, Goodness of Fit, Item Response Theory (IRT), Scholastic Aptitude Test (SAT), Statistical Analysis, Test of English as a Foreign Language (TOEFL)


The three-parameter logistic IRT model is currently used for equating with data from multiple administrations of two ETS testing programs: the Admissions Testing Program Scholastic Aptitude Test (SAT) and the Test of English as a Foreign Language (TOEFL). Users of the model in these two programs have generally assessed its fit to test items by inspecting item regression plots against estimated ability. This procedure is fairly subjective and quite time-consuming; as a result, little useful aggregate data have been gathered that would show which SAT-Verbal, SAT-Mathematical, and TOEFL item types the model fits better or worse. The study had two purposes: (1) to systematically study the fit of the three-parameter logistic model to SAT and TOEFL item type data, using a fit statistic formed by grouping on estimated ability together with a newly suggested normal probability plot procedure; and (2) to determine how consistently lack of fit of the three-parameter model is identified by the fit statistic formed by grouping on estimated ability and by a newly suggested fit statistic formed by grouping on observed number-right score. (56pp.)
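The abstract does not give the exact form of either fit statistic or of the plot procedure. As a rough illustration only, the sketch below assumes a Yen-style chi-square statistic: examinees are sorted on estimated ability, split into equal-sized groups, and each group's observed proportion correct is compared with the mean three-parameter logistic probability. The per-item statistics are then standardized and paired with normal quantiles for a normal probability plot. The function names, the scaling constant `D = 1.7`, the ten-group default, the chi-square standardization, and the Blom plotting positions are all assumptions, not the authors' procedure.

```python
import math
from statistics import NormalDist


def p_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def grouped_fit_statistic(thetas, responses, a, b, c, n_groups=10):
    """Hypothetical chi-square-type fit statistic for one item.

    Examinees are sorted on estimated ability (theta) and split into
    n_groups roughly equal groups; each group contributes
    m * (observed - expected)^2 / (expected * (1 - expected)).
    """
    pairs = sorted(zip(thetas, responses))
    n = len(pairs)
    stat = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        m = len(group)
        observed = sum(r for _, r in group) / m
        expected = sum(p_3pl(t, a, b, c) for t, _ in group) / m
        stat += m * (observed - expected) ** 2 / (expected * (1.0 - expected))
    return stat


def normal_plot_points(stats, df):
    """Coordinates for a normal probability plot of item fit statistics.

    Each statistic is standardized as (Q - df) / sqrt(2 * df), as for a
    chi-square variate with df degrees of freedom (an assumption), then
    sorted and paired with standard-normal quantiles at Blom positions.
    """
    z = sorted((q - df) / math.sqrt(2.0 * df) for q in stats)
    n = len(z)
    nd = NormalDist()
    return [(nd.inv_cdf((i + 1 - 0.375) / (n + 0.25)), zi)
            for i, zi in enumerate(z)]
```

Under these assumptions, items whose points rise well above the line through the bulk of the plot would be flagged as poorly fit; grouping on observed number-right score instead of estimated ability would change only how `pairs` is sorted and split.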

