Effect of Rasch Calibration on Ability and DIF Estimation in Computer-Adaptive Tests

Authors: Thayer, Dorothy T.; Wingersky, Marilyn S.; Zwick, Rebecca J.
Publication Year:
Report Number:
Document Type: ETS Research Report
Page Count: 38
Subject/Key Words: Ability, Adaptive Testing, Computer Assisted Testing, Differential Item Functioning (DIF), Estimation (Mathematics), Rasch Model


A simulation study of methods of assessing differential item functioning (DIF) in computer-adaptive tests (CATs) was conducted by Zwick, Thayer, and Wingersky (in press, 1992). Results showed that modified versions of the Mantel-Haenszel and standardization methods work well with CAT data. In that study, data were generated using the three-parameter logistic (3PL) model, and this same model was assumed in obtaining item parameter estimates. In the current study, 3PL item response data were used, but the Rasch model was assumed in obtaining item parameter estimates, which, in turn, determined the information table to be used in the item selection algorithm. New Rasch-based expected true scores were obtained for each examinee, based on responses to the CAT items. As in the previous study, the DIF statistics were highly correlated with the generating DIF, and the means and standard deviations of these statistics across items were close to their nominal values. There was, however, a tendency for DIF statistics to be slightly smaller in magnitude than in the 3PL analysis, resulting in a lower probability of detecting items with extreme DIF. This reduced sensitivity appeared to be related to degradation in the accuracy of matching. Expected true scores from the Rasch-based CAT tended to be biased downward, particularly for lower-ability examinees. Unlike the Rasch CAT scores, Rasch expected true scores based on nonadaptive administration of all pool items behaved quite well, as did the nonadaptive and CAT-based expected true scores obtained using the 3PL model. (38pp.)
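The abstract contrasts the Rasch and three-parameter logistic (3PL) item response models, refers to expected true scores computed from item probabilities, and names the Mantel-Haenszel DIF method. As a minimal illustrative sketch (not the report's actual code; all parameter values below are made-up assumptions, and the -2.35 delta-scale constant is the conventional one for MH D-DIF), these quantities can be written as:

```python
import math

def p_rasch(theta, b):
    """Rasch probability of a correct response (item difficulty b only)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability: discrimination a, difficulty b, lower asymptote
    ("guessing") c, with the conventional scaling constant D = 1.7."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def expected_true_score(theta, difficulties):
    """Rasch expected true score: the sum of the item response
    probabilities at ability theta over the administered items."""
    return sum(p_rasch(theta, b) for b in difficulties)

def mh_d_dif(tables):
    """Mantel-Haenszel D-DIF = -2.35 * ln(alpha_MH), where each matching
    stratum k contributes a 2x2 table (A, B, C, D):
      A = reference-group correct, B = reference-group incorrect,
      C = focal-group correct,     D = focal-group incorrect.
    Negative values indicate DIF against the focal group."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return -2.35 * math.log(num / den)
```

Note that a Rasch curve has a lower asymptote of zero, while a 3PL curve with c > 0 does not; for example, at a low ability such as theta = -3, `p_3pl(-3, 1, 0, 0.2)` exceeds `p_rasch(-3, 0)`, which is one way a Rasch calibration of 3PL data can distort expected true scores for lower-ability examinees.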
