
The Effects on Observed- and True-Score Equating Procedures of Matching on a Fallible Criterion: A Simulation With Test Variation

Eignor, Daniel R.; Stocking, Martha L.; Cook, Linda L.
ETS Research Report
Subject/Key Words:
Equated Scores, Item Response Theory (IRT), Scholastic Aptitude Test (SAT)


Two recent simulation studies were conducted to aid in the diagnosis and interpretation of equating differences found between random and matched (nonrandom) samples for four commonly used equating procedures: three observed-score procedures (Tucker, Levine equally reliable, and chained equipercentile) and the 3PL IRT true-score equating procedure. For those simulations, test forms were equated to themselves, a situation that does not reflect reality. In the present simulation, test variation was added as an additional variable for study. The results confirmed those of the previous two simulations and support the theoretically based prediction that observed-score equating methods, such as Tucker and chained equipercentile, are more affected by sample variation than are a true-score method (IRT) or an observed-score method based on true-score assumptions (Levine equally reliable). The results further suggest that matching equating samples on a fallible measure of ability, such as anchor test score, is not advisable for any equating method studied except possibly the Tucker method. (28 pp.)
