The Effects on Observed- and True-Score Equating Procedures of Matching on a Fallible Criterion: A Simulation With Test Variation

Author(s): Eignor, Daniel R.; Stocking, Martha L.; Cook, Linda L.
Publication Year: 1990
Report Number: RR-90-25
Source: ETS Research Report
Document Type: Report
Page Count: 28
Subject/Key Words: Equated Scores, Item Response Theory (IRT), Scholastic Aptitude Test (SAT)

Abstract

Two recent simulation studies were conducted to aid in the diagnosis and interpretation of equating differences found between random and matched (nonrandom) samples for four commonly used equating procedures: the Tucker, Levine equally reliable, and chained equipercentile observed-score procedures, and the 3PL IRT true-score equating procedure. For those simulations, test forms were equated to themselves, a situation that does not mirror reality. In the present simulation, test variation was added as an additional variable for study. The results confirmed those of the previous two simulations and support the prediction, based on theoretical grounds, that observed-score equating methods, such as Tucker and chained equipercentile, are more affected by sample variation than is a true-score method (IRT) or an observed-score method based on true-score assumptions (Levine equally reliable). The results further suggest that matching equating samples on the basis of a fallible measure of ability, such as anchor test score, is not advisable for any equating method studied except possibly the Tucker method. (28 pp.)

http://dx.doi.org/10.1002/j.2333-8504.1990.tb01361.x
