Weighting Test Samples in IRT Linking and Equating: Toward an Improved Sampling Design for Complex Equating

Author(s):
Qian, Jiahe; von Davier, Alina A.; Jiang, Yanming
Publication Year:
2013
Report Number:
RR-13-39
Source:
ETS Research Report
Document Type:
Report
Page Count:
31
Subject/Key Words:
Sampling Design; Target Population; Subsample; Raking; Item Response Theory (IRT); Poststratification Equating (PE); Complete Grouped Jackknifing

Abstract

Several factors can introduce variability into item response theory (IRT) linking and equating procedures, including variability across examinee samples and/or test items, seasonality, regional differences, native language diversity, gender, and other demographic variables. This raises the following question: is it possible to select samples of examinees so that IRT linking and equating are more precise at the level of a single administration as well as over a large number of administrations? This is a question of optimal sampling design in linking and equating. To obtain an improved sampling design for invariant linking and equating across testing administrations, we applied weighting techniques to yield a weighted sample distribution consistent with the target population distribution. The goal is a stable Stocking-Lord test characteristic curve (TCC) linking and a true-score equating that is invariant across administrations. To study the effects of weighting on linking, we first selected multiple subsamples from a data set. We then compared the linking parameters estimated from each subsample with those from the full data set and examined whether the parameters from the weighted subsample yielded smaller mean squared errors (MSEs) than those from the unweighted subsample. To study the effects of weighting on true-score equating, we also compared the distributions of the equated scores. Overall, weighting produced good results.
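The weighting step described in the abstract adjusts subsample weights so that the weighted distribution of demographic variables matches the target population. One standard technique for this, named in the keywords, is raking (iterative proportional fitting). The sketch below is a minimal illustration under assumed inputs; the variable names (`gender`, `region`) and the target margins are hypothetical and not taken from the report.

```python
import numpy as np

def rake_weights(sample, margins, max_iter=100, tol=1e-8):
    """Raking (iterative proportional fitting): adjust per-examinee weights
    so the weighted marginal distribution of each variable in `sample`
    matches the target proportions in `margins`.

    sample  : dict mapping variable name -> array of category codes
    margins : dict mapping variable name -> {category: target proportion}
    """
    n = len(next(iter(sample.values())))
    w = np.ones(n)  # start from uniform base weights
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in margins.items():
            codes = np.asarray(sample[var])
            total = w.sum()  # snapshot before adjusting this variable
            for cat, p in target.items():
                mask = codes == cat
                cur = w[mask].sum() / total  # current weighted proportion
                if cur > 0:
                    factor = p / cur
                    w[mask] *= factor  # scale this category toward target
                    max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:  # all margins matched; stop early
            break
    return w * n / w.sum()  # normalize so weights sum to the sample size

# Hypothetical subsample of 6 examinees with assumed target margins.
sample = {"gender": [0, 0, 0, 0, 1, 1], "region": [0, 1, 0, 1, 0, 1]}
margins = {"gender": {0: 0.5, 1: 0.5}, "region": {0: 0.4, 1: 0.6}}
w = rake_weights(sample, margins)
```

After raking, the weighted proportions of each category match the target margins, so the weighted subsample stands in for the target population in the linking step.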
