
Equating Achievement Tests Using Samples Matched on Ability

Author(s):
Cook, Linda L.; Eignor, Daniel R.; Schmitt, Alicia P.
Publication Year:
1990
Report Number:
RR-90-10, CBR-90-02
Source:
ETS Research Report
Document Type:
Report
Page Count:
58
Subject/Key Words:
Achievement Tests, College Entrance Examinations, Equated Scores, Measurement Techniques

Abstract

This paper discusses the equating of reasonably parallel forms of College Board Achievement Tests in Biology, Chemistry, Mathematics Level II, American History and Social Studies, and French. The results of five equating methods are compared: (1) Tucker, (2) Levine equally reliable, (3) Levine unequally reliable, (4) frequency estimation equipercentile, and (5) chained equipercentile. These methods are used with an internal common-item anchor-test data collection design. Three sampling strategies were evaluated: (1) random samples from populations similar in ability level, (2) random samples from populations dissimilar in ability level, and (3) samples from populations dissimilar in ability level that have been constructed to be similar in ability level by matching on a covariate, such as the distribution of scores on a set of common items. In all cases, the criterion for comparison was the set of results from the Tucker procedure used with random samples from populations similar in ability level, because these results represent those obtained under optimal operational conditions. The results of the study indicate that it may be difficult, and in some cases impossible, to equate achievement tests using new- and old-form samples obtained from populations that differ in ability level. All equating methods investigated in this study appear to be affected by group differences in ability. The methods that appear to be most affected by these differences are the Tucker and frequency estimation equipercentile procedures. The methods that appear to be most robust to group differences in ability are the chained equipercentile and the two Levine procedures. (66pp.)
