
A Comparison of Score Aggregation Methods for Unidimensional Tests on Different Dimensions

Author(s): Fu, Jianbin; Feng, Yuling
Publication Year: 2018
Report Number: RR-18-01
Source: ETS Research Report
Document Type: Report
Page Count: 18
Subject/Key Words: Aggregation, Factor Analysis, Testlet Model, Unidimensional IRT, Item Response Theory (IRT), Generalized Partial-Credit Model (GPCM), Test Scores, Elementary Secondary Education, Test-Taker Performance, Cognitively Based Assessment of, for, and as Learning (CBAL), Test Dimensionality

Abstract

In this study, we propose aggregating test scores with unidimensional within-test structure and multidimensional across-test structure based on a 2-level, 1-factor model. In particular, we compare 6 score aggregation methods: average of standardized test raw scores (M1), regression factor score estimate of the 1-factor model based on the correlation matrix of test raw scores (M2), overall ability from a unidimensional generalized partial credit model (GPCM) based on the items from all tests (M3), average of ability estimates from individual tests based on GPCM (M4), regression factor score of the 1-factor model based on the correlation matrix of ability estimates from individual tests based on GPCM (M5), and general ability from the testlet model (M6). The 4 design factors considered in the simulation study are ability correlation between tests (.3, .5, .7, .8, and .9), test length (10, 20, 30, and 60 items), number of tests (2 and 4), and factor loading distribution (equal and unequal). The comparisons are also conducted on a real test data set with 2 tests. On the basis of the results, M1 and M4 are recommended for 2 tests, and M2, M5, and M6 are recommended for 3 or more tests. Several issues regarding attaining aggregate score reliability for intended uses and score aggregation types distinguished by test dimensionality are discussed, and practical suggestions for score aggregation are provided.
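The two simplest methods in the abstract can be sketched directly. Below is a minimal illustration, not the authors' implementation: M1 standardizes each test's raw scores and averages them, and M2 computes a Thurstone-style regression factor-score estimate for a one-factor model, using weights `R^{-1} * lambda` where `R` is the correlation matrix of the raw scores and `lambda` is an assumed vector of standardized loadings (the paper estimates these from the data; here they are supplied by the caller).

```python
import numpy as np


def m1_average_standardized(scores):
    """M1: average of standardized (z-scored) test raw scores.

    scores: (n_examinees, n_tests) array of raw test scores.
    Returns one aggregate score per examinee.
    """
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    return z.mean(axis=1)


def m2_regression_factor_score(scores, loadings):
    """M2: regression factor-score estimate for a one-factor model
    based on the correlation matrix of test raw scores.

    loadings: assumed standardized loadings, one per test (in the
    paper these come from fitting the 1-factor model; they are an
    input here to keep the sketch self-contained).
    """
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    R = np.corrcoef(z, rowvar=False)          # test intercorrelations
    w = np.linalg.solve(R, np.asarray(loadings, dtype=float))
    return z @ w                               # weighted composite
```

With two tests whose scores are already perfectly aligned, M1 reduces to the common z-score; with correlated but distinct tests, M2 up-weights the tests that load more strongly on the common factor.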
