Estimating Item Difficulty With Comparative Judgments

Author(s): Attali, Yigal; Saldivia, Luis; Jackson, Carol A.; Schuppan, Frederick W.; Wanamaker, Wilbur
Publication Year: 2014
Report Number: RR-14-39
Source: ETS Research Report
Document Type: Report
Page Count: 8
Subject/Key Words: Test Development, Item Difficulty, Raters

Abstract

Previous investigations of the ability of content experts and test developers to estimate item difficulty have, for the most part, produced disappointing results. These investigations were based on a noncomparative method of independently rating the difficulty of items. In this article, we argue that, by eliciting comparative judgments of difficulty, judges can more accurately estimate item difficulties. In this study, judges from different backgrounds rank ordered the difficulty of SAT mathematics items in sets of 7 items. Results showed that judges are reasonably successful in rank ordering several items in terms of difficulty, with little variability across judges and content areas. Simulations of a possible implementation of comparative judgments for difficulty estimation show that it is possible to achieve high correlations between true and estimated difficulties with relatively few comparisons. Implications of these results for the test development process are discussed.

http://dx.doi.org/10.1002/ets2.12042
