
Estimating Item Difficulty With Comparative Judgments

Attali, Yigal; Saldivia, Luis; Jackson, Carol A.; Schuppan, Frederick W.; Wanamaker, Wilbur
Publication Year:
Report Number:
ETS Research Report
Document Type:
Page Count:
Subject/Key Words: Test Development, Item Difficulty, Raters


Previous investigations of the ability of content experts and test developers to estimate item difficulty have, for the most part, produced disappointing results. These investigations were based on a noncomparative method in which judges rated the difficulty of each item independently. In this article, we argue that eliciting comparative judgments of difficulty allows judges to estimate item difficulties more accurately. In this study, judges from different backgrounds rank-ordered SAT mathematics items by difficulty in sets of 7 items. Results showed that judges are reasonably successful in rank ordering several items in terms of difficulty, with little variability across judges and content areas. Simulations of a possible implementation of comparative judgments for difficulty estimation show that it is possible to achieve high correlations between true and estimated difficulties with relatively few comparisons. Implications of these results for the test development process are discussed.
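
As a rough illustration of the kind of simulation the abstract describes, the following Python sketch draws hypothetical true difficulties, has a simulated judge rank-order random sets of 7 items under noisy perception, and estimates each item's difficulty as its mean rank. The normal noise model, the mean-rank estimator, and all names and parameter values (`simulate`, `noise_sd`, `n_sets`, and so on) are assumptions made for illustration only; the report's actual simulation design and estimation procedure are not given in this abstract.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_items=100, set_size=7, n_sets=200, noise_sd=1.0, seed=0):
    """Correlate true difficulties with mean-rank estimates (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: true difficulties are standard normal.
    true_difficulty = [rng.gauss(0, 1) for _ in range(n_items)]
    rank_sum = [0.0] * n_items
    rank_count = [0] * n_items

    for _ in range(n_sets):
        items = rng.sample(range(n_items), set_size)
        # Assumption: the judge perceives true difficulty plus normal noise.
        perceived = {i: true_difficulty[i] + rng.gauss(0, noise_sd) for i in items}
        # Rank 0 is the item judged easiest in this set.
        for rank, i in enumerate(sorted(items, key=perceived.get)):
            rank_sum[i] += rank
            rank_count[i] += 1

    # Only items that appeared in at least one set receive an estimate.
    judged = [i for i in range(n_items) if rank_count[i] > 0]
    estimated = [rank_sum[i] / rank_count[i] for i in judged]
    truth = [true_difficulty[i] for i in judged]
    return pearson(truth, estimated)


if __name__ == "__main__":
    for n_sets in (50, 200, 800):
        print(n_sets, round(simulate(n_sets=n_sets), 3))
```

Varying `n_sets` and `noise_sd` in this sketch gives a feel for how the number of comparative judgments and judge accuracy trade off against the true-versus-estimated correlation, under the assumptions stated above.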
