
The Relationship Between Item Difficulty and Test Validity and Reliability

Myers, Charles T.
Publication Year:
Report Number:
ETS Research Bulletin
Document Type:
Page Count:
Subject/Key Words: Difficulty Level, Predictive Validity, Test Items, Test Reliability, Test Validity


Using freshman average grades as a criterion, this research compares the validity and reliability of part scores based on sets of items selected from a verbal and mathematical aptitude test for college freshmen, the sets having been selected on the basis of their "difficulty." Two parallel sets of 24 items each were selected from the items whose difficulty fell between 40 per cent passing and 74 per cent passing. Two other parallel sets of 24 items each were selected from the items outside this range, with half the items in each set easy and half hard. The subjects were 1,600 freshmen at 12 liberal arts colleges. For each college, a table was prepared of the correlations between sets of items and between each set of items and grade averages. The correlations between parallel sets of items were taken as the reliability coefficients. The correlations between average grades and the sums of scores on pairs of parallel sets were taken as the validity coefficients. Thus there were 12 pairs of reliability coefficients and 12 pairs of validity coefficients, one pair of each for each college. The significance of the differences between the two kinds of sets was tested by Wilcoxon's matched-pairs signed-ranks test. No significant differences were found for the validities. The mean validity coefficient for all sets was .50. The sets of medium "difficulty" items were found to be more reliable, a difference significant at the two per cent level. The mean reliability of these sets of 24 items was .69, while the mean reliability of the other sets was .63.
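The comparison described above, 12 per-college coefficient pairs tested with Wilcoxon's matched-pairs signed-ranks test, can be sketched as follows. This is a minimal illustration and not the report's own computation: the function names `signed_ranks` and `wilcoxon_exact` are hypothetical, the exact two-sided p-value by full enumeration of sign patterns is an assumption about how the test might be carried out, and the demonstration values are invented placeholders rather than data from the study.

```python
from itertools import product


def signed_ranks(x, y):
    """Return signed ranks of the non-zero paired differences x[i] - y[i]."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return [r if d > 0 else -r for r, d in zip(ranks, diffs)]


def wilcoxon_exact(x, y):
    """Two-sided exact signed-ranks test; returns (T, p_value)."""
    abs_ranks = [abs(r) for r in signed_ranks(x, y)]
    total = sum(abs_ranks)
    w_plus = sum(r for r in signed_ranks(x, y) if r > 0)
    t_obs = min(w_plus, total - w_plus)  # smaller of the two rank sums
    # Enumerate every assignment of signs to the ranks.
    # There are 2**n cases, which is only 4096 for n = 12 pairs.
    hits = 0
    count = 0
    for signs in product((0, 1), repeat=len(abs_ranks)):
        w = sum(r for r, s in zip(abs_ranks, signs) if s)
        if min(w, total - w) <= t_obs:
            hits += 1
        count += 1
    return t_obs, hits / count


if __name__ == "__main__":
    # HYPOTHETICAL coefficients in hundredths, one pair per college.
    # These are placeholders for illustration, not values from the report.
    medium = [70, 66, 72, 68, 71, 65, 69, 73, 67, 70, 64, 72]
    extreme = [63, 67, 64, 60, 66, 62, 61, 68, 65, 59, 66, 63]
    t, p = wilcoxon_exact(medium, extreme)
    print(f"T = {t}, two-sided exact p = {p:.4f}")
```

Working in integer hundredths avoids floating-point noise when tied differences are detected. With 12 pairs, the smallest attainable two-sided p-value is 2/4096, which occurs when every college favors the same kind of set.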
