
The Standard Errors of Various Test Statistics When the Items Are Sampled (Revised Edition)

Lord, Frederic M.
ETS Research Bulletin
Subject/Key Words:
Office of Naval Research, Error of Measurement, Item Sampling, Sample Size, Statistical Bias, Test Items, Test Reliability, Test Theory, Test Validity


Suppose that a large number of forms of the same test are administered to the same group of examinees, each form consisting of a random sample of items drawn from a common pool of items. If some test statistic is computed separately for each form of the test, the value obtained will (ignoring practice effect, fatigue, etc.) differ from form to form because of sampling fluctuations. The standard deviation of the values obtained represents, approximately, the standard error of the test statistic when the test items are sampled. Formulas for such standard errors are here derived for (a) the test score of a single examinee, (b) the mean test score of a group of examinees, (c) the standard deviation of the scores of the group, (d) the Kuder-Richardson reliability of the test, formula 20, (e) the Kuder-Richardson reliability, formula 21, and (f) the test validity. In large samples, the foregoing statistics (with the possible exception of (d)) are approximately normally distributed, so that significance tests can be made by familiar procedures. Consideration is given to the relation of certain of the foregoing standard errors to the conventional standard error of measurement, to the Kuder-Richardson reliability coefficients 20 and 21, and to the Wilks-Votaw criterion for parallel tests. Practical applications of the results are briefly discussed. In particular, it is concluded that the Kuder-Richardson formula-21 reliability coefficient should properly be used in certain practical situations instead of the commonly preferred formula-20 coefficient. (Supersedes RB-53-07.)
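
The thought experiment in the abstract can be approximated numerically. The sketch below is illustrative only and does not reproduce the report's derived formulas: it builds a hypothetical pool of right/wrong item responses for a fixed group of examinees, draws many random forms from that pool, computes the mean, standard deviation, KR-20 and KR-21 for each form, and takes the standard deviation of each statistic across forms as an empirical standard error under item sampling. The function names, the pool size, and the logistic response model used to generate data are all assumptions made for this example.

```python
import math
import random
import statistics


def simulate_pool(n_examinees, n_pool_items, rng):
    """Hypothetical 0/1 response matrix (rows: examinees, columns: pool items).

    Assumption: responses follow a simple logistic model in ability minus
    difficulty. This is only a convenient way to generate example data.
    """
    abilities = [rng.gauss(0.0, 1.0) for _ in range(n_examinees)]
    difficulties = [rng.gauss(0.0, 1.0) for _ in range(n_pool_items)]
    return [
        [1 if rng.random() < 1.0 / (1.0 + math.exp(-(a - b))) else 0
         for b in difficulties]
        for a in abilities
    ]


def form_statistics(responses, item_ids):
    """Mean, SD, KR-20 and KR-21 for one form made of the given pool items."""
    k = len(item_ids)
    n = len(responses)
    totals = [sum(row[j] for j in item_ids) for row in responses]
    mean = statistics.fmean(totals)
    var = statistics.pvariance(totals)  # population variance of total scores
    if var == 0:
        raise ValueError("all examinees have the same score on this form")
    # Sum over items of p * q, where p is the proportion answering correctly.
    sum_pq = 0.0
    for j in item_ids:
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1.0 - p)
    kr20 = k / (k - 1) * (1.0 - sum_pq / var)
    kr21 = k / (k - 1) * (1.0 - mean * (k - mean) / (k * var))
    return {"mean": mean, "sd": math.sqrt(var), "kr20": kr20, "kr21": kr21}


def item_sampling_standard_errors(responses, form_length, n_forms, rng):
    """Draw random forms from the pool; return per-form stats and their SDs."""
    n_pool = len(responses[0])
    forms = [
        form_statistics(responses, rng.sample(range(n_pool), form_length))
        for _ in range(n_forms)
    ]
    # The spread of each statistic across forms approximates its standard
    # error when items, not examinees, are sampled.
    ses = {name: statistics.stdev([f[name] for f in forms]) for name in forms[0]}
    return forms, ses


# Example run with arbitrary sizes and a fixed seed for repeatability.
rng = random.Random(0)
responses = simulate_pool(n_examinees=60, n_pool_items=120, rng=rng)
forms, ses = item_sampling_standard_errors(responses, form_length=20, n_forms=50, rng=rng)
for name, se in ses.items():
    print(f"empirical standard error of {name}: {se:.3f}")
```

The examinee group is held fixed across forms, so all variation in the output comes from which items were drawn. On any single form, KR-21 cannot exceed KR-20 when both use the same score variance, because the sum of the item variances is at most the value obtained by treating every item as having the average difficulty.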
