A Sampling of Statistical Problems Encountered at the Educational Testing Service

Johnson, Eugene G.; Lewis, Charles; Mislevy, Robert J.; Wainer, Howard
Publication Year:
Report Number:
ETS Research Report
Document Type:
Page Count:
Subject/Key Words:
Bayesian Statistics, Educational Testing Service, National Assessment of Educational Progress (NAEP), Nonresponse, Statistical Analysis, Test Theory, Testing Problems


In this paper, four researchers at ETS describe what they consider some of the most vexing statistical problems they have faced. While these problems are not all purely statistical, they all have major statistical components. In the first section, "Problems with Simultaneous Estimation of Many True Scores," Charles Lewis describes a technical problem that occurs in taking a Bayesian approach to traditional test theory. The linear models he describes, so-called "true score models," are the basis of most test scoring schemes, and so the problem he describes has analogs in many other fields. In the second section, "Test Theory Reconceived," Robert J. Mislevy explains the growing dissatisfaction with models of the sort described previously and points toward the need for a broader outlook. In the third section, "Allowing Examinee Choice in Exams," Howard Wainer discusses the general problem of nonignorable nonresponse in one circumstance, specifically an increasingly popular innovation in testing practice: allowing examinees to choose to answer only a small number of test items from a larger selection. He points out the pitfalls of such a practice and laments its consequences. In the fourth section, Eugene G. Johnson describes some statistical issues facing NAEP, specifically how nonignorable nonresponse affects the inferences drawn from the NAEP survey. By law, students and schools may opt not to participate in the assessment. The problems caused by nonresponse and the current methods of adjustment are discussed. In response to the educational reforms described by Mislevy and Wainer, NAEP uses new testing methodologies. Johnson describes some of these and indicates some statistical issues they engender. (JGL) (21pp.)

