Below are some recent publications that our researchers have authored.
2016
-
Psychometrics and Game-Based Assessment
R. J. Mislevy, S. Corrigan, A. Oranje, K. DiCerbo, M. I. Bauer, A. A. von Davier, & M. John (2016)
in F. Drasgow (ed.), Technology and Testing: Improving Educational and Psychological Measurement, pp. 23–48
The authors of this chapter discuss psychometric models and validity issues related to game-based assessments, as well as design implications of linking assessment and psychometric methods with game design. The book is part of the NCME Applications of Educational Measurement and Assessment Book Series.
-
Agent-Based Modeling of Collaborative Problem Solving
Y. Bergner, J. J. Andrews, M. Zhu, & J.E. Gonzales (2016)
ETS Research Report RR-16-27. Princeton, NJ: Educational Testing Service.
The authors explore how agent-based modeling can be used to model collaborative problem solving, test the sensitivity of outcomes to different population characteristics, and generate simulated data for refining and developing psychometric models.
-
Using Networks to Visualize and Analyze Process Data for Educational Assessment
M. Zhu, Z. Shu, & A.A. von Davier (2016)
Journal of Educational Measurement, v53, n2, pp. 190–211, Summer 2016
The authors of this study focus on process data collected from scenario-based tasks and explore the potential of using concepts and methods from social network analysis to represent and analyze process data.
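To make the idea concrete, here is a minimal sketch of the kind of representation the article describes: each logged action becomes a node, and consecutive actions become weighted directed edges. The action names and sequences below are invented for illustration, and the network statistics shown are generic, not the authors' specific analyses.

```python
# Minimal sketch: represent scenario-based task logs as a weighted,
# directed transition network (nodes = actions, edges = consecutive pairs).
# The action sequences below are invented for illustration.
from collections import Counter

import networkx as nx

sequences = [
    ["open_tool", "read_prompt", "plot", "submit"],
    ["read_prompt", "plot", "plot", "submit"],
    ["open_tool", "plot", "read_prompt", "submit"],
]

edge_counts = Counter()
for seq in sequences:
    edge_counts.update(zip(seq, seq[1:]))

G = nx.DiGraph()
for (src, dst), weight in edge_counts.items():
    G.add_edge(src, dst, weight=weight)

# Simple network descriptives of the kind used to compare examinees.
print(nx.density(G))
print(sorted(G.out_degree(weight="weight")))
```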
-
Exponential Family Distributions Relevant to IRT
S. J. Haberman (2016)
In W. J. van der Linden (ed.), Handbook of Item Response Theory, Volume Two: Statistical Tools. Boca Raton: Chapman and Hall/CRC, 2016, pp. 47–69
Exponential families provide a statistical framework for developing customary models for the analysis of item responses and for studying the statistical properties associated with these models.
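As a concrete instance of the connection (a standard result, not a passage from the chapter), the Rasch model is a one-parameter exponential family in which the raw score is sufficient for ability:

```latex
% Rasch model for a 0/1 response X_i to item i with difficulty b_i:
P(X_i = x \mid \theta) = \frac{\exp\{x(\theta - b_i)\}}{1 + \exp(\theta - b_i)}
% Joint likelihood over n items: an exponential family with natural
% parameter \theta whose sufficient statistic is the raw score \sum_i x_i.
P(\mathbf{X} = \mathbf{x} \mid \theta)
  = \frac{\exp\bigl(\theta \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} b_i x_i\bigr)}
         {\prod_{i=1}^{n} \bigl(1 + \exp(\theta - b_i)\bigr)}
```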
-
A Program for Nonparametric Raw-to-Scale Conversion
S. J. Haberman (2016)
ETS Research Memorandum RM-16-03
Raw-to-scale conversions are developed in which no assumptions are made about the distribution of the raw scores; the target scale-score distribution is instead based on a continuous exponential family and on specified moment constraints.
-
An Evaluation of Different Statistical Targets for Assembling Parallel Forms in Item Response Theory
U.S. Ali, & P. van Rijn (2016)
Applied Psychological Measurement, v40, n3, pp. 163–179, 2016
The authors investigated the interplay between two different IRT targets for statistical test specifications, the test characteristic curve (TCC) and the test information function (TIF), exploring the extent to which test forms differ in their TIFs when they are assembled to a target TCC.
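For reference, the two targets are built from standard quantities. These are generic definitions, not formulas reproduced from the article:

```latex
% Test characteristic curve (TCC) and test information function (TIF)
% for a form of k items with item response functions P_i(\theta):
T(\theta) = \sum_{i=1}^{k} P_i(\theta), \qquad
I(\theta) = \sum_{i=1}^{k} I_i(\theta)
% Under the 2PL model with discrimination a_i, item information is
I_i(\theta) = a_i^{2}\, P_i(\theta)\bigl(1 - P_i(\theta)\bigr)
```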
-
Integrating Scaffolding Strategies Into Technology-Enhanced Assessments of English Learners: Task Types and Measurement Models
M.K. Wolf, D. Guzman-Orth, A.A. Lopez, K.E. Castellano, I. Himelfarb, & F.S. Tsutagawa (2016)
Educational Assessment, v21, n3, pp. 157–175, 2016
The authors explore ways to improve the assessment of English language proficiency for students who are learning English, given the current movement toward next-generation English language proficiency assessments in the Common Core era. They discuss the integration of scaffolding strategies into the design of technology-enhanced assessment tasks.
-
Designing Tests to Measure Personal Attributes and Noncognitive Skills
P.C. Kyllonen (2016)
In S. Lane, M.R. Raymond, & T.M. Haladyna (eds.), Handbook of Test Development, Second Edition. New York: Routledge, 2016, pp. 190–211
The author of this chapter discusses why noncognitive skills are important for both education and the workplace, explores various construct frameworks for these skills, and reviews different methods for measuring them.
-
Estimating True Student Growth Percentile Distributions Using Latent Regression Multidimensional IRT Models
J.R. Lockwood, & K.E. Castellano (2016)
Educational and Psychological Measurement, pp. 1–28, Jul 2016 (currently online only)
The authors develop a novel framework that uses latent regression multidimensional item response theory models to study distributional properties of true student growth percentiles. Such models are becoming more common in the United States for making inferences about student achievement growth and educator effectiveness.
-
Evaluation of Different Scoring Rules for a Noncognitive Test in Development
H. Guo, J. Zu, P.C. Kyllonen, & N. Schmitt (2016)
ETS Research Report RR-16-03
The authors discuss how systematic applications of statistical and psychometric methods are used to develop and evaluate scoring rules in terms of test reliability.
-
Applications of Multidimensional Item Response Theory Models With Covariates to Longitudinal Test Data
J. Fu (2016)
ETS Research Report RR-16-21
The author discusses applications of multidimensional item response theory (MIRT) models with covariates to longitudinal test data to measure skill differences at the individual and group levels.
-
Taming Log Files From Game/Simulation-Based Assessments: Data Models and Data Analysis Tools
J. Hao, L. Smith, R. Mislevy, A.A. von Davier, & M.I. Bauer (2016)
ETS Research Report RR-16-10
The authors propose a generic data model, specified as an Extensible Markup Language (XML) schema, for log files from game/simulation-based assessments. They also propose a set of analysis methods for identifying useful information in the log files and implement the methods in glassPy, a package in the Python programming language.
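As a rough illustration of what such a data model enables, the sketch below parses a toy XML log into typed event tuples. The element and attribute names are hypothetical; they are not the schema proposed in the report, and this is not the glassPy API.

```python
# Minimal sketch of reading a game-log XML into (timestamp, actor, type,
# value) tuples. The element and attribute names here are hypothetical,
# not the schema proposed in the report or implemented in glassPy.
import xml.etree.ElementTree as ET

SAMPLE = """
<log session="demo-001">
  <event t="0.0" actor="student" type="start"/>
  <event t="4.2" actor="student" type="place_zone" value="residential"/>
  <event t="9.7" actor="system" type="feedback" value="pollution_up"/>
</log>
"""

root = ET.fromstring(SAMPLE)
events = [
    (float(e.get("t")), e.get("actor"), e.get("type"), e.get("value"))
    for e in root.iter("event")
]
for ev in events:
    print(ev)
```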
2015
-
Psychometric Considerations for the Next Generation of Performance Assessment
T. Davey, S. Ferrara, P.W. Holland, R. Shavelson, N.M. Webb, & L.L. Wise (2015)
Report of the Center for K–12 Assessment & Performance Management
The authors take up issues of definition; scoring, score reliability, and task comparability for individual and group-work performance assessment; and the modeling and scoring of the diverse response types produced.
-
Psychometrics in Support of a Valid Assessment of Linguistic Minorities: Implications for the Test and Sampling Designs
M.E. Oliveri, & A.A. von Davier (2015)
International Journal of Testing, pp. 1–20, 2015 (currently online only)
The authors propose that the unique needs and characteristics of linguistic minorities be considered throughout the test development process, and they describe strategies that focus on the early stages of test development.
-
The Powerful Merge of "Big Data" and Assessment Data in Support of Student Success
D.G. Payne, & A.A. von Davier (2015)
Ninth Annual Strategic Leaders Global Summit. Singapore: National University of Singapore, 2015, pp. 66–68
The authors discuss the use of big data in education as a way to achieve more nuanced measurement and deeper knowledge of students' abilities and life circumstances.
-
Simulation-Extrapolation for Estimating Means and Causal Effects with Mismeasured Covariates
J.R. Lockwood, & D.F. McCaffrey (2015)
Observational Studies, v1, pp. 241–290, Oct 2015
The authors review approaches to consistent estimation of a population mean of an incompletely observed variable using error-prone covariates, noting difficulties in applying these methods. They also consider simulation-extrapolation (SIMEX) as a simple and effective alternative.
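The SIMEX idea itself is easy to demonstrate. The sketch below, on simulated data, shows the general algorithm (add extra noise at several levels, track the naive estimate, extrapolate back to the no-error level at lambda = -1); it is not the authors' implementation.

```python
# Minimal SIMEX sketch for a regression slope on an error-prone covariate
# W = X + U with Var(U) known. Simulated data only; a sketch of the
# general algorithm, not the paper's estimator.
import numpy as np

rng = np.random.default_rng(0)
n, sigma_u = 2000, 0.8
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
w = x + rng.normal(scale=sigma_u, size=n)          # observed, error-prone

def naive_slope(w_obs):
    return np.polyfit(w_obs, y, 1)[0]              # attenuated by error

lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
slopes = []
for lam in lambdas:
    sims = [
        naive_slope(w + rng.normal(scale=np.sqrt(lam) * sigma_u, size=n))
        for _ in range(50)                         # average over pseudo data sets
    ]
    slopes.append(np.mean(sims))

# Quadratic extrapolation of slope(lambda) back to lambda = -1 (no error).
coef = np.polyfit(lambdas, slopes, 2)
print("SIMEX slope:", np.polyval(coef, -1.0))      # close to the true 2.0
```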
-
Uncovering Multivariate Structure in Classroom Observations in the Presence of Rater Errors
D.F. McCaffrey, K. Yuan, T.D. Savitsky, J.R. Lockwood, M.O. Edelen (2015)
Educational Measurement: Issues and Practice, v34, n2, pp. 34–46, Summer 2015
The authors examine the factor structure of scores from the CLASS-S protocol obtained from observations of middle school classroom teaching. They demonstrate that errors in scores given by two raters on the same lesson have a factor structure distinct from the factor structure at the teacher level, and they consider alternative hierarchical estimation approaches designed to prevent contamination of the estimated teacher-level factors.
-
Matching and Weighting with Functions of Error-Prone Covariates for Causal Inference
J.R. Lockwood, & D.F. McCaffrey (2015)
Journal of the American Statistical Association, 2015 (currently online only)
The authors establish necessary and sufficient conditions for matching and weighting with functions of observed covariates to yield unconfounded causal effect estimators, generalizing results from the standard (i.e., no measurement error) case.
-
An Alternative Way to Model Population Ability Distributions in Large-Scale Educational Surveys
E. Wetzel, X. Xu, & M. von Davier (2015)
Educational and Psychological Measurement, v75, n5, pp. 739–763, Oct 2015
The authors explore an alternative way to model population ability distributions in large-scale educational surveys, where a latent regression model is often used to compensate for the shortage of cognitive information. They introduce latent class analysis (LCA) as a way to identify multiple groups that can account for the variation among students.
-
Bayesian Networks in Educational Research
R. G. Almond, R. J. Mislevy, L. Steinberg, D. Yan, & D. Williamson (2015)
Statistics for Social and Behavioral Sciences
The authors explain and illustrate how Bayesian networks, which combine statistical methods and computer-based expert systems, can be used to develop models for educational assessment.
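At its core, the inference such networks perform is repeated Bayesian updating. The sketch below works through a deliberately tiny case, one discrete proficiency node with two conditionally independent observable task outcomes; the probabilities are invented, not taken from the book.

```python
# Minimal sketch of the core inference a Bayes net performs: a discrete
# proficiency node with conditionally independent observable outcomes.
# All probabilities are invented for illustration.
import numpy as np

states = ["low", "high"]
prior = np.array([0.5, 0.5])              # P(proficiency)
# P(task correct | proficiency) for two observable tasks
p_correct = np.array([[0.3, 0.8],         # task 1: low, high
                      [0.2, 0.9]])        # task 2: low, high

observed = [1, 1]                         # both tasks answered correctly

posterior = prior.copy()
for task, outcome in enumerate(observed):
    like = p_correct[task] if outcome else 1.0 - p_correct[task]
    posterior = posterior * like          # multiply in each likelihood
posterior /= posterior.sum()              # renormalize

print(dict(zip(states, np.round(posterior, 3))))
```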
-
Prediction of True Test Scores from Observed Item Scores and Ancillary Data
S. J. Haberman, L. Yao, & S. Sinharay (2015)
British Journal of Mathematical and Statistical Psychology, Vol. 68, No. 2, pp. 363–385
The authors develop new methods to evaluate the performance of test takers when items are scored both by human raters and by computers.
-
Alternative Statistical Frameworks for Student Growth Percentile Estimation
J. R. Lockwood & K. E. Castellano (2015)
Statistics and Public Policy, Online First
The authors describe two alternative statistical approaches for estimating student growth percentiles (SGPs). The first estimates percentile ranks of current test scores conditional on past test scores directly; the second estimates SGPs directly from longitudinal item-level data.
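A rough sketch of the first approach, on simulated data: fit a grid of quantile regressions of current score on prior score, then report the highest conditional percentile whose fitted value does not exceed the student's current score. This is illustrative only and not the authors' estimator.

```python
# Minimal sketch: conditional percentile rank of a current score given a
# prior score, via a grid of quantile regressions. Simulated scores;
# illustration only, not the paper's estimator.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
prior = rng.normal(500, 100, size=3000)
current = 0.7 * prior + rng.normal(160, 60, size=3000)

X = sm.add_constant(prior)
quantiles = np.linspace(0.01, 0.99, 99)
fits = [sm.QuantReg(current, X).fit(q=q) for q in quantiles]

def sgp(prior_score, current_score):
    xrow = np.array([1.0, prior_score]).reshape(1, -1)
    preds = np.array([f.predict(xrow)[0] for f in fits])
    # SGP = highest conditional percentile still at or below current score.
    below = quantiles[preds <= current_score]
    return int(round(100 * (below[-1] if below.size else quantiles[0])))

print(sgp(500.0, 520.0))
```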
-
An Application of Exploratory Data Analysis in the Development of Game-Based Assessments
K. E. DiCerbo, M. Bertling, S. Stephenson, Y. Jia, R. J. Mislevy, M. I. Bauer, & T. Jackson (2015)
Serious Games Analytics: Methodologies for Performance Measurement, Assessment, and Improvement, C. S. Loh, Y. Sheng, & D. Ifenthaler (eds.), pp. 319–342
The authors of this chapter discuss the use of exploratory data analysis (EDA) and the 4 R's (revelation, resistance, re-expression, and residuals) to better understand players' knowledge, skills, and attributes (KSAs), yielding evidence that the authors suggest can be combined in a measurement model based on Bayesian networks.
-
Assessing Collaborative Problem Solving with Simulation Based Tasks
J. Hao, L. Liu, A. von Davier, & P. Kyllonen (2015)
In Proceedings of the 11th International Conference on Computer Supported Collaborative Learning
The authors discuss preliminary results from a project for assessing collaborative problem solving with a web-based simulation, in which two participants collaborated via a chat box to complete a science task. Responses from 486 individuals and 278 teams (dyads) recruited from Amazon Mechanical Turk™ were compared.
-
Methodological Challenges in the Analysis of MOOC Data for Exploring the Relationship between Discussion Forum Views and Learning Outcomes
Y. Bergner, D. Kerr, & D. E. Pritchard (2015)
In Proceedings of the 8th International Conference on Educational Data Mining, pp. 234–241
The article discusses methodological challenges, including missing data, that researchers face when seeking to understand the diverse group of students who take massive open online courses (MOOCs). Solving these challenges is important to policymakers and providers of education.
-
Estimation of Ability from Homework Items When There are Missing and/or Multiple Attempts
Y. Bergner, K. Colvin, & D. E. Pritchard (2015)
In Proceedings of the Fifth International Conference on Learning Analytics and Knowledge, pp. 118–12
Missing data and multiple answer attempts present two challenges to scoring massive open online courses (MOOCs). The authors discuss these challenges with regard to ability estimation from homework items in a large-enrollment electrical engineering MOOC.
-
An Exploratory Study Using Social Network Analysis to Model Eye Movements in Mathematics Problem Solving
M. Zhu & G. Feng (2015)
In Proceedings of the Fifth International Conference on Learning Analytics and Knowledge, pp. 383–387
The paper applies techniques from social network analysis to eye movement patterns in mathematics problem solving. The authors construct and visualize transition networks using eye-tracking data collected from 37 eighth-grade students while they solved linear function problems.
-
Use of Jackknifing to Evaluate Effects of Anchor Item Selection on Equating With the Nonequivalent Groups With Anchor Test (NEAT) Design
R. Lu, S. Haberman, H. Guo & J. Liu (2015)
ETS Research Report RR-15-10
The authors evaluate the impact of anchor selection on equating stability. Anchor selection can strongly influence equating results in practice, even with large examinee samples, and can pose a major hazard to the practical use of equating.
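The jackknife logic is straightforward to sketch: drop one anchor item at a time, recompute a simple equating function, and examine how much the conversion moves. The code below uses chained mean equating with an external anchor on simulated 0/1 item data; it is a caricature of the design, not the report's procedure.

```python
# Minimal sketch of the jackknife idea: under chained mean equating in a
# NEAT design with an external anchor, drop one anchor item at a time and
# see how much the equating constant moves. All item scores are simulated;
# a caricature of the design, not the report's procedure.
import numpy as np

rng = np.random.default_rng(3)
n_anchor = 20
anchor_new = rng.binomial(1, 0.60, size=(1000, n_anchor))       # new-form group
anchor_old = rng.binomial(1, 0.55, size=(1200, n_anchor))       # old-form group
total_new = rng.binomial(1, 0.60, size=(1000, 40)).sum(axis=1)  # unique items
total_old = rng.binomial(1, 0.55, size=(1200, 40)).sum(axis=1)

def equating_constant(keep):
    """Chained mean equating shift y = x + c through the kept anchor items."""
    return (total_old.mean() - total_new.mean()
            + anchor_new[:, keep].sum(axis=1).mean()
            - anchor_old[:, keep].sum(axis=1).mean())

full = equating_constant(np.arange(n_anchor))
jack = np.array([
    equating_constant(np.delete(np.arange(n_anchor), k))
    for k in range(n_anchor)
])
print("full-anchor constant:", round(full, 2))
print("leave-one-out range:", round(jack.min(), 2), "to", round(jack.max(), 2))
```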
-
Repeater Analysis for Combining Information From Different Assessments
S. Haberman & L. Yao (2015)
Journal of Educational Measurement, Vol. 52, No. 2, pp. 223–251
The article discusses how information from several assessments, for example TOEFL iBT® test scores and GRE® revised General Test scores, can be combined in a rational way, and it suggests principles for exploring how various assessments relate to each other. Augmentation approaches developed for individual tests are applied to provide an accurate evaluation of combined assessments; the proposed methodology can be applied to other situations involving multiple assessments.
-
Pseudo-Equivalent Groups and Linking
S. Haberman (2015)
Journal of Educational and Behavioral Statistics, Vol. 40, No. 3, pp. 254–273
The author explores an approach to linking test forms in a nonequivalent-groups design with no satisfactory common items, comparing the reasonableness of results from pseudo-equivalent groups with results from kernel equating.
-
Analyzing Process Data from Game/Scenario-Based Tasks: An Edit Distance Approach
J. Hao, Z. Shu, & A. von Davier (2015)
Journal of Educational Data Mining, Vol. 7, No. 1, pp. 33–50
The authors describe their research on evaluating students' performance in game/scenario-based tasks by comparing how far each student's action string (the recorded sequence of actions) is from the action string corresponding to the best performance, with proximity quantified by the edit distance between the strings.
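The distance itself is the classic Levenshtein edit distance, sketched below with invented single-character action codes; the scoring rule (distance from an ideal action string) follows the paper's general idea, not its exact implementation.

```python
# Minimal sketch of the edit-distance idea: score an examinee's action
# string by its Levenshtein distance from the action string of an ideal
# solution. The action codes are invented for illustration.
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                  # deletion
                curr[j - 1] + 1,              # insertion
                prev[j - 1] + (ca != cb),     # substitution
            ))
        prev = curr
    return prev[-1]

best = "ORPS"                          # ideal action sequence
print(edit_distance("ORRPS", best))    # 1: one extra action
```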
-
Gamification in Assessment: Do Points Affect Test Performance?
Y. Attali & M. Arieli-Attali (2015)
Computers & Education, Vol. 83, pp. 57–63
The authors examine the premise that gamification promotes motivation and engagement and therefore supports the learning process. They examined the effects of points, a basic element of gamification, on performance in a computerized assessment of mastery and fluency of basic mathematics concepts.
-
The Changing Nature of Educational Assessment
R. Bennett (2015)
Review of Research in Education, Vol. 39, No. 1, pp. 370–407
The author describes the evolution of technology-based assessment from an initial stage of infrastructure building, through a second stage of gradual qualitative change and efficiency improvement, to a third stage of innovative assessments aligned with the two Common Core State Assessment (CCSA) consortia: the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium (SBAC). The article then explores this emerging third stage in depth.
-
A Comparison of IRT Proficiency Estimation Methods Under Adaptive Multistage Testing
S. Kim, T. Moses, & H. Yoo (2015)
Journal of Educational Measurement, Vol. 52, No. 1, pp. 70–79
The article reports on an investigation of the accuracy of item response theory (IRT) proficiency estimators under multistage testing (MST).
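As one concrete example of the estimators compared in studies like this, the sketch below computes an expected a posteriori (EAP) proficiency estimate under a 2PL model by quadrature over a standard normal prior. The item parameters and response pattern are invented.

```python
# Minimal sketch of EAP proficiency estimation under a 2PL model via
# numerical quadrature over a standard normal prior. Item parameters and
# the response pattern are invented for illustration.
import numpy as np

a = np.array([1.0, 1.4, 0.8, 1.2, 0.9])     # discriminations
b = np.array([-1.0, -0.3, 0.0, 0.6, 1.2])   # difficulties
x = np.array([1, 1, 0, 1, 0])               # observed 0/1 responses

theta = np.linspace(-4, 4, 161)             # quadrature grid
prior = np.exp(-0.5 * theta**2)             # standard normal, unnormalized

p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))     # grid x items
like = np.prod(np.where(x == 1, p, 1.0 - p), axis=1)    # pattern likelihood

post = like * prior
post /= post.sum()
eap = float(np.sum(theta * post))
print("EAP estimate of theta:", round(eap, 3))
```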
-
The Impact of Measurement Error on the Accuracy of Individual and Aggregate SGP
D. F. McCaffrey, K. E. Castellano, & J. R. Lockwood (2015)
Educational Measurement: Issues and Practice, Vol. 34, No. 1, pp. 15–21
The authors discuss a potential bias due to measurement error that can affect student growth percentiles (SGPs) for individual students as well as mean or median SGPs at the aggregate level, and they discuss various techniques that seek to decrease this bias.
2014
-
Using Response Time to Investigate Students’ Test-Taking Behaviors in a NAEP Computer-Based Study
Y.-H. Lee & Y. Jia (2014)
Large-scale Assessments in Education, Vol. 2, No. 1, pp. 1–24
Students' test-taking behaviors and level of effort in survey assessments, and their effects on performance, have long been discussed. The authors present a procedure for examining test-taking behaviors using response times collected from a National Assessment of Educational Progress (NAEP) computer-based study, referred to as MCBS.
-
Psychometric Considerations in Game-Based Assessment
R. Mislevy, A. Oranje, M. Bauer, et al. (2014)
GlassLab Research, Institute of Play
This paper describes the formative assessment value of simulation games, as seen in the work to develop "SimCityEDU: Pollution Challenge!" It is the first of several papers published by Institute of Play on the work and research of GlassLab. The authors are affiliated with ETS, Institute of Play, Pearson, and Electronic Arts.
-
Equating Test Scores (without IRT), Second Edition
S. Livingston (2014)
ETS-published book
A nonmathematical introduction to equating that emphasizes conceptual understanding and practical applications. This second edition also covers raw and scaled scores, linear and equipercentile equating, data collection designs for equating, selection of anchor items, equating of constructed-response tests (and other tests that include constructed-response questions), and methods of anchor equating.
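For readers who want the formulas behind the book's two central methods, the standard textbook definitions are as follows (these are generic, not quoted from the book):

```latex
% Linear equating of a form-X score x onto the form-Y scale:
e_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X)
% Equipercentile equating, matching percentile ranks through the CDFs:
e_Y(x) = F_Y^{-1}\bigl(F_X(x)\bigr)
```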
-
Toward Increasing Fairness in Score Scale Calibrations Employed in International Large-Scale Assessments
M. E. Oliveri & M. von Davier (2014)
International Journal of Testing, Vol. 14, No. 1, pp. 1–21
The authors investigate the creation of comparable score scales across countries in international assessments and examine potential improvements to current score scale calibration procedures used in international large-scale assessments.
-
Handbook of International Large-Scale Assessment: Background, Technical Issues, and Methods of Data Analysis
L. Rutkowski, M. von Davier & D. Rutkowski (eds.) (2014)
Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences
This handbook provides a broad guide to international large-scale assessments, behavioral statistics, and policy. It is of value to graduate students and researchers, as well as to policy analysts who are familiar with quantitative methods without being experts in the field.
-
Computerized Multistage Testing: Theory and Applications
D. Yan, A. A., von Davier, & C. Lewis (eds.) (2014)
Chapman & Hall, London, UK
The book covers the methodologies, underlying technology, and implementation of computerized multistage testing. It gives a historical overview and examines critical areas of research inquiry, item pool development and maintenance, and the most recent practical applications and challenges. It also discusses how to apply theoretical statistical tools to testing.
-
Visualization and Confirmatory Clustering of Sequence Data from a Simulation-based Assessment Task
Y. Bergner, Z. Shu, & A. A. von Davier (2014)
In J. Stamper, Z. Pardos, M. Mavrikis, & B. M. McLaren (eds.), Proceedings of the 7th International Conference on Educational Data Mining, London, UK
This paper explores computer-based visualization and clustering of sequence data from a simulation-based assessment task. It covers visualization challenges such as how to represent progress toward a goal and how to account for variable-length sequences.
-
Examining Potential Boundary Bias Effects in Kernel Smoothing on Equating: An Introduction for the Adaptive and Epanechnikov Kernels
J. A. Cid & A. A. von Davier (2014)
Applied Psychological Measurement
Test equating is a method for making scores from different test forms of an assessment comparable. The article discusses kernel equating (KE), explores the potential effects of score spikes at the extreme ends of the score distribution on kernel equating, and considers alternative ways to reduce boundary bias when smoothing.
-
Monitoring of Scoring Using the e-rater® Automated Scoring System and Human Raters on a Writing Test
Z. Wang & A. A. von Davier (2014)
ETS Research Report RR-14-04. Princeton, NJ: Educational Testing Service.
This report proposes methodologies for monitoring the quality of both human and automated constructed-response (CR) scoring. For quality assurance purposes, a consistent and standardized approach to monitoring CR scoring quality over time and across programs is needed; monitoring the scoring results helps provide scores that are fair and accurate for test takers and test users.
-
Achieving a Stable Scale for an Assessment with Multiple Forms: Weighting Test Samples in IRT Linking
J. Qian, A. A. von Davier, & Y. Jiang (2014)
In R. E. Millsap, L. A. van der Ark, D. M. Bolt, & C. M. Woods (eds.), New Developments in Quantitative Psychology: Presentations from the 77th Annual Psychometric Society Meeting (pp. 171–186). New York, NY: Springer Verlag
In this study, researchers examined the application of statistical weighting techniques as a way of improving scale stability over time in assessments with multiple forms.
2013
-
Detection of Unusual Administrations Using a Linear Mixed Effects Model
Y-H. Lee, M. Liu, & A. A. von Davier (2013)
In R. E. Millsap, L. A. van der Ark, D. M. Bolt, & C. M. Woods (eds.), New Developments in Quantitative Psychology: Presentations from the 77th Annual Psychometric Society Meeting (pp. 133–150). New York, NY: Springer Verlag
In this paper, researchers report on their investigation of a model for detecting abnormal results in an assessment that used a specific equating plan over multiple test administrations occurring in rapid succession.
-
Assessing Item Fit for Unidimensional Item Response Theory Models Using Residuals from Estimated Item Response Functions
S. J. Haberman, S. Sinharay & K.-H. Chon (2013)
Psychometrika, Vol. 78, No. 3, pp. 417–440
The authors discuss residual analysis, a popular method for assessing the fit of item response theory (IRT) models, and suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models.
-
Determining When Single Scoring for Constructed-Response Items Is as Effective as Double Scoring in Mixed-Format Licensure Tests
S. Kim & T. P. Moses (2013)
International Journal of Testing, Vol. 13, No. 4, pp. 314–328
The authors studied the conditions under which single scoring of constructed-response items is as effective as double scoring in the licensure testing context, finding that single scoring, under certain conditions, can reduce scoring time and cost without increasing classification inconsistency.
-
Monitoring Scale Scores Over Time Via Quality Control Charts, Model-Based Approaches, and Time Series Techniques
Y. H. Lee & A. A. von Davier (2013)
Psychometrika, Vol. 78, No. 3, pp. 557–575
The authors present a new way of monitoring scores and assessing scale drift. They use quality control charts, model-based approaches, and time series techniques to address a range of needs for monitoring scale scores, including continuous monitoring, adjustment for customary variation, identification of abrupt shifts, and assessment of autocorrelation.
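The simplest of these tools, a Shewhart-style control chart on administration-level mean scale scores, can be sketched in a few lines. The data below are simulated, and the three-standard-deviation limits are a generic choice, not the authors' specification.

```python
# Minimal sketch of a Shewhart-style control chart on mean scale scores
# across administrations: flag administrations outside mean +/- 3 SD.
# The administration means below are simulated; illustration only.
import numpy as np

rng = np.random.default_rng(2)
admin_means = rng.normal(150.0, 1.0, size=30)
admin_means[22] += 5.0                     # an abrupt shift to detect

center = admin_means.mean()
sd = admin_means.std(ddof=1)
ucl, lcl = center + 3 * sd, center - 3 * sd

flags = np.flatnonzero((admin_means > ucl) | (admin_means < lcl))
print("control limits:", (round(lcl, 2), round(ucl, 2)))
print("flagged administrations:", flags)
```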
-
Assessing a Critical Aspect of Construct Continuity When Test Specifications Change or Test Forms Deviate from Specifications
J. Liu & N. J. Dorans (2013)
Educational Measurement: Issues and Practice, Vol. 32, No. 1, pp. 15–22
The authors show how score equity assessment (SEA) can support the assessment of a critical aspect of construct continuity, the equivalence of scores, whenever planned changes are introduced to testing programs. They also demonstrate that SEA can be used as a quality control check when evaluating the degree to which tests developed to a static set of specifications are the same (equatable).
-
Using Deterministic, Gated Item Response Theory Model to Detect Test Cheating Due to Item Compromise
Z. Shu, R. Henson & R. Luecht (2013)
Psychometrika, Vol. 78, No. 3, pp. 481–497
The authors suggest that the deterministic, gated item response theory model can be used to identify cheating examinees whose significant score gains result from item exposure or item compromise. They use hierarchical Markov chain Monte Carlo as the model's estimation framework and apply the model to a real data set to illustrate how it can identify examinees with advance knowledge of exposed items.
-
Collaborative Problem Solving and the Assessment of Cognitive Skills: Psychometric Considerations
A. A. von Davier & P. F. Halpin (2013)
ETS Research Report RR-13-41
The authors give an overview of research conducted in several fields related to collaboration, propose a framework for assessing cognitive skills (such as science or math) through collaborative problem-solving tasks, and propose several statistical approaches for modeling data collected from collaborative interactions.
-
Observed-Score Equating: An Overview
A. A. von Davier (2013)
Psychometrika, Vol. 78, pp. 605–623
This paper provides an overview of the observed-score equating (OSE) process and discusses a range of issues related to tests, common items, and sampling designs and their relationship to measurement and equating. It also presents various challenges to the equating process, model assumptions, and approaches to equating evaluation.
-
Local Equating Using the Rasch Model, the OPLM, and the 2PL IRT Model — or — What Is It Anyway if the Model Captures Everything There Is to Know About the Test Takers?
M. von Davier, J. Gonzalez, & A. A. von Davier (2013)
Journal of Educational Measurement, Vol. 50, No. 3, pp. 295–303
In this article, the authors discuss how the features of a procedure known as local equating, based on Lord's criterion of equity, change when a Rasch model is used.
Find Other ETS-authored Statistics and Psychometrics Publications
ETS ReSEARCHER is a database containing information on ETS-authored or ETS-published works, such as ETS Research Reports and ETS Research Memoranda, as well as publications written by ETS researchers and published by third parties, such as scholarly journals.