Educational Assessment, Accountability and Equity: Conversations on Validity Around the World

Videos and Presentations

Opening Ceremony/Introduction: Video Presentation
Susan Fuhrman (Teachers College, Columbia University)
Ida Lawrence (Educational Testing Service)

Validity, Fairness, and Testing: Video Presentation
Michael Kane (Educational Testing Service)
Edmund W. Gordon (Teachers College, Columbia University)
Sébastien Georges (Centre international d'études pédagogiques)
Kadriye Ercikan (University of British Columbia)
Alina Von Davier (Educational Testing Service)

Models of Teacher Evaluation and School Accountability Around the World: Video Presentation
Adrie Visscher (University of Twente)
Aaron Pallas (Teachers College, Columbia University)
Drew Gitomer (Rutgers University)
Jakob Wandall (Independent Consultant, Denmark)
Haniza Yon (MIMOS, Berhad)

Validity Issues in International Large-Scale Assessments: Video Presentation
Michael Feuer (George Washington University)
Hans Wagemaker (International Association for the Evaluation of Educational Achievement)
Eduardo Backhoff (Universidad Autónoma de Baja California)
Robert Laurie (New Brunswick Ministry of Education)
Val W. Plisko (retired – The National Center for Education Statistics)

Bringing the Validity Conversations Home: When Education Measures Go Public: Video Presentation
Eva L. Baker (University of California – Los Angeles)

This conference was followed by a two-day AERI Institute event which included a sampler of short courses on assessment and evaluation topics delivered by expert faculty. Details regarding courses and faculty are available at

Presentation Abstracts

The Chimera of Validity
(Eva L. Baker)

This presentation will treat the long-valued topic of validity, briefly reviewing its history up to the recent guidelines intended to redefine it and to support its use in educational assessment and testing. The 1999 Standards for Testing and Assessment will serve as context. Three examples will be given of validity claims that were not technically supported, and potential causes will be considered. Finally, the future of validity issues will be sketched in light of potential changes in assessment design and delivery, such as technological options and the development of badges and other qualifications to recognize deep and extended student work.

Validity Issues in International Large-Scale Assessments: Truth and Consequences
(Michael Feuer)

This paper has three goals: to consider the policy context of international comparisons of educational achievement in the United States and elsewhere; to explore possible lessons for educational comparisons from cross-national studies in economics and demography; and to propose an expanded framework for validity definition in cross-national assessments, borrowing from Messick's formulation of "consequential validity."

Validity, Fairness, and Testing
(Michael Kane)

Validation requires two kinds of argument: an interpretive argument and a validity argument. The interpretive argument specifies the proposed interpretation and use of test scores in terms of a sequence of inferences and assumptions leading from observed test performances to claims about attributes of the test taker and, typically, to decisions about the test taker. The validity argument validates the proposed interpretation and use by evaluating the plausibility of the inferences and assumptions in the interpretive argument. That is, in validating a proposed interpretation or use, we first lay out the claims being made, and then we evaluate these claims. An evaluation of a test use necessarily requires an evaluation of the consequences of the use. If the consequences are, on the whole, positive, the use is justified. If the consequences are judged to be more negative than positive, the use is unjustified. Evaluating the consequences of a decision rule tends to be difficult and potentially contentious, but if we want to evaluate test uses, it is necessary. Fairness plays a major role in the evaluation of both the interpretations and uses of test scores. In the context of interpretations, fairness can be defined in terms of the consistency of score meanings and implications (e.g., predictions of various criteria) across groups. For test uses, fairness can be defined in terms of the appropriateness of the decisions based on test scores, and it therefore depends mainly on the consequences of these decisions.

Models of Teacher Evaluation and School Accountability Around the World: A Dutch Perspective
(Adrie Visscher)

A description will be given of how the Dutch schools inspectorate holds schools (and teachers) accountable for their performance, and of how Dutch schools are performing. Next, the potential of various ways of using evaluation data for performance improvement will be discussed. It is argued that benefiting most from evaluation data requires a school organization in which teaching is deprivatized, which operates in a goal-oriented way, in which performance feedback is used for improvement, and in which teachers are supported in improving the knowledge and skills that matter for student achievement. An important prerequisite for such a mode of operation is a fruitful balance between (external) evaluation and improvement.


Copyright © 2012 by Educational Testing Service. All rights reserved. ETS, the ETS logo and LISTENING. LEARNING. LEADING. are registered trademarks of Educational Testing Service (ETS). All other trademarks are property of their respective owners.