English Language Learning and Assessment Worldwide
ETS has a long history of supporting research on global assessments for adult and older adolescent English learners who need to demonstrate English proficiency for academic or job-related purposes. In recent years, our research on assessments for younger learners has provided a strong foundation for newly introduced assessments targeting elementary and middle school students in global contexts — the TOEFL® Primary™ tests and the TOEFL Junior® Standard and Comprehensive tests.
Our research in this area pursues the following aims:
- Validity and Fairness Frameworks — We conduct foundational research with the aim of providing conceptual frameworks to guide the practice of developing fair and valid assessments.
- Test Design Frameworks and Principles — We develop new test design frameworks and principles that are theoretically grounded and practically applicable.
- Quality — We maintain an ongoing program of research to support and continuously improve existing testing programs. As part of this focus on quality, we also conduct foundational research aimed at understanding language development, identifying factors that may affect test-taker performance, understanding rater behaviors, and improving test-scoring practices.
- Innovation — We seek to design new measures, in particular those that utilize new methods and language technologies to support English learning and instruction.
Research Related to ETS Testing Programs
- The TOEFL iBT® test — Learn more about the research we conduct to support this assessment of students' ability to use English for communication at the university level across the globe.
- The TOEIC® tests — Learn more about the research we conduct to support this assessment of test-takers' ability to use English for communication in the global workplace.
In addition to the publications that our research professionals have authored in direct support of the TOEFL® and TOEIC tests, our English Language Learning and Assessment research initiative publishes or funds works related to foundational research in an international context. Topics of such research include validity and fairness frameworks; test design frameworks and principles; quality; and innovation. Below are some recent publications:
- Validity and Fairness Frameworks
In this book chapter, the author discusses an argument-based approach to validation and provides examples in the context of language testing.
Validating Score Interpretations and Uses
M. Kane (2012)
Language Testing, Vol. 29, No. 1, pp. 3–17
In this article, the author discusses a two-step, argument-based approach to validation involving (1) specifying the proposed uses and interpretations of the test's scores, and (2) evaluating the plausibility of the proposed interpretive argument.
A Framework for Evaluation and Use of Automated Scoring
D. Williamson, X. Xi, & J. Breyer (2012)
Educational Measurement: Issues and Practice, Vol. 31, No. 1, pp. 2–13
This article provides a framework for evaluating and using automated scoring for constructed-response tasks. The framework includes both criteria for evaluating automated scoring and guidelines for implementing and maintaining it in the context of constantly evolving technologies.
Validity and the Automated Scoring of Performance Tests
X. Xi (2012)
Chapter in The Routledge Handbook of Language Testing, pp. 438–451
Editors: G. Fulcher & F. Davidson
In this book chapter, the author discusses validity considerations in using automated scoring for performance-based language tests in the context of evolving theories and practice in test validity.
Does an Argument-Based Approach to Validity Make a Difference?
C. A. Chapelle, M. K. Enright, & J. Jamieson (2010)
Educational Measurement: Issues and Practice, Vol. 29, No. 1, pp. 3–13
This paper evaluates the differences between two approaches to validity: the argument-based approach of Kane (2006) and the approach of the 1999 AERA/APA/NCME Standards for Educational and Psychological Testing.
How Do We Go About Investigating Test Fairness?
X. Xi (2010)
Language Testing, Vol. 27, No. 2, pp. 147–170
This article proposes an approach that treats fairness as an aspect of validity and provides an illustration of how a fairness argument may be established and supported in a validity argument.
Methods of Test Validation
X. Xi (2008)
Chapter in Encyclopedia of Language and Education, Volume 7: Language Testing and Assessment, 2nd fully revised edition, pp. 177–196
Editors: E. Shohamy & N. H. Hornberger
This chapter provides a comprehensive examination of the evolution of the concept of validity and presents current validation methods for language assessments. The author also discusses how advances in validity research in language assessment benefit from progress in other fields.
What and How Much Evidence Do We Need? Critical Considerations in Validating an Automated Scoring System
X. Xi (2008)
Chapter in Towards Adaptive CALL: Natural Language Processing for Diagnostic Language Assessment (pp. 102–114)
Editors: C. A. Chapelle, Y.-R. Chung, & J. Xu
Publisher: Iowa State University
This paper illustrates how an argument-based approach can be applied to the validation of the use of an automated scoring system called SpeechRater℠ for the TOEFL® Practice Online Speaking test.
- Test Design Frameworks and Principles
Evidence-centered design (ECD) is a conceptual framework for the design and delivery of assessments. This book chapter discusses the ways ECD can be used effectively in language testing.
This book chapter offers a definition of prototyping, considers the ideal characteristics of a prototyping population, and discusses the kinds of information that prototyping can provide for the design and development of new assessments.
The Case for a Comprehensive, Four-Skills Assessment of English-Language Proficiency
D. E. Powers (2010)
R&D Connections No. 14
This article makes the case for measuring a test-taker's overall proficiency in all four modes of communication in English: listening, reading, writing, and speaking.
- Quality
Using Multiple Texts in an Integrated Writing Assessment: Source Text Use as a Predictor of Score
L. Plakans & A. Gebril (2013)
Journal of Second Language Writing, 22(3), pp. 217–230.
This study investigates how test takers use source texts in an integrated writing task, and how that use differs across score levels and task topics. Findings support the validity of interpreting integrated task scores as a measure of academic writing. This research was funded by the TOEFL Committee of Examiners.
The Influence of Second Language Experience and Accent Familiarity on Oral Proficiency Rating: A Qualitative Investigation
P. Winke & S. Gass (2013)
TESOL Quarterly, 47(4), pp. 762–789.
This article investigates whether raters' knowledge of test takers' first language affects how they orient themselves to the task of rating oral speech, and examines the effects of accent familiarity on raters' score-assignment processes. This research was funded by the TOEFL Committee of Examiners.
Raters' L2 Background as a Potential Source of Bias in Rating Oral Performance
P. Winke, S. Gass, & C. Myford (2013)
Language Testing, 30(2), pp. 231–252.
This study investigates whether accent familiarity, defined as having learned the test takers' L1, leads to rater bias. Raters' accent familiarity was found to be a potential source of bias. This research was funded by the TOEFL Committee of Examiners.
Tests of English for Academic Purposes (EAP) in University Admissions
X. Xi, B. Bridgeman, & C. Wendler (2013)
In A. Kunnan (Ed.), The Companion to Language Assessment (pp. 318–337). Malden, Mass.: Wiley-Blackwell.
This chapter charts the history of, surveys current developments in, and discusses future trends for English for academic purposes assessments used for admissions to postsecondary English-medium institutions.
Comparison of Human and Machine Scoring of Essays: Differences by Gender, Ethnicity and Country
B. Bridgeman, C. Trapani, & Y. Attali (2012)
Applied Measurement in Education, 25(1), pp. 27–40.
This study examines differences in essay scores generated by machine and by human raters for gender, ethnic, and country subgroups. Human and machine scores were found to be very similar across most subgroups.
Using Raters from India to Score a Large-Scale Speaking Test
X. Xi & P. Mollaun (2011)
Language Learning, 61(4), pp. 1222–1255.
This study investigates the scoring of a speaking test by speakers of English and speakers of Indian languages. Results show that the Indian raters performed as well as the U.S.-based raters in scoring both Indian and non-Indian examinees.
The Effectiveness of Feedback for L1-English and L2-Writing Development: A Meta-Analysis
D. Biber, T. Nekrasova, & B. Horn (2011)
ETS Research Report No. RR-11-05
This report reviews and synthesizes previous research on the effectiveness of feedback for individual writing development. The meta-analysis of research in this area shows that feedback is beneficial for writing development.
Suprasegmental Measures of Accentedness and Judgments of Language Learner Proficiency in Oral English
O. Kang, D. Rubin, & L. Pickering (2010)
Modern Language Journal, Vol. 94, No. 4, pp. 554–566
This study examines the relationships among several acoustic measures of accented speech and native listeners' judgments of oral proficiency. The suprasegmental features of speech were found to be strong predictors of oral proficiency and comprehensibility. This research was funded by the TOEFL Committee of Examiners.
Aspects of Performance on Line Graph Description Tasks: Influenced by Graph Familiarity and Different Task Features
X. Xi (2010)
Language Testing, Vol. 27, No. 1, pp. 73–100
This article describes a study that systematically manipulated characteristics of a line graph description task in a speaking test with the aim of mitigating the influence of graph familiarity, a potential source of construct-irrelevant variance in testing.
Reverse Linguistic Stereotyping: Measuring the Effect of Listener Expectations on Speech Evaluation
O. Kang & D. Rubin (2009)
Journal of Language and Social Psychology, Vol. 28, No. 4, pp. 441–456
This article investigates a phenomenon known as reverse linguistic stereotyping, in which attributions of a speaker's group membership trigger distorted evaluations of that person's speech. This research was funded by the TOEFL Committee of Examiners.
- Innovation
A Comparison of Two Scoring Methods for an Automated Speech Scoring System
X. Xi, D. Higgins, K. Zechner, & D. Williamson (2012)
Language Testing, 29(3), pp. 371–394.
This paper compares two alternative scoring methods — multiple regression and classification trees — for an automated speech scoring system used in a practice environment.
The Utility of Article and Preposition Error Correction Systems for English Language Learners: Feedback and Assessment
M. C. Chodorow, M. Gamon, & J. R. Tetreault (2010)
Language Testing, Vol. 27, No. 3, pp. 419–436
This paper describes and evaluates two systems for identifying and correcting writing errors involving English articles and prepositions. Results show that both systems were helpful for error correction.
Automated Grammatical Error Detection for Language Learners
C. Leacock, M. Chodorow, M. Gamon, & J. Tetreault (2010)
Synthesis Lectures on Human Language Technologies, Vol. 3, No. 1, pp. 1–134
This volume provides an overview of automated approaches that have been developed to identify and correct different types of grammatical errors in a number of languages. New directions for research in automated grammatical error detection are proposed.
Automated Scoring and Feedback Systems: Where Are We and Where Are We Heading?
X. Xi (2010)
Language Testing, Vol. 27, No. 3, pp. 291–300
This editorial prefaces a special issue of Language Testing that presents a collection of new methodologies and approaches in automated scoring and feedback systems. Background information and issues related to automated scoring and automated feedback research are discussed.
Adapting the Acoustic Model of a Speech Recognizer for Varied Proficiency Non-Native Spontaneous Speech Using Read Speech with Language-Specific Pronunciation Difficulty
K. Zechner, D. Higgins, R. Lawless, Y. Futagi, S. Ohls, & G. Ivanov (2009)
Paper in Proceedings of Interspeech 2009: 10th Annual Conference of the International Speech Communication Association, Vol. 1–5, pp. 612–615
This paper presents an approach to acoustic model adaptation of a recognizer for nonnative spontaneous speech in the context of recognizing candidates' responses in a test of spoken English.
A Computational Approach to Detecting Collocation Errors in the Writing of Non-Native Speakers of English
Y. Futagi, P. Deane, M. Chodorow, & J. Tetreault (2008)
Computer Assisted Language Learning, Vol. 21, No. 4, pp. 353–367
This article describes a prototype of an automated tool for detecting collocation errors in the writing of English learners. Detailed error analyses and possible improvements of the system are discussed.