Scoring and Technology

The speaking and writing tasks on the TOEFL iBT® test allow test takers to perform communicative language tasks similar to those they might need to perform in an actual university setting. At ETS, when we score these open-ended or constructed-response tasks, we must do so in ways that are valid, fair and consistent, and that support the same score interpretations whether a task is scored solely by human raters or with the help of technology. We also conduct research to ensure that the scoring of the multiple-choice test questions leads to valid, fair and meaningful scores.

To ensure that test scores are reliable and have the same meaning regardless of the institution that plans to use them, part of the TOEFL iBT test's ongoing program of research examines the criteria, tools and methods used to score the test, including both the traditional multiple-choice items and the open-ended or constructed-response tasks, such as those in the speaking and writing sections.

Featured Publications

These are some publications related to the methods and technologies used to score open-ended or constructed-response tasks on the TOEFL iBT test:

Human Constructed-Response Scoring

The Relationship Between Raters' Prior Language Study and the Evaluation of Foreign Language Speech Samples
P. Winke, S. Gass, & C. Myford (2011)
TOEFL iBT Report No. iBT-16

How Do Raters from India Perform in Scoring the TOEFL iBT® Speaking Section and What Kind of Training Helps?
X. Xi & P. Mollaun (2009)
TOEFL iBT Report No. iBT-11

Analytic Scoring of TOEFL® CBT Essays: Scores from Humans and e-rater® Engine
Y.-W. Lee, C. Gentile, & R. Kantor (2008)
TOEFL Research Report No. RR-81

Investigating the Utility of Analytic Scoring for the TOEFL® Academic Speaking Test (TAST)
X. Xi & P. Mollaun (2006)
TOEFL iBT Report No. iBT-01

An Examination of Rater Orientations and Test Taker Performance on English-for-Academic-Purposes Speaking Tasks
A. Brown, N. Iwashita, & T. McNamara (2005)
TOEFL Monograph No. MS-29

Exploring Variability in Judging Writing Ability in a Second Language: A Study of Four Experienced Raters of ESL Compositions
M. Erdosy (2004)
TOEFL Research Report No. RR-70

Scoring TOEFL® Essays and TOEFL® 2000 Prototype Writing Tasks: An Investigation Into Raters' Decision Making and Development of a Preliminary Analytic Framework
A. Cumming, R. Kantor, & D. E. Powers (2001)
TOEFL Monograph No. MS-22

Development of Automated Scoring Tools

Validation of Automated Scores of TOEFL iBT® Tasks Against Nontest Indicators of Writing Ability
S. C. Weigle (2011)
TOEFL iBT Report No. iBT-15

Automated Scoring of Spontaneous Speech Using SpeechRater v1.0
X. Xi, D. Higgins, K. Zechner, & D. M. Williamson (2008)
ETS Research Report No. RR-08-62

Towards an Understanding of the Role of Speech Recognition in Non-native Speech Assessment
K. Zechner, I. I. Bejar, & R. Hemat (2007)
TOEFL iBT Report No. iBT-02

Beyond Essay Length: Evaluating e-rater® Engine's Performance on TOEFL® Essays
M. Chodorow & J. Burstein (2004)
TOEFL Research Report No. RR-73

Automatic Assessment of Vocabulary Usage Without Negative Evidence
C. Leacock & M. Chodorow (2001)
TOEFL Research Report No. RR-67

A Review of Computer-based Speech Technology for the TOEFL® 2000 Test
J. C. Burstein, R. M. Kaplan, S. Rohen-Wolff, D. I. Zuckerman, & C. Lu (1999)
TOEFL Monograph No. MS-13

Computer Analysis of the TOEFL® Test of Written English™
L. T. Frase, J. Faletti, A. Ginther, & L. Grant (1999)
TOEFL Research Report No. RR-64