
The Relationship Between Raters' Prior Language Study and the Evaluation of Foreign Language Speech Samples

Winke, Paula; Gass, Susan M.; Myford, Carol M.
Publication Year:
Report Number:
ETS Research Report
Document Type:
Page Count:
Subject/Key Words:
Test of English as a Foreign Language (TOEFL), Internet Based Testing (iBT), Oral Assessment, Second Language Performance Assessment, Item Response Theory (IRT), Rater Performance, Rater Bias, Rasch Measurement, FACETS, NVivo


This study investigated whether raters’ second language (L2) backgrounds were related to the scores they assigned to TOEFL iBT Speaking test takers of particular first language (L1) backgrounds. After an initial 4-hour training period, a group of 107 raters (mostly learners of Chinese, Korean, or Spanish) listened to a selection of 432 speech samples produced by 72 test takers (native speakers of Chinese, Korean, and Spanish). We analyzed the rating data using a multifaceted Rasch measurement approach to uncover potential biases in the rating process. In addition, 26 of the raters participated in stimulated recall sessions, during which they watched videos of themselves rating. Using the videos as prompts, we asked them to discuss and explain their rating processes at the time of rating. The results of our bias interaction analyses revealed that, when a rater’s L2 matched a test taker’s L1, some raters assigned ratings that were significantly higher than expected. As a whole, raters with Spanish as an L2 were significantly more lenient toward test takers who had Spanish as an L1, and raters with Chinese as an L2 were significantly more lenient toward test takers who had Chinese as an L1. Analyses of the qualitative data, assisted by the program QSR NVivo 8, revealed information concerning the raters’ awareness of their biases.
