The Relationship Between Raters' Prior Language Study and the Evaluation of Foreign Language Speech Samples

Winke, Paula; Gass, Susan; Myford, Carol
Publication Year:
Report Number: RR-11-30; TOEFLiBT-16
Document Type:
Subject/Key Words: oral assessment, second language performance assessment, item response theory (IRT), rater performance, rater bias, Rasch measurement, Facets, NVivo


This study investigated whether raters’ second language (L2) background was related to how they scored test takers of a given first language (L1) on the TOEFL iBT® Speaking test. After an initial 4-hour training period, a group of 107 raters (mostly learners of Chinese, Korean, or Spanish) listened to a selection of 432 speech samples produced by 72 test takers (native speakers of Chinese, Korean, or Spanish). We analyzed the rating data using a multifaceted Rasch measurement approach to uncover potential biases in the rating process. In addition, 26 of the raters participated in stimulated recall sessions, during which they watched videos of themselves rating. Using the videos as prompts, we asked them to discuss and explain their rating processes at the time of rating. The bias interaction analyses revealed that, for some raters, a match between the rater’s L2 and the test taker’s L1 was associated with ratings that were significantly higher than expected. As groups, raters with Spanish as an L2 were significantly more lenient toward test takers who had Spanish as an L1, and raters with Chinese as an L2 were significantly more lenient toward test takers who had Chinese as an L1. Analyses of the qualitative data, assisted by the program QSR NVivo 8, revealed information concerning the raters’ awareness of their biases.
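
For readers unfamiliar with the method named above, the following is a minimal sketch of a generic many-facet Rasch model with one bias (interaction) term. It is the standard textbook form, not necessarily the specification used in the report: the abstract does not list the facets that were modeled, so the choice of facets, all symbols, and the sign convention on the interaction term are assumptions made here for illustration.

```latex
% Generic many-facet Rasch model (rating scale form); symbols are illustrative assumptions.
% P_{njk}: probability that rater j gives test taker n a rating in category k
% \theta_n: proficiency of test taker n
% \alpha_j: severity of rater j
% \tau_k:   difficulty of being rated in category k rather than k-1
\[
  \log\frac{P_{njk}}{P_{nj(k-1)}} = \theta_n - \alpha_j - \tau_k
\]

% A bias (interaction) analysis adds a term for a rater (or rater group) j
% paired with a test-taker L1 group g(n). With the sign convention assumed here,
% a negative \phi_{j,g(n)} means ratings higher than the main-effects model expects,
% that is, leniency toward that group.
\[
  \log\frac{P_{njk}}{P_{nj(k-1)}} = \theta_n - \alpha_j - \phi_{j,g(n)} - \tau_k
\]
```

Under this sketch, the finding that raters with Spanish as an L2 were more lenient toward test takers with Spanish as an L1 would correspond to an interaction estimate for that pairing that differs significantly from zero in the lenient direction.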
