
Developing an Innovative Elicited Imitation Task for Efficient English Proficiency Assessment

Davis, Lawrence; Norris, John M.
Publication Year:
Report Number: RR-21-24, TOEFL-RR-96
ETS Research Report
Document Type:
Page Count:
Subject/Key Words: TOEFL Essentials, Language Proficiency, English as a Second Language (ESL), Test of English as a Foreign Language (TOEFL), TOEFL iBT, English Language Assessment (ELA), Validity Argument, Second-Language Speaking, Speaking Assessments, Animation, Rater Performance, Rater Consistency, Item Response Time, Proficiency Levels, Item Difficulty, Ability Measure, Stimulus Duration


The elicited imitation task (EIT), in which language learners listen to a series of spoken sentences and repeat each one verbatim, is a commonly used measure of language proficiency in second language acquisition research. The TOEFL Essentials test includes an EIT as a holistic measure of speaking proficiency, referred to as the “Listen and Repeat” task type. In this report, we describe the design considerations that informed the development of the EIT for TOEFL Essentials. We also report the results of a series of investigations conducted during the prototyping and pilot phases of test development, which were undertaken with the goal of confirming task design specifications, evaluating scoring performance, and obtaining initial validity evidence to support score interpretation and use of the EIT in the TOEFL Essentials test. We found that task design variables generally performed as expected. The length of the input sentence was strongly associated with performance (Pearson r = .88), consistent with the construct measured by the EIT, while other task variables not directly related to the EIT construct (e.g., graphics, speaker accent, and response time) did not affect performance. Scorers drawn from TOEFL iBT test raters were able to score responses consistently, with over 98% exact or adjacent interrater agreement on a 6-point scale, and scores on the 15-item pilot version of the EIT were highly reliable (Cronbach’s α = .93). Correlations between EIT scores and other measures were generally as expected: Correlations with other speaking tasks were high (.78–.84), and correlations with other language measures were somewhat lower (.73 for writing, .68 for listening, and .57 for reading). Correlation with an independent measure of holistic language proficiency (C-test) was moderately high (.69), as expected.
We discuss the study findings in terms of the TOEFL Essentials test validity argument and point out limitations to the current results along with future research needs. Overall, we believe that the findings provide initial support to warrant the use of the EIT as operationalized in the TOEFL Essentials test.
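For readers less familiar with the scoring statistics named in the abstract, the following is a minimal Python sketch of how Cronbach's α and an exact-or-adjacent interrater agreement rate are conventionally computed. It is not taken from the report: the function names and the small score tables are hypothetical, invented only for illustration, and the report's own analysis procedures may differ.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores.

    scores: list of rows, one row per test taker, each row a list of
    item scores (all rows the same length).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 test takers")
    # Sample variance of each item across test takers.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each test taker's total score.
    # Note: this is zero (and the division fails) if all totals are equal.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def exact_or_adjacent_agreement(rater_a, rater_b):
    """Share of responses where two raters differ by at most one point."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    hits = sum(abs(a - b) <= 1 for a, b in zip(rater_a, rater_b))
    return hits / len(rater_a)


if __name__ == "__main__":
    # Hypothetical data: 4 test takers x 3 items, scored on a 0-5 scale.
    pilot = [[5, 4, 5], [3, 3, 2], [1, 2, 1], [4, 4, 3]]
    print("alpha:", round(cronbach_alpha(pilot), 3))

    # Hypothetical scores from two raters on the same six responses.
    rater_1 = [5, 3, 1, 4, 2, 0]
    rater_2 = [4, 3, 3, 4, 2, 1]
    print("agreement:", exact_or_adjacent_agreement(rater_1, rater_2))
```

Exact-or-adjacent agreement counts a pair of ratings as agreeing when they differ by no more than one scale point, which is why it tends to run much higher than exact agreement on a 6-point scale.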
