
An Investigation of the Effect of Task Type on the Discourse Produced by Students at Various Score Levels in the TOEFL iBT Writing Test

Knoch, Ute; Macqueen, Susy; O'Hagan, Sally
Publication Year:
Report Number: RR-14-43, TOEFLiBT-23
ETS Research Report
Document Type:
Page Count:
Subject/Key Words: Discourse Analysis English (Second Language) Integrated Writing Prompt Test Scores TOEFL iBT Washback Effect [Testing] Writing Assessment Writing Tasks


This study, which forms part of the TOEFL iBT test validity argument for the writing section, has two main aims: to verify whether the discourse produced in response to the independent and integrated writing tasks differs, and to identify features of written discourse that are typical of different scoring levels. The integrated writing task was added to the TOEFL iBT test to "improve the measurement of test-takers' writing abilities, create positive washback on teaching and learning as well as require test-takers to write in ways that are more authentic to academic study" (Cumming et al., 2006, p. 1). However, no research since the study by Cumming et al. (2006) on the prototype tasks has investigated whether the discourse produced in response to this new integrated reading/listening-to-write task is in fact different from that produced in response to the independent task. Finding such evidence in the discourse is important, as it adds to the validity argument of the TOEFL iBT writing test and is useful for verifying the rating scale descriptors used in operational rating. This study applied discourse-analytic measures to the writing of 480 test takers, each of whom responded to both writing tasks. The discourse analysis focused on measures of accuracy, fluency, complexity, coherence, cohesion, content, orientation to source evidence, and metadiscourse. A multivariate analysis of variance (MANOVA) with random permutations, using a two-by-five (task type by proficiency level) factorial design, showed that the discourse produced in response to the two task types differed significantly on most of the variables under investigation. The discourse produced at different score levels also generally differed significantly. The findings are discussed in terms of the TOEFL iBT test validity argument, along with implications for rating scale validation and automated scoring.
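For readers unfamiliar with permutation-based MANOVA, the following is a minimal sketch of the general idea, not the authors' analysis. It assumes a simplified one-factor layout (score level only) rather than the study's two-by-five design, uses Wilks' lambda as the test statistic, and runs on synthetic data. The function names `wilks_lambda` and `permutation_pvalue` are hypothetical and chosen for illustration.

```python
import numpy as np


def wilks_lambda(X, groups):
    """Wilks' lambda = det(W) / det(T) for a one-way multivariate layout.

    X is an (n_samples, n_measures) array; groups holds one label per row.
    Values near 0 indicate strong group separation, values near 1 none.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    centered = X - X.mean(axis=0)
    total = centered.T @ centered            # total SSCP matrix T
    within = np.zeros_like(total)            # within-group SSCP matrix W
    for g in np.unique(groups):
        dev = X[groups == g] - X[groups == g].mean(axis=0)
        within += dev.T @ dev
    return np.linalg.det(within) / np.linalg.det(total)


def permutation_pvalue(X, groups, n_perm=999, seed=0):
    """Permutation p-value: share of shuffled labelings whose Wilks' lambda
    is at least as small (as extreme) as the observed one."""
    rng = np.random.default_rng(seed)
    observed = wilks_lambda(X, groups)
    as_extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(groups)   # break the label/response link
        if wilks_lambda(X, shuffled) <= observed:
            as_extreme += 1
    # Count the observed labeling itself so the p-value is never zero.
    return observed, (as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic stand-in data: 5 score levels, 20 scripts each, 3 measures.
    rng = np.random.default_rng(1)
    levels = np.repeat(np.arange(5), 20)
    scores = rng.normal(size=(100, 3)) + 0.5 * levels[:, None]
    lam, p = permutation_pvalue(scores, levels, n_perm=499)
    print(f"Wilks' lambda = {lam:.3f}, permutation p = {p:.3f}")
```

Shuffling the labels builds a reference distribution for the statistic without relying on multivariate normality, which is one common reason to prefer permutations. A faithful reproduction of the study's design would also need to handle the second factor (task type) and the fact that each test taker responded to both tasks.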
