Applying Content Similarity Metrics to Corpus Data: Differences Between Native and Non-Native Speaker Responses to a TOEFL Integrated Writing Prompt

Author(s): Deane, Paul; Gurevich, Olga
Publication Year: 2008
Report Number: RR-08-51
Source: ETS Research Report
Document Type: Report
Page Count: 36
Subject/Key Words: Natural Language Processing (NLP), Test of English as a Foreign Language (TOEFL), Scoring, English as a Foreign Language (EFL), Integrated Writing Prompt

Abstract

For many purposes, it is useful to collect a corpus of texts all produced in response to the same stimulus, whether to measure performance (as on a test) or to test hypotheses about population differences. This paper examines several methods for measuring similarities in phrasing and content and demonstrates that these methods can be used to identify population differences between native and non-native speakers of English in a writing task.

http://dx.doi.org/10.1002/j.2333-8504.2008.tb02137.x
