
Automatic Assessment of Vocabulary Usage Without Negative Evidence

Leacock, Claudia; Chodorow, Martin
Publication Year:
Report Number:
RR-01-21, TOEFL-RR-67
ETS Research Report
Document Type:
Page Count:
Subject/Key Words:
Test of English as a Foreign Language (TOEFL), Assessing Lexical Knowledge (ALEK), Constructed Response, Natural Language Processing, Grammatical Error Detection, Automated Scoring and Natural Language Processing


This report describes the implementation and evaluation of an automated statistical method for assessing an examinee's use of vocabulary words in constructed responses. The grammatical error-detection system, ALEK (Assessing Lexical Knowledge), infers negative evidence from the low frequency or absence of constructions in 30 million words of well-formed, copy-edited text from North American newspapers. ALEK detects two types of errors: those that violate basic principles of English syntax and those that show a lack of information about a specific word (e.g., treating a mass noun as a count noun in "a pollution"). The system evaluated word usage in essay-length responses to Test of English as a Foreign Language (TOEFL) prompts. ALEK was developed using three words and was evaluated on an additional 20 words that appeared frequently both in TOEFL essays and in a university word list. The system's accuracy was evaluated to investigate its potential for scoring performance-based measures of communicative competence. It performed with about 80% precision and 20% recall. False positives (correct usages that ALEK identified as errors) and misses (usage errors that ALEK did not recognize) were analyzed, and methods for improving system performance were outlined.
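
The idea of inferring negative evidence from absence in well-formed text, and the precision and recall figures quoted above, can be illustrated with a small sketch. This is not ALEK's actual algorithm, which the abstract does not specify in detail. The three-sentence corpus, the word-bigram representation, the raw count threshold `min_count`, and every function name below are assumptions made only for illustration.

```python
from collections import Counter


def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


def build_counts(corpus_sentences):
    """Count word bigrams across whitespace-tokenized, well-formed sentences."""
    counts = Counter()
    for sentence in corpus_sentences:
        counts.update(bigrams(sentence.lower().split()))
    return counts


def flag_rare_bigrams(response, target, counts, min_count=1):
    """Flag bigrams containing `target` seen fewer than `min_count` times.

    Absence from the reference corpus is treated as (weak) negative evidence.
    The threshold is a hypothetical parameter, not one taken from the report.
    """
    tokens = response.lower().split()
    return [bg for bg in bigrams(tokens)
            if target in bg and counts[bg] < min_count]


def precision_recall(flagged, true_errors):
    """Precision: share of flags that are real errors.
    Recall: share of real errors that were flagged."""
    flagged, true_errors = set(flagged), set(true_errors)
    hits = len(flagged & true_errors)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(true_errors) if true_errors else 0.0
    return precision, recall


# Toy stand-in for the 30-million-word newspaper corpus (assumption).
corpus = [
    "the pollution in the city is severe",
    "air pollution is a serious problem",
    "the factory causes pollution",
]
counts = build_counts(corpus)

flags = flag_rare_bigrams("the city has a pollution problem", "pollution", counts)
# Hand-labeled gold standard for this one response (assumption).
gold = [("a", "pollution")]
precision, recall = precision_recall(flags, gold)
```

With so small a corpus, the sketch flags both the genuine error ("a pollution") and the acceptable sequence "pollution problem", which simply never occurred in the reference text. That second flag is a false positive in the abstract's sense, and it shows why corpus size and the choice of threshold bear directly on precision.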
