Automatic Assessment of Vocabulary Usage Without Negative Evidence
- Leacock, Claudia; Chodorow, Martin
- Subject/Key Words:
- ALEK, constructed response, grammatical error detection, natural language processing
This report describes the implementation and evaluation of an automated statistical method for assessing an examinee's use of vocabulary words in constructed responses. The grammatical error-detection system, ALEK (Assessing Lexical Knowledge), infers negative evidence from the low frequency or absence of constructions in 30 million words of well-formed, copy-edited text from North American newspapers. ALEK detects two types of errors: those that violate basic principles of English syntax (e.g., agreement errors, as in "a desks") and those that show a lack of information about a specific word (e.g., treating a mass noun as a count noun, as in "a pollution"). The system evaluated word usage in essay-length responses to TOEFL® (Test of English as a Foreign Language™) prompts. ALEK was developed using three words and was evaluated on an additional 20 words that appeared frequently in TOEFL essays and in a university word list. The system's accuracy was evaluated to investigate its potential for scoring performance-based measures of communicative competence. ALEK achieved about 80% precision and 20% recall. False positives (correct usages that ALEK identified as errors) and misses (usage errors that ALEK did not recognize) were analyzed, and methods for improving system performance were outlined.
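The idea of inferring negative evidence from absence in well-formed text, together with the precision and recall measures reported above, can be illustrated with a small sketch. This is not ALEK's implementation: the abstract does not specify which features or thresholds the system uses, so the bigram counts, the `min_count` threshold, the three-sentence reference corpus, and all function names below are assumptions chosen only for illustration.

```python
from collections import Counter


def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


def train(reference_sentences):
    """Count bigrams in well-formed reference text (a toy stand-in for the newspaper corpus)."""
    counts = Counter()
    for sentence in reference_sentences:
        counts.update(bigrams(sentence.lower().split()))
    return counts


def flag_unusual(sentence, target, counts, min_count=1):
    """Return bigrams containing `target` that occur fewer than `min_count` times in the reference text.

    A rare or unseen bigram is treated as (weak) evidence of a usage error.
    """
    tokens = sentence.lower().split()
    return [bg for bg in bigrams(tokens) if target in bg and counts[bg] < min_count]


def precision_recall(flagged, true_errors):
    """Precision = share of flagged items that are real errors; recall = share of real errors flagged."""
    flagged, true_errors = set(flagged), set(true_errors)
    true_positives = len(flagged & true_errors)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(true_errors) if true_errors else 0.0
    return precision, recall


# Hypothetical reference text; the real system used about 30 million words.
reference = [
    "pollution is a serious problem",
    "the city reduced pollution last year",
    "air pollution harms health",
]
counts = train(reference)

print(flag_unusual("a pollution is bad", "pollution", counts))
print(flag_unusual("air pollution harms health", "pollution", counts))
```

In this toy setting the unseen pair ("a", "pollution") is flagged, while bigrams attested in the reference text are not. A system of this kind that flags 5 usages, 4 of them genuine errors, out of 20 genuine errors in total has precision 4/5 = 80% and recall 4/20 = 20%: most of what it flags is wrong usage, but most wrong usage goes unflagged.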