A Study of the Effects of Contextualization and Familiarization on Responses to the TOEFL® Vocabulary Test Items

Henning, Grant
Publication Year:
Report Number:
RR-91-23, TOEFL-RR-35
Document Type:
Subject/Key Words:
Context clues, English tests, item formats, test format, vocabulary


To evaluate criticisms of the current TOEFL® vocabulary item format and to compare the effectiveness of alternative vocabulary test item formats, the present study investigated the functioning of eight multiple-choice formats. The formats differed in the length and inference-generating quality of the stem, in the nature of the task (matching versus supply), and in the degree to which item stems or response options were embedded in a passage. In all, 1,040 vocabulary test items (80 familiarization items and 960 experimental items) were developed and administered in counterbalanced clusters to a sample of 190 adult English-as-a-second-language students at two language institutes (99 who first completed a controlled 75-minute familiarization activity and 91 who completed no familiarization activity). These two levels of familiarization were examined to ascertain the effects of familiarization on performance with the eight item types. Also, self-reports of levels of prior exposure to particular item types were correlated with actual performance with those item types. Results indicated that, for the present sample of persons and items, the current TOEFL vocabulary item format performed comparatively well in terms of appropriateness of difficulty, mean internal consistency reliability, and criterion-related validity. Such performance came in spite of earlier suggestions that current item stems may be too long, lack inference-generating information about the meaning of the targeted vocabulary, and place undue reliance on matching rather than supply format. Among the alternative item formats considered, only items embedded in reading passages appeared to outperform the current TOEFL vocabulary format in terms of estimates of both reliability and criterion-related validity.
An item format that differed from the current TOEFL vocabulary format only in the uniform presence of inference-generating information in the item stem and in a shorter mean stem length also slightly outperformed the current TOEFL vocabulary item format on the particular criterion validity indicated by correlation with vocabulary total score; however, these correlational differences did not quite attain statistical significance. In general, multiple-choice items using stems with underscored target vocabulary in context performed better than either multiple-choice single-word-or-phrase matching tasks or multiple-choice supply-type items. Participation in the familiarization activity did not relate significantly or differentially to performance with any item type. However, self-reports of familiarity with particular item types suggested that some types were more familiar than others. The three most familiar item types, including the current TOEFL vocabulary format, exhibited a significant positive correlation between self-reported familiarity with the item type and successful performance with it.
