A Study of the Effects of Variation of Short-Term Memory Load, Reading Response Length, and Processing Hierarchy on TOEFL® Listening Comprehension Item Performance

Henning, Grant
Publication Year:
Report Number:
RR-90-18, TOEFL-RR-33
Document Type:
Subject/Key Words:
Memory; performance factors; reading comprehension; test length; Test of English as a Foreign Language™


This study was conceived in response to criticisms of the current TOEFL® listening comprehension test-item format. The main criticisms are that listening as tested places too much burden on short-term memory as opposed to comprehension, that a knowledge of reading is required in order to respond successfully, and that many items appear to require mere recall and matching of details rather than higher-order processing skills. To address these criticisms in turn, a study was designed with 120 ESL learners and three listening tests (comprising 144 real and adapted TOEFL test items in total) to examine item functioning under conditions of stimulus repetition versus nonrepetition, variation in the length of the aural stimulus passage and in the associated number of items, shorter versus longer reading response options, and higher versus lower levels of required processing skill. The study identified those item types and stimulus conditions associated with superior item functioning, as indicated by estimates of item difficulty, item discriminability, internal consistency reliability, fit to a latent trait model, and convergent and discriminant validity. Results suggested that, while repetition of the stimulus passage predictably tended to reduce item difficulty when concomitant influences were controlled, there was no consistent effect of stimulus passage repetition on item discrimination, Rasch model fit, or discriminant validity across difficulty level. However, there was a tendency for items in the no-repetition condition to exhibit greater convergent and discriminant validity than items in the one-repetition condition.
Although passage length was confounded with numbers of items per passage and with comprehension hierarchy level, the test with three-sentence passages tended to be more reliable than the test with two-sentence passages, which in turn tended to be more reliable than the test with one-sentence passages. Also, the test with the longest passages tended predictably to be slightly more difficult than the test with the shortest passages. Item response-option length was significantly related to item difficulty and Rasch model fit: items with options shortened to about half the current TOEFL response-option length tended to be easier and to exhibit better fit than items with the current longer options. Items with shortened options also showed greater convergent and discriminant validity across levels of difficulty than did items with unshortened options. Finally, there was a near-significant tendency for items with shortened options to exhibit better discrimination than items with unshortened options when concomitant influences were controlled.

