Innovation in assessment design is at the heart of the TOEFL® research program. Ongoing research investigates how innovative assessment task designs and the use of technology can improve the measurement of students' language knowledge and skills, and provide useful information for score users and language learners.
Automated scoring across different modalities
This study investigates whether language features such as grammar and vocabulary, originally developed for an automated essay-scoring system (the e-rater® engine), can be used to automatically evaluate spoken responses with the SpeechRater® service. Results show that combining grammar, vocabulary, and fluency features in automated scoring of speaking can produce a more robust score and make the SpeechRater system less vulnerable to cheating.
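To illustrate the general idea of combining feature groups into one score, here is a minimal sketch of a linear scoring model. All feature names, values, and weights below are invented for illustration; they are not the actual e-rater or SpeechRater feature set or model.

```python
# Illustrative sketch only: a linear model combining hypothetical
# grammar, vocabulary, and fluency features into a speaking score.
# Names and weights are invented, not the actual SpeechRater features.

def combined_speaking_score(features, weights):
    """Weighted linear combination of normalized feature values."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical normalized feature values for one spoken response.
response_features = {
    "grammar_accuracy": 0.8,   # grammar feature (essay-scoring style)
    "vocab_diversity": 0.7,    # vocabulary richness measure
    "speaking_rate": 0.6,      # fluency feature from the audio signal
    "pause_frequency": 0.5,    # fluency feature (lower is more fluent)
}

# Hypothetical weights, e.g., from a regression fit to human scores.
feature_weights = {
    "grammar_accuracy": 1.5,
    "vocab_diversity": 1.0,
    "speaking_rate": 0.8,
    "pause_frequency": -0.5,
}

score = combined_speaking_score(response_features, feature_weights)
print(round(score, 2))  # → 2.13
```

Because the score draws on several independent feature groups, a test taker who games one signal (for example, speaking quickly to inflate fluency) still scores poorly on the others, which is the intuition behind the robustness claim.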
Helping students select appropriately challenging text: Application to a test of second language reading ability
This study proposes an approach to assessing the complexity of TOEFL iBT® reading passages and examines the types of texts that language learners at different proficiency levels are expected to comprehend. It also introduces TextEvaluator®, an automated online tool that reports passage complexity, which learners can use to select reading materials appropriate for their abilities.
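As a toy illustration of what a text-complexity estimate involves, the sketch below scores a passage by average sentence length and average word length. TextEvaluator uses a far richer, research-based feature set; this only shows the general idea of producing a comparable difficulty number for a passage.

```python
# Illustrative sketch only: a toy text-complexity estimate based on
# average sentence length and average word length. This is NOT the
# TextEvaluator method, just a minimal example of the concept.

import re

def toy_complexity(text):
    """Return (avg words per sentence, avg characters per word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    return avg_sentence_len, avg_word_len

passage = "The cat sat. It purred softly on the warm windowsill."
print(toy_complexity(passage))  # → (5.0, 4.2)
```

A learner (or a tool built on such measures) could compare these numbers across candidate texts and pick passages whose complexity matches their reading level.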
Automatic plagiarism detection for spoken responses in an assessment of English language proficiency
This study evaluates an innovative system that automatically detects plagiarized spoken responses. The system identifies plagiarized responses using several features, such as the similarity between a test taker's response and known source materials, and differences in speech features (e.g., fluency) when the same individual produces plagiarized versus spontaneous speech. The system can help improve the detection of plagiarized responses on the TOEFL iBT speaking test.
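The text-similarity signal can be sketched with a simple word n-gram overlap between a response transcript and a known source. The threshold, example texts, and single-feature design below are invented for illustration; the actual system combines many more signals, including the fluency differences mentioned above.

```python
# Illustrative sketch only: flag a spoken-response transcript as
# possibly plagiarized when its word-trigram overlap with known
# source material exceeds a (hypothetical) threshold.

def ngrams(words, n=3):
    """Set of word n-grams in a token list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(response, source, n=3):
    """Fraction of the response's n-grams that also appear in the source."""
    resp = ngrams(response.lower().split(), n)
    src = ngrams(source.lower().split(), n)
    return len(resp & src) / len(resp) if resp else 0.0

source_text = "studying abroad helps students develop independence and cultural awareness"
response_text = "I think studying abroad helps students develop independence and confidence"

ratio = overlap_ratio(response_text, source_text)
print(ratio)        # → 0.625
print(ratio > 0.3)  # flag: more than 30% of trigrams match the source
```

In practice such a similarity score would be one input among several; responses flagged this way would still be reviewed by humans before any decision is made.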
Monitoring the performance of human and automated scores for spoken responses
This study reports on different statistical procedures for comparing scores assigned by human raters with those generated by SpeechRater, an automated scoring engine for spoken responses. Results reveal systematic differences between human and SpeechRater scores, indicating that, given the current state of the technology, speaking proficiency is best evaluated using both human and machine raters.
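Two simple checks one might run when monitoring human-machine agreement are a correlation (do the raters rank responses similarly?) and a mean difference (does the machine systematically score higher or lower?). The scores below are invented, and the study itself uses more formal statistical procedures; this sketch only shows the kind of comparison involved.

```python
# Illustrative sketch only: correlation and mean-difference checks
# between hypothetical human and machine speaking scores.

from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

human =   [3, 4, 2, 4, 3, 5, 2, 3]  # invented human scores
machine = [3, 4, 3, 4, 3, 4, 3, 4]  # invented machine scores

# A positive mean difference shows systematic over-scoring by the machine.
bias = mean(machine) - mean(human)
print(round(pearson_r(human, machine), 2), bias)  # → 0.77 0.25
```

Here the machine tracks the humans reasonably well (r ≈ 0.77) but runs slightly high on average (+0.25), the kind of systematic difference that motivates keeping human raters in the loop.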