Reliability and validity are essential aspects of the quality of test scores. The TOEFL® research program ensures test score reliability and validity by following established guidelines and practices for the development and operational implementation of educational measurements. The program also conducts research on the different claims and inferences made based on TOEFL Family of Assessments test scores.
Investigating the effects of prompt characteristics on the comparability of TOEFL iBT® integrated writing tasks
This study examines how different characteristics of questions on the TOEFL iBT® integrated Read-Listen-Write tasks affect test-taker performances. Findings indicate that the difficulty of the reading passages and the distinctness of ideas in the listening passages could significantly influence writing test scores. The results imply that test takers should pay attention to the different characteristics of the questions when responding to them.
Effects of printed option sets on listening item performance among young English-as-a-Foreign-Language learners
This study compares test-taker performances on the multiple-choice listening items of the TOEFL Primary® test under two conditions: (1) with the answer options printed in the test booklet and read aloud to the test takers, and (2) with the options read aloud only. Results reveal that printed options did not affect listening scores and that students preferred them, which provides empirical evidence to support the design of the test.
Speaking proficiency of young language students: A discourse-analytic study
This study examines the characteristics of young language students' spoken responses to the Picture Narration task and one Listen-Speak task of the TOEFL Junior® Speaking test. The results show that students scoring higher on the speaking test also demonstrated better fluency, grammar, vocabulary and content. The findings indicate that the use of different task types is important for measuring and developing young students' speaking proficiency.
An investigation of the effect of task type on the discourse produced by students at various score levels in the TOEFL iBT® writing test
The study compares the characteristics of test-taker responses to TOEFL iBT independent and integrated writing tasks. Analyses of 480 writing samples revealed that test takers produced longer essays on independent tasks, but used more complex language for integrated tasks. The results suggest that using both task types measures writing proficiency more effectively than using either type alone, and they provide empirical support for the design of the TOEFL iBT test.
Is writing performance related to keyboard type? An investigation from examinees' perspectives on the TOEFL iBT
The requirement that a U.S.-type keyboard be used when taking the TOEFL iBT test may affect the performance of test takers who routinely use other types of keyboards. Through a survey of over 17,000 test takers worldwide, the study found that keyboard type had little impact on test performance. This study provides empirical evidence to support the use of a U.S.-type keyboard for TOEFL iBT test administrations around the world.
Stakeholders' beliefs about the TOEFL iBT® test as a measure of academic language ability
This study explores how students, teachers and administrators perceive the content of the TOEFL iBT test. Results from focus groups and surveys show that the participants, representing different stakeholder groups, largely believe that the test accurately measures academic language ability and that the test results are good indicators of student performance. These findings support the use of the TOEFL iBT test as a measure of academic English-language ability.
From one to multiple accents on a test of L2 listening comprehension
This study examines the effect of different varieties of native accents and accent strength on listening comprehension. The researchers gathered listening test scores from over 20,000 TOEFL iBT test takers randomly assigned to listen to an academic lecture presented in a U.S., Australian or British accent, with varying degrees of accent strength. Results show that the speaker's accent strength and the listener's familiarity with various accents could affect listening comprehension.
Shaping a score: Complexity, accuracy, and fluency in integrated writing performances
This study reports an analysis of 480 TOEFL iBT written responses and corresponding scores on two TOEFL iBT writing tasks, where test takers read a short text, listen to a brief lecture, and then write a short essay in response. Researchers found that writers' fluency and grammatical accuracy were critical in predicting test scores, but the complexity of the essays had a weaker relationship to test scores.
Screener tests need validation too: Weighing an argument for test use against practical concerns
Screeners are short tests that can be used to place a test taker into a particular test-taker group. This study reports the development of a prototype screener test to place young English learners into different levels of the TOEFL Primary Reading test and outlines the validity evidence, such as score consistency and practicality, that needs to be gathered in order to ensure valid use of such tests.
A study of the use of TOEFL iBT® test speaking and listening scores for international teaching assistant screening
This study examines the effectiveness of using TOEFL iBT speaking and listening scores to screen international teaching assistants (ITAs) in U.S. universities. Scores on the TOEFL iBT listening section were found to be better predictors of ITAs' teaching competence than scores on the test's speaking section. The study suggests that ITAs can make noticeable improvements in their speaking ability after spending three months in English-speaking environments.