Reliability and Validity

Reliability and validity are essential aspects of the quality of test scores. The TOEFL® research program ensures test score reliability and validity by following established guidelines and practices for the development and operational implementation of educational measurements. The program also conducts research on the claims and inferences made on the basis of TOEFL Family of Assessments test scores.

Investigating the effects of prompt characteristics on the comparability of TOEFL iBT® integrated writing tasks

This study examines how different characteristics of the prompts on the TOEFL iBT® integrated Read-Listen-Write tasks affect test-taker performance. Findings indicate that the difficulty of the reading passages and the distinctness of ideas in the listening passages could significantly influence writing test scores. The results imply that test takers should pay attention to these prompt characteristics when responding.

Effects of printed option sets on listening item performance among young English-as-a-Foreign-Language learners

This study compares test-taker performance on the multiple-choice listening items of the TOEFL Primary® test under two conditions: (1) with the answer options printed in the test booklet and read aloud to the test takers, and (2) with the options read aloud only. Results show that printed options did not affect listening scores and that students preferred having the options printed, providing empirical evidence to support the design of the test.

Speaking proficiency of young language students: A discourse-analytic study

This study examines the characteristics of young language students' spoken responses to the Picture Narration task and one Listen-Speak task of the TOEFL Junior® Speaking test. The results show that students who scored higher on the speaking test also demonstrated better fluency, grammar, vocabulary and content. The findings indicate that using different task types is important for measuring and developing young students' speaking proficiency.

An investigation of the effect of task type on the discourse produced by students at various score levels in the TOEFL iBT® writing test

The study compares the characteristics of test-taker responses to TOEFL iBT independent and integrated writing tasks. Analyses of 480 writing samples revealed that test takers produced longer essays on independent tasks but used more complex language on integrated tasks. The results suggest that using both task types is more effective for measuring writing proficiency, and they provide empirical support for the design of the TOEFL iBT test.

Is writing performance related to keyboard type? An investigation from examinees' perspectives on the TOEFL iBT

The requirement that a U.S.-type keyboard be used when taking the TOEFL iBT test may affect the performance of test takers who routinely use other types of keyboards. A survey of over 17,000 test takers worldwide found that keyboard type had little impact on test performance. This study provides empirical evidence to support the use of U.S.-type keyboards for TOEFL iBT test administrations around the world.

Stakeholders' beliefs about the TOEFL iBT® test as a measure of academic language ability

This study explores how students, teachers and administrators perceive the content of the TOEFL iBT test. Focus groups and surveys show that participants from these different stakeholder groups largely believe that the test accurately measures academic language ability and that its results are good indicators of student performance. These findings support the use of the TOEFL iBT test as a measure of academic English-language ability.

From one to multiple accents on a test of L2 listening comprehension

This study examines the effects of different native-speaker accent varieties and of accent strength on listening comprehension. The researchers gathered listening test scores from over 20,000 TOEFL iBT test takers who were randomly assigned to listen to an academic lecture delivered in a U.S., Australian or British accent, with varying degrees of accent strength. Results show that the speaker's accent strength and the listener's familiarity with various accents could affect listening comprehension.

Shaping a score: Complexity, accuracy, and fluency in integrated writing performances

This study reports an analysis of 480 written responses, and their corresponding scores, on two TOEFL iBT writing tasks in which test takers read a short text, listen to a brief lecture, and then write a short essay in response. The researchers found that writers' fluency and grammatical accuracy were critical in predicting test scores, whereas the complexity of the essays had a weaker relationship to test scores.

Screener tests need validation too: Weighing an argument for test use against practical concerns

Screeners are short tests that can be used to place a test taker into a particular test-taker group. This study reports on the development of a prototype screener test for placing young English learners into different levels of the TOEFL Primary Reading test, and it outlines the validity evidence, such as score consistency and practicality, that needs to be gathered to ensure valid use of such tests.

A study of the use of TOEFL iBT® test speaking and listening scores for international teaching assistant screening

This study examines the effectiveness of using TOEFL iBT speaking and listening scores to screen international teaching assistants (ITAs) in U.S. universities. Scores on the TOEFL iBT listening section were found to be better predictors of ITAs' teaching competence than scores on the test's speaking section. The study suggests that ITAs can make noticeable improvements in their speaking ability after spending three months in English-speaking environments.
