Reliability and Validity
Reliability and validity are essential aspects of the quality of test scores. The TOEFL® research program ensures test score reliability and validity by following established guidelines and practices for the development and operational implementation of educational measurements. The program also conducts research on the different claims and inferences made based on TOEFL® Family of Assessments test scores.
Investigating the effects of prompt characteristics on the comparability of TOEFL iBT® integrated writing tasks
This study examines how different characteristics of the questions on TOEFL iBT® integrated Read-Listen-Write tasks affect test-taker performances. Findings indicate that the difficulty of the reading passages and the distinctness of ideas in the listening passages can significantly influence writing test scores. The results imply that test developers should attend to these prompt characteristics to ensure that different versions of the task remain comparable.
Effects of printed option sets on listening item performance among young English-as-a-Foreign-Language learners
This study compares test-taker performances on the multiple-choice listening items of the TOEFL Primary® test under two conditions: (1) with the answer options printed in the test booklet and read aloud to the test takers, and (2) with the options read aloud only. Results reveal that printed options did not affect listening scores and that students preferred having the options printed, providing empirical evidence to support the test's design.
Speaking proficiency of young language students: A discourse-analytic study
This study examines the characteristics of young language students' spoken responses to the Picture Narration task and one Listen-Speak task of the TOEFL Junior® Speaking test. The results show that students scoring higher on the speaking test also demonstrated better fluency, grammar, vocabulary and content. The findings indicate that the use of different task types is important for measuring and developing young students' speaking proficiency.
An investigation of the effect of task type on the discourse produced by students at various score levels in the TOEFL iBT® writing test
The study compares the characteristics of test-taker responses to TOEFL iBT independent and integrated writing tasks. Analyses of 480 writing samples revealed that test takers produced longer essays on independent tasks but used more complex language on integrated tasks. The results suggest that using both task types measures writing proficiency more effectively than using either alone, providing empirical support for the design of the TOEFL iBT test.
Is writing performance related to keyboard type? An investigation from examinees' perspectives on the TOEFL iBT
The requirement that a U.S.-type keyboard be used when taking the TOEFL iBT test may affect the performance of test takers who routinely use other types of keyboards. Through a survey of over 17,000 test takers worldwide, the study found that keyboard type had little impact on test performance. This study provides empirical evidence to support the use of U.S.-type keyboards for TOEFL iBT test administrations around the world.
Stakeholders' beliefs about the TOEFL iBT® test as a measure of academic language ability
This study explores the perceptions of students, teachers and administrators of the content of the TOEFL iBT test. Through focus groups and surveys, the results show that the participants, representing different stakeholder groups, largely believe that the test accurately measures academic language ability and that the test results are good indicators of student performance. These findings support the use of TOEFL iBT as a measure of academic English-language ability.
From one to multiple accents on a test of L2 listening comprehension
This study examines the effect of different varieties of native accents and accent strength on listening comprehension. The researchers gathered listening test scores from over 20,000 TOEFL iBT test takers randomly assigned to listen to an academic lecture presented in a U.S., Australian or British accent, with varying degrees of accent strength. Results show that the speaker's accent strength and the listener's familiarity with various accents could affect listening comprehension.
Shaping a score: Complexity, accuracy, and fluency in integrated writing performances
This study reports an analysis of 480 TOEFL iBT written responses and corresponding scores on two TOEFL iBT writing tasks, where test takers read a short text, listen to a brief lecture, and then write a short essay in response. Researchers found that writers' fluency and grammatical accuracy were critical in predicting test scores, but the complexity of the essays had a weaker relationship to test scores.
Screener tests need validation too: Weighing an argument for test use against practical concerns
Screeners are short tests that can be used to place a test taker into a particular test-taker group. This study reports the development of a prototype screener test to place young English learners into different levels of the TOEFL Primary Reading test and outlines the validity evidence, such as score consistency and practicality, that needs to be gathered to ensure valid use of such tests.
A study of the use of TOEFL iBT® test speaking and listening scores for international teaching assistant screening
This study examines the effectiveness of using TOEFL iBT speaking and listening scores to screen international teaching assistants (ITAs) in U.S. universities. Scores on the TOEFL iBT listening section were found to be better predictors of ITAs' teaching competence than scores on the test's speaking section. The study suggests that ITAs can make noticeable improvements in their speaking ability after spending three months in English-speaking environments.
Assessment and Learning
Assessment information can be used by language teachers to inform their teaching and by students to help them learn and improve. The TOEFL research program conducts research on different topics related to language assessment and learning. This research provides guidance on how to use test scores to inform classroom teaching and assessment, and ensures that the TOEFL® Family of Assessments reflects the language knowledge and skills representative of different EFL curricula.
Strategies used by young English learners in an assessment context
The strategies language learners use when responding to test questions can show whether learners' thought processes reflect what the test is intended to measure. This study examined the strategies used by young language learners when they answered TOEFL Primary test questions. Data from interviews with students suggested that the majority of the strategies used were relevant to what the test is designed to measure.
Examining content representativeness of a young learner language assessment: EFL teachers' perspectives
This study examines the importance of the language knowledge and skills measured by the TOEFL Primary Reading and Listening test. Judgments made by EFL teachers from 15 countries suggest that the TOEFL Primary test effectively measures the important language knowledge and skills commonly taught in EFL curricula designed for young language learners around the world. The results also imply that TOEFL Primary can be meaningfully used to support language teaching and learning.
Out of many, one: Challenges in teaching multilingual Kenyan primary students in English
This study examines the appropriateness of the use of the TOEFL Primary Reading and Listening test to measure the English-language proficiency of Kenyan primary school students and explores challenges teachers face. The data show that the TOEFL Primary test is an appropriate tool to measure Kenyan primary students' English proficiency. The researchers recommend that teachers make use of students' first language skills to help develop their English skills.
Young learners' response processes when taking computerized tasks for speaking assessment
This study compares the processes that nonnative English-speaking children and native English-speaking children use when responding to TOEFL Primary speaking tasks. The analysis yielded some differences between nonnative and native speakers, with the former spending more time looking at the countdown timers. The study highlights the importance of considering the thought processes young learners engage in when designing speaking tasks for this population.
The effects of different levels of performance feedback on TOEFL iBT reading practice test performance
This study examines how different types of feedback provided to students who are preparing for the TOEFL iBT test can influence their reading test performance. Results show that different types of feedback did not significantly impact reading test scores. However, students found the feedback useful for better understanding the different TOEFL iBT reading item types and for developing strategies to process the reading passages.
Are teacher perspectives useful? Incorporating EFL teacher feedback in the development of a large-scale international English test
This study reports the use of teacher feedback to inform the development of the TOEFL Junior test. Feedback from teachers on the appropriateness of the pilot test items and the accuracy of the test scores was collected and considered when test developers revised the items. The study highlights the importance of engaging teachers in the process of developing large-scale language assessments.
Evaluating a learning tool for young English learners: The case of the TOEFL Primary® English Learning Center
The TOEFL Primary English Learning Center (ELC) is an online language-learning tool designed for young language learners. This study examines users' perceptions of the ELC through surveys and interviews with teachers and students. The ELC was perceived positively as a tool for language practice outside the classroom and for preparing students to take the TOEFL Primary test.
Scoring and Interpretation
The TOEFL research program conducts research to identify important information that score users need in order to appropriately and accurately interpret test results and make decisions based on test scores. Research efforts are made to develop statements and descriptors about what test takers with test scores within a given score range can typically do and how they can improve their English-language knowledge and skills.
Interpreting the relationships between TOEFL iBT scores and GPA: Language proficiency, policy, and profiles
Analyses of test scores of around 2,000 Chinese students at a U.S. university suggest that institutions should consider both TOEFL iBT section and overall scores when using the test to make admissions decisions. The analyses revealed that many students had uneven section score profiles, such as very high reading and listening scores combined with very low speaking or writing scores. Setting minimum section scores for admissions purposes is recommended.
Using the Common European Framework of Reference to facilitate score interpretations for young learners' English language proficiency assessments
Researchers provide a brief history of the development of the Common European Framework of Reference (CEFR) and the corresponding proficiency level descriptors about students' language knowledge and skills. They then describe the technique used to align the TOEFL Junior and TOEFL Primary test scores to the CEFR levels and the corresponding level descriptors. The study also provides the correspondence between each test and the CEFR levels.
An investigation of the use of TOEFL Junior Standard scores for ESL placement decisions in secondary education
Analyses of test data and teachers' judgments suggest that TOEFL Junior test scores are related to English teachers' evaluations of the appropriate levels of ESL classes in which students should be placed. The ESL levels suggested by test scores and those assigned by teachers were largely the same. The findings suggest that the TOEFL Junior test can be used as an efficient and effective tool for placement purposes.
Developing and validating band levels and descriptors for reporting overall examinee performance
Researchers explain the process for developing proficiency level descriptors about what test takers at six TOEFL Junior Comprehensive score levels can do. The descriptors reflect the content of the test questions and the general proficiency descriptors from the CEFR. These descriptors are intended to help score users interpret numeric test scores and make informed decisions based on them.
Facilitating the interpretation of English language proficiency scores: Combining scale anchoring and test score mapping methodologies
Based on test-taker performance on the TOEFL ITP® Level 1 test, researchers developed proficiency level descriptors that describe what test takers with scores at four different score levels can do on each of the three sections of the test: listening comprehension; structure and written expression; and reading comprehension. The level descriptors are also aligned with descriptors of the CEFR levels.
Innovation and Technology
Innovation in assessment design is at the heart of the TOEFL research program. Continuous research efforts have been made to investigate how different innovative assessment task designs and the use of technology can improve the measurement of students' language knowledge and skills and provide useful information for score users and language learners.
Automated scoring across different modalities
This study investigates whether language features such as grammar and vocabulary developed for an automated essay-scoring system (the e-rater® engine) can be used to automatically evaluate spoken responses with the SpeechRater® service. Results show that combining grammar, vocabulary and fluency features in the automated scoring of speaking can produce a more robust score and make the SpeechRater system less vulnerable to cheating.
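The general idea of combining multiple feature dimensions into one score can be sketched as a weighted sum. This is purely illustrative, not the actual SpeechRater model: the feature names, weights and 0-4 reporting scale below are all assumptions.

```python
def combined_score(grammar, vocabulary, fluency,
                   weights=(0.4, 0.3, 0.3), scale=(0.0, 4.0)):
    """Weighted combination of per-dimension feature scores (each
    assumed to be on a 0-1 scale) into one speaking score.

    The feature set, weights and 0-4 reporting scale are assumptions
    for illustration; an operational engine would estimate its
    weights from large samples of human ratings.
    """
    raw = (weights[0] * grammar
           + weights[1] * vocabulary
           + weights[2] * fluency)
    lo, hi = scale
    return round(lo + raw * (hi - lo), 2)

# A response that is fluent but grammatically weaker still earns
# only a moderate combined score, because no single feature dominates.
score = combined_score(grammar=0.5, vocabulary=0.7, fluency=0.9)
```

Because the final score depends on several dimensions at once, inflating any one feature (for example, speaking quickly to boost fluency) moves the combined score far less than it would move a single-feature score — one intuition for the robustness finding above.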
Helping students select appropriately challenging text: Application to a test of second language reading ability
This study proposes an approach to assessing the complexity of TOEFL iBT reading passages and examines the types of reading texts that language learners at different proficiency levels are expected to be able to comprehend. It also introduces TextEvaluator®, an automated online tool that provides information about the complexity of reading passages, which learners can use to select reading materials appropriate for their abilities.
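TextEvaluator's internal model is not described here, but the basic idea of an automated complexity measure can be sketched with a toy index built from average sentence length and average word length. This is an illustrative stand-in, not the tool's actual method.

```python
import re

def complexity_index(text):
    """Toy text-complexity proxy: average sentence length (in words)
    plus average word length (in letters). Longer sentences and longer
    words both push the index up. Illustrative only -- not
    TextEvaluator's actual model."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / len(sentences) + sum(map(len, words)) / len(words)

simple = "The cat sat. It was warm."
dense = ("Photosynthetic organisms transform electromagnetic radiation "
         "into chemical potential energy through coordinated "
         "enzymatic processes.")
# The dense academic sentence yields a much higher index than the
# short, simple sentences, matching the intuition behind such tools.
```

Operational measures combine many more signals (vocabulary frequency, cohesion, syntactic structure), but the same principle applies: a numeric index lets learners compare candidate texts against their own level.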
Automatic plagiarism detection for spoken responses in an assessment of English language proficiency
This study evaluates an innovative system that can be used to automatically detect plagiarized spoken responses. The system uses several features to identify plagiarized responses, such as similarities between source materials and test-taker responses, and differences in speech features (e.g., fluency) for the same individual when producing plagiarized versus spontaneous speech. The system can help improve the detection of plagiarized responses for the TOEFL iBT speaking test.
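One of the similarity features such a system might compute can be sketched as bag-of-words cosine similarity between source material and a response. This is an illustrative stand-in for one feature, not the operational feature set.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two texts: one simple
    overlap feature a plagiarism detector might compute. Real systems
    combine many richer features (n-gram overlap, paraphrase
    detection, per-speaker fluency differences)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

source = "the lecture argues that urban planning shapes commuting patterns"
copied = "the lecture argues that urban planning shapes commuting patterns today"
spontaneous = "I think cities should invest more in public transportation"

# A near-verbatim response scores far closer to the source material
# than an independently produced one, flagging it for review.
```

A high similarity score alone would not prove plagiarism; as the summary notes, the system also compares a test taker's speech characteristics across responses, since memorized material tends to be delivered more fluently than spontaneous speech.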
Monitoring the performance of human and automated scores for spoken responses
This study reports the use of different statistical procedures to compare scores given by human raters with those generated by SpeechRater, an automated scoring engine for spoken responses. Results reveal systematic differences between scores generated by human raters and SpeechRater. The results indicate that it would be better to use both human and machine raters to evaluate speaking proficiency with the current state-of-the-art technology.