TOEIC® Score Consistency

TOEIC® scores are consistent and reliable.

Evidence: The research in this section demonstrates how TOEIC Program Research helps to ensure that scores are not improperly influenced by aspects of the testing procedure that are unrelated to language ability. When examining score consistency or reliability, there are multiple aspects of the testing procedure that are considered, including:

  • test items (internal consistency)
  • test forms (equivalence)
  • test occasions or administrations (stability)
  • raters (inter- and intra-rater reliability)

  • Monitoring Score Change Patterns to Support TOEIC® Listening and Reading Test Quality

    In large-scale, high-stakes testing programs, such as the TOEIC program, some test takers take a test more than once over time. The score change patterns of these so-called "repeaters" can be analyzed to support the overall quality of the test (e.g., its reliability, validity, intended uses). This study examined these score change patterns to evaluate the reliability and validity of TOEIC® Listening and Reading test scores.

    Read more about Monitoring Score Change Patterns to Support TOEIC® Listening and Reading Test Quality

  • Measuring English-Language Proficiency across Subgroups: Using Score Equity Assessment to Evaluate Test Fairness

    English-language proficiency assessments are designed for a targeted test population and may include test takers from diverse demographic, sociocultural, and educational backgrounds. The test is assumed to be fair, and the scores earned by different subgroups of test takers are assumed to have the same meaning. One way of evaluating test fairness is to produce a linked test for each subgroup and compare its scores with the scores of the original test.

    Read more about Measuring English-Language Proficiency across Subgroups: Using Score Equity Assessment to Evaluate Test Fairness

  • How ETS Scores the TOEIC® Speaking and Writing Test Responses

    Typically, human raters are used to score Speaking and Writing tests because of their ability to evaluate a broader range of language performance than automated systems. This paper describes how ETS ensures the reliability and consistency of scores by human raters for TOEIC® Speaking and Writing tests through training, certification, and systematic administrative and statistical monitoring procedures.

    Read more about How ETS Scores the TOEIC Speaking and Writing Test Responses

  • Linking TOEIC® Speaking Scores Using TOEIC® Listening Scores

    In testing programs, multiple forms of a test are used across different administrations to prevent the overexposure of test forms and to reduce the possibility of test takers gaining advance knowledge of test content. Because slight differences may occur in the statistical difficulty of the alternate forms, a statistical procedure, known as test score linking, has been commonly used to adjust for these differences in difficulty so that test forms are comparable.

    Read more about Linking TOEIC® Speaking Scores Using TOEIC Listening Scores
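The linking idea described above can be sketched in a few lines of Python. This is an illustrative linear-linking sketch with made-up sample data, not the actual ETS equating procedure (which uses more sophisticated methods):

```python
import statistics

def linear_link(new_form_scores, ref_form_scores, score):
    """Map a raw score from a new form onto the reference form's scale by
    matching means and standard deviations (linear linking). Illustrative
    sketch only; not the actual TOEIC linking procedure."""
    mu_new = statistics.mean(new_form_scores)
    mu_ref = statistics.mean(ref_form_scores)
    sd_new = statistics.stdev(new_form_scores)
    sd_ref = statistics.stdev(ref_form_scores)
    # Standardize the score on the new form, then re-express it on the
    # reference form's scale.
    return mu_ref + (sd_ref / sd_new) * (score - mu_new)

# Hypothetical score samples from two forms taken by comparable groups
new_form = [100, 120, 140, 160, 180]
ref_form = [110, 130, 150, 170, 190]
print(linear_link(new_form, ref_form, 140))  # 150.0
```

In this toy example the new form is uniformly 10 points harder, so the linking shifts every score up by 10 to make the two forms comparable.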

  • Monitoring TOEIC® Listening and Reading Test Performance Across Administrations Using Examinees' Background Information

    The scoring process for the TOEIC Listening and Reading test includes monitoring procedures that help ensure that scores are consistent across different test forms and test administrations, and that skill interpretations are fair. This study explores the possibility of using information about test takers' backgrounds in order to enhance several types of monitoring procedures.

    Read more about Monitoring TOEIC Listening and Reading Test Performance Across Administrations Using Examinees' Background Information

  • Evaluating the Stability of Test Score Means for the TOEIC® Speaking and Writing Tests

    For educational tests, it is critical to maintain consistency of score scales and to understand the sources of variation in score means over time. This helps ensure that interpretations about test takers' abilities are comparable from one administration (or form) to another. Using statistical procedures, this study examined the consistency of reported scores for the TOEIC® Speaking and Writing tests.

    Read more about Evaluating the Stability of Test Score Means for the TOEIC Speaking and Writing Tests

  • Comparison of Content, Item Statistics, and Test-Taker Performance on the Redesigned and Classic TOEIC® Listening and Reading Test

    This paper compares the content, reliability, and difficulty of the classic and 2006 redesigned TOEIC Listening and Reading tests. Although the redesigned tests included slightly different item types to better reflect current models of English-language proficiency, the tests were judged to be similar across versions.

    Read more about Comparison of Content, Item Statistics, and Test-Taker Performance on the Redesigned and Classic TOEIC Listening and Reading Test

  • Statistical Analyses for the Expanded Item Formats of the TOEIC® Speaking Test

    Testing programs should periodically review their assessments to ensure that their test items or tasks are well-aligned with real-world activities. For this reason, to better support communicative language learning and to discourage the use of memorization and other test-taking strategies, ETS expanded the existing format of some items of the TOEIC® Speaking test in May 2015.

    Read more about Statistical Analyses for the Expanded Item Formats of the TOEIC Speaking Test

  • Statistical Analyses for the Updated TOEIC® Listening and Reading Test

    To ensure that tests continue to meet the needs of test takers and score users, testing programs should periodically revisit their assessments. To keep pace with the continuously changing use of English and the ways in which individuals commonly communicate in the global workplace and everyday life, an updated TOEIC® Listening and Reading test was designed and first launched in May 2016.

    Read more about Statistical Analyses for the Updated TOEIC Listening and Reading Test

  • The Consistency of TOEIC® Speaking Scores Across Ratings and Tasks

    This study examines the consistency of TOEIC Speaking scores. The analysis uses a methodology based on generalizability theory, which allows researchers to examine the degree to which aspects of the testing procedure (i.e., raters, tasks) influence scores. The results contribute evidence to support claims that TOEIC Speaking scores are consistent.

    Read more about The Consistency of TOEIC Speaking Scores Across Ratings and Tasks

  • Monitoring Individual Rater Performance for the TOEIC® Speaking and Writing Tests

    This paper describes procedures implemented on the TOEIC Speaking and Writing tests for monitoring individual rater performance and enhancing overall scoring quality. These multifaceted, carefully developed procedures help ensure that the potential for human error is kept to a minimum, thereby contributing to the TOEIC tests' scoring consistency.

    Read more about Monitoring Individual Rater Performance for the TOEIC Speaking and Writing Tests

  • Alternate Forms Test-Retest Reliability and Test Score Changes for the TOEIC® Speaking and Writing Tests

    The reliability or consistency of scores can be examined in a variety of ways, including the degree to which scores for the same test taker are consistent across different test forms (so-called "equivalent forms reliability") and different occasions of testing ("test-retest reliability"). This study examined the consistency of TOEIC Speaking and Writing scores across different test forms at different time intervals (e.g., 1–30 days, 31–60 days) and found that test scores had reasonably high equivalent form test-retest reliability.

    Read more about Alternate Forms Test-Retest Reliability and Test Score Changes

  • Statistical Analyses for the TOEIC® Speaking and Writing Pilot Study

    This paper reports the results of a pilot study that contributed to TOEIC Speaking and Writing test development. The analysis of the reliability of test scores found evidence of several types of score consistency, including inter-rater reliability (agreement of several raters on a score) and internal consistency (a measure based on correlation between items on the same test).

    Read more about Statistical Analyses for the TOEIC Speaking and Writing Pilot Study
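As a rough illustration of the internal-consistency idea above, Cronbach's alpha (a widely used correlation-based measure) can be computed as follows. This is a generic sketch with made-up data, not the pilot study's analysis:

```python
def cronbach_alpha(item_scores):
    """Estimate internal consistency (Cronbach's alpha) from a list of
    per-item score lists, one inner list per item, with the same test
    takers in the same order. Illustrative only; not the ETS procedure."""
    k = len(item_scores)       # number of items
    n = len(item_scores[0])    # number of test takers

    def pvar(xs):              # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Each test taker's total score across all items
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(pvar(it) for it in item_scores) / pvar(totals))

# Hypothetical scores: two items, three test takers
print(round(cronbach_alpha([[1, 2, 3], [1, 3, 2]]), 3))  # 0.667
```

Alpha approaches 1.0 when items rank test takers consistently, which is the sense in which items on the same test "hang together."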

  • Field Study Results for the Redesigned TOEIC® Listening and Reading Test

    This paper describes the results of a field study for the 2006 redesigned TOEIC Listening and Reading tests, which includes analyses of item and test difficulty, reliability and correlations between test sections with classic TOEIC Listening and Reading tests.

    Read more about Field Study Results for the Redesigned TOEIC Listening and Reading Test