Summary: In testing programs, multiple forms of a test are used across administrations to prevent overexposure of test forms and to reduce the possibility of test takers gaining advance knowledge of test content. Because slight differences may occur in the statistical difficulty of alternate forms, a statistical procedure known as test score linking is commonly used to adjust for these differences in difficulty so that test forms are comparable. In developing multiple forms of the TOEIC Speaking test, the current practice of the TOEIC Program is to follow strict specifications to ensure that each new form is comparable to previously used forms in terms of content and difficulty. Another practice employed to maintain score comparability across forms is to conduct various comparisons against historical records or test statistics to detect, for example, suspicious scoring events or failures to observe content specifications. This study investigated how well the current practice maintains the comparability of TOEIC Speaking test forms across time and administrations compared to a more conventional test form linking procedure (e.g., using the TOEIC Listening score as an anchor). Using 30 actual Speaking test forms, test scores based on the current practice were compared to test scores resulting from linking via TOEIC Listening scores. The score differences between the two procedures were minimal. The results suggest that continuing the current scoring procedures is a practical choice for maintaining the comparability of TOEIC Speaking test forms over time and for helping ensure fair score interpretations.

Abstract: The purpose of this study was to assess the effectiveness of the current practice for reporting scores on the TOEIC Speaking test.
Currently, test developers adhere to strict specifications to ensure that each new edition (or form) of the TOEIC Speaking test is comparable to previously used forms in terms of content and difficulty. For each of the 30 TOEIC Speaking test forms, the operational scores derived from the current practice were compared to scores derived from an external multiple-choice (MC) anchor linking design, with scores on the TOEIC Listening test serving as the external anchor. Score differences between the two procedures were generally minimal and comparable to differences attributable to measurement error. The study suggests that any psychometric benefit gained by replacing the current practice with an external MC anchor linking design would be minimal. The TOEIC program currently conducts various statistical checks against historical records in an attempt to maintain score comparability across forms and over time; continuing these checks is a practical choice for maintaining the comparability of the reporting scale over time.
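To make the external-anchor linking design concrete, the sketch below illustrates one common variant: chained linear (mean-sigma) linking of a new form onto a reference scale through a common anchor test. This is a generic illustration, not the TOEIC Program's operational procedure; the function name `linear_link` and all score values are hypothetical.

```python
import statistics

def linear_link(scores_from, scores_to):
    """Return a linear transform mapping the scale of `scores_from`
    onto the scale of `scores_to` (mean-sigma method)."""
    m_f, s_f = statistics.mean(scores_from), statistics.pstdev(scores_from)
    m_t, s_t = statistics.mean(scores_to), statistics.pstdev(scores_to)
    slope = s_t / s_f
    intercept = m_t - slope * m_f
    return lambda x: slope * x + intercept

# Chained linking: group 1 took the new form and the anchor; group 2
# took the reference form and the same anchor. (Made-up data.)
g1_new    = [12, 15, 18, 20, 22, 25]   # new form scores, group 1
g1_anchor = [40, 45, 50, 55, 60, 65]   # anchor scores, group 1
g2_ref    = [14, 16, 19, 21, 24, 26]   # reference form scores, group 2
g2_anchor = [42, 46, 51, 56, 61, 66]   # anchor scores, group 2

# Chain the two links: new form -> anchor scale -> reference scale.
new_to_anchor = linear_link(g1_new, g1_anchor)
anchor_to_ref = linear_link(g2_anchor, g2_ref)
linked = [anchor_to_ref(new_to_anchor(x)) for x in g1_new]
```

Comparing `linked` against the scores produced by the current practice (here, the operational reported scores) is the kind of form-by-form contrast the study carried out across 30 forms.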