A recent multidimensional scaling analysis of item response data for the TOEFL® test identified clusters of items within the test sections. It suggested that these clusters might be more homogeneous and more distinct than their parent sections, and hence better suited for diagnostic use. The present study explored the feasibility and value of using such cluster scores. The original analysis was based on all item responses (choosing one of the four alternatives, omitting an item, or not reaching it). The present study repeated that analysis using the traditional scoring of item responses as correct or incorrect, and then obtained scores for the within-section clusters identified in the new analysis. The new dimensions and clusters were very similar to the old ones. For the total sample, the cluster scores and the section scores had similar internal-consistency reliabilities and intercorrelations, but they differed, without a consistent pattern, for high-scoring and low-scoring examinees. The major conclusions were that (a) the cluster scores have no clear-cut advantage over the section scores for diagnosis or other purposes, and (b) taking into account which incorrect option an examinee chose yields little or no information beyond that provided by considering only whether each response is correct or incorrect.
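The contrast between scoring every response option and traditional right/wrong scoring, and the comparison of internal-consistency reliability for a section versus a cluster drawn from it, can be illustrated with a small sketch. The abstract does not say which reliability coefficient was used; this sketch assumes KR-20 (equivalent to coefficient alpha for 0/1 items), which is one common choice. The response encoding, the function names (`score_dichotomous`, `kr20`, `cluster_scores`), the answer key, and the toy data are all hypothetical and are not taken from the study.

```python
from typing import List, Optional, Sequence

# Hypothetical encoding: an option letter "A"-"D", or None for omitted/not reached.
Response = Optional[str]


def score_dichotomous(responses: Sequence[Response], key: Sequence[str]) -> List[int]:
    """Traditional scoring: 1 if the chosen option matches the key, else 0.
    Omitted and not-reached items (None) are scored 0, so the identity of
    any particular incorrect option is discarded."""
    return [1 if r is not None and r == k else 0 for r, k in zip(responses, key)]


def cluster_scores(scores: Sequence[Sequence[int]], items: Sequence[int]) -> List[List[int]]:
    """Keep only the columns (items) belonging to a hypothetical cluster."""
    return [[row[j] for j in items] for row in scores]


def kr20(scores: Sequence[Sequence[int]]) -> float:
    """KR-20 internal-consistency reliability for 0/1 item scores.
    scores[i][j] is examinee i's score on item j."""
    n_people = len(scores)
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("KR-20 needs at least two items")
    # Proportion correct (p) for each item; q = 1 - p.
    p = [sum(row[j] for row in scores) / n_people for j in range(n_items)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    # Population variance of total scores, matching the p*q item variances.
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n_people
    var_total = sum((t - mean) ** 2 for t in totals) / n_people
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


# Toy data (hypothetical): 5 examinees, a 4-item "section", answer key A B C D.
key = ["A", "B", "C", "D"]
raw = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "A"],
    ["A", "B", "D", "A"],
    ["A", "C", "D", "A"],
    ["B", "C", "D", None],
]
section = [score_dichotomous(r, key) for r in raw]
cluster = cluster_scores(section, [0, 1])  # hypothetical cluster: items 1 and 2

print(f"section KR-20: {kr20(section):.3f}")
print(f"cluster KR-20: {kr20(cluster):.3f}")
```

On this toy data the section yields a KR-20 of 0.800 and the two-item cluster 0.750. Because KR-20 depends on the number of items, a shorter cluster will tend to show lower reliability than its parent section unless its items are substantially more homogeneous, which is one reason the comparison in the study is informative.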