This paper investigates whether conclusions from previous studies also hold for multiple-choice tests, in which items can be answered correctly by guessing. Reasons why a separate investigation is needed for multiple-choice tests are considered first. The hypothetical tests chosen for this investigation are then defined. The paper's main interest centers on a comparison between tests that differ either in average difficulty or in the variability of item difficulty, but not in both, while all other characteristics of the tests remain constant. It is concluded from the tests studied that: 1) for a test composed of equivalent items, the maximum test reliability and the maximum curvilinear correlation of test score on criterion are obtained when the item difficulty is somewhat easier than halfway between the chance level and 1.00; 2) reliability and the curvilinear correlation of test score on criterion decrease as the variability of item difficulty increases; and 3) it seems likely that many of our multiple-choice tests could be improved by reducing the range of item difficulties and decreasing the average difficulty level. (JGL)
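
To make the reference point in conclusion 1 concrete, the following minimal sketch computes the value "halfway between the chance level and 1.00" for items with different numbers of options. It assumes that item difficulty is expressed as the proportion of examinees answering correctly and that the chance level is 1/k for an item with k equally attractive options; neither assumption is stated in the abstract, and the function names `chance_level` and `halfway_difficulty` are hypothetical. The abstract's conclusion is that the best difficulty is somewhat easier than this midpoint, that is, a proportion correct somewhat above it; the sketch does not estimate by how much.

```python
def chance_level(num_options: int) -> float:
    """Proportion correct expected from blind guessing.

    Assumption (not from the abstract): all options are equally likely
    to be chosen by an examinee who guesses.
    """
    if num_options < 2:
        raise ValueError("a multiple-choice item needs at least two options")
    return 1.0 / num_options


def halfway_difficulty(num_options: int) -> float:
    """Midpoint between the chance level and 1.00, as a proportion correct."""
    return (chance_level(num_options) + 1.0) / 2.0


if __name__ == "__main__":
    # Print the reference midpoint for common item formats.
    for k in (2, 3, 4, 5):
        print(
            f"{k} options: chance = {chance_level(k):.3f}, "
            f"halfway = {halfway_difficulty(k):.3f}"
        )
```

Under these assumptions, a four-option item has a chance level of 0.25 and a midpoint of 0.625, so the conclusion would point to items answered correctly by somewhat more than 62.5 percent of examinees.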