The meaning of scale reliability is discussed, along with the effects of several aspects of a test that are likely to change its value. Recent trends in testing have favored test structures and item types that yield lower empirical reliability. These trends have led to more unified test construction, with units larger than a single binary item. Clusters of locally dependent items make a test multidimensional, and that multidimensionality must be considered when reliability is evaluated. The new SAT, introduced in 1994, places much greater emphasis on passage-based critical reading questions than the traditional SAT did. The plausible consequences of a decrease in the reliability of the new SAT are discussed. The reliabilities of three tests with different testlet sizes (the SAT, LSAT, and TOEFL) are compared.
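
To illustrate why local dependence within item clusters matters for reliability, here is a minimal sketch in Python. It assumes coefficient alpha as the reliability estimate, which the abstract does not name, and it uses made-up data in which the items inside each testlet are identical, an extreme case of local dependence. The function names `cronbach_alpha` and `to_testlet_scores` are hypothetical and chosen only for this illustration.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-units score matrix (list of rows)."""
    k = len(scores[0])  # number of scoring units (items or testlets)
    unit_vars = [variance(col) for col in zip(*scores)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(unit_vars) / total_var)


def to_testlet_scores(scores, testlet_sizes):
    """Sum consecutive items into testlet scores, one column per testlet."""
    out = []
    for row in scores:
        start, summed = 0, []
        for size in testlet_sizes:
            summed.append(sum(row[start:start + size]))
            start += size
        out.append(summed)
    return out


# Made-up data (an assumption, not from the source): 6 examinees,
# 4 binary items forming 2 testlets of 2 items each. Items inside
# a testlet are identical, i.e. perfectly locally dependent.
items = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

alpha_items = cronbach_alpha(items)
alpha_testlets = cronbach_alpha(to_testlet_scores(items, [2, 2]))
print(round(alpha_items, 3), round(alpha_testlets, 3))  # 0.833 0.5
```

In this toy case, treating the four items as independent units gives an alpha of about 0.83, while scoring the two testlets as the units gives 0.5. The gap reflects the within-testlet dependence being counted as if it were evidence of consistency across the whole test.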