For testing programs that administer multiple forms within a year and across years, score equating is used to ensure that scores from different forms can be used interchangeably. In an ideal world, sample sizes are large and representative of populations that change little over time, and highly reliable alternate test forms are built with nearly identical psychometric properties. Under these conditions, most equating methods produce score conversions close to the identity function. Unfortunately, equating is sometimes performed on small, non-representative samples with variable distributions of ability, and the administered tests are built to vague specifications. In these circumstances, different equating methods produce different results because they rest on different assumptions. In the nearly ideal case, deviations from the identity function are smaller because great effort is taken to control variation. Even when equating is conducted under these desirable conditions, however, the random variation in form-to-form equating, when concatenated over time, can produce substantial shifts in score conversions, that is, scale drift. In this paper, we distinguish among different sources of variation that may contribute to score-scale inconsistency and identify practices that are likely to exacerbate it.
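The accumulation of random equating error into scale drift can be illustrated with a small Monte Carlo sketch. This is not an analysis from the paper; the function name, sample sizes, and the use of mean-sigma linear equating are illustrative assumptions. All forms are constructed to be truly identical (the true conversion is the identity), so any departure of the chained conversion from identity is pure drift from sampling error.

```python
import random
import statistics

def simulate_drift(n_years=20, n_examinees=200, n_reps=500, score=0.0, seed=1):
    """Monte Carlo sketch: scale drift from chained linear (mean-sigma) equating.

    Each year the new form is equated to the previous year's form on a
    finite random sample, so the estimated slope and intercept carry
    sampling error; chaining the yearly conversions compounds that error.
    Returns the standard deviation, over replications, of the chained
    conversion's departure from identity at the given score point.
    """
    rng = random.Random(seed)
    drifts = []
    for _ in range(n_reps):
        a, b = 1.0, 0.0  # chained slope/intercept; starts at identity
        for _ in range(n_years):
            # Equating-sample scores on the old and new forms; both forms
            # measure the same N(0, 1) trait, so the true equating is identity.
            old = [rng.gauss(0, 1) for _ in range(n_examinees)]
            new = [rng.gauss(0, 1) for _ in range(n_examinees)]
            # Mean-sigma linear equating: match new-form moments to old-form moments.
            a_hat = statistics.stdev(old) / statistics.stdev(new)
            b_hat = statistics.mean(old) - a_hat * statistics.mean(new)
            # Compose this year's conversion with the chain so far.
            a, b = a * a_hat, a * b_hat + b
        drifts.append(a * score + b - score)  # departure from identity
    return statistics.stdev(drifts)

# The spread of drift grows with the number of chained equatings,
# even though every single-year equating is unbiased.
print(simulate_drift(n_years=5), simulate_drift(n_years=20))
```

Under these assumptions the drift standard deviation grows roughly with the square root of the number of chained equatings, which is the sense in which small per-link random error can concatenate into substantial score-scale shifts.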