Studies evaluating hypotheses about sources of differential item functioning (DIF) are classified into two categories: observational studies evaluating operational items and randomized DIF studies evaluating specially constructed items. For observational studies, advice is given on item classification, sample selection, the matching criterion, and the choice of DIF techniques, as well as on how to summarize, synthesize, and translate DIF data into DIF hypotheses. In randomized DIF studies of specially constructed items, specific hypotheses, often generated from observational studies, are evaluated under rigorous conditions. Advice for these studies focuses on the importance of carefully constructed items to assess DIF hypotheses. In addition, randomized DIF studies are cast within a causal inference framework, which justifies the use of standardization or logistic regression analyses to estimate effect sizes. Two studies with components spanning the observational and controlled domains are summarized for illustrative purposes; standardization analyses are used for both. Special logistic regression analyses of an item from one of these studies are provided to illustrate a new approach to assessing DIF hypotheses with specially constructed items.
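To make the effect-size notion concrete, the following is a minimal sketch of a standardized p-difference of the kind a standardization analysis reports: at each level of the matching criterion, the focal-group proportion correct is compared with the reference-group proportion correct, and the differences are averaged using focal-group counts as weights. The function name `std_p_dif`, the record layout, and the choice to skip score levels lacking either group are assumptions made for illustration and are not taken from the studies summarized here.

```python
from collections import defaultdict


def std_p_dif(records):
    """Standardized p-difference for one item (illustrative sketch).

    records: iterable of (group, matching_score, correct), where group is
    "ref" or "focal" and correct is 0 or 1. This layout is hypothetical.
    """
    # For each matching-score level, track [n_examinees, n_correct] per group.
    counts = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, correct in records:
        cell = counts[score][group]
        cell[0] += 1
        cell[1] += correct

    weighted_diff = 0.0
    total_weight = 0
    for cell in counts.values():
        n_ref, right_ref = cell["ref"]
        n_focal, right_focal = cell["focal"]
        if n_ref == 0 or n_focal == 0:
            # Assumption: levels without both groups contribute nothing.
            continue
        # Focal minus reference proportion correct, weighted by focal count.
        weighted_diff += n_focal * (right_focal / n_focal - right_ref / n_ref)
        total_weight += n_focal

    if total_weight == 0:
        raise ValueError("no score level contains both groups")
    return weighted_diff / total_weight
```

A negative value indicates that, among examinees matched on the criterion, the focal group answered the item correctly less often than the reference group; a value near zero indicates little difference after matching.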