This paper investigates the predictive validity of various features of Generating Examples (GE) test items, that is, how well those features predict item difficulty. GE items are algebra problems that pose mathematical constraints and ask examinees to provide example solutions that satisfy those constraints. The item features were selected on the basis of a cognitive model of how examinees solve GE items using informal solution strategies such as generate-and-test. Experiment 1 examined the extent to which examinee performance can be explained by the features predicted to affect difficulty, and Experiments 2 and 3 investigated the generality and cognitive bases of the difficulty model. The factors studied accounted for approximately 55% of the variance in item difficulty, and this predictive power was maintained on a more heterogeneous set of items. The cognitive strategies underlying the difficulty factors were also examined.
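As a rough illustration of the generate-and-test strategy, the following minimal Python sketch enumerates candidate answers in a fixed order and keeps those that pass every constraint, stopping once enough examples have been found. The item, its constraints, and the function names (`candidates`, `generate_and_test`) are hypothetical. They are not drawn from the study's item set or its cognitive model.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, List, Tuple

Pair = Tuple[int, int]


def candidates(limit: int) -> Iterator[Pair]:
    """Generate candidate (x, y) pairs of positive integers up to `limit`, in lexicographic order."""
    return product(range(1, limit + 1), repeat=2)


def generate_and_test(
    gen: Iterable[Pair],
    constraints: List[Callable[[Pair], bool]],
    k: int,
) -> List[Pair]:
    """Return the first k generated candidates that satisfy every constraint."""
    found: List[Pair] = []
    for cand in gen:
        # Test step: keep the candidate only if all constraints hold.
        if all(test(cand) for test in constraints):
            found.append(cand)
            if len(found) == k:
                break
    return found


# Hypothetical GE item: "Give two examples of positive integers x and y
# such that x + y = 12 and x - y > 4."
item_constraints: List[Callable[[Pair], bool]] = [
    lambda p: p[0] + p[1] == 12,
    lambda p: p[0] - p[1] > 4,
]

examples = generate_and_test(candidates(12), item_constraints, k=2)
print(examples)  # [(9, 3), (10, 2)]
```

Here the generator is a simple exhaustive enumeration. A human solver would more likely generate candidates selectively, so the sketch shows only the overall generate-then-check structure of the strategy.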