The second study documented the development and validation of a set of task models designed to help item writers generate new items that are optimally constructed to provide high-quality evidence about targeted skills. The proposed models were validated by examining the percentage of variance in item difficulty accounted for by the specified item classifications. That percentage ranged from slightly more than 30% for items designed to test vocabulary skills to slightly more than 40% for items designed to test additional verbal reasoning skills, such as generating near and far inferences and understanding complex oppositional reasoning.
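
As a rough illustration of the validation criterion described above, the share of difficulty variance accounted for by a categorical item classification can be computed as the ratio of the between-class sum of squares to the total sum of squares. The sketch below assumes a single categorical classification per item. The function name `variance_explained` and the toy values are hypothetical, and the study's actual estimation procedure (for example, a regression with several classification variables) may differ.

```python
from collections import defaultdict


def variance_explained(difficulties, classifications):
    """Proportion of difficulty variance accounted for by item classes.

    Hypothetical helper: computes SS_between / SS_total for one
    categorical classification (the eta-squared statistic).
    """
    if len(difficulties) != len(classifications):
        raise ValueError("each item needs exactly one classification")
    if not difficulties:
        raise ValueError("at least one item is required")

    n = len(difficulties)
    grand_mean = sum(difficulties) / n

    # Total variation of item difficulty around the grand mean.
    ss_total = sum((d - grand_mean) ** 2 for d in difficulties)
    if ss_total == 0:
        raise ValueError("item difficulties do not vary")

    # Group item difficulties by their assigned classification.
    groups = defaultdict(list)
    for d, c in zip(difficulties, classifications):
        groups[c].append(d)

    # Variation of the class means around the grand mean,
    # weighted by the number of items in each class.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return ss_between / ss_total


# Toy values for illustration only, not data from the study.
toy_difficulties = [1.0, 2.0, 3.0, 4.0]
toy_classes = ["A", "A", "B", "B"]
print(variance_explained(toy_difficulties, toy_classes))  # 0.8
```

A value near 0.30 or 0.40 from such a calculation would correspond to the 30% and 40% figures reported above, under the assumption that the classification is the only predictor of difficulty.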