Standard 3.9 of the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) requires evidence of model fit when an item response theory (IRT) model is used to make inferences from a data set. We applied two recently suggested methods for assessing the goodness of fit of IRT models, generalized residual analysis (Haberman, 2009) and residual analysis for assessing item fit (Bock & Haberman, 2009), to several operational data sets. Whenever possible, we also assessed the practical significance of any misfit. This report summarizes our findings. Although evidence of IRT model misfit was found for all the data sets, the misfit was not always practically significant.