Five item response theory (IRT) computer programs (IRTPRO, flexMIRT, PARSCALE, mdltm, and MIRT) are compared in terms of their item parameter estimates. The programs are used to fit the one-parameter logistic (1PL)/partial credit model (PCM), the two-parameter logistic (2PL)/generalized partial credit model (GPCM), and/or the three-parameter logistic (3PL)/GPCM to two real data sets and 30 simulated data sets. For both the real and the simulated data sets, the (mean) correlations, differences, and root mean square differences of the item parameter estimates among the five programs are compared; for the simulated data sets, the same statistics computed between the estimates and the true parameters are also reported. The advantages and disadvantages of each program are discussed. The flexMIRT program is recommended for calibrating large-scale assessment data. It is further recommended that Educational Testing Service develop shareable in-house IRT software based on mdltm, MIRT, or the National Assessment of Educational Progress version of PARSCALE.
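
As an illustration of the three comparison statistics named above, the following is a minimal sketch rather than the procedure actually used in the study. It assumes that the difference is the signed mean of (first set minus second set) and that the root mean square difference is the square root of the mean squared difference; the abstract does not spell out these definitions. The function name `compare_estimates` and the numeric values are hypothetical.

```python
import numpy as np


def compare_estimates(est_a, est_b):
    """Compare two sets of estimates for the same items (hypothetical helper).

    est_a, est_b: equal-length sequences of one parameter type
    (for example, item difficulties) from two programs, or from one
    program and the true generating values.
    """
    a = np.asarray(est_a, dtype=float)
    b = np.asarray(est_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both sets must cover the same items")

    diff = a - b
    return {
        # Pearson correlation between the two sets of estimates
        "correlation": float(np.corrcoef(a, b)[0, 1]),
        # Signed mean difference (assumed definition)
        "mean_difference": float(diff.mean()),
        # Root mean square difference (assumed definition)
        "rmsd": float(np.sqrt(np.mean(diff ** 2))),
    }


# Made-up difficulty estimates for three items, for illustration only
program_a = [0.0, 1.0, 2.0]
program_b = [0.5, 1.5, 2.5]
stats = compare_estimates(program_a, program_b)
print(stats)
```

For the simulated data sets, a summary of this kind would be computed within each replication and then averaged across replications, which is presumably what "(mean)" refers to.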