In this report, statistical and psychometric methods are systematically applied to develop and evaluate item-scoring rules in terms of test reliability. Data collected from a situational judgment test are used to facilitate the comparison. For a well-developed item with appropriate keys (i.e., the correct answers), the various item-scoring rules are expected to agree in their item-option characteristic curves. In addition, when models based on item response theory fit the data, test reliability improves substantially, particularly when the nominal response model and its parameter estimates are used in scoring.
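For context, the nominal response model (Bock, 1972) mentioned above models the probability of selecting each response option directly, which is why its estimates can be used to score all options rather than only the keyed one. In standard notation (a sketch, with \( \theta \) the latent trait, and \( a_k \), \( c_k \) the slope and intercept parameters of option \( k \) among \( m \) options of an item):

```latex
P(X = k \mid \theta) \;=\;
\frac{\exp(a_k \theta + c_k)}
     {\sum_{j=1}^{m} \exp(a_j \theta + c_j)},
\qquad k = 1, \dots, m.
```

Because every option has its own slope, an option's estimated curve indicates how strongly endorsing it relates to the trait, which is the basis for comparing item-scoring rules through item-option characteristic curves.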