This article proposes and investigates several methodologies for monitoring the quality of constructed-response (CR) scoring, both human and automated. There is increasing interest in the operational scoring of essays by both automated systems and human raters. There is also evidence of rater effects among human raters, such as scoring severity and score inconsistency. Automated scoring of CRs has recently been implemented successfully alongside human scoring in operational programs (the TOEFL® and GRE® tests); however, much is not yet known about how automated scoring systems perform. For quality assurance, a consistent and standardized approach is therefore needed to monitor the quality of CR scoring over time and across programs. Monitoring scoring results helps ensure that scores are fair and accurate for test takers and test users, and it enables testing programs to detect and correct changes in scoring severity.