Automated scoring models were trained and evaluated for the essay task in the Praxis I writing test. Prompt-specific and generic e-rater scoring models were built, and evaluation statistics, such as quadratic weighted kappa, Pearson correlation, and standardized differences in mean scores, were examined to evaluate e-rater model performance against human scores. Model performance was also evaluated across demographic subgroups using the same statistics. Additionally, correlations of automated scores with external measures were examined as validity evidence. Analyses were performed to establish appropriate agreement thresholds between human and e-rater scores for unusual essays and to examine the impact of using e-rater on operational scores and classification rates. The generic e-rater scoring model was recommended for operational use to produce contributory scores, with a discrepancy threshold of 1.5 between the e-rater score and the human score.
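The agreement statistics named above can be illustrated with a short sketch. The function names, the 1 to 6 score scale, and the sample ratings below are illustrative assumptions, not details from the study; the formulas themselves (quadratic weighted kappa from a normalized confusion matrix, and a machine-minus-human mean difference in pooled standard deviation units) are standard.

```python
import numpy as np

def quadratic_weighted_kappa(human, machine, min_score=1, max_score=6):
    """Quadratic weighted kappa between two raters on an integer score scale."""
    n = max_score - min_score + 1
    obs = np.zeros((n, n))
    for h, m in zip(human, machine):
        obs[h - min_score, m - min_score] += 1
    obs /= obs.sum()
    # Expected agreement under independence, from the marginal distributions
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))
    # Quadratic disagreement weights: penalty grows with squared score distance
    i, j = np.indices((n, n))
    w = (i - j) ** 2 / (n - 1) ** 2
    return 1.0 - (w * obs).sum() / (w * exp).sum()

def standardized_mean_difference(human, machine):
    """Machine-minus-human mean score difference in pooled-SD units."""
    human, machine = np.asarray(human, float), np.asarray(machine, float)
    pooled_sd = np.sqrt((human.var(ddof=1) + machine.var(ddof=1)) / 2)
    return (machine.mean() - human.mean()) / pooled_sd

# Hypothetical human and automated scores for ten essays
human   = [3, 4, 4, 5, 2, 3, 4, 5, 3, 4]
machine = [3, 4, 5, 5, 2, 3, 3, 5, 4, 4]

print(quadratic_weighted_kappa(human, machine))
print(np.corrcoef(human, machine)[0, 1])      # Pearson correlation
print(standardized_mean_difference(human, machine))
```

Values close to 1 for kappa and correlation, and a standardized mean difference close to 0, indicate that the automated scores track human scores well; the same three statistics can be recomputed within each demographic subgroup to check for differential performance.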