Improving the Statistical Aspects of E-rater: Exploring Alternative Feature Reduction and Combination Rules

Author(s):
Feng, Xing; Dorans, Neil J.; Patsula, Liane N.; Kaplan, Bruce
Publication Year:
2003
Report Number:
RR-03-15
Subject/Key Words:
e-rater, automated essay scoring, prediction, classification

Abstract

This study explores alternative ways of reducing the number of variables/features, and additional ways of combining information across features, to produce more stable and accurate e-rater scores. An explanation of the statistical aspects of the current process is followed by a description of alternatives to that process. Our explorations yielded several conclusions and directions for future research. We have examined enough e-rater data to conclude that stepwise regression appears to be effective as a feature reduction procedure. However, this effectiveness may be attributable to the consistently strong relationship with essay score observed for the content vector analysis (CVA) variables and for the two variables used to approximate word length (the number of auxiliary verbs and the ratio of the number of auxiliary verbs to the number of words). To yield better validation results, we also suggest that the hold-out method for evaluating validity replace the current two-stage approach of first developing a model in a quasi-uniform training sample and then validating the results in a target cross-validation sample. More research is needed in several areas. First, explicit modeling of the part of essay scores that is unrelated to word length is warranted. The proportional odds model (POM) approach should be investigated in greater depth. Also needed is a statistical justification for using essay scores to score CVA variables. Algorithmic approaches to the prediction/classification problem, such as boosting, may prove fruitful. Further investigation of quantile regression and ridge regression should also be conducted.
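The report's actual feature set and fitting details are not given in this abstract. As a rough illustration of stepwise regression used as a feature-reduction procedure, the sketch below implements greedy forward selection with ordinary least squares in pure Python; the feature names (`f1`–`f3`) and the simulated data are invented for illustration and are not from the study.

```python
import random

def ols_fit(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
         + [sum(X[i][j] * y[i] for i in range(n))] for j in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col and A[r][col]:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[j][p] / A[j][j] for j in range(p)]

def rss(X, y, beta):
    """Residual sum of squares for coefficients beta."""
    return sum((yi - sum(b * x for b, x in zip(beta, xi))) ** 2
               for xi, yi in zip(X, y))

def forward_stepwise(features, y, max_features):
    """Greedily add, one at a time, the feature that most reduces the RSS."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < max_features:
        def score(name):
            cols = selected + [name]
            X = [[1.0] + [features[c][i] for c in cols] for i in range(len(y))]
            return rss(X, y, ols_fit(X, y))
        best = min(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Simulated data: y depends on f1 and f2; f3 is pure noise.
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
y = [2 * a + 0.5 * b + random.gauss(0, 0.1) for a, b in zip(x1, x2)]
feats = {"f1": x1, "f2": x2, "f3": x3}
print(forward_stepwise(feats, y, max_features=2))
```

In this toy setup the procedure should pick `f1` first (it explains the most variance) and `f2` second, leaving the noise feature out, which is the behavior stepwise regression relies on when a few features dominate the relationship with the criterion.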