Prediction of Writing True Scores in Automated Scoring of Essays by Best Linear Predictors and Penalized Best Linear Predictors

Yao, Lili; Haberman, Shelby J.; Zhang, Mo
Publication Year:
Report Number:
Document Type: ETS Research Report
Page Count:
Subject/Key Words:
Automated Essay Scoring (AES), Writing Assessment, Best Linear Predictors, Penalty Functions, High-Stakes Decisions, Human Scoring, Accuracy, Praxis Core Academic Skills for Educators (Core), TOEFL iBT, GRE Writing Assessment


Many assessments of writing proficiency that aid in making high-stakes decisions consist of several essay tasks evaluated by a combination of human holistic scores and computer-generated scores for essay features, such as the rate of grammatical errors per word. Under typical conditions, a summary writing score is a linear combination of the holistic scores and the feature scores. The best linear predictor (BLP) approximates the true composite writing score by a linear combination of holistic scores and essay feature scores. However, because the relationship between computer-generated feature scores and human scores may depend on subgroup membership, while the same scoring rules must normally be applied to all test takers, Yao, Haberman, and Zhang proposed a modified methodology, the penalized best linear predictor (PBLP), which incorporates a quadratic penalty function into the conventional BLP method. This research report contains full accounts of the BLP results, as well as PBLP results supplementary to those of Yao et al., for three writing assessments that aid in making high-stakes decisions: the TOEFL iBT Writing section, the GRE General Test Analytical Writing measure, and the Praxis Core Academic Skills for Educators: Writing assessment. The results indicate the added value of machine-generated features for prediction of true composite writing scores and the effectiveness of the penalty function in mitigating lack of population invariance.
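The BLP and PBLP estimators described in the abstract both have closed-form solutions. The following is a minimal sketch on simulated data; the feature set, score scales, penalty matrix, and penalty weight below are illustrative assumptions, not the specifications used in the report (in particular, the report's penalty targets subgroup dependence of the prediction rule, which this diagonal stand-in only imitates):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated predictors: 2 human holistic scores + 3 machine feature scores.
# Purely illustrative; the actual ETS features and score scales differ.
n = 500
X = rng.normal(size=(n, 5))
true_score = X @ np.array([0.8, 0.7, 0.3, 0.2, 0.1]) + rng.normal(scale=0.5, size=n)

# Best linear predictor (BLP): least-squares weights for the linear
# combination of holistic and feature scores.
Xc = np.column_stack([np.ones(n), X])            # intercept column
blp_w = np.linalg.solve(Xc.T @ Xc, Xc.T @ true_score)

# Penalized BLP (PBLP): add a quadratic penalty lam * w' P w to the
# least-squares criterion, yielding weights (X'X + lam*P)^{-1} X'y.
# Here P shrinks only the machine-feature weights -- a hypothetical
# stand-in for the invariance penalty described in the report.
lam = 10.0
P = np.diag([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
pblp_w = np.linalg.solve(Xc.T @ Xc + lam * P, Xc.T @ true_score)
```

By construction, the PBLP solution attains a penalty value w'Pw no larger than that of the unpenalized BLP weights, at the cost of a slightly larger residual sum of squares.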