Using Confusion Infusion and Confusion Reduction Indices to Compare Alternative Essay Scoring Rules

Dorans, Neil J.; Patsula, Liane N.
Publication Year:
Report Number:
ETS Research Report
Document Type:
Page Count:
Subject/Key Words: Electronic Essay Rater (E-rater), Confusion Reduction, Essay Scoring, Confusion Infusion


Observed proportion agreement as a measure of association between two ratings of essay performance can be inflated when the number of rating categories is small. Cohen's Kappa adjusts observed agreement by subtracting out the agreement one would expect if ratings were assigned independently of each other. The matrix of proportion agreements between two sets of assignment rules can be recast as a confusion matrix in which zero confusion is equivalent to perfect agreement. Kappa can then be viewed as a measure of confusion reduction. A complementary measure, confusion infusion, is defined. Its usefulness is illustrated with live data from a large-scale testing program in which e-rater, an automatic essay-scoring algorithm, is used in place of a second reader. The confusion reduction and confusion infusion indices support comparisons of the relative efficacy of two versions of e-rater and two other methods of assigning scores: a second human reader, and assigning all candidates the mode of the first reading.
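The kappa adjustment described above can be sketched in code. The following is a minimal illustration, not the report's method: it computes Cohen's kappa from a confusion matrix of two raters' scores, showing how chance agreement is subtracted out (the "confusion reduction" reading). The confusion infusion index is defined in the report itself and is not reproduced here; the counts below are invented toy data, not the program's live data.

```python
def cohens_kappa(matrix):
    """Cohen's kappa from a square confusion matrix, where
    matrix[i][j] counts essays scored i by rater 1 and j by rater 2."""
    total = sum(sum(row) for row in matrix)
    k = len(matrix)
    # observed proportion agreement: mass on the diagonal
    p_o = sum(matrix[i][i] for i in range(k)) / total
    # chance agreement: sum over categories of the product of marginals
    row_margins = [sum(row) / total for row in matrix]
    col_margins = [sum(matrix[i][j] for i in range(k)) / total
                   for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_margins, col_margins))
    # kappa subtracts out agreement expected under independent rating,
    # i.e. measures how much of the removable confusion is removed
    return (p_o - p_e) / (1 - p_e)

# hypothetical counts for two readers scoring essays on a 4-point scale
counts = [
    [20,  5,  1,  0],
    [ 4, 30,  6,  1],
    [ 1,  5, 25,  4],
    [ 0,  1,  3, 14],
]
print(round(cohens_kappa(counts), 3))  # → 0.646
```

With few categories, the diagonal mass (here 89/120 ≈ 0.74) overstates agreement; kappa discounts the 0.27 expected by chance, which is the inflation the abstract warns about.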
