Evaluation of e-rater for the GRE Issue and Argument Prompts

Author(s): Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent
Publication Year: 2012
Report Number: RR-12-02
Source: ETS Research Report
Document Type: Report
Page Count: 106
Subject/Key Words: e-rater, Automated Essay Scoring (AES), Graduate Record Examination (GRE), Analytical Writing, Automated Scoring Models, Natural Language Processing (NLP), Automated Scoring and Natural Language Processing, NLP-Related Measurement Research

Abstract

Automated scoring models for e-rater were built and evaluated for the GRE argument and issue writing tasks. Three types of scoring model were built: prompt-specific, generic, and generic with a prompt-specific intercept. Evaluation statistics such as weighted kappas, Pearson correlations, standardized differences in mean scores, and correlations with external measures were examined to evaluate e-rater model performance against human scores. Performance was also evaluated across demographic subgroups. Additional analyses were performed to establish appropriate agreement thresholds between human and e-rater scores for unusual essays and to assess the impact of using e-rater on operational scores. The generic e-rater scoring model with an operational prompt-specific intercept was recommended for operational use on the issue writing task, and the prompt-specific e-rater scoring model was recommended for the argument writing task. The two automated scoring models were implemented to produce check scores at a discrepancy threshold of 0.5 with human scores.
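
As a rough illustration of the agreement statistics named in the abstract, the following Python sketch computes a weighted kappa, a Pearson correlation, and a standardized difference in mean scores for paired human and machine scores, and flags a pair whose discrepancy exceeds 0.5. It is a minimal sketch under stated assumptions, not the report's implementation: the abstract does not say which kappa weighting was used (quadratic weights are assumed here), the standardized difference is assumed to divide the mean difference by the root of the averaged variances, the strict "greater than" comparison at the threshold is assumed, and all function names are hypothetical.

```python
from math import sqrt


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Weighted kappa for integer scores (quadratic weights are an assumption)."""
    n = max_score - min_score + 1
    total = len(human)
    # Observed contingency table of (human, machine) score pairs.
    observed = [[0] * n for _ in range(n)]
    for h, m in zip(human, machine):
        observed[h - min_score][m - min_score] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under independence of the two raters.
            weighted_expected += weight * row_totals[i] * col_totals[j] / total
    return 1.0 - weighted_observed / weighted_expected


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Sample variance (n - 1 denominator)."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / (len(values) - 1)


def pearson_correlation(x, y):
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mx) ** 2 for a in x))
    spread_y = sqrt(sum((b - my) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def standardized_mean_difference(human, machine):
    """(machine mean - human mean) / pooled SD; the pooling rule is an assumption."""
    pooled_sd = sqrt((variance(human) + variance(machine)) / 2.0)
    return (mean(machine) - mean(human)) / pooled_sd


def exceeds_discrepancy_threshold(human_score, machine_score, threshold=0.5):
    """True when the human and machine scores differ by more than the threshold.

    Whether the comparison is strict at exactly 0.5 is an assumption.
    """
    return abs(human_score - machine_score) > threshold


if __name__ == "__main__":
    # Made-up scores on an assumed 1-6 scale, for illustration only.
    human_scores = [3, 4, 4, 5, 2, 3]
    machine_scores = [3, 4, 5, 5, 2, 4]
    print(quadratic_weighted_kappa(human_scores, machine_scores, 1, 6))
    print(pearson_correlation(human_scores, machine_scores))
    print(standardized_mean_difference(human_scores, machine_scores))
    print(exceeds_discrepancy_threshold(4, 4.6))
```

The kappa function expects integer scores, so continuous machine scores would need to be rounded before it is applied; the correlation and standardized difference can be computed on unrounded scores.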

http://dx.doi.org/10.1002/j.2333-8504.2012.tb02284.x
