GE FRST Evaluation Report: How Well Does a Statistically-Based Natural Language Processing System Score Natural Language Constructed Responses?
- Burstein, Jill; Kaplan, Randy M.
- Publication Year:
- Report Number:
- Document Type:
- Subject/Key Words:
- Automation; Constructed Responses; Evaluation; General Electric Free-Response Scoring Tool (GE FRST); Scoring
There is considerable interest at Educational Testing Service (ETS) in including performance-based, natural language constructed-response items on standardized tests. Such items can be developed, but the projected time and costs required to have them scored by human graders would be prohibitive. For ETS to include these types of items on standardized tests, automated scoring systems need to be developed and evaluated, since such systems could reduce the time and costs of human grading. This report details the evaluation of a statistically based scoring system, the General Electric Free-Response Scoring Tool (GE FRST). GE FRST was designed to score short-answer constructed responses of up to 17 words. The report describes how the system performs on responses to three different item types. Because developing a separate system for each item type would be inefficient, it is important to evaluate a system on several item types to determine whether its scoring method generalizes across them. (30pp.)