
From Biology to Education: Scoring and Clustering Multilingual Text Sequences and Other Sequential Tasks

Sukkarieh, Jana Z.; von Davier, Matthias; Yamamoto, Kentaro
Publication Year:
Report Number:
ETS Research Report
Document Type:
Page Count:
Subject/Key Words:
Multilingual Automated Scoring, Large-Scale Assessment, Sequential Tasks, Prefix-Infix Omission and Insertion, Multilingual Character or Grapheme Sequences, Sequence Alignment and Clustering, Natural Language Processing (NLP), Bio-NLP


This document describes a solution to a problem in the automatic content scoring of the multilingual character-by-character highlighting item type. The solution is language independent and represents a significant enhancement: it not only facilitates automatic scoring but also plays an important role in clustering students’ responses, and consequently it has a nontrivial impact on the refinement of the items and/or their scoring guidelines. Furthermore, though designed for a specific problem, the solution is general enough for any educational task that can be transformed into a sequential one. For example, it can be applied to a set of actions expected from a student in simulations or learning trajectories as projected by a teacher, inside an intelligent tutoring system, or even in a game. It can also simply be applied to a set of student clicks, button selections, or keyboard hits expected to reach a correct answer. The solution adds flexibility to existing automatic-scoring techniques and could potentially provide more if coupled with statistical data-mining techniques.
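
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a sequential task might be scored by alignment. It assumes a standard Needleman-Wunsch global alignment, a technique borrowed from biological sequence comparison; the function name `alignment_score`, the weights, and the example action names are hypothetical and are not taken from the report.

```python
from typing import Hashable, Sequence


def alignment_score(
    expected: Sequence[Hashable],
    observed: Sequence[Hashable],
    match: int = 1,      # assumed reward for an identical element
    mismatch: int = -1,  # assumed penalty for a substituted element
    gap: int = -1,       # assumed penalty for an omitted or inserted element
) -> int:
    """Return the Needleman-Wunsch global alignment score of two sequences."""
    rows, cols = len(expected) + 1, len(observed) + 1
    # score[i][j] is the best score for aligning expected[:i] with observed[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # everything in expected[:i] was omitted
    for j in range(1, cols):
        score[0][j] = j * gap  # everything in observed[:j] was inserted
    for i in range(1, rows):
        for j in range(1, cols):
            same = expected[i - 1] == observed[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            omission = score[i - 1][j] + gap
            insertion = score[i][j - 1] + gap
            score[i][j] = max(diagonal, omission, insertion)
    return score[-1][-1]


# Hypothetical expected action sequence and a response that omits one step.
expected_actions = ["open", "select", "drag", "submit"]
student_actions = ["open", "select", "submit"]
print(alignment_score(expected_actions, expected_actions))  # 4
print(alignment_score(expected_actions, student_actions))   # 2
# The same function works on character sequences, e.g. highlighted text.
print(alignment_score("cat", "cart"))                       # 2
```

Because the function only compares elements for equality, the same sketch applies to characters, graphemes, clicks, or other actions. Pairwise scores of this kind could also serve as a similarity measure for clustering responses, though whether the report does so in this way is not stated in the abstract.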
