Systems and methods are provided for scoring a response to a character-by-character highlighting task. A similarity value for the response is calculated by comparing the response to one or more correct responses to the task, the similarity value indicating how similar or dissimilar the response is to those correct responses. A threshold similarity value is calculated for the task, the threshold indicating the amount of similarity to, or dissimilarity from, the one or more correct responses that a response must show to be scored at a given level. The similarity value for the response is compared to the threshold similarity value, and a score at, above, or below that level is assigned based on the comparison.
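The abstract does not name a particular similarity measure or say how the threshold is derived. Below is a minimal Python sketch that makes three assumptions. First, a response is represented as the set of highlighted character indices. Second, Jaccard similarity is used as the measure. Third, a hypothetical threshold rule scales a base value (`base`, default 0.8) by the minimum pairwise agreement among the correct responses. The function names and the `level` parameter are illustrative and are not taken from the source.

```python
from itertools import combinations
from typing import List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity between two sets of highlighted character indices."""
    if not a and not b:
        return 1.0  # two empty highlights are treated as identical
    return len(a & b) / len(a | b)


def similarity_value(response: Set[int], correct: List[Set[int]]) -> float:
    """Similarity of the response to its closest correct response."""
    return max(jaccard(response, c) for c in correct)


def threshold_value(correct: List[Set[int]], base: float = 0.8) -> float:
    """Hypothetical task threshold: `base` scaled by how much the correct
    responses agree with one another (looser when they disagree)."""
    if len(correct) < 2:
        return base
    agreement = min(jaccard(a, b) for a, b in combinations(correct, 2))
    return base * agreement


def score(response: Set[int], correct: List[Set[int]], level: int = 1) -> int:
    """Assign `level` when the similarity meets the threshold, else one below it."""
    sim = similarity_value(response, correct)
    thr = threshold_value(correct)
    return level if sim >= thr else level - 1
```

As an example, take the text "The quick brown fox" with "quick brown" (indices 4 to 14) as the only correct highlight. Highlighting just "quick " (indices 4 to 9) gives a similarity of 6/11, which falls below the 0.8 threshold, so `score(set(range(4, 10)), [set(range(4, 15))])` returns 0. An exact match returns 1. Because the threshold is derived from the task's own correct responses, it becomes more lenient when several correct highlights overlap only partially.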