Large-scale institutional testing, and testing in general, are in a period of rapid change. Among the more obvious dimensions of that change are the growing use of constructed-response items and of computer-based testing. This study explores the potential for using a computer-based scoring procedure for the formulating-hypotheses (F-H) item. This item type presents a situation and asks the examinee to generate explanations for it; each explanation is judged right or wrong, and the number of creditable explanations is summed to produce an item score. Scores were generated for 30 examinees' responses to each of eight items by a semantic pattern-matching program and, independently, by five human raters. On its initial scoring run, the program agreed highly with the raters' mean item scores for some questions and improved its concurrence substantially as modifications were made to the automatic scoring process. By the final run, correlations between the program and the raters on item scores ranged from .89 to .97, and mean human-machine discrepancies ranged from .6 to 1.1 on a 16-point scale. At the individual-hypothesis level, proportion agreement ranged from .80 to .94, which, given the large disproportion of correct responses in the sample, was little better than chance. The program also showed a tendency to erroneously classify wrong responses as correct. We conclude that F-H items might be more effectively scored by a semiautomatic system that combines machine processing with a small number of human judges, and we present a preliminary configuration for such a process.
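The observation that high proportion agreement can still be little better than chance follows from the skewed base rate of correct responses: when most hypotheses are creditable, two scorers agree often even if one labels nearly everything correct. A minimal sketch of this effect, using hypothetical labels rather than the study's data, with Cohen's kappa as the chance-corrected comparison:

```python
def proportion_agreement(a, b):
    """Fraction of hypotheses on which two scorers assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def chance_agreement(a, b):
    """Agreement expected if each scorer labeled at its own base rate,
    independently of the other scorer."""
    n = len(a)
    p_a1, p_b1 = sum(a) / n, sum(b) / n
    return p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)

def cohen_kappa(a, b):
    """Agreement above chance, scaled to the maximum possible above-chance
    agreement."""
    po, pe = proportion_agreement(a, b), chance_agreement(a, b)
    return (po - pe) / (1 - pe)

# Hypothetical labels (1 = creditable, 0 = not), with ~90% of the
# human-scored responses correct; the machine credits a few wrong ones.
human   = [1] * 90 + [0] * 10
machine = [1] * 88 + [0] * 2 + [1] * 6 + [0] * 4

po = proportion_agreement(human, machine)   # 0.92, looks high
pe = chance_agreement(human, machine)       # 0.852, already high by chance
k = cohen_kappa(human, machine)             # ~0.46, far more modest
```

Here raw agreement of .92 sits only .07 above the .85 expected by chance alone, which is why kappa (or a similar chance-corrected index) gives a more honest picture for imbalanced samples than proportion agreement does.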