(d) If we train a predictive model on one prompt, how well do its predictions generalize to the other prompts of the same or different genre? The results of the study indicate that keystroke log features vary considerably in stability across testing occasions and display somewhat different patterns of feature–human correlation across genres and topics. However, using the most stable features, we can obtain moderate to strong prediction of human essay scores, and those models generalize reasonably well across prompts though more strongly within than across writing genres.