Systems and methods are described for generating a scoring model for responses. A computer-implemented method of calibrating, using a processing system, a scoring model for scoring examinee responses includes accessing a plurality of training responses for training the scoring model. The training responses are analyzed to derive values of multiple features (variables) of the training responses. The scoring model is trained based on the values of the multiple features of the training responses and on one or more external measures of proficiency for each individual associated with a training response utilized in the training. The one or more external measures are not derived from the training responses. Based on the training, a weight for each of the multiple features is determined. The scoring model is then calibrated to include the weights for at least some of the features, for use in scoring examinee responses.
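
The calibration flow summarized above can be illustrated with a minimal sketch. It is not the claimed implementation: the three features (word count, mean word length, type-token ratio), the use of ordinary least squares as the training procedure, and every function name are assumptions chosen only for illustration, and the training responses and external measures in the demo are made-up values.

```python
import numpy as np

# Hypothetical feature set; the source does not name specific features.
FEATURE_NAMES = ("word_count", "mean_word_length", "type_token_ratio")


def extract_features(response):
    """Derive the values of the illustrative features from one response."""
    words = response.lower().split()
    if not words:
        return [0.0, 0.0, 0.0]
    return [
        float(len(words)),
        sum(len(w) for w in words) / len(words),
        len(set(words)) / len(words),
    ]


def train_weights(training_responses, external_measures):
    """Fit one weight per feature against an external proficiency measure.

    The external measures come from outside the responses (for example, a
    score on a separate test), one per individual who wrote a response.
    Ordinary least squares is an assumption made for this sketch.
    """
    X = np.array([extract_features(r) for r in training_responses])
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    y = np.asarray(external_measures, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    weights = {name: float(w) for name, w in zip(FEATURE_NAMES, coef[1:])}
    return float(coef[0]), weights


def calibrate(intercept, weights, keep=None):
    """Build the scoring model from the weights of at least some features."""
    keep = FEATURE_NAMES if keep is None else keep
    return {"intercept": intercept, "weights": {k: weights[k] for k in keep}}


def score(model, response):
    """Score an examinee response with the calibrated model."""
    values = dict(zip(FEATURE_NAMES, extract_features(response)))
    return model["intercept"] + sum(
        w * values[name] for name, w in model["weights"].items()
    )


# Made-up training data, for illustration only.
training_responses = [
    "a b c",
    "aa bb cc dd",
    "the the cat",
    "x",
    "hello world hello again friend",
    "one two two three three three",
]
external_measures = [2.0, 3.5, 3.0, 1.0, 5.0, 4.0]  # not derived from the responses

intercept, weights = train_weights(training_responses, external_measures)
model = calibrate(intercept, weights)
print(weights)
print(score(model, "a short new examinee response"))
```

Passing a subset of feature names as `keep` to `calibrate` mirrors the statement that the calibrated model includes the weights for at least some, not necessarily all, of the features.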