Systems and methods are provided for assigning a difficulty score to a speech sample. Speech recognition is performed on a digitized version of the speech sample using an acoustic model to generate word hypotheses for the speech sample. Time alignment is performed between the speech sample and the word hypotheses to associate the word hypotheses with corresponding sounds of the speech sample. A first difficulty measure is determined based on the word hypotheses, and a second difficulty measure is determined based on acoustic features of the speech sample. A difficulty score for the speech sample is generated based on the first difficulty measure and the second difficulty measure.