Data is received that encapsulates a spoken response to a test question. Thereafter, the received data is transcribed into a string of words. The string of words is then compared with at least one source string so that a similarity grid representation of the comparison can be generated that characterizes a level of similarity between the string of words and the at least one source string. The grid representation is then scored using at least one machine learning model. The score indicates a likelihood of the spoken response having been plagiarized. Data providing the encapsulated score can then be provided. Related apparatus, systems, techniques and articles are also described.