Native Language Identification with Time Delay Deep Neural Networks Trained Separately on Native and Non-Native English Corpora

Qian, Yao; Evanini, Keelan; Lange, Patrick; Pugh, Robert A.; Ubale, Rutuja
Sep 22, 2020
Native Language Identification (NLI), Neural Networks, Corpora (Linguistics), English Language Learners (ELL)


Systems and methods for identifying a person's native language are presented. A native language identification system comprising a plurality of artificial neural networks, such as time delay deep neural networks, is provided. Respective artificial neural networks of the plurality are trained as universal background models using separate native language and non-native language corpora. The artificial neural networks may be used to perform voice activity detection and to extract sufficient statistics from the respective language corpora. The sufficient statistics may be used to estimate respective T-matrices, which may in turn be used to extract respective i-vectors. The i-vectors may be used to generate a multilayer perceptron model, which may be used to identify a person's native language based on an utterance by the person in his or her non-native language.
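
As an illustration of the sufficient-statistics and i-vector steps described above, the following Python sketch shows one conventional way such quantities can be computed. It is not taken from the disclosure: the function names, the array shapes, the centering around class means, and the diagonal-covariance assumption are all hypothetical choices made for this example, which follows the standard i-vector formulation.

```python
import numpy as np


def sufficient_statistics(features, posteriors, means):
    """Zeroth- and centered first-order statistics for one utterance.

    Assumed (hypothetical) shapes:
      features:   (T, D) acoustic frames kept after voice activity detection
      posteriors: (T, C) per-frame class posteriors, e.g. network softmax outputs
      means:      (C, D) per-class mean vectors of the background model
    """
    # Zeroth order: soft count of frames assigned to each class.
    n = posteriors.sum(axis=0)                       # (C,)
    # First order: posterior-weighted sum of frames per class.
    f = posteriors.T @ features                      # (C, D)
    # Center the first-order statistics around the class means.
    f_centered = f - n[:, None] * means              # (C, D)
    return n, f_centered


def extract_ivector(n, f_centered, t_matrix, variances):
    """Posterior mean of the latent factor under the standard i-vector model.

    Assumed (hypothetical) shapes:
      t_matrix:  (C, D, R) total variability matrix, one (D, R) block per class
      variances: (C, D) diagonal covariance per class
    """
    rank = t_matrix.shape[2]
    # Sigma^{-1} T for each class, using the diagonal covariances.
    t_scaled = t_matrix / variances[:, :, None]      # (C, D, R)
    # Precision: I + sum_c n_c * T_c^T Sigma_c^{-1} T_c
    precision = np.eye(rank) + np.einsum("c,cdr,cds->rs", n, t_scaled, t_matrix)
    # Linear term: sum_c T_c^T Sigma_c^{-1} f_c
    linear = np.einsum("cdr,cd->r", t_scaled, f_centered)
    # Solve the linear system rather than forming an explicit inverse.
    return np.linalg.solve(precision, linear)        # (R,)
```

In a pipeline of the kind described, vectors produced this way would serve as fixed-length inputs to a classifier such as a multilayer perceptron. Estimating the T-matrix itself, typically by expectation-maximization, is outside the scope of this sketch.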

