Systems and methods are provided for acquiring physical-world data indicative of a subject's interactions with an avatar, so that those interactions can be evaluated. An interactive avatar is provided for interaction with the subject. Speech from the subject to the avatar is captured, and automatic speech recognition is performed to determine the content of the subject speech. Motion data from the subject interacting with the avatar is also captured. A next action of the interactive avatar is determined based on the content of the subject speech or the motion data. The next action of the avatar is implemented, and a score for the subject is determined based on the content of the subject speech and the motion data.
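The flow described above (capture speech and motion, choose the avatar's next action, then score the subject) might be sketched roughly as follows. This is an illustrative sketch only, not the disclosed implementation: the `Observation` structure, the `eye_contact` motion feature, the action names, the keyword-matching rule, and the equal 0.5/0.5 weighting are all hypothetical assumptions, and the transcript is assumed to have already been produced by some automatic speech recognition engine.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One turn of captured subject data (hypothetical structure)."""
    transcript: str     # text assumed to come from an ASR engine
    eye_contact: float  # assumed motion feature: fraction of the turn, in [0, 1]


def next_avatar_action(obs: Observation) -> str:
    """Choose the avatar's next action from speech content OR motion data."""
    text = obs.transcript.lower()
    if not text.strip():
        return "prompt_subject"         # nothing recognized: prompt again
    if obs.eye_contact < 0.2:
        return "gesture_for_attention"  # a motion cue alone can drive the action
    if "?" in text:
        return "answer_question"
    return "ask_follow_up"


def score_subject(turns: list[Observation], keywords: set[str]) -> float:
    """Combine speech content AND motion data into a score in [0, 1]."""
    if not turns:
        return 0.0
    # Speech component: share of expected keywords the subject actually said.
    words = {w.strip(".,?!") for t in turns for w in t.transcript.lower().split()}
    content = len(words & keywords) / len(keywords) if keywords else 0.0
    # Motion component: mean of the assumed eye-contact feature across turns.
    motion = sum(t.eye_contact for t in turns) / len(turns)
    return 0.5 * content + 0.5 * motion  # equal weighting is an arbitrary choice
```

In this sketch the action rule fires on either input alone, while the score always uses both, mirroring the "or" and "and" in the description. A real system would presumably replace the keyword match and single motion feature with richer models.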