Even expert raters, individuals with a background in teaching and evaluating oral communication, had difficulty agreeing with one another on those dimensions. Low-inference dimensions such as visual aids and vocal expression were associated with much higher levels of interrater reliability, .65 and .75, respectively. The holistic score was associated with an interrater reliability of .63. These results point to the need for a significant investment in task, rubric, and training development for the public speaking competence assessment before it can be used for large-scale assessment purposes.