The automated scoring of open-ended items is currently used for licensure tests in medicine and architecture and for tests employed in admissions to graduate management programs. Whereas automated scoring appears to have achieved some of the efficiency goals of its progenitors, it has yet to realize fully its potential to improve the quality of assessment through finer control of underlying constructs. This paper offers thoughts for moving the field forward by, first and foremost, remembering that it's not only the scoring. That is, the hard work is in devising a scoring approach, and more generally an assessment, grounded in a credible theory of domain proficiency. We can also move the field forward by conducting rigorous scientific research that helps build a strong validity argument; publishing results in peer-reviewed measurement journals, where their quality can be critiqued from a technical perspective; using competing scoring approaches in combination; employing multiple pieces of evidence to bolster the meaning of automated scores; and, finally, using automated scoring to fundamentally alter the character of large-scale testing in ways that bring lasting educational impact.