A New Approach for Reporting Aggregate Student Growth Scores

Dan McCaffrey

Associate Vice President at ETS

Katherine Castellano

Senior Research Scientist at ETS

December 8, 2021

Currently, 48 states in the United States measure student achievement growth as part of their statewide elementary and secondary school student-testing programs. Often, growth measures are used as part of accountability systems for districts, schools, or even teachers to give a more complete picture of student performance than current achievement alone. These systems use the average of individual students’ growth for all the students in a district, school, or teacher’s class. While this approach on the surface may appear to be a practical and straightforward way to summarize student progress, it can, in reality, prove to be problematic.

Averaging growth measures for schools, districts or teachers with few students can result in substantial year-to-year fluctuations. A small school, for example, with a high growth measure in one year, ranking it at the 90th percentile, might have a low value the next year, ranking it at only the 10th percentile. These year-to-year changes make it difficult to use the average of student growth for decision making. Because these decisions are often high-stakes, with funding implications and other consequences for students, schools and districts, it’s critical that these measures are accurate and provide actionable information.

Over the last few years, we have been working with the California Department of Education (CDE) to help them better understand ways to measure and report their state’s student achievement growth. Faced with high year-to-year fluctuations, California was uncertain whether or how to proceed with rolling out its growth measures. As a result, we set out to find a way to improve the state’s aggregate growth measures and remove excessive instability from the measures for schools or districts serving small numbers of students, and for low-incidence student groups within them, such as students with disabilities or English learners (ELs).

During the course of our work, we turned to a standard statistical method known as Empirical Best Linear Prediction (EBLP) to improve the accuracy, and consequently reduce the year-to-year fluctuations, of the growth measures. This method is commonly used in many different applications to provide measures for multiple groups, such as patient outcomes for hospitals or literacy levels for counties in a state. Our team developed the methodology, algorithms and code needed to apply the EBLP method to growth data, which can include over a million individual student growth measures and hundreds or even thousands of schools.

The EBLP method is not a new student growth model. It can be applied to any type of student growth scores from simple gain scores (e.g., current year score minus prior year score) to more involved Student Growth Percentiles. The power of the EBLP procedure lies in optimally using student growth score data from multiple years to produce a better estimate of the group’s growth score in the state’s reporting year. Simply put, the EBLP aggregate growth scores are approximately a weighted average of student growth scores from two or more school years instead of a simple average of scores from just the reporting year.
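To make the contrast between a simple average and a multi-year weighted average concrete, here is a minimal sketch. The function names, the scores and the weight of 0.6 are all hypothetical and chosen only for illustration; in the actual EBLP procedure the weights are derived from the data rather than set by hand.

```python
from statistics import mean


def simple_average(reporting_year_scores):
    """Aggregate growth as the plain mean of reporting-year student growth scores."""
    return mean(reporting_year_scores)


def weighted_multiyear_average(reporting_year_scores, prior_year_scores, weight_reporting):
    """Blend the reporting-year mean with the prior-year mean.

    weight_reporting is a hypothetical, hand-picked weight between 0 and 1;
    the remaining weight goes to the prior year.
    """
    if not 0.0 <= weight_reporting <= 1.0:
        raise ValueError("weight_reporting must be between 0 and 1")
    return (weight_reporting * mean(reporting_year_scores)
            + (1.0 - weight_reporting) * mean(prior_year_scores))


# Made-up growth scores for a very small group of students.
reporting_year = [12, -3, 20, 7]
prior_year = [2, 4, -1, 3]

print("simple average:  ", simple_average(reporting_year))
print("weighted average:", weighted_multiyear_average(reporting_year, prior_year, 0.6))
```

With a weight of 1.0 the weighted average reduces to the simple average, which is the behavior described for large groups.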

What’s more, the EBLP method adapts to the size of the group. For larger groups that already have more accurate and stable estimates, the EBLP aggregate growth score is nearly identical to the simple average. It places almost all of the weight on student growth scores in the reporting year and little or no weight on student growth scores from prior years. Conversely, for smaller groups, the EBLP procedure will place some nontrivial amount of weight on the prior year growth scores because they can help inform growth of the group of students in the reporting year. In these cases, the EBLP will differ more noticeably from the simple average, but it will also be more accurate and stable than the simple average. The overall result is that EBLP weighted averages improve accuracy and stability most for smaller groups, narrowing the gap in the accuracy and stability of growth measures between smaller and larger groups. With this method, schools and districts that serve small student populations, or small student groups within them, will no longer be penalized simply because they serve fewer students than larger ones.
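The way the weights shift with group size can be sketched with a simplified two-year best linear predictor. This is an illustration only, not the implementation developed for California: it assumes each year's observed group average equals the group's true growth plus independent sampling noise, and the variance and correlation values passed in below are hypothetical. In an empirical procedure those quantities would be estimated from the data.

```python
def blp_weights(n_prior, n_reporting, group_var, student_var, cross_year_corr):
    """Weights for predicting a group's reporting-year growth from two yearly averages.

    Returns (weight_prior, weight_reporting, weight_overall_mean).
    All inputs other than the group sizes are assumed values for illustration.
    """
    # Variance of each observed yearly average: true group variance plus sampling noise.
    v_prior = group_var + student_var / n_prior
    v_report = group_var + student_var / n_reporting
    # Covariance between the two observed averages (noise assumed independent across years).
    cov_years = cross_year_corr * group_var
    # Covariance of the reporting-year true growth with each observed average.
    c_prior = cov_years
    c_report = group_var
    # Solve the 2x2 system V w = c by hand.
    det = v_prior * v_report - cov_years ** 2
    w_prior = (v_report * c_prior - cov_years * c_report) / det
    w_report = (v_prior * c_report - cov_years * c_prior) / det
    # Whatever weight is left over goes to the overall (for example, statewide) mean.
    return w_prior, w_report, 1.0 - w_prior - w_report


for n in (20, 2000):
    w_prior, w_report, w_mean = blp_weights(
        n, n, group_var=25.0, student_var=400.0, cross_year_corr=0.8
    )
    print(f"n={n}: prior={w_prior:.3f} reporting={w_report:.3f} overall mean={w_mean:.3f}")
```

Under these assumed values, the group of 20 students puts a substantial share of the weight on the prior year, while the group of 2,000 puts nearly all of it on the reporting year, mirroring the behavior described above.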

We shared our team’s EBLP results with the CDE, whose staff were intrigued by the method as a potential solution to the high year-to-year fluctuations they had observed in their aggregate growth scores. They asked us to explore its potential to improve the stability of the growth measures for schools and school districts throughout the state. The exploration conducted by ETS and the CDE revealed that EBLP improved the accuracy and cross-year correlation of growth measures, particularly for small schools and school districts. Given the success of the approach, the California State Board of Education recently voted unanimously to approve the use of EBLP for reporting school and district growth, along with growth for student groups within schools and districts.

Learn more about this work or our R&D Consulting Services.

Katherine Castellano is a senior research scientist at ETS. Dan McCaffrey is an associate vice president at ETS.