Skills and Earnings in the Full-Time Labor Market

Neeta Fogg, Paul Harrington, and Ishwar Khatiwada

# Multivariate Regression Analysis of Earnings

Descriptive findings presented in previous sections of this report revealed a positive connection between levels of educational attainment and skills and a strong positive connection between skills and earnings. Workers with higher levels of education had higher earnings than workers with lower levels of education. However, within groups of equally educated workers, there were sizable differences in the mean earnings of workers by their literacy and numeracy proficiency. Earnings of workers are positively related to education, because education raises the knowledge and skills of individuals and makes them more productive in the workplace. However, if education fails to enhance the skills of workers, the link between education and earnings is likely to weaken.

Education and skills are measures of human capital that are related. Education bolsters literacy and numeracy proficiencies of students, and higher levels of literacy and numeracy skills bolster the chances of higher levels of educational attainment. Individuals with higher levels of skills and cognitive abilities are more likely to seek and complete higher levels of education to earn educational credentials. Our study of Philadelphia public high school graduates found sizable impacts of standardized math and reading test scores on the likelihood of enrolling in college, persisting through college, and completing college with a credential.47

The analysis of PIAAC data for U.S. workers presented in previous sections of this report is important in describing patterns in the data, but it does not isolate the independent effects of human capital measures and other background traits (job traits, employment-related worker traits, and demographic traits of workers) on the monthly earnings of workers. To identify the independent effect of these variables, particularly human capital variables, on the earnings of workers, we have estimated a set of multivariate earnings regression models.

The earnings functions estimated in this report are an expanded version of the basic Mincerian human capital earnings function. They include all three measures of human capital as explanatory variables as well as a number of other covariates known to influence the level of earnings of workers. The three measures of human capital are specified in these regressions as described below.

Education is represented with a set of dummy variables for each educational credential included in the PIAAC data files such as high school diploma, post-high school certificate, associate's degree, bachelor's degree, and postgraduate degrees. Work experience is entered in the regression as a nonlinear variable; it is specified as a quadratic variable to represent the following prediction of the human capital model—that earnings increase with additional work experience, but that these gains occur at a diminishing rate, reaching a maximum at a certain level of work experience. The third measure of human capital, skills of workers, is specified in two different ways—the first is with standardized scores of workers on the PIAAC literacy and numeracy tests and the second with PIAAC levels of literacy and numeracy proficiency of workers.
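As a rough sketch of the specification described above, the earnings function can be written in the following general form. The notation is an illustrative assumption rather than the report's own: $E_i$ is monthly earnings of worker $i$, $S_i$ the skills measure, $D_{ik}$ the educational credential dummies, $X_i$ years of paid work experience, and $\mathbf{Z}_i$ the remaining covariates.

$$
\ln E_i = \alpha + \gamma S_i + \sum_{k} \delta_k D_{ik} + \beta_1 X_i + \beta_2 X_i^{2} + \mathbf{Z}_i'\boldsymbol{\theta} + \varepsilon_i
$$

Under the human capital prediction that $\beta_1 > 0$ and $\beta_2 < 0$, earnings rise with experience at a diminishing rate and reach their maximum at $X^{*} = -\beta_1 / (2\beta_2)$.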

The earnings functions are estimated with a series of regressions designed to focus on the human capital of workers, particularly their literacy and numeracy proficiencies. We have followed a slightly different order from a standard Mincerian human capital earnings function that typically begins with education and work experience before the addition of skills/abilities and other covariates. Because of our focus on skills, the earnings functions that we have estimated begin with skills (the literacy and numeracy proficiencies of workers), followed by blocks of variables representing the educational attainment of workers and years of work experience, English writing ability, the characteristics of the job in which they were employed, the employment-related traits of workers, and the demographic traits of workers.

These regressions are designed to measure the independent effect of human capital traits of workers on earnings. We have estimated six earnings regressions. The explanatory variables in each of these regressions are presented in the top half of Box 1. As noted above, worker skills were represented in these regressions with PIAAC literacy and numeracy proficiencies, and the earnings regressions were estimated using two different specifications of each proficiency measure. Thus, four sets of the six earnings regressions were estimated—two sets for two specifications of the literacy proficiency of workers and another two sets for two specifications of the numeracy proficiency of workers. The four sets of these six regressions are presented in the lower half of Box 1.

SIX EARNINGS REGRESSION MODELS

The explanatory variable blocks in each of the six regression models are listed below:

Human capital traits:

- Model 1: Literacy / numeracy proficiency
- Model 2: Model 1 plus educational attainment
- Model 3: Model 2 plus paid work experience and English writing ability

Job characteristics and employment-related traits of workers:

- Model 4: Model 3 plus sector of employment and occupation
- Model 5: Model 4 plus weekly hours of work, school enrollment status, and place of residence

Demographic traits of workers:

- Model 6: Model 5 plus gender, race-ethnicity, foreign-born status, and disability status

FOUR EARNINGS REGRESSION MODELS

Four sets of earnings regressions were estimated, each consisting of six regression models described above. The four sets of regressions differ on the specification of the explanatory variable measuring skills as follows:

- Set A: Skills specified as standardized score on the literacy test
- Set B: Skills specified as standardized score on the numeracy test
- Set C: Skills specified as levels of literacy proficiency
- Set D: Skills specified as levels of numeracy proficiency
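The nesting of the six models within the four sets can be sketched in a few lines of Python. This is only an illustration of the block structure listed above; the variable names are hypothetical labels, not PIAAC variable names, and the quadratic experience terms are represented by a single label.

```python
# Hypothetical labels for the block of variables each model adds.
BLOCKS = [
    ("Model 1", ["skills"]),
    ("Model 2", ["educational_attainment"]),
    ("Model 3", ["paid_work_experience", "english_writing_ability"]),
    ("Model 4", ["sector_of_employment", "occupation"]),
    ("Model 5", ["weekly_hours", "school_enrollment_status", "place_of_residence"]),
    ("Model 6", ["gender", "race_ethnicity", "foreign_born_status", "disability_status"]),
]

# The four sets differ only in how the skills measure is specified.
SKILL_SPECS = {
    "Set A": "literacy_score_standardized",
    "Set B": "numeracy_score_standardized",
    "Set C": "literacy_proficiency_level",
    "Set D": "numeracy_proficiency_level",
}


def build_models(skill_variable):
    """Return {model name: cumulative list of explanatory variables}."""
    models = {}
    cumulative = []
    for name, added in BLOCKS:
        # Swap the generic "skills" placeholder for the set-specific measure.
        cumulative = cumulative + [skill_variable if v == "skills" else v for v in added]
        models[name] = cumulative
    return models


ALL_MODELS = {set_name: build_models(var) for set_name, var in SKILL_SPECS.items()}

for set_name, models in ALL_MODELS.items():
    for model_name, variables in models.items():
        print(f"{set_name}-{model_name}: {', '.join(variables)}")
```

Each model contains every variable of the model before it, so any change in the skills coefficient between adjacent models reflects the newly added block.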

## The Effects of Literacy and Numeracy Proficiencies on Earnings

Figures 11 and 12 summarize the effects of the standardized literacy and numeracy scores of 25- to 54-year-old full-time employed workers on their monthly earnings, as estimated from the six regression models of Set A (where skills are specified as the standardized score on the literacy test) and Set B (where skills are specified as the standardized score on the numeracy test).48 According to Set A-Model 1, which includes just one explanatory variable, the standardized literacy test score, an increase of one standard deviation unit in the literacy test score is expected to increase monthly earnings by nearly 32 percent (Figure 11). The explanatory power (adjusted R-squared) of this regression (Set A-Model 1) was .187; that is, literacy skills explain just under one-fifth of the variation in the earnings of 25- to 54-year-old full-time employed workers.
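The conversion from a regression coefficient to a percent effect, described in note 48, can be sketched as follows. The function names are hypothetical, and the sketch assumes the dependent variable is the natural log of monthly earnings, which is consistent with the arithmetic in the note.

```python
import math


def percent_effect(coefficient):
    """Percent change in earnings implied by a log-earnings coefficient."""
    return (math.exp(coefficient) - 1.0) * 100.0


def log_coefficient(percent):
    """Inverse mapping: log-earnings coefficient implied by a percent effect."""
    return math.log(1.0 + percent / 100.0)


# .277 is the Set A-Model 1 coefficient on the standardized literacy score
# reported in note 48.
print(round(percent_effect(0.277), 1))  # 31.9
```

For small coefficients the percent effect is close to the coefficient times 100, but the gap widens as the coefficient grows, which is why the report converts coefficients before quoting them as percentages.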

The addition of more explanatory variables in the earnings regression models is expected to reduce the regression-adjusted measure of the effect of skills on earnings since these explanatory variables that measure worker and job traits are also known to correlate with earnings. Regression-adjusted effects measure the "independent" effect of an explanatory variable on the dependent variable after statistically controlling for the effects of other explanatory variables included in the regression. Adding the educational attainment measure in Set A-Model 2 halved the effect of literacy proficiency on monthly earnings—that is, one standard deviation unit change in the literacy proficiency of workers is expected to increase earnings by 16 percent. The R-squared of Set A-Model 2 increased from .187 to .284, representing an increase of over 50 percent in the variance accounted for in Set A-Model 2 compared to that in Set A-Model 1. However, the independent (regression-adjusted) effect of literacy skills on earnings remained large and statistically significant.

The addition of two more variables measuring human capital—paid work experience and English writing ability—in Set A-Model 3 reduced the estimated effect of one standard deviation unit change in the literacy score to 14 percent (significant at the .01 level) and raised the R-squared to .344. Set A-Model 4, which added the following two variables measuring job traits—occupation and sector of employment—estimated an 11.6 percent change in earnings from one standard deviation unit change in the literacy score of workers (Figure 11). The R-squared of Set A-Model 4 rose to .405. Even after adding education, paid work experience, English writing ability, sector of employment, and occupation—traits that are strongly related to earnings—the effect of literacy skills of workers remained sizable and statistically significant.

Variables measuring worker traits that are expected to influence their earnings were added as explanatory variables in the final two regression models. Set A-Model 5 added three additional explanatory variables (weekly hours of work, school enrollment status, and region of residence) representing employment-related traits of workers, and Set A-Model 6 added personal demographic traits as explanatory variables (Figure 11). The estimated regression-adjusted effect of literacy proficiency in these last two regression models in Set A further declined to 10.3 percent with the addition of job-related worker traits to explanatory variables in Set A-Model 5, and 8.4 percent in Set A-Model 6, which added demographic traits of workers to the regression's explanatory variables.

These estimates of the percentage effect of literacy proficiency on earnings in all six models measure the effect of one standard deviation unit change in the literacy score of workers on their monthly earnings after statistically adjusting for the effects on earnings of other explanatory variables included in these regression models. The explanatory power of the earnings regression models also rose in models 5 and 6: .467 in Model 5 and .500 in Model 6 (Figure 11). The .500 R-squared of the full regression model (Model 6) means that this model explains one-half of the variation in the monthly earnings of 25- to 54-year-old full-time employed workers. Explaining 50 percent of the variation in earnings is considerable for cross-section data. Even so, the remaining 50 percent of the variation is not captured by the explanatory variables included in the full regression model and could be attributable to a variety of factors that are not available in PIAAC data or to attributes that are not easily measurable.
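To see how much of the unadjusted literacy effect survives each added block of controls, the Set A figures quoted above can be tabulated. The sketch below uses only the percent effects and R-squared values reported in the text (with the Model 1 effect taken as 31.9 percent from note 48); the function name is hypothetical.

```python
# Set A results as reported in the text: percent effect of a one standard
# deviation increase in the literacy score, and R-squared, for Models 1 to 6.
SET_A = [
    ("Model 1", 31.9, 0.187),
    ("Model 2", 16.0, 0.284),
    ("Model 3", 14.0, 0.344),
    ("Model 4", 11.6, 0.405),
    ("Model 5", 10.3, 0.467),
    ("Model 6", 8.4, 0.500),
]


def attenuation(results):
    """Share of the Model 1 effect that remains in each model."""
    baseline = results[0][1]
    return [(name, effect / baseline) for name, effect, _ in results]


for (name, share), (_, effect, r2) in zip(attenuation(SET_A), SET_A):
    print(f"{name}: effect {effect:.1f}%, share of Model 1 effect {share:.0%}, R-squared {r2:.3f}")
```

On these figures, roughly a quarter of the Model 1 effect remains in the full model, while the R-squared rises from under one-fifth to one-half.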

The regression-adjusted effects of numeracy proficiency on the earnings of workers, as measured in Set B earnings regression models 1 through 6, are presented in Figure 12. Although the findings for numeracy proficiency are similar to those for literacy proficiency, the regression-adjusted effect of numeracy proficiency on the earnings of workers was somewhat higher in models 1 through 5 of Set B than the regression-adjusted effect of literacy proficiency in models 1 through 5 of Set A. However, in the full model (Model 6), the regression-adjusted effect of numeracy proficiency (Set B-Model 6) was about the same as the regression-adjusted effect of literacy proficiency estimated in Set A-Model 6. In Set B-Model 1, the coefficient of the standardized numeracy proficiency score variable was .292, representing a percent effect on earnings of nearly 34 percent. In other words, a one standard deviation unit increase in the numeracy test score is expected to increase monthly earnings by nearly 34 percent. The percent effect of numeracy proficiency on earnings falls to 18.6 percent with the addition of educational attainment as explanatory variables in Set B-Model 2. That is, after statistically controlling for the effect of educational attainment on earnings, the independent effect of one standard deviation unit change in workers' numeracy proficiency score is estimated to be an 18.6 percent change in earnings.

The size of the estimated percent effect of numeracy skills on earnings declined as the number of explanatory variables included in the regression models increased. In the full regression model (Set B-Model 6), an increase of one standard deviation unit in the numeracy skill score of workers is expected to increase earnings by 8.3 percent. The full regression model includes the explanatory variables added in each of the five regression models 1 through 5: numeracy skill score, educational attainment, work experience, English writing ability, job traits, and employment-related traits of workers along with demographic traits of workers. Even after controlling for all these variables, the independent effect of one standard deviation unit change in the numeracy proficiency score of workers on their monthly earnings remained high—8.3 percent (Figure 12). The R-squared of Set B-Model 6 (which includes the standardized numeracy score as the explanatory variable measuring skills) was .500 (Figure 12); that is identical to the R-squared for Set A-Model 6 (which includes the standardized literacy score as the explanatory variable measuring skills) (Figure 11).

The next two sets of earnings regressions (sets C and D) utilize levels of literacy and numeracy proficiencies (described in Box 1) as explanatory variables to measure skills of workers instead of standardized literacy and numeracy scores that were utilized as explanatory variables in the previous two sets of earnings regressions (sets A and B). Levels of the literacy and numeracy proficiency of workers are entered in sets C and D earnings regression models as explanatory variables with three dummy variables representing three proficiency levels—level 1 or below, level 3, and level 4/5. This specification leaves level 2 as the base or reference group against which earnings premiums/deficits of workers in the remaining three proficiency levels are assessed. The three dummy variables are defined as taking on the value 1 if workers' skills scores are at the level defined by the dummy variable and 0 otherwise.
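The dummy-variable coding described above can be sketched as follows. The category labels and function name are hypothetical; the sketch takes a worker's proficiency level as given and does not attempt to derive it from a test score.

```python
# Illustrative category labels; level 2 is the omitted reference group.
DUMMY_LEVELS = ("level_1_or_below", "level_3", "level_4_5")
REFERENCE = "level_2"


def proficiency_dummies(level):
    """Return the three 0/1 dummy variables for a worker's proficiency level."""
    if level != REFERENCE and level not in DUMMY_LEVELS:
        raise ValueError(f"unknown proficiency level: {level!r}")
    # Each dummy is 1 only when the worker is at that level, 0 otherwise.
    return {name: int(level == name) for name in DUMMY_LEVELS}


for level in ("level_1_or_below", "level_2", "level_3", "level_4_5"):
    print(level, proficiency_dummies(level))
```

A worker at level 2 has all three dummies equal to 0, so each estimated coefficient is read as an earnings difference relative to level 2.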

Using this specification of literacy and numeracy skills, we have estimated six earnings regressions for literacy proficiency (Set C) and another six earnings regressions for numeracy proficiency (Set D). Findings from the six regressions (Set C) on the regression-adjusted impact of the level of literacy proficiency on the monthly earnings of 25- to 54-year-old full-time employed workers are presented in Figure 13. These findings reveal that in Set C-Model 1, which included just three explanatory variables representing literacy skill levels of workers, the monthly earnings of workers with literacy skills at or below level 1 are expected to be nearly 24 percent lower than the monthly earnings of workers in level 2; workers in level 3 and level 4/5 are, respectively, expected to earn 33 percent and 76 percent more than workers in level 2 of the literacy proficiency scale.

The size of the earnings deficit of workers with level 1 or lower literacy skills compared to their counterparts in level 2 fell to 12 percent in Set C-Model 2 (with the addition of explanatory variables measuring educational attainment), meaning that the regression-adjusted earnings premium of workers with level 2 literacy skills compared to workers with level 1 or below literacy skills declined (between Set C-Model 1 and Set C-Model 2) after statistically controlling for the effect of education on earnings. The earnings premiums of level 3 and levels 4/5 literacy skills also were estimated to be smaller in Set C-Model 2 than in Set C-Model 1. Similar to the findings of earnings regressions with standardized literacy scores (Set A), the regression-adjusted earnings premiums of literacy skills in level 3 and levels 4/5 fell in each successive Set C regression model with additional explanatory variables (Figure 13).
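The paragraph above describes the same earnings gap from two directions: the deficit of workers at level 1 or below relative to level 2, and the premium of level 2 workers relative to level 1 or below. The two percentages are not numerically equal, because each is expressed against a different base. A minimal sketch of the conversion, using the rounded deficits quoted in the text and a hypothetical function name:

```python
def rebase(percent_difference):
    """Flip a percent earnings difference to the other group's perspective.

    If group A earns p percent more (or less) than group B, return how
    much more (or less) group B earns relative to group A.
    """
    ratio = 1.0 + percent_difference / 100.0
    return (1.0 / ratio - 1.0) * 100.0


# Deficits of level 1 or below relative to level 2, as quoted in the text
# (approximate, since the text reports rounded figures).
for model, deficit in (("Set C-Model 1", -24.0), ("Set C-Model 2", -12.0)):
    print(f"{model}: level 2 premium over level 1 or below = {rebase(deficit):.1f}%")
```

A 24 percent deficit therefore corresponds to a premium of roughly 32 percent when level 1 or below is taken as the base, and the direction of change across models is the same whichever base is used.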

The regression-adjusted earnings difference between workers with literacy skills in level 2 and their counterparts with level 1 or below also declined with an increase in the number of explanatory variables, falling below the threshold for statistical significance after Set C-Model 2, meaning that there was no statistically significant difference between the regression-adjusted earnings of workers with literacy skills in level 1 or below and level 2 when the earnings regression models were expanded to include paid work experience and English writing ability of workers, job traits, employment-related worker traits, and demographic traits of workers as explanatory variables.

The full earnings regression model of Set C (Set C-Model 6) reveals that there was no statistically significant difference between the regression-adjusted earnings of workers with level 1 or lower and level 2 literacy skills. Workers with level 3 literacy skills are expected to earn nearly 7 percent more than workers with level 2 (significant at the .10 level), while the regression-adjusted earnings of workers with the highest level of literacy proficiency (levels 4/5) were estimated to be 21 percent higher than those of their counterparts with level 2 literacy skills (Figure 13; Set C-Model 6).

Findings from the final set of six earnings regression models, Set D, are presented in Figure 14. As noted above, this set of earnings regression models includes levels of the numeracy proficiency of workers as explanatory variables measuring skills of workers. Similar to regressions in Set C that included literacy proficiency levels as explanatory variables to measure worker skills, regressions in Set D included three dummy variables to represent three levels of the numeracy proficiency of workers: level 1 or below, level 3, and levels 4/5. The estimated regression coefficients of these three dummy variables represent measures of the regression-adjusted earnings premium or deficit of workers with these three numeracy proficiency levels compared to the earnings of workers with level 2 numeracy skills.

Regression of monthly earnings of workers on their numeracy skill levels (Set D-Model 1) found that relative to workers with level 2 numeracy proficiency, those with level 1 or lower were expected to earn 24 percent less, while workers with level 3 and levels 4/5 numeracy proficiencies were expected to earn 33 percent and 78 percent more, respectively. Similar to each of the three sets of regressions discussed above (sets A, B, and C), the regression-adjusted effect of numeracy skills on earnings of workers declined with each regression that included additional blocks of explanatory variables (Figure 14).

The addition of educational attainment (Set D-Model 2) reduced the regression-adjusted earnings difference relative to workers with level 2 numeracy proficiency to -14 percent among workers with numeracy proficiencies at or below level 1 (from -24 percent in Set D-Model 1); to 17 percent among level 3 workers (down from 33 percent in Set D-Model 1); and to 41 percent among level 4/5 workers (down from 78 percent in Set D-Model 1). The addition of two more measures of human capital to the explanatory variables in Set D-Model 3 (paid work experience and English writing ability) further reduced the regression-adjusted earnings deficit of workers in level 1 or below compared to workers in level 2 to 9.8 percent (down from 14 percent in Set D-Model 2). The regression-adjusted earnings premiums of workers in levels 3 and 4/5 remained almost unchanged in Set D-Model 3 from the levels estimated in Set D-Model 2 (Figure 14).

In models 4, 5, and 6 where job traits, employment-related traits, and demographic traits of workers were added as explanatory variables, the regression-adjusted earnings deficit of workers in level 1 or below relative to workers in level 2 was not statistically significant; in other words there was no statistically significant difference in the regression-adjusted earnings of workers in the bottom two levels of the numeracy proficiency scale. At the higher end of the numeracy proficiency scale, the addition of job traits, employment-related traits, and demographic traits of workers as explanatory variables steadily reduced the estimated size of the earnings premium in level 3 and levels 4/5 of the numeracy scale (relative to level 2) down to 8 percent (significant at the .05 level) and 21.8 percent (significant at the .01 level), respectively (Figure 14).

The explanatory power of the earnings regressions increased with each additional block of explanatory variables. Set D-Model 1, which had just the levels of numeracy proficiency as explanatory variables, had an R-squared of .191. The addition of education in Model 2 increased the explanatory power to .294, followed by another increase in the R-squared to .357 in Model 3, when all human capital traits were included as explanatory variables. Job traits, particularly occupation, also are closely related to earnings. The addition of job traits to the explanatory variables in Model 4 increased the R-squared to .416. The R-squared increased to .475 in Set D-Model 5, which controlled for weekly hours of work, school enrollment status, and the region of residence of workers. The full model, which included all the variables in Model 5 plus demographic traits of workers, had an R-squared of .502 and thus explained one-half of the variation in the monthly earnings of 25- to 54-year-old full-time employed workers (Figure 14). The explanatory power of the full model (Model 6) in each of the four sets of earnings regressions was almost identical: .500 in Set A-Model 6, .500 in Set B-Model 6, .501 in Set C-Model 6, and .502 in Set D-Model 6.
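The contribution of each block to explanatory power can be read off as the difference in R-squared between adjacent models. The sketch below applies that arithmetic to the Set D values quoted above; the function name is hypothetical, and the gains depend on the order in which blocks were entered.

```python
# R-squared of Set D, Models 1 through 6, as reported in the text.
SET_D_R2 = [
    ("Model 1", 0.191),
    ("Model 2", 0.294),
    ("Model 3", 0.357),
    ("Model 4", 0.416),
    ("Model 5", 0.475),
    ("Model 6", 0.502),
]


def r2_gains(results):
    """Absolute gain in R-squared from each model to the next."""
    return [
        (later[0], later[1] - earlier[1])
        for earlier, later in zip(results, results[1:])
    ]


for name, gain in r2_gains(SET_D_R2):
    print(f"{name}: +{gain:.3f}")
```

On these figures the largest single gain comes from adding educational attainment in Model 2, and the smallest from adding demographic traits in Model 6.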

The coefficients, percent effects, and statistical significance of all explanatory variables in each of the six earnings regression models in all four sets are presented in Appendix tables D-1 through D-8. A detailed discussion of the findings from the full earnings regression model (Model 6) of sets A, B, C, and D is presented below.

## Notes

47 Fogg and Harrington, From Diplomas to Degrees.

48 The dependent variable in these regressions is the log of monthly earnings. The anti-log of a predictor variable's coefficient, minus 1, provides a measure of the expected percent change in the dependent variable (monthly earnings) from a one-unit change in that predictor variable. The coefficient for the standardized literacy score was .277. The anti-log of .277 is 1.319, and 1.319 - 1 = .319, or 31.9 percent.