Construct Validity of the GRE Aptitude Test Across Populations--An Empirical Confirmatory Study
- Rock, Donald A.; Werts, Charles E.; Grandy, Jerilee
- Subject/Key Words:
- Aptitude tests; Black students; college students; item analysis; test bias; test validity; validity studies
The purposes of this study were: (1) to evaluate the invariance of the construct validity, and thus the interpretation, of GRE Aptitude Test scores across four populations, and (2) to develop and apply a systematic procedure for investigating the possibility of test bias from a construct validity frame of reference. Invariant construct validity was defined by three criteria: (1) similar patterns of factor loadings across populations; (2) equal units of measurement across populations; and (3) equal test score precision as defined by the standard error of measurement. If any one of these criteria differs across populations, then one must seriously consider the possibility of psychometric bias as defined in this paper. The advantage of investigating psychometric bias at the item-type level (even though the total score may not be biased) is that it may provide an "early warning" with respect to any future plans to increase the number of items of a particular type. A secondary purpose of this study was to evaluate the factor structure of the three sections (verbal, quantitative, and analytical) from which the subscores are derived. Assuming that the invariant construct validity model based on item types is tenable, a hypothesized three-factor "macro" model based on the three sections could be applied to the population-invariant variance-covariance matrix. It should be noted that the term "psychometric bias" as defined here does not require external criterion information for the analysis. The internal procedure used here is suggested as only a first step in a broader, integrated validation procedure that should include not only internal checks on the population invariance of the underlying constructs but also checks on the population invariance of their relationships with external criteria.
Although this is only a first step, it is a necessary one, since any interpretation of relationships with external criteria becomes academic unless one can first show that the tests measure what they purport to measure with similar meaning and accuracy for all populations of interest. The four subpopulations were 1,122 White males, 1,471 White females, 284 Black males, and 626 Black females. The analysis indicated that a factor structure defined by the 10 item types showed relatively invariant psychometric characteristics across the four subpopulations. That is, the item-type factors appear to measure the same things in the same units with the same precision. These results do not provide any significant evidence of psychometric bias in the test. A confirmatory analysis of a higher-order factor model, defined a priori from three- and four-factor solutions, was undertaken to investigate the factorial contributions of the analytical item types. Results of this analysis indicated that the three analytical item types appear to be varying functions of reading comprehension and quantitative ability. The analysis-of-explanations item type was the most complex factorially, including a vocabulary component as well as reading and quantitative components. Of the remaining two analytical item types, logical diagrams had the comparatively larger unique variance component, while analytical reasoning appeared to share most of its variance with the reading comprehension and quantitative factors.