
Impact of Categorization and Scaling on Classification Agreement and Prediction Accuracy Statistics

Wang, Wei; Dorans, Neil J.
ETS Research Report
Subject/Key Words:
Item Response Theory (IRT), Simulation Studies, Categorization, Scaling, Classification, Prediction, Accuracy, Classification Accuracy, Agreement, Constructed-Response Scoring, Automated Scoring, Psychometric Models, Measurement Instruments, Discrepancy Measures, Correlation Analysis, Scaling Statistical Analysis, Quadratic Weighted Kappa (QWK), Kappa Measures, Human Scoring


Agreement statistics and measures of prediction accuracy are often used to assess the quality of two measures of a construct. Agreement statistics are appropriate for measures that are supposed to be interchangeable, whereas prediction accuracy statistics are appropriate for situations where one variable is the target and the other variables are predictors. Using bivariate normality assumptions, we analytically examine the impact of categorization of a continuous variable and mean/sigma scaling on different measures of agreement and different measures of prediction accuracy. We vary the degree of relationship (squared correlation) between two continuous measures of a construct and the degree to which these measures are reduced to fewer and fewer categories (categorization). The main findings include that (a) categorization influences all the statistics investigated, (b) the correlation between the continuous variables affects the values of the statistics, and (c) scaling a prediction of a target variable to have the same mean and variability as the target increases agreement (according to Cohen's kappa and quadratic weighted kappa) but does so at the expense of prediction accuracy. The implications of these results for scoring of essays by humans or machines are also discussed.
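
Finding (c) can be illustrated with a small Monte Carlo sketch. It is not the report's method: the abstract describes an analytic treatment, and the code below substitutes simulation. The correlation of 0.6, the five categories, the equal-width cut points between -2 and 2, and all function names are assumptions chosen for illustration only. The sketch draws two standard bivariate normal measures, forms the least-squares prediction of the target, rescales that prediction to the target's mean and standard deviation (mean/sigma scaling), and then compares mean squared error on the continuous scale with quadratic weighted kappa after categorization.

```python
import numpy as np


def quadratic_weighted_kappa(a, b, n_cat):
    """QWK for two integer-coded ratings taking values 0..n_cat-1."""
    observed = np.zeros((n_cat, n_cat))
    np.add.at(observed, (a, b), 1.0)           # joint frequency table
    observed /= observed.sum()                 # joint proportions
    # Proportions expected if the two ratings were independent
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    idx = np.arange(n_cat)
    weights = (idx[:, None] - idx[None, :]) ** 2   # quadratic disagreement weights
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


def categorize(x, n_cat, lo=-2.0, hi=2.0):
    """Cut a continuous score into n_cat ordered categories.

    Equal-width interior cut points on [lo, hi] are an illustrative assumption.
    """
    edges = np.linspace(lo, hi, n_cat + 1)[1:-1]
    return np.digitize(x, edges)               # integer codes 0..n_cat-1


def simulate(rho=0.6, n_cat=5, n=100_000, seed=0):
    """Compare a regression prediction with its mean/sigma-scaled version."""
    rng = np.random.default_rng(seed)
    target = rng.standard_normal(n)
    # Second measure, correlated rho with the target, unit variance
    other = rho * target + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

    # Least-squares prediction of the target from the second measure
    slope = np.cov(other, target)[0, 1] / other.var(ddof=1)
    regression = target.mean() + slope * (other - other.mean())

    # Mean/sigma scaling: give the prediction the target's mean and SD
    scaled = (regression - regression.mean()) / regression.std()
    scaled = scaled * target.std() + target.mean()

    target_cat = categorize(target, n_cat)
    return {
        "mse_regression": np.mean((target - regression) ** 2),
        "mse_scaled": np.mean((target - scaled) ** 2),
        "qwk_regression": quadratic_weighted_kappa(
            target_cat, categorize(regression, n_cat), n_cat),
        "qwk_scaled": quadratic_weighted_kappa(
            target_cat, categorize(scaled, n_cat), n_cat),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the scaled prediction should show the higher quadratic weighted kappa and the larger mean squared error, which is the trade-off the abstract describes. Varying `rho` and `n_cat` gives a rough feel for findings (a) and (b), although the exact values depend on the assumed cut points.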
