
Instability in a Tree Approach to Regression

Kim, Sung Ho
Publication Year:
Report Number:
ETS Research Report
Document Type:
Page Count:
Subject/Key Words:
Data Analysis, Models, Regression (Statistics), Statistical Analysis, Tree Structures (Graphs)


One of the major problems that a tree approach to data analysis often encounters is the instability of tree structures. If one wishes to interpret the data structure through a tree, the instability issue must therefore be dealt with. Examining instability at a node of a tree provides insight into the instability of the whole tree, since the same theory of instability applies to all the nodes. This paper accordingly deals with the instability issue at a single node of a tree. The data are assumed to come from a regression model, and the paper examines which factors in that model affect the instability. Squared-error loss is considered as the criterion for tree construction (the "ls" criterion in the CART program). The selection rate of a regressor variable at a node of a tree is used as a measure of instability. The selection rate depends mainly on (i) the regression coefficients, (ii) the (conditional) variance-covariance structure of the regressor variables (given a subset of the regressor variables), (iii) the sample size, and (iv) the noise in the response variable. Simulation results showing patterns of instability for several different settings of regression models are reported. (32pp.)
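
To make the notion of a selection rate concrete, the following is a minimal simulation sketch, not the report's own code or the CART program. It assumes a linear regression model with multivariate normal regressors and Gaussian noise, and it estimates how often each regressor gives the best least-squares split at a single node. The function names `best_split_sse` and `selection_rates`, and all parameter values, are hypothetical choices made for illustration.

```python
import numpy as np


def best_split_sse(x, y):
    """Smallest total within-child squared error over all binary splits on x."""
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    n = len(ys)
    csum = np.cumsum(ys)
    csum2 = np.cumsum(ys ** 2)
    k = np.arange(1, n)  # candidate left-child sizes
    # Sum of squared deviations from the child mean, for each child.
    left_sse = csum2[:-1] - csum[:-1] ** 2 / k
    right_sum = csum[-1] - csum[:-1]
    right_sse = (csum2[-1] - csum2[:-1]) - right_sum ** 2 / (n - k)
    total = left_sse + right_sse
    valid = xs[1:] > xs[:-1]  # split only between distinct x values
    if not valid.any():
        return np.inf
    return float(total[valid].min())


def selection_rates(beta, cov, n, noise_sd, reps=500, seed=0):
    """Fraction of simulated samples in which each regressor is selected.

    Assumed model (an illustration, not taken from the report):
    X ~ N(0, cov), y = X @ beta + N(0, noise_sd^2).
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    p = len(beta)
    counts = np.zeros(p)
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(p), cov, size=n)
        y = X @ beta + rng.normal(0.0, noise_sd, size=n)
        sse = [best_split_sse(X[:, j], y) for j in range(p)]
        counts[int(np.argmin(sse))] += 1  # regressor chosen at this node
    return counts / reps


if __name__ == "__main__":
    cov = [[1.0, 0.5], [0.5, 1.0]]
    # Equal coefficients: neither regressor is clearly favored.
    print(selection_rates([1.0, 1.0], cov, n=100, noise_sd=1.0))
    # Unequal coefficients: the first regressor should be chosen more often.
    print(selection_rates([1.0, 0.5], cov, n=100, noise_sd=1.0))
```

Varying `beta`, `cov`, `n`, and `noise_sd` in this sketch corresponds to factors (i) through (iv) listed in the abstract. A selection rate close to 1 for one regressor suggests a stable split at the node, while rates spread across several regressors suggest an unstable one.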
