Instability in a Tree Approach to Regression
- Author(s):
- Kim, Sung Ho
- Publication Year:
- 1992
- Report Number:
- RR-92-01
- Source:
- ETS Research Report
- Document Type:
- Report
- Page Count:
- 32
- Subject/Key Words:
- Data Analysis, Models, Regression (Statistics), Statistical Analysis, Tree Structures (Graphs)
Abstract
One of the major problems that a tree-approach to data analysis often encounters is instability of tree-structures. Thus, if one wishes to interpret the data structure by the tree-approach, the instability issue must be dealt with. Examining instability at a node of a tree provides insight into the instability of the whole tree, since the same theory of instability applies to all the nodes. This paper therefore deals with the instability issue at a single node of a tree. The data are assumed to come from a regression model, and the paper examines which factors in that model affect the instability. Squared-error loss is considered as the criterion for tree-construction (the "ls" criterion in the CART program). The selection rate of a regressor variable at a node of a tree is used as a measure of instability. The selection rate mainly depends on (i) the regression coefficients, (ii) the (conditional) variance-covariance structure of the regressor variables (given a subset of the regressor variables), (iii) the sample size, and (iv) the noise in the response variable. Simulation results that show patterns of instability for several different settings of regression models are reported. (32pp.)
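The selection rate described in the abstract can be illustrated with a small simulation. The sketch below is not the report's own code or simulation design: it assumes two standard-normal regressors with correlation `rho`, a linear model with coefficients `beta` and noise standard deviation `sigma`, and an exhaustive least-squares split search at a single node. The function names (`best_split_sse`, `selected_variable`, `selection_rates`) and all parameter values are hypothetical, chosen only for illustration.

```python
import math
import random


def best_split_sse(x, y):
    """Smallest total within-child squared error over all binary splits on x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    n = len(ys)
    total, total_sq = sum(ys), sum(v * v for v in ys)
    left, left_sq = 0.0, 0.0
    best = float("inf")
    for k in range(1, n):  # k = number of cases sent to the left child
        left += ys[k - 1]
        left_sq += ys[k - 1] ** 2
        if xs[k - 1] == xs[k]:
            continue  # no split point between tied values
        right, right_sq = total - left, total_sq - left_sq
        # SSE of a group = sum of squares - (sum)^2 / size
        sse = (left_sq - left ** 2 / k) + (right_sq - right ** 2 / (n - k))
        best = min(best, sse)
    return best


def selected_variable(columns, y):
    """Index of the regressor whose best split gives the lowest squared error."""
    scores = [best_split_sse(col, y) for col in columns]
    return scores.index(min(scores))


def selection_rates(beta, rho, n, sigma, reps=200, seed=0):
    """Fraction of simulated samples in which each regressor is chosen at the node.

    Assumed model (illustrative only): y = beta[0]*x1 + beta[1]*x2 + sigma*eps,
    with x1, x2 standard normal and corr(x1, x2) = rho.
    """
    rng = random.Random(seed)
    counts = [0, 0]
    for _ in range(reps):
        x1, x2, y = [], [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            a = z1
            b = rho * z1 + math.sqrt(1 - rho ** 2) * z2
            x1.append(a)
            x2.append(b)
            y.append(beta[0] * a + beta[1] * b + sigma * rng.gauss(0, 1))
        counts[selected_variable([x1, x2], y)] += 1
    return [c / reps for c in counts]


if __name__ == "__main__":
    # Equal coefficients: neither regressor is clearly preferred at the node.
    print(selection_rates(beta=(1.0, 1.0), rho=0.0, n=50, sigma=1.0))
    # One dominant coefficient: the first regressor should be chosen far more often.
    print(selection_rates(beta=(1.0, 0.2), rho=0.0, n=50, sigma=1.0))
```

Under these assumptions, a selection rate close to 1 for one regressor suggests a stable choice of split variable at the node, while rates that are spread across regressors suggest instability. Varying `beta`, `rho`, `n`, and `sigma` corresponds to the four factors (i) to (iv) listed in the abstract.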
- http://dx.doi.org/10.1002/j.2333-8504.1992.tb01431.x