A Preliminary Study of the Effects of Within-Group Covariance Structure on Recovery in Cluster Analysis
- Author(s):
- Donoghue, John R.
- Publication Year:
- 1994
- Report Number:
- RR-94-46
- Source:
- ETS Research Report
- Document Type:
- Report
- Page Count:
- 49
- Subject/Key Words:
- Cluster Analysis, Monte Carlo Methods, Statistical Analysis
Abstract
Two Monte Carlo studies investigated the effects of within-group covariance structure on subgroup recovery by several widely used hierarchical clustering methods. Data sets were 100 bivariate observations from two subgroups, generated according to a finite normal mixture model. In Study 1, subgroup size, within-group correlation, within-group variance, and distance between subgroup centroids were manipulated. All clustering methods were strongly affected by within-group correlation; negative correlation yielded much poorer recovery. Smaller effects were found for the interaction of clustering method with within-group variance. Study 2 separated the effects of direction of correlation from the direction of differences in the subgroup centroids. Subgroup size, within-group correlation, direction of the vector separating subgroup centroids, and distance between subgroup centroids were manipulated. Superior recovery was associated with within-group correlation that matched the direction of subgroup separation. Overall, the EML algorithm of SAS yielded best recovery, followed closely by Ward's method, average linkage, and a version of the beta-flexible algorithm, although several interactions were noted. The results are interpreted according to the weakness of the (squared) Euclidean distance as a measure of (dis)similarity for cluster analysis. Several alternative measures are discussed, and promising alternatives are identified for future investigation. (49pp.)
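The design described in the abstract can be illustrated with a minimal Python sketch of a single replication. It is not the report's code. The function names (`simulate_mixture`, `recovery`), the parameter defaults, the fixed subgroup sizes, and the recovery measure (proportion of observations assigned to the correct subgroup after matching cluster labels) are all assumptions made for illustration; the abstract does not specify any of them. Only Ward's method and average linkage from SciPy are shown; the EML and beta-flexible algorithms are not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def simulate_mixture(n=100, prop=0.5, rho=0.0, var=1.0,
                     dist=3.0, angle_deg=0.0, seed=0):
    """Draw n bivariate observations from two normal subgroups.

    Both subgroups share the covariance var * [[1, rho], [rho, 1]].
    The second centroid lies `dist` units from the first, in the
    direction given by `angle_deg`. Subgroup sizes are fixed rather
    than sampled, which is a simplification (assumption).
    """
    rng = np.random.default_rng(seed)
    n1 = int(round(n * prop))
    labels = np.repeat([0, 1], [n1, n - n1])
    cov = var * np.array([[1.0, rho], [rho, 1.0]])
    theta = np.deg2rad(angle_deg)
    shift = dist * np.array([np.cos(theta), np.sin(theta)])
    x = rng.multivariate_normal(np.zeros(2), cov, size=n)
    x[labels == 1] += shift  # move the second subgroup to its centroid
    return x, labels


def recovery(x, labels, method="ward"):
    """Cut a hierarchical tree at two clusters and score agreement.

    Returns the proportion of correctly assigned observations under
    the better of the two possible label matchings (assumed measure).
    """
    tree = linkage(x, method=method)  # Euclidean distances by default
    pred = fcluster(tree, t=2, criterion="maxclust") - 1
    agree = np.mean(pred == labels)
    return max(agree, 1.0 - agree)


if __name__ == "__main__":
    # Compare two within-group correlations with separation along 45 degrees.
    for rho in (0.6, -0.6):
        x, y = simulate_mixture(rho=rho, angle_deg=45.0, seed=1)
        for method in ("ward", "average"):
            print(rho, method, recovery(x, y, method=method))
```

Averaging `recovery` over many seeds for each combination of `rho`, `var`, `dist`, `prop`, and `angle_deg` would give a rough analogue of the manipulated factors listed above, though the report's actual factor levels are not stated in this record.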
- http://dx.doi.org/10.1002/j.2333-8504.1994.tb01619.x