
A Preliminary Study of the Effects of Within-Group Covariance Structure on Recovery in Cluster Analysis

Author(s): Donoghue, John R.
Publication Year: 1994
Report Number: RR-94-46
Source: ETS Research Report
Document Type: Report
Page Count: 49
Subject/Key Words: Cluster Analysis, Monte Carlo Methods, Statistical Analysis

Abstract

Two Monte Carlo studies investigated the effects of within-group covariance structure on subgroup recovery by several widely used hierarchical clustering methods. Data sets were 100 bivariate observations from two subgroups, generated according to a finite normal mixture model. In Study 1, subgroup size, within-group correlation, within-group variance, and distance between subgroup centroids were manipulated. All clustering methods were strongly affected by within-group correlation; negative correlation yielded much poorer recovery. Smaller effects were found for the interaction of clustering method with within-group variance. Study 2 separated the effects of direction of correlation from the direction of differences in the subgroup centroids. Subgroup size, within-group correlation, direction of the vector separating subgroup centroids, and distance between subgroup centroids were manipulated. Superior recovery was associated with within-group correlation that matched the direction of subgroup separation. Overall, the EML algorithm of SAS yielded the best recovery, followed closely by Ward's method, average linkage, and a version of the beta-flexible algorithm, although several interactions were noted. The results are interpreted according to the weakness of the (squared) Euclidean distance as a measure of (dis)similarity for cluster analysis. Several alternative measures are discussed, and promising alternatives are identified for future investigation. (49pp.)
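
The design described in the abstract can be illustrated with a minimal Python sketch. It is not the report's procedure: the function names, default parameter values, the fixed (rather than randomly drawn) subgroup sizes, the 45-degree separation direction, and the proportion-agreement recovery score are all assumptions made for illustration. Only Ward's method and average linkage are shown, using SciPy's hierarchical clustering on Euclidean distances; the EML and beta-flexible algorithms are not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def simulate_mixture(n=100, prop=0.5, rho=0.0, var=1.0,
                     separation=3.0, angle_deg=45.0, seed=0):
    """Draw n bivariate normal points from two subgroups (hypothetical helper).

    Both subgroups share the covariance var * [[1, rho], [rho, 1]].
    The second centroid lies `separation` units from the first, in the
    direction given by `angle_deg` (an assumed parameterization).
    """
    rng = np.random.default_rng(seed)
    n1 = int(round(n * prop))
    labels = np.repeat([0, 1], [n1, n - n1])  # fixed subgroup sizes (assumption)
    cov = var * np.array([[1.0, rho], [rho, 1.0]])
    theta = np.deg2rad(angle_deg)
    shift = separation * np.array([np.cos(theta), np.sin(theta)])
    means = np.vstack([np.zeros(2), shift])
    x = rng.multivariate_normal(np.zeros(2), cov, size=n) + means[labels]
    return x, labels


def recovery(x, labels, method="ward"):
    """Proportion of points assigned to their true subgroup.

    The score is the better of the two possible label matchings; this is
    an assumed recovery index, not necessarily the one used in the report.
    """
    tree = linkage(x, method=method)  # Euclidean distances by default
    pred = fcluster(tree, t=2, criterion="maxclust") - 1
    agree = np.mean(pred == labels)
    return max(agree, 1.0 - agree)


if __name__ == "__main__":
    # Compare correlation along vs. across the direction of separation.
    for rho in (-0.5, 0.0, 0.5):
        x, labels = simulate_mixture(rho=rho, seed=1)
        for method in ("ward", "average"):
            print(rho, method, round(recovery(x, labels, method), 2))
```

A single data set per condition is noisy; a Monte Carlo comparison in the spirit of the abstract would repeat each condition over many seeds and average the recovery scores.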
