
A Preliminary Comparison of the Effectiveness of Cluster Analysis Weighting Procedures for Within-Group Covariance Structure

Author(s):
Donoghue, John R.
Publication Year:
1995
Report Number:
RR-95-35
Source:
ETS Research Report
Document Type:
Report
Page Count:
50
Subject/Key Words:
Algorithms, Cluster Analysis, Comparative Analysis, Monte Carlo Methods, Statistical Analysis, Weighted Scores

Abstract

A Monte Carlo study compared the usefulness of six variable weighting methods for cluster analysis. Each dataset consisted of 100 bivariate observations from two subgroups, generated according to a finite normal mixture model. Subgroup size, within-group correlation, within-group variance, and distance between subgroup centroids were manipulated. Of the clustering methods examined, the flexible average clustering algorithm with β = -.15 or -.20 gave the best recovery. Of the remaining methods, Ward's method yielded the best recovery, followed closely by beta-flexible linkage (β = -.50) and SAS's EML algorithm. In the absence of variable weights, negative within-group correlation resulted in much poorer recovery for all clustering algorithms. The ACE weighting method of Art, Gnanadesikan, and Kettenring provided a net improvement in 17-24% of the datasets when used with the better-performing clustering algorithms. When used with the same clustering algorithms, De Soete's ultrametric weighting yielded improved recovery 16-22% of the time. However, although ultrametric weighting was more sensitive than ACE to negative within-subgroup correlation, clustering based on principal components was less effective. Therefore, the ACE method is preferred overall. There is still room for improvement, however: clustering with Mahalanobis distance based on the pooled within-group covariance matrix indicated that knowing the correct covariance matrix would improve recovery (over ACE) approximately 10% of the time. (50pp.)
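The simulation design described above (two-subgroup bivariate normal mixtures clustered by a flexible-average algorithm, then scored for recovery) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's code: both subgroups are assumed to share one within-group SD and correlation; "flexible average" is assumed to mean the Lance-Williams update with α_i = (1-β)n_i/(n_i+n_j), α_j = (1-β)n_j/(n_i+n_j), a fixed β and γ = 0, applied to Euclidean distances; and recovery is scored with the Hubert-Arabie adjusted Rand index, which the abstract does not name. The function names `mixture_sample`, `flexible_average`, and `adjusted_rand` are hypothetical.

```python
import math
import random
from collections import Counter


def mixture_sample(n1, n2, rho, sd, separation, seed=0):
    """Draw n1 + n2 points from a two-component bivariate normal mixture.

    Assumption (not from the report): both subgroups share the same
    within-group SD and correlation rho; centroids are (0, 0) and (separation, 0).
    """
    rng = random.Random(seed)
    points, labels = [], []
    for label, (n, cx) in enumerate([(n1, 0.0), (n2, separation)]):
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            # Cholesky factor of [[1, rho], [rho, 1]], scaled by sd
            x = cx + sd * z1
            y = sd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
            points.append((x, y))
            labels.append(label)
    return points, labels


def flexible_average(points, beta, k):
    """Agglomerative clustering with an assumed flexible-average Lance-Williams update.

    Merges the closest pair of clusters until k remain and returns one label
    per point. beta = 0 reduces to ordinary average linkage (UPGMA).
    """
    n = len(points)
    members = {i: [i] for i in range(n)}
    # Pairwise Euclidean distances between active clusters, keyed (smaller, larger)
    d = {(i, j): math.dist(points[i], points[j])
         for i in range(n) for j in range(i + 1, n)}

    def get(a, b):
        return d[(a, b)] if a < b else d[(b, a)]

    while len(members) > k:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        ni, nj = len(members[i]), len(members[j])
        for m in members:
            if m in (i, j):
                continue
            # d(m, i+j) = (1 - beta) * size-weighted mean of d(m,i), d(m,j) + beta * d(i,j)
            d[(min(m, i), max(m, i))] = (
                (1.0 - beta) * (ni * get(m, i) + nj * get(m, j)) / (ni + nj)
                + beta * dij
            )
        # Cluster i now holds the merged group; drop every pair involving j
        for m in members:
            if m != j:
                del d[(min(m, j), max(m, j))]
        members[i].extend(members.pop(j))

    assignment = [0] * n
    for label, idx in enumerate(members.values()):
        for p in idx:
            assignment[p] = label
    return assignment


def adjusted_rand(true_labels, pred_labels):
    """Hubert-Arabie adjusted Rand index between two partitions."""
    def pairs(c):
        return c * (c - 1) / 2

    n = len(true_labels)
    sum_ij = sum(pairs(c) for c in Counter(zip(true_labels, pred_labels)).values())
    sum_a = sum(pairs(c) for c in Counter(true_labels).values())
    sum_b = sum(pairs(c) for c in Counter(pred_labels).values())
    expected = sum_a * sum_b / pairs(n)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    # One simulated condition: equal subgroups, negative within-group correlation
    pts, truth = mixture_sample(50, 50, rho=-0.5, sd=1.0, separation=3.0, seed=1)
    pred = flexible_average(pts, beta=-0.20, k=2)
    print(f"adjusted Rand index: {adjusted_rand(truth, pred):.3f}")
```

A full replication would loop this over the manipulated factors (subgroup size, correlation, variance, centroid distance) and over many seeds, and would apply a variable-weighting step such as ACE before clustering; those pieces are omitted here. The naive pair scan keeps the update rule visible at the cost of speed, which is acceptable for 100 observations.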
