
Variable Screening for Cluster Analysis

Author(s):
Donoghue, John R.
Publication Year:
1994
Report Number:
RR-94-36
Source:
ETS Research Report
Document Type:
Report
Page Count:
56
Subject/Key Words:
Cluster Analysis, Monte Carlo Methods, Statistical Analysis

Abstract

Inclusion of irrelevant variables in a cluster analysis adversely affects subgroup recovery. This paper examines the use of moment-based statistics to screen variables; only variables that pass the screening are then used in clustering. It is shown analytically that normal mixtures often have negative kurtosis. Two related measures, m and the coefficient of bimodality b, are also examined. A Monte Carlo study compared the screening measures to no selection, De Soete's (1988) ultrametric weights, and Fowlkes, Gnanadesikan, and Kettenring's (1988) forward selection procedure. Screening based on kurtosis degraded recovery and is not recommended. In contrast, screening on m or on b improved recovery over both no selection and forward selection, and performed as well as ultrametric weights. Combining screening with ultrametric weights performed extremely well. All methods were found to be somewhat sensitive to other types of error. Screening variables appears to be a viable alternative to both ultrametric weights and forward selection. The potential advantages and disadvantages of screening are considered. (56pp.)
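The abstract does not give formulas for the screening statistics, so the following Python is only a minimal sketch under stated assumptions. It uses the commonly cited sample bimodality coefficient b = (g1^2 + 1) / (g2 + 3), where g1 is skewness and g2 is excess kurtosis, and a hypothetical cutoff of 5/9 (the value b takes for a uniform distribution). The report's exact definition of b may differ (for example, by small-sample corrections), the measure m is not reproduced because its definition is not stated here, and the function names are illustrative. As a check on the kurtosis claim: for an equal mixture of N(-d, 1) and N(d, 1), the excess kurtosis is -2d^4 / (1 + d^2)^2, which is negative for any d != 0; the example draws such a mixture with d = 3.

```python
import random
from typing import Dict, List, Sequence, Tuple


def standardized_moments(x: Sequence[float]) -> Tuple[float, float]:
    """Return (skewness g1, excess kurtosis g2) from population (biased) moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    return g1, g2


def bimodality_coefficient(x: Sequence[float]) -> float:
    """Assumed form b = (g1^2 + 1) / (g2 + 3); larger values suggest bimodality."""
    g1, g2 = standardized_moments(x)
    return (g1 ** 2 + 1.0) / (g2 + 3.0)


def screen_variables(data: Dict[str, Sequence[float]], cutoff: float = 5 / 9) -> List[str]:
    """Keep only variables whose b exceeds a hypothetical cutoff (5/9 = uniform)."""
    return [name for name, col in data.items() if bimodality_coefficient(col) > cutoff]


# Illustrative data: one variable carries two well-separated subgroups, one is pure noise.
rng = random.Random(1994)
n = 2000
data = {
    "cluster": [rng.gauss(rng.choice((-3.0, 3.0)), 1.0) for _ in range(n)],
    "noise": [rng.gauss(0.0, 1.0) for _ in range(n)],
}

selected = screen_variables(data)
for name, col in data.items():
    g1, g2 = standardized_moments(col)
    b = bimodality_coefficient(col)
    print(f"{name}: skew={g1:.2f} excess kurtosis={g2:.2f} b={b:.3f}")
print("kept for clustering:", selected)
```

With d = 3 the theoretical excess kurtosis is -1.62, giving b of about 0.72, while a normal variable has b of about 1/3, so only the mixture variable should pass the screen. Only the retained variables would then be passed to whatever clustering method is used.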
