CUNY Ph.D. Program in Computer Science
Technical Reports

Tree Menu Help




Submit TechReport

Send Suggestions
TR-2003009
A Hierarchical Projection Pursuit Clustering Algorithm
Author(s):  Jayson E. Rome, Alexei D. Miasnikov, Robert M. Haralick
Received Date:  August 18, 2003
Download:  

Abstract

We define a cluster to be characterized by regions of high density
separated by regions that are sparse. Given a collection of
observations $X = \{x_{i}\}$, $x_{i} \in \mathbb{R}^{d}$, $|X| =
N$, we would like to find clusters in data sets in which $d$ and
possibly $N$ are large, in which there is no known parametric
distribution and in which clusters may take on arbitrary shapes.
By observing the downward closure property of density, the search
for interesting structure in a high dimensional space can be
reduced to a search for structure in lower dimensional subspaces.
We present a parameter free Hierarchical Projection Pursuit
Clustering (HPPC) algorithm that repeatedly bi-partitions interesting
lower dimensional projections of a high dimensional dataset. We
describe a projection search procedure for use with relatively high
dimensional data and a projection pursuit index function based on
the Kittler and Illingworth optimal threshold technique. The
output of the algorithm is a decision tree whose nodes store a
projection and threshold and whose leaves represent the clusters
(classes). We present several methods for cluster validation that
are used to evaluate the algorithm. Experiments with various real
and synthetic datasets show the effectiveness of the approach.