Application of the k-spatial medians clustering
Myoungshic Jhun
Department of Statistics, Korea University
Anam-Dong, Sungbuk-Ku
Seoul, 136-701, Korea
Seohoon Jin
Department of Statistics, Korea University
Anam-Dong, Sungbuk-Ku
Seoul, 136-701, Korea
1. Introduction
The most widely used partitioning method in cluster analysis is the k-means clustering,
which minimizes the within-cluster sum of squares. However, the k-means clustering is sensitive
to outliers and to cluster structures. We introduce the k-spatial medians clustering, which is less
sensitive to outliers, as an alternative to the k-means clustering and compare the two clustering
methods on some artificial data sets.
2. The k-spatial medians clustering
Results of most cluster analyses are quite sensitive to outliers. The k-means clustering
is also influenced by distant objects and by cluster structures. To tackle the effect of outliers
in cluster analysis, we consider the k-spatial medians clustering, which replaces the squared
Euclidean distances in the objective function of the k-means clustering with the absolute
(unsquared) Euclidean distances.
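The two objective functions can be written side by side. The notation here is an assumption for illustration and not the authors' own: $C_1, \dots, C_k$ are the clusters, $x_i$ the objects, $\bar{x}_j$ the mean of cluster $C_j$, and $m_j$ its spatial median.

```latex
W_{\text{means}} = \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \bar{x}_j \rVert^{2},
\qquad
W_{\text{medians}} = \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - m_j \rVert,
\qquad
m_j = \arg\min_{m} \sum_{i \in C_j} \lVert x_i - m \rVert .
```

Because a distant object enters $W_{\text{medians}}$ linearly rather than quadratically, it pulls the cluster center less strongly.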
We propose an algorithm for the k-spatial medians clustering that modifies the nearest centroid
sorting (MacQueen, 1967) and the transfer algorithm (Banfield and Bassill, 1977).
It has two distinct phases: one transfers an object from one cluster to another, and
the other amalgamates a single-member cluster with its nearest cluster. Given a
starting partition, each possible transfer is tested in turn to see if it would improve the value of
the clustering criterion. When no further transfers can improve the criterion value, each possible
amalgamation of a single-member cluster with another cluster is tested. When an amalgamation
is found to be beneficial, it is executed together with the detachment of an object that is far
from its cluster centroid. When no further amalgamations
give an improvement, the transfer phase is re-entered, and the process continues until no more
transfers or amalgamations can improve the clustering criterion value.
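The two phases described above might be sketched as follows. This is one possible reading, not the authors' implementation. It rests on several assumptions: the spatial median is approximated with Weiszfeld iterations, the detached object is the one farthest from its cluster's spatial median among clusters with at least two members, and it becomes the new single-member cluster so that k stays fixed. All function names and the tolerance `tol` are hypothetical.

```python
import math

def spatial_median(pts, iters=200, eps=1e-12):
    """Approximate spatial median via Weiszfeld iterations (assumed solver)."""
    dim = len(pts[0])
    m = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]  # start at the mean
    for _ in range(iters):
        w = [1.0 / max(math.dist(p, m), eps) for p in pts]  # inverse-distance weights
        m = [sum(wi * p[d] for wi, p in zip(w, pts)) / sum(w) for d in range(dim)]
    return m

def cost(points, labels, k):
    """Sum of unsquared Euclidean distances to each cluster's spatial median."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if members:
            m = spatial_median(members)
            total += sum(math.dist(p, m) for p in members)
    return total

def transfer_phase(points, labels, k, tol=1e-9):
    """Move single objects between clusters while the criterion improves."""
    best = cost(points, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(points)):
            home = labels[i]
            if labels.count(home) == 1:
                continue  # single-member clusters are left to the other phase
            for target in range(k):
                if target == home:
                    continue
                labels[i] = target
                trial = cost(points, labels, k)
                if trial < best - tol:
                    best, home, improved = trial, target, True  # keep the move
                else:
                    labels[i] = home  # undo the move

def farthest_object(points, labels, k):
    """Index of the object farthest from its cluster's spatial median."""
    best_i, best_d = None, -1.0
    for c in range(k):
        idx = [i for i, l in enumerate(labels) if l == c]
        if len(idx) < 2:
            continue  # never empty a cluster by detaching its only member
        m = spatial_median([points[i] for i in idx])
        for i in idx:
            d = math.dist(points[i], m)
            if d > best_d:
                best_i, best_d = i, d
    return best_i

def amalgamation_phase(points, labels, k, tol=1e-9):
    """Merge a single-member cluster and detach a far object if that helps."""
    best = cost(points, labels, k)
    changed = False
    for s in range(k):
        if labels.count(s) != 1:
            continue
        lone = labels.index(s)
        for t in range(k):
            if t == s:
                continue
            trial = labels[:]
            trial[lone] = t                                # amalgamate with cluster t
            trial[farthest_object(points, trial, k)] = s   # detach a far object
            c = cost(points, trial, k)
            if c < best - tol:
                labels[:] = trial
                best, changed = c, True
                break
    return changed

def k_spatial_medians(points, labels, k):
    """Alternate the two phases until neither improves the criterion."""
    labels = list(labels)
    while True:
        transfer_phase(points, labels, k)
        if not amalgamation_phase(points, labels, k):
            return labels
```

A call such as `k_spatial_medians(points, start_labels, 2)` returns the refined label list. The sketch accepts a move only when the criterion strictly decreases, so it stops at a local optimum that depends on the starting partition.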
To compare the k-spatial medians clustering with the k-means clustering, two
examples are considered. One has an outlier and the other has a particular structure. First, suppose
we measure two variables for each of 11 objects. There are two clusters with the same structure and
one outlier (the point (7,3)). Each point is plotted in Figure 1 by its cluster identification
number. For the k-means clustering, the outlier perturbs the genuine cluster structure.
However, the resulting clusters of the k-spatial medians clustering are properly separated. Since the
spatial median is less sensitive to outliers, the centroid of each cluster is affected little by the
outlier, so the partition maintains its genuine structure. Secondly, suppose we measure
two variables for each of 60 objects, which form two elongated structures of 30 points each. Figure 2
shows the results of dividing them into two groups. The k-means clustering unsuitably separates
the objects into upper and lower clusters. In contrast, the k-spatial
medians method yields two well-separated clusters.
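The robustness of the spatial median to an outlier can be illustrated with a small sketch. The five points below are hypothetical and are not the data behind Figure 1. The Weiszfeld iteration used to approximate the spatial median is an assumed choice, since the paper does not say how the median is computed.

```python
import math

def spatial_median(pts, iters=200, eps=1e-12):
    """Approximate spatial median via Weiszfeld iterations (assumed solver)."""
    dim = len(pts[0])
    m = [sum(p[d] for p in pts) / len(pts) for d in range(dim)]  # start at the mean
    for _ in range(iters):
        w = [1.0 / max(math.dist(p, m), eps) for p in pts]  # inverse-distance weights
        m = [sum(wi * p[d] for wi, p in zip(w, pts)) / sum(w) for d in range(dim)]
    return m

# Hypothetical data: four tight points plus one distant outlier.
cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]

mean = [sum(p[d] for p in cluster) / len(cluster) for d in range(2)]
median = spatial_median(cluster)

print("mean:", mean)              # (2.4, 2.4), dragged outside the unit square
print("spatial median:", median)  # roughly (0.79, 0.79), still inside the square
```

For these points the mean moves well outside the four tight points, while the spatial median stays among them. This matches the behaviour the first example describes.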
Figure 1. Comparison of two methods for data with an outlier
Figure 2. Comparison of two methods for elongated structured data
3. Conclusion
We introduced the k-spatial medians clustering procedure and compared it with the
k-means clustering on artificial data sets. The k-spatial medians clustering gave better results
than the k-means clustering when either outliers existed or clusters had particular structures
such as the elongated one. Since it is not uncommon to find outliers and particular structures in
real-life clustering situations, we expect to obtain good partitions by using the k-spatial medians
clustering.
REFERENCES
Banfield, C.F. and Bassill, L.C. (1977) A Transfer Algorithm for Non-hierarchical Classification.
Applied Statistics 26, 206-210.
MacQueen, J. (1967) Some Methods for Classification and Analysis of Multivariate Observations.
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1, 281-297.
Berkeley: University of California Press.
RÉSUMÉ
We introduced the k-spatial medians clustering and studied its superiority over the
k-means clustering in cases where outliers existed or where the clusters
had particular structures.