
Application of the k-spatial medians clustering

Myoungshic Jhun

Department of Statistics, Korea University

Anam-Dong, Sungbuk-Ku

Seoul, 136-701, Korea

[email protected]

Seohoon Jin

Department of Statistics, Korea University

Anam-Dong, Sungbuk-Ku

Seoul, 136-701, Korea

[email protected]

1. Introduction

The most widely used partitioning method in cluster analysis is the k-means clustering, which minimizes the within-cluster sum of squares. However, the k-means clustering is sensitive to outliers and to cluster structure. We introduce the k-spatial medians clustering, which is less sensitive to outliers, as an alternative to the k-means clustering, and compare the two clustering methods on some artificial data sets.

2. The k-spatial medians clustering

Results of most cluster analyses are quite sensitive to outliers. The k-means clustering is also influenced by distant objects or by cluster structure. To tackle the effect of outliers in cluster analysis, we consider the k-spatial medians clustering, which replaces the squared Euclidean distances in the objective function of the k-means clustering with the absolute Euclidean distances.
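The two criteria can be written side by side as follows. The notation is an assumption introduced here for illustration and does not appear in the original text: C_j denotes the j-th cluster, \bar{x}_j its mean vector, and m_j its spatial median, that is, the point minimizing the sum of Euclidean distances to the objects in C_j.

```latex
W_{\text{means}} = \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - \bar{x}_j \rVert^{2},
\qquad
W_{\text{medians}} = \sum_{j=1}^{k} \sum_{i \in C_j} \lVert x_i - m_j \rVert,
\qquad
m_j = \arg\min_{m} \sum_{i \in C_j} \lVert x_i - m \rVert .
```

Because a distant object enters W_medians linearly rather than quadratically, it contributes less to the criterion and pulls the cluster center less.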

We propose an algorithm for the k-spatial medians clustering, which modifies the nearest centroid sorting (MacQueen, 1967) and the transfer algorithm (Banfield and Bassill, 1977). It has two distinct phases: one transfers an object from one cluster to another, and the other amalgamates a single-member cluster with its nearest cluster. Given a starting partition, each possible transfer is tested in turn to see whether it would improve the value of the clustering criterion. When no further transfer can improve the criterion value, each possible amalgamation of a single-member cluster with another cluster is tested. When an amalgamation is found to be beneficial, it is executed together with the detachment of an object that is far from its cluster centroid. When no further amalgamation gives an improvement, the transfer phase is re-entered, and the procedure continues until no transfer or amalgamation can improve the clustering criterion value.
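The transfer phase described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the spatial median is approximated by Weiszfeld iteration (a technique the text does not name), the criterion is recomputed from scratch for every tentative transfer, and the amalgamation phase is omitted.

```python
import numpy as np

def spatial_median(points, n_iter=200, tol=1e-9):
    """Approximate the spatial median by Weiszfeld iteration (assumed method)."""
    points = np.asarray(points, dtype=float)
    m = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(n_iter):
        d = np.linalg.norm(points - m, axis=1)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        w = 1.0 / d
        new_m = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(new_m - m) < tol:
            return new_m
        m = new_m
    return m

def criterion(X, labels, k):
    """Sum of Euclidean distances from each object to its cluster's spatial median."""
    total = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members):
            center = spatial_median(members)
            total += np.linalg.norm(members - center, axis=1).sum()
    return total

def transfer_phase(X, labels, k):
    """Transfer single objects between clusters while the criterion improves."""
    X = np.asarray(X, dtype=float)
    labels = np.array(labels)
    best = criterion(X, labels, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(X)):
            src = labels[i]
            if np.sum(labels == src) == 1:
                continue  # never empty a cluster in this sketch
            for dst in range(k):
                if dst == src:
                    continue
                labels[i] = dst  # tentative transfer
                value = criterion(X, labels, k)
                if value < best - 1e-12:
                    best, src, improved = value, dst, True  # keep the transfer
                else:
                    labels[i] = src  # undo the transfer
    return labels, best

# Hypothetical data: two square groups and a deliberately poor starting partition.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels, value = transfer_phase(X, [0, 0, 0, 0, 0, 0, 1, 1], k=2)
print(labels, value)
```

Recomputing the criterion for every candidate transfer is wasteful but keeps the sketch short; an efficient version would update only the two clusters affected by a transfer.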

In order to compare the k-spatial medians clustering with the k-means clustering, two examples are considered: one has an outlier, and the other has a particular structure. First, suppose we measure two variables for each of 11 objects. There are two clusters with the same structure and one outlier (the point (7,3)). Each point is plotted in Figure 1 by its cluster identification number. For the k-means clustering, the outlier perturbs the genuine cluster structure. However, the resulting clusters of the k-spatial medians clustering are properly separated. Since the spatial median is less sensitive to outliers, the centroid of each cluster is affected little by the outlier, so the partition maintains its genuine structure. Second, suppose we measure two variables for each of 60 objects. Each group of 30 points forms an elongated structure. Figure 2 shows the results of dividing the objects into two groups. For the k-means clustering, the objects are unsuitably separated into upper and lower clusters. In contrast, the k-spatial medians method yields two well-separated clusters.
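The claim that the spatial median is affected little by an outlier can be checked numerically. The following sketch uses hypothetical data, not the data behind Figure 1: four objects on a unit square plus one distant object placed at (7,3). The Weiszfeld iteration used to approximate the spatial median is an assumption of this sketch.

```python
import numpy as np

def spatial_median(points, n_iter=500, tol=1e-10):
    """Approximate the spatial median by Weiszfeld iteration (assumed method)."""
    points = np.asarray(points, dtype=float)
    m = points.mean(axis=0)
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(points - m, axis=1), 1e-12)
        w = 1.0 / d
        new_m = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(new_m - m) < tol:
            return new_m
        m = new_m
    return m

# Hypothetical cluster and outlier (illustrative values only).
cluster = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0]])
outlier = np.array([[7.0, 3.0]])
data = np.vstack([cluster, outlier])

clean_center = cluster.mean(axis=0)    # center of the cluster without the outlier
mean_center = data.mean(axis=0)        # mean with the outlier included
median_center = spatial_median(data)   # spatial median with the outlier included

# How far each center is dragged away from the outlier-free center.
shift_mean = np.linalg.norm(mean_center - clean_center)
shift_median = np.linalg.norm(median_center - clean_center)
print("shift of mean:", shift_mean)
print("shift of spatial median:", shift_median)
```

In this configuration the mean is dragged noticeably farther from the outlier-free center than the spatial median is.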

Figure 1. Comparison of two methods for data with an outlier

Figure 2. Comparison of two methods for elongated structured data

3. Conclusion

We introduced the k-spatial medians clustering procedure and compared it with the k-means clustering on artificial data sets. The k-spatial medians clustering gave better results than the k-means clustering when either outliers existed or clusters had particular structures such as an elongated one. Since it is not uncommon to find outliers and particular structures in real-life clustering situations, we expect to obtain good partitions by using the k-spatial medians clustering.

REFERENCES

Banfield, C.F. and Bassill, L.C. (1977) A Transfer Algorithm for Non-hierarchical Classification. Applied Statistics 26, 206-210.

MacQueen, J. (1967) Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1, 281-297. Berkeley: University of California Press.

RÉSUMÉ (translated from French)

We introduced the k-spatial medians clustering and studied its superiority over the k-means clustering in cases where outliers existed or clusters had particular structures.