Application of the K-Spatial Medians Clustering

Myoungshic Jhun
Department of Statistics, Korea University
Anam-Dong, Sungbuk-Ku
Seoul, 136-701, Korea
[email protected]

Seohoon Jin
Department of Statistics, Korea University
Anam-Dong, Sungbuk-Ku
Seoul, 136-701, Korea
[email protected]

1. Introduction

The most widely used partitioning method in cluster analysis is k-means clustering, which minimizes the within-cluster sum of squares. However, k-means clustering is sensitive to outliers and to cluster structure. We introduce k-spatial medians clustering, which is less sensitive to outliers, as an alternative to k-means clustering, and compare the two clustering methods on some artificial data sets.

2. The k-spatial medians clustering

The results of most cluster analyses are quite sensitive to outliers. k-means clustering is likewise influenced both by distant objects and by cluster structure. To reduce the effect of outliers in cluster analysis, we consider k-spatial medians clustering, which replaces the squared Euclidean distances in the objective function of k-means clustering with the absolute (unsquared) Euclidean distances.

We propose an algorithm for k-spatial medians clustering that modifies the nearest centroid sorting (MacQueen, 1967) and the transfer algorithm (Banfield and Bassill, 1977). It has two distinct phases: one transfers an object from one cluster to another, and the other amalgamates a single-member cluster with its nearest cluster. Given a starting partition, each possible transfer is tested in turn to see whether it would improve the value of the clustering criterion. When no further transfers can improve the criterion value, each possible amalgamation of a single-member cluster with another cluster is tested. When an amalgamation is found to be beneficial, it is carried out together with the detachment of an object that is far from its cluster centroid. When no further amalgamations give an improvement, the transfer phase is re-entered, and the procedure continues until no more transfers or amalgamations can improve the clustering criterion value.

To compare k-spatial medians clustering with k-means clustering, two examples are considered. One has an outlier and the other has a particular structure.

First, suppose we measure two variables for each of 11 objects. There are two clusters with the same structure and one outlier, the point (7,3). Each point is plotted in Figure 1 with its cluster identification number. For k-means clustering, the outlier perturbs the genuine cluster structure. However, the clusters produced by k-spatial medians clustering are properly separated. Since the spatial median is less sensitive to outliers, the centroid of each cluster is affected little by the outlier, and the partition retains its genuine structure.

Secondly, suppose we measure two variables for each of 60 objects. Each group of 30 points forms an elongated structure. Figure 2 shows the results of dividing the objects into two groups. With k-means clustering, the objects are unsuitably separated into an upper and a lower cluster. In contrast, the k-spatial medians method yields two well-separated clusters.

Figure 1. Comparison of two methods for data with an outlier

Figure 2. Comparison of two methods for elongated structured data

3. Conclusion

We introduced the k-spatial medians clustering procedure and compared it with k-means clustering on artificial data sets. k-spatial medians clustering gave better results than k-means clustering when outliers were present or when clusters had particular structures such as the elongated one. Since outliers and particular structures are not uncommon in real-life clustering situations, we expect to obtain good partitions by using k-spatial medians clustering.

REFERENCES

Banfield, C.F. and Bassill, L.C. (1977) A Transfer Algorithm for Non-hierarchical Classification. Applied Statistics 26, 206-210.

MacQueen, J. (1967) Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1, 281-297. Berkeley: University of California Press.

FRENCH RÉSUMÉ

We introduced k-spatial medians clustering and studied its superiority over k-means clustering in cases where outliers were present or where the clusters had particular structures.
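The criterion and the transfer phase described in Section 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the use of Weiszfeld's iteration to approximate the spatial median, and the toy data are all assumptions made for illustration, and the amalgamation phase is omitted.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Coordinate-wise mean, i.e. the k-means cluster centre."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def spatial_median(points, iters=200, eps=1e-9):
    """Approximate the point minimizing the sum of unsquared distances.

    Uses Weiszfeld's iteration, an assumed solver (the paper names none).
    """
    m = mean(points)
    for _ in range(iters):
        num, den = [0.0] * len(m), 0.0
        for p in points:
            w = 1.0 / max(dist(p, m), eps)  # eps guards against division by zero
            den += w
            for i, x in enumerate(p):
                num[i] += w * x
        new = tuple(v / den for v in num)
        if dist(new, m) < eps:
            return new
        m = new
    return m

def criterion(clusters):
    """Sum over clusters of unsquared distances to each spatial median."""
    total = 0.0
    for c in clusters:
        m = spatial_median(c)
        total += sum(dist(p, m) for p in c)
    return total

def transfer_phase(clusters):
    """Greedy transfer phase only: move one object at a time to another
    cluster whenever that lowers the criterion; stop when nothing helps."""
    clusters = [list(c) for c in clusters]
    best = criterion(clusters)
    improved = True
    while improved:
        improved = False
        for a in range(len(clusters)):
            for p in list(clusters[a]):
                if len(clusters[a]) == 1:
                    break  # never empty a cluster in this sketch
                for b in range(len(clusters)):
                    if b == a:
                        continue
                    clusters[a].remove(p)
                    clusters[b].append(p)
                    value = criterion(clusters)
                    if value < best - 1e-12:
                        best, improved = value, True
                        break  # keep this transfer
                    clusters[b].remove(p)  # otherwise undo it
                    clusters[a].append(p)
    return clusters, best

# Hypothetical toy data (not the data sets used in the paper).
square = [(0, 0), (0, 1), (1, 0), (1, 1)]
with_outlier = square + [(10, 10)]
centre_mean = mean(with_outlier)              # (2.4, 2.4)
centre_median = spatial_median(with_outlier)  # roughly (0.789, 0.789)

# Starting partition with the object (1, 1) in the wrong cluster.
start = [[(0, 0), (0, 1), (1, 0)],
         [(1, 1), (5, 0), (5, 1), (6, 0), (6, 1)]]
final, value = transfer_phase(start)
```

On the toy data, a single distant point pulls the mean of the square far from its centre (0.5, 0.5), while the spatial median moves much less, which is the behaviour the paper relies on. The transfer phase then moves the misplaced object back and stops at the partition into the two squares. A fuller version would add the amalgamation phase for single-member clusters.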
