An Improvement for DBSCAN Algorithm for Best Results in Varied Densities
Total Page:16
File Type:pdf, Size:1020Kb
Computer Engineering Department Faculty of Engineering Deanery of Higher Studies The Islamic University-Gaza Palestine An Improvement for DBSCAN Algorithm for Best Results in Varied Densities Mohammad N. T. Elbatta Supervisor Dr. Wesam M. Ashour A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Engineering Gaza, Palestine (September, 2012) 1433 H ii stnkmegnelnonkcA Apart from the efforts of myself, the success of any project depends largely on the encouragement and guidelines of many others. I take this opportunity to express my gratitude to the people who have been instrumental in the successful completion of this project. I would like to thank my parents for providing me with the opportunity to be where I am. Without them, none of this would be even possible to do. You have always been around supporting and encouraging me and I appreciate that. I would also like to thank my brothers and sisters for their encouragement, input and constructive criticism which are really priceless. Also, special thanks goes to Dr. Homam Feqawi who did not spare any effort to review and audit my thesis linguistically. My heartiest gratitude to my wonderful wife, Nehal, for her patience and forbearance through my studying and preparing this study. I would like to express my sincere gratitude to my advisor Dr. Wesam Ashour for the continuous support of my master study and research, for his patience, motivation, enthusiasm, and immense knowledge. His guidance helped me in all the time of research and writing of this thesis. I could not have imagined having a better advisor and mentor for my master study. Also, I would like to thank all my teachers in master degree, for their perceptiveness, understanding, and their skillful teaching. And many thanks to all my friends for their moral support. And above all, many thanks to Allah, who helps me get all required resources for completing this research. iii Contents ACKNOWLEDGEMENTS ........................................................................................ III LIST OF ABBREVIATIONS ................................................................................... VII LIST OF FIGURES .................................................................................................. VIII LIST OF TABLES....................................................................................................... XI ABSTRACT .............................................................................................................. XIII CHAPTER 1: INTRODUCTION ................................................................................. 1 1.1 WHAT IS CLUSTERING .................................................................................. 1 1.2 TYPES OF CLUSTERING ................................................................................ 2 1.3 TYPES OF CLUSTERS...................................................................................... 5 1.4 REQUIREMENTS OF CLUSTERING ALGORITHMS .................................. 6 1.5 OUR CONTRIBUTION ..................................................................................... 9 1.6 THESIS STRUCTURE ..................................................................................... 10 CHAPTER2: RELATED WORKS............................................................................. 11 2.1 BACKGROUND ................................................................................................ 11 2.2 DENCLUE ......................................................................................................... 15 2.3 DBCLASD .......................................................................................................... 16 2.4 ST-DBSCAN ...................................................................................................... 17 2.5 DVBSCAN.......................................................................................................... 18 CHAPTER 3: BACKGROUND .................................................................................. 20 iv 3.1 DATA TYPES IN CLUSTERING ANALYSIS ................................................ 20 3.2 SIMILARITY AND DISSIMILARITY ............................................................ 21 3.3 SCALE CONVERSION .................................................................................... 21 3.4 DATA STANDARDIZATION AND TRANSFORMATION ........................... 22 3.5 OVERVIEW OF DBSCAN ALGORITHM ...................................................... 23 CHAPTER 4: PROPOSED ALGORITHMS ............................................................. 27 4.1 FIRST PROPOSED ALGORITHM VMDBSCAN .......................................... 27 4.1.1 DESCRIPTION OF PROPOSED ALGORITHM TO FIND CORRECT NUMBER OF CLUSTERS .......................................................................................... 28 4.1.2 VIBRATION PROCESS ................................................................................. 30 4.1.3 VMDBSCAN ALGORITHM PSEUDO-CODE .............................................. 33 4.2 SECOND PROPOSED ALGORITHM DMDBSCAN .................................. 34 4.2.1 DESCRIPTION OF FINDING SUITABLE EPSI FOR EACH DENSITY LEVEL ......................................................................................................................... 36 4.2.2 DMDBSCAN ALGORITHM PSEUDO-CODE .............................................. 38 4.3 THIRD PROPOSED ALGORITHM VDDBSCAN .......................................... 39 4.3.1 DATA SETS INPUT AND DATA STANDARDIZATION ............................ 39 4.3.2 OVERCOMING ON PROBLEMS OF VARIED DENSITIES ...................... 40 4.3.3 NOISE ELIMINATION .................................................................................. 41 4.3.4 PROPOSED ALGORITHM VDDBSCAN PSEUDO-CODE ......................... 41 4.4 PROPOSED ALGORITHMS PROPERTIES .............................................. 42 CHAPTER 5: EXPERIMENTAL RESULTS ............................................................ 43 v 5.1 IMPLEMENTATION ENVIRONMENT .................................................... 43 5.2 ARTIFICIAL DATA SETS ............................................................................... 43 5.2.1 DBSCAN .......................................................................................................... 43 5.2.2 DVBSCAN ....................................................................................................... 45 5.2.3 VMDBSCAN.................................................................................................... 48 5.2.4 DMDBSCAN.................................................................................................... 48 5.2.5 VDDBSCAN .................................................................................................... 52 5.3 REAL DATA SETS ........................................................................................... 61 5.3.1 DBSCAN .......................................................................................................... 62 5.3.2 DVBSCAN ....................................................................................................... 62 5.3.3 VMDBSCAN.................................................................................................... 63 5.3.4 DMDBSCAN................................................................................................... 64 5.3.5 VDDBSCAN .................................................................................................... 65 5.4 THE RESULTS SUMMARY ............................................................................ 65 5.4.1 ARTIFICIAL DATA SETS ............................................................................. 65 5.4.2 REAL DATA SETS ......................................................................................... 66 CHAPTER 6: CONCLUSION .................................................................................... 69 6.1 SUMMARY AND CONCLUDING REMARKS .............................................. 69 6.2 FUTURE WORK ............................................................................................... 70 BIBLIOGRAPHY ....................................................................................................... 71 vi stAcvmivsbbbn toctmkA Core The core object for each cluster is the object that has the minimum density function value according. DBCLASD A Distribution-Based CLustering Algorithm for mining in large Spatial Databases. DBSCAN Density-Based Spatial Clustering of Applications with Noise. DENCLUE Clustering Based on Density Distribution Functions "DENsity-based CLUstEring". DMDBSCAN Dynamic Method DBSCAN. DVBSCAN A Density based Algorithm for Discovering Density Varied Clusters in Large Spatial Databases. E Total Density Function represents the difference among the data points, which is based on the core. E-neighborhood The neighborhood within a radius Eps of a given object is called the E-neighborhood of the object. Eps The radius of a number of objects. MDBSCAN A Multi-Density DBSCAN. MinPts The Minimum number of Points (objects) each point of a cluster the neighborhood of a given radius (Eps) has to contain. p Some Point in a data set. ST-DBSCAN An algorithm for clustering Spatial–Temporal data. VDDBSCAN Vibration and Dynamic DBSCAN. VMDBSCAN Vibration Method DBSCAN. vii List of Figures Figure 1.1 Different ways of dividing points into clusters [2] ........................................... 2 Figure 1.2 Diagram of clustering algorithms