Design & Implementation of Classification & Clustering
Summer Internship Report
On
Design & Implementation of Classification & Clustering Algorithm for Mobile Phones
At IDRBT, Hyderabad
6th May 2014 to 5th July 2014

Submitted By: Ratnesh Chandak
B. Tech - CSE (2nd Year)
Roll Number: CS12B1030
Indian Institute of Technology, Hyderabad

Guided By: Dr. V.N. Sastry
Designation: Professor
IDRBT, Hyderabad

Date of Submission: 4th July 2014

Abstract

This project is broadly divided into two parts. In the first part, we find the intersection of common objects across different sets, using the example of finding common colored balls in different sets of containers. In the second part, we describe the K-Means Clustering Algorithm for grouping similar numerical data, and we experimentally find the optimal value of "K" for the clustering algorithm using the Elbow Method. Both parts are explained with the example of mobile phones and implemented in Java.

Certificate

Certified that this is a bona fide record of the summer internship project work entitled Design & Implementation of Classification & Clustering Algorithm for Mobile Phones, done by Ratnesh Chandak, B. Tech - CSE (2nd Year), Indian Institute of Technology, Hyderabad, at IDRBT, Hyderabad, during 6th May to 5th July 2014.

Prof. V N Sastry (Project Guide)
IDRBT, Hyderabad

Acknowledgement

We have completed this project as a summer internship at IDRBT (Institute for Development and Research in Banking Technology), Hyderabad, under the guidance of Dr. V.N. Sastry. We would like to thank all our friends and our mentor for their great support in completing this project.
Ratnesh Chandak (Project Trainee)
Date:

Contents

Chapter 1 Introduction
    1.1 Project Objectives
    1.2 Classification
    1.3 Clustering
Chapter 2 Container-Ball Algorithm
    2.1 Algorithm
    2.2 Application
    2.3 Remarks
Chapter 3 Cluster Analysis of Mobile Phones
    3.1 Assigning weights to Mobile Phones
    3.2 K-Means Clustering Algorithm
    3.3 Finding Optimal K for Clustering Algorithm
    3.4 Observations
Chapter 4 Conclusion and Future Work
Appendix A: Program of Weight calculation & K-Means Clustering Algorithm
References

Chapter 1: Introduction

1.1 Project Objectives

1. To find the intersection of same-colored balls of the same sizes across a set of containers.
2. To perform cluster analysis of the mobile phones available in the market.

1.2 Classification

Classification refers to the task of predicting a class label for given unlabeled objects in a dataset.

Example:
1. A bank loan officer needs to analyze past data to learn which loan applicants are safe and which are risky, and grants loans accordingly.
2. A marketing manager at a company needs to analyze data to guess whether a customer with a given profile will buy a new computer.

How does Classification work?

Classification is a two-step process:
1. Training Phase (Learning Step): Using past history (training data), a classification algorithm builds the classification rules for the classifier.
2. Classification Phase (Labeling Step): The classifier assigns a label (class) to any new object according to the classification rules.

1.3 Clustering

Clustering refers to grouping a set of objects in such a way that objects in the same cluster (group) are more similar to each other, with respect to some of their features, than to objects in other clusters.

Example:
1. Clustering helps in classifying documents on the web for information discovery.
2. It can be used in earthquake studies, city planning, market research, pattern recognition, data analysis, image processing, etc.
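To make the idea of grouping by similarity concrete, the following minimal Java sketch assigns each numeric value to the nearest of a few given reference points, so that values in the same group are closer to each other than to values in other groups. The class name NearestGroupSketch, the sample values, and the reference points are hypothetical and chosen only for illustration; they are not taken from the project's own program.

```java
import java.util.ArrayList;
import java.util.List;

public class NearestGroupSketch {

    // Returns the index of the reference point closest to the given value.
    static int nearest(double value, double[] references) {
        int best = 0;
        for (int r = 1; r < references.length; r++) {
            if (Math.abs(value - references[r]) < Math.abs(value - references[best])) {
                best = r;
            }
        }
        return best;
    }

    // Group g collects every value whose nearest reference point is references[g].
    static List<List<Double>> group(double[] values, double[] references) {
        List<List<Double>> groups = new ArrayList<>();
        for (int g = 0; g < references.length; g++) {
            groups.add(new ArrayList<>());
        }
        for (double v : values) {
            groups.get(nearest(v, references)).add(v);
        }
        return groups;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 1.5, 2.0, 9.0, 10.0, 11.5}; // hypothetical data
        double[] references = {2.0, 10.0};                  // hypothetical group representatives
        // prints [[1.0, 1.5, 2.0], [9.0, 10.0, 11.5]]
        System.out.println(group(values, references));
    }
}
```

Here the reference points are fixed in advance; a clustering algorithm would normally have to discover suitable representatives from the data itself.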
There are different types of clustering algorithms used for cluster analysis, mainly:
1. Hierarchical Clustering
2. Centroid-Based Clustering
3. Distribution-Based Clustering
4. Density-Based Clustering

Chapter 2: Container-Ball Algorithm

2.1 Objective: Finding the common number of same-colored balls across different sets of containers.

User Input:
1. Number of containers (M)
2. Number of colors (N)
The input can be read from the console or from an input file.

Assumption: The number of containers must be greater than one.

Algorithm:
1. Sort the containers in increasing order of the number of balls using Quicksort.
2. Check the number of containers (M):
   A. If M is even, divide the M containers into groups of two consecutive containers. Find the intersection of common colored balls of the two containers in each of the M/2 groups. Then delete the parent containers and form child containers, each composed of the colored balls common to its parent containers. Now set M -> M/2.
   B. If M is odd, divide the first M-1 containers into groups of two consecutive containers. Find the intersection of common colored balls of the two containers in each of the (M-1)/2 groups. Then delete the parent containers and form child containers, each composed of the colored balls common to its parent containers; the last container is carried over unchanged. Now set M -> (M-1)/2 + 1.
3. Repeat steps (1) and (2) until only one container remains; its composition is the common colored balls.

Pseudo Code:

A. Main

    Container[] M = the M input containers
    QUICKSORT(M)
    while M.length != 1
        // (M.length + 1) / 2 equals M/2 for even M and (M-1)/2 + 1 for odd M
        Container[] temp = new Container[(M.length + 1) / 2]
        int i = 0, j = 0
        // combine consecutive pairs; for odd M the last container stays unpaired
        while i + 1 < M.length
            temp[j] = combine(M[i], M[i+1])
            i = i + 2
            j = j + 1
        if M.length % 2 == 1
            temp[j] = M[M.length - 1]    // carry the unpaired container forward
        M = temp

B. Container Class

    String name
    String[] colorarray = new String[N]    // N = number of colors
    int[] freqarray = new int[N]

    public Container(String name)
        this.name = name

    // store color name2 with frequency b at index c
    public void defineColor(String name2, int b, int c)
        Color temp2 = new Color(name2)
        temp2.freq(b)
        colorarray[c] = temp2.name
        freqarray[c] = temp2.frequency

C. Color Class

    String name
    int frequency

    public Color(String name)
        this.name = name

    public void freq(int a)
        frequency = a

D. combine(Container A, Container B)

    Container commondata = new Container("tempcontainer")
    int c = 0    // next free index in commondata
    for (int k = 0; k < N; k++)
        for (int l = 0; l < N; l++)
            if (A.colorarray[k].compareTo(B.colorarray[l]) == 0)
                // a color present in both containers keeps the smaller frequency
                commondata.defineColor(A.colorarray[k], min(A.freqarray[k], B.freqarray[l]), c)
                c = c + 1
    return commondata

Figure 2.1: M containers, each containing N colored balls of different sizes (columns: Container 1 to Container M, each subdivided by size; rows: Color 1 to Color N).

Example:

Input:

           Container 1              Container 2              Container 3
           Size 1  Size 2  Size 3   Size 1  Size 2  Size 3   Size 1  Size 2  Size 3
Color 1      4      42      13       45      12       6       55      29       1
Color 2      8      12       7       35      17       7       23       5       5
Color 3      3      10       2       32      29      31       32       3      34
Color 4     12      11      23       11      27      71       21       0       2

Output: table of common colored balls of the same sizes.

           Size 1  Size 2  Size 3
Color 1      4      12       1
Color 2      8       5       5
Color 3      3       3       2
Color 4     11       0       2

2.2 Related Applications

1. True Caller mobile application.
2. Facebook's "People You May Know" feature.

2.3 Remarks

1. This application can be used by the marketing manager of a mobile application company who wants to estimate the number of mobile phones compatible with a new application, by finding the intersection of the features the application requires across different sets of mobile phones.
2.
This application can also serve as a weekly "Diet Check": by finding the intersection of the proteins, vitamins, etc. consumed on each day, one can learn which food components were eaten throughout the week. Hospitals could also use it to regulate the amount of nutrients given to patients.

Chapter 3: Cluster Analysis of Mobile Phones

In this chapter we present a centroid-based clustering algorithm, namely the K-Means Clustering Algorithm. We take the example of mobile phones in the market and group them into appropriate clusters; for any newly launched mobile phone, we can apply the same method to find its appropriate cluster.

Before moving to the K-Means Clustering Algorithm, note that it has a limitation: it can be applied only to numerical data. To overcome this limitation in the case of mobile phones, we assign each mobile phone a weight.

3.1 Assigning weights to mobile phones

To assign a weight to a mobile phone, follow the steps below:
1. List all the features on which the mobile phones are to be compared.
2. Assign a priority order to each feature.
3. Give an (X, Y) coordinate to each feature as follows:
   a. Take some increasing function X = F(Z) and let Z take the values 1, 2, 3, ..., n (where n is the number of features).
   b. Take an increasing function for Y, Y = F(X).
4. Any given mobile phone has a fixed set of features. Calculate the distance from the origin (0, 0) of each feature present in that mobile phone.
5. Calculate the standard deviation of these distances for each mobile phone; this value corresponds to its weight.

In this project, we have considered the following features; the number before each feature corresponds to its priority order.
1. 2G 12.