Analysis and Prediction of Television Show Popularity Rating Using Incremental K-Means Algorithm
International Journal of Mechanical Engineering and Technology (IJMET)
Volume 9, Issue 1, January 2018, pp. 482–489, Article ID: IJMET_09_01_052
Available online at http://iaeme.com/Home/issue/IJMET?Volume=9&Issue=1
ISSN Print: 0976-6340 and ISSN Online: 0976-6359
© IAEME Publication, Scopus Indexed

ANALYSIS AND PREDICTION OF TELEVISION SHOW POPULARITY RATING USING INCREMENTAL K-MEANS ALGORITHM

D. Anand
Asst. Professor, KL University

A.V. Satyavani, B. Raveena and M. Poojitha
KL University

ABSTRACT
Television reality shows are increasing day by day. There are many ways to find the Television Rating Point (TRP). First, the raw data is taken from the People’s Meter and the number of views is counted from it. The whole data set is then divided into clusters based on the different channels. A particular channel is selected and its view count is taken. Depending on the number of views, the channel or show is rated accordingly; for example, if the view count is more than 10,000, a rating of 10 is allotted to that show. If new data arrives in the middle of the process, it is added and the whole process starts again. With the proposed algorithm, we can also update entries and add new ones in the middle of the process. Based on the number of views, each television show is rated, and the show with the highest rating is identified. The TRP can be compared among different shows and viewed as bar graphs, pie charts and histograms. We use the K-Means and Incremental K-Means algorithms to compare the TRP. The comparison between the two algorithms is very clear on histograms. This offers a simple way of analysing TV shows and predicting their popularity. If the data is inaccurate, it may result in faulty values.

Keywords: K-Means, Incremental K-Means, Clustering, Data Object

Cite this Article: D. Anand, A.V. Satyavani, B. Raveena and M. Poojitha, Analysis and Prediction of Television Show Popularity Rating using Incremental K-Means Algorithm, International Journal of Mechanical Engineering and Technology 9(1), 2018, pp. 482–489. http://iaeme.com/Home/issue/IJMET?Volume=9&Issue=1

http://iaeme.com/Home/journal/IJMET 482 [email protected]

1. INTRODUCTION:
Television has become a part of everyone’s life. Many shows are telecast on different channels, and the number of viewers watching them is increasing day by day. There are many ways to find the view count. We can find which show has the highest TRP, and we consider it the most viewed show. In television rating we consider the following: the People’s Meter, clustering, and the K-Means and Incremental K-Means clustering algorithms.

The People’s Meter is a ‘box’ that is hooked up to each television and is accompanied by a remote control unit. It records the number of viewers and details such as their name, age and gender, to identify which show they are viewing.

Clustering means dividing the data into subclasses called clusters. Each clustering model has its own advantages and disadvantages, so the algorithm has to be chosen correctly for the given data. Two algorithms are considered here: K-Means and Incremental K-Means. With K-Means we can classify the data very effectively. K-Means depends mainly on the clusters selected. A single cluster is divided into sub-clusters; any one of them is then selected and the clustering operations are performed on it. The output depends mainly on the value of K selected.

Incremental K-Means clustering is an extension of K-Means. New data selected in the middle of the process is added to the existing database, and it is always grouped with the previous data.
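As a rough illustration of the incremental idea described above, the sketch below assigns one newly arriving view count to the nearest existing centroid and updates only that centroid's running mean, rather than re-clustering all the data. The names `Cluster` and `add_point`, the one-dimensional view counts, and the starting centroids are all hypothetical and are not taken from the paper; the authors' actual update rule may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cluster:
    centroid: float  # running mean of the view counts in this cluster
    count: int       # number of points absorbed so far


def add_point(clusters: list[Cluster], views: float) -> int:
    """Assign one new view count to its nearest cluster and update that
    cluster's mean in place. Returns the index of the chosen cluster."""
    nearest = min(range(len(clusters)),
                  key=lambda i: abs(clusters[i].centroid - views))
    c = clusters[nearest]
    c.count += 1
    # Incremental mean: new_mean = old_mean + (x - old_mean) / n
    c.centroid += (views - c.centroid) / c.count
    return nearest


# Hypothetical starting state, e.g. left over from an earlier K-Means run.
clusters = [Cluster(centroid=2000.0, count=4),
            Cluster(centroid=12000.0, count=4)]
chosen = add_point(clusters, 14000.0)
```

Only the cluster that receives the new point changes; the other centroid is left untouched, which is what lets new entries be added in the middle of the process without a full restart.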
This incremental strategy optimizes the process and adapts to applications. Compared with K-Means, Incremental K-Means is the more efficient of the two. The results are plotted on a graph with the number of shows on the X-axis and the rating on the Y-axis. Different channels have different ratings. The graph contains histograms, and the channel with the highest popularity rating is shown clearly in it.

2. LITERATURE SURVEY
Hartigan presented research on determining the threshold values for a new cluster based on the existing clusters. He proposed incremental clustering, which attracted the attention of the research community through Hartigan’s Leader clustering algorithm, which uses these threshold values. The algorithm splits the data set into groups by distance. An object is taken as a Leader, and the remaining objects in its group lie within some distance T of it. The first data point is selected as a Leader object. The remaining objects, which lie at different points, are grouped in the same way. The data is processed only once. The Leader algorithm handles static databases only, and it became a basis for incremental clustering.

Charikar studied a clustering algorithm that also handles dynamic databases. The researchers further developed the incremental algorithm and many models of it by extending it to dynamic databases. The paper analyses several natural greedy algorithms and proves that they perform rather poorly in the dynamic setting.

BIRCH, proposed by Zhang, is especially suitable for large numbers of data items. For many clustering techniques, balancing can be done with the help of this algorithm. Iterations can also be carried out normally using this technique. A data structure called the Cluster Feature tree is used to split the data points and to insert them incrementally in sorted fashion. This algorithm reduces memory usage. However, it is the first clustering algorithm to handle clusters.
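The Leader algorithm summarized earlier in this survey can be sketched as follows. This is a minimal one-dimensional version under an assumed convention: a point joins the first leader found within the threshold T, and otherwise becomes a new leader. The function name `leader_clustering` and the sample values are illustrative only and do not come from Hartigan's work or from this paper.

```python
def leader_clustering(points, threshold):
    """Single-pass Leader clustering on 1-D data.

    Returns a list of (leader, members) pairs; every member lies within
    `threshold` of its leader.
    """
    groups = []  # each entry: (leader value, list of members incl. leader)
    for x in points:
        for leader, members in groups:
            if abs(x - leader) <= threshold:
                members.append(x)
                break
        else:
            # No existing leader is close enough, so x starts a new group.
            groups.append((x, [x]))
    return groups


groups = leader_clustering([1.0, 1.5, 8.0, 9.0, 2.0, 20.0], threshold=2.0)
```

Each point is visited exactly once, which matches the single-pass property noted above. The result depends on the order in which the points arrive, since the first point seen in a region becomes its leader.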
Ester et al. proposed Incremental DBSCAN, a data mining approach suited to databases that receive very frequent updates. When operations such as insertion and deletion are performed on the database, the clusters need to be updated. However, insertions and deletions affect only a small part of a particular cluster. The algorithm checks which part of the space is affected by the new data. For pairs of objects this algorithm is very efficient.

Steinbach, Karypis and Kumar published a paper on hierarchical clustering algorithms and, mainly, on the K-Means algorithm. The main aim of this paper is to provide a detailed and comprehensive description of important clustering algorithms. There are many application areas, such as computer science and machine learning. They proposed many theories based on the clusters and techniques. The clustering of sequential data and the related approaches are also discussed.

Based on general clustering techniques, Jain and Dubes proposed many clustering tasks. The main goal of this paper is to find the very hard components of the data present in clustering. The whole process is divided into two stages. In the first stage, the appropriate processing steps, such as pre-processing and feature extraction, need to be selected. The values are then measured with the help of these chosen steps. Analysing this phase requires a good knowledge of the basics of data analysis and of the resulting domain. The second phase consists of the exact patterns present in the required data sets. The approximate value can be calculated using simple distance functions.

Nagy proposed hierarchical clustering algorithms. These algorithms are reported to be very efficient in time and space complexity when compared with typical K-Means algorithms. With their help, hybrid algorithms can be developed.
Other advantages of these algorithms are simplicity and speed. The results may vary when they are run on other data sets and algorithms. It was concluded that variance results are not appropriate to the particular problem.

Ball and Hall developed a new technique called ISODATA, in which the value of K plays a major role. ISODATA can merge clusters according to thresholds present in the clusters, and splitting clusters is also possible. The procedure runs in a loop: every time an iteration completes, the value of K is updated, and the new value is used to calculate the number of clusters. This iteration gives no guarantee of optimality, and further developments in this area are expected to improve optimization. Many researchers have developed new operators to improve efficiency and optimization.

Stephen J. Phillips proposed two modifications. The first avoids making unnecessary comparisons between data points. The second avoids sorting the means in the algorithm. These modifications helped K-Means and other classification algorithms on a wide range of data sets.

Hirtoshi and Haruno proposed several techniques on training data. They applied distributional clustering of features to text classification. The difference between two features is taken as the difference between their distributions. Similar features form a cluster and take part in the same classification process. Using these distributions reduces the number of features.

3. K-MEANS ALGORITHM FOR CLUSTERING:
Clustering: A cluster is a group of similar objects. Clustering is the process of collecting such clusters; in other words, it is the process of grouping similar objects. In many fields, clustering is the main task in the statistical analysis of data in data mining.
D.