Churn Analysis of Online Social Network Users Using Data Mining Techniques
Total Page:16
File Type:pdf, Size:1020Kb
Churn analysis of online social network users using data mining techniques Citation for published version (APA): Long, X., Yin, W., An, L., Ni, H., Huang, L., Luo, Q., & Chen, Y. (2012). Churn analysis of online social network users using data mining techniques. In International MultiConference of Engineers and Computer Scientists, IMECS 2012 (Vol. 2195, pp. 551-556). Newswood Limited. Document status and date: Published: 16/03/2012 Document Version: Typeset version in publisher’s lay-out, without final page, issue and volume numbers Please check the document version of this publication: • A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website. • The final author version and the galley proof are versions of the publication after peer review. • The final published version features the final layout of the paper including the volume, issue and page numbers. Link to publication General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal. If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement: www.tue.nl/taverne Take down policy If you believe that this document breaches copyright please contact us at: [email protected] providing details and we will investigate your claim. Download date: 01. Oct. 2021 See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/235762390 Churn analysis of online social network users using data mining techniques Conference Paper · March 2012 CITATIONS READS 6 241 7 authors, including: Xi Long Le An Philips University of California, Riverside 65 PUBLICATIONS 417 CITATIONS 42 PUBLICATIONS 505 CITATIONS SEE PROFILE SEE PROFILE Some of the authors of this publication are also working on these related projects: Epilepsy View project Sleep in preterm infants View project All content following this page was uploaded by Xi Long on 04 June 2014. The user has requested enhancement of the downloaded file. Churn Analysis of Online Social Network Users Using Data Mining Techniques Xi Longy, Wenjing Yin, Le An, Haiying Ni, Lixian Huang, Qi Luo, and Yan Chen Abstract—A churn is defined as the loss of a user in an China, the most widely-used Chinese OSNs are: Renren online social network (OSN). Detecting and analyzing user (renren.com), Qzone (qzone.qq.com), Weibo (weibo.com), churn at an early stage helps to provide timely delivery of Kaixin001 (kaixin001.com), and Pengyou (pengyou.com), retention solutions (e.g., interventions, customized services, and better user interfaces) that are useful for preventing users from who share the majority of China’s OSN market, traffic, and churning. In this paper we develop a prediction model based users [5], [6]. on a clustering scheme to analyze the potential churn of users. Many studies based on OSN users with different purposes In the experiment, we test our approach on a real-name OSN have been achieved by using data mining techniques [7]-[9]. which contains data from 77,448 users. A set of 24 attributes For an OSN service provider, the sustainable development is extracted from the data. A decision tree classifier is used to predict churn and non-churn users of the future month. In of the OSN site mainly depends on whether it has mass addition, k-means algorithm is employed to cluster the actual users and how active they are [4]. Unfortunately, a problem churn users into different groups with different online social has been observed that many existing users behave inactively networking behaviors. Results show that the churn and non- on an OSN site or they completely leave the site. In other churn prediction accuracies of ∼65% and ∼77% are achieved words, the OSN site loses users. In this paper, the loss of respectively. Furthermore, the actual churn users are grouped into five clusters with distinguished OSN activities and some a user is defined as a churn. The analysis of user churn suggestions of retaining these users are provided. problem has been studied in many different industries such as telecommunications [10], [11], healthcare [12], financial Index Terms—Online social network, churn prediction, user clustering, retention solution. service [13], and banking [14]. For OSN service providers, it has become increasingly important because user churn of OSN sites causes direct revenue loss [15]. Besides, recruiting I. INTRODUCTION a new user may cost higher than retaining an existing one ED-BASED online social network (OSN) sites have in internet industry [2], [15]. Therefore, it is important for W large beneficial effects on people’s social communi- an OSN service provider to: (1) precisely and early locate cations through internet [1], [2]. Many OSNs (e.g., Facebook, the users who are at high risk of churn in the near future LinkedIn, and MySpace), which provide various services (i.e., the future month in this case); (2) timely prevent these to meet different user needs, are emerging and growing users from churning by using appropriate retention solutions rapidly with easier and better interactive experience [3]. A such as providing interventions through email, offering better large number of netizens register their personal accounts services, recommending interesting contents, etc. at OSN sites with social communities in order to extend Although the conventional approaches (e.g., usability test their social relationships in local, national or worldwide and user interview) are useful for qualitatively knowing the ranges by connecting with others. Such connections between interactive problems and user expectations of an OSN site, individuals are usually based on their manifest or latent they are less helpful to identify who may churn in the future. relationships fashioned by their self-disclosed information Hence, predicting future churns based on their past online such as personal profiles, locations, and interests [4]. Up activity data plays an important role in delivering appropriate to now, more than a hundred OSNs have been launched retention solutions to them. In this study, the prediction was and many of them focus on providing local services to achieved by means of a classifier that can distinguish the users in their own countries with mother-tongue-only in- users in two pre-defined classes (i.e., churn and non-churn) terfaces by means of their inherent advantages in the as- because their online activity data contain information from pects of local culture, daily lifestyle, online/offline behav- which churn and non-churn users can be derived. Classifiers ior, communication habit, and so on [5]. For example, in require that this information is first extracted from the data as “attributes”. Several simple classifiers can be considered Manuscript received December 08, 2011. This work was supported by for classifying churns and non-churns with acceptable per- Customer Research and User Experience Design Center (CDC), Tencent formances such as neural network (NN) and decision tree Inc., China. (DT) [16]. Compared with a DT-based classifier, a NN- yX. Long was a User Researcher in CDC, Tencent Inc., High-Tech Park, Shenzhen, 518057, China. He is now a PhD candidate in the Dept. of based one cannot give an explicit expression of the attribute Electrical Engineering, Eindhoven University of Technology, Eindhoven, patterns of classes in an easily understandable form, which 5600 MB, the Netherlands (e-mail: [email protected]). complicates user behavior interpretation for making decisions W. Yin1, H. Ni2, L. Huang3, and Q. Luo4 are Senior User Researchers in CDC, Tencent Inc., High-Tech Park, Shenzhen, 518057, China (e-mail: during classification [17]. It means that only the churning f1viviyin, 2amieeni, 3henryhuang, [email protected]). probabilities of users are provided without indicating which L. An is a PhD candidate in the Dept. of Electrical Engineering, University attributes are more significant in discriminating with churn of California, Riverside, CA92521, USA (e-mail: [email protected]). Y. Chen is a Director in CDC, Tencent Inc., High-Tech Park, Shenzhen, and non-churn. Besides, a DT-based classifier can handle 518057, China (e-mail: [email protected]). both numerical and categorical data. In this study, a DT-based classifier [18] is adopted to predict the class membership. collected in anonymity (in honor of user privacy and under After locating the churn users, it is important to group certain regulations) such as number of friends, total online these users into different clusters based on their different time, login times, active days, number of blogs written, online activities so that the appropriate retention solutions number of pictures shared, number of messages in guestbook, can be delivered to them. For this purpose, unsupervised login times in QQ, etc. Then a large dataset was built in learning is recommended. From the practical point of view, which various attributes (e.g., users’ statuses and login ac- a k-means and a hierarchical clustering algorithm are consid- tivities) can be extracted. Note that all the data are monthly- ered due to the wide employment and the ease of implemen- based.