Predicting Gaming Behavior Using Facebook Data
Word count: 8791

Cesar Vermeulen
Student number: 01205064

Supervisor: Prof. dr. Dirk Van den Poel
Co-supervisor: Matthias Bogaert

Master's Dissertation submitted to obtain the degree of: Master of Science in Business Engineering
Academic year: 2017–2018

Permission
I declare that the content of this Master's Dissertation may be consulted and/or reproduced, provided that the source is referenced.
Student name: Cesar Vermeulen
Signature:

Dutch-language summary
Social game developers face many challenges. On the one hand, it is very difficult for them to attract new users, because this market offers so much choice. On the other hand, these games are usually free to play, so developers see a large share of their customers leave the game, typically after a very short time. In this paper, we sought answers to the following three questions: (1) is it possible to predict gamers' behavior using Facebook data, (2) which algorithm is best suited to this task, and (3) which characteristics have the greatest impact on gaming behavior? We took the following steps to obtain our results. First, we collected data from more than 5000 people, from which we selected 166 variables. Next, we built models with six different machine learning algorithms: random forest, logistic regression, XGBoost, support vector machines, deep learning and a hybrid ensemble. We compared these models on three scores: the area under the ROC curve (AUC), accuracy and top 10% decile lift. Finally, we identified the most important characteristics by means of a sensitivity analysis followed by information fusion.
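The three evaluation scores used throughout this study (AUC, accuracy and top 10% decile lift) can be made concrete with a minimal Python sketch that relies only on the standard library. It assumes the usual textbook definitions, which are not spelled out at this point: AUC as the share of (gamer, non-gamer) pairs in which the gamer receives the higher score (ties count half), accuracy at an assumed cut-off of 0.5, and top 10% decile lift as the share of gamers among the 10% highest-scored profiles divided by the overall share of gamers. The function names, the cut-off and the toy data are illustrative assumptions and are not taken from this dissertation.

```python
from typing import List, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (gamer, non-gamer) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], scores: Sequence[float], cutoff: float = 0.5) -> float:
    """Share of correct predictions when scores >= cutoff are labelled as gamers (assumed cut-off)."""
    preds = [1 if s >= cutoff else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def top_decile_lift(y_true: Sequence[int], scores: Sequence[float], fraction: float = 0.10) -> float:
    """Gamer rate among the top `fraction` of scored profiles divided by the overall gamer rate."""
    n = len(y_true)
    k = max(1, round(n * fraction))  # number of profiles in the top segment
    # Labels ordered from highest to lowest predicted score.
    ranked: List[int] = [y for _, y in sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)]
    base_rate = sum(y_true) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no gamers")
    return (sum(ranked[:k]) / k) / base_rate


# Toy example with ten scored profiles (1 = gamer); the numbers are made up.
y = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(round(auc(y, s), 4))             # 0.9524
print(accuracy(y, s))                  # 0.8
print(round(top_decile_lift(y, s), 4)) # 3.3333
```

A lift of 2.5, for instance, means that the 10% highest-scored profiles contain 2.5 times as many gamers as a random 10% sample would. The pairwise AUC loop favours readability over speed; for a few thousand profiles it remains fast enough.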
With AUC scores between 0.6946 and 0.7459, accuracy between 0.6893 and 0.7642 and top 10% decile lift between 2.3070 and 2.5750, we conclude that our models are effective at identifying potential gamers and therefore offer marketing departments an efficient way to adjust their policies based on our recommendations. Looking at the most important determinants, we find that likes for community pages have the strongest influence. This study contributes to the current literature by putting forward an effective way to predict a person's gaming behavior from his or her Facebook data.

Acknowledgements
This master's dissertation is the conclusion of six years of hard work. I could not have done this without the help I received, and I cannot think of a better opportunity to express my gratitude than here and now. First of all, I would like to thank my promotor, Professor Dirk Van den Poel, and his assistant Matthias Bogaert for the opportunity they gave me to work on this study. Their guidance, feedback and immense knowledge have been of great help. I could not wish for better advisers. Second, my family: I cannot thank them enough for their assistance and support during the course of my studies. A sincere thank you to my parents, for giving me the opportunity to start my studies, and to my brother and sisters, for their support and assistance. Special thanks to Marjoke, for always being there for me and supporting me in every difficult situation I encountered. I am also grateful to Daan, Pieter and Victor for proofreading this paper. Last but not least, my fellow students and friends, for all the fun and hard work we shared. Silke, Victor, David, Arne, Stephan and Pieter, thank you.

Contents
1 Introduction
2 Literature review
3 Methodology
3.1 Data
3.2 Predictors
3.3 Classification algorithms
3.3.1 Random forest
3.3.2 XGBoost
3.3.3 Logistic regression
3.3.4 Support vector machines
3.3.5 Deep learning
3.3.6 Soft voting ensemble
3.4 Model performance
3.5 Cross-validation
3.6 Information fusion sensitivity analysis
4 Discussion of results
4.1 Data analytic model results
4.2 Information fusion sensitivity analysis
5 Conclusion and practical implications
6 Limitations and future research
Appendices
A Top 30 most frequent games

List of Figures
1 Scree plot of sensitivity scores of top 100 variables
2 Correlation heatmap for the decrease in AUC across all predictors

List of Tables
1 Literature review
2 Top five most frequently played games
3 Predictors
4 Median AUC, accuracy and top 10% decile lift for fivefold stratified cross-validation
5 Interquartile ranges
6 Average ranks for AUC, accuracy and lift
7 Top 15 variables based on decrease in AUC
8 Top 10 variables based on decrease in AUC without like categories
9 Top 30 most frequent games

Abbreviations
AUC     area under the receiver operating characteristic curve
LASSO   least absolute shrinkage and selection operator
MMORPG  massively multiplayer online role-playing game
RBF     Gaussian radial basis function
RPG     role-playing game
SVM     support vector machine

Abstract
Because social games have high barriers to entry and low barriers to exit, social game developers struggle to find and retain customers. To help marketers in their targeting campaigns, this study seeks to find out (1) whether it is possible to predict gaming behavior using Facebook data, (2) which machine learning algorithm is best suited to this task, and (3) which variables have the greatest impact on a person's gaming behavior. Using a custom-built Facebook application, we collected the relevant data of 5010 profiles, from which we used a total of 166 variables.
We benchmarked six algorithms (random forest, logistic regression, XGBoost, deep learning, support vector machines and a hybrid ensemble) and compared their AUC, accuracy and top 10% decile lift using fivefold stratified cross-validation. With AUC scores between 0.6946 and 0.7459, accuracy between 0.6893 and 0.7642 and top 10% decile lift between 2.3070 and 2.5750, we conclude that this is a viable approach to targeting potential gamers. Moreover, we performed an information fusion sensitivity analysis to identify the most important variables. In terms of mean decrease in AUC, the most important variable is the number of likes for community pages. This study contributes to existing theory and practice by presenting a data-analytical approach to the acquisition of potential gamers using Facebook data.

1 Introduction
With more than 445 million Facebook users who play games [1], games have become an important part of Facebook's platform. However, revenues generated directly by these games have been declining over the past years [5]. Most of these games are social games: they can be played on mobile platforms such as Facebook or Google+ and often use a freemium model. This means that the games are free to play, but additional content that improves the gaming experience (e.g., an ad-free experience, extra content) has to be purchased. Typically, these games have a high barrier to entry (i.e., there is too much choice) and a low barrier to exit (i.e., there is no financial penalty for leaving the game). Because Facebook allows game developers to generate revenue from advertisements [4], which is also Facebook's main source of income [5], the main challenge of this model is to attract new potential gamers. Research on gaming behavior has been devoted to churn prediction in online games [31, 38, 47, 55, 57, 63, 64, 65].
These studies investigate whether a player will stop playing the focal game. However, targeting new potential gamers through a data-analytical approach has received little attention in the field of gaming analytics. Identifying the main determinants of why a person plays games can greatly help game developers and gaming companies, because acquiring new gamers is much more expensive than retaining current customers. Reinartz and Kumar [56] concluded that it is more profitable for a company to retain and satisfy its current customers than to focus on renewing its customer base. However, because high churn is inherent to freemium games, retaining current customers is very difficult [35]. Although acquiring new customers is therefore of great importance, no study has evaluated the feasibility of a system that identifies potential gamers using Facebook data. This study is the first to explore the prediction of gaming behavior through Facebook data. Despite the prevalence of demographic research, which focuses on identifying which types of people play which types of games, little or no research on predicting the acquisition of gamers has so far been available to academics or practitioners. We regard this demographic research as part of the gamer acquisition process, since it gives marketers a good overview of who their target customers are. To address this gap in the literature, we study whether it is feasible to identify gamers on Facebook. We define a gamer as a person who is interested in games currently offered on a mobile platform. Operationally, a person is considered a gamer when he or she has liked the page of one or more of these social games (e.g., Candy Crush, FarmVille). For this study, we extracted the Facebook profiles of 5010 individuals.
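The operational definition of a gamer given above amounts to a simple labelling rule for the target variable. The following minimal Python sketch assumes that each profile is available as a list of liked page names and that the set of social game pages is known in advance. The two page names, the profile dictionary and the function names are hypothetical placeholders, and a real pipeline would more likely match on page identifiers than on page names.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical set of social game pages; the study's actual list of games is not reproduced here.
SOCIAL_GAME_PAGES: Set[str] = {"Candy Crush", "FarmVille"}


def is_gamer(liked_pages: Iterable[str], game_pages: Set[str] = SOCIAL_GAME_PAGES) -> int:
    """Label a profile 1 (gamer) if it liked at least one social game page, else 0."""
    return int(any(page in game_pages for page in liked_pages))


def label_profiles(profiles: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each (hypothetical) profile id to its binary gamer label."""
    return {user_id: is_gamer(pages) for user_id, pages in profiles.items()}


def gamer_share(labels: Dict[str, int]) -> float:
    """Share of gamers in the sample, i.e. the base rate against which lift is measured."""
    return sum(labels.values()) / len(labels)


profiles = {
    "user_1": ["Candy Crush", "Example Band"],
    "user_2": ["Example Band"],
}
labels = label_profiles(profiles)
print(labels)               # {'user_1': 1, 'user_2': 0}
print(gamer_share(labels))  # 0.5
```

Under this rule, a single like for any listed game page is enough to make a profile a gamer, so the resulting label depends directly on which games are included in the page list.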