PREDICTING GAMING BEHAVIOR USING FACEBOOK DATA

Word count: 8791

Cesar Vermeulen Student number: 01205064

Supervisor: Prof. dr. Dirk Van den Poel Co-supervisor: Matthias Bogaert

Master’s Dissertation submitted to obtain the degree of:

Master of Science in Business Engineering

Academiejaar/ Academic year: 2017 – 2018

Permission

I declare that the content of this Master’s Dissertation may be consulted and/or reproduced, provided that the source is referenced. Name student: Cesar Vermeulen Signature:

I Nederlandstalige samenvatting

Social game developers face many challenges. On the one hand, it is very difficult for them to attract new users, since supply in this market is so large. On the other hand, these games are usually free to play, so they see a large share of customers abandon their game, typically after a very short time. In this paper, we sought answers to the following three questions: (1) is it possible to predict gaming behavior using Facebook data, (2) which algorithm is best suited to do so, and (3) which characteristics have the greatest impact on gaming behavior. The following steps were taken to obtain our results. First of all, we collected data from more than 5000 people, from which we selected 166 variables. We then built models with six different machine learning algorithms: random forest, logistic regression, XGBoost, support vector machines, deep learning and a hybrid ensemble. We compared these using three scores: the area under the ROC curve (AUC), accuracy and the top 10% decile lift. Next, using a sensitivity analysis followed by an information fusion, we identified the most important characteristics. With AUC scores between 0.6946 and 0.7459, accuracy from 0.6893 to 0.7642 and top 10% decile lift from 2.3070 to 2.5750, we conclude that our models are effective in identifying potential gamers and would therefore offer marketing departments an efficient way to adjust their policies based on our recommendations. Looking at the most important determinants, we find that likes for community pages have the strongest influence. This study contributes to the current literature by putting forward an effective way to predict a person’s gaming behavior from his or her Facebook data.

II Acknowledgements

This master’s dissertation is the conclusion of six years of hard work. I could not do this without the help I received, so I can not think of a better opportunity to express my gratitude than here and now. First of all, I would like to thank my promotor, professor Dirk Van den Poel and his assistant Matthias Bogaert for the opportunity they gave me to work on this study. Their guidance, feedback and immense knowledge have been of great help. I could not wish for better advisers. Second, my family. I could not thank them enough for their assistance and support during the course of my studies. A sincere thank you to my parents, for giving me the opportunity to start my studies. My brother and sisters, for their support and assistance. Special thanks to Marjoke, for always being there for me and supporting me in every hard situation I encountered. Special thanks to Daan, Pieter and Victor for proofreading this paper. Last but not least, my fellow students and friends, for all the fun and hard work we had together. Silke, Victor, David, Arne, Stephan and Pieter, thank you.

III Contents

1 Introduction

2 Literature review

3 Methodology
   3.1 Data
   3.2 Predictors
   3.3 Classification algorithms
       3.3.1 Random forest
       3.3.2 XGBoost
       3.3.3 Logistic regression
       3.3.4 Support vector machines
       3.3.5 Deep learning
       3.3.6 Soft voting ensemble
   3.4 Model performance
   3.5 Cross-validation
   3.6 Information fusion sensitivity analysis

4 Discussion of results
   4.1 Data analytic model results
   4.2 Information fusion sensitivity analysis

5 Conclusion and practical implications

6 Limitations and future research

Appendices

A Top 30 most frequent games

IV List of Figures

1 Scree plot of sensitivity scores of top 100 variables
2 Correlation heatmap for the decrease in AUC across all predictors

V List of Tables

1 Literature review
2 Top five most frequently played games
3 Predictors
4 Median AUC, accuracy and top 10% decile lift for fivefold stratified cross-validation
5 Interquartile ranges
6 Average ranks for AUC, accuracy and lift
7 Top 15 variables based on decrease in AUC
8 Top 10 variables based on decrease in AUC without like categories
9 Top 30 most frequent games

VI Abbreviations

AUC area under receiver operating characteristic curve

LASSO least absolute shrinkage and selection operator

MMORPG massive multiplayer online role playing game

RBF Gaussian radial basis function

RPG role playing game

SVM support vector machine

VII Abstract

Due to a high barrier to entry and a low barrier to exit, social game developers face difficulties in finding and retaining customers. In order to aid marketers in their targeting campaigns, this study seeks to find out (1) whether it is possible to predict gaming behavior using Facebook data, (2) which machine learning algorithm is best suited to do so and (3) which variables have the greatest impact on one’s gaming behavior. With the use of a custom-built Facebook application, we gathered the relevant data of 5010 profiles, from which we used a total of 166 variables. We benchmarked six different algorithms (random forest, logistic regression, XGBoost, deep learning, support vector machines and a hybrid ensemble) and compared their performance in terms of AUC, accuracy and top 10% decile lift, using fivefold stratified cross-validation. With AUC scores between 0.6946 and 0.7459, accuracy scores between 0.6893 and 0.7642 and top 10% decile lift between 2.3070 and 2.5750, we can state that we have found a viable approach to targeting potential gamers. Moreover, we performed an information fusion sensitivity analysis in order to find the most important variables. In terms of mean decrease in AUC, the most important variable is the number of likes for community pages. This study contributes to existing theory and practice by presenting a data-analytical approach to the acquisition of potential gamers using Facebook data.

1 Introduction

With more than 445 million Facebook users who play games [1], games have become an important part of Facebook’s platform. However, revenues directly generated from these games have been declining over the past years [5]. Most of these games are social games: they can be played on mobile platforms such as Facebook or Google+ and often use a freemium model. This implies that games are free to play, but additional content to improve the gaming experience (e.g., no advertisements, extra content) has to be purchased. Typically, these games have a high barrier to entry (i.e., there is too much choice) and a low barrier to exit (i.e., there is no financial penalty for leaving the game). Given that Facebook has given game developers the opportunity to generate revenue from advertisements [4], which are also Facebook’s main source of income [5], the main challenge of this model is to attract new potential gamers. Research on gaming behavior has mainly been devoted to churn prediction in online games [31, 38, 47, 55, 57, 63, 64, 65]. These studies investigate whether a player will stop playing the focal game. However, the targeting of new potential gamers through a data-analytical approach has not received much attention in the field of gaming analytics. Finding the main determinants of a person’s reasons to play games can heavily aid game developers and gaming companies, as the acquisition of new potential gamers is much more expensive than maintaining current customers. Reinartz and Kumar [56] concluded that it is more profitable for a company to maintain and satisfy its current customers than to put the emphasis on renewing its customer base. However, due to the high churn rate that is inherent to freemium games, retaining current customers is very difficult [35]. Despite the fact that acquiring new customers is of great importance, no study has evaluated the feasibility of a system that identifies potential gamers using Facebook data.
This study is the first to explore the prediction of gaming behavior through Facebook data. Despite the prevalence of demographic research, which focuses on identifying what type of person plays which type of games, little or no research has so far been available to academics or practitioners on the predictive side of gamer acquisition. We believe that this demographic research is part of the acquisition process of gamers, since it gives marketers a good overview of who their target customers are. To address this gap in the literature, we study whether it is feasible to identify gamers on Facebook. We define a gamer as a person who has an interest in games that are currently being offered on a mobile platform. A person is considered a gamer when he or she has liked a page of one or more of these social games (e.g., Candy Crush, Farmville). For this study, we extracted the Facebook profiles of 5010

individuals. In order to do this, we developed an application that extracts the user’s data. With their authorisation, our application obtained all the relevant profile data. To evaluate the capacity of Facebook data to predict gaming behavior, we benchmark several algorithms (i.e., random forest, neural networks, logistic regression, XGBoost, support vector machines and a hybrid ensemble method) and compare their predictive performance. In a next phase, we also assess the most important drivers of gaming behavior. The remainder of this paper is structured as follows: first, we review the existing literature on gaming analytics. Second, we elaborate on our methodology. Third, we discuss our results. Finally, we elaborate on the limitations and possible avenues for further research.

2 Literature review

In the domain of gaming analytics, extensive research has been done in the fields of gamer monetization [25, 26] and social engagement between players [7, 22]. The field of gamer monetization attempts to find the right pricing strategy for a given type of game, while research on social engagement between players focuses on how communities form in online games, how the interaction between players affects the gaming experience, and how that interaction influences churn rates. However, these fields are not in the scope of this study. The area of interest we focus on in this study is the prediction of gaming behavior. Within this area, there are two important branches: (1) the prediction of churn and (2) the prediction of gaming engagement. Literature on gaming behavior can be classified according to the data that has been used, whether the aim is to explain churn or acquisition, and whether or not the study is predictive. The data used can be classified into two categories: game metrics and user data. Game metrics is a broad term that covers various kinds of in-game player data (e.g., completion time, progression rate and total number of deaths). Several studies focus on the recent activity of a user in order to predict churn [12, 57, 64], others focus more on in-game aspects such as completion time or number of sessions [31, 47], while still others study the social aspects and interactions in games and their influence on churn rates [38]. User data is demographic data, such as age, gender, ethnicity or education.

Table 1: Literature review

Author                        Data           Acquisition  Churn  Predictive
Runge et al. [57]             Game metrics                x      x
Xie et al. [64]               Game metrics                x      x
Mahlmann et al. [47]          Game metrics                x      x
Borbora and Srivastava [12]   Game metrics                x      x
Kawale et al. [38]            Game metrics                x      x
Hadiji et al. [31]            Game metrics                x      x
Xie et al. [65]               Game metrics                x      x
Xie et al. [63]               Game metrics                x      x
Periáñez et al. [55]          Game metrics                x      x
Griffiths and Hunt [28]       Survey         x
Brox [14]                     Survey         x
Williams et al. [61]          Survey         x
Sung et al. [59]              Survey         x
Wohn et al. [62]              Survey         x
Our study                     Facebook data  x                   x

The overarching objective of these studies was to predict churn behavior using different sorts of data (i.e., game metrics, user data, or a combination of both). When we look at the literature on the acquisition of gamers, we notice that these studies are mainly based on surveys [14, 28, 61, 66]. Studies on the acquisition of gamers investigate which characteristics of the gamer drive his or her gaming behavior. This can help advertisers to target particular segments of the market. These studies found that age and gender have an influence on gaming behavior. For example, according to Williams et al. [61], the average age of gamers in a popular massive multiplayer online role playing game (MMORPG) is 31.16 years, with a standard deviation of 9.65 years. Gender is found to be significant as well, with 80.80% of the gaming population male and 19.20% female. Finally, Griffiths and Hunt [28] and Brox [14] found that males were likely to play more frequently than females. Another important study in this area is the one conducted by Alsén et al. [7]. This study evaluated the effects of introducing social game play features in casual games (i.e., the opportunity for ’team battles’). All three games that were used in that study are also included in our list of games. The study showed that the introduction of social game play features has a positive impact on gameplay experience, loyalty and monetization. Another notable finding is that two-thirds of casual social gamers are female. This is in line with the findings of Sung et al. [59], who conducted a survey on social game players and found that females are more likely to play these types of games daily.
However, we notice that these studies are mainly based on surveys, and hence none of them gives insight into whether gaming behavior can be predicted and whether these demographic variables play an important role in such a prediction. They only give a glimpse of the gamer population and its characteristics. Table 1 gives an overview of the literature on gaming behavior. It is clear that, to the best of our knowledge, no study has investigated whether it is feasible to predict gaming behavior using Facebook data and hence to use Facebook data to identify and target potential gamers. Because Facebook gives game developers the ability to develop games on its platform, this study can be of great value for both parties. Most of these games are played on the Facebook platform, and in order to enjoy the social aspect, such as competing with friends, players need to give the game access to their Facebook profile. This market is mostly dominated by large companies with many games in their portfolio, such as King, Electronic Arts or Zynga. For example, King has produced over 200 games, with more than 285 million monthly active users [3]. If this study proves that our approach is viable, these companies can utilize this data in several ways: they can adjust their marketing approach to advertise other games in their portfolio, or identify new potential gamers with the approach followed in this study.

Studies have proven that a one-to-one targeting approach is much more effective than mass marketing [15]. Marketing in itself is an investment decision, where the marketer decides where to allocate the budget for acquiring new customers. Given the fact that renewing your customer base is more expensive than maintaining your current customer base [56], this investment decision should not be taken lightly. A model that is able to identify the most promising customers can help companies in their one-to-one marketing approach. Studies have shown that customized marketing strategies yield higher revenues than mass marketing [40]. If this study proves that it is possible to identify potential gamers, the marketing strategy of social game developers can be altered by using Facebook data. It gives access to a large pool of variables related to user behavior, preferences and characteristics. On top of that, Facebook has a user base of 2.13 billion monthly active users [1]. Having access to Facebook profiles thus gives companies better means of identifying potential customers. This is a point of interest for social game developers, since these games often use a freemium model and players can enter and leave freely, resulting in a high churn rate and a low fraction of users that make in-app purchases [57]. For example, Hui and Liu [35] found that, on average, the 7-day retention rate for new players across social games is 10.5%. Moreover, the sheer number of social games available creates a high barrier to entry. Whilst in the past more revenue was created from in-app purchases, the emphasis is now shifting towards advertisement revenues [5]. Increasing the player base thus increases the number of people that generate revenue, since advertisement revenue is driven by the number of views these advertisements receive.
This study contributes to existing theory and practice by (1) finding out whether it is possible to predict a person’s gaming interest using his or her Facebook data, (2) finding the best algorithms to do so and (3) determining which variables are the most important. Unlike previous gaming behavior studies, where the focus was on predicting churn, this study aims to predict one’s interest in games. Studies in the field of churn prediction have proven that it is possible to use user data to predict gaming behavior. Runge et al. [57] used game telemetry (i.e., rounds played, days in game, days since last purchase) to predict player churn in social games. When it comes to predicting behavior using social media data, examples abound demonstrating that social media data is a viable tool to predict one’s interests. Bogaert et al. have proven that it is possible to predict event attendance [10] and sport preference [9]. Moreover, in the field of emotion prediction, several studies have confirmed the effectiveness of social media data. Meire et al. [48] successfully built a sentiment prediction model with the use of Facebook data, whereas Bogaert et al. [11] effectively built several models to predict romantic ties. When it comes to algorithm performance when dealing with

predicting gaming behavior, we see that several algorithms have proven to perform well. Mahlmann et al. [47] used in-game data (i.e., causes of death, total number of deaths, playing time) to predict whether or not someone will stop playing the game Tomb Raider and, if one completes the game, how long that would take. In their study, logistic regression came out as the best performing algorithm. For churn behavior of high-value players in social games, Runge et al. [57] stated that neural networks were the best in terms of AUC. Periáñez et al. [55] applied survival analysis to churn prediction in social games; in their study, conditional inference survival ensembles were the best performing algorithms, followed by support vector machines.

When it comes to social gamer demographics, evidence in the literature seems to suggest that gender and age play an important role. Alsén et al. [7] point out that the average social gamer is a 43-year-old woman. Their study also confirmed that the introduction of social features has a positive impact on engagement, monetization and customer loyalty.

In sum, we believe that it will be possible to predict a person’s gaming behavior, with strong feature importance for age and gender. In terms of algorithms, neural networks, logistic regression and support vector machines have proven to be well performing algorithms when dealing with the prediction of gaming behavior. To the best of our knowledge, no studies have tackled the acquisition process of gamers through a data mining approach. In the next section, we discuss our methodology.

3 Methodology

3.1 Data

The data were extracted from Facebook by using a custom-built Facebook application for a European soccer team. Multiple incentives were given to stimulate participation. First of all, the application was regularly advertised on the Facebook page of the European soccer team. Second, a signed shirt of a famous soccer player was offered as a prize to the person who could answer various questions correctly. To raise awareness, we also added the application to the main page tabs. After launching the application, users were asked for their permission to extract data from their Facebook user profile. Along with this authorization box, we also added rules and regulations, as well as our contact information. We emphasized that all information would be anonymous. The application was available between May 7th, 2014 and May 26th, 2014. In total, 5010 people filled in our application. In order to determine

if one is a gamer, we extracted all unique values from the like category ’app page’. From this list, we excluded several non-gaming pages in order to be as precise as possible. In total, this resulted in a list of 645 unique games. An overview of the top five most frequently played games in the list is given in Table 2; the top 30 most frequent games can be found in Appendix A (Table 9). The response variable in our model is binary and resolves to one when a user has liked one or more pages that are included in our list. This led to a total of 1477 gamers and 3533 non-gamers.

Table 2: Top five most frequently played games

Name                                  Frequency
Farm Heroes Saga                      193
Candy Crush Saga                      169
Top Eleven - Be a Football Manager     98
Criminal Case                          84
Pet Rescue Saga                        47
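The labeling step described above can be sketched as follows; the page names are real examples from Table 2, but the data structures and user IDs are hypothetical:

```python
# Sketch of the response-variable construction: a user is labeled a
# gamer (1) if he or she liked at least one page from the curated game
# list. Page names below come from Table 2; users are hypothetical.
game_pages = {"Farm Heroes Saga", "Candy Crush Saga", "Criminal Case"}

liked_pages = {
    "user_a": {"Candy Crush Saga", "Some News Page"},
    "user_b": {"Some News Page"},
}

# Set intersection is non-empty exactly when the user liked a game page.
labels = {uid: int(bool(pages & game_pages)) for uid, pages in liked_pages.items()}
print(labels)  # {'user_a': 1, 'user_b': 0}
```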

3.2 Predictors

From our user-related variables, we selected 166 features. We included several variable categories: demographic variables (e.g., age, gender, relationship status), interest variables (e.g., more than 100 like categories, such as movies, games/toys, professional sports teams) and education variables (e.g., type of education). Other categories include general Facebook behavior data (e.g., profile completeness, number of likes for music or sports) and friend-related variables. In our model, we included three different friend metrics: the average age of one’s friends, the average number of game page likes of his or her friends and the average number of app page likes of his or her friends. The difference between the latter two is that game page likes concern games that are not offered on mobile platforms, in contrast to the app page games. These game page likes are therefore not in the scope of this study and not included in our definition of a gamer. The app page category is the game category we focus on in this study. Table 3 gives a summary of all variables. All variables were standardized before running the model. This ensures that all variables are on the same scale, making it easier to interpret the results and compare variables to one another. Another advantage of standardization is that it makes the training of our models faster, and for several algorithms standardization is preferred. For example, when dealing with algorithms that use gradient descent based optimization, such as support vector machines, logistic regression and neural networks,

we need to standardize our data [30]. We use z-score normalization, defined as follows [29]:

z = (x − µ) / σ    (1)

where µ is the mean of the variable and σ its standard deviation.
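As a quick numerical check, the standardization of Eq. (1) can be applied column-wise to a toy feature matrix (the real input is the 166-variable matrix described above):

```python
import numpy as np

# z-score standardization from Eq. (1), applied per column:
# subtract the column mean, divide by the column standard deviation.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

z = (X - X.mean(axis=0)) / X.std(axis=0)
print(z.mean(axis=0))  # approximately 0 per column
print(z.std(axis=0))   # 1 per column
```

scikit-learn's StandardScaler performs the same transformation and can be fit on the training folds only, which avoids leaking test-fold statistics during cross-validation.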

Table 3: Predictors Category Name Type Demographic Age INT Gender IND Social Relationship status IND Sexual preference IND Education Education type IND Personal Sports COUNT Friends COUNT Music COUNT Television COUNT Favorite teams COUNT Game interests COUNT Groups COUNT Movies COUNT Work COUNT Books COUNT Events COUNT Languages COUNT Family COUNT Inspirational people COUNT Interest COUNT General Facebook Profile completeness SUM(COUNT) Photo likes COUNT Album likes COUNT Status likes COUNT Checkin likes COUNT Statuses COUNT Video tags COUNT Photo tags COUNT Educations COUNT Photo’s COUNT Video’s COUNT Albums COUNT Likes 126 like categories COUNT Friend variables Game likes AVERAGE App page likes AVERAGE Number of friends AVERAGE

3.3 Classification algorithms

3.3.1 Random forest

The random forest algorithm combines random feature selection with bagging. Bagging is a technique where a tree is grown by binary recursive partitioning on a random bootstrap sample of the training set. At each node split, the best feature subset is chosen to maximise the decrease in impurity [13]. When all trees are grown, an average is taken over all predictions. This combination leads to higher robustness and eliminates the suboptimal performance of single decision trees [23]. For more information about the random forest algorithm, we refer to Breiman [13]. Because of the law of large numbers, random forest is less vulnerable to overfitting [13]; hence, we chose to use a large number of trees (1500). We use the RandomForestClassifier function of the Python package scikit-learn [54]. Other hyper-parameters are fine-tuned by grid search, using fivefold stratified cross-validation while iterating over all possible parameters. The following parameter intervals were evaluated: maximum depth of each decision tree: [10, 20, 30, ..., 100], minimum samples in each node to split: [10, 20, ..., 100] and minimum samples in each leaf: [1, 2, ..., 10]. The grid search procedure gave 50, 20 and 1, respectively, as the optimal parameters. To determine the maximum number of features considered at each split, we follow the recommendation of Breiman [13] and use the square root of the total number of features. Due to the class imbalance, the class weight parameter is set to balanced. This assigns each label a weight equal to the inverse of this label’s proportion in the dataset, which ensures that the estimator puts more emphasis on the minority label.
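A minimal sketch of this configuration with scikit-learn, on synthetic data (the tuned values are the ones reported above; the feature matrix here is only a stand-in for the real 166 Facebook variables):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the Facebook feature matrix; the 70/30 class
# weights mimic the 3533 non-gamers vs. 1477 gamers imbalance.
X, y = make_classification(n_samples=300, n_features=20,
                           weights=[0.7, 0.3], random_state=1)

rf = RandomForestClassifier(
    n_estimators=1500,        # large forest: random forests rarely overfit
    max_depth=50,             # tuned values reported in the text
    min_samples_split=20,
    min_samples_leaf=1,
    max_features="sqrt",      # Breiman's recommendation
    class_weight="balanced",  # compensate for the class imbalance
    random_state=1,
)
rf.fit(X, y)
print(rf.predict_proba(X[:2]).shape)  # (2, 2): P(non-gamer), P(gamer)
```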

3.3.2 XGBoost

XGBoost, short for eXtreme Gradient Boosting, is a state-of-the-art algorithm that has been performing extremely well in the past few years across various machine learning competitions [2]. The algorithm differs from the gradient boosting algorithm proposed by Friedman [27] in its scalability and efficiency, as well as its ability to handle sparse input data [16]. Gradient boosting combines an ensemble of weak models into a better prediction model; XGBoost tries to minimize the residual of the target function by adding a decision tree at each iteration. For a more elaborate explanation, we refer to Chen and Guestrin [16]. Important parameters for XGBoost are the learning rate and the number of trees. Smaller values of the learning rate lead to smaller, more conservative update steps, which is often preferable. As for the number of trees, it is advisable to use a large number. By grid search, we determined the optimal parameters. For the number of trees, we used the values [100, 200, ..., 500]; the maximum depth of each tree ranges from 10 to 100 with a step size of 10; and for the learning rate: [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3]. We found that the optimal number of trees is 500, with a maximum depth of 10 and a learning rate of 0.1.
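The same setup can be sketched with scikit-learn's GradientBoostingClassifier as a stand-in (the study itself uses the xgboost package, whose XGBClassifier accepts the same three parameters):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in sketch with the tuned values from the text: 500 trees,
# maximum depth 10, learning rate 0.1. Data is synthetic.
X, y = make_classification(n_samples=300, n_features=20, random_state=2)

gbm = GradientBoostingClassifier(n_estimators=500, max_depth=10,
                                 learning_rate=0.1, random_state=2)
gbm.fit(X, y)
print(gbm.score(X, y) > 0.9)  # deep boosted trees fit the training set closely
```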

3.3.3 Logistic regression

In this study, we use regularized logistic regression with an L1 penalty. The L1 penalty, also known as the least absolute shrinkage and selection operator (LASSO), is proven to perform well when dealing with a dataset that has a significant number of features [60]. The lasso technique shrinks the coefficients of irrelevant features to 0, which leads to higher prediction accuracy and easier interpretation of the model [50, 60]. Another benefit of regularization is that it avoids overfitting [44]. We use the LogisticRegression function from scikit-learn to implement the algorithm [54] and set the class weight parameter to balanced. Three parameters were set by grid search: the parameter C, which is the inverse of the regularization strength; the maximum number of iterations; and the tolerance for stopping. We used the following values for C: [0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000], where lower values mean stronger regularization. Values for the maximum number of iterations were [10, 100, 1000, 10000] and for the stopping tolerance [10^-6, 10^-7, 10^-8, 10^-9]. The optimal combination we found was C equal to 1000, a maximum of 100 iterations and a stopping tolerance of 10^-8.
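A sketch of this estimator on synthetic data, using the tuned values from the text (liblinear is a scikit-learn solver that supports the L1 penalty):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# L1-regularized (lasso) logistic regression with the tuned values
# from the text. Data is synthetic; only 5 of 30 features carry signal.
X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=3)

clf = LogisticRegression(penalty="l1", solver="liblinear",
                         C=1000,          # weak regularization, as tuned
                         max_iter=100, tol=1e-8,
                         class_weight="balanced")
clf.fit(X, y)
print(clf.coef_.shape)  # (1, 30): one coefficient per feature
```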

3.3.4 Support vector machines

Due to their solid mathematical foundation, support vector machines (SVM) became highly preferred as large-margin classifiers [53]. The kernel-based algorithm achieves good accuracy in comparison to classifiers that minimize the mean squared error, because the SVM algorithm maximizes the margin around the decision boundary [53]. The kernel function transforms the input into a high-dimensional feature space, where the algorithm attempts to find hyperplanes that separate the different instances from each other. Problems that often occur when implementing this high-dimensional feature space are its high dimensionality and implementation complexity [18]. A solution to these problems is the introduction of inner-product kernels. We decided to use the Gaussian radial basis function (RBF), the most used kernel in the literature [18]:

K(x, x′) = exp(−γ ‖x − x′‖²)    (2)

This bell-shaped function measures the distance between an instance and a landmark (i.e., a point of reference) [29]. Two parameters have to be defined for the implementation of the kernel function, C and γ. C is the penalty parameter of the error term and γ is the kernel parameter. Higher γ values lead to a narrower bell-shaped function, thus reducing the range of influence of each instance [29]. There are four main advantages to using this kernel. (1) In contrast to linear kernels, the RBF function is able to handle non-linear relationships between the class labels and features [34]. (2) The simplicity of hyper-parameter selection: with the RBF kernel you only need to specify two parameters, whereas the polynomial kernel has more parameters to specify [34]. (3) Whereas polynomial kernel values can reach to infinity, RBF kernel values only range between 0 and 1. (4) The linear kernel with a given parameter C has comparable performance to the RBF kernel with parameters C and γ [39]. Moreover, Lin and Lin [45] found that the sigmoid kernel behaves similarly to the RBF kernel when the parameters of the sigmoid function are in a certain range. We follow the recommendations of Coussement and Van den Poel [18] and perform a stratified fivefold grid search on exponential sequences of C and γ (i.e., C = 2^-5, 2^-3, ..., 2^13; γ = 2^-15, 2^-13, ..., 2^3). The best combination for this study is C = 2^11 and γ = 2^-15. We use the SVC function from scikit-learn to implement the support vector machine algorithm [54].
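A sketch of the tuned SVM on synthetic data; probability=True is enabled because the soft voting ensemble described later averages class probabilities:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# RBF-kernel SVM with the tuned values from the text: C = 2^11,
# gamma = 2^-15. Inputs should be standardized, as in Section 3.2.
X, y = make_classification(n_samples=300, n_features=20, random_state=4)

svm = SVC(kernel="rbf", C=2**11, gamma=2**-15,
          probability=True,  # enables predict_proba for soft voting
          random_state=4)
svm.fit(X, y)
print(svm.predict_proba(X[:2]).shape)  # (2, 2)
```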

3.3.5 Deep learning

A neural network is an algorithm that was created to model the human brain. A neural network’s structure consists of three parts: an input layer, an output layer and one or more hidden layers in between. The difference between a shallow neural network and deep learning is that with deep learning, multiple hidden layers are used in the model’s architecture. We use the Python package Keras to build our neural network [17]. The model is a linear stack of layers. It consists of six hidden layers, with six dropout layers. Dropout layers set a percentage of randomly chosen features to 0 at each update during training; we do this in order to counter overfitting [58]. The fraction of input units to drop is 30% in each layer. Before each dropout layer, we also include a batch normalization layer. This layer ensures that the distribution of the inputs stays the same (i.e., with a mean activation close to 0 and an activation standard deviation close to 1). Because of the use of activation functions, the input values can change dramatically deeper in the network, leading to longer training times and lower performance. The application of batch normalization has proven to be effective in reducing training time while increasing model performance [36]. The activation function we use is the

softplus function. This function is defined as follows [67]:

s(x) = log(1 + e^x)    (3)

This is a smoothed version of the widely used ReLU function [67]. It brings several advantages compared with the latter. It offers smoothness in the decision domain, which leads to higher stability of the estimates in both the positive and negative directions. A second advantage is that the function has a non-zero gradient when the input value is negative [67]. The softplus function came out as the best-performing activation function among the following candidates: tanh, ReLU, sigmoid, softmax and softplus. The initialization function, which determines how the initial random weights of the layers are set, is the random uniform function; this generates tensors with a uniform distribution. Finally, we use the binary cross-entropy loss function with the Adam optimizer. This optimization method comes with several advantages: it is easy to implement, computationally efficient and well suited to large data, both in terms of parameters and size, while having small memory requirements [41].
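The architecture described above can be sketched in Keras as follows. The layer width (64 units) and input size are assumptions not stated in the text; the layer ordering (dense, batch normalization, then 30% dropout), softplus activation, random-uniform initialization and binary cross-entropy with Adam follow the description.

```python
# Illustrative sketch of the described network: six hidden layers, each
# with batch normalization before a 30% dropout layer. Layer width (64)
# is an assumed value.
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_features):
    model = keras.Sequential()
    for _ in range(6):
        model.add(layers.Dense(64, activation="softplus",
                               kernel_initializer="random_uniform"))
        model.add(layers.BatchNormalization())  # stabilize layer inputs
        model.add(layers.Dropout(0.3))          # drop 30% of units
    model.add(layers.Dense(1, activation="sigmoid"))  # binary output
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.build(input_shape=(None, n_features))
    return model

model = build_model(166)  # 166 features, as in this study
print(model.output_shape)
```

The model would then be trained with `model.fit(X_train, y_train)` on the base table.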

3.3.6 Soft voting ensemble

Most ensemble methods train a single learning algorithm, where the training data is typically manipulated by bagging or boosting [24]. In this study, we use a hybrid ensemble approach. The idea behind these voting ensembles is that they balance out the individual weaknesses of the single classifiers [42]. A voting ensemble trains multiple different algorithms and combines their predictions into one prediction. Studies have shown that ensemble learning methods outperform single-classifier methods [46]. The major advantage of ensemble learning is the improvement in generalization ability over single learners [68]. In this study, we build an ensemble of five different classifiers (i.e. random forest, XGBoost, support vector machines and logistic regression). In voting ensembles, each single component is used to predict the class outcome. The most used schemes are the hard voting rule, the soft voting rule (or sum rule), the max rule and the product rule [42]. With hard voting, each single classifier predicts the binary outcome of an instance and the outcome with the most votes is selected. With soft voting, each algorithm produces probabilistic estimates; the ensemble then uniformly averages all probabilities and predicts the class with the highest average probability. For example, consider the following five estimates of P(y = 1) across five different classifiers: 0.3, 0.6, 0.55, 0.35 and 0.6. Under the hard voting rule, the prediction of our model would be 1. Under the soft voting rule, however, the a posteriori probability would be:

(0.3 + 0.6 + 0.55 + 0.35 + 0.6) / 5 = 0.48

Hence, in this case the outcome would be 0. We choose the soft voting rule to predict our outcome, because it has proven to be the best-performing rule when combining classifiers [42]. The Scikit-learn function VotingClassifier was used to implement the algorithm [54].
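The difference between the two voting rules can be demonstrated with the worked example above; the probabilities are the same five estimates of P(y = 1).

```python
# Hard vs. soft voting on the worked example above: five classifiers'
# estimates of P(y = 1).
probs = [0.3, 0.6, 0.55, 0.35, 0.6]

# Hard voting: threshold each estimate, then take the majority vote.
hard = 1 if sum(p > 0.5 for p in probs) > len(probs) / 2 else 0

# Soft voting: uniformly average the probabilities, then threshold.
avg = sum(probs) / len(probs)
soft = 1 if avg > 0.5 else 0

print(hard, round(avg, 2), soft)  # 1 0.48 0
```

In Scikit-learn the same behaviour is obtained by passing `voting="soft"` to `VotingClassifier`, which averages the `predict_proba` outputs of the fitted estimators.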

3.4 Model performance

In order to evaluate our models, we use three performance metrics that are often used in CRM, namely AUC, the accuracy score and the top 10% decile lift [19]. All predictions are made as probabilistic outcomes, so we had to decide which threshold to use for our accuracy metric. This arbitrary choice of a threshold is the major drawback of the accuracy score. Accuracy, the percentage of predictions that are classified correctly, is defined as follows [33]:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)

Where TP stands for true positives, TN for true negatives, FP for false positives and FN for false negatives. In this study, we opted for a 50% threshold, because many of these social games have a low customer lifetime expectancy [35]. For these types of games, it is in the interest of marketeers to target customers who are even somewhat likely to show interest in their game, because social game producing companies typically offer a wide variety of games. Hence the decision for a 50% threshold. Working with thresholds, however, has several downsides. First, due to the cut-off rule, the accuracy score is very vulnerable to class imbalance. Second, the performance is very sensitive to the chosen cut-off value [33]. To cope with the problems that cut-off related performance measures bring, we use the AUC score. AUC allows us to evaluate our models over all possible cut-off values [18]. We define AUC as follows:

AUC = ∫₀¹ (TP / (TP + FN)) d(FP / (FP + TN))    (5)

AUC is the probability that a randomly selected pair of positive and negative labels is ranked correctly with regard to their outcome [32]. AUC values range from 0.5 to 1. A value of 0.5 implies that the model does not perform better than random guessing; a value of 1 means that the model generates perfect predictions.

The last performance measure we use is the top 10% decile lift. This metric is often used in the field of marketing, because it targets the top 10% of instances with the highest probability of a positive outcome [18]. In our case, this gives the top 10% of profiles that are most likely to be gamers. The top 10% decile lift is defined as follows:

Lift = [P_top10% / (P_top10% + N_top10%)] / [P / (P + N)]    (6)

The top decile lift allows us to compare the different algorithms in terms of predicting the players who are most likely to become a customer. Higher lift scores mean better-performing algorithms. The minimum top 10% decile lift equals 1; the maximum is equal to 1 divided by the percentage of positive labels in our dataset, which is 1 / 29.48% = 3.39.
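The three metrics can be computed as sketched below. The toy labels and scores are illustrative; Scikit-learn provides accuracy and AUC directly, while the top 10% decile lift of equation (6) needs a few lines of its own.

```python
# Hedged sketch of the three evaluation metrics: accuracy at a 50%
# threshold, AUC, and top 10% decile lift. Labels/scores are toy data.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
y_score = np.array([0.9, 0.2, 0.4, 0.7, 0.1, 0.3, 0.35, 0.8, 0.15, 0.25])

acc = accuracy_score(y_true, (y_score >= 0.5).astype(int))  # 50% cut-off
auc = roc_auc_score(y_true, y_score)

def top_decile_lift(y, scores, frac=0.10):
    """Positive rate among the top-scored fraction over the base rate."""
    n_top = max(1, int(round(len(y) * frac)))
    top = np.argsort(scores)[::-1][:n_top]  # highest-scored instances
    return y[top].mean() / y.mean()

lift = top_decile_lift(y_true, y_score)
print(acc, auc, round(lift, 2))
```

On this toy data the scores separate the classes perfectly, so accuracy and AUC are both 1.0 and the lift reaches its maximum of 1 divided by the base rate.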

3.5 Cross-validation

In order not to over- or underestimate our model performances, we opted for stratified fivefold cross-validation. Stratification ensures that in each fold, the classes are represented in the same proportion as in the original dataset [43]. Studies have shown that stratified cross-validation performs better in terms of variance and bias than regular cross-validation [43]. Stratified cross-validation starts by dividing the data into k folds, where the proportion of labels is the same as in the entire data set. The model is fit on k-1 folds, while the remaining fold is used for testing. This process is repeated k times, so each fold is used k-1 times as a training fold and once as a test fold; in our case, for k = 5, this results in five different estimates for accuracy, AUC and top 10% decile lift [43]. To implement the stratified fivefold cross-validation, the 'StratifiedKFold' function of Scikit-learn was used [54]. To find out whether or not our classifiers differ from each other, we follow Demšar's recommendations and use the Friedman test followed by a Bonferroni-Dunn post hoc test [21]. The Friedman test is a non-parametric equivalent of the repeated-measures ANOVA. It assigns ranks to all classifiers, from 1 (the strongest) to k, where k equals the number of classifiers. In case of ties, average ranks are used. Under the null hypothesis, all classifiers perform equally. If this hypothesis is rejected, Demšar suggests continuing with a Bonferroni-Dunn post hoc test to find out which classifiers differ significantly from the best-performing classifier. If the difference between average ranks exceeds the critical value, 2.576 in this study, we can reject the null hypothesis and conclude that the classifier differs significantly from the best-performing classifier [21].
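The procedure can be sketched as follows: stratified fivefold splits produce per-fold AUC scores for each classifier, on which the Friedman test is then run. The data and the three stand-in classifiers are illustrative assumptions; `scipy.stats.friedmanchisquare` implements the Friedman test used here.

```python
# Sketch of stratified fivefold cross-validation followed by a Friedman
# test on the per-fold AUC scores. Data and models are stand-ins.
from scipy.stats import friedmanchisquare
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
models = {
    "rf": RandomForestClassifier(random_state=0),
    "lr": LogisticRegression(max_iter=1000),
    "svm": SVC(probability=True, random_state=0),
}

scores = {name: [] for name in models}
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    for name, clf in models.items():
        clf.fit(X[train_idx], y[train_idx])
        p = clf.predict_proba(X[test_idx])[:, 1]
        scores[name].append(roc_auc_score(y[test_idx], p))

# Null hypothesis: all classifiers have equal per-fold AUC distributions.
stat, pval = friedmanchisquare(scores["rf"], scores["lr"], scores["svm"])
print(round(stat, 2), round(pval, 4))
```

If the null hypothesis is rejected, the Bonferroni-Dunn comparison then checks, per classifier, whether its average rank differs from the best performer's by more than the critical difference.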

3.6 Information fusion sensitivity analysis

After performing the cross-validation, the last step in our analysis is the sensitivity analysis, combined with information fusion. Researchers agree that a single model is not sufficient to summarize all important predictors [10, 20, 51]. Aggregating results improves the information in terms of accuracy and robustness. To assess the variable importances, we follow the approach of Oztekin [52], with the difference that we use the weighted average of the mean decrease in AUC instead of accuracy. To find the decrease in AUC, we fit our model, for each algorithm, 166 times. On each iteration, we leave one different feature out of the fit procedure and calculate the difference between the AUC of the model with the feature and the AUC of the model without it. Hence, variables with a large decrease in AUC are more important for predicting one's interest in games than variables with a lower decrease. Several other model performance metrics have been used in the past to measure variable importance, for example the mean decrease in Gini index or the mean decrease in accuracy [13]. As noted earlier, most of these performance measures are sensitive to the distribution of the data [37]. We follow the recommendations of Janitza et al. [37] and use the mean decrease in AUC to cope with the problems that data imbalance brings. To complete our information fusion process, we use the following equation to compute our variable importances:

S_θ(fused) = Σ_{i=1}^{r} λ_i S_{i,θ} = λ_1 S_{1,θ} + λ_2 S_{2,θ} + ... + λ_r S_{r,θ}    (7)

Here, S_{i,θ} is the importance of feature θ in model i, with r the number of models. λ_i is the weight assigned to model i. These weights are calculated from the AUC scores of our stratified fivefold cross-validation. Thus, models with a higher AUC have a higher impact on the fused sensitivity score than models with lower AUC scores.
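The leave-one-feature-out procedure and the fusion of equation (7) can be sketched as below. The data, the two stand-in models and the single train/test split are illustrative simplifications of the study's setup (which uses six algorithms, 166 features and cross-validated AUCs).

```python
# Hedged sketch of leave-one-feature-out sensitivity analysis with
# AUC-weighted information fusion, cf. equation (7). Toy data/models.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=2)

def auc_of(clf, Xtr, Xte):
    clf.fit(Xtr, y_tr)
    return roc_auc_score(y_te, clf.predict_proba(Xte)[:, 1])

models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(random_state=2)]

full_auc = np.array([auc_of(m, X_tr, X_te) for m in models])
weights = full_auc / full_auc.sum()  # lambda_i: AUC-based model weights

n_features = X.shape[1]
S = np.zeros((len(models), n_features))  # S[i, theta]: drop in AUC
for i, m in enumerate(models):
    for theta in range(n_features):
        keep = [j for j in range(n_features) if j != theta]
        S[i, theta] = full_auc[i] - auc_of(m, X_tr[:, keep], X_te[:, keep])

fused = weights @ S                  # equation (7): weighted sum over models
ranking = np.argsort(fused)[::-1]    # most important feature first
print(ranking[:3])
```

Features whose removal causes the largest weighted drop in AUC come out on top, mirroring the rankings reported in Tables 7 and 8.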

4 Discussion of results

4.1 Data analytic model results

The results of the stratified fivefold cross-validation for accuracy, AUC and lift score are summarized in Table 4. All values are calculated as the median of our cross-validation outcomes. Table 5 gives an overview of the corresponding interquartile ranges.

Table 4: Median AUC, accuracy and top 10% decile lift for fivefold stratified cross-validation

            RF       LR       XGBoost  SVM      Voting   Deep learning
AUC         0.7024   0.7284   0.6946   0.7459   0.7340   0.7315
Accuracy    0.7403   0.6893   0.7372   0.7033   0.7642   0.7323
Lift        2.3070   2.5410   2.3070   2.5750   2.4430   2.4430

Table 5: Interquartile ranges

            RF       LR       XGBoost  SVM      Voting   Deep learning
AUC         0.0460   0.0365   0.0363   0.0438   0.0437   0.0143
Accuracy    0.0145   0.0322   0.0175   0.0375   0.0354   0.0215
Lift        0.3690   0.3730   0.3670   0.3400   0.2340   0.2680

It is clear that, in terms of median top 10% decile lift and AUC, SVM is the best-performing algorithm, with an AUC of 0.7459 and a top 10% decile lift of 2.5750. As mentioned above, the minimum AUC score, which would mean that a model does not perform any better than random selection, is 0.5. With even the worst performer, XGBoost, reaching a median AUC of 0.6946, we can state that all our models perform substantially better than random selection. The top 10% decile lift indicates how well our model can identify the top 10% most likely gamers. A value of 1 indicates that the model does not perform better than random selection, while the maximum value equals 1 divided by the proportion of gamers in our dataset, i.e. 3.39. A lift score of 2.575 for the support vector machine model means that it identifies the most likely gamers 2.575 times better than random selection. With the lowest score of 2.3070, random forest and XGBoost share the last place.

When we look at the accuracy scores, we see that the voting ensemble takes first place, with a score of 0.7642. Logistic regression is the worst-performing algorithm, with a score of 0.6893. This implies that when dealing with cut-off related metrics, the voting ensemble is the best algorithm to work with. Using the Friedman test, we tested the null hypothesis of no significant differences between our model performance metrics. Table 6 gives an overview of all rankings, with the χ² statistic and p-value. Taking a closer look at the AUC rankings, the χ² of the Friedman test equals 18.03, p < 0.003. This means that we can reject the null hypothesis and continue with the Bonferroni-Dunn post hoc test. If the difference in average ranks between our top-performing algorithm and the algorithm we want to compare exceeds the critical value of 2.576, we can conclude that there is a significant difference in performance between the two algorithms. We found a significant difference between SVM on the one hand and both random forest and XGBoost on the other. The same analysis can be made for the accuracy scores. With a χ² of 22.37, p < 0.005, we can reject the null hypothesis of equally performing classifiers. Here, there is a significant difference between logistic regression and both random forest and the hybrid ensemble, and between the hybrid ensemble and the support vector machine. Contrary to the findings for AUC and accuracy, we cannot reject the null hypothesis for the lift scores: with a χ² value of 7.6, there is no significant difference, so no post hoc test is needed in this case.

Table 6: Average ranks for AUC, accuracy and lift

            RF    LR    XGBoost  SVM   Voting  Deep learning  Friedman χ²
AUC         4.8   3.4   5.6      1.2   2.4     3.6            18.03, p < 0.003
Accuracy    2.2   6.0   2.8      4.8   1.2     4.0            22.37, p < 0.005
Lift        3.9   4.9   3.8      2.9   1.9     3.6            7.60, p = 0.1795

As a final evaluation criterion for our models, we determined the stability across the five folds. To do this, we calculated the interquartile ranges reported in Table 5. The interquartile range is the difference between the third and first quartile: the lower it is, the more stable the predictions. Relative to the median scores, the IQR ranges from 1.95% to 6.55% for AUC, from 1.96% to 5.33% for accuracy and from 9.6% to 16% for the lift scores. We find that in terms of AUC, our neural network is the most stable. For accuracy, random forest is the most robust, and for the top 10% decile lift the hybrid ensemble takes first place. These findings confirm our hypothesis that it is a viable approach to predict gaming

behavior using Facebook data. Our main research question in this study consists of three parts: (1) whether or not it is possible to predict one's gaming behavior using Facebook data, (2) which algorithm performs best and (3) which are the main determinants of a person's gaming behavior. As for (1) and (2), the results shown in Table 4 clearly indicate that it is possible to predict gaming behavior, with support vector machines as the best-performing algorithm with regard to median AUC and lift score. In the next section, we elaborate on the third part of the research question.

4.2 Information fusion sensitivity analysis

Most machine learning algorithms nowadays are considered 'black box' models, particularly neural networks [49]. Little or no information can be retrieved from the model itself regarding feature importance. To cope with this problem, we use the information fusion sensitivity analysis described in equation 7. Information fusion incorporates the feature importance of each algorithm, putting more weight on better-performing algorithms, which decreases model uncertainty and increases model insight [20]. We iterated over each feature in our dataset, where in each iteration one feature was left out and the decrease in AUC was calculated. We repeated this process for each algorithm, so for each variable we obtained six different decrease-in-AUC scores. We took the weighted average of each algorithm's AUC, multiplied it with the respective sensitivity scores and summed these values. Figure 1 is a scree plot of the variable importances of the top 100 predictors in the dataset, ranked from highest to lowest decrease in AUC. We can see that adding features ranked below the 14th predictor results in only minor improvements.

Figure 1: Scree plot of sensitivity scores of the top 100 variables

Table 7 gives an overview of the top 15 features in terms of mean decrease in AUC. Several interesting conclusions can be drawn from this table. 14 out of the top 15 features are like categories. This may be explained by several hypotheses. First of all, our dependent variable is itself built from a like category: profiles with a significant number of likes for certain types of categories probably correlate highly with liking a game page. Second, the games in our study are social games (i.e. games played on social platforms such as Facebook or Google+). People with a high number of likes across several categories are probably more active on Facebook than people with fewer likes, which might explain why this matters so much for gaming on social platforms. Another interesting conclusion is that some of our top predictors are based on the social aspect of Facebook (e.g. likes for community pages and the average number of friends' likes for games). This is in line with previous studies on gaming behavior. Alsén et al. [7] have shown that the social aspect of social games is of great importance for customer loyalty, engagement and monetization. Finally, we want to point out that these like categories are very interesting for marketing campaigns: they are an easy-to-measure indicator for companies seeking to target potential customers.

Table 7: Top 15 variables based on decrease in AUC

Rank  Feature                                     Sensitivity score
 1    Count(Likes for community)                  0.0470
 2    Count(Likes for athletes)                   0.0390
 3    Count(Likes for musicians/bands)            0.0254
 4    Count(Likes for TV shows)                   0.0213
 5    Count(Likes for books)                      0.0205
 6    Count(Likes for actors/directors)           0.0155
 7    Count(Likes for professional sport teams)   0.0153
 8    Count(Likes for public figures)             0.0152
 9    Count(Likes for food or beverages)          0.0127
10    Count(Likes for comedians)                  0.0105
11    AVG(Friend likes for games)                 0.0102
12    Count(Likes for companies)                  0.0094
13    Count(Likes for local business)             0.0092
14    Count(Likes for sport leagues)              0.0075
15    Count(Likes for movies)                     0.0073

Figure 2: Correlation heatmap for the decrease in AUC across all predictors

To find out which algorithms drive our sensitivity analysis, we included Figure 2, which gives an overview of the correlations between the feature sensitivity scores of the algorithms. We see that, apart from random forest, all algorithms are fairly correlated with each other. This indicates that most models react in the same way when specific variables are removed. Random forest is the least correlated with the other methods, with Pearson correlations between 0.17 and 0.22. The highest correlation is found between support vector machines and logistic regression, with a correlation coefficient of 0.98.

Table 8: Top 10 variables based on decrease in AUC, excluding like categories

Rank  Feature                                     Sensitivity score
 1    AVG(Friend likes for games)                 0.0102
 2    Count(Family members)                       0.0050
 3    IND(Gender)                                 0.0044
 4    Count(Friends)                              0.0041
 5    Count(Favourite teams)                      0.0038
 6    Profile completeness                        0.0038
 7    Count(Video uploads)                        0.0037
 8    Count(Photo uploads)                        0.0036
 9    IND(Relationship status == relationship)    0.0034
10    Count(Educations)                           0.0029

Table 8 gives an overview of the most important features when the like categories are left out. We removed these feature importances in order to find out whether our results are in line with the survey studies mentioned in Table 1. Among the non-like related features, the average number of game-page likes in a profile's network is the most important, followed by the number of family members, gender and the number of friends. This is in line with several other studies. Alsén et al. [7] found that for social games, gender and age play an important role in someone's gaming behavior, as does the social aspect of gaming, such as the number of friends and the interaction between players.

5 Conclusion and practical implications

Due to the high barrier to enter and the low barrier to leave, social game marketeers face a difficult challenge in finding new potential customers for their games whilst retaining their current customers. Given that many of these games are offered on the Facebook platform and users often have to grant access to their profile data, game developers have access to a significant amount of Facebook data. Being able to predict one's gaming behavior with this data can aid marketeers in their acquisition process. Another important aspect is that social game developers do not focus on one game only: most of these companies produce a wide variety of games. For example, King is a social game development company with more than 200 games in its portfolio [3]. Three out of

five of the most frequent games in this study were produced by this company. The main question of this paper was whether or not it is possible to predict gaming behavior using Facebook data. To answer this question, the following steps were taken. First, we constructed a base table of various Facebook variables, listed in Table 3. Second, we benchmarked six different algorithms (i.e. random forest, XGBoost, logistic regression, support vector machines, deep learning and a hybrid ensemble). We compared these algorithms on three performance criteria obtained from a stratified fivefold cross-validation (i.e. top 10% decile lift, AUC and accuracy). Third, we evaluated the feature importances by performing an information fusion sensitivity analysis. This allows us to determine which variables play the most important role in determining whether someone is a gamer or not, which is of interest to marketeers in their search for new customers. With median AUC scores ranging between 0.6946 and 0.7459, accuracy between 0.6893 and 0.7642 and top 10% decile lift between 2.3070 and 2.575, the results clearly indicate that using Facebook data is a viable strategy for predicting gaming behavior. In terms of median AUC and top 10% decile lift, the support vector machine came out as the best-performing algorithm, whilst for accuracy the hybrid ensemble was superior. The results of the information fusion sensitivity analysis differed from what we initially expected. 14 of the 15 most important variables indicate the number of likes for certain categories. The most important variable is the number of likes for communities, followed by the number of likes for athletes and, in third place, likes for musicians/bands. A potential reason for these results can be found in the nature of our dependent variable: in this study, a gamer is defined as someone who liked one or more gaming pages on Facebook.
People who are more active on Facebook may be more attracted to these types of games than people who are less active. Another possible explanation for these results might be that social games are very diverse: they typically do not have one specific genre. The most traditional social game genres are simulation and resource management games (e.g. FarmVille), social casino games (e.g. Zynga Poker) and casual arcade games (e.g. Bubble Shooter) [7]. For example, the five games in Table 2 cover three different genres (i.e. puzzle games, a detective game and a role-playing game (RPG)). This wide variety of game types may explain why there are no demographic indicators in our top 15 variables. Although some of the variable importances are not in line with our initial expectations, network-related variables have been shown in the past to be important in determining whether or not someone is a gamer; two variables in our top 15 are network related, which is in line with these findings. To summarize, this study contributed to the literature by proving that Facebook data is a viable

resource for predicting gaming behavior. More concretely, we show that our models contain valuable insights for marketeers in their search for new potential customers. Due to their social aspect, the acquisition of new customers in social games may lead to a snowball effect, because social game players tend to invite their friends to play with or against them. As Wohn et al. [62] stated, the majority of social game players play with their friends and not alone. The results clearly indicate that it is a viable approach to target the most promising potential customers by following our recommendations.

6 Limitations and future research

Our study faces several limitations. A first limitation is the occurrence of selection effects. The data were obtained by building a Facebook application that extracts the relevant data from the people who used it. There is a chance that some people did not want to use our application because of privacy concerns, which raises the possibility that people who did use the application differ from people who did not opt into the contest. One way to cope with selection effects is to use web crawling [6]. However, due to privacy settings, some profiles cannot be extracted with this technique. In general, there is a large overlap between data gathered via a Facebook application and via web crawling [8]. Moreover, our extraction approach has several benefits compared to web crawling. First, web crawling can be seen as intrusive by some users, whereas in our approach we ask for the user's approval. Second, we provided a section explaining the relevant rules and regulations, along with our contact details. Third, we explicitly mentioned that all data would be handled anonymously. A second limitation is the definition of our dependent variable. Our criterion for someone being a gamer is that he or she liked one or more pages of social games. It is possible that he or she was only interested in the past, liked the page of a game played only once, or is not a gamer at all but simply liked a game page. However, we believe that when a person shows interest in a game page by liking it, it is safe to assume that he or she has played or is planning to play this game. The list of games used to determine whether someone is a gamer is not completely accurate either: it contains 645 games, but there are many more games available on the Facebook platform that are not incorporated in this list.
For a more robust approach, future research could focus on including actual in-game data along with recency variables, together with all games available at that time. A third drawback is that we could not incorporate exact friend-related gaming variables. Given

the fact that past studies have shown that the social aspect of gaming is a strong predictor, we believe that more specific friend-related variables could yield strong predictive power. For example, one variable that could be important in future analyses is the number of gamer friends a person has. A fourth limitation was the composition of our hybrid ensemble. Because we built our deep learning algorithm with Keras, we were unable to include it in the Scikit-learn ensemble. Future research might build an ensemble method that includes a deep learning algorithm, in order to obtain more robust results. A final limitation is that we only incorporated Facebook data. Social games are played across several different platforms, such as Steam, Google+, Android and iOS. To get a complete picture of the social gaming population, data from these platforms could be added to our models to gain better insights and more accurate predictions. Although this study faces several limitations, we believe that we still make a valuable contribution to the existing literature. To the best of our knowledge, we are the first to use Facebook data to successfully predict gaming behavior.

References

[1] Facebook, company info — facebook newsroom. https://newsroom.fb.com/company-info/. 2017.

[2] Kaggle. https://www.kaggle.com/. Accessed: 11/05/2018.

[3] King company information. https://discover.king.com/about/. Accessed: 21/05/2018.

[4] Monetizing games on facebook. https://developers.facebook.com/docs/games/gamesonfacebook/monetization. Accessed: 04/04/2018.

[5] Facebook nears ad-only business model as game revenue falls. https://www.reuters.com/article/us-facebook-revenue/facebook-nears-ad-only-business-model-as-game-revenue-falls-idUSKBN1802U7. Accessed: 23/04/2018.

[6] Cliff A.C. Lampe, Nicole Ellison, and Charles Steinfield. A familiar face(book): Profile elements as signals in an online social network. pages 435–444, 01 2007.

[7] Adam Alsén, Julian Runge, Anders Drachen, and Daniel Klapper. Play with me? Understanding and measuring the social aspect of casual gaming. CoRR, abs/1612.02172, 2016.

[8] Michel Ballings and Dirk Van den Poel. Crm in social media: Predicting increases in facebook usage frequency. European Journal of Operational Research, 244(1):248 – 260, 2015.

[9] Matthias Bogaert, Michel Ballings, and Dirk Van den Poel. The added value of facebook friends data in event attendance prediction. Decision Support Systems, 82:26 – 34, 2016.

[10] Matthias Bogaert, Michel Ballings, Martijn Hosten, and Dirk Van den Poel. Identifying soccer players on facebook through predictive analytics. Decision Analysis, 14(4):274–297, 2017.

[11] Matthias Bogaert, Michel Ballings, and Dirk Van den Poel. Evaluating the importance of different communication types in romantic tie prediction on social media. ANNALS OF OPERATIONS RESEARCH, 2017.

[12] Z. H. Borbora and J. Srivastava. User behavior modelling approach for churn prediction in online games. In 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Confernece on Social Computing, pages 51–60, Sept 2012.

[13] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, Oct 2001.

[14] Ricarda-Marie Brox. The influence of social factors on gaming behaviour, July 2011. URL http://essay.utwente.nl/60988/.

[15] Jonathan Burez and Dirk Van den Poel. Crm at a pay-tv company: Using analytical models to reduce customer attrition by targeted marketing for subscription services. Expert Systems with Applications, 32(2):277 – 288, 2007.

[16] Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system. CoRR, abs/1603.02754, 2016. URL http://arxiv.org/abs/1603.02754.

[17] François Chollet et al. Keras. https://keras.io, 2015.

[18] Kristof Coussement and Dirk Van den Poel. Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques. Expert Systems with Applications, 34(1):313 – 327, 2008.

[19] Kristof Coussement, Dries F. Benoit, and Dirk Van den Poel. Improved marketing decision making in a customer churn prediction context using generalized additive models. Expert Systems with Applications, 37(3):2132 – 2143, 2010.

[20] Ali Dag, Asil Oztekin, Ahmet Yucel, Serkan Bulur, and Fadel M. Megahed. Predicting heart transplantation outcomes through data analytics. Decision Support Systems, 94:42 – 52, 2017.

[21] Janez Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, December 2006.

[22] Nicolas Ducheneaut, Nicholas Yee, Eric Nickell, and Robert J. Moore. "Alone together?": Exploring the social dynamics of massively multiplayer online games. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '06, pages 407–416, New York, NY, USA, 2006. ACM. ISBN 1-59593-372-7.

[23] Sandrine Dudoit, Jane Fridlyand, and Terence P Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457):77–87, 2002.

[24] Sašo Džeroski and Bernard Ženko. Is combining classifiers with stacking better than selecting the best one? Machine Learning, 54(3):255–273, Mar 2004.

[25] Magy Seif El-Nasr. Game analytics: maximizing the value of player data. Springer, New York, 2013. ISBN 978-1-4471-4768-8.

[26] Tim Fields. Mobile & social game design: monetization methods and mechanics. CRC Press, Taylor & Francis Group, Boca Raton, second edition, 2014. ISBN 978-1-4665-9868-3.

[27] Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Ann. Statist., 29(5):1189–1232, 10 2001.

[28] Mark D. Griffiths and Nigel Hunt. Computer game playing in adolescence: Prevalence and demographic indicators. Journal of Community and Applied Social Psychology, 5(3):189–193, 1995.

[29] Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, Inc., 1st edition, 2017.

[30] Joel Grus. Data science from scratch: First principles with Python. O'Reilly Media, Inc., 2015.

[31] F. Hadiji, R. Sifa, A. Drachen, C. Thurau, K. Kersting, and C. Bauckhage. Predicting player churn in the wild. In 2014 IEEE Conference on Computational Intelligence and Games, pages 1–8, Aug 2014.

[32] J. A. Hanley and Barbara J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29–36, April 1982.

[33] H. He and E. A. Garcia. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9):1263–1284, Sept 2009.

[34] C.-W. Hsu, C.-C. Chang, and C.-J. Lin. A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University, 2003.

[35] Sam K. Hui and Yuzhou Liu. Understanding gamer retention in social games using aggregate DAU and MAU data: A Bayesian data augmentation approach. 2013.

[36] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015.

[37] Silke Janitza, Carolin Strobl, and Anne-Laure Boulesteix. An AUC-based permutation variable importance measure for random forests. BMC Bioinformatics, 14(1):119, Apr 2013.

[38] J. Kawale, A. Pal, and J. Srivastava. Churn prediction in MMORPGs: A social influence based approach. In 2009 International Conference on Computational Science and Engineering, volume 4, pages 423–428, Aug 2009.

[39] S. Sathiya Keerthi and Chih-Jen Lin. Asymptotic behaviors of support vector machines with gaussian kernel. Neural Comput., 15(7):1667–1689, July 2003.

[40] Romana Khan, Michael Lewis, and Vishal Singh. Dynamic customer management and the value of one-to-one marketing. Marketing Science, 28(6):1063–1079, 2009.

[41] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014. URL http://arxiv.org/abs/1412.6980.

[42] J. Kittler. Combining classifiers: A theoretical framework. Pattern Analysis and Applications, 1(1):18–27, Mar 1998.

[43] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'95, pages 1137–1143, San Francisco, CA, USA, 1995. Morgan Kaufmann Publishers Inc.

[44] Su-In Lee, Honglak Lee, Pieter Abbeel, and Andrew Y. Ng. Efficient L1 regularized logistic regression. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), 2006.

[45] Hsuan-Tien Lin and Chih-Jen Lin. A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Technical report, Department of Computer Science, National Taiwan University, June 2003.

[46] Richard Maclin and David W. Opitz. Popular ensemble methods: An empirical study. CoRR, abs/1106.0257, 2011. URL http://arxiv.org/abs/1106.0257.

[47] Tobias Mahlmann, Anders Drachen, Julian Togelius, Alessandro Canossa, and Georgios Yannakakis. Predicting player behavior in tomb raider: Underworld (pre-print). In Proceedings of the 2010 IEEE Conference on Computational Intelligence and Games, CIG2010, pages 178–185, 09 2010.

[48] Matthijs Meire, Michel Ballings, and Dirk Van den Poel. The added value of auxiliary data in sentiment analysis of facebook posts. Decision Support Systems, 89:98–112, 2016.

[49] Malihe Molaie, Razieh Falahian, Shahriar Gharibzadeh, Sajad Jafari, and Julien Clinton Sprott. Artificial neural networks: powerful tools for modeling chaotic behavior in the nervous system. In Front. Comput. Neurosci., 2014.

[50] Andrew Y. Ng. Feature selection, l1 vs. l2 regularization, and rotational invariance. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML ’04, pages 78–, New York, NY, USA, 2004. ACM. ISBN 1-58113-838-5.

[51] Asil Oztekin. A hybrid data analytic approach to predict college graduation status and its determinative factors. Industrial Management & Data Systems, 116(8):1678–1699, 2016.

[52] Asil Oztekin. A hybrid data analytic approach to predict college graduation status and its determinative factors. Industrial Management & Data Systems, 116(8):1678–1699, 2016.

[53] Krupal S. Parikh and Trupti P. Shah. Support vector machine – a large margin classifier to diagnose skin illnesses. Procedia Technology, 23:369–375, 2016. URL http://www.sciencedirect.com/science/article/pii/S2212017316300408. 3rd International Conference on Innovations in Automation and Mechatronics Engineering 2016, ICIAME 2016, 05–06 February 2016.

[54] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[55] Á. Periáñez, A. Saas, A. Guitart, and C. Magne. Churn prediction in social games: Towards a complete assessment using survival ensembles. In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 564–573, Oct 2016.

[56] Werner J. Reinartz and V. Kumar. The impact of customer relationship characteristics on profitable lifetime duration. Journal of Marketing, 67(1):77–99, 2003.

[57] Julian Runge, Peng Gao, Florent Garcin, and Boi Faltings. Churn prediction for high-value players in casual social games. In 2014 IEEE Conference on Computational Intelligence and Games (CIG), pages 1–8, August 2014. ISBN 978-1-4799-3547-5.

[58] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.

[59] Jieun Sung, Torger Bjornrud, Yu-Hao Lee, and Donghee Yvette Wohn. Social network games: exploring audience traits. In CHI Extended Abstracts, 2010.

[60] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267–288, 1996. ISSN 00359246.

[61] Dmitri Williams, Nick Yee, and Scott E. Caplan. Who plays, how much, and why? Debunking the stereotypical gamer profile. Journal of Computer-Mediated Communication, 13(4):993–1018, 2008.

[62] D. Y. Wohn, C. Lampe, R. Wash, N. Ellison, and J. Vitak. The "S" in social network games: Initiating, maintaining, and enhancing relationships. In 2011 44th Hawaii International Conference on System Sciences, pages 1–10, Jan 2011.

[63] H. Xie, S. Devlin, D. Kudenko, and P. Cowling. Predicting player disengagement and first purchase with event-frequency based data representation. In 2015 IEEE Conference on Computational Intelligence and Games (CIG), pages 230–237, Aug 2015.

[64] Hanting Xie, Daniel Kudenko, Sam Devlin, and Peter Cowling. Predicting Player Disengagement in Online Games. In Tristan Cazenave, Mark H. M. Winands, and Yngvi Björnsson, editors, Computer Games: Third Workshop on Computer Games, CGW 2014, Held in Conjunction with the 21st European Conference on Artificial Intelligence, ECAI 2014, Prague, Czech Republic, August 18, 2014, Revised Selected Papers, pages 133–149. Springer International Publishing, Cham, 2014.

[65] Hanting Xie, Sam Devlin, and Daniel Kudenko. Predicting disengagement in free-to-play games with highly biased data. In Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference, 2016.

[66] Nick Yee. The demographics, motivations, and derived experiences of users of massively multi-user online graphical environments. Presence: Teleoper. Virtual Environ., 15(3):309–329, June 2006.

[67] Hao Zheng, Zhanlei Yang, Wenju Liu, Jizhong Liang, and Yanpeng Li. Improving deep neural networks using softplus units. In 2015 International Joint Conference on Neural Networks (IJCNN), pages 1–4, July 2015.

[68] Zhi-Hua Zhou and Xu-Ying Liu. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering, 18(1):63–77, Jan 2006.

Appendices

A Top 30 most frequent games

Table 9: Top 30 most frequent games

 Rank  Name                                   Frequency
    1  Farm Heroes Saga                             193
    2  Candy Crush Saga                             169
    3  Top Eleven - Be a Football Manager            98
    4  Criminal Case                                 84
    5  Pet Rescue Saga                               47
    6  Papa Pear Saga                                46
    7  Pudding Pop Community                         43
    8  Monster Busters                               43
    9  Pepper Panic Saga                             42
   10  Pig and Dragon Community                      40
   11  Pearl's Peril                                 27
   12  Solitaire in Wonderland Community             26
   13  Throne Rush                                   26
   14  Texas HoldEm Poker                            25
   15  Online Soccer Manager                         25
   16  Buggle Community                              20
   17  TrainStation                                  20
   18  Hit It Rich! Casino Slots                     20
   19  FarmVille 2                                   20
   20  Fruit Jamba Community                         19
   21  Puzzle Charms                                 19
   22  Pyramid Solitaire Saga                        18
   23  Crystal Island                                17
   24  Hidden Shadows                                16
   25  Builder                                       16
   26  Mahjong Trails                                16
   27  Bingo Bash                                    15
   28  Bubble Witch Saga                             14
   29  Hay Day                                       14
   30  Tetris Blitz                                  13
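A frequency ranking like the one above can be produced from raw like data with a simple tally. The sketch below is illustrative only: the variable `user_game_likes` and its sample entries are hypothetical stand-ins for per-user lists of liked game pages, not data from the study's dataset.

```python
from collections import Counter

# Hypothetical input: one list of liked game pages per user.
user_game_likes = [
    ["Farm Heroes Saga", "Candy Crush Saga"],
    ["Candy Crush Saga", "Criminal Case"],
    ["Farm Heroes Saga"],
]

# Tally how many users like each game, then rank by frequency.
counts = Counter(game for likes in user_game_likes for game in likes)
top_games = counts.most_common(30)

for rank, (name, freq) in enumerate(top_games, start=1):
    print(f"{rank:>4}  {name:<40} {freq}")
```

`Counter.most_common` sorts by descending count, so the loop prints the same Rank/Name/Frequency layout as Table 9 for whatever data is supplied.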
