Personalized Game Reviews

Miguel Pacheco de Melo Flores Ribeiro

Thesis to obtain the Master of Science Degree in Information Systems and Computer Engineering

Supervisor: Prof. Carlos António Roque Martinho

Examination Committee

Chairperson: Prof. Luís Manuel Antunes Veiga
Supervisor: Prof. Carlos António Roque Martinho
Member of the Committee: Prof. João Miguel de Sousa de Assis Dias

May 2019

Acknowledgments

I would like to thank my parents and brother for their love and friendship throughout my life. I would also like to thank my grandparents, uncles and cousins for their understanding and support throughout all these years. Moreover, I would like to acknowledge my dissertation supervisors Prof. Carlos António Roque Martinho and Prof. Layla Hirsh Martínez for their insight, support and sharing of knowledge that has made this Thesis possible. Last but not least, I thank my girlfriend and all my friends who helped me grow as a person and were always there for me during the good and bad times in my life. To each and every one of you, thank you.

Abstract

Nowadays, one way of subjectively evaluating games is through game reviews. These are critical analyses, aiming to give information about the quality of games. Although the experience of playing a game is inherently personal and different for each player, current approaches to evaluating this experience do not take into account the individual characteristics of each player. We firmly believe game review scores should take the personality of the player into account. To verify this, we created a game review system, using multiple machine learning algorithms, that produces different reviews for different personalities, allowing a more holistic perspective of a review score based on multiple, distinct player profiles.

Keywords: digital game, machine learning, player model, review system.


Resumo

Atualmente, a forma mais comum de avaliação de jogos é através de reviews de jogos. Estas são análises críticas, com o objetivo de dar informação sobre a qualidade dos jogos. Enquanto que a experiência de um jogo é inerentemente pessoal e diferente para cada jogador, as abordagens atuais para a avaliação desta experiência não têm em consideração as características individuais de cada jogador. Nós acreditamos veementemente que as pontuações das reviews dos jogos são inerentes à personalidade do jogador. Para verificar isto, nós criámos um sistema de review de jogos, usando múltiplos algoritmos de aprendizagem, que modela reviews para diferentes personalidades que nos permitem dar uma perspetiva mais holística de uma pontuação de review, baseado em múltiplos e distintos perfis de jogador.

Palavras-Chave: jogo digital, aprendizagem de máquina, modelo de jogador, sistema de review.


Contents

1 Introduction
1.1 Motivation
1.2 Problem
1.3 Hypothesis
1.4 Contributions
1.5 Document Outline

2 Related Work
2.1 Game Reviews
2.1.1 Aggregators
2.1.2 Platforms for digital mobile distribution
2.1.3 Platforms for digital computer distribution
2.1.4 Entertainment websites
2.1.5 YouTube channels
2.1.6 Game Reviews Summary
2.2 Game Genres
2.3 Personality Models
2.3.1 The Five Factor Model
2.3.2 Myers-Briggs Type Indicator
2.4 Player Models
2.4.1 Bartle Player Types
2.4.2 Quantic Foundry's Gamer Motivation Profile
2.4.3 Demographic Game Design
2.4.4 BrainHex
2.5 WEKA
2.6 Explored Machine Learning Algorithms
2.6.1 Instance-Based Algorithms
2.6.2 Regression Algorithms
2.6.3 Tree Algorithms
2.7 Credibility
2.7.1 Bootstrap
2.7.2 Cross-validation
2.8 Discussion

3 Methods and Procedures
3.1 Measuring Tools
3.2 Approaches
3.3 Methodology
3.3.1 User Questionnaire
3.3.2 Game Filtering
3.3.3 Dataset Filtering and ARFF Preparation
3.3.4 Algorithm Filtering
3.3.5 Review System
3.3.6 Review System Validation
3.4 Metrics
3.5 Discussion

4 Results
4.1 Demographic Results
4.2 Game Filtering Results
4.3 Dataset Filtering Results
4.4 Review System Training Results
4.4.1 Linear Regression Algorithm Training Results
4.4.2 K-Nearest-Neighbor Algorithm Training Results
4.4.3 Multilayer Perceptron Algorithm Training Results
4.4.4 M5P Algorithm Training Results
4.5 Review System Validation Results
4.5.1 Linear Regression Algorithm Validation Results
4.5.2 K-Nearest Neighbor Algorithm Validation Results
4.5.3 Multilayer Perceptron Algorithm Validation Results
4.5.4 M5P Algorithm Validation Results
4.5.5 Algorithms Validation Results Discussion
4.6 Discussion

5 Conclusions
5.1 Discussion
5.2 Future Work

A Game Genres

B Selected Digital Games

C User Questionnaire 1

D User Questionnaire 2

List of Figures

1.1 GameRankings' review score of 96.33 on a scale from 0 to 100 (Game Rankings, n.d.).

2.1 FIFA 18 review score of 7 by GameSpot specialists (GameSpot, 2017).
2.2 Pokemon GO review score of 7 by GameSpot specialists (GameSpot, 2016).
2.3 Pokemon GO users' review score of 4.1 on Google Play (Google Play, n.d.).
2.4 Grand Theft Auto V with 70% positive reactions and both graphs with all users' reactions since its release date and the past month (Steam, 2015).
2.5 Player interest graph with two coordinates creating four distinct quadrants: Achiever, Explorer, Socialiser, Killer (Bateman, 2009).
2.6 Quantic Foundry's Gamer Motivation Model with the three high-level motivations and their sub-categories (Quantic Foundry, 2016).
2.7 The four player typologies from the Demographic Game Design Model: Conqueror, Manager, Wanderer and Participant, each one divided into casual gamers and hardcore gamers (Dias & Martinho, 2010).
2.8 BrainHex model with the seven archetypes, each related to a style of play (BrainHex, n.d.).
2.9 Attribute-Relation File Format (ARFF) file used in our work. The relation Grand Theft Auto V Review with the list of seven plus one numeric attributes and a list with some of the instances.
2.10 A K-Nearest-Neighbor (KNN) example with two distinct features where a new instance is being evaluated within the three nearest neighbors (Punch III et al., 1993).
2.11 A Linear Regression (LR) graph in a system with two dimensions, where the red line represents the value obtained from all instances, represented as blue dots.
2.12 Sigmoid Function graph where it is possible to see that the function is not completely differentiable in the middle values (Witten et al., 2016).
2.13 Tenfold cross-validation repeated ten times, where the white squares represent the training dataset and the orange squares the validation dataset.

3.1 Two approaches to give personalized review scores.
3.2 Flow chart of the six methodology steps. It starts from the User Questionnaire, followed by Game Filtering, and Dataset Filtering and ARFF Preparation. From it appears the Algorithm Filtering process, and the Review System, ending the methodology pipeline with the Review System Validation step.
3.3 Header part of an ARFF file. It contains the relation Grand Theft Auto V Review System, the list of seven plus one numeric attributes which are the seven archetypes of the user's BrainHex model and the game review score.
3.4 ARFF Header part of the General Review System (GRS). It contains the relation General Review System, the list of numeric attributes which are the seven user's BrainHex dimensions, the seven game's BrainHex dimensions, and the game review score.
3.5 ARFF Data part of the GRS, where the first seven columns contain the user's BrainHex dimensions, the next seven contain the game's BrainHex dimensions, and the last column contains the respective game review scores.
3.6 The software Waikato Environment for Knowledge Analysis (WEKA) in menu Explorer and section Classify. Here a dataset is being classified with the LR algorithm, using tenfold cross-validation. Also presented are the generated model and the results of some error metrics.

4.1 Pie chart with the gender information of the 300 users: 84.3% male and 15.7% female.
4.2 Pie chart with the age information of the 300 users.
4.3 Pie chart with the information about the type of gamer of the 300 users: 56.0% hardcore gamers, 41.7% casual gamers, and 2.3% non-gamers.
4.4 An identified outlier in the game The Witcher 3: Wild Hunt, with a review score of 2.
4.5 Generated tree using the M5Prime (M5P) algorithm, for the GRS. One initial node, three sub-nodes and five leaves with their corresponding linear regression models.

5.1 The user Miguel has a predicted review score of 7.8 out of 10, and his six friends with their names, photos and predicted review scores in a circle, or the real review scores in a hexagon, and the highest and lowest BrainHex dimensions in small green and red circles, respectively (VilaGames, n.d.).

List of Tables

4.1 The 12 selected digital games with their 7 BrainHex archetype scores and corresponding game genres. Meaning: Achiever (AC), Conqueror (CO), Daredevil (DA), Mastermind (MA), Seeker (SE), Socialiser (SO), Survivor (SU).
4.2 The 12 selected digital games with their corresponding opinion deviations and a last row with the average of all selected games (in percentage).
4.3 The first column presents the name of the digital games; the second and third columns show the highest and lowest Mean-Absolute Error (MAE), respectively.
4.4 GRS and the 12 selected digital games with their corresponding number of instances (NI) and overall MAE divided into three phases: the first without restrictions to the dataset; the second without those instances of games played for 1 hour or less; and the third without other identified outliers. The last column is the improvement percentage of the overall MAE from the first to the third phase.
4.5 The number of instances (NI) with the respective training dataset percentage, testing dataset percentage and MAE in seven progressive steps.
4.6 The 12 selected digital games with their corresponding predetermined weights, to be multiplied by the BrainHex archetypes, in alphabetical order. Meaning: w1 (AC), w2 (CO), w3 (DA), w4 (MA), w5 (SE), w6 (SO), w7 (SU).
4.7 The 12 selected digital games with their corresponding number of instances (NI), the overall MAE based on the training instances of the singular review systems (as can be seen in the seventh column of Table 4.4), and the MAE obtained from testing the GRS model using the training instances of each game.
4.8 GRS and the 12 selected digital games with their corresponding number of instances (NI) and the four percentage neighbor approaches, each with their MAEs.
4.9 GRS and the 12 selected digital games with their corresponding number of instances (NI) and three distinct weighting approaches to calculate the importance of each common neighbor, with the respective number of instances and MAEs. The three weighting distance methods we used in our system were 1, 1-d, and 1/d, respectively.
4.10 GRS and the 12 selected digital games with their corresponding number of instances (NI), and the two different approaches to the number of hidden layers (HL) with their MAE results. The second approach is presented as A-A-A because it has 3 levels of hidden layers, each with the number of attributes.
4.11 GRS and the 12 selected digital games with their corresponding number of instances (NI) and MAEs, using the M5P algorithm.
4.12 GRS and the 12 selected digital games with their corresponding number of instances (NI), and the training MAE and the testing MAE, using the LR algorithm.
4.13 GRS and the 12 selected digital games with their corresponding number of instances (NI), and the training MAE and the testing MAE, using the KNN algorithm.

4.14 GRS and the 12 selected digital games with their corresponding number of instances (NI), and the training MAE and the testing MAE, using the Multilayer Perceptron (MLP) algorithm.
4.15 GRS and the 12 selected digital games with their corresponding number of instances (NI), and the training MAE and the testing MAE, using the M5P algorithm.
4.16 GRS and the 12 selected digital games with their corresponding number of instances (NI) and MAEs, for all selected algorithms.

C.1 21 questions with their corresponding BrainHex dimension. Meaning: Achiever (AC), Conqueror (CO), Daredevil (DA), Mastermind (MA), Seeker (SE), Socialiser (SO), Survivor (SU).
C.2 7 sentences with their corresponding BrainHex dimension. Meaning: Achiever (AC), Conqueror (CO), Daredevil (DA), Mastermind (MA), Seeker (SE), Socialiser (SO), Survivor (SU).

Acronyms

AC Achiever

ACG Angry Centaur Gaming

AMAE Average Mean-Absolute Error

ARFF Attribute-Relation File Format

CCA Canonical Correlation Analysis

CO Conqueror

DA Daredevil

DGD Demographic Game Design

FFM Five Factor Model

FPS First-Person Shooter

GMP Gamer Motivation Profile

GNU General Public License

GOG Good Old Games

GRS General Review System

IB Instance-Based

IGN Imagine Games Network

IST Instituto Superior Técnico

KNN K-Nearest-Neighbor

LDA Linear Discriminant Analysis

LBG Location-Based Game

LR Linear Regression

M5P M5Prime

MA Mastermind

MAE Mean-Absolute Error

MAED Mean-Absolute Error Difference

MBTI Myers-Briggs Type Indicator

MLP Multilayer Perceptron

MUD Multi-User Dungeon

NPC Non-Player Character

PCA Principal Component Analysis

PUCP Pontificia Universidad Católica del Perú

RPG Role-Playing Game

RTS Real-Time Strategy

RS Review System

SDR Standard Deviation Reduction

SE Seeker

SO Socializer

SU Survivor

WEKA Waikato Environment for Knowledge Analysis

1 Introduction

Contents

1.1 Motivation
1.2 Problem
1.3 Hypothesis
1.4 Contributions
1.5 Document Outline

1.1 Motivation

Nowadays, digital games make up a global industry worth 100 billion U.S. dollars [1]. Back in 2015, there were already 1.8 billion people playing digital games worldwide [2]. While more and more games are coming to the market, it is a hard task for players to choose a suitable game (Huang, 2018).

That being said, we believe the game review industry is an important way to disseminate new games and help players select them (Huang, 2018). This industry is not new, and there are hundreds of digital game review websites all over the world (Huang, 2018). However, a survey of these websites shows that most of them provide mere scores, and none takes into account the personality of the user or any of their preferences.

With this work, we present a simple but innovative system that gives personalized game reviews according to each player's personality. By "personalized" we mean that the appropriate recommendation is given specifically to that particular player (Huang, 2018). Usually, game reviews are made on a single scale from 1 to 5 [3], 0 to 10 [4], or 0 to 100 [5], as can be seen in Figure 1.1, where the game Grand Theft Auto V obtained a review score of 96.33 on a scale from 0 to 100; or by giving like/dislike reactions [6]. Those review scores are the same for everyone searching for the same game. This can lead to disappointment when people invest their time and money buying an expensive new game they think they will like because of the review [7]. This often happens when the game review industry gives high review scores to a specific game, but when players try it, they notice the game does not meet their personal expectations. Since we know that different people will like the same game for different reasons (Malone, 1981), we propose a system where each person reports their player profile and, according to it, is provided with a distinct game review score. Keeping these factors in mind, we think this personalized game review system is an interesting new approach in the area.

[1] https://www.history.com/topics/inventions/history-of-video-games
[2] https://blogs.intel.com/technology/2015/08/the-game-changer
[3] https://play.google.com
[4] https://pt.ign.com
[5] http://metacritic.com
[6] https://store.steampowered.com
[7] https://culturedvultures.com/lets-talk-gaming-embargoes

Figure 1.1: GameRankings' Grand Theft Auto V review score of 96.33 on a scale from 0 to 100 (Game Rankings, n.d.).

1.2 Problem

Recommending games to players is a well-known industry: it attributes a score to a game to advise a person whether they should spend their time and money on that game [8]. However, all these scores are based on subjective opinions, made by other people with different ways of thinking (Ribeiro, Lobato, & Liberato, 2009). Moreover, current game reviews do not take into account the tastes, or even the personality, of each player. All reviews produce numbers, which can be very volatile; review scores are not dogmas. So, the problem is to develop a system that gives personalized game reviews for selected digital games.

1.3 Hypothesis

Assuming we know the personality of each person a priori, using a player profile model for both the player and the game, we will be able to provide game review scores more adequate to each user. Besides this, the review system we develop will be able to identify which kinds of people have similar traits for each selected digital game. We will build a review system through multiple perspectives of different personalities. In order to develop it, we will use a player satisfaction model to obtain the player's

[8] https://medium.com/mode-b/the-problem-with-video-game-reviews-838933f94224

profiles. Then, we will analyze selected machine learning algorithms such as instance-based algorithms, function algorithms, and tree algorithms. Finally, we will choose the one or ones that best fit our problem, meaning that they are not very heavy to compute and have a low error rate; to accomplish that, we will use evaluation methods. Our intention is to find out whether, using a player profile model, it is possible to create a review system that recommends games to players with diverse profiles, making game reviews less general and subjective, since they will be personalized for each player.
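To make the error-based selection criterion concrete, the sketch below (our illustration, not the thesis implementation; all scores are invented) computes the mean-absolute error (MAE), the metric used to compare candidate models:

```python
# Minimal MAE sketch: lower MAE means predicted review scores sit closer
# to the scores users actually gave.
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true scores."""
    assert len(predicted) == len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical review scores on a 0-10 scale.
true_scores = [8.0, 6.5, 9.0, 7.0]
model_a = [7.5, 6.0, 8.5, 7.5]   # each prediction off by 0.5
model_b = [9.0, 5.0, 7.0, 9.0]   # predictions off by 1.0 to 2.0

print(mean_absolute_error(model_a, true_scores))  # 0.5
print(mean_absolute_error(model_b, true_scores))  # 1.625
```

Under this criterion, model A would be preferred, since its predictions deviate less from the true scores on average.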

1.4 Contributions

With this work, we intend to survey the state of the art in game reviews, detailing some of the most prominent contributions in game reviews, game genres, personality models, player models, and machine learning algorithms. This dissertation is an exploratory study that attempts to take a first step in the use of personalized game reviews. Our objective is to develop a review system for game reviews that takes into account the player's profile in each case. Since traditional methods, made by hand, are time-consuming and somewhat tedious, computer-aided decision support systems have appeared to simplify the task. To accomplish our aim, we will use hundreds of real players with diverse profiles and backgrounds. Our model will be implemented and tested with a list of recent commercial digital games from different consoles and game genres. To implement our review system, we will use machine learning algorithms that we will validate using evaluation methods. Lastly, we will provide all the documentation of our research, which we expect will help future game review websites take this approach into account, or even lead to a new company within the industry based on this way of thinking.

1.5 Document Outline

This thesis is organized as follows:

• Chapter 1 - Introduction, where we present our main reasons to develop this work by explaining the importance of creating an innovative game review approach, the personalized game review system, for selected games, taking into account the player’s profile.

• Chapter 2 - Related Work, where we begin by extensively analyzing how games are reviewed nowadays and how they can be assigned to different game genres. Afterwards, we report two important personality models and four player models. Then, we describe the data preprocessing workbench used and analyze some machine learning algorithms. We end this chapter by describing the evaluation methods used to validate our game review system.

• Chapter 3 - Methods and Procedures, where we describe the process of developing our game review system. First we describe our measuring tools, and then we go through a detailed explanation of each step of our methodology and the procedures to validate our solution.

• Chapter 4 - Results, where we report the outcomes obtained from our experiment and their significance for our work. We begin by describing the validation process for each selected algorithm, complemented with its results, and conclude with a discussion of the importance of our findings and the identified issues.

• Chapter 5 - Conclusions, where we sum up the major achievements of our system as well as discuss success criteria. To finish, we present potential directions for the future of this work.

2 Related Work

Contents

2.1 Game Reviews
2.2 Game Genres
2.3 Personality Models
2.4 Player Models
2.5 WEKA
2.6 Explored Machine Learning Algorithms
2.7 Credibility
2.8 Discussion

2.1 Game Reviews

Game reviews are one kind of review. There are many things in the world that get reviewed, such as cars, movies, pieces of music, or even restaurants (Autocar, 2018; Movie Database [IMDb], 2018; Pitchfork, 2018; Tripadvisor, 2018). What we understand by review, also called critical analysis, is the study of a particular item in relation to its pre-established requirements, aiming to give objective information about it based on subjective and personal appreciation. The purpose of reviewers is usually to help with decision making about how to spend money, time, or other resources. Moreover, the aim of game reviews is also to help users by leading them to games that make them happy (McNamara, 2008). In fact, a well-written game review can attract new players (Huang, 2018). Keeping these factors in mind, game reviews reflect the previous experience of players and their opinions about a game after they have played it (Huang, 2018). From game reviews a lot of information can be found, such as the quality of a game or its content (Huang, 2018). As can be seen in Figure 2.1 and Figure 2.2, these two recent digital games have the same review score of 7, defined as Good on the GameSpot website, although they are for different types of consoles and have different game genres and themes.

Figure 2.1: FIFA 18 review score of 7 by GameSpot specialists (GameSpot, 2017).

Figure 2.2: Pokemon GO review score of 7 by GameSpot specialists (GameSpot, 2016).

Nowadays there are hundreds of game review companies, websites, and magazines, and most of them work with similar methods [1]. A survey found that they vary only in the scale used, which can be from 0 to 10, 0 to 100, 1 to 5, like/dislike, or other scales to evaluate the game, never taking into account the personality of the person who is evaluating it. There are several well-known cases of

[1] http://www.metacritic.com/faq

online game reviews such as GameSpot [2] and Imagine Games Network (IGN) [3], which work as entertainment websites that do reviews themselves. On the other hand, GameRankings [4] and Metacritic [5] use a wider approach: they do not do reviews themselves but work as aggregators of other known websites and magazines that do. Certain websites, such as Metacritic, weight the individual review scores based on the source. Moreover, there is another way of doing game reviews, made through videos on YouTube channels such as Angry Centaur Gaming (ACG) [6] and UnfairReviews [7]. Finally, App Store [8] and Google Play [9] work as platforms for digital mobile distribution, whereas Good Old Games (GOG) [10] and Steam [11] work as digital distribution platforms for computers. These last four cases have one thing in common: all work only with common users' reviews instead of professional ones. Although not every distribution platform provides user reviews, this is the current trend.

While working with reviews, there are cases with specialists' scores, as shown in Figure 2.2, which presents the review score of Pokemon GO made by the specialists of GameSpot; and cases with overall users' scores, as can be seen in Figure 2.3, which presents the users' review score of Pokemon GO on Google Play. Reviews made by specialists are more comprehensive compared to user reviews because they focus on multiple aspects, while one user review only focuses on one or two aspects (Huang, 2018). These professional game reviews are a good approach, but they can be limited, as they are the opinion of one person or a limited set of people. On the other hand, the number of user reviews is much larger, and the sources of user reviews are varied (Huang, 2018). However, they tend to be shorter, partial, and drastic. Also, we can often see the maximum and the minimum scores, even when the scales offer a wide set of choices. Not all user reviews suffer from this, but when they do we can call it review bombing, and it contributes to the noise that sometimes disturbs otherwise sound game review results, often for reasons outside of the game itself (PCGamesN, 2017). A review bomb is an Internet phenomenon in which large groups of people leave negative user reviews for digital games in an attempt to harm their sales and popularity (Steamed, 2015). This can alter the purchase decision of consumers who use the aggregate rating as a principal part of their purchasing decision (Steamed, 2015).

We will now briefly explain how the companies calculate their game review scores for these selected cases.

[2] https://www.gamespot.com
[3] https://ign.com
[4] https://www.gamerankings.com
[5] http://metacritic.com
[6] https://www.youtube.com/user/AngryCentaurGaming
[7] https://www.youtube.com/user/UnfairReviews
[8] https://www.appstore.com
[9] https://play.google.com
[10] https://www.gog.com
[11] https://www.pcgamer.com

Figure 2.3: Pokemon GO users' review score of 4.1 on Google Play (Google Play, n.d.).

2.1.1 Aggregators

GameRankings and Metacritic are both aggregators of game reviews, although they use different approaches. GameRankings uses the average of each game review they receive (Game Rankings, n.d.), whereas Metacritic has a secret formula which calculates, with weights, the overall score for each game (Metacritic, n.d.). They produce their formula based on whatever sources they think are most reliable (Metacritic, n.d.). First they normalize the values by converting every review to a scale from 0 to 100, and then they apply their formula to obtain the final score (Metacritic, n.d.). For example, as can be seen in Figure 1.1, the game Grand Theft Auto V obtained a review score of 96.33 on a scale from 0 to 100.
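The normalize-then-aggregate scheme described above can be sketched as follows. Metacritic's actual formula and source weights are secret, so the weights and scores below are invented purely for demonstration:

```python
# Sketch of score aggregation: rescale reviews from their native scales
# to 0-100, then combine them with per-source weights (weights invented).
def normalize(score, low, high):
    """Rescale a review score from its native [low, high] scale to 0-100."""
    return (score - low) / (high - low) * 100

# Hypothetical reviews of one game: (score, scale_min, scale_max, source_weight).
reviews = [
    (9.5, 0, 10, 1.5),
    (4.8, 1, 5, 1.0),
    (93, 0, 100, 2.0),
]

weighted = sum(normalize(s, lo, hi) * w for s, lo, hi, w in reviews)
total_w = sum(w for *_, w in reviews)
print(round(weighted / total_w, 2))  # 94.11
```

A simple average (GameRankings' stated approach) is the special case where every source weight is 1.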

2.1.2 Platforms for digital mobile distribution

App Store and Google Play are platforms for digital mobile distribution and they work only with users' reviews (Google Play, 2016; Apple, 2016). Both platforms went for a simple approach, using a scale from 1 to 5 for game reviews (Google Play, n.d.). Perhaps they use this scale to obtain better overall numbers, because it is never possible to classify a game as 0. For example, as shown in Figure 2.3, there are more than ten million users' reviews of Pokemon GO on Google Play.

2.1.3 Platforms for digital computer distribution

Good Old Games (GOG) and Steam are platforms for digital computer distribution and they work only with users' reviews (Steam, 2017). Although GOG uses a scale for their game reviews similar to Google Play's, Steam has a different and peculiar approach (Good Old Games [GOG], 2016). It proposes that a single boolean choice for evaluating games (like or dislike) gives a good overall score for the game (Steam, n.d.). Steam shows this information as the percentage of

people that like the game, followed by two temporal graphs (Steam, 2015). One of the graphs shows the number of likes and dislikes the game has received since its release date, and the other shows the same information for the last month. For example, as presented in Figure 2.4, the game Grand Theft Auto V has 70% positive reactions on Steam since its release date. On the left side of the figure is the graph with all users' reactions since the release date, and on the right side the graph with the users' reactions from the past month. It is also noticeable that in July of 2017 there was evidence of review bombing, due to the amount of negative reviews.

Figure 2.4: Grand Theft Auto V with 70% positive reactions and both graphs with all users' reactions since its release date and the past month (Steam, 2015).
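The overall and last-month positive percentages described above can be sketched as below. This is our own illustration of the idea, not Steam's actual computation, and the reaction data is invented:

```python
# Sketch of Steam-style like/dislike aggregation: overall positive
# percentage plus a rolling last-30-days percentage.
from datetime import date, timedelta

# Hypothetical reactions: (date, liked?).
reactions = [
    (date(2015, 4, 20), True),
    (date(2015, 5, 1), False),
    (date(2017, 7, 3), False),   # e.g. part of a review-bombing spike
    (date(2018, 5, 25), True),
    (date(2018, 6, 10), True),
]

def positive_pct(rs):
    """Percentage of reactions that are likes."""
    return 100 * sum(liked for _, liked in rs) / len(rs)

today = date(2018, 6, 15)
recent = [r for r in reactions if today - r[0] <= timedelta(days=30)]
print(positive_pct(reactions))  # overall: 60.0
print(positive_pct(recent))     # past month: 100.0
```

Comparing the two figures is what makes a review-bombing spike visible: a short-lived burst of dislikes drags the monthly percentage down while barely moving the all-time one.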

2.1.4 Entertainment websites

GameSpot and Imagine Games Network (IGN) are entertainment websites, and they strive to pair up each game with the person who knows the brand and the genre, so as to offer the most expert opinion (Imagine Games Network [IGN], n.d.). Professional reviews are comprehensive and authoritative, but sometimes they can also be partial due to the limited experience of the writer (Huang, 2018). Besides, sometimes the reviewer is someone who likes this genre of games above others, so the review can be biased. Even so, there are people capable of giving fair reviews in the game genres they like. Both GameSpot and IGN classify games on a scale from 0 to 10, and IGN adds comments on what is best and worst in that specific game. The games are rated from 0 to 10, where:

• 10.0 is a masterpiece, example: Grand Theft Auto V.

• 9.0 to 9.9 is an amazing game, example: The Witcher III Wild Hunt.

• 8.0 to 8.9 is a great game, example: FIFA 18.

• 7.0 to 7.9 is a good game, example: Pokemon GO.

• 6.0 to 6.9 is an okay game, example: Angry Birds 2.

• 5.0 to 5.9 is a mediocre game, example: Beyond Eyes.

• 4.0 to 4.9 is a bad game, example: The Legend of Korra.

• 3.0 to 3.9 is an awful game, example: Rambo: The .

• 2.0 to 2.9 is a painful game, example: Fast and Furious: Showdown.

• 1.0 to 1.9 is an unbearable game, example: Step Up.

• 0.0 to 0.9 is a disaster game, example: Extreme PaintBrawl.
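The descriptor scale listed above is a simple bucketing of the 0–10 range. The labels follow the list above; the function itself is our own illustrative mapping, not IGN's code:

```python
# Sketch: map a 0-10 review score to its verbal descriptor, one label
# per whole-number bucket, with 10.0 as a special case.
LABELS = ["disaster", "unbearable", "painful", "awful", "bad", "mediocre",
          "okay", "good", "great", "amazing"]

def descriptor(score):
    """Return the verbal descriptor for a 0-10 review score."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must be in [0, 10]")
    if score == 10.0:
        return "masterpiece"
    return LABELS[int(score)]  # e.g. 7.0-7.9 -> index 7 -> "good"

print(descriptor(7.0))   # good
print(descriptor(9.6))   # amazing
print(descriptor(10.0))  # masterpiece
```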

Note that nowadays the games that are released and reviewed by these well-known websites usually receive a review score of seven or more; games with a review score below six are considered exceptional cases.

2.1.5 YouTube channels

Angry Centaur Gaming (ACG) and UnfairReviews are YouTube channels, and they use an approach different from all we saw before, because they use a nominal scale for game reviews. ACG evaluates games within four categories, classifying each game as Buy, Wait for Sale, Rent, or Never Touch (Angry Centaur Gaming [ACG]). This is an interesting approach because they are not giving their subjective evaluation through a final review number but, instead, their concrete opinion on whether they would buy that specific game or not (Angry Centaur Gaming [ACG]). For example, the game Grand Theft Auto V is classified as a game to buy on ACG.

2.1.6 Game Reviews Summary

After describing the different kinds of game reviews, we can conclude that there are several valid methods to evaluate games. Regardless of how game reviews are presented to consumers, they undoubtedly influence the way people view, talk about and understand games (McNamara, 2008). We want to highlight one advantage of each type of game review described before:

• Aggregators take into account several other game review sources.

• Platforms for digital mobile and computer distribution usually have many more game reviews, since these are written by users.

• Entertainment websites make game reviews by specialists in the brand and game genre.

• Youtube channels use a nominal scale for game reviews, giving a concrete opinion.

2.2 Game Genres

The categorization of digital games by genre has not been without difficulties, such as the definition of what exactly constitutes a genre, overlaps and boundaries between genres, and the fact that genres are always in flux as long as new works are being produced (Rabin, 2010). This said, not every digital game can be assigned to a single genre; some are best classified as a combination of two or more genres (Rabin, 2010). In our work we analyzed different digital game genres in order to obtain a wider spectrum of games and their relation with the players’ profiles. The digital game genres we consider in our work, described in more detail in Appendix A - Game Genres, are:

• Action Games which emphasize physical challenges, including reaction-time and hand-eye coordination (Rabin, 2010). For example Grand Theft Auto V, or God of War 4.

• Adventure Games which are set in a world usually made up of multiple rooms, involving objectives that must usually be completed in several steps (Rabin, 2010). For example Grand Theft Auto V, or God of War 4.

• Battle Royale Games in which the last person standing wins (Digital Trends). For example Fortnite.

• First-Person Shooter (FPS) Games in which the player is able to wield a variety of weapons and dispatches enemies by shooting them (Rabin, 2010). For example Black Ops III.

• Location-Based Game (LBG) Games in which gameplay evolves with the user’s physical location (Lehmann, 2012). For example Pokemon GO.

• Real-Time Strategy (RTS) Games in which the goal is for the player to collect resources, build an army, and control his units to attack the enemy, all at a fast pace (Rabin, 2010). For example Clash Royale.

• Role-Playing Game (RPG) Games in which players create or take on a character, which may include a developed persona (Rabin, 2010). For example Dark Souls III, The Elder Scrolls V: Skyrim, or The Witcher 3: Wild Hunt.

• Simulator Games which are based on the simulation of a system (Rabin, 2010). For example The Sims 4, or FIFA 18.

• Sport Games which are simulations of existing sports or variations of them (Rabin, 2010). For example FIFA 18, or .

2.3 Personality Models

Each person has different tastes, and so we assume different people have different personalities, each with their own particular needs. In this sub-section, we will explain two psychological theories, the Five Factor Model (FFM) and the Myers-Briggs Type Indicator (MBTI), that we consider relevant to our work and which served as a basis for some of the work done on personality in digital games (McCrae & John, 1992).

2.3.1 The Five Factor Model

The Five Factor Model (FFM), also known as the Big Five Personality Traits, is a theory that divides a person’s personality into five different dimensions (Sheldon, Ryan, Rawsthorne, & Ilardi, 1997). These traits were identified through empirical research over multiple decades and the model has been shown to be robust, reliable, and cross-culturally valid (McCrae & John, 1992). The traits are openness to experience, conscientiousness, extroversion, agreeableness, and neuroticism, often described by the acronym OCEAN. Openness to experience is a trait related to the appreciation of adventure, art, curiosity, emotion, imagination and seeking novelty (McCrae & John, 1992). Conscientiousness implies that a person is very self-disciplined, focused, determined, careful and vigilant (McCrae & John, 1992). Extroversion is a trait of people who tend to seek out others, being outgoing and engaged with the external world (McCrae & John, 1992). Agreeableness describes a person who is very cooperative, compassionate, kind, sympathetic, warm, considerate and receptive to others (McCrae & John, 1992). Neuroticism is related to how emotionally stable a person is and whether they easily develop undesirable emotions such as anxiety, fear, and anger (McCrae & John, 1992).

2.3.2 Myers-Briggs Type Indicator

The Myers-Briggs Type Indicator (MBTI) is one of the most widely used personality models, and it led to the development of the Demographic Game Design (DGD) model, the forerunner of the BrainHex model used in this dissertation (Nacke, Bateman, & Mandryk, 2014). MBTI came to life in the 1920s when, inspired by the studies of the psychiatrist Carl Jung, Katharine Cook Briggs and her daughter Isabel Briggs Myers developed the personality indicator still used nowadays (Briggs-Myers & Myers, 1995). It classifies people along four dimensions, each composed of two sub-dichotomies, which leads to sixteen different personality types. We will now briefly explain the MBTI model. Mind: How people react to their surroundings. This can be divided into two sub-dichotomies:

• Extroversion: People with this feature need to live the experiences of the world and prefer group activities. They are energized by social interaction. These individuals tend to be more easily excited than introverts and more enthusiastic.

• Introversion: People with this feature tend to be more solitary and get exhausted by social interaction. These individuals, in general, tend to be quite sensitive to external stimulation.

Energy: How people see the world and process information. This can be divided into two sub-dichotomies:

• Sensing: People with this feature tend to capture information based on their five senses as a reliable source. These individuals like to deal with concrete and explicit information and use past situations to deal with current ones.

• Intuition: People with this feature tend to be more open-minded, imaginative and curious. These individuals search for non-explicit information, even ambiguous information, and create memories based on patterns.

Nature: How people make decisions and cope with emotions. This can be divided into two sub-dichotomies:

• Thinking: People with this feature prefer to use facts and logic to make their decisions. These individuals tend to hide their feelings and see efficiency as more important than cooperation.

• Feeling: People with this feature frequently use their emotions and subjective factors to make their decisions. These individuals are less competitive. They tend to focus on social cooperation and harmony.

Tactics: How people deal with the world; this can be divided into two sub-dichotomies:

• Judging: People with this feature like to have a stable, organized and planned life. These individuals plan their tasks carefully. They like routines and work independently of deadlines.

• Perceiving: People with this feature like to have a flexible, free and spontaneous life. These individuals make their plans while doing tasks and change them frequently. They do not like routines and are strongly influenced by deadlines.

Having stated this, we can give an example: a person can have the ISTP (Introversion-Sensing-Thinking-Perceiving) personality. A person with this personality tends to be more logical, organized, conscientious, pragmatic, conservative and stable.

2.4 Player Models

Players tend to have preferred game genres and usually play games whose contents are in accordance with their likes (Lazzaro, 2004). Their likes derive from the type of players they are and, consequently, from their player personality (Martinho, Santos, & Prada, 2014). So we know the player’s personality is strongly related to the games they like the most. In this section, we will present and explain four player models that categorize players by type.

2.4.1 Bartle Player Types

Bartle Player Types is a study by Richard Bartle based on Multi-User Dungeons (MUDs). This study brought us one of the first and most consistent player models, classifying a player’s actions in relation to his personality (Quandt & Kröger, 2014). According to this study, by analyzing interaction patterns, four different types of players were found (Bartle, 1996). These types are Achievers, Killers, Explorers, and Socialisers, which are relevant to Massive Multiplayer Online games but not always to other game genres. As can be seen in Figure 2.5, the result can be expressed on two coordinate axes, where the horizontal axis represents a preference for interacting with other players on one side and for exploring the world on the other; the vertical axis represents a preference for interaction on one side and for action on the other.

Figure 2.5: Player interest graph with two coordinates creating four distinct quadrants: Achiever, Explorer, Socialiser, Killer (Bateman, 2009).

With this graph we can observe four distinct quadrants, which we will now describe. Achievers act upon the world. To them, the environment is an immersive experience, and the harder the challenge, the more rewarded and better they feel. Killers act on other players. They like competing and demonstrating superiority, and prefer to play against real players rather than Non-Player Characters (NPCs). Explorers interact with the world. They enjoy exploring everything and being surprised, knowing all the mechanics, tricks and glitches in a game. Socialisers interact with other players. They enjoy being in communities and guilds to spread their knowledge, or just having a conversation and creating bonds with other players.

2.4.2 Quantic Foundry’s Gamer Motivation Profile

Quantic Foundry was founded in 2015 by Nick Yee and Nicolas Ducheneaut, who came up with an empirical Gamer Motivation Profile (GMP) by gathering data over time from more than three hundred thousand English-speaking gamers from many geographic regions (Yee, 2016). The GMP was developed by generating an inventory of motivations gathered from a literature review of existing motivational models, such as the FFM (Yee, 2016). They validated the data and, using factor analysis, identified how the different player motivations clustered together (Yee, 2016). Quantic Foundry later validated and refined their motivation model based on data from thirty thousand gamers around the world, which produced its second, and current, version. In the GMP, as can be seen in Figure 2.6, there are three high-level motivations, namely Extraversion, Conscientiousness, and Openness; these can be divided into six middle-level motivations, which are Action and Social, Mastery and Achievement, and Immersion and Creativity. Each of these can in turn be divided into two low-level motivations, based on factor analysis of how they cluster together.

Figure 2.6: Quantic Foundry’s Gamer Motivation Model with the three high-level motivations and its sub- categories (Quantic Foundry, 2016).

The GMP is thus based on twelve major motivations, which we will now describe in more detail. Extraversion covers the more energetic and gregarious modes of gameplay, seeking out arousing gaming experiences.

• Action: Those with a high level of Action are more aggressive in their play attitude and like to jump straight into the action, surrounded by many effects and dramatic visuals. On the other hand, those with a low level of Action prefer calmer and quieter settings and slower-paced games.

The Action dimension is divided into the following two low-level motivations:

– Destruction: The enjoyment of chaos, mayhem, guns, and explosives.

– Excitement: The enjoyment of games that are fast-paced, intense, and provide an adrenaline rush.

• Social: Those with a high level of Social enjoy collaborating and competing with other players. On the other hand, those with a low level of Social prefer to play alone, where they can be more independent. The Social dimension is divided into two low-level motivations, namely:

– Competition: The enjoyment of competition with other players (duels or matches).

– Community: The enjoyment of interacting and collaborating with other players.

Conscientiousness covers different ways of progressing through and attaining power within the construct of the game world.

• Achievement: Those with a high level of Achievement like to accrue power, all the collectibles, and rare items, even if they have to suffer to obtain them. On the other hand, those with a low level of Achievement do not worry about their progress and scores in the game and have a more relaxed posture while playing. The Achievement dimension is divided into two low-level motivations as follows:

– Completion: The desire to complete every mission, get every collectible, and discover hidden things.

– Power: The importance of becoming powerful within the context of the game world.

• Mastery: Those with a high level of Mastery enjoy challenging gaming experiences with deep and complex strategies. On the other hand, those with a low level of Mastery enjoy being spontaneous when gaming and prefer more accessible and forgiving games. The Mastery dimension is divided into the following two low-level motivations:

– Challenge: The preference for games of skill and the enjoyment of overcoming difficult challenges.

– Strategy: The enjoyment of games that require careful decision-making and strategic thinking.

Openness covers different ways of relating to the story and design of the game world.

• Creativity: Those with a high level of Creativity like to constantly experiment with the game world and, if possible, customize and change the game with their own designs. On the other hand, those with a low level of Creativity are more practical in their gameplay and accept the game as it is. The Creativity dimension is divided into two low-level motivations, namely:

– Design: The appeal of expression and deep customization.

– Discovery: The desire to explore, tinker, and experiment with the game world.

• Immersion: Those with a high level of Immersion enjoy games with interesting settings, stories, and customization options so they can be deeply immersed in the game. On the other hand, those with a low level of Immersion care less about the narrative experiences and are more focused on gameplay mechanics. The Immersion dimension is divided into two low-level motivations, namely:

– Fantasy: The desire to become someone else, somewhere else.

– Story: The importance of an elaborate storyline and interesting characters.

2.4.3 Demographic Game Design

Demographic Game Design (DGD) is a very popular model based on applying the MBTI typology to data gathered about players’ gameplay needs (Martinho et al., 2014). This model, proposed by Chris Bateman, is an adaptation of the Myers-Briggs typology to games and is focused on market-oriented game design (Bateman, Lowenhaupt, & Nacke, 2011). In DGD, as in the Bartle Player Types, players are organized into four categories (Martinho et al., 2014). These four categories can be represented in a graph, as shown in Figure 2.7, and they are the Conqueror (Thinking and Judging), the Manager (Thinking and Perceiving), the Wanderer (Feeling and Perceiving), and the Participant (Feeling and Judging). They can be described as follows. Conquerors need fast progression in games and like hidden components; the story and its characters are not relevant to them. Managers need stable progression in games and like implicit goals; plots are more important than characters, and they do not need a strong social component. Wanderers need slow progression in games, where progression implies new items, and like simple controls and an involving story. Participants need a game progression connected to the narrative, prefer face-to-face games and enjoy group interactions. Each of these four clusters can be divided into two subcategories in order to distinguish between hardcore and casual players, the latter being the majority of the population. This subdivision means that DGD is more focused on players’ abilities rather than on their likes: hardcore gamers are always seeking the most recent news in the games industry, trying many games and adapting their life to playing (Martinho et al., 2014), whereas casual players do not have much knowledge of the games industry, do not talk much about games and do not change their habits to play games, playing each game as a one-off (Martinho et al., 2014). Later, International Hobo attempted to develop a new player type theory, named DGD2, which led to the creation of the BrainHex model that we will describe in the next subsection (Nacke et al., 2014).

Figure 2.7: The four player typologies from Demographic Game Design Model: Conqueror, Manager, Wanderer and Participant, each one divided into casual gamers and hardcore gamers (Dias & Martinho, 2010).

2.4.4 BrainHex

BrainHex is a player satisfaction model (BrainHex, 2008). It was created by International Hobo Ltd and was based on insights from neurobiological findings as well as on the results from the earlier DGD (Nacke et al., 2014). This player model describes gameplay behaviour in terms of seven different archetypes grounded in the human nervous system, and it is the one we will be using in our research to obtain our players’ personality data (BrainHex, 2008). We chose the BrainHex player model, revised in 2008 based on a study with 60,000 participants, because it can be measured independently and we had free access to its coding. The seven dimensions of the BrainHex model are: Achiever, Conqueror, Daredevil, Mastermind, Seeker, Socialiser and Survivor. They can be seen in Figure 2.8, where the Achiever archetype is represented in the center because all the other archetypes also involve, to some degree, the pleasure centre (nucleus accumbens); Conqueror is represented by the small diamond on the right, since it is associated with norepinephrine released by the adrenal glands; Daredevil is represented by the small diamond on the left, since it is associated with epinephrine released by the adrenal glands; Mastermind is represented by the square, corresponding to the orbitofrontal cortex; Seeker is represented by the forward-looking circle, since it activates the sensory cortices and the hippocampus; Socialiser is represented by the large diamond on the right, since it activates the hypothalamus; finally, Survivor is represented by the large diamond on the left, since it activates the amygdala (BrainHex, n.d.).

Figure 2.8: BrainHex model with the seven archetypes, each related to a style of play (BrainHex, n.d.).

Achiever: The player with a high score in the Achiever archetype has a fixation on reaching goals, enjoys checking boxes, is motivated by long-term achievements and strives to fully complete a game (Nacke et al., 2014). Conqueror: The player with a high score in the Conqueror archetype enjoys defeating difficult foes, struggling until he achieves victory, beating other players and showing superiority (Nacke et al., 2014). Daredevil: The player with a high score in the Daredevil archetype seeks thrills, excitement and risk-taking (Nacke et al., 2014). Mastermind: The player with a high score in the Mastermind archetype enjoys solving puzzles that require strategic thinking, as well as focusing on making the most efficient decisions (Nacke et al., 2014). Seeker: The player with a high score in the Seeker archetype is curious about the game world and seeks moments of curiosity, wonder and sensual stimulation (Nacke et al., 2014). Socialiser: The player with a high score in the Socialiser archetype enjoys being in communities and guilds, hanging out and helping others (Nacke et al., 2014). Survivor: The player with a high score in the Survivor archetype gets pleasure from the use of his survival instincts, triggered by the intensity of terror experiences (Nacke et al., 2014).

2.5 WEKA

The Waikato Environment for Knowledge Analysis (WEKA) is a machine learning and data preprocessing workbench developed in 1993 at the University of Waikato in New Zealand (Bouckaert et al., 2010). This toolbox is written in Java and distributed under the terms of the GNU General Public License (Witten, Frank, Hall, & Pal, 2016). It runs on many platforms and has been tested under the Macintosh, Windows and Linux operating systems (Witten et al., 2016). In our work we use the WEKA 3.8.3 stable version under the Macintosh operating system. In addition, we use WEKA as a facilitator tool to analyze data at large scale. It provides a uniform interface to many different learning algorithms, through which we can pre-process a dataset, feed it into a learning scheme, and evaluate the resulting classifier and its performance. WEKA takes as input a single relational table in the Attribute-Relation File Format (ARFF). This type of file has two distinct sections: the Header and the Data (Bouckaert et al., 2018). The Header contains the name of the relation and the list of the attributes with their names and types (Bouckaert et al., 2018). The second section of the ARFF file contains the list of the instances, the input dataset itself. An example of an ARFF file used in our work can be seen in Figure 2.9: it contains the relation Grand Theft Auto V Review System; the list of seven plus one numeric attributes, the seven archetypes of the BrainHex model plus the game review score; and, finally, a list of some instances.

Figure 2.9: ARFF file used in our work. The relation Grand Theft Auto V Review with the list of seven plus one numeric attributes and a list with some of the instances.
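As a concrete illustration of this structure, a minimal ARFF file of the same shape could look as follows. The attribute names follow the BrainHex archetypes, but the exact spellings and the instance values shown here are invented for illustration, not taken from our dataset:

```
@relation 'Grand Theft Auto V Review System'

% Header: the seven BrainHex archetype scores plus the review score
@attribute achiever   numeric
@attribute conqueror  numeric
@attribute daredevil  numeric
@attribute mastermind numeric
@attribute seeker     numeric
@attribute socialiser numeric
@attribute survivor   numeric
@attribute score      numeric

% Data: one comma-separated instance per line
@data
14,17,9,12,16,10,5,9.0
11,13,15,10,8,18,7,7.5
```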

In this dissertation we use the main graphical user interface of WEKA, called Explorer (Hall et al., 2009). It gives access to all of WEKA’s facilities through menu selection of algorithms and their specifications, reading the dataset from an ARFF document (Hall et al., 2009). WEKA contains methods for all the standard data mining problems, such as regression, clustering, classification, attribute selection and association rule mining (Witten et al., 2016). There are three distinct ways of using the Explorer in WEKA: one is to apply a learning method to a dataset and analyze its output to learn more about the data (Witten et al., 2016); another is to use learned models to generate predictions on new instances (Witten et al., 2016); a third is to apply several different learners and compare their performance (Witten et al., 2016). In our work we use all three approaches: the first to analyze which kind of user will like specific games; the second to use our learned models to predict how a new user will evaluate a specific game; and, for the latter, also the third, by trying several algorithms and choosing the one that best fits our problem.

2.6 Explored Machine Learning Algorithms

To learn is to acquire knowledge by study, experience, or being taught. So, we can say that people and machines learn when they change their behavior in a way that makes them perform better in the future (Witten et al., 2016). In our system we work with numerical and continuous attributes, so we only analyzed algorithms that can operate under these conditions. Besides, it is often possible to translate nominal attributes into numerical ones, but not vice versa (Witten et al., 2016). We will now describe the selected algorithms that fulfill our specifications.

2.6.1 Instance-Based Algorithms

Instance-Based (IB), or lazy, algorithms are classic techniques that have been used for many decades to solve prediction tasks in statistics (Witten et al., 2016). In IB classification, each new instance is compared with existing ones using a distance metric, and the closest existing instance is used to assign the class to the new one. This is called the nearest-neighbor classification method (Witten et al., 2016). However, IB learning using nearest-neighbor classifiers is quite volatile in the presence of noise, and revolves around using a distance function that matches the task (Witten et al., 2016). The fundamental assumption of these algorithms is that similar instances will have similar classifications. They require the entire training data to be stored, which may not be feasible in practice with large datasets (Witten et al., 2016). Although there are other choices, most IB algorithms use distance functions, the best known being the Euclidean distance (Witten et al., 2016). Given two instances a and b, each with k entries, the distance d between them is defined as:

d(a, b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \dots + (a_k - b_k)^2}. (2.1)

One alternative to the Euclidean distance is the Manhattan distance, where the differences between attribute values are added as absolute values instead of being squared (Aha, Kibler, & Albert, 1991). There are other alternatives, but these are heavier to compute, so the Euclidean distance is generally a good compromise. Usually, when we work with different attributes, these are measured on different scales, so if the Euclidean distance formula were used directly, the effects of some attributes could be completely dwarfed by others (Witten et al., 2016). Thus, it is usual to normalize all the attribute values to lie between 0 and 1. For that we can use the following formula, where n_i is the normalized value of attribute i, v_i is the attribute value, and the maximum and minimum are taken over all instances in the training set:

n_i = \frac{v_i - \min v_i}{\max v_i - \min v_i}. (2.2)
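As an illustration, the min-max rescaling of Eq. 2.2 can be sketched in a few lines of Python; this is a standalone example, independent of the WEKA pipeline, with invented values:

```python
def min_max_normalize(values):
    """Eq. 2.2: rescale a list of attribute values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Invented attribute values on an arbitrary scale:
print(min_max_normalize([2.0, 4.0, 10.0]))  # -> [0.0, 0.25, 1.0]
```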

In our work we will not need this normalization, because all our attributes are already on the same scale of values. IB learning is very simple and effective, but it can be slow for large datasets, because the entire training set must be searched for each test instance in order to find its closest neighbor; this procedure is linear in the number of training instances. K-Nearest-Neighbor (KNN) is an algorithm which only considers the K neighbors (K is a positive integer, usually small) around the object, not the underlying data distribution (Yue, Wang, Chen, Payne, & Liu, 2018). The output is then the average of the values of the K nearest neighbors, possibly weighted by the proximity of each neighbor. In the case K = 1, we have the nearest-neighbor algorithm, which can be used to implement a basic form of IB learning (Aha et al., 1991). On the other hand, if we are classifying a new instance in a two-dimensional system with a K value of 3, the KNN algorithm behaves as presented in Figure 2.10.

Figure 2.10: A KNN example with two distinct features where a new instance is being evaluated within the three nearest neighbors (Punch III et al., 1993)

There are several practical problems with this simple method. First, it tends to be slow for large training sets. Second, it can have problems with noisy data, because we are assuming that our value is exactly the same as our closest neighbor’s, which can lead to hasty wrong predictions (Shaw & Jebara, 2009). For high-dimensional datasets the complexity grows, and dimension reduction is usually performed prior to applying a KNN algorithm in order to avoid the effects of the curse of dimensionality: when the dimensionality increases, the volume of the space increases so fast that the available data become sparse, and the search for the K closest neighbors becomes heavier to compute. So, feature extraction and dimension reduction can be combined in one step using techniques such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) or Canonical Correlation Analysis (CCA) as a pre-processing step, followed by KNN on the feature vectors in the reduced-dimension space. In machine learning this process is also called low-dimensional embedding (Shaw & Jebara, 2009).
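To make the method concrete, a minimal Python sketch of nearest-neighbor prediction for a numeric class, as in our review-score setting, could look as follows. It is a standalone illustration, not our WEKA implementation, and the archetype scores and review scores in the example are invented:

```python
from math import sqrt

def euclidean(a, b):
    # Eq. 2.1: straight-line distance between two instances a and b
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Predict a numeric class (e.g. a review score) as the mean of the
    classes of the k training instances closest to the query.
    `train` is a list of (attribute_vector, class_value) pairs."""
    neighbors = sorted(train, key=lambda t: euclidean(t[0], query))[:k]
    return sum(value for _, value in neighbors) / k

# Invented data: (two archetype scores, review score given by that player)
train = [([1.0, 0.0], 9.0), ([0.9, 0.1], 8.0),
         ([0.0, 1.0], 3.0), ([0.1, 0.9], 4.0)]
print(knn_predict(train, [0.95, 0.05], k=2))  # -> 8.5 (mean of 9.0 and 8.0)
```

With K = 1 the function reduces to plain nearest-neighbor prediction, the basic form of IB learning mentioned above.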

In 2014, a study used KNN classification in an adaptive virtual-reality framework (Johnson, Tang, & Franzwa, 2014). In this paper, an empirical evaluation demonstrates that the KNN-based game system accurately predicts players’ domain knowledge levels, based on which it offers differentiated instructions to guide them through problem-solving in the game (Johnson et al., 2014).

2.6.2 Regression Algorithms

Function, or regression, algorithms have been widely used in statistical applications for decades (Witten et al., 2016). The idea is to express the class as a linear combination of the attributes, where a weight is applied to each attribute before adding them together. Here, the input and output attributes are all numeric, and the aim is to come up with good values for the weights, so that the model matches the desired output.

Linear Regression (LR) is a natural technique to consider when the outcome, or class, is numeric and all the attributes are numeric (Witten et al., 2016). It is an excellent, simple method for numeric prediction (Witten et al., 2016). The idea of (simple) LR is to express the class as a linear combination of the attributes with predetermined weights, so the data are modeled to fit a straight line. Figure 2.11 shows an LR graph in a system with two dimensions, where the red line represents the model obtained from all the instances, represented as blue dots. With only two dimensions, this can be described as:

y = m × x + b. (2.3)

Multiple linear regression is an extension of (simple) LR, allowing a variable y to be modeled as a linear function of two or more predictor variables (Witten et al., 2016):

y = w0 + w1 × x1 + w2 × x2 + ... + wk × xk, (2.4)

where y is the linear regression function, the x’s are the k attribute values, and the w’s are the weights.

Figure 2.11: An LR graph in a system with two dimensions, where the red line represents the model obtained from all instances, represented as blue dots.
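As an illustration of the two-dimensional case of Eq. 2.3, the slope m and intercept b can be computed in closed form with ordinary least squares. The following standalone Python sketch uses invented data and is independent of the WEKA implementation:

```python
def fit_line(xs, ys):
    """Ordinary least squares for the simple LR model y = m*x + b (Eq. 2.3)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    # Intercept: the fitted line passes through the point of means
    b = mean_y - m * mean_x
    return m, b

# Invented data lying exactly on y = 2x:
m, b = fit_line([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(m, b)  # -> 2.0 0.0
```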

The weights are calculated from the training dataset. Here the formula gets more complex, because we need a way of expressing each attribute value for each training instance. The first instance has a class value y^{(1)}, k attribute values x_j^{(1)}, and weights w_j. Thus, the predicted value for the first instance’s class can be written as:

y^{(1)} = \sum_{j=0}^{k} w_j x_j^{(1)}. (2.5)

The result is a set of numeric weights, based on the training data, which we can use to predict the class of new instances (Witten et al., 2016). This said, LR is an extraordinarily simple method for numeric prediction, and it has been widely used in many areas of statistical application (Witten et al., 2016). But linear models suffer from the disadvantage of, as the name says, linearity (Witten et al., 2016): if the data exhibits a non-linear dependency, the best-fitting straight line will still be found, but it may not fit very well (Witten et al., 2016). A filter function (e.g., logarithmic or quadratic) is often used to address this problem by transforming the data so that a linear model applies. In 2010, a study using multiple LR examined the relationship between selected personality traits and online gaming addiction (Mehroof & Griffiths, 2010). The authors concluded that the association between the predictive variables and digital game addiction was strongly positive, with a high coefficient of determination, R = 73% (Mehroof & Griffiths, 2010). The results of this paper indicate that personality traits contribute to addiction in online digital games (Mehroof & Griffiths, 2010). Multilayer Perceptron (MLP) is an algorithm that uses back-propagation to classify instances (Witten et al., 2016). The algorithm is divided into three steps, applied recursively: initialization of the weights, propagation of the inputs forward, and back-propagation of the error (Witten et al., 2016). It learns the structure of the network by learning the connection weights, through a fixed structure (Witten

26 et al., 2016). For each training instance, the weights are modified in such a way as to minimize the mean-squared error of the network between the target value and its prediction (Witten et al., 2016). The solution for the MLP is that these prediction modifications are made in a backward way (from the output layer, through each hidden layer until the first hidden layer) until they reach a specific threshold parameter to end the algorithm (Witten et al., 2016). To this end, there is a known optimization algorithm called gradient descent, which takes derivatives and converts the weighted sum of the inputs into a binary prediction which is not completely differentiable in the middle values, as shown in Figure 2.12, and it is called the sigmoid function: 1 f(x) = . (2.6) 1 + e−x
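The weights of the linear model in Eq. (2.5) can be estimated by ordinary least squares. Below is a minimal sketch with NumPy; the data values and the use of `lstsq` are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

# Hypothetical training data: 4 instances, 2 attributes each.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
y = np.array([5.0, 4.0, 11.0, 10.0])

# Prepend a constant attribute x0 = 1 so that w0 acts as the intercept,
# matching the sum starting at j = 0 in Eq. (2.5).
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Least-squares estimate of the weights from the training data.
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Predict the class of a new instance as the weighted sum of Eq. (2.5).
new = np.array([1.0, 2.5, 2.5])  # leading 1.0 is the constant attribute
prediction = float(new @ w)
```

With this toy data the relation is exactly linear, so the fitted weights recover it; on real review data the least-squares line would only approximate the instances.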

Figure 2.12: Graph of the sigmoid function, which smoothly approximates the step function around the middle values (Witten et al., 2016).

The LR algorithm seen before is a particular case of the MLP: one with a single layer of one node and a linear activation instead of the sigmoid activation function.

Given an input I_i to a node i in a hidden layer (or the output layer), where w_pi is the weight of the connection from node p in the previous layer to node i, O_p is the output of node p in the previous layer, and B_i is the bias, which works as a threshold, we can write the following function of the MLP algorithm:

I_i = \sum_{p} w_{pi} O_p + B_i.    (2.7)

In 2011, a study adopted an MLP with the back-propagation learning rule to predict the winning rates of the two teams in the 2006 World Cup football games, according to the teams' official statistical data from the previous stages (Huang & Chen, 2011). The prediction accuracy achieved was 75% when draw games were excluded (Huang & Chen, 2011).
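A forward pass implementing Eq. (2.7) with the sigmoid activation of Eq. (2.6) can be sketched as follows. The network shape and weights are arbitrary illustrative values, and the back-propagation training step is not shown.

```python
import math

def sigmoid(x):
    # Eq. (2.6): squashes the weighted sum into the (0, 1) range.
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """One forward pass through an MLP.

    `layers` is a list of layers; each layer is a list of
    (weights, bias) pairs, one pair per node, following Eq. (2.7):
    I_i = sum_p w_pi * O_p + B_i, then O_i = sigmoid(I_i).
    """
    outputs = inputs
    for layer in layers:
        outputs = [sigmoid(sum(w * o for w, o in zip(weights, outputs)) + bias)
                   for weights, bias in layer]
    return outputs

# Hypothetical network: 2 inputs, one hidden layer of 2 nodes, 1 output.
layers = [
    [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)],  # hidden layer
    [([1.0, -1.0], 0.0)],                      # output layer
]
result = forward([1.0, 0.5], layers)
```

Training would repeatedly run this pass, compare `result` against the target value, and propagate the error backwards to adjust each weight.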

2.6.3 Tree Algorithms

Decision tree algorithms are used in many practical applications of machine learning because they are fast and produce intelligible output that is often accurate (Han, Pei, & Kamber, 2011). The tree data structure is composed of nodes, each of which tests the value of an attribute; branches, which correspond to the outcomes of those tests; and leaves, which represent the classes (Witten et al., 2016). In tree algorithms it is very common to use pruning methods, because pruned trees are smaller, less complex, and easier to comprehend (Witten et al., 2016). Some of the branches will reflect anomalies in the training data due to noise or outliers (Witten et al., 2016). Pruning methods typically use statistical measures to remove the least reliable branches (Witten et al., 2016). Another technique, much used in tree algorithms, which makes the model less complex and often gives better accuracy, is smoothing (Witten et al., 2016).

M5Prime (M5P) is a model tree algorithm that deals with continuous-class learning problems (Wang & Witten, 1996). This technique is a development of the M5 algorithm proposed by Quinlan in 1992, although the details of that work are not fully available (Wang & Witten, 1996). The main idea behind M5P is to generate a conventional decision tree with linear regression models at the leaves (Machado, Clua, & Zadrozny, 2010). The process is quite straightforward and composed of three distinct stages (Wang & Witten, 1996).

Firstly, it builds a tree using a decision-tree induction algorithm (Witten et al., 2016). In this step, instead of maximizing the information gain at each interior node, a binary splitting function is used that minimizes the intra-subset variation in the class values at the leaves (Witten et al., 2016). Thus, the attribute that maximizes the expected error reduction is chosen (Witten et al., 2016). The splitting function is named Standard Deviation Reduction (SDR) and is given by the formula:

\mathrm{SDR} = sd(R) - \sum_{i} \frac{|R_i|}{|R|} \times sd(R_i),    (2.8)

where sd is the function that calculates the standard deviation of the class values of a set of examples, R is the set of examples that reach the node, and R1, R2, R3, ... are the sets that result from splitting the node according to the chosen attribute (Wang & Witten, 1996). Note that the SDR is set to infinity if splitting on the attribute would create a leaf with fewer than two examples, in order to maintain the form of a binary tree (Wang & Witten, 1996).
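The SDR criterion of Eq. (2.8) can be computed as in the following sketch. The class values are invented, and the sentinel for too-small leaves is our interpretation: since SDR is maximized, we return negative infinity so such splits are never chosen.

```python
import statistics

def sdr(parent, subsets):
    """Standard Deviation Reduction of Eq. (2.8).

    `parent` is the list of class values reaching the node;
    `subsets` are the class-value lists produced by a candidate split.
    """
    # Guard against splits that would create a leaf with fewer than
    # two examples (sentinel chosen so the split is never selected).
    if any(len(s) < 2 for s in subsets):
        return float("-inf")
    sd = statistics.pstdev  # population standard deviation
    return sd(parent) - sum(len(s) / len(parent) * sd(s) for s in subsets)

R = [1.0, 2.0, 8.0, 9.0]
best = sdr(R, [[1.0, 2.0], [8.0, 9.0]])   # split separating low from high
worse = sdr(R, [[1.0, 8.0], [2.0, 9.0]])  # split mixing low and high
```

The split that separates the low values from the high ones yields the larger reduction, so it would be the one chosen at this node.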

In the second stage, consideration is given to pruning the tree back from each leaf to the corresponding interior node, replacing that node by a regression plane instead of the constant value used in many tree algorithms (Wang & Witten, 1996). The attributes that define that regression are precisely those that participate in decisions in the child nodes of the current one (Wang & Witten, 1996). Note that the pruning factor controls a trade-off between prediction accuracy and tree size (Wang & Witten, 1996).

Finally, it uses a smoothing process in order to avoid the sharp discontinuities that may occur between the subtrees at the leaves of the pruned tree (Machado et al., 2010). This process is even more important for models built from a small number of training instances (Wang & Witten, 1996). The prediction from the leaf model is passed up towards the root, being smoothed at each node by combining it with the value predicted by the linear model for that node (Wang & Witten, 1996). The smoothing process can be described as:

p' = \frac{tp + kq}{t + k},    (2.9)

where p' is the predicted value passed up to the next higher node, p is the prediction passed up from below to this node, q is the value predicted by the model at this node, t is the number of training instances that reach the node below, and k is a constant (Wang & Witten, 1996). This smoothing process has already been applied to several types of data and has been shown to consistently increase the accuracy of predictions (Wang & Witten, 1996).

In 2010, a study using M5P, in the WEKA workbench, examined the generation of emergent behaviors for characters in a strategy game (Machado et al., 2010). The authors compared the efficiency of M5P with two other machine learning algorithms available in WEKA, and M5P proved to be the best (Machado et al., 2010). Moreover, the study showed that machine learning algorithms in games are capable of developing strategies from simple behaviors (Machado et al., 2010). In this experiment, M5P produced the strategy that characters should tend to be fast and have high life in order to build the best battalions (Machado et al., 2010).
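The smoothing step of Eq. (2.9) is simply a weighted average of the two predictions, as the small sketch below shows (k = 15 is an assumed value for the constant, not taken from the thesis).

```python
def smooth(p, q, t, k=15.0):
    """M5P smoothing of Eq. (2.9).

    p: prediction passed up from the node below,
    q: value predicted by the linear model at this node,
    t: number of training instances reaching the node below,
    k: smoothing constant (k = 15 is an assumption here).
    """
    return (t * p + k * q) / (t + k)

# With few training instances below (t small), the node's own model q
# dominates; with many instances, the subtree prediction p dominates.
few = smooth(p=8.0, q=5.0, t=3)     # pulled strongly toward q
many = smooth(p=8.0, q=5.0, t=300)  # stays close to p
```

This is why the process matters most for models built from few training instances: the less evidence a subtree has, the more its prediction is corrected by its parent's model.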

2.7 Credibility

For classification problems, it is natural to measure a classifier's performance in terms of error rate (Witten et al., 2016). The classifier predicts the class of each instance: if it is correct, that is counted as a success; if not, it is an error (Witten et al., 2016). The error rate is the proportion of errors made over a whole set of instances, and it measures the overall performance of the classifier (Witten et al., 2016). Evaluation methods reserve a certain amount of data for testing and use the remainder for training; more concretely, it is common to use two-thirds of the data for training and the remaining third for testing (Witten et al., 2016). Of course, we could be unlucky and the sample used for training/testing might not be representative (Witten et al., 2016). In general, we cannot tell whether the samples are representative or not, although it is worthwhile using the stratified holdout procedure, which puts each class in the right proportions in both the testing and training sets (Witten et al., 2016). Besides that, we can use the repeated holdout, which repeats the whole process several times with different random samples (Witten et al., 2016). The error rates from the different iterations are averaged to yield an overall error rate. In this work we will explain the bootstrap and cross-validation methods.
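The basic holdout estimate described above can be sketched as follows. The majority-class learner and the toy data are placeholders invented for illustration; any real classifier would take the place of `majority_learner`.

```python
import random

def holdout_error_rate(instances, learner, train_fraction=2 / 3, seed=0):
    """Plain holdout: train on two-thirds of the data, test on the rest.

    `instances` is a list of (attributes, true_class) pairs and
    `learner` is any function train_set -> (attributes -> predicted class).
    """
    data = instances[:]
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_fraction)
    train, test = data[:cut], data[cut:]
    model = learner(train)
    errors = sum(1 for x, y in test if model(x) != y)
    return errors / len(test)

# Toy learner: always predicts the majority class of its training set.
def majority_learner(train):
    classes = [y for _, y in train]
    majority = max(set(classes), key=classes.count)
    return lambda x: majority

data = [((i,), "a") for i in range(70)] + [((i,), "b") for i in range(30)]
rate = holdout_error_rate(data, majority_learner)
```

Repeated holdout would call `holdout_error_rate` with several different seeds and average the resulting rates; stratified holdout would additionally shuffle within each class before cutting.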

2.7.1 Bootstrap

Bootstrap, often referred to as the 0.632 bootstrap, is an estimation technique that samples the given training instances uniformly with replacement (Witten et al., 2016). In this method, each time an instance is selected it remains eligible, so it may be chosen again and re-added to the training dataset (Kohavi, 1995). The 0.632 bootstrap works as follows: suppose we have a set of i instances; the data set is sampled i times, with replacement (Witten et al., 2016). It is probable that some of those i instances will be repeated, while the ones never chosen will be used as the test set. The probability of a particular instance being selected in one draw is 1/i, so the probability of it never being chosen over the i draws tends to:

\left(1 - \frac{1}{i}\right)^{i} \approx e^{-1} = 0.368.    (2.10)

(where e = 2.718 is the base of the natural logarithms). Based on this, it is expected that, on average, 36.8% of the instances will never be selected for the training set (Witten et al., 2016). For that reason, in a large data set, the training set will contain about 63.2% of the distinct instances (which is why it is called the 0.632 bootstrap method) (Witten et al., 2016).

In 2014, a study was made using bootstrap techniques for empirical games (Wiedenbeck, Cassell, & Wellman, 2014). That experiment supports statistical reasoning techniques as part of the empirical game-theoretic analysis process (Wiedenbeck et al., 2014). Moreover, the authors demonstrate that applying bootstrapped regret confidence intervals in simulation-based game modeling can improve sampling decisions (Wiedenbeck et al., 2014).
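The 63.2%/36.8% figures behind Eq. (2.10) can be checked empirically. This sketch (with an assumed dataset size and seed) draws one bootstrap sample and measures the fraction of distinct instances that end up in the training set.

```python
import random

rng = random.Random(42)
n = 100_000  # large dataset, so the 63.2% figure is clearly visible

# Sample n instances uniformly with replacement: the bootstrap training set.
training = [rng.randrange(n) for _ in range(n)]

# Distinct instances picked for training; the rest would form the test set.
picked = set(training)
fraction_in_training = len(picked) / n        # close to 0.632
fraction_left_out = 1 - fraction_in_training  # close to 0.368 = 1/e
```

Running this with any seed gives a fraction within a small margin of 1 − e⁻¹, confirming that roughly a third of the instances are left out and available for testing.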

2.7.2 Cross-validation

Cross-validation is an important and well-known strategy for algorithm selection (Arlot & Celisse, 2010). It is a statistical technique where we decide on a fixed number of folds/partitions of the data (Witten et al., 2016). Imagine that we use stratified tenfold cross-validation: we combine the stratification techniques seen before with splitting the data randomly into ten approximately equal portions, each of which, in turn, is used for testing while the others are used for training, repeating this procedure ten times (Witten et al., 2016). Why 10? Extensive tests with different algorithms have shown that 10 is about the right number of partitions to get the best estimate of error with low bias and variance (Witten et al., 2016). However, these arguments are not conclusive and the debate continues. Besides that, tests have shown that stratification improves results slightly and reduces the variation, although it does not eliminate it entirely (Witten et al., 2016). Tenfold cross-validation is therefore often used, as presented in Figure 2.13, where the white squares represent the training dataset and the orange squares the validation dataset (Witten et al., 2016). Of course, this technique has a notable drawback: it is computationally expensive, since it involves running the algorithm ten times on datasets of nine-tenths the size of the original (Witten et al., 2016).

Figure 2.13: Tenfold cross-validation repeated ten times where the white squares represent the training dataset and the orange squares the validation dataset.

In 2015, research aiming at the development and validation of the Game Perception Scale was carried out (Vandercruysse et al., 2015). In this study, the authors used cross-validation to validate and confirm their model (Vandercruysse et al., 2015).
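A plain (non-stratified) tenfold cross-validation loop can be sketched as follows. The mean-predicting learner and the toy data are placeholders; a real regressor would simply replace `mean_learner`.

```python
import random

def tenfold_indices(n, folds=10, seed=0):
    """Split indices 0..n-1 into `folds` approximately equal partitions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::folds] for i in range(folds)]

def cross_validate(instances, learner, folds=10):
    """Each fold is used once for testing while the rest train the model.

    `learner` is a placeholder: train_set -> (attributes -> prediction).
    Returns the per-fold mean absolute errors; their average estimates
    the performance of the learner on unseen data.
    """
    parts = tenfold_indices(len(instances), folds)
    errors = []
    for k in range(len(parts)):
        test = [instances[i] for i in parts[k]]
        train = [instances[i] for j, p in enumerate(parts) if j != k for i in p]
        model = learner(train)
        fold_err = sum(abs(model(x) - y) for x, y in test) / len(test)
        errors.append(fold_err)
    return errors

# Toy numeric learner: always predicts the mean score of its training set.
def mean_learner(train):
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean

data = [((i,), float(i % 10) + 1) for i in range(100)]  # scores 1..10
fold_errors = cross_validate(data, mean_learner)
```

The computational cost mentioned above is visible in the loop: the learner is trained ten times, each time on nine-tenths of the data.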

2.8 Discussion

We started this proposal by calling attention to the importance of our work. We found that there was no recommender system in the game industry that took into account the individual characteristics of each player. Moreover, we firmly believe our work has great potential to improve market value in the game review sector.

We then started this Related Work chapter with the intention of understanding and analyzing the different aspects on which our work relies. First we discussed some approaches in the field of game reviews and different methods of gathering information. Following that line of thinking, we then presented a list of game genres. We believe game genres are important to our work because games from different genres can offer different challenges to the players, and so different players will like different game genres. We also took a look at the work done in the psychological field on theories of personality models, where we presented the FFM and MBTI. These personality models are relevant to our research because we assume that different people, having different personalities, perceive and act upon the world in different ways, each with their particular preferences. Moreover, these personality models served as a basis for much of the work done on personality in digital games, and MBTI led to the development of the forerunner of the player model used in our work. We then analyzed and discussed ways of creating player models in games in accordance with players' psychological traits. We looked at the work done on player profiles, such as Bartle Player Types, GMP, DGD, and BrainHex, taking a deeper look at the latter because we used its survey and model in our research.

Since our work relies upon creating a system that produces game reviews according to the player's personality, we took a look at the field of machine learning. We explored the open-source workbench for machine learning and data preprocessing named WEKA, which allows us to use and analyze algorithms in parallel. Our review system receives input and output examples and learns how to measure the importance of each part of the dataset, which is the same as saying that our system looks for a pattern in the dataset. Going deeper into this field, we analyzed how different mathematical algorithms work, such as KNN, LR, MLP and M5P, and which best fits our purpose. We chose to work with these algorithms because our dataset is composed of numerical and continuous attributes, and so we needed algorithms that perform well with this kind of data. Besides, we saw that they have been used in similar works. Finally, we also took a look at evaluation methods such as bootstrap and cross-validation, gaining a deeper understanding of the latter since it was the one we used to validate our work.

3 Methods and Procedures

Contents 3.1 Measuring Tools...... 34

3.2 Approaches...... 34

3.3 Methodology...... 35

3.4 Metrics...... 42

3.5 Discussion...... 42

In this chapter, we will describe in detail our work on how to recommend digital games to people according to their own personality. To accomplish this, we will start by explaining our measuring tools and the decisions we made during this step. Besides, we will describe the two approaches developed to give personalized review scores. Afterwards, we will describe the methodology, starting by introducing our user questionnaire and the filtering process for our selected digital games. Next, we will describe the filtering procedure applied to the obtained dataset, where we will explain the division of the data into training and test sets. Then, we will present the chosen algorithms used to build our review system models in WEKA. In the final step of our methodology, we will describe how we used the testing datasets to evaluate the obtained review systems. Lastly, we will present the metrics used to evaluate our systems, followed by a discussion of all the developed methodology.

3.1 Measuring Tools

In order to reach the goal of this research, we needed a way to obtain the personality of the players and their own evaluation of selected digital games. For this purpose we decided to create a survey. Thus, we used Google Forms, mainly because it is freely available and user-friendly, and we had previous positive experience with this tool (Mathias & Sakai, 2013). Besides, since this dissertation was done under an agreement between two universities on different continents, we needed a survey tool that allowed us to distribute the survey and obtain the results automatically, regardless of physical location and schedules.

3.2 Approaches

In our work we considered two approaches to giving personalized review scores, as can be seen in Figure 3.1. The first one, named Review System (RS), takes into account each player's BrainHex model and is applied to each game individually. The second approach, named General Review System (GRS), receives as input both the players' BrainHex models and the games' BrainHex models, and can be applied to any game.

Figure 3.1: The two approaches used to give personalized review scores.

3.3 Methodology

Our methodology follows a 6-stage pipeline, composed of a User Questionnaire, Game Filtering, Dataset Filtering and ARFF Preparation, Algorithm Filtering, the Review System and the Review System Validation, as can be seen in Figure 3.2. Each stage will be detailed in the next sections.

Figure 3.2: Flow chart of the six methodology steps. It starts with the User Questionnaire, followed by Game Filtering, and Dataset Filtering and ARFF Preparation. Next come the Algorithm Filtering process and the Review System. The pipeline ends with the Review System Validation step.

Considering now the creation of the User Questionnaire, this stage was carried out as an iterative process together with the Game Filtering, in which we added and removed digital games from the study based on the answers we obtained from the questionnaire. When we finished our game filtering process, we started the next step of our pipeline, the Dataset Filtering and ARFF Preparation. Here we began with an exhaustive data analysis, removing the instances with erroneous information. In this same stage we built the ARFF files, taking into account the chosen digital games, and divided the obtained data into training and testing datasets, with proportions of 70% and 30%, respectively. The testing ARFF files were not used until the last step of our methodology pipeline. The training dataset files were used for the WEKA and Review System processing, based on the chosen algorithms from the Algorithm Filtering stage. We did all the preprocessing and processing of the data using WEKA in order to obtain our review system models for each chosen algorithm and game. Besides this, we also created an ARFF file with all the chosen games together, named GRS, and built the corresponding review system using WEKA for each selected algorithm. The final part of our methodology pipeline aimed to validate the developed review systems. To obtain this validation, we used the testing dataset files and evaluated the goodness of fit of the review system models obtained from the training ARFF files.

3.3.1 User Questionnaire

In order to obtain the personality of the players and their own evaluation of selected digital games, we created a survey on Google Forms1, which can be seen in more detail in Appendix C - User Questionnaire 1. Since we wanted hundreds of users to answer the survey, we needed to choose carefully what to ask in order to keep the survey short, because, in general, the shorter the questionnaire, the greater the percentage of answers obtained (Leslie, 1970). During the creation of this questionnaire, we went through some development cycles until we reached the final version. These cycles were aimed at validating the adequacy of the set of games that would be used in the survey. Therefore, we studied the worldwide digital games market of the past few years across the most diverse systems, in order to get as many answers as possible. This being said, we made a first test version of this questionnaire and gave it to 15 people, in order to receive their feedback about the structure and clarity of the whole questionnaire. After this, we fixed the errors found and started to distribute it massively. Since this survey was anonymous, we did not ask for names, emails or other identifiers. The questionnaire took around 10 minutes to fill in, and was divided into three distinct sections, which we will now explain in detail.

In the first part we collected the demographic data of the players: their gender, age, and type of gamer. To determine their kind of gamer, we asked them how often they play digital games. If the answer was I do not play digital games, we classified the player as a non-gamer. If the answer was I play digital games occasionally when the opportunity presents itself, we classified the player as a casual gamer. Otherwise, if the answer was I make some time in my schedule to play digital games, the user was classified as a hardcore gamer.

Secondly, we asked the players to give their own review score, an integer value between 1 and 10, to

1https://docs.google.com/forms/d/e/1FAIpQLSfZpG2q2F5iKv0zqvM7nlueiXB3dQ1E9xozTqOFrusMsvHhhw/viewform

the selected digital games. In this section we asked exactly the same questions for each one of the games. Furthermore, this part of the survey had a section for users to add reviews for a maximum of 3 games that were not explicitly in the questionnaire. From those open answers, we noticed that some games were being reviewed frequently, so we decided to add them to the study. On the other hand, some games that were initially in the study were dropped because they were getting few reviews from the players, and we needed games with several reviews to build a more stable and consistent system. Besides these open questions, for each game played we asked three more questions. First we asked for the estimated total play time in hours, in order to obtain a better overview of how well the user knows that game. Afterwards, we presented the players with a scale of values from 1 to 10, where we asked them to answer with their own game review score and the review score they think most players would give to that game. We decided to put these two questions side by side in order to compel the users to distinguish between the way they perceive the tastes of other players and their own. The personal game review scores people gave us were the instances used in our study. Lastly, the third section began by giving a link to the BrainHex test, where the players had to fill in the BrainHex survey and report their results. With that, we obtained their personal BrainHex models, used to build the review system models.

3.3.2 Game Filtering

We will now explain the filtering process used to obtain the list of the selected digital games with which we validated our game review systems. We chose a list of digital games from different genres and for the most diverse kinds of systems. We decided to consider only recent games, at most 4 years from their launch, in order to get game reviews from recent memories. It was of interest to us to select a set of games that covers all seven archetypes of the BrainHex model we are using. This way we try to cover the whole space of the model and have games for all the players' profiles. Additionally, our hypothesis is that digital games can be represented by a BrainHex model that evaluates their content in terms of the behavior they promote. So, in order to obtain the BrainHex model of each selected game, we decided to reproduce the BrainHex survey in Google Forms and ask people who had played those games to answer it, replacing the user's desires with what the game offers in terms of gameplay. This questionnaire was anonymous, took around 5 minutes per game to fill in, and can be seen in more detail in Appendix D - User Questionnaire 2. We ensured that at least two players answered for each of the selected games.

Our aim with this survey was to get the BrainHex model for each selected game and to analyze whether the players had similar opinions about what the games offer. So, after we obtained the dataset and the values for each dimension, we analyzed the opinion deviation for each BrainHex dimension by subtracting the lower value from the higher one, normalizing it to [0, 1], and expressing it as a percentage. We did this process game by game and registered the overall opinion deviations, using the following formula:

d = \frac{\sum_{i=1}^{n} |p_i - q_i|}{n \times s} \times 100,    (3.1)

where d is the deviation, in percentage; p_i and q_i are the two people's opinion scores for archetype i; n is the number of archetypes of our BrainHex model, which in our case is 7; s is the scale dimension, which in our case is 30, since we have a scale from -10 to 20; and the fraction is multiplied by 100 in order to obtain a percentage value. After we got all these opinions, we had to decide which of them we would use in our model. So, we went game by game and chose randomly, from the opinions we had, the values for each BrainHex archetype. We recorded the obtained Mean-Absolute Error (MAE), using the Linear Regression algorithm in WEKA, validating it with the tenfold cross-validation technique in the GRS model. Then, for each dimension where the opinions differed by more than 15%, we tested all possible combinations using the Linear Regression in the GRS model. We studied which values had the lowest MAE and chose them.
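Eq. (3.1) can be computed directly; the two opinion vectors below are hypothetical.

```python
def opinion_deviation(p, q, scale=30):
    """Overall opinion deviation of Eq. (3.1), as a percentage.

    p and q are two respondents' BrainHex scores for the same game,
    one value per archetype; the scale spans -10..20, hence 30.
    """
    n = len(p)  # 7 archetypes in the BrainHex model
    return sum(abs(pi - qi) for pi, qi in zip(p, q)) / (n * scale) * 100

# Hypothetical pair of opinions about the same game, one value
# per archetype in a fixed order.
p = [15, 10, -5, 20, 0, 5, 10]
q = [12, 10, -2, 14, 0, 8, 10]
d = opinion_deviation(p, q)
```

In this example d is about 7%, below the 15% threshold, so a randomly chosen value from either respondent would be accepted for every dimension.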

3.3.3 Dataset Filtering and ARFF Preparation

In this stage of the methodology pipeline, we filtered our dataset in order to create better review system models. This dataset filtering process was made through several exhaustive observations of the data, which can be divided into four main steps. In the first two steps, we took a careful look at the obtained data to remove the instances in which all the user's BrainHex dimensions had the same value, and the instances in which the users gave a review score but did not answer how many hours they had played. We considered that these instances contained erroneous information, so it was indispensable to remove them. After that, in the third step of our dataset filtering, we eliminated all instances of users that had played the selected game for 1 hour or less. In our opinion, 1 hour is not enough time to get a consistent idea of the selected games, and so the review score might not be adequate. We did not put more constraints on the played time because we wanted a game score from most of the people, and not only from those who completed the game. Most people who do not complete a game stop playing because there is something about it they did not like, and we want those reviews as well. So, we did not remove people who have not experienced the totality of the content offered by the game, but removed experiences too short to be considered an informed opinion rather than a strict "first impression". Afterwards, we divided the data into training and test sets, such that 70% of the instances were used for training and the other 30% were used for testing. This division was done separately for each digital

game.

Lastly, we imposed another restriction on our training dataset, to identify and remove outlier instances and improve the predictive performance of our review system. We consider outlier instances those reviews that got very different scores compared to the other reviews from players with similar player types. This constraint was applied only to the training set because we did not want any previous knowledge of the testing datasets. The elimination of outliers started by observing the most extreme review scores (1 and 10). If the number of instances with the observed review score was less than 5% of the total instances, and the adjacent score also accounted for less than 5%, then the instances with the observed review score were classified as outlier candidates. Finally, for each candidate, we looked at the other users whose BrainHex models did not differ by more than 10% from the candidate's; if, on average, their review scores differed by 15% or more from the candidate's, the identified outlier was eliminated from our input dataset. Otherwise, the instance was kept in our dataset. We took this approach in order not to throw away valid information, since our hypothesis was that people with different personalities will like different games. This way, we identified isolated outliers in the training dataset that gave very different review scores but had no significantly different personality from the other users.

In this pipeline stage, we created the corresponding ARFF files for both training and testing datasets from the Excel file generated by Google Forms when collecting the users' answers. This being said, we built one ARFF file for each selected game, for both the training and testing datasets. Moreover, we also built one more ARFF file for the training dataset and another for the testing dataset, with all the games together, to build the GRS.
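Our understanding of this outlier rule can be sketched as follows; the 10% and 15% thresholds come from the text, while the distance measure and the example data are our own assumptions made for illustration.

```python
def brainhex_distance(a, b, scale=30):
    # Mean per-archetype difference as a fraction of the -10..20 scale
    # (an assumed way of comparing two BrainHex models).
    return sum(abs(x - y) for x, y in zip(a, b)) / (len(a) * scale)

def is_outlier(candidate, others, score_scale=10):
    """Check one suspicious review against users with similar personalities.

    candidate / others: (brainhex_vector, review_score) pairs.
    Flags the candidate when similar users (BrainHex within 10%) gave,
    on average, a score differing by 15% or more of the review scale.
    """
    bh, score = candidate
    similar = [s for v, s in others if brainhex_distance(bh, v) <= 0.10]
    if not similar:
        return False  # no comparable users, keep the instance
    mean_score = sum(similar) / len(similar)
    return abs(mean_score - score) / score_scale >= 0.15

others = [([10, 5, 0, 15, 5, 0, 10], 8.0),
          ([11, 5, 1, 14, 5, 0, 9], 9.0)]
flagged = is_outlier(([10, 5, 0, 15, 5, 1, 10], 2.0), others)  # low score
kept = is_outlier(([10, 5, 0, 15, 5, 1, 10], 8.5), others)     # similar score
```

A score of 2 from a profile nearly identical to users who gave 8 and 9 is flagged for removal, while an 8.5 from the same profile is kept.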
The Header part of an ARFF file can be seen in Figure 3.3. It contains the relation Grand Theft Auto V Review System and a list of seven plus one numeric attributes, which are the seven archetypes of the user's BrainHex model and the game review score.

Figure 3.3: Header part of an ARFF file. It contains the relation Grand Theft Auto V Review System, the list of seven plus one numeric attributes which are the seven archetypes of the user’s BrainHex model and the game review score.
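The figure itself is not reproduced in this text, but based on the description, a Header of the following shape is plausible (the exact attribute names are assumed from the seven BrainHex archetypes and are not confirmed by the text):

```
@relation 'Grand Theft Auto V Review System'

@attribute Achiever    numeric
@attribute Conqueror   numeric
@attribute Daredevil   numeric
@attribute Mastermind  numeric
@attribute Seeker      numeric
@attribute Socialiser  numeric
@attribute Survivor    numeric
@attribute ReviewScore numeric

@data
```

In ARFF, the @relation line names the dataset, each @attribute line declares a column and its type, and the instances follow the @data marker, one comma-separated line each.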

Although the creation process of these files was the same for all selected games, we developed a distinct file for the GRS, whose Header contained seven more attributes. As can be observed in Figure 3.4, the seven extra attributes were the BrainHex dimensions of the games, which we named AchieverGAME, ConquerorGAME, DaredevilGAME, MastermindGAME, SeekerGAME, SocialiserGAME and SurvivorGAME.

Figure 3.4: ARFF Header part of the GRS. It contains the relation General Review System and the list of numeric attributes: the user's seven BrainHex dimensions, the game's seven BrainHex dimensions, and the game review score.

In Figure 3.5, a list with some instances of the Data part of this same ARFF file can be seen. Notice that the first seven columns contain the BrainHex dimensions of the users, the next seven contain the BrainHex dimensions of the games, and the last column contains the respective game review scores. The Data part of the other ARFF files was similar to this one, except that those files did not contain the game BrainHex dimensions.

Figure 3.5: ARFF Data part of the GRS, where the first seven columns contain the user's BrainHex dimensions, the next seven contain the game's BrainHex dimensions, and the last column contains the respective game review scores.

3.3.4 Algorithm Filtering

In our work, we studied which kinds of algorithms we should use to validate our hypothesis. We discarded some algorithms when we restricted our data to continuous values. This makes sense since, in our dataset, all attributes are numeric and ordered; that is, closer numbers are more similar. Based on the continuous nature of our data, there were several interesting algorithms to work with. We explored the following four machine learning algorithms to model our data:

• Instance-Based Algorithm: KNN

• Function Algorithms: LR and MLP

• Tree Algorithm: M5P.

3.3.5 Review System

During this part of our methodology pipeline, we aimed to preprocess, process and analyze the results of our training datasets using WEKA, and with that create the respective review system models. In this stage of the pipeline we used the WEKA software, where we inserted the selected ARFF file and chose the algorithm we wanted to use together with the tenfold cross-validation technique. Then, from the generated model, we built our review system model, recording the reported error metrics. We repeated this whole procedure for each selected game and for the GRS, using each selected machine learning algorithm. The WEKA software, in the Explorer menu and Classify section, can be seen in Figure 3.6. There, a dataset is being classified with the LR algorithm, using the tenfold cross-validation technique; the generated model and the results of some error metrics are also presented.

Figure 3.6: The WEKA software in the Explorer menu and Classify section. A dataset is being classified with the LR algorithm, using tenfold cross-validation. The generated model and some error metrics are also presented.
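Independently of WEKA, the procedure above can be sketched as a plain cross-validation loop. The fold-splitting scheme and the baseline model below are illustrative simplifications of ours, not the WEKA implementation:

```python
import random

def tenfold_cv_mae(instances, train_fn, k=10, seed=0):
    """Estimate a model's MAE with k-fold cross-validation.

    `instances` is a list of (attributes, score) pairs; `train_fn` takes a
    training list and returns a predict(attributes) -> score function."""
    rng = random.Random(seed)
    order = instances[:]
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        predict = train_fn(train)
        errors += [abs(predict(a) - s) for a, s in test]
    return sum(errors) / len(errors)

# Trivial baseline model: always predict the training mean score.
def mean_model(train):
    mean = sum(s for _, s in train) / len(train)
    return lambda attrs: mean
```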

3.3.6 Review System Validation

In the last step of our methodology pipeline, our goal was to evaluate each of our developed review systems. To achieve that, we used the WEKA software to evaluate the accuracy of our review system models, previously obtained from the training data files. Lastly, we present the results obtained from this evaluation process.

3.4 Metrics

When we analyzed and validated the accuracy of our review system models, based on our training datasets, we used tenfold cross-validation. In our work, the evaluation of the system's success was made with one of the most commonly used error measures, the MAE (Witten et al., 2016). With this metric, the p values are the values predicted on the test instances, the a values are the actual values of the instances, and n is the number of instances. The MAE equation is defined as:

\[ \text{MAE} = \frac{\sum_{i=1}^{n} |p_i - a_i|}{n}. \tag{3.2} \]
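As a sanity check, the metric can be computed directly; a minimal sketch:

```python
def mae(predicted, actual):
    """Mean absolute error over n instances (Equation 3.2)."""
    n = len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n

# Example: predictions off by 1 and 2 give a MAE of 1.5.
assert mae([7.0, 9.0], [8.0, 7.0]) == 1.5
```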

3.5 Discussion

In this chapter we described the methods used during the development of this work, with the aim of defining the steps that allow the reader to replicate this study. In the next chapter we will report and explain all obtained results, finishing with a discussion.

4 Results

Contents

4.1 Demographic Results...... 44

4.2 Game Filtering Results...... 46

4.3 Dataset Filtering Results...... 48

4.4 Review System Training Results...... 48

4.5 Review System Validation Results...... 56

4.6 Discussion...... 59

4.1 Demographic Results

The demographic answers we obtained from our user questionnaire were mostly from people with an engineering background, since we distributed the link via email to the engineering students of Pontificia Universidad Católica del Perú (PUCP) and the information and software engineering students of Instituto Superior Técnico (IST). Besides, the survey's link was also distributed in the closed Facebook group named Comunidade Gamer Portuguesa, and to other personally contacted people with diverse backgrounds. In total, 300 users answered our questionnaire. People often reviewed more than one game, which led us to a total of 1254 instances that we used as our dataset.

We started our questionnaire by asking the users' gender: 84.3% of the answers were from male individuals and 15.7% from female individuals, as shown in Figure 4.1.

Figure 4.1: Pie chart with the gender information of the 300 users: 84.3% male and 15.7% female.

Then, we asked their age: 9.0% of the answers were from people under 18 years old, 54.3% within the range from 18 to 23 years old, 23.3% from 24 to 29, 10.3% from 30 to 39, and 3.0% above 39, as presented in Figure 4.2.

Figure 4.2: Pie chart with the age information of the 300 users.

Afterwards, we asked how often they play digital games, in order to know how they would see themselves as players. We classified users as not gamer if they answered that they do not play digital games, casual gamer if they answered that they play occasionally when the opportunity presents itself, and hardcore gamer if they answered that they make time in their schedules to play digital games. The results showed that 56.0% of the users were classified as hardcore gamers, 41.7% as casual gamers, and 2.3% as not gamers, as can be seen in Figure 4.3.

Figure 4.3: Pie chart with the information about the type of gamer of the 300 users: 56.0% hardcore gamers, 41.7% casual gamers, and 2.3% not gamers.

4.2 Game Filtering Results

The selected digital games were obtained through an iterative process: from the open answers of the first 50 questionnaires, we noticed that the games Clash Royale and Rocket League were being reviewed frequently, so we decided to add them to the study. On the other hand, the digital games Civilization VI, Sport, and Resident Evil VII, which were initially in the study, were no longer considered for this research because we got few reviews from the players, and we needed games with several reviews to build more stable and consistent review systems. A list of the final chosen digital games, with their corresponding BrainHex dimension scores and game genres, can be seen in Table 4.1. A description of each selected digital game can be found in more detail in Appendix B - Selected Digital Games.

Digital Game AC CO DA MA SE SO SU Game Genre
Call of Duty Black Ops III 16 8 6 17 10 14 13 First-Person Shooter
Clash Royale 8 12 6 6 -8 9 -8 Real-Time Strategy
Dark Souls III 14 19 2 14 18 9 7 Role-Playing Game
FIFA 18 2 18 6 -2 -6 14 -8 Simulator and Sport
Fortnite -2 17 15 8 6 18 1 Battle Royale
God of War 4 15 14 2 16 6 -10 4 Action and Adventure
Grand Theft Auto V 14 5 8 -2 14 10 5 Action and Adventure
Pokemon GO 19 12 -6 0 4 14 -8 Location-Based Game
Rocket League 2 14 11 3 -4 14 -1 Sport
The Elder Scrolls V: Skyrim 17 9 6 9 20 -10 -2 Role-Playing Game
The Sims 4 14 0 -4 10 4 8 -4 Simulator
The Witcher 3: Wild Hunt 16 15 -3 12 20 -10 0 Role-Playing Game

Table 4.1: The 12 selected digital games with their 7 BrainHex archetype scores and corresponding game genres. Meaning: Achiever (AC), Conqueror (CO), Daredevil (DA), Mastermind (MA), Seeker (SE), Socialiser (SO), Survivor (SU).

In order to obtain the BrainHex model for each selected game, shown in Table 4.1, we developed a second user questionnaire, described in more detail in Appendix D - User Questionnaire 2. For the purpose of understanding the agreement between the players' opinions about what each game has to offer, we analyzed the opinion deviation for each BrainHex dimension. We did this process game by game, and registered the overall opinion deviations. We analyzed the resulting values for all selected games, and obtained an average opinion deviation of 12.3%, as presented in the last row of Table 4.2. After collecting all these opinions and analyzing their similarity, we had to decide which of those scores we would use in our model. First, we randomly chose the values for each archetype from the opinions we had. Then, for each dimension of each game where the opinions differed by more than 15%, we tested all possible combinations using the LR algorithm in WEKA, validating it with the tenfold cross-validation technique in the GRS model. We studied which values had the lowest MAE and we chose

Digital Game Deviation (%)
Call of Duty Black Ops III 18.6
Clash Royale 9.5
Dark Souls III 18.6
FIFA 18 9.5
Fortnite 12.4
God of War 4 13.8
Grand Theft Auto V 12.4
Pokemon GO 8.1
Rocket League 16.7
The Elder Scrolls V: Skyrim 7.1
The Sims 4 9.5
The Witcher 3: Wild Hunt 11.4
Overall 12.3

Table 4.2: The 12 selected digital games with their corresponding opinion deviations, and a last row with the average over all selected games (in percentage).

them. In order to better understand the impact of this step, we present Table 4.3, where the first column shows the name of the digital games, and the second and third columns show the highest and lowest MAE, respectively. These MAE values were obtained by training all possible combinations from the opinions whose difference was more than 15%.

Digital Game Highest MAE Lowest MAE
Call of Duty Black Ops III 1.107 1.099
Clash Royale 1.106 1.102
Dark Souls III 1.151 1.100
FIFA 18 1.107 1.102
Fortnite 1.126 1.101
God of War 4 1.139 1.102
Grand Theft Auto V 1.149 1.098
Pokemon GO 1.102 1.102
Rocket League 1.126 1.102
The Elder Scrolls V: Skyrim 1.141 1.102
The Sims 4 1.105 1.102
The Witcher 3: Wild Hunt 1.106 1.100

Table 4.3: The first column presents the name of the digital game; the second and third columns show the highest and lowest MAE, respectively.
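The enumeration of candidate score combinations can be sketched with a Cartesian product. The dimension values below are hypothetical; in our study, each resulting combination was then trained in WEKA and compared by its MAE:

```python
from itertools import product

# Hypothetical disagreement sets for one game: each BrainHex dimension keeps
# all candidate values when the two collected opinions differed by more than
# 15%, and a single value when they agreed.
candidates = {
    "Achiever": [16],        # opinions agreed
    "Conqueror": [8, 12],    # opinions differed by more than 15%
    "Daredevil": [6],
    "Mastermind": [14, 17],  # opinions differed by more than 15%
    "Seeker": [10],
    "Socialiser": [14],
    "Survivor": [13],
}

# Every combination that would be trained and compared by its MAE.
combinations = list(product(*candidates.values()))
```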

Furthermore, we trained the GRS by randomly selecting the dimension values between the two opinions for each game. This random approach got a MAE of 1.123. On the other hand, we trained the GRS with a distinct approach, which consisted of selecting the dimension values that obtained the best results when training each game separately. This second approach got a MAE of 1.095.

4.3 Dataset Filtering Results

In order to improve the predictive performance of our review systems, we applied constraints to the training datasets used by the selected algorithms. The first restriction aimed to eliminate all instances of users that played the selected games for 1 hour or less. The second constraint was to identify and remove outlier instances. In Figure 4.4, we present an example of the effect of removing an identified outlier instance from the training dataset of the digital game The Witcher 3: Wild Hunt. In this image, a red circle highlights the identified outlier, where a user answered with a review score of 2 for this game. There were 5 other players whose personality did not differ from this player's by more than 10%; however, this player's review score deviated from those players' average review score by 78% (over the 15% threshold), which means this identified outlier was removed from our training set.

Figure 4.4: An identified outlier in the game The Witcher 3: Wild Hunt, with a review score of 2.
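The outlier rule can be sketched as follows. The peer-similarity and deviation thresholds come from the text above, while the toy instances, the assumed dimension range, and the helper names are illustrative choices of ours, not the exact procedure we ran:

```python
# Sketch of the outlier rule: personality within 10%, review score deviating
# by more than 15% of the score range from the close peers' average.
# Instances and range bounds here are hypothetical toy values.

def personality_close(p, q, scale=41.0, threshold=0.10):
    """Two BrainHex profiles are 'close' when every dimension differs by
    at most `threshold` of the score range (assumed -20..20 here)."""
    return all(abs(a - b) / scale <= threshold for a, b in zip(p, q))

def find_outliers(instances, score_range=9.0, threshold=0.15):
    """Flag instances whose review score deviates by more than `threshold`
    of the score range from the average score of their close peers."""
    outliers = []
    for i, (profile, score) in enumerate(instances):
        peers = [s for j, (q, s) in enumerate(instances)
                 if j != i and personality_close(profile, q)]
        if peers:
            avg = sum(peers) / len(peers)
            if abs(score - avg) / score_range > threshold:
                outliers.append(i)
    return outliers

# Toy data: six similar players rating a game 9, one similar player rating 2.
toy = [([10, 12, 5, 8, 14, 6, 3], 9.0) for _ in range(6)]
toy.append(([11, 12, 5, 8, 13, 6, 3], 2.0))
```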

4.4 Review System Training Results

We developed and explored our review system models using the training dataset files in the WEKA software. We now present and analyze all these results, organized by machine learning algorithm.

4.4.1 Linear Regression Algorithm Training Results

The first algorithm used in our study was the LR algorithm, which is one of the most important and widely used statistical models (Witten et al., 2016). The result obtained from this type of algorithm is an expression of a class as a linear combination of the attributes with their predetermined weights.

The results of applying the tenfold cross-validation technique ten times, in three distinct dataset phases, with the respective number of instances and the overall error estimate using the average of the ten MAEs, can be seen in Table 4.4. The last column of this table represents the improvement percentage of the overall MAE from the first to the third phase. The first phase was the dataset without restrictions, that is, all 877 training instances (without the erroneous instances). The second phase includes all previous instances except those whose users answered that they played the selected digital games for one hour or less, giving us a total of 843 instances. In the third and last phase, we added a restriction to the dataset to exclude the identified outliers described in the previous section, which gave us a total of 809 instances.

Digital Game NI MAE NI MAE NI MAE Improvement (%)
GRS 877 1.313 843 1.255 809 1.096 16.5
Call of Duty Black Ops III 42 1.311 41 1.231 40 1.112 15.2
Clash Royale 78 1.498 73 1.407 65 1.044 30.3
Dark Souls III 48 1.354 46 1.108 44 1.031 23.9
FIFA 18 63 1.404 62 1.436 60 1.343 4.3
Fortnite 99 1.856 88 1.712 84 1.393 24.9
God of War 4 47 1.140 46 1.150 45 0.863 24.3
Grand Theft Auto V 117 1.101 114 1.052 109 0.867 21.3
Pokemon GO 132 1.785 128 1.797 123 1.670 6.4
Rocket League 67 1.328 65 1.299 64 1.242 6.5
The Elder Scrolls V: Skyrim 78 1.113 77 1.114 76 1.023 8.1
The Sims 4 49 1.390 46 1.370 44 1.267 8.8
The Witcher 3: Wild Hunt 57 0.965 56 0.860 55 0.667 30.9

Table 4.4: GRS and the 12 selected digital games with their corresponding number of instances (NI) and overall MAE, divided into three phases: the first without restrictions to the dataset; the second without the instances of games played for 1 hour or less; and the third without the other identified outliers. The last column is the improvement percentage of the overall MAE from the first to the third phase.

We could observe that, with few instances removed, the constraints imposed on the dataset improved the prediction performance of the system for the training dataset. The first constraint improved the performance of nine out of thirteen review systems, which we consider positive. The second restriction was even more notable, because it improved all review systems, and so we consider the resulting models to be our final review systems for all selected games and for the GRS. Although these constraints gave us a model more fitted to the training dataset, we may have produced an overfitted model of our data. This may happen because we did an extensive analysis of the training data, which can therefore fail to fit the testing dataset. Besides the improvement obtained by adding these restraints to our training dataset, we wanted to get more information about it, specifically whether the number of instances was enough or we would need more answers from our user questionnaire. In order to understand this, we divided our training dataset, for each digital game, into 30% for testing and the remaining data into seven equal portions for training, with all digital games in the same ARFF file. Then, we picked the first 10% of the data to train the model, testing it with the 30% and recording the MAE. We did this process seven times, each time adding another 10% portion of data. Table 4.5 presents the result of applying this process.

NI Training (%) Testing (%) MAE
81 10 30 1.240
163 20 30 1.182
248 30 30 1.130
325 40 30 1.115
407 50 30 1.101
484 60 30 1.093
565 70 30 1.094

Table 4.5: The number of instances (NI) with the respective training dataset percentage, testing dataset percentage, and MAE, in seven progressive steps.
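The splitting procedure behind this table can be sketched as follows; the fixed 30% test prefix and equally sized chunks are simplifying assumptions of ours:

```python
# Sketch of the incremental-training analysis: hold out 30% of the data for
# testing and grow the training set in seven equal steps.

def incremental_splits(instances, test_fraction=0.3, steps=7):
    """Yield (training_subset, test_set) pairs with progressively more
    training data, mirroring the 10%..70% procedure in the text."""
    n_test = int(len(instances) * test_fraction)
    test = instances[:n_test]
    pool = instances[n_test:]
    chunk = len(pool) // steps
    for step in range(1, steps + 1):
        yield pool[:chunk * step], test

# With 100 hypothetical instances: 30 for testing, 70 split into 7 chunks.
splits = list(incremental_splits(list(range(100))))
```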

It is notable that the MAE kept decreasing until it reached 60%, which makes us comfortable saying that the number of instances we had was enough, although the MAE only decreased by about 13.3% from the first to the last step of this process. Equation 4.1 explains the composition of the obtained linear regression models for each selected game, whose weights are presented in Table 4.6.

\[ \text{Review} = \sum_{i=0}^{k} w_i \times a_i, \tag{4.1} \]

where Review is the value we want to obtain; the w's are the predetermined weights; k is the number of BrainHex model archetypes; and the a's are the archetype values, with the first a of the equation equal to 1. The linear regression model obtained for the GRS was the following:

\[
\begin{aligned}
\text{Review} ={} & 0.047 \times \text{Achiever} - 0.017 \times \text{Conqueror} + 0.014 \times \text{Daredevil} \\
& + 0.012 \times \text{Mastermind} - 0.020 \times \text{Seeker} + 0.019 \times \text{Socialiser} + 0.015 \times \text{Survivor} \\
& - 0.130 \times \text{Achiever}_{GAME} + 0.000 \times \text{Conqueror}_{GAME} - 0.115 \times \text{Daredevil}_{GAME} \\
& - 0.092 \times \text{Mastermind}_{GAME} + 0.028 \times \text{Seeker}_{GAME} - 0.095 \times \text{Socialiser}_{GAME} \\
& + 0.145 \times \text{Survivor}_{GAME} + 10.493.
\end{aligned}
\tag{4.2}
\]

The 12 selected digital games with their corresponding predetermined weights, to be multiplied by the BrainHex archetypes, in alphabetical order, are presented in Table 4.6. Now we present an example that demonstrates the application of these review system models. We randomly chose one player from our training dataset who gave a review to the game FIFA 18, in order to present his predicted review score both in the system of the digital game alone and in the GRS, and to compare the obtained results.

Digital Game w0 w1 w2 w3 w4 w5 w6 w7
Call of Duty Black Ops III 8.736 -0.094 0.006 0.050 -0.054 0.002 -0.037 -0.021
Clash Royale 6.616 0.020 -0.064 0.003 0.023 0.046 0.013 -0.030
Dark Souls III 8.035 0.071 0.002 0.060 -0.036 -0.034 0.001 0.046
FIFA 18 9.059 0.078 -0.115 -0.025 0.031 -0.105 0.027 -0.017
Fortnite 7.580 0.051 -0.024 0.018 0.048 -0.079 -0.039 0.034
God of War 4 8.700 0.011 0.021 -0.007 -0.017 0.028 0.003 0.000
Grand Theft Auto V 7.456 0.044 0.004 0.052 0.012 -0.023 0.0257 0.026
Pokemon GO 6.021 0.075 -0.011 -0.002 0.011 -0.037 0.021 0.050
Rocket League 5.557 0.082 0.046 0.034 0.023 -0.049 0.033 0.033
The Elder Scrolls V: Skyrim 7.163 0.0600 -0.038 0.007 0.015 0.039 0.079 -0.029
The Sims 4 7.087 0.043 -0.059 -0.032 -0.022 0.055 0.066 0.023
The Witcher 3: Wild Hunt 7.473 0.032 0.025 0.005 0.022 0.072 0.003 -0.009

Table 4.6: The 12 selected digital games with their corresponding predetermined weights, to be multiplied by the BrainHex archetypes, in alphabetical order. Meaning: w1 (AC), w2 (CO), w3 (DA), w4 (MA), w5 (SE), w6 (SO), w7 (SU).

The BrainHex dimension scores of this player are the following: 13 in the Achiever dimension, 17 in Conqueror, 1 in Daredevil, 17 in Mastermind, 8 in Seeker, 10 in Socialiser, and 1 in Survivor. The predicted review score for this player in the FIFA 18 review system model, when the dimensions are replaced by the respective scores, can be seen in Equation 4.3.

\[ \text{Review} = 0.078 \times 13 - 0.115 \times 17 - 0.025 \times 1 + 0.031 \times 17 - 0.105 \times 8 + 0.027 \times 10 - 0.017 \times 1 + 9.059. \tag{4.3} \]
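These substitutions can be checked numerically. The sketch below applies Equation 4.1 with the FIFA 18 weights from Table 4.6 and the GRS weights from Equation 4.2; the helper function is ours, while the coefficients are the published ones:

```python
def linear_review(weights, attributes):
    """Equation 4.1: Review = sum of w_i * a_i, with a_0 fixed at 1
    so that w_0 acts as the intercept."""
    return sum(w * a for w, a in zip(weights, [1.0] + list(attributes)))

# Player's BrainHex scores (AC, CO, DA, MA, SE, SO, SU) from the example.
player = [13, 17, 1, 17, 8, 10, 1]

# FIFA 18 single-game model weights (w0..w7 from Table 4.6).
fifa_weights = [9.059, 0.078, -0.115, -0.025, 0.031, -0.105, 0.027, -0.017]
single_game_score = linear_review(fifa_weights, player)  # 8.033

# GRS model (Equation 4.2): player dims followed by FIFA 18's game dims.
grs_weights = [10.493,
               0.047, -0.017, 0.014, 0.012, -0.020, 0.019, 0.015,
               -0.130, 0.000, -0.115, -0.092, 0.028, -0.095, 0.145]
fifa_dims = [2, 18, 6, -2, -6, 14, -8]  # FIFA 18 row of Table 4.1
grs_score = linear_review(grs_weights, player + fifa_dims)  # 7.654
```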

The predicted review score was 8.033, on a scale from 1 to 10. The actual review score that this player gave to the game FIFA 18 was 8, so the deviation between the predicted and the actual review scores was around 0.3%. The predicted review score for this player in the GRS can be calculated by replacing the BrainHex dimensions of the player and the game by the respective scores, that is, using the FIFA 18 scores presented in Table 4.1 and the player dimension scores written above, both in Equation 4.2. The resulting predicted value was 7.654, on a scale from 1 to 10. The actual review score that this player gave to the game FIFA 18 was 8, so the deviation between the predicted and the actual review scores was around 3.5%. In order to analyze the accuracy of our obtained GRS model, we decided to test it using the training instances of each game separately. The results of this approach can be seen in Table 4.7, where we present the chosen digital games with their number of instances and corresponding overall MAE based on the training instances of the singular review systems (as can be seen in the seventh column of Table 4.4). The final column of this table shows the MAE obtained from testing the GRS using the training instances of each digital game.

Digital Game NI MAE Separately MAE in GRS
Call of Duty Black Ops III 40 1.112 1.114
Clash Royale 65 1.044 0.999
Dark Souls III 44 1.031 0.927
FIFA 18 60 1.343 1.192
Fortnite 84 1.393 1.256
God of War 4 45 0.863 0.816
Grand Theft Auto V 109 0.867 0.826
Pokemon GO 123 1.670 1.536
Rocket League 64 1.242 1.093
The Elder Scrolls V: Skyrim 76 1.023 1.054
The Sims 4 44 1.267 1.194
The Witcher 3: Wild Hunt 55 0.667 0.617

Table 4.7: The 12 selected digital games with their corresponding number of instances (NI), the overall MAE based on the training instances of the singular review systems (as seen in the seventh column of Table 4.4), and the MAE obtained from testing the GRS model using the training instances of each game.

When we compared the results in Table 4.7, we observed that the GRS model was generally better than the singular game review system models. This being observed, there seems to be an indication that the GRS is at least as good as the individual systems. We consider this an important achievement of our research, since it supports our belief that a relevant aspect of a digital game, in terms of its review score, can be modeled by the BrainHex player model.

4.4.2 K-Nearest-Neighbor Algorithm Training Results

The instance-based algorithm used in our study was the KNN algorithm. The result obtained from this type of algorithm is the average of the values of its nearest neighbors. In our exploration, we used the Euclidean distance. In order to decide the number of neighbors to use in our system, we tested with 5%, 10%, 15%, and 20% of the total number of instances of the system. We used the tenfold cross-validation technique on the training datasets, with the 1/d distance weighting method, for both the 12 games and the GRS. The 12 games and the GRS, with the respective number of instances and the four percentage neighbor approaches, each with their MAEs, are presented in Table 4.8. Here we verified that 10% obtained a lower MAE for more review systems than the other percentage options, so we decided to use a number of neighbors equal to 10% of the total instances in the following phase. Besides the selection of the number of neighbors, we analyzed three distinct weighting approaches to calculate the importance of each neighbor. These three distance weighting methods were 1-d, 1/d, and 1 (with d the Euclidean distance). The results of using the tenfold cross-validation technique on the training dataset, after applying all restrictions seen before, with a number of neighbors of 10% of the total instances and for each distance approach, can be seen in Table 4.9.

Digital Game NI 5% (MAE) 10% (MAE) 15% (MAE) 20% (MAE)
GRS 809 1.121 1.127 1.143 1.153
Call of Duty Black Ops III 40 1.172 1.067 1.089 1.076
Clash Royale 65 1.095 1.071 1.055 1.021
Dark Souls III 44 0.825 0.871 0.912 0.896
FIFA 18 60 1.307 1.261 1.249 1.264
Fortnite 84 1.516 1.414 1.415 1.344
God of War 4 45 1.070 0.925 0.868 0.886
Grand Theft Auto V 109 0.891 0.890 0.870 0.867
Pokemon GO 123 1.752 1.716 1.691 1.660
Rocket League 64 1.350 1.209 1.239 1.189
The Elder Scrolls V: Skyrim 76 1.134 1.053 1.087 1.054
The Sims 4 44 1.271 1.172 1.160 1.173
The Witcher 3: Wild Hunt 55 0.719 0.637 0.655 0.657

Table 4.8: GRS and the 12 selected digital games with their corresponding number of instances (NI) and the four percentage neighbor approaches, each with their MAEs.

Digital Game NI 1 (MAE) 1-d (MAE) 1/d (MAE)
GRS 809 1.135 1.133 1.127
Call of Duty Black Ops III 40 1.050 1.055 1.067
Clash Royale 65 1.084 1.082 1.071
Dark Souls III 44 0.875 0.874 0.871
FIFA 18 60 1.261 1.261 1.261
Fortnite 84 1.429 1.428 1.414
God of War 4 45 0.920 0.921 0.925
Grand Theft Auto V 109 0.892 0.892 0.890
Pokemon GO 123 1.720 1.719 1.716
Rocket League 64 1.229 1.225 1.209
The Elder Scrolls V: Skyrim 76 1.053 1.046 1.053
The Sims 4 44 1.153 1.157 1.172
The Witcher 3: Wild Hunt 55 0.618 0.622 0.637

Table 4.9: GRS and the 12 selected digital games with their corresponding number of instances (NI) and the three distinct weighting approaches used to calculate the importance of each neighbor, with the respective MAEs. The three distance weighting methods were 1, 1-d, and 1/d, accordingly.

Here we noticed that the approach with the best results was 1/d. This means that closer neighbors have more importance when calculating the result of the review system. We also analyzed the other combinations of distance metrics and neighbor percentages, and the results confirmed our choice.
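The chosen configuration (Euclidean distance with 1/d weighting) can be sketched in a few lines. The training set below is a toy example, and the epsilon guard against zero distances is our own addition:

```python
import math

# Minimal sketch of the chosen KNN configuration: Euclidean distance and
# 1/d neighbor weighting. Toy data only, not our actual instances.

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train, query, k, eps=1e-9):
    """Inverse-distance-weighted average of the k nearest review scores."""
    nearest = sorted(train, key=lambda inst: euclidean(inst[0], query))[:k]
    weights = [1.0 / (euclidean(attrs, query) + eps) for attrs, _ in nearest]
    total = sum(weights)
    return sum(w * score for w, (_, score) in zip(weights, nearest)) / total

# Toy training set of (profile, review score) pairs.
train = [([0, 0], 9.0), ([1, 0], 8.0), ([10, 10], 2.0), ([11, 10], 3.0)]
print(round(knn_predict(train, [0.5, 0], k=2), 2))  # → 8.5
```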

4.4.3 Multilayer Perceptron Algorithm Training Results

The second function algorithm used in our study was the MLP algorithm with back-propagation. To analyze the number of hidden layers to use in our review systems, we tested two distinct approaches.

The first approach was to use a sigmoid function as the activation function and only one hidden layer. The second approach was built on three fully connected hidden layers, each with a number of nodes equal to the number of attributes of the review system.
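The first approach can be illustrated with a hand-rolled forward pass; the network weights below are arbitrary toy values, not the model WEKA trained:

```python
import math

# Illustrative forward pass for the first approach: one hidden layer with a
# sigmoid activation and a linear output. Weights are toy values.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_predict(inputs, hidden_weights, output_weights):
    """hidden_weights: one (bias + weights) row per hidden node;
    output_weights: bias followed by one weight per hidden node."""
    hidden = [sigmoid(row[0] + sum(w * x for w, x in zip(row[1:], inputs)))
              for row in hidden_weights]
    return output_weights[0] + sum(w * h for w, h in
                                   zip(output_weights[1:], hidden))

# Toy network: 7 BrainHex inputs, 2 hidden nodes, 1 review-score output.
hidden_w = [[0.1] + [0.05] * 7, [-0.2] + [0.02] * 7]
output_w = [5.0, 2.0, 1.5]
score = mlp_predict([13, 17, 1, 17, 8, 10, 1], hidden_w, output_w)
```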

Table 4.10 presents the results of using the tenfold cross-validation technique on the training dataset, after applying all restrictions seen before. It presents the 12 games and the GRS, with the respective number of instances, the two different approaches, and the obtained MAEs.

Digital Game NI 1 HL (MAE) A-A-A HL (MAE)
GRS 809 1.107 1.377
Call of Duty Black Ops III 40 1.153 1.162
Clash Royale 65 1.299 1.152
Dark Souls III 44 0.930 0.956
FIFA 18 60 1.472 1.366
Fortnite 84 1.385 1.361
God of War 4 45 1.029 0.791
Grand Theft Auto V 109 0.873 0.934
Pokemon GO 123 1.802 1.694
Rocket League 64 1.398 1.340
The Elder Scrolls V: Skyrim 76 1.150 1.203
The Sims 4 44 1.412 1.240
The Witcher 3: Wild Hunt 55 0.796 0.745

Table 4.10: GRS and the 12 selected digital games with their corresponding number of instances (NI) and the two different hidden layer (HL) approaches with their MAE results. The second approach is presented as A-A-A because it has 3 hidden layers, each with a number of nodes equal to the number of attributes.

We verified that the one hidden layer approach had fewer games with a better result, although it had a better result for the GRS. This means that when we increase the number of hidden layers the gain is not relevant, because not all systems improve their results, and for those that do improve, the gain is around 1%.

4.4.4 M5P Algorithm Training Results

The tree algorithm used in our work was the M5P algorithm; tree algorithms are used in many practical applications of machine learning. The result obtained from this algorithm is a conventional decision tree with linear regression models at the leaves.
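The structure of such a model tree can be sketched as follows; the split attribute, threshold, and leaf coefficients are toy values of ours, not the tree WEKA generated:

```python
# Minimal sketch of a model tree in the spirit of M5P: internal nodes split
# on an attribute, leaves hold linear regression models. All numbers are toy
# values, not the tree WEKA actually generated.

def leaf_model(intercept, coeffs):
    return lambda attrs: intercept + sum(c * a for c, a in zip(coeffs, attrs))

def model_tree_predict(attrs):
    """One split on the Seeker dimension (index 4), two linear leaves."""
    if attrs[4] <= 10:
        return leaf_model(6.0, [0.05, 0.01, 0.0, 0.02, 0.03, 0.01, 0.0])(attrs)
    return leaf_model(7.5, [0.02, 0.0, 0.01, 0.01, 0.04, 0.0, 0.01])(attrs)

prediction = model_tree_predict([13, 17, 1, 17, 8, 10, 1])
```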

The tree generated using the M5P algorithm for the GRS can be seen in Figure 4.5, where we can observe one initial node, three sub-nodes, and five leaves with their corresponding linear regression models.

Figure 4.5: Tree generated using the M5P algorithm for the GRS: one initial node, three sub-nodes, and five leaves with their corresponding linear regression models.

The results of using the tenfold cross-validation technique on the training dataset, after applying all restrictions seen before, can be seen in Table 4.11, which presents the respective number of instances and MAEs.

Digital Game NI MAE
GRS 809 1.131
Call of Duty Black Ops III 40 1.159
Clash Royale 65 1.047
Dark Souls III 44 1.099
FIFA 18 60 1.305
Fortnite 84 1.471
God of War 4 45 0.801
Grand Theft Auto V 109 0.891
Pokemon GO 123 1.667
Rocket League 64 1.250
The Elder Scrolls V: Skyrim 76 1.054
The Sims 4 44 1.326
The Witcher 3: Wild Hunt 55 0.639

Table 4.11: GRS and the 12 selected digital games with their corresponding number of instances (NI) and MAEs, using the M5P algorithm.

4.5 Review System Validation Results

As the last step of our methodology pipeline, we validated our developed review systems using the corresponding 30% testing dataset files, for each algorithm. In the next subsections we present and analyze all these results, organized by algorithm, ending with a discussion.

4.5.1 Linear Regression Algorithm Validation Results

In order to analyze and compare the MAE obtained from the training and testing datasets for the selected games and for the GRS, in the case of the LR algorithm, we present Table 4.12. There it is possible to observe the 12 selected games and the GRS, the respective number of instances, the training MAE, and the testing MAE.

Digital Game NI Training (MAE) Testing (MAE)
GRS 809 1.096 1.381
Call of Duty Black Ops III 40 1.112 1.688
Clash Royale 65 1.044 1.614
Dark Souls III 44 1.031 1.229
FIFA 18 60 1.343 1.143
Fortnite 84 1.393 2.030
God of War 4 45 0.863 0.720
Grand Theft Auto V 109 0.867 1.102
Pokemon GO 123 1.702 1.670
Rocket League 64 1.242 1.414
The Elder Scrolls V: Skyrim 76 1.023 1.253
The Sims 4 44 1.267 1.526
The Witcher 3: Wild Hunt 55 0.667 0.863

Table 4.12: GRS and the 12 selected digital games with their corresponding number of instances (NI), the training MAE, and the testing MAE, using the LR algorithm.

As expected, the testing results got higher MAEs for the majority of the selected games, although the training results had a higher MAE for three of them. Comparing the GRS with the singular games, by subtracting the training MAEs from the testing MAEs, we can say that the GRS worked well, because it had a Mean-Absolute Error Difference (MAED) of 0.285, while the other games had an Average Mean-Absolute Error (AMAE) of 0.223.

4.5.2 K-Nearest Neighbor Algorithm Validation Results

In order to analyze and compare the MAE obtained from the training and testing datasets for the selected games and for the GRS, using the KNN algorithm, we present Table 4.13. There it is possible to see the 12 selected games and the GRS, the respective number of instances, the training MAE, and the testing MAE. We used the 1/d distance weighting method and a number of neighbors of 10% of the total instances of the respective review system.

Digital Game NI Training (MAE) Testing (MAE)
GRS 809 1.127 1.367
Call of Duty Black Ops III 40 1.067 1.707
Clash Royale 65 1.071 1.579
Dark Souls III 44 0.871 1.280
FIFA 18 60 1.261 1.042
Fortnite 84 1.414 1.134
God of War 4 45 0.925 2.872
Grand Theft Auto V 109 0.890 1.152
Pokemon GO 123 1.716 1.535
Rocket League 64 1.209 1.285
The Elder Scrolls V: Skyrim 76 1.053 1.361
The Sims 4 44 1.172 1.431
The Witcher 3: Wild Hunt 55 0.637 0.877

Table 4.13: GRS and the 12 selected digital games with their corresponding number of instances (NI), the training MAE, and the testing MAE, using the KNN algorithm.

The training results got lower MAEs for the majority of the selected games, although the testing results had a lower MAE for three of them. When we compared the results of the GRS with the singular games, on the testing and training results, we can say that it worked well, because it had a MAED of 0.240, while the other games had an AMAE of 0.166. Moreover, since these results have the same magnitude as those of the LR algorithm, we consider the latter a better option, since it is simpler to compute.

4.5.3 Multilayer Perceptron Algorithm Validation Results

In order to analyze and compare the MAE obtained from the training and testing datasets for the selected games and for the GRS, using the MLP algorithm, we present Table 4.14. There it is possible to observe the 12 selected games and the GRS, the respective number of instances, the training MAE, and the testing MAE. We used three fully connected hidden layers, each with a number of nodes equal to the number of attributes of the file. With this algorithm, the testing results got higher MAEs in nine out of the twelve selected games. Comparing the GRS with the singular games, on the testing and training datasets, we can say that it worked well, because it had a MAED of 0.275, while the other games had an AMAE of 0.194. As with the previous algorithm, since these results have the same magnitude as those of the LR algorithm, we consider the latter a better option due to its simplicity.

4.5.4 M5P Algorithm Validation Results

In order to analyze and compare the MAE obtained from the training and testing datasets for the selected games and for the GRS, using the M5P algorithm, we present Table 4.15, where it is possible to observe the selected digital games and the GRS, with the respective number of instances, training MAE, and testing MAE.

Digital Game NI Training (MAE) Testing (MAE)
GRS 809 1.107 1.382
Call of Duty Black Ops III 40 1.153 1.864
Clash Royale 65 1.299 1.720
Dark Souls III 44 0.930 1.210
FIFA 18 60 1.472 1.114
Fortnite 84 1.385 2.106
God of War 4 45 1.029 0.723
Grand Theft Auto V 109 0.873 1.048
Pokemon GO 123 1.802 1.727
Rocket League 64 1.398 1.505
The Elder Scrolls V: Skyrim 76 1.150 1.535
The Sims 4 44 1.412 1.534
The Witcher 3: Wild Hunt 55 0.796 0.935

Table 4.14: GRS and the 12 selected digital games with their corresponding number of instances (NI), the training MAE, and the testing MAE, using the MLP algorithm.

Digital Game NI Training (MAE) Testing (MAE)
GRS 809 1.131 1.389
Call of Duty Black Ops III 40 1.159 1.704
Clash Royale 65 1.047 1.527
Dark Souls III 44 1.099 1.268
FIFA 18 60 1.305 0.969
Fortnite 84 1.471 2.057
God of War 4 45 0.801 0.752
Grand Theft Auto V 109 0.891 1.049
Pokemon GO 123 1.667 1.682
Rocket League 64 1.250 1.316
The Elder Scrolls V: Skyrim 76 1.054 1.319
The Sims 4 44 1.326 1.580
The Witcher 3: Wild Hunt 55 0.639 0.884

Table 4.15: GRS and the 12 selected digital games with their corresponding number of instances (NI), the training MAE, and the testing MAE, using the M5P algorithm.

With this algorithm, the training results got higher MAEs for two of the selected games. Also, we can say that the GRS worked well compared to the singular games, because it had a MAED of 0.258, while the other games had an AMAE of 0.200. Moreover, since these results and the results of the KNN and MLP algorithms have the same magnitude as those of the LR algorithm, we consider the latter the best option, since it is simpler to compute.

4.5.5 Algorithms Validation Results Discussion

The validation results for all selected algorithms, obtained by validating the models trained on the training datasets (after applying all restrictions explained before) against the testing datasets, can be seen in Table 4.16, which shows the selected digital games and the GRS with their respective numbers of instances and MAEs.

Digital Game                     NI    LR (MAE)   KNN (MAE)   MLP (MAE)   M5P (MAE)
GRS                              809   1.381      1.367       1.389       1.521
Call of Duty Black Ops III        40   1.688      1.707       1.704       1.749
Clash Royale                      65   1.614      1.579       1.527       1.740
Dark Souls III                    44   1.229      1.280       1.268       1.556
FIFA 18                           60   1.143      1.042       0.969       1.092
Fortnite                          84   2.030      2.134       2.057       2.121
God of War 4                      45   0.720      0.872       0.752       0.723
Grand Theft Auto V               109   1.102      1.152       1.049       1.127
Pokemon GO                       123   1.670      1.535       1.682       1.693
Rocket League                     64   1.414      1.285       1.316       1.131
The Elder Scrolls V: Skyrim       76   1.253      1.361       1.319       1.635
The Sims 4                        44   1.526      1.431       1.580       1.744
The Witcher 3: Wild Hunt          55   0.863      0.877       0.884       1.079

Table 4.16: GRS and the 12 selected digital games with their corresponding number of instances (NI) and MAEs, for all selected algorithms.

The LR algorithm was the best algorithm, in the sense of obtaining the lowest MAE, in six of the twelve selected games. The MLP algorithm obtained the best results in three of those games, whereas KNN was the best in two games and M5P in one game. For the GRS, however, KNN was the best algorithm, followed by LR, then MLP, and finally M5P. To conclude, the results obtained by the selected algorithms had the same order of magnitude; nevertheless, the LR algorithm seems to be the best option due to its simplicity.
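As an illustration of this kind of 70/30 train-test MAE comparison, the same evaluation can be sketched in Python with scikit-learn. The thesis performed it in WEKA; here the data are synthetic stand-ins for the questionnaire datasets, and, since scikit-learn offers no M5P implementation, a plain decision-tree regressor stands in for it.

```python
# Sketch of a 70/30 MAE comparison across four regressors (synthetic data;
# scikit-learn stands in for WEKA, a decision tree stands in for M5P).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-10, 20, size=(300, 7))  # 7 BrainHex-like scores per instance
y = np.clip(X.mean(axis=1) / 2 + 5 + rng.normal(0, 1, 300), 1, 10)  # score, 1-10

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "LR": LinearRegression(),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "MLP": make_pipeline(StandardScaler(), MLPRegressor(max_iter=2000, random_state=0)),
    "M5P-like tree": DecisionTreeRegressor(max_depth=5, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    results[name] = mean_absolute_error(y_te, model.predict(X_te))
    print(f"{name}: test MAE = {results[name]:.3f}")
```

With data that is roughly linear in the features, the simpler LR model tends to match the others, which mirrors the conclusion drawn from Table 4.16.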

4.6 Discussion

In this chapter we reported and explained all obtained results. We concluded that, although all algorithms achieved similar results, LR seemed to be the best option since it is simpler to compute. Furthermore, after this analysis of the results, we confirmed our hypothesis that, if we know the personality of each player a priori through a player profile model, we are able to give personalized game reviews for the selected digital games. However, using the LR approach in the GRS, the estimated error is 1.381 on a scale from 1 to 10. In the next chapter, we finalize this dissertation by wrapping up what has been said so far, clarifying the contributions of this study, and mentioning what can be done in the future.

5 Conclusions

Contents

5.1 Discussion
5.2 Future Work

This dissertation was an exploratory study that presents initial efforts in the use of personalized game reviews. With this work, we developed a simple review system where we give personalized game reviews according to each player’s personality.

5.1 Discussion

Nowadays there are different types of game reviews; however, these approaches do not consider that each player has a different personality, and thus they give the same review to everyone. Since we hypothesized that there is a connection between the player’s personality and the game review, we developed a system that gives personalized game reviews and that applies to every player and every digital game. To develop it, we used the BrainHex player profile model, whose questionnaire adapted well to our purpose. We believe the formal description that the industry makes of games does not carry enough information, so we decided to obtain the BrainHex model of each selected game by asking players their opinion on what the games offer. It is important to mention that the developed General Review System (GRS) can be applied to any game by inserting the personality of the player and the BrainHex dimensions of the game.

We selected and used four machine learning algorithms: Linear Regression (LR), K-Nearest-Neighbor (KNN), Multilayer Perceptron (MLP), and M5Prime (M5P). These four algorithms proved suitable for this work. Moreover, we validated the obtained review systems with the 30% testing datasets, and the results were similar for all four selected algorithms. Nevertheless, the LR algorithm seems to be the best option due to its simplicity. The adequacy of the LR algorithm should not be entirely surprising, as the approach was built on the individual dimensions of the BrainHex model. Furthermore, the Waikato Environment for Knowledge Analysis (WEKA) software met our expectations, helping with the processing of the system.

From every point of view, a personalized game review benefits the user. Since our approach helps players decide whether they should spend their money and time on a given game, there will be less frustration and more confidence in their purchases.
Finally, we expect our model to have low maintenance costs, since adding a new game to the GRS only requires asking a few users to create a BrainHex model for the game and adding it to the system.
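As a minimal sketch of how such a GRS input could be assembled, assuming the input is simply the player's seven BrainHex scores concatenated with the game's seven BrainHex scores (the names and values below are illustrative):

```python
# Hypothetical GRS input layout: player BrainHex scores + game BrainHex scores.
DIMENSIONS = ["AC", "CO", "DA", "MA", "SE", "SO", "SU"]

def grs_features(player_profile, game_profile):
    """Concatenate player and game BrainHex scores into one 14-value vector."""
    return ([player_profile[d] for d in DIMENSIONS] +
            [game_profile[d] for d in DIMENSIONS])

player = {"AC": 12, "CO": 18, "DA": 5, "MA": 9, "SE": 14, "SO": 3, "SU": 7}
game   = {"AC": 10, "CO": 16, "DA": 8, "MA": 11, "SE": 15, "SO": 2, "SU": 4}
x = grs_features(player, game)
# x would then be fed to the trained regressor to obtain the personalized score.
```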

5.2 Future Work

For future research, a larger sample of players should be gathered in order to achieve even more stable and reliable review systems. We gathered 300 users, but we know that samples on the order of thousands of players could help to achieve more consistent systems. Nevertheless, the evolution of the accuracy of the training datasets showed that we are at a relatively safe point to support our work.

We also suggest replicating the methods used in our research with other player models, such as Bartle Player Types or the Gamer Motivation Profile (GMP), or even with personality models such as the Five Factor Model (FFM) or the Myers-Briggs Type Indicator (MBTI). We further suggest using a different review scale, or even a nominal scale, similar to the Angry Centaur Gaming (ACG) game reviews.

Moreover, it would be an interesting addition to this work if future studies took into account more personal information about the players, that is, if they built review systems whose input data has more attributes to be computed, such as gender, age, type of player, background, and favorite game genres. In our opinion, this additional information could influence the review scores, and thus it could help to verify our hypothesis. However, we decided not to use it because the review system we developed was composed of numerical, continuous attributes.

Furthermore, we firmly believe our model could be offered through a web service containing a database of the BrainHex models of a series of games. Users would enter their BrainHex profile, and the web service would provide them with personalized game reviews. In addition to their own predicted game review score, users would have access to the predicted and real review scores of their friends. Figure 5.1 shows a possible example: Miguel has a predicted score of 7.8 out of 10, listed along the predicted (circle) and real (hexagon) review scores of his friends; the highest (green) and lowest (red) scores of each user’s BrainHex model are also highlighted. The values presented in Figure 5.1 were obtained using our data (the names used are fictional). Finally, a review system powered by the players themselves could be generated, so that the model is self-updating and remains up to date with the latest releases.
As such, we would like to end by highlighting that this review system has potential market value.

Figure 5.1: The user Miguel has a predicted review score of 7.8 out of 10; his six friends are shown with their names, photos, and predicted review scores (in a circle) or real review scores (in a hexagon), with the highest and lowest BrainHex dimensions in small green and red circles, respectively (VilaGames, n.d.).

References

Game Rankings. n.d. GameRankings Help. Retrieved from https://www.gamerankings.com/help.html.

GameSpot. 2017. FIFA 18. Retrieved from https://www.gamespot.com/fifa-18.

GameSpot. 2016. Pokémon GO. Retrieved from https://www.gamespot.com/pokemon-go.

Google Play. n.d. View and analyse your app’s ratings and reviews. Retrieved from https://support.google.com/googleplay/android-developer/answer/138230?hl=en-GB.

Steam. 2015. Grand Theft Auto V. Retrieved from https://store.steampowered.com/app/271590/Grand_Theft_Auto_V.

C. Bateman. 2009. Beyond Game Design: Nine Steps Towards Creating Better Videogames.

Quantic Foundry. 2016. Gamer Motivation Profile. Retrieved from https://quanticfoundry.com/wp-content/uploads/2016/03/Gamer-Motivation-Profile-GDC-2016-Slides.pdf.

R. Dias & C. Martinho. 2010. inFlow: Adapting gameplay to player personality. Master’s thesis, Instituto Superior Técnico.

BrainHex. n.d. What does my BrainHex class icon mean? Retrieved from http://blog.brainhex.com/what-does-my-brainhex-icon-mean.html.

W. F. Punch III, E. D. Goodman, M. Pei, L. Chia-Shun, P. D. Hovland, & R. J. Enbody. 1993. Further Research on Feature Selection and Classification Using Genetic Algorithms. ICGA, pages 557–564.

I. H. Witten, E. Frank, M. A. Hall, & C. J. Pal. 2016. Data Mining: Practical Machine Learning Tools and Techniques (4th edn.). Morgan Kaufmann.

VilaGames. n.d. FIFA 18 - PS4. Retrieved from https://www.vilagamesonline.com.br/produto/ps4/fifa-18-ps4-2/.

J. Huang. 2018. What can we recommend to game players? Implementing a system of analyzing game reviews. Master’s thesis, University of Tampere.

T. W. Malone. 1981. Toward a theory of intrinsically motivating instruction. Cognitive science, 5(4): 333–369.

W. C. Ribeiro, W. Lobato, & R. C. Liberato. 2009. Notas sobre fenomenologia, percepção e educação ambiental. Revista Sinapse Ambiental, pages 42–65.

Autocar. 2018. Porsche Cayenne review. Retrieved from https://www.autocar.co.uk/car-review/porsche/cayenne.

Internet Movie Database [IMDb]. 2018. Vingadores: Guerra do Infinito. Retrieved from https://www.imdb.com/title/tt4154756/reviews?ref_=tt_ov_rt.

Pitchfork. 2018. Eminem Kamikaze. Retrieved from https://pitchfork.com/reviews/albums/eminem-kamikaze.

Tripadvisor. 2018. Alma. Retrieved from https://www.tripadvisor.pt/Restaurant_Review-g189158-d9977670-Reviews-Alma-Lisbon_Lisbon_District_Central_Portugal.html.

A. McNamara. 2008. Up against the wall: game makers take on the press. Game Developer’s Conference.

PCGamesN. 2017. A brief history of how Steam review bombing damages developers. Retrieved from https://www.pcgamesn.com/history-of-steam-review-bombing.

Steamed. 2015. Steam ’review bombing’ is a problem. Retrieved from https://steamed.kotaku.com/steam-review-bombing-is-a-problem-1701088582.

Metacritic. n.d. How we create the metascore magic. Retrieved from http://www.metacritic.com/about-metascores.

Google Play. 2016. Pokémon GO. Retrieved from https://play.google.com/store/apps/details?id=com.nianticlabs.pokemongo&showAllReviews=true.

Apple. 2016. Pokémon GO. Retrieved from https://itunes.apple.com/us/app/pok%C3%A9mon-go/id1094591345?mt=8.

Steam. 2017. Steam blog: user reviews. Retrieved from https://steamcommunity.com/games/593110/announcements/detail/1448326897426987372.

Good Old Games [GOG]. 2016. The Witcher 3: Wild Hunt - game of the year edition. Retrieved from https://www.gog.com/game/the_witcher_3_wild_hunt_game_of_the_year_edition.

Steam. n.d. Steam reviews. Retrieved from https://store.steampowered.com/reviews.

Imagine Games Network [IGN]. n.d. Review scoring. Retrieved from https://ign-entertainment.squarespace.com/review-practices.

Angry Centaur Gaming [ACG]. n.d. Last checked 13 October 2018.

S. Rabin. 2010. Introduction to Game Development (2nd edn.). Nelson Education.

Digital Trends. 2018. The history of Battle Royale: From to worldwide phenomenon. Retrieved from https://www.digitaltrends.com/gaming/history-of-battle-royale-games.

L. A. Lehmann. 2012. Location-Based Mobile Games.

R. R. McCrae & O. P. John. 1992. An introduction to the Five-Factor Model and its applications. Journal of personality, 60(2):175–215.

K. M. Sheldon, R. M. Ryan, L. J. Rawsthorne, & B. Ilardi. 1997. Trait self and true self: cross-role variation in the Big-Five Personality Traits and its relations with Psychological Authenticity and Subjective Well-Being. Journal of Personality and Social Psychology, 73(6):1380–1393.

L. E. Nacke, C. Bateman, & R. L. Mandryk. 2014. BrainHex: A neurobiological gamer typology survey. Entertainment Computing, 5(1):55–62.

I. Briggs-Myers & P. B. Myers. 1995. Gifts Differing: Understanding Personality Type.

N. Lazzaro. 2004. Why we play games: four keys to more emotion without story. Game Developers Conference.

C. Martinho, P. Santos, & R. Prada. 2014. Design e desenvolvimento de jogos. FCA.

T. Quandt & S. Kröger. 2014. Multiplayer: The Social Aspects of Digital Gaming. Routledge.

R. Bartle. 1996. Hearts, clubs, diamonds, spades: Players who suit MUDs. Journal of MUD research, 1 (1):19.

N. Yee. 2016. The Gamer Motivation Profile what we learned from 250,000 gamers. Annual Symposium on Computer-Human Interaction in Play, page 2.

C. Bateman, R. Lowenhaupt, & L. Nacke. 2011. Player typology in theory and practice. DiGRA Conference.

BrainHex. 2008. Welcome to BrainHex! Retrieved from http://blog.brainhex.com.

R. R. Bouckaert, E. Frank, M. A. Hall, G. Holmes, B. Pfahringer, P. Reutemann, & I. H. Witten. 2010. WEKA - Experiences with a Java Open-Source Project. Journal of Machine Learning Research, 11:2533–2541.

R. Bouckaert, E. Frank, M. Hall, R. Kirkby, P. Reutemann, A. Seewald, & D. Scuse. 2018. WEKA Manual for Version 3-8-3.

M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, & I. H. Witten. 2009. The WEKA data mining software: an update. ACM SIGKDD explorations newsletter, 11(1):10–18.

D. W. Aha, D. Kibler, & M. K. Albert. 1991. Instance-Based Learning Algorithms. Machine learning, 6 (1):37–66.

W. Yue, Z. Wang, H. Chen, A. Payne, & X. Liu. 2018. Machine learning with applications in breast cancer diagnosis and prognosis. Designs, 2(2):13.

B. Shaw & T. Jebara. 2009. Structure Preserving Embedding. Proceedings of the 26th Annual International Conference on Machine Learning, pages 937–944.

A. Johnson, Y. Tang, & C. Franzwa. 2014. kNN-based adaptive virtual reality game system. Networking, Sensing and Control (ICNSC), 2014 IEEE 11th International Conference, pages 457–462.

M. Mehroof & M. D. Griffiths. 2010. Online gaming addiction: the role of sensation seeking, self-control, neuroticism, aggression, state anxiety, and trait anxiety. Cyberpsychology, Behavior, and Social Networking, 13(3):313–316.

K. Y. Huang & K. J. Chen. 2011. Multilayer perceptron for prediction of 2006 world cup football game. Advances in Artificial Neural Systems, 11.

J. Han, J. Pei, & M. Kamber. 2011. Data Mining: Concepts and Techniques (3rd edn.). Elsevier.

Y. Wang & I. H. Witten. 1996. Induction of Model Trees for Predicting Continuous Classes.

A. F. Machado, E. W. Clua, & B. Zadrozny. 2010. A Method for Generating Emergent Behaviors using Machine Learning to Strategy Games. Games and Digital Entertainment (SBGAMES), 2010 Brazilian Symposium, pages 12–18.

R. Kohavi. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI, 14(2):1137–1145.

B. Wiedenbeck, B. A. Cassell, & M. P. Wellman. 2014. Bootstrap statistics for empirical games. Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems, 14(2):597–604.

S. Arlot & A. Celisse. 2010. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79.

S. Vandercruysse, M. Vandewaetere, M. Maertens, J. ter Vrugte, P. Wouters, T. Jong, & J. Elen. 2015. Development and validation of the Game Perceptions Scale (GPS). Journal of educational multimedia and hypermedia, 24(1):43–74.

S. L. Mathias & C. Sakai. 2013. Utilização da Ferramenta Google Forms no Processo de Avaliação Institucional: Estudo de Caso nas Faculdades Magsul.

L. L. Leslie. 1970. Increasing response rates to long questionnaires. The Journal of Educational Research, 63(9):347–350.

A Game Genres

Action
Games which emphasize physical challenges, including reaction time and hand-eye coordination. This genre can be subdivided into sub-genres such as shooter.

Adventure
Games set in a world usually made up of multiple, connected rooms or screens, involving an objective which is more complex than simply catching, shooting, or escaping, although completing the objective may involve several or all of these activities. Objectives must usually be completed in several steps, for example, finding keys and unlocking doors to other areas to retrieve objects needed elsewhere in the game. Characters are usually able to carry objects such as weapons, keys, and tools. Settings often evoke a particular historical period and place, or are thematically related to content-based genres such as Science Fiction, Fantasy, or Espionage. This term should not be used for a game which does not allow the player to wander and explore its world freely.

Battle Royale
Games in which the last person standing wins. These games challenge a large number of players, who are dropped onto an island empty-handed and have to scavenge for weapons and equipment and eliminate all other opponents within a limited time.

First-Person Shooter
Games which place the player “behind the eyes” of the game character. The player is able to wield a variety of weapons and dispatches enemies by shooting them.

Hack & Slash
Games in which the player equips the main character with melee weapons, with the goal of destroying opponents while exploring the setting.

Location-Based Game
Games which move with the user’s location. This is a type of pervasive game that allows players to report their location, frequently through the use of satellite positioning.

Role-Playing Game
Games in which players create or take on a character, which may include a developed persona; these are considered the digital game version of pen-and-paper games like Dungeons and Dragons. Character descriptions may include various characteristics, such as strength, dexterity, constitution, intelligence, wisdom, and charisma, and may also include specifics such as race, class, and background. These games can be single-player or multiplayer. This term should not be used for games like Adventure in which identity is not emphasized or important.

Real-Time Strategy
Games in which the goal is for the player to collect resources, build an army, and control their units to attack the enemy. The action in these games is fairly fast-paced, and because of the continuous play, strategic decisions must be made quickly.

Simulator
Games based on the simulation of a system. This system can be anything from the workings and economy of the railroads to a single fighter craft, or even the simulation of human life and social interactions.

Sport
Games which are simulations of existing sports or variations of them. This genre covers a myriad of games that simulate sporting experiences such as football, volleyball, and basketball. Similarly to racing digital games, sports games attempt to re-create the interactions of a real sport.

B Selected Digital Games

Call of Duty Black Ops III1 is a digital game developed by Activision. This game was released in 2015 and is available to play on PC, PlayStation and Xbox. The genre of this game is First-Person Shooter. Here the idea is that the playable character plays through several missions in war scenarios.

Clash Royale2 is a digital game developed by Supercell. This game was released in 2016 and is available to play on Android and iOS. The genre of this game is Real-Time Strategy. Here the idea is to play online, 1 against 1, and destroy the opponent’s base.

Dark Souls III3 is a digital game developed by FromSoftware. This game was released in 2016 and is available to play on PC, PlayStation and Xbox. The genre of this game is Action Role-Playing Game. Here the idea is that the playable character encounters different types of enemies with different behaviors and can improve their skills and weapons to defeat them.

FIFA 184 is a digital game developed by EA Sports. This game was released in 2017 and is available to play on Switch, PC, PlayStation and Xbox. The genres of this game are Sport and

1 Call of Duty Black Ops III, https://www.callofduty.com/blackops3. Last accessed 17 November 2018.
2 Clash Royale, https://clashroyale.com. Last accessed 17 November 2018.
3 Dark Souls III, https://en.bandainamcoent.eu/dark-souls/dark-souls-iii. Last accessed 17 November 2018.
4 FIFA 18, https://www.ea.com/ea-access/landing-page/fifa-18. Last accessed 17 November 2018.

Simulator. Here the main idea is that the player controls a football team and plays matches against other teams. It can be played online with other players or offline.

Fortnite5 is a digital game developed by Epic Games. This game was released in 2017 and is available to play on Android, iOS, Nintendo Switch, PC, PlayStation and Xbox. The genre of this game is Battle Royale. Here the idea is to be on an island with a large number of opponents playing online, eliminating all opponents within a limited time.

God of War 46 is a digital game developed by Santa Monica Studio. This game was released in 2018 and is available to play on PlayStation. The genres of this game are Action and Adventure. Here the idea is that the playable character is a Spartan god who fights difficult foes and gods.

Grand Theft Auto V7 is a digital game developed by Rockstar Games. This game was released in 2014 and is available to play on PC, PlayStation and Xbox. The genres of this game are Action and Adventure. Here the idea is that the playable character is a criminal who can carry out missions to progress the story, as well as participate in non-linear actions in a fictional open world.

Pokemon GO8 is a digital game developed by Niantic and Nintendo. This game was released in 2016 and is available to play on Android and iOS. The genre of this game is Location-Based Game. Here the idea is to catch all the Pokémon while the player moves through the real world.

Rocket League9 is a digital game developed by Psyonix. This game was released in 2015 and is available to play on Nintendo Switch, PC, PlayStation and Xbox. The genre of this game is Sport. Here the main idea is that the player controls a car and plays football with it, in a team, against other players.

The Elder Scrolls V: Skyrim10 is a digital game developed by Bethesda Game Studios. This game was released in 2016 and is available to play on Nintendo Switch, PC, PlayStation and Xbox. The genre of this game is Role-Playing Game.
Here the idea of this open-world game is that the playable character is the last dragonborn alive, a hunter of dragons, who aims to remove the threat that the dragon Alduin represents to the fantastic world.

The Sims 411 is a digital game developed by Maxis. This game was released in 2014 and is available to play on PC, PlayStation and Xbox. The genre of this game is Simulator. Here the idea is that the player creates and controls the lives of virtual people (called “Sims”) and builds and designs their houses.

The Witcher 3: Wild Hunt12 is a digital game developed by CD Projekt Red. This game was released in 2015 and is available to play on PC, PlayStation and Xbox. The genre of this game is Role-

5 Fortnite, https://www.epicgames.com/fortnite/en-US/home. Last accessed 17 November 2018.
6 God of War 4, https://godofwar.playstation.com. Last accessed 17 November 2018.
7 Grand Theft Auto V, https://www.rockstargames.com/V. Last accessed 17 November 2018.
8 Pokemon GO, https://www.pokemongo.com. Last accessed 17 November 2018.
9 Rocket League, https://www.ea.com/ea-access/landing-page/fifa-18. Last accessed 17 November 2018.
10 The Elder Scrolls V: Skyrim, https://elderscrolls.bethesda.net/en/skyrim. Last accessed 17 November 2018.
11 The Sims 4, https://www.ea.com/games/the-sims/the-sims-4/pc/store/mac-pc-download-base-game-standard-edition. Last accessed 17 November 2018.
12 The Witcher 3: Wild Hunt, https://elderscrolls.bethesda.net/en/skyrim. Last accessed 17 November 2018.

Playing Game. Here the idea is that the player controls a witcher in an open-world game, fighting different types of enemies with weapons and magic.

C User Questionnaire 1

Here we present the first user questionnaire, named Game Reviews; the collected dataset can be seen here1. The 12 selected digital games were Call of Duty Black Ops III, Clash Royale, Dark Souls III, FIFA 18, Fortnite, God of War 4, Grand Theft Auto V, Pokemon GO, Rocket League, The Elder Scrolls V: Skyrim, The Sims 4 and The Witcher 3: Wild Hunt. For reasons of space, we present only the questions for the first digital game; the questions were the same for all 12 selected games.

When the users were asked, in the questionnaire, to click on the link to obtain their BrainHex model, they were redirected to the quiz on the International Hobo page. In this quiz, to obtain the score of each BrainHex dimension, the 21 questions were separated by their corresponding dimension, as presented in Table C.1. Each answer “I love it!” added 2 points; “I like it.” added 1 point; “It’s okay.” added no value; “I dislike it.” subtracted 2 points; and “I hate it!” subtracted 4 points.

Afterwards, still in the International Hobo quiz, the players had to rate 7 sentences, each corresponding to one BrainHex dimension, as presented in Table C.2. The lowest-rated sentence added 2 points to its dimension, and each rating above it added 2 more points sequentially, so the highest-rated sentence received 14 points. Finally, all points were summed to obtain the score of each dimension.

1https://docs.google.com/spreadsheets/d/1oz6z87XM5ufbBhuQh1gbZGDrd lxpxOYBSDVBfjZX- U/edit?usp=forms web b#gid=285354937

Question                                              Dimension
“Exploring to see what you can find.”                 SE
“Frantically escaping from a terrifying foe.”         SU
“Working out how to crack a challenging puzzle.”      MA
“The struggle to defeat a difficult boss.”            CO
“Playing in a group, online or in the same room.”     SO
“Responding quickly to an exciting situation.”        DA
“Picking up every single collectible in an area.”     AC
“Looking around just to enjoy the scenery.”           SE
“Being in control at high speed.”                     DA
“Devising a promising strategy (...) try next.”       MA
“Feeling relief when you escape to a safe area.”      SU
“Taking on a strong opponent (...) versus match.”     CO
“Talking with other players, (...) same room.”        SO
“Finding what (...) complete a collection.”           AC
“Hanging from a high ledge.”                          DA
“Wondering what’s behind a locked door.”              SE
“Feeling scared, terrified or disturbed.”             SU
“Working out what to do on your own.”                 MA
“Completing a punishing (...) many times.”            CO
“Co-operating with strangers.”                        SO
“Getting 100% (...).”                                 AC

Table C.1: The 21 questions with their corresponding BrainHex dimension. Dimensions: Achiever (AC), Conqueror (CO), Daredevil (DA), Mastermind (MA), Seeker (SE), Socialiser (SO), Survivor (SU).

Sentence                                              Dimension
“A moment of jaw-dropping wonder or beauty.”          SE
“An experience of primeval (...) your mind.”          SU
“A moment of breathtaking speed or vertigo.”          DA
“The moment when the solution (...) your mind.”       MA
“A moment of hard-fought victory.”                    CO
“A moment when you feel (...) another player.”        SO
“A moment of completeness (...) strived for.”         AC

Table C.2: The 7 sentences with their corresponding BrainHex dimension. Dimensions: Achiever (AC), Conqueror (CO), Daredevil (DA), Mastermind (MA), Seeker (SE), Socialiser (SO), Survivor (SU).
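The scoring scheme described above can be sketched as follows. The answer data and the sentence ranking in the example are hypothetical; only the point arithmetic follows the text.

```python
# Sketch of the BrainHex scoring scheme (hypothetical answers, point
# values taken from the text).
ANSWER_POINTS = {
    "I love it!": 2, "I like it.": 1, "It's okay.": 0,
    "I dislike it.": -2, "I hate it!": -4,
}
DIMENSIONS = ("AC", "CO", "DA", "MA", "SE", "SO", "SU")

def score_brainhex(question_answers, sentence_ranking):
    """question_answers: (dimension, answer) pairs, one per question.
    sentence_ranking: the 7 dimensions ordered from lowest- to highest-rated
    sentence."""
    scores = {d: 0 for d in DIMENSIONS}
    for dimension, answer in question_answers:
        scores[dimension] += ANSWER_POINTS[answer]
    # Lowest-rated sentence adds 2 points; each step up adds 2 more (max 14).
    for rank, dimension in enumerate(sentence_ranking):
        scores[dimension] += 2 * (rank + 1)
    return scores

example = score_brainhex(
    [("AC", "I love it!"), ("CO", "I hate it!")],
    ["SE", "SO", "SU", "DA", "MA", "CO", "AC"],
)
```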

D User Questionnaire 2

Here we present our second user questionnaire, named Games & BrainHex1. The 12 selected digital games were Call of Duty Black Ops III, Clash Royale, Dark Souls III, FIFA 18, Fortnite, God of War 4, Grand Theft Auto V, Pokemon GO, Rocket League, The Elder Scrolls V: Skyrim, The Sims 4 and The Witcher 3: Wild Hunt. For reasons of space, we present only the questions for the first digital game; the questions were the same for all 12 selected games.

To obtain the score of each BrainHex dimension, the 21 questions were separated by their corresponding dimension. The answers used a Likert scale from 1, meaning the game does not offer the player the opportunity for this kind of behavior, to 5, meaning this behavior is strongly present in the game. Each answer of 5 added 2 points; 4 added 1 point; 3 added no value; 2 subtracted 2 points; and 1 subtracted 4 points. Afterwards, the players had to rate 7 sentences, each corresponding to one BrainHex dimension. The lowest-rated sentence added 2 points to its dimension, and each rating above it added 2 more points sequentially, so the highest-rated sentence received 14 points. Finally,

1https://docs.google.com/spreadsheets/d/1RncS-HD0bfRcZHSJBu4KkQoq-Kc0DJLHOIFqDzeQkWo/edit#gid=2041209212

all points were summed to obtain the score of each dimension.
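A minimal sketch of the Likert-to-points mapping described above (the answers are illustrative; the sentence-ranking step is scored exactly as in Appendix C):

```python
# Likert-to-points mapping for the second questionnaire (values from the text).
LIKERT_POINTS = {5: 2, 4: 1, 3: 0, 2: -2, 1: -4}

def dimension_score(likert_answers):
    """Sum the points of all Likert answers belonging to one BrainHex dimension."""
    return sum(LIKERT_POINTS[answer] for answer in likert_answers)

print(dimension_score([5, 4, 3]))  # the three answers of one dimension
```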
