Player Modelling in Civilization IV
Freek den Teuling
HAIT Master Thesis series nr. 10-001
THESIS SUBMITTED IN PARTIAL FULFILMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF ARTS IN COMMUNICATION AND INFORMATION SCIENCES,
MASTER TRACK HUMAN ASPECTS OF INFORMATION TECHNOLOGY,
AT THE FACULTY OF HUMANITIES
OF TILBURG UNIVERSITY
Thesis committee:
Dr. Ir. P.H.M. (Pieter) Spronck
Dr. M.M. (Menno) van Zaanen
Prof. Dr. E.O. (Eric) Postma
Tilburg University
Faculty of Humanities
Department of Communication and Information Sciences
Tilburg, The Netherlands
April 2010
Preface
Knowledge is power. Knowledge about an opponent is power over that opponent. This reasoning forms the basis of opponent modelling. Once we are capable of making a model of an opponent, we have nothing to fear from that opponent. We know the opponent; we know his weaknesses, his skills, his preferences. That is the power of opponent modelling.

As a student in the master track Human Aspects of Information Technology at Tilburg University, a fanatic gamer, and a computer enthusiast, I instantly associated the modelling of an opponent with computers and games. It would be great for gamers if computers were able to create a model of human players. Ideas came to mind of computer games that act based on the specific needs of players (as ally or as adversary), and of game applications capable of automatically modelling opponents; in short, an increase in the level of interaction and adaptation between player and computer. However, these were wild ideas without knowledge of the possibilities.

During meetings with my thesis supervisor Dr. Ir. Pieter Spronck we discussed several games we could use as a digital research field, such as WORLD OF WARCRAFT, NEVERWINTER NIGHTS and SID MEIER'S CIVILIZATION IV. Pieter had already done some experiments with CIVILIZATION IV in which he used classifiers to predict when a player would declare war in that game. His experience with the game was one of the reasons to choose CIVILIZATION IV as our digital research field. We altered his concept so that it could be used for opponent modelling. My goal would be to find out to what extent it is possible to create a player model for the game CIVILIZATION IV by using a classifier. The goal of this research is not to actually create a player model, but rather to find out whether our approach is a valid way to create one.
While reading the literature, it became clear that opponent modelling by means of greedy classification has two major disadvantages: (1) large amounts of data are needed to train a classifier, and (2) training a classifier takes time. Our approach, however, is designed to bypass these problems, which makes it a usable concept that could actually be implemented in future games.

This research would not have been completed without the ever-helpful Pieter Spronck. I have absorbed large amounts of his time with endless talks about my research. Whenever I faced obstacles in my research, he helped me past them. My friends and family have also been fantastic by offering advice and moral support. Furthermore, I want to thank the participants, Walter and Job, for their time.
May this research be of as much value to you as it was to me.
Have fun reading!
Freek den Teuling
Abstract
In order for humans to play a game, they just need to understand the rules of the game. However, to become victorious in a game it is necessary to understand both the game and the opponent. In other words, besides knowing the rules it is important to have an opponent model. An opponent model, or player model, is an abstract representation of a player. It can consist of a player's strengths, weaknesses, preferences, strategies, skills, or a combination of those. The player model in this research is based on the preferences of players and is constructed by means of classification. The construction of a player model by means of classification requires large datasets that need to be generated by players, which in turn requires time. To speed up the construction of the player model, computer-controlled players with preferences embedded in their code are used to create a classification model. Such computer-controlled players were found in the commercial computer game SID MEIER'S CIVILIZATION IV, in which they are represented by leaders of different civilizations. These computer-controlled leaders are selected to generate a large database. The classifier is trained on this database so that it becomes capable of predicting the preferences of computer-controlled players and even human players, thus creating player models.
After an introduction to player modelling, classification, and CIVILIZATION IV follows the mid-section of the research. This comprises the preparations, the experiments, the results, and the discussion of these results. The preparation consists of: (1) the process of selecting computer-controlled players, (2) the generation of data, (3) the construction of databases, (4) the selection of the appropriate classifier, and (5) tweaking and fine-tuning mechanics. Three experiments are conducted: (1) Classification Model Validation, in which the constructed classification models are validated, (2) Modelling of AI, in which an attempt is made to predict the preferences of computer-controlled players, and (3) Modelling of Humans, in which an attempt is made to model two human players. Following the experiment section is a detailed result section, displaying the results of the experiments. Noteworthy results are further elaborated on in the discussion. The conclusions of our investigation are as follows. (1) A fairly accurate preference-based player model can be constructed by means of classification; classification therefore seems a suitable approach to player modelling. (2) The predictions on computer-controlled players other than the ones used to create the database are not that accurate. A possible reason is the choice of preferences as class: besides preferences there appear to be more influences that steer the actions of the leaders, e.g. special-purpose code. Furthermore, we notice that it is harder for the classifier to distinguish computer-controlled players that have roughly similar preferences. (3) The creation of a human player model does not seem that accurate either. Interestingly, the accuracy of the player model appears to differ depending on play style. This subject is researched only briefly, however, and is well worth future research.
Table of Contents
Preface
Abstract
Table of Contents
1. Introduction
2. Background
   2.1. Sid Meier's Civilization IV
   2.2. Player Modelling
   2.3. Sequential Minimal Optimization Classifier
3. Experimental Setup
   3.1. Research Elaboration
   3.2. Leaders and Preferences
   3.3. Data Generation
   3.4. Observation
   3.5. Classifier Selection
   3.6. Experiments
        Classification Models Validation
        Modelling of AI
        Modelling of Humans
4. Results
   4.1. Classification Models Validation
        InfoGain and GainRatio
        Minus100
        Validation
        Summary
   4.2. Modelling of AI
        Preference Classification
        Leader Classification
        Summary
   4.3. Modelling of Humans
        Casual Player
        Expert Player
        Close or Not?
        Summary
5. Discussion
   5.1. Constructing the Classification Models
   5.2. Predicting Preference or Player
        First Solution
        Second Solution
   5.3. How to Classify Humans?
6. Conclusions
   6.1. Suitable Player Model
   6.2. Predicting AI Opponents
   6.3. Predicting Human Opponents
   6.4. Answer to the Problem Statement
   6.5. Future Work
        Implementation in Civilization IV
References
Appendices
   A. Features
   B. SMO
   C. InfoGain & GainRatio
   D. InfoGain & GainRatio Features
   E. Reports
   F. SMO Output
"Know thy self, know thy enemy. A thousand battles, a thousand victories." — Sun Tzu (500 BC)

1 Introduction
When humans play games, they strive to master these games. Mastering a game is a prerequisite to become victorious in it. That is not the only prerequisite, however. Besides mastering the game it is equally important, or perhaps even more important, to know the opponent. To become victorious, it is necessary to understand the opponent's preferences, strategies, skills and weaknesses. Both (1) mastering the game itself and (2) learning about the opponent are part of the preparations of top grandmasters in CHESS, CHECKERS or SHOGI (Carmel & Markovitch, 1993; Van den Herik, Donkers & Spronck, 2005). Mastering a game can be achieved by observing games, learning the rules, and practising. In order to learn about an opponent, a model can be constructed. Such a model is called an opponent model or, more generally, a player model. A player model can be described as an abstract representation of certain characteristics of a player and his behaviour. It can specify a player's preferences, strategies, skills and weaknesses, or any combination of those (Houlette, 2004; Van den Herik et al., 2005; Donkers & Spronck, 2006).

It has been shown in other research that it is possible to create a player model based on the actions that players take in certain game states (Houlette, 2004). However, the actual actions of a player are defined by the strategy of that player, which is in turn defined by the player's preferences. A player model that is purely based on actions and the prediction of actions tends to be of limited accuracy (Donkers & Spronck, 2006). To overcome the issue of limited accuracy and to satisfy the desire to model the mind of a player, a different approach is chosen in this research: an attempt is made to model the preferences of a player rather than the actions of a player. The problem statement is defined as follows:

To what extent can a model be constructed of a player, which accurately predicts the player's preferences?
We have limited our research to players in the commercial computer game SID MEIER ’S CIVILIZATION IV. First an attempt will be made to create a classification model based on data generated by computer-controlled players. Second, an attempt will be made to model other computer-controlled players with the constructed classification model. Third, an attempt will be made to model human players with the constructed classification model. This has been formulated in three research questions:
1. What is a suitable player model for the computer game SID MEIER'S CIVILIZATION IV?
2. To what extent can a model be constructed, using a classification algorithm, that recognizes the preferences of a computer-controlled player in SID MEIER'S CIVILIZATION IV?
3. To what extent can a model be constructed, using a classification algorithm, that recognizes the preferences of a human player in SID MEIER'S CIVILIZATION IV?

An advantage of constructing a classification model on data generated by computer-controlled players before using it to model a player (computer-controlled or human) is that this bypasses two major disadvantages of using greedy classifiers. The first disadvantage of a greedy classifier is the need for human players to invest time in generating data for the classifier to train on. Data to create a database is still needed in our approach; however, that database can be constructed using computer-controlled players.
In other words, no human players need to invest time in the generation of data. The second disadvantage of a greedy classifier is the amount of time it takes for a classifier to train on the data. In this research, time is still needed to train the classifier; however, once the classifier is trained, the modelling of players, computer-controlled or human, takes only seconds.

The outline of this thesis is as follows. In Chapter 2 background information is provided, covering the ins and outs of the computer game SID MEIER'S CIVILIZATION IV, an elaboration on the approach of player modelling in general, and finally a description of the classifier that is used in this research. Chapter 3 concerns the experimental setup. It contains an elaboration on this research, an explanation of the selected preferences and computer-controlled players, the process of transferring the preferences to the game world, the selection of the features that should be incorporated in the database, the validation of the used classifier, and finally an explanation of the experiments. Chapter 4 contains the results of the experiments. These results are discussed in Chapter 5, followed by the conclusions and recommendations for future work in Chapter 6.
2 Background
This chapter serves as background and reference for the rest of this research. In Section 2.1 the commercial computer game SID MEIER'S CIVILIZATION IV is described. The section gives an overview of the computer game and its mechanics, and explains why this particular computer game was chosen as the digital research field. In Section 2.2 the modelling of a player is elaborated on further, describing player modelling in classical games as well as in modern games. In Section 2.3 the classification algorithm that is used for this research is explained.

2.1 Sid Meier's Civilization IV
SID MEIER'S CIVILIZATION IV (CIV 4) is a turn-based strategy (TBS) game in which the player builds an empire. In general, TBS games are strategically oriented computer games. An important difference with most common computer games is that TBS games are played in turns rather than in real time. Board games like Chess and Checkers are good examples of turn-based games. Similar to Chess and Checkers, players take alternating turns in CIVILIZATION IV. Taking turns provides more time for players to think about their next action. Strategies and planning are therefore even more important in TBS games than in real-time strategy (RTS) games.
In CIV 4 a player begins by selecting an empire and an appropriate leader. There are eighteen different empires available and a total of 26 leaders. Once the empire and leader have been selected, the game starts in the year 4000 BC. From here on the player has to compete with rival leaders, manage cities, develop infrastructure, encourage scientific and cultural progress, found religions, etcetera. An original characteristic of CIV 4 is that defeating the opponent is not the only way to be victorious. There are six conditions to be victorious, as mentioned in the CIVILIZATION IV MANUAL (2005): (1) Time Victory, (2) Conquest Victory, (3) Domination Victory, (4) Cultural Victory, (5) Space Race and (6) Diplomatic Victory. Because of these six different victory conditions the relation between the player and the opponent is different from most strategy games. For the main part of the game the player is at peace with his opponents. It is therefore possible to interact, negotiate, trade, threaten and make deals with opponents. Only after declaring war or being declared war upon is a player at war. Any player can declare war at any time, unless that player is in an agreement with an opponent which specifically forbids a war declaration. To provide an impression of the game, an in-game screenshot of CIV 4 is displayed in Figure 2.1.
The reason to choose CIV 4 as the digital research field was fourfold. (1) Previous research has been done with this game. Having previous research as a reference prevents mistakes that could otherwise have been made; furthermore, it provides guidelines and even usable programs (Houlette, 2004; Rohs, 2007). (2) An extensive database was already available containing data of numerous played games. This reduced time-consuming data generation, although the existing database is drastically expanded in this research (see Chapter 3, Section 3.4). (3) There are multiple (peaceful) ways to be victorious. The multiple victory conditions and the possibility of interaction between players stimulate the use of preferences and strategies. (4) Most importantly, we believe that the computer-controlled leaders act based on preferences given to them by the designers of the game. This last fact forms the core of this thesis. As mentioned in Chapter 1, the general goal is to find out to what extent a model can be constructed of a player, which accurately predicts
preferences of the player. The predefined preferences that are implemented by the designers of CIV 4 provide a solid base upon which classification models can be built. This will be further elaborated on in Section 3.1.
Figure 2.1 – A screenshot of a game of CIV 4. The border between two empires is visible, as well as one city of each empire. Furthermore, one can see the division of the land into tiles, the availability of resources, different terrains, and some units.
2.2 Player Modelling
In order to elaborate on player modelling, a distinction is made between classical games and computer games. First, a short overview of the purposes of a player model is given, followed by player modelling in classical games and then in computer games. For an extensive description of the history of player modelling, see Donkers (2003).

Basically, a model of a player can be used in three different ways. (1) In order for a game to adapt to the player it is beneficial to have a player model. Knowing how a player acts can give the game data (and thus knowledge) about the strengths and weaknesses of the player. A game can use this data to defeat strong players by exploiting the weaknesses of those players, or to tutor players to overcome their weaknesses. (2) A player model can be used to implement humanlike characteristics in a non-player character (NPC). By copying human models, hence human behaviour, into NPCs, the game can be made more realistic (Van den Herik et al., 2005; Donkers & Spronck, 2006; Bakkes, to appear). (3) Another benefit of a player model is that it can maximize game results even when a game-theoretical optimal solution is known (Bakkes, Spronck & Van den Herik, 2009). Take for example the game of ROSHAMBO (Rock-Paper-Scissors). The game-theoretical optimal solution is for both players to randomly select one of the three options; the result will roughly be that both players win half of the time and lose half of the time. Now consider an opponent that always chooses rock. Sticking with the game-theoretical optimal solution will have the same result as before, but using the knowledge about the opponent, hence the model of the opponent, one could maximize the result by choosing paper all the time (Fürnkranz, 2007).

Player modelling was envisaged a long time ago in the domain of classic games. In the 1970s, chess programs incorporated a contempt factor.
This means that the program accepted a draw against a stronger opponent and declined a draw against a weaker opponent (Van den Herik et al., 2005). It took the other player into consideration and was therefore considered the first form of player modelling. The first attempt to really model an opponent in a classic game was made by Slagle and Dixon (1970), who used an optimized mini-max procedure. In 1993, research focused specifically on opponent modelling. In Israel, Carmel and Markovitch (1993) investigated, in depth, the learning of models of opponent strategies. At the same time in the Netherlands, Iida, Uiterwijk, and Van den Herik (1993) investigated potential applications of opponent-model search. Both research teams called their approach opponent-model search. In 1994, Uiterwijk and Van den Herik invented a search technique to speculate on the fallibility of the opponent player. In the 2000s a probabilistic opponent model was defined by Donkers, Uiterwijk and Van den Herik (2001) and Donkers (2003), which incorporated the player's uncertainty about the opponent's strategy (Bakkes, to appear).

The general aim of player modelling in computer games is to make the games more entertaining to the player, whereas in classic games the general aim is to beat the opponent (Van den Herik et al., 2005). This is one of the reasons player modelling is of increasing importance in modern computer games (Fürnkranz, 2007). According to Van den Herik et al. (2005), player modelling has two main goals in a computer game's artificial intelligence (AI): (1) as a companion and (2) as an opponent.

Companion role: In order for the AI to be a good companion, it is necessary that the AI behaves according to the human's expectations. For instance, when the human player prefers a sneaky approach to deal with hostile characters (e.g. by attempting to remain undetected), he will not be pleased when the computer-controlled companions immediately attack every hostile character that is near. The entertainment value of a game would soon be impaired if the AI fails to predict the preferences of the human player (Bakkes, to appear).
Opponent role: As an opponent it is important for the AI to keep the game interesting. Research has shown that if an opponent is too weak, the player quickly loses interest in the computer game. On the other hand, it is not desirable for the AI to be stronger than the player, as this results in the player getting stuck, which also reduces the entertainment value (Van Lankveld, Spronck & Van den Herik, 2009).

Player modelling has been around for some decades now. It started with classic games and is now becoming a point of interest in modern computer games. Much research has been done, but there is still much to learn.

2.3 Sequential Minimal Optimization Classifier
Data extracted from the game world through observations needs to be classified in order to construct a player model. There are many different classification algorithms available. To choose the optimal classification method we considered several classification methods, each of which uses its own approach to classification. Among them were: (1) the naive Bayes classifier NaiveBayes, (2) the optimised Support Vector Machine (SVM) 'Sequential Minimal Optimization' (SMO), (3) the k-nearest-neighbour classifier IBk, (4) the decision tree builder J48, and (5) the rule-based classifier JRip. Based on a small test, described in Chapter 3, Section 3.5, the fastest and most accurate classification algorithm was SMO. Essentially, SMO is closely related to the SVM and is designed to outperform a standard SVM (Platt, 1998, 2000). To understand SMO it is necessary to understand the SVM. In the remainder of this section a short overview of the SVM is given, concluding with the added value of SMO. For in-depth information about SMO, see Platt (1998), Keerthi, Shevade, Bhattacharyya and Murthy (2000), and Mak (2000). For further information on the other four classifiers we refer to John and Langley's (1995) Estimating Continuous Distributions in Bayesian Classifiers, Aha and Kibler's (1991) Instance-based Learning Algorithms, Quinlan's (1993) C4.5: Programs for Machine Learning, and Cohen's (1995) Fast Effective Rule Induction, respectively. These four classification methods are not elaborated on, since they are of little relevance to this research.
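The thesis uses Weka's implementations of these algorithms. As an illustration only, a comparable small test can be sketched in Python with scikit-learn analogues of the named classifiers; the toy dataset below is a hypothetical stand-in for the CIV 4 observations, and JRip has no direct scikit-learn counterpart.

```python
# Hedged sketch: comparing classifiers by cross-validated accuracy, the
# way the small test in Section 3.5 does. Toy data, not CIV 4 features.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB          # ~ NaiveBayes
from sklearn.svm import SVC                          # ~ SMO (libsvm uses an SMO-type solver)
from sklearn.neighbors import KNeighborsClassifier   # ~ IBk
from sklearn.tree import DecisionTreeClassifier      # ~ J48 (C4.5-like)

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

candidates = {
    "NaiveBayes":    GaussianNB(),
    "SMO-style SVM": SVC(kernel="linear"),
    "IBk (k=1)":     KNeighborsClassifier(n_neighbors=1),
    "J48-like tree": DecisionTreeClassifier(random_state=0),
}
scores = {name: cross_val_score(clf, X, y, cv=10).mean()
          for name, clf in candidates.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} mean 10-fold accuracy: {acc:.3f}")
```

Which classifier wins depends on the data; on the actual CIV 4 database the thesis found SMO to be fastest and most accurate.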
The SVM uses linear models to implement nonlinear class boundaries. This is done by transforming the input using a nonlinear mapping: a new instance space is created, called a vector space. In this vector space a linear line can be drawn that appears nonlinear in the real instance space. The instance space is shown in Figure 2.2, where it is clearly visible that the instances cannot be separated by a single linear line. Figure 2.3 shows the same instances, but in a vector space in which they can easily be divided by a linear line.
Figure 2.2 – Instance space that is not linearly separable
Figure 2.3 – The same instances after transformation into a linearly separable vector space
This is best explained by considering a dataset that consists of twenty-four instances, also called vectors. Each vector is filled with a number of attribute values, followed by a class. In this case there are two classes, Class A and Class B, so each vector belongs to Class A or Class B. By using a nonlinear mapping formula the vectors are transformed into points in a vector space. These points are called feature vectors. The basic idea is that the SVM can draw a linear line that separates all the feature vectors of Class A from all the feature vectors of Class B. The linear model is called a hyperplane. This is shown in Figure 2.3, where Class A is represented by diamonds and Class B by squares. Whenever a new vector is encountered, it is transformed to a feature vector. Depending on the place of the feature vector in the vector space, it is attributed to Class A or Class B.

The SVM uses a more elaborate technique than an ordinary hyperplane, called the maximum margin hyperplane. This is the hyperplane that is furthest away from both classes and orthogonal to the shortest line connecting the outer boundaries. The instances that are closest to the maximum margin hyperplane are called support vectors. These support vectors determine the layout of the hyperplane. Each class has at least one support vector, but often more. In Figure 2.4 the vectors that are considered support vectors are represented by larger squares and diamonds. It is important to note that once the support vectors are known, all the other vectors are irrelevant: they can be deleted without changing the shape of the maximum margin hyperplane, since the hyperplane is constructed based solely upon the vectors of one class with minimal distance to the vectors of the other class. All the vectors that are further away do not influence the maximum margin hyperplane.
Figure 2.4 – Maximum margin hyperplane with support vectors in a vector space
The example given here is that of a rather simple classification problem and would not take long to execute. However, using an SVM with a large dataset containing multiple classes and large numbers of attributes per vector results in a computationally complex problem that is not feasible: it would take a considerable amount of time and computational resources to classify such a dataset. This problem is also known as constrained quadratic optimization. This is where SMO provides the solution. SMO divides the quadratic programming (QP) optimization problem into smaller parts. These small QP problems are solved analytically, which avoids using a time-consuming numerical QP optimization as an inner loop. The amount of memory required for SMO is linear in the training set size, which allows SMO to handle very large training sets. Because matrix computation is avoided, SMO scales somewhere between linear and quadratic in the training set size for various test problems (Platt, 1998). In conclusion, SMO is an optimization of the SVM: it reduces the time and computational resources that an SVM would need to perform a classification, without losing accuracy.
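The nonlinear mapping described in this section is what SVM kernels implement implicitly. As a minimal check (scikit-learn assumed, as an analogue of the Weka SMO used in the thesis), an RBF-kernel SVM separates XOR-like data that no single linear line can separate in the original instance space, mirroring the situation of Figures 2.2 and 2.3:

```python
# Sketch: four XOR points are not linearly separable in the instance
# space, but become separable after the implicit nonlinear mapping of
# the RBF kernel (cf. Figures 2.2 and 2.3).
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1],    # Class A
              [0, 1], [1, 0]])   # Class B
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)
print(clf.score(X, y))   # 1.0 — all four points classified correctly
```

A linear classifier can classify at most three of these four points correctly, which is the "single linear line" limitation illustrated in Figure 2.2.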
3 Experimental Setup
The aim of this research is to explore to what extent we can create a preference-based player model by using a classifier. In order to structure the research we formulated a research plan, which is a theoretical representation of the intended research process. This research plan is elaborated on in Section 3.1. Section 3.2 revolves around the selection of the computer-controlled leaders and their preferences. Once those are selected, the generation of data starts in Section 3.3. Which data is to be observed and stored is discussed in Section 3.4, followed by the selection of the appropriate classifier in Section 3.5. Finally, an explanation of the experiments is provided in Section 3.6.

3.1 Research Elaboration
The choice of the commercial computer game CIV 4 as the digital research field has four reasons (Section 2.1). The most important reason is the availability of preferences in the game code of the CIV 4 leaders. These preferences are the basis of our player model: the goal is to make a player model based on the preferences of players. In order to create the player model we start with the construction of classification models, one for each preference. Once the classification models are constructed, an attempt is made to model (1) computer-controlled players and (2) human players. We believe that, given a player's preferences, this player exhibits certain strategies. However, strategies themselves cannot be observed, since a strategy is not explicitly manifested by parameters of the game world. The actions of the player, on the other hand, are based upon those strategies and are, unlike the strategies themselves, observable in the game world. Via mechanics described in Section 3.3 such observations can be extracted from the game world and stored in a database, Database A. The observations are linked to the preferences of the computer-controlled players that generate the observable actions. Because the observations are linked to the preferences, it is possible for the SMO classifier to train on the observations in the database. After the classifier has trained on the data, one classification model per preference has been created. Each classification model is assumed to be capable of predicting one preference of the computer-controlled players, even for players that were not part of the data generation. The combination of the classification models of all the preferences is considered a player model. In the first stage we validate the classification models by letting them predict the preferences of the same computer-controlled players that were used to construct the classification models.
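The pipeline just described — one classification model per preference, trained on observations labelled with the preferences of the generating leaders — can be sketched as follows. scikit-learn and the feature and preference names are assumptions for illustration; the thesis uses Weka's SMO and the actual CIV 4 leader preferences.

```python
# Hedged sketch: one SMO-style classifier per preference, trained on
# observations from Database A. All names and data here are hypothetical.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_obs, n_features = 200, 8
observations = rng.normal(size=(n_obs, n_features))  # stand-in for game-state features

# Each observation is labelled with the preference values of the
# computer-controlled leader that generated it (e.g. 0=low, 1=medium, 2=high).
preferences = {
    "culture":  rng.integers(0, 3, n_obs),
    "military": rng.integers(0, 3, n_obs),
    "science":  rng.integers(0, 3, n_obs),
}

# One classification model per preference; their combination is the player model.
player_model = {pref: SVC(kernel="linear").fit(observations, labels)
                for pref, labels in preferences.items()}

# Predicting the preferences of an unseen player from a new observation.
new_obs = rng.normal(size=(1, n_features))
predicted = {pref: int(m.predict(new_obs)[0]) for pref, m in player_model.items()}
print(predicted)
```

The same trained models are later fed the observations in Databases B and C to predict the preferences of new computer-controlled and human players.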
In Figure 3.1 an overview is presented of the process of building a classification model. It contains seven aspects consisting of: (1) squares that represent entities, (2) ovals to indicate actions, processes or steps, and (3) the cylinder that represents a database. Each entity is a prerequisite to reach the next entity through an action, process or step. Determining the accuracy of the predictions of the classification models on computer-controlled players and human players is the following step in our process of player modelling. Once the classification models are build, they are assumed to be able to predict the preferences of other computer-controlled players and even human players. For the created classification models to provide predictions on these new players, new databases are needed. These databases are called Database B for the computer-controlled players and Database C for the human players. Determining the accuracy of the predictions on computer-
P a g e | 14
[Figure 3.1: a diagram of the cycle 1. AI Player → 2. Acting → 3. Game World → 4. Observing → 5. Database A → 6. Training → 7. Classification Model, which predicts (1). Its legend reads:]

(1) A computer-controlled player; more precisely the preferences of that player.
(2) The process of transferring preferences through actions into the game world.
(3) The game world.
(4) The process of observing the game world and extracting data.
(5) A database named Database A. This database is used for the experiment Classification Model Validation.
(6) The process of the classifier that trains on the data in Database A.
(7) A classification model is built, capable of predicting one preference.

Figure 3.1 - Building a classification model based on the preferences implemented in the computer-controlled players
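The training step can be sketched in code. The thesis uses Weka's SMO classifier; the minimal sketch below substitutes a trivial majority-class learner for it, purely to show the data flow of building one classification model per preference from Database A. All names are illustrative.

```python
# Sketch of Figure 3.1, steps 5-7: train one classification model per
# preference on Database A. A majority-class learner stands in for the
# SMO classifier used in the thesis; all names are illustrative.
from collections import Counter

PREFERENCES = ["Aggression", "Culture", "Gold", "Growth",
               "Military", "Religion", "Science"]

def train_stub(labels):
    """Stand-in for SMO training: the 'model' predicts the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda features: majority

def build_classification_models(database_a):
    """database_a: list of (feature_vector, {preference: value}) instances.
    Returns one classification model per preference."""
    models = {}
    for pref in PREFERENCES:
        labels = [prefs[pref] for _, prefs in database_a]
        models[pref] = train_stub(labels)
    return models

# Two toy instances linking observed feature values to leader preferences.
database_a = [([10, 1, 2], {p: 5 for p in PREFERENCES}),
              ([11, 0, 3], {p: 5 for p in PREFERENCES})]
models = build_classification_models(database_a)
print(models["Aggression"]([12, 1, 2]))  # -> 5
```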
controlled players and human players is rather similar to the creation of the classification models, with minor adjustments. This process is presented in Figure 3.2. It contains nine aspects and, like Figure 3.1, is depicted by: (1) squares that represent entities, (2) ovals that indicate actions, processes or steps, and (3) a cylinder that represents a database. Each entity is a prerequisite to reach the next entity through an action, process or step.
[Figure 3.2: a diagram of the chain 1. Player → 2. Actions → 3. Game World → 4. Observing → 5. Database B / C → 6. Feeding Data → 7. Classification Model → 8. Combining Results → 9. Player Model. Its legend reads:]

(1) A player; computer-controlled or human.
(2) The process of transferring preferences through actions into the game world.
(3) The game world.
(4) The process of observing the game world and extracting data.
(5) A database. Database B for the experiment Modelling of AI; Database C for the experiment Modelling of Humans.
(6) The process of feeding the data to the previously built classification model.
(7) The classification model.
(8) The process of combining the results of the classification models into a player model.
(9) A player model.

Figure 3.2 - Building a player model based on a classification model and player data
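The final steps of Figure 3.2 can be sketched as follows. The per-preference models below are stubs standing in for the trained SMO classification models, and all names are illustrative; the sketch only shows how per-turn predictions are combined into one player model.

```python
# Sketch of Figure 3.2, steps 6-9: feed new observations to the previously
# built classification models (one per preference) and combine their
# predictions into a player model. The models are illustrative stubs.
from collections import Counter

PREFERENCES = ["Aggression", "Culture", "Gold", "Growth",
               "Military", "Religion", "Science"]

# Stub models: each maps a feature vector to one predicted preference value.
models = {pref: (lambda feats, p=pref: 5 if p == "Military" else 0)
          for pref in PREFERENCES}

def predict_player_model(name, observations, models):
    """Steps 6-8: feed each observation to every model and combine the
    per-turn predictions by majority vote into one player model."""
    combined = {}
    for pref, model in models.items():
        votes = Counter(model(obs) for obs in observations)
        combined[pref] = votes.most_common(1)[0][0]
    return {"player": name, "preferences": combined}  # step 9

player_model = predict_player_model("Napoleon", [[1, 2], [3, 4]], models)
print(player_model["preferences"]["Military"])  # -> 5
```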
The first five entities in Figure 3.2 are similar to the first five entities in Figure 3.1. These entities are necessary to create a database of game data from which predictions about preferences can be made. Starting at entity six, the approach differs slightly from the approach in Figure 3.1. Instead of letting the SMO classifier train on the data in the database, the data is fed into the previously built classification models. In other words, the previously created classification models are used to predict preferences based on new data. This results in predictions of the preferences of the players that are used to create the new databases. Database B will consist of game data from a set of computer-controlled players, different computer-
controlled players than those initially used to create the classification models. Database C will be created by human players. Assuming the accuracy of the predictions is sufficient, the predictions form the base of our player model. The player model then provides the name of a player, his preferences, and his preference values. Ultimately the player model should have preferences and preference values similar to those of the actual player.

3.2 Leaders and Preferences
In CIV 4 there are 26 leaders to choose from. Each leader has its own personality. These personalities are for a significant part determined by preferences that are encoded in the leader's game code. In this section we discuss and motivate the choice of the leaders and their preferences used to create the classification models.
The leaders in CIV 4 are designed with what the game designers call 'flavours'. All these flavours are attributed to each of the 26 leaders; in other words, each leader has all of these flavours. Flavours are identical to what we call preferences. Besides these flavours there are several other parameters in the game code of each leader that can be interpreted as preferences. From the flavours, six are chosen to function as preferences in this research. Finally the preference Aggression is added to the selected six flavours. Aggression is one of the additional parameters that also influence the leader's behaviour. It is added to the six preferences because a player's tendency towards aggression is valuable information in CIV 4. This sums up to a total of seven preferences that serve as the base on which the player models will be built. These preferences are called: (1) Aggression, (2) Culture, (3) Gold, (4) Growth, (5) Military, (6) Religion, and (7) Science.

All 26 leaders have these seven preferences; only the values of the preferences differ per leader. This ensures that each leader exhibits different behaviour in the game world. These preference values can also be found in the game code. For the six preferences that are defined as flavours, the values lie in the range {0, 2, 5, 10}, which can be interpreted as no preference, minor preference, major preference, and only preference, respectively. For the added preference Aggression the values lie in the range {1, 2, 3, 4, 5}, which can be interpreted as very low, low, medium, high, and very high aggression, respectively.

In Section 2.2 it was mentioned that a large database of game data was already available. That game data was accumulated by letting leaders duel each other. Using that data implies that three leaders were already selected for us.
Three additional leaders were randomly selected to bring the total to six leaders, who function as the base for our classification models. These leaders are presented in Table 3.1, with the preference values per preference and a subjective description of each leader's behaviour in the game world, based on game play experience.
Table 3.1 The six selected leaders with their preference values per preference and a subjective description of each leader's behaviour in the game world, based on game play experience

            Aggression  Culture  Gold  Growth  Military  Religion  Science
Alexander        5         0      0      2        5         0        0

Alexander shows some respect to other players that have a strong military force. He is even willing to enter into treaties with those players. As soon as he ascertains that his military force is larger than that of his opponent, he cancels the treaties with that opponent to attack him.

Hatshepsut       3         5      0      0        0         2        0

Hatshepsut is a peace-loving leader. She focuses on her own culture and expands her empire gradually. She makes many deals and treaties with other players and will almost never attack without serious provocation.

Louis XIV        3         5      0      0        2         0        0

Louis XIV is a culturally oriented leader. He expands his empire by increasing its cultural value. He remains peaceful as long as his empire is not threatened by military or cultural means. Once at war with this leader it becomes clear that he is not to be trifled with.

Mansa Musa       1         0      5      0        0         2        0

Mansa Musa generates lots of gold and invests it mainly into science. Therefore he is almost the only leader who ever attempts to achieve a space victory. Playing against this leader always gives his opponents a scientific disadvantage.

Napoleon         4         0      2      0        5         0        0

Napoleon is a military oriented leader who is always looking for a fight. He produces many military units and is easily annoyed with other players. Once annoyed he declares war without hesitation.

Tokugawa         4         0      0      0        2         0        5

Tokugawa is a very isolated leader who is only concerned with his own empire. He will not negotiate with other players and will not even allow other players to pass through his lands. He is a fairly strong adversary once at war because of some scientific advantage and a sufficiently large military force.
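For reference, the values of Table 3.1 can be encoded directly as data; a minimal check (the dictionary layout is illustrative) shows which preference values actually occur among the six leaders.

```python
# Preference values of the six selected leaders, taken from Table 3.1.
LEADERS = {
    "Alexander":  {"Aggression": 5, "Culture": 0, "Gold": 0, "Growth": 2,
                   "Military": 5, "Religion": 0, "Science": 0},
    "Hatshepsut": {"Aggression": 3, "Culture": 5, "Gold": 0, "Growth": 0,
                   "Military": 0, "Religion": 2, "Science": 0},
    "Louis XIV":  {"Aggression": 3, "Culture": 5, "Gold": 0, "Growth": 0,
                   "Military": 2, "Religion": 0, "Science": 0},
    "Mansa Musa": {"Aggression": 1, "Culture": 0, "Gold": 5, "Growth": 0,
                   "Military": 0, "Religion": 2, "Science": 0},
    "Napoleon":   {"Aggression": 4, "Culture": 0, "Gold": 2, "Growth": 0,
                   "Military": 5, "Religion": 0, "Science": 0},
    "Tokugawa":   {"Aggression": 4, "Culture": 0, "Gold": 0, "Growth": 0,
                   "Military": 2, "Religion": 0, "Science": 5},
}

# Values occurring for Aggression, and for the six flavour preferences pooled.
aggression_values = {v["Aggression"] for v in LEADERS.values()}
flavour_values = {v[p] for v in LEADERS.values()
                  for p in v if p != "Aggression"}
print(sorted(aggression_values))  # -> [1, 3, 4, 5] (value 2 never occurs)
print(sorted(flavour_values))     # -> [0, 2, 5]    (value 10 never occurs)
```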
From Table 3.1 it can be concluded that the preference value 2 for Aggression and the value 10 for the other six preferences are missing. This can be attributed to the random selection of leaders. As a result not every preference value is represented; possible implications are discussed in Chapter 5. Each preference value will function as a class for the SMO classifier. This implies that each preference requires a separate classification model, which results in seven classification models. The SMO classifier is trained on game data generated by these six leaders and their preference values. After training it is expected to predict preferences based on game data of other leaders and even human players. The classification model predicts the preference values of players: in the case of Aggression it predicts a player's preference value to be in {1, 3, 4, 5}; for the other preferences it predicts a player's preference value to be in {0, 2, 5}.

To summarize: six leaders are selected. Each of these leaders has the same seven preferences; the difference lies in the preference values. These leaders will be used to generate data on which the SMO classifier can train. The training results in the construction of one classification model per preference. Each classification model is assumed to predict the preference values of unknown players (computer-controlled or human).

3.3 Data Generation
The selection of six leaders (the Alexander Set) and their seven preferences results in a set of computer-controlled players that form the base to create a classification model. The process of transferring their preferences into actions in the game world is discussed in this section. For the Alexander Set to transfer their preferences into actions in the game world, they need to play. Normally a computer-controlled player cannot play a game of CIV 4 alone; it requires at least one human player as opponent. However, it is not desirable to use human players to compete with the Alexander Set
for two reasons. (1) Consider the amount of time an average game of CIV 4 requires when played by a human player. This can vary from half an hour to three hours. If the human player were to play only once against each leader of the Alexander Set, it would take ten and a half hours on average. (2) The SMO classifier needs large amounts of data to train on, since it is a greedy classifier. This would require human players to play numerous games against the leaders of the Alexander Set, which would nullify the added value of this approach, since we aim to minimize the need for humans to generate data.
A solution is found by installing an application for CIV 4 called AIAUTOPLAY. This application makes it possible to let computer-controlled players fight each other without the requirement of a human player. It provides the option for a human player to give control to the AI for any desired number of turns; it only requires a human to initiate the game between computer-controlled players and set the number of turns the game should continue. For this research each game is set to take 460 turns, because a game of CIV 4 always ends with a winner in the 460th turn through a Time Victory (Section 2.2). If another victory condition has been met by a player before the 460th turn, the game still continues until it reaches the 460th turn; the turns after reaching a victory condition are erased afterwards. To improve the clarity of future observations only duels between computer-controlled leaders of the Alexander Set are initiated. Creating games with more leaders could hinder some leaders in utilizing their desired preferences, creating unclear observations. The duels between the Alexander Set leaders are structured according to a battle plan. This battle plan is presented in Table 3.2.
Table 3.2 Battle plan presenting the duels for each leader

Player 1     vs. Player 2       Player 1     vs. Player 2
Alexander    vs. Hatshepsut     Hatshepsut   vs. Louis XIV
Alexander    vs. Louis XIV      Hatshepsut   vs. Mansa Musa
Alexander    vs. Mansa Musa     Hatshepsut   vs. Napoleon
Alexander    vs. Napoleon       Hatshepsut   vs. Tokugawa
Alexander    vs. Tokugawa       Hatshepsut   vs. Alexander
Louis XIV    vs. Mansa Musa     Mansa Musa   vs. Napoleon
Louis XIV    vs. Napoleon       Mansa Musa   vs. Tokugawa
Louis XIV    vs. Tokugawa       Mansa Musa   vs. Alexander
Louis XIV    vs. Alexander      Mansa Musa   vs. Hatshepsut
Louis XIV    vs. Hatshepsut     Mansa Musa   vs. Louis XIV
Napoleon     vs. Tokugawa       Tokugawa     vs. Alexander
Napoleon     vs. Alexander      Tokugawa     vs. Hatshepsut
Napoleon     vs. Hatshepsut     Tokugawa     vs. Louis XIV
Napoleon     vs. Louis XIV      Tokugawa     vs. Mansa Musa
Napoleon     vs. Mansa Musa     Tokugawa     vs. Napoleon
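The battle plan in Table 3.2 is simply every ordered pair of distinct leaders from the Alexander Set. A sketch of generating it (variable names are illustrative):

```python
# Generate the Table 3.2 battle plan: every ordered (player 1, player 2)
# pair of distinct leaders from the Alexander Set.
from itertools import permutations

ALEXANDER_SET = ["Alexander", "Hatshepsut", "Louis XIV",
                 "Mansa Musa", "Napoleon", "Tokugawa"]

battle_plan = list(permutations(ALEXANDER_SET, 2))

# Each leader is player 1 in five duels and player 2 in five more.
as_player1 = sum(1 for p1, _ in battle_plan if p1 == "Alexander")
as_player2 = sum(1 for _, p2 in battle_plan if p2 == "Alexander")
print(len(battle_plan), as_player1, as_player2)  # -> 30 5 5
```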
According to this battle plan each leader from the Alexander Set is player 1 in five duels and player 2 in five more duels, summing up to a total of 10 games per leader per battle plan. To generate more data the battle plan was executed four times, summing up to a total of 40 games per leader and 240 games in total. These games are the start of a database on which the SMO classifier is to train. To check the accuracy of the SMO classifier a test set is required. This test set contains data that the SMO classifier has not trained on, but data of the same computer-controlled players that generated the training data. To this end 30 games per leader serve as training data and 10 games per leader serve as test data.
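The split of the 40 games per leader into 30 training games and 10 test games might be expressed as follows (a sketch; function and variable names are illustrative):

```python
# Sketch of the data split: 40 games per leader, 30 into the training set
# and 10 into the test set. Names are illustrative.
import random

def split_games(games_per_leader, n_train=30):
    """games_per_leader: dict leader -> list of 40 game records."""
    train, test = {}, {}
    for leader, games in games_per_leader.items():
        shuffled = games[:]                 # keep the original list intact
        random.Random(0).shuffle(shuffled)  # fixed seed: reproducible split
        train[leader] = shuffled[:n_train]
        test[leader] = shuffled[n_train:]
    return train, test

games = {"Alexander": [f"game{i}" for i in range(40)]}
train, test = split_games(games)
print(len(train["Alexander"]), len(test["Alexander"]))  # -> 30 10
```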
The numbers of games used as training set and test set are not arbitrary; they are based on a preliminary experiment. This preliminary experiment was executed to determine whether the results of a 10-fold cross-validation on the training set are roughly similar to the results of a classification on future test sets. With 30 games per leader as training set and 10 games per leader as test set, the results differ only about 5%, which we consider a statistically acceptable difference. It is necessary for the difference in results to be as small as 5% or less, because future adjustments, to for instance the database, need to be tested: it is important to know whether an adjustment is an improvement or not. Conducting such experiments with the training set and test set would probably cause overfitting on the test set. In other words, each adjustment would then be made to improve the results on that specific test set; if another test set is used, these adjustments may not be beneficial for that new test set. To prevent this from happening, all adjustments are tested by 10-fold cross-validation on the training set.

3.4 Observations
All duels serve the purpose of generating data. They need to be observed and data needs to be extracted to create a database. In this section we discuss how we extract the data, which data we extract, and why we extract that data.

The classification models that are to be constructed are based upon the observations made during the duels. It is essential that these observations are useful and contribute to the creation of the classification models. The purpose of the player model is to be applicable in the game for players to use. Therefore it is necessary that the data upon which the classification model is built, and later the player model, is available to all players at all times. The database can only contain data that is available to both leaders in the duels. In strategy games it is common that a player has little knowledge about his opponent's state. For example, most strategy games (including CIV 4) have a 'fog of war', which hides the opponent's units. In contrast to most strategy games, CIV 4 provides statistical information about other players via tables and schematics. These are freely accessible during a game. All that information can be used to create a database on which the classification model can train. An example of such a schematic is shown in Figure 3.3.
Figure 3.3 - Example of a schematic, containing the score for both players. This schematic is available to both players during a game of CIVILIZATION IV
The selected features need to meet two requirements: (1) the features need to be an indication of a preference and (2) the features must be available to each player at all times. The result is a list of twenty base features as displayed in Table 3.3.
Table 3.3 Base features and their meaning

#    Feature            Range      Meaning
1.   Turn               1-459      Turn number
2.   War                0, 1       0 = not in war; 1 = in war
3.   Cities             0-15       Number of cities
4.   Units              0-200      Number of units
5.   Population         0-200      Amount of population
6.   Gold               0-10000    Amount of gold
7.   Land               0-200      Amount of land tiles
8.   Plots              0-400      Amount of land and water tiles
9.   Techs              0-100      Number of technologies researched
10.  Score              0-10000    Overall score
11.  Economy            0-300      Overall economic score
12.  Industry           0-500      Overall industrial score
13.  Agriculture        0-400      Overall agriculture score
14.  Power              0-4000     Overall power score
15.  Culture            0-300000   Overall cultural score
16.  Maintenance        0-100      Gold needed for maintenance per turn
17.  GoldRate           0-1000     Amount of gold gained per turn
18.  ResearchRate       0-2000     Amount of research gained per turn
19.  CultureRate        0-3000     Amount of culture gained per turn
20.  StateReligionDiff  -1, 0, 1   -1 = different religion; 0 = no religion; 1 = same religion
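An instance, one row in the database, combines an end-of-turn observation of the Table 3.3 features with the acting leader's preference values as class labels. A sketch of how such a row might be assembled (function and variable names are illustrative):

```python
# Sketch of assembling one database instance: the 20 base feature values
# observed at the end of a turn, followed by the acting leader's
# preference values as class labels. Names are illustrative.
BASE_FEATURES = ["Turn", "War", "Cities", "Units", "Population", "Gold",
                 "Land", "Plots", "Techs", "Score", "Economy", "Industry",
                 "Agriculture", "Power", "Culture", "Maintenance",
                 "GoldRate", "ResearchRate", "CultureRate",
                 "StateReligionDiff"]

def make_instance(observation, preferences):
    """observation: dict feature -> value; preferences: dict pref -> value."""
    row = [observation[f] for f in BASE_FEATURES]
    row += [preferences[p] for p in sorted(preferences)]
    return row

obs = {f: 0 for f in BASE_FEATURES}
obs.update({"Turn": 42, "Cities": 3, "Gold": 120})
row = make_instance(obs, {"Aggression": 4, "Military": 5})
print(len(row))  # -> 22  (20 feature values + 2 toy preference labels)
```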
The game CIV 4 is a TBS (Section 2.2). This means that each player has a limited number of actions to perform in a turn. Once a player is not allowed any more actions, his turn ends and the turn of the opponent starts. At the end of each turn an observation of all these features is made, resulting in one row of data (instance) which contains the turn, all the observations of the features (feature values), and the corresponding preference values of that specific player. This is how all observable data is stored in the database, making it possible for the SMO classifier to train on it.

To extract even more useful information from the game, several features are added. The added features are seven modifications of almost all 20 base features; the exceptions are: (1) Turn, (2) War, and (3) StateReligionDiff. The seven modifications to the base features are displayed in Table 3.4, including the calculation (Spronck & Den Teuling, to appear) and the meaning of the modification.
Table 3.4 Modifications to the base features

#    Modification   Calculation              Meaning
1.   Derivate       feature(t) − feature(t−1)   Increase or decrease in the base feature per turn