Linköping University | Department of Computer and Information Science Master thesis, 30 ECTS | Computer Science 2018 | LIU-IDA/LITH-EX-A--2018/021--SE

Prediction models for soccer sports analytics

Edward Nsolo

Supervisor: Niklas Carlsson Examiner: Patrick Lambrix

Linköpings universitet, SE–581 83 Linköping, +46 13 28 10 00, www.liu.se

Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security, and accessibility. According to intellectual property law, the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its home page: http://www.ep.liu.se/.

© Edward Nsolo

Abstract

In recent times, there has been a substantial increase in research interest in soccer due to the increased availability of soccer statistics data. With the help of data provider firms, access to historical soccer data has become simpler, and as a result data scientists have started researching the field. In this thesis, we develop prediction models that could be applied by data scientists and other soccer stakeholders. As a case study, we run several machine learning algorithms on historical data from five major European leagues and compare the results. The study is built upon the idea of investigating different approaches that could be used to simplify the models while maintaining their correctness and robustness. Such approaches include feature selection and the conversion of regression prediction problems into binary classification problems. Furthermore, a literature review did not reveal any research attempts at generalizing binary classification predictions by applying target class upper boundaries other than 50% equal-frequency binning. Thus, this thesis investigated the effects of such generalization on the simplicity and performance of the models. We aimed to extend the traditional discretization of classes with the equal-frequency binning function, which is the standard way of converting regression problems into binary classification problems in many applications. Furthermore, we sought to establish the important player features in individual leagues that could help team managers adopt cost-efficient transfer strategies. Those features were selected successfully by applying wrapper and filter algorithms. Both methods turned out to be useful, as the time taken to build the models was minimal and the models were able to make good predictions. Furthermore, we noticed that different features matter for different leagues.
Therefore, this should be kept in mind when assessing the performance of players. Different machine learning algorithms were found to behave differently under different conditions; however, Naïve Bayes was determined to be the best fit in most cases. Moreover, the results suggest that it is possible to generalize binary classification problems and maintain performance to a reasonable extent. It should be observed, though, that the early stages of generalizing binary classification models involve the tedious work of preparing training datasets, a fact that should be weighed as a tradeoff when considering this approach.

Acknowledgments

Firstly, I would like to express my sincere gratitude to my thesis examiner and supervisor, Prof. Patrick Lambrix and Prof. Niklas Carlsson of Linköping University, for the opportunity to carry out this thesis project under their supervision. Their continuous support, guidance, and patience steered me in the right direction and led to the successful accomplishment of this thesis. Secondly, I would like to extend a hand of gratitude to fellow schoolmates, friends, and family for their company, advice, and encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Lastly, I would like to thank almighty God for good health and for the opportunity of a scholarship to study in Sweden. This publication was produced during a scholarship period at Linköping University; thus, I would like to give special appreciation to the Swedish Institute scholarship.

Contents

Abstract iii

Acknowledgments v

Contents vi

List of Figures viii

List of Tables x

1 Introduction 1
1.1 Purpose ...... 2
1.2 Research questions ...... 2
1.3 Delimitations ...... 3

2 Related work 5

3 Theory 7
3.1 Software (Weka) ...... 7
3.2 Min-max normalization ...... 8
3.3 Feature selection methods ...... 8
3.4 Class imbalance ...... 9
3.5 SMOTE (Synthetic Minority Oversampling Technique) ...... 9
3.6 TigerJython with Weka ...... 10
3.7 Machine learning algorithms ...... 10
3.8 Evaluation of the prediction models ...... 12

4 Research methods, techniques, and methodology 15
4.1 Pre-study ...... 15
4.2 Experimental study ...... 16
4.3 Methodology ...... 16

5 Data pre-processing 23
5.1 Data collection ...... 23
5.2 Data rescaling, missing values, and duplicates ...... 23
5.3 Converting regression problem to binary classification problem ...... 24

6 Feature selection 27
6.1 Feature selection with wrapper method ...... 27
6.2 Feature selection with filter attribute evaluator ...... 31

7 Performance of prediction models 35
7.1 Accuracy results of the prediction models ...... 35
7.2 F1 Score results of the prediction models ...... 36
7.3 AUC-ROC results of the prediction models ...... 37

8 Discussion and conclusion 41
8.1 What are the best mechanisms for selecting essential features for predicting the performance of top players in European leagues? ...... 41
8.2 What are the essential features for developing prediction models for top players in European leagues? ...... 42
8.3 What are the useful classification models for predicting performance of top players in European leagues? ...... 43
8.4 How can binary prediction models be generalized? ...... 43

9 Future research 45

Bibliography 47

A Wrapper method results of the combined-leagues 51

B Attributes selected by Wrapper method of the combined leagues 57

C Execution time of wrapper method for the combined leagues 61

D Aggregated results of filter method for the combined leagues 65

E Model accuracy results of wrapper datasets for the combined leagues 67

F Model accuracy of filter-datasets for the combined leagues 71

G F1 score results of wrapper datasets for the combined leagues 75

H F1 Score results of the filter-datasets for the combined leagues 79

I AUC-ROC results of the wrapper datasets for the combined leagues 83

J AUC-ROC results of the filter-datasets 87

K Accuracy results for individual leagues 91

L F1 score results for individual leagues 97

M AUC-ROC results for individual leagues 103

List of Figures

4.1 A procedure for analyzing soccer sport historical data ...... 17
4.2 Data preparation model ...... 18
4.3 Knowledge flow activities for data formatting process ...... 19
4.4 Feature selection with wrapper method knowledge flow model ...... 20
4.5 Feature selection with filter method knowledge flow model ...... 21

6.1 Merit of subsets of attributes selected ...... 28
6.2 Execution time of wrapper subset evaluator ...... 31

7.1 Model accuracy results ...... 36
7.2 Overall F1 Score results of the combined leagues ...... 37
7.3 Overall AUC-ROC results ...... 38

C.1 Execution time of wrapper attribute evaluator for defenders datasets ...... 61
C.2 Execution time of wrapper method for goalkeepers datasets ...... 62
C.3 Execution time of wrapper method for midfielders datasets ...... 62
C.4 Execution time of wrapper method for forwards datasets ...... 63

E.1 Prediction model accuracy for defenders wrapper-dataset ...... 67
E.2 Prediction model accuracy for midfielders wrapper-dataset ...... 68
E.3 Model accuracy for the goalkeepers wrapper-datasets ...... 68
E.4 Prediction model accuracy for forwards wrapped datasets ...... 69

F.1 Model accuracy for defenders filter-datasets ...... 71
F.2 Model accuracy for midfielders filter-datasets ...... 72
F.3 Model accuracy for goalkeepers filter-datasets ...... 72
F.4 Model accuracy for forwards filter-datasets ...... 73

G.1 F1 Score results of the defenders wrapper-datasets ...... 75
G.2 F1 Score results of the midfielders wrapper-datasets ...... 76
G.3 F1 Score results of the goalkeepers wrapped datasets ...... 76
G.4 F1 Score results of the forwards wrapped datasets ...... 77

H.1 F1 Score results of the defenders filter-datasets ...... 79
H.2 F1 Score results of the midfielders filter-datasets ...... 80
H.3 F1 Score results of the goalkeepers filtered datasets ...... 80
H.4 F1 score results of the forwards filtered datasets ...... 81

I.1 AUC-ROC results of the defenders wrapper-datasets ...... 83
I.2 AUC-ROC results of the midfielders wrapper-datasets ...... 84
I.3 AUC-ROC results of the goalkeepers wrapper-datasets ...... 84
I.4 AUC-ROC results of the forwards wrapper-datasets ...... 85

J.1 AUC-ROC results of the defenders filter-datasets ...... 87
J.2 AUC-ROC results of the midfielders filter-datasets ...... 88
J.3 AUC-ROC results of the goalkeepers filter-datasets ...... 88
J.4 AUC-ROC results of the forwards filter-datasets ...... 89

List of Tables

3.1 Confusion matrix ...... 13

5.1 List of all attributes ...... 24
5.2 All datasets of the combined leagues ...... 25
5.3 All datasets of the ...... 25
5.4 All datasets of the EPL ...... 25
5.5 All datasets of the ...... 26
5.6 All datasets of the ...... 26
5.7 All datasets of the ...... 26

6.1 Important features for goalkeepers as selected by wrapper method ...... 29
6.2 Important features for defenders as selected by wrapper method ...... 29
6.3 Important features for midfielders as selected by wrapper method ...... 29
6.4 Important features for forwards as selected by wrapper method ...... 30
6.5 Important features for the combined leagues by wrapper attribute evaluator ...... 30
6.6 Most frequent selected attributes by wrapper method in combined leagues ...... 30
6.7 Important features for goalkeepers as selected by filter method ...... 32
6.8 Important features for defenders as selected by filter method ...... 33
6.9 Important features for midfielders as selected by filter method ...... 33
6.10 Important features for forwards as selected by filter method ...... 33
6.11 Important features for the combined leagues by filter attribute evaluator ...... 34

7.1 Overall model accuracy ...... 36
7.2 F1 Score results between wrapper and filter of the combined leagues ...... 38
7.3 AUC-ROC results ...... 39

8.1 Comparison between the wrapper and filter subset evaluator ...... 42
8.2 Sample of actual predictions ...... 44

A.1 Attribute selection with wrapper method for defenders top10pc dataset ...... 51
A.2 Attribute selection with wrapper method for defenders top25pc dataset ...... 52
A.3 Attribute selection with wrapper method for defenders top50pc dataset ...... 52
A.4 Attribute selection with wrapper method for goalkeepers top10pc dataset ...... 53
A.5 Attribute selection with wrapper method for goalkeepers top25pc dataset ...... 53
A.6 Attribute selection with wrapper method for goalkeepers top50pc dataset ...... 53
A.7 Attribute selection with wrapper method for midfielders top10pc dataset ...... 54
A.8 Attribute selection with wrapper method for midfielders top25pc dataset ...... 54
A.9 Attribute selection with wrapper method for midfielders top50pc dataset ...... 54
A.10 Attribute selection with wrapper method for forwards top10pc dataset ...... 55
A.11 Attribute selection with wrapper method for forwards top25pc dataset ...... 55
A.12 Attribute selection with wrapper method for forwards top50pc dataset ...... 55
A.13 Frequency of attributes being selected by several wrapper schemes ...... 56

B.1 Support count of selected attributes with wrapper method for defender’s top10pc dataset ...... 57
B.2 Support count of selected attributes for wrapper method for defender’s top25pc dataset ...... 57
B.3 Support count of selected attributes for wrapper method for defender’s top50pc dataset ...... 58
B.4 Support count of selected attributes with wrapper method for goalkeeper’s top10pc dataset ...... 58
B.5 Support count of selected attributes for wrapper method for goalkeepers’ top25pc dataset ...... 58
B.6 Count of selected attributes for goalkeepers’ top50pc dataset ...... 58
B.7 Support count of selected attributes with wrapper method for midfielders top10pc dataset ...... 58
B.8 Support count of selected attributes for wrapper method for midfielder’s top25pc dataset ...... 59
B.9 Count of selected attributes for midfielders top50pc dataset ...... 59
B.10 Support count of selected attributes with wrapper method for forwards top10pc dataset ...... 59
B.11 Support count of selected attributes for wrapper method for forwards top25pc dataset ...... 59
B.12 Count of selected attributes for forwards top50pc dataset ...... 60

D.1 Overall filter method results ...... 66

K.1 Accuracy results of Bundesliga ...... 92
K.2 Accuracy results of EPL ...... 93
K.3 Accuracy results of La Liga ...... 94
K.4 Accuracy results of Ligue 1 ...... 95
K.5 Accuracy results of Serie A ...... 96

L.1 F1 results of Bundesliga ...... 98
L.2 F1 results of EPL ...... 99
L.3 F1 results of La Liga ...... 100
L.4 F1 results of Ligue 1 ...... 101
L.5 F1 results of Serie A ...... 102

M.1 AUC-ROC results of Bundesliga ...... 104
M.2 AUC-ROC results of EPL ...... 105
M.3 AUC-ROC results of La Liga ...... 106
M.4 AUC-ROC results of Ligue 1 ...... 107
M.5 AUC-ROC results of Serie A ...... 108


1 Introduction

Soccer is an ancient sport whose history can be traced back some 2000 years to China [1]. To date, it is still recognized as the highest-ranked sport in the world regarding attendance, broadcasting, and support. More than half of the global population follows the sport, and even in countries like India, the USA, and Canada, where it is not the most dominant sport, there are millions of players and fans [2]. Following the empirical evidence of the success of machine learning techniques in other sports like baseball, the extent of recording historical soccer data changed dramatically as the need for new information and the advancement of machine learning tools increased [3]. An example of such analysis is the Moneyball study, which demonstrated how a team could improve performance despite limited resources by finding potential in player features that are usually undervalued when predicting winning [4]. Determining essential features is not always as straightforward as it may appear [5] [6] [7]. According to FIFA rules, European soccer leagues have a transfer window spanning 16 weeks, during which soccer team managers look for talented players all over the world who could strengthen their squads. The acquisition of such players can be motivated by many factors, including the excellent form of a player in a former league. Such motivation is unsatisfactory because European leagues differ; they require different levels of physicality and technicality to excel in, and what makes a good player in one league may not in another. Hence, there has been increasing disappointment and criticism of top players underperforming in their current clubs after colossal player transfers turned out to be failures. An example of such a player is Paul Pogba of Manchester United, who performed well in Serie A, which made him the world's most expensive footballer in 2016, but failed to impress when transferred to the English Premier League in the same year [8].
Therefore, we see the importance of establishing criteria for individual leagues that can guide team managers to make proper investments in buying players who give good value for the money spent. Furthermore, this thesis aims to produce prediction models that are simple and more accurate by grouping players into their specific roles, leagues, and performance levels. In general, we want to make prediction models that cover many aspects of what makes a good player. First, we find essential sets of player features, because soccer data contains many features that may be irrelevant, uncontrollable, hard to measure, or hard to derive useful information from. All activities associated with the identification of players, training statistics, and

rarely executed actions like penalties can be irrelevant for performance prediction models. Also, various features carry different weights where a player's role in a soccer match is concerned; essential features for defenders may not be essential for forwards or goalkeepers. For example, the number of saves is an essential feature for goalkeepers but not for other players. Also, the performance of machine learning algorithms depends on adequately prepared training datasets: attributes must be selected wisely and arranged according to player roles. Secondly, we take the approach of testing different machine learning algorithms, which involves processing the data, selecting features, running classifiers, and evaluating the performance of the models. Furthermore, different types of discretization are used to broaden the range at which players can be classified. Soccer is a highly profitable business which attracts firms from many sectors, such as clothing, footwear, and gambling [9]. Consequently, stakeholders demand simple prediction models to facilitate improvement in training, match tactics, and individual player performance. Thus, theoretically, it can be asserted that better prediction models may lead to better performance and profit for all soccer stakeholders. We believe that the models created here may be tested empirically and be beneficial to users. Data mining and machine learning are among the best methods available for making predictions and learning associations in soccer data. Therefore, in this master's thesis, data mining and machine learning techniques were applied to historical data on top players in European leagues, using the 2015-2016 season for training and the 2017-2018 season for actual prediction testing.
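The discretization idea mentioned above, converting a numeric target into a binary class with a movable boundary, can be sketched in a few lines of Python. The to_binary_class helper and the example ratings are hypothetical, but the top_fraction parameter mirrors the idea of moving the class boundary away from the standard 50% equal-frequency split (e.g. labelling only the top 10% or 25% of players as the positive class):

```python
def to_binary_class(ratings, top_fraction=0.5):
    """Label a rating 1 ("top player") if it falls within the top
    `top_fraction` of all ratings, else 0. top_fraction=0.5 reproduces
    standard 50% equal-frequency binning; smaller values generalize it."""
    cutoff_index = int(len(ratings) * top_fraction)
    # The rating value at the class boundary (sorted from best to worst).
    threshold = sorted(ratings, reverse=True)[max(cutoff_index - 1, 0)]
    return [1 if r >= threshold else 0 for r in ratings]

ratings = [6.1, 7.9, 6.8, 7.2, 8.4, 6.5, 7.0, 7.7]
print(to_binary_class(ratings, top_fraction=0.25))  # [0, 1, 0, 0, 1, 0, 0, 0]
```

With top_fraction=0.25 only the two highest-rated players of the eight receive the positive label; ties at the boundary may enlarge the positive class slightly.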

1.1 Purpose

In attempting to build prediction models that analyze European soccer players, this thesis generally aims at developing simple prediction models that work as classification problems. Separate prediction models are built for each player category: defenders, goalkeepers, midfielders, and forwards. Along the process of fulfilling the main objective of the study, building soccer prediction models for top players in European leagues, the following specific targets form the basis of the thesis project.

• Find the essential features of players, according to their roles, that determine the performance of top players in European leagues.

• Develop binary classification models that predict the performance of top players in European leagues on individual and consolidated league data.

• Analyze the player ranking schemes provided by soccer data providers

1.2 Research questions

The analysis of the best prediction models for top players explicitly tries to answer the following research questions:

• What are the best mechanisms for selecting essential features for predicting the performance of top soccer players?

• What are the potential features for developing classification models for top players?

• What are the useful classification models for predicting the performance of top players?

• How can binary classification models be generalized?

• How accurate can the player ranking schemes of the soccer data providers be?


1.3 Delimitations

This thesis only analyzes players' historical data from five major European leagues: the English Premier League (EPL), La Liga Premier Division, Bundesliga, Serie A, and French Ligue 1. For each category of classification models available in the machine learning tools, popular representative algorithms were selected. The categories of selected classifiers are Bayes-based, function-based, lazy-learner, rules-based, and tree-based classifiers. For resampling, only oversampling was applied, because there was not enough time to implement all algorithms with all possible combinations of parameters.


2 Related work

This chapter describes the contributions of other researchers and of studies similar to this thesis, reviewing the application of data mining and machine learning in soccer. It looks at how data mining has been used to predict the performance of players, teams, match outcomes, and injuries, and elaborates on current research in other sports analytics. Currently, the main research areas in soccer analytics concern effective training, soccer officiating, and the prediction of future young stars, match outcomes, injuries, team performances, and player ratings. Prediction of player performance is among the most researched fields in soccer analytics, since players are the center of the sport. Several studies established new and useful ways of predicting player performance using various machine learning techniques. These include Vroonen et al. [10], who proposed advanced techniques for predicting the future performance of young soccer players. Vroonen et al. used similarity metrics to create an advanced system that predicts the performance of younger players of the same age. The approach was beneficial, as it outperformed the baseline results. More research is possible to cover other aspects of this study, as it only included players of the same age rated above 0.9/1. A similar study could therefore be conducted on players of different ages, and the rating threshold could be lowered to 0.8/1, which is still reasonably high.
A common problem for many researchers is the use of ineffective historical data, because not all match events give useful information when it comes to analysis. There has been inadequate research focusing on match phases (phases that lead to a goal or a shot). Only recently have some authors begun to concentrate on match phases in performance analysis of historical soccer data. Decroos et al. [11] grouped several match events using a dynamic time warping technique and then applied an exponential-decay-based approach for calculating the performance of players in the distinct phases. Furthermore, S. Brown [12] conducted a similar study but with PageRank as an approach. Both approaches had high accuracy, but these studies overlooked the fact that the match events leading to a goal or a shot mostly favor attackers and midfielders, and as a result put defenders and goalkeepers at a disadvantage. For improved results, the authors could have separated the datasets according to different game roles, an approach emphasized in this thesis. Soccer analysis is based on the consideration of not only match events but also other attributes, such as postural and physiological characteristics. There are studies, such as the one by


Lara et al. [13], which analyzes prediction systems that use player movement and balance statistics. The authors used decision trees (CART) and logistic regression as machine learning algorithms. The observations revealed that the performance of players does not depend on balance data, and that logistic regression outperformed decision trees (CART). Given that there are many machine learning techniques, it would be interesting to replicate the study and compare the results. Furthermore, most sports analytics, including soccer analytics, predict player performance using only past records. Some researchers incorporate live match statistics to produce live predictions. Cintia et al. [14] proposed a model which considers the use of live match statistics and uses the chi-square technique. The results showed that the application of live statistics yielded a 50% accuracy, which is below the desirable performance. Some soccer analysts tend to use many complicated features when it comes to forecasting the performance of players. Consequently, the resulting prediction models become incomprehensible and deceptive. Over time, there has been an effort from the research community to find useful prediction models that are applicable in real life. Brandt and Brefeld [15] discovered that focusing on just a few features and simple machine learning techniques such as PageRank, C5.0, and SVMs (RBF kernels) could yield higher accuracy in prediction models. Similarly, G. Kumar [16] reached the same conclusion by using standard feature selection mechanisms. In addition to Brandt and Brefeld's work, G. Kumar used many learning algorithms to figure out essential player attributes. The algorithms used include Linear Regression, SMOreg, Gaussian Processes, LeastMedSq, M5P, Bagging with REP Tree, Additive Regression with Decision Stump, REP Tree, J48, Decision Tables, Multilayer Perceptron, Simple Linear Regression, Locally Weighted Learning, IBk, KStar, and RBF Network. However, some of Kumar's [16] claims contradict Cintia et al.'s [14] assertion that the application of live statistics leads to better accuracy than historical data.

3 Theory

This chapter describes the tools and machine learning techniques used in this thesis. It gives the necessary background on various concepts in the text and on the tools applied. For each tool and method, a link to previous research is given to show how other researchers used similar approaches; where a different approach is used, a motivation supports it. Note that the concepts presented in this chapter are described at a high level, assuming that interested readers will consult the references provided to acquaint themselves with the more in-depth concepts. This chapter is organized as follows. Section 3.1 presents the tools used for the project. Section 3.2 describes the technique used for rescaling the data. Section 3.3 elaborates on different mechanisms for feature selection, especially those applied in this thesis. Sections 3.4 and 3.5 illustrate class imbalance and ways of solving the problem with a resampling technique. Section 3.6 presents the scripting package used to extend discretization. Section 3.7 presents high-level knowledge about the different classification algorithms used for this thesis. Lastly, Section 3.8 describes how the prediction models are evaluated.

3.1 Software (Weka)

Weka is open-source software for small and large-scale machine learning tasks. It has been designed to perform all core machine learning functionalities, including data pre-processing, classification, clustering, feature selection, and visualization. Since its first release in 1993, the tool has been progressively accepted and widely used for data mining research and across other fields because of its simplicity and extensibility [17] [18]. Apart from soccer, Weka has also been used in other domains of research, so there is enough confidence that Weka is suitable for research similar to this [19] [20]. Weka has five modules which perform the same tasks in different ways. For this thesis, the first three modules, Explorer, Experimenter, and Knowledge Flow, were used to accomplish most of the project tasks. The Explorer was used for exploring the datasets to determine the kinds of filters and parameters that needed to be applied. The Experimenter was used to run algorithms and perform tests on each dataset after the data had been cleaned and features had been selected. The Knowledge Flow module was used for designing and executing the sub-steps of the data pre-processing procedure. Most of the data pre-processing procedures are done with Knowledge Flow models. Missing values,

normalization, feature selection, and resampling are handled there. The discretization task is done by the extension module TigerJython, which enables running Jython scripts for custom discretization and pruning of the datasets. The Workbench and Simple CLI modules were not used because the first three modules were satisfactory for this project.

3.2 Min-max normalization

Normalization is an important step in any data-preparation process. It is used to transform the data to a smaller and consistent range across all attributes within a dataset; normally the preferred range is 0 to 1. In many cases, attributes within the same dataset appear on different scales, a tendency that affects the efficiency of machine learning algorithms, especially those that use distance measures as a criterion for learning [21] [22]. When datasets contain attributes with different scales, attributes with larger scales tend to overshadow attributes with smaller scales by distorting attribute weight proportions in distance calculations. As a result, algorithms run slower and produce misleading outcomes. Normalization can be done using several techniques, including min-max, mean, and standard deviation normalization. Generally, no normalization method is better than another; depending on the type of data and the machine learning algorithm used, minor differences may be noticed. Previous studies suggest that the min-max normalization technique can lead to a model with slightly higher accuracy that is less complicated and requires less training time compared to other techniques [23]. The min-max normalization technique transforms given attribute values to a range of 0 to 1 (or -1 to 1) using the minimum and maximum attribute values within a dataset. Since many researchers have used the min-max technique and it has proven benefits, this study chose it for feature rescaling; however, other methods could have been applied and would still produce reasonable results. The formula for min-max normalization is given in equation 3.1:

Normalized value of x = (x - min(x)) / (max(x) - min(x))    (3.1)
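As a rough sketch (not the Weka implementation), equation 3.1 can be applied per attribute as follows; the attribute values are invented for illustration:

```python
def min_max_normalize(values):
    """Rescale a list of attribute values to [0, 1] using equation 3.1:
    (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant attribute: map everything to 0
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

goals = [0, 3, 12, 6]                  # hypothetical per-season goal counts
print(min_max_normalize(goals))        # [0.0, 0.25, 1.0, 0.5]
```

Note the guard for a constant attribute, where max(x) = min(x) would otherwise cause division by zero.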

3.3 Feature selection methods

Feature selection is the process of finding an optimal subset of attributes that enhances the performance of the classification model. Many machine learning problems involve attributes that are redundant or do not influence the target class [24] [25]. Such attributes are referred to as insignificant or undesired features. Removing insignificant attributes in many cases improves the performance, robustness, and simplicity of the classification models [26].
There are four reasons why feature selection is essential for machine learning processes. First, feature selection produces datasets with fewer attributes and therefore improves the simplicity and interpretability of classification models. Second, it significantly reduces learning time and hence improves efficiency [27]. Third, it reduces the amount of noise and outliers and thereby decreases the risk of overfitting. Fourth, it generates denser datasets, which in turn increases the statistical significance of the values in the datasets [25].
Standard feature selection techniques in machine learning include filter, wrapper, and embedded methods. The filter method uses the correlation between an ordinary attribute and the class attribute; the features that score above a desired correlation threshold are selected as potential attributes [28]. Filter methods are efficient but disregard the inter-correlation between ordinary attributes. The wrapper method may be used instead when interrelated features exist.

The wrapper method uses machine learning algorithms to cross-check multiple subsets of attributes and saves the set with optimum performance as the potential set of attributes [27]. Unlike filters, this technique considers not only the relationship between attributes and the class attribute but also the interrelations among attributes. When choosing between the wrapper and filter methods becomes a challenge, the embedded method can be used: it combines both methods with the aim of exploiting the benefits of each [25] [29]. All techniques have benefits and drawbacks; it cannot be generalized that one technique is better than another. In machine learning practice, all techniques are typically executed, and the method that leads to optimal performance and robustness is selected and included as part of the model.
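The filter method described above can be sketched as follows; the correlation function, attribute names, and the threshold are illustrative assumptions, not the Weka implementation:

```python
import math

# Filter-method feature selection: keep attributes whose absolute Pearson
# correlation with a numeric class attribute exceeds a threshold.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def filter_select(features, target, threshold=0.3):
    return [name for name, vals in features.items()
            if abs(pearson(vals, target)) >= threshold]

rating = [6.2, 6.8, 7.4, 8.1]                 # class attribute (hypothetical)
features = {"goals": [1, 3, 5, 8],            # strongly correlated with rating
            "shirt_number": [10, 7, 23, 4]}   # essentially noise
print(filter_select(features, rating))        # -> ['goals']
```

A wrapper method would instead train a classifier on each candidate subset and keep the subset with the best cross-validated performance.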

3.4 Class imbalance

When the values of the class attribute are not equally distributed within a dataset, a class imbalance problem can arise, which affects the interpretation of the prediction model's accuracy [30]. Commonly, this problem causes the accuracy on the majority class to dominate the accuracy on the minority class, which misleads the interpretation of the results. To illustrate, consider a dataset with 1000 cancer test results in which ten records tested cancer-positive and 990 tested cancer-negative. If the results appeared as in the confusion matrix below, then although the overall accuracy is 98.2%, the model is clearly unsuitable for this situation: the contribution of the positive class to the accuracy is extremely low (0.2%) and is dominated by the negative-class contribution (98%). The vast majority of the actual cancer-positive cases are predicted incorrectly despite the seemingly good model accuracy of 98.2%.

                        Actual positives (cancer +ve)   Actual negatives (cancer -ve)
Predicted positive      TP = 2                          FP = 10
Predicted negative      FN = 8                          TN = 980

Model accuracy = Positive class accuracy + Negative class accuracy

Model accuracy = TP / (TP + TN + FP + FN) x 100% + TN / (TP + TN + FP + FN) x 100%    (3.2)

Model accuracy = 2/1000 x 100% + 980/1000 x 100% = 0.2% + 98% = 98.2%

The class imbalance problem is most problematic for decision tree classifiers and small datasets [30] [31]. Since this thesis project used several decision tree algorithms, such as J48 and Random Forest, it was deemed necessary to resample the data. Additionally, some of the datasets generated for goalkeepers and forwards were small, so applying resampling techniques that solve the class imbalance problem was considered important. Several techniques can be used to deal with class imbalance, including under-sampling, over-sampling, and blended methods that use both. This thesis used only the over-sampling technique SMOTE, which is described in Section 3.5.

3.5 SMOTE (Synthetic Minority Oversampling Technique)

SMOTE is a popular method of handling class imbalance. It is an oversampling technique that synthetically generates new instances of the minority class, which are added to the

dataset until it matches the number of instances of the majority class. Oversampling is the process of increasing the number of instances of an underrepresented class in a given dataset, which increases the statistical significance of the minority class and in turn reduces the degree to which the models can overfit. Since the class ratios between the minority and majority classes in the datasets used for this thesis were 1 to 4 and 1 to 10, which are high, oversampling was more suitable than the other resampling techniques [31]. Contrary to traditional techniques, SMOTE does not simply add random copies: it looks at the specified K nearest neighbor instances in the dataset and adds synthetic instances until the SMOTE percentage proportion is reached. This technique has proven efficient and is popular among researchers [32]. The SMOTE algorithm takes three parameters: the number of minority class instances, the percentage of instances to be added, and the number of nearest neighbors. If the parameter values are not chosen properly, SMOTE may introduce noise and hence reduce the accuracy of the classification models [33]. The selection of good values for the SMOTE parameters is explained in Section 4.3.
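The core idea can be illustrated with a toy sketch (this is not Weka's SMOTE implementation): for each synthetic instance, pick a minority instance, choose one of its k nearest minority neighbors, and interpolate a new point between them. All data here are hypothetical:

```python
import random

# Toy SMOTE-style oversampling: interpolate synthetic minority instances
# on the line segment between a minority point and one of its k nearest
# minority-class neighbours.
def smote(minority, n_new, k=2, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` by squared Euclidean distance
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sum((a - b) ** 2
                                              for a, b in zip(p, base)))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()             # position along the segment
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, nb)))
    return synthetic

minority = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25)]   # hypothetical minority class
print(len(smote(minority, n_new=5)))                 # -> 5
```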

3.6 TigerJython with Weka

TigerJython is a simple development platform for writing Jython scripts that gives Weka the extra capability to execute Python and Java code, allowing machine learning tasks to be performed with much more control than through graphical user interfaces [34]. To use scripting languages like Jython, a TigerJython plugin was installed through the package manager. With TigerJython a user can directly invoke Weka's Java classes as well as Java and Python modules, and extend limited functionality as desired. In this thesis, a Jython script is used to extend the discretization functionality.

3.7 Machine learning algorithms

This section presents the underlying knowledge about the machine learning algorithms used for feature selection and classification. It begins by briefly describing how each algorithm works, and presents the benefits and challenges reported in previous research. Furthermore, for each algorithm a brief discussion of the expectations and tradeoffs is given.

Bayesian network

Bayesian network algorithms are classifiers that use probability distributions in directed acyclic graphs to make predictions, taking the leaf nodes as the attributes and the parent nodes as the predictor class. They are among the widely used classifiers in machine learning because they are robust and perform well on classification problems with complex and conflicting information. Moreover, Bayesian network techniques work well with datasets of any size, a property that is beneficial for this thesis as the processed datasets contain between 285 and 1857 instances. On the other hand, the main drawback of the algorithm is its inability to deal well with continuous data [35]. In many cases, applying Bayesian network classifiers requires discretization, which in turn disturbs the linear relationships in the data.

Naïve Bayes

Naïve Bayes is another classifier based on Bayes' theorem of conditional probabilities. It can be considered a simplified version of the Bayesian network classifier, except that the attributes are treated independently: knowing the value of one attribute does not imply the value of any other attribute. As its name suggests, the classifier is simple and efficient, because ignoring class-conditional relationships between attributes reduces the computational work. Like the general Bayesian network classifier, Naïve Bayes works well with both small and large datasets. Although it is a simple classifier, many studies have found Naïve Bayes to

outperform many other classifiers, even sophisticated ones [36]. On the other hand, Naïve Bayes has drawbacks: ignoring the information about conditional dependencies can decrease accuracy, and it forces the invocation of a discretization function, which negatively affects certain types of datasets [37].

Logistic Regression

Logistic Regression is a classifier that falls under the function category. It works similarly to linear regression except for the link function and the type of class attribute used. This classifier has advantages over linear regression in the following respects: it works well with categorical data, it handles binary classification directly [38], and it works relatively well on skewed datasets where a linear relationship is not well observed. The algorithm uses the population growth (logistic) function to express the relationship between the input variables and the predictor class. The equation used for Logistic Regression comprises four variables: y is the predicted class, x is the value of the independent attribute, b0 is the intercept of the logistic equation, and b1 is the coefficient of the single independent attribute.

predicted class (y) = e^(b0 + b1x) / (1 + e^(b0 + b1x))    (3.3)
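Equation 3.3 can be evaluated directly; the coefficients below are hypothetical, chosen only to show how the linear term is squashed into a probability:

```python
import math

# The logistic function of equation 3.3: map b0 + b1*x to a value in (0, 1).
def logistic(x, b0, b1):
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

# An output above 0.5 is read as the positive class.
print(logistic(0.0, b0=-1.0, b1=4.0))   # -> 0.2689... (negative class)
print(logistic(1.0, b0=-1.0, b1=4.0))   # -> 0.9526... (positive class)
```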

Lazy classifiers

Unlike other types of classifiers, lazy learners use the instances of the training dataset directly to make predictions, whereas other classifiers first create a model and then use it with the testing data to make predictions. The name lazy follows from their tendency not to produce a model at all. They are sometimes referred to as instance-based learners; examples are the KNN (IBk) and KStar classifiers. Except for KStar, which uses an entropy-based distance measure, most lazy classifiers use standard distance measures such as Euclidean distance [39]. Although the idea behind lazy learners is simple, they are efficient, especially for small datasets. However, on large or high-dimensional input datasets lazy classifiers perform poorly and require large storage capacity [40]. Another challenge is that determining the appropriate number of nearest neighbors K can be difficult, because the redundant work of guessing the K value is time-consuming. Furthermore, since lazy learners make predictions directly from the training datasets, the training data must constantly be updated as new data become available.
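A minimal instance-based learner in the spirit of IBk (not Weka's implementation; data are hypothetical) looks like this:

```python
from collections import Counter

# k-nearest-neighbour prediction: majority vote among the k training
# instances closest to the query under (squared) Euclidean distance.
def knn_predict(train, query, k=3):
    # train: list of (feature_tuple, label) pairs
    nearest = sorted(train, key=lambda t: sum((a - b) ** 2
                                              for a, b in zip(t[0], query)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0.1, 0.2), "top"), ((0.2, 0.1), "top"),
         ((0.9, 0.8), "non-top"), ((0.8, 0.9), "non-top"), ((0.7, 0.7), "non-top")]
print(knn_predict(train, (0.15, 0.15)))   # -> top
```

Note that no model is built: every prediction scans the stored training instances, which is why lazy learners scale poorly to large datasets.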

Rule-based classifiers

Rule-based classifiers generate a set of rules of the form (IF condition THEN class label) and use them to make predictions. They are efficient and straightforward learners and are widely used for many classification problems. For this thesis, three rule-based classifiers were implemented: ZeroR, PART, and Decision Tables. The ZeroR (Zero Rules) classifier is the simplest classifier; it focuses on the target class and ignores all non-target attributes. It is mainly used as a baseline to determine the minimum accepted accuracy for other classifiers [41]. PART is a rule-based classifier that combines two techniques to make predictions: it works like C4.5 but additionally prunes the C4.5 decision tree separately on each iteration and selects the best leaf as a rule [42]. Decision Tables complete the list of rule-based algorithms used here. They are among the most straightforward and efficient classifiers, using the simple logic of decision tables to classify instances of data. They resemble decision trees in the classification process; what distinguishes them is how the decisions or rules are represented.


Decision tree classifiers

Decision tree classifiers are non-parametric algorithms that build classification and regression models using decision tree concepts. The trees are built such that the leaf nodes represent the dependent class attribute and the internal nodes represent the independent input attributes. To make a prediction, a decision tree is first generated and stored as a set of rules used to determine the class value of a new instance. For this study, two standard decision tree classifiers, J48 and Random Forest, were used. J48 is the Java implementation of C4.5, proposed by Ross Quinlan as an improvement of its predecessor, the ID3 algorithm [43]. Unlike ID3, J48 works well with both numeric and categorical data and applies pruning techniques to minimize errors [44]. Random Forest is another tree-based classifier used in this thesis. It works by building multiple decision trees from randomly generated subsets of the training dataset. To classify a new instance, each decision tree predicts the class to which the instance belongs, and the class with the most votes is selected as the correct class. The benefits of this algorithm include robustness and the capability to handle large and high-dimensional datasets appropriately. However, Random Forest has disadvantages as well: it is not as good for regression problems as it is for classification, and the optimal number of features is not known intuitively, so trial and error is required.
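The aggregation step of Random Forest reduces to a majority vote; the per-tree predictions below are hypothetical stand-ins for real decision-tree outputs:

```python
from collections import Counter

# Random Forest aggregation: each tree votes for a class, the majority wins.
def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]

tree_votes = ["top", "non-top", "top", "top", "non-top"]
print(majority_vote(tree_votes))   # -> top
```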

3.8 Evaluation of the prediction models

Determining the best prediction models is always a difficult task that requires a great deal of knowledge of machine learning techniques and a broad range of performance metrics [45]. A classifier's performance can be measured using simple metrics such as the confusion matrix, or advanced metrics that admit multiple interpretations. Measures commonly used to evaluate performance include the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), as well as accuracy, precision, recall, the F1 measure, the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), and robustness in terms of the time taken for training and testing, to mention a few.

Confusion matrix

The confusion matrix is an essential means of evaluating the performance of prediction models. It is composed of four fundamental metrics that can be used as criteria when selecting the best prediction model. Commonly, machine learning algorithms target two of the metrics, TP and TN, where the goal is to achieve the highest score, while for the other two metrics, FN and FP, the goal is to achieve the lowest score [46].

• TP: The desirable element of the confusion matrix that gives the number of correct predictions for the positive/target class.

• TN: The desirable element of the confusion matrix that gives the number of correct predictions for the negative/non-target class.

• FP: The undesirable element of the confusion matrix that gives the number of incorrect predictions for the positive/target class.

• FN: The undesirable element of the confusion matrix that gives the number of incorrect predictions for the negative/non-target class.

The confusion matrix can be used as a quick way of analyzing the performance of the models. However, it cannot by itself be used to compare models, because it provides no single metric that incorporates all four values. If the confusion matrix alone is to be used,

the analyst will have to consider the values of TP, TN, FP, and FN individually; otherwise, the results may not reflect the real performance.

Table 3.1: Confusion matrix

                    Predicted Positive   Predicted Negative
Actual positive     TP                   FN
Actual negative     FP                   TN

Prediction accuracy

The second metric that can be used to measure the performance of prediction models is accuracy, given as the percentage of correct predictions (TP + TN) over all predictions (TP + TN + FP + FN). Accuracy measures overall correctness but cannot indicate the extent of FP and FN. In sensitive domains such as health, aviation, and security, high values of FP and FN are intolerable; therefore, accuracy needs to be used alongside other metrics such as F1 and AUC to make accurate evaluations [47] [45]. Accuracy is computed with the formula in equation 3.4:

Model accuracy = (TP + TN) / (TP + TN + FP + FN)    (3.4)

F1 measure

The third metric for measuring the performance of classifiers is the F1 score. This metric captures the combined effect of FP and FN, which are reflected in the precision and recall values [48]. Precision indicates the extent of false positives, while recall indicates the extent of false negatives. The range of F1 is 0 to 1, where 1 is a perfect prediction; a higher F1 value indicates that the prediction model has few FP and FN.

• Precision: Given by the ratio of TP and (TP + FP)

• Recall: Given by the ratio of TP and (TP + FN)

• F1 measure: Given by twice the ratio of the product to the sum of precision and recall [49]

F1 = 2 x (Precision x Recall) / (Precision + Recall)    (3.5)
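The metrics of equations 3.4 and 3.5 can be computed together from the four confusion-matrix counts; the counts below are hypothetical:

```python
# Accuracy, precision, recall, and F1 from confusion-matrix counts
# (equations 3.4 and 3.5).
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=40, tn=930, fp=10, fn=20)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
# -> accuracy=0.970 precision=0.800 recall=0.667 f1=0.727
```

Note how a high accuracy (0.970) can coexist with a much lower recall (0.667), which is exactly why accuracy alone is insufficient.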

Area Under the Curve (AUC)

The AUC is a single-number summary of a curve plotting either the TP rate (recall) against the FP rate, or precision against recall [50]. It is an easy way to depict, at different classification thresholds, the relationship between recall and specificity, or between precision and recall [45]. The former is known as AUC-ROC (Receiver Operating Characteristic) and the latter as AUC-PRC. The two correspond to each other, but in this thesis AUC-ROC was used over AUC-PRC as it forms a somewhat smoother curve.

• Specificity: measures the degree to which actual negatives are correctly identified. It is given by the ratio of TN to (TN + FP)

• AUC-ROC plot: Recall against (1-specificity), i.e., TP rate vs. FP rate


• AUC-PRC plot: Precision vs. Recall

The range of AUC-ROC/PRC values is 0 to 1, where 1 is a perfect prediction and 0 a completely incorrect one.
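AUC-ROC also has an equivalent rank-based formulation: it is the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. A sketch (scores and labels hypothetical, ties counted as half):

```python
# AUC-ROC via the Mann-Whitney formulation: the fraction of
# (positive, negative) pairs ranked correctly by the model's scores.
def auc_roc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.3, 0.6, 0.2]   # model scores for the positive class
labels = [1, 1, 0, 1, 0]
print(auc_roc(scores, labels))        # -> 1.0 (every positive outranks every negative)
```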

4 Research methods, techniques, and methodology

This chapter presents the research methods, techniques, and methodology, outlining step by step the procedures that led to the accomplishment of the thesis project. Many research methods are commonly used in computer science, including simulation, experimentation, observation, pre-study, literature study, and comparative study. This thesis used three approaches relevant to machine learning research: pre-study, literature study, and experimental study. Sections 4.1, 4.2, and 4.3 respectively describe the pre-study activities undertaken to start the project, give an overview of how the experimental study was conducted, and present the methodology used for this thesis.

4.1 Pre-study

As the starting point of the thesis project, a pre-study was conducted as a means of getting familiar with ongoing research on soccer analytics. The aim was to build a foundation for the thesis by analyzing different methodologies, tools, algorithms, and areas explored in the past. This was carried out as a literature review in which several sports analytics and machine learning publications were collected. Furthermore, the literature study was used to acquire the necessary knowledge for some of the concepts needed for the thesis work. The sports analytics and machine learning journals and conferences used for this thesis include:

• MIT SLOAN Sports Analytics Conference
• MLSA: Machine Learning and Data Mining for Sports Analytics workshop 2017
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• Machine Learning: ECML-95 conference
• Journal of Machine Learning Research
• Journal of Artificial Intelligence Research

Furthermore, search engines were used to collect important information about the study. Some example of search terms includes:


• team|player|injuries|young talent && prediction|forecasting && models

• feature|attribute && selection|reduction && techniques|methods

• soccer|football && machine learning|data mining|artificial intelligence|ai

• basketball|baseball|health && machine learning|data mining|artificial intelligence|ai

• training && machine learning|data mining|artificial intelligence|ai

4.2 Experimental study

Experimental research is a quantitative approach aiming to find, validate, and analyze the studied cases in relation to the different parameters applied [51] [52]. It can incorporate observations and comparative studies between several conditions. The experiments in this thesis took the form of a supervised machine learning process in which, at each phase described in Section 4.3, the performance of players in five European leagues was analyzed. Among the comparisons made in the study were those between: wrapper and filter feature selection techniques; Top10, Top25, and Top50 rankings of players; the Bundesliga, EPL, La Liga, Ligue 1, and Serie A leagues; and the Bayesian network, Naïve Bayes, Logistic Regression, IBk, PART, Decision Tables, J48, Random Forest, and ZeroR classifiers. The experiments were aided by the machine learning tool Weka, which enabled data pre-processing, training, and evaluation of the algorithms. Alongside the tool, Java and Python scripts were implemented to supplement functionality missing from Weka: the scripts extend data discretization to any desired ranking split and implement pruning of the raw datasets following the data cleaning and feature selection procedures.

4.3 Methodology

The purpose of this section is to present the processes undertaken to form the prediction models for soccer players. The section begins by elaborating on the methods for data collection and preparation, describing the reduction, resampling, and rescaling of the data. The second subsection covers the feature selection procedure, in which a list of the most important features is selected for each dataset. The third subsection explains the procedure used to generate training and testing datasets and shows its validity in the context of this study. The fourth and fifth subsections elaborate on the development, execution, and testing of the prediction models; for each classifier and filter used, the choices of parameter values are explained. The final subsection analyzes the performance of the machine learning models created: different performance metrics including accuracy, precision, recall, and AUC-ROC are presented together with further parameters regarding robustness. Figure 4.1 summarizes the overall methodology of this thesis.

Data collection

In this thesis, data were collected for five European leagues: the English Premier League (EPL), the Spanish league (La Liga), the German league (Bundesliga), the Italian league (Serie A), and the French league (Ligue 1), with players arranged according to their different roles. WhoScored was used as the main data provider, considering that most of its data are acquired from Opta. Many secondary data providers, such as Squawka, also use Opta as a primary data source, and the differences in player-performance ratings between providers are small, so they are equally reliable. WhoScored and other secondary data providers use internal schemes developed by a group of soccer experts to rate player


Figure 4.1: A procedure for analyzing soccer sport historical data

and team performances. Because of the small differences, there was confidence that the selection of data provider does not matter as long as acquiring the data is not expensive in terms of time and resources. Previous researchers have also used WhoScored as their main source of data, which indicates that the information provided is accessible and can be used for this thesis as well; one example is the study by Decroos et al. [11] on a spatio-temporal action rating system for soccer.
The data collection process led to two groups of datasets, containing 5212 instances in total. Group one separated players according to their corresponding leagues and further divided them into four raw datasets according to the position of the player on the pitch. Group two combined players from all leagues into four datasets: goalkeepers, defenders, midfielders, and forwards. Groups one and two each contained 2606 instances. Players whose primary role and position on the pitch were defending and back, respectively, were categorized as defenders; players whose primary role and position were playmaking and central were considered midfielders; players whose primary responsibility and position were attacking and front were categorized as forwards; the remaining players were categorized as goalkeepers. The smallest dataset, for goalkeepers, contained 222 instances; the largest, for midfielders, had 1109 instances. The remaining datasets, for defenders and forwards, had 970 and 305 instances, respectively.

Data preparation

Data preparation is an essential procedure in machine learning problems, as real-life raw datasets usually contain errors such as incorrect formats, inaccurate values, typographical mistakes, duplicates, and incomplete information. Such errors can negatively affect the performance of machine learning algorithms, increasing the time needed for learning and, in the worst cases, giving misleading outcomes [53] [54]. In this thesis, data preparation is divided into four phases: data formatting, relevance analysis, resampling, and feature selection.


Figure 4.2: Data preparation model

Phase one – Data formatting

Phase one includes all procedures for formatting the raw datasets to make them compatible with the Weka software and to improve the naming and ordering of attributes. The activities in this phase included the conversion of unrecognizable character types to equivalent recognizable characters; for example, the name of the player Dembélé was replaced by the equivalent characters Dembele. All datasets were converted to the standard file formats CSV and ARFF, which are supported by Weka and commonly used in machine learning. The datasets were then discretized to group the data into the top 10, 25, and 50 percent. In addition, the rating attribute was selected as the class attribute, and on each dataset the class label with values above the selected threshold was marked as the target class.
Furthermore, incorrect data were removed, missing values replaced, and duplicate information merged. Several players appeared as duplicates due to transfers from one team to another across Europe; in such cases, the record from the previous team was merged with the record from the new team. Missing values for some attributes were replaced by zero, and others were replaced by appropriate information collected from the Internet; for example, missing nationality information for some players was obtained through search engines. Before normalization, the numeric attribute player_id was transformed to nominal so that it would be excluded automatically, since by default Weka ignores nominal attributes and the class attribute when the normalization filter is applied. Normalization was done with the Min-Max method on a scale of 0 to 1.

Phase two – Relevance analysis

Relevance analysis mainly involves data reduction and feature selection. Feature selection is about finding subsets of the dataset's attributes that can give the best results at lower expense; more details are given in the following section. Data reduction, on the other hand, is an exploratory activity that involves the removal of data carrying no or unwanted information. Normally in this stage, duplicate attributes are merged, attributes with a single value are eliminated, decimal places are reduced by rounding, and the data are partitioned according to certain thresholds. Many previous studies used the number of matches or minutes played during the season as a constraint for removing irrelevant instances [11] [55] [56] [57]. In any sport, high-rated players turn out to be relatively more important, and relied upon more, than other players in ensuring the success of a team [55]. On average, top-rated players have more game time than regular players because of their consistent


Figure 4.3: Knowledge flow activities for data formatting process

fitness and excellent performances. For an individual player to be considered a good player, he must at least have decent game time per season. However, this thesis takes a different approach: no game time or other constraints are used to filter players, because the information carried by those instances may be beneficial during training. If the relationship between game time and the rating attribute is insignificant, this will be captured and the attribute removed during the feature selection process. Since the aim is to create binary prediction models for top-rated players, the term top-rated in this study refers to the top 10%, top 25%, and top 50% of players. Binary classification models need a binary class with values above or below the percentage threshold. Weka can only discretize the attributes using equal-frequency binning (i.e., 50%-frequency binning); since the aim is 10%, 25%, and 50% frequency binning, a TigerJython script is used to perform custom discretization. After running the script, three datasets were created for each of the defenders, goalkeepers, midfielders, and forwards. The first dataset had two labels, top ten percent and non-top ten percent players; the second had the labels top twenty-five percent and non-top twenty-five percent; and the third had the labels top fifty percent and non-top fifty percent.
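The custom discretization that the TigerJython script performs can be sketched as follows (this is an illustrative reimplementation, not the actual script; names and ratings are hypothetical):

```python
# Label the top `percent` of players (by rating) as the target class.
def discretize_top(ratings, percent):
    n_top = max(1, round(len(ratings) * percent / 100))
    cutoff = sorted(ratings, reverse=True)[n_top - 1]
    # Ties at the cutoff value are all labelled "top".
    return ["top" if r >= cutoff else "non-top" for r in ratings]

ratings = [6.1, 7.9, 6.5, 7.2, 6.8, 7.5, 6.3, 7.0]   # hypothetical season ratings
print(discretize_top(ratings, 25))    # two players labelled "top"
```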

Phase three – Resampling (Dealing with class imbalance)

Discretizing the datasets into smaller percentage-frequency bins such as 10% produced datasets with an unbalanced number of instances among the classes: the ratios between the positive and negative class for the top 10% and top 25% datasets were 1:10 and 1:4, respectively. When a pilot analysis was undertaken on these datasets, most of the models overfitted, with the classifiers returning 100% classification accuracy. It was therefore necessary to remove the class imbalance in order to generalize the binary classification models, for which the desired ratio between positive and negative class is 1:1. The SMOTE percentage parameter of the algorithm was therefore set using the formula below to achieve the desired ratio. The formula calculates the number of instances needed to reach a class ratio of 1:1; it takes the number of instances of the majority and minority classes as input and returns the percentage of the minority

class needed for the generation of new instances.

SMOTE percentage = (Number of instances of the majority class / Number of instances of the minority class - 1) x 100    (4.1)

Phase four – Feature selection

After completing the data preparation described above, the procedure of selecting a set of important features follows. Both filter and wrapper methods are widely used in data mining research; some studies used either the filter or the wrapper approach, while others combined both and selected the one with the best results. In this thesis, both approaches are applied. For the wrapper method, the incremental search algorithm Best-first was used, where the search starts with an empty set of attributes and advances to larger attribute sets in a search graph. The method used seven learning schemes, each cross-validated five times against a list of subsets. A support count for the selected attributes was then conducted: attributes selected by more than two algorithms were accepted, and those selected by fewer than two were discarded.
Weka tends to return all attributes when invoked to produce a list of important attributes using the filter method, but it gives the user the ability to set a correlation threshold for pruning unimportant attributes. In this study, a coefficient of 0.3 was chosen, as it indicates a moderate correlation between the class attribute and an individual ordinary attribute. Other threshold values could have been picked, but values above 0.3 would have eliminated almost all attributes, and values below 0.3 would have led to a subset of attributes almost identical to the original set, which defeats the intended purpose of reducing the number of features. The feature selection activities were carried out by the knowledge flow models shown in Figure 4.4 and Figure 4.5; each model produced a new dataset with the desired features along with files reporting further information about the results.
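Equation 4.1 as code, with hypothetical class counts matching the 1:10 and 1:4 ratios mentioned above:

```python
# SMOTE percentage needed to bring the minority class up to the size of
# the majority class (equation 4.1).
def smote_percentage(n_majority, n_minority):
    return (n_majority / n_minority - 1) * 100

print(smote_percentage(1000, 100))   # 1:10 ratio -> 900.0
print(smote_percentage(400, 100))    # 1:4 ratio  -> 300.0
```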

Figure 4.4: Feature selection with wrapper method knowledge flow model

Figure 4.5: Feature selection with filter method knowledge flow model

The prediction process

After the data processing steps were completed, the machine learning algorithms were run on the datasets produced in the previous step. For each dataset, the algorithms ZeroR, Bayes network, Naïve Bayes, Decision Table, Logistic Regression, IBk, KStar, PART, Random Forest, and J48 were executed. To determine training and testing datasets, a percentage split technique was used: the training sets were generated by randomly selecting 66% of all instances, and the remaining 34% of instances were used for testing. Furthermore, to obtain reliable results, the experiments were repeated ten times for each dataset and the average was taken as the result.
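A minimal sketch of this percentage-split protocol in plain Python rather than Weka; `evaluate` is a hypothetical callback standing in for training a classifier on the training set and scoring it on the test set.

```python
import random

def percentage_split(instances, train_frac=0.66, seed=0):
    """Randomly split instances into a 66 % training and a 34 % test set."""
    rng = random.Random(seed)
    shuffled = instances[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def averaged_score(instances, evaluate, repeats=10):
    """Repeat the split `repeats` times and average the resulting scores."""
    scores = [evaluate(*percentage_split(instances, seed=run)) for run in range(repeats)]
    return sum(scores) / len(scores)

train, test = percentage_split(list(range(100)))
print(len(train), len(test))  # 66 34
```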

Evaluation of classification models

In real-life problems, evaluating classification models can be tricky. Accuracy alone may not be enough to decide which model is better than another. Depending on the sensitivity of the application domain and the values of other evaluation metrics, a model with low accuracy may be selected over a model with high accuracy. In many cases, the desirable classification models are the ones with a significantly larger proportion of true positives (TP) and true negatives (TN) than of false positives (FP) and false negatives (FN). Along with these counts, metrics such as Precision, Recall, F-score, and AUC-ROC are used to summarize the combined influence of TP, TN, FP, and FN. The score range for precision, recall, F-score, and AUC is 0 to 1, where the higher the score, the better the result. Low precision and low recall indicate that the model produces many false positives and false negatives respectively, behavior which is not desirable for classification models. Furthermore, to evaluate the robustness of the classification models, the analysis covers the whole pipeline from the feature selection process to the running of the prediction models: CPU time is recorded for each dataset when training and testing the algorithms.
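The metrics above can be derived directly from the four confusion counts. The following sketch shows the standard formulas (illustrative code with invented counts, not the thesis tooling):

```python
def classification_metrics(tp, tn, fp, fn):
    """Derive Accuracy, Precision, Recall, and F1 from the confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# invented confusion counts for illustration
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(m)
```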


5 Data pre-processing

The purpose of this chapter is to present the entire process undergone to prepare the data. The first section 5.1 presents the data collection procedure that prepared the raw datasets for the later analysis. The second section 5.2 deals with incorrect data, missing values, and duplicates. The third section 5.3 describes the reformulation of the regression problem as a classification problem.

5.1 Data collection

Data collection was the first experiment activity, which included gathering and organizing the data into suitable sets as preparation for the further stages of the experiments. The resulting datasets covered defenders, goalkeepers, midfielders, and forwards. The defenders dataset had 970 instances categorized by the positions Left Wing Back (LWB), Left Back (LB), Central Back (CB), Right Back (RB), and Right Wing Back (RWB). The goalkeepers dataset had only one position role; thus it was the smallest dataset with 222 instances. The midfielders dataset was the largest dataset with 1033 instances; its roles categorized the players as Left Midfielders (LM), Left Attacking Midfielders (LAM), Central Attacking Midfielders (CAM), Central Midfielders (CM), Central Defensive Midfielders (CDM), Right Attacking Midfielders (RAM), and Right Midfielders (RM). The forwards dataset contained 285 instances identified by the roles Left Wing (LW), Left Forward (LF), Central Forward (CF), Striker (ST), Right Forward (RF), and Right Wing (RW). The process of collecting data yielded 38 attributes that were grouped as defensive, playmaking, attacking, and identifying. Table 5.1 shows the list of all attributes.

5.2 Data rescaling, missing values, and duplicates

After observing the minimum, maximum, and standard deviation values of each numeric attribute of all datasets, normalization was chosen over standardization because some attributes had ranges above 2000 while others had ranges below 5. Such a vast difference in attribute ranges could hugely affect the performance of the models [21]. Therefore, the min-max normalization technique was applied to all numeric attributes except the class attribute and the player id, which is numeric but serves to uniquely identify players rather than to quantify anything.
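Min-max normalization itself is a one-liner per attribute. A small illustrative sketch (not the actual preprocessing script; the `minutes` values are invented):

```python
def min_max_normalize(values):
    """Rescale a numeric attribute to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:                        # constant attribute: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

minutes = [0, 500, 1000, 2000]          # an attribute with a range above 2000
print(min_max_normalize(minutes))       # [0.0, 0.25, 0.5, 1.0]
```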


Table 5.1: List of all attributes

Types of attributes  Selected attributes
Identifying          Player_id, nationality, player name, age, pos, height, weight, team_name, league name
Defensive            Tackles, interception, fouls committed, offsides won, clearance, dribbled past, blocks, own goals
Playmaking           Assists, key pass, average passes per game, pass success, crosses, long balls, through balls
Attacking            Goals, assists, shotsPerGame, key pass, dribble, fouled, offsides committed, dispossessed, bad control
All                  Rating, mins, halftime, fulltime, aerial won, man of the match, yellow cards, red cards

The datasets produced in the previous section had missing values and duplicated records. Missing values were caused by the data provider's inconsistent representation of zero values: an empty field, a dash, or a zero could all represent the value zero. Therefore, for consistency, all empty records and dashes were replaced by the digit 0. Also, for some players nationality information was missing, and search engines were used to fill in the missing information. Duplicate values were identified and handled for all datasets. Player transfers from one team to another within the same season caused some players to appear twice. In respective order, the defenders, goalkeepers, midfielders, and forwards datasets had 20, 3, 20, and 71 duplicated players. Duplicate players were removed by replacing the repeating records with the average score of the recurring player's information. Examples of repeating players include Dante (from Wolfsburg to Bayern Munich), Willy Caballero (from Chelsea to Manchester City), (from Arsenal to Chelsea), and (from Wolfsburg to Manchester City).
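The averaging of duplicate records can be sketched as a group-then-average pass. The player ids and stat values below are invented for illustration; the real records carry the full attribute list of Table 5.1.

```python
from collections import defaultdict

def merge_transferred_players(records):
    """Collapse the duplicate rows of one player into a single averaged row."""
    grouped = defaultdict(list)
    for player_id, stats in records:
        grouped[player_id].append(stats)
    merged = {}
    for player_id, rows in grouped.items():
        # column-wise average over the player's repeated records
        merged[player_id] = [sum(col) / len(rows) for col in zip(*rows)]
    return merged

# hypothetical player 42 appears for two teams in the same season
rows = [(42, [8, 2.0]), (42, [4, 1.0]), (7, [3, 0.5])]
print(merge_transferred_players(rows))  # {42: [6.0, 1.5], 7: [3.0, 0.5]}
```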

5.3 Converting regression problem to binary classification problem

Since the goal was to build generalized binary prediction models, different splitting points were used, which in turn caused a disproportionate number of instances between the target and non-target classes. Therefore, a resampling technique called SMOTE was applied to resolve the class imbalance. The discretization and resampling procedure created the new datasets top10, top25, and top50, which had more instances than the original datasets. The top10, top25, and top50 datasets contain a class attribute discretized by 10, 25, and 50 percent binning, implying that the top 10, 25, and 50 percent of all players represent the positive class while the rest represent the negative class, respectively. Note that 50 percent binning is equivalent to balanced binary binning. The class imbalance discussed earlier was accentuated by custom discretization with binning below 50 percent; thus the SMOTE technique was needed to oversample the minority class up to the proportion of the majority class. In the process, the minority class was increased by the percentage calculated with the formula described in section 4.3. The percentages of instances added for top10, top25, and top50 were 800%, 200%, and 0% respectively. For each dataset, the target and non-target class intervals were formatted as "X to inf" and "-inf to X", where "X" is the boundary value between the classes determined by the discretization script. The instances were then randomized to mix in the appended copies of the minority class. Summarized information about the top10, top25, and top50 datasets for all categories of players is given in Table 5.2.
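The discretization step can be sketched as follows. The ratings list is invented, and a real run would need the same tie-handling at the boundary as the thesis's discretization script:

```python
def discretize_top_percent(ratings, percent):
    """Label the top `percent` of players positive ("X to inf"), the rest negative."""
    ranked = sorted(ratings, reverse=True)
    cutoff_index = max(1, round(len(ranked) * percent / 100))
    boundary = ranked[cutoff_index - 1]       # boundary value X between the classes
    labels = ["positive" if r >= boundary else "negative" for r in ratings]
    return boundary, labels

ratings = [6.1, 6.4, 6.6, 6.8, 6.9, 7.0, 7.1, 7.2, 7.3, 7.5]
boundary, labels = discretize_top_percent(ratings, percent=50)
print(boundary, labels.count("positive"))     # 7.0 5
```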


Table 5.2: All datasets of the combined leagues

Pos_dataset  (+) class interval  (-) class interval  Instances  SMOTE %
DF_10pc      7.17 to inf         -inf to 7.17        1673       800
GK_10pc      7.07 to inf         -inf to 7.07        398        800
MD_10pc      7.19 to inf         -inf to 7.19        1857       800
FW_10pc      6.97 to inf         -inf to 6.97        509        800
DF_25pc      7.00 to inf         -inf to 7.00        1393       200
GK_25pc      6.83 to inf         -inf to 6.83        334        200
MD_25pc      6.93 to inf         -inf to 6.93        1549       200
FW_25pc      6.68 to inf         -inf to 6.68        427        200
DF_50pc      6.82 to inf         -inf to 6.82        929        0
GK_50pc      6.65 to inf         -inf to 6.65        222        0
MD_50pc      6.67 to inf         -inf to 6.67        1033       0
FW_50pc      6.40 to inf         -inf to 6.40        285        0

Table 5.3: All datasets of the Bundesliga

Pos_dataset  (+) class interval  (-) class interval  Instances  SMOTE %
DF_10PC      7.13 to inf         -inf to 7.13        322        800
GK_10PC      6.95 to inf         -inf to 6.95        68         800
MD_10PC      7.20 to inf         -inf to 7.20        365        800
FW_10PC      7.12 to inf         -inf to 7.12        90         800
DF_25PC      6.95 to inf         -inf to 6.95        266        200
GK_25PC      6.85 to inf         -inf to 6.85        54         200
MD_25PC      7.02 to inf         -inf to 7.02        307        200
FW_25PC      6.88 to inf         -inf to 6.88        74         200
DF_50PC      6.80 to inf         -inf to 6.80        178        0
GK_50PC      6.63 to inf         -inf to 6.63        36         0
MD_50PC      6.71 to inf         -inf to 6.71        205        0
FW_50PC      6.44 to inf         -inf to 6.44        50         0

Table 5.4: All datasets of the EPL

Pos_dataset  (+) class interval  (-) class interval  Instances  SMOTE %
DF_10PC      7.20 to inf         -inf to 7.20        323        800
GK_10PC      7.09 to inf         -inf to 7.09        87         800
MD_10PC      7.21 to inf         -inf to 7.21        435        800
FW_10PC      7.07 to inf         -inf to 7.07        89         800
DF_25PC      7.06 to inf         -inf to 7.06        269        200
GK_25PC      6.78 to inf         -inf to 6.78        71         200
MD_25PC      6.91 to inf         -inf to 6.91        365        200
FW_25PC      6.73 to inf         -inf to 6.73        73         200
DF_50PC      6.86 to inf         -inf to 6.86        179        0
GK_50PC      6.63 to inf         -inf to 6.63        47         0
MD_50PC      6.68 to inf         -inf to 6.68        243        0
FW_50PC      6.42 to inf         -inf to 6.42        49         0


Table 5.5: All datasets of the La Liga

Pos_dataset  (+) class interval  (-) class interval  Instances  SMOTE %
DF_10PC      7.10 to inf         -inf to 7.10        355        800
GK_10PC      7.09 to inf         -inf to 7.09        76         800
MD_10PC      7.17 to inf         -inf to 7.17        429        800
FW_10PC      6.98 to inf         -inf to 6.98        93         800
DF_25PC      6.95 to inf         -inf to 6.95        293        200
GK_25PC      6.82 to inf         -inf to 6.82        66         200
MD_25PC      6.84 to inf         -inf to 6.84        355        200
FW_25PC      6.82 to inf         -inf to 6.82        66         200
DF_50PC      6.77 to inf         -inf to 6.77        195        0
GK_50PC      6.62 to inf         -inf to 6.62        44         0
MD_50PC      6.62 to inf         -inf to 6.62        237        0
FW_50PC      6.50 to inf         -inf to 6.50        53         0

Table 5.6: All datasets of the Ligue 1

Pos_dataset  (+) class interval  (-) class interval  Instances  SMOTE %
DF_10PC      7.27 to inf         -inf to 7.27        325        800
GK_10PC      7.21 to inf         -inf to 7.21        77         800
MD_10PC      7.12 to inf         -inf to 7.12        437        800
FW_10PC      6.93 to inf         -inf to 6.93        113        800
DF_25PC      7.07 to inf         -inf to 7.07        271        200
GK_25PC      6.92 to inf         -inf to 6.92        67         200
MD_25PC      6.93 to inf         -inf to 6.93        367        200
FW_25PC      6.57 to inf         -inf to 6.57        97         200
DF_50PC      6.90 to inf         -inf to 6.90        181        0
GK_50PC      6.77 to inf         -inf to 6.77        45         0
MD_50PC      6.63 to inf         -inf to 6.63        245        0
FW_50PC      6.33 to inf         -inf to 6.33        65         0

Table 5.7: All datasets of the Serie A

Pos_dataset  (+) class interval  (-) class interval  Instances  SMOTE %
DF_10PC      7.11 to inf         -inf to 7.11        356        800
GK_10PC      7.18 to inf         -inf to 7.18        90         800
MD_10PC      7.22 to inf         -inf to 7.22        183        800
FW_10PC      6.89 to inf         -inf to 6.89        124        800
DF_25PC      6.97 to inf         -inf to 6.97        294        200
GK_25PC      6.86 to inf         -inf to 6.86        74         200
MD_25PC      6.94 to inf         -inf to 6.94        155        200
FW_25PC      6.67 to inf         -inf to 6.67        102        200
DF_50PC      6.79 to inf         -inf to 6.79        196        0
GK_50PC      6.68 to inf         -inf to 6.68        50         0
MD_50PC      6.71 to inf         -inf to 6.71        103        0
FW_50PC      6.41 to inf         -inf to 6.41        68         0

6 Feature selection

This chapter explains the application of the feature selection algorithms. Section 6.1 covers the implementation of the wrapper selection method, presenting the choices of attribute subset evaluators and parameters for each league and each dataset. Section 6.2 describes the filter selection method; for each league, the attribute correlations and the ranking cutoff point are explained. The results were used to prune the datasets before running the machine learning algorithms. For the wrapper and filter selection methods together, 144 datasets were created, of which 120 were for the individual leagues and 24 for the combined leagues; these were then used as input for the training and testing processes.

6.1 Feature selection with wrapper method

To perform feature selection with the wrapper method, seven machine-learning algorithms that can handle numeric, categorical, and binary attributes were selected as evaluators of the generated subsets of data. On each subset, the classifiers Bayes network, Naïve Bayes, Logistic Regression, IBk, J48, PART, and Random Forest were run to measure how much the subset improves performance. The subsets were determined by the Best-first search algorithm, which starts with an empty set of attributes in a graph, then in a forward direction calculates the goodness measure of an attribute subset and saves it as the merit of the best subset. The process continues iteratively by adding more attributes as nodes in the search graph until no improvement is achieved [58]. To avoid terminating the search at a local maximum, a backtracking mechanism allows the algorithm to reach other parts of the graph that could contain subsets with higher merit than the local maximum. We therefore allowed five consecutive non-improving nodes before terminating the search. Furthermore, five-fold cross-validation was used to ensure the accuracy of the selected attributes. The following subsections describe the results of applying the wrapper method.
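Weka's Best-first search also backtracks through the search graph; the sketch below is a simplified greedy forward variant with the same stop-after-five-non-improving-expansions rule, using a toy additive merit function in place of cross-validated classifiers (attribute names and worths are invented):

```python
def best_first_forward(attributes, score_subset, max_stale=5):
    """Grow the attribute subset while the merit improves; stop after
    `max_stale` consecutive non-improving expansions."""
    selected, best_merit, stale = [], float("-inf"), 0
    remaining = list(attributes)
    while remaining and stale < max_stale:
        # expand the node that looks best one step ahead
        candidate = max(remaining, key=lambda a: score_subset(selected + [a]))
        merit = score_subset(selected + [candidate])
        if merit > best_merit:
            selected, best_merit, stale = selected + [candidate], merit, 0
        else:
            stale += 1
        remaining.remove(candidate)
    return selected, best_merit

# toy merit: each attribute has an independent worth; redundant ones add nothing
worth = {"motm": 4, "tackles": 3, "crosses": 2, "flag": 0}
subset, merit = best_first_forward(worth, lambda s: sum(worth[a] for a in set(s)))
print(subset, merit)  # ['motm', 'tackles', 'crosses'] 9
```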

Merit of the selected attribute subsets

The merit of a subset is the score accumulated by the wrapper method at each node of the graph as the search progresses; it represents how well the subset of attributes is expected to yield accurate models. The observations indicate that Bayes network, Naïve Bayes, and Random Forest on average had the highest merit scores of approximately 90 percent for all datasets of

all leagues, in both combined and separated form. Hence, at this stage, it can be assumed that the performance of models built from the subsets generated by these wrapper evaluators will be higher than that of other models. It can also be noticed that the top 10% datasets for all four categories of players had the highest merit, while the top 50% datasets for all categories had the lowest. This might be explained by the number of instances added when handling class imbalance: the proportion of instances added for top 10%, top 25%, and top 50% was eight, two, and zero times the size of the minority class, respectively. The instances added by the resampling technique might have introduced noise and hence affected how the classifiers evaluated the subsets of attributes.

Figure 6.1: Merit of subsets of attributes selected

Selected attributes

After clarifying the merit scores of the returned subsets, the experiment proceeded with counting the number of times each attribute was picked by the subset evaluators. Attributes that reached the minimum frequency of two were recorded in a set of important attributes, while attributes with a frequency below two were removed. Tables 6.1 - 6.4 present the attributes with the overall highest frequency for the goalkeepers, defenders, midfielders, and forwards datasets of the separated leagues, while Tables 6.5 and 6.6 show the selected attributes when the leagues are combined. The listed attributes can be regarded as the most important features for each player category.

Referring to Tables 6.5 and 6.6, several interesting observations can be made. The first is that the most common attributes for all categories of players except goalkeepers represented all types of attributes: shots on goal, assists, and crosses represented the attacking attributes; interceptions and tackles the defensive attributes; crosses and assists both attacking and playmaking; team name and player name the identifying attributes; and man of the match all of them. This shows that good players, regardless of their position on the pitch, should exhibit excellence in all aspects of the game. For example, to regard a defender as the best player, he or she should also display attacking and playmaking qualities.
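The support count over the subset evaluators reduces to a vote count with a minimum-support cutoff. A sketch with hypothetical evaluator outputs (three of the seven evaluators shown):

```python
from collections import Counter

def support_count(selections, min_support=2):
    """Keep attributes picked by at least `min_support` subset evaluators."""
    votes = Counter(attr for subset in selections for attr in set(subset))
    return sorted(a for a, n in votes.items() if n >= min_support)

# hypothetical outputs of three wrapper subset evaluators
picked = [
    ["motm", "tackles", "crosses"],
    ["motm", "tackles", "inter"],
    ["motm", "flag"],
]
print(support_count(picked))  # ['motm', 'tackles']
```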


By far, man of the match and the name of the player dominated the list of important attributes in all datasets. Man of the match was picked 19 times and the name of the player 20 times, which implies that to build a naïve generalized model, the single attribute man of the match or the name of the player could be used and may still give a decent result. Another fascinating insight is the selection of the name of the player and the name of the team among the most frequent potential attributes: on all datasets except three, the algorithms selected player name and team name as the attributes most likely to produce better prediction models. The goalkeepers strongly support this observation in Tables 6.1 and 6.5 for both the separated and combined leagues datasets. This indicates that, since goalkeepers have fewer activities compared to other players and there is statistically no vast difference between top-quality and normal goalkeepers, the identifying attributes play a prominent role in rating these players. Refer to the tables in Appendix A for the results of the individual datasets and the extended list of the most frequently selected attributes.

Table 6.1: Important features for goalkeepers as selected by wrapper method

League      Selected attributes
Bundesliga  BadControl, crosses, flag, height, motm, name, team_name
EPL         clear, drib, fouled, height, motm, name, team_name
La Liga     crosses, inter, motm, name, tackles, team_name
Ligue 1     crosses, foulCommited, name, owng, team_name
Serie A     aw, flag, halfTime, motm, name

Table 6.2: Important features for defenders as selected by wrapper method

League      Selected attributes
Bundesliga  assists, aw, crosses, fouled, height, inter, motm, name, PassSuccPerc, red, shotspg, team_name
EPL         assists, avgp, aw, clear, crosses, halfTime, inter, keyp, motm, name, offsidesWon, owng, PassSuccPerc, tackles, team_name
La Liga     aw, crosses, fouled, inter, mins, motm, name, offsidesCommitted, offsidesWon, PassSuccPerc, tackles, team_name
Ligue 1     crosses, height, inter, motm, name, offsidesCommitted, tackles
Serie A     aw, crosses, goals, inter, motm, name, red, shotspg, team_name

Table 6.3: Important features for midfielders as selected by wrapper method

League      Selected attributes
Bundesliga  avgp, fullTime, halfTime, height, keyp, motm, name, tackles, team_name, thrb
EPL         assists, blocks, crosses, flag, fullTime, goals, halfTime, motm, name, tackles, team_name, thrb
La Liga     age, assists, BadControl, fullTime, motm, name, offsidesWon, owng, tackles, team_name, thrb
Ligue 1     assists, crosses, fouled, fullTime, halfTime, inter, keyp, motm, name, offsidesCommitted, shotspg, tackles
Serie A     assists, clear, crosses, fouled, halfTime, inter, motm, name, shotspg, weight

Time taken to evaluate the best subsets

The time taken by the feature selection algorithms on the separated datasets was significantly lower than the time taken on the combined datasets. On average, it took one to six minutes to generate attributes for an individual league, while for the combined leagues it took up to more


Table 6.4: Important features for forwards as selected by wrapper method

League      Selected attributes
Bundesliga  fouled, keyp, goals, name, BadControl, clear
EPL         shotspg, goals, crosses, red, foulCommited, name, motm, fullTime, keyp
La Liga     motm, shotspg, assists, fouled, aw, name, goals
Ligue 1     shotspg, goals, name, blocks, fullTime, halfTime
Serie A     motm, goals, crosses, name, keyp, inter, fullTime

Table 6.5: Important features for the combined leagues by wrapper attribute evaluator

Dataset   Selected attributes
DF_10PC   inter, name, aw, motm, shotspg, tackles, team_name, age, avgp, blocks, red, yel
DF_25PC   name, motm, shotspg, tackles, team_name, crosses, halfTime, goals, owng, disposs, fullTime, offsidesCommitted, inter, mins
DF_50PC   motm, inter, crosses, shotspg, tackles, assists, blocks, goals, team_name, thrb, offsidesCommitted, league
GK_10PC   name, tackles, owng, halfTime, longb
GK_25PC   name, team_name, owng, thrb
GK_50PC   motm, name, aw, player_id
MD_10PC   name, team_name, fouled, owng, motm
MD_25PC   motm, name, halfTime, assists, shotspg, crosses, PassSuccPerc, league, red, goals, blocks, keyp
MD_50PC   fullTime, crosses, motm, keyp, tackles, shotspg, PassSuccPerc, assists
FW_10PC   motm, name, owng, blocks
FW_25PC   motm, player_id, name, aw, shotspg, halfTime, goals, offsidesCommitted, age
FW_50PC   motm, crosses, shotspg, assists, inter, red, offsidesWon, foulCommited, weight, BadControl

Table 6.6: Most frequently selected attributes by wrapper method in the combined leagues

                        DF            GK            MD            FW
Attribute         10 25 50  Σ   10 25 50  Σ   10 25 50  Σ   10 25 50  Σ   Total
Man of the match   3  5  7 15    0  1  7  8    2  6  4 12    5  7  6 18      53
Player name        4  5  0  9    5  4  3 12    5  4  0  9    5  4  0  9      39
Crosses            1  3  6 10    0  0  1  1    1  2  6  9    0  1  5  6      26
Shots per game     3  4  5 12    1  0  1  2    1  2  3  6    0  3  3  6      26
Interception       4  2  7 13    1  0  1  2    1  1  1  3    0  1  3  4      22
Tackles            3  4  4 11    2  1  0  3    1  1  3  5    0  1  1  2      21
Half time          1  3  0  4    2  1  1  4    1  4  1  6    1  3  1  5      19
Team name          3  4  2  9    0  3  0  3    2  1  0  3    1  1  1  3      18
Assists            1  0  4  5    1  1  1  3    0  3  2  5    0  1  3  4      17

than four hours. This behavior was especially exhibited by the Logistic Regression and Random Forest classifiers as attribute subset evaluators. The observation reveals that the poor performance of Random Forest can occur on large datasets because it creates many trees, which in turn makes the algorithm slow. Logistic Regression works well when there is a linear relationship between the dependent and independent variables; if the data is sparse and there is correlation between attributes, the maximum-likelihood process of Logistic Regression fails to converge and hence underperforms. Another subset evaluator that performs poorly on large datasets is IBk, because it is an instance-based approach that loads the entire data subset into memory while evaluating the potential attributes. Bayes network, Naïve Bayes, and J48 were observed to be stable and robust for all datasets and all splitting points applied. J48 does not exhaustively compute all possible decision trees during training; even on large datasets, it finds an optimal splitting point, which iteratively reduces the computation time. The Bayes-based subset evaluators were also fast because the building of the Bayesian belief network is optimized with specialized search algorithms such as K2 hill climbing; in addition, Naïve Bayes assumes independence between attributes. Figure 6.2 presents the time taken by the attribute subset evaluators, where D10, D25, D50, M10, M25, M50, F10, F25, F50, G10, G25, and G50 represent the top10, top25, and top50 datasets of defenders, midfielders, forwards, and goalkeepers respectively. Individual execution times for each experiment are given in Appendices A and C.

Figure 6.2: Execution time of wrapper subset evaluator

6.2 Feature selection with filter attribute evaluator

Another feature selection mechanism applied in this thesis is the filter-based method, which measures the relationship between an individual attribute and the class attribute using the correlation coefficient with cross-validation. Attributes with higher correlation values are the ones considered potential features. The algorithm takes two parameters: the minimum coefficient and the attributes to ignore. A minimum correlation of 0.3 was used to eliminate all features with weak correlation, while the second parameter was not

specified, as the aim of this experiment was to let the algorithm do the job while our task was to analyze the results. Then, for each dataset, the important attributes were determined by player position and league. The results of applying the filter attribute evaluator, the overall ranking results for all experiments conducted, and the sets of attributes selected for each dataset are presented below.
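The filter evaluator's ranking can be sketched in plain Python: compute the absolute Pearson correlation of each attribute with the class and prune everything below the 0.3 threshold. The columns and class labels below are toy data, not thesis data.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def filter_select(columns, target, threshold=0.3):
    """Rank attributes by |correlation| with the class and prune weak ones."""
    ranked = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return {name: r for name, r in ranked.items() if r >= threshold}

target = [0, 0, 1, 1, 1, 0]                    # toy binary class
columns = {
    "motm":   [0, 1, 2, 3, 3, 1],              # tracks the class closely
    "height": [180, 185, 178, 190, 183, 186],  # roughly unrelated
}
print(filter_select(columns, target))
```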

Correlation ranking of selected attributes

The results show that the correlation values of the larger datasets, such as defenders and midfielders, increase with the SMOTE percentage: large datasets required an enormous number of additional instances during oversampling, which in turn distorted the correlation between the attributes and the class attribute. On the other hand, the correlation values of smaller datasets, such as goalkeepers and forwards, were not affected by the increase of instances during oversampling; fewer additional instances were needed for the smaller datasets, so only insignificant distortions were observed. Another observation is that the selected attributes did not include any of the identifying attributes, such as team name, nationality, player's name, and league name. In fact, for the filter attribute evaluator, the scores of the identifying attributes were the lowest across all experiments, because these attributes were either completely scattered or had the same values for all players. For each subset created, the algorithm removed all features with a weak correlation below 0.3. Note that the Weka correlation ranking returns absolute values, so a value of 0.3 may indicate either a negative or a positive correlation; intuitively, however, the majority of the attributes selected by the filter method can be thought to affect the class attribute positively.

Selected attributes

Comparing the results of the wrapper and filter evaluators (Tables 6.1 - 6.4 versus Tables 6.7 - 6.10), the attributes selected by the filter method were more similar across leagues than those selected by the wrapper method. Across all leagues and player types, the filter method led to datasets with fewer unique attributes: four for defenders and goalkeepers, two for forwards, and one for midfielders. In contrast, the wrapper method yielded more varied attribute sets for each league; the numbers of unique attributes selected for the defenders, forwards, goalkeepers, and midfielders datasets were seven, nine, nine, and twelve respectively. Undoubtedly, man of the match ruled the list of important attributes in all datasets except Ligue 1 forwards, being picked by 19 out of 20 filter runs. This result mirrors the wrapper method and indicates that to build a naïve generalized model, the single attribute man of the match could be used and may still give a decent result. Moreover, another insight from Table 6.10 is that the ranking of forwards is more affected by negative attributes than that of the other categories of players: the attributes red cards, yellow cards, and bad control appeared for forwards in almost all leagues. This is also supported by the wrapper method results, where similar behavior is observed.

Table 6.7: Important features for goalkeepers as selected by filter method

League      Selected attributes
Bundesliga  aw, flag, motm, PassSuccPerc
EPL         aw, clear, fouled, height, motm, name
La Liga     fullTime, motm, name, team_name, yel
Ligue 1     flag, fullTime, mins, motm, name
Serie A     aw, flag, fullTime, mins, motm, name, yel


Table 6.8: Important features for defenders as selected by filter method

League      Selected attributes
Bundesliga  aw, crosses, fullTime, inter, mins, motm, tackles, thrb
EPL         crosses, inter, keyp, motm
La Liga     aw, crosses, fullTime, inter, mins, motm
Ligue 1     aw, crosses, inter, motm, shotspg, tackles
Serie A     aw, crosses, fullTime, goals, inter, mins, motm, tackles

Table 6.9: Important features for midfielders as selected by filter method

League      Selected attributes
Bundesliga  assists, avgp, crosses, disposs, fouled, fullTime, goals, keyp, mins, motm, shotspg
EPL         assists, BadControl, crosses, disposs, drib, fouled, fullTime, goals, halfTime, keyp, longb, mins, motm, shotspg, tackles, thrb
La Liga     assists, avgp, crosses, drib, fouled, fullTime, goals, keyp, mins, motm, shotspg, tackles, thrb
Ligue 1     assists, avgp, BadControl, crosses, disposs, fouled, fullTime, goals, keyp, longb, mins, motm, shotspg
Serie A     assists, avgp, crosses, disposs, fullTime, goals, halfTime, inter, keyp, mins, motm, shotspg, tackles, thrb

Table 6.10: Important features for forwards as selected by filter method

League      Selected attributes
Bundesliga  age, assists, avgp, aw, BadControl, clear, crosses, disposs, foulCommited, fouled, fullTime, goals, keyp, mins, motm, offsidesCommitted, shotspg, tackles, thrb, yel
EPL         assists, avgp, aw, BadControl, clear, crosses, disposs, fouled, fullTime, goals, keyp, mins, motm, shotspg, thrb, yel
La Liga     assists, avgp, aw, BadControl, clear, crosses, disposs, fouled, fullTime, goals, keyp, mins, motm, offsidesCommitted, shotspg, yel
Ligue 1     assists, avgp, aw, BadControl, clear, crosses, disposs, foulCommited, fouled, fullTime, goals, keyp, mins, offsidesCommitted, shotspg, thrb
Serie A     assists, avgp, BadControl, blocks, clear, crosses, disposs, fouled, fullTime, goals, inter, keyp, mins, motm, offsidesCommitted, shotspg, tackles, thrb


Table 6.11: Important features for the combined leagues by filter attribute evaluator

Dataset   Selected attributes
DF_10PC   inter, motm, aw, crosses, tackles, halfTime, assists, fullTime, mins
DF_25PC   motm, crosses, aw, inter, mins, fullTime, tackles, goals, halfTime, shotspg
DF_50PC   crosses, fullTime, mins, inter, motm, aw, tackles, goals, thrb
GK_10PC   mins, fullTime, clear, drib, yel
GK_50PC   motm
MD_10PC   motm, keyp, fullTime, mins, shotspg, assists, crosses, goals, fouled, disposs, halfTime, longb, thrb, BadControl, tackles
MD_25PC   fullTime, mins, motm, keyp, crosses, assists, fouled, shotspg, goals, disposs, tackles, halfTime, thrb, longb, BadControl, drib
MD_50PC   fullTime, mins, crosses, keyp, assists, motm, fouled, tackles, shotspg, inter, yel, goals, thrb, drib, disposs, BadControl, longb, foulCommited, clear
FW_10PC   crosses, shotspg, fullTime, goals, motm, mins, assists, avgp, keyp, fouled, aw, thrb, disposs, BadControl, offsidesCommitted, clear, yel, halfTime, age, tackles, foulCommited
FW_25PC   fullTime, mins, shotspg, crosses, goals, avgp, keyp, motm, assists, disposs, BadControl, fouled, offsidesCommitted, aw, thrb, tackles, clear, yel, foulCommited, age
FW_50PC   crosses, fullTime, mins, shotspg, keyp, avgp, BadControl, goals, disposs, fouled, aw, assists, offsidesCommitted, thrb, foulCommited, motm, clear, tackles, yel, age

7 Performance of prediction models

This chapter presents the results of the metrics for validating the performance of the prediction algorithms. For each league, player category, and feature selection technique, the report gives a broad comparison that enables determination of the best-fit prediction models. The three metrics selected for this thesis were Accuracy, F1, and AUC-ROC. These metrics implicitly capture other important quantities, such as TP, TN, FP, and FN, which can be used in combination to describe the performance of a model.

7.1 Accuracy results of the prediction models

The experiment results for all datasets showed that the majority of classifiers outperformed the baseline (ZeroR) accuracy. On average, the ZeroR accuracy was around 50%, which is acceptable as a reference point for the other classifiers. The models that outperformed the baseline accuracy therefore underwent further evaluation with other metrics, such as F1 score and AUC-ROC. The Random Forest model for the Bundesliga goalkeepers top50pc dataset was ignored, as its accuracy of 25% was far below the baseline. As can be seen in Appendices E and F, the model accuracy of the datasets made by the wrapper method was slightly higher than that of the datasets made by the filter method for the combined leagues, while for the individual leagues the reverse held. However, the differences were so small that applying both techniques can assure good results. Furthermore, for the individual leagues there were some exceptional cases of perfect classification, where the accuracy was 100%, FN and FP were zero, and F1 and AUC were one; this behavior was observed for the top10 forwards datasets of the Bundesliga and Serie A. The observations indicate that on average the accuracy of the Naïve Bayes algorithm was 87.6 and 87.9 for the EPL and the Bundesliga, while Random Forest, Logistic Regression, and Bayes network rated highest with scores of 86.79, 83.04, and 83.9 for La Liga, Ligue 1, and Serie A respectively. Furthermore, for some of the algorithms, the accuracy scores of all leagues correlated with the number of instances added by SMOTE, implying that the rate at which the accuracy was affected depended on the type of classifier used. The results of the combined leagues, shown in Appendix F, indicate that J48 and PART for the wrapper datasets and Logistic Regression and Naïve Bayes for the filter datasets are stable classifiers, meaning that, no matter what oversampling percentage was used, the accuracy

did not change. For the small goalkeeper and defender datasets, no uniform pattern was recognized; such small datasets could barely mask the effect of noise, so it was more difficult to establish a clear pattern than for the large datasets. The average accuracy of the classifiers over all datasets is shown in Table 7.1; the results of the individual classifiers against the individual datasets are given in Appendices E and F.
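ZeroR, the baseline referred to above, simply predicts the most frequent class in the training data, so on a roughly balanced binary split its accuracy sits near 50%. A minimal sketch in plain Python (toy labels, not the thesis datasets):

```python
from collections import Counter

def zero_r_accuracy(labels):
    """Accuracy of always predicting the most frequent class (ZeroR)."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Toy class column: a roughly balanced top/flop split, as after discretization.
labels = ["top"] * 52 + ["flop"] * 48
print(zero_r_accuracy(labels))  # 0.52 -> close to the ~50% reference point
```

Any classifier worth keeping must beat this number, which is why models near or below it (such as the 25% case above) were discarded.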

Figure 7.1: Model accuracy results

Table 7.1: Overall model accuracy

Classifier     | Acc (Filter) | Acc (Wrapper)
RandomForest   | 87.43        | 86.61
BayesNet       | 85.53        | 87.78
Logistic       | 85.93        | 86.08
DecisionTable  | 84.11        | 84.99
IBk            | 83.89        | 84.67
Kstar          | 83.76        | 81.67
NaiveBayes     | 81.72        | 83.49
J48            | 84.66        | 80.01
PART           | 84.63        | 78.44
Grand average  | 84.63        | 83.75

7.2 F1 Score results of the prediction models

Another metric used for determining the performance of the classifiers is the F1 Score. As the results indicate, the overall F1 Score performance is similar to the accuracy results. The information in Appendices G and H reveals that the F1 Score scales linearly with the number of instances. With the exception of J48 and PART, and for the same reasons as in the model accuracy results, all classifiers were affected positively by the increase in resampled instances. The added instances on the one hand help to reduce class imbalance, but in the process introduced impurities

that caused overfitting to some degree. Moreover, the results of all experiments reveal that the F1 Scores of the instance-based classifiers, IBk and KStar, were more sensitive to the increase in resampled instances than the other classifiers, because an instance-based classifier learns directly from the given dataset, so the added instances gradually affect how the algorithm learns from the data.
Furthermore, as can be seen in Figure 7.2 and Table 7.2, the F1 Score results were satisfactory for both wrapper and filter datasets. The average minimum and maximum F1 Score results were 0.82 and 0.87 for the filter models and 0.79 and 0.86 for the wrapper models. In many cases, the F1 Score results of the models built on wrapper datasets outperformed those built on filter datasets. On average, the Random Forest, Bayes network, and Logistic Regression classifiers had the highest scores, while Naïve Bayes, J48, and PART had the lowest, yet still satisfactory, scores.
Regarding the F1 Score results of the individual leagues, the experiments indicated that Naïve Bayes and Random Forest outperformed the other algorithms on this measure, which agrees with the accuracy results: the best algorithms by F1 Score also featured among the best by accuracy. In Bundesliga, EPL, and Serie A the Random Forest classifier was rated highest with average values of 0.87, 0.87, and 0.88, while in La Liga and Ligue 1 Naïve Bayes scored 0.87 and 0.82 respectively.
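The F1 Score discussed here is the harmonic mean of precision and recall. A minimal sketch computing it from confusion-matrix counts (the counts below are made up for illustration, not thesis results):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# E.g. 80 true positives, 15 false positives, 25 false negatives:
print(round(f1_score(80, 15, 25), 2))  # 0.8
```

Because the harmonic mean punishes whichever of precision or recall is lower, F1 is more informative than raw accuracy on the imbalanced splits produced by the custom discretization.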

Figure 7.2: Overall F1 Score results of the combined leagues

7.3 AUC-ROC results of the prediction models

AUC-ROC is another metric, like the F1 Score, for measuring the performance of the classifiers. While the F1 Score presents the combined effect of precision and recall, AUC-ROC summarizes the trade-off between the true positive rate and the false positive rate; AUC is used alongside the F1 Score to extend the range of the analysis. From Figure 7.3 and Table 7.3 it can be seen that Random Forest, Bayes network, and Logistic Regression produced the highest results, while the instance-based classifiers IBk and KStar, together with J48 and PART, were the lowest. For all classifiers except J48 and PART, the use of the filter or wrapper method did not make a difference regarding AUC.
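AUC-ROC can also be read as the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one (the Mann-Whitney formulation). A minimal sketch with made-up scores, not thesis data:

```python
def auc_roc(pos_scores, neg_scores):
    """AUC as P(score_pos > score_neg); ties count as 0.5 (Mann-Whitney form)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Scores a model might assign to "top" vs "flop" players (illustrative numbers):
print(auc_roc([0.9, 0.8, 0.6], [0.7, 0.4, 0.3]))  # 8/9, about 0.89
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why AUC near 0.9 in Table 7.3 indicates strong discrimination regardless of the class threshold.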


Table 7.2: F1 Score results between wrapper and filter of the combined leagues

Classifier     | F1 Score (Filter) | F1 Score (Wrapper)
RandomForest   | 0.87              | 0.86
BayesNet       | 0.85              | 0.87
Logistic       | 0.85              | 0.86
IBk            | 0.83              | 0.85
DecisionTable  | 0.83              | 0.84
NaiveBayes     | 0.82              | 0.84
J48            | 0.84              | 0.81
Kstar          | 0.83              | 0.82
PART           | 0.84              | 0.79
Grand average  | 0.84              | 0.84

Moreover, for the individual leagues, Naïve Bayes on average outperformed the other classifiers in all leagues except La Liga. The results in Appendix M show that Naïve Bayes had the highest AUC scores of 0.94, 0.93, 0.90, and 0.94 in Bundesliga, EPL, Ligue 1, and Serie A, while Bayes network, Random Forest, and Logistic Regression had the highest AUC scores of 0.92, 0.93, 0.90, and 0.93 for La Liga, EPL, Ligue 1, and Serie A. Furthermore, a unique behavior was observed when an instance-based classifier such as IBk was used, which turned out to be the opposite of the behavior on the consolidated datasets: for each individual league and type of ranking, IBk showed a better AUC score when a small dataset was taken as input to the algorithm. The results of the forwards and goalkeepers models, which had the smallest datasets, exhibited this pattern.

Figure 7.3: Overall AUC-ROC results


Table 7.3: AUC-ROC results

Classifier     | AUC (Filter) | AUC (Wrapper)
RandomForest   | 0.93         | 0.93
BayesNet       | 0.92         | 0.93
Logistic       | 0.92         | 0.92
NaiveBayes     | 0.89         | 0.92
DecisionTable  | 0.90         | 0.89
Kstar          | 0.91         | 0.88
IBk            | 0.84         | 0.85
PART           | 0.88         | 0.80
J48            | 0.86         | 0.81
Grand average  | 0.89         | 0.88


8 Discussion and conclusion

The purpose of this chapter is to discuss the analysis of the research questions posed in the introduction and to provide recommendations where necessary. The chapter is organized following the order of the research questions. Section 8.1 discusses the mechanisms for selecting important features for the development of efficient soccer prediction models. Section 8.2 discusses the potential features selected for the prediction models. Section 8.3 discusses useful soccer classification models. Section 8.4 covers the generalization of binary soccer prediction models, and Section 8.5 analyzes the ranking schemes of various data providers.

8.1 What are the best mechanisms for selecting essential features for predicting the performance of top players in European leagues?

In this thesis, two popular feature selection approaches, the wrapper and filter methods, were used in parallel to select the most important attributes. The comparison showed that both methods can result in effective models. The results showed that the filter method in most cases, including accuracy, F1 Score, and AUC, slightly outperformed the wrapper method. Furthermore, because wrapper methods select attributes by applying the same algorithms that are later used to perform classification, their results may be more biased than those of filter methods. However, even though the filter method appears better than the wrapper method, we cannot ignore the fact that the wrapper method selects identifying attributes that are commonly ignored by the filter method. Features like team name and player name can also influence the evaluation of players, so it is important to remember that the wrapper method may be needed in these kinds of situations.
The wrapper selection process is significantly slower than filter subset evaluation, especially when large or noisy datasets are run with instance-based or convergence-dependent classifiers. The time taken by the filter subset evaluator was in milliseconds, while the wrapper subset evaluator took several minutes to hours.
Therefore, if we want models that consider many dimensions of performance, applying both the wrapper and filter methods could be the right solution to complement the weaknesses found in both approaches. The filter method will contribute robustness, and the features it returns will not be biased as with the wrapper and will be simple to analyze. On the other hand, the wrapper method will return identifying attributes ignored by the filter method. Table 8.1 summarizes the differences between the wrapper and filter subset evaluators.

Table 8.1: Comparison between the wrapper and filter subset evaluators

Criteria                        | Wrapper subset evaluator                                                                                         | Filter subset evaluator
Time                            | Consumes time when large datasets are used; takes minutes up to hours to complete selection.                     | Very fast; runs in milliseconds.
Correlation between attributes  | Does consider inter-correlation between attributes.                                                              | Ignores the correlation between attributes; only considers the correlation between each individual attribute and the class attribute.
Identifying attributes          | Considers identifying attributes (such as nationality, team name, player name) as a result of evaluating the inter-correlation between attributes. | Does not favor identifying attributes, because individually they correlate little with the class attribute.
Intuition                       | Returns attributes that are intuitively easy to interpret, though not as easy as those of the filter method.     | Returns attributes that are intuitively easy to interpret.
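The trade-off summarized in Table 8.1 can be illustrated with a small, self-contained sketch. This is not Weka's CfsSubsetEval or WrapperSubsetEval: the filter score below is a crude per-feature class-mean separation, and the wrapper "merit" is leave-one-out accuracy of a nearest-centroid classifier; both are simplifying assumptions made for illustration only.

```python
# Filter vs wrapper feature selection, sketched on toy data.
def mean(xs):
    return sum(xs) / len(xs)

def filter_scores(rows, labels):
    """Filter style: score each feature independently by class-mean separation."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return scores

def nearest_centroid_loo_accuracy(rows, labels, feats):
    """Leave-one-out accuracy using only the feature subset `feats`."""
    correct = 0
    for i in range(len(rows)):
        train = [(r, y) for k, (r, y) in enumerate(zip(rows, labels)) if k != i]
        cents = {}
        for c in (0, 1):
            pts = [r for r, y in train if y == c]
            cents[c] = [mean([p[j] for p in pts]) for j in feats]
        x = [rows[i][j] for j in feats]
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, cents[c])) for c in cents}
        if min(dist, key=dist.get) == labels[i]:
            correct += 1
    return correct / len(rows)

def wrapper_forward_select(rows, labels):
    """Wrapper style: greedily add the feature that most improves the merit."""
    selected, best = [], 0.0
    while True:
        gains = [(nearest_centroid_loo_accuracy(rows, labels, selected + [j]), j)
                 for j in range(len(rows[0])) if j not in selected]
        acc, j = max(gains)
        if acc <= best:
            return selected, best
        selected, best = selected + [j], acc

# Feature 0 is informative, feature 1 is noise:
rows = [[1.0, 5.0], [1.2, 1.0], [0.9, 4.0], [3.0, 2.0], [3.2, 5.0], [2.9, 1.5]]
labels = [0, 0, 0, 1, 1, 1]
print(filter_scores(rows, labels))           # feature 0 scores highest
print(wrapper_forward_select(rows, labels))  # ([0], 1.0)
```

On this toy data both styles agree on the informative feature; the point of the sketch is that the wrapper must repeatedly retrain and evaluate the classifier (hence the minutes-to-hours running times reported above), while the filter touches each feature only once.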

8.2 What are the essential features for developing prediction models for top players in European leagues?

To answer this research question, we first need to define what it means for a set of features to be regarded as essential. An essential set of features is one that produces high performance according to given metrics, produces almost the same results when the parameters used are slightly tweaked, and is of a reasonable size for analysis. For this thesis, the sets of essential features are categorized according to league and player position: goalkeepers, defenders, midfielders, and forwards. The selected sets of important attributes can be viewed in Chapter 7.
The attributes selected for the EPL by the wrapper method agree with the fact that the Premier League is the toughest league in Europe regarding the high intensity and physical requirements needed [59]. The number and types of attributes featured for the Premier League are much greater than for the other leagues; a defender must possess many different qualities to excel in the EPL. The EPL had key passes, clearances, own goals, average passes per game, minutes played, and half time as unique attributes that were deemed unimportant in the other leagues. Because of this, many non-English elite footballers underperform when joining the league; the likes of Henrikh Mkhitaryan, Memphis Depay, and Angel Di Maria are excellent examples of such players. Another interesting observation for the EPL was that discipline is important for forwards: the number of red cards or fouls committed can affect the rating of a player. The results of the other leagues revealed similarities among themselves, which indicates that the English Premier League differs from the rest of the European leagues. This implies that players from leagues such as Bundesliga, La Liga, Serie A, and Ligue 1 could find it difficult to adapt and excel when signed by an English team.

8.3 What are the useful classification models for predicting performance of top players in European leagues?

Selecting the most effective classification model was the trickiest part of this thesis, because there are many ways in which the effectiveness of a model can be reflected. Models can be compared according to performance metrics such as accuracy, F1 Score, and AUC, the size of the datasets, and the stability of resampling effects across the types of datasets. Thus, for simplicity, the answer is separated according to the comparison criteria. According to the observations, when the oversampling technique was applied to large datasets, J48 and PART were the most suitable algorithms for prediction models that use the wrapper attribute selection method to prepare the datasets: their performance remained the same when different parameters, such as the oversampling percentage, were tweaked. Likewise, Naïve Bayes and Logistic Regression were suitable for large datasets made by the filter attribute evaluator. Furthermore, Naïve Bayes featured as the third most stable algorithm for the wrapper models. Thus, we can say Naïve Bayes is a good choice in both situations.
If the aim is not stability but performance, Naïve Bayes, Random Forest, and Bayes network were the most suitable classifiers for all kinds of datasets. These algorithms had the highest accuracy, F1 Score, and AUC values across all datasets, and their stability is fair compared to the other classifiers. Therefore, choosing these models can be a good trade-off between stability and performance. The small datasets for all player categories produced overlapping results: in some cases Naïve Bayes, Random Forest, and Bayes network outperformed the other classifiers, and in other cases Decision Table and Logistic Regression did. Therefore, for smaller datasets, it can be generalized that the best classifiers were Naïve Bayes, Random Forest, Bayes network, Logistic Regression, and Decision Table.
However, the classifier that fit best across almost all datasets, leagues, custom discretization splits, and types of players was Naïve Bayes. Naïve Bayes is known as a simple classifier but surprisingly often outperforms sophisticated ones. Though there can be strong correlations between attributes, which can affect the performance of the classifier, the effects of such correlations may cancel each other out and hence have no effect on the learning process [60].
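The independence assumption referred to in [60] can be made concrete with a small sketch: Naïve Bayes scores each class as P(c) multiplied by the per-attribute likelihoods P(x_j | c), ignoring inter-attribute correlations. The toy attributes and the Laplace smoothing below are illustrative assumptions, not the thesis setup (the +2 in the denominator assumes two possible values per feature).

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate P(class) and P(feature value | class) from categorical data."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # (feature index, class) -> value counts
    for r, y in zip(rows, labels):
        for j, v in enumerate(r):
            cond[(j, y)][v] += 1
    return priors, cond

def predict_nb(priors, cond, x):
    """argmax_c P(c) * prod_j P(x_j | c) -- the naive independence assumption."""
    total = sum(priors.values())
    best, best_score = None, -1.0
    for c, pc in priors.items():
        score = pc / total
        for j, v in enumerate(x):
            counts = cond[(j, c)]
            # Laplace smoothing; +2 assumes two possible values per feature.
            score *= (counts[v] + 1) / (sum(counts.values()) + 2)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy player rows: (many minutes?, scores goals?) -> "top" / "flop"
rows = [("high", "yes"), ("high", "yes"), ("high", "no"), ("low", "no"), ("low", "no")]
labels = ["top", "top", "top", "flop", "flop"]
priors, cond = train_nb(rows, labels)
print(predict_nb(priors, cond, ("high", "yes")))  # "top"
```

Because each attribute contributes an independent factor, training and prediction are a single pass over counts, which is part of why Naïve Bayes was both the fastest and one of the most stable classifiers in these experiments.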

8.4 How can binary prediction models be generalized?

Traditionally, many machine learning tools like Weka enable conversion of regression models to binary models through discretization of the class attribute with equal-frequency binary binning, which is a limitation when we want to build generalized models: instead of splitting the data equally, we may want to set another splitting point. To resolve this limitation, two steps were taken. The first was custom discretization with a Jython script; this was followed by compensation of the minority class using SMOTE as the oversampling technique. The appropriate number of instances to be added was determined by a formula developed after observing what it takes to bring the minority class to a ratio of 1:1. It can therefore be recommended that, when using SMOTE, a formula such as the one described in Section 4.3 be incorporated. To conclude, the generalization of binary classification problems should be handled with care, because lowering the target-class upper boundary rapidly increases the number of compensating instances, which may introduce noise. Furthermore, it must be noted that converting a regression problem to binary classification can increase the tedious work of supervising the learning process, especially when the analysis involves comparing many parameters. Thus, a data analyst should make a trade-off when deciding to use a similar approach.

8.5 How accurate can the ranking schemes of the popular soccer data providers be?

The actual prediction, conducted on a sample population of data, in this case the historical data of Manchester City players for the 2017-2018 season, showed that the ranking algorithms of sports data providers like WhoScored were 80% correct. The predictions only classified two players

incorrectly, indicating that Claudio Bravo and Raheem Sterling are overrated players. In the case of Claudio Bravo, our models predicted that the player would drop in performance, while the WhoScored scheme suggested the opposite. Comparing Claudio Bravo's and Ederson's game time, it is apparent that Claudio Bravo should not be rated above Ederson: Ederson was the outstanding goalkeeper for Manchester City throughout the season. On the other hand, the prediction for Raheem Sterling may have been one of those cases where a perfect prediction is not possible. Table 8.2 shows the actual performance predictions for the Manchester City team for the 2017-2018 season.

Table 8.2: Sample of actual predictions

Player name          | Actual rating | Actual class | Pred. class | Corr. pred | Pos | EPL ranking | Team
Claudio Bravo        | 6.83          | 6.63-inf     | -inf-6.63   | No         | GK  | Top50pc     | Man City
Ederson              | 6.68          | 6.63-inf     | 6.63-inf    | Yes        | GK  | Top50pc     | Man City
—                    | 7.16          | -inf-7.2     | -inf-7.2    | Yes        | DF  | Top10pc     | Man City
Nicolas Otamendi     | 7.13          | -inf-7.2     | -inf-7.2    | Yes        | DF  | Top10pc     | Man City
Oleksandr Zinchenko  | 7.11          | -inf-7.2     | -inf-7.2    | Yes        | DF  | Top10pc     | Man City
—                    | 7.09          | -inf-7.2     | -inf-7.2    | Yes        | DF  | Top10pc     | Man City
—                    | 7.05          | -inf-7.2     | -inf-7.2    | Yes        | DF  | Top10pc     | Man City
—                    | 7.04          | -inf-7.2     | -inf-7.2    | Yes        | DF  | Top10pc     | Man City
Danilo               | 6.91          | -inf-7.2     | -inf-7.2    | Yes        | DF  | Top10pc     | Man City
—                    | 6.89          | -inf-7.2     | -inf-7.2    | Yes        | DF  | Top10pc     | Man City
—                    | 6.87          | -inf-7.2     | -inf-7.2    | Yes        | DF  | Top10pc     | Man City
Eliaquim Mangala     | 6.57          | -inf-7.2     | -inf-7.2    | Yes        | DF  | Top10pc     | Man City
Kevin De Bruyne      | 7.8           | 6.91-inf     | 6.91-inf    | Yes        | MD  | Top25pc     | Man City
David Silva          | 7.58          | 6.91-inf     | 6.91-inf    | Yes        | MD  | Top25pc     | Man City
Leroy Sané           | 7.54          | 6.91-inf     | 6.91-inf    | Yes        | MD  | Top25pc     | Man City
Fernandinho          | 7.44          | 6.91-inf     | 6.91-inf    | Yes        | MD  | Top25pc     | Man City
—                    | 6.79          | -inf-6.91    | -inf-6.91   | Yes        | MD  | Top25pc     | Man City
Ilkay Gündogan       | 6.73          | -inf-6.91    | -inf-6.91   | Yes        | MD  | Top25pc     | Man City
Yaya Touré           | 6.5           | -inf-6.91    | -inf-6.91   | Yes        | MD  | Top25pc     | Man City
Brahim Diaz          | 6.08          | -inf-6.91    | -inf-6.91   | Yes        | MD  | Top25pc     | Man City
Sergio Aguero        | 7.81          | 7.07-inf     | 7.07-inf    | Yes        | FW  | Top10pc     | Man City
Raheem Sterling      | 7.55          | 7.07-inf     | -inf-7.07   | No         | FW  | Top10pc     | Man City
—                    | 7.05          | -inf-7.07    | -inf-7.07   | Yes        | FW  | Top10pc     | Man City
—                    | 6.3           | -inf-7.07    | -inf-7.07   | Yes        | FW  | Top10pc     | Man City
Lukas Nmecha         | 5.98          | -inf-7.07    | -inf-7.07   | Yes        | FW  | Top10pc     | Man City

9 Future research

Soccer analytics still has enormous areas to be researched. In this thesis, only an oversampling technique was used as the method of combating class imbalance. Future research could use undersampling, or both, to make a comparison and report the more suitable mechanism for soccer predictions. Furthermore, there is limited research regarding team chemistry (how team players relate to one another). A search of several sports analytics venues and other journals found no studies about team chemistry, despite the idea being popular and used in the soccer gaming community as one of the criteria for selecting the best team squads [61]. The same idea can be applied in real life, and future research could specifically try to find efficient ways to measure player chemistry and how it can be incorporated to improve team, match, and player performance.


Bibliography

[1] FIFA. History of football - The origins. 2016. URL: http://www.fifa.com/about-/who-we-are/the-game/index.html.
[2] Alex Johnson. Soccer by the Numbers: A Look at the Game in the U.S. URL: https://www.nbcnews.com/storyline/fifa-corruption-scandal/soccer-numbers-look-game-u-s-n365601.
[3] EPL (English Premier League). Data Capture, Player and Club Statistics | Premier League. URL: https://www.premierleague.com/stats/clarification.
[4] Michael Lewis. Moneyball: The Art of Winning an Unfair Game. 2003, p. 320.
[5] Jianqing Fan and Runze Li. "Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery". In: Proceedings of the International Congress of Mathematicians. 2006, pp. 595-622.
[6] K Kira and LA Rendell. "The feature selection problem: Traditional methods and a new algorithm". In: AAAI (1992), pp. 129-134.
[7] K Kira and LA Rendell. "A practical approach to feature selection". In: Proceedings of the Ninth International Conference on Machine Learning (1994), pp. 249-256.
[8] Man Utd: What is the problem with Paul Pogba - and what is the solution? - BBC Sport. URL: https://www.bbc.co.uk/sport/football/43339067.
[9] Deloitte - Sports Business Group. Ahead of the curve - Annual Review of Football Finance. Tech. rep. July 2017, pp. 1-36.
[10] Ruben Vroonen, Tom Decroos, Jan Van Haaren, and Jesse Davis. "Predicting the Potential of Professional Soccer Players". In: Proceedings of the 4th Workshop on Machine Learning and Data Mining for Sports Analytics. Vol. 1971. 2017, pp. 1-10.
[11] Tom Decroos, Jan Van Haaren, Vladimir Dzyuba, and Jesse Davis. "STARSS: A spatio-temporal action rating system for soccer". In: Proceedings of the 4th Workshop on Machine Learning and Data Mining for Sports Analytics. Vol. 1971. 2017, pp. 11-20.
[12] Shael Brown. "A PageRank Model for Player Performance Assessment in Basketball, Soccer and Hockey". In: MIT Sloan Sports Analytics Conference (2017), pp. 1-22.


[13] Juan A Lara, David Lizcano, David De La Peña, and José M Barreiro. "Data mining in stabilometry: Application to patient balance study for sports talent mapping". In: Proceedings of the 4th Workshop on Machine Learning and Data Mining for Sports Analytics. Vol. 1842. 2016, paper 14.
[14] Paolo Cintia, Salvatore Rinzivillo, and Luca Pappalardo. "A network-based approach to evaluate the performance of football teams". (September 2015), pp. 1-9.
[15] Markus Brandt and Ulf Brefeld. "Graph-based approaches for analyzing team interaction on the example of soccer". In: Proceedings of the 2nd Workshop on Machine Learning and Data Mining for Sports Analytics. Vol. 1970. 2015, pp. 10-17.
[16] G Kumar. "Machine Learning for Soccer Analytics". MSc thesis, KU Leuven, September 2013, pp. 1-2.
[17] G. Holmes, A. Donkin, and I.H. Witten. "WEKA: a machine learning workbench". In: Proceedings of ANZIIS '94 - Australian New Zealand Intelligent Information Systems Conference (1994), pp. 357-361.
[18] Mike Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H Witten. "The WEKA Data Mining Software: An Update". In: SIGKDD Explorations 11.1 (2009), pp. 10-18.
[19] Joshua Weissbock, Herna Viktor, and Diana Inkpen. "Use of performance metrics to forecast success in the national hockey league". In: Proceedings of the 1st Workshop on Machine Learning and Data Mining for Sports Analytics. Vol. 1969. 2013, pp. 39-48.
[20] Hamidah Jantan, Abdul Razak Hamdan, and Zulaiha Ali Othman. "Potential data mining classification techniques for academic talent forecasting". In: ISDA 2009 - 9th International Conference on Intelligent Systems Design and Applications (2009), pp. 1173-1178.
[21] Luai Al Shalabi, Zyad Shaaban, and Basel Kasasbeh. "Data Mining: A Preprocessing Engine". In: Journal of Computer Science 2.9 (2006), pp. 735-739.
[22] Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques. 2012, p. 745.
[23] L. Al Shalabi and Z. Shaaban. "Normalization as a preprocessing engine for data mining and the approach of preference matrix". In: IEEE Computer Society, Proceedings of the International Conference on Dependability of Computer Systems (2006).
[24] M. Dash and Huan Liu. "Feature Selection for Classification". In: Intelligent Data Analysis 1.1-4 (1997), pp. 131-156.
[25] Girish Chandrashekar and Ferat Sahin. "A survey on feature selection methods". In: Computers and Electrical Engineering 40.1 (2014), pp. 16-28.
[26] Isabelle Guyon and André Elisseeff. "An Introduction to Variable and Feature Selection". In: Journal of Machine Learning Research 3 (2003), pp. 1157-1182.
[27] Naoual El Aboudi and Laila Benhlima. "Review on wrapper feature selection approaches". In: Proceedings - 2016 International Conference on Engineering and MIS, ICEMIS 2016 (2016).
[28] M. A. Hall and L. A. Smith. "Feature subset selection: A correlation based filter approach". In: Progress in Connectionist-Based Information Systems, Vols 1 and 2 (1998), pp. 855-858.
[29] Avrim L. Blum and Pat Langley. "Selection of relevant features and examples in machine learning". In: Artificial Intelligence 97.1-2 (1997), pp. 245-271.
[30] Nathalie Japkowicz and Shaju Stephen. "The class imbalance problem: A systematic study". In: Intelligent Data Analysis 6 (2002), pp. 429-449.


[31] Ricardo Barandela, Rosa M Valdovinos, J Salvador Sánchez, and Francesc J Ferri. "The Imbalanced Training Sample Problem: Under or over Sampling?" In: Structural, Syntactic, and Statistical Pattern Recognition (2004), pp. 806-814.
[32] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. "SMOTE: Synthetic Minority Over-sampling Technique". In: Journal of Artificial Intelligence Research 16 (2002), pp. 321-357.
[33] Ying Mi. "Imbalanced Classification Based on Active Learning SMOTE". In: Research Journal of Applied Sciences, Engineering and Technology 5.3 (2013), pp. 944-949.
[34] Jim Hugunin, Barry Warsaw, Samuele Pedroni, Brian Zimmer, Frank Wierzbicki, and Ted Leung. The Jython Project. 1997. URL: http://www.jython.org/.
[35] Laura Uusitalo. "Advantages and challenges of Bayesian networks in environmental modelling". In: Ecological Modelling 203.3-4 (2007), pp. 312-318.
[36] David D. Lewis. "Naive (Bayes) at forty: The independence assumption in information retrieval". In: Proceedings of the 10th European Conference on Machine Learning. 1998, pp. 4-15.
[37] Irina Rish. "An empirical study of the naive Bayes classifier". In: IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence (2001), pp. 41-46.
[38] S. Le Cessie and J. C. Van Houwelingen. "Ridge Estimators in Logistic Regression". In: Journal of the Royal Statistical Society, Series C (Applied Statistics) 41.1 (1992), pp. 191-201.
[39] John G Cleary and Leonard E Trigg. "K*: An Instance-based Learner Using an Entropic Distance Measure". In: Machine Learning: International Workshop then Conference 5 (1995), pp. 1-14.
[40] David W Aha, Dennis Kibler, and Marc K Albert. "Instance-Based Learning Algorithms". In: Machine Learning 6 (1991), pp. 37-66.
[41] Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. "Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?" In: Journal of Machine Learning Research 15 (2014), pp. 3133-3181.
[42] Eibe Frank and Ian H Witten. "Generating accurate rule sets without global optimization". In: Proceedings of the Fifteenth International Conference on Machine Learning (1998), pp. 144-151.
[43] J Ross Quinlan. C4.5: Programs for Machine Learning. 1993, p. 302.
[44] Himani Sharma and Sunil Kumar. "A Survey on Decision Tree Algorithms of Classification in Data Mining". In: International Journal of Science and Research 5.4 (2016), pp. 2094-2097.
[45] Naeem Seliya, Taghi M. Khoshgoftaar, and Jason Van Hulse. "A study on the relationships of classifier performance metrics". In: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI (2009), pp. 59-66.
[46] Marina Sokolova and Guy Lapalme. "A systematic analysis of performance measures for classification tasks". In: Information Processing and Management 45 (2009), pp. 427-437.
[47] Foster Provost, Tom Fawcett, and Ron Kohavi. "The Case Against Accuracy Estimation for Comparing Induction Algorithms". In: Proceedings of the Fifteenth International Conference on Machine Learning (1998), pp. 445-453.


[48] Marina Sokolova, Nathalie Japkowicz, and Stan Szpakowicz. "Beyond Accuracy, F-score, and ROC: A Family of Discriminant Measures for Performance Evaluation". In: Advances in Artificial Intelligence, AI 2006, Lecture Notes in Computer Science 4304 (2006), pp. 1015-1021.
[49] Yutaka Sasaki. "The truth of the F-measure". In: Teach Tutor Mater (2007), pp. 1-5.
[50] Andrew P Bradley. "The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms". In: Pattern Recognition 30.7 (1997), pp. 1145-1159.
[51] Gordana Dodig-Crnkovic. "Scientific methods in computer science". In: Proceedings of the Conference for the Promotion of Research in IT at New Universities and at University Colleges in Sweden, Skövde, Sweden (2002), pp. 126-130.
[52] Andreas Höfer and Walter F Tichy. "Status of Empirical Research in Software Engineering". In: Empirical Software Engineering Issues, LNCS 4336 (2007), pp. 10-19.
[53] P. M. Goncalves, R. S. M. Barros, and D. C. L. Vieira. "On the Use of Data Mining Tools for Data Preparation in Classification Problems". In: 2012 IEEE/ACIS 11th International Conference on Computer and Information Science (2012), pp. 173-178.
[54] Balaji Rajagopalan and Mark W. Isken. "Exploiting data preparation to enhance mining and knowledge discovery". In: IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews 31.4 (2001), pp. 460-467.
[55] Javier López Peña and Raúl Sánchez Navarro. "Who can replace Xavi? A passing motif analysis of football players". (2015).
[56] Joris Bekkers. "Flow Motifs in Soccer: What can passing behavior tell us?" In: MIT Sloan Sports Analytics Conference (2017), pp. 1-31.
[57] Laszlo Gyarmati and Mohamed Hefeeda. "Estimating the maximal speed of soccer players on scale". In: Proceedings of the 2nd Workshop on Machine Learning and Data Mining for Sports Analytics. Vol. 1970. 2015, pp. 96-103.
[58] Mark Hall and Martin Guetlein. Weka source code: BestFirst search algorithm. URL: https://github.com/johdah/Weka/blob/master/src/main/java/weka/attributeSelection/BestFirst.java.
[59] Richard Elliott and Gavin Weedon. "Foreign players in the English Premier Academy League: 'feet-drain' or 'feet-exchange'?" In: International Review for the Sociology of Sport 46.1 (2011), pp. 61-75.
[60] Harry Zhang. "The Optimality of Naïve Bayes". In: FLAIRS 2004 Conference. 2004.
[61] FIFA Chemistry for FIFA Ultimate Team. URL: https://www.fifauteam.com/category/fifa/fifa-chemistry/.

A Wrapper method results of the combined leagues

Table A.1: Attribute selection with wrapper method for defenders top10pc dataset

Classifier    | Selected attributes                                                                                                                    | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes net     | name, shotspg, aw, tackles, inter, blocks, offsidesCommitted, crosses, team_name                                                       | 00:01:00            | 0.965 | 494
Naïve Bayes   | flag, name, age, red, shotspg, motm, aw, tackles, inter, offsidesWon, keyp, disposs, avgp, longb, thrb, team_name                      | 00:00:58            | 0.954 | 641
Logistic      | name                                                                                                                                   | 00:42:24            | 0.964 | 190
IBk           | player_id, name, inter                                                                                                                 | 00:03:25            | 0.973 | 266
PART          | yel, shotspg                                                                                                                           | 00:02:17            | 0.898 | 245
J48           | player_id, motm, tackles, team_name                                                                                                    | 00:00:55            | 0.922 | 305
Random Forest | player_id, age, pos, height, weight, fullTime, halfTime, mins, assists, yel, red, motm, aw, inter, clear, drib, blocks, fouled, avgp   | 00:20:47            | 0.954 | 739


Table A.2: Attribute selection with wrapper method for defenders top25pc dataset

Classifier     | Selected attributes                                                                                                                     | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes network  | name, halfTime, mins, shotspg, motm, aw, tackles, drib, blocks, keyp, team_name, rating                                                 | 00:01:06            | 0.918 | 453
Naïve Bayes    | name, goals, motm, tackles, inter, owng, offsidesCommitted, crosses, team_name                                                          | 00:00:24            | 0.899 | 435
Logistic       | name                                                                                                                                    | 04:00:35            | 0.888 | 190
IBk            | player_id, name, motm, tackles                                                                                                          | 00:03:46            | 0.921 | 304
PART           | weight, fullTime, halfTime, goals, shotspg, motm, offsidesWon, offsidesCommitted, disposs, BadControl, crosses, team_name               | 00:17:09            | 0.864 | 529
J48            | yel, shotspg, owng                                                                                                                      | 00:01:23            | 0.922 | 305
Random Forest  | flag, name, pos, fullTime, halfTime, mins, shotspg, motm, tackles, inter, foulCommited, disposs, crosses, thrb, team_name, league       | 00:20:49            | 0.885 | 703

Table A.3: Attribute selection with wrapper method for defenders top50pc dataset

Classifier     | Selected attributes                                                                                                                     | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes network  | assists, shotspg, motm, inter, BadControl, crosses, team_name, league                                                                   | 00:00:20            | 0.825 | 408
Naïve Bayes    | age, fullTime, shotspg, motm, tackles, inter, clear, blocks, offsidesCommitted, crosses, team_name                                      | 00:00:22            | 0.839 | 492
Logistic       | height, weight, goals, assists, motm, tackles, inter, blocks, owng, crosses, league                                                     | 00:05:39            | 0.857 | 453
IBk            | player_id, assists, motm, inter                                                                                                         | 00:02:10            | 0.749 | 306
PART           | shotspg, motm, inter, crosses, thrb                                                                                                     | 00:01:35            | 0.802 | 333
J48            | goals, shotspg, motm, tackles, inter, drib, crosses, thrb                                                                               | 00:02:18            | 0.797 | 617
Random Forest  | pos, mins, assists, shotspg, PassSuccPerc, motm, tackles, inter, foulCommited, blocks, keyp, offsidesCommitted, crosses, longb          | 00:08:39            | 0.834 | 541

Table A.4: Attribute selection with wrapper method for goalkeepers top10pc dataset

Classifier | Selected attributes | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes net | name | 00:00:06 | 0.954 | 212
Naïve Bayes | name, longb | 00:00:04 | 0.955 | 243
Logistic | name, red, avgp | 00:04:09 | 0.955 | 378
IBK | player_id, name, halfTime, keyp | 00:00:34 | 0.972 | 337
PART | player_id, tackles, owng | 00:00:24 | 0.931 | 274
J48 | name | 00:00:05 | 0.954 | 213
Random Forest | player_id, age, pos, halfTime, assists, yel, shotspg, tackles, inter, offsidesWon, owng, fouled, offsidesCommitted, longb | 00:04:09 | 0.945 | 623

Table A.5: Attribute selection with wrapper method for goalkeepers top25pc dataset

Classifier | Selected attributes | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes network | name, team_name | 00:00:02 | 0.835 | 244
Naïve Bayes | flag, name, height, fouled, thrb, team_name | 00:00:03 | 0.845 | 355
Logistic | name | 00:01:52 | 0.877 | 210
IBK | player_id, name, halfTime, clear, owng | 00:00:14 | 0.877 | 331
PART | player_id, assists, owng, thrb | 00:00:11 | 0.764 | 303
J48 | player_id, fullTime, motm | 00:00:06 | 0.798 | 274
Random Forest | goals, tackles, longb, team_name | 00:00:55 | 0.804 | 304

Table A.6: Attribute selection with wrapper method for goalkeepers top50pc dataset

Classifier | Selected attributes | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes network | name, motm, crosses | 00:00:02 | 0.691 | 274
Naïve Bayes | player_id, name, motm | 00:00:01 | 0.694 | 274
Logistic | height, assists, yel, shotspg, motm, aw, inter, foulCommited | 00:00:44 | 0.721 | 409
IBK | player_id, motm | 00:00:05 | 0.689 | 245
PART | motm | 00:00:01 | 0.685 | 213
J48 | motm | 00:00:01 | 0.685 | 213
Random Forest | name, pos, halfTime, motm, aw, keyp | 00:00:55 | 0.705 | 453


Table A.7: Attribute selection with wrapper method for midfielders top10pc dataset

Classifier | Selected attributes | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes net | name | 00:00:15 | 0.967 | 211
Naïve Bayes | name | 00:00:06 | 0.967 | 209
Logistic | name | 00:08:24 | 0.967 | 190
IBK | player_id, name, owng | 00:04:15 | 0.978 | 266
PART | assists, motm, offsidesWon, fouled, longb, team_name | 00:12:17 | 0.948 | 419
J48 | name | 00:00:17 | 0.967 | 212
Random Forest | halfTime, mins, red, shotspg, PassSuccPerc, motm, tackles, inter, clear, drib, owng, keyp, fouled, crosses, team_name | 00:09:47 | 0.964 | 557

Table A.8: Attribute selection with wrapper method for midfielders top25pc dataset

Classifier | Selected attributes | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes network | name, halfTime, assists, motm, drib, blocks, thrb, team_name | 00:00:40 | 0.934 | 408
Naïve Bayes | name, owng | 00:00:05 | 0.890 | 228
Logistic | name, motm | 00:10:49 | 0.917 | 228
IBK | player_id, name, shotspg, motm | 00:02:51 | 0.935 | 305
PART | halfTime, goals, PassSuccPerc, motm, tackles, foulCommited, blocks, keyp, crosses, league | 00:06:59 | 0.898 | 699
J48 | age, halfTime, assists, yel, red, motm | 00:01:31 | 0.890 | 488
Random Forest | fullTime, halfTime, goals, assists, red, shotspg, PassSuccPerc, motm, aw, inter, keyp, BadControl, crosses, league | 00:14:25 | 0.921 | 651

Table A.9: Attribute selection with wrapper method for midfielders top50pc dataset

Classifier | Selected attributes | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes network | fullTime, motm, crosses | 00:00:14 | 0.850 | 274
Naïve Bayes | flag, fullTime, shotspg, motm, tackles, inter, keyp, crosses | 00:00:15 | 0.885 | 409
Logistic | height, fullTime, assists, shotspg, PassSuccPerc, motm, tackles, crosses, league | 00:04:41 | 0.900 | 455
IBK | fullTime, motm | 00:01:05 | 0.822 | 243
PART | fullTime, PassSuccPerc, keyp, crosses | 00:00:55 | 0.845 | 270
J48 | fullTime, foulCommited, blocks, crosses | 00:00:30 | 0.837 | 302
Random Forest | player_id, age, fullTime, halfTime, goals, assists, shotspg, tackles, offsidesWon, clear, owng, keyp, BadControl, crosses, longb, thrb | 00:08:54 | 0.886 | 621

Table A.10: Attribute selection with wrapper method for forwards top10pc dataset

Classifier | Selected attributes | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes net | name, halfTime, motm, blocks, owng, team_name | 00:00:14 | 0.984 | 355
Naïve Bayes | name, offsidesWon, owng | 00:00:04 | 0.965 | 305
Logistic | name, motm | 00:01:07 | 0.976 | 242
IBK | player_id, name, motm, blocks | 00:00:27 | 0.981 | 270
PART | player_id, age, motm, owng, avgp | 00:00:16 | 0.969 | 303
J48 | name | 00:00:03 | 0.963 | 213
Random Forest | motm, blocks, owng, thrb | 00:01:24 | 0.969 | 305

Table A.11: Attribute selection with wrapper method for forwards top25pc dataset

Classifier | Selected attributes | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes network | flag, name, shotspg, motm, keyp, team_name | 00:00:09 | 0.932 | 355
Naïve Bayes | name, age, shotspg, motm, drib, league | 00:00:05 | 0.926 | 392
Logistic | name, halfTime, goals, red, motm | 00:01:43 | 0.946 | 331
IBK | player_id, name, assists, motm | 00:00:20 | 0.941 | 269
PART | player_id, motm, aw, BadControl | 00:00:11 | 0.920 | 269
J48 | player_id, weight, halfTime, motm, aw, offsidesCommitted, crosses | 00:00:14 | 0.941 | 381
Random Forest | player_id, age, halfTime, goals, shotspg, PassSuccPerc, motm, aw, tackles, inter, offsidesCommitted | 00:02:34 | 0.947 | 538

Table A.12: Attribute selection with wrapper method for forwards top50pc dataset

Classifier | Selected attributes | ExecTime (hh:mm:ss) | Merit | Subsets evaluated
Bayes network | player_id, shotspg, motm, inter, foulCommited, keyp, crosses | 00:00:07 | 0.919 | 384
Naïve Bayes | weight, assists, motm, BadControl, crosses | 00:00:03 | 0.912 | 329
Logistic | fullTime, assists, red, motm, inter, offsidesCommitted, crosses | 00:00:57 | 0.909 | 384
IBK | shotspg, offsidesWon | 00:00:08 | 0.860 | 243
PART | red, motm, BadControl, crosses, team_name | 00:00:09 | 0.902 | 396
J48 | assists, motm, crosses | 00:00:04 | 0.909 | 274
Random Forest | pos, height, weight, halfTime, goals, yel, shotspg, motm, aw, tackles, inter, foulCommited, offsidesWon, owng, fouled, avgp, longb | 00:02:09 | 0.916 | 589


Table A.13: Frequency of attributes being selected by several wrapper schemes

Attribute(s) | DF: 10 25 50 Sum | GK: 10 25 50 Sum | MD: 10 25 50 Sum | FW: 10 25 50 Sum | Total
Man of the match | 3 5 7 15 | 0 1 7 8 | 2 6 4 12 | 5 7 6 18 | 53
Player name | 4 5 0 9 | 5 4 3 12 | 5 4 0 9 | 5 4 0 9 | 39
Crosses | 1 3 6 10 | 0 0 1 1 | 1 2 6 9 | 0 1 5 6 | 26
Shots per game | 3 4 5 12 | 1 0 1 2 | 1 2 3 6 | 0 3 3 6 | 26
Player id | 3 1 1 5 | 3 3 2 8 | 1 1 1 3 | 2 4 1 7 | 23
Interception | 4 2 7 13 | 1 0 1 2 | 1 1 1 3 | 0 1 3 4 | 22
Tackles | 3 4 4 11 | 2 1 0 3 | 1 1 3 5 | 0 1 1 2 | 21
Half time | 1 3 0 4 | 2 1 1 4 | 1 4 1 6 | 1 3 1 5 | 19
Team name | 3 4 2 9 | 0 3 0 3 | 2 1 0 3 | 1 1 1 3 | 18
Assists | 1 0 4 5 | 1 1 1 3 | 0 3 2 5 | 0 1 3 4 | 17
Own goals | 0 2 1 3 | 2 2 0 4 | 2 1 1 4 | 4 0 1 5 | 16
Full time | 1 2 1 4 | 0 1 0 1 | 0 1 7 8 | 0 0 1 1 | 14
Key pass | 1 1 1 3 | 1 0 1 2 | 1 2 3 6 | 0 1 1 2 | 13
Blocks | 2 1 3 6 | 0 0 0 0 | 0 2 1 3 | 3 0 0 3 | 12
Aerial won | 3 1 0 4 | 0 0 2 2 | 0 1 0 1 | 0 3 1 4 | 11
Goals | 0 2 2 4 | 0 1 0 1 | 0 2 1 3 | 0 2 1 3 | 11
Age | 2 0 1 3 | 1 0 0 1 | 0 1 1 2 | 1 2 0 3 | 9
Offsides committed | 1 2 2 5 | 1 0 0 1 | 0 0 0 0 | 0 2 1 3 | 9
Red cards | 2 0 0 2 | 1 0 0 1 | 1 2 0 3 | 0 1 2 3 | 9
Through ball | 1 1 2 4 | 0 2 0 2 | 0 1 1 2 | 1 0 0 1 | 9
Long balls | 1 0 1 2 | 2 1 0 3 | 1 0 1 2 | 0 0 1 1 | 8
Offsides won | 1 1 0 2 | 1 0 0 1 | 1 0 1 2 | 1 0 2 3 | 8
Bad control | 0 1 1 2 | 0 0 0 0 | 0 1 1 2 | 0 1 2 3 | 7
Fouls committed | 0 1 1 2 | 0 0 1 1 | 0 1 1 2 | 0 0 2 2 | 7
League | 0 1 2 3 | 0 0 0 0 | 0 2 1 3 | 0 1 0 1 | 7
Passing success % | 0 0 1 1 | 0 0 0 0 | 1 2 2 5 | 0 1 0 1 | 7
Yellow cards | 2 1 0 3 | 1 0 1 2 | 0 1 0 1 | 0 0 1 1 | 7
Dribbling | 1 1 1 3 | 0 0 0 0 | 1 1 0 2 | 0 1 0 1 | 6
Fouled | 1 0 0 1 | 1 1 0 2 | 2 0 0 2 | 0 0 1 1 | 6
Height | 1 0 1 2 | 0 1 1 2 | 0 0 1 1 | 0 0 1 1 | 6
Position | 1 1 1 3 | 1 0 1 2 | 0 0 0 0 | 0 0 1 1 | 6
Weight | 1 1 1 3 | 0 0 0 0 | 0 0 0 0 | 0 1 2 3 | 6
Clearance | 1 0 1 2 | 0 1 0 1 | 1 0 1 2 | 0 0 0 0 | 5
Mins played | 1 2 1 4 | 0 0 0 0 | 1 0 0 1 | 0 0 0 0 | 5
Nationality | 1 1 0 2 | 0 1 0 1 | 0 0 1 1 | 0 1 0 1 | 5
Passes per game | 2 0 0 2 | 1 0 0 1 | 0 0 0 0 | 1 0 1 2 | 5
Dispossessed | 1 2 0 3 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 3

B Attributes selected by wrapper method of the combined leagues

Table B.1: Support count of selected attributes with wrapper method for defenders’ top10pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
inter, name | 4 | Yes
aw, motm, player_id, shotspg, tackles, team_name | 3 | Yes
age, avgp, blocks, red, yel | 2 | Yes
assists, clear, crosses, disposs, drib, flag, fouled, fullTime, halfTime, height, keyp, longb, mins, offsidesCommitted, offsidesWon, pos, thrb, weight | 1 | No

Table B.2: Support count of selected attributes with wrapper method for defenders’ top25pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
name, motm | 5 | Yes
shotspg, tackles, team_name | 4 | Yes
crosses, halfTime | 3 | Yes
goals, owng, disposs, fullTime, offsidesCommitted, inter, mins | 2 | Yes
flag, thrb, league, BadControl, yel, blocks, keyp, offsidesWon, aw, drib, weight, player_id, foulCommited, pos | 1 | No


Table B.3: Support count of selected attributes with wrapper method for defenders’ top50pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
motm, inter | 7 | Yes
crosses | 6 | Yes
shotspg | 5 | Yes
tackles, assists | 4 | Yes
blocks | 3 | Yes
goals, team_name, thrb, offsidesCommitted, league | 2 | Yes
age, height, fullTime, longb, keyp, mins, pos, drib, clear, foulCommited, BadControl, owng, weight, PassSuccPerc, player_id | 1 | No

Table B.4: Support count of selected attributes with wrapper method for goalkeepers’ top10pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
name | 5 | Yes
player_id | 3 | Yes
tackles, owng, halfTime, longb | 2 | Yes
assists, keyp, inter, red, pos, fouled, offsidesWon, shotspg, yel, avgp, age, offsidesCommitted | 1 | No

Table B.5: Support count of selected attributes with wrapper method for goalkeepers’ top25pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
name | 4 | Yes
team_name, player_id | 3 | Yes
owng, thrb | 2 | Yes
goals, fouled, halfTime, clear, fullTime, assists, tackles, motm, flag, height, longb | 1 | No

Table B.6: Support count of selected attributes with wrapper method for goalkeepers’ top50pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
motm | 7 | Yes
name | 3 | Yes
aw, player_id | 2 | Yes
halfTime, assists, foulCommited, crosses, pos, shotspg, height, yel, inter, keyp | 1 | No

Table B.7: Support count of selected attributes with wrapper method for midfielders’ top10pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
name | 5 | Yes
team_name, fouled, owng, motm | 2 | Yes
PassSuccPerc, shotspg, halfTime, longb, inter, mins, player_id, clear, red, drib, tackles, offsidesWon, keyp, crosses | 1 | No

Table B.8: Support count of selected attributes with wrapper method for midfielders’ top25pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
motm | 6 | Yes
name, halfTime | 4 | Yes
assists | 3 | Yes
shotspg, crosses, PassSuccPerc, league, red, goals, blocks, keyp | 2 | Yes
inter, thrb, tackles, foulCommited, drib, fullTime, age, owng, team_name, aw, yel, player_id, BadControl | 1 | No

Table B.9: Support count of selected attributes with wrapper method for midfielders’ top50pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
fullTime | 7 | Yes
crosses | 6 | Yes
motm | 4 | Yes
keyp, tackles, shotspg | 3 | Yes
PassSuccPerc, assists | 2 | Yes
flag, player_id, owng, blocks, longb, clear, offsidesWon, age, BadControl, foulCommited, goals, league, halfTime, thrb, height, inter | 1 | No

Table B.10: Support count of selected attributes with wrapper method for forwards’ top10pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
motm, name | 5 | Yes
owng | 4 | Yes
blocks | 3 | Yes
player_id | 2 | Yes
thrb, age, team_name, offsidesWon, avgp, halfTime | 1 | No

Table B.11: Support count of selected attributes with wrapper method for forwards’ top25pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
motm | 7 | Yes
player_id, name | 4 | Yes
aw, shotspg, halfTime | 3 | Yes
goals, offsidesCommitted, age | 2 | Yes
team_name, PassSuccPerc, assists, flag, drib, red, crosses, BadControl, tackles, inter, weight, keyp, league | 1 | No


Table B.12: Support count of selected attributes with wrapper method for forwards’ top50pc dataset

Attribute(s) | No. of algorithms that selected the attribute(s) | Accepted
motm | 6 | Yes
crosses | 5 | Yes
shotspg, assists, inter | 3 | Yes
red, offsidesWon, foulCommited, weight, BadControl | 2 | Yes
player_id, height, avgp, fullTime, team_name, aw, pos, fouled, halfTime, offsidesCommitted, tackles, goals, owng, yel, keyp, longb | 1 | No
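The acceptance rule behind Tables B.1–B.12 is a simple vote count: an attribute is accepted when at least two of the classifiers selected it in their wrapper subsets. The idea can be sketched as follows; the per-classifier subsets below are illustrative stand-ins, not the actual selections from Appendix A:

```python
# Support counting over per-classifier wrapper selections.
# The subsets here are illustrative only.
from collections import Counter

selections = {
    "BayesNet":     {"motm", "crosses", "shotspg"},
    "NaiveBayes":   {"motm", "crosses", "assists"},
    "Logistic":     {"motm", "inter", "crosses"},
    "IBk":          {"shotspg", "motm"},
    "PART":         {"motm", "crosses"},
    "J48":          {"assists", "motm", "crosses"},
    "RandomForest": {"motm", "shotspg", "inter"},
}

# Count how many algorithms selected each attribute.
support = Counter(a for subset in selections.values() for a in subset)

# Accept an attribute when two or more algorithms selected it.
accepted = sorted(a for a, n in support.items() if n >= 2)
print(support["motm"], accepted)
```

With this rule, attributes selected by only a single classifier (the "1 | No" rows in the tables) are treated as unreliable and dropped.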

C Execution time of wrapper method for the combined leagues

Figure C.1: Execution time of wrapper attribute evaluator for defenders datasets


Figure C.2: Execution time of wrapper method for goalkeepers datasets

Figure C.3: Execution time of wrapper method for midfielders datasets

Figure C.4: Execution time of wrapper method for forwards datasets


D Aggregated results of filter method for the combined leagues


Table D.1: Overall filter method results

Attribute | DF: 10 25 50 AV | MD: 10 25 50 AV | FW: 10 25 50 AV | GK: 10 25 50 AV | GAV
full time | .30 .40 .47 .39 | .59 .59 .66 .61 | .71 .69 .68 .69 | .37 .02 .26 .22 | .48
minutes played | .30 .40 .47 .39 | .59 .58 .66 .61 | .70 .68 .68 .69 | .38 .02 .26 .22 | .48
man of the match | .46 .48 .44 .46 | .65 .56 .47 .56 | .70 .58 .44 .58 | .08 .25 .37 .23 | .46
crosses | .42 .43 .47 .44 | .53 .55 .60 .56 | .72 .68 .71 .70 | .02 .20 .15 .12 | .46
shots per game | .29 .31 .28 .29 | .55 .46 .45 .49 | .72 .68 .67 .69 | .05 .06 .07 .06 | .38
assists | .32 .28 .29 .29 | .55 .51 .49 .52 | .62 .58 .52 .57 | .12 .01 .11 .08 | .37
key pass | .23 .21 .21 .22 | .60 .56 .51 .56 | .61 .63 .59 .61 | .07 .03 .02 .04 | .36
goals | .27 .34 .32 .31 | .47 .41 .42 .43 | .71 .66 .58 .65 | .0 .0 .0 .0 | .35
aerial won | .45 .43 .38 .42 | .16 .20 .28 .21 | .51 .48 .54 .51 | .24 .20 .18 .21 | .34
fouled | .23 .19 .20 .21 | .44 .47 .47 .46 | .52 .50 .54 .52 | .10 .16 .07 .11 | .32
tackles | .40 .38 .36 .38 | .30 .37 .46 .38 | .34 .42 .42 .39 | .11 .01 .06 .06 | .30
through balls | .24 .24 .30 .26 | .36 .35 .41 .37 | .51 .45 .48 .48 | .05 .12 .11 .09 | .30
dispossessed | .18 .16 .13 .16 | .41 .38 .37 .38 | .49 .54 .55 .52 | .05 .06 .07 .06 | .28
interception | .50 .43 .45 .46 | .24 .30 .42 .32 | .22 .25 .21 .22 | .14 .05 .13 .11 | .28
bad control | .15 .11 .10 .12 | .34 .34 .36 .35 | .48 .50 .58 .52 | .17 .06 .02 .08 | .27
clearance | .24 .22 .27 .25 | .06 .18 .30 .18 | .43 .41 .43 .42 | .33 .13 .16 .21 | .26
yellow cards | .08 .20 .28 .18 | .23 .30 .42 .31 | .39 .36 .41 .39 | .30 .11 .04 .15 | .26
half time | .37 .32 .28 .32 | .38 .36 .28 .34 | .35 .17 .04 .18 | .07 .11 .13 .10 | .24
fouls committed | .11 .14 .13 .13 | .20 .26 .32 .26 | .32 .36 .46 .38 | .17 .16 .11 .14 | .23
average passes | .07 .07 .04 .06 | .17 .16 .18 .17 | .61 .63 .59 .61 | .11 .05 .01 .06 | .22
dribbles | .17 .15 .15 .16 | .27 .33 .40 .33 | .13 .27 .21 .20 | .31 .18 .10 .19 | .22
offsides committed | .07 .12 .07 .09 | .24 .21 .18 .21 | .44 .49 .51 .48 | .0 .0 .0 .0 | .19
long balls | .16 .12 .11 .13 | .36 .35 .32 .34 | .12 .11 .20 .14 | .10 .04 .01 .05 | .17
age | .05 .0 .01 .02 | .08 .11 .15 .11 | .34 .31 .34 .33 | .16 .07 .02 .08 | .14
blocks | .13 .16 .19 .16 | .10 .13 .20 .14 | .23 .22 .20 .22 | .0 .0 .0 .0 | .13
weight | .10 .14 .14 .13 | .06 .07 .09 .07 | .19 .15 .17 .17 | .13 .07 .14 .11 | .12
nationality | .24 .10 .05 .13 | .08 .09 .03 .06 | .16 .07 .06 .10 | .24 .12 .07 .14 | .11
PassSucc % | .19 .18 .19 .19 | .15 .15 .14 .15 | .04 .03 .04 .04 | .01 .09 .03 .04 | .10
red cards | .08 .04 .06 .06 | .09 .11 .15 .12 | .03 .09 .16 .09 | .22 .15 .03 .13 | .10
team name | .09 .06 .04 .06 | .11 .06 .03 .06 | .24 .09 .04 .12 | .21 .11 .05 .12 | .09
own goals | .06 .0 .03 .03 | .01 .08 .08 .06 | .21 .11 .12 .15 | .17 .19 .05 .14 | .09
league | .20 .10 .06 .12 | .06 .07 .04 .06 | .10 .10 .05 .08 | .17 .12 .03 .11 | .09
height | .08 .09 .10 .09 | .03 .08 .08 .06 | .11 .13 .10 .11 | .03 .09 .08 .07 | .08
player name | .08 .05 .03 .05 | .07 .04 .03 .05 | .15 .09 .06 .10 | .19 .10 .07 .12 | .08
offsides won | .07 .10 .15 .11 | .07 .02 .04 .04 | .11 .02 .01 .05 | .0 .0 .0 .0 | .05
player id | .0 .0 .0 .0 | .0 .0 .0 .0 | .0 .0 .0 .0 | .0 .0 .0 .0 | .0
position | .0 .0 .0 .0 | .0 .0 .0 .0 | .0 .0 .0 .0 | .0 .0 .0 .0 | .0
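Unlike the wrapper approach, a filter scores each attribute independently of any classifier, which is why Table D.1 can report a single score per attribute and position group. As an illustrative sketch (the thesis's exact filter criterion may differ), a mutual-information filter ranks features like this; the data and the implied column roles are assumptions for demonstration:

```python
# Filter-style attribute scoring: each feature is scored against the class
# label on its own, with no classifier in the loop. Illustrative data only.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))  # three hypothetical attributes
# Feature 0 drives the label strongly, feature 1 weakly, feature 2 not at all.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

scores = mutual_info_classif(X, y, random_state=0)
print(scores.argmax())  # index of the highest-scoring attribute
```

Because no cross-validation loop is needed, a filter is far cheaper than the wrapper runs timed in Appendix C, at the cost of ignoring feature interactions.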

E Model accuracy results of wrapper datasets for the combined leagues

Figure E.1: Prediction model accuracy for defenders wrapper-dataset


Figure E.2: Prediction model accuracy for midfielders wrapper-dataset

Figure E.3: Prediction model accuracy for goalkeepers wrapper-datasets

Figure E.4: Prediction model accuracy for forwards wrapper-datasets


F Model accuracy of filter-datasets for the combined leagues

Figure F.1: Model accuracy for defenders filter-datasets


Figure F.2: Model accuracy for midfielders filter-datasets

Figure F.3: Model accuracy for goalkeepers filter-datasets

Figure F.4: Model accuracy for forwards filter-datasets


G F1 score results of wrapper datasets for the combined leagues

Figure G.1: F1 Score results of the defenders wrapper-datasets


Figure G.2: F1 Score results of the midfielders wrapper-datasets

Figure G.3: F1 Score results of the goalkeepers wrapper-datasets

Figure G.4: F1 Score results of the forwards wrapper-datasets


H F1 Score results of the filter-datasets for the combined leagues

Figure H.1: F1 Score results of the defenders filter-datasets


Figure H.2: F1 Score results of the midfielders filter-datasets

Figure H.3: F1 Score results of the goalkeepers filter-datasets

Figure H.4: F1 Score results of the forwards filter-datasets


I AUC-ROC results of the wrapper datasets for the combined leagues

Figure I.1: AUC-ROC results of the defenders wrapper-datasets
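The AUC-ROC values plotted in this appendix summarize how well a model's class-probability scores rank positive instances above negative ones, independent of any single decision threshold. A minimal sketch of the computation with scikit-learn, using illustrative labels and scores:

```python
# AUC-ROC from class scores: 1.0 means perfect ranking, 0.5 means chance.
# Labels and scores below are illustrative only.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc_score(y_true, scores))  # → 0.75
```

Because AUC-ROC is threshold-free, it complements the accuracy (Appendix K) and F1 (Appendix L) results, which both depend on a fixed classification cutoff.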


Figure I.2: AUC-ROC results of the midfielders wrapper-datasets

Figure I.3: AUC-ROC results of the goalkeepers wrapper-datasets

Figure I.4: AUC-ROC results of the forwards wrapper-datasets


J AUC-ROC results of the filter-datasets

Figure J.1: AUC-ROC results of the defenders filter-datasets


Figure J.2: AUC-ROC results of the midfielders filter-datasets

Figure J.3: AUC-ROC results of the goalkeepers filter-datasets

Figure J.4: AUC-ROC results of the forwards filter-datasets


K Accuracy results for individual leagues


Table K.1: Accuracy results of Bundesliga

Classifier | Filter: DF FW GK MD AV | Wrapper: DF FW GK MD AV | GAV
NaiveBayes | 79.7 98.7 76.0 87.8 85.5 | 87.1 97.6 76.6 93.3 88.7 | 87.1
  top10 | 81.8 100.0 69.6 87.9 84.8 | 96.4 96.7 82.6 91.9 91.9 | 88.4
  top25 | 76.9 96.0 83.3 85.6 85.5 | 87.9 96.0 88.9 95.2 92.0 | 88.7
  top50 | 80.3 100.0 75.0 90.0 86.3 | 77.0 100.0 58.3 92.9 82.1 | 84.2
Logistic | 80.2 96.9 77.0 89.4 85.9 | 85.4 95.6 73.4 92.3 86.7 | 86.3
  top10 | 85.5 96.7 78.3 91.1 87.9 | 97.3 96.7 87.0 93.5 93.6 | 90.7
  top25 | 78.0 100.0 77.8 91.3 86.8 | 90.1 96.0 83.3 96.2 91.4 | 89.1
  top50 | 77.0 94.1 75.0 85.7 83.0 | 68.9 94.1 50.0 87.1 75.0 | 79.0
BayesNet | 78.6 98.9 72.9 89.3 84.9 | 86.7 96.9 75.8 90.5 87.5 | 86.2
  top10 | 89.1 96.7 82.6 90.3 89.7 | 97.3 96.7 91.3 94.4 94.9 | 92.3
  top25 | 74.7 100.0 77.8 90.4 85.7 | 92.3 100.0 77.8 91.3 90.4 | 88.0
  top50 | 72.1 100.0 58.3 87.1 79.4 | 70.5 94.1 58.3 85.7 77.2 | 78.3
PART | 81.0 93.4 82.7 87.8 86.2 | 80.2 93.4 79.0 87.5 85.0 | 85.6
  top10 | 89.1 100.0 87.0 90.3 91.6 | 93.6 100.0 87.0 88.7 92.3 | 92.0
  top25 | 83.5 92.0 77.8 87.5 85.2 | 68.1 92.0 83.3 86.5 82.5 | 83.8
  top50 | 70.5 88.2 83.3 85.7 81.9 | 78.7 88.2 66.7 87.1 80.2 | 81.1
RandomForest | 83.4 97.3 76.6 90.3 86.9 | 86.6 97.6 55.6 88.4 82.0 | 84.5
  top10 | 91.8 100.0 82.6 96.8 92.8 | 96.4 96.7 69.6 91.9 88.6 | 90.7
  top25 | 84.6 92.0 72.2 88.5 84.3 | 87.9 96.0 72.2 90.4 86.6 | 85.5
  top50 | 73.8 100.0 75.0 85.7 83.6 | 75.4 100.0 25.0 82.9 70.8 | 77.2
J48 | 78.3 93.4 80.8 88.7 85.3 | 77.9 93.4 67.9 87.5 81.7 | 83.5
  top10 | 89.1 100.0 87.0 94.4 92.6 | 84.5 100.0 87.0 88.7 90.1 | 91.3
  top25 | 76.9 92.0 72.2 87.5 82.2 | 70.3 92.0 50.0 86.5 74.7 | 78.4
  top50 | 68.9 88.2 83.3 84.3 81.2 | 78.7 88.2 66.7 87.1 80.2 | 80.7
IBk | 80.1 92.8 63.1 87.4 80.9 | 83.2 96.7 72.5 88.8 85.3 | 83.1
  top10 | 90.0 100.0 78.3 94.4 90.7 | 95.5 100.0 87.0 93.5 94.0 | 92.3
  top25 | 81.3 96.0 44.4 86.5 77.1 | 86.8 96.0 72.2 94.2 87.3 | 82.2
  top50 | 68.9 82.4 66.7 81.4 74.8 | 67.2 94.1 58.3 78.6 74.6 | 74.7
DecisionTable | 75.1 92.3 77.5 86.5 82.9 | 76.4 92.3 68.3 90.7 81.9 | 82.4
  top10 | 90.0 96.7 82.6 86.3 88.9 | 87.3 96.7 82.6 93.5 90.0 | 89.5
  top25 | 71.4 92.0 66.7 84.6 78.7 | 74.7 92.0 55.6 91.3 78.4 | 78.5
  top50 | 63.9 88.2 83.3 88.6 81.0 | 67.2 88.2 66.7 87.1 77.3 | 79.2
ZeroR | 49.9 49.7 50.7 50.3 50.1 | 49.9 49.7 50.7 50.3 50.1 | 50.1
  top10 | 50.0 50.0 52.2 50.8 50.7 | 50.0 50.0 52.2 50.8 50.7 | 50.7
  top25 | 50.5 52.0 50.0 50.0 50.6 | 50.5 52.0 50.0 50.0 50.6 | 50.6
  top50 | 49.2 47.1 50.0 50.0 49.1 | 49.2 47.1 50.0 50.0 49.1 | 49.1
Grand Total | 76.3 90.4 73.0 84.2 81.0 | 79.3 90.4 68.9 85.5 81.0 | 81.0
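The accuracy figures in these tables are the percentage of correctly classified players, while the companion tables in Appendix L report F1, which balances precision and recall. A minimal sketch of how both are computed with scikit-learn, using illustrative labels:

```python
# Accuracy (percent correct) vs. F1 (harmonic mean of precision and recall).
# Labels and predictions below are illustrative only.
from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

print(round(accuracy_score(y_true, y_pred) * 100, 1))  # percent correct
print(round(f1_score(y_true, y_pred), 2))
```

The ZeroR rows make the baseline explicit: a classifier that always predicts the majority class sits near 50% accuracy on these balanced top-N splits, so any useful model must clear that bar.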

Table K.2: Accuracy results of EPL

Classifier | Filter: DF FW GK MD AV | Wrapper: DF FW GK MD AV | GAV
NaiveBayes | 81.5 94.4 84.7 88.0 87.2 | 82.6 91.7 87.7 90.1 88.0 | 87.6
  top10 | 83.6 93.3 89.7 89.9 89.1 | 81.8 93.3 96.6 91.2 90.7 | 89.9
  top25 | 83.5 96.0 83.3 86.3 87.3 | 90.1 88.0 91.7 93.5 90.8 | 89.1
  top50 | 77.4 93.8 81.3 88.0 85.1 | 75.8 93.8 75.0 85.5 82.5 | 83.8
RandomForest | 81.5 94.5 80.6 90.1 86.7 | 84.0 96.6 78.7 91.0 87.6 | 87.1
  top10 | 87.3 100.0 89.7 92.6 92.4 | 93.6 100.0 86.2 94.6 93.6 | 93.0
  top25 | 84.6 96.0 95.8 88.7 91.3 | 89.0 96.0 87.5 92.7 91.3 | 91.3
  top50 | 72.6 87.5 56.3 89.2 76.4 | 69.4 93.8 62.5 85.5 77.8 | 77.1
Logistic | 78.3 91.7 82.7 90.9 85.9 | 84.7 92.3 74.6 92.6 86.0 | 86.0
  top10 | 85.5 93.3 89.7 95.3 90.9 | 94.5 93.3 86.2 94.6 92.2 | 91.5
  top25 | 76.9 88.0 95.8 89.5 87.6 | 93.4 96.0 87.5 95.2 93.0 | 90.3
  top50 | 72.6 93.8 62.5 88.0 79.2 | 66.1 87.5 50.0 88.0 72.9 | 76.0
BayesNet | 78.7 95.5 73.4 89.3 84.2 | 86.7 94.1 74.1 92.4 86.8 | 85.5
  top10 | 82.7 96.7 82.8 92.6 88.7 | 96.4 96.7 93.1 95.3 95.4 | 92.0
  top25 | 79.1 96.0 87.5 88.7 87.8 | 91.2 92.0 91.7 95.2 92.5 | 90.2
  top50 | 74.2 93.8 50.0 86.7 76.2 | 72.6 93.8 37.5 86.7 72.6 | 74.4
J48 | 80.1 96.6 69.2 88.4 83.6 | 80.7 96.6 74.1 88.3 84.9 | 84.2
  top10 | 86.4 100.0 93.1 92.6 93.0 | 85.5 100.0 93.1 92.6 92.8 | 92.9
  top25 | 84.6 96.0 70.8 89.5 85.2 | 82.4 96.0 79.2 87.9 86.4 | 85.8
  top50 | 69.4 93.8 43.8 83.1 72.5 | 74.2 93.8 50.0 84.3 75.6 | 74.0
IBk | 80.0 93.2 72.5 88.7 83.6 | 78.6 93.0 77.6 88.4 84.4 | 84.0
  top10 | 85.5 100.0 86.2 91.9 90.9 | 95.5 93.3 93.1 93.2 93.8 | 92.3
  top25 | 80.2 92.0 75.0 86.3 83.4 | 82.4 92.0 83.3 93.5 87.8 | 85.6
  top50 | 74.2 87.5 56.3 88.0 76.5 | 58.1 93.8 56.3 78.3 71.6 | 74.0
DecisionTable | 79.7 96.6 66.5 88.1 82.7 | 79.5 96.6 72.0 91.2 84.8 | 83.8
  top10 | 83.6 100.0 82.8 87.2 88.4 | 87.3 100.0 82.8 96.6 91.7 | 90.0
  top25 | 81.3 96.0 66.7 90.3 83.6 | 76.9 96.0 83.3 92.7 87.2 | 85.4
  top50 | 74.2 93.8 50.0 86.7 76.2 | 74.2 93.8 50.0 84.3 75.6 | 75.9
PART | 80.6 94.5 68.3 89.0 83.1 | 78.8 94.5 69.0 87.6 82.5 | 82.8
  top10 | 86.4 100.0 86.2 89.9 90.6 | 91.8 100.0 86.2 91.2 92.3 | 91.5
  top25 | 84.6 96.0 75.0 90.3 86.5 | 70.3 96.0 70.8 87.1 81.1 | 83.8
  top50 | 71.0 87.5 43.8 86.7 72.2 | 74.2 87.5 50.0 84.3 74.0 | 73.1
ZeroR | 50.2 49.3 50.6 50.2 50.1 | 50.2 49.3 50.6 50.2 50.1 | 50.1
  top10 | 50.0 50.0 51.7 50.0 50.4 | 50.0 50.0 51.7 50.0 50.4 | 50.4
  top25 | 50.5 48.0 50.0 50.0 49.6 | 50.5 48.0 50.0 50.0 49.6 | 49.6
  top50 | 50.0 50.0 50.0 50.6 50.2 | 50.0 50.0 50.0 50.6 50.2 | 50.2
Grand Total | 76.7 89.6 72.1 84.8 80.8 | 78.4 89.4 73.2 85.7 81.7 | 81.2


Table K.3: Accuracy results of La Liga

Classifier | Filter: DF FW GK MD AV | Wrapper: DF FW GK MD AV | GAV
RandomForest | 82.6 86.9 85.1 89.2 85.9 | 87.4 90.4 82.4 90.3 87.6 | 86.8
  top10 | 92.6 90.3 100.0 93.8 94.2 | 92.6 93.5 96.2 96.6 94.7 | 94.4
  top25 | 84.0 92.6 81.8 90.9 87.3 | 91.0 100.0 90.9 94.2 94.0 | 90.7
  top50 | 71.2 77.8 73.3 82.7 76.3 | 78.8 77.8 60.0 80.2 74.2 | 75.2
BayesNet | 79.0 86.9 85.3 86.4 84.4 | 88.3 90.4 85.3 90.2 88.5 | 86.5
  top10 | 84.3 90.3 96.2 93.8 91.2 | 97.5 93.5 96.2 97.3 96.1 | 93.6
  top25 | 83.0 92.6 86.4 87.6 87.4 | 90.0 100.0 86.4 94.2 92.6 | 90.0
  top50 | 69.7 77.8 73.3 77.8 74.6 | 77.3 77.8 73.3 79.0 76.8 | 75.7
Logistic | 80.8 79.5 81.1 90.5 83.0 | 89.0 90.0 87.7 91.3 89.5 | 86.2
  top10 | 87.6 77.4 92.3 95.9 88.3 | 95.0 90.3 92.3 95.2 93.2 | 90.8
  top25 | 79.0 88.9 90.9 91.7 87.6 | 90.0 96.3 90.9 90.9 92.0 | 89.8
  top50 | 75.8 72.2 60.0 84.0 73.0 | 81.8 83.3 80.0 87.7 83.2 | 78.1
IBk | 76.5 86.3 86.8 85.6 83.8 | 84.3 88.0 78.3 87.5 84.5 | 84.2
  top10 | 84.3 90.3 96.2 95.9 91.7 | 90.1 93.5 88.5 97.9 92.5 | 92.1
  top25 | 68.0 96.3 90.9 86.8 85.5 | 90.0 92.6 86.4 91.7 90.2 | 87.8
  top50 | 77.3 72.2 73.3 74.1 74.2 | 72.7 77.8 60.0 72.8 70.8 | 72.5
NaiveBayes | 79.0 86.9 75.5 88.1 82.4 | 88.4 88.1 71.7 91.4 84.9 | 83.7
  top10 | 86.8 90.3 84.6 92.5 88.5 | 93.4 90.3 84.6 94.5 90.7 | 89.6
  top25 | 79.0 92.6 81.8 89.3 85.7 | 90.0 96.3 77.3 93.4 89.2 | 87.5
  top50 | 71.2 77.8 60.0 82.7 72.9 | 81.8 77.8 53.3 86.4 74.8 | 73.9
DecisionTable | 78.7 86.7 79.2 86.8 82.8 | 82.8 84.9 80.7 86.2 83.7 | 83.2
  top10 | 89.3 93.5 96.2 94.5 93.4 | 87.6 93.5 96.2 95.2 93.1 | 93.2
  top25 | 74.0 88.9 68.2 91.7 80.7 | 82.0 88.9 72.7 89.3 83.2 | 82.0
  top50 | 72.7 77.8 73.3 74.1 74.5 | 78.8 72.2 73.3 74.1 74.6 | 74.5
PART | 79.5 89.8 73.2 87.9 82.6 | 82.9 88.6 73.2 85.0 82.4 | 82.5
  top10 | 90.9 93.5 96.2 95.9 94.1 | 95.9 93.5 96.2 90.4 94.0 | 94.1
  top25 | 75.0 92.6 50.0 90.1 76.9 | 80.0 88.9 50.0 89.3 77.0 | 77.0
  top50 | 72.7 83.3 73.3 77.8 76.8 | 72.7 83.3 73.3 75.3 76.2 | 76.5
J48 | 77.8 89.8 73.2 89.5 82.6 | 80.2 88.6 73.2 85.9 82.0 | 82.3
  top10 | 87.6 93.5 96.2 95.2 93.1 | 92.6 93.5 96.2 91.8 93.5 | 93.3
  top25 | 76.0 92.6 50.0 91.7 77.6 | 80.0 88.9 50.0 91.7 77.7 | 77.6
  top50 | 69.7 83.3 73.3 81.5 77.0 | 68.2 83.3 73.3 74.1 74.7 | 75.8
ZeroR | 50.1 49.9 50.2 49.9 50.0 | 50.1 49.9 50.2 49.9 50.0 | 50.0
  top10 | 50.4 51.6 53.8 50.7 51.6 | 50.4 51.6 53.8 50.7 51.6 | 51.6
  top25 | 50.0 48.1 50.0 49.6 49.4 | 50.0 48.1 50.0 49.6 49.4 | 49.4
  top50 | 50.0 50.0 46.7 49.4 49.0 | 50.0 50.0 46.7 49.4 49.0 | 49.0
Grand Total | 76.0 82.5 76.6 83.8 79.7 | 81.5 84.3 75.8 84.2 81.5 | 80.6

Table K.4: Accuracy results of Ligue 1

Classifier | Filter: DF FW GK MD AV | Wrapper: DF FW GK MD AV | GAV
Logistic | 88.8 88.1 63.7 85.7 81.6 | 88.1 91.6 66.3 92.1 84.5 | 83.0
  top10 | 92.7 89.5 69.2 89.2 85.2 | 94.5 100.0 88.5 94.6 94.4 | 89.8
  top25 | 88.0 93.9 81.8 84.8 87.2 | 89.1 93.9 63.6 88.8 83.9 | 85.5
  top50 | 85.5 81.0 40.0 83.1 72.4 | 80.6 81.0 46.7 92.8 75.3 | 73.8
NaiveBayes | 87.3 86.5 71.4 79.3 81.1 | 85.4 88.0 72.0 86.5 83.0 | 82.1
  top10 | 90.0 92.1 76.9 76.4 83.8 | 90.9 100.0 92.3 85.1 92.1 | 88.0
  top25 | 84.8 81.8 77.3 78.4 80.6 | 81.5 87.9 63.6 86.4 79.9 | 80.2
  top50 | 87.1 85.7 60.0 83.1 79.0 | 83.9 76.2 60.0 88.0 77.0 | 78.0
RandomForest | 85.6 85.4 70.7 86.5 82.0 | 84.0 90.6 65.3 86.5 81.6 | 81.8
  top10 | 94.5 92.1 76.9 91.9 88.9 | 94.5 100.0 92.3 91.2 94.5 | 91.7
  top25 | 84.8 87.9 81.8 83.2 84.4 | 81.5 90.9 63.6 84.0 80.0 | 82.2
  top50 | 77.4 76.2 53.3 84.3 72.8 | 75.8 81.0 40.0 84.3 70.3 | 71.5
IBk | 84.0 86.2 71.5 87.6 82.3 | 87.6 86.6 64.0 85.3 80.9 | 81.6
  top10 | 94.5 86.8 76.9 90.5 87.2 | 93.6 97.4 88.5 93.2 93.2 | 90.2
  top25 | 84.8 90.9 90.9 88.0 88.7 | 87.0 90.9 63.6 90.4 83.0 | 85.8
  top50 | 72.6 81.0 46.7 84.3 71.1 | 82.3 71.4 40.0 72.3 66.5 | 68.8
BayesNet | 82.0 84.1 69.9 82.3 79.5 | 87.1 91.6 61.0 86.6 81.6 | 80.6
  top10 | 89.1 89.5 76.9 82.4 84.5 | 95.5 100.0 88.5 88.5 93.1 | 88.8
  top25 | 82.6 81.8 72.7 80.0 79.3 | 83.7 93.9 54.5 85.6 79.4 | 79.4
  top50 | 74.2 81.0 60.0 84.3 74.9 | 82.3 81.0 40.0 85.5 72.2 | 73.5
PART | 84.6 84.1 68.9 83.9 80.4 | 77.4 89.6 72.1 82.5 80.4 | 80.4
  top10 | 93.6 86.8 80.8 91.9 88.3 | 91.8 100.0 92.3 89.2 93.3 | 90.8
  top25 | 85.9 93.9 72.7 86.4 84.7 | 66.3 87.9 77.3 77.6 77.3 | 81.0
  top50 | 74.2 71.4 53.3 73.5 68.1 | 74.2 81.0 46.7 80.7 70.6 | 69.4
J48 | 85.9 84.1 66.7 83.2 80.0 | 76.9 88.6 75.1 80.9 80.4 | 80.2
  top10 | 95.5 86.8 80.8 87.8 87.7 | 91.8 100.0 92.3 86.5 92.7 | 90.2
  top25 | 81.5 93.9 72.7 84.8 83.2 | 66.3 84.8 86.4 84.0 80.4 | 81.8
  top50 | 80.6 71.4 46.7 77.1 69.0 | 72.6 81.0 46.7 72.3 68.1 | 68.5
DecisionTable | 76.0 83.5 73.7 81.9 78.8 | 77.8 86.8 61.7 86.6 78.2 | 78.5
  top10 | 80.9 89.5 88.5 81.1 85.0 | 80.9 94.7 88.5 93.9 89.5 | 87.2
  top25 | 69.6 84.8 72.7 81.6 77.2 | 75.0 84.8 50.0 81.6 72.9 | 75.0
  top50 | 77.4 76.2 60.0 83.1 74.2 | 77.4 81.0 46.7 84.3 72.3 | 73.3
ZeroR | 50.0 49.6 50.2 49.9 49.9 | 50.0 49.6 50.2 49.9 49.9 | 49.9
  top10 | 50.0 52.6 53.8 50.7 51.8 | 50.0 52.6 53.8 50.7 51.8 | 51.8
  top25 | 50.0 48.5 50.0 49.6 49.5 | 50.0 48.5 50.0 49.6 49.5 | 49.5
  top50 | 50.0 47.6 46.7 49.4 48.4 | 50.0 47.6 46.7 49.4 48.4 | 48.4
Grand Total | 80.4 81.3 67.4 80.0 77.3 | 79.4 84.8 65.3 81.9 77.8 | 77.6


Table K.5: Accuracy results of Serie A

Classifier | Filter: DF FW GK MD AV | Wrapper: DF FW GK MD AV | GAV
BayesNet | 85.1 95.3 68.9 84.8 83.5 | 86.2 95.3 71.1 84.5 84.3 | 83.9
  top10 | 92.6 100.0 90.0 95.2 94.4 | 93.4 100.0 96.7 95.2 96.3 | 95.4
  top25 | 84.0 94.3 52.0 90.6 80.2 | 88.0 94.3 52.0 86.8 80.3 | 80.2
  top50 | 78.8 91.7 64.7 68.6 75.9 | 77.3 91.7 64.7 71.4 76.3 | 76.1
DecisionTable | 85.0 89.0 70.4 78.9 80.8 | 85.0 91.3 70.4 78.2 81.2 | 81.0
  top10 | 92.6 95.2 76.7 91.9 89.1 | 92.6 95.2 76.7 93.5 89.5 | 89.3
  top25 | 79.0 88.6 64.0 84.9 79.1 | 79.0 82.9 64.0 81.1 76.7 | 77.9
  top50 | 83.3 83.3 70.6 60.0 74.3 | 83.3 95.8 70.6 60.0 77.4 | 75.9
IBk | 90.2 91.3 84.2 81.5 86.8 | 82.5 90.2 74.2 85.9 83.2 | 85.0
  top10 | 91.7 97.6 100.0 91.9 95.3 | 91.7 100.0 80.0 91.9 90.9 | 93.1
  top25 | 88.0 97.1 88.0 92.5 91.4 | 86.0 91.4 72.0 94.3 85.9 | 88.7
  top50 | 90.9 79.2 64.7 60.0 73.7 | 69.7 79.2 70.6 71.4 72.7 | 73.2
J48 | 86.8 92.3 75.8 81.0 84.0 | 80.9 94.2 75.8 81.5 83.1 | 83.5
  top10 | 88.4 95.2 76.7 93.5 88.5 | 86.8 95.2 76.7 91.9 87.7 | 88.1
  top25 | 87.0 94.3 80.0 92.5 88.4 | 74.0 91.4 80.0 81.1 81.6 | 85.0
  top50 | 84.8 87.5 70.6 57.1 75.0 | 81.8 95.8 70.6 71.4 79.9 | 77.5
Logistic | 92.4 85.2 78.2 79.9 83.9 | 87.2 96.9 75.8 86.6 86.6 | 85.2
  top10 | 94.2 97.6 90.0 91.9 93.4 | 94.2 97.6 76.7 96.8 91.3 | 92.4
  top25 | 89.0 82.9 68.0 90.6 82.6 | 90.0 97.1 80.0 94.3 90.4 | 86.5
  top50 | 93.9 75.0 76.5 57.1 75.6 | 77.3 95.8 70.6 68.6 78.1 | 76.9
NaiveBayes | 88.0 95.1 76.2 85.7 86.3 | 92.9 96.9 75.3 89.7 88.7 | 87.5
  top10 | 88.4 95.2 80.0 95.2 89.7 | 94.2 97.6 73.3 96.8 90.5 | 90.1
  top25 | 83.0 94.3 84.0 90.6 88.0 | 92.0 97.1 76.0 92.5 89.4 | 88.7
  top50 | 92.4 95.8 64.7 71.4 81.1 | 92.4 95.8 76.5 80.0 86.2 | 83.6
PART | 86.3 93.3 69.7 79.8 82.3 | 78.1 94.2 67.8 82.6 80.7 | 81.5
  top10 | 90.1 95.2 76.7 93.5 88.9 | 91.7 95.2 76.7 95.2 89.7 | 89.3
  top25 | 84.0 97.1 56.0 88.7 81.5 | 73.0 91.4 56.0 81.1 75.4 | 78.4
  top50 | 84.8 87.5 76.5 57.1 76.5 | 69.7 95.8 70.6 71.4 76.9 | 76.7
RandomForest | 87.4 96.3 79.5 86.0 87.3 | 83.1 95.9 77.6 86.6 85.8 | 86.5
  top10 | 93.4 100.0 100.0 95.2 97.1 | 92.6 97.6 80.0 96.8 91.7 | 94.4
  top25 | 87.0 97.1 68.0 94.3 86.6 | 81.0 94.3 88.0 88.7 88.0 | 87.3
  top50 | 81.8 91.7 70.6 68.6 78.2 | 75.8 95.8 64.7 74.3 77.6 | 77.9
ZeroR | 50.1 49.5 49.7 51.3 50.2 | 50.1 49.5 49.7 51.3 50.2 | 50.2
  top10 | 50.4 50.0 50.0 51.6 50.5 | 50.4 50.0 50.0 51.6 50.5 | 50.5
  top25 | 50.0 48.6 52.0 50.9 50.4 | 50.0 48.6 52.0 50.9 50.4 | 50.4
  top50 | 50.0 50.0 47.1 51.4 49.6 | 50.0 50.0 47.1 51.4 49.6 | 49.6
Grand Total | 83.5 87.5 72.5 78.8 80.6 | 80.7 89.4 70.8 80.8 80.4 | 80.5

L F1 score results for individual leagues


Table L.1: F1 results of Bundesliga
(DF = defenders, FW = forwards, GK = goalkeepers, MD = midfielders; AV = average over the four positions; GAV = grand average of the Filter and Wrapper AV columns)

                    Filter                          Wrapper
Classifier          DF    FW    GK    MD    AV      DF    FW    GK    MD    AV      GAV
NaiveBayes          0.79  0.99  0.75  0.88  0.85    0.86  0.98  0.79  0.94  0.89    0.87
  top10             0.82  1.00  0.67  0.89  0.84    0.96  0.97  0.86  0.92  0.93    0.89
  top25             0.77  0.96  0.84  0.85  0.86    0.88  0.96  0.89  0.95  0.92    0.89
  top50             0.79  1.00  0.73  0.90  0.85    0.74  1.00  0.62  0.93  0.82    0.84
PART                0.81  0.93  0.82  0.88  0.86    0.82  0.93  0.81  0.88  0.86    0.86
  top10             0.89  1.00  0.87  0.91  0.92    0.94  1.00  0.87  0.89  0.92    0.92
  top25             0.83  0.91  0.80  0.88  0.85    0.75  0.91  0.80  0.87  0.83    0.84
  top50             0.70  0.89  0.80  0.86  0.81    0.78  0.89  0.75  0.88  0.82    0.82
BayesNet            0.78  0.99  0.72  0.90  0.85    0.86  0.97  0.75  0.91  0.87    0.86
  top10             0.89  0.97  0.82  0.91  0.90    0.97  0.97  0.92  0.94  0.95    0.92
  top25             0.75  1.00  0.80  0.90  0.86    0.92  1.00  0.80  0.91  0.91    0.89
  top50             0.70  1.00  0.55  0.88  0.78    0.68  0.95  0.55  0.86  0.76    0.77
Logistic            0.79  0.97  0.76  0.89  0.85    0.84  0.96  0.71  0.92  0.86    0.86
  top10             0.85  0.97  0.76  0.91  0.87    0.97  0.97  0.88  0.93  0.94    0.91
  top25             0.77  1.00  0.80  0.91  0.87    0.90  0.96  0.84  0.96  0.92    0.89
  top50             0.74  0.94  0.73  0.85  0.82    0.65  0.95  0.40  0.87  0.72    0.77
J48                 0.78  0.93  0.82  0.89  0.85    0.80  0.93  0.76  0.88  0.84    0.85
  top10             0.90  1.00  0.87  0.94  0.93    0.86  1.00  0.87  0.89  0.90    0.92
  top25             0.76  0.91  0.78  0.87  0.83    0.77  0.91  0.67  0.87  0.80    0.82
  top50             0.68  0.89  0.80  0.85  0.81    0.78  0.89  0.75  0.88  0.83    0.82
RandomForest        0.83  0.97  0.73  0.90  0.86    0.86  0.97  0.56  0.88  0.82    0.84
  top10             0.92  1.00  0.82  0.97  0.93    0.96  0.97  0.77  0.92  0.91    0.92
  top25             0.83  0.91  0.71  0.88  0.83    0.88  0.96  0.74  0.90  0.87    0.85
  top50             0.72  1.00  0.67  0.87  0.81    0.75  1.00  0.18  0.82  0.69    0.75
IBk                 0.80  0.92  0.56  0.87  0.79    0.83  0.97  0.72  0.89  0.85    0.82
  top10             0.90  1.00  0.78  0.94  0.91    0.96  1.00  0.89  0.94  0.95    0.93
  top25             0.81  0.96  0.29  0.86  0.73    0.88  0.96  0.74  0.94  0.88    0.80
  top50             0.69  0.80  0.60  0.81  0.73    0.67  0.94  0.55  0.78  0.73    0.73
DecisionTable       0.75  0.92  0.79  0.86  0.83    0.77  0.92  0.63  0.91  0.81    0.82
  top10             0.90  0.97  0.82  0.87  0.89    0.88  0.97  0.82  0.93  0.90    0.89
  top25             0.70  0.91  0.75  0.83  0.80    0.76  0.91  0.33  0.91  0.73    0.76
  top50             0.65  0.89  0.80  0.89  0.81    0.68  0.89  0.75  0.88  0.80    0.80
ZeroR               0.66  0.67  0.67  1.00  0.67    0.66  0.67  0.67  1.00  0.67    0.67
  top10             0.67  0.67  0.69  1.00  0.67    0.67  0.67  0.69  1.00  0.67    0.67
  top25             1.00  1.00  0.67  1.00  0.67    1.00  1.00  0.67  1.00  0.67    0.67
  top50             0.66  1.00  0.67  1.00  0.66    0.66  1.00  0.67  1.00  0.66    0.66
Grand Total         0.78  0.94  0.74  0.88  0.83    0.82  0.94  0.71  0.90  0.84    0.84
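The derivation of the summary columns is not spelled out next to the tables, but it can be checked against the data itself: the AV columns match the mean F1 over the four positions, and GAV matches the mean of the Filter and Wrapper averages. A minimal sketch, verified against the NaiveBayes row of Table L.1 (the helper name `position_average` is ours, not from the thesis):

```python
def position_average(scores):
    """Mean score over the four positions (DF, FW, GK, MD), rounded to 2 dp."""
    return round(sum(scores) / len(scores), 2)

# NaiveBayes row of Table L.1 (Bundesliga), positions in order DF, FW, GK, MD
filter_f1 = [0.79, 0.99, 0.75, 0.88]    # filter-based feature selection
wrapper_f1 = [0.86, 0.98, 0.79, 0.94]   # wrapper-based feature selection

av_filter = position_average(filter_f1)       # 0.85, matches the Filter AV column
av_wrapper = position_average(wrapper_f1)     # 0.89, matches the Wrapper AV column
gav = round((av_filter + av_wrapper) / 2, 2)  # 0.87, matches the GAV column
```

The same check reproduces the AV and GAV entries of the other rows up to rounding of the displayed two-decimal values.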

Table L.2: F1 results of EPL

                    Filter                          Wrapper
Classifier          DF    FW    GK    MD    AV      DF    FW    GK    MD    AV      GAV
NaiveBayes          0.81  0.95  0.84  0.88  0.87    0.82  0.93  0.86  0.90  0.88    0.87
  top10             0.83  0.94  0.91  0.91  0.90    0.84  0.94  0.97  0.92  0.91    0.91
  top25             0.84  0.96  0.82  0.87  0.87    0.90  0.90  0.91  0.94  0.91    0.89
  top50             0.76  0.94  0.80  0.87  0.84    0.73  0.94  0.71  0.84  0.81    0.82
RandomForest        0.82  0.95  0.82  0.90  0.87    0.83  0.97  0.77  0.90  0.87    0.87
  top10             0.88  1.00  0.90  0.93  0.93    0.94  1.00  0.87  0.95  0.94    0.93
  top25             0.86  0.96  0.96  0.89  0.92    0.89  0.96  0.87  0.93  0.91    0.92
  top50             0.71  0.89  0.59  0.89  0.77    0.65  0.94  0.57  0.84  0.75    0.76
J48                 0.81  0.97  0.74  0.88  0.85    0.81  0.97  0.81  0.88  0.87    0.86
  top10             0.86  1.00  0.94  0.93  0.93    0.86  1.00  0.94  0.93  0.93    0.93
  top25             0.86  0.96  0.72  0.90  0.86    0.85  0.96  0.81  0.89  0.88    0.87
  top50             0.70  0.93  0.57  0.81  0.75    0.73  0.93  0.67  0.83  0.79    0.77
Logistic            0.78  0.92  0.81  0.91  0.85    0.84  0.93  0.74  0.93  0.86    0.85
  top10             0.85  0.94  0.90  0.95  0.91    0.95  0.94  0.85  0.95  0.92    0.91
  top25             0.77  0.89  0.96  0.90  0.88    0.94  0.96  0.87  0.95  0.93    0.90
  top50             0.71  0.93  0.57  0.87  0.77    0.63  0.88  0.50  0.88  0.72    0.75
DecisionTable       0.81  0.97  0.72  0.89  0.85    0.81  0.97  0.77  0.91  0.86    0.85
  top10             0.85  1.00  0.84  0.88  0.89    0.88  1.00  0.84  0.97  0.92    0.91
  top25             0.84  0.96  0.67  0.91  0.84    0.79  0.96  0.82  0.93  0.87    0.86
  top50             0.73  0.93  0.67  0.87  0.80    0.75  0.93  0.67  0.83  0.80    0.80
BayesNet            0.79  0.96  0.79  0.89  0.86    0.85  0.95  0.62  0.92  0.83    0.85
  top10             0.83  0.97  0.85  0.93  0.89    0.96  0.97  0.93  0.95  0.96    0.92
  top25             0.80  0.96  0.87  0.89  0.88    0.92  0.93  0.92  0.95  0.93    0.91
  top50             0.73  0.94  0.67  0.86  0.80    0.68  0.94  0.00  0.85  0.62    0.71
IBk                 0.80  0.93  0.76  0.89  0.85    0.76  0.93  0.76  0.88  0.83    0.84
  top10             0.85  1.00  0.87  0.92  0.91    0.96  0.94  0.93  0.94  0.94    0.93
  top25             0.81  0.93  0.77  0.87  0.85    0.83  0.93  0.82  0.94  0.88    0.86
  top50             0.75  0.88  0.63  0.86  0.78    0.50  0.93  0.53  0.77  0.68    0.73
PART                0.81  0.95  0.74  0.89  0.84    0.81  0.95  0.71  0.87  0.83    0.84
  top10             0.86  1.00  0.87  0.90  0.91    0.92  1.00  0.87  0.91  0.93    0.92
  top25             0.86  0.96  0.77  0.91  0.87    0.77  0.96  0.59  0.87  0.80    0.84
  top50             0.70  0.88  0.57  0.85  0.75    0.73  0.88  0.67  0.83  0.78    0.76
ZeroR               0.67  0.67  0.67  0.67  0.67    0.67  0.67  0.67  0.67  0.67    0.67
  top10             0.67  0.67  0.68  1.00  0.67    0.67  0.67  0.68  1.00  0.67    0.67
  top25             0.67  1.00  0.67  0.67  0.67    0.67  1.00  0.67  0.67  0.67    0.67
  top50             0.67  1.00  0.67  0.67  0.67    0.67  1.00  0.67  0.67  0.67    0.67
Grand Total         0.79  0.94  0.77  0.87  0.84    0.80  0.94  0.75  0.88  0.84    0.84

L. F1 score results for individual leagues

Table L.3: F1 results of La Liga

                    Filter                          Wrapper
Classifier          DF    FW    GK    MD    AV      DF    FW    GK    MD    AV      GAV
RandomForest        0.83  0.87  0.85  0.89  0.86    0.88  0.90  0.83  0.90  0.88    0.87
  top10             0.93  0.90  1.00  0.94  0.94    0.93  0.93  0.96  0.97  0.95    0.95
  top25             0.85  0.93  0.83  0.91  0.88    0.92  1.00  0.91  0.94  0.94    0.91
  top50             0.72  0.78  0.71  0.83  0.76    0.79  0.78  0.63  0.79  0.75    0.75
Logistic            0.81  0.79  0.82  0.91  0.83    0.89  0.90  0.88  0.91  0.90    0.86
  top10             0.88  0.76  0.92  0.96  0.88    0.95  0.90  0.92  0.95  0.93    0.90
  top25             0.80  0.90  0.91  0.91  0.88    0.91  0.96  0.91  0.91  0.92    0.90
  top50             0.76  0.71  0.63  0.85  0.73    0.82  0.84  0.82  0.88  0.84    0.79
BayesNet            0.80  0.87  0.83  0.87  0.84    0.88  0.90  0.83  0.90  0.88    0.86
  top10             0.85  0.90  0.96  0.94  0.91    0.98  0.93  0.96  0.97  0.96    0.94
  top25             0.84  0.93  0.87  0.88  0.88    0.90  1.00  0.87  0.94  0.93    0.90
  top50             0.71  0.78  0.67  0.79  0.74    0.75  0.78  0.67  0.80  0.75    0.74
IBk                 0.76  0.86  0.87  0.87  0.84    0.85  0.88  0.79  0.88  0.85    0.85
  top10             0.86  0.90  0.96  0.96  0.92    0.91  0.94  0.89  0.98  0.93    0.92
  top25             0.67  0.96  0.91  0.88  0.86    0.90  0.93  0.87  0.92  0.90    0.88
  top50             0.76  0.71  0.75  0.76  0.74    0.75  0.78  0.63  0.73  0.72    0.73
NaiveBayes          0.79  0.87  0.77  0.88  0.83    0.88  0.88  0.76  0.91  0.86    0.84
  top10             0.88  0.90  0.86  0.93  0.89    0.94  0.90  0.86  0.95  0.91    0.90
  top25             0.80  0.93  0.83  0.89  0.86    0.91  0.96  0.80  0.93  0.90    0.88
  top50             0.71  0.78  0.63  0.83  0.73    0.81  0.78  0.63  0.86  0.77    0.75
PART                0.79  0.89  0.78  0.88  0.84    0.83  0.88  0.78  0.85  0.84    0.84
  top10             0.91  0.93  0.96  0.96  0.94    0.96  0.93  0.96  0.90  0.94    0.94
  top25             0.79  0.92  0.67  0.90  0.82    0.82  0.89  0.67  0.88  0.81    0.82
  top50             0.67  0.82  0.71  0.78  0.74    0.73  0.82  0.71  0.77  0.76    0.75
J48                 0.77  0.89  0.78  0.90  0.84    0.80  0.88  0.78  0.87  0.83    0.83
  top10             0.88  0.93  0.96  0.95  0.93    0.93  0.93  0.96  0.92  0.94    0.93
  top25             0.76  0.92  0.67  0.92  0.82    0.82  0.89  0.67  0.92  0.82    0.82
  top50             0.67  0.82  0.71  0.82  0.76    0.64  0.82  0.71  0.76  0.74    0.75
DecisionTable       0.79  0.87  0.79  0.86  0.83    0.84  0.86  0.78  0.86  0.84    0.83
  top10             0.89  0.93  0.96  0.95  0.93    0.89  0.93  0.96  0.95  0.93    0.93
  top25             0.74  0.89  0.70  0.92  0.81    0.84  0.89  0.67  0.90  0.82    0.82
  top50             0.75  0.78  0.71  0.72  0.74    0.79  0.76  0.71  0.74  0.75    0.75
ZeroR               0.67  0.65  0.65  0.67  0.66    0.67  0.65  0.65  0.67  0.66    0.66
  top10             0.67  /0    /0    0.67  0.67    0.67  /0    /0    0.67  0.67    0.67
  top25             0.67  0.65  0.67  0.66  0.66    0.67  0.65  0.67  0.66  0.66    0.66
  top50             0.67  /0    0.64  0.66  0.65    0.67  /0    0.64  0.66  0.65    0.65
Grand Total         0.78  0.85  0.80  0.86  0.82    0.84  0.88  0.79  0.86  0.84    0.83

Table L.4: F1 results of Ligue 1

                    Filter                          Wrapper
Classifier          DF    FW    GK    MD    AV      DF    FW    GK    MD    AV      GAV
RandomForest        0.86  0.84  0.82  0.87  0.85    0.85  0.89  0.59  0.87  0.80    0.82
  top10             0.94  0.92  0.80  0.92  0.90    0.95  1.00  0.91  0.92  0.94    0.92
  top25             0.85  0.87  0.85  0.84  0.85    0.83  0.90  0.67  0.85  0.81    0.83
  top50             0.79  0.74  /0    0.84  0.79    0.77  0.78  0.18  0.83  0.64    0.70
Logistic            0.89  0.87  0.59  0.86  0.80    0.88  0.91  0.62  0.92  0.83    0.82
  top10             0.92  0.88  0.73  0.89  0.86    0.95  1.00  0.87  0.95  0.94    0.90
  top25             0.88  0.94  0.85  0.85  0.88    0.90  0.94  0.67  0.88  0.85    0.86
  top50             0.86  0.78  0.18  0.83  0.66    0.81  0.78  0.33  0.92  0.71    0.69
NaiveBayes          0.88  0.87  0.62  0.79  0.79    0.86  0.87  0.73  0.87  0.83    0.81
  top10             0.90  0.92  0.80  0.76  0.85    0.92  1.00  0.92  0.86  0.92    0.88
  top25             0.85  0.83  0.81  0.78  0.82    0.84  0.88  0.71  0.87  0.83    0.82
  top50             0.88  0.84  0.25  0.82  0.70    0.83  0.74  0.57  0.87  0.75    0.73
PART                0.85  0.84  0.69  0.83  0.80    0.81  0.89  0.75  0.83  0.82    0.81
  top10             0.93  0.87  0.81  0.92  0.89    0.92  1.00  0.91  0.90  0.93    0.91
  top25             0.86  0.94  0.79  0.87  0.87    0.74  0.88  0.71  0.81  0.78    0.82
  top50             0.74  0.70  0.46  0.71  0.65    0.77  0.78  0.64  0.77  0.74    0.70
J48                 0.86  0.84  0.64  0.83  0.79    0.81  0.88  0.80  0.81  0.82    0.81
  top10             0.96  0.87  0.81  0.88  0.88    0.92  1.00  0.91  0.88  0.93    0.90
  top25             0.82  0.94  0.79  0.85  0.85    0.74  0.85  0.84  0.85  0.82    0.84
  top50             0.81  0.70  0.33  0.76  0.65    0.75  0.78  0.64  0.69  0.72    0.68
IBk                 0.83  0.85  0.72  0.88  0.82    0.88  0.86  0.57  0.85  0.79    0.80
  top10             0.95  0.86  0.80  0.91  0.88    0.94  0.97  0.87  0.94  0.93    0.90
  top25             0.84  0.91  0.92  0.89  0.89    0.88  0.91  0.67  0.91  0.84    0.87
  top50             0.70  0.78  0.43  0.84  0.69    0.82  0.70  0.18  0.70  0.60    0.64
BayesNet            0.83  0.83  0.61  0.82  0.77    0.87  0.91  0.56  0.87  0.80    0.79
  top10             0.88  0.88  0.79  0.83  0.84    0.95  1.00  0.88  0.89  0.93    0.89
  top25             0.83  0.82  0.79  0.81  0.81    0.84  0.94  0.62  0.87  0.82    0.81
  top50             0.78  0.78  0.25  0.83  0.66    0.83  0.78  0.18  0.85  0.66    0.66
DecisionTable       0.74  0.83  0.63  0.82  0.76    0.78  0.86  0.73  0.87  0.81    0.78
  top10             0.80  0.90  0.86  0.80  0.84    0.80  0.94  0.86  0.93  0.88    0.86
  top25             0.64  0.86  0.79  0.83  0.78    0.75  0.85  /0    0.83  0.81    0.79
  top50             0.78  0.74  0.25  0.83  0.65    0.79  0.78  0.60  0.84  0.75    0.70
ZeroR               /0    0.65  0.64  0.66  0.65    /0    0.65  0.64  0.66  0.65    0.65
  top10             /0    /0    /0    /0    /0      /0    /0    /0    /0    /0      /0
  top25             /0    /0    /0    0.66  0.66    /0    /0    /0    0.66  0.66    0.66
  top50             /0    0.65  0.64  0.66  0.65    /0    0.65  0.64  0.66  0.65    0.65
Grand Total         0.84  0.84  0.66  0.82  0.79    0.84  0.87  0.66  0.84  0.81    0.80


Table L.5: F1 results of Serie A

                    Filter                          Wrapper
Classifier          DF    FW    GK    MD    AV      DF    FW    GK    MD    AV      GAV
NaiveBayes          0.88  0.95  0.76  0.86  0.86    0.93  0.97  0.77  0.89  0.89    0.88
  top10             0.89  0.95  0.82  0.95  0.91    0.94  0.98  0.78  0.97  0.92    0.91
  top25             0.83  0.94  0.83  0.91  0.88    0.92  0.97  0.77  0.93  0.90    0.89
  top50             0.92  0.96  0.63  0.72  0.81    0.92  0.96  0.75  0.79  0.86    0.83
RandomForest        0.88  0.96  0.80  0.87  0.88    0.84  0.96  0.78  0.87  0.86    0.87
  top10             0.94  1.00  1.00  0.95  0.97    0.93  0.98  0.82  0.97  0.92    0.95
  top25             0.87  0.97  0.69  0.94  0.87    0.82  0.94  0.89  0.90  0.89    0.88
  top50             0.83  0.92  0.71  0.72  0.79    0.78  0.96  0.63  0.76  0.78    0.79
BayesNet            0.85  0.95  0.77  0.86  0.86    0.87  0.96  0.79  0.85  0.86    0.86
  top10             0.93  1.00  0.90  0.95  0.95    0.93  1.00  0.97  0.95  0.96    0.95
  top25             0.84  0.94  0.67  0.90  0.84    0.88  0.94  0.67  0.87  0.84    0.84
  top50             0.79  0.92  0.73  0.72  0.79    0.79  0.92  0.73  0.72  0.79    0.79
Logistic            0.92  0.86  0.78  0.81  0.85    0.88  0.97  0.76  0.87  0.87    0.86
  top10             0.94  0.98  0.91  0.91  0.94    0.94  0.98  0.80  0.97  0.92    0.93
  top25             0.89  0.84  0.69  0.91  0.83    0.89  0.97  0.78  0.95  0.90    0.87
  top50             0.94  0.77  0.75  0.62  0.77    0.79  0.96  0.71  0.70  0.79    0.78
IBk                 0.91  0.92  0.82  0.81  0.86    0.83  0.91  0.75  0.85  0.84    0.85
  top10             0.92  0.98  1.00  0.92  0.95    0.92  1.00  0.83  0.92  0.92    0.94
  top25             0.88  0.97  0.89  0.93  0.92    0.86  0.91  0.72  0.95  0.86    0.89
  top50             0.91  0.80  0.57  0.59  0.72    0.71  0.81  0.71  0.69  0.73    0.72
J48                 0.87  0.92  0.70  0.82  0.83    0.83  0.94  0.70  0.82  0.82    0.82
  top10             0.89  0.95  0.70  0.93  0.87    0.88  0.95  0.70  0.92  0.86    0.86
  top25             0.87  0.94  0.74  0.92  0.87    0.77  0.91  0.74  0.83  0.81    0.84
  top50             0.85  0.88  0.67  0.59  0.75    0.83  0.96  0.67  0.71  0.79    0.77
DecisionTable       0.84  0.90  0.68  0.78  0.80    0.84  0.91  0.68  0.81  0.81    0.80
  top10             0.93  0.95  0.70  0.92  0.87    0.93  0.95  0.70  0.93  0.88    0.87
  top25             0.76  0.89  0.67  0.83  0.79    0.76  0.82  0.67  0.82  0.77    0.78
  top50             0.83  0.85  0.67  0.59  0.73    0.83  0.96  0.67  0.67  0.78    0.76
PART                0.86  0.93  0.53  0.80  0.78    0.78  0.94  0.51  0.84  0.77    0.77
  top10             0.90  0.95  0.70  0.93  0.87    0.91  0.95  0.70  0.95  0.88    0.87
  top25             0.84  0.97  0.15  0.88  0.71    0.77  0.91  0.15  0.83  0.67    0.69
  top50             0.84  0.88  0.75  0.59  0.77    0.66  0.96  0.67  0.74  0.75    0.76
ZeroR               0.67  0.66  0.67  0.68  0.67    0.67  0.66  0.67  0.68  0.67    0.67
  top10             0.67  0.67  0.67  /0    0.67    0.67  0.67  0.67  /0    0.67    0.67
  top25             0.67  0.65  /0    0.67  0.67    0.67  0.65  /0    0.67  0.67    0.67
  top50             0.67  0.67  /0    0.68  0.67    0.67  0.67  /0    0.68  0.67    0.67
Grand Total         0.85  0.90  0.73  0.81  0.82    0.83  0.91  0.71  0.84  0.83    0.83

M AUC-ROC results for individual leagues


Table M.1: AUC-ROC results of Bundesliga
(DF = defenders, FW = forwards, GK = goalkeepers, MD = midfielders; AV = average over the four positions; GAV = grand average of the Filter and Wrapper AV columns)

                    Filter                          Wrapper
Classifier          DF    FW    GK    MD    AV      DF    FW    GK    MD    AV      GAV
NaiveBayes          0.88  1.00  0.87  0.95  0.92    0.93  1.00  0.93  0.98  0.96    0.94
  top10             0.91  1.00  0.86  0.95  0.93    0.99  1.00  0.98  0.98  0.99    0.96
  top25             0.86  1.00  0.91  0.95  0.93    0.92  1.00  0.99  0.99  0.98    0.95
  top50             0.87  1.00  0.83  0.95  0.91    0.87  1.00  0.81  0.98  0.91    0.91
RandomForest        0.92  1.00  0.79  0.98  0.92    0.93  1.00  0.74  0.96  0.91    0.91
  top10             0.98  1.00  0.82  1.00  0.95    1.00  1.00  0.99  0.99  1.00    0.97
  top25             0.94  1.00  0.83  0.98  0.94    0.97  1.00  0.88  0.99  0.96    0.95
  top50             0.85  1.00  0.71  0.95  0.88    0.84  1.00  0.33  0.91  0.77    0.82
Logistic            0.90  0.98  0.86  0.96  0.92    0.90  0.98  0.76  0.96  0.90    0.91
  top10             0.95  0.96  0.87  0.98  0.94    0.97  1.00  0.98  0.98  0.98    0.96
  top25             0.88  1.00  0.86  0.97  0.93    0.95  1.00  0.90  1.00  0.96    0.95
  top50             0.87  0.97  0.83  0.92  0.90    0.78  0.94  0.39  0.89  0.75    0.82
BayesNet            0.85  1.00  0.72  0.96  0.88    0.93  1.00  0.81  0.98  0.93    0.91
  top10             0.95  1.00  0.79  0.98  0.93    1.00  1.00  0.98  0.99  0.99    0.96
  top25             0.85  1.00  0.80  0.96  0.90    0.97  1.00  0.86  0.99  0.95    0.93
  top50             0.76  1.00  0.57  0.95  0.82    0.82  0.99  0.58  0.96  0.84    0.83
PART                0.81  0.93  0.83  0.91  0.87    0.81  0.93  0.79  0.86  0.85    0.86
  top10             0.89  1.00  0.87  0.93  0.92    0.97  1.00  0.87  0.89  0.93    0.93
  top25             0.82  0.92  0.78  0.91  0.86    0.68  0.92  0.83  0.87  0.82    0.84
  top50             0.71  0.88  0.83  0.89  0.83    0.79  0.88  0.67  0.83  0.79    0.81
J48                 0.80  0.93  0.81  0.90  0.86    0.77  0.93  0.81  0.89  0.85    0.86
  top10             0.87  1.00  0.87  0.98  0.93    0.85  1.00  0.87  0.89  0.90    0.91
  top25             0.80  0.92  0.72  0.87  0.83    0.71  0.92  0.89  0.87  0.85    0.84
  top50             0.75  0.88  0.83  0.86  0.83    0.77  0.88  0.67  0.91  0.81    0.82
DecisionTable       0.84  0.92  0.78  0.94  0.87    0.81  0.92  0.69  0.95  0.84    0.85
  top10             0.96  0.97  0.83  0.94  0.92    0.92  0.97  0.83  0.93  0.91    0.92
  top25             0.85  0.92  0.67  0.93  0.84    0.78  0.92  0.56  0.96  0.80    0.82
  top50             0.72  0.88  0.83  0.95  0.84    0.72  0.88  0.67  0.95  0.80    0.82
IBk                 0.80  0.93  0.63  0.87  0.81    0.83  0.97  0.72  0.89  0.85    0.83
  top10             0.90  1.00  0.78  0.94  0.91    0.95  1.00  0.86  0.94  0.94    0.92
  top25             0.81  0.96  0.44  0.87  0.77    0.87  0.96  0.72  0.94  0.87    0.82
  top50             0.69  0.83  0.67  0.81  0.75    0.67  0.95  0.58  0.79  0.75    0.75
ZeroR               0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top10             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top25             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top50             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
Grand Total         0.81  0.91  0.75  0.89  0.84    0.82  0.91  0.75  0.88  0.84    0.84

Table M.2: AUC-ROC results of EPL

                    Filter                          Wrapper
Classifier          DF    FW    GK    MD    AV      DF    FW    GK    MD    AV      GAV
RandomForest        0.90  1.00  0.86  0.97  0.93    0.92  0.99  0.84  0.98  0.93    0.93
  top10             0.95  1.00  0.98  0.98  0.98    0.99  1.00  0.92  1.00  0.98    0.98
  top25             0.93  0.99  0.93  0.97  0.96    0.96  1.00  0.97  0.99  0.98    0.97
  top50             0.81  1.00  0.66  0.96  0.86    0.79  0.98  0.63  0.94  0.84    0.85
NaiveBayes          0.90  0.95  0.88  0.96  0.92    0.92  0.98  0.88  0.98  0.94    0.93
  top10             0.92  0.99  0.97  0.95  0.96    0.95  1.00  1.00  0.97  0.98    0.97
  top25             0.90  0.94  0.94  0.96  0.94    0.93  0.97  0.99  0.98  0.97    0.95
  top50             0.86  0.92  0.73  0.98  0.87    0.88  0.98  0.66  0.97  0.87    0.87
Logistic            0.88  0.95  0.86  0.98  0.92    0.87  0.96  0.86  0.96  0.91    0.91
  top10             0.91  1.00  0.98  0.98  0.97    0.99  0.97  0.99  0.98  0.98    0.97
  top25             0.88  0.90  0.97  0.98  0.93    0.94  1.00  0.99  0.99  0.98    0.95
  top50             0.85  0.95  0.63  0.97  0.85    0.67  0.92  0.61  0.92  0.78    0.82
BayesNet            0.88  0.97  0.79  0.96  0.90    0.92  1.00  0.80  0.98  0.92    0.91
  top10             0.92  1.00  0.96  0.98  0.96    0.99  1.00  0.99  1.00  0.99    0.98
  top25             0.88  0.93  0.90  0.97  0.92    0.96  0.99  0.97  0.99  0.98    0.95
  top50             0.82  0.98  0.50  0.94  0.81    0.81  1.00  0.44  0.96  0.80    0.81
DecisionTable       0.83  0.97  0.66  0.94  0.85    0.84  0.97  0.72  0.96  0.87    0.86
  top10             0.89  1.00  0.82  0.95  0.92    0.93  1.00  0.82  0.97  0.93    0.92
  top25             0.86  0.96  0.67  0.96  0.86    0.83  0.96  0.84  0.98  0.90    0.88
  top50             0.75  0.94  0.50  0.92  0.78    0.76  0.94  0.50  0.93  0.78    0.78
J48                 0.81  0.98  0.69  0.89  0.85    0.83  0.98  0.76  0.89  0.87    0.86
  top10             0.90  1.00  0.93  0.92  0.94    0.85  1.00  0.93  0.91  0.92    0.93
  top25             0.91  0.96  0.71  0.88  0.87    0.86  0.96  0.86  0.93  0.90    0.88
  top50             0.64  0.98  0.44  0.87  0.73    0.78  0.98  0.50  0.85  0.78    0.76
IBk                 0.80  0.93  0.73  0.89  0.84    0.79  0.93  0.79  0.88  0.85    0.84
  top10             0.85  1.00  0.86  0.92  0.91    0.95  0.93  0.93  0.93  0.94    0.92
  top25             0.80  0.92  0.76  0.86  0.83    0.82  0.92  0.83  0.94  0.88    0.86
  top50             0.74  0.88  0.58  0.88  0.77    0.58  0.94  0.59  0.78  0.72    0.75
PART                0.82  0.96  0.68  0.90  0.84    0.79  0.96  0.69  0.90  0.83    0.84
  top10             0.90  1.00  0.86  0.94  0.92    0.94  1.00  0.86  0.94  0.93    0.93
  top25             0.88  0.96  0.75  0.91  0.87    0.70  0.96  0.71  0.90  0.82    0.85
  top50             0.67  0.92  0.44  0.86  0.72    0.74  0.92  0.50  0.84  0.75    0.74
ZeroR               0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top10             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top25             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top50             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
Grand Total         0.81  0.91  0.74  0.89  0.84    0.82  0.92  0.76  0.89  0.85    0.84


Table M.3: AUC-ROC results of La Liga

                    Filter                          Wrapper
Classifier          DF    FW    GK    MD    AV      DF    FW    GK    MD    AV      GAV
BayesNet            0.87  0.93  0.88  0.95  0.91    0.94  0.94  0.88  0.97  0.93    0.92
  top10             0.94  0.98  1.00  0.98  0.98    0.99  0.98  0.99  1.00  0.99    0.98
  top25             0.88  0.99  0.91  0.96  0.94    0.97  1.00  0.91  0.99  0.97    0.95
  top50             0.79  0.80  0.73  0.91  0.81    0.86  0.83  0.73  0.91  0.84    0.82
DecisionTable       0.85  0.89  0.79  0.93  0.86    0.86  0.87  0.81  0.92  0.87    0.87
  top10             0.95  0.93  0.96  0.98  0.96    0.93  0.93  0.96  0.96  0.94    0.95
  top25             0.85  0.96  0.68  0.97  0.86    0.88  0.96  0.73  0.97  0.88    0.87
  top50             0.74  0.78  0.73  0.84  0.77    0.79  0.72  0.73  0.84  0.77    0.77
IBk                 0.76  0.86  0.87  0.86  0.84    0.84  0.88  0.79  0.88  0.85    0.84
  top10             0.84  0.90  0.96  0.96  0.92    0.90  0.94  0.89  0.98  0.93    0.92
  top25             0.68  0.96  0.91  0.87  0.86    0.90  0.93  0.86  0.92  0.90    0.88
  top50             0.77  0.72  0.74  0.74  0.74    0.73  0.78  0.61  0.73  0.71    0.73
J48                 0.83  0.90  0.73  0.89  0.84    0.81  0.89  0.73  0.86  0.82    0.83
  top10             0.91  0.94  0.96  0.95  0.94    0.94  0.93  0.96  0.96  0.95    0.94
  top25             0.85  0.93  0.50  0.88  0.79    0.80  0.89  0.50  0.89  0.77    0.78
  top50             0.72  0.83  0.73  0.82  0.78    0.68  0.83  0.73  0.74  0.75    0.76
Logistic            0.90  0.90  0.91  0.97  0.92    0.94  0.93  0.93  0.97  0.94    0.93
  top10             0.93  0.90  0.99  0.99  0.96    0.99  0.93  0.98  0.99  0.98    0.97
  top25             0.89  0.99  0.96  0.98  0.95    0.93  1.00  0.95  0.98  0.97    0.96
  top50             0.87  0.81  0.79  0.94  0.85    0.89  0.85  0.86  0.95  0.89    0.87
NaiveBayes          0.87  0.92  0.87  0.95  0.90    0.95  0.93  0.88  0.96  0.93    0.92
  top10             0.91  0.98  0.99  0.95  0.96    0.99  0.99  0.99  0.95  0.98    0.97
  top25             0.88  0.97  0.95  0.95  0.94    0.97  0.99  0.93  0.97  0.96    0.95
  top50             0.83  0.82  0.68  0.93  0.82    0.89  0.80  0.71  0.95  0.84    0.83
PART                0.85  0.90  0.73  0.89  0.84    0.83  0.89  0.73  0.87  0.83    0.84
  top10             0.91  0.94  0.96  0.95  0.94    0.96  0.93  0.96  0.97  0.96    0.95
  top25             0.87  0.93  0.50  0.89  0.80    0.80  0.89  0.50  0.93  0.78    0.79
  top50             0.77  0.83  0.73  0.84  0.79    0.74  0.83  0.73  0.71  0.75    0.77
RandomForest        0.90  0.93  0.88  0.96  0.92    0.96  0.95  0.89  0.95  0.94    0.93
  top10             0.98  0.98  1.00  0.99  0.99    0.99  0.99  0.99  1.00  0.99    0.99
  top25             0.92  0.99  0.91  0.98  0.95    0.99  1.00  0.92  0.99  0.97    0.96
  top50             0.81  0.81  0.73  0.90  0.81    0.89  0.85  0.77  0.87  0.85    0.83
ZeroR               0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top10             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top25             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top50             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
Grand Total         0.81  0.86  0.80  0.88  0.84    0.85  0.86  0.79  0.88  0.84    0.84

Table M.4: AUC-ROC results of Ligue 1

                    Filter                          Wrapper
Classifier          DF    FW    GK    MD    AV      DF    FW    GK    MD    AV      GAV
Logistic            0.96  0.94  0.81  0.94  0.91    0.93  0.93  0.75  0.98  0.90    0.90
  top10             0.97  0.94  0.92  0.95  0.95    0.96  1.00  0.93  0.99  0.97    0.96
  top25             0.95  0.98  0.98  0.93  0.96    0.91  0.99  0.81  0.97  0.92    0.94
  top50             0.96  0.91  0.54  0.93  0.83    0.91  0.79  0.51  0.98  0.80    0.82
NaiveBayes          0.94  0.92  0.83  0.90  0.90    0.94  0.93  0.77  0.95  0.89    0.90
  top10             0.97  0.99  0.93  0.88  0.94    0.99  1.00  0.96  0.93  0.97    0.96
  top25             0.90  0.93  0.98  0.89  0.93    0.90  0.93  0.83  0.95  0.90    0.91
  top50             0.94  0.84  0.57  0.92  0.82    0.92  0.86  0.50  0.96  0.81    0.82
BayesNet            0.93  0.93  0.82  0.91  0.89    0.95  0.93  0.71  0.96  0.89    0.89
  top10             0.97  0.99  0.93  0.91  0.95    0.99  1.00  0.95  0.97  0.98    0.96
  top25             0.91  0.93  0.94  0.90  0.92    0.93  0.99  0.81  0.96  0.92    0.92
  top50             0.89  0.86  0.57  0.92  0.81    0.92  0.80  0.37  0.93  0.75    0.78
RandomForest        0.92  0.94  0.82  0.94  0.91    0.95  0.91  0.66  0.95  0.87    0.89
  top10             0.98  0.98  0.93  0.97  0.97    1.00  1.00  0.94  0.99  0.98    0.97
  top25             0.92  0.98  0.97  0.93  0.95    0.95  0.98  0.78  0.96  0.92    0.93
  top50             0.86  0.85  0.57  0.92  0.80    0.90  0.76  0.25  0.91  0.71    0.75
DecisionTable       0.84  0.87  0.72  0.89  0.83    0.87  0.88  0.64  0.91  0.83    0.83
  top10             0.90  0.96  0.88  0.90  0.91    0.90  0.94  0.88  0.94  0.92    0.91
  top25             0.79  0.90  0.73  0.90  0.83    0.86  0.89  0.50  0.90  0.79    0.81
  top50             0.83  0.76  0.57  0.87  0.76    0.86  0.80  0.55  0.88  0.78    0.77
IBk                 0.84  0.86  0.72  0.88  0.82    0.88  0.89  0.63  0.86  0.81    0.82
  top10             0.95  0.87  0.77  0.91  0.87    0.94  0.98  0.88  0.93  0.93    0.90
  top25             0.85  0.91  0.93  0.88  0.89    0.87  0.92  0.64  0.90  0.83    0.86
  top50             0.73  0.80  0.46  0.84  0.71    0.82  0.77  0.38  0.73  0.68    0.69
PART                0.87  0.85  0.70  0.85  0.82    0.76  0.89  0.73  0.82  0.80    0.81
  top10             0.93  0.87  0.84  0.91  0.89    0.94  1.00  0.92  0.89  0.94    0.91
  top25             0.87  0.97  0.73  0.85  0.85    0.66  0.88  0.77  0.78  0.77    0.81
  top50             0.80  0.71  0.53  0.79  0.71    0.67  0.80  0.50  0.81  0.70    0.70
J48                 0.87  0.85  0.67  0.83  0.81    0.76  0.89  0.76  0.82  0.81    0.81
  top10             0.97  0.87  0.84  0.89  0.89    0.94  1.00  0.90  0.85  0.92    0.91
  top25             0.83  0.97  0.73  0.86  0.84    0.66  0.88  0.86  0.84  0.81    0.83
  top50             0.81  0.71  0.46  0.75  0.68    0.69  0.80  0.50  0.77  0.69    0.69
ZeroR               0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top10             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top25             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top50             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
Grand Total         0.85  0.85  0.73  0.85  0.82    0.84  0.86  0.68  0.86  0.81    0.82


Table M.5: AUC-ROC results of Serie A

                    Filter                          Wrapper
Classifier          DF    FW    GK    MD    AV      DF    FW    GK    MD    AV      GAV
NaiveBayes          0.95  1.00  0.86  0.93  0.94    0.97  1.00  0.83  0.96  0.94    0.94
  top10             0.96  1.00  0.98  0.98  0.98    0.99  1.00  0.95  0.98  0.98    0.98
  top25             0.93  1.00  0.85  0.99  0.94    0.96  1.00  0.80  0.98  0.94    0.94
  top50             0.96  1.00  0.76  0.83  0.89    0.97  1.00  0.74  0.91  0.90    0.90
RandomForest        0.96  1.00  0.85  0.91  0.93    0.92  1.00  0.86  0.94  0.93    0.93
  top10             0.99  1.00  1.00  0.99  1.00    0.97  1.00  0.96  0.99  0.98    0.99
  top25             0.95  1.00  0.82  0.98  0.94    0.93  1.00  0.95  0.96  0.96    0.95
  top50             0.92  0.99  0.74  0.75  0.85    0.85  1.00  0.68  0.86  0.85    0.85
BayesNet            0.94  0.99  0.82  0.89  0.91    0.95  1.00  0.82  0.94  0.93    0.92
  top10             0.98  1.00  0.98  0.99  0.99    0.98  1.00  0.98  0.99  0.99    0.99
  top25             0.93  0.98  0.79  0.99  0.92    0.95  1.00  0.79  0.97  0.93    0.92
  top50             0.91  1.00  0.68  0.70  0.82    0.91  1.00  0.68  0.85  0.86    0.84
Logistic            0.97  0.90  0.87  0.85  0.90    0.91  0.99  0.86  0.89  0.91    0.91
  top10             0.98  1.00  1.00  0.96  0.98    0.97  1.00  0.97  0.98  0.98    0.98
  top25             0.95  0.98  0.83  0.98  0.93    0.96  1.00  0.86  0.99  0.95    0.94
  top50             0.98  0.74  0.80  0.62  0.79    0.80  0.98  0.74  0.69  0.80    0.79
IBk                 0.90  0.91  0.86  0.81  0.87    0.82  0.90  0.76  0.86  0.84    0.85
  top10             0.92  0.98  1.00  0.92  0.95    0.92  1.00  0.90  0.92  0.93    0.94
  top25             0.88  0.97  0.88  0.92  0.92    0.86  0.92  0.70  0.94  0.85    0.88
  top50             0.91  0.79  0.68  0.60  0.75    0.70  0.79  0.68  0.72  0.72    0.73
J48                 0.88  0.93  0.76  0.80  0.84    0.86  0.94  0.76  0.82  0.85    0.84
  top10             0.91  0.95  0.77  0.93  0.89    0.96  0.95  0.77  0.95  0.91    0.90
  top25             0.91  0.94  0.79  0.91  0.89    0.75  0.92  0.79  0.81  0.82    0.85
  top50             0.83  0.89  0.72  0.56  0.75    0.88  0.95  0.72  0.72  0.81    0.78
DecisionTable       0.90  0.89  0.71  0.80  0.83    0.90  0.92  0.71  0.86  0.85    0.84
  top10             0.94  0.95  0.77  0.96  0.90    0.94  0.94  0.77  0.97  0.90    0.90
  top25             0.86  0.87  0.65  0.92  0.83    0.86  0.83  0.65  0.86  0.80    0.81
  top50             0.90  0.84  0.72  0.53  0.75    0.90  0.99  0.72  0.76  0.84    0.79
PART                0.86  0.93  0.68  0.79  0.81    0.84  0.94  0.66  0.83  0.82    0.82
  top10             0.90  0.95  0.77  0.93  0.89    0.96  0.95  0.77  0.95  0.91    0.90
  top25             0.85  0.97  0.54  0.87  0.81    0.73  0.92  0.54  0.81  0.75    0.78
  top50             0.83  0.88  0.72  0.55  0.75    0.82  0.96  0.67  0.74  0.80    0.77
ZeroR               0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top10             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top25             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
  top50             0.50  0.50  0.50  0.50  0.50    0.50  0.50  0.50  0.50  0.50    0.50
Grand Total         0.87  0.90  0.77  0.81  0.84    0.85  0.91  0.75  0.84  0.84    0.84
