STATISTICAL LEARNING FOR

ESPORTS MATCH PREDICTION

A Thesis

Presented to the

Faculty of

California State Polytechnic University, Pomona

In Partial Fulfillment

Of the Requirements for the Degree

Master of Science

In

Mathematics

By

Kevin Bailey

2020

SIGNATURE PAGE

THESIS: STATISTICAL LEARNING FOR ESPORTS MATCH PREDICTION

AUTHOR: Kevin Bailey

DATE SUBMITTED: Spring 2020

Department of Mathematics and Statistics

Dr. Jimmy Risk, Thesis Committee Chair, Mathematics & Statistics

Dr. Alan Krinik, Mathematics & Statistics

Dr. Adam King, Mathematics & Statistics

ACKNOWLEDGMENTS

This thesis is dedicated to my family, friends, and professors who have supported me throughout my education. I would not have gotten through graduate school had it not been for many of you. A special thanks to my girlfriend, Rita, and her family for giving me an affordable place to stay during this stressful time. Another special thanks to Dr. Jimmy Risk for putting up with me in his office hours many times each week, every semester, throughout my graduate school career, as well as for agreeing to be my thesis advisor and dealing with my pestering throughout it.

ABSTRACT

Traditional sports have been a hot topic for many statistical and machine learning models, not only for prediction but also for making informed decisions on fantasy drafts. When only the result of a match matters, popular classification methods such as logistic regression and support vector machines (SVMs) are used. However, one statistical learning concept that has been gaining traction, but is still not widely used, is that of Gaussian processes (GPs). This paper aims to analyze many different models that are common in practice, give a quick rundown of their theory, and examine Gaussian processes to see how they compare. Our application is to fit these models to a data set of professional League of Legends matches and compare them to see which works best for this classification of winning or losing. Theoretically, Gaussian processes should be a great fit for this problem, since local observations help the predictions more than distant ones, and the domains of some very important factors (gold and gold difference) tend to be clustered around the average; in fact, it is nearly impossible to stray extremely far from that average.

Contents

Signature Page ii

Acknowledgements iii

Abstract iv

List of Tables viii

List of Figures ix

1 Introduction 1

2 Classification Problem and Assessment Metrics 5

2.1 Classification Problem ...... 5

2.2 Classification Metrics ...... 6

2.2.1 Basic Terms ...... 6

2.2.2 Accuracy ...... 7

2.2.3 AUC and ROC Curve ...... 8

2.3 Out of Sample Prediction ...... 9

2.3.1 Validation Set ...... 10

2.3.2 K-Fold Cross-Validation ...... 11

2.4 Subset Selection ...... 13

2.4.1 Best Subset Selection ...... 13

2.4.2 Forward Stepwise Selection ...... 14

3 Classification Models 16

3.1 Logistic Regression ...... 16

3.2 LASSO for Classification ...... 17

3.3 Linear and Quadratic Discriminant Analysis ...... 18

3.3.1 Linear Discriminant Analysis ...... 18

3.3.2 Quadratic Discriminant Analysis ...... 20

3.4 Support Vector Machine ...... 21

3.4.1 Hyperplane Classifiers ...... 21

3.4.2 Support Vector Classifiers ...... 21

3.4.3 Support Vector Machines ...... 22

3.5 Gaussian Processes ...... 24

3.5.1 Mean and Covariance Priors ...... 28

4 League of Legends Dataset 30

4.1 Dataset ...... 30

4.2 Features ...... 32

5 Results 34

5.1 Fitting and Diagnostics ...... 34

5.1.1 Logistic Regression ...... 35

5.1.2 Linear and Quadratic Discriminant Analysis ...... 39

5.1.3 LASSO ...... 40

5.1.4 Support Vector Machine ...... 42

5.1.5 Gaussian Processes ...... 43

5.1.6 Training Set Results ...... 45

5.2 Test Set Evaluation ...... 46

6 Conclusion 50

6.1 Discussion ...... 50

6.2 Future Work ...... 51

6.3 Final Words ...... 53

Bibliography 53

A League of Legends Game Concepts 57

A.1 League of Legends Championship Series ...... 60

B Summarized Data Descriptions 62

List of Tables

4.1 First 12 rows and 5 columns of raw data ...... 31

4.2 First 4 rows and 5 columns of summarized data ...... 31

5.1 Logistic regression metrics using Forward Stepwise Selection on different bases ...... 35

5.2 Forward Stepwise Selection logistic regression model’s coefficients . 37

5.3 LDA and QDA metrics on the training set ...... 40

5.4 LASSO metrics on the training set ...... 40

5.5 SVM accuracies for different kernels and all predictors on the training set ...... 42

5.6 GP metrics on the training set ...... 44

5.7 Summarized metrics and confusion matrices of best performing models on the training set ...... 46

5.8 Performance metrics of candidate models on the test set ...... 47

B.1 Column names and description of the binary predictors used. . . . . 62

B.2 Column names and description of the differences in creep score, gold, and experience for each role, as well as the response of the data ...... 63

List of Figures

2.1 Example of a good and bad ROC curve ...... 9

4.1 Scatterplot matrix of certain predictors ...... 32

5.1 ROC Curves for the two logistic regression models on the training set. 36

5.2 Scatterplot of the two most important predictors from logistic regression ...... 38

5.3 ROC curves for LDA and QDA on the training set...... 39

5.4 ROC curves on the training set for the LASSO models ...... 41

5.5 ROC curves on the training set for different SVM Kernels ...... 43

5.6 ROC curves on the training set for different Gaussian processes using the Gaussian kernel ...... 45

5.7 ROC curves on the test set for all of the best performing models . . 48

Chapter 1

Introduction

Esports, short for electronic sports, is a form of competitive sport played through video games. Commonly, it takes the form of organized, multiplayer video game competitions, contested either individually or in teams. In the last decade, esports became a significant factor in the video game industry, leading many developers to design games with esports in mind. The video game genres most commonly associated with esports are first-person shooters (FPS), fighting games, real-time strategy (RTS), and multiplayer online battle arenas (MOBA). Easy access to streaming platforms such as YouTube, Twitch, and Douyu lets viewers from all over the world tune in and watch, as well as stream themselves, and esports viewership has risen to the point of being noticed by traditional sports networks such as ESPN. During the 2019 League of Legends (LoL) World Championship, records were broken when viewership reached a peak of 1.743 million on Twitch alone, beating out the previous record of 1.618 million held by Fortnite's "The End" event [1]. Reports say that, overall, the LoL 2019 World Championship finals peaked at 44 million concurrent viewers across all platforms and were broadcast in 16 different languages [11]. Furthermore, the average salary for a player in the League of Legends Championship Series (LCS) (see Appendix A.1) was $105,000 in January 2017 and had tripled to roughly $300,000 by April 2019 [5]. From a monetary standpoint, it is crucial to identify what factors can influence game outcomes.

Maximizing spectator enjoyment is an additional consideration, which can be supported by providing real-time match predictions based on current game statistics; this has been done for Defense of the Ancients 2 (DOTA2), a game extremely similar to LoL, in [16].

The match prediction problem, however, is not an easy one. A single game of LoL contains a multitude of real-time variables, any of which can decide the outcome of a game. Aside from player/team performance, there are many in-game features that have a drastic impact and can turn the game in the other team's favor or amplify a team's current lead. We provide a quick glance into some of the basic ideas and complexities involved in a game of LoL:

• The game consists of two teams, each team consisting of five players.

• Each of the ten players individually chooses from a pool of 148 champions, all of which have individual strengths and weaknesses, and some perform better than others against certain champions. In addition, champions typically have different power thresholds at various points in the match. The champion choice is fixed throughout a match.

• Players gain gold and experience throughout the game, which also grant power to their champion.

• Major events happen at varying points throughout the game, whose outcomes can impact a team's total gold and experience.

• All of this data is available continuously to a live spectator of the game.

Further details about LoL's game mechanics are provided in Appendix A. One could simply view this as a high-dimensional data problem in need of data reduction, but any expert player or analyst can point out obvious implications that some of these variables have for how likely a team is to win at, say, the ten or fifteen minute mark. Thus, in this high-dimensional problem, interpretability and/or prior beliefs are crucial factors.

There is a clear similarity to traditional sports, from which we first look to borrow ideas. In particular, predicting events and outcomes from past game results has long been a popular area of research in sports, with many different methods and approaches. In football, Min [3] uses Bayesian inference and rule-based reasoning along with in-game time series to provide live win percentages. Many data mining techniques have also been used for sports prediction, as displayed in a study by Cao [7] on basketball outcome prediction; in that study, models were constructed using a variety of machine learning algorithms such as a simple logistic classifier, support vector machines (SVMs), naive Bayes, and artificial neural networks (ANNs). Interestingly enough, the simple logistic classifier resulted in the highest accuracy. DOTA2, mentioned above, is similar to LoL in almost every component and is its direct competitor, leading to many parallels being drawn between the two games. In the DOTA2 literature [16], logistic regression was trained using just champion selection, and a separate model was trained using champion, player, and win rate; neural networks were fit on the same data, and random forests were trained on champion selection along with win rate. The metric used in [16] was cross-validation accuracy, which we will use here as well.

Another popular machine learning method involves Gaussian processes (GPs), which have not been seen much in the esports prediction literature thus far. Typically used in regression and originating in geostatistics [2], GPs have performed quite well as a regression tool. They can also be used for classification by employing a GP as the prior for the probability of a categorical response.

This thesis aims to build upon the previous work done in DOTA2 and traditional sports. Specifically, we explore LoL match prediction using different types of models and in-game information. The main goal is to produce an accurate predictive model, but it is also important to recognize which predictors have a large impact on the outcome of a match. In particular, we seek to produce match predictions based on data available at the fifteen minute mark. This has been done in DOTA2, as discussed in [16], but not yet in LoL. We believe such predictions would be valuable to a LoL game broadcast and can be made easily and accurately with the available data. The rest of the thesis outlines the task at hand. In Chapter 2 we discuss the general framework of probabilistic classification and provide assessment metrics for general classification models. Chapter 3 discusses the theoretical details behind the models we use in this analysis. Chapter 4 introduces the data set to be analyzed and investigates basic relationships between the variables. Chapter 5 discusses the model fitting, diagnostics, and prediction results. We conclude in Chapter 6.

Chapter 2

Classification Problem and Assessment Metrics

2.1 Classification Problem

Let D = ((x_1, y_1), ..., (x_n, y_n)) be a design, where x_i ∈ ℝ^p and y_i ∈ {0, 1} is a binary response for a given x_i. The purpose of classification is to produce a binary prediction y at a location x, based on the observed design D. The usual means to do this is to directly model a probability

\[ p(x) \equiv p(y = 1 \mid D), \tag{2.1} \]

i.e. fit a model that produces the probability of obtaining y = 1 given a data set D. The resulting classification model is called a classifier. To perform the actual classification prediction ŷ of y, one typically fits the model p(x) in Equation (2.1) and makes the class assignment for y (i.e. y = 1 or y = 0) based on whether p(x) surpasses a certain threshold η, producing the class estimate ŷ. For example, if η = 0.5, then p(x) > 0.5 corresponds to the prediction ŷ = 1, since obtaining y = 1 is more likely than y = 0; conversely, p(x) < 0.5 produces ŷ = 0. Using η = 0.5 makes sense for most binary response cases because there are only two classes, so whichever class has probability above 0.5 has a higher chance of occurring than the opposite class.
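To make the thresholding rule concrete, the following minimal Python sketch (illustrative only; not the software used in this thesis) converts a vector of fitted probabilities p(x) into class labels ŷ for a chosen threshold η. The probability values, the function name classify, and the default η = 0.5 are assumptions made for the example.

import numpy as np

def classify(prob, eta=0.5):
    """Turn fitted probabilities p(x) into class labels: y-hat = 1 if p(x) > eta."""
    prob = np.asarray(prob)
    return (prob > eta).astype(int)

# Hypothetical fitted probabilities at five new locations x.
p_hat = np.array([0.91, 0.47, 0.55, 0.12, 0.50])
print(classify(p_hat))            # eta = 0.5       -> [1 0 1 0 0]
print(classify(p_hat, eta=0.4))   # lower threshold -> [1 1 1 0 1]

Lowering η trades false negatives for false positives, which is exactly the trade-off formalized by the metrics in the next section.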

2.2 Classification Metrics

Suppose that we want to assess a classifier p(x) on a general data set D′ = ((x′_1, y′_1), ..., (x′_m, y′_m)), which may or may not be equal to D. Here, m is the number of points at which p(x) is to be evaluated. In the case where D′ = D, we are assessing the in sample classification ability of p(x). Whenever x′ ∉ D, i.e. x′ is not in the design, p(x′) is an out of sample prediction, so y′_i is assumed unknown. We discuss common metrics used to quantify the performance of a classifier.

2.2.1 Basic Terms

A false positive (FP) is when we predict a positive result, but the result is actually a negative,

\[ \mathrm{FP}_i = \mathbf{1}_{\{y'_i \neq \hat{y}'_i,\, y'_i = 0\}}, \]

where \( \mathbf{1}_{\{y'_i = \hat{y}'_i\}} \) is the indicator function of the event \( \{y'_i = \hat{y}'_i\} \):

\[ \mathbf{1}_{\{y'_i = \hat{y}'_i\}} = \begin{cases} 0 & \text{if } y'_i \neq \hat{y}'_i \\ 1 & \text{if } y'_i = \hat{y}'_i \end{cases} \tag{2.2} \]

A false negative (FN) is when we predict a negative result, ŷ = 0, but the result is actually a positive,

\[ \mathrm{FN}_i = \mathbf{1}_{\{y'_i \neq \hat{y}'_i,\, y'_i = 1\}}. \]

Both of these are incorrect predictions, which a good classifier should seek to minimize. Similarly, a good classifier should aim to maximize the number of true negatives (TN), i.e. when ŷ′_i = y′_i = 0, and true positives (TP), ŷ′_i = y′_i = 1, both of which are correct predictions.

To produce a metric for a classifier's performance over an entire data set, we define the following:

• The False Positive Rate (FPR) is the ratio of false positives to the total number of negatives, given by

\[ \mathrm{FPR} = \frac{\sum_{i=1}^{m} \mathrm{FP}_i}{\sum_{i=1}^{m} \left[\mathrm{FP}_i + \mathrm{TN}_i\right]} = \frac{\sum_{i=1}^{m} \mathbf{1}_{\{\hat{y}'_i = 1,\, y'_i = 0\}}}{\sum_{i=1}^{m} \left[\mathbf{1}_{\{\hat{y}'_i = 1,\, y'_i = 0\}} + \mathbf{1}_{\{\hat{y}'_i = y'_i = 0\}}\right]}. \tag{2.3} \]

• The specificity is 1 − FPR, which measures the proportion of actual negatives

that are correctly identified as such.

• The True Positive Rate (TPR), also called the sensitivity, is the ratio of true positives to the total number of positives, given by

\[ \mathrm{TPR} = \frac{\sum_{i=1}^{m} \mathrm{TP}_i}{\sum_{i=1}^{m} \left[\mathrm{TP}_i + \mathrm{FN}_i\right]} = \frac{\sum_{i=1}^{m} \mathbf{1}_{\{\hat{y}'_i = y'_i = 1\}}}{\sum_{i=1}^{m} \left[\mathbf{1}_{\{\hat{y}'_i = y'_i = 1\}} + \mathbf{1}_{\{\hat{y}'_i = 0,\, y'_i = 1\}}\right]}. \tag{2.4} \]

With regard to these metrics, a classifier should aim to minimize the FPR (equivalent to maximizing the specificity) and maximize the TPR.
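As a concrete illustration of these definitions, the short Python sketch below (with made-up labels and predictions; not the thesis software) counts false positives, true negatives, true positives, and false negatives on a hypothetical evaluation set and forms the FPR and TPR of Equations (2.3) and (2.4).

import numpy as np

# Hypothetical true labels y' and predictions y-hat on an evaluation set of size m = 8.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 1])

fp = np.sum((y_pred == 1) & (y_true == 0))   # false positives
tn = np.sum((y_pred == 0) & (y_true == 0))   # true negatives
tp = np.sum((y_pred == 1) & (y_true == 1))   # true positives
fn = np.sum((y_pred == 0) & (y_true == 1))   # false negatives

fpr = fp / (fp + tn)          # Equation (2.3): false positive rate
tpr = tp / (tp + fn)          # Equation (2.4): true positive rate (sensitivity)
specificity = 1 - fpr

print(fp, tn, tp, fn)            # 2 2 3 1
print(fpr, tpr, specificity)     # 0.5 0.75 0.5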

2.2.2 Accuracy

Accuracy is the proportion of correct predictions,

\[ \mathrm{Acc} = \frac{\sum_{i=1}^{n} \mathbf{1}_{\{y_i = \hat{y}_i\}}}{n}. \tag{2.5} \]

This gives the percentage of correct predictions from a model, i.e. the number of true positives and true negatives out of all predictions made; we want this number to be high. Equivalently, one could report the misclassification error rate, defined as 1 − Acc; this number we would want to be small.
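Continuing the same hypothetical labels as above, a minimal Python sketch of Equation (2.5) computes the accuracy and the misclassification error rate directly:

import numpy as np

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 1])

acc = np.mean(y_true == y_pred)   # Equation (2.5): proportion of correct predictions
print(acc, 1 - acc)               # 0.625 0.375  (accuracy, misclassification rate)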

2.2.3 AUC and ROC Curve

The Receiver Operating Characteristic (ROC) curve is constructed by plotting the FPR on the horizontal axis and the TPR on the vertical axis. A desirable classifier aims to maximize the TPR while simultaneously minimizing the FPR. One can show that the TPR increases as the FPR increases (specifically, by varying the threshold 0 ≤ η ≤ 1), so the hope is that the TPR increases as rapidly as possible as the FPR increases. Hence, the area under the ROC curve (AUC) should be maximized. Ideally, we want our ROC curve to hug the top left of the graph, where the TPR is 1 and the FPR is 0, giving an AUC of 1, meaning all positives are captured with no false positives. Because of this, when evaluating a model's performance on the basis of AUC, the closer it is to 1, the better the model is predicting.

An example of two ROC curves can be seen in Figure 2.1, showing that the better-performing model in red has its TPR increasing very rapidly from the start, hugging the top left of the plot. The poorly performing model in blue has its TPR go up at almost a one-to-one rate with the FPR, performing no better than a random chance classifier (the diagonal line), which assigns each instance a score sampled from the uniform distribution between 0 and 1 and then classifies it based on η.
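The ROC curve can be traced out numerically by sweeping the threshold η over [0, 1] and recording the resulting (FPR, TPR) pairs, with the AUC then approximated by the trapezoidal rule. The following Python sketch, using made-up scores and labels, is one minimal way to do this; it is not the exact procedure used for the results later in this thesis.

import numpy as np

def roc_points(y_true, prob, n_thresholds=101):
    """Sweep the threshold eta from 1 down to 0 and return the (FPR, TPR) pairs."""
    y_true = np.asarray(y_true)
    fprs, tprs = [], []
    for eta in np.linspace(1.0, 0.0, n_thresholds):
        y_pred = (prob >= eta).astype(int)
        fp = np.sum((y_pred == 1) & (y_true == 0))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        fprs.append(fp / (fp + tn))
        tprs.append(tp / (tp + fn))
    return np.array(fprs), np.array(tprs)

# Hypothetical fitted probabilities and true labels.
prob = np.array([0.95, 0.80, 0.70, 0.65, 0.40, 0.30, 0.20, 0.05])
y_true = np.array([1, 1, 0, 1, 1, 0, 0, 0])

fpr, tpr = roc_points(y_true, prob)
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)   # trapezoidal rule
print(round(auc, 3))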

[Figure: two ROC curves, a decent model (AUC-ROC = 0.83) and a bad model (AUC-ROC = 0.5), plotted as true positive rate against false positive rate.]

Figure 2.1: Example of two ROC curves, one model that performs fairly well drawn in red and one poorly performing model shown in blue. The poorly performing model has an AUC similar to that of a random chance classifier.

2.3 Out of Sample Prediction

In general, a classifier that performs well in sample may not perform well out of

sample. In practice, since out of sample data is not readily available, one needs to use the full data set to both fit the model and assess out of sample performance.

We reiterate the notation defined above, with D = ((x_1, y_1), ..., (x_n, y_n)) being the design or training set, and D′ = ((x′_1, y′_1), ..., (x′_m, y′_m)) being the set on which a classifier p(x) is evaluated. It is possible that D and D′ overlap, or are identical.
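As a concrete illustration of this notation, the following Python sketch (with a randomly generated, purely hypothetical data matrix) partitions a data set into a training design D and a held-out set D′ with no overlap, anticipating the test-set convention adopted below; the 80/20 split fraction is arbitrary.

import numpy as np

rng = np.random.default_rng(0)

# Purely hypothetical data: n observations, p predictors, binary response.
n, p = 100, 4
X = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n)

# Randomly assign 80% of the points to the training design D and the rest to D'.
perm = rng.permutation(n)
n_train = int(0.8 * n)
train_idx, test_idx = perm[:n_train], perm[n_train:]

X_train, y_train = X[train_idx], y[train_idx]   # D: used to fit the classifier
X_test, y_test = X[test_idx], y[test_idx]       # D': held out, no overlap with D

print(X_train.shape, X_test.shape)   # (80, 4) (20, 4)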

However, as the aim of this paper is to produce a model that is optimal in predicting out of sample, we henceforth denote D′ as the test set, chosen such that D ∩ D′ = ∅, that is, there is no overlap. In this regard, the names of these sets are accurate: the model is fitted (trained) on D, and its out of sample performance is tested on D′. We do note that the error metrics above can still be assessed on D; however, this can result in overfitting, since the predictions ŷ_i are themselves correlated with the training observations y_i. In fact, if y′_i is a hypothetical out of sample response associated with x_i, then one can generally show that

\[ \mathbb{E}\left[\sum_{i=1}^{n} (y'_i - \hat{y}_i)^2\right] = \mathbb{E}\left[\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\right] + \frac{2}{n}\sum_{i=1}^{n} \operatorname{Cov}(\hat{y}_i, y_i), \tag{2.6} \]