Predicting the Matches Entertainment on the Example of the Russian Premier League Using Machine Learning Methods and Predictions Application for Sports Broadcasts Organization

Sergey Gorshkov 1,2[0000-0001-5958-5224], Anastasia Chernysheva 1,3[0000-0003-0812-8941], and Ilya Ivanov 1,3[0000-0003-4205-590X]

1 Lomonosov Moscow State University, Leninskie gory, 1, GSP-1, Moscow, 119991, Russia [email protected]
2 National Research University Higher School of Economics, 20 Myasnitskaya str, Moscow, 101000, Russia [email protected]
3 Skolkovo Institute of Science and Technology, Bolshoy Boulevard 30, bld. 1, Skolkovo, 121205, Russia [email protected]

Abstract. The calendar of the football season of the Russian Premier League, like those of other leading European championships, is designed in such a way that several matches can take place at the same time. This situation has become more common after the pause associated with the COVID-19 pandemic, due to the densification of the match schedule. The broadcaster needs to determine which match will be shown live on the main channel. The broadcaster can also provide access to all broadcasts of the championship round on paid channels and recommend the most spectacular matches to viewers. Nowadays the start time of matches is chosen by agreement between the League and television, so it would be convenient to schedule the potentially most spectacular matches in prime time. This paper introduces the concept of the entertainment index, which takes into account goals and other important events of the match. The value of the entertainment index for upcoming matches is predicted using a machine learning model based on historical data. As a result, we obtain a model that can predict the entertainment of a match and help to choose the most interesting game from the viewer's perspective.

Keywords: football; Premier League; machine learning; CatBoost; matches entertainment; mathematical modeling; linear regression.
Proceedings of the 10th International Scientific and Practical Conference named after A. I. Kitov "Information Technologies and Mathematical Methods in Economics and Management (IT&MM-2020)", October 15-16, 2020, Moscow, Russia. © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

1 Introduction

Match entertainment is a very subjective concept. Different people like different football. Some prefer an abundance of goals and dangerous moments, some follow tactical coaching moves and football “chess”, and some savor a double-edged game with a lot of action in the middle third of the field. Modern TV and Internet broadcasters are ready to offer viewers a wide range of football matches to watch, from both the national championship and international competitions. In many European championships, the start time of matches is chosen from certain time slots agreed with the main partner broadcasters and is based on regional specifics, the schedule of the participating teams, and the broadcasters' interests.

Various metrics are used to measure viewers' interest: the number of views, ratings, retention, and so on. However, these are indirect indicators that are not directly related to football and only evaluate the reaction of viewers. Usually, top matches are selected to be shown at the most convenient time and in the best format: matches in which high-status opponents with a large fan audience play and which will be watched by the largest number of people. However, there are not many such matches, and among the others one must also select the highest- and lowest-priority matches to show. In addition, many online cinemas and Internet TV services have recommendation systems that suggest the most interesting matches that the audience is likely to enjoy.
The second problem is that several matches often fall in the same time slot or in overlapping time slots, and TV broadcasters need to decide how to show these matches to viewers. Different strategies can be chosen for showing simultaneous matches, for example, live broadcasting of different matches on different channels of a media holding, live broadcasting of individual matches on a federal channel, etc. In Russia, in the period after the pause associated with the coronavirus pandemic, a large number of matches of the Russian football championship are shown live on the public TV channel MATCH TV. Due to the high density of the schedule, many games are held simultaneously, and therefore the TV channel must choose which match to show live on the federal channel. Of course, this choice is based on many factors. Firstly, there are the current ranking of the teams, the competitive value of the match, the intensity of the rivalry between the opponents, the current form of the opponents, and so on. Secondly, there is the size of the estimated match audience, which correlates with the total number of fans of a particular Premier League team. Thirdly, there are considerations of ensuring that no team's matches are never or only rarely shown on a public channel. Of course, there are matches of the leading teams of the championship and derbies, which are likely to attract the greatest audience interest, including among neutral fans. Selecting among the remaining matches becomes a difficult task.

In this study, we make the reasonable assumption that, all other things being equal, viewers are more interested in watching spectacular football. Therefore, to recommend a match on the Internet or to show it live, we propose to learn to predict the entertainment of a match based on team statistics and related match factors.
Currently, there are no solutions to similar problems in the scientific literature, so Section 2 provides an overview of related areas, looking in detail at existing approaches to predicting various factors related both to the game and to the match organization, including evaluating entertainment based on other factors. Section 3 introduces the match entertainment index and describes the developed algorithm, which predicts the entertainment of a Premier League match with acceptable accuracy based on historical data for the opposing teams. Section 4 describes the data sets and the testing of the algorithm. Section 5 discusses the results of the algorithm and its correctness. Section 6 summarizes the work done.

2 Literature Review

Most sports-related research aims to predict the outcome of matches or some match statistics. This is easily explained from the commercial side: large incomes in sports for various individuals and legal entities are associated with betting. Of course, only a small part of the research is public. What researchers most commonly try to predict is the outcome of a match, based on many different characteristics [1 - 7]. In [4], the ELO rating is used (it is also used in FIDE chess rankings and in other sports [8]). It can be concluded from the article that statistical loss functions are more effective than economic measures when using various forecasting methods. ELO ratings are useful for encoding information about past results. As applied to football, the difference in rating is very significant for predicting the match result. The authors concluded that using the ELO rating to assess the strength of a team is justified.

In [6], a Bayesian network is considered for predicting the results of football matches involving FC Barcelona. Many factors that influence the outcome of a football match were identified.
Thus, the process of selecting the model and the features to explore sets the boundaries of what can be discovered. The authors of the article group the factors into two types: psychological and non-psychological. Table 1 shows the main factors for predicting football matches that the authors used in the Bayesian network.

Table 1. The most important factors for predicting the result of football matches in [6].

Psychological               Non-psychological
Weather                     Average of players age
History of 5 last games     Injured main players
Result against for teams    Average match in week
Home game                   Performance of main players
Ability front team          Performance of all players
Psychological state         Average goal in all home
                            Average goal for home

The accuracy of predicting the outcome of the match was 92%. This is a very high result for this type of research; however, it can be explained by the fact that only one team is considered, and that team is much more likely to win than to drop points.

A wide range of approaches and machine learning algorithms is considered in [5, 7, 9, 10]. For example, the authors of [10] conduct experimental studies with the following models: Naïve Bayes, Bayesian networks, LogitBoost, the k-nearest neighbors algorithm, and random forest. However, none of these models usually achieves more than 60% accuracy. Thus, based on this and other articles, we can conclude that an accuracy close to 60% is quite acceptable, since a football game is a rather non-deterministic process with many random events, such as red cards and penalties, so it is not possible to predict the result with high accuracy.

There are several articles in the scientific literature on evaluating the entertainment of a match. In [11], a linear regression model with GLM encoding was chosen to predict the number of viewers based on social network (Facebook) activity during the match, as well as activity in the two weeks before the match.
Besides that, according to the authors, photos of the club or the national team on the social network page generate the greatest audience activity and fuel interest in the match.
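For reference, the ELO rating mentioned above in connection with [4] can be illustrated with a minimal sketch. It assumes the standard chess-style formulation (logistic expected score with a scale of 400 points and a fixed K-factor); the function names and the value k = 20 are illustrative assumptions and are not taken from [4], where football-specific adjustments may be applied.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of team A against team B (standard Elo formula)."""
    # Scale of 400 is the conventional chess value; assumed here for illustration.
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return updated ratings (A, B) after one match.

    score_a is 1.0 for a win of A, 0.5 for a draw, 0.0 for a loss.
    k is the K-factor; 20 is an assumed illustrative value.
    """
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated teams: each is expected to score 0.5.
    print(elo_expected(1500.0, 1500.0))
    # A wins: A gains k * 0.5 points and B loses the same amount.
    print(elo_update(1500.0, 1500.0, 1.0))
```

The rating difference is the only input to the expected score, which is consistent with the remark that the difference in rating is what carries the predictive information about the match result.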