Evaluation on the Team Performance of MLB Applying PageRank Algorithm
Ann Appl Sport Sci In Press: e968. http://www.aassjournal.com; e-ISSN: 2322–4479; p-ISSN: 2476–4981. 10.29252/aassjournal.968

ORIGINAL ARTICLE

1Hyo-jun Yun, 1Jae-Hyeon Park, 1Jiwun Yoon, 1Minsoo Jeon*
1Department of Sports Science, Korea National Sport University, Seoul, Korea.
*. Corresponding Author: Minsoo Jeon, Ph.D E-mail: [email protected]
Submitted 19 January 2021; Accepted in final form 29 January 2021.

ABSTRACT
Background. The win-loss ranking model currently used in MLB is calculated solely from wins and losses, so we assume that a ranking system that considers the opponent's performance is necessary. Objectives. The purpose of this study is to propose the PageRank algorithm as a complement to the winning-ratio ranking when calculating team rankings in US MLB. Methods. PageRank values were calculated from the results of 4,861 MLB matches: 2,430 in the 2017 season and 2,431 in the 2018 season. Results. In both the 2017 and 2018 seasons, the ranking calculated with PageRank differed from the ranking calculated with winning ratio, and a comparison by league and district showed differences in performance between districts. In addition, when the predictive validity of the PageRank ranking and the winning-ratio ranking was calculated, the ranking calculated with the PageRank algorithm had relatively high predictive validity. Conclusion. This study confirmed the possibility of prediction in US MLB by applying the PageRank algorithm.

KEYWORDS: PageRank, MLB, Baseball, Ranking

INTRODUCTION
Evaluating the performance of teams and athletes in sporting events is a major interest of the leaders who run sports, the athletes, and the public who watch sports (1). There are various ways of evaluating teams and athletes in sporting matches, and they differ from sport to sport. In particular, the sports ranking system is a representative method for evaluating teams and players in sports events. A ranking system evaluates teams or players through match results (2). In addition, rankings are used in the sports field as information for determining the annual salaries of professional athletes, selecting national teams, and deciding Olympic participation rights, so they attract considerable interest (3, 4).

The method of calculating rankings differs according to the sport. In sports such as swimming, track and field, and weightlifting, ranking is calculated based on the game record. In Taekwondo, badminton, and tennis, ranking is calculated using points accumulated from match results. In baseball, soccer, and volleyball, ranking is calculated based on the difference between wins and losses, which is a traditional ranking method (5). It calculates the ranking by comparing the frequency of wins and losses with the total number of games played, and this is called the win-loss ranking model. Win-loss ranking models are widely used in league games where the number of matches is the same (1).

A representative example of a sport that applies the win-loss ranking model is U.S. Major League Baseball (MLB). Based on the win-loss ranking model, MLB gives the opportunity to compete in the division series to the first-ranked team in each of the three districts of the National League (Western, Central, and Eastern) and each of the three districts of the American League (Western, Central, and Eastern).
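The win-loss ranking model described above can be illustrated with a minimal sketch. The function name and the record format are assumptions made for illustration, and Team C is hypothetical. The 7-3 record for Team A follows the worked example given later in the article, and Team B's 3-7 record assumes the 10 games per team implied by that example.

```python
from typing import Dict, List, Tuple


def rank_by_winning_ratio(records: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Rank teams by wins / (wins + losses), highest ratio first."""
    ratios = {
        team: wins / (wins + losses)
        for team, (wins, losses) in records.items()
        if wins + losses > 0  # skip teams that have not played
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Illustrative (wins, losses) records; not real MLB data.
records = {"Team A": (7, 3), "Team B": (3, 7), "Team C": (5, 5)}
ranking = rank_by_winning_ratio(records)
```

Note that the ratio depends only on each team's own wins and losses; the strength of the opponents never enters the calculation.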
In the regular season, each major league team plays 76 games against teams in the same district and the same league, 66 games against teams in other districts of the same league, and 20 games against teams in the other league. After a total of 162 matches per team, the first-ranked team in each league district, that is, the team with the highest winning ratio, is given the opportunity to compete in the division series.

However, applying the win-loss ranking model in MLB raises one question. In MLB, the number of matches per team is the same, but the ranking is calculated without considering the opponent's performance level. For example, assuming that all 5 teams in the western district of a league perform well and the 5 teams in the central district perform relatively poorly, the value of a win may differ between districts. In other words, a win against the first-ranked team and a win against the tenth-ranked team have relatively different values.

This kind of problem may occur in matches within and across leagues as well as in matches within each district. However, the win-loss ranking model currently used in the major leagues is calculated simply from wins and losses, so we assume that a ranking system that considers the opponent's performance is necessary.

To solve this problem, we can apply the PageRank algorithm, which is based on network theory. The PageRank algorithm is the method Google uses to prioritize pages when it calculates a search result; a page's PageRank varies according to how frequently the web page is cited (6-9). The PageRank algorithm calculates ranking by considering the opponent's performance level rather than only the result of the match, so it has the advantage of compensating for the problems of the ranking currently calculated in MLB (1, 10). It weights wins according to the opponent's performance level, so that wins against relatively good teams count for more. In other words, even if Team A wins, its ranking may vary depending on how good the opponent's performance is (11-13).

Previous studies on the PageRank algorithm have used it in methodological work to develop optimization models (14-18), in applications to pedagogy (19-21), and in the sports field (22-26).

Therefore, this study aims to calculate MLB rankings using the PageRank algorithm. Specifically, team rankings are calculated by applying the PageRank algorithm to the results of the regular season matches in 2017 and 2018, and the predictive validity of the model is verified.

An Example of MLB Ranking Calculation Applying PageRank Algorithm. Looking at the match results of team A, they recorded 7 wins and 3 losses in total: 2 wins against each of teams B, C, and D, 1 win against team F, and 1 loss against each of teams E, G, and H. In the case of team E, they recorded 6 wins and 4 losses in total: 1 win against each of teams A, B, C, and D, 2 losses against team F, and 1 win and 1 loss against team H. This is illustrated in Figure 1. In Figure 1, the team sending an arrow is interpreted as the loser, and the team receiving the arrow is interpreted as the winner. Represented as a matrix, the game result data is as shown in Matrix 1. In Matrix 1, the rows give the frequency of victory and the columns give the frequency of defeat.

Figure 1. Schematization of Match Results between Teams.

Matrix 1. Initial Matrix Q

Q12 is the number of times Team A has beaten Team B, and Q51 is the number of times Team A has lost to Team E. In other words, in the matrix, a team's row gives its frequency of victory and its column gives its frequency of defeat. However, although the 1 in Q51 and the 1 in Q32 are each a single loss, their meanings can be interpreted differently, because the 1 in Q51 is 1 of the 3 losses that team A sent, whereas the 1 in Q32 is 1 of the 7 losses that team B sent. Therefore, the matrix must be converted to one that takes into account the total weight sent by each team, and it can be expressed as in Figure 2. In Figure 2, Q51 is converted to 1/3 = 0.333, and Q32 is converted to 1/7 = 0.143.

Figure 2. Matrix 2, Conversion Matrix Q; Matrix 3, Conversion Matrix Q considering Damping Factor

As the PageRank algorithm is calculated by the Markov Chain's radical root method, a repetitive operation is performed. However, if linked nodes do not send links to other nodes, a sink phenomenon occurs because PageRank continues to accumulate in the loop (6). To solve this problem, the PageRank algorithm applies the concept of the Damping Factor.

The Damping Factor represents the random walker: it is the probability of moving to any other node, and it models the act of randomly following a link on a web page or leaving the current page for a new random page for various reasons (6). Theoretically, it has a range of 0<d<1 and is generally set to .15 for analysis, but the Damping Factor is also adjusted depending on the situation.

... among the total weights. Specifically, team A wins 7 times in the total of 40 matches, so its value is 0.175 (7/40), and team B wins 3 times, so its value is 0.075 (3/40). π2 is produced by multiplying π1 by (Matrix 2), and π3 is produced by multiplying π2 by (Matrix 2). When this process is performed repeatedly, it reaches a convergence step in which the amount of change becomes insignificant, as shown in Table 2, and the teams can be ranked in descending order of the values in the eigenvector matrix.

π_{k+1} = Q π_k    (2)

In this example, team F is the 1st, team G is 2nd,