Can Complex Network Metrics Predict the Behavior of NBA Teams?
Pedro O.S. Vaz de Melo, Virgilio A.F. Almeida, Antonio A.F. Loureiro
Federal University of Minas Gerais
31270-901, Belo Horizonte, Minas Gerais, Brazil

ABSTRACT
The United States National Basketball Association (NBA) is one of the most popular sports leagues in the world and is well known for driving a multimillion-dollar betting market that uses the countless statistical data generated after each game to feed the wagers. This leads to the existence of a rich historical database that motivates us to discover implicit knowledge in it. In this paper, we use complex network statistics to analyze the NBA database in order to create models that represent the behavior of teams in the NBA. Results of complex network-based models are compared with box score statistics, such as points, rebounds and assists per game. We show that box score statistics play a significant role for only a small fraction of the players in the league. We then propose new models for predicting a team's success based on complex network metrics, such as clustering coefficient and node degree. Complex network-based models present good results when compared to box score statistics, which underscores the importance of capturing network relationships in a community such as the NBA.

Categories and Subject Descriptors
H.2.8 [Information Systems]: Database Management - Database Applications, Data mining; G.3 [Mathematics of Computing]: Probability and Statistics - Statistical computing

General Terms
Theory

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD'08, August 24–27, 2008, Las Vegas, Nevada, USA.
Copyright 2008 ACM 978-1-60558-193-4/08/08 ...$5.00.

1. INTRODUCTION
The United States National Basketball Association (NBA) was founded in 1946 and has since been well known for its efficient organization and its high-level athletes. After each game, a large amount of statistical data is generated describing the performance of each player who took part in the match. In the United States, these statistics feed a betting market estimated at tens of billions of dollars. In 2006, the Nevada State Gaming Control Board reported $2.4 billion in legal sports wagers [10]. Meanwhile, in 1999, the National Gambling Impact Study Commission reported to Congress that more than $380 billion is illegally wagered on sports in the United States every year [10]. The generated statistics are used, for instance, by many Internet sites to aid gamblers, giving them more reliable predictions of the outcome of upcoming games.

The statistics are also used to characterize the performance of each player over time, dictating their salaries and the duration of their contracts. Kevin Garnett [18] averaged 22.4 Points Per Game (PPG), 12.8 Rebounds Per Game (RPG) and 4.1 Assists Per Game (APG) in the 2006/2007 season, which made his salary the highest in the 2007/2008 season: $23.75 million. On the other hand, Anderson Varejão [18], who had 6.0 PPG, 6.0 RPG and 0.6 APG in the 2006/2007 season, asked in the following season for a six-year, $60 million contract and had his request denied. Robert Horry [18], who ranks 7th among the players with the most NBA championships, with seven titles for three different teams, has career averages of 7.2 PPG, 4.9 RPG and 2.2 APG and, in 2007, the year of his last title, had a salary of $3.315 million. Two simple questions arise from these observations. First, would Anderson Varejão have been overpaid had his request been accepted? Second, is Robert Horry underpaid, given that he has won a title with every team he has played for?

The first question was answered by Henry Abbott, a Senior Writer at ESPN.com, in his blog True Hoop [1]. He said that PPG, RPG and APG only measure the actions of a player within a second or two of someone shooting the ball. The rest of the time, points and rebounds measure nothing. He also said, in answer to the first question, that these statistics work against Anderson Varejão, who is one of the best players in the NBA in the adjusted plus/minus statistic. The plus/minus statistic keeps track of the net changes in score when a given player is either on or off the court, and it does not depend on box score statistics such as PPG, APG and RPG [2, 14]. This indicates that Anderson Varejão could have reasonably asked for a $60 million contract. Moreover, in support of Anderson Varejão, we point out that after he finally reached an agreement with the Cavaliers, the team's record went from 9 wins and 11 losses to 15 wins and 7 losses, with Anderson Varejão averaging 7.8 PPG, 8.5 RPG and 1.2 APG before his injury.

For the second question we could not find an answer. Robert Horry has played 14 seasons, averaging a title for every two years played and for every team he has played for. Is he a lucky guy who always plays with the best, or does he really make a difference? One thing we know for sure is that simple statistics such as PPG, RPG and APG should not be the only metrics used to predict the success of a player or a team. Because the statistics are treated separately and the players are treated individually, little is known about whether there is any relationship among them. History has shown players with insignificant box score statistics playing significant roles in a team's success.

A possible way to study the collective behavior of social agents is to apply the theory of complex networks [19, 22]. A network is a set of vertices, sometimes called nodes, with connections between them, called edges. A complex network is a network characterized by a large number of vertices and edges that follow some pattern, like the formation of clusters or of highly connected vertices, called hubs [4]. While in a simple network, with at most hundreds of vertices, the human eye is an analytic tool of remarkable power, in a complex network this approach is useless. Thus, to study complex networks it is necessary to use statistical methods that tell us what the network looks like.

The goal of this work is to model the NBA as a complex network and develop metrics that predict the behavior of NBA teams. The metrics should take into account the social and work relationships among players and teams and should also be able to predict a team's success without relying on box score statistics. Before presenting the metrics, we show that the number of players who have made a significant impact on the history of the NBA and on their teams is negligible if we draw our conclusions based only on box score statistics. Then, we study the characteristics of the NBA complex network in order to understand how the relationships among players evolve over time. Finally, we present and compare the developed metrics that [...]

[...] found, among other interesting results, that the connectivity of the players has increased over the years while the clustering coefficient declined. The authors suggest that the possible causes for this phenomenon are the increase in the exodus of players to outside of Brazil, the increasing number of trades of players among national teams and, finally, the increase in the players' career time. Moreover, it was found that the assortativity degree is positive and also increases over time, indicating that exchanges of players occur, in most cases, between teams of the same size.

Finally, the work of [7] presents a statistical analysis to quantify the predictability of all sports competitions in five major sports leagues in the United States and England. To characterize the predictability of games, the authors measure the "upset frequency" (i.e., the fraction of times the underdog wins). Basketball has a low upset frequency, which prompted us to look into the basketball league database in search of models to predict team behavior.

3. MOTIVATION
For every game played in the NBA, a huge amount of statistics is generated. This leads to the existence of a rich historical database that motivates us to discover implicit knowledge in it. The NBA data we used in this work was obtained from the site Database Basketball [11], which was cited by the magazine Sports Illustrated [9, 11] as the best reference site for basketball. The site makes all the NBA statistical data available in text files, from 1946 to 2006. The data include information on 3736 players and 97 teams, season by season or by career. Our main goal is to move beyond the usual box score statistics presented in this database and discover new knowledge in the plain recorded numbers. The first analysis of the database aims to show why we need to look beyond the box score statistics.
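
To make the network vocabulary used in this paper concrete (vertices, edges, node degree, clustering coefficient), the following minimal Python sketch builds a toy player network and computes both metrics. It rests on assumptions that are not taken from the paper: two players are linked whenever they shared a roster in some season, and the rosters, player labels and function names are purely hypothetical. The network actually used in the paper may be constructed differently.

```python
from itertools import combinations


def build_player_network(rosters):
    """Link two players whenever they appear on the same roster.

    `rosters` maps a (team, season) key to a list of player names.
    The linking rule is an illustrative assumption, not the paper's.
    """
    adjacency = {}
    for players in rosters.values():
        for player in players:
            adjacency.setdefault(player, set())
        for a, b in combinations(players, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree(adjacency, node):
    """Node degree: the number of distinct neighbors of `node`."""
    return len(adjacency[node])


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair to check
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical rosters, invented for illustration only.
rosters = {
    ("Team A", 2005): ["p1", "p2", "p3"],
    ("Team B", 2005): ["p4", "p5"],
    ("Team A", 2006): ["p1", "p4"],
}
network = build_player_network(rosters)

for player in sorted(network):
    print(player, degree(network, player),
          round(clustering_coefficient(network, player), 3))
```

In this toy data, p1 has three neighbors (p2, p3, p4) of which only one pair (p2, p3) is linked, so its clustering coefficient is 1/3, while p2, whose two neighbors played together, has a coefficient of 1.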