Predicting the Success of NFL Teams Using Complex Network Analysis
Total Page:16
File Type:pdf, Size:1020Kb
Predicting the Success of NFL Teams using Complex Network Analysis Matheus de Oliveira Salim and Wladmir Cardoso Brandao˜ Department of Computer Science, Pontifical Catholic University of Minas Gerais (PUC Minas), Belo Horizonte, Brazil Keywords: Knowledge Discovery, Social Network Analysis, Success Prediction, Sport, Football, NFL. Abstract: The NFL (National Football League) is the most popular sports league in the United States and has the hig- hest average attendance of any professional sports league in the world, moving billions of dollars annually through licensing agreements, sponsorships, television deals, ticket and product sales. In addition, it moves a billionaire betting market, which heavily consumes statistical data on games to produce forecasts. Moreover, game statistics are also used to characterize players performance, dictating their salaries. Thus, the discovery of implicit knowledge in the NFL statistics becomes a challenging problem. In this article, we model the beha- vior of NFL players and teams using complex network analysis. In particular, we represent quarterbacks and teams as nodes in a graph and labor relationships among them as edges to compute metrics from the graph, using them to discover implicit properties of the NFL social network and predict team success. Experimental results show that this social network is a scale-free and small-world network. Furthermore, node degree and clustering coefficient can be effectively used to predict team success, outperforming the usual passer rating statistic. 1 INTRODUCTION the New Orleans Saints by reaching the 12-sack mark. His 2015, 2016 and 2017 base salaries of $4 million, The NFL (National Football League) was formed in $4 million and $5 million each increased by $1 mil- 1920 and today is the most popular sports league in lion. However, hiring players based exclusively on the United States, having the highest average atten- their game statistics and paying them the highest sa- dance of any professional sports league in the world. laries is not a guarantee of team success. The NFL is also a successful franchise business, with In fact, a recent study has provided clues that its 32 teams ranked among the top 50 most valua- large wage distortions is not a good strategy for team ble sports teams in the world, moving billions of dol- success (Burke, 2012). Analyzing the offensive line lars annually through licensing agreements, sponsors- salary and performance, the author shows that the hips, television deals, ticket and product sales (Ejio- more teams pay their linemen, the more sacks and chi, 2014). After each NFL game, a large amount of tackles for losses they tend to give up. By the con- statistics are generated to describe the performance of trary, a higher median salary indicates better perfor- the players. These statistics are used to move a bet- mance. This results suggests that game statistics alone ting market estimated in $93 billion dollars of legal are not enough to effectively predict team success. and illegal gambling annually (Heitner, 2015). For The adoption of models and metrics that capture col- instance, Internet sites use NFL game statistics to aid lective behavior of players appear promising to in- gamblers, giving them more reliable predictions on crease prediction effectiveness. Therefore, the the- the outcome of upcoming games. ory of complex networks can be used to investigate NFL salary structures are notoriously complex, the collective behavior of social agents, including te- with base pay, bonuses, and guarantees, and game ams and players in a sporting context. Particularly, statistics are also regularly used to characterize the a network is a set of nodes and connections between performance of each player over time, dictating their them, called edges. Complex networks are networks salaries and the duration of their contracts. For in- with a large number of nodes and edges following re- stance, Elvis Dumervil, the outside linebacker of the levant patterns, such as hubs, i.e., clusters with highly Baltimore Ravens in the 2014 season, triggered $3 connected nodes. While the analysis of simple net- million in base salary escalators and earned $1 mil- works can be done through visual inspection, the dis- lion in incentives during Week 12’s contest against covery of relevant patterns in complex networks de- 135 Salim, M. and Brandão, W. Predicting the Success of NFL Teams using Complex Network Analysis. DOI: 10.5220/0006697101350142 In Proceedings of the 20th International Conference on Enterprise Information Systems (ICEIS 2018), pages 135-142 ISBN: 978-989-758-298-1 Copyright c 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved ICEIS 2018 - 20th International Conference on Enterprise Information Systems mands statistical methods. of items have very few connections (Faloutsos et al., In this article, we investigate the properties of the 1999). They usually are represented as graphs, with NFL social network, a network with players and te- items as nodes (or vertices), and the connections bet- ams as nodes and labor relationships among them as ween the nodes as edges (or links). edges. Additionally, we propose to predict the success Particularly, a complex network models a real- of teams by modeling the behavior of players and te- world problem with nodes and edges storing infor- ams in the NFL social network. Experiments show mation on the problem (Wasserman and Faust, 1994). that the number of quarterbacks with significant im- In multi-modal networks, the information are in the pact in the NFL history and in their teams is negligi- nodes, while in multidimensional or multi-relational ble if we exclusively rely on game statistics, such as networks, the information are in the edges. We can passer rating. In addition, we show that the NFL so- also classify complex networks by their application cial network is scale-free, i.e., a very small number in real-world problems (Newman, 2010). Biologi- of quarterbacks present extraordinary performance, cal networks represent biological systems, e.g., neu- while a large number of quarterbacks perform poorly. ral, protein, vascular and metabolic pathways net- Moreover, we show that the NFL social network fol- works. Information networks represent information lows a small-world behavior, where the distance bet- and knowledge systems where nodes are informa- ween any two nodes are very small. Particularly, the tion items, such as research articles, documents, and key contributions of this article are: Web pages. Citation networks and the Web are ex- • We investigate the properties on the NFL social amples of information networks. Social networks re- network, showing that it can be characterized as a present relationships between people or groups, such scale-free and small-world network. as friendships, family and professional relationships. Usually, social networks present a small-world beha- • We propose a method to predict the success of vior, where no one is far from anyone (Watts and Stro- NFL teams based on the network properties and gatz, 1998). Technological networks represent man- metrics. made systems, usually built for efficiently distribution • We thoroughly evaluate the network metrics used of resources, e.g., electrical grid, telephony, water dis- by our method by contrasting them with usual tribution and the Internet (Newman, 2003). passer rating statistic. We show that our metrics Figures 1, 2, and 3, present three different kind of outperform this quarterback performance statistic complex networks. Particularly, they differ according to predict the success of NFL teams. to how the connections between nodes are built (Costa The remainder of this article is organized as fol- et al., 2007): randomly or non-randomly. Random n e lows: Section 2 reviews the related literature on networks are built from a graph with nodes, where complex networks. Section 3 presents related work. edges are randomly drawn between the nodes (Erdos¨ Section 4 show that the usual passer rating statistic and Renyi,´ 1959), so that all nodes have the same pro- plays a significant role in only a small fraction of the bability of receiving new connections. In random net- NFL players. Section 5 presents the properties of the works, the more connections one add to the graph, the NFL social network, including the method we pro- greater the chance of a cluster to occur. pose to predict team success. Section 6 shows ex- perimental results, attesting the effectiveness of our method and metrics to predict team success, when compared with passer rating. Finally, Section 7 pro- vides a summary of the contributions and the conclu- sions made throughout the other sections, presenting directions for future research. 2 BACKGROUND Complex networks are huge sets of interconnected Figure 1: Example of a random network. items with a structure that do not follow a regular pat- tern. For instance, the Internet is a complex network Scale-free networks are built from a graph with composed by millions of interconnected routers, fol- n nodes, where e edges are not randomly drawn bet- lowing a pattern in which a small number of items are ween the nodes (Albert and Barabasi,´ 2002), so that extremely highly-connected, and the great majority the more connections a node has, the greater the 136 Predicting the Success of NFL Teams using Complex Network Analysis chance of it receives new connections. In scale-free contribute to team’s success, proposing a model to networks, the more connections one add to the graph, predict the success of teams in the NFL playoffs based the greater the chance of a few nodes getting more on the network topology. connected. As result, scale-free networks have a very Similarly, the interactions between NBA (Natio- low degree of connectivity and present a behavior nal Basketball Association) teams and players were known as the rich get richer. modeled as a complex network (Vaz de Melo et al., 2008). The authors model the labor relationships be- tween NBA players and teams as a complex network to investigate team’s behavior, proposing different ap- proaches to predict team success using network me- trics.