Winning Hockey Contests Using Integer Programming

David Scott Hunter
Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, [email protected]

Juan Pablo Vielma
Department of Operations Research, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139, [email protected]

Tauhid Zaman
Department of Operations Management, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA 02139, [email protected]

We present an integer programming approach to winning daily fantasy sports hockey contests which have top heavy payoff structures (i.e., most of the winnings go to the top ranked entries). Our approach incorporates publicly available predictions on player and team performance into a series of integer programming problems that compute optimal lineups to enter into the contests. We find that the lineups produced by our approach perform well in practice and are even able to come in first place in contests with thousands of entries. We also show through simulations how the profit margin varies as a function of the number of lineups entered and the points distribution of the entered lineups. Studies on real contest data show how different versions of our integer programming approach perform in practice. We find two major insights for winning top heavy contests. First, each lineup entered should be constructed in a way to maximize its individual mean and, more importantly, its variance. Second, multiple lineups should be entered and they should be constructed in a way to limit their correlation with each other. Our approach can easily be extended to other sports besides hockey, such as American football.

Key words: sports analytics, daily fantasy sports, statistics, integer programming History: arXiv:1604.01455v1 [stat.OT] 6 Apr 2016

1. Introduction

Daily fantasy sports allow users to enter lineups of players in a sport (such as hockey or American football) into daily contests. If the players on a user’s lineup perform well, the user can win a substantial amount of money in the contests. Daily fantasy sports have recently become a huge industry. For instance, DraftKings, the leading company in this area, reported $30 million in revenues and $304 million in entry fees in 2014 (Heitner 2015). While there is ongoing legal debate as to whether or not daily fantasy sports contests are gambling (Board 2015), there is a strong element of skill required to continuously win contests as evidenced by the fact that the top one percent of players pay 40 percent of the entry fees and gain 91 percent of the profits (Harwell 2015). Daily fantasy sports therefore provide a great opportunity for users to compete and earn profits with analytics. Several sites such as Rotogrinders have been created which solely focus on daily fantasy sports analytics (Rotogrinders 2015). Other sites such as Daily Fantasy Nerd even offer to provide optimized contest lineups for a fee (DailyFantasyNerd 2015).

There are a variety of contests in daily fantasy sports with different payoff structures. Some contests distribute all the entry fees equally among all lineups that score above the median score of the contest. These are known as double-up or 50/50 contests. They are relatively easy to win, but do not pay out very much. Other contests give a disproportionately high fraction of the entry fees to the top scoring lineups. We refer to these as top heavy contests. Winning a top heavy contest is much more difficult than winning a double-up or a 50/50 contest, but the payoff is also much greater. Typical entry fees range from $3 to $30, while the potential winnings range from thousands to a million dollars (DraftKings 2015).

A daily fantasy sports company sets a price (in virtual dollars) for each player in a sport. These prices vary day to day and are based upon past and predicted performance (Steinberg 2015). Lineups must then be constructed within a fixed budget. For instance, in DraftKings hockey contests, there is a $50,000 budget for each lineup. There are further constraints on the maximum number of each type of player allowed and the number of different teams represented in a lineup. Finding an optimal lineup is challenging because the performance of each player is unknown and there are hundreds of players to choose from.
However, predictions for player points can be found on a host of sites such as Rotogrinders (Rotogrinders 2015) or Daily Fantasy Nerd (DailyFantasyNerd 2015), which are fairly accurate, and many fantasy sports players have a good sense of how players will perform. This suggests that the main challenge is not in predicting player performance, but in using predictions and other information to construct optimal lineups.

The strategy for constructing lineups depends upon the type of contest being entered. For double-up and 50/50 contests all that is needed is a lineup that beats the median. In this case, one would want a lineup that has a high average predicted score but also low variance. In contrast, top heavy contests provide extreme profits if a lineup can achieve a very high rank. If the predictions were perfect, then this could be done with a single lineup, similar to what one would do for a double-up or 50/50. The problem with this approach is that the predictions are noisy, so one could not consistently achieve a high rank with a single lineup. However, two things can be done to increase the chance of victory. First, one can use lineups which have an inherently large variance, which would increase the chance of the lineup having an extremely good performance and achieving a high rank. Second, one can enter multiple lineups, thereby increasing the chance that at least one of them will rank high. Because of the payoff structure of a top heavy contest, having at least one lineup achieve a high rank will be enough to make a substantial profit. Therefore, a good approach to winning top heavy contests is to enter many lineups with high variance.

In this work we use integer programming to win top heavy hockey contests in DraftKings. Our integer programming approach allows us to systematically construct multiple lineups which have high variance. We chose hockey because we are not familiar with it. This way, any success we have is strictly due to our use of integer programming and not any of our own expert knowledge. The contests we focused on had thousands of entrants and top payoffs of several thousand dollars. During the period when this work was conducted, DraftKings allowed a single user to enter 200 lineups into a contest. We began competing with our integer programming approach and performed very well several days in a row. We even placed third or higher in three consecutive contests. We show screenshots of our performance with 200 lineups in Figure 1. A week after we began winning with our integer programming approach, DraftKings reduced the limit on multiple entries to 100. This reduced our performance, but we were still able to perform well in the contests. This demonstrates the power of integer programming for winning daily fantasy sports contests.

Figure 1 Screenshots of the performance of our top integer programming lineups in real DraftKings hockey contests with 200 lineups entered (panels: November 15, 2015; November 16, 2015; November 17, 2015; November 23, 2015). All profits are being donated to charity.

The remainder of the paper is structured as follows. We begin by reviewing the literature on sports analytics with a particular focus on hockey in Section 1.1. We present an empirical analysis of NHL player performance and the prediction accuracy of sites which forecast player performance in Section 2. We present a simulation study of the multiple lineups strategy in Section 3 which shows how much of an edge a set of lineups needs in order to win top heavy contests. Section 4 presents our integer programming approach for constructing multiple lineups. We evaluate the performance of our integer programming approach in actual DraftKings hockey contests in Section 5. We conclude in Section 6 with a discussion on extending our integer programming approach to other sports.

1.1. Previous Work

Much of the extant work in hockey analytics has focused on statistical models of player performance. The traditional measure of a player’s ability is the plus-minus statistic, which measures the number of goals scored by a player’s team minus the number of goals scored by the opposing team when the player is on the ice. Many modifications have been suggested for this simple metric in order to provide a clearer assessment of a player’s quality. The adjusted minus/plus probability approach of Schuckers et al. (2011) attempts to add in other data such as hits or face-offs. Controlling for team strength by using the team-average plus-minus was proposed in Awad (2009). Other approaches attempt to statistically infer the effects of individual factors on team performance (Thomas et al. 2013). An approach using a regression on goals per hour to assess player performance is proposed in Macdonald (2011). To deal with the large number of features present in the high dimensional regressions that occur for player performance, a regularized logistic regression approach is used in Gramacy et al. (2013). The Added Value metric, which incorporates information on what are known as power plays in hockey, is presented in Pettigrew (2015). A methodology known as total hockey rating is proposed in Schuckers and Curro (2013) for performance evaluation of skaters which incorporates data on factors such as zone starts and non-shooting events such as turnovers.

While the above models focus on skating players, a different approach is needed when evaluating goalie performance. Typically a goalie is evaluated by their save percentage. However, this may depend upon the strength of the defense of the team. A defense independent rating of goalies is presented in Schuckers (2011) which uses data on the spatial locations of shots they faced.
In Beaudoin and Swartz (2010) a simulator is developed which assesses strategies for pulling goalies in hockey games.

In the area of fantasy sports there is a plethora of websites which provide strategies for drafting lineups or algorithmic tools which provide optimized lineups. However, there is very little formal academic work on this topic. In Becker and Sun (2014) an integer programming approach is presented which optimizes lineups for an entire fantasy American football season. This work has elements similar to our work, such as the use of integer programming to optimize lineups, but there are a few key differences. First, the focus is on season-long fantasy contests, whereas we focus on daily fantasy contests. Second, the model is developed for American football, which has a very different structure from hockey. Third, the approach produces a single optimized lineup, whereas our strategy is to generate multiple lineups.

2. Hockey Data Analysis

There are two capabilities we utilize in our approach to winning daily fantasy sports hockey contests. The first is exploiting the structure of hockey teams and fantasy points scoring systems in order to construct lineups that can score many fantasy points. The second is to be able to predict the performance of players so that our lineups do indeed achieve the scores we expect. In this section we present data analysis which will show how to predict player performance and guide the construction of an integer programming formulation to select multiple lineups in Section 4. The daily fantasy contests we focus on are from the site DraftKings and are based on players in the National Hockey League (NHL), which is the main hockey league in North America. We begin by describing the basic elements of hockey and hockey daily fantasy sports contests.

We then evaluate the performance of different public websites that provide predictions for player and team performance. After this we study the correlations in player performance which result from the structure of hockey teams. While our analysis focuses on DraftKings, we note that other daily fantasy sports sites have very similar contests and point scoring systems. All data in this section comes from October 21, 2015 until December 31, 2015.

2.1. Hockey Basics

A hockey team consists of six players: one center, two wingers, two defensemen, and a goalie. The centers, wingers, and defensemen are known as skaters because they are free to skate on the ice rink, whereas the goalie primarily stays in front of the goal. In hockey the aim is for a team to shoot a puck into the goal of the opposing team and the goalie’s task is to protect the goal from any opponent scores.

In daily fantasy sports hockey contests there are constraints on the types of players that must be present in a lineup. For DraftKings, each lineup must consist of two centers, three wingers, two defensemen, one goalie, and one additional player who can be any skater, resulting in a lineup of nine players. We show a screenshot of one possible DraftKings lineup in Figure 2. The figure also shows a column labeled FPPG, which stands for fantasy points per game. This is the average number of fantasy points per game the player has earned throughout the current NHL season. DraftKings provides this information to help users construct their lineups. The total fantasy points for a lineup is the sum of each player’s fantasy points. The lineup shown has 34.4 fantasy points per game. This score is in the typical range of NHL lineups in DraftKings and gives a sense of the value of different scoring actions.

In a daily fantasy sports contest, there is a specific way in which players score fantasy points. Because we test our approach in DraftKings, we will focus on its points scoring system.[1] We summarize the details of the DraftKings scoring system in Table 1. In Figure 2 we show the basic statistics of points per game for each position averaged over all NHL players. We also show the average DraftKings cost per player in the figure. As can be seen, on average goalies tend to score almost twice as many points as all other players and also cost twice as much. Also, the standard deviation of each position’s points per game is roughly the same size as the average value.
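The roster rules above, together with the $50,000 budget mentioned in the Introduction, can be checked mechanically. The following is a minimal sketch and not the formulation used later in the paper: the function name, the position codes "C", "W", "D", "G" and the (position, salary) tuple format are assumptions made for illustration, and the additional constraint on the number of different teams is omitted because its exact form is not stated here.

```python
from collections import Counter

BUDGET = 50_000                          # DraftKings hockey budget per lineup
MIN_SKATERS = {"C": 2, "W": 3, "D": 2}   # required centers, wingers, defensemen


def is_valid_lineup(players):
    """Check position counts and budget for a list of (position, salary) tuples.

    Positions use the hypothetical codes "C", "W", "D" and "G".
    """
    if len(players) != 9:
        return False
    counts = Counter(position for position, _ in players)
    if counts["G"] != 1:
        return False
    # The minimums cover seven skaters; the eighth skater may be of any type.
    if any(counts[pos] < required for pos, required in MIN_SKATERS.items()):
        return False
    if sum(counts[pos] for pos in MIN_SKATERS) != 8:
        return False
    return sum(salary for _, salary in players) <= BUDGET


# Hypothetical salaries, invented for illustration only.
lineup = [("C", 5000)] * 2 + [("W", 5000)] * 4 + [("D", 5000)] * 2 + [("G", 8000)]
print(is_valid_lineup(lineup))
```

In an integer program these checks would become linear constraints on binary selection variables; the function above only validates a lineup that has already been chosen.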
The centers and wingers are known as offensive players. In the NHL each hockey team plays its offensive players in fixed sets known as lines. This means that if one player from the line, such as the center, is on the ice, then so are the other two players from the line. Each team has four standard lines and two power play lines, which only play when there is a power play as a result of a penalty. We show the average points per game for players based upon their line in Figure 2. The rank of the line typically suggests how well it performs. As can be seen, line one is generally the highest scoring line.

Table 1 Points system for NHL contests in DraftKings.

Score type                                  Points
Goal                                        3
Assist                                      2
Shot on Goal                                0.5
Blocked Shot                                0.5
Short Handed Point Bonus (Goal/Assist)      1
Shootout Goal                               0.2
Hat Trick Bonus                             1.5
Win (goalie only)                           3
Save (goalie only)                          0.2
Goal allowed (goalie only)                  -1
Shutout Bonus (goalie only)                 2

[1] Full scoring rules can be found at www..com/rules.

The points system of DraftKings is very similar to most daily points systems. It gives more points for players that perform better, but it also has some important properties which we use to construct our lineups. The first property concerns assists. An assist is awarded for passing the puck to a player who either scores or who passes the puck to the player that scores. Each assist is worth two points and DraftKings awards points for up to two assists per goal. This is an important aspect of the scoring system because it means that for each goal scored, it is possible for a single lineup to receive seven points (three points for the goal, and four points for the two assists that led to the goal). However, such a situation can only occur if all players involved are on a single lineup. This suggests that to score many points, a lineup should contain all players from a single line. By putting all players from a single line in a lineup, we are essentially increasing the variance of the lineup. This is a popular fantasy sports tactic known as stacking. Many expert daily fantasy sports players use stacking to win in a variety of sports contests (Drewby 2015, Bales 2015a, Bales 2015b).

The second property of the points system concerns goalies. If a goalie’s team wins the game, an additional three points are awarded to the goalie. This is important because, as we will see, it is possible to predict who will win an NHL hockey game with relatively good accuracy. By choosing goalies on winning teams, an extra three points can be obtained, which is a significant amount in hockey contests given that lineups average between 30 and 40 points.
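The arithmetic behind the seven-point stacking example can be made concrete with a small scoring routine. This is an illustrative sketch only: the point values are copied from Table 1, while the dictionary keys and function names are hypothetical and the event counts are taken exactly as supplied by the caller.

```python
# Point values copied from Table 1; the key names are our own labels.
SKATER_POINTS = {
    "goal": 3.0,
    "assist": 2.0,
    "shot_on_goal": 0.5,
    "blocked_shot": 0.5,
    "short_handed_point": 1.0,  # bonus per short handed goal or assist
    "shootout_goal": 0.2,
}
HAT_TRICK_BONUS = 1.5
GOALIE_POINTS = {"win": 3.0, "save": 0.2, "goal_allowed": -1.0, "shutout": 2.0}


def skater_points(stats):
    """Fantasy points for a skater from a dict of event counts."""
    total = sum(value * stats.get(event, 0) for event, value in SKATER_POINTS.items())
    if stats.get("goal", 0) >= 3:  # a hat trick is three goals in one game
        total += HAT_TRICK_BONUS
    return total


def goalie_points(stats):
    """Fantasy points for a goalie from a dict of event counts."""
    return sum(value * stats.get(event, 0) for event, value in GOALIE_POINTS.items())


# One goal with two assists, all three players from the same line.
scorer, first_assist, second_assist = {"goal": 1}, {"assist": 1}, {"assist": 1}
stacked = sum(skater_points(s) for s in (scorer, first_assist, second_assist))
unstacked = skater_points(scorer)  # a lineup holding only the goal scorer
print(stacked, unstacked)
```

A lineup holding the whole line collects the points of all three players for the same goal, whereas a lineup holding only the scorer collects the three points for the goal alone.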

2.2. Prediction Sites

Many fantasy sports players devote a substantial amount of effort to developing elaborate prediction models for players’ performance. Their advantage comes mainly from the power of their predictive models. However, we do not have the domain knowledge required to improve upon the prediction models of the top fantasy players. Therefore, we take a different approach. Publicly available player point predictions are provided by the websites Rotogrinders (Rotogrinders 2015) and Daily Fantasy Nerd (DailyFantasyNerd 2015). We find the predictions from these sites to be relatively accurate, but the accuracy varies by position. We plot in Figure 2 the mean absolute error in points of the two sites’ predictions for different hockey positions using our data from the 2015 NHL season. We see that the prediction errors of both sites are very similar. Therefore, it does not seem that either one has a superior prediction model. In Section 4.1 we build a simple regression based prediction model which uses the data from both of these sites.

Figure 2 (top left) Mean and standard deviation of fantasy points per game (FPPG) and average player cost (in thousands of dollars) in DraftKings for each hockey position (goalie, center, winger, defenseman). (bottom left) Mean absolute error of NHL player FPPG predictions of Daily Fantasy Nerd and Rotogrinders for each hockey position. (top right) Mean FPPG of offensive skaters in different types of lines (lines 1 to 4 and power play lines 1 and 2). (bottom right) Screenshot of an NHL lineup from DraftKings.

2.3. Player Correlations

In Section 4 we will present an integer programming formulation which creates lineups with a large variance. The key to this approach is putting players on lineups with positive correlations in their FPPG. This way a lineup will either perform very well or very poorly. For top heavy contests, we will see that this strategy is effective. To create high variance lineups we must first understand the key correlations between players. There are two important correlations we look at. The first concerns players in lines. The second concerns offensive players and goalies.

We begin by examining the correlations of FPPG between players in the same line versus those in different lines. We calculate the Pearson correlation coefficient between the FPPG for every pair of NHL offensive players (centers and wingers) in the same line, and for every pair on the same team but in different lines. We plot the resulting average correlation coefficients for each group in Figure 3. We see that the players on the same line have a much higher correlation than players on different lines. This is not surprising given the way fantasy points are awarded. This correlation is most likely due to the assist/goal combinations that give points to multiple players who are simultaneously on the ice. Linemates should therefore be placed together in a lineup to increase its variance.

Figure 3 (left) Average correlation coefficient of FPPG for offensive players in the same and different lines. (right) Average correlation coefficient of FPPG for goalies and skaters (offensive skaters and defensemen) in the same and opposing teams.

We next look at the correlation between goalies and skaters in the same game. We consider the correlation of FPPG for offensive players and defensemen with goalies on the same and opposing team in a game. We plot the resulting average Pearson correlation coefficient of the FPPG for these groups in Figure 3. We find that the correlation is negative for goalies and skaters on opposing teams. For the same team the correlation is near zero. The negative correlation is due to the fact that if a skater scores fantasy points, it is typically because a goal was scored, which gives negative fantasy points to the opposing goalie. There is no comparable structural condition that would correlate skaters and goalies on the same team. The negative correlation is something we want to avoid within a lineup as it would reduce the overall variance. Therefore, to increase lineup variance, we will avoid putting goalies and skaters on opposing teams in a single lineup.
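The link between pairwise correlation and lineup variance follows from the identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y): a positive covariance raises the variance of the combined score and a negative one lowers it. The sketch below illustrates this with a hand-written Pearson coefficient. The per-game point series are invented for illustration and are not taken from the NHL data analyzed in this section.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical fantasy points over six games (made-up numbers).
center = [1.0, 5.5, 0.5, 7.0, 2.0, 0.0]
linemate = [0.5, 4.0, 1.0, 6.5, 2.5, 0.5]            # scores in the same games
opposing_goalie = [6.0, 1.0, 7.5, -1.0, 4.0, 8.0]    # suffers when the center scores

for label, other in (("linemate", linemate), ("opposing goalie", opposing_goalie)):
    combined = [a + b for a, b in zip(center, other)]
    separate = pvariance(center) + pvariance(other)
    print(f"{label}: corr = {pearson(center, other):.2f}, "
          f"variance of sum = {pvariance(combined):.2f}, "
          f"sum of variances = {separate:.2f}")
```

Pairing the center with the linemate gives a combined variance above the sum of the individual variances, while pairing the center with the opposing goalie gives one below it, which is the effect the lineup construction tries to exploit and avoid, respectively.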

3. Simulation Study of the Multiple Lineups Strategy

Because of the excessively high payoff for the best lineups in top heavy contests, the key to winning is to have at least one lineup achieve a high rank. For instance, in a DraftKings NHL Sniper contest, nothing is paid to the bottom 79% of lineups but $4000 is paid to the first place lineup. The entry fee is $3 per lineup and an individual user can enter up to 100 or 200 lineups.[2] If a user entered 200 lineups in the Sniper contest and one of the lineups came in at least seventh place, then all $600 in entry fees would be recovered. The Sniper contest payoff structure is similar to other top heavy contests. Profit can be made in these contests even if almost all lineups entered do poorly, as long as at least one does very well. We provide an integer programming approach to construct multiple lineups in Section 4 that have some sort of edge over the other contest lineups. We wish to understand how much of an edge the integer programming approach needs to provide in order for it to be profitable.

[2] Originally a maximum of 200 lineups per user was allowed in any contest, but during the course of this research DraftKings reduced this to 100 lineups.

To understand how the multiple lineup strategy will perform in a top heavy contest, we model the lineups as follows. Let \(n\) be the total number of lineups in a contest. Let \(m\) be the number of lineups we will enter using our integer programming approach. The rules of daily fantasy sports force \(m\) to be no more than 1% or 2% of \(n\). To analyze the behavior of the profit, we model the points of each lineup as an independent normal random variable. In reality this will not be the case because many lineups will share players, but we make the assumption here to simplify our analysis. Let the points of the integer programming lineups be given by normal random variables \(X_i\), \(1 \leq i \leq m\), with mean \(\mu_1\) and variance \(\sigma_1^2\). Let the points of the other lineups in the contest, which we refer to as the population lineups, be given by the normal random variables \(Y_i\), \(1 \leq i \leq n - m\), with mean \(\mu_0\) and variance \(\sigma_0^2\). We wish to understand how the profit of the integer programming lineups behaves as a function of the contest payoff structure, the number of integer programming lineups entered, and the edge provided by the integer programming approach.

We let the profit of each rank in a contest be given by a vector \(p = [p_1, p_2, \ldots, p_n]\), where \(p_i\) is the total profit (winnings minus entry fee) of a lineup ranking \(i\) out of \(n\). We have that \(\sum_{i=1}^{n} p_i < 0\) since a fixed percentage of the entry fees is claimed by DraftKings as revenue. We define the order statistics of the integer programming lineups as \(X_{(1)} \geq X_{(2)} \geq \ldots \geq X_{(m)}\) and the order statistics of the population lineups as \(Y_{(1)} \geq Y_{(2)} \geq \ldots \geq Y_{(n-m)}\). The rank of the \(i\)th best integer programming lineup \(X_{(i)}\) in the contest is \(r_i\), and the profit margin from a contest is

\[ Q = \frac{\sum_{i=1}^{m} p_{r_i}}{cm}, \tag{3.1} \]

where \(c\) is the entry fee per lineup.

This profit margin is a function of \(n\), \(m\), \(p\), \(c\), and the distributions of \(X_i\) and \(Y_i\). To understand the relationship between \(Q\) and the other variables, we perform a simulation study. We consider the NHL Sniper contest, which has the top heavy payoff structure shown in Figure 4. The size of the contest varies from day to day, but typically has several thousand entries. In our simulations we use \(n = 21{,}073\), which is the actual number of entries in one of the Sniper contests. To simulate the Sniper contest we fix the distribution parameters, simulate \(m\) integer programming lineups and \(n - m\) population lineups, and calculate the resulting profit margin using equation (3.1). We repeat this for 10,000 trials for each set of parameter values. We then investigate the relationship between profit margin and the different parameters of the model.
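The simulation procedure just described can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the authors' implementation: the function and variable names are hypothetical, and the prize table is a made-up top heavy structure for a 2,000-entry contest, not the actual Sniper payoffs. It draws independent normal scores, ranks all lineups, and evaluates equation (3.1) with the profit at each rank equal to the prize minus the entry fee.

```python
import random
from statistics import median


def simulate_profit_margin(payoffs, fee, m, mu1, sigma1, mu0, sigma0, rng):
    """Draw one value of Q from equation (3.1).

    payoffs[k] is the prize for finishing in rank k + 1, so len(payoffs) = n.
    """
    n = len(payoffs)
    ours = [rng.gauss(mu1, sigma1) for _ in range(m)]
    others = [rng.gauss(mu0, sigma0) for _ in range(n - m)]
    # Tag each score with its owner, then sort best-first to obtain the ranks.
    field = [(score, True) for score in ours] + [(score, False) for score in others]
    field.sort(key=lambda entry: entry[0], reverse=True)
    profit = sum(payoffs[rank] - fee for rank, (_, mine) in enumerate(field) if mine)
    return profit / (fee * m)


# Invented prize table (NOT the Sniper payoffs): 2,000 entries, $3 entry fee.
toy_payoffs = [1000.0, 500.0, 250.0] + [20.0] * 97 + [6.0] * 300 + [0.0] * 1600
fee = 3.0
rng = random.Random(0)

for sigma_edge in (0.0, 2.0, 4.0):
    draws = [
        simulate_profit_margin(toy_payoffs, fee, 20, 27.5, 7.1 + sigma_edge, 27.5, 7.1, rng)
        for _ in range(200)
    ]
    print(f"sigma1 - sigma0 = {sigma_edge}: median Q = {median(draws):.2f}")
```

The number of trials and the contest size are kept small here so the example runs quickly; reproducing the study in the text would require the real payoff vector, \(n = 21{,}073\), and 10,000 trials per parameter setting.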

We plot in Figure 5 the median profit margin as a function of \(m\) for the Sniper contest with \(\mu_0 = 27.5\) and \(\sigma_0 = 7.1\), which are actual population values for one of the Sniper contests. We use different values for \(\mu_1\) and \(\sigma_1\). We simulate up to \(m = 1{,}000\), but in reality \(m\) is no more than 100 or 200. We see that when the algorithm and population have equivalent variances, the integer programming mean must be two points above the population mean to make profit. From the figure, the profit margin becomes positive at approximately 140 integer programming lineups when \(\mu_1 - \mu_0 > 2\). The profit margin also tends to increase with \(m\) for this range. Therefore, if the integer programming approach could only increase \(\mu_1\), then more lineups would always be better in this range. The profit margin must decrease at some point, because if \(m = n\) the profit margin would be negative since the net payoff of the contest is negative (due to fees collected by DraftKings). However, this occurs far beyond \(m = 1{,}000\) and so is not relevant in real contests.

Figure 4 Payoff (in dollars) versus lineup rank of the Sniper contest with 21,073 entries, with both axes on a logarithmic scale. Ranks which receive zero payoff are not shown.

We see a more interesting behavior if we set the integer programming mean equal to the population mean and vary the integer programming variance. Here we see that with \(\sigma_1\) only two points above \(\sigma_0\), positive profit can be achieved with 35 lineups. If \(\sigma_1\) is four points above \(\sigma_0\), something different occurs. Now the peak median profit margin of 1,250% is achieved at around 100 lineups, but for more than 100 lineups, the profit margin actually begins to decrease. For the range of \(m\) allowed by DraftKings, an increase in \(\sigma_1\) has much more impact on the profit margin than an increase in \(\mu_1\). This is a result of the payoff structure of top heavy contests.

To further understand how the algorithm lineup distribution affects the profit margin for values of \(m\) that are used in practice, we fix \(\mu_0\) and \(\sigma_0\) as before and we set \(m\) equal to 100 and 200 (the new and original limits on multiple lineups set by DraftKings). We then sweep over \(\mu_1\) and \(\sigma_1\) and trace the profit margin contours in Figure 6. We see from this figure that if \(\mu_0 = \mu_1\) then we need \(\sigma_1\) to be about one point above \(\sigma_0\) to make profit. If \(\sigma_0 = \sigma_1\), then \(\mu_1\) must be approximately two points above \(\mu_0\) to make profit. The integer programming approach can even make profit if its mean or its standard deviation is below that of the population.

For instance, if \(\mu_1 - \mu_0 = -1\), then the integer programming approach will make profit if \(\sigma_1\) is more than one point above \(\sigma_0\). But if \(\sigma_1 - \sigma_0 = -1\) then the integer programming approach needs \(\mu_1\) to be at least three points above \(\mu_0\). This shows that the key to winning top heavy contests is having lineups which have a higher variance than the population, even if this comes at the cost of a lower mean.

Figure 5 The median profit margin (in percent) as a function of the number of integer programming lineups for the Sniper contest with \(n = 21{,}073\), \(\mu_0 = 27.5\), and \(\sigma_0 = 7.1\). (right) \(\sigma_1 = \sigma_0\) and curves for different values of \(\mu_1\) are shown (\(\mu_1 - \mu_0 = -2, 0, 2, 4\)). (left) \(\mu_1 = \mu_0\) and curves for different values of \(\sigma_1\) are shown (\(\sigma_1 - \sigma_0 = -2, 0, 2, 4\)).

Figure 6 Contour plot of the median profit margin (in percent) for the Sniper contest with \(n = 21{,}073\), \(\mu_0 = 27.5\), \(\sigma_0 = 7.1\) and different numbers of integer programming lineups (panels: 100 and 200 integer program lineups). The horizontal axis is \(\mu_1 - \mu_0\) and the vertical axis is \(\sigma_1 - \sigma_0\), each ranging from −5 to 5.

3.1. Double-ups versus Top Heavy Contests

Our simulation analysis shows that the multiple lineups strategy will be very profitable if our integer programming approach can produce lineups with an edge of a few points above the population. We now investigate if a similar result holds for another popular contest, the double-up. Recall that double-up contests divide all the entry fees evenly among roughly the top half of the lineups. We consider an actual DraftKings hockey double-up contest with \(n = 1{,}136\) and \(m = 30\). This contest has a smaller overall size and multiple entry limit than the top heavy competitions we analyzed. We simulate performance on this contest as a function of the integer programming edge and show the result in Figure 7. Two things are evident from this figure. First, the profit margin is much lower than for top heavy contests. In the figure the maximum profit margin is 90 percent. Second, the profit margin decreases as \(\sigma_1\) increases, in contrast to the top heavy contests. In fact, the highest profits are made when the integer programming approach has a positive edge in the mean and a negative edge in the variance. This difference in strategies for double-up and top heavy contests is due to the different payoff structures of the contests. Top heavy contests have a high variance payoff structure and require the integer programming approach to have a high variance as well, whereas double-up contests have a low variance payoff structure and prefer a lower variance.

The simplicity of the double-up payoff structure allows us to accurately approximate the profit margin with a closed form expression. We assume the entry fee is \(c\) and that \(m\) lineups are entered. For our approximation we assume that \(n\) is very large so that the median score in the competition is given by \(\mu_0\) (recall that for normal random variables the mean and median are equal). For a double-up, the payoff is approximately \(2c\) for any lineup above the median and zero for the rest. We say approximately because the cutoff for winning is actually slightly above the median in order for some of the entry fees to be collected by the daily fantasy sports company. Then, using equation (3.1), the profit margin for \(m\) lineups is

$$
Q_m = \frac{c \sum_{i=1}^{m} \mathbf{1}\{X_i > \mu_0\} - c \sum_{i=1}^{m} \mathbf{1}\{X_i \le \mu_0\}}{cm} = \frac{2}{m} \sum_{i=1}^{m} \mathbf{1}\{X_i > \mu_0\} - 1.
$$

Recall that we model Xi as normal random variables. We define F (x; µ, σ) as the cumulative distribution function (CDF) of a normal random variable with mean µ and standard deviation σ. Then we have the following result for the limiting profit margin of a double-up.

Theorem 3.1 Let Q = 1 − 2F(µ0; µ1, σ1). Then Qm converges almost surely to Q as m → ∞.

Proof. From the strong law of large numbers we have that $\frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{X_i > \mu_0\} \xrightarrow{a.s.} 1 - F(\mu_0; \mu_1, \sigma_1)$. The expression for Q follows from this. □

The expression Q is our approximation for the profit margin of a double-up contest when we have many lineups entered and the contest is very large. We plot Q in Figure 7 alongside the simulated profit margin for 30 integer programming lineups. As can be seen, the approximation is very close to the simulated value. Therefore, we can use Q to very easily map the profit margin surface of double-up contests. We find the same basic conclusions as with the simulation, i.e. the best lineups have high mean and low variance. While we can find a closed form expression for the profit margin surface for double-ups, such an expression cannot be easily found for top heavy competitions because of the complexity of the payoff structure. Instead we must rely on simulations to map the profit margin surface.
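The closed form in Theorem 3.1 is easy to evaluate numerically. The following is a minimal sketch, not the authors' code: it computes Q = 1 − 2F(µ0; µ1, σ1) and compares it with a simulated Q_m for independent normal lineup scores. The function names and the values of µ1 and σ1 are illustrative assumptions; µ0 = 27.5 is the population mean quoted in the Figure 7 caption.

```python
import random
from statistics import NormalDist


def double_up_margin_limit(mu0, mu1, sigma1):
    """Limiting profit margin Q = 1 - 2 * F(mu0; mu1, sigma1)."""
    return 1.0 - 2.0 * NormalDist(mu=mu1, sigma=sigma1).cdf(mu0)


def simulate_double_up_margin(mu0, mu1, sigma1, m, seed=0):
    """Q_m = (2/m) * #{X_i > mu0} - 1 for m independent N(mu1, sigma1^2) scores."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(mu1, sigma1) > mu0 for _ in range(m))
    return 2.0 * wins / m - 1.0


# mu0 comes from the Figure 7 caption; mu1 and sigma1 are hypothetical.
mu0 = 27.5
mu1, sigma1 = 29.5, 6.1

q_limit = double_up_margin_limit(mu0, mu1, sigma1)
q_sim = simulate_double_up_margin(mu0, mu1, sigma1, m=100_000)
print(f"limit Q = {q_limit:.3f}, simulated Q_m = {q_sim:.3f}")
```

With a positive edge in the mean, lowering σ1 raises Q in this sketch, which matches the qualitative conclusion drawn from Figure 7.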

Figure 7 (left) Contour plot of the median profit margin (in percent) for a double-up contest with n = 1,136, m = 30, µ0 = 27.5, and σ0 = 7.1 (panel title: 30 algorithm lineups, double-up). (right) Contour plot of the theoretical profit margin (in percent) for a double-up contest with µ0 = 27.5 and σ0 = 7.1 (panel title: Theory, double-up). In both panels the horizontal axis is µ1 − µ0 and the vertical axis is σ1 − σ0.

4. Multiple Lineup Integer Programming Approach

We now present our integer programming approach for selecting multiple lineups for hockey contests. The integer programming approach takes as input the player point predictions, which we calculate using a simple regression model. We then construct and solve an integer programming formulation to produce a single lineup with maximum predicted points, subject to a set of stacking constraints which increase the lineup variance. Then, we iteratively add constraints to the integer programming formulation in order to produce multiple lineups which are distinct from previously constructed lineups. We now go into the details of the prediction model and the integer programming formulation.

4.1. Player Point Predictions

We first build a predictive model for the fantasy points for each player in the NHL. Rather than build a complex statistical model to make predictions, we instead choose a much simpler approach. We saw that sites such as Rotogrinders and Daily Fantasy Nerd have relatively accurate predictions for player points. Further, these sites are run by fantasy sports experts who most likely have much domain expertise and have spent a great amount of effort to construct their predictions. Therefore, it is unlikely that any statistical model we build would have an edge over these sites, especially given our lack of domain knowledge for hockey. Rather, it is sensible and very simple to use these predictions for the players’ points. Here we will analyze a set of simple predictive models and evaluate their prediction accuracy. Later in Section 5 we will evaluate these different models by their profitability in real contests.

Model (1): skater points. Models (2) to (6): goalie points. Each row lists its entries in column order.

β1 (Rotogrinders): 0.634∗∗∗ (0.0203), 0.628∗∗∗ (0.0864), 0.203 (0.144), 0.203 (0.144)
β2 (Daily Fantasy Nerd): 0.282∗∗∗ (0.0242), -0.0173 (0.106), -0.0309 (0.113), -0.0301 (0.113)
β3 (Win Probability): 4.310∗ (2.123), 5.304∗∗ (2.005), 4.176∗ (2.064), 5.172∗∗ (1.941)
β0: 1.334, 1.686; standard errors (0.0314), (0.549), (1.032), (1.003), (1.013), (0.983)
R2: 0.238, 0.0858, 0.0179, 0.0140, 0.0178, 0.0139
Samples: 10825, 565, 506, 506, 506, 506

Standard errors in parentheses. ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table 2 Regression coefficients, R squared, and sample count for different prediction models of skater and goalie points.

For a player u, let $f_{1u}$ and $f_{2u}$ be the predictions by Rotogrinders and Daily Fantasy Nerd, respectively. For skaters, we perform a linear regression on the player’s points using these predictions as features. We denote by $f_u$ the predicted points of player u using our model. For skater u our points prediction is

$$
f_u = \beta_0 + \beta_1 f_{1u} + \beta_2 f_{2u}. \tag{4.1}
$$

For goalies, we must also factor in the outcome of the game since there is a three point bonus for victory. Daily Fantasy Nerd provides win probabilities for each goalie, and we have found these estimates to be relatively accurate. We perform a linear regression on the goalie points using as predictor variables the predictions of Rotogrinders, Daily Fantasy Nerd, and the win probability of the goalie’s team. For a goalie u, let this win probability be $p_u$. Then the points prediction for goalie u is

$$
f_u = \beta_0 + \beta_1 f_{1u} + \beta_2 f_{2u} + \beta_3 p_u. \tag{4.2}
$$

We compare different prediction models in Table 2. For skaters, both Rotogrinders and Daily Fantasy Nerd predictions are significant. For goalies when we do not include the win probability we find that only Rotogrinders is significant. When the win probability is included, it is the only significant feature. The highest R squared value for goalies is the model which does not include the win probability. This suggests that we should not include the win probability in our goalie model. However, we will see in Section 5 that in terms of profit in real contests, the win probability model can have better performance.
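As an illustration of equations (4.1) and (4.2), the sketch below evaluates both linear predictors. The skater defaults (β0 = 1.334, β1 = 0.634, β2 = 0.282) are our reading of model (1) in Table 2 and should be treated as an assumption about the table layout. The goalie coefficients are left as arguments because the column alignment of the goalie models is not recoverable here. The function names and sample inputs are hypothetical.

```python
def predict_skater_points(rotogrinders, daily_fantasy_nerd,
                          beta0=1.334, beta1=0.634, beta2=0.282):
    """Equation (4.1): linear blend of the two sites' skater predictions.

    Default coefficients are read from model (1) of Table 2 (an assumption
    about how the flattened table should be aligned).
    """
    return beta0 + beta1 * rotogrinders + beta2 * daily_fantasy_nerd


def predict_goalie_points(rotogrinders, daily_fantasy_nerd, win_probability,
                          beta0, beta1, beta2, beta3):
    """Equation (4.2): same blend plus a win-probability term for goalies."""
    return (beta0 + beta1 * rotogrinders + beta2 * daily_fantasy_nerd
            + beta3 * win_probability)


# Hypothetical site predictions for one skater.
skater = predict_skater_points(rotogrinders=3.0, daily_fantasy_nerd=2.5)
print(round(skater, 3))

# Hypothetical goalie coefficients, chosen only to show the effect of p_u.
low = predict_goalie_points(4.0, 4.0, 0.45, beta0=1.0, beta1=0.2, beta2=0.0, beta3=4.0)
high = predict_goalie_points(4.0, 4.0, 0.55, beta0=1.0, beta1=0.2, beta2=0.0, beta3=4.0)
print(round(high - low, 3))
```

Because β3 multiplies the win probability, goalies on favored teams receive higher predicted points, which is the mechanism Section 5.2 credits for the better contest performance of the win probability model.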

4.2. Feasibility Constraints

Now that we have the predictions for the player points, we can construct the integer programming formulation. This integer programming formulation will be solved iteratively with new added constraints to produce multiple lineups. We now introduce the basic notation for our algorithm. We will construct m lineups. We assume there are N NHL players to choose from and we let the variable $x_{uk}$ equal one if player u is selected for lineup k and zero otherwise. Each player u is predicted to achieve $f_u$ fantasy points. We let the fantasy price of player u be $c_u$. There are $N_T$ NHL teams playing in a contest, and we denote the set of players on team i by $T_i$. Also, we denote the sets of players who are centers, wingers, defensemen, and goalies by C, W, D, and G, respectively.

We begin by formulating the basic feasibility constraints of a lineup. Daily fantasy sports contests all impose a budget constraint B on a lineup. For DraftKings, this budget is B = 50,000 fantasy dollars. There is also a constraint on the number of players of each type of position. In DraftKings, a lineup consists of nine players: two centers, three wingers, two defensemen, a goalie, and a utility player who can be any skater type. The position constraints are

$$
\begin{aligned}
2 \le \sum_{u \in C} x_{uk} &\le 3, \\
3 \le \sum_{u \in W} x_{uk} &\le 4, \\
2 \le \sum_{u \in D} x_{uk} &\le 3, \\
\sum_{u \in G} x_{uk} &= 1.
\end{aligned} \tag{4.3}
$$

DraftKings also requires each lineup to have players that come from at least three different NHL teams. The team constraints are

$$
\begin{aligned}
& t_i \le \sum_{u \in T_i} x_{uk}, \quad \forall i \in \{1, \ldots, N_T\}, \\
& \sum_{i=1}^{N_T} t_i \ge 3, \\
& t_i \in \{0, 1\}, \quad \forall i \in \{1, \ldots, N_T\}.
\end{aligned} \tag{4.4}
$$

We put these constraints together to obtain the basic constraints for a feasible lineup:

$$
\begin{aligned}
& \sum_{u=1}^{N} c_u x_{uk} \le B, && \text{(budget constraint)} \\
& \sum_{u=1}^{N} x_{uk} = 9, && \text{(lineup size constraint)} \\
& \text{Equations (4.3)}, && \text{(position constraint)} \\
& \text{Equations (4.4)}, && \text{(team constraint)} \\
& x_{uk} \in \{0, 1\}, \quad 1 \le u \le N.
\end{aligned} \tag{4.5}
$$

4.3. Stacking Lineups Using Structural Correlations

As we discussed in Section 2.3, there are correlations that are induced by the structure of the game. If we were able to perfectly predict every player’s performance, then we would not need this structural correlation information to win a given contest. However, uncertainty in the player performance requires us to take advantage of these structural correlations in order to win the contests. In essence, what we want to do is use the structural correlations to create lineups which have very high variance, which is referred to as stacking. By stacking our lineups, we increase the chance that at least one lineup can achieve a top rank in a contest. In this section we propose ways to stack lineups by incorporating this structural correlation information into our integer programming formulation.

Goalie Stacking. As discussed in Section 2.3, goalies are negatively correlated with the skaters that they are opposing. Therefore, to increase the variance of a lineup, we would like to avoid this negative correlation by making sure that none of the lineup’s skaters oppose its goalie. We will refer to this constraint as goalie stacking. To model this constraint in our integer programming formulation, we denote the set of skaters who are opposing goalie u by $O_u$. Due to the team constraint given by (4.4), the maximum number of players we can have in one lineup from a given NHL team is six. For lineup k and goalie u, the goalie stacking constraint is

$$
6 x_{uk} + \sum_{v \in O_u} x_{vk} \le 6, \quad \forall u \in G. \tag{4.6}
$$

This constraint will force the goalie variable $x_{uk}$ to be zero if the lineup has any skater opposing the goalie.

Line Stacking. Additionally, in Section 2.3 we found that players on the same line are positively correlated. Therefore, to increase the lineup variance, we would like to impose additional constraints on our formulation that will create lineups with players on the same line. We will refer to this as line stacking. There are eight possible skaters in a lineup and a line consists of three skaters. Therefore, the maximum number of complete lines we can include in a lineup is two. To include as much positive correlation as possible amongst the players, we can require a lineup to have one complete line of three players, and two partial lines with at least two players each. To formulate the line stacking constraints, we introduce some notation. Let $N_L$ denote the total number of lines, and let $L_i$ denote the set of players on line i. With this notation we can formulate the constraint of having at least one complete line in lineup k as

$$
\begin{aligned}
& 3 v_i \le \sum_{u \in L_i} x_{uk}, \quad \forall i \in \{1, \ldots, N_L\}, \\
& \sum_{i=1}^{N_L} v_i \ge 1, \\
& v_i \in \{0, 1\}, \quad \forall i \in \{1, \ldots, N_L\}.
\end{aligned} \tag{4.7}
$$

Under these constraints $v_i$ can be one only if the lineup has all three players from line i, and we require at least one of the $v_i$ to be one. To formulate the constraint of having at least two players from two distinct lines, we use the following constraints:

$$
\begin{aligned}
& 2 w_i \le \sum_{u \in L_i} x_{uk}, \quad \forall i \in \{1, \ldots, N_L\}, \\
& \sum_{i=1}^{N_L} w_i \ge 2, \\
& w_i \in \{0, 1\}, \quad \forall i \in \{1, \ldots, N_L\}.
\end{aligned} \tag{4.8}
$$

Similar to the complete line constraints, these constraints allow $w_i$ to be one only if at least two out of three players of line i are in the lineup, and there must be at least two $w_i$ equal to one.

Defensemen Stacking. Lastly, we have found that defensemen that are on the first power play line score substantially higher than defensemen that are not. Therefore, to increase a lineup’s average points, one could choose defensemen that are on the first power play line. We will refer to this as defensemen stacking. To model this constraint, let $P_D$ denote the set of defensemen that are on any team’s first power play line. Then, to use only these defensemen in lineup k we add the following constraints to our integer programming formulation:

$$
x_{uk} = 0, \quad \forall u \in D \setminus P_D. \tag{4.9}
$$

This constraint makes sure that any defenseman that is chosen for the lineup is also in the first power play line of some team. There are many other constraints that could be added to the model, such as forcing a lineup to have every player on a team’s first power play, imposing constraints to have seven players from one game, etc. We did not investigate all the possible types of stacking, but they can all be naturally incorporated into our integer programming formulation. In this work we focus on the constraints presented because they capture the major structural correlations in hockey.
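The feasibility constraints (4.5) and the goalie and line stacking constraints (4.6) to (4.8) can be checked directly on a candidate lineup. The sketch below does this in plain Python rather than inside an integer programming solver. The `Player` record, the team codes, the line labels, and the toy lineup are all hypothetical and exist only to exercise the checks.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Player:
    name: str
    team: str
    opponent: str                # team this player faces tonight
    position: str                # "C", "W", "D", or "G"
    price: int
    line: Optional[str] = None   # hypothetical line label, e.g. "T1-L1"


def is_feasible(lineup, budget=50_000):
    """Basic constraints (4.5): budget, nine players, positions, three teams."""
    pos = Counter(p.position for p in lineup)
    return (
        len(lineup) == 9
        and len({p.name for p in lineup}) == 9
        and sum(p.price for p in lineup) <= budget
        and 2 <= pos["C"] <= 3
        and 3 <= pos["W"] <= 4
        and 2 <= pos["D"] <= 3
        and pos["G"] == 1
        and len({p.team for p in lineup}) >= 3
    )


def satisfies_goalie_stacking(lineup):
    """Constraint (4.6): no skater in the lineup opposes the lineup's goalie."""
    goalie_teams = {p.team for p in lineup if p.position == "G"}
    return all(p.opponent not in goalie_teams
               for p in lineup if p.position != "G")


def satisfies_line_stacking(lineup):
    """Constraints (4.7) and (4.8): one complete line, two lines with 2+ players."""
    counts = Counter(p.line for p in lineup if p.line is not None)
    complete = sum(1 for c in counts.values() if c >= 3)
    partial = sum(1 for c in counts.values() if c >= 2)
    return complete >= 1 and partial >= 2


# Toy lineup: games are T1 vs T2 and T3 vs T4; all values are made up.
lineup = [
    Player("g1", "T1", "T2", "G", 8000),
    Player("s1", "T1", "T2", "C", 6000, "T1-L1"),
    Player("s2", "T1", "T2", "W", 5500, "T1-L1"),
    Player("s3", "T1", "T2", "W", 5000, "T1-L1"),
    Player("s4", "T3", "T4", "C", 5500, "T3-L1"),
    Player("s5", "T3", "T4", "W", 5000, "T3-L1"),
    Player("s6", "T4", "T3", "W", 4000, "T4-L2"),
    Player("s7", "T3", "T4", "D", 4500),
    Player("s8", "T4", "T3", "D", 4000),
]
# Swapping in a defenseman who plays against the goalie breaks (4.6).
bad_lineup = lineup[:-1] + [Player("s9", "T2", "T1", "D", 4000)]

print(is_feasible(lineup), satisfies_goalie_stacking(lineup),
      satisfies_line_stacking(lineup), satisfies_goalie_stacking(bad_lineup))
```

Note that a complete line also counts toward the two partial lines in this check, which mirrors how $w_i$ is allowed to equal one for a line that already has $v_i$ equal to one.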

4.4. Multiple Lineup Constraints

Multiple lineups are created by iteratively solving the integer programming formulation. For each new lineup, we add additional linear constraints to the integer programming formulation in order to refine the solution space by removing previously found lineups. Let $x^*_{ul}$ be the optimal value of the variable for player u in lineup l. We define the multiple lineup constraints for lineup k as

$$
\sum_{u=1}^{N} x^*_{ul} x_{uk} \le \gamma, \quad l = 1, \ldots, k - 1. \tag{4.10}
$$

These constraints make sure that lineup k will have no more than γ players in common with any previously constructed lineup. We refer to γ as the maximum lineup overlap. Decreasing γ forces the lineups to be more diverse, whereas for larger γ the lineups will share more players. We will see in Section 5 how the choice of γ is impacted by the number of NHL games being played on a given night.
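The left-hand side of (4.10) is simply the number of players two lineups share. The short sketch below computes it from 0/1 indicator vectors, as in the constraint, and checks a candidate against earlier lineups. The function names and the twelve-player pool are hypothetical.

```python
def overlap(previous_lineup, candidate_lineup, players):
    """Left-hand side of (4.10): sum over players of x*_ul * x_uk."""
    x_star = [1 if u in previous_lineup else 0 for u in players]
    x = [1 if u in candidate_lineup else 0 for u in players]
    return sum(a * b for a, b in zip(x_star, x))


def satisfies_overlap(candidate, previous_lineups, players, gamma):
    """True if the candidate shares at most gamma players with each earlier lineup."""
    return all(overlap(prev, candidate, players) <= gamma
               for prev in previous_lineups)


# Hypothetical pool of twelve players and two nine-player lineups.
players = list("abcdefghijkl")
first = set("abcdefghi")
candidate = set("abcdefgjk")

print(overlap(first, candidate, players))
print(satisfies_overlap(candidate, [first], players, gamma=7))
print(satisfies_overlap(candidate, [first], players, gamma=6))
```

Because $x^*_{ul}$ is a constant once lineup l has been found, each of these constraints stays linear in the decision variables $x_{uk}$.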

4.5. Integer Programming Approach for Multiple Lineups

We now present the final integer programming formulation for constructing m lineups. We obtain lineup k (1 ≤ k ≤ m) by solving

$$
\begin{aligned}
\underset{x}{\text{maximize}} \quad & \sum_{u=1}^{N} f_u x_{uk} \\
\text{subject to} \quad & \text{Equation (4.5)} && \text{(feasibility constraints)} \\
& \text{Equation (4.10)} && \text{(multiple lineup constraints)} \\
& \text{Any subset of equations (4.6), (4.7), (4.8), (4.9)} && \text{(stacking constraints).}
\end{aligned} \tag{4.11}
$$

By solving this integer programming formulation iteratively m times, each time updating the multiple lineup constraints with the previously found solutions, one obtains m lineups. This formulation allows flexibility in the choice of stacking constraints, prediction model, maximum lineup overlap, and number of lineups constructed. We will see next how to select the best integer programming settings.
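The iterative procedure can be illustrated on a toy instance. The sketch below is not the authors' JuMP implementation: it replaces the integer programming solver with brute-force enumeration, uses three-player lineups instead of nine, and keeps only a budget constraint and the overlap constraint (4.10). All names, points, and prices are hypothetical.

```python
from itertools import combinations


def build_lineups(points, prices, size, budget, gamma, m):
    """Repeatedly pick the best feasible lineup, then restrict overlap with it.

    Brute force stands in for the integer programming solver, so this only
    works for very small player pools.
    """
    players = sorted(points)
    lineups = []
    for _ in range(m):
        best, best_value = None, float("-inf")
        for cand in combinations(players, size):
            if sum(prices[p] for p in cand) > budget:
                continue  # budget constraint
            if any(len(set(cand) & set(prev)) > gamma for prev in lineups):
                continue  # multiple lineup constraint (4.10)
            value = sum(points[p] for p in cand)
            if value > best_value:
                best, best_value = cand, value
        if best is None:
            break  # no feasible lineup remains for this gamma
        lineups.append(best)
    return lineups


# Hypothetical predicted points and prices for five players.
points = {"a": 5.0, "b": 4.0, "c": 3.5, "d": 3.0, "e": 2.0}
prices = {"a": 5, "b": 4, "c": 4, "d": 3, "e": 2}

loose = build_lineups(points, prices, size=3, budget=12, gamma=2, m=3)
tight = build_lineups(points, prices, size=3, budget=12, gamma=1, m=3)
print(loose)
print(tight)
```

In this toy instance the tighter overlap limit runs out of feasible lineups before m is reached, a small-scale version of the difficulty with small maximum overlaps reported in Section 5.3.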

5. Performance

We now evaluate the performance of our algorithm on real daily fantasy sports hockey contests. We obtained data on the performance of the population lineups in several actual DraftKings hockey contests by entering the contest. As entrants, we are able to download the lineups and points of every lineup in the contest. In total we have data for 38 different top heavy contests which contain several thousand entrants. We run our algorithm and evaluate how much profit our lineups would have made in each contest competing against the actual population lineups. We investigate a variety of integer programming settings, including the number of lineups entered, the type of lineup stacking used, the prediction model used, and other features.

We combine the different constraints from Section 4.3 to create six different types of stacking which are summarized in Table 3. The first type is no stacking, which only uses the basic constraints and none of the stacking constraints. All the other types of stacking use goalie stacking (equation (4.6)) as this is a very natural negative correlation we wish to avoid. Also, all the other types of stacking require at least one complete line (equation (4.7)), and at least two partial lines (equation (4.8)). Only two defensemen are allowed in Type 2, the logic being that the utility player should be an offensive skater that will score more points, as was seen in Figure 2. Types 3 and 4 use defensemen stacking (equation (4.9)). Types 4 and 5 require exactly three different teams to be represented on a lineup. This constraint increases the lineup variance by having more players on the same team in the lineup.

Table 3 Summary of the constraints for different types of stacking in our integer programming formulation for constructing hockey lineups. Here $C_k$ denotes the basic constraints (4.5) for lineup k.

No stacking: $C_k$
Type 1: $C_k$, Equations (4.6), (4.7), (4.8)
Type 2: $C_k$, Equations (4.6), (4.7), (4.8), $\sum_{u \in D} x_{uk} = 2$
Type 3: $C_k$, Equations (4.6), (4.7), (4.8), (4.9)
Type 4: $C_k$, Equations (4.6), (4.7), (4.8), (4.9), $\sum_{i=1}^{N_T} t_i = 3$
Type 5: $C_k$, Equations (4.6), (4.7), (4.8), $\sum_{i=1}^{N_T} t_i = 3$

5.1. Impact of Stacking and Number of Lineups

We begin by investigating the performance of each type of stacking versus the number of lineups entered. We create different numbers of lineups for each type of stacking with a maximum lineup overlap of seven and calculate the mean profit margin across all contests. We plot the results in Figure 8. We observe that the lowest profit margin is achieved by no stacking. This shows the importance of stacking lineups to increase lineup variance. Type 4 stacking achieves the highest profit margin for 100 and 200 lineups. For one lineup, Type 3 stacking has a higher mean profit margin due to a single contest where the profit margin is 124%. Excluding this single outlier, Type 4 stacking has the highest mean profit margin for one lineup. Recall that Type 4 stacking includes goalie stacking, line stacking (complete and partial), defensemen stacking, and exactly three different teams. It has more stacking than all other types, which we expect to give its lineups more variance. We see from the data that this variance maximizing strategy is effective for winning top heavy contests.

It seems from Figure 8 that the mean profit margin is largest for one lineup. However, this is a deceiving statistic because the mean is dictated essentially by a single outlier for these top heavy contests. A more accurate picture is provided by the boxplot in Figure 9. Here we show the profit margin for Type 4 stacking with a maximum lineup overlap of seven. We see that with one lineup, the median profit margin is -100%. However, with more lineups, the median profit margin becomes larger. In addition, the 75th percentile is larger for 200 lineups than for 100 lineups. This shows that with more lineups the distribution of the profit margin is shifted upward and given a slightly fatter tail. Therefore, to win more money consistently, it is advantageous to use more lineups.
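The gap between the mean and the median profit margin is easy to reproduce with made-up numbers. In the sketch below the margins are hypothetical and are not the contest data: a single entry loses its fee in 37 of 38 contests and wins big in one. The function name is also an assumption.

```python
from statistics import mean, median


def summarize(margins):
    """Return (mean, median) of a list of profit margins in percent."""
    return mean(margins), median(margins)


# Hypothetical margins: 37 total losses and one large win.
margins = [-100.0] * 37 + [5000.0]
avg, med = summarize(margins)
print(f"mean = {avg:.1f}%, median = {med:.1f}%")
```

The mean is positive only because of the single win, while the median reports the typical outcome of losing the entry fee. This is why the boxplot is the more informative summary for a single lineup.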

Figure 8 Plot of the mean profit margin [%] in DraftKings hockey contests versus the number of integer programming lineups (1, 100, and 200). The plots are for different types of stacking constraints (no stacking and Types 1 to 5). The maximum lineup overlap allowed is seven.

Figure 9 Boxplot of the profit margin [%] in DraftKings hockey contests versus the number of integer programming lineups (1, 100, and 200). The lineups use Type 4 stacking and a maximum lineup overlap of seven.

5.2. Impact of Prediction Models

We next investigate the impact of different prediction models on our performance. We consider Type 4 stacking, a maximum lineup overlap of seven, and 200 lineups. The prediction models we consider use Rotogrinders predictions, Daily Fantasy Nerd predictions, both sites’ predictions, and both sites plus the win probability. From Table 2 we saw that the best model (in terms of R squared) for predicting goalie points used only the Rotogrinders and Daily Fantasy Nerd predictions and not the win probability information. However, we find that in terms of profit margin, this is not the case. We show in Figure 10 the mean profit margin of different prediction models for different numbers of integer programming lineups. We see here that the model with all three features does the best or near the best for each number of lineups. The reason may be that by including the win probability, the integer programming lineups are more biased towards goalies expected to win the games, resulting in a better chance of obtaining the game win bonus of three points. Therefore, though the regression analysis suggested that the win probability feature resulted in worse predictions, it actually biased our lineups towards goalies more likely to win, which gave us the edge in the contests.

Figure 10 Plot of the mean profit margin [%] of our integer programming approach in DraftKings hockey contests with different prediction models, for 1, 100, and 200 lineups. The models considered are Daily Fantasy Nerd predictions (DFN), Rotogrinders predictions, DFN and Rotogrinders predictions, and DFN and Rotogrinders predictions with Vegas odds. The maximum lineup overlap allowed is seven and constraint type four is used for the integer programming lineups.

5.3. Impact of Maximum Lineup Overlap

One parameter we can adjust in our model is the maximum lineup overlap. By decreasing the allowed overlap, we force the integer programming approach to explore more of the space of possible lineups. We find that the degree to which one wants to explore this space depends upon how large it is. The size of the lineup space on a given night is larger if there are more NHL games being played. We plot the mean profit margin for 200 integer programming lineups as a function of the number of games played in a night for different values of the maximum overlap in Figure 11. These lineups were created using Type 4 stacking. For nights with fewer than four games, a maximum overlap of seven does best. For a larger number of games, it seems that decreasing the maximum overlap improves performance. For instance, for nights with more than nine games, an overlap of four does best. The reason for this is that when there are few games being played, the lineup space is smaller and probably has fewer good lineups. By reducing the maximum overlap, we end up not selecting these good lineups and instead choose many poor lineups. However, on nights with many games, there is a huge lineup space that has several different good lineups. A smaller overlap causes the integer programming approach to choose several of these good lineups, increasing the chance that one of them will achieve a high rank. For maximum overlaps smaller than four we found that it becomes difficult to obtain 200 feasible lineups. Maximum overlaps larger than seven produce lineups that are too similar and do not provide enough diversity. Therefore, the maximum overlap should be tuned between four and seven depending on how many games are being played on a given night.

Figure 11 Plot of the mean profit margin [%] of 200 integer programming lineups with Type 4 stacking in DraftKings hockey contests versus maximum lineup overlap (4, 5, 6, and 7) for different numbers of NHL hockey games being played (fewer than 4, 4 to 6, 7 to 9, and more than 9 games).

5.4. Lineup Creation Order Versus Lineup Performance Rank

The integer programming formulation is solved iteratively to create multiple lineups. One question to ask is whether the lineups which are created earlier perform better. To investigate this, we set the integer programming approach to create 200 lineups with Type 4 stacking and a maximum overlap of seven. We then evaluate their performance rank relative to each other in the contests. This allows us to look at several statistics relating the creation rank and the performance rank of the lineups. We first investigate the Spearman rank correlation coefficient between these two rankings. We find that the average value of the correlation coefficient across the different contests is 0.09 with a standard deviation of 0.1. This suggests that there is very little correlation between the creation rank and performance rank of the lineups.

To dive deeper into the analysis, we plot in Figure 12 two different sets of data. First, we show a boxplot of the creation rank of the best performing lineup. Second, we show the performance rank of the first created lineup. We see that the median performance rank of the first created lineup is 74.5 out of 200. This shows that the first created lineup performs slightly, but not substantially, better than a typical lineup. For the best performing lineup, the median creation rank is 124.5. Therefore, while the first created lineup is slightly better, the winning lineup is generally one of the later created lineups. This supports our finding of a very low correlation between the creation and performance rank of the lineups.
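For reference, the Spearman rank correlation between creation rank and performance rank can be computed as in the sketch below. It uses the standard formula for rankings without ties, which is an assumption about the data. The helper names and the five-lineup example are hypothetical.

```python
def ranks_from_scores(scores):
    """Rank 1 goes to the highest score; assumes there are no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rho(rank_a, rank_b):
    """Spearman coefficient 1 - 6 * sum(d^2) / (n (n^2 - 1)) for tie-free ranks."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical contest: lineups listed in creation order with their points.
creation_rank = [1, 2, 3, 4, 5]
contest_points = [31.0, 35.5, 22.0, 27.5, 18.0]
performance_rank = ranks_from_scores(contest_points)

print(performance_rank)
print(spearman_rho(creation_rank, performance_rank))
```

A value near one would mean that earlier lineups score higher, while values near zero, such as the 0.09 average reported above, indicate little relationship between the two rankings.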

Figure 12 Plot of the performance rank of the first created lineup and the creation rank of the best performing lineup. There were 200 integer programming lineups created with Type 4 stacking and a maximum overlap of seven.

5.5. Mean and Standard Deviation of Lineup Points Versus Profit Margin

We also look at the empirical mean and standard deviation of the integer programming lineups (µ1 and σ1) versus the population lineups (µ0 and σ0) for each contest. We use 200 integer programming lineups with Type 4 stacking and a maximum overlap of seven and plot the profit margin versus the difference in the mean and standard deviation in Figure 13. The mean difference of the integer programming lineups varies substantially across contests, swinging between -10 and 20 points. When it is below zero, no money is made, but when it is above zero, a substantial profit is made. This is a result of the fact that the lineups must satisfy the stacking constraints which increase their individual variance, but also maximize predicted points. The result is that when the actual points are realized, we are either far above or far below the population mean. The standard deviation of the integer programming lineups is generally below the population standard deviation. This is because our lineups are correlated with each other, since they are all designed to maximize points, even though the individual lineups are designed to have a high variance. The variance we induce in the lineups is manifested through the swing of their mean across contests. From Figure 13 it seems that we are equally likely to be either above or below the population mean. However, because the contest payoff structure is so asymmetric, with a maximum loss of 100%, but a maximum gain which is at least an order of magnitude larger, our strategy ends up being profitable.
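A stylized calculation, which is not taken from the contest data, shows why such an asymmetric payoff can be profitable even when wins and losses are equally likely. Suppose, as a simplifying assumption, that a contest yields a profit margin of $-1$ (a loss of 100%) with probability $1/2$ and a margin of $g$ with probability $1/2$. The symbol $g$ and the two-outcome model are hypothetical.

```latex
\mathbb{E}[\text{margin}] = \tfrac{1}{2}(-1) + \tfrac{1}{2}\, g = \frac{g - 1}{2},
\qquad
\mathbb{E}[\text{margin}] > 0 \iff g > 1 .
```

For example, a gain of $g = 10$ (1000%), an order of magnitude larger than the maximum loss, gives an expected margin of $4.5$, or 450%, under this toy model.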

Figure 13 Plot of profit margin versus µ1 − µ0 (horizontal axis) and σ1 − σ0 (vertical axis) for DraftKings hockey contests with 200 integer programming lineups created using Type 4 stacking and a maximum overlap of seven. The profit margin for each contest is indicated through the color and size of the markers and also listed next to the marker. The x markers indicate contests where the profit margin is not positive. (Marker labels: 8%, 435%, 642%, 1220%, 77%, 102%, 744%, 127%, 4%, 161%, 554%, 13%, 178%, 1445%, 41%, 114%.)

5.6. Implementation and Runtime

All formulations were constructed using the JuMP algebraic modeling language (Dunning et al. 2015, Lubin and Dunning 2015), which is written in the Julia programming language (Bezanson et al. 2012). This allowed us to test the runtime of our approach for various integer programming solvers. In practice, we must wait for information about which goalies are being played to be posted before we construct our lineups. Goalies do not play every night, so if we do not wait for this information, we may put goalies in lineups who are not playing and receive zero points for them. This goalie information is generally posted on public websites about 30 minutes before the games begin. We must be able to solve the integer programming formulation in this time to be able to enter the contest. We show in Figure 14 the time needed to solve the integer programming formulation for 200 lineups in each of our contests using different integer programming solvers. We consider the free solvers CBC (COIN-OR) and GLPK (GNU) and the commercial solver Gurobi (Gurobi Optimization). All computations were done using an Intel Core i5-4570 3.20GHz with 8GB of RAM. We find that all solvers are able to solve the integer programming formulations in under four minutes, giving sufficient time to enter the lineups into the contest. The complete software used to construct the integer programming lineups is available at https://github.com/dscotthunter/Fantasy-Hockey-IP-Code.

Figure 14 The runtime [minutes] of our algorithm for generating 100 lineups with different integer programming software (CBC, GLPK, and Gurobi).

6. Conclusion

We have shown here that integer programming can provide a significant edge in daily fantasy sports contests. Our integer programming approach was able to win several top heavy hockey contests. We have studied how different settings of the integer programming approach impact performance. Furthermore, we provided an analysis of the edge needed from the integer programming approach in order to win top heavy contests. At a high level, our approach can be summarized as follows.

1. Use publicly available predictions for player points.
2. Construct an integer programming formulation to construct a lineup that maximizes points while forcing the lineup to have a high variance by including lines (also known as stacking).
3. Solve the integer programming formulation multiple times, each time adding constraints to produce distinct lineups.

The success we were able to achieve given the relative simplicity of our approach shows the power of integer programming in daily fantasy sports contests. An interesting question is how easily this integer programming approach can be ported over to other sports besides hockey. Of the items listed above, the first and third are generic to any sport. The only one unique to hockey is the way in which we formulated stacking constraints. There are structural correlations in hockey which we exploited to produce higher variance lineups. While other sports may not have these specific structural correlations, they do have other structural correlations which can be used to stack lineups. We now briefly discuss these for a few popular sports.

American Football. In American football daily fantasy sports contests, lineups are required to have a single quarterback who throws the ball, and receivers who catch the ball. Fantasy points are awarded to the quarterback and receiver for any completed passes, yards achieved on the completion, and any touchdown scored on the completion. This leads to a very natural structural correlation. By creating a lineup with a quarterback and receivers on the same team, the variance of the lineup will increase. This is a well known approach to stacking football lineups (Bales 2015b).

Baseball. In baseball, a pitcher throws the ball to a batter who must hit the ball and then run to four bases to score runs. There is a fixed set of batters for a team, and usually a single pitcher for the majority of a game. Fantasy points are given to the batters for hits and runs. The structural correlation in baseball is created by the pitcher. If a pitcher is weak, then all the opposing batters will likely score many fantasy points. Therefore, to create high variance lineups in baseball, batters should be selected from the same team if the opposing pitcher is expected to be weak. This stacking approach is also well known (Bales 2015a).
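The reason pairing positively correlated players raises lineup variance follows from the variance of a sum. As a brief worked illustration, let $X_a$ and $X_b$ be the fantasy points of two players (for example a quarterback and his receiver) with standard deviations $\sigma_a$ and $\sigma_b$ and correlation $\rho$. These symbols are introduced here for illustration only.

```latex
\operatorname{Var}(X_a + X_b) = \sigma_a^2 + \sigma_b^2 + 2 \rho\, \sigma_a \sigma_b .
```

For fixed $\sigma_a$ and $\sigma_b$ the variance of the pair grows with $\rho$: it is largest for positively correlated players and smallest for negatively correlated ones, such as a goalie and the skaters he faces in hockey.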

References T. Awad. Numbers on ice: Fixing plus/minus. Hockey Prospectus, apr 2009.

Jonathan Bales. The art of stacking (or not) in daily . http://playbook.draftkings.com/mlb/the-art-of- stacking-or-not-in-daily-fantasy-baseball/, June 2015a. Hunter, Vielma, and Zaman: Winning 26

Jonathan Bales. Pairing a QB with his receiver(s). https://rotogrinders.com/articles/pairing-a-qb-with-his-receiver-s-481544, December 2015b.

David Beaudoin and Tim B Swartz. Strategies for pulling the goalie in hockey. The American Statistician, 64(3), 2010.

A. Becker and A. Sun. An analytical approach for fantasy football draft and lineup management. Journal of Quantitative Analysis in Sports, 2014.

Jeff Bezanson, Stefan Karpinski, Viral Shah, and Alan Edelman. Julia: A fast dynamic language for technical computing. CoRR, abs/1209.5145, 2012.

New York Times Editorial Board. Rein in online fantasy sports gambling. http://www.nytimes.com/2015/10/05/opinion/rein-in-online-fantasy-sports-gambling.html, October 2015.

COIN-OR. CBC: COIN-OR Branch and Cut. https://projects.coin-or.org/Cbc.

DailyFantasyNerd. Daily fantasy nerd. http://dailyfantasynerd.com, December 2015.

DraftKings. Draftkings. http://draftkings.com, December 2015.

Drewby. Rackem stackem: Dfs hockey. http://dailyroto.com/nhl-dfs-stacking-strategy-draftkings-, October 2015.

Iain Dunning, Joey Huchette, and Miles Lubin. JuMP: A modeling language for mathematical optimization. arXiv:1508.01982 [math.OC], 2015.

GNU. GLPK: GNU Linear Programming Kit. https://www.gnu.org/software/glpk/.

Robert B Gramacy, Shane T Jensen, and Matt Taddy. Estimating player contribution in hockey with regularized logistic regression. Journal of Quantitative Analysis in Sports, 9(1):97–111, 2013.

Gurobi Optimization. The Gurobi Optimizer. http://www.gurobi.com.

Drew Harwell. Why you (probably) won’t win money playing Draftkings, FanDuel. http://www.dailyherald.com/article/20151012/business/151019683/, October 2015.

Darren Heitner. Draftkings reports $304 million of entry fees in 2014. http://www.forbes.com/sites/darrenheitner/2015/01/22/draftkings-reports-304-million-of-entry-fees-in-2014/, January 2015.

Miles Lubin and Iain Dunning. Computing in operations research using Julia. INFORMS Journal on Computing, 27 (2):238–248, Spring 2015.

Brian Macdonald. An improved adjusted plus-minus statistic for nhl players. In Proceedings of the MIT Sloan Sports Analytics Conference, http://www.sloansportsconference.com, 2011.

Stephen Pettigrew. Assessing the offensive productivity of nhl players using in-game win probabilities. 2015.

Rotogrinders. Rotogrinders. https://rotogrinders.com/, December 2015.

M Schuckers. Digr: A defense independent rating of nhl goaltenders using spatially smoothed save percentage maps. In Proceedings of the MIT Sloan Sports Analytics Conference, 2011.

Michael Schuckers and James Curro. Total hockey rating (thor): A comprehensive statistical rating of national hockey league forwards and defensemen based upon all on-ice events. In Proceedings of the 2013 MIT Sloan Sports Analytics Conference, http://www.statsportsconsulting.com/thor, 2013.

Michael E Schuckers, Dennis F Lock, Chris Wells, CJ Knickerbocker, and Robin H Lock. National hockey league skater ratings based upon all on-ice events: An adjusted minus/plus probability (ampp) approach. Unpublished manuscript. Available at http://myslu.stlawu.edu/~msch/sports/LockSchuckersProbPlusMinus113010.pdf, 2011.

Daniel Steinberg. How daily fantasy football pricing algorithms work. https://dailyfantasywinners.com/fantasy-categories/featured/daily-fantasy-football-pricing-algorithms-work/, 2015.

AC Thomas, Samuel L Ventura, Shane T Jensen, Stephen Ma, et al. Competing process hazard function models for player ratings in ice hockey. The Annals of Applied Statistics, 7(3):1497–1524, 2013.