Reinforcement Learning of Optimal Strategies in Fantasy Football (CS221 AI Project) | Young Gun You, Warren MacDonald
What is DFS (Daily Fantasy Sports)? (Not depth-first search!)
• Daily fantasy sports are a subset of fantasy sports games.
• Participants compete against one another by building a team of professional athletes from a particular league while remaining under a salary cap.
• They earn points based on the actual performance of those athletes in real-world competitions.

How to Play (an example for DraftKings, a DFS platform):
• Select 9 players:
– 1 x Quarterback (QB)
– 2 x Running back (RB)
– 3 x Wide receiver (WR)
– 1 x Tight end (TE)
– 1 x Flex (RB, WR, or TE)
– 1 x Team defense (DST)
• The combined salary of the 9 players must stay under the $50,000 cap.

Our Objective: Find a way to win the tournament!
• The median score among tournament participants is 129 points.
• To win a double-up tournament, we need roughly 135 points.
• The ultimate goal is to find an optimal strategy for building a team of 9 players that maximizes ROI.

Our Challenges: Easy to play, hard to win…
• DraftKings itself uses machine learning to assign salaries, aiming to equalize value across players.
• Predicting individual player performance with ML is unreliable: the average prediction error is ~5 points, while the average score is only ~9 points.
• Each player plays only 10-15 games per season, so limited data is available.

System Overview:
The system has four parts: data collection, an artificial tournament environment, genetic-algorithm line-up optimization, and a Q-learning agent. The agent interacts with the environment in the usual (s, a, s') loop: it chooses an action, the environment plays out a simulated tournament, and the resulting reward is used to update Q.

Data Collection (web scraping):
We scraped and cleaned statistical data from the 2014-2016 NFL seasons; each season provides 17 weeks of game data. The weekly player pool covers roughly 450 players across the QB, RB, WR, TE, and DST positions. For each player we collected:
• Game logs
• Player stats
• Team defense stats
• Snap counts
• Vegas money lines
• The fantasy salary assigned by DraftKings
• Fantasy points predictions
• Actual fantasy points

Tournament Environment:
To overcome the limited data availability, we constructed thousands of artificial tournament weeks by recombining individual games from other weeks across all three seasons, assuming games played in different weeks are independent of each other. A unique, artificially generated tournament environment is provided at each learning iteration. Within each environment the match-ups are kept intact, and player duplication is not allowed.

Line-up Optimization (genetic algorithm):
At each iteration we create 20 unique line-ups that maximize the sum of predicted points while meeting the given constraints, using a genetic algorithm to provide randomness. Line-ups are binary encoded and evolved with a population of 1,000 over 100 generations (see the fitness-function sketch below). Constraints:
– Salary under $50,000 with exactly 9 players
– A stacking strategy
– Penalize selecting players from the same team

Q-Learning Agent:
The agent's action for each position (QB, RB, WR, TE, DST) is which attribute the line-up optimizer should favour: high salary, high prediction, or high value (value = prediction / salary). After the 20 line-ups are scored in the simulated tournament, the agent receives a reward and updates Q. We ran 20,000 iterations (400,000 line-ups in total), leveraging AWS with a parallel computing structure.

Three types of reward (sketched in code below):
[1] Median points of the 20 line-ups: evaluates general performance.
[2] ROI on a double-up tournament: assuming 135 points is the cut line for winning, calculate the ROI.
[3] Max points of the 20 line-ups: evaluates GPP performance.

Why Q-learning?
• Earlier roster decisions constrain later ones, so the actions are sequential and interdependent.
• The reward can be determined only at the final state, once the week's games are scored.
• The framework is flexible enough to accommodate more complex actions going forward.
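To make the line-up optimization step concrete, here is a minimal sketch of a fitness function over a binary-encoded line-up. The toy player pool, penalty weights, and function names are illustrative assumptions, not the project's actual implementation; the stacking and same-team penalties used in the project are omitted for brevity.

import random

# Hypothetical toy player pool: (name, position, salary, predicted points).
PLAYERS = [
    ("QB_A", "QB", 7500, 21.0), ("QB_B", "QB", 5800, 16.5),
    ("RB_A", "RB", 8200, 18.0), ("RB_B", "RB", 6100, 13.5), ("RB_C", "RB", 4500, 9.0),
    ("WR_A", "WR", 8800, 19.5), ("WR_B", "WR", 6700, 14.0), ("WR_C", "WR", 5200, 11.0),
    ("WR_D", "WR", 4000, 8.0), ("TE_A", "TE", 5000, 10.5), ("TE_B", "TE", 3400, 7.0),
    ("DST_A", "DST", 3300, 8.5), ("DST_B", "DST", 2500, 6.0),
]
SALARY_CAP = 50000
BASE_SLOTS = {"QB": 1, "RB": 2, "WR": 3, "TE": 1, "DST": 1}   # plus 1 FLEX (RB/WR/TE)

def fitness(chromosome):
    # Sum the predicted points of the selected players, then subtract penalties
    # for violating the roster-size, salary-cap, and position constraints.
    picked = [p for bit, p in zip(chromosome, PLAYERS) if bit]
    points = sum(p[3] for p in picked)
    salary = sum(p[2] for p in picked)
    penalty = 0.0
    if len(picked) != 9:
        penalty += 50.0 * abs(len(picked) - 9)
    if salary > SALARY_CAP:
        penalty += (salary - SALARY_CAP) / 100.0
    for pos, need in BASE_SLOTS.items():
        count = sum(1 for p in picked if p[1] == pos)
        flex = 1 if pos in ("RB", "WR", "TE") else 0          # FLEX allows one extra
        if not (need <= count <= need + flex):
            penalty += 25.0
    return points - penalty

# One random chromosome; a GA would evolve a whole population of these.
chromosome = [random.randint(0, 1) for _ in PLAYERS]
print(fitness(chromosome))

In this sketch the 20 line-ups entered each week would then be drawn from the fittest chromosomes of the final generation.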
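The three reward signals can likewise be written down directly. The sketch below assumes a hypothetical $10 double-up entry fee with a 2x payout and uses the 135-point cut line mentioned above; the list of 20 line-up scores is made up for illustration.

from statistics import median

CUT_LINE = 135.0    # assumed cut line for winning a double-up
ENTRY_FEE = 10.0    # hypothetical entry fee; winning entries pay back 2x

def reward_median(scores):
    # [1] Median points of the 20 line-ups: general performance.
    return median(scores)

def reward_double_up_roi(scores):
    # [2] ROI on a double-up: every line-up above the cut line doubles its entry fee.
    spent = ENTRY_FEE * len(scores)
    returned = sum(2.0 * ENTRY_FEE for s in scores if s >= CUT_LINE)
    return returned / spent          # 1.22 would correspond to 122% ROI

def reward_max(scores):
    # [3] Max points of the 20 line-ups: GPP (top-heavy payout) performance.
    return max(scores)

scores = [118.2, 136.5, 141.0, 127.3, 133.8] + [125.0] * 15   # 20 illustrative scores
print(reward_median(scores), reward_double_up_roi(scores), reward_max(scores))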
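Finally, a sketch of the tabular Q-update the agent could perform, assuming the state is the position currently being decided, the action is which attribute to favour (high salary, high prediction, or high value), and the only reward arrives once the simulated week is scored. The hyperparameters and names here are illustrative assumptions, not the project's exact settings.

import random
from collections import defaultdict

POSITIONS = ["QB", "RB", "WR", "TE", "DST"]     # state: the slot currently being decided
ACTIONS = ["Salary", "Pred", "Value"]           # favour high salary / prediction / value
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2          # assumed learning rate, discount, exploration

Q = defaultdict(float)                          # Q[(state, action)] -> estimated value

def choose_action(state):
    # Epsilon-greedy choice over the three attribute-preference actions.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update_episode(policy, reward):
    # Back the terminal reward up the QB -> RB -> WR -> TE -> DST decision chain.
    for i, state in enumerate(POSITIONS):
        action = policy[state]
        if i + 1 < len(POSITIONS):
            nxt = POSITIONS[i + 1]
            target = GAMMA * max(Q[(nxt, a)] for a in ACTIONS)  # no intermediate reward
        else:
            target = reward                                     # reward only at the end
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# One of the 20,000 iterations: pick a policy, simulate the week, back up the reward.
policy = {pos: choose_action(pos) for pos in POSITIONS}
week_reward = 1.22    # e.g. a 122% double-up ROI for the simulated week
update_episode(policy, week_reward)
print(dict(Q))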
Q-Learning & Outcome:
What if we had actually played those 20,000 simulated weeks, and what could we have learned from that experience? Comparing Q values by policy, best versus worst:
• BEST policy for the Double-Up tournament (122% ROI): QB(Value) + RB(Value) + WR(Pred) + TE(Value) + DST(Value)
• WORST policy for the Double-Up tournament (87% ROI, a gap of 35 percentage points from the best): QB(Salary) + RB(Salary) + WR(Value) + TE(Value) + DST(Pred)
• BEST policy for the GPP tournament (177-point average for the highest-scoring line-up): QB(Pred) + RB(Pred) + WR(Salary) + TE(Pred) + DST(Value)
• WORST policy for the GPP tournament (162 highest points, a 15-point gap from the best): QB(Salary) + RB(Value) + WR(Value) + TE(Salary) + DST(Salary)

Limitations & Next Steps:
Limitations of the existing model:
– Assumed a fixed tournament-winning cut-off
– The agent's decisions are too simple
– Relied on outsourced player projections
– Used only 20 line-ups per week

How to improve further (performance improvement directions):
– Add more sophisticated actions using richer stats (player, team, opponents, weather, etc.)
– Continue improving with weekly game results
– Generate more line-ups (100+) with player ownership control
– Improve/adjust the fitness function of the genetic algorithm
– Use the average (or median) projection from multiple experts

Contact Us:
Warren MacDonald [email protected]
Young Gun You [email protected]