Week 1. Problem Definition and Justification
Sam Son, Creative IT Studio 3

1. Problem Definition
• My problem is to predict the future performance of major league pitchers.
• How can we define the performance of a player?
• Well, there are many possibilities:
  • The number of strikeouts he records
  • The number of wins
  • The number of innings pitched
  • ERA (earned runs allowed per 9 innings pitched)
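ERA is conventionally computed as 9 × earned runs / innings pitched. The sketch below assumes innings arrive in the usual box-score notation, where the digit after the point counts outs, so "6.1" means 6⅓ innings. The helper names are hypothetical.

```python
def innings_from_notation(ip: str) -> float:
    """Convert box-score innings notation to true innings.

    The digit after the point counts outs, so "6.1" means 6 1/3 innings
    and "6.2" means 6 2/3 innings.
    """
    whole, _, outs = ip.partition(".")
    n_outs = int(outs) if outs else 0
    if n_outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + n_outs / 3


def era(earned_runs: int, innings: float) -> float:
    """Earned runs allowed per 9 innings pitched."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * earned_runs / innings
```

For example, `era(9, 27)` returns 3.0.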
• But … we live in the 21st century
• We already know these statistics work poorly as measures of performance!
• Ex. Ryu Hyun Jin in the 2012 KBO season:
  • He struck out 210 batters (1st in the league) and posted a 2.66 ERA (5th in the league).
  • Yet he was credited with a win in only 9 of his 27 games, even though he was one of the most valuable players in the league that year.
• Statisticians and many baseball nerds developed a new approach called sabermetrics to evaluate player performance more accurately.
  • FIP, WAR, WPA, … and many other acronyms you may never have heard of
• WAR (Wins Above Replacement) is a single figure that can represent the overall performance of a player.
• It measures how many more wins a player has given his team compared with a replacement-level player:
  • "A player of common skills available for minimum cost to a major league baseball team. A team of replacement-level players would be expected to win a baseline minimum number of games, typically 40-50, per 162 game season." (Wikipedia)
• In 2001, Chan Ho Park recorded 4.3 WAR for the LA Dodgers.
• In 2001, Randy Johnson recorded 10.1 WAR for the Arizona Diamondbacks.
• So, theoretically, the LA Dodgers would have won about 6 more games (10.1 - 4.3 = 5.8) if they had had Randy Johnson instead of Chan Ho Park.
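The arithmetic can be made explicit. WAR is treated as additive, so swapping one player for another changes expected wins by the difference in their WAR. A team's wins can then be roughly estimated as the replacement-level baseline plus the total WAR of its players. This is a minimal sketch. The 48-win default baseline is an assumption picked from the 40-50 range quoted above, and the function names are hypothetical.

```python
def war_swap_gain(war_in: float, war_out: float) -> float:
    """Expected extra wins from replacing a player worth war_out with one worth war_in."""
    return war_in - war_out


def expected_team_wins(player_wars: list[float], replacement_baseline: float = 48.0) -> float:
    """Rough team win estimate: replacement-level baseline plus total WAR.

    replacement_baseline is an assumption; the quoted range is 40-50 wins
    per 162-game season.
    """
    return replacement_baseline + sum(player_wars)
```

For the Park/Johnson example, `round(war_swap_gain(10.1, 4.3), 1)` gives 5.8, which is the "about 6 games" above.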
• The following procedure shows how to calculate fWAR (the version of WAR provided by fangraphs.com) for a pitcher (https://library.fangraphs.com/war/calculating-war-pitchers/).
  • WAR = [[([(League "FIP" - "FIP") / Pitcher Specific Runs Per Win] + Replacement Level) * (IP/9)] * Leverage Multiplier for Relievers] + League Correction
  • ifFIP = ((13*HR) + (3*(BB+HBP)) - (2*(K+IFFB))) / IP + ifFIP Constant
  • ifFIP Constant = lgERA - (((13*lgHR) + (3*(lgBB+lgHBP)) - (2*(lgK+lgIFFB))) / lgIP)
  • Adjustment = lgRA9 - lgERA
  • FIPR9 = ifFIP + Adjustment
  • …
• I don't think we need to go any deeper.
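Here is a minimal Python sketch of the formulas on this slide. The abbreviations are the standard ones: HR home runs, BB walks, HBP hit by pitch, K strikeouts, IFFB infield fly balls, IP innings pitched, and the lg prefix for league totals. The class and function names are hypothetical. Some inputs come out of the steps the slide elides with "…": the league and pitcher "FIP" terms, runs per win, replacement level, leverage multiplier and league correction. This sketch assumes the caller passes those values in rather than deriving them.

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    """Counting stats for one pitcher, or league totals for the lg* terms."""
    hr: float    # home runs allowed
    bb: float    # walks (bases on balls)
    hbp: float   # hit by pitch
    k: float     # strikeouts
    iffb: float  # infield fly balls
    ip: float    # innings pitched, as true innings (6 1/3, not 6.1)


def fip_core(s: PitchingLine) -> float:
    """((13*HR) + (3*(BB+HBP)) - (2*(K+IFFB))) / IP"""
    return (13 * s.hr + 3 * (s.bb + s.hbp) - 2 * (s.k + s.iffb)) / s.ip


def iffip_constant(league: PitchingLine, lg_era: float) -> float:
    """ifFIP Constant = lgERA - league-level fip_core."""
    return lg_era - fip_core(league)


def iffip(pitcher: PitchingLine, league: PitchingLine, lg_era: float) -> float:
    """ifFIP = pitcher-level fip_core + ifFIP Constant."""
    return fip_core(pitcher) + iffip_constant(league, lg_era)


def fipr9(pitcher: PitchingLine, league: PitchingLine,
          lg_era: float, lg_ra9: float) -> float:
    """FIPR9 = ifFIP + Adjustment, where Adjustment = lgRA9 - lgERA."""
    return iffip(pitcher, league, lg_era) + (lg_ra9 - lg_era)


def war(league_fip: float, pitcher_fip: float, runs_per_win: float,
        replacement_level: float, ip: float,
        leverage_multiplier: float = 1.0, league_correction: float = 0.0) -> float:
    """Top-level WAR formula from the slide.

    league_fip and pitcher_fip are the "FIP" terms produced by the elided
    steps; they are inputs here, not derived from fipr9().
    """
    wins_per_9 = (league_fip - pitcher_fip) / runs_per_win + replacement_level
    return wins_per_9 * (ip / 9) * leverage_multiplier + league_correction
```

The constant is calibrated so that the league as a whole gets an ifFIP equal to lgERA. That makes `iffip(league, league, lg_era)` a quick sanity check.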
• WAR is widely considered the most comprehensive single figure for describing a player's performance.
• So, I will use WAR as the measure of performance.
• Let's redefine my problem.
• My problem is to predict the future WAR of major league pitchers from publicly accessible data, using machine learning techniques such as LSTM and XGBoost.

2. Justification
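One common way to frame this as supervised learning is to turn each pitcher's season-by-season WAR history into training pairs of the form (previous k seasons, next season). The pairs can feed a sequence model such as an LSTM or, with the window used as a flat feature vector, a tree model such as XGBoost. This framing is an assumption, not a design stated in the slides. The function name and window length below are hypothetical, and windows containing a missing season are skipped.

```python
def make_windows(history: dict[int, float], window: int = 3) -> list[tuple[list[float], float]]:
    """Build (last `window` seasons of WAR, next-season WAR) pairs for one pitcher.

    history maps season year -> WAR. Only runs of consecutive seasons are
    used, so a year missed (e.g. to injury) breaks the window.
    """
    years = sorted(history)
    pairs = []
    for i in range(window, len(years)):
        span = years[i - window : i + 1]  # window feature years + 1 target year
        if span[-1] - span[0] != window:  # gap somewhere inside the span
            continue
        features = [history[y] for y in span[:-1]]
        target = history[span[-1]]
        pairs.append((features, target))
    return pairs
```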
• Q: Has anybody tried to build a system that predicts future WAR?
• A: Absolutely, yes.
• There are already several public prediction systems such as ZiPS and Steamer, and each professional club presumably uses its own prediction system.
• However, the public prediction systems do not perform that well.
• Here is a demo supporting my argument:
  • https://www.youtube.com/watch?v=cmUVDXm--qA&feature=youtu.be
• It shows an error of 0.81 WAR for all players and 1.05 WAR for players whose WAR is larger than 1.
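The slide does not name the error metric. Assuming it is mean absolute error in WAR, the two reported figures could be computed as below. Another assumption is that the "WAR larger than 1" filter applies to the actual WAR, not the predicted WAR. The function name is hypothetical.

```python
from typing import Optional, Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float],
                        min_war: Optional[float] = None) -> float:
    """Mean |predicted - actual| in WAR.

    If min_war is given, only players whose actual WAR exceeds it are kept
    (assumption about how the "WAR larger than 1" subset is defined).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = [(p, a) for p, a in zip(predicted, actual)
             if min_war is None or a > min_war]
    if not pairs:
        raise ValueError("no players left after filtering")
    return sum(abs(p - a) for p, a in pairs) / len(pairs)
```

Calling it once with `min_war=None` and once with `min_war=1.0` yields the two numbers in the form the demo reports them.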
• In 2018, the value of 1 WAR was calculated to be $8m.
  • It is calculated as (sum of the salaries of all players) / (sum of the WAR of all players).
• One major league team consists of 25 players.
• So an error of 1 WAR per player can make a big difference.
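The dollar figures above can be checked with a little arithmetic. This is a sketch with hypothetical helper names. The roster cost assumes the worst case, where every player's error points in the same direction. That overstates the typical impact, because errors on different players partly cancel.

```python
def dollars_per_war(salaries: list[float], wars: list[float]) -> float:
    """(sum of salaries of all players) / (sum of WAR of all players)."""
    total_war = sum(wars)
    if total_war <= 0:
        raise ValueError("total WAR must be positive")
    return sum(salaries) / total_war


def roster_error_cost(error_per_player: float, roster_size: int = 25,
                      value_per_war: float = 8_000_000) -> float:
    """Worst-case dollar value of a per-player WAR error across a roster.

    Assumes all errors point the same way, so they add up instead of cancelling.
    """
    return abs(error_per_player) * roster_size * value_per_war
```

With the defaults, `roster_error_cost(1.0)` is 200,000,000. In other words, a 1 WAR error on every player could misjudge up to $200m of value on a 25-player roster.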
• ZiPS does not mention any serious machine learning, let alone deep learning, in the description of its method.
• There have been some papers applying machine learning to baseball problems:
  • https://beta.vu.nl/nl/Images/werkstuk-elfrink_tcm235-888205.pdf
  • http://cs229.stanford.edu/proj2005/Donaker-MLBPredictionAndAnalysis.pdf
• But they focused on predicting the outcome of matches between two teams rather than predicting player performance.
• So, my goal is to build a performance prediction system that outperforms existing systems.
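Here is one simple way to test "outperforms". Match each pitcher's prediction from the new model and from an existing system such as ZiPS against his actual WAR, then compare mean absolute errors season by season. The metric, the data layout and the function name are assumptions for illustration. The slides do not fix them.

```python
# One row per pitcher: (my prediction, existing-system prediction, actual WAR)
Row = tuple[float, float, float]


def compare_seasons(seasons: dict[int, list[Row]]) -> dict[int, tuple[float, float, bool]]:
    """Return {year: (my MAE, baseline MAE, whether mine is lower)}."""
    results = {}
    for year, rows in sorted(seasons.items()):
        if not rows:
            raise ValueError(f"no players for season {year}")
        mine = sum(abs(m - a) for m, _, a in rows) / len(rows)
        baseline = sum(abs(b - a) for _, b, a in rows) / len(rows)
        results[year] = (mine, baseline, mine < baseline)
    return results
```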
• To be specific, I will use the 2017 and 2018 ZiPS projections as references. My goal is a prediction system that predicts 2017 and 2018 performance better than ZiPS did.

Thank you