1. Problem Definition and Justificaiton

Week 1. Problem Definition and Justificaiton Sam Son Creative IT Studio 3 1. Problem Definition Problem Definition • My problem is to expect the future performance of major league pitchers. Problem Definition • How can we define the performance of a player? • Well, there are many possibilities… • The number of strikes out he made • The number of wins • The number of inning pitched • ERA(the average of lost points per 9 inning pitched) Problem Definition • But … we live in the 21st century • We already know these features works bad as evaluator of performance! • Ex. 2012 Ryu Hyun Jin in KBO. • He striked out 210 and his ERA was 2.66(1st ,5th in the league) • But he only became a winning pitcher in 9 games out of 27 although he was one of the most valuable player in that year. Problem Definition • Statistician and many baseball nerds developed new technique called sabermetrics to evaluate the performance of players exactly. • FIP, WAR, WPA, … and includes many other acronyms you might never hear Problem Definition • WAR(Wins Above Replacement) is the one figure can represent the performance of a player. • It shows us how much wins one player has given the team when compared to replacement-level player; • A player of common skills available for minimum cost to a major league baseball team. A team of replacement-level players would be expected to win a baseline minimum number of games, typically 40-50, per 162 game season.(Wikipedia) Problem Definition • In 2001, Chan Ho Park records 4.3 WAR in LA Dodgers • In 2001, Randy Johnson record 10.1 WAR in D-backs • So… theoretically, LA Dodgers would win about 6 more games if they had Randy Johnson instead of Chan Ho Park. Problem Definition • The following procedure shows how we can calculate fWAR(WAR that fangraphs.com provieds) of a pitcher(https://library.fangraphs.com/war/calculating-war-pitchers/). • WAR = [[([(League “FIP” – “FIP”) / Pitcher Specific Runs Per Win] + Replacement Level) * (IP/9)] * Leverage Multiplier for Relievers] + League Correction • ifFIP = ((13*HR)+(3*(BB+HBP))-(2*(K+IFFB)))/IP + ifFIP constant • ifFIP Constant = lgERA – (((13*lgHR)+(3*(lgBB+lgHBP))- (2*(lgK+lgIFFB)))/lgIP) • Adjustment = lgRA9 – lgERA • FIPR9 = ifFIP + Adjustment • … • I don’t think we need to go deeper. Problem Definition • WAR is widely considered a comprehensive figure that describes the performance of a player best. • So, I will use WAR as an evaluator of performance. • Let’s re-define my problem Problem Definition • My problem is to expect the future WAR of major league pitchers using publicly accessible data, using several machine learning technique such as LSTM, XGBoost. 2. Justification Justification • Q: Have anybody tried to make a system predicting future WAR? • A: Absolutely, yes. • There are already several public prediction system such as ZiPs, Streamer, and each pro sports club must be using their own prediction system. Justification • However, public prediction system works not that good. • Here is my demo justifying my argument. • https://www.youtube.com/watch?v=cmUVDXm--qA&feature=youtu.be • It shows 0.81 error for all players and 1.05 for player whose WAR is larger than 1. Justification • In 2018, the value of 1 WAR was calculated $8m. • It is calculated by (Sum of salary of all players)/(Sum of WAR of all players) • One major league team consists 25 players. • 1 WAR error per person can make big difference. Justification • ZiPs don’t mention any serious machine learning or, especially, deep learning in the description of their method. • There have been some papers using machine learning in baseball problem • https://beta.vu.nl/nl/Images/werkstuk-elfrink_tcm235-888205.pdf • http://cs229.stanford.edu/proj2005/Donaker- MLBPredictionAndAnalysis.pdf • But, they focused on match prediction between two teams rather than prediction on performance. Justification • So, my goal should be making a performance prediction system that out-performs existing systems. • To be specific, I will use 2017, 2018 ZiPs prediction data as references. My goal is to make a prediction system that shows better prediction in 2017, 2018 performance than ZiPs’ prediction. Thank you.

Load more