Week 1. Problem Definition and Justification

Sam Son
Creative IT Studio 3

1. Problem Definition

• My problem is to predict the future performance of major league pitchers.

• How can we define the performance of a player?

• Well, there are many possibilities…
  • The number of strikeouts he recorded
  • The number of wins
  • The number of innings pitched
  • ERA (the average number of earned runs allowed per 9 innings pitched)

• But … we live in the 21st century

• We already know these features work badly as evaluators of performance!

• Ex. 2012 Ryu Hyun Jin in KBO.
  • He struck out 210 batters and his ERA was 2.66 (1st and 5th in the league, respectively).
  • But he earned the win in only 9 of his 27 games, although he was one of the most valuable players that year.

• Statisticians and many nerds developed a new set of techniques called sabermetrics to evaluate player performance more accurately.
  • FIP, WAR, WPA, … and many other acronyms you may never have heard of

• WAR (Wins Above Replacement) is a single figure that can represent the performance of a player.

• It shows how many wins a player has given the team compared with a replacement-level player:
  • A player of common skills available for minimum cost to a team. A team of replacement-level players would be expected to win a baseline minimum number of games, typically 40-50, per 162 game season. (Wikipedia)

• In 2001, Chan Ho Park recorded 4.3 WAR with the LA Dodgers.
• In 2001, Randy Johnson recorded 10.1 WAR with the D-backs.

• So… theoretically, the LA Dodgers would have won about 6 more games (10.1 - 4.3 = 5.8) if they had had Randy Johnson instead of Chan Ho Park.

• The following procedure shows how to calculate fWAR (the WAR that fangraphs.com provides) for a pitcher (https://library.fangraphs.com/war/calculating-war-pitchers/).
  • WAR = [[([(League “FIP” - “FIP”) / Pitcher Specific Runs Per Win] + Replacement Level) * (IP/9)] * Leverage Multiplier for Relievers] + League Correction
  • ifFIP = ((13*HR) + (3*(BB+HBP)) - (2*(K+IFFB))) / IP + ifFIP constant
  • ifFIP constant = lgERA - (((13*lgHR) + (3*(lgBB+lgHBP)) - (2*(lgK+lgIFFB))) / lgIP)
  • Adjustment = lgRA9 - lgERA
  • FIPR9 = ifFIP + Adjustment
  • …
• I don’t think we need to go deeper.
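For anyone who wants to try the FIP part of the procedure, here is a minimal Python sketch of the three formulas listed above (ifFIP constant, ifFIP, FIPR9). The function and argument names are my own, not FanGraphs', and the later steps of the WAR calculation (runs per win, replacement level, leverage, league correction) are deliberately left out.

```python
def iffip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_iffb, lg_ip):
    """League constant that puts ifFIP on the same scale as league ERA."""
    lg_component = (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * (lg_k + lg_iffb)) / lg_ip
    return lg_era - lg_component


def iffip(hr, bb, hbp, k, iffb, ip, constant):
    """ifFIP for one pitcher: FIP with infield fly balls counted like strikeouts."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * (k + iffb)) / ip + constant


def fipr9(iffip_value, lg_ra9, lg_era):
    """Move ifFIP from the ERA scale to the RA9 scale."""
    adjustment = lg_ra9 - lg_era
    return iffip_value + adjustment


if __name__ == "__main__":
    # Hypothetical league totals and pitcher line, for illustration only.
    const = iffip_constant(lg_era=4.0, lg_hr=100, lg_bb=300, lg_hbp=30,
                           lg_k=800, lg_iffb=50, lg_ip=1000)
    pitcher = iffip(hr=15, bb=40, hbp=5, k=200, iffb=10, ip=180, constant=const)
    print(round(pitcher, 2), round(fipr9(pitcher, lg_ra9=4.4, lg_era=4.0), 2))
```

One sanity check that follows directly from the formulas: a pitcher whose line equals the league totals gets an ifFIP equal to lgERA.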

• WAR is widely considered the most comprehensive single figure for describing a player's performance.

• So, I will use WAR as the measure of performance.

• Let’s redefine my problem.

• My problem is to predict the future WAR of major league pitchers from publicly accessible data, using machine learning techniques such as LSTM and XGBoost.

2. Justification

• Q: Has anybody tried to build a system that predicts future WAR?
• A: Absolutely, yes.

• There are already several public prediction systems, such as ZiPS and Steamer, and each professional club is presumably using its own prediction system.

• However, the public prediction systems do not work that well.

• Here is my demo supporting this claim.
  • https://www.youtube.com/watch?v=cmUVDXm--qA&feature=youtu.be

• It shows an error of 0.81 for all players and 1.05 for players whose WAR is larger than 1.
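The slides do not say which error metric the demo uses, so the following is only a sketch of how such a pair of numbers could be computed, assuming mean absolute error and assuming the "WAR larger than 1" filter is applied to the actual (not predicted) WAR. All names here are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all players."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def error_report(predicted, actual, war_threshold=1.0):
    """Error over all players and over players whose actual WAR exceeds the threshold."""
    above = [(p, a) for p, a in zip(predicted, actual) if a > war_threshold]
    report = {"all": mean_absolute_error(predicted, actual)}
    if above:
        preds, acts = zip(*above)
        report["above_threshold"] = mean_absolute_error(list(preds), list(acts))
    else:
        # No player clears the threshold, so the second figure is undefined.
        report["above_threshold"] = None
    return report


if __name__ == "__main__":
    # Made-up projections and results, for illustration only.
    print(error_report([1.0, 2.0, 0.5], [2.0, 2.5, 0.0]))
```

Reporting the two figures separately matters because most pitchers sit near 0 WAR, which can make the overall error look smaller than it is for the players who matter most.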

• In 2018, the value of 1 WAR was calculated at $8m.
  • It is calculated as (sum of the salaries of all players) / (sum of the WAR of all players).

• One major league team consists of 25 players.

• An error of 1 WAR per player, at about $8m per WAR, can make a big difference.
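To put a rough number on this, here is a small Python sketch of the dollars-per-WAR formula from this slide, plus a hypothetical helper that scales a per-player error up to a roster. The helper assumes every error points in the same direction, so it gives an upper bound rather than an expected cost.

```python
def dollars_per_war(salaries, wars):
    """(Sum of salaries of all players) / (sum of WAR of all players)."""
    total_war = sum(wars)
    if total_war <= 0:
        raise ValueError("total WAR must be positive")
    return sum(salaries) / total_war


def roster_error_cost(error_per_player_war, roster_size, value_per_war):
    """Upper bound on misvalued salary if every player is off by the same amount."""
    return error_per_player_war * roster_size * value_per_war


if __name__ == "__main__":
    # Figures from the slide: $8m per WAR, 25 players, 1 WAR error per player.
    print(roster_error_cost(1.0, 25, 8_000_000))  # prints 200000000.0
```

With the slide's figures, a 1 WAR error on each of 25 players corresponds to as much as $200m of misjudged value.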

• The description of ZiPS’s method does not mention any serious machine learning or, in particular, deep learning.
• There have been some papers applying machine learning to baseball problems:
  • https://beta.vu.nl/nl/Images/werkstuk-elfrink_tcm235-888205.pdf
  • http://cs229.stanford.edu/proj2005/Donaker-MLBPredictionAndAnalysis.pdf
• But they focused on predicting the outcome of a match between two teams rather than on predicting player performance.

• So, my goal is to build a performance prediction system that outperforms existing systems.

• To be specific, I will use the 2017 and 2018 ZiPS predictions as references. My goal is to build a system that predicts 2017 and 2018 performance better than ZiPS did.

Thank you