1. Problem Definition and Justification

Week 1. Problem Definition and Justification
Sam Son
Creative IT Studio 3

1. Problem Definition

• My problem is to predict the future performance of Major League pitchers.

• How can we define the performance of a player?
• Well, there are many possibilities:
  • The number of strikeouts he records
  • The number of wins
  • The number of innings pitched
  • ERA (earned runs allowed per nine innings pitched)

• But... we live in the 21st century.
• We already know these statistics work poorly as measures of performance!
• Ex. Ryu Hyun Jin in the 2012 KBO season.
  • He struck out 210 batters and posted a 2.66 ERA (1st and 5th in the league, respectively).
  • But he earned the win in only 9 of his 27 games, although he was one of the most valuable players in the league that year.

• Statisticians and many baseball nerds developed a new approach called sabermetrics to evaluate player performance more accurately.
• FIP, WAR, WPA, ... and many other acronyms you may never have heard of.

• WAR (Wins Above Replacement) is a single figure that can represent the overall performance of a player.
• It shows how many wins a player has contributed to his team compared with a replacement-level player:
  • "A player of common skills available for minimum cost to a major league baseball team. A team of replacement-level players would be expected to win a baseline minimum number of games, typically 40-50, per 162 game season." (Wikipedia)

• In 2001, Chan Ho Park recorded 4.3 WAR for the LA Dodgers.
• In 2001, Randy Johnson recorded 10.1 WAR for the D-backs.
• So... theoretically, the LA Dodgers would have won about 6 more games (10.1 - 4.3 = 5.8) if they had had Randy Johnson instead of Chan Ho Park.

• The following procedure shows how to calculate fWAR (the WAR published by fangraphs.com) for a pitcher (https://library.fangraphs.com/war/calculating-war-pitchers/):
  • WAR = [[([(League "FIP" - "FIP") / Pitcher Specific Runs Per Win] + Replacement Level) * (IP/9)] * Leverage Multiplier for Relievers] + League Correction
  • ifFIP = ((13*HR) + (3*(BB+HBP)) - (2*(K+IFFB))) / IP + ifFIP constant
  • ifFIP constant = lgERA - (((13*lgHR) + (3*(lgBB+lgHBP)) - (2*(lgK+lgIFFB))) / lgIP)
  • Adjustment = lgRA9 - lgERA
  • FIPR9 = ifFIP + Adjustment
  • ...
• I don't think we need to go deeper.

• WAR is widely considered the most comprehensive single figure describing a player's performance.
• So, I will use WAR as the measure of performance.
• Let's redefine my problem:
• My problem is to predict the future WAR of Major League pitchers from publicly accessible data, using machine learning techniques such as LSTM and XGBoost.

2. Justification

• Q: Has anybody tried to build a system that predicts future WAR?
• A: Absolutely, yes.
• There are already several public projection systems, such as ZiPS and Steamer, and each Major League club is likely using its own in-house system.

• However, public projection systems do not perform that well.
• Here is my demo supporting this argument:
  • https://www.youtube.com/watch?v=cmUVDXm--qA&feature=youtu.be
  • It shows an error of 0.81 WAR across all players, and 1.05 WAR for players whose WAR is greater than 1.

• In 2018, the value of 1 WAR was calculated at about $8M.
  • It is calculated as (sum of the salaries of all players) / (sum of the WAR of all players).
• One Major League team has 25 players on its roster.
• So an error of 1 WAR per player can make a big difference: at $8M per WAR, it means roughly $8M of misjudged value for each player.

• ZiPS does not mention any serious machine learning, and especially not deep learning, in the description of its method.
• There have been some papers applying machine learning to baseball problems:
  • https://beta.vu.nl/nl/Images/werkstuk-elfrink_tcm235-888205.pdf
  • http://cs229.stanford.edu/proj2005/Donaker-MLBPredictionAndAnalysis.pdf
• But they focused on predicting the outcome of games between two teams rather than predicting player performance.
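The dollar value of one WAR is a league-wide ratio of total salary to total WAR. A minimal Python sketch, using a made-up three-player league (hypothetical numbers, not real 2018 data), shows how the ratio is formed and how a projection error measured in WAR translates into dollars:

```python
def dollars_per_war(salaries, wars):
    """League-wide $/WAR: total salary divided by total WAR."""
    total_war = sum(wars)
    if total_war <= 0:
        raise ValueError("total WAR must be positive")
    return sum(salaries) / total_war


def error_cost(war_error, value_per_war):
    """Dollar value of a WAR projection error at a given $/WAR rate."""
    return abs(war_error) * value_per_war


# Hypothetical toy league: salaries in dollars, season WAR per player.
salaries = [30_000_000, 10_000_000, 8_000_000]
wars = [3.5, 1.5, 1.0]

rate = dollars_per_war(salaries, wars)  # 48,000,000 / 6.0
print(f"$/WAR: {rate:,.0f}")  # prints "$/WAR: 8,000,000"
print(f"Cost of a 1 WAR error: {error_cost(1.0, rate):,.0f}")
```

Overestimating and underestimating a player cost the same here because `error_cost` uses the absolute error; a club might weigh the two directions differently.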
• So, my goal is to build a performance prediction system that outperforms existing systems.
• Specifically, I will use the 2017 and 2018 ZiPS projections as references: my system should predict 2017 and 2018 performance more accurately than ZiPS did.

Thank you.
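The goal is a head-to-head comparison with ZiPS. The slides do not say which error metric the demo uses, so the sketch below assumes mean absolute error and scores both a model and ZiPS over all pitchers and over pitchers whose actual WAR exceeds 1, mirroring the two figures reported by the demo. The function names and all numbers are hypothetical placeholders, not real 2017/2018 projections.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between actual and projected WAR (assumed metric)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def evaluate(actual, predicted, threshold=1.0):
    """Return (MAE over all players, MAE over players with actual WAR > threshold)."""
    overall = mae(actual, predicted)
    pairs = [(a, p) for a, p in zip(actual, predicted) if a > threshold]
    subset = mae([a for a, _ in pairs], [p for _, p in pairs]) if pairs else None
    return overall, subset


# Hypothetical season: actual WAR, ZiPS projections, and a model's projections.
actual = [4.0, 0.5, 2.0, -0.5]
zips = [3.0, 1.0, 1.0, 0.5]
model = [3.5, 0.5, 1.5, 0.0]

z_all, z_big = evaluate(actual, zips)
m_all, m_big = evaluate(actual, model)
print("ZiPS :", z_all, z_big)  # prints "ZiPS : 0.875 1.0"
print("Model:", m_all, m_big)  # prints "Model: 0.375 0.5"
print(f"Beats ZiPS overall: {m_all < z_all}, on WAR > 1: {m_big < z_big}")
```

Reporting the WAR > 1 subset separately matters because projection errors on regular contributors carry most of the dollar value.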
