Major Qualifying Project: Advanced Baseball Statistics

Matthew Boros, Elijah Ellis, Leah Mitchell
Advisors: Jon Abraham and Barry Posterro
December 13, 2019

Contents

1 Background
1.1 The History of Baseball
1.2 Key Historical Figures
1.2.1 Jerome Holtzman
1.2.2 Bill James
1.2.3 Nate Silver
1.2.4 Joe Peta
1.3 Explanation of Baseball Statistics
1.3.1 Save
1.3.2 OBP, SLG, ISO
1.3.3 Earned Run Estimators
1.3.4 Probability Based Statistics
1.3.5 wOBA
1.3.6 WAR
1.3.7 Projection Systems
2 Aggregated Baseball Database
2.1 Data Sources
2.1.1 Retrosheet
2.1.2 MLB.com
2.1.3 PECOTA
2.1.4 CBS Sports
2.2 Table Structure
2.2.1 Game Logs
2.2.2 Play-by-Play
2.2.3 Starting Lineups
2.2.4 Team Schedules
2.2.5 General Team Information
2.2.6 Player - Game Participation
2.2.7 Roster by Game
2.2.8 Seasonal Rosters
2.2.9 General Team Statistics
2.2.10 Player and Team Specific Statistics Tables
2.2.11 PECOTA Batting and Pitching
2.2.12 Game State Counts by Year
2.2.13 Game State Counts
2.3 Conclusion
3 Cluster Luck
3.1 Quantifying Cluster Luck
3.2 Circumventing Cluster Luck with Total Bases
3.3 Conclusion
4 Garbage Time
4.1 Calculating Situational Odds of Winning
4.2 Quantifying Garbage Time
4.3 Garbage-Adjusted Statistics
5 Preseason Model
5.1 Calculating Cluster Luck Adjusted Wins
5.2 Roster Changes
5.3 Injury Adjustments
5.4 Alternative Attempts at Creating the Preseason Model
5.5 Alternative Methods to Creating Preseason Models
5.5.1 PECOTA and ZiPS
5.5.2 Total Bases
5.5.3 Fractional Rosters
5.6 Regression Results
5.7 Second Regression
6 In-Season Model
6.1 Runs
6.2 Cluster Luck Runs
6.3 Total Bases
6.4 Wins Adapted In-Season
6.5 Garbage Time Excluded
6.6 Results
7 Combination of the Two Models
7.1 Credibility Theory
7.2 Linear
7.3 Mixed
7.4 Results
8 Single Game Betting
8.1 Starting Pitcher, Relief Pitcher, and Lineup
8.2 Games
8.3 Results
8.4 Conclusion
9 2019 Testing
9.1 Preseason Testing
9.2 Conclusion
10 Conclusion
A Database Technical Details
A.1 Design Decisions
A.1.1 Choosing SQLite and SQLAlchemy
A.1.2 Table Design and Code Reusability
A.2 Retrieving and Cleaning Data
A.2.1 Retrosheet
A.2.2 MLB.com Rosters
A.2.3 CBS Sports Injury Reports
A.3 Using the Database
A.4 Full Table List
A.4.1 GameLog
A.4.2 PlayByPlay
A.4.3 SeasonalRoster
A.4.4 StartingLineup
A.4.5 TeamStats
A.4.6 TeamBattingStats
A.4.7 TeamPitchingStats
A.4.8 NonGbgTeamBattingStats
A.4.9 NonGbgTeamPitchingStats
A.4.10 PlayerBattingStats
A.4.11 PlayerPitchingStats
A.4.12 NonGbgPlayerBattingStats
A.4.13 NonGbgPlayerPitchingStats
A.4.14 Game State Counts By Year
A.4.15 Game State Counts
A.4.16 RosterByGame
A.4.17 Schedule
A.4.18 BettingOdds