Baseball Hacks™ by Joseph Adler Copyright © 2006 O’Reilly Media, Inc
Other resources from O’Reilly

Related titles
Access Hacks
PHP Hacks
Excel Hacks
Retro Gaming Hacks
Gaming Hacks
Spidering Hacks
Google Hacks
Programming Perl
Google Maps Hacks
MySQL Cookbook

Hacks Series Home
hacks.oreilly.com is a community site for developers and power users of all stripes. Readers learn from each other as they share their favorite tips and tools for Mac OS X, Linux, Google, Windows XP, and more.

oreilly.com
oreilly.com is more than a complete catalog of O’Reilly books. You’ll also find links to news, events, articles, weblogs, sample chapters, and code examples.

oreillynet.com is the essential portal for developers interested in open and emerging technologies, including new platforms, programming languages, and operating systems.

Conferences
O’Reilly brings diverse innovators together to nurture the ideas that spark revolutionary industries. We specialize in documenting the latest tools and systems, translating the innovator’s knowledge into useful skills for those in the trenches. Visit conferences.oreilly.com for our upcoming events.

Safari Bookshelf (safari.oreilly.com) is the premier online reference library for programmers and IT professionals. Conduct searches across more than 1,000 books. Subscribers can zero in on answers to time-critical questions in a matter of seconds. Read the books on your Bookshelf from cover to cover or simply flip to the page you need. Try it today.

BASEBALL HACKS™
Joseph Adler

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

Baseball Hacks™
by Joseph Adler

Copyright © 2006 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com).
For more information, contact our corporate/institutional sales department: (800) 998-9938 or [email protected].

Editors: Tatiana Apandi and Andrew Odewahn
Production Editor: Marlowe Shaeffer
Copyeditor: Audrey Doyle
Proofreader: Chris Downey
Indexer: Johnna Dinse
Cover Designer: Hanna Dyer
Interior Designer: David Futato
Illustrators: Robert Romano, Jessamyn Read, and Lesley Borash

Printing History:
February 2006: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. The Hacks series designations, Baseball Hacks, the image of an umpire’s indicator, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Small print: The technologies discussed in this publication, the limitations on these technologies that technology and content owners seek to impose, and the laws actually limiting the use of these technologies are constantly changing. Thus, some of the hacks described in this publication may not work, may cause unintended harm to systems on which they are used, or may not be consistent with applicable user agreements. Your use of these hacks is at your own risk, and O’Reilly Media, Inc. disclaims responsibility for any damage or expense resulting from their use. In any event, you should take care that your use of these hacks does not violate any applicable laws, including copyright laws.

[LSI][2011-07-29]
ISBN: 978-0-596-00942-7

Contents

Credits
Preface

Chapter 1. Basics of Baseball
1. Score a Baseball Game
2. Make a Box Score from a Score Sheet
3. Keep Score, Project Scoresheet–Style
4. Follow Pitches During a Game
5. Follow the Game Online
6. Add Baseball Searches to Firefox
7. Find Images of Stadiums

Chapter 2. Baseball Games from Past Years
8. Get and Install MySQL
9. Get an Access Database of Player and Team Statistics
10. Get a MySQL Database of Player and Team Statistics
11. Make Your Own Stats Book
12. Get Perl
13. Learn Perl
14. Get Historical Play-by-Play Data
15. Make Box Scores or Database Tables from Play-by-Play Data with Retrosheet Tools
16. Use SQL to Explore Game Data
17. Use Microsoft Access to Run SQL Queries
18. Get a GUI for MySQL
19. Move Data from a Database to Excel
20. Load Baseball Data into MySQL
21. Load Retrosheet Game Logs
22. Make a Historical Play-by-Play Database
23. Use Regular Expressions to Identify Events

Chapter 3. Stats from the Current Season
24. Use Microsoft Excel Web Queries to Get Stats
25. Spider Baseball Sites for Data
26. Discover How Live Score Applications Work
27. Keep Your Stats Database Up-to-Date
28. Get Recent Play-by-Play Data
29. Find Data on Hit Locations

Chapter 4. Visualize Baseball Statistics
30. Plot Histograms in Excel
31. Get R and R Packages
32. Analyze Baseball with R
33. Access Databases Directly from Excel or R
34. Load Text Files into R
35. Compare Teams and Players with Lattices
36. Compare Teams Using Chernoff Faces
37. Plot Spray Charts
38. Chart Team Stats in Real Time
39. Slice and Dice Teams with Cubes

Chapter 5. Formulas
40. Measure Batting with Batting Average
41. Measure Batting with On-Base Percentage
42. Measure Batting with SLG
43. Measure Batting with OPS
44. Measure Power with ISO
45. Measure Batting with Runs Created
46. Measure Batting with Linear Weights
47. Measure Pitching with ERA
48. Measure Pitching with WHIP
49. Measure Pitching with Linear Weights
50. Measure Defense with Defensive Efficiency
51. Measure Pitching with DIPS
52. Measure Base Running Through EqBR
53. Measure Fielding with Fielding Percentage
54. Measure Fielding with Range Factor
55. Measure Fielding with Linear Weights
56. Measure Park Effects
57. Calculate Fan Save Value
58. Calculate Save Value
59. Calculate Holds and Decent Holds for Relief Pitchers

Chapter 6. Sabermetric Thinking
60. Calculate Expected Runs
61. Calculate an Expected Hits Matrix
62. Look for Evidence of Platoon Effects
63. Significant Number of At Bats
64. Find “Clutch” Players
65. Calculate Expected Number of Wins
66. Measure Hits by Pitch Count
67. OBP, SLG, and Scoring Runs
68. Measure Skill Versus Luck
69. Odds of the Best Team Winning the World Series
70. Top 10 Bargain Outfielders
71. Fitting Game Scores to a Strength Model

Chapter 7. The Bullpen
72. Start or Join a Fantasy League
73. Draft Your Fantasy Team
74. Make a Scoreboard Widget
75. Analyze Other Sports

Appendix A. Where to Learn More Stuff
Appendix B. Abbreviations
Index

Credits

About the Author

Joseph Adler is a researcher in the Advanced Product Development Group at VeriSign, focusing on problems in user authentication, managed security services, and RFID security. Joe has years of experience analyzing data, building statistical models, and formulating business strategies as an employee and consultant for companies including DoubleClick, American Express, and Dun & Bradstreet. He is a graduate of the Massachusetts Institute of Technology with an Sc.B. and an M.Eng. in computer science and computer engineering. Joe is an unapologetic Yankees fan, but he appreciates any good baseball game.
Joe lives in Silicon Valley with his wife, two cats, and a DirecTV satellite dish.

Contributors

The following people contributed their hacks, writing, and inspiration to this book:

• Tom Dierickx is a data analyst specializing in automating data processes and authoring data-driven solutions. He has a wide range of computer programming and database development experience, along with an M.S. in statistics from Arizona State University. All of the languages, tools, and techniques he enthusiastically pursues, whether technical or statistical, are simply an outgrowth of the passion he has for working with data.

• Mark E. Johnson, Ph.D., earned his Ph.D. from Princeton University’s Program in Applied and Computational Mathematics in 1998, where his research interests included the visualization and study of chaotic dynamical systems arising in physics and chemical engineering. A cofounder of SportMetrika, Inc., Mark spent the 2004 baseball season as the in-house baseball analyst of the National League Champion St. Louis Cardinals. While pursuing his B.S. in computer science and M.A. in mathematics as an undergraduate at Indiana University, Mark spent four years as a student manager of the men’s basketball team under the leadership of Coach Bob Knight.

• Matthew S. Johnson, Ph.D., earned his Ph.D. from the Department of Statistics at Carnegie Mellon in 2001. His research interests lie in the field of psychometrics, estimating abilities from repeated categorical responses, where the responses might be answers to test questions used to measure a student’s ability, or wins and losses used to measure a team’s talent. Matt is an assistant professor in the Department of Statistics and Computer Information Systems in the Zicklin School of Business at Baruch College, of the City University of New York. Matt received his undergraduate degree in mathematics from Indiana University in 1996, and he is a cofounder of SportMetrika, Inc.