Mathematical modelling of football

Start again 11:15

David Sumpter, Uppsala University & Hammarby IF

Structure today

• Summary of last time. Key Performance Indices.

• Statistical models of passes

• Summary of expected goals (chance to raise questions)

Break 11-11:15

Please ask questions in the chat section where possible and I will answer them as I go.

I will take a reasonably slow pace and interact along the way.

Go into Canvas and look at where we are in the course…

First, a correction: pass arrows.

Summary from last time

…the defence. By marking every point at which the ball was played just before each of the Real Madrid shots during the Champions League season, we can get an overall picture of how they create successful attacks. Figure 7.12 is a risk map showing where the ball was played in the 15 seconds leading up to a shot from the 20m by 20m area in front of the opposition goal. The darker areas show places where there is a high risk of a Real Madrid shot from the danger zone coming within the next 15 seconds; the lighter areas show places where the risk is low. Corners are one clear risk zone and, not surprisingly, if the ball is already in the box then the risk of a shot is high. But the most interesting risk zone is the hot area outside the box on Real Madrid’s left. This area of the pitch is mainly occupied by Marcelo, who comes up on the left wing, and Ronaldo, who is more central. It is from here that dangerous chances are created.


Figure 7.12 Real Madrid danger-zones during the Champions League season 2014/15. The shading is proportionally darker in areas where the ball was located during the 15 seconds leading up to a shot from the 20m by 20m area in front of the opposition goal. Data provided by Opta.

(From Soccermatics: Mathematical Adventures in the Beautiful Game, Pro Edition, by David Sumpter.)
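The risk map in Figure 7.12 can be approximated with a short script. This is a minimal sketch, assuming a hypothetical event list in which each event has a time in seconds, pitch coordinates in metres on a 105 m × 68 m pitch with the team attacking towards x = 105, and a shot flag. The names `Event`, `risk_locations` and `risk_grid` are illustrative only; they are not Opta's data format or the course code. The sketch keeps the location of every event in the 15 seconds before a shot from the 20m by 20m square in front of goal, then counts those locations in a grid.

```python
from dataclasses import dataclass

# Assumed pitch dimensions (metres); the team attacks towards x = PITCH_LENGTH.
PITCH_LENGTH = 105.0
PITCH_WIDTH = 68.0


@dataclass
class Event:
    time: float     # seconds since kick-off (hypothetical field)
    x: float        # metres along the pitch
    y: float        # metres across the pitch
    is_shot: bool


def in_danger_zone(x, y, size=20.0):
    """True if (x, y) lies in the size-by-size square in front of the opposition goal."""
    return x >= PITCH_LENGTH - size and abs(y - PITCH_WIDTH / 2) <= size / 2


def risk_locations(events, window=15.0):
    """Locations of events played within `window` seconds before a danger-zone shot.

    Each event is counted once, even if it precedes several shots.
    The shot itself (time difference 0) is not included.
    """
    shot_times = [e.time for e in events if e.is_shot and in_danger_zone(e.x, e.y)]
    return [(e.x, e.y) for e in events
            if any(0 < t - e.time <= window for t in shot_times)]


def risk_grid(locations, nx=10, ny=10):
    """Count locations in an ny-by-nx grid laid over the pitch."""
    grid = [[0] * nx for _ in range(ny)]
    for x, y in locations:
        i = min(max(int(x / PITCH_LENGTH * nx), 0), nx - 1)
        j = min(max(int(y / PITCH_WIDTH * ny), 0), ny - 1)
        grid[j][i] += 1
    return grid
```

Dividing each cell by the largest count would give shading proportional to the number of events, in the spirit of the figure caption.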



Summary from last time…

• Raw data is seldom enough.

• Standard visualisations are seldom enough (why we need Python!)

• What is the question? How do you answer it?

• Does the data support your hypothesis?

• Danger of self-confirmation, but risk of ignoring domain knowledge.

• Building measures that can then be used for future benchmarking (KPIs).

• Use both deductive (story) and inductive (data) thinking.

Key Performance Indices at clubs

• Entries into final third/box.

• Shots from danger zone.

• Number of passes leading to shot.

• Ball recoveries within 5 seconds.

• Passing tempo.

• Expected goals.

A single set of measures used across the entire club.
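As an illustration of how a KPI such as entries into the final third or the box might be computed, the sketch below counts passes that start outside a region and end inside it. It assumes hypothetical pass records `(x_start, y_start, x_end, y_end)` in metres on a 105 m × 68 m pitch, attacking towards x = 105, and uses the standard 16.5 m deep, 40.32 m wide penalty area. A club's own definition of an "entry" (for example, whether carries count) may differ.

```python
# Assumed pitch geometry (metres), attacking towards x = PITCH_LENGTH.
PITCH_LENGTH = 105.0
PITCH_WIDTH = 68.0
FINAL_THIRD_X = PITCH_LENGTH * 2 / 3   # final third starts 70 m from own goal line
BOX_DEPTH = 16.5                       # penalty area depth
BOX_HALF_WIDTH = 40.32 / 2             # penalty area is 40.32 m wide


def in_final_third(x, y):
    """True if (x, y) is in the attacking third of the pitch."""
    return x >= FINAL_THIRD_X


def in_box(x, y):
    """True if (x, y) is inside the opposition penalty area."""
    return (x >= PITCH_LENGTH - BOX_DEPTH
            and abs(y - PITCH_WIDTH / 2) <= BOX_HALF_WIDTH)


def count_entries(passes, region):
    """Count passes that start outside `region` and end inside it.

    passes: iterable of (x_start, y_start, x_end, y_end) tuples.
    region: function (x, y) -> bool.
    """
    return sum(1 for x0, y0, x1, y1 in passes
               if not region(x0, y0) and region(x1, y1))
```

Passing `in_box` instead of `in_final_third` to the same function gives box entries, so both KPIs share one definition of "entry".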

Lecture 2: Statistics of Passing

Mathematical modelling of football



Pass sequence

Figure 3.6 Passes leading up to Italy’s first goal against Germany in Euro 2012, showing passes made by Italy (arrows in top panel) and Mesut Özil’s movement while chasing the ball (bottom panel). Darker shading indicates more recent events in time. Letters indicate events: (A) Pirlo’s first pass; (B) Pirlo’s second pass; (C) Chiellini receives the ball; (D) Balotelli scores.




Diego Escribano (Group 2), Philip Winchester (Group 2)

The next step is to go from visual understanding to statistical understanding…

All passes by England’s women in the World Cup

5 lanes × 5 ‘heights’
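Dividing the pitch into lanes and ‘heights’ turns a cloud of pass locations into counts that can be compared. A minimal sketch, assuming pass locations in metres on a 105 m × 68 m pitch; `pass_grid` is a hypothetical helper, not taken from the course scripts, and the example locations are made up.

```python
def pass_grid(locations, n_lanes=5, n_heights=5, length=105.0, width=68.0):
    """Count pass locations in an n_heights-by-n_lanes grid.

    Lanes divide the pitch across its width (y); heights divide it along its
    length (x). Points on the far touchline or goal line go in the last cell.
    """
    grid = [[0] * n_lanes for _ in range(n_heights)]
    for x, y in locations:
        height = min(max(int(x / length * n_heights), 0), n_heights - 1)
        lane = min(max(int(y / width * n_lanes), 0), n_lanes - 1)
        grid[height][lane] += 1
    return grid


# Hypothetical pass start locations (metres).
passes = [(10.0, 5.0), (52.5, 34.0), (104.0, 67.0), (105.0, 68.0)]
coarse = pass_grid(passes)                          # 5 lanes x 5 heights
fine = pass_grid(passes, n_lanes=10, n_heights=10)  # 10 lanes x 10 heights
```

With 10 lanes and 10 heights each cell covers 10.5 m by 6.8 m: finer grids show more detail but leave fewer passes in each cell.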

10 lanes × 10 ‘heights’

Passes within 15 seconds of a shot

Limitations

• Not adjusted for xG of chance created.

• For example, count only chances with xG greater than 0.05. (Challenge)

• Not compared to other teams.

Passes made by each player

[Figure: passes made by each player; players labelled include Lucy Bronze, Francesca Kirby, Abbie McManus, Bethany Mead, Stephanie Houghton and Karen Julia Carney]

Limitations

• Not corrected for minutes played.

• Again, could be corrected for xG.

Let’s look at the code…
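One way to address the minutes-played limitation is to express each player’s pass count per 90 minutes. A minimal sketch, assuming a hypothetical list with one passer name per pass and a dictionary of minutes played; the player names and numbers are placeholders, not real data.

```python
from collections import Counter


def passes_per_90(passers, minutes_played):
    """Passes per 90 minutes for every player with minutes > 0.

    passers: one name per pass; minutes_played: {name: minutes on the pitch}.
    Players who played but made no passes get a rate of 0.
    """
    counts = Counter(passers)
    return {player: 90 * counts[player] / minutes
            for player, minutes in minutes_played.items() if minutes > 0}


# Hypothetical example: Player B played half as long as Player A.
passers = ["Player A", "Player A", "Player B", "Player A", "Player B", "Player C"]
minutes = {"Player A": 90, "Player B": 45, "Player C": 180}
rates = passes_per_90(passers, minutes)
```

Players with very few minutes get noisy rates, so a minimum-minutes threshold is often applied before comparing them.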

7PassHeatMap.py

We still don’t have statistical understanding until we compare to other teams…

Do teams that pass more shoot more?

All teams in the Women’s World Cup

Linear regression

Choose the line that minimizes the sum of the squared vertical distances between the points and the line (least squares).

Do teams that pass more shoot more?

Fitting in Python
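For a single predictor, the least-squares line has a closed form: $b_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $b_0 = \bar y - b_1 \bar x$. The sketch below computes these in plain Python, together with $R^2$ as a simple goodness-of-fit measure. It assumes one value of passes ($x$) and goals ($y$) per team; `fit_line` is a hypothetical helper and the example numbers are made up. In practice a statistics library would normally be used, since it also reports standard errors and p-values.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x.

    Returns (b0, b1, r_squared), where r_squared is the share of the
    variance in y explained by the line.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    # Residual and total sums of squares for the goodness of fit.
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot


# Made-up numbers for illustration only (passes and goals per team).
passes = [300, 350, 400, 450, 500]
goals = [2, 3, 3, 5, 6]
b0, b1, r2 = fit_line(passes, goals)
```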

$\text{Goals} = b_0 + b_1 \cdot \text{Passes}$

Fitted parameters: $b_0$ (intercept) and $b_1$ (slope)

Fitting in Python

Test goodness of fit

Fitting in Python


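Regression software usually reports a t-test for each coefficient. As a swapped-in alternative that needs only the standard library, the sketch below uses a permutation test for the slope: if passes and goals were unrelated, shuffling the goals between teams should fairly often produce a slope at least as large as the observed one. It tests only $b_1 = 0$, not the intercept; `slope` and `permutation_p_value` are hypothetical helpers.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test of the null hypothesis that the slope b1 is zero.

    Shuffling y breaks any link between x and y; the p-value is the share of
    shuffles whose |slope| is at least as large as the observed |slope|.
    """
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(x, shuffled)) >= observed:
            extreme += 1
    # Adding one to numerator and denominator counts the observed data as a permutation.
    return (extreme + 1) / (n_perm + 1)
```

A small p-value is evidence against the null hypothesis; a large one means the data give no reason to reject it.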

Test the null hypothesis that the intercept ($b_0$) and the slope ($b_1$) are zero.

Can we learn something about individual teams?

[Figure labels: USA, England]

Linear regression is a quick way of checking relationships in data.

They predict future goals better than goals do.

https://cartilagefreecaptain.sbnation.com/2014/2/28/5452786/shot-matrix-tottenham-hotspur-stats-analysis-expected-goals

Possession and goal difference

Premier League 16/17

https://medium.com/@Soccermatics/how-important-is-it-to-have-the-ball-47f93b7760fd

Possession and goal difference

Champions League 16/17

https://medium.com/@Soccermatics/how-important-is-it-to-have-the-ball-47f93b7760fd

Possession and winning are not usually correlated!

For goals we should use Poisson regression (next lecture).

No evidence to reject the null hypothesis.

Poisson regression fit.

Fitted parameters: $b_0$ (intercept) and $b_1$ (slope)

Look in the code…
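Goals are small non-negative counts, which is why Poisson regression suits them better: the logarithm of the expected number of goals is modelled as linear in passes, $\log \mathbb{E}[\text{Goals}] = b_0 + b_1 \cdot \text{Passes}$. The sketch below fits $b_0$ and $b_1$ by maximum likelihood with Newton-Raphson steps in plain Python. It is not the course code, where a statistics library would more likely be used; `fit_poisson` is a hypothetical helper, and it assumes at least one goal in the data. Passes are centred before fitting to keep the steps numerically stable.

```python
import math


def fit_poisson(x, y, tol=1e-10, max_iter=100):
    """Maximum-likelihood fit of log E[y] = b0 + b1 * x by Newton-Raphson."""
    n = len(x)
    mx = sum(x) / n
    z = [xi - mx for xi in x]              # centre x for numerical stability
    c0, c1 = math.log(sum(y) / n), 0.0     # start from the constant-rate model
    for _ in range(max_iter):
        mu = [math.exp(c0 + c1 * zi) for zi in z]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(zi * (yi - mi) for zi, yi, mi in zip(z, y, mu))
        # Entries of the (negative) Hessian [[a, b], [b, c]].
        a = sum(mu)
        b = sum(mi * zi for mi, zi in zip(mu, z))
        c = sum(mi * zi * zi for mi, zi in zip(mu, z))
        det = a * c - b * b
        d0 = (c * g0 - b * g1) / det
        d1 = (a * g1 - b * g0) / det
        c0 += d0
        c1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    # Undo the centring: c0 + c1 * (x - mx) = (c0 - c1 * mx) + c1 * x.
    return c0 - c1 * mx, c1
```

Because the model is on the log scale, $e^{b_1}$ is the multiplicative change in expected goals for each extra pass.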

8PassCompare.py

Difference to average of all teams

What would I tell England (or ) about their World Cup? Based on 3 or 4 hours of data analysis…

[Figure: passes made by each player; players labelled include Francesca Kirby, Jill Scott, Nikita Parris, Keira Walsh, Stephanie Houghton, Rachel Daly, Toni Duggan and Ellen White]

Data Alchemy

[Book cover: Outnumbered, by David Sumpter]
https://medium.com/@Soccermatics/why-algorithms-are-no-better-than-humans-at-predicting-exam-results-goals-in-football-musical-f5650aeb1cbb

Arrogance in what you can find out… but modesty in what you can say that football coaches don’t know.

Summary of how to be a football data ‘alchemist’.

• Plot things and make pictures.

• Do statistical tests. Be rigorous.

• Look for unexpected patterns.

• Try to build up a picture of what is going on that aligns with footballing knowledge.

• Enjoy yourself.

Go into Canvas and look at the exercise…