Université catholique de Louvain
Institute of Information and Communication Technologies, Electronics and Applied Mathematics

Hybrid Models to Predict Recreational Runners' Performance

Dimitri de Smet d'Olbecke

Thesis submitted in partial fulfillment of the requirements for the Ph.D. Degree in Engineering Sciences and Technology

Thesis Committee
Advisor: Michel Verleysen (UCLouvain)
Jury: Olivier Brüls (ULiège), Marc Francaux (UCLouvain), Bernadette Govaerts (UCLouvain), Romain Hérault (INSA, Rouen), John Lee (UCLouvain)
Chairman: Jean-Pierre Raskin (UCLouvain)

Louvain-la-Neuve, October, 2019.

Abstract

When long-distance runners prepare for a race, they can train more efficiently if they are able to predict their expected performance. Accurate race time prediction also allows them to pick the right pace from the beginning of the race, which is known to impact the race outcome significantly. Usually, expected performance is estimated using fitness and endurance metrics provided by analyzing standardized exercise protocols in specialized laboratories. Unfortunately, most runners (especially recreational runners) cannot afford access to the required equipment and dedicated staff.

In recent years, some companies have started to offer digital coaching for runners through sports watches and smartphone applications. One of the challenges that these companies face is to track runner fitness levels so that their workout planning can evolve according to their progress and so that race paces can be recommended.

This thesis addresses the problem of predicting runners' performance based on data that is cheap and easy to collect, even for recreational runners: previous race times and workout session recordings (most often timestamps, heart rates, and positioning). The modeling of performance is said to be hybrid because it combines blind machine learning methods applied to large sets of data with knowledge from the domain literature.

Runner performance can be modeled as a function that describes how race times evolve in relation to race length (in meters). This function can be adjusted to fit previous race times obtained on given race lengths so that performances can be extrapolated to other race lengths. However, the regression is made difficult when the number of race records is limited or when the past performances present a high variability. This issue is addressed by using a probabilistic setting that includes probability distributions inferred from a large collection of race times.

Races cannot be strictly summarized by their length: gradient of ascent, weather conditions, altitude, vegetation, uneven ground and ground firmness affect runner speeds. Nevertheless, performance modeling is still possible using the notion of equivalent distances. This considers races characterized by only one parameter, all race factors being summarized by the equivalent distance. Equivalent distances are assigned to races based on race times; but, as races are not run by the same set of runners, equivalent distance estimation must address the problem of disparities between participants. This is done by evaluating races and runners simultaneously using collaborative filtering techniques. Subsequently, the relationship between these equivalent distances and race elevation profiles is formalized.

Many runners record their runs using a smartphone or a sports watch. If their device is paired with a heart rate sensor, their relative intensity of exertion is continuously monitored. Runners' fitness levels can thus be assessed by relating their heart rate to an estimation of their activity level. The latter is derived from their recorded geolocations associated with timestamps.
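
As a rough illustration of the race-time function described in this abstract, the sketch below fits a power law T = a * d^b to previous race times by least squares in log-log space, then extrapolates to another race length. The power law is only one of the models listed in the thesis. The function names and the sample race times are hypothetical, and the probabilistic setting mentioned above is not reproduced here.

```python
import math


def fit_power_law(distances_m, times_s):
    """Fit T = a * d**b by ordinary least squares on log(T) versus log(d).

    Hypothetical helper for illustration; returns the pair (a, b).
    """
    if len(distances_m) != len(times_s):
        raise ValueError("distances and times must have the same length")
    xs = [math.log(d) for d in distances_m]
    ys = [math.log(t) for t in times_s]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two race records")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least two distinct race lengths")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space
    a = math.exp(mean_y - b * mean_x)  # intercept mapped back to seconds
    return a, b


def predict_time(a, b, distance_m):
    """Extrapolate the fitted power law to another race length (in meters)."""
    return a * distance_m ** b


# Made-up race history: 5 km in 25:00 and 10 km in 52:00.
a, b = fit_power_law([5000, 10000], [1500, 3120])
half_marathon_s = predict_time(a, b, 21097.5)
print(f"exponent b = {b:.3f}, predicted half marathon = {half_marathon_s / 60:.1f} min")
```

With only two records the curve passes exactly through both points, which is why a limited or noisy race history makes this kind of regression fragile.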

Acknowledgements

Let me take the opportunity at the outset of this text to thank the people who made this thesis possible, the people who made it better, and the people who have made the last six years as enjoyable as they have been.

First of all, I would like to thank Michel Verleysen for the courses he gave, which sparked my interest in the field of machine learning and made me want to undertake a thesis. His experience and vision have allowed me to grow as a scientist and I am thankful for his careful supervision of my thesis.

I am also very grateful to Marc Francaux and Laurent Baijot for raising the central questions addressed in this thesis, as well as for their respective competencies in the fields of sports sciences and coaching, which I did not initially share.

This manuscript has been made more rigorous and precise thanks to valuable feedback from John Lee, Bernadette Govaerts, Romain Hérault and Olivier Brüls. During the final phase, the language was also significantly improved with the help of Mary Munroe, whom I warmly thank here.

The years I spent at university would not have been nearly as pleasant without my colleagues. I thank them for sharing their scientific and technical expertise but also and especially for their excellent company!

Of course, this thesis was made possible thanks to the people who surround me. That includes my colleagues, but firstly my parents, family and friends. I sincerely thank all of them, especially my wife Aline De Broux, to whom I dedicate this thesis for the support she has offered thus far and which she continues to offer to me and our three wonderful children.

Finally, thank you, reader, for opening this thesis for whatever reason, even if this is the only page you intended to read.

Contents

1 Introduction
  1.1 Motivation and Scope
  1.2 Modeling and Validation
  1.3 List of Publications
  1.4 Summary of the Contributions
  1.5 Organization of the thesis
  1.6 Conventions
    1.6.1 Units
    1.6.2 Illustrations
    1.6.3 Error Boxplots
    1.6.4 Notations
    1.6.5 Acronyms
2 Data Description and Pre-Processing
  2.1 Introduction
  2.2 Race Results
    2.2.1 Data Description
    2.2.2 Filtering the Data
      Minimal Races per Runner
      Isolated Races
    2.2.3 Race Features
  2.3 Runner Recordings
    2.3.1 Heart Rate
    2.3.2 Geo-Localized Positions
    2.3.3 Speeds
    2.3.4 Elevation Data
    2.3.5 Gradient of Ascent
    2.3.6 Instant Power During Exercise
    2.3.7 Smoothing Speeds and Slopes
    2.3.8 Merging the Two Sources of Data
    2.3.9 Race Tracks Cropping
  2.4 Conclusion
3 Runner Performances Modeling
  3.1 Existing Models
    3.1.1 Power Law
    3.1.2 Hyperbolic Two-Parameter Model
    3.1.3 Hyperbolic Three-Parameter Model
    3.1.4 Exponential Model
    3.1.5 Logarithmic Endurance Model (Péronnet)
    3.1.6 VDOT Model
  3.2 Proposed Models
    3.2.1 Two-Threshold Power Law
    3.2.2 Polynomial-Logarithmic Model
  3.3 Fitting of the Models
    3.3.1 Power-Law
    3.3.2 Hyperbolic Two-Parameter Model
    3.3.3 Hyperbolic Three-Parameter Model
    3.3.4 Exponential Model
    3.3.5 Logarithmic Endurance Model
    3.3.6 VDOT Model
    3.3.7 Two-Threshold Power Law
    3.3.8 Polynomial-Logarithmic Model
  3.4 Model Comparison
    3.4.1 World Records Fitting
    3.4.2 Race Performances Prediction
    3.4.3 Results
      World Records Fitting
      Race Performances Predictions
  3.5 Discussion
4 Races and Athletes Characterization from Race Results
  4.1 Introduction
  4.2 Low-Rank Approximation of the Race Results Matrix
  4.3 Cost Function
  4.4 Solution
  4.5 Data Requirements
    4.5.1 Communities of Runners
    4.5.2 Choosing Rank k
  4.6 Validation Process
  4.7 Results
    4.7.1 Data
    4.7.2 Optimization
    4.7.3 Model Selection
  4.8 Discussion
    4.8.1 Race Distances and Rank-One Approximation
    4.8.2 More on Communities
  4.9 Conclusion
5 Race Equivalent Distances and Race Features
  5.1 Introduction
  5.2 Equivalent Distances From Race Results
    5.2.1 Equivalent Distance
    5.2.2 Obtaining the Equivalent Distances
    5.2.3 Data Requirements
  5.3 Equivalent Distance From Elevation Profile
    5.3.1 Models
    5.3.2 Model Fitting
  5.4 Validation Process
  5.5 Results