Clustering riders: an improved approach. In this post we continue with the clustering problem that we started in February. The idea remains the same: we try to automatically group riders, but we changed our approach considerably. In the first step we identify four rider clusters: time trialists, sprinters, GC guys/climbers and classics specialists. After that we zoom in on the sprint cluster, and the clustering algorithm comes up with three distinct sprinter types. The first sprinter type is the ‘high speed flat race’ type of guy, such as Giacomo Nizzolo and Dylan Groenewegen. The second group of sprinters prefers long stages and also scores in GCs (Kuznetsov, Barbero, Mohorič). The third sprinter group is an all-round group that performs mainly at the World Tour level and likes the classics as well. The most characteristic rider of this last group is Danny van Poppel, and the top scorers here include Peter Sagan and Greg van Avermaet.
The old and new approach. In our first attempt we used the following dimensions to cluster riders in a single step:
ODR flat: Points scored in flat one day races
ODR not-flat: Points scored in one day races that are not flat
ODR unknown: Points scored in one day races for which we could not determine the profile
TT: Points scored in time trials during a race stage
MT: Points scored in mountain stages
stage flat: Points scored in flat stages
stage hill: Points scored in hilly stages
stage hill-flat: Points scored in hilly stages, but with a reasonably flat finish
GC: Points scored in GCs (excluding youth/mountain/points/combat classifications)
% WT: Percentage of points scored at the UWT level
% May: Percentage of points scored before May 1
The result was 9 different clusters, but many clusters were quite alike. For example, 3 clusters were GC related, and the distinction between them came from the ‘% WT’ and ‘% May’ variables. These variables do not really characterize the type of rider, but rather the level and timing of peak performance. Hence, it is more natural to look at them once the main groups have been determined. Furthermore, there is a risk that riders from a different group (e.g. a sprinter) are put in a GC group because of great similarity in ‘% WT’ and ‘% May’.
In this post we take a different approach. We first make initial clusters on fewer dimensions, which most likely leads to fewer clusters that are clearly defined as GC, sprinter, etc. The eight dimensions that remain from the first version are: GC, ODR flat, ODR non-flat, TT, stage MT, stage flat, stage hill, and stage hill-flat. We introduce one new dimension: points from mountain classifications. Because we already count points for flat stages and for hilly stages with a flat finish, we do not award points for being in the final points classification. The total number of dimensions in the first step is nine.
We stop using Zweeler points. Instead, we use a different point calculation where each top 20 classification yields points, with the number one earning 20 points and the number 20 earning 1 point. Riders get points for race stages, GCs, MCs (mountain classifications), and one day races. The GC / MC points are adjusted for race length, such that grand tours matter most. Points in 1.UWT classics get a multiplier of 1.5. As in our first clustering post, the points are re-scaled per rider, because we are interested in the source of the points, not necessarily the number of points.
Once we have obtained initial clusters, we cluster again within each cluster to discover the different types of, e.g., sprinters. In the second stage clustering procedure we use additional variables such as ‘% WT’ and ‘% May’. An advantage of this approach is that we can change the second stage variables depending on the cluster that we investigate. For example, we will take a closer look at the sprinter cluster and use variables related to average race length and race speed.
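A minimal Python sketch of the top-20 point system and the per-rider re-scaling described above. The input format (a list of `(category, rank, is_uwt_classic)` tuples per rider) and the function names are hypothetical. The race-length adjustment for GC / MC points is deliberately left out because the post does not give its exact form.

```python
from collections import defaultdict

def result_points(rank, uwt_classic=False):
    """Rank 1 earns 20 points, rank 20 earns 1 point, anything outside the top 20 earns 0."""
    if not 1 <= rank <= 20:
        return 0.0
    points = 21 - rank
    # Points in 1.UWT classics get a multiplier of 1.5
    return points * 1.5 if uwt_classic else float(points)

def rescale_per_rider(results):
    """Convert one rider's raw points per dimension into shares that sum to 1.

    results: list of (category, rank, is_uwt_classic) tuples (hypothetical format).
    Returns an empty dict for a rider without any points.
    """
    totals = defaultdict(float)
    for category, rank, uwt_classic in results:
        totals[category] += result_points(rank, uwt_classic)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: pts / grand_total for category, pts in totals.items()}
```

For example, a win in a 1.UWT flat classic (30 points) plus a stage win and an 11th place in flat stages (20 + 10 points) gives a 50/50 split between ‘ODR flat’ and ‘stage flat’. This is the "source of the points, not the number of points" idea.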
The first step clustering results are outlined in the next section.
Step 1: general clusters
Figure 1 shows that, with our nine dimensions, the optimal number of clusters in the first step is four. The radial plot that shows the scores on each dimension for the four clusters is provided in Figure 2.
Figure 1: Optimal number of clusters using the silhouette approach.
Figure 2: Scores per dimension for each cluster in the first step in a radial plot.
Figure 2 makes it clear that the classics specialists (cluster 4) and the time trialists (cluster 1) in particular gain their points mainly from one dimension. The sprinters in cluster 3 get their points from the different types of flat finishes, whereas the riders in cluster 2 score in mountain stages and GCs. A bar chart representation of the scores on each dimension is provided in Figure 3.
Figure 3: Scores per dimension for each cluster in the first step, now in a stacked bar chart.
The four cluster outcomes are very logical, and in fact these are the four categories on the PCS profiles (e.g. Alejandro Valverde)! It confirms that the clustering method is sensible. In addition, it shows that the profile scores on PCS are well thought out. Before we zoom in on the sprint cluster, we list for each initial cluster the most characteristic (most exemplary) riders as well as the riders with the most points.
cluster 1: time trialists (15 riders)
most characteristic: Jonathan Castroviejo, Nelson Oliveira, Maciej Bodnar, Simon Geschke, Stefan Küng
top scoring: Jonathan Castroviejo, Nelson Oliveira, Maciej Bodnar, Simon Geschke, Stefan Küng
cluster 2: climbers / GC (185 riders)
most characteristic: Pierre Latour, Sergio Henao, Tony Gallopin, Fabio Aru, Richard Carapaz
top scoring: Pierre Latour, Sergio Henao, Tony Gallopin, Fabio Aru, Richard Carapaz
cluster 3: sprinters (97 riders)
most characteristic: Danny van Poppel, Elia Viviani, Jempy Drucker, Clément Venturini, Peter Sagan
top scoring: Danny van Poppel, Elia Viviani, Jempy Drucker, Clément Venturini, Peter Sagan
cluster 4: classics specialists (26 riders)
most characteristic: Oliver Naesen, Robert Power, Sep Vanmarcke, Łukasz Wiśniowski, Michael Valgren
top scoring: Oliver Naesen, Robert Power, Sep Vanmarcke, Łukasz Wiśniowski, Michael Valgren
To illustrate the concept of the ‘most characteristic’ rider a bit more, take a look at Figure 4. Here you can see the scores on each dimension of the most characteristic sprinters in cluster 3. As you can see, these riders indeed have quite a similar profile across all variables we consider in step 1. In the next section we dive further into this sprint cluster and introduce new variables that allow us to split up this group further.
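The post does not say how ‘most characteristic’ is computed. One plausible reading is "closest to the cluster's centre in the clustering dimensions", and the sketch below implements that assumption. The input dictionaries and function names are hypothetical.

```python
import math

def centroid(rows):
    """Component-wise mean of equally long score vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def most_characteristic(profiles, labels, cluster, top=5):
    """Return the `top` riders of `cluster` whose profile lies closest to its centroid.

    profiles: dict rider -> list of dimension scores (hypothetical input format)
    labels:   dict rider -> cluster id
    """
    members = [rider for rider in profiles if labels[rider] == cluster]
    center = centroid([profiles[rider] for rider in members])
    # Euclidean distance to the centre; smaller means more "exemplary"
    return sorted(members, key=lambda rider: math.dist(profiles[rider], center))[:top]
```

With k-medoids instead of k-means, the medoid itself would be the natural "most characteristic" rider. That is an equally plausible alternative reading.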
Figure 4: Scores per dimension for first step sprint cluster 3.
Step 2: re-clustering sprint cluster 3
For the second step we focus on the 97 sprinters in cluster 3. We cluster these sprinters again and use the following additional variables:
% WT: Percentage of points scored at the UWT level
% May: Percentage of points scored before May 1
teampos: The average relative position of the rider within the team
% high speed: Percentage of points scored in the 25% of races with the highest average speed
% 200+ km: Percentage of points scored in races/stages with more than 200 kilometers
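The two race-profile shares in the list (‘% high speed’ and ‘% 200+ km’) could be computed along the following lines. This is a sketch under assumptions: the race table format, the function names, and rounding the 25% cut-off up are all hypothetical. The fastest quarter is taken over all races in the table, not only those the rider started.

```python
import math

def high_speed_races(races):
    """IDs of the 25% of races (rounded up) with the highest average speed."""
    ranked = sorted(races, key=lambda rid: races[rid]["avg_speed"], reverse=True)
    cutoff = math.ceil(len(ranked) * 0.25)
    return set(ranked[:cutoff])

def profile_shares(rider_points, races):
    """Percentage of a rider's points from high-speed races and from races over 200 km.

    rider_points: dict race id -> points scored there (hypothetical format)
    races: dict race id -> {"avg_speed": km/h, "distance_km": km}
    """
    total = sum(rider_points.values())
    if total == 0:
        return {"% high speed": 0.0, "% 200+ km": 0.0}
    fast = high_speed_races(races)
    fast_pts = sum(p for rid, p in rider_points.items() if rid in fast)
    long_pts = sum(p for rid, p in rider_points.items() if races[rid]["distance_km"] > 200)
    return {"% high speed": 100 * fast_pts / total, "% 200+ km": 100 * long_pts / total}
```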
The additional variables should be reasonably self-explanatory, except for the ‘teampos’ variable. We introduce this variable to account for the fact that a lead-out may very well end up in the top 20 as well, but most likely behind the protected rider. Since we do not use the total number of points in the clustering, we instead look at how highly a rider is ranked within his team. So for each race we look at the rank of a rider within his own team, re-scale this to the 0-100 domain and average it over all races the rider participates in.
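A sketch of the ‘teampos’ calculation as described: rank within the team per race, re-scaled to 0-100, then averaged. The direction of the scale is an assumption, because the post does not specify it. Here the best-placed team member gets 100 and the worst-placed gets 0. The function names and input formats are hypothetical.

```python
def ranks_within_team(finish_order, team_of):
    """Map each finisher to (rank within own team, number of team finishers) for one race."""
    counts = {}
    rank = {}
    for rider in finish_order:
        team = team_of[rider]
        counts[team] = counts.get(team, 0) + 1
        rank[rider] = counts[team]
    return {rider: (rank[rider], counts[team_of[rider]]) for rider in finish_order}

def team_position(race_ranks):
    """Average within-team position on a 0-100 scale over all races of one rider.

    race_ranks: list of (rank_within_team, team_finishers) pairs, one per race.
    Assumed mapping: best-placed rider of the team -> 100, worst-placed -> 0.
    """
    scores = []
    for rank, n in race_ranks:
        if n == 1:
            scores.append(100.0)  # the rider is the only finisher of his team
        else:
            scores.append(100.0 * (n - rank) / (n - 1))
    return sum(scores) / len(scores) if scores else 0.0
```

A lead-out who usually crosses the line just behind his sprinter ends up with a noticeably lower ‘teampos’ than the sprinter, even when both finish in the top 20.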
With the additional five variables (now 14 in total), the optimal number of clusters for the 97 sprinters is three (Figure 5). The scores of the three clusters on each of the 14 dimensions can be found in Figure 6.
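The post does not name its clustering algorithm, so the two-step procedure is sketched here with plain k-means (Lloyd's algorithm) as a stand-in. Step one clusters on the step-1 dimensions. Step two takes the members of one cluster, appends the extra variables to their rows, and clusters again. The naive "first k rows" initialisation and all function names are illustrative assumptions.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; the centres start at the first k points (naive, for illustration)."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Update step: move each centre to the mean of its members
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centres[c])
        if new == centres:
            break
        centres = new
    return labels

def recluster(step1_rows, extra_rows, step1_labels, cluster, k2):
    """Second step: re-cluster the members of one step-1 cluster with extra variables appended.

    Returns a dict mapping the original row index -> sub-cluster id.
    """
    idx = [i for i, lab in enumerate(step1_labels) if lab == cluster]
    rows = [list(step1_rows[i]) + list(extra_rows[i]) for i in idx]
    return dict(zip(idx, kmeans(rows, k2)))
```

A practical version would use a better initialisation (e.g. k-means++ with several restarts) and standardise the variables before clustering.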
Figure 5: Optimal number of clusters when re-clustering the sprinters from step 1 using the silhouette approach.
Figure 6: Scores per dimension for each second-stage sprint cluster in a radial plot.
The three sprint clusters show considerable variation in the new variables, in particular % WT and % high speed. It is not that straightforward to describe each cluster in just a few words, but we have tried our best. Cluster 3.1 can be characterized as all-round sprinters and classics riders at the World Tour level. Cluster 3.2 contains the sprinters that score some GC points and love races of 200 kilometers or more. Finally, cluster 3.3 is occupied by the pure strength, high-speed flat stage guys. The five most characteristic riders and the riders with the most points in each cluster are listed below.
cluster 3.1: World Tour All-Round/Classics (30 riders)
most characteristic: Danny van Poppel, Sacha Modolo, Mike Teunissen, Edward Theuns, Matteo Trentin
top scoring: Peter Sagan, Elia Viviani, Greg Van Avermaet, Jasper Stuyven, Sonny Colbrelli
cluster 3.2: Long distance / GC (27 riders)
most characteristic: Vyacheslav Kuznetsov, Carlos Barbero, Nils Politt, Magnus Cort, Matej Mohorič
top scoring: Matej Mohorič, Carlos Barbero, Magnus Cort, Edvald Boasson Hagen, Patrick Bevin
cluster 3.3: High speed / super flat (40 riders)
most characteristic: Giacomo Nizzolo, Dylan Groenewegen, Álvaro José Hodeg, Kristoffer Halvorsen, Luka Mezgec
top scoring: Alexander Kristoff, Arnaud Démare, Dylan Groenewegen, Pascal Ackermann, Marc Sarreau
Finally, let's zoom in on the all-round sprint cluster 3.1. In Figure 7 we plotted the scores of the most characteristic riders in cluster 3.1, whereas Figure 8 shows the best scoring riders in this cluster. Of the most characteristic riders in this cluster, Sacha Modolo stands out on the hill-flat stages, and Mike Teunissen is relatively strong on the GC dimension for this cluster. When we look at the riders with the most points, we notice that Peter Sagan collects all his points from World Tour races. This is consistent with our extensive analysis of Sagan’s historic results.
Figure 7: Scores per dimension for the most characteristic all-round sprinters in cluster 3.1.
Figure 8: Scores per dimension for the best scoring all-round sprinters in cluster 3.1.
Conclusion
The two-step approach to clustering riders is a considerable improvement over the single step approach that we tried earlier. The first step yields four clearly defined clusters consisting of time trialists, sprinters, GC guys/climbers and classics specialists. In the second step we looked at sprinters. Within the sprinter cluster we obtained three subgroups: flat stage specialists, sprinters with a GC component that like long distances, and World Tour level all-round/classics sprinters.
We will most likely follow up on this clustering post by investigating how the members of these clusters are distributed over the teams. If you have other suggestions, don’t hesitate to send us a message on Twitter or by email.
BM Bikes & BM Riders Club.
If you stop calling it "voltage" and call it potential difference (<< its proper name), then it is easier to grasp that you are measuring differences only, not absolute values.
The intentional load is the bulb, so ideally we would like the "potential difference" across the bulb to be the same as the battery voltage. That way all the energy goes into lighting the bulb up.
In the real world that is impossible, because any connections and wires introduce a small load of their own that we try to minimise. We can get an idea of how near to ideal it is by seeing how much potential difference we are getting between the bulb positive and the battery positive. Ideally zero, but never quite getting there.
Re: '87 R65 instrument cluster wiring. Post by r75boxer » Mon Aug 10, 2020 1:49 pm.
Thanks once again Rob. I was testing the positive terminal on the battery (using the positive wire from the multimeter) and the ground on the bulb (brown wire) with the negative lead on the multimeter. That's what I thought you wanted (before I carefully read the instructions). Anyway, it's clear now and I will conduct the test as above.
Re: '87 R65 instrument cluster wiring. Post by wulfrun » Mon Aug 10, 2020 2:37 pm.
At the risk of your wrath Rob, and excuse my ignorance, but surely the reading from the first bulb’s positive and battery positive would be 12 volts?
Looking at the picture here on voltage drops: this is something I often struggle with; actually electrics in general. It's (nominally) 12V at the battery positive, but only assuming you're measuring to the battery negative. However, that's not the point of what Rob intended. At the bulb's positive terminal some voltage has been lost due to wiring/switches/contacts not being zero ohms. By measuring from the battery positive to the bulb positive, you find out what's being "lost" and whether it's acceptable or indicates a fault. Likewise, measuring from battery negative to the bulb-holder negative can show up an earth fault. Since the voltage ought to be very low, you can use a more sensitive range on the meter and get a more accurate idea, once you've established it actually IS a low figure (i.e. no major fault).
As an aside, the battery positive isn't even at (nominally) 12V, at least not unless you physically earth the frame to the ground (not practical if on the move!). It's just 12V more positive than the frame is.
Re: '87 R65 instrument cluster wiring. Post by windmilljohn » Mon Aug 10, 2020 5:10 pm.
Thanks all. I think I was reading it as the voltage at the battery positive to bulb positive. Rob was expecting 0.something volts, not 12 less a bit of loss. I understand losses, just couldn’t follow the thread; my fault.
Re: '87 R65 instrument cluster wiring. Post by Mjolinor » Mon Aug 10, 2020 5:23 pm.
Now your confusion is showing. If you measure battery positive to bulb positive, it is ideally zero but actually slightly above zero. The twelve volts that you have available from the battery is not of any consequence, as you are not using the negative side at all; it could be running off a hundred volts and the result would be the same. Potential DIFFERENCE.
Re: '87 R65 instrument cluster wiring. Post by windmilljohn » Mon Aug 10, 2020 7:15 pm.
Ahh. so. making myself look even dafter possibly. would that explain the tiny figure Rob was talking about? A little acceptable loss? Or shall I just stick to carbs and mechanicals.
Re: '87 R65 instrument cluster wiring. Post by r75boxer » Mon Aug 10, 2020 7:33 pm.
Okay. After a few domestic errands I'm back at it. I believe the only question that needs to be answered is #4. "4) The test readings should be taken at the positive bulb connection for each of the four indicator bulbs. The wire colour will be Blue/Red or Blue/Black. The bulb in the circuit being tested should be on at the time. This means you'll have to do the left bulbs with the indicator switch to the left and the right bulbs with the switch to the right."
With the ignition on and the left signal switch on (solid, not blinking), I get a reading of 0.004 V on both bulbs. On the right side, with the right signal on (solid), I get a reading of 0.002 V on both bulbs. I repeated the measurement twice and got the same readings.
Re: '87 R65 instrument cluster wiring. Post by Mjolinor » Mon Aug 10, 2020 8:11 pm.
If you use the quote tags, it makes reading the posts much, much easier when you are quoting previous replies. As shown, but remove the spaces: [ q uo t e ] Text in here [ / q uo t e ] What it looks like if you do it right.
Re: '87 R65 instrument cluster wiring. Post by Rob Frankhamr » Mon Aug 10, 2020 8:40 pm.
That sounds actually very good. With the knowledge that the lights are now working as they should, and with those readings, I would say it's time to stop worrying and ride the bike.
Frankhams retirement home for elderly Boxers.
Re: '87 R65 instrument cluster wiring. Post by Rob Frankhamr » Mon Aug 10, 2020 9:11 pm.
OK, you asked for it. Every material has a resistance (No, I'm not going to get into superconductors. I'm talking in a world here that relates to practical motorcycle mechanics!). The resistance of any conductor is proportional to its length, inversely proportional to its cross-sectional area, and proportional to a constant unique to the material it is made from. The constant is known as the 'coefficient of resistivity' for that material (I'm sure you really wanted to know that. There's also another constant called the 'temperature coefficient of resistance' but, in the interests of KISS, forget I mentioned it). The upshot of this is that every wire has a resistance value.
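To put rough numbers on the relationship described above (R = ρL/A, followed by Ohm's law V = IR for the resulting voltage drop), here is a minimal Python sketch. The copper resistivity of about 1.68 × 10⁻⁸ Ω·m at 20 °C is a standard textbook figure. The wire length, cross-section and current used in the example are hypothetical and were not measured on this bike.

```python
COPPER_RESISTIVITY = 1.68e-8  # ohm metre, copper at about 20 °C (textbook value)

def wire_resistance(length_m, area_mm2, resistivity=COPPER_RESISTIVITY):
    """R = rho * L / A, with the cross-section given in mm² and converted to m²."""
    return resistivity * length_m / (area_mm2 * 1e-6)

def voltage_drop(current_a, resistance_ohm):
    """Ohm's law: the voltage lost across a resistance carrying a current, V = I * R."""
    return current_a * resistance_ohm

# Hypothetical example: 2 m of 0.5 mm² copper wire carrying 0.25 A
r = wire_resistance(2.0, 0.5)   # about 0.067 ohm
drop = voltage_drop(0.25, r)    # about 0.017 V
```

Doubling the length doubles the resistance, and doubling the cross-section halves it. This is why the voltage-drop test reads a small fraction of a volt on a healthy circuit.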
If you have a current flowing through a resistance, there will be a voltage drop across it. That is to say, the voltage at the end connected closest to positive will always be higher than the voltage at the end connected closest to negative. In a practical circuit, the lower the resistance in the wire, the lower the voltage across that resistance... or to put it another way, there will always be a voltage between the ends of the wire, and that voltage is often referred to as a voltage drop (unintended voltage drop would be more accurate). The trick is to get it as low as possible. This can be done in a number of ways:
a) Keep the length of the wire as short as practical. The shorter the wire, the less resistance.
b) Use a thicker wire. The thicker the wire, the less resistance.
c) Reduce the number of high current loads attached to a single source. The higher the current, the greater the voltage drop.
d) Reduce the number of connectors. Each connector adds its own resistance, so keep them as clean as possible.
e) Use only good quality switches. Contacts also add a measure of resistance and, therefore, voltage drop.
Clustering riders
Today we address a complex but fun and interesting problem: clustering riders. This is by no means a new idea: everybody knows Elia Viviani and Dylan Groenewegen are classified as ‘sprinters’, whereas Chris Froome is a ‘GC guy’. The label a rider receives is simply based on past results. We investigate whether we can take this a step further by (1) using a uniform point system for race results and (2) including dimensions other than just outcomes. One of our findings is that although Dylan Groenewegen and Elia Viviani are both sprinters, they clearly belong to different clusters. Let’s dive in and investigate how we came to this conclusion.
Truth be told, clustering is an art as well as a science. It is a very exploratory type of analysis where you try to find subgroups of observations (in our case riders) that are most similar across the dimensions that you consider. Clustering requires substantial thought about the algorithm to use, the outcomes you include and whether you scale them in a particular manner. The results are in general not right or wrong, but should help you gain insights in a particular field.
In our rider cluster procedure we consider the following 11 dimensions:
ODR flat: Points scored in flat one day races
ODR not-flat: Points scored in one day races that are not flat
ODR unknown: Points scored in one day races for which we could not determine the profile
TT: Points scored in time trials during a race stage
MT: Points scored in mountain stages
stage flat: Points scored in flat stages
stage hill: Points scored in hilly stages
stage hill-flat: Points scored in hilly stages, but with a reasonably flat finish
GC: Points scored in GCs (excluding youth/mountain/points/combat classifications)
% WT: Percentage of points scored at the UWT level
% May: Percentage of points scored before May 1
All variables are calculated for the riders that are part of the 2019 World Tour teams. We require that a rider scores at least 50 points and participates in at least 20 single day races or race stages before we include him in the analysis.
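The inclusion rule above can be written as a simple filter. This is a sketch. The rider record layout (keys `name`, `team`, `points`, `starts`) and the function name are hypothetical, where `starts` counts single day races plus race stages.

```python
def eligible_riders(riders, wt_teams, min_points=50, min_starts=20):
    """Names of riders from the given World Tour teams with enough points and starts.

    riders: list of dicts with hypothetical keys "name", "team", "points", "starts".
    Both thresholds are inclusive ("at least").
    """
    return [
        r["name"]
        for r in riders
        if r["team"] in wt_teams and r["points"] >= min_points and r["starts"] >= min_starts
    ]
```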
Nine out of 11 variables use ‘points’. We do not use UCI points or rank scores for the analysis. Instead, we use a point system very similar to the Zweeler games, where finishing first in a race stage yields 35 points. For a one day race a rider can earn up to 120 points. With this point system it pays off to finish in the top 20 (stage) or top 25 (one day race). It is good to know that we rescale the points variables per rider. For example, ‘TT’ is the percentage of the total rider points scored in time trials.
The GC points are calculated as the number of stage points times the square root of the race length. As a consequence, a grand tour victory yields the same number of points as about 4.5 stage wins. The points may not be perfectly balanced, but keep in mind that the main purpose of the points is to determine the strengths of riders; we will not really use them to claim one rider is ‘better’ than another.
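A quick check of the "about 4.5 stage wins" figure, assuming race length is counted in stages (21 for a grand tour) and a GC win is scaled from the 35-point stage win. Both readings are assumptions, but they reproduce the number in the text.

```python
import math

STAGE_WIN_POINTS = 35  # Zweeler-style points for winning a race stage

def gc_points(stage_points, race_length_stages):
    """GC points = stage points x sqrt(race length), race length assumed to be in stages."""
    return stage_points * math.sqrt(race_length_stages)

gt_win = gc_points(STAGE_WIN_POINTS, 21)      # about 160.4 points
print(round(gt_win / STAGE_WIN_POINTS, 2))    # 4.58 stage wins
```

Because of the square root, a one-week race (7 stages) is worth about 2.6 stage wins, so GC points grow with race length but much more slowly than the number of stages.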
Unlike UCI points, the Zweeler points do not change by race class (World Tour, HC, etc). We do this on purpose: it allows us to identify riders that are similar to e.g. Peter Sagan or Greg van Avermaet in their race preferences and performance, but don’t show this at World Tour level. This could be because of their race preferences; perhaps they simply cannot compete with the absolute top, or they have team orders that prevent them from riding for their own chances. To compensate for the level at which the results were obtained, we have ‘% WT’, which is the percentage of points scored at the World Tour level. With the variable ‘% May’ (the percentage of points scored before May 1st) we try to capture the form peak of riders.
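A minimal sketch of the two level and timing variables, ‘% WT’ and ‘% May’, for a single rider. The input format, a list of `(points, race_date, is_world_tour)` tuples, and the function name are hypothetical.

```python
from datetime import date

def level_and_timing_shares(results):
    """Return (% WT, % May) for one rider.

    results: list of (points, race_date, is_world_tour) tuples (hypothetical format).
    % May counts points scored strictly before May 1 of the same year.
    """
    total = sum(points for points, _, _ in results)
    if total == 0:
        return 0.0, 0.0
    wt = sum(points for points, _, is_wt in results if is_wt)
    early = sum(points for points, day, _ in results if day < date(day.year, 5, 1))
    return 100 * wt / total, 100 * early / total
```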
Now, before we can show some results, we are left with one big question: how many groups of riders are there? Fortunately, this is not a trial and error process. There are several techniques that can be used to determine the number of clusters. One of these approaches is the ‘silhouette’ method, and the results are shown in Figure 1. The method indicates the optimal number of groups is 9; however, the difference with 7 or 8 clusters is small. For now we stick with 9 clusters and see what comes out.
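For readers unfamiliar with the silhouette method, here is a minimal pure-Python sketch. For each point, a is its mean distance to the other members of its own cluster, and b is its mean distance to the nearest other cluster. Then s = (b - a) / max(a, b), and the candidate number of clusters with the highest average s wins. Euclidean distance and scoring singleton clusters as 0 are the usual conventions, but they are assumptions here. `cluster_fn` stands for whatever clustering routine is used and is hypothetical. The score needs at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (requires at least two clusters)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

def best_k(points, cluster_fn, candidates):
    """Pick the k whose clustering (cluster_fn(points, k) -> labels) has the highest silhouette."""
    return max(candidates, key=lambda k: silhouette_score(points, cluster_fn(points, k)))
```

When several values of k score almost the same, as with 7, 8 and 9 clusters here, the choice between them is a judgment call rather than a hard result.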
Figure 1: Optimal number of clusters using the silhouette approach.
Just to recap, we have 11 variables / dimensions, and the algorithm indicates we should make 9 groups / clusters of riders with those variables. The resulting clusters are presented in two different ways. Figure 2 shows the scores of each cluster in a radial plot. It is quite intuitive to read and makes the differences between clusters easy to see. For example, let's focus on the grey-blue line (cluster number 1). The riders in this cluster score very high on the variables % WT and % May and take most of those points from not-flat single day races (ODR not-flat). Compare this with cluster 9. These riders also score high on the not-flat single day races, but in general do this after May and not so much at World Tour level.
Figure 2: Scores per dimension for each cluster in a radial plot.
Some big names from cluster 1 are Greg van Avermaet and Peter Sagan, but guess who is in cluster 9: Alejandro Valverde. Well, yeah, that can happen if you let a computer do the job, but let’s take a quick peek at whether this makes sense given the variables. Figure 3 shows the scores for the most characteristic riders of cluster 1 (Daryl Impey) and cluster 9 (David Gaudu), as well as for Peter Sagan (cluster 1) and Alejandro Valverde (cluster 9). The difference between the two groups can be explained by, among other things, the percentage of points at World Tour level and the points scored in hilly stages. Furthermore, Valverde and Gaudu tend to score more in GCs. At some point we will also consider more advanced cluster types where each rider can actually be part of multiple clusters. It is likely that members of cluster 1 are also to some extent part of cluster 9, and vice versa.