arXiv:2103.13736v1 [cs.LG] 25 Mar 2021
978-0-7381-1236-7/21/$31.00 ©2021 IEEE

Deep Similarity Learning for Sports Team Ranking

Daniel Yazbek†, Jonathan Sandile Sibindi∗
School of Computer Science and Applied Mathematics
University of the Witwatersrand
Johannesburg, South Africa
†[email protected], ∗[email protected]

Terence L. Van Zyl
Institute for Intelligent Systems
University of Johannesburg
Johannesburg, South Africa
[email protected]

Abstract—Sports data is more readily available and, consequently, there has been an increase in the amount of sports analysis, predictions and rankings in the literature. Sports are unique in their respective stochastic nature, making analysis and accurate predictions valuable to those involved in the sport. In response, we focus on Siamese Neural Networks (SNN) in unison with LightGBM and XGBoost models, to predict the importance of matches and to rank teams in Rugby and Basketball. Six models were developed and compared: a LightGBM, a XGBoost, a LightGBM (Contrastive Loss), a LightGBM (Triplet Loss), a XGBoost (Contrastive Loss) and a XGBoost (Triplet Loss). The models that utilise a Triplet loss function perform better than those using Contrastive loss. It is clear that the LightGBM (Triplet loss) model is the most effective in ranking the NBA, producing the state of the art (SOTA) mAP (0.867) and NDCG (0.983). The SNN (Triplet loss) most effectively predicted the Super 15 Rugby, yielding the SOTA mAP (0.921), NDCG (0.983) and rs (0.793). Triplet loss produces the best overall results, displaying the value of learning representations/embeddings for the prediction and ranking of sports. Overall, there is not a single consistent best performing model across the two sports, indicating that other ranking models should be considered in the future.

I. INTRODUCTION

Sports outcome prediction continues to gain popularity [21], as demonstrated by the globalization of sports betting [11], [24]. The NBA and Super 15 Rugby are among the most-watched sports [26], and this popularity leaves a large number of fans anticipating the results of upcoming games. Furthermore, there are numerous gambling companies offering betting odds of one team winning over another. With the availability of machine learning algorithms, forecasting the outcome of games with high precision is becoming more feasible and economically desirable to those involved in the sport and in the betting industry. Further, accurate game predictions can assist teams in making adjustments to best achieve possible future results, an economic upside for the franchise. This has increased sports prediction research and analysis using a multitude of machine learning algorithms [26], [18].

The goal of this research is to compare and evaluate the extent to which deep learning of representations can improve the performance of predictions across sports. The sports considered are the National Basketball Association (NBA) teams' regular season performances and the Super 15 rugby. There are thirty teams in the NBA, divided into two conferences - the Eastern Conference and the Western Conference. All thirty teams play 82 games in the regular season, where the top eight teams from each conference advance to the playoffs, meaning 16 of the 30 teams make the playoffs [24]. The Super 15 Competition consists of fifteen teams from five countries across four continents [22]. Each squad plays a total of sixteen games before entering the knock-out stages of the playoffs each year [24].
Adaptive as and and such Forest methods Random Short, other Boost, with Lin, combination in as regressio - suggest used logistic as drawback, be role-specific as this could not overcome and To team usu rebounding. whole are o points the based scoring throughout whereas heavily probably spread team, is is their rebounds in This get position [4]. to their between ability (RPG) player’s success game’ a find per because and wouldn’t ’rebounds able (MPG) but and game’ be (PPG) MPG per likely game’ ’minutes would exampl per between NBA For ’points relationship the [5]. a in fit relationships model to multivariate regression fit linear mode to a output. this fails to ’loss’ drawbacks it predicti are or sports as there multiple However, ’win’ [28]. in a [12], used papers to been has representation regression feature Linear a link to ub ttsis hc ste nu noarnigalgorit and ranking a NBA into of input then feature group is a selected which learn statistics, to carefully instances Rugby implemented a two is from the SNN a representation between research, calculat this to In distance function [7]. or a construct similarity fe The to the [14]. used approximation is function best forward-network the in results that hrdwihs edfradntokdfie mapping the a of defines idea network measure feed-forward with main as to networks A feed-forward The learns weights. two create inputs. shared network to it’s is neural network between Siamese of similarity type the clas this to learning inputs, a model neural the its to of of type knowledge Instead a architecture. their are network (SNNs) transfer Networks must Neural betw Siamese humans correlation dataset. the data, learn and to in network labels labe neural data a upon classify for depend datasets; effectively tasks Classification and classes. patterns identifiable learn to designed learni in machine analysis of and multitude [18]. a research incre algorithms using increased This [26], prediction franchise. has the sports that upside achieve for to economic results teams in future to a adjustments possible can making best predictions in t game teams in accurate sports players Further, various industry. to betting desirable economically and feasible dpieBotadRno oet ekt lsiygames classify to seek Forests Random and Boost Adaptive used be can algorithms learning machine of variety A oua ahn erigagrtm r erlnetworks, neural are algorithms learning machine Popular y = f ( x ; θ ) nttt o nelgn Systems Intelligent for Institute n erstevleo h parameters the of value the learns and oansug ot Africa South Johannesburg, nvriyo Johannesburg of University eec .VnZyl Van L. Terence [email protected] hm. sify ally lled een ed- ase on ng he to id e, n θ n e l box score data from 1991-1998 to develop machine learn- seasons cross-validation was used for hyper-parameter tuning ing models to predict the winner of professional basketball on the three seasons of training data. Initially, the first two games. The methods included an adaptive boosting algorithm, seasons are used for training and validated on the third, then random-forest model, logistic regression and a support vector the first and third are used for training and validated on the machines (SVMs) [6]. The adaptive boost algorithm is most second dataset. Lastly, the model is trained on the last two closely related to this work and performs multiple iterations seasons and validated on the first. 
that improve a classifier by attempting to correctly classify 1) Rugby: The data collected for rugby is from the Super data points that were incorrectly classified on the previous Rugby International Club Competition. The data is collected iteration. This method yielded a testing accuracy of 64.1% from the season of 2017 to 2020. The models are trained and in Lin 2014 [19]. However, the high number of iterations validated on the Super Rugby seasons of 2017, 2018, and 2019 performed makes the model sensitive to outliers as it will and the season of 2020 to test the models. Data from each continually try predicting them correctly [19] leading, to over- season is split into two parts: seasonal statistics of each team fitting of data in some cases. Logistic regression yielded 64.7% as per the end of the season; and the second being the record testing accuracy and the random-forest model yielded 65% of every game played with the results thereof. It is noted that testing accuracy, just behind the SVM model at 65.2% testing draws appear less than non-draws so they’ve been included as accuracy, leaving little to split the four models respective a home team win for creating the embeddings. performances [19]. The purpose of these models is to use the team seasonal In this work we will use feature learning in the form of data to predict the outcome of matches between two sides similarity learning to learn an embedding for each of the sports for a given season. Subsequently using these predicted match considered. The learned features will be ranked using gradient outcomes to rank teams for the season. In each season there boosting with the LightGBM and XGBoost implementations. are one hundred and twenty games, with each team playing The outputs are to predict the seasonal standing for the NBA sixteen games in the regular season (each team plays eight and Super 15. Multiple seasons will be trained to predict a games within its conference and eight games against teams in selected test season to be evaluated to gauge the performance other conferences). of the models. The use of an SNN together with a LightGBM Data for the seasonal statistics are retrieved from and XGBooost for this task is an unexplored method in the Rugby4Cast and Statbunker. The features in the datasets are current literature. presented in Table I. The contributions of this paper are: 2) Basketball: The dataset used for basketball in this study 1) Showing that a general framework including gradient consists of four consecutive regular seasons spanning from boosting and deep representation learning can success- 2014 to 2018, each of the seasons contains 2460 games [27]. fully predict seasonal rankings across different sports The data was divided into two sets, each with fourteen features. (NBA and Rugby). One of the sets represented the home team and the other set 2) Demonstrating that the the use of representation learn- represented the away team. The features were chosen are as ing improves rankings using both the Gradient Boost per Thabtah et al. (2019) [26] and Ahmadalinezhad et al. Machines (LightGBM and XGBoost). (2019) [3], and are shown in Table II. 3) Improving on SOTA in NBA and establishing the SOTA C. Algorithms in Super 15 ranking task. We combine gradient boosting with Siamese neural net- II. METHODOLOGY works to improve overall performance. The reason for the A. 
Overview combination is that the gradient boosting frameworks are The descriptions, preprocessing, and analysis used during effective at ranking tasks but limited in their ability to learn the experimental process are explained in this section. The feature embeddings. We compare two gradient boosting frame- processed data is transferred to the models after partitioning. works combined with two Siamese network embedding losses. The data is partitioned by placing them into their respective XGBoost and LightGBM are decision- seasons, whereby the first three seasons will make up the tree-based ensembles that use a gradient boosting framework training data and the final season making up the testing data. for ranking, classification, and many other machine learning The models are trained and tested with their results compared tasks [17]. Gradient boosting is selected over Adaptive Boost- and analysed using appropriate metrics. ing, Random Forests and SVM’s. These machines are recent extensions of those algorithms and they containing additional B. Data features aiding in performance, speed, and scalability [23]. The The data was first normalized. When preparing the data result is that gradient boosting is more suitable for the desired a training-testing split was utilised. As the data-set has a rankings and predictions [17]. temporal aspect, one cannot use future seasons to predict those Boosting refers to a machine learning technique that trains preceding it [1]. The training data comprised of the first three models iteratively [17]. A strong overall prediction is produced seasons and testing data of the fourth and final season, helping by starting with a weak base model which is then added to iter- to prevent any data leakage [10]. However, due to limited atively by including further weak models. Each iteration adds TABLE I FEATURES OF RUGBY TEAM SEASONAL STATISTICS DATASET
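The temporal split and rolling validation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the per-game records sit in a pandas DataFrame with a `season` column (an assumed column name) and that the four seasons are listed in chronological order.

    import pandas as pd

    def season_folds(games: pd.DataFrame, seasons: list):
        """Yield (train, validation) splits for the rolling scheme in Section II-B:
        train on two of the first three seasons, validate on the remaining one."""
        folds = [
            ([seasons[0], seasons[1]], seasons[2]),  # train on 1st and 2nd, validate on 3rd
            ([seasons[0], seasons[2]], seasons[1]),  # train on 1st and 3rd, validate on 2nd
            ([seasons[1], seasons[2]], seasons[0]),  # train on 2nd and 3rd, validate on 1st
        ]
        for train_seasons, val_season in folds:
            train = games[games["season"].isin(train_seasons)]
            val = games[games["season"] == val_season]
            yield train, val

    # The fourth, most recent season is held out entirely as the test set, so that
    # no future information leaks into training or hyper-parameter tuning.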

TABLE I
FEATURES OF RUGBY TEAM SEASONAL STATISTICS DATASET

1) Team name
2) Total tries scored
3) Home tries scored
4) Away tries scored
5) Tries scored in first half
6) Tries scored in second half
7) Total matches played
8) Tries conceded
9) Tries conceded at home
10) Tries conceded whilst away
11) Four try bonus point
12) Four try bonus point at home
13) Four try bonus point whilst away
14) Lost within seven points
15) Lost within seven points at home
16) Lost within seven points whilst away
17) Opponent four try bonus point
18) Opponent four try bonus point at home
19) Opponent four try bonus point away
20) Opponent lost within seven points
21) Opponent lost within seven points at home
22) Opponent lost within seven points away
23) Yellow cards accumulated
24) Yellow cards to red cards acquired
25) Red cards accumulated
26) Halftime wins
27) Halftime wins to full time wins
28) Halftime draws
29) Halftime draws to full time wins
30) Halftime losses
31) Halftime losses to full time wins
32) Conversions
33) Penalties
34) Tackles
35) Penalties conceded
36) Lineouts
37) Rucks
38) Scrums

TABLE II
FEATURES OF BASKETBALL TEAM STATISTICS

1) Home/away team
2) Field Goals
3) Field Goals Attempted
4) Three-Point Shots
5) Three-Point Shots Attempted
6) Free Throws
7) Free Throws Attempted
8) Offensive Rebounds
9) Defensive Rebounds
10) Assists
11) Steals
12) Blocks
13) Turnovers
14) Total Fouls

C. Algorithms

We combine gradient boosting with Siamese neural networks to improve overall performance. The reason for the combination is that the gradient boosting frameworks are effective at ranking tasks but limited in their ability to learn feature embeddings. We compare two gradient boosting frameworks combined with two Siamese network embedding losses.

XGBoost and LightGBM are decision-tree-based ensembles that use a gradient boosting framework for ranking, classification, and many other machine learning tasks [17]. Gradient boosting is selected over Adaptive Boosting, Random Forests and SVMs, as these machines are recent extensions of those algorithms and contain additional features aiding performance, speed, and scalability [23]. The result is that gradient boosting is more suitable for the desired rankings and predictions [17].

Boosting refers to a machine learning technique that trains models iteratively [17]. A strong overall prediction is produced by starting with a weak base model, which is then added to iteratively by including further weak models. Each iteration adds an additional weak model to overcome the shortcomings of the predictions of the previous weak models. The weak models are trained by applying gradient descent in the direction of the average gradient of the leaf nodes of the previous model [17]. This gradient is calculated with respect to the pseudo-residual errors of the previous model's loss function.

In the case of decision trees, an initial fit, F_0, and a differentiable loss function L for a number of boosting rounds M are used as starting points:

    F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma).    (1)

Thereafter, the pseudo residuals are calculated by

    r_{im} = -\frac{\partial L(y_i, F_{m-1}(x_i))}{\partial F_{m-1}(x_i)}.    (2)

Lastly, a decision tree h_m(x) is fit to r_{im} and we let

    F_m(x) = F_{m-1}(x) + \lambda_m \gamma_m h_m(x)    (3)

where \lambda_m is the learning rate, \gamma_m is the step size optimized by performing a line search, and y_i is the correct outcome of the game.

1) LightGBM: The first implementation evaluated is LightGBM, a gradient boost machine developed by Microsoft which uses tree-based learning algorithms. A LambdaRank objective function was selected, which scales the gradients by the change in NDCG computed in the LightGBM [9].

2) XGBoost: Second, XGBoost was selected, an optimised distributed gradient boosting library that uses a LambdaMART objective function [9]. XGBoost is designed to be highly flexible, efficient, and portable for implementing machine learning algorithms under the gradient boosting framework. The XGBoost library contains many models, but the one selected for these experiments was the pairwise ranker. XGBoost used a Log Loss function for pairwise ranking. The Log Loss function used to calculate the cost value is given by:

    F = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]

where y is the correct outcome of a game, \hat{y} is the predicted result of a game and N is the number of games played.

3) Siamese Neural Network: Data was fed into a Siamese Neural Network to create the embeddings to be passed to the aforementioned ranking algorithms. An SNN consists of two identical sub-networks, where each is fed one of a selected tuple of inputs; in this case, the home and away team for each game in the data-set [7]. The Rectified Linear Activation Function was selected [2] and was used alongside an RMSprop optimizer for thirteen epochs. Thirteen epochs was selected as the model accuracy had converged on the validation set at that point. As for the loss function, two were implemented: Contrastive loss and Triplet loss [8].

Contrastive loss is defined as:

    J(\theta) = (1 - Y)\,\tfrac{1}{2}(D_{a,b})^2 + Y\,\tfrac{1}{2}\big(\max(0,\, m - D_{a,b})\big)^2    (4)

where Y represents the outcome classes, 0 indicating a win and 1 indicating a loss, m is the dissimilarity margin and D_{a,b} is the Euclidean distance

    D_{a,b}(x_1, x_2) = \sqrt{\sum_{k=0}^{K} \big(b(x_2)_k - a(x_1)_k\big)^2}    (5)

where a(x_1) is the output of sub-network one and b(x_2) the output of sub-network two [8], [15]. A vector containing the home team's seasonal statistics is fed into sub-network one and, similarly, the away team's is fed into sub-network two.
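As an illustration of Equations (4) and (5) only, not the authors' implementation, the contrastive objective can be written directly in NumPy. Here `a_out` and `b_out` stand for the embeddings produced by the two sub-networks for the home and away team, `y` is 0 for a home win and 1 for a loss, and `margin` plays the role of the dissimilarity margin m; these names are assumptions made for the sketch.

    import numpy as np

    def euclidean_distance(a_out: np.ndarray, b_out: np.ndarray) -> float:
        # Eq. (5): distance between the outputs of the two sub-networks.
        return float(np.sqrt(np.sum((b_out - a_out) ** 2)))

    def contrastive_loss(a_out: np.ndarray, b_out: np.ndarray, y: int, margin: float = 1.0) -> float:
        # Eq. (4): similar pairs (y = 0) are pulled together, while dissimilar
        # pairs (y = 1) are pushed apart until they are at least `margin` away.
        d = euclidean_distance(a_out, b_out)
        return (1 - y) * 0.5 * d ** 2 + y * 0.5 * max(0.0, margin - d) ** 2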
Triplet loss is defined as:

    J(\theta) = \max\big(D_{a,b}(a, p) - D_{a,b}(a, n) + m,\; 0\big)    (6)

whereby the anchor (a) is an arbitrary data point, the positive (p) is of the same class (win or loss) as the anchor, and the negative (n) is of a different class from the anchor [8].

Each SNN is fed the seasonal data set, and the respective outputs are run through the LightGBM ranker, resulting in predicted values (similarity scores) for each game in the selected seasons. To compute the final standings we use a tally rank: if a team wins a game, the similarity score is added to their tally and, likewise, if a team loses, the similarity score is subtracted from their tally.

A list-wise predicted ranking of both the Rugby and Basketball test seasons was produced. This list-wise ranking was sorted in descending order to achieve the final predicted standings. The teams were then separated into the Western and Eastern conferences for analysis purposes. From the three methodologies discussed, six results were outputted: one from each of the LightGBM and XGBoost models individually and a total of four from the LightGBM and XGBoost SNN models. Thus, a total of six predicted standings were achieved for each sport, each of which is compared to the others using several metrics.

D. Metrics

The predicted rankings were compared to one another, to the baselines and to the actual standings of that year. The metrics that were used to evaluate the performance of the models are:

1) Mean Average Precision (mAP):

    mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i    (7)

with

    AP = \sum_{k=1}^{K} \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}    (8)

where K is the desired point at which precision is calculated. The AP was evaluated at k = 15 in both the Eastern and Western conferences, and the mAP is achieved by averaging these two results. Super Rugby only uses the Average Precision.
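One common way to realise the precision-at-k computation behind Equations (7) and (8) is sketched below; it assumes `predicted` and `actual` are lists of team names ordered by predicted and true standing, with the true top-k treated as the relevant set (e.g. the k = 15 cut-off used per NBA conference). The exact accumulation used by the authors may differ.

    def average_precision_at_k(predicted, actual, k=15):
        """Precision accumulated over the predicted top-k, relative to the true top-k."""
        relevant = set(actual[:k])
        hits, precision_sum = 0, 0.0
        for i, team in enumerate(predicted[:k], start=1):
            if team in relevant:
                hits += 1
                precision_sum += hits / i  # precision of the prefix ending at rank i
        return precision_sum / max(len(relevant), 1)

    def mean_average_precision(conference_rankings, k=15):
        """Eq. (7): average the AP values, e.g. over the Eastern and Western conferences."""
        aps = [average_precision_at_k(pred, act, k) for pred, act in conference_rankings]
        return sum(aps) / len(aps)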
2) Spearman's rank correlation coefficient:

    r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}    (9)

where d_i is the difference between the actual ranking of a team and the predicted rank, and n is the number of cases. r_s can take on values between 1 and -1, where 1 indicates a perfect association of ranks, 0 indicates no association between ranks and -1 indicates a perfect negative association of ranks. The closer to zero r_s is, the weaker the association between the ranks [13].

3) Normalized Discounted Cumulative Gain: To proceed with Normalized Discounted Cumulative Gain (NDCG), one first needs to introduce relevance scores for the actual rankings of the test season under consideration. If a team finishes first in their league/conference, a score of 15 is assigned to them, the team who finishes in second place is assigned a score of 14, and so forth until the fifteenth/last team receives a score of 1. These scores are referred to as rel, and the respective values at position p are computed as [29]:

    nDCG_p = \frac{DCG_p}{IDCG_p}    (10)

where

    DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}    (11)

and

    IDCG_p = \sum_{i=1}^{|REL_p|} \frac{2^{rel_i} - 1}{\log_2(i + 1)}.    (12)
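Equations (10)-(12) can be evaluated directly from the relevance scores described above (15 for the true first-placed team down to 1 for the last). A minimal sketch, assuming `relevances` is the list of true relevance scores read off in predicted-rank order:

    import math

    def dcg(relevances):
        # Eq. (11): discounted cumulative gain of a ranked list of relevance scores.
        return sum((2 ** rel - 1) / math.log2(i + 1)
                   for i, rel in enumerate(relevances, start=1))

    def ndcg(relevances):
        # Eq. (10): normalise by the ideal DCG (Eq. (12)), i.e. the DCG obtained
        # when the same scores are sorted in descending order.
        ideal = dcg(sorted(relevances, reverse=True))
        return dcg(relevances) / ideal if ideal > 0 else 0.0

    # A 15-team league ranked perfectly scores 1.0:
    # ndcg([15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]) == 1.0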
E. Experimental procedure

First, the two baselines are computed: one assigning random rankings and the other naive. For the random baseline, the league standings were randomized for each sport thirty times. For the naive baseline, the assumption is made that the test season ranking will be identical to that of the preceding season.

Once the baselines had been established, LightGBM and XGBoost were used without the embedding from the SNN. Both algorithms are fit to the training data and rank the respective test seasons (2019/20 for rugby and 2017/18 for the NBA). A final predicted league standing was computed for both sports (the NBA is divided into the Western and Eastern conferences) by adding the resulting similarity score to each team's tally if they win and, similarly, subtracting the similarity score if they lose.

Next, two SNNs were built (differing only by their loss functions) to set up the second set of algorithms. The SNN which was built has two identical sub-networks, both of which have an input layer consisting of 50 neurons for basketball and 37 input neurons for rugby, followed by two hidden layers with 70 and 20 neurons, and finally an output layer with 1 neuron. Both of the sub-networks use a ReLU (Rectified Linear Activation Function) [2] and an RMSprop optimizer for thirteen epochs. Our RMSprop optimizer had a learning rate of 0.001, a momentum value of 0, an epsilon value of 1e-7 and a discounting factor for the gradient of 0.9. For contrastive loss, one of the sub-networks is fed the home team's data and the other is fed the away team's data. For triplet loss, a team is chosen as the anchor, and the distance to the positive input is minimized while the distance to the negative input is maximized. The SNNs were then trained on the first three seasons and were used to predict the final test season. The resulting scores were fed into both the LightGBM and XGBoost rankers, from which the predicted standings were computed using the tally rank method previously discussed.

F. Results

The results are as follows:

TABLE III
BASKETBALL RESULTS

Method                   mAP          rs           NDCG
Naive                    0.704        0.640        0.839
Randomized               0.477 ±0.02  0.558 ±0.01  0.820 ±0.05
LightGBM                 0.822 ±0.00  0.839 ±0.00  0.979 ±0.00
XGBoost                  0.857 ±0.00  0.900 ±0.00  0.976 ±0.00
LightGBM (Contrastive)   0.859 ±0.22  0.876 ±0.13  0.977 ±0.02
XGBoost (Contrastive)    0.745 ±0.26  0.751 ±0.13  0.916 ±0.08
LightGBM (Triplet)       0.867 ±0.15  0.870 ±0.11  0.980 ±0.00
XGBoost (Triplet)        0.725 ±0.26  0.675 ±0.14  0.922 ±0.11

TABLE IV
RUGBY RESULTS

Model                    AP           rs           NDCG
Naive                    0.462        0.532        0.915
Randomized               0.193 ±0.20  0.525 ±0.02  0.735 ±0.10
LightGBM                 0.381 ±0.08  0.379 ±0.06  0.806 ±0.09
XGBoost                  0.421 ±0.05  0.432 ±0.01  0.910 ±0.00
LightGBM (Contrastive)   0.620 ±0.02  0.671 ±0.04  0.965 ±0.05
XGBoost (Contrastive)    0.724 ±0.08  0.746 ±0.04  0.974 ±0.01
LightGBM (Triplet)       0.686 ±0.02  0.754 ±0.05  0.970 ±0.00
XGBoost (Triplet)        0.921 ±0.04  0.793 ±0.00  0.982 ±0.03

G. Discussion

1) Basketball: All six of the models (LightGBM, LightGBM (Contrastive loss), LightGBM (Triplet loss), XGBoost, XGBoost (Contrastive loss), XGBoost (Triplet loss)) used to predict the NBA test season outperformed both the naive and random baselines in every metric (F1 score, mAP, NDCG and rs). This is expected, as basketball in itself is a volatile sport [22], making a naive or random prediction highly unlikely to resemble any result close to that produced by deep similarity learning.

Arguably the most important feature of the NBA regular season is progressing to the playoffs [22], which only the top 8 teams from each conference advance to. The deep similarity learning models developed and investigated for this study were able to predict the playoff teams to a relatively high degree of accuracy, with the LightGBM (Triplet loss) performing the best with 8/8 Eastern Conference playoff teams correct and 7/8 Western Conference playoff teams correct. XGBoost (Triplet loss) performed the worst, only managing 6 correct from either conference. It is worth mentioning that each of the six implemented models incorrectly predicted that the Denver Nuggets would make it to the playoffs over the Utah Jazz. This is likely attributable to the Utah Jazz drafting higher quality players that year (2017/18), possibly aiding them to over-perform what was expected.

Using a LightGBM without an SNN falls short in playoff predictions, F1 score, mAP, NDCG, and rs, as seen in Table III, providing a direct indication that the use of an SNN in unison with a LightGBM has an overall beneficial impact on the performance of the model. In terms of the selection of the loss function, our results show little differentiation between them, specifically in their F1 score, NDCG, and rs. However, the mAP of each loss function was the biggest discerning factor, with the Triplet loss producing a mAP of 0.867 versus the Contrastive loss mAP of 0.859. Not only does the Triplet loss improve the models' overall precision, it predicted one more correct playoff team (the Washington Wizards) than the Contrastive loss.

However, there appears to be a converse effect with the use of an SNN with the XGBoost. The XGBoost function by itself outperforms the iterations where an SNN was used in every metric. From Table III, it is clear the LightGBM (Triplet loss) algorithm produces the overall best set of results, closely followed by the XGBoost algorithm.

2) Rugby: Looking at the scores across the models in Table IV, the Siamese Triplet ranker performs the best, followed closely by the Siamese Contrastive, LightGBM (Triplet loss), and LightGBM (Contrastive loss). The results then fall short of our previous-season naive implementation, with the XGBoost model and LightGBM model falling behind in sixth and seventh position, ahead only of the random ranker in last.

The top-ranked amongst these models was the Siamese Triplet ranker, followed closely by the Siamese Triplet with Gradient Boosting, Siamese Contrastive and Siamese Contrastive with Gradient Boosting. The LambdaMART Gradient Boost and LambdaRank Gradient Boost models performed the worst on average and had scores that would lose out to the naive implementations.

Similarly to the NBA regular season, it is important in Super Rugby to note which top 8 teams move on to the knockout stages. The Siamese Triplet model correctly guessed 7 out of the 8 teams, whilst the rest of the trained models predicted 6 correctly. The random ranker came in last with only 4 teams predicted correctly.

H. Future improvements

Although the Triplet loss edges out the Contrastive loss in performance, the margin is slightly smaller than expected [16]. The predictive models operate in a setting that is widely accepted to be the domain of Triplet loss functions, based on strategies explored in recent literature [8], [25], which makes the under-performance of the models notable. Triplet loss functions are sensitive to noisy data [16], which may be the hindering factor. An exploration of a new data-set, or even a new sport in itself, may help rectify this. Furthermore, an investigation into different sports would help one grasp the portability and effectiveness of the proposed predictive modelling outside of Rugby and Basketball, as it is clear that there is no one consistent best performing model across the two sports.
III. CONCLUSION

The seasonal rankings of the National Basketball Association and the Super Rugby Club Championship were predicted using Siamese Neural Network models, Gradient Boost Machines, and a combination of the two. Super Rugby is far more difficult to predict due to each team playing far fewer games compared to teams in the NBA.

In Basketball, the six models all produce satisfactory results, with the best model predicting at least 81% of the playoff teams correctly. These models exceed a score of 0.8 in mAP, NDCG, and rs, as well as outperforming a majority of the current literature in this field [5], [12], [20]. The models that utilise the LightGBM ranking model combined with the Siamese (Triplet Loss) Neural Network produced the highest NDCG and mAP scores. All models surpassed both the Naive and Randomized baselines in all three metrics.

The models for rugby that utilise the Siamese (Triplet Loss) Neural Network have proven to be better than those that utilise the Siamese (Contrastive Loss), with the best predicting 87.5% of the play-off teams correctly and exceeding 0.9 in mAP, NDCG, and rs. The models that use LightGBM and XGBoost without the SNN are not as effective as their counterparts that include it. Both the LightGBM and XGBoost models which do not use the SNN as part of their design failed to provide better results than the Naive rankings, but were better than the Randomized list.

Using Gradient Boost Machines and SNNs, it is possible to create impressive and capable ranking and prediction models for the NBA and Super Rugby. However, there is no one best performing model across the two sports. This research can be built upon to delve further into more sporting disciplines or to expand the current models by training on more seasons and better feature modelling. It may be concluded that a LightGBM model is more effective in ranking larger data-sets, as prevalent in the NBA, compared to the XGBoost model, which explains the differences in their respective results.

In conclusion, the models proposed provided excellent results in comparison to their baseline counterparts and answered the research questions at hand. It has been shown that they can be utilized in betting industries to forecast precise results in both the Super Rugby and the NBA.

REFERENCES

[1] R. Adhikari and R. K. Agrawal, "An Introductory Study on Time Series Modeling and Forecasting," arXiv preprint arXiv:1302.6613, 2013.
[2] A. F. Agarap, "Deep learning using rectified linear units (ReLU)," arXiv preprint arXiv:1803.08375, 2018.
[3] M. Ahmadalinezhad, M. Makrehchi, and N. Seward, "Basketball lineup performance prediction using network analysis," 2019, pp. 519-524.
[4] T. M. Arnold and J. M. Godbey, "Introducing Linear Regression: An Example Using Basketball Statistics," SSRN Electronic Journal, vol. 11, no. 2, pp. 113-130, 2012.
[5] K. Barlos and S. Koutsogianni, "Predicting Regular Season Results of NBA Teams Based on Common Basketball Statistics," University of California at Berkeley, vol. 97, no. 12, pp. 194-200, 2015.
[6] B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," in Proceedings of the Fifth Annual Workshop on Computational Learning Theory, ser. COLT '92. New York, NY, USA: Association for Computing Machinery, 1992, pp. 144-152. [Online]. Available: https://doi.org/10.1145/130385.130401
[7] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah, "Signature verification using a "siamese" time delay neural network," in Proceedings of the 6th International Conference on Neural Information Processing Systems, ser. NIPS'93. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993, pp. 737-744.
[8] T. Bui, J. Collomosse, L. Ribeiro, T. Nazare, and M. Ponti, "Regression in Deep Learning: Siamese and Triplet Networks," 2017 30th SIBGRAPI Conference on Graphics, Patterns and Images Tutorials (SIBGRAPI-T), 2017.
[9] C. J. Burges, "From RankNet to LambdaRank to LambdaMART: An overview," Learning, vol. 11, no. 23-581, p. 81, 2010.
[10] V. Cerqueira, L. Torgo, and I. Mozetic, "Evaluating time series forecasting models: An empirical study on performance estimation methods," Machine Learning, pp. 1-28, 2019. [Online]. Available: http://arxiv.org/abs/1905.11744
[11] Y. Chen, J. Dai, and C. Zhang, "A neural network model of the NBA most valued player selection prediction," 2019, pp. 16-20.
[12] G. Cheng, Z. Zhang, M. Kyebambe, and K. Nasser, "Predicting the Outcome of NBA Playoffs Based on the Maximum Entropy Principle," Entropy, vol. 18, p. 450, 2016.
[13] Ellis and Victoria, "Spearman's correlation," Ellis and Victoria textbook, p. 8, 2011.
[14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
[15] R. Hadsell, S. Chopra, and Y. LeCun, "Dimensionality reduction by learning an invariant mapping," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 1735-1742, 2006.
[16] E. Hoffer and N. Ailon, "Deep metric learning using triplet network," 2018.
[17] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, "LightGBM: A highly efficient gradient boosting decision tree," Advances in Neural Information Processing Systems, 2017, pp. 3147-3155.
[18] N. Lieder, "Can Machine-Learning Methods Predict the Outcome of an NBA Game?" SSRN Electronic Journal, pp. 1-10, 2018.
[19] J. Lin, L. Short, and V. Sundaresan, "Predicting National Basketball Association Winners," CS 229 Final Project, pp. 1-5, 2014.
[20] J. Lu, Y. Chen, and Y. Zhu, "Prediction of future NBA games' point difference: A statistical modeling approach," 2019, pp. 252-256.
[21] H. Manack and T. L. Van Zyl, "Deep similarity learning for soccer team ranking," in 2020 IEEE 23rd International Conference on Information Fusion (FUSION). IEEE, 2020, pp. 1-7.
[22] B. Pagno, L. Guedes, C. Maria, and L. Nedel, "NBAVis: Visualizing National Basketball Association information," 2014.
[23] S. Rahman, M. Irfan, M. Raza, K. M. Ghori, S. Yaqoob, and M. Awais, "Performance analysis of boosting classifiers in recognizing activities of daily living," International Journal of Environmental Research and Public Health, vol. 17, no. 3, 2020.
[24] S. RugRules, Super Rugby Rules - Super 15 Rugby and Rugby Championship News, Results and Fixtures from Super XV Rugby, 2016 (accessed November 15, 2020), https://www.superxv.com/super-rugby/super-rugby-rules.
[25] A. Taha, Y.-T. Chen, T. Misu, A. Shrivastava, and L. Davis, "Boosting standard classification architectures through a ranking regularizer," 2020.
[26] F. Thabtah, L. Zhang, and N. Abdelhamid, "NBA game result prediction using feature analysis and machine learning," Annals of Data Science, 2019.
[27] A. Thompson, "All the news: 143,000 articles from 15 American publications," https://www.kaggle.com/snapcrack/all-the-news, 2017.
[28] R. A. Torres and Y. H. Hu, "Prediction of NBA games based on Machine Learning Methods," University of Wisconsin, Madison, 2013.
[29] H. Valizadegan, R. Jin, R. Zhang, and J. Mao, "Learning to rank by optimizing NDCG measure," in Advances in Neural Information Processing Systems, 2009, pp. 1883-1891.
