Technical Report
National College of Ireland
BSc in Computing 2017/2018
Daniel Murran
X13114336
[email protected]
Supervisor: Simon Caton

Ahead of the Curve, Analysis of the NFL Draft
Technical Report

Contents

1 Table of Figures
Executive Summary
2 Introduction
    Aims
    Technologies
3 Background & Literature Review
4 Methodology
    Data Selection
    Pre-Processing
    Transformation
    Data Mining
    Interpretation / Evaluation
5 System
    Design and Architecture
    Implementation
        5.2.1 Data Selection
        5.2.2 Data Preparation
        5.2.3 Data Transformation
        5.2.4 Database Interactions
        5.2.5 Data Mining with Regression
        5.2.6 Glmnet Sparse Regularized Regression
        5.2.7 XGBoost Gradient Boosting Dense Regression
    Evaluation
6 Graphical User Interface (GUI) Layout
    Testing
        6.1.1 Unit Testing
        6.1.2 Statistical Testing
        6.1.3 Customer testing
    Exploration Plan
    Impact Summary
7 Conclusions & Further Development
8 References
9 Appendix
    Definitions, Acronyms, and Abbreviations
    Visuals
    Technical Details
    Requirements
        9.4.1 Functional requirements
        9.4.2 Use Case
        9.4.3 Requirement 1 Web Scrape Draft Data
        9.4.4 Requirement Web Scrape Combine Data
        9.4.5 Requirement Web Scrape College Data
        9.4.6 Requirement Database Creation
        9.4.7 Requirement Data Visualization
        9.4.8 Data Requirements
        9.4.9 Performance/Response time requirement
        9.4.10 Availability Requirement
        9.4.11 Recover requirement
        9.4.12 Security requirement
        9.4.13 Reliability requirement
        9.4.14 Maintainability requirement
        9.4.15 Extendibility requirement
        9.4.16 Reusability requirement
    Project Plan
    Monthly Journals
        9.6.1 September
        9.6.2 October
        9.6.3 November
        9.6.4 December
        9.6.5 January
        9.6.6 February
        9.6.7 March
        9.6.8 April
        9.6.9 May

1 Table of Figures

Figure 1: Path of a college football player
Figure 2: Data Analysis in NFL
Figure 3: KDD Methodology
Figure 4: System Architecture Overview
Figure 5: Urls extracted
Figure 6: Combine scraped data and merge into a table
Figure 7: Filtering multiple rows with the same player
Figure 8: Mice Imputation of data
Figure 9: Training & Test Set Preparation
Figure 10: Correlation Matrix
Figure 11: Correlation Plot
Figure 12: Word Cloud
Figure 13: Histogram for the shuttle
Figure 14: Scatterplot for the shuttle with weight and age
Figure 15: Individuals factor map & Variables factor map
Figure 16: Scree Plot & Code Snippet
Figure 17: Dimensions Correlations and p values
Figure 18: Database connection script
Figure 19: Database connection script
Figure 20: Matrix for sparse model
Figure 21: Code snippet for ROC curve & AUC
Figure 22: Matrix for dense boosting model
Figure 23: Tuning