Exploring Efficient Ways of Distributing Code Quality Metrics in Cloud
Total Page:16
File Type:pdf, Size:1020Kb
Exploring Efficient Ways of Distributing Code Quality Metrics in Cloud by Dibangku Baruah, B.Sc. Dissertation Presented to the University of Dublin, Trinity College in fulfillment of the requirements for the Degree of Master of Science in Computer Science University of Dublin, Trinity College September 2016 Declaration I, the undersigned, declare that this work has not previously been submitted as an exercise for a degree at this, or any other University, and that unless otherwise stated, is my own work. Dibangku Baruah August 30, 2016 Permission to Lend and/or Copy I, the undersigned, agree that Trinity College Library may lend or copy this thesis upon request. Dibangku Baruah August 30, 2016 Acknowledgments I wish to express my sincere gratitude to my supervisor, Prof. Stephen Barrett for his support, guidance, suggestions and feedback which helped guide me throughout the course of my work on the dissertation. I would like to thank Paul Duggan and John Welsh, for their constant help and su- pervision throughout the duration of my thesis. I am also extremely grateful to all my friends for their words of encouragement, and for always being there to discuss any technical issues. Finally, a big thank you to my wonderful parents, without whom all of this would not have been possible. Dibangku Baruah University of Dublin, Trinity College September 2016 iv Exploring Efficient Ways of Distributing Code Quality Metrics in Cloud Dibangku Baruah, M.Sc. University of Dublin, Trinity College, 2016 Supervisor: Prof. Stephen Barrett The objective of this thesis, is to explore ways of efficiently distributing calculation of code quality metrics. A thorough study is made in this thesis, to determine whether or not it is possible to distribute calculation of source code complexity metrics and, if possible whether that solution holds against code repositories of all sizes and to explore the best way to distribute out the processing to many nodes. The report here outlines the theory about software metrics, their purpose and classi- fication, with particular emphasis on the metrics that is used in the study and later for experimentation. This thesis ends with an in-depth analysis and evaluation of the results, gathered through the running of the chosen complexity metric on different designed solution to the problem of distributing the processing to many machines. v Contents Acknowledgments iv Abstract v List of Tables xi List of Figures xii Chapter 1 Introduction 1 1.1 Background . .1 1.2 Motivation . .2 1.3 Research Questions . .2 1.4 Research Objectives . .3 1.5 Dissertation Outline . .3 Chapter 2 Software Metric 5 2.1 Usage of Software Metrics . .6 2.2 Classes of Software Metrics and Attributes . .6 2.2.1 Process Metrics . .7 2.2.2 Product Metrics . .7 2.2.3 Resource Metrics . .8 2.3 Software Code Metrics . .8 2.3.1 Quantitative Metrics . .9 vi 2.3.2 Metrics for Program Flow Complexity . .9 2.3.3 Metrics for Data Control Flow Complexity . .9 2.3.4 Metrics for Control Flow and Data Flow Complexity . 10 2.3.5 Object-Oriented Metrics . 10 2.3.6 Safety Metrics . 10 2.3.7 Hybrid Metrics . 10 2.4 Finding a Suitable Metric . 10 2.5 Source Lines of Code(SLOC) . 11 2.5.1 Advantages . 12 2.5.2 Disadvantages . 12 2.5.3 Takeaways . 13 2.6 Chidamber and Kemerer Object-Oriented Metrics Suite . 13 2.6.1 Weighted Methods per Class . 14 2.6.2 Depth of Inheritance Tree . 14 2.6.3 Coupling Between Object Classes . 14 2.6.4 Number of Children . 15 2.6.5 Response for a Class . 15 2.6.6 Lack of Cohesion in Methods . 16 2.6.7 Summary of OO Metrics . 16 2.6.8 Disadvantages . 17 2.6.9 Takeaways . 17 2.7 ABC Metric . 17 2.7.1 Advantages . 18 2.7.2 Disadvantages . 18 2.7.3 Takeaways . 18 2.8 Halstead Measures of Complexity . 19 2.8.1 Program Vocabulary . 20 2.8.2 Program Length . 20 vii 2.8.3 Program Volume . 21 2.8.4 Difficulty Level . 21 2.8.5 Program Level . 21 2.8.6 Programmign Effort . 22 2.8.7 Development time . 22 2.8.8 Advantage . 22 2.8.9 Disadvantage . 22 2.8.10 Takeaways . 23 2.9 Selection of Metrics . 23 Chapter 3 McCabe Cyclomatic Complexity 24 3.1 Definition . 25 3.1.1 Properties . 27 3.2 Categories of Cyclomatic Complexity . 28 3.3 Cyclomatic Complexity Variations . 28 3.3.1 Connected Components Variation . 28 3.3.2 Variation with Regards to Program Syntax . 30 3.4 Reasons for chosing Cyclomatic Complexity . 33 3.4.1 Limitations of Cyclomatic Complexity . 34 3.5 Summary . 34 Chapter 4 Design 35 4.1 Overview of MapReduce . 35 4.1.1 MapReduce Architecture . 38 4.1.2 Map Side Task . 39 4.1.3 Reduce Side Task . 40 4.1.4 Distributed File System . 41 4.1.5 Namenode and Datanode . 41 4.2 Overview of Git . 43 viii 4.3 Overview of REST architecture . 44 4.4 Basic Architecture of the System . 45 4.5 Design 1 : Baseline System . 47 4.6 Design 2 : Distributed Solution, RESTFul . 47 4.6.1 Optimization to the Distributed Solution . 49 4.7 Design 3 : Distributed Solution using Hadoop MapReduce . 50 4.8 Summary . 51 Chapter 5 Implementation 52 5.1 Technologies used . 52 5.2 REST Implementation . 53 5.3 MapReduce Implementation . 56 5.4 Implementation Summary . 59 Chapter 6 Evaluation 61 6.1 Comparison between the Baseline and Distributed Model . 61 6.2 Comparison between First Run and Subsequent Runs . 63 6.3 Comparision between Distributed and MapReduce Model . 64 6.4 Effects of File Size on MapReduce Jobs . 67 6.5 Modifications to Files copied to HDFS . 68 6.6 Discussion . 70 6.7 Evaluation Summary . 71 Chapter 7 Conclusions 73 7.1 Project Overview . 73 7.2 Future Work . 74 Appendix A Abbreviations 75 Appendix B Github Repositories used in Experiments 78 ix Bibliography 79 x List of Tables 2.1 Summary of the metrics . 16 3.1 Ranges of cyclomatic complexity . 28 B.1 Repositories used in the experiments . 78 xi List of Figures 2.1 Example of calculation of Halstead Metric . 19 3.1 Control flow graph for a simple program . 26 3.2 Control flow graph of the main function on the left . 27 3.3 Multiple components in a single program . 29 3.4 Complexity calculation from the number of branching statements . 32 3.5 Cyclomatic complexity calculation by observing the number of regions . 33 4.1 Shared everything architecture . 36 4.2 Shared nothing architecture . 38 4.3 Overview of a MapReduce job . 39 4.4 The overall MapReduce word count process . 40 4.5 The topology of a MapReduce cluster . 42 4.6 Git repository with three commits . 44 4.7 Basic overview of the system design . 46 4.8 Distributed solution implementation using OpenNebula . 47 4.9 Distributing data using Hadoop MapReduce . 50 5.1 REST listener class . 54 5.2 Format of the output in HDFS . ..