
DETECTION OF NAMED BRANCH ORIGIN FOR GIT COMMITS

A Thesis
Presented to
The Graduate Faculty of The University of Akron

In Partial Fulfillment
of the Requirements for the Degree
Master of Science

Heather M. Michaud
August, 2015

Thesis

Approved:
Advisor: Dr. Michael L. Collard
Faculty Reader: Dr. Kathy J. Liszka
Faculty Reader: Dr. Zhong-Hui Duan
Interim Department Chair: Dr. David N. Steer

Accepted:
Dean of the College: Dr. Chand Midha
Interim Dean of the Graduate School: Dr. Chand Midha

ABSTRACT

The named branch on which a change is committed in a Git repository provides valuable insight into the evolution of a software project, including a natural and logical ordering of commits categorized by the developer at the time of the change. In addition, the name of the branch provides semantic context as to the nature of the changes along that branch. However, this branch name is unrecorded in the historical archive of Git repositories. In this thesis, a heuristics-based algorithm is presented to detect the named branch origin of commits based on the merge commit messages. An empirical evaluation shows precision reaching an average of 87% on generated test repositories and an average recall of over 97% on both generated test repositories and forty-four open-source systems. This constitutes an enormous increase in recall compared to the only existing algorithm for branch name detection. Additionally, common merge commit messages, merge types, and branch names found in over forty open-source projects are discussed in detail.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER
I. INTRODUCTION
II. GIT STRUCTURE
    Branch operations
    Merge operations
        Merge
        Fast-forward
        Pull-requests
        Rebase
        Cherry-pick
    Remote repositories
III. ROLE OF MERGE COMMITS
    Text extraction
IV. CASE STUDY: MERGE COMMIT TYPES
V. HEURISTICS-BASED ORIGIN DETECTION
    Branch Heads Heuristic
    Parents of Merges Heuristic
    Ancestral Heuristic
    Majority-Origin Heuristic
VI. ALGORITHM EVALUATION
    Generated test repositories
    Grim extension
    Analyses
VII. APPLICATION OF HEURISTICS ON OPEN-SOURCE SYSTEMS
VIII. THREATS TO VALIDITY
IX. RELATED WORK
X. CONCLUSION AND FUTURE WORK
BIBLIOGRAPHY

LIST OF TABLES

1. Variety of default merge commit messages based on branch types
2. Summary of merge commit types in 44 open-source projects
3. List of collected open-source repositories used in the case study
4. Distribution of explicit merge types on 44 open-source systems
5. Commonly occurring non-default merge commit messages
6. Recall of heuristics-based algorithm stages on 44 systems

LIST OF FIGURES

1. An example Git repository shown as a directed acyclic graph
2. Stages of commits to branches in example Git repository
3. Branch creation and checkout
4. Before and after repository state of a git merge operation
5. Before and after a fast-forward merge
6. Stages of a git rebase operation
7. A repository state before and after a git cherry-pick
8. An example Git repository with labeled origins of algorithm baseline
9. Heuristics-based algorithm pseudocode
10. Branch head heuristics pseudocode
11. An example Git repository with labeled origins after the branch head heuristic has also been applied
12. Fast-forwards and branch deletions affect the result of the branch head heuristic
13. Merge parent heuristic pseudocode
14. An example Git repository with labeled origins after the merge parents heuristic has also been applied
15. Ancestral heuristic pseudocode
16. An example Git repository with labeled origins after the ancestral heuristic has also been applied
17. Majority origin heuristic pseudocode
18. An example Git repository with labeled origins after the majority origin heuristic has also been applied
19. Bash source-code for generating test repositories
20. Example generated repository with fifteen non-merge commits

CHAPTER I

INTRODUCTION

Mining source-code repositories has become an intrinsic part of software engineering analyses since the advent of version-control systems (VCS). Grouping commits into categories makes the historical information of a repository easier to interpret and helps identify evolutionary patterns and trends. Commits or issues are often grouped by author [1], a sliding time-window [2-5], the size [6] or type [7] of the change, the files that were changed [5,8], branch patterns [9-10], or data-mining clustering methods [11].

Git, the most popular VCS to date [12], allows the developer to create named branches (i.e., independent, diverging paths from the mainline of development) which can later be merged back into the mainline. Because branching and merging operations are so flexible and efficient, they have been seamlessly integrated into developer workflows. A typical Git workflow involves creating a short-lived topic branch to implement a specific feature or bug fix, which is then merged back into the main branch upon task completion. Long-term branches are used