Revision Control Systems Introduction to Git
Bartosz Kostrzewa
October 2014

"The palest ink is better than the best memory."
Chinese proverb

Contents

1 Notes
  1.1 Outline
  1.2 Revision Control
  1.3 Revision Control in Science
  1.4 What is a Revision Control System?
  1.5 Revision Control Features
    1.5.1 Branches
  1.6 Centralized and Distributed Version Control Systems
    1.6.1 CVS, SVN ... - Centralized Version Control Systems
  1.7 git - the stupid content tracker
    1.7.1 storage paradigms - differences vs. snapshots
    1.7.2 revisions vs. commits
    1.7.3 the staging area
    1.7.4 remote repositories
    1.7.5 speciality: cheap branching
  1.8 Branching Development Models
    1.8.1 Trunk, Release
    1.8.2 Rolling Release, Feature Branches
    1.8.3 Stable Branch, Development Branch, Feature Branches
  1.9 Remotes and Branching Models
    1.9.1 Revisiting the "Central" Repo
    1.9.2 What is a "pull request"?
  1.10 Source Code Management and Collaboration
    1.10.1 redmine & github
2 Exercises
  2.1 git and SVN, Similarities and Differences
  2.2 Basic Exercises
    2.2.1 git Configuration
    2.2.2 Setting up a Local Repository
    2.2.3 Making Changes and the Staging Area
    2.2.4 Committing Two Changes Separately
    2.2.5 Unstaging a Change
    2.2.6 Reverting a Commit
    2.2.7 File Operations
    2.2.8 Creating a New Branch
    2.2.9 Interacting with a Remote Repository - Cloning, Fetching, Pulling and Resolving Conflicts
    2.2.10 Interacting with a Remote Repository - Pushing Local Changes to a Remote
  2.3 Advanced Exercises
    2.3.1 The .gitignore File
    2.3.2 Preparing a git Repository for Writing a Paper (or your Thesis)
    2.3.3 Creating "bare" git Repositories
    2.3.4 git stash
    2.3.5 Staging Partial Changes
    2.3.6 Using git for Debugging
    2.3.7 Dealing with and Reverting Merges
    2.3.8 Rewriting History
    2.3.9 Cherry-picking one or more Commits
    2.3.10 Referencing the Commit Hash in a Program
    2.3.11 Using tags
    2.3.12 Converting a Subversion Repository
3 Conclusion
4 References

1 Notes

This is a short set of notes for the lecture with the same title. The slides and the notes are supposed to complement each other, and you should read the notes while looking at the slides, unless you have a photographic memory.

1.1 Outline

We will begin with a short overview of the reasons for using version control systems in software development in general and in scientific software development in particular. We will learn about the difference between centralized and decentralized (or distributed) version control systems. Then we will talk about the types of work-flows possible with distributed version control systems and will look in particular at the work-flows enabled by git. The particular work-flow shown here includes pizza and who could disagree with that? Because a lot of the content of these notes is abstract, the exercises at the end are supposed to familiarize you with git and the various techniques discussed in the lecture.

1.2 Revision Control

Revision control should be an essential part of the software development process. Your software will go through many small incremental steps as you add features, fix bugs or release different versions. Although manageable by hand on a small scale, this process will quickly lead to confusion as to which change was made when and by whom. In the best case, this results in very painful bug tracking; in the worst case, it leads to work - such as new features - being forgotten and lost.

A revision control system allows you to keep track of the state of a software project at any given time. It does so by saving the current set of files that belong to your program side-by-side with meta-information which identifies when given changes were made and by whom.
Some systems will even track relationships between changes made to a software project so that the origin of a given chunk of code can be traced back to the point of its introduction into the code-base.

This additional meta-information lets you follow the development process retrospectively and makes your bug and release management easier. It also allows you to properly assign credit for developments of your code-base, which is important if you're trying to understand a bug in somebody else's function, for example.

Finally, the most basic but most important reason for the necessity of revision control is simply the imperfection of human memory. Chances are that you will work on multiple things at the same time and you will often forget what you were doing as you switch from one project to another. Much like good code commenting, keeping a good revision history with meaningful change messages helps you and your team keep track of your progress and makes it easier to switch between projects as necessary.

1.3 Revision Control in Science

As scientists, I believe we have an additional obligation to use revision control systems. Just as experimental scientists are expected to keep diligent research log books documenting their research process, academics using computers should be able to keep track of the development process of their programs. Because of the nature of software, doing so by hand would be very cumbersome. The information kept by a revision control system can be directly useful for keeping track of the methodology for the purpose of publications based on some computational work. Similarly, results in publications should be linked to the exact version of the software they were created with, another thing that a revision control system can help you with. This is essential in ensuring that our research remains reproducible and hence testable.
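
One possible way of linking a result to the exact software version is to write the commit hash into the output file. The following is only a minimal sketch under stated assumptions: the repository is a throwaway one created for the demonstration, and the names simulate.sh and result.dat as well as the user identity are hypothetical.

```shell
#!/bin/sh
# Sketch: stamp a result file with the commit that produced it.
set -e

repo=$(mktemp -d)                      # throwaway repository for this demo
cd "$repo"
git init -q
git config user.name  "Demo User"      # hypothetical identity, local to this repo
git config user.email "demo@example.org"

echo 'echo 42' > simulate.sh           # hypothetical stand-in for a simulation
git add simulate.sh
git commit -q -m "simulate: first working version"

commit=$(git rev-parse HEAD)           # full hash of the current commit

# Write the hash as a header line, followed by the actual output.
{
  echo "# produced by commit $commit"
  sh simulate.sh
} > result.dat

cat result.dat
```

Anyone holding result.dat can then check out the recorded commit and rerun the same code, provided all changes were committed before the run.
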
Finally, it allows for proper accountability of the work done, which can be very important judging by the recent "scandal" involving the climate research unit at the University of East Anglia. The history kept by the revision control system can rightly be considered your "Laboratory Notebook".

1.4 What is a Revision Control System?

Revision Control Systems (RCS) are also referred to as Version Control Systems (VCS) or Source Code Management (Systems) (SCM[S]) or Software Configuration Management (Systems). The basic idea is to provide some sort of system which, in addition to just keeping the files related to some software project, also records the development history. With this, it should allow you to move around in this history, either on a per-file basis as in CVS or on a project basis as in SVN or git. As mentioned before, supplementary useful information about the originator of a particular change could also be kept in addition to creation and modification times. Finally, one of the most important features of a revision control system is the preservation of change messages, in other words a "change log".

Writing meaningful, succinct and complete change messages is extremely helpful to you and other developers on the team. The change message "bug-fix" is useless, while:

    module M: changed function F to fix bug #4587, clear memory for
    temporary string before it is reused to prevent output from being
    garbled by stale information

tells you exactly what was done and why. It also links this particular change to a bug, which is useful if you're trying to figure out when a particular bug was fixed.

More advanced systems also understand the renaming of directory structures and files. They can also help you with undoing changes that turn out to be incorrect in retrospect. If you follow good practice in revision control, this "undo" functionality can be as simple as one command.
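
As an illustration of a descriptive change message and of the one-command "undo", here is a small sketch using git. It runs in a throwaway repository; the file name module.txt, its contents, the user identity and the third "experimental" commit are hypothetical, and the long change message is adapted from the example above.

```shell
#!/bin/sh
# Sketch: a meaningful change message, then undoing a bad commit.
set -e

repo=$(mktemp -d)                      # throwaway repository for this demo
cd "$repo"
git init -q
git config user.name  "Demo User"      # hypothetical identity, local to this repo
git config user.email "demo@example.org"

echo "version 1" > module.txt
git add module.txt
git commit -q -m "module M: add initial version"

# A change message that says what was done and why.
echo "version 2: temporary string cleared before reuse" > module.txt
git commit -q -a -m "module M: change function F to fix bug #4587

Clear the memory for the temporary string before it is reused, to
prevent the output from being garbled by stale information."

# A change that later turns out to be incorrect.
echo "version 3: broken experiment" > module.txt
git commit -q -a -m "module M: try a different string copy"

# The "undo": one command creates a new commit that reverses the last one.
git revert --no-edit HEAD

git log --oneline                      # history still shows every step
cat module.txt                         # contents are back to version 2
```

Note that the revert does not erase the faulty commit. It records a new commit that cancels it, so the history remains a complete log of what happened.
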
As we will see, versatile systems are adept at managing branches as well as their splitting and merging, thereby supporting you in the deployment of new features or bug fixes. This also aids in keeping a structured programming work-flow or at least a structured program history, as we will see later. Finally, these systems offer various features that help with release management and collaborative development.

1.5 Revision Control Features

1.5.1 Branches

The concept of branching will be central to a large part of this document and it is therefore important to describe it here. In basic revision control there will only ever be one copy of the source code that the developers work on. For the purpose of publishing releases, copies might be made at a given point in time and labelled somehow, say "version 1.0". Branches are a way of keeping track of multiple copies of a software project and you might think of them as virtual directories. They come in useful when multiple versions of your software are in use at the same time and you need to fix a bug in a few of them. Alternatively, you might keep a "stable" and an "unstable" version of your code, where only the latter has new features added. More advanced branching systems are possible. We will see that branches can be used to test an idea or as an organizational tool to write a fix for some bug.
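
The "stable" and "unstable" arrangement described above might look as follows in git. This is a sketch only, run in a throwaway repository; the file name program.txt, its contents and the user identity are hypothetical, and the branch names are simply taken from the text.

```shell
#!/bin/sh
# Sketch: a "stable" and an "unstable" branch of the same project.
set -e

repo=$(mktemp -d)                      # throwaway repository for this demo
cd "$repo"
git init -q
git config user.name  "Demo User"      # hypothetical identity, local to this repo
git config user.email "demo@example.org"

echo "feature A" > program.txt
git add program.txt
git commit -q -m "program: initial release"

git branch stable                      # "stable" stays at the released state
git checkout -q -b unstable            # new features are added on "unstable"

echo "feature B (experimental)" >> program.txt
git commit -q -a -m "program: add experimental feature B"

git checkout -q stable                 # switch back to the other "virtual directory"
cat program.txt                        # only feature A is present here
git branch                             # lists all branches, current one marked with *
```

Switching branches replaces the files in the working directory with the versions recorded on that branch, which is why the "virtual directory" picture is a useful one.
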