Current Concepts in Version Control Systems

Current Concepts in Version Control Systems Petr Baudiˇs∗ 2009-09-11 Abstract of thousands of developers on huge projects. This work will focus on the latter end of the We give the reader a comprehensive overview of scale, describing the modern approaches for ad- the state of the Version Control software engi- vanced version control with emphasis on soft- neering field, describing and analysing the con- ware development. We will compare various sys- cepts, architectural approaches and methods re- tems and techniques, but not judge their general searched and included in the currently widely suitability as the set of requirements for version used version control systems and propose some control can vary widely. We will not focus on possible future research directions. integration of version control in the source configuration management systems1 since SCM is an extremely wide area and many other papers Preface already cover SCM in general. [3][15][70] This work aims to fill the sore need for an This paper has been originally written in May introduction-level but comprehensive summary 2008 as an adaptation of my Bachelor thesis and of the current state of art in version control the- I have been meaning to get back to it ever since, ory and practice. We hope that it will help new to add in some missing information, especially developers entering the field to quickly under- on some of my research ideas, describing the stand all the concepts and ideas in use. Since the patch algebra more formally and detailing on author is (was) a Git developer, we focus some- the more interesting chapters of historical de- what on the Git version control system in our velopment in the field. examples, but we try to be comprehensive and Unfortunately, it is not likely that I will ever cover other systems fairly as much as is within get the time to polish the paper up to my full our power and knowledge. content; in the meantime, it is slowly getting obsolete and I believe it can have a good value We also describe for the first time some tech- even as it stands, especially as an introductory niques so far present only as source code in material for people getting into this area. Maybe various systems without any explicit documen- arXiv:1405.3496v1 [cs.SE] 14 May 2014 I have missed some of the latest developments; tation and present some of our original work. I ask the reader for tolerance of the omissions, One problem in the field of version control is and my fellow researches and authors for fol- that since most innovation and research hap- lowups on this work that would fill in the gaps pens not in the academia or even classical in- and explain ongoing new ideas in the field. dustry, but the open community, very few of the | Petr Baudiˇs results get published in the form of a scientific paper or technical report and many of the concepts are known only from mailing list posts, 1 Introduction random writeups scattered around the web or merely source code of the free software tools. Version control plays an important role in most Thus, another aim of this work is tie up all these of the creative processes involving a computer. sources and provide an anchor point for future Ranging from simple undo/redo capabilities of researchers. most office tools to complicated version control supporting branching and diverse collaboration 1Source Configuration Management covers areas like ∗The work on this paper was in part sponsored by version control, build management, bug tracking or de- Novell (SUSE Labs). velopment process management. 1 2 BASIC DESIGN 2 2 Basic Design full version tracking capabilities and not only is all the history usually copied over (pulled), The basic task of version control system is to but it is also possible to commit new changes preserve history of a collection of files (project) locally; later, they can be sent (pushed) to an- so that the user can compare two historical ver- other repository in order to be shared with other sions of the content easily, return to an older developers and users. This approach effectively version or review the list of changes between two makes every copy a stand-alone repository with versions. stand-alone branches; nevertheless, most systems make it easy to emulate the centralized 2.1 Development Process Model version control model as a special-case at least for the read-only users. We will focus on version control systems that are based on the distributed development paradigm. 2.2 Snapshot-Oriented History While the concept dates many years back, distributed development has only recently seen ma- There are two possible approaches to modeling jor adoption and so far mostly only in the area the history. The first-class object is usually the 4 of open source development. particular version as a state (snapshot) of the The classical approach to version control is to content at a given moment, while the changes have a single central server2 storing the project are deduced only by comparing two versions. history; the client can have the current version However there is also a competing approach cached locally, and possibly also (parts of) the which focuses on the changes instead: in that history, but any operation involving change in case, the important thing when marking new the history (e.g. committing new version) re- version is not the new state of the content, but quires cooperation with the central server. the difference between the old and new state; a particular version is then regarded as a cer- This centralized development approach has 5 two major shortcomings: first, the developer tain combination of changes. has to have network access during development, which is problematic in the era of mobile com- 2.2.1 Branches puting | i.e., when developing during travel or With regard to the snapshot-oriented history fixing problems at the customer site. Second, the model, it may suffice in the trivial case to merely central repository access implies some inherent represent the succeeding versions as a list with bureaucracy overhead | the developer has to the versions ordered chronologically. But over gain appropriate commit permissions before be- time it may become feasible to work on several ing able to version-control any of his work, de- versions of the content in parallel, while also velopment of third-party patches is difficult and recording further modifications using the ver- since the repository is shared, the developer is sion control system; the development has effec- usually restricted in experimentation and creat- tively been split to several variants (branches). ing short-lived branches, often resorting to using Handling branches is an obvious feature of a some kind of manual version control during reg- version control system as the necessity to branch ular workflow and using the VCS only on official the development is frequent for many projects occassions. targetted at wider audience. Commonly, aside On the other hand, since all the develop- of the main development branch another branch ment is tracked at a central place, this model is kept based on the latest public release of the may pose advantages to corporate shops because project and it is dedicated to bugfixes for this work and performance of all developers can be release. 3 tracked easily. Moreover, branches have many other uses as In the distributed development model, a copy well | it may be feasible for individual develop- of the project created by version control has ers or teams to store work in progress yet unsuit- 2Possibly with several read-only mirrors for replica- 4We use the terms version, revision and commit inter- tion, as in the BSD ports setup. changeably in this paper based on the context, according 3In case of in-company development using distributed to usual practice. version control, this can be alleviated by setting appro- 5Note that the conceptual model may have no relation priate policy for the developers to push their work to to actual storage, which can well use deltas even within designated central repositories in regular intervals. snapshot-oriented systems. 2 BASIC DESIGN 3 able for the main development branch in sepa- systems and can result in trouble when merging rate branches, or even for a developer to save the branches later, both in duplicated changes incomplete modifications during short-time re- stored in the history and increased merge con- assignment to a different task or when moving flicts rate.7 to another machine. The prerequisite for this is that creating a separate branch must have only minimal overhead, both technical (time 2.2.2 Data Structures and space required to create a new branch) and social (it must be possible to easily remove Given that the development may fork at some the branch too; also, the unimportant branches point in history, we cannot use simple lists any- should not cloud the view of the important ma- more to adequately represent this event but we jor ones). have to model the history as a directed tree The ability to properly record and manipu- instead, with succeeding versions connected by late branches nevertheless becomes truly vital edges; the node representing a fork point (not as the development gets distributed, since in ef- only one, but several branches grow from the fect every clone of the project becomes a sep- node) has multiple children. arate branch. If the system aims to fully and comfortably support the distributed develop- However, when we take merging into con- ment model, it must make sure that there are sideration, it becomes necessary to also repre- no hurdles for creating many new branches; fur- sent the past merges between branches if we are thermore, there can be no central registry of to properly support repeated merges and pre- branches, thus the branch-related data struc- serve detailed history of both branches across tures must allow for branches to grow from dif- the merge.

Current Concepts in Version Control Systems

Glassfish™ Community Lighting Talks

Mkusb Quick Start Manual

Ubuntu Kung Fu

Generating Commit Messages from Git Diffs

Introduction to Version Control with Git

Efficient Algorithms for Comparing, Storing, and Sharing

Distributed Configuration Management: Mercurial CSCI 5828 Spring 2012 Mark Grebe Configuration Management

Sistemas De Control De Versiones De Última Generación (DCA)

Higher Inductive Types (Hits) Are a New Type Former!

Brno University of Technology Bulk Operation

Version Control – Agile Workflow with Git/Github

IBM Rational Team Concert V4.0.3 Keeps Development Teams Focused