Distributed Version Control Systems

Distributed Version Control Systems Dipl. Medieninformatiker (BA) Daniel Kuhn Master degree course Computer Science and Media Hochschule der Medien Nobelstrasse 10, 70569 Stuttgart e-mail: [email protected] July 2010 Abstract integrity checks. Both SCCS and RCS are Version Control Systems that use a locking mechanism to lock files for other Traditional centralized Version Control Systems (VCS) as users when they are edited. e.g. CVS and SVN, which were used since the 1980’s in al- 1989 the more familiar Concurrent Versions System most every company, are facing competition by a new ap- (CVS) was developed as an improvement of RCS. CVS proach to source code management. Distributed Version showed many additions like the support for multiple files Control Systems like Bazaar, Darcs, Mecurial and Git are and the support for multiple users. But the major advantage changing the way repositories are stored and distributed by of CVS was support for merging concurrently changed files representing a new and more contemporary mindset. Today instead of using a locking mechanism. This enabled devel- Distributed Version Control Systems have therefore become opers to work collaboratively on their projects. But CVS very popular in the open-source community with many de- had some restrictions concerning the handling of directo- velopers using them. ries and binary-files. With the growth of the internet, users This paper aims to show the requirements, advantages of CVS criticized the lack of supporting a native remote ac- and disadvantages of Distributed Version Control Systems cess in CVS and demanded for a solution. So in 1994 a and discusses the differences to Centralized Version Control group of developers implemented an extension to use CVS Systems us the software Git as an example. The document over TCP, which was a large improvement to CVS, which shows and explains the functionality of the system and takes could now be used easier over the internet. a look under the hood by not focusing on the use of the tool, Later in the year 2000 a company called CollabNet de- but by rather analyzing how the tool handles its tasks. cided to develop a replacement system for CVS. So Ben Collins-Sussman, Brian Behlendorf and Jason Robbins of 1. Introduction CollabNet started the Subversion (SVN) Project. The subversion CVS is still being used today and considered an in- Since the 1970’s Version Control Systems have been used dustry standard. Subversion is an Open-Source Software to archive and manage different versions of files, especially in accordance with the Debian Free Software Guidelines source code. Version Control started in 1972 with the re- (DFSG)1. Subversion brought many advantages to develop- lease of the first system called Source Code Control Sys- ers like better file-handling, including a directory structure, tem (SCCS). SCCS was developed to manage source code, which lead to more focus to the project instead of only files. configuration-files or documentations and focused on man- SVN has real atom commits ensuring consistency, better aging single files and single users. SCCS has brought fea- branching and tagging and an abstract repository interface tures like: letting developers keep track of the history of a for an easy development of different access methods. [5] file, using checksums to detect data corruption, using a for- As of today, Centralized Version Control Systems - es- ward delta system and providing support for binary files [3] pecially Subversion - are the top-dogs in companies. With [4]. Bitkeeper, developed in 1997, there was another approach Another early version control system is the Revision to source control management: Distributed Version Con- Control System (RCS), developed in the 1980’s. Just like trol Systems. These systems have become more and more SCCS, RCS is a file-based, single-user oriented version popular with the development of Git, Mercurial, Bazaar and control software that simply stores different versions of files Darcs. in the file-system. But unlike SCCS, RCS keeps no track of the history of a file and does not provide checksums for 1http://www.debian.org/social_contract#guidelines 1 Git is a Distributed Version Control System (DVCS) in tween nodes over distances. When it comes to data storage development since 2005 as an open-source project started and delivery one has to think about general questions like by Linus Torvalds. The Git project was started with the in- 1. Security – is my data safe? tention of replacing the proprietary Bitkeeper DVCS used to maintain the linux kernel and free to use for open-source 2. Performance – can I deliver the data quickly? projects. After the Bitkeeper developers decided to change their licensing policy Linus Torvalds project leader of the 3. Reliability and Availibility – can the data be accessed linux kernel, started looking for an appropriate replacement, 24/7 without downtime? which he ultimatly couldn’t find. Therefore Torvalds de- 4. Consistency – is my system state consistent over every cided to design the Git DVCS to meet the needs of the ker- node? nel and its developers. 5. Maintenance – can I maintain my infrastructure easily? 2. Polymorphism of Version Control These five points are the general requirements of a ver- Systems sion control system. However some of these requirements are orthogonal to each other. A system which is performing One of the easiest “versioning systems” is local and man- well, highly secure, highly reliable, easy to maintain and ual versioning. Everybody has used it, maybe without even always has a consistent system state is hardly possible to thinking about it as a kind of version control. Local or design without spending a tremendous amount of money. manual versioning means to make a copy of a document Fact is, if you use a centralized system, it is easy to and save it under a different name (e.g. myImportantLet- backup the data and maintain the servers and storage sys- ter_20100202). This version control system does neither tems. But when it comes to scalability and distribution, cen- perform well, nor is it efficient or safe in any way. But it is tralized systems will not perform well – which can also lead easy to use and understand and may fit personal versioning. to general failures. On top of that there is always the risk of However when it comes to a professional level a better ver- a single point of failure. sion control system is needed. So it becomes quite obvious The same questions arise when comparing CVCS with that version control can be done in many ways and should DVCS. A centralized system will always be easier to main- always fit the use cases by fulfilling its requirements. tain. Servers are mostly in one place and contain many One of the common purposes of Version Control Sys- repositories for multiple projects. This makes it easy to tems is to enable groups of people to work together on (any backup the whole data in comparison to a distributed sys- kind of) files. They can change, archive, merge, branch or tem. DVCS are taking another approach on the backup as- synchronize files of which the system keeps a history. An- pect and consistency. The idea is that if you clone a reposi- other purpose is archiving and backupping. Content in the tory to your local drive, it contains all the history (including system should possibly be safe and free of data-loss. Com- branches and merges) from that one developer who’s repos- bining the two aspects of collaboration and security leads to itory was cloned. But you never see the code of all devel- Centralized Version Control Systems (CVCS). Traditional opers working on the same project. A developer can imple- systems like CVS and more common SVN are centralized, ment a feature and say “hey I’ve got the feature ready, pull containing only one main repository per project usable by the code if you like” – and if a developer likes to pull that every member of the group. CVCS are used in most of feature, he does and merges the code right into his reposi- the companies and therefore have reached their maturity by tory. So there is not just a single development version, like being considered an industry-standard. Aside from CVCS one might be used from CVCS, but rather multiple versions there is another approach to achieve version control in a of the code. The result being nobody knows the whole sys- slightly different way. Distributed Version Control Sys- tem state. This approach to development involves a change tems like Bazaar, Darcs, Mercurial and Git use decentral- of the mindset used in the traditional centralized systems, ized repositories for versioning data which is spread among because Distributed Version Control Systems focus not on users. backup, but rather on easy versioning and the sharing of fea- tures to a program instead of having only the one version. 2.1 A quick comparison between centralized Linus Torvalds (in Google tech talk [7]) sees the fact that and decentralized VCS everybody pulls from everybody as an advantage, because your code is distributed and backuped by other developers First I’d like to give a more general view of centralization and you don’t have to care about it. But I disagree. This may and decentralization. Comparing the systems most of the work for famous developers like Mr. Torvalds, but perhaps time leads to the same problems. Not only for Version Con- not for the “nooby-open-source-project-contributer”. But trol Systems, but for almost every system sharing data be- even if everybody pulls your code, this involves only the 2 code of a feature that you released and not your code you therefore the need to combine and motivate more develop- are working on in your development branch (which is only ers to contribute to their project.

Distributed Version Control Systems

Efficient Algorithms for Comparing, Storing, and Sharing

Higher Inductive Types (Hits) Are a New Type Former!

DVCS Or a New Way to Use Version Control Systems for Freebsd

Darcs 2.0.0 (2.0.0 (+ 75 Patches)) Darcs

Distributed Versioning for Everyone

Version Control

Continuous Integration for Decentralised Development of Service-Based Software Systems

David's Advanced Revision Control System

Esa Study Contract Report

Version Control with Git Why Using Version Control?

Bill Laboon Friendly Introduction Version Control: a Brief History

Anatomy of the Deployment Pipeline