DVCS Or a New Way to Use Version Control Systems for Freebsd
Total Page:16
File Type:pdf, Size:1020Kb
DVCS or a new way to use Version Control Systems for FreeBSD Ollivier ROBERT <[email protected]> 11th May 2006 Abstract FreeBSD, like many open source projects, uses CVS as its main version control system (VCS), which is an extended history of all modifications made since the beginning of the project in 1993. CVS is a cornerstone of FreeBSD in two ways: not only does it record the history of the project, but it is a fundamental tool for coordinating the development of the FreeBSD operating system. CVS is built around the concept of centralised repository, which has a number of limitations. Recently, a new type of VCS has arisen: Distributed VCS, one of the first being BK from Bit- Mover, Inc. Better known from the controversy it generated when Linus Torvalds started using it, it has nonetheless changed the way some people develop software. This paper explores the area of distributed VCS. We analyse two of them (Arch in its Bazaar[1] incarnation and Mercurial[2] and try to show how such a tool could help further FreeBSD devel- opment, both as a tool and as a new development process. 1 Introduction 2 A brief history of VCS 2.1 Ancestors FreeBSD[3] has been using CVS as its main ver- sion control system (VCS) tool for as long as it ex- At the beginning, the main tools used to manage ists and has now more than 10 years of history. A different versions of software were pretty prim- few years ago, limitations inherent in CVS design itive but it was enough for most people during became too much to workaround and the project early stages of development and large pieces of begun using Perforce[4] for projects that needed software were developed using SCCS or later, to change fast without “polluting” the main tree. RCS. Both SCCS & RCS uses the same basic princi- ples to handle changes with a special directory It works well but using two VCS instead of one in the working area, storing the files and the dif- is making merges harder than it needs to be and ferents revisions in a special format with a fun- CVS limitations have become too much even for damental difference between the two: SCCS uses the main tree itself. The separation of the reposi- a special format called "weave" (see [5]) whereas tory into 4 different ones helped but it is still com- RCS stores separate deltas: always storing the lat- plicated. est revision and storing differences down to the first one. After a bit of history, we will explore how we RCS assumes you will want a fast access to re- could solve this problem. visions close to what we will call the HEAD, the latest checked-in revision in the main line of de- velopment, the main inconvenience being that as 2.2 The Golden Age of CVS you add branches and tags, the backend storage In 1986, Dick Grune created CVS with two of his file gets more complicated and the system will students as a set of shell scripts over RCS in or- gradually slow down. der to be able to work on pieces of a compiler AT&T and CSRG at Berkeley both used SCCS independently[7]. His work was then rewritten to manage whole versions of UNIXTM and one in C by Brian Berliner and a paper published in can find in many files the marker for SCCS1. 1990 at USENIX Winter Technical Conference[8]. The what(1) command is used to "reveal" SCCS At first, CVS didn’t have remote operations and strings in binaries. The author even used to em- thus, to work on a given CVS tree, you would bed the SCCS marker inside RCS $Id$ strings have to be logged on a machine with "physical" to be able to use either what(1) or ident(1) access to the tree (of course it could be through from RCS on such binaries. NFS2). Both SCCS and RCS use a "locking" model The main advantages of CVS apart from being where checkout means locking a file before being free – a very useful feature in itself of course – able to modify it. This model essentially works at that time were: because it assumes that a given file will be mod- ifed by only one person at some point in time. • A central repository instead of a collection It is pretty easy to see that it doesn’t scale espe- of small trees each with its own unsharable cially with teams in different locations, as neither RCS directory of them support remote operations. That means • A "no-locking" mode of operation, allowing that sharing a tree can only be done using NFS or concurrent access to the repository through an equivalent sharing file system. extraction (also known as checkout) of a What is actually interesting is that among the cur- subset of the tree in sandboxes – a pri- rently available VCS (distributed or not), most of vate workarea from which each developer them use SCCS or RCS as the base for its design, can commit his work regularely and merge either by copying the User Interface (UI) – SVN his/her modifications with others. for example – or by extending it in different ways: • A central repository enables the use of cus- • Perforce uses the RCS file format with DB tom scripts (for sending commit logs by files for metadata mail), access control lists and triggers. • BK is more or less a rewrite of SCCS For all these reasons and the fact that none others to allow cloning of repositories/branches existed, CVS became the VCS of choice for many (See the announce posted on the projects, both proprietary3 and free ones like all linux-kernel list [6]). the BSDs4. Soon, most if not all FOSS5 began using CVS with There is another area where these venerable VCS the notable exception of Linux, as the Linus Tor- don’t work efficiently: binary file support. They valds disliked CVS so much that he refused to are fundamentaly designed to cope with text files use an imperfect product6. such as source code; binary files would be stored as "text" with all the potential loss of information Then CVS gained remote operation support and any command that tries to display or merge (through either RSH, SSH or its own pserver would generate gibberish on the screen. mode) and its adoption by SourceForge[9] and similar projects repositories really pushed CVS in 1The marker is @(#) and the RCS equivalent is $Header$ 2NFS: Network File System 3The author knows for a fact that the French telco company Alcatel has built its own VCS on top of CVS. 4All 4 of them, not counting TrustedBSD: FreeBSD, NetBSD, OpenBSD and more recently DragonflyBSD 5FOSS: Free and Open Source Software 6How he managed to not use a VCS for so long (1991 - 2000) keeps on baffling the author of the paper. the open. The various bits of documentation • Branches are cumbersome to use, especially available first on FTP sites then on the WWW, the when you want to merge bits of this and books (see [10], [11] and [12]) and the simple but bits of that from another branch. As the effective UI of CVS also helped a lot to lower the main entity versioned is the file and CVS difficulty of using a VCS and thousands of peo- has no memory of branching, you have to ple began using it. keep somewhere the exact revision for each file to be merged or use tags and/or dates to get this information. 2.3 CVS flaws • Third-party code integration and main- 9 Nowadays, many developers confess to stum- tainance is very cumbersome as it is done bling on one or several misfeatures or design through a special branch called the vendor problems with CVS. Its flaws are now very well- branch. known: The last two points are critical for projects such as FreeBSD because we have to maintain parallel • Commits are not atomic (i.e. there is no con- branches for STABLE/CURRENT developement 7 cept of a changeset ), the granularity being and releases (and security branches). The weak the directory: in a single directory, you get support for branches is also something that slows consistency between the various impacted down development in general. It makes work- files through a lock but if a given com- ing on specific sub-projects or features more com- mit spans multiple directories, you are on plicated. That’s why a few years ago, FreeBSD your own: that’s why the various conver- made the choice of using a different VCS for such 8 tion tools like Tailor[13] or cscvs have cases: Perforce (see below). some difficulties finding all files in a given Offline work is possible through CVSup (see be- changeset. low) but it is still very limited. • You can not rename files and directories. The only way to achieve that is either by 2.4 Enter Subversion delete+add – which is very wrong and loses history – or manually edit the repository to To remove all these limitations, SVN was born. copy or move the corresponding backend SVN is often cited as the natural successor to CVS file. and looking at it, it is easy to see why: • CVS has no fine grained access control fa- • It is written by former CVS developers cilities and needs to rely on filesystem per- • It is touted as CVS done right missions for many things although heavy users such as the FreeBSD project have de- • Resembling CVS as much as possible is one veloped customs wrappers and scripts to of its primary goals (you can alias the cvs add per-branch ACL, review workflow and command to svn and get running in no more.