
Revision Control

Tomáš Kalibera, (Peter Libič)
Department of Distributed and Dependable Systems
http://d3s.mff.cuni.cz
CHARLES UNIVERSITY PRAGUE
Faculty of Mathematics and Physics

Problems solved by revision control

What is it good for?

Keeping history of system evolution

• What a “system” can be
  . Source code (single file, source tree)
  . Textual document
  . In general, anything that can evolve, i.e. can have versions
• Why?
  . Safer experimentation: easy reverting to an older version
• Additional benefits
  . Tracking progress (how many lines did I add yesterday?)
  . Incremental processing (distributing patches, …)

Allowing concurrent work on a system

• Why concurrent work?
  . The size and complexity of current systems (source code) require team work
• How can concurrent work be organized?
  1. Independent modifications of (distinct) system parts
  2. Resolving conflicting modifications
  3. Checking that the whole system works
• Additional benefits
  . Evaluating productivity of team members

Additional benefits of code revision control

• How revision control helps
  . Code is isolated in one place (no generated files)
  . Notifications when a new code version is available
• Potential applications that benefit
  . Automated testing
    • Compile errors, functional errors, performance regressions
  . Automated building
  . Backup
    • Being in one place, the source is isolated from unneeded generated files
  . Code browsing
    • Web interface with hyperlinked code

Typical architecture

[Figure: a working copy and a source code repository (versioned sources), connected by synchronization]

Basic operations

• Check-out
  . Create a working copy of the repository content
• Update
  . Update the working copy from the repository (to the latest version or to a historical one)
• Check-in
  . Propagate working copy changes back to the repository
• Diff
  . Show differences between two versions of the source code

Simplified usage scenario
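A minimal sketch of the four basic operations (check-out, update, check-in, diff) and of the check-out, modify, check-in cycle. It assumes a toy in-memory repository that versions a single file; the class `ToyRepository` and its method names are hypothetical and do not correspond to the API of any real revision control system. The diff itself is delegated to Python's standard `difflib` module.

```python
import difflib


class ToyRepository:
    """Hypothetical in-memory repository versioning one text file."""

    def __init__(self, initial_lines):
        # Revisions are numbered from 1; revision n is stored at index n - 1.
        self._revisions = [list(initial_lines)]

    def latest(self):
        return len(self._revisions)

    def check_out(self, revision=None):
        """Return (working copy, revision); defaults to the latest revision."""
        if revision is None:
            revision = self.latest()
        return list(self._revisions[revision - 1]), revision

    def update(self, revision=None):
        """In this toy model, updating just re-reads the requested revision
        (local modifications are not preserved or merged)."""
        return self.check_out(revision)

    def check_in(self, working_copy):
        """Store the working copy as a new revision and return its number."""
        self._revisions.append(list(working_copy))
        return self.latest()

    def diff(self, old, new):
        """Return a unified diff (list of lines) between two revisions."""
        return list(difflib.unified_diff(
            self._revisions[old - 1], self._revisions[new - 1],
            fromfile=f"r{old}", tofile=f"r{new}", lineterm=""))


# Usage: (1) check out, (2) modify, (3) check in, then inspect the change.
repo = ToyRepository(["hello"])
copy, base = repo.check_out()
copy.append("world")
new_revision = repo.check_in(copy)
print("\n".join(repo.diff(base, new_revision)))
```

Real systems store revisions incrementally and track metadata in the working copy; the sketch keeps full copies only to stay short.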

[Figure: (1) check-out or update from the source code repository into the working copy; (2) modify & test; (3) check-in]

Exporting/importing source trees

• Import
  . Importing a whole (so far non-versioned) source tree into a repository
• Export
  . Exporting a whole source tree from a repository
  . Similar to check-out, but the exported tree does not contain the metadata needed to check changes back in

Branching and merging

[Figure: main development and parallel development, connected by branching and merging; axis: time & software versions]

Why parallel development of different versions

• Experimental versions
  . Isolate intrusive and potentially dangerous experimental development
  . (1) Branch from the latest version
  . (2) Merge back when stable
• Bugfixes
  . Fix a bug in an older version (already released and considered stable)
  . (1) Branch from an older version
  . (2) Merge to main development only if the bug is still there

Branching details
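As a rough illustration of the two scenarios for parallel development (an experimental branch rooted at the latest version, a bugfix branch rooted at an older release), branches can be modelled as names pointing into a chain of revisions. The classes `Revision` and `BranchingRepo` and the revision labels are hypothetical, chosen only for this sketch; it assumes that creating a branch touches only the repository and needs no working copy.

```python
class Revision:
    """One stored version; `parent` links it to the version it was derived from."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent


class BranchingRepo:
    """Hypothetical repository where a branch is a name for its newest revision."""

    def __init__(self):
        self.branches = {"main": Revision("r1")}

    def commit(self, branch, label):
        """Add a new revision on top of the given branch."""
        revision = Revision(label, parent=self.branches[branch])
        self.branches[branch] = revision
        return revision

    def branch(self, name, root):
        """Create a branch rooted at an existing revision (no files are copied)."""
        self.branches[name] = root

    def history(self, branch):
        """Labels of all revisions on the branch, oldest first."""
        labels = []
        revision = self.branches[branch]
        while revision is not None:
            labels.append(revision.label)
            revision = revision.parent
        return labels[::-1]


repo = BranchingRepo()
release = repo.commit("main", "r2")        # pretend r2 was released
repo.commit("main", "r3")

repo.branch("experimental", repo.branches["main"])  # root: latest version
repo.branch("bugfix", release)                      # root: older version
repo.commit("bugfix", "r2-fix")

print(repo.history("main"))
print(repo.history("bugfix"))
```

Merging a branch back is not shown here; it requires comparing file contents rather than just moving names.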

• Branch root
  . A given version (of a given branch; branches can themselves be branched)
  . Typically the latest version of a given branch (e.g. the main development branch)
• May not need a working copy
  . In principle, branching is an operation performed only on the repository
• Naming
  . Branches typically have symbolic, human-readable names
  . They may also be numbered, creating a hierarchical structure of versions

Merging details (1)

• Merge objective
  . Merge changes made in two branches of development into one branch
  . In detail: apply changes made in a branch to the current version of the original branch

[Figure: main development and parallel development, connected by branching and merging]

Diff operation (2-way)

• Objective
  . Compare two text (source code) files
  . Produce a concise list of differences
• Input
  . Two text files; the 2nd is compared against the 1st
• Output (line-based diff)
  . List of added lines, deleted lines, and (sometimes) modified lines
  . Exactly what counts as a modification, rather than a deletion plus an addition, depends on the algorithm

Diff operation (2-way) – Example

Diff operation (2-way) – LCS based algorithm

1. View files as lists of lines
2. Find the longest common subsequence (LCS): a not necessarily contiguous sequence of lines that are present, in the same order, in both files
3. Match lines from the common subsequence
4. Mark the blocks of remaining lines as
  . Added – iff all its lines are only in the 2nd file
  . Deleted – iff all its lines are only in the 1st file
  . Modified – otherwise

Diff (2-way) applications
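The four steps of the LCS-based 2-way diff can be sketched as follows. This is an illustrative, quadratic-time version using the textbook dynamic-programming table; production diff tools use faster algorithms. The function names `lcs_table` and `diff_blocks` and the block representation are assumptions made for this sketch.

```python
def lcs_table(a, b):
    """table[i][j] = length of the LCS of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def diff_blocks(a, b):
    """Return blocks (kind, lines_from_a, lines_from_b), where kind is
    "equal", "added", "deleted" or "modified"."""
    table = lcs_table(a, b)
    blocks = []
    old, new = [], []  # lines outside the LCS seen since the last match

    def flush():
        # Classify the pending block of unmatched lines (step 4).
        if old and new:
            blocks.append(("modified", old[:], new[:]))
        elif old:
            blocks.append(("deleted", old[:], []))
        elif new:
            blocks.append(("added", [], new[:]))
        old.clear()
        new.clear()

    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:                        # line belongs to the LCS (step 3)
            flush()
            if blocks and blocks[-1][0] == "equal":
                blocks[-1][1].append(a[i])
                blocks[-1][2].append(b[j])
            else:
                blocks.append(("equal", [a[i]], [b[j]]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            old.append(a[i])                    # line only in the 1st file
            i += 1
        else:
            new.append(b[j])                    # line only in the 2nd file
            j += 1
    old.extend(a[i:])
    new.extend(b[j:])
    flush()
    return blocks


first = ["a", "b", "c", "d"]
second = ["a", "x", "c", "d", "e"]
for kind, old_lines, new_lines in diff_blocks(first, second):
    print(kind, old_lines, new_lines)
```

Here a run of unmatched lines containing lines from both files is reported as one "modified" block, which is one possible reading of "otherwise" in step 4.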

• Incremental data representation
  . Backup (binary files use a different algorithm)
  . Internal storage format in a revision control system
  . Distribution of sources in the form of patches
• Visual help for programmers
  . Graphical representation of what has been changed

Back to merging – 3-way merge operation

[Figure: common ancestor O and two modified versions, A and B]

1. Two-way diff of A and O (finding LCS suffices)
2. Two-way diff of B and O (finding LCS suffices)
3. Resulting blocks of lines are
  • Left alone – iff equal in all versions O, A, B
  • Automatically changed – iff changed in only one of A and B
  • Left for manual fix – otherwise (CONFLICT)

Merging summary
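The 3-way merge of A and B against their common ancestor O can be sketched as below. As an assumption of this sketch, the two 2-way diffs are computed with `difflib.SequenceMatcher` from the Python standard library, which finds matching lines heuristically and is not guaranteed to return a true LCS; any valid matching is enough for the merge logic. The function names and the conflict markers are illustrative. Blocks changed identically in A and B are also accepted automatically, a small extension of the three rules.

```python
import difflib


def matched_lines(base, other):
    """Map index in `base` -> index in `other` for lines the 2-way diff keeps."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def merge3(o, a, b):
    """Merge versions a and b of ancestor o; return (lines, conflict_count)."""
    in_a, in_b = matched_lines(o, a), matched_lines(o, b)
    # Sync points: ancestor lines that survive unchanged in both A and B.
    sync = [(i, in_a[i], in_b[i]) for i in range(len(o))
            if i in in_a and i in in_b]
    sync.append((len(o), len(a), len(b)))  # sentinel closing the last block

    merged, conflicts = [], 0
    po = pa = pb = 0  # start of the current block in O, A and B
    for so, sa, sb in sync:
        block_o, block_a, block_b = o[po:so], a[pa:sa], b[pb:sb]
        if block_a == block_b:        # untouched, or same change on both sides
            merged += block_a
        elif block_a == block_o:      # changed only in B
            merged += block_b
        elif block_b == block_o:      # changed only in A
            merged += block_a
        else:                         # changed differently in both: CONFLICT
            conflicts += 1
            merged += ["<<<<<<< A", *block_a, "=======", *block_b, ">>>>>>> B"]
        if so < len(o):
            merged.append(o[so])      # the sync line itself is left alone
        po, pa, pb = so + 1, sa + 1, sb + 1
    return merged, conflicts


ancestor = ["one", "two", "three", "four"]
version_a = ["one", "TWO", "three", "four"]
version_b = ["one", "two", "three", "four", "five"]
print(merge3(ancestor, version_a, version_b))
```

Conflicting blocks are emitted with both alternatives between markers so that they can be fixed by hand, as the last rule requires.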

• Merge objective
  . Synchronize changes made in parallel
• Typical merge usage
  1. Invoke merge explicitly (merging branches) or implicitly (checking in when there are concurrent changes)
  2. Resolve conflicts: some have to be resolved manually, all should be checked
  3. Check in the merged sources

Synchronizing the developers

• Why in a revision control system?
  . Software is developed in teams that may not always be able to communicate directly
• Lock-Modify-Unlock solution
  . Mandatory locking prior to modification
• Copy-Modify-Merge
  . Concurrent modification, resolving conflicts if they happen

Lock-Modify-Unlock model

• Pros
  . Conflicts cannot happen (safety), though this depends on locking granularity
  . Simpler allocation of responsibility
• Cons
  . The sense of safety can be false if the locking granularity is too fine
  . Unnecessary serialization, especially when the locking granularity is too coarse
  . People forget to unlock

Copy-Modify-Merge model

• In line with merging of branches
• Typical advice: merge often
  . Lower chance of conflicts
  . Smaller conflicts are easier to resolve
  . More steps of conflict resolution are under version control (and can be reviewed)
  . In line with “check in often”: checking in often increases the chance that someone else will have to do the merge…
• “Merge often” is commonly applied to branches as well

Combining the models (Locking and Merging)
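To contrast the two synchronization models, here is a minimal sketch of a single versioned file that supports both. Under Lock-Modify-Unlock, a check-in is refused while someone else holds the lock; under Copy-Modify-Merge, a check-in is refused when the working copy is based on an outdated revision, which forces the user to update and merge first. The class `VersionedFile`, the exceptions and the user names are all hypothetical; locks are optional here, which is an assumption of this sketch rather than a property of any particular system.

```python
class Locked(Exception):
    """Another user holds the lock on the file."""


class OutOfDate(Exception):
    """The repository has moved on since the working copy was taken."""


class VersionedFile:
    """Hypothetical versioned file with optional locks and revision checks."""

    def __init__(self, content=""):
        self.revisions = [content]
        self.lock_owner = None

    def lock(self, user):
        if self.lock_owner not in (None, user):
            raise Locked(self.lock_owner)
        self.lock_owner = user

    def unlock(self, user):
        if self.lock_owner == user:
            self.lock_owner = None

    def check_out(self):
        """Return (revision number, content) of the latest revision."""
        return len(self.revisions), self.revisions[-1]

    def check_in(self, user, base_revision, content):
        if self.lock_owner not in (None, user):
            raise Locked(self.lock_owner)            # Lock-Modify-Unlock
        if base_revision != len(self.revisions):
            raise OutOfDate(base_revision)           # Copy-Modify-Merge
        self.revisions.append(content)
        return len(self.revisions)


# Alice and Bob copy the same revision and modify it concurrently.
shared = VersionedFile("v1")
alice_base, _ = shared.check_out()
bob_base, _ = shared.check_out()
shared.check_in("alice", alice_base, "v1 + alice")
try:
    shared.check_in("bob", bob_base, "v1 + bob")
except OutOfDate:
    # Bob has to update, merge Alice's change into his copy, and retry.
    bob_base, latest = shared.check_out()
    shared.check_in("bob", bob_base, latest + " + bob")
print(shared.revisions)
```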

• A system can support both, allowing users to lock files but not forcing them to
• Locking may not be that strict
  . Users can break locks (without needing to contact the administrator)
• Watches
  . Users can register to receive notifications when a particular file is used by someone else (locked, modified, …, depending on the system)

Existing versioning systems (selected)

• SCCS (Source Code Control System)
  . Marc Rochkind, Bell Labs, 1972
  . Works on single files, now obsolete
• RCS (Revision Control System)
  . Walter Tichy, Purdue University, 1980, GNU
  . Works on single files
  . Simple installation (repository files are typically kept next to the working copy)
  . The RCS repository file format for storing multiple revisions of a text file is still used (CVS, TWiki)

• CVS (Concurrent Versions System)
  . Dick Grune, 1986, Brian Berliner, 1989, free
  . Uses RCS, allows operation on multiple files
  . Many big projects were using CVS, but the trend is moving towards newer systems
    • e.g. Mozilla -> , GIMP ->
• SVN (Subversion)
  . ~2000, CollabNet team, free; now an Apache project
  . Intended as a CVS replacement
  . Easier to use than CVS, support for binary files, versioning of directories
  . Currently used by many open-source projects (e.g. Apache, GCC), but many are moving towards distributed systems (e.g. KDE -> git)

• ClearCase
  . Atria Software, 1992, now IBM, commercial
  . Special versioned filesystem, views
  . Excessively complex to use
  . Used by the Arpege/Aladin numerical weather forecast model

• Mercurial
• Bazaar
• Git
  . Linus Torvalds, 2005
  . Distributed repository
  . Used by the Linux kernel