An Essential Tool for Collaborative Development

Philipp Haller

Associate Professor, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology

DD2480 Software Engineering Fundamentals
17 January 2020

About Myself

• 2006 Dipl.-Inform., Karlsruhe Institute of Technology (KIT), Germany
• 2010 Ph.D. in Computer Science, Swiss Federal Institute of Technology Lausanne (EPFL)
• Open source activities:
  • Member of Scala language team 2006-2014
  • Creator of Scala’s first widely-used actor library
  • Co-author of futures and promises in Scala
  • Co-author of Scala’s async/await extension
• Held positions at Typesafe, Inc., Stanford University, and EPFL

• Since Dec 2014 at KTH

A Large Open-Source Project

• Example: Scala programming language
• Many {users, contributors, commits, releases, …}

Challenges of Collaborative Software Development

• Several developers extending/changing the same code base concurrently
  • How to integrate multiple, possibly conflicting changes?
  • How to track ownership of changes to code and documents?
• Development and maintenance of multiple major versions in parallel
  • Maintain previous major versions through bug fixes (e.g., v3.0 -> v3.1)
  • In parallel, develop new feature or next major version
  • Example: v2.1.4 -> v2.1.5, v2.1.x -> v2.2.0

• Unambiguous identification of a revision/stable release
  • Each revision of each file/directory/resource etc.

Version Control
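To make the integration challenge concrete (several developers changing the same code base concurrently), here is a minimal sketch of a line-by-line three-way merge. It is not how any particular tool implements merging: the function name `merge3`, the conflict marker format, and the restriction to versions with the same number of lines are all assumptions made to keep the example short.

```python
from typing import List, Tuple


def merge3(base: List[str], ours: List[str], theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Line-by-line three-way merge (hypothetical helper).

    Simplifying assumption: all three versions have the same number of
    lines, i.e. lines are only edited, never inserted or deleted.
    Returns the merged lines and the indices of conflicting lines.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal-length versions")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only "theirs" changed the line
            merged.append(t)
        elif t == b:      # only "ours" changed the line
            merged.append(o)
        else:             # both changed the line differently: conflict
            merged.append(f"<<<<<<< {o} ======= {t} >>>>>>>")
            conflicts.append(i)
    return merged, conflicts


base = ["val x = 1", "val y = 2", "val z = 3"]
alice = ["val x = 10", "val y = 2", "val z = 3"]   # edits line 0
bob = ["val x = 1", "val y = 2", "val z = 30"]     # edits line 2
carol = ["val x = 11", "val y = 2", "val z = 3"]   # also edits line 0

clean_merge = merge3(base, alice, bob)       # changes touch different lines
conflicting = merge3(base, alice, carol)     # both changed line 0
print(clean_merge)
print(conflicting)
```

Changes to different lines integrate automatically. Two different edits to the same line cannot be resolved without a human decision, which is exactly the "possibly conflicting changes" case.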

• Version control = practices and tools that provide control over changes to source code and other documents

• Synonyms: revision control, source control, software configuration management (SCM)

What to keep versioned?

Which source code, files and artifacts should be maintained using version control?

• Current development version of source code? YES, of course!

• Coding experiment but commented out? NO, commented-out code should normally not be committed

• Compiled binaries (e.g., JVM .class files)? NO!

• Build files, build scripts (e.g., for Maven, Gradle, sbt)? YES!

• Documentation? YES!

• Configuration files? YES!

Basic Concepts (1)

• Revision (or “version”)
  • The state of either (a) a single file, or (b) the entire source tree, at a given point in time

• Repository - storage for current and historical state of all versioned files

• Working copy - local copy of files from a repository at a specific revision

Basic Concepts (2)
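As a minimal sketch of the terms from Basic Concepts (1), the following toy model treats a revision as a snapshot of the entire source tree, the repository as the list of all such snapshots, and a working copy as an ordinary dictionary checked out at one revision. The names `ToyRepository`, `commit`, and `checkout` are hypothetical. Real systems store changes far more compactly and support branches, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Snapshot = Dict[str, str]  # path -> file content


@dataclass
class ToyRepository:
    """Storage for the current and historical state of all versioned files."""
    revisions: List[Snapshot] = field(default_factory=list)

    def commit(self, working_copy: Snapshot) -> int:
        """Write the working copy back as a new revision; return its number."""
        self.revisions.append(dict(working_copy))  # store a copy, not a reference
        return len(self.revisions) - 1

    def checkout(self, revision: int) -> Snapshot:
        """Return a fresh working copy of the tree at the given revision."""
        return dict(self.revisions[revision])


repo = ToyRepository()
wc = {"README": "hello"}            # working copy, edited locally
r0 = repo.commit(wc)                # first revision of the source tree
wc["README"] = "hello, world"
wc["Main.scala"] = "object Main"
r1 = repo.commit(wc)                # second revision
old = repo.checkout(r0)             # the historical state is still retrievable
print(r0, r1, old)
```

Because each commit stores its own snapshot, later edits to the working copy never alter a revision that has already been recorded.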

• Commit
  • As a verb: “writing changes made in working copy back to repository”
  • As a noun: the revision created as a result of committing
• Change set - set of changes made in a single commit
• Branch - copy of source tree developed independently
• Head (also tip) - most recent commit either to the trunk or to a branch

History: 1st Generation

• SCCS (Source Code Control System) - created in 1972 at Bell Labs

• Tracks changes in source code, enables retrieving any of its previous versions

• Known for the sccsid string: static char sccsid[] = "@(#)ls.c 8.1 (Berkeley) 6/11/93";

• RCS - first released in 1982 by Walter F. Tichy at Purdue University

• Core concept: check-in and check-out of sets of files called revision groups

• Revisions stored in a tree structure; can revert to previous revisions

• DSEE (Domain Software Engineering Environment)
  • Predecessor of ClearCase, which was developed by Atria Software

History: 2nd Generation

• CVS (Concurrent Versions System)
  • Originally developed as a front end for RCS
  • Before SVN the de facto standard VCS in the open source world
  • Last release in 2008
• Subversion (SVN) - added features missing in CVS
  • Commits as true atomic operations (interrupted commits of multiple files could lead to corruption in CVS)
  • Renamed/moved files retain full revision history
  • Branching is a cheap operation
• SVK - uses Subversion file system, adds features
  • Offline operations (checkin, log, …); distributed branches

History: 3rd Generation

• Decentralized version control systems become mainstream

• Improvements in speed, reliability and flexibility

Decentralized Version Control Systems

• Peer-to-peer approach:

  • Full history mirrored on every developer's computer
  • Repositories are synchronized by exchanging patches between peers

• Improves working offline, enables private work
• Common operations are fast (no communication with centralized server)

• Does not rely on a single location for backups

History: 3rd Generation
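The peer-to-peer synchronization described for decentralized version control systems can be sketched as two repositories exchanging whatever the other side is missing. This is a deliberately simplified model: patches are reduced to whole commits keyed by an id, ordering and merging are ignored, and the names `missing_in` and `synchronize` are hypothetical.

```python
from typing import Dict

Commits = Dict[str, str]  # commit id -> description (stand-in for a real commit)


def missing_in(target: Commits, source: Commits) -> Commits:
    """Commits that `source` has and `target` lacks."""
    return {cid: c for cid, c in source.items() if cid not in target}


def synchronize(a: Commits, b: Commits) -> None:
    """Exchange missing commits in both directions; no central server involved."""
    to_a = missing_in(a, b)
    to_b = missing_in(b, a)
    a.update(to_a)
    b.update(to_b)


# Each developer holds a full local history and works offline for a while.
alice = {"c1": "initial import", "c2": "add parser"}
bob = {"c1": "initial import", "c3": "fix typo"}

synchronize(alice, bob)
print(sorted(alice), sorted(bob))
```

After the exchange both peers hold the complete history, which is why no single location is needed for backups.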

• BitKeeper
  • DVCS, used for Linux kernel development 2002-2005
• Git - created by Linus Torvalds in 2005 for Linux kernel development
  • Supports distributed BitKeeper-like workflow
  • Focus on high performance, strong safeguards against corruption
  • Most widely used VCS according to 2014 Eclipse Foundation community survey and 2015 Stack Overflow developer survey
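One ingredient of Git's safeguards against corruption is that stored objects are named by a hash of their content. The sketch below reproduces the id Git assigns to a file's content (a "blob") under its SHA-1 object format: the hash of the header `blob <size>` followed by a NUL byte, then the content. The function name `git_blob_id` is hypothetical; only Python's standard `hashlib` is used.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object id Git uses for a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = b"object Main\n"
corrupted = b"object Maim\n"   # a single flipped character

print(git_blob_id(original))
print(git_blob_id(corrupted))
print(git_blob_id(original) == git_blob_id(corrupted))
```

Because the id is derived from the content, any change to the stored bytes produces a different id and can be detected. The same property gives every revision an unambiguous identifier.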

• Mercurial
  • Like Git, created because of the withdrawal of the free version of BitKeeper

• GNU arch, Bazaar, ArX
  • No longer developed

Version Control in Practice

• Practical use of Git and GitHub
• Live session…

Best Practices

"What you should always try to do"

• Commits

• Self-contained change sets (“one purpose”), with unit tests

• Good commit message with helpful title (“Fix issue #42”, “Add tutorial”)
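A project could check commit messages automatically before accepting a change. The following is only a sketch of such a check. The specific rules (a 50-character title limit, no trailing period, a blank line before the body) are common conventions assumed here for illustration, not rules stated in these slides, and `check_commit_message` is a hypothetical helper.

```python
from typing import List

MAX_TITLE_LENGTH = 50  # assumed limit; individual projects differ


def check_commit_message(message: str) -> List[str]:
    """Return a list of problems found in a commit message (empty if none)."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["missing title line"]
    problems: List[str] = []
    title = lines[0]
    if len(title) > MAX_TITLE_LENGTH:
        problems.append(f"title longer than {MAX_TITLE_LENGTH} characters")
    if title.endswith("."):
        problems.append("title ends with a period")
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between title and body")
    return problems


print(check_commit_message("Fix issue #42"))
print(check_commit_message("Add tutorial\nExplains the basic workflow."))
```

A check like this only covers form. Whether the title is actually helpful still needs a human reviewer.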

• Pull requests

• Clean build and test suite must pass (checked via CI)

• Refer to issue in issue tracker

• Follow pull request policy¹ of corresponding project

¹ Example: https://github.com/scala/scala/blob/2.13.x/CONTRIBUTING.md

References

• Scott Chacon and Ben Straub. Pro Git. https://git-scm.com/book (CC BY-NC-SA 3.0)

• GitHub Help - About pull requests https://help.github.com/articles/about-pull-requests/

• GitHub Guides https://guides.github.com/

Q: What is a group of software developers called?
A: A merge conflict.