A FULLY DECENTRALIZED, PEER-TO-PEER BASED VERSION CONTROL SYSTEM

Vom Fachbereich Elektrotechnik und Informationstechnik der Technischen Universität Darmstadt zur Erlangung des akademischen Grades eines Doktor-Ingenieurs (Dr.-Ing.) genehmigte

dissertationsschrift

von

dipl.-inform. patrick mukherjee

geboren am 1. Juli 1976 in Berlin

Erstreferent: Prof. Dr. rer. nat. Andy Schürr
Korreferent: Prof. Dr.-Ing. Bernd Freisleben

Tag der Einreichung: 27.09.2010
Tag der Disputation: 03.12.2010

Hochschulkennziffer D17
Darmstadt 2011

Dieses Dokument wird bereitgestellt von / This document is provided by
tuprints, E-Publishing-Service, Technische Universität Darmstadt.
http://tuprints.ulb.tu-darmstadt.de
[email protected]

Bitte zitieren Sie dieses Dokument als: / Please cite this document as:
URN: urn:nbn:de:tuda-tuprints-2488
URL: http://tuprints.ulb.tu-darmstadt.de/2488

Die Veröffentlichung steht unter folgender Creative Commons Lizenz: Namensnennung – Keine kommerzielle Nutzung – Keine Bearbeitung 3.0 Deutschland

This publication is licensed under the following Creative Commons License: Attribution – Noncommercial – No Derivs 3.0

http://creativecommons.org/licenses/by-nc-nd/3.0/de/
http://creativecommons.org/licenses/by-nc-nd/3.0/

Posvećeno mojoj Sandri, koja me uvek i u svakom pogledu podržava i čini moj život ispunjenim.

ABSTRACT

Version control is essential in collaborative software development today. Its main functions are to track the evolutionary changes made to a project's files, to manage work being carried out concurrently on those files, and to enable a convenient and complete exchange of a project's data among its developers. The most commonly used version control systems are client-server based, meaning they rely on a central machine to store and manage all of the content. The obvious drawbacks of introducing an intermediary server into the system include a single point of failure, central ownership and vulnerability, scalability issues, and increased synchronization and communication delays. Many popular global software projects have switched to the emerging distributed version control systems, demonstrating the urgent need for a more suitable version control system.

This thesis proposes a fully decentralized, peer-to-peer based version control system as a solution to overcome the issues of currently available systems. The peer-to-peer communication paradigm has proved to be successful in a variety of applications. A peer-to-peer system, however, consists of autonomous participants, so certain behavior cannot be guaranteed; because of this unreliable nature, its use for version control is not obvious. In order to develop a solution based on the peer-to-peer communication paradigm, existing client-server, distributed, and peer-to-peer version control systems are analyzed and evaluated against a set of requirements derived from two realistic usage scenarios. Furthermore, the design details of those systems are closely examined in order to understand the impact of their design decisions on both the functional features and the quality aspects of the system, with the strongest focus on consistency. The proposed system is designed and implemented based on these findings.

The resulting system, PlatinVC, is a fully decentralized system that offers the complete system view of centralized systems while overcoming their drawbacks. All workflows of existing version control systems, centralized or distributed, are supported. PlatinVC even supports a hybrid setup in which other version control systems can interoperate, using PlatinVC as a mediator. Moreover, it introduces an automatic isolation of concurrent work, which separates relevant and required updates from possibly unneeded ones; in this way, branches are only created when necessary. The evolutionary changes of links between files, which can be enriched with arbitrary attributes, are also tracked. This extends snapshot-based version control to other purposes, e.g., the traceability of software artifacts. PlatinVC is a serious alternative to current version control systems, as industry-proven components were reused for the most critical parts of the system. The evaluation shows that PlatinVC meets the identified requirements completely, making it the first fully decentralized version control system that provides a high degree of consistency, efficiency, and robustness.

The impact of this thesis is twofold: First, it offers the novel concept of automatic isolation of concurrent work, so that essential updates are separated from unnecessary ones and the costs of integrating and merging branches are minimized. Second, this thesis provides the proof that the peer-to-peer paradigm can be pushed beyond its currently reputed limits by adding features which previously seemed incompatible.

ZUSAMMENFASSUNG

Die Versionsverwaltung ist in der kooperativen Softwareentwicklung heutzutage unerlässlich. Ihre wichtigsten Funktionen sind die Nachverfolgung von evolutionären Änderungen an Dateien eines Projektes, die Koordination von nebenläufiger Arbeit an diesen und es den Entwicklern zu ermöglichen, die kompletten Daten eines Projektes auf komfortable Weise auszutauschen. Die am häufigsten verwendeten Versionsverwaltungssysteme sind Client-Server-basiert, was bedeutet, dass sie einen zentralen Rechner zur Speicherung und Verwaltung aller Daten voraussetzen. Die offensichtlichen Nachteile, die durch das Einbringen eines zwischengeschalteten Servers in die Kommunikation der Entwickler verursacht werden, sind die Schaffung einer zentralen Ausfallstelle (single point of failure) und einer zentralen Angriffsstelle mit zentralen Eigentumsverhältnissen (central ownership and vulnerability), Skalierbarkeitsprobleme sowie erhöhte Synchronisations- und Kommunikationszeiten. Viele populäre globale Software-Projekte sind deshalb zu den aufkommenden verteilten Versionsverwaltungssystemen gewechselt, was die dringende Notwendigkeit für ein passenderes Versionsverwaltungssystem verdeutlicht.

Diese Arbeit stellt ein völlig dezentrales, Peer-to-Peer basiertes Versionsverwaltungssystem vor, das die Probleme aktueller Systeme überwindet. Das Peer-to-Peer Kommunikationsparadigma ist in einer Vielzahl von Anwendungen sehr erfolgreich. Dieses Paradigma beschreibt ein System, das ausschließlich aus autonomen Teilnehmern aufgebaut ist. Dadurch kann kaum bestimmtes Verhalten garantiert werden, was den Einsatz von Peer-to-Peer Kommunikation für ein Versionsverwaltungssystem äußerst erschwert. Um eine Lösung zu entwickeln, wurden bestehende Client-Server-basierte, verteilte und Peer-to-Peer basierte Versionsverwaltungssysteme bezüglich einer Menge von Anforderungen analysiert und ausgewertet, die von zwei realistischen Anwendungsszenarien abgeleitet wurden. Darüber hinaus sind die Design-Details dieser Systeme genauestens untersucht worden, um die Auswirkungen der getroffenen Entscheidungen auf Funktionalität und Qualitätsaspekte des Systems zu erkennen, insbesondere die Auswirkung auf die vom System gebotene Konsistenz. Basierend auf diesen Erkenntnissen wurde das vorgeschlagene System konzipiert und umgesetzt.

Das resultierende System, PlatinVC, ist ein völlig dezentrales System, das die ganzheitliche Systemsicht zentraler Systeme anbietet, ohne deren Nachteile zu erben. Alle in bisherigen Versionsverwaltungssystemen üblichen Arbeitsabläufe werden unterstützt, unabhängig davon, ob diese bei zentralen oder dezentralen Systemen verwendet werden. PlatinVC kann es sogar in einem hybriden Aufbau als Vermittler anderen Systemen ermöglichen, sich interoperabel auszutauschen. Darüber hinaus führt diese Arbeit ein Konzept zur automatischen Isolierung von nebenläufigen Änderungen ein, bei dem benötigte Beiträge von potenziell unnötigen getrennt werden. Zweige werden nur angelegt, falls sie tatsächlich notwendig sind. Die evolutionären Änderungen von Links zwischen Dateien, die mit beliebigen Attributen angereichert werden können, werden ebenfalls verwaltet. Dies erweitert die Nutzung über snapshot-basierte Versionsverwaltung hinaus für andere Zwecke, wie beispielsweise die Nachverfolgung von Software-Artefakten (traceability). PlatinVC ist eine ernstzunehmende Alternative zu aktuellen Versionsverwaltungssystemen, da für die kritischsten Systemteile Komponenten wiederverwendet wurden, die sich im industriellen Umfeld bewährt haben.

Die Evaluationsergebnisse zeigen, dass PlatinVC die identifizierten Anforderungen komplett erfüllt und somit das erste völlig dezentrale Versionsverwaltungssystem ist, das eine hohe Konsistenz, Effizienz und Robustheit bietet. Der Einfluss dieser Arbeit zeigt sich in zwei Formen: Erstens bietet sie ein neuartiges Konzept zur automatischen Isolation nebenläufiger Arbeit, so dass notwendige und nutzlose Updates separiert und die Kosten der Integration und Zusammenführung von Verzweigungen minimiert werden. Zweitens beweist diese Dissertation durch bisher für inkompatibel gehaltene Funktionen, dass das Peer-to-Peer Paradigma über seine momentanen Anwendungszwecke hinaus eingesetzt werden kann.


"Sometimes when you innovate, you make mistakes. It is best to admit them quickly, and get on with improving your other innovations." — Steve Jobs —

PREFACE

Rationale

During my work as a consultant for an international corporation, I experienced the magnitude and importance of collaboration and team work. Multiple code and text files are subject to concurrent changes by multiple developers. Tracking the evolution of a project and managing possible conflicts are essential features of a development environment. I used version control systems daily to recover from mistakes and revert to earlier versions of files, to identify potentially incompatible changes and resolve those conflicts, and as part of a backup strategy. Even while developing my own version control system with several of my students during the last four years of my research, I relied heavily on a version control system.

In the development of large software systems, such as operating systems, thousands of developers work on one development project, distributed not only across one building or one city but across the globe. The more I have worked with version control systems, the more I have seen how unsuitable they are for the distributed and large-scale nature of such development. Truth be told, we computer scientists will always find many limitations in the software we use and have plenty of ideas for improvement. There were, however, too many questions becoming apparent while working with current version control systems. Why is a repository unavailable when it is needed the most, something that should never happen to the crucial point of communication for developers? Why does the reliability of a repository depend entirely on a single point, a centralized server? Why must I wait so long until my contributions are applied to the repository?

I had the great opportunity to do my doctorate in the field of peer-to-peer networking and software engineering, the combination of which is key to solving those limitations. My research was partially funded by the DFG Research Group QuaP2P (FOR 733), whose focus is on improving the quality of peer-to-peer systems: This was the driving force behind my additional mission to push the peer-to-peer paradigm to its limits. In this thesis I answer the questions posed above, analyze the state-of-the-art version control systems, and propose a suitable solution.

Conventions Used in This Thesis

In this thesis, whenever a term needs further explanation, it is marked in bold italic letters and defined in a definition box on its first appearance.

Definition Box
In such a definition box, a chosen term is explained further. For each box there exists an entry in the index, which can be found in the appendix of this work.

The reader is advised to look up any ambiguous term in the index of this work, which is presented on the last pages. A number in the index refers to the page where the corresponding definition box can be found. In spite of the fact that this thesis is single-authored, I felt it was more appropriate to write it in the first-person plural. This thesis would not exist without the help and support of the people to whom the acknowledgements are due.


DANKSAGUNG

Mit den letzten Worten, die ich in dieser Dissertation niederschreibe, danke ich allen, die mich auf dem Weg der Promotion auf die eine oder andere Weise unterstützt haben. Ungewöhnlich früh in dieser Danksagung danke ich Sandra, die mich auch ungewöhnlich umfassend unterstützt hat.

Der größte Dank gilt meinem Doktorvater Andy Schürr. Ohne Deine Betreuung von der ersten bis zur letzten Stunde wäre die Qualität meiner Forschung sicherlich nicht so hoch. Immer stand Deine Tür für mich offen, wo aus einem "hast Du mal 5 Minuten Zeit?" oft eine Stunde wurde. Ich danke auch Bernd Freisleben. Mit Ihrer herzlichen Art habe ich mich schon beim ersten Treffen wohl gefühlt und die beruhigenden Ratschläge haben meine Nerven am Tag der Disputation geschont. Dir, Ralf Steinmetz, danke ich für die warme Aufnahme bei KOM - sowohl damals als Projektpartner, als auch später. Vielen Dank, dass ich den Feinschliff an meiner Arbeit bei Dir fertigstellen durfte.

Dank gebührt auch meinen Kollegen bei ES. Für eine angenehme Atmosphäre, gute Tipps und beistehende Worte danke ich meinen Zimmerkollegen Tobias Rötschke und Sebastian Oster. Auch danke ich Karsten Saller, meinem (thematischen) Nachfolger, der immer zu helfen bereit ist. Besonders Felix Klar und Ingo Weisemöller danke ich für die vielen Male, die Ihr mir bei Metamodellierungsproblemen mit Antworten geholfen habt. Oliver Heckmann, Nicolas Liebau und Kálmán Graffi - die alle von Kollegen zu Freunden geworden sind: Nico hat mir erst die Idee zum Promovieren in Darmstadt in den Kopf gesetzt und dabei mit Tipps und freizeitlicher Entspannung geholfen. Oliver hat immer guten Rat gehabt - es hat mich besonders in den ersten Jahren gefreut, dass Du auch mein Büro besucht hast, um mir helfende Ratschläge zu geben. Die Zusammenarbeit mit Kálmán war im Projekt eigentlich nicht geplant - dennoch war es die ergiebigste und ausführlichste. Christian Gross und Max Lehn danke ich in erster Linie für die erste Implementierung meiner Forschungsidee - damals noch als Wiki getarnt. Sebastian Kaune, Konstantin Pussep, Dominik Stingl und Osama Abboud danke ich für die herzliche Aufnahme als Peer. Meinen QuaP2Plern danke ich für die kritischen Diskussionen - allen voran Christof Leng, der mir für einiges die Augen geöffnet hat. Thorsten Strufe kam gerade rechtzeitig nach Darmstadt, um mich für die finale Phase meiner Promotion aufzubauen, durch guten Rat und entspannende Unternehmungen.

All meinen Studenten möchte ich für die fruchtvolle Zusammenarbeit danken. Ganz besonders Sebastian Schlecht: Neben großem Dank für die Hilfe bei der Implementierung möchte ich Dir für Dein großes Engagement danken, das weit über alle Erwartungen hinausgeht. Vielen Dank auch an die unterstützenden Kollegen bei KOM. Besonders möchte ich folgenden Personen danken: Karola, für die Hilfe beim Organisieren - von Projektbegehungen bis zu meiner Promotionsfeier. André Miede, nicht nur für die wunderschöne Dokumentvorlage, sondern auch für die nicht müde werdende Unterstützung beim Anpassen selbiger. Warm thanks go to our friend Annabel Baird, who proofread my thesis. You did a magnificent job!

Zu guter Letzt möchte ich meiner Familie danken. Meinem Vater, Pradip Mukherjee, dessen Worte mich erst so weit gebracht haben. Meiner Mutter Margret Mukherjee, die immer bedingungslos für mich da ist. Meinen Schwestern Anita und Sonali, die trotz der räumlichen Distanz die Familienbande zusammenhalten, Christian für die Unterstützung bei meinen Pflichten als Sohn und meinen Nichten Helena Pari und Johanna Shalin für die herzhaften Momente. Salomon danke ich für die Gelegenheit, mich im Gespräch über meine Hobbies zu verlieren. Sale, Zaga und Zlatko danke ich für die aktive Unterstützung bei Euren Besuchen in Darmstadt. In Eurem Urlaub habe ich mich wie im Urlaub gefühlt.

Dieser Abschnitt erforderte ähnliche mentale Kraft wie die anderen Teile dieser Dissertationsschrift: Wie leicht vergisst man doch Personen, denen man überaus dankbar ist! Wenn Du, lieber Leser, Dich angesprochen fühlst, dann sei Dir gewiss, dass ich Dir nur versehentlich nicht in diesem Text gedankt habe.


CONTENTS

i introduction
1 Introduction
   1.1 Motivation
   1.2 Vision
   1.3 Challenges
   1.4 Goal
   1.5 Outline
2 Suitability of Peer-to-Peer Paradigm for Application Scenarios
   2.1 Peer-to-Peer Communication Paradigm
   2.2 Wiki Engines
      2.2.1 Cooperation in a Wiki Scenario
   2.3 Global Software Development
   2.4 Benefits of Peer-to-Peer Paradigm for the Application Scenarios
      2.4.1 Shortcomings of Server-centric Solutions
      2.4.2 Characteristics of Peer-to-Peer based Solutions
   2.5 Running Example
   2.6 Summary
3 Requirements and Assumptions
   3.1 Assumptions
   3.2 Requirements
      3.2.1 Functional
      3.2.2 Non-functional
      3.2.3 Security Aspects
   3.3 Summary

ii version control systems
4 Foundations of Version Control Systems
   4.1 Collaboration Workflows
   4.2 The Frequency of Committing Changes
   4.3 Configuration Management
   4.4 Consistency and Coherency in Version Control Systems
      4.4.1 Terminology
      4.4.2 Degrees of Consistency
      4.4.3 Coherency in Version Control Systems
   4.5 Summary
5 Notable Version Control Systems
   5.1 Centralized Version Control
      5.1.1 SCCS
      5.1.2 RCS
      5.1.3 CVS
      5.1.4 SVN
      5.1.5 ClearCase
   5.2 Distributed Version Control
      5.2.1 Basic Architecture of dVCS
      5.2.2 Monotone
      5.2.3 Git
      5.2.4 Mercurial
   5.3 Peer-to-Peer Version Control
      5.3.1 Wooki
      5.3.2 DistriWiki
      5.3.3 CVS over DHT
      5.3.4 Code Co-op
      5.3.5 Pastwatch
      5.3.6 GRAM
      5.3.7 SVCS
      5.3.8 Chord based VCS
   5.4 Systems for Collaborative Work Support
   5.5 Summary
6 Analysis of Related Version Control Systems
   6.1 Realized Requirements per System
   6.2 Analysis of System Properties Leading to Fulfilled Requirements
      6.2.1 Functional Requirements
      6.2.2 Non-functional Requirements
      6.2.3 Security Aspects
   6.3 Degree of Consistency
   6.4 Taxonomy of Key Mechanisms
      6.4.1 Concurrency Control
      6.4.2 Repository Distribution
      6.4.3 Repository Partitioning
      6.4.4 Communication Paradigm
      6.4.5 Communication Protocol
   6.5 Promising Mechanisms
      6.5.1 Concurrency Control
      6.5.2 Repository Distribution
      6.5.3 Repository Partitioning
      6.5.4 Communication Paradigm
      6.5.5 Communication Protocol
   6.6 Summary

iii peer-to-peer version control system - platinvc
7 Overview of PlatinVC
   7.1 Basic Architecture
   7.2 Workflow
      7.2.1 Frequency of Commits
      7.2.2 Repository Sharing
      7.2.3 Conflict Resolution
   7.3 Features
      7.3.1 Automatic isolation of concurrent work
      7.3.2 Working Offline
      7.3.3 Interoperability
      7.3.4 Offers all dVCS Operations
      7.3.5 Backing up Artifacts Redundant
      7.3.6 Degree of Consistency
      7.3.7 Support for Traceability Links
   7.4 Services
      7.4.1 Modul Management Operations
      7.4.2 Retrieve Operations
      7.4.3 Share Operations
   7.5 Summary
8 Design of PlatinVC
   8.1 Design Principles
   8.2 Design Overview
   8.3 Choice of the Supporting Systems
      8.3.1 Component Intercommunication & Lifecycle Management
      8.3.2 Communication network layer
      8.3.3 Local version control mechanisms
   8.4 Global Version Control Mechanisms
      8.4.1 Storage Components on each Peer
      8.4.2 Repository distribution and partitioning
      8.4.3 Replication
      8.4.4 Collaboration
      8.4.5 Retrieve Updates
      8.4.6 Share Versions
      8.4.7 Conflict Handling
      8.4.8 Additional mandatory mechanisms
   8.5 Maintenance Mechanisms
   8.6 Failure Handling
      8.6.1 Handling Network Partitioning
      8.6.2 Handling Indeterministic Routing
      8.6.3 Recovery of Missing Snapshots
   8.7 Traceability Links
   8.8 Summary
9 Prototypical Software Development Environment
   9.1 Modular Development of Peer-to-Peer Systems
   9.2 Framework Services
      9.2.1 Communication Management
      9.2.2 Security
      9.2.3 Storage
   9.3 User Level Applications
      9.3.1 Eclipse IDE
      9.3.2 PlatinVC - a Peer-to-Peer based version control system
      9.3.3 Piki - a Peer-to-Peer based Wiki Engine
      9.3.4 Askme - Peer-to-Peer based Aware Communication
   9.4 Summary
10 Evaluation
   10.1 Evaluation Goals
      10.1.1 Quality Aspects
      10.1.2 Metrics
   10.2 Evaluation Methodology
      10.2.1 Evaluation Environment
      10.2.2 Evaluation Platform
   10.3 Workload
      10.3.1 Number of Users
      10.3.2 Experiment Data
      10.3.3 User Behavior
      10.3.4 Churn Model
      10.3.5 Experiment Timeline
      10.3.6 System Parameter Settings
   10.4 Evaluation Results
      10.4.1 Consistency Degree
      10.4.2 Robustness
      10.4.3 Freshness
      10.4.4 Duration of Push and Pull
      10.4.5 System Load
   10.5 Comparative Evaluation
   10.6 Summary

iv finale
11 Conclusion
   11.1 Summary and Conclusions
   11.2 Contributions
   11.3 Outlook
   11.4 Implications
bibliography
curriculum vitæ
publications

v appendix
a About the Prototype's Development
Erklärung
List of Figures
List of Tables
Index

Part I

INTRODUCTION

Development, relating in particular to project work, has evolved from single-site work into a globally spread fashion, in which multiple persons collaborate from physically distant places. The version control systems in use were created in times when a project's development was concentrated in one physical location. A motivation and a vision for a better-suited solution to support this evolved form of project work are laid out in Chapter 1, along with an overview of our goals and the challenges that arose. We take a deeper look at two scenarios where the distributed nature of the development suffers from the current limitations of the available version control tools in Chapter 2. A complete set of the requirements that a beneficial solution must fulfill is elaborated in Chapter 3.

1 INTRODUCTION

The Internet has become the main means of communication in today's society. It has a huge impact on people across the world. People send e-mails, chat, talk, and exchange various information and multimedia data, all over the Internet. We are also witnessing new trends of communication: popular online social networks (e.g., Facebook [fac]), the sharing of personal content such as expertise, experiences, and thoughts (e.g., blogs [Blo04]), articles (e.g., Wikipedia [wikb]), and videos (YouTube [You]). The vast majority of people now consult Wikipedia articles to briefly inform themselves on subjects of interest or watch tutorials on various topics from YouTube users. The sharing of various types of data, communication, and collaboration are the common features of all popular and now essential Internet applications.

Means of efficient and powerful communication and collaboration do not only have a social impact, but are also crucial for industry. Let us examine, for instance, multinational corporations. They are spread across the globe and distribute their work over many distant locations. Efficient collaboration tools are not only essential but crucial for running such corporations. For instance, Mercedes Benz designs its cars in Germany but operates assembly lines in many countries around the world. A lack of appropriate collaboration platforms could lead to huge costs and even endanger drivers' lives if misunderstandings arise. Product development in software companies is also globally distributed. For instance, Google, whose headquarters are in Mountain View, California, has offices in 37 countries in Europe, Latin America, Asia, Australia, and the Middle East [goob]. One reason for this distribution is to assemble expertise from other countries (e.g., the Google research center in Zürich, Switzerland); however, reducing costs (e.g., the development center in Bangalore, India) is certainly of interest as well.

Many documents can be shared online in an efficient way, with no communication overhead (e.g., e-mails, phone calls). A designer of a new Mercedes Benz E-Class model needs only to upload his specifications to the collaboration platform so that the worker at the assembly line on the other side of the globe always has access to the latest version of the design documents. Collaborators can be distributed all around the world, and in addition to the issues involved with working in different time zones, the Internet communication delays can be huge if the communication in the provided platform is inefficient. Any outdated data can pose huge risks to the success of a project.

Particularly in the software industry, which will remain the focus of this thesis, developers collaborate intensively and often work on the same project or file at the same time. It is, therefore, very important to enable control over possible conflicts. Mistakes in development can be eliminated by going back to earlier stages of the process. Tracking evolutionary changes and managing concurrent work on the same document is crucial for such a collaboration.

Version Control System
A version control system tracks evolutionary changes to files. The so-called commit operation detects and records the changes a user has made to the files in her working copy. These changes can be retrieved by any other developer at any later time with the update operation. The changes are listed in chronological order with additional information, such as the author's name and comments. Version control systems also manage concurrent work. When multiple authors modify a file at the same time, conflicts can happen. Using a version control system, those conflicts are detected and resolved.
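As an illustration of the commit and update operations just defined, the following minimal sketch models a repository as a chronological list of commits. It is purely illustrative: the class and method names are hypothetical, and a real version control system additionally stores deltas, keeps a full version graph, and detects conflicts.

```java
import java.time.Instant;
import java.util.*;

/**
 * A minimal, purely illustrative repository: commit records the state of a
 * working copy together with author and comment; update brings a working
 * copy to the latest recorded state. All names are hypothetical.
 */
public class TinyRepository {

    record Commit(int number, String author, String message, Instant date,
                  Map<String, String> files) {}          // file name -> file content

    private final List<Commit> history = new ArrayList<>();

    /** Record the changes a user made in her working copy. */
    public Commit commit(String author, String message, Map<String, String> workingCopy) {
        Commit c = new Commit(history.size() + 1, author, message, Instant.now(),
                              Map.copyOf(workingCopy));
        history.add(c);
        return c;
    }

    /** Retrieve the latest recorded state into a working copy. */
    public Map<String, String> update() {
        return history.isEmpty() ? Map.of() : history.get(history.size() - 1).files();
    }

    /** The recorded changes in chronological order, with author and comment. */
    public List<Commit> log() {
        return List.copyOf(history);
    }

    public static void main(String[] args) {
        TinyRepository repo = new TinyRepository();
        repo.commit("alice", "initial version", Map.of("Readme.txt", "v1"));
        repo.commit("bob", "fix typo", Map.of("Readme.txt", "v2"));
        repo.log().forEach(c -> System.out.println(c.number() + " " + c.author() + ": " + c.message()));
        System.out.println("update -> " + repo.update());
    }
}
```

The sketch already shows the two roles a repository plays: a chronological log of who changed what and why, and a common source from which every developer can obtain the latest state.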

In summary, there are three crucial requirements of a version control system:

1. The possibility to revert the state of an entire project or a single file to any earlier development stage, i.e., an earlier version. This feature provides key support for correcting mistakes.


2. The coordination of the work of an unlimited number of people on the same project or the same file without the overhead of constant manual back-and-forth file exchange.

3. The management of possible conflicts resulting from the concurrent work of many developers on the same project or file at the same time.

The question raised at this point is: What kind of collaboration platforms exist today and do they fulfill the aforementioned needs?

1.1 motivation

The sharing of documents and files is now supported by various tools. Most of them, such as the popular platforms Dropbox [dro] and Google Docs [gooa], support only the second of the aforementioned requirements. The Dropbox client, for example, allows users to drop any file into a designated folder, which is then synchronized directly to any of the user's other computers running the Dropbox client. Google Docs supports collaborative text editing. Common to all of them, however, is that they suffer from a complete lack of, or serious limitations to, full version control, tracking at most the evolutionary changes of single files.

The most widely used version control systems, such as CVS [CVS] and SVN [Pil04], provide support for all three requirements. They are among the vital tools in multi-developer software projects. Their usage not only eases collaboration but also speeds up development, through the possibility of correcting any mistake by restoring the project to any earlier stage. Looking deeper into the technology behind them, we can see that those version control systems use a centralized point of communication and storage. This server is always online, stores all data, responds to all users' requests, and plays an intermediary role between the users. The basic communication flow is presented in Figure 1. A developer uploads data to a server, which tracks the changes, marks the versions, and manages the conflicts. All other developers receive the changes from the server.


Figure 1: Basic principle of centralized version control systems

An obvious drawback of this communication principle is that the server presents a single point of failure. In the case of a server breakdown due to, e.g., overload, a power outage, or an attack, the complete collaboration platform fails and developers are hindered in their work on a project. Additionally, it also represents a single point of control and ownership. If such a version control system is used by a huge number of developers at the same time, it shows evident scalability issues, as the server may run out of resources, leading to its failure or, in the best case, poor performance. As already mentioned, developers can reside all around the world. The centralized communication model, also called client-server, can lead to unequal Internet delays in such a globally distributed environment, bringing additional synchronization and communication delays. Last but not least, having a centralized server as an essential system component brings additional maintenance costs for the dedicated hardware.

We have explained the need for version control systems in today's collaborative work and the limitations of the currently used tools. Next, we present the vision of this thesis, which shows how we can overcome these limitations and provide even better support for collaborative software development.

1.2 vision

If we look at the inherent distribution of developers, we first notice that the centralized communication in existing version control systems is orthogonal to it. A more natural way of communication would be decentralized, directly among the users. Such a communication paradigm is called peer-to-peer.

Peer-to-Peer (p2p)
The term peer-to-peer describes the "equal-to-equal" property of this communication model. In contrast to the centralized, client-server communication model, all users, so-called peers, are both consumers and contributors to the system, acting as both servers and clients at the same time. They share their resources (e.g., data, memory, CPU power) directly, without an intermediary server. The users are, therefore, called peers, as they have the same rights and responsibilities (which is not a strict rule but more of an initial, basic principle).

Peer-to-peer became a popular means of communication, storage, and sharing on the Internet after its huge success in 1999 with the notorious file-sharing application Napster [Shi01]. Since then, this fully decentralized communication paradigm has proven its benefits in many applications beyond file-sharing, such as Voice-over-IP (e.g., Skype [Skyb]), video streaming (e.g., SopCast [Sop]), and content distribution (e.g., BitTorrent [Coh03]). Some of the main advantages of using the peer-to-peer communication paradigm are very low maintenance costs, no single point of failure, scalability, and collective ownership.


(a) share changes (b) get updates

Figure 2: Basic principle of peer-to-peer version control systems

Combining the peer-to-peer paradigm with a version control system solves many issues current version control systems suffer from. In addition to that, this thesis introduces a novel collaboration workflow, which solves another, not previously mentioned, issue of current version control systems. Let us observe the situation when several developer teams work on the same project. Each of them focuses on the implementation of different system features. To avoid confusion and the inevitable programming mistakes caused by one team, working on one feature, obstructing the development of another team, working on another feature, a so-called branch is created for each team. With branches, the commits and updates of the developer teams are separated. When the teams accomplish their tasks, the resulting code must be merged into a functional project.

Merging
Whenever the same artifact is developed concurrently, the changes have to be merged. If the modifications happened in different lines, they can be combined automatically. Nevertheless, a user should check the result of the automatic merge process, as the resulting version might have semantic errors. There are more sophisticated merge approaches, which can take the semantics of specific types of artifacts (e.g., Java source code) into account. However, those merge algorithms never came into widespread use. A special form of a commonly used merge algorithm is the three-way merge, in which the shared base version of the conflicting artifact versions is consulted as well. In this algorithm, the actual changes can be identified, and they overwrite lines that the other version left unchanged.
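The following minimal sketch illustrates the three-way merge idea described in the definition box. It is deliberately simplified and assumes that both modified versions still have the same number of lines as the common base version (lines are only edited in place); real merge tools first align the versions and merge whole regions, but the per-line decision logic is the same in spirit.

```java
import java.util.*;

/**
 * A deliberately simplified three-way merge sketch. It assumes that both
 * modified versions contain the same number of lines as the common base
 * version, i.e., lines were only edited in place, never inserted or removed.
 */
public class ThreeWayMerge {

    public static List<String> merge(List<String> base, List<String> ours, List<String> theirs) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < base.size(); i++) {
            String b = base.get(i), o = ours.get(i), t = theirs.get(i);
            if (o.equals(t)) {
                result.add(o);                 // both sides agree (possibly both unchanged)
            } else if (o.equals(b)) {
                result.add(t);                 // only "theirs" changed this line
            } else if (t.equals(b)) {
                result.add(o);                 // only "ours" changed this line
            } else {
                result.add("<<<<<<< ours");    // both changed the same line: a conflict
                result.add(o);                 // that a user has to resolve manually
                result.add("=======");
                result.add(t);
                result.add(">>>>>>> theirs");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> base   = List.of("int a = 0;",  "int b = 1;",  "int c = 2;");
        List<String> ours   = List.of("int a = 0;",  "int b = 42;", "int c = 2;");
        List<String> theirs = List.of("int a = -1;", "int b = 1;",  "int c = 2;");
        merge(base, ours, theirs).forEach(System.out::println);
    }
}
```

In the example, the two non-overlapping changes are combined automatically; only a line changed on both sides would be emitted between conflict markers for the user to resolve.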

This is not a simple task. For instance, before the branches are merged, many helpful fixes to files common to all branches are not shared. Therefore, each team may solve the same bugs in different ways, wasting time and making the integration of those changes troublesome. Merging branches is problematic: the longer it is postponed, the more conflicts may arise. Optimally, these conflicts would be resolved by the respective authors in the most efficient and valid way. Different implementations resulting in the same effect are very challenging to identify and merge.

Our collaboration workflow introduces automatic isolation of concurrent work, a mechanism which avoids the aforementioned problem by automatically separating the work of teams while notifying them about potentially relevant updates. Two teams that, for instance, work on the same file will receive the updates from each other although they work on separate tasks. When a developer is interested in the latest versions of a set of files, she specifies a folder that should be kept updated. Updates are snapshot-based, meaning that if they are integrated, all related changes are provided as well. In that way, she can integrate updates from other teams into her versions immediately or at a later date. In the latter case, an isolation (analogous to a branch) is automatically created, so that every developer profits from the decision to separate development lines. It is no longer necessary to create (often unnecessary) branches in advance, which might cause problematic change integration.

The basic communication flow in our peer-to-peer version control system is visualized in Figure 2. In the first step, developers store their changes locally, on their own computers. As soon as they consider the state of their code to be stable and ready for sharing, they push the changes (see Figure 2a) to the specific group of peers assigned to manage the changed files. Developers can update their local files to the latest version by pulling them from the peer-to-peer network, which appears to them as a central, global repository instance (marked with dotted lines around the "global repository" in the figures). Without being aware of it, they actually send requests to the peers assigned to store all changes of the corresponding files. A user can decide to receive only the updates to folders she specifies, which, in contrast to client-server version control systems, retrieves updates to related files as well. Automatic isolation is initiated when developers decide to separate conflicting development lines. The natural separation of project parts into subfolders can be utilized for work isolation, which eliminates the need for explicit branches. There are, however, many challenges involved in using the peer-to-peer paradigm for version control.
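As an illustration of the idea that every file or folder is managed by a specific group of peers, the following sketch hashes a folder path into the peers' ID space and treats the numerically closest peer IDs as responsible. The use of SHA-1, numeric distance, three replicas, and all names are assumptions made for illustration only; PlatinVC's actual partitioning and replication scheme is presented in Chapter 8.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.*;

/**
 * Hypothetical sketch: a folder (module) is hashed into the same ID space
 * as the peer IDs, and the numerically closest peers are considered
 * responsible for storing its change history.
 */
public class ModuleToPeersSketch {

    static BigInteger sha1(String text) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-1").digest(text.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, d); // 160-bit key, interpreted as a non-negative number
    }

    /** The 'replicas' peers whose IDs are closest to the folder's key. */
    static List<BigInteger> responsiblePeers(String folder, List<BigInteger> peerIds, int replicas)
            throws Exception {
        BigInteger key = sha1(folder);
        return peerIds.stream()
                .sorted(Comparator.comparing((BigInteger id) -> id.subtract(key).abs()))
                .limit(replicas)
                .toList();
    }

    public static void main(String[] args) throws Exception {
        List<BigInteger> peers = new ArrayList<>();
        for (String name : List.of("alice", "bob", "carol", "dave", "eve")) {
            peers.add(sha1(name)); // in a real overlay, peer IDs come from the protocol
        }
        System.out.println(responsiblePeers("project/src/ui", peers, 3));
    }
}
```

Because every developer can compute the same key from the folder path, all of them contact the same small group of peers for a given module without any central coordinator.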

1.3 challenges

Despite the fact that the peer-to-peer paradigm continuously proves its success in a plethora of applications, it has some limitations. They are caused by the main characteristics of this communication paradigm:

• Participants are fully autonomous, meaning they join and leave the system at any time.

• There is a lack of centralized control and of a global view of the system.

The core of a peer-to-peer system is the peers themselves; the availability of the offered services is fully dependent on them. In the sudden absence of many peers, a system can guarantee neither the availability of its services nor their quality. All peers are equal in terms of their rights and responsibilities within the system. That means that no peer has a role of higher importance than another, as is the case for a server in client-server solutions. Finding the right peers with which to share updates, without having a global view of the system, is a challenging task. Two main challenges were faced when bringing the peer-to-peer paradigm to a version control system:

• A lack of global knowledge, which is evidently crucial when pushing and pulling updates and finding the appropriate group of peers for particular updates. Finding a particular peer in a fully decentralized fashion, with no centralized view of the system, is a challenging task in itself.

• The reliability and performance of the system, given the unpredictable peer behavior. Is it possible to create a reliable system that depends on unreliable, autonomous peers?

This thesis overcomes these challenges and proves that the peer-to-peer paradigm can be pushed beyond its current boundaries, contributing its inherent strong points and adding seemingly infeasible features. From the motivating problems of current version control systems, the vision of the solution, and the arising research challenges, we define the main goal of this thesis.

1.4 goal

The main goal of this thesis is to provide a fully decentralized, peer-to-peer version control system that offers full support for distributed software development. In order to achieve this goal, the following objectives are addressed in this thesis:

• Requirements analysis by examination of the feasibility, benefits, and drawbacks of the peer-to-peer communication paradigm in our two application scenarios, namely wikis and global software development. The aim is to identify "the best of both worlds", client-server and peer-to-peer, by retaining the advantages of both and eliminating their drawbacks.

• Evaluation of existing solutions, where the most important version control systems are investigated and compared, to learn from their design decisions and their impact on the resulting features and quality.

• Design of a fully decentralized peer-to-peer version control system with all mechanisms, protocols, and algorithms needed to fulfill the stated requirements, both functional and quality requirements.

• Evaluation of the solution by providing a running and tested proof-of-concept and performing testbed experiments to address various quality aspects.

• Proving the feasibility of pushing the peer-to-peer paradigm beyond its current boundaries. Appropriate mechanisms are proposed to overcome currently unsolved issues in peer-to-peer applications, which present a great obstacle to an even broader usage of peer-to-peer technology.

1.5 outline

This work consists of four main parts. They are structured as follows: Part i first introduces the motivation for the subject of version control and discusses the problem set out in this chapter. In the following chapter we discuss two application scenarios

that can benefit from the peer-to-peer paradigm and derive functional and non-functional requirements in Chapter 3. When the requirements are defined, we focus on version control systems in Part ii. First the foundations of version control are explained in Chapter 4. Existing solutions for version control are discussed in detail in Chapter 5, and they are analyzed according to the requirements, design decisions, and their impact in Chapter 6. Part iii is the crucial part of this thesis, as it presents the proposed peer-to-peer version control system. Chapter 7 describes the basic architecture and workflows in the system. A detailed look into the design of the system, through a comparative presentation of chosen and alternative design decisions, is given in Chapter 8. A proof of concept and its implementation details are explained in Chapter 9, and experiments run on a testbed to evaluate the required quality aspects are elaborated in Chapter 10. The concluding Part iv summarizes the key contributions of this dissertation, implications for collaborative software development, and new research challenges arising from this work.

2 SUITABILITY OF PEER-TO-PEER PARADIGM FOR APPLICATION SCENARIOS

This chapter discusses two application scenarios and explores the suitability of the peer-to-peer paradigm for both of them. Wikis and global software development are chosen as application scenarios because of two common properties:

• version control is essential for their functioning, and

• their users are inherently distributed over geographically distant locations

First, in Section 2.1, we briefly explain the basic background and terminology of peer-to-peer systems. Following that, in Sections 2.2 and 2.3, we describe the Wiki and global software development application scenarios and analyze the usage of the peer-to-peer paradigm in their communication. Section 2.4 summarizes the benefits of the peer-to-peer approach for those scenarios and weighs them against the drawbacks of centralized approaches. A running example, described in Section 2.5, illustrates the use of a version control system which is based on the peer-to-peer paradigm.

2.1 peer-to-peer communication paradigm

Peer-to-Peer System
A peer-to-peer system consists of one or more peer-to-peer overlays, realized with communication protocols, and one or more applications on top. Peer-to-peer describes communication, transactions, or exchange which occur directly between equal individuals ("peers").

Peer-to-Peer Overlay Network
A peer-to-peer overlay represents a virtual network of peers built on top of ISO/OSI layer 4 [Zim80], i.e., the transport layer, which is called the underlay. Each peer has a peer ID (also called overlay ID), and peers are connected by virtual connections between each other. These virtual connections form a so-called overlay, which is based on physical routes in the underlying transport medium. A peer ID is a unique identifier that is sometimes related to the content/information a peer stores, depending on the kind of peer-to-peer protocol.

This decentralization of the communication and data (e.g., messages, files, or multimedia streams), without an intermediary server, avoids a single point of failure and a communication bottleneck, and reduces the maintenance and hardware costs. In its pure, original meaning, all participants of a peer-to-peer system have the same responsibilities and rights.

Peer-to-Peer Application
After a direct connection is established, a mechanism can exchange data with other peers. This effectively forms the application the user operates. This application can be further separated into framework services, which are reusable components such as data storage, and user level applications, with which the user interacts directly. The peer-to-peer based Wiki engine presented in Section 9.3.3 is an example of a user level application; the version control mechanism, which is presented in Chapter 7, is a framework service.
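As a purely illustrative sketch of this separation, the following hypothetical snippet composes a user level application from a framework service; the interface and class names are assumptions and do not correspond to the prototype's actual API.

```java
/**
 * Illustrative layering: reusable framework services (e.g., storage or
 * version control) sit below user level applications (e.g., a Wiki engine)
 * that the user operates directly. All names are hypothetical.
 */
public class LayeringSketch {

    /** A reusable building block offered to applications, e.g., data storage. */
    interface FrameworkService {
        String name();
    }

    /** A program the user operates directly; it is composed of framework services. */
    interface UserLevelApplication {
        void run();
    }

    static class VersionControlService implements FrameworkService {
        public String name() { return "version control"; }
    }

    static class WikiEngine implements UserLevelApplication {
        private final FrameworkService history;

        WikiEngine(FrameworkService history) { this.history = history; }

        public void run() {
            System.out.println("Wiki engine running on top of the " + history.name() + " service");
        }
    }

    public static void main(String[] args) {
        new WikiEngine(new VersionControlService()).run();
    }
}
```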

Peers in a peer-to-peer system communicate with each other with a set of predefined messages, according to the particular peer-to-peer protocol [SE05]. In the ISO/OSI layer model,


a peer-to-peer protocol is located in the application layer, on top of the transport layer. It describes how peers find each other and communicate in a fully decentralized fashion. Peer-to-peer protocol messages include routing and maintenance messages. Peers have a unique identifier, the peer ID, and are connected to each other via (virtual) links. Such a network of peers with this communication protocol is called an overlay.

In its original meaning, peer-to-peer stands for communication between equals. There are, however, also centralized and hybrid systems in addition to pure peer-to-peer overlays. In a centralized peer-to-peer overlay, one peer has a more important role. The indexing of content and the lookup of which peer offers which resource are performed by that peer, this single instance. The index is a list which maps identifiers to their respective resources. Content transfer is carried out in a decentralized fashion among the peers in the overlay. A popular example of the centralized approach is Napster. In hybrid overlays there is more than one peer with a more important role, the so-called superpeers. Peers are connected to their assigned superpeer and communicate with it in a client-server fashion, whereas the communication between superpeers is pure peer-to-peer.

Peer Proximity
The peer proximity is the distance (based on, e.g., mathematical subtraction or logical XOR) between a peer ID and a given ID (e.g., a resource ID or another peer's ID). How the proximity is calculated depends on the respective peer-to-peer protocol.

Pure peer-to-peer overlays are further classified into structured, unstructured, and hierarchical overlays. In structured overlays, there is a relation between peer IDs and the resources they store and share. There is always a peer responsible for a part of an index of shared resources, and each peer can find that peer through the peer/resource ID relation. Finding a resource means routing a request to the peer responsible for the part of the index referring to the required resource. Each peer has a routing table with peer IDs mapped to the transport layer addresses of a set of peers, the so-called neighbors.

Neighborhood Peers
A peer A is called a neighbor of a peer B if A's ID is one of the n closest IDs to B's ID among all online peers. A's overlay and underlay addresses are kept in the routing table of peer B.

Routing tables in structured overlays are organized in a way that enables so-called greedy routing. That means that the routing is optimized to reach the destination in as few hops as possible. A hop denotes a peer that forwards a message. Each hop is chosen from peers that are closer to the destination. An example of a structured overlay is Pastry [RD01b].

Peer-to-Peer Protocol
A peer-to-peer protocol describes the maintenance mechanisms and the routing mechanism. The purpose of a peer-to-peer protocol is to establish a direct connection between any two peers.

Maintenance mechanisms: Each peer knows a bundle of other peers in the network and keeps in contact with a few of them. For example, a peer can use probing messages, called keep-alive messages, to check whether a peer from its routing table is still online. In case it is offline, the corresponding entry in the routing table is replaced according to the peer-to-peer protocol. Similarly, upon joining, a peer obtains a number of other peers' contact information. The mechanisms which take care of these operations are the maintenance mechanisms.

Routing mechanism: Using the routing mechanism, any peer can be found. If the requested peer is unknown to the initiator of the routing, neighbors are asked to forward the query until the destination is reached.

In unstructured overlays, there is no relation between a peer ID and the resources a peer shares. Routing is mostly done by flooding - forwarding a message to neighbors who do

the same until the destination is reached. This leads to indeterministic routing and often overloads the network with routing messages. There are, however, sophisticated approaches that minimize this overhead and still achieve the retrievability of the queried resources, such as BubbleStorm [TKLB07]. All communication messages in a peer-to-peer overlay, for both routing and maintenance of the routing tables, are defined by the peer-to-peer protocol.

Hierarchical overlays use routing on one or more hierarchy levels. Peers are organized in clusters. Inside a cluster, one routing mechanism is used, and clusters are connected by their cluster heads, which are reachable by their own routing mechanism. In this thesis we will focus on structured overlays.

Figure 3: Peer-to-peer based Wiki engine Piki [MLS08]
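The following self-contained sketch illustrates greedy routing in a structured overlay: each peer only knows a few neighbors and forwards a request to the known peer whose ID is closest to the target key, until no neighbor is closer. Plain integer IDs, absolute difference as the proximity metric, and all names are simplifying assumptions; protocols such as Pastry use large ID spaces and prefix-based routing tables instead.

```java
import java.util.*;

/**
 * A minimal simulation of greedy key-based routing in a structured overlay,
 * assuming simple numeric peer IDs and absolute difference as proximity.
 */
public class GreedyRoutingSketch {

    /** Each peer only knows a few neighbors (its routing table). */
    static Map<Integer, List<Integer>> routingTables = new HashMap<>();

    static int distance(int a, int b) {
        return Math.abs(a - b); // hypothetical proximity metric
    }

    /** Route from 'start' towards the peer whose ID is closest to 'key'. */
    static List<Integer> route(int start, int key) {
        List<Integer> hops = new ArrayList<>(List.of(start));
        int current = start;
        while (true) {
            // greedy step: pick the known neighbor that is closest to the key
            int next = current;
            for (int neighbor : routingTables.get(current)) {
                if (distance(neighbor, key) < distance(next, key)) {
                    next = neighbor;
                }
            }
            if (next == current) {  // no neighbor is closer: current peer is responsible
                return hops;
            }
            hops.add(next);
            current = next;
        }
    }

    public static void main(String[] args) {
        // a toy overlay of five peers with partial views of each other
        routingTables.put(10, List.of(25, 40));
        routingTables.put(25, List.of(10, 40, 60));
        routingTables.put(40, List.of(25, 60));
        routingTables.put(60, List.of(40, 80));
        routingTables.put(80, List.of(60));
        System.out.println("hops: " + route(10, 77)); // prints hops: [10, 40, 60, 80]
    }
}
```

Because every hop strictly decreases the distance to the key, the lookup terminates after a small number of hops, even though no peer has a global view of the overlay.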

2.2 wiki engines

One of the technologies which made the Internet more collaborative is the Wiki technology [LC01]. Ward Cunningham invented the Wiki engine in 1994. It proved to be an appropriate

tool for sharing global knowledge, as it provides an easy way to contribute and consume information. Information is shared by writing a Wiki article, which is very similar to a normal webpage. It is edited using a simplified markup language (Wiki markup). Articles can contain other media, such as pictures or sound samples, and can be linked to other articles, creating a knowledge network.

The way Wikis are used, i.e., consuming and contributing information, is naturally suited to the peer-to-peer (p2p) paradigm, which inherently follows the structure of its distributed users. There is no central point of communication; resources are used and offered by all users. The more users use the system concurrently, the more capacity is required. When knowledge is to be shared for free, as in the popular project Wikipedia [wikb], hosting costs should be as low as possible. An additional benefit is diminished censorship, as articles would be redundantly hosted by many distributed users. As a Wiki engine is intended to be easy to use, it is desirable that setting up and running a Wiki engine should not require extensive knowledge.

The crucial feature of a Wiki engine is its support for concurrent modifications of the stored articles. The contribution of an author should not unintentionally overwrite another contributor's changes, which may have happened concurrently. In order to understand the evolution of an article, all modifications should be retrievable in their executed order, preferably with the respective author noted and with the ability to compare the changes made in any of these versions. Although not provided by existing Wiki engines, the ability to retrieve a linked article in the state it was in when it was linked would be helpful. Figure 3 shows a peer-to-peer based Wiki engine based on the results presented in this work.

Existing Wiki engines are implemented in a lightweight manner, using scripting languages like PHP (e.g., MediaWiki, TikiWiki). Additionally, article versions are stored as database entries. They, therefore, typically also implement the functionality of version control systems themselves, instead of reusing a version control system for source code management.

Version
A version is a modified copy of an artifact. All changes to the content of an artifact, which are stored using a version control system, lead to a new copy of the respective artifact instead of overwriting it. These copies are versions of each other. The term version is further distinguished into revision and variant. A revision is an evolutionary change, an update of an artifact. A variant is a variation of the original artifact, e.g., in another language. The distinction is theoretical only, as revisions become variants if they are based on the same artifact version. Looking at a line of modifications which are based on each other, an artifact is a revision; looking from the perspective of an alternative development line, the same artifact is a variant. Thus, revisions and variants are not distinguished further in most version control systems.
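The following toy data structure illustrates the distinction made in the definition box: every stored change creates a new version that records which version it is based on, and two versions derived from the same base are variants of each other. All names are illustrative; this is not PlatinVC's storage format.

```java
import java.util.*;

/**
 * A toy version graph: every stored change creates a new version pointing
 * to the version it is based on. Along one development line each version
 * is a revision of its predecessor; two versions with the same base are
 * variants of each other.
 */
public class VersionGraphSketch {

    record Version(String id, String baseId, String content) {}

    private final Map<String, Version> versions = new LinkedHashMap<>();

    public Version store(String id, String baseId, String content) {
        Version v = new Version(id, baseId, content);   // a new copy, never overwriting
        versions.put(id, v);
        return v;
    }

    /** Two versions are variants of each other if they share the same base version. */
    public boolean areVariants(String a, String b) {
        String baseA = versions.get(a).baseId();
        String baseB = versions.get(b).baseId();
        return baseA != null && baseA.equals(baseB) && !a.equals(b);
    }

    public static void main(String[] args) {
        VersionGraphSketch article = new VersionGraphSketch();
        article.store("1", null, "first draft");
        article.store("2", "1", "improved draft");         // a revision of version 1
        article.store("2a", "1", "translated draft");      // based on the same version ...
        System.out.println(article.areVariants("2", "2a")); // ... hence variants: true
    }
}
```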

2.2.1 Cooperation in a Wiki Scenario

All articles of a Wiki engine are linked to each other. These links connect Wiki articles which have semantically related content. The links should point to a specific version of an article, as updates could completely rewrite the article and change the semantics of its content. In practice, however, an update to an article's content improves the information in the article without changing the overall statement of the article. Links always point to the latest versions and are never (automatically) checked to determine whether the link is still justified.

Link
A link is a connection between artifacts. Often this connection has a semantic meaning, i.e., two artifacts are connected when their content refers to each other. A link carries metadata about the connected artifacts. This excludes information about an individual artifact, but includes all information which cannot be assigned to a single artifact alone. This information therefore describes the connection between artifacts, e.g., the connected versions, the status of the connection, or constraints with which the connection can be validated. A link is not limited to connecting only two artifacts; any number of artifacts can be connected. There is no default navigation direction of a link, although a direction can be specified as metadata.
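A small illustrative data structure for this link concept is sketched below: a link connects any number of artifact versions and carries key-value metadata that describes the connection itself. The class and field names are hypothetical and do not represent PlatinVC's actual link format.

```java
import java.util.*;

/**
 * Illustrative link data structure: connects any number of artifact versions
 * and carries metadata about the connection rather than about one artifact.
 */
public class TraceabilityLink {

    /** An artifact version is addressed by artifact identifier plus version identifier. */
    record ArtifactVersion(String artifactId, String versionId) {}

    private final Set<ArtifactVersion> connectedVersions = new LinkedHashSet<>();
    private final Map<String, String> metadata = new LinkedHashMap<>();

    public TraceabilityLink connect(String artifactId, String versionId) {
        connectedVersions.add(new ArtifactVersion(artifactId, versionId));
        return this;
    }

    /** Arbitrary attributes, e.g., status, constraints, or an optional direction. */
    public TraceabilityLink attribute(String key, String value) {
        metadata.put(key, value);
        return this;
    }

    @Override
    public String toString() {
        return connectedVersions + " " + metadata;
    }

    public static void main(String[] args) {
        TraceabilityLink link = new TraceabilityLink()
                .connect("DesignDocument.uml", "7")
                .connect("Parser.java", "12")
                .connect("ParserTest.java", "3")           // links may connect more than two artifacts
                .attribute("status", "suspect")            // metadata describes the connection
                .attribute("direction", "design-to-code"); // a direction only if explicitly specified
        System.out.println(link);
    }
}
```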

Besides Wiki articles, there are also so-called special pages, which are presented like Wiki articles. These pages, however, contain only computed information, such as a list of all recently updated articles. These special pages often link to multiple articles. A Wiki article becomes automatically linked to special pages and is manually associated with other articles. Even if an article is not manually linked, it is linked to a special page. The connections of all articles in a Wiki form a connected graph. If these articles are not stored in one (version management) system, these connections need to be broken up, creating two separate Wikis.

The articles' authors can be spread around the globe, depending on the Wiki's purpose. In the Wiki application scenario, a usage similar to the Wikipedia project [wikb] is assumed. Here, any person on the globe can participate. The interaction pattern, however, is similar to the user behavior in the global software development scenario (shown in Section 2.3). Potentially, any person could be interested in an article's content. The authoring group, however, is usually small, as only some people have the necessary specific knowledge to contribute to an article. This smaller group behaves much like software developers. Often a main author creates an article and maintains it through editorial updates. When others contribute to the article, their contribution is most likely reviewed by the main author or, if not, by other contributors to the specific article. This interaction fits the peer-to-peer paradigm.

An important issue for a project where knowledge is collected and offered globally is censorship. If all information resides on a central instance, e.g., a server, it can easily be manipulated unnoticed. If access is granted or can be gained illegally, all articles could be deleted or altered. Additionally, the access to this central source could be blocked by a network provider who does not want its users to access the contained information, as happens in some countries today. Distributed articles are stored as multiple copies to increase their availability. To manipulate an article unnoticed, all of its copies would have to be modified. Because the peers on which these copies are stored change dynamically, it is not feasible to access them all unnoticed. Copies stored on machines which are momentarily offline would need to be accessed as well! Similarly, access to the Wiki system cannot easily be hindered if every participant can offer access to the complete system, as offered by the peer-to-peer paradigm.

Additional benefits a decentralized solution would bring are low costs for setting up and maintaining the hardware and software resources needed to run the Wiki system. A peer-to-peer system operates solely on the participants' resources. There is no need for expensive server hardware. The maintenance costs of the hardware are reduced to the costs each participant has to invest to keep his machine operational, regardless of the peer-to-peer software. Software maintenance costs are settled in the same manner by the participants alone. The network connection of each machine has to be configured so that any participant of the Wiki system can be reached. The software which implements the peer-to-peer protocol maintains itself, without the need for human intervention: Connections to other peers are maintained, data is downloaded and cached automatically, and the software could even update itself.

Figure 4: Peer-to-peer based integrated project-support environment (Pipe)[MKS08]

2.3 global software development

A growing number of today's software projects are developed by globally distributed teams [MHK05]. The reasons for distributing the development over different locations around the world are numerous. Outsourcing (motivated by involving experts in a field or simply by decreasing costs) and efficient time management (using the advantage of different time zones) are some of them [MH01]. Although the following statements can be applied to the development process of any kind of project, we focus on software development.

Global software development (GSD) is typically organized as shown in Figure 4: The company leading the project is referred to as the project leader. It employs multiple subcontractors. In IT projects it is not unlikely that open-source software will be integrated, so that the open-source developer community becomes part of the project. The developers of the mentioned parties work in different physical locations. Additionally, some workers, which we call nomadic developers, travel between those locations.

In addition to cultural differences [PAE03], GSD must cope with a lack of appropriate support tools. The most widely used CASE tools are designed for on-site development where multiple clients are using one server, i.e., the client-server communication paradigm. Numerous field studies of GSD projects [HPB05, HM03, Šmi06] show that the biggest problem in using these tools is the slow message transfer between physically distant machines. Recently developed tools aimed at supporting GSD, like Jazz [Jaz10, CHSJ03], still rely on client-server communication. A centralization of the communication by a server is orthogonal to the natural structure of GSD. Developer teams normally manipulate the same artifacts, and there will be some form of communication between them. This communication (including message exchange and file transfer) obviously does not need a remote server and should proceed directly from peer to peer. A peer-to-peer based approach is naturally suited for GSD as it is inherently distributed; there is no central point of communication. After a distributed routing algorithm connects the participants, they communicate directly with each other.

Figure 5: Peer-to-peer based integrated project-support environment (Pipe)[MKS08]

We can see a peer-to-peer based prototypical development environment, which is based on the results of this work, in Figure 5. It implements a basic set of features, comprising a version control system, access control, and an instant messenger that highlights the last author of the currently opened document [MKS08].

Every software development process consists of different phases. The initial requirements engineering phase is followed by the design phase, in which the architecture of the desired software product is planned. Following this design, the work on the implementation begins, followed by the testing phase. In various software development processes these phases are divided further, as are the transitions between them. Nevertheless, all phases themselves are present in every software development process.

We can define different roles, which usually relate to the development phases, e.g., requirements analysts, programmers, etc. (see [Som10]). Usually there are additional roles that correspond to tasks not directly connected with the development of the software, such as management or customer support. These roles are, however, not all assigned to different people; for example, the manager usually also takes the role of a requirements analyst and designer. In a GSD project the cooperating persons work in different, often very distant, locations. Through the distribution of the persons working in the different roles, the development phases are practically distributed as well. The distribution can be both vertical (e.g., design at one site, development at another) and horizontal (e.g., multiple sites working in the development phase on different subsystems). Much like in distributed computing, the project should be carried out in sub-projects that are as independent as possible [HM03].

The evolution of software development processes demonstrates that the development phases interact with each other. Artifacts produced in one phase are based on artifacts of former phases. Developments in later phases may change artifacts of earlier phases. This is especially the case when moving from the design to the implementation phase, or when the test phase influences the implementation phase. Therefore, development teams working in different phases have to interact with each other.

As there is no substitute for direct face-to-face communication [ISHGH07], a common practice in GSD is to let either developers or dedicated consultants, called nomadic developers, travel between the project's sites. Additionally or alternatively, software projects can be distributed by dividing the product's components among teams in different locations. It is good practice to couple them as loosely as possible (as advised by [GHJV95]). Components of a software project are connected through well-defined interfaces. Thereby the dependencies between the components are theoretically minimized, but experience from real software projects shows that the interfaces are modified several times during the development process. In practice, the development teams of different components have to interact as well.

Supporting tools for GSD can have many different functionalities. Shared knowledge could be managed using a Wiki engine, which has proved to be useful for collaborative requirements engineering, as demonstrated by [HRH07]. In whichever way the development is distributed, the geographically separated teams have to interact with each other. The most important part of this interaction is the exchange of development artifacts. It is crucial for any software engineering project to track all changes to the artifacts using a version control system. It provides a common pool, called repository, where all development artifacts are available to any user at any time. Artifacts are produced locally and shared with the help of a version control system, which communicates globally. The tools used to create these artifacts do not need to communicate between different sites. Although this could be helpful, for example to edit an artifact with multiple authors at the same time, it is not necessary. A version management system should support the exchange of artifacts and keep track of modifications to them, regardless of which team of which company in which location is involved.

Artifact: Every document that is produced by a tool in any phase of a development process is an artifact. These artifacts can have any binary or textual representation (compare to [UML09]). Modifications to the content of an artifact produce new versions of the same artifact. A specific version of an artifact can be addressed by concatenating the version's identifier to the artifact's identifier (which is in most cases the artifact's name).

If cooperating teams are using their own server environments, significant synchronization problems may arise. These problems occur mainly in version control systems and may also affect access rights. In practice, these problems are ignored and artifacts are exchanged through numerous other channels like e-mail or USB flash memory drives. The solution proposed in this work aims to make such exchanges via USB flash memory drives or e-mail unnecessary. In addition to the stated problems, the alternative of a centrally installed system raises the question of ownership, as one of the involved parties needs to maintain this system.

2.4 benefits of peer-to-peer paradigm for the application scenarios

A (distributed) version control system is crucial for Wiki engines and an integral part of a development environment, especially if the latter is used in a globally distributed project [Šmi06]. Additionally, a Wiki engine can enrich a GSD environment by providing a knowledge database or even serving as a lightweight collaborative requirements engineering tool, as shown in [GHHS07b, GHHR07b]. Both application scenarios have similar requirements, which are detailed in Section 3.2.

The majority of Wiki engines rely on a centralized version control system. GSD projects use centralized server based solutions as well. The centralization on a server is orthogonal to the naturally distributed structure of Wiki engines and GSD. In a Wiki system every user is a possible contributor and consumer. Often a user contributes to an article she consumed earlier. Developer teams in GSD projects normally manipulate the same artifacts, and communication will usually occur between them. This communication (including message exchange and file transfer) obviously does not need a remote server and should proceed directly from peer to peer.

2.4.1 Shortcomings of Server-centric Solutions

A centralized approach to collaborative applications, like version control, has several drawbacks:

Single point of . . .

. . . failure: It lies in the nature of server based solutions that all services are offered from a single point. If this point fails, the complete system is unusable.

. . . vulnerability: If someone gains unauthorized access to the central server, he is in complete control of the offered service and stored data. This data could be stolen or modified, e.g., censored.

. . . ownership: There is only one party which controls the central server by owning it. If offered services are to be used and, moreover, data is to be stored, this party needs to be trustworthy. Especially when equal partners in a GSD project are working together, neither partner would like to rely on the other, or be responsible for the safety of the project server. Its failure and possible loss of data could lead to the failure of the entire project.

Fixed Location: The server resides in a fixed physical location. If the connectivity to this location is bad or slow, the offered service will perform poorly. Unlike in on-site development, some developers in GSD are geographically distant from the server and, therefore, have to cope with delays in data transfer [HM03]. Distributing servers across different locations can only decrease the communication delay for all developers if all locations are covered. However, it leads to the problem of synchronizing these servers with each other.

Does not scale with needs: If the offered service is suddenly used by an unexpected number of participants, the server may run out of resources, which might lead to a failure of the system, or at least some users will not be served. Additional capacities have to be bought and installed, while most of the time the full capacity remains unused. Unexpected events (like natural disasters or political assassinations) can lead to a high interest of many users in the Wiki scenario. External factors can also influence the load on a GSD supporting server, although these can be predicted more easily.

Additional maintenance costs: Another important issue of the client-server approach is its high maintenance costs. In addition to the multiple client machines, which are maintained by their respective users, the machine that the system relies on, i.e., the server, has to be kept running. Software needs to be configured and updated, and hardware needs to be maintained and replaced when it wears out. Additionally, resources (computation power, memory, bandwidth, and storage space) in the system are not optimally used: servers provide all their resources while resources on client machines remain unused most of the time. The cost of servers is especially problematic for non-profit projects (like open source projects), as they are normally funded by unreliable donations.

Multiple different systems cannot be joined: If cooperating teams work with their own server based environments, they cannot easily merge their infrastructures. They would have to set up a new infrastructure for the planned cooperation. In practice, each team continues to work on its individual system and exchanges the cooperatively edited artifacts through numerous channels like e-mail or USB flash memory drives. Conflicts cannot be detected automatically and may be noticed a long time after work on the conflicting artifacts has been completed (e.g. when the final product is to be integrated). If each team uses its own version control system, some artifacts exist in an identical version in both systems - without being marked as identical. Apart from these significant synchronization problems, access rights might also be compromised.

To gain capacity and to soften the locality problem, a cluster of servers is often used. These servers can be distributed over different physical locations. They can offer copied services and data, or share the load by being responsible for a smaller part of the entire service/data, or by serving only users from specific areas (i.e. IP ranges). However, although extended, the offered capacity remains fixed and the servers remain bound to fixed locations. Therefore, the communication delay cannot be decreased for all users. Additionally, the servers still have to be administered by a central party.

2.4.2 Characteristics of Peer-to-Peer based Solutions

A distributed peer-to-peer based solution copes with the aforementioned drawbacks of server based solutions. The offered service is a distributed algorithm which runs on each participant's machine. Thereby, each user is offering services to other users and accessing services from other users at the same time. Data is stored distributed among multiple machines, with redundant copies to assure availability. The users' requests are distributed among multiple machines. The offered service exists as an identical copy on each machine, but might require the cooperation of multiple machines. The peer-to-peer routing algorithm takes care of connecting two participants, who afterwards communicate directly with each other.

Robust against . . .

. . . failure: A peer-to-peer system is designed to handle parts leaving the system. Any participant can leave at any time; his machine can even fail. Failing peers are detected and automatically replaced by their neighbors in the virtual overlay network. Data and service states are replicated prophylactically on these machines. The routing algorithm automatically directs requests to the replacing machine. The number of neighbors which replicate a machine's state is configurable and called the replication factor. Only in the unlikely case that all of these machines fail simultaneously within a time window that is smaller than the time needed to replicate the state to a new machine, the state and stored data are lost until one of these machines reappears. The system is, however, still functional and only some data would be missing. When a huge number of participants fail, the routing algorithm might not work as it should due to missing routing information. The network could split into loosely connected or completely disconnected partitions. The system would remain functional in each partition, but some data might be missing. A minimal sketch of how a replication factor determines the responsible peers is given below.

. . . central control: As the data is randomly and dynamically distributed among the participants, controlling a single participant does not grant access to all stored data. Data cannot be modified unnoticed, because it exists as identical copies on a number of replicas.
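To illustrate the idea of a replication factor, the following minimal Python sketch maps peers and keys onto a common identifier ring and selects the closest peer together with its successors as replica holders. The peer names, the hash function, and the default replication factor of three are illustrative assumptions and do not correspond to any concrete overlay protocol.

    import hashlib
    from bisect import bisect_left

    def ring_position(name: str) -> int:
        # Position of a peer or key on the identifier ring (SHA-1 based).
        return int(hashlib.sha1(name.encode()).hexdigest(), 16)

    def replica_peers(key: str, peers: list, replication_factor: int = 3) -> list:
        # The peer whose identifier follows the key's position is responsible;
        # its successors on the ring hold the additional replicas.
        ring = sorted(peers, key=ring_position)
        positions = [ring_position(p) for p in ring]
        start = bisect_left(positions, ring_position(key)) % len(ring)
        return [ring[(start + i) % len(ring)] for i in range(min(replication_factor, len(ring)))]

    peers = ["peer-berlin", "peer-kolkata", "peer-frankfurt", "peer-sydney"]
    print(replica_peers("project/readme.txt", peers))
    # When one replica holder fails, recomputing the set over the remaining
    # peers yields the neighbor that transparently takes over its data.
    print(replica_peers("project/readme.txt", [p for p in peers if p != "peer-kolkata"]))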

Collective ownership: There is no single owner of a peer-to-peer system, as it utilizes the resources every participant offers. A central control is difficult to achieve.

Omnipresent: A peer-to-peer system is available on all participating machines at the same time. In the best case, the requested resources are offered by one's own machine; in the worst case the offering machine is far away. Due to the dynamic changes in the network caused by leaving and joining peers, the actual communication delay to an offered resource changes, since leaving peers are replaced by other peers. Peers are uniformly distributed, thus the delay can get smaller, if a physically closer peer takes over, or larger, if a distant peer replaces the leaving peer. In a highly dynamic environment, where peers frequently leave and join, the communication delay experienced over a longer time is approximately the average delay among all machines. However, if peers stay in the network for long times and/or do not change their identifier when joining, the communication delay stays the same. When peers in one physical location leave, their offered resources are replicated and offered among the peers remaining in the system. In a system which consists of machines in two physical locations, shifted online times due to different time zones have a positive side effect we call shifting: the resources offered in the distant location are automatically shifted to the peers in the other location, reducing the communication delay to access those resources for the peers in the same location. There are different caching mechanisms (e.g. owner path caching [CSWH00]) that can lower the experienced communication delay. However, routing to a peer offering a specific resource takes considerable time as well. If the peer has not been contacted recently, a lookup message has to be sent to a number of peers in order to find the peer offering the needed resource. Subsequent messages can be sent directly using the underlying network's address.

Scalable: The strongest advantage of peer-to-peer systems is their inherent ability to grow in capacity with a growing number of participants, as each participant's resources are utilized. Fast growth nevertheless stresses the system, as it needs time to reconfigure itself to the new number of participants, which involves updating routing contacts and redistributing stored data. A sudden interest of a large number of participants in retrieving the same data also stresses the system, as the requests are unevenly distributed and ask for a service offered by only a small number of the participants.

Minimal maintenance costs: Cost of ownership and maintenance costs are very low. As in client-server environments, each individual machine has to be maintained by its user; unlike in those environments, however, there is no additional machine that needs to be maintained. Thus there is no cost to set up the system other than starting each individual machine and contacting an already connected machine. The dynamic changes in the system are handled without user intervention by maintenance and failure handling algorithms. Failing peers are the expected standard behavior, called churn (see [SR06, HT07]).

Churn: The dynamic behavior of a peer going offline and online again is called churn. A peer can leave the system gracefully, i.e., announcing its departure to give its neighbors the opportunity to take over its duties, or simply fail. In the latter case an automatic mechanism detects the absence of the peer and a neighbor takes over its duties. Although not strictly limited to it, churn usually refers to the case where a number of peers are failing. The behavior of peers using file sharing applications has been observed by [SR06] and can be described with a Weibull distribution; a minimal simulation sketch is given below. Further details are discussed in Section 10.3.4.

As long as one machine is running, the system is functional, although a large number of simultaneously failing peers might seriously harm the system's behavior.
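The following sketch, a simplifying assumption rather than measured data, draws peer session lengths from a Weibull distribution, as reported for file sharing peers in [SR06]; the shape and scale parameters are purely illustrative.

    import random

    # Illustrative Weibull parameters: a shape below one yields many short
    # sessions and a long tail of peers that stay online for a long time.
    SCALE_MINUTES = 40.0
    SHAPE = 0.6

    def sample_session_lengths(count: int, seed: int = 1) -> list:
        # Draw hypothetical online times (in minutes) for `count` peers.
        rng = random.Random(seed)
        return [rng.weibullvariate(SCALE_MINUTES, SHAPE) for _ in range(count)]

    sessions = sample_session_lengths(10)
    print([round(minutes, 1) for minutes in sessions])
    # A churn simulation would let each peer leave after its sampled session
    # length and rejoin later, exercising the replication and routing
    # maintenance algorithms described above.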

Disjunct systems are joined automatically: If coexisting peer-to-peer systems which offer the same service should be joined to form one system, no additional overhead is necessary. If one machine contacts participants of both systems, the systems join each other and synchronize automatically. From the point of view of each system this situation is identical to a large number of joining machines.

Figure 6: Globally distributed developers in a GSD scenario

2.5 running example

For a better understanding, a running example is outlined in the following. Whenever a mechanism is explained, the example used will be based on this background story. The proposed version control system will be most useful in a GSD environment, as described in Section 2.3. In a project with globally distributed partners, a software product is created. The outlined scenario is depicted in Figure 6. The example is inspired by the development process of the open source software project Eclipse. As in any open source project, a number of independent programmers develop the product, mainly in their free time. Additionally, some companies are involved, which work full-time to improve the product. Next to these developers, some companies solely focus on the development of components, which they often sell as their own product. As these optional extensions depend on the open source product, they are involved in the core development as well.

In the outlined example we have four parties. A client company, called the stakeholder or short S, needs a software product which fulfills a specified functionality. The company Berlin Consulting, shortened to company A or CA in the following, was hired to create this product. As experts in the domain the client is in – let us assume it is the banking business – they interact directly with S. Both companies are located in Germany: S is located in the city of Frankfurt, while CA operates from Berlin. Meetings with the client are usually held in Frankfurt. In order to cut costs, CA subcontracted a company specialized in coding, which resides in Kolkata, India. The company's name is Calcutta Coding, which we will refer to as CB in the following. CA and CB interact often and directly, while S interacts with CB only indirectly by exchanging artifacts.

The planned software product consists of components which are partially developed by open source projects. Open source components are often included in commercial products, in the form of libraries. In this example, an open source component is an integral part of the product. The fourth involved party are, therefore, the numerous open source developers, who are spread around the globe. None of them are active at all times, and some of them are only active for a period of time, without ever continuing their participation. These independent and often anonymous developers are referred to as OS. The behavior of these open source developers is unpredictable. Their number varies, as does their working time. Their availability cannot be relied upon.

During the progress of the project, participants from the different groups have to meet in person [ISHGH07]. In doing so, the complete working environment with all artifacts should become available at the location where the meeting takes place. Changes made in another group's location should be trackable in the usual environment, thus eliminating the need to maintain identical copies in different systems manually. To personalize the examples in this work, the following workers are selected: Alice is a designer and developer of company CA, Bob is her counterpart in company CB.

2.6 summary

This chapter argued why traditional version control systems, which are based on the client-server communication paradigm, are not well suited for applications whose users are spread around the globe. The listed shortcomings of client-server based solutions and the benefits that the peer-to-peer paradigm can bring apply to any distributed application. Statements focused on the drawbacks of existing solutions are detailed in Chapter 5. The benefits the peer-to-peer based solution presented in this work brings are detailed in Section 2.4.

Two application scenarios were laid out where version control is crucial: Wikis and global software development. The more demanding scenario is global software development, which covers most of the requirements derived from the Wiki scenario. Additionally, the Wiki scenario can be combined with the global software development scenario. A running example, which is used throughout this work, details a usage story in which both application scenarios are integrated. The focus of this example and the presented solution, however, lies on global software development.

REQUIREMENTS AND ASSUMPTIONS 3

In the previous chapter we explored application scenarios where version control is a vital functional part and analyzed the suitability of the peer-to-peer communication paradigm for these scenarios. That provides us with a basis for defining a complete set of requirements for a peer-to-peer version control system that fully meets the needs of today's collaborative development. In Section 3.1 we list the assumptions for the design of the final solution. Functional and non-functional requirements, derived from the elaborated application scenarios and the limitations of the technologies used, are presented in Section 3.2.

3.1 assumptions

The goal of this thesis is to present a distributed version control system that supports both presented application scenarios adequately. Their needs differ, but complement each other when combined. Both application scenarios need a system to keep track of changes to stored data. Concurrent modifications have to be supported. While the wiki scenario only requires a focus on single files, the GSD scenario demands the control of modifications to multiple files, which should be tracked as one event (e.g. snapshot version control). A user in the wiki scenario is usually not interested in all existing articles, whereas a developer in the GSD scenario needs updates on all files. Variant management of artifacts is needed in a GSD environment as well as in the wiki scenario, where a variant could represent the same article in different languages. A common practice is, however, to have an isolated system for each language. Nevertheless, if a version control system can handle the stricter requirements of GSD, the requirements of a distributed wiki engine are also fulfilled.

Snapshot: The momentary state of all files in a specified folder in a file system is recorded in a snapshot. Thereby, not only the modification of single files but also the state of all other files at this moment is recorded. Every snapshot taken represents the actual state of a user's files at a specific time. In Subversion, for example, a snapshot can be a mixture of the modifications of different users on different files: if one user always changes one file, and another user changes another file, and both commit their changes without updating their working copies first, the recorded snapshot includes the changes to both files merged, instead of only the one file changed in each user's working copy. All snapshots are recorded in the history in the order they are based on each other. A minimal sketch of recording and comparing snapshots is given below.
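As an illustration of the snapshot notion, the following sketch records the momentary state of a working copy as a mapping from relative file paths to content hashes; the helper names are hypothetical and not part of any existing version control tool.

    import hashlib
    from pathlib import Path

    def take_snapshot(working_copy: Path) -> dict:
        # Record the state of every file below the working copy at this moment.
        snapshot = {}
        for path in sorted(working_copy.rglob("*")):
            if path.is_file():
                digest = hashlib.sha1(path.read_bytes()).hexdigest()
                snapshot[str(path.relative_to(working_copy))] = digest
        return snapshot

    def changed_artifacts(old: dict, new: dict) -> set:
        # Comparing two snapshots reveals which artifacts were added,
        # removed, or modified between the two recorded states.
        return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}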

The behavior of the participants in the GSD scenario is more predictable than the users' behavior in the wiki scenario. They tend to use the version control services in a repetitive pattern. They are clustered in globally distributed locations, while the wiki users are distributed randomly. They tend to stay longer in the system and leave at predictable times, compared to the unpredictable access of a wiki user, who usually leaves once his needs are served. By supporting the more dynamic behavior of a wiki engine user, a GSD developer can be supported as well. Open source developers, however, behave similarly to wiki users, as their participation is less predictable. With the aforementioned assumption that the described GSD scenario's requirements comprise the wiki scenario's needs, the proposed peer-to-peer version control system focuses on the GSD scenario. The decentralized, peer-to-peer based version control system presented in this work is based on the following assumptions:


A-1: Network connection: A user needs to have an IP connection to other participants of the system. This connection, however, does not have to be permanent while using the system. It is only necessary to exchange committed versions with other users. All other system services are functional without any network connection.

A-2: No heterogeneity: All participating machines are treated equally. The system requirements of our prototype are, however, low enough that a typical development computer should be able to fulfill the most demanding tasks.

A-3: No dedicated single point of control: To overcome the shortcomings of existing solutions (mentioned in Chapter 5), our solution is designed to operate without a central, irreplaceable instance. This fact, however, does not limit the retrievability of the controlled files.

A-4: Unreliable user participation: Each participant can leave the system at any time.

A-5: Users are located anywhere on the globe: A user can participate from anywhere, where he can reach any other participant via network connection. Users can be physically close to each other or widely spread around the globe.

A-6: File format: Our prototype handles any kind of files. Text files, however, are supported more effectively, as line-based delta compression can be applied instead of binary delta compression to efficiently store the versions of a file.

3.2 requirements

The following requirements are derived from investigations of the very popular wiki project Wikipedia [wikb] and selected industrial field studies on GSD [HPB05, HM03, PAP06, Šmi06, Sou01, ISHGH07]. Additional requirements emerge from the usage of peer-to-peer technology. The requirements are presented as proposed by [RR06], divided into functional and non-functional requirements. The last section contains security considerations we are aware of. However, meeting them is not within the scope of the presented version control system. All requirements are listed in the comparison overview in Table 1 in Chapter 6.

3.2.1 Functional

Functional requirements are directly visible to the user as the system's features. They can often be triggered directly or indirectly through the system's offered services.

R-1: Support common version control operations: A minimum set of operations, common to almost all version control systems, should be supported. The following requirements state them:

R-1.1: Support parallel work on all artifacts: Developers should not block each other's work when using a version control system. They should be able to work on the same artifacts at the same time. The version control system deals with contradictory changes, which is covered by requirement R-1.2 and requirement R-1.3.

R-1.2: Detect concurrent modifications: Modifications to the same artifact by different users should not go unnoticed. Users expect their contribution to be the only one in existence, but whenever the same base version is changed, concurrent versions are created. Some systems (mainly the centralized version control systems) prevent a concurrent version from being committed - the conflicting version's changes have to be integrated first. Some systems (mainly the distributed version control systems) allow concurrent versions to exist in the repository as parallel versions, but encourage merging parallel versions into a new version. A user should always be notified about concurrent modifications, which occur when concurrent changes are based on the same version or snapshot. Regardless of whether the modifications are conflicting, or whether different artifacts in a snapshot were changed, the user should be informed. Merging concurrent modifications automatically does not guarantee the prevention of errors, as dependencies might exist between different parts of an artifact or between different artifacts. All recorded versions in a repository should be acknowledged by a user, not created automatically. (A minimal sketch of detecting concurrent modifications follows after this list.)

R-1.3: Resolve concurrent modifications: The user should be assisted in resolving conflicting concurrent modifications. If the modifications do not conflict with each other, i.e., there are no contradictory changes in the same line of the same file, the changes could be merged automatically - but the user should revise this, as only he is able to understand the semantic context of the artifacts involved. The system has to show the user how concurrent modifications differ, so that the user can merge them into a new snapshot by choosing which modifications should be applied.

R-1.4: Track single artifacts in an ordered history: Changes to single artifacts should be recorded in the sequence in which they evolved. Whenever the content of an artifact is modified, a new version is created. The version is based on the original artifact. A predecessor/successor relationship between the created versions could either be explicitly stored or computed using the relationship between the snapshots which the changes are part of.

R-1.5: Track snapshots in an ordered history: Changes to a set of artifacts should be tracked in a version history. This history stores the additional information about the state other artifacts were in when a specific artifact was in a specific state. Tracking the state of single files only, as demanded by requirement R-1.4, could not answer which version of one artifact is related to a chosen version of another artifact. All modifications of all changed files are recorded in a changeset. Adding the implicit information about which files were not changed forms a snapshot. As pointed out in requirement R-1.4, multiple snapshots should be stored in an ordered history to represent their evolution.

R-1.6: Retrieve a recorded state: A snapshot represents the state of all artifacts in a project of a specific user at the recorded time. Thus, even if the user did not update his working copy to integrate the latest changes, a new snapshot recording the user's latest modifications should be an exact copy of his working copy (without incorporating the unintegrated changes submitted earlier by other users).

R-1.7: Choose recorded versions of selected artifacts: Similar to the normal behavior when single artifact versions are tracked, versions of selected artifacts only should be retrievable. This behavior is needed in a system which records snapshots of a project, as demanded by requirement R-1.5. The ability to pick specific versions of only some artifacts from recorded snapshots, called cherry picking, is helpful whenever diverging development lines are to be combined. If the development is distributed among different teams, which change some files, their contribution needs to be extracted in order to integrate all modifications into a new snapshot, forming a final product. Especially in GSD, being able to do this is helpful. A user of a wiki engine may not be interested in all artifacts (or articles in this case) but in a subset, which can consist of an article with its meta information and maybe a related discussion page. The directly semantically connected artifacts (e.g. a discussion page or other meta information) should be retrieved in a matching version - they would be useless otherwise. If such artifacts exist, the supporting version control system must be able to record this relation. A system which fulfills requirement R-1.5 and requirement R-1.7 enables the retrieval of coupled artifacts in a matching state, without the need to retrieve all existing artifacts.

R-1.8: Individual working copies: Each developer should have his own working copy, where he can modify artifacts without sharing his modifications immediately with other developers. Initiating the commit operation records the current state of the files to form new versions. In her working copy only the user herself manipulates files and decides which updates from the repository to integrate.

R-1.9: Enable local commits: Modifications which are not final should not be shared with other developers. They bring the project into an unstable state, e.g., hindering compilation of a software project. Those modifications would only harm the other users' progress. Having a working copy, a developer can postpone sharing her changes to the artifacts - but to be able to restore older states in a more fine grained way, e.g., to correct mistakes, it is better to have a two stage commit process. First, changes should be recorded only locally without sharing them; later a user can decide to push them so that other users can access her new versions.

R-1.10: Track variants (branches): Besides evolutionary changes, which are based on each other, variant changes should be tracked as well. Variants are based on the same version and coexist in separate branches. In contrast to conflicting revisions, variants are not meant to be merged into a new version. However, changes made to one variant might have to be integrated into the other variant. A version control system has to support variants, usually by offering branches.

R-1.11: Tag with labels: When developing a software product it is helpful to mark certain states of the project with custom labels. The abstract internal version numbers which version control systems assign to recorded changes can hardly be memorized by a user. A tagging mechanism should support labeling specific snapshots with tags such as "release 1.0" or similar user friendly names.
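As a minimal sketch of requirement R-1.2, the following data structure records for every snapshot the snapshot it is based on; a commit whose parent is not the current head is flagged as concurrent, so the user can be asked to merge (requirement R-1.3). The class and method names are illustrative only and do not describe the design of the proposed system.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Snapshot:
        id: str
        parent: Optional[str]           # snapshot this one is based on
        changes: dict                   # artifact path -> new content hash

    @dataclass
    class Repository:
        head: Optional[str] = None
        snapshots: dict = field(default_factory=dict)

        def commit(self, snap: Snapshot) -> str:
            # Record the snapshot and report whether it is concurrent.
            self.snapshots[snap.id] = snap
            if snap.parent == self.head:
                self.head = snap.id
                return "applied"
            # Another snapshot with the same base version already exists:
            # notify the user and require a merge (R-1.2, R-1.3).
            return "concurrent - merge required"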

R-2: Change the name or path of artifacts: The name or the path of artifacts should be changeable, without losing the recorded history.

R-3: Support for multiple projects: When globally distributed companies collaborate the used version control system should be able to handle multiple projects in individual modules of the same repository. This avoids organizational overhead from managing multiple systems. The ability to partition the entire repository into projects might be useful for the wiki scenario as well. Categories of articles or articles for different countries in different languages could be organized in separate projects.

R-4: Conjoined local repositories: A study of nine GSD projects [HPB05] showed that each site had its own version control server so as to avoid the constant delay introduced by using a common one. The produced artifacts, therefore, had to be synchronized, which involved manual work in which errors occurred. In order to avoid these problems, the system should offer a repository at each site, which is connected and synchronized with all other repositories.

R-5: Redundant backups: Storing all states an artifact has ever been in serves two purposes: sharing them with other developers, who can thus understand the evolutionary changes, and conserving artifacts in case they are needed at a later time. The latter purpose offers a backup functionality. Version control systems are not initially designed to offer a reliable backup function. Some (e.g. the distributed version control systems) completely omit this feature, some (any centralized version control system) can be extended to enable backups, and others offer reliable backups as a side effect of their architecture (e.g. many peer-to-peer based version control systems).

R-6: Interoperability of different VCSs: Sometimes different parties are unwilling to switch to the same new version control tool. Porting all data to a new environment might be impractical in a running project. The global version control system should be able to integrate other systems to support a heterogeneous tool environment. Thus, existing systems could be merged automatically, using the global version control tool.

R-7: Connect from anywhere: The tool environment should allow any developer to take the role of a nomadic developer at any time. Nomadic developers are assumed to work in different places, even during their travels. Wherever a user can get access to the Internet, he should be able to access the system. Specific network setups, such as NAT, can prevent this.

Network Address Translation (NAT): A router maps a single external IP address to multiple internal IP addresses, so that internal hosts are able to share an external address. In this setup a connection to an external host can only be initiated by an internal host.

R-8: Offline version control: Nomadic users, in particular, need to be able to work offline (e.g. during travels). Modified artifacts should be transparently synchronized as soon as the user comes online.

R-9: Local commits: It should be possible to commit changes locally only, so that they can be shared globally at a later time. With this functionality, fine grained changes which are not yet intended to be seen by other developers can be kept locally until a stable version is created. Once this stable version exists, all locally recorded changes can be shared in one step. A more comprehensive motivation for this requirement can be found in Section 4.2.

R-10: Traceability: Artifacts may contain links to other semantically related artifacts. It should always be possible to trace and locate those linked artifacts in the system. Traceable links are used in a wiki engine to navigate from one article to another, but would be helpful in a software development environment as well, to trace the development through different phases and the different resulting artifacts, or to record dependencies for the build process.

R-11: Easy to set up/manage: The system should be as easy as possible to handle. If the setup and management of the version control system is complicated, the user group is limited to those with enough knowledge to handle the system. The harder and more time intensive the installation of a tool is before its use, the more unlikely it is to be used. Additionally, an incorrect configuration is more likely to occur. The system should enable, e.g., a small number of developers to spontaneously meet for a small project, where a low startup time is important.

R-12: Prevent censorship: Once any information resides in the system, it should not be deletable. Artifacts can be marked as deleted, but still have to be stored, so as to be retrievable at any time as postulated by requirement R-1.6: Retrieve a recorded state. It could be in the interest of a participant to remove information which was inserted by mistake. This, however, should be accomplished by setting access rights (if access control is provided as well). Allowing deletion would prevent requirement R-1.6: Retrieve a recorded state from being fulfilled. Additionally, unnoticed manipulation of stored artifacts should be impossible. Any modification should be traceable to its author and revertible by retrieving its parent version. Tracing changes could be used in a wiki engine to repair vandalism or censorship (i.e. by finding all changes made by a specific author).

3.2.2 Non-functional

Non-functional requirements are not directly visible to a user. They describe how the system should behave while fulfilling the functional requirements stated before.

R-13: ACID: Every transactional system should fulfill the ACID properties introduced in [HR83]. ACID is an acronym for:

R-13.1: Atomicity: Every transaction, i.e., all changes to all artifacts a user wants to commit at once, should be applied completely or not at all. No partial changes are allowed to be stored in the system (a minimal sketch of such all-or-nothing behavior follows after this list).

R-13.2: Consistency: The repository should be in a consistent state after a transaction. A consistent state is present when the specified integrity constraints hold. These constraints are stated in Section 4.4.2. If coherency is not guaranteed, consistency cannot be provided.

R-13.3: Isolation: Transactions should not influence each other. The result of concurrent commits should be the same, no matter in which order they were applied. Conflicts should also be detected, regardless of the order in which the conflicting snapshots were committed.

R-13.4: Durability: The result of a transaction should be stored permanently. Even after a (partial or full) system failure, all committed artifacts should be retrievable. Partial system failure, i.e., peers leaving the system, is the normal case in a peer-to-peer system, as every participant can leave at any time.
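The following sketch illustrates the atomicity property (R-13.1) under the simplifying assumption of an in-memory artifact store: either all changes of a commit are applied, or the store is rolled back to its previous state. The function and variable names are hypothetical.

    def apply_changeset_atomically(store: dict, changes: dict) -> None:
        # Remember the previous state of every artifact touched by the commit.
        backup = {path: store.get(path) for path in changes}
        try:
            for path, content in changes.items():
                if content is None:
                    raise ValueError(f"invalid content for {path}")
                store[path] = content
        except Exception:
            # Undo partial writes so that no half-applied commit remains visible.
            for path, old in backup.items():
                if old is None:
                    store.pop(path, None)
                else:
                    store[path] = old
            raise

    repository = {"a.txt": "v1"}
    apply_changeset_atomically(repository, {"a.txt": "v2", "b.txt": "v1"})
    print(repository)   # both changes were applied together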

R-14: Availability: There should be no constraints on accessing the system due to the network a participant is connected to, the time of access, or the working platform (e.g. operating system). Note that the demand to always be able to access stored data is covered by requirement R-13.4: Durability, while network conditions are covered by requirement R-7: Connect from anywhere.

R-15: Transparency: It should be transparent to the user how the distributed system works. She should not need to worry about where to find the offered services, how to access them or where the data is stored. The distributed system should behave as if everything runs on the local machine. This is stated as a general design goal for distributed systems in [CDK05].

R-16: Minimal storage space consumption: The stored changes should consume as little local storage space as possible. To fulfill this requirement, typically delta compression is used. A positive side effect is the reduction of the communication bandwidth consumed while transferring changes.

Delta compression: Delta compression is an efficient way to store all versions of an artifact in a single location. By storing only the differences between two versions, called a diff, only the actual changes are recorded instead of the full artifact that forms a new version. The delta can be computed using either the previous (backward delta) or the following version (forward delta). Starting from a complete version, any version can be reconstructed by applying the deltas between those versions. For a faster retrieval of versions, specific versions are saved in their full form, consuming more storage space. The diff is most accurate when calculated on text files; here only the differing text lines are stored. A diff can also be calculated on the basis of a binary representation of a file, called a binary diff. In the so-called binary delta compression, the differing bits are stored instead of the full version of a file. Note, however, that the amount of changes and the size of the resulting diff are not necessarily correlated. A minimal sketch of forward delta compression is given below.
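The following sketch illustrates forward delta compression for line-based text files: only the regions that differ from the previous version are stored, and the newer version is reconstructed by replaying them. It is a simplified illustration, not the storage format of any particular version control system.

    import difflib

    def make_delta(old: list, new: list) -> list:
        # A forward delta stores, for every differing region, the operation,
        # the line range it covers in the old version, and the new lines.
        delta = []
        matcher = difflib.SequenceMatcher(None, old, new)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                delta.append((tag, i1, i2, new[j1:j2]))
        return delta

    def apply_delta(old: list, delta: list) -> list:
        # Reconstruct the newer version by copying unchanged regions from the
        # old version and inserting the recorded changes.
        result, cursor = [], 0
        for tag, i1, i2, new_lines in delta:
            result.extend(old[cursor:i1])
            result.extend(new_lines)
            cursor = i2
        result.extend(old[cursor:])
        return result

    v1 = ["def greet():", "    print('hello')"]
    v2 = ["def greet(name):", "    print('hello', name)"]
    delta = make_delta(v1, v2)
    assert apply_delta(v1, delta) == v2   # only the changed lines were stored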

R-17: No centralized services: In order to avoid the shortcomings described in Section 2.4.1 the version control system should not rely on centralized services, even if these services are not bound to a specific machine, but are offered by a single peer when it is online.

R-18: Adequate communication speed: The delay of communication and transfer in the system should be minimal. Difficulties introduced by communication delays were observed in the field studies in [HPB05] and [HM03] and concluded to be critical to the project success. In these field studies client-server based version control systems were used.

R-19: Scalable and robust: A system should be able to tolerate the dynamic participation of its users without failing or degrading its performance.

Robustness: The robustness of a system shows itself when a large part of it becomes unavailable within a small amount of time. It is measured by the degree to which its operations are affected. When, for example, a turbine of an airplane fails, a robust airplane can continue its flight without disturbance. A peer-to-peer application shows itself robust when a large number of its users are failing. Scalability, in contrast, shows how a system performs under an increasing load coming from a growing number of users.

R-19.1: Robust to highly fluctuating users: In open source projects, as well as in the wiki scenario, the number of participants fluctuates. Their behavior is hard to predict. There can be very few participants in one moment and a huge group of participants in the next. In the wiki scenario we can assume that a popular event draws the attention of a large number of users, who want to participate in the system. A couple of hours later the information about that event could be unimportant again, leading to a shrinking number of participants.

R-19.2: Robust to shifting users: Participants of GSD usually join and leave a system at specific working times. Due to different time zones, it is likely that a large number of participants from one area will leave and participants from another will join within a short time frame. This sudden shift in the geographical distribution of the participants should not influence the system's performance. Replication and routing information update mechanisms of peer-to-peer overlay networks are crucial here.

R-19.3: Scalable to a growing number of users: The system should be able to support a growing number of participants without decreasing performance. With the rising popularity of a wiki engine or open source project, or the rising importance of a software project to which more developers are assigned, the resource demands on the system increase. Even if the increase is sudden, the system should be able to handle it. A decreasing number of users is covered by requirement R-19.1.

System load: The load of a system quantifies how intensively the services offered by the system are used. The load can be measured by numerous metrics, one of them being the size of the transferred messages.

R-20: Continuous expendability: The system should not be stopped for an upgrade to the software or the hardware. That means that there is no unique hardware part in the system which cannot be switched off for an upgrade. It might be impossible to upgrade all distributed running instances of the software. Thus more recent versions have to be compatible with older versions. This is one of the design goals for distributed systems described in [CDK05].

R-21: Fault tolerance: A globally operating version control system should be able to recover from any kind of failure. Hardware, as well as software, might fail, messages might get lost, or resources might vanish.

3.2.3 Security Aspects

A highly distributed system, where confidential data is shared between business partners, requires sophisticated security mechanisms. Many new security challenges arise when the tasks that are usually fulfilled by a single trusted node are divided among multiple entities [Wal02]. In the following, we describe the security requirements we deem most important for a GSD environment. However, focusing on the challenges version control brings with it, we only developed a rudimentary security system, described in Section 9.2.2.

R-22: Access and usage control: Access control plays an important role in a software development process. Confidential documents have to be protected from unauthorized access without impairing the overall system efficiency. In a GSD environment, selected group administrators are responsible for defining security policies for their respective user groups. The participants have the option to further restrict access to their files. A peer-to-peer approach introduces even more problems, since there are now multiple entities responsible for defining security policies for their respective groups. Enforcement of those policies has to be deferred to the participant [SZRC06], since the availability of central authorities cannot be guaranteed.

R-23: Keep sensitive artifacts: Companies might be interested in preventing sensitive artifacts from being copied to machines which do not belong to them but to collaboration partners. Even encrypting the stored artifacts using access control, as demanded by requirement R-22: Access and usage control, might not be sufficient to prevent unauthorized access. Not giving physical access to those artifacts is the safest means of protection. This demands control over where specific artifacts in a distributed system are stored. Fulfilling this requirement in a peer-to-peer based system is extremely demanding, as artifacts are typically stored randomly distributed among all participating machines.

R-24: Attribute based access control: In a large software development process, it is often unnecessary to control access on an individual basis. For most tasks it suffices for participants to be identified via their respective attributes. An administrator can assign a signed certificate to a group member, certifying his developer status. Other participants can then base their access decision solely on the presented user credentials. Park et al., for example, developed a role-based access control approach for a collaborative enterprise in peer-to-peer computing environments [PH03].

R-25: Confidentiality: Stored artifacts should only be accessible to authorized users. Under no circumstances should a user without the needed access rights be able to read or modify stored artifacts.

R-26: Data integrity: Forged artifacts have to be recognizable as being manipulated. Even manipulation of a version control system's history or other metadata should not go unnoticed.

R-27: Non-repudiation: Changes made should be traceable to their author. It should be provable that the changes were applied by that author.

R-28: Authentication: The aforementioned security goals such as confidentiality, data integrity, access control and non-repudiation depend on proper authentication. If the system for user identification fails, the mentioned security goals cannot be met in a satisfactory manner. On top of that, it is not always easy to decide whom to trust in a highly distributed system.

R-29: Secure communication: The communication between the participants needs to be confidential, and data integrity needs to be guaranteed. This requirement is tightly coupled with the previous requirement. Efficient and secure group communication mechanisms have to be provided by the development environment.

R-30: Low administration overhead: The administration overhead introduced by additional security mechanisms has to be kept as low as possible. Software developers participating in the system should not have to deal with defining specific access rights for their files. Rather, this should be transparent to them and be the task of a group administrator.

3.3 summary

In this chapter we derived and analyzed a complete set of requirements for a peer-to-peer version control system that fully meets the needs of today's collaborative development. The list of assumptions, along with the exhaustive requirements listed in this chapter, gives an indication of the general applicability of the solution presented in this work.

Part II

VERSION CONTROL SYSTEMS

After concretizing the changed needs of today's project development, we investigate current solutions in the coming chapters. We begin by defining the basic characteristics all version control systems share in Chapter 4. The quality of a version control system shows itself in the degree of consistency it provides, which is also defined in this chapter. Classified into centralized, decentralized and peer-to-peer based solutions, we present representative version control systems in Chapter 5. Their strengths and weaknesses are analyzed in Chapter 6, where we draw conclusions about the impact of design decisions on a system's properties.

FOUNDATIONS OF VERSION CONTROL SYSTEMS 4

This chapter presents the background information about the foundations of version control systems. First, the basic collaboration workflows by which developers update their files and share their changes are described.

4.1 collaboration workflows

A workflow defines how the contributors of a project work together. When using a centralized version control system there is only one possible workflow: the centralized workflow, which is shown in Figure 7. Artifacts are changed in a working copy and shared with colleagues by committing them to a centralized repository. If the modified artifacts conflict with changes which occurred in the meantime, they are first merged locally with the latest snapshot from the centralized repository and then committed. Merging changes can be postponed by working on a branch, which is merged once the task is fulfilled. This workflow, however, is dominated by the applied development process, which defines which tasks (i.e. design, development, integration, testing, etc.) are executed in which order and/or by whom (in different roles including designer, developer, integrator, tester, etc.). To divide the project between different teams and/or tasks, branches are used (i.e. a maintenance branch for bug fixes on the delivered version, branches where a new feature is developed, etc.).

When a distributed version control system is used, the members of a project have to agree on a collaboration workflow to exchange their developments in a structured order. This collaboration workflow and the applied software development process have to fit each other, and not every collaboration workflow fits every development process. There is no mechanism in a distributed version control system to notify its users about new versions. To announce a contribution, like a bug fix or a realized feature, mailing lists are used, whereby a contributor posts his contribution along with details on how to pull the contribution from him. Some participants take on the role of an integration manager, who chooses from the different contributions in order to form a "blessed" repository, which presents the officially acknowledged history of a project. This repository is often published on a central server, like GitHub1, to offer read-only access to every interested user. Some prominent workflows are detailed in the following.

Centralized
The centralized collaboration workflow, which is illustrated in Figure 7, is the only one available when using a centralized version control system, but it can be adopted for a distributed version control system as well. A specific repository serves as a central rendezvous point to which developers contribute their changes. In a distributed version control system a developer pushes his modifications to that centralized repository and from there also pulls the changes of his colleagues. It is challenging to set up this workflow for a distributed version control system, as such a system usually prohibits developers from pushing changes to the centralized repository. This setup combines not only the advantages but also the disadvantages of both approaches: changes are immediately available, but if the centralized repository fails, the entire system fails.



Figure 7: The centralized workflow, which is used for distributed as well as for centralized version control systems


Figure 8: The integration manager collaboration workflow

Integration Manager The integration manager collaboration workflow, drawn in Figure 8, aims to control all contributions to form a "blessed" repository, from which a clean version of the project can be obtained. Developers have their local repositories, which they update using the "blessed" repository. There is a second, public repository for each developer, to which their contributions are pushed. Inspired by the Rational Unified Process [Kru03], there is an integration manager, who is the only one who pulls changes from those public developer repositories. After combining and inspecting them, the integration manager pushes them to the "blessed" repository, to which only she has (write) access.

Lieutenants Figure 9 exemplifies the lieutenants collaboration workflow. Developers share their changes with a lieutenant. The lieutenants are each responsible for a subproject. They filter all contributions and decide which of them to share with the dictator. The dictator integrates all changes into the "blessed" repository, to which only he has access. Updates are pulled from the "blessed" repository, which represents the latest version of the product. This workflow is used for the development of the Linux kernel, for which Git was developed. In this project Linus Torvalds has the role of the dictator. The lieutenants are specialists in specific areas (networking, video, etc.), who are well known to the dictator. He trusts them almost blindly, as they handle the contributions of the community.


Figure 9: The lieutenants collaboration workflow

4.2 the frequency of committing changes

There are two contradictory recommendations on how frequently to share changes. On the one side, snapshots should be committed as frequently as possible to enable fine-granular rollbacks. Committing changes late makes them more difficult to merge, as potentially more edited lines in the artifacts can be conflicting. An analysis with real-world data in [EC95] came to the same conclusion. On the other side, shared changes should not corrupt the project (e.g. resulting in an uncompilable source code project), hindering further work of collaborators. Conflicts, which occur when two snapshots are based on the same version, can be resolved more easily if the changes made are small, but they occur more often when developers commit their snapshots frequently.

Ideally, frequent commits should not be shared with other developers if they leave the project in an unstable (e.g. uncompilable) state. After a task is completed, all recorded changes should be shared. This problem is evident in centralized version control systems, as every committed snapshot is automatically shared with all participants. Working in branches could solve this problem, but unless developers work in their own branch there are colleagues who are affected. Branches in most centralized version control systems do not come for free. In some projects build tools aggravate this problem: by running automated tests on the snapshot to be committed they prohibit fine-granular commits.

Distributed version control systems address this issue by distinguishing between local commits and global pushes. Local commits are not shared with other users and can therefore be used to keep track of fine-granular and possibly destabilizing snapshots (e.g. ones that hinder a project from being compiled). A push shares all unshared local commits at once, but with only one participant. In practice there is a person who collects all contributions and publishes them in a central place, as outlined in Section 4.1.

4.3 configuration management

Our solution focuses on version control only. The field of software configuration management, however, consists of four areas: (1) change management, (2) build management, (3) configuration and variant management and (4) version control.

1. Change management, i.e., tracking bug reports and feature requests, can be separated from version control. In fact, supporting tools are often connected to version control tools. A common practice, however, is to mark the bug report/feature request number in the comments of a commit. Some tools, like Mylyn (http://www.eclipse.org/mylyn/), automate this practice. As existing tools and practices can be reused with our prototype, it was not necessary to design a decentralized change management solution as well.

2. Build management handles the build process of a project, which can be executed automatically, e.g., to produce nightly builds. These builds can be carried out on a local machine, thus existing solutions could be reused without difficulties. It depends on configuration management. Often tools, like Maven (http://maven.apache.org/), combine configuration and build management.

3. Configuration and variant management records the possible configurations which can be used to build a system in different variants. It typically resolves dependencies among different item versions as well. An item is not only a file, but can also be a tool, e.g., a compiler, which is necessary to build a project.

4. Version control finally records changes to files. It aims to reproduce the state of a project at an actively recorded point in time.

Our solution focuses on version control only; version control, however, cannot be strictly separated from configuration management, as each retrieved state is in fact a configuration. Two mechanisms enable configuration management: We already saw that a complete underlying distributed version control system is reused, which includes its features for configuration management as well. These are snapshots, which are used to reconstruct a committed local state of any user. Each snapshot entry in the global history existed at one time in a user's local workspace. Therefore versions of files belonging to the same snapshot can be considered as depending on each other. The existence of named and unnamed branches is the other feature which enables variant management. Alternative versions of a project can be tracked in parallel using these branches. Lastly, specific versions of a file can be selected from any snapshot, allowing the combination of any file versions in a local working copy (which is called cherry-picking). As no dependency information is provided, a user must already know which files might fit.

4.4 consistency and coherency in version control systems

Consistency and coherency are the most important quality aspects of a version control system. They depend directly on a system's design. Consistency and coherency describe the state in which the distributed copies of a system are. In general, multiple copies are consistent if certain integrity constraints are valid. These integrity constraints are to be defined for the concrete areas and systems in which consistency is analyzed. For database systems, integrity constraints could demand certain statements to be true (e.g. an address entry refers to exactly one name entry). Most systems should never be in an inconsistent state, thus the defined integrity constraints can be understood as invariants. The consistency definition for caches and replicas, as defined in [HP98], demands copies to be always identical. The weaker demand of coherency is fulfilled if copies are updated on read access so that the read value is identical, regardless of which copy it was read from.


The definition of consistency in the area of distributed shared memory (DSM) systems, outlined in [TS06], is quantified using the access time of concurrently operating users. Depending on when a read operation retrieves the latest written value, a certain degree of consistency is provided. Mosberger introduced in [Mos93] different consistency models for DSM systems. These consistency models can be understood as a contract that guarantees the retrievability of certain values when a distributed stored variable is updated concurrently. The following scenario is usually used to explain the different consistency models: a process writes a value to a variable, which is stored in different copies among multiple locations. The implemented consistency model describes which values can be expected to be retrieved (the outdated or the updated value) when subsequent processes read the variable from different locations concurrently. Multiple combinations are possible (e.g. the first reading process could get the outdated value and every following process the updated value, which would comply with the sequential consistency model), each complying with a specified consistency model. In these definitions of consistency, updates replace the content of a value. In a version control system we create a new version of an artifact instead of overwriting its previous content. A specific version is immutable and never outdated. Therefore we adapted these definitions to be applicable to version control systems, where stored items are immutable and updates on their content create new items.

4.4.1 Terminology

In order to define consistency and coherency for version control systems we need to define the terms used in this area. A version is a recorded state of an artifact. When an artifact is created and modified multiple times, whenever a user executes the commit operation the momentary state is saved, i.e., the content of the artifact is recorded and is retrievable later. Modern version control systems are capable of recording snapshots instead. A snapshot is created by recording the momentary state of all artifacts in a working copy. In this way, not only the content of an artifact, but also the content of all other artifacts at the time the snapshot was created can be reconstructed. When we quantify the consistency of a system it does not matter whether we investigate a single artifact's versions or the snapshots of all artifacts in a working copy. Both storage structures are equivalent regarding the provided consistency, therefore in our definitions the word version could be substituted with snapshot.

version or snapshot: $v_n \in V$, where $V$ represents all versions/snapshots, for $n \in \mathbb{N}$

denotes an arbitrary version of an artifact (or an arbitrary snapshot of multiple artifacts). When the content of an artifact or working copy is changed and a subsequent commit records those changes, the resulting version or snapshot $v_n$ is based on the changed version or snapshot $v_{n-1}$. We define this relation $\leftarrow$ as:

$(v_{n-1} \leftarrow v_n) := \begin{cases} 1 & \text{if } v_n \text{ is based on } v_{n-1} \\ 0 & \text{otherwise} \end{cases}$, for $n \in \mathbb{N}$ and $(v_{n-1}, v_n) \in V$

The relation $\leftarrow$ is transitive. Any two versions are called related when they are on the same path, i.e., being in the transitive relationship $\leftarrow$. When the state of an artifact or working copy is recorded for the first time, an

initial version: $v_{init} = \{v_n \mid \neg\exists v_x : (v_x \leftarrow v_n)\}$, for $(init, n, x) \in \mathbb{N}$ and $(v_{init}, v_n, v_x) \in V$

is created.

Multiple related versions, starting from an initial version $v_{init}$, form a development line. The last element in a development line is called the head version, if

$\mathrm{head}(v_n, t) := \begin{cases} 1 & \text{if } \neg\exists v_x : (v_n \leftarrow v_x) \text{ at time } t \\ 0 & \text{otherwise} \end{cases}$, for $(n, x) \in \mathbb{N}$ and $(v_n, v_x) \in V$

Otherwise a version is outdated. N.B. The point in time is relevant to decide whether a version is outdated. More than one version can be based on the same version, formally described by

$v_a \leftarrow v_x, \quad v_a \leftarrow v_y$, for $(a, x, y) \in \mathbb{N}$ and $(v_a, v_x, v_y) \in V$

The development lines ending with $v_x$ and $v_y$ diverge from each other, which occurs when branches are stored in a version control system. Therefore we use the term development line interchangeably with the term branch. For each development line a head version exists. We define the operation $\mathrm{heads}: V, t \to V$ that gives us the set of all head versions that exist at a specific time $t$ as

$\mathrm{heads}(t) = \{v_h \mid \mathrm{head}(v_h, t)\}$, for $h \in \mathbb{N}$ and $v_h \in V$ at time $t$

A version can also be based on multiple versions, which represents merged development lines:

$v_x \leftarrow v_a, \quad v_y \leftarrow v_a$, for $(a, x, y) \in \mathbb{N}$ and $(v_a, v_x, v_y) \in V$

If more than two development lines are merged, we call it an octopus merge. N.B. Multiple development lines form a directed acyclic graph (DAG). All versions which are (transitively) connected through the relation $\leftarrow$ are on the same path. The set of all versions in a given path includes all versions of parallel branches as well, i.e., they are obtained by traversing all related versions from a given version to another given version. This is described by the following operation $\mathrm{path}: V^2 \to V$

$\mathrm{path}(v_x, v_y) = V_{path} \Leftrightarrow V_{path} = \{v_x, v_n, v_{n+1}, \ldots, v_{m-1}, v_m, v_y \mid (v_x \leftarrow v_n) \wedge (v_n \leftarrow v_{n+1}) \wedge \ldots \wedge (v_{m-1} \leftarrow v_m) \wedge (v_m \leftarrow v_y)\}$, with $x < n < m < y$ for $(n, m, x, y) \in \mathbb{N}$ and $(v_n, v_{n+1}, v_{m-1}, v_m, v_x, v_y) \in V$
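To make these definitions more tangible, the following Python fragment models a version DAG with the relations defined above. It is a minimal illustrative sketch; all names (Version, is_based_on, related, heads, path) are chosen for this example only and are not part of PlatinVC.

class Version:
    """A version (or snapshot) node in the version DAG."""
    def __init__(self, parents=()):
        self.parents = list(parents)   # versions this one is based on (the '<-' relation points here)
        self.children = []             # versions based on this one
        for p in self.parents:
            p.children.append(self)

def is_based_on(older, newer):
    """Transitive closure of '<-': newer is (directly or indirectly) based on older."""
    return older in newer.parents or any(is_based_on(older, p) for p in newer.parents)

def related(a, b):
    """Two versions are related if they lie on the same path."""
    return a is b or is_based_on(a, b) or is_based_on(b, a)

def heads(versions):
    """All versions that no other version is based on (one head per development line)."""
    return {v for v in versions if not v.children}

def path(vx, vy):
    """All versions reachable between vx and vy via the '<-' relation."""
    if vx is vy:
        return {vx}
    result = set()
    for child in vx.children:
        sub = path(child, vy)
        if sub:
            result |= {vx} | sub
    return result

# Example: v1 <- v2 <- v3 and a diverging branch v1 <- b2 (two heads)
v1 = Version()
v2 = Version([v1])
v3 = Version([v2])
b2 = Version([v1])
assert is_based_on(v1, v3) and not related(v3, b2)
assert heads({v1, v2, v3, b2}) == {v3, b2}
assert path(v1, v3) == {v1, v2, v3}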

4.4.2 Degrees of Consistency

Updates on an artifact in a version control system create a new version (or snapshot) instead of overwriting the old stored value. To the author's best knowledge there is no definition of consistency and coherency which takes this behavior into account; the existing definitions assume that a stored value gets replaced by an update. Therefore we modified the consistency models for distributed shared memory systems introduced in [Mos93] to be applicable to version control systems. To avoid confusion we call our modified consistency models degrees of consistency. N.B. The introduced consistency degrees are not complete, but sufficient to describe the consistency of the related version control systems, which are presented in Chapter 5. There are certainly further degrees possible. A certain consistency degree is provided by a system if it guarantees to retrieve a set of versions even in the worst circumstances (with the exception of malicious attacks). However, if the circumstances hinder the retrieval operation from being executed successfully at all and not even a partial result is retrieved, the guaranteed consistency is not considered to be violated. To define the consistency degrees, let us consider the following scenario: A user has an outdated version of an artifact or snapshot of a working copy, which he updates by executing an operation that retrieves the latest versions or snapshots of all branches of the respective artifact or working copy.

Figure 10: Timing diagram of the update process

For our definition it does not matter whether a user requests an update or an update is sent proactively by the system. Figure 10 shows the temporal order in this scenario. The user requests an update of a specified artifact or a working copy at the time $t_{req}$ by executing the operation request update(). The version control system (named VCS in Figure 10) prepares the update and sends the user the versions or snapshots that he does not store locally at time $t_{send}$. The user receives the resulting versions or snapshots at time $t_{rec}$ and applies them. Which versions or snapshots are retrievable from the version control system depends on the consistency degree it provides and is detailed below. We call the set of receivable versions or snapshots $V_{retrievable}$. Only if the retrieve operation failed (or no version has been committed) is this set empty.

N.B. The head version at time $t_{send}$ can be outdated when the user receives it, at time $t_{rec}$, because a succeeding version could have been submitted by another user while the updating message is transferred to the user. $V_{retrievable}$ might only include $\mathrm{heads}(t_{send})$ and not $\mathrm{heads}(t_{rec})$. The time difference between sharing a new version and retrieving it is considered indirectly by the consistency degree. It is mainly quantified by the freshness and changes depending on the circumstances a system experiences. In contrast to other quality aspects, consistency and coherency do not change over time.

Like the consistency models, our consistency degrees are sorted from loose to strict. A stricter degree, like sequential consistency, narrows the possible results of the described scenario but negatively influences other aspects of a distributed system. A looser degree, like eventual consistency, offers more freedom to optimize other aspects of a distributed system but tolerates undesired results. To achieve a strict consistency degree, multiple update messages are necessary to synchronize the distributed storages, wherefore the number of copies should be minimized. Less strict degrees require fewer coordination messages and allow more copies to exist.

Freshness The freshness of an update is a quality aspect which measures how recent the retrieved results were pushed. Its metric is the time span between a successful push by one user and a subsequent pull which retrieves the just pushed changes.

eventual consistency : There are no guarantees when or which versions a user can retrieve. The retrieved versions can be completely unrelated, with gaps in the retrieved development lines. In the latter case a version is received while the predecessor version it was based on is missing. Formalized:

$V_{retrievable} \subseteq \bigcup_{v_h \in \mathrm{heads}(t_{send})} \mathrm{path}(v_{init}, v_h)$, for $(h, init) \in \mathbb{N}$ and $v_{init} \in V$

The only given guarantee is that after an unlimited amount of time all created versions are shared between all users. In typical implementations new versions have to be propagated to all replicas in the system. This usually occurs in an automatic process, in a deterministic, but coincidental fashion. Eventual consistency is similar to the eventual data consistency listed by [TS06].

causal consistency : A system is causally consistent if all related versions are retrieved. Unrelated versions might be missing. Gaps in the development line are avoided, which can happen in the weaker eventual consistency degree. Whenever a repository is split into disconnected parts, users with access to only one part can retrieve a different set of versions than users connected to another part. Additionally it might happen that a head version is only present in one part. (N.B. An update based on a missing head version would reintroduce it to the system. Only if a user based his changes on a previous version, as he might not have the head version either, would the head version remain missing.) In a system which promises causal consistency all retrieved versions have to be related. Formalized:

Let $H \subseteq \mathrm{heads}(t_{any})$, where $t_{any} \leq t_{send}$:

$V_{retrievable} = \bigcup_{v_h \in H} \mathrm{path}(v_{init}, v_h)$, for $(h, init) \in \mathbb{N}$ and $v_{init} \in V$

When there are multiple head versions in multiple branches, not all versions from all branches might be retrievable. The premise that, given enough time, all versions will eventually be retrieved holds true for causal consistency as well. This consistency degree is similar to the causal consistency model defined in [HA90].

sequential consistency : In a system that guarantees sequential consistency, received updates include all versions in all branches which existed when the update was sent to the user, i.e., at time $t_{send}$. However, the freshness can have any value, and versions committed after the update was sent are not received. Thus the latest received (head) versions might not be the latest existing versions. Formalized:

$V_{retrievable} = \bigcup_{v_h \in \mathrm{heads}(t_{send})} \mathrm{path}(v_{init}, v_h)$, for $(h, init) \in \mathbb{N}$ and $v_{init} \in V$

Sequential consistency is typically guaranteed if the repository is controlled by a single machine, which serves requests in a sequential order. The transport time of a message is different for each pair of communicating machines, but very similar for subsequent messages between the same machines. Therefore it can happen that the latest versions shared by one user are not retrieved by another user. This consistency degree is similar to the sequential consistency model introduced in [Lam79].
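As an informal illustration of the difference between these degrees, the following sketch checks whether a retrieved set of versions contains gaps (violating causal consistency) or misses head versions that existed at $t_{send}$ (violating sequential consistency). The parent-map representation and the function names are hypothetical and serve illustration only.

def ancestors(v, parents):
    """All versions that v is (transitively) based on."""
    seen = set()
    stack = list(parents[v])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen

def has_no_gaps(retrieved, parents):
    """Causal consistency: every retrieved version's predecessors are retrieved too."""
    return all(ancestors(v, parents) <= retrieved for v in retrieved)

def includes_all_heads(retrieved, parents, existing_at_send):
    """Sequential consistency: every head version existing at t_send is retrieved."""
    heads = {v for v in existing_at_send
             if not any(v in parents[w] for w in existing_at_send)}
    return has_no_gaps(retrieved, parents) and heads <= retrieved

# Example: linear history 1 <- 2 <- 3 plus a branch 1 <- 4
parents = {1: set(), 2: {1}, 3: {2}, 4: {1}}
print(has_no_gaps({1, 3}, parents))                          # False: version 2 is missing (a gap)
print(has_no_gaps({1, 2, 3}, parents))                       # True: causally consistent
print(includes_all_heads({1, 2, 3}, parents, {1, 2, 3, 4}))  # False: head version 4 is missing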

4.4.3 Coherency in Version Control Systems

Coherency describes the demand that identically named versions (or snapshots) are identical across all stored copies in a system. Having a relation content(), which maps a version to its content, and a relation identifier(), which maps a version to its identifier, a system is coherent if, and only if

$\mathrm{identifier}(v_a) = \mathrm{identifier}(v_b) \Rightarrow \mathrm{content}(v_a) = \mathrm{content}(v_b)$, for $(v_a, v_b) \in V$


As long as a system is consistent, it is also coherent. An incoherent system would return contradictory results to the read requests of different users. None of the existing version control systems can be in an incoherent state. This fact is justified in the description of selected version control systems in Chapter 5.
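A common way to obtain this property by construction, used by the distributed systems discussed in Chapter 5, is to derive a version's identifier from its content. The following minimal sketch of such content-addressed identifiers assumes SHA-1 as the hash function; it is illustrative only and not PlatinVC's actual implementation.

import hashlib

def identifier(content: bytes) -> str:
    """Content-addressed identifier: equal identifiers imply equal content
    (up to hash collisions, which are negligibly rare for SHA-1)."""
    return hashlib.sha1(content).hexdigest()

a = identifier(b"print('hello')\n")
b = identifier(b"print('hello')\n")
c = identifier(b"print('world')\n")
assert a == b        # identical content -> identical identifier
assert a != c        # different content -> different identifier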

4.5 summary

We have seen some of the common properties all version control systems share. There are different workflows which evolved while developers used the existing tools. The centralized solutions gather all created versions in a central place, following the centralized collaboration workflow. The integration manager collaboration workflow and the lieutenants collaboration workflow filter the developers' contributions by introducing integration managers in a hierarchical fashion. While a centralized version control system cannot implement the latter two workflows, a decentralized solution can be used with all three workflows. However, setting up the centralized workflow is not straightforward.

With regard to the summarized workflows, we discussed the impact of sharing changes at a lower or higher frequency. Submitting small changes at short intervals yields a history with which changes can be reversed in a fine-grained manner. Sharing only complete modifications, which present the final result of a task, makes the system more user-friendly for other project members who want to have working updates only. We will see how the existing version control systems handle this trade-off between recording fine-grained modifications for personal benefit and sharing working changes only for the sake of coworkers.

As the most important quality aspects of version control systems we defined coherency and different degrees of consistency. We derived our definitions from the area of distributed computing. Basically, the more the retrieved versions are connected, the higher the consistency degree a system guarantees.

5 notable version control systems

The complexity of software projects increased rapidly following the development of the first programs in the 1930s. By the late 1960s, programs had become so complex that many projects failed. One solution to the so-called "software crisis" [Som10] was version control systems, which helped to control the complexity of large projects and coordinated concurrent work on them. In this chapter the most successful and thus most influential solutions are presented. After a brief introduction to these solutions, their properties are discussed. In the conclusion of this chapter we will see why none of those solutions fully satisfies our requirements stated in Section 3.2.

5.1 centralized version control

The first version control systems can be classified as centralized version control systems (cVCSs). These systems rely on centralized control, following the client-server or mainframe communication paradigm. This central instance handles all operations that are initiated by multiple users.

5.1.1 SCCS

The very first system that fulfills some of the requirements stated in Section 3.2 is called Source Code Control System (SCCS) [Roc75], which was developed in 1972. It was based on the mainframe communication paradigm, where a powerful (and in those days expensive) machine executed all operations that users initiated through terminals. The terminals themselves did not have any computational capacities and served as input devices only. Nevertheless, SCCS can be installed on any machine - files are edited and stored as new versions on the machine that SCCS is running on. Some modern version control systems, such as BitKeeper (http://www.bitkeeper.com/), are based on SCCS.

Figure 11: Basic architecture of SCCS and RCS

As depicted in Figure 11, all files reside on a central machine, in a repository. Users lock artifacts (so that no other users can make concurrent changes), modify and commit them, thereby creating a new version. The versions are stored using delta compression, saving storage space, which was very expensive in the early days of computing.



This first solution solved requirement R-1.4 and requirement R-1.7 by controlling the modification of single artifacts. There was no support for snapshot version control (requirement R-1.5). The ability to work concurrently, however, was limited, as only one user could change an artifact at a time. There was no support for requirement R-1.10: Track variants (branches) either; versions of an artifact were stored in linear order only. Basic security needs were fulfilled with user-account-based access and usage control (requirement R-22). SCCS guarantees sequential consistency. Following the lock-modify-write principle, an artifact is locked by a user who modifies it directly on the server. Contradictory changes are prevented, thus the repository, which exists as a single copy on the serving machine only, is always in a coherent state. The high consistency degree is paid for with a limited ability to work concurrently, as one artifact can only be modified by one user at a time.

Pessimistic Concurrency Control Pessimistic concurrency control locks an artifact, so that it can be read by anyone but modified by only one person. The technique used by SCCS and RCS is called lock-modify-write: an artifact is locked, modified, and applied to the repository as a new version, which is then unlocked. By avoiding the creation of concurrent versions, conflicts are resolved proactively. While preventing contradictory changes, this procedure obviously limits the possibility to work simultaneously on the same artifacts. When working on a software project, typically a group of artifacts has to be modified. This led to the behavior of locking more artifacts than necessary, which hindered concurrent work even more. The branches introduced in RCS relaxed this problem, as versions in different branches can be locked by different users.
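The lock-modify-write cycle can be sketched as follows. The class and method names are hypothetical and merely illustrate the principle, not SCCS's actual interface.

class LockingRepository:
    """Pessimistic concurrency control: one writer per artifact (lock-modify-write)."""
    def __init__(self):
        self.versions = {}   # artifact name -> list of stored versions
        self.locks = {}      # artifact name -> user currently holding the lock

    def lock(self, artifact, user):
        if self.locks.get(artifact) not in (None, user):
            raise RuntimeError(f"{artifact} is locked by {self.locks[artifact]}")
        self.locks[artifact] = user

    def write_and_unlock(self, artifact, user, content):
        if self.locks.get(artifact) != user:
            raise RuntimeError("write without holding the lock")
        self.versions.setdefault(artifact, []).append(content)
        del self.locks[artifact]

repo = LockingRepository()
repo.lock("main.c", "alice")
repo.write_and_unlock("main.c", "alice", "int main(){return 0;}")
repo.lock("main.c", "bob")   # possible again only after Alice released the lock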

5.1.2 RCS

To support variants (requirement R-1.10) the Revision Control System (RCS) [Tic82] was developed in 1982, replacing SCCS. The system was based on the same basic architecture as SCCS, shown in Figure 11. The main difference is the version model, which allows for the creation of a version that exists parallel to another version, in a branch. The main development line is called the trunk, from which alternative development lines can split into parallel development lines, called branches. The system is additionally able to merge a branch back into the trunk - if the changes were not conflicting, i.e., did not happen in the same text line, the two versions (from the trunk and from the branch) could be merged automatically. This newly created version has two parents (the version in the trunk and the version in the branch it is based on) and marks the branch as merged back into the trunk.

Version Model A version model defines what is controlled by a version control system and how this is done. It therefore represents the internal structure in which the version control information is stored. The granularity of control can cover single artifacts or entire projects; they can be tracked in revisions only or with variants, in a directed acyclic graph structure, where alternative development lines branch and merge again. They can be stored using delta compression.

As in SCCS, sequential consistency is guaranteed, as the repository is stored and modified in the same way. Concurrent work is therefore limited by the lock-modify-write approach, although the introduced branches allow for limited parallel work [KR81].

5.1.3 CVS

RCS was eventually replaced by the Concurrent Versions System (CVS), which began as a collection of scripts operating on RCS in mid 1984. Back then, it evolved from the need to coordinate the work of three developers, who each had different working times, and was called cmt. CVS was released in late 1990 and is still used in many projects. It no longer relies on RCS, as it became an independent software project.

Optimistic Concurrency Control Optimistic concurrency control allows users to make simultaneous changes to artifacts. As only a small number of concurrent changes are conflicting, as discovered in [Leb95], the advanced version control systems developed after RCS implement the copy-modify-merge procedure described in [SS05]. A user copies an artifact in a specific version from the repository, modifies it, and commits it back to the repository, where a new version is created. If a newer version already exists, the user must merge the contents of his version and the latest version of the modified artifact and commit the resulting version. This process is called reactive conflict resolution. The distributed version control systems are unable to detect a conflicting version when it is created; they therefore implement the copy-modify-branch procedure. If multiple versions are based on a common base version, they reside in unnamed branches, which should be merged, creating a new version based on both merged versions.
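A rough sketch of the copy-modify-merge cycle with reactive conflict resolution is shown below; again, the names are hypothetical and the model is deliberately simplified to a linear history.

class OptimisticRepository:
    """Optimistic concurrency control: commits must be based on the latest version."""
    def __init__(self, initial):
        self.history = [initial]          # linear history of versions

    def checkout(self):
        """Copy: return the latest version and its index (the base of later edits)."""
        return len(self.history) - 1, self.history[-1]

    def commit(self, base_index, content):
        """Accept the commit only if it is based on the latest version."""
        if base_index != len(self.history) - 1:
            raise RuntimeError("out of date: update and merge first, then commit again")
        self.history.append(content)

repo = OptimisticRepository("v0")
base, copy = repo.checkout()
repo.commit(base, copy + " + alice's change")        # succeeds
try:
    repo.commit(base, copy + " + bob's change")      # same base -> reactive merge needed
except RuntimeError as err:
    print(err)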

The architecture is still centralized, although it moved from the mainframe to the client-server communication paradigm, where a powerful machine serves the requests of multiple client machines, which, in contrast to the terminals, each have enough power to perform computations themselves. Nevertheless, all operations are executed on the server machine.

Figure 12: Basic architecture of CVS and Subversion

Initially CVS eased the work on a set of artifacts, as RCS and SCCS merely controlled the versions of a single artifact. CVS raised this scope to the project level, where all artifacts of a project are controlled. Figure 12 shows the evolved basic architecture. Artifacts are grouped into folders, which may include subfolders with further artifacts. The outermost folder encloses the software project and is tracked in a module on a CVS server. One or more instances of those modules form a repository. A software project can be organized in multiple modules,

although no connection exists between the modules in CVS. A user can check out a module, which creates a working copy on his machine.

Working Copy A monitored folder is called the working copy of a module. If anything in this folder changes, the version control system captures all of the modifications. With the commit operation, a new snapshot is taken locally. At any time the folder (or individual files) can be reverted to a formerly recorded state. All snapshots are ordered, forming a history. When the user changes several files and executes the commit operation, a new snapshot is created, which records the latest snapshot as its parent.

This working copy mirrors the artifacts and folders of the respective module. Initially all artifacts are copies of their latest version in the server's repository. A user can now make any changes to the artifacts and commit his changes back to the module stored in the repository, which creates a new version of each changed artifact. New folders and new artifacts in the working copy are also created in the module on the server as a result of the commit operation. However, renames are not supported; instead the old artifact/folder is deleted and created as a new one (with a changed name and no former history). Modified artifacts can only be committed when they are based on the latest version on the server. If another developer committed his changes earlier, these changes must first be retrieved and then merged with one's own modifications to be able to commit all local changes.

All operations affect the entire module; the versions, however, are still tracked for each file individually. Thus it is not possible to map the versions of different artifacts to each other, e.g., checking out an artifact in a specific version which was committed at the time another artifact's version was checked in. The state of single files is tracked, but not the state of the working copy as a whole. Alternative development lines, branches, are also supported.

The introduction of working copies enables an optimistic concurrency control. Files can still be locked, but following the copy-modify-merge approach enables users to work on the same artifacts at the same time. The price paid is a weaker degree of consistency in comparison to the systems based on the mainframe communication paradigm: sequential consistency. Coherency is never compromised. Contradictory changes, which lead to version conflicts, cannot be committed; solving them forms a new version, which can be committed.

5.1.4 SVN

Subversion (SVN) was developed by the company CollabNet [Col], which originally developed CVS, to overcome the limitation of CVS of tracking the history of single files only. It was released in late 2000. Although adopted by a huge number of projects, it never replaced CVS completely. Subversion shares its basic architecture with CVS, which is presented in Figure 12. The terminology, however, was altered slightly: the equivalent of a module is a repository; multiple repositories reside in a database on the server. A repository is not used exactly like a module in CVS. Usually the outermost folder of a project (or a working copy) is inside a folder in the repository. By convention, a Subversion repository has the following three folders as its outermost folders: trunk, branches, and tags. Initially all files of a project are stored inside the trunk folder. When branches are to be created, a shallow copy of the project is stored in the branches folder (in a subfolder named after the created branch). A project can be tagged in the same way, by storing it in the tags folder. From any folder level the update operation can be invoked, which copies a specified version of all artifacts and folders encapsulated by this folder into the working copy. The commit operation, however, creates one version for all artifacts and folders in the repository. Thus the state of a working copy at a specific time is recorded, rather than the state of single artifacts/folders, as in CVS. When a specific version is retrieved, all artifacts and folders are in the state in which they were committed.

Shallow Copy A shallow copy is a concept whereby data is copied without using additional storage space. Stored data is addressed by an identifier. A shallow copy creates a new identifier which refers to the data's storage address, instead of creating new storage entries and referring to them, as would be the case in a classical deep copy. This concept has been applied by different mechanisms: a link in a file system to the actual content and a pointer in a programming language to the actual data are two examples of a shallow copy. Shallow copies are created faster than normal copies and do not consume additional storage space. A drawback is, however, that modified data is modified for all identifiers referring to that data. A mixture of a deep copy and a shallow copy is a lazy copy, where data is initially copied as a shallow copy. Upon the first write access, the modified data is stored under a new address instead of being overwritten in the previous place. Only the shallow-copied identifier refers to that new storage address, as it would have done if it had been copied as a deep copy. Unchanged artifacts in stored snapshots in Subversion are stored as shallow copies (in a database), while cloned repositories in Mercurial [Mac] (see Section 5.2.4) are copied as lazy copies (in a filesystem; using the command 'ln' in Unix-based operating systems, 'mklink /H' in Windows). As soon as an artifact is changed, the newly created version is stored in a new database entry in Subversion and automatically in a new storage location on the hard disk when using Mercurial. In both cases the new version now requires additional space.
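The difference between a shallow copy and a lazy copy can be sketched with a simple identifier-to-address store; this is a simplified, hypothetical model and reproduces neither Subversion's nor Mercurial's actual storage code.

class Store:
    """Content store addressed by identifiers; shallow copies only add identifiers."""
    def __init__(self):
        self.blobs = {}        # address -> data
        self.names = {}        # identifier -> address

    def put(self, name, data):
        addr = len(self.blobs)
        self.blobs[addr] = data
        self.names[name] = addr

    def shallow_copy(self, src, dst):
        """New identifier, same storage address: no additional space used."""
        self.names[dst] = self.names[src]

    def write(self, name, data):
        """Lazy copy on write: store changed data at a new address, leave the original."""
        addr = len(self.blobs)
        self.blobs[addr] = data
        self.names[name] = addr

store = Store()
store.put("trunk/a.txt", "original")
store.shallow_copy("trunk/a.txt", "branches/fix/a.txt")   # instant, no extra space
store.write("branches/fix/a.txt", "changed")              # only now extra space is used
print(store.names["trunk/a.txt"], store.names["branches/fix/a.txt"])  # 0 1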

However, the state of a user's working copy is not tracked correctly. Let us assume that we have a repository consisting of only two artifacts: a and b. Initially two users, Alice and Bob, have identical copies of a and b in their working copies. Alice only changes artifact a, and Bob only changes artifact b. If they never update their working copies (which is not necessary, as their modified artifact is always based on the latest version in the repository) and commit their changes concurrently, the resulting state of the repository includes both modified artifacts, although in each working copy the other user's artifact remains unmodified. Experiments showed this result, which can be explained by the fact that Subversion only tracks the modifications, not which other files were left unmodified. While requirement R-1.2: Detect concurrent modifications is fulfilled, this example reveals that requirement R-1.6: Retrieve a recorded state remains unfulfilled.

Requirement R-8: Offline version control is hard to fulfill using a centralized version control system. Recently, the operation to compute the changes to artifacts in the working copy with respect to the snapshot on which they are based (diff) and to revert those artifacts has been implemented by caching the base snapshot on the client machine. As explained before, the creation of branches is supported. In practice, merging branches is a difficult task, which was one factor that motivated the development of the distributed version control systems. Subversion preserves coherency with the same mechanisms as CVS, using the same optimistic concurrency control. It therefore also guarantees sequential consistency. In direct comparison with CVS, however, the consistency degree is higher: versions of artifacts can be mapped to the versions of other artifacts which were committed at the same time (using snapshots).

5.1.5 ClearCase

ClearCase represents a very different approach from the presented version control systems, mainly because it supports configuration management, which is beyond the scope of this work. ClearCase is a commercial system offered by IBM, presented in [Leb95, BM05]. ClearCase was developed as the successor of DSEE (Domain Software Engineering Environment) by the company Atria Software in 1992. This company merged with Pure Software and was bought by

Rational Software, which was acquired by IBM in 2003. ClearCase is part of a tool environment which supports the complete software product development cycle, from project management to configuration management. The tool suite is called Unified Change Management (UCM). Version control is handled by ClearCase in this environment ([JG03]).

The version model of ClearCase stores metadata which is needed by configuration management as well. The stored metadata includes version identifiers, tags and types, which are used by the version control, as well as attributes, hyperlinks and triggers. Each versioned item has an identifier, which is a composition of a global sequence number and a local sequence number. The type of the versioned item is recorded to decide which tools can be used to compress or edit it. Therefore, not only text files but also proprietary file formats can be handled efficiently. Attributes are evaluated to derive a configuration, called a view in ClearCase, which composes a working copy of selected artifacts' versions. Hyperlinks can connect items, e.g., if versions are merged a hyperlink connects them. This information can be used for traceability, satisfying requirement R-10: Traceability. Triggers define which action should be executed automatically if a specified event occurs.

All version controlled items are stored in a versioned object base (VOB), which is ClearCase's repository. A versioned item is an artifact or a folder, but can be a derived object as well. A derived object is the result of a build process, which transforms multiple artifacts into one product. When a versioned item is modified, a new version is created and stored. ClearCase uses configurations instead of snapshots to retrieve matching versions of different items. The system is completely transparent to the user. Users work with the files, which appear as if they were in their local file systems. If an operation modifies the content of an artifact, a version is recorded automatically, which forms a fine-grained history. When a user has finished his task he can call an operation of ClearCase which forms a baseline. This signals a stable (compilable, working) configuration. The latest versions of all items are automatically labelled to indicate that they belong to this baseline. This solves the problem described in Section 4.2.

A user works on a view, which is similar to the concept of working copies. A configuration, usually provided by an integration manager, defines which versions of which versioned items a user works with. They can belong to any branch, so that items of different branches can be mixed. Items can be locked to prevent modifying them.

Figure 13: Dynamic (Alice) and Snapshot (Bob) View in ClearCase

A user can base his working copy on a snapshot view or a dynamic view, as presented in Figure 13. In a dynamic view all files remain on a server. In the figure all files appear as being on Alice's local hard disc, but they in fact reside on the server storing the repository. All operations which change an artifact's content are sent to and executed on this server, which creates a new version. Changes are thereby immediately visible to all other developers. The time required to finish an operation depends on a user's network connectivity and is greater than if the operations were conducted on local files.

A snapshot view is a copy in the user's local file system of the versions on the server. An administrator initially copies the required artifacts to Bob's local machine. Bob modifies these artifacts, where each operation results in an automatically created version, which resides on Bob's machine. If the newly created versions should be shared, Bob or a distinguished administrator commits them to the repository on the server. Conflicts with versions submitted by other users are detected and have to be solved in this step. All local versions are applied to the server's repository in an atomic transaction.

In the dynamic view, no conflicts can occur, as changes are immediately visible to all users. If an editing tool does not notice that an open file has been modified, however, it will overwrite those changes. Artifacts are therefore usually locked to be modifiable by a single user only. Conflicting changes might occur in the snapshot view; they are resolved when a user intends to commit his local versions to the repository on the server. This is usually done through a distinguished administrator. The setup presented in Figure 13 is possible but unlikely. Users can work on the same artifacts in the same branches, using different views. But as mentioned before, conflicting changes are to be avoided by locking files to be modifiable by only one user, or by allowing the development in different branches. A mixed setup is possible, where the artifacts upon which multiple users work exist in multiple branches, and other artifacts are locked to be modifiable exclusively by one user, where changes are immediately visible to all other users. It is also possible that, e.g., Alice has an exclusive lock on artifact A and Bob has one on artifact B, while both see changes to the other's artifact immediately.

ClearCase makes version control very transparent to the user. A user modifies artifacts without explicitly committing changes. ClearCase records all modifications in a fine-grained manner. Only a distinguished user manages the repository, by assigning which versions are visible to which other users and which of those versions they are allowed to modify. Conflicts are usually prevented by giving only one user write access to an artifact in a branch. If multiple users are allowed to make concurrent changes, resulting conflicts are solved by the administrator.

There is an extension, called multisite, which aims to solve the low performance of a dynamic view and fulfills requirement R-18: Adequate communication speed. In this setup, multiple servers are deployed. They are equal with respect to their responsibilities. Each server has a full copy of the repository. The content is, however, not always up to date. It is legitimate for the latest versions of some items or complete branches to be missing.
These servers can be synchronized at any time, which can be initiated manually or executed automatically, periodically or when a specified event takes place. To avoid consistency problems, each server manages a disjoint set of versioned items and has exclusive write access to these items. New versions can only be applied to the respective server's repository. In a global development setup a server would reside at each project team's location. Each developer can access each of those servers, with either a dynamic or a snapshot view.

Atypically for a modern version control system, ClearCase implements pessimistic concurrency control. An administrator configures which artifacts in which branches are allowed to be written by a user and locks these artifacts to this user only. All other users can read these artifacts and see updates, but they cannot change the content. Coherency is always preserved by forbidding concurrent changes to the same artifact in the same branch. If conflicts occur, e.g., while merging branches, they are noticed by ClearCase and have to be resolved. The guaranteed degree of consistency in ClearCase depends on the view a user works on. In the dynamic view sequential consistency is guaranteed, as artifacts are changed directly on a server. The snapshot view can offer more performance, as artifacts are modified on the local machine and transferred to the server at a later point in time, thus only guaranteeing

sequential consistency. When using the multisite setup, the guaranteed consistency degree drops to causal consistency. The latest versions of some artifacts might be missing on some servers. The users always try to update their versions from the servers which store the latest versions. Nevertheless, some machines might not be able to provide any versions. The transparent version control offered to a developer is only possible by giving an administrator the complex task of configuring the system and of avoiding conflicting changes by pessimistic locking or detecting and solving occurring conflicts. ClearCase achieves this high degree of consistency at the cost of performance. All operations on artifacts in a dynamic view are delayed by the time the network messages need to travel to the server storing the repository. Practically, one can work in a dynamic view only if the connection is as fast as in a local area network (LAN). The dynamic view is not usable in a global software engineering scenario, even when a multisite server setup is used. The snapshot view works as well as if a Subversion server had been used, although conflicting changes are more likely to happen due to the postponed synchronization with the server.
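The multisite rule that each server masters a disjoint set of items and accepts writes only for those items can be sketched as follows; the class and names are hypothetical and only illustrate the ownership and synchronization idea.

class MultisiteServer:
    """Each replica owns a disjoint set of items and only accepts writes for those."""
    def __init__(self, name, owned_items):
        self.name = name
        self.owned = set(owned_items)
        self.versions = {}                     # item -> list of versions (possibly stale)

    def commit(self, item, content):
        if item not in self.owned:
            raise RuntimeError(f"{self.name} is not the mastering site for {item}")
        self.versions.setdefault(item, []).append(content)

    def sync_from(self, other):
        """Periodic synchronization: copy versions of items mastered at the other site."""
        for item, versions in other.versions.items():
            if item not in self.owned:
                self.versions[item] = list(versions)

berlin = MultisiteServer("berlin", {"driver.c"})
tokyo = MultisiteServer("tokyo", {"ui.c"})
berlin.commit("driver.c", "v1")
tokyo.sync_from(berlin)        # tokyo now has driver.c v1, but may lag behind later commits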

5.2 distributed version control

To overcome the shortcomings of centralized systems mentioned in Section 2.4.1, a new approach arose with Monotone [mon] in mid 2003. Several distributed version control systems (dVCSs) followed, which differ only in a few details. In fact they behave, from a functional point of view, almost identically - especially Git [HT] and Mercurial [Mac] are very similar, taking their various extensions into account. They only differ in how they perform in specific tasks. Because of the strong similarity, the basic concept is shown first. A description of the most important dVCSs, Monotone, Git and Mercurial, follows.

The dVCSs are popular among a great variety of open source projects. Only a few commercial projects exist which prefer a dVCS to a cVCS. The reason for this might be the lack of a global repository where the changes made by all developers are gathered immediately. However, for most commercial projects it is unknown which VCS is in use. Nevertheless, the first mature dVCS (Git) was developed solely to support open source projects (to be more precise, one open source project: the Linux kernel). As detailed in Section 4.1, not all developers' contributions are integrated into an open source project. A dVCS supports different repositories among its users, where only parts of a history are integrated into the history of another developer's repository.

Figure 14: Basic architecture of dVCSs

5.2.1 Basic Architecture of dVCS

Common to all dVCSs is the distributed repository, shown in Figure 14. Instead of a centralized repository, every user has a complete repository on his local machine. The user (Alice) edits artifacts in a working copy. If she wants to store any changes made, she executes the commit operation, which creates a new snapshot in the associated local repository. Stored versions can be retrieved by issuing the check out operation. Note that here the term repository does not have the same meaning as in CVS, where a repository is the place on a server where all modules are stored. As in Subversion, a repository of a dVCS stores only one project with all metadata, such as branches, tags and snapshots.

Figure 15: Sharing versions between participants of a dVCS

Figure 15 shows how two users share their snapshots. As each user has a repository, there is no main repository. Both Alice and Bob edit files in their computers' working copies and execute the commit operation to create a history of their changes in the local repository on their machines. To exchange each other's history the push or pull operation is used, as demonstrated in Figure 16. Alice and Bob created the history presented in this figure. The history is in the respective local repositories, as presented in Figure 16a. Alice and Bob started working with the same initial snapshot, named 1 and being the first snapshot in both histories. Alice and Bob both committed their changes to the artifacts in their working copy twice, creating the two subsequent snapshot entries in their history, named 2 and 3 in Alice's and II and III in Bob's history.

Alice informs Bob (by e-mail, a phone call, etc.) that it is time to share their changes. In this process not only are the latest versions exchanged, but the complete history of both repositories is synchronized. Alice pulls Bob's changes into her local repository (compare with Figure 16b). Bob has to execute the serve operation, which prepares Bob's repository to look for incoming connections. The two machines connect directly when Alice invokes the pull operation with the repository on Bob's computer as the target. A protocol specific to this dVCS transfers Bob's snapshots to Alice's machine. Bob's snapshots are based on snapshot 1, but might conflict with Alice's snapshots. Although they may be able to merge automatically, Bob's snapshots are stored as an alternative development line in Alice's history, called an unnamed branch. There are now two head snapshots in Alice's repository, depicted in Figure 16b. Alice can continue to work or merge the two head snapshots, 3 and III, in her working copy. A subsequent commit creates snapshot 4, shown in Figure 16c. Now she can push her history to Bob, which updates his history with the missing snapshots, using the push operation (see Figure 16d). N.B. Bob could have executed the pull operation to acquire Alice's snapshots as well - but Alice would have to enable the server mode in her dVCS first. If Bob had continued to work and committed further snapshots in the meantime, these new snapshots would still be in Bob's repository after Alice's push.

The previous example showed that there is no repository where all of the latest snapshots are stored, like a centralized repository in cVCSs. Each developer has his personal repository, into which he can integrate changes from other developers.

Figure 16: Alice and Bob share changes in a dVCS: (a) initial state, (b) pull, (c) merge, (d) push

Having a local repository creates a possibility which is not present when dealing with a centralized repository: the history of the committed snapshots can be rewritten. That means that unwanted commits can be erased and snapshots can be reordered, to make the history look nice. A common use case is to develop in a branch. Instead of merging the branch into the main development line after the work is done, the branch gets rebased. This operation puts all snapshots from the branch on top of the latest revision of the main development line, so that the version history looks as if they were committed later. However, this has to be done before snapshots are exchanged, as the rewritten snapshots change their identifiers as well.

The absence of a centralized repository could be countered with a dedicated repository which serves as a rendezvous point, to which all users synchronize their repositories. The snapshots stored in this repository, however, would be outdated, as new snapshots are created locally and shared subsequently. This is one of many possible workflows. A further discussion of typical workflows can be found in Section 4.1.

The local repository offers all of the features of a centralized repository, including but not limited to (named) branches, tags, snapshot commits and updates, fulfilling requirement R-1: Support common version control operations. Network communication is only necessary for exchanging snapshots, as all other functions are executed locally. These functions therefore operate much faster than in centralized version control systems. Checking out a project initially, called cloning, however, is slower than in a cVCS, as not only the latest snapshot but the complete repository has to be transferred to the developer's machine. To speed up this initial copy of the repository, it can also be transferred by, for example, a USB flash memory drive.

Due to the distributed repositories, artifacts cannot be locked by a developer, thus the pessimistic concurrency control used with the first version control systems cannot be applied. The optimistic concurrency control used by most cVCSs, copy-modify-merge, cannot be applied either: the copy-modify-merge procedure requires communication with all repositories, which is not possible in a dVCS. Additionally, in this procedure a snapshot cannot be committed if it is based on the same base snapshot as an already committed snapshot. This

would be impractical in a dVCS, as the push/pull operation applies multiple snapshots to a repository. As stated before, any snapshot can be applied, forming an unnamed branch if it is based on the same base snapshot as an earlier committed snapshot. The workflow used here is, therefore, copy-modify-branch. All of the distributed repositories are always coherent: contradictory changes to the same artifact are saved in unnamed branches, and the version identifier is calculated based on a version's content. As we saw in the previous example, unnamed branches can be created in a named branch. The development is no longer linear, but forms a directed acyclic graph (DAG). The flexibility to work without a network connection comes at the price of guaranteeing causal consistency only. All snapshots in a user's repository are related to each other (except the most recent ones). But no user has all existing snapshots. The received snapshots depend on the user who was manually chosen to pull changes from. Conflicts are only detected when users exchange their snapshots.
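The effect of a pull can be sketched as copying every snapshot the pulling repository is missing together with the snapshots it is based on, so that no gaps arise; the sketch below is a hypothetical model and not the actual Git or Mercurial wire protocol.

def pull(target, source, parents):
    """Copy all snapshots from `source` that `target` is missing, including their
    predecessors, so that the target history never contains gaps."""
    missing = source - target
    closure = set(missing)
    stack = list(missing)
    while stack:
        v = stack.pop()
        for p in parents[v]:
            if p not in closure and p not in target:
                closure.add(p)
                stack.append(p)
    return target | closure

# Alice: 1 <- 2 <- 3, Bob: 1 <- II <- III (same initial snapshot 1)
parents = {1: set(), 2: {1}, 3: {2}, "II": {1}, "III": {"II"}}
alice = {1, 2, 3}
bob = {1, "II", "III"}
alice = pull(alice, bob, parents)   # Alice now has two head snapshots: 3 and III
print(sorted(alice, key=str))       # [1, 2, 3, 'II', 'III']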

5.2.2 Monotone

Monotone pioneered this new way to control the development of software projects in mid-2003 (see [mon]). The complete repository with all metadata is stored in a local database, which is saved in a file using SQLite2. The metadata structure is very similar to the structure created by Mercurial, which is described in more detail in Section 5.2.4. There can only be one working copy per repository. A repository can be cloned locally by simply (deep) copying the database file, which consumes additional hard disk space. A second local working copy can then be created, e.g., to track an experimental development in a branch. Snapshots can be exchanged between local or remote repositories, as demonstrated in Section 5.2.1. Monotone offers a third operation in addition to push and pull: sync, which executes push and pull directly after each other. The only way to access a remote repository is with the custom protocol netSync, which is similar to rsync3. This protocol transfers only the missing snapshots and other missing metadata. The same protocol also transfers changes between local repositories, but to create a local repository, the entire repository has to be copied. Monotone features a built-in security concept: each developer has a public/private key pair, which is used to sign snapshots and grant access to one's repository. The access can be read only or read and write, but is set for the entire repository, not for individual artifacts. Monotone is used by a small number of open source projects4.

5.2.3 Git

Forking the development of software projects by using branches is often a difficult task. An important aspect of Git is a new approach to handling branches. Linus Torvalds initiated the development of Git in April 2005 (see [Tor]). Today it is the most popular distributed version control system, used by many projects5. The Monotone version control system could have been used rather than developing a new system, as the possibility to work offline was compelling; however, it performed too poorly at that time, although it was later improved with revision 0.27. Creating and merging branches in Git was designed to be as easy as possible. In the philosophy of Git, every user's local working copy is a branch for another user. Thus, there is only one head version for each user, called master in Git, which is similar to the concept of a trunk in cVCSs. The master branch of one user is a normal branch for another user, who has his own master branch.

2 http://www.sqlite.org 3 http://rsync.samba.org 4 like coLinux (http://www.colinux.org), CTWM (http://ctwm.free.lp.se), Pidgin (http://www.pidgin.im) and Xaraya (http://www.xaraya.com), to name some 5 among them are the linux kernel, many linux distributions, including Android, Debian and Fedora, many GNU projects, including gcc, and many other popular projects, such as Perl or VLC, all listed in http://git.wiki.kernel.org/index.php/GitProjects

[Figure 17 is a UML class diagram with the classes Commit (author, committer, comment), Tag/Branch (name), Object (SHA-1 hash id, type, size), Tree (map of names) and Blob (artifact data), connected by the associations described below.]

Figure 17: Metamodel of Git’s version model

Figure 17 shows the version model of Git. There are objects and tags/branches. All these objects are implemented using text files, which are stored compressed. An object contains its type, which can be the string “blob”, “tree” or “commit”, its size on the hard disk and content, which depends on the type. The identifier of an object is computed by hashing the object with the SHA-1 hash function. A blob represents an artifact; its type is, therefore, the string “blob”. It stores the content of the artifact it represents in textual or binary form. The name of the artifact is stored in a tree. A tree represents a folder. The content of a tree is a map of hash ID and string pairs, where the hash ID is either the ID of a blob (representing an artifact) or of a tree (representing a folder) and the string is the artifact's or rather the folder's name. By listing all blobs and trees in a tree, the structure of the artifacts, folders and subfolders in the controlled working copy is mirrored. There is only one tree at the top level of this hierarchy, which represents the root folder of the working copy.

Hash Function: A hash function (often SHA-1) calculates a hash value from a given input string. The resulting hash value is an element of a limited codomain and has a fixed number of digits. It can happen that two input strings result in the same hash value, which is called a hash collision. However, how likely a hash collision occurs depends on the used hash algorithm. If all possible input strings are known, a perfect hash algorithm can be used, where collisions do not occur at all. Cryptographic hash functions have a strong collision resistance to hinder the decryption of encrypted data. SHA-1 (secure hash algorithm 1) is a cryptographic hash function, which was developed by the National Security Agency (NSA). Theoretically a hash collision can occur when using SHA-1, but the odds are negligibly small. SHA-1 is utilized in the widely used SSL encryption, and at the time of this writing not a single accidental collision has been reported.

A commit represents a committed snapshot. It stores metadata about the commit: the author's name, the name of the committer and a comment. It additionally stores the hash ID of the topmost tree, which represents the topmost folder in the working copy. A list of hash IDs is included, which represent parent commits. In a linear development there is only one parent, but if the commit is the result of a merge, multiple parents are listed. That could be any number, because more than two branches can be merged in Git (called an octopus merge). Identifying these objects with the value which is computed by calculating the hash value of the object itself, and referring to each other in the outlined structure, has some positive side effects. The folder structure automatically refers to unchanged and changed artifacts, effectively creating a snapshot. Unchanged artifacts are not stored twice; the tree refers to the old artifact instead. If an artifact is changed, a new blob is created, which stores the complete content of this new artifact, not only the changes to the previous version, as is usually done using delta compression. The built-in compression, however, ensures that no storage space is wasted and is in most cases more efficient. Note that a changed artifact in a subdirectory not only creates a new blob, but a new tree is created for all folders on the path to that artifact as well.
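To make this content-addressed storage concrete, the following minimal Python sketch computes identifiers in the spirit of Git's object model. The blob header mirrors Git's actual format; the tree and commit payloads are simplified stand-ins, so the resulting tree and commit identifiers are illustrative only.

import hashlib

def object_id(obj_type, payload):
    # content-addressed identifier: SHA-1 over a header naming type and size, plus the content
    # (this header matches Git's format for blobs; tree and commit payloads are simplified below)
    header = f"{obj_type} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()

# a blob stores only an artifact's content
blob_id = object_id("blob", b"print('hello')\n")

# a (simplified) tree maps artifact and folder names to the ids of blobs and subtrees
tree_id = object_id("tree", f"hello.py {blob_id}\n".encode())

# a (simplified) commit refers to the top-level tree and to its parent commits; changing any
# artifact therefore changes every identifier up to the commit, making the history self-certifying
commit_id = object_id("commit", f"tree {tree_id}\nparent <previous commit id>\nauthor Alice\n\nfix typo\n".encode())
print(blob_id, tree_id, commit_id)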
The new blob has another hash value, as the represented artifact's content changed. This new value is entered under the old name in the tree which represents the folder that contains this artifact. This changed content results in a new tree object, which has another hash value, which updates the trees which refer to it, etc. In the same manner the history is cryptographically secured. The changes made to the artifacts and folders lead to a unique hash ID of the topmost tree, which is entered in the commit object. The hash value of that commit object is entered in the succeeding commit object, etc., building a self-signing history. If one object were manipulated, the computed hash values of the other objects would reveal the manipulation.

Tags and branches are represented by small files which only contain the 40-character hash value of a commit (and a newline character). The filename of this tag/branch object is the tag/branch name. If a new commit is the successor of a commit a branch refers to, the branch is updated with the latest commit's hash ID. Regarding this structure, branches are extremely economical: in fact only 41 bytes of space are required for a branch. Git encourages working with multiple branches. For each feature, a new branch should be created, and when working on different features at the same time, a developer should alternate between branches. E.g., Alice is working on a new feature when her boss wants a bug in the code fixed. She commits her last changes to her feature branch and creates a new branch from the snapshot her current work is based on. Now she updates the artifacts in her working copy to that commit and begins to work on the bug fix. She can switch quickly between the branches, as only the artifacts in the working copy have to be reverted. Only conflicting changes, which are recognized by having multiple blobs with the same name, need to be reconciled when merging the changes back. A common practice is to rebase the commits, so that the history appears more linear, before sharing the changes with other developers.

The previous example showed how Git actually tracks the content of artifacts instead of the artifacts themselves. This has a positive effect on renamed or copied artifacts. As long as the content is identical, they are represented by the same blob. Their name is stored by the tree that represents the surrounding folder. Git detects renamed artifacts which have had their content changed using a heuristic which probes the similarity of different candidates. This slim storage structure, the ability to rebase the history and to share only selected branches are an improvement over Monotone.

The history is exchanged using the push and pull operations. Just like Monotone, Git offers a proprietary protocol, but this protocol can operate on top of http or ssh. Additionally, a patch, which includes all changes between specified snapshots, can be created locally and shared as a file in any way, including via e-mail or using a USB flash memory drive. Git is able to synchronize with other version control systems such as cvs, Subversion or Mercurial. Depending on the concrete solution, this can be a (one or) two way synchronization or a continuous exchange, making the systems interoperable. Git could even be used to combine two other systems, working as an intermediary. There is also a cvs server integrated into Git, so that the repository can be manipulated as though cvs were being used.
Adapters for other version control systems, such as Mercurial to name one, exist as well or are created by the community as extensions. The development of Git is open source and it has a good extension system, whereby optional functions can be hooked in. As a drawback of the repository storage structure, it takes a while to traverse the history of a single file; it needs to be computed on demand by traversing the tree and blob structure of every commit. An operation dealing with this task is provided, although its execution is not recommended. Similarly, a change set, i.e., a list of the artifacts that were changed in a given snapshot (or commit in Git's terminology), must also be computed by comparing a

snapshot with its parent snapshots. A further drawback is the platform dependence of Git. It runs natively on unix/linux based machines, including Apple's OS X, but it is difficult to run it on a Windows based machine. In essence, the unix compatibility layer cygwin6 must be installed.

5.2.4 Mercurial

Mercurial shares almost all features with Git. Like Git, Mercurial is able to work in symbiosis with other version control systems7. It could be used in a mixed setup, where different clients use different VCSs. Meanwhile many features were added using extensions; many of those are delivered with the base package of Mercurial and are easy to activate. The ability to rebase snapshots, for example, was not planned initially (as the philosophy of Git was not shared), but was later added through an optional extension. Rather than repeating Git's features, this description focuses on the differences between Git and Mercurial. Mercurial was developed in parallel to Git by Matt Mackall - in fact its development started only a few days later, with the same intention to serve as a VCS for the linux kernel development. Git was chosen for this project, but Mercurial is used by many other projects8. The majority of Mercurial's code is written in Python. Python is currently developed using Subversion, but is preparing to migrate to Mercurial for version control. Due to its platform independent code base, Mercurial runs on almost any platform.

[Figure 18 is a UML class diagram with the classes Changelog (manifest hash, author, date, branch, set of filenames, comment), Tags (map of tag names), Manifest (map of full file names), RevLog, Index with IndexEntry (rev, offset, length, base, linkrev, nodeId) and Filelog (artifact data), connected as described below.]

Figure 18: Metamodel of Mercurial’s version model

6 http://www.cygwin.com 7 like with Subversion, enabled by the extension hgsvn (http://pypi.python.org/pypi/hgsvn/) 8 like Mozilla (http://www.mozilla.org), OpenJDK (http://openjdk.java.net), OpenOffice (http://www.openoffice.org), SymbianOS (http://www.symbian.org), Xen (http://www.xen.org), Adium (http://adium.im), Growl (http://growl.info), Netbeans (http://www.netbeans.org), vim (http://www.vim.org), to name a few

Although Mercurial's functions are identical to Git's functions, the version model is different. Figure 18 presents Mercurial's version model. The most important metadata files are all revlogs (detailed further in [Mac06]). There are additional files for some information, such as the “.hgtags” file, which stores the tags of a repository. A revlog is Mercurial's equivalent to Git's object. It consists of two functional units, an index and a data file. Only when these units exceed a certain size are they separated into two individual files. The data file stores the content of all versions of a revlog. The content is stored using delta compression: the initial version is stored completely, while only the changes that form the next version are added. A specific version is retrieved by taking the latest full version stored which is older than the queried version and applying the subsequently stored deltas, until the queried revision is reconstructed. Whenever a new version is smaller than the sum of all deltas which are necessary to reconstruct this version, it is stored completely instead of as a delta. This sequence of full versions and deltas results in a very efficient space usage. Additionally, the data and index files are compressed. Therefore the space required for a Mercurial repository is only slightly larger than for a Git repository [Git], but significantly smaller than for a Subversion repository.

A version in a revlog is identified by an identifier called nodeid. Similar to the identifier in Git, it is computed by calculating a SHA-1 hash over the content (of the version, not of the stored entry, which could be the full version or a delta) concatenated with the nodeids of the version's parents. The parents' nodeids are included to make the identifier unambiguous, as the same version could exist in a different place in the history. Due to using hash values of the content of an artifact to identify its version, the history is robust against manipulation, as discussed for Git. The index file contains the necessary information to extract a specific version from the data file, as the versions are stored directly after one another. There is a line (represented by the IndexEntry) for each version in the index file: the parameter rev is the version's local revision number, which is increased for each new entry. N.B. This order may not be the same in a cloned repository, as it depends on the order in which the individual snapshots were stored (which can differ when storing snapshots in parallel development lines). The offset and length identify where the version's data is stored in the data file, so that a single read access can extract it. The entry base indicates the last fully stored version upon which the subsequently stored deltas are based. To retrieve a specific version, the data file needs to be read from the base version onwards. The linkrev is an additional link to the changelog entry in which the revlog version was changed. The nodeid identifies the version globally, meaning that a specific version in a revlog has the same nodeid independent of which repository contains it or when it was added. There are potentially two parents of a revlog version, whose nodeids are stored under p1 and p2. In Mercurial only two branches can be merged, as in most version control systems. The filelog stores the versions of an artifact. A filelog is implemented by a revlog, and the content stored in the data part of the revlog is the actual content (or a delta of it) of the represented artifact. A filelog is similar to a blob of Git; however, all versions of a specific file are stored here, rather than a single version. Thus multiple blobs in Git can be represented by one filelog in Mercurial. The advantage is that the history of a single file can be searched very quickly.
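The following Python sketch illustrates the two ideas just described, the delta chain with occasional full versions and the parent-dependent identifier. It is a toy model for illustration only, not Mercurial's actual on-disk format; the class name, the opcode-based delta representation and the size heuristic are assumptions made for the example.

import difflib, hashlib

class MiniRevlog:
    # toy sketch of a revlog-style delta chain (illustrative, not Mercurial's actual format)

    def __init__(self):
        self.entries = []   # each entry is ("full", lines) or ("delta", ops)

    @staticmethod
    def nodeid(text, p1=b"", p2=b""):
        # hash over the full content concatenated with the parents' ids, so identical content
        # at a different place in the history still gets a distinct identifier
        return hashlib.sha1(p1 + p2 + text).digest()

    def add_revision(self, text):
        lines = text.splitlines(keepends=True)
        if not self.entries:
            self.entries.append(("full", lines))
            return
        prev = self.get_revision(len(self.entries) - 1)
        ops = [(i1, i2, lines[j1:j2])
               for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, prev, lines).get_opcodes()
               if tag != "equal"]
        # store a full version whenever the delta chain needed to reconstruct this revision
        # would be larger than the revision itself (the heuristic described above)
        chain = self._chain_size() + sum(len("".join(new)) for _, _, new in ops)
        self.entries.append(("full", lines) if chain > len(text) else ("delta", ops))

    def _chain_size(self):
        size = 0
        for kind, payload in reversed(self.entries):
            if kind == "full":
                break
            size += sum(len("".join(new)) for _, _, new in payload)
        return size

    def get_revision(self, rev):
        base = rev
        while self.entries[base][0] != "full":
            base -= 1                      # walk back to the last fully stored version
        lines = list(self.entries[base][1])
        for _, ops in self.entries[base + 1 : rev + 1]:
            for i1, i2, new in sorted(ops, reverse=True):   # apply deltas back to front
                lines[i1:i2] = new
        return lines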
The name of the controlled artifact is stored in a manifest, which lists all filenames connected to the nodeid of their version. A version of a manifest represents a snapshot. If a user commits his changes, all files in the working copy are listed, but only the nodeids of the changed files are updated. Mercurial does not track versions of folders. The filename entry consists of the complete path to the file, thus changes to folders (like renames or added/removed files) are indirectly tracked by tracking the evolution of the files. A version of a changelog is similar to a commit in Git. It has entries for the nodeid of a manifest (i.e., a snapshot), the author's name, a timestamp, additional comments, a branch name and a set of the artifacts which were changed, so that the changed files can be identified without requesting the associated manifest version. N.B. A commit in Git corresponds conceptually to a snapshot, while a version of a changelog corresponds to a change set. As mentioned before, tags are stored in a special file (“.hgtags”), which is tracked like any other artifact. Branches are managed by storing their name in an attribute of a changelog version. They are as lightweight as their counterparts in Git, as branches and tags are only metadata which refer to stored data, instead of copies of the artifacts' versions. Artifact versions which are created in different branches are stored in the same filelog. Their order depends on the order in which the versions were applied, thus two filelogs representing the same artifact in two identical repositories might differ. In Mercurial, artifact renames have to be commanded explicitly using “hg rename”. Renaming a file creates a new filelog. The first version entry in this new filelog's data file refers to the last version of the old filelog. The old filelog is no longer updated, but remains to provide older

versions of the artifact. In fact, the rename operation performs a copy, marks the copy to be tracked and removes the original artifact from being tracked. An artifact copy can be created with or without using Mercurial's built-in command. If the copy was made without Mercurial's copy command, Mercurial will treat the copy as a completely new file. The nodeid of the copied version is different, because the parent nodeids, which are included in the calculation of a nodeid, are different. If Mercurial's command “hg copy” is used, Mercurial knows about the connection between the artifact's versions. Just as in the case of a renamed artifact, a new filelog is created, which refers to the original version's filelog. Changes made to the artifact's copies follow each other, provided that the artifacts are in different repositories. A practical example motivates this behavior: Alice copies artifact a from the folder test to the folder dev. She did not share this change with Bob. Bob edits artifact a, which is still in the folder test in his repository. After Bob and Alice have exchanged their changes, Alice finds Bob's modifications in both copies of a. In this way the same result is preserved as if Alice's copy and Bob's changes had occurred in the opposite order. If Alice had renamed the artifact instead, the benefit of this behavior is even clearer: Bob's changes under the old name are applied to the renamed artifact in Alice's repository. Mercurial follows the philosophy of sharing the same repository. A locally created branch is, by default, shared with other developers. A temporary experimental branch could be created by cloning a repository (which creates a shallow copy and takes no additional space) and creating a branch in this new clone. If the changes in this experimental repository should be applied to the main repository, they can be pulled. If changes are shared using the pull and push operations, the repositories of the involved developers are identical. Manipulation of the history is possible with extensions, but not intended. The philosophy of Git, in contrast, is to develop a project in different repositories: only chosen branches are shared, the history should be rewritten to have a clean development line, etc. Using existing extensions, Git and Mercurial can be used in any workflow described in Section 4.1.

5.3 peer-to-peer version control

Lately, a few peer-to-peer based version control systems emerged, which combine the serverless operation of dVCSs with the global repository of cVCSs. Peer-to-peer is well established for file sharing applications, such as BitTorrent9 or eMule10 and many more. In a file sharing application immutable items are offered and downloaded by other users in sophisticated ways. A file is only retrievable as long as users who offer this file are online. Thus it is not important to know which files might become available once the storing users rejoin. Typically, the files are replicated in an unstructured way, whereby peers who retrieve a file, even partially, offer it to other peers. While exploiting various load balancing, search and retrieval mechanisms, data consistency or coherency does not play a role in file sharing systems. A file is identified by calculating a hash value over its content. Thus it is not expected to be modified - changes would create a new item, which is not linked to the unmodified version in any way. Concurrent modifications can occur simultaneously and lead to contradictory items - which are also not identified as belonging together in any way. Because a shared item is practically immutable, all copies are coherent and consistent. Based on the initial approaches for file sharing, some distributed peer-to-peer based storage systems were created. The most popular among them are Amazon's Dynamo (aka S3) [DHJ+07], Oceanstore [KBC+00] and Ivy [MMGC02], which implement file system semantics. Contrary to the file sharing applications, stored items are expected to be modified concurrently. They are stored at specific peers - regardless of whether the peers' users intended to retrieve an item or not. Most of these systems operate on a block level rather than a file level. Concurrent modifications are performed by sequentializing them with the help of a single primary maintainer or a group of maintainers, which agree using a quorum vote

9 http://www.bittorrent.com/ 10 http://www.emule-project.net/

[MR98] on a sequential order (as in Oceanstore). Some systems, such as Bayou [TTP+95], bring concurrent write requests into a sequential order by calculating globally unique sequence numbers (using timestamps). Depending on the system, concurrent modifications are either automatically merged (where conflicting changes are overwritten with the latest modifications) or the latest modification replaces all earlier changes. Outdated updates are rejected nevertheless: if an artifact has already been replaced by another user's changes, all proposed changes that are based on previous versions are rejected. The latest modifications of the artifact in question need to be retrieved and the proposed changes must be applied again. Stored artifacts are replaced by updated artifacts. Some systems, such as Oceanstore [KBC+00], also keep the previous version, but other than this no history of the applied changes is kept (i.e., only one previous version is stored). No system handles modifications on a collection of items atomically, which would create a snapshot; they guarantee ACID properties for single files only. All systems take care that coherency among the stored items is preserved. Depending on the system, the guaranteed consistency degree can range anywhere between eventual consistency and sequential consistency.

There are only a few version control systems which utilize peer-to-peer communication to build a decentralized system. There are a number of wiki engines which feature version control in a limited way - by overwriting articles without keeping a history of changes. The most mature approaches are presented in the following. Among these systems, only one commercial version control system exists. This system, Code Co-Op, does not rely on a peer-to-peer infrastructure, but its components communicate in a peer-to-peer fashion. To the author's best knowledge there are only four systems solely designed for version control which are fully decentralized and built on a peer-to-peer network. These systems are described in Section 5.3.5 and the following sections. All of these systems are research prototypes. None of their authors seem to be aware of any other peer-to-peer based version control system, as indicated by their related work discussions. The most mature system is Pastwatch, described in Section 5.3.5. All approaches can be separated into replicated or maintainer based repository distribution. In the replicated repository distribution every participant has an identical copy of the entire repository. Updates on an artifact's content are distributed to all copies of the repository. Thus these systems guarantee eventual consistency only, but are able to control snapshots. The maintainer based systems divide the repository among some specific machines. Updates are sent and applied only to them. When the latest versions are to be retrieved, these distributed maintainers need to be requested. They offer sequential consistency, but control only versions of single artifacts and are unable to record changes to multiple artifacts (snapshots) in an atomic fashion.

5.3.1 Wooki

Unstructured Peer-to-peer Overlay Network: In an unstructured peer-to-peer overlay network every peer keeps contact to a limited number of random peers. The network topology is illustrated in Figure 19a. The underlay network address, which is typically the IP address of these peers, is stored in a routing table. Using the memorized underlay address, another peer can be contacted. Peers have no identifier and cannot be addressed directly. Instead, requests are forwarded until they reach a peer who is able to reply. Due to the random connections and message dissemination techniques, it is not guaranteed that a request will be received by the intended peer.

Wooki [WUM07] is a wiki engine which is built upon an unstructured peer-to-peer overlay network. Following the replicated repository approach, each peer stores all versions of all wiki articles. When a user creates or changes an article, all changes are flooded to all other peers, line by line rather than as a complete file. The order of the linewise update messages cannot be guaranteed, thus a later change might arrive before an earlier change was received. The so-called woot algorithm, described in [OUMI06], ensures that changes are applied in the order in which they were made.

(a) unstructured (b) hierarchical

Figure 19: Unstructured and hierarchical peer-to-peer overlay network topologies

An update message contains contextual information such as the neighboring lines around the changed line. A received change will only be applied to a locally stored article if all referred context lines are present. A peer stores the changes that a user performs while he is offline and propagates them when he is back online. During this time it also requests all changes it missed from a neighbor, to be up to date again. The version control mechanism in Wooki does not track a version history - only concurrent updates are handled. Concurrent changes are automatically merged in the same way on each peer. However, conflicting changes, like those on the same line, are not resolved. To maintain coherency, conflicting changes would have to be resolved on each storing peer in the same way, resulting in identical artifacts, which cannot be guaranteed. Propagating the update messages to all peers ensures eventual consistency, but flooding in an unstructured network does not guarantee that all participants are reached. Thus the multiple copies of the repository might not be coherent. The lack of coherency is not acceptable for a version control system.

Flooding: Flooding is a technique to disseminate a piece of information in an unstructured peer-to-peer overlay network. Messages are broadcast in the overlay network using different techniques, which are based on receiving and forwarding a message, so that a huge number of random peers is reached. Flooding with a time to live (TTL) is a prominent technique: a peer sends a message to all its neighbors. The receiver decrements the TTL number and forwards the message to its neighbors if the TTL number is not zero. Upon forwarding, a peer attaches its address so that a receiver can update its routing table.
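A minimal Python sketch of such TTL-based flooding is given below. It is illustrative only and not Wooki's actual dissemination code; the Peer class, the duplicate-suppression set and the message format are assumptions made for the example.

class Peer:
    # toy sketch of TTL-based flooding in an unstructured overlay (illustrative only)

    def __init__(self, name):
        self.name = name
        self.neighbors = []   # randomly chosen contacts (the "routing table")
        self.seen = set()     # message ids already handled, to stop re-flooding

    def flood(self, msg_id, payload, ttl):
        # originate a message: hand it to all neighbors with the initial TTL
        self.seen.add(msg_id)
        for n in self.neighbors:
            n.receive(msg_id, payload, ttl, sender=self)

    def receive(self, msg_id, payload, ttl, sender):
        if msg_id in self.seen:
            return
        self.seen.add(msg_id)
        self.handle(payload)              # e.g., apply a line-wise article update
        if ttl - 1 > 0:                   # decrement the TTL and forward while it is not zero
            for n in self.neighbors:
                if n is not sender:
                    n.receive(msg_id, payload, ttl - 1, sender=self)

    def handle(self, payload):
        print(f"{self.name} received: {payload}")

# hypothetical usage: wire up three peers and flood one update
a, b, c = Peer("A"), Peer("B"), Peer("C")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
a.flood("msg-1", "replace line 3 of article X", ttl=3)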

5.3.2 DistriWiki

Hierarchical Peer-to-peer Overlay Network: In a hierarchical peer-to-peer overlay network multiple topologies are combined. In Figure 19b two unstructured network topologies are combined. The dark circles represent peers that are part of both networks; they are called superpeers. The white circles represent normal peers, which reside in disconnected subnetworks. Messages are routed hierarchically in both networks: first a message is flooded through the subnetwork from which it originates. The participating superpeer forwards the message to the other superpeers, which then flood it into the subnetwork they are part of.

DistriWiki [ML07] relies on the hierarchical peer-to-peer overlay network JXTA [Dui07]. JXTA consists of unstructured subnetworks and a structured superpeer network, which are hierarchically combined as presented in Figure 19b. In JXTA all machines in a local area network (LAN) form a subnetwork, while only the superpeers are connected over a wide area network (WAN). All superpeers in JXTA can be identified with a number. Each superpeer is responsible for storing all advertisements whose identifier is numerically close to its own identifier. An advertisement is an XML file in JXTA which indexes a resource. DistriWiki uses it to store the name and other information (author, keywords, etc.) of a wiki article. In addition to this descriptive information, the peer-to-peer network identifier of the peer from which the respective article can be retrieved is stored as well. The superpeers know at least the contact addresses of all superpeers whose identifier is numerically close to their own identifier. All superpeers know each other as well. When a new article is created, it is assigned a random number. Articles are stored by the peer whose user created the article. It creates a JXTA advertisement (an XML file) which contains information about the article, including which peer is offering it. This advertisement is sent to other peers. The dissemination protocol works hierarchically: the broadcast mechanism of the underlying network is used to send the advertisement to all peers in the local area network (LAN). One of these peers, which has connections to other peers in a wide area network (WAN), is the superpeer. A received advertisement is forwarded to the superpeer whose identifier is numerically close to the article's identifier. This peer and its numerically closest neighbors store the advertisement. When a user requests an article which is not in the local cache, the machine looks for a locally cached advertisement of that article. If nothing is found, the peer sends a broadcast message ("discover") to all peers in the LAN, which is forwarded by the superpeer to other peers in the WAN. As an answer, it receives all published advertisements. As all metadata of the article are stored in an advertisement, the matching advertisement can be found. To receive the sought article, the underlying network address of the publishing peer is extracted from the corresponding advertisement file and it is contacted directly. As only the advertisements but not the articles themselves are replicated, only the original publisher of an article can offer it. If it goes offline, the article is unavailable as well. Like a server-based system, it represents a single point of failure, although only some articles (the ones published by this peer) become unavailable.
The system uses maintainer based repository control, whereby all articles are distributed among specific peers. A version history is not kept; updates overwrite the former article's content. Concurrent updates might be resolved by the single peer storing the article, although this behavior is not described by the authors. Changes to multiple articles cannot be applied in an atomic transaction. Having only a single version of an article on a single machine ensures coherency, as no contradictory copy exists, and sequential consistency, as the latest value is always received. The retrievability of articles is very limited, however, and overwriting changes with the latest received updates is not acceptable behavior for a version control system.

5.3.3 CVS over DHT

Borowsky et al. [BLS06] present an inspiring approach to a peer-to-peer based version control system. The main focus of this work is to show how traditional client-server based systems can be modified to use peer-to-peer communication. The system is based on the structured peer-to-peer system BambooDHT [RGRK04], whose development has been discontinued since 2006. The system distributes the repository in a maintainer based fashion. A complete CVS server is installed on each peer. By assigning an ID to each artifact, it is distributed among the peers following the normal Distributed Hash Table (DHT) approach. A peer is responsible for controlling the evolution of all artifacts that have an identifier numerically close to its peer ID. The peer's CVS system provides the mechanisms for file version control. This results in a truly distributed repository, in which each peer has a small set of files under control. As in CVS, the approach is unable to support version control for snapshots. It provides sequential consistency for each artifact. Coherence is never compromised, as the maintainer of an artifact handles concurrent changes in the same way as CVS does. Unfortunately it is lacking in practical usability: user management is nearly impossible, as a user account would have to be set up on every peer.

Structured Peer-to-peer Overlay Network: In a structured peer-to-peer overlay network all peers have a unique identifier, called peer ID. This number is either assigned randomly or by calculating the hash value of another identifier, like the MAC address of a machine. Each peer knows the underlay address of n peers whose identifiers are numerically close to its own identifier. The metric that determines the closeness varies among different protocols. The highest value is defined to be adjacent to the lowest value, so that the identifier range forms a circle. The topology is often presented as a ring, as shown in Figure 20. The circles represent peers, which are connected through their neighbors. Contact between neighboring peers is kept alive by sending periodic messages. In contrast to unstructured peer-to-peer overlay networks, a peer can be addressed directly. A message is sent to one of the known peers whose identifier is closest to that of the intended recipient. This peer forwards the message in the same manner until the intended peer receives the message.

[Figure 20 shows the ring topology, with the resource identifier range mapped onto the peer identifier range.]

Figure 20: Structured peer-to-peer overlay network topology

Forgery (e.g., using multiple accounts) is difficult to avoid. New artifacts cannot be detected in the system; a user needs to know a file's identifier when he plans to retrieve it.
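The following Python sketch illustrates how such a maintainer based distribution could assign artifacts to peers on the identifier ring. It follows the convention used in Figure 20, where a peer is responsible for all identifiers from its own ID up to the next greater peer ID; the function names and the truncated hash width are assumptions made for the example, not part of the system described above.

import hashlib
from bisect import bisect_right

def ident(value, bits=32):
    # map a peer address or an artifact name onto the shared identifier ring (truncated SHA-1)
    return int.from_bytes(hashlib.sha1(value).digest(), "big") % (2 ** bits)

def responsible_peer(artifact_id, peer_ids):
    # the responsible peer is the one with the greatest ID that is still <= the artifact ID;
    # if the artifact ID is smaller than all peer IDs, the range wraps around to the last peer
    ring = sorted(peer_ids)
    i = bisect_right(ring, artifact_id) - 1
    return ring[i]   # i == -1 selects ring[-1], i.e., the wrap-around case

# hypothetical example: three peers maintain the artifacts of a small repository
peers = [ident(b"peer-A"), ident(b"peer-B"), ident(b"peer-C")]
for name in (b"src/main.c", b"README", b"Makefile"):
    print(name.decode(), "is maintained by peer", responsible_peer(ident(name), peers))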

5.3.4 Code Co-op

Code Co-Op is a commercial distributed version control system, which was developed in 1997 and presented in [Mil97]. The version controlled repository exists as an identical replicated copy on each participating machine. The machines communicate with each other directly in a serverless fashion, by sending direct messages to each participant. Upon committing a change to form a new version, an update script is sent to all other developers, via e-mail or local network shared folders, instead of a peer-to-peer overlay network. The self-proclaimed designation as peer-to-peer based is not accurate, neither in the way the network infrastructure is maintained (e-mail and LAN), nor in how the messages are distributed (from one to all other machines). The machines of the other developers acknowledge the change automatically if no concurrent change to the committed file was received earlier. In this way an update is valid only if acknowledgement messages from all other peers are received. A second message notifies all other participants to apply the changes locally. This process is basically a two-phase commit [ML83]. Concurrent commits are rejected and need to be proposed again, after solving possible conflicts.

Distributed Hash Table (DHT): The mechanism with which items are stored among the peers of a structured peer-to-peer overlay network is called a distributed hash table (DHT). An identifier is assigned to an item in various ways, e.g., by calculating the hash value of its content or name. This identifier and the peer IDs are taken from the same value range. This can be ensured by calculating the identifier with the same hash function with which the peer IDs were calculated. A peer is responsible for storing the items with the numerically closest identifier to the peer's ID. The closeness is determined by the same metric with which the distance between the peers is measured. Figure 20 shows how the identifier ranges are related to each other: the dotted lines indicate that a peer is responsible for all resources with an identifier that is equal to or greater than its peer ID, up to the next greater peer ID. N.B. A resource can be anything which can be shared, including storable items, hardware resources or offered services. To provide a higher degree of availability of the shared resource, the direct neighbors in the key space of the responsible peer replicate the resource; they are named replica peers. On updates they must update their resource as well. If the responsible peer fails, the routing mechanism automatically routes requests to the next closest peer, which is one of those neighbors.

Code Co-Op provides eventual consistency only, although most of the time the provided consistency is as high as sequential consistency. Commits are only applied when no concurrent changes are committed before all participating machines are reached. The final message, which tells the other machines to apply the new version, is received at different times, thus a user might have a newer version than another user who has yet to receive the "ok to apply" message, resulting in sequential consistency. However, messages can be lost completely, or machines could be present without the knowledge of the committing machine. If in these circumstances a participant does not receive an update which is accepted by all other machines, he misses the latest version. Upon receiving a later version, a repair mechanism has to retrieve the missing versions.
In this situation only eventual consistency is provided. Lost messages or missing or unexpected participants are, however, unlikely, as Code Co-Op is based on reliable transport mechanisms and assumes stable user machines. Moreover, failing machines or lost messages can lead to situations where the replicated repository is in an incoherent state, e.g., when two concurrent changes are applied in two different temporary network partitions. The protocol effectively implements a two-phase commit protocol [ML83], which can block if messages are lost. Code Co-Op seems unsuitable for a widely distributed or large user group, as the more messages there are to be sent, or the longer the messages need to reach the participants, the more likely concurrent updates are. And if concurrent updates occur, all updates have to be repeated, thus extending the commit operation.
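A minimal Python sketch of such a two-phase commit over all participants is shown below. It illustrates the general protocol only, not Code Co-Op's implementation; the class and method names are assumptions, and real deployments additionally have to cope with lost messages and unreachable machines, as discussed above.

class Participant:
    def __init__(self):
        self.applied = []     # versions applied so far
        self.pending = None   # update acknowledged but not yet applied

    def prepare(self, update):
        if self.pending is None:
            self.pending = update
            return True
        return False          # a concurrent update was acknowledged earlier

    def apply_pending(self):
        self.applied.append(self.pending)
        self.pending = None

    def abort(self):
        self.pending = None

def commit(update, participants):
    # phase 1: collect acknowledgements from every participant
    acks = [p.prepare(update) for p in participants]
    if all(acks):
        # phase 2: tell everyone to apply the new version
        for p in participants:
            p.apply_pending()
        return True
    # a concurrent commit was detected: release only the peers that acknowledged;
    # the whole update has to be proposed again after resolving possible conflicts
    for p, ok in zip(participants, acks):
        if ok:
            p.abort()
    return False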

5.3.5 Pastwatch

Pastwatch [YCM06] is a peer-to-peer based version control system, which is the result of a PhD thesis at the Massachusetts Institute of Technology [Che04] in 2004. Pastwatch's basic structure is shown in Figure 21. Each user works on artifacts in a working copy. Changes form a snapshot, which is stored locally and on other peers in a structured peer-to-peer overlay network, using the DHT mechanism. The peer-to-peer overlay protocol used was developed exclusively for Pastwatch, although any structured overlay which offers the services of a DHT could have been used. Pastwatch's implementation, however, does allow only a group of peers to offer the DHT service. The offering peers can be a smaller subgroup or even dedicated machines which are not used by a user for Pastwatch. Weaker machines or machines with a highly dynamic network connection are supported in this way, as they can participate without offering the version control service to other peers. In this way a concrete deployment can be optimized for the network traffic: the greater the number of DHT serving peers, the more messages are produced due to routing and retrieving stored items. But the smaller the number of DHT serving peers, the more dominant the drawbacks of centralized systems (see Section 2.4.1) become. Besides the GNU diff [GNUa] tool, all functionality was built from scratch.

[Figure 21 depicts Alice's computer with her working copy and a local dVCS module, connected to a virtually global module in the p2p repository shared with other modules; the depicted operations are 1. fetchDHT, 2. update, 3. commit and 4. extendDHT.]

Figure 21: Basic structure of Pastwatch’s repository

A snapshot contains only the changes made to all artifacts since the last snapshot, along with the following metadata: the author's identity, consisting of his name and a public key, and the identifiers of the previous snapshots the current snapshot is based on. There can be more than one previous snapshot if multiple snapshots were merged. Following the DHT mechanism, a snapshot is stored by the peer whose identifier is numerically close to the snapshot's hash value, which is calculated using a hash function on the snapshot's content. Changes form a new snapshot, so an already stored snapshot is never overwritten. For each global module a membership list exists in the DHT, which is created by an administrator. It lists all users, along with their names and public keys. The list is signed by the administrator and stored on the peer whose identifier is numerically close to the value computed using a hash function on the administrator's public key.

Public Key Cryptography: Public key cryptography is a mechanism for encrypting and signing data. A user creates an asymmetric key pair, a public key and a private key. They are generated using a large random number. The private key must be kept secret while the public key can be distributed among other users. Data can be encrypted using a public key; only with the respective private key can the data be decrypted. This can be used, e.g., for delivering secret messages: the message is encrypted using the receiver's public key. A message or an artifact can be signed by encrypting it with one's own private key. Any user can decrypt it with the respective public key. Doing so proves that the person having the corresponding private key encrypted the message or artifact.

The last kind of information stored is a list called the log head, which contains the identifiers of all snapshots committed by a specific user and is signed by that user. This list exists for every user and is stored in the DHT using the calculated hash value of the user's public key. By doing so, only the corresponding user can update that list. In this way this index of the snapshots of every user is written only by the respective user, which avoids concurrent updates that could overwrite each other. Combining all log heads would form a complete index of the repository's snapshots. When a user updates his working copy, he requests the latest snapshots using Pastwatch's fetchDHT operation: first the membership list is retrieved to find out all current members of the module. Next, all log heads of these members are retrieved from the DHT. Comparing all snapshots listed in the retrieved log heads to the snapshots in the local module identifies the missing snapshots. They are retrieved from the DHT in the final step of the updating process. Now the working copy can be updated to the latest snapshot in the local module.
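The update procedure just described can be summarized in a short Python sketch. It is purely illustrative: the DHT is modelled as a plain dictionary, and the layouts of the membership list, log heads and snapshots are assumptions, not Pastwatch's actual data structures (signature checks are omitted). As noted below, a snapshot is only applied once all of its parent snapshots are at hand.

def fetch_dht(dht, module_id, local_snapshots):
    members = dht[("membership", module_id)]                   # 1. all current members of the module
    missing = {}
    for member_key in members:
        log_head = dht[("log_head", member_key)]               # 2. every member's log head
        for snap_id in log_head:
            if snap_id not in local_snapshots:
                missing[snap_id] = dht[("snapshot", snap_id)]  # 3. fetch the snapshots not yet known
    applied = True                                             # 4. apply a snapshot only once all of
    while applied and missing:                                 #    its parent snapshots are present
        applied = False
        for snap_id, snap in list(missing.items()):
            if all(p in local_snapshots for p in snap["parents"]):
                local_snapshots[snap_id] = snap
                del missing[snap_id]
                applied = True
    return local_snapshots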

Due to disturbances in the network, resulting in lost messages or disconnected peers, possibly forming a disconnected partition, some snapshots might not be retrievable. To keep the local module coherent, snapshots are only applied if the snapshots that they are based on are at hand. Additionally, the signature has to match the author's public key. The local module is updated periodically and before each commit operation. If the latest membership list, all log heads and all missing snapshots could be retrieved, a local module is identical to the virtual global module. Local changes are published by committing all changes in a user's working copy as a snapshot to the local repository. The operation extendDHT signs the snapshot, stores it in the DHT and creates a new entry in the log head corresponding to the committing user. By storing snapshots, which are unique immutable items for each author and version, and having the per-author log head written only by a single author (multiple readers, single writer), concurrent write operations cannot overwrite data and do not need to be synchronized, which would be a challenging task in a peer-to-peer overlay network.

Most of the time the version history of all snapshots is linear. If not all snapshots can be fetched or snapshots are concurrently submitted, an unnamed branch is created. Such branches are detected only by a user who issues the fetchDHT command after the diverged snapshots can be retrieved. The authors of Pastwatch recommend merging unnamed branches as soon as they are detected. The following example should clarify the situation in which a concurrent commit results in an unnamed branch: Alice and Bob changed files in their snapshots. These changes could be non-conflicting, e.g., affect disjoint sets of artifacts. Alice retrieves Bob's log head, which only contains the identifiers of old snapshots, which are in Alice's local module. After she has stored her snapshot in the DHT, but before she can add its identifier to her log head, Bob commits his snapshot. He fetches Alice's log head, which contains only old snapshot references as well. Not knowing about Alice's latest snapshot, he submits his snapshot. The local modules of Alice and Bob are not identical to the virtual global module, as each is missing the other's latest snapshot. Any user executing the update operation will receive both latest snapshots, which coexist, both being based on the same snapshot. This user's development could continue by basing his changes on Alice's or Bob's or both snapshots. In the latter case the unnamed branch is merged.

If network distortions lead to the situation whereby the peer and its replicating peers which store a specific version are no longer reachable, a peer which executes the fetchDHT operation will notice the missing snapshot. If the snapshot is in its local module, it reinserts it into the DHT. It is not likely that all storing peers fail simultaneously before they can update replacing peers. But if the network falls into disjoint partitions, all those peers might be in one partition, unreachable to the peers in the other partition. When one of those peers reinserts the missing snapshot and the partitions rejoin, the stored snapshots are identical and can be overwritten by any copy. A log head is only modified by one user. If it is stored in a disjoint partition, a new log head is created. Each user keeps a local copy of the log head, thus previous entries are not lost.
Note that the authors of Pastwatch did not consider the situation where concurrent commits result in unnamed branches. They claimed that this can only happen when the network falls into partitions. Assuming that in each partition only one unnamed branch could be created, they argued that the diverging snapshots would be discovered when the disjoint partitions are reunited and then, being few in number, could easily be merged. But as illustrated in the previous example, concurrent commits can result in unnamed branches within the same partition. They are detected by the next user who updates his working copy, who may not be one of the previous authors of the affected snapshots. The periodic updates might lead to a situation whereby nearly all members of a module discover diverging snapshots at the same time. If any two authors merge these snapshots, even by applying the same changes, they might form two new diverging snapshots, which differ only in the author. The more members a module has, or the further away these members are from each other, the more likely it is that unnamed branches will be created. Users might lose interest in reconciling them, as the merged snapshot might be concurrently committed and result in a new unnamed branch.

There are some hotspots in Pastwatch, i.e., peers whose services are used more frequently than those of other peers. As these peers can only serve a specific number of peers at any one time, the scalability of Pastwatch is seriously limited. These hotspots are all peers who store a log head and the one peer that stores a membership list. Whenever any peer executes the update operation, all these peers are contacted: the membership list is first queried for all members, whose log heads are retrieved next to check for new snapshots. Besides being initiated by a user, the update operation is also performed periodically (every two minutes according to the authors' recommendation). The commit operation triggers the update operation as well. The evaluation presented in [YCM06] does not confirm our scalability concerns and showed instead that the time needed for the update operation to complete grows very slowly with a rising number of members: it takes on average only 1.9 seconds more time to update a module consisting of 200 members compared with updating the same module having only 2 members. However, in this setup the number of DHT serving peers was kept constant at eight peers, and only two peers actively contributed new versions. The number of users was raised to 200, but it is unrealistic to assume that those users do not execute any operation and all share the deployed ten machines.

To summarize, the hotspot peers which need to be contacted all the time, the high number of parallel messages involved in these operations and the high probability of creating unwanted unnamed branches are the biggest drawbacks of Pastwatch, which limit its scalability drastically. Pastwatch manages concurrent commits optimistically, by allowing parallel versions to form an unnamed branch, which should be merged as soon as it is detected (which is not enforced). In order to obtain all latest versions, all storing peers forming the DHT need to be contacted. Each peer returns the identifiers needed to retrieve the versions created by a single user. Only eventual consistency is guaranteed, as messages might be received late or may be lost completely. During a network partition or due to indeterministic routing, which can occur under high churn, some versions might not be retrievable at all, when the storing machines are in another partition. If all peers who store a user's log head (the maintainer and its replicating peers) are in an unreachable partition, the location of the latest snapshots cannot be retrieved, even if they are stored in the current partition (only if a snapshot based on the missing snapshot exists and can be retrieved, the identifier of the otherwise missing snapshot can be retrieved from its metadata). If the log head is accessible in the other partition, the identifier of the missing snapshot can be looked up, but the snapshot cannot be retrieved (as it is stored in the other partition). It seems to be an impractical decision to separate the storage of a user's log head and her created snapshots. In the described cases, where some versions cannot be retrieved, the virtual global repository is incoherent, although Pastwatch recovers from this state once the partitions rejoin. A more serious situation occurs when the membership list gets updated on more than one maintainer, due to the mentioned network disturbances (network partitioning or routing inconsistencies). The authors of Pastwatch did not describe how it can recover from this situation.
We assume that no conflicting updates can occur (a user reaches only one maintainer, whose list he joins or leaves), thus the diverging lists could be merged automatically.

5.3.6 GRAM

In [TM04a] and [TM05] a so-called Group Revision Assistance Management (GRAM) is presented, which is a prototypical implementation based on the hierarchical peer-to-peer overlay network JXTA. The version control mechanism only reuses the GNU diff tool [GNUa]; everything else is built from scratch. The system offers only single-file revision control, omitting snapshots, branches and tags. Revisions in the repository are stored as forward deltas only. Another limitation is that only textual files can be handled by the system.

GRAM follows the replicated repository distribution, where each peer has a locally stored repository which is identical to all repositories stored by other peers. When a version is created, a message is broadcast to update the repositories stored by the other peers. Conflicts are resolved proactively using an interesting new approach: rather than locking edited artifacts, a peer sends a message with the changes made to the currently edited artifact to all other peers. These other peers check whether the received modifications conflict with the modifications made by their user. If a conflict is detected, both users are informed and can contact each other via an integrated instant messenger. In this way conflicts are detected before a version is committed. The mechanism is called hybrid concurrency control, because it resolves conflicts proactively like pessimistic concurrency control but allows parallel work on the same artifacts, similar to optimistic concurrency control. The detection mechanism works at the granularity of lines in an artifact. To enable this mechanism, GRAM has to be used with the provided editing environment. However, high message delays or overloaded peers can delay or drop messages, which can lead to a situation whereby existing conflicts are not detected in time. If the network falls into partitions, conflicting versions cannot be detected. GRAM does not offer a mechanism to reconcile conflicting versions, which can be created under the mentioned circumstances. GRAM offers eventual consistency. The obvious drawback is the high and frequent message transfer: whenever any user edits an artifact, a broadcast message is routed to all other peers. This limits GRAM's scalability. In a network with high message delays, as is likely in a GSD scenario, GRAM might not work properly. It can occur that later versions arrive at a peer before the versions they are based on are received. This leads to an incoherent repository at the affected peers.
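A small Python sketch can make the line-level detection idea concrete. It illustrates the general principle only, not GRAM's implementation; the message layout, field names and example data are assumptions.

def lines_touched(edit):
    # an edit message names the artifact and the line numbers its author changed
    return edit["artifact"], set(edit["lines"])

def detect_conflict(remote_edit, local_edits):
    # compare a broadcast edit against the lines the local user is currently editing;
    # a conflict is reported when both touch the same lines of the same artifact
    artifact, remote_lines = lines_touched(remote_edit)
    for local in local_edits:
        if local["artifact"] == artifact:
            overlap = remote_lines & set(local["lines"])
            if overlap:
                return {"artifact": artifact, "lines": sorted(overlap), "other_user": remote_edit["author"]}
    return None

# hypothetical usage: Bob receives Alice's edit notification while editing the same file
alice = {"author": "alice", "artifact": "core/io.c", "lines": [10, 11, 12]}
bob_edits = [{"author": "bob", "artifact": "core/io.c", "lines": [12, 13]}]
print(detect_conflict(alice, bob_edits))   # both changed line 12 -> notify both users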

5.3.7 SVCS

In [LLST05], a simple version control system, called Scalable Version Control Layer (SVCL), is presented. Peers communicate using the structured peer-to-peer overlay network Chord [SMK+01b, SMLN+03], which offers the DHT storage mechanism. Similarly to CVS over DHT (presented in Section 5.3.3), the repository is distributed in a maintainer based manner: according to the assignment calculated following the DHT approach, an artifact is maintained by a specific peer. These peers control the evolution of the respective files and store all versions. There is no other repository on a user's machine. Concurrent updates are controlled pessimistically, by locking artifacts. The offered operations are limited to the basic commands offered by GRAM (see Section 5.3.6): committing changes to single files. Non-textual files can be handled as well. A workflow in SVCS is as follows: a user locks an artifact he aims to change. When he has finished his modifications, he commits them to the maintaining peer, which stores them as a new version and releases the lock. As in the approach presented in Section 5.3.3, artifacts can only be detected if their identifier is known. The pessimistic concurrency control limits concurrent work, but guarantees a coherent repository at all times. SVCS provides sequential consistency with respect to single files only. However, in a peer-to-peer network, machines can fail at any time. If a peer who acquired a lock on an artifact fails, the artifact cannot be modified by anyone else. Using a timeout mechanism to release a lock if the acquiring peer does not respond is also problematic, as the peer might still be online. Whether messages were lost or a peer indeed failed cannot be distinguished by the other machines in the network. The authors of SVCS did not address this problem.
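The lock-commit-release workflow described above could look roughly like the following Python sketch. It is illustrative only; the class, its methods and the in-memory bookkeeping are assumptions and not SVCL's actual interface, and the failure cases discussed above (crashed lock holders, lost messages) are deliberately left out.

class MaintainerPeer:
    def __init__(self):
        self.versions = {}   # artifact_id -> list of stored versions
        self.locks = {}      # artifact_id -> user currently holding the lock

    def lock(self, artifact_id, user):
        if self.locks.get(artifact_id) not in (None, user):
            return False                    # someone else holds the lock
        self.locks[artifact_id] = user
        return True

    def commit(self, artifact_id, user, content):
        if self.locks.get(artifact_id) != user:
            raise PermissionError("commit requires holding the lock")
        self.versions.setdefault(artifact_id, []).append(content)
        del self.locks[artifact_id]         # the lock is released after the commit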

5.3.8 Chord based VCS

In [JXY06] a system is presented which uses the structured peer-to-peer overlay network Chord [SMLN+03] as well. The offered version control functionality is similar to that of CVS (see Section 5.1.3), supporting branches and tags, but being limited to single file revision control.

The repository is stored both replicated among all peers and distributed using the DHT storage mechanism. While the artifacts are stored distributed in the DHT, the metadata, including the artifacts’ locations and all version control information, is stored replicated on all peers in the following way: All actual artifacts are chopped up into units, which can be transferred in a single network message, which might be an ethernet frame, although this was not further clarified by the authors. Each unit is identified by a random number. Using this identifier, all units are stored distributed among the peers by the DHT mechanism. The metadata of an artifact refers to all units of that artifact and appears to be stored as identical copies on all peers. Changes are recorded as operation-based diffs. Rather than identifying changes made to the content of an artifact, as in the state-based diff approach, the operations which lead to the changes are recorded. The metadata contains this information along with other information that describes a version (like the author’s name, the version number, etc.). A new version is shared by sending its metadata to all participants, dividing the new version into small units and storing these units in the DHT (by sending them to the responsible peers). Similar to GRAM (see Section 5.3.6), conflicts are meant to be resolved proactively using a hybrid concurrency control. When a user edits an artifact, all other users who edit the same artifact are informed, probably by a broadcast message, although this might be coordinated by a single maintaining peer as well. This is not detailed by the authors; however, when changes are committed they are sent to all peers instead of a maintaining peer, which leads to the conclusion that there is no maintaining peer in the system. A version can only be committed if all possible conflicts are resolved. The Chord based VCS seems to be in a very preliminary stage: a prototypical implementation is described but not evaluated. Future improvements were promised, but not published at the time of writing. The proactive conflict detection mechanism raises the same problems as discussed in Section 5.3.6 and does not promise to be a valid solution in a peer-to-peer based system. Since updates are broadcast to all peers, later versions could be received before the versions they are based on, which guarantees eventual consistency only. Lost messages or network partitions prevent peers from contacting each other, so conflicting versions are not noticed, leading to an incoherent repository.
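The chunking scheme described above can be sketched as follows. The unit size, the Dht interface, and the use of random identifiers as DHT keys are assumptions made for illustration; they do not reflect the concrete implementation in [JXY06].

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Illustrative sketch: an artifact is chopped into message-sized units, each
// unit gets a random identifier under which it is stored in the DHT, and the
// replicated metadata keeps the identifiers so the artifact can be reassembled.
public class UnitSplitter {

    static final int UNIT_SIZE = 1400; // assumed payload size fitting one network message

    static List<String> storeAsUnits(byte[] artifact, Dht dht) {
        List<String> unitIds = new ArrayList<>();
        for (int offset = 0; offset < artifact.length; offset += UNIT_SIZE) {
            byte[] unit = Arrays.copyOfRange(artifact, offset,
                    Math.min(offset + UNIT_SIZE, artifact.length));
            String id = UUID.randomUUID().toString(); // random unit identifier
            dht.put(id, unit);                        // stored on the peer responsible for id
            unitIds.add(id);
        }
        return unitIds; // kept in the replicated metadata of the new version
    }

    // Hypothetical DHT interface; a real system would route put/get via Chord.
    interface Dht {
        void put(String key, byte[] value);
    }

    public static void main(String[] args) {
        byte[] artifact = new byte[5000];
        List<String> ids = storeAsUnits(artifact, (key, value) -> { /* no-op stub */ });
        System.out.println("stored in " + ids.size() + " units"); // 4 units for 5000 bytes
    }
}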

5.4 systems for collaborative work support

There are numerous client-server based tools that support global software development (GSD). Most of them cover only a particular aspect of GSD, sometimes integrated into a single platform, e.g., IBM Rational [IBM]. Jazz [HCRP04] is a tool environment intended specifically for GSD. It is based on the client-server paradigm. Jazz evolved from the Eclipse IDE [Fou]. It features a notification system, which broadcasts general messages in the project, for example when a task is completed. A number of other tools are integrated as well, such as a change management system. The version control, however, is managed by ClearCase (presented in Section 5.1.5). To the best of our knowledge, there is no other integrated tool environment that fulfills the needs listed in Section 3.2. Groove Virtual Office [Gro] is a collaboration environment that is partly peer-to-peer based. In its first version it had serious scalability problems, barely supporting 20 developers in the same workspace [Har01]. When Microsoft bought Groove Networks in March 2005 in order to save the project [Mil], it became evident that the technology was still not ready for the market. The software was restructured to improve scalability using the client-server approach. The current version fulfills several security requirements but still does not support version control management.

5.5 summary

The version control systems presented in this chapter can be categorized into centralized, distributed and peer-to-peer based version control systems. From the numerous existing centralized solutions we chose to present the most important ones. The presented systems are the most widely used or represent a major milestone in the evolution of version control systems. The distributed version control systems rose to prominence in 2003 with the implementation of Monotone. There are significantly fewer distributed systems available than centralized solutions. Nevertheless, especially Git and Mercurial are as mature as their centralized counterparts, which have been in use for many years. These two systems have replaced centralized version control systems in a large number of (open source) projects. Directly related to our solution are the peer-to-peer based systems, which we presented exhaustively. There are, to the best of our knowledge, no further peer-to-peer based version control systems, neither as productive systems nor as research prototypes. Our preliminary discussion showed that none of the presented systems fully satisfies the requirements of globally distributed software development. The centralized solutions bring unbearable delays for some developers, and the distributed systems are unable to share the produced snapshots effortlessly with all developers. The peer-to-peer based solutions tend to solve these issues but cannot provide a reliable degree of consistency. We analyze the presented approaches and their implemented design decisions in detail in the next chapter.

ANALYSISOFRELATEDVERSIONCONTROLSYSTEMS 6

In this chapter we analyze the properties of the systems described in the previous chapter. We designed our version control system, PlatinVC, based on our analysis of the architectures of the related approaches and their impact on the systems’ properties. The first section describes which of the requirements introduced in Chapter 3 are met by the individual systems. Section 6.2 analyzes which system properties influence the fulfillment of those requirements. In Section 6.3 the systems are examined regarding the minimum provided consistency degree. The chapter concludes with a taxonomy of the analyzed peer-to-peer based approaches (Section 6.4) and a summary of promising building blocks (Section 6.5). In the following comparison tables the prototype developed in this thesis, PlatinVC, is analyzed as well. How it fulfills the requirements and achieves certain properties is detailed in the next part of this work.

6.1 realized requirements per system

Table 1 lists all requirements, which were introduced in Section 3.2. “!” indicates that a requirement is fulfilled, “%” marks that a requirement is not met, “–” indicates that the requirement is not applicable to the system, and “ \ ” denotes requirements which are partially fulfilled. The background details which lead to this analysis can be found in the system descriptions in Chapter 5.

Requirement System RCS SCCS CVS Subversion ClearCase ClearCase Multisite Monotone Git/Mercurial Wooki DistriWiki CVS over DHT Code Co-Op GRAM SVCS Chord based VCS Pastwatch PlatinVC Functional requirements R-1: Support common version control operations R-1.1: Support parallel work on all artifacts %%!!!! !! !!!!!%!!! R-1.2: Detect concurrent modifications – – !!!! !! !%!!! – !!! R-1.3: Resolve concurrent modifications – – !!!! !! !%!!! – !!! R-1.4: Track single artifacts in an ordered history !!!!!! !! !%!!!!!!! R-1.5: Track snapshots in an ordered history %%%!!! !! %%%!%%%!! R-1.6: Retrieve a recorded state !!!%!! !! %!!!%%%!! R-1.7: Choose recorded versions of selected artifacts !!!!!! !! %%!%%!%%! R-1.8: Individual working copies %%!!!! !! %%!!%!%!! R-1.9: Enable local commits %%%%!! !! %%%%%%%!! R-1.10: Track variants (branches) !!!!!! !! %%!!%%!!! Continued on next page



Requirement System RCS SCCS CVS Subversion ClearCase ClearCase Multisite Monotone Git/Mercurial Wooki DistriWiki CVS over DHT Code Co-Op GRAM SVCS Chord based VCS Pastwatch PlatinVC R-1.11: Tag with labels !!!!!! !! %%!!%%!!! R-2: Change the name or path of artifacts !!!!!! !! ! – %!!%!!! R-3: Support for multiple projects !!!!!! %% %%!!%%!%! R-4: Conjoined local repositories %%%%%! %% !%%!!%!!! R-5: Redundant backups \ \ \ \ \ ! \ \ !%%!!%!!! R-6: Interoperability of different VCSs %%%%%% %! %%%%%%%%! R-7: Connect from anywhere \ \ \ \ \ \ \ \ %%%%%%%%! R-8: Offline version control %%%%!! !! %%%%%%%!! R-9: Local commits %%%%!! !! %%%%%%%%! R-10: Traceability %%%%%% %% %%%%%%%%! R-11: Easy to setup/manage %%%%%% !! !!%!!!!!! R-12: Prevent censorship %%%%%% !! !%!!!!!!! Non-functional requirements R-13: ACID R-13.1: Atomicity %%%!!! !! %%%!%%%!! R-13.2: Consistency !!!!!! !! %!!%%%%!! R-13.3: Isolation !!!!!! !! !%!!!!!!! R-13.4: Durability !!!!!! !! %%%!%%%!! R-14: Availability !!!! \ \ !! !!!%!%! \ ! R-15: Transparency – – – – – ! %% !!!!!!!!! R-16: Minimal storage space consumption !!!!!! !! % – !!%%%%! R-17: No centralized services %%%%%% – – !!!!!!! \ ! R-18: Adequate communication speed %%%%% \ \ \ %!!%%!%!! R-19: Scalable and robust R-19.1: Robust to highly fluctuating users !!!!!! !! \ % \ \ \ \ \ \ \ R-19.2: Robust to shifting users !!!!!! !! \ % \ \ \ \ \ \ \ R-19.3: Scalable to a growing number of users %%%%%% !! \ !! \ \ \ \ \ ! R-20: Continuous expendability %%%%%% !! !%!!!!!!! R-21: Fault tolerance %%%%%% !! !%!!!!!!! Security Aspects R-22: Access and usage control \ \ \ \ \ \ – – %%%%%%%% \ R-23: Keep sensitive artifacts !!!!! \ – – %%%%%%%%% R-24: Attribute based access control \ \ \ \ \ \ – – %%%%%%%% \ R-25: Confidentiality \ \ \ \ \ \ – – – – – – – – – – \ R-26: Data integrity \ \ \ \ \ \ !! \ %% \ \ % \ !! Continued on next page 6.2 analysis of system properties leading to fulfilled requirements 75


Requirement System RCS SCCS CVS Subversion ClearCase ClearCase Multisite Monotone Git/Mercurial Wooki DistriWiki CVS over DHT Code Co-Op GRAM SVCS Chord based VCS Pastwatch PlatinVC R-27: Non-repudiation \ \ \ \ \ \ !! %%%%%%%!! R-28: Authentication \ \ \ \ \ \ – – %%%%%%%! \ R-29: Secure communication \ \ \ \ \ \ \ \ %%%%%%%% \ R-30: Low administration overhead %%%%%% \ \ %%%%%%%% \

Table 1: Fulfilled requirements (see Section 3.2)

6.2 analysis of system properties leading to fulfilled requirements

In this section we look into the system properties that determine whether a system fulfills a given requirement.

6.2.1 Functional Requirements

R-1: Support common version control operations (R-1.1 to R-1.11) Every examined system, except DistriWiki, fulfills some of the sub-requirements R-1.1 to R-1.11 of requirement R-1. Whether these requirements are fulfilled does not depend on a system’s architecture but solely on the implemented features.

R-2: Change the name or path of artifacts Only two approaches (CVS over DHT and SVCS) fail to fulfill requirement R-2. In both approaches the repository is stored partitioned among maintaining peers. Other systems, which fulfill this requirement, also store a partitioned repository, but in a different way: The two problematic systems identify the repository’s parts, which are the individual artifacts, by the artifact’s name, i.e., the maintainer of an artifact calculates its ID by hashing the artifact’s name. The maintainer stores an artifact’s complete history and manages the access to it. However, if the artifact is renamed, the calculated hash value changes, hence a new peer becomes responsible for that artifact. If only the old name of the artifact is known, under which the versions were stored, the latest versions are not retrievable (i.e. their new location is unknown). If only the new name is known, all versions which were committed before the artifact was renamed are not accessible. This problem could be solved by storing a reference to the new location on the old maintainer and vice versa. However, neither approach does so.
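The effect can be made concrete with the following minimal sketch, which assumes SHA-1 based identifiers in a 160-bit key space; the file names are hypothetical and the key derivation is a simplification of how a DHT assigns maintainers.

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch of why renaming breaks name-hash based maintainer assignment: the
// DHT key of an artifact is derived from its name, so after a rename the key,
// and hence the responsible maintainer, changes.
public class MaintainerId {

    static BigInteger keyFor(String artifactName) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] digest = sha1.digest(artifactName.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest); // identifier in the 160-bit DHT key space
    }

    public static void main(String[] args) throws Exception {
        BigInteger before = keyFor("src/Parser.java");
        BigInteger after  = keyFor("src/XmlParser.java"); // same file, new name
        // Different keys: a new peer becomes responsible, and without a
        // forwarding reference the old history is not found under the new name.
        System.out.println(before.equals(after)); // prints false
    }
}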

R-3: Support for multiple projects This requirement can be fulfilled in all of the analyzed architectures.

R-4: Conjoined local repositories and R-5: Redundant backups All systems which fulfill these requirements store a local repository on each machine. Requirement R-4 cannot be fulfilled if the repository is stored only partially, because the locally missing parts would have to be retrieved from distant repositories. Similarly, requirement R-5 presumes that all versions are in one place, which can be backed up. Complete repositories, such as the ones on client-server systems, can be backed up using other tools.

R-6: Interoperability of different VCSs Fulfilling requirement R-6 depends on the implementation only, as, theoretically, any two version control systems can be combined. However, when two version control systems interoperate, only the version control features offered by both systems can be used. It is, for example, not possible to record snapshots when CVS and Git are used interoperably.

R-7: Connect from anywhere In order to connect from anywhere, network obstacles such as NAT have to be circumvented. This can be achieved by using supporting tools or setups, such as virtual private networks (VPNs).

R-8: Offline version control Requirement R-8 is fulfilled by providing a local repository on the machine, where new versions can be submitted before sharing them with all other participants. A distributed system which can overcome network partitioning offers even better offline support. Here a user can publish her new versions as if they were shared with other participants; she forms a network partition of her own. A repair mechanism automatically shares her versions once she is online again.

R-10: Traceability Support for traceability does not require any specific system properties and could be implemented in all of the analyzed architectures. An implementation is, however, challenging, as pointed out in Section 7.3.7.

R-11: Easy to setup/manage Client-server based solutions require expertise and time to get running. As they are crucial for the system they must be maintained well. Peers in a peer-to-peer based system are almost self-maintaining. Periodic maintenance mechanisms detect and repair disturbances in the network. This is necessary because the system does not rely on any of its machines. Thus it is not important to diligently maintain an individual peer. Additionally, a peer is never used solely for the purpose of the system; normally a participant uses his machine for other tasks as well. Thus maintenance of the individual machines is carried out by their respective users.

R-12: Prevent censorship The analysis of the examined systems clearly shows that only systems which keep a single copy of a repository on one machine cannot prevent censorship. In all distributed systems the content is stored widely spread and redundantly across all or a number of randomly chosen machines. Hence all copies in the system in all locations (as well as on currently offline machines) would have to be altered to censor the content of an artifact.

6.2.2 Non-functional Requirements

R-13: ACID: R-13.1: Atomicity Atomicity is only applicable to the systems which fulfill requirement R-1.5: Track snapshots in an ordered history. For these systems, providing requirement R-13.1 is the basis to fulfill requirement R-13.2: Consistency. Atomicity can be achieved by various techniques. Changes made by a user can conflict with an already applied snapshot. In a centralized system this can be checked in one place, but if the repository is distributed, multiple machines have to decide whether a change can be applied.

Code Co-Op (see Section 5.3.4) uses a transaction protocol known from database systems. In a two-phase commit protocol all participants agree or disagree to accept a proposed snapshot. A transaction requires many messages. Additional messages might be needed when the participating machines are less reliable. Additionally, the transfer times of messages in a peer-to-peer network can vary widely, which makes it difficult to set realistic time-outs. Only one peer-to-peer based version control system supports snapshot version control and guarantees atomicity: Pastwatch (see Section 5.3.5). Instead of a sophisticated transaction protocol, which decides whether a snapshot is accepted, all snapshots are accepted. Identical to the mechanism used in the distributed version control systems, snapshots that are based on the same base version form an unnamed branch, as detailed in Section 5.2.1.
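A two-phase commit of a snapshot can be sketched as follows. The Participant interface, the Snapshot record and the voting rule are illustrative assumptions for the general protocol idea, not the wire format used by Code Co-Op.

import java.util.List;

// Minimal two-phase commit sketch: a proposed snapshot is applied only if
// every participant votes to accept it; otherwise all participants abort.
public class TwoPhaseCommit {

    record Snapshot(String id, String baseId) {}

    interface Participant {
        boolean vote(Snapshot proposed); // phase 1: does the snapshot conflict locally?
        void apply(Snapshot accepted);   // phase 2: make the snapshot durable
        void abort(Snapshot rejected);
    }

    static boolean commit(Snapshot proposed, List<Participant> participants) {
        boolean allAgree = participants.stream().allMatch(p -> p.vote(proposed));
        for (Participant p : participants) {
            if (allAgree) p.apply(proposed); else p.abort(proposed);
        }
        return allAgree;
    }

    public static void main(String[] args) {
        Participant acceptsAll = new Participant() {
            public boolean vote(Snapshot s) { return true; }
            public void apply(Snapshot s) { System.out.println("applied " + s.id()); }
            public void abort(Snapshot s) { System.out.println("aborted " + s.id()); }
        };
        System.out.println(commit(new Snapshot("s2", "s1"), List.of(acceptsAll)));
    }
}

The sketch makes the cost visible: every participant must answer before the snapshot becomes durable, which is exactly what makes realistic time-outs hard to choose in a peer-to-peer network.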

R-13: ACID: R-13.2: Consistency Some systems do not guarantee coherency in all situations, thus they cannot guarantee consistency. If we correlate the degree of consistency stated in Table 2 with the basic architecture of the different systems listed in Table 3, we can identify which mechanism influences a certain consistency degree. The more centralized the management of the repository is, the higher the achieved consistency degree. No centralized version control system provides weaker consistency than sequential consistency. In these systems a user either receives the complete requested history or nothing at all. The distributed version control systems provide causal consistency. Here either the complete history of a participant or nothing is shared. The complete history of a participant is, however, not the complete history among all participants (the global history), but a small part of that history. This part includes all versions made by a single user, and all of the versions that these versions are based on. When examining the peer-to-peer based version control systems we see this trend continuing. Only systems in which specific peers maintain the complete history of individual artifacts, following the maintainer based repository distribution discussed further in Section 6.5.2, offer causal consistency. A single message can contain all missing parts of the history of a single file. Retrieving a project’s history which is tracked using snapshots is, however, challenging; none of the examined maintainer based peer-to-peer version control systems supports this. The only maintainer based system that supports snapshots, Pastwatch, does not store the history of individual artifacts on a maintainer, but a per-user list of snapshots instead (for more details see Section 5.3.5). A single message can only contain an incomplete history, in which versions created by other users are missing. Hence Pastwatch provides eventual consistency only.

R-13: ACID: R-13.3: Isolation & R-13.4: Durability In traditional peer-to-peer based content distribution a peer deletes locally stored content prior to joining the network. This avoids the presence of outdated content. The versions in a version control system are, however, never outdated. Thus a peer does not have to delete previously stored versions, as they never become obsolete. It only has to take care to update to the latest version. Pastwatch and Code Co-Op are the only systems doing so. Nevertheless, the main effect is to reduce the time needed to update a joining peer, as the redundant version history should also be stored in the network.

R-14: Availability Nearly all industry proven version control systems are able to run under the major operating systems, namely Mac OS X and similar Unix systems, Windows, and Linux. Notable exceptions are Code Co-Op, which runs on Windows only, ClearCase, which does not support Mac OS X, and Git, which is difficult to set up in a Windows environment. In spite of being research prototypes, many peer-to-peer based systems are platform independent. Those systems are all implemented using the Java programming language. SVCS

was implemented using C and runs on Linux only, and Pastwatch was created using C++, compiled only for Linux, Mac OS X and other Unix systems.

R-15: Transparency Being applicable to distributed systems only, all peer-to-peer based systems satisfy this requirement. Managing connections transparently is an inherent property of peer-to-peer systems, fulfilled by the DHT mechanism as well. The distributed version control systems are the only ones in which the users are required to specify where to obtain updates from.

R-16: Minimal storage space consumption Rather than storing a full artifact for each version, delta compression can be used, which stores only the differences between the versions. Over the years multiple sophisticated variants have been developed, such as weave deltas, forward deltas, etc., which optimize the trade-off between storage space and access time of an artifact’s version. A noteworthy exception is demonstrated by Git, which stores full artifacts in all versions and compresses the entire repository afterwards. Benchmark tests in [Git] showed that the storage space consumption of Git is the smallest compared to Subversion and Mercurial. Minimizing the repository storage using deltas is implemented by nearly all non peer-to-peer based version control systems, but only by two of the peer-to-peer based systems. These two systems are maintainer and replica based, proving that requirement R-16: Minimal storage space consumption does not depend on the basic architecture.
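The idea of forward deltas can be illustrated with the following minimal sketch, which stores a later version as a list of operations against its base version. The operation format (Keep/Insert) and the example lines are assumptions for illustration only, not the encoding used by any of the examined systems.

import java.util.List;

// Sketch of forward-delta storage: instead of keeping every version in full,
// a later version is stored as operations ("keep a range of the base version"
// / "insert new lines") from which its full text can be reconstructed.
public class ForwardDelta {

    interface Op {}
    record Keep(int from, int to) implements Op {}      // copy base lines [from, to)
    record Insert(List<String> lines) implements Op {}  // lines new in this version

    static List<String> apply(List<String> base, List<Op> delta) {
        List<String> result = new java.util.ArrayList<>();
        for (Op op : delta) {
            if (op instanceof Keep k) result.addAll(base.subList(k.from(), k.to()));
            else if (op instanceof Insert i) result.addAll(i.lines());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> v1 = List.of("int a;", "int b;", "return a + b;");
        // Version 2 is stored only as a delta: keep the first two lines, replace the third.
        List<Op> delta = List.of(new Keep(0, 2), new Insert(List.of("return a * b;")));
        System.out.println(apply(v1, delta)); // [int a;, int b;, return a * b;]
    }
}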

R-17: No centralized services Almost all peer-to-peer based systems are faithful to their promise not to rely on a central service. Only Pastwatch has a central service: The member list of a module is stored on a maintaining peer and its replicating peers. Every time a peer queries for new updates (and additionally periodically) this membership list has to be retrieved to check for new members. Being stored by the DHT mechanism this list is replicated and does not depend on the presence of a specific peer, but as long as the currently maintaining peer is online it has to serve all requests.

R-18: Adequate communication speed As detailed in Section 2.4.1, client-server based systems are not suitable for globally distributed development. The server remains at a fixed location, which introduces long transfer delays for all users who are physically distant from the server. ClearCase Multisite (see Section 5.1.5) offers a solution to this problem by distributing the repository among servers which are located close to the developer groups. These servers are configured to permit write access to a disjoint repository part only. Scheduled maintenance synchronizes the distributed repository, but in between these periods users have to cope with long delays if they need to read the latest versions or write to parts of the repository which are on a distant server. The distributed version control systems do not aim to have a global repository status stored on any participant’s machine. Here users exchange their repositories directly, using one of the transport media listed in Table 3. The delay depends on the location and the medium used. Examining the different peer-to-peer based systems shows that replication based systems cannot provide adequate access times. While old versions are locally present, new updates are flooded in the network, taking time to reach every participant. To avoid an incoherent repository all distributed versions have to be acknowledged before they can be applied, which requires the exchange of multiple messages in a commit protocol. In a maintainer based system any peer might store the latest versions. The delay to this peer might be very small, if this peer is physically close, or large, if this peer is located far away. On average the communication delay is the median of all delays among all participating peers. The delay can be minimized using content distribution techniques like owner path caching [CSWH00] or proactive content placement [ATS04]. In a globally distributed development scenario, as presented in Section 2.3, joining and leaving users have a positive side effect called shifting. In the running example in Section 2.5 company CA is located in Berlin, Germany and company CB is located in Kolkata, India. Between these locations there is a 4.5 hour time difference. When the developers of CB in India end their daily work at 5 p.m., the employees of CA in Germany are just coming back from lunch, as it is 1.30 p.m. local time there. Whenever a peer goes offline its stored repository is replicated to a substituting peer. That way the repositories stored on peers in India are shifted to peers in Germany: as the number of peers in India is shrinking and the number of peers in Germany is rising, it is more likely that a vanishing peer in India is replaced by a joining peer in Germany. However, if the transition is too rapid, the replication mechanism might not be fast enough, resulting in lost repositories. Requirement R-19.2: Robust to shifting users formulates this case.

R-19: Scalable and robust Only the distributed version control systems are not affected by robustness or scalability issues when users are joining or leaving in large numbers, as they can be used between two users at a time only. The client-server architecture is not affected by a changing number of users (requirement R-19.1), nor by a group of users disappearing in one location while another group joins in a second location (requirement R-19.2). As detailed in Section 2.4.1, servers can only support a rising number of users if their hardware is extended. Peer-to-peer overlay networks are inherently strong in providing good scalability, supporting requirement R-19.3: Scalable to a growing number of users. Participants who use an offered service also bring new resources to offer services themselves. It depends upon the concrete system implementation which services can be offered by how many peers. All except two (DistriWiki and CVS over DHT) of the examined systems have a centrally offered service, which is much more frequently accessed than other services. The scalability of these systems is limited, as at some point the few peers who offer the popular service cannot serve the demand. The more peers disappear, the more outdated the contact information on a peer is. More and more messages are sent to peers that no longer exist. Whenever a peer joins, it needs to obtain up-to-date routing contacts; whenever a peer fails, it needs to be replaced. The more frequent or the higher in number these changes are, the worse the peer-to-peer overlay network can handle them, resulting in delayed or lost messages. If a peer providing a particular service fails, this service would become unavailable. Thus the service is replicated among a fixed number of peers. In the replicated repository based systems all peers are able to offer the system’s services, making them very robust. The structured peer-to-peer overlay networks have a lesser degree of robustness. Services offered by individual peers are addressable using the DHT mechanism. A replica peer takes over the responsibility of a failed peer. When a huge number of peers fail in one region and new peers appear in another region, the new peers may act as replica peers and take over the duties of the failed peers. This shift of responsibility in the DHT mechanism automatically transfers all needed data and offered services to the appearing peers.

R-20: Continuous expendability Client-server based systems become unavailable during an update of the software or hardware of the server. The distributed and the peer-to-peer based systems can update parts of the system while other parts take over the offered services, using the replication mechanism described in the last paragraph. If updates to the system are rolled out gradually, they need to be compatible with the remaining previous versions.

R-21: Fault tolerance Again, using the replication mechanism, failing peers can be compensated for. In systems that do not replicate offered services, faults are fatal.

6.2.3 Security Aspects

While there are proven security solutions for centralized systems, most aspects are still the subject of ongoing research for distributed systems. In a client-server system it can be assumed that attackers are among the clients only (ignoring the ability to compromise the server when it can be physically accessed). In a distributed system anyone can be an attacker, especially in a peer-to-peer system, where everybody stores data and offers services as well. In all of the examined systems versions belonging to a specific artifact are stored among random peers. It is necessary to be able to choose any peer as a replica peer in order to not worsen the robustness, scalability and availability. Thus requirement R-23: Keep sensitive artifacts seems impossible to realize in peer-to-peer based systems. The requirements R-22: Access and usage control, R-24: Attribute based access control and R-25: Confidentiality, which all deal with access control, are challenging to implement in peer-to-peer networks. The access and account information has to be stored with its integrity and confidentiality preserved. Requirement R-26: Data integrity is partially fulfilled in the systems with replicated repository distribution, as each peer has a copy of the secured artifact. If any peer were to alter the content of some artifact, this could be noticed by comparing it with the other copies. A different mechanism, introduced by the dVCSs, calculates the hash value of each artifact; any manipulation would lead to a completely different hash value. If the user’s identifier is included in the metadata, and the hash value of this metadata is stored as well, requirement R-27: Non-repudiation is also met. The systems fulfilling these two requirements (R-26 and R-27) do so by calculating the hash value of each version, which is included in a snapshot where the author’s identity is also stored. The hash value calculated over this snapshot is in turn stored in its successor, so that a cryptographically secure history is created. There is no fully decentralized mechanism that completely satisfies requirement R-28: Authentication. Usually a central certificate authority is needed initially to map a user’s real identity to a certified virtual identity and to verify his virtual identity whenever demanded by other users. All of the examined systems are able to fulfill requirement R-30: Low administration overhead and requirement R-29: Secure communication, which are implementation issues only. However, when fulfilling requirement R-29, recursive routing, where a message is forwarded to the next closest receiver, cannot be used, as the destination field in the message would have to be encrypted as well. Using iterative routing, where a peer informs the requesting peer about a closer peer instead of forwarding the message itself, would solve this problem. Alternatively, only the message payload could be encrypted.
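The hash-chained history used to fulfill R-26 and R-27 can be sketched as follows. The metadata layout and field names are assumptions for illustration, not the exact format used by Git or Mercurial.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

// Sketch of a hash-chained history: each snapshot's metadata contains the
// author and the hash of its predecessor, so manipulating an earlier snapshot
// changes all later hashes and becomes detectable.
public class HashChain {

    static String snapshotHash(String author, String contentHash, String parentHash) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        String metadata = "author=" + author + "\ncontent=" + contentHash + "\nparent=" + parentHash;
        return HexFormat.of().formatHex(sha.digest(metadata.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String v1 = snapshotHash("alice", "c1", "none");
        String v2 = snapshotHash("bob",   "c2", v1);       // v2 commits to v1 exactly as it was
        String tampered = snapshotHash("eve", "c1'", "none"); // v1 altered after the fact
        // Recomputing the chain over the tampered snapshot no longer yields v2,
        // so the manipulation is detectable and the recorded author cannot deny his version.
        System.out.println(v2.equals(snapshotHash("bob", "c2", tampered))); // false
    }
}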

6.3 degree of consistency

The presented version control systems differ in the degree of consistency that they provide. Table 2 lists the systems, sorted by the minimum guaranteed consistency degree, as presented in Section 4.4.2. The stated guarantees are valid in worst case scenarios, where a system might be handicapped and only a partial update is delivered, as analyzed before. If no update is received at all, this is not considered a violation of the guaranteed consistency degree. We can see that almost all systems using maintainer based repository distribution offer the high consistency degree of sequential consistency. Pastwatch, the only peer-to-peer based version control system capable of handling snapshots, provides eventual consistency only.

System Consistency Degree Sequential Consistency Causal Consistency Eventual Consistency Always coherent SCCS (Section 5.1.1), RCS (Section 5.1.2) J! ! CVS (Section 5.1.3) J! ! Subversion (Section 5.1.4) " ! ClearCase (Dynamic View) (Section 5.1.5) " ! ClearCase (Snapshot View) (Section 5.1.5) " ! ClearCase Multisite (Dynamic View) (Section 5.1.5) " ! ClearCase Multisite (Snapshot View) (Section 5.1.5) " ! Monotone (Section 5.2.2) " ! Git (Section 5.2.3) " ! Mercurial (Section 5.2.4) " ! Wooki (Section 5.3.1) J! % DistriWiki (Section 5.3.2) J! ! CVS over DHT (Section 5.3.3) J! ! Code Co-Op (Section 5.3.4) " % GRAM (Section 5.3.6) J! % SVCS (Section 5.3.7) J! % Chord based VCS (Section 5.3.8) J! % Pastwatch (Section 5.3.5) " ! PlatinVC (Section 7.3.6) J! ! ! (Legend: regarding J!= single artifacts, != snapshots, "= single artifacts & snapshots)

Table 2: Guaranteed degree of consistency

6.4 taxonomy of key mechanisms

Table 3 concludes the analysis of the related systems by identifying how certain key aspects are solved. The cVCSs and dVCSs are summarized in one entry each. All peer-to-peer based approaches are listed.

6.4.1 Concurrency Control

Concurrent updates can be controlled by three classes of mechanisms: pessimistic, optimistic and hybrid concurrency control. The more restrictive pessimistic concurrency control was developed first. Conflicting versions are proactively avoided by granting only one editor write access, while multiple other users are only allowed to read the artifact in question. Traditionally a locking mechanism has been used, where a developer has to acquire a lock for an artifact he plans to edit. The more advanced classical optimistic concurrency control allows artifacts to be edited in parallel, as it assumes that most modifications do not conflict. Multiple authors copy an artifact, modify its content, and merge the resulting versions, resolving conflicts that might have occurred. We refer to it in Table 3 as optimistic (merge). Two mechanisms were developed which are hybrids between optimistic and pessimistic concurrency control. One approach (named hybrid resolving) tries to detect when two developers begin to modify the same artifact. The authors are informed so that they can coordinate their changes immediately. The second mechanism (hybrid voting) allows artifacts to be modified concurrently, but prevents them from being committed if concurrent versions are to be submitted. Before each commit a message is sent to all other participants. Only if they all agree, indicating that they did not change the artifact in question, can the artifact be committed.

6.4.2 Repository Distribution

The repository is either stored replicated or maintainer based. A replicated repository exists as an identical copy on each participant’s machine. The VCS needs to take care to update all repositories once artifacts have been changed. If the repository is stored maintainer based, it is separated into parts and distributed among specific machines, which store the parts and handle any updates on them. The partially stored repository parts are usually not identical, and in order to obtain the latest version of all artifacts in the repository all maintainers have to be queried. How the parts are distributed, i.e., how many parts are maintained by which machines, is subject to the distribution mechanism and detailed in the description of the respective system in Chapter 5. All of the presented peer-to-peer based approaches store the partial repository using the DHT mechanism. This means that a part can be addressed by an identifier, which is computed using information about the part. In traditional peer-to-peer based file sharing the identifier is obtained by calculating the hash value of the stored content with a cryptographically secure hash function, which results in a unique identifier (see [RFH+01]). Only if the content of two copies is identical do they have the same identifier. Theoretically, calculating the hash value using different content could result in the same value, but the probability of this happening is unrealistically small. Asymmetric key cryptography is likewise based on the premise that the resulting hash values are unique in practical usage [Eck09]. To calculate an item’s identifier its content has to be known, which would mean it has been received already. Therefore an additional indexing mechanism is needed, which lists the stored items’ identifiers (on a central server, e.g., in the eDonkey network [HBMS04] or BitTorrent [Coh03]). Storing an artifact by storing its identifier on one peer, and its content on a different one, is called indirect replication. Alternatively, a stored item could be addressed by an identifier which is calculated using meta information of the item, such as its name. This could be the filename of a file, the branch

name of a stored branch, or any other assigned name. By knowing the name of the desired item, the hash value forming its identifier can be computed.
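The two addressing schemes, and the indirection between them, can be sketched as follows. A local map stands in for the DHT, and all names and helper methods are assumptions made for illustration.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

// Sketch of content-hash vs. name-hash addressing: the immutable content is
// stored under its content hash, while a pointer stored under the hash of the
// item's name refers to that content -- i.e. indirect replication.
public class DhtAddressing {

    private final Map<String, byte[]> dht = new HashMap<>(); // stand-in for a real DHT

    static String sha1(byte[] data) throws Exception {
        return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-1").digest(data));
    }

    String putContent(byte[] content) throws Exception {
        String key = sha1(content);            // identifier derived from the content itself
        dht.put(key, content);
        return key;
    }

    void putNamePointer(String name, String contentKey) throws Exception {
        String key = sha1(name.getBytes(StandardCharsets.UTF_8)); // identifier derived from the name
        dht.put(key, contentKey.getBytes(StandardCharsets.UTF_8));
    }

    byte[] getByName(String name) throws Exception {
        String key = sha1(name.getBytes(StandardCharsets.UTF_8));
        String contentKey = new String(dht.get(key), StandardCharsets.UTF_8);
        return dht.get(contentKey);            // second lookup: the indirection step
    }

    public static void main(String[] args) throws Exception {
        DhtAddressing d = new DhtAddressing();
        String contentKey = d.putContent("v1 of README".getBytes(StandardCharsets.UTF_8));
        d.putNamePointer("README", contentKey);
        System.out.println(new String(d.getByName("README"), StandardCharsets.UTF_8));
    }
}

The sketch also shows the drawback discussed above: retrieving an item by name requires two lookups, and losing either the pointer or the content entry makes the item unreachable.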

6.4.3 Repository Partitioning

If the repository is not stored as identical replicas, it is dissected into parts that are distributed among all machines. How the parts are distributed is subject to the distribution algorithm. Among the examined VCSs the repository is partitioned in the following ways:

by branches: A machine stores all versions of all artifacts which belong to the same branch. In the traditional workflow practiced by the users of a cVCS, branches are created to keep track of the main development line, to implement features and to fix bugs. A positive effect is that often only updates from a specific branch are needed. If the latest versions are desired, only the machine that maintains the branch of interest needs to be contacted. However, to get to know existing and newly created branches an additional indexing service is needed. The main development line is often split into further branches, which keep track of different releases, like a future release and a current one. Besides this organizational separation into branches, different variants can be maintained. The branches are used with different intensity. The machines that store the more popular branches have a higher load. If only a main branch is frequently accessed, the serving machine acts practically as a central server. The dVCSs introduced a novel approach to using branches. In addition to the outlined traditional usage, branches are used to separate the development of the individual developers. Every developer commits his created versions exclusively to the repository stored on his machine. If the default branch is used and shared with another developer, it appears conceptually1 as a different branch in the other developer’s local repository. Popular workflows using this approach are detailed in Section 4.1. The access to the branches is naturally balanced among the participants, where most write accesses occur in the local branch. Some branches, however, might be more popular and accessed more often by other participants. Often, as presented in Section 4.1, a blessed branch exists, which integrates the contributions stored in other branches. The machine storing this blessed branch has to serve read requests much more frequently than other machines. In practice read access is offered by storing the blessed repository on a dedicated public server.

by user changes: In Pastwatch the repository is separated by users’ changes. Similar to the way dVCSs use branches to separate the contributions of single users, Pastwatch distributes the repository into parts which store all changes of a single user. More precisely, a list with all snapshots created by a single user (called log head) is stored by a machine. This list includes the identifiers of the machines which actually store those snapshots. While in a dVCS the ultimate separation borders are branches, meaning that other users’ contributions are also stored there when branches are merged, Pastwatch clearly separates the versions created by different users. Thus the stored snapshots do not reconstruct the gapless history of a branch. If a snapshot committed by a specific user was based on a snapshot of another developer, and the basis snapshot of the latter snapshot is to be retrieved, the machine storing the other user’s partial repository needs to be queried. The actual snapshot is retrieved indirectly by accessing the other user’s log head first, from where the location of the snapshot can be retrieved. The direct basis snapshot can be retrieved from the metadata stored within the succeeding snapshot. Separating the repository by users brings the benefit that each user has to communicate with only one other machine to store his changes. Only a single user sends write requests

1 This branch is even named after the developer in Git [HT]. In Mercurial [Mac] it appears as an unnamed, parallel branch.

to that machine, while all other users only read the stored information. As mentioned before, this access only updates a user’s log head, while the actual immutable snapshots are stored distributed among peers in the DHT. The log head is accessed by calculating the hash value of the respective user’s name. The identifier of a committed snapshot, which is stored in this list, is formed by calculating the hash value of that snapshot’s content. A disadvantage is that a list of all users has to be available in order to know the locations of all versions of which the repository consists.

by artifacts/folders: The complete history of an artifact could form a part. When the VCS also tracks snapshots, this information has to be stored as well, as it cannot be clearly assigned to a single part. All examined solutions avoid this problem by controlling the versions of single artifacts only. Rather than storing the complete history of a file, a more coarse grained approach would be to store the history of all files in a folder, without subfolders. If subfolders were included, the part covering the outermost folder would store the entire repository. The advantage of this approach is that all versions of an artifact are retrievable from one machine. Using this approach, the distributed part is identified by the hash value calculated over the artifact’s or folder’s name. By knowing the name the complete history can be accessed. This approach presents us with two challenges: When an artifact is renamed, or, in the case the repository is separated by folders, when a folder is renamed or an artifact is moved, the identifier changes. This would lead to a new part. The old and new parts would have to refer to each other in order to be able to traverse the history. A similar issue arises with newly created artifacts or folders: when their name is unknown they cannot be retrieved. An additional announcement service would be necessary.

by snapshots: Some systems distribute the repository by storing the snapshots among different peers. If no additional index is stored, a snapshot’s identifier needs to be known in order to retrieve it. A snapshot always refers to the snapshot it was based on. This information could be used to traverse to older snapshots from a recent one, but more recent snapshots could not be found. Storing a snapshot’s identifier in the parent snapshot’s metadata would require altering it, and this concurrent access would have to be controlled. Retrieving versions by traversing known and received snapshots would take a long time, as multiple sequential requests would need to be issued instead of only one request that retrieves the latest snapshot. It is a better approach to have an index which lists the snapshots and their relations. This index does not have to be on a central machine but could be distributed following the approaches presented here. Pastwatch stores the history index in a distributed manner; each part contains all snapshots made by one user. In GRAM the index is replicated among all machines. Separating the actual snapshots from the place where the history index is stored delays the retrieval of a snapshot, as first the snapshot’s location and subsequently the snapshot itself must be retrieved.
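The traversal problem mentioned for index-less snapshot partitioning can be sketched as follows. The Snapshot layout and the map-based stand-in for the DHT are assumptions for illustration only.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of walking a snapshot history without an index: each step is a
// separate (sequential) DHT lookup, and snapshots newer than the starting
// point are never discovered.
public class SnapshotTraversal {

    record Snapshot(String id, String parentId) {}

    static List<Snapshot> historyFrom(String knownId, Map<String, Snapshot> dht) {
        List<Snapshot> history = new ArrayList<>();
        String current = knownId;
        while (current != null) {
            Snapshot s = dht.get(current);   // one remote lookup per ancestor
            if (s == null) break;            // missing snapshot: the history has a gap
            history.add(s);
            current = s.parentId();
        }
        return history;                      // older snapshots only
    }

    public static void main(String[] args) {
        Map<String, Snapshot> dht = Map.of(
                "s3", new Snapshot("s3", "s2"),
                "s2", new Snapshot("s2", "s1"),
                "s1", new Snapshot("s1", null));
        System.out.println(historyFrom("s2", dht).size()); // 2: the newer snapshot s3 is never found
    }
}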

6.4.4 Communication Paradigm

The communication paradigm influences a system’s properties. Three different communication paradigms are used: client-server, peer-to-peer, or communication over different existing infrastructures, such as e-mail, USB flash memory drives or ad-hoc communication channels. The differences between the client-server and the peer-to-peer communication paradigms have been detailed in the previous chapters. The last approach relies on an existing infrastructure.

6.4.5 Communication Protocol

The various communication protocols listed in Table 3 have little influence on a system’s properties. Their efficiency regarding the time needed to route a message varies, but the basic properties are similar for all protocols belonging to the same communication paradigm.

6.5 promising mechanisms

In this section the most promising mechanisms implemented in the presented related work are analyzed. PlatinVC’s design is based on this analysis.

6.5.1 Concurrency Control

The analysis of related systems pointed out that pessimistic concurrency control introduces numerous problems into a peer-to-peer based system. A lock has to be acquired and released. As long as an artifact is locked, only one developer can modify it. This blocks parallel work on the same artifacts. The practical usage of the first VCSs (SCCS, see Section 5.1.1, and RCS, see Section 5.1.2) showed that a user tends to lock more files than he modifies, just in case he might need to change them. The more users a system has, the higher the probability that an artifact another user intends to modify is locked. Releasing a lock is error prone in a highly dynamic environment like a peer-to-peer network. A peer who governs the lock on an artifact cannot decide whether the lock-holding peer is offline or whether the messages sent to him were merely lost. In the latter case reassigning the lock to another developer would lead to the undesired situation where both developers modify the same artifact. In conclusion, optimistic concurrency control is better suited to a peer-to-peer based VCS. It proved to be the winning model, as only a few modern VCSs implement pessimistic concurrency control. The two hybrid approaches sound promising but fall short in practical usage. The need to contact all other participants in order to perform a commit limits the scalability to the point where no user might be able to practically perform a commit operation successfully. If the number of participants in the first approach (hybrid voting) rises, the probability increases that the same basis version has been modified in the time span needed to reach all participants (which also rises with the number of participants). If two or more changes were made based on the same version, none can be committed. Both authors have to try again to submit their versions until one of them is successful. The other one could then apply his changes to the committed version. It is, however, very likely in a big project that further developers try to submit their contributions concurrently. The second approach (hybrid resolving) is more promising. Contacting all participants in a network again takes more time when the number of participants rises, but a detected conflict does not block the commit completely. It is resolved interactively by connecting all authors, so they can coordinate the integration of their changes. This mechanism, however, relies on the fact that all participants can be contacted when they edit files. Requirement R-8: Offline version control cannot be provided. If the network falls into partitions or the routing is temporarily indeterministic, the concurrently editing authors cannot contact each other, resulting in conflicting versions being committed in the separate partitions. When the partitions rejoin, a conflict exists and has to be solved by other means. This approach is interesting as an additional feature, but unable to avoid conflicts in a peer-to-peer network by itself. Another practical problem is that the total number of participants in any peer-to-peer network fluctuates and is nearly impossible to measure exactly. Thus an author never knows if he received answers from all participants. The two hybrid approaches have not been pursued further, possibly because the mentioned problems remained unsolved.
System | Concurrency control | Repository distribution | Repository partitioning | Communication paradigm | Communication protocol
cVCSs | old: pessimistic, new: optimistic | central server(s) | single copy | client-server | mostly WebDAV
dVCSs | optimistic | arbitrary | by branches | one-to-one | proprietary, http, smtp
Wooki | optimistic | replicated | identical replicas | unstructured | proprietary
DistriWiki | update overwrites | maintainer | by artifacts | hierarchical | JXTA
CVS over DHT | optimistic | maintainer | by artifacts | structured | BambooDHT
Code Co-Op | hybrid voting | replicated | identical replicas | broadcast | ethernet, smtp, smb
Pastwatch | optimistic | maintainer (+ indirect) | by users | structured | proprietary
GRAM | hybrid resolving | replicated | identical replicas | hierarchical | JXTA
SVCS | pessimistic | maintainer | by artifacts | structured | Chord
Chord based VCS | hybrid resolving | replicated (+ indirect) | identical replicas | structured | Chord (no replicas!)
PlatinVC | optimistic | maintainer | by folders | structured | Pastry

Table 3: Architectural aspects of the examined VCSs

The only peer-to-peer based VCS which implements hybrid resolving combines it with an unstructured peer-to-peer network. In this class of peer-to-peer networks messages cannot be guaranteed to reach all participants. There is a chance that two messages sent from different authors reach disjoint groups of receivers only. In this case both authors would be unaware of each other and commit their conflicting changes.

6.5.2 Repository Distribution

Keeping identical replicas consistent in a peer-to-peer network is a challenging task. The problems discussed above with reaching all participants might hinder some machines from getting an update. Due to the routing mechanism messages travel at different speeds, so that an update might be received which is based on a still missing update. All systems which distribute the repository as identical copies among all machines can guarantee eventual consistency only. As there is no single controlling point it is difficult to detect and solve conflicting changes in the same way in all repositories, endangering their coherency. An advantage is that updates are available without the need to query for them. Committing changes, however, takes time, as all participants have to be reached. The maintainer based approach has the advantage that concurrent commits can be resolved by a central instance, the maintainer. If two concurrent versions are committed, a maintainer receives both requests and can handle them in a sequential order. For example, he could accept the first received commit and reject the second. However, in some situations the two commits could be received by different peers, which each believe they are the only maintainer and accept the received version, not noticing the conflict. Network partitions, lost messages, and indeterministic routing can lead to such a problem. A further problem concerns performance. While commits can be applied very quickly, as only a single peer has to be contacted, updates are slow to be received, as multiple maintainers have to be addressed. Replicating identical copies of the repository among all peers makes the system more robust and scalable (requirement R-19). Data integrity, a protection objective demanded by requirement R-26, is also supported, as there are numerous copies in the system. Nevertheless, both requirements can also be fulfilled using the maintainer based approach, thus the limited advantage given for these aspects is outweighed by the higher consistency degree which can be attained using a maintainer based repository distribution. Having user maintained local repositories next to the system maintained maintainer based repositories combines the advantages of both approaches and helps in fulfilling the requirements R-4: Conjoined local repositories, R-5: Redundant backups, and R-8: Offline version control. The first versions of PlatinVC followed the maintainer based approach. Later prototypes also adopt properties of the replicated distribution, which is detailed further in the following part of this work. The basic structure of PlatinVC, however, is maintainer based.

6.5.3 Repository Partitioning

The repository distribution has the most influence on the consistency degree a system provides. The examined systems use multiple approaches to partition the repository. The load distribution differs among the approaches used; however, no approach has a clearly better load distribution than another. Whenever a system has a central component which needs to be accessed frequently, the load distribution is worsened. Created versions are immutable and can be addressed by calculating a hash value using their content. The stored versions have to be indexed by an additional mechanism, i.e., the hash values have to be listed somewhere, as they cannot be calculated without the content. The required indirection of first accessing this index and afterwards the versions themselves does not seem to bring an advantage. On the contrary, if one of the two required machines (and its replicating machines) is missing, the versions cannot be retrieved; the chance of this situation occurring is doubled.

Most systems partition the repository according to the stored artifacts. The artifacts are addressed by the hash value of their name. By keeping all versions of a single artifact on one machine, all versions are retrievable whenever the maintaining machine can be contacted. This ensures sequential consistency, but only regarding single artifacts and not snapshots. Having an initial version of a project’s files, however, enables one to update all known files to the latest versions without the need to request an additional index. However, the problem of announcing new artifacts as well as renaming an artifact (see requirement R-2) has to be solved in this approach. All other approaches do not seem to be more beneficial. They also bring the problem of announcing new items. For example, in Pastwatch the repository is partitioned according to its users. All users are listed in a centrally stored membership list. Whenever the snapshots created by a user are not accessible, gaps exist in the project’s history, thus only eventual consistency can be provided. Partitioning a repository by its branches brings the same problem that older versions might not be retrievable. In summary, the most promising partitioning scheme seems to be the most commonly used one: separation according to stored files.

6.5.4 Communication Paradigm

The communication structure has a great impact on the features and drawbacks of a version control system. The client-server based communication suffers from the disadvantages discussed in Section 2.4.1. The centralized version control systems perform worse with a rising number of users and are prone to complete system failure. While solving some problems of the centralized approaches (like requirement R-12: Prevent censorship), the distributed version control systems introduce new issues: manual effort is required in order to share changes, which therefore propagate slowly among the developers, leading to unnoticed duplicate development and conflicts being found late. The peer-to-peer communication structure has been demonstrated to overcome these shortcomings and to fulfill especially the requirements R-11: Easy to setup/manage, R-15: Transparency, and R-17: No centralized services. The system data and operations are distributed among the participants, where replication helps to avoid even partial system failure. Additionally, every user is free to leave the system anytime, combining the independence of a dVCS with the connectivity of a cVCS. Approaches in which messages cannot be addressed to a specific recipient, as in unstructured peer-to-peer overlay networks, can only be combined with replicated repository distribution, where updates are broadcast. To overcome the limitation of eventual consistency a paradigm where a specific participant can be addressed is needed, as realized by a structured overlay network. Following the analysis in Section 2.4 and this chapter, the structured peer-to-peer communication paradigm is the most promising.

6.5.5 Communication Protocol

We saw in the previous sections that a structured peer-to-peer protocol is the most promising communication protocol. Among the available implementations, Chord ([SMK+01b, SMLN+03]) and Pastry ([RD01b]) are the most widely used ones. BambooDHT (presented in [RGRK04]) is a variant of Pastry. [RD01b] showed that Pastry has the better system properties. However, there are several other structured peer-to-peer overlays.

6.6 summary

The related systems, which were presented in Chapter 5, were analyzed in this chapter. A distributed version control system which promises a high consistency degree and fast operation times should combine the following design decisions for the identified key mechanisms: Only the peer-to-peer communication paradigm complies with the majority of the identified requirements. All other examined communication paradigms have integral properties which hinder the realization of certain requirements. The combination of a maintainer-based distribution of the repository with user-maintained local repositories proved to be superior to replicated repository distribution. Although some problems remain unsolved, partitioning a repository by artifacts, i.e., giving the control of all versions of an artifact to a maintaining peer whose identifier is numerically close to the hash value of the artifact's name, is the most promising solution. A challenge is to handle new, renamed or moved artifacts. A greater issue is to handle snapshot-based version control without introducing complex transaction protocols.

Part III

PEER-TO-PEER VERSION CONTROL SYSTEM - PLATINVC

The previous part points out that state-of-the-art version control systems are not suitable for global software development. A novel version control system, PlatinVC, which primarily aims to support globally distributed collaborative development, is presented in this part. Chapter 7 presents this solution from the perspective of a user. Design decisions and alternatives are detailed in Chapter 8. A prototypical implementation, based on these design decisions, is presented in Chapter 9. Results from a testbed evaluation we performed with our prototype are discussed in Chapter 10.

OVERVIEW OF PLATINVC 7

The main goal of this thesis is to provide a suitable solution for version control to aid globally cooperating contributors: PlatinVC is a fully decentralized version control system, which is built on a minimal set of assumptions listed in Section 3.1 and fulfills the requirements stated in Section 3.2. To provide a general understanding of PlatinVC, this chapter describes its basic architecture (in Section 7.1) and a typical and recommended workflow (in Section 7.2), which is distinct from the workflows practiced in other VCSs (introduced in Section 4.1). The distinguishing features of PlatinVC are presented in Section 7.3, and the version control services offered to users are detailed in Section 7.4. This chapter focuses on the user's point of view and introduces the philosophy that PlatinVC was designed upon. The technical implementation, along with a discussion of alternatives, is detailed in Chapter 8.

7.1 basic architecture

PlatinVC inherits some properties of the dVCS as well as the cVCS by combining aspects of their architectures. From a functional point of view, PlatinVC acts like a mixture of cVCSs and dVCSs, inheriting their positive features while omitting their flaws, as discussed in Section 2.4.1. The basic architecture depicted in Figure 22 combines aspects of a local user's components in a dVCS, as presented in Figure 14, and of a server's components in a cVCS, as presented in Figure 12.

(The figure shows Alice's working copy, the local dVCS module and the peer-to-peer module on her computer, together with the commit, check out, push and pull operations that connect them to the virtual global repository.)

Figure 22: Basic architecture of PlatinVC

Note that the terminology introduced by CVS (see Section 5.1.3) is used, rather than the terminology common in Mercurial (see Section 5.2.4): multiple modules reside in a repository. A module stores the committed history of all files in a working copy, which usually belong to one project. A user works on artifacts in a working copy. Typically all files in a working copy belong to the same project. A software product consists of one or more of these projects, depending on its organization. All changes are tracked by executing the commit operation. Whenever this occurs, all modifications are recorded in a snapshot, which is stored in the history of the local module. When the developer decides to share his changes, he executes the push command. All changes that were recorded by the local module, but not pushed before, are now pushed to the peer-to-peer module first and subsequently to the respective global module in the virtual global repository. The peer-to-peer module acts like an (outdated) mirror of the global module in the repository. The virtual global repository comprises all existing modules. We call it virtual because it does not exist on a single machine. It is, rather, distributed among the participants in a structured way, as opposed to the chaotic distribution of the modules in a dVCS. This allows for retrieval of the latest snapshots as if a centralized repository were accessed, as in a cVCS. Chapter 8 describes in detail how this is achieved.
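The interplay of the three modules can be summarized in a small sketch. The following Java fragment is a minimal, hypothetical model of the per-peer module layout; the types and method names (Snapshot, LocalModule, PeerToPeerModule) are illustrative only and do not correspond to PlatinVC's actual classes.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Minimal, hypothetical model of the per-peer module layout (all names are illustrative).
final class Snapshot {
    final String id;                       // content hash identifying the snapshot
    Snapshot(String id) { this.id = id; }
}

// The local module records every commit of the user (a full dVCS repository).
class LocalModule {
    private final List<Snapshot> history = new ArrayList<>();
    void commit(Snapshot s) { history.add(s); }
    List<Snapshot> unpushedSince(int mark) { return history.subList(mark, history.size()); }
}

// The peer-to-peer module mirrors the (possibly outdated) global module of this peer.
class PeerToPeerModule {
    final Set<Snapshot> mirrored = new LinkedHashSet<>();
    void apply(List<Snapshot> snapshots) { mirrored.addAll(snapshots); }
}

// A push first copies the unpushed snapshots into the peer-to-peer module and then
// forwards them to the maintaining peers of the virtual global repository (omitted here).
class PushExample {
    public static void main(String[] args) {
        LocalModule local = new LocalModule();
        PeerToPeerModule p2p = new PeerToPeerModule();
        local.commit(new Snapshot("a"));
        local.commit(new Snapshot("2"));
        int lastPushed = 0;
        p2p.apply(local.unpushedSince(lastPushed));  // step 1: local module -> peer-to-peer module
        // step 2 (not shown): send the new snapshots to the responsible maintainer peers
        System.out.println("mirrored snapshots: " + p2p.mirrored.size());
    }
}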

7.2 workflow

Although any workflow presented in Section 4.1 could be used with PlatinVC, we present a recommended workflow in this section. Using this workflow exploits all benefits offered by PlatinVC.

7.2.1 Frequency of Commits

As discussed in Section 4.2, there is a trade-off in using version control systems. On the one hand, snapshots should be committed as frequently as possible to enable fine granular rollbacks. On the other hand, shared changes should not corrupt the project (e.g., resulting in an uncompilable source code project), hindering further work of collaborators. PlatinVC inherits the solution to this problem from the dVCSs. A user can commit snapshots locally to build up a fine granular history (similar to a dVCS). When the latest snapshot reflects a stable project state, the push operation makes all committed snapshots available to all other participants, as in a cVCS. The two modules on each peer enable the separation of personal commits from globally shared commits. While the local module always contains all locally recorded snapshots, the peer-to-peer module includes all globally shared snapshots of all users which were retrieved in the last update.

7.2.2 Repository Sharing

Any workflow, such as presented in Section 4.1, can be used with PlatinVC. All workflows for a dVCS can be applied by interacting with the local module only, which is a true dVCS repository as well. A centralized workflow can be applied by interacting with the virtual global repository only. Therefore, after each commit execution a push has to be invoked. All changes would then pass through the local module and become immediately visible to coworkers. PlatinVC was designed to suit the following workflow, presented in Figure 23. All figures show an abstract view of PlatinVC on two machines and the virtual global repository. The history of Alice's local module is presented on the left, the history of Bob's local module is presented on the right. The globally visible history in the corresponding global module is presented in the middle. All other components are omitted to maintain conciseness. Named branches are not shown either. For simplicity we assume that there exists only one named branch in this example. A square with a number represents a recorded snapshot, the square with a triangle attached being the latest snapshot. The arrows with white heads connecting snapshots represent the reference from a snapshot to its parent. The arrows with black heads represent executed operations.

1. Before developing a new functionality, a developer updates the project he plans to work on by pulling (see operation O-6: Pull globally) the changes from the virtual global module into the local module (see Bob in Figure 23a). To do so, the developer's machine contacts a few other machines, from whose peer-to-peer modules the updates are retrieved. The update can cover all or only selected branches, and either the complete working copy or only specified folders with their contents (artifacts and subfolders). He now updates his working copy with the latest snapshot of the branch he intends to work on.

(a) pull globally latest snapshot (b) commit local changes (c) push snapshot globally

Figure 23: Basic Workflow in PlatinVC

As explained in Section 5.2 there could be multiple heads in a branch - a developer chooses one to base his changes on. The project should be in a stable state, so that the developer can be sure that only his modifications can destabilize the project. Although not covered in this example, a user could also create a new module (see operation O-3: Initialize module) to start a new project, or clone an existing module (see operation O-1: Clone remote module) to start working on an existing project.

2. The developer modifies a number of files in several steps. Using the local commit operation, changes are recorded as frequently as possible, creating a fine granular version history in the local module, without sharing them with other users, as done by Bob in Figure 23b. N.B. Alice committed her changes before. They were not shared with Bob (see Figure 23a). If some modifications lead to unexpected results, it is often easier to reverse the changes and continue from an earlier snapshot. Using PlatinVC, any of the recorded steps can be reversed without a network connection.

3. Once a planned functionality is implemented (or a bug is solved) and the project is in a stable state, the user can share all his locally recorded snapshots with all other participants, mimicking the behavior of a cVCS, by executing the push operation (see operation O-9: Push globally), like Bob does in Figure 23c. This copies the stored snapshots from the local module to the peer-to-peer module, from where the snapshots are pushed to the virtual global repository once a network connection becomes available. To push the snapshots to the virtual global repository, the developer's computer contacts specific peers and sends them the new snapshots, which they integrate into their peer-to-peer modules. Thereby every user querying for the latest changes can retrieve them.
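The recommended workflow can be restated as a short usage sketch. The interface and method names below are hypothetical and merely mirror the operations O-6, O-8 and O-9; they are not PlatinVC's real API.

// Hypothetical facade mirroring the operations used in the recommended workflow.
interface VersionControl {
    void pullGlobally();               // O-6: fetch the latest shared snapshots
    void commit(String message);       // O-8: record a snapshot in the local module only
    void pushGlobally();               // O-9: publish all snapshots not pushed before
}

class RecommendedWorkflow {
    public static void main(String[] args) {
        // A trivial logging implementation stands in for the real system.
        VersionControl vc = new VersionControl() {
            public void pullGlobally()   { System.out.println("pull globally"); }
            public void commit(String m) { System.out.println("commit locally: " + m); }
            public void pushGlobally()   { System.out.println("push globally"); }
        };
        vc.pullGlobally();                        // 1. start from the latest stable state
        vc.commit("add skeleton of feature X");   // 2. commit locally as often as useful
        vc.commit("implement feature X");
        vc.commit("fix failing unit test");
        vc.pushGlobally();                        // 3. share once the project is stable again
    }
}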

7.2.3 Conflict Resolution

In the previously described workflow, conflicts might occur if another developer shares his snapshots concurrently. The following example, illustrated in Figure 24, explains this situation. We start from the situation presented in Figure 23, where Alice and Bob committed changes made to the common base snapshot 1. Bob already shared his snapshot using the push operation. Alice does the same in Figure 24a. This leads to a conflict, as both snapshots are based on the same snapshot and might contain changes that contradict each other. As usual in dVCSs, the snapshot is applied but forms a second head in the (named) branch we look at.

(a) push (b) pull (c) merge (d) push

Figure 24: Conflicts and Resolution in PlatinVC

N.B. If Alice had pulled the latest snapshots first, as recommended and presented in Figure 16, this second head would have been created in her local history only. Even if this procedure is followed, the presented situation can still occur: shortly after Alice pulled the latest snapshots, but before she pushes her local repository, Bob could push new snapshots, which results in a conflict as soon as Alice pushes her snapshots. As we see further on, the resulting history forms a directed acyclic graph (DAG). The two parallel development lines are referred to as unnamed branches, where both snapshots 2 and II are the latest head versions. In order to apply this history, one must choose the head to update his working copy to. Through an acknowledgment message Alice is informed that an unnamed branch was created and is asked to resolve it. Alice reacts by pulling Bob's conflicting snapshot in Figure 24b. She merges the conflicting snapshots locally (see Figure 24c) and shares the resulting snapshot 3 as presented in Figure 24d. Now the two development lines are reunited, so that there is only one head again, making it clear where further changes should be built on. If Alice does not receive the acknowledgement message due to message loss or being disconnected, she invokes the push again. As the push was already applied, the only action the system takes is to resend the acknowledgement message. If a conflict with another snapshot should arise, an unnamed branch would be created, and a warning is included in the acknowledgment message. For each repeated push execution a new acknowledgement message is sent, so the message will eventually be received by Alice. Thus the author of a conflicting snapshot that led to the unnamed branch is informed about the branch with a high probability and can take action to resolve it. The other author is not informed, as he could be tempted to also merge the snapshots, which might result in a new conflict if multiple merged snapshots are pushed again. The global history in Figure 24a has two head versions, which are both the valid latest version in the respective branch. To avoid this confusing state, each push operation should be preceded by a pull request. The conflicting head versions could then be merged locally first and the merged snapshot published, resulting in the DAG building the global history in Figure 24d. However, doing so is not always possible, e.g., when network partitions happen, as discussed in Section 8.6.

N.B. The server in a cVCS would prohibit Alice from sharing her changes. Alice would be forced to obtain Bob's snapshot first, merge conflicts locally, and commit only the newly created snapshot to the server. The linear history would consist of the snapshots a-II-3; Alice's modifications, which lead to snapshot 2, would not be recorded in a cVCS.
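The acknowledged push described above can be sketched as follows. The sketch assumes a hypothetical PushAck message and simulates message loss with an empty return value; neither the names nor the simplified transport correspond to the actual implementation.

import java.util.Optional;

// Hypothetical acknowledgement of a push; a warning flag signals that an unnamed branch was created.
final class PushAck {
    final boolean unnamedBranchCreated;
    PushAck(boolean unnamedBranchCreated) { this.unnamedBranchCreated = unnamedBranchCreated; }
}

class AcknowledgedPush {
    // Simulates sending the push and waiting for the acknowledgement; an empty
    // Optional stands for a lost message or a timeout.
    static Optional<PushAck> sendPushOnce(int attempt) {
        return attempt < 2 ? Optional.<PushAck>empty() : Optional.of(new PushAck(true));
    }

    public static void main(String[] args) {
        Optional<PushAck> ack = Optional.empty();
        int attempt = 0;
        // Pushing is repeatable: already applied snapshots are not applied twice,
        // only the acknowledgement is resent by the receiving peers.
        while (!ack.isPresent()) {
            ack = sendPushOnce(attempt++);
        }
        if (ack.get().unnamedBranchCreated) {
            System.out.println("conflict: pull, merge locally and push the merged snapshot");
        } else {
            System.out.println("push accepted without creating an unnamed branch");
        }
    }
}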

7.3 features

As mentioned before, PlatinVC offers all features provided by a distributed version control system, namely Mercurial [O'S09]. In this section we describe the additional features enabled by PlatinVC, which are novel to version control systems.

7.3.1 Automatic isolation of concurrent work

When a project is developed concurrently, it is difficult to manage which contributions should be applied immediately and which should be postponed to a later time, when the developed product is integrated. A single developer can keep his modifications local by not sharing them with the push operation. But separating the work of multiple developers is challenging and, to date, no satisfying solution has been found. The classical approach coming from the centralized version control systems is to split the development line into branches. Teams share their changes in the created branches, which are merged into the main development line once all tasks are completed. Before branches are merged, no updates from other teams are received, whether helpful (like a bug fix) or disturbing (like a half implemented function). Merging the branches is difficult, especially if there are a lot of changes in the artifacts of a branch. Conflicting modifications, which might have been created a long time ago, must be resolved. A major design goal of the distributed version control systems, especially Git (see Section 5.2.3), was to handle branches in a more convenient way. The development of each developer occurs in an individual branch, which is merged whenever they share their changes. These individual branches are merged often and early. More importantly, they are merged by the persons who created them, who are thus the most eligible to resolve possible conflicts. Nevertheless, branches have to be merged, and due to the lack of a global repository they have to be merged multiple times. Let us assume that Alice and Bob exchange their changes by merging their branches, as presented in the previous example. Now Bob's local repository includes Alice's changes as well. If he shares his changes with Cliff, a version created by Alice, which did not conflict with Bob's versions, may conflict with one of Cliff's creations. Even worse, if Dave shared his changes with Cliff, and subsequently shares them with Alice, the same conflict between Alice's and Cliff's version will occur. Alice and Cliff might resolve them in a different way, meaning that Alice and Bob, and Cliff and Dave, will need to synchronize their repositories again. PlatinVC aims to solve the problem of merging branches by avoiding their creation in the first place. Two mechanisms enable a workflow where branches are almost never needed: isolated folders and postponed branches. Both are presented in the following.

Isolated folders

All projects are organized using a folder hierarchy. Beginning with the outermost folder, several folders exist which separate a project into parts. In a typical software project, all files belonging to a specific function are included in the hierarchy of one folder. Only cross-cutting concerns [Par72] cannot be organized in this way, but normally they are not implemented by a single team of developers. A widespread example of a cross-cutting concern are security issues, which often have to be addressed in various places in the code. However, a good design and/or weaving techniques like aspect-oriented programming can encapsulate cross-cutting concerns in one central place (i.e., under one encapsulating folder). PlatinVC utilizes this folder hierarchy to distinguish which updates to certain files are needed and which are better ignored. A developer can choose to update the files of a
folder and all subfolders only. He can also choose to update individual folders, without the need to update all files of their subfolders. In this way, only changes to the files included in the chosen folders are received; changes to other files made by different developers are ignored, as long as the same developer did not also change a file in the selected folders. N.B. While centralized version control systems offer the ability to update only specified folders as well (which is not possible in a dVCS), PlatinVC additionally retrieves related changes to artifacts in unspecified locations. Section 7.4.2 details how this update operation works. The idea behind this concept should be clarified by the following example. Let us assume that a software project, on which Alice and Bob are working, consists of one core part and two features, which are separated into three different folders. Alice's team works on one feature. Changes are pushed globally, but updates are pulled only for the folders of the feature they work on. In doing so, they automatically ignore all changes shared by Bob's team, who are working only on the files residing in the other feature's folder. Specifying the folders for which updates should be exclusively received effectively separates the development on different parts of a project. If changes are needed, they will be retrieved automatically. If, for example, Bob's team finds a bug in the core folder of the project whose fix involves changing some files in the feature folder Alice's team is working on, Alice's team will receive these updates automatically. As all changes made by Bob's team might be important, all changes in all three folders are received. Alice's team can now decide whether they want to merge the retrieved changes of Bob's team fully or partially (see cherry picking) into their development line. If they decide to ignore the changes, a postponed branch gets created, as explained in the next section. A developer should only update the folder he is working on. The received updates are guaranteed to include all changes to files in these folders, but also all changed files in other folders that the received updates are based on. N.B. The selective updates of centralized version control systems retrieve only the chosen artifacts blindly, without related changes that might be of interest. Most of the time concurrent work will be isolated. If unwanted changes are received, they can be ignored, which will create a postponed branch, as explained in the next section. That section also explains how the development of multiple teams, which work on overlapping folders, can be managed.
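The folder-based selection can be illustrated with a small sketch. The Snap type and the selection logic below are illustrative simplifications; the real data model and the maintainer lookup are described in Chapter 8.

import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FolderIsolationSketch {
    static final class Snap {
        final String id;
        final Set<String> changedPaths;   // paths of the files changed in this snapshot
        final List<Snap> parents;         // snapshots this one is based on
        Snap(String id, Set<String> changedPaths, List<Snap> parents) {
            this.id = id; this.changedPaths = changedPaths; this.parents = parents;
        }
    }

    // Selects every shared snapshot that touches the chosen folder and, transitively,
    // every snapshot it is based on - even if those touch other folders only.
    static Set<Snap> pullForFolder(Collection<Snap> shared, String folder) {
        Deque<Snap> work = new ArrayDeque<>();
        for (Snap s : shared) {
            for (String p : s.changedPaths) {
                if (p.startsWith(folder + "/")) { work.add(s); break; }
            }
        }
        Set<Snap> result = new LinkedHashSet<>();
        while (!work.isEmpty()) {
            Snap s = work.pop();
            if (result.add(s)) work.addAll(s.parents);   // include all base snapshots
        }
        return result;
    }

    public static void main(String[] args) {
        Snap initial = new Snap("1", Set.of("A/a", "B/b", "B/C/c"), List.of());
        Snap onlyA   = new Snap("2", Set.of("A/a"), List.of(initial));
        Snap onlyB   = new Snap("II", Set.of("B/b"), List.of(initial));
        // Pulling for folder B retrieves II and its base snapshot 1, but ignores 2.
        for (Snap s : pullForFolder(List.of(onlyA, onlyB), "B")) {
            System.out.println("retrieved snapshot " + s.id);
        }
    }
}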

Postponing Conflict Resolution

As presented in the example in Section 7.2.3, unnamed branches are created automatically when different developers base their changes on the same snapshot, but these unnamed branches should be merged before they are shared. Named branches exist as well. They should be used for the long-term separation of parallel versions, such as maintenance, feature or variant development lines. The work of a single developer is isolated by committing it locally only. When a group of developers wants to share snapshots among themselves, a mixture of named and unnamed branches can be used. Every team member works on his tasks and commits his fine grained modifications locally. A team member shares these local snapshots to the named branch. Unnamed branches, which reside within the named branch, are created and merged. As described in Section 7.3.1, the natural separation of a project into subfolders can be exploited. When only members of a certain team work on artifacts which are all in the hierarchy starting from a subfolder, the automatic isolation of folders separates the team's snapshots from those of other developers without the need for a named branch. The creation of unnamed branches could be avoided to a certain degree: as we will see in Chapter 8, in the first phase of the push operation a list of existing snapshots (only their metadata) is retrieved. Based on this list, conflicts can be detected by the pushing peer and the push process could be aborted, forcing the user to pull and merge the conflicting snapshots first. However, we did not implement this behavior, as it cannot work reliably: in case of a network partition, conflicts may remain unnoticed. Even under normal operation it might happen that a conflicting snapshot is submitted by another developer after the metadata was received, but before his own snapshots were sent.

All changes ever made are always retrievable, while the latest snapshot is a single head, which represents the project in a stable state. This behavior enables a workflow where a developer does not need to plan branches in advance. As illustrated in Figure 24, Alice and Bob can work on different features in the same branch. They commit their changes locally and push them globally if the software project remains compilable. Before pushing local snapshots globally, Alice pulls the latest snapshot and merges it with the latest local snapshot in her working copy. If everything continues to work as expected, she also pushes this merged snapshot; if not, she discards the merged snapshot and pushes her local snapshots only. Doing so results in a diverging development line, an unnamed branch, as shown in Figure 24a. From now on, all her snapshots will continue to be added to her unnamed branch, while Bob will continue to work in his branch - without the need to be notified. Once their work is done, they can merge the latest snapshots into one, recreating the single development line. Due to the automatically created unnamed branches, solving conflicts can be postponed to the moment when it becomes inevitable, which is not possible in cVCSs, where conflicts have to be solved before changes can be committed. There, postponed conflict resolution has to be planned, as branches have to be created before the changes can be committed. The handling of branches is inherited from the distributed version control systems. Unlike centralized systems, they bring the drawback that the work of multiple developers is distributed and has to be collected in a manual process, by exchanging snapshots with each developer individually. PlatinVC unites the immediate retrievability of all shared snapshots present in centralized version control systems with the ability of the distributed approaches to postpone conflict resolution spontaneously. Unique to PlatinVC is the possibility for groups to use unnamed branches spontaneously. A team can start working on a single branch. A group can split, if their development begins to affect one another, by simply not merging concurrent snapshots. As with the development in any other branch, conflicts split up a branch into new branches. But when visualizing the history it is clearly visible which branch started as a sub-branch of a team branch. Nevertheless, by executing the pull operation prior to pushing changes, unexpected branches can be avoided. As in any dVCS featuring unnamed branches, they can be named at any later time. If snapshots are merged, their last common snapshot is identified and used in a three-way merge. In contrast, in a cVCS the snapshot where the development line was branched would be used, regardless of whether a more recent snapshot had already been merged with the original development line at a later time.
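The merge base used for the three-way merge can be determined as sketched below. The Node type is illustrative; in a history DAG there can be several equally good common ancestors, which Mercurial resolves with additional rules, so the sketch only conveys the idea.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MergeBaseSketch {
    static final class Node {
        final String id;
        final List<Node> parents;
        Node(String id, Node... parents) { this.id = id; this.parents = List.of(parents); }
    }

    static Set<Node> ancestorsOf(Node head) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>(List.of(head));
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (seen.add(n)) work.addAll(n.parents);
        }
        return seen;
    }

    // Walks back from head b, nearest snapshots first, and returns the first
    // snapshot that is also an ancestor of head a.
    static Node mergeBase(Node a, Node b) {
        Set<Node> ancestorsOfA = ancestorsOf(a);
        Set<Node> visited = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>(List.of(b));
        while (!work.isEmpty()) {
            Node n = work.poll();
            if (!visited.add(n)) continue;
            if (ancestorsOfA.contains(n)) return n;
            work.addAll(n.parents);
        }
        return null;   // completely disjoint histories
    }

    public static void main(String[] args) {
        Node base  = new Node("1");
        Node alice = new Node("2", base);    // Alice's head
        Node bob   = new Node("II", base);   // Bob's head
        System.out.println("merge base: " + mergeBase(alice, bob).id);   // prints 1
    }
}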

Merging Branches

It seems that the ability to create branches quickly and to postpone conflict resolution, which is inherited from distributed version control systems, leads to many emerging branches, which are hard to merge. But practice reports from projects using distributed version control systems prove the opposite [BRB+09, Loe09b]. Branches are frequently created for experimental development. Just after a few snapshots they are discontinued or merged into a main branch. It is easy to start a branch from any snapshot version, so a branch is merged back when a single task is finished. Most of those branches are shared with other developers when they are already merged back into the main development line. However, in distributed version control systems branches have to be merged multiple times, once for each developer pair, when they exchange their repositories. Moreover, concurrent work might remain unnoticed. The same features or bug fixes might be implemented with slightly differing implementations. Resolving conflicts, or even worse noticing duplicated functionality, is challenging under those circumstances. In PlatinVC branches can be created as easily as in distributed version control systems and merged only a single time, similar to centralized version control systems. We will see later that not all existing snapshots can always be retrieved: due to network problems some snapshots in a branch might be missing. This can affect the latest snapshots only, as PlatinVC provides causal consistency. All base versions of the latest retrieved snapshot are retrieved as well. If a branch is merged on this basis, only the missing snapshots have to be merged once they are retrieved. Merging those remaining snapshots is easier, as all previous snapshots are already merged and conflicts among them are resolved. Effectively, merging the missing snapshots is equivalent to merging a complete branch with the same snapshots.

7.3.2 Working Offline

Even if a user has no network connectivity, he can use PlatinVC to track his changes. This is especially helpful for nomadic developers, who are able to work while traveling. As with any dVCS, local changes can be tracked by committing them locally only, to the local module. A global push can be executed while offline, which marks the locally recorded snapshots to be pushed automatically once network connectivity is available. The push operation actually pushes the recorded snapshots from the local module to the peer-to-peer module. As soon as the peer rejoins the overlay network, its peer-to-peer module is synchronized with the peer-to-peer modules of other peers (i.e., the virtual global repository), which shares the offline-created snapshots automatically. In this process, already existing snapshots are not transferred again. Offline-created snapshots could already be shared in the overlay network by a different developer, with whom these snapshots were exchanged before, e.g., using the underlying dVCS's capabilities (like copying patches over USB flash memory drives). This other developer may have already shared all snapshots with the other peers in the overlay network. If the other developer recorded additional local snapshots, only those snapshots not yet shared are pushed to the virtual global repository. One limitation, however, is that conflicts will only be noticed with a delay, when the snapshots are actually exchanged with other machines. As detailed in Section 7.2.3, conflicts do not hinder the application of a snapshot but form an unnamed branch instead. This will be noticed with a delay, at the time the network partitions to which the contradicting snapshots were applied rejoin. It is sufficient to merge only the latest snapshots of the diverging branches.
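The offline behaviour of the push operation can be sketched as a simple queue of marked snapshots that is flushed once the overlay network becomes reachable again. All names are illustrative, and the virtual global repository is reduced to a set of snapshot identifiers.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OfflinePushSketch {
    private final Deque<String> markedForPush = new ArrayDeque<>();
    private final Set<String> globallyKnown = new HashSet<>();   // stand-in for the virtual global repository
    private boolean online = false;

    void pushGlobally(List<String> localSnapshots) {
        markedForPush.addAll(localSnapshots);   // always record the intent to push
        if (online) flush();
    }

    void rejoinOverlay() {                      // called when network connectivity returns
        online = true;
        flush();
    }

    private void flush() {
        while (!markedForPush.isEmpty()) {
            String id = markedForPush.poll();
            if (globallyKnown.add(id)) {        // snapshots already shared by others are not sent again
                System.out.println("sending snapshot " + id + " to its maintaining peers");
            }
        }
    }

    public static void main(String[] args) {
        OfflinePushSketch peer = new OfflinePushSketch();
        peer.pushGlobally(List.of("s1", "s2"));  // offline: nothing is sent yet
        peer.rejoinOverlay();                    // now s1 and s2 are shared automatically
    }
}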

7.3.3 Interoperability

PlatinVC inherits the ability to interoperate with other version control systems. The history in the local module can be exchanged with a number of other systems, in any combination. As explained in Chapter 8, the local module is a full-blown dVCS. In Section 5.2 we mentioned the implementation of several adapters, which can exchange snapshots with other systems, either continuously through synchronization or in a sequential batch process. Nearly all dVCSs are able to synchronize their changes in an interoperable manner. The most popular cVCSs can also be connected. However, as their history can store just a subset of the history in a local module of PlatinVC, they are not identical copies. Depending on the adapter and workflow used, unnamed branches are, for example, either ignored or mapped to named branches. Once the local module is synchronized with an external VCS, the introduced snapshots can be pushed to the corresponding global module, making them accessible to the other participants.

7.3.4 Offers all dVCS Operations

As detailed in Chapter 8, the local module of PlatinVC is a dVCS repository, which is accessible by the user. We utilize this dVCS for basic VCS operations. This enables the user to use the underlying dVCS with all its operations, such as bisect and similar helpful features. The local module can even be exchanged with other developers using just the dVCS. However, using the operations provided by PlatinVC is advised, as they take care of with whom to share and from whom to get new snapshots.

Changing parts of the history which were already shared with the virtual global repository in PlatinVC should be avoided. This includes rebasing the history or removing snapshots. Doing so would create new snapshots, although only their identifier is changed and the content (the diff) remains the same. The identifier changes because the moved snapshot is based on another snapshot, whose ID is included in the moved snapshot. As explained before, a snapshot's ID is based on its content, so changing a base snapshot changes the snapshot's ID. If the history in the local module is changed after it was pushed in PlatinVC, and is pushed again, the resulting history in the global module would combine the moved snapshots in their old and new locations, certainly confusing all participants. Even in a pure dVCS this behavior should be avoided, as the result would be the same (if we substitute the virtual global repository of PlatinVC with another user's dVCS repository).

7.3.5 Backing up Artifacts Redundantly

Multiple copies of the repository are stored by multiple machines, so the risk of losing any stored artifact is very limited. A user stores all of his changes in a local repository on his machine. If this storage were to be corrupted, the user could restore the complete history of all artifacts he created by retrieving them from the system. There are a minimum of x copies of any version of any file, where x is the number of replicating peers plus 1 for the maintaining peer. The replication factor x is configurable. A higher number ensures more failure resistance and availability, whereas a low number reduces the time for an update. The value of x is at least 2, but typically 4. The replicas are randomly chosen (more precisely: they are chosen by their identifier, which is randomly assigned). Thus it is unlikely that they are part of the same subnetwork. But even in the unlikely case that all replicating peers and the maintaining peer fail before they can copy the latest updates to a substituting machine, there is a chance that all versions are still retrievable. Whenever artifacts are committed in a snapshot, all maintainers (and their replicating peers) store the complete history of all artifacts in the snapshot, regardless of whether they are responsible for them. The history of all artifacts in a snapshot contains all snapshots the current snapshot is based on, and also their base snapshots, which might additionally contain versions of artifacts other than the ones included in the latest snapshot. If all of these copies are unavailable, any peer who updated its local repository can provide some versions as well. When failing peers rejoin, they offer formerly stored versions again. Due to the highly redundant storage of committed versions in the system, the total loss of an artifact is very unlikely. Every storing peer would have to lose his local storage permanently. The older a specific version is, the more likely it is that many peers have stored it. The more artifacts from different folders are modified by a single developer in a snapshot, the more maintainers store that snapshot. Only very recently committed snapshots, or snapshots consisting of artifacts developed in isolation (changed by developers who did not change artifacts from other folders afterwards), are in danger of being lost - and this only occurs if their maintaining peer, along with his replicating peers and the authors of the endangered snapshots, fail simultaneously, without enough time to update a substituting peer. And only if the physical storage of these machines is destroyed are the snapshots lost permanently.
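The numeric-closeness rule behind the storage of snapshots can be illustrated as follows. In PlatinVC the replicas are the overlay neighbours of the maintaining peer; the sketch below merely shows how a maintainer and a replica set of size x could be derived from the hash of a folder name, with illustrative names and toy identifiers.

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ReplicaPlacementSketch {
    static BigInteger sha1(String s) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-1").digest(s.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest);
    }

    // Returns the maintainer (closest identifier) followed by x - 1 replicating peers.
    static List<BigInteger> storingPeers(List<BigInteger> peerIds, String folder, int x) throws Exception {
        BigInteger key = sha1(folder);
        List<BigInteger> sorted = new ArrayList<>(peerIds);
        sorted.sort(Comparator.comparing(id -> id.subtract(key).abs()));   // closest identifiers first
        return sorted.subList(0, Math.min(x, sorted.size()));
    }

    public static void main(String[] args) throws Exception {
        List<BigInteger> peers = List.of(
                BigInteger.valueOf(12), BigInteger.valueOf(900),
                BigInteger.valueOf(500000), new BigInteger("123456789123456789"));
        // With the typical replication factor x = 4, all four toy peers store the folder.
        System.out.println(storingPeers(peers, "project/src", 4));
    }
}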

7.3.6 Degree of Consistency

As detailed in Section 7.4, PlatinVC offers a number of different ways to obtain updates from the virtual global repository. In Section 4.4.2, different degrees of consistency were discussed. With regard to single artifacts, sequential consistency is guaranteed, as all versions are stored on a distinguished machine. In order to retrieve the latest updates of a set of artifacts recorded in a snapshot, a number of machines have to be asked. When the network is undisturbed by failing peers or lost messages, sequential consistency is achieved. However, in a realistic network these factors cannot be ignored; therefore, in the worst case only causal consistency is guaranteed, as the latest snapshots stored on some unavailable machines cannot be retrieved. If a snapshot is retrieved, all base snapshots are retrieved as well, as they are stored on the same machine the snapshot has been retrieved from.

7.3.7 Support for Traceability Links

Cross References

Links can be created between any items, be it an artifact, folder or module, in any version. A link itself is a version-controlled item. A link is bidirectional and connects two artifacts or folders in a specific version in a specific branch. It can store any metadata about the connection of the linked items, such as integrity constraints, which can be evaluated to prove whether a connection is justified. This validation, however, is not handled by PlatinVC. PlatinVC does not control how links are interpreted; it merely tracks their concurrent evolution. This enables different, mostly locally operating tools to use and manipulate this information for multiple purposes. The links can be used to store dependency relations among the items, which enables configuration management to choose specific versions. For example, a project could link to another project, indicating the minimal needed version it depends on. Individual artifacts can be linked to store traceability [WN94] information. Hereby it is possible to connect artifacts which are created during a development process. E.g., a requirements document could be linked to a design document, which is linked to the implementing source code. This source code could be linked to a test case, storing the result of the test in the link as well. If the budget for a project is short, only critical requirements could be tested by tracing the connection from the requirements document to the test case. By storing integrity constraints, the connection between two items can be validated. Updates to the content of linked artifacts could violate these constraints, which could be marked in an update on the link. A supporting tool could enforce further changes to either file to repair the broken connection. Once a link is created it cannot point to a different item. If a link should point to a different target, the old link document is marked as deleted and a new document is created, which connects the new items. Similar to a file rename, explained in Section 5.2.4, changes to the old document follow the new document.
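A link document can be thought of as a small, versioned record, as sketched below. The field names are illustrative; PlatinVC treats such a document like any other artifact, so its history is tracked by the same mechanisms.

import java.util.Map;

// Illustrative model of a version-controlled, bidirectional link document.
final class LinkDocument {
    final String sourceItem;            // e.g. "doc/requirements.txt"
    final String sourceVersion;         // version of the source item this link points to
    final String targetItem;            // e.g. "src/Parser.java"
    final String targetVersion;         // version of the target item this link points to
    final Map<String, String> metadata; // free-form metadata, e.g. integrity constraints
    final boolean deleted;              // a retargeted link is marked deleted and a new one is created

    LinkDocument(String sourceItem, String sourceVersion, String targetItem, String targetVersion,
                 Map<String, String> metadata, boolean deleted) {
        this.sourceItem = sourceItem;
        this.sourceVersion = sourceVersion;
        this.targetItem = targetItem;
        this.targetVersion = targetVersion;
        this.metadata = Map.copyOf(metadata);
        this.deleted = deleted;
    }

    // Updating a link means committing a new version of the link document that
    // points to more recent versions of the connected items.
    LinkDocument pointTo(String newSourceVersion, String newTargetVersion) {
        return new LinkDocument(sourceItem, newSourceVersion, targetItem, newTargetVersion, metadata, false);
    }
}

class LinkDocumentExample {
    public static void main(String[] args) {
        LinkDocument v1a = new LinkDocument("doc/requirements.txt", "1", "src/Parser.java", "a",
                Map.of("constraint", "section 3 must be covered by a test"), false);
        LinkDocument v2a = v1a.pointTo("2", "a");   // Alice updates artifact "A" and the link
        System.out.println(v2a.sourceVersion + " <-> " + v2a.targetVersion);
    }
}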

Concurrent Updates on Linked Artifacts

Changing the stored information updates a link document to a new version. Pointing to a different version of a linked artifact changes the link document as well. As a convention, an updated link should only point to more recent versions of the linked artifacts. Figure 25a illustrates an exemplary situation.

(a) Files and versions (b) Created history of the link document

Figure 25: Concurrent updates on a link document

The left column of Figure 25a represents five versions of artifact "A", the right column four versions of a linked artifact "B", and the arrows in between represent the versions of the linking document (representing a binary traceability link). The versions of both artifacts are created by different users. Alice works on "A" and Bob on "B". First Alice creates the artifact "A" and links it to version a of artifact "B". She updates "A" to version 2, updates the link to version 2a, and repeats the same process later to create version 3 of artifact "A" and version 3a of the link document. The versions 1a, 2a, and 3a of the link document are represented by the solid arrow lines. Meanwhile Bob updates artifact "B" to the versions b, c, and d. He updates the link to connect version d of artifact "B" and version 2 of artifact "A", which is the latest version he fetched. Alice created version 3 of artifact "A" and updated the link from version 2a to version 3a at the same time as Bob created the versions b, c and d of artifact "B" and updated the link from version 2a to 2d. Hereby a conflict in the traceability link's version history arose. This conflicting history is stored by PlatinVC in an unnamed branch (as illustrated in Figure 25b). Versions 3a and 2d are both based on version 2a. This branch can be merged by pointing to a more recent version of the connected artifacts. Thereby link version 5d can be based on 3a and 2d to merge the diverging history lines. To avoid unnamed branches, updated links should always point to the latest version of the linked artifacts. When the network is separated into partitions (see Section 8.6) this is not always possible immediately.

7.4 services

In this section the services offered to the user by PlatinVC are detailed. Chapter 8 and Chapter 9 explain how they were realized. In addition to PlatinVC's services, all commands of Mercurial (see Section 5.2.4) can be called, which manipulate the local module only. To share the resulting changes with other developers, the following services of PlatinVC are to be used. The module operations connect a module to PlatinVC; the retrieve and share operations exchange recorded snapshots among the users.

7.4.1 Module Management Operations

The module operations manipulate the modules of PlatinVC, namely the local module, the corresponding peer-to-peer module and the respective global module in the virtual global repository (as detailed in Section 7.1). Only the local module is visible to the user; the other modules are created or updated by PlatinVC during the execution of these operations. The following operations are to be used to connect a module to PlatinVC, which will serve as the local module.

O-1: Clone remote module: A module can be accessed by cloning it from the global repository, addressing it with its unique module identifier. The entire module, which contains the complete history of the contained project, is transferred over the network by PlatinVC in this way. Although the module is rarely larger than double the total size of all artifacts in the latest version, it may still be several megabytes. Due to typical network speeds, it may be better to share a module with a new participant by handing him a copy on a data medium such as a USB flash memory drive. Copying data to or from a medium is faster than typical network transfer speeds, and the bandwidth in the network could be put to better use for other tasks. If some participating developers are physically close to each other, obtaining an initial copy of a module via the network connection should be avoided. O-2: Add an existing module: This service adds a module that already exists in the local file system, to be shared with other developers by PlatinVC. Modules can be large, which is why it may be more efficient to copy them by means other than a network connection. The specified module is used as a local module by PlatinVC. It is synchronized with the corresponding global module - if that does not exist yet, it will be created. The peer-to-peer
module, which serves as an intermediate connection between the local and the global module, is created in this process as well. After this operation is finished the added module can be used with PlatinVC.

O-3: Initialize module: A new module is initialized by creating all necessary local and global components (i.e., the local module, peer-to-peer module and the global module in the virtual global repository). The user is warned if a module with the specified name already exists, and is offered to clone the remote module instead.

O-4: Remove module: Removes a local module from the managed modules. The corresponding peer-to-peer module is deleted. The local module itself remains undeleted (the user can erase it, if desired). The corresponding global module in the virtual global repository is not deleted either. As long as any participant has a local copy of this module he might continue to work with it. As a client might not be connected to the network, it is impossible to tell whether the last user having the module removes it. This is not a constraint of the used technique but merely a design decision. If desired, a simple mechanism (see footnote 1) could enable removing a module completely. A minimal, illustrative interface covering the operations O-1 to O-4 is sketched after this list.
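The following interface is the minimal sketch announced above; its names are illustrative and do not correspond to PlatinVC's real API.

import java.nio.file.Path;

// Illustrative interface covering the module management operations O-1 to O-4.
interface ModuleManagement {
    void cloneRemoteModule(String moduleId, Path targetDirectory);   // O-1: fetch a complete module
    void addExistingModule(Path localModule);                        // O-2: register a locally copied module
    void initializeModule(String moduleName, Path localDirectory);   // O-3: create a new module
    void removeModule(String moduleId);                              // O-4: local bookkeeping only
}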

7.4.2 Retrieve Operations

The retrieve operations collect the latest snapshots shared by other users. In most centralized version control systems, updates are retrieved for all artifacts in the module, but from a selected branch only. Updates in a distributed version control system are retrieved in two steps: first all snapshots in all branches are added to the local history, then the user applies a chosen snapshot from one branch to his working copy. In PlatinVC all retrieved snapshots from all branches are applied to the history in the local module. With the commands of the underlying dVCS, a user can subsequently choose which snapshot of which branch he wants to apply to his working copy. A user can optionally choose to update all folders starting from a specified folder, instead of all existing folders in the working copy, as explained in Section 7.3.1. As detailed in Section 8.4, only one peer has to be contacted to check for updated artifacts for each folder. Specifying a folder other than the outermost folder distributes the load away from the outermost folder's maintainer. The following example, illustrated by Figure 26, clarifies which snapshots are retrieved: let us assume our project consists of the artifacts and folders presented in Figure 26a. Each folder has one artifact, represented by a file. The topmost folder of the project is the root folder, which contains the folders A and B. Inside folder B is the folder C. If we track modifications on the files and commit them, seven different snapshots could be created, depending on which files we modified. If we modified the files a from folder A and c from folder C, the resulting snapshot would be ac. Figure 26b shows all possible combinations, ordered by sets. Figure 27 shows what the history could look like when modifications on different files were committed. A box represents a snapshot. A modified file results in a new version, indicated as filename plus a roman number (the sequential version number) in the figure. The superscript number indicates a variant of the version. A snapshot is indicated by the set formed by its contained versions. The initial snapshot contains the first version of the files a, b and c. Based on this snapshot, seven different developers modified a different combination of files. The first developer changed the content of the three files a, b and c, which resulted in the snapshot {aII1; bII1; cII1}. Another developer changed the files a and b, which resulted in the variant snapshot (an unnamed branch) {aII2; bII2}, and so on. Afterwards all snapshots were merged (see footnote 2).

1 (1.) Keep track of users and remove a module if the last user executes the remove operation. This would introduce additional messages to keep track of the number of users. (2.) Remove the global module immediately, meaning deleting all local and peer-to-peer modules on all storing peers once they are online. This requires a trusted environment to work, where a remotely executed remove is locally enforced, and it should not be available to everyone. 2 For simplicity this is presented as if it occurred in one step. In PlatinVC only two snapshots can be merged at a time, which would require several intermediate snapshots.

(In Figure 26b, set α groups the snapshots that modify file a in folder A, set β those that modify file b in folder B, and set γ those that modify file c in folder C; snapshots that modify several files lie in the corresponding intersections.)

(a) File hierarchy (b) Sets of Snapshots

Figure 26: Example to clarify which snapshots are pulled.

(The figure shows the initial snapshot {aI;bI;cI}, followed by the concurrent snapshots {aII1;bII1;cII1}, {aII2;bII2}, {bII3;cII2}, {aII3;cII3}, {aII4}, {bII4} and {cII4}, which are merged into {aIII;bIII;cIII}, {bIII;cIII} or {bIII}, followed by {aIV1} and {cIV} or {aIV2;cIV}.)

Figure 27: Exemplary history (simplified)

In the merged snapshot either all files were changed, leading to new versions and forming snapshot {aIII; bIII; cIII}, the files b and c were changed (forming {bIII; cIII}), or only the file b was changed (resulting in {bIII}). We introduced three alternatives here, which are used in the example. Based on this snapshot the developers changed the file a, thus creating snapshot {aIV1}, the file c ({cIV}), or the files a and c ({aIV2; cIV}). We assume that a further developer has only the initial snapshot. He has several options to update his local repository. He executes a pull using a specified folder. He works on a part of the project which is represented by the files stored in folder B or any subfolder of B, namely the files b and c. He is only interested in modifications made while working on this part of the project. If he pulls the latest snapshots specifying folder B, he will get all snapshots that are in the union of the sets β and γ, as presented in Figure 26b. That would retrieve all snapshots presented in Figure 27 except the snapshot {aIV1}, as the latter is in the disjunct part of set α. All snapshots in the set β (and its intersections with the sets α and γ) are guaranteed to be retrieved. The snapshots {cIV} and {aIV2; cIV}, which are in the disjunct part of set γ, could also be missing, depending on message loss in the network. Only if folder C was specified in the pull operation would the latter snapshots be guaranteed to be retrieved as well. N.B. All snapshots leading to the snapshots {aIII; bIII; cIII}, {bIII; cIII} or {bIII} are guaranteed to be retrieved as well, although some are not in the union of the sets β and γ (like {aII4}). As detailed in Chapter 8, the repository is distributed by folders instead of snapshots. This decision is motivated by the fact that software projects are organized in a hierarchical folder structure. In a good design, different aspects of the software are separated into different folders. It seems more natural to work in different folders instead of different (named) branches, especially when updates can be received in isolation, as presented in Section 7.3.1. PlatinVC was built to reflect this point of view, as detailed in Chapter 8.

O-5: Pull globally using specified folders: Gets all locally missing snapshots for a folder and its subfolders, which corresponds to a branch of the tree representing the folder hierarchy of the project. As illustrated by the previous example and explained using Figure 26b, all snapshots in the disjunct part of set α would not be retrieved. The snapshots in the disjunct part of set γ can be missing if messages are lost, and the snapshots in set β and its intersections with the other sets are guaranteed to be retrieved. All former snapshots that the latter are based on are also retrieved, if not already present.

O-6: Pull globally: Acquires locally absent snapshots of all artifacts and applies them to the local module. This is equivalent to executing operation O-5 specifying the outermost folder of the project. As explained before, not all snapshots are guaranteed to be retrieved. If all existing snapshots must be retrieved, a variant of this operation could be called, which executes operation O-5 specifying all existing folders. This operation would involve many machines and be inefficient, and should only be used in situations where the operation duration is less important than the completeness of the retrieved snapshots. Packing a software project to form a release could be such a situation.

O-7: Get Artifact: This operation retrieves the latest version of a single artifact. Unlike the other operations, the history is not updated and the file is transferred in its full form. This is useful if a user does not want to use the version control system, but needs to retrieve the stored artifacts. This is the case in a wiki engine, when articles are accessed for reading.

7.4.3 Share Operations

Committing changes is a twofold process: first, changes are committed locally, which records the modifications made to all files in the working copy in a snapshot. Subsequently, snapshots can be pushed globally, which publishes all unpublished snapshots so that they are accessible by all participants. By splitting the commit operation into two steps, the commit frequency recommended in Section 4.2 can be implemented: users can commit any changes in short intervals locally only, benefitting from fine-grained control over the evolution of the project. Thus small changes can be undone and variant development tried out, without concern about destabilizing the project, i.e., temporarily breaking the ability to compile a software project. When the project is stable again and does not hinder coworkers in working on their changes, all locally committed snapshots are pushed globally in one step. Conflicts are detected and the resolution mechanism takes over. As detailed in Section 7.3.2, all formerly unexchanged snapshots are pushed as soon as a network connection is available.

O-8: Commit: Commits changes to the local module. Each artifact that had its content modified after the last commit is recorded as a new version. All versions formed are part of a newly created snapshot. This snapshot is not shared until push is initiated. This operation calls the underlying dVCS’s commit command.

O-9: Push globally: All snapshots not yet pushed are applied to the virtual global repository. After the operation finishes, any participant can retrieve those shared snapshots.

O-10: Commit globally: Commit globally mimics the committing operation of cVCSs and is realized by committing changes locally and subsequently pushing them in one step.

7.5 summary

This chapter presented an overview of the architecture of PlatinVC. The system works without relying on any single participant, as is usually the case in dVCSs, while offering the centralized, up-to-date view of a cVCS. It combines the advantages of these systems without inheriting their shortcomings. PlatinVC combines the concepts of centralized and distributed version control to enable a novel workflow, not possible in either system. The locally operating features of a dVCS can be fully utilized by a user. PlatinVC inherits the ability to work without a network connection, automatic branches and interoperability with selected version control systems, while providing the ability to retrieve any shared version with the global view of a cVCS. PlatinVC extends these basic features with mechanisms that manage the evolution of traceability links and offer a novel approach to separate the work of developers. Besides being used for the traceability of artifacts, the links between files, folders and modules, whose evolutionary changes are tracked, enable a basic configuration management on top of snapshot-based version control. The automatic isolation of concurrent work enables a novel workflow, where users specify the artifacts they are interested in and related updates on other artifacts are retrieved as well (unlike the selective updates in cVCSs). Branches are created automatically when conflicts arise, and do not have to be created in advance. PlatinVC provides a solution to the problem of finding the right frequency to share changes by providing personal commits, which are shared globally with all other participants once a developer finishes his task.

DESIGN OF PLATINVC 8

The final design of PlatinVC presented in this work is the result of five development iterations (compare to Appendix A).

8.1 design principles

Distributed programs have a higher complexity than their single machine counterparts. They run in a less predictable environment, which makes development and testing harder ([TS06, CDK05]). PlatinVC was developed based on the following design principles. They are derived from general software design patterns [GHJV95], the area of distributed systems ([CDK05]), design principles typical for peer-to-peer systems [AAG+05] and our own experience gathered while refining the several versions of PlatinVC, which resulted in the presented prototype. D-1: Reuse existing technology: In order to increase the maturity of the prototype, existing solutions should be integrated whenever possible. Software developed in an educational institution often lacks the maturity that commercial or even open source projects have. Normally, less manpower and time is spent in the development process and the software is less intensively tested. Successful projects are at least partially developed as open source projects over multiple generations of doctoral students. Therefore, wherever possible, software which has proved to be mature should be reused - preferably without changing the software, or by participating in its development process, so that new releases can be integrated with minimal adaptation of one's own software. D-2: Do not rely on the availability of any machine: A peer-to-peer network typically consists of unreliable machines. Any user can leave the network at any time. One reason for this behavior is that a client's machine is typically not as fail-safe as a modern server system, thus it can fail even when unintended by the user. If a particular machine were essential to the system, its user would not be free to turn it off at any time. Therefore a failing machine should be considered the normal case rather than an exception (compare to [AH02]). D-3: Do not rely on any message transfer: A peer-to-peer overlay network uses TCP/IP or UDP/IP messages, which are sent over unreliable networks such as the Internet. Therefore message loss has to be expected frequently and countered with a minimal handshake protocol, where the receiver acknowledges a message, which is otherwise resent after a timeout (a minimal sketch of such a handshake is given after this list). To comply with design principle D-4: Avoid transactions, this handshake should be avoided wherever possible (e.g. if receiving the message is not crucial for the system). D-4: Avoid transactions: [FLP85] pointed out that it requires additional synchronization among distributed machines to agree on a single value, i.e., on whether a transaction can be completed or not. The unreliable nature of a peer-to-peer network, where any machine can fail or its messages can be lost, intensifies this problem even more. Basically, messages and decisions have to be acknowledged among the deciding peers, failing peers have to be replaced in this process, and timeouts have to abort otherwise non-terminating processes. Often, otherwise acceptable transactions are aborted as a result of network disturbances. The Paxos algorithm, presented in [Lam01] and [Lam98], promises to be a practical solution. It was altered to the Paxos commit protocol in [GL02] and adapted to a peer-to-peer network by [MH07] and [SSR08]. We compared these solutions with our own approach in [Voc09], which proved to be superior for transactions with approximately ten items by significantly reducing the time required to complete a transaction.


Nevertheless, all solutions bring an unavoidable overhead in the number of messages and in the operation time of the commit protocol. With rising network instability, for example caused by leaving and joining peers, a transaction protocol is more likely to abort and has to be repeated, multiplying the overall costs. It is best to avoid any kind of transaction whenever possible.

D-5: Avoid storing statuses: A proven mechanism to provide availability in an unreliable peer-to-peer network is to replicate all stored information. The routing mechanism in a structured peer-to-peer overlay network takes care to deliver a message to the next closest peer once the original recipient fails. A number of neighbors of any peer, i.e., the closest peers according to their identifiers, replicate all information needed to substitute that peer in case it leaves the network. Therefore all information must be copied to the neighbors in a reliable manner as soon as it is created or stored on the maintaining peer. A message has to be acknowledged with an answer or resent after a timeout. Again, this mechanism reduces the system's performance, as more messages are created and the time needed to store information reliably on a peer is prolonged by the time needed to replicate it successfully on the peer's neighbors. Replication should be kept to a minimum by avoiding the creation of permanent information in the first place. Whenever possible, storing a status should be avoided.

8.2 design overview

To be compliant with requirement R-14: Availability, PlatinVC is implemented in Java [Jav], enabling the system to run on multiple platforms. PlatinVC consists of three main components, which are shown in Figure 28.

[Figure 28 shows two peers (Alice's and Bob's), each consisting of Local Version Control Mechanisms, Global Version Control Mechanisms, a Component Intercommunication & Lifecycle Management component and a Communication Network Layer, connected by «use» dependencies.]
Figure 28: High level architecture of PlatinVC

These main components implement the mechanisms that enable global version control. These mechanisms work as distributed algorithms, which are executed by a number of peers. The services of PlatinVC, which are described in Section 7.4, are triggered by a user. Additionally, some mechanisms react to failures in the system and repair faulty behavior. These global version control mechanisms communicate among the participating machines using the communication network layer. This component realizes a structured peer-to-peer overlay network, which routes messages using a virtual peer identifier and maintains the overlay network structure by replacing failing peers and incorporating joining peers.

The communication between the components is handled by the component intercommunication and lifecycle management component, depicted on the right side of Figure 28. Using well-defined interfaces, components can use each other's services. A component could be exchanged with an updated version if needed. A user commits changes to an artifact in her working copy using the local version control mechanisms. She shares the recorded snapshots with other users by executing the push operation of the global version control mechanisms. Likewise, the latest versions of other users are pulled using our global version control mechanisms and applied to the working directory by calling the update operation, which is part of the local version control mechanisms. For all core version control functionalities PlatinVC needs to perform, such as calculating a missing delta that needs to be transferred, the services offered by the local version control mechanisms (provided by Mercurial) are used. The reused technologies are briefly described in the next section, while the global version control mechanisms are detailed in the remainder of this chapter.

8.3 choice of the supporting systems

Following design principle D-1: Reuse existing technology, components which have proven to be reliable in various open source and industrial projects were utilized whenever possible. To the author’s best knowledge each component represents one of the best solutions in its area.

8.3.1 Component Intercommunication & Lifecycle Management

To manage the communication among the components of PlatinVC an implementation of OSGi [All07] was chosen. Basically this was needed to integrate PlatinVC with other tools which use the same peer-to-peer communication instance. Using this technique Askme (our communication application, presented in Section 9.3.4) has been integrated easily, as detailed in Chapter 9. OSGi is a framework standard used in embedded systems, where a small memory footprint and efficient, often real-time critical execution is important. We used the most popular implementation, Equinox [equ], which has been the core component of the Eclipse integrated development environment [Fou] since 2003. Equinox is, like Eclipse, an open source project, which is implemented by full-time developers from global companies like IBM. Equinox can handle a component's lifecycle, uninstalling old components and starting their updates without needing to shut down the machine the software is running on (as demanded by requirement R-20: Continuous expandability). The main benefit gained by using the Equinox OSGi framework is that PlatinVC can be combined with the widely used integrated development environment Eclipse, as well as run as a stand-alone application, using the same core logic with different user interfaces, as detailed in Chapter 9.
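To illustrate how a component can be managed by the OSGi framework, the following minimal Java sketch registers a hypothetical global version control service in a bundle activator; the interface and class names are illustrative and not taken from PlatinVC's code base.

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;

// Illustrative service interface; PlatinVC's real interfaces are not shown here.
interface GlobalVersionControl {
    void pushGlobally(String module);
    void pullGlobally(String module);
}

public class GlobalVcActivator implements BundleActivator {

    private ServiceRegistration<GlobalVersionControl> registration;

    @Override
    public void start(BundleContext context) {
        // Register the global version control mechanisms as an OSGi service so that
        // other bundles (e.g. an Eclipse front end) can obtain it via the service registry.
        GlobalVersionControl service = new GlobalVersionControl() {
            @Override public void pushGlobally(String module) { /* push snapshots to the maintainers */ }
            @Override public void pullGlobally(String module) { /* pull missing snapshots */ }
        };
        registration = context.registerService(GlobalVersionControl.class, service, null);
    }

    @Override
    public void stop(BundleContext context) {
        // Equinox invokes stop() before updating or uninstalling the bundle, so the
        // component can be exchanged without shutting down the running peer.
        registration.unregister();
    }
}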

Alternative Choices
If the ability to integrate PlatinVC with other OSGi based tools is not needed, and requirement R-20 can be left unsatisfied, any programming language's capabilities to separate a program into components that interact using well-defined interfaces would have been sufficient. Otherwise, there is no alternative as potent as an OSGi implementation. Among the few Java based implementations Equinox is the most extensively tested, as Eclipse is one of the most widely used tools in software development.

8.3.2 Communication network layer

As concluded in Chapter 6 an implementation of a structured peer-to-peer overlay network should be exploited. Pastry [RD01b] is one of the best performing peer-to-peer protocols: it is

robust against high churn [RGRK04] and locates any peer in O(logN) routing hops, where N is the number of peers in the network. The most promising Java based implementation is FreePastry [FP], which was designed by the original authors of Pastry [RD01b] in 2002 and is still under development. It is an open source project licensed under the very liberal BSD license. FreePastry is used successfully by numerous projects1.

Alternative Choices
There are some peer-to-peer protocols, such as Kademlia [MM02], which show a better performance with regard to message transfer times and robustness. However, at the time of writing there is no mature Java based implementation. Some of the developers of FreePastry started another Pastry implementation in 2003, which proved to be more reliable under heavy churn. This implementation, MSPastry [CCR04], is developed by Microsoft as a closed source project. The project, however, is available to educational institutes. Its user base is therefore smaller than FreePastry's user base. MSPastry is implemented in C#. The restrictive license agreement and the less widely available documentation render this alternative unattractive despite its appealing properties.

8.3.3 Local version control mechanisms

To implement the local commit and handle other local version control tasks (as required by requirement R-1: Support common version control operations) the version control logic of a distributed version control system seems to be tailor-made for our purposes. We chose to reuse an existing application: the distributed version control system (dVCS) Mercurial (see Section 5.2.4 for a detailed presentation). A dVCS brings the version control logic of a version control server application to a single user's computer in a lightweight and efficient way. Most operations are in fact faster than in a cVCS, because data is transferred within a computer instead of over a slow network channel. It fulfills numerous requirements, such as requirement R-8: Offline version control, and it is fast in manipulating a module's history. Mercurial runs on any platform which can execute Python scripts. For performance reasons a very small number of helper components are written in C, but they are compiled for a majority of systems.
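As a rough illustration of how a Java component might delegate local version control tasks to Mercurial, the following sketch simply shells out to the hg command line; the class, the chosen operations and the assumption that 'hg' is installed and on the PATH are illustrative simplifications, not PlatinVC's actual integration.

import java.io.File;
import java.io.IOException;

// Minimal sketch: drive Mercurial's command line from Java for local version control.
// Error handling is reduced to the essentials.
public class LocalVersionControl {

    private final File localModule;   // directory containing the working copy and .hg

    public LocalVersionControl(File localModule) {
        this.localModule = localModule;
    }

    /** Records a local snapshot, roughly corresponding to a local commit in PlatinVC. */
    public void commit(String message) throws IOException, InterruptedException {
        run("hg", "commit", "-m", message);
    }

    /** Applies snapshots from another module, e.g. a hidden peer-to-peer module. */
    public void pullFrom(File otherModule) throws IOException, InterruptedException {
        run("hg", "pull", otherModule.getAbsolutePath());
    }

    private void run(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .directory(localModule)
                .inheritIO()
                .start();
        if (p.waitFor() != 0) {
            throw new IOException("Mercurial command failed: " + String.join(" ", command));
        }
    }
}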

Alternative Choices
The discussion in Section 5.2 concluded that at the time of writing only two dVCS applications are at the same time feature-rich as well as efficient: Mercurial and Git [HT]. Both systems differ only in some minor design details. The main operations, which fulfill all requirements listed under requirement R-1: Support common version control operations with a local scope only, are identical. The internal structure in which snapshots are recorded differs slightly. Git's operations require slightly less time to complete and the repository size is slightly smaller in most cases as well [Git]. As a drawback all versions of a single artifact are stored in an indirectly connected structure: to find the previous version of a specific artifact Git first identifies the associated snapshot and traverses the parent snapshots until a snapshot is found in which the artifact has been changed. Git achieves a small repository size by storing each version as a full version (not using delta compression) and compressing all files subsequently. Thus, a repository which is split into parts stored among multiple peers cannot be compressed as if all parts were present on a single machine. Mercurial uses delta compression to store all versions of a single artifact, which can be better distributed among several peers. However, the main reason against Git is its platform dependence: Git was created for UNIX and Linux based machines and does not run on Windows without additional tools2.

1 Some are listed on http://www.freepastry.org/projects.htm.
2 Namely Cygwin (http://www.cygwin.com).

8.4 global version control mechanisms

In this section we present the general mechanisms to share a developer’s changes using PlatinVC. We first describe how recorded versions are stored among the peers in Section 8.4.1, Section 8.4.2, and Section 8.4.3. In Section 8.4.4 we present the mechanisms which are executed in order to fulfill a user’s requests. How conflicts are handled is detailed in Section 8.4.7. Finally in Section 8.4.8 we present some additional mechanisms that handle renamed, moved, and new artifacts.

8.4.1 Storage Components on each Peer

All data in PlatinVC is stored in a distributed manner among the individual peers. Each peer acts as an active client as well as a passive server. So the actual versions of all artifacts are produced and stored on a peer, and replicated and offered by the same and other peers. The metamodel in Figure 29 shows the structure of all stored data on a peer.

[Figure 29 shows the metamodel of the data stored on a peer: the virtual global repository is composed of the peer-to-peer modules; each Mercurial local module is associated with a working copy, i.e., a folder with the module's content.]
Figure 29: Metamodel of the components on a peer

Component for local Version Control
A user edits artifacts in a working copy, which is represented by files in a folder hierarchy. The working copy is controlled by Mercurial, which is exploited as the local version control mechanism. A user creates a snapshot by executing Mercurial's "hg commit" operation. The resulting history is stored in the local module, which is under the complete control of the user. Therefore all operations of Mercurial or any of its supporting tools can be used without a network connection. Besides committing snapshots locally, which are not yet intended to be seen by others, local temporary clones for experimental changes can be created, or a bug-free revision can be sought in the history with bisect (detailed further in [O'S09]), to name just a few operations. The complete history of a module can be accessed. A user can use any of Mercurial's operations on the local module, and can even share it with other users using any of the workflows presented in Section 4.1. Nevertheless, whenever a local module's history should be shared with coworkers our system takes over. N.B. Mercurial's sharing mechanisms could be invoked by the user as well, but this should be avoided, as it brings the drawback of only being able to share one's changes with a limited number of other developers. PlatinVC automatically chooses the right peers with whom snapshots are to be shared in order to make them publicly available, or in order to obtain the latest updates.

Component for global Version Control
All operations provided by the global version control mechanisms synchronize the local module with the respective peer-to-peer module on a user's machine. The peer-to-peer module represents the partitioned virtual global module, which is retrieved by combining the distributed parts from different peers, as detailed in Section 8.4.2. The virtual global module does not exist on any peer - it is only a conceptual construct, which represents the union of all peer-to-peer modules on all peers. The peer-to-peer module is in fact a second Mercurial module, but it is hidden from the user. Only the operations of PlatinVC should manipulate it. Initially a user clones a module from the global peer-to-peer repository. If the peer-to-peer module already exists it is updated when the user joins: first, operation O-6: Pull globally is executed, which pulls all missing snapshots from a peer in the system which has the latest updates. Afterwards all snapshots that were recorded while the user was offline are pushed to the maintainers of the changed artifacts using operation O-9. Only missing snapshots are transferred. If a user recorded snapshots which he shared with another user, using Mercurial's sharing possibilities, and that other user already integrated those snapshots into the global peer-to-peer repository, the already present snapshots are not transferred again. For each peer-to-peer module each peer stores a history cache. Here the metadata of snapshots is stored. This metadata includes a snapshot's identifier, the identifiers of its parents and the folders whose content was changed in the snapshot. All other information and the actual changes which formed the snapshot are not stored. If the storing peer is responsible only for a folder of a module which is not checked out by the user (and thus the peer does not store a corresponding local module) and which does not contain artifacts, no corresponding peer-to-peer module is stored. The entries in the history cache are identical to those in a corresponding peer-to-peer module (if present), but most of the time there are additional entries. As a side effect of the system's operations (share, retrieval and maintenance), the metadata of the snapshots in the distributed peer-to-peer modules is exchanged. A history cache thus represents the (outdated) history of the virtual global module. In Section 8.4.5 and Section 8.4.6 we will see how the history cache is filled and used.
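A minimal sketch of the metadata such a history cache entry could carry is given below; the field and class names are illustrative assumptions, not PlatinVC's actual data structures.

import java.util.List;

// Sketch of a history cache entry as described above: metadata only, no artifact deltas.
public class HistoryCacheEntry {
    private final String snapshotId;            // hash identifier of the snapshot
    private final List<String> parentIds;       // one parent, or two for a merge snapshot
    private final List<String> changedFolders;  // folders whose content was changed
    // The actual file modifications (deltas) are deliberately NOT stored here.

    public HistoryCacheEntry(String snapshotId, List<String> parentIds, List<String> changedFolders) {
        this.snapshotId = snapshotId;
        this.parentIds = parentIds;
        this.changedFolders = changedFolders;
    }

    public String getSnapshotId() { return snapshotId; }
    public List<String> getParentIds() { return parentIds; }
    public List<String> getChangedFolders() { return changedFolders; }
}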

Storage Space Consumption
After each global pull (see operation O-6) or push (see operation O-9) the local and the peer-to-peer module have the same content. PlatinVC does not require twice the amount of storage space for the local and the peer-to-peer module, as Mercurial handles the modules using lazy copies. The complete version history of a single artifact is stored in a file3. Hard links are used to avoid wasting hard disk space for identical files: whenever a file in a module is an identical copy of a file in another module on the same machine, the actual data is stored only once on the hard disk. Hard links point to this data from within the respective modules. The history files in the local and the peer-to-peer module differ only in two cases: when snapshots are created locally and not yet pushed globally, and when snapshots of other peers are received without being requested by the user of the local peer (in which case his machine is the maintaining peer for the received snapshots). In the first case the repositories have the same content after the snapshots are pushed using operation O-9: Push globally (the pull operation does synchronize the modules); in the second case the snapshots are equal once the user pulls them (which will be very fast, as they are already locally present). After pushing the locally recorded snapshots globally the modules share the same hard disk space again. This behavior is realized by executing the 'hg relink'4 command, which tells Mercurial that two modules are identical, so they can be stored using hard links as is done when a module is cloned. The operation O-6: Pull globally pulls snapshots from other peers into the peer-to-peer module first and into the local module directly afterwards. If the local peer is responsible for tracking the changes of some folders, as detailed in Section 8.4.2, other peers will push their snapshots to the local peer's peer-to-peer module. These snapshots are not applied to the local module, so the updated history files consume additional space until the local user updates her local module by executing operation O-6: Pull globally. In any case the modules can be synchronized by issuing a push and a subsequent pull operation, but the difference might be intended. Normally unpushed changes from the local module should be collected until the latest snapshot does not break the project. Likewise globally pushed changes should not be applied to the local module unnoticed by a user, as this would confuse the local history. If desired, however, changes could be exchanged automatically to mimic a behavior more similar to centralized version control systems with minimal changes to PlatinVC (more about this can be found in Section 11.3).

overview of stored modules
For each project a developer has a separate working copy and an accompanying local module. On her machine exist at least as many peer-to-peer modules as local modules; they consume extra space only for differing files, i.e., when additional versions exist for some artifacts in one of the modules. As explained in the following section her machine might be responsible for any folder in any other module, depending on the assignment according to the identifier space. If that is the case the modules containing the other folders are stored on the developer's machine as well, without an accompanying local module and working copy. If the developer's machine is no longer responsible for maintaining a folder in a project she does not work on, the respective peer-to-peer module could be deleted. To avoid the time and bandwidth costs needed to copy the module again if the machine becomes responsible once more, it is best not to delete these peer-to-peer modules. The more modules (or, more precisely, the more folders) are in the system, and the fewer peers are online, the more likely it is that all peers store all modules. If the modules can be clustered into sets, so that no developer works on modules from different sets, it is better to have separate networks. This avoids developers having to store modules that they do not work on on their machines.

3 As long as the file is not renamed or moved. In this case a new file is created to store all new versions.
4 See http://mercurial.selenic.com/wiki/RelinkExtension

Analysis of Design Alternatives All of the related peer-to-peer approaches examined in Chapter 5 store an artifact’s history in a single module. This saves storage space, but the resulting drawback is that local snapshots from the user are mixed with global snapshots of other users. The two step commit process that fulfills requirement R-1.9: Enable local commits is not possible with a single module. Pastwatch (see Section 5.3.5) separates a user’s local module from a global module, but only to be able to offer requirement R-8: Offline version control. The local module is updated automatically with the latest changes in the global module. In Pastwatch every peer has a local module, but only some have a global module.

8.4.2 Repository distribution and partitioning

The global repository stored in PlatinVC is distributed in a maintainer-based fashion among specific peers, following the mapping rule used for distributed hash tables. Snapshots are sent to a limited number of peers, so sharing can be executed in a short time. In contrast, if the repository were replicated among all peers, the time needed to share a snapshot would be significantly higher. Nevertheless, PlatinVC shares some properties of a replicated repository. Whenever needed a developer retrieves missing snapshots. Often a developer is interested in retrieving all existing snapshots. If all participants do so, the snapshots eventually exist on all peers. In this way updates to a repository are initially shared with only a few peers, so they are retrievable in a short amount of time without being affected by the network's size, but they are eventually distributed to all participants, making them highly available (using owner path caching) and robust against getting lost.

All artifacts that belong to a project are stored inside an outermost folder in a working copy. Typically parts of a project, like packages that implement certain functionalities, are organized in folders. We exploit this folder structure to partition a project among the maintaining peers. The repository is distributed as follows among the peers: in each module an identifier is calculated for each folder by computing the hash value of the folder's name and its path from the uppermost directory in the working copy. The path is a string that consists of a module's name and the names of all folders which have to be traversed to reach the named folder. The hash function used is the same one the peer-to-peer overlay uses to compute a peer's identifier. In the utilized Pastry [RD01b] implementation FreePastry [FP] a hash value is calculated using the SHA-1 algorithm [Bur95], which maps any string to a 160-bit value, represented by a 40-digit hexadecimal number. Mapping a folder's name and path to this number allows us to assign the folders to peers, as the domain of both the folders' and the peers' identifiers is the same. A peer whose identifier is numerically close to a folder's identifier takes the role of the maintainer for all artifacts residing in this folder. The numerical closeness is measured in Pastry by comparing the peer's identifier's digits with the folder's identifier's digits. Among the online peers the one whose identifier shares the longest uninterrupted sequence of digits is responsible for the folder with this identifier. The folders of all modules in the repository are distributed evenly among the peers, because the hash values, calculated from a user's name (for the peer's identifier) and from a folder's name, path and module, are uniformly distributed over the identifier space. A maintainer is responsible for tracking snapshots of modified artifacts which reside in an assigned folder. Whenever a peer executes operation O-6: Pull globally or operation O-9: Push globally a maintainer is contacted. Let us assume that a developer made changes to artifacts in two folders and recorded them in a local snapshot. When she pushes this snapshot globally it is sent to the maintainers of the respective folders. Both maintainers apply the same snapshot to their peer-to-peer module. As detailed in Section 8.4.4 these maintainers store the globally latest versions until snapshots are committed that contain modifications of artifacts in folders that are not maintained by those peers. Which concrete peer stores which data depends on the identities of the peers currently online in the network, as in any DHT based storage. The fewer peers are online, the more likely a peer is responsible for multiple folders. The latest snapshot a maintainer stores is not necessarily the globally latest snapshot that exists. It is assured that the latest versions of the artifacts in the maintained folder are in the maintainer's peer-to-peer module, but more recent snapshots might exist as well. If a maintainer is responsible for other folders, as a maintainer or as a replicating peer as described in the next section, all snapshots of the artifacts in those folders are present as well, which might be more recent. As already described, it is hard to predict which folders a peer will be responsible for, and this assignment changes dynamically, initiated by joining and leaving peers. Thus the minimum guarantee given is that all versions of the artifacts in the maintained folders are locally present.
As detailed further in Section 8.4.4 these additionally stored snapshots are not retrieved. Sending them to a peer which expects updates on artifacts included in a specified folder would only disable the automatic isolation of concurrent work (see Section 7.3.1).
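To make the folder-to-maintainer mapping concrete, the following sketch hashes a module name and a folder path with SHA-1 into the 160-bit identifier space shared with the peers; the exact string layout that is hashed is an assumption chosen for illustration.

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: map a module name and folder path onto the 160-bit Pastry identifier space.
// The concrete string layout ("module:/path") is an assumption for illustration only.
public final class FolderIds {

    public static String folderId(String module, String folderPath) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] hash = sha1.digest((module + ":" + folderPath).getBytes(StandardCharsets.UTF_8));
            // 160-bit value rendered as a 40-digit hexadecimal number,
            // comparable with the peers' identifiers in the overlay.
            return String.format("%040x", new BigInteger(1, hash));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // The maintainer of folder '/impl/utils' of a module named 'example' would be the
        // online peer whose identifier is numerically closest to this value.
        System.out.println(folderId("example", "/impl/utils"));
    }
}

Since folder and peer identifiers share the same domain, the maintainer lookup reduces to routing a message to this key in the overlay.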

Analysis of Design Alternatives
A number of alternatives have been discussed in Section 6.5.3. For all solutions in which the identifier of an artifact's history cannot be computed using local information, such as the artifact's name and path, an additional index list is needed. The major disadvantage of Pastwatch is such an index list, which is stored by one peer and accessed all the time by all other peers. This list contains the names of the members of a project. All snapshots committed by a member can be found on the peer responsible for the hash value computed from the member's name. Peer-to-peer based version control systems which partition the repository similarly to PlatinVC assign the history of single artifacts to specific peers. We decided to distribute the repository in a less fine-grained way, so that the number of maintaining peers is smaller. This results in a smaller number of peers involved in the global push operation, which shows in fewer transferred messages. Assigning an entire module to a single peer is too coarse-grained. The push and pull operations would be more efficient, as only one peer has to be contacted, but this one peer would be contacted by every developer working on the maintained module. Many of the drawbacks of having a centralized solution (see Section 2.4.1) would reappear.

8.4.3 Replication

To counter failing peers all relevant information in the system exists on multiple peers. Following the DHT approach a subset of a peer's neighborhood peers, called the replica set, replicates all relevant data. The replication has to be as reliable as possible. Failing peers are immediately replaced, as elaborated in Section 8.5. A maintainer is concerned with replicating updated data to its replica set. When new data is about to be stored by the maintainer, which is only the case during a global push, it sends the data directly to all replicating peers. Only if all replica peers acknowledge that they stored the data is it treated as applied and the user informed. Different situations, which can occur depending on which peer fails, are analyzed in Section 8.5. The data that needs to be replicated is reduced to a minimum in order to avoid the costly (in transferred messages and needed time) update process described in Section 8.5. The maintained partial repository, consisting of the modules that include the folders a peer is responsible for, is replicated. A peer can be the maintainer for some folders and a replica peer for other folders at the same time. Regardless of why a folder is stored with the latest versions of the included artifacts, they are all stored in the same peer-to-peer module. When a developer asks for the latest snapshot regarding the artifacts in a specified folder, however, no newer snapshot is sent which exclusively includes changes of artifacts outside the specified folder. The repository's history cache, needed by the collaboration procedure detailed in Section 8.4.4, is replicated as well. It is updated whenever a maintained module is updated. Thus it is sent as additional data during the replication of a module's update and does not require additional messages. All status information is derived from a message's type and tracked only by the peer requesting a service. For example, if a peer requests to push changes, a maintainer sends an allowance. The maintaining peer does not remember that it has sent an allowance, nor does it start a timeout to resend it. If the answer is lost, the requesting peer will resend its request. When the requesting peer proceeds with sending new snapshots, the maintainer knows from the message type how the message has to be handled. All messages have a unique session identifier, which is remembered by the requesting peer only. A maintainer sends the session identifier that it took from the message it is answering; it does not remember it from the first message received in the push protocol. If a request is resent and the requesting peer receives two answers (e.g. the first answer arrived delayed, after the request was resent), the requesting peer can recognize the duplicated message by its session identifier. As already mentioned, only the requesting peer remembers the phase of a push or pull protocol. The contacted peers only react to the received messages. Only the requesting peer waits for timeouts and resends its request in case a message is lost or the receiver failed. If the requesting peer fails it is not necessary (or possible) to complete the protocol, thus the status of a request does not have to be replicated.
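The maintainer-driven replication step described above can be sketched as follows; the Replica interface, the synchronous acknowledgement call and the timeout value are simplifying assumptions rather than PlatinVC's actual interfaces.

import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the maintainer-side replication step: the push is treated as applied
// only after every peer in the replica set has acknowledged the new data.
public class ReplicationStep {

    interface Replica {
        /** Sends the data directly to the replica and waits for its acknowledgement. */
        void sendAndAwaitAck(SnapshotBundle data, long timeout, TimeUnit unit) throws TimeoutException;
    }

    static final class SnapshotBundle { /* snapshots plus the updated history cache */ }

    /**
     * Returns only after every replica acknowledged the data; only then may the
     * maintainer report the push as applied to the requesting peer.
     */
    public void replicate(SnapshotBundle data, List<Replica> replicaSet) throws TimeoutException {
        for (Replica replica : replicaSet) {
            // The history cache travels with the module update, so no extra messages are needed.
            replica.sendAndAwaitAck(data, 30, TimeUnit.SECONDS);
        }
    }
}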

Analysis of Design Alternatives
In some systems like Scalaris [SSR08, MH07] the peer who pushes snapshots updates the replicas as well. While the routing takes care of finding the maintainer of a folder given the folder's identifier, the identifiers of the replicating peers have to be requested first. By the

time an answer is retrieved they could be replaced already. Only the maintainer is updated with changes among the replica peers through the maintenance mechanisms. This alternative would be more error prone and would not speed up the push operation, as the identities of the replicating peers would have to be requested first.

8.4.4 Collaboration

The share and retrieve operations listed in Section 7.4 differ from each other in the degree of decentralization. When issuing the update operation a user wants to obtain global knowledge, i.e. the latest snapshot among all participants. When a user shares a snapshot he is not very interested in whether other users receive it. Likewise a user is not interested in all shared snapshots, but usually only in the latest ones. It is often only important to know if another user modified the same artifacts, which could lead to a conflict. Therefore the push operation needs only this minimal global knowledge, which is needed by the respective authors only, rather than globally by all participants.

Global Knowledge
The biggest difference between centralized and decentralized solutions is the degree of global knowledge they have. Because all information is gathered in a central place, a centralized solution has complete global knowledge. The knowledge in a decentralized solution has to be aggregated with specific mechanisms and is partitioned among the participants. An example of a system which has complete global knowledge is a mainframe system [WB96], where users operate through terminal programs which run on a central instance. In a client-server system a user has to first submit his local information so that it becomes part of the global knowledge collected on the central server. Thus the global knowledge is not as complete as in a mainframe system. There are mechanisms for peer-to-peer systems which gather and update the distributed knowledge, like the update mechanism presented in this work. They are always designed for a specific application. One example of a system with much less global knowledge is any distributed version control system presented in Section 5.2. Here a user has to actively exchange his information with other users on a one-to-one basis. Eventually the knowledge spreads to all participants, but it might be outdated before doing so.

This consideration resulted in two different mechanisms for sharing and retrieving snapshots. During the several development iterations of PlatinVC it turned out that optimizing one mechanism immediately worsens the performance of the other. The presented solution represents the best trade-off, favoring the retrieval mechanisms, as they are typically executed more frequently. The following sections describe the operations which are executed in a distributed fashion among several peers. They are explained using UML 2.2 sequence and activity diagrams [UML09]. Prior to the elaboration of the retrieve and share operations an example is given.

Send Message Reliable
Before we elaborate the retrieve and sharing protocols we first take a look at how messages are sent reliably using the unreliable peer-to-peer communication PlatinVC is based on. The presented protocols implement design principles D-3: Do not rely on any message transfer and D-2: Do not rely on the availability of any machine. A message can be sent in two ways: if the physical network address of a peer is known, the message can be sent directly to that peer. Otherwise it has to be routed in the overlay network by a group of peers. We first observe this mechanism for a directly sent message.

send a message directly
In any network messages can get lost. The loss rate depends on the physical network the peer-to-peer overlay network operates on. PlatinVC can be deployed on different networks, e.g., wide area networks (WAN) like the Internet or wireless local area networks (WLAN). Several situations can be observed during a message transfer, depicted by the alternative sequences in the sequence diagram in Figure 30. Additionally, this sequence diagram shows the basic collaboration mechanism in PlatinVC: a requester triggers the execution of a mechanism on the receiver's side and receives a computed result as a reply.

Figure 30: Sequence diagram showing basic message exchange using the physical transport layer (alternatives: sender fails, sender's message lost, receiver fails, receiver's message lost, all ok)
The only case in which the send process is successful is displayed in the last slot of the alternative fragment, which is annotated with All ok. The sender sends his initial request and starts a timeout. The receiver reacts upon receiving the message by executing some action, which is referred to by the activity ADdoAction. The result of this activity is returned to the sender in form of a reply message. If this message is received before the time elapses, the exchange was successful. In all other cases some messages are not received. The sending peer can fail before it can receive a reply, the sent message can be lost and never reach its destination, the receiving peer can fail before it can send a reply, or the reply message can be lost. In the first case the sender is expected to send her initial message again, once her machine is ready. Depending on the concrete action the reply will be computed again or just resent. All other cases are not distinguishable from the perspective of the sending peer. In all those cases faulty behavior is surmised after a timeout has passed (everything might even be ok, if the timeout value is set too low). If messages were lost, resending the initial request could bring success. If the receiving machine failed, on the other hand, no resent messages will be received. Therefore the number

of retries should be limited. The sequence diagram in Figure 31 presents our implementation. If no reply is received, the initial message is resent retries times, which is a configurable parameter, set to 3 in our prototype. If no answer is received after the last retry, an exception is thrown. This ultimately informs the sender to retry the operation at a later time, when the receiver might have recovered.

Figure 31: Messages are resent if no reply is received, as shown in this sequence diagram
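A minimal sketch of this retry loop is given below; the Transport interface, the blocking call and the timeout value are placeholders, and the real implementation works asynchronously on top of the overlay messages.

import java.util.concurrent.TimeoutException;

// Sketch of the 'send reliable' protocol from Figure 31: resend the request up to a
// configurable number of times and give up with an exception afterwards.
public class ReliableSender {

    interface Transport {
        /** Sends the request directly and blocks until a reply arrives or the timeout elapses. */
        Reply sendAndWait(Request request, long timeoutMillis) throws TimeoutException;
    }

    static class Request { }
    static class Reply { }
    static class NoReplyReceivedException extends Exception { }

    private static final int RETRIES = 3;            // configurable, 3 in the prototype
    private static final long TIMEOUT_MILLIS = 5_000; // illustrative value

    private final Transport transport;

    public ReliableSender(Transport transport) {
        this.transport = transport;
    }

    public Reply sendReliable(Request request) throws NoReplyReceivedException {
        for (int attempt = 1; attempt <= RETRIES; attempt++) {
            try {
                return transport.sendAndWait(request, TIMEOUT_MILLIS);
            } catch (TimeoutException lostOrReceiverFailed) {
                // Message loss and a failed receiver are indistinguishable here; just retry.
            }
        }
        // The caller is informed so the user can retry the operation later.
        throw new NoReplyReceivedException();
    }
}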

route a message reliably
If the sender did not contact the receiver for a long time it might not have the receiver's physical network address. A message then needs to be routed to the receiver first, using the routing mechanism of the peer-to-peer overlay. Figure 32 describes how we use this mechanism.


Figure 32: If the direct address of a peer is unknown messages are routed as described by the depicted sequence diagram

A request is sent with a peer-to-peer identifier as its destination. This identifier can either identify a peer or a resource offered by that peer. The DHT mechanism takes care that a message sent with a resource identifier as its destination is delivered to the peer offering that resource. The sender sends an initial message to the peer who is closest to the destination peer-to-peer identifier among the neighboring peers. When a peer is a neighbor, its physical address is stored in a routing table and mapped to its peer-to-peer identifier. The routing table entries are kept up-to-date according to the update policy of the peer-to-peer protocol with periodic maintenance messages. When a peer receives the message it looks in its routing tables for a peer which has an identifier closer to the destination. If it turns out that no other peer is closer it accepts the message, executes the specified activity (as represented by ADdoAction) and sends the resulting reply. If there is a closer peer the initial request is forwarded to this peer recursively. The returned answer is sent directly to the original sender, who added her physical address to the request. The destination of the request is a peer-to-peer identifier, and the message will be received by the online peer whose identifier is closest to it. If the receiver fails the routing mechanism will automatically deliver the message to a substituting peer. Thus, even if the receiver fails, the message exchange will be successful. Nevertheless the message itself can be lost at any of the multiple intermediate transfers. Thus the probability of a failing message is several times higher than in the previously described direct message exchange.


Figure 33: Similar to the resending presented in Figure 31 routed messages are repeated as well

To ensure that a message eventually reaches its final destination it is resent in case a reply is not received within a predefined timespan, similar to the protocol presented in Figure 31. Figure 33 presents its adaptation to the route protocol. Only the original sender repeats the operation if the timeout has passed, as it would introduce too many messages if the intermediate peers resent messages as well. It is sufficient if the initial sender ensures messages are resent. If the timeout elapses before the original sender receives a reply she resends the message, which triggers resending the message by the intermediate peers. The number of retries can be very high, in contrast to the retries of the direct message exchanges described before. The destination of the request is the peer closest to the identifier set as the message's destination. Thus, if the actual peer in this position fails after the message was sent, its closest neighbor will receive and handle the message (see routing mechanism). The timespan to wait for a reply has to be greater than in the case of a directly sent message, as the timespan a message needs to arrive at its destination depends on the overlay network structure, which changes dynamically when peers are joining and leaving. A message routed in the overlay network is even less reliable, as it consists of multiple received and sent messages. If any of those are lost the original message will not reach its final destination. The number of intermediate peers, called hops, depends on the momentary structure of the peer-to-peer network as well.

ensure progress despite failed communication
As we saw before, a message transfer can ultimately fail. A routed message could be repeated an unlimited number of times and will eventually be received by a peer able to compute a reply. But a directly sent message cannot be delivered if the receiver fails. The initiating operation has to be aborted when a directly sent message could not be delivered at the last retry. Following design principle D-5: Avoid storing statuses only the initiating peer takes care of the progress of an operation. This is always the peer of the developer who wants to execute any of the operations PlatinVC offers. The executed activity ADdoAction is always initiated by the received request. The executing peer does not track a state to know which message is expected next, but reacts to incoming requests only. In this way only the sender actively triggers the phases of an operation. If a message transfer fails, as depicted in the alternative fragments in Figure 30, the receiver's activities would not be triggered. This behavior has been considered in the design of PlatinVC, as we will see later. No activity leads to a faulty state when it is not continued by a follow-up activity. In the first case (a failed sender) the sender would repeat the operation after it rejoins the network. If the operation was actually finished, and just the final acknowledgement was missing, only this acknowledgement is sent. As an example we take a look at the push operation detailed further in Section 8.4.6. The push operation is, like the commit operation, idempotent, meaning that executing it a second time does not alter the repository. If the locally committed snapshots have already been transferred and applied, the maintaining peers will notice that no snapshot needs to be transferred again. If they have not been applied yet the complete operation has to be repeated. Applying a snapshot is an atomic action. However, it can happen that not all maintainers can apply a snapshot. Depending on the actually contacted maintainer a pull might not retrieve the latest snapshots. In this case causal consistency is still guaranteed. A repeated push by the original sender, or a push by any peer who previously obtained the latest snapshot from a correct maintainer, will repair the incomplete repository state.

8.4.5 Retrieve Updates

Before we elaborate the operations in detail an example should illustrate the following protocol.

Retrieve Update Example
Let us assume that the project, and thus an updated working copy, has the structure shown in Figure 34. The outermost folder, labelled with '/', contains two folders. In the first folder 'docu' are the artifacts 'Plan.csv', 'Readme.txt' and 'Manual.txt'. All those artifacts belong to the documentation subproject. The other folder 'impl' contains an artifact named 'Main.java' and a folder 'utils', which contains the artifacts 'Helper.java' and 'Parser.java'. All folders are named with lower-case letters; artifact names start with capital letters. Alice is about to pull the latest versions from the subproject 'impl'. She is not interested in updates to the artifacts in the 'docu' folder. Let us assume that her local and peer-to-peer module store identical snapshots, as presented in Figure 35. The recorded snapshots are presented as boxes, pointing to their parent snapshot. They are labelled with their identifiers (shortened to three characters), which are the hash values of their content. Thus they are in no numerical order (it would be hard to maintain a globally continuing value among all peers). In this history two branches are visible, ending with the snapshots C87 and D34. Alice executes the operation O-5: Pull globally using specified folders, specifying that only updates from 'impl' are desired. In the first step Alice's peer searches for the peer responsible for maintaining updates on this folder. It sends the identifiers of known snapshots so that the contacted peer knows which snapshots are missing. The latest snapshots are C87 and D34. These snapshots do not contain versions of artifacts which are requested. The maintainer for the desired folder 'impl' might know about these snapshots - but this cannot be guaranteed. It is only certain that this peer knows all snapshots containing versions of artifacts residing in the folder 'impl' or any of its subfolders, which in this example is only the folder 'utils'. The parent of the snapshot D34 is B56, which contains a new version of an artifact located in the folder 'utils'.

[Figure 34 shows the working copy structure: the outermost folder '/' contains the folder 'docu' with Plan.csv, Readme.txt and Manual.txt, and the folder 'impl' with Main.java and the subfolder 'utils' containing Helper.java and Parser.java.]

Figure 34: Structure of the exemplary project

[Figure 35 shows Alice's history: A22 (all files created initially) is the parent of B43 (/impl/Main.java); B43 is the parent of C87 (/docu/Readme.txt) and of B56 (/impl/utils/Helper.java); B56 is the parent of D34 (/docu/Plan.csv).]

Figure 35: The history in Alice's local and peer-to-peer module

The parent of snapshot C87 is B43. It contains a new version of an artifact in the specified folder hierarchy as well, but as it is a predecessor of the snapshot B56 there is no need to add it to the snapshots that should be transferred. The maintainer surely has this snapshot. To disclose which snapshots are stored locally, only the identifier of snapshot B56 would have to be transferred. The identifiers of the locally latest snapshots C87 and D34 are transferred as well, as doing so does not bring too much overhead and spreads the information about the snapshots' existence. Let us further assume that Bob's peer is responsible for maintaining the versions of all artifacts residing in the folder 'impl'. The history of Bob's local and peer-to-peer module is presented in Figure 36. For simplicity we again assume that they store the same snapshots. The snapshots A22, B43 and B56 are identical to the respective snapshots in the history of the corresponding module on Alice's peer. Bob receives the snapshots C87, D34 and B56. If only the latest snapshots from the branches in Alice's history were sent, Bob would not know which other snapshots Alice has. The snapshots C87 and D34 are unknown to Bob, so they are applied to the history cache of Bob's peer-to-peer module. This history cache stores the structural information (i.e. metadata) of the snapshots only, containing each snapshot's identifier, its parents, its author, the modified files and the creation date, but not the actual modifications. Bob's peer can compute that Alice does not have the snapshots F78 and 83A, because their identifiers were not sent by Alice. It further computes that all snapshots between B56 and 83A are missing, which in this example is only snapshot 56C. The parent snapshot of snapshot F78 is known to Alice, as she disclosed that she has the later snapshot B56, which was built on the snapshot A22.

[Figure 36 shows Bob's history: A22 (all files created initially) is the parent of F78 (/impl/utils/Parser.java, /impl/Main.java) and of B43 (/impl/Main.java); B43 is the parent of B56 (/impl/utils/Helper.java), followed by 56C (/impl/utils/Helper.java) and 83A (/impl/Main.java).]

Figure 36: The history in Bob’s local and peer-to-peer module after executing the pull operation

[Figure 37 shows (a) Alice's peer-to-peer module, where F78, 56C and 83A exist as history cache entries only, and (b) Bob's peer-to-peer module, where C87 and D34 exist as history cache entries only.]

Figure 37: History (cache marked with dotted lines) after the first pull phase

The history cache entries sent by Bob include the snapshots missing in Alice's history cache, which are the snapshots 56C, 83A and F78. After this exchange the local modules on both peers are still unchanged, but the history cache of the peer-to-peer module stored on Alice's machine contains the new entries, presented in Figure 37a. The history cache entries which do not exist as snapshots in the peer-to-peer module are marked with dotted lines. Bob's peer-to-peer module is shown in Figure 37b; the entries which exist as metadata only are marked with dotted lines as well. These are the cache entries which were sent by Alice. In the second phase of the two-step pull operation Alice now chooses a minimal number of peers that need to be contacted in order to retrieve the latest versions of the artifacts in the folder 'impl' and its subfolder 'utils'. The snapshots Alice is interested in are the snapshots F78, 56C and 83A. Alternatively, Alice could choose herself which snapshots she would like to pull, but considering a workflow where developers like to work on separate branches, as presented in Section 7.3.1, the aforementioned snapshots are chosen automatically. Snapshot 83A is based on snapshot 56C, thus the latter is automatically pulled when snapshot 83A is. In snapshot 83A an artifact in folder 'impl' was modified; snapshot F78 contains a new version of an artifact in folder 'utils' and a new version of the artifact in folder 'impl'. Therefore all snapshots can be pulled from the maintainer of the folder 'impl'. If in snapshot F78 the artifact Main.java had not been modified, and thus only an artifact in the folder 'impl/utils' had been changed, Alice would have to request the snapshot F78 from a second maintainer. In the worst case a maintainer has to be contacted for each branch, regardless of the number of updated folders. In the best case, when only a single artifact was modified in the latest snapshots of all branches, one maintainer can provide all missing snapshots. The actual request message contains all desired snapshots, which are F78, 56C and 83A in our example. It is routed using the peer-to-peer overlay network to the responsible maintainer. Assuming that 'Main.java' had not been changed in the snapshot F78, two identical messages would be routed to the maintainers of the folders 'impl' and 'utils'. The snapshot B56, which is present on both maintainers, would only be sent by the maintainer of 'utils' to avoid unnecessary data transfer. If one peer were responsible for both folders the second message is discarded (not answered, nor repeated). Following the peer-to-peer routing mechanism a receiver forwards the request to the next closest peer. Prior to forwarding a message the receiving peer checks if it can answer the request itself, at least partially. If it can, it sends the requesting peer, Alice's peer in our example, some of the requested snapshots and modifies the forwarded request to only contain those not already sent.

[Figure 38 shows Alice's history after the pull: in addition to A22, B43, C87, B56 and D34 it now contains the snapshots F78 (/impl/utils/Parser.java, /impl/Main.java), 56C (/impl/utils/Helper.java) and 83A (/impl/Main.java).]

Figure 38: The history in Alice’s local and peer-to-peer module after executing the pull operation

After Alice applied the requested snapshots her local and peer-to-peer module appear as shown in Figure 38. Bob’s local module remains untouched as presented in Figure 36, his peer-to-peer module does not receive new snapshots, but the history cache is updated, as illustrated by Figure 37.

Retrieve Protocol
The pull operation retrieves the latest updates, which consist of all locally missing commits pushed by coworkers. These commits have been applied to the peer-to-peer modules of a number of maintainers by the share operations (see Section 8.4.6) and are pulled from there to obtain the latest existing snapshots. There are three variants, described in Section 7.4.2. Operation O-7: Get Artifact and operation O-6: Pull globally are variants of the operation O-5: Pull globally using specified folders. Operation O-7 is implemented by issuing operation O-5 specifying the folder in which the requested artifact resides. Operation O-6 is implemented by executing operation O-5 specifying the outermost folder (or all folders, which has the same effect). The approach presented in this work has been optimized in favor of the pull operations. In a usual development workflow artifacts are more frequently updated than changes are committed. The result is a two-step process, in which first structural information about existing

snapshots, called a repository's history cache entries in the following, is retrieved. In the second step a maintainer is contacted who has all needed snapshots. For each specified folder all artifacts in this folder and its inner folders are updated. When folders are specified which are subfolders of other specified folders, the operation is executed only for the outermost folder. The maintainer of the outermost folder does not necessarily store all snapshots of the artifacts in the subfolders - but it stores the history cache entries of these snapshots, so that their existence is known and they can be retrieved from the subfolder's maintainer. This behavior, however, could be changed easily. We chose to implement this behavior to exploit the natural partitioning of a project into subfolders. We assume that in most cases a developer is interested in the latest version of all artifacts beginning from a specified folder. In contrast to our solution, most existing solutions always retrieve updates on all existing artifacts, which is equal to pulling the latest versions specifying the outermost folder of a project in our solution. This enables a project structure where multiple different projects reside in a working copy, stored under a common outermost folder. Snapshots would be created considering all existing artifacts, but updates could be pulled for single projects only. N.B. Subversion offers a similar functionality when subfolders are specified as a repository's root directory.
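Under these assumptions the pull variants can be expressed in terms of operation O-5, as the following sketch indicates; the method names and the use of java.nio paths are illustrative, not PlatinVC's API.

import java.nio.file.Path;
import java.util.Set;

// Sketch of how the pull variants delegate to operation O-5 (pull globally using specified folders).
public abstract class PullOperations {

    /** O-5: pull the latest snapshots affecting the given folders (and their subfolders). */
    public abstract void pullFolders(Set<Path> folders);

    /** O-7: get the latest version of a single artifact by pulling its containing folder.
     *  Assumes the artifact lies below the module root, so getParent() is never null. */
    public void getArtifact(Path artifact) {
        pullFolders(Set.of(artifact.getParent()));
    }

    /** O-6: pull globally, i.e. pull with the outermost folder of the module specified. */
    public void pullGlobally(Path moduleRoot) {
        pullFolders(Set.of(moduleRoot));
    }
}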

Figure 39: Activity diagram of the pull operation

The activity diagram in Figure 39 describes the pull procedure of the operation O-5: Pull globally using specified folders. A user's specified folders are first filtered: each folder which is a child (or deeper descendant) of another specified folder in the directory tree is removed. Updates in these folders can be retrieved by communicating only with the maintainers responsible for the outermost folders among the specified ones. The filtering process results in folders which are in parallel branches of the directory tree, called unique folders in Figure 39. The next procedure runs in parallel for each specified unique folder. This parallel activity, described on the left side of Figure 39, is the first step. In this step a module's history cache is updated to reflect the latest versions with respect to the specified folders. The locally latest snapshots in all branches (named and unnamed ones) from the peer-to-peer module containing the specified folders are retrieved (in the action compute locally latest snapshots). If the folders belong to more than one module the whole operation is executed for each module independently. The resulting set contains the most recent snapshots in all branches up to the parents of the latest snapshots where artifacts residing in the specified folders or any of their subfolders were changed. As detailed in the push operation description in Section 8.4.6 we can expect the maintainer of the specified folder to have those snapshots, but not necessarily later ones. These snapshots are needed to determine the minimum number of snapshots that need to be transferred to a developer in order to update his peer.
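The initial filtering step can be sketched as follows; the use of java.nio paths and the method names are illustrative assumptions.

import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Sketch of the filtering step: keep only 'unique' folders, i.e. drop every specified
// folder that lies below another specified folder in the directory tree.
public final class FolderFilter {

    public static Set<Path> uniqueFolders(Set<Path> specified) {
        Set<Path> unique = new HashSet<>();
        for (Path candidate : specified) {
            boolean coveredByOther = false;
            for (Path other : specified) {
                // candidate is removed if it is a (transitive) child of another specified folder
                if (!other.equals(candidate) && candidate.startsWith(other)) {
                    coveredByOther = true;
                    break;
                }
            }
            if (!coveredByOther) {
                unique.add(candidate);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Set<Path> specified = Set.of(Paths.get("impl"), Paths.get("impl/utils"), Paths.get("docu"));
        // Keeps only 'impl' and 'docu': updates for 'impl/utils' are covered by the maintainer of 'impl'.
        System.out.println(uniqueFolders(specified));
    }
}

Note that Path.startsWith compares whole path elements, so a folder named 'implementation' is not mistaken for a subfolder of 'impl'.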

Figure 40: Activity diagram: Choose the snapshots to which the local repository should be updated

At the same time the identifier of the maintaining peer is computed by hashing the module's name concatenated with the folder's path and name. The resulting hash value is the key under which all snapshots of the artifacts stored in this folder are maintained (as explained in Section 8.4.2). A message is routed to this key following the procedure described in Section 8.4.4 (and presented in Figure 32).

In the unlikely case where not even a substituting peer can reply with the requested update after a reasonable5 number of retries, the pull operation is aborted and the user is informed to try again later. Upon receiving the message the maintaining peer calculates the missing history entries and sends them back to the developer's peer, which updates its local history cache. The transferred entries are only the metadata of the missing snapshots; they do not contain the actual changes of the modified artifacts. In this process the same history entries have to be identified in the local and the remote history cache in order to transfer only the missing entries. Section 8.4.6 details how we achieved this with a minimum number of messages. Our prototype can be configured to present the updated history to the user, who can then explicitly choose the snapshots he wants to update to. All snapshots from the latest local snapshot up to the specified snapshot are retrieved. In most cases, however, a user wants to have the latest snapshot including new versions of all artifacts in a folder's hierarchy. As an alternative to human interaction the snapshot fulfilling this criterion is computed in the activity labelled AD choose snapshot to update to and illustrated in Figure 40. Similar to the algorithm described in the above example, the latest snapshots in all branches are investigated. If a snapshot does not contain a new version of any artifact residing in any of the folders below the specified folder, it is discarded and its parent is investigated, until a snapshot fulfilling this criterion is found for each branch. To find the minimum number of maintainers that need to be contacted these snapshots are examined further. The identified missing snapshots are retrieved from the peer-to-peer module, if they are present there. Only the remaining snapshots, which are not stored on the local peer, are retrieved in the following steps. In the following action (identify folder contained in most snapshots) the folder which is contained in the most snapshots is searched for, regardless of which of its artifacts was modified in the respective snapshots. If multiple folders are contained in the same number of snapshots one of them is chosen randomly. The hash value of the folder's name is calculated and added to the list of p2pIds, which is the list of peers that need to be contacted to retrieve all missing snapshots containing the folder. All snapshots containing the identified folder are then removed and, as long as the set of snapshots is not empty, the next folder is searched for.
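The selection of the folders, and thus of the maintainers to contact, can be sketched as follows. This is our own illustration: a Snapshot type with a changedFolders() accessor is assumed, and ties are broken arbitrarily rather than randomly.

    import java.util.*;

    interface Snapshot { Set<String> changedFolders(); }

    final class MaintainerSelection {
        // Greedily pick the folder contained in the most still-uncovered snapshots,
        // remember it (its hash value becomes one of the p2pIds to route to), and
        // repeat until every missing snapshot is covered by some chosen folder.
        static List<String> foldersToQuery(Set<Snapshot> missing) {
            List<String> chosen = new ArrayList<>();
            while (!missing.isEmpty()) {
                Map<String, Integer> count = new HashMap<>();
                for (Snapshot s : missing)
                    for (String folder : s.changedFolders())
                        count.merge(folder, 1, Integer::sum);
                String best = Collections.max(count.entrySet(),
                                              Map.Entry.comparingByValue()).getKey();
                chosen.add(best);
                missing.removeIf(s -> s.changedFolders().contains(best));
            }
            return chosen;
        }
    }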

Figure 41: Activity diagram: Get bundle with missing snapshots in the second phase of the pull operation

5 If a receiver fails the message is routed automatically to a substituting peer, thus the number of retries can be high, as eventually a peer will process the request.

For each maintainer resulting from the filtering process just described a bundle is requested with the activity SD get bundle (see Figure 41), which contains the missing snapshots only. The process which routes a request containing the latest local snapshots (resulting from the previous calculations) to the maintainers is very similar to the activity route reliable, presented in Figure 33. The only difference is that any peer on the path to the maintainer can answer the request, even partially, if it stores any of the requested snapshots. The request is then modified by removing the snapshots which were already sent to the requesting peer. Doing so implements owner path caching and reduces the load on the maintaining peer the request has been addressed to. A second positive effect is that a request is answered faster: Pastry optimizes its routing tables with the peers to which the message delay is minimal, so if those peers answer a request rather than forwarding it to the actual maintainer, the new snapshots are received sooner. It would be possible to proactively store frequently requested snapshots. However, we did not implement proactive caching, as the popularity of the snapshots most probably changes too dynamically: only the latest snapshots would be popular. The user of a peer who forwards multiple requests might request those snapshots as well, which effectively caches them. N.B. Owner path caching is not possible in the other activities, such as AD get history update, as this information is only up to date on the maintainer and its replicas and cannot be answered from a copy.

In a last step a retrieved bundle is applied to the peer-to-peer module, from where it is pulled into the local module using Mercurial's 'hg pull' command. The user is informed about new branches, if any resulted from applying the retrieved snapshots, so that he can react and merge them. If he decides to leave the branches unmerged, future updates are applied to those branches without notification.

on consistency For each specified folder either the latest snapshots are received or none. Thus it can happen that updates for some folders are received, but not for all, resulting in causal consistency. If a message is lost it is resent up to retries times. If the procedure still cannot finish successfully, which can happen when a lot of participants are leaving and joining the network (i.e. high churn), the pull operation is aborted and the user should execute it at a later time. In most cases, investigated further in Chapter 10, updates for all folders are received, resulting in sequential consistency. Regardless of the network conditions, if only updates on folders which are included in the hierarchy of one of the specified folders are pulled and the same artifact has been changed in the latest snapshot of all branches, a single peer can provide all missing snapshots. This is the case when all updates of all artifacts are requested and only one branch exists. In this situation sequential consistency is guaranteed.

Analysis of Design Alternatives In a preliminary version of PlatinVC updates had to be pulled by specifying the folders in which updates should be looked for. This worked well if a user was interested in updates of a set of folders, working on automatically isolated branches as described in Section 7.3.1. The push operation was a bit more performant than the variant presented in this work, at the cost of a less performant pull operation. Its complexity rose with the number of folders one was interested in receiving updates from, because a maintainer had to be contacted for each folder. These maintainers were contacted in parallel and, depending on the distribution of the peers and the repository, a peer could be responsible for multiple folders, lowering the total number of peers that needed to be contacted. However, there was another problem as well: a newly created folder could only be retrieved if the requesting developer knew its name, or if an artifact in a known folder was changed in the same or in a child snapshot. Not only to solve the problem of unknown folders but also to improve the performance of the pull operation, we developed the approach presented in this section.

8.4.6 Share Versions

Similar to the presentation of the retrieve operations the share operations are introduced with an example, before being detailed using activity diagrams.

Share Versions Example We take the same project structure presented in Figure 34 in this example. Let us further assume that the local history cache of the peer-to-peer module on Alice's and Bob's peer is the result of the previous example, presented in Figure 38 for Alice and Figure 37 for Bob. The snapshots stored in Bob's local module are illustrated in Figure 42.

Figure 42: Exemplary history in Bob's peer-to-peer module

It does not contain the updated history cache entries C87 and D34, which were received from Alice and added to the history cache of Bob's peer-to-peer module (compare to Figure 37). Bob recently committed the snapshots F38, E96 and 3C6 locally and is now about to share them by executing the operation O-9: Push globally. He first changed the file '/impl/Main.java' in snapshot F38, then '/docu/Manual.txt' in a subsequent snapshot E96, and '/impl/Main.java' again in the last snapshot 3C6. In order to share these snapshots three peers have to be contacted. As we have already seen, specific peers are responsible for keeping track of all snapshots that include changes to artifacts in specific folders. In the push process Bob's peer first finds out to which peers it needs to send the new snapshots. In snapshots F38 and 3C6 an artifact in the folder 'impl' was changed; in snapshot E96 Bob modified an artifact in the folder 'docu'. The parent folder of both folders is the outermost folder of the project, '/'. So the maintainers of the folders 'impl', 'docu' and '/' have to be contacted. As introduced in the previous example Bob is the maintainer for the folder 'impl'. The assignment is computed by hashing the module's name and the folder path; the peer closest to the resulting value is the maintainer for that folder. In our example Alice is responsible for keeping track of the artifacts in the folder 'docu'. An additional peer is responsible for the folder '/'; we name it Cliff's machine. Artifacts in two folders were changed, so Bob's peer routes two requests to the maintainers of the folders 'impl' and 'docu'. The request for the folder 'impl' is redirected to itself without introducing network traffic: all snapshots are applied to the peer-to-peer module and an acknowledgment is returned, which Bob's peer receives without anything being sent over the network. The other request is directed at Alice's peer. Alice opens a TCP/IP socket for the data transfer and sends her locally latest snapshots, as elaborated further in Section 8.4.6, so that Bob's peer knows which snapshots have to be transferred.

Alice’s peer-to-peer module can now be given. From the answer Bob’s peer finds out that in addition to the recently created snapshots, snapshot F78 is missing as well (only its metadata is present). Only in snapshot E96 an artifact in the folder ’docu’ is modified, so only the missing snapshots up to this snapshot have to be transferred. Although snapshot F38 does not contain changed artifacts in the folder ’docu’ it has to be transferred to Alice - otherwise causal consistency would not be guaranteed. Snapshot 3C6 does not have to be transferred. In order to achieve more redundancy and thus a higher robustness snapshot 3C6 is sent to Alice as well. In the unlikely case that all peers responsible for the folder ’impl’ fail in a time frame that is too short to replicate the stored snapshots to replacing peers, a later push from Alice based on that snapshot would recover it. Upon receiving the snapshots they are applied

Figure 43: Exemplary history in Alice's peer-to-peer module

to Alice’s peer-to-peer module as illustrated by Figure 43. Additional to its metadata form the actual contents of snapshot F78 are stored in Alice’s peer-to-peer module now as well. After applying the received snapshots, but before replying with an acknowledgement, all replicating peers are updated by Bob’s and Alice’s peer. To allow a faster pull operation, as described in Section 8.4.5, the history cache of all maintainer responsible for parent folders of the folders with changed artifacts have to be updated. This occurs parallel to the replication. Bob’s and Alice’s peer compute that the parent folder ’/’ has to be updated. Both know that both are about to send the updates to the maintainer of the folder ’/’, being Cliff’s peer. To avoid duplicate messages a simple heuristic decides that the maintainer of the folder with the smaller identifier updates the common parent folder only. The hash value of ’impl’6 is smaller than that of ’docu’7, so only Bob’s peer updates the history cache of the peer-to-peer module stored on Cliff’s peer, which results in the history presented by Figure 44. Only the first snapshot is completely stored, all other snapshots exist as metadata entries only. If parent history cache entries on Bob’s peer are missing it requests a full update, as described in Section 8.4.6. After all subprocesses finish successfully each maintainer sends an acknowledgement to the initiator, Bob. When all acknowledgements are received Bob’s peer-to-peer module is updated with the published snapshots as well. If in spite of all retries any subprocess fails to complete the whole operation is repeated by the user. When snapshots are to be resent to a maintainer, who successfully applied them the last time, it simply replies with an acknowledgement, indicating that everything is ok. This shortens the duration of the new attempt and reduces the involved bandwidth consumption. N.B. The local module of the maintainers, Alice’s and Cliff’s peer, remain untouched by the push operation. If their user wants to get the committed snapshots they execute the pull operation described in Section 8.4.5. This operation would still ask for the most recent updates in the first phase, but the already locally present snapshots

6 SHA-1_hash("impl") = 846cfd95cebc380bff149da6c040e3517ea46c4e
7 SHA-1_hash("docu") = 9d86a6f5fa69acfda6ae366bea9a91795ff78eb6

Figure 44: History (with cache marked by dotted lines) of Cliff's peer-to-peer module

This operation would still ask for the most recent updates in the first phase, but the already locally present snapshots would be retrieved without further messages from the peer-to-peer module (initiated by the identify missing snapshots activity in Figure 40).
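The maintainer assignment used throughout this example, hashing a module's name together with a folder's path, can be sketched as follows (a minimal illustration; the class and method names are ours):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    final class FolderKey {
        // The DHT key under which all snapshots touching artifacts in 'folderPath' of the
        // given module are maintained; the peer whose identifier is numerically closest
        // to this 160-bit value becomes the folder's maintainer.
        static byte[] keyFor(String moduleName, String folderPath) throws NoSuchAlgorithmException {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return sha1.digest((moduleName + folderPath).getBytes(StandardCharsets.UTF_8));
        }
    }

Applied to the example above, keyFor would be computed once per changed folder; the footnotes give the plain SHA-1 values of the folder names 'impl' and 'docu'.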

Share Versions Protocol When a user wants to create a snapshot to record her latest changes in the working directory she either calls Mercurial's 'hg commit' command or the operation O-8: Commit, which executes the aforementioned command. This creates a snapshot which is stored in the local module. When the user is ready to share all her locally stored snapshots she invokes the operation O-9: Push globally of PlatinVC. The algorithm takes care of distributing the snapshots among the maintaining peers. Locally applied commits are shared using the push operation (O-9) of PlatinVC, introduced in Section 7.4.3. In this operation the history of the local module is copied to an accompanying peer-to-peer module and pushed in a background operation to the responsible maintainers in the network. The peer-to-peer module is hidden from the user and should only be manipulated using operations of PlatinVC, as direct manipulation could break the global repository's consistency. The push operation is depicted in the activity diagram in Figure 45. Basically, the peer whose user created new snapshots computes which peers are responsible for storing them and sends them the new snapshots. In the first step the new snapshots are examined and all folders whose artifacts were updated are collected. All maintainers responsible for those folders need to receive the new snapshots. In order to route a message to them the identifier of each folder is computed by hashing the module's name, the names of all folders on the path and the folder in which artifacts were changed, as mentioned earlier. The activity AD open socket for push serves two purposes: a TCP/IP socket for the actual data transfer is opened by the maintainer and the physical address of the maintainer is identified. Following the DHT mechanism a peer can be responsible for tracking the evolution of the artifacts of more than one folder. The peer of the developer who initiated the push operation routes a request to each folder's identifier in parallel executed processes. The request contains some history cache entries, which are needed to identify the missing snapshots on the maintaining peer. The actual algorithm that determines which history cache entries are to be sent is described in Section 8.4.6. The answer is either a socket and some entries from the maintainer's history cache, needed to identify which snapshots are missing on the maintainer, or an acknowledgement indicating that the snapshots have already been applied. Knowing which snapshots are missing, a bundle containing these snapshots is created and sent over the opened socket to the maintainer. A bundle is a data structure created by Mercurial; it packs all new snapshots with their actual changes and all metadata into a compressed file. An individual bundle is created and transferred to each actual maintainer, who confirms its successful storage and replication.
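The first step of this procedure, collecting the folders whose artifacts were changed in the snapshots to be pushed, could look like the following sketch (our illustration; a Snapshot type with a changedArtifacts() accessor is assumed):

    import java.util.Collection;
    import java.util.Set;
    import java.util.TreeSet;

    final class PushPreparation {
        interface Snapshot { Set<String> changedArtifacts(); } // e.g. "/impl/Main.java"

        // Every folder that contains a modified artifact has a maintainer that must
        // receive the new snapshots; the folder identifiers are hashed afterwards.
        static Set<String> foldersWithChanges(Collection<Snapshot> newSnapshots) {
            Set<String> folders = new TreeSet<>();
            for (Snapshot s : newSnapshots)
                for (String artifact : s.changedArtifacts())
                    folders.add(artifact.substring(0, artifact.lastIndexOf('/') + 1)); // "/impl/"
            return folders;
        }
    }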

Figure 45: Activity diagram of the global push operation

Only when all contacted maintainers respond with an acknowledgement are the new snapshots applied to the peer-to-peer module on the initiator's peer, reflecting the global state. If anything still goes wrong despite the failure handling mechanisms (i.e. the maximum number of retries has been exceeded), the user is informed to try again later. Figure 46 shows how a socket is opened on the maintainer. The request is routed using the reliable routing mechanism presented in Section 8.4.4. When a peer is the maintainer of multiple folders to which new snapshots are to be pushed, it receives multiple messages, one for each folder. Only the first is answered; subsequent messages are discarded. The pushing peer does not wait for an answer to these additional messages once it has received the first acknowledgement, which states for which folders the maintainer is responsible. If the maintainer has already applied the snapshots in question it resends its acknowledgement. This can happen if a peer failed in a previous push attempt, which could have been any other maintainer or the pushing peer, before it received the acknowledgement. The maintainer computes which snapshots are missing (possibly more than the newly created snapshots), opens a socket, and sends this information to the initiating peer. If the maintainer fails before it can send the answer, the request is resent after a timeout and the next closest peer receives and processes it.

Figure 46: Activity diagram: Open socket during push globally

The actual transfer of the snapshots over the opened socket is illustrated in the sequence diagram in Figure 47. The developer's peer sends the prepared bundle over the opened TCP/IP socket. This bundle is applied to the maintainer's peer-to-peer module. Two processes are started in parallel afterwards: one sends the bundle to the replicating peers in the maintainer's neighborhood; these peers apply the received snapshots to their peer-to-peer module as well and update their history cache. In the other process the updated history cache entries are sent to the maintainers of the folders on the path to the folder maintained by the processing peer. Only when the acknowledgements from all peers, replicas and other maintainers, have been received were the snapshots stored successfully, and an acknowledgment is returned to the pushing peer. The history cache is updated as described in Figure 48. To each peer which is responsible for a folder on the path to the folder the updating peer is responsible for, a message containing the new history cache entries is routed. Upon receiving these entries the respective maintainer checks whether any parent entries are missing. If entries are missing the peer asks the updating peer with a direct message; which history cache entries are sent in this message exchange is elaborated in Section 8.4.6. Having all missing history cache entries, the maintainer updates its replica peers. When it receives their acknowledgements, an acknowledgement is sent to the initiating peer indicating that the process terminated successfully. If, in spite of the repeated messages, any subprocess ultimately fails, the user is informed to push the locally recorded snapshots again at a later time. If the network is unavailable for a longer period of time, but snapshots have to be shared with certain colleagues (e.g. all being on a train journey), the snapshots can be exchanged using Mercurial only (e.g. with bundles on USB flash memory drives). When the network is available again the presented mechanism can handle duplicated push operations on the same snapshots. If possible, it is preferable to build a small network and use PlatinVC to exchange the snapshots; when the complete network is available later the mechanism described in Section 8.6 takes over.
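The acknowledgement logic on the maintainer's side, where replication and the history cache updates run in parallel and the pushing peer is acknowledged only after both have completed, could be sketched as follows (the three Runnables are placeholders for the respective subprocesses, not PlatinVC's actual interfaces):

    import java.util.concurrent.CompletableFuture;

    final class PushAcknowledgement {
        // Replicate the bundle and update the parent maintainers' history caches in
        // parallel; only when both have finished is the pushing peer acknowledged.
        static CompletableFuture<Void> storeAndAck(Runnable replicateBundle,
                                                   Runnable updateParentCaches,
                                                   Runnable ackPushingPeer) {
            return CompletableFuture.allOf(
                        CompletableFuture.runAsync(replicateBundle),
                        CompletableFuture.runAsync(updateParentCaches))
                    .thenRun(ackPushingPeer);
        }
    }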

Figure 47: Sequence diagram: Push local snapshots

If a new module is created it is handled internally with the presented share mechanism. The initial snapshot, like any subsequent snapshot, is distributed among the responsible maintainers. It does not matter if the module is not already present on a maintainer; it is created if it is missing. This is the same situation as when a new peer joins and takes over the responsibility for a folder in a module which is unknown to it: a peer-to-peer module is created on this peer in this situation as well.

Analysis of Design Alternatives An obvious alternative, which would improve the execution time of the push operation as well as the number of messages, would be to omit the step in which the history caches on the other maintainers are updated, described in Figure 48. As a consequence, however, the pull operation would be less efficient: instead of asking only one peer, who answers with the latest cache entries of all snapshots, multiple maintainers would have to be contacted. In the case where updates of all artifacts are desired, all maintainers instead of only the maintainer of the outermost folder would have to be queried. This far outweighs the disadvantage of a slightly less performant push operation. Additionally, a mechanism which announces new artifacts and folders would be needed.

update history cache When sharing or retrieving snapshots one has to update one's local history cache. We cannot assume that either the requesting peer or the asked maintaining peer has a subset of the snapshots known to the other peer. All snapshots stored in the history cache of the local peer might be new and thus unknown to the maintainer, and vice versa. So how can we synchronize the history caches on both machines with a minimum number of messages that transfer a minimum number of history entries?

Figure 48: Activity diagram: Prepare to push new history entries

Since the history caches might contain any form of overlapping entries, ranging from no common entries to identical history caches, there is no optimal solution. Thus we implemented a heuristic: we know that after two peers have contacted each other their history caches contain the same entries. In this way history entries are propagated through the network as a side effect of the operations offered by PlatinVC. But peers can disappear at any time. In an extreme case a nomadic developer might leave one network and join another, thus bringing multiple new history entries into the new network while many entries residing in the new network are missing on his peer. The older an entry is, the more likely it is that both peers store it in their history caches; more recent entries are less likely to be shared by the other peer. When exchanging history entries in the first step of the pull operation, described in Section 8.4.5, the latest snapshots are yet to be shared. The initiating peer first sends a subset of its local history entries. The receiving peer identifies known entries in each unmerged branch, starting from the most recent versions. All entries that represent later snapshots are sent back to the original requester. Since PlatinVC provides at least causal consistency, all earlier snapshots have to be known to the initiating peer. Of course some entries might be transferred unnecessarily, but their number should be negligible. The subset of the initially sent history entries is determined by applying the logarithm function to each unmerged branch: the gap between chosen entries is huge among the old entries and small among the more recent ones, so the resulting subset includes very few old and many newly created snapshots. N.B. We cannot simply mark which entries have already been shared and which are yet to be shared. If a peer synchronizes its history cache and goes offline, all other peers might disappear as well; when it joins a network again all participating peers might be new and the shared snapshots could have been lost or may never have been present. The first case is unlikely, as it would mean that a large number of peers failed in an amount of time too short to replicate the stored data. The latter case is normal when a nomadic developer travels between disconnected networks.
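Our reading of this logarithmic selection can be sketched as follows (an illustration only; the entries of one unmerged branch are assumed to be ordered from oldest to newest):

    import java.util.ArrayList;
    import java.util.List;

    final class HistoryEntrySubset {
        // Walk one branch backwards from its head and pick entries with exponentially
        // growing gaps: indices n-1, n-2, n-4, n-8, ... so recent snapshots are densely
        // and old snapshots only sparsely represented in the first message.
        static <T> List<T> initialSubset(List<T> branchOldestFirst) {
            List<T> subset = new ArrayList<>();
            int n = branchOldestFirst.size();
            for (int i = n - 1, gap = 1; i >= 0; i -= gap, gap *= 2)
                subset.add(branchOldestFirst.get(i));
            return subset;
        }
    }

For a branch of 100 entries only seven are sent, most of them from the recent end of the branch.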

8.4.7 Conflict Handling

We saw in the example given in Section 8.4.5 how conflicts are handled in PlatinVC. When snapshots are shared, conflicts are ignored. Conflicts cannot be reliably detected in this phase, as a currently disconnected peer could hold conflicting snapshots. If the network is partitioned, as discussed in Section 8.6, conflicting snapshots could exist in each partition, which would only be noticed later, when the partitions merge. A user should follow the principle of updating his local module prior to pushing newly created snapshots. Each pushed snapshot is accepted, possibly leading to an unnamed branch. A developer is not forced to resolve conflicts and can publish his work at closing time after a hard day. With traditional systems, which force a developer to resolve conflicts, the developer might postpone submitting his modifications to the next working day, leaving them in danger of becoming lost and unavailable to distant colleagues with different working times. When conflicting snapshots are retrieved during the pull operation the user is informed of the conflict. She can decide to ignore the conflict, which automatically creates a new branch, as discussed further in Section 7.3.1. Otherwise all conflicts can be reconciled locally before they are shared with all other developers. As detailed in Section 7.3.1 handling snapshots in this way does not lead to a lot of uncontrolled branches. Of course many uncontrolled branches could be created, but this does not happen unnoticed (as it would in a distributed version control system); these branches would only reflect the uncontrolled organization of the developed project. It can happen that developers publish their changes at the same time, after each of them pulled the latest snapshots, which were not conflicting. These conflicts will be detected with a delay, by any user pulling the latest snapshots afterwards. In contrast to the conflict handling in centralized version control systems all snapshots are recorded; in those systems conflicting snapshots are rejected and a developer has to merge the latest snapshot with his working copy to be able to submit the result as a new snapshot. In PlatinVC the local snapshot should be merged with the latest snapshot as well, but all three snapshots are recorded in the shared repository.

8.4.8 Additional Mandatory Mechanisms

Moving or Renaming Artifacts An artifact is identified by its full path rather than its name only; thus moving an artifact is equal to renaming it. Mercurial handles artifact renames in a way which fits our repository distribution naturally. If an artifact is renamed, Mercurial deletes the artifact at the old location and creates a new one at the new location. As long as the path to the artifact did not change, nothing changes in PlatinVC. If the path changed, the new snapshot is stored by another maintainer, namely the one whose identifier is closest to the hash value of the renamed path. In the push process which shares the new snapshot (that includes the moved artifact) the new maintainer gets all missing snapshots, which can be the complete history in the worst case. The former maintainer does not receive any further updates regarding the moved artifact, unless a snapshot is based on a previous snapshot in which the artifact had not yet been moved. The snapshot in which the move is recorded, however, is stored on the new and the old maintainer automatically. In the folder of the new maintainer the change has the form of a new artifact (which is actually the moved old artifact); the change in the old maintainer's folder is the deletion of the artifact. Thus, even when the automatic isolation feature described in Section 7.3.1 is used, where a developer pulls updates relative to specified folders only, the relocation of the file will be noticed.

New Artifacts and Folders New artifacts in a known folder do not pose any problems. They are tracked by the maintainer responsible for the folder they reside in and retrieved by any peer which pulls the latest snapshots from this maintainer. A new folder with new artifacts is maintained by the peer whose identifier is closest to the new folder's identifier. If the new folder has been pushed along with changes in other folders the snapshot ID is known to the old maintainers as well. If it has been pushed as the only change the snapshot is stored by the new maintainer only. We saw that in the push operation the new history cache entry is propagated to the history caches of the maintainers that care for the folders on the path to the new folder. Whenever a peer retrieves an update during the pull operation from any of those maintainers it is informed about the new folder.

8.5 maintenance mechanisms

Whenever a peer leaves or joins the overlay network the neighborhood of some peers changes. The maintenance mechanisms of FreePastry notify PlatinVC about those changes. Whenever a peer fails it is replaced by its closest neighbor. From the point of view of the other peers in the neighborhood the effect is the same as if a new peer had joined, as an already present peer slides into the neighborhood set. Figure 49 and Figure 50 describe the maintenance mechanisms performed by PlatinVC. A joining peer first executes a push to ensure that snapshots missing in the network are shared again. At the same time the protocol described by the figures is performed. Any peer which is notified by FreePastry checks whether the joined or failed peer was its closest neighbor, i.e. the peer with the next smaller or greater identifier. If this is not the case the peer does nothing. The peers which perform the maintenance protocol are the two closest neighbors of a joined peer, or the closest neighbors of a failed peer; in the latter case the two peers see each other as the newly joined peer. If the new peer is closer to the identifier of a maintained folder it replaces the old maintainer. The active peer checks whether it itself is the maintainer of any folder, and whether the new peer is closer to a maintained folder's identifier. If this is the case the peer does not forward any push or pull requests which would be handled by the new maintainer until the latter is updated. For the rest of the protocol it does not matter whether the new peer joined or was already present and became the closest neighbor as the result of a leaving peer. As described in Section 8.4.6 the identifiers of some snapshots are sent to deduce the missing snapshots. A timeout is not acceptable here, so the message is resent until the peer replies or FreePastry signals the final failure of that peer. Figure 50 explains how the new peer is updated. Using the received snapshot identifiers the active peer creates a bundle of all missing snapshots and sends it to the new peer, again repeatedly until it is received or FreePastry signals the final failure of the receiver. The new peer updates its peer-to-peer module as well as its history cache and is then ready to serve as a replica or maintainer. If the new peer is closer to the identifier of a folder the active peer no longer intercepts any requests. During this process the active peer could fail as well; in this case the next closest peer becomes active and updates the new peer. If a peer should become active but is not yet updated itself, it delays executing the maintenance protocol until it is updated. In this way all peers are eventually updated in a chain reaction.
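The initial check every notified peer performs can be illustrated roughly as follows (our simplification: identifiers are compared as plain numbers and the wrap-around of the identifier ring is ignored):

    import java.math.BigInteger;
    import java.util.NavigableSet;

    final class MaintenanceCheck {
        // 'neighborhood' is the peer's current view of nearby identifiers including its own;
        // the maintenance protocol is only started if the joined or failed peer was the
        // direct neighbor with the next smaller or greater identifier.
        static boolean isDirectNeighbor(BigInteger ownId, BigInteger changedPeerId,
                                        NavigableSet<BigInteger> neighborhood) {
            return changedPeerId.equals(neighborhood.lower(ownId))
                || changedPeerId.equals(neighborhood.higher(ownId));
        }
    }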

8.6 failure handling

We already saw in Section 8.4.5 and Section 8.4.6 how faulty situations are handled. In this section we discuss a few special cases.

Figure 49: Activity diagram describing the actions taken when the neighborhood changes (part 1)

8.6.1 Handling Network Partitioning

Whenever a group of peers loses the connection to another group of peers the overlay network falls apart into partitions. A peer keeps active connections to its neighbors. These neighbors are chosen by their identifier, which is given following a normal distribution, so it is unlikely that a large part of the neighbors are disconnected as a result of normal churn. However, if the physical network connection is interrupted, a group of peers could be disconnected at once. If some peers are in the subnetwork of one Internet service provider (ISP) and the remaining peers are in the subnetwork of another ISP, the connection between the ISPs could be interrupted. To name another likely example, the outside connection of a building could break down, leaving only the machines in the building connected to each other. The maintenance mechanisms presented in Section 8.5 take care that the missing peers are replaced and the routing mechanisms deliver all requests to the substituting maintainers in all partitions. However, complete development lines consisting of connected snapshots could be missing. If any user pushes a snapshot based on missing snapshots, the missing snapshots are published as well; they have to be present on that user's machine, otherwise the user would not have been able to create the new snapshot in the first place.

Figure 50: Activity diagram describing the actions taken when the neighborhood changes (part 2)

However, in all network partitions snapshots are created independently, and these might conflict with each other. The conflicts can only be detected at a later time, when the network partitions reunite. Among the multiple maintainers of a certain folder across all partitions, only one is closest to that folder's identifier. PlatinVC's maintenance mechanisms ensure that the independently recorded snapshots are joined in the peer-to-peer module of this remaining maintainer.

8.6.2 Handling Indeterministic Routing

During extreme churn it can happen that the routing table entries of the peers are faulty. A peer might not be able to update its routing table when an old peer rejoins and a recently entered peer fails, while another peer might still have the old entry, which is correct again. Such contradictory routing table entries lead to indeterministic routing: depending on the intermediate peers, a request addressed to the maintainer of a specific artifact could be delivered to two different peers. On closer inspection this situation is very similar to the network partitions described above. The only difference is that the peers are not in disjoint sets, but rather in overlapping sets. The problematic situation, whereby multiple peers maintain the same folder, is the same, and the solution described previously solves the problem here as well.

8.6.3 Recovery of Missing Snapshots

It cannot happen that a snapshot in the middle of a development line is missing; only the most recent snapshots at the end of a development line can be lost, either because they have not been shared correctly or because all storing peers failed before the maintenance mechanism could copy them to replacing peers. However, as long as any peer stores those snapshots, either in its local or its peer-to-peer module, they are brought back into the system as soon as that peer pushes any new snapshot. The new snapshot does not have to be based on any of the missing snapshots.

8.7 traceability links

A link can be created between artifacts, folders or modules. The link is stored in a specific file. For each linked item a companion file is created, with a generic name formed by the "." prefix (to hide the file), the item's name and the suffix "<-linkfile". In this file all linked items are listed. N.B. These entries express only that a link exists; no further information is stored here. The actual link document is named by concatenating the linked items' names with "<-link->" in between, prefixed by a dot as well. If a link is deleted only this link document is deleted; the entry in the linked item's linkfile remains untouched. This preserves the information that, instead of creating a new link, the old link has to be updated (i.e. undeleted). The linkfiles are stored along with the corresponding artifact, in the case of a folder inside that folder, and in the case of a linked module in the topmost folder. The link document is stored in a folder named ".linkdocuments". If the link document links items which reside in the same module, this folder resides in the topmost folder of the module. If items of different modules are linked, the folder is stored in the topmost folder of a third module, which is named after the modules containing the linked items; the name is constructed from the module names with "<-links->" in between. Using the previously described links not only can artifacts be traced, but different modules can be aligned as well. A practical usage would be to refer to an independent library which is used by a project. The link's metadata could specify the needed snapshot of the linked library.
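The naming scheme can be summarized in a few helper methods (a sketch; the method names are ours, the name patterns are the ones described above):

    final class LinkNames {
        // companion file listing all links of an item, e.g. ".Main.java<-linkfile"
        static String linkFileFor(String itemName) {
            return "." + itemName + "<-linkfile";
        }
        // the actual link document, e.g. ".Main.java<-link->Manual.txt"
        static String linkDocumentFor(String itemA, String itemB) {
            return "." + itemA + "<-link->" + itemB;
        }
        // module holding cross-module link documents, e.g. "implModule<-links->docuModule"
        static String crossModuleName(String moduleA, String moduleB) {
            return moduleA + "<-links->" + moduleB;
        }
    }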

8.8 summary

Rather than reinventing existing solutions, PlatinVC integrates software components that have proven stable in many other projects, and focuses on implementing the new parts. Our solution does not depend on a concrete peer-to-peer overlay protocol; it only requires key based routing, which is provided by any structured peer-to-peer protocol. The modifications of a user's working copy are tracked and stored using Mercurial, a distributed version control system (see Section 5.2.4). As a side effect, all tools that support or extend Mercurial, e.g. to visualize the recorded version history, can be used with PlatinVC as well. To overcome Mercurial's limitations, which result in a lower degree of consistency, we developed a mechanism to share committed snapshots with all fellow developers. As a result PlatinVC combines the benefits of a locally stored version history with the automatic and complete collaboration of centralized solutions. With the sophisticated distributed mechanisms presented in this chapter we were able to combine the best features of centralized and distributed version control systems while eliminating their drawbacks. We carefully considered situations which can lead to a faulty execution and designed mechanisms that counteract those effects and maintain a highly consistent and available repository.

PROTOTYPICAL SOFTWARE DEVELOPMENT ENVIRONMENT 9

After taking a deeper look at PlatinVC in the last chapter we introduce it from a user-visible perspective in this chapter. PlatinVC is a good supplement to any development environment. We developed our own environment, suited for GSD, in which PlatinVC is one component among others. Although PlatinVC is the most sophisticated component, the environment and a few additional components are introduced in this chapter as well.

9.1 modular development of peer-to-peer systems

A modular design was important to us. It enables us to exchange the system's components for better versions or to integrate new components that offer additional functionality and are able to interact with other system components. A key decision was to enable multiple applications to use the same overlay network for communication. Today's peer-to-peer applications, such as Skype[Skyb], Joost[Joo] and Emule[KB04], all use their own implementation of a peer-to-peer protocol. The protocols might be better suited to the respective application's needs, but the overlays' resource consumption (network bandwidth, computing power, memory), caused by the periodic maintenance mechanisms, adds up. If the applications complement each other and have similar needs regarding network communication, as in our case, it is more efficient if they share one overlay connection.

Figure 51: The simplified architecture of Pipe (the user level applications Eclipse IDE, Piki, ASKME and PlatinVC on top of the framework services Storage Manager, Security Manager and Communication Management with Connection Manager and Message Manager, connected through the OSGi implementation Equinox and using FreePastry as the peer-to-peer overlay)


We developed a framework that offers system services to components that interact with a user. Similar to the resource management framework (RMF) [FFRS02], our framework aims to decouple the complex peer-to-peer protocol algorithms from the logic of an application to ease the latter's development. The simplified architecture is presented in Figure 51. Our framework is called Peer-to-peer based Integrated Project support Environment (Pipe). The spine of our system is the OSGi[All07] implementation Equinox[equ]. Each component registers its offered interfaces and has access to the other components' services. All messages on a machine are exchanged using this middleware. Any component can be replaced at any time by a better version. Services are not offered to specific components but can be accessed by any component. In this way we do not have a classical layered architecture, where the communication of a system's components can occur only at specified borders. Nevertheless, our architecture can be separated into two parts: framework services, presented in Section 7.4, and user level applications, shown in Section 9.3. In RMF the network communication is abstracted to a level where an application merely searches for and acquires resources. Pipe, in contrast, offers an application a communication interface; otherwise implementing framework services would not be possible.
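A component could publish its service on this middleware roughly as sketched below; the interface and class names are placeholders of ours, not the prototype's actual API:

    import org.osgi.framework.BundleActivator;
    import org.osgi.framework.BundleContext;

    public class PlatinVcActivator implements BundleActivator {
        // illustrative service interface; the real component would expose push, pull, commit, ...
        public interface VersionControlService { }

        public void start(BundleContext context) {
            // register the service so that any other Pipe component can look it up
            context.registerService(VersionControlService.class.getName(),
                                    new VersionControlService() { }, null);
        }

        public void stop(BundleContext context) {
            // services registered via 'context' are unregistered automatically when the bundle stops
        }
    }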

9.2 framework services

The component at the bottom of Figure 51 implements the peer-to-peer overlay mechanisms of the Pastry overlay network[RD01b]. This component is encapsulated by our communication management, which we present in Section 9.2.1; it handles the communication for all other components. It is used by a storage manager, shown in Section 8.4.1, which implements the DHT storing mechanism. For secure message exchange and access control we implemented the security manager; currently it is used only by Askme (our communication application, detailed in Section 9.3.4). These four components, the communication management, the storage manager, the security manager, and the overlay implementation, are framework services. Their services are used by other components only and cannot be used in direct interaction with a user. The communication management's services are used by the storage manager and by the application Askme. This would not be possible in a strictly layered architecture, as Askme belongs to the user level applications and the storage manager to the framework services.

9.2.1 Communication Management

This component provides an abstraction of the communication. It consists of the following components.

Connection Manager The connection manager handles the overlay network connection. The connection dialog depicted in Figure 52 shows the choices one has to establish a connection. A new overlay network can be created with new ring. If there is already an overlay network, which is the most common case, one can join it by specifying the physical network address (i.e. the IP address and TCP port number) of a peer which is currently a member of that network (called the bootstrapping peer). Join using known peers allows one to join a network by trying to contact formerly known peers; to enable this convenience function we store the physical network addresses of the neighborhood peers we knew the last time we were online. Disconnect is self-explanatory. When a user disconnects, an event is sent in the system so that other components can react. If a component needs to execute specific actions before the connection is lost, it registers itself at the connection manager, which delays the actual detachment from the overlay network until all registered components agree. No component can rely on this service, of course, as a machine might fail at any time. Thus a user is able to cancel this graceful disconnection attempt and interrupt the connection immediately.

Figure 52: The connection options offered by the connection manager (running in Windows)

In Figure 53 we see the settings the system needs in order to establish a connection. The security settings are needed by the security manager only and can be deactivated. By calculating the hash value of a developer's name, the identifier of her peer is computed. In this way a developer can log in from any machine. If she logs in from another machine, all versions stored under the developer's peer identifier have to be transferred to the new machine, which is an automatic process carried out by PlatinVC's maintenance mechanism. However, when modules of large size have to be transferred over the network we recommend transferring them using physical storage media such as USB flash memory drives or DVDs. If a required module exists as a user module, PlatinVC takes the required versions from the local machine.

Message Manager Any message in Pipe is received by the message manager. By examining the message type it decides to which component the message has to be delivered. In addition to this basic function it offers basic message implementations that can be extended by any other component. These basic implementations include any-, multi- and broadcast messages, as well as offline messages. We introduced offline messages in [MKS08]; these messages are stored in the overlay network by the storage manager until the destination of the message rejoins the overlay network.
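The type-based dispatch could look like the following sketch (our illustration of the idea, not the prototype's actual classes):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Consumer;

    final class MessageManager {
        private final Map<Class<?>, Consumer<?>> handlers = new ConcurrentHashMap<>();

        // a component registers a handler for the message classes it understands
        <M> void register(Class<M> type, Consumer<M> handler) {
            handlers.put(type, handler);
        }

        // every incoming message is delivered to the handler registered for its type
        @SuppressWarnings("unchecked")
        void deliver(Object message) {
            Consumer<Object> handler = (Consumer<Object>) handlers.get(message.getClass());
            if (handler != null) handler.accept(message);
        }
    }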

9.2.2 Security

We developed a decentralized security mechanism for secure message exchange and access control in [Que09], which has yet to be tailored to PlatinVC. We briefly describe this system here; a more comprehensive description can be found in [Que09] and [GMM+09]. Figure 53 provides a glimpse of the security implementation in Pipe. Besides fields for providing the needed security key files, a role and a company have to be stated. Our approach is based on two components: authentication and access control.

Figure 53: The settings needed by the connection manager, with security enabled (running in Linux)

Authentication Utilizing an asymmetric key pair, as introduced by [DH79], we sign, encrypt and decrypt messages and content in our system. Rather than calculating a peer's identifier by applying a hash function to the user's name, we use the public key of a user as the user's peer ID. In this way the pseudonymous identity of a user is bound to a public key, which prevents a man-in-the-middle attack [Den84]. To avoid spoofing of identities we use a challenge-response mechanism upon joining the network: the bootstrapping peer sends an encrypted random number, and only if the receiver can decrypt it and send back the correct number does the join process start, in which the joining peer obtains its initial contacts. Using the public key as the peer ID can be realized in two ways, both of which we implemented. The peer ID can be extended to be as long as a public key; the hash function then has to be replaced so that all IDs in the network are calculated using the extended hash function. When extending the hash function is undesired, the peer ID can be calculated based on the public key of a user. A limitation is that a user's public key has to be sent to other users, who can only verify it if the peer ID is known and trusted. To avoid a man-in-the-middle attack the solution in [SSF08] could be used, which enables a decentralized key exchange. Trust is based on our modification of a web of trust. Usually development projects are hierarchically ordered; even open source projects have a project founder, although the remaining hierarchy tends to be flatter in comparison to the structure of a commercial project. This single person at the top of a project's hierarchy is the root of trust. It can certify another user's credentials; a certified user can certify other users, and so on. In this way our system does not depend on a central certificate authority. A user's credentials can have any attributes, such as a user's real name or role, i.e. security level, in a project.
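The join-time challenge-response step could be sketched as follows (an illustration only; we pick RSA via the standard Java crypto API, the concrete cipher used by the prototype is not specified here):

    import java.security.KeyPair;
    import java.security.PublicKey;
    import javax.crypto.Cipher;

    final class JoinChallenge {
        // the bootstrapping peer encrypts a random nonce with the public key the joining
        // peer claims as its identity; only the holder of the matching private key can
        // decrypt it and answer with the correct value, so the join may proceed.
        static byte[] challenge(PublicKey claimedKey, byte[] nonce) throws Exception {
            Cipher cipher = Cipher.getInstance("RSA");
            cipher.init(Cipher.ENCRYPT_MODE, claimedKey);
            return cipher.doFinal(nonce);                 // sent to the joining peer
        }

        static byte[] answer(KeyPair ownKeys, byte[] encryptedNonce) throws Exception {
            Cipher cipher = Cipher.getInstance("RSA");
            cipher.init(Cipher.DECRYPT_MODE, ownKeys.getPrivate());
            return cipher.doFinal(encryptedNonce);        // returned to the bootstrapping peer
        }
    }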

Access Control Using a user’s credentials we developed a role based access control mechanism. An artifact does not have an access control list, where authorized users have to be entered. An artifact has attributes, e.g., its status (draft, final, . . . ), security level (protected, confidential, . . . ), etc. A user’s access rights can be looked up in an extendible policy list using the user’s attributes and an artifact’s attributes. This policy list is stored in the system and can be read by any participant, but only written by authorized persons, starting with the root of trust, a special user who can allow other persons access. Although we have concurrency control, it is important to take care that no contradictory latest versions exist. There might be multiple policy lists or multiply parallel versions (in different branches), but all of them must form a contradiction free list. Thus only a small number of persons should be eligible to modify this list. The policy list is enforced by any user who has access to the artifact in question. The idea is that this user can be trusted, as he could access the artifact and hand it over anyway. It can be retrieved encrypted by any participant. It is encrypted using a symmetric key. Any peer who has read access can have this symmetric key. Initially this key is distributed among a specified number of peers, whose users have read access. Along with the artifact, an incomplete list of authorized peers is stored. A peer who requests an artifact for read access retrieves the encrypted artifact and this list, and contacts any peer from that list, sending its attributes. This peer retrieves the latest version of the policy list. Having the requestors credentials, the artifacts attributes, and the policy list, the peer can determine the access. Write access is controlled by all storing and reading peers. Whenever an artifact is changed the author signs the artifact with its private key. Upon storing an artifact, the storing peer checks the policy list for write permission. Any peer, who later accesses the artifact for reading, checks the permission again, as the storing peer could work together with an attacker. Again, only authorized peers can manage the access control. Our security mechanism features a blacklisting mechanism that detects malicious collabora- tors as well. If access has been given to a non authorized participant the issuing peer might be blacklisted. If it is blacklisted, all certificates it issued become invalid. The peer who issued the certificate to the blacklisted peer might be blacklisted as well. If a peer tried to gain access control to an artifact it does not have access to, it can be blacklisted as well. A blacklisted peer is excluded from the overlay network and prevented from rejoining. Whenever a peer is suspected to be malicious it is assigned a blacklist-point, kept in a list appended to the policy list. When a specified threshold is passed it is blacklisted. By logging the steps executed to prove malicious behavior, we take care that malicious peers cannot blacklist innocent peers. If the root of trust is malicious the entire system does not work. If a user certifies other users carelessly he might end up being blacklisted. The time needed to detect malicious behavior of a group of peers (or an attacker with a sybil attack, i.e., one user controlling multiple peers) depends on the threshold value mentioned before. But without the cooperation of a user with write access (holding the encryption key) no artifact can be read. 
It is possible to create new versions without permission if the maintaining peer allows it. However, these versions written without authorization can be reverted to the latest authorized version once the malicious behavior has been detected.
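To illustrate the access decision described above, the following minimal sketch shows how a peer that already holds an artifact could evaluate the policy list against the requestor's credentials and the artifact's attributes. The class and attribute names ("role", "status") are hypothetical stand-ins and do not reproduce the actual policy structure of PlatinVC.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of an attribute-based access decision. Attribute names and
// the Policy structure are illustrative assumptions, not the PlatinVC format.
public class PolicyCheck {

    enum Right { READ, WRITE }

    static class Policy {
        final Map<String, String> requiredUserAttrs;
        final Map<String, String> requiredArtifactAttrs;
        final Right grantedRight;

        Policy(Map<String, String> user, Map<String, String> artifact, Right right) {
            this.requiredUserAttrs = user;
            this.requiredArtifactAttrs = artifact;
            this.grantedRight = right;
        }

        // A policy entry applies if all of its required attributes are present.
        boolean grants(Map<String, String> userAttrs, Map<String, String> artifactAttrs, Right wanted) {
            return wanted == grantedRight
                    && userAttrs.entrySet().containsAll(requiredUserAttrs.entrySet())
                    && artifactAttrs.entrySet().containsAll(requiredArtifactAttrs.entrySet());
        }
    }

    // The peer holding the artifact walks the extendible policy list.
    static boolean isAllowed(List<Policy> policyList, Map<String, String> userAttrs,
                             Map<String, String> artifactAttrs, Right wanted) {
        for (Policy p : policyList) {
            if (p.grants(userAttrs, artifactAttrs, wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> developer = new HashMap<>();
        developer.put("role", "developer");

        Map<String, String> draftArtifact = new HashMap<>();
        draftArtifact.put("status", "draft");

        List<Policy> policyList = List.of(
                new Policy(Map.of("role", "developer"), Map.of(), Right.READ),
                new Policy(Map.of("role", "developer"), Map.of("status", "draft"), Right.WRITE));

        System.out.println(isAllowed(policyList, developer, draftArtifact, Right.READ));   // true
        System.out.println(isAllowed(policyList, developer, draftArtifact, Right.WRITE));  // true: draft artifacts are writable
    }
}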

Access Control in PlatinVC A storing peer, be it a maintainer or a replica, is assigned randomly. As it can be a malicious peer, the storing peer itself should not be able to read the stored artifacts unless it is authorized. Thus the artifacts should be stored encrypted. To be able to use efficient delta compression, however, the artifacts have to be readable. When access control is desired, artifacts therefore have to be version controlled in their encrypted form, using binary deltas. Alternatively, the artifacts could be stored in encrypted bundles, one for each snapshot, so that multiple snapshots can be sent in the form of consecutive encrypted bundles.
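The following minimal sketch illustrates the encrypted-bundle alternative: a serialized snapshot bundle is encrypted with the shared symmetric key before it is handed to a possibly untrusted storing peer. The sketch assumes standard AES-GCM from the Java class library; how bundles are serialized and how the key is distributed among the authorized peers is left out.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

// Minimal sketch: encrypt one snapshot bundle with the shared symmetric key.
public class SnapshotBundleCrypto {

    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // The random IV is prepended to the ciphertext so the reading peer can
    // decrypt without extra metadata.
    static byte[] encrypt(SecretKey key, byte[] bundle) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(bundle);
        byte[] out = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
        return out;
    }

    static byte[] decrypt(SecretKey key, byte[] stored) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, Arrays.copyOfRange(stored, 0, IV_BYTES)));
        return cipher.doFinal(Arrays.copyOfRange(stored, IV_BYTES, stored.length));
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey(); // shared among readers
        byte[] bundle = "serialized snapshot bundle".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encrypt(key, bundle);                          // what the storing peer sees
        System.out.println(new String(decrypt(key, stored), StandardCharsets.UTF_8));
    }
}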

If a malicious peer refuses to store or provide a maintained artifact, it gets blacklisted as described in the preceding section and is replaced by another peer.

9.2.3 Storage

The storage manager component encapsulates different storage implementations. Currently, only the DHT mechanism is implemented. We needed this storage form in an earlier version of PlatinVC, which controlled the evolution of single artifacts only (and did not support snapshot version control). Currently it is used by the message manager only, to store offline messages. All artifacts in PlatinVC are stored locally using Mercurial's mechanisms.
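As a rough illustration of this abstraction, the sketch below separates a storage interface from a DHT-backed implementation. The interface and method names are illustrative assumptions rather than the actual PlatinVC API, and the DHT is stubbed with a local map instead of routing requests through the overlay.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a pluggable storage abstraction.
interface StorageBackend {
    void put(String key, byte[] value);
    byte[] get(String key);
}

// Stand-in for the DHT mechanism; a real implementation would route put/get
// requests through the structured peer-to-peer overlay instead of a local map.
class DhtStorage implements StorageBackend {
    private final Map<String, byte[]> table = new ConcurrentHashMap<>();
    public void put(String key, byte[] value) { table.put(key, value); }
    public byte[] get(String key) { return table.get(key); }
}

public class StorageManager {
    private final StorageBackend backend;

    StorageManager(StorageBackend backend) { this.backend = backend; }

    // The message manager uses the storage manager to park offline messages.
    void storeOfflineMessage(String recipient, byte[] message) {
        backend.put("offline/" + recipient, message);
    }

    byte[] fetchOfflineMessage(String recipient) {
        return backend.get("offline/" + recipient);
    }

    public static void main(String[] args) {
        StorageManager mgr = new StorageManager(new DhtStorage());
        mgr.storeOfflineMessage("bob", "hello".getBytes());
        System.out.println(new String(mgr.fetchOfflineMessage("bob")));
    }
}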

9.3 user level applications

The components depicted in the upper half of Figure 51 are user level applications. These components directly interact with a user's input, but not exclusively. Like all components, their services can be used by other components as well. An example of this is visible in the relationships of the component PlatinVC: it is not only used by a user but by the component Piki as well. All user level applications consist of two, sometimes three subsystems: the core component and a graphical user interface (GUI), which is integrated in Eclipse, stand alone, or both. The 'use' relationship of all user level applications to Eclipse shown in Figure 51 delineates the utilization of Eclipse's GUI framework. The four user level applications are presented in Section 9.3.1 and the following sections.

9.3.1 Eclipse IDE

Many tools are needed in an integrated development environment (IDE), mainly an editor to write source code and a compiler. All modern IDEs, however, integrate multiple additional tools, e.g., a debugging tool that allows step-wise code execution. None of these tools need to communicate with another machine. To provide this essential basic support we chose to integrate the IDE most used in industry and private projects: the Eclipse IDE [Fou]. Eclipse itself is based on the OSGi implementation Equinox. Seen from a different point of view, we integrated our components into Eclipse.

Figure 54: PlatinVC as a stand alone application

Nevertheless, none of our components depends on Eclipse; they can all run using a stand alone GUI implementation. As Eclipse and its integrated or extensible tools are not contributions of our work, we refer the reader to Eclipse's documentation [Fou].

9.3.2 PlatinVC - a Peer-to-Peer based Version Control System

PlatinVC is the most sophisticated component and the one on which most of our development effort was spent. It is seamlessly integrated into Eclipse, as shown in Figure 58, but can also be used as a stand alone product, as shown in Figure 54. PlatinVC solely depends on the communication management. Earlier versions also needed the storage mechanism, but since we based our version control mechanisms on the third party product Mercurial, we store all artifacts locally with the operations it offers.

Figure 55: Settings for PlatinVC (running in OS X)

The stand alone version of PlatinVC has a smaller footprint in startup time and memory consumption than the Eclipse integrated version1. If a developer uses a development environment other than Eclipse, it is preferable to run the stand alone version.

1 Measured on a 2.2 GHz Intel Core 2 Duo MacBook running OS X 10.6.3: Eclipse integrated: 11 seconds startup time, 155 MB RAM consumption; stand alone: 6 seconds startup time, 121 MB RAM consumption.

Unlike other version control tools, and like all peer-to-peer based applications, PlatinVC has to run continuously on a developer's machine, not just when the developer needs its services. Otherwise the developer's machine cannot offer its services to other machines, which is the fundamental principle of peer-to-peer based applications. Having PlatinVC integrated in Eclipse ensures that a developer who uses Eclipse cannot forget to start it. Providing a less resource consuming stand alone version gives developers who do not use Eclipse an incentive to let it run in the background. As detailed before, PlatinVC would also work if developers only turned it on when they needed it and shut it down afterwards. However, the greater the number of developers who do so, the less efficient and stable PlatinVC becomes. The actual consistency might drop to causal consistency, with some versions being unretrievable in the worst case. When running PlatinVC, a few settings, shown in Figure 55, can provide a better experience. If none of those settings are set, default values are taken. The default value for the time-outs is 60 seconds, but should be fine-tuned for the network environment PlatinVC is operating in.

9.3.3 Piki - a Peer-to-Peer based Wiki Engine

Figure 56: Piki as a stand alone application (running in OS X)

Piki is our implementation of a wiki engine that uses our version control system, as presented in Figure 51. Like any other component, Piki can be run as a stand alone application; a screenshot is shown in Figure 56. Besides being used as a wiki engine (see Section 2.2), it can be used for global software development (see Section 2.3) as well. In project development, sharing expertise and knowledge is often crucial in order to fulfill certain sub-tasks (compare [HM02, HPB05]). Piki can be used as a general knowledge management application (or knowledge database) or, more specifically, as a requirements engineering tool, as proposed by [GHHS07b, GHHR07b]. Figure 57 shows Piki running as a view integrated in the Eclipse IDE. The features offered by the other components can be used here, such as the version history or the diff view, which shows the difference between any two versions. Having a full blown version control system allows one to create variants of articles as well as to handle editing conflicts.

Figure 57: Piki integrated in Eclipse (running in Windows)

9.3.4 Askme - Peer-to-Peer based Aware Communication

Awareness Support Keeping Messages Environment (Askme) is our tool that complements a global software development environment with communication facilities. Several investigations of distributed projects ([Šmi06, HM03, HPB05, Sou01]) pointed out that communication is one of the most critical factors for success. Communication should take place directly between participants (i.e., without an intermediary like a project manager). We developed Askme as a replacement for e-mail and instant messaging during project work. A screenshot showing Askme integrated in Eclipse and interacting with PlatinVC can be found in Figure 58. In the view, the contact list is shown in the upper right corner and the messaging window is visible at the bottom. Askme automatically highlights the author of the artifact currently visible in the editor. In the example presented in Figure 58 the highlighted author is Sebastian. Our intention was to enable floor conversation: When working on an artifact, meeting its last author on the floor or during a coffee break reminds a developer that he can ask this author questions. When working in physically separated locations, highlighting the last author should produce the same psychological effect. A message can be sent to any person in the contact list and is delivered immediately. If the communication partner is currently unavailable, the message is stored by the peers in his neighborhood, so that the original recipient receives the message once he is online, independent of the sender's online status.

We call this behavior, which is a mixture of classical instant messaging and asynchronous e-mail exchange, offline messaging. Both mechanisms, highlighting the last author and offline messaging, are novel approaches and were presented in [MKS08].
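The following minimal sketch illustrates the offline messaging behavior just described: a message is delivered directly if the recipient is reachable, and otherwise parked at the peers in the recipient's overlay neighborhood until he rejoins. All classes and method names are illustrative stand-ins, not the actual Askme interfaces.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of offline messaging via neighborhood peers.
public class OfflineMessagingSketch {

    static class Peer {
        final String id;
        boolean online;
        final List<String> inbox = new ArrayList<>();              // delivered messages
        final Map<String, List<String>> parked = new HashMap<>();  // messages held for others
        Peer(String id) { this.id = id; }
    }

    // Sender side: deliver directly or store at the recipient's neighbors.
    static void send(Peer recipient, List<Peer> neighborhood, String msg) {
        if (recipient.online) {
            recipient.inbox.add(msg);
        } else {
            for (Peer n : neighborhood) {
                n.parked.computeIfAbsent(recipient.id, k -> new ArrayList<>()).add(msg);
            }
        }
    }

    // Recipient side: on (re)joining, collect parked messages from the neighbors.
    static void join(Peer recipient, List<Peer> neighborhood) {
        recipient.online = true;
        for (Peer n : neighborhood) {
            List<String> pending = n.parked.remove(recipient.id);
            if (pending != null) {
                for (String m : pending) {
                    if (!recipient.inbox.contains(m)) {
                        recipient.inbox.add(m); // avoid duplicates held by several neighbors
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        Peer bob = new Peer("bob");
        List<Peer> neighbors = List.of(new Peer("n1"), new Peer("n2"));
        send(bob, neighbors, "please review snapshot 42");  // bob is offline: message is parked
        join(bob, neighbors);                               // bob joins and receives it
        System.out.println(bob.inbox);
    }
}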

9.4 summary

We saw that PlatinVC is only a part of the project support environment Pipe shown in Figure 51. Pipe is a modular framework that encapsulates services in components. The standardized inter-component communication is handled by the OSGi implementation Equinox. This lightweight implementation allows the exchange of any component while the system is running and can combine components written in any compatible language. In addition to PlatinVC, which is the most elaborate component, we developed a component that handles synchronous and asynchronous communication and brings some awareness of the coworkers' related contributions (Askme). Another major component is the security manager, which handles secure message exchange and provides a role based access control mechanism. The prototypical framework presented in this chapter can be extended with various functionalities on a pluggable component basis.

Figure 58: PlatinVC and Askme integrated in Eclipse (running in OS X)

EVALUATION 10

Following the previous chapter, in which our system, PlatinVC, was elaborated in detail, this chapter presents its evaluation, organized into six sections: specification of the evaluation goals (Section 10.1), choice of the evaluation methodology (Section 10.2), choice of the workloads (Section 10.3), discussion of the evaluation results (Section 10.4), comparison to the performance of related systems (Section 10.5), and a summary (Section 10.6).

10.1 evaluation goals

The main goal of the evaluation is to examine whether the quality of the proposed system corresponds to the set of given requirements. In Chapter 3 we discuss the functional and non-functional requirements and security aspects. In our evaluation we focus on non-functional requirements, especially consistency (described in detail in Section 4.4), scalability, robustness, and fault tolerance. Our extensive tests showed that the functional requirements are fulfilled, as all features function as specified. A comparison of the offered features is presented in Table 5. We quantified the non-functional requirements using appropriate metrics. The mapping between quality aspects and metrics is described in the following subsections.

10.1.1 Quality Aspects

A quality aspect describes how well a system performs under a specific workload. A quality aspect is like an invariant - it is always present. Under specific workloads, however, quality aspects become more clearly visible, and those workloads are chosen as tests for the corresponding quality aspects. An example of such a workload is the simultaneous failure of many machines, which demonstrates robustness. All workloads used in the evaluation of PlatinVC are listed and explained in Section 10.3. We evaluated the following:

• quality aspects
  – the degree of the achieved consistency
  – the freshness of retrieved updates
  – the load introduced by the system operations (pushes and pulls)
  – the scalability of the system
  – the robustness of the system
  – the performance of the system

• and system operations:
  – O-9: Push globally
  – O-6: Pull globally
  – O-5: Pull globally using specified folders

A quality aspect cannot be quantified directly, but only with a set of metrics.


Metric A metric is a measurable behavior of a system, e.g., the number of messages occurring in a specified time span. A metric itself, or a mathematical combination of several metrics, quantifies one or more quality aspects. Most metrics show a trend in a system rather than giving a definitive statement regarding a corresponding quality aspect. A comparison of metric values between different solutions can describe a quality aspect. A well known example for measuring the size of software is the metric lines of code. By comparing this value across different code bases, quality aspects like the maintainability of software can be quantified.

10.1.2 Metrics

For our evaluations of PlatinVC the following mappings between observed quality aspects and metrics are used:

Consistency Degree: is measured by surveying which snapshots are pulled after a specific snapshot was retrieved for the first time. Following the definition in Section 4.4, once a snapshot was pulled, no older snapshot may be pulled by a subsequent pull request; otherwise only eventual consistency would be provided. If there is more than one latest snapshot at any moment (as in the case of branches), all those snapshots have to be retrieved to guarantee sequential consistency. If only some of them are retrieved, the consistency degree is causal consistency. N.B. A resultless request is considered to have failed and does not influence the provided consistency degree.
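A minimal sketch of how this metric can be computed from a recorded pull trace is given below. Snapshots are modelled as integer heights and each successful pull result as a set of retrieved heads; the real metric operates on Mercurial snapshot identifiers, so this representation is an illustrative assumption.

import java.util.Collections;
import java.util.List;
import java.util.Set;

// Minimal sketch of classifying a pull trace against the consistency
// definition above. Failed (resultless) pulls are assumed to be excluded.
public class ConsistencyDegree {

    enum Degree { SEQUENTIAL, CAUSAL, EVENTUAL }

    // pulls:       one entry per successful pull, the snapshot heights it retrieved
    // headsAtPull: the head snapshots that actually existed when that pull was issued
    static Degree classify(List<Set<Integer>> pulls, List<Set<Integer>> headsAtPull) {
        int newestSeen = Integer.MIN_VALUE;
        boolean allHeads = true;
        for (int i = 0; i < pulls.size(); i++) {
            Set<Integer> result = pulls.get(i);
            int newestInPull = Collections.max(result);
            if (newestInPull < newestSeen) {
                return Degree.EVENTUAL;            // an older state was returned after a newer one
            }
            newestSeen = Math.max(newestSeen, newestInPull);
            if (!result.containsAll(headsAtPull.get(i))) {
                allHeads = false;                  // a parallel head (branch) was missed
            }
        }
        return allHeads ? Degree.SEQUENTIAL : Degree.CAUSAL;
    }

    public static void main(String[] args) {
        List<Set<Integer>> pulls = List.of(Set.of(3), Set.of(4, 5), Set.of(5));
        List<Set<Integer>> heads = List.of(Set.of(3), Set.of(4, 5), Set.of(5));
        System.out.println(classify(pulls, heads)); // SEQUENTIAL
    }
}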

Freshness: describes the time that passes until a freshly pushed snapshot is retrievable by any peer. We quantified this quality aspect by measuring the time between the start of a successful push operation and the start of a pull operation that retrieved the corresponding snapshot. This turned out to be difficult to measure in our experiments, as it was hard to know when a pull request should be started and with which frequency subsequent pull requests should be executed. If the first pull request already retrieved the just pushed snapshot, it was executed too late, and the measured timespan could have been smaller had we executed the pull operation earlier. This leads to a less accurate value for the freshness metric. If we start the pull requests too early, none could retrieve the just pushed changes. Moreover, the more pull requests we execute in a short time period, the more we stress the system.

System Load: can be measured using various metrics. The size and the sum of the transferred messages give an indication of the overhead or load of PlatinVC. The number of messages alone does not quantify the load of the system in a meaningful way, as small messages are transmitted quickly and do not load the system as much as large messages, which use up the bandwidth of the system's participants.

Scalability and Robustness: are both inherent quality aspects that are especially important during changes of the network size and under the dynamics of a system (e.g., failures), respectively. These quality aspects are measured by comparing the system's performance under different scenarios: different network sizes to quantify scalability, and different user participation (churn behavior) to quantify robustness. How much the different performance indicators vary gives an indication of the system's scalability and robustness. The performance indicators are, for example, the percentage of successfully completed operations and the time the operations needed to complete.

Performance: is measured by the time needed to execute the operations of PlatinVC. We evaluated the prototype of our system from a user centric point of view. Therefore we measure the completion time of an operation (push or pull) from the moment a user initiates the operation to the moment the user receives an acknowledgement that the operation has been successfully completed.

However, a version is available earlier (for another user), i.e., from the moment it is stored in the peer-to-peer module of the maintaining peer. As we observe the operation from a user perspective, we do not regard it as completed until the user receives the final acknowledgement.

10.2 evaluation methodology

A system evaluation in computer science can be done analytically, through simulation, in a testbed, or using real systems with real users. These evaluation methods are listed according to the accuracy of their results, from the least to the most accurate. The analytical evaluation method is the cheapest in terms of hardware and manpower costs. It relies on mathematical models of system, users, and workload that provide a prediction of the system behavior based on the input parameters. However, complex and highly dynamic systems, such as peer-to-peer systems, can only be analyzed when many important details and influencing factors are heavily abstracted in those models. Therefore many effects can be hidden in the evaluation results. The evaluation process itself is, however, fast and easy once a model is developed. We did not use the analytical method for the evaluation of PlatinVC.
The simulative evaluation method is more accurate, as the models are not expressed mathematically but in software. It allows for more detailed models, which simulate the real system and its environment (workload, users, underlying network) much more accurately. Those models are integrated in a simulator, which provides more flexibility in the experiments' setup, e.g., changes of user behavior, more complex experiment timelines, etc. A strong point of the simulative approach is scalability - thousands of machines can be simulated in software without the need to deploy the actual hardware. A survey [ISHGH07] among 744 IT professionals who work in GSD projects indicates that the number of participants is usually well below a hundred people (a more detailed conclusion can be found in Section 10.3). Focusing on the GSD scenario (presented in Section 2.3), we do not need the ability to evaluate thousands of peers. In the wiki scenario (presented in Section 2.2) there are certainly more participants, but the quality expectations are lower than in the GSD scenario. Therefore, the main benefit of simulation as an evaluation methodology is not decisive for the evaluation of PlatinVC. Additionally, the simulative approach still abstracts from a real system and is thus not as accurate as evaluations of a running system. During the development of our prototype, however, we used the peer-to-peer simulation framework PeerfactSim.KOM [KKM+07] to evaluate single mechanisms, such as different commit protocols. In this way we compared the impact of different design decisions.
The most accurate evaluation method is measuring the actual quality of a real system with real users. This method is the most time consuming and needs the most resources with regard to hardware and manpower. As the previous chapter, Chapter 9, described, we developed a real prototype of PlatinVC. All developers of PlatinVC used it for version control; however, this is not enough to claim that we had real users in our evaluations. Instead, we modeled user behavior derived from captured real user behavior in software development (see Section 10.3). We evaluated PlatinVC using a testbed. While the real prototype software was running on the machines in our lab, the user behavior was scripted with predefined events. This allowed us to repeat the same experiment setup multiple times for the sake of statistical correctness of the results. The choice of this evaluation methodology showed its benefits very early: We discovered some design mistakes in the initial experiment that were not visible during our daily work with PlatinVC.
Some of the behavior we initially assumed while designing the system proved to be wrong under certain extreme conditions. That forced us to refine some mechanisms implemented in PlatinVC. With the other mentioned approaches, these incorrect assumptions might never have been discovered, as they would have been integrated in our models. In the next subsections, the details of the evaluation environment and platform are described.

10.2.1 Evaluation Environment

We carried out our experiments in two labs with connected computers and used 37 machines to run our experiments. Their hardware configuration was as follows:

• first lab
  – 9 PCs with an Intel Core2Duo CPU clocked at 2.66 GHz, with 4 GB RAM
  – 12 PCs with an Intel Pentium P4 CPU clocked at 3 GHz, with 2 GB RAM

• second lab
  – 3 PCs with an Intel Pentium P4 CPU clocked at 3 GHz, with 3 GB RAM
  – 13 PCs with an Intel Pentium P4 CPU clocked at 2.8 GHz, with 2 GB RAM

The machines are connected to a switch in each lab; the two labs are connected with each other and to the Internet via a switch. While this topology represents an ideal case, we were still able to see the trends of the prototype clearly. The operating system on all machines was Windows XP with Service Pack 3. Additional software required was Java in version 1.6.0_20 and Mercurial in version 1.5.1. In the experiments where we ran multiple instances of our prototype on a single machine, we deployed more instances on the stronger machines. Again, running multiple instances on one machine does not result in a realistic network topology, but gives a good approximation of the behavior in a real network. The uneven upload and download bandwidth of a peer in a wide area network (WAN) can be neglected, as the messages' payload was smaller than the minimum bandwidth available in this setup (assuming that the minimum bandwidth for uploading a message would be 16 kbps; the messages in our system were at most 15 kbit in size). The small message size is explained by the fact that only deltas to existing versions are transferred.

10.2.2 Evaluation Platform

In order to run controlled, repeatable, and accurate experiments, we developed an evaluation platform in [CG09]. It is a distributed software system that consists of a controller and a client application. After an instance of the client application is started on all machines, an instance of the controller application distributes the following to all machines:

• the prototype of PlatinVC that is subject to evaluation,

• the data needed for the experiment, i.e., the module that stores the version history manipulated in the experiment,

• the user input into the prototype, i.e., the actions it performs.

The user input was predefined in an XML file, as detailed in Section 10.3.3. This evaluation platform allowed us to repeat an experiment under the same conditions, with the same user input, in an easy and controllable fashion. Each running instance of our prototype was measured by the accompanying client application of the evaluation platform. To avoid interference with the evaluation measurement, the distributed parts of the platform did not communicate during the experiment runtime. Recorded values, such as the result and runtime of executed operations, were stored in local log files, which were collected after a specified experiment end-time. We took into account unsynchronized local clocks and different network delays to make sure that the experiment started, executed, and ended on every machine at the same time.
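To illustrate how such a scripted experiment could look, the sketch below parses a hypothetical action script and prints the resulting timeline. The element and attribute names are invented for illustration; the actual schema used by our evaluation platform is not reproduced here.

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import java.io.StringReader;

// Minimal sketch of reading a scripted experiment timeline from XML.
public class ExperimentScript {

    public static void main(String[] args) throws Exception {
        String xml =
            "<experiment duration='10800'>" +                        // 3 hours, in seconds
            "  <action peer='5'  op='push' atSecond='120'/>" +
            "  <action peer='12' op='pull' atSecond='124'/>" +
            "  <action peer='7'  op='pull' atSecond='124.5'/>" +
            "</experiment>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList actions = doc.getElementsByTagName("action");
        for (int i = 0; i < actions.getLength(); i++) {
            Element a = (Element) actions.item(i);
            // A real client would schedule the operation on the local prototype
            // instance; here we only print the parsed timeline.
            System.out.printf("t=%ss: peer %s executes %s%n",
                    a.getAttribute("atSecond"), a.getAttribute("peer"), a.getAttribute("op"));
        }
    }
}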

10.3 workload

The workload we used in the experiments consists of the following aspects:

• size of the network,

• data used for versioning in the evaluated prototype, the module that stores the version history manipulated in the experiment,

• user behavior, and

• churn model.

In the following subsections we will discuss each of those aspects individually.

10.3.1 Number of Users

A survey [ISHGH07] shows that the number of users in the GSD scenario (see Section 2.3) is significantly smaller compared to other peer-to-peer application scenarios, like file sharing. The working force of a project was given in person-months. Most projects (33%) have a volume of under 10 person-months, the second largest group (22%) 10-20 person-months. While there are projects with hundreds of person-months (16% with 100-1,000 person-months, 3% with more than 1,000 person-months), the average project size is 84 person-months. Two person-months could stand for one person who worked for two months, or two persons working for one month. A project's runtime was not provided, so we can only approximate the actual number of software developers involved. Assuming a 6 month project runtime, which is in most cases the minimal runtime of a project, we can conclude that, on average, 14 persons are simultaneously working on a GSD project. Therefore, and with regard to the physical limitation of real machines, we set up our experiments with

• 3 (all peers are replicas of each other) users,

• 15 (typical project size in real GSD projects) users,

• 37 (limit of available machines for our experiments) users,

• and 100 (3 instances running on a single machine) users.

We did not scale our experiments to more than 100 users, as the results would have been less accurate. The reason is that, with 100 users, we deployed three instances of our prototype on the faster machines and two on the slower computers. Deploying multiple instances on the same machine does cause moderate undesired side effects, such as delayed process execution resulting from overloaded processors.

10.3.2 Experiment Data

We used the version history of the open source project gcc1 as a realistic project module. Subversion is used in the development of gcc; however, there is a Mercurial mirror module accessible under http://gcc.gnu.org/hg/gcc/trunk, which we used. The module is 654.7 MB in size and stores 99,322 snapshots. The snapshots cover the project's history from the 23rd of November 1988, the project's start date, to the 12th of April 2010, two days before the release of version 4.5.0. A median of five files, each in a different folder and belonging to two branches in the directory tree, was changed per snapshot. In our experiments every peer had an initial copy of this module, so that only updates made during the experiment were transferred.

1 http://gcc.gnu.org/

10.3.3 User Behavior

We derived the user actions from the log of the gcc project. We chose a busy timeframe of three hours to replay the commit commands which happened on Monday, the 12th of April 2010; version 4.5.0 of gcc was released two days later. From the log, however, we could only retrieve the commit operations that were executed using Subversion. For 50% of the commit operations in the log we executed operation O-9: Push globally in our experiments. The other 50% of the commit operations were first committed locally and pushed globally when a subsequent commit occurred. The update operation is not recorded in a Subversion log, therefore we decided ourselves when to pull for the latest versions. In order to evaluate the freshness, we executed many pull operations in a short time: Starting four seconds after the initiation of a push command, every 0.5 seconds a (different) random user pulls the latest changes, 20 times in total, which is equal to a period of ten seconds. The pulling users use operation O-6: Pull globally in one setup and operation O-5: Pull globally using specified folders in another. The concrete scenario is detailed with each result in the following subsection.
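The probing schedule just described can be sketched as follows. The timing constants (4 second offset, 0.5 second interval, 20 probes) follow the description above, while the peer selection and the scheduling representation are illustrative assumptions.

import java.util.Random;

// Minimal sketch of the pull probing used to measure freshness. In the
// experiments a different user was chosen for each probe; a plain random
// choice stands in for that selection here.
public class FreshnessProbeSchedule {

    public static void main(String[] args) {
        double pushStart = 120.0; // experiment time of the push, in seconds (example value)
        int numberOfPeers = 15;
        Random rnd = new Random(42);

        for (int i = 0; i < 20; i++) {
            double atSecond = pushStart + 4.0 + 0.5 * i;
            int peer = rnd.nextInt(numberOfPeers);
            // Freshness is the delay between pushStart and the first probe
            // whose pull actually returns the pushed snapshot.
            System.out.printf("t=%.1fs: peer %d pulls the latest changes%n", atSecond, peer);
        }
    }
}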

10.3.4 Churn Model

We used two models to emulate churn in our experiments. As detailed in Section 2.3, we assumed a mixture of open source developers and professional developers working on a GSD project. Open source developers (as well as wiki users) act similarly to the captured behavior of file-sharing users. They tend to be online for a short amount of time; the longer they stay online, the less likely it is that they go offline. This behavior was analyzed in [SENB07a]. The online time of a user follows a Weibull distribution with the parameter values 357.7152 for scale and 0.54512 for shape. With these parameters the Weibull distribution is similar to an exponential distribution, but its tail decreases more slowly. We reused the implementation of this churn behavior provided by PeerfactSim.KOM [KKM+07]. To model the behavior of professional developers we introduced a second churn model. We assume that the arrival time of professional developers at their working place follows a normal distribution and that they stay online during their normal working hours. With the same normal distribution they switch off their machines and finish their working day. In a real company the developers are often directed not to switch off their machines. Figure 59 visualizes how we emulated churn in our experiments. Under churn, 25% of the users leave and join the peer-to-peer network according to the Weibull distribution, which represents the open source developers (the topmost curve in the graphic). In the first hour only a few open source developers are online; some are leaving the system and some are joining. The longer an open source developer stays online, the less likely it is that he will leave, thus the number of developers online rises over time. The rest represents professional developers in different time zones, with different working hours. As our experiments ran for 3 hours and did not span an entire day, we decided that 25% of the developers stay online (not depicted in the graphic), while 25% leave the system and the remaining 25% join, to capture the effect of different time zones. The leaving developers start to leave after an hour of runtime and leave within the timespan of an hour. In the same timespan, the joining developers begin their work and have all joined after another hour. They then stay online until the end of the experiment, while the developers who left stay offline. Only the open source developers, whose behavior is modeled using the Weibull distribution, leave and rejoin the network in an alternating fashion.
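For illustration, an online time following this Weibull distribution can be drawn by inverse-transform sampling, as in the minimal sketch below. The time unit of the sampled value follows the parameterization in [SENB07a] and is not restated here.

import java.util.Random;

// Minimal sketch: sample a peer's online time from the Weibull distribution
// with scale 357.7152 and shape 0.54512 via inverse-transform sampling.
public class WeibullOnlineTime {

    static final double SCALE = 357.7152;
    static final double SHAPE = 0.54512;

    static double sampleOnlineTime(Random rnd) {
        double u = rnd.nextDouble();                       // uniform in [0,1)
        return SCALE * Math.pow(-Math.log(1.0 - u), 1.0 / SHAPE);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 5; i++) {
            System.out.printf("online time sample %d: %.1f%n", i, sampleOnlineTime(rnd));
        }
    }
}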

10.3.5 Experiment Timeline

Each experiment ran for 3 hours of real time with a repeating sequence of push and pull commands. We carried out 20 repetitions of the same setup in order to meet statistical correctness of the experiments.

[Figure 59 plots the percentage of users online over the 3 hour experiment time for three groups: 25% churn according to the Weibull distribution (leaving and joining), 25% leaving without joining again, and 25% joining without leaving afterwards.]

Figure 59: Churn execution in our experiments

Within these 20 repetitions we varied only the number of users (3, 15, 37, 100). The measurements were executed after a start-up time, when all peers had joined. We compared three different setups: A third of our experiments ran with no leaving peers, another third with the churn model described in Section 10.3.4, and the last third with failing peers. In the last setup 50% of the peers left unannounced (failed) simultaneously in the middle of the experiment.

10.3.6 System Parameter Settings

We parameterized PlatinVC in all experiments with the following settings:

• The number of replica peers was set to 3. N.B. Most peer-to-peer applications use five peers. As detailed in Chapter 8 we store the data in a more redundant manner than, e.g., file sharing solutions. Thus it can be recovered more easily in the case that all replica peers fail in the same time window. However, to avoid this situation the number of replicating peers should be higher in a network with more churn.

• The time-outs were set to 30 seconds for all pull operations and 60 seconds for all push operations. That is 6 times the duration these operations needed to complete in preliminary tests.

10.4 evaluation results

The final results of our testbed experiments are detailed in this section. Each subsection discusses one quality aspect, with the exception of the aspects scalability and robustness, which are presented in every experiment result. Both of these quality aspects are inherent aspects of a system, which are present all the time. To show the scalability of PlatinVC, we repeated each experiment setup with a different number of participating peers. To address robustness, we executed two thirds of the experiment duration using the churn model described in Section 10.3.4. The graphs present the mean value measured in each experiment, where the confidence intervals represent the variation between different experiments.

[Figure 60 plots the sequentially consistent pulls (%) over the experiment size (peers), comparing churn vs. no churn (a) and failures vs. no failures (b).]

(a) with and without churn (b) with and without simultaneous failure of 50% of the peers

Figure 60: Degree of consistency

10.4.1 Consistency Degree

Figure 60 shows that our assumption about the degree of consistency PlatinVC provides was pessimistic. When the network is stable, sequential consistency is guaranteed in 100% of the cases: no pull operation retrieved an outdated snapshot. N.B. Consistency is calculated only for the successful operations, as presented in Section 4.4. Figure 61 shows the relation of successful and unsuccessful operations, where unsuccessful operations did not retrieve any snapshots. Even when half of the participating peers fail simultaneously, as presented in Figure 60b, PlatinVC maintains its sequential consistency. Although a large number of maintaining and replicating peers failed and some snapshots got lost with them, the peer who pushes a new snapshot stores all basis snapshots as well and reintroduces them into the system. Only if a snapshot has been pushed, successfully pulled once, and lost due to failing peers before it could be pulled a second time would the consistency degree drop, which did not happen in any of our experiments. Due to the fact that all peers fail at the same time, either all snapshots or none are retrieved (which is the criterion for sequential consistency). Consulting Figure 61, we can see that no snapshots were retrieved in only 8% of the operations under the worst circumstances. Introducing churn, as described in Section 10.3.4, decreases the number of sequentially consistent pulls to 93% in the worst case scenario. The remaining 7% proved to be still causally consistent. In contrast to the previous scenario, users are joining as well, and peers are failing at different times. Thus it is more likely that one snapshot can be retrieved while the retrieval of a second snapshot in a branch fails, leading to causal consistency. With a rising network size the percentage of sequentially consistent pulls drops, because more peers act as maintainers and are involved in the version control operations. If all snapshots were concentrated on a single peer, either the peer is available and all snapshots can be retrieved, or none can, thus complying with the definition of sequential consistency. When the snapshots are distributed among multiple maintainers and only one of them is unavailable, all but one snapshot are retrieved and only causal consistency is provided. These results prove the robustness and scalability of PlatinVC regarding its consistency.

10.4.2 Robustness

Figure 61 shows that a number of push and pull operations failed to complete in our experiments. For more than 15 peers, less than 2% of the operations failed even without leaving or joining peers. A closer look at the evaluation results revealed that higher timeout values would have prevented these operations from failing. In the experiments where half of the peers failed at once we experienced aborted operations in all setups, from 2.5% to 8%, following a logarithmic curve. Similarly to the degree of consistency, the operations are less successful under churn than when peers fail at once, as the returning peers in the churn scenario are contacted but are not yet updated; thus the operation fails.

[Figure 61 plots the successful operations (%) over the experiment size (peers) for the settings churn, no churn, and failures.]

Figure 61: Percentage of successful operations

When peers fail without returning, the remaining peers take over, already up to date due to the replica maintenance (see Section 8.4.6). Again, with a growing network size more operations are aborted, because the repository is distributed among more peers, which become unavailable. In our experiments with a network size of up to 37 real machines, only 25% of the operations failed in the worst case scenario and would have to be repeated. The slow growth of the percentage of unsuccessful operations proves the robustness of PlatinVC.

10.4.3 Freshness

Figure 62 shows that a snapshot is available, at the latest, 6 seconds after it was pushed. As the freshness value only rises slightly, we can conclude good scalability of PlatinVC regarding its freshness.

[Figure 62 plots the freshness in seconds over the experiment size (peers), comparing churn vs. no churn (a) and failures vs. no failures (b).]

(a) with and without churn (b) with and without failing peers

Figure 62: Freshness of pushed updates (time till available)

[Figure 63 plots the operation time in seconds for push and pull over the experiment size (peers).]

Figure 63: Time needed to execute the push/pull operation

The peer-to-peer system dynamics caused by churn or by the simultaneous failure of peers proved to have only an insignificant influence on the freshness of the retrieved updates. The values measured for 100 peers indicate that running multiple instances of our system on a single machine disturbs our measurement: whenever an instance pulls a snapshot pushed by an instance on the same machine, the communication is very fast, as messages are not sent over the network connection. In the experiments where all peers were stable (no churn and no failing peers) we can see that there is no significant difference between the results of the experiment with 37 peers and the one with 100 peers, both running on 37 computers. A closer look at the experiment log files revealed that even in the experiment with 100 peers no two peers on the same machine communicated. In the experiments where we introduced churn and failing peers it happened that the replacing peers were instances on the same machine, which communicated with each other, leading to even faster updates.

10.4.4 Duration of Push and Pull

As detailed in Section 8.4.4, there is always a trade-off when designing a version control system: A performance increase of one operation decreases the performance of the other, in the case of the sharing and retrieving operations. Figure 63 clearly reflects our design decision to favor the performance of the pull operation, as it is the more frequently used operation. Executing the push operation takes between 4 and 5 seconds and can rise to up to 9 seconds when the system grows. The very small amount of time needed for a pull operation, under 2 seconds and increasing with the network size to up to 4 seconds, is influenced by situations in which the peer that is a replicating or maintaining peer already received the latest snapshots before it executed the pull operation. In our experiments a snapshot consisted of changed files in five different folders. With two replicating peers and a maintainer for each folder, in the most extreme case 15 peers would receive the snapshot during the push operation. When the network grows to 100 peers, the operation time even drops. We do not consider this to be the realistic behavior of our system but rather a limitation of our experiment environment. We evaluated 100 peers by running multiple instances distributed among our 37 physical machines. Whenever two instances that run on the same physical machine communicate, the message exchange is unrealistically fast, which results in a shorter overall operation time. We can see the slight distortion of the evaluation results when emulating more peers than available physical machines in the evaluation environment.

[Figure 64, panels (a) and (b), plots the pull operation time in seconds over the experiment size (peers), comparing churn vs. no churn and failures vs. no failures.]

(a) Pull with and without churn (b) Pull with and without simultaneous failure of 50% of the peers

[Figure 64, panels (c) and (d), plots the push operation time in seconds over the experiment size (peers), comparing churn vs. no churn and failures vs. no failures.]

(c) Push with and without churn (d) Push with and without simultaneous failure of 50% of the peers

Figure 64: Operation time

[Figure 65 plots the size of messages in KB over the experiment size (peers), comparing churn vs. no churn (a) and failures vs. no failures (b).]

(a) with and without churn (b) with and without failing peers

Figure 65: Size of transferred messages (payload+system messages)

Figure 63 shows the operation times when the number of participants is not changing, i.e., when there is no churn. In Figure 64 we can see the operation performance under churn and when half of the peers fail simultaneously. The same trend is visible in the graphs both with and without churn and failures, which shows that PlatinVC provides good robustness. Scalability in terms of operation time is good for both push and pull, even though it shows a slightly rising trend in the case of peer failures. The pull operation times tend to be better under churn and with failing peers. We can explain this behavior by the fact that a smaller total number of peers is active in these scenarios. We only measured pull requests which succeeded; as expected, the number of successful requests drops, as shown in Figure 61. The very small values we measured for the operation time depend on the network structure as well. Whenever the requesting peer is a replicating peer of one of the artifacts under request, the items are available (due to the replication process) before the operation finishes. The smaller the network is, the more likely it is that an arbitrary peer is a replicating peer of one of the artifacts in question.

10.4.5 System Load

Figure 65 further shows that the summed-up size of all messages rises with a rising number of participants. This can be explained by the fact that with more peers in the system, the folders of a module are more widely distributed. More replicating peers have to be updated, which results in more messages. Additionally, the connectivity among the peers is smaller and more intermediate hops are needed to transfer a message. A noticeable exception is the value for 37 peers under churn: the summed message size drops below the value observed for 15 peers. However, the variance of this measurement is very large as well. Such chaotic behavior is expected when a system experiences heavy churn. With a growing number of peers the system load rises approximately logarithmically, indicating that the system load scales well.

10.5 comparative evaluation

There is hardly any version control system that exhibits the features PlatinVC demonstrated (see Chapter 6). The dVCSs lack a global repository, while a cVCS would collapse under churn if the server machine fails. We were not able to run the only comparable peer-to-peer based solution, Pastwatch (see Section 5.3.5), on more than 10 machines. Nevertheless, we made some simple experiments to obtain comparative values for cVCSs: We measured the time needed to update the working copy using Subversion (see Section 5.1.4) with the same repository that we used for our measurements: gcc. It took 23 seconds (in mean) to update an outdated working copy to the latest version (updating revision 161333 to revision 161334). To avoid interfering with the development of this open source project, we measured the time needed to commit changes in our own Subversion repository: We controlled the versions of our evaluation platform in a Subversion repository hosted by Google Code, which utilizes Amazon's S3 servers. Obtaining the latest updates took, in mean, 21.4 seconds, while committing a 4.6 kb file took, in mean, 5.9 seconds, with a variation of 0.6 seconds in both operations.

As we could not run Pastwatch with more than ten peers, we could compare it to PlatinVC only by referring to the evaluation results from its conference publication [YCM06]. The commit operation needs 3.75 seconds in mean and the update operation 2.7 seconds in mean. However, in that simulation setup only ten machines were used with two users, which is only comparable to our setting in which we used three peers with three users. In the experiments where the number of project members was increased to 200, the number of machines did not change. These experiments showed a mean time of 5.45 seconds for the commit operation and 4.6 seconds for the update operation. However, we do not see in which realistic scenario 200 users would use only ten machines. Additionally, in those experiments only two users executed the operations. As pointed out in Section 5.3.5, we believe that Pastwatch suffers from significant scalability issues when multiple users execute the offered operations within a short amount of time, as a centralized index has to be contacted each time.

                 SVN        Pastwatch    PlatinVC
commit / push    5.9 sec    3.74 sec     4.8 sec
update / pull    21.4 sec   2.7 sec      1 sec

Table 4: Comparison of commit/push and update/pull operation time in Subversion (SVN), Pastwatch [YCM06] and PlatinVC

As we can see in Table 4, in addition to the many benefits of PlatinVC discussed in Chapter 6, the performance of PlatinVC's version control operations is also significantly better than in Subversion and is comparable to Pastwatch. An overall comparative evaluation of the offered features is summarized in Table 5.

10.6 summary

Our testbed evaluation showed the consistency degree of PlatinVC to be sequential in almost all cases. Even under churn, 93% of the executed pull operations retrieve the latest changes in the very worst case. The time needed to execute a push is low, with up to 5 seconds for 15 peers in the system and 9 seconds for 37 peers. The pull operation is even faster, with up to 2 seconds for 15 and 3 seconds for 37 peers in the system. PlatinVC proved to be robust and scalable regarding consistency, freshness, and performance, whereas the scalability proved to be average in the case of the performance of the pull operation. Our experiments effectively modeled the global software development scenario (see Section 2.3), in which the average number of participating project members is 14. All results from the experiments with 15 peers prove PlatinVC's suitability for GSD. Since our experiments did not involve more than 100 peers, we cannot make a clear statement about the suitability of PlatinVC for the wiki scenario (see Section 2.2) with its thousands of users. The modification of articles in that scenario, however, is significantly less frequent than in our experiments and involves only a few users. Although scalability was not evaluated to its maximum, our experiments still showed that both application scenarios benefit from the other advantages the peer-to-peer paradigm can bring, as summarized in Section 2.4. Therefore, we can conclude that with our experiments we placed PlatinVC under extreme workload for the observed scenarios and proved it to fulfill all requirements.

[Table 5 compares the features offered by the observed version control systems (cVCSs, most modern dVCSs, Pastwatch, and PlatinVC): snapshots, consistency degree, global commits, selective updates, personal commits, tool support (i.e. history visualization), interoperability with other VCSs, bisect debugging, offline version control, and traceability links.]

Table 5: Comparison of the features offered by the observed version control systems

Part IV

FINALE

This part presents the final summaries and conclusions of our findings in the presented work and its contributions to and impact on future research in the area of collaborative and distributed project development.

CONCLUSION 11

This chapter summarizes and concludes the previous chapters (Section 11.1) and lists the main findings and contributions of this thesis (Section 11.2). In Section 11.3 we provide an outlook on open research challenges and in Section 11.4 we discuss the implications of the research presented in this thesis on software development environments and supporting applications.

11.1 summary and conclusions

Globally distributed software development is common today. Efficient and accurate collaboration between developers spread all around the globe is essential for the success of projects. In order to provide appropriate support for this, a version control system is a vital part of any collaboration platform. Its main functions are to track the evolutionary changes made to a project's files and manage work being concurrently undertaken on those files. In Chapter 1 we discussed the shortcomings of current version control systems: their unsuitability for global software development due to inefficient client-server communication, the presence of a single point of failure, and scalability issues. The vision of this thesis is a decentralized, peer-to-peer version control system that overcomes the following challenges:

• Lack of global knowledge, which is evidently crucial when pushing and pulling updates and finding the appropriate group of peers for which the updates are relevant. Finding a particular peer in a fully decentralized fashion, with no centralized view of the system, is already a challenging task.

• Reliability of the system and its performance despite unpredictable peer behavior. How can a reliable system be created on the back of unreliable, autonomous peers?

In Chapter 2 we discussed the shortcomings of client-server based solutions for applications with users spread around the globe and the benefits that the peer-to-peer paradigm can bring. We analyzed two application scenarios in which version control is crucial: wikis and global software development. A running example is described in which both application scenarios are integrated, but the focus lies on global software development, as it is the more demanding application. In Chapter 3 we derived and analyzed a complete set of requirements for a peer-to-peer version control system that meets all the needs of collaborative development today. A list of assumptions, along with the exhaustive requirements listed in this chapter, indicates the general applicability of the solution presented in this work. As a first step in evaluating existing version control systems, in Chapter 4 we looked at the foundations of version control systems: workflows (the centralized collaboration workflow, the integration manager collaboration workflow, and the lieutenants collaboration workflow), the impact of the frequency of sharing changes, and configuration management. A special focus of this chapter was the specification and formal definition of key quality aspects of version control systems: the different levels of consistency and coherency. In Chapter 5 we explained in detail the most important version control systems, categorized according to the centralization of communication: centralized (cVCSs), distributed (dVCSs), and peer-to-peer based version control systems. We discussed the limitations of those solutions, which include but are not limited to the following points:

• The centralized solutions bring unbearable delays for some developers as well as a single point of failure.


• The distributed systems are unable to share the produced snapshots effortlessly with all developers.

• The peer-to-peer based solutions tend to solve these issues but cannot provide a reliable degree of consistency.

In Chapter 6 we present a detailed comparative evaluation of existing version control systems. Special attention is given to their design decisions and the effects on the key quality aspects - consistency and coherence. Based on the findings from the previous chapters, our solution, the peer-to-peer version control system PlatinVC, is developed. First, in Chapter 7, we give an overview of the architecture of PlatinVC, with the following major points:

• Not relying on a single participant, as is usual in dVCSs, while offering the centralized, up-to-date view of a cVCS.

• Collaboration workflows of cVCSs can be used in addition to the local working practices of dVCSs, which results in the novel workflow presented in Section 7.2.

• PlatinVC is based on a dVCS, which can be fully utilized by the user. It inherits the ability to work without a network connection, automatic branches, and interoperability with selected version control systems, and it introduces support for managing the evolution of traceability links.

• Basic configuration management is enabled by snapshot based version control and links between files, folders, and modules.

• Changes in a working copy are stored as snapshots. They can be first collected locally to be later pushed globally. Once pushed the snapshots can be pulled by every participant immediately.

• Snapshots are pushed to a limited number of peers, where they are retrievable immediately. By pulling from those peers when needed, all peers eventually store all snapshots locally, providing minimal access times.

In Chapter 8, we elaborate each design decision made and analyze all alternatives. Rather than reinventing existing solutions, PlatinVC integrates software that has proved to be stable in many other projects and focuses on implementing the new parts. The main design decisions are:

• Our solution abstracts from a concrete peer-to-peer overlay protocol and only requires a solution that provides key based routing, which is implemented by any structured peer-to-peer protocol.

• The modification on a user’s working copy is tracked and stored using Mercurial, a decentralized version control system (see Section 5.2.4). As a side effect all tools that support or extend Mercurial, e.g., to visualize the recorded version history, can be used with PlatinVC as well.

• In overcoming Mercurial's limitations, which result in a lower degree of consistency, we developed mechanisms to share committed snapshots with all fellow developers. As a result, PlatinVC combines the benefits a locally stored version history brings with the automatic and complete collaboration of centralized solutions.

• With the sophisticated distributed mechanisms presented in this chapter we were able to combine the best features of centralized and distributed version control systems while eliminating their drawbacks.

• We carefully considered situations which can lead to a faulty execution and designed mechanisms that counteract those effects and maintain a highly consistent and available repository.

We developed a running prototype of PlatinVC and present it in Chapter 9. Peer-to-peer version control is only one part of the project support environment Pipe, which we developed. Pipe is a modular framework that encapsulates services in components. The standardized inter-component communication is handled by the OSGi implementation Equinox. Pipe also includes a component that handles synchronous and asynchronous communication and brings some awareness of the coworkers' related contributions (Askme). Another major component is the security manager, which handles secure message exchange and provides a role based access control mechanism. In Chapter 10, we presented our testbed evaluation of PlatinVC. The evaluation results showed that:

• The consistency degree of PlatinVC is nearly always sequential. Even under churn, 93% of the executed pull operations retrieve the latest changes, in the very worst case.

• PlatinVC proved to be robust and scalable regarding consistency, freshness, and perfor- mance.

• The time needed to execute a push is low, with up to 5 seconds for 15 peers in the system and 9 seconds for 37 peers. The pull operation is even faster, with up to 2 seconds for 15 and 3 seconds for 37 peers in the system.

• In addition to the many benefits of PlatinVC discussed in Chapter 6, the performance of version control operations is also significantly better than in Subversion and is comparable with Pastwatch.

Our experiments effectively modeled the global software development scenario (see Section 2.3), in which the average number of participating project members is 14. All results from the experiments with 15 peers prove PlatinVC's suitability for GSD. Since our experiments did not involve more than 100 peers, we cannot make a clear statement about the suitability of PlatinVC for the wiki scenario (see Section 2.2) with its thousands of users. The modifications of articles in that scenario are, however, significantly less frequent than in our experiments and involve only very few users. Therefore, we can conclude that with our experiments we placed PlatinVC under extreme workload for the observed scenarios and proved it to fulfill all requirements.

11.2 contributions

The main contributions of this thesis are (in the order of presentation):

• Taxonomy and exhaustive comparative evaluation of existing version control systems.

• Formal definition of consistency degrees of version control systems.

• A fully decentralized, peer-to-peer version control system that provides required consistency and coherence levels.

• Analysis of all options for design decisions in peer-to-peer version control systems.

• Solution to the "frequency-of-commit" problem by recording personal snapshots locally first and globally later, when changes are final.

• Automatic branches, which are created when needed, i.e., when conflicts arise.

• A running prototype of PlatinVC.

• The global software development environment Pipe.

• A modular framework for the development of peer-to-peer based applications. Pipe provides abstractions and messaging modes that can be used easily without deep peer-to-peer knowledge.

• An evaluation methodology for version control systems, consisting of an evaluation platform and a detailed workload derived from real user behavior and data.

• Proof that, in spite of unpredictable network dynamics and the lack of a centralized view, the peer-to-peer communication paradigm can achieve quality aspects that seemed to be possible only in centralized solutions, such as consistency.

11.3 outlook

PlatinVC proved to be an effective solution. In the later phases of its development we used it for our own version control; the evolution of this thesis, for example, was tracked using PlatinVC. Based on the presented mechanisms, our version control system could be extended in the following ways:

Seamless Version Control: The three locations where versions are stored on a local peer (working copy, local module and peer-to-peer module) could be exploited to make version control as automatic as possible. Modifications to the artifacts in the working copy could be pushed automatically to other peers, where they would be applied to the peer-to-peer module. After a notification, a user could decide to pull those changes into his local module (executed in milliseconds, as all updates are already locally present) and update his working copy. This proactive distribution of updates could be implemented in a controlled way, as the only unpredictable factor would be when updates are created, not whether a user requests them. Rather than flooding updates as soon as they are created, sophisticated content distribution techniques could be utilized, as proposed by [CDKR02]. Using the approach presented in [GSR+09, Gra10], the system could monitor the traffic and decide when to distribute updates to which parts of the network. However, doing so would change the maintainer-based repository distribution into a replicated repository distribution and, as a consequence, worsen the consistency degree to only eventual consistency. Therefore, instead of relying on proactive updates, the proposed maintainer-based mechanisms could be extended with additional content distribution techniques that broadcast updates to all peers. A user would still be required to check for updates manually, but the updates would most probably already have been transferred proactively to the user’s machine and the process would finish faster. However, the improvement might be insignificant, as our evaluation showed that updates do not cause a large message overhead that takes a long time to transfer.

Two Tier Setup: Pipe (see Chapter 9) could be extended to communicate with client machines that do not offer services or resources but only consume them. The peers in the network would operate as servers for them, providing updates on demand and forwarding modified files. This setup could be helpful in the wiki scenario (see Section 2.2), in which some participants may not want to modify the articles but only consume some of them. The fact that all client machines would operate under the same peer identifier remains subject to further investigation.

Usage Based Replication: Even when projects are developed in collaboration by globally distributed developers, some parts might be modified exclusively by the same group of developers. PlatinVC could, therefore, benefit from a location-aware peer-to-peer overlay (like Globase.KOM [KHLS08, KLS07, KTL+09, Kov09]). The geographical location information would allow artifacts to be maintained by peers which are physically close to the machines that update them the most (or even exclusively). The more peers there are in one location updating a specific artifact, the more likely it would be maintained by one of them. However, these changes would require an indexing mechanism, as artifacts could no longer be found using locally available information (i.e., by calculating the hash value of an artifact’s name).
Additionally, dynamic changes that require a large amount of data to be copied to a different peer could worsen the overall performance of the system and negate the possible gains.

Going beyond geographical optimization, storage and replication of artifacts could also be organized according to their current and predicted usage. For example, access statistics for artifacts could be recorded and embedded in their metadata. These metadata could then be used to define a virtual artifact landscape, using a graph embedding algorithm such as the spring embedder [Ead84] with force-directed placement. The graph is drawn iteratively until an equilibrium of the forces between graph nodes (representing the locations of artifacts in the landscape) is reached. Attracting and repelling forces are derived from the metadata, which integrate the compressed recorded usage statistics.
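As an illustration, the following sketch shows one relaxation step of such a force-directed layout in the spirit of [Ead84]: linked artifacts attract each other, unlinked ones repel each other, and the iteration stops once the forces are (nearly) in equilibrium. The force formulas and constants are assumptions chosen for illustration, not values prescribed by this thesis.

import java.util.ArrayList;
import java.util.List;

class ArtifactLayout {
    static final double C_ATTRACT = 2.0, C_LENGTH = 1.0, C_REPEL = 1.0, STEP = 0.1;

    static class Node {
        double x, y, fx, fy;               // position and accumulated force
        List<Node> linked = new ArrayList<>(); // artifacts connected by usage/traceability links
    }

    // One iteration: accumulate forces on every node, then move each node a small step.
    static void relaxOnce(List<Node> nodes) {
        for (Node a : nodes) { a.fx = 0; a.fy = 0; }
        for (Node a : nodes) {
            for (Node b : nodes) {
                if (a == b) continue;
                double dx = b.x - a.x, dy = b.y - a.y;
                double d = Math.max(Math.hypot(dx, dy), 1e-6);
                double f = a.linked.contains(b)
                        ? C_ATTRACT * Math.log(d / C_LENGTH)  // attraction along links
                        : -C_REPEL / Math.sqrt(d);            // repulsion otherwise
                a.fx += f * dx / d;
                a.fy += f * dy / d;
            }
        }
        for (Node a : nodes) { a.x += STEP * a.fx; a.y += STEP * a.fy; }
    }
}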

Incorporate an Ad-Hoc Grid: To improve stability and increase the system’s performance, grid machines could be integrated into the peer-to-peer network as described in [SFF04]. While the companies in our GSD scenario could simply provide computers that only run PlatinVC in order to improve the system’s performance, deploying a grid system on them would allow better control. In this way a minimum service quality could be guaranteed, which is not possible when the system runs purely on a peer-to-peer network. Although the solution in [SFF04] includes a peer-to-peer protocol implementation, both the proposed solution and PlatinVC would have to be modified. The peer-to-peer implementation used in the ad-hoc grid system is JXTA, which is not compliant with the key-based routing API used by PlatinVC; it would be best to replace it with FreePastry, the peer-to-peer protocol used in our solution. PlatinVC would have to be adapted to the ad-hoc grid, so that the grid machines could be fully exploited. These machines could take over parts of any functionality, from storing a partial backup of the repository to acting as a maintainer for specified modules.

11.4 implications

We saw in the example of a Wiki engine (described in Section 2.2) that a version control system is a core component of many other applications. The benefits PlatinVC brings to software development could be introduced to a number of other applications.

Versioned File System: Recent file systems incorporate version control mechanisms to provide the user with an automatic, lightweight backup solution. The most mature approach is ZFS, which was introduced by Sun Microsystems in 2005 [SM06]. The mechanisms of peer-to-peer based file systems like OceanStore [KBC+00] and Ivy [MMGC02] could be combined with the mechanisms elaborated in this work to create a peer-to-peer based versioned file system. However, the research questions stated above under seamless version control would arise in this approach as well.

Configuration Management: The obvious next step in the evolution of PlatinVC would be to extend it with mechanisms for configuration management (see Section 4.3). Most of the required mechanisms would not need to communicate in a distributed fashion. With its metadata and traceability links (see Section 7.3.7), PlatinVC offers the possibility to store and retrieve configuration information in a decentralized fashion. With an extensional configuration management approach, the traceability links would point explicitly to the artifacts belonging to a configuration, while the intensional configuration management approach presented in [Cag93] could be enabled using attributes stored in an artifact’s metadata.
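As an illustration, the following hypothetical sketch shows how an intensional configuration could be derived by evaluating an attribute predicate against each artifact's metadata. The class, method and attribute names are placeholders introduced here, not part of PlatinVC.

import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class IntensionalConfiguration {

    // An artifact with free-form metadata attributes, as assumed for this sketch.
    record Artifact(String path, Map<String, String> attributes) {}

    // The configuration is described by a rule over attributes rather than an explicit list.
    static List<Artifact> select(List<Artifact> all, Predicate<Map<String, String>> rule) {
        return all.stream()
                  .filter(a -> rule.test(a.attributes()))
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Artifact a = new Artifact("kernel/io.c", Map.of("os", "linux", "release", "1.2"));
        Artifact b = new Artifact("kernel/io_win.c", Map.of("os", "windows", "release", "1.2"));
        // example rule: all artifacts tagged for the "linux" variant of release 1.2
        List<Artifact> config = select(List.of(a, b),
                attrs -> "linux".equals(attrs.get("os")) && "1.2".equals(attrs.get("release")));
        System.out.println(config); // only the linux variant belongs to this configuration
    }
}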

Enable Traceability: Traceability links can be used for various additional features in a project’s development. For example, let us observe artifacts from different development phases, such as requirements, design, implementation and testing. Using traceability links, we can trace a test case from the last phase of development back to the related artifacts from earlier phases, such as code or requirements. This feature could be helpful in deciding which tests are to be executed if budget or time constraints limit the number of tests that can be run.

PlatinVC enables traceability links between artifacts in different modules; updates to these links are handled and a version history is maintained for each of them. However, there is currently no dedicated support for creating links, be it automatically, assisted or manually. Links can currently only be created via Piki, which are then fully supported and tracked as traceability links by PlatinVC. An approach as described in [Kön08] could be used to create or update links between artifacts (semi-)automatically; these links could then be stored in a decentralized fashion using PlatinVC. We would thereby avoid a communication bottleneck when multiple developers access links that point to disjoint sets of artifacts. Using a centralized database to store the links, as proposed by the mentioned work, would limit concurrent access to the links.
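The trace-back described above is essentially a transitive closure over traceability links. The following hypothetical sketch illustrates it; the data structures are placeholders and not PlatinVC's actual link representation.

import java.util.*;

class TraceBack {

    /** links.get(a) lists the artifacts that artifact a is traced to (earlier development phases). */
    static Set<String> reachableFrom(String testCase, Map<String, List<String>> links) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(testCase));
        while (!todo.isEmpty()) {
            String current = todo.pop();
            for (String target : links.getOrDefault(current, List.of())) {
                if (visited.add(target)) {   // follow each link only once
                    todo.push(target);
                }
            }
        }
        return visited; // all design, code and requirement artifacts the test case traces back to
    }
}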

Change Propagation: A version control system manages the conflicts that arise when multiple developers simultaneously edit the same artifact. Concurrent editing can, however, also affect other artifacts in ways that are not detected by version control today. This can happen when, for example, a function that is defined in one source code artifact is used in another, and both artifacts are changed concurrently. PlatinVC could be enhanced with mechanisms that detect such conflicts between two or more artifacts and warn the users, assist them in the repairs, or resolve the conflicts automatically. The necessary steps could be configured using an artifact’s metadata and the provided traceability links, as described above (Configuration Management). Note, however, that the detection of such conflicts could be delayed, as maintainers would have to communicate the changes, developers could make changes offline, and the network could be partitioned temporarily.
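A minimal sketch of such a cross-artifact check is given below: it simply flags pairs of concurrently modified artifacts that are connected by a traceability link. The names are illustrative, and the check itself is only an assumption about how such a mechanism could be realized.

import java.util.*;

class CrossArtifactConflictCheck {

    // changedByAlice/changedByBob: artifacts modified in two concurrent snapshots;
    // links.get(a): artifacts that artifact a depends on via traceability links.
    static List<String> warnings(Set<String> changedByAlice,
                                 Set<String> changedByBob,
                                 Map<String, Set<String>> links) {
        List<String> warnings = new ArrayList<>();
        for (String a : changedByAlice) {
            for (String linked : links.getOrDefault(a, Set.of())) {
                if (changedByBob.contains(linked)) {
                    warnings.add("Concurrent changes to linked artifacts: " + a + " <-> " + linked);
                }
            }
        }
        return warnings;
    }
}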

Storage Based Application: Besides version control, PlatinVC provides purely decentralized and reliable data storage, manages concurrent updates, and handles churn and network instability. Many collaborative applications could use this reliable storage and benefit from its decentralized architecture. We elaborated one example, a Wiki engine, in Section 2.2, but similar Internet-based applications, such as forum software, e-mail services or news pages, could exploit the mechanisms offered by PlatinVC as well.

BIBLIOGRAPHY

[AAG+05] Karl Aberer, Luc Onana Alima, Ali Ghodsi, Sarunas Girdzijauskas, Seif Haridi, and Manfred Hauswirth. The Essence of Peer-to-Peer: A Reference Archi- tecture for Overlay Networks. In Proceedings of the International Conference on Peer-to-Peer Computing (P2P), pages 11–20, 2005.

[AH02] Karl Aberer and Manfred Hauswirth. An Overview on Peer-to-Peer Informa- tion Systems. In Proceedings of the Workshop on Distributed Data and Structures (WDAS), 2002.

[All07] O.S.G. Alliance. OSGi Service Platform, Core Specification, Release 4, Version 4.1. OSGi Specification, 2007.

[ATS04] Stephanos Androutsellis-Theotokis and Diomidis Spinellis. A Survey of Peer- to-peer Content Distribution Technologies. ACM Computing Surveys, 36(4):335– 371, 2004.

[Blo04] Rebecca Blood. How Blogging Software Reshapes the Online Community. Communications of the ACM, 47(12):53–55, 2004.

[BLS06] Elizabeth Borowsky, Andrew Logan, and Robert Signorile. Leveraging the Client-Server Model in Peer-to-Peer: Managing Concurrent File Updates in a Peer-to-Peer System. In Proceedings of the Advanced International Conference on Telecommunications and International Conference on Internet and Web Applications and Services (AICT-ICIW), pages 114–119, 2006.

[BM05] David E. Bellagio and Tom Milligan. Software Configuration Management Strate- gies and IBM® Rational® Clearcase®: A Practical Introduction. IBM Press, 2005.

[BRB+09] Christian Bird, Peter C. Rigby, Earl T. Barr, David J. Hamilton, Daniel M. German, and Prem Devanbu. The Promises and Perils of Mining Git. In Pro- ceedings of the International Working Conference on Mining Software Repositories (MSR), pages 1–10. Citeseer, 2009.

[Bur95] James H. Burrows. Secure Hash Standard. National Institute of Standards and Technology, 1995.

[Cag93] Martin Cagan. Untangling Configuration Management: Mechanism and Methodology in CM Systems. In Proceedings of the International Workshop on Configuration Management (SCM), pages 35–53, 1993.

[CCR04] Miguel Castro, Manuel Costa, and Antony I. T. Rowstron. Performance and Dependability of Structured Peer-to-Peer Overlays. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), pages 9–18, 2004.

[CDK05] George Coulouris, Jean Dollimore, and Tim Kindberg. Distributed Systems: Concepts and Design. Addison-Wesley Longman, 2005.

[CDKR02] Miguel Castro, Peter Druschel, Anne-Marie Kermarrec, and Antony I. T. Row- stron. SCRIBE: A Large-Scale and Decentralized Application-Level Multicast Infrastructure. IEEE Journal on Selected Areas in communications, 20(8):1489– 1499, 2002.


[CG09] Damian A. Czarny and Alexander Gebhardt. Entwicklung einer automa- tisierten Evaluationsplattform für verteilte Anwendungen. Bachelor’s thesis, Real-Time Systems Lab, TU Darmstadt, 2009. Supervisor: Patrick Mukherjee.

[Che04] Benjie Chen. A Serverless, Wide-Area Version Control System. PhD thesis, 2004. Supervisor: Robert T. Morris.

[CHSJ03] Li-Te Cheng, Susanne Hupfer, Steven Ross, and John Patterson. Jazzing up Eclipse with Collaborative Tools. In Proceedings of the OOPSLA Workshop on Eclipse Technology eXchange, pages 45–49, 2003.

[Coh03] Bram Cohen. Incentives Build Robustness in BitTorrent. In Proceedings of the Workshop on Economics of Peer-to-Peer Systems (P2PECON), May 2003.

[Col] CollabNet - The Leader in Agile Application Lifecycle Management. http://www.collab.net/.

[CSWH00] Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. In Proceedings of the Workshop on Design Issues in Anonymity and Unobservability (DIAU), pages 46–66, 2000.

[CVS] CVS - Concurrent Versions System. http://www.nongnu.org/cvs/.

[Den84] Dorothy E. Denning. Digital Signatures with RSA and other Public-Key Cryptosystems. Communications of the ACM, 27(4):392, 1984.

[DH79] Whitfield Diffie and Martin E. Hellman. Privacy and Authentication: An Introduction to Cryptography. Proceedings of the IEEE, 67(3):397–427, 1979.

[DHJ+07] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakula- pati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon’s Highly Available Key-Value Store. In Proceedings of the ACM Symposium on Operating Systems Principle (SOSP), pages 205–220, 2007.

[dro] Dropbox - Online Backup, File Sync and Sharing made Easy. https://www.dropbox.com.

[Dui07] Mike Duigou. JXTA 2.0 Protocols Specification. https://jxta-spec.dev.java.net/JXTAProtocols.pdf, January 2007. Version 2.5.2.

[Ead84] Peter Eades. A Heuristic for Graph Drawing. Congressus Numerantium, 42:149–160, 1984.

[EC95] Jacky Estublier and Rubby Casallas. The Adele Configuration Manager. Configuration management, 2:99–133, 1995.

[Eck09] Claudia Eckert. IT-Sicherheit: Konzepte-Verfahren-Protokolle (6th Edition). Olden- bourg Wissenschaftsverlag, 2009.

[equ] Equinox. http://www.eclipse.org/equinox/.

[fac] Facebook. http://www.facebook.com.

[FFRS02] Thomas Friese, Bernd Freisleben, Steffen Rusitschka, and Alan Southall. A Framework for Resource Management in Peer-to-Peer Networks. In Proceedings of the International Conference NetObjectDays (NODe), pages 4–21, 2002.

[FLP85] Michael J. Fischer, Nancy A. Lynch, and Mike Paterson. Impossibility of Distributed Consensus with One Faulty Process. Journal of the ACM, 32(2):374–382, 1985.

[Fou] The Eclipse Foundation. Eclipse project. http://www.eclipse.org/eclipse/.

[FP] Freepastry. http://www.freepastry.org/.

[GHHR07b] Michael Geisser, Armin Heinzl, Tobias Hildenbrand, and Franz Rothlauf. Verteiltes, internetbasiertes Requirements-Engineering. Wirtschaftsinformatik, 49(3):199–207, 2007.

[GHHS07b] Michael Geisser, Hans-Joerg Happel, Tobias Hildenbrand, and Stefan Seedorf. Einsatzpotenziale von Wikis in der Softwareentwicklung am Beispiel von Requirements Engineering und Traceability Management. Social Software in der Wertschöpfung, March 2007.

[GHJV95] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design pat- terns: elements of reusable object-oriented software. Addison-Wesley Professional, 1995.

[Git] GitBenchmarks. https://git.wiki.kernel.org/index.php?title=GitBenchmarks%5C&oldid=8548.

[GL02] Seth Gilbert and Nancy Lynch. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. ACM SIGACT News, 33(2):51–59, 2002.

[GMM+09] Kalman Graffi, Patrick Mukherjee, Burkhard Menges, Daniel Hartung, Alek- sandra Kovacevic, and Ralf Steinmetz. Practical Security in Peer-to-Peer based Social Networks. In Proceedings of the Annual Conference on Local Computer Networks (LCN), pages 269–272, 2009.

[GNUa] Diffutils - GNU Project - Free Software Foundation. http://www.gnu.org/software/diffutils/.

[gooa] Google Documents. http://docs.google.com.

[goob] Google Offices. http://www.google.com/intl/en/corporate/address.html.

[Gra10] Kalman Graffi. Monitoring and Management of Peer-to-Peer Systems. PhD thesis, Multimedia Communications Lab (KOM), TU Darmstadt, Germany, 2010. Supervisor: Ralf Steinmetz.

[Gro] Groove virtual office. http://www.groove.net/index.cfm/pagename/VirtualOffice/?home=hp-overview.

[GSR+09] Kalman Graffi, Dominik Stingl, Julius Rueckert, Aleksandra Kovacevic, and Ralf Steinmetz. Monitoring and Management of Structured Peer-to-Peer Systems. In Proceeding of the International Conference on Peer-to-Peer Computing (P2P), pages 311–320, 2009.

[HA90] Phillip W. Hutto and Mustaque Ahamad. Slow Memory: Weakening Consis- tency to Enhance Concurrency in Distributed Shared Memories. In Proceedings of the International Conference on Distributed Computing Systems (ICDCS), pages 302–311, 1990.

[Har01] Ann Harrison. The Promise and Peril of Peer-to-Peer. http://www.networkworld.com/buzz2001/p2p/, September 2001.

[HBMS04] Oliver Heckmann, Axel Bock, Andreas Mauthe, and Ralf Steinmetz. The eDonkey File-Sharing Network. In Proceedings of the Workshop on Algorithms and Protocols for Efficient Peer-to-Peer Applications (PEPPA), pages 224–228, September 2004.

[HCRP04] Susanne Hupfer, Li-Te Cheng, Steven Ross, and John Patterson. Introducing Collaboration into an Application Development Environment. In Proceedings of the Conference on Computer Supported Cooperative Work (CSCW), pages 21–24, 2004.

[HM02] Harald Holz and Frank Maurer. Knowledge Management Support for Dis- tributed Agile Software Processes. In Proceedings of International Workshop of Advances in Learning Software Organizations (LSO), pages 60–80, 2002.

[HM03] James D. Herbsleb and Audris Mockus. An Empirical Study of Speed and Communication in Globally Distributed Software Development. IEEE Transac- tions on Software Engineering, 29:481–494, June 2003.

[HP98] John Hennessy and David Patterson. Computer Architecture. A Quantitative Approach (4th Edition). Morgan Kaufmann Publishers, 1998.

[HPB05] James D. Herbsleb, Daniel J. Paulish, and Matthew Bass. Global Software Development at Siemens: Experience from Nine Projects. In Proceedings of the International Conference on Software Engineering (ICSE), pages 524–533, 2005.

[HR83] Theo Härder and Andreas Reuter. Principles of Transaction-Oriented Database Recovery. ACM Computer Survey, 15(4):287–317, 1983.

[HRH07] Tobias Hildenbrand, Franz Rothlauf, and Armin Heinzl. Ansätze zur kol- laborativen Softwareerstellung. Wirtschaftsinformatik, 49(Special Issue):72–80, February 2007.

[HT] Junio Hamano and Linus Torvalds. Git - Fast Version Control System. http://git-scm.com/.

[HT07] Octavio Herrera and Znati Taieb. Modeling Churn in Peer-to-Peer Networks. In Proceedings of the Annual Simulation Symposium (ANSS), pages 33–40, 2007.

[IBM] IBM Rational Software. http://www-306.ibm.com/software/rational/.

[ISHGH07] Timea Illes-Seifert, Andrea Herrmann, Michael Geisser, and Tobias Hilden- brand. The Challenges of Distributed Software Engineering and Requirements Engineering: Results of an Online Survey. In Proceedings of the Global Require- ments Engineering Workshop (GREW), pages 55–66, 2007.

[Jav] For java developers. http://www.oracle.com/technetwork/java/index.html.

[Jaz10] Jazz community site. http://jazz.net/, 2010.

[JG03] E. James Whitehead Jr. and Dorrit Gordon. Uniform Comparison of Configu- ration Management Data Models. In Proceedings of the Software Configuration Management Workshop (SCM) (adjacent to the ICSE), pages 70–85, 2003.

[Joo] Joost - The New Way of Watching TV. http://www.joost.com/.

[JXY06] Yi Jiang, Guangtao Xue, and Jinyuan You. Distributed Hash Table Based Peer-to-Peer Version Control System for Collaboration. In Proceedings of the International Conference on Computer Supported Cooperative Work in Design III (CSCWD), pages 489–498, 2006.

[KB04] Yoram Kulbak and Danny Bickson. The eMule Protocol Specification. Technical report, School of Computer Science and Engineering, the Hebrew University of Jerusalem, 2004.

[KBC+00] John Kubiatowicz, David Bindel, Yan Chen, Steven E. Czerwinski, Patrick R. Eaton, Dennis Geels, Ramakrishna Gummadi, Sean C. Rhea, Hakim Weather- spoon, Westley Weimer, Chris Wells, and Ben Y. Zhao. OceanStore: An Archi- tecture for Global-Scale Persistent Storage. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 190–201, 2000.

[KHLS08] Aleksandra Kovacevic, Oliver Heckmann, Nicolas Liebau, and Ralf Steinmetz. Location Awareness - Improving Distributed Multimedia Communication. Special Issue of the Proceedings of IEEE on Advances in Distributed Multimedia Communications, 96(1), January 2008.

[KKM+07] Aleksandra Kovacevic, Sebastian Kaune, Patrick Mukherjee, Nicolas Liebau, and Ralf Steinmetz. Benchmarking Platform for Peer-to-Peer Systems. it - Information Technology (Methods and Applications of Informatics and Information Technology), 49(5):312–319, September 2007.

[KLS07] Aleksandra Kovacevic, Nicolas Liebau, and Ralf Steinmetz. Globase.KOM - A Peer-to-Peer Overlay for Fully Retrievable Location-based Search. In Pro- ceedings of the International Conference on Peer-to-Peer Computing (P2P), pages 87–96, September 2007.

[Kön08] Alexander Königs. Model Integration and Transformation – A Triple Graph Grammar-based QVT Implementation. PhD thesis, Real-Time Systems Lab (ES), TU Darmstadt, Germany, 2008. Supervisor: Andy Schürr.

[Kov09] Aleksandra Kovacevic. Peer-to-Peer Location-based Search: Engineering a Novel Peer-to-Peer Overlay Network. PhD thesis, Multimedia Communications Lab (KOM), TU Darmstadt, Germany, 2009. Supervisor: Ralf Steinmetz.

[KR81] H. T. Kung and John T. Robinson. On Optimistic Methods for Concurrency Control. ACM Transactions on Database Systems (TODS), 6(2):213–226, 1981.

[Kru03] Philippe Kruchten. The Rational Unified Process: An Introduction (3rd Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2003.

[KTL+09] Aleksandra Kovacevic, Aleksandar Todorov, Nicolas Liebau, Dirk Bradler, and Ralf Steinmetz. Demonstration of a Peer-to-Peer Approach for Spatial Queries. In Proceedings of Database Systems for Advanced Applications (DASFAA), pages 776–779, April 2009.

[Lam79] Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs. IEEE Transactions on Computers, 28(9):690– 691, 1979.

[Lam98] Leslie Lamport. The Part-Time Parliament. ACM Transactions on Computer Systems (TOCS), 16(2):133–169, 1998.

[Lam01] Leslie Lamport. Paxos made simple. ACM SIGACT News, 32(4):18–25, 2001.

[LC01] Bo Leuf and Ward Cunningham. The Wiki Way: Quick Collaboration on the Web. Addison-Wesley Longman Publishing Co., Inc., 2001.

[Leb95] David B. Leblang. The CM Challenge: Configuration Management that Works. pages 1–37, 1995.

[LLST05] Xin Lin, Shanping Li, Wei Shi, and Jie Teng. A Scalable Version Control Layer in Peer-to-Peer File System. In Proceedings of the International Conference on Grid and Cooperative Computing (GCC), pages 996–1001, 2005.

[Loe09b] Jon Loeliger. Version Control with Git. O’Reilly Media, 2009.

[Mac] Matt Mackall. Mercurial, a Distributed SCM. http://selenic.com/mercurial/.

[Mac06] Matt Mackall. Towards a Better SCM: Revlog and Mercurial. In Proceedings of the Linux Symposium, pages 83–90, 2006.

[MH01] Audris Mockus and James D. Herbsleb. Challenges of Global Software De- velopment. In Proceedings of the IEEE International Software Metrics Symposium (METRICS), pages 182–184, 2001.

[MH07] Monika Moser and Seif Haridi. Atomic Commitment in Transactional DHTs. In Proceedings of the CoreGRID Symposium (CoreGRID), pages 151–161, 2007.

[MHK05] Eve MacGregor, Yvonne Hsieh, and Philippe Kruchten. Cultural Patterns in Software Process Mishaps: Incidents in Global Projects. In Proceedings of the Workshop on Human and Social Factors of Software Engineering (HSSE), pages 1–5, 2005.

[Mil] Phil Milford. Suit challenges Microsoft’s deal for Groove. http://seattlepi.nwsource.com/business/218502_msftgroove02.html.

[Mil97] Bartosz Milewski. Distributed Source Control System. In Proceedings of the System Configuration Management, ICSE SCM-7 Workshop (SCM), pages 98–107, 1997.

[MKS08] Patrick Mukherjee, Aleksandra Kovacevic, and Andy Schürr. Analysis of the Benefits the Peer-to-Peer Paradigm brings to Distributed Agile Software Development. In Proceedings of the Software Engineering Conference (SE), pages 72–77, February 2008.

[ML83] C. Mohan and Bruce G. Lindsay. Efficient Commit Protocols for the Tree of Processes Model of Distributed Transactions. In Proceedings of the Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC), pages 76–88, 1983.

[ML07] Joseph Morris and Chris Lüer. DistriWiki: A Distributed Peer-to-Peer Wiki. In Proceedings of the International Symposium on Wikis (WikiSym), pages 69–74, 2007.

[MLS08] Patrick Mukherjee, Christof Leng, and Andy Schürr. Piki - A Peer-to-Peer based Wiki Engine. In Proceedings of the International Conference on Peer-to-Peer Computing (P2P), pages 185–186, September 2008.

[MM02] Petar Maymounkov and David Mazières. Kademlia: A Peer-to-Peer Infor- mation System Based on the XOR Metric. In Proceedings of the International Workshop on Peer-to-Peer Systems (IPTPS), pages 53–65, 2002.

[MMGC02] Athicha Muthitacharoen, Robert Morris, Thomer M. Gil, and Benjie Chen. Ivy: A Read/Write Peer-to-Peer File System. In Proceedings of the Symposium on Operating System Design and Implementation (OSDI), pages 31–44, 2002.

[mon] monotone. http://monotone.ca/.

[Mos93] David Mosberger. Memory Consistency Models. ACM SIGOPS Operating Systems Review, 27(1):18–26, 1993.

[MR98] Dahlia Malkhi and Michael Reiter. Byzantine Quorum Systems. Distributed Computing, 11:203–213, 1998.

[O’S09] Bryan O’Sullivan. Mercurial: The Definitive Guide. O’Reilly & Associates Inc, 2009.

[OUMI06] Gérald Oster, Pascal Urso, Pascal Molli, and Abdessamad Imine. Data Consis- tency for Peer-to-Peer Collaborative Editing. In Proceedings of the Conference on Computer Supported Cooperative Work (CSCW), pages 259–268, 2006.

[PAE03] Rafael Prikladnicki, Jorge Luis Nicolas Audy, and Roberto Evaristo. Global Software Development in Practice Lessons Learned. Software Process: Improve- ment and Practice, 8(4):267–281, October 2003.

[PAP06] Leonardo Pilatti, Jorge Luis Nicolas Audy, and Rafael Prikladnicki. Software Configuration Management over a Global Software Development Environment: Lessons Learned from a Case Study. In Proceedings of the International Workshop on Global Software Development for the Practitioner (GSD), pages 45–50, 2006.

[Par72] David Lorge Parnas. On the Criteria To Be Used in Decomposing Systems into Modules. Communications of the ACM, 15(12):1053–1058, 1972.

[PH03] Joon S. Park and Junseok Hwang. Role-Based Access Control for Collaborative Enterprise in Peer-to-Peer Computing Environments. In Proceedings of the Symposium on Access Control Models and Technologies (SACMAT), pages 93–99, 2003.

[Pil04] Michael Pilato. Version Control With Subversion. O’Reilly & Associates, Inc., Sebastopol, CA, USA, 2004.

[Que09] Marcel Queisser. Zugangs- und Zugriffskontrolle für Peer-to-Peer Netzwerke. Master’s thesis, TU Darmstadt, March 2009. Supervisor: Patrick Mukherjee.

[RD01b] Antony Rowstron and Peter Druschel. Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems. In Proceedings of the International Conference on Distributed Systems Platforms (Middleware), pages 329–351, 2001.

[RFH+01] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Schenker. A Scalable Content-Addressable Network. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM), pages 161–172, 2001.

[RGRK04] Sean C. Rhea, Dennis Geels, Timothy Roscoe, and John Kubiatowicz. Handling Churn in a DHT. In Proceedings of the USENIX Annual Technical Conference (USENIX), pages 127–140, 2004.

[Roc75] Marc J. Rochkind. The Source Code Control System. IEEE Transactions on Software Engineering, 1(4):364–370, 1975.

[RR06] Suzanne Robertson and James Robertson. Mastering the Requirements Process. Addison-Wesley Professional, 2006.

[SE05] Ralf Steinmetz and Klaus Wehrle (Eds.). Peer-to-Peer Systems and Applications. Springer, September 2005.

[SENB07a] Moritz Steiner, Taoufik En-Najjary, and Ernst W. Biersack. Analyzing Peer Behavior in KAD. Technical report, Institut Eurecom, France, 2007.

[SFF04] Matthew Smith, Thomas Friese, and Bernd Freisleben. Towards a Service-Oriented Ad Hoc Grid. In Proceedings of the International Symposium on Parallel and Distributed Computing (ISPDC), pages 201–208, 2004.

[Shi01] Clay Shirky. Peer to Peer: Harnessing the Power of Disruptive Technologies, chapter Listening to Napster, pages 19–28. O’Reilly & Associates, 2001.

[Skyb] Official Website of Skype. http://www.skype.com.

[SM06] Sun Microsystems, Inc. ZFS On-Disk Specification. http://opensolaris.org/os/community/zfs/docs/ondiskformat0822.pdf, 2006.

[Šmi06] Darja Šmite. Requirements Management in Distributed Projects. Journal of Universal Knowledge Management, 1(2):69–76, 2006.

[SMK+01b] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, and Hari Balakrish- nan. Chord: A scalable Peer-to-Peer Lookup Service for Internet Applications. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM), pages 149–160, 2001.

[SMLN+03] Ion Stoica, Robert Morris, David Liben-Nowell, David Karger, M. Frans Kaashoek, Frank Dabek, and Hari Balakrishnan. Chord: A Scalable Peer- to-Peer Lookup Service for Internet Applications. IEEE Transactions on Net- working, 11(1):17–32, 2003.

[Som10] Ian Sommerville. Software Engineering (9th Edition). Addison-Wesley, 9. edition, 2010.

[Sop] SopCast - Free Peer-to-Peer internet TV | live football, NBA, cricket. http://www.sopcast.org/.

[Sou01] Cleidson R. B. De Souza. Global software development: Challenges and perspectives, 2001. http://citeseer.ist.psu.edu/457465.html.

[SR06] Daniel Stutzbach and Reza Rejaie. Understanding Churn in Peer-to-Peer Networks. In Proceedings of the ACM SIGCOMM Conference on Internet Mea- surement (IMC), pages 189–202, 2006.

[SS05] Yasushi Saito and Marc Shapiro. Optimistic replication. ACM Computing Survey, 37(1):42–81, 2005.

[SSF08] Christian Schridde, Matthew Smith, and Bernd Freisleben. An Identity-Based Key Agreement Protocol for the Network Layer. In Proceedings of the In- ternational Conference on Security and Cryptography for Networks (SCN), pages 409–422, 2008.

[SSR08] Thorsten Schütt, Florian Schintke, and Alexander Reinefeld. Scalaris: Reli- able Transactional Peer-to-Peer Key/Value Store. In Proceedings of the ACM SIGPLAN Workshop on Erlang, pages 41–48, 2008.

[SZRC06] Ravi Sandhu, Xinwen Zhang, Kumar Ranganathan, and Michael J. Covington. Client-side Access Control Enforcement Using Trusted Computing and PEI Models. Journal of High Speed Networks, 15:229–245, August 2006.

[Tic82] Walter F. Tichy. Design, Implementation, and Evaluation of a Revision Control System. In Proceedings of the International Conference on Software Engineering (ICSE), pages 58–67, 1982.

[TKLB07] Wesley W. Terpstra, Jussi Kangasharju, Christof Leng, and Alejandro P. Buchmann. Bubblestorm: Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM), pages 49–60, 2007.

[TM04a] Katsuhiro Takata and Jianhua Ma. GRAM - A Peer-to-Peer System of Group Revision Assistance Management. In Proceedings of the International Conference on Advanced Information Networking and Applications (AINA), pages 587–592, March 2004.

[TM05] Katsuhiro Takata and Jianhua Ma. A decentralized Peer-to-Peer Revision Management System using a Proactive Mechanism. International Journal of High Performance Computing and Networking, 3(5):336–345, 2005.

[Tor] Linus Torvalds. Clarification on GIT, Central Repositories and Commit Access Lists. http://lists.kde.org/?l=kde-core-devel&m=118764715705846.

[TS06] Andrew S. Tanenbaum and Maarten Van Steen. Distributed Systems: Principles and Paradigms (2nd Edition). Prentice Hall, 2006.

[TTP+95] Douglas B. Terry, Marvin M. Theimer, Karin Petersen, Alan J. Demers, Mike J. Spreitzer, and Carl H. Hauser. Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System. In Proceedings of the Symposium on Operating Systems Principles (SOSP), pages 172–182, 1995.

[UML09] OMG Unified Modeling Language (OMG UML), Superstructure. http://www.omg.org/spec/UML/2.2/Superstructure, February 2009.

[Voc09] Robert Vock. Entwicklung eines Transaktionsprotokolls für Peer-to-Peer basiertes Versionsmanagement. Master’s thesis, TU Darmstadt, December 2009. Supervisor: Patrick Mukherjee.

[Wal02] Dan S. Wallach. A Survey of Peer-to-Peer Security Issues. In Proceedings of the International Symposium on Software Security (ISSS), pages 42–57, 2002.

[WB96] Mark Weiser and John S. Brown. The Coming Age of Calm Technology. Technical report, 1996.

[wikb] Wikipedia, the free Encyclopedia that Anyone can Edit. http://www.wikipedia.org/.

[WN94] Robert Watkins and Mark Neal. Why and How of Requirements Tracing. IEEE Software, 11(4):104–106, 1994.

[WUM07] Stéphane Weiss, Pascal Urso, and Pascal Molli. Wooki: A Peer-to-Peer Wiki- Based Collaborative Writing Tool. In Proceedings of the International Conference on Web Information Systems Engineering (WISE), pages 503–512, 2007.

[YCM06] Alexander Yip, Benjie Chen, and Robert Morris. Pastwatch: A Distributed Version Control System. In Proceedings of the Symposium on Networked Systems Design and Implementation (NSDI), pages 381—394, 2006.

[You] YouTube - Broadcast Yourself. http://www.youtube.com.

[Zim80] Hubert Zimmermann. OSI Reference Model — The ISO Model of Architec- ture for Open Systems Interconnection. IEEE Transactions on Communication, 28(4):425–432, 1980.

All web pages cited in this work have been checked in September 2010. However, due to the dynamic nature of the World Wide Web, web pages can change.


CURRICULUM VITÆ

personal details

Name: Patrick Mukherjee
Date and Place of Birth: 1st July 1976 in Berlin, Germany
Nationality: German

http://patrick.mukherjee.de

education

10/1997 - 10/2004 Studies of Computer Science at Technische Universität Berlin, Germany. Degree: ’Diplom-Informatiker’ (corresponds to M.Sc.)
08/1989 - 05/1996 Primary and Secondary Education in Berlin, Germany. Degree: Allgemeine Hochschulreife.

professional experience

since 07/2010 Senior Researcher at the Multimedia Communications Lab of the Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt.
06/2006 - 06/2010 Doctoral Candidate at the Real-Time Systems Lab of the Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt. Topic: A Fully Decentralized, Peer-to-Peer based Version Control System
04/2005 - 05/2006 Consultant at Accenture AGS
11/2002 - 04/2004 Student Assistant at the Department PlanT of the Fraunhofer Institute for Computer and Software Architecture (FIRST)

activities

since 10/2008 Research Assistants’ Representative at the Department Council of the Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt.
11/2006 - 10/2009 Research Assistants’ Representative at the Board of Examiners of the Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt.
03/2007 - 02/2008 Founding Member (representing the Research Assistants) of the Joint Commission of the Center for Interdisciplinary Studies (CISP), Technische Universität Darmstadt.
06/2006 - 02/2008 Research Assistants’ Representative of the Joint Commission of the Field of Study Information Systems Engineering (iST), Technische Universität Darmstadt.
06/2006 - 02/2008 Research Assistants’ Representative at the Board of Examiners of the Field of Study Information Systems Engineering (iST), Technische Universität Darmstadt.
01/2006 - 05/2006 Employee Representative in the Workers’ Council at Accenture AGS

04/2007 - 08/2007 Research Assistants’ Representative in the appointments committee for a W3 professorship in "Integrated Electrical Systems" at the Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt.
02/2008 - 05/2009 Research Assistants’ Representative in the appointments committee for a W3 professorship in "Power Electronics" at the Department of Electrical Engineering and Information Technology, Technische Universität Darmstadt.

reviewer activities

8th IEEE International Conference on Peer-to-Peer Computing (P2P 2008)
11th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems (Models 2008)
Journal of Multimedia Systems, issue 530, 2009 (ISSN: 0942-4962 (print version), ISSN: 1432-1882 (electronic version))

memberships

Member of the Institute of Electrical and Electronics Engineers (IEEE)

academic teaching

WS 06/07, WS 07/08, SS 09 Lecture: Software-Engineering - Wartung und Qualitätssicherung (Software Engineering - Maintenance and Quality Assurance) (ETiT, Real-Time Systems Lab)
WS 08/09 Practical Course: Bachelor practical course (Computer Science). Topic: NetBrowser (Implementing a generic Network Visualisation)
WS 08/09 Practical Course: Multimedia Communications Lab II (ETiT, Multimedia Communications Lab). Topic: Piki - Reimplementation and Extension as an Eclipse-plugin
WS 07/08 Practical Course: Multimedia Communications Lab II (ETiT, Multimedia Communications Lab). Topic: ASKME - Advanced Peer-to-Peer Instant Messenger as Eclipse Plugin
SS 07, WS 07/08 Seminar: Software System Technology (ETiT, Real-Time Systems Lab). Topic SS 07: Untersuchung einer Common-API-Empfehlung für strukturierte P2P-Netzwerke (Investigation of a Common API Recommendation for Structured P2P Networks). Topics WS 07/08: 1. Two Approaches for Standardization of Structured Peer-to-Peer Overlay Protocols 2. Peer-to-Peer Architekturen (Peer-to-Peer Architectures)
SS 07 Practical Course: Peer-to-Peer Infrastructures (Computer Science, Databases and Distributed Systems). Topic: Piki - A peer-to-peer based wiki engine (initial development)

supervised diploma, m.sc., student, and b.sc. theses

Dirk Podolak: Peer-to-Peer basierte Versionierung mit Schwerpunkt Anwendungsintegration am Beispiel DOORS - Enterprise Architect (Peer-to-Peer based Version Control focussed on the Application Integration DOORS - Enterprise Architect); Diploma Thesis (ES-D-0028), 03/2008, mark: 2.0

Benedikt Antoni: Lösung von Versionskonflikten beim Verschmelzen von Teilnetzen in einem Peer-to-Peer basierten Versionskontrollsystem (Solving Versions’ Conflicts when Melting Partitioned Subnetworks in a Peer-to-Peer based Version Control System); Bachelor’s Thesis (ES-B-0038), 08/2008, mark: 1.7

Marwan Hajj-Moussa: An OSGi Compliant Platform to Reduce the Complexity of Peer-to-Peer Application Development; Master’s Thesis (ES-M-0036), 10/2008, mark: 2.7

Marcel Queisser: Zugangs- und Zugriffskontrolle für Peer-to-Peer Netzwerke (Authentication and Access Control for Peer-to-Peer Networks); Diploma Thesis (ES-D-0037), 3/2009, mark: 1.7

Christian Schott: Inherent Traceability and Change Notification in Distributed Software Development; Diploma Thesis (ES-D-0006), 04/2009, mark: 3.0

Lars Almon and Martin Soemer: Analyse und Bewertung Agiler Entwicklungsmethoden (Analysis and Evaluation of Agile Software Development Methodologies); Bachelors’ Team Thesis (ES-B-0041 and ES-B-0042), 6/2009, mark: 2.0 and 2.0

Florian Gattung: Benchmarking and Prototypical Implementation of a Social Knowledge Network; Diploma Thesis (KOM-D-370) (in collaboration with the Multimedia Communications Lab), 08/2009, mark: 1.0

Essam Alalami: Design und Implementierung eines JMI-Repository für Wiki-Engines zur Werkzeug- integration durch modellgetriebene Softwareentwicklung (Design and Implementation of a JMI-Repository for Wiki Engines to enable Tool-Integration through model-driven Softwaredevelopment); Master’s Thesis (ES-MA-0041), 08/2009, mark: 2.0

Luciana Alvite: Development of a Dynamic Flexible Monitored Peer-to-Peer based Social Network Platform using OSGi; Diploma Thesis (KOM-D-367) (in collaboration with the Multimedia Communications Lab), 11/2009, mark: 1.3

Robert Vock: Entwicklung eines Transaktionsprotokolls für Peer-to-Peer basiertes Versionsmanage- ment (Development of a Transaction Protocol for Peer-to-Peer based Versionmanagement); Diploma Thesis (ES-D-0044), 12/2009, mark: 2.3

Sebastian Schlecht: Design and Implementation of a Transactional Peer-to-Peer Based Version Control System; Master’s Thesis (ES-M-0043), 12/2009, mark: 1.7

Damian A. Czarny and Alexander Gebhardt: Entwicklung einer automatisierten Evaluationsplattform für verteilte Anwendungen (Development of an automated Evaluation Platform for distributed Applications); Bachelors’ Team Thesis (ES-B-051 and ES-B-052) (in collaboration with the Multimedia Communications Lab), 12/2009, mark: 1.0 and 1.0

Jan Schluchtmann: Untersuchung der Übertragbarkeit von SCM-Techniken auf DHT-P2P-Systeme (Investigation of the transferability of SCM techniques on DHT P2P systems); Master’s Thesis (ES- MA-0042), 03/2010, mark: 3.0

March 11, 2011

PUBLICATIONS

journal articles

[1] Aleksandra Kovacevic, Sebastian Kaune, Patrick Mukherjee, Nicolas Liebau, and Ralf Steinmetz. Benchmarking Platform for Peer-to-Peer Systems. it - Information Technol- ogy (Methods and Applications of Informatics and Information Technology), 49(5):312–319, September 2007.

[2] Kalman Graffi, Aleksandra Kovacevic, Patrick Mukherjee, Michael Benz, Christof Leng, Dirk Bradler, Julian Schroeder-Bernhardi, and Nicolas Liebau. Peer-to-Peer Forschung - Überblick und Herausforderungen. it - Information Technology (Methods and Applications of Informatics and Information Technology), 49(5):272–279, September 2007.

conference and workshop contributions

[3] Patrick Mukherjee, Aleksandra Kovacevic, Michael Benz, and Andy Schürr. Towards a Peer-to-Peer Based Global Software Development Environment. In Proceedings of the 6th GI Software Engineering Conference (SE), pages 204–216, February 2008.

[4] Patrick Mukherjee, Aleksandra Kovacevic, and Andy Schürr. Analysis of the Benefits the Peer-to-Peer Paradigm brings to Distributed Agile Software Development. In Proceedings of the 6th GI Software Engineering Conference (SE), pages 72–77, February 2008.

[5] Patrick Mukherjee, Christof Leng, and Andy Schürr. Piki - A Peer-to-Peer based Wiki Engine. In Proceedings of the 8th IEEE International Conference on Peer-to-Peer Computing (P2P), pages 185–186, September 2008.

[6] Patrick Mukherjee, Christof Leng, Wesley W. Terpstra, and Andy Schürr. Peer-to-Peer based Version Control. In Proceedings of the 14th IEEE International Conference on Parallel and Distributed Systems (ICPADS), pages 829–834, December 2008.

[7] Kalman Graffi, Sergey Podrajanski, Patrick Mukherjee, Aleksandra Kovacevic, and Ralf Steinmetz. A Distributed Platform for Multimedia Communities. In Proceedings of the 10th IEEE International Symposium on Multimedia (ISM), pages 6–12, December 2008.

[8] Kalman Graffi, Patrick Mukherjee, Burkhard Menges, Daniel Hartung, Aleksandra Kovacevic, and Ralf Steinmetz. Practical Security in P2P-based Social Networks. In Proceedings of the 34th Annual IEEE Conference on Local Computer Networks (LCN), pages 269–272, October 2009.

[9] Kalman Graffi, Christian Groß, Patrick Mukherjee, Aleksandra Kovacevic, and Ralf Steinmetz. LifeSocial.KOM: A P2P-based Platform for Secure Online Social Networks. In Proceedings of the 10th IEEE International Conference on Peer-to-Peer Computing (P2P), August 2010.


Part V

APPENDIX

A ABOUT THE PROTOTYPE’S DEVELOPMENT

The final design of PlatinVC presented in this work is the result of five iterations in the development process. In each iteration we developed a working prototype, which was reprogrammed in the next iteration based on an improved design. In each iteration we added new features:

• The first prototype had only the basic feature of controlling the versions of single artifacts. Named branches and tags were supported as well. The only reused existing component was the communication network layer, FreePastry [FP].

• We developed the second prototype on a new basis, which featured link version control (compare to Section 7.3.7).

• Starting with the third version we improved the handling of faulty situations, adding mechanisms that handle disturbances such as network partitioning (described in Section 8.6). Large parts of the existing code were reimplemented; e.g., the replication mechanism was rewritten from scratch. The additional failure handling mechanisms raised the complexity of the implementation. We therefore based all components on a framework that handles the components’ intercommunication and lifecycle (Equinox [equ]); a sketch of such a component registration follows this list.

• In the fourth version we replaced our own mechanisms for local version control with another industry-proven component (Mercurial [Mac]).

• In the final version, presented in this work, we improved all mechanisms and thus achieved a higher consistency degree.
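The following sketch illustrates how a component could be registered as an OSGi service on an Equinox-based framework like Pipe. The service interface and classes shown are hypothetical placeholders introduced for illustration, not the actual PlatinVC implementation.

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;

// Illustrative bundle activator: publishes a (hypothetical) version control service so
// that other components can look it up via the OSGi service registry.
public class PlatinVcActivator implements BundleActivator {

    private ServiceRegistration<VersionControlService> registration;

    @Override
    public void start(BundleContext context) {
        registration = context.registerService(
                VersionControlService.class, new PlatinVcService(), null);
    }

    @Override
    public void stop(BundleContext context) {
        registration.unregister();
    }

    interface VersionControlService { void push(); void pull(); }

    static class PlatinVcService implements VersionControlService {
        public void push() { /* share local snapshots with the responsible maintainers */ }
        public void pull() { /* retrieve the latest snapshots of the requested module */ }
    }
}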

Versions: 1. Dirk Podolak (diploma thesis), 2. Piki practical course, 3. Benedikt Antoni (bachelor’s thesis), 4. Sebastian Schlecht (master’s thesis), 5. own work / Sebastian Schlecht (student assistant). New features: 1. basis, CVS over DHT; 2. + links; 3. + network partitions, Pipe; 4. + snapshots, offline operation, reused Mercurial.


ERKLÄRUNG

Erklärung laut §9 PromO

Ich versichere hiermit, dass ich die vorliegende Dissertation allein und nur unter Verwendung der angegebenen Literatur verfasst habe. Die Arbeit hat bisher noch nicht zu Prüfungszwecken gedient.

LIST OF FIGURES

Figure 1 Basic principle of centralized version control systems 4
Figure 2 Basic principle of peer-to-peer version control systems 5
Figure 3 Peer to peer based Wiki engine Piki [MLS08] 11
Figure 4 Peer-to-peer based integrated project-support environment (Pipe) [MKS08] 14
Figure 5 Peer-to-peer based integrated project-support environment (Pipe) [MKS08] 15
Figure 6 Globally distributed developers in a GSD scenario 20
Figure 7 The centralized workflow, which is used for distributed as well as for centralized version control systems 36
Figure 8 The integration manager collaboration workflow 36
Figure 9 The lieutenants collaboration workflow 37
Figure 10 Timing diagram of the update process 41
Figure 11 Basic architecture of SCCS and RCS 45
Figure 12 Basic architecture of CVS and Subversion 47
Figure 13 Dynamic (Alice) and Snapshot (Bob) View in ClearCase 50
Figure 14 Basic architecture of dVCSs 52
Figure 15 Sharing versions between participants of a dVCS 53
Figure 16 Alice and Bob share changes in a dVCS 54
Figure 17 Metamodel of Git’s version model 56
Figure 18 Metamodel of Mercurial’s version model 58
Figure 19 Unstructured and hierarchical peer-to-peer overlay network topologies 62
Figure 20 Structured peer-to-peer overlay network topology 64
Figure 21 Basic structure of Pastwatch’s repository 66
Figure 22 Basic architecture of PlatinVC 93
Figure 23 Basic Workflow in PlatinVC 95
Figure 24 Conflicts and Resolution in PlatinVC 96
Figure 25 Concurrent updates on a link document 102
Figure 26 Example to clarify which snapshots are pulled 105
Figure 27 Exemplary history (simplified) 105
Figure 28 High level architecture of PlatinVC 110
Figure 29 Metamodel of the components on a peer 113
Figure 30 Sequence diagram showing basic message exchange using the physical transport layer 119
Figure 31 Messages are resent if no reply is received as shown in this sequence diagram 120
Figure 32 If the direct address of a peer is unknown messages are routed as described by the depicted sequence diagram 120
Figure 33 Similar to the resending presented in Figure 31 routed messages are repeated as well 121
Figure 34 Structure of the exemplary project 123
Figure 35 The history in Alice’s local and peer-to-peer module 123
Figure 36 The history in Bob’s local and peer-to-peer module after executing the pull operation 124
Figure 37 History (cache marked with dotted lines) after the first pull phase 124
Figure 38 The history in Alice’s local and peer-to-peer module after executing the pull operation 125
Figure 39 Activity diagram of the pull operation 126
Figure 40 Activity diagram: Choose the snapshots to which the local repository should be updated 127

Figure 41 Activity diagram: Get bundle with missing snapshots in the second phase of the pull operation 128
Figure 42 Exemplary history in Bob’s peer-to-peer module 130
Figure 43 Exemplary history in Alice’s peer-to-peer module 131
Figure 44 History (with cache marked by dotted lines) of Cliff’s peer-to-peer module 132
Figure 45 Activity diagram of the global push operation 133
Figure 46 Activity diagram: Open socket during push globally 134
Figure 47 Sequence diagram: Push local snapshots 135
Figure 48 Activity diagram: Prepare to push new history entries 136
Figure 49 Activity diagram describing the actions taken when the neighborhood changes (part 1) 139
Figure 50 Activity diagram describing the actions taken when the neighborhood changes (part 2) 140
Figure 51 The simplified architecture of Pipe 143
Figure 52 The connection options offered by the connection manager (running in Windows) 145
Figure 53 The settings needed by the connection manager, with security enabled (running in Linux) 146
Figure 54 PlatinVC as a stand-alone application 148
Figure 55 Settings for PlatinVC (running in OS X) 149
Figure 56 Piki as a stand-alone application (running in OS X) 150
Figure 57 Piki integrated in Eclipse (running in Windows) 151
Figure 58 PlatinVC and Askme integrated in Eclipse (running in OS X) 153
Figure 59 Churn execution in our experiments 161
Figure 60 Degree of consistency 162
Figure 61 Percentage of successful operations 163
Figure 62 Freshness of pushed updates (time till available) 163
Figure 63 Time needed to execute the push/pull operation 164
Figure 64 Operation time 165
Figure 65 Size of transferred messages (payload + system messages) 166

LIST OF TABLES

Table 1 Fulfilled requirements (see Section 3.2) 75
Table 2 Guaranteed degree of consistency 81
Table 3 Architectural aspects of the examined VCSs 86
Table 4 Comparison of commit/push and update/pull operation time in Subversion (SVN), Pastwatch [YCM06] and PlatinVC 167
Table 5 Comparison of the features offered by the observed version control systems 168

INDEX

ACID, 27 encrypt, 66 Artifact, 16 eventual consistency, 41 automatic branches, 99 Automatic Isolation of Concurrent Work, failing peer, 19 97 Flooding, 62 forward delta, 28 backward delta, 28 framework services, 9 based, 39 Freshness, 41 basis version, 25 binary delta compression, 28 Global Knowledge, 118 binary diff, 28 global push, 106 bisect, 113 global repository, 52 bootstrapping peer, 144 Global Software Development, 14 branch, 46 GSD, 14 bundle, 132 Hash Collision, 56 causal consistency, 42 Hash Function, 56 centralized collaboration workflow, 35 Hash Value, 56 Centralized Version Control System, 45 head version, 40 changeset, 25 heads, 40 check out, 48 Hierarchical Peer-to-peer Overlay Network, cherry picking, 25 62 Churn, 19 history, 23 ClearCase, 49 history cache, 114 ClearCase Multisite, 51 history cache entries, 126 client-server, 4 hop, 10 client-server communication paradigm, 47 hops, 121 clone, 54 hotspots, 68 Coherency, 42 hybrid concurrency control, 69 commit, 3 hybrid resolving, 82 concurrency control, 82 hybrid voting, 82 concurrent versions system, 46 indeterministic routing, 140 Consistency, 40 index, 10 Consistency Degree, 40 indirect replication, 82 Consistency Models, 39 initial version: , 39 copy-modify-branch, 47 integration manager collaboration work- copy-modify-merge, 47 flow, 36 Cryptographic Hash Function, 56 iterative routing, 80 cVCS, 45 CVS, 46 JXTA, 62

Definition Box, vii keep-alive messages, 10 delta compression, 28 derived object, 50 lazy copy, 49 development line, 40 leaving peer, 19 DHT, 65 lieutenants collaboration workflow, 36 diff, 28 Link, 13 Distributed Hash Table, 65 local commit, 106 Distributed Version Control System, 52 local module, 113 dVCS, 52 lock-modify-write, 46


log head, 66 replica set, 117 replicated repository distribution, 61 mainframe communication paradigm, 45 repository, 47 maintainer, 116 revision, 12 maintainer based repository distribution, revision control system, 46 61 Robustness, 29 Merging, 6 root of trust, 146 Metadata, 114 Metric, 156 Scalability, 29 Module, 47 sequential consistency, 42 serve, 53 NAT, 27 SHA-1, 56 Neighbor Peer, 120 Shallow Copy, 49 Neighborhood Peers, 10 shifting, 19 Network Address Translation, 27 sign, 66 Network Partition, 139 Single point of control, 17 Nomadic Developers, 14 Single point of failure, 17 Single point ownership, 17 octopus merge, 56 Snapshot, 23 offline messaging, 152 software product, 93 operation-based diff, 70 state-based diff, 70 Optimistic Concurrency Control, 47 Structured Peer-to-peer Overlay Network, outdated, 40 64 owner path caching, 129 Subversion, 48 p2p, 5 superpeer, 62 path, 40, 40 SVN, 48 Peer, 5 sync, 55 Peer ID, 9 System Load, 29 Peer Proximity, 10 Three Way Merge, 6 Peer-to-Peer, 5 trunk, 46 Peer-to-Peer Application, 9 Peer-to-Peer based Version Control System, underlay, 9 60 unnamed branch, 53 Peer-to-Peer Maintenance Mechanisms, 10 unnamed branches, 96 Peer-to-Peer Module, 114 Unstructured Peer-to-peer Overlay Network, Peer-to-Peer Overlay Network, 9 61 Peer-to-Peer Protocol, 10 update, 3 Peer-to-Peer Routing Mechanisms, 10 user level applications, 9 Peer-to-Peer System, 9 Pessimistic Concurrency Control, 46 variant, 12 private key, 66 Version, 12 proactive conflict resolution, 46 Version Control System, 3 public key, 66 Version Model, 46 Public Key Cryptography, 66 versioned item, 50 pull, 53 versioned object base, 50 push, 53 virtual global module, 114 Virtual Global Repository, 94 Quality Aspect, 155 VOB, 50

RCS, 46 Working Copy, 48 reactive conflict resolution, 47 rebase, 54 recursive routing, 80 related versions, 39 replica peers, 65