
Design and Development of a Service for Software Interrelationships

Diplomarbeit

zur Erlangung des akademischen Grades

Diplom-Ingenieur

im Rahmen des Studiums

Software Engineering & Internet Computing

eingereicht von

Nikola Ilo
Matrikelnummer 0925955

an der Fakultät für Informatik der Technischen Universität Wien

Betreuung: Thomas Grechenig
Mitwirkung: Mario Bernhart

Wien, 6. Oktober 2014 (Unterschrift Verfasser/In) (Unterschrift Betreuung)

Technische Universität Wien · A-1040 Wien, Karlsplatz 13 · Tel. +43-1-58801-0 · www.tuwien.ac.at

Design and Development of a Service for Software Interrelationships

Master’s Thesis

submitted in partial fulfillment of the requirements for the degree of

Diplom-Ingenieur

in

Software Engineering & Internet Computing

by

Nikola Ilo
Registration Number 0925955

to the Faculty of Informatics at the Vienna University of Technology

Advisor: Thomas Grechenig
Assistance: Mario Bernhart

Vienna, October 6, 2014 (Signature of Author) (Signature of Advisor)

Technische Universität Wien · A-1040 Wien, Karlsplatz 13 · Tel. +43-1-58801-0 · www.tuwien.ac.at

Statement by Author

Nikola Ilo
Pfalzauerstraße 60, 3021 Pressbaum

Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwendeten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der Arbeit – einschließlich Tabellen, Karten und Abbildungen –, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter Angabe der Quelle als Entlehnung kenntlich gemacht habe.

I hereby declare that I am the sole author of this thesis, that I have completely indicated all sources and help used, and that all parts of this work – including tables, maps and figures – if taken from other works or from the internet, whether copied literally or by sense, have been labelled including a citation of the source.

(Place, Date) (Signature of Author)


This thesis is dedicated to the memory of my beloved father.

Dipl.-Ing. Dr. Sotiraq Ilo
1958–2014


Acknowledgements

First of all I want to thank my advisers Prof. Dr. Thomas Grechenig and Mario Bernhart for their support. They allowed me to explore my topic freely and thereby walk the path to which the research led me, from Software Engineering into the interesting field of the Semantic Web. Such open-mindedness is not to be taken for granted. I would also like to thank Brigitte Brem, who helped with the formalities of handing in the thesis.

I want to thank the two employers I had during my master's studies, Christoph Leithner and Manfred Gronalt. They made it possible to finish my degree and work at the same time.

Special thanks are due to my friends and supporters: Michael Geyer, who helped to convert my hand-drawn sketches into beautiful graphics; Wilfried Mayer, who was always there to talk to when I encountered a problem; and Florian Hassanen, who significantly helped me to refine the nomenclature of the concepts created as part of this thesis.

Further I want to thank my mother, Albana Ilo. Besides caring like all mothers do, she proofread my thesis and provided most valuable feedback, making a great effort to read into a new subject.

Most of all I want to thank my significant other, Anna Mach, for her steady support and for enduring the countless days when the keyboard noise did not stop until late at night. She cheered me up whenever I hit a wall and managed everything I could not while I was writing.

Last but not least I want to thank all my family, friends, and colleagues who supported me throughout the difficult last two years.


Kurzfassung

Inter-Software-Beziehungen, wie z.B. Software-Abhängigkeiten, haben Auswirkung auf die Qualität und Entwicklung von Software und Software-Projekten und sind daher von essentieller Bedeutung für die Software-Entwicklung und -Wartung. Aus diesem Grund gibt es bereits ausgeklügelte Systeme, um Software-Beziehungen zu deklarieren, zu verwalten und nutzbringend für Softwarebetriebs- und Entwicklungsprozesse einzusetzen. Nennenswerte Beispiele hierfür sind Paket-Management-Systeme von Linux-Distributionen und Build-Management-Systeme wie Apache Maven. Die Software-Netzwerke, auf denen diese Systeme agieren, bilden in sich interoperable, aber jeweils abgeschlossene Software-Ökosysteme, die sich in Syntax und Semantik voneinander unterscheiden, obwohl es Überlappungen in der Menge der enthaltenen Software gibt. Derzeit gibt es kein anwendbares System, welches Software-Ökosystem-übergreifende Abfragen und Auswertungen zulässt. Diese Arbeit greift die Problemstellung auf, die semantischen und syntaktischen Grenzen von Software-Ökosystemen zu überwinden und dadurch die praktische Nutzung von Informationen über Inter-Software-Beziehungen für die Software-Entwicklung und -Wartung zu ermöglichen. Im Rahmen dieser Arbeit wurde ein Software-Prototyp entwickelt, der es ermöglicht, verschiedene Software-Ökosysteme zu integrieren und dadurch systemübergreifende Abfragen durchzuführen. Ein besonderes Augenmerk wurde auf Erweiterbarkeit und Skalierbarkeit gelegt, damit möglichst einfach neue, aber auch zahlreiche Software-Ökosysteme integriert werden können. Während der Entwicklung zeigte sich, dass Semantic-Web-Technologien einen guten Rahmen für die Bearbeitung der Problemstellung bieten. Mehrere Software-Ökosysteme wurden für die Evaluierung der Datenintegration eingebunden, z.B. aus den Debian/Ubuntu-Quellen oder den Common Vulnerabilities and Exposures (CVE)- und Common Platform Enumeration (CPE)-Verzeichnissen des National Institute of Standards and Technology (NIST). Weiters wurden Applikationen, wie ein Sicherheitslücken-Benachrichtigungssystem oder ein Lizenz-Einhaltungs-Überprüfungsprogramm, beispielhaft implementiert, um das Potential von Software-Ökosystem-übergreifenden Abfragen aufzuzeigen und das Ergebnis zu evaluieren. Die wissenschaftlichen Beiträge dieser Arbeit gliedern sich wie folgt: eine verteilte Architektur für das Abgreifen, Parsen, Umlegen, Nachbearbeiten und Abrufen von generischen Datenquellen in ein semantisches RDF-Datenmodell; eine abstrakte OWL-Ontologie für die semantische Modellierung von Inter-Software-Beziehungen; sowie ein System für die Verarbeitung von temporalen Resource Description Framework (RDF)-Aussagen mit SPARQL Protocol and RDF Query Language (SPARQL). Hierbei werden die Anfragen unter Beachtung der zeitlichen Gültigkeit, jedoch ohne vorherige zeitliche Normalisierung von Beobachtungszeitpunkten in Gültigkeitszeiträume, evaluiert.

Schlüsselwörter

Software-Beziehungen, Software-Abhängigkeiten, Semantic Web, Ontologie, Temporales SPARQL, metaservice, Mining Software Repositories


Abstract

Software interrelationships, like software dependencies, have an impact on the quality and evolution of software projects and are therefore important to software development and maintenance. Sophisticated systems have been created in the past to define, manage, and utilize such relationships in software processes. Notable examples are the package management systems of Linux distributions and build systems like Apache Maven. These systems are clustered into software ecosystems, which are mostly syntactically and semantically incompatible with each other, although the software they describe can overlap. Currently there is no viable system for querying information across different ecosystems. This thesis addresses how to overcome the semantic and syntactic borders of software ecosystems and thereby enable the practical usage of information about software interrelationships in software development and maintenance. An iterative approach was used to develop a prototype that enables the integration of, and therefore queries across, different software ecosystems. Particular emphasis was placed on extendibility and scalability, i.e., on the ability to easily integrate both new and numerous ecosystems. During development, Semantic Web technologies proved to provide a suitable framework for this task. Several ecosystems, like the Debian/Ubuntu repositories and the CVEs and CPEs defined by the NIST, were integrated to evaluate the data integration. Additionally, small applications, like a vulnerability notification system and a license violation detector, were used to show the usefulness of aggregated cross-ecosystem interrelationships. The contributions of this thesis consist of: a distributed architecture for the retrieval, parsing, mapping, post-processing, and querying of generic data into a semantic RDF data model; an abstract OWL ontology for the semantic modeling of inter-software relationships; and a model for processing temporally scoped RDF statements using SPARQL without prior normalization of observation times to time periods.

Keywords

Software Relationships, Software Dependencies, Semantic Web, Ontology, Temporal SPARQL, metaservice, Mining Software Repositories


Contents

1 Introduction
  1.1 Problem Statement
  1.2 Motivation
  1.3 Methodology
  1.4 Contributions
  1.5 Related Work
    1.5.1 Software Interrelationships
    1.5.2 Mining Software Repositories
    1.5.3 Semantic Web
  1.6 Thesis Structure

2 Fundamentals
  2.1 Software Repositories and Ecosystems
    2.1.1 Debian Ecosystem
    2.1.2 Apache Maven Repositories
    2.1.3 Joinup
    2.1.4 National Vulnerability Database
    2.1.5 Universal Description, Discovery and Integration
  2.2 Semantic Web Technologies
    2.2.1 Resource Description Framework
    2.2.2 Vocabulary, Ontologies and Reasoning
    2.2.3 Linked Data
  2.3 Software Ontologies
    2.3.1 Description of a Project
    2.3.2 Software Package Data Exchange
    2.3.3 Asset Description Metadata Schema for Software
    2.3.4 Software Ontology
    2.3.5 Other Software Ontologies
  2.4 Graph Databases
    2.4.1 Property Graph Databases
    2.4.2 RDF Databases
  2.5 Temporal Databases
    2.5.1 Temporal RDF
  2.6 Provenance in RDF
    2.6.1 Reification
    2.6.2 Named Graphs
    2.6.3 Statement Identifier
  2.7 Web Data Extraction

3 Requirements
  3.1 Stakeholders
    3.1.1 Web Users
    3.1.2 API Users
    3.1.3 Module Developers
    3.1.4 Software Repository Operators
    3.1.5 System Administrators
  3.2 Use cases
    3.2.1 Integration of Software Repositories
    3.2.2 Manual Browsing through Software Metadata and Software Interrelationships
    3.2.3 Security Alerts based on Security Report Propagation
    3.2.4 Discovery of potential Licensing Conflicts
  3.3 Functional Requirements
  3.4 Non-Functional Requirements
    3.4.1 Scalability
    3.4.2 Extendibility
    3.4.3 Security
    3.4.4 Usability and Documentation
    3.4.5 Performance
    3.4.6 Maintainability

4 Design
  4.1 Initial Design Considerations
  4.2 System Architecture
    4.2.1 Components
    4.2.2 Modules
    4.2.3 Distributed Execution
  4.3 Processing Model
    4.3.1 Observations
    4.3.2 Processing Pipelines
    4.3.3 Data Retrieval and First Processing
    4.3.4 Postprocessing
  4.4 Data Model
    4.4.1 Definitions
    4.4.2 Observation Semantics and Provenance Information
    4.4.3 Provider Data Model
    4.4.4 Postprocessor Data Model
  4.5 Temporal Queries
    4.5.1 Introduction of SPARQL@T
    4.5.2 Writing Temporal SPARQL Queries
    4.5.3 Automatic Translation of a subset of SPARQL
  4.6 Interface Design
  4.7 Ontologies
    4.7.1 Metaservice Observation Ontology
    4.7.2 Software Relationship Ontology

5 Implementation
  5.1 Platform
  5.2 Component and Module System
  5.3 Manager
  5.4 Data Retrieval and Archival
  5.5 RDF Database
  5.6 Messaging
    5.6.1 Message Service and ActiveMQ Messaging
    5.6.2 Custom Messaging
  5.7 Frontend
    5.7.1 Semantic Web Frontend
    5.7.2 User Interface
  5.8 Temporal SPARQL Queries
    5.8.1 Query Building and Translation
    5.8.2 Quad Support for SPARQL CONSTRUCT Queries
    5.8.3 Query Optimization Techniques

6 Evaluation
  6.1 Integration of the Debian Ecosystem
  6.2 Integration of the Maven Ecosystem
  6.3 Implementation of License Conflict Discovery
    6.3.1 Detection of Software Licenses
    6.3.2 Integration of Linked Data Software Descriptions
    6.3.3 Copyleft Conflict Detection Query
  6.4 Implementation of Security Report Alerts
    6.4.1 Integration of CPE and CVE
    6.4.2 Integration of WordPress
    6.4.3 Query Execution and Alert
  6.5 Runtime Environment
  6.6 Discussion

7 Conclusion
  7.1 Summary
  7.2 Limitations
  7.3 Implications
  7.4 Outlook

Bibliography
  References
  Online References

Appendix A Linking Open Data Cloud

Appendix B Selected Complete Listings
  B.1 RDF Serializations
  B.2 SPARQL Queries
  B.3 Metaservice Module Descriptor

Appendix C Screenshots

List of Figures

2.1 Example RDF Graph
2.2 Statement without Provenance Information
2.3 Provenance Information using Reification
2.4 Provenance Information using Named Graphs
2.5 Provenance Information using Statement Identifiers

3.1 Stakeholder Diagram

4.1 Component Diagram
4.2 Provider Pipeline
4.3 Postprocessor Pipeline
4.4 SWREL Relationship Property Hierarchy
4.5 SWREL Dependency Property Hierarchy

5.1 Messaging System based on ActiveMQ Virtual Topic
5.2 Custom Messaging System
5.3 Dense RDF Graph

A.1 Linking Open Data Cloud May 2007 by Cyganiak and Jentzsch [17, 91]
A.2 Linking Open Data Cloud March 2009 by Cyganiak and Jentzsch [17, 91]
A.3 Linking Open Data Cloud September 2011 by Cyganiak and Jentzsch [17, 91]
A.4 Linking Open Data Cloud April 2014 by Schmachtenberg, Paulheim, and Bizer [63]

C.1 Debian Package Template - Web-Frontend
C.2 Generic Package Template - Web-Frontend
C.3 Management Shell
C.4 WordPress Security Alert on Android


List of Tables

2.1 Ubuntu Repository Scheme
2.2 Selection of Maven Stages in Execution Order
2.3 RDF Namespaces and Prefixes
2.4 OWL Profiles
2.5 Class descriptions from DOAP [95]
2.6 Class Descriptions from SPDX [140]
2.7 Class Descriptions from ADMS.SW [30]
2.8 Selection of Software Ontology (SWO) [139] Classes

4.1 Metaservice Observation Ontology Classes
4.2 Metaservice Observation Ontology Properties
4.3 SWREL Classes
4.4 SWREL Relationship Properties
4.5 SWREL Dependency Properties
4.6 SWREL Range Restriction Properties

5.1 Metaservice Maven Artifacts

6.1 Mapping of Debian Control Fields to Metaservice Debian Ontology and to Metaservice Ontologies
6.2 Mapping of Maven Concepts to Metaservice Concepts


List of Listings

2.1 RDFS Description of rdfs:subClassOf using Turtle
2.2 Time Point Query τ-SPARQL
2.3 Validity Time Selection Query τ-SPARQL
2.4 Validity Time Intersection SPARQL-ST

4.1 Temporal Constraint on a Statement Pattern SPARQL@T Query
4.2 maxProviderTime Subquery - Translation of Listing 4.1 to SPARQL on the Provider Data Model
4.3 Main Query - Translation of Listing 4.1 to SPARQL on the Provider Data Model
4.4 maxPostprocessorTime Subquery - Translation of Listing 4.1 to SPARQL on the Postprocessor Data Model
4.5 Main Query - Translation of Listing 4.1 to SPARQL on the Postprocessor Data Model
4.6 Main Query - Translation of Listing 4.1 to SPARQL on the Metaservice Data Model

5.1 Query Building in Java
5.2 Output SPARQL Query of Listing 5.1
5.3 SPARQL Quad CONSTRUCT Query
5.4 SPARQL Query on Dense RDF Graph
5.5 Optimized SPARQL Query on Dense RDF Graph

6.1 Copyleft License Conflict Discovery SPARQL Query
6.2 Security Alert SPARQL Query

B.1 RDF/XML Serialization of RDF Graph in Figure 2.1
B.2 Turtle Serialization of RDF Graph in Figure 2.1
B.3 JSON-LD Serialization of RDF Graph in Figure 2.1
B.4 RDFa Serialization of RDF Graph in Figure 2.1
B.5 Temporal Constraint on a Statement Group SPARQL@T Query
B.6 Translation of Listing 4.1 to SPARQL on a Valid-From and -To Data Model
B.7 Translation of Listing B.5 to SPARQL on a Valid-From and -To Data Model
B.8 Optimized SPARQL Query for Resource Lookup
B.9 Full Translation of Temporal SPARQL Query on the Metaservice Data Model
B.10 Metaservice Module Descriptor


List of Abbreviations

ACID Atomicity, Consistency, Isolation, Durability.

ADMS Asset Description Metadata Schema.

ADMS.SW Asset Description Metadata Schema for Software.

API Application Programming Interface.

APT Advanced Packaging Tool.

BOM Bug Ontology Model.

CPAN Comprehensive Perl Archive Network.

CPE Common Platform Enumeration.

CPU Central Processing Unit.

CSS Cascading Style Sheets.

CVE Common Vulnerabilities and Exposures.

DC Dublin Core.

DOAP Description of a Project.

DOM Document Object Model.

DPKG Debian Packaging Tool.

FLOSS Free/Libre/Open Source Software.

FOAF Friend of a Friend.

FOSS Free/Open Source Software.

GPL General Public License.

GUI Graphical User Interface.

HDD Hard Disk Drive.

HT Hyper Threading.

HTML Hypertext Markup Language.

HTTP Hypertext Transfer Protocol.

httpd HTTP daemon.

IDE Integrated Development Environment.

IO Input/Output.

IRC Internet Relay Chat.

IRI Internationalized Resource Identifier.

Java EE Java Platform, Enterprise Edition.

JMS Java Message Service.

JRE Java Runtime Environment.

JSON JavaScript Object Notation.

JVM Java Virtual Machine.

LOD Linking Open Data.

MSR Mining Software Repositories.

NEPOMUK Networked Environment for Personalized, Ontology-based Management of Unified Knowledge.

NFO NEPOMUK File Ontology.

NIST National Institute of Standards and Technology.

NMA Notify My Android.

NPM Node Packaged Modules.

NVD National Vulnerability Database.

OWL Web Ontology Language.

POM project object model.

PPA Personal Package Archive.

PyPI Python Package Index.

RADion Repository Asset Distribution.

RAM Random Access Memory.

RDF Resource Description Framework.

RDFa RDF in attributes.

RDFS RDF Schema.

REST Representational State Transfer.

SEON Software Evolution Ontologies.

SID Statement Identifier.

SOA Service-oriented Architecture.

SOM Software Ontology Model.

SPARQL SPARQL Protocol and RDF Query Language.

SPDX Software Package Data Exchange.

SQL Structured Query Language.

SSD Solid State Disk.

SWO Software Ontology.

SWREL Software Relationship Ontology.

tmpfs Temporary File System.

UDDI Universal Description, Discovery and Integration.

URI Uniform Resource Identifier.

VCS Version Control System.

VOM Version Ontology Model.

W3C World Wide Web Consortium.

WWW World Wide Web.

XML Extensible Markup Language.



1 Introduction

1.1 Problem Statement

Software, software artifacts, and software projects are in relationships with each other. The reasons for the need for software relationships are numerous and include code reuse, abstraction of complexity, or the need for integration. Because of the impact of inter-software-project relationships on the reliability [60], security [45], compatibility [77], and development of software projects [15], they are important to software project management [66] and software maintenance [1]. Probably the most prominent example of software interrelationships is software dependencies.

Managing software relationships has been such a big concern that sophisticated management systems and repositories were created and are in wide use [53]. One example is the build management tool Apache Maven [85], which enables Java developers to define dependencies on existing software modules, which are automatically fetched from a central repository. Another example are the package management systems of Linux distributions, which handle connections to upstream software and transform it into an easily deployable package that resolves all its dependencies on installation. All these systems have in common that they concentrate on a very specific usage area or topic and share the same semantics and the same toolset for interaction. In this thesis we call the set of software and its interrelationships in an interoperable system a software ecosystem. A more precise definition, as well as a comparison to the traditional meaning of the term "software ecosystem", is given in Section 2.1.

Data in these ecosystems is usually of very high quality, as the semantics and structure are often explicitly defined in a formalized way to enable automated processing by software tools. Due to the high quality and easy usability of the data through ready-to-use tools, research on relationships inside single ecosystems has been an attractive topic [59, 23]. This is not the case for studies across different ecosystems [76], because there are usually no interfaces to query between different software ecosystems. Therefore, querying over several of them becomes a tedious task, and crossing ecosystem borders has been avoided. Doing so involves solving the following problems:

• Software ecosystems provide different Application Programming Interfaces (APIs) and data formats to access and present information. For each system, crawlers and parsers need to be written to aggregate relationship information.

• The semantics of the relationship types often differ between individual software ecosystems. A semantic mapping between these types needs to be established.

So far there is no viable solution that enables software developers, researchers, or other software stakeholders to query information across software ecosystem borders.


1.2 Motivation

The reason why cross-software-repository querying is desirable is that a specific piece of software may be an element of different software ecosystems. Usually no single ecosystem contains all information; rather, the metadata is distributed among them. This leads to cumbersome utilization of information, for example in software evolution research, where studies are therefore often based on limited datasets and consequently often contradict each other. According to Herraiz, Robles, and Gonzalez-Barahona [38], software evolution research would greatly profit from structured access to software projects, such that empirically more significant studies could be undertaken more easily. The following practical examples illustrate the potential of easy queries across software ecosystem borders:

Security Report Propagation OpenSSL is a security-critical encryption library, which is widely used in Free/Open Source Software (FOSS) but also in commercial software products. In 2014 a critical bug in the heartbeat (keep-alive) extension of OpenSSL was found. Without going into further detail, the consequence of the bug was that internet-enabled systems worldwide needed to be checked and updated in case they used a vulnerable release of OpenSSL. From an ecosystem viewpoint, this case involves the National Institute of Standards and Technology (NIST) ecosystem, which provides security reports and links them to software products in its Common Vulnerabilities and Exposures (CVE) and Common Platform Enumeration (CPE) databases. For the OpenSSL bug there was an entry with the id CVE-2014-0160. For users of the Debian ecosystem, a dedicated security team crawls the CVE database regularly and manually checks each report's applicability to the Debian distribution.¹ The ability to follow the existing software relationships, from the CVE CVE-2014-0160 to the CPE cpe:/a:openssl:openssl:1.0.1e to the upstream source code openssl-1.0.1e.tar.gz to the Debian package 1.0.1e-2+deb7u4, could support the security team in finding the affected packages. The Debian OpenSSL package could in turn be required by another package, which would extend this chain by one element. This can be continued repeatedly and can also include software not supported by Debian and outside of the Debian repositories. Questions like "Are our software services vulnerable?" could be answered easily if the ecosystems were interoperable. Automated traversal of relationships across ecosystems makes it possible to write generic security report alert services for any software in an integrated ecosystem.

¹ Because of the severity of Heartbleed, popular Linux distributions like Debian were actually informed before the public release of the CVE. This is an exception from the described standard process.
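As a sketch of what such a traversal could look like as a query, consider the following SPARQL snippet. The CVE IRI and the properties swrel:affects and swrel:dependsOn are hypothetical stand-ins, not the final metaservice vocabulary; the actual relationship ontology is developed in Chapter 4.

    PREFIX swrel: <http://metaservice.org/ns/swrel#>

    # Find every package that transitively depends on a software
    # release affected by a given security report. All names below
    # the prefix declaration are illustrative.
    SELECT DISTINCT ?package WHERE {
        <http://example.org/cve/CVE-2014-0160> swrel:affects ?software .
        ?package swrel:dependsOn+ ?software .
    }

The SPARQL 1.1 property path operator + follows the dependency relationship over an arbitrary number of hops, which is exactly the chain-extension described above.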

Licensing Conflict Discovery To be allowed to use a third-party library in one's own software, one has to agree to its licensing terms. Violating the licensing terms can lead to legal issues. As an example, the General Public License (GPL) has a copyleft clause, meaning that software which is a derivation of, or links to, a GPL-licensed product must also be distributed under the GPL. This is a serious topic: in the past, several court cases about infringement of the GPL had legal consequences. The problem is known as the license mismatch problem [26]. To solve it, software repositories need to be reviewed. Software repositories often already contain explicit licensing information. With the ability to cross software repository and ecosystem boundaries, tools could automatically discover probable licensing violations in the relationship graph.
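A hypothetical query for such a tool might look like the following sketch. Here ex:license and ex:copyleft are placeholder properties and swrel:linksTo stands in for the linking relationship; the query actually developed for this use case appears later as Listing 6.1.

    PREFIX swrel: <http://metaservice.org/ns/swrel#>
    PREFIX ex:    <http://example.org/>

    # Report products that link (directly or transitively) to a
    # library under a copyleft license but are distributed under a
    # different license themselves. All property names are illustrative.
    SELECT ?product ?library WHERE {
        ?product swrel:linksTo+ ?library .
        ?library ex:license ?libraryLicense .
        ?libraryLicense ex:copyleft true .
        ?product ex:license ?productLicense .
        FILTER (?productLicense != ?libraryLicense)
    }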


1.3 Methodology

In this thesis a prototype is developed to explore the road to a cross-software-ecosystem software interrelationship service. An iterative approach is used to develop metaservice. Parts of this thesis are the requirements engineering, the software design, the implementation, and an evaluation based on previously chosen use cases. The semantic web and related technologies enable working with distributed and semantically heterogeneous data. They provide a potential platform to implement a universal software interrelationship service. Therefore metaservice is built upon a system architecture that is based on semantic web technologies and enables the integration of several software ecosystems through the creation of an abstract common layer.

1.4 Contributions

The contributions of this thesis are a distributed architecture for data retrieval, parsing, post-processing, and querying; an abstract OWL ontology for the semantic modeling of inter-software-project relationships; and a model for processing temporally scoped Resource Description Framework (RDF) statements without prior normalization of observation times to time periods.

1.5 Related Work

1.5.1 Software Interrelationships

Software dependencies are the most frequently analyzed relationships between software projects in the research community. The importance of dependencies for software engineering decisions [62, 1] and their impact on software quality [15] have been discussed. Several studies analyze code dependencies within the scope of a single project [15]. Lungu, Robbes, and Lanza [51] extend the analysis from single-project dependencies to inter-project dependencies and demonstrate the relevance of the latter. They also give reasons why existing research on single-project dependency analysis may not be directly applicable to a multi-project scope. Bauer and Heinemann [3] demonstrate the need for information on inter-project dependencies for effective software maintenance. They also provide a solution for dependency detection in Java ecosystems. In [19] inter-project relationships are used to test the impact of software changes on related software projects: when automated tests are run, current versions of dependent projects are also run with the code changes to detect breaking changes early. This is only one way to use project relationship data. German, Gonzalez-Barahona, and Robles [25] propose a model for software project inter-dependencies; in particular, a classification scheme for these dependencies is presented.


1.5.2 Mining Software Repositories

Flossmole [97] is a collaborative repository for common data in research on Free/Libre/Open Source Software (FLOSS) repositories. In [40] and [16] it is reported that researchers in the Mining Software Repositories (MSR) field struggle with retrieving accurate information. Flossmole structures data from different data sources into collections, provides the data in a relational format, and offers a single location for information retrieval. It is not limited to software metadata but also collects other software-project-related metadata, e.g., chat logs, mailing archives, and issue trackers. BOA is a programming language and infrastructure for executing queries on ultra-large-scale software repositories [21, 88]. BOA uses an abstraction layer over software projects across different software repositories and uses a Hadoop [82] cluster to execute these queries in parallel. It shows that it is critically important to keep scalability in mind when dealing with large repositories. The drawback of BOA is that it is a proprietary system and infrastructure; it is not possible to build external applications or to extend BOA.

1.5.3 Semantic Web

Kiefer, Bernstein, and Tappolet have shown that semantic web technologies are suitable for mining software repositories [46]. EvoOnt [96], a collection of ontologies, and iSPARQL, an extension of SPARQL which supports similarity-based matching, are used to calculate different metrics for software. The work did not elaborate on software interrelationships, but goes into detail on the interconnection of different data sources for the same software. Howison acknowledges the missing semantic mapping between different repositories in Flossmole and shows an approach to link data across repositories using RDF and the Web Ontology Language (OWL) [39]. However, he was faced with performance issues of semantic web technologies and therefore could not provide a practical solution for the whole dataset. The ontology used has recently been superseded by others and did not contain any specifics on software interrelationships. Damljanovic and Bontcheva present a prototype which uses semantic web technologies to improve the accessibility of information about software artifacts and thereby mitigate the significant learning curve software engineers face on complex software systems [18]. Berger et al. show existing approaches and tools which facilitate software project interrelationships for an improved development workflow [6]. One example is Mylyn, which provides an abstract, unified view of tasks across different issue and bug trackers, integrated into an integrated development environment. They then present a roadmap on how semantic web technologies and a semantic web across software repositories can be used to improve processes in FOSS development and maintenance. Berger and Bac implemented public RDF descriptions of the Debian [92] source packages using the Asset Description Metadata Schema for Software (ADMS.SW) [5]. They also describe how a globally interconnected net of linked data about software projects can be used to improve software processes. The published data covers only the Debian repositories, and inter-package interrelationships are not modeled.


1.6 Thesis Structure

In Chapter 2 the fundamental concepts and technologies used throughout the thesis are introduced and discussed. Chapter 3 gives an overview of the requirements for a possible solution. Based on these requirements, a solution design for the metaservice prototype is presented in Chapter 4. Chapter 5 discusses the implementation-specific challenges that arose during the development of the prototype and their solutions. In Chapter 6 the prototype is evaluated by implementing use cases. A summary of the thesis, its contributions and limitations, and an outlook are given in Chapter 7. The appendix provides additional information which was too long or not relevant enough to be included in the main part of the thesis. It includes figures, source code listings, and screenshots.



2 Fundamentals

The Fundamentals chapter introduces the foundations on which the following chapters build. It gives definitions and examples for concepts and describes existing technologies which are used throughout the thesis. In Section 2.1 the terms software repository and software ecosystem are discussed, and examples thereof are presented. Section 2.2 introduces concepts and standards of semantic web technologies. Section 2.3 then discusses existing semantic web ontologies for software and software relationships. Sections 2.4 and 2.5 give insight into graph databases and temporal databases, respectively. Section 2.6 describes different approaches for representing provenance information of semantic web data. Section 2.7 introduces what web data extraction is and how it works.

2.1 Software Repositories and Ecosystems

Software repositories can be defined as structured collections of software metadata, which optionally store the software itself. Their purpose is to help find, filter, and retrieve software, where software can be (optionally packaged) source code or compiled binaries. Repositories which do not provide the actual software may also be called registries. Version Control System (VCS) repositories can be considered software repositories too, although they are not in the focus of this thesis. A broader definition of software repositories also includes unstructured metadata [44]. The MSR scientific field also analyzes and cross-links Internet Relay Chat (IRC) logs and mailing lists about software; essentially, any data collected during the software development process is a subject of MSR. In the scope of this thesis only structured repositories are considered. Different repositories specialize in different software. Popular software repositories exist for Linux distributions, e.g., the Debian [92] and Red Hat [135] repositories, and for programming languages, e.g., the Comprehensive Perl Archive Network (CPAN) [90] for Perl, the Python Package Index (PyPI) [133] for Python, RubyGems [136] for Ruby, and Maven [85] for Java. A piece of software does not necessarily exist in only one repository, but can be part of many different repositories.

Traditionally, software ecosystems are defined by Jansen, Finkelstein, and Brinkkemper [43] as "a set of businesses functioning as a unit and interacting with a shared market for software and services, together with the relationships among them". In the scope of this thesis, software ecosystems are defined as the set of software, connected through and including its semantically defined and structured, technical and socio-technical interrelationships in an interoperable context. A software ecosystem therefore consists both of the software itself and of its relationships. These relationships need to be defined in a way that makes interoperability, and hence processing, possible; i.e., two related software products are not necessarily in a common ecosystem unless these relations are materialized. The size of an ecosystem is directly connected to the requirement of interoperability; e.g., all Debian-based Linux distributions can be seen as one ecosystem when the scope of interoperability is the handling of software packages. At the core, both definitions can lead to similar sets of software and interconnections. The main reason for a new definition is that it leaves out the economic aspect.

The new definition is therefore more useful in the scope of this thesis, where the focus is on software and not on businesses. Software ecosystems and software repositories are often connected. Since repositories provide collections of software of a common kind, they often also form software ecosystems. Multiple repositories which are compatible with each other typically share a common software ecosystem. In the following, we take a closer look at different repositories and ecosystems.

2.1.1 Debian Ecosystem

GNU/Linux software distributions typically have well-integrated and automated software repositories. They provide a collection of compatible software, which is usually fetched from repositories as packages in a standardized format like .deb or .rpm. Those packages are enriched with additional metadata and sorted into categories. Linux distribution users have a package management application on their system. This application accesses the repository metadata and lets users browse and install software packages on their local system. To do this, the package manager resolves and fetches all dependencies of the software package and installs them in order. Quality assurance by the distribution's maintainers usually ensures that packages within a repository are able to interoperate with each other. One remarkable GNU/Linux distribution is Debian [92]. Debian uses .deb packages and the Debian package manager dpkg to set up and manage software. This system works so well that many other distributions build upon the Debian package management. Nonetheless, Debian and Debian-based distributions do not necessarily share how they are organizationally run or how packages are selected. For example, Debian is a community project, but Ubuntu [143], probably the most popular Debian-based distribution today, is backed by a corporation. GNU/Linux distributions are developed either as rolling releases or as dedicated distribution releases. In rolling releases, packages are continuously updated into the system, while dedicated distribution releases provide a stable collection of software to which only bug and security fixes are supplied. Debian uses the distribution release model. A distribution release in Ubuntu (e.g., Precise Pangolin or Raring Ringtail) is split into multiple software repositories, depending on the support status and license of the software. Additionally, special security update repositories are provided. Ubuntu also introduced the Personal Package Archive (PPA) concept, i.e., repositories provided by third parties, which is not part of the original Debian system. Table 2.1 gives an overview of the different repository names and their meanings in an Ubuntu release.
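To make the kind of metadata concrete, the following is a minimal sketch of a package entry as it might appear in a repository's package index, in the Debian control format; the package name, version, and dependency versions are made up for illustration:

    Package: example-tool
    Version: 1.0-1
    Architecture: amd64
    Depends: libc6 (>= 2.13), libssl1.0.0 (>= 1.0.1e)
    Section: utils
    Description: example command line tool
     This fictitious entry illustrates how a package declares its
     dependencies, which the package manager resolves on installation.

The Depends field is what the package manager evaluates when it resolves, fetches, and installs dependencies in order.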

2.1.2 Apache Maven Repositories

Apache Maven [85] is an extendible build management and self-proclaimed software project management tool for Java. Its core concepts are the project object model (POM) and the Maven application lifecycle. The POM is the central point of configuration for a Maven project, where project metadata and deviations from the standard lifecycle are declared. The standard lifecycle consists of several ordered stages; Table 2.2 shows a selection of them. Maven artifacts are distributed through Maven repositories. These repositories contain the artifact itself and the corresponding POM metadata. Specific artifacts are identified by the combination of a group id, an artifact id, the version, and an optional identifier.


Name         Supported  Free License  Comment
main         Yes        Yes
restricted   Yes        No
universe     No         Yes
multiverse   No         No
.*-security  Yes        Depends       Security Updates
ppa:.*/.*    No         Depends       Personal Repositories
partner      No         No            Third-Party Software

Table 2.1: Ubuntu Repository Scheme

Stage             Description
compile           Compile the source code
test              Run unit tests on the compiled code
package           Combine the compiled code and the resources into an artifact
integration-test  Run integration tests on the artifact
deploy            Copy the artifact and metadata to a Maven repository

Table 2.2: Selection of Maven Stages in Execution Order

Software projects may declare dependencies on existing artifacts. A declaration may be optional or limited to specific times, like compile time, test time, or runtime. Maven automatically fetches artifacts from repositories when needed. The Maven Central Repository [118] is the standard repository from which Maven loads its dependencies. Although the most important packages can be found in the Maven Central Repository, several other open repositories exist, and many businesses operate private repositories. Several other Java Virtual Machine (JVM)-based build management and dependency resolution tools like Gradle [105], [83], or Apache Buildr [80] have adopted or are compatible with Maven's repository system and use the Maven Central Repository as a default dependency resolution source.
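As a minimal sketch of such a dependency declaration, a POM might contain the following; all coordinates are illustrative, not taken from the thesis:

    <project>
      <modelVersion>4.0.0</modelVersion>
      <!-- Coordinates identifying this artifact -->
      <groupId>org.example</groupId>
      <artifactId>example-app</artifactId>
      <version>1.0.0</version>
      <dependencies>
        <!-- Fetched automatically from a Maven repository -->
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>4.11</version>
          <scope>test</scope> <!-- limits the dependency to test time -->
        </dependency>
      </dependencies>
    </project>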

2.1.3 Joinup

Joinup [112] is an online collaboration platform for e-government professionals, created by the European Commission as part of the Interoperability Solutions for Public Administrations program. Besides offering online community services, it acts as a federated repository for software and other assets related to the interoperability of e-government solutions. E-government solutions are usually specific to each country. As part of the harmonization of the solutions of the different members of the European Union, reuse and interoperability of these systems across national borders is striven for. According to Loutas et al. [50], the problems in achieving this are legal, organizational, semantic, and technical. While legal and organizational problems cannot be solved through technology, both semantic and technical problems can. Joinup uses semantic web technologies to overcome the semantic and technical barriers between the different countries.


Through the semantic web technologies used in Joinup, the members of the European Union can keep their existing systems and still interoperate by implementing semantic mappings. Details on semantic web technologies and the ontologies used on Joinup are given in Section 2.2 and Section 2.3.3.

2.1.4 National Vulnerability Database

The National Vulnerability Database (NVD) [122] is a software security repository of the U.S. government and is managed by the NIST. The content of the NVD is not limited to U.S. software or U.S. corporations. The two major datasets provided are the CVE and the CPE. CVE ids serve as common identifiers to reference software vulnerabilities and are widely used in the software security industry. CVE reports usually contain a textual description of the vulnerability, the affected products, an impact analysis, and dates like the creation date and fix date. CPE entries are used in CVEs to reference specific software. They are special identifiers that name software, but do not link to the actual software outside of the NVD. Besides CVE and CPE, the NVD contains several other security-related enumerations.

2.1.5 Universal Description, Discovery and Integration

An interesting aspect of software repositories is Universal Description, Discovery and Integration (UDDI). UDDI was intended to be a registry for web services and was a key concept of Service-oriented Architecture (SOA). It was meant to provide automated discovery and description of public web services. Public UDDI never gained any relevance, but some companies still use it in-house. Typical automated tasks it facilitates include, but are not limited to, the building, setup, and orchestration of software based on loosely coupled services. Although UDDI was not a big success, it demonstrates that software services can be important dependencies of software products.

2.2 Semantic Web Technologies

The semantic web was originally envisioned by Tim Berners-Lee as the future of the World Wide Web (WWW) [8]. Unlike the traditional web, which is designed for humans, the semantic web is also designed for computers. Computers should be able to easily process and reason about the comprehensive knowledge in the web and thereby help people. The semantic web is about making data semantically well-defined and therefore easily processable. The following sections present several technologies and concepts that were created to fulfill this vision.

2.2.1 Resource Description Framework

RDF is an abstract data model and a web standard by the World Wide Web Consortium (W3C) [48]. It was originally designed as a metadata data model, but has since evolved into a more general knowledge representation model. RDF triples, which are also called statements, are the working principle of RDF. Arbitrary information can be expressed using triples. They are structured like simple sentences, i.e., they consist of a subject, a predicate, and an object.


Prefix    Namespace                                       Description
admssw:   http://purl.org/adms/sw/                        ADMS.SW Ontology
bds:      http://www.bigdata.com/rdf/search#              Bigdata Fulltext-Search
bom:      http://www.ifi.uzh.ch/ddis/evoont/2008/11/bom#  EvoOnt Bug Ontology
cc:       http://creativecommons.org/ns#                  Creative Commons Licensing
dc:       http://purl.org/dc/elements/1.1/                Dublin Core Elements Vocabulary
dcterms:  http://purl.org/dc/terms/                       Dublin Core Terms Vocabulary
deb:      http://metaservice.org/ns/deb#                  Metaservice Debian Ontology
doap:     http://usefulinc.com/ns/doap#                   DOAP Ontology
ex:       http://example.org/                             Namespace used for Examples
foaf:     http://xmlns.com/foaf/0.1/                      FOAF Ontology
ms:       http://metaservice.org/ns/metaservice#          Metaservice Observation Ontology
owl:      http://www.w3.org/2002/07/owl#                  OWL
rad:      http://www.w3.org/ns/radion#                    Repository Asset Distribution Ontology
rdf:      http://www.w3.org/1999/02/22-rdf-syntax-ns#     RDF
rdfs:     http://www.w3.org/2000/01/rdf-schema#           RDF Schema
sf:       http://sourceforge.net/api/sfelements.rdf#      Sourceforge Ontology
skos:     http://www.w3.org/2004/02/skos/core#            SKOS Ontology
som:      http://www.ifi.uzh.ch/ddis/evoont/2008/11/som#  EvoOnt Software Ontology
spdx:     http://spdx.org/rdf/terms#                      SPDX Ontology
swo:      http://www.ebi.ac.uk/swo/                       SWO Ontology
swrel:    http://metaservice.org/ns/swrel#                Metaservice SWREL Ontology
vom:      http://www.ifi.uzh.ch/ddis/evoont/2008/11/vom#  EvoOnt Version Ontology
xhv:      http://www.w3.org/1999/xhtml/vocab#             XHTML Vocabulary
xsd:      http://www.w3.org/2001/XMLSchema#               XML Schema

Table 2.3: RDF Namespaces and Prefixes

The semantics of statements resemble natural language: the subject is the described entity, the object is the related entity, and the predicate defines the kind of relationship between the subject and the object. A set of triples is a graph, i.e., subjects and objects are the graph's nodes and the predicates are the types of its directed edges. Resources, properties, and literals are the basic types of RDF. Resources are the nodes in an RDF graph and can be represented as Internationalized Resource Identifiers (IRIs), by convention with a capitalized local name. There are also blank nodes, where the identity of the node is not known, but the relationships are. Properties are the edge types and are also represented by IRIs, by convention with a lower-case local name. Unlike resources, properties may not be blank nodes. Literals are either plain text or Extensible Markup Language (XML) Schema typed data; they are also considered resources but can only be used as objects. IRIs [20] are the internationalized version of Uniform Resource Identifiers (URIs) [9] and allow the usage of almost all Unicode characters. They are probably the most frequently occurring element and therefore take up significant space in RDF documents. Namespaces are used to shorten IRIs by defining short sequences for common prefixes. These short prefixes are defined once per document and subsequently used to replace the prefix in the document. The usage of namespaces in RDF resembles namespaces in XML; in RDF/XML, XML namespaces are used to implement RDF namespaces. Table 2.3 shows a list of the namespaces and their abbreviations used throughout this thesis.


Figure 2.1: Example RDF Graph (resources ex:Anna, ex:Hans, and ex:Peter of type ex:Human, connected to ex:Flowers via the properties ex:likes and ex:doesntLike)

Figure 2.1 shows an example RDF graph. In this graph, Anna, Hans, and Peter are humans. Both Anna and Hans like flowers, and Peter likes Anna. Additionally, Peter explicitly does not like flowers. There exist several concrete, standardized, and widely used serialization formats for writing RDF, e.g., RDF/XML, a serialization using XML [24]; Turtle, a compact format focused on human readability [13]; RDF in attributes (RDFa), which allows the embedding of RDF in Hypertext Markup Language (HTML) documents [10]; and, more recently, JSON-LD, a format based on JavaScript Object Notation (JSON) [49]. Listings B.1 to B.4 show different serializations of the RDF graph displayed in Figure 2.1.
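For a first impression of such a serialization, the graph of Figure 2.1 can be written in Turtle roughly as follows; this is a sketch consistent with the figure, while the canonical serializations are given in Listings B.1 to B.4:

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix ex:  <http://example.org/> .

    # Three resources of type ex:Human ...
    ex:Anna  rdf:type ex:Human ;
             ex:likes ex:Flowers .
    ex:Hans  rdf:type ex:Human ;
             ex:likes ex:Flowers .
    # ... and one that likes Anna but explicitly not flowers.
    ex:Peter rdf:type ex:Human ;
             ex:likes ex:Anna ;
             ex:doesntLike ex:Flowers .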

2.2.2 Vocabulary, Ontologies and Reasoning

The usage of ontologies, taxonomies, and schemes provides vocabulary for usage in RDF and leads to interoperability in the semantic web. These guidelines create the semantic connection between the content and the real world through descriptions of the different terms and concepts and mappings between them. There is no central authority that defines vocabularies; everybody may define and publish them. Hence, multiple concurrent vocabularies may exist for the same topic. On the one hand this is desired, to allow different abstraction layers, but on the other hand it hampers interoperability, because tools need to handle each of them. Ontologies allow the creation of semantic interconnections between vocabularies, such that concrete ones may be automatically translated to more abstract terms. Tools can then handle only the abstract vocabularies and let automated reasoning handle the translation. In addition to the abstraction relationships between vocabularies, ontologies allow defining constraints or implicit knowledge on resources or statements based on the resource's or property's type. In the case of resources this is done by describing classes, which are linked to resources by the rdf:type property. Properties are annotated directly, because they are not instantiated. Constraints and implicit knowledge, e.g., sub-class and sub-property relationships, transitive properties, functional properties, and subject or object restrictions on properties, can be expressed in this way.


An ontology with many constraints is called a heavyweight ontology, and an ontology with few constraints a lightweight ontology [29]. Heavyweight ontologies are usually crafted for a very specific use case and specify strict semantic descriptions. Lightweight ontologies are used for abstract ontologies, because constraints hamper the integration of ontologies. Reasoners like HermiT [108] and Pellet [131] provide inference of implicit statements in knowledge bases. They usually implement more features than the built-in reasoners of RDF databases. Tools like Protégé [132] provide a Graphical User Interface (GUI) and integrated reasoners for ontology creation.

Pure RDF Semantics

The most basic inference rules and vocabulary are provided by RDF itself [37]. This includes, for example, the inference of blank node statements from concrete statements, implicit type inference of predicates, and statement reification. Vocabulary is provided for basic types (rdf:Property), type definitions (rdf:type), collections and lists (rdf:Bag, rdf:Seq, rdf:List, rdf:Alt, rdf:first, rdf:rest, rdf:nil), and the reification of RDF statements (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). RDF semantics alone cannot be used for ontology creation, because it does not define any kind of inheritance mechanism.

RDF Schema

RDF Schema (RDFS) was developed as a data modeling extension to RDF, which allows the definition of custom classes and property types [32]. Its expressivity allows the creation of hierarchical ontologies; it is even used for the description of the RDF namespace itself. In RDFS, classes (rdfs:Class) can be used as types for resources. A resource with the type of a class is considered a member of the class. The subclass relationship (rdfs:subClassOf) can be used to make all members of the subclass automatically members of the superclass, i.e., when class A is a subclass of B and the type of the resource X is A, then X is automatically also a member of class B. Similar hierarchies for properties can be built using the subproperty relationship (rdfs:subPropertyOf). If there are properties C and D, with C being a subproperty of D, and there is a statement containing C as a predicate, one can infer the statement which substitutes D for C. To be able to link classes with the properties intended to be used with their members, RDFS provides the domain (rdfs:domain) and range (rdfs:range) properties for property description. The domain of a property is the class of resources which must be used in the subject position of a statement containing the property as a predicate. The usage of a property with a defined domain therefore allows reasoning on the type of the subject. The range of a property is defined analogously on the object of a statement. Additional properties for the semantic annotation of classes and property types are rdfs:comment for a textual description, rdfs:seeAlso for semantic references, and rdfs:label for a human-readable representation. Listing 2.1 shows the usage of RDFS vocabulary in the definition of rdfs:subClassOf.


    rdfs:subClassOf
        a rdf:Property ;
        rdfs:isDefinedBy <http://www.w3.org/2000/01/rdf-schema#> ;
        rdfs:label "subClassOf" ;
        rdfs:comment "The subject is a subclass of a class." ;
        rdfs:range rdfs:Class ;
        rdfs:domain rdfs:Class .

Listing 2.1: RDFS Description of rdfs:subClassOf using Turtle
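As a small illustration of the inference mechanisms described above, consider the following sketch; the class ex:Mammal and the schema statements are invented for this example:

    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/> .

    # Schema: every human is a mammal; only humans appear as
    # subjects of ex:likes.
    ex:Human rdfs:subClassOf ex:Mammal .
    ex:likes rdfs:domain ex:Human .

    # Data: one explicit statement.
    ex:Anna ex:likes ex:Flowers .

    # An RDFS reasoner infers:
    #   ex:Anna rdf:type ex:Human .   (from rdfs:domain)
    #   ex:Anna rdf:type ex:Mammal .  (from rdfs:subClassOf)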

OWL Profile  Description
OWL 1 DL     Profile based on the original OWL standard. Largest subset known to be implementable using Description Logic.
OWL 1 Lite   Subset of OWL 1 DL, which restricts the language to an easier processable part. It is therefore easier to implement by tool developers.
OWL 1 Full   Not really a profile, but the whole OWL 1 language and semantics. Reasoning is not guaranteed to be decidable.
OWL 2 DL     Profile based on the OWL 2 standard. Largest subset known to be implementable using Description Logic. Compared to OWL 1 DL it has many new features, like punning.
OWL 2 EL     Subset of OWL 2 which allows extensive and expressive class hierarchies and guarantees efficient reasoning.
OWL 2 QL     Profile based on OWL 2, geared towards an efficient reasoning implementation on relational databases via ontology translation to SQL statements.
OWL 2 RL     Subset of OWL 2 DL which allows reasoning to be implemented by a rule-based system.
OWL 2 Full   Not really a profile, but the whole OWL 2 language and semantics. Reasoning is not guaranteed to be decidable.

Table 2.4: OWL Profiles

Web Ontology Language

Web Ontology Language (OWL) 1 and 2 are W3C recommendations which allow extensive reasoning and which build upon and extend RDFS [35, 73]. They are so expressive that reasoning using either complete language is undecidable. Therefore, different profiles were created which are limited to decidable parts; a description of these subsets can be seen in Table 2.4. The usage of OWL usually leads to heavyweight ontologies; hence, abstract ontologies use only small parts of OWL. The following is an excerpt of important OWL vocabulary.

owl:Ontology is the class of ontologies. It is usually used to describe the ontology itself.

owl:imports allows stating that another ontology should be loaded as a part of an ontology. This should not be confused with the simple referencing of elements from other ontologies.

owl:sameAs expresses equality between two resources, meaning that they refer to the same thing. All statements containing one of the resources are therefore also valid for the other resource.

owl:differentFrom links unequal resources, meaning that they may not refer to the same thing. If owl:sameAs is stated for resources that are different from each other, the knowledge base is considered inconsistent.

owl:equivalentClass expresses equality between two classes, meaning that they contain the same members. All resources which are a member of one class are therefore also members of the other class.

owl:disjointWith expresses disjointness of two classes, meaning that neither may contain members of the other. Any resource being a member of one class can therefore not be a member of the other class. If there exists a member of both classes, the knowledge base is considered inconsistent.

owl:equivalentProperty links two equivalent properties, meaning that they can be used interchangeably. If there is a statement containing one property as a predicate, then one can infer the same statement with the other property.

owl:inverseOf can be used to express that one property is the inverse of the other. A statement consisting of the subject A, the object B, and the property X can be used to infer a statement with the subject B, the object A, and the predicate Y, which is the inverse property of X.

owl:TransitiveProperty is the class of transitive properties. Considering a transitive property T, the statement ATC follows from the two statements ATB and BTC.

owl:SymmetricProperty is the class of symmetric properties. Considering the symmetric property S, the statement BSA follows from the statement ASB.

owl:FunctionalProperty is the class of functional properties. Each resource may only have one object linked by a functional property. If there are two different objects linked from the same resource using the same functional property, those objects are equal in the sense of owl:sameAs.

owl:InverseFunctionalProperty is the class of inverse functional properties. Each value may only be linked from one subject by an inverse functional property. If there are two resources with the same value of an inverse functional property, those resources are equal in the sense of owl:sameAs.

One of the main benefits of OWL is that it allows stating equality between different things. When descriptions are provided by different parties using different ontologies, processing all of them requires not only a mapping of classes and properties, but also of resources, which can be done using owl:sameAs. The combination of blank nodes and inverse functional properties allows indirect reference to resources by property values. The explicit modeling of the ontology itself using owl:Ontology allows attaching information to the ontology, such as its release date, author, and related ontologies.
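How inverse functional properties and blank nodes enable such indirect references can be sketched in Turtle. FOAF declares foaf:mbox to be inverse functional; the resource ex:Anna and the mail address are hypothetical:

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix ex:   <http://example.org/> .

    foaf:mbox a owl:InverseFunctionalProperty .

    ex:Anna foaf:mbox <mailto:[email protected]> .

    # A blank node referring to "whoever has this mailbox":
    _:someone foaf:mbox <mailto:[email protected]> ;
              foaf:name "Anna" .

    # A reasoner infers: _:someone owl:sameAs ex:Anna .

The blank node never names the person directly; the inverse functional property value alone suffices to identify her.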

2.2.3 Linked Data

Linked Data is a set of best practices that ensure interoperability of semantic web datasets.


In the original semantic web vision, semantic web data was interlinked to form a global network, just like the WWW is interlinked by hyperlinks. However, the original specifications were not explicit on how to realize this. Hence different mechanisms were created to connect the separated datasets, and supporting interoperability became harder [7]. Berners-Lee provided four simple rules, also known as the Linked Data principles, for publishing Linked Data [7]:

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL Protocol and RDF Query Language (SPARQL))
4. Include links to other URIs, so that they can discover more things

The first principle states that when using URIs in statements, one does not refer to the document which may be dereferenceable by the URI, but uses it as a name to reference anything, e.g., people, cars, documents, concepts. This is the essential concept that upgrades RDF from a document metadata format to a generic metadata format. Based on this concept Bizer, Heath, and Berners-Lee describe the linked data semantic web, or web of data, as a "web of things in the world, described by data on the Web" [11].

The second principle ensures that a common standard is used to dereference URIs. One major reason for selecting the Hypertext Transfer Protocol (HTTP) is that it is an already established and widely adopted standard. Particularly useful features of HTTP for Linked Data are content negotiation mechanisms to provide different representations of data and forwarding mechanisms to delegate authority over a URI.

The third principle is that URIs used as names should actually be dereferenceable and therefore provide traversable links between different datasets. It also states that looking up a URI should provide useful information, that is, information about the thing named by the URI. The last principle implies that in order to create interconnections, datasets should link to resources of other datasets, instead of only to their own resources.

These rules have been well accepted and adopted by the community. An example is the Linking Open Data (LOD) community, whose goal is publishing and interconnecting existing open data repositories using semantic web technologies and Linked Data principles. In recent years the web of data created by this community has grown very fast and contains information from a broad range of areas. The fast growth of the LOD web can be seen in Appendix A. One could argue that this fast growth resembles the early days of the traditional WWW.
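The fourth principle, linking into other datasets, can be sketched in Turtle; the ex: dataset is hypothetical, while the DBpedia URI stands for an external dataset a publisher might link to:

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix doap: <http://usefulinc.com/ns/doap#> .
    @prefix ex:   <http://example.org/projects/> .

    # A local resource linked to the same thing in an external dataset
    ex:httpd a doap:Project ;
        owl:sameAs <http://dbpedia.org/resource/Apache_HTTP_Server> .

Dereferencing either URI over HTTP should then yield useful RDF about the project, allowing clients to traverse from one dataset into the other.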

2.3 Software Ontologies

There are many different software ontologies, which can be distinguished by their abstraction level and their different views on software and its processes. In the following, only the most important existing software ontologies are analyzed in detail. The descriptions of the classes are taken from the ontology files and only adjusted slightly.


2.3.1 Description of a Project

DOAP [95] is currently the most established ontology for software projects. Like many other successful and widely adopted ontologies it is kept very simple and relatively lightweight, and it provides an abstract view on software projects. DOAP builds upon the FOAF [100] ontology, which describes people, their relationships among themselves, and things they do. This relationship to FOAF is expressed through properties which allow linking people to roles in a software project. The roles covered are developers, translators, testers, maintainers, documenters, and helpers. Each of them expects foaf:Person as the property range. Additionally, organizations can be linked as vendors, and the general type of audience can be described textually.

Another place where DOAP gets very specific is VCS repositories. As one can see in Table 2.5, eight of twelve classes are used to differentiate between the different repository types. This focus is only materialized in the class definitions; appropriate properties like branches, revisions, or similar are missing, with the exception of modules for addressing in some VCS repositories. It is also worth noting that all referenced VCSs are open source, with the exception of BitKeeper. BitKeeper was probably only added because the Linux kernel, one of the biggest open source software projects, used it in the past. Other popular commercial systems are missing. DOAP allows describing the source repository's location, web browser access path, and specialties like module names and anonymous access points for some repository types.

Besides repository links, DOAP also provides links to other project and development resources. Both current and historical homepages can be linked, but historical homepages are the only place where past information can be represented. Additionally, specific links for blogs, wikis, and direct links to screenshots and downloads as well as download mirrors can be provided. Mailing lists and bug databases are also properties of software projects described by DOAP. A notable link type leads from a software project to an endpoint which provides its software service.

Special vocabulary is included to describe technical aspects of software projects. Information about used programming languages, project categories, used software licenses, implementation platforms, and operating systems can be attached to them. For some of these properties DOAP requires the usage of literals for description. Besides the specific vocabulary, basic descriptive vocabulary like name, creation date, description, and short description is provided for general annotation. Of these only doap:name is a subproperty of rdfs:label; all others are not mapped by the ontology.

DOAP does not only specify software projects and repositories, but also software versions and specifications. Versions are linked from software projects by the doap:release property and are identified by a revision number or code. A direct file download link can be provided for each version of a project. Projects can also be described as implementing a specification, but DOAP does not provide any vocabulary for describing specifications.

Development of DOAP has stagnated over the last years, although it is publicly developed on GitHub. It can be considered the standard abstract software ontology, since many later created software ontologies provide mappings to it, or even include it.
Deficiencies may be seen in the overly detailed modeling of VCSs and the missing detail in other areas like specifications and software artifacts. DOAP is also almost completely missing software relationships from its vocabulary.
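A typical DOAP description might look like the following Turtle sketch; the project, its URLs, and the person are hypothetical:

    @prefix doap: <http://usefulinc.com/ns/doap#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix ex:   <http://example.org/> .

    ex:mytool a doap:Project ;
        doap:name     "MyTool" ;
        doap:homepage <http://example.org/mytool> ;
        doap:programming-language "Java" ;
        doap:repository [
            a doap:GitRepository ;
            doap:location <git://example.org/mytool.git>
        ] ;
        doap:release [
            a doap:Version ;
            doap:revision "1.0.2" ;
            doap:file-release <http://example.org/mytool-1.0.2.tar.gz>
        ] ;
        doap:maintainer [ a foaf:Person ; foaf:name "Jane Doe" ] .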


URI                   Subclass of       Description
doap:Project          foaf:Project      A Software Project
doap:Version                            Version Information of a project release
doap:Specification                      A specification of a system's aspects, technical or otherwise.
doap:Repository                         Source code repository
doap:ArchRepository   doap:Repository   GNU Arch Repository
doap:BazaarBranch     doap:Repository   Bazaar Branch
doap:BKRepository     doap:Repository   BitKeeper Repository
doap:CVSRepository    doap:Repository   CVS Repository
doap:DarcsRepository  doap:Repository   darcs Repository
doap:GitRepository    doap:Repository   Git Repository
doap:HgRepository     doap:Repository   Mercurial Repository
doap:SVNRepository    doap:Repository   Subversion Repository

Table 2.5: Class descriptions from DOAP [95]

To summarize, DOAP is a stable, widely adopted, and therefore important ontology with limited expressiveness.

2.3.2 Software Package Data Exchange

SPDX [67, 140] is a software ontology which focuses on a standardized description of software artifacts and their licensing terms. It was developed to help people across the software supply chain communicate licensing information in a consistent and understandable way, so that software components can easily be used in compliance with licenses and policies. One can easily observe in Table 2.6 that the ontology contains very specific classes to support concrete data representation and processing. Besides the semantics, it also provides precise modeling of usage constraints in OWL. Of all software ontologies, SPDX is currently the one with the most industry backing.

The ontology builds upon and extends DOAP. A fundamental addition is the extension of the software model with packages. Software packages are functional components which are an actual representation of software, e.g., software installers or distribution packages. For every software release there may be one or more distribution packages. An example is the Debian project, where for each release packages are built for all supported architectures. Packages need not be compiled, but can also contain the source files of a software release. SPDX provides vocabulary for linking releases to packages and packages to their contained files.

Both packages and files can be annotated with checksums, licensing, and copyright information. The algorithms specified by SPDX are not generic algorithms, but only checksum algorithms. Both the checksum and the exact algorithm used for its calculation are provided, such that automated integrity checking is possible. SPDX allows representation of licensing information in the form of extracted licensing text and as a license itself. To cover cases where software is distributed under different licenses or combinations of licenses, the ontology contains the possibility to express conjunctive and disjunctive license sets.

Besides the vocabulary for licensing information, the SPDX project also provides a source for unique names and licensing texts of popular open source software licenses [141]. The ontology also allows recording manual licensing reviews as well as package creators and suppliers, such that the history of a package becomes traceable. Although the ontology has been used to improve the handling of connected software, software interrelationships themselves are not modeled in it. SPDX 1.2 is the current version, and significant changes concerning scope and modularity are expected in SPDX 2.0, which is currently in draft status. The roadmap for the new release promises better support for software besides packages and support for software interrelationship modeling. SPDX provides useful and high quality extensions to DOAP, and the preview of forthcoming releases promises an extension of the currently limited scope.
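A package with a checksum and a declared license set might be described as in the following Turtle sketch, assuming SPDX 1.2 vocabulary; the package ex:pkg and all of its values are hypothetical:

    @prefix spdx: <http://spdx.org/rdf/terms#> .
    @prefix ex:   <http://example.org/> .

    ex:pkg a spdx:Package ;
        spdx:name "mytool" ;
        spdx:checksum [
            a spdx:Checksum ;
            spdx:algorithm spdx:checksumAlgorithm_sha1 ;
            spdx:checksumValue "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12"
        ] ;
        spdx:licenseDeclared [
            a spdx:ConjunctiveLicenseSet ;
            spdx:member <http://spdx.org/licenses/Apache-2.0> ,
                        <http://spdx.org/licenses/MIT>
        ] .

The explicit checksum algorithm allows a consumer to verify that a package at hand is identical to the one the description was produced from.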

2.3.3 Asset Description Metadata Schema for Software

ADMS.SW [30] is an ontology for the description of software and software repositories, which was created during a European research project on interoperability of software for public administrations [110]. Unlike other ontologies, ADMS.SW does not provide its own vocabulary for every defined term, but reuses the vocabulary of existing and well established ontologies. This can be observed in Table 2.7, where only very few classes are defined. Among the reused vocabularies are DOAP, SPDX, Repository Asset Distribution (RADion), Dublin Core (DC), and the Asset Description Metadata Schema (ADMS). RADion is an ontology for the description of abstract asset repositories. It was designed for cross repository information exchange, independent of the actual asset type provided in these repositories. ADMS extends RADion for usage on repositories of metadata.

The specification of the ontology encourages the reuse of specific collections of resources, so-called controlled vocabularies. These controlled vocabularies are especially useful for the interoperability of datasets in the sense of linked open data. Controlled vocabularies are proposed for programming languages, operating systems, topics, software licenses, file formats, and many other concepts.

ADMS.SW supports some software interrelationships in addition to its originating ontologies, such as stating that a project is a fork of another or that one software is included in another. Many other additional properties are specific to software repository metadata, e.g., download counts and categorizations like programming languages and operating systems. Other properties are provided to complete the vocabulary.

Berger and Bac [5] implemented ADMS.SW metadata descriptions for the Debian software repository. This was probably the first large scale usage of the ontology. Another implementation of ADMS.SW is the Joinup [112] repository. One of the key benefits of ADMS.SW is that it provides a collection of existing interoperable standards, instead of reinventing the wheel.


URI: spdx:AnyLicenseInfo
Description: The AnyLicenseInfo class includes all resources that represent licensing information.

URI: spdx:PackageVerificationCode
Description: A manifest based verification code (the algorithm is defined in section 4.7 of the full specification) of the package. This allows consumers of this data and/or database to determine if a package they have in hand is identical to the package from which the data was produced. This algorithm works even if the SPDX document is included in the package.

URI: spdx:ConjunctiveLicenseSet
Subclass of: rdfs:Container
Description: A ConjunctiveLicenseSet represents a set of licensing information all of which apply.

URI: spdx:Package
Description: A Package represents a collection of software files that are delivered as a single functional component.

URI: spdx:Review
Description: A Review represents an audit and signoff by an individual, organization or tool on the information in a SpdxDocument.

URI: spdx:DisjunctiveLicenseSet
Subclass of: rdfs:Container
Description: A DisjunctiveLicenseSet represents a set of licensing information where only one license applies at a time. This class implies that the recipient gets to choose one of these licenses they would prefer to use.

URI: spdx:ExtractedLicensingInfo
Description: An ExtractedLicensingInfo represents a license or licensing notice that was found in the package. Any license text that is recognized as a license may be represented as a License rather than an ExtractedLicensingInfo.

URI: spdx:Checksum
Description: A Checksum is a value that allows the contents of a file to be authenticated. Even small changes to the content of the file will change its checksum. This class allows the results of a variety of checksum and cryptographic message digest algorithms to be represented.

URI: spdx:CreationInfo
Description: A CreationInfo provides information about the individuals, organizations and tools involved in the creation of a SpdxDocument.

URI: spdx:SpdxDocument
Description: A SpdxDocument is a summary of the contents, provenance, ownership and licensing analysis of a specific software package. This is, effectively, the top level of SPDX information.

URI: spdx:File
Description: A File represents a named sequence of information that is contained in a software package.

URI: spdx:License
Subclass of: http://opensource.org/
Description: A License represents a copyright license. The SPDX license list website is annotated with these properties (using RDFa) to allow license data published there to be easily processed.

Table 2.6: Class Descriptions from SPDX [140]


URI: admssw:SoftwareRepository
Subclass of: radion:Repository
Description: A Software Repository is a system or service that provides facilities for storage and maintenance of descriptions of Software Projects, Software Releases and Software Distributions, and functionality that allows users to search and access these descriptions. A Software Repository will typically contain descriptions of several Software Projects, Software Releases and related Software Distributions.

URI: admssw:SoftwareProject
Subclass of: doap:Project
Description: A Software Project is a time-delimited undertaking with the objective to produce one or more software releases, materialized as software packages. Some projects are long-running undertakings, and do not have a clear time-delimited nature or project organization. In this case, the term 'software project' can be interpreted as the result of the work: a collection of related software releases that serve a common purpose.

URI: admssw:SoftwareRelease
Subclass of: doap:Version, rad:Asset
Description: A Software Release is an abstract entity that reflects the intellectual content of the software and represents those characteristics of the software that are independent of its physical embodiment. This abstract entity corresponds to the FRBR entity expression (the intellectual or artistic realization of a work). An example of a Software Release is the Apache HTTP Server 2.22.22 (httpd) release.

URI: admssw:SoftwarePackage
Subclass of: spdx:Package, rad:Distribution, schema:SoftwareApplication
Description: A Software Package represents a particular physical embodiment of a Software Release, which is an example of the FRBR entity manifestation (the physical embodiment of an expression of a work). A Software Package is typically a downloadable computer file (but in principle it could also be a paper document) that implements the intellectual content of a Software Release. A particular Software Package is associated with one and only one Software Release, while all Packages of a Release share the same intellectual content in different physical formats. An example of a Software Package is httpd-2.2.22.tar.gz, which represents the Unix Source of the Apache HTTP Server 2.22.22 (httpd) software release.

Table 2.7: Class Descriptions from ADMS.SW [30]

2.3.4 Software Ontology

SWO [52, 139] is an ontology which originates from the bioinformatics research area. Its development was motivated by the increasing need for reproducibility of computational investigations. To achieve this goal, SWO consists of an extensive model containing over 3,700 classes and over 50 property types. It covers both a broad range of abstract software concepts and very specific vocabulary. In contrast to common ontologies, SWO does not use human readable URI names, but numeric identifiers. The reason for this is that SWO was never designed to be written by humans, but to be automatically recorded by tools. This kind of identifier provides some benefits compared to conventional ones: there is no preference for a base language, and misapprehension is prevented because readers are required to consult the definition and description. Another peculiarity of SWO is the usage of classes for more or less concrete things, i.e., Adobe Illustrator 10 is not modeled as an individual, but as a class. Thereby the authors encourage the usage of classes as individuals, which is known as punning.


Name/URI: Software/swo:SWO_0000001
Subclass of: Information Content Entity/swo:IAO_0000030
Description: Computer software, or generally just software, is any set of machine-readable instructions (most often in the form of a computer program) that conform to a given syntax (sometimes referred to as a language) that is interpretable by a given processor and that directs a computer's processor to perform specific operations.

Name/URI: Software Interface/swo:SWO_9000050
Subclass of: Information Content Entity/swo:IAO_0000030
Description: The mode of interaction with a piece of software.

Name/URI: Algorithm/swo:SWO_0000064
Subclass of: Information Content Entity/swo:IAO_0000030
Description: An algorithm is a set of instructions for performing a particular calculation.

Name/URI: Software License/swo:SWO_0000002
Subclass of: Information Content Entity/swo:IAO_0000030
Description: A software license is a legal instrument (usually by way of contract law, with or without printed material) governing the use or redistribution of software.

Name/URI: License Clause/swo:SWO_9000005
Subclass of: Information Content Entity/swo:IAO_0000030
Description: A license clause is a component of a license which defines some aspect of a restriction or conversely permission in how something corresponding to a license may be legally redistributed, partially redistributed, extended, modified or otherwise used in some way.

Name/URI: Data/swo:data_0006
Subclass of: Information Content Entity/swo:IAO_0000030
Description: Information, represented in an information artifact (data record) that is 'understandable' by dedicated computational tools that can use the data as input or produce it as output.

Name/URI: Topic/swo:topic_0003
Subclass of: Information Content Entity/swo:IAO_0000030
Description: A category denoting a rather broad domain or field of interest, of study, application, work, data, or technology. Topics have no clearly defined borders between each other.

Name/URI: Data Format/swo:format_1915
Subclass of: Information Content Entity/swo:IAO_0000030
Description: A defined way or layout of representing and structuring data in a computer file, blob, string, message, or elsewhere.

Name/URI: Organization/swo:format_1915
Subclass of: Material Entity/swo:OBI_0000245
Description: An organization is a continuant entity which can play roles, has members, and has a set of organization rules. Members of organizations are either organizations themselves or individual people. Members can play specific organization member roles that are determined in the organization rules. The organization rules also determine how decisions are made on behalf of the organization by the organization members.

Table 2.8: Selection of SWO [139] Classes

The ontology does not provide mappings to other software ontologies. The top level structure of the classes is based on information content entities, material entities, processes, and roles. Most classes are defined as information content entities, which are informational concepts. Material concepts exist in the real world, processes are series of actions over time, and roles are defined as abstract placeholders. Table 2.8 shows selected abstract classes of SWO.

The software class is very abstract, and the ontology does not provide classes for a breakdown into releases or distribution packages, although there are properties which may be used to link a version. Through the explicit algorithm class it is possible to describe the functionality of software in more detail. For both, an extensive, but naturally not complete, list of concrete subclasses is provided. Software has properties to link to its supported input and output formats, software licenses, implemented algorithms, provided interfaces, and its maturity status. Additionally, publishers and developers can be linked to it. Relationships between software can be modeled by a part-of relationship and a uses-software relationship. Unlike any other analyzed software ontology, SWO also contains vocabulary for the description of software license clauses. It is possible to state specific restrictions like derivative, distribution, or installation count clauses for individual software licenses.

The current version of SWO is of varying quality, and its huge size surely makes it difficult to maintain. Detailed modeling of software and its interrelationships is currently not possible. Although the concepts were modeled explicitly for bioinformatics research, some of them are applicable to generic software modeling. A split of the different topic areas and an extraction of the very concrete classes would probably allow a broader usage of the ontology.

2.3.5 Other Software Ontologies

The NEPOMUK File Ontology (NFO) [54] was created as part of the Networked Environment for Personalized, Ontology-based Management of Unified Knowledge (NEPOMUK) semantic desktop project [31]. NEPOMUK is a project to extend the computer desktop into a collaborative integrated working environment. NFO is a core ontology of NEPOMUK, which describes different types of files, file systems, and services. Software is incorporated in the form of source code, archives, applications, operating systems, and software services. Relationships of software are limited to conflicts and inclusion, because they are not the focus of the ontology.

In his thesis [68], Tappolet introduces the software evolution ontology EvoOnt [96]. EvoOnt consists of the Bug Ontology Model (BOM), an ontology to describe software bug reports and trackers; the Software Ontology Model (SOM), an ontology to describe object oriented code; and the Version Ontology Model (VOM), an ontology to describe software versions and releases in source code repositories. Software relationships are only described in the VOM. Most of the classes and properties in the VOM are also part of ADMS.SW.

Software Evolution Ontologies (SEON) [137] is a pyramid of ontologies for software evolution and its applications [74]. In this pyramid, concepts are described at different abstraction layers: general concepts, domain spanning concepts, domain specific concepts, and system specific concepts. The domains currently handled by the ontology group are source code, bug control, and history. Therefore SEON and EvoOnt currently share very similar concepts.

Currently, there is only one software relationship specific property in SEON, which is the general concept dependsOn. Although there are sub-properties of dependsOn, they are only used in the domain specific concepts for source code and do not describe relationships of software, but relationships within software source code. The modular design of SEON would allow easy integration of software interrelationship ontologies as a domain specific concept; however, this has not been done yet.

2.4 Graph Databases

Graph databases treat relationships between things as a first class concept. Unlike traditional relational databases, where tuples are the main concept and relationships are only used to join tuples, in graph databases the relationship comes first. Very different graph database models have been developed, each using different graph concepts to model and manipulate data. Graphs may be directed or undirected; data may be stored in the vertices, in the edges, or in both; one or more edges may be allowed between two vertices; and edges may connect exactly two vertices or, in the case of hypergraphs, more than two. Rodriguez and Neubauer give an overview of different graph models [61]. Angles and Gutierrez provide an overview of the variety of different models [2]. Because of the very different models, there is no definitive query interface comparable to what SQL is for relational database systems. Like any database management system, graph databases can be categorized by several characteristics, i.e., performance, reliability, transaction management, scalability, clustering, high availability, and software license. Commercial and open source graph databases have matured in the last years and are widely used in production systems.

2.4.1 Property Graph Databases

Directed, labeled, and attributed multi-graphs are called property graphs and are the most common graph model used in graph databases [61]. Neo4j [123] is a popular graph database based on the property graph model. It supports Atomicity, Consistency, Isolation, Durability (ACID) transactions and clustering for high availability. Neo4j is partially open source, with extended features for commercial users. OrientDB [129] is a hybrid of a document store and a graph database. Unlike a plain property graph database, it allows full documents, and not only simple attributes, on vertices. The documents are fully queryable and indexable. OrientDB is distributed under an Apache 2 license, while some external features, like a query profiler, are sold commercially. Both Neo4j and OrientDB provide proprietary query languages and APIs for data access. Gremlin [106] is a graph traversal query language for property graphs, with support for many different graph databases. With it one can select nodes and edges by their properties and traverse the graph using different algorithms and patterns. Any graph database which implements the Blueprints API [87] can be queried with Gremlin. Both Neo4j and OrientDB implement the Blueprints API and are therefore queryable with Gremlin. Both Gremlin and Blueprints are part of the Tinkerpop framework [142], which contains several tools for graph databases.


Software relationship graphs can be modeled using property graphs as follows: software is modeled as nodes, and relationships are directed edges from one node to another. Each edge can be attributed with the type of relationship, and there may be multiple different relationships between the same two software nodes.

2.4.2 RDF Databases

RDF databases are special graph databases that build upon the RDF graph model. They are also called triple stores, because RDF statements are triples. Besides the actual storage of the statements, triple stores almost always implement additional features like reasoning or federated querying. There are different approaches to reasoning, namely materialization and dynamic query evaluation, which result in different performance characteristics for the different implementations. Yamaguchi et al. [75] and Serral et al. [65] provide recent benchmarks on triple stores. Results are mixed, but show that usage is feasible with billions of statements.

SPARQL

SPARQL [64] is a standardized query language for RDF databases. There are several query forms. SELECT answers with relations; this is especially useful if one needs to calculate data or retrieve information for a table-like representation. CONSTRUCT returns an RDF document which contains the triples described by pattern matching. ASK is used to ask basic yes/no questions, which originates in the idea of knowledge based systems; in practice it is not used as often as the other two. All three have in common that the WHERE clause describes the requested statements via pattern matching. It is also possible in SPARQL to filter by expressions, using the FILTER clause. Since SPARQL 1.1 [57], the language does not only include queries to retrieve information from a store, but also operations to insert new statements and to delete or update existing ones.
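A minimal SELECT query with a FILTER expression might look as follows; the data it matches, DOAP project descriptions, is assumed to be present in the store:

    PREFIX doap: <http://usefulinc.com/ns/doap#>

    SELECT ?project ?name WHERE {
        ?project a doap:Project ;
                 doap:name ?name .
        FILTER (STRSTARTS(?name, "Apache"))
    }

The pattern in the WHERE clause matches all projects and their names, while the FILTER clause (here using the SPARQL 1.1 function STRSTARTS) restricts the result table to names beginning with "Apache".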

2.5 Temporal Databases

A temporal database is a database which can handle different scopes of time, not only as a data type of entries, but also in its query semantics. The term temporal database does not imply a particular type of database system, such as relational or graph databases. Two often used temporal aspects in a database are transaction time and valid time. More sophisticated temporal databases allow custom temporal aspects and even combinations of more than one aspect.

A temporal database which considers transaction time allows access to the time at which information was stored in the database. Consider the following example of an attendance system, which keeps track of the start and stop times of a worker. An implementation on a non-temporal database would require storing the start and end of work for each worker and shift explicitly. A transaction time temporal database only needs to update the attendance status of the worker, because the database keeps track of the times a worker was present or absent. Another way to view a transaction time temporal database is that it retains all historic states of the database and makes them queryable.

A valid time database allows storing information together with the times it was valid in the real world. Any query on a valid time database is accompanied by an observation time, for which the validity of the information is calculated.


SELECT ?president FROM SNAPSHOT 2001 WHERE {
    ?president a <President> .
}

Listing 2.2: Time Point Query τ-SPARQL

Considering the attendance system example, in a valid time database, just like in a non-temporal database, it would be required to store the start and end of work for each worker and shift. The difference is that in a non-temporal database with temporal data types the query for validity needs to be built from comparison operators, while a query in a valid time temporal database directly includes the time which is queried. Time periods are usually allowed to be bounded or unbounded; unbounded time periods lack either a start or an end.

In addition to the type of time handled, one can distinguish between point and interval based temporal data models. The difference lies mainly in the handling of overlapping time slices. While in point based temporal semantics there is no difference between entries as long as the coalesced view is identical, interval based temporal semantics distinguishes how the intervals were added. Bohlen, Busatto, and Jensen [12] give a precise definition of the semantics and the differences between these two types of models.

There are several forms of queries on temporal databases. The simplest form is the time point query: every part of the queried information is evaluated with respect to a single point in time. Some temporal database systems allow queries that calculate the time during which a certain pattern of information was valid. Queries based on time intervals may be evaluated using different semantics, e.g., there must be at least one valid snapshot within the interval, or the pattern must be valid throughout the whole interval. Multi-temporal queries allow the usage of different temporal clauses on different parts of the information in the database. All these variations of temporal databases and semantics require specific types of indices and query evaluation systems, which are more complicated than those of non-temporal systems.

2.5.1 Temporal RDF

Several approaches have been made to extend RDF from a current-state model to a temporal data model. OWL-Time [55] is a time ontology for OWL. It contains vocabulary for advanced temporal concepts like intervals or time zones, but does not make RDF a temporal data model. Gutierrez, Hurtado, and Vaisman [33] introduced temporal semantics into RDF and provided a formal foundation for it, but did not implement actual extensions of SPARQL for querying. This was done by Tappolet and Bernstein [69], who presented an actual implementation of these concepts. They introduced τ-SPARQL, an extension of SPARQL which allows temporal queries and can be mapped to conventional queries. Examples of queries written in τ-SPARQL are shown in Listings 2.2 and 2.3.


SELECT ?president ?fromTime ?toTime WHERE {
    [?fromTime, ?toTime] ?president a <President> .
}

Listing 2.3: Validity Time Selection Query τ-SPARQL

SELECT ?president1, intersect(#interval1, #interval2) WHERE {
    ?president1 a <President> #interval1 .
    ?president2 a <President> #interval2 .
}

Listing 2.4: Validity Time Intersection SPARQL-ST

ex:Anna foaf:mbox [email protected]

Figure 2.2: Statement without Provenance Information

Further extensions of temporal concepts in RDF were introduced by Perry, Jain, and Sheth [56]. They did not only provide additional cutting point semantics for temporal data, but also extended the model and the corresponding query language SPARQL-ST to support spatial information. Compared to τ-SPARQL, SPARQL-ST supports multi-temporal queries and the intersection of time intervals; an example of this can be seen in Listing 2.4. A peculiarity is that SPARQL-ST uses the number sign not only for comments, like regular SPARQL, but also to denote temporal variables. Querying temporal RDF requires different access patterns and therefore other indices than regular RDF. Pugliese, Udrea, and Subrahmanian [58] provide an indexing scheme for temporal RDF to improve query performance.

2.6 Provenance in RDF

When working with RDF it is often necessary to store information about the origin, the time frame of validity, or other provenance information of the data. RDF does not provide first class features to store information about statements. Several approaches have been developed to allow the storage of provenance information; they are presented in the following sections. Examples are based on the statement in Figure 2.2, which should be stated to originate from ex:Internet and to be valid between 11/02/2011 and 12/03/2012.


[Figure: RDF graph of the statement reified as ex:Statement1, annotated with ex:origin ex:Internet, ex:validFrom, and ex:validTo]

Figure 2.3: Provenance Information using Reification

2.6.1 Reification

Reification is a built-in mechanism of RDF which allows the abstraction of statements. In a reified statement, the information is not encoded directly into a single triple, but into multiple statements that describe the original statement. Figure 2.3 shows the RDF graph using reification. It is important to note that a reification of a triple does not entail the triple itself, and vice versa. This is important because otherwise two reified statements which contradict each other could not coexist. One problem of reification is that the information is not directly encoded anymore, and it produces at least three additional statements per reified statement. Querying becomes very unnatural, because pattern matching needs to be done over many triples. This also impacts query performance, unless there is a specialized index for reified statements, which is usually not the case. Another problem with reification is that its reasoning semantics is not defined. Using reification for provenance information therefore leads to a major loss of RDF functionality.
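In Turtle, the reified example statement of Figure 2.3 might be written as follows; ex:origin, ex:validFrom, ex:validTo, and the mail address are the hypothetical example vocabulary used throughout this section:

    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix ex:   <http://example.org/> .

    # Four triples describing the original statement ...
    ex:Statement1 a rdf:Statement ;
        rdf:subject   ex:Anna ;
        rdf:predicate foaf:mbox ;
        rdf:object    <mailto:[email protected]> ;
        # ... plus the provenance annotations
        ex:origin     ex:Internet ;
        ex:validFrom  "11/02/2011" ;
        ex:validTo    "12/03/2012" .

Note that the original triple itself is not entailed by this description and would have to be stated separately.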

2.6.2 Named Graphs

The insufficiency of reification for provenance information storage led to the development of named graphs [14]. Statements, usually represented by triples, are extended to quadruples by the addition of a supplementary resource. The additional resource is the name of the graph and can therefore be used to identify the graph a statement belongs to. Statements without an explicit named graph are implicitly put into the default graph. RDF is thereby extended in a way which allows representing not only one, but multiple graphs. The named graph can be used to attach provenance information to one or more statements. Figure 2.4 shows the example statement enclosed in a named graph, which in turn is linked to the provenance information. In comparison to the reified solution, a single conventional matching pattern can be used to directly match the statement. Indices for efficient pattern matching in quad stores require significantly more storage than in triple stores: while triple stores require only three indices to provide full coverage, quad stores need six.
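In TriG, a Turtle extension with syntax for named graphs, the example might be written as follows, again using the hypothetical ex: vocabulary:

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix ex:   <http://example.org/> .

    # The statement, placed inside the named graph ex:graph1
    ex:graph1 {
        ex:Anna foaf:mbox <mailto:[email protected]> .
    }

    # Provenance attached to the graph name in the default graph
    ex:graph1 ex:origin    ex:Internet ;
              ex:validFrom "11/02/2011" ;
              ex:validTo   "12/03/2012" .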


[Figure: RDF graph of the statement enclosed in the named graph ex:graph1, which is annotated with ex:origin ex:Internet, ex:validFrom, and ex:validTo]

Figure 2.4: Provenance Information using Named Graphs

Named graphs therefore provide better querying performance for provenance data handling, but require more index changes on update operations. Named graphs have been standardized, but they are not included in all semantic web standards. Reasoning inside a named graph can be done with the existing semantics, but reasoning across multiple named graphs has no defined semantics; for example, it is not clear into which named graph an inferred statement should be placed.

[Figure: RDF graph of the statement attached to the statement identifier ex:StatementId1, which is annotated with ex:origin ex:Internet, ex:validFrom, and ex:validTo]

Figure 2.5: Provenance Information using Statement Identifiers

2.6.3 Statement Identifier

Like named graphs, statement identifiers are effectively quads, with the fourth statement element being a unique identifier. Hartig and Thompson present an alternative approach to reification, also known as Reification Done Right, based on Statement Identifiers (SIDs) [36, 71]. They propose an extension of RDF to allow statement attributes. Statements are used as first class RDF citizens in the role of subjects and objects.

Existing indexing strategies for property graphs can be used for indexing and cause less overhead than the full quad indexing which is needed for named graph support. Hartig and Thompson also provide proposals for extensions of SPARQL and RDF for syntactic support of SIDs. Still, the approach is not standardized, and current implementations are therefore very limited. In Figure 2.5 one can see that the graph representation looks similar to the named graph solution. The major difference is that named graphs may contain multiple statements for which the provenance information is valid, while provenance information is only valid for a single statement when using statement identifiers. Depending on how many statements share the same provenance information, this difference in semantics leads to better performance of either named graphs or SIDs.

2.7 Web Data Extraction

Web data extraction is about discovering, retrieving, extracting, and storing specific information from pages on the WWW. Information on the WWW is distributed across web pages, which store content, presentation, and hyperlinks to other web pages using HTML. This information is usually provided without semantic markup, such as semantic web technologies would provide, and is designed only for human interpretation. Web browsers translate the markup language into a visual representation, which is typically interacted with by mouse. Although automated processing is therefore very difficult, extraction of information is regularly done by web crawlers. Many important internet services, like search engines, rely on automated processing of web data. Ferrara et al. describe the following phases, which are needed for web data extraction systems [22]:

• Interaction with Web Pages

• Generation of a Wrapper

  – Automation and Extraction
  – Data Transformation
  – Use of extracted Data

The first step of web data extraction is navigating to the data source, and therefore interacting with the web pages. Modern web pages require the usage of many technologies to navigate to the data, e.g., JavaScript, forms, and cookies. Navigation is also necessary for the discovery of data sources when they are not published at a known location, but only interlinked from a directory page. The next step is to generate a wrapper, which is a computer program that automates the interaction with web pages and the data extraction from them. The previously examined navigation process needs to be replicated by the program. Data extraction often requires processing the web page like a web browser would, including the handling of malformed HTML markup. Some web pages require building the Document Object Model (DOM) and executing JavaScript to access the content. Then different techniques, like text processing, tree traversal, or optical recognition, may be used to extract the actual information from the unstructured data. This information is usually not in the required format and therefore needs transformation and postprocessing. When aggregating information from different sources, duplicates, spelling mistakes, and incorrect data need to be considered to retrieve useful data. Last but not least, the retrieved data can be stored or further processed.


3 Requirements

In this chapter the requirements for a software interrelationship service are explored. These requirements are needed to design and evaluate such a service. First, the stakeholders of the project are presented and analyzed in Section 3.1. Then, use cases are presented in Section 3.2; these use cases will be used to evaluate the results of this thesis. Finally, functional and non-functional requirements are stated in Sections 3.3 and 3.4. The list of requirements is designed for an actual working service. The designed and presented prototype only acts as a proof of concept and therefore does not implement all requirements listed here.

3.1 Stakeholders

According to ISO/IEC 15288, stakeholders are an “individual or organization having a right, share, claim, or interest in a system or in its possession of characteristics that meet their needs and expectation” [41]. Stakeholders’ needs and expectations are essential for software requirements engineering. Figure 3.1 shows the different stakeholders of an extensible software interrelationship service. The identified stakeholders are web users, API users, module developers, software repository operators, and system administrators. They and their expectations are described in the following sections.

[Figure: Web Users use the service directly; API Users use client applications which implement the service's API; Module Developers develop Extension Modules, which integrate Software Repositories; Software Repository Operators operate the repositories; the System Administrator manages the service]

Figure 3.1: Stakeholder Diagram


3.1.1 Web Users

Web users of the service are typically interested in software and its interrelationships. Their reason to use the service is to gain information about software for a better comprehension of software systems, without the need to analyze source code or unstructured documentation. They expect information to be available through a web browser, with service response times that allow interactive usage. Users want informative and appropriate views for each accessed information type, for an easy comprehension of the content. Additionally, they want to interactively find and explore information by search and by navigation across software interrelationships. Typical web users are software developers and system architects.

Expectations of Web Users

• Easy use of the service

• Low response times

• High quality information

• Compatible with multiple browsers

• Ability to search and browse

3.1.2 API Users

API users share the same interests as web users. The difference between them is that API users do not use the web interface of the service manually. Instead, they implement client programs which communicate with the API of the service to retrieve information. Client programs can integrate the information provided by the service into software development tools, offer a different visualization, or aggregate data for scientific analysis. These programs rely on a stable API and profit from a lean, unified one. For the usage of the service to be preferred over direct repository access, the effort needed to build and maintain the programs must be lower than that of conventional implementations for each repository. Typical API users are software tool developers and researchers.

Expectations of API Users

• Consistent semantics

• Good documentation

• Ready to use components for service access

• Quality of service


3.1.3 Module Developers

Module developers write modules for the integration of different repositories into the service. Module developers are therefore very important for the quality and range of information provided by the service. For the implementation of modules, developers rely on the API and its documentation provided by the service. Additionally, supporting tools for implementation and testing are very useful for development, as are ready-to-use code examples. Developers have preferred programming languages, which they also want to use for the module implementation. To encourage contributions by module developers, the complexity and bureaucracy of the contribution process is best kept low. Module developers may also be software repository providers who want to integrate their own repository.

Expectations of Module Developers

• Stable and easy to use and understand module API

• Good documentation and examples of the service and API

• Access to service source code

• Development and test utilities

• Easy contribution process

3.1.4 Software Repository Operators

Software repositories are the primary source of information for the service. Their operators can lock out the service if they do not agree with its usage; therefore their consent is very important. Operators need to take care of the performance of their repository service. Aggressive retrieval of information from repositories can lead to a degradation of that performance; therefore repository providers need to be able to impose constraints on crawling speed and frequency. Information in repositories is usually provided under a specific license. The terms of these licenses need to be respected, such that no rights of software repository operators are infringed. Commercial and noncommercial operators can be distinguished. Because they have no financial interest in the repository data, noncommercial operators are more likely to cooperate.

Expectations of Software Repository Operators

• Consideration of licensing constraints on the repository data

• Consideration of crawling constraints provided by the repositories

• Proper attribution of information sources


3.1.5 System Administrators

System administrators are responsible for the operation of the software interrelationship service. They need to take care of its security, backup, reliability, and performance and therefore rely on statistics and reports about the service. Based on these they need to be able to control and make changes to different aspects of the service. Additionally, they also decide which modules should be included and afterwards take care of module lifecycles, i.e., installation, monitoring, upgrade and removal.

Expectations of System Administrators

• Powerful and easy to use administration interface

• Access to detailed system statistics

• Support for automation of maintenance tasks

3.2 Use cases

Use cases describe the intended use of a software system. They can be used for documentation of requirements and can serve as a basis for the evaluation of implemented solutions. The following list of use cases is not exhaustive, but reduced to the most significant ways of interaction with the system. These include the integration of data sources and utilization of the service in different aspects.

3.2.1 Integration of Software Repositories

Integration of software repositories into the service is the major use case for module developers. Software repositories are the source of the information provided by the service. Different software repositories provide data in different formats and with different meanings. The service, on the other hand, needs to provide a unified way to access software and software interrelationship information. Therefore, module developers need to provide a syntactic and semantic mapping between the repository and the service. The content of software repositories can change over time; consequently, the service should be able to handle these changes. Additionally, module developers need to be able to define how and where repository data can be retrieved. It is important that the integration of different repositories is facilitated, to encourage the implementation of integrations.

3.2.2 Manual Browsing through Software Metadata and Software Interrelationships

Manual browsing is the main use case for web users. Browsing can be split into information discovery through navigation and through search. Navigation is the traversal between related pieces of information, meaning that cross links can be followed.

Search lets users find information based on specific criteria. One of the most common forms of search is full text search, where a text pattern is looked for in the whole dataset. Another form is structured search, where the matching of results is based on structure, e.g., information newer than a specific date or from a specific source.

When users have found the desired information, they need to interpret it and evaluate its usefulness. The same information can be presented in different views; one view may provide a better comprehension of the information through the selection and arrangement of information. Different types of information require different types of views. It is also important to show the source of the displayed information.

Web users are nowadays used to responsive web applications. If a result is not provided within a short period, they leave the page and look elsewhere. It is therefore important to meet the response time requirements typical for web applications. A key reason for users to use the service is that it aggregates information from different sources, so that the user does not have to query each of them separately. Therefore the usability of the service increases with the number of integrated information sources.

3.2.3 Security Alerts based on Security Report Propagation

Security alerts based on security report propagation are a use case for the automated usage of the software interrelationships provided by the service API. Component based software development is state of the art; software development from scratch, without the reuse of libraries, is very rare. Through the transitivity of software dependencies, even trivial applications may require an incomprehensibly large network of software components. Security relevant bugs of software are usually reported in public databases like CVE. Vulnerabilities of one application may propagate to another when software components are reused. Following the path of relationships between a report and all affected software systems is not a trivial task. Also, the number of CVEs is so large that manual review would not be worthwhile even if path traversal were easy. Therefore alerts are necessary to save resources. Open source project maintainers, developers, and system administrators are interested in vulnerability information concerning their projects or systems. They should be able to register for alerts for their directly used components, and receive notifications if any of them could be affected, directly or through interrelationships. The type of alert is not specified; it could be an email, a mobile text message, a bug report, or similar. The better the coverage of software components in the dependency model, the better this application performs.
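With the dependency graph exposed through a SPARQL endpoint, the propagation step could be sketched as a SPARQL 1.1 property path query; the ex: vocabulary (ex:VulnerabilityReport, ex:affects, ex:dependsOn) is hypothetical:

    PREFIX ex: <http://example.org/ontology#>

    SELECT DISTINCT ?software WHERE {
        ?report a ex:VulnerabilityReport ;
                ex:affects ?component .
        # Follow dependency edges transitively (zero or more steps)
        ?software ex:dependsOn* ?component .
    }

Every software whose dependency chain reaches an affected component would be returned, which is exactly the set for which alerts should be raised.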

3.2.4 Discovery of potential Licensing Conflicts

Discovery of potential licensing conflicts is a use case for automated usage of the service API. The right to use and integrate software libraries is not automatically granted, but controlled by copyright holders, who grant licenses for usage. Infringement of intellectual property rights by unauthorized usage of software components is not a trivial offense and may lead to prosecution. The complexity of software dependency networks and the incomprehensibility of licensing terms for software engineers often lead to unintended violations of licenses.


An example of a software license is the GPL. Among other clauses it contains a so-called copyleft clause, which often makes it incompatible with other licenses. Copyleft clauses require that derivative works be distributed under the same licensing terms as the originating software. In the case of the GPL, not only classical derivation but also dynamic or static linking to the software creates a derivative work. Manual review of the compatibility of software licenses is a time and therefore cost intensive task. Consequently, FOSS projects have consolidated on a few licenses. The compatibility of these licenses can be evaluated statically and then used, together with the dependency graph, to automatically find violations through a computer program. The quality of the dependency graph model improves with the share of actually used software components that are represented. Crossing ecosystem borders improves the completeness of the model. For automatic evaluation of licensing violations it is required that software dependencies in the dependency graph are subdivided by the type of dependency, such that specific licensing constraints like linking can be handled.
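A minimal sketch of such an automated check, under the assumption that the statically evaluated compatibility matrix is itself stored as triples: the query joins every dependency edge with the licenses of both sides. doap:license is an existing DOAP property and swrel:depends comes from SWREL (Section 4.7.2), while the lic: namespace and lic:incompatibleWith are hypothetical.

PREFIX swrel: <http://metaservice.org/ns/swrel#>
PREFIX doap:  <http://usefulinc.com/ns/doap#>
PREFIX lic:   <http://example.org/ns/licensing#>

SELECT ?project ?library ?projectLicense ?libraryLicense
WHERE {
    ?project swrel:depends ?library ;
             doap:license  ?projectLicense .
    ?library doap:license  ?libraryLicense .
    # hypothetical, statically evaluated license compatibility matrix
    ?projectLicense lic:incompatibleWith ?libraryLicense .
}

With dependencies subdivided by type as required above, swrel:depends could be narrowed to a linking-specific property, so that only the licensing constraints which actually apply are reported.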

3.3 Functional Requirements

The following list summarizes the functional requirements, which are based on the problem statement, the stakeholder analysis, and the use cases.

1. Web Interface

a) Full Text Search
b) Structured Search
c) Navigation between related information
d) Different views depending on information
e) Show sources of displayed information

2. Client API

a) Unified data format
b) Defined semantics

3. Module API

a) Information retrieval
b) Syntactic and semantic mappings
c) Handle changes of sources
d) Interconnect information from different sources

4. Management

a) Management of stored Information
b) Install, Upgrade, and Remove Modules


c) Display service status and statistics
d) Scheduling of tasks

3.4 Non-Functional Requirements

Non-functional requirements describe the required qualities of the implementation of functions, i.e., how a specific feature is implemented. Unlike functional requirements, a simple enumeration is not enough to engineer non-functional requirements. Therefore they are listed and discussed in the following.

3.4.1 Scalability

Ubuntu Raring Ringtail alone contains over 30,000 packages for a single distribution release. Each release has its own sets of repositories for each architecture, and there have been 20 Ubuntu releases so far. DistroWatch [94], an online directory of Linux distributions, lists over 100 Debian derivatives which can be considered part of the Debian ecosystem. Although Raring Ringtail is one of the bigger releases, these numbers show that the number of existing packages, and therefore relationships, in the Debian ecosystem is huge. Module Counts [119] is a tracker for software module counts. It currently contains statistics for over 15 ecosystem specific repositories. Maven Central and Node Packaged Modules (NPM), the package manager of Node.js, each contain over 80,000 unique modules, for which several releases and packages exist. GitHub [101] claims on its press information page to host over 15.3 million software source repositories, and every day new software is released. Bavota et al. [4] have concluded that the Apache Java ecosystem grows exponentially. Therefore a system which wants to cover a large part of existing software needs to be able to scale to large amounts of data. Depending on the actual usage, the supported client count also needs to scale appropriately. Scaling vertically is not an option, because the computing resources of a single machine are limited; hence only horizontal scaling is a viable alternative.

3.4.2 Extendibility

The utility of the service rises with the number of integrated ecosystems. The module system makes the service extendable, but for extendibility it is also important that the service can be extended easily. Since modules solve the same kinds of tasks, utilities can be written to remove redundant work. These utilities should be as abstract as possible, such that module developers need not bother with implementation details. A module framework should take the heavy lifting off the individual extensions, such that the overall effort is minimal. Another aspect is that modules should be able to work with less abstraction if necessary. For example, the majority of modules probably will not want to deal with the temporal complexity of the data, but some special modules may need the ability to control temporal information. The API should therefore also allow lower-level access when needed.


The service should encourage many-to-one semantic and syntactic integration instead of many-to-many integration, by providing a common standard. The effort for many different modules is minimized by removing the need for cross-integration.

3.4.3 Security

Security needs to be considered for every online service to prevent manipulation and misuse. Since the service is extendible and module contributions may come from untrusted sources, their execution needs to be controlled. This is true for the generators of the content as well as for the views which are used on the web interface. Possible attack vectors for the service are as follows:

Tampered Data An attacker could inject wrong or biased data into the system and therefore compromise the data integrity. As the service is designed to be a data source for external applications, those can be indirectly attacked by targeting the service or its sources. The credibility of the used sources needs to be verified and service users need to be informed of the credibility status of the different sources. When tampered data is detected, it should be possible to isolate the identified statements from the rest of the data set.

Malicious Code Execution Modules could be programmed to execute malicious tasks in the name of the service operator. Through the separation of module developers and operators, operators are required to trust the executables from the developers. A process is needed to create and maintain the trust relationship between both. Trust can be established through a combination of code reviews, signed executables, transfer of liability, and continuous cooperation. It is necessary to be able to detect malicious behavior of one's own infrastructure, e.g., through fingerprinting of legitimate network traffic patterns. A special case of malicious code execution is the injection of JavaScript into the HTML frontend. Injections may originate not only from the modules, but also from repository sources or even unfiltered URI parameters.

Denial of Service The service is overloaded with requests, such that the system is not responsive anymore. Since the service provides a programmatic API, care must be taken that requests cannot lock up significant resources of the whole service. Special care is needed if the service executes complicated user defined queries, as those may be specially crafted to take up resources.

3.4.4 Usability and Documentation

The service needs to provide an easily usable interface to its users. This includes both web interface and API users. Human users are accustomed to specific interface patterns for web sites. These usability patterns should be applied, such that users are able to use the service intuitively. Different classes of information may have different appropriate representation types. Another aspect of usability for human users is accessibility and therefore compatibility with different types of browsers.


For API users both API and concept documentation are essential. Programming examples are useful for learning purposes and as a quick start for development. Developers need to be able to set up their development environment and test systems in a way compatible with the service. This includes the ability to easily access, replicate, or simulate the infrastructure.

3.4.5 Performance

Overall system performance is required to fulfill many requirements. First of all, calculations need to be executed in a reasonable time. This is important for cost reasons when hosting the service and for development reasons to enable short feedback cycles. Bad performance also makes denial of service attacks easier, since fewer requests can utilize all of the resources. Performance is also important for web users, who expect fast response times for interactive browsing. In combination with the extendibility of the service, performance needs to be guaranteed also for large data sets. Performance of background tasks is not as important as on the frontend.

3.4.6 Maintainability

The maintainability of an interrelationship service is very important, even more so when it is hosted as a single instance. Downtime should be reduced by making maintenance tasks as easy and automated as possible. It should be possible to perform software upgrades seamlessly, without service interruption. In combination with the extendibility requirement it is essential to easily detect malfunctions in single extensions without interrupting the whole service. Maintainability of the source code itself is needed for further development. This includes splitting the system into loosely coupled components, which can be modified without impact on the whole system. Existing coding standards should be followed, depending on the chosen implementation platform.



4 Design

In this chapter the requirements and the problem description are transferred into a design concept. Until this point, the thesis has been very generic: everything stated was applicable to any interrelationship service. From this point onward, the focus of this thesis is metaservice, the proposed implementation of an interrelationship service. Section 4.1 narratively describes the path which led to metaservice's design. Section 4.2 gives an overview of its different components and their interactions. Section 4.3 shows the path data takes through the system and how the different components interact to process the knowledge. Section 4.4 describes how data is stored and which semantics are applied. Section 4.5 shows how temporal queries based on the data model work. Section 4.6 outlines the interface design. Section 4.7 describes the metaservice semantic web ontologies.

4.1 Initial Design Considerations

When developing a solution from scratch, one can choose from different design options. Metaservice was not designed as a semantic web application from the beginning and did not handle temporal data until late. Actually, semantic web technologies were not considered at all at first, because they seemed to be an esoteric approach and overly complicated. This section gives brief insight into design concepts which were considered but rejected or further refined throughout the iterative design cycles. At first sight, programming a software interrelationship service seemed just like programming any other web application. Therefore relationships and metadata were thought of as tuples, which are stored in a common relational database. Soon it became apparent that there are many different repositories to integrate and that they do not share the same data schema. Modeling the database schema was not possible unless one restricted oneself to a certain subset of repository types. Schema-less databases seemed to offer a solution to exactly this problem. In the last years document databases like MongoDB [120] had grown in popularity and matured to production usage. They provided good performance and flexibility, which was needed for an interrelationship database. The compromise to be made for document stores was the lack of SQL. On further investigation the data's graph structure became more and more visible. The lack of proper join operations to follow relationships seemed to make document stores unusable for this task. Therefore graph databases were investigated, to evaluate whether they provide a better suited model. During this investigation OrientDB [129] came into scope and seemed just the right combination of both worlds. The combination of document store and graph database made it the favorite choice for an implementation for a long time. Only when the mappings between the different data sources were considered did it become apparent that semantic technology would make their unification easier. Ontologies seemed the best tool to model the hierarchical mapping, and RDF stores provided graph structure and schema freedom, which had already shown to be important storage concepts. Additionally, in contrast to document stores and graph databases, there was SPARQL, a fully-fledged and standardized query language with similar expressiveness as SQL has for relational databases. Performance and scalability of RDF databases also seemed to have matured over the last years. Therefore semantic web technologies were used to create a first proof of concept. This proof of concept worked fine at first, until data in the used repositories began to change. Through deletion and modification in the data sources, inconsistencies in the data appeared. Because of this, it also became clearer that historic information in software repositories is lost when it changes, although the deleted information may still be important to people working with older software. Hence repositories needed to be archived, such that historical information could be preserved. With the decision to store temporal data, RDF, which models only current data, consequently showed its problems with temporal validity. Many approaches were taken to flatten the data to fit into regular RDF, but none was successful. Therefore a full temporal model for RDF is required. This model is one of the contributions of this thesis and is described in the following sections.

4.2 System Architecture

The architecture of metaservice is designed under consideration of the scalability and extendibility requirements. Therefore a distributed and component based architecture is chosen, which is extendible through modules. Figure 4.1 shows the different components and their interactions. Components which are depicted twice may exist multiple times as different implementations. The oval around the parser and provider means that they are executed as a single component. Following are descriptions of the different components, the structure of modules, and the distributed deployment and execution.

Figure 4.1: Component Diagram (components: Manager, Frontend, Cache, RDF Database, Messaging Service, Archive, Crawler, Parser, Provider, Postprocessor)

4.2.1 Components

The core components of metaservice are:


Manager The manager is the foundation of the service. It keeps track of system status and allows the installation, upgrade, and removal of modules. Further it controls the execution of the different components and allows insights into service status and statistics.

RDF Database The database stores all information about software and its interrelationships. Since only a single database is needed for the service, the chosen implementation needs to be able to scale to large amounts of data. It provides a SPARQL endpoint, which is used to communicate with the database. The RDF database does not provide any reasoning, but is only used for storage and query execution.

Messaging Service The messaging service makes asynchronous communication across the different service components available. It is responsible for queuing and distribution of processing tasks to the different worker instances. Like the RDF database, the messaging service is executed as a single instance.

Frontend The frontend is responsible for both the human and computer interface for users and software clients. It transforms incoming requests and either fetches results from cache or delegates them to the RDF store. It is stateless and therefore may be distributed for load balancing.

Cache A cache can be optionally used to improve the frontend performance. Any fast key value store is sufficient.

Archives Archives handle the historical storage and access to the raw data of source repositories. For each software repository there is one archive.

4.2.2 Modules

Modules are used to extend the service. They can be installed, removed, and upgraded at runtime. Each module may provide different types of components. A module is not limited to one component per type; each type may be included several times.

Module Components

The components which can be declared as part of modules are as follows.

Crawlers Crawlers are used to retrieve data from the source repositories. The raw data is only fetched and stored in an archive. The messaging service is then used to asynchronously launch further processing.

Parsers Parsers are used to convert raw data to processable objects and can be reused for different repositories. They are executed in the same process as providers, to avoid serialization of the objects.


Providers Providers take the parsed repository data from archives and convert it into RDF Statements, according to specific ontologies. The resulting statements are automatically enriched with provenance information and then stored in the RDF database. Changes of the statements in the RDF database are then propagated to postprocessors by means of the messaging service.

Postprocessors Postprocessors are notified as soon as there is a change in the database. They have access to the database and therefore can infer additional knowledge. Typical jobs a postprocessor may do are interlinking of existing resources from different repositories, further analysis of the underlying artifacts or notification of outside services. Data inferred by postprocessors is stored as RDF Statements, enriched with provenance information and stored in the RDF database. Any change created by postprocessors is fed back to other postprocessors through messages to the messaging service.
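To illustrate the interlinking job, a postprocessor might emit owl:sameAs links between resources from different repositories that plausibly describe the same project. The following SPARQL sketch matches naively on an identical DOAP homepage; the heuristic is purely illustrative and not metaservice's actual linking logic.

PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX doap: <http://usefulinc.com/ns/doap#>

# link resources that share the same homepage
CONSTRUCT { ?a owl:sameAs ?b }
WHERE {
    ?a doap:homepage ?homepage .
    ?b doap:homepage ?homepage .
    FILTER ( ?a != ?b )
}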

Static Data

In addition to executable components, modules can also provide one or more of the following.

Ontologies Ontologies are used to semantically define the schema of data. These ontologies typically provide a very specific model, which is mapped to more abstract variants provided by metaservice.

Views Views can be defined for specific RDF classes. Users can choose between different available views when viewing resources through the web interface.

Software Repositories Locations of software repositories and parameterization for processing can be provided. Additionally the type of crawler and parser which needs to be used for it is linked.

4.2.3 Distributed Execution

The manager is responsible for the execution of the different components. The manager itself, the RDF database, and the messaging service can only be executed once, although clustering is possible. All other components can be executed on different nodes and therefore the service can scale horizontally. The only shared resources are the database, the messaging service, and the archives. Therefore metaservice components can be considered loosely coupled. Since the messaging service provides load balancing, multiple workers of the same component type may be deployed on different nodes. Of the components described, only parsers and providers are executed in the same process and are therefore tightly coupled, to avoid serialization of large amounts of parsed data.

4.3 Processing Model

This section describes the dynamic structure of metaservice and its calculation model.


4.3.1 Observations

Since information from many data sources may contradict each other, reasoning across the whole knowledge base does not work. Instead metaservice applies the concept of observations, which encapsulate a set of knowledge from a specific source and from a specific perspective. Different generators, meaning providers and postprocessors, are therefore allowed to generate contradictory observations that do not interfere with each other. The selection of trusted data is thereby deferred to query time. A single run of a generator produces exactly one observation. Running the generator again with the same parameters may give a changed observation which overwrites the previous one. Generators are expected to be deterministic, meaning that they should always calculate the same result. However, changes can still happen when the outside context or the generator code changes.

4.3.2 Processing Pipelines

Metaservice uses processing pipelines to modularize the execution and thereby improve resource utilization. Workers are structured as pipelines, where each pipeline segment executes a specific task. When messages are put into the pipeline, they are pushed from segment to segment, always taking preliminary results with them. Thereby the context of the calculation is split from the pipeline segments and bound to the processed message. To increase the throughput of the pipeline, segments which require longer calculation time or do long running or blocking Input/Output (IO) can be instantiated multiple times. Through concurrent execution, idle times can be reduced and multiple cores of modern Central Processing Unit (CPU) architectures can be utilized. Unlike simple concurrent execution of the whole pipeline, segment based concurrency allows more fine-grained control of resource utilization. Too many parallel executions of the same segment, each requiring much memory, may lead to out of memory errors. This can be prevented by segment based concurrency control.

4.3.3 Data Retrieval and First Processing

The metaservice process starts with a crawler retrieving new information from a software repository. The retrieved data is then saved in the corresponding archive. As soon as the crawling and storage is completed, a message is sent to the create messaging topic. As soon as computing resources are available, the create message is consumed by the provider pipeline. Figure 4.2 shows a schematic diagram of the pipeline. The dotted duplicates stand for theoretical concurrency of pipeline segments.

Figure 4.2: Provider Pipeline (segments: Parse, Map Statements, Calculate Metadata, Check Changes, Store Statements, Send Events)

First the message is analyzed and the correct parser component is chosen for the repository data. Then the parser reads and parses the data from the archive and forwards it to the provider in an object format. The provider then creates statements that describe the contents of the parsed data. These statements are stored in an in-memory RDF store, where implicit information is materialized through reasoning based on specified ontologies. The reason for statement inference outside of the database is that it can happen decentralized and therefore can scale better. This way also only a limited dataset, namely the observation of the provider, is considered. Next all statements are extended to quads, such that they contain a provenance identifier. This identifier is then used to describe, for example, how, when, and by which generator the statements were generated. The annotated statements are then transferred into the RDF database as a single observation, from where they are available to postprocessors and the frontend. After successful storage, the resources that were created by the provider need to be considered as changed and therefore further processed by postprocessors. For each of these resources a message is sent to the postprocess topic.

4.3.4 Postprocessing

The postprocessing pipeline, which can be seen in Figure 4.3, takes care of further data processing that goes beyond simple mapping. It has access to the database and is executed whenever relevant statements in the database change.

Figure 4.3: Postprocessor Pipeline (segments: Prefilter, Postprocess, Calculate Metadata, Check Changes, Store Statements, Send Events)

As soon as there is a message on the postprocess topic and processing resources are available, it is passed to the postprocessing pipeline workers. Before doing actual resource intensive work, postprocessors detect whether the given resource is suitable for processing. Additional filtering is done to check whether the request is too old or whether the message originates directly from a change by the same postprocessor. Only if all checks pass does the actual postprocessing take place. For the actual postprocessing, postprocessors have access to the RDF database and can also access external resources. It is important that postprocessing results are consistent; therefore only reliable external resources may be considered. Additionally postprocessors are required to be deterministic and functional, i.e., if no external factors change, a calculation for a resource always returns the same result set. Although postprocessors are always triggered by single resources, they are not limited to producing results exclusively for those resources. Instead the result can contain a whole set of resources. If any of the resources in the result set is suitable for processing by the same postprocessor, it is required to trigger the calculation of the same result set. These resources are called authoritative subjects of the result set and are described in more detail in Section 4.4.4. Authoritative subjects are essential to identify postprocessor observations. In the end postprocessors encode the inferred knowledge into triples, which are subsequently stored in an in-memory triple store. Just like in the provider pipeline, implicit statements are materialized by reasoning with regard to specified ontologies.


This knowledge is enriched with provenance metadata, like date and time of postprocessing and postprocessor identifier and version, and then committed to the RDF database. If there previously was an observation by the same postprocessor about any of the authoritative subjects in the database, it is replaced. After the statements are committed to the database, the postprocess topic is triggered for all resources in the resource set. The reason for this is that other postprocessors are able to build upon the information inferred during the execution. Since sending messages on change may lead to endless loops, the reason and origin of the message is always attached. This makes it possible to check whether a postprocessor has already processed one of the authoritative subjects of the message previously. If it has, further recursive postprocessing is consequently prohibited. Theoretically the recursion checking rule can be changed and alternatives to prevent infinite loops may be deployed. These alternatives could be limiting the maximum recursion level or the repetitive execution count.

4.4 Data Model

The pure RDF data model is not suitable as a real temporal data model and does not comprise the previously introduced observation concept. This section defines an enhanced data model which is capable of both. It is actually composed of two different data models, because of differences in temporal scope and semantics between observations of providers and postprocessors.

4.4.1 Definitions

To specify a data model and explain its semantics it is necessary to distinguish between different concepts of time and validity.

Temporal Concepts

An observation contains information which originates from a retrieval of data from a source repository. In this thesis the point in time at which the information was retrieved is called the data time. As repositories usually describe the currently valid state, the data time is often also the valid time of an observation. The data time and valid time may differ if the source repository provides an explicit valid time for the stored information. It is also possible to interpret the valid time as the interval during which an observation is valid. Another temporal aspect of an observation is when it was calculated and stored into the database, which is its creation time. Observations can be verified to still contain the same statements at a later point in time. The last point in time when the correctness of the content was checked is the last checked time. While the creation time of an observation never changes, the last checked time may change repeatedly. The query time is the time for which a query on the database expects valid results, i.e., it is not the time when a query is executed, but the time at which the data in the database needs to be valid to be considered by the query.


Validity Concepts

The validity of a statement determines whether it needs to be considered or not. Validity always depends on the context of a statement. In the following, validity in different contexts is defined for metaservice. A statement is valid in an RDF database if and only if it is stored as a triple in it. Further, a statement is valid in an observation if and only if an observation which contains the triple is stored in the database. Additionally there is validity at a point in time, which is given when an observation's valid time contains the point in time in question.

4.4.2 Observation Semantics and Provenance Information

Observations are used to allow different opinions on reality. Each observation is unique in who generated it, what the described thing is, and for which data time it was created, and can be identified thereby. Unless there is an encoding error, there should be no possibility for contradictory information within an observation. There may be conflicts between statements of two observations, but not between statements of the same observation. Observations are constructed to defer the aggregation of information from different sources to the latest time possible, which is query time. When writing queries on a set of observations, a selection of believed observations must be made. This allows not only contradictory sources, but also experimental data in the database, which is ignored by default. It is also possible to believe all observations. Observations are collections of statements and can be interpreted as their provenance information. As already stated in Section 2.6, primitive RDF reification would need significantly more statements and would additionally break conventional queries. This leaves SIDs and named graphs. SIDs provide statement level provenance information, which is not required in this case. Another argument against SIDs is that they are not standardized. Therefore provenance information is stored using named graphs, with the observation being identified by the graph name. The information stored for each observation is as follows.

Creation Date creation time of the observation

Last Checked Date last checked time of the observation

Generator generator which produced the observation

Action type of observation

Source repository and path of the statements' origin, only used by providers

Subject authoritative subjects, only used by postprocessors

These items are also reflected in the metaservice observation ontology, which is described in Section 4.7.1.


4.4.3 Provider Data Model

Software repositories may be updated often and can grow very large. For example, the Debian Wheezy main repository contains over 30,000 packages and is updated regularly. Usually only a small part of the repository contents changes after an update. Storing differential observations therefore greatly reduces the number of required statements compared to storing full observations after every update. Hence the provider data model builds upon differential observation storage. Two different types of observations are required to store differential changes. These observation actions are add and remove. To create differential observations, providers need to consider two versions of source data: the version prior to the analyzed version and the analyzed version itself. Every execution of a provider may create two graphs, one which is assigned the action add and one which is assigned the action remove. When a statement can be calculated from the previous, but not from the current, source data, it is added to the remove observation. If a statement can be provided by the current, but not by the previous, source data, it is added to the add graph. If the statement exists in both versions, or in neither, it is not stored. Both observations inherit their data time from the source data and are identified by the source repository and the generator. An important property of the statements is that they need not be removed as part of the same set of statements as they were added, i.e., statements A and B may be added as part of the same observation but removed as part of different observations. This is trivially also the case the other way around. Hence the temporal validity of statements using the provider data model, with respect to a given generator and query time, is defined as follows. A statement is valid at a query time with respect to a generator, if the newest observation in the database which was produced by the given generator and has a data time prior to the query time contains the statement and is an add observation. Informally this means that a statement is valid between the data times of the add observation and, if it exists, the remove observation. The restriction to the generator is necessary because each generator has its own timeline of observations and statements may appear in the timelines of different generators. If one did not regard the different generators, a statement could be added in the timeline of one and removed in the timeline of another, which is not desired.
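The following SPARQL Update sketch shows this storage scheme, using the ontology terms from Section 4.7.1; the namespace IRIs, resource IRIs, and the example dependency statement are illustrative. Note that the observation metadata is written into the observation's own named graph, alongside the observed statements.

PREFIX ms:    <http://metaservice.org/ns/metaservice#>
PREFIX swrel: <http://metaservice.org/ns/swrel#>
PREFIX xsd:   <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:    <http://example.org/>

INSERT DATA {
    # the dependency appeared in the crawl of 2014-01-01 ...
    GRAPH ex:observation1 {
        ex:observation1 a ms:AddObservation ;
            ms:generator ex:debianProvider ;
            ms:dataTime  "2014-01-01T00:00:00Z"^^xsd:dateTime .
        ex:packageA swrel:depends ex:packageB .
    }
    # ... and disappeared again in the crawl of 2014-02-01
    GRAPH ex:observation2 {
        ex:observation2 a ms:RemoveObservation ;
            ms:generator ex:debianProvider ;
            ms:dataTime  "2014-02-01T00:00:00Z"^^xsd:dateTime .
        ex:packageA swrel:depends ex:packageB .
    }
}

Under the validity definition above, the dependency statement is then valid exactly between the two data times.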

4.4.4 Postprocessor Data Model

Postprocessors calculate statements given a specific time. This time is used to execute queries on the RDF database and consequently is also the data time of the observation. There are cases where more than one subject leads to the same result. One of these cases is the calculation of the previous and next relationships of a set of resources based on implicitly encoded attributes of the resources, like a revision identifier. In this case redundancies in calculation and/or storage would be created, as for each element of the set all elements of the set need to be considered to determine the direct successor or predecessor element. To remove those redundancies, the postprocessor data model defines observations to contain multiple authoritative subjects. The following rule is imposed on postprocessors to determine authoritative subjects: every resource of a statement in an observation which is potentially processable by the postprocessor needs to generate the same observation when actually processed by the postprocessor. Determining authoritative subjects can then be done by checking, for every resource which is part of the observation, whether it is processable by the postprocessor. Authoritative subjects and the generator are used to identify the observation. Nonetheless the identity of an observation cannot be determined as trivially as in the provider data model, because authoritative subjects are sets and not single resources. To define the temporal validity of statements in the postprocessor data model, it is not possible to execute the postprocessor for every possible query time. Therefore the temporal validity needs to be defined by other semantics. Like in the provider data model, a timeline based approach is used to define validity. Compared to the provider data model, postprocessor observations do not have an add or remove operation. Therefore only validity based on the principle that statements are valid until they are observed to not be valid is possible. Hence when a postprocessor does not calculate any statements, an empty observation still needs to be stored because of the validity semantics, such that previous observations can be invalidated completely. Another problem is that timelines for postprocessor observations are not linear, because every authoritative subject has its own timeline. For a statement to be valid at a point in time, there may not be newer observations in any of the authoritative subjects' timelines which do not contain the statement. This can also be restated as: a statement is valid if the newest observation containing the statement that is still prior to the query time has no successor observation from the same generator on any authoritative subject timeline which does not contain the statement.
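An empty observation required by these semantics could look as follows, again as an illustrative SPARQL Update sketch with assumed IRIs; ms:dummy marks the observation as containing no statements (see Table 4.2), although its exact value range in the real ontology is not prescribed here.

PREFIX ms:  <http://metaservice.org/ns/metaservice#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:  <http://example.org/>

INSERT DATA {
    GRAPH ex:observation3 {
        ex:observation3 a ms:ContinuousObservation ;
            ms:generator ex:versionPostprocessor ;
            ms:authoritativeSubject ex:package-1.0, ex:package-1.1 ;
            ms:dataTime "2014-03-01T00:00:00Z"^^xsd:dateTime ;
            # hypothetical marker value; the graph holds no data statements
            ms:dummy true .
    }
}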

4.5 Temporal Queries

Although there has been work on temporal queries on RDF databases, like the approaches discussed in Section 2.5.1, to our knowledge none of them is standardized or implemented in a usable product. Therefore metaservice may only build upon standard query mechanisms for the implementation of temporal queries. The temporal data model described in Section 4.4 suits the way the data is generated. Temporal querying is not trivial, because the model does not match the usual valid-from and valid-until semantics for temporal storage. Actually there is not only one, but two different time-validity semantics, one for each of the sub models. It is not feasible to expect each extending developer to fully understand the data model semantics, which would be necessary to write queries by hand. Therefore it is necessary to provide an abstraction layer which allows easy query writing. Most queries needed for data examination and processing typically consider single points in time and are called time point queries. Therefore it is a priority to provide a mechanism which allows writing standard time-agnostic queries and executing them for a given query time. First SPARQL@T, an extension of SPARQL which allows temporal constraints on each statement pattern, is introduced to formalize simple temporal queries. Then SPARQL@T is used to demonstrate how queries can be translated to regular SPARQL which operates on the metaservice data model. Finally, an automatic translation of a subset of SPARQL@T to regular SPARQL on the metaservice data model is described.


4.5.1 Introduction of SPARQL@T

SPARQL@T is an extension of regular SPARQL which allows optional constraints on the temporal validity of statement patterns and pattern groups. It is an easy to read and write language for temporal querying of RDF stores. Both statement patterns and pattern groups can be followed by an @ and a point in time, the query time, to denote that they must be valid at the query time. In Listing 4.1 the statement pattern only matches if a statement exists that is valid on 01/01/2014 00:00:00. In Listing B.5, in contrast, the temporal constraint is attached to the group pattern and therefore matches only if all statements in the group pattern are valid at the given point in time. As the constraint is given exclusively on the outermost group pattern, the query is a typical time point query.

SELECT *
WHERE {
    ex:subject ex:predicate ?o @"2014-01-01T00:00:00Z"^^xsd:dateTime .
}

Listing 4.1: Temporal Constraint on a Statement Pattern SPARQL@T Query

In comparison to other temporal SPARQL variants, like τ-SPARQL [69] or SPARQL-ST [56], SPARQL@T is quite limited. It does not allow variables in place of the query time; hence it is not possible to query the temporal validity of a statement. The simplicity of SPARQL@T is the reason why it was chosen for the explanation of the temporal mapping. The other existing languages are more powerful, but would also complicate the explanation. Given a concrete data model and temporal semantics, it is sometimes possible to translate a SPARQL@T query to a regular SPARQL query. Listings B.6 and B.7 show translations of the SPARQL@T queries in Listings 4.1 and B.5 to common SPARQL.
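For illustration of the group form (Listing B.5 itself is part of the appendix and not reproduced here), a group-level constraint might be written in the following shape; the predicates and the omitted prefix declarations are placeholders:

SELECT *
WHERE {
    {
        ?package swrel:depends ?dependency .
        ?dependency doap:name ?name .
    } @"2014-01-01T00:00:00Z"^^xsd:dateTime
}

Here both statement patterns must be valid at the same query time for a solution to match.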

4.5.2 Writing Temporal SPARQL Queries

Since the two data models have different semantics for determining the temporal validity of statements, both need to be considered to calculate the overall temporal validity of a statement. Following are translations of the SPARQL@T query in Listing 4.1. The translation uses several subqueries. To improve readability and to save space, these subqueries are written as named subqueries [70], such that the query can be split into multiple parts.

Temporal Queries for the Provider Data Model

The provider data model records the first observation containing a statement and the first observation missing it again. One may assume, and we define, that the time between consecutive add and remove observations of a statement is its period of validity. In the absence of a remove observation, the duration has no upper bound. Contrary to the classic valid-from and -to semantics, which store the period of validity in one place, the provider data model stores this information in different observations. Therefore the classic solution is not applicable, although the formats are semantically similar.


First of all, a statement is invalid if it is not an element of any observation. If it is an element of one or more observations, the observation with the maximal data time which is also before the query time determines its validity. If that observation is an add observation, the statement is valid; otherwise it is invalid. The named query %maxProviderTime in Listing 4.2 selects the variables of the statement, which in this case are only ?o, and the maximum data time ?time which is lower than the query time. The results are grouped by the generator, the source, and the statement variables, such that the maximal time is calculated for each observation timeline and each variable binding. In Line 15 the observations are limited to those with a data time prior or equal to the query time, and in Lines 10 to 14 the action or type of observation is limited to add or remove.

1  WITH {
2    SELECT DISTINCT
3      ?o
4      (MAX(?time) as ?time)
5    WHERE {
6      GRAPH ?c { ex:subject ex:predicate ?o } .
7      ?c ms:dataTime ?time ;
8         ms:source ?sourcePath ;
9         ms:generator ?generator .
10     {
11       ?c a ms:AddObservation .
12     } UNION {
13       ?c a ms:RemoveObservation .
14     }
15     FILTER ( ?time <= "2014-01-01T00:00:00Z"^^xsd:dateTime )
16   }
17   GROUP BY
18     ?o
19     ?generator
20     ?sourcePath
21 } AS %maxProviderTime

Listing 4.2: maxProviderTime Subquery - Translation of Listing 4.1 to SPARQL on the Provider Data Model

The query in Listing 4.3 includes the %maxProviderTime subquery from Listing 4.2. It then selects all observations that contain the statement, using the precalculated variables and maximal data time. The last pattern matches only add observations.

SELECT *
WITH {
    ...
} AS %maxProviderTime
WHERE {
    INCLUDE %maxProviderTime
    GRAPH ?c { ex:subject ex:predicate ?o } .
    ?c ms:dataTime ?time .
    ?c a ms:AddObservation .
}

Listing 4.3: Main Query - Translation of Listing 4.1 to SPARQL on the Provider Data Model

The translated query answers the original query correctly, although there are some differences in the result set. The star selector fetches additional variables, such that the returned tuples have a different arity. Additionally there may be more returned tuples, because the statement could be true in different observations.

Temporal Queries for the Postprocessor Data Model

The postprocessor model only has valid-from semantics; there is no time which states until when a statement is valid. All statements of an observation are valid until there is another observation of the same generator regarding the same authoritative subjects. Any statement is invalid if it is not an element of any observation. If it is an element of one or more observations, the generator and authoritative subjects of these observations need to be determined. Then the observations which have a data time prior to the query time are looked up, grouped by their generator and authoritative subjects, and the latest observation in each group is selected. If any of the selected observations contains the statement, the statement is valid. The named query %maxPostprocessorTime in Listing 4.4 selects the maximal ?time for each timeline which is prior or equal to the query time. To do this, first observations containing the statement are matched in Line 7. Then in Lines 8 to 10 those observations are required to be of the postprocessor type, and the generator and the authoritative subjects are matched. Both the generator and the authoritative subjects are used in Lines 11 to 14 to determine corresponding observations prior to the query time. Last, the observations are grouped by their timeline and the maximal time is selected.

1  WITH {
2    SELECT DISTINCT
3      ?subject
4      ?generator
5      (MAX(?time) as ?time)
6    WHERE {
7      GRAPH ?c1 { ex:subject ex:predicate ?o } .
8      ?c1 a ms:ContinuousObservation ;
9          ms:generator ?generator ;
10         ms:authoritativeSubject ?subject .
11     ?c2 ms:authoritativeSubject ?subject ;
12         ms:generator ?generator ;
13         ms:dataTime ?time .
14     FILTER ( ?time <= "2014-01-01T00:00:00Z"^^xsd:dateTime )
15   }
16   GROUP BY
17     ?subject
18     ?generator
19 } AS %maxPostprocessorTime

Listing 4.4: maxPostprocessorTime Subquery - Translation of Listing 4.1 to SPARQL on the Postprocessor Data Model


Listing 4.5 shows the query which is the translation of the query in Listing 4.1. First the result of %maxPostprocessorTime, and therefore ?time, ?generator, and ?subject, is included. Then all observations that match the statement pattern are looked up and subsequently joined to the known variables. If the join succeeds, there exists an observation which is the latest in the timeline prior to the query time and contains a statement matching the statement pattern, which hence is valid at the query time. The same restrictions as for the provider data model apply: the arity of the result tuples and the tuple count may differ from the original query.

SELECT *
WITH {
    ...
} AS %maxPostprocessorTime
WHERE {
    INCLUDE %maxPostprocessorTime
    GRAPH ?c { ex:subject ex:predicate ?o } .
    ?c ms:authoritativeSubject ?subject ;
       ms:generator ?generator ;
       ms:dataTime ?time .
}

Listing 4.5: Main Query - Translation of Listing 4.1 to SPARQL on the Postprocessor Data Model

Temporal Queries for the Metaservice Data model

A statement is valid if it is valid in any of the data models. The query for the metaservice data model is therefore the union of the provider data model query and the postprocessor data model query. The combined query can be seen in Listing 4.6 and is the translation of the original SPARQL@T query in Listing 4.1. As for the other data models, the restrictions apply that there may be duplicate tuples in the result and more variables in each tuple. The same translation rules can analogously be applied to CONSTRUCT and ASK queries.

SELECT *
WITH {
    ...
} AS %maxProviderTime
WITH {
    ...
} AS %maxPostprocessorTime
WHERE {
    GRAPH ?c { ex:subject ex:predicate ?o } .
    ?c ms:dataTime ?time .
    {
        INCLUDE %maxProviderTime
        ?c a ms:AddObservation .
    } UNION {
        INCLUDE %maxPostprocessorTime
        ?c ms:authoritativeSubject ?subject ;
           ms:generator ?generator .
    }
}

Listing 4.6: Main Query - Translation of Listing 4.1 to SPARQL on the Metaservice Data Model

4.5.3 Automatic Translation of a subset of SPARQL

Based on the translation concept shown in Section 4.5.2, it is possible to automatically translate a major subset of SPARQL. Each triple statement pattern needs to be calculated in its own subquery. Quad statement patterns are processed together, grouped by their context or context variable. Filters are best applied directly in the subquery where the variables used in the filter are available. Group by and aggregates can be translated without modification. Distinct operators require previous hiding of the variables which are generated by the translation. Unions, minus, and subqueries can be translated as if they were actual queries. Depending on the structure of the query, this algorithm may generate duplicate subqueries with the same calculation. When using named subqueries, these redundant queries can be eliminated. Naming of the newly generated variables and named queries needs to be done carefully, such that no naming conflicts arise. These are particularly dangerous, since they do not cause an exception, but silently calculate wrong results. Property paths cannot be translated, because there is no way to simulate the + or * operator with regular SPARQL. Decomposing these operators would be needed to be able to check the temporal validity of each intermediate statement. SPARQL service extensions cannot be translated automatically, because their semantics differ between implementations. Although the automated translation has its limitations, it is still powerful and therefore useful for querying temporal data.
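As an example of the property path limitation, consider a SPARQL@T query of the following shape (prefix declarations omitted, predicate illustrative): the + operator hides the intermediate statements whose temporal validity would each have to be checked, so no equivalent regular SPARQL query on the metaservice data model can be generated.

SELECT ?affected
WHERE {
    ?affected swrel:depends+ ?vulnerable @"2014-01-01T00:00:00Z"^^xsd:dateTime .
}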

4.6 Interface Design

The design of the programmable interface is based on semantic web technologies and linked data principles. Therefore every software has its own URI under which information about it can be retrieved. This information is provided via common web technologies. Documentation of the API is delegated to the used ontologies and web standards. The browsable web interface is then built on top of the semantic web frontend. Information on how a specific class should be displayed can be embedded in the data, and subsequently a template can be fetched to render the structured information of the graph.
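Following linked data principles, a client can, for example, simply ask the SPARQL endpoint to describe a software resource by its URI; the resource URI below is a made-up example and not an actual metaservice identifier:

DESCRIBE <http://example.org/software/debian/wheezy/bash>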

4.7 Ontologies

Two ontologies are required for metaservice to work as a software interrelationship service. The metaservice observation ontology vocabulary is needed for the realization of the metaservice data model.


SWREL is required as a common ground abstraction layer for all integrated software ecosystems. Both ontologies are implemented using OWL and are hosted in the metaservice namespace. They include or link vocabulary of existing ontologies where applicable, to further refine their semantic specification. Additionally, ADMS.SW and its related vocabularies are used for the description of software metadata beyond software interrelationships.

4.7.1 Metaservice Observation Ontology

The metaservice observation ontology is used to describe the metaservice data model. It handles the description of observations and their provenance data. It is not designed to be reusable for other services than metaservice.

Classes

Classes are provided for observations, modules, and their elements. Table 4.1 gives the descriptions of the classes. The semantics of the classes is in accordance with Section 4.4. Since the ontology is very specific, additional constraints are encoded through OWL. Examples of this are that observations need to reference exactly one generator and that continuous observations are required to reference a postprocessor as their generator.

Properties

The properties of the metaservice observation ontology can be seen in Table 4.2. Most properties are used to link the resources of the different classes. ms:latest can be used to buffer the calculation of validity for a continuous observation and a point in time. ms:dummy has the special role of denoting empty observations.

4.7.2 Software Relationship Ontology

SWREL focuses on the connections of software. A new ontology is necessary because the existing ones do not cover software relationships systematically. The ontology is used in metaservice to provide a common abstraction layer for software relationships. The most common software relationships can be categorized into relationships to software, to people and organizations, and to hardware. Software and generic relationships are the focus of this thesis and ontology. Relationships to people and organizations are already covered well by DOAP, which is the main reason for its import into the ontology. Hardware relationships were out of the scope of this work, but could be added in a later revision of the ontology. Where possible, properties were not modeled exclusively for software, but as generically as possible. Hence there are often no constraints on a property's domain or range.


URI: ms:Observation
Description: Observations are used to state collections of statements with the same origin, generator, and therefore observation provenance information. Named graphs are used to link observations to their contained statements. The description of the observation is also contained in itself.

URI: ms:AddObservation
Subclass of: ms:Observation
Description: AddObservations are used to describe the start of validity, i.e., the appearance, of statements in a knowledge base.

URI: ms:RemoveObservation
Subclass of: ms:Observation
Description: RemoveObservations are used to describe the end of validity, i.e., the disappearance, of statements in a knowledge base.

URI: ms:ContinuousObservation
Subclass of: ms:Observation
Description: ContinuousObservations are used to describe statements for a given point in time. Statements stay valid until they are not part of a later ContinuousObservation in the same timeline.

URI: ms:Module
Description: Modules are metaservice plugins. They provide different components, like Postprocessors or Providers, and static data, like Views and SourceRepositories.

URI: ms:Generator
Description: Everything that produces Observations is a Generator.

URI: ms:Postprocessor
Subclass of: ms:Generator
Description: Postprocessors are used for connecting and processing data in observations. They produce ContinuousObservations.

URI: ms:Provider
Subclass of: ms:Generator
Description: Providers are used to convert data from SourceRepositories into statements in AddObservations and RemoveObservations.

URI: ms:Source
Description: A source is the place of origin of information.

URI: ms:RepositoryPath
Subclass of: ms:Observation
Description: A RepositoryPath is a concrete element from a SourceRepository.

URI: ms:SourceRepository
Subclass of: admssw:SoftwareRepository
Description: A SourceRepository is a SoftwareRepository which is used as a source in metaservice.

URI: ms:View
Description: A View provides visualization for a class of resources in metaservice.

Table 4.1: Metaservice Observation Ontology Classes

ms:generator: Links an Observation to its Generator.
ms:module: Links a Module element to its Module.
ms:authoritativeSubject: Resources which are processed together and are important for the time validity scope of the observation.
ms:source: Links an Observation to the source its statements were retrieved from.
ms:creationTime: The time an Observation was created.
ms:dataTime: The validity time of an Observation.
ms:lastCheckedTime: The last time an Observation was checked to be true.
ms:latest: Can be used to link a ContinuousObservation to a time at which it is implicitly valid.
ms:repository: Links a RepositoryPath to a SoftwareRepository.
ms:id: Data property for the identifier of Components.
ms:path: Stores the local path of a RepositoryPath in a Repository.
ms:view: Used to annotate classes with matching Views for presentation.
ms:dummy: Used to mark ContinuousObservations which do not contain any statements.

Table 4.2: Metaservice Observation Ontology Properties

URI: swrel:Software
Description: A very abstract software concept. Instances of this class may refer to one or more software resources. These software resources may be addressed explicitly or implicitly.

URI: swrel:AnyOneOfSoftware
Subclass of: swrel:Software, rdf:Alt
Description: An enumeration of alternative software resources. Selection of the resources is disjunctive.

URI: swrel:AllOfSoftware
Subclass of: swrel:Software, rdf:Bag
Description: An enumeration of software resources which are all part of this resource. The resources need to be used conjunctively.

URI: swrel:SoftwareRange
Subclass of: swrel:Software
Description: Used to describe a range of software with specific properties. Software is only referenced implicitly, not explicitly.

Table 4.3: SWREL Classes

Classes

Since SWREL focuses on relationships rather than classes of software, it defines only a few classes. Those it does define are very abstract concepts of software and software collections; their descriptions can be seen in Table 4.3. A recurring problem with software-to-software relationships is that they are often not specified directly on concrete artifacts, releases, or projects, but only through certain criteria. Examples of this are dependencies of Debian packages, which are often specified only by package name.


[Figure 4.4: SWREL Relationship Property Hierarchy. related subsumes relatedSoftware, depends, antiDepends, and implements; relatedSoftware subsumes downstream, upstream, source, binary, hasPort, portOf, and forkOf; implements subsumes implementsAPI, implementsAlgorithm, and implementsSpecification.]

The dependency is stated on the package level and may be fulfilled by any package of that name. Further complications are introduced by dependencies which are specified as combinations of conjunctions and disjunctions. It becomes clear that such relationships cannot always be described directly, but only on classes of things. swrel:AllOfSoftware and swrel:AnyOneOfSoftware can be used to model conjunctions and disjunctions. More specific restrictions can be modeled through swrel:SoftwareRange, which can be narrowed by constraints. Because of the different nature of the swrel:Software class, it is neither reused from an existing ontology nor mapped to an existing concept.

Reification of dependencies, and therefore modeling of relationships as classes, was considered, but found to overly complicate the model and therefore its usage. Providing both classes and properties for the same concepts would double the size of the ontology and would require specific conversion rules. Conversion would only be possible from the reified statement to the non-reified one, because the identity information of relationships is not contained in properties. The most significant limitation of not modeling relationships as classes is that it is impossible to distinguish two different relationships between the same software and object. A workaround for this limitation is to use the swrel:AllOfSoftware or swrel:AnyOneOfSoftware classes to create an intermediate resource which only links to the object. This workaround allows much simpler modeling without losing expressiveness of the vocabulary.
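To illustrate, the following sketch builds a disjunctive dependency on one of two packages through an intermediate swrel:AnyOneOfSoftware resource, using the OpenRDF Sesame API on which the prototype is based. The example.org namespace and resource URIs are illustrative assumptions, not the actual SWREL or metaservice identifiers.

import org.openrdf.model.Model;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.impl.LinkedHashModel;
import org.openrdf.model.impl.ValueFactoryImpl;
import org.openrdf.model.vocabulary.RDF;

public class DisjunctiveDependencyExample {
    // Hypothetical namespaces; the real SWREL and resource URIs may differ.
    private static final String SWREL = "http://example.org/swrel#";
    private static final String EX = "http://example.org/software/";

    public static void main(String[] args) {
        ValueFactory vf = ValueFactoryImpl.getInstance();
        Model model = new LinkedHashModel();

        URI app = vf.createURI(EX, "my-app");
        URI alternatives = vf.createURI(EX, "my-app-db-alternatives");

        // The intermediate resource models "libdb-a OR libdb-b".
        model.add(alternatives, RDF.TYPE, vf.createURI(SWREL, "AnyOneOfSoftware"));
        model.add(alternatives, vf.createURI(RDF.NAMESPACE, "li"), vf.createURI(EX, "libdb-a"));
        model.add(alternatives, vf.createURI(RDF.NAMESPACE, "li"), vf.createURI(EX, "libdb-b"));

        // The application depends on the disjunction, not on a concrete package.
        model.add(app, vf.createURI(SWREL, "dependsRuntime"), alternatives);
        model.add(app, vf.createURI(SWREL, "requires"), alternatives);
    }
}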

Properties

Properties of the ontology are structured in three groups: relationships, dependencies, and range restrictions. Inverse properties are only provided where they were considered useful, because some relationships have strongly directed semantics.

Relationship Properties

Table 4.4 describes the relationship properties of the ontology, but does not include the software dependency properties. The hierarchy of the properties can be seen in Figure 4.4. swrel:related is the top-level property for relationship properties and therefore also for dependency properties. It is used for generic relations. A specialized version which links to software is swrel:relatedSoftware. Software relationships are provided to link forked, ported, and compiled software. Additionally, swrel:implements is provided to link to implemented things, like APIs, algorithms, or specifications. Like the relation property, the implementation property does not imply usage on software.


swrel:related: Abstract concept of relation.
swrel:relatedSoftware: Abstract concept of relation to software.
swrel:binary: Software usually exists either in source or binary format. In case there exists a binary for a source, it may be linked through this property.
swrel:source: Software usually exists either in source or binary format. In case there exists a source for a binary, it may be linked through this property.
swrel:hasPort: Links to a portation of this software. A portation or port is an adaptation of a software to a different platform.
swrel:portOf: Links a portation to its original implementation. A portation or port is an adaptation of a software to a different platform.
swrel:forkOf: Connects assets with a common development ancestor.
swrel:implements: Abstract connection between the subject, which is the implementation, and something which is implementable.
swrel:implementsAPI: The subject implements an API. An API is a formalized contract of a programming interface.
swrel:implementsAlgorithm: The subject is or contains an implementation of an algorithm.
swrel:implementsSpecification: The subject implements a specification. Specifications can be standards, norms, or a requirements specification.

Table 4.4: SWREL Relationship Properties

Dependency Properties

Since the hierarchy of relationship and dependency properties is quite complex, it is displayed as a graph in Figure 4.5. Table 4.5 contains the descriptions of the properties. The ontology is structured through abstract dependency properties. OWL does not define abstract properties; they are therefore implemented as regular properties and only documented as abstract. The difference between an abstract property and a normal property is that abstract properties are only used to structure the ontology. They do not carry any information themselves, except that a specific type of information is given in a subproperty. The design of software dependencies in the ontology was greatly inspired by the work of German, Gonzalez-Barahona, and Robles [25]. The four proposed classifications of software dependencies are replicated in the ontology as abstract properties. The expected way to use the SWREL dependency vocabulary is to map existing properties to the provided vocabulary through the subproperty relationship; a sketch of such a mapping is given below, after Table 4.6. The vocabulary thereby serves as a modular platform for expressing the semantic meaning of dependency properties through a common vocabulary. An example of this system is swrel:linksDynamically, which is a subproperty of swrel:links and swrel:dependsCompile. One of the most special properties is swrel:antiDepends. All dependency types, with the exception of swrel:antiDepends, are subproperties of swrel:depends. Its special definition is due to the fact that it does the opposite of a normal dependency: it expects the described object to be absent.


swrel:depends: An abstract dependency on something. This does not specify the dependency's optionality, time scope, or any other type.
swrel:antiDepends: The inversion of a dependency; states incompatibility. It is an add-on to conventional dependencies to state the negation.
swrel:dependsSoftware: An abstract dependency on software.
swrel:abstractTypeDependencyProperty: Subproperties state how a dependency is defined.
swrel:abstractStageDependencyProperty: Subproperties state in which stage of a software life-cycle the dependency is valid.
swrel:abstractUsageDependencyProperty: Subproperties state how a dependency is used.
swrel:abstractImportanceDependencyProperty: Subproperties state the importance of a dependency.
swrel:optional: Used to denote optional dependencies.
swrel:requires: Used to denote must-have dependencies.
swrel:links: A software links another software. Linking is the process of combining two pieces of software into a single executable.
swrel:linksStatically: A software links another software. Static linking is the combination of two pieces of software into a single executable at build time.
swrel:linksDynamically: A software links another software. Dynamic linking is the combination of two pieces of software into a single executable at runtime.
swrel:dependsStandalone: The software runs on its own. If communication happens between the two applications, then only through weakly typed communication channels, like sockets or pipes.
swrel:includesDependency: Used to denote the inclusion of a dependency as a part of the subject.
swrel:dependsExternal: Used to denote that the dependency is not part of the subject.
swrel:dependsBuild: The dependency is required to be satisfied during build time of the software.
swrel:dependsTest: The dependency is required to be satisfied during the execution of tests.
swrel:dependsInstallation: The dependency is required to be satisfied during the installation of the software.
swrel:dependsRuntime: The dependency is required to be satisfied during runtime of the software.
swrel:dependsService: The software connects to and uses a service.
swrel:dependsMiddleware: The subject uses a middleware.
swrel:pluginOf: The subject is used as a tightly coupled plugin of another software.
swrel:dependsCompiler: The subject depends on the object as a compiler to build.
swrel:dependsInterpreter: The subject depends on the object as an interpreter to run.

Table 4.5: SWREL Dependency Properties

[Figure 4.5: SWREL Dependency Property Hierarchy. related subsumes antiDepends and depends; depends subsumes dependsSoftware, which in turn subsumes the abstract type, usage, importance, and stage dependency properties that group the concrete properties listed in Table 4.5.]

swrel:distributionConstraint: Used to limit the range to a specific software distribution.
swrel:projectConstraint: Used to limit the range to a specific software project.
swrel:releaseConstraint: Used to limit the range to a specific software release.

Table 4.6: SWREL Range Restriction Properties

To be able to reuse all other dependency properties for the description, the negation can be added as a decorator to any dependency. Another important aspect of swrel:depends is that it does not state whether a dependency is really required. To state the importance of a dependency, the properties swrel:optional and swrel:requires are provided.
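The subproperty mapping mentioned above can be stated as plain RDF. The following sketch, again using the Sesame API and assumed example.org namespace URIs, maps a hypothetical distribution-specific dependency property onto the SWREL vocabulary:

import org.openrdf.model.Model;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.impl.LinkedHashModel;
import org.openrdf.model.impl.ValueFactoryImpl;
import org.openrdf.model.vocabulary.RDFS;

public class DependencyMappingExample {
    public static void main(String[] args) {
        ValueFactory vf = ValueFactoryImpl.getInstance();
        Model model = new LinkedHashModel();

        // Hypothetical URIs; the real ontologies use their own namespaces.
        URI debDepends = vf.createURI("http://example.org/deb#depends");
        URI dependsRuntime = vf.createURI("http://example.org/swrel#dependsRuntime");
        URI requires = vf.createURI("http://example.org/swrel#requires");

        // A distribution-specific property inherits SWREL semantics
        // by being declared a subproperty of the matching SWREL terms.
        model.add(debDepends, RDFS.SUBPROPERTYOF, dependsRuntime);
        model.add(debDepends, RDFS.SUBPROPERTYOF, requires);
    }
}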

Range Restriction Properties

Range restriction properties are used to specify software ranges. Constraints of different types are interpreted conjunctively, and constraints of the same type disjunctively. The constraints can either be linked directly to resources or contain instructions describing for which resources they are valid. Such instruction values are not automatically processable unless semantics are attached to them; this should be done in subproperties of these constraints.
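As an illustrative sketch under the same assumed namespaces, a software range restricted to one project and two acceptable releases could be stated as follows; the project constraint always applies (conjunctive across types), while either release satisfies the range (disjunctive within a type):

import org.openrdf.model.Model;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.impl.LinkedHashModel;
import org.openrdf.model.impl.ValueFactoryImpl;
import org.openrdf.model.vocabulary.RDF;

public class SoftwareRangeExample {
    // Hypothetical namespaces and resource names.
    private static final String SWREL = "http://example.org/swrel#";
    private static final String EX = "http://example.org/software/";

    public static void main(String[] args) {
        ValueFactory vf = ValueFactoryImpl.getInstance();
        Model model = new LinkedHashModel();

        URI range = vf.createURI(EX, "libfoo-range");
        model.add(range, RDF.TYPE, vf.createURI(SWREL, "SoftwareRange"));

        // Conjunctive across constraint types: the project constraint always applies.
        model.add(range, vf.createURI(SWREL, "projectConstraint"), vf.createURI(EX, "libfoo"));

        // Disjunctive within one constraint type: either release satisfies the range.
        model.add(range, vf.createURI(SWREL, "releaseConstraint"), vf.createURI(EX, "libfoo-1.0"));
        model.add(range, vf.createURI(SWREL, "releaseConstraint"), vf.createURI(EX, "libfoo-1.1"));
    }
}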


5 Implementation

This chapter describes the details of the metaservice prototype implementation. Section 5.1 introduces the chosen software platform and the corresponding libraries. The component and module system and the manager of metaservice are presented in Sections 5.2 and 5.3. Retrieval and storage of data from source repositories are examined in Section 5.4. Section 5.5 gives insight into the chosen RDF database and its configuration, and the implementation of the messaging service is discussed in Section 5.6. Section 5.7 covers the frontend, including the user interface and the linked data API. Optimizations needed for the execution of the automatically translated queries are presented in Section 5.8.

5.1 Platform

Metaservice uses the JVM platform and Java as the main programming language. The JVM provides support for multiple programming languages and is portable. The Java ecosystem contains a variety of libraries, which cover almost all requirements. Metaservice uses both Apache Commons [81] and Google Guava [103] for utility functions, and Google Guice [104] for dependency injection. Two major libraries for RDF handling exist, Apache Jena [84] and OpenRDF Sesame [128]; Sesame is used in the prototype. The same logging mechanism is implemented across all components. Since different libraries use different logging mechanisms, SLF4J [138] is used to consolidate the different systems. Logback [117] was chosen as the logging backend because of its advanced mechanisms, like log rolling, log compression, and configuration reloading at runtime. Further libraries are used in the different components and are introduced in the corresponding sections. Although the JVM supports multiple operating systems, metaservice was developed for and tested solely on Debian GNU/Linux. Debian provides stability and a large repository of easy-to-install software utilities. Among the packages used from Debian is the Apache HTTP daemon (httpd). Porting metaservice to another platform should be possible with few changes.

5.2 Component and Module System

Apache Maven is used as the build system and therefore also provides the artifact naming convention for metaservice. Table 5.1 shows the core libraries of metaservice, which are all part of the org.metaservice group. Metaservice modules are also required to be built, or at least deployed, with Maven. All interfaces and classes required for the development of a module are part of metaservice-api. Besides the Maven POM, external modules also require a descriptor, in which the provided configuration and components are declared. The descriptor format is based on XML; the file must be called metaservice.xml and placed in the root directory of the archive. An example of a metaservice descriptor is shown in Listing B.10.

metaservice: a Semantic Web based Approach 63 5.3. Manager

metaservice-manager: Console application that runs a metaservice instance.
metaservice-api: Public API for writing metaservice modules. Modules should use this as a dependency.
metaservice-core: Private implementation classes. Modules should use this as a runtime dependency.
metaservice-frontend: Static content of the web frontend.
metaservice-frontend-rest: REST service that provides the dynamic parts of the web frontend.
metaservice-api-messaging: API to implement a messaging service. The messaging API is not needed to write a metaservice module.
metaservice-messaging-activemq: Messaging service implemented with ActiveMQ.
metaservice-messaging-mongokryo: Custom messaging service implemented with MongoDB and Kryo. This is the default messaging service used for metaservice.

Table 5.1: Metaservice Maven Artifacts

The module descriptors provide metadata of the modules to the metaservice manager.

5.3 Manager

The manager is able to deploy, start, and stop the frontend, crawlers, providers, postprocessors, and the messaging service. It takes care of module retrieval, installation, removal, and upgrade. It provides centralized access to important system metrics, like running processes, generated triple counts, and messaging queue lengths. Statistics can also be drilled down per semantic data type or per module. The manager also provides means to manipulate data and send messages for testing purposes. The user interface is shell based and implemented using Æsh [78]. Æsh allows autocompletion of custom queries and the execution of shell scripts, so that working with the manager is as convenient as working with any other Unix shell. Job scheduling uses the Quartz Scheduler [134]. A screenshot of the manager can be seen in Appendix C, Figure C.3.

5.4 Data Retrieval and Archival

Data retrieval is done by crawlers, which follow declarative rules in the module descriptor. These rules can be reused by different repositories; each repository may define a different starting URI for the crawler. Rules allow stating Cascading Style Sheets (CSS)-like selectors on web pages to follow links and fetch resources. The implementation uses jsoup [114], a Java-based HTML parsing and extraction engine.
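A minimal sketch of what such a crawling rule amounts to, assuming a hypothetical starting URI and selectors (the actual rules are declared in metaservice.xml, not hard-coded):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CrawlRuleSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical starting URI; real rules come from the module descriptor.
        String startUri = "http://example.org/releases/";
        Document page = Jsoup.connect(startUri).get();

        // Follow all links matched by a CSS-like selector ...
        for (Element link : page.select("table.archive a[href$=.html]")) {
            Document releasePage = Jsoup.connect(link.absUrl("href")).get();

            // ... and collect the resources to fetch from each followed page.
            for (Element resource : releasePage.select("a[href$=.tar.gz]")) {
                System.out.println("fetch: " + resource.absUrl("href"));
            }
        }
    }
}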


As soon as data is retrieved, it is stored in an archive. Because snapshots of a repository often differ only slightly, differential storage and compression for the archive are implemented with git. Git provides a great compression ratio on incrementally changing text-based files, which is most often the case for metadata. Special implementations of archives can improve parsing performance by reducing the amount of data that needs to be parsed. For some source data formats it is possible to calculate the differences between two versions without completely parsing them. This is the case for the Debian package index format, which allows the calculation of changed packages without parsing the whole file in detail; a sketch of this idea follows below.
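The following sketch illustrates the idea under the assumption that a Debian Packages index is a sequence of stanzas separated by blank lines: comparing whole stanzas identifies added package entries between two snapshots without interpreting any fields.

import java.util.HashSet;
import java.util.Set;

public class PackageIndexDiffSketch {
    /** Splits a Packages index into its blank-line separated stanzas. */
    static Set<String> stanzas(String index) {
        Set<String> result = new HashSet<String>();
        for (String stanza : index.split("\n\n")) {
            if (!stanza.trim().isEmpty()) {
                result.add(stanza);
            }
        }
        return result;
    }

    /** Returns the stanzas present in newIndex but not in oldIndex. */
    static Set<String> addedStanzas(String oldIndex, String newIndex) {
        Set<String> added = new HashSet<String>(stanzas(newIndex));
        added.removeAll(stanzas(oldIndex));
        return added; // only these need full field-level parsing
    }
}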

5.5 RDF Database

The RDF database selected for metaservice is bigdata [71, 86]. Other stores, like OWLIM [47, 130] or Virtuoso [127], were also considered. The choice fell on bigdata because of its implementation in Java and its open license, which allows source code access and modification. It is actively maintained and its development is steadily ongoing, with four releases happening during the development of metaservice. Bigdata is started as a dedicated application using the NanoSparqlServer interface. The NanoSparqlServer provides a SPARQL endpoint which allows concurrent reads and writes. The different components access bigdata through the endpoint using the OpenRDF SPARQLRepository class, as sketched below. Quad mode is enabled, and truth maintenance and reasoning are disabled, because materialized triples are stored when needed. Branching factors for the indices and the size of the write-retention queue were set to the vendor's recommendations, but not verified to provide optimal performance on the dataset.

Bigdata provides scale-up and scale-out architectures for clustering. The scale-up architecture allows several connected instances of bigdata, where each holds an entire copy of the data set. It is therefore also used for high-availability scenarios, because it tolerates node outages through a quorum-based system. Since no communication between the instances is needed for query evaluation, query throughput scales linearly, while update operations have an overhead of about 50% for short transactions. In the scale-out architecture, data and indices are dynamically distributed between the cluster nodes. Queries are also evaluated in a distributed fashion and therefore near the data. This leads to improved performance, but also to higher latency, for queries on big data sets. Clustering was not yet needed at the size of the demonstration data sets, but would be necessary for productive usage.
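For illustration, components access the endpoint roughly like the following sketch, using the OpenRDF SPARQLRepository class mentioned above; the endpoint URL and the query are assumptions.

import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.sparql.SPARQLRepository;

public class EndpointAccessSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical NanoSparqlServer endpoint URL.
        SPARQLRepository repository = new SPARQLRepository("http://localhost:9999/bigdata/sparql");
        repository.initialize();

        RepositoryConnection connection = repository.getConnection();
        try {
            TupleQueryResult result = connection
                    .prepareTupleQuery(QueryLanguage.SPARQL, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")
                    .evaluate();
            while (result.hasNext()) {
                BindingSet bindings = result.next();
                System.out.println(bindings.getValue("s"));
            }
            result.close();
        } finally {
            connection.close();
        }
    }
}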

5.6 Messaging

The messaging service handles all the events of all workers. If messages are processed faster than they are delivered to the workers, computing resources are not utilized efficiently. Therefore the performance of the messaging service has a great impact on the overall service performance. Because of the clearly defined interfaces of the messaging service, it is possible to swap the component easily. The metaservice-api-messaging component contains all interfaces needed to implement a messaging service. Two different messaging service implementations are discussed in the following sections.



Figure 5.1: Messaging System based on ActiveMQ Virtual Topic

5.6.1 Java Message Service and ActiveMQ Messaging

Java Message Service (JMS) [34] is an established, standardized messaging API for Java, which is also part of the Java Platform, Enterprise Edition (Java EE). Apache ActiveMQ [79] is a clusterable, enterprise-grade, open source JMS implementation. As such, at first sight, ActiveMQ is an excellent candidate to implement the messaging of metaservice.

JMS provides two elementary messaging concepts: topics and queues. Topics provide a message distribution mechanism based on the publish/subscribe model. Message producers can publish messages on a topic, and any message consumer subscribed to the topic will receive the message. Topics are not durable by default, meaning that messages are not kept for later consumption, i.e., only currently subscribed consumers will receive the message. Queues provide message queue semantics: any message sent to the queue by a producer is consumed by exactly one consumer. Multiple consumers can register to a queue, in which case the queue acts like a load balancer. A message is retained in a queue until it is processed or expires. In the default configuration, queues process messages using the first-in first-out principle.

The metaservice messaging system can be modeled as follows. There is a queue for each generator type, where the corresponding workers are registered to fetch their messages/tasks. A topic receives all messages per message type. For each generator type there is a listener, which receives the messages from the topic and puts them on the generator type's queue. Virtual topics are an ActiveMQ-specific feature that dynamically routes messages from a topic to queues based on name pattern matching. They mimic exactly the pattern that is needed for metaservice messaging, without the need for an external listener that routes the messages; a sketch of this setup follows below. Figure 5.1 shows the message flow using the ActiveMQ implementation with virtual topics.

This setup worked, but soon showed itself to be a bottleneck of the system. ActiveMQ does not handle very large queues in combination with fast producers and slow consumers well. Both the KahaDB and the better performing LevelDB backends created extensive IO load on journals several gigabytes in size. The extensive IO load was partially caused by the redundant storage of each message on the individual queues. Independent of the persistence backend, some queues stopped working under high load, which is a typical symptom of ActiveMQ flow control.
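A minimal sketch of the virtual topic pattern, assuming ActiveMQ's default virtual topic naming convention and a broker on localhost; the topic and consumer queue names are illustrative, not the ones used by metaservice:

import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

public class VirtualTopicSketch {
    public static void main(String[] args) throws Exception {
        Connection connection =
                new ActiveMQConnectionFactory("tcp://localhost:61616").createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Each generator type consumes from its own derived queue; ActiveMQ
        // copies every message published on the virtual topic into all
        // matching consumer queues.
        MessageConsumer providerWorker = session.createConsumer(
                session.createQueue("Consumer.Provider.VirtualTopic.Refresh"));

        // Producers publish to the virtual topic only.
        MessageProducer producer = session.createProducer(
                session.createTopic("VirtualTopic.Refresh"));
        producer.send(session.createTextMessage("resource changed"));

        TextMessage task = (TextMessage) providerWorker.receive(1000);
        System.out.println(task == null ? "no message" : task.getText());
        connection.close();
    }
}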


[Figure 5.2: Custom Messaging System. All generator types share a single queue; one load balancer per generator type distributes the messages to the individual workers.]

Disabling flow control did not stop the symptom from appearing, and neither did configuration changes or switching releases. After the performance problems with ActiveMQ, other JMS messaging servers were considered. None of them promised significant improvements over ActiveMQ, as all implementations use similar approaches.

5.6.2 Custom Messaging

Although clearly not the first choice, a custom messaging service was crafted specifically for the metaservice messaging requirements. Figure 5.2 shows the message flow in the custom service. The main improvement is that messages are no longer duplicated: all generator types share a single queue, on which each type's current position is maintained by a cursor. A load balancer then distributes the load of each cursor onto the individual workers. The different generator types can therefore still consume messages at varying speeds. As soon as the last cursor has processed a message, it may be removed from the queue. This is currently only done on system stop, so that newly added modules can process existing messages without the need to generate them again. Because of the known performance problems, the actual implementation was also designed to be as lightweight as possible. The queue is implemented as a MongoDB [120] collection, which stores messages in JSON format. MongoJack [121], and thereby Jackson [111], are used to transform the messages between Java objects and MongoDB. Between the messaging service and the workers, the messages are serialized using Kryo [115], a fast serialization library for Java objects, and sent through KryoNet [116], a networking library built upon Kryo. According to their project descriptions, both are designed for speed and efficiency. After some initial problems related to buffer sizes, the custom messaging service delivers higher throughput using fewer resources, at the cost of some of the reliability features of the ActiveMQ solution.
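The cursor concept can be sketched in plain Java as follows; the class and method names are hypothetical, and the actual implementation persists both the queue and the cursors in MongoDB:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedQueueSketch {
    private final List<String> queue = new ArrayList<String>();                   // one shared message queue
    private final Map<String, Integer> cursors = new HashMap<String, Integer>();  // position per generator type

    public synchronized void publish(String message) {
        queue.add(message);
    }

    /** Returns the next message for a generator type, or null if it is caught up. */
    public synchronized String poll(String generatorType) {
        int position = cursors.containsKey(generatorType) ? cursors.get(generatorType) : 0;
        if (position >= queue.size()) {
            return null;
        }
        cursors.put(generatorType, position + 1);
        return queue.get(position); // a load balancer would hand this to one worker
    }
}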


5.7 Frontend

The frontend of metaservice is responsible for the semantic web frontend and the web user interface. It is the only component that is accessible from outside the service.

5.7.1 Semantic Web Frontend

Every URI hosted on metaservice is dereferenceable. Apache httpd redirects are used to rewrite and proxy requests to a Java Representational State Transfer (REST) application. The REST frontend itself only transforms the requests into SPARQL. The data is provided flattened at the latest possible validity time, and statements are output up to a distance of three resources, beginning from the requested resource. The corresponding query is shown in Appendix B, Listing B.8; this version uses the latest caching optimization to speed up requests. Depending on the MIME type in the HTTP Accept header, RDF/XML, Turtle, or JSON-LD is output by the REST frontend. For advanced clients, unflattened data containing observations and provenance information is available on an alternative URI. Currently there is no result caching implemented; it should definitely be added to lower the database load. The cache could easily be populated and invalidated whenever observations used in the cached content change.
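Content negotiation of this kind could look roughly like the following JAX-RS sketch; the path, method names, and the delegation to the SPARQL endpoint are assumptions, not the actual metaservice code.

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;

@Path("/resource/{id}")
public class ResourceEndpointSketch {

    // JAX-RS picks the representation matching the HTTP Accept header;
    // a real implementation would run the flattening SPARQL query and
    // serialize the result in the negotiated RDF format.
    @GET
    @Produces({ "application/rdf+xml", "text/turtle", "application/ld+json" })
    public String get(@PathParam("id") String id) {
        return queryAndSerialize(id); // hypothetical helper
    }

    private String queryAndSerialize(String id) {
        return ""; // placeholder for the SPARQL-backed rendering
    }
}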

5.7.2 User Interface

The user interface builds upon the unflattened semantic web frontend data. Handlebars [107] template views, which are provided by the modules, are dynamically loaded according to the type of the viewed resource. The frontend is a single-page HTML application, which is returned for every available resource. It then starts a request to fetch the full resource data. The JSON-LD.js library [113] and custom code are used to flatten the JSON-LD response and transform it into a simple JavaScript graph. The flattening process does not allow blank nodes, because blank node matching is hard [72]. Special care has to be taken with loops in the graph. The resulting tree/graph is then given to the Handlebars processor, which can thereby access all properties. Helper functions were created to handle RDF-specific data storage, like language tags and the handling of plain text versus xsd:string text. As Handlebars templates can also be rendered on the server side, an optional HTTP cache with ready-to-display pages could speed up the frontend. Bootstrap [89] is used as a styling and layout library to achieve a consistent look and feel. It provides ready-to-use, modern HTML classes and a responsive layout, and supports all major mobile and desktop browsers.

5.8 Temporal SPARQL Queries

Several steps are required to implement the automatic translation of the temporal SPARQL query concept from Section 4.5. First, the modules require an API to construct and translate the queries. Furthermore, the SPARQL standard does not support CONSTRUCT queries that create quad results, which is required for


final Variable a = new Variable("a"),
        title = new Variable("title"),
        alt = new Variable("alt");
new DefaultSparqlQuery() {
    public String build() {
        return
            select(DISTINCT, var(a), var(title))
            .where(
                triplePattern(a, RDF.TYPE, ADMSSW.SOFTWARE_PROJECT),
                triplePattern(a, SKOS.ALT_LABEL, alt),
                union(
                    graphPattern(
                        triplePattern(a, DC.TITLE, title)
                    ),
                    graphPattern(
                        triplePattern(a, RDFS.LABEL, title)
                    ),
                    graphPattern(
                        triplePattern(a, SKOS.PREF_LABEL, title)
                    )
                ),
                filter(unequal(val(alt), val(title)))
            )
            .build();
    }
};

Listing 5.1: Query Building in Java

proper support of the metaservice data model. Therefore bigdata needs to be patched to allow quad CONSTRUCT queries. Finally, the resulting translated queries are so complex that they need to be optimized to meet the performance requirements of the service.

5.8.1 Query Building and Translation

The API contains a builder class to create SPARQL queries in Java. It is statically typed and therefore effectively prevents typo-based failures. It is not a traditional builder, because those have limitations with complex structures like trees. The same result could have been achieved with static imports, with the implication that the methods would be accessible in the whole class. Vocabularies and ontologies are automatically translated into Java classes, which provide static access to the corresponding URIs; thereby typing errors in common vocabulary are also mitigated. The builder stores queries directly as an internal representation, which in turn can be used directly for translation. The code in Listing 5.1 generates the SPARQL query in Listing 5.2. A complete translation of the query to the metaservice data model, including the optimizations described in Section 5.8, is listed in Appendix B, Listing B.9.


SELECT DISTINCT
  ?a
  ?title
WHERE {
  ?a a <admssw:SoftwareProject> .
  ?a <skos:altLabel> ?alt .
  {
    ?a <dc:title> ?title .
  } UNION {
    ?a <rdfs:label> ?title .
  } UNION {
    ?a <skos:prefLabel> ?title .
  }
  FILTER( ?alt != ?title ) .
}

Listing 5.2: Output SPARQL Query of Listing 5.1

5.8.2 Quad Support for SPARQL CONSTRUCT Queries

SPARQL 1.1 introduced quad support, also known as named graph support. Through it, one can easily query, update, and insert graphs into knowledgebases [64, 57]. However, it is not possible to use CONSTRUCT to export quads or multiple graphs. The only way to achieve this using standard SPARQL is to use a SELECT query and reassemble the graphs afterwards, outside of the query. Bigdata does not provide an alternative way to output multiple RDF graphs either. Since bigdata is open source, quad support for CONSTRUCT queries can be patched in. The changes required to implement this feature in bigdata are as follows:

• Modify the SPARQL language grammar to allow a quad construct mode.

• Modify the query and optimization engine to correctly handle the new grammar.

• Modify the "distinct" semantics, which is applied prior to the assembling of the result, such that statements from different named graphs are not discarded because of equality of the subject, object, and predicate.

• Add a serializer for an RDF format that supports quads, e.g., JSON-LD.

To minimize interference with existing queries, the newly created quad construct mode needs to be explicitly enabled through the _QUADMODE_ keyword, as can be seen in Listing 5.3. In this query, the statements ?s ?p ?o are output into the corresponding named graph ?c, and the statements ?s1 ?p1 ?o1 are output into the default graph.


CONSTRUCT {
  _QUADMODE_
  GRAPH ?c { ?s ?p ?o } .
  ?s1 ?p1 ?o1 .
}
WHERE {
  ...
}

Listing 5.3: SPARQL Quad CONSTRUCT Query

5.8.3 Query Optimization Techniques

Automatically translated temporal queries are substantially longer and more complex than the original ones. Additionally, the amount of data to be processed is drastically higher, because not only one but many temporal states are stored and need to be considered. Therefore the needed processing power, and hence the computation time, increases. In first experiments, queries that in their untranslated form could be executed in less than 10 ms did not terminate at all, because of connection timeouts after hours. The optimizer could not create a performant execution plan based on the complex query. Therefore several measures were taken to improve the performance.

A Heuristic for Temporal Queries

One major reason for the bad performance of the translations is that each statement pattern is evaluated on its own, without taking its context into account. Consider a query with the patterns <...> <...> ?s . and ?s <dc:title> ?o ., and assume that these patterns match only one result. Usually the optimizer is able to resolve the lookup order such that ?s is bound by the first pattern, and the title is looked up only afterwards, when only one match remains. With a properly indexed database this needs only two index lookups and is therefore very fast. However, when the patterns are split up, as they are in the temporal translation to determine the validity, this context information is not available any more. The second pattern, without its context, matches every title statement in the complete database. The temporal validity is then calculated for every match, which needs considerable resources. Only when this whole calculation is done are all valid matching statements joined with each other, such that only one result is left. One can easily see that although the translated query is correct, naive evaluation is not practicable.

What is essentially needed to reduce the number of possible bindings for the variables in the patterns is to evaluate the context information prior to the calculation of the temporal validity. It is possible to first execute the query without temporal evaluation, to retrieve possible matching bindings regardless of time. This is the untranslated version of the query, which can be easily optimized and computed very fast. The result of the time-agnostic query is then used to provide limited bindings to the patterns when the temporal validity is tested. This greatly reduces the number of statements which have to be looked up.


[Figure 5.3: Dense RDF Graph. All edges are ex:next: ex:start links to ex:a1, ex:a2, and ex:a3; every a-node links to every b-node (ex:b1 to ex:b3); every b-node links to every c-node (ex:c1 to ex:c3); and every c-node links to ex:end.]

SELECT
  ?result
WHERE {
  <ex:start> <ex:next> ?a .
  ?a <ex:next> ?b .
  ?b <ex:next> ?c .
  ?c <ex:next> ?result .
}

Listing 5.4: SPARQL Query on Dense RDF Graph

Special care has to be taken for optional variables in the heuristic, as they may be unbound. Patterns using these unbound variables should not be considered at all, because they were not considered in the heuristic either. If unbound variables are nevertheless used to calculate validity, all possible matches in the database are unnecessarily considered again, which is a waste of time. Therefore unbound variables need to be filtered out, using the bound() function, prior to the calculation of the temporal validity. The heuristic optimization is usable for both the provider and the postprocessor data model. Queries for which there is no solution terminate as fast as the nontemporal version, because the selection of variable bindings is only done after checking that a solution exists for the binding.
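A rough sketch of this two-phase evaluation, under assumed helper names: the bindings found by the fast time-agnostic query are injected into the translated temporal query as a SPARQL VALUES block, so that the temporal validity is only computed for candidate bindings. The actual implementation works on the builder's internal query representation rather than on query strings.

import java.util.List;

public class TwoPhaseEvaluationSketch {

    /** Renders a block like: VALUES (?a) { (<http://...>) (<http://...>) } */
    static String valuesBlock(String variable, List<String> candidateUris) {
        StringBuilder block = new StringBuilder("VALUES (?").append(variable).append(") {");
        for (String uri : candidateUris) {
            block.append(" (<").append(uri).append(">)");
        }
        return block.append(" }").toString();
    }

    /**
     * Restricts the translated temporal query to the bindings retrieved by
     * first running the untranslated, time-agnostic query.
     */
    static String restrict(String translatedQuery, String variable, List<String> candidateUris) {
        // Hypothetical injection point: directly after the outermost WHERE {.
        return translatedQuery.replaceFirst("WHERE \\{",
                "WHERE { " + valuesBlock(variable, candidateUris));
    }
}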

Pruning of Duplicates Before Joins

Consider the query in Listing 5.4 and the RDF graph in Figure 5.3. The result set of the query on this graph is clear: ?result is bound to ex:end. What is interesting is the number of tuples returned, which is 27. The reason for this result is the evaluation semantics of SPARQL: although only ?result is selected, all possible paths from ex:start to ex:end are expanded. The tuple count does not have any significant size in this case, but it does in translated temporal metaservice queries. Especially in the postprocessor data model, observations with many authoritative subjects may easily lead to billions of distinct paths. A good database may optimize the index lookups, but still has to store all possible paths, which consumes memory and processing resources. If only the last node of the different paths is important, but not the way there, the tuple count can be reduced drastically through intermediate DISTINCT operations. In the context of the example, this


SELECT DISTINCT
  ?result
WHERE {
  { SELECT DISTINCT ?c WHERE {
    { SELECT DISTINCT ?b WHERE {
      <ex:start> <ex:next> ?a .
      ?a <ex:next> ?b .
    }}
    ?b <ex:next> ?c .
  }}
  ?c <ex:next> ?result .
}

Listing 5.5: Optimized SPARQL Query on Dense RDF Graph

means that once the paths from the a-nodes are expanded to the b-nodes, all distinct b-nodes are selected and the a-nodes are discarded. If this principle is applied to the whole query, exactly one result is returned and intermediate results contain at most three nodes. The optimized version of the query from Listing 5.4 can be seen in Listing 5.5.

Join Order Optimization by Usage of Known Cardinalities

Join order has a great impact on query performance. Optimizers are used to determine the best join order, such that the number of tuples that need to be calculated is minimized. This usually works very well. The statistics needed for the determination of the join order can be deduced from the indices. For a simple example, consider the pattern ?a <ex:next> ?b . together with a second pattern ?b <...> ?c . that uses a different predicate. There are 10 k statements with <ex:next> and 5 k statements with the other predicate. Additionally, we know that there are thirty possible different bindings for ?a. An optimizer chooses to evaluate the second pattern first, because it has fewer possible matches, to reduce the overall number of joins. What the optimizer does not know is that there may be at most one element next to another. This information can be part of an OWL ontology through cardinality constraints, but at least bigdata does not process it. With this information, the optimizer could evaluate <ex:next> first and get to a result significantly faster.

Unknown predicate-specific cardinalities are not the only blind spot of the bigdata optimizer. Subqueries and unions cannot be estimated accurately enough: their upper bound is the cross join of the contained patterns, which is absurdly high, and therefore they are usually put last in queries. Bigdata has runtime query optimization, where it starts the query evaluation and repeatedly restarts it, optimized according to the additionally gathered join counts. This query optimizer works quite well, but struggles when there are too many possible permutations of the join order, as in translated temporal queries. Bigdata also allows turning the optimizer off, in which case the order in which the query is written is used as the join order. This is a viable option for the automatically translated temporal queries, because their cardinalities are known very well. All the different optimization techniques proposed here are quite useless when the join order is incorrect.



6 Evaluation

The three contributions of the thesis are evaluated by demonstrating the functionality of the metaservice prototype. Different use cases, which require a working system to execute correctly, are implemented. First, the applicability of semantic web technologies and the SWREL ontology is shown through the integration of two ecosystems in Sections 6.1 and 6.2. Then the implementation of two use cases, which were defined in Section 3.2, is described in Sections 6.3 and 6.4 to demonstrate the functionality of the architecture and the temporal query models. Section 6.5 documents the runtime environment used to perform the evaluation. The results of the evaluation are discussed in Section 6.6.

6.1 Integration of the Debian Ecosystem

The structure and semantics of Debian packages are described in the Debian Policy Manual [42]. Relevant information for understanding Debian package metadata can be found in Chapter 5, "Control files and their fields", and Chapter 7, "Declaring relationships between packages". These definitions are valid for all Debian-based distributions, including Ubuntu. Metadata in Debian packages is stored in so-called control files. Besides control files, there are also other metadata files, like changelogs or licensing information; those are not analyzed in this thesis. There is a distinction between source packages and binary packages. Source packages are effectively upstream source code with packaging information, like control files, and distribution-specific patches. Binary packages are automatically built from source packages. Debian repositories provide index files, which contain the concatenated metadata of the packages. These index files are downloaded by Debian Packaging Tool (DPKG) frontends, like the Advanced Packaging Tool (APT), which support remote package repositories. Package managers can resolve the dependencies of the packages based on different criteria using the metadata of the index, and then install packages on demand.

The integration of Debian repositories is implemented in the metaservice-core-deb module. A crawler is provided which can automatically detect the repositories of the different distribution releases and architectures. It was tested on both Debian and Ubuntu repositories. A very useful source of data is the Debian Snapshot Archive [93]. It contains daily snapshots of the Debian distribution repositories since March 2005. These snapshots have been imported into a metaservice archive and form the biggest dataset used yet; the over 8,000 compressed and deduplicated snapshots of the repository indices total over 14 GB.

After the repository indices are parsed by the metaservice module, they are translated as literally as possible to the Metaservice Debian Ontology, which contains the same semantics as the source. The ontology contains subproperty relations to ADMS.SW and SWREL. A part of the mapping can be seen in Table 6.1. The subproperty relationship swrel:dependsSoftware was omitted from the listed mappings, to save space in the table.

Debian Field: Metaservice Debian Ontology URI; Mapping in Metaservice Ontologies

General Metadata:
Package: deb:packageName; doap:name
Source: deb:source; doap:name or swrel:source
Description: deb:description; doap:description, dc:description
Size: deb:size; schema:fileSize
Architecture: deb:architecture; (no equivalent)
Version: deb:version; doap:revision
Homepage: deb:homepage; doap:homepage
MD5sum: deb:md5sum; spdx:checksum
SHA1: deb:sha1sum; spdx:checksum
SHA256: deb:sha256sum; spdx:checksum
Maintainer: deb:maintainer; doap:maintainer
Uploaders: deb:uploader; doap:helper

Relationships:
Provides: deb:provides; swrel:implements
Replaces: deb:replaces; swrel:relatedSoftware
Built-Using: deb:builtUsing; swrel:relatedSoftware

Dependencies:
Depends: deb:depends; swrel:dependsRuntime, swrel:requires
Pre-Depends: deb:preDepends; swrel:dependsRuntime, swrel:requires
Recommends: deb:recommends; swrel:dependsRuntime, swrel:optional
Suggests: deb:suggests; swrel:dependsRuntime, swrel:optional
Breaks: deb:breaks; swrel:antiDepends, swrel:dependsRuntime, swrel:requires
Conflicts: deb:conflicts; swrel:antiDepends, swrel:dependsRuntime, swrel:requires
Build-Depends: deb:buildDepends; swrel:dependsBuild, swrel:requires
Build-Depends-Indep: deb:buildDependsIndep; swrel:dependsBuild, swrel:requires
Build-Conflicts: deb:buildConflicts; swrel:antiDepends, swrel:dependsBuild, swrel:requires
Build-Conflicts-Indep: deb:buildConflictsIndep; swrel:antiDepends, swrel:dependsBuild, swrel:requires

Table 6.1: Mapping of Debian Control Fields to the Metaservice Debian Ontology and to Metaservice Ontologies

Not all attributes from the control files could be mapped to the abstract ontologies; e.g., Architecture has no equivalent in ADMS.SW or SWREL. An interesting aspect of Debian packages is the quite complicated ordering scheme of revision names. A revision name may contain three parts: the epoch, the upstream version, and the Debian revision. The elements are compared one after another. Digits are interpreted and sorted as numbers, letters alphabetically, and symbols by special conventions; a sketch of such a comparison follows below. A postprocessor for metaservice was implemented which aggregates the releases and packages of a project and applies xhv:next and xhv:prev ordering to them. This cannot be done in the provider, because providers always process only one package at a time. The temporal semantics of metaservice lead to the calculation of different total orderings of packages for different points in time. Special view templates are provided for the web interface rendering of the Debian classes. They use familiar Debian package terminology to make the metadata easier to comprehend.
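A simplified sketch of this ordering, covering only the alternation of non-digit and digit parts within the upstream version and deliberately ignoring the epoch, the Debian revision, and the special "~" convention:

public class DebianVersionSketch {
    /** Compares two upstream version strings by alternating non-digit and digit parts. */
    static int compareUpstream(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() || j < b.length()) {
            // Compare the non-digit prefixes character by character.
            while (i < a.length() || j < b.length()) {
                char ca = i < a.length() ? a.charAt(i) : 0;
                char cb = j < b.length() ? b.charAt(j) : 0;
                if (Character.isDigit(ca) && Character.isDigit(cb)) break;
                if (ca != cb) return ca - cb;
                if (ca == 0) return 0; // both strings exhausted
                i++; j++;
            }
            // Compare the following digit runs as whole numbers.
            long na = 0, nb = 0;
            while (i < a.length() && Character.isDigit(a.charAt(i))) na = na * 10 + (a.charAt(i++) - '0');
            while (j < b.length() && Character.isDigit(b.charAt(j))) nb = nb * 10 + (b.charAt(j++) - '0');
            if (na != nb) return na < nb ? -1 : 1;
        }
        return 0;
    }
}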

6.2 Integration of the Maven Ecosystem

Maven is integrated through the metaservice-core-maven module. The first step of the integration is the retrieval of the raw data from source repositories. Although several different Maven repositories exist, Maven Central [118] is the single most important one. However, Maven Central does not allow mirroring or crawling, which is a problem for data retrieval. Indices, which contain all packages and could therefore be used for the detection of repository changes, are not updated regularly. Clients access the repository by artifact name; there is no need for an up-to-date index except for the autocompletion of dependencies in Integrated Development Environments (IDEs). Luckily ibiblio [109], a public digital library and archive, is one of the few authorized mirrors and can be used for crawling.

The parsing of the POM should be straightforward because of its XML format, but actually it is not. Maven calculates an effective POM and does not use the application POM directly. Besides the POM of the artifact itself, the whole hierarchy of parent POMs needs to be loaded to calculate the effective POM. Since fetching and iterative merging is not a trivial task, the parser of the Maven project itself is reused. Although that parser is rather undocumented and tightly coupled with the rest of Maven, it is still the safest way to interpret POM files like Maven would.

Once the artifact metadata is parsed, the content needs to be mapped to RDF statements. As expected, an ontology needs to be created for Maven. The general attributes, like project name, description, and license, are not problematic, but dependencies are encoded in a complicated way. Unlike Debian, where only specific types of dependencies are possible, Maven uses configurable dependencies; hence a direct translation of properties is not possible. Therefore the module provides only a few frequently used combinations of dependency options as concrete properties in the ontology. All other options are handled by the provider and are expressed directly using the SWREL vocabulary. Because of the case-dependent mapping in the module, Table 6.2 cannot give explicit mappings, but instead only shows related concepts of Maven and metaservice. The only thing that could not be mapped are the transitivity features of the different types of dependencies. Transitivity of dependencies means that when resolving the dependencies, the dependencies of the dependencies need to be resolved recursively. Maven allows different ways to handle the transitivity of


name: doap:name
description: doap:description, dc:description
parent: swrel:relatedSoftware
modules: swrel:relatedSoftware
url: doap:homepage
license: doap:license
organization: doap:vendor, foaf:Organization
developer: doap:developer, foaf:Person
contributor: doap:helper, foaf:Person
issueManagement: doap:bug-database
ciManagement: (no equivalent)
mailingLists: doap:mailing-list
scm: doap:Repository
dependency: swrel:dependsSoftware
dependency/scope[compile]: swrel:dependsBuild, swrel:dependsRuntime
dependency/scope[provided,system]: swrel:dependsBuild
dependency/scope[runtime]: swrel:dependsRuntime
dependency/scope[test]: swrel:dependsTest
dependency/optional: swrel:optional, swrel:requires
repository: admssw:SoftwareRepository
distributionManagement: admssw:SoftwareRepository
plugin: swrel:depends

Table 6.2: Mapping of Maven Concepts to Metaservice Concepts

dependencies, as in the provided and system scopes of dependencies. SWREL does not contain vocabulary to express this. Transitivity exclusions on dependencies, which allow selectively choosing which transitive dependencies are resolved, consequently also cannot be handled correctly in the metaservice module. Maven in general has several properties which make a static and reproducible extraction of dependency information impossible. Things like system variables can influence the interpretation of the metadata drastically; for example, they can be used to enable or disable profiles, which may lead to a completely different effective POM. Therefore metaservice cannot interpret the impact of profiles on the project structure. Good attributes of Maven are its imposed rules for namespaces, the required full qualification of dependencies, and the immutability of packages after deployment. All this together makes it possible to identify packages by name very reliably, which is used in metaservice to implement direct, instead of indirect, referencing of dependencies.


6.3 Implementation of License Conflict Discovery

The use case of Section 3.2.4 is about the detection of conflicts between software licenses. For the evaluation, the detection of copyleft violations is implemented and presented using the example of the GPL [102].

6.3.1 Detection of Software Licenses

Detection of software licenses in source code has already been discussed several times. One of the software solutions that supports license detection is FOSSology [98] with its license detection tool Nomos. Nomos uses license signatures to match licenses to source files; FOSSology is used to initiate the scan and subsequently stores the result of Nomos. Gobeille [28] gives a more detailed introduction to how FOSSology is structured and how it works. Prior to Nomos, license detection was mainly based on regular expressions, as implemented in Black Duck Software's ohcount [126]. Ninka [124] was introduced by German, Manabe, and Inoue [27] as an alternative to Nomos. It uses more advanced parsing, filtering, and matching techniques to improve the detection rate compared to the other alternatives. Fossology-Ninka [99] is a tool which wraps both Nomos and Ninka to generate RDF descriptions of software packages based on the SPDX ontology. Tests in the scope of this thesis did provide the expected results: software licenses were successfully detected, but not reliably for all test cases. Further advances in source code mining are needed to improve the detection rates of the tools, but this is outside the scope of this thesis. To evaluate metaservice, manually chosen licenses were therefore used instead of automated detection.

6.3.2 Integration of Linked Data Software Descriptions

The examples used for the demonstration were written directly in RDF using ADMS.SW. Although metaservice itself uses RDF to represent its data, it does not integrate external RDF resources out of the box. Therefore the metaservice-core-ld module was implemented to integrate external linked data repositories. It contains parsers for multiple RDF formats and stores information in observations, just like every other module. One aspect of importing existing linked data is that resources arrive with existing URIs which are not in the metaservice namespace. Metaservice needs locally dereferenceable URIs, so that clients can access the resources. Therefore a new URI in the metaservice namespace is generated in the same way as for all other integrations, and the original URI is linked through owl:sameAs. Unlike traditional semantic web data models, the metaservice data model does not infer the implications of owl:sameAs automatically for all statements, but only for the observations in which the statements are included. Hence SPARQL queries need to handle equality of URIs explicitly.


6.3.3 Copyleft Conflict Detection Query

The metaservice-demo-licensecheck module contains a postprocessor which can detect license violations of copyleft licenses. Different licenses and copyleft propagation dependencies can be configured. The query in Listing 6.1 is already configured to match the GPLv2 and detects violations based on the swrel:links dependency. First, the dependencies of the resource in question are selected. Direct dependencies as well as conjunctive and disjunctive dependency statements are supported. To keep the example simple, nesting of the terms is not supported by the query; software ranges are not supported for the same reason, because they would also expand the code significantly. As soon as the dependencies are known, they are matched against the configured license. The second union is used to support owl:sameAs for dependencies. The result set then contains all dependencies that are distributed under the GPLv2. The last statement filters the cases where the license of the resource in question and the license of a dependency are incompatible.

SELECT DISTINCT
  ?resourceLicense
  ?dependency
  ?dependencyLicense
WHERE {
  BIND( <http://spdx.org/licenses/GPL-2.0> AS ?dependencyLicense )
  BIND( <swrel:links> AS ?dependencyProperty )

  ?resource <doap:license> ?resourceLicense .
  {
    ?resource ?dependencyProperty ?dependency .
  } UNION {
    ?resource ?dependencyProperty ?collection .
    ?collection a <swrel:Software> .
    ?collection <rdf:li> ?dependency .
  }
  {
    ?dependency <doap:license> ?dependencyLicense .
  } UNION {
    ?dependency <owl:sameAs> ?dependencySame .
    ?dependencySame <doap:license> ?dependencyLicense .
  }
  FILTER( ?resourceLicense != ?dependencyLicense )
}

Listing 6.1: Copyleft License Conflict Discovery SPARQL Query

The query currently analyzes DOAP licenses, but can easily be extended to support SPDX's declared and detected licenses. The filtering is currently implemented by testing equality of the license resources, but could instead use explicit relationships between the licenses; legal expertise would be required to define such relations. When incompatibilities are found by the postprocessor, the resources are annotated with a warning. This warning is stated as RDF and can therefore be used by the API and the web interface.


6.4 Implementation of Security Report Alerts

The prototype of the security report alert use case, which was introduced in Section 3.2.3, is implemented as a postprocessor. It uses the CPE and CVE repositories of the NIST to report security alerts for the WordPress project.

6.4.1 Integration of CPE and CVE

The NVD repository provides structured information in XML. A collection of XML schemata is provided for the description of the different enumerations of the repository. The integration of CPEs and CVEs is implemented in the metaservice-core-nvd module. The module contains ontologies for CVEs and CPEs, which are based on the NVD XML schemata. The enumerations are regularly exported as XML files by the NVD. These XML files are fetched by the module's crawler, parsed according to the schemata, and then mapped to RDF statements in the provider. Since security reports are not the main topic of metaservice, they were not mapped to an abstract report ontology, but just converted. Special views are provided for the visualization of the resources.

The NVD module also contains postprocessors, which are used to link the CPE resources to their software counterparts. In this case, related CPEs and software releases were joined based only on package names, but other joining methods are also possible. One of the limitations of postprocessors became apparent during the implementation of the linking and required a workaround. A postprocessor which links CPEs and releases needs to be executed when either of them is created, removed, or changed, to be able to update the connection. The created observation of the postprocessor naturally contains the link between the two matching resources; hence both the CPE and the release are authoritative subjects of the observation. Now, a postprocessor must calculate at most one observation for a given resource and must calculate the same result for all authoritative subjects. CPEs and releases need not be in a one-to-one relationship, but may also be in a many-to-many relationship. Observations may therefore become extremely large, and thereby inefficient, because all interconnected relationships end up in the same observation. The workaround is to create two postprocessors, one for each side that may change. This reduces the number of authoritative subjects to one per observation and significantly lowers the impact of changes.

The extremely high density of relationships between CVEs and CPEs acts as a benchmark for the service's query performance. Security bugs are often found in multiple software releases, and every release may contain many security bugs. Consequently, the only three layers deep traversal of the RDF graph from a single security report often contains all available security reports of the affected software project. Although this creates significant strain on the service, metaservice can even handle traversals of projects like Microsoft Internet Explorer, for which many security reports exist.

6.4.2 Integration of WordPress

WordPress [144] is a popular blogging software written in PHP. Security advisories have frequently been reported for WordPress in the past. The CVE repository currently contains 145 reports, which are linked to a:wordpress:wordpress:-:*:*:*:*:*:*:*, the WordPress CPE entry matching all releases. Some of the reports are not exclusively about WordPress, but about combinations of WordPress and its plugins. WordPress has been a popular target because of its widespread usage, its extendibility through plugins and themes, and its often poorly maintained deployments.

Integration of WordPress into metaservice is implemented in the metaservice-demo-wordpress module. The WordPress homepage does not provide any structured, let alone semantic, data on releases. Therefore the module implements a crawler and a parser, which scrape and parse the pages of archived releases on the project's homepage. The data is then translated to RDF statements, which are directly stated using the ADMS.SW vocabulary, without the intermediate step of a dedicated WordPress ontology.
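For illustration, the statements emitted for a single archived release might look roughly as follows, written here as a SPARQL INSERT DATA block even though the provider actually emits the statements programmatically; the resource IRIs, the ADMS.SW namespace, and the exact choice of terms are assumptions of this sketch:

# Hedged sketch: IRIs, the admssw namespace, and term choices are illustrative.
PREFIX admssw: <http://purl.org/adms/sw/>
PREFIX doap:   <http://usefulinc.com/ns/doap#>

INSERT DATA {
  <http://example.org/wordpress/project>
      a admssw:SoftwareProject ;        # the WordPress project itself
      doap:name "WordPress" .

  <http://example.org/wordpress/release/3.9.2>
      a admssw:SoftwareRelease ;        # one archived release
      doap:revision "3.9.2" .
}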

6.4.3 Query Execution and Alert

The actual detection of new security reports and the consequent notification of the user is implemented in the metaservice-demo-securityalert module. The module contains two classes: first, a postprocessor that checks whether the received change message was created by a new security vulnerability; second, a backend class, which stores the user subscription database and is responsible for sending alert notifications and remembering those already sent. The postprocessor executes the query shown in Listing 6.2 and notifies the backend if it returns a result. To find connections between CVEs and software projects, the query first matches related CPEs and then software releases. Only the project and the project name are selected for the notification, since the CVE is bound prior to query execution.

SELECT DISTINCT
  ?project
  ?projectName
WHERE {
  ?cve     <…> ?cpe .
  ?cpe     <…> ?release .
  ?release <…> ?project .
  ?project <…> ?projectName .
}

Listing 6.2: Security Alert SPARQL Query

In the current implementation the module backend sends an alert whenever the query returns a new result which has not been reported before. It does not check whether the report itself is actually new or old, which meant there was no need to wait for an actual new CVE for testing purposes. The users are then alerted by a notification of Notify My Android (NMA) [125]. NMA allows sending notifications to Android smartphones through a REST API. It was chosen because it was easy to integrate, but any other notification technology, such as e-mail, could be used instead. In Appendix C, Figure C.4 shows a screenshot of a successful alert notification. Writing a query that is able to recursively follow dependencies would require SPARQL property paths, which are not available in the subset of SPARQL supported by metaservice. Recursive propagation of security reports can instead be implemented through intermediate marking of affected software as vulnerable, as sketched below. Every marking triggers another postprocessing run, such that longer chains can be processed.
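A sketch of a single propagation step of this marking approach follows; the marker class sec:Vulnerable and the abstract dependency property swrel:dependsOn are assumed, illustrative terms:

# Hedged sketch of one propagation step; all terms are illustrative assumptions.
PREFIX sec:   <http://example.org/security#>
PREFIX swrel: <http://example.org/swrel#>

SELECT DISTINCT ?dependent
WHERE {
  ?dependent swrel:dependsOn ?software .              # direct dependency only
  ?software  a sec:Vulnerable .                       # marker set by a previous run
  FILTER NOT EXISTS { ?dependent a sec:Vulnerable . } # stop at the fixpoint
}

Marking each returned ?dependent as vulnerable triggers the next postprocessing run, so dependency chains of arbitrary length are covered without property paths.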


6.5 Runtime Environment

All tests and part of the development were done on a headless server with a 3.4 GHz Intel® Core™ i7-4770 quad-core CPU with Hyper-Threading (HT), 32 GB of Random Access Memory (RAM), and a 256 GB Solid State Disk (SSD) running Debian Wheezy. Swap space was available, but the Linux kernel's swappiness was set to 0, such that swapping only occurred to save the system from killing processes. Although the components can be distributed across different servers, all processes were started on a single server.

Initially a Hard Disk Drive (HDD) was used instead of an SSD, but bigdata was significantly bound by the slow I/O. Therefore an in-memory Temporary File System (tmpfs) on a RAM disk and SSDs were considered as alternatives. The performance difference between running bigdata on a tmpfs and on the SSD was marginal, but both were significant improvements over the HDD. Therefore the SSD was chosen. With the SSD the whole system changed from I/O bound to CPU and network latency bound. CPU limits are reached during SPARQL queries and during parsing of big files. Network latency is a bottleneck for the messaging system and can be mitigated by sending and processing more messages concurrently.

The OpenJDK Java Runtime Environment (JRE) was used. Experiments showed good performance of the G1 garbage collector for bigdata and of the parallel garbage collector for providers and postprocessors. While bigdata delivers good query performance with 4 GB of heap space, some data management queries require at least 6 GB. Heap sizes for the other processes were harder to determine. Postprocessors require less heap space, even when parallelized. Providers only worked sequentially, because they used up to 6 GB of heap space.

6.6 Discussion

Both the Debian package management system and Maven are widespread, sophisticated systems that are deployed to solve real-world problems. Although the two required very different integration approaches, in the end both could be integrated successfully. Most other software ecosystems apply concepts similar to the two evaluated ones, and most systems have a lower complexity. The SWREL ontology can therefore, with the exception of the definition of transitive relationship properties, be seen as very suitable for the abstraction of software relationships.

The implementation of these two integration modules showed that, besides syntactic and semantic difficulties, other factors indeed play a big role in the integration of software ecosystems. In the case of the Debian ecosystem, the large size of the snapshot repository confirmed that scalability is really necessary for implementations, because there is a lot of data to process and it grows every day. Maven initially posed organizational problems, because crawling of the main repository is prohibited. Although the metadata is provided freely, access was not as easy as it could be. Another limitation for metaservice and similar services is that, although the Maven POM format is very well defined, it needs dynamic interpretation. This improves the flexibility of Maven, but at the same time limits the usefulness of information from static services. This last problem could probably be solved only through static description of the dynamic behavior directly by the developers. Automatic static analysis is a great tool to make use of a lot of data, but it cannot reach the quality of manual description.

The implementation of the licensing conflict detector and the security report alerter showed two things. First, the metaservice design, including the architecture, the temporal data model, and automatic query translation, works and therefore meets the functionality requirements. Second, the assumption holds that abstract information about software interrelationships originating from different software ecosystems enables new ways of processing software relationships. The implementations were demonstrations, which served well as a proof of concept for the service.


7 Conclusion

The conclusion starts with a condensed summary of the thesis in Section 7.1. Then limitations and implications of the work are provided in Sections 7.2 and 7.3, respectively. Last, an outlook for interrelationship services is given in Section 7.4.

7.1 Summary

The goal of this thesis was to design and implement metaservice, a prototype of a software service which provides unified access to generic software interrelationship information. The service should enable usage of structured data about software ecosystems, which is currently only available in distributed and partially incompatible software repositories. An iterative prototype development process was chosen to allow explorative research on possible solution designs.

The semantic differences between relationships in different software technologies and metadata formats showed that simple conversion of the data is not sufficient. Additional analysis of the requirements showed that a scalable and extendible system is needed to cope with the large amount and the different types of available software metadata. The need for semantic mapping of the different metadata formats and the natural graph structure of relationship networks led to the consideration of semantic web technologies. Existing ontologies for software and software interrelationship description were evaluated and found to be unsuitable to express the different kinds of relationships. Therefore SWREL, a dedicated software relationship ontology, was developed. It serves as an abstract common layer to which existing software relationship concepts can be mapped. Through the semantic mapping of the concepts from the different software ecosystems, writing queries across software ecosystem borders becomes possible. SWREL was successfully evaluated through the mapping of the Debian and Maven ecosystems.

For metaservice to be able to answer queries across ecosystem borders, it needs to have actual access to the data. A distributed and extendable architecture was proposed and implemented that is able to gather and transform data from software repositories. It consists of a knowledgebase for statement storage, a frontend for information access, and a sophisticated processing pipeline for data retrieval, mapping, and postprocessing. The knowledgebase is provided by bigdata, an open source RDF database, which was extended for better support of provenance data. In the processing pipeline, work distribution is provided by an event-based messaging system, which notifies the potentially distributed workers to create or update statements in the knowledgebase. A management tool was provided to manage installed modules, acquired knowledge, and the execution of the independent workers.

Early evaluations of the service showed that time and the temporal change of software repositories are critical factors which were overlooked in the initial designs. If information about software in software repositories is removed, added, or changed, references to it may lead to dead ends. Simple flattening of all data while leaving out temporal information leads to inconsistencies. An extension of the service to respect temporal concepts was therefore required. Currently, semantic web technologies have difficulties handling multiple versions of temporal data, as we have shown in Section 2.5.1. The same is true for reasoning on knowledgebases with inconsistencies, which is needed when using data from independent data sources. To solve these problems, the metaservice data model was based on an observation concept, which allows inconsistencies and temporal validity. Automated query translation, which takes care of the different forms of temporal information, was designed and implemented to hide the complexity of the new model. It allows transparent usage of RDF and SPARQL without exposing temporal complexities to developers and users.

The temporally enriched data model leads to long and complicated queries, for which current SPARQL query optimizers, neither in bigdata nor, to the best of our knowledge, in other databases, are suited. Even the simplest queries did not terminate within hours under default query optimization. Different optimization approaches were explored and implemented to encode implicit knowledge about the data structure into the transformed SPARQL queries, such that response times were reduced to milliseconds, or at most seconds. These performance improvements led to a practically usable system.

Based on the service, two simple applications were implemented to evaluate its practical usability. The first application was a detector for potential licensing problems caused by combinations of incompatible licenses. The second application was an alerter, which sent notifications when security reports were published for software. Both applications ran as expected and therefore provide evidence that the service enables queries on software regardless of the source format or origin.

7.2 Limitations

The metaservice prototype and all three contributions of this thesis, the ontology, the architecture, and the translation of temporal queries, are prototypes. They are not ready-to-use products which can be deployed in production systems, because several non-functional requirements were not implemented properly. For metaservice to fulfill the vision of a platform for software and software interrelationships, modules must be written for many software ecosystems. Developers need further tools, like support for automated tests and dedicated testing environments, in addition to proper development documentation. Processes and standards for the acceptance and quality assurance of external modules are not defined yet; neither are security considerations for the service. Although the architecture was designed, and the components were chosen, for distributed execution, actual distribution across more than two computers was neither required by the workload nor possible due to time and resource constraints. This needs to be verified separately. The temporal data model and the automatic translations have been shown to work, but have not been based on a proper formal model, which would be necessary to prove their correctness. The software interrelationship ontology was shown to be able to correctly map two rather complex relationship ontologies. One known limitation is the lack of functionality to describe transitive relationship types. Other missing features may only become apparent when additional ecosystems are integrated.


7.3 Implications

Metaservice solves the two problems of a software interrelationship service: the semantic and the syntactic differences of information in software ecosystems. It provides a common interface to process software interrelationships across ecosystem borders. This interface can be used to help people comprehend software faster and to create tools that support software engineering and maintenance tasks. Additionally, this thesis showed that software interrelationship services need to be able to handle inconsistencies between, and changes of, software repositories. The architecture of metaservice does not only solve these problems theoretically; the proof-of-concept implementation showed that it works in practice. The software relationship ontology is able to correctly represent relationships and dependencies of software, whereas existing public software ontologies cannot express the multitude of existing relationships. Its usage is not limited to metaservice; it can also be used outside of it. The metaservice observation data model provides a novel approach for the storage and querying of temporally scoped RDF statements. In contrast to existing temporal data models, it does not require normalization of observation times to time periods and therefore allows storing information in the way it is perceived. Automatic query translation makes it possible to use this data model on existing RDF databases.

7.4 Outlook

The semantic web is on the rise, as recent applications like the Google Knowledge Graph or projects like LOD have shown. It is essential for the realization of the vision of globally interconnected software engineering tools that software and its interconnections can be processed regardless of their source. Platforms like metaservice, which can be part of the LOD cloud, will probably be necessary to kickstart the adoption of semantic data in the software engineering industry. They can provide a critical mass of information that can be used to demonstrate the value of semantic metadata for software engineering and maintenance processes. Once the semantic web in the software field, and its ontologies, reach maturity, the next step would be to move away from data mapping towards first-class semantic data and therefore a decentralized approach. This thesis pointed out many existing problems, and some possible solutions, on the road to a borderless global software ecosystem. Hopefully metaservice will reach maturity for public usage, such that further experience on this topic can be gathered. Next steps on this path are the setup of a secure and distributed instance of metaservice, the refinement of the module development tools, the initiation of a process for public contributions, and the development of further ecosystem integrations.



Bibliography

References

[1] Joey van Angeren, Vincent Blijleven, and Slinger Jansen. "Relationship intimacy in software ecosystems: a survey of the Dutch software industry". In: Proceedings of the International Conference on Management of Emergent Digital EcoSystems. Ed. by Massimiliano Di Penta and Jonathan I. Maletic. MEDES '11. New York, NY, USA: ACM, 2011, pp. 68–75. ISBN: 978-1-4503-1047-5. DOI: 10.1145/2077489.2077502.
[2] Renzo Angles and Claudio Gutierrez. "Survey of Graph Database Models". In: Computing Surveys 40.1 (Feb. 2008), 1:1–1:39. ISSN: 0360-0300. DOI: 10.1145/1322432.1322433.
[3] Veronika Bauer and Lars Heinemann. "Understanding API Usage to Support Informed Decision Making in Software Maintenance". In: Software Maintenance and Reengineering (CSMR), 2012 16th European Conference on. Ed. by Tom Mens, Anthony Cleve, and Rudolf Ferenc. Mar. 2012, pp. 435–440. DOI: 10.1109/CSMR.2012.55.
[4] Gabriele Bavota et al. "The Evolution of Project Inter-dependencies in a Software Ecosystem: The Case of Apache". In: 2013 IEEE International Conference on Software Maintenance, Eindhoven, The Netherlands. IEEE, Sept. 2013. ISBN: 978-0-7695-4981-1. DOI: 10.1109/icsm.2013.39.
[5] Olivier Berger and Christian Bac. "Authoritative Linked Data Descriptions of Debian Source Packages Using ADMS.SW". In: Open Source Software: Quality Verification. Ed. by Etiel Petrinja et al. Springer, 2013, pp. 168–181. DOI: 10.1007/978-3-642-38928-3_12.
[6] Olivier Berger et al. "Weaving a Semantic Web Across OSS Repositories: Unleashing a New Potential for Academia and Practice". In: International Journal of Open Source Software and Processes 2.2 (2010), pp. 29–40. DOI: 10.4018/jossp.2010040103.
[7] Tim Berners-Lee. Linked Data - Design Issues. 2006. URL: http://www.w3.org/DesignIssues/LinkedData.html (visited on 10/06/2014).
[8] Tim Berners-Lee. What the Semantic Web can represent. 1998. URL: http://www.w3.org/DesignIssues/RDFnot.html (visited on 10/06/2014).
[9] Tim Berners-Lee, Roy Fielding, and Larry Masinter. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986. Internet Engineering Task Force, Jan. 2005. URL: http://www.ietf.org/rfc/rfc3986.txt (visited on 10/06/2014).
[10] Mark Birbeck et al. RDFa Core 1.1 - Second Edition. W3C Recommendation. W3C, Aug. 2013. URL: http://www.w3.org/TR/2013/REC-rdfa-core-20130822/ (visited on 10/06/2014).
[11] Christian Bizer, Tom Heath, and Tim Berners-Lee. "Linked data - the story so far". In: International journal on semantic web and information systems 5 (3 2009), pp. 1–22. DOI: 10.4018/jswis.2009081901.

[12] Michael H. Bohlen, Renato Busatto, and Christian S. Jensen. "Point- versus interval-based temporal data models". In: Data Engineering, 1998. Proceedings., 14th International Conference on. Ed. by Susan Darling Urban and Elisa Bertino. Feb. 1998, pp. 192–200. DOI: 10.1109/ICDE.1998.655777.
[13] Gavin Carothers and Eric Prud'hommeaux. RDF 1.1 Turtle. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-turtle-20140225/ (visited on 10/06/2014).
[14] Jeremy J. Carroll et al. "Named graphs". In: Web Semantics: Science, Services and Agents on the World Wide Web 3.4 (Dec. 2005), pp. 247–267. DOI: 10.1016/j.websem.2005.09.001.
[15] Marcelo Cataldo et al. "Software Dependencies, Work Dependencies, and Their Impact on Failures". In: IEEE Trans. Software Eng. 35.6 (Nov.–Dec. 2009), pp. 864–878. DOI: 10.1109/TSE.2009.42.
[16] Megan Conklin, James Howison, and Kevin Crowston. "Collaboration Using OSSmole: A repository of FLOSS data and analyses". In: ACM SIGSOFT Software Engineering Notes 30.4 (2005), pp. 1–5. DOI: 10.1145/1082983.1083164.
[17] Richard Cyganiak and Anja Jentzsch. Linking Open Data cloud diagram. 2007–2011. URL: http://lod-cloud.net/ (visited on 10/06/2014).
[18] Danica Damljanovic and Kalina Bontcheva. "Enhanced semantic access to software artefacts". In: Workshop on Semantic Web Enabled Software Engineering (SWESE), Karlsruhe, Germany. 2008.
[19] Stefan Dösinger, Richard Mordinyi, and Stefan Biffl. "Communicating continuous integration servers for increasing effectiveness of automated testing". In: Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering. Ed. by Michael Goedicke, Tim Menzies, and Motoshi Saeki. ASE 2012. New York, NY, USA: ACM, 2012, pp. 374–377. ISBN: 978-1-4503-1204-2. DOI: 10.1145/2351676.2351751.
[20] Martin Duerst and Michel Suignard. Internationalized Resource Identifiers (IRIs). RFC 3987. Internet Engineering Task Force, Jan. 2005. URL: http://www.ietf.org/rfc/rfc3987.txt (visited on 10/06/2014).
[21] Robert Dyer et al. "Boa: a language and infrastructure for analyzing ultra-large-scale software repositories". In: Proceedings of the 2013 International Conference on Software Engineering. Ed. by David Notkin, Betty H.C. Cheng, and Klaus Pohl. ICSE '13. IEEE Press, 2013, pp. 422–431. DOI: 10.1109/ICSE.2013.6606588.
[22] Emilio Ferrara et al. "Web Data Extraction, Applications and Techniques: A Survey". In: Computing Research Repository (2012). arXiv: 1207.0246v4 [cs.IR].
[23] José A. Galindo, David Benavides, and Sergio Segura. "Debian Packages Repositories as Software Product Line Models. Towards Automated Analysis". In: Proceedings of the 1st International Workshop on Automated Configuration and Tailoring of Applications (ACoTA). Ed. by Deepak Dhungana et al. CEUR Workshop Proceedings. CEUR-WS.org, Sept. 2010, pp. 29–34.
[24] Fabien Gandon and Guus Schreiber. RDF 1.1 XML Syntax. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-rdf-syntax-grammar-20140225/ (visited on 10/06/2014).


[25] Daniel M. German, Jesus M. Gonzalez-Barahona, and Gregorio Robles. "A model to understand the building and running inter-dependencies of software". In: Reverse Engineering, 2007. WCRE 2007. 14th Working Conference on. Ed. by Massimiliano Di Penta and Jonathan I. Maletic. IEEE. 2007, pp. 140–149. DOI: 10.1109/WCRE.2007.5.
[26] Daniel M. German and Ahmed E. Hassan. "License integration patterns: Addressing license mismatches in component-based development". In: Proceedings of the 31st International Conference on Software Engineering. IEEE, 2009. ISBN: 978-1-4244-3453-4. DOI: 10.1109/icse.2009.5070520.
[27] Daniel M. German, Yuki Manabe, and Katsuro Inoue. "A Sentence-matching Method for Automatic License Identification of Source Code Files". In: Proceedings of the 25th IEEE/ACM International Conference on Automated Software Engineering. Ed. by Charles Pecheur, Jamie Andrews, and Elisabetta Di Nitto. ASE '10. Antwerp, Belgium: ACM, 2010, pp. 437–446. ISBN: 978-1-4503-0116-9. DOI: 10.1145/1858996.1859088.
[28] Robert Gobeille. "The FOSSology project". In: Proceedings of the 2008 international working conference on Mining software repositories. ACM. 2008, pp. 47–50. DOI: 10.1145/1370750.1370763.
[29] Asunción Gómez-Pérez, Mariano Fernandez-Lopez, and Oscar Corcho. Ontological engineering with examples from the areas of knowledge management, e-commerce and the Semantic Web. London, New York: Springer, 2004. ISBN: 978-1-85233-551-9.
[30] ADMS.SW working group. Asset Description Metadata Schema for Software. ISA specification. Version 1.00. ISA programme of the European Commission, June 2012.
[31] Tudor Groza et al. "The NEPOMUK Project - On the way to the Social Semantic Desktop". In: Proceedings of International Conferences on new Media technology (I-MEDIA-2007) and Semantic Systems (I-SEMANTICS-07), Graz, Austria, September 5-7. Ed. by Klaus Tochtermann. 2007, pp. 201–210. URL: http://hdl.handle.net/10419/44447.
[32] Ramanathan Guha and Dan Brickley. RDF Schema 1.1. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-rdf-schema-20140225/ (visited on 10/06/2014).
[33] Claudio Gutierrez, Carlos A. Hurtado, and Alejandro Vaisman. "Introducing Time into RDF". In: Knowledge and Data Engineering, IEEE Transactions on 19.2 (Feb. 2007), pp. 207–218. DOI: 10.1109/TKDE.2007.34.
[34] Mark Hapner et al. Java Message Service. Java Specification Request. Version 1.1. Sun Microsystems, Apr. 12, 2002. URL: https://jcp.org/aboutJava/communityprocess/final/jsr914/index.html (visited on 10/06/2014).
[35] Frank van Harmelen and Deborah McGuinness. OWL Web Ontology Language Overview. W3C Recommendation. W3C, Feb. 2004. URL: http://www.w3.org/TR/2004/REC-owl-features-20040210/ (visited on 10/06/2014).
[36] Olaf Hartig and Bryan Thompson. "Foundations of an Alternative Approach to Reification in RDF". In: Computing Research Repository (2014). arXiv: 1406.3399v1 [cs.DB].
[37] Patrick Hayes and Peter Patel-Schneider. RDF 1.1 Semantics. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-rdf11-mt-20140225/ (visited on 10/06/2014).

[38] Israel Herraiz, Gregorio Robles, and Jesus M. Gonzalez-Barahona. "Research friendly software repositories". In: Proceedings of the joint international and annual ERCIM workshops on Principles of software evolution (IWPSE) and software evolution (Evol) workshops. Ed. by Hans-Gerhard Gross, Marco Lormans, and Jan Tretmans. ACM. 2009, pp. 19–24. DOI: 10.1145/1595808.1595814.
[39] James Howison. "Cross-repository data linking with RDF and OWL - Towards common ontologies for representing FLOSS data". In: Proceedings of the Workshop on Public Data about Software Development. OSS 2008 Workshops Proceedings. Sept. 2008. ISBN: 978-88-903120-1-4.
[40] James Howison, Megan Conklin, and Kevin Crowston. "FLOSSmole: A collaborative repository for FLOSS research data and analyses". In: International Journal of Information Technology and Web Engineering 1.3 (2006), pp. 17–26. DOI: 10.4018/jitwe.2006070102.
[41] ISO/IEC. "Systems and software engineering - System life cycle processes". In: ISO/IEC 15288:2008(E) IEEE Std 15288-2008 (Revision of IEEE Std 15288-2004) (Jan. 2008).
[42] Ian Jackson, Christian Schwarz, et al. Debian Policy Manual. Version 3.9.5.0. Oct. 28, 2013. URL: https://www.debian.org/doc/debian-policy (visited on 10/06/2014).
[43] Slinger Jansen, Anthony Finkelstein, and Sjaak Brinkkemper. "A sense of community: A research agenda for software ecosystems". In: ICSE Companion. May 2009, pp. 187–190. DOI: 10.1109/ICSE-COMPANION.2009.5070978.
[44] Huzefa Kagdi, Michael L. Collard, and Jonathan I. Maletic. "A survey and taxonomy of approaches for mining software repositories in the context of software evolution". In: Journal of Software Maintenance and Evolution: Research and Practice 19.2 (2007), pp. 77–131. ISSN: 1532-0618. DOI: 10.1002/smr.344.
[45] Khaled M. Khan and Jun Han. "Composing security-aware software". In: Software, IEEE 19.1 (Jan. 2002), pp. 34–41. ISSN: 0740-7459. DOI: 10.1109/52.976939.
[46] Christoph Kiefer, Abraham Bernstein, and Jonas Tappolet. "Mining software repositories with iSPARQL and a software evolution ontology". In: Proceedings of the 4th International Workshop on Mining Software Repositories. IEEE Computer Society. 2007, p. 10. DOI: 10.1109/MSR.2007.21.
[47] Atanas Kiryakov, Damyan Ognyanov, and Dimitar Manov. "OWLIM – A Pragmatic Semantic Repository for OWL". In: WISE 2005 Workshops. Ed. by Mike Dean et al. Vol. 3807. LNCS. Springer, 2005, pp. 182–192. DOI: 10.1007/11581116_19.
[48] Markus Lanthaler, Richard Cyganiak, and David Wood. RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation. W3C, Feb. 2014. URL: http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/ (visited on 10/06/2014).
[49] Markus Lanthaler, Manu Sporny, and Gregg Kellogg. JSON-LD 1.0. W3C Recommendation. W3C, Jan. 2014. URL: http://www.w3.org/TR/2014/REC-json-ld-20140116/ (visited on 10/06/2014).
[50] Nikolaos Loutas et al. "Building Cross-Border Public Services in Europe Through Sharing and Reuse of Interoperability Solutions". In: The 14th European Conference on e-Government: ECEG2014. Ed. by Alexandru Ionas. 2014, pp. 170–179.


[51] Mircea Lungu, Romain Robbes, and Michele Lanza. "Recovering inter-project dependencies in software ecosystems". In: Proceedings of the 25th IEEE/ACM international conference on Automated software engineering. Ed. by Charles Pecheur, Jamie Andrews, and Elisabetta Di Nitto. ASE '10. New York, NY, USA: ACM, 2010, pp. 309–312. ISBN: 978-1-4503-0116-9. DOI: 10.1145/1858996.1859058.
[52] James Malone et al. "The Software Ontology (SWO): a resource for reproducibility in biomedical data analysis, curation and digital preservation". In: Journal of Biomedical Semantics 5 (1 June 2, 2014), p. 25. ISSN: 2041-1480. DOI: 10.1186/2041-1480-5-25.
[53] Fabio Mancinelli et al. "Managing the Complexity of Large Free and Open Source Package-Based Software Distributions". In: Automated Software Engineering, 2006. ASE '06. 21st IEEE/ACM International Conference on. Ed. by Sebastian Uchitel and Steve Easterbrook. Sept. 2006, pp. 199–208. DOI: 10.1109/ASE.2006.49.
[54] Antoni Mylka et al. NEPOMUK File Ontology. OSCAF Recommendation. Open Semantic Collaboration Architecture Foundation, Aug. 28, 2013. URL: http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/ (visited on 10/06/2014).
[55] Feng Pan and Jerry Hobbs. Time Ontology in OWL. W3C Working Draft. W3C, Sept. 2006. URL: http://www.w3.org/TR/2006/WD-owl-time-20060927/ (visited on 10/06/2014).
[56] Matthew Perry, Prateek Jain, and Amit P. Sheth. "SPARQL-ST: Extending SPARQL to support spatiotemporal queries". In: Geospatial semantics and the semantic web: Foundations, Algorithms, and Applications. Ed. by Naveen Ashish and Amit P. Sheth. Springer, 2011, pp. 61–86. ISBN: 978-1-4419-9446-2. DOI: 10.1007/978-1-4419-9446-2_3.
[57] Axel Polleres, Paul Gearon, and Alexandre Passant. SPARQL 1.1 Update. W3C Recommendation. W3C, Mar. 2013. URL: http://www.w3.org/TR/2013/REC-sparql11-update-20130321/ (visited on 10/06/2014).
[58] Andrea Pugliese, Octavian Udrea, and V. S. Subrahmanian. "Scaling RDF with time". In: WWW. Ed. by Jinpeng Huai et al. ACM, Apr. 21–25, 2008, pp. 605–614. ISBN: 978-1-60558-085-2. DOI: 10.1145/1367497.1367579.
[59] Steven Raemaekers, Arie van Deursen, and Joost Visser. "The Maven Repository Dataset of Metrics, Changes, and Dependencies". In: Proceedings of the 10th Working Conference on Mining Software Repositories. Ed. by Thomas Zimmermann, Massimiliano Di Penta, and Sunghun Kim. MSR '13. Piscataway, NJ, USA: IEEE Press, 2013, pp. 221–224. ISBN: 978-1-4673-2936-1. DOI: 10.1109/MSR.2013.6624031.
[60] Ralf H. Reussner, Heinz W. Schmidt, and Iman H. Poernomo. "Reliability prediction for component-based software architectures". In: Journal of Systems and Software 66.3 (2003). Software architecture – Engineering quality attributes, pp. 241–252. ISSN: 0164-1212. DOI: 10.1016/S0164-1212(02)00080-8.
[61] Marko A. Rodriguez and Peter Neubauer. "Constructions from dots and lines". In: Bulletin of the American Society for Information Science and Technology 36.6 (Aug.–Sept. 2010), pp. 35–41. ISSN: 1550-8366. DOI: 10.1002/bult.2010.1720360610.
[62] Neeraj Sangal et al. "Using dependency models to manage complex software architecture". In: OOPSLA. Ed. by Ralph E. Johnson and Richard P. Gabriel. 2005, pp. 167–176. DOI: 10.1145/1103845.1094824.

[63] Max Schmachtenberg, Heiko Paulheim, and Christian Bizer. "Adoption of Linked Data Best Practices in Different Topical Domains". In: The Semantic Web – ISWC 2014 - 13th International Semantic Web Conference, Trentino, Italy, October 19-23, 2014, Proceedings. Oct. 2014.
[64] Andy Seaborne and Steven Harris. SPARQL 1.1 Query Language. W3C Recommendation. W3C, Mar. 2013. URL: http://www.w3.org/TR/2013/REC-sparql11-query-20130321/ (visited on 10/06/2014).
[65] Estefanía Serral et al. "Evaluation of semantic data storages for integrating heterogeneous disciplines in automation systems engineering". In: Industrial Electronics Society, IECON 2013 - 39th Annual Conference of the IEEE. 2013, pp. 6858–6865. DOI: 10.1109/IECON.2013.6700268.
[66] Cleidson R.B. de Souza and David F. Redmiles. "An empirical study of software developers' management of dependencies and changes". In: Software Engineering, 2008. ICSE '08. ACM/IEEE 30th International Conference on. May 2008, pp. 241–250. DOI: 10.1145/1368088.1368122.
[67] Kate Stewart, Phil Odence, and Esteban Rockett. "Software Package Data Exchange (SPDX™) Specification". In: International Free and Open Source Software Law Review 2.2 (2011). ISSN: 1877-6922. DOI: 10.5033/ifosslr.v2i2.45.
[68] Jonas Tappolet. "Mining Software Repositories - A Semantic Web Approach". Diploma thesis. University of Zurich, Mar. 13, 2007.
[69] Jonas Tappolet and Abraham Bernstein. "Applied Temporal RDF: Efficient Temporal Querying of RDF Data with SPARQL". In: The Semantic Web: Research and Applications. Ed. by Lora Aroyo et al. Vol. 5554. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2009, pp. 308–322. ISBN: 978-3-642-02120-6. DOI: 10.1007/978-3-642-02121-3_25.
[70] Bryan Thompson. SPARQL Named Subquery Extension. Mar. 2012. URL: http://wiki.bigdata.com/wiki/index.php/NamedSubquery (visited on 10/06/2014).
[71] Bryan Thompson, Mike Personick, and Martyn Cutcher. "The Bigdata® RDF Graph Database". In: Linked Data Management, Emerging Directions in Database Systems and Applications. Ed. by Andreas Harth, Katja Hose, and Ralf Schenkel. Chapman and Hall/CRC, 2014, pp. 193–237. ISBN: 978-1466582408. DOI: 10.1201/b16859-12.
[72] Yannis Tzitzikas, Christina Lantzaki, and Dimitris Zeginis. "Blank Node Matching and RDF/S Comparison Functions". In: The Semantic Web – ISWC 2012. Ed. by Philippe Cudré-Mauroux et al. Vol. 7649. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2012, pp. 591–607. ISBN: 978-3-642-35175-4. DOI: 10.1007/978-3-642-35176-1_37.
[73] W3C OWL Working Group. OWL 2 Web Ontology Language Document Overview. Tech. rep. W3C, Oct. 2009. URL: http://www.w3.org/TR/owl2-overview/ (visited on 10/06/2014).
[74] Michael Würsch et al. "SEON: a pyramid of ontologies for software evolution and its applications". In: Computing 94 (11 2012), pp. 857–885. ISSN: 0010-485X. DOI: 10.1007/s00607-012-0204-1.
[75] Atsuko Yamaguchi et al. "BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data". In: Journal of Biomedical Semantics 5.32 (July 10, 2014). ISSN: 2041-1480. DOI: 10.1186/2041-1480-5-32.
[76] Amir Reza Yazdanshenas and Leon Moonen. "Crossing the boundaries while analyzing heterogeneous component-based software systems". In: 27th IEEE International Conference on Software Maintenance (ICSM 2011). IEEE, Sept. 2011. DOI: 10.1109/icsm.2011.6080786.


[77] Il-Chul Yoon et al. “Direct-dependency-based Software Compatibility Testing”. In: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering. Ed. by R. E. Kurt Stirewalt, Alexander Egyed, and Bernd Fischer. ASE ’07. Atlanta, Georgia, USA: ACM, 2007, pp. 409–412. ISBN: 978-1-59593-882-4. DOI: 10.1145/1321631.1321696.



Online References

[78] Æsh. URL: http://aeshell.github.io (visited on 10/06/2014).
[79] Apache ActiveMQ. URL: http://activemq.apache.org (visited on 10/06/2014).
[80] Apache Buildr. URL: http://buildr.apache.org (visited on 10/06/2014).
[81] Apache Commons. URL: http://commons.apache.org (visited on 10/06/2014).
[82] Apache Hadoop. URL: http://hadoop.apache.org (visited on 10/06/2014).
[83] Apache Ivy. URL: http://ant.apache.org/ivy (visited on 10/06/2014).
[84] Apache Jena. URL: https://jena.apache.org (visited on 10/06/2014).
[85] Apache Maven. URL: http://maven.apache.org (visited on 10/06/2014).
[86] bigdata. URL: http://bigdata.com (visited on 10/06/2014).
[87] Blueprints. URL: http://blueprints.tinkerpop.com (visited on 10/06/2014).
[88] Boa. URL: http://boa.cs.iastate.edu (visited on 10/06/2014).
[89] Bootstrap. URL: http://getbootstrap.com (visited on 10/06/2014).
[90] Comprehensive Perl Archive Network. URL: http://www.cpan.org (visited on 10/06/2014).
[91] Creative Commons – Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0). URL: http://creativecommons.org/licenses/by-sa/3.0 (visited on 10/06/2014).
[92] Debian. URL: https://www.debian.org (visited on 10/06/2014).
[93] Debian Snapshot Archive. URL: http://snapshot.debian.org/ (visited on 10/06/2014).
[94] DistroWatch. URL: http://distrowatch.com (visited on 10/06/2014).
[95] DOAP Project. URL: https://github.com/edumbill/doap/wiki (visited on 10/06/2014).
[96] EvoOnt - A Software Evolution Ontology. URL: www.ifi.uzh.ch/ddis/research/evoont (visited on 10/06/2014).
[97] Flossmole - Collaborative collection and analysis of free/libre/open source project data. URL: http://flossmole.org (visited on 10/06/2014).
[98] FOSSology. URL: http://www.fossology.org (visited on 10/06/2014).
[99] Fossology-Ninka. URL: https://github.com/TheFinks/Fossology-Ninka (visited on 10/06/2014).
[100] Friend of a Friend Ontology. URL: http://www.foaf-project.org (visited on 10/06/2014).
[101] GitHub. URL: http://github.com (visited on 10/06/2014).
[102] GNU General Public License. URL: http://www.gnu.org/copyleft/gpl.html (visited on 10/06/2014).
[103] Google Guava. URL: https://code.google.com/p/guava-libraries (visited on 10/06/2014).
[104] Google Guice. URL: https://code.google.com/p/google-guice (visited on 10/06/2014).
[105] Gradle. URL: http://www.gradle.org (visited on 10/06/2014).
[106] Gremlin. URL: http://gremlin.tinkerpop.com (visited on 10/06/2014).
[107] Handlebars.js. URL: http://handlebarsjs.com (visited on 10/06/2014).

[108] HermiT. URL: http://www.hermit-reasoner.com (visited on 10/06/2014).
[109] ibiblio. URL: http://www.ibiblio.org (visited on 10/06/2014).
[110] Interoperability Solutions for European Public Administrations. URL: http://ec.europa.eu/isa (visited on 10/06/2014).
[111] Jackson JSON Processor. URL: http://jackson.codehaus.org (visited on 10/06/2014).
[112] Joinup. URL: https://joinup.ec.europa.eu (visited on 10/06/2014).
[113] jsonld.js. URL: https://github.com/digitalbazaar/jsonld.js (visited on 10/06/2014).
[114] jsoup Java HTML Parser. URL: http://jsoup.org (visited on 10/06/2014).
[115] Kryo. URL: https://github.com/EsotericSoftware/kryo (visited on 10/06/2014).
[116] KryoNet. URL: https://github.com/EsotericSoftware/kryonet (visited on 10/06/2014).
[117] Logback. URL: http://logback.qos.ch (visited on 10/06/2014).
[118] Maven Central Repository. URL: http://repo.maven.apache.org/maven2 (visited on 10/06/2014).
[119] Modulecounts. URL: http://www.modulecounts.com (visited on 10/06/2014).
[120] MongoDB. URL: http://www.mongodb.org (visited on 10/06/2014).
[121] mongoJack - MongoDB Jackson Mapper. URL: http://mongojack.org (visited on 10/06/2014).
[122] National Vulnerability Database. URL: http://nvd.nist.gov (visited on 10/06/2014).
[123] neo4j. URL: http://www.neo4j.org (visited on 10/06/2014).
[124] Ninka. URL: http://ninka.turingmachine.org (visited on 10/06/2014).
[125] Notify My Android. URL: http://www.notifymyandroid.com (visited on 10/06/2014).
[126] ohcount. URL: https://github.com/blackducksw/ohcount (visited on 10/06/2014).
[127] OpenLink Virtuoso. URL: http://virtuoso.openlinksw.com (visited on 10/06/2014).
[128] OpenRDF. URL: http://www.openrdf.org (visited on 10/06/2014).
[129] OrientDB. URL: http://www.orientechnologies.com/orientdb (visited on 10/06/2014).
[130] OWLIM. URL: http://www.ontotext.com/owlim (visited on 10/06/2014).
[131] Pellet. URL: http://clarkparsia.com/pellet (visited on 10/06/2014).
[132] protégé. URL: http://protege.stanford.edu (visited on 10/06/2014).
[133] PyPI - Python Package Index. URL: https://pypi.python.org/pypi (visited on 10/06/2014).
[134] Quartz Scheduler. URL: http://quartz-scheduler.org (visited on 10/06/2014).
[135] RedHat. URL: www.redhat.com (visited on 10/06/2014).
[136] RubyGems. URL: https://rubygems.org (visited on 10/06/2014).
[137] SEON - Software Evolution Ontologies. URL: http://www.se-on.org (visited on 10/06/2014).
[138] Simple Logging Facade for Java (SLF4J). URL: http://www.slf4j.org (visited on 10/06/2014).
[139] Software Ontology. URL: http://purl.bioontology.org/ontology/SWO (visited on 10/06/2014).
[140] Software Package Data Exchange®. URL: http://spdx.org (visited on 10/06/2014).
[141] SPDX License List. URL: http://spdx.org/licenses (visited on 10/06/2014).


[142] TinkerPop. URL: http://www.tinkerpop.com (visited on 10/06/2014).
[143] Ubuntu. URL: http://www.ubuntu.com (visited on 10/06/2014).
[144] WordPress. URL: http://wordpress.org (visited on 10/06/2014).



Appendix A Linking Open Data Cloud

The following Figures A.1 to A.4 illustrate the growth of linked open data between May 2007 and April 2014.

Figure A.1: Linking Open Data Cloud May 2007 by Cyganiak and Jentzsch [17, 91]

Figure A.2: Linking Open Data Cloud March 2009 by Cyganiak and Jentzsch [17, 91]

Figure A.3: Linking Open Data Cloud September 2011 by Cyganiak and Jentzsch [17, 91]

Figure A.4: Linking Open Data Cloud April 2014 by Schmachtenberg, Paulheim, and Bizer [63]



Appendix B Selected Complete Listings

The following listings are provided unshortened, but partially modified for better readability.

B.1 RDF Serializations

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY ex "http://example.org/">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="&ex;">
  <ex:Human rdf:about="&ex;Peter">
    <ex:likes rdf:resource="&ex;Flowers"/>
  </ex:Human>
  <ex:Human rdf:about="&ex;Anna">
    <ex:likes rdf:resource="&ex;Flowers"/>
  </ex:Human>
  <ex:Human rdf:about="&ex;Hans">
    <ex:likes rdf:resource="&ex;Anna"/>
    <ex:doesntLike rdf:resource="&ex;Flowers"/>
  </ex:Human>
</rdf:RDF>

Listing B.1: RDF/XML Serialization of RDF Graph in Figure 2.1

@prefix ex: <http://example.org/> .

ex:Peter a ex:Human ;
    ex:likes ex:Flowers .
ex:Anna a ex:Human ;
    ex:likes ex:Flowers .
ex:Hans a ex:Human ;
    ex:likes ex:Anna ;
    ex:doesntLike ex:Flowers .

Listing B.2: Turtle Serialization of RDF Graph in Figure 2.1

{
  "@context": {
    "ex": "http://example.org/"
  },
  "@graph": [
    {
      "@id": "ex:Anna",
      "@type": "ex:Human",
      "ex:likes": {
        "@id": "ex:Flowers"
      }
    },
    {
      "@id": "ex:Hans",
      "@type": "ex:Human",
      "ex:doesntLike": {
        "@id": "ex:Flowers"
      },
      "ex:likes": {
        "@id": "ex:Anna"
      }
    },
    {
      "@id": "ex:Peter",
      "@type": "ex:Human",
      "ex:likes": {
        "@id": "ex:Flowers"
      }
    }
  ]
}

Listing B.3: JSON-LD Serialization of RDF Graph in Figure 2.1

<div xmlns:ex="http://example.org/">
  <div about="[ex:Peter]" typeof="ex:Human">
    <div rel="ex:likes" resource="[ex:Flowers]"></div>
  </div>
  <div about="[ex:Anna]" typeof="ex:Human">
    <div rel="ex:likes" resource="[ex:Flowers]"></div>
  </div>
  <div about="[ex:Hans]" typeof="ex:Human">
    <div rel="ex:likes" resource="[ex:Anna]"></div>
    <div rel="ex:doesntLike" resource="[ex:Flowers]"></div>
  </div>
</div>

Listing B.4: RDFa Serialization of RDF Graph in Figure 2.1

B.2 SPARQL Queries

SELECT
  *
WHERE {
  <…> <…> ?o .
  ?o ?p ?o2 .
}@"2014-01-01T00:00:00Z"^^xsd:dateTime

Listing B.5: Temporal Constraint on a Statement Group SPARQL@T Query

SELECT
  *
WHERE {
  GRAPH ?c { <…> <…> ?o } .
  ?c <…> ?validFrom ;
     <…> ?validTo .
  FILTER (
    ?validFrom < "2014-01-01T00:00:00Z"^^xsd:dateTime &&
    ?validTo > "2014-01-01T00:00:00Z"^^xsd:dateTime
  )
}

Listing B.6: Translation of Listing 4.1 to SPARQL on a valid-from and -to data model

SELECT
  *
WHERE {
  GRAPH ?c {
    <…> <…> ?o .
    ?o ?p ?o2 .
  }
  ?c <…> ?validFrom ;
     <…> ?validTo .
  FILTER (
    ?validFrom < "2014-01-01T00:00:00Z"^^xsd:dateTime &&
    ?validTo > "2014-01-01T00:00:00Z"^^xsd:dateTime
  )
}

Listing B.7: Translation of Listing B.5 to SPARQL on a valid-from and -to data model

@PREFIX ms: <…> .

CONSTRUCT { _QUADMODE_
  GRAPH ?c { ?s ?p ?o }.
}

WITH {
  SELECT DISTINCT ?c ?s ?p ?o {
    BIND( $path AS ?s ).
    GRAPH ?c { ?s ?p ?o }.
    FILTER( ?p != <…> ).
  }
} AS %level1

WITH {
  SELECT (?c2 as ?c) (?x as ?s) (?p2 as ?p) (?o2 as ?o) {
    SELECT DISTINCT ?c2 ?p2 (?o as ?x) ?o2 {
      INCLUDE %level1filtered .
      GRAPH ?c2 { ?o ?p2 ?o2. }.
      FILTER( ?p2 != <…> ).
    }
  }
} AS %level2

WITH {
  SELECT (?c2 as ?c) (?x as ?s) (?p2 as ?p) (?o2 as ?o) {
    SELECT DISTINCT ?c2 ?p2 (?o as ?x) ?o2 {
      INCLUDE %level2filtered .
      GRAPH ?c2 { ?o ?p2 ?o2. }.
      FILTER( ?p2 != <…> ).
    }
  }
} AS %level3

WITH {
  SELECT (max(?time) as ?maxTime) ?s ?p ?o {
    hint:Query hint:optimizer "None" .
    INCLUDE %level1 .
    ?c a ?type .
    ?c <…> ?time .
    ?c <…> ?source .
    FILTER( ?time < $selectedTime )
    FILTER( ?type != <…> )
  } GROUP BY ?s ?p ?o ?source
} AS %level1max

WITH {
  SELECT (max(?time) as ?maxTime) ?s ?p ?o {
    hint:Query hint:optimizer "None" .
    INCLUDE %level2 .
    ?c a ?type .
    ?c <…> ?time .
    ?c <…> ?source .
    FILTER( ?time < $selectedTime )
    FILTER( ?type != <…> )
  } GROUP BY ?s ?p ?o ?source
} AS %level2max

WITH {
  SELECT (max(?time) as ?maxTime) ?s ?p ?o {
    hint:Query hint:optimizer "None" .
    INCLUDE %level3 .
    ?c a ?type .
    ?c <…> ?time .
    ?c <…> ?source .
    FILTER( ?time < $selectedTime )
    FILTER( ?type != <…> )
  } GROUP BY ?s ?p ?o ?source
} AS %level3max

WITH {
  SELECT ?c ?s ?p ?o {
    hint:Query hint:optimizer "None" .
    INCLUDE %level1 .
    ?c a <…> .
    ?c <…> $selectedTime .
  }
} AS %level1maxContinuous

WITH {
  SELECT ?c ?s ?p ?o {
    hint:Query hint:optimizer "None" .
    INCLUDE %level2 .
    ?c a <…> .
    ?c <…> $selectedTime .
  }
} AS %level2maxContinuous

WITH {
  SELECT ?c ?s ?p ?o {
    hint:Query hint:optimizer "None" .
    INCLUDE %level3 .
    ?c a <…> .
    ?c <…> $selectedTime .
  }
} AS %level3maxContinuous

WITH {
  SELECT DISTINCT ?c ?s ?p ?o {
    {
      hint:Query hint:optimizer "None" .
      INCLUDE %level1 .
      ?c a <…> .
      ?c <…> ?maxTime .
      INCLUDE %level1max .
    } UNION {
      INCLUDE %level1maxContinuous .
    }
  }
} AS %level1filtered

WITH {
  SELECT DISTINCT ?c ?s ?p ?o {
    {
      hint:Query hint:optimizer "None" .
      INCLUDE %level2 .
      ?c a <…> .
      ?c <…> ?maxTime .
      INCLUDE %level2max .
    } UNION {
      INCLUDE %level2maxContinuous .
    }
  }
} AS %level2filtered

WITH {
  SELECT DISTINCT ?c ?s ?p ?o {
    {
      hint:Query hint:optimizer "None" .
      INCLUDE %level3 .
      ?c a <…> .
      ?c <…> ?maxTime .
      INCLUDE %level3max .
    } UNION {
      INCLUDE %level3maxContinuous .
    }
  }
} AS %level3filtered

WITH {
  SELECT DISTINCT ?c ?s ?p ?o {
    {
      INCLUDE %level1filtered
    } UNION {
      INCLUDE %level2filtered
    } UNION {
      INCLUDE %level3filtered
    }
  }
} AS %statements

WITH {
  SELECT DISTINCT ?c {
    INCLUDE %statements
  }
} AS %contexts

WITH {
  SELECT ?c (?c as ?s) ?p ?o {
    hint:Query hint:optimizer "None" .
    INCLUDE %contexts
    GRAPH ?c { ?c ?p ?o }.
  }
} AS %contextstatements

WHERE {
  {
    INCLUDE %statements
  } UNION {
    INCLUDE %contextstatements
  }
}

Listing B.8: Optimized SPARQL Query for Resource Lookup

@PREFIX admssw: <...> .
@PREFIX rdf: <...> .
@PREFIX rdfs: <...> .
@PREFIX skos: <...> .
@PREFIX ms: <...> .

SELECT DISTINCT
  ?a
  ?title

# %heuristic holds the statement patterns of the source query; the %mmmN
# named subqueries add, pattern by pattern, the temporal filtering on the
# metaservice data model (cf. Listing B.8).
WITH {
  SELECT
    *
  WHERE {
    ?a <...> <...> .
    ?a <...> ?alt .
    {
      ?a <...> ?title .
    } UNION {
      ?a <...> ?title .
    } UNION {
      ?a <...> ?title .
    }
    FILTER ((?alt != ?title))
  }
} as %heuristic

WITH {
  SELECT
    *
  WHERE {
    hint:SubQuery hint:optimizer "None" .
    INCLUDE %heuristic
    FILTER (bound(?title))
    FILTER (bound(?a))
  }
} as %mmm2Common

WITH {
  SELECT DISTINCT
    ?a
    (MAX(?timemmm2) as ?timemmm2)
    ?title
  WHERE {
    INCLUDE %mmm2Common
    GRAPH ?mmm2 { ?a <...> ?title } .
    hint:SubQuery hint:optimizer "None" .
    FILTER ((?mmm2type != <...>))
    FILTER ((?timemmm2 <= ?selectedDate))
    ?mmm2 <...> ?timemmm2 .
    ?mmm2 a ?mmm2type .
    ?mmm2 <...> ?mmm2source .
  }
  GROUP BY
    ?title
    ?a
    ?mmm2source
} as %mmm2

WITH {
  SELECT DISTINCT
    ?subjectmmm2
    ?mmm2generator
  WHERE {
    INCLUDE %mmm2Common
    GRAPH ?mmm2 { ?a <...> ?title } .
    hint:SubQuery hint:optimizer "None" .
    ?mmm2 a <...> .
    ?mmm2 <...> ?mmm2generator .
    ?mmm2 <...> ?subjectmmm2 .
  }
} as %mmm2Continuous0

WITH {
  SELECT DISTINCT
    ?subjectmmm2
    ?mmm2generator
    (MAX(?timemmm2) as ?timemmm2)
  WHERE {
    INCLUDE %mmm2Continuous0
    hint:SubQuery hint:optimizer "None" .
    FILTER ((?timemmm2 <= ?selectedDate))
    ?mmmmmcontext2 <...> ?subjectmmm2 .
    ?mmmmmcontext2 <...> ?mmm2generator .
    ?mmmmmcontext2 <...> ?timemmm2 .
  }
  GROUP BY
    ?subjectmmm2
    ?mmm2generator
} as %mmm2Continuous1

WITH {
  SELECT DISTINCT
    ?timemmm2
    ?a
    ?mmm2generator
    ?mmm2
    ?title
  WHERE {
    hint:SubQuery hint:optimizer "None" .
    INCLUDE %mmm2Continuous1
    ?mmm2 <...> ?subjectmmm2 .
    ?mmm2 <...> ?mmm2generator .
    ?mmm2 <...> ?timemmm2 .
    GRAPH ?mmm2 { ?a <...> ?title } .
  }
} as %mmm2Continuous

WITH {
  SELECT
    *
  WHERE {
    hint:SubQuery hint:optimizer "None" .
    INCLUDE %heuristic
    FILTER (bound(?title))
    FILTER (bound(?a))
  }
} as %mmm3Common

WITH {
  SELECT DISTINCT
    ?a
    ?title
    (MAX(?timemmm3) as ?timemmm3)
  WHERE {
    INCLUDE %mmm3Common
    GRAPH ?mmm3 { ?a <...> ?title } .
    hint:SubQuery hint:optimizer "None" .
    FILTER ((?mmm3type != <...>))
    FILTER ((?timemmm3 <= ?selectedDate))
    ?mmm3 <...> ?timemmm3 .
    ?mmm3 a ?mmm3type .
    ?mmm3 <...> ?mmm3source .
  }
  GROUP BY
    ?title
    ?a
    ?mmm3source
} as %mmm3

WITH {
  SELECT DISTINCT
    ?subjectmmm3
    ?mmm3generator
  WHERE {
    INCLUDE %mmm3Common
    GRAPH ?mmm3 { ?a <...> ?title } .
    hint:SubQuery hint:optimizer "None" .
    ?mmm3 a <...> .
    ?mmm3 <...> ?mmm3generator .
    ?mmm3 <...> ?subjectmmm3 .
  }
} as %mmm3Continuous0

WITH {
  SELECT DISTINCT
    ?subjectmmm3
    ?mmm3generator
    (MAX(?timemmm3) as ?timemmm3)
  WHERE {
    INCLUDE %mmm3Continuous0
    hint:SubQuery hint:optimizer "None" .
    FILTER ((?timemmm3 <= ?selectedDate))
    ?mmmmmcontext2 <...> ?subjectmmm3 .
    ?mmmmmcontext2 <...> ?mmm3generator .
    ?mmmmmcontext2 <...> ?timemmm3 .
  }
  GROUP BY
    ?subjectmmm3
    ?mmm3generator
} as %mmm3Continuous1

WITH {
  SELECT DISTINCT
    ?timemmm3
    ?mmm3
    ?mmm3generator
    ?a
    ?title
  WHERE {
    hint:SubQuery hint:optimizer "None" .
    INCLUDE %mmm3Continuous1
    ?mmm3 <...> ?subjectmmm3 .
    ?mmm3 <...> ?mmm3generator .
    ?mmm3 <...> ?timemmm3 .
    GRAPH ?mmm3 { ?a <...> ?title } .
  }
} as %mmm3Continuous

WITH {
  SELECT
    *
  WHERE {
    hint:SubQuery hint:optimizer "None" .
    INCLUDE %heuristic
    FILTER (bound(?title))
    FILTER (bound(?a))
  }
} as %mmm4Common

WITH {
  SELECT DISTINCT
    ?a
    ?title
    (MAX(?timemmm4) as ?timemmm4)
  WHERE {
    INCLUDE %mmm4Common
    GRAPH ?mmm4 { ?a <...> ?title } .
    hint:SubQuery hint:optimizer "None" .
    FILTER ((?mmm4type != <...>))
    FILTER ((?timemmm4 <= ?selectedDate))
    ?mmm4 <...> ?timemmm4 .
    ?mmm4 a ?mmm4type .
    ?mmm4 <...> ?mmm4source .
  }
  GROUP BY
    ?title
    ?a
    ?mmm4source
} as %mmm4

WITH {
  SELECT DISTINCT
    ?subjectmmm4
    ?mmm4generator
  WHERE {
    INCLUDE %mmm4Common
    GRAPH ?mmm4 { ?a <...> ?title } .
    hint:SubQuery hint:optimizer "None" .
    ?mmm4 a <...> .
    ?mmm4 <...> ?mmm4generator .
    ?mmm4 <...> ?subjectmmm4 .
  }
} as %mmm4Continuous0

WITH {
  SELECT DISTINCT
    ?subjectmmm4
    ?mmm4generator
    (MAX(?timemmm4) as ?timemmm4)
  WHERE {
    INCLUDE %mmm4Continuous0
    hint:SubQuery hint:optimizer "None" .
    FILTER ((?timemmm4 <= ?selectedDate))
    ?mmmmmcontext2 <...> ?subjectmmm4 .
    ?mmmmmcontext2 <...> ?mmm4generator .
    ?mmmmmcontext2 <...> ?timemmm4 .
  }
  GROUP BY
    ?subjectmmm4
    ?mmm4generator
} as %mmm4Continuous1

WITH {
  SELECT DISTINCT
    ?mmm4generator
    ?a
    ?timemmm4
    ?title
    ?mmm4
  WHERE {
    hint:SubQuery hint:optimizer "None" .
    INCLUDE %mmm4Continuous1
    ?mmm4 <...> ?subjectmmm4 .
    ?mmm4 <...> ?mmm4generator .
    ?mmm4 <...> ?timemmm4 .
    GRAPH ?mmm4 { ?a <...> ?title } .
  }
} as %mmm4Continuous

WITH {
  SELECT
    *
  WHERE {
    hint:SubQuery hint:optimizer "None" .
    INCLUDE %heuristic
    FILTER (bound(?a))
    FILTER (bound(?alt))
  }
} as %mmm1Common

WITH {
  SELECT DISTINCT
    ?alt
    (MAX(?timemmm1) as ?timemmm1)
    ?a
  WHERE {
    INCLUDE %mmm1Common
    GRAPH ?mmm1 { ?a <...> ?alt } .
    hint:SubQuery hint:optimizer "None" .
    FILTER ((?mmm1type != <...>))
    FILTER ((?timemmm1 <= ?selectedDate))
    ?mmm1 <...> ?timemmm1 .
    ?mmm1 a ?mmm1type .
    ?mmm1 <...> ?mmm1source .
  }
  GROUP BY
    ?a
    ?alt
    ?mmm1source
} as %mmm1

WITH {
  SELECT DISTINCT
    ?subjectmmm1
    ?mmm1generator
  WHERE {
    INCLUDE %mmm1Common
    GRAPH ?mmm1 { ?a <...> ?alt } .
    hint:SubQuery hint:optimizer "None" .
    ?mmm1 a <...> .
    ?mmm1 <...> ?mmm1generator .
    ?mmm1 <...> ?subjectmmm1 .
  }
} as %mmm1Continuous0

WITH {
  SELECT DISTINCT
    ?subjectmmm1
    ?mmm1generator
    (MAX(?timemmm1) as ?timemmm1)
  WHERE {
    INCLUDE %mmm1Continuous0
    hint:SubQuery hint:optimizer "None" .
    FILTER ((?timemmm1 <= ?selectedDate))
    ?mmmmmcontext2 <...> ?subjectmmm1 .
    ?mmmmmcontext2 <...> ?mmm1generator .
    ?mmmmmcontext2 <...> ?timemmm1 .
  }
  GROUP BY
    ?subjectmmm1
    ?mmm1generator
} as %mmm1Continuous1

WITH {
  SELECT DISTINCT
    ?alt
    ?mmm1
    ?a
    ?mmm1generator
    ?timemmm1
  WHERE {
    hint:SubQuery hint:optimizer "None" .
    INCLUDE %mmm1Continuous1
    ?mmm1 <...> ?subjectmmm1 .
    ?mmm1 <...> ?mmm1generator .
    ?mmm1 <...> ?timemmm1 .
    GRAPH ?mmm1 { ?a <...> ?alt } .
  }
} as %mmm1Continuous

WITH {
  SELECT
    *
  WHERE {
    hint:SubQuery hint:optimizer "None" .
    INCLUDE %heuristic
    FILTER (bound(?a))
  }
} as %mmm0Common

WITH {
  SELECT DISTINCT
    ?a
    (MAX(?timemmm0) as ?timemmm0)
  WHERE {
    INCLUDE %mmm0Common
    GRAPH ?mmm0 { ?a <...> <...> } .
    hint:SubQuery hint:optimizer "None" .
    FILTER ((?mmm0type != <...>))
    FILTER ((?timemmm0 <= ?selectedDate))
    ?mmm0 <...> ?timemmm0 .
    ?mmm0 a ?mmm0type .
    ?mmm0 <...> ?mmm0source .
  }
  GROUP BY
    ?a
    ?mmm0source
} as %mmm0

WITH {
  SELECT DISTINCT
    ?subjectmmm0
    ?mmm0generator
  WHERE {
    INCLUDE %mmm0Common
    GRAPH ?mmm0 { ?a <...> <...> } .
    hint:SubQuery hint:optimizer "None" .
    ?mmm0 a <...> .
    ?mmm0 <...> ?mmm0generator .
    ?mmm0 <...> ?subjectmmm0 .
  }
} as %mmm0Continuous0

WITH {
  SELECT DISTINCT
    ?subjectmmm0
    ?mmm0generator
    (MAX(?timemmm0) as ?timemmm0)
  WHERE {
    INCLUDE %mmm0Continuous0
    hint:SubQuery hint:optimizer "None" .
    FILTER ((?timemmm0 <= ?selectedDate))
    ?mmmmmcontext2 <...> ?subjectmmm0 .
    ?mmmmmcontext2 <...> ?mmm0generator .
    ?mmmmmcontext2 <...> ?timemmm0 .
  }
  GROUP BY
    ?subjectmmm0
    ?mmm0generator
} as %mmm0Continuous1

WITH {
  SELECT DISTINCT
    ?mmm0generator
    ?mmm0
    ?a
    ?timemmm0
  WHERE {
    hint:SubQuery hint:optimizer "None" .
    INCLUDE %mmm0Continuous1
    ?mmm0 <...> ?subjectmmm0 .
    ?mmm0 <...> ?mmm0generator .
    ?mmm0 <...> ?timemmm0 .
    GRAPH ?mmm0 { ?a <...> <...> } .
  }
} as %mmm0Continuous

WHERE {
  hint:SubQuery hint:optimizer "None" .
  INCLUDE %heuristic
  {
    {
      INCLUDE %mmm2
      GRAPH ?mmm2 { ?a <...> ?title } .
    } UNION {
      INCLUDE %mmm2Continuous
    }
    ?mmm2 <...> ?timemmm2 .
    FILTER ((?mmm2type != <...>))
    ?mmm2 a ?mmm2type .
  } UNION {
    {
      INCLUDE %mmm3
      GRAPH ?mmm3 { ?a <...> ?title } .
    } UNION {
      INCLUDE %mmm3Continuous
    }
    ?mmm3 <...> ?timemmm3 .
    FILTER ((?mmm3type != <...>))
    ?mmm3 a ?mmm3type .
  } UNION {
    {
      INCLUDE %mmm4
      GRAPH ?mmm4 { ?a <...> ?title } .
    } UNION {
      INCLUDE %mmm4Continuous
    }
    ?mmm4 <...> ?timemmm4 .
    FILTER ((?mmm4type != <...>))
    ?mmm4 a ?mmm4type .
  }
  {
    INCLUDE %mmm1
    GRAPH ?mmm1 { ?a <...> ?alt } .
  } UNION {
    INCLUDE %mmm1Continuous
  }
  ?mmm1 <...> ?timemmm1 .
  FILTER ((?mmm1type != <...>))
  ?mmm1 a ?mmm1type .
  {
    INCLUDE %mmm0
    GRAPH ?mmm0 { ?a <...> <...> } .
  } UNION {
    INCLUDE %mmm0Continuous
  }
  ?mmm0 <...> ?timemmm0 .
  FILTER ((?mmm0type != <...>))
  ?mmm0 a ?mmm0type .
}
Listing B.9: Full Translation of a Temporal SPARQL Query on the Metaservice Data Model


B.3 Metaservice Module Descriptor

<!-- ... -->
  <!-- Module name and version are replaced through Maven filtering -->
  <groupId>${pom.groupId}</groupId>
  <artifactId>${pom.artifactId}</artifactId>
  <version>${pom.version}</version>

  <provider
      id="debianPackageProvider"
      model="org.metaservice.core.deb.parser.ast.Package"
      class="org.metaservice.core.deb.DebianPackageProvider"
      >
    <!-- ... -->

  <parser
      id="parboiledParser"
      type="deb"
      model="org.metaservice.core.deb.parser.ast.Package"
      class="org.metaservice.core.deb.ParboiledDebParser"/>

  <!-- ... -->

  <!-- ... -->
      startUri="http://ftp.debian.org/debian/dists/"
      crawler="debianarchive"
      archiveClass="org.metaservice.core.deb.util.DebianGitArchive"
      active="false"
      />

  <!-- ... -->
  ubuntu
  ^([^/]+)/.*$
  <!-- ... -->
  </follow>
  <!-- ... -->
  </follow>
  </follow>
  </follow>
  </follow>
  </crawler>

  <template
      name="debian-release.hbs"
      appliesTo="http://metaservice.org/ns/deb#Release"/>
  <template
      name="debian-package.hbs"
      appliesTo="http://metaservice.org/ns/deb#Package"/>
  <template
      name="debian-project.hbs"
      appliesTo="http://metaservice.org/ns/deb#Project"/>

  <ontology
      name="deb.rdf" apply="true" distribute="true"/>
<!-- ... -->
Listing B.10: Metaservice Module Descriptor
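The ${pom.groupId}, ${pom.artifactId}, and ${pom.version} placeholders in Listing B.10 are, as the comment in the descriptor notes, substituted at build time by Maven resource filtering. A minimal sketch of a build configuration that enables such substitution, assuming the descriptor is kept under src/main/resources (the actual build setup of the project may differ):

<!-- pom.xml fragment: filter resources so that ${...} placeholders
     are replaced with values from the project model at build time -->
<build>
  <resources>
    <resource>
      <directory>src/main/resources</directory>
      <filtering>true</filtering>
    </resource>
  </resources>
</build>

With filtering enabled, Maven rewrites the descriptor during the process-resources phase, so the packaged module always carries the coordinates of the artifact it was built from.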


Appendix C Screenshots

Figure C.1: Debian Package Template - Web-Frontend

Figure C.2: Generic Package Template - Web-Frontend

Figure C.3: Management Shell


Figure C.4: WordPress Security Alert on Android
