Dependency Management 2.0 – A Semantic Web Enabled Approach

Ellis Emmanuel Eghan

A Thesis In the Department of Computer Science and Software Engineering

Presented in Partial Fulfillment of the Requirements For the Degree of Doctor of Philosophy (Computer Science) at Concordia University Montreal, Quebec, Canada

July 2019 © Ellis Emmanuel Eghan, 2019

CONCORDIA UNIVERSITY

SCHOOL OF GRADUATE STUDIES

This is to certify that the thesis prepared

By: Ellis Emmanuel Eghan

Entitled: Dependency Management 2.0 – A Semantic Web Enabled Approach and submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy (Computer Science) complies with the regulations of this University and meets the accepted standards with respect to originality and quality.

Signed by the final examining committee:

______Chair Dr. Govind Gopakumar

______External Examiner Dr. Giuliano Antoniol

______Examiner Dr. Ferhat Khendek

______Examiner Dr. Dhrubajyoti Goswami

______Examiner Dr. Nikolaos Tsantalis

______Supervisor Dr. Juergen Rilling

Approved by ______Dr. Leila Kosseim, Graduate Program Director

September 3, 2019 ______Dr. Amir Asif, Dean, Gina Cody School of Engineering and Computer Science

Abstract

Dependency Management 2.0 – A Semantic Web Enabled Approach

Ellis Emmanuel Eghan, Ph.D.

Concordia University, 2019

Software development and evolution are highly distributed processes that involve a multitude of supporting tools and resources. Application programming interfaces are commonly used by software developers to reduce development cost and complexity by reusing code developed by third parties or published by the open source community. However, these application programming interfaces have also introduced new challenges to the Software Engineering community (e.g., software vulnerabilities, API incompatibilities, and software license violations) that not only extend beyond the traditional boundaries of individual projects but also involve different software artifacts. As a result, there is a need for a technology-independent representation of software dependency semantics and the ability to seamlessly integrate this representation with knowledge from other software artifacts. The Semantic Web and its supporting technology stack have been widely promoted to model, integrate, and support interoperability among heterogeneous data sources. This dissertation takes advantage of the Semantic Web and its enabling technology stack for knowledge modeling and integration. The thesis introduces five major contributions: (1) We present a formal Software Build System Ontology – SBSON, which captures concepts and properties for software build and dependency management systems. This formal knowledge representation allows us to take advantage of Semantic Web inference services, forming the basis for a more flexible API dependency analysis compared to traditional proprietary analysis approaches. (2) We conducted a user survey involving 53 open source developers to gain insights into how actual developers manage API breaking changes. (3) We introduce a novel approach which integrates our SBSON model with knowledge about source code usage and changes within the Maven ecosystem to support API consumers and producers in managing (assessing and minimizing) the impacts of breaking changes. (4) A Security Vulnerability Analysis Framework (SV-AF) is introduced, which integrates build system, source code, versioning system, and vulnerability ontologies to trace and assess the impact of security vulnerabilities across project boundaries. (5) Finally, we introduce an Ontological Trustworthiness Assessment Model (OntTAM). OntTAM is an integration of our build, source code, vulnerability, and license ontologies which supports a holistic analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems. Several case studies are presented to illustrate the applicability and flexibility of our modelling approach, demonstrating that it can seamlessly integrate and reuse knowledge extracted from existing build and dependency management systems with other heterogeneous data sources found in the software engineering domain. As part of our case studies, we also demonstrate how this unified knowledge model can enable new types of project dependency analysis.


To Betty, Michael, and Isabel


ACKNOWLEDGEMENTS

First and foremost, all praises to God for blessing, protecting, and guiding me throughout my studies. I could never have accomplished this without my faith. A heartfelt thanks goes to my supervisor, Dr. Juergen Rilling, for having given me such an opportunity. Dr. Rilling did more than just advise: his continuous support, patience, motivation, and immense knowledge pushed me beyond my boundaries and shaped me, both as a person and a researcher. Throughout this journey, Dr. Rilling was always available to openly discuss new research ideas, ask challenging questions, and therefore elevate this work to the best it can be. His unique personality as a supervisor and friend is the main reason behind the success of this research. I also extend my thanks to the members of the examining committee, including Dr. Nikolaos Tsantalis, Dr. Ferhat Khendek, Dr. Dhrubajyoti Goswami, and Dr. Giuliano Antoniol (External), whose work and valuable feedback expanded my horizon and helped me discover new research opportunities that significantly improved my research. My gratitude also goes to the members of the Ambient Software Evolution Group at Concordia University: Dr. Sultan Alqahtani, Dr. Rabe Abdalkareem, SayedHassan Khatoonabadi, Chris Forbes, Parisa Moslehi, and Yasaman Sarlati. A special thanks to two unique friends and colleagues, Dr. Sultan Alqahtani and Parisa Moslehi, who were there during the past years to discuss and share research ideas and challenges. Our discussions led to new published work and more research opportunities. Above all, I thank all the members of my family. First and foremost, I want to thank my parents, Victor and Joyce Eghan, who generously and wholeheartedly gave me their unconditional love and endless support throughout these years. I am grateful for your trust and confidence in me and for giving me the freedom to pursue my dreams. Last, but definitely most prominently, I thank my wife, Betty, for her unwavering love and encouragement during the pursuit of my studies. Thank you for always believing in me, and for reminding me to endure during the tough times. I consider myself lucky to have encountered you. I love you.


Table of Contents

List of Figures ...... xi
List of Tables ...... xiv
List of Acronyms ...... xv
1 Introduction ...... 1
1.1 Our Thesis ...... 2
1.2 Summary of Research Contributions ...... 3
1.3 Related Publications ...... 5
1.4 Thesis Organization ...... 5
2 Motivation ...... 7
3 Background and Related Work ...... 12
3.1 The Semantic Web in a Nutshell ...... 12
3.2 Ontologies in Software Engineering ...... 15
3.3 Mining Software Repositories (MSR) ...... 17
3.4 Build Systems and Dependency Management ...... 19
3.5 Chapter Summary ...... 23
4 A Unified Ontology-based Modeling Approach for Software Build and Dependency Repositories ...... 24
4.1 Introduction ...... 24
4.2 Software Build System ONtology (SBSON): Knowledge Modeling and Engineering ...... 25
4.2.1 Step 1: Acquisition of Dependency Semantics ...... 28
4.2.2 Step 2: Initial System-Specific Ontologies ...... 29
4.2.3 Step 3: Ontology Abstraction and Refinement ...... 31
4.2.4 Step 4: Ontology Population ...... 40
4.2.5 Step 5: Ontology Evolution ...... 40
4.3 Chapter Summary ...... 41
5 A Semantic Web Enabled Approach for the Early Detection of API Breaking Change Impacts ...... 42
5.1 Introduction ...... 42
5.2 Background ...... 44
5.2.1 API Usage and Breaking Changes ...... 44
5.2.2 Software Evolution ONtologies (SEON) ...... 45
5.3 A User Survey on the Impact of API Breaking Changes ...... 46
5.3.1 How often do developers experience breaking changes? ...... 47
5.3.2 What features would developers need for identifying and managing the impacts of breaking changes? ...... 48
5.4 Modeling the Impact of API Breaking Changes ...... 49
5.4.1 Modeling and Integration of the Source Code Ontology ...... 49
5.4.2 Knowledge Inferencing and Reasoning ...... 51
5.5 Case Study ...... 53
5.5.1 Dataset Description ...... 53
5.5.2 Results ...... 54
5.6 Related Work ...... 61
5.6.1 API Usage ...... 61
5.6.2 Impact of API Breaking Changes ...... 62
5.7 Chapter Summary ...... 62
6 Recovering Semantic Traceability Links between APIs and Security Vulnerabilities ...... 64
6.1 Introduction ...... 64
6.1.1 Motivating Example ...... 65
6.2 Background ...... 67
6.2.1 Security Vulnerability Databases ...... 67
6.2.2 Vulnerability Detection Techniques ...... 68
6.2.3 The SEcurity Vulnerability ONTology (SEVONT) ...... 69
6.3 SV-AF: Security Vulnerability Analysis Framework ...... 71
6.3.1 Ontology Alignment ...... 71
6.3.2 Knowledge Inferencing and Reasoning ...... 76
6.4 Case Studies ...... 78
6.4.1 Case Study Data ...... 78
6.4.2 Case Study 1: Identifying vulnerable projects in Maven Repository ...... 79
6.4.3 Case Study 2: Identifying open source components that are directly and indirectly dependent on vulnerable components ...... 83
6.4.4 Case Study 3: API-level vulnerability impact analysis for CVE-2015-0227 ...... 85
6.5 Discussion and Related Work ...... 92
6.5.1 Comparison Against Existing Tools ...... 92
6.5.2 Threats to Validity ...... 94
6.5.3 Related Work in Tracking Known Security Vulnerabilities ...... 96
6.6 Chapter Summary ...... 97
7 API Trustworthiness: An Ontological Approach for Software Library Adoption ...... 98
7.1 Introduction ...... 98
7.1.1 Motivating Example ...... 100
7.2 Background ...... 102
7.2.1 Open Source Licenses ...... 102
7.2.2 License Violations ...... 103
7.2.3 The MARKOS License Ontology ...... 103
7.2.4 Evolvable Quality Assessment Metamodel (SE-EQUAM) ...... 104
7.3 Ontology-based Trustworthiness Assessment Model (OntTAM) ...... 107
7.3.1 Artifact Selection ...... 109
7.3.2 Model and Model Adjustment ...... 109
7.3.3 Measures and Metrics ...... 115
7.3.4 Assessment Process ...... 118
7.4 Case Study ...... 121
7.4.1 Study Setup ...... 121
7.4.2 Identifying and Measuring Software Security Vulnerabilities ...... 123
7.4.3 Identifying and Measuring License Violations ...... 126
7.4.4 Identifying and Measuring API Breaking Changes ...... 132
7.4.5 Assessment Process ...... 137
7.5 Discussion and Related Work ...... 142
7.5.1 Threats to Validity ...... 142
7.5.2 Related Work ...... 143
7.6 Chapter Summary ...... 147
8 Conclusions and Future Work ...... 149
8.1 Contributions ...... 149
8.2 Future Work ...... 152
8.2.1 Current Limitations ...... 152
8.2.2 Opportunities for Future Research ...... 153
Bibliography ...... 156
Appendix A: Referenced Ontologies ...... 170
Appendix B: User Survey Questionnaire ...... 171

List of Figures

Figure 1.1: An overview of the thesis content ...... 6
Figure 2.1: Overview of motivating scenario #1 ...... 9
Figure 2.2: Overview of motivating scenario #2 - Integrating build information and knowledge from heterogeneous software repositories ...... 10
Figure 3.1: Semantic web architecture in layers ...... 13
Figure 3.2: State of the LOD cloud ...... 15
Figure 4.1: An overview of our knowledge modeling methodology ...... 26
Figure 4.2: An overview of the different ontology abstraction layers in SBSON ...... 27
Figure 4.3: Overview of individual system-specific ontologies for the analyzed systems ...... 29
Figure 4.4: Example of syntax and structural differences between Maven (left) and Gradle (right) dependency definitions ...... 30
Figure 4.5: An illustration of (a) generic dependency between two releases, and (b) how property reification pattern is adopted in modeling dependency links ...... 32
Figure 4.6: (a) The OrderedList Ontology, (b) how we model the order of project releases with the OrderedList Ontology, and (c) an illustrative example of a project and its ordered releases ...... 33
Figure 4.7: Concepts used to model and reason on dependency version ranges ...... 34
Figure 4.8: Transitive exclusion at per-dependency scope ...... 35
Figure 4.9: Concepts used to model and reason on dependency exclusion ...... 35
Figure 4.10: Inferring DirectDependencyOn based on an "exact" version range ...... 37
Figure 4.11: Inferring DirectDependencyOn based on a "lower than" version range ...... 37
Figure 4.12: Inferring DirectDependencyOn based on a "greater than" version range ...... 38
Figure 4.13: Inferring hasTransitiveDependencyOn in the absence of exclusions ...... 38
Figure 4.14: Inferring hasTransitiveDependencyOn in the presence of exclusions ...... 38
Figure 4.15: Overview of the concepts and (object) properties in the unified SBSON family of ontologies ...... 39
Figure 4.16: Anatomy of the URI of a generated triple ...... 40
Figure 5.1: The hidden complexity of breaking changes due to transitive dependencies ...... 43
Figure 5.2: Overview of the SEON pyramid of ontologies [34] ...... 45
Figure 5.3: Ontologies and concepts involved in API change impact analysis ...... 50
Figure 5.4: SPARQL query returning transitive method calls ...... 52
Figure 5.5: Hierarchy of code properties ...... 52
Figure 5.6: Query illustrating the dependsOn subsumption inference ...... 53
Figure 5.7: Overview of approach for breaking change impact analysis ...... 55
Figure 5.8: SPARQL query to identify API usage in client projects ...... 55
Figure 5.9: SPARQL query identifying the use of multiple versions of the ASM library in projects ...... 57
Figure 5.10: SPARQL query to identify transitive usage of API elements impacted by breaking changes ...... 57
Figure 5.11: Illustrative example of a client project using different versions of the same API ...... 58
Figure 5.12: Distribution of client dependencies and their usage of incompatible ASM APIs ...... 59
Figure 5.13: Tracing the issue reported in DocBleach (issue #1) ...... 61
Figure 6.1: Integrating code and build information with knowledge from other originally heterogeneous resources ...... 66
Figure 6.2: Overview of the SEVONT ontologies ...... 70
Figure 6.3: Overview of the integrated SBSON, SEON, and SEVONT ontologies ...... 71
Figure 6.4: SV-AF knowledge base similarity graphs ...... 72
Figure 6.5: PSL rule identifying similar projects with the same name ...... 73
Figure 6.6: PSL rule identifying similar projects with the same name and version ...... 73
Figure 6.7: SV-AF's weighted similarity modeling ...... 74
Figure 6.8: SWRL rules for aligning SEON and SEVONT when a commit message contains a vulnerability reference ...... 75
Figure 6.9: SWRL rules for aligning SEON and SEVONT when a vulnerability patch contains a commit reference ...... 75
Figure 6.10: The SV-AF ontology concepts involved in API-level vulnerability impact analysis ...... 77
Figure 6.11: SPARQL query returning vulnerable projects based on the owl:sameAs inference ...... 78
Figure 6.12: PSL SameProject Rules ...... 80
Figure 6.13: PSL SBSON-SEVONT similarity inference results ...... 81
Figure 6.14: Inferred project dependencies in SBSON ...... 84
Figure 6.15: Geronimo-jetty6-javaee5 using 5 vulnerable projects (level 1 dependencies) ...... 85
Figure 6.16: Extracting patch relevant information from NVD and commit messages ...... 88
Figure 6.17: Diff output for WSS4J r1619358 and r1619359 ...... 88
Figure 6.18: Inferred links between vulnerabilites.owl, code.owl, and versioning.owl ...... 89
Figure 6.19: Query to retrieve vulnerable code fragments across project boundaries ...... 89
Figure 6.20: Class diagram for our modified package ...... 91
Figure 7.1: Motivating Example – How OntTAM can assist developers in trust assessment ...... 101
Figure 7.2: Generic structure of quality assessment models [160] ...... 105
Figure 7.3: SE-EQUAM ontology meta-model reuse to instantiate a domain model ontology (OntEQAM) [9] ...... 106
Figure 7.4: SE-EQUAM Process to instantiate evolvability model ...... 107
Figure 7.5: The Software Trustworthiness Ontology Hierarchy ...... 108
Figure 7.6: Reuse of the SE-QUAM meta-model to instantiate the OntTAM domain model ontology ...... 111
Figure 7.7: An example defining the associated trustworthiness concepts and measures for a sample project ...... 112
Figure 7.8: Integrating OntTAM ontology into SV-AF model and reusing SE-QUAM concepts ...... 114
Figure 7.9: Categories of license violations ...... 117
Figure 7.10: Fuzzy Assessment Process Steps ...... 119
Figure 7.11: WVD measure fuzzy scale and Weight Fuzzy Scale for WVD measure ...... 120
Figure 7.12: Overview of case study setup process ...... 122
Figure 7.13: Rules to infer the direct WVD measure ...... 123
Figure 7.14: SPARQL query for inferring the total number of vulnerable code entities in a project ...... 124
Figure 7.15: SPARQL query for inferring the vulnerable code entities used by different dependent projects ...... 124
Figure 7.16: SPARQL query for inferring inherited WVD measures in clients' projects ...... 125
Figure 7.17: SPARQL query for inferring the total number of simple license violations ...... 127
Figure 7.18: SPARQL query for inferring the total number of transitive license violations ...... 127
Figure 7.19: SPARQL query for inferring the total number of compound license violations ...... 128
Figure 7.20: License distribution in the Maven repository ...... 128
Figure 7.21: Most Popular Type 1 License Violation Pairs ...... 130
Figure 7.22: Most Popular Type 2 License Violation Pairs ...... 130
Figure 7.23: Most Popular Type 3 License Violation Pairs ...... 130
Figure 7.24: SWRL rules to infer the BCD measure ...... 133
Figure 7.25: SPARQL query for inferring the total number of breaking changes in a project ...... 133
Figure 7.26: SPARQL query for inferring the total number of non-breaking changes in a project ...... 134
Figure 7.27: SPARQL query for inferring the direct BCI measure in a project ...... 134
Figure 7.28: SPARQL query for inferring the indirect BCI measure in a project ...... 135
Figure 7.29: An example of a reported bug showing how a breaking change in the ASM library impacts Orbit and its dependent projects ...... 136
Figure 7.30: Distribution of breaking changes and their impacts in the analyzed ASM libraries and dependencies ...... 137
Figure 7.31: Overview of relations in the semantic OntTAM domain model ...... 137
Figure 7.32: Sample FCL file for defining the fuzzy WVD measure ...... 138
Figure 7.33: Sample FCL file for inferring the fuzzy scores for the WVD measure ...... 139
Figure 7.34: Sample FCL file for integrating the LVC and WVD fuzzy scores for the Impact attribute ...... 140
Figure 7.35: SPARQL query illustrating the inference of overall trustworthiness scores ...... 140

List of Tables

Table 3.1: Examples of Software Repositories ...... 18
Table 4.1: Overview of the 3 studied build and dependency management systems ...... 28
Table 4.2: General statistics of the Maven Central repository ...... 29
Table 4.3: Syntax differences for defining dependency version ranges ...... 30
Table 5.1: Background of survey participants ...... 47
Table 5.2: Report on breaking changes experienced by developers ...... 48
Table 5.3: Summary of Maven dataset ...... 53
Table 5.4: Summary of ASM dataset ...... 54
Table 5.5: Summary of External and Internal Usage of selected ASM APIs ...... 56
Table 5.6: Results of potentially impacted Client Projects ...... 59
Table 5.7: Identified potential breaking changes ...... 60
Table 6.1: Example of Derby versions and their dependent projects in Maven ...... 66
Table 6.2: Maven Repository statistics ...... 78
Table 6.3: NVD database statistics ...... 78
Table 6.4: Subject systems and sizes for transitive dependencies analysis ...... 79
Table 6.5: Example of a linked SBSON-SEVONT vulnerability ...... 81
Table 6.6: Critical Vulnerabilities for Android Project ...... 82
Table 6.7: Weighted owl:sameAs link evaluation ...... 83
Table 6.8: Summary of transitive dependencies on vulnerable components ...... 84
Table 6.9: Case Study #3 Results ...... 90
Table 6.10: Results of Direct and Indirect Usage of the Vulnerable "Wssecurityutil.Verifysignedelement" Method ...... 91
Table 6.11: Comparison of Analysis Results ...... 93
Table 6.12: Dataset size evaluation ...... 95
Table 7.1: Ten common open source licenses and their traits ...... 102
Table 7.2: Permissions defined in the MARKOS ontology ...... 104
Table 7.3: Overview of selected case study projects ...... 122
Table 7.4: Vulnerability densities of selected projects ...... 126
Table 7.5: Clients who switched from a vulnerable API in later release ...... 126
Table 7.6: Totals for each type of violation found by querying the data store ...... 129
Table 7.7: Licence Violation Counts in selected projects ...... 132
Table 7.8: Overview of selected trustworthiness measure scores for our case study projects ...... 141
Table 7.9: Example of inferred trustworthiness scores at sub-factor level ...... 141
Table 7.10: Example of inferred trustworthiness scores at factor level ...... 142

List of Acronyms

API  Application Programming Interface
BCD  Breaking Change Density
BCI  Breaking Change Impact
CVE  Common Vulnerabilities and Exposures
CWE  Common Weakness Enumeration
DL  Description Logic
LOVC  Lines of Vulnerable Code
LVC  License Violation Count
MARKOS  The MARKet for Open Source license ontology
MSR  Mining Software Repositories
NVD  National Vulnerability Database
OMG  The Object Management Group
OntTAM  ONTology-based Trustworthiness Assessment Model
OSS  Open Source Software
OWA  Open World Assumption
OWASP  The Open Web Application Security Project
OWL  The Web Ontology Language
PSL  Probabilistic Soft Logic
RDF  Resource Description Framework
RDFS  The RDF Schema
SBSON  The Software Build System Ontology
SE  Software Engineering
SE-EQUAM  The Evolvable Quality Metamodel
SEON  The Software Engineering Ontologies
SEVONT  The Security Vulnerability Ontology
SOCON  The Source Code Ontology
SPARQL  SPARQL Protocol and RDF Query Language
SV-AF  Security Vulnerability Analysis Framework
SW  The Semantic Web
SWRL  Semantic Web Rule Language
WVD  Weighted Vulnerability Density

Chapter 1

1 Introduction

Traditional software development processes, with their focus on closed architectures and platform-dependent software, restrict potential code reuse across project and organizational boundaries. With the introduction of the Internet, these restrictions have been removed, allowing for global access, online collaboration, information sharing, and internationalization of the software industry [1]. Software development and maintenance tasks can now be shared amongst team members working across and outside organizational boundaries. Code reuse through resources such as software libraries, components, services, design patterns, and frameworks published on the Internet has become an essential practice, allowing artifacts to be reused and shared among developers and organizations. According to Mileva [2], "most of today's software projects heavily depend on the usage of external libraries." This use of libraries allows software developers to take advantage of features provided by Application Programming Interfaces (APIs) without having to reinvent the wheel [3], [4].

Automated dependency management environments have been introduced to further simplify the integration and reuse of external libraries during development. Developers no longer have to manually manage the internal and external libraries their projects depend on. Build systems and dependency management tools automatically download and manage all required dependent components (including transitive dependencies), automatically update dependencies to their latest versions, and perform necessary dependency mediation (conflict resolution) when multiple versions of a dependency are encountered. Among the most commonly used open source build (dependency) repositories are Maven Central1, npm2, and RubyGems3.

1 https://search.maven.org/ 2 https://www.npmjs.com/ 3 https://rubygems.org/


Existing research has demonstrated how mining the knowledge captured in these build repositories can enhance software tasks such as identifying inconsistencies in license compliance [5], predicting build changes [6], [7], identifying build clones [8], and automatic library recommendation and migration [2]. Common to these approaches is that they use build and dependency repositories as information silos, which are not directly integrated and linked with other software repositories, thereby limiting their ability to share and reuse analysis results for future analysis (both by humans and machines). Furthermore, while existing software analysis and dependency approaches perform well in analyzing individual project contexts, the collaborative nature of today's software development requires new types of analysis and knowledge modeling approaches to address global software engineering challenges. These challenges extend beyond the boundaries of individual projects due to dependency relationships among software projects and complete software ecosystems. There is a need for a technology-independent representation of software dependency semantics and the ability to seamlessly integrate such a representation with knowledge from other software artifacts. In our research, we introduce a novel approach which takes advantage of the Semantic Web (SW) and its technology stack (e.g., ontologies, Linked Data, reasoning services) to establish a unified knowledge representation of build and dependency repositories. Based on this SW enabled representation, we can further extend this knowledge base by integrating other (heterogeneous) resources to form the basis for a novel, flexible global impact analysis approach. Such a global impact analysis can provide both producers and consumers of software libraries with additional insights and guide them during the evolution of their libraries. Much of the flexibility of our approach comes from the use of inference services to reason upon knowledge that is explicitly and implicitly captured in the knowledge base.

1.1 Our Thesis

Despite the established role of project dependency repositories and build system dependency management features, little is known about how this software dependency information can be integrated with other software-related knowledge to improve software development processes. This observation leads us to the following thesis statement:


A technology-independent representation of software dependency semantics, seamlessly integrated with other software artifacts, is needed to truly leverage project dependency information in software tasks.

To validate our thesis, we propose a knowledge modeling approach that supports the integration of heterogeneous knowledge resources such as software dependency, source code, vulnerability, and license information. On top of this knowledge model, we developed a set of applications that analyze the impact of code reuse through APIs, not only within a traditional project scope but also in a more global scope, across project boundaries.

1.2 Summary of Research Contributions

In this thesis, we make the following contributions:

• We conducted a survey involving 53 open source developers to gain insights into how they manage API breaking changes.
• Based on the survey results, we present a formal unified ontological model (SBSON, Software Build System ONtology) which captures concepts and properties for software build systems (Chapter 4). This formal knowledge representation allows us to take advantage of inference services provided by the SW, forming the basis for a more flexible API dependency analysis compared to traditional proprietary analysis approaches.
• We introduce a novel approach to support API consumers and producers in managing (assessing and minimizing) the impacts of breaking changes (Chapter 5). The main contributions of this approach are:
o We use our knowledge model to identify the potential impact of breaking changes across project boundaries, taking advantage of SW reasoning services to support library consumers and producers in managing API breaking changes.
o We present a case study to demonstrate the applicability and flexibility of our approach in supporting library consumers while managing the impacts of breaking changes.


• We developed a Security Vulnerability Analysis Framework (SV-AF) to support evidence-based vulnerability detection (Chapter 6). The main contributions of this framework are:
o Integration of different ontologies, including build system, source code, versioning system, and vulnerability ontologies.
o Application of ontology alignment using Probabilistic Soft Logic (PSL) to establish weighted links between ontologies.
o Case studies illustrating the applicability of the presented approach in tracing and assessing the impact of security vulnerabilities across project boundaries.
• We introduce a novel Ontological Trustworthiness Assessment Model (OntTAM), an extension of the earlier generic SE-EQUAM software assessment model [9] (Chapter 7). OntTAM is an integration of our build, source code, vulnerability, and license ontologies which supports the automated analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems. The main contributions of this assessment model are:
o We extend the MARKOS license ontology [10] with semantic rules for three categories of license violations.
o We introduce new trustworthiness measures which quantify API breaking changes, security vulnerabilities, and license violations.
o We perform several case studies to illustrate how our approach provides developers with additional insights on the potential impact of reused libraries and APIs on the quality and trustworthiness of their projects.

A complete list of published works relevant to this dissertation can be found in the next section.


1.3 Related Publications

Earlier versions of the work completed in this thesis have been published in the following papers:

1- E. E. Eghan, S. S. Alqahtani, C. Forbes and J. Rilling, "API trustworthiness: an ontological approach for software library adoption," Software Quality Journal, 2019. https://doi.org/10.1007/s11219-018-9428-4. 2- S. S. Alqahtani, E. E. Eghan and J. Rilling, "Recovering Semantic Traceability Links between APIs and Security Vulnerabilities: An Ontological Modeling Approach," 2017 IEEE International Conference on Software Testing, Verification and Validation (ICST), Tokyo, 2017, pp. 80-91. 3- S. S. Alqahtani, E. E. Eghan and J. Rilling, "SV-AF — A Security Vulnerability Analysis Framework," 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), Ottawa, ON, 2016, pp. 219-229. 4- S. S. Alqahtani, E. E. Eghan and J. Rilling, “Tracing known security vulnerabilities in software repositories – A Semantic Web enabled modeling approach”, Science of Computer Programming, Volume 121, 2016, pp. 153-175.

1.4 Thesis Organization

In what follows, we provide an overview of the thesis structure; Figure 1.1 summarizes the main sections of the thesis and their content. In Chapter 2, we discuss the motivation for the research presented in this thesis. Chapter 3 covers background and related work, including the SW technologies used for our knowledge model construction, mining software repositories (MSR), and dependency management with build systems. The chapter also covers existing works relevant to each of these topics. Chapter 4 describes the approach used to create our unified representation of build and dependency repositories. Chapters 5, 6, and 7 demonstrate how our unified model integrates knowledge from other software artifacts for flexible global software analysis. The conclusions and some promising avenues for future work are discussed in Chapter 8.


[Figure 1.1 content: a three-part roadmap. (1) Motivation and Background – Chapter 2: Motivation; Chapter 3: Background and Related Work. (2) Modeling Build and Dependency Semantics (SBSON) – Chapter 4: Unified Ontology-based Modeling Approach for Software Build and Dependency Repositories (modelling dependency links, version ranges, and exclusions; project version ordering; reasoning on direct and transitive dependencies). (3) Applications supported by SBSON – Chapter 5: A Semantic Web Enabled Approach for the Early Detection of API Breaking Change Impacts (user survey of 53 developers; integrated source code model (SEON) with SBSON; impact of breaking changes across project boundaries); Chapter 6: Recovering Semantic Traceability Links between APIs and Security Vulnerabilities (SV-AF: integration of SEVONT, SEON, and SBSON; ontology alignment; impact analysis of security vulnerabilities across project boundaries); Chapter 7: API Trustworthiness: An Ontological Approach for Software Library Adoption (extended MARKOS license ontology; new trustworthiness measures; OntTAM: integration of all our ontologies; holistic trustworthiness analysis of reused libraries).]

Figure 1.1: An overview of the thesis content


Chapter 2

2 Motivation

Although the reuse of third-party libraries provides developers with gains in productivity by not having to re-implement already existing functionality, new technical and organizational challenges arise from this form of code reuse [11]. Some of these challenges identified in existing work include, but are not limited to, the following:

• choosing the most relevant library among several alternatives [12], [13],
• understanding how to use the features provided by these libraries [13], [14],
• the cost of migrating to a new library [15], [16],
• maintenance costs due to breaking changes [17]–[19],
• the impact of security vulnerabilities and bugs [20], [21],
• incompatible software licenses [5], [22], and
• unmaintained or outdated libraries [20], [23].

To address these challenges, existing approaches analyze the knowledge within software-related repositories such as dependency repositories (e.g., Maven Central, npm), source code repositories (e.g., GitHub4), vulnerability databases (e.g., NVD5), and Q&A forums (e.g., StackOverflow). However, as mentioned in the introduction, most of these approaches treat these repositories as information silos and lack the ability to integrate their analysis results with existing knowledge to make them shareable and reusable for future analysis (both by humans and machines). The motivation of this research is to establish a unified, machine- and human-readable representation that captures build and dependency information as well as knowledge from other software artifacts, allowing for seamless knowledge integration across resource boundaries. This modeling approach will enable us to transform the traditional information silos in which these knowledge resources have remained into information hubs. Some of the key characteristics of such information hubs include the provision for standardized knowledge representation, cross-artifact analysis, and the reuse and sharing of analysis results across artifact and project boundaries. The following motivating scenarios illustrate how such an integrated knowledge modeling approach not only allows for knowledge integration but can also provide the basis for novel types of software analytics.

4 https://github.com/
5 https://nvd.nist.gov/

Scenario #1: Bi-directional dependency analysis. Current build tools provide support for automatic dependency management; a project needs only to specify the third-party libraries it directly depends on, and the build tool automatically includes any required transitive dependent components. However, as shown in Figure 2.1(a), such dependency analysis only supports project-specific dependency trees based on unidirectional dependencies. While unidirectional dependency models work well for managing build dependencies, they are limited in their expressiveness and therefore restrict further reasoning upon the modeled information. For example, Maven's native support for impact analysis allows developers to identify all the components used by their project, as illustrated in Figure 2.1(a). In this example, a component C depends on components D and E. However, given Maven's existing dependency model, it would be impossible for an API producer to identify which projects depend (either directly or indirectly) on their API. A user study we conducted with open source developers indicated that library producers make better choices regarding breaking changes when they know the popularity of a library and how client projects use its APIs. Details of this user study can be found in Section 5.3. Using SW and its supporting technology stack, we can mine and model the dependencies of several projects to create a "global" bi-directional dependency graph (Figure 2.1(b)). As the figure illustrates, based on this enriched knowledge model, library producers can now easily identify all components which depend (directly or indirectly) on their libraries. For example, the developers of component C can identify components A, F, and G as clients which will be potentially impacted by any changes to C.
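To make this scenario concrete, the following SPARQL sketch shows how such a reverse lookup could be expressed over a triple-based dependency graph. The sbson: namespace and the hasDependencyOn property are illustrative placeholders here; the actual vocabulary is introduced in Chapter 4. The SPARQL 1.1 property path operator "+" follows one or more dependency edges, so the query returns both direct and transitive dependents:

    # Illustrative namespace; the concrete SBSON vocabulary is defined in Chapter 4.
    PREFIX sbson: <http://example.org/sbson#>

    # Find every project that depends, directly or transitively, on component C.
    SELECT DISTINCT ?client
    WHERE {
      ?client sbson:hasDependencyOn+ sbson:ComponentC .
    }

Evaluated against the global dependency graph of Figure 2.1(b), this query would return A (a direct dependent of C) as well as F and G (transitive dependents through A) – an answer that cannot be derived from C's local, unidirectional dependency tree.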


[Figure 2.1 content: (a) traditional dependency management at the project level – a project-specific tree with uni-directional dependency links (e.g., C depends on D and E); (b) a global project dependency tree with bi-directional links and inferred transitive dependencies, in which the dependents A, F, and G of component C become visible. Legend: project nodes; uni-directional dependency links (traditional approaches); bi-directional dependency links and inferred transitive dependencies (our approach).]

Figure 2.1: Overview of motivating scenario #1

Scenario #2: Supporting cross-artifact analysis. Many software analysis tasks extend beyond the source code and involve other software artifacts. For example, analysis tasks such as license violation detection and vulnerability impact analysis integrate knowledge from source code, license files, and vulnerability databases. While existing approaches and tools (e.g., VersionEye6, SourceClear7, OWASP-DC8) aim to support such types of analysis using project dependencies, they base their analysis on the existing knowledge representation (e.g., uni-directional dependencies for build management tools) of each individual knowledge source, thereby treating these sources as information silos, which limits the analysis they can perform on the available knowledge. In contrast, our approach takes advantage of SW and its supporting technologies to establish traceability through a global project knowledge graph. This graph integrates concepts and facts from other software knowledge models, while supporting the inference of new knowledge and making analysis results an integrated part of the knowledge model. For example, in Figure 2.2, a traceability link is established between the two project E instances in the vulnerability and dependency models. We can now infer that projects C, A, F, and G are potentially vulnerable due to their transitive dependence on project E. Further, project A can be identified as introducing a license violation, since it transitively depends on project D, which has a conflicting license.

6 https://www.versioneye.com/
7 https://www.sourceclear.com/
8 https://jeremylong.github.io/DependencyCheck/

[Figure 2.2 content: the project dependency model (projects A–G with bi-directional and inferred transitive dependency links) integrated with a vulnerability model (Vuln. X affects project E, linked via sameAs) and a license model (hasLicense, conflicting, and allowedTogether relations, with License Y linked via sameAs). Legend: project, vulnerability, and license nodes; inferred transitive dependency links; established traceability links (our approach).]

Figure 2.2: Overview of motivating scenario #2 - Integrating build information and knowledge from heterogeneous software repositories
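The following SPARQL sketch illustrates how such cross-artifact reasoning could be phrased once the traceability links of Figure 2.2 are in place. The sevont: and sbson: vocabularies shown are illustrative placeholders for the ontologies introduced in Chapters 4 and 6; owl:sameAs is the standard OWL identity property used for the established traceability links:

    PREFIX owl:    <http://www.w3.org/2002/07/owl#>
    # Illustrative placeholder namespaces:
    PREFIX sbson:  <http://example.org/sbson#>
    PREFIX sevont: <http://example.org/sevont#>

    # Flag every project that directly or transitively depends on a
    # component for which a vulnerability has been reported.
    SELECT DISTINCT ?client ?vulnerability
    WHERE {
      ?vulnerability sevont:affects ?reportedProject .   # vulnerability model
      ?reportedProject owl:sameAs ?project .             # traceability link
      ?client sbson:hasDependencyOn+ ?project .          # dependency model
    }

Against the data sketched in Figure 2.2, the query would report C, A, F, and G as potentially affected by the vulnerability of project E; an analogous query over the license model would surface the violation introduced by project A's transitive dependency on D.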

As illustrated by the two scenarios, taking advantage of SW provides us not only with the ability to integrate distributed knowledge resources but also supports the Open World Assumption9 (OWA), which must hold when modeling and analyzing these resources in order to deal safely with incomplete data. That is, the lack of information cannot be used to infer further knowledge, in contrast to most existing source code analysis approaches, which are based on the closed world assumption [24]. For example, in Figure 2.2, we do not have any established traceability link between project F's instance in the dependency model and the vulnerability model. This does not mean project F has no security vulnerabilities; we simply cannot infer that fact at the moment. Also, using the Semantic Web, we can not only safely deal with incomplete data, but also support incremental knowledge population and take advantage of inference services provided by SW [25], [26]. One of the objectives of our approach is to provide links and inferences between existing knowledge resources and to seamlessly integrate analysis results, allowing other analysis tasks to reuse already available results. For example, results of a vulnerability analysis can become an integrated (explicit) part of project-related knowledge; other services can then reuse such results as part of their analysis without re-executing the initial vulnerability analysis. Before introducing our contributions in detail, we discuss some background relevant to our work.

9 https://en.wikipedia.org/wiki/Open-world_assumption


Chapter 3

3 Background and Related Work

The work presented in this research combines different areas of Software Engineering (SE), including build systems and dependency management, MSR, and knowledge modeling. In this chapter, we provide a brief overview of core techniques, terminologies, and existing efforts in these fields that are related to our research. Readers already familiar with these concepts can safely move on to the next chapter, as cross-references are provided throughout the thesis whenever specific background information is required.

3.1 The Semantic Web in a Nutshell

Berners-Lee et al. define the Semantic Web as "an extension of the Web, in which information is given well-defined meaning, enabling computers and people to work in cooperation" [27]. In the Semantic Web, data can be processed by computers as well as by humans, including inferring new relationships among pieces of data. For machines to understand and reason about knowledge, this knowledge needs to be represented in a well-defined, machine-readable language. The Semantic Web makes use of a set of technologies, frameworks, and notations defined by the World Wide Web Consortium (W3C) to provide such a formal description of concepts, terms, and relationships within a given knowledge domain. The Semantic Web is built around a central concept: the ontology. Ontologies provide a formal and explicit way to specify concepts and relationships in a domain of discourse. They provide a standardized, shared vocabulary that makes knowledge easier to access, reuse, and process automatically. Classes (and subclasses) are used to model concepts in ontologies, with properties modeling the attributes of such concepts. Figure 3.1 provides an overview of the complete Semantic Web architecture and technology stack. The first (bottom) layer, URI and Unicode, comprises essential features of the existing WWW.


Uniform Resource Identifiers (URIs) allow resources (e.g., documents) to be uniquely identified, with the Uniform Resource Locator (URL) being a subset of URI. The usage of URIs is essential for a distributed internet system, as it provides understandable identification of all resources. XML is a general-purpose markup language for documents containing structured information; with its XML namespace and XML schema definitions, it provides a common syntax used by the Semantic Web.

Figure 3.1: Semantic web architecture in layers

The Semantic Web uses the Resource Description Framework (RDF) as its underlying data model to formalize meta-data as subject-predicate-object triples, which are stored in triple-stores. Triple-stores are Database Management Systems (DBMS) which model RDF data as a graph where nodes (subjects, objects) are connected through edges (predicates). The RDF Schema (RDFS) adds formal semantics to RDF to allow for a standardized description of taxonomies and other ontological constructs. RDFS defines a simple modeling language on top of RDF which includes classes, "is-a" relationships between classes and between properties, and domain/range restrictions for properties. RDFS can be used to describe taxonomies of classes and properties and to create lightweight ontologies from them.

More detailed ontologies can be created with the Web Ontology Language (OWL). OWL is derived from description logics (providing constructs such as conjunction, disjunction, and existentially and universally quantified variables) and is syntactically embedded into RDF; like RDFS, it provides additional standardized vocabulary. OWL comes in three forms – OWL Lite for taxonomies and simple constraints, OWL DL for full description logic support, and OWL Full for maximum expressiveness and syntactic freedom of RDF. RDFS and OWL have a set of defined semantics used for reasoning within ontologies and knowledge bases described using these languages. Standardized rule languages (e.g., the Rule Interchange Format (RIF) and the Semantic Web Rule Language (SWRL)) provide rules beyond the constructs available in RDFS and OWL. With these rule and logic constructs, a reasoning module can make logical inferences and derive knowledge that was previously only implicit in the data. Using OWL for the Semantic Web implies that an application can invoke such a reasoning module and acquire inferred knowledge rather than only retrieve data [28]. For querying RDF data, as well as RDFS and OWL ontologies with knowledge bases, the SPARQL Protocol and RDF Query Language (SPARQL) is available. Since both RDFS and OWL are built on RDF, SPARQL can be used to query ontologies and knowledge bases directly as well. SPARQL is a query language for RDF which attempts to match patterns in the RDF graph to find solutions [29]. In the Semantic Stack, it is expected that all semantics and rules will be executed at the layers below Proof, and the results will be used to prove deductions. A formal proof, together with trusted inputs, means that the results can be trusted. For reliable inputs, cryptographic means are to be used, such as digital signatures for verifying the origin of the sources. On top of this technology stack sit end-user interfaces and applications that take advantage of the Semantic Web infrastructure.
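As a minimal illustration of this data model (all resource names here are hypothetical), the following SPARQL Update request asserts two subject-predicate-object triples, and a subsequent SPARQL query retrieves the license of each dependency by matching graph patterns:

    # SPARQL Update request: assert two triples in the triple-store.
    PREFIX ex: <http://example.org/>
    INSERT DATA {
      ex:myProject  ex:dependsOn  ex:junit-4.12 .    # subject  predicate  object
      ex:junit-4.12 ex:hasLicense ex:EPL-1.0 .
    }

    # SPARQL query (a separate request): which license does each
    # dependency of ex:myProject carry?
    PREFIX ex: <http://example.org/>
    SELECT ?dependency ?license
    WHERE {
      ex:myProject ex:dependsOn  ?dependency .
      ?dependency  ex:hasLicense ?license .
    }

With a reasoner loaded, the same query mechanism also returns inferred matches (e.g., triples entailed by subsumed or transitive properties), rather than only explicitly asserted ones.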

Linked data [30], [31] is a by-product of the Semantic Web. It was introduced to ease data sharing and integration in distributed environments and has been argued to be superior to XML-based approaches [32], [33]. Linked data is mainly about publishing structured data in RDF using URIs, rather than focusing on the ontological level or inferencing. Linked Data best practices have led to the extension of the Web with a global data space which allows for connecting data from diverse domains, such as online communities and statistical and scientific data. Linked data enables both humans and machines to interpret data for mining, searching, and analysis purposes. Each entity in the domain of discourse must have a unique identifier (UID) in the form of a URI (Uniform Resource Identifier). Linked data mandates that URIs are de-referenceable, making information inter-linkable and available online. That is, clients (i.e., humans and machines) must be able to fetch resource-related data via its URL (with the http:// prefix). Using an HTTP header, a client specifies the desired output format: HTML or RDF/XML.

Figure 3.2: State of the LOD cloud10

3.2 Ontologies in Software Engineering

Representing software in terms of knowledge rather than data, ontologies provide better support for representing the semantics of software [27] than relational databases, where sharing and reuse of schemata are not natively supported. Semantic Web meta-models are extensible, allowing the addition of new knowledge without affecting existing knowledge. This contrasts with relational databases, where extending the schema is a time-consuming operation that often affects the complete database (e.g., changing a foreign key index type might require dropping and recreating several other dependent database indices). Among the other benefits identified by [34] is that the Semantic Web makes relations and their meaning explicit. Relational databases lack a consistent method for obtaining the semantics of a relation; a query can therefore join any two table columns if their datatypes match – no interpretation of the meaning of the relation is performed. As a result, relational databases are not machine-interpretable, and the inference of knowledge (explicit or implicit) requires human interaction. Linking data is also a vital property of the Semantic Web, with resources identified by their Uniform Resource Identifier (URI). These URIs allow for consistent identification of the same resource across various knowledge resources. This contrasts with relational databases, where resources are local rather than universal, which restricts the ability of relational databases to establish resource links outside their local schema.

Given the current diversity in technologies and software development processes, produced software artifacts are often disconnected from each other. With the rate at which software project artifacts become available in (online) repositories, a common issue faced by programmers is the need to locate knowledge relevant to their specific development task. While the MSR community has made significant progress in analyzing individual repositories by introducing proprietary mining techniques, it has yet to address the issue of seamlessly integrating these knowledge resources [34]. Several approaches to establish taxonomies for software engineering through ontologies have been presented recently to describe the domain knowledge of developers, source code, and other software artifacts. The common goal of these approaches is to foster reuse and support the automatic inference of new knowledge. For example, in requirements engineering, ontologies have been used to support requirement management [35], traceability [36], and use case management [37]. In the software testing domain, KITSS [38] is a knowledge-based system that can assist in converting a semi-formal test case specification into an executable test script. For the software maintenance domain, Ankolekar et al. [39] provide an ontology to model software, developers, and bugs. The authors developed a prototype Semantic Web based system, Dhruv, which provides an enhanced semantic interface to bug resolution messages and recommends related software objects and artifacts for the OSS community. Ontologies have also been used to describe the functionality of components using a knowledge representation formalism that allows more convenient and powerful querying. For example, the KOntoR [40] system stores semantic descriptions of components in a knowledge base and supports the semantic querying of this knowledge. In [41], Jin et al. discuss an ontological approach of service sharing among program comprehension tools. Hyland-Wood et al. [42] proposed an OWL ontology of software engineering concepts, including classes, tests, metrics, and requirements. Bertoa et al. [43] focused on software measurement. Witte et al. [44] used text mining and static code analysis to map documentation to source code in RDF for software maintenance purposes. Yu et al. [45] also represented static source code information using an OWL ontology and used SWRL rules to infer common bugs in the source code. Several researchers have described software evolution artifacts extracted from existing software repositories as OWL ontologies. Their approaches integrate different artifact sources to facilitate everyday repository mining activities. Kiefer et al. presented EvoOnt [46], an integration of a code ontology model, a bug ontology model, and a version ontology model used to detect bad code smells and extract data for visualizing changes in code over time. Iqbal et al. presented their Linked Data Driven Software Development (LD2SD) methodology [47] to provide RDF-based access to JIRA bug trackers, Subversion, developer blogs, and project mailing lists. Wursch et al. presented SEON [34], a family of ontologies that describe many different facets of a software's lifecycle. SEON is unique in that it comprises multiple abstraction layers. Like SEON, our approach organizes ontologies in consecutive layers of abstraction, each with a clear representational purpose. We also extend existing source code ontologies and introduce a taxonomy for describing dependency management semantics. Due to the uniform RDF format used by these approaches, we can envision interesting interactions between our semantics-aware analysis and the ontologies introduced by others. Such extensions could lead to an entirely new family of software analysis services, or at least simplify the implementation of existing ones.

10 https://lod-cloud.net

3.3 Mining Software Repositories (MSR)

A software repository commonly refers to a persistent storage location where artifacts related to software projects and their development lifecycle are stored. Such repositories are used to record daily interactions between the stakeholders, as well as the evolutionary changes to various software artifacts. Mining Software Repositories (MSR) is a field of software engineering research which aims to analyze and provide additional insights into the data stored in these software repositories. The main goal of MSR is to make use of the historical data in these repositories and transform it into actionable data that can support various decision-making processes during software development [48]–[51]. Research has shown the importance of MSR to software development decision making in several areas, including bug identification and prediction [52], [53], understanding team dynamics [54], [55], improving user experience [56], and code reuse [57]. Table 3.1 provides a general overview of software repositories and how the MSR community mines them to derive actionable facts.

Table 3.1: Examples of Software Repositories

Source code repositories – These repositories archive the source code for a large number of projects. Sourceforge11 and GitHub are examples of such large code repositories. Example MSR applications: source code differencing and analysis [58]; factors for successful software reuse [59]; inter-project collaboration [60], [61].

Bug/Issue repositories – These repositories track the resolution history of bug reports or feature requests that are reported by users and developers of large software projects. Bugzilla12 and Jira13 are examples of bug repositories. Example MSR applications: relationship between bugs/features [62]; automated bug assignment [63].

Archived communications – These repositories track discussions about various aspects of a software project throughout its lifetime. Mailing lists, emails, IRC chats, and instant messages are examples of archived communications about a project. Example MSR applications: why developers join and leave a project [54]; immigration in open source systems [55].

Version control repositories – These repositories record the development history of a project. They track all the changes to the source code along with meta-data about each change, e.g., the name of the developer who performed the change, the time the change was performed, and a short message describing the change. Source control repositories are the most commonly available and used repository in software projects. GitHub and BitBucket14 are examples of version control repositories used in practice. Example MSR applications: change prediction [64]–[66]; call-usage patterns [67], [68]; change patterns [69]; characteristics of different types of changes [70]; incomplete refactoring [71]; code search [72]; clone detection [73].

Programming Question and Answer (Q&A) repositories – These repositories allow developers to get help with their code by posting questions and answering each other's questions. They keep track of all questions and answers, as well as meta-data about users and votes. StackOverflow and CodeProject15 are examples of programming Q&A repositories. Example MSR applications: predicting how long a question will remain unsolved [74]; finding a good code example [75]; study on personality traits of Q&A users [76]; developer interactions (Wang, Lo, and Jiang 2013; Barua, Thomas, and Hassan 2014).

11 https://sourceforge.net/
12 https://www.bugzilla.org/
13 https://www.atlassian.com/software/jira
14 https://bitbucket.org/

3.4 Build Systems and Dependency Management

Build systems transform the source code of a software system into deliverables. There are several build technologies available for developers, and they adopt different design paradigms [79]. The four most common build paradigms as defined by [80] are:

i. Low-level technologies. These require explicitly defined dependencies between each input and output file (e.g., Make16, Ant17).

ii. Abstraction-based technologies. These use high-level abstractions to automatically generate low-level build specifications; this addresses the portability flaw of platform-specific low-level technologies (e.g., CMake18).

iii. Framework-driven technologies. These favor build conventions over configuration. Such build technologies assume that if projects abide by these conventions, then build behavior can be inferred automatically (e.g., Maven).

iv. Dependency management technologies. These support the three paradigms above by automatically managing external API dependencies, so that users no longer need to manually install all required versions of libraries (e.g., Ivy19).

Despite the different design paradigms, all build systems capture the build process – a process by which software can be incrementally rebuilt, allowing developers to focus on making source code changes rather than having to worry about managing a project’s build dependencies. Build processes can be split into four steps [79]. First, a set of user or environment features is

15 https://www.codeproject.com/ 16 https://www.gnu.org/software/make/ 17 https://ant.apache.org/ 18 https://cmake.org/ 19 https://ant.apache.org/ivy/

selected during the configuration step. Next, the construction step executes the compiler; code transformation commands that produce deliverables are executed in an order such that dependencies among them are not violated. The certification step follows, automatically executing tests to ensure that the produced deliverables have not regressed. Finally, the packaging step bundles certified deliverables together with required libraries, documentation, and data files. These steps and information on all needed dependencies are defined in one or multiple build files and stored in specialized build repositories to facilitate reuse and sharing. The most popular build repositories for open source projects include Maven Central, npm, PyPi20, and RubyGems.

Since transforming source code into a usable artifact is the main goal of build systems, source code evolution may act as a catalyst for the evolution and maintenance of build systems. Adams et al. [81], [82] and Godfrey et al. [83] studied the static evolution of the Linux kernel build system, which is implemented using make. They found that the Linux kernel build system is growing exponentially in terms of Build Lines of Code (BLOC). Furthermore, the build and source code appear to grow or shrink together, suggesting that source code and build system co-evolve. McIntosh et al. [84] further show that this co-evolution imposes an overhead on the development process. They examine how frequently source code changes require build changes and the proportion of developers responsible for build maintenance. Their results indicate that build changes induce more churn on the build system than source code changes induce on the source code. Furthermore, build maintenance yields up to a 27% overhead on source code development and a 44% overhead on test development, with up to 79% of source code developers and 89% of test code developers being significantly impacted by build maintenance.

Although source code and build systems co-evolve, the complex nature of build systems makes it difficult to identify when source code changes require accompanying build changes (build co-changes). McIntosh et al. [6] mined random forest classifiers from historical data using language-agnostic and language-specific code change characteristics to explain when code-accompanying build changes are necessary. Their results suggest that most C++ build changes and at least the code-related Java build changes can indeed be predicted using

20 https://pypi.org/

characteristics of corresponding changes to source and test code. Xia et al. [7] also propose an approach which predicts when such build co-changes are necessary. Their approach, however, also considers the "cold-start" problem for new projects, for which only a limited number of changes can be mined; they use training data from other projects to predict build co-changes in a new project (transfer learning).

Beyond the study of build evolution and maintenance, several researchers have demonstrated how mining build repositories can benefit a variety of software tasks, such as identifying license compliance inconsistencies, identifying build cloning, and automatic library recommendation and migration. In [5], the authors propose an approach to construct and analyze the system calls that occur at build-time to study license violations. A concrete build dependency graph is created by tracing OS calls made by the build tools during execution. This makes it easy to identify which source files are being used, which external components are called, and how the code and components are combined. By labeling each source file node in the graph with its corresponding license, license violations can be identified. McIntosh et al. [8] study how much cloning occurs in build systems and whether these clones are affected by technology choices. They gauge cloning rates in build systems by collecting and analyzing a benchmark comprising 3,872 build systems. Their results reveal that build systems tend to have higher cloning rates than other software artifacts, and that recent build technologies are often more prone to cloning, especially of configuration details like API dependencies, than older technologies.

Another interesting application of build system knowledge is in automatic library recommendation and migration. With the growing rate at which third-party libraries are reused, dependency management has become a feature adopted in most current build systems. However, these systems lack support for library recommendations that would guide developers in selecting which library (and which of its versions) to use. Mileva et al. [2] propose an approach which uses historic trends of library usage within the Maven Central repository to recommend the most commonly used library as the most suitable to adopt. Teyton et al. [16] mine the Maven Central repository to build migration graphs for different categories of libraries. With these graphs, one can quickly identify which libraries are the best candidates to migrate to. When recommending library adoptions and migrations, backward compatibility becomes a very desirable trait. One way to inform library users of the level of compatibility of a library is through its version number – a

major.minor.patch versioning scheme suggested by semantic versioning21. Raemaekers et al. [85] analyze a dataset of 150,000 Maven libraries to determine if these version numbering rules are adhered to. Their results show that there has been only a marginal increase in the adoption of this scheme over time. The impact of not adopting these versioning rules is highlighted by their results: a third of all releases introduce at least one breaking change. The authors concluded that version numbers currently do not provide developers with enough information on the stability of library interfaces.

As discussed earlier, build systems are an essential part of software systems. Among others, they control variability and manage configurations, deciding which files and features to include in the compiled product. Many tools have been introduced to extract this configuration knowledge to analyze and maintain highly configurable systems. However, with the increasing number of configuration options and the complexity of build systems, build scripts also become more complex, making it harder to understand, analyze, and maintain a build system. In this section, we discuss related research approaches which focus on the extraction and analysis of configuration in build systems in terms of file presence conditions and conditional parameters.

Most of the reviewed analysis approaches are dynamic; they derive their analysis data from the execution of the build scripts. For example, van der Burg et al. [5] dynamically detect which files are included in a build to check license compatibility, Metamorphosis [86] dynamically analyzes build systems to migrate them, and MkFault [87] combines runtime information with some structural analysis to localize build faults. However, such dynamic approaches can only analyze the file presence conditions of one configuration at a time. While a dynamic analysis which involves the execution of all possible configurations would yield accurate variability information, such an approach does not scale. Alternatively, researchers have also applied different types of static analysis to build files. SYMake [88] uses symbolic execution to conservatively analyze all possible executions of a Makefile. This approach produces a symbolic dependency graph, which represents all possible build rules and dependencies among targets and prerequisites, as well as recipe commands. It can be used to detect different types of errors in Makefiles and to help build refactoring tools. Dietrich et al. [89] sample a subset of configurations, trying to activate each configuration

21 http://semver.org/

option once. Their approach is simple due to its sampling nature, but incomplete; it cannot recover complex conditions with several disjunctions and negations. Using a different strategy, both Berger et al. [90] and Nadi and Holt [91] have tried to statically approximate file presence conditions by detecting specific patterns in build scripts used in Linux's Kbuild infrastructure. These approaches achieve relatively high precision for the Linux kernel but are unable to cope with build files (or parts thereof) that do not follow these patterns. The work in [92] builds on SYMake and is specifically interested in extracting variability information in terms of file presence conditions and conditional parameters. In this approach, the authors use symbolic execution, which does not rely on sampling or specific patterns.

3.5 Chapter Summary

In this chapter, we provided background information for core technologies and concepts used in our research and reviewed the existing research in these areas. We will frequently refer to this chapter in subsequent chapters. In the next chapter, we discuss in more detail the knowledge engineering process we applied to create our unified ontological representation for build and dependency management semantics. This unified representation provides us with the foundation for our seamless integration of dependency knowledge into existing SE development tasks.


Chapter 4

4 A Unified Ontology-based Modeling Approach for Software Build and Dependency Repositories

As mentioned before, the overall goal of this thesis is to introduce a novel semantic software build and dependency knowledge model, which allows for knowledge integration with other software artifacts and supports novel knowledge-driven dependency analysis services. More specifically, we introduce an ontology for the domain of build and dependency management systems which supports reasoning and the inference of new knowledge. In addition, the expressiveness and flexibility of our ontology allow for knowledge reuse and sharing, and a seamless integration of build dependency knowledge into existing SE development tasks. In this chapter, we explain the knowledge engineering methodology which we applied to the construction of our unified knowledge model and the design decisions we made to address some of the open research challenges identified in our research motivation (Chapter 2).

4.1 Introduction

As discussed in Section 3.4, build systems adopt different design paradigms, structures, and syntax to transform source code using user-defined build processes. Common to these build systems is a set of core semantics. One of the main objectives of this research is to abstract and formally model the domain of build and dependency management in a technology-independent representation. In this chapter we introduce a semi-automated approach for the development of a software build dependency domain ontology, which is based on the discovery, reuse, and integration of knowledge from existing build repositories. More specifically, our methodology

takes advantage of SW technologies to provide a standardized and unified representation of build dependency semantics. The proposed model adheres to the following design criteria proposed by [93], [94]:

• Unambiguous Semantics. The primary motivation for using ontologies over other modeling approaches is to enrich information with semantics. The absence of clear semantics may otherwise lead to diverging interpretations of the intended meaning. Formalism, through defining concepts with logical axioms, is the means to this end. To the best of our knowledge, there currently exists no semantic vocabulary for describing build and dependency management systems; the knowledge model presented in this thesis is the first formal semantic vocabulary developed for the build and dependency management domain.

• Extendibility. Our model design considers easy extensibility of our ontologies; the addition of new concepts does not require the revision of existing definitions.

• Reasoning and Inferencing. Our ontology design provides support for basic semantic reasoning and inferencing (e.g., RDFS++ reasoning). The model supports different types of reasoning within and across the ontology in order to support a seamless integration of knowledge resources at different abstraction levels. Instead of building our model based on general inferencing, we use lightweight reasoning such as the Open World Assumption, classification, transitivity, and consistency checking. Compared to general inferencing, these sustain the scalability and tractability of our model [93] (see the sketch following this list). Details of the reasoning capabilities supported by our model can be found in Chapters 5 to 7.
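To illustrate the kind of lightweight inference we rely on, consider the following minimal sketch using the Python rdflib library (the release names and the in-memory data are illustrative; the full namespace URIs are listed in Appendix A). A SPARQL 1.1 property path yields the transitive closure of dependsOn at query time, without invoking a full OWL reasoner:

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF

    BUILD = Namespace("http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/build.owl#")

    g = Graph()
    g.bind("build", BUILD)
    # Three illustrative releases forming a dependency chain: A -> B -> C.
    for name in ("releaseA", "releaseB", "releaseC"):
        g.add((BUILD[name], RDF.type, BUILD.BuildRelease))
    g.add((BUILD.releaseA, BUILD.dependsOn, BUILD.releaseB))
    g.add((BUILD.releaseB, BUILD.dependsOn, BUILD.releaseC))

    # The '+' property path follows dependsOn one or more times.
    query = """
        PREFIX build: <http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/build.owl#>
        SELECT ?dep WHERE { build:releaseA build:dependsOn+ ?dep }
    """
    for row in g.query(query):
        print(row.dep)  # ...#releaseB, then ...#releaseC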

4.2 Software Build System ONtology (SBSON): Knowledge Modeling and Engineering

Different knowledge engineering methodologies have been discussed in the literature (e.g., Noy et al. [95], Van der Vet et al. [96], and Uschold et al. [97]). Noy et al. [95], in their knowledge-engineering approach for ontology development, proposed the following seven core steps: (1) determining the domain and scope of the ontology, (2) considering the reuse of existing


ontologies, (3) enumerating essential terms in the ontology, (4) defining the classes and class hierarchy, (5) defining the properties of class-slots, (6) defining the facets of the slots, and (7) creating instances. Van der Vet et al. [96] proposed a bottom-up approach for building ontologies. Their approach depends on atomism, that is, the idea that objects are composed of indivisible units called “atoms.” They use part-whole relations to group basic concepts into “superconcepts”.

[Figure: the five steps of the methodology — (1) acquisition of dependency semantics from the data sources of build and dependency systems; (2) derivation of initial system-specific ontologies; (3) ontology abstraction, mapping, and refinement, using an FCA-based concept lattice (K := (O, A, R)) to derive the concept hierarchy; (4) ontology population (SBSON); and (5) ontology evolution, including rule eliciting and deployment.]

Figure 4.1: An overview of our knowledge modeling methodology.

Our methodology, which consists of five major steps (Figure 4.1), is based on the methodology introduced by Noy et al. [95] and a bottom-up knowledge modeling approach similar to the one used by Van der Vet et al. [96]. We first perform a manual review of the documentation from selected build and dependency management systems and their repository structure to identify and extract concepts and properties used by the individual build dependency management systems. Next, in Step 2, we manually inspect these extracted concepts and properties for each build system to derive an initial version of the corresponding system-specific ontologies. After creating these system-specific ontologies, Step 3 uses a bottom-up approach to identify and extract shared concepts and attributes from these system-specific ontologies into different layers of abstraction (upper ontologies). We then further refine and enrich the initial design of these ontologies by adding additional relations and properties, so that the model semantics is rich enough to allow for the inference of knowledge using basic SW reasoning (RDFS++). We then

populate, in Step 4, this newly created knowledge model with facts from projects published in open source build repositories. Finally, during the last step of our methodology, we evolve our ontologies with new build and dependency management systems and concepts as they become available. The outcome of this modeling process is a comprehensive ontology that captures the domain of build and dependency knowledge. The final layered model is based on a meta-meta model approach (e.g., Object Management Group (OMG)22), where the top layer captures the core elements, which are extended and refined throughout the abstraction hierarchy. Figure 4.2 presents an overview of the different ontology abstraction layers in SBSON. For a complete description of these ontologies, we refer the reader to [98].

[Figure: the four abstraction layers — General Concepts (concepts, relations, and attributes); Domain-Spanning Concepts (e.g., measurements); Domain-Specific Concepts (build and dependency systems); and System-Specific Concepts (Gradle, Maven, Ant, Ivy).]

Figure 4.2: An overview of the different ontology abstraction layers in SBSON.

Within our knowledge hierarchy, the General Concepts layer captures the omnipresent core concepts related to software evolution. The Domain-Spanning Concepts layer builds upon the

22 http://www.omg.org/


General Concepts layer and captures concepts that span across several subdomains in our model (e.g., vulnerability databases, version control systems, and source code). Concepts within this layer are introduced in later chapters when other SE knowledge sources are integrated with SBSON. The concepts at the Domain-Specific layer are common to resources in a domain, such as software build and dependency concepts. Finally, the System-Specific layer's concepts represent knowledge that is specific to a given data source or system and not commonly shared across the domain. In Chapters 5 to 8, we discuss in detail how SBSON can be integrated with other SE knowledge sources such as source code, version control systems, and vulnerability databases. In what follows, we describe in detail the five major knowledge modeling steps which we applied in our approach.

4.2.1 Step 1: Acquisition of Dependency Semantics

Most build systems are based on a formalized syntax and structure, which can be further customized through configurations. With this in mind, we conducted a survey of three popular Java build management systems from different vendors which make use of the same build repository, Maven Central, to store and resolve project dependencies. We are especially interested in how different dependency management features are implemented in each studied system. An overview of these three systems is provided in Table 4.1, and general statistics of the Maven Central repository are provided in Table 4.2. It should be noted that, although we only studied systems which utilize the Maven Central repository, our knowledge modelling approach provides the flexibility to extend and evolve our ontologies with different build systems and repositories. Details of our ontology evolution step can be found in Section 4.2.5.

Table 4.1: Overview of the 3 studied build and dependency management systems

                                                     Dependency management features
ID   Name             Vendor   Default repository   Transitivity   Filtering   Version Ranges   Scope   Default resolution
S1   Ivy (with Ant)   Apache   Maven Central        YES            YES         YES              NO      Latest version
S2   Gradle           Gradle   Maven Central        YES            YES         YES              YES     Latest version
S3   Maven            Apache   Maven Central        YES            YES         YES              YES     Nearest


Table 4.2: General statistics of the Maven Central repository

Repository      Identification scheme        # Projects   # Releases   Snapshot date
Maven Central   groupId-artifactId-version   279,853      3,687,307    2019-May-07

While the surveyed systems support dependency management features such as transitivity, dependency filters (exclusions), and version ranges, only Maven (S3) and Gradle (S2) support dependency scopes. Furthermore, because the Java Virtual Machine (JVM) is unable to differentiate between multiple API versions in a project's class-path, different conflict resolution techniques are used by the analyzed systems. For example, during version conflict resolution S1 and S2 by default choose the latest version of a dependency, while S3 automatically selects the dependency version closest to the project's root definition (the version with the least transitive depth). Among other features supported by these systems are multi-module projects and inheritance of dependency configurations from parent projects.
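To make the difference between the two default strategies concrete, consider the following small Python sketch; the candidate tuples and helper functions are our own illustration and do not reproduce the actual resolver logic of any of the studied tools:

    # Each candidate: (artifact, version, transitive depth at which it was found).
    candidates = [
        ("commons-io", "2.6", 3),  # pulled in deep in the dependency tree
        ("commons-io", "2.2", 1),  # declared close to the project root
    ]

    def resolve_latest(cands):
        # Ivy/Gradle default: the highest version wins, regardless of depth.
        return max(cands, key=lambda c: tuple(int(p) for p in c[1].split(".")))

    def resolve_nearest(cands):
        # Maven default: the version with the smallest transitive depth wins.
        return min(cands, key=lambda c: c[2])

    print(resolve_latest(candidates))   # ('commons-io', '2.6', 3)
    print(resolve_nearest(candidates))  # ('commons-io', '2.2', 1)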

4.2.2 Step 2: Initial System-Specific Ontologies

Next, we manually identify and extract dependency related concepts and attribute definitions from the schemata and their documentation to create system-specific ontologies for each system. Figure 4.3 shows an overview of the three system-specific ontologies we extracted.

[Figure: each system-specific ontology models concepts such as Organization, Project, Release, Dependency, Exclusion, Dependency Scope, and Dependency Range; Maven additionally models Dependency Types (jar, pom) and scope individuals such as compile, provided, runtime, system, and test, while Gradle models scope individuals such as compile, compileOnly, runtime, testCompile, testCompileOnly, and testRuntime.]

Figure 4.3: Overview of individual system-specific ontologies for the analyzed systems.


One of the main challenges we had to deal with during this analysis step was to identify and resolve the differences in syntax and structure of similar concepts and properties in the three systems. Table 4.3 and Figure 4.4 illustrate examples of such representation differences for version ranges and dependency definitions. As shown in Table 4.3, different symbols are used by Ivy and Maven when defining open and half-open intervals23 for version ranges. For example, Ivy uses "]" to declare an open minimum version while Maven uses "(". In the next section, we describe the knowledge modeling approach which we used to remove some of this ambiguity while extracting a domain (upper) ontology from the lower-level system ontologies.

Table 4.3: Syntax differences for defining dependency version ranges

Version range                                                   Ivy syntax   Maven syntax   Gradle syntax
Exact version                                                   1.0          same as Ivy    same as Ivy
All versions greater than 1.0                                   ]1.0,)       (1.0,)         same as Maven
All versions greater than or equal to 1.0                       [1.0,)       same as Ivy    same as Ivy
All versions lower than or equal to 2.0                         (,2.0]       same as Ivy    same as Ivy
All versions lower than 2.0                                     (,2.0[       (,2.0)         same as Maven
Greater than 1.0 and lower than 2.0                             ]1.0,2.0[    (1.0,2.0)      same as Maven
Greater than 1.0 and lower than or equal to 2.0                 ]1.0,2.0]    (1.0,2.0]      same as Maven
Greater than or equal to 1.0 and lower than 2.0                 [1.0,2.0[    [1.0,2.0)      same as Maven
Greater than or equal to 1.0 and lower than or equal to 2.0     [1.0,2.0]    same as Ivy    same as Ivy
All revisions starting with '1.0.' (e.g., 1.0.1, 1.0.a)         1.0.+        n/a            same as Ivy
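When lifting these system-specific syntaxes into shared domain concepts, such notational differences must first be normalized. The following small Python helper sketches the idea for the bracket differences of Table 4.3 (the function and the choice of Maven-style notation as the canonical form are our own illustration):

    def normalize_range(expr: str) -> str:
        # Rewrite Ivy's exclusive-bound brackets into Maven-style notation,
        # e.g. ']1.0,2.0[' becomes '(1.0,2.0)'. Gradle already follows Maven here.
        out = expr
        if out.startswith("]"):
            out = "(" + out[1:]    # exclusive minimum bound
        if out.endswith("["):
            out = out[:-1] + ")"   # exclusive maximum bound
        return out

    assert normalize_range("]1.0,2.0[") == "(1.0,2.0)"
    assert normalize_range("[1.0,)") == "[1.0,)"  # already canonical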

Maven (pom.xml):

    <!-- defining a dependency -->
    <dependency>
      <groupId>org.apache.camel</groupId>
      <artifactId>camel-jaxb</artifactId>
      <version>${camel-version}</version>
      <scope>compile</scope>              <!-- scope of a dependency -->
    </dependency>
    <!-- excluding a transitive dependency -->
    <dependency>
      ...
      <exclusions>
        <exclusion>
          <groupId>org.apache.camel</groupId>
          <artifactId>camel-jaxb</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

Gradle (build.gradle):

    dependencies {
        compile 'com.example.m:m:1.0'      // defining a dependency with compile scope
        compile('com.example.l:1.0') {
            // excluding a transitive dependency
            exclude group: 'org.unwanted', module: 'x'
        }
    }

Figure 4.4: Example of syntax and structural differences between Maven (top) and Gradle (bottom) dependency definitions

23 Open intervals do not include the declared minimum and maximum allowed versions of a dependency during dependency resolution; half-open intervals include only one of the declared range endpoints.


4.2.3 Step 3: Ontology Abstraction and Refinement

In this step of our knowledge modeling approach we use the extracted system-specific ontologies to abstract a software build-dependency domain ontology. This domain ontology promotes knowledge reuse through the identification of shared concepts and properties. It also allows for the linking of system-level ontologies with each other via the abstracted shared concepts and properties found in the domain ontology. More specifically, this step identifies any concept or property that can be promoted from the System-Specific to the Domain-Specific layer of our knowledge model. For example, concepts related to transitive dependencies, dependency filtering, and version ranges can be promoted to the Domain-Specific layer since they are shared among all three system-specific ontologies. The Domain-Specific layer not only promotes such reuse of concepts across system-level ontologies, but also improves traceability among system-level ontologies by unifying the overall knowledge representation. Although the identification of shared concepts can be considered a somewhat straightforward task, reasoning capabilities are important requisites for inferring new knowledge and creating traceability links between domain and system-level ontologies. In what follows, we describe in detail how we enrich our domain and system-specific ontologies with OWL reasoning capabilities (provided by the SW) and existing ontology design patterns. More specifically, we describe the modeling of (1) dependency links, (2) the order of project releases, (3) version ranges, (4) dependency exclusions, and (5) transitive dependencies. It should be noted that, in order to improve readability, we use prefixes as substitutes for the fully qualified names of our ontologies. The ontology prefixes used in this chapter can be dereferenced using the URIs shown in Appendix A.

4.2.3.1 Modeling Dependency Links

Problem. As shown in Figure 4.5(a), a defined dependency between any two project releases can have additional associated characteristics, such as the version range of the dependency and a list of excluded transitive dependencies. Since OWL does not natively support the definition of properties on top of other properties, modelling such dependency link characteristics becomes a challenge.


Solution. To address this challenge, we adopt the property reification design pattern24. In the following, we illustrate the use of the property reification pattern to model facts about the dependency relation between two project releases. We introduce the DependencyLink concept to represent the dependency link between a source (connected via the hasDependencySource property) and a target (connected via the hasDependencyTarget property). The DependencyLink concept provides us with the flexibility of defining dependency-specific version ranges and exclusions, as shown in Figure 4.5(b). This reification design approach provides us with an extensible and expressive model that can capture different characteristics of project dependency links. However, since a dependency is now modelled by the DependencyLink class, transitive reasoning on dependencies is no longer supported by default. We mitigate this problem by adding custom rules (explained in detail in Section 4.2.3.5) which deduce transitive dependencies from the reification design pattern.

[Figure: (a) a plain dependsOn property between two Releases cannot carry extra information about the dependency (e.g., range, exclusion); (b) with property reification, a DependencyLink connects the source Release via hasDependencySource to the target via hasDependencyTarget and can carry such extra properties.]

Figure 4.5: An illustration of (a) generic dependency between two releases, and (b) how property reification pattern is adopted in modeling dependency links
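In RDF terms, instantiating this pattern amounts to only a handful of triples. The following rdflib sketch creates one reified dependency definition (the individual names, such as dep-link-001, are illustrative):

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF

    BUILD = Namespace("http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/build.owl#")
    g = Graph()

    link = BUILD["dep-link-001"]  # one reified dependency definition
    g.add((link, RDF.type, BUILD.DependencyLink))
    g.add((link, BUILD.hasDependencySource, BUILD["releaseA"]))
    g.add((link, BUILD.hasDependencyTarget, BUILD["projectB"]))
    # Because the link is itself an individual, it can carry characteristics:
    g.add((link, BUILD.hasVersionRange, BUILD["range-001"]))
    g.add((link, BUILD.hasNumberOfExclusions, Literal(0)))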

4.2.3.2 Modeling the Order of Project Releases

Problem. Software libraries use version numbers to uniquely identify their releases. These version numbers are assigned in an incremental order to define the order of releases and indicate backward compatibility (semantic versioning). In the context of dependency management, knowing the order of project releases is necessary for resolving dependencies related to version ranges. Unfortunately, the SW does not natively support ordered lists.

24 https://www.w3.org/wiki/PropertyReificationVocabulary


Solution. We address this challenge by reusing the existing OrderedList Ontology25 to model projects and the order of their releases. The OrderedList ontology, illustrated in Figure 4.6(a), consists of the olo:OrderedList and olo:Slot concepts. An ordered list is composed of a number of slots (connected via the olo:slot property). Items in an ordered list are associated with slots by the olo:item property and are accessed using the olo:next iterator property. Data properties such as olo:length and olo:index are used to represent the total number of slots in the list and the index of each slot, respectively. Figure 4.6(b) illustrates our extension of the OrderedList ontology, which assigns one ordered list to each project. The multiple releases of a project are subsequently ordered by assigning them as items to the slots of the project's ordered list. Figure 4.6(c) shows an example of a project with three ordered releases.

[Figure: (a) an olo:OrderedList is connected to its olo:Slot individuals via olo:slot; slots are chained via olo:next and hold their items via olo:item. (b) A Project is linked via hasOrderedList to an olo:OrderedList whose slot items are the project's Releases (hasRelease). (c) An example project whose three releases occupy slots with olo:index values 1 to 3 (olo:length = 3), chained via olo:next.]

Figure 4.6: (a) The OrderedList Ontology, (b) how we model the order of project releases with the OrderedList Ontology, and (c) an illustrative example of a project and its ordered releases.

25 http://smiy.sourceforge.net/olo/spec/orderedlistontology.html
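The following rdflib sketch lays out a project with three ordered releases under this design (the olo namespace URI and the individual names are assumptions made for illustration):

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF

    BUILD = Namespace("http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/build.owl#")
    OLO = Namespace("http://purl.org/ontology/olo/core#")
    g = Graph()

    project, releases = BUILD["projectA"], BUILD["projectA-releases"]
    g.add((project, BUILD.hasOrderedList, releases))
    g.add((releases, RDF.type, OLO["OrderedList"]))
    g.add((releases, OLO["length"], Literal(3)))

    previous_slot = None
    for i, version in enumerate(["1.0", "1.1", "2.0"], start=1):
        slot = BUILD[f"projectA-slot-{i}"]
        release = BUILD[f"projectA-{version}"]
        g.add((releases, OLO["slot"], slot))     # list -> slot
        g.add((slot, OLO["index"], Literal(i)))  # position of the slot
        g.add((slot, OLO["item"], release))      # slot -> release
        g.add((project, BUILD.hasRelease, release))
        if previous_slot is not None:
            g.add((previous_slot, OLO["next"], slot))  # chain the slots
        previous_slot = slot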


4.2.3.3 Modeling Version Ranges

Problem. Manually upgrading dependencies is tedious work, especially for projects which depend on frequently updated libraries. Version ranges, supported by several build and dependency management systems, are designed to enable developers to automatically upgrade their dependencies without having to adjust the version number in their build file every time a new version of the dependency is released. However, as introduced in Section 4.2.2, build and dependency management systems define version ranges with different syntaxes.

Solution. Figure 4.7 shows the integration of concepts from Figures 4.5(b) and 4.6(b) to create an effective and flexible model of dependency version ranges in our domain layer. The VersionRange concept uses data properties such as exactVersion, lowerThanVersion, and greaterThanVersion to represent the version range of a dependency link. Details on how the final dependency version is inferred are provided in Section 4.2.3.5.

[Figure: a DependencyLink has a Release as its hasDependencySource and a Project as its hasDependencyTarget, and carries a VersionRange via hasVersionRange; the target Project's releases are ordered via hasOrderedList, olo:slot, olo:next, and olo:item.]

Figure 4.7: Concepts used to model and reason on dependency version ranges
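Continuing the sketch from Section 4.2.3.1, a version range is simply another individual attached to the dependency link; the data property names below follow the rules presented in Section 4.2.3.5:

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF

    BUILD = Namespace("http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/build.owl#")
    g = Graph()

    # A "greater than 1.0" range attached to the (illustrative) link individual.
    vrange = BUILD["range-001"]
    g.add((vrange, RDF.type, BUILD.VersionRange))
    g.add((vrange, BUILD.greaterThanVersion, Literal("1.0")))
    g.add((BUILD["dep-link-001"], BUILD.hasVersionRange, vrange))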

4.2.3.4 Modeling Dependency Exclusions (Filtering)

Problem. Dependency exclusion is a feature provided by most dependency management tools as a means of dependency mediation. It enables users to explicitly exclude specific transitive dependencies when building a project. Such dependency exclusions can occur at two different levels: per-dependency or per-configuration/module. The configuration/module exclusion scope makes it possible to exclude a transitive dependency completely from all dependencies during the project build phase. The per-dependency scope only excludes a transitive dependency for the specified dependency; it is possible that another dependency would re-include that excluded

dependency. Ivy and Gradle support both levels, while Maven supports only per-dependency exclusions. In the scope of this thesis, we focus on the per-dependency scope.

[Figure: A dependsOn B and C; B dependsOn D; D dependsOn E; A's dependency definition on B excludes the transitive dependency E.]

Figure 4.8: Transitive exclusion at per-dependency scope

Figure 4.8 shows an example of a per-dependency exclusion. Project 'A' defines a dependency on 'B' but excludes the transitive dependency on 'E'. This means that during the build of 'A', project 'E' would be excluded from the transitive dependencies of 'B'. When one queries for all transitive dependencies of 'A', the result should be {B, C, D}. Since exclusions are dependency-specific, the query results should be project-specific. For example, querying the transitive dependencies of 'B' should give {D, E}, because 'E' is not excluded in any of B's dependency definitions.
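The expected semantics of this example can be stated in a few lines of Python (a plain-dictionary sketch of the graph in Figure 4.8; over RDF, the SWRL rules of Section 4.2.3.5 realize the same behavior):

    # Each node maps to its direct dependencies, each paired with the set of
    # projects excluded on that particular dependency definition.
    graph = {
        "A": [("B", {"E"}), ("C", set())],
        "B": [("D", set())],
        "C": [],
        "D": [("E", set())],
        "E": [],
    }

    def transitive(root):
        # Collect transitive dependencies, honouring per-dependency exclusions.
        seen = set()
        def walk(node, excluded):
            for dep, excl in graph[node]:
                if dep in excluded or dep in seen:
                    continue
                seen.add(dep)
                walk(dep, excluded | excl)  # exclusions propagate down this edge
        walk(root, set())
        return seen

    assert transitive("A") == {"B", "C", "D"}  # E is excluded via A's definition
    assert transitive("B") == {"D", "E"}       # E is not excluded for B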

Solution. Like the approach used in modeling dependency version ranges, we again use the DependencyLink concept from the property reification pattern (see Figure 4.5(b)) to define any dependency-level exclusions of projects or releases through the excludesProject and excludesRelease properties. Figure 4.9 shows an overview of the concepts and relationships involved in this refined model design.

[Figure: a DependencyLink connects Releases via hasDependencySource and hasDependencyTarget; a Project hasRelease its Releases; per-dependency exclusions are expressed on the DependencyLink via excludesProject and excludesRelease.]

Figure 4.9: Concepts used to model and reason on dependency exclusion


In what follows, we describe how direct and transitive dependencies can be inferred using the dependency version range and exclusion designs we have presented so far.

4.2.3.5 Reasoning on Direct and Transitive Dependencies

Problem. As discussed in Sections 4.2.3.3 and 4.2.3.4, traditional build and dependency management systems allow developers to specify a version range for direct dependencies and to exclude unwanted transitive dependencies. The possibility of version ranges and excluded transitive dependencies in a project's definition makes the automatic resolution of direct and transitive dependencies a non-trivial task. Our modelling approach, using semantic rules and ontology design patterns, offloads much of this challenge (reasoning about dependency resolution) to the SW reasoners. However, as introduced in Section 4.2.3.1, modelling dependency links as an OWL class instead of a property removes the standard support for transitive reasoning on dependencies.

Solution. We introduce custom SWRL rules which take advantage of the scalable reasoning services (e.g., RDFS, RDFS++) provided by the SW stack to reason on dependency resolution. To distinguish between direct and inferred transitive dependencies, we introduce two new properties, hasDirectDependencyOn and hasTransitiveDependencyOn. In what follows, we describe in detail the rules we created that allow us to reason on direct dependencies based on version ranges, and on transitive dependencies.

Direct Dependency Reasoning. To allow for the automatic resolution of direct dependencies in our approach, we define three rules which infer the correct dependency version instance to assign as the object of the hasDirectDependencyOn property. The rules are based on the following version ranges: exact versions (Figure 4.10), versions lower than a specified value (Figure 4.11), and versions greater than a specified value (Figure 4.12). The rules take advantage of the ordered list pattern (see Figure 4.6(b)) and the dependency link reification pattern (see Figure 4.5(b)) to allow for the inference of the final dependency version to be used.

Transitive Dependency Reasoning. The goal of this reasoning service is to provide flexible and scalable inference of transitive dependencies in the absence or presence of dependency exclusions. Since SWRL does not allow for Negation as Failure26, rules such as Person(?p) ^ ¬

26 https://github.com/protegeproject/swrlapi/wiki/SWRLLanguageFAQ#Does_SWRL_support_Classical_Negation

hasCar(?p, ?c) → CarlessPerson(?p) are not allowed. Only individuals with an explicit OWL axiom stating that they have no car can safely be concluded to be without a car: Person(?p) ^ (hasCar = 0)(?p) → CarlessPerson(?p). Therefore, to reason about the absence or presence of dependency exclusions, a hasNumberOfExclusions data property is assigned to the DependencyLink concept to store the total number of excluded dependencies defined for a given project dependency. Using this property, the rules in Figures 4.13 and 4.14 infer transitive dependencies in the absence and presence of exclusions, respectively.

1 DependencyLink(?link), 2 hasDependencySource(?link, ?release1), 3 hasDependencyTarget(?link, ?project2), 4 hasVersionRange(?link, ?range), 5 exactVersion(?range, ?version), 6 hasRelease(?project2, ?release2), 7 hasVersionNumber(?release2, ?version) 8 → hasDirectDependencyOn(?release1, ?release2). Figure 4.10: Inferring DirectDependencyOn based on an “exact” version range

1 DependencyLink(?link), 2 hasDependencySource(?link, ?release1), 3 hasDependencyTarget(?link, ?project2), 4 hasVersionRange(?link, ?range), 5 lowerThanVersion(?range, ?version), 6 hasRelease(?project2, ?release), 7 hasVersionNumber(?release, ?version), 8 hasOrderedList(?project2, ?list), 9 slot(?list, ?slot1), 10 slot(?list, ?slot2), 11 item(?slot1, ?release), 12 item(?slot2, ?release2), 13 index(?slot1, ?index1), 14 index(?slot2, ?index2), 15 swrlb:subtract(?index2, ?index1, 1) 16 → hasDirectDependencyOn(?release1, ?release2). Figure 4.11: Inferring DirectDependencyOn based on a “lower than” version range


1 DependencyLink(?link), 2 hasDependencySource(?link, ?release1), 3 hasDependencyTarget(?link, ?project2), 4 hasVersionRange(?link, ?range), 5 greaterThanVersion(?range, ?version), 6 hasRelease(?project2, ?release), 7 hasVersionNumber(?release, ?version), 8 hasOrderedList(?project2, ?list), 9 length(?list, ?len), 10 slot(?list, ?slot1), 11 slot(?list, ?slot2), 12 item(?slot1, ?release), 13 item(?slot2, ?release2), 14 index(?slot1, ?index1), 15 index(?slot2, ?len), 16 swrlb:greaterThan(?len, ?index1), 17 → hasDirectDependencyOn(?release1, ?release2). Figure 4.12: Inferring DirectDependencyOn based on a “greater than” version range

1 DependencyLink(?link), 2 hasDependencySource(?link, ?release1), 3 hasDependencyTarget(?link, ?release2), 4 dependsOn(?release2, ?release3), 5 hasNumberOfExclusions(?link, 0) 6 → hasTransitiveDependencyOn (?release1, ?release3). Figure 4.13: Inferring hasTransitiveDependencyOn in the absence of exclusions

1 DependencyLink(?l), 2 hasDependencySource(?l, ?r1), 3 hasDependencyTarget(?l, ?r2), 4 dependsOn(?r2, ?r3), 5 hasNumberOfExclusions(?l, ?exclusions), 6 swrlb:greaterThan(?exclusions, 0), 7 excludesProject(?l, ?p1), 8 hasRelease(?p2, ?r3), 9 owl:differentFrom(?p1, ?p2) 10 → hasTransitiveDependencyOn (?r1, ?r3). Figure 4.14: Inferring hasTransitiveDependencyOn in the presence of exclusions


4.2.3.6 A Unified Knowledge Representation

The result of our modeling process is SBSON, which describes knowledge from build and dependency management systems using different levels of modeling abstraction. Figure 4.15 shows the core concepts and object properties of our model. It should be noted that data properties have been omitted to improve the readability of the figure.

[Figure: in the general layer (main.owl), a Project hasRelease Releases; Projects, Releases, Files, and Organizations are Artifacts, and Artifacts can be measured. The domain-spanning layer (measurement.owl) contributes Measure and Measurement, linked via measures and hasMeasure. The domain-specific layer (build.owl) contributes BuildProject and BuildRelease, the reified DependencyLink (hasDependencySource, hasDependencyTarget, excludesProject, excludesRelease, hasVersionRange, dependsOn), VersionRange, DependencyScope, and DependencyType, and orders releases via olo:OrderedList and olo:Slot. System-specific scope and type individuals (e.g., compile, runtime, test, provided, jar, pom, compileOnly, testCompile, testCompileOnly, testRuntime) are modeled in gradle.owl, maven.owl, and ivy.owl.]

Figure 4.15: Overview of the concepts and (object) properties in the unified SBSON family of ontologies

A key concept in our knowledge model is the BuildProject concept (domain-specific level), which is a subclass of the Project concept (general layer). Build releases model distributed releases of software projects and are captured by the BuildRelease concept (domain-specific level). These build releases are stored in online build repositories such as Maven Central. Multiple releases of a project are ordered using Slots in an olo:OrderedList (domain-spanning level). In our modeling approach, build releases define their dependencies on other releases using a DependencyLink (domain-specific level). Special characteristics of a dependency link are represented using the VersionRange and DependencyScope concepts, both being domain-specific, and the DependencyType concept at the system level. The scope of dependencies, as well as the dependency types, are specific to a build system and are therefore modeled in the individual system ontologies.

4.2.4 Step 4: Ontology Population

In this step, we describe how the knowledge extracted from the Maven Central repository is automatically transformed into semantic triples based on the RDF framework. The transformation and population process relies on the generation of unique, de-referenceable, and HTTP-resolvable URIs for the resulting triples. Figure 4.16 shows an example of a triple that is generated for an instance of a direct project dependency. Each generated URI contains a base URI, followed by the SBSON layer, knowledge version, and ontology to which that fact belongs. This is followed by the annotation ID, which identifies whether a given URI represents a semantic type (e.g., hasDirectDependencyOn) or a populated individual (e.g., commons-fileupload:commons-fileupload:1.4).
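Under this URI scheme, the population step can be sketched with rdflib as follows; the gav() helper is our own illustration of how release URIs are minted from Maven coordinates:

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF

    BUILD = Namespace("http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/build.owl#")

    def gav(group, artifact, version):
        # The annotation ID of a populated individual is its G:A:V coordinate.
        return BUILD[f"{group}:{artifact}:{version}"]

    g = Graph()
    g.bind("build", BUILD)
    subject = gav("commons-fileupload", "commons-fileupload", "1.4")
    obj = gav("commons-io", "commons-io", "2.2")
    g.add((subject, RDF.type, BUILD.BuildRelease))
    g.add((obj, RDF.type, BUILD.BuildRelease))
    g.add((subject, BUILD.hasDirectDependencyOn, obj))
    print(g.serialize(format="turtle"))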

Each URI consists of a protocol, a base URI, the SBSON layer, a versioning component, the ontology name, and an annotation ID:

Subject:   <http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/build.owl#commons-fileupload:commons-fileupload:1.4>
Predicate: <http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/build.owl#hasDirectDependencyOn>
Object:    <http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/build.owl#commons-io:commons-io:2.2>

Figure 4.16: Anatomy of the URI of a generated triple

4.2.5 Step 5: Ontology Evolution

The last step of our methodology reflects the fact that our knowledge modelling approach is an iterative process, with our ontologies evolving as additional build and dependency

management systems are included in our knowledge model. The addition of new system-specific ontologies can lead to changes in the existing abstracted domain ontology. In addition to the inclusion of new concepts and properties in the domain ontology to capture new dependency management features and semantics, there is the possibility that existing domain concepts and properties will be demoted to a lower layer. As discussed earlier, a key benefit of using ontologies is that they can easily be extended, since relationships and concept mappings are straightforward to add to existing ontologies. As a result, ontologies can evolve with the growth of data without impacting dependent processes and systems when something needs to be changed.

4.3 Chapter Summary

In this chapter, we presented an approach for developing an ontology-based knowledge model for build and dependency management systems (SBSON). Our approach allows us to reconcile and integrate heterogeneous build system facts from several build systems. Our knowledge modeling approach takes advantage of OWL reasoning capabilities as well as existing ontology design patterns to abstract and reuse concepts across system-level ontologies, while at the same time improving knowledge integration and reuse. In the following chapters we describe how we integrated SBSON with other SE knowledge sources (e.g., source code, vulnerability databases, version control, and software licenses) to provide a comprehensive, interlinked knowledge base that allows for novel types of analysis across artifact and project boundaries.


Chapter 5

5 A Semantic Web Enabled Approach for the Early Detection of API Breaking Change Impacts

5.1 Introduction

Global code reuse, through libraries and components developed by third parties or the open source community, has become an essential aspect of today's software development processes [2], [16]. However, as these libraries evolve to accommodate new features, bug fixes, and general code improvements, some of these library changes (often referred to as breaking changes) may break already established contracts, leading to errors and requiring rework in client applications. These library incompatibilities can also cause a ripple effect within a software ecosystem, requiring code changes in dependent client projects within the ecosystem to mitigate the impacts of these breaking changes. For library consumers, analyzing the direct dependencies used in a project often reveals only the tip of the iceberg; most of the complexity and challenges in analyzing breaking changes are caused by transitive dependencies. For example, in Figure 5.1, since P1 (unknown to the developer) depends on multiple versions of P2, there is the potential for unexpected runtime failures if the versions of P2 are backward-incompatible. Being aware of who uses their API and how can help library producers make better choices when dealing with breaking changes (e.g., the API producers of P2 can analyze the potential impact of API changes on clients such as P1 and P6 prior to committing to these changes).

[Figure: P1 declares direct dependencies on P2, P3, and P4; these in turn pull in transitive dependencies such as P5, P6, P7, and, unknown to the developer, another version of P2.]

Figure 5.1: The hidden complexity of breaking changes due to transitive dependencies

Previous studies have shown that breaking changes are frequent, and that many library producers and consumers are aware of this fact [99]–[102]. Several solutions have been proposed to help mitigate the impact of breaking changes. From a library producer's perspective, different strategies can be adopted to either shift or delay the cost associated with API changes. A study by [18] on the Eclipse, Node.js/npm, and R/CRAN ecosystems identified four major strategies: maintaining old interfaces, having parallel releases, release planning, and communication with users. The strategy adopted by an individual ecosystem depends on its overall development objective and the cost associated with applying a mitigation strategy. For example, in the Eclipse community, developers prefer to incur higher costs to maintain Eclipse's backward-compatibility with older releases; developers can extend interfaces by creating new interfaces and deprecate older interfaces without having to remove them [18]. However, each mitigation strategy has its own challenges, such as incurring additional maintenance overhead or introducing opportunity costs (technical debt). Also, the potential impact of a breaking change on library consumers is an important factor when deciding on a mitigation strategy. Similarly, different approaches for consumers of software libraries have been introduced to support library migration. These approaches are based on comparing two library versions (e.g.,

using lexical comparison of method signatures [99], [103]). In case there are differences, attempts can be made to reconcile them by checking whether the new functionality in a library is a code clone of previous functionality [104]; identifying how a library's use of its own API has changed [105]; identifying how other developers have migrated their code or test suites to accommodate the API change [106]; or using a combination of these techniques [99]. Unfortunately, these reconciliation approaches neither allow library producers to assess the impacts of a chosen mitigation strategy on their clients, nor do they provide developers of client applications with any support in assessing the potential risks and effort involved in their migration tasks. In our research, we introduce a novel approach to support API consumers in identifying the potential impacts of an API change on their product, and API producers in managing (assessing) the impacts of breaking changes on other projects or a complete ecosystem. More specifically, we take advantage of the Semantic Web and its technology stack (e.g., ontologies, reasoning services) to establish a unified knowledge representation that supports the analysis of both direct and indirect (transitive) third-party library usage across thousands of open source projects. Much of the flexibility of our approach is based on the use of inference services provided by the Semantic Web to infer explicit and implicit knowledge from the knowledge base. To evaluate the effectiveness of our approach in identifying the impact of breaking changes on project dependencies and complete ecosystems, we conducted a case study on Java projects available in the Maven ecosystem. Nonetheless, it should be noted that our approach is independent of the type of dependency management system or programming language being used.

5.2 Background

5.2.1 API Usage and Breaking Changes

Software libraries take advantage of visibility modifiers (e.g., public and protected in Java) to provide reusable and extendable APIs to other applications. However, as these software libraries evolve, changes made to their APIs might impact many external clients. Such changes are classified into breaking and non-breaking changes [107] as follows (a small illustration follows the list):

• Breaking changes break backward compatibility through removal or modification of API elements. Consequently, clients may face compilation errors after updating. Common


examples include the removal of classes or methods, visibility loss (e.g., public to private), and changes to a method's return type or parameters.

• Non-breaking changes preserve compatibility among interfaces and usually involve the addition of new functionality to the library. Examples include visibility gain (e.g., private to public) and deprecated method removal.
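To make this distinction concrete, the following toy Python sketch classifies a release by diffing the sets of exported API signatures of two versions (the signature strings are invented; real approaches compare the signatures of two actual library versions, as discussed in Section 5.1):

    def classify(old_api, new_api):
        removed = old_api - new_api
        added = new_api - old_api
        if removed:
            return "breaking"      # removal or modification of API elements
        if added:
            return "non-breaking"  # purely additive change
        return "no API change"

    v1 = {"Logger.log(String)", "Logger.close()"}
    v2 = {"Logger.log(String)", "Logger.log(String,Level)"}  # close() removed
    print(classify(v1, v2))  # breaking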

Although performing a change to a library might be a straightforward task, the resulting breaking changes can have a significant ripple effect on client projects, depending on how the changed API is used throughout a project. Certain API usages expose client projects to API changes more than others [108]. Wu et al. [108] categorized API usages into API-injection usages and local usages. API-injection usages occur when a library's API becomes part of a client project through inheritance, interface implementation, or the use of reference types as method return types or parameter types. For local API usages, a library's API is used within the body of a method. As a result, breaking changes in a library's API will require changes to any client method that directly or transitively uses a modified API.

5.2.2 Software Evolution ONtologies (SEON)

Figure 5.2: Overview of the SEON pyramid of ontologies [34]

The Software Evolution Ontologies (SEON), introduced by [34], provides a shared taxonomy of important software engineering concepts and demonstrates how software evolution knowledge

can be adequately represented by means of ontologies. SEON establishes a shared taxonomy for explicitly describing relationships among artifacts, and for linking data such as code structures, issues (change requests), bugs, and basically any changes made to a system over time. SEON is constructed following a bottom-up approach which iteratively abstracts concepts found in common software evolution analyses and tools into different layers. Figure 5.2 presents an overview of the different layers of SEON.

5.3 A User Survey on the Impact of API Breaking Changes

Prior work on API breaking changes has examined different migration techniques available to API consumers (e.g., [103]–[106]). However, these existing techniques address neither how consumers can assess the risk or effort involved when selecting a migration technique, nor how producers can analyze the potential impacts of their library changes on a global software ecosystem. To gain a better understanding of how API consumers and producers currently deal with the impact of breaking changes, we conducted a survey involving open source developers. Survey participants were identified by mining developer emails from publicly available projects hosted on Maven and GitHub. We manually cleaned the list of emails to ensure that it did not contain any educational or organizational emails. We sent an invitation to 1000 randomly selected participants from the collected e-mail addresses and received a total of 54 responses, i.e., an acceptable response rate of 5.4%, which is in line with response rates reported by other software engineering surveys [109]. From these 54 responses, we excluded one response from further analysis at the explicit request of the survey participant, leaving us with 53 survey participants.

Survey Design: We designed an online survey that included four main parts. First, we asked questions related to a participant’s background and experience. We also asked participants about their preferred development ecosystems, the roles they play within it, and their experience with breaking changes as either client or producer of software libraries (Table 5.1). Finally, we

solicited their views on features which they would consider essential in helping developers to identify and manage the impacts of breaking changes. Of the 53 survey participants, 42 had more than 5 years of development experience, 10 had between 1 and 5 years, and 1 respondent had less than 1 year of experience. 23 participants identified Maven as their preferred ecosystem, 11 selected NPM, 7 chose PyPi, 2 selected Packagist, and 10 listed other ecosystems. As for their role in these ecosystems, 24 participants identified themselves as core contributors of software libraries, 22 as consumers of libraries, and the remaining 7 have contributed patches to open source libraries. Overall, the participants are quite experienced in software development and the use of open source libraries/packages.

Table 5.1: Background of survey participants

Experience (in years)   #        Ecosystem used   #        Ecosystem role      #
< 1                     1        Maven            23       Library consumer    24
1 – 2                   1        NPM              11       Core contributor    22
2 – 5                   9        PyPi             7        Patch submission    7
5 – 10                  18       Packagist        2
> 10                    24       Other            10

5.3.1 How often do developers experience breaking changes?

We asked developers to share their experience on how they have been affected by breaking changes in the past (Table 5.2). For direct breaking changes, 55.1% of the participants indicated that they experience them several times a year, and 2% of the participants deal with them several times a month. The remaining 42.8% indicated that they were rarely or never exposed to direct breaking changes. In a second question, we asked the participants about breaking changes within transitive dependencies. Among the survey participants, 55.1% have been exposed to breaking changes due to transitive dependencies, 12.2% were never impacted, and the remaining 32.7% were unsure.


Table 5.2: Report on breaking changes experienced by developers

Frequency of direct impacts   %          Experienced indirect impacts   %
Several times a month         2          Yes                            55.1
Several times a year          55.1       No                             12.2
Less than once a year         36.7       Maybe                          32.7
Never                         6.1

We further asked the participants to describe, in an open-ended question, how they discovered that a breaking change had occurred in their project. 19 of the 53 participants responded to this question. Failing tests and builds (31.6%) and runtime exceptions (26.3%) were the most common indicators used for detecting breaking changes caused by transitive dependencies. In some cases, the impacts of the breaking changes were only discovered when an application was already in production. For example, participant P48 stated that: “We had to update a transitive dependency pulled in by some library due to a security vulnerability. After the update, PDF and other document formats indexing stopped working. We realized that only in production and took some debugging to discover and fix the cause. A unit test was written alongside the fix in order to mitigate a possible reoccurrence”.

5.3.2 What features would developers need for identifying and managing the impacts of breaking changes?

We further asked participants what features they would find useful in helping them to identify and manage the impact of these changes. Responses from API consumers show that better support for the impact analysis of breaking changes is needed. Among the most striking responses are the following statements by participants P17 and P22, describing the need to provide better support for API consumers: P17 said, “… I'd like to see that I'm using the incompatible interfaces”, and P22, “Version compatibility. Whenever a new library is added to an existing system, developers don't know about the impact caused by this library with the other libraries which are already in the system […]”. API producers such as participant P11 stated: “Prevent releases from making to production unless confirmed by software owners that it's safe.” Overall, the responses from our survey participants indicate that developers would like to see additional tool support for detecting and analyzing the impact of breaking changes, which leads us to formulate the following research questions:


• RQ1: For library developers, can knowledge on how their APIs are used by client projects be useful for the selection of a breaking change mitigation strategy?

• RQ2: Can our approach identify incompatibilities within transitive library dependencies which might lead to unexpected runtime behavior?

5.4 Modeling the Impact of API Breaking Changes

5.4.1 Modeling and Integration of the Source Code Ontology

A fundamental premise of the Semantic Web is its ability to share and extend existing knowledge. Our knowledge modeling approach builds upon this premise by reusing and extending existing software engineering ontologies introduced in [34]. Since some of our proposed API impact analyses require access to source code information, we introduced our Source COde ONtologies (SOCON), an extension of SEON’s domain-level source code ontology [34]. SOCON introduces additional concepts and properties to model API breaking changes and their impact. In addition, we introduced the containsCodeEntity property, and its inverse, to link project releases in SBSON to their internal code elements in SEON. Using an ontological knowledge representation for both build repositories and source code allows for a seamless semantic integration of both knowledge resources. In addition, it allows us to take advantage of reasoning services provided by the Semantic Web to infer new knowledge. In what follows, we present in more detail our knowledge model and how this model takes advantage of the Semantic Web inference services. Figure 5.3 summarizes the main concepts and object properties found in the four abstraction layers of our model used for the impact analysis of API breaking changes. It should be noted that data properties have been omitted to improve the readability of the figure. Also, we use prefixes as substitutes for the fully qualified names of our ontologies (the prefixes can be dereferenced using the URIs shown in Appendix A).


[Figure 5.3 depicts the four abstraction layers of the model: the general layer (main.owl) with concepts such as Measurement, Artifact, Stakeholder, and Organization; the domain-spanning layer (SOCON – changes.owl) with ImpactMeasure, ChangeCoupling, ApiChange, BreakingChange, and NonBreakingChange; the domain-specific layer with SBSON (build.owl) concepts such as BuildProject, BuildRelease, DependencyLink, and DependencyScope, and SEON (code.owl) concepts such as CodeEntity, Class, Method, and Field; and the system-specific layer (SEON – java.owl) with the Visibility modifiers public, default, protected, and private.]

Figure 5.3: Ontologies and concepts involved in API change impact analysis

An important concept in our knowledge model is the BuildRelease concept (located at the domain-specific level of our SBSON ontology), which is a subclass of the Release concept found in the general SBSON ontology layer. BuildRelease models distributed releases of software projects, where a build release isReleaseOf a BuildProject, and hasRelease models that a project can have several releases. A BuildRelease defines its dependencies on other releases using DependencyLink. Stakeholders, such as Developers, distribute new releases which can lead to ApiChanges. An API change can either be a BreakingChange or a NonBreakingChange. API changes are detected by comparing CodeEntity individuals using the currentAPI and priorAPI relations. Code changes are captured at different entity levels such as Class, Method, and Field. A ChangeCoupling contains all API changes which coexist due to a dependency between API elements. Furthermore, our domain-level ontology for source code includes a Visibility concept. In most object-oriented programming languages, an information-hiding mechanism exists to control access to parts of the code (e.g., in Java, public, default, protected, and private are used to specify the visibility of methods and fields). These visibility modifiers are defined in the system-specific (Java) ontology since the semantics of visibility modifiers might vary among programming languages. For a complete description of our ontologies, we refer the reader to [98].
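To make the model concrete, the following SPARQL Update sketch shows how a breaking change between two releases could be instantiated. The instance IRIs are purely illustrative (hypothetical), the prefix IRIs are elided as in our other listings, and the class and property names follow the concepts used in Figures 5.3 and 5.10 (code:BreakingCodeChange, code:hasPriorAPI, code:hasCurrentAPI).

    PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX build: <…/build.owl#>
    PREFIX code:  <…/code.owl#>

    INSERT DATA {
      # two releases of the same project, each containing one version of a code entity
      <…/build.owl#org.ow2.asm:asm:3.1> rdf:type build:BuildRelease ;
          code:containsCodeEntity <…/code.owl#ClassVisitor-3.1> .
      <…/build.owl#org.ow2.asm:asm:4.0> rdf:type build:BuildRelease ;
          code:containsCodeEntity <…/code.owl#ClassVisitor-4.0> .

      # a breaking change linking the prior and current API elements
      <…/changes.owl#change-1> rdf:type code:BreakingCodeChange ;
          code:hasPriorAPI   <…/code.owl#ClassVisitor-3.1> ;
          code:hasCurrentAPI <…/code.owl#ClassVisitor-4.0> .
    }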

5.4.2 Knowledge Inferencing and Reasoning

As previously mentioned, the Semantic Web stack provides support for scalable inference on big data through its reasoning services (e.g., RDFS, RDFS++). In our work, we take advantage of these reasoning services to support different types of dependency analysis and to infer new knowledge. In addition, these reasoning services, in combination with user-defined queries, allow us to replace the traditionally proprietary graph and tree traversal implementations used by existing analysis tools. In what follows, we describe in more detail how our knowledge model supports transitivity and subsumption reasoning.

Transitive closure inference: In mathematics, the transitive closure of a binary relation R on a set X is the smallest relation on X that contains R and is transitive27. For example, if X is a set of class methods and x R y denotes that method x invokes method y, then the transitive closure of R on X is the relation R+ such that x R+ y, reflecting that method x can call method y through several indirect method invocations. Such transitive dependencies can be expressed in OWL through the owl:TransitiveProperty construct, which we use to define invokesMethod as a transitive property. This transitive property allows us to retrieve a list of all direct or indirect invocation dependencies for a given method and vice-versa (Figure 5.4); the declaration itself is sketched in the listing below.

27 https://en.wikipedia.org/wiki/Transitive_closure
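A minimal sketch of this declaration in SPARQL Update, assuming the prefix bindings from Appendix A:

    PREFIX owl:  <http://www.w3.org/2002/07/owl#>
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX code: <…/code.owl#>

    INSERT DATA {
      # declare invokesMethod as transitive, so reasoners can derive
      # indirect (multi-hop) method invocation dependencies
      code:invokesMethod rdf:type owl:TransitiveProperty .
    }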


1 SELECT ?method
2 WHERE {
3   ?method rdf:type code:Method.
4   ?method code:invokesMethod <…/code.owl#method-of-interest> option(transitive).
5 }

Figure 5.4: SPARQL query returning transitive method calls

Subsumption inference: Another essential aspect of our ontology design is its support for subsumption hierarchies between its concepts [110]. For example, a Method or Field is a sub-concept of CodeEntity. Subsumption hierarchies add significant power to ontologies [33] by comparing the syntactic structure of concept descriptions. Given a set of concepts C, the goal of the inference engine is to discover all subsumption relationships among pairs of concepts in C. More formally, we denote that concept c1 is a sub-concept of c2 by c1 ⊆ c2. Subsumption is directional [110]: if c1 ⊆ c2, then c2 ⊈ c1 unless c1 and c2 are synonyms. Similarly, subsumption can be inferred for OWL properties that subsume each other. In our modeling approach, we support subsumption reasoning by creating a hierarchy of object properties that capture source code dependencies (Figure 5.5).

[Figure 5.5 shows the generic dependsOn property subsuming the code-level properties hasSubClass, hasSubInterface, hasSubType, hasSuperClass, hasSuperInterface, hasSuperType, accessesField, invokesConstructor, invokesMethod, instantiatesClass, implementsInterface, usesComplexType, hasDataType, and hasReturnType.]

Figure 5.5: Hierarchy of code properties
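As an illustrative sketch, an excerpt of this hierarchy could be declared as follows (SPARQL Update; only three of the properties from Figure 5.5 are shown, prefixes as in Appendix A):

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX main: <…/main.owl#>
    PREFIX code: <…/code.owl#>

    INSERT DATA {
      # every code-level dependency property is a specialization of dependsOn,
      # so a query over dependsOn also matches these more specific relations
      code:invokesMethod       rdfs:subPropertyOf main:dependsOn .
      code:accessesField       rdfs:subPropertyOf main:dependsOn .
      code:implementsInterface rdfs:subPropertyOf main:dependsOn .
    }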

Using the SPARQL query in Figure 5.6, which combines the property hierarchy with subsumption inference, all code entities that transitively depend on another code entity, independent of the dependency type (e.g., method invocations, interface implementations), can be identified.

1 SELECT ?entity
2 WHERE {
3   ?entity rdf:type code:CodeEntity.
4   ?entity main:dependsOn <…/code.owl#entity-of-interest> option(transitive).
5 }

Figure 5.6: Query illustrating the dependsOn subsumption inference

5.5 Case Study

The overall objective of this section is to illustrate the applicability and flexibility of our knowledge model. More specifically, we show how our approach can help both library developers and consumers in identifying the impacts of breaking changes within and across project boundaries.

5.5.1 Dataset Description

For our case study, we take advantage of the Maven ecosystem as our primary dataset, since the repository hosts many popular and widely used open source libraries. The Maven repository, like other repositories used by build management tools, includes structured dependency and version information, which is required for performing API change and usage analysis. More specifically, for identifying the impacts of breaking changes, we selected the ASM library28, a Java bytecode manipulation library, which is hosted on Maven. We selected ASM for our case study since the library underwent a radical redesign from release 3.X to 4.0. As part of this redesign, release 4.0 introduced several breaking changes (e.g., interfaces were changed to abstract classes, breaking previous 3.X API versions). Tables 5.3 and 5.4 summarize the details of our Maven and ASM datasets.

Table 5.3: Summary of Maven dataset

# Projects    # Releases
130,895       1,219,731

28 http://asm.ow2.org/


Table 5.4: Summary of ASM dataset

ASM Project          # Releases    # Unique Dependencies
ASM 3.X and older    20            364
ASM 4.X and newer    13            848

5.5.2 Results

In what follows, we report on the results of our case study, which include, for each research question, the motivation, the approach being used, and our findings.

RQ1: For library developers, can knowledge on how their APIs are used by client projects be useful for the selection of a breaking change mitigation strategy?

Motivation: As discussed in the introduction of this chapter, library producers adopt different strategies to either shift or delay the cost associated with making API changes. For library producers to decide on a mitigation strategy, they have to be able to determine the potential impact of their API changes on other projects. This is particularly important since some breaking API changes (e.g., interface implementation, inheritance, and using reference types as return or parameter types) require a significant effort from API consumers to modify their application design and implementation. To reduce the potential impact of these API changes on consumers, library producers should be aware of the use of their API in client programs. Being aware of such potential impacts of API changes on client systems can guide API producers in selecting an appropriate mitigation strategy that reduces the potential impact of such changes on consumer systems.

Approach: Figure 5.7 shows the overall methodology we used for our case study, which includes extracting and populating facts for breaking changes between different ASM releases (using VTracker29), the source code of ASM releases and their dependencies, and the complete dependency information of the Maven repository. Using the subsumption inference (see Section 5.4.2), we can now identify how client projects use the changed ASM APIs. We use the query in Figure 5.8 to extract the different usage types of a given API within a client project.

29 https://users.encs.concordia.ca/~nikolaos/vtracker.html


Figure 5.7: Overview of approach for breaking change impact analysis

1 SELECT ?client ?clientEntity ?usageType
2 WHERE {
3   ?client main:containsFile ?clientFile.
4   ?clientFile code:containsCodeEntity ?clientEntity.
5   ?clientEntity ?usageType <…/code.owl#changed-API-entity>.
6 }

Figure 5.8: SPARQL query to identify API usage in client projects

Findings and Discussion: Table 5.5 summarizes the client usages (internal and external) of selected ASM APIs for which we identified breaking changes. As our results show, interface implementation and class inheritance are among the most common API usage types in client projects. In contrast, library producers used their own libraries mostly as return types. This highlights that library producers and consumers not only use APIs differently, but also that selecting a breaking change mitigation approach based on internal API usage alone is often not enough. As the case study illustrates, our modeling approach supports both internal and cross-project breaking change and API usage analysis. This additional insight on the potential global impacts of breaking API changes can guide API producers in determining potential mitigation strategies (e.g., deprecation) and therefore in maintaining their API’s value proposition – the reuse of functionality – while minimizing the development, maintenance, and testing effort required by consumers. A summary such as the one in Table 5.5 can be derived directly from our knowledge base by aggregating usage types per API, as sketched after the table.


Table 5.5: Summary of External and Internal Usage of selected ASM APIs

API              Usage Type             # Client (external) Usages    # Internal API Usages
ClassVisitor     Implement interface    86                            6
                 Return type            0                             24
ClassAdapter     Inherit class          44                            0
                 Return type            0                             24
MethodVisitor    Implement interface    4                             4
                 Return type            0                             47
MethodAdapter    Inherit class          2                             0
                 Return type            0                             47
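A hedged sketch of such an aggregation, extending the pattern of Figure 5.8 with a SPARQL 1.1 GROUP BY (the target API IRI is illustrative):

    SELECT ?usageType (COUNT(?clientEntity) AS ?usageCount)
    WHERE {
      ?client main:containsFile ?clientFile .
      ?clientFile code:containsCodeEntity ?clientEntity .
      # any property linking a client entity to the changed API element
      ?clientEntity ?usageType <…/code.owl#ClassVisitor> .
    }
    GROUP BY ?usageType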

RQ2: Can our approach identify incompatibilities within transitive library dependencies which might lead to unexpected runtime behavior?

Motivation: As introduced in Section 4.2.1, different build and dependency management systems adopt different conflict resolution techniques to deal with multiple versions of a dependency in a project. For example, Maven chooses the version of the dependency closest to the root of the dependency tree. However, this type of conflict mediation can lead to potential runtime failures, which are not identified during the build or compilation process. Being able to identify the use of changed APIs across transitive dependencies allows library consumers to avoid some of these unexpected runtime failures.

Approach: As described in Section 5.4.1, the CodeEntity concept and its sub-concepts are used to represent source code syntax and its semantics, while a DependencyLink, using the hasDependencySource, hasDependencyTarget, and hasDependencyScope properties, captures the dependency between two software libraries. Based on this representation, developers can now use (user-defined and predefined) SPARQL queries to analyze whether their application is exposed to potential direct and indirect breaking changes. For example, the query in Figure 5.9 identifies all projects that depend (either directly or transitively) on different versions of the ASM library. Although ASM versions may contain binary incompatibilities, the inclusion of these APIs in a client project’s build does not automatically result in breaking changes. For these changes to become breaking changes, an incompatible API must be invoked. For our analysis, we therefore create a static, global call graph to determine if a changed API is (potentially) called by the client application.


1 SELECT ?project ?asm1 ?asm2
2 WHERE {
3   <…/build.owl#org.ow2.asm:asm> main:hasRelease ?asm1.
4   <…/build.owl#org.ow2.asm:asm> main:hasRelease ?asm2.
5   ?project build:hasDirectDependencyOn ?asm1.
6   ?project build:hasTransitiveDependencyOn ?asm2.
7   FILTER(?asm1 != ?asm2).
8 }

Figure 5.9: SPARQL query identifying the use of multiple versions of the ASM library in projects

1 SELECT ?client ?clientAPIEntity2 ?dependency ?dependencyAPIEntity
2 WHERE {
3   # identify use of breaking change entity in client and dependency
4   ?client code:containsCodeEntity ?clientAPIEntity1; code:containsCodeEntity ?clientAPIEntity2.
5   ?clientAPIEntity1 main:dependsOn ?currentAPIElement.
6   ?dependency code:containsCodeEntity ?dependencyAPIEntity.
7   ?dependencyAPIEntity main:dependsOn ?priorAPIElement.
8   ?clientAPIEntity2 main:dependsOn ?dependencyAPIEntity.
9   {
10    SELECT ?client ?dependency ?asm1 ?asm2
11    WHERE {
12      <…/build.owl#org.ow2.asm:asm> main:hasRelease ?asm1; main:hasRelease ?asm2.
13      ?client build:hasDirectDependencyOn ?asm1; build:hasDirectDependencyOn ?dependency.
14      ?dependency build:hasDirectDependencyOn ?asm2.
15      # identify ASM releases for which breaking changes have been populated in the KB
16      ?breakingChange a code:BreakingCodeChange; code:hasPriorAPI ?priorAPIElement.
17      ?breakingChange code:hasCurrentAPI ?currentAPIElement.
18      ?asm1 code:containsCodeEntity ?currentAPIElement.
19      ?asm2 code:containsCodeEntity ?priorAPIElement.
20      FILTER(?asm1 != ?asm2)
21    }
22  }
23 }

Figure 5.10: SPARQL query to identify transitive usage of API elements impacted by breaking changes


Figure 5.11: Illustrative example of a client project using different versions of the same API

In what follows, we refer to a client as any project which has declared a dependency on an ASM 4+ library; a dependent refers to a project (directly used by a client) which has a dependency on an ASM library version 3.X or older (see Figure 5.11). The query in Figure 5.10 (an extension of Figure 5.9) returns such transitive usages of different API versions within a project. The query first identifies two unique ASM releases that contain breaking changes and then identifies any usage of these incompatible APIs within client projects and their transitive build dependencies.

Findings and Discussion: The boxplots in Figure 5.12 summarize the distribution of dependents among clients as well as the usage of potentially incompatible APIs within the client and dependent projects. Clients, on average, include 5 dependents which may introduce different versions of the ASM library as part of their classpath. Our analysis (Figure 5.10) further shows that, on average, a fraction of 0.21 of the ASM 4.X API functionality is invoked by a client, and a fraction of 0.34 of the functionality from an earlier ASM version (3.X or earlier) is invoked by a dependent. Table 5.6 provides two concrete examples where clients are exposed to potential runtime errors due to their indirect use of incompatible ASM API versions. The solr-shade 2.0.0 project directly depends on ASM v4.1 and indirectly on ASM v3.1, since lucene-expressions 4.7.1, which is used by solr-shade 2.0.0, depends on ASM v3.1. Using Maven’s built-in conflict mediation, ASM v3.1 will automatically be excluded from the project, and only ASM v4.1 will be included. As a result, an unexpected runtime exception will be thrown when the fromExpression method is called, since it indirectly invokes the now excluded ASM v3.1 ClassVisitor and MethodVisitor APIs.


Figure 5.12: Distribution of client dependencies and their usage of incompatible ASM APIs

Table 5.6: Results of potentially impacted Client Projects

Client Project              Potentially Impacted API
solr-shade 2.0.0            Class: DocumentExpressionDictionaryFactory
                            Method: fromExpression(String, Set)
lucene-expressions 6.0.1    Class: JavascriptCompiler
                            Method: compileExpression(ClassLoader)

In order to evaluate whether our approach can correctly identify the impact of breaking changes, we conducted an additional evaluation. For this study, we used already closed issues, which we extracted from GitHub. These issues contain information in their bug descriptions indicating whether a bug was due to a breaking change. For dataset selection, we searched GitHub for issues that contain certain Java runtime exceptions. These exceptions have previously been reported to be frequently caused by breaking changes [111]: ClassNotFoundException, NoSuchMethodError, IncompatibleClassChangeError, and NoClassDefFoundError. For our dataset, we initially selected the top 800 results (200 results for each keyword), of which only 396 issues used Maven as their build and dependency management tool. As part of our data cleaning, we manually analyzed these remaining 396 results based on the following two processing criteria: (1) we only consider issues that are directly related to breaking changes in the project dependencies, and (2) we only include projects that successfully build using their default configurations. After applying this pre-processing, only 10 issues remained in our dataset, which

are summarized in Table 5.7. For each issue, we downloaded a snapshot of the source code prior to the resolution of the issue and applied the same processing steps described earlier in Figure 5.7. As our evaluation (Table 5.7) shows, we were able to successfully reproduce and identify the reported breaking changes for 4 out of the 10 issues. Figure 5.13 shows an example of a trace which we established from the issue to the source code and its dependency hierarchy. For the remaining 6 issues, we were unable to identify the breaking changes for various reasons. The most common reason was that we did not have access to the required third-party AWS and Hadoop services to replicate Issues #2, #3, #5, and #6. Our manual analysis of the remaining two issues (Issues #4 and #7) showed that these breaking changes were not identified because they occurred in dependencies used by Maven plugins. Although we did not include these Maven plugin dependencies in our current analysis, extending our queries to cover them is a straightforward task and will be part of our future work. The results of our study and evaluation show that our formal knowledge representation allows us to take advantage of the transitivity and subsumption inference supported by Semantic Web reasoners. In addition, it provides us with the flexibility to write custom queries that support the analysis of breaking changes across artifacts (e.g., Maven and source code) and project boundaries. Such cross-project impact analysis can help library consumers in identifying potential binary incompatibility errors that are usually only discovered during the execution of their project(s).

Table 5.7: Identified potential breaking changes

ID       Issue URL                                       Identified
Iss1     docbleach/DocBleach/issues/39                   Yes
Iss2     jenkinsci/artifact-manager-s3-plugin/pull/66    No
Iss3     locationtech/geowave/issues/1371                No
Iss4     mulesoft-labs/raml-for-jax-rs/issues/364        No
Iss5     sakserv/hadoop-mini-clusters/issues/35          No
Iss6     ShifuML/shifu/issues/504                        No
Iss7     STAMP-project/dspot/issues/424                  No
Iss8     togglz/togglz/issues/282                        Yes
Iss9     VanRoy/spring-data-jest/issues/74               Yes
Iss10    VanRoy/spring-data-jest/issues/84               Yes


Figure 5.13: Tracing the issue reported in DocBleach (issue #1)

5.6 Related Work

In this section, we present other works that are closely related to our research. We divide the prior work into two main related areas: API usage analysis and identifying the impact of API breaking changes in client applications.

5.6.1 API Usage

Lammel et al. [112] conducted a large-scale API usage analysis of projects hosted in the SourceForge repository. They observed that less than half of the APIs are used by client programs. Wu et al. [108] extended the work of [112] to identify API change types and their usage within client projects. Businge et al. [113] studied 512 third-party Eclipse plugins and their usage of Eclipse APIs. They observed that 44% of these plugins use internal Eclipse APIs and that such API usage may have an important impact on upgrading these plugins. While our work is inspired by this existing research, our approach differs from the previous work by being based on a formal knowledge representation and the use of Semantic Web inference services to identify API usage at an inter-project and global scale.


5.6.2 Impact of API Breaking Changes

Change in software systems has been studied, measured, and modeled intensively over the years. For example, Cossette et al. have shown that Java libraries “frequently and seriously change over time” [114], [115]. Throughout a large body of research, all studied real-world systems evolved in unanticipated ways with rippling consequences across modules. Many approaches have been proposed to support API evolution and reduce its costs for client applications. Dig and Johnson [116] define a catalog of breaking and non-breaking changes. They observed that refactorings account for 80% of the changes that break client systems. Raemaekers et al. [117] present four stability metrics based on method changes and removals. The authors investigate the behavior of their metrics by performing a historical analysis of stability and impact on 140 clients of the Apache Commons library. Xavier et al. [107] conducted an extensive empirical study on 317 real-world Java libraries, 9K releases, and 260K client applications to investigate the impact of API breaking changes on client applications. They report that only 2.54% of the clients are potentially impacted, and that larger and more popular libraries have a higher frequency of breaking changes. Decan et al. [118] found that about 1 in every 20 updates to a CRAN package was a backward incompatible change, accounting for 41% of the errors in released packages that depended on them. Complex and changing dependencies are a pain to work with for many developers [119] and have led to common expressions like “DLL hell” and “dependency hell.” In our work, we address some of these challenges by extending the scope of the impact analysis for breaking changes to include cross-project and even cross-ecosystem dependency analysis. Our approach is also able to detect binary incompatibilities introduced at different transitive levels of library dependencies, which may otherwise lead to unexpected runtime behavior.

5.7 Chapter Summary

The changing software engineering landscape with its global software development processes allows projects and organizations to take advantage of the plethora of features and functionality provided by existing third-party libraries. However, similar to other software, these external libraries also undergo changes which might introduce binary incompatibilities in client applications.


In this chapter, we presented an ontological modeling approach that describes a shared taxonomy of object-oriented programming and dependency management concepts. Our approach uses multiple layers of abstraction to provide a generic analysis approach and also supports the seamless integration of knowledge resources found in the software engineering domain. Given the expressiveness of our ontologies, we can take advantage of inference services provided by the Semantic Web to infer additional knowledge that supports novel types of breaking change analysis. In this chapter, we discussed how our Semantic Web-enabled cross-project impact analysis service can guide API producers in selecting an appropriate mitigation strategy. We also showed that, by integrating different knowledge resources, API users can identify potential binary incompatibility errors that are usually not discovered during program compilation and build. In the next chapter, we present another application of SBSON – an API-level vulnerability impact analysis service.


Chapter 6

6 Recovering Semantic Traceability Links between APIs and Security Vulnerabilities

6.1 Introduction

According to a report from 2012 [120], OSS libraries and frameworks form 88% of the code in applications globally. In this context, development teams face the challenge of not only identifying and keeping track of which libraries a project and its third-party components depend on, but also which version of a particular library is being used. Libraries, like any other software, are susceptible to security vulnerabilities, which requires developers to fix or upgrade affected versions of a library in a timely fashion to mitigate potential security threats. As the study in [120] shows, 26% of these OSS frameworks/libraries suffer from vulnerabilities that often remain undiscovered. In 2017, “Using Components with Known Vulnerabilities” was ranked 9th in the OWASP Top Ten list of software security flaws [121], with some of the largest security breaches to date being exploits of known vulnerabilities in components. Several approaches have been introduced in the literature [122] to minimize the introduction and exploitation of software security vulnerabilities. These approaches fall into two main categories. The first category requires organizations to create barriers that prevent developers and end-users from performing potentially risky actions (e.g., runtime protection). While this category can reduce the exposure to vulnerabilities, it does not address the fundamental cause of such vulnerabilities. The other category involves techniques that avoid or reduce the introduction of potential vulnerabilities already at the development stage, by introducing and applying secure coding best practices (e.g., black-box testing and static analysis). Unfortunately, most of these

analysis techniques are limited to artifacts created within a project context and do not consider the reuse and sharing of third-party components across project boundaries in their analysis. Different specialized Software Vulnerability Databases (SVDBs) (e.g., the National Vulnerability Database (NVD)30) have been introduced by the Information Security domain to help track software vulnerabilities and their potential solutions. These SVDBs were introduced in response to the increasing number of software attacks, which are no longer limited to a single project but often affect millions of computers and hundreds of different systems. These repositories can be considered trusted information silos which are typically not directly linked to other software repositories, such as source code repositories containing reported instances of these problems. In this chapter, we introduce a novel approach which establishes traceability links between security and software databases for automatically tracing source code vulnerabilities at the API level across project boundaries. More specifically, we integrate our software build system ontology (SBSON) with our source code and versioning ontologies (SEON) and a security vulnerability ontology (SEVONT). Based on this standardized knowledge representation (ontologies), we are now able to introduce new types of vulnerability analysis at a global scale.

6.1.1 Motivating Example

Existing research on recommending APIs to developers (e.g., [2]) has focused on recommending potentially useful APIs to reduce development and testing time. For example, in [2], the authors explicitly recommend that developers use an older version of Apache Derby (version 10.1.1.0) due to its widespread usage/popularity. However, like any other software project, Apache Derby is also susceptible to security vulnerabilities. By recommending this particular older version of Derby, the authors of [2] actually recommended a version of Apache Derby with known security vulnerabilities (Table 6.1). These known vulnerabilities had already been published in the National Vulnerability Database (NVD) repository. As this example shows, the authors of the paper were most likely unaware of these reported vulnerabilities, one of the reasons being that this information is not readily available to developers. Making this information readily available to maintainers and security experts would

30 https://nvd.nist.gov/

allow for seamless knowledge integration and sharing. Furthermore, by using standardized and formal knowledge representation techniques (e.g., the Semantic Web and its technology stack), novel analysis approaches across knowledge boundaries at both the intra- and inter-project level can be introduced.

Table 6.1: Example of Derby versions and their dependent projects in Maven

Derby version    Release Year    # vulnerabilities in NVD    Direct dependencies in Maven Repository
10.1.1.0         2005            3                           382
10.5.3.0         2009            1                           0
10.11.1.1        2014            0                           36

[Figure 6.1 shows an IDE view of a local project (ProjectX) with its build file (ProjectX.pom) and source code file (A.java), both modeled in the knowledge base together with build system knowledge sources (e.g., Maven) and disclosed vulnerability knowledge sources (e.g., NVD). Established and inferred relations link ProjectX via projectY to ProjectZ, exposing a deep transitive dependency on a high-risk vulnerability and an inferred API call from A.foo() to X.bar().]

Figure 6.1: Integrating code and build information with knowledge from other originally heterogeneous resources

Figure 6.1 shows an illustrative example of an IDE with an open Maven POM (ProjectX.pom) and Java file (A.java). In our approach, we extend a developer’s accessible

knowledge from the local project’s POM and Java files to knowledge resources outside the current project boundaries. Using our knowledge modeling approach, we can now integrate, share, and reason upon these heterogeneous resources (even at a global scale). In this example, such a knowledge base includes project-specific resources (e.g., issue trackers, versioning repositories) as well as resources external to the project, such as the NVD and Maven build dependencies from other projects. Using the reasoning services provided by the Semantic Web, we can now infer direct and indirect dependencies for the local project (ProjectX in Figure 6.1). In addition, given the bi-directional links in our modeling approach, we can expand our analysis to answer questions such as: Which projects might be directly or indirectly affected by a vulnerable component/library? In our example, ProjectX has an indirect dependency on ProjectZ (via ProjectY’s transitive dependencies), with ProjectZ being a vulnerable component. As our example illustrates, integrating source code information with other knowledge resources (e.g., vulnerability and build repositories) can support new types of dependency and vulnerability analysis, even at a cross-project (global) scale. In addition, analysis results can now be made directly available in our knowledge model, to be reused by other analysis tools. For example, an existing API recommendation tool can now also consider in its recommendation whether a directly or indirectly recommended API may contain a vulnerability. Another example would be the automatic notification of developers when an already used API becomes exposed to a potential vulnerability.
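As a sketch, the question above could be answered with a query along the following lines once build and vulnerability knowledge are integrated (hasDirectDependencyOn and hasTransitiveDependencyOn are used as in Figure 5.9, sevont:hasVulnerability as in Figure 6.11; inference is assumed to be enabled):

    SELECT ?project ?vulnerability
    WHERE {
      # any project that directly or transitively depends on a vulnerable release
      { ?project build:hasDirectDependencyOn ?library . }
      UNION
      { ?project build:hasTransitiveDependencyOn ?library . }
      ?library sevont:hasVulnerability ?vulnerability .
    }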

Note: Earlier versions of this work were published in the IEEE International Conference on Software Testing, Verification and Validation (ICST), Tokyo, 2017 [123], the IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), Ottawa, 2016 [124], and Science of Computer Programming, Volume 121, 2016 [125].

6.2 Background

6.2.1 Security Vulnerability Databases

In the software security domain, a software vulnerability refers to mistakes or facts about the security of software, networks, computers, or servers. Such vulnerabilities represent security risks

to be exploited by hackers to gain access to system information or capabilities [120]. As discussed in [126], new software vulnerabilities are often first reported in the software repositories (e.g., issue trackers, mailing lists) of the affected projects or mentioned on Q&A sites (e.g., StackOverflow). A common characteristic of early vulnerability reporting is that information about vulnerabilities is dispersed across multiple resources, and their descriptions tend to be incomplete, inconsistent, and informal. Advisory databases (e.g., NVD) were introduced to address some of these shortcomings. Their objective is not only to provide a central place for reporting vulnerabilities, but also to standardize their reporting. The Common Vulnerabilities and Exposures (CVE)31 dataset creates a publicly available dictionary of vulnerabilities, allowing for a more consistent and concise use of security terminology in the software domain. Once a new vulnerability is revealed and verified by security experts, the vulnerability and other relevant information (e.g., unique identifier, source URL, vendor URL, affected resources, and related vulnerabilities from the same family group) are added to the CVE database. The source URL refers to the vulnerability source (e.g., application vendor, external security advisories) by linking directly to the commit that contains the patching source code or to a document that describes how to patch the vulnerability. In addition to the CVE entry, each vulnerability is also classified using the Common Weakness Enumeration (CWE)32 database. The CWE thereby provides a common language to describe software security weaknesses, classifying vulnerabilities based on their underlying weaknesses. NVD, CVE, and CWE can all be considered part of a global effort to manage the reporting and classification of known software vulnerabilities.
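As an illustrative sketch, a CVE entry such as CVE-2013-4787 (discussed later in Table 6.6) might be captured in our knowledge base along the following lines. The affects and classifiedAs properties follow the concepts shown later in Figure 6.10, while the instance IRIs, the direction of affects, and the CWE identifier are assumptions on our part:

    PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX sevont: <…/vulnerabilities.owl#>

    INSERT DATA {
      # a disclosed vulnerability, its weakness classification,
      # and the release it affects
      sevont:CVE-2013-4787 rdf:type sevont:Vulnerability ;
          sevont:classifiedAs sevont:CWE-310 ;      # hypothetical weakness id
          sevont:affects sevont:android-2.3.3 .     # a VulnerableRelease
      sevont:android-2.3.3 rdf:type sevont:VulnerableRelease .
    }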

6.2.2 Vulnerability Detection Techniques

In the SE domain, vulnerability detection techniques (tools) provide project managers and developers with security vulnerability assessments and quantitative insight into the effectiveness of a project’s security controls. The traditional techniques used to audit software projects against security vulnerabilities are based on static analysis tools (e.g., FindBugs33) and vulnerability scanners (e.g., OWASP Dependency-Check34).

31 https://cve.mitre.org/ 32 https://cwe.mitre.org/ 33 http://findbugs.sourceforge.net/ 34 https://www.owasp.org/index.php/OWASP_Dependency_Check


Vulnerability scanner tools play a different role than traditional static analysis tools: they scan a software project for security vulnerabilities based on predefined rules (maintained by security engineers). In addition, a vulnerability scanner usually identifies project dependencies and checks if there are any known vulnerabilities publicly disclosed in existing vulnerability databases (vendor vulnerability databases, or third-party databases such as NVD, SecurityFocus35, etc.). These scanners help to validate the inventory of third-party libraries in a project. In what follows, we give a detailed example of the OWASP Dependency-Check vulnerability scanner, which we used to evaluate the approach proposed in this chapter. OWASP Dependency-Check is a vulnerability scanner that analyzes the dependency definitions within a project’s build file (e.g., a pom.xml file for Maven projects) and collects a set of coordinates called evidence. Three types of evidence are collected: vendor, product, and version. The evidence may vary from one build/dependency manager to another. For example, the coordinates (evidence) for Java (Maven) are groupId, artifactId, and version. Node.js (NPM), Python (PyPi), and Ruby (Ruby Gems) use library name and version as their dependency coordinates. Dependency-Check matches this evidence against public data in the NVD to identify and report vulnerable libraries to the user.

6.2.3 The SEcurity Vulnerability ONTology (SEVONT)

Although several SVDBs have been introduced to address the identification and management of software vulnerabilities, software developers fail to take full advantage of these SVDBs due to the vast amount of security vulnerability data available. The situation is further complicated by the fact that these heterogeneous SVDBs introduce ambiguity in their datasets, resulting in diverse data models. To address these issues, Alqahtani [127] introduced SEVONT (Figure 6.2), an abstraction hierarchy of software security vulnerability analysis ontologies, which reconciles and integrates heterogeneous vulnerability data from several SVDBs. The SEVONT ontology includes the following important domain concepts:

35 http://www.securityfocus.com/bid


• Vulnerability. In software security, a vulnerability refers to a flaw in a system which is introduced by reusing vulnerable (external) software components or through inadvertent coding mistakes by developers (e.g., bad coding practices).

• Product. Software products are assets of the organization which result from a software development process (e.g., hardware, artifacts).

• Attacker. Attackers, either internal or external entities of the system, attack a product to perform malicious actions which attempt to break the security of a software system or its components.

• Attack. Attacks are malicious actions designed to compromise the security of a system. Security experts analyze these attacks to study the behavior of attackers, estimate the cost of attacks, and determine their impact on overall system security.

• Countermeasure. A countermeasure is a mechanism used to protect a system from potential vulnerability attacks (e.g., patch development, encryption/decryption enhancement, and updated system security configurations).

[Figure 6.2 shows the four layers of the SEVONT ontologies: general concepts (concepts and relations); domain-spanning concepts (measurements, software security assessment, security patches, security traceability); domain-specific concepts (software security advisories); and system-specific concepts (National Vulnerability Database (NVD), Exploit Database (ED), SecurityFocus (SF), Vulnerability Notes Database (VND), World Laboratory of Bugtraq (WLB), ...).]

Figure 6.2: Overview of the SEVONT ontologies


6.3 SV-AF: Security Vulnerability Analysis Framework

It is generally accepted that inadvertent programming mistakes can lead to software security vulnerabilities and attacks [120]. Mitigating these vulnerabilities can become a major challenge for developers, since not only might their own source code contain exploitable code, but also the code of third-party APIs or external components used by their system. In what follows, we introduce a methodology to guide developers in identifying the potential impact of vulnerabilities at both the system and global level. Our methodology integrates knowledge from the (1) SBSON, (2) SEON, and (3) SEVONT ontologies (see Figure 6.3). We detail the approach used to align these ontologies and the reasoning capabilities provided by this unified knowledge representation. It should be noted that, in order to improve readability, we use prefixes as substitutes for the fully qualified names of our ontologies. The ontology prefixes used in this chapter can be dereferenced using the URIs shown in Appendix A.

[Figure 6.3 shows the integrated ontology family across four abstraction layers: general concepts (relations, concepts, attributes); domain-spanning concepts (measurements, security vulnerability traceability, API dependencies, security assessments, security patches, change couplings); domain-specific concepts (SEVONT, SEON, SBSON); and system-specific concepts (NVD, OSVDB, Exploits DB, source code, issue tracking, history, Maven, Ant, Ivy).]

Figure 6.3: Overview of the integrated SBSON, SEON, and SEVONT ontologies

6.3.1 Ontology Alignment

For us to take full advantage of the knowledge captured in the SBSON, SEON, and SEVONT ontologies, we apply ontology alignment techniques to establish traceability links among these

ontologies. This linking process requires either shared concepts across knowledge resources or identifying semantically identical or similar concepts within the different knowledge sources. These links reduce the semantic gap between the ontologies and are an essential pre-requisite for supporting seamless knowledge integration.

6.3.1.1 Alignment of the SBSON and SEVONT ontologies

In our work, ontologies undergoing an alignment are treated as uncertain graphs. In an uncertain graph, edges are associated with an uncertainty value which measures the strength of connectivity between nodes and/or edges [128]. An uncertain directed graph is defined as G = (V, E, ω), where V is a set of nodes, E is a set of edges (x, y), and ω: E → [0, 1] is the weight assignment function (e.g., ω(x, y) = 0.3 means the associated value on edge (x, y) is 0.3). Uncertainty values are interpreted as probabilities. In our model, V represents the modeled project nodes from SBSON and SEVONT, E represents relations (edges) between project instances, and ω: E → [0, 1] is the weight assignment function. For example, in Figure 6.4, the project instance Vm from the SBSON graph is similar to the vulnerable product instance Vn from the SEVONT graph through a weighted edge ω(e).

[Figure 6.4 shows the SBSON and SEVONT graphs, with project instance nodes carrying project details (name, vendor, and version) and weighted edges ω(e) linking the similar instances Vm and Vn across the two graphs.]

Figure 6.4: SV-AF knowledge base similarity graphs

We use the Probabilistic Soft Logic (PSL) framework [129] to establish weighted links between the SBSON and SEVONT ontologies. PSL uses continuous variables to represent truth

values instead of the traditionally used Boolean values [129]. The resulting probability distribution is captured in a graph model, which can then be reasoned upon. For example, the rule in Figure 6.5 states that two instances A and B with similar names defined in different source ontologies are likely to be similar. “similarID” is a similarity function implemented using the Levenshtein similarity metric. Rules in PSL are labeled with non-negative weights. For example, the weights of the rule in Figure 6.6 indicate that projects with the same name and version are more likely to be similar than projects with the same name only. Using PSL, we establish relations between similar instances found in the two ontologies. The number of possible instance pairs for these two ontologies is |SBSON| × |SEVONT|.

1 type(A, instance) ∧
2 type(B, instance) ∧
3 name(A,X) ∧
4 name(B,Y) ∧
5 similarID(X,Y) ∧
6 A.source ≠ B.source
7 → similar(A,B) weight:0.5

Figure 6.5: PSL rule identifying similar projects with the same name

1 type(A, instance) ∧
2 type(B, instance) ∧
3 name(A,X) ∧
4 name(B,Y) ∧
5 similarID(X,Y) ∧
6 A.source ≠ B.source ∧
7 version(A,Z) ∧
8 version(B,K) ∧
9 similarID(Z,K)
10 → similar(A,B) weight:0.8

Figure 6.6: PSL rule identifying similar projects with the same name and version


In this example, similarity among instance pairs is determined based on extracted literal information such as name, version, and vendor. We used the PSL framework classifier to compute the similarity weights for the links. For training purposes, we created a dataset with manually labeled instance links to train the PSL classifier to establish the weights for our pre-defined rules. Derived similarity weights for each instance pair (see Figure 6.7) are captured by the domain-spanning SimilarityMeasure concept. Given the weighted alignment links between the two ontologies, a SPARQL query can now be written to retrieve vulnerability information from the SEVONT ontology and the corresponding instances in the SBSON ontology based on a given similarity threshold, as sketched below.
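A hedged sketch of such a threshold query, using the measurement modeling of Figure 6.7 (the 0.2 threshold is illustrative):

    SELECT ?release ?product ?vulnerability ?weight
    WHERE {
      ?release    owl:sameAs ?product .
      ?product    sevont:hasVulnerability ?vulnerability .
      # the similarity weight materialized for this instance pair
      ?similarity rdf:type measure:SimilarityMeasure ;
                  measure:measureThing ?release ;
                  measure:measureThing ?product ;
                  measure:hasMeasureValue ?weight .
      FILTER(?weight >= 0.2)
    }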

[Figure 6.7 shows a measure:SimilarityMeasure instance that measures (via measure:measureThing) both a sevont:vuln.owl#VulnerableRelease instance and an sbson:build.owl#BuildRelease instance, carries a weight literal via measure:hasMeasureValue, and accompanies the inferred owl:sameAs relation between the two instances.]

Figure 6.7: SV-AF’s weighted similarity modeling

6.3.1.2 Alignment of the SEON and SEVONT ontologies

Disclosed vulnerabilities often contain references to patch information, such as explicit revisions/commits in which the vulnerability has been fixed. Having this information available, we can perform terminology matching to align instances from both data sources. For the alignment process, we take advantage of the reasoning services provided by the Semantic Web to infer implicit relationships between vulnerabilities and commits. More specifically, we take advantage of SWRL rules to establish links between vulnerability and commit instances. This alignment takes place if either of the following two semantic rules is satisfied: Rule 1: the vulnerability ID is explicitly mentioned in a commit message (see Figure 6.8). Rule 2: the commit/revision ID is explicitly mentioned in the patch reference of a vulnerability (see Figure 6.9).


1 Commit(?c),
2 fixNVDIssue(?c,?ID),
3 Vulnerability(?v),
4 hasVulnerabilityID(?v,?ID)
5 → vulnerabilityFixedIn(?v,?c)

Figure 6.8: SWRL rule for aligning SEON and SEVONT when a commit message contains a vulnerability reference

1 Vulnerability(?v),
2 hasPatch(?v,?p),
3 hasFixRevision(?p,?ID),
4 Commit(?c),
5 hasCommitID(?c,?ID)
6 → vulnerabilityFixedIn(?v,?c)

Figure 6.9: SWRL rule for aligning SEON and SEVONT when a vulnerability patch contains a commit reference

6.3.1.3 Overview of the integrated ontologies in SV-AF

The result of these alignment processes is SV-AF, a unified representation that supports the impact analysis of known vulnerabilities across heterogeneous software repositories. SV-AF provides a seamless integration of build dependency, source code, versioning history, and software vulnerability concepts and relations across different abstraction layers. The OWL classes and object properties used for our API-level vulnerability impact analysis are shown in Figure 6.10 (data properties have again been omitted to improve the readability of the figure). The core concepts used for our vulnerability analysis are VulnerableRelease, Vulnerability, and BuildRelease. A VulnerableRelease is a software release within the NVD database with a known Vulnerability. A BuildRelease is a software release within the Maven ecosystem. Using our ontology alignment process, we infer that a given BuildRelease is (owl:sameAs) a specific VulnerableRelease. As such, the BuildRelease inherits the properties of the original VulnerableRelease; for example, the Vulnerability that affects the VulnerableRelease now also affects the BuildRelease. Given the support for bi-directional links in our

model, a project release can now be identified as being potentially affected by a vulnerability when it directly or indirectly reuses a vulnerable release. Whenever a release is identified as affected by a vulnerability, a Committer commits a new version of a VersionedFile containing a SecurityPatch through a versioning system (e.g., SVN). Versioned files are managed by a version control system. A SecurityPatch corresponds to code changes introduced to fix some existing VulnerableCode, which is part of a CodeEntity, such as a ComplexType (i.e., a Class, Interface, Enum, etc.) or a Method. For example, if a class or method is modified during a security patch, then this code change can be used to locate the original VulnerableCode individual. The OWL classes SecurityPatch and VulnerableCode are linked in our model through the object property identifies.
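To illustrate the resulting traceability, the following sketch traces a vulnerability to the code entities its patch identifies. The vulnerabilityFixedIn property stems from the SWRL rules in Figures 6.8 and 6.9; the prefixes and the property linking a commit to its security patch are assumptions on our part:

    SELECT ?vulnerability ?commit ?vulnerableCode
    WHERE {
      ?vulnerability rdf:type sevont:Vulnerability ;
                     sevont:vulnerabilityFixedIn ?commit .
      ?commit        history:contains ?patch .      # assumed commit-to-patch property
      ?patch         rdf:type sevont:SecurityPatch ;
                     sevont:identifies ?vulnerableCode .
    }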

6.3.2 Knowledge Inferencing and Reasoning

In addition to the transitivity and subsumption inferencing provided by our SBSON and SEON ontologies (see Section 5.4.2), SV-AF also takes advantage of inbuilt inferencing services to trace APIs and their vulnerabilities across knowledge boundaries. The owl:sameAs predicate is used to align two concepts from different ontologies. For the SBSON-SEVONT alignment process, we create weighted alignment links between the two ontologies. These weighted links are used (based on the similarity threshold) to infer the owl:sameAs predicate between instances within the ontologies. Simple queries such as the SPARQL query in Figure 6.11 can now be executed to take advantage of the owl:sameAs predicate (if inference is enabled) and retrieve facts across two or more ontologies. Without inferencing, the query result set would be empty since SBSON contains no triple with any knowledge of vulnerabilities (line 5 of Figure 6.11). However, with inference enabled (line 1 of Figure 6.11), vulnerabilities for releases in SBSON can be identified based on the owl:sameAs links established between instances of the two ontologies.


[Figure 6.10 depicts the integrated SV-AF concepts: the general layer (main.owl) with Organization, Artifact, Stakeholder, Activity, Project, Release, and File; the domain-spanning layer with SecurityPatch identifies VulnerableCode (SEVONT – vulnerabilities.owl) and Measure (measurement.owl); the domain-specific layer with SBSON build.owl (BuildProject, BuildRelease, DependencyLink, DependencyScope, DependencyType), SEVONT SecurityDBs.owl (Vulnerability, VulnerableRelease, Weakness, Score, Severity, Countermeasures, VersionRange), SEON history.owl (Committer, Commit, ChangeSet, VersionedFile), and SEON code.owl (CodeEntity, ComplexType, Method, Visibility); and the system-specific layer (gradle.owl, maven.owl, ivy.owl, SEVONT – nvd.owl, SEON – java.owl). BuildRelease is linked via owl:sameAs to VulnerableRelease, which a Vulnerability affects.]

Figure 6.10: The SV-AF ontology concepts involved in API-level vulnerability impact analysis


1 define input:same-as “yes”
2 SELECT ?vulnerability ?release
3 WHERE {
4   ?release rdf:type sbson:BuildRelease.
5   ?release sevont:hasVulnerability ?vulnerability.
6 }

Figure 6.11: SPARQL query returning vulnerable projects based on the owl:sameAs inference

6.4 Case Studies

In what follows, we introduce three case studies which we conducted to illustrate the applicability of our knowledge modeling approach. For the first case study, we identify project releases in the Maven Central repository which contain known security vulnerabilities disclosed in the NVD database. The objective of this case study is to evaluate the applicability of our alignment process. For the second case study, we illustrate how our semantic rules can identify explicit and implicit security vulnerabilities by inferring transitive dependencies across SBSON and SEVONT. Finally, the third case study demonstrates the applicability of our modeling approach in analyzing API-level security vulnerability impacts across software components.

6.4.1 Case Study Data

For the data collection and extraction in our case studies, we rely on two data sources: the NVD database and the Maven Central repository. We downloaded the latest version of the Maven repository (Table 6.2) and all NVD vulnerability XML feeds from 1990 to 2016 (Table 6.3). For case study #1, we used the releases and unique vulnerable products to evaluate the alignment of the SBSON and SEVONT ontologies.

Table 6.2: Maven Repository statistics

Repository       Projects    Releases     Last Update
Maven Central    130,895     1,219,731    2016-01-28 16:34:07 UTC

Table 6.3: NVD database statistics

Repository    # unique vulnerabilities    # unique vulnerable products    Period
NVD [130]     74,945                      109,212                         1990 – 2016


For case study #2, the objective was to identify the potential transitive impact set of vulnerable components on other systems. For this study, we selected five open source projects (Table 6.4) which use the Maven repository for their build management. The main selection criterion for these projects was that at least some of their releases contain known vulnerabilities (which we had identified in case study #1) and that the projects are commonly reused by other projects. The five subject systems vary in size (classes and methods) and application domain. Wss4J36 is a Java implementation of the primary security standards for Web Services, and HttpClient37 provides reusable components for client-side authentication, HTTP state management, and HTTP connection management. Apache Derby38 is an open source relational database implemented entirely in Java, Hibernate Validator39 allows expressing and validating application constraints using annotation-based constraints, and Apache OpenJPA40 is a Java persistence project that can be used as a stand-alone plain old Java object (POJO) persistence layer or be integrated into any Java EE compliant container.

Table 6.4: Subject systems and sizes for transitive dependencies analysis

ID    Subject System         Version        Classes    Methods
P1    Wss4J                  1.6.16         167        1,610
P2    Httpclient             4.1            209        1,180
P3    Derby                  10.1.1.0       967        16,354
P4    Hibernate-validator    4.1.0.Final    325        2,642
P5    Openjpa                1.1.0          1,201      18,640

6.4.2 Case Study 1: Identifying vulnerable projects in the Maven Repository

Objective: The goal of this study is to evaluate the performance of our semantic similarity linking approach used to align two domain-specific ontologies.

36 https://ws.apache.org/wss4j/ 37 https://hc.apache.org/httpcomponents-client-ga/ 38 https://db.apache.org/derby/ 39 http://hibernate.org/validator/ 40 http://openjpa.apache.org/


Approach: In order to link these two ontologies (SEVONT and SBSON), we use the PSL framework to align project-specific information found in both ontologies. We trained PSL using a corpus of 524 randomly selected project instance pairs and their manually derived similarity links, executing our PSL alignment rules against this training dataset. Based on this training, two concept instances, A and B, located in different ontologies (¬SameSource) can now be aligned (with a degree of certainty) if both have the same name, a similar vendor, and the same version number (PSL rule in Figure 6.12). The SameName, SimilarVendor, and SameVersion similarity functions are implemented using the Levenshtein distance metric. In this rule, SameProject(A,B) is given a weight of 0.9 based on results from the PSL training set. Figure 6.13 shows the PSL inference results for our training dataset, using different weights (ranging from a minimum of 0.04 to a maximum of 0.42) for the SameProject(A,B) alignment. Using the semantic rule from Figure 6.12, PSL can now perform maximum a posteriori (MPE) reasoning [129] to infer the most likely values for a set of propositions given observed values for the remaining (evidence) propositions. For a full discussion of MPE reasoning, we refer the reader to [129]. The result of the PSL inference is a set of A × B SameProject weights in the [0..1] range, where 0 corresponds to two concept instances with no similarity and 1 corresponds to an exact (100%) match.

1 Source(A,SnA) ∧
2 Source(B,SnB) ∧
3 ¬SameSource(SnA,SnB) ∧
4 Name(A,X1) ∧
5 Name(B,Y1) ∧
6 SameName(X1,Y1) ∧
7 Vendor(A,X2) ∧
8 Vendor(B,Y2) ∧
9 SimilarVendor(X2,Y2) ∧
10 Version(A,X3) ∧
11 Version(B,Y3) ∧
12 SameVersion(X3,Y3)
13 ⇒ SameProject(A,B) weight:0.9

Figure 6.12: PSL SameProject rule


Figure 6.13: PSL SBSON-SEVONT similarity inference results

As part of our knowledge modeling approach, we materialized the inferred semantic instance links (owl:sameAs) between the SEVONT and SBSON ontologies, making this inferred knowledge a persistent part of our knowledge model. As part of this materialization process, we also add weights to each link, which are the inferred similarity values captured using the domain-spanning similarity measure (measure:SimilarityMeasure) from our unified knowledge model.

Findings. Our study showed that 0.062% of all Maven projects contain known security vulnerabilities that have been reported in the NVD database. An example of such a vulnerability is shown in Table 6.5.

Table 6.5: Example of a linked SBSON-SEVONT vulnerability

SBSON Fact                       SEVONT Fact             Corresponding Vulnerability
org.sonatype.nexus:nexus:2.3.1   sonatype:nexus:2.3.1    CVE-2014-0792

Further analysis of our results showed that projects often suffer not from one but from multiple vulnerabilities. We found that 48.8% of the 750 identified vulnerable project

releases suffer from multiple security vulnerabilities, with PostgreSQL 7.4.1 being the most vulnerable project in the dataset, containing 25 known vulnerabilities. Having this additional insight can guide developers in their system update and upgrade decisions by avoiding the reuse of APIs/components with known security vulnerabilities or components that might be prone to vulnerabilities. For example, in December 2010, Google released its Nexus S smartphone41. The phone was originally running Android 2.3.3, an Android version that was already exposed to the security vulnerability discussed in Table 6.6. While the Nexus S received regular Android OS updates up to Android version 4.2, an actual fix of the reported vulnerability (CVE-2013-4787) was only introduced with Android 4.2.2. However, this new Android version was no longer supported and distributed for the Nexus S, leaving existing users of the phone susceptible to attacks. Our analysis showed that the same vulnerability can affect multiple releases of a product. For example, security vulnerability CVE-2013-4787 42 has been reported for five different Android versions (Table 6.6). This information can help product maintainers to ensure consistent patching and regression testing across product lines or different product versions.

Table 6.6: Critical Vulnerabilities for Android Project

Android Version                     CVE-ID          # of direct dependencies
com.google.android:android:2.2.1    CVE-2013-4787   360
com.google.android:android:2.3.1    CVE-2013-4787   176
com.google.android:android:2.3.3    CVE-2013-4787   351
com.google.android:android:3.0      CVE-2013-4787   34
com.google.android:android:4.2      CVE-2013-4787   1

Evaluation: In what follows, we evaluate the accuracy of aligning project instances (weighted owl:sameAs links) between our SBSON and SEVONT ontologies. During the first step of this evaluation, we compared, using precision, recall, and F1-measure, the impact of different similarity weight thresholds (w = 0.1, w = 0.2, w = 0.3, and w = 0.4) on the inferred links created by the PSL alignment process. Precision is calculated with true positives being the number of project instance pairs correctly classified as similar, while false positives correspond to the number of non-similar instance pairs that are incorrectly classified as the same project.

41 https://en.wikipedia.org/wiki/Nexus_S
42 https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2013-4787


For recall, false negatives correspond to the number of similar instance pairs that are incorrectly classified as non-similar. The F1-score is the harmonic mean of precision and recall, giving equal weight to both measures. Our analysis (Table 6.7) showed that an increase in the similarity threshold from 0.1 (low similarity) to 0.4 (higher similarity) only had a limited effect on precision (a decrease from 0.98 to 0.75), while recall dropped significantly, from 0.68 to 0.01.
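For reference, the standard definitions underlying these measures, consistent with the classification counts described above, are:

\[ P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot P \cdot R}{P + R} \]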

Table 6.7: Weighted owl:sameAs link evaluation

Data Size: 500

Measure     w=0.0   w=0.1   w=0.2   w=0.3   w=0.4
Precision   0.77    0.88    0.98    0.93    0.75
Recall      0.77    0.68    0.30    0.03    0.01
F1-score    0.77    0.77    0.46    0.05    0.01

A further manual inspection of the inferred links showed that the low recall for higher threshold values is due to the inconsistent capturing of vendor information within the two ontologies. NVD relies on the common vendor name to identify a vendor, whereas Maven uses the fully qualified package name as the vendor name. For example, using w=0.0, org.apache.cxf:cxf:3.0.1, org.apache.geronimo.configs:cxf:3.0.1, and org.apache.geronimo.plugins:cxf:3.0.1 in SBSON will be considered the same instance as apache:cxf:3.0.1 in SEVONT and therefore correctly linked. However, using a higher similarity threshold, these instances will no longer be linked. We use the similarity weight of w = 0.1 in all subsequent experiments due to its high F1-score.

6.4.3 Case Study 2: Identifying open source components that are directly and indirectly dependent on vulnerable components

Objective: In this study, we evaluate how our framework can support the analysis of potential security vulnerability impacts on dependent software components.

Approach: For this case study, we extend our analysis to include transitive closure dependencies (Figure 6.14), identifying components that are not only directly but also indirectly affected by known vulnerabilities. For this impact analysis, we selected five open source


Java projects (Table 6.4) with known security vulnerabilities. In this case study, we do not distinguish whether a dependent component actually uses (calls) the vulnerable component or not.

Figure 6.14: Inferred project dependencies in SBSON (declared dependsOn relations between projects at adjacent dependency levels #1 to #n; transitive dependsOn relations inferred across levels)

Findings: We now report on the results of our transitive dependency analysis, which also highlight the benefits of our knowledge modeling approach: its ability to integrate knowledge resources while taking advantage of the inference services provided by the SW. Given the bi-directional links in our model between the NVD and the Maven repository, our analysis is no longer limited to identifying only direct dependencies on vulnerable components. Instead, given a vulnerable component, we can now provide a more holistic analysis which identifies all projects that directly and indirectly depend on it. Table 6.8 provides a summary of our analysis. It should be noted that we limited the scope of our transitive analysis to three levels of transitivity in order to restrict the result set. For example, the vulnerable project Hibernate-validator 4.1.0 (P4) has a potential impact set of 3805 directly dependent projects (level 1) and 128109 dependent projects when we consider an additional two levels of transitivity (level 3).
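While our implementation relies on the transitivity of the dependency property and the materialized owl:sameAs links, the same analysis can be sketched as a SPARQL 1.1 property path query. The following is a minimal sketch; the property sevont:hasVulnerability is a hypothetical shorthand for the inferred SBSON-SEVONT linkage, not necessarily the exact vocabulary term:

# Count all projects that transitively depend on each vulnerable component.
SELECT ?vulnerable (COUNT(DISTINCT ?client) AS ?impactSetSize)
WHERE {
  ?vulnerable sevont:hasVulnerability ?cve .   # hypothetical linking property
  ?client sbson:dependsOn+ ?vulnerable .       # one or more dependency hops
}
GROUP BY ?vulnerable

Note that an unbounded property path computes the full transitive closure; bounding the analysis to three levels, as in Table 6.8, keeps the result set manageable.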

Table 6.8: Summary of transitive dependencies on vulnerable components

ID   Component Name                    # Vulnerabilities   CVE-IDs                                        # Transitive Dependencies (Level 1 / Level 2 / Level 3)
P1   Wss4j 1.6.16                      2                   CVE-2015-0227, CVE-2014-3623                   336 / 639 / 73
P2   Httpclient 4.1                    2                   CVE-2011-1498, CVE-2014-3577                   685 / 4961 / 41326
P3   Derby 10.1.1.0                    3                   CVE-2005-4849, CVE-2006-7216, CVE-2006-7217    385 / 37999 / 66147
P4   Hibernate-validator 4.1.0.Final   1                   CVE-2014-3558                                  3805 / 39295 / 128109
P5   Openjpa 1.1.0                     1                   CVE-2013-1768                                  74 / 49460 / 141303


Figure 6.15: Geronimo-jetty6-javaee5 (version 2.1.1) using 5 vulnerable projects (level 1 dependencies): Openjpa, Dojo, Myfaces, Jetty, and Cxf, annotated with their high- and medium-severity CVEs

Figure 6.15 illustrates a typical usage scenario for our modeling approach. While Geronimo-jetty6-javaee5 (version 2.1.1) itself has no known vulnerability reported, the project depends on several components (level 1 dependencies) with known security issues (5 Java projects with a total of 15 known security vulnerabilities), making Geronimo-jetty6-javaee5 itself a potentially vulnerable component.

6.4.4 Case Study 3: API-level vulnerability impact analysis for CVE-2015-0227

Objective: The objective of our third case study is to show how our modeling approach can support the analysis and tracing of potential security vulnerability impacts at the API level of software components. Furthermore, the study again highlights the flexibility of our modeling approach in terms of its seamless knowledge and analysis result integration, as well as its use of Semantic Web reasoning to infer new knowledge.


Approach: For this case study, we take advantage of the same-as and transitive inference services provided by SV-AF to identify projects that are directly and indirectly affected by known security vulnerabilities. In addition, we also take advantage of transitive and subsumption inferences applied at the source code level to identify vulnerable APIs and trace their impact to external dependencies.

Case study setting: We use a publicly disclosed vulnerability, which has been reported in the NVD repository as CVE-2015-0227 43 and describes the following vulnerability for Apache WSS4J: "Apache WSS4J before 1.6.17 and 2.x before 2.0.2 allows remote attackers to bypass the requireSignedEncryptedDataElements configuration via vectors related to 'wrapping attacks'." This vulnerability affects the management of permissions, privileges, and other security features that are used to perform access control in Apache WSS4J versions before 1.6.17 and in version 2.x before 2.0.2. Apache WSS4J is an API which provides a Java implementation of the primary security standards for Web Services and is commonly used by projects as an external component. In this example, developers using any of the affected Apache WSS4J releases in their projects must determine whether their application is affected by this vulnerability. Existing source code analysis tools can identify whether the vulnerable code fragment (e.g., a code fragment or variable) reported in the NVD vulnerability is used directly within a project. However, they are not capable of identifying whether the external libraries used in the developer's project might be affected by this vulnerability. We now discuss how our approach takes advantage of the integrated knowledge from originally heterogeneous knowledge resources such as NVD, VCS (for Apache WSS4J only), and Maven to determine direct and indirect dependencies on vulnerable components. For this analysis, we extract and populate facts from a) NVD: information for the CVE-2015-0227 vulnerability (including patch references); b) VCS: source code and commit messages for Apache WSS4J (versions 1.6.16 and 1.6.17); and c) the Maven repository: all build dependencies on Apache WSS4J 1.6.16 (242 dependencies).

43 https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2015-0227


Tracing vulnerability patch information to commits. Security databases provide descriptions of vulnerabilities, their potential effects, and corresponding patches (if applicable). The objective of this step is to establish a traceability link between the unique vulnerability identifier (CVE) and the commit which fixes the vulnerability. For establishing these links, we apply a two-step process: we first mine the NVD repository for patch links that include a reference to an entry in a versioning repository; we then extract all commit logs within the versioning repository that contain a reference to a CVE-ID. Figure 6.16 shows an example of such a commit log message entry: "[CVE-2015-0227] Improving required signed elements detection."
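The second mining step can be expressed directly over our versioning ontology. The following sketch retrieves commits whose log messages mention a CVE identifier; the class and property names (version:Commit, version:hasCommitLog) are illustrative placeholders rather than the exact vocabulary terms:

SELECT ?commit ?log
WHERE {
  ?commit rdf:type version:Commit .     # illustrative class name
  ?commit version:hasCommitLog ?log .   # illustrative property name
  FILTER regex(?log, "CVE-[0-9]{4}-[0-9]+")
}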

Identifying vulnerable code fragments in APIs. A vulnerable code fragment corresponds to a set of lines of code (LoC) which has been modified to fix a vulnerability [131]. In our approach, we use the standard diff command to identify vulnerable code fragments by comparing the patched revision with its unpatched predecessor. Figure 6.17 shows an excerpt of the diff output for WSSecurityUtil.java revisions r1619358 and r1619359. We further identify that the method verifySignedElement contains the vulnerable code fragment. Using the same approach, we can now classify any method or class that has been deleted or modified as part of a vulnerability fix (commit) as an instance of our sevont:VulnerableCode concept.
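Once patch commits are linked to their CVEs, populating sevont:VulnerableCode can be sketched as a SPARQL update over the versioning and code ontologies. The properties sevont:fixesVulnerability, version:hasChangeSet, and version:changedCodeEntity are illustrative placeholders, assuming commits are already linked to the code entities they change:

INSERT { ?entity rdf:type sevont:VulnerableCode . }
WHERE {
  ?commit sevont:fixesVulnerability ?cve .        # commit linked to a CVE
  ?commit version:hasChangeSet ?changeSet .       # illustrative property
  ?changeSet version:changedCodeEntity ?entity .  # modified/deleted method or class
}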


(a) Report detail for CVE-2015-0227 from NVD

(b) A Wss4j bug-fix commit detail for CVE-2015-0227 from SVN
Figure 6.16: Extracting patch-relevant information from NVD and commit messages

--- webservices/wss4j/trunk/ws-security-dom/src/main/java/org/apache/wss4j/dom/util/WSSecurityUtil.java  2014/08/21 11:11:12  1619358 (old revision)
+++ webservices/wss4j/trunk/ws-security-dom/src/main/java/org/apache/wss4j/dom/util/WSSecurityUtil.java  2014/08/21 11:12:58  1619359 (new revision)
@@ -24,6 +24,7 @@
 ...
+import org.apache.wss4j.dom.WSDocInfo;
 ...
-    public static void verifySignedElement(Element elem, Document doc, Element securityHeader)
+    public static void verifySignedElement(Element elem, WSDocInfo wsDocInfo)
         throws WSSecurityException {
-        final Element envelope = doc.getDocumentElement();
-        final Set signatureRefIDs = getSignatureReferenceIDs(securityHeader);
 ...

Figure 6.17: Diff output for WSS4J r1619358 and r1619359 (added lines are preceded by '+', deleted lines by '-'; the @@ header gives the start line index and number of lines in the old and new revisions)


Figure 6.18: Inferred links between vulnerabilities.owl, code.owl, and versioning.owl (concepts, individuals, and established relations, with inferred links connecting the CVE-2015-0227 vulnerability and its security patch reference to the corresponding commit, change set, and changed code entities)

SELECT ?project ?vulnerableCode ?client ?code
WHERE {
  ?project rdf:type sbson:BuildRelease .
  ?project code:containsCodeEntity ?vulnerableCode .
  ?vulnerableCode rdf:type sevont:VulnerableCode .
  ?client code:containsCodeEntity ?code .
  ?client sbson:hasDirectDependencyOn ?project .
  ?code main:dependsOn ?vulnerableCode .
}

Figure 6.19: Query to retrieve vulnerable code fragments across project boundaries

Given our populated ontologies, we infer a similarity link between the instances of a vulnerable product (e.g., Apache WSS4J 1.6.16) in SEVONT and SBSON, as well as links between the vulnerability patch reference (CVE-2015-0227) and the commit containing the patch (modeled in SEON), using the rules in Figures 6.11 and 6.9, respectively. Given these inferred links (Figure 6.18) and using the SPARQL query in Figure 6.19, we can now further restrict our transitive dependency analysis to include only those components that have an actual call dependency on the vulnerable source code.

Findings: Table 6.9 summarizes the results of our third case study, which we performed for CVE-2015-0227. We report on the (manually verified) results obtained from executing our


SPARQL query (Figure 6.19). Table 6.9 shows that 15 of the 242 crawled dependent projects actually use the API of the vulnerable project. This highlights that there are still many systems (6.19%) that rely on libraries with known security vulnerabilities. Moreover, 10 of these 15 dependent projects not only include the API but actually call the class WSSecurityUtil, which contains the vulnerable code. However, it should be noted that for our specific case study, none of the projects actually called and executed the vulnerable method (verifySignedElement) within WSSecurityUtil.

Table 6.9: Case Study #3 Results

Project               # Crawled Dependencies   # Actual usage   # Vuln. Classes usage   # Vuln. Methods usage
Apache WSS4J 1.6.16   242                      15               10                      0

To evaluate whether our approach is actually capable of correctly identifying calls to vulnerable methods, we conducted an additional controlled experiment. For this experiment, we manually seeded a method call in Apache CXF-bundle 2.6.15 that invokes the vulnerable method in the Apache WSS4J API. We downloaded the source code for Apache CXF-bundle 2.6.15 and modified its org.apache.cxf.ws.security.wss4j.policyhandlers package. Figure 6.20 shows the partial class diagram of the modified package. We modified the includeToken method of the AbstractBindingBuilder class to include a direct call to the vulnerable WSSecurityUtil.verifySignedElement method. We also added the SVAFSymmetricBindingHandler and SVAFAsymmetricBindingHandler classes, which extend SymmetricBindingHandler and AsymmetricBindingHandler, to verify that our approach also supports the transitive call dependency analysis correctly. We then re-populated the source code ontologies with the new (modified) code facts and invoked the same query we used earlier in the case study. The results of this query are shown in Table 6.10 and include the classes of our modified project that either directly or indirectly invoke the vulnerable method WSSecurityUtil.verifySignedElement. For the sake of simplicity and readability, we only include public and protected methods in the result set. From the reported results, we observed that the vulnerability introduced in AbstractBindingBuilder.includeToken propagates through several methods. For example, the doSVAFAction method is indirectly affected due to its usage of the getSignatureBuilder method. Since SVAFAsymmetricBindingHandler extends


AbstractBindingBuilder and overrides the getSignatureBuilder method, the invocation of doSVAFAction by test2 is correctly identified as a non-vulnerable method call, since it does not propagate to the vulnerable WSSecurityUtil.verifySignedElement method.

Figure 6.20: Class diagram for our modified package (AbstractBindingBuilder, whose includeToken method contains the seeded call to WSSecurityUtil's vulnerable verifySignedElement method; SVAFSymmetricBindingHandler and SVAFAsymmetricBindingHandler, which override getSignatureBuilder and expose doSVAFAction; and Main with its test1 and test2 entry points)

Table 6.10: Results of direct and indirect usage of the vulnerable WSSecurityUtil.verifySignedElement method

Class                         # Indirect Vulnerable Methods   Indirect Vulnerable Methods
AbstractBindingBuilder.java   4                               handleSupportingTokens(SupportingToken, boolean, Map, Token, Object); getSignatureBuilder(TokenWrapper, Token, boolean, boolean); getSignatureBuilder(TokenWrapper, Token, boolean); doSVAFAction()
Main.java                     1                               test1()


6.5 Discussion and Related Work

6.5.1 Comparison Against Existing Tools

As part of our evaluation, we further compared our approach against existing tools that detect known security vulnerabilities in source code across project boundaries. For this comparison, we consider the open source OWASP Dependency-Check (DC) tool and a closed-source tool from SAP Labs [132]. OWASP DC performs a static dependency analysis to determine if libraries with known vulnerabilities are included in an application. During the analysis, the tool collects information about the vendor, product, and version. This information is used to identify the Common Platform Enumeration (CPE). If a CPE is identified, a listing of the associated Common Vulnerabilities and Exposures (CVE) entries is reported. The SAP tool relies on a dynamic source-code-level analysis to identify whether a vulnerable piece of code is used directly or indirectly. The tool uses execution traces which are collected after instrumenting the project code, including all bundled libraries. Since we did not have direct access to the SAP tool, we replicated their experiments to compare our results with the ones reported in [132]. Given that the OWASP DC tool does not distinguish whether vulnerable library code is actually used or not, we limit our comparison to: "identify if a project depends on libraries with disclosed vulnerabilities, independent of the use of the vulnerable source code".

Table 6.11 reports the results of our comparison, which include true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN). The results show that for CVE-2013-2186, both our approach and OWASP DC did not report the vulnerable API. This miss is due to the fact that NVD did not include FileUpload 1.2.2 in the list of affected products. The vulnerability, however, is reported in several JBoss projects which make use of the DiskFileItem class in Apache FileUpload. Our approach currently models only products explicitly mentioned as affected in NVD. OWASP DC reported CVE-2014-9527 as a vulnerability in POI 3.11 Beta 1. A manual inspection of the patch showed that the class "org.apache.poi.hslf.HSLFSlideShow" contains the patch for the vulnerable code but is not used by "poi-3.11.beta1.jar"; instead, this patch is distributed as part of the POI-HSLF component. For the vulnerability CVE-2013-0248, the patch is located in the default configuration file "using.xml" and in a comment of the Java class "DiskFileItemFactory" (but not in any executable code). As a result, the SAP tool does not identify the archive as being affected by vulnerable code.


Table 6.11: Comparison of analysis results

Vulnerability    Library                   SV-AF   SAP tool   OWASP DC
CVE-2014-0050    Apache FileUpload 1.2.2   TP      TP         TP
CVE-2013-2186    Apache FileUpload 1.2.2   FN      TP         FN
CVE-2013-0248    Apache FileUpload 1.2.2   TP      FN         TP
CVE-2012-2098    Apache Compress 1.4       TP      TP         TP
CVE-2014-3577    Apache HttpClient 4.3     TP      TP         TP
CVE-2014-9527    Apache POI 3.11 Beta 1    TN      TN         FP
CVE-2014-3574    Apache POI 3.11 Beta 1    TP      TP         TP
CVE-2014-3529    Apache POI 3.11 Beta 1    TN      TN         TN

As our case studies illustrate, our ontology-based knowledge modeling approach (SV-AF) can integrate information originating from different heterogeneous knowledge resources. In what follows, we discuss how our approach overcomes several challenges identified with both the OWASP and SAP tools.

Data integration challenges. Vulnerability and dependency management make use of different naming schemes and nomenclatures: there exist many language-dependent approaches for referencing entities, making the linking of entities across knowledge resources a difficult task. Consider the following example: mapping Spring Core 4.0.3.RELEASE between Maven and NVD. The Maven GAV identifier represents this component as groupId=org.springframework; artifactId=spring-core; version=4.0.3.RELEASE, while the CPE for the same component in NVD is: vendor=pivotal; product=spring_framework; version=4.03. Such naming inconsistency is difficult to resolve when automatically mapping GAV identifiers in Maven to their corresponding CPEs in NVD; for example, the vendor in our example should be Pivotal and not springframework. While a human can easily recognize the correct mapping, this is not the case for an automated solution. Both OWASP DC and the SAP tool compute the SHA-1 of the archives and perform a lookup in Maven Central to address this problem. While such a lookup can improve the recall (number of correct mappings found), it also introduces many false positives and false negatives, which affects the accuracy of these tools. Moreover, both tools are limited in their ability to match vulnerabilities and CPEs, making them not only prone to errors but also limiting the scope of the analysis to direct dependencies. In contrast, our approach addresses these challenges by taking advantage of the PSL alignment framework. This eliminates the need for one-to-one assignments and establishes weighted links between instances of the different modeled ontologies for different data sources.


Moreover, our semantic approach takes advantage of semantic reasoning to infer transitive dependencies.

Flexibility. While dynamic run-time information (traces) can improve precision (SAP tool), the recall will depend on the quality and coverage achieved by these traces. Furthermore, the SAP tool focuses on intra-project analysis, whereas our approach also supports inter-project analysis. As we further show in our case study, by taking advantage of automated reasoning to infer sub-properties (subsumption) and transitive closure dependencies, we can transform often complex and proprietary source code analysis tasks into simpler and more flexible SPARQL queries. For example, code-level relations such as method invocation and class/interface inheritance are all modeled as sub-properties of the transitive dependsOn property. As such, a simple query (Figure 6.19) can now identify all code entities that transitively depend on a given vulnerable code entity, independent of the type of dependency, be it a method invocation or an inherited class/interface (via subsumption). As we have shown in our controlled study, vulnerable classes can create a backdoor (e.g., through inheritance) that allows the invocation of vulnerable methods if these methods are not overridden within the client. With the growing popularity of 3rd-party APIs [133], the risk of such transitive vulnerable method invocations increases.

Information silos challenges. Although both analysis tools, SAP and OWASP DC, link different data sources, these resources remain information silos in both approaches. They still lack the standardization, knowledge sharing, and analysis result integration required to make them true information hubs. In contrast, our approach introduces a unified standardized representation using ontologies, which supports seamless knowledge integration, interoperability, and sharing even on a global scale. The triplestore not only provides persistence of the data but also provides scalability, and the use of unique resource identifiers (URIs) eases the integration with other knowledge resources, even at a global scale.
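The axioms that give the query in Figure 6.19 this flexibility can be sketched as follows. The concrete sub-property names are illustrative placeholders (the exact SBSON/SEON terms may differ); what matters is the pattern of declaring code-level relations as sub-properties of a transitive dependsOn property:

INSERT DATA {
  code:invokesMethod  rdfs:subPropertyOf main:dependsOn .   # method calls
  code:inheritsFrom   rdfs:subPropertyOf main:dependsOn .   # class/interface inheritance
  main:dependsOn      rdf:type owl:TransitiveProperty .     # enables transitive closure
}

With these axioms in place, a reasoner rewrites every concrete code relation into a dependsOn edge, so the single pattern ?code main:dependsOn ?vulnerableCode in Figure 6.19 covers all dependency types, direct and transitive alike.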

6.5.2 Threats to Validity

Internal Validity. An internal threat to validity is that the experiments rely on our ability to mine facts from the Maven Central and NVD repositories to populate our ontologies. A common problem with mining software repositories is that repositories often contain noise in their data due to ambiguity, inconsistency, or incompleteness. This threat is mitigated in our

research context, since vulnerabilities published in NVD are manually validated and managed by security experts, making this data less prone to noise. Similarly, the Maven repository captures the dependencies declared in a particular build file while ensuring that these dependencies are fully specified and available, eliminating ambiguities and inconsistencies at project build time. Another internal validity threat is that the instance pair matches for our training set were manually created and could potentially be prone to human error. In order to mitigate this threat, we conducted a cross-validation of the annotation, where the links were evaluated by another person. Finally, the size of the dataset used to evaluate our approach might not be considered large enough. To mitigate this threat, we evaluated our approach on different dataset sizes to study the effect of dataset size on our results. As shown in Table 6.12, we observed standard deviations of 0.04 and 0.09 for precision and recall, respectively, across all dataset sizes. As our results show, the size of the dataset has no adverse effect on the precision and recall of our approach.

Table 6.12: Dataset size evaluation

SV-AF (w=0.1)

Data Points   Precision   |Distance from Avg.|   Recall   |Distance from Avg.|
50            0.76        0.11                   0.38     0.26
100           0.87        0.00                   0.62     0.02
150           0.88        0.01                   0.69     0.05
200           0.90        0.03                   0.69     0.05
250           0.89        0.02                   0.68     0.04
300           0.86        0.01                   0.63     0.01
350           0.87        0.00                   0.66     0.02
400           0.87        0.00                   0.68     0.04
450           0.88        0.01                   0.67     0.03
500           0.88        0.01                   0.68     0.04
Avg.          0.87        -                      0.64     -
SD (σ)        0.04        -                      0.09     -
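For completeness, the deviation columns report each data point's absolute distance from the column average, and the last row reports the standard deviation; assuming the population form (which reproduces the reported values of 0.04 and 0.09), this is:

\[ \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} \]

where \(x_i\) is the precision (or recall) at the i-th dataset size, \(\bar{x}\) its average, and \(n = 10\) the number of dataset sizes.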

Other threats related to the mining of these repositories stem from the fact that we only extracted vulnerabilities reported from 2010 to 2016 from the NVD database. Given the number of vulnerabilities we extracted for this time period from the NVD database, the dataset is large enough to avoid any bias towards certain vulnerabilities or affected libraries.


External Validity. In terms of external threats to validity, the presented experiments might not be generalizable to non-Maven projects. This threat is partially mitigated by our modeling approach: given that our approach is based on different levels of abstraction, we model the common aspects of build repositories as a domain of discourse in the domain-specific layer of our knowledge model. Another external threat to validity is that our evaluation has mainly focused on a quantitative analysis of the results from the case studies, limiting our ability to generalize the applicability and validity of the approach. In order to mitigate this threat, an additional qualitative analysis has to be performed in the form of user studies, which would allow for an evaluation of both the applicability of the approach and the quality of the result sets from an expert user perspective.

6.5.3 Related Work in Tracking Known Security Vulnerabilities

Although several approaches for static vulnerability analysis and detection in source code exist (e.g., [132], [134], [135], [136]), there is a lack of tools for identifying and tracking security vulnerabilities on a global scale. Tracking known security vulnerabilities across the Web is different from tracking security defects within the source code of individual projects. Mitropoulos et al. [137] and Saini et al. [138] use FindBugs, a well-known static analysis tool, to find major security defects in Java source code. The collected information was used to study the evolution of security-related bugs in a project. In comparison, their approaches find security defects in the source code, while ours finds usages of components with known security vulnerabilities on a global scale. Plate et al. [132] proposed a technique that supports the impact analysis of vulnerabilities based on the code changes introduced by security fixes. Their approach relies on a dynamic analysis to determine whether vulnerable code is executed within a given project. In contrast, while less precise, we provide a more holistic approach which not only considers all possible executions but also supports a more general intra- and inter-project dependency analysis. We also take advantage of semantic reasoning services to infer implicit facts about vulnerable code usages within a system, supporting bi-directional dependency analysis, including both the impact on external dependencies and vice versa.


Nguyen et al. [131] proposed an automated method to identify vulnerable code based on older releases of a software system. Their approach scans the code base of each prior version for code containing vulnerable code fragments. In contrast, our approach takes advantage of multiple knowledge resources, providing greater flexibility in the analysis. Mircea et al. [21] introduce the Vulnerability Alert Service (VAS), an approach that notifies users when a vulnerability is reported for their software systems. VAS depends on the OWASP Dependency-Check tool. More generally, analyzing software project artifacts at the security detection level has recently become a very active area of research. Such analysis has produced valuable results in finding security-related discussions [139], identifying security/non-security bug reports [140], or predicting vulnerable software components [141]. However, to the best of our knowledge, no research has been conducted on creating an infrastructure that semantically links identified security vulnerabilities in traditional software repositories with the security issues listed in software security repositories.

6.6 Chapter Summary

This chapter presented SV-AF, a vulnerability analysis framework based on the integration of our SBSON ontology and a software vulnerability ontology (SEVONT). SV-AF provides developers with an API-level analysis of the impact of vulnerabilities within their projects and their global dependencies. Using multiple layers of abstraction, our modeling approach not only provides a generic analysis approach but also supports the seamless integration of other knowledge resources in the SE domain. This formal knowledge representation allows us to take advantage of inference services provided by the SW, providing additional flexibility compared to traditional proprietary analysis approaches. In the next chapter, we present another application of SBSON: an API trustworthiness assessment framework based on the SBSON, SEON, and SEVONT ontologies used in this chapter. In addition, the trustworthiness framework in the next chapter also uses an existing software licensing ontology (MARKOS) to provide support for license violation detection.


Chapter 7

7 API Trustworthiness: An Ontological Approach for Software Library Adoption

7.1 Introduction

Most of today's software projects increasingly depend on the usage of external libraries, which allows software developers to take advantage of features provided by Application Programming Interfaces (APIs) without having to reinvent the wheel. Unfortunately, even though third-party libraries are readily available, developers face new challenges with this form of code reuse, such as being unaware of the existence of libraries, having to select the most relevant library among several possible alternatives, and knowing how to use the features provided by these libraries [14], [142]. Several software library recommendation approaches have been proposed to address these challenges. These approaches fall into two main categories: (1) recommendation systems for libraries and APIs based on characteristics such as popularity [133], frequency of migration [16], [143], and stability [117], without considering the context of use of these libraries; and (2) techniques that take a client's context into account when recommending libraries (e.g., using the history of method usages by developers [144]). However, reused software libraries should not only satisfy a client's functional requirements; they must also satisfy non-functional requirements (NFRs) such as security, safety, and dependability [145], which are critical to the success of software systems. NFRs are often referred to as system qualities and can be divided into two main categories: (1) execution qualities, which are observable at runtime (e.g., performance and usability); and (2)

evolution qualities, such as testability, trustworthiness, maintainability, extensibility, and scalability, which are embodied in the static structure of a software system. NFRs often play a critical role in the acceptance of, and trust users will have in, a final software product. However, assessing and evaluating the trustworthiness of today's software systems and software ecosystems remains a challenge due to issues ranging from a lack of traceability among software artifacts to limited tool support. Trustworthiness is also a subjective and ubiquitous term, since its interpretation depends on the assessment context, which might differ among stakeholders and with the context in which a library is used. Assessment models, therefore, should provide the flexibility and customizability to take into account such specific application contexts and the particular assessment needs of stakeholders [9]. In our prior work, we introduced our Software Build System Ontology (SBSON) and Security Vulnerabilities Analysis Framework (SV-AF), semantic modeling approaches which model the dependencies between OSS libraries and establish traceability links between security and software databases such as build repositories and version control repositories. The work in this chapter is a continuation of these previous works. In this chapter, we present our Ontology-Based Trustworthiness Assessment Model (OntTAM), an instantiation and extension of the SE-EQUAM assessment model [9] for the domain of software library trustworthiness. SE-EQUAM is a generic quality assessment model which uses ontologies to model and conceptualize the quality factors, sub-factors, attributes, measures, weights, and relationships used to assess software quality. More specifically, we illustrate how OntTAM can be instantiated to take advantage of our existing unified knowledge representation of different Software Engineering related knowledge resources and support an automated analysis and assessment of trustworthiness quality attributes of libraries. We argue that ontologies not only promote and support the conceptual representation of knowledge resources in software ecosystems but also let us take advantage of semantic reasoning during the assessment of trustworthiness quality factors. Furthermore, our modeling approach allows for the customization of the trustworthiness assessment model to reflect specific assessment needs while at the same time facilitating the comparison of trustworthiness across projects by defining a standard set of measures and sub-factors. In addition to supporting our

existing analysis of the impact of API breaking changes and vulnerabilities, OntTAM supports a new semantic analysis of software license compatibility. Our research is significant for several reasons:

• We introduce OntTAM, a novel trustworthiness assessment model that takes advantage of both our previous generic SE-EQUAM software assessment model [9] and our unified ontological knowledge representation of different SE related knowledge resources [9], [123], [124], while supporting the customization of the model to meet a stakeholder's assessment needs.
• We extend the MARKOS license ontology [10] with semantic rules for three categories of license violations.
• We introduce, as part of OntTAM, novel trustworthiness measures which capture API breaking changes, security vulnerabilities, and license violations. These measures take advantage of our ontologies and semantic reasoning services to allow for a trustworthiness analysis across the boundaries of individual artifacts and projects.
• We report on a case study illustrating how our approach can be applied to assess the trustworthiness of OSS libraries and discuss the potential impact of these libraries on the trustworthiness of the overall system.

7.1.1 Motivating Example

In what follows, we introduce a motivating example (Figure 7.1) describing how a fictional software developer (Bob) attempts to reuse external libraries while facing several challenges in selecting the best library for his project and in reducing its negative effect on the trustworthiness of his own project. Bob is currently developing an application which requires an embedded database. Bob tries to reduce his development effort by searching the Internet for possible third-party libraries and components which meet his work context. His search returns Apache Derby44, an open source embedded DBMS implemented entirely in Java. However, Bob is now faced with the dilemma of deciding which version of Derby he should use: the most recent one (Derby version 10.11.1.1) or the most widely used one (Derby version 10.1.1.0). Following the

44 db.apache.org/derby/

recommendations published in existing research (e.g., Mileva et al. [2]), Bob decides to use an older version of Apache Derby (version 10.1.1.0) due to its widespread usage/popularity. However, this recommendation results in the reuse of a component which contains three known security vulnerabilities that are already reported in the National Vulnerability Database (NVD) (see Table 6.1 in Section 6.1.1). In contrast, the newer version of Derby (version 10.11.1.1) does not contain any known vulnerabilities. However, this is not the only risk Bob is susceptible to when selecting a library. Derby is licensed under the Apache 2 copyright license; for Bob not to introduce any license violation or incompatibility, he must make sure that the selected library is compliant with his project's license. For example, one cannot combine code released under the Apache 2 license with code released under the GNU GPL 2 [146].

Figure 7.1: Motivating Example – How OntTAM can assist developers in trust assessment (Bob's development context, with directly and indirectly reused libraries, a vulnerable Derby 10.1.1.0 dependency, and a license violation between MPL 1.1 and LGPL 2.1 licensed components)

As this example illustrates, several quality-related issues can arise with the reuse of third-party libraries, and they are often difficult for the user to discover, since the relevant information is spread across multiple knowledge resources. The problem is further exacerbated by the large number of additional transitive dependencies which are introduced by these third-party libraries and their dependencies. A vulnerability or license violation might occur not only directly between Bob's project and the Derby library, but also between Bob's project and one of the libraries the Derby library depends on.


Note: An earlier version of the work presented in this chapter has been accepted for publication in the Software Quality Journal, 2019 [147].

7.2 Background

The work presented in this chapter combines different knowledge sources (build and dependency repositories, vulnerability databases, source code changes, versioning history, and software licenses) and existing models (SE-EQUAM, SEVONT, and MARKOS). In this section, we provide a brief background on Open Source Software (OSS) licenses and the existing MARKOS and SE-EQUAM models we reuse. For an overview of the core concepts related to build and dependency management, source code changes, and the SEVONT model, we refer the reader to Sections 3.4, 5.2.1, and 6.2.3, respectively.

Table 7.1: Ten common open source licenses and their traits

                                                       Requires        Derivative Work Requirements
License                                                Attribution45   Public Source Code   Same License
Apache 2                                               YES             NO                   NO
Artistic 2                                             YES             NO                   NO
BSD46 – Berkeley Software Distribution License         YES             NO                   NO
EPL 1 – Eclipse Public License                         YES             YES                  YES
GPL 2 – GNU General Public License                     YES             YES                  YES
GPL 3                                                  YES             YES                  YES
LGPL 2.1 – GNU Lesser General Public License           YES             YES                  YES
LGPL 3                                                 YES             YES                  YES
MIT – Massachusetts Institute of Technology License    YES             NO                   NO
MPL 2 – Mozilla Public License                         YES             YES                  YES

7.2.1 Open Source Licenses

An OSS license is a legal instrument that allows a creative work (source code) to be used, modified, and/or shared under defined terms and conditions [148]. OSS licenses are categorized as either restrictive or permissive. Restrictive licenses (also known as copyleft licenses) require derivative works to be licensed under the same terms. A derivative work is defined as any work

45 "Requires Attribution" generally means posting, in your software's credits, the title of the OSS project and a copy of its license (with the optional posting of the author and a link to the project's website).
46 BSD can refer to a handful of variations on the same license. For the purposes of this work, the common 2- and 3-clause variants are used.

that stems from or is adapted from the original work [149]. An example of a restrictive license is the GPL 3. Permissive licenses, on the other hand, have fewer requirements on derivative works; for example, the MIT License only requires author attribution and reproducing the license with the distributed software. Table 7.1 lists the ten most frequent licenses with a summary of their pertinent features [150].

7.2.2 License Violations

While dependency management tools have been introduced to automate the downloading and importing of libraries into projects, these libraries still originate from various authors and come with a plethora of OSS licenses (horizontal increase). One library can utilize another library, leading to hierarchies of libraries and license dependencies. All these libraries' licenses must be compatible and compliant with each other. License violations and incompatibilities are an often-overlooked factor when recommending libraries and can therefore significantly impact the trustworthiness of software systems. When incompatible licenses are used together, a license violation occurs. A license violation is defined as "the act of making use of a (licensed) work in a way that violates the rights expressed by the original creator" [151], that is, not following the legal terms and conditions set out in the source license. Software authors who commit a license violation open themselves to the possibility of being sued; sometimes this risk can amount to millions of dollars, as in the recent case of Oracle v. Google [152]. It should be noted that even though the term license violation is used throughout this thesis, a definitive violation is only determined as such by a judge or jury. Consequently, "potential" is the operative word when discussing license violations.

7.2.3 The MARKOS License Ontology

In order to find possible license violations, a definition of the rules and permissions associated with a license is needed. In addition, one must outline the allowed and disallowed interactions of any two OSS licenses. Fortunately, the MARKOS ontology [10] provides a formal vocabulary and a set of rules that help software developers analyze open source license compatibility issues. Table 7.2 lists the license permissions defined by the MARKOS ontology. The scope of this research is limited to the Reciprocity permission type. Reciprocity is important to this research because its context is straightforwardly captured by an ontology and

easily relatable to the Maven repository of projects. The reciprocity requirement mainly influences (but is not the sole requirement for) the definition of license compatibility demarcated later in this chapter.

Table 7.2: Permissions defined in the MARKOS ontology

Permission                       Description
Attribution
Adaptation                       An OSS license allows the original creative work to be adapted and modified.
Distribution                     One can publicly distribute the source code.
LibraryUsageWithoutReciprocity   Reciprocity means the source code from a derivative project must be released under the same license as the derived project for both libraries to be used together.
PatentGrant                      Some source code algorithms or processes are patented, and the author agrees to grant permission to any downstream user of the source code.
Reproduction                     One can reproduce or make copies of the source code.
Sublicensing                     A user of this source code is permitted (or not) to sublicense the code to another license.

Beyond reciprocity, violations of some of the other permissions are harder to detect because they occur outside the realm of a Software Engineering context. For example, the authors of BusyBox sued Samsung in 2009 [153] and settled in 2010 [154] because Samsung was using BusyBox's FLOSS project without publicly publishing the source code (when distributing the software with their hardware). This is a violation of the distribution term of the GPL 2 (which equates to the Distribution permission in the MARKOS ontology). This was only found by manually checking the physical product (in this case a Samsung television) and verifying that the FLOSS was indeed running on the TV hardware. We do not (yet!) have an automated way of testing all the physical products in the world. Therefore, in creating a definition of license violation, we must combine multiple permissions that are feasible to determine. These permissions provide a basis for constructing definitions of compatibility, incompatibility, and license violations, which are further described in the next section.
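To make the reciprocity check concrete, a potential violation can be sketched as a SPARQL query over the integrated build and license knowledge: flag a client that depends on a library whose license demands reciprocity but differs from the client's own license. The property names sbson:hasLicense and markos:requiresReciprocity are illustrative placeholders rather than the exact MARKOS vocabulary terms (sbson:hasDirectDependencyOn is the dependency property used in Figure 6.19):

SELECT ?client ?library ?clientLicense ?libLicense
WHERE {
  ?client  sbson:hasDirectDependencyOn ?library .
  ?client  sbson:hasLicense ?clientLicense .     # illustrative property
  ?library sbson:hasLicense ?libLicense .        # illustrative property
  ?libLicense markos:requiresReciprocity true .  # illustrative property
  FILTER (?clientLicense != ?libLicense)
}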

7.2.4 Evolvable Quality Assessment Metamodel (SE-EQUAM)

Quality is a widely used term to evaluate the maturity of development processes within an organization. Defining quality allows organizations to specify and determine whether a product has met certain non-functional and functional requirements. However, as Kitchenham [155], [156] states: "quality is hard to define, impossible to measure, easy to recognize." Unlike functional requirements, where a single analysis technique (e.g., use case modeling) is sufficient to identify

essentially all requirements, no single analysis technique is appropriate for all quality requirements. Quality, as defined by ISO 9000, is the "degree to which a set of inherent characteristics fulfills requirements", where a requirement is a "need or expectation that is stated, generally implied or obligatory" [157]. Assessing the evolvability of software systems has been addressed in existing research through the introduction of software quality models, e.g., McCall [158], ISO/IEC 912647, and QUALOSS [159]. These models share a common, while informal (not machine-readable), structural representation of software qualities (Figure 7.2).

Figure 7.2: Generic structure of quality assessment models [160] (quality dimensions (optional), quality factors, subfactors, attributes, metrics, and measures)

While these models can assess qualities in a given context, they lack the required formalism and semantics to allow them to evolve to meet the modeling requirements of different assessment contexts. The ability to adjust to changing assessment needs was the main motivation for SE-EQUAM, an Evolvable QUAlity Meta-model that derives a formal (machine-readable) domain model that can adapt to changes in assessment needs in terms of both the artifacts being assessed and their assessment criteria [9]. SE-EQUAM addresses these challenges by taking advantage of the Semantic Web and its supporting technologies. SE-EQUAM uses ontologies to model and

47 https://www.iso.org/obp/ui/#iso:std:39752:en

conceptualize the quality factors, sub-factors, attributes, measures, weights, and relationships used to assess software quality. Input artifacts for the assessment model are various software artifacts such as version control systems and issue trackers; its outputs are quality assessment scores based on the different assessment criteria. Ontologies not only provide a formal way to represent knowledge but also eliminate ambiguity, enable validation, and provide a consistency-checking approach [161]. SE-EQUAM uses semantic reasoners to infer hidden relationships between domain model attributes. Given its formal representation, SE-EQUAM allows for its reuse by simplifying the instantiation of new domain-specific instances of the model. More details about the semantic reasoning are provided in [9]. Figure 7.3 illustrates the reuse and instantiation of our SE-EQUAM model. The generic syntactic meta-model, which forms the basis for all quality models, can be instantiated by a domain model (e.g., ISO/IEC 9126). Furthermore, SE-EQUAM allows for a semantic mapping between the syntactical meta-model and a semantic ontology meta-model, which can then be instantiated as a domain model ontology based on user-defined assessment criteria.

Figure 7.3: SE-EQUAM ontology meta-model reuse to instantiate a domain model ontology (OntEQAM) [9]

The SE-EQUAM Process. The general SE-EQUAM process (Figure 7.4) comprises a set of tasks and activities which we followed to derive a generic quality assessment method

that can be customized and instantiated to meet a stakeholder's specific quality assessment context. The inputs to the SE-EQUAM process are software artifacts and a set of core quality measurements applicable to these artifacts. In the next step, a common ontological representation for these artifacts is established by reusing existing models or customizing them to meet the requirements of these artifacts. As part of the model adjustment activity, the quality metrics and measurements included in the core model can be customized and extended to reflect a specific model context. The output of this process is an instantiated assessment model which meets specific user and project assessment requirements by providing a quality assessment at both the individual artifact and the overall product level. Figure 7.4 illustrates the high-level activities and major tasks involved in the SE-EQUAM instantiation method.

Figure 7.4: SE-EQUAM process to instantiate an evolvability model (1. Artifact Selection, 2. Modeling, 3. Model Adjustment, 4. Assessment)

In the next section, we introduce OntTAM, which illustrates a concrete instantiation of the SE-EQUAM process to create a semantically enriched trustworthiness quality assessment model for software libraries.

7.3 Ontology-based Trustworthiness Assessment Model (OntTAM)

OntTAM, an instantiation of the SE-EQUAM [9] ontology meta-model, illustrates how our modeling approach can take advantage of the unified ontological representation of both software artifacts and the generic SE-EQUAM quality assessment model. OntTAM instantiates a domain-specific quality model to assess the trustworthiness of software projects and, more specifically,

the trustworthiness of external libraries. OntTAM reuses SE-EQUAM's core quality model structure, which is based on quality factors, sub-factors, attributes, measures, weights, and relationships, and extends it with trustworthiness-specific aspects. Inputs to OntTAM are knowledge resources such as version control systems, build systems, project license information, and security vulnerability information. The output of OntTAM is a trustworthiness assessment score, either for an individual metric or as an aggregation of sub-factors and factors for the overall product/library quality. The model thereby takes advantage of OWL and RDF/RDFS semantic reasoning capabilities to infer hidden relationships between domain model attributes and to ensure consistency among these attributes.

Figure 7.5: The Software Trustworthiness Ontology Hierarchy (general, domain-spanning, domain-specific, and system-specific concept layers integrating the SE-EQUAM, SV-AF, and OntTAM models)

Figure 7.5 provides an overview of the knowledge model framework and its organization in terms of ontologies and their abstraction levels. While these ontologies may be derived, modeled, and used independently, a key objective of our approach is knowledge integration across ontology boundaries, using both ontology alignments and semantic linking to create a unified ontological knowledge representation. In what follows, we present our OntTAM methodology to further demonstrate how we instantiate different trustworthiness sub-factors (i.e., security, reliability, and legality) to establish a trustworthiness assessment for OSS products (e.g., external libraries). More

specifically, we discuss in detail the four major steps involved in instantiating our customized OntTAM trustworthiness assessment model (see Figure 7.4): artifact selection, modeling, model adjustment, and the assessment process.

7.3.1 Artifact Selection

The inputs to OntTAM are artifacts relevant to the reuse of software libraries within projects. These software artifacts can be categorized into endogenous and exogenous data. Endogenous data represents data available internally to a software development environment (e.g., software artifacts related to versioning systems, issue trackers, software licenses, and build systems). Exogenous data refers, in our context, to data available externally to the software development environment (e.g., external vulnerability databases). Extracting and populating facts from these artifacts is often based on techniques commonly used by the MSR community [64], [162], [163]. It should be noted that unstructured or semi-structured information (e.g., vulnerability descriptions and license information) often requires several preprocessing steps, such as natural language processing (NLP) and data cleansing, to improve the quality of the data prior to the ontology population.

7.3.2 Model and Model Adjustment

In this section, we discuss our knowledge modeling process in detail. It should be noted that, to improve readability, we use prefixes for the fully qualified names of our ontologies. The ontology prefixes used in this chapter can be dereferenced using the URIs shown in Appendix A.

7.3.2.1 Modeling Project Trustworthiness

Since OntTAM is based on the generic SE-EQUAM model, OntTAM is an extension and specialization of our core SE-EQUAM software quality assessment model. OntTAM is extended to provide a syntactical trustworthiness quality model that includes and defines a set of sub-factors, attributes, and metrics required for the assessment of trustworthiness. Many of these trustworthiness factors, attributes, and metrics are derived from existing work on the trustworthiness assessment of open and closed source projects [9], [160]. The OntTAM-specific trustworthiness assessment is based on two general quality dimensions: the community dimension and the product dimension. The community dimension assesses the adoption of a software product by the community over an extended period by considering its popularity in terms of downloads, rankings, and the activity of the development community. The product dimension assesses the internal structure of the product and the development processes that impact its reusability, which is the focus of this chapter. Figure 7.6 provides an overview of the complete model instantiation process, which creates as its output a formal (machine-readable) and semantically enriched trustworthiness assessment model. The process involves applying both a syntactic and a semantic mapping from SE-EQUAM to OntTAM. While the syntactical model allows us to answer basic queries, such as: What are the sub-factors associated with product trustworthiness?, the semantic mapping enables the use of DL axioms (such as the property chain axiom) to infer new implicit relationships (dashed lines in Figure 7.6 – semantic OntTAM ontology) from explicitly modeled relationships in OntTAM (solid lines in Figure 7.6).
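As an illustration, the basic query above could be expressed against the syntactical model along the following lines (a minimal sketch, assuming the onttam: prefix from Appendix A and the individuals shown in Figure 7.7):

SELECT ?subfactor
WHERE {
    # The product dimension is linked to its factors, and each factor to its sub-factors
    onttam:ProductDimension onttam:hasFactor ?factor .
    ?factor onttam:hasSubfactor ?subfactor .
}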


Figure 7.6: Reuse of the SE-EQUAM meta-model to instantiate the OntTAM domain model ontology


Figure 7.7: An example defining the associated trustworthiness concepts and measures for a sample project

Figure 7.7 illustrates the main steps which are applied to associate trustworthiness concepts and measures with a sample project (ProjectX):
1. Define the product and community dimensions: ⟨ProductDimension rdf:type Dimension⟩ and ⟨CommunityDimension rdf:type Dimension⟩.
2. Define reusability as a factor that is associated with the product dimension: ⟨Reusability rdf:type Factor⟩ and ⟨ProductDimension hasFactor Reusability⟩.
3. Following the same approach, OntTAM defines reliability as a sub-factor of reusability which is associated with the popularity attribute: ⟨Reliability rdf:type SubFactor⟩, ⟨Reusability hasSubfactor Reliability⟩, and ⟨Reliability hasAttribute Popularity⟩.
4. Assuming that OntTAM assesses a product's reusability through the popularity trustworthiness attribute using the DependencyCount measure, we can now define this as: ⟨DependencyCount rdf:type Measure⟩ and ⟨Popularity hasMeasure DependencyCount⟩ (see the sketch below).
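The following SPARQL update sketches how these assertions could be materialized in the knowledge base (a minimal sketch, assuming the onttam: namespace; the individual names mirror Figure 7.7):

INSERT DATA {
    # Step 1: dimensions
    onttam:ProductDimension a onttam:Dimension .
    onttam:CommunityDimension a onttam:Dimension .
    # Step 2: reusability factor of the product dimension
    onttam:Reusability a onttam:Factor .
    onttam:ProductDimension onttam:hasFactor onttam:Reusability .
    # Step 3: reliability sub-factor and popularity attribute
    onttam:Reliability a onttam:SubFactor .
    onttam:Reusability onttam:hasSubfactor onttam:Reliability .
    onttam:Popularity a onttam:Attribute .
    onttam:Reliability onttam:hasAttribute onttam:Popularity .
    # Step 4: measure assessing the popularity attribute
    onttam:DependencyCount a onttam:Measure .
    onttam:Popularity onttam:hasMeasure onttam:DependencyCount .
}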


Finally, we enrich OntTAM's syntactical model into a semantic model by establishing additional semantic relationships through property chain axioms (e.g., directly relating a Project with its Attributes and Measures). The following are examples of the OWL 2 property chain axioms which we added to be able to take advantage of RDFS reasoning during the assessment process.
• Project-related OWL 2 property chain constructs:
  o SubPropertyOf( ObjectPropertyChain( :Project :hasFactor ) :Factor )
  o SubPropertyOf( ObjectPropertyChain( :Project :hasSubfactor ) :Subfactor )
  o SubPropertyOf( ObjectPropertyChain( :Project :hasAttribute ) :Attribute )
  o SubPropertyOf( ObjectPropertyChain( :Project :hasMeasure ) :Measure )
• Dimension-related OWL 2 property chain constructs:
  o SubPropertyOf( ObjectPropertyChain( :Dimension :hasSubfactor ) :Subfactor )
  o SubPropertyOf( ObjectPropertyChain( :Dimension :hasAttribute ) :Attribute )
  o SubPropertyOf( ObjectPropertyChain( :Dimension :hasMeasure ) :Measure )
• Factor-related OWL 2 property chain constructs:
  o SubPropertyOf( ObjectPropertyChain( :Factor :hasAttribute ) :Attribute )
  o SubPropertyOf( ObjectPropertyChain( :Factor :hasMeasure ) :Measure )
• Subfactor-related OWL 2 property chain constructs:
  o SubPropertyOf( ObjectPropertyChain( :Subfactor :hasMeasure ) :Measure )
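After reasoning, these axioms make the chained relationships directly queryable; for instance, all measures associated with a project can be retrieved without explicitly traversing the factor and sub-factor levels (a minimal sketch, using the hypothetical ProjectX individual from Figure 7.7):

SELECT ?measure
WHERE {
    # hasMeasure is inferred for the project via the property chain axioms
    onttam:ProjectX onttam:hasMeasure ?measure .
}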

7.3.2.2 Integration with Other Knowledge Artifacts

Assessing the overall trustworthiness of a software library requires us not only to instantiate OntTAM but also to integrate it with other ontological software knowledge artifacts to be able to derive and integrate novel trustworthiness measures. For the integration, we take advantage of software artifact ontologies we have created and refined over the years [123], [164], [165] and reuse existing ontologies [34] that model different software artifacts. Figure 7.8 provides an overview of the software artifacts and their ontologies which we integrate with OntTAM. These artifacts include, but are not limited to, (a) the Software Evolution Ontologies (SEON), which model software engineering repositories such as source code, version control systems, and issue tracker systems, (b) the Build Systems ONtology (SBSON), which captures knowledge about build management systems (e.g., Maven), (c) the Software sEcurity Vulnerability Ontologies (SEVONT), which model software security vulnerability information such as severities, impacts, vulnerability types, and patch information found in different vulnerability databases, and (d) MARKOS, which models software license compatibilities. The integration of these heterogeneous knowledge resources allows us to introduce different trustworthiness measures related to the reuse of software libraries. More specifically, in this research, we introduce the following three trust criteria: API breaking changes, security vulnerabilities, and license violations. Figure 7.8 shows the core concepts and object properties, distributed across the different abstraction layers of our knowledge modeling framework (Figure 7.5). It should be noted that we omitted data properties to improve the readability of the figure.

Figure 7.8: Integrating the OntTAM ontology into the SV-AF model and reusing SE-EQUAM concepts

Among the core concepts used from these ontologies is the BuildRelease concept from the SBSON build ontology, a subclass of the SEON Release concept, which captures the fact that a project can have several releases (including library releases). A release has a License and defines its dependencies on other releases. Each release contains a set of code elements such as CodeEntity, ComplexType, and Method. A release can be affected by a Vulnerability, leading to the release of a new version containing a SecurityPatch. A security patch corresponds to code changes introduced to fix some existing VulnerableCode, which is part of a VulnerableRelease. For example, if a class or method is modified during a security patch, then this code change can be used to locate the original vulnerable code. The OWL classes SecurityPatch and VulnerableCode are linked in our model through an identify object property. For a complete description of the ontologies, how they are built, the alignment processes, and reasoning, we refer the reader to Chapters 4, 5, and 6.
All these core concepts have metrics used by the OntTAM assessment process. Measures have a unit and are expressed on a scale, e.g., an ordinal or nominal scale. Information about units and scales can be used to perform conversions [34]. Many base measures, such as the number of lines of vulnerable code (LOVC), the number of known vulnerabilities, vulnerability severities (scores), and the number of license violations, provide only limited insights when viewed in isolation. Additional derived measures are needed to support further analysis and assessment of software artifacts. These derived measures represent an aggregation of values from different subdomains; for example, the number of vulnerabilities per class is an aggregation of measures derived from the source code and the vulnerability repositories. While the abstract measurement concepts are defined in the general upper layer of our integrated model (Figure 7.8), many base measures (e.g., Size) and derived measures (e.g., Weighted Vulnerability Density) are modeled in the domain-specific layer.

7.3.3 Measures and Metrics

An essential feature of our modeling approach is to allow users to customize the OntTAM model through user-defined queries, which might introduce different metrics, ranging from simple metrics to semantically rich metric queries that take advantage of implicit knowledge inferred by ontological reasoners. Given our ontology-based modeling approach, these analysis results can also be materialized to enrich our knowledge base and to promote the reuse of existing analysis results. In what follows, we introduce some metrics to be used later for the assessment of the trustworthiness of systems. These metrics take advantage not only of our unified representation, but also of the inference services provided by the Semantic Web.

The Weighted Vulnerability Density (WVD) Metric compares software systems (or their components) based on the severity scores of known vulnerabilities. The objective of WVD is to measure the impact of known vulnerabilities on a product's quality, with the most severe vulnerabilities having the greatest impact. The metric can be applied, for example, to prioritize the patching of vulnerabilities based on their severity. To account for both direct and indirect impacts of vulnerabilities, we introduce the WVDdirect and WVDinherit measures. Although a project can have a WVDdirect score of 0 because no known security vulnerability has been reported for the core project, it is still possible that the project is exposed to indirect vulnerabilities found in external (third-party) dependencies (components) that are included in the parent project. Such a potential security risk is assessed by the WVDinherit measure.

\[
WVD_{direct}(release) = \frac{\sum_{i=1}^{V} w_i}{S} \qquad \text{(Equation 1)}
\]

where S is the size of the software (in KLOC), w_i is the weight (severity score) of a known vulnerability affecting the system, and V is the number of known vulnerabilities in the system.

\[
WVD_{inherit}(release) = \sum_{i=1}^{n} \left( \frac{\#\,\text{vulnerable APIs in } d_i \text{ used by release}}{\#\,\text{total vulnerable APIs in } d_i} \right) \times WVD_{direct}(d_i) \qquad \text{(Equation 2)}
\]

where n is the number of dependencies used by the release, and d_i is the i-th dependency.

\[
WVD_{overall}(release) = WVD_{direct}(release) + WVD_{inherit}(release) \qquad \text{(Equation 3)}
\]
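As a brief worked example with hypothetical numbers: a release of size 2.5 KLOC with three known vulnerabilities of severity scores 7.5, 5.0, and 4.3 yields

\[
WVD_{direct} = \frac{7.5 + 5.0 + 4.3}{2.5} = 6.72
\]

If the release additionally uses 2 of the 10 vulnerable APIs of a dependency d_1 with WVD_direct(d_1) = 3.0, then WVD_inherit = (2/10) × 3.0 = 0.6, and WVD_overall = 6.72 + 0.6 = 7.32.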

License Violation Count (LVC) is a measure assessing the number of license violations that exist within a given project. This measure can indicate potential long-term risks associated with intellectual property rights violations within a project. A license violation occurs if any of the dependent components of a parent project includes components with non-compatible licenses. Open source license violations are often due to the fact that many software developers are neither aware of nor well-versed in open source license compliance. For example, in 2008 the Free Software Foundation (FSF) claimed that various products sold by Cisco under the Linksys brand had violated the licensing terms of many programs on which the FSF held the copyright48. These FSF programs were under the GNU General Public License, a copyleft license which allows users to modify a piece of software as long as the derivative work is under the same license.

48 https://en.wikipedia.org/wiki/Free_Software_Foundation,_Inc._v._Cisco_Systems,_Inc.


Figure 7.9: Categories of license violations (Type 1: simple violations; Type 2: transitive violations; Type 3: compound violations)

In this work, we identify three main categories of license violations: simple violations, transitive violations, and compound violations (see Figure 7.9). LVCsimple, LVCtransitive, and LVCcompound are base measures associated with each category. Details on how license violations are identified are presented in Section 7.4.3.

\[
LVC_{overall}(release) = LVC_{simple}(release) + LVC_{transitive}(release) + LVC_{compound}(release) \qquad \text{(Equation 4)}
\]
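Building on the violation-identification queries presented later in Section 7.4.3 (Figures 7.17 to 7.19), the simple-violation count for each project could be derived along the following lines (a sketch; the onttam:hasSimpleLVC property is hypothetical, while the remaining vocabulary follows Figure 7.17):

CONSTRUCT { ?project1 onttam:hasSimpleLVC ?lvcSimple }
WHERE {
    {
        SELECT ?project1 (COUNT(?project2) AS ?lvcSimple)
        WHERE {
            # Count direct dependencies whose license is incompatible with the parent's license
            ?link a sbson:DependencyLink ;
                  sbson:hasDependencySource ?project1 ;
                  sbson:hasDependencyTarget ?project2 .
            ?project1 markosLicense:coveringLicense ?license1 .
            ?project2 markosLicense:coveringLicense ?license2 .
            ?license1 markosCopyright:incompatibleWith ?license2 .
        } GROUP BY ?project1
    }
}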

Breaking Change Density (BCD) Metric is a normalized measure which represents the ratio between breaking and non-breaking API changes introduced in a project. API changes often occur as a project and its components evolve inconsistently, resulting in incompatibilities between APIs and API calls. This measure can be used to determine the stability of an API over time, i.e., how often breaking changes occur. The BCD metric can be represented formally as follows:

\[
BCD = \frac{\#\,\text{breaking API changes}}{\#\,\text{non-breaking API changes}} \qquad \text{(Equation 5)}
\]
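For example, a hypothetical release that introduces 4 breaking and 20 non-breaking API changes has

\[
BCD = \frac{4}{20} = 0.2
\]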

Breaking Change Impact (BCI) measures the impact of breaking changes on client applications by assessing a client application and its use of APIs with a changed contract. The impact of breaking changes on clients can be both direct and indirect. Details on how we identify the impact of breaking changes are presented in Sections 5.4 and 5.5. We introduce two measures that capture both direct and indirect breaking changes. We represent the BCI metrics formally as follows:

\[
BCI_{direct}(C, D) = \frac{\#\,\text{breaking API changes in } D \text{ used by } C}{\#\,\text{breaking API changes in } D} \qquad \text{(Equation 6)}
\]

\[
BCI_{indirect}(C, \langle D_1, \ldots, D_n \rangle) = \frac{\#\,\text{breaking API changes in } \langle D_1, \ldots, D_n \rangle \text{ used by } C}{\#\,\text{breaking API changes across } \langle D_1, \ldots, D_n \rangle} \qquad \text{(Equation 7)}
\]

where C is the client project, D is the reused library, and ⟨D_1, …, D_n⟩ is the set of (direct and transitive) different library releases being reused by the client.
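To illustrate with hypothetical numbers: if a library release D introduces 8 breaking API changes and client C uses 2 of the affected APIs, then

\[
BCI_{direct}(C, D) = \frac{2}{8} = 0.25
\]

i.e., a quarter of the breaking changes in D directly impact C.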

7.3.4 Assessment Process

Given that stakeholders, with varying contexts, have different assessment needs, our OntTAM assessment process allows for the customization of the trustworthiness assessment model in terms of the sub-factors and attributes being assessed, as well as the individual weights assigned to them. While the default weights for all sub-factors and attributes are equal, users can customize these weights to closely match their assessment objective and context. Furthermore, most existing assessment approaches rely on crisp boundaries (e.g., based on thresholds), which can lead to inaccuracies in the assessment process. It is not always feasible or desirable to use crisp values, especially when one deals with values close to the boundaries. For example, let us consider an assessment approach with a vulnerability count threshold of 4. Based on this crisp boundary, a project with 5 reported known vulnerabilities will be assessed as non-trustworthy, even though it is borderline to being considered trustworthy. To further exemplify the problem: using crisp boundary values, there would be no difference between a project with 5 known vulnerabilities and another project with 100 vulnerabilities; both projects would be considered equally non-trustworthy. This problem occurs not only at the individual measurement level but also at other assessment levels (e.g., sub-factor, factor). To address this challenge, we apply a fuzzy logic assessment and inference approach to eliminate the need for crisp value boundaries.



Figure 7.10: Fuzzy Assessment Process Steps

Figure 7.10 shows the set of transformation steps performed during the fuzzification of the assessment process; each step is discussed in more detail throughout this section. (1) Measure Calculation: Inputs to this step are raw values from the populated ontologies. Measures are calculated by querying our populated knowledge base for the base and derived measures introduced in the previous section (e.g., WVD). (2) Fuzzification: The extracted quality measures and weight values are used to create fuzzy scales in the fuzzification step. As part of this step, fuzzy scales are created for the different measures, for the assessment weights (provided by stakeholders of the assessment model to assign a level of importance to different measures), and for the overall assessment result. These results are converted to linguistic variables, which are variables whose values are expressed as words or sentences (e.g., high, not very high, low) [166]. These linguistic variables are the building blocks of fuzzy logic and become the input for the fuzzy inference engine. Figure 7.11 shows an example of a fuzzy scale created for the WVD measure and its assessment weights. The x-axis represents the range of measurement results and the y-axis the membership degree (ranging from 0 to 1). The higher the membership value, the stronger the measurement's relation to its fuzzy result scale. The overlap between the boundaries of categories in the fuzzy scale reflects the uncertainty in interpreting boundary measurement results.
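To make the notion of membership degrees concrete, consider the triangular term boundaries later listed in the FCL file of Figure 7.32: a hypothetical WVD measurement of 4.0 falls into the overlap between the LOW term, defined by the points (1.90, 2.975, 4.14), and the AVERAGE term, defined by (3.73, 4.91, 6.17):

\[
\mu_{LOW}(4.0) = \frac{4.14 - 4.0}{4.14 - 2.975} \approx 0.12, \qquad \mu_{AVERAGE}(4.0) = \frac{4.0 - 3.73}{4.91 - 3.73} \approx 0.23
\]

Instead of being forced into a single crisp category, the measurement is partially LOW and partially AVERAGE, and both membership degrees are propagated to the inference rules.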

Figure 7.11: WVD measure fuzzy scale and Weight Fuzzy Scale for WVD measure

Since high WVD, LVC, and BCD measures lower the overall quality and trustworthiness score of a project, we made the following three assumptions to automate the fuzzy inference rules for these measures: (a) when the user-specified weight is high, the individual measure score is lowered by one level, while VeryPoor scores keep their values (e.g., a high weight will change an Excellent score to VeryGood); (b) the opposite holds for low weights: since such measures are less relevant to the overall assessment, their scores are adjusted one level higher, and Excellent scores keep their values; (c) with a medium weight, scores keep their values. These assumptions reflect the fact that when a measure is of high importance to the assessment (high weight), its score should be more sensitive to a low measure value. (3) Inference and Assessment: Inputs for this step are the fuzzified measure and weight values in the form of linguistic variables. These linguistic results are transformed into the final assessment score by executing a set of fuzzy inference rules. The de-fuzzification is based on a set of fuzzy inference rules expressed in the Fuzzy Control Language (FCL) [167] using the jFuzzyLogic inference engine [168]. The inference engine fires the relevant fuzzy rules based on the provided input. Firing the rules calculates the final weighted overall measurement result, which is a combination of all the different measures. Using the Center of Gravity (COG) method, considered one of the most popular de-fuzzification methods [169], the overall fuzzy measurement result is de-fuzzified back into a numerical assessment result so that it can be populated back into the knowledge base. As part of our assessment, we create a Fuzzy Control Language (FCL) file for each measure. The complete set of FCL files for all measures can be found online49. (4) Knowledge Enrichment: This optional step allows for the integration of the assessment results at the individual attribute, sub-factor, and overall assessment levels. Our ontological representation enables us to seamlessly integrate these assessment results into the knowledge base, thereby not only supporting the reuse of analysis results but also allowing their use for further semantic analysis.

7.4 Case Study

In what follows, we illustrate the applicability of our modeling approach to support the assessment of trust in OSS libraries, highlighting the flexibility of our modeling approach in terms of its seamless knowledge and analysis result integration, as well as the use of Semantic Web reasoning services to infer new knowledge (measures). In Section 7.4.1, we present the setup for our study, including the selection process for the 4 projects used to illustrate our approach; Sections 7.4.2 to 7.4.4 describe how we measure security vulnerabilities, license violations, and API breaking changes. Section 7.4.5 describes how these individually identified measures can be integrated for a holistic trustworthiness assessment.

7.4.1 Study Setup

For the data collection and extraction in our case study (see Figure 7.12), we rely on four data sources: the NVD database, GitHub, SVN, and the Maven build repository. We downloaded the latest versions of the Maven and NVD repositories, which include 1,219,731 project releases in Maven and 74,945 vulnerabilities affecting 109,212 releases in NVD. For our study,

49 https://github.com/segps/segps-code

we limited the assessment scope to 4 projects. The projects were selected based on the following criteria: a.) at least some of their releases contained known vulnerabilities, b.) license details were provided, c.) releases varied in their major version numbers, and d.) the functionalities these products provide are widely reused by other projects (see Table 7.3 for details). The four subject systems vary in size (classes and methods) and application domain. Commons Fileupload50 adds file upload capabilities to web applications; CXF WS Security51 provides reusable components for client-side authentication, security, and encryption; Struts52 is an open source framework for creating Java web applications; and ASM53 is a Java bytecode manipulation library. We further extract the complete source code and history information of these four projects. The extracted facts are then populated into their corresponding ontologies and made persistent in our triple store.

Figure 7.12: Overview of the case study setup process (project code and version history are analyzed with VTracker to detect breaking changes; facts from the Maven repository (130,895 projects, 1,219,731 releases) and the NVD (74,945 unique CVEs, 109,212 vulnerable projects) are generated as triples, aligned across ontologies, and populated into a triple store with reasoning, which users query)

Table 7.3: Overview of selected case study projects

Project | # Releases analyzed | # of Dependencies
Commons Fileupload | 6 | 68854
Apache CXF WS Security | 5 | 4570
Struts | 3 | 3170
ASM | 20 | 8109

50 https://commons.apache.org/proper/commons-fileupload/ 51 http://cxf.apache.org/docs/ws-security.html 52 https://struts.apache.org/ 53 http://asm.ow2.org/


7.4.2 Identifying and Measuring Software Security Vulnerabilities

Approach. This section introduces some of the rules and queries we used to derive the WVD measures (overall, direct, and inherited). These rules are of interest since they highlight the flexibility and power of our modeling approach, allowing users to define and customize their own derived measures without the need for additional proprietary algorithm implementations or modeling. WVDdirect inference: In order to derive the WVDdirect score for the projects, we define rules using the Semantic Web Rule Language (SWRL), similar to the one shown in Figure 7.13. The rule states that if a project release has a LOC and an OverallSeverityScore measure, then the release has a WVDdirect score obtained by dividing the overall severity score by the LOC.

Release(?r),
hasLOC(?r, ?loc),
hasOverallSeverityScore(?r, ?score),
divide(?wvdDirect, ?score, ?loc)
→ hasDirectWVD(?r, ?wvdDirect)
Figure 7.13: SWRL rule to infer the direct WVD measure
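The rule in Figure 7.13 assumes that an overall severity score has already been attached to each release. One possible way to derive this score is to sum the severity scores of all vulnerabilities affecting a release (a sketch; the property name sevont:hasSeverityScore is an assumption here, while the affects relation appears in Figure 7.8):

CONSTRUCT { ?release sevont:hasOverallSeverityScore ?overallScore }
WHERE {
    {
        SELECT ?release (SUM(?severityScore) AS ?overallScore)
        WHERE {
            # Aggregate the severity scores of all vulnerabilities affecting a release
            ?vulnerability a sevont:Vulnerability ;
                           sevont:affects ?release ;
                           sevont:hasSeverityScore ?severityScore .
        } GROUP BY ?release
    }
}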

WVDinherit inference: For us to be able to infer the WVDinherit measure of a project release, we first had to determine the ratio of vulnerable APIs reused in a particular release. The OntTAM knowledge model not only captures the information required to derive this measure but also includes all the semantics needed to take advantage of SW reasoners to infer the measure value. More specifically, once the required ontologies (e.g., SEVONT, SEON, OntTAM) are populated, a SPARQL query can be created to retrieve the number of vulnerable API elements in a given release (see Figure 7.14). Using the query in Figure 7.15, we can also determine the number of such vulnerable API elements being reused in client applications. The SPARQL query in Figure 7.16 exemplifies how we take advantage of the analysis results from the inference rule in Figure 7.13 to infer the final WVDinherit measure for a particular release of a component. For a more detailed description of how we detect vulnerable code elements, the reader is referred to Chapter 6.


CONSTRUCT { ?release sevont:hasVulnerableCodeCount ?totalVulnerableCodeCount }
WHERE {
    {
        SELECT ?release (COUNT(?vulnerableCode) AS ?totalVulnerableCodeCount)
        WHERE {
            ?vulnerableCode rdf:type code:VulnerableCode .
            ?release code:containsCodeEntity ?vulnerableCode .
        } GROUP BY ?release
    }
}
Figure 7.14: SPARQL query for inferring the total number of vulnerable code entities in a project

CONSTRUCT { ?link sevont:hasReusedVulnerableCodeCount ?usedVulnerableCodeCount }
WHERE {
    {
        SELECT ?link (COUNT(?vulnerableCode) AS ?usedVulnerableCodeCount)
        WHERE {
            ?link a sbson:DependencyLink ;
                  sbson:hasDependencySource ?client ;
                  sbson:hasDependencyTarget ?release .
            ?client code:containsCodeEntity ?codeEntity .
            ?codeEntity main:dependsOn ?vulnerableCode .
            ?vulnerableCode rdf:type code:VulnerableCode .
            ?release code:containsCodeEntity ?vulnerableCode .
        } GROUP BY ?link
    }
}
Figure 7.15: SPARQL query for inferring the vulnerable code entities used by different dependent projects


CONSTRUCT { ?client sevont:hasInheritWVD ?inheritWVD }
WHERE {
    {
        SELECT ?client (SUM(?indirectWVD) AS ?inheritWVD)
        WHERE {
            ?link a sbson:DependencyLink ;
                  sbson:hasDependencySource ?client ;
                  sbson:hasDependencyTarget ?release .
            ?link sevont:hasReusedVulnerableCodeCount ?usedVulnerableCodeCount .
            ?release sevont:hasVulnerableCodeCount ?totalVulnerableCodeCount .
            ?release sevont:hasDirectWVD ?directWVD .
            BIND((?usedVulnerableCodeCount / ?totalVulnerableCodeCount) AS ?vulnerableCodeRatio)
            BIND((?vulnerableCodeRatio * ?directWVD) AS ?indirectWVD)
        } GROUP BY ?client
    }
}
Figure 7.16: SPARQL query for inferring inherited WVD measures in client projects

Findings and Discussion. Table 7.4 provides an overview of the results from our case study, including the number of known vulnerabilities, project size, and WVD scores for selected project releases. Using the WVD measure, we can now compare two releases of the same project in terms of their weighted vulnerability density. For example, based on the WVD measure, we can consider Struts 1.2.9 to be more trustworthy than earlier versions of Struts (e.g., versions 1.2.4 and 1.2.8, which both have higher WVD scores). However, the latest version is not always better than earlier versions, as seen with the analyzed Apache CXF WS Security libraries. Version 2.7.0 of the CXF WS Security library has a worse WVD compared to its previous versions: two new vulnerabilities were introduced in version 2.7.0 in addition to the existing vulnerabilities inherited from prior versions.
We further analyzed the WVD results to see whether developers migrate their applications to library versions which are less vulnerable (e.g., a newer version of the same library with patched vulnerabilities). Table 7.5 provides an overview of the number of dependent applications which change their build dependency to a more trustworthy release (based on a lower WVD score). Our analysis shows that 45.1% of client applications switched their library dependencies; of these, 63.29% switched to a more trustworthy library release. Surprisingly, the remaining 36.71% switched to library releases which are equally or less trustworthy (equal or higher WVD score), even though more trustworthy library versions were available.

Table 7.4: Vulnerability densities of selected projects

Project | # vulnerabilities | Aggregated Vuln. Scores | Size (KLOC) | WVD
commons-fileupload 1.0 | 2 | 10.8 | 1.23 | 8.78
commons-fileupload 1.1 | 2 | 10.8 | 1.28 | 8.46
commons-fileupload 1.2 | 2 | 10.8 | 1.78 | 6.05
commons-fileupload 1.2.1 | 2 | 10.8 | 1.97 | 5.49
commons-fileupload 1.2.2 | 2 | 10.8 | 2.04 | 5.31
commons-fileupload 1.3 | 1 | 7.5 | 2.39 | 3.14
Apache CXF WS Security 2.4.1 | 4 | 23.6 | 18.92 | 1.25
Apache CXF WS Security 2.4.4 | 4 | 23.6 | 21.30 | 1.11
Apache CXF WS Security 2.4.6 | 5 | 27.9 | 23.10 | 1.21
Apache CXF WS Security 2.6.3 | 8 | 39.4 | 26.43 | 1.49
Apache CXF WS Security 2.7.0 | 10 | 49.4 | 26.43 | 1.87
Struts 1.2.4 | 5 | 30 | 24.04 | 1.25
Struts 1.2.8 | 8 | 49.6 | 24.61 | 2.02
Struts 1.2.9 | 4 | 25.7 | 24.76 | 1.04

Table 7.5: Clients who switched from a vulnerable API in a later release

Project | Vulnerability | % clients switched versions of the library | % clients switched to less vulnerable release (WVD) | % clients switched to a release with equal or higher WVD score
commons-fileupload 1.0 | CVE-2014-0050 | 29.36% | 74.26% | 25.74%
commons-fileupload 1.1 | CVE-2014-0050 | 6.28% | 58.33% | 41.67%
commons-fileupload 1.2 | CVE-2014-0050 | 70.54% | 100.00% | 0.00%
commons-fileupload 1.2.1 | CVE-2014-0050 | 38.97% | 97.55% | 2.45%
commons-fileupload 1.2.2 | CVE-2014-0050 | 46.79% | 99.99% | 0.01%
commons-fileupload 1.3 | CVE-2014-0050 | 40.62% | 0.00% | 100.00%
Apache CXF WS Security 2.4.1 | CVE-2013-0239 | 94.93% | 100.00% | 0.00%
Apache CXF WS Security 2.4.4 | CVE-2013-0239 | 95.00% | 0.23% | 99.77%
Apache CXF WS Security 2.4.6 | CVE-2013-0239 | 95.24% | 63.10% | 36.90%
Apache CXF WS Security 2.6.3 | CVE-2013-0239 | 98.08% | 85.29% | 14.71%
Apache CXF WS Security 2.7.0 | CVE-2013-0239 | 92.75% | 97.26% | 2.74%
Struts 1.2.4 | CVE-2016-1181 | 0.00% | n/a | n/a
Struts 1.2.8 | CVE-2016-1181 | 44.44% | 100.00% | 0.00%
Struts 1.2.9 | CVE-2016-1181 | 0.00% | n/a | n/a

7.4.3 Identifying and Measuring License Violations

Approach. License violations originating from external libraries and components can be a major long-term liability for client applications and can negatively affect the use of third-party intellectual property and, therefore, the trustworthiness of these libraries. In our study, we first evaluate whether such license violations (non-compliances) occur at all in project dependencies managed by the Maven repository. In the second part of our study, we revisit the 4 projects used in our trustworthiness assessment study to assess their trustworthiness in terms of license violations. For the study, we create SPARQL queries that analyze all dependency relationships in the Maven repository and identify three main categories of license violations: simple violations, transitive violations, and compound violations (see Section 7.3.3). The queries take advantage of both our open source license ontology and the build ontology. Figures 7.17, 7.18, and 7.19 illustrate the queries we used to identify these violations.

SELECT DISTINCT *
WHERE {
    ?link a sbson:DependencyLink .
    ?link sbson:hasDependencyTarget ?project2 .
    ?link sbson:hasDependencySource ?project1 .
    ?project1 markosLicense:coveringLicense ?license1 .
    ?project2 markosLicense:coveringLicense ?license2 .
    ?license1 markosCopyright:incompatibleWith ?license2 .
}
Figure 7.17: SPARQL query for identifying simple license violations

SELECT DISTINCT *
WHERE {
    ?linkA a sbson:DependencyLink .
    ?linkA sbson:hasDependencyTarget ?project2 .
    ?linkA sbson:hasDependencySource ?project1 .
    ?linkB a sbson:DependencyLink .
    ?linkB sbson:hasDependencyTarget ?project3 .
    ?linkB sbson:hasDependencySource ?project2 .
    ?project1 markosLicense:coveringLicense ?license1 .
    ?project2 markosLicense:coveringLicense ?license2 .
    ?project3 markosLicense:coveringLicense ?license3 .
    ?license1 markosCopyright:compatibleWith ?license2 .
    ?license2 markosCopyright:compatibleWith ?license3 .
    ?license1 markosCopyright:incompatibleWith ?license3 .
}
Figure 7.18: SPARQL query for identifying transitive license violations


SELECT DISTINCT *
WHERE {
    ?linkA a sbson:DependencyLink .
    ?linkA sbson:hasDependencyTarget ?project2 .
    ?linkA sbson:hasDependencySource ?project1 .
    ?linkB a sbson:DependencyLink .
    ?linkB sbson:hasDependencyTarget ?project3 .
    ?linkB sbson:hasDependencySource ?project1 .
    ?project1 markosLicense:coveringLicense ?license1 .
    ?project2 markosLicense:coveringLicense ?license2 .
    ?project3 markosLicense:coveringLicense ?license3 .
    ?license1 markosCopyright:compatibleWith ?license2 .
    ?license1 markosCopyright:compatibleWith ?license3 .
    ?license2 markosCopyright:incompatibleWith ?license3 .
}
Figure 7.19: SPARQL query for identifying compound license violations

Findings and Discussion. This section presents and discusses the results which we obtained from our license violation experiment using the Maven repository. Figure 7.20 shows the distribution of different project licenses across the Maven repository. In Table 7.6, we report on the license violations (classified by the type of violation), which we identified in our study of the Maven repository.

Figure 7.20: License distribution in the Maven repository (Apache-2.0 dominates with 82%; the remainder is split among LGPL-2.1, EPL-1.0, AGPL-3.0, MIT, GPL-3.0, LGPL-3.0, and BSD-3-Clause, each between 1% and 5%)


Table 7.6: Totals for each type of violation found by querying the data store

License Violation Type | Count
Type 1 - Simple Violations | 131996
Type 2 - Transitive Violations | 288153
Type 3 - Compound Violations | 654964

Our study identified over 131,000 simple violations and numerous transitive license violations of different types. We note that Type 3 violations are the most common type of violation, followed by Type 2 and then Type 1. In what follows, we discuss in more detail some of the license violations and incompatibilities which we observed in our study. Figures 7.21, 7.22, and 7.23 summarize the most common license violation pairs for all three license violation categories. The most common Type 1 violation which we observed is code published under the Apache 2 license being incorporated into GPL 2 licensed code. This violation is not surprising, for two reasons. First, many software developers are neither aware of nor well-versed in open source license compliance, and as these are two of the most popular licenses in the world, this pairing reflects their usage in the wild. Second, there is likely some confusion about Apache 2's compatibility with the GPL. On the GNU website, the Free Software Foundation publishes a list of licenses that are compatible with the GPL. This page shows Apache 2 in green (meaning compatible), but in the license discussion the authors explain that Apache 2 is only compatible with GPL 3, not GPL 2 [146].


License Pair | Number of Violations
Apache 2 ► GPL 2 | 65105
GPL 3 ► Apache 2 | 25584
GPL 2 ► Apache 2 | 16939
EPL 1 ► GPL 3 | 12024
EPL 1 ► GPL 2 | 2970
MPL 1.1 ► LGPL 2.1 | 2368
MPL 1.1 ► AGPL 3 | 2140
Apache 2 ► MPL 1.1 | 1345
AGPL 3 ► Apache 2 | 1037
MPL 1.1 ► GPL 3 | 870
MPL 1.1 ► LGPL 3 | 667
GPL 3 ► GPL 2 | 375
Apache 1.1 ► GPL 3 | 152
MPL 1.1 ► GPL 2 | 122
EUPL 1.1 ► Apache 2 | 76
MPL 1 ► GPL 2 | 66
Apache 1.1 ► GPL 2 | 49
CPL 1 ► GPL 3 | 38
Artistic 1 ► GPL 3 | 25
MPL 1 ► AGPL 3 | 15
Apache 2 ► MPL 1 | 11
AGPL 3 ► GPL 2 | 8
MPL 1.1 ► LGPL 2 | 8
MPL 1 ► LGPL 2.1 | 1
EUPL 1.1 ► GPL 3 | 1

Figure 7.21: Most Popular Type 1 License Violation Pairs

License Triple | Number of Violations
EPL 1 ► Apache 2 ► GPL 3 | 254447
MPL 1.1 ► Apache 2 ► GPL 3 | 26461
Apache 1.1 ► Apache 2 ► GPL 3 | 6404
Apache 2 ► MPL 2 ► GPL 2 | 704
CPL 1 ► Apache 2 ► GPL 3 | 130
GPL 3 ► LGPL 3 ► Apache 2 | 5
Apache 2 ► MPL 2 ► MPL 1 | 2

Figure 7.22: Most Popular Type 2 License Violation Pairs

License Combination | Number of Violations
MPL 1.1 + LGPL 2.1 ► Apache 2 | 547089
AGPL 3 + Apache 2 ► GPL 3 | 37495
MPL 1.1 + LGPL 3 ► Apache 2 | 24821
Apache 2 + GPL 2 ► GPL 3 | 21975
GPL 2 + Apache 2 ► GPL 3 | 21975
Apache 1.1 + MPL 1.1 ► Apache 2 | 839
MPL 1 + LGPL 2.1 ► Apache 2 | 688
Apache 2 + MPL 1.1 ► MPL 2 | 34
Apache 2 + MPL 1 ► MPL 2 | 32
Apache 1.1 + MPL 1 ► Apache 2 | 16

Figure 7.23: Most Popular Type 3 License Violation Pairs


A more detailed analysis of the reasons why the number of transitive license violations is significantly larger than the number of direct violations revealed the following: (1) Type 1 license compatibilities and incompatibilities are easier to verify and detect; that is, it is much more likely that a developer will check for license compliance when only two licenses are involved. (2) Transitive violation types, on the other hand, have not been considered by the research community prior to this work, and some of them may very well turn out to be acceptable. For example, the European Union Public License (EUPL) explicitly states which licenses it is compatible with; these are known compatibilities. In a transitive interaction, however, an EUPL-licensed component may be imported into an intermediary project, say a project under the Licence Libre du Québec – Réciprocité (LiLiQ-R), which is then imported into a tertiary project under the Common Development and Distribution License (CDDL). Each step (EUPL to LiLiQ, and LiLiQ to CDDL) is known to be compatible, but the EUPL does not explicitly state that it is compatible with the CDDL. Our approach will therefore flag such a dependency chain as a potential violation, even though the chain could, in fact, be perfectly lawful (a false positive, verifiable by a lawyer). Such triples are neither known compatibilities nor known incompatibilities, which is one of the reasons why more Type 2 violations are found. Type 3 violations are even more difficult to detect, since their detection largely depends on how licenses define derivative works and the conditions for reusing these libraries. Libraries can be used by either including the actual source code or through linking (e.g., through a jar file), and linking can be static (compile-time) or dynamic (run-time). For example, the LGPL requires each project to be an "independent work that stands by itself and includes no source code from [the other]"; in this scenario, however, it is perfectly acceptable to combine the compiled code [170]. The question is thus whether or not a derivative work is created when combining dependencies into a new project; derivative works come into play only when the licensed software is copied, distributed, or modified. Additional research is needed to further clarify legal and license compliance issues when using these open source licenses. Nevertheless, all three types of violations can exist in projects. Simple, transitive, and compound license violations are thus problems that occur in open source projects and can potentially affect the trustworthiness of components and libraries being reused in software projects.


In what follows, we report on the license violation results observed for the selected 4 projects in our trustworthiness study. Table 7.7 provides an overview of the number of license violations detected in these projects. Only four (4) releases of Commons-Fileupload introduced violations in client applications. For the remaining projects, no license violations are reported due to the lack of license information in the analyzed client applications. These results, although incomplete, confirm our previous claim that violations do occur in open source projects.

Table 7.7: License Violation Counts in selected projects

Project | # Simple Violations | # Transitive Violations | # Compound Violations
commons-fileupload 1.0 | 0 | 0 | 0
commons-fileupload 1.1 | 0 | 0 | 0
commons-fileupload 1.2 | 4 | 0 | 0
commons-fileupload 1.2.1 | 14 | 0 | 0
commons-fileupload 1.2.2 | 19 | 0 | 0
commons-fileupload 1.3 | 4 | 0 | 0
Apache CXF WS Security 2.4.1 | 0 | 0 | 0
Apache CXF WS Security 2.4.4 | 0 | 0 | 0
Apache CXF WS Security 2.4.6 | 0 | 0 | 0
Apache CXF WS Security 2.6.3 | 0 | 0 | 0
Apache CXF WS Security 2.7.0 | 0 | 0 | 0
Struts 1.2.4 | 0 | 0 | 0
Struts 1.2.8 | 0 | 0 | 0
Struts 1.2.9 | 0 | 0 | 0

7.4.4 Identifying and Measuring API Breaking Changes

Approach. As previously mentioned in our study setup (Section 7.4.1, Figure 7.12), we extract the source code and versioning information of the four projects from GitHub and SVN. We identify the introduced breaking and non-breaking changes for each successive pair of releases of a given project using the VTracker tool. To be able to reuse these analysis results for further analysis, we take advantage of our ontological knowledge modeling approach and extend our knowledge base to include the analysis results. Developers can now access this information, using SPARQL queries, to derive the potential direct and indirect impacts of breaking changes on their client applications. Complete details on how we identify and model breaking changes can be found in Chapter 5. In what follows, we show some of the main rules and queries used to derive the BCD and BCI measures.
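For illustration, a single detected breaking change could be represented in the knowledge base roughly as follows (a sketch using the code:hasPriorAPI and code:hasCurrentAPI properties that appear in Figures 7.25 to 7.28; the individual names are hypothetical):

INSERT DATA {
    # A breaking change detected by VTracker, linking the prior and current API elements
    :breakingChange42 a code:BreakingChange ;
        code:hasPriorAPI :ClassVisitor_v3 ;
        code:hasCurrentAPI :ClassVisitor_v4 .
}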


BCD inference: For computing the BCD scores of the projects in our dataset, we define a SWRL rule (see Figure 7.24) which infers the BCD score from the breaking and non-breaking change counts. Figures 7.25 and 7.26 detail the queries for computing the breaking and non-breaking change measures of a project.

BCIdirect and BCIindirect inference: The queries in Figures 7.27 and 7.28 take advantage of the inference services to derive both the direct and indirect BCI scores for a project and its dependencies. The query in Figure 7.28 first identifies two unique releases of the same project for which breaking changes have been populated into the triple store. It then identifies any usage of the identified binary-incompatible APIs within the client. These queries are based on Equations 6 and 7 in Section 7.3.3.

Release(?r),
hasBreakingChangeCount(?r, ?bcc),
hasNonBreakingChangeCount(?r, ?nbcc),
divide(?bcd, ?bcc, ?nbcc)
→ hasBCD(?r, ?bcd)
Figure 7.24: SWRL rule to infer the BCD measure

CONSTRUCT { ?release code:hasBreakingChangeCount ?totalBreakingChanges }
WHERE {
    {
        SELECT ?release (COUNT(?breakingChange) AS ?totalBreakingChanges)
        WHERE {
            ?breakingChange rdf:type code:BreakingChange .
            ?breakingChange code:hasCurrentAPI ?api .
            ?release code:containsCodeEntity ?api .
        } GROUP BY ?release
    }
}
Figure 7.25: SPARQL query for inferring the total number of breaking changes in a project


CONSTRUCT { ?release code:hasNonBreakingChangeCount ?totalNonBreakingChanges }
WHERE {
    {
        SELECT ?release (COUNT(?nonbreakingChange) AS ?totalNonBreakingChanges)
        WHERE {
            ?nonbreakingChange rdf:type code:NonBreakingChange .
            ?nonbreakingChange code:hasCurrentAPI ?api .
            ?release code:containsCodeEntity ?api .
        } GROUP BY ?release
    }
}
Figure 7.26: SPARQL query for inferring the total number of non-breaking changes in a project

CONSTRUCT { ?client code:hasDirectBCI ?directBCI }
WHERE {
    {
        SELECT ?client ?release ((?usedBreakingChanges / ?bcc) AS ?directBCI)
        WHERE {
            {
                SELECT ?client ?release (COUNT(DISTINCT ?breakingApi) AS ?usedBreakingChanges) ?bcc
                WHERE {
                    ?breakingChange rdf:type code:BreakingChange ;
                                    code:hasCurrentAPI ?breakingApi .
                    ?release code:containsCodeEntity ?breakingApi ;
                             code:hasBreakingChangeCount ?bcc .
                    ?client code:containsCodeEntity ?api .
                    ?api main:dependsOn ?breakingApi .
                } GROUP BY ?client ?release ?bcc
            }
        }
    }
}
Figure 7.27: SPARQL query for inferring the direct BCI measure in a project


CONSTRUCT { ?client code:hasIndirectBCI ?indirectBCI }
WHERE {
    {
        SELECT ?client ((?usedBreakingChanges / ?bcc) AS ?indirectBCI)
        WHERE {
            {
                SELECT ?client (COUNT(DISTINCT ?clientAPIEntity) AS ?usedBreakingChanges)
                               (COUNT(DISTINCT ?breakingChange) AS ?bcc)
                WHERE {
                    # Identify different releases of the same project for which breaking changes exist
                    ?client sbson:hasBuildDependencyOn ?dependency1 ;
                            sbson:hasBuildDependencyOn ?dependency2 .
                    ?breakingChange a code:BreakingChange ;
                                    code:hasPriorAPI ?priorAPIElement ;
                                    code:hasCurrentAPI ?currentAPIElement .
                    ?dependency1 code:containsCodeEntity ?currentAPIElement .
                    ?dependency2 code:containsCodeEntity ?priorAPIElement .
                    FILTER(?dependency1 != ?dependency2)
                    # Identify the use of breaking-change entities in the client
                    ?client code:containsCodeEntity ?clientAPIEntity .
                    { ?clientAPIEntity main:dependsOn ?currentAPIElement }
                    UNION
                    { ?clientAPIEntity main:dependsOn ?priorAPIElement }
                } GROUP BY ?client
            }
        }
    }
}
Figure 7.28: SPARQL query for inferring the indirect BCI measure in a project

Findings and Discussion. Figure 7.29 shows an example of a bug54 reported in Eclipse Orbit55. Orbit depends on ASM, a Java bytecode manipulation library. ASM introduced breaking changes in its later releases; for example, ClassVisitor was changed from an interface (version 3.X) to a class in version 4.0. This is a major change in the API and therefore breaks clients of the older 3.X API releases.

54 https://dev.eclipse.org/mhonarc/lists/cross-project-issues-dev/msg10487.html 55 https://www.eclipse.org/orbit/


Figure 7.29: An example of a reported bug showing how a breaking change in the ASM library impacts Orbit and its dependent projects

We also illustrate how our ontology-based API dependency measures can aid developers in detecting and dealing with such breaking changes. For the analysis, we extract and populate facts about the breaking changes between different versions of ASM releases and the source code of all projects which directly depend on ASM releases (8109 dependencies in total). Based on the extracted source code and dependency information, the earlier introduced SPARQL queries can now be used to identify the potential direct and indirect impacts of ASM breaking changes on client applications. Figure 7.30 shows the distribution of (a) breaking changes, (b) non-breaking changes, and (c) breaking change densities (BCD) across all selected 20 ASM releases. Figure 7.30(d) reports on the impact of the ClassVisitor API breaking change on client applications. Furthermore, this change can potentially affect, on average, 50 different API elements and as many as 225 API elements in a single client application. The reported impact set returned by our approach includes clients which reuse the ClassVisitor API either directly (through an implementation of the interface) or indirectly (through transitive inheritance or method invocations).
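For instance, the clients potentially affected by the ClassVisitor change could be retrieved with a query along the following lines (a sketch; the rdfs:label lookup for the fully qualified type name is an assumption about how code entities are labeled in our knowledge base):

SELECT DISTINCT ?client
WHERE {
    # Breaking changes whose current API element is the ASM ClassVisitor type
    ?breakingChange a code:BreakingChange ;
                    code:hasCurrentAPI ?classVisitor .
    ?classVisitor rdfs:label "org.objectweb.asm.ClassVisitor" .
    # Clients containing code entities that depend on the changed API
    ?client code:containsCodeEntity ?entity .
    ?entity main:dependsOn ?classVisitor .
}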


Figure 7.30: Distribution of (a) breaking changes, (b) non-breaking changes, (c) breaking change densities, and (d) breaking change impacts in the analyzed ASM libraries and dependencies

7.4.5 Assessment Process

In the previous sub-sections, we described how we identify and measure different attributes of trustworthiness by taking advantage of our unified ontological knowledge representation and SW reasoning services. The OntTAM assessment process further integrates these scores across attributes and sub-factors. For the actual assessment process, we first compute the fuzzy score for each measure individually and then aggregate these scores to calculate the attribute, sub-factor, factor, and dimension assessment scores. Figure 7.31 gives a complete overview of how the sub-factors, attributes, and measures are related and used to derive our trustworthiness assessment.

Figure 7.31: Overview of the relations in the semantic OntTAM domain model (a project hasDimensions Product and Community; the Product dimension hasFactors such as Reusability; factors hasSubFactors such as Security, Legality, and Reliability; sub-factors have trustworthiness attributes such as Exploitability, Impact, Stability, and Popularity; attributes are quantified by trustworthiness measures such as the number of vulnerabilities, aggregated severity scores, WVDdirect, WVDinherit, WVDoverall, the number of license violations, LVC, direct and indirect BCI, BCD, the numbers of breaking and non-breaking changes, the numbers of switched dependencies and dependencies, and size; solid lines denote defined relations and dashed lines inferred relations)


The effect of the fuzzification on the assessment scores typically increases with the assessment abstraction level (e.g., quality dimension scores vs. attribute scores). Figures 7.32 and 7.33 show the rules we used to create the fuzzified score for the WVD measure, and Figure 7.34 provides example rules we used to combine the fuzzified LVC and WVD scores into a score for the Impact attribute.

FUNCTION_BLOCK WVD

VAR_INPUT
    WVD_Measure : REAL;
    WVD_Weight : REAL;
END_VAR

VAR_OUTPUT
    WVD_Score : REAL;
END_VAR

FUZZIFY WVD_Measure
    TERM VERYLOW := (0.0,1.0) (1.04,1.0) (2.11,0.0);
    TERM LOW := (1.90,0.0) (2.975,1.0) (4.14,0.0);
    TERM AVERAGE := (3.73,0.0) (4.91,1.0) (6.17,0.0);
    TERM HIGH := (5.55,0.0) (6.845,1.0) (8.20,0.0);
    TERM VERYHIGH := (7.38,0.0) (8.78,1.0) (11.29,1.0);
END_FUZZIFY

FUZZIFY WVD_Weight
    TERM LOW := (0.0,1.0) (0.5,1.0) (2.69,0.0);
    TERM MEDIUM := (2.56,0.0) (4.75,1.0) (7.05,0.0);
    TERM HIGH := (6.69,0.0) (9.0,1.0) (12.0,1.0);
END_FUZZIFY

DEFUZZIFY WVD_Score
    TERM VERYPOOR := (6.5,0.0) (7.5,1.0) (9.0,1.0);
    TERM POOR := (5.31,0.0) (6.25,1.0) (7.22,0.0);
    TERM AVERAGE := (4.14,0.0) (5.0,1.0) (5.9,0.0);
    TERM VERYGOOD := (2.95,0.0) (3.75,1.0) (4.6,0.0);
    TERM EXCELLENT := (0.0,1.0) (2.5,1.0) (3.28,0.0);
    METHOD : COG;
END_DEFUZZIFY
Figure 7.32: Sample FCL file for defining the fuzzy WVD measure


RULEBLOCK WVD_SCORE_RULES
    RULE 0 : IF WVD_Measure IS VERYLOW AND WVD_Weight IS LOW THEN WVD_Score IS EXCELLENT;
    RULE 1 : IF WVD_Measure IS VERYLOW AND WVD_Weight IS MEDIUM THEN WVD_Score IS EXCELLENT;
    RULE 2 : IF WVD_Measure IS VERYLOW AND WVD_Weight IS HIGH THEN WVD_Score IS VERYGOOD;
    RULE 3 : IF WVD_Measure IS LOW AND WVD_Weight IS LOW THEN WVD_Score IS EXCELLENT;
    RULE 4 : IF WVD_Measure IS LOW AND WVD_Weight IS MEDIUM THEN WVD_Score IS VERYGOOD;
    RULE 5 : IF WVD_Measure IS LOW AND WVD_Weight IS HIGH THEN WVD_Score IS AVERAGE;
    RULE 6 : IF WVD_Measure IS AVERAGE AND WVD_Weight IS LOW THEN WVD_Score IS VERYGOOD;
    RULE 7 : IF WVD_Measure IS AVERAGE AND WVD_Weight IS MEDIUM THEN WVD_Score IS AVERAGE;
    RULE 8 : IF WVD_Measure IS AVERAGE AND WVD_Weight IS HIGH THEN WVD_Score IS POOR;
    RULE 9 : IF WVD_Measure IS HIGH AND WVD_Weight IS LOW THEN WVD_Score IS AVERAGE;
    RULE 10 : IF WVD_Measure IS HIGH AND WVD_Weight IS MEDIUM THEN WVD_Score IS POOR;
    RULE 11 : IF WVD_Measure IS HIGH AND WVD_Weight IS HIGH THEN WVD_Score IS VERYPOOR;
    RULE 12 : IF WVD_Measure IS VERYHIGH AND WVD_Weight IS LOW THEN WVD_Score IS POOR;
    RULE 13 : IF WVD_Measure IS VERYHIGH AND WVD_Weight IS MEDIUM THEN WVD_Score IS VERYPOOR;
    RULE 14 : IF WVD_Measure IS VERYHIGH AND WVD_Weight IS HIGH THEN WVD_Score IS VERYPOOR;
END_RULEBLOCK

END_FUNCTION_BLOCK
Figure 7.33: Sample FCL file for inferring the fuzzy scores for the WVD measure

Using the property chain axioms, which we explained earlier in Section 7.3.2.1, one can now automatically infer trustworthiness scores from the populated measures of any given project. Figure 7.35 provides a list of sample queries used for integration and fuzzification.


RULEBLOCK IMPACT_SCORE_RULES
    RULE 0 : IF LVC_Score IS EXCELLENT AND WVD_Score IS VERYPOOR THEN IMPACT_Score IS AVERAGE;
    RULE 1 : IF LVC_Score IS VERYGOOD AND WVD_Score IS VERYPOOR THEN IMPACT_Score IS POOR;
    RULE 2 : IF LVC_Score IS AVERAGE AND WVD_Score IS VERYPOOR THEN IMPACT_Score IS POOR;
    RULE 3 : IF LVC_Score IS POOR AND WVD_Score IS VERYPOOR THEN IMPACT_Score IS VERYPOOR;
    RULE 4 : IF LVC_Score IS VERYPOOR AND WVD_Score IS VERYPOOR THEN IMPACT_Score IS VERYPOOR;
END_RULEBLOCK

END_FUNCTION_BLOCK
Figure 7.34: Sample FCL file for integrating the LVC and WVD fuzzy scores for the Impact attribute

Query 1: At the sub-factor level
SELECT DISTINCT ?project ?subfactorScore
WHERE {
    ?subfactorAttribute a onttam:SubFactor .
    ?project onttam:hasSubfactor ?subfactorAttribute .
    ?subfactorAttribute onttam:hasScore ?subfactorScore .
}

Query 2: At the factor level
SELECT DISTINCT ?project ?factorScore
WHERE {
    ?factorAttribute a onttam:Factor .
    ?project onttam:hasFactor ?factorAttribute .
    ?factorAttribute onttam:hasScore ?factorScore .
}
Figure 7.35: SPARQL queries illustrating the inference of overall trustworthiness scores


Findings and Discussion. Table 7.8 presents a summary of trustworthiness scores, which we derived from the three software trustworthiness categories we consider in the scope of this work: API breaking changes, security vulnerabilities, and license violations.

Table 7.8: Overview of selected trustworthiness measure scores for our case study projects

Project | WVD Numeric Score | WVD Fuzzified Score | LVC Numeric Score | LVC Fuzzified Score | BCD Numeric Score | BCD Fuzzified Score
commons-fileupload 1.0 | 8.78 | VERYPOOR | 0 | EXCELLENT | 0 | EXCELLENT
commons-fileupload 1.1 | 8.46 | VERYPOOR | 0 | EXCELLENT | 2.14 | VERYPOOR
commons-fileupload 1.2 | 6.05 | POOR | 4 | VERYPOOR | 0.64 | POOR
commons-fileupload 1.2.1 | 5.49 | AVERAGE | 14 | VERYPOOR | 0.49 | AVERAGE
commons-fileupload 1.2.2 | 5.31 | AVERAGE | 19 | VERYPOOR | 0.48 | AVERAGE
commons-fileupload 1.3 | 3.14 | VERYGOOD | 4 | VERYPOOR | 0.6 | AVERAGE
Apache CXF WS Security 2.4.1 | 1.25 | EXCELLENT | 0 | EXCELLENT | 0.08 | EXCELLENT
Apache CXF WS Security 2.4.4 | 1.11 | EXCELLENT | 0 | EXCELLENT | 0.95 | VERYPOOR
Apache CXF WS Security 2.4.6 | 1.21 | EXCELLENT | 0 | EXCELLENT | 0.89 | VERYPOOR
Apache CXF WS Security 2.6.3 | 1.49 | EXCELLENT | 0 | EXCELLENT | 0.86 | VERYPOOR
Apache CXF WS Security 2.7.0 | 1.87 | EXCELLENT | 0 | EXCELLENT | 0.88 | VERYPOOR
Struts 1.2.4 | 1.25 | EXCELLENT | 0 | EXCELLENT | 0.9 | VERYPOOR
Struts 1.2.8 | 2.02 | EXCELLENT | 0 | EXCELLENT | 0.44 | AVERAGE
Struts 1.2.9 | 1.04 | EXCELLENT | 0 | EXCELLENT | 0.32 | VERYGOOD

Table 7.9: Example of inferred trustworthiness scores at sub-factor level

Project | Security Numeric Score | Security Fuzzified Score | Legality Numeric Score | Legality Fuzzified Score | Reliability Numeric Score | Reliability Fuzzified Score
commons-fileupload 1.0 | 5.01 | AVERAGE | 1.45 | EXCELLENT | 0 | EXCELLENT
commons-fileupload 1.1 | 5.01 | AVERAGE | 1.45 | EXCELLENT | 0 | EXCELLENT
commons-fileupload 1.2 | 5.01 | AVERAGE | 1.45 | EXCELLENT | 0 | EXCELLENT
commons-fileupload 1.2.1 | 1.45 | EXCELLENT | 1.45 | EXCELLENT | 0 | EXCELLENT
commons-fileupload 1.2.2 | 1.45 | EXCELLENT | 1.45 | EXCELLENT | 0 | EXCELLENT
commons-fileupload 1.3 | 3.77 | VERYGOOD | 1.45 | EXCELLENT | 0 | EXCELLENT
Apache CXF WS Security 2.4.1 | 1.45 | EXCELLENT | 1.45 | EXCELLENT | 0 | EXCELLENT
Apache CXF WS Security 2.4.4 | 1.45 | EXCELLENT | 1.45 | EXCELLENT | 0 | EXCELLENT
Apache CXF WS Security 2.4.6 | 1.45 | EXCELLENT | 1.45 | EXCELLENT | 0 | EXCELLENT
Apache CXF WS Security 2.6.3 | 1.45 | EXCELLENT | 1.45 | EXCELLENT | 0 | EXCELLENT
Apache CXF WS Security 2.7.0 | 1.45 | EXCELLENT | 1.45 | EXCELLENT | 0 | EXCELLENT
Struts 1.2.4 | 1.45 | EXCELLENT | 1.45 | EXCELLENT | 0 | EXCELLENT
Struts 1.2.8 | 1.45 | EXCELLENT | 1.45 | EXCELLENT | 0 | EXCELLENT
Struts 1.2.9 | 1.45 | EXCELLENT | 1.45 | EXCELLENT | 0 | EXCELLENT

Tables 7.9 and 7.10 report the results of our queries in Figure 7.35. The results indicate that, despite the presence of security, licensing, and breaking change concerns, almost all projects have excellent trustworthiness scores at the presented sub-factor and factor levels. This is due to how the score categories are distributed over the fuzzy scale. In our work, the categories are distributed equally from 0 to the maximum measure value recorded in our dataset. For example, the maximum WVD measure in our dataset is 11.29, making all WVD measures under 2.95 Excellent. The complete scale distributions for all our measures can be found in the FCL files online56.

Table 7.10: Example of inferred trustworthiness scores at factor level

Project | Reusability Numeric Score | Reusability Fuzzified Score
commons-fileupload 1.0 | 0 | EXCELLENT
commons-fileupload 1.1 | 0 | EXCELLENT
commons-fileupload 1.2 | 0 | EXCELLENT
commons-fileupload 1.2.1 | 1.45 | EXCELLENT
commons-fileupload 1.2.2 | 1.45 | EXCELLENT
commons-fileupload 1.3 | 1.45 | EXCELLENT
Apache CXF WS Security 2.4.1 | 1.45 | EXCELLENT
Apache CXF WS Security 2.4.4 | 1.45 | EXCELLENT
Apache CXF WS Security 2.4.6 | 1.45 | EXCELLENT
Apache CXF WS Security 2.6.3 | 1.45 | EXCELLENT
Apache CXF WS Security 2.7.0 | 1.45 | EXCELLENT
Struts 1.2.4 | 1.45 | EXCELLENT
Struts 1.2.8 | 1.45 | EXCELLENT
Struts 1.2.9 | 1.45 | EXCELLENT

It should be noted that the tables above do not report a final overall trustworthiness score, since such a score would require a particular assessment context and an instantiation of our OntTAM assessment model with additional measures, attributes, and sub-factors.

7.5 Discussion and Related Work

7.5.1 Threats to Validity

7.5.1.1 Internal Threats

A potential threat to our approach is whether the set of measures we considered as part of our OntTAM evaluation is sufficient to capture reusability as a trustworthiness factor. We addressed this threat by selecting our trustworthiness measures from a well-established subset of existing trustworthiness models, such as PAS 754:2014, QualiPSo [171], and Boland et al. [172].

56 https://github.com/segps/segps-code/tree/master/segps.onttam/src/main/resources/segps/onttam/fcl/measures

While we only selected a very small subset of these trustworthiness attributes, we believe this subset is sufficient to illustrate the applicability of our assessment model. In particular, the objective of our study was not to verify the assessment model for its completeness, but rather to illustrate that OntTAM can be instantiated for a given (user-specified) assessment context. The study shows that instantiating and extending OntTAM to support other requirements, including new measures, attributes, or sub-factors, is straightforward.

7.5.1.2 External Threats

Definition of license violations and compliance. Given the large number of licenses available in the open source community, there currently exists no comprehensive conceptual framework describing the dependencies among all these licenses. Consolidating and redefining the dependencies among the various open source licenses will require the involvement of both the development community and intellectual property experts. The objective of our work is to formalize and conceptualize license violations as a domain of discourse at the TBox level. Actual license dependencies can be inferred once the ontology is populated (ABox) with available license dependency information, allowing us to take advantage of ontologies and their ability to deal with incremental knowledge population and inference over incomplete knowledge.
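To make this TBox/ABox separation concrete, the following minimal sketch (using rdflib with an illustrative namespace and placeholder property names, not our actual license ontology) shows how a generic violation query yields new results as license-dependency facts are incrementally added to the ABox:

```python
# Minimal sketch: the query never hard-codes license pairs; it only relies on
# whatever incompatibility facts are currently present in the ABox.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/license#")  # illustrative namespace only

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/license#> .
ex:client ex:hasLicense ex:gpl3 ; ex:dependsOn ex:lib .
ex:lib    ex:hasLicense ex:apache1 .
""", format="turtle")

QUERY = """
PREFIX ex: <http://example.org/license#>
SELECT ?proj ?dep WHERE {
    ?proj ex:hasLicense ?l1 ; ex:dependsOn ?dep .
    ?dep  ex:hasLicense ?l2 .
    ?l1   ex:incompatibleWith ?l2 .
}
"""

print(len(list(g.query(QUERY))))  # 0 -- no incompatibility facts yet

# Incremental ABox population: add one license-dependency fact ...
g.add((EX.gpl3, EX.incompatibleWith, EX.apache1))
print(len(list(g.query(QUERY))))  # 1 -- the violation is now inferred
```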

7.5.2 Related Work

7.5.2.1 Library Recommendation and Migration Techniques

Many third-party libraries are available for download, reducing development time by providing ready-to-use features. To help developers take advantage of these libraries, several techniques have been proposed that provide automatic library recommendations. Common to these approaches is that they rely on criteria such as popularity and stability; some also rely on the client's context (e.g., mining previous usage of libraries) for their recommendations. For example, Mileva et al. [133] study API popularity by measuring the rate at which dependent projects adopt or switch away from OSS libraries. Hora and Valente [143] build on Mileva's approach and introduce four distinct API popularity trends: fast growth, constant growth, peak growth, and dead growth. Their approach can benefit both library developers and clients; for example, library developers can be notified when the popularity of their API begins to decline. Raemaekers et al. [117] present four stability metrics

that calculate the stability of API interfaces. They demonstrate how these metrics can help developers decide which libraries to reuse. The frequency of migration of API dependencies has also been used to determine the stability of an API [2], [16], [143]. Other techniques recommend individual API elements (method calls, blocks of code, etc.) of a software library using heuristics that leverage various information sources (source code, commit logs, etc.). Thung et al. [142] propose an automated technique that combines association rule mining and collaborative filtering to recommend libraries; their approach suggests likely relevant libraries to developers of a target project based on the libraries used by other projects. McCarey et al. [144] recommend methods of software libraries to a developer by investigating the history of methods used in the past. In addition, several API documentation and tutorial analysis approaches have been introduced to help developers understand how the features provided by software libraries can be correctly utilized (e.g., [173], [174]). The above-mentioned techniques rarely consider the impact of reused external libraries on the quality of a client's project. Our work aims to provide developers with an approach to assess how much trust can be placed in a recommended software library. It can be seen as complementary to existing library recommendation systems, extending their recommendation criteria by making quality, in the form of trustworthiness, an integrated part of library recommendations.

7.5.2.2 Software License Violation Identification

Related studies on identifying software license violations can be categorized into two levels: intra-project and inter-project. At the intra-project level, studies aim to identify license violations introduced by project files carrying different licenses. Di Penta et al. [175] proposed an approach to automatically track the licensing evolution of systems, identifying changes in licenses and copyright years. They found that OSS projects do change licenses over time, and that these changes were not limited to new versions of the existing license; projects that switched licenses altogether sometimes had intended and unintended effects on their downstream users. As recently as 2015, Wu et al. [176] studied the evolution of the licenses specified in the header of each file, with the explicit goal of finding license inconsistencies. They categorize the evolution of licenses as a license

addition/removal, upgrade/downgrade, or change. These categorizations are then used to judge whether the modification of the license results in an inconsistency. Identifying license violations at the inter-project level requires substantial effort because developers typically combine APIs from different libraries to solve problems [177]. Several researchers have studied how the reuse of source code (through cloning) and of software components/libraries can lead to the introduction of license violations. Using code clones to detect small-scale license violations has been touched upon by several researchers. Monden et al. [178] introduce three quality metrics for code clone detection; however, the authors did not find any actual license violations in OSS, and license violations were merely used as a theoretical use case for their comparative study. The Binary Analysis Tool (BAT) developed by Hemel et al. [179] detects code clones of OSS in proprietary binaries for the express purpose of finding violations of popular GPL projects, using the comparison of string literals, data compression, and binary deltas. BAT does find many true code clones, but falls short by leaving the verification, i.e., whether a code clone is also a license violation, as a manual process. The work by German and Hassan [180] is the most closely related to ours. The authors created a "model to describe licenses and the implications of licenses on the reuse of components." Their model describes which usage scenarios result in a derived work. Our work builds upon this existing body of knowledge on license violations by providing the first attempt to create a formal representation of license dependencies. Our approach considers the complexity and dependencies of real projects, where multiple licenses are often involved, to support the detection of license violations. The advantage of our ontological representation, being an integrated part of our unified knowledge model, is the ability to extend and reuse our license model for different types of analysis tasks, such as its seamless integration into a trustworthiness assessment model.

7.5.2.3 Quality Models

Assessing quality to improve the evolution of software systems has been addressed in existing research through the introduction of software quality models. These models introduce quality dimensions and classify the quality factors that affect the development and maintenance of

software products. Among the most widely accepted quality assessment models is the ISO 9126 software quality model standard57, which defines a quality model via a set of quality characteristics and sub-characteristics believed to be the most representative and relevant at the time of its introduction. As the complexity and vulnerability of software systems grow, with components increasingly reused across project boundaries and interconnected through networks and communication links, assessing the trustworthiness of systems and their components plays an ever-increasing role. While security and interoperability are already present in the ISO 9126 standard as "sub-factors" of functionality, more recent quality models such as the ISO 25000 standard have extended ISO 9126 by making security and interoperability main quality aspects of the standard. In [9], the authors introduced the SE-Evolvable QUality Assessment Meta-model (SE-EQUAM), a quality assessment model that is both evolvable and reusable. The model introduces a set of complementary core requirements necessary for a model to be considered evolvable: Model Reusability, Knowledge Modeling, Knowledge Population, and Knowledge Exploration [9]. In this work, we adopt these model evolvability criteria to derive a trustworthiness meta-model that is not only capable of dealing with continuous change (in the model) but also allows for its reuse by simplifying the instantiation of new domain model instances.

7.5.2.4 Trustworthiness Models

Existing work on assessing the trustworthiness of OSS systems, for example by Taibi et al. [181], Larson et al. [182], and Tan et al. [183], has attempted to quantify the trustworthiness of software systems in situ, but the results are limited to artifacts in the development environment; external and heterogeneous knowledge sources are not considered in these approaches. Other researchers, such as Pfleeger et al. [184] and Yang et al. [185], seek to analyze and predict aspects of trustworthiness during software development. Further work has focused on introducing new evaluation criteria to better capture the nature of OSS components: for example, the QualiPSo model of OSS trustworthiness [171], and Boland et al. [172], who quantify and assess risk based on the Structured Assurance Case Model (SACM) [186] to determine software trustworthiness. The main objective of these models is to apply their quality (trustworthiness) factors to allow for a

57 http://www.sqa.net/iso9126.html

standardized product comparison across different projects and domains. Most trustworthiness assessment models share a generic structure, template, or frame for assessing software security quality that corresponds to a hierarchy or tree structure with multiple levels and a set of constraints defining the relationship between one level and the next. However, regardless of the kind of components, these proposals mainly focus on the evaluation criteria and decision-making phases, setting aside the practical problem of how to search for and locate components and how to assign suitable information about them [187]. A further concern is that most of these models rely only on the software product and traditional software lifecycle artifacts; they do not necessarily consider external resources, such as external vulnerability databases, in their assessment. As a result, there is no consensus on the applicability of these trustworthiness models in industrial practice [188]. While existing proposals for creating such meta-modeling assessment models focus on adopting one or more existing quality models in one standard model, this may result in an incomplete or unbalanced assessment, depending on the input. A meta-modeling approach can address this challenge by quantifying the trustworthiness of software as a "product" and specifying a domain model that captures and conceptualizes trustworthiness. A domain model is a conceptualization of a problem domain in terms of its entities, properties, relationships, and constraints. In software, several domain models exist that are capable of representing and assessing predefined sets of trustworthiness attributes, e.g., PAS 754:2014, QualiPSo [171], and Boland et al. [172]. All these domain models share a common but informal (non-machine-readable) structural representation of the trustworthiness they assess. This lack of formalism and semantics limits their possible reuse and instantiation for specific trustworthiness assessment contexts.

7.6 Chapter Summary

In summary, this chapter introduced OntTAM, a trustworthiness assessment model instantiated from the SE-EQUAM assessment model. OntTAM takes advantage of the seamless integration of the SBSON, SEON, SEVONT, and MARKOS ontologies to provide an automated analysis and assessment of trustworthiness quality attributes. We further presented a concrete instantiation of our assessment model that not only provides a formal modeling of trustworthiness quality attributes but can also be extended/customized to specific stakeholder needs. We also

illustrated how a concrete instantiation of OntTAM can be created for a small subset of sub-factors, attributes, and measures related to the trustworthiness of reusable components. The measures included in the study are API breaking changes, security vulnerabilities, and license violations. In the next chapter, we conclude the thesis and discuss possible future work.


Chapter 8

8 Conclusions and Future Work

In this dissertation, we hypothesized that leveraging build and dependency information in software engineering tasks requires a technology-independent representation of build and dependency management system semantics, integrated with knowledge from other software artifacts. To validate this thesis, we developed a unified knowledge model for software build and dependency management systems (SBSON). We showed how additional knowledge sources can be integrated with SBSON and illustrated the applicability of our approach in analyzing the impact of code reuse from a dependency management perspective.

8.1 Contributions

In this section we briefly summarize the main contributions of this dissertation compared to the current state of the art.

Modelling Build and Dependency Semantics (Chapter 4). One of the challenges in software traceability is that knowledge about software artifacts is stored in specialized repositories (e.g., build management, versioning, issue trackers), which have often remained information silos, disconnected from each other. Information on how projects are built is stored in similar silos (e.g., Maven Central, RubyGems, and NPM). In this research, the focus is on the dependency information specified in build systems, due to its relevance in supporting code reuse and global code sharing. We present a formal unified ontological model (SBSON, Software Build System ONtology) which captures concepts and properties of software build systems (Chapter 4). This formal knowledge representation allows us to take advantage of inference services provided by the Semantic Web, providing additional flexibility and benefits such as: a standardized build knowledge representation; cross-artifact

analysis, which allows taking advantage of the information in build repositories; and the reuse and sharing of analysis results across artifact and project boundaries.
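To illustrate the kind of representation SBSON enables, the following minimal sketch (using rdflib; the namespace and property names are placeholders, not the actual SBSON vocabulary) encodes a Maven-style dependency as triples that can be queried independently of the underlying build technology:

```python
# Sketch: a build dependency as part of a technology-independent graph.
from rdflib import Graph, Literal, Namespace, RDF

SBSON = Namespace("http://example.org/sbson#")  # illustrative namespace only

g = Graph()
g.bind("sbson", SBSON)

project = SBSON["commons-fileupload_1.3"]
dependency = SBSON["commons-io_2.2"]

g.add((project, RDF.type, SBSON.Project))
g.add((project, SBSON.hasVersion, Literal("1.3")))
g.add((project, SBSON.dependsOn, dependency))
g.add((dependency, RDF.type, SBSON.Project))

# The same query works regardless of which build system produced the facts.
for _, _, dep in g.triples((project, SBSON.dependsOn, None)):
    print(dep)
```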

A Novel Approach to Analyze the Impact of API Breaking Changes (Chapter 5). As discussed throughout the thesis, APIs are commonly used by software developers to reduce development complexity by reusing code developed by third parties or published by the open source community. These APIs, however, undergo changes that may break already established contracts, leading to errors and requiring rework in client applications. Identifying the impact of these changes is difficult, especially when dealing with transitive API usage across software projects. We conducted a user survey involving 53 open source developers to gain insights into how they manage API breaking changes. Based on the survey results, we presented a formal unified ontological model which integrates our SBSON model with knowledge about source code usage and changes within the Maven ecosystem. We use this model to identify the potential impact of breaking changes across project boundaries, taking advantage of SW reasoning services to support library consumers and producers in managing API breaking changes. We present a case study to demonstrate the applicability and flexibility of our approach in supporting library consumers in managing the impacts of breaking changes.
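As a simplified illustration of this transitive impact analysis (hypothetical project names; our actual implementation operates on the ontology via Semantic Web reasoning rather than an in-memory graph), a breadth-first traversal over reverse dependency edges collects all directly and transitively affected clients:

```python
# Sketch: propagate a breaking change through reverse dependency edges.
from collections import deque

# library release -> projects that declare a dependency on it (toy data)
REVERSE_DEPS = {
    "lib-a:2.0": ["client-x:1.1", "client-y:3.4"],
    "client-x:1.1": ["client-z:0.9"],
}

def affected_clients(breaking_release: str) -> set:
    """Return all projects transitively reachable through reverse edges."""
    seen, queue = set(), deque([breaking_release])
    while queue:
        current = queue.popleft()
        for client in REVERSE_DEPS.get(current, []):
            if client not in seen:
                seen.add(client)
                queue.append(client)
    return seen

print(affected_clients("lib-a:2.0"))
# {'client-x:1.1', 'client-y:3.4', 'client-z:0.9'}
```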

Impact Analysis of Security Vulnerabilities (Chapter 6). Software reuse has increased the threat of sharing software vulnerabilities across project boundaries. Developers are often unaware of such security vulnerabilities in their projects until a vulnerability is either exploited by attackers or made publicly available by independent security advisory databases. We introduce an integrated dependency and vulnerability knowledge model, SV-AF, in Chapter 6. SV-AF integrates different ontologies, including build system, source code, versioning system, and vulnerability ontologies. We showed that 750 Maven projects (0.062% of all Maven projects) contain known security vulnerabilities that have been reported in the NVD database [125]. Of these 750 projects, 48.8% suffer from multiple security vulnerabilities. Our analysis also showed that the same vulnerability can affect multiple releases of a product. The approach presented in this thesis can also be used to identify whether the vulnerable source code of a library is indeed being used by a client


[123]. Furthermore, we introduce a vulnerability measure (WVD) that can be used to compare two releases of the same project in terms of their vulnerability impact [147]. The thesis also highlights that this information can be used to guide system update decisions and help avoid the reuse of APIs/components that have known vulnerabilities or are prone to vulnerabilities.

A Model for Assessing the Trustworthiness of OSS Libraries (Chapter 7). We introduce a novel Ontological Trustworthiness Assessment Model (OntTAM), an extension of the earlier generic SE-EQUAM software assessment model [9] (Chapter 7). OntTAM is an integration of our build, source code, vulnerability, and license ontologies which supports the automated analysis and assessment of quality attributes related to the trustworthiness of libraries and APIs in open source systems. The main contributions of this assessment model are:

• We introduce new trustworthiness measures that quantify API breaking changes, security vulnerabilities, and license violations.
• We perform several case studies to illustrate how our approach provides developers with additional insights on the potential impact of reused libraries and APIs on the quality and trustworthiness of their projects.

Impact Analysis of License Violations (Section 7.4.3). The reuse of libraries leads to hierarchies of libraries and license dependencies, and these libraries' licenses must be compatible and compliant with each other. License violations and incompatibilities are an often-overlooked factor when recommending APIs and can therefore significantly impact the trustworthiness of software systems. We extend the MARKOS license ontology [10] with semantic rules for three categories of license violations and perform a study on the Maven ecosystem to identify direct and transitive license violations [147]. The study identified over 131,000 simple violations and 943,000 transitive license violations. Such findings suggest the need for additional automated support for recommending trustworthy libraries.
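To sketch how direct and transitive violations can be covered by a single query (illustrative namespace and facts, not the extended MARKOS vocabulary), a SPARQL property path over the dependency relation reaches indirect dependencies as well:

```python
# Sketch: one query for both direct and transitive license violations.
from rdflib import Graph

TTL = """
@prefix ex: <http://example.org/license#> .
ex:app  ex:hasLicense ex:licA ; ex:dependsOn ex:mid .
ex:mid  ex:hasLicense ex:licA ; ex:dependsOn ex:leaf .
ex:leaf ex:hasLicense ex:licB .
ex:licA ex:incompatibleWith ex:licB .
"""

g = Graph()
g.parse(data=TTL, format="turtle")

QUERY = """
PREFIX ex: <http://example.org/license#>
SELECT ?proj ?dep WHERE {
    ?proj ex:hasLicense ?l1 ;
          ex:dependsOn+ ?dep .   # one or more hops: direct and transitive
    ?dep  ex:hasLicense ?l2 .
    ?l1   ex:incompatibleWith ?l2 .
}
"""

for proj, dep in g.query(QUERY):
    print(f"violation: {proj} -> {dep}")
# Reports both app -> leaf (transitive) and mid -> leaf (direct).
```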


8.2 Future Work

8.2.1 Current Limitations

Quality of our Ontology Design. One of the major benefits of our approach is its ability to seamlessly integrate and reuse ontologies. However, assessing the quality of our ontology designs is an inherently difficult problem, since what constitutes quality depends on different non-functional requirements (e.g., reuse, usability, extensibility, expressiveness, and reasoning support). We partly address this threat by using existing reasoners (such as Pellet, HermiT, and JFact) and tools (OOPS!58 and the NeOn Toolkit59) to check our ontology design for taxonomic, syntactical, and consistency problems. To determine if our ontology constraints were sufficient to identify incorrect data, we incrementally populated the ontologies with facts during the evaluation process. While the reasoners did not report any inconsistencies in our ontologies, OOPS! reported a few problems where our ontologies violated design rules in the OOPS! rule catalog. The identified violations were the result of missing license information and annotations for some of our ontology elements. Another potential threat to our approach is whether the set of concepts we considered is sufficient to capture the semantics of the analyzed domains. There is always a trade-off in the design of knowledge bases between their expressivity and their usefulness; an equilibrium should be established between the amount of information that is sufficient to accomplish a task and the granularity of the knowledge that should be available to produce useful results. We addressed this threat by showing, through the described case study experiments, that our modeled concepts are sufficient to provide flexible analysis services.
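As an illustration of the consistency checks mentioned above, the following sketch (assuming the owlready2 package with a local Java runtime for its bundled HermiT reasoner, and a placeholder ontology file) loads an ontology and surfaces inconsistencies:

```python
# Sketch of an automated consistency check (placeholder ontology path).
from owlready2 import get_ontology, sync_reasoner, OwlReadyInconsistentOntologyError

onto = get_ontology("file://./sbson.owl").load()  # placeholder file

try:
    with onto:
        sync_reasoner()  # runs the HermiT reasoner by default
    print("no inconsistencies reported")
except OwlReadyInconsistentOntologyError:
    print("ontology is inconsistent; inspect the conflicting axioms")
```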

Generalizability. The case studies described in this thesis are limited in their scope to open source Java projects in the Maven repository, and the results obtained might not be applicable to other programming languages or build repositories. Given that our modeling approach is based on different levels of abstraction, we also abstract common aspects of source code and build dependencies in our knowledge model. We do model the domain of object-oriented

58 http://oops.linkeddata.es/advanced.jsp 59 http://neon-toolkit.org/wiki/Download/2.5.2.html

programming languages, software vulnerabilities, software licenses, and build repositories as individual domains of discourse in the domain-specific layer of our knowledge model.

8.2.2 Opportunities for Future Research

The presented research involves different areas of computer science, including SW technologies, knowledge modeling, mining software repositories, and source code analysis. This diversity of topics also leads to multiple research directions in which the work presented in this thesis can be extended as part of future work.

Integrating Crowd-Based Knowledge Sources. Changes to the software development process (such as increased collaboration and agile work habits) have made the Internet a great source of information, documentation, and explanations supporting the work context of developers [189]. These crowd-based information sources (e.g., blogs, online video tutorials, Q/A forums) contain important information which is often fragmented. One interesting avenue for future research is the mining, modeling, and integration of crowd-based knowledge related to code reuse. As part of our ongoing research, we have already proposed an approach which integrates online screencasts with known security issues. More specifically, we leverage audio, video (textual cues in image frames), and metadata from screencasts published on YouTube and integrate this knowledge with software dependency and security related knowledge from our existing SV-AF approach. We establish bi-directional traceability links from screencasts to NVD security vulnerabilities and infer indirect traceability links between screencasts and Maven project dependencies, taking advantage of our existing traceability links (in SV-AF) between NVD and Maven Central. We argue that these links can provide practitioners with additional insights for comprehending the potential impact of using vulnerable projects in their own projects, and for understanding how screencasts address these known security issues. Traceability links between screencasts and vulnerability reports are inferred by (1) identifying vulnerability references such as the CVE ID and CWE ID in the title, description, speech, or image frames of the screencasts, and (2) using the BM25 probabilistic relevance model [190], a popular model in Information Retrieval (IR), to rank a set of vulnerability reports based on their relevance to the words in a given screencast. Our initial experiments on 48 selected vulnerability-related


YouTube videos showed that our approach can successfully link relevant vulnerabilities and screencasts with an average precision of 98% and an average recall of 54% when vulnerability identifiers (CVE IDs) are explicitly mentioned in the videos. When no direct reference to a CVE ID exists in a screencast, our approach was still able to link videos to vulnerability descriptions, with relevant links ranked within the top two positions of our result set up to 100% of the time. This knowledge integration not only provides developers with direct access to vulnerability information described in screencast content, but also allows us to link vulnerability descriptions to relevant screencasts and dependency information. In addition, our approach allows developers to identify screencasts that demonstrate such attacks, and provides developers who indirectly use vulnerable libraries in their projects (e.g., through Maven dependencies) with insights on how to reduce the potential impact of being directly or indirectly exposed to a vulnerability. As part of our future work, we plan to extend our modeling approach to integrate videos and their content with other software artifacts and to conduct larger case studies to further improve the generalizability of our approach. We also plan to include knowledge from other crowd-based information sources such as blogs and Q/A forums (e.g., StackOverflow).
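As an illustration of the BM25 ranking step (toy snippets stand in for the full NVD report texts and the extracted screencast words; rank_bm25 is one off-the-shelf implementation, used here only for illustration):

```python
# Sketch: rank vulnerability reports by relevance to screencast content.
from rank_bm25 import BM25Okapi

reports = {
    "CVE-2014-0050": "apache commons fileupload multipart denial of service",
    "CVE-2017-5638": "apache struts jakarta multipart remote code execution",
}

ids = list(reports)
bm25 = BM25Okapi([reports[i].split() for i in ids])

# Words extracted from a screencast's title, description, speech, and frames.
screencast_words = "struts remote code execution exploit demo".split()
scores = bm25.get_scores(screencast_words)

ranked = sorted(zip(ids, scores), key=lambda p: p[1], reverse=True)
print(ranked[0][0])  # CVE-2017-5638 ranks first for this toy input
```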

Build Quality and the Performance of Continuous Integration. Continuous integration (CI) platforms automate the process of building and testing software projects. Despite CI's many benefits and wide popularity, the CI process can take a very long time to complete and can be particularly problematic when builds fail. Research has tried to understand why builds fail [191] and even to predict build results [192]. However, very few studies have tried to improve the efficiency of the CI process. We will work on extending our assessment framework [147] to evaluate the quality of a project's build at commit time. Furthermore, to make the process fast and efficient, it is important for the approach to perform this analysis incrementally, on only the new changes to the project. Other interesting aspects of build quality considered for future research include build clones and unused build configurations.

Optimizing our Knowledge Model. An impending threat to knowledge-based systems that use graph structures for information modeling is inefficient, slow query times compared to relational databases. If our knowledge base is expected to integrate the knowledge of existing

global (dependency-related) software artifacts, a detailed study of different optimization techniques for currently used graph-based query languages, such as SPARQL, is crucial.


Bibliography

[1] J. Z. Gao, C. Chen, Y. Toyoshima, and D. K. Leung, “Engineering on the Internet for global software production,” Computer (Long. Beach. Calif)., vol. 32, no. 5, pp. 38–47, May 1999.

[2] Y. M. Mileva, V. Dallmeier, M. Burger, and A. Zeller, “Mining trends of library usage,” Proc. Jt. Int. Annu. ERCIM Work. Princ. Softw. Evol. Softw. Evol., pp. 57–62, 2009.

[3] M. P. Robillard, “What Makes APIs Hard to Learn? Answers from Developers,” IEEE Softw., vol. 26, no. 6, pp. 27–34, Nov. 2009.

[4] M. A. Saied, A. Ouni, H. Sahraoui, R. G. Kula, K. Inoue, and D. Lo, “Improving reusability of software libraries through usage pattern mining,” J. Syst. Softw., vol. 145, pp. 164–179, Nov. 2018.

[5] S. van der Burg, E. Dolstra, S. McIntosh, J. Davies, D. M. German, and A. Hemel, “Tracing software build processes to uncover license compliance inconsistencies,” in Proceedings of the 29th ACM/IEEE international conference on Automated software engineering - ASE ’14, 2014, pp. 731–742.

[6] S. Mcintosh, B. Adams, M. Nagappan, and A. E. Hassan, “Mining Co-change Information to Understand When Build Changes Are Necessary,” in 2014 IEEE International Conference on Software Maintenance and Evolution, 2014, pp. 241–250.

[7] X. Xia, D. Lo, S. McIntosh, E. Shihab, and A. E. Hassan, “Cross-project build co-change prediction,” in 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2015, pp. 311–320.

[8] S. McIntosh et al., "Collecting and leveraging a benchmark of build system clones to aid in quality assessments," Companion Proc. 36th Int. Conf. Softw. Eng. - ICSE Companion 2014, pp. 145–154, 2014.

[9] A. Hmood, I. Keivanloo, and J. Rilling, “SE-EQUAM - An evolvable quality metamodel,” Proc. - Int. Comput. Softw. Appl. Conf., pp. 334–339, Jul. 2012.

[10] G. Bavota et al., “The market for open source: An intelligent virtual open source marketplace,” in 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), 2014, pp. 399–402.

[11] R. Abdalkareem, O. Nourry, S. Wehaibi, S. Mujahid, and E. Shihab, “Why do developers use trivial packages? an empirical case study on npm,” in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering - ESEC/FSE 2017, 2017, pp. 385–395.

[12] F. L. de la Mora and S. Nadi, “An Empirical Study of Metric-based Comparisons of Software Libraries,” in Proceedings of the 14th International Conference on Predictive Models and Data Analytics in Software Engineering - PROMISE’18, 2018, pp. 22–31.


[13] F. Thung, “API recommendation system for software development,” in Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering - ASE 2016, 2016, pp. 896–899.

[14] M. M. Rahman, C. K. Roy, and D. Lo, “RACK: Automatic API Recommendation Using Crowdsourced Knowledge,” in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2016, pp. 349–359.

[15] R. G. Kula, D. M. German, A. Ouni, T. Ishio, and K. Inoue, “Do developers update their library dependencies?,” Empir. Softw. Eng., vol. 23, no. 1, pp. 384–417, Feb. 2018.

[16] C. Teyton, J. R. Falleri, and X. Blanc, “Mining library migration graphs,” Proc. - Work. Conf. Reverse Eng. WCRE, pp. 289–298, 2012.

[17] G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, and S. Panichella, “How the Apache community upgrades dependencies: an evolutionary study,” Empir. Softw. Eng., vol. 20, no. 5, pp. 1275–1317, 2014.

[18] C. Bogart, C. Kästner, J. Herbsleb, and F. Thung, "How to break an API: cost negotiation and community values in three software ecosystems," in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2016, 2016, pp. 109–120.

[19] A. Decan, T. Mens, and M. Claes, “An empirical comparison of dependency issues in OSS packaging ecosystems,” in 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2017, pp. 2–12.

[20] A. Decan, T. Mens, and E. Constantinou, “On the impact of security vulnerabilities in the npm package dependency network,” in Proceedings of the 15th International Conference on Mining Software Repositories - MSR ’18, 2018, pp. 181–191.

[21] M. Cadariu, E. Bouwers, J. Visser, and A. Van Deursen, “Tracking known security vulnerabilities in proprietary software systems,” 2015 IEEE 22nd Int. Conf. Softw. Anal. Evol. Reengineering, SANER 2015 - Proc., pp. 516–519, 2015.

[22] D. M. German, M. Di Penta, and J. Davies, “Understanding and Auditing the Licensing of Open Source Software Distributions,” in 2010 IEEE 18th International Conference on Program Comprehension, 2010, pp. 84–93.

[23] R. G. Kula, D. M. German, T. Ishio, A. Ouni, and K. Inoue, “An exploratory study on library aging by monitoring client usage in a software ecosystem,” in 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2017, pp. 407–411.

[24] B. Motik, I. Horrocks, and U. Sattler, “Bridging the gap between OWL and relational databases,” Web Semant. Sci. Serv. Agents World Wide Web, vol. 7, no. 2, pp. 74–89, Apr. 2009.

[25] M. Würsch, G. Reif, S. Demeyer, and H. C. Gall, “Fostering Synergies – How Semantic Web Technology could influence Software Repositories,” Scenario, pp. 45–48, 2010.

[26] J. Rilling, R. Witte, P. Schuegerl, and P. Charland, "Beyond information silos — an omnipresent approach to software evolution," Int. J. Semant. Comput., vol. 02, no. 04, pp. 431–468, Dec. 2008.

[27] T. Berners-Lee, J. Hendler, and O. Lassila, "The Semantic Web," Sci. Am., vol. 284, no. 5, pp. 34–43, May 2001.

[28] B. Berendt, A. Hotho, D. Mladenic, M. Someren, M. Spiliopoulou, and G. Stumme, “Web Mining: From Web to Semantic Web: First European Web Mining Forum, EWMF 2003, Cavtat- Dubrovnik, Croatia, September 22, 2003, Invited and Selected Revised Papers,” B. Berendt, A. Hotho, D. Mladenič, M. Someren, M. Spiliopoulou, and G. Stumme, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, pp. 1–22.

[29] T. Segaran, C. Evans, and J. Taylor, Programming the Semantic Web. O'Reilly Media, Inc., 2009.

[30] T. Heath and C. Bizer, “Linked Data: Evolving the Web into a Global Data Space,” Synth. Lect. Semant. Web Theory Technol., vol. 1, no. 1, pp. 1–136, Feb. 2011.

[31] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, The Description Logic Handbook: Theory, Implementation and Applications, vol. 32, no. 9/10. 2010.

[32] G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, and S. Panichella, “The evolution of project inter- dependencies in a software ecosystem: The case of apache,” IEEE Int. Conf. Softw. Maintenance, ICSM, pp. 280–289, 2013.

[33] D. Binkley, “Source Code Analysis: A Road Map,” in Future of Software Engineering (FOSE ’07), 2007, pp. 104–119.

[34] M. Würsch, G. Ghezzi, M. Hert, G. Reif, and H. C. Gall, “SEON: a pyramid of ontologies for software evolution and its applications,” Computing, vol. 94, no. 11, pp. 857–885, Nov. 2012.

[35] B. Decker, E. Ras, J. Rech, B. Klein, and C. Hoecht, “Self-organized reuse of software engineering knowledge supported by semantic wikis,” in Proceedings of the Workshop on Semantic Web Enabled Software Engineering (SWESE), 2005, p. 76.

[36] Y. Zhang, J. Rilling, and V. Haarslev, “An Ontology-Based Approach to Software Comprehension - Reasoning about Security Concerns,” in 30th Annual International Computer Software and Applications Conference (COMPSAC’06), 2006, pp. 333–342.

[37] B. Wouters, D. Deridder, and E. Van Paesschen, “The use of ontologies as a backbone for use case management,” in European Conference on Object-Oriented Programming (ECOOP 2000), Workshop: Objects and Classifications, a natural convergence, 2000, vol. 182.

[38] U. Nonnenmann and J. K. Eddy, “KITSS-a functional software testing system using a hybrid domain model,” in Proceedings Eighth Conference on Artificial Intelligence for Applications, 2003, pp. 136–142.

[39] A. Ankolekar, K. Sycara, J. Herbsleb, R. Kraut, and C. Welty, “Supporting online problem-solving communities with the semantic web,” in Proceedings of the 15th international conference on World Wide Web - WWW ’06, 2006, pp. 575–584.

[40] H.-J. Happel, A. Korthaus, S. Seedorf, and P. Tomczyk, “KOntoR: An Ontology-enabled Approach to Software Reuse,” in In: Proc. Of The 18Th Int. Conf. On Software Engineering And Knowledge Engineering, 2006.

[41] D. Jin and J. R. Cordy, "A Service Sharing Approach to Integrating Program Comprehension Tools," in Proc. European Software Engineering Conference (ESEC) / ACM Symposium on the Foundations of Software Engineering (FSE) 2003 Workshop on Tool Integration in System Development, 2003, pp. 73–78.

[42] D. Hyland-Wood, D. Carrington, and S. Kaplan, “Toward a Software Maintenance Methodology using Semantic Web Techniques,” in 2006 Second International IEEE Workshop on Software Evolvability (SE’06), 2006, pp. 23–30.

[43] M. F. Bertoa, A. Vallecillo, and F. García, “An Ontology for Software Measurement,” in Ontologies for Software Engineering and Software Technology, Springer Berlin Heidelberg, 2006, pp. 175–196.

[44] R. Witte, Y. Zhang, and J. Rilling, “Empowering software maintainers with semantic web technologies,” Eur. Conf. Semant. Web Res. Appl., pp. 37–52, 2007.

[45] L. Yu, J. Zhou, Y. Yi, P. Li, and Q. Wang, “Ontology Model-Based Static Analysis on Java Programs,” in 2008 32nd Annual IEEE International Computer Software and Applications Conference, 2008, pp. 92–99.

[46] C. Kiefer, A. Bernstein, and J. Tappolet, “Mining Software Repositories with iSPAROL and a Software Evolution Ontology,” in Fourth International Workshop on Mining Software Repositories (MSR’07:ICSE Workshops 2007), 2007, pp. 10–10.

[47] A. Iqbal, G. Tummarello, M. Hausenblas, and O.-E. Ureche, “LD2SD: linked data driven software development,” in International Conference on Software Engineering & Knowledge Engineering, 2009.

[48] A. E. Hassan and R. C. Holt, “The top ten list: dynamic fault prediction,” in 21st IEEE International Conference on Software Maintenance (ICSM’05), 2005, pp. 263–272.

[49] M. W. Godfrey et al., “Future of Mining Software Archives: A Roundtable,” IEEE Softw., vol. 26, no. 1, pp. 67–70, Jan. 2009.

[50] A. Hassan, “Mining Software Repositories to Assist Developers and Support Managers,” in 2006 22nd IEEE International Conference on Software Maintenance, 2006, pp. 339–342.

[51] A. E. Hassan, “The road ahead for Mining Software Repositories,” in 2008 Frontiers of Software Maintenance, 2008, pp. 48–57.

[52] T. L. Graves, A. F. Karr, J. S. Marron, and H. Siy, “Predicting fault incidence using software change history,” IEEE Trans. Softw. Eng., vol. 26, no. 7, pp. 653–661, Jul. 2000.

[53] R. Moser, W. Pedrycz, and G. Succi, “A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction,” in Proceedings of the 13th international conference on Software engineering - ICSE ’08, 2008, p. 181.

[54] P. C. Rigby and A. E. Hassan, “What can OSS mailing lists tell us? A preliminary psychometric text analysis of the Apache developer mailing list,” Proc. - ICSE 2007 Work. Fourth Int. Work. Min. Softw. Repos. MSR 2007, 2007.

[55] C. Bird, A. Gourley, P. Devanbu, A. Swaminathan, and G. Hsu, “Open borders? Immigration in open source projects,” Proc. - ICSE 2007 Work. Fourth Int. Work. Min. Softw. Repos. MSR 2007, 2007.

[56] A. Mockus, P. Zhang, and P. L. Li, "Predictors of customer perceived software quality," in Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005., 2005, pp. 225–233.

[57] D. Mandelin, L. Xu, R. Bodík, and D. Kimelman, “Jungloid mining,” in Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation - PLDI ’05, 2005, p. 48.

[58] J. I. Maletic and M. L. Collard, “Supporting source code difference analysis,” in IEEE International Conference on Software Maintenance, ICSM, 2004, pp. 210–219.

[59] R. W. Selby, “Enabling reuse-based software development of large-scale systems,” IEEE Trans. Softw. Eng., vol. 31, no. 6, pp. 495–510, Jun. 2005.

[60] M. Ohira, “Empirical project monitor: a tool for mining multiple project data,” in “International Workshop on Mining Software Repositories (MSR 2004)” W17S Workshop - 26th International Conference on Software Engineering, 2004, vol. 2004, pp. 42–46.

[61] M. Ohira, N. Ohsugi, T. Ohoka, and K. Matsumoto, “Accelerating cross-project knowledge collaboration using collaborative filtering and social networks,” in Proceedings of the 2005 international workshop on Mining software repositories - MSR ’05, 2005, pp. 1–5.

[62] R. J. Sandusky, “Bug report networks: varieties, strategies, and impacts in a F/OSS development community,” in “International Workshop on Mining Software Repositories (MSR 2004)” W17S Workshop - 26th International Conference on Software Engineering, 2004, vol. 2004, pp. 80–84.

[63] J. Anvik, L. Hiew, and G. C. Murphy, “Who should fix this bug?,” Proceeding 28th Int. Conf. Softw. Eng. - ICSE ’06, vol. 2006, p. 361, 2006.

[64] H. Kagdi, S. Yusuf, and J. I. Maletic, “Mining sequences of changed-files from version histories,” in Proceedings of the 2006 international workshop on Mining software repositories - MSR ’06, 2006, p. 47.

[65] A. T. T. Ying, G. C. Murphy, R. Ng, and M. C. Chu-Carroll, “Predicting source code changes by mining change history,” IEEE Trans. Softw. Eng., vol. 30, no. 9, pp. 574–586, Sep. 2004.

[66] T. Zimmermann, A. Zeller, P. Weissgerber, and S. Diehl, “Mining version histories to guide software changes,” IEEE Trans. Softw. Eng., vol. 31, no. 6, pp. 429–445, Jun. 2005.

[67] B. Livshits and T. Zimmermann, “DynaMine,” in Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering - ESEC/FSE-13, 2005, p. 296.

[68] C. C. Williams and J. K. Hollingsworth, “Recovering system specific rules from software repositories,” in Proceedings of the 2005 international workshop on Mining software repositories - MSR ’05, 2005, pp. 1–5.

[69] F. Van Rysselberghe, “Mining version control systems for FACs (frequently applied changes),” in “International Workshop on Mining Software Repositories (MSR 2004)” W17S Workshop - 26th International Conference on Software Engineering, 2004, vol. 2004, pp. 48–52.

[70] D. M. German, “An empirical study of fine-grained software modifications,” in 20th IEEE International Conference on Software Maintenance, 2004. Proceedings., 2004, pp. 316–325.

[71] C. Görg and P. Weißgerber, “Error detection by refactoring reconstruction,” in Proceedings of the 2005 international workshop on Mining software repositories - MSR ’05, 2005, pp. 1–5.


[72] A. Chen et al., “CVSSearch: Searching through source code using CVS comments,” in IEEE International Conference on Software Maintenance, ICSM, 2001, pp. 364–375.

[73] M. Kim and D. Notkin, “Using a clone genealogy extractor for understanding and supporting evolution of code clones,” in Proceedings of the 2005 international workshop on Mining software repositories - MSR ’05, 2005, pp. 1–5.

[74] M. Asaduzzaman, A. S. Mashiyat, C. K. Roy, and K. A. Schneider, “Answering questions about unanswered questions of stack overflow,” Proc. 10th Work. Conf. Min. Softw. Repos., pp. 97–100, 2013.

[75] S. M. Nasehi, J. Sillito, F. Maurer, and C. Burns, “What makes a good code example?: A study of programming Q&A in StackOverflow,” in 2012 28th IEEE International Conference on Software Maintenance (ICSM), 2012, pp. 25–34.

[76] B. Bazelli, A. Hindle, and E. Stroulia, “On the Personality Traits of StackOverflow Users,” in 2013 IEEE International Conference on Software Maintenance, 2013, pp. 460–463.

[77] S. Wang, D. Lo, and L. Jiang, “An empirical study on developer interactions in StackOverflow,” in Proceedings of the 28th Annual ACM Symposium on Applied Computing - SAC ’13, 2013, p. 1019.

[78] A. Barua, S. W. Thomas, and A. E. Hassan, “What are developers talking about? An analysis of topics and trends in Stack Overflow,” Empir. Softw. Eng., vol. 19, no. 3, pp. 619–654, Jun. 2014.

[79] S. McIntosh, M. Nagappan, B. Adams, A. Mockus, and A. E. Hassan, “A Large-Scale Empirical Study of the Relationship between Build Technology and Build Maintenance,” Empir. Softw. Eng., vol. 20, no. 6, pp. 1587–1633, 2014.

[80] P. Smith, Software Build Systems – Principles and Experience, 1st ed. Addison Wesley, 2011.

[81] B. Adams, K. De Schutter, H. Tromp, and W. De Meuter, “The Evolution of the Linux Build System,” Electron. Commun. EASST, vol. 8, 2007.

[82] B. Adams, “Co-evolution of source code and the build system,” IEEE Int. Conf. Softw. Maintenance, ICSM, pp. 461–464, 2009.

[83] M. W. Godfrey and Q. Tu, "Evolution in open source software: a case study," in Proceedings of the International Conference on Software Maintenance (ICSM 2000), 2000, pp. 131–142.

[84] S. McIntosh, B. Adams, T. H. D. Nguyen, Y. Kamei, and A. E. Hassan, “An empirical study of build maintenance effort,” 2011 33rd Int. Conf. Softw. Eng., pp. 141–150, 2011.

[85] S. Raemaekers, A. Van Deursen, and J. Visser, “Semantic versioning versus breaking changes: A study of the maven repository,” Proc. - 2014 14th IEEE Int. Work. Conf. Source Code Anal. Manip. SCAM 2014, pp. 215–224, 2014.

[86] M. Gligoric, W. Schulte, C. Prasad, D. van Velzen, I. Narasamdya, and B. Livshits, “Automated migration of build scripts using dynamic analysis and search-based refactoring,” in Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications - OOPSLA ’14, 2014, pp. 599–616.

[87] J. Al-Kofahi, H. V. Nguyen, and T. N. Nguyen, "Fault localization for build code errors in makefiles," in Companion Proceedings of the 36th International Conference on Software Engineering - ICSE Companion 2014, 2014, pp. 600–601.

[88] A. Tamrawi, H. A. Nguyen, H. V. Nguyen, and T. N. Nguyen, “Build code analysis with symbolic evaluation,” Proc. - Int. Conf. Softw. Eng., pp. 650–660, 2012.

[89] C. Dietrich, R. Tartler, W. Schröder-Preikschat, and D. Lohmann, "A Robust Approach for Variability Extraction from the Linux Build System," pp. 21–30, 2012.

[90] T. Berger, S. She, R. Lotufo, K. Czarnecki, and A. Wąsowski, “Feature-to-Code Mapping in Two Large Product Lines,” 2010, pp. 498–499.

[91] S. Nadi and R. Holt, "The Linux kernel: a case study of build system variability," 2013.

[92] S. Zhou, J. Al-Kofahi, T. N. Nguyen, C. Kastner, and S. Nadi, “Extracting Configuration Knowledge from Build Files with Symbolic Analysis,” 2015 IEEE/ACM 3rd Int. Work. Release Eng., pp. 20–23, 2015.

[93] B. Motik, A. Maedche, and R. Volz, "A Conceptual Modeling Approach for Semantics-Driven Enterprise Applications," Move to Meaningful Internet Syst. 2002 CoopIS, DOA, ODBASE, vol. 2519, pp. 1082–1099, 2002.

[94] T. R. Gruber, “Toward principles for the design of ontologies used for knowledge sharing,” Int. J. Hum. Comput. Stud., vol. 43, no. 5–6, pp. 907–928, 1995.

[95] N. Noy and D. McGuinness, “Ontology Development 101: A Guide to Creating Your First Ontology,” 2001.

[96] P. E. van der Vet and N. J. I. Mars, “Bottom-up construction of ontologies,” IEEE Trans. Knowl. Data Eng., vol. 10, no. 4, pp. 513–526, 1998.

[97] M. Uschold and M. Gruninger, “Ontologies: principles, methods and applications,” Knowl. Eng. Rev., vol. 11, no. 02, p. 93, Jun. 1996.

[98] S. S. Alqahtani, E. E. Eghan, and J. Rilling, “SE-GPS,” 2015. [Online]. Available: http://aseg.encs.concordia.ca/segps/. [Accessed: 05-Jan-2019].

[99] W. Wu, Y.-G. Guéhéneuc, G. Antoniol, and M. Kim, “AURA: a hybrid approach to identify framework evolution,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - ICSE ’10, 2010, vol. 1, p. 325.

[100] G. Brito, A. Hora, M. T. Valente, and R. Robbes, “Do Developers Deprecate APIs with Replacement Messages? A Large-Scale Analysis on Java Systems,” in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2016, pp. 360–369.

[101] A. Hora, R. Robbes, M. T. Valente, N. Anquetil, A. Etien, and S. Ducasse, “How do developers react to API evolution? A large-scale empirical study,” Softw. Qual. J., Oct. 2016.

[102] R. Robbes, M. Lungu, and D. Röthlisberger, “How do developers react to API deprecation?,” in Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering - FSE ’12, 2012, p. 1.

[103] M. Kim, D. Notkin, and D. Grossman, "Automatic Inference of Structural Changes for Matching across Program Versions," in 29th International Conference on Software Engineering (ICSE'07), 2007, pp. 333–343.

[104] P. Weissgerber and S. Diehl, “Identifying Refactorings from Source-Code Changes,” in 21st IEEE/ACM International Conference on Automated Software Engineering (ASE’06), 2006, pp. 231–240.

[105] B. Dagenais and M. P. Robillard, “Recommending Adaptive Changes for Framework Evolution,” ACM Trans. Softw. Eng. Methodol., vol. 20, no. 4, pp. 1–35, Sep. 2011.

[106] T. Schäfer, J. Jonas, and M. Mezini, “Mining framework usage changes from instantiation code,” in Proceedings of the 13th international conference on Software engineering - ICSE ’08, 2008, p. 471.

[107] L. Xavier, A. Brito, A. Hora, and M. T. Valente, “Historical and impact analysis of API breaking changes: A large-scale study,” in 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2017, pp. 138–147.

[108] W. Wu, F. Khomh, B. Adams, Y.-G. Guéhéneuc, and G. Antoniol, “An exploratory study of api changes and usages based on apache and eclipse ecosystems,” Empir. Softw. Eng., vol. 21, no. 6, pp. 2366–2412, Dec. 2016.

[109] J. Singer, S. E. Sim, and T. C. Lethbridge, “Software Engineering Data Collection for Field Studies,” in Guide to Advanced Empirical Software Engineering, London: Springer London, 2008, pp. 9–34.

[110] D. Movshovitz-Attias, S. E. Whang, N. Noy, and A. Halevy, "Discovering Subsumption Relationships for Web-Based Ontologies," in Proceedings of the 18th International Workshop on Web and Databases - WebDB'15, 2015, pp. 62–69.

[111] Y. Wang et al., “Do the dependency conflicts in my project matter?,” in Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering - ESEC/FSE 2018, 2018, pp. 319–330.

[112] R. Lämmel, E. Pek, and J. Starek, “Large-scale, AST-based API-usage analysis of open-source Java projects,” in Proceedings of the 2011 ACM Symposium on Applied Computing - SAC ’11, 2011, pp. 1317–1324.

[113] J. Businge, A. Serebrenik, and M. G. J. van den Brand, “Eclipse API usage: the good and the bad,” Softw. Qual. J., vol. 23, no. 1, pp. 107–141, Mar. 2015.

[114] B. E. Cossette and R. J. Walker, "Seeking the Ground Truth: A Retroactive Study on the Evolution and Migration of Software Libraries," Proc. ACM SIGSOFT 20th Int. Symp. Found. Softw. Eng., pp. 55:1–55:11, 2012.

[115] P. Kapur, B. Cossette, and R. J. Walker, “Refactoring references for library migration,” ACM SIGPLAN Not., vol. 45, no. 10, p. 726, 2010.

[116] D. Dig and R. Johnson, “How do APIs evolve? A story of refactoring,” J. Softw. Maint. Evol. Res. Pract., vol. 18, no. 2, pp. 83–107, Mar. 2006.

[117] S. Raemaekers, A. Van Deursen, and J. Visser, “Measuring software library stability through historical version analysis,” IEEE Int. Conf. Softw. Maintenance, ICSM, pp. 378–387, 2012.

[118] A. Decan, T. Mens, M. Claes, and P. Grosjean, "When GitHub Meets CRAN: An Analysis of Inter-Repository Package Dependency Problems," in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), 2016, pp. 493–504.

[119] C. Artho, K. Suzaki, R. Di Cosmo, R. Treinen, and S. Zacchiroli, "Why do software packages conflict?," in Proceedings of the 9th IEEE Working Conference on Mining Software Repositories (MSR), 2012, pp. 141–150.

[120] J. Williams and A. Dabirsiaghi, “The unfortunate reality of insecure libraries,” Asp. Secur. Inc, pp. 1–26, 2012.

[121] OWASP, "Top 10-2017 A9-Using Components with Known Vulnerabilities," 2018. [Online]. Available: https://www.owasp.org/index.php/Top_10-2017_A9-Using_Components_with_Known_Vulnerabilities. [Accessed: 05-Jul-2019].

[122] B. Liu, L. Shi, Z. Cai, and M. Li, “Software Vulnerability Discovery Techniques: A Survey,” in 2012 Fourth International Conference on Multimedia Information Networking and Security, 2012, pp. 152–156.

[123] S. S. Alqahtani, E. E. Eghan, and J. Rilling, “Recovering Semantic Traceability Links between APIs and Security Vulnerabilities: An Ontological Modeling Approach,” in Proceedings - 10th IEEE International Conference on Software Testing, Verification and Validation, ICST 2017, 2017, pp. 80–91.

[124] S. S. Alqahtani, E. E. Eghan, and J. Rilling, “SV-AF - A Security Vulnerability Analysis Framework,” in IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), 2016, pp. 219–229.

[125] S. S. Alqahtani, E. E. Eghan, and J. Rilling, “Tracing known security vulnerabilities in software repositories - A Semantic Web enabled modeling approach,” Sci. Comput. Program., vol. 121, pp. 153–175, Feb. 2016.

[126] N. McNeil, R. A. Bridges, M. D. Iannacone, B. Czejdo, N. Perez, and J. R. Goodall, “PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-security Concepts,” in 2013 12th International Conference on Machine Learning and Applications, 2013, pp. 60–65.

[127] S. S. Alqahtani, “Enhancing Trust–A Unified Meta-Model for Software Security Vulnerability Analysis,” Concordia University, 2018.

[128] M. Potamias, F. Bonchi, A. Gionis, and G. Kollios, “k-nearest neighbors in uncertain graphs,” Proc. VLDB Endow., vol. 3, no. 1–2, pp. 997–1008, Sep. 2010.

[129] A. Kimmig, S. Bach, M. Broecheler, B. Huang, and L. Getoor, “A short introduction to probabilistic soft logic,” in Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications, 2012, pp. 1–4.

[130] NIST, "National Vulnerability Database," 2007.

[131] V. H. Nguyen, S. Dashevskyi, and F. Massacci, “An automatic method for assessing the versions affected by a vulnerability,” Empir. Softw. Eng., Dec. 2015.

[132] H. Plate, S. E. Ponta, and A. Sabetta, "Impact assessment for vulnerabilities in open-source software libraries," 2015 IEEE 31st Int. Conf. Softw. Maint. Evol. ICSME 2015 - Proc., pp. 411–420, 2015.


[133] Y. M. Mileva, V. Dallmeier, and A. Zeller, "Mining API popularity," Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 6303 LNCS, pp. 173–180, 2010.

[134] M. Hirzel, D. Von Dincklage, A. Diwan, and M. Hind, “Fast online pointer analysis,” ACM Trans. Program. Lang. Syst., vol. 29, no. 2, pp. 11–66, Apr. 2007.

[135] S. Mancoridis, B. S. Mitchell, Y. Chen, and E. R. Gansner, “Bunch: A clustering tool for the recovery and maintenance of software system structures,” in Software Maintenance, 1999.(ICSM’99) Proceedings. IEEE International Conference on, 1999, pp. 50–59.

[136] J.-D. Choi, M. Burke, and P. Carini, “Efficient flow-sensitive interprocedural computation of pointer-induced aliases and side effects,” in Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL ’93, 1993, pp. 232–245.

[137] D. Mitropoulos, V. Karakoidas, P. Louridas, G. Gousios, and D. Spinellis, “The bug catalog of the maven ecosystem,” in Proceedings of the 11th Working Conference on Mining Software Repositories, 2014, pp. 372–375.

[138] V. Saini, H. Sajnani, J. Ossher, and C. V. Lopes, "A dataset for maven artifacts and bug patterns found in them," in Proceedings of the 11th Working Conference on Mining Software Repositories, 2014, pp. 416–419.

[139] D. Pletea, B. Vasilescu, and A. Serebrenik, "Security and emotion: sentiment analysis of security discussions on GitHub," in Proceedings of the 11th Working Conference on Mining Software Repositories, 2014, pp. 348–351.

[140] M. Gegick, P. Rotella, and T. Xie, "Identifying security bug reports via text mining: An industrial case study," in Mining Software Repositories (MSR), 2010 7th IEEE Working Conference on, 2010, pp. 11–20.

[141] S. Neuhaus, T. Zimmermann, C. Holler, and A. Zeller, “Predicting vulnerable software components,” in Proceedings of the 14th ACM conference on Computer and communications security - CCS ’07, 2007, pp. 529–540.

[142] F. Thung, D. Lo, and J. Lawall, “Automated library recommendation,” Proc. - Work. Conf. Reverse Eng. WCRE, no. October, pp. 182–191, 2013.

[143] A. Hora and M. T. Valente, "apiwave: Keeping track of API popularity and migration," in 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2015, pp. 321–323.

[144] F. McCarey, M. Ó. Cinnéide, and N. Kushmerick, “Rascal: A Recommender Agent for Agile Reuse,” Artif. Intell. Rev., vol. 24, no. 3–4, pp. 253–276, Nov. 2005.

[145] D. L. Parnas, “Software aging,” in Proceedings of the 16th International Conference on Software Engineering (ICSE ’94), 1994, pp. 279–287.

[146] Free Software Foundation, “Various Licenses and Comments About Them,” GNU Project, 2014. [Online]. Available: https://www.gnu.org/licenses/license-list.en.html.

[147] E. E. Eghan, S. S. Alqahtani, C. Forbes, and J. Rilling, “API trustworthiness: an ontological approach for software library adoption,” Softw. Qual. J., pp. 1–46, 2019.

[148] Open Source Initiative, “The Open Source Definition,” Open Source Software, 2007. [Online]. Available: https://opensource.org/osd. [Accessed: 06-Jul-2019].

[149] G. M. Kapitsaki, F. Kramer, and N. D. Tselikas, “Automating the license compatibility process in open source software with SPDX,” J. Syst. Softw., vol. 131, pp. 386–401, Sep. 2017.

[150] GitHub, Inc., “Choose a License,” 2008. [Online]. Available: https://choosealicense.com/. [Accessed: 06-Jul-2019].

[151] O. Seneviratne, L. Kagal, D. Weitzner, H. Abelson, T. Berners-Lee, and N. Shadbolt, “Detecting Creative Commons license violations on images on the World Wide Web,” in WWW2009, Apr. 2009.

[152] L. An, O. Mlouki, F. Khomh, and G. Antoniol, “Stack Overflow: A code laundering platform?,” in 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2017, pp. 283–293.

[153] Software Freedom Law Center, “Best Buy, Samsung, Westinghouse, And Eleven Other Brands Named In SFLC Lawsuit,” 2009. [Online]. Available: http://www.softwarefreedom.org/news/2009/dec/14/busybox-gpl-lawsuit/. [Accessed: 06-Jul-2019].

[154] Software Freedom Law Center, “Motion Against Westinghouse Digital Electronics in GPL Compliance Lawsuit,” 2010. [Online]. Available: https://www.softwarefreedom.org/news/2010/jun/07/motion-against-westinghouse-digital-electronics-gp/. [Accessed: 06-Jul-2019].

[155] B. A. Kitchenham and J. G. Walker, “A quantitative approach to monitoring software development,” Softw. Eng. J., vol. 4, no. 1, p. 2, 1989.

[156] B. Kitchenham and S. L. Pfleeger, “Software quality: the elusive target,” IEEE Softw., vol. 13, no. 1, pp. 12–21, 1996.

[157] D. Hoyle, “Chapter 2 - Defining and Characterizing Quality,” in ISO 9000 Quality Systems Handbook – updated for the ISO 9001:2008 standard, 6th ed., D. Hoyle, Ed. Oxford: Butterworth-Heinemann, 2009, pp. 23–37.

[158] J. A. McCall, P. K. Richards, and G. F. Walters, “Factors in Software Quality. Volume I. Concepts and Definitions of Software Quality,” 1977.

[159] A. Bergel et al., “SQUALE - Software QUALity Enhancement,” in 2009 13th European Conference on Software Maintenance and Reengineering, 2009, pp. 285–288.

[160] A. Hmood, P. Schugerl, J. Rilling, and P. Charland, “OntEQAM – A Methodology for Assessing Evolvability as a Quality Factor in Software Ecosystems,” Defence R&D Canada – Valcartier, Valcartier, QC, Canada, 2010, p. 8.

[161] H.-J. Happel and S. Seedorf, “Applications of Ontologies in Software Engineering,” in Proceedings of the 2nd International Workshop on Semantic Web Enabled Software Engineering (SWESE 2006), 2006.

[162] H. Kagdi, M. L. Collard, and J. I. Maletic, “Comparing Approaches to Mining Source Code for Call-Usage Patterns,” in Fourth International Workshop on Mining Software Repositories (MSR’07:ICSE Workshops 2007), 2007, pp. 20–26.

[163] T. Kamiya, S. Kusumoto, and K. Inoue, “CCFinder: a multilinguistic token-based code clone detection system for large scale source code,” IEEE Trans. Softw. Eng., vol. 28, no. 7, pp. 654–670, Jul. 2002.

[164] Y. Zhang, R. Witte, J. Rilling, and V. Haarslev, “Ontological approach for the semantic recovery of traceability links between software artefacts,” IET Softw., vol. 2, no. 3, p. 185, 2008.

[165] I. Keivanloo, C. Forbes, J. Rilling, and P. Charland, “Towards sharing source code facts using linked data,” in Proceedings of the 3rd International Workshop on Search-Driven Development: Users, Infrastructure, Tools, and Evaluation (SUITE ’11), 2011, pp. 25–28.

[166] L. A. Zadeh, “The concept of a linguistic variable and its application to approximate reasoning-III,” Inf. Sci., vol. 9, no. 1, pp. 43–80, Jan. 1975.

[167] International Electrotechnical Commission, “Programmable Controllers – Part 7: Fuzzy Control Programming,” IEC 61131-7, 2000.

[168] P. Cingolani and J. Alcala-Fdez, “jFuzzyLogic: a robust and flexible Fuzzy-Logic inference system language implementation,” in 2012 IEEE International Conference on Fuzzy Systems, 2012, pp. 1–8.

[169] I. Samoladas, G. Gousios, D. Spinellis, and I. Stamelos, “The SQO-OSS Quality Model: Measurement Based Open Source Software Evaluation,” in Open Source Development, Communities and Quality, Boston, MA: Springer US, 2008, pp. 237–248.

[170] B. M. Kuhn, A. K. Sebro, and D. Gingerich, “Chapter 10 The Lesser GPL,” Free Software Foundation & Software Freedom Law Center, 2016. [Online]. Available: https://copyleft.org/guide/comprehensive-gpl-guidech11.html.

[171] V. del Bianco, L. Lavazza, S. Morasca, and D. Taibi, “Quality of Open Source Software: The QualiPSo Trustworthiness Model,” 2009, pp. 199–212.

[172] T. Boland, C. Cleraux, and E. Fong, “Toward a Preliminary Framework for Assessing the Trustworthiness of Software,” Natl. Inst. Stand. Technol., Sep. 2010, pp. 1–31.

[173] H. Jiang, J. Zhang, Z. Ren, and T. Zhang, “An Unsupervised Approach for Discovering Relevant Tutorial Fragments for APIs,” in 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), 2017, pp. 38–48.

[174] W. Maalej and M. P. Robillard, “Patterns of Knowledge in API Reference Documentation,” IEEE Trans. Softw. Eng., vol. 39, no. 9, pp. 1264–1282, Sep. 2013.

[175] M. Di Penta, D. M. German, Y.-G. Guéhéneuc, and G. Antoniol, “An Exploratory Study of the Evolution of Software Licensing,” Proc. 32nd ACM/IEEE Int. Conf. Softw. Eng. - ICSE ’10, vol. 1, pp. 1–10, 2010.

[176] Y. Wu, Y. Manabe, T. Kanda, D. M. German, and K. Inoue, “A Method to Detect License Inconsistencies in Large-Scale Open Source Projects,” in 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, 2015, pp. 324–333.

[177] H. Zhong and H. Mei, “An Empirical Study on API Usages,” IEEE Trans. Softw. Eng. (Early Access), pp. 1–1, 2017.

[178] A. Monden, S. Okahara, Y. Manabe, and K. Matsumoto, “Guilty or Not Guilty: Using Clone Metrics to Determine Open Source Licensing Violations,” IEEE Softw., vol. 28, no. 2, pp. 42–47, Mar. 2011.


[179] A. Hemel, K. T. Kalleberg, R. Vermaas, and E. Dolstra, “Finding software license violations through binary code clone detection,” in Proceedings of the 8th Working Conference on Mining Software Repositories (MSR ’11), 2011, pp. 63–72.

[180] D. M. German and A. E. Hassan, “License integration patterns: Addressing license mismatches in component-based development,” in 2009 IEEE 31st International Conference on Software Engineering, 2009, pp. 188–198.

[181] D. Taibi, “Defining an Open Source Software Trustworthiness Model,” in Proceedings of the 3rd International Doctoral Symposium on Empirical Software Engineering, 2008, p. 4.

[182] D. Larson and K. Miller, “Silver bullets for little monsters: making software more trustworthy,” IT Prof., vol. 7, no. 2, pp. 9–13, Jan. 2005.

[183] T. Tan, M. He, Y. Yang, Q. Wang, and M. Li, “An Analysis to Understand Software Trustworthiness,” in 2008 The 9th International Conference for Young Computer Scientists, 2008, pp. 2366–2371.

[184] S. L. Pfleeger, “Measuring software reliability,” IEEE Spectr., vol. 29, no. 8, pp. 56–60, Aug. 1992.

[185] Y. Yang, Q. Wang, and M. Li, “Process Trustworthiness as a Capability Indicator for Measuring and Improving Software Trustworthiness,” 2009, pp. 389–401.

[186] T. Rhodes, F. Boland, E. Fong, and M. Kass, “Software Assurance using Structured Assurance Case Models,” J. Res. Natl. Inst. Stand. Technol., vol. 115, no. 3, 2010.

[187] R. Land, D. Sundmark, F. Lüders, I. Krasteva, and A. Causevic, “Reuse with Software Components - A Survey of Industrial State of Practice,” Form. Found. Reuse Domain Eng., pp. 150–159, 2009.

[188] C. Ayala, X. Franch, R. Conradi, J. Li, and D. Cruzes, “Developing Software with Open Source Software Components,” in Finding Source Code on the Web for Remix and Reuse, New York, NY: Springer New York, 2013, pp. 167–186.

[189] P. Moslehi, B. Adams, and J. Rilling, “On mining crowd-based speech documentation,” in Proceedings of the 13th International Workshop on Mining Software Repositories - MSR ’16, 2016, pp. 259–268.

[190] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge University Press, 2008.

[191] H. Seo, C. Sadowski, S. Elbaum, E. Aftandilian, and R. Bowdidge, “Programmers’ build errors: a case study (at Google),” in Proceedings of the 36th International Conference on Software Engineering (ICSE 2014), 2014, pp. 724–734.

[192] A. E. Hassan and K. Zhang, “Using decision trees to predict the certification result of a build,” in Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering (ASE 2006), 2006, pp. 189–198.


Appendix A: Referenced Ontologies

The following table lists the ontologies and namespace prefixes used in this dissertation, together with their URIs and a short description of each.

GENERAL (prefix: main)
    URI: http://aseg.cs.concordia.ca/segps/ontologies/general/2015/02/main.owl#
    Description: Our general layer ontology

MARKOS (prefix: markos)
    URI: http://www.markosproject.eu/ontologies/osslicenses
    Description: The MARKet for Open Source license ontology

MEASUREMENT (prefix: measure)
    URI: http://aseg.cs.concordia.ca/segps/ontologies/domain-spanning/2015/02/measurement.owl#
    Description: Our measurement ontology

OLO (prefix: olo)
    URI: http://purl.org/ontology/olo/core#
    Description: The OrderedList Ontology

ONTTAM (prefix: onttam)
    URI: http://aseg.cs.concordia.ca/segps/ontologies/domain-spanning/2017/09/onttam.owl#
    Description: Our trustworthiness assessment ontology

OWL (prefix: owl)
    URI: http://www.w3.org/2002/07/owl#
    Description: Web Ontology Language

RDF (prefix: rdf)
    URI: http://www.w3.org/1999/02/22-rdf-syntax-ns#
    Description: Resource Description Framework

SBSON (prefix: sbson)
    URI: http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/build.owl#
    Description: Our Software Build System ONtology

SEON (prefix: seon)
    URI: http://se-on.org/ontologies/general/2012/02/main.owl#
    Description: The Software Evolution ONtology

SEON-HISTORY (prefix: version)
    URI: http://se-on.org/ontologies/domain-specific/2012/02/history.owl#
    Description: SEON’s versioning domain ontology

SEQUAM (prefix: sequam)
    URI: http://aseg.cs.concordia.ca/segps/ontologies/domain-spanning/2017/09/sequam.owl#
    Description: The quality assessment ontology

SEVONT (prefix: sevont)
    URI: http://aseg.cs.concordia.ca/segps/ontologies/domain-spanning/2015/02/vulnerabilities.owl#
    Description: The SEcurity Vulnerability ONTology

SOCON (prefix: code)
    URI: http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/code.owl#
    Description: Our SOurce Code ONtology
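
As a usage illustration, the following minimal SPARQL sketch declares a subset of these prefixes and retrieves build dependencies. The prefix declarations are taken verbatim from the table above; however, the class and property names used in the query body (sbson:Project, sbson:hasBuildDependency) are hypothetical placeholders chosen for illustration only, and the actual vocabulary is defined by the SBSON ontology itself.

    # Prefix declarations taken from the namespace table above.
    PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX sbson: <http://aseg.cs.concordia.ca/segps/ontologies/domain-specific/2015/02/build.owl#>

    # Hypothetical query: list each project together with its declared
    # build dependencies. The class and property names below are
    # placeholders, not necessarily the actual SBSON vocabulary.
    SELECT ?project ?dependency
    WHERE {
      ?project rdf:type sbson:Project .
      ?project sbson:hasBuildDependency ?dependency .
    }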


Appendix B: User Survey Questionnaire

Part I: Background

1. How best would you describe yourself?
   a. Undergraduate Student
   b. Graduate Student
   c. Academic Researcher
   d. Industrial Researcher
   e. Industrial Developer
   f. Freelance Developer
   g. Other: ______

2. How many years of software development or maintenance experience do you have?
   a. < 1 year
   b. 1–2 years
   c. 2–5 years
   d. 5–10 years
   e. 10–20 years
   f. > 20 years

3. How many years have you been contributing to open source (in any way)?
   a. < 1 year
   b. 1–2 years
   c. 2–5 years
   d. 5–10 years
   e. 10–20 years
   f. > 20 years


Part II: Background on Ecosystem

1. Please choose ONE software ecosystem in which you frequently publish a package/library. If you have not published any packages/libraries, then pick an ecosystem whose packages/libraries you frequently use.
   Note: For the “Maven ecosystem”, we are interested in the development of frameworks and libraries in Java, Scala, and other languages that share artifacts through Maven Central or other Maven repositories (for example through build systems or tools like Gradle, sbt, Ivy, or Maven itself).
   a. Bower
   b. Composer
   c. Maven
   d. Node.js/NPM
   e. NuGet
   f. Perl/CPAN
   g. PHP/Packagist
   h. Python/PyPI
   i. R/CRAN
   j. Other ______

2. Which best describes your role in this ecosystem?
   a. I am a core contributor
   b. I have submitted a patch or pull request
   c. I use packages/libraries of the ecosystem in my systems

3. How many years have you been using the chosen ecosystem in any way?
   a. < 1 year
   b. 1–2 years
   c. 2–5 years
   d. 5–10 years
   e. 10–20 years
   f. > 20 years


NB: If you identified yourself primarily as a publisher/developer of packages/libraries in the above section, please proceed to Part III. Otherwise, if you identified yourself primarily as someone who reuses packages/libraries in the chosen ecosystem, please proceed to Part IV.

Part III: (Optional) Breaking Changes – Developer’s Perspective

1. How often do you introduce breaking changes to packages/libraries you develop or contribute to?
   a. Never
   b. Less than once a year
   c. Several times a year
   d. Several times a month
   e. Several times a week
   f. Several times a day

2. How often do you face breaking changes from upstream dependencies?
   a. Never
   b. Less than once a year
   c. Several times a year
   d. Several times a month
   e. Several times a week
   f. Several times a day

3. What is your opinion on the following cost-sharing strategies for breaking changes? (Strongly agree, Somewhat agree, Neither agree nor disagree, Somewhat disagree, Strongly disagree)
   a. Developers of components should invest extra work and effort to reduce the impact of breaking changes on client applications
   b. Developers should make changes without caring about the amount of rework required for clients
   c. Third parties should take some of the burden by reviewing changes, curating a selection of recommended libraries for clients, etc.

4. Which of these existing strategies do you (or the organization for which you work) adopt to delay/reduce the costs of breaking changes for clients (multiple selections allowed)?
   a. Maintaining old interfaces (deprecation)
   b. Parallel releases
   c. Release planning
   d. Communication with users
   e. Other ______

5. How do you decide on an adoption strategy, and what measures (if any) are used when deciding on a strategy? (e.g., feedback from clients/users on proposed changes)

Part IV: (Optional) Breaking Changes – Client’s Perspective

1. How do you declare the package/library versions that your project depends on?
   a. I specify an exact version number
   b. I specify a range of version numbers
   c. I specify only the name and always get the latest version
   d. Other ______

2. Rank how these factors contribute to your decision when adding a dependency to your project. Assign a number from 1 to 8, with 1 being the highest ranked factor.
   a. The popularity of the package. ___
   b. How current the package is (latest release?) ___
   c. The quality of the package. ___
   d. The quality of the package contributors. ___
   e. The value added to your project by the dependency. ___
   f. The number of breaking changes in the dependency. ___
   g. The historical stability of the dependency (history of bugs, breaking changes, etc.) ___
   h. Other ______: ___

3. How often do you face breaking changes from packages/libraries you use in your projects?
   a. Never
   b. Less than once a year
   c. Several times a year
   d. Several times a month
   e. Several times a week
   f. Several times a day

4. For most of the packages/libraries I reuse, I become aware of breaking changes by:
   a. reading about them on the dependency project’s internal sources (not general public announcements)
   b. reading about them on the dependency project’s external media (public announcements, social media)
   c. receiving a notification from a tool
   d. trying to build my project
   e. Other ______

5. Have you ever been indirectly impacted by breaking changes caused by transitive dependencies?
   a. Yes
   b. No

6. If your answer to question 5 is Yes, can you briefly describe the experience: how did you detect and resolve the issue? ______

7. What would be the most important feature you would like to have in a tool that detects possible direct impacts of breaking changes? ______

8. What would be the most important feature you would like to have in a tool that detects possible indirect impacts of breaking changes? ______
