University of Oslo

Composing Software Product Lines with Machine Learning Components

Sebastian Schartum Nomme & Jørgen Borgersen

Master Thesis in Informatics: programming and system architecture - software 60 credits

Department of Informatics The Faculty of Mathematics and Natural Sciences

Spring 2020

Abstract

Background. A software product line is a set of software-intensive systems that share a common, managed set of features satisfying the specific needs of a particular market segment. The most considerable benefit of using a software product line is the ability of large-scale reuse. Currently, machine learning models lack reproducibility and suffer from inconsistent deployment. There is a disconnect between machine learning engineering and traditional software that can cause issues when including machine learning models in a software product line.

Aim. The study aims to outline an approach to address this problem, allowing stakeholders to better weigh their options with regard to how to successfully include machine learning components in their software product line.

Method. In the thesis, we developed a prototype and conducted interviews to gain insights into the topic.

Results. Findings suggest that automatic product derivation with machine learning components has a few drawbacks; manual effort is, in most cases, necessary. By taking into account the restrictions and constraints of software product line engineering and machine learning engineering, a composition-based approach is a viable option for architecting software product lines.

Conclusion. Utilising a composition-based approach with a component-based system makes it possible to retain the many benefits of a software product line while including machine learning components.

Keywords: software product lines, machine learning.

Preface

First and foremost, we would like to thank everyone who has participated and contributed to the work in our thesis. It would have been hard to complete the thesis without your help and support. Thank you, Snapper Net Solutions for providing experience, employees, and hospitality throughout this process.

We would like to give a massive thank you to our supervisor, Antonio Martini. Thank you for your excellent supervision and guidance throughout a long and complicated period of work. Thank you for always helping us on the right track and having faith in the work that we did. Thank you for being critical and for always asking the right questions.

Finally, thank you for all the support from our friends and family. Through all these years of study, you have shown patience and encouragement towards our work. This is highly appreciated, and we could not have done this without you!

Sebastian Schartum Nomme & Jørgen Borgersen University of Oslo, June 2020

Table of Contents

Table of Contents ix
List of Tables x
List of Figures xiii
List of Equations xiv

1 Introduction 1
  1.1 Experience and background 2
  1.2 Personal motivation 2
  1.3 Snapper Net Solutions 3
  1.4 Target group 3
  1.5 Presentation of the thesis layout 4

2 The case 6
  2.1 Problem description 6
  2.2 Terms and concepts 7
  2.3 Research questions 7

3 Background 11
  3.1 Software product lines 11
    3.1.1 Motivation / Awareness of problem 11
    3.1.2 Fundamental approach 13
    3.1.3 Product definition strategy 14
    3.1.4 Variability management 15
    3.1.5 Process 17
  3.2 Machine learning 22
    3.2.1 The use of machine learning 23
    3.2.2 Methods of machine learning 24
    3.2.3 Possible approaches 25
  3.3 Recommender systems 32
    3.3.1 How do recommender systems work? 33
    3.3.2 Filtering methods 33
    3.3.3 Top-N recommenders 36
    3.3.4 Designing and evaluating recommender systems 38
    3.3.5 Accuracy measures 38
    3.3.6 Evaluating recommender systems 41
    3.3.7 Economical considerations 43
    3.3.8 Motivation 45

4 Research process 48
  4.1 Conducting research 48
    4.1.1 Foundation for the right research process 48
  4.2 Our process 49
    4.2.1 Define problem case 50
    4.2.2 Research theory 50
    4.2.3 Initial approach 51
    4.2.4 Implementation of second prototype 51
    4.2.5 Evaluation 52
    4.2.6 Reflection and conclusion 52
  4.3 Design science research 53
  4.4 DSR framework 54
  4.5 DSR Guidelines 56
    4.5.1 Guideline 1: Design as an artefact 56
    4.5.2 Guideline 2: Problem relevance 57
    4.5.3 Guideline 3: Design evaluation 57
    4.5.4 Guideline 4: Research contributions 60
    4.5.5 Guideline 5: Research rigor 60
    4.5.6 Guideline 6: Design as a search 61
    4.5.7 Guideline 7: Communication of research 62
  4.6 Methods for collecting data 62
    4.6.1 Case study 63
    4.6.2 Survey 67
    4.6.3 Methods for data analysis 68

5 Product foundation 73
  5.1 Data access and gathering 73
    5.1.1 Explicit data 73
    5.1.2 Implicit data 74
  5.2 Company A data access 76
  5.3 Generating mock data 77
  5.4 Foundation 78
    5.4.1 Architecture 78
  5.5 Requirements elicitation in SPL 80

6 Initial approach 84
  6.1 First prototype architecture 84
    6.1.1 A tweak of K-Nearest Neighbours 86
  6.2 Distance metrics 87
    6.2.1 Euclidean distance 88
    6.2.2 Pearson correlation coefficient 90
    6.2.3 Recommending items 93
    6.2.4 Evaluating other distance metrics 95
  6.3 Prototype with SPL implementation 96
    6.3.1 Architecture 97
    6.3.2 Generic code 99
    6.3.3 Layout of components and views 101
  6.4 Process from first prototype to SPL prototype 103
  6.5 Estimates of implementing R2 104
  6.6 Considerations for the future 105

7 Implementation 108
  7.1 Overall architecture 109
    7.1.1 Architecture of the recommender system 111
    7.1.2 Architectural flow with recommender system 112
  7.2 Implementation of a recommender system in a SPL 113
    7.2.1 Pre-processing 114
    7.2.2 Model training 119
    7.2.3 Model serving 126
    7.2.4 Testing, experimentation and evaluation 126

8 Results from evaluation 130
  8.1 RQ1: How possible is it to create machine learning components that work for multiple products in a software product line? 130
  8.2 RQ2: How reusable are machine learning components in a software product line? 132
  8.3 RQ3: How feasible is it to create and consume reusable machine learning models in a software product line? 133
  8.4 RQ4: How can you support a Software Product Line Evolution containing Machine Learning Components? 135
  8.5 RQ5: How does a software product line affect the quality of its recommender systems? 137

9 Lessons learned 142
  9.1 Reactive and feature-oriented approach 142
    9.1.1 Composition-based approach 143
    9.1.2 Reactive approach 144
    9.1.3 Feature-orientation 144
    9.1.4 Software composition with mapping to SPL 145
  9.2 Variability mechanisms 148
    9.2.1 Components and services 148
    9.2.2 Parameters 151
  9.3 Version-control systems in software product lines 152
  9.4 Recommender Systems in SPL 156
    9.4.1 ML and SPL 157
    9.4.2 Phases of recommendation process 157
    9.4.3 Composing components in an SPL ML pipeline 160
    9.4.4 SPL feature interaction 161
  9.5 Summary 163

10 Discussion 167
  10.1 Theoretical contributions 167
  10.2 Practical contributions 171
  10.3 Ethical Considerations 172
  10.4 Threats to validity 172
  10.5 Related work 175

11 Conclusion 177

12 References 180

A Product requirements 193
  A.1 Company A - product requirements 193
  A.2 Company B - product requirements 197

B Interview templates 201
  B.1 Architecture interview template 201
  B.2 Business and process interview template 206

C Survey - Google Form 213

D Code extract from the prototype 218
  D.1 Engine 218
  D.2 CSV Loader 222

List of Tables

1 Probability of each terminal node 27

2 Course completion on employees of Company A 29

3 Explanations of data labels from mock data in csv file 78

4 Three parameters for type variable 86

5 Similarity ranking among users on mock data. 93

6 Collected data from implementation of first prototype 103

7 Collected data from implementation of SPL prototype 104

8 Predicted data from implementation of R2 recommender system for Company B 105

9 Requirements for CSV loader 116

10 Description of the hyper parameters we use 125

11 Values for the hyper parameters we used 125

12 Main types of validity threats. Table taken from Feldt and Magazinius (2010), p. 376. 173

List of Figures

1 Economics of SPL engineering. Figure taken from van der Linden, Schmid and Rommes (2007), p. 4. 12

2 Relation of Different Types of Variability. Figure taken from van der Linden, Schmid and Rommes (2007), p. 9. 17

3 Data dependencies in ML Systems. Figure taken from Sculley et al. (2015), p. 4. 22

4 Decision tree for Company A employees showing mock data 26

5 KNN where k = 3. Figure taken from Srivastava (2018). 30

6 Logistic regression algorithm. Figure taken from Gupta (2018). 31

7 Collaborative filtering 34

8 User-item matrix 35

9 Content-based filtering 36

10 Rating frequency distribution. Figure taken from Liao (2018). 45

11 Our research process throughout the thesis 50

12 Model: knowledge is generated and accumulated through action 53

13 Design Science Framework 55

14 Design Evaluation Methods. Figure taken from Hevner et al. (2004), p.86 59

15 Rigor cycle 61

16 BAPO model. Figure taken from Bosch (2017). 64

17 Color codes mapped to research questions 69

18 Example of how we analyse interviews 70

19 Example of pie chart from survey 71

20 Explicit mock data 74

21 Implicit mock data 75

22 Excel view of the mock data generated in a csv file 77

23 Three basic techniques for realising variability in an architecture. Figure taken from van der Linden, Schmid and Rommes (2007), pp. 41. 79

24 Product requirements for Company A and Company B 81

25 Architecture of first prototype 85

26 Euclidean distance (JavaScript) example code 88

27 User-feedback diagram 89

28 Pearson correlation (JavaScript) code 91

29 Architecture of SPL components 98

30 EuclideanDistance components module constructor 99

31 PearsonCorrelation components module constructor 100

32 Generic recommender (RC component) modules constructor input 100

33 First part of SPL prototype 101

34 Second part of SPL prototype, recommending courses 102

35 The three axes of change in an ML application: data, model and code, and reasons for them to change 108

36 Product A architecture 110

37 Process architecture 111

38 Sequence diagram of how the recommender system work 112

39 Recommender system pipeline 113

40 ML pipeline for the recommender system 120

41 LogisticRegression class diagram 122

42 Diagram showing the degree to which people think recommender systems help them make decisions more effectively 137

43 Diagram showing whether people lose trust in recommender systems when they are given bad recommendations 138

44 Diagram showing the distribution of different categories of services where people generally like to receive recommendations from 139

45 Function composition diagram 146

46 Merging feature branches 153

47 Merging feature branches with custom modifications 153

48 Revision control. Figure taken from Apel et al. (2013). 154

49 ML consists of code and data. Figure taken from Breuel (2020). 157

50 ML Pipelines connect data and code to produce models and predictions. Figure taken from Breuel (2020). 158

List of Equations

1 Bayes Theorem 28

2 Bayes Theorem with n conditions 29

3 Cross entropy loss metric 40

4 Dice coefficient 44

5 Euclidean distance between two points in a two dimensional space 88

6 Euclidean distance between two points in a n dimensional space 89

7 Pearson correlation coefficient formula 90

8 Gradient descent formula 123

9 Sigmoid equation 124

10 Accuracy formula 124


1 Introduction

”A software product line is a set of software-intensive systems that share a common, managed set of features satisfying the specific needs of a particular market segment or mission and that are developed from a common set of core assets in a prescribed way” (Northrop, 2010, p. 521). Software product lines support large-scale reuse, which enables order-of-magnitude improvements in time to market, cost, productivity, quality and other business drivers.

There is a fundamental difference between machine learning and traditional software: Machine learning is not just code; it is code plus data. A machine learning model (the artefact deployed to production) is created by applying an algorithm to a mass of training data, which will affect the behaviour of the model in production (Breuel, 2020). The behaviour of the model also depends on the input data received at prediction time, which is unknown in advance. This is an issue at the root of the problem, causing a disconnect that has to be addressed before an ML model can be deployed in production successfully. A few issues related to the production and deployment of machine learning models are:

• Lack of reproducibility

• Performance reduction

• Slow, brittle and inconsistent deployment

The lack of reproducibility is — framed in another context — an issue of reusability when creating machine learning components in a software product line. In this thesis, we will discuss how to compose software product lines with machine learning components. We are particularly interested in the quality of reusability and explore the topic through the development of recommender systems, a subclass of machine learning algorithms. Recommender systems filter and recommend content to users based on patterns discovered in the ratings or preferences they have given in the past.

We want to add context to the term components in machine learning components, as components in a component-based system may provide and require multiple services, whereby each service is described by a service specification. A component that provides a specific service must declare that it does so by implementing the interface specified by the service specification. An approach of ”programming against interfaces” enables low coupling and flexible designs that are malleable

(Eichberg et al., 2010). Utilising a composition-based approach with a component-based system allows us to retain the many benefits of a software product line while including machine learning components.
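A minimal sketch in JavaScript (the language used for the prototypes later in the thesis) of what programming against such a service specification can look like; the components and data below are illustrative examples, not code from the prototype.

// Two interchangeable components satisfying the same (hypothetical) service
// specification: a similarity service with a single distance(a, b) operation.
const euclideanComponent = {
  distance: (a, b) => Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0)),
};
const manhattanComponent = {
  distance: (a, b) => a.reduce((sum, x, i) => sum + Math.abs(x - b[i]), 0),
};

// Consumers are programmed against the interface, not a concrete implementation,
// so products in the line can swap the metric when the system is composed.
function mostSimilarUser(target, others, similarity) {
  return others.reduce((best, user) =>
    similarity.distance(target.vector, user.vector) <
    similarity.distance(target.vector, best.vector) ? user : best);
}

const users = [
  { id: 'u1', vector: [1, 0, 3] },
  { id: 'u2', vector: [2, 1, 3] },
];
console.log(mostSimilarUser({ id: 'me', vector: [2, 1, 3] }, users, euclideanComponent).id); // u2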

In this thesis, we will discuss Software Product Lines Engineering, Machine Learning Engineering and the composition of those two fields.

In this chapter, we present the introduction to our master thesis. We start by presenting ourselves, with the experience and background we have as software developers. Then we present our personal motivation for choosing to conduct this thesis. Then we present the company that we are cooperating with. Then we present the target group for the thesis. And finally, we present the entire layout of the thesis.

1.1 Experience and background

We are two students studying Informatics: Programming and System Architecture, with a specialisation in software, at the University of Oslo. Both of us have a background in software development, and we have taken the majority of our courses on the topic of web development. We have also had courses covering a broad diversity of areas within the field of informatics, including topics such as databases, software architecture, cloud computing, and so forth.

The most relevant experience we have is from working for different companies through internships and part-time jobs. Learning from experienced developers, we have gained professional experience in how real production-ready software is developed and deployed for customers.

1.2 Personal motivation

During our studies, we have had multiple courses on software development and engineering, but had merely heard about what software product lines are. Discovering and learning about this topic has been a great motivation for choosing this area to research.

We wanted to get an overarching and holistic understanding of how software products are being developed and how this affects and allows for scaling of companies. We also wanted to combine this with machine learning, because we saw this as an interesting topic to dive deeper into. This motivated us because we have not seen any similar work being done on the same topics with the same approach that we have chosen. We also wanted to learn more about different machine

learning algorithms, as learning about machine learning is more accessible now and has a rapidly growing community.

This is, first and foremost, a theoretical thesis. However, we wanted to do a practical part where we could conduct research through the development and implementation of some products in the form of prototypes and proof of concepts. Since we have a background in software development, we found the case to be very interesting and exciting to be part of our research process.

1.3 Snapper Net Solutions

Snapper is a small Oslo-based company delivering e-learning solutions to medium- and large-sized companies, with expertise in nano learning. Snapper provided us with the initial case, which we used to derive our topic of research and problem description. The company was founded in the year 2000, but due to massive accumulated technical debt and tough competition, it has been hard to scale the company. After a very successful launch of an e-learning application called Product A for the consumer goods store chain Company A, they wanted to sell the same product to other customers. This was the birth of their new software product line, which radically changed their focus and company structure. Snapper has been very interesting to work with, as we could better understand the needs and difficulties of implementing a software product line as we researched our thesis.

1.4 Target group

When conducting a master thesis for a company, we see them as a major stakeholder with a high interest in the results that we provide from the research. We rely on resources and competence from them and see them as an important target group for our research.

The target group is mainly companies that range from small to medium size in the number of employees, similar to the characteristics of Snapper. These companies should be working with a diverse client base. The companies should either provide a software product line for their customers or have the possibility of, and see the benefits in, doing so. The product line contains products with product-specific configurations and commonalities among the products. The company should also have a relevant interest in using machine learning algorithms in the product line that are shared among several of the products.

1.5 Presentation of the thesis layout

Chapter 2 is a presentation of the case in the thesis. Here we present the problem description that we want to solve, some frequently used terms and concepts we use throughout the thesis, and finally, the research questions are presented.

Chapter 3 is a complete presentation of the background literature and material we use as a foundation when solving our thesis.

Chapter 4 is a presentation of the research process that we use as our methodology for con- ducting the thesis. We present a framework that we use in our process, and finally, we present the chosen evaluation methods we will conduct.

Chapter 5 is a presentation of the product foundation that we base the development of the prototypes on. Here we present the different data sets that we work with and what limitations they have, we also present different approaches to designing the architecture, and finally how we handle the different requirements from several companies.

Chapter 6 is a presentation of the initial approach we had to solve the case. This was our first experience in making a prototype for solving the problem description.

Chapter 7 is a presentation of the final implementation (prototype) we made as a solution to the case. When we became more experienced with the concepts, we saw a better way to solve the problem description and made a new prototype.

Chapter 8 is a presentation of the evaluation results we received after conducting the evaluation methods based on the prototype we made. We did a case study and a survey, and present the results in this chapter.

Chapter 9 is a presentation of our lessons learned after researching the different topics, devel- oping two prototypes and conducting the evaluation methods.

Chapter 10 is a discussion based on the findings and the lessons learned. We argue the outcome of the research questions and discuss the validity and limitations of our research.

Chapter 11 is a conclusion on the findings from the evaluation methods and the discussion based on the research questions. We summarize our contributions and suggest the possibility of future work.

2 The case

In this chapter, we start by presenting the problem description for the thesis. Then we present some frequently used terms and concepts in the thesis. And finally, we present the research questions we want to answer.

2.1 Problem description

In recent years, learning environments have shown increasing importance, playing a fundamental role in teaching and training activities in both academic and business settings. A few of the primary motivations for e-learning are the impact of technological advancements, such as intelligent interfaces, contextual modelling applications, and progress in the field of wireless communication — which altogether have provided numerous new and innovative perspectives for technology users.

Snapper would like to deliver their e-learning system, both as a mobile application and as a web application, to a variety of clients. Each product is very similar but has a few unique features adapted to each unique client. In recent times, Snapper has ventured into mobile learning development, as it introduces flexibility to the learning process since the access, creation and exchange of information occur naturally due to the omnipresence of mobile devices. Users can decide when, how and where they feel most comfortable to learn. Due to the large number of mobile devices available in the market, the production of content for these devices becomes strongly dependent on issues such as the manufacturer and operating system.

The introduction of component-based development and service-oriented development has attracted the interest of the software community to the benefits and opportunities of code reuse. The success of the aforementioned initiatives has spurred reuse in several stages of the software development process, including artefacts such as documents and models, further increasing the prospect of cost reduction and return on investment (ROI).

The evolution of those ideas has led to the concept of the software product line, which represents a paradigm change in regard to traditional software development. Rather than developing software ”project-to-project”, businesses should now concentrate their efforts on creating and maintaining core assets, which would be the foundation for the construction of specific products for a given domain.

In light of this, a software product line could yield significant benefits for Snapper in the perspective of cost reduction and ROI. Snapper has shown interest in adding functionality to recommend courses based on the individual preferences of each user. For a company with 5-8 employees and an extensive portfolio of customers to manage, using machine learning to recommend courses for a particular client’s product was deemed too expensive. Snapper questioned whether it would be possible to build a recommender system which could serve as a core asset in a software product line for multiple customers.

Motivated by this scenario, in this thesis, we will examine the benefits of systematic reuse of an SPL in the context of a recommender system. The goal is to promote overall quality, domain comprehension, and reduction of time spent in the development and maintenance of building software product lines with machine learning components.

2.2 Terms and concepts

SPL is an acronym for Software Product Lines. The concept of a software product line is used to describe an approach where common components and services are used to satisfy specific requirements of a market segment shared by multiple products developed by the same company.

ML is an acronym for Machine Learning. Machine learning provides systems or algorithms that learn and improve automatically through experience based on data provided by users.

Recommender systems are a subclass of machine learning algorithms. They filter and recommend content to users based on patterns discovered in the ratings or preferences users have given in the past.

These terms and concepts are further described in chapter 3, and are frequently used throughout the thesis.

2.3 Research questions

The overarching topic of the thesis is whether machine learning and software product lines work together. By this we mean that we want to implement machine learning components into the generalised components of the software product line, so each instance of the product line can use the machine learning components as any other component. By researching through implementation, we want to create new knowledge about machine learning models and software

product line theory to clarify whether the case can be solved or not. We have investigated the following research question with sub-questions in this thesis:

RQ1: How possible is it to create machine learning components that work for multiple products in a software product line?

The first research question (RQ1) is the main question we are researching. We investigate whether it is possible to accommodate machine learning components into a software product line or not. A lot of articles have been published on similar topics with machine learning and software reusability. For example: Di Stefano and Menzies (2002), Morisio et al. (2002) and Camillieri et al. (2016). But they all focus on software reuse and evolution within a specific system or product, which is a slightly different approach than having a software product line with generalised components. Their approach is to reuse and evolve some parts of the code, rather than entire components, which is what we are researching. We use these articles as support for our research, but will further research the theory of software product lines rather than software reuse.

RQ2: How reusable are machine learning components in a software product line?

This sub-question (RQ2) focuses on reusability, which is a big part of our thesis. The research we conduct is to understand to what degree this is possible. We are interested in knowing how large a percentage of the machine learning component code is reusable and can be used to create generalised components. Another aspect that we want to research is which parts of the machine learning components are reusable.

RQ3: How feasible is it to create and consume reusable machine learning models in a software product line?

We have been researching the development and consumption (with the framework TensorFlow) of machine learning models that give users recommendations. TensorFlow is an architecture for executing graphs of numerical computations. TensorFlow figures out how to distribute processing across the various GPU cores of your computer, or across various machines on a network, which allows massive computing problems to be handled in a distributed manner (Cardoza, 2018). Our goal is to create a prototype with a machine learning component that can be used for multiple products and devices, serviced through an interface, and to evaluate how to manage and evolve a software product line containing these machine learning components. This sub-question (RQ3) focuses not only

on the possibilities of reusability but also on the costs of doing so. We want to see if it is cost-beneficial to reuse machine learning components.

RQ4: How can we support a software product line evolution containing machine learning components?

Modern software systems tend to be long-living and, therefore, have to undergo continuous evolution to cope with new, and initially unforeseen, user requirements and application contexts. In practice, the necessary changes applied to design, implementation and quality-assurance artefacts are often performed in an ad hoc, manual manner, thus lacking proper documentation, consistency checks among related artefacts, and systematic quality-assurance strategies. These issues become even more challenging in the case of variant-rich software systems such as software product lines; even a small change may (erroneously) affect a large number of similar product variants simultaneously. This sub-question (RQ4) is to research how to develop and evolve machine learning components in an evolving software product line.

RQ5: How does a software product line affect the quality of its recommender system?

When multiple products share the same machine learning models, the quality of the predictions they provide can be affected. We have been researching whether this is the case, and whether it can be an issue to have such components in a software product line. We also want to see how much it affects the quality and if this is a problem for the end-users. This sub-question (RQ5) is to research whether shared machine learning components satisfy the end-users' requirements and needs.

3 Background

In this chapter, we present the different topics relevant to our research, with the purpose of giving background and a fundamental understanding of what we research and use as a knowledge base in our thesis.

First, we give a thorough background on software product lines and different topics within this area. Then we present the main concepts of machine learning theory, before going deeper into different machine learning approaches and algorithms. Finally, we present the concept of recommender systems.

3.1 Software product lines

Software is increasingly becoming an important asset in modern, competitive products. Simple or complex, small or large, there is barely any product without software. Software product lines (SPL) have gained attention in recent years due to quality, cost and time-to-market concerns. Companies prize software reuse to capture more value from their investments.

3.1.1 Motivation / Awareness of problem

The decision to embark on a software product line approach comes down to different reasons – ranging from process-oriented aspects such as cost and time to end-user aspects such as interface consistency. The move towards software product lines is usually based on economic considerations: the approach supports large-scale reuse during development. As opposed to traditional reuse approaches, this can be as much as 90% of the total software. Reuse is more cost-effective than new development by orders of magnitude. Cost and time to market are heavily correlated in software product line engineering.

Figure 1: Economics of SPL engineering. Figure taken from van der Linden, Schmid and Rommes (2007), p. 4.

Thus, both development costs and time to market can be dramatically reduced by a software product line approach. Other benefits include the improvement of qualities in the resulting product, such as reliability and ease of use, and a decrease in product risk (Ferguson, 2018). Unfortunately, these benefits do not come for free but require some extra initial investment — needed for building reusable assets, transforming the organisation, etc. Various approaches exist to make this investment, such as incremental strategies or big bang adoption (instant changeover). Regardless, the need for an underlying set-up remains. Break-even happens after about three products (as shown in figure 1), along with a reduction in maintenance costs, i.e. the overall amount of code and documentation that needs to be maintained is reduced, along with project size and risk.
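As a rough, purely illustrative calculation (the numbers are assumptions, not figures taken from the literature): if building the reusable platform costs about twice the effort of one standalone product, and deriving each product from the platform then costs about 0.3 of that effort, three products cost roughly 2 + 3 × 0.3 = 2.9 product-equivalents with the product line against 3.0 without it, whereas at two products the product line is still more expensive (2.6 against 2.0). This is why break-even is commonly placed around the third product, and every further product widens the gap.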

Software product line engineering has a strong impact on the quality of the resulting software. New applications will then consist to a large extent of matured and proven components, which leads to more reliable and secure systems, because the defect density can be presumed to be lower than in products that are developed anew. Process qualities such as quality assurance are

supported in software product line engineering by regarding a product and its simulation as two variants. When both variants are derived from the same code, simulations can be used as a foundation for analysing the quality of the end product, thus enabling extensive testing that would not be possible otherwise. While arguments of costs typically dominate the product line engineering debate, the ability to produce higher quality is for many organisations (especially in safety-critical domains) the primary reason to expend major effort on software product line engineering.

Beyond process qualities, software product line engineering impacts product aspects like the usability of the final product by, among other things, improving the consistency of the user interface. This can be achieved by using the same building blocks for implementing the same kind of user interaction — usually as part of a design system. It means taking advantage of having a single component for user registration or product rating for a whole set of products instead of having a specific one for each product. In some cases, demand for this kind of unification has been the basis for the introduction of a product line approach in the first place.

3.1.2 Fundamental approach

Software product lines require a shift of focus: from the individual system to the product line — implying a change in strategy from the ad-hoc next-contract vision to a strategic view of a field of business.

Software product lines rely on a fundamental distinction between development for reuse and development with reuse.

Domain engineering (development for reuse) provides a basis for the actual development of individual products. The product line infrastructure encompasses all assets that are relevant throughout the software development life-cycle, instead of the narrow view on code assets common in traditional approaches. Thus, the pooling of all assets is defining for the product line infrastructure. A key distinction of software product line engineering from other reuse approaches is that the various assets themselves contain explicit variability. For example, a representation of the requirements may include an explicit description of specific requirements that apply only to a certain subset of products. Individual assets in the product line infrastructure are linked together, just like assets in software development.

Application Engineering (development with reuse) builds the final products on top of the product

line infrastructure. Application engineering is animated by the product line infrastructure, which contains most of the functionality required for a new product. Variability explicitly modelled and added in the product line infrastructure provides a foundation to derive individual products. In other words, when a new product is developed, an accompanying project is set up. Then requirements are gathered and categorised as a part of the product line (i.e. a commonality or variability) or as product-specific. After that, the various assets (e.g. architecture, implementation, etc.) may be instantiated right away, leading to an initial product version. Depending on the product line, the majority of the product should be available from reuse; only a small portion must be developed in further steps.

The developed product platform determines the capability of the company to perform business in the market; consequently, there are considerable ties with how an organisation does business and its overall market.

There exist a few characteristics relevant to the discussion about product lines. We can categorise them into:

• Product definition strategy

• Market strategy

• Product line life-cycle

• The relation of product line strategy and product line engineering

3.1.3 Product definition strategy

The product definition strategy illustrates how new products are defined. There are two main divisions within the product definition strategy: customer-driven and producer-driven. In a customer-driven situation, the specific product is mapped and determined based on demands from existing and future customers. The end product is individualised to each customer's desires — mass customisation — which implies that there exist many different customer needs and that the requirements for each product are hard to define in advance. The product line platform must support flexible extensibility in the further development of products.

On the other hand, in a producer-driven strategy, the producer is responsible for the design and development of the product line that defines the product(s). This approach is common when the

14 product is developed for mass-markets; when each variant is sold to a large number of different customers.

The producer-driven strategy can be further divided into market-oriented and technology-oriented strategies. In a market-oriented strategy, the products in the product line portfolio are accepted based on an analysis of potential market segments. New products are defined mainly to satisfy new market segments or changes in established segments. In contrast, in a technology-oriented strategy the growth opportunity is influenced by the technological capabilities and opportunities developed by the company and delivered to the market. The product definition strategy is important when deciding the product portfolio offered by the company.

In practice, the product definition strategy is usually a mixture of the examples above. Product line engineering can support all of these approaches, but its relative advantage varies in relation to the strategy used.

Some essential questions to answer are:

• Should a product line be started at all?

• Which product shall we develop as a part of a product line?

• What shall be the characteristics or features of these products?

• Which functionality shall be developed as individual functionality?

• What functionality shall be developed as a part of the product line, based on the platform?

• How shall we evolve the product line over time?

In this thesis, we used the product definition strategy in the development process of the proto- types made, by considering the questions as mentioned above, while developing and planning functionalities for the prototypes.

3.1.4 Variability management

Software product line engineering aims to support a range of products, supporting individual and different customers or addressing entirely different market segments. Variability is a key concept in this regard. Instead of understanding each system by itself — software product

line engineering looks at the product line as a whole and the variation among the individual systems. This variability must be managed throughout the process. Variability management covers the entire life-cycle and starts with the early steps of scoping, includes implementation and testing, and eventually extends into evolution. As aforementioned, variability is relevant to all assets throughout software development.

Types of variability

1. Commonality: a characteristic (functional or non-functional) can be common to all products in the product line. The commonality is implemented as a part of the platform.

2. Variability: a characteristic that is common to some of the products, but not all. Variability must be explicitly modelled as a possible variability and implemented in a way that allows having it in selected products.

3. Product-specific: a characteristic may be part of only one product — at least for the foreseeable future. These types of specialities are not required by the market per se but are due to concerns of individual customers. While these variabilities will not be included in the platform, the platform has to be able to support them.

A specific variability may change in type during the life-cycle of the product line. Product-specific characteristics may become a variability or even a commonality — should a decision be made about supporting an alternative characteristic — thus extending the platform beyond the initial scope of the product line.


Figure 2: Relation of Different Types of Variability. Figure taken from van der Linden, Schmid and Rommes (2007), p. 9.

Commonalities and variabilities are handled regularly in domain engineering, and product-specific parts are exclusively handled in application engineering.
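As a rough illustration in JavaScript (the feature names are invented for the example), the three types of variability become visible in the product configuration used during application engineering: commonalities are always included, variabilities are selected per product, and product-specific parts are attached outside the platform.

// Platform assets produced in domain engineering.
const commonalities = ['courseCatalogue', 'userLogin'];   // in every product
const variabilities = ['recommendations', 'offlineMode']; // optional per product

// Application engineering: derive one product by selecting variabilities and
// adding product-specific parts that the platform only needs to tolerate.
function deriveProduct(name, selectedVariabilities, productSpecific = []) {
  const selected = selectedVariabilities.filter(f => variabilities.includes(f));
  return { name, features: [...commonalities, ...selected, ...productSpecific] };
}

const productA = deriveProduct('Product A', ['recommendations'], ['companyABranding']);
console.log(productA.features);
// [ 'courseCatalogue', 'userLogin', 'recommendations', 'companyABranding' ]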

The different concepts and types of variability management are used in our thesis to map out the functionalities (or features) for the various products that each company requests. We address these requests in our study and implementation of prototype components for the SPL.

3.1.5 Process

The product line infrastructure is not a goal in itself. Its ultimate goal is its utilisation during application engineering — also called the instantiation of the variability.

When new requirements are defined during application engineering, the future of each requirement in the life-cycle must be considered: should it be a part of the platform or a part of the product development?

In the simplest case — when the product line infrastructure supports the requirement — it is a question of binding time. A variant can be bound at different times (compile time, start-up time, etc.).
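A small, hypothetical JavaScript illustration of two binding times for the same variation point (whether a product includes course recommendations); the flag and configuration names are invented.

// Build-time binding: a bundler can replace process.env.WITH_RECOMMENDER with a
// constant, so the disabled branch can be stripped from the delivered product.
const enabledAtBuildTime = process.env.WITH_RECOMMENDER === 'true';

// Start-up-time binding: the same decision is deferred until the product starts,
// so one delivered artefact can serve several configurations.
const startupConfig = { withRecommender: true }; // in practice read from a config file
const enabledAtStartup = startupConfig.withRecommender;

function homeScreen(withRecommender) {
  return withRecommender ? ['courseList', 'recommendedCourses'] : ['courseList'];
}

console.log(homeScreen(enabledAtBuildTime), homeScreen(enabledAtStartup));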

Though when the product line infrastructure does not support the requirement, there are three options: (1) try to renegotiate or cancel the requirement — in the context of a product line, every supported variability increases the complexity of evolving the product line further; (2) integrate the new requirement with the product line infrastructure, which can usually be done with a systematic scoping process; (3) integrate the new requirement on an application-specific basis.

Both the second and third options usually occur during the same system development. The second option leads to a hand-over to domain engineering, while the third option leads to a development cycle carried out entirely within application engineering.

Business-centric

Product line engineering addresses the market as a whole, whereas traditional software tends to focus on the individual system. For product line engineering to remain successful in the long term, the product line infrastructure has to be an adequate tool to field new products onto the market efficiently. A holistic relationship between the development choices of an individual product and the product line has to be managed from an economic standpoint.

Because of the relationship between the individual product and the product line, larger business objectives must be well understood. Previously, the goals have been addressed as time-to-market reduction, effort (and cost) reduction, and usability and reliability improvement. The intent of usability improvement inherently supports user interface consistency. Thus, these goals provide a basis for a product line engineering effort, and for the choice of whether a requirement is implemented in full or on an application-specific basis. Either way, taking a business-centric approach to product line engineering means that key choices about inclusion and realisation are based on a systematic financial decision. Thus, the break-even of three product implementations is a rule of thumb in deciding the costs of adding functionality as a part of domain engineering. A scoping analysis is a common tactic to inform about the different available options:

• Product portfolio planning

• Domain potential analysis

• Asset scoping

Product portfolio planning is used to capture the products that will be a part of the product line and to identify their main requirements — based on the commonalities and variabilities

required. Product portfolio planning is the first step at which optimisation can (and should) occur. Though this activity is business-centric — because of product costs — technical aspects must be taken into account as well.

Domain potential analysis has a strong focus on an area of functionality to determine whether an investment into a software product line should be made. This is usually done with a top-down approach with a holistic view of the product line; some strategies focus on individual areas of the product line. The overall result of this activity corresponds to an assessment grounded in the question of where reuse investments should be focused.

Asset scoping aims to define the individual components that must be built for reuse. Two viewpoints (business and architectural) must be brought together to identify these components adequately.

During the life-cycle of a product line, a team is usually responsible for managing the initial set-up and evolution of the product line.

Architecture-centric

A common product line architecture (also called a reference architecture) is crucial to the success of the product line engineering approach compared to other reuse approaches. The reference architecture is designed in domain engineering to provide a coherent overview of the various components that shall be used. Having a single environment for all components used in the individual products ensures that there is no need to develop multiple components that address similar functionality and differ only with respect to their environment. The reference architecture is used in each application engineering cycle to derive a new product instantiation: both for the assignment of work in the development process and for determining the modification of assets to support product-specific requirements. In a few exceptional cases, product lines have been set up without significant investments in software architecture. Though it is safe to say that a robust product line architecture underpins the overall success.

Two-life-cycle approach

Software product line engineering consists of domain engineering and application engineering. In the ideal case, these two types of engineering are only loosely coupled and synchronised by software releases. This is a key characteristic of a product line, as it allows them to be conducted based on different life-cycle models.

Domain engineering focuses on the development of reusable assets that can provide a necessary range of variability. The underlying software development approach depends on being able to handle long-term, very complex system development. Domain engineering activities include:

• Product management

• Domain requirements

• Domain design

• Domain realisation

• Domain testing

Product management aims to identify the commonalities and variances among the products — defining the products that will constitute the product line. Furthermore, it encompasses the product portfolio planning and the economic analysis of the products in the product line. The output of product management is usually a product roadmap.

Domain requirements engineering begins with the product roadmap. It has an end goal of outputting a comprehensive list of requirements for the various products in the product line with an initial variability model.

Domain design is an activity for developing the reference architecture. It provides the basis for all future instantiations of the product line.

Domain realisation encompasses the detailed design and implementation of reusable software components — planned variability, which has been expressed as a requirement, must be realised with adequate implementation mechanisms.

Domain testing is used to validate the generic reusable components that were implemented as a result of the domain realisation. This is especially hard because the implemented variability must be taken into account, and there is no specific product which provides an integration context. On the positive side, the activity of domain testing offers a lot of groundwork for application testing by generating reusable test assets that can be used in application testing.

As a result, domain engineering provides a common product line infrastructure with all the required variability.

On the other hand, application engineering consists of the following activities:

• Application requirements engineering

• Application design

• Application realisation

• Application testing

As opposed to single-system approaches, groundwork has been completed during the domain engineering phase, and staying consistent with the reference architecture enables plug-and-play reuse.

Application requirements engineering is used to identify the requirements for an individual product while staying as close as possible to the existing product line infrastructure.

Application design is the activity that derives an instance of the reference architecture and adapts it to the requirements from the requirements specification. During this design phase, the product-specific adaptations are built.

Application realisation is the final implementation of the product that is developed — including configuration and reuse of existing components as well as building new components corresponding to the product-specific functionality.

Application testing is the last step, and it is when the product is validated against the application requirements. There are many readily available reusable assets from the corresponding domain engineering activity.

While the integration of domain engineering and application engineering largely depends on the context, it is essential to separate these activities, as they are carried out with different objectives and criteria of quality in mind. This is especially true when both life-cycles are enacted by the same people, which is often the case in small businesses. The aspect of the process and the two-cycle approach is relevant to discuss because it has to be addressed throughout the entire production of the software product line and is a crucial aspect to consider when deciding on what kind of SPL architecture to use.

3.2 Machine learning

Demand for machine learning (ML) capabilities in software products is increasing. Businesses seek to incorporate advanced analytics to predict system outcomes without being explicitly programmed — with the prime purpose of allowing computers to learn without human interference and adjust accordingly.

Machine learning capabilities can be used to improve decision-making and support critical business strategies. Machine learning components have to be continuously measured and monitored in order to understand their behaviour. Changes in the product or in external user conditions can have cascading consequences on the machine learning models, and the models will become less accurate and robust, which can result in issues that are hard to correct. Gartner predicts that 60% of big data projects ”will fail to go beyond piloting and experimentation, and will be abandoned” (Goasduff, 2015). Therefore, the entry barrier can be very high from a product development perspective because of the difficulty in building and maintaining machine learning products.

Thus, machine learning is hard to put into production. Software products are divided into components or layers that communicate with each other. The machine learning component of the product is often integrated through an API. Challenges occur when even a simple machine learning component brings too many auxiliary components along with it in order to function. Sculley et al. state that only a small fraction of a real-world ML system is composed of the machine learning code (Sculley et al., 2015), which figure 3 illustrates.

Figure 3: Data dependencies in ML Systems. Figure taken from Sculley et al. (2015), p. 4.

The surrounding but necessary infrastructure is vast and complex. Machine learning products require spending significant effort to make additional dependencies work. Given the complexity

of machine learning products, the development strategy can be more important than acquiring the right tools.

3.2.1 The use of machine learning

Machine learning is a technique of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that machines should be able to learn and modify through experience. The process of learning begins with observations or data, in which the algorithm looks for patterns and makes better decisions based on the data we provide.

Two operational phases characterise machine learning models: the training (or learning) phase and the testing (or prediction) phase. In the training phase, a model is trained by explicitly feeding it data that has the correct answer attached (historical data). This training data is used to find patterns in the data and connect them to the right answer. Once trained this way, a model can be supplied with new data (typically unseen at training time) to generate run-time predictions (i.e. to compute the learned map on new data). These two phases are not always disjoint: incremental learning approaches exist that allow the parameters of an ML model to be adapted continuously, so that predictions respond to new input data.

Data used in these phases can be divided into three portions: training data, cross-validation (or dev) data and testing data (Bajo, 2020). The training data is used to let the model recognise patterns in the data (adjust the parameters of the model) and to reduce bias in the predictions (i.e. to fit the data). The cross-validation data is used to ensure better accuracy and efficiency of the algorithm used to train the model. The validation data is not seen by the model during training and has the aim of reducing variance (i.e. eliminating over-fit). Lastly, the test data is used to provide an unbiased evaluation of the final model. Neither is this data seen by the model during training. Furthermore, test and cross-validation data should come from the same distribution, to reduce data mismatch (Assawiel, 2018).
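A small JavaScript sketch of such a three-way split (the 60/20/20 ratio is an assumption for illustration; the appropriate ratio depends on the amount of data available).

// Fisher-Yates shuffle, so that all three portions come from the same distribution.
function shuffle(records) {
  const copy = [...records];
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

function splitDataset(records, trainRatio = 0.6, validationRatio = 0.2) {
  const shuffled = shuffle(records);
  const trainEnd = Math.floor(shuffled.length * trainRatio);
  const validationEnd = trainEnd + Math.floor(shuffled.length * validationRatio);
  return {
    training: shuffled.slice(0, trainEnd),                // fit the model parameters
    validation: shuffled.slice(trainEnd, validationEnd),  // tune the model, detect over-fit
    test: shuffled.slice(validationEnd),                  // unbiased final evaluation
  };
}

const { training, validation, test } = splitDataset(Array.from({ length: 100 }, (_, i) => i));
console.log(training.length, validation.length, test.length); // 60 20 20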

The final quality of the machine learning model predictions is influenced by the quality of the training data and the adequacy of the learning model for the specific computational learning task.

3.2.2 Methods of machine learning

There are various ways an algorithm can model a problem based on its interaction with the experience or the environment. Machine learning is a broad research field encompassing several paradigms, e.g. neural-inspired, probabilistic and kernel-based approaches, and addressing an array of computational learning task types (Heller, 2019). For the purpose of this thesis, we will focus on machine learning models and algorithms targeted at solving supervised and unsupervised learning tasks, and merely touch upon semi-supervised learning.

Machine learning algorithms and models require the use of libraries because they rely so heavily on mathematics. These libraries are collections of functions and routines that make it easier to do complex tasks without having to write multiple lines of code. We use a library called TensorFlow.js. How we use it and why we chose this library over others is further explained in chapter 7.

Supervised learning

Supervised learning refers to a specific class of machine learning problems related to the learning of an unknown map between input information and output prediction (Bacciu et al., 2015, p. 75). After adequate training, the system will be able to provide targets for any new input. Also, it can compare its output with the correct intended output and find errors to modify and customise the model accordingly. Common supervised learning techniques include regression and classification. In a regression model, the values of the labels belong to a continuous set (boundary values). On the other hand, in a classification model, the values of the labels belong to a discrete set and can have as many categories as reasonable.

The input data is called or defined as the training data. All data fields are assigned (labelled) a category. Both the categories and the category assigned to a data field are selected by people, and the data can, therefore, be biased, meaning external factors affect the model. After the data has been labelled manually, the model is prepared through a training process where it predicts categories for the data fields. In this process, the model has to predict labels, and is corrected when the predicted labels are wrong. This training process continues until the model has achieved an expected level of accuracy, where a certain amount of the predictions made by the model are correct (Brownlee, 2020).
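As a minimal, hypothetical illustration of this training loop with TensorFlow.js (the library used for the prototype in chapter 7), a tiny binary classifier is shown labelled examples and corrected whenever its predictions are wrong; the data values are invented.

const tf = require('@tensorflow/tfjs');

async function trainClassifier() {
  // Labelled training data: two numeric features per example and a 0/1 label.
  const xs = tf.tensor2d([[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]]);
  const ys = tf.tensor2d([[1], [0], [1], [0]]);

  // A single dense layer with a sigmoid output: a logistic-regression-style classifier.
  const model = tf.sequential();
  model.add(tf.layers.dense({ units: 1, inputShape: [2], activation: 'sigmoid' }));
  model.compile({ optimizer: 'sgd', loss: 'binaryCrossentropy', metrics: ['accuracy'] });

  // Each epoch the model predicts the labels and its parameters are corrected
  // where the predictions are wrong, until accuracy reaches an acceptable level.
  await model.fit(xs, ys, { epochs: 50 });

  // Prediction phase: new data, unseen during training.
  model.predict(tf.tensor2d([[0.15, 0.85]])).print();
}

trainClassifier();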

Unsupervised learning

Unsupervised learning is used when the information used to learn is neither classified nor labelled. Instead of responding to feedback, unsupervised learning identifies commonalities in the data and responds based on the presence or absence of such commonalities in each new piece of data (Soni, 2020). In other words, unsupervised learning can be used for discovering the underlying structure of the data. Some applications of unsupervised machine learning techniques include clustering, anomaly detection, association mining and latent variable models.

The input data is not labelled and has, therefore, no known results. A model is prepared by deducing structures that occur in the input data, and from these, some general rules may be extracted (Brownlee, 2020). For example, the data may be organised by similarity rules.

Semi-supervised learning

Semi-supervised learning is an approach where some of the input data is labelled, and some is not. The model has to learn how to label the unlabelled data based on the labelled data. In some cases it might increase the accuracy, or even save a lot of time and cost, for the model first to learn from the labelled data and then to predict the unlabelled data fields (Gupta, 2019). It is a combination of supervised and unsupervised learning that we choose not to focus on, because we find it less relevant to our thesis.

3.2.3 Possible approaches

We have been researching different machine learning algorithms that can solve our problem. In the following section, we will describe the most relevant algorithms we found, and how applicable and feasible they are for solving our problem.

Some of these approaches use different distance metrics to calculate distances between a set of coordinates or data points. How these distance metrics work and how they are implemented are further described in chapter 6.2. We describe the ones we used, and some other metrics we considered using but found less applicable.

Decision trees

Decision trees are a supervised learning algorithm most commonly used for classification problems (Ray, 2017). The goal of the algorithm is to build a model that can predict a class or value by learning decision rules from the training data (Chauhan, 2019). The algorithm starts in the root node and then traverses down the tree. Values from the input data are compared with attributes stored in the nodes (called "decision nodes") that make forks in the tree structure. These forks divide the data into sub-nodes based on attributes. The branch that matches the values from the input data is then followed. We continue traversing through these forks in the tree until a "terminal node" is reached; a decision for the input is then made or predicted (Brownlee, 2020).
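A decision tree prediction is, at its core, a traversal from the root to a terminal node. The following is a minimal sketch of that traversal; the node structure, attribute names and counts are illustrative mock values in the spirit of figure 4, not the actual course data:

```typescript
// Minimal sketch of traversing a decision tree. A terminal (leaf) node holds
// the yes/no counts used to derive a probability, while a decision node
// forks on an attribute value.
type TreeNode =
  | { kind: "leaf"; yes: number; no: number }
  | { kind: "decision"; attribute: string; branches: Record<string, TreeNode> };

function predict(node: TreeNode, input: Record<string, string>): number {
  if (node.kind === "leaf") {
    return node.yes / (node.yes + node.no); // probability of "yes"
  }
  const value = input[node.attribute];
  return predict(node.branches[value], input); // follow the matching branch
}

// Example: a tiny tree forking only on experience level (mock counts).
const tree: TreeNode = {
  kind: "decision",
  attribute: "experience",
  branches: {
    fresh: { kind: "leaf", yes: 1505, no: 2371 },
    medium: { kind: "leaf", yes: 5792, no: 7111 },
    skilled: { kind: "leaf", yes: 2356, no: 2371 },
  },
};

console.log(predict(tree, { experience: "fresh" })); // ≈ 0.39
```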

Figure 4: Decision tree for Company A employees showing mock data

To clarify this description, we have made an example: an employee from Company A is considering whether to take a training course or not. This is a simple case of yes/no output that the decision tree can try to predict. There are 220 different courses to take (described in chapter 5.2), with varying degrees of completion. Figure 4 illustrates a decision tree, and if we study the root node it says "Yes: 9653", meaning that 9653 employees have fulfilled all the courses that are expected of them. It also says "No: 11 852", which means that 11 852 employees have not fulfilled all their courses, or have completed none.

If we traverse down the decision tree, illustrated in figure 4, we make decisions based on attributes from the input data from the employee. Let's say he is a fresh employee who has not completed the basic courses. He has completed the Course R courses as well as the introduction courses. From figure 4, we end up at the terminal node that has "Yes" with 282 occurrences of employees who completed all their courses, and "No" with 804 who have only completed some of their courses. This gives a probability of 0.26 (282 / (804 + 282)) that he has completed all of his required courses.

Terminal node              Probability
Ordering courses: Yes      0.09
Ordering courses: No       0.53
Course A: No               0.76
Safety A: Yes              0.24
Sales courses: Yes         0.56
Sales courses: No          0.22
Introduction courses: Yes  0.26
Introduction courses: No   0.04
Course C: Yes              0.25
Course C: No               0.04
Basic courses: Yes         0.67
Average probability        0.33

Table 1: Probability of each terminal node

From this example, we end up with a fairly low probability that the employee has taken all the required courses, and it is, therefore, likely that he will take another course. If we study the probability of all the terminal nodes, we can see that it varies a lot; this is shown in table 1. The average probability of all of these nodes is 0.33, which is also fairly low. Of all the employees, 45% have completed all the courses that are expected and mandatory for them. Since the average of all terminal nodes (0.33) does not correspond with this 45%, there are other factors that affect this number. For example, the sub-nodes are divided into categories and other attributes that make some of them overlap for several courses, and the data is not very precise when presented in this format.

Considering the results this small example shows, we found that decision trees only solve some parts of the problem. Finding out whether an employee needs a course or not is not our intended scope. However, we could have used the findings from traversing the decision tree. When the results show that there is a high probability that the employee needs a course, we can check whether he has completed courses in, for example, the sales category, and find other categories where he lacks skills or competence. The reason we did not choose this solution is that the decision tree requires that the course data is stored with attributes. These need to specify the skill level, what categories the course is in, whether it is mandatory, whether it has any courses that are required as prerequisite knowledge, etc. Considering the need for such data quality, it is too time- and resource-consuming to complete, and the decision tree would also become too complex.

Naive Bayes

The next algorithm we researched was the Naive Bayes algorithm. This algorithm is part of the Bayesian algorithms, meaning it applies Bayes' Theorem (shown in equation 1) to solve classification and regression problems (Brownlee, 2020). The theorem calculates the probability of A happening, given that B has occurred (P(A|B)). A is the hypothesis that may occur, and B is the evidence. It also uses the probability of B happening given that A has occurred (P(B|A)), and the independent probabilities of A (P(A)) and B (P(B)). Naive Bayes is very useful for large data sets (Ray, 2017).

P(A|B) = \frac{P(B|A)\, P(A)}{P(B)} \quad (1)

Bayes' Theorem

For example, let’s evaluate whether an employee of Company A has completed all his required courses. Shown in table 2 (displaying mock data), we want to find out if he “has completed all his required courses”, this becomes A. The other columns are whether an employee has completed the courses in that category or not and their experience. It is important to assume that the variable is independent and have an equal impact on the outcome (Gandhi, 2018). This means that if for example, an employee is “skilled” it does not imply that he has taken all his required courses, and if an employee is “fresh” it does not imply that he has completed all the basic courses.

ID  Experience  Basic course  Sales course  Ordering course  Required course
0   Skilled     Yes           Yes           No               No
1   Fresh       Yes           No            No               No
2   Fresh       Yes           Yes           No               No
3   Fresh       Yes           Yes           No               Yes
4   Medium      Yes           No            Yes              Yes
5   Medium      Yes           Yes           No               No
6   Skilled     Yes           Yes           Yes              Yes
7   Medium      Yes           Yes           No               Yes
8   Fresh       No            Yes           No               No
9   Medium      Yes           Yes           No               Yes
10  Medium      Yes           No            Yes              Yes

Table 2: Course completion of employees of Company A

We first use only the condition that the employee is a "Medium" experience worker; this becomes B. The probability that the employee has taken all the required courses becomes P(Yes|Medium). Reading table 2, there are in total 11 employees, so counting the 6 "Yes" entries in the "Required course" column gives P(Yes) = 6/11. Counting the "Experience" column gives P(Medium) = 5/11. Out of the 6 employees with "Yes" in the required course column, 4 have "Medium" experience, which gives P(Medium|Yes) = 4/6. Following equation 1 gives us the following calculation: P(Yes|Medium) = P(Medium|Yes) · P(Yes) / P(Medium). Substituting the probabilities we found gives P(Yes|Medium) = (0.67 · 0.55) / 0.45 ≈ 0.82. So if an employee has "Medium" experience, there is a probability of 0.82 that he has completed all his required courses.
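The same single-condition estimate can be sketched in code; the record layout below is an illustrative assumption, not the actual data model:

```typescript
// Minimal sketch of the single-condition Bayes estimate from table 2.
// The Employee record layout is an illustrative assumption.
interface Employee {
  experience: string;
  requiredDone: boolean; // has completed all required courses
}

function pRequiredDoneGiven(employees: Employee[], experience: string): number {
  const n = employees.length;
  const done = employees.filter(e => e.requiredDone);
  const pYes = done.length / n;                                               // P(Yes)
  const pExp = employees.filter(e => e.experience === experience).length / n; // P(B)
  const pExpGivenYes =
    done.filter(e => e.experience === experience).length / done.length;       // P(B|Yes)
  // Bayes' theorem: P(Yes | B) = P(B | Yes) * P(Yes) / P(B)
  return (pExpGivenYes * pYes) / pExp;
}
```

With the 11 employees from table 2, this computes (4/6 · 6/11) / (5/11) = 0.8; the 0.82 quoted in the text comes from rounding the intermediate probabilities before dividing.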

P(A|B_1, \dots, B_n) = \frac{P(B_1|A)\, P(B_2|A) \cdots P(B_n|A)\, P(A)}{P(B_1)\, P(B_2) \cdots P(B_n)} \quad (2)

Bayes' Theorem with n conditions

In order for us to use the Naive Bayes algorithm, it requires that the data about the employees is stored and structured with labels or attributes that define whether they have completed some courses, what stages they are in, whether they are experienced, etc. This is the same problem as we had with decision trees; the algorithm requires a certain data quality. We would have to create these attributes manually, and that is too demanding and time-consuming. The courses also have to be categorised with attributes, the same as the employees, and this is challenging as well. Based on these factors, we did not choose the Naive Bayes algorithm.

K-Nearest Neighbours

The last algorithm we researched was the algorithm called k-nearest neighbours (hereby referred to as KNN). KNN is an instance-based learning algorithm, meaning that it compares new instances of data to the training data that is already stored in some database; it can also be called memory-based learning (Brownlee, 2020). Instance-based learning algorithms use this database to search for data similar to the new data, compared using a similarity measure. This finds the best match, from which a prediction can be made for a classification problem.

KNN can be used for regression problems but is most commonly used for classification problems, which is relevant for our research (Ray, 2017). All data and new data instances are assigned values that map them into a graph. To categorise a new data instance, KNN uses the labels of the k nearest neighbours. The distance used to find the closest neighbours is calculated using a distance metric such as the Euclidean distance (Srivastava, 2018), which is described in chapter 6.2.1. Figure 5 illustrates the case where k is 3 and 2 out of the 3 closest neighbours have the same label. This gives the new label a probability of 0.67 (2/3).

Figure 5: KNN where k = 3. Figure taken from Srivastava (2018).
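A minimal sketch of this classification step, assuming numeric feature vectors and the Euclidean distance (the data layout and the simple majority vote are illustrative assumptions):

```typescript
// Minimal KNN sketch: classify a new point by majority vote among its
// k nearest labelled neighbours, using Euclidean distance.
interface LabelledPoint { features: number[]; label: string; }

function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

function knnClassify(data: LabelledPoint[], query: number[], k: number): string {
  const neighbours = [...data]
    .sort((p, q) => euclidean(p.features, query) - euclidean(q.features, query))
    .slice(0, k);
  const votes = new Map<string, number>();
  for (const n of neighbours) {
    votes.set(n.label, (votes.get(n.label) ?? 0) + 1);
  }
  // Return the label with the most votes among the k neighbours.
  return [...votes.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```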

KNN requires a database of labelled data to work. This is something we do not have, but could acquire if we manually labelled all the data fields of the courses. However, this would take a lot of time and would also require us to be in charge of choosing the different categories for each course, potentially making the data biased. KNN can to some degree solve our problem, or parts of it, by predicting the category a course would be in. This can be an issue, and it is not the final solution we need and want to use. Because of this, we did not choose to use the KNN algorithm to label the courses with categories by finding the "best match" category.

However, KNN is a useful way of finding the most similar users (those with the lowest distance score) to a given user. In chapter 6, we explain how the first prototype uses a tweak of this approach. In subchapter 6.2, we explain how this tweak uses different distance metrics for calculating distances between data points and users.

Logistic regression

Logistic regression is used when the output variable is categorical, meaning that the algorithm has to decide between 0 and 1 (Swaminathan, 2018). For example, the algorithm predicts whether the user is sad (0) or happy (1), and these are the two categories.

Imagine a two-dimensional space with multiple data points. The logistic regression algorithm draws a curve between the data points. Data points above the curve can, for example, be classified as true, and those below as false. This is illustrated in figure 6.

Figure 6: Logistic regression algorithm. Figure taken from Gupta (2018).

Logistic regression uses the probabilities derived from the input data and selects the output category with the maximum probability as the categorical output. A threshold can be applied to the maximum probability, specifying how confident the model has to be before the output category is determined.

Multinomial logistic regression

Multinomial logistic regression is used to predict the probability of a categorical placement based on multiple independent variables (Starkweather and Kay Moske, 2011). Multinomial logistic regression is an extension of logistic regression that allows for more than two categories of output, hence multinomial (meaning several terms).

As figure 6 shows, logistic regression uses only one graph to determine the categorical output. Multinomial logistic regression uses several graphs, depending on how many categorical variable outputs there are. It uses the maximum probability to determine the categorical output, similar to what logistic regression does.

Gradient descent

Gradient descent is a machine learning technique for finding an optimal set of parameters for a given problem. It is therefore called an optimisation algorithm. The idea behind gradient descent is to compute a measure of error for an initial, randomly chosen set of parameters, and then keep adjusting those parameters in the direction that reduces the error until a (local) minimum is found (Pandey, 2019).
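As a minimal sketch, gradient descent for a single parameter can be written as a loop that repeatedly steps against the gradient of the error. The loss function, learning rate, iteration count and the numerical gradient estimate below are illustrative assumptions:

```typescript
// Minimal gradient-descent sketch for one parameter. The loss function,
// learning rate and iteration count are illustrative assumptions.
function gradientDescent(
  loss: (w: number) => number,
  initial: number,
  learningRate = 0.1,
  iterations = 100
): number {
  let w = initial;
  const h = 1e-6; // step used for a numerical gradient estimate
  for (let i = 0; i < iterations; i++) {
    const gradient = (loss(w + h) - loss(w - h)) / (2 * h);
    w -= learningRate * gradient; // move against the gradient to reduce the error
  }
  return w;
}

// Example: minimising (w - 3)^2 converges towards w = 3.
console.log(gradientDescent(w => (w - 3) ** 2, 0));
```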

We use multinomial logistic regression with gradient descent in the second prototype described in chapter 7 (implementation chapter). In this chapter, we further describe how we use it and why we chose it.

3.3 Recommender systems

Recommender systems are a subclass of machine learning; a recommender system is not an algorithm in itself, but it uses machine learning algorithms (Seif, 2020). These algorithms are an essential part of a recommender system for it to work.

A recommender system is a program that filters and recommends products or content to users based on patterns discovered in the ratings or preferences they have given in the past. The usage of these engines is increasing each year, through advertisements and e-commerce articles, and they are unavoidable when browsing the web (Rocca, 2019). The main concept of recommender systems or engines is algorithms that calculate and suggest relevant items to the user. A large factor in the growth of recommender systems is the economic possibilities they offer businesses. An efficient recommender system can generate a large amount of income or outcompete competitors in the same industry. In chapter 3.3.7, we further explore the economic motivations behind these algorithms.

Recommender systems are a concept of their own. They can use supervised and unsupervised learning approaches, but these learning approaches are merely tools for the recommender system (Loshin, 2016). In that sense, a recommender system can, on the one hand, use supervised learning to classify items into elements to be recommended or not recommended, and on the other hand, use unsupervised learning techniques such as matrix factorisation to try to predict a suitable item for the user.

3.3.1 How do recommender systems work?

Recommender systems try to understand the people they recommend items to on an individual level; this can be (1) a customer, (2) a visitor to a website or (3) a network of sites that share data between themselves. Either way, the recommender system starts with some data on every "user" and uses it to figure out (or predict) each user's unique tastes and interests. This data can be categorised into two types of interest data: (1) implicit and (2) explicit. The two types of data are further explained in chapter 5.1.

Recommender systems can be used to predict items people want, before they know they want them, based on historical patterns. Data-driven recommender systems find relationships between users and between items based on actions, more or less without human curation. Recommender systems are not limited to recommending things or items but can also recommend content. The idea is similar, but instead looks at patterns in the content people consume rather than the items people buy.

3.3.2 Filtering methods

Recommender system methods are typically divided into two major categories: content-based and collaborative filtering methods.

Collaborative filtering

Collaborative filtering methods use the interactions between items and users. Recommendations are based on these past interactions (e.g. clicked, watched, purchased, rated, etc.). Figure 7 shows the principle of collaborative filtering: finding an item that user 1 has watched and recommending it to user 2. These users are so-called "similar users"; this is based on their previous interactions with matching or similar items. It is therefore relevant to recommend the items user 1 interacts with to user 2.

Figure 7: Collaborative filtering

The preferences or interactions can be presented as a user-item matrix, as figure 8 shows, where each column represents a user and each row represents an item. Entries (e.g. p11) in the user-item matrix can be numeric or explicit, with for example a rating range from 1 to 5, but are in many cases binary or implicit, e.g. clicked or purchased (Luo, 2018). The majority of entries are usually missing, meaning that the users have not interacted with the item, and the goal of recommender systems is then often to fill those missing entries. These entries are found based on similar items, or based on similar users.

p11  ?    p13  ?    p15
p16  ?    ?    ?    p20
?    p22  ?    p24  ?
?    ?    ?    p29  ?
p31  ?    ?    p34  ?

Figure 8: User-item matrix

There are two types of collaborative filtering: model-based methods and memory-based methods. Model-based methods use matrix factorisation to handle sparse matrices (matrices where most entries are missing or zero) in order to predict a user's rating of an item they have not interacted with (Mwiti, 2018). We are not going to use these types of methods and choose not to focus on them.

Memory-based methods use the values of recorded interactions directly, meaning they do not predict ratings. They are based on the principle of nearest neighbours (as described in chapter 3.2.3), where the method calculates which users or items are the most similar. Memory-based methods are often simpler to implement, and it is easier to reason about their results (Mwiti, 2018). An issue that memory-based methods have is the "cold-start problem": if a user has not interacted with any item, the recommendations have nothing to be based on. This is an issue we have to address later.

Content-based filtering

A content-based recommender uses knowledge of each item or user to recommend similar items (similarity of item attributes). Content-based methods are computationally fast and interpretable and can easily be adapted to new items or new users (Deng, 2019). These types of methods use metadata in order to predict recommendations. If we take an example with movies, the method will use data such as genre, producer, actor, age restrictions, etc., to base new recommendations on.

Figure 9: Content-based filtering

Figure 9 illustrates the concept of content-based filtering. For example, let's say a user likes the actor Robert Downey Jr. and has watched all the Iron Man movies. Then, based on the actor and the genre of the Iron Man movies, it is highly relevant to recommend the Avengers movies to the user, because they are very similar. Content-based filtering is based on the idea that if a user liked a certain item, it is likely that he likes something similar.

Content-based approaches are a lot less affected by the cold-start problem than collaborative approaches (Rocca, 2019). The reason for this is that even if the user has not watched any movies, the recommender system can still recommend items based on information the user has given, such as sex and age. It does not need a lot of information to calculate recommendations, and this is a factor that we need to evaluate when choosing a method.

3.3.3 Top-N recommenders

We can classify a top-N recommender system as a recommender system that produces a finite list of the top-ranked items to present to a given person. This list can be arbitrarily long; should the recommender system, for example, return a list of three pages each with ten results, it is a top-N recommender where N is 30 (Deshpande and Karypis, 2004).

Research on recommender systems tends to focus on the problem of predicting a user's rating for everything the user has not already rated. However, users want to see items they are likely to like, not the recommender system's ability to predict their rating for an item. There are many forms of recommender systems; rating widgets, for example, usually show the aggregate rating from other users and not the rating the system thinks the user would give the item. Thus, there is a difference in the types of recommender systems to study.

Anatomies of top-N recommenders

Top-N recommender systems can be configured in many ways. Most commonly, there is a data store representing the individual interest of each user. The interest data of each user is normalised using techniques such as Z-scores or Mean Centering to ensure that the data is comparable between users. Normalising the data effectively is not always possible because the dataset may be too sparse.
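A minimal sketch of these two normalisation techniques, assuming a user's interest data is available as a plain array of numeric ratings:

```typescript
// Minimal sketch: mean-centre or Z-score normalise a user's ratings so
// that interest data becomes comparable between users.
function meanCentre(ratings: number[]): number[] {
  const mean = ratings.reduce((s, r) => s + r, 0) / ratings.length;
  return ratings.map(r => r - mean);
}

function zScores(ratings: number[]): number[] {
  const centred = meanCentre(ratings);
  const variance = centred.reduce((s, r) => s + r * r, 0) / ratings.length;
  const std = Math.sqrt(variance);
  // If every rating is identical the standard deviation is 0; fall back to
  // the centred values to avoid dividing by zero.
  return std === 0 ? centred : centred.map(r => r / std);
}
```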

During the candidate generation phase, recommendation candidates (items that may be interesting to the user based on past behaviour) have to be generated. Each candidate is then compared to similar items in another data store based on aggregate behaviour. Candidates are then assigned a score based on how the user rated similar items and how strong the similarities are. Should the score not reach a certain threshold, the candidate is filtered out. There exist many different methods of candidate ranking, such as "learning to rank", which employs machine learning to find the optimal ranking of candidates at each stage, or taking the average review score to help advance highly-rated or popular items (Arya et al., 2017).

Before the top-N list is presented to the user it has to be filtered; the filtering phase is used to eliminate recommendations the user has already seen, may not want to see, or is not allowed to see. Stoplists are implemented to remove things that are potentially offensive or below some minimum rating or quality threshold. Items are then handed to a display layer where a component of recommendations is presented to the user. Usually, candidate generation, ranking and filtering live inside a distributed web service that the front-end queries to receive the data required to render a page for a specific user.

Another architecture that may be deployed is to build up a database ahead of time with the predicted rating of each item for all users. This eliminates most of the work done in the candidate generation phase, and ranking the items becomes a matter of sorting them. This approach is not recommended unless the dataset contains a small catalogue of items to recommend, because it requires comparing every single item in the catalogue for every single user.

3.3.4 Designing and evaluating recommender systems

It can be challenging to measure the quality of a recommender system. There are different ways to measure quality, and these measurements can conflict with each other. Essentially, a recommender system is a machine learning system: it is trained using prior user behaviour and then predicts which items a user might like. Evaluation of a recommender system can therefore be done with methods similar to those used for any other machine learning system.

Train/Test

The simplest way is to measure the recommender system's ability to predict how people rated items in the past. As discussed earlier, to keep the results honest, the dataset is split into a training set and a test set. Ratings are assigned randomly to one of the sets, but the training set should total at least 80% of the data. The recommender system is trained using only the training data, learning the relationships between items or between users. Once trained, it can be used to make predictions about how a new user might rate an item the user has never seen before. Next, the data reserved for testing (ratings that the recommender system has never seen before) is used to ask the recommender system what it thinks a user would rate an item, without revealing the answer. Doing this over a large data set returns a meaningful number for evaluating how good the recommender system is at recommending things people have already seen or rated.

K-Fold Cross-Validation

K-fold cross-validation is the same idea as train/test but uses multiple randomly assigned training sets instead of a single set. Each individual training set or "fold" is used to train the recommender system independently. A measure of accuracy is then achieved by testing each resulting system against the data held out from its training. Each fold receives a score of how well it managed to predict the user ratings, and the scores are then averaged together (Sanjay, 2018).
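A minimal sketch of this procedure, where the fold assignment is a simple round-robin rather than random, and trainAndScore is a placeholder for whatever training and accuracy measurement the recommender uses:

```typescript
// Minimal k-fold sketch: partition the ratings into k folds, train on k-1
// folds and test on the held-out fold, then average the scores.
// `trainAndScore` is a placeholder for the actual train/evaluate step.
function kFoldScore<T>(
  ratings: T[],
  k: number,
  trainAndScore: (train: T[], test: T[]) => number
): number {
  const folds: T[][] = Array.from({ length: k }, () => []);
  ratings.forEach((r, i) => folds[i % k].push(r)); // simple round-robin assignment
  const scores = folds.map((test, i) => {
    const train = folds.filter((_, j) => j !== i).flat();
    return trainAndScore(train, test);
  });
  return scores.reduce((s, x) => s + x, 0) / k;
}
```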

K-fold cross-validation provides insurance against optimising for the particular ratings that happened to land in a single training set, by ensuring that the recommender system works for any set of ratings, not only the one split that was chosen.

3.3.5 Accuracy measures

Finding the best way of measuring how accurate a recommender system is, is very important. In the following sections, we describe some techniques used to calculate recommender system accuracy.

We will use some of these accuracy measures when evaluating the implementation of the prototype we made. We will further explain this in the implementation chapter (chapter 7).

Mean Absolute Error

Mean absolute error (MAE) is the mean, or average, of the absolute values of each error in the rating predictions. Errors are undesired, so a low mean absolute error score is good. Mean absolute error can be calculated by taking each rating in a test set of n ratings, predicting the value of the rating y and comparing it with the actual user rating x. The absolute value of the difference between the predicted rating and the actual rating is summed up with the other errors across the entire test set and then divided by n to get the average or mean.

Root Mean Square Error

Root mean square error (RMSE) is often preferred, as RMSE penalises heavily when the prediction is far off and penalises less when the prediction is reasonably close. Instead of summing the absolute values of each rating prediction error, the squares of the rating prediction errors are summed. Both taking absolute values and squaring ensure positive numbers, but squaring inflates the penalty for more substantial errors.

Essentially root is the square root of the entire set, mean refers to an average, and square error is the square of each individual rating prediction error.
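Both measures can be sketched directly from these definitions, given parallel arrays of predicted and actual ratings from the test set:

```typescript
// Minimal sketch of MAE and RMSE over parallel arrays of predicted and
// actual ratings from the test set.
function mae(predicted: number[], actual: number[]): number {
  const n = predicted.length;
  const totalError = predicted.reduce(
    (sum, p, i) => sum + Math.abs(p - actual[i]), 0);
  return totalError / n;
}

function rmse(predicted: number[], actual: number[]): number {
  const n = predicted.length;
  const totalSquaredError = predicted.reduce(
    (sum, p, i) => sum + (p - actual[i]) ** 2, 0);
  return Math.sqrt(totalSquaredError / n);
}
```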

Hit Rate

A "hit" is a recommendation in the top-N recommendations that the user has already rated. The hit is a prediction of something the user has already found interesting on their own and can be considered a success. To get the hit rate, add up all the "hits" in the top-N recommendations for every user in the test set and divide by the number of users.

An issue with using a train/test or cross-validation approach for measuring accuracy is that these approaches focus on individual ratings, not the accuracy of top-N lists for individual users. Measuring the hit rate directly on top-N recommendations created by a recommender system trained on all of the data would lead to a hit rate of 100%; evaluating a system using the data it was trained with is undesirable.

Leave-One-Out Cross-Validation

Leave-one-out cross-validation is a method where the top-N recommendations are computed for each user in the training data after removing one item from that user's training data. The metric is then the recommender system's ability to recommend, in each user's top-N list, the item that was left out of the training data.

An issue with leave-one-out is that the dataset has to be large, since getting one specific item right during testing is harder than getting any suitable item into the top-N recommendations. Leave-one-out has the advantage of being more user-focused, as users tend to focus on the beginning of lists. A related measure is the reciprocal hit rank, which is similar to the hit rate but also accounts for where the hits in the top-N list appear. Items in the top slot give a higher reciprocal rank than items in the bottom slot: a hit in the first slot of the top-N list yields a weight of 1.0, whereas a recommendation in slot 4 yields a weight of 0.25.

The leave-one-out metric depends a lot on the use case and how the top-N list is displayed. Good recommendations hidden behind scrolling or pagination should be penalised in proportion to the user's effort to find the item.

Otherwise, a cumulative hit rank can be a good solution as it throws away hits if the predicted rating is below a threshold instead of crediting the system for the recommendation.

Another take on hit rate is to use the predicted rating scores to get an idea of the distribution of how good the algorithm thinks the recommended items that get a hit are.

Cross-entropy

Cross-entropy is commonly used in machine learning as a function for calculating loss. It gives a measure of how different two probability distributions for a set of variables are (Brownlee, 2019). This indicates how wrong the model is, i.e. what loss the ML model has suffered. It can be used to optimise classification models such as logistic regression.

It is used by classification models whose output is a probability value between 0 and 1 (Loss Functions, 2017). Classifications that are incorrect are penalised more than classifications that are close. The cross-entropy metric is shown in formula 3.

-\frac{1}{n} \sum_{i=0}^{n} \left[ \text{Actual}_i \cdot \log(\text{Guess}_i) + (1 - \text{Actual}_i) \cdot \log(1 - \text{Guess}_i) \right] \quad (3)

Cross-entropy loss metric

High cross-entropy corresponds to high loss, and low cross-entropy corresponds to low loss, meaning that the probability predictions are closer to being correct.
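A minimal sketch of formula 3 in code, assuming actual labels of 0 or 1 and predicted probabilities between 0 and 1 (the small epsilon that guards against log(0) is an added implementation detail):

```typescript
// Minimal sketch of the binary cross-entropy loss from formula 3.
// `actual` holds 0/1 labels, `guess` holds predicted probabilities.
function crossEntropy(actual: number[], guess: number[]): number {
  const eps = 1e-12; // guard against log(0)
  const n = actual.length;
  let sum = 0;
  for (let i = 0; i < n; i++) {
    const p = Math.min(Math.max(guess[i], eps), 1 - eps);
    sum += actual[i] * Math.log(p) + (1 - actual[i]) * Math.log(1 - p);
  }
  return -sum / n;
}
```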

3.3.6 Evaluating recommender systems

There exists a wide variety of ways and approaches to evaluating a recommender system. The following sections cover some of the evaluation approaches that we will use in chapter 9 (evaluation chapter).

Coverage

Coverage is the percentage of possible recommendations that the system can provide, and it can give a sense of how quickly new items will appear in the recommendations. There is a tension between enforcing a higher quality threshold to improve accuracy and doing so at the expense of coverage, requiring a proper balance to be found.

Diversity

Diversity is a measure of how wide a variety of items the recommender system is displaying. An example of low diversity would be to only recommend the next course in a series. In the context of recommender systems, high diversity is not always desired; it can be achieved trivially by recommending completely random items.

Diversity can be measured by computing its opposite, the similarity between items. Compute the similarity score S for every pair of items in a list of top-N recommendations and average the results to get a measure of similarity, then subtract the result from 1 to get a number associated with diversity.
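A minimal sketch of this calculation, where the similarity function is a placeholder for whatever item similarity measure the recommender uses:

```typescript
// Minimal diversity sketch: 1 minus the average pairwise similarity of the
// items in a top-N list. `similarity` is a placeholder for the item
// similarity measure used by the recommender.
function diversity<T>(topN: T[], similarity: (a: T, b: T) => number): number {
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < topN.length; i++) {
    for (let j = i + 1; j < topN.length; j++) {
      total += similarity(topN[i], topN[j]);
      pairs++;
    }
  }
  const averageSimilarity = pairs === 0 ? 0 : total / pairs;
  return 1 - averageSimilarity;
}
```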

Diversity has to be used in conjunction with metrics of quality to avoid bad recommendations as high diversity scores more often than not are just bad recommendations.

Novelty

Novelty is a measure of the popularity of the items recommended. As discussed earlier, recommender systems exist to surface items in the long tail. Most engagement or sales often comes from a small number of items, but items in the long tail, the ones that cater to unique, niche interests, add a lot to the overall value.

Users might engage less with the recommended items if the recommendations appear random to them. To build user trust, a few of the items need to be familiar; the user needs a few "this was a good recommendation for me" moments. Popular items are enjoyable for a large segment of the user base, and these items would be expected to be good recommendations for the part of the user base who has not seen them yet.

Again, there must be a balance between familiar, popular items and the serendipitous discovery of items the user has never heard of before. Recommending new items allows the user to discover items they might love, while familiar items establish trust with the user. Recommender systems should be used to help people discover items in the long tail that cater to their unique, niche interests. Other than being an excellent moneymaker for the company, this can help people explore their passions and interests.

Churn

Churn is how often the recommendations for a user change: how often does the user see the same recommendation? High churn in itself is undesirable, but randomising the top-N recommendations can expose the user to more items they would not have seen otherwise. All of the metrics discussed have to be looked at together to determine the best tradeoffs between them.

Responsiveness

Responsiveness determines how quickly new user behaviour influences the recommendations. Instantaneous responsiveness in recommender systems is complex, challenging to maintain, and expensive to build. Thus, a balance between responsiveness and simplicity has to be addressed.

A/B Tests

None of the metrics discussed matters more than how users respond to the recommendations produced. A surrogate problem is an effect where the accuracy metrics of an algorithm are good, but the results fail in a test with real users (Covington, Adams and Sargin, 2016). Accuracy cannot be used as a surrogate for good recommendations.

Recommender systems should be designed with the user in mind. A/B tests are useful for measuring how users react to the recommendations presented to them. Ideally, users are exposed to multiple different algorithms, and their interest in the recommendations presented to them is tracked. Continuously testing in controlled online experiments, to see whether the recommendations cause people to discover or purchase more new things than they otherwise would, is what matters and yields the most value for both the users and the business.

Previously mentioned metrics such as accuracy, diversity and novelty can indicate interest from a user but will not replace feedback from users. From a practical standpoint, accurate rating predictions are worthless if the user does not find new items to engage with or buy. A newer, more complex algorithm should be discarded if users interacting with its recommendations does not lead to a measurable improvement. A/B tests with users should be used to avoid introducing complexity that adds no value; complex systems are hard to maintain. Thus, the results of A/B tests with the users are the only metric of success for a recommender system.

Perceived quality

Measuring "perceived quality" can be done by asking the users for explicit feedback on recommendations, in the same fashion as asking the users to rate items. How to define a "good" rating is not easy, and explicit feedback on recommendations can be advantageous. There are, however, multiple disadvantages with this approach: (1) users can confuse whether they are rating the recommendation or rating the item, (2) it requires extra work from customers with no clear contribution of value for them, and (3) getting enough ratings on the recommendations to be useful can be hard. A/B testing with interest and engagement is the most precise measurement of quality.

3.3.7 Economic considerations

Economic theory often portrays the goals of business organisations as being related to profit maximisation. Hence, business problems and opportunities often relate to increasing revenue or decreasing cost through the design of effective business processes. In recent years, recommender systems have been used for these purposes, to improve the cross-selling effect and to strengthen customer loyalty (Long-Sheng et al., 2008).

From the viewpoint of the customer, recommender systems try to suggest helpful or suitable items; from the vendor's perspective, however, they are effectively used as targeted advertisement to increase profit (Das, Mathieu and Ricketts, 2009). The best case for a vendor is that the customer follows their habits and preferences when purchasing or browsing their systems. The worst case is that the customer actively chooses not to follow the vendor's recommendations because the "trust" in these recommendations is fading, resulting in a transfer to another business. It is therefore vital to be careful when incorporating item profitability, or other factors that affect the trust too much, into the recommender system.

\text{Dice}(\vec{r}) = \frac{2\, \vec{c} \cdot \vec{r}}{\|\vec{c}\|^2 + \|\vec{r}\|^2} \quad (4)

Dice coefficient

A good principle is that as long as the vendor uses an algorithm where the recommendations are relatively similar to the customer's own ratings, the level of trust will not be too compromised. This can be calculated using, for example, the Dice coefficient as a similarity measure, shown in equation 4, which measures the similarity between two vectors. The vector c is the customer's true ratings for some items, and the vector r is the recommendation vector for the same items. A function T(r) (in equation 4, Dice(r)) gives a value that indicates the similarity between r and c, where a higher value indicates greater similarity. These measures are based on the assumption that the trust from a customer is roughly equivalent to how similar the vendor's recommendations are to the actual ratings from the customer, as well as the assumption that the trust remains at a high level as long as T(r) remains high.
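A minimal sketch of the Dice coefficient from equation 4, assuming the customer's rating vector c and the recommendation vector r are available as plain numeric arrays over the same items:

```typescript
// Minimal sketch of the Dice coefficient from equation 4, measuring the
// similarity between the customer's rating vector c and the
// recommendation vector r for the same items.
function dice(c: number[], r: number[]): number {
  const dot = c.reduce((sum, ci, i) => sum + ci * r[i], 0);
  const normCSq = c.reduce((sum, ci) => sum + ci * ci, 0);
  const normRSq = r.reduce((sum, ri) => sum + ri * ri, 0);
  return (2 * dot) / (normCSq + normRSq);
}
```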

Another issue with recommender systems is that they tend to recommend already popular products (Fleder and Hosanagar, 2007). If an item has a lot of clicks from a lot of different users, it will be more frequently recommended to other users. Figure 10 shows this in a diagram, where an item with a lot of ratings gets recommended more frequently, while items with few ratings experience the opposite. Jannach and Adomavicius name this problem "rich-get-richer", where such products only increase their number of ratings because of their frequent recommendations (Jannach and Adomavicius, 2017). This is a problem that does not necessarily need a solution, but it should be paid attention to in case it affects business profitability or causes a reduction in the user base.

Figure 10: Rating frequency distribution (number of ratings against item index, ordered by decreasing frequency). Figure taken from Liao (2018).

As we mentioned earlier, the "cold-start problem" is an issue for recommender systems. This can also affect business profitability. The cold-start problem of recommendation methods is that collaborative filtering cannot recommend unrated or unpurchased items (Fleder and Hosanagar, 2007). Jannach and Adomavicius suggest that it might sometimes be better to recommend popular items to avoid this (Jannach and Adomavicius, 2017). Another solution can be hybrid approaches to deal with sparsity (Long-Sheng et al., 2008). These can use information from both user-item interactions and user/item characteristics. A hybrid method can become more accurate as more data become available, by depending on content-based recommendations when there is little or no activity on a user or an item, where collaborative methods fail to recommend.

3.3.8 Motivation

Regardless of method or approach, when designing a recommender system with a product line approach, the conditional probability will always have to be considered. As a consequence, any change in factors that were not initially designed as a feature in the machine learning component will change the behaviour and performance of the machine learning model. Making rapid experiments and iterations in the product will cause problems with keeping the ML model robust. Hence, should business goals or the original problem and ML model formulation change, the original ML models will no longer be valid to support the current business goals. Zinkevich states that the product's first iteration should be minimally viable, since machine learning is one of the most complicated features to add (Zinkevich, 2019). Validating and experimenting with the market and building a solid product is more important in the early stages of product development than having an ML model that can be rendered obsolete by the product evolving or by changes in the business goals.

When it comes to different recommender methods, several aspects have to be evaluated, such as whether we are making a prediction algorithm that predicts a user's rating for a user-item combination, or a ranking algorithm that presents a list of "best match" items to a user. The algorithm needs to avoid the "rich-get-richer" problem, illustrated by the long tail effect in figure 10, where niche products do not get recommended to users (Pandey, 2019). We also have to evaluate whether the relevance of the recommended items is most important, whether novelty, in the sense that the users have not seen the item before, is important, or whether diversity among the items in the result list is something we should emphasise.

Another factor that affects the choice of recommender approaches and methods is the data set we can acquire from Company A and Company B, or other forms of mock data. We will come back to this in chapter 5, but the data quality has a lot of impact on which methods we can choose and how we can apply them to solving the problem. It will also impact the architecture of the prototype and how components and modules are structured, but we will evaluate this later (in chapter 9).

4 Research process

In this chapter, we describe the research process we have conducted, consisting of the evaluation and data analysis methods in our thesis.

First, we present the research process we followed throughout the work on the thesis. Then we present how we utilised a framework from design science research to conduct multiple design steps, and how we used a set of guidelines for the research process, also from design science research. Finally, we present the methods we have chosen for evaluation and how we analyse and retrieve data from them.

4.1 Conducting research

Research, in general, is an activity that contributes to the understanding of a phenomenon. In our thesis, those phenomena are SPL and ML. We want to improve our knowledge base on these topics.

There are multiple ways of conducting research, and numerous processes to follow to achieve results and new theories. Design science research (hereby referred to as DSR) offers a framework and model for doing so. In DSR, all or part of the phenomenon may be created as opposed to naturally occurring. The phenomenon is typically a set of behaviours of some entity that is found interesting by the researcher or by a group (a research community). Research must lead to the contribution of knowledge, usually in the form of theory, that is valid and new. Valid (or true) knowledge may allow for prediction of the behaviour of some aspect of the phenomenon.

4.1.1 Foundation for the right research process

Our objective in this chapter is to use the performance of DSR in Information Systems as a conceptual framework and guidelines for understanding, executing and evaluating our research.

At the moment, two paradigms characterise much of the research in the Information Systems discipline: behavioural science and design science. Both paradigms are foundational to the Information Systems discipline, positioned at the confluence of people, organisations and technology. The behavioural science paradigm seeks to develop and verify theories that explain or predict human or organisational behaviour. The design science paradigm aims to extend the boundaries of human and organisational capabilities by creating new and innovative artefacts.

In the design science paradigm, knowledge and understanding of a problem domain and its solution are achieved in the application and building of a designed artefact, to analyse its behaviours.

Focusing on the design science paradigm is most relevant for our thesis. Through the development of a design artefact we want to not only develop, but also research and analyse the possibilities of this artefact, possibly within an organisation. This is the reason why the design science framework is the right choice as a foundation for our research process: research within a DSR framework focuses on developing and evaluating an artefact.

Hevner et al. present a process model for DSR, consisting of 5 steps, that can be followed as a general research process (Hevner et al., 2004). The first step is (1) awareness of problem. In this step, information from multiple sources is gathered to understand the opportunity for research. Then there is (2) suggestion to problem. In this step, the artefact is planned with functionalities and descriptions. The third step is (3) development. In this step, the artefact itself is developed according to the previous step. After this comes (4) evaluation, where the artefact is evaluated using suitable evaluation methods. The final step is (5) conclusion, which is made based on the results from the evaluation step.

4.2 Our process

The overarching process we followed from the start to the finish of our thesis is illustrated in figure 11. Each circle (with a distinct colour) represents a significant step in the process. The process model proposed by the design science framework is a model we have followed during our thesis, and our research process is inspired by this process model.

Figure 11: Our research process throughout the thesis

4.2.1 Define problem case

The first step was defining the problem case (described in chapter 2). This was an essential step of the thesis for understanding which areas we wanted to focus on and for setting and narrowing down the scope.

4.2.2 Research theory

Because our problem case regards two major topics within software development and engineering, it was natural to start by creating a knowledge base on these fields. After we had chosen what we wanted to write about in our thesis, we did a substantial literature study on both machine learning and software product line theory.

We read articles, journals and websites to gather and structure all the knowledge we found relevant and thought we could use in our thesis. A lot of what we found is described in the background chapter (chapter 3).

These two steps (define problem case and research theory) correspond to the awareness of problem step in the DSR process model. The source of the problem case is Snapper, who reached out and handed us a case they wanted us to solve. However, they are most interested in the prototype we have made, and not the research and findings we make along the way.

4.2.3 Initial approach

After defining the problem case, we received the product requirements from two different companies that Snapper Net Solutions works with. When we had received these and had conducted a literature study, we sketched and described what the solution to the problem could look like (the artefact), with requirements and component descriptions.

This is the suggestion to problem step from the DSR process model. We conducted a creative phase, making suggestions to the problem. This was done before the first development step (initial approach). We redid the suggestion phase after the first development step, and before the second. In retrospect, it is smart to repeat this step as often as possible, conducting it multiple times, to get the functionalities as clear and well-defined as possible.

We started the first of two development phases by implementing the first prototype. This resulted in an initial approach to solving the problem case.

4.2.4 Implementation of second prototype

We saw some issues and flaws in the first prototype that we were not completely satisfied with. After revising some of the product requirements and functional requirements, we started the implementation phase of the second prototype.

We split the development phase from the DSR process model into two phases. This was something we did after conducting the first development phase: we realised that there was a need for refinements to the prototype, so we redid the suggestion to problem phase and conducted a new implementation phase.

Making a lot of assumptions and concerns about human cognition is not very relevant in our thesis, because we develop a proof-of-concept prototype showing just its possibilities. It is not meant to be tested directly on actual users in a relevant setting.

4.2.5 Evaluation

When we had a second prototype that we were satisfied with, we started the evaluation phase. We conducted a case study of the prototype we had developed, with three different experts. We also made a survey that we published, asking questions about the use of and trust people have in recommender systems.

Completing the case studies and receiving answers on the survey we had published gave us a lot of results. We, therefore, analysed the test results to see if we had what we needed and if they were valid or not. We structured and compared the results.

The case study and survey, together with the analysis of the results, constitute the evaluation step from the DSR process model. It states that once the prototype has been constructed, it should be evaluated according to hypothetical predictions or criteria made in the awareness of problem phase. Sometimes it is relevant to repeat the suggestion and development phases to change how the prototype behaves. Observations of changes to the test results or hypotheses are normal, and should also be considered in the evaluation (Hevner et al., 2004).

We did a minor evaluation phase after the first development phase (initial approach). It was this phase that resulted in a new implementation phase to refine the prototype. We saw it lacked some functionalities that we wanted to be implemented. The first prototype did not have all the elements and proper architecture we wanted, and it, therefore, did not behave the way we wanted it to do. After the second implementation phase, we conducted the main evaluation phase consisting of the case study and the survey.

4.2.6 Reflection and conclusion

The final step of our research process was a reflection and conclusion step. After we had conducted the evaluation phase, we had enough results to conclude the research process.

This is the conclusion step in the DSR process model. We evaluated the results against our hypothetical predictions about the behaviour of the prototype and other assumptions we had made during our research, looking for deviations and other implications. We finished this step by concluding on the different outcomes of the research questions (presented in chapter 2.3). We also presented the knowledge and lessons learned in the different steps of the research process.

4.3 Design science research

Design science is an outcome-based information technology research methodology, which offers specific guidelines for evaluation and iteration within research projects. Design science research (DSR) focuses on the development and performance of (designed) artefacts with the explicit intention of improving the functional performance of the artefact. DSR is typically applied to categories of artefacts including algorithms, human/computer interfaces, design methodologies (including process models) and languages. Its application is most notable in the Engineering and Computer Science disciplines, though it is not restricted to these and can be found in many disciplines and fields.

We will describe the guidelines we have been using further in section 4.5. Science of the artificial (design science) is a body of knowledge about the design of artificial (human-made) objects and phenomena (artefacts) designed to meet certain desired goals. The traditional natural sciences, in contrast, focus on classes of things (objects or phenomena) in the world (nature or society) and describe or explain how they behave and interact with each other.

In other words, design science is knowledge in the form of constructs, techniques and methods, models, and theory for achieving this connection: the know-how (procedural knowledge) for creating artefacts that satisfy given sets of functional requirements. Design science research is research that creates this type of missing knowledge using design, analysis, reflection, and abstraction.

Figure 12: Model: knowledge is generated and accumulated through action

We can draw the following model, in figure 12, as a basis for understanding how knowledge is generated and accumulated through action. Figure 12 illustrates how creating something and then evaluating the results helps to build knowledge; this is the process we follow when we develop the artefact and test it afterwards. Furthermore, abstraction and reflection also matter in the knowledge-building process. Building knowledge through construction is sometimes considered lacking rigour, but the process is not unstructured. The channels in figure 12 of the general model are systems of conventions and rules under which the discipline operates. They typify the measures and values that have empirically developed as 'ways of knowing' as the discipline has matured.

Our objective in the thesis is to use design science research primarily as a research method when conducting our research. Knowledge and understanding of the problem domain and its solution are achieved in the application and building of the designed artefact. The combination of the capabilities of the information system and the characteristics of the organisation, its work systems, its people, and its development and implementation methodologies determines the extent to which that purpose is achieved.

Down the line, we have to be aware of the risk of not distinguishing our work from a design effort; simply creating a state-of-practice design, and not research. Design science research (within its community of interest) is distinguished from routine design by the production of interesting, new and true knowledge, though routine design can lead to DSR. To find the missing knowledge in a new area of design, it is quite useful to attempt carrying out the design using existing knowledge, as this gives a better feel for the number of unknowns (the missing knowledge) in the proposed design.

4.4 DSR framework

Hevner et al. present a framework that structures different components, combining the concepts of behavioural science and design science (Hevner et al. 2004, p. 80). It maps out the relations and connections between the components and illustrates how they interact with each other. We have structured all of this in figure 13, to present our chosen methods and components.

Figure 13: Design Science Framework, structured into an Environment section, a Design Science Research section and a Scientific Knowledge section

DSR proposes a set of steps or activities through which the artefact is developed, based on organizational structures and requirements and on a common knowledge base. Figure 13 illustrates some of these cycles. Each cycle is typically iterated a number of times before the artefact is designed and complete. The Relevance cycle assures relevance from the environment, where the business requirements and other demands are specified. The Rigor cycle ensures that the appropriate foundations and methodologies are used, and other additions to the knowledge base might be added here.

Environment section

The Environment section describes the actors (People and Organizations) that we have relied on and cooperated with. For example, we used supervisors and professors for guidance on the thesis, and the workers of Snapper Net Solutions were good support for evaluation and feedback on the artefact. Company A is a working partner of Snapper Net Solutions, and since we cooperate with Snapper, we relied on Company A for data. The Environment section also describes the organizational strategies and structure that the artefact needs to be evaluated against.

Design Science Research section

As we described in chapter 4.3, the design is both a process and a product (the artefact). In figure 13, the Design Science Research section contains a Develop/Build box that describes the different

design models we have used. We incrementally improved our prototypes, and re-designed the first prototype into a second version. In the Justify/Evaluate box, the evaluation methods we have conducted are listed. The Design cycle between the boxes illustrates the process of going back and forth between evaluation and development. First, we develop the product, then we evaluate the artefact, to then refine it and start re-developing again.

Scientific Knowledge section

Scientific Knowledge is a section that describes the fundamental knowledge that we based our thesis and our research on. This includes all the models and current theories on the topics that already exist. It also includes the methodologies that we chose to use as tools when conducting and documenting our research. For example, it was necessary for us to choose different data analysis techniques to use after we had conducted our evaluation, in order to produce results.

4.5 DSR Guidelines

In the article “Design Science in Information Systems Research”, Hevner et al. describe seven guidelines for the design science research process (Hevner et al. 2004). Design science concerns the development of knowledge and understanding of a design problem. It is through building an artefact and finding solutions to this problem that knowledge is acquired; the seven guidelines are derived from this aspect. They create a template that we used as a foundation for our research process. In the following chapter, we describe these guidelines and how we have applied them to our research.

4.5.1 Guideline 1: Design as an artefact

An artefact is an innovation that defines the ideas, practices, technical capabilities and products that can be analysed, designed, implemented and used in information systems effectively and efficiently (source).

“Design-science research must produce a viable artefact in the form of a construct, a model, a method, or an instantiation.” (Hevner et al. 2004). The artefact must address or try to solve an important organizational problem. For the artefact to be of value, it needs to be applied or implemented in an appropriate domain. Studying the definition from Hevner et al., the artefact does not need to be a standalone product or system; it can be a simple model or an instance of a larger system that solves a specific problem of varying size. Any models, methods

or constructs that were used in the development phase should also be part of the artefact.

In chapter 2 we described the research problem that we want to conduct our research on. Part of the research questions is to see if it is possible or beneficial to solve this problem. In order to do so, we wanted to develop a prototype or proof of concept. Part of this prototype is a generic recommender system component. Exploration and development were done mainly by focusing on this component, but the artefact itself consists of all the components, modules and constructs that we have developed during the process. We hereby refer to the artefact as the prototype.

4.5.2 Guideline 2: Problem relevance

Having a common understanding of the relevance of the chosen topic for our thesis is important. Hevner et al. state that “the objective of design-science research is to develop technology-based solutions to important and relevant business problems.” (Hevner et al. 2004). When conducting work for a company (Snapper Net Solutions), they have already stated their need for a solution, which implies relevance for the research we are doing. However, not all the time we spend researching similar topics through articles is directly relevant to them. This is part of what motivates us, because we see the relevance of exploring a topic that has not been heavily researched.

A problem can be defined as the difference between a goal state and the current state of a system (Hevner et al., 2004). We will further describe the different states of the prototype that we reached, starting from scratch, in chapter 6 (initial approach) and chapter 7 (implementation). Solving a problem is defined as a search process using actions to reduce or eliminate the differences; this is further explained in guideline 6 (in chapter 4.5.6). Artefacts are designed, constructed and used by people and are therefore shaped by values and interests from communities or investors (Orlikowski and Iacono, 2001). There is always an economic approach and motivation behind a business need or requirement; this increases the problem relevance.

4.5.3 Guideline 3: Design evaluation

Evaluating and testing the prototype is necessary when conducting research in order to produce results or indications. “The utility, quality, and efficacy of a prototype must be rigorously demonstrated via well-executed evaluation methods.” (Hevner et al., 2004). It is important that the evaluation is based on metrics or criteria and conducted in a strict environment. The “Publication

Scheme for a Design Science Research Study” (Gregor and Hevner, 2013, p. 350) also points out validity as an important criterion to prove for the prototype. This, along with the other criteria, gives evidence of whether the prototype is useful.

Evaluation should be based on the following:

• Business environment establishes the requirements upon which the evaluation of the prototype is based

• This environment includes technical infrastructure which itself is incrementally built by the implementation of new IT prototypes

• Evaluation includes the integration of the prototype within the technical infrastructure of the business environment

Figure 14 lists all the design evaluation methods DSR offers. The highlighted ones are those we have chosen to conduct.

1. Observational: Case Study (study artifact in depth in business environment); Field Study (monitor use of artifact in multiple projects)

2. Analytical: Static Analysis (examine structure of artifact for static qualities); Architecture Analysis (study fit of artifact into technical IS architecture); Optimization (demonstrate inherent optimal properties of artifact or provide optimality bounds on artifact behavior); Dynamic Analysis (study artifact in use for dynamic qualities, e.g. performance)

3. Experimental: Controlled Experiment (study artifact in controlled environment for qualities, e.g. usability); Simulation (execute artifact with artificial data)

4. Testing: Functional (Black Box) Testing (execute artifact interfaces to discover failures and identify defects); Structural (White Box) Testing (perform coverage testing of some metric, e.g. execution paths, in the artifact implementation)

5. Descriptive: Informed Argument (use information from the knowledge base, e.g. relevant research, to build a convincing argument for the artifact’s utility); Survey (construct questions related to the artifact to demonstrate how its behaviours relate to certain characteristics)

Figure 14: Design Evaluation Methods. Figure taken from Hevner et al. (2004), p. 86

We find an observational evaluation with a case study to be well suited for our prototype. A case study can be a realistic way of analysing the prototype in an appropriate environment, where functionality, usability, accuracy, etc. can be measured.

We also think that an analytical evaluation of the codebase is appropriate and possible to do. For this, one needs experienced supervisors in machine learning or SPL to evaluate the software architecture and code base and give feedback while we take notes.

A descriptive evaluation of the prototype, based on the knowledge base and created scenarios, may also be relevant to conduct. This can give indications of whether it fulfils some theories, or demonstrate the utility of the prototype.

In chapter 4.6, we further present what methods for testing we have chosen, and how we retrieve data from them. In chapter 8 (the evaluation chapter), we present the test results we obtained after conducting the tests.

4.5.4 Guideline 4: Research contributions

“Effective design-science research must provide clear and verifiable contributions in the areas of the design artefact, design foundations, and/or design methodologies.” (Hevner et al., 2004). The research must have clear and well-defined contributions. It is therefore important to ask: “What are the new and interesting contributions?”. There are three types of research contributions, based on the novelty, generality and significance of the prototype. At least one of these contributions must be discovered or accounted for. The contributions are:

1. The prototypes - often it is the artefact itself, designed to extend the knowledge base or apply existing knowledge in new ways

2. Foundations - all the constructs, models, methods or instantiations that extend and improve the existing foundations in the design-science knowledge base

3. Methodologies - use of evaluation methods (examples: experimental, analytical, observational, testing, descriptive). This will produce data and results.

Our contribution is mainly the prototype and the knowledge gained and lessons learned while developing and evaluating it. We have researched what it can be utilised for and for what purpose. This has contributed to the knowledge base on the research area as well as the prototype. The data and the results we produced during the research process are also part of the contribution.

4.5.5 Guideline 5: Research rigor

Hevner et al. state that “design-science research relies upon the application of rigorous methods in both the construction and evaluation of the design artefact.” (Hevner et al., 2004). This implies that to produce well-tested research, the process that leads to the results has to

be strictly conducted with accuracy. It is also important that the data validity and results adhere to the analysis techniques. The research results need to be thoroughly analysed and checked against the knowledge base and existing research; this is the main point of rigorous research (Peffers et al., 2008).

Figure 15: Rigor cycle (analyse/identify, grounding, position, additions to the knowledge base)

Figure 15 illustrates how we have performed our research rigor cycle. When producing results, we have continuously checked and analysed these against the knowledge base to see if they were reasonable and plausible results; this is called “grounding”. If the results cohere with existing theories, they are positioned in the knowledge base; if not, they still give additions to the knowledge base, for example as falsifications. We used the rigour cycle mainly in the test and evaluation phases.

4.5.6 Guideline 6: Design as a search

Problem-solving can be viewed as utilizing available means to reach desired ends while satisfying laws existing in the environment (Hevner et al., 2004). This is the essence of design as a search. Through designing and developing a prototype, the research process is a search for doing this as effectively and as well as possible.

• Means are the set of actions and resources available to construct a solution

• Ends represent goals and constraints the solution should satisfy

• Laws are uncontrollable forces in the environment or restrictions that prohibit or limit the prototype in some ways

Effective design requires knowledge of both the application domain (e.g. requirements and constraints) and the solution domain (e.g. technical and organizational). Means, ends and laws are iteratively refined to be made more realistic. As a result, the prototype becomes more relevant and valuable. These iterations towards the “best optimal design” can be done through a generate/test cycle (Presthus et al., 2016).

In our thesis, the means are, for example, the data sets, supporting components and algorithms that we have used when developing and constructing the prototype. The ends are all the functionalities the final prototype has. This defines the goal state of the prototype and how it turned out. The laws are the restrictions of the prototype we met prior to and while developing it. Often, there can be limitations in the data sets, or other information that is sensitive and requires a non-disclosure agreement to use, and so on. Mapping out the means, ends and laws helped us structure the research process and the environment we developed the prototype in.

4.5.7 Guideline 7: Communication of research

When completing a research process and producing valuable results, a vital factor of the process is publishing it to other researchers. A thorough and complete article on the research is essential, but it is also important how it is communicated and to what audience. “Design-science research must be presented effectively both to technology-oriented as well as management-oriented audiences.” (Hevner et al., 2004). Rarely is there just one specific audience involved in or interested in a research field; this is important to take into consideration.

We have stakeholders in both the technology-oriented and the management-oriented audiences who have an interest in our work. People at Snapper and at UiO require different approaches to the case solution and other findings. In the following chapters, we will therefore present at different strategy levels and views to cover a wider range of readers.

4.6 Methods for collecting data

We have chosen two different methods for collecting data. The reason for this is to get feedback from a larger target group, with a broader perspective on the topics we research, and thereby more insight. Through collecting empirical data, we have had an inductive approach to the research methods we have chosen (Sander, 2019).

As described in 4.5.3 we have chosen an observational evaluation with a case study. This can

give a good way of analysing the prototype with fitting people, and we can analyse measures such as reusability and functionality. Part of this observational evaluation is an analytical analysis with an interview where we study the code base and architecture. Here the people are asked questions so we can get feedback and results on the prototype.

The second method we chose was also an observational evaluation. We made a survey that we published to receive responses from a broader target group. We wanted to get more insight into people’s attitudes towards and experience with recommender systems.

We did not find a descriptive evaluation of the prototype (as mentioned in chapter 4.5.3) fitting. Creating scenarios for the prototype was considered, but we thought it would not provide any valuable results for our research. This is because it requires the prototype to be placed in different environments or settings, and this is not necessary as it is only a proof of concept.

In the following sub-chapters (4.6.1 to 4.6.3) we explain how we conducted these evaluation methods, what we learnt from them, and how we analysed the results we got from conducting them.

4.6.1 Case study

To evaluate the prototype, we used the BAPO model, illustrated in figure 16, to show what scope or perspective we focused on when conducting the observational evaluation. The BAPO model represents the four concerns that need to be addressed in SPL engineering (BAPO Model, 2011). We do not focus on the organizational view, as this is out of scope (not connected to the research questions). It is also not relevant in the context of Snapper, because it is a relatively small company.

Figure 16: BAPO model: Business (how we generate revenue and profit), Architecture (technology and structure to build systems), Process (activities and way of working) and Organization (departments, teams and responsibilities). Figure taken from Bosch (2017).

The case study was conducted with three test subjects. Two of them are specialised software developers, and the last one is the CTO at Snapper. The CTO could give feedback from a business and process point of view because he has experience with leading larger projects. The software developers gave technical feedback on the codebase from an architectural view.

How the study was conducted

One of the interviews was conducted over Whereby because of the COVID-19 virus. The other two interviews were conducted in person, where we met the test subjects. Because of the situation, it was not possible to do the last of the interviews in person, as we had wanted. However, it was still possible to conduct the entire evaluation process, and we got the full extent of the results we wanted out of the case study.

When we met the test subjects, we started by letting them analyse and understand the code base of the prototype. We named this first phase the observational phase. While the test subjects analysed the prototype, we had a passive role where we, for the most part, took notes and observed. They could always ask us questions if they wanted to.

The next phase was the interview phase of the case study. Here we had prepared a set of questions based on what viewpoint we wanted to study (from the BAPO model in figure 16). We could also ask unprepared follow-up questions that we felt were necessary to get more insight into their thoughts and opinions on the prototype. We took notes during this phase as well and

filled in, after we were done, the parts we did not have time to write down during the interview.

How we chose test subjects

It can be tough to get access to relevant and suitable interviewees and test subjects (Crang and Cook, 2007). This is because they should be able to observe, analyse and answer questions about our prototype in a way that is both relevant and useful to us. We got help from Snapper Net Solutions in finding people who could do this. Two of the people we interviewed are employees of Snapper, and the last one is a software developer at a consultancy called Bouvet. We contacted the different people directly, informing them about the thesis and research we were conducting, before presenting them the case. After this, we received their acceptance, and we then agreed on a meeting time and place.

The reason that it was challenging to find test subjects was that the people would preferably have a lot of experience in both SPL and ML. We could not find this, but the software developers we used had a lot of experience in developing software and some experience with SPL. Both had a lot of interest in ML, and some experience there as well. The CTO had more experience with SPL, and this was very important because we needed an analysis from a business and process point of view. The CTO was familiar with some of the topics related to ML as well.

The way we found and recruited the test subjects is called theoretical sampling (Crang and Cook, 2007). This is because the people we use for evaluation are chosen because of specific characteristics they have and the quality of information they can therefore offer to the evaluation methods. Theoretical sampling values quality rather than quantity and diversity in the test subjects. This was also the most convenient way of evaluating the prototypes. Having several extra interviews with multiple experts would not be beneficial, because it is too time-consuming.

Interviews

Conducting interviews has been our primary evaluation method for collecting data from the chosen test subjects. Interviews are important in research because they provide insight into how the test subjects interpret and understand the events or objects (the prototype) they are exposed to (Walsham, 2006). This gives valuable information we can retrieve and use when trying to answer the research questions. Interviews were suitable for our research because we could go in-depth with the experts we interviewed, to understand their views and opinions on the prototype they analysed.

Interviews can be conducted structured, semi-structured or unstructured, depending on the purpose and the wanted outcomes of the interviews (Crang and Cook, 2007). This depends on how many predetermined questions you want to ask and how much control the interviewer wants to have in the conversation (Pollock, 2019). We chose to conduct semi-structured interviews, with a lot of predetermined questions and some follow-up questions as well. This is because we had some questions we wanted answered, some that felt natural to ask during the conversation, and some aspects and topics that we wanted to elaborate on and go deeper into.

As mentioned earlier, we conducted two types of interviews, focusing on getting different information on the different viewpoints illustrated in figure 16. The first type of interview focuses on the architectural viewpoint of our prototype, and the second type focuses on the business and process viewpoint. This is to get a wider perspective on the aspects of SPL engineering with ML. Our research questions (from chapter 2.3) focus on a broader perspective than just the technical part of merging the two concepts; they also cover the evolution and cost-benefit aspects of it.

The templates for the different interviews we conducted can be found in appendix B. The interviews with the architectural viewpoint, conducted with software developers, can be found in appendix B.1. The interviews with the business and process point of view, conducted with the CTO of Snapper, can be found in appendix B.2.

Observation

To avoid bias when collecting data, it is essential to conduct multiple forms of evaluation (Walsham, 2006). During the case study of the prototype, we therefore conducted not only an interview but also an observational study beforehand. Observational studies are often conducted to understand how the test subjects react to some tests or objects (Crang and Cook, 2007). Observations can be conducted in a natural or controlled environment and can be direct or indirect, meaning whether the observer is present or not (Anguera et al., 2018). Indirect observations can be conducted through audio recordings or logs that are read.

Conducting an observation of this prototype in a natural environment was not relevant because it is only a proof of concept, and not a final product ready to be deployed into production. Conducting a direct observation was challenging (further explained in the next section). We had a passive role while directly observing the test subjects as they analysed and tested the prototype. We did not collect much data from this process; it was merely to see how

quickly they understood the code base and architecture. The observations were meant to give a picture of how understandable and easy it was for the test subjects to comprehend how the prototype was structured and how the different components worked together. That is why we had a passive role and did not want to interject in their first experience with the prototype. The degree of passiveness could vary slightly between the test subjects, depending on how many questions they had and whether we were unsure if they had understood something.

We took notes during the observation phase that we later came back to in the interview. Together with the test subjects, we discussed their experience and understanding of the prototype.

4.6.2 Survey

We published a survey through Google Forms, consisting of 11 questions. The survey included questions related to recommender systems. Users are asked whether they understand what recommender systems are, how often they are exposed to them, and other questions related to trust, the items the systems recommend, and the area of usage. The survey is related to research question R5, to see how a software product line affects the quality of a recommender system.

A survey is a form of observational evaluation method, and it is indirect because the conductors of the study are not directly involved but study the results through recordings (Anguera et al., 2018). It can also be called a retrospective observation because the results are based on, or recorded from, past events (Mitra, 2020).

When creating surveys, biases often occur and are a big issue. They typically arise when the options users can choose between are unbalanced, or when the answer parameters are not statistically sound. The actions we took to avoid bias in our survey were to make it anonymous, to formulate neutral rather than leading questions, to use similar scaling parameters on all questions, and to allow the users to withdraw from the survey at any time. The survey was also not directed at a specific target group but was open for everyone to answer.

The survey was made in cooperation with an expert in statistics. He helped us formulate the questions with clear and distinct options and in a language that was easy to understand. We created several drafts and versions of the survey that we sent to him and received feedback on, before publishing the final version.

The survey questions we published through Google Forms can be found in appendix C.

4.6.3 Methods for data analysis

When collecting data through evaluation methods, it is imperative to analyse the results. Analysis is a creative process that should be structurally and thoroughly conducted (Crang and Cook, 2007). Several different methods for data analysis exist for both qualitative and quantitative data. In the following sections, we explain what methods we used for the (1) case study (qualitative data) and the (2) survey (quantitative data).

Content analysis

Retrieving absolute meaning from qualitative data is difficult, because it is made up of observations and words; therefore, it is mostly used for exploratory research (Bhatia, 2018). After conducting the case study, we were left with notes and answers from the observation and interview, resulting in three different texts. Bhatia suggests four steps before and while analysing the text, in the following order (Bhatia, 2018):

1. Get familiar with the data: we read the results and notes so we were completely familiar with the content in the texts.

2. Revisit research objectives: we read over the research questions, so we had a clear and common understanding of the research objectives, and what questions that needed to be answered.

3. Develop a framework: first we read the texts to mark behaviours, opinions and certain phrases or answers. Then we assigned codes to them (color codes). This is to structure and label the data in the text. Each color code is related to one research question (shown in figure 17).

4. Identify patterns and connections: then we read the text looking for patterns and connections between answers. We also looked for an overall theme, which substantiate and lead the answers in one direction.

Figure 17: Color codes mapped to research questions

We did these four steps for each text and thereby conducted what is called a content analysis (Bhatia, 2018). In this analysis format, you analyse the content of texts or notes based on research questions. Figure 18 shows an example of how we analysed the interviews, with each color code representing a different research question (from figure 17). After we had done a content analysis of each text, we compared the structured and labelled data to draw conclusions.

Figure 18: Example of how we analyse interviews

Descriptive statistics

The results from the survey can be represented or illustrated in many ways. We have chosen to use descriptive statistics to illustrate the results from the survey. The most commonly used descriptive statistics are: (1) mean, (2) median, (3) mode, (4) percentage, (5) frequency and (6) range (Bhatia, 2018).

We will be using (1) mean (average answer), (3) mode (most common answer), (4) percentage (answer distribution) and (5) frequency (how many times an answer is found). Percentage and frequency are the most important descriptive statistics we will be using, as these show where the majority of the feedback can be found, giving the best indication of the respondents’ opinions. This is illustrated in figure 19, where we show a pie chart with a distribution of different answers. Each slice in the pie chart represents an answer.

Figure 19: Example of pie chart from survey
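As a hypothetical illustration of the frequency and percentage statistics we rely on most, a small JavaScript sketch over invented answer values could look like the following (the answer labels are examples only, not taken from the actual survey):

// Count how often each answer occurs, then express it as a percentage.
const answers = ["Agree", "Agree", "Neutral", "Disagree", "Agree"];
const frequency = {};
for (const a of answers) {
  frequency[a] = (frequency[a] || 0) + 1;              // frequency of each answer
}
const percentage = {};
for (const a of Object.keys(frequency)) {
  percentage[a] = (100 * frequency[a]) / answers.length; // answer distribution in percent
}
// frequency: { Agree: 3, Neutral: 1, Disagree: 1 }
// percentage: { Agree: 60, Neutral: 20, Disagree: 20 }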

In chapter 8 we present the results that are based on the evaluation methods and the data analysis methods presented in this chapter.

5 Product foundation

In this chapter, we present the different data sets we worked with during the implementation of the different prototypes we made. Then we present a foundation for structuring and creating an architecture for the different prototypes. Finally, we present requirements elicitation in SPL: how you research and discover requirements for different products and how they are structured within the same SPL.

5.1 Data access and gathering

Before developing the first prototypes, described in chapter 6 (initial approach), we had received no data. At this stage, we had no employee data from either Company A or Company B. In order to test the prototype, we generated two sets of mock data for simple user testing. The two sets consisted of explicit and implicit data we generated about 10 fictional users. We could have generated more data, as it is easy to randomize ratings and courses taken, but we did not find that necessary to provide the results we needed.

5.1.1 Explicit data

Explicit data is collected when users are asked to give feedback on an element. It is called explicit data because users are explicitly asked for feedback on an item. This type of data is usually acquired by asking users to rate an item, often in terms of ”stars” on a scale of one to five, to give the content a thumbs up or thumbs down, or to rate content they see with reactions (Pandey, 2019). In other words, we ask the users ”Do you like what you are looking at?”, and use that data to build a profile of the user’s interests. There is, therefore, no need to make any assumptions about the user’s preferences.

There are two main issues with explicit data: (1) the data often tend to be too sparse, leading to low-quality recommendations, because some users leave feedback and others do not; (2) subjective standards for feedback and rating differ a lot based on personality, country and age, so the ratings and the feedback mean different things from user to user. For example, a 5-star rating can mean that one user “loved” the item, while to another user it means that he was “just satisfied” with it.

Figure 20 shows the mock data we created about the 10 employees. After each course, the rating is represented. For example, Sebastian rated “Safety A” to be a 4.5 out of 5-star course.

It is easier to calculate similarities between users when we use the explicit data, because we have more to base our calculations on.

Figure 20: Explicit mock data
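To make the shape of this data concrete, a hypothetical JavaScript sketch of two such explicit entries could look like the following (the course names and most ratings are illustrative; only Sebastian’s 4.5 rating of “Safety A” is taken from the text above):

// Hypothetical shape of the explicit mock data: each employee maps the
// courses they have rated to a score between 1 and 5.
const explicitData = {
  "Sebastian": { "Safety A": 4.5, "Course B": 3.0 },
  "Jorgen":    { "Safety A": 3.5, "Course C": 2.0 }
};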

5.1.2 Implicit data

Implicit data is collected from a user’s interactions with elements or sites and is then interpreted as indications of interest or disinterest (Pandey, 2019). An example is clicking on an item in a web store; this is considered a positive interest from the user, and it is, therefore, relevant to use this for future recommendations. It is easy to gather a lot of implicit data, because of all the roaming and clicking users do on websites these days.

Figure 21 shows the implicit data we generated about the 10 employees. This data only shows what courses each employee has taken, stored in the “courses” list, containing just the name of the course (e.g. “Safety A”). Based on this list, we can calculate similarities between users, using binary values. Value 1 represents that they have taken a course, and value 0 represents that they have not.

Figure 21: Implicit mock data
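The implicit counterpart only records which courses have been taken. A hypothetical sketch of the same two employees, consistent with the “courses” list described above, could be:

// Hypothetical shape of the implicit mock data: only the names of the
// courses each employee has taken, later treated as binary (1 = taken, 0 = not).
const implicitData = {
  "Sebastian": { "courses": ["Safety A", "Course B"] },
  "Jorgen":    { "courses": ["Safety A"] }
};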

Implicit data can compensate for some of the issues with explicit data, as individual tastes can be

understood through implicit behaviour. Click data is a common metric to track behaviour and build a user profile of tastes and interests. However, it is not a reliable indication of interest, because misclicks happen, or the user can have been lured or forced to click on something. Furthermore, bots doing nefarious acts can pollute the data. Purchase history or consumption data is a better indication of interest, as it is more resistant to fraud. Consumption data requires the consumption of time, thus making it a reliable indicator of interest compared to click data.

In many cases, explicit data can be useful, if you can convince your users to give it to you. Otherwise, implicit data gives you a lot more to use, and in the case of purchase data it might even be better than explicit data.

Choosing the right, available sources of data is an essential early step when designing a recommender system. Unless the recommender system has large amounts of useful data to use, it cannot produce good results. We used the implicit and explicit data illustrated in figures 21 and 20 to test the first prototype described in chapter 6 (initial approach).

5.2 Company A data access

After completing the first prototype (described in the initial approach, chapter 6), we received access to Company A’s intranet. We could, therefore, see how they structure, categorise and label each course that their employees can take. Access was granted before we started the second implementation (described in the implementation chapter 7).

Company A’s intranet is mainly used by the administration and boutique managers to see numbers on which employees have conducted which courses. It also shows every course each employee has conducted, and what store they belong to or work in. Each store displays which employees have taken their mandatory courses, and therefore whether they have fulfilled their requirements or not. Reports can be generated for each store or each employee.

The intranet displays information about 21 505 employees located in 1666 different boutiques and stores. It shows the overall completion degree of all the courses for all the employees, and this can be filtered on stores or employees.

Company A’s intranet is also a course catalogue of all the courses their employees can take. In addition to a list of all the courses, each course has its own information page that can be clicked to enter. This page explains what the course is about, and the purpose of taking the

course.

Courses can be filtered on different categories; these are:

1. E-kurs - all the digital courses Company A offers. There are 208 courses in this category

2. Course category - is what courses are divided into or labeled as. There are 25 different categories

3. Ekvivalenter - special courses that have sub-courses that need to be conducted as well. There are 12 courses in this category

5.3 Generating mock data

Based on the information we got from roaming and exploring Company A’s intranet, we could generate mock data that the implementation prototype could be tested on. This mock data is shown in figure 22.

Figure 22: Excel view of the mock data generated in a csv file

Figure 22 shows a csv file that we randomly generated. The first line (header row) in the csv file shows the data labels. These are: userID, roleID, courseID, rating, relevant and learned.

These data labels are explained in table 3. The roleID data label can be mapped to a role name; for example, roleID 2 is a butikkmedarbeider. The courseID label works the same way as roleID: it can be mapped to a course name. For example, courseID 15 is Course A.

Data label   Description
userID       An id unique for each employee
roleID       An id unique for each role the employees have
courseID     An id unique for each course an employee can conduct
rating       A number between 1 and 5 representing how much an employee liked a course
relevant     An integer between 1 and 10 representing how relevant an employee thinks a course is
learned      An integer of 0 or 1 representing whether the employee learned something (1) or not (0)

Table 3: Explanations of data labels from mock data in csv file
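To make the structure of the csv file concrete, two hypothetical rows using the labels from table 3 could look like this (the values are illustrative only; roleID 2 and courseID 15 follow the mapping mentioned above):

userID,roleID,courseID,rating,relevant,learned
1,2,15,4,8,1
2,3,7,2,5,0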

Why we use a csv file to test this, and how it works, is further described in chapter 7, where we used the generated mock data illustrated in figure 22.

5.4 Foundation

Before we could start developing an SPL with machine learning components, we had to develop a product satisfying only the requirements of one customer. Since we had good documentation of these from Company A, it was natural to start with them. The list of functional requirements (user stories) can be found in appendix A.1.

When developing a product, a structured architecture that is easy to understand and well planned is important. It should also be easy to modify and have reusable components. The following subchapter is a brief introduction to some concepts of architecture in an SPL, and some approaches we evaluate and analyse.

5.4.1 Architecture

A common architecture is important for a set of products that are part of an SPL, since they share large components or parts of the architecture. It is essential that the architecture captures the

commonalities that specify requirements for the shared components; the differences need to be dealt with in an effective manner. The shared architecture in a product line is called the reference architecture (van der Linden, Schmid and Rommes, 2007). The reference architecture is a generalised architecture composed of elements supporting the range of products in the SPL, and it also needs to support the variabilities each product has (as described in chapter 3.1.4). It is a common asset that is meant to be extended and gives easier access to shared assets for the product line.

Architecturally significant requirements (hereby referred to as ASRs) are requirements that have a profound and essential impact on the architecture (Chen, Ali Babar and Nuseibeh, 2013). ASRs will define the reference architecture in many ways, so it is important to map these out early. Some product-specific requirements may be in conflict with each other; the reference architecture should, therefore, support variabilities.

We are mainly focusing on functional requirements in this thesis. In chapter 5.5, we go more in-depth on how we handle the functional requirements we are working with. Quality requirements such as performance, security and safety are not considered, because they are relevant neither to the research questions nor to the SPL Snapper is developing.

Figure 23: Three basic techniques for realising variability in an architecture. Figure taken from van der Linden, Schmid and Rommes (2007), pp. 41.

Variability is one of the most important architectural drivers in a software product line architecture (Tan, Lin and Ye, 2012). There are three different techniques to realise variation from an architectural perspective: (1) adaptation, (2) replacement and (3) extension. Figure 23 illustrates

the difference between these three techniques.

1. Adaptation is when there is only one implementation available for a specific component. This component offers an interface that allows it to change or adjust its behaviour. An interface can offer these adjustments through a configuration file, parameters or variables, or source code patches. Other options also exist.

2. Replacement technique allows for several different implementations of a component. Each implementation must follow specific guidelines from the architecture. A product-specific component must be chosen or developed each time for each product.

3. Extension technique relies on the architecture to provide interfaces that allow new components to be added to it. These can be product-specific but do not have to be. The difference from replacement is that components can now be added through generic interfaces, allowing a varying number of components. In contrast, the replacement technique uses interfaces that only allow certain components to be added.

Considering these techniques for the first prototype (initial approach), the adaptation technique seems most suitable for our prototype, as it allows for more reuse than the other techniques (relevant to our research questions). Through this, we can create a parent-child relationship between the generic recommender engine and the product-specific recommender engines. The child components can, through an interface, change the behaviour of the generic component by passing variables. There are several implementation mechanisms to realise the adaptation technique. One is inheritance, where a subclass (the child component) can change some default behaviours of a class as needed. This is what we want to do: maintain a high degree of reusability with the machine learning components.
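A minimal JavaScript sketch of this adaptation-through-inheritance idea, with hypothetical class and parameter names (the actual components in the prototype may be structured differently), could look like this:

// Generic, reusable engine; its default behaviour can be adjusted by children.
class RecommenderEngine {
  constructor(metric = "avg") {
    this.metric = metric;                  // e.g. "euc", "pears" or "avg"
  }
  recommend(data, user) {
    // generic ranking and recommendation logic would live here
  }
}

// A product-specific engine only overrides the behaviour it needs to change.
class CompanyARecommender extends RecommenderEngine {
  constructor() {
    super("pears");                        // adapt the parent through a parameter
  }
}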

5.5 Requirements elicitation in SPL

Requirements elicitation is the process of researching and discovering the requirements of a system or a product (Masters, 2010). This is done from the user, customer and stakeholder perspective. The process is also known as requirements gathering or management. We did requirements elicitation after we had received the product requirements from the different customers of Snapper. This was done as part of the suggestion to problem step in the DSR process model (as described in chapter 4).

In appendix A.1 we have stored the requirements for the product for Company A, and in appendix A.2 the planned product requirements for Company B can be found. These have some similarities (commonalities) and some variabilities that we have illustrated in figure 24. Figure 24 illustrates Company A’s requirements as Req1. The parts of Company A’s requirements that are unique to their product (not shared with Company B) are drawn out to a single recommender engine (R1). Req2 illustrates the requirements for Company B’s product. Their unique requirements are likewise stored in a recommender engine (R2).

Figure 24: Product requirements for Company A and Company B (Req 1 with R1, Req 2 with R2, and Common Req with RC)

Between Req1 and Req2, the Common Req can be found. These specify the commonalities in both Company A’s and Company B’s requirements. The commonalities are drawn out to a common recommender engine called RC. This is the idea behind creating an SPL for both customers: to support several product requirements at the same time. R1, R2 and RC represent the machine learning components we are developing.

The Common Req section is illustrated as relatively small, but this is just an illustration to visualise what it can look like. Commonalities in requirements specifications from several products can vary to a high degree and can have a lot of overlap. Considering only two products, commonalities are probably more usual to see, but when we start analysing the requirements of a third product, the overlap can decrease a substantial amount. This is an interesting aspect to

observe, and we consider it throughout the research process.

6 Initial approach

In this chapter, we describe our initial approach to solving the case. We start by presenting the architecture of the first prototype we made. Then we present different distance metrics for calculating user similarities. We explain how this prototype recommends items and what machine learning algorithm it uses. Then we present the next version of the prototype, which allows for the implementation of an SPL. Finally, we present data we measured while developing the first prototypes, covering the process of going from a simple architecture to an architecture that supports an SPL.

6.1 First prototype architecture

Before we could make a prototype with a reference architecture, we started by making a prototype satisfying just the requirements of one customer, Company A. The architecture of the first prototype is illustrated in figure 25. It contains only a few components, but that was the goal throughout the development process: to keep it simple. The prototype is written with the JavaScript library React.js (version 16.10.2).

The prototype uses a recommender system approach called collaborative filtering (explained in chapter 3.3). This is based on finding users similar to the given user, and then recommending their preferred or liked items. The machine learning algorithm it uses is based on k-nearest neighbours; this is further explained in chapter 6.1.1.

At this stage, we do not consider implementing any architectural patterns, because the prototype does not have to satisfy the product requirements. It would also make the architectural layout more complex, and the entire development would take more time and be more cumbersome.

Figure 25: Architecture of the first prototype (MainComponent, Recommender Engine with PearsCorrelation and EuclideanDistance, Styling, and JSON files)

The first prototype has a software model with a MainComponent that retrieves data from JSON files. It also has a separate Styling component. The MainComponent loads and reads the JSON files and then stores them in dictionaries. After this step, the MainComponent passes the dictionaries to a Recommender Engine. Along with the dictionaries, two variables are also passed: user and type. The user variable specifies which user the items should be recommended to; hence the engine will calculate the similarity of this user to every other user. The type variable specifies which calculation formula should be used. There are three values the engine allows for this parameter; these are illustrated in table 4.

Variable: type   Calculation formula
euc              Euclidean distance
pears            Pearson correlation
avg              Average between euc & pears

Table 4: Three parameters for type variable
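To illustrate how the type parameter might be interpreted inside the engine, a hypothetical JavaScript sketch (the function and parameter names are assumptions, not the prototype’s actual components) could be:

// Select the similarity score to use based on the type variable from table 4.
function similarityByType(type, euclideanScore, pearsonScore) {
  if (type === "euc") return euclideanScore;
  if (type === "pears") return pearsonScore;
  if (type === "avg") return (euclideanScore + pearsonScore) / 2;
  throw new Error("Unknown type: " + type);
}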

Our recommender engine uses two different sub-programs. These are two different formulas that calculate how similar two objects are and return a score. The first formula is called Euclidean distance, and the second is called Pearson correlation. We explain in more depth how these calculations are made, and how the rankings are done based on these scores, in chapter 6.2. We also explain why the type variable allows for the avg input, and why this can be useful.

When these variables have been sent to the recommender engine, the two subprograms (PearsCorrelation and EuclideanDistance) calculate the similarity scores to all the other users. All users are then ranked based on these scores, and a list is created with the user rankings. Courses are recommended to the given user, passed as input, based on this list. The ranking of users and recommendation of items are further explained in chapter 6.2.

The next step is to create a generic recommender engine that can do ranking and recommendations on any type of input data or objects, specified by each instance of the recommender engine. We also need a reference architecture that the generic recommender engine can be part of, so that each product instance can adapt it through an inheritance relation to the recommender engine.

6.1.1 A tweak of K-Nearest Neighbours

As described in the background chapter (chapter 3.2.3), k-nearest neighbours (KNN) is an instance-based algorithm (also called memory-based learning). This means that it compares new instances of data to already calculated data stored in a database.

We do not have a database with stored training data that we can compare new data instances or data points to. We, therefore, cannot use it as a memory-based approach. Instead, we calculate every user score, relative to the given user, each time recommendations should happen. This

makes it a tweak of KNN, because, as figure 27 shows, we use the recommendations from the closest k neighbours of a user. We place all users in an n-dimensional space and calculate, using distance metrics, who are the closest.

Our recommender system tweak of KNN works like this:

1. Load the data

2. Initialize threshold to your chosen limit of user similarity score (selecting k closest users)

3. For each user in the data:

(a) Calculate the similarity score (distance) between the given user and the current user in the data

(b) Add the similarity score (distance) and the user ID from data to a list

4. Sort the list of similarity scores (distances) from largest to smallest (in descending order, the higher similarity score the lower distance)

5. Return ranked list of users

Based on the ranked list of users, we can make recommendations. From the list that is returned, we can take the k first users and get their items. How recommendations are made based on this list is further explained in chapter 6.2.3.
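A minimal JavaScript sketch of the ranking steps listed above (helper and variable names are hypothetical; the similarity function stands in for one of the distance metrics from chapter 6.2):

// Rank all users by their similarity to a given user and keep those above a threshold.
function rankUsers(data, givenUser, similarity, threshold) {
  const ranked = [];
  for (const userId of Object.keys(data)) {
    if (userId === givenUser) continue;
    const score = similarity(data[givenUser], data[userId]); // step 3a
    ranked.push({ userId, score });                          // step 3b
  }
  ranked.sort((a, b) => b.score - a.score);                  // step 4: descending
  return ranked.filter(entry => entry.score >= threshold);   // steps 2 and 5
}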

When we have real data from Company A or Company B that the prototype can use, the optimal solution would be to use some of it as training data, that is, pre-calculated user scores stored in a database. Each time new data points from new users need recommendations, we calculate a user score for each new instance and then compare it to the database.

6.2 Distance metrics

We have been using two different formulas for calculating the similarities between users: Euclidean distance and Pearson correlation. These two formulas are distance metrics, meaning that they define and calculate a distance between a pair of elements placed in a two-dimensional graph (Sharma, 2019). In the following sub-chapters (chapters 6.2.1 and 6.2.2) we explain both formulas.

6.2.1 Euclidean distance

Euclidean distance is a formula to measure the distance between points in Euclidean space (Robinson, 2017). Euclidean space is a two- or three-dimensional space, and it can even have any finite number of dimensions, where points are placed by coordinates (one coordinate for each dimension) so that the distance between the points can be calculated by a distance formula (Hosch, 2011).

The Euclidean distance between two points in one dimension is just the absolute value of the difference between their coordinates. The calculation then becomes |p1 - q1|, where p1 is the first coordinate of the first point and q1 is the first coordinate of the second point. Because distance is almost always considered to have a non-negative value, the absolute value is used.

Suppose we have two points, p with coordinates (p1, p2) and q with coordinates (q1, q2), in a two-dimensional space. Constructing a segment between these two points forms the hypotenuse of a right triangle (Robinson, 2017). The lengths of the legs of the triangle are given by |p1 - q1| and |p2 - q2|. Using the Pythagorean theorem, the length of the hypotenuse is, therefore, the square root of the sum of the squared legs. The formula is illustrated in equation 5. The distance between the two points thus becomes the length of the hypotenuse.

d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2} \qquad (5)

Euclidean distance between two points in a two-dimensional space

When writing JavaScript to calculate the Euclidean distance between two points: p with coor- dinates (1, 4.5) and q with coordinates (4, 3), the following code snippet (figure 26) shows how the code syntax will look. The variable euclid is the hypotenuse length between the points, and is, therefore, the Euclidean distance between the points.

Figure 26: Euclidean distance (JavaScript) example code
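A minimal sketch of the calculation described above, using the points p = (1, 4.5) and q = (4, 3) and a variable named euclid as in figure 26 (the exact code in the figure may differ):

const p = [1, 4.5];
const q = [4, 3];
// length of the hypotenuse, i.e. the Euclidean distance between p and q
const euclid = Math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2);
console.log(euclid); // approximately 3.35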

Figure 27 shows a diagram of 6 employees and their different course ratings on the courses Course A and Course B. This is how we can structure user ratings in a two-dimensional space

to calculate how similar the users are. When placing the points in a diagram, it gives a good indication of which users are quite similar, because their points will be fairly close to each other. For example, we can easily see that User1 is more similar to User2 than to User6 because of their close placement.

Figure 27: User-feedback diagram (six users plotted by their ratings of Course A and Course B)

When you are calculating the Euclidean distance in several dimensions between two points, the list of coordinates for one point can be very long. For example, for the point p the coordinates will be (p1, p2, . . . , pn), where pn is the nth dimension. When we are calculating the difference between two employees, each dimension represents a course, and because there are a lot of courses, we will operate with a lot of dimensions. The formula for calculating Euclidean distance in n dimensions is shown in equation 6. This is the formula we are using to calculate the Euclidean distance between two employees.

\sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \dots + (p_n - q_n)^2} = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \qquad (6)

Euclidean distance between two points in an n-dimensional space
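Applied to two employees, a hypothetical JavaScript sketch of equation 6 over the courses both have rated could look like this (the data shape follows the explicit mock data; the helper name is an assumption):

// Euclidean distance between two employees' explicit ratings,
// computed over the courses (dimensions) they have both rated.
function euclideanDistance(ratingsA, ratingsB) {
  const shared = Object.keys(ratingsA).filter(course => course in ratingsB);
  let sumOfSquares = 0;
  for (const course of shared) {
    const diff = ratingsA[course] - ratingsB[course];
    sumOfSquares += diff * diff;           // (p_i - q_i)^2 for each dimension
  }
  return Math.sqrt(sumOfSquares);          // equation 6
}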

Effect of using implicit or explicit data

The difference between using implicit or explicit data in the Euclidean distance formula is crucial. When using explicit data, it is simple to calculate how similar two users are, resulting in an understandable and logical score that coheres with the two data points.

However, when using implicit data, the Euclidean distance formula does not work well. That is because the formula only compares the same coordinates, and ignores coordinates that only one of the data points has. This corresponds to one employee having taken a course the second employee has not. When the formula receives implicit data, it uses the value 1 if an employee has taken a course, and 0 if he or she has not. When the formula ignores the 0 values, it will always give the output score of 1 if they have some courses in common, and 0 if they have none.

This is one reason why we take the average between the two similarity scores, calculated by Euclidean distance and Pearson correlation, but we will go further into this in chapter 6.2.3.

6.2.2 Pearson correlation coefficient

The Pearson correlation coefficient is a formula to measure the linear correlation between two variables X and Y. There are several different types of correlation coefficient formulas; the Pearson correlation is one of the most commonly used, especially in statistical analysis (Correlation Coefficient: Simple Definition, Formula, Easy Calculation Steps, 2020).

The coefficient is the covariance of the two variables X and Y, divided by the product of their standard deviations (Pearson correlation coefficient, 2020). Covariance, in probability and statistics, is a measure of the joint variability of two variables. If the covariance is positive, the variables vary together in the same direction, but if the covariance is negative, they vary inversely (Weisstein, 2020). Standard deviation, in statistics, is a measure of the amount of variation in a set of values. The Pearson correlation formula is shown in equation 7.

r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}} \qquad (7)

Pearson correlation coefficient formula

The following code (figure 28) is a code snippet from the Pearson correlation function. It is written in JavaScript and can be found in the PearsonCorrelation component in our prototype.

Figure 28: Pearson correlation (JavaScript) code

The explanations of the variables from the code snippet (shown in figure 28) are respectively:

• existp1p2 is the merged dictionary with all the similar items user 1 (p1) and 2 (p2) has

• X is the sum of all the item scores from user 1 (p1)

• Y is the sum of all the item scores from user 2 (p2)

• X2 is all the item scores raised to the power of 2, summed up, from user 1 (p1)

• Y2 is the sum of the squared item scores from user 2 (p2)

• XY is the product between the item scores by user 1 and 2 on the same course

• dict[..][..] variable is the dictionary with all the data fields on users and items

• numerator variable is all the calculations over the fraction bar, hence it is called numerator

• denominator variable is all the calculations below the fraction bar, hence it is called denominator

• num existence is the number of items in the dictionary (dict)(N from the formula 7)

• score is the final Pearson correlation score the function returns as output (r from formula 7)
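Putting these variables together, the following is an illustrative JavaScript sketch of a Pearson correlation function consistent with equation 7 and the variable descriptions above. It is not the exact code from figure 28; in particular, the handling of items only one user has (here filled with 0, matching the description of the merged existp1p2 dictionary) is an assumption.

// Pearson correlation (equation 7) between users p1 and p2.
// dict maps user names to { item: score } objects.
function pearsonCorrelation(dict, p1, p2) {
  // Merged set of items rated by either user.
  const existp1p2 = { ...dict[p1], ...dict[p2] };
  const items = Object.keys(existp1p2);
  const N = items.length;            // num_existence in the description above
  if (N === 0) return 0;
  let X = 0, Y = 0, X2 = 0, Y2 = 0, XY = 0;
  for (const item of items) {
    const x = dict[p1][item] || 0;   // assumption: 0 when p1 has no score for the item
    const y = dict[p2][item] || 0;
    X += x; Y += y;
    X2 += x * x; Y2 += y * y;
    XY += x * y;
  }
  const numerator = XY - (X * Y) / N;
  const denominator = Math.sqrt((X2 - (X * X) / N) * (Y2 - (Y * Y) / N));
  const score = denominator === 0 ? 0 : numerator / denominator;
  return score;                      // between -1 and 1
}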

Pearson correlation between data fields in a data set is a measure of how well they are related, meaning how well the two data fields fit a straight line, i.e. are linear (Correlation Coefficient: Simple Definition, Formula, Easy Calculation Steps, 2020). The Pearson correlation formula returns a value between 1 and -1, where:

• 1 indicates a strong positive relationship between the variables. This means that the two users have the same rating for all the same items.

• 0 indicates no linear relationship between the variables; the two users' item ratings show no consistent pattern of similarity.

• -1 indicates a strong negative relationship between the variables. This means that the two users are not similar.

From the code snippet (in figure 28), the score variable is the return value from the function. This indicates how similar two users are, with the value ranging from 1 to -1. The Pearson correlation function takes as input two user identifiers and a dictionary of their items (the dict[..][..] variable), and gives the score as output.

Effect of using implicit or explicit data
There is a slight difference when the Pearson correlation formula calculates similarity on explicit or implicit data. When using explicit data, it is much easier to calculate (more exactly) how similar two users are, and how their different ratings on the same items affect the coefficient, for example making it deviate less and resulting in a higher similarity score.

When using implicit data, the score will vary less from user to user because their ratings of the items cannot be used. What will pull the score in either direction (towards 1 or -1) is whether one user has a lot more items than the other user. It is therefore preferable to use explicit data when calculating the similarity score, just as with the Euclidean distance metric.

6.2.3 Recommending items
In table 5 we have used some of the explicit mock data from figure 20 to show how items are recommended to a user (e.g. Hans) based on user similarity scores. In this table, the similarity scores are calculated with the Euclidean distance formula only, but this is just an example of how it is done. Using Pearson correlation or an average of both formulas does not change the way items are recommended.

Employee        Similarity   Course A   S.xCourse A   Course R   S.xCourse R   Course C   S.xCourse C
Sebastian       0.99         3.0        2.97          2.5        2.48          3.0        2.97
Jorgen          0.38         3.0        1.14          3.0        1.14          1.5        0.57
Edvard          0.89         4.5        4.02          -          -             3.0        2.68
Mats            0.92         3.0        2.77          3.0        2.77          2.0        1.85
Lars            0.66         3.0        1.99          3.0        1.99          -          -
Total                                   12.89                    8.38                     8.07
Sim.Sum                                 3.84                     2.95                     3.18
Total/Sim.Sum                           3.35                     2.83                     2.53

Table 5: Similarity ranking among users on mock data.

The explanations of the different terms in table 5 are:

• Similarity is the score Euclidean distance formula calculates. This is the result from calculating similarities between other users and Hans

• S.x... is the similarity score (one similarity score on a user) times the score rating the same user has given on a course (e.g. Course A). S.x... can be found in the top column, for example S.xCourse A

• Total is all the S.x scores summed up

• Sim.Sum is the sum of all the Similarity scores from other users that have reviewed the given course

• Total/Sim.Sum is Total divided by Sim.Sum

Similarity scores are either calculated by the Euclidean distance formula, Pearson correlation formula or an average between them. After these are calculated, we can see that Sebastian is

the most similar user to Hans. However, recommending items to Hans is not based only on the fact that Sebastian is a very similar user. The S.x variable is therefore crucial: it weights each user's rating of a course against that user's similarity score. All the weighted scores (e.g. S.xCourse A) are summed up and divided by the sum of the similarity scores (Sim.Sum) of only the users that have reviewed that course.

We end up with Total/Sim.Sum, and this is the ranking of the items (in this case courses). We can see that Course A has the highest ranking score, followed by Course R, and Course C has the lowest score. It is therefore most likely that Course A is the most relevant course for Hans. This is because many of the employees most similar to Hans have given this course a high review. The recommender engine will therefore recommend courses in the following order: (1) Course A, (2) Course R, (3) Course C.
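The weighting described above can be sketched in a few lines of JavaScript. The function below reproduces the Total/Sim.Sum calculation from table 5; the data shapes (plain objects keyed by user and course) are assumptions for illustration, not the prototype's exact structures.

// similarities: { Sebastian: 0.99, Jorgen: 0.38, ... }  (scores against the target user)
// ratings:      { Sebastian: { 'Course A': 3.0, ... }, ... }
function rankCourses(similarities, ratings) {
  const totals = {};   // sum of similarity x rating per course (Total)
  const simSums = {};  // sum of similarities of users who rated the course (Sim.Sum)
  for (const [user, sim] of Object.entries(similarities)) {
    for (const [course, rating] of Object.entries(ratings[user] || {})) {
      totals[course] = (totals[course] || 0) + sim * rating;
      simSums[course] = (simSums[course] || 0) + sim;
    }
  }
  return Object.keys(totals)
    .map((course) => ({ course, score: totals[course] / simSums[course] })) // Total/Sim.Sum
    .sort((a, b) => b.score - a.score); // highest ranked course first
}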

Average between Pearson correlation and Euclidean distance
The reason we use the average between the Euclidean distance score and the Pearson correlation score is that they calculate similarities between users in different ways. Pearson correlation is a measure of how well two sets of data fit a straight line. This formula is a bit more complicated than Euclidean distance, but it gives a better result when the data is not normalised (Aggarwal et al. 2001). The Euclidean distance metric is a measure of how close two data points are in a dimension space.

Euclidean distance uses only coordinates shared by the two data points. This means that when calculating similarities between two users, the Euclidean distance metric ignores the courses that only one of the users has taken, and uses only courses both have completed. This is a problem when we want to recommend courses to a user who has completed only a few courses, and there are not many other users to compare with and recommend courses from.

Pearson correlation uses the sum of all the course ratings from two users and compares how much these sums deviate from each other. In the numerator of Pearson correlation, a variable ensures that if two users have taken a lot of the same courses, their similarity score is higher (the XY variable from the code snippet in figure 28). This variable does not have to contribute if they have not taken any of the same courses; the similarity is just decreased. Therefore Pearson correlation can calculate similarities between all users, whether or not they have taken the same courses.

Because Euclidean distance cannot find similarities between two users with no shared courses, and Pearson correlation can, we use both formulas when calculating the similarity score and take the average between them. We arrived at this idea after a lot of testing, where we saw that the average score provides a good indication of how similar two users are based on two different formulas. The recommendations the two formulas provide do not deviate much, and when using the average, the recommender engine could provide recommendations in many more cases.

Calculating the average between the two different scores is conducted in the following way:

Pearson correlation score: -0.8
Euclidean distance score: 0.75

(-0.8 + 1)/2 = 0.1
(0.1 + 0.75)/2 = 0.425

This is just an example, and it is improbable that the two formulas will provide scores with such a high deviation. The first calculation with result 0.1 is to standardise Pearson correlation according to Euclidean metric because Pearson correlation results range from -1 to 1, and Euclidean distance range from 0 to 1. The average between the two scores is 0.425.
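As a small sketch in JavaScript, the combination step could look like the following; the function name is ours, used only to make the standardisation and averaging explicit.

// Standardise the Pearson score from [-1, 1] to [0, 1], then average it with the
// Euclidean score, which is already in the [0, 1] range.
function combinedSimilarity(pearsonScore, euclideanScore) {
  const standardisedPearson = (pearsonScore + 1) / 2;
  return (standardisedPearson + euclideanScore) / 2;
}
// combinedSimilarity(-0.8, 0.75) returns 0.425, as in the example above.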

6.2.4 Evaluating other distance metrics
Many machine learning algorithms, both supervised and unsupervised, use different types of distance metrics. ML algorithms use these distance metrics to improve the performance of classification, clustering and information retrieval processes (Kundu, 2019). A distance metric can also help the algorithm recognise similarities between data or content. It is important to use the right distance metric, because the metric plays an important role in the ML model: the models rely on a distance calculation between two data points.

Minkowski distance is a distance metric for calculating the distance between two data points, as the length of a vector going from one data point to the other (Gohrani, 2019). Minkowski distance is a generalised distance metric that other distance metrics are based on. Distance metrics based on Minkowski distance are, for example, Manhattan distance, Chebyshev distance and Euclidean distance. These calculate distances between data points in the following ways:

• Manhattan distance is used to calculate the distance between two data points that follow a grid path. The grid path contains straight lines and corners that are orthogonal. To calculate the distance between two points, all corners are neglected and the length is calculated as if the path has only one corner (Gohrani, 2019). The calculation therefore becomes the sum of the lengths of two lines.

Manhattan distance is well suited for calculating distances in a fixed but high number of dimensions (Aggarwal et al., 2001). This is because the path follows a grid, and the corners are limited in number. However, we do not know the number of dimensions, and it might change from product to product in the SPL. Using Manhattan distance is therefore not optimal.

• Chebyshev distance (also called maximum value distance) calculates the distance between a set of objects in any direction in a dimension space (Teknomo, 2015). The dimension space can have several dimensions, beyond the three spatial ones. The path between two points can be in any direction, unlike Manhattan distance, so the distance calculation is the maximum length of a line between two data points.

To calculate the distance between two points, all coordinates are needed, and only the same coordinates are compared in the formula. Say, for example, two users have each taken only a single course, but the courses are different ones; then the Chebyshev distance formula cannot be used between the two data points. This issue makes Chebyshev less applicable to our case and is the reason we did not choose to use it.

• Other distance metrics, such as cosine similarity and Hamming distance, do not apply as well to our case as Euclidean distance does. Cosine similarity is used to calculate the similarity between two vectors using the angle between them; the vectors represent data fields with a given value. Hamming distance is used to calculate similarities between two binary data strings, and the input data needs to be categorical. We did not evaluate these distance metrics further because they are not very applicable to our case.
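To illustrate how Minkowski distance generalises the metrics discussed above, the following JavaScript sketch computes the Minkowski distance of order p between two points given as equal-length arrays: p = 1 gives Manhattan distance, p = 2 gives Euclidean distance, and letting p grow towards infinity approaches Chebyshev distance. This is an illustration on our part; the prototype itself only uses Euclidean distance.

// Minkowski distance of order p between two points a and b (arrays of equal length).
function minkowskiDistance(a, b, p) {
  const sum = a.reduce((acc, ai, i) => acc + Math.abs(ai - b[i]) ** p, 0);
  return sum ** (1 / p);
}

// Chebyshev distance as the limiting case: the largest difference in any single coordinate.
function chebyshevDistance(a, b) {
  return Math.max(...a.map((ai, i) => Math.abs(ai - b[i])));
}

// minkowskiDistance([3.0, 2.5], [3.0, 3.0], 1) -> 0.5 (Manhattan)
// minkowskiDistance([3.0, 2.5], [3.0, 3.0], 2) -> 0.5 (Euclidean)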

6.3 Prototype with SPL implementation
When developing a new prototype with support and structure for an SPL, based on the old prototype (from chapter 6.1), there were a lot of requirements that needed to be considered. As described in chapter 5.5, structuring and handling the different product requirements

were necessary to understand what functionality needed to stay generic and open to all products, and what needed to be drawn out into specific products.

After we had done this, we started organising and cleaning up the code, as well as commenting the functions that were used, and analysing where all variables were passed. When we got a clear understanding of the data flow between components and specific functions, it was easier to divide components into smaller ones.

We started developing an instance of the first product in the SPL. This instance (R1) is based on the product requirements from Company A (found in appendix A.1). We developed the generic recommender engine simultaneously with R1, because the first prototype was based on Company A's product requirements, and all the functionality for R1 was located in the Recommender Engine component (from figure 25).

This (improved) prototype is also written in the JavaScript library called React.js. It also uses collaborative filtering for recommending items, and uses the same tweak of k-nearest neighbours as a machine learning algorithm.

6.3.1 Architecture
The architectural layout of the new prototype with SPL implementation has only a few changes from the first prototype. The biggest changes are a modification of code within some of the components, and that some code has been moved to other components.

The new prototype has a generic recommender engine component (RC) that can be specified or modified into unique instances of recommender engines for a specific task. These instances form the product line, and are extended from the generic model. The generic model acts as a parent component to each child component (R1,R2, ..., Rn).

The RC component is new compared to the first prototype, as is a SimilarUsers component that has been drawn out of the old recommender engine. This component communicates with RC and with the PearsCorrelation and EuclideanDistance components. These two components still do the similarity calculations and have not changed much (further explained in chapter 6.3.2). The SimilarUsers component now does the ranking of users and passes them back to RC as a list of scores. This is the component that finds the most similar users, using the KNN tweak algorithm (described in chapter 6.1.1).

Figure 29: Architecture of SPL components (the Company A product and the product line R1, R2, ..., Rn with MainComponent, data files and styling, on top of the reference architecture: RC, SimilarUsers, PearsCorrelation and EuclideanDistance)

The RC component receives the list of ranked users back from SimilarUsers and passes it to the child recommender component, in our case the R1 component. The RC component controls the data flow with SimilarUsers, and sends only the list of ranked users back to each subcomponent.

All the components in the product line (R1, R2, ..., Rn) follow the adaptation technique illustrated in figure 23, and there is only one implementation of RC that all these components use. Each instance of RC inherits the functions from this component through a parent-child relation (implementation technique from section 5.4.1).

Through a set of parameters, the child components can change how some functions work in RC, that further sends data to SimilarUsers component for user ranking (illustrated in figure 32).

Each product instance (e.g. R1) specifies the following parameters:

• dataFormat - whether the data is explicit or implicit when calculating similarity scores

• type - meaning whether EuclideanDistance or PearsonCorrelation component should do the similarity calculations, or an average between these two values should be used

• person - specifies which user the distance metrics should calculate other user similarities against

• threshold - a variable that specifies how similar two users must be to be kept. For example, this can be 0.4, meaning that all users with a similarity score lower than 0.4 are discarded

• dict - is the dictionary of users and items passed from the sub component

The RC component uses these parameters when communicating with the SimilarUsers component. It passes the dataFormat, type and the specific user that was sent from the sub-component, as well as a second user, so that through iterations, all other users are compared to the specific one. These parameters are passed along with an extract of the dictionary of users. When the RC component receives the ranked list of users back, it uses the threshold variable to trim it before passing the ranked list of users back to the subcomponent.

The sub-component calculates the recommendations for the given user based on the rank list on user similarity scores from RC, and creates a ranked list of items with a relevance score for each, in the order, they should be recommended.
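As an illustration of how a product instance could configure the generic component through these parameters, consider the following JavaScript sketch. Apart from the parameter names taken from the list above, all names (GenericRecommender, getRankedUsers, rankCoursesFor) are hypothetical; the sketch only shows one possible shape of the parent-child relation described here, not the prototype's actual code.

// Hypothetical sketch of a product instance (R1) extending the generic recommender (RC).
class R1 extends GenericRecommender {
  constructor(dict) {
    super({
      dataFormat: 'explicit', // Company A provides explicit ratings
      type: 'average',        // average of EuclideanDistance and PearsonCorrelation
      person: 'Hans',         // the user to calculate similarities against
      threshold: 0.4,         // discard users with a lower similarity score
      dict,                   // dictionary of users and items from the sub-component
    });
  }

  recommend() {
    // RC returns the trimmed, ranked list of similar users (hypothetical method name);
    // the product instance turns it into a ranked list of courses for `person`.
    const rankedUsers = this.getRankedUsers();
    return this.rankCoursesFor(this.options.person, rankedUsers);
  }
}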

6.3.2 Generic code
To make a generic recommender engine, it is important that the code is as generic as possible. By generic, we mean that it can be used with multiple different types of data sets and therefore by different products in the SPL. In order to make the code as generic as possible, it must be written in terms of parameters that are to-be-specified-later (Generic programming, 2020). This means that, when instantiated, each product defines a set of parameters to fit its instance, so it can, for example, load its data set with a specific number of data fields. This can vary from product to product.

Figure 30: EuclideanDistance components module constructor

In order to make the EuclideanDistance and PearsonCorrelation components work for any data set, the two calculation functions (distance metrics) within these components should only use a minimum of specified parameters when doing calculations.

Figure 30 illustrates the constructor of the euclidean function in the EuclideanDistance com- ponent. The dict variable is a data set needed to calculate similarity, and p1 and p2 are two users specified by just an ID or a unique name. Both these exist in the dictionary. The for-loop iterates over the dictionary and finds the similar items between the two, and stores them in a new dictionary: existp1p2. This dictionary is used for the similarity calculations.
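A minimal sketch of the gathering step described above could look like the following, assuming dict maps user names to { item: score } objects as in figure 30. This is an illustration of the idea, not a copy of the figure's code.

// Collect the items both p1 and p2 have scores for, as the euclidean constructor does.
function sharedItems(dict, p1, p2) {
  const existp1p2 = {};
  for (const item of Object.keys(dict[p1])) {
    if (item in dict[p2]) {
      existp1p2[item] = [dict[p1][item], dict[p2][item]]; // keep both users' scores
    }
  }
  return existp1p2; // used afterwards for the similarity calculation
}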

Figure 31: PearsonCorrelation components module constructor

Figure 31 shows the constructor of the pearsonCorrelation function within the PearsonCorrelation component. It does exactly the same as the euclidean constructor. Because the PearsonCorrelation and EuclideanDistance components use so few parameters, they can work on any type of data set. All that is required is a data set (dict) and two user names (p1 and p2) to calculate how similar they are.

Figure 32: Generic recommender (RC component) modules constructor input

Figure 32 shows the genericRecommender (RC) component and its constructor function. It only takes in the minimal number of parameters, and only imports the components that it uses and sends data to; in this case that is only the similarUsers component. By doing so, we keep the code clean and ensure that data only flows to the components we specify.

6.3.3 Layout of components and views
The layout of the different views of the prototype is illustrated in figures 33 and 34. As mentioned earlier, this prototype is made with the JavaScript library React.js, and it is therefore hosted and tested locally (on http://localhost:3000/). Figure 33 shows, under the Recommender System header, a Dataset view. This shows what data is in the JSON file. In this view, you can also specify which JSON file should be used (with either explicit or implicit data), and there is a Reset button to reset the prototype so that the cache and dictionary are cleared.

Under the Dataset view, there is a user similarity view. From two drop-down menus, you can specify two different users based on which data set you have chosen (changed dynamically). A Euclidean Dist score is shown, illustrating the Euclidean distance score, and a Pearson Correlation score is shown. This shows the similarity scores and how the two formulas can differ when calculating similarity. Two bars (blue for Euclidean distance, and yellow for Pearson correlation) show the percentage of how similar the two users are. Figure 33 shows an example where similarity scores between Jorgen and Hans are calculated.

Figure 33: First part of SPL prototype

Underneath the similarity view, there is a recommendation view, shown in figure 34. This view also has a dynamic drop-down menu based on the data set, where you can choose a user. Based on your choice, three different columns show a recommendation list based on three different calculations. As mentioned earlier, these are Pearson correlation, Euclidean distance and an average between them. Because a data set with explicit data is chosen, a predicted rating appears next to each recommended item in figure 34. This is based on the courses the user Jorgen has completed and rated. The recommendation list shows the top 10 recommended courses for him. Under the recommendation list, there is a Similar users list, showing the other users ranked by similarity.

Figure 34: Second part of SPL prototype, recommending courses

Figures 33 and 34 show the implementation of a generic recommender engine and a product instance (R1) that satisfies the product requirements from Company A. They only show explicit mock data, not real Company A data.

6.4 Process from first prototype to SPL prototype
Throughout the first two phases of implementation, resulting in one prototype, we collected some data about the process we went through. The data we collected was: the number of hours spent, the number of components made, and finally the number of lines of code written in each development phase. For the number of lines of code, styling files are not counted; this includes, for example, all files ending with .css.

Table 6 shows the collected data after the first prototype was made. This is the prototype that only satisfies the product requirements from Company A (described in chapter 6.1). This prototype has no implementation or architecture for an SPL, that can share architecture or components.

First prototype
Time spent (hours): 136
Number of modules/components: 8
Number of lines of code in src folder:
  src/components  489
  src/utility     154
  src/data        534
  src/            163
  Total:          1340

Table 6: Collected data from implementation of first prototype

Table 7 shows collected data after further developing the prototype through a second development phase. This prototype supports SPL with reference architecture and generic components (described in chapter 6.3).

SPL prototype
Time spent (hours): 114
Number of modules/components: 12
Number of lines of code in src folder:
  src/components  524
  src/utility     255
  src/data        534
  src/            163
  Total:          1476

Table 7: Collected data from implementation of SPL prototype

Comparing table 6 and 7, we spent 136 hours on the first development phase, and 114 hours on the second development phase. That is 114 hours spent to make 4 extra components, and just 136 lines of code extra.

During these 114 hours, we made the RC component and other supporting components (part of a reference architecture). We also made an instance of the generic recommender engine (R1). The second development phase took 22 hours less than the first, but it resulted in the same prototype with the same views, satisfying the same product requirements. The only difference after the second development phase was that it had a reference architecture that was supporting an SPL.

6.5 Estimates of implementing R2
After the first prototype was made into a new prototype with a reference architecture, the Recommender Engine component was drawn out into RC, a generic component, and R1, a specific recommender system component for Company A. The next step is to make a new instance of RC into an R2 that satisfies the product requirements from Company B (found in appendix A.1).

Making a new instance of the product is not something that is requested from Snapper, and we, therefore, will only make predictions about what a new instance of the Recommender engine would look like based on the product requirements.

R2 component for Company B
Time spent (hours): 80
Number of modules/components: 14
Number of lines of code in src folder:
  src/components  x
  src/utility     x
  src/data        x
  src/            x
  Total:          1700

Table 8: Predicted data from implementation of R2 recommender system for Company B

Table 8 shows the predicted data for implementing R2 for Company B. It shows only two new components (14 in total, compared to 12 earlier), because this includes an R2 instance and a product-specific data loader. We predict that this will result in a total of 1700 lines of code (approximately 230 extra lines), which is a fairly low amount of code for each new component. This is based on the similarities with Company A's product requirements, and on the data sets being predicted to be somewhat similar. Because of this, we predict that the implementation time will be reduced significantly from the previous implementation (114 hours) to approximately 80 hours.

6.6 Considerations for the future
The exciting aspect here is evaluating the hours spent to make a reference architecture that supports an SPL. Is it worth creating this rather than stand-alone products, considering the time spent? If you only compare the extra number of components and lines of code, the answer may be no, because it looks like relatively unproductive time. However, during the second phase, we had to learn a lot about how to create good generic code and about cleaning up and structuring the code.

The probability that product requirements from future customers are similar to Company A's and Company B's is relatively small. We predict that creating a product instance for Company B (R2) would not take that much time, because there is a lot of overlap in Company A's and Company B's product requirements and in the data sets on their employees. But when a third product (R3) is added, the requirements may change a lot, which may also require changes to the generic recommender engine, or maybe even to how similarity or ranking are calculated. The implementation phase would then be time-consuming, and a lot of work may appear that would have been unnecessary if a stand-alone product had been developed.

In the long term, it might be cost-beneficial: for each new instance of the recommender system, more experience is gained, so each new instance might require less effort and time to implement. It is normal to break even after three products (as figure 1 from chapter 3.1.1 shows). It is therefore relevant to evaluate how much is saved for each new product instance in the long term, and where the break-even point lies.

7 Implementation
In this chapter, we present the implementation of the final prototype we made. We start by presenting the overall architecture of the system Snapper has developed, which the prototype is constructed for. Then we present the architecture of the prototype we have developed, and finally, we present how the prototype is implemented in an SPL.

Machine learning applications are becoming popular in the industry; however, the process for developing, deploying and continuously improving them is more complex compared to more traditional software, such as a web service or a mobile application. They are subject to change in three axes: the data, the model and the code itself. Their behaviour is often complex and hard to predict, test, improve and explain.

”Hidden Technical Debt in Machine Learning Systems” by Sculley et al. (2015) highlights that only a small fraction of a real-world machine learning system is actual machine learning code. There is a vast array of surrounding infrastructure and processes to support its evolution. Furthermore, Sculley et al. discuss the many sources of technical debt that can accumulate in such systems, some of which are related to data dependencies, reproducibility, model complexity, testing and changes in the external world.

Many of the concerns are also present in traditional software systems, and Software Product Line Engineering has been the approach to develop software applications (software-intensive systems) using software platforms and mass customisation.

In addition to the code, changes to ML models and to the data used to train them are another type of change that needs to be managed and adapted into the software product line, as shown in figure 35.

Figure 35: The three axes of change in an ML application: data, model and code, and reasons for them to change

With this in mind, we shall look into how we have implemented a recommender system in a software product line.

7.1 Overall architecture
The architecture of the core Product A product has two types of clients: (1) a mobile application, and (2) a web application. The mobile client tier is written in React Native and shares a single codebase across both iOS and Android. The client tier for the web application is written in ReactJS, with the benefit of a similar API for both mobile and web development. The client tier is what the user interacts with to access the features of the application.

Both types of display layers share a common API served by the business logic tier. The business logic tier is represented by the application server, which acts as a bridge of communication between the client tier and the database tier.

Figure 36: Product A architecture (mobile and web clients communicate through an API with the application server, which uses the recommendation web service (candidate generation, candidate ranking and filtering) and the databases)

We propose that the recommender system with its components will live inside a distributed recommendation web service as figure 36 shows.

7.1.1 Architecture of the recommender system
Our recommender system follows the conventional structure of a top-N recommender and has the following components (as shown in figure 37):

1. Candidate Generation

2. Candidate Ranking

3. Filtering

4. Database

Figure 37: Process architecture (candidate generation from individual interests and item similarities, followed by candidate ranking, filtering and the top-N list)

Candidate Generation
During the candidate generation phase, items the user has indicated interest in through past activity are used as input, and a subset of items from the database that are similar to those items, based on aggregate behaviour, is returned.

Candidate Ranking
In the process of building up those recommendations, we assign scores to each candidate based on the selected properties, and candidates can be filtered out if they are not given a high enough score relative to the decision boundary.

Filtering
In the third stage, the system takes into account additional constraints before presenting the final list to the user. The filtering component handles these constraints: items the user has already rated, potentially offensive items, or items below a minimum threshold.

The filtering component acts as a stoplist, should certain terms and keywords be present in the titles, descriptions or categories of the items in the course catalogue. The stoplist can easily be updated and applied quickly should the need arise.

7.1.2 Architectural flow with recommender system
Figure 38 gives an overview of the overall process of requesting and retrieving the course recommendations.

Figure 38: Sequence diagram of how the recommender system works

When a user enters the course catalogue, the client sends an HTTP request to the RS Web Service to fetch a top-N list of course IDs that the RS Engine recommends. After the top-N list is fetched, the application server makes a request to the database to retrieve a list of the course metadata. When this task is completed, the application server sends the list of courses to the client, which presents the recommended courses to the user.
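A minimal sketch of this flow on the application server could look like the following. It assumes an Express-style route handler and two hypothetical helpers, rsClient.getTopN and db.getCoursesByIds, standing in for the RS Web Service call and the metadata lookup; none of these names come from the actual codebase.

const express = require('express');
const app = express();

// Hypothetical application-server route resolving recommendations for a user.
app.get('/courses/recommended/:userId', async (req, res) => {
  // 1. Fetch the top-N list of course IDs from the RS Web Service.
  const topN = await rsClient.getTopN(req.params.userId);
  // 2. Retrieve the metadata for those courses from the database.
  const courses = await db.getCoursesByIds(topN);
  // 3. Return the recommended courses to the client for presentation.
  res.json(courses);
});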

7.2 Implementation of a recommender system in an SPL
During our research we found that there are several aspects to consider when using a software engineering approach in which a cross-functional team produces machine learning components based on data, models and code in a software product line:

• (i) the approach should enable the teams to deliver high-quality software efficiently

• (ii) all prototypes of the ML software production process require different tools and workflows that must be versioned and managed accordingly

• (iii) the process of releasing ML software into production should be reliable and reproducible, even though the model outputs can be non-deterministic and hard to reproduce

• (iv) the release of software prototypes should be divided into small increments, which allows visibility and control around the levels of the variance of its outcomes, thus adding safety into the process

• (v) and preferably keep adaptation cycles short, to create a feedback loop that allows adaptation to the models by learning from its behaviour in production.

Figure 39 shows the recommender system pipeline for our prototype, which consists of five phases.

Figure 39: Recommender system pipeline

The first step is to understand the dataset. In our case, it was a collection of CSV files containing information about:

• the users, like their roles, their mandatory courses assigned to them, and their completed courses

• the courses, metadata about the courses

• the course reviews, feedback related to the completion and quality of the course

At this stage, with more massive data sets, it is common to perform some Exploratory Data Analysis (EDA) to understand the shape of the data and identify outliers and broad patterns.

In many organisations, including Snapper, the data required to train useful ML models, will likely not be structured precisely the way a Data Scientist might want; therefore it highlights the first technical component: discoverable and accessible data.

7.2.1 Pre-processing
Within our research and task scope, the core transactional systems deliver our source of data. However, in the real world, there might be value in bringing in other sources of data from outside the company. Regardless of the flavour of architecture, the data must be easily discoverable and accessible. Building useful models will take longer if it is hard to find the data needed. In the process, it should be considered that the Data Scientists would want to engineer new features on top of the input data that might help improve the model.

After doing Exploratory Data Analysis, we want to de-normalise the multiple data files into a single CSV file and clean up the data points that are not relevant or could introduce unwanted noise to the model. By using the CSV file to represent a snapshot of the input data, we can devise a simple approach to versioning our dataset based on the folder structure and file naming conventions. Data versioning changes along two axes: the actual sampling of data over time, and structural changes to its schema.


It is worth noting that, in future implementations, we will likely introduce a complex data pipeline to move data from multiple sources into a cloud storage system like Amazon S3, Azure Storage Account or Google Cloud Storage.

SPL approach to load and segregate data with requirements for model training

The comma-separated values (CSV) format is the most common import and export format for spreadsheets and databases. A CSV file is a delimited text file that uses a comma to separate values (Comma-separated values, 2020). Each line of the file is a data record and consists of one or more fields, separated by commas. The CSV file format is not fully standardised, which means that subtle differences often exist in the files produced and consumed by different applications.

This lack of standardisation can cause issues when designing a general-purpose component to handle loading and pre-processing of CSV files. There exist various libraries to handle the loading of CSV files. All of these libraries have a few challenges and limitations. They often lack the requirements to handle common machine learning issues like extracting columns, splitting- and shuffling of datasets, or the ability to convert values to the desired format.

After the initial approach of building a recommender system in a software product line, we recognised the lack of consistency in the datasets and the struggle of pre-processing our data. Instead of creating a product-specific variation each time, the process could be eased by creating a reusable component as a part of the core assets.

In the process of gathering requirements and designing our loadcsv prototype to be a part of the core assets of the product line; we established that loadcsv should be singular in purpose. loadcsv should be responsible for loading up a CSV file and then do a little bit of pre-processing.

This new loadcsv component would not have to conform to requirements outside the scope of the software product line. Where most CSV loaders written in JavaScript output the data as JSON by default, we could instead output it as a tf.Tensor. A tf.Tensor is a tensor, a generalisation of vectors and matrices to potentially higher dimensions, represented internally in TensorFlow as an n-dimensional array of base datatypes (TensorFlow tensors, 2020). As our recommender system will be implemented with TensorFlow, it is preferable to output the CSV in that format. Overall, consistency will improve as the developer can have everything in a single component with an easy interface to manage.

During the software engineering process, we expect that the loadcsv component would continuously evolve with new features while being kept in a releasable state. Consistency between relationships has to be maintained throughout software evolution, as prototypes ordinarily are not isolated but interrelated. Further improvements will be proposed, and design decisions will continuously have to be made. Thus, when evolving software, it is important to reflect on and build on former decisions. Otherwise, the influx of inconsistent decisions is likely to contribute to an erosion of the software architecture or introduce other quality problems. Reflection on former decisions is particularly crucial for long-living software systems where many decisions build on one another.

Documenting design decisions is vital since various developers are involved at different times and cannot communicate directly. In this initial implementation, we’ve defined a few requirement specifications for our CSV loader to be able to support our recommender system prototype in a software product line.

Requirements
Table 9 shows the requirements for the CSV loader to handle ML tasks.

Nr.  Requirement
1    Encoding should be the default, UTF-8
2    Loading a CSV file should yield no extraneous characters
3    Ability to pass in custom options when handling the data
4    Values should have the option to be converted to more appropriate measures, e.g. a Boolean true/false to a binary number 0/1
5    Ability to select only the required columns of data, not the entire set
6    Ability to shuffle the data
7    Use a custom seed phrase for debugging
8    Ability to split the data
9    Ability to handle large files

Table 9: Requirements for CSV loader

loadcsv has to be able to load large CSV files. To achieve this loadcsv uses createReadStream from the fs node module. createReadStream is a stream, a data-handling method used to read or write input into output sequentially. Streams help to handle reading/writing files, network communications, or any end-to-end information exchange in an efficient way.

To make loadcsv scalable and applicable to several use-cases, it was essential to avoid reading a file into memory all at once, and instead read chunks of data piece by piece, processing the content without keeping it all in memory.

This makes streams powerful when working with large amounts of data, for example, the file size can be larger than your free memory space, making it impossible to read the whole file into the memory to process it.

Streams provide two major advantages compared to other data handling methods:

• Memory efficiency: you don’t need to load large amounts of data in memory before you can process it.

• Time efficiency: it takes significantly less time to start processing data as soon as you have it, rather than having to wait with processing until the entire payload has been transmitted.

Streams also give the power of composability in our code. Designing with composability in mind means that several components can be combined in a certain way to produce the desired results. We have leveraged the pipe functionality in Node.js to transform data with the conversion option into the desired format.
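As a minimal sketch of the streaming idea, the following uses only Node.js built-ins (fs.createReadStream and readline) to process a CSV file line by line instead of loading it into memory at once. It illustrates the approach rather than reproducing the loadcsv code in appendix D.2.

const fs = require('fs');
const readline = require('readline');

// Yield one parsed row at a time while streaming the file from disk.
async function* streamCsvRows(filePath) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: 'utf-8' }),
    crlfDelay: Infinity,
  });
  for await (const line of rl) {
    // Trim each value to guard against stray whitespace and trailing commas.
    yield line.split(',').map((value) => value.trim());
  }
}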

Trim
Spreadsheet programs can export incorrectly and might accidentally add a couple of empty columns to the CSV file. Values are usually comma-separated, and an empty column would be recognised as trailing commas at the end of the row or the file, similar to ',,,,'. This is a prevalent error that can easily be prevented by trimming input values, ensuring no extraneous characters remain.

Conversion
Data does not always have the correct form of value and therefore needs to be converted before use. Our ideal solution is to pass in a function that can be used to parse just the selected values. Depending on the data set being used, a product can then use the module to transform the data as it sees fit.

Selection
Not all data fields are relevant when training a model. The developer should have the opportunity to select a few features and labels and disregard some columns entirely.

Shuffle
Shuffling the training data is generally good practice during the initial preprocessing steps. The purpose of shuffling data is to reduce variance and to make sure that models remain general and overfit less.

As described earlier, it is common to have an 80% / 20% split of training and test data. The split may ignore the class order in the original data set. A data set might, for example, include target values (class labels) that resemble the following:

For example: [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]

In the example above, splitting data without shuffling might lead to poor performance in test set evaluation. Only classes 0, 1, and 2 would be captured in the training data, and only class 3 would be represented in test data. The training/test/validation sets should be representative of the overall distribution of the data.

The ability to shuffle the entire dataset is crucial for algorithms such as K-Nearest Neighbor, Stochastic Gradient Descent, or when performing batch type testing.

Gradient Descent algorithms are susceptible to becoming ”stuck” in numerous local minima while a better (deeper/lower) solution may lie nearby. This is liable to occur if X is unchanged over each training iteration because the surface is fixed for a given X ; all its features are static, including its minima.

The idea behind batch gradient descent is that calculating the gradient on a single batch will return a reasonably good estimate of the ”true” gradient. This way, computation time is saved by not having to calculate the ”true” gradient over the entire dataset every time.

Data should be shuffled after each epoch to not run into the risk of creating batches that are not representative of the overall dataset, and thus, the estimate of the gradient will be off.

In Stochastic Gradient Descent, each batch has size 1. Data should be shuffled after each epoch to keep learning general and to create independent changes to the model. Otherwise, each gradient will be biased by the update from the previous data point (Gowda, 2017).

Furthermore, in our implementation, we need to address how to make the indices match up to the same labels. We could either (1) shuffle the data before it is parsed or (2) use the ShuffleSeed library. ShuffleSeed is a library for shuffling an array of records. ShuffleSeed has the option to use a seed phrase to determine how the array of records is shuffled. Given two identical phrases, the labels will be shuffled in the same order.

The side-effect is that subsequent shuffles, e.g. when rerunning the analysis, will always shuffle the data in the same order. This is beneficial when debugging, because developers get identical, derandomised results each time.

Split data
In essence, the whole point of a machine learning system is to be able to work with unseen data. The data is split into a training set and a test set. The training set is used to train the model, and the test set is used to test the accuracy of the model. Typically the distribution between the sets is 80% / 20%.

In this implementation, we use the simplest method of splitting data, which is to split it serially. The first 80% of the rows are put into the training set, and the remaining 20% are put into the test set. In order to prevent overfitting, the distribution of training samples for each classification (or range, if the label is a real value) has to be fairly equal. If the training data is overly unbalanced, the model will predict a non-meaningful result. We can achieve this by shuffling the dataset before it is split.

In the current implementation, 80/20 partition is the default split value unless you specifically decide to specify a number.
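The following sketch shows the combination of deterministic shuffling and an 80/20 serial split. The prototype uses the ShuffleSeed library for the shuffling; here an equivalent seeded Fisher-Yates shuffle is written out with a simple pseudo-random generator so the example is self-contained, which is an implementation choice of this illustration rather than of loadcsv itself.

// Deterministically shuffle rows using a numeric state derived from the seed phrase.
function seededShuffle(rows, seed) {
  let state = [...seed].reduce((acc, ch) => (acc * 31 + ch.charCodeAt(0)) >>> 0, 1);
  const random = () => {
    state = (state * 1664525 + 1013904223) >>> 0; // simple linear congruential generator
    return state / 4294967296;
  };
  const shuffled = [...rows];
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return shuffled;
}

// Shuffle, then split serially: the first 80% become training data, the rest test data.
function splitData(rows, seed, splitRatio = 0.8) {
  const shuffled = seededShuffle(rows, seed);
  const cut = Math.floor(shuffled.length * splitRatio);
  return { train: shuffled.slice(0, cut), test: shuffled.slice(cut) };
}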

A screen shot of the code in loadcsv.js can be found in appendix D.2.

7.2.2 Model training
Once the data is available, we move toward the iterative workflow of model building. This usually entails splitting the data into a training set and a validation set (which we have already achieved with our CSV loader), and then trying different combinations of algorithms and tuning their parameters and hyper-parameters to produce a model that can be evaluated against the validation set, to assess the quality of its predictions. The step-by-step of this model training process becomes the machine learning pipeline.

Figure 40: ML pipeline for the recommender system

Figure 40 shows how we structured the ML pipeline for our recommender system, highlighting the different source code, data, and model components. The input data, the intermediate training and validation data sets and the output model can potentially be large files, which we don’t want to store in the source control repository. Also, the stages of the pipeline are generally in constant change, which makes it hard to reproduce them outside of the local environment but something we have tried to address with seed randomisation in loadcsv.

In order to formalise the model training process in code, we used TensorFlow, which provides tools to solve ML-specific problems. We had a background in web development and no knowledge of machine learning prior to the thesis. We chose the TensorFlow framework for our second approach of building a software product line for the recommender system, since we had prior experience with Node.js and JavaScript. We also think it will be a benefit to use the same language across the entire stack. TensorFlow is an open-source library for numerical computation and large-scale machine learning. TensorFlow distributes processing across various CPU cores on the computer, or across multiple machines on a network, and enables solving computing problems in a distributed manner. The ability to run on various machines allows processing to be pushed down to the end user's device. A lost internet connection in an autonomous device, e.g. a car, can have fatal consequences; instead, the neural network could run on an embedded computer in the vehicle.

At a low level, TensorFlow distributes mathematical operations on groups of numbers, or tensors, where a tensor is an array or a matrix of values.

The implementation of our recommender prototype has been written in JavaScript with the tfjs library. TensorFlow.js is an open-source, hardware-accelerated JavaScript library for training and deploying machine learning models [https://github.com/tensorflow/tfjs]. TensorFlow.js provides native TensorFlow execution under the Node.js runtime with the same TensorFlow API.

Design constraints when building a recommender system for Snapper
We ran into the issue that most of the courses in Snapper's course catalogue are mandatory. Making recommendations based on previous behaviour with a small course catalogue (approx. 100 courses) can be hard. When designing the recommender system, we decided to base our recommendations on explicit data, e.g. feedback from the user.

We decided in this implementation to look at the problem as a classification problem with multiple output values. We would like to make a prediction on how a user would rate a course based on past ratings by other users.

Engine implementation
We decided to solve the task with multinomial logistic regression to classify which star rating a user would give a course, based on explicit and implicit data, and then use the model to predict the user's rating for multiple courses and present a top-N list. Multinomial logistic regression is a regression model that generalises logistic regression to classification problems where the output can take more than two possible values (as described in chapter 3.2.3). Logistic regression makes use of one or more predictor variables that can be either continuous or categorical and predicts the target variable classes. The model output helps identify the important factors (Xi) that impact the target variable (Y) and the nature of the relationships between each of these factors and the dependent variable; in other words, it is useful in identifying the relationships of various attributes, characteristics and other variables to a particular outcome.

While we use the model in a limited form for a recommender system, the purpose of the model is to be reusable, enabling other use-cases in the software product line. Based on the attributes of a user or employee, e.g. gender, income, age, competences, and so forth, analysis with the model can check the level of likely satisfaction with, for example, the job/position, a course or a product.

Implementation details

Figure 41: LogisticRegression class diagram (methods: gradientDescent(), train(), test(), predict(), recordCost(), updateLearningRate(), standardise())

Our engine.js file consists of a multinomial logistic regression class, shown in figure 41. The following sections explain each function in the class.

A screen shot of the code in engine.js can be found in appendix D.1.

gradientDescent()
gradientDescent() is given a set of parameters chosen at random, and then uses the learning rate to adjust those parameters until the error is minimised, trying to find the local minimum. The learning rate has to be continuously updated and is handled by the updateLearningRate() helper method.

The measurement of error in the learning system is the cost, i.e. how close the predicted values are to the observed values. gradientDescent() is implemented with the sigmoid function. Given that the sigmoid function is convex for values less than 0 and concave for values greater than 0, the slope can possess multiple optima. With either a single concave or a single convex function, depending on our position relative to the global minimum of that line or equation, we can increase or decrease B. The formula for gradient descent is shown in equation 8.

\theta_j = \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \qquad (8)

Gradient descent formula
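As an illustration of the update in equation 8, the following TensorFlow.js sketch performs one gradient descent step for the sigmoid-based model described above. It is written as a standalone function for readability; the actual engine.js code (appendix D.1) is organised as class methods and may differ in detail.

const tf = require('@tensorflow/tfjs');

// One gradient descent step: features, labels and weights are tf.Tensors,
// learningRate is a number. Returns the updated weights tensor.
function gradientDescentStep(features, labels, weights, learningRate) {
  // Current predictions: features x weights pushed through the sigmoid function.
  const currentGuesses = features.matMul(weights).sigmoid();
  const differences = currentGuesses.sub(labels);
  // Slope of the cost function with respect to the weights.
  const slopes = features.transpose().matMul(differences).div(features.shape[0]);
  // Equation 8: move the weights against the slope, scaled by the learning rate.
  return weights.sub(slopes.mul(learningRate));
}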

recordCost()
In recordCost(), a cross-entropy loss function is used with the classification model, as it returns a probability value between 0 and 1. We utilise cross-entropy instead of Mean Square Error because it penalises incorrect classifications more than the close ones. The cross-entropy formula is shown in equation 3.

train()
Both Batch Gradient Descent and Stochastic Gradient Descent can be utilised in our code to get faster convergence on the values of M and B than attempting to iterate through the entire dataset before updating. Using either approach will help find the optimal values of M and B with fewer iterations.

Essentially the process of Batch Gradient Descent is (1) guessing the starting value of B and M, (2) calculating the slope of cross-entropy using a number of observations in the feature set with current M and B values, (3) multiplying the slope with the learning rate and (4) then updating B and M.

This implementation works for both Batch Gradient Descent and Stochastic Gradient Descent. With Stochastic Gradient Descent, the process would be the same, but we would use one observation at a time to update B and M instead. Choosing either Batch Gradient Descent or Stochastic Gradient Descent is done when configuring the initialisation of the regression class.

predict()
predict() returns the prediction of an observation. The observation is matrix multiplied with the weights and the sigmoid equation is applied to the result. Sigmoid takes a real value as input and outputs another value between 0 and 1. Equation 9 shows this calculation, where Prob is the calculated output.

Prob = \frac{1}{1 + e^{-(m \cdot x + b)}} \qquad (9)

Sigmoid equation

test()
The test function uses the testFeatures and testLabels tensors with the predict function, and then utilises the marginal probability distribution to consider possible output cases in isolation. We can use the prediction accuracy as a criterion for assessing model performance (formula 10). The prediction accuracy can be used as a threshold, e.g. a prediction accuracy above 70% is useful.

\frac{predictions - incorrectPredictions}{predictions} \qquad (10)

Accuracy formula

standardise() and processFeatures() The class has two other helper methods (i) standardise(), and (ii) processFeatures().

(i) standardise() rescales attributes, so they have a mean value of 0 and a standard deviation of 1. Standardisation is used because it accounts for more extreme values than normalisation does.

(ii) processFeatures() takes in a feature set and utilises the standardise helper method to return a standardised feature set.

Hyperparameter optimisation

Table 10 shows the hyperparameters we used in engine.js for the multinomial logistic regression class.

Parameter      Description
iterations     Number of times the batch size is passed through the algorithm
batchSize      Number of entries in the training set utilised in one iteration
learningRate   The amount that the weights are updated each iteration, varying from 0 to 1

Table 10: Description of the hyper parameters we use

In our dataset of approximately 1000 entries, we split the training and test set into an 80/20% split. We found that a high number of iterations (above 100) and a small batch size (less than 32) increase the probability output for the model, i.e. it is more accurate. The learning rate did not change the output much unless it was above 0.5, which in turn made the model less precise.

The final set of parameters with a test set of approximately 200 entries is shown in table 11.

Parameter      Value
iterations     100
batchSize      10
learningRate   0.5

Table 11: Values for the hyper parameters we used
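A hypothetical initialisation of the regression class with these values could look like the following; the constructor signature is an assumption made for illustration and may not match engine.js exactly.

// trainFeatures and trainLabels are tf.Tensors produced by loadcsv.
const engine = new LogisticRegression(trainFeatures, trainLabels, {
  iterations: 100,   // passes of the batch size through the algorithm
  batchSize: 10,     // entries of the training set used per iteration
  learningRate: 0.5, // how much the weights are updated each iteration
});
engine.train();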

We only conducted a high-level analysis of the hyper-parameters to see how the probability output of the model changed. Our reason for simplifying this process is that this is an area that would otherwise have required multiple datasets of higher quality and an in-depth dive into the topic to be useful for our research on machine learning components in a software product line.

Once we are satisfied with the model, the model has to be treated as a prototype that needs to be versioned and deployed to production.

7.2.3 Model serving
Once a suitable training model is found, it has to be decided how it will be served and used in production. In our initial approach, as mentioned earlier, we utilised an embedded model. This is a more straightforward approach, where the model prototype is a dependency built and packaged within the consuming application. The application prototype and version is a combination of the code and the chosen model. This was a lot simpler, since both the application code and the chosen model were written in JavaScript.

In this second approach, the model shall be deployed independently of the consuming applications. This allows updates to the model to be released independently, but it also introduces latency at inference time, as there will be a remote invocation required for each prediction.

Many of the cloud providers have SDKs and tools to wrap the model for deployment into their MLaaS (Machine Learning as a Service) platforms, such as Azure Machine Learning, AWS Sagemaker or our chosen Google AI Platform. We have decided to go with the Google AI Platform as there are more dedicated resources for deployment with TensorFlow. When thinking about which provider to choose, it is crucial to evaluate the right option for the product needs. There is no clear standard (open or proprietary) that can be considered optimal since it is a current area of development, and various tools and vendors are working to simplify this task.
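Independently of the chosen platform, the essence of the model-as-a-service pattern can be sketched as a thin HTTP wrapper around the trained model. The Express route, port, and weights file below are illustrative assumptions and not part of the Google AI Platform deployment itself.

const express = require('express');
const tf = require('@tensorflow/tfjs-node');

// Assumed: the trained weights were exported after training (e.g. as a JSON
// file versioned alongside the service) and can be reloaded here.
const weights = tf.tensor(require('./weights.json'));

const app = express();
app.use(express.json());

// Every prediction is a remote invocation: the consuming product posts its
// already processed features and receives probabilities back.
app.post('/predict', (req, res) => {
  const observations = tf.tensor(req.body.observations);
  const probabilities = observations.matMul(weights).sigmoid();
  res.json({ probabilities: probabilities.arraySync() });
});

app.listen(3000);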

Regardless of using a pattern with an embedded model or model as a service, it is worth noting that there is always an implicit contract between the model and its consumers. In our case, the model expects input data in the shape of a tf.Tensor, and changes to the contract, such as requiring new input or adding new features, can cause integration issues and break the applications using it. Therefore the application and the model must be documented and tested thoroughly in the Software Product Line.

7.2.4 Testing, experimentation and evaluation Several types of tests can be included in the ML workflow. Overall, testing and quality for ML systems are complex and should be the subject of another in-depth paper. Notwithstanding, we will propose a few tests that can add value and improve the overall quality of the ML system:

Validate data A validation test can be utilised to validate input data against the expected schema (e.g. not

null or fall within expected ranges). For engineered features, unit tests can ensure that the features are calculated correctly, e.g. that numeric features are scaled or normalised and that missing values are duly replaced.
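As a sketch, such a validation test could assert the expected schema for a single input row before it reaches the pipeline; the field names and ranges below are hypothetical examples, not the actual schema of our dataset.

const assert = require('assert');

// Hypothetical schema check for one row of input data: required fields must
// be present and numeric fields must fall within expected ranges.
function validateRow(row) {
  assert.ok(row.courseId, 'courseId must not be null');
  assert.ok(Number.isFinite(row.progress), 'progress must be a number');
  assert.ok(row.progress >= 0 && row.progress <= 1, 'progress must be within [0, 1]');
}

// Example usage in a test: the second row should be rejected.
validateRow({ courseId: 'abc-101', progress: 0.4 });
assert.throws(() => validateRow({ courseId: null, progress: 0.4 }));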

Validate component integration For validating component integration, a similar approach to the testing of integration between services can be utilised. Contract Tests can be used to confirm compatibility between the expected model interface and the consuming application. The exported model should still produce the same results when the model is productionised in a different composition. Relevant tests can be performed by running the original and productionised models against the same validation dataset to ensure that the results are equal.
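A sketch of such a consistency test is shown below; it assumes both the original and the productionised model expose a predict function returning a tf.Tensor and that both are run against the same validation dataset.

const assert = require('assert');

// Hypothetical contract test: the productionised model must produce the same
// results as the original exported model on the same validation dataset.
async function testModelConsistency(originalPredict, productionisedPredict, validationFeatures) {
  const expected = (await originalPredict(validationFeatures)).arraySync().flat();
  const actual = (await productionisedPredict(validationFeatures)).arraySync().flat();

  assert.strictEqual(actual.length, expected.length);
  actual.forEach((value, i) => {
    assert.ok(Math.abs(value - expected[i]) < 1e-6, `prediction ${i} differs`);
  });
}

// Usage: compare the embedded model against the productionised one, e.g.
// testModelConsistency(embedded.predict, remote.predict, validationFeatures);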

Validate model quality We are interested in the accuracy of the model. Metrics such as error rates, accuracy, recall, etc. can be used to evaluate a model’s performance (also useful during hyperparameter optimisation). Threshold Tests of the metrics can be included in our pipeline as quality gates to guard against new models degrading below a known performance baseline.
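Such a quality gate can be as small as the sketch below; the baseline value and function name are assumptions for illustration.

const assert = require('assert');

// Hypothetical quality gate in the pipeline: fail the build if the candidate
// model performs worse than the known baseline on the validation set.
const BASELINE_ACCURACY = 0.7; // assumed baseline

function accuracyQualityGate(candidateAccuracy) {
  assert.ok(
    candidateAccuracy >= BASELINE_ACCURACY,
    `accuracy ${candidateAccuracy} degraded below baseline ${BASELINE_ACCURACY}`
  );
}

accuracyQualityGate(0.74); // passes; a value below 0.7 would fail the pipeline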

Validate model bias and fairness There might be an inherent bias in the training data with more data points for a given value of a feature (e.g. gender) compared to the actual distribution in the real world. Thus, tests should be included to check how the model performs against baselines for specific data slices.

Managing and curating test data is essential, but trying to assess a model’s quality holistically is hard. If we compute the same metrics against the same dataset over time, we can start to overfit to it. Also, if we have other models already live, it is crucial to ensure that the new model version will not degrade against unseen data. When models are distributed or exported to be used by a different application, issues can occur with engineered features being calculated differently between training and serving time. An approach equivalent to Integration Tests in traditional software development that could help catch these types of problems is to distribute a holdout dataset along with the model artefact. The consuming application team can use the holdout dataset to reassess the model’s performance after it is integrated.

We will not delve any deeper into the topic of testing during this thesis, but it is undoubtedly essential when developing software product lines with machine learning components. Given a

chance, we will research this in the future.

8 Results from evaluation We have conducted a case study where the prototype has been analysed and observed by three test subjects or experts. We then interviewed each expert to get answers to questions related to the prototype and the research questions. Two of the experts are software developers; in their interviews, we focused on the architectural point of view. In the last interview, with the CTO from Snapper, we focused on a business and process view (as figure 16 shows, from chapter 4.6).

We also published a survey through Google Forms with 11 questions on recommender systems and people’s trust in them. After publishing the survey, we received 157 replies.

We have read through the three interviews separately and analysed the results according to the methods presented in chapter 4.6. We have been marking the text with a colour code (representing each research question) to structure the interviews, and then compared them. For the survey we have presented the data we got using descriptive statistics to analyse the results (also described in chapter 4.6). In the following subsections (8.1 to 8.4) we present each research question and a summary of the results and findings we got from the interviews relevant to each question. In the last of these subsections (8.5) we present the results from the survey.

8.1 RQ1: How possible is it to create machine learning components that work for multiple products in a software product line? Architecture In the code base of the prototype, there are no repetitive lines of code that require any useful abstractions. If the code base grows in the future because more features are added and the complexity increases, it might become relevant; for now, it may only create overhead. The code follows “best practices”, as it does what it is intended to do, and is written in a clean and simple way. Both experts would have written the syntax with minor changes (e.g. one expert prefers arrow functions).

The codebase fulfils the functional requirements and is structured as the predefined architecture specified. It has a good implementation of TensorFlow and Lodash (JavaScript framework) and has few hard-coded variables; the ones that are used are necessary to have. The CSV loader can handle large amounts of data but could improve further in this regard. To create a new product for

the SPL, one expert estimates a workload of 2 to 3 days, and the other suggests 1 to 2 days and half a day to modify the CSV loader.

Business The impact an SPL with ML has on business performance is that the system (that Snapper is developing for Company A) may become more relevant by offering courses to the employees. If there is something the current system lacks, it is the display of relevant information; the current version has a lot of static information. With an SPL, Snapper can offer a product that the end-user will experience as a more flexible and relevant system. This relieves focus from the content distributor.

SPL engineering can be introduced to the company (Snapper) by having better structuring on the work routine, and more focus on gaining the right competence. It is important to ask “What is it that the employees need to learn in order to have the necessary tools and experience?”.

The SPL prototype would have a remarkable impact on the user’s interaction with the system and increase its value and its future evolution. It may, therefore, be very marketable for the company. The expert claims that ”it is easier to develop ML and AI now, more than before. Customers (e.g. Company A or Company B) will have more faith and belief that the product they are buying is relevant to the time”. The expert believes that the process of adopting new products to the model or creating new instances to the SPL would be cost-effective in the long term.

The market is very mature but is maturing day by day. Customers rarely change their systems because of the content and experience required to do so, and roles and requirements affecting it at the same time. They are rarely in a position where they can change system providers. The normal time for a project with all the data is a process that is spanning over a year. Trends (courses, content, structure and design of systems) are changing, but the system itself is rarely swapped out.

Process The process of adopting new instances to the SPL may require a cross-functional team, but that depends on what the tasks are. To add a new service in the form of black-box or to integrate an API that communicates with a web service (that is the ML model), it may not. But to develop and maintain a full lifecycle of an SPL with ML components, a cross-functional team is required

internally in the company. In the cross-functional team, you need someone to validate the data and someone to develop the system or the service. This team does not have to be divided into domain- and application-specific teams.

SPL may give better quality to the end-user in the form of usability, but integration with other systems can increase risk and the need for testing. The experts point out that ”there are risks when integrating with third-party services regarding versions and change”. In terms of reliability and safety, it might not have too big of an impact in the short term, but over time it can have valuable benefits. There will always be some start-up expenses that are needed to be accounted for as well.

Summary Creating ML components for an SPL is possible, and it is easier if the components are well structured and written cleanly and simply. Abstractions may only lead to overly complex integrations. It can be very beneficial for companies to implement an SPL with ML components, but it requires the right structure and competence in the company. The process of doing so does not need a cross-functional team.

8.2 RQ2: How reusable are machine learning components in a software product line? Architecture The codebase follows the standard guidelines for the structure, no specific design pattern has been used, but the experts evaluated it as not necessary. Default configurations are implemented by passing them through parameters and deconstructing them afterwards. The functions and classes are well-sized and with the right amount of responsibility. The main class can be destructured a bit if it is preferable, but not needed to improve the reusability. One of the experts also said that ”the CSV loader could have a bit more structure for making it easier to reuse”.

The code is reusable in terms of following the DRY (Don’t Repeat Yourself) principle. The components, services and functions are reusable for an SPL. In terms of generic functions and classes, the code is also reusable. The code follows the OOAD (Object-Oriented Analysis & Design) principle and, partially, the single responsibility principle. Both experts point this out, and one expert further stated that ”this is not necessary unless the plan is to integrate for more formats”.

The codebase meets the non-functional requirements, but it depends on which ones are relevant to measure. The important ones that the code base follows are readability, supportability, performance and extensibility. The code snippets in themselves are not that reusable because they are quite specific. The experts said that ”you cannot reuse them in other parts of the application except the helper methods. That would not be the goal anyway”. On a high level, the CSV loader component and the engine components are entirely reusable and can be reused to compose other variations of the ML model.

Business

You start by creating a general approach (RC), this would be based on the first product instance (Snapper Bits, which is already made). Then you start adding new products to the product line, for example, Company A and Company B, being product R1 and R2. NIF is also a typical customer for Snapper Bits.

Process ”The SPL would contain as many products as possible that it can support”, but the experts also state that it is hard to give a precise estimate of this. Everyone (customers) needs such products, but there may be a lot of product-specific changes to the products at the start. When maturing the SPL, the products might become more general in the future, to get more out of them. ”How many of the products need product-specific configurations depends on how long it takes to implement the product-specific configurations”, states the expert.

Summary ML components can be reused but should be well-sized and structured in order to do so, and with a fitting amount of responsibilities. The CSV loader and the engine are very reusable if properly written. Create a specific product instance and then derive it into generic components that can serve the SPL with as many products as needed.

8.3 RQ3: How feasible is it to create and consume reusable machine learning models in a software product line? Architecture Both experts pointed out that ”the code is easily maintainable”. And that ”through an API the SPL can consume and reuse ML models”. With the default parameter option, the code can be extended without affecting the API.

In the experts’ opinion, the code follows the right conventions and makes good use of available libraries. A reusable CSV loader allows for more SPL products to consume and use the same ML model. One expert pointed out that in the CSV loader, it might be useful to add the option to use delimiter values if the data sets vary.

Business The products from the same SPL can support each other over time on the market because some customers ask for the same functionality in their product. For example, Company C wants similar functionality to what Union and M2 want. The expert stated that ”it may be possible for customers to merge an order, or to have the same order on a product that is composed of new components, and share the expenses”.

In a growing market, it can be relevant to branch out to smaller customers with less strict requirements, that have less information flow. The expert claimed that ”it is easier to compete in a less mature market with fewer “edges”.

Process The process of adopting new products to the ML model would not be time-consuming or demanding. Snapper is doing similar work today, so the expert said that this is possible. He also estimated it to be cost-effective in the long term. However, it is still a risk to utilise ML in applications for Snapper because it is a relatively small company.

The SPL prototype with a simple recommender system is a “basic” requirement for the customer. The expert said that ”when you get a model that serves as a personal trainer that monitors the evolution over time of an end-user, then you are in the “delighters”-category”. At the start, a lot of the products are product-specific but become more general after some time. When the products become more general, the expert claimed that ”it is easier to use the same ML model for multiple products in the SPL”.

Summary Multiple products in an SPL can consume the same ML model through an API. A shared and reusable CSV loader, together with good use of libraries, can also allow for this. If multiple products consume the same ML model, they can support each other over time, and be beneficial for each other. This is easier when the SPL contains more products.

8.4 RQ4: How can you support a Software Product Line Evolution containing Machine Learning Components? Architecture Both experts had the opinion that the codebase and the interface are relatively simple and easy to understand, but it requires some initial explanations. Useful abstractions may come at the expense of readability and maintainability. The experts said that ”as of now, the codebase seems easy to maintain, debug and test”.

The seed phrase in the CSV loader makes it easy to test. The function splitTest should be renamed. It is useful to know that it is index-based. A feature should be included to show that it is formatted to display the results. One expert’s opinion on the CSV loader was that it was a ”good implementation that it converts the results straight from the input, by placing an object in and receiving an object out”. One expert stated that there was no need to split the code in the CSV loader more. The engine is well structured. Without the stream implementation in the CSV loader, the system would have been struggling with large files. One expert stated that ”the prototype could batch the file input, but it looks like the right solution”. The code, therefore, can handle huge amounts of data.

The code is readable, supportable and extensible and with excellent performance (hence it is easy to support the code for further evolution of the SPL). These are some important non-functional requirements. There is only a little hard coding and some configuration values. Both experts pointed out that framework features are used extensively.

Business As it is today, customers would regularly deliver more requirements about functionality to the system Snapper delivers. The expert claimed that ”the SPL will evolve on how relevant Snapper wants the system or application should be towards the end users”. He pointed out that the plan is to use the same component library, but changes will mainly be focusing on content and data handling rather than the codebase itself.

The plan is that Product A (currently existing app for Company A) becomes a product in the

SPL (Rn). Some of its functionalities are drawn out to a general system. Existing products will become a part of the SPL, and then some products from the same SPL can supplant each other over time. But this will happen more as a natural evolution rather than products out-competing each other.

Maturing the SPL is a step-by-step process, and the expert estimates it to go from prototype to production code and then to a product that is ready to be deployed. This will take between 3 and 6 months. The prototype is a simple “proof of concept”. A product has several shapes, and a first version may also have a lifetime of 3 to 6 months before it is replaced by a newer version that offers more value. This is a process that has to be conducted iteratively until the product is mature enough.

Process Dividing the team into domain- and application-specific teams can be interesting and relevant if the development team grows into a large group. Then, it can be relevant to have someone just working on core functionality, and to have dedicated software developers for specific tasks or problems. The expert claimed that ”when you have developers that work on core functionality and application implementation simultaneously, there can be scenarios where they can take an easy way out and use “quick fixes” to solve a problem faster”. He further explained that a dedicated core developer would have more interest in developing for stability and arranging for expansion and evolution. In contrast, an application developer often works to deliver a product on a short deadline.

It is good practice to learn from big suppliers that have experience with delivering services and systems like this when creating the architecture. They can offer good availability of the system. Using them when developing services like authentication, file management, and preferably ML components, will make the development easier and ensure more security, stability and quality in the product.

The expert pointed out that ”the process of supporting product-specific configurations usually is work for consultants, rather than software developers”.

Summary Making readable, structured and extensible code makes it easier to maintain and support while being part of an SPL. Abstractions and hard coding can make it harder to support ML components. Maturing an SPL is an iterative process, and it takes time. If a large development team is working on an SPL, it might be relevant to split up the team with different areas of responsibilities.

8.5 RQ5: How does a software product line affect the quality of its recommender systems? Before the respondents began answering the survey, they read a short description of the background of what recommender systems are. The complete survey with description and questions can be found in appendix C. When we describe different options to choose from, they range from 1 to 7. Here, 1 is not likely or not important, and 7 is likely and important.

The respondents ranged in age from 15 to 80. Of all the 157 replies, 77.4 % understood what a recommender system is, and only 3.2 % did not understand it. 19.4 % understood the gist of it. 36.7 % of the respondents said they are exposed to recommender systems daily, and 30 % said they are exposed multiple times daily. That is a total of 66.7 % (or 2/3) of the respondents who say they interact with recommender systems on a daily basis.

Figure 42 shows the respondents’ answers to whether they think recommender systems will help them make decisions more efficiently or not. 22.3 % (35 people) believe that it is very likely (answering option 7) that the recommender system will do so. 80 % (120 people) chose option 5 or better, which means that they say it is likely or very likely.

Figure 42: Diagram showing the degree people think recommender systems make them do decisions more effectively

Figure 43 shows the respondents’ answers to whether they lose trust in recommender systems if bad recommendations are given by them. 54.6 % believe that it is very likely (answering option 6 or 7), and only 22 % think that it is not likely or less likely (answering option 1, 2 or 3).

Figure 43: Diagram showing whether people lose trust in recommender systems if bad recommendations are given

38.7 % of the respondents say that they are sometimes annoyed by bad recommendations. 29 % say they are often annoyed, and 16.1 % say that they are annoyed very often. Only 6.5 % say that they are never annoyed.

When it comes to the question of whether people want to know if a recommender system is responsible for displaying content or not, the respondents’ answers are evenly divided. 58.1 % say that it is in between important and not important that this is described (answering option 3, 4 or 5). The same goes for the question about whether people want to get an informative explanation of why they received a certain recommendation. The response is evenly divided, where circa 60 % answered that it is reasonably important (answering option 3, 4 or 5).

Figure 44: Diagram showing the distribution of different categories of services from which people generally like to receive recommendations

Figure 44 shows the respondents’ answers on which categories of services they generally like to receive recommendations from. On this question, respondents were allowed to choose as many answers as they wanted (hence more replies than 157). The majority of the answers are in social media services, with 68 answers. We also see a high amount of respondents answering E-commerce services (54 answers) and E-learning platforms (39 answers).

Summary In our survey, we did not explicitly ask for the opinions of the respondents towards software product lines as we have noticed a general lack of knowledge on the topic, even in the IT community. Therefore, we were most interested in opinions on the quality of recommender systems. Our purpose was to gain insights into how recommender systems influence the attitudes of the respondents. An issue that became clear by conducting the survey is that poor recommendations can be harmful to a business. As we will discuss in a later chapter, the goal of numerous software product line engineering approaches is automatic product derivation. This

leads us to the conclusion that we should look for an approach that will let us build software product lines in a flexible way and that will ensure the product quality of the recommender systems. A sub-optimal or next-best approach will not suffice.

We state this because a majority of the respondents answered that they lose trust if bad recommendations are given. This implies that accurate recommender systems are important for businesses to gain benefits from them. A majority of the respondents also say that they get annoyed by bad recommendations, which substantiates this argument. In the final question, we see that a large share of the respondents answer that they generally receive recommendations from either E-learning platforms, publishing platforms or E-commerce services (a total of 116 of 156 respondents). This is a high number, and it is important to study because, in these services, SPLs have a great influence on the market, and are used frequently.

9 Lessons learned In this chapter, we will discuss why we chose to use a reactive and feature-oriented composition-based approach to composing a software product line. Our discussion on the topic will be based on first principles to establish why it serves as an excellent foundation to derive products in an SPL. Afterwards, we will discuss the building blocks of the composition-based approach, namely components and services. The discussion will consist of how to reuse and size the components. Our result on this topic is derived through an exploration of previous research articles and books on the matter.

Moreover, we will discuss version-control and feature tracking based on our own experience in building software. Lastly, we will discuss our judgments and results of recommender systems in software product lines. The results of the aforementioned subject will be heavily based on our survey results and experience gained through a DSR approach with the results from the initial and implementation phase. The discussion chapter will give a holistic view of our research results and propose what we consider an ideal approach to build software product lines with machine learning components.

9.1 Reactive and feature-oriented approach The process of software development is to break down large problems into smaller problems by building smaller components that solve those smaller problems and then compose these components to form a complete application. The atomic units of composition are functions and data structures. The composition of these atomic units defines application structure. The Gang of Four state ”Favor object composition over class inheritance” (Gamma et al., 1994, p. 32). Class inheritance can be used to produce composite objects, where a class inherits parts of its functionality from a parent class and extends or overrides the parts. This approach causes a lot of design problems with is-a thinking, e.g., ”a duck is-a bird”. A few changes to the base class can potentially break a large number of descendant classes due to tight coupling, also known as the fragile base class problem, even breaking code that the author is not aware of. Besides, the inflexible hierarchy problem can arise with a single ancestor taxonomy. During time and evolution, all class taxonomies are eventually wrong for new use-cases. Due to inflexible hierarchies, new use cases are often implemented by duplication, rather than by extension. This can lead to similar, unplanned variant classes. Once duplication occurs, it might not be obvious which class new classes should descend from, thus leading to the duplication by

necessity problem.

Duplication can be removed by writing a component (a function, module, class, etc.), giving it a name (identity), and reusing it as many times as needed. Successful abstraction means that the result is a set of independently useful and recomposable components. Abstrahere is the Latin word for abstraction and means ”to draw away”; in the context of software development, it is the process of considering something independently of its associations. Hence, software solutions should be de-composable into their component parts, and re-composable into new solutions, without changing the internal implementation details.

The process of abstraction is done by generalisation and specialisation. Generalisation implies the process of finding similarities in repeated patterns and hiding the similarities behind an abstraction. Specialisation is the process of supplying only the things that differ for each use case. Abstraction is the process of deducing the underlying essence of a concept. By leaving our headspace for a moment and viewing a problem from a different perspective, we can explore the common ground between various issues in different domains. Proceeding into this thesis, we aspired to see software product lines from first principles. First-principles thinking is the act of understanding a process from the fundamental parts that are true and then build from there (Clear, 2017). Building a software product line should emulate the process of building software by its most fundamental parts. In feature-oriented product line engineering, two approaches are used widely: annotation-based and composition-based. They differ in the way they represent variability in the code base and how they produce products. Component composition favours having all features as independent parts rather than merged in a single code base.

9.1.1 Composition-based approach Consequently, a composition-based approach locates code belonging to a feature or feature combination in a dedicated file, container, or module. This can relieve the overall architecture of the duplication by necessity problem. The software has to be updated (or changed) to stay relevant and deal with varying conditions such as new customer requirements, changes to the environment, security threats, etc. We believe in the importance of the Agile Manifesto and staying agile as, during time and evolution, all taxonomies are eventually wrong for new use-cases. Issues arise as the whole software product line process involves a form of pre-planning, and information about anticipated variations is a prerequisite for the proper design of a product line. A way to deal with this is by anticipating potential requirements in terms of features over

the entire domain, instead of as individual software systems.

9.1.2 Reactive approach A reactive approach comes closest to staying true to the Agile Manifesto, as it begins with a small and easy-to-handle product line (which might only consist of a single product) and is extended incrementally with new features and implementation artefacts (extending the product line’s scope). The software product line begins at SPL0 as an initial version release of the software product line. Afterwards, the product line can progressively grow in incremental steps from

SPLi to SPLi+1 towards its ideal. The ideal would mean covering the full variation spectrum as defined during domain analysis (which might also be incremental).

The reactive approach allows for exploration and characterisation of the requirements leading to a new product currently not covered by the product line. Besides being an adoption path, the reactive approach is implicitly the way the other two adoption paths would maintain and evolve their product lines over a lifetime. A reactive approach substantially simplifies the entire process.

Strunk Jr. states that ”...parallel construction requires that expressions of similar content and function should be outwardly similar. The likeness of form enables the reader to recognise more readily the likeness of content and function” (Strunk JR., 1999, p. 35).

Through further reasoning, we may say that a reactive approach is more suitable as it requires less planning than a proactive approach. The trade-off is that including a feature may be invasive and expensive, as the SPL was not designed with that feature in mind. At the same time, the reactive approach is more structured than the extractive approach because each iteration follows clear planning steps.

9.1.3 Feature-orientation The idea of feature orientation is to organise and structure the entire software product line process and software artefacts involved as features. The concept of a feature is used to distinguish the products of the product line. Following this concept, it is easy to trace the requirements of a customer to the software artefacts that provide the corresponding functionality. Features would be explicit in requirements, design, code, and testing — across the entire life cycle. The broad spectrum of requirements from the stakeholders is the desire for variable software. It allows for

tailoring variable software by purpose, rather than serving the needs of a particular stakeholder.

9.1.4 Software composition with mapping to SPL In software composition abstraction may take many forms:

• Algorithms

• Data structures

• Modules

• Classes

• Frameworks

As mentioned earlier, the atomic units of composition are functions and data structures. The optimal functions for abstraction are pure. They share modular characteristics with functions from math. In math, a function given the same inputs will always return the same output. Functions can be seen as relations between inputs and outputs.

Given any input A, a function f will produce B as output. f defines a relationship between A and B:

f : A → B (11)

Likewise, we could define another function, g, which establishes a relationship between B and C:

g : B → C (12)

This implies another function h which defines a relationship directly from A to C:

h : A → C (13)

Useful abstractions simplify by hiding structure; the same way h reduces A → B → C down to A → C (as shown in figure 45).

Figure 45: Function composition diagram
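A minimal sketch of this idea in JavaScript, the implementation language of the prototype; the compose helper and the example functions are illustrative and not taken from the prototype itself.

// Function composition: h = compose(g, f) defines a relationship directly
// from A to C, hiding the intermediate B (equations 11 to 13).
const compose = (g, f) => x => g(f(x));

const f = user => user.completedCourses;      // A -> B
const g = courses => courses.length >= 3;     // B -> C
const h = compose(g, f);                      // A -> C

console.log(h({ completedCourses: ['js', 'ml', 'spl'] })); // true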

Those relationships form the structure of the problem space, and the way you compose functions in your application forms the structure of your application (Elliott, 2018). Features are, in fact, domain abstractions that characterise the problem space. This maps to the term problem space in SPL theory: the perspective of stakeholders and their problems, requirements, and views of the entire domain and individual products (Apel et al., 2013). In contrast, as discussed earlier, the solution space represents the vendor’s and developer’s perspectives. The relationships that form the structure of the problem space can be mapped to the solution space in SPL theory, characterised by the terminology of the developer, such as names of functions, classes, and program parameters. Apel et al. state that ”the solution space covers the design, implementation, and validation and verification of features and their combinations in suitable ways to facilitate systematic ways” (Apel et al., 2013).

Distinctions between domain and application engineering as well as problem and solution space leave us with four clusters of tasks in product-line development:

• Domain analysis

• Requirement analysis

• Domain implementation

• Product derivation

Domain implementation is the process of developing reusable artefacts that correspond to the features identified in domain analysis. Components help to break down complex problems into parts that are easier to solve in isolation, so you can compose them in various ways to derive products according to requirement analysis. The component composition creates pipelines that application data flows through. Input added in the first stage of the pipeline is output at the last stage of the pipeline, transformed. Values will go through each component (or function, in function composition) like a stage in an assembly line, transforming the value before it is passed to the next function in the pipeline. For that to work, each stage of the pipeline must be expecting the data type the previous stage returned.

Alan Kay coined the term ”object-oriented programming” (hereby referred to as OOP), and his idea was to use encapsulated mini-computers in software which could communicate via message passing rather than direct data sharing (Elliott, 2018). This was to stop having to break programs into separate ”data structures” and ”procedures”. In an email exchange, Alan Kay addressed ”OOP” in Smalltalk (Ram, 2003):

”OOP to me, means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things.” - Alan Kay (Ram, 2003).

This means the essential ingredients of OOP are message passing, encapsulation, and dynamic binding. The state can be encapsulated by isolating other objects from local state changes. The way to affect another object’s state is by sending it a message asking it to change. Thus, changes are controlled locally rather than being exposed to shared access. The message sender is only loosely coupled to the message receiver through the message API. Objects are decoupled from each other.

Furthermore, Alan Kay states that “the whole point of OOP is not to have to worry about what is inside an object. Objects made on different machines and with different languages should be able to talk to each other” (Elliott, 2018).

In a composition-based approach to SPL, the components don’t need to know about the details of the other parts of the system, only their individual, modular concerns. The system components are recomposable and decomposable and interoperate by standardised interfaces. The composition code is the glue code of the product derivation. As long as the interface is satisfied, decomposing and recomposing can even be achieved at run-time, thus enabling the

ultimate goal of feature-oriented product lines, which is automatic product derivation. Besides, a composition-based approach prevents monolithic systems, where parts cannot be removed or changed without understanding or changing many other parts. These systems become a dense mass that is hard to learn, port, and maintain. It is important to note that all software development is composition. The distinction is between brittle and complicated ways to combine parts, or easy, flexible ways. Some forms of composition form very tight relations between components, and others form loose coupling. The fundamental principle is to identify the parts that are the same and build an abstraction that allows supplying only the parts that differ. This can adequately be mapped to the SPL terms of commonality and variability. Product-specific would, in this case, be the unique parts.

Summary The idea of feature orientation is to organise and structure the entire software product line process and software artefacts involved as features. Features specify and communicate commonalities and differences of the products between stakeholders, and guide variation, reuse, and structure across all phases of the software life cycle.

9.2 Variability mechanisms In this thesis, we have explored language-based variability mechanisms as opposed to tool-based variability mechanisms because the tool-based approaches use one or more external tools to implement or represent features in code to control the product derivation process. Language-based approaches are a better match when building applications with ML capabilities. The tool market is not that useful for deriving product lines with ML code because the products are limited and require substantial effort to use. Language-based approaches also allow us to more naturally view our research questions from first principles by using mechanisms provided by the host programming language to derive features (and products).

9.2.1 Components and services In our thesis, we have mainly focused on a classic language-based implementation approach: components and services (including the notion of a web service).

The key idea of a component is to form a modular, reusable unit. A component provides its functionality through an interface, seeing its internal implementation encapsulated (assuming a black-box component). Also, a class can be seen as a component that can be reused in

many applications. To reuse components, they have to be composed with other components in different combinations. By using components, it is possible to implement a product line by encapsulating changing parts and hiding their internals (i.e., the concept of information hiding). Parts can then be removed and exchanged easily. Domain analysis is necessary to identify and separate the parts that differ between the products of the product line.

As a consequence of the proposed component-based approach, developers can implement and deploy components independently, and compose components from different sources. Markets for ML algorithms and services are emerging: thus, developers can decide whether to implement their own components or whether to buy and reuse third-party components. Such markets open a new perspective: companies with limited resources and knowledge can build applications with ML capabilities without needing to have a lot of dedicated expertise in the area. Components facilitate a best-of-breed approach, in which developers can decide, for each subsystem, whether to purchase a component from the market or to implement the functionality themselves.

Reusing components from markets has a pitfall: components are plug-and-play compatible only if their interfaces are designed to existing standards. Architectural assumptions will mismatch unless components are planned together. Addressing these incompatible components will require a significant engineering effort to compose. Hence, we can establish that a component is independent of a specific application or product line. As an example, a component written in JavaScript can be packaged and distributed with npm. Developers can reuse it when implementing the feature for other applications. They would, however, have to write custom code to connect the implementation with the component, for example, additional code to call component methods. To reuse the component, only the public interface is of interest; implementation details do not have to be accessible.
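As an illustration, the snippet below shows a small, hypothetical component as it could be packaged for npm, together with the glue code a consuming product would write against its public interface; the package name, class, and methods are invented for this sketch.

// recommendation-engine.js - a reusable component as it could be packaged and
// published to npm (the package name "@spl/recommendation-engine" is hypothetical).
class RecommendationEngine {
  recommend(history) {
    // Toy implementation: recommend everything the user has not seen yet.
    const seen = new Set(history);
    return ['js-basics', 'ml-basics', 'spl-basics'].filter(c => !seen.has(c));
  }
}

// Glue code written per product: only the public interface (recommend) is of
// interest, while the implementation details stay hidden inside the component.
const engine = new RecommendationEngine();
console.log(engine.recommend(['js-basics'])); // ['ml-basics', 'spl-basics']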

Services Services are a special form of software components. A service, like a component, encapsulates functionality behind an interface. Services may take the form of a web service, thus being reachable over a standardised Internet protocol, and may run on remote servers. Moreover, services written in different languages can exchange messages because the communication between services is standardised. Through the use of service registries, even the lookup of services can occur at run-time. Services greatly simplify composition through standardisation,

interoperability and distribution.

Composing components In the early era of product-line engineering, notably, the most common strategy was to develop product lines by constructing and composing reusable components. From the perspective of product lines with ML components, which require extensive manual effort and have product-specific requirements to function accurately, components cannot be composed automatically. Component-based software engineering holds a more desirable approach, in which building units of functionality is the goal; fully automating the product delivery with plug-compatible components is not typically a priority. The design-for-change mindset with the component-based strategy is to exchange a component for another one with the same interface. This vastly simplifies A/B testing of recommendation systems or ML models in general.

In the component-based approach, compared to a feature-oriented approach, manual developer intervention is required as no generator would build a product for a given feature selection (hence software product lines, throughout this section). When pursuing this strategy, to derive a product for a given feature selection during application engineering, a developer selects suitable components and then manually writes glue code to connect components for every product individually. Building product lines with components is suitable if feature selection is performed by developers (not customers) with a limited set of products. Developers can, in this case, construct new products from reusable parts and perform customisations beyond what was preplanned as a product line feature. Although manual effort (almost unavoidable anyway, see chapter 7.2) during application engineering increases the cost per product, implementing product lines with components does not change the overall expected benefits. Automated product derivation is essential for effective, large-scale product-line development, but it is not especially desirable or attainable when working with ML components.

Tensions between reuse and use There is a trade-off between reuse and use: developers need to provide a component small enough to be reused in many contexts but balance this against the need for it to be large enough to provide useful functionality. Smaller, reusable components can be combined flexibly, but add a lot of overhead when using the component. A large component that provides plenty of functionality is easy to integrate and use, but might only fit a few applications. Maximising reuse minimises use (Szyperski, 1997, Chap. 4). With small components, more programming

(or glueing) is left to connect the components with the base code and each other. In some extreme cases, with very small components, little remains to reuse and to hide behind their interfaces. Unsuitable component size will likely limit the success of component reuse. But, without knowing when and how a component will be reused, it is hard to decide on the suitable component size.

In product line development, domain analysis is used to solve the issue of how to size a component to balance use and reuse. During domain analysis, a domain expert investigates potential products within the scope of the product line. Having this information upfront about functionality reuse will ease the process of preplanning reuse systematically by better determining the component size and coordinating architectural assumptions. When functionality is only used in a few products, it should be separated into its own components. In contrast, if we know certain functionality is always used together, we can combine it in the same component to make it easier to use. Thus, domain analysis helps decide how to divide code into components.

9.2.2 Parameters An advantage of passing configuration parameters as method arguments (in contrast to using global variables) is that different parts of the control flow can have different configurations.

One drawback of this approach is the issue of dead code. Compilers can be built to deal with this issue by removing the corresponding conditional statements and their bodies if it is known at compile time that the parameter is always deactivated. However, when to remove entire classes or methods is less obvious. Beyond manual dead-code optimisations, compilers that optimise this process are far from mainstream or easy to use yet.

The parameter approach could use a type system (e.g., TypeScript) for checking type compatibility to statically guarantee invariants on feature selections (thus preventing the activation of features that are not compatible with each other).

Feature dependencies should be checked when the parameters are configured. Dependencies between features are rarely checked systematically when using parameters. This can lead to implementations that are error-prone and have poor code quality. Propagating method arguments may lead to methods with many parameters or unused parameters (considered as code smell by Al Mamun et al. (2019)), and can lead to undisciplined ad-hoc implementations that are hard to maintain and debug. The optimal approach is to pass a single configuration object that

encapsulates multiple configuration options because global parameters violate the separation of concerns and potentially breach information-hiding interfaces.
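A sketch of this pattern is shown below, including a simple feature-dependency check at configuration time; the feature names and the dependency rule are illustrative assumptions.

// A single configuration object instead of global variables or long parameter
// lists. Feature dependencies are checked when the configuration is built.
const defaults = { recommendations: false, courseTracking: false };

function createConfiguration(options = {}) {
  const config = { ...defaults, ...options };

  // Illustrative dependency: recommendations need course-tracking data.
  if (config.recommendations && !config.courseTracking) {
    throw new Error('The recommendations feature requires courseTracking to be enabled');
  }
  return config;
}

// Different parts of the control flow can receive different configurations.
const productA = createConfiguration({ recommendations: true, courseTracking: true });
const productB = createConfiguration({ courseTracking: true });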

Summary The key idea of a component is to form a modular, reusable unit. A component provides its functionality through an interface, with its internal implementation encapsulated (assuming a black-box component). Domain analysis is necessary to identify and separate the parts that differ between the products of the product line. Components facilitate a best-of-breed approach, in which developers can decide, for each subsystem, whether to purchase a component from the market or to implement the functionality themselves. Furthermore, it is important to establish that a component is independent of a specific application or product line. In the component-based approach, compared to a feature-oriented approach, manual developer intervention is required as no generator would build a product for a given feature selection. Although manual effort during application engineering increases the cost per product, implementing product lines with components does not change the overall expected benefits.

9.3 Version-control systems in software product lines Version-control systems are used to track changes in source code and other development artefacts to facilitate collaborative development. Two popular options are Subversion and git. We have looked into using version-control systems for product line development as a part of software configuration management. Changes can be tracked in two dimensions: (i) variation over time (revisions of source code) and, (ii) variations in space (variants of the same file). In the case of variants of the same files, they can be changed independently. They exist in parallel as changes in one branch do not affect the files in other branches. Each of these branches can be terminated when it is no longer needed.

Figure 46: Merging feature branches

A product line may be implemented by merging branches in a version control system. As shown in figure 46, each feature in a distinct branch could create products by merging corresponding feature branches. Instead of merging feature branches to derive a product, each customer could get their own branch with custom modifications, as shown in figure 47. With the approach in figure 47, recent developments and bug fixes will be merged from the main branch into customer branches. While it is possible to implement product lines by merging branches in a version control system, problems will accumulate, as the product line evolves. The use of version-control systems should be restricted to revision control.

Figure 47: Merging feature branches with custom modifications

For purely revision control, a common approach should be considered, as shown in figure 48.

• Development branch: The central branch of the project.

• Release branches: For each planned release, a separate branch is created. These separate branches should be used to avoid daily interference from the main development branch to focus on getting a release stable.

• Feature branches: These branches should be used for larger blocks of functionality or to develop a feature of a future release.

• Bug fixes: Bug fixes can be developed in separate branches and can be merged into one or multiple other branches.

Figure 48: Revision control. Figure taken from Apel et al. (2013).

Like version-control systems, build systems are a part of software configuration management. Build systems support multiple build targets and dependency management, avoid unnecessary recompilation with incremental builds, download and update required libraries and tools, and create build reports. There are many tools to help the orchestration of version-control mechanisms and maintain a uniform application to source code and noncode artefacts. When developing a software product line with a component-based approach, utilising a monorepo (a single repository) for code sharing makes sense. Tools such as Lerna, which optimises the workflow around managing multi-package repositories, allow for maintaining modules in the same repository. Having all modules in the same repo makes it easy to coordinate changes across the modules. There is just a single lint, build, test and release process and only a single place to report issues. This will allow for an easier development environment setup and better feature tracking. Besides, other benefits include that tests can run together across modules to find bugs that touch multiple modules more easily. Probably the main benefit is the ability to split up large codebases into separate independently versioned packages, which is useful for code sharing and in line with a component-based product line approach. Using Lerna, our directory structure looks something like this:

spl/
  packages/
    core/
    customerA/
    customerB/
    customerC/
  package.json
  lerna.json

In a multi-repo approach, all of our packages would either be library targets or dependencies for our library targets:

@spl/core
@spl/customerA
@spl/customerB
@spl/customerC

With the multi-repo approach, given that most of the library logic resides in the @spl/core package, every change that occurs in the core package requires all other packages to be updated to point to the new version. Should a bug occur in customerA, in many cases, the change will have to be fixed in core, but the update will then have to be reflected in all other targets. With the multi-repo approach and many customers, there is a good chance that a target will be missed. In other words, a change in the library will result in changes to multiple packages. Updating all targets is a tedious process and prone to a developer making a mistake. However, there are usually lint, test, and build processes to help avoid errors from occurring. These lint, common test and build processes could in sporadic cases differ, but that is highly unlikely, and this reveals the biggest drawback of having a multi-repo. While the sharing of common packages and commands across all applications is extremely beneficial in a monorepo, the adoption of any solution does not have to be binary. At scale, how well an organisation does with code sharing, collaboration, tight coupling, etc. is a direct result of engineering culture and leadership and whether the used solution does not transform these values in its own right.

Summary While it is possible to implement product lines by merging branches in a version control system,

problems will accumulate as the product line evolves. The use of version-control systems should be restricted to revision control. Tools such as Lerna, which optimises the workflow around managing multi-package repositories, allow for maintaining modules in the same repository. Having all modules in the same repo makes it easy to coordinate changes across the modules.

9.4 Recommender Systems in SPL Recommender systems can predict whether a particular user would prefer an item or not based on the user’s profile. They are information filtering systems that handle the problem of information overload by filtering generated information according to the user’s preferences, interests, or observed behaviour about an item.

Recommender systems support users by allowing them to move beyond catalogue searches. Our research survey indicates that respondents believe that recommender systems enable them to find relevant information regarding their preferences more efficiently. Furthermore, respondents indicated that poor recommendations were annoying and could make them lose trust in the recommendation system. Hence, it is paramount to use efficient and accurate recommendation techniques within a system that will provide relevant and dependable recommendations for the users.

Recommendation systems shall assist and augment the social process of using recommendations of others to make choices when there is little personal knowledge or experience of alternatives. There are various approaches for building recommendation systems which utilise either collaborative filtering, content-based filtering or hybrid filtering. Despite the success and adoption of collaborative filtering and content-based filtering techniques, there are problems associated with either technique. Collaborative techniques suffer from problems such as cold-start, sparsity and scalability, and content-based filtering techniques suffer from limited content analysis, over-specialisation and sparsity of data.

These techniques can form a hybrid approach by combining two or more filtering approaches to harness their strengths and reduce their weaknesses. Hybrid approaches can be classified based on their operations into weighted hybrid, mixed hybrid, switching hybrid, feature-combination hybrid, cascade hybrid, feature-augmented hybrid and meta-level hybrid. Inspecting each hybrid approach is out of the scope of this thesis. Nevertheless, we will discuss a way to deal with software product line feature combinations containing machine learning components.

9.4.1 ML and SPL
More companies are eager to develop applications with ML capabilities, but the ML models often end up only in internal use. There is a gap to practical deployment that developers struggle to bridge. McCaw states that "only 22 percent of companies using machine learning have successfully deployed a model" (McCaw, 2019). As mentioned earlier, we believe a problem should be solved at the lowest level at which it can be solved. Bottom-up allows for self-organised behaviour in a way that top-down does not. We will look at the root causes. Before anything else, note that there is a fundamental difference between traditional software and machine learning: ML is more than code; it is code plus data.

The ML artefact that ends up in production is created by applying an algorithm to a mass of training data, which will affect the behaviour of the model in production. The behaviour of the model will also be affected by the input data that it receives at prediction time, which cannot be known in advance. This is shown in figure 49.

Figure 49: ML consists of code and data. Figure taken from Breuel (2020).

Code can be developed in a controlled development environment, but data never stops changing, and it is hard to control how it will change. Data and code evolve independently. The dilemma of a machine learning process is then to bridge the data and the code in a controlled way. Some of the causes for ML not being deployed in production lie with a lack of reproducibility and slow, inconsistent deployment.

9.4.2 Phases of recommendation process
One of the core concepts is the data pipeline, shown in figure 50. ML models will always require some form of data transformation. Using data pipelines provides advantages in code reuse, management, run-time visibility and scalability. It is possible to argue that ML training can also be considered a data transformation, and hence can benefit from including the specific

ML steps in the data pipeline, becoming an ML pipeline, as we have done with our CSV loader. The ML pipeline is independent of specific data instances and consequently becomes a pure code artefact. This means it is possible to break it down into components, track its versions with source control and deploy with a regular CI/CD pipeline.
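As a minimal sketch of this idea, the pipeline below is written as composed, data-agnostic functions; the step names and their trivial bodies are illustrative placeholders, not the actual interfaces of our CSV loader or training components:

// Sketch only: an ML pipeline expressed as composed, data-agnostic steps.
const pipe = (...fns) => (input) => fns.reduce((acc, fn) => fn(acc), input);

// Illustrative CSV step: parse numeric comma-separated lines into rows.
const loadCsv = (text) =>
  text.trim().split('\n').map((line) => line.split(',').map(Number));

// Illustrative transformation step: scale all values into [0, 1].
const normalise = (rows) => {
  const max = Math.max(...rows.flat());
  return rows.map((row) => row.map((v) => v / max));
};

// Placeholder "training" step: here it simply averages each column.
const trainModel = (rows) =>
  rows[0].map((_, col) => rows.reduce((sum, r) => sum + r[col], 0) / rows.length);

const mlPipeline = pipe(loadCsv, normalise, trainModel);
console.log(mlPipeline('1,2\n3,4\n5,6'));

Because each step is a pure function, the pipeline itself is an ordinary code artefact: steps can be replaced independently, versioned in source control and recombined per product.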

Figure 50: ML Pipelines connect data and code to produce models and predictions. Figure taken from Breuel (2020).

As discussed earlier, the recommendation process is sub-divided into three phases: (i) informa- tion collection phase, (ii) learning phase, and (iii) prediction recommendation phase.

Information collection phase
During the information collection phase, relevant information about users is collected to generate a user profile or model for the prediction tasks. Information includes a user's attributes, behaviours or the content of the resources the user accesses. As discussed in chapter 3.3, recommender systems rely on different types of input, such as inferred user preferences through user behaviour observation (implicit feedback) and explicit input by users (explicit feedback). Hybrid feedback can be obtained through the combination of both explicit and implicit feedback. In a fully-fledged E-learning platform, a user profile will be a collection of personal information associated with a specific user. This information includes interaction with the system, learning styles, interest, preferences, cognitive skills and intellectual abilities. The success of any recommendation system depends on its ability to represent a user's current interests. Information to build up a model of the user is retrieved from the user profile. We can say that the user profile describes a simple user model. Accurate models are therefore key to obtaining relevant and precise recommendations from any prediction technique.

The information collection phase will mostly be a manual process requiring a lot of glue code to construct the moving product-specific parts of a data pipeline (or integrated into the ML pipeline) feeding the recommender system. However, some components can easily be generalised and support variability, such as a system interface that prompts the user to provide ratings for items to construct and improve the model. Variability in this sense would be the contents of the form submission. The accuracy of recommendations depends on the quantity of ratings provided by the user. Nonetheless, the component itself can easily be generalised for the software product line. On a side note, explicit feedback requires more effort from the user but also provides transparency into the recommendation process, which results in a somewhat higher perceived recommendation quality. Components to automatically infer the user's preferences can also be constructed as monitoring components to record actions of the users, such as navigation history, time spent on courses and links followed by the user. Implicit and explicit feedback can be composed to minimise their weaknesses and create a well-performing system. For example, explicit feedback can only be requested when some implicit criteria are met.
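A sketch of such a generalised explicit-feedback component is shown below; the field names, the rating scale and the component API are assumptions made for illustration, with the variability being which fields each product asks the user to rate:

// Illustrative sketch (not our prototype's actual component): a generalised
// rating-prompt component whose variability is the set of fields to rate.
function createRatingPrompt({ fields, scale = 5 }) {
  return {
    // Render could target any UI layer; here it just returns a description.
    render: () => fields.map((f) => `Rate "${f}" from 1 to ${scale}`),
    // Validate and collect a submission into the shape the pipeline expects.
    submit: (answers) =>
      fields.map((f) => {
        const value = Number(answers[f]);
        if (!(value >= 1 && value <= scale)) {
          throw new Error(`Invalid rating for ${f}`);
        }
        return { item: f, rating: value };
      }),
  };
}

// Product-specific configuration (the variability) of the same reusable component.
const prompt = createRatingPrompt({ fields: ['Course A', 'Course B'] });
console.log(prompt.submit({ 'Course A': 4, 'Course B': 2 }));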

Learning phase
The learning phase applies a learning algorithm to filter and exploit the user's features from the feedback gathered.

In our approach, we used a model-based technique in an effort to provide useful recommendations to the users. The model-based technique employs previous ratings to learn a model in order to improve the performance of the collaborative filtering technique. We decided on this approach because model-based techniques can recommend a set of items quickly, since they use a pre-computed model. Model-based techniques also alleviate the sparsity problems associated with recommendation problems. There are many facets of model-based techniques that use the user-item matrix to identify relationships between items. We specifically used a model-based regression technique, as it is a powerful process for analysing associative relationships between the dependent variable and one or multiple independent variables.

The models built in this phase can be composed into components and reused as seen fit. This is ideal considering we would want to reuse as much as possible.
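As an illustration of the model-based idea, the sketch below fits a simple least-squares linear regression offline and returns the fitted model as a pure function that can be versioned and shipped as a component; it is a toy stand-in, not the regression technique as implemented in our prototype:

// Sketch: fit a simple least-squares linear regression (the "model") offline
// and reuse the fitted function at prediction time.
function fitLinearRegression(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i += 1) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  const slope = num / den;
  const intercept = meanY - slope * meanX;
  // The returned model is a pure code-plus-parameters artefact that can be
  // versioned and composed like any other component.
  return (x) => intercept + slope * x;
}

// e.g. predict a user's rating from the item's average rating across all users.
const predictRating = fitLinearRegression([2, 3, 4, 5], [2.5, 3.0, 4.5, 5.0]);
console.log(predictRating(3.5));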

Prediction / Recommendation phase
In this third phase, the system recommends or predicts what kind of items the user may prefer.

This is the top-N list that shall be passed to the components responsible for serving the result to the display layer. Ideally, after the top-N cut is predicted, it should be filtered through a stoplist. The stoplist is responsible for removing any prediction that may be offensive, inappropriate, etc. Our respondents answered that they had been offended by recommendations before, and a few of them deemed the content inappropriate.
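A stoplist component of this kind can be very small; the sketch below assumes a hypothetical item shape and blocked identifiers purely for illustration:

// Illustrative stoplist component applied after the top-N cut.
const stoplist = new Set(['offensive-course-id', 'retired-course-id']);

function applyStoplist(topN, blocked = stoplist) {
  return topN.filter((rec) => !blocked.has(rec.itemId));
}

const topN = [
  { itemId: 'course-42', score: 0.91 },
  { itemId: 'offensive-course-id', score: 0.88 },
  { itemId: 'course-7', score: 0.75 },
];
// The filtered list is what gets handed to the display-layer components.
console.log(applyStoplist(topN));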

9.4.3 Composing components in an SPL ML pipeline
Smaller components are easier to reuse. They do less and make fewer assumptions. They thus have the benefit of being easier to understand and maintain. Design is about pulling things apart; it can always be put back together. Components should have a few related responsibilities, or ideally a single responsibility. Then we will find that they are easy to organise around concepts.

Sources of coupling are important to understand in order to build independent components. According to Node.js best practices, the following is roughly ordered from the tightest coupling (Goldberg et al., 2020):

• Class inheritance (coupling is multiplied by each layer of inheritance and each descendant class)

• Global variables

• Other mutable global states (browser DOM, shared storage, network, etc.)

• Module imports with side-effects

• Implicit dependencies from compositions

• Control parameters

• Mutable parameters

Loose coupling:

• Module imports without side-effects

• Message passing / publish-subscribe

• Immutable parameters

Most of the loose-coupling options listed above are mechanisms originally designed to reduce coupling. Solutions to smaller problems need to integrate and communicate in order to recompose into a complete solution. Sources of tight coupling should be avoided as much as practically possible.
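The difference between the tight and loose ends of this spectrum can be illustrated with a small, contrived example; the function names are made up for the purpose:

// Tight: a module that mutates shared global state has a hidden dependency
// and is hard to reuse or test in isolation.
globalThis.recommendations = [];
function addRecommendationTight(itemId) {
  globalThis.recommendations.push(itemId); // relies on mutable global state
}

// Loose: a side-effect-free function with immutable parameters returns a new
// value instead, so components only depend on explicit inputs and outputs.
function addRecommendationLoose(recommendations, itemId) {
  return [...recommendations, itemId];
}

const before = Object.freeze(['course-1']);
const after = addRecommendationLoose(before, 'course-2');
console.log(before, after); // the original list is untouched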

9.4.4 SPL feature interaction
When features (or components) are developed independently, an issue can occur where it is difficult to predict how they will interact when combined. Inadvertent behaviour occurs due to missing coordination when combined with other features. For a software product line to be successful, features have to be understood in terms of their interactions.

Feature interactions can be inadvertent or undesired in the sense that they are hard to predict when planning and implementing features in isolation. This especially holds true for ML components that need to interact. These feature interactions must, in many cases, be planned in advance. Besides the code each feature has on its own, there might be code that does not belong to a single feature, but to a combination. This may harm feature modularity.

Along with the feature interactions, there are also features that can be optional such as a stoplist or components for monitoring. The problem of optional features is an implementation- level instance of feature interaction. The optional-feature problem is concerned with incorrect implementations of variability that reduce the intended variability or have negative effects such as non-modular code (Apel et al., 2013).

The optional-feature problem happens because of a misalignment between the variability specified in the feature model and the variability provided by the specific implementation. This occurs essentially when coordination code is hardwired inside a feature, such as our loadcsv. Down the line, loadcsv might have problems in the form of type errors because of mismatches. Coordination or glue code should not be implemented in a way that impairs variability. Of course, our loadcsv can be refactored at later stages, which we will discuss later, should it pose problems for variability.

Knowing how two features interact, a strategy to deal with the interaction should be devised that does not introduce the optional-feature problem. All combinations of interacting features can be described as needing coordination code in order to use two independent features together. Our problem-solving strategy for the issue should: (i) not reduce variability merely due to implementation issues, (ii) not require overwhelming implementation effort, (iii) not decrease the performance of products compared to an individual implementation of each product, and (iv) not reduce code quality, making the product line harder to maintain.

Change the feature model
The simplest solution would be to add constraints restricting the feature model to exclude problematic feature selections. This will limit variability related to what should be valid products in the domain. No implementation changes are needed (instead of adding glue code or reimplementing features), and thus performance and code quality are not affected. There is a trade-off to be considered in whether the reduced variability is acceptable or will have damaging effects on the strategic value of the product line.

Multiple implementations
This strategy is based on providing multiple implementations of a feature, one with and one without glue code. It does not align well with the prime benefit of product line development, code reuse. In fact, it encourages code replication instead, as features must be implemented multiple times, one for each feature combination. This leads to scalability issues if a feature interacts with multiple other features.

Move code
We could instead implement the coordination code in either of the features, or create a third feature to which both refer. Drawbacks of this approach include: (i) glue code does not belong to a single feature, but to combinations of features, (ii) it does not respect the principle of separation of concerns, and (iii) unnecessary code is included in some products.

Distinct module for coordination code
Coordination code could be separated into its own module and composed with the implementations of both features. This essentially scales to higher-order interactions with additional modules for coordination code. There are drawbacks to this way of handling interactions, as the number of derivatives may grow out of control in cases where many interactions exist. This may lead to a large number of additional but potentially small modules that can be overwhelming and hard to understand in isolation. Creating distinct modules for coordination code may be elegant but can increase complexity.
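A sketch of this strategy is shown below, where hypothetical recommender and monitoring features stay free of mutual knowledge and a separate coordination module, included only in products that select both features, wires them together:

// feature: recommender.js (illustrative stand-in)
const recommender = {
  recommend: (userId) => [{ itemId: 'course-1', score: 0.9 }],
};

// feature: monitoring.js (illustrative stand-in)
const monitoring = {
  record: (event) => console.log('monitor:', event),
};

// coordination module: only composed into products that select both features,
// so neither feature needs to know about the other.
function recommendWithMonitoring(userId) {
  const result = recommender.recommend(userId);
  monitoring.record({ type: 'recommendation-served', userId, count: result.length });
  return result;
}

console.log(recommendWithMonitoring('user-42'));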

Comparison of solutions
All of the above strategies, except changing the feature model, support full variability. Creating distinct modules or multiple implementations per feature can cause a lot of overhead. The multiple-implementations strategy will also cause code replication, which is something to avoid. In practice, these strategies are mixed and matched according to needs.

In regards to recommender systems, we can discuss interactions between the algorithms in hybrid filtering techniques. These techniques combine different recommendation techniques to gain better system optimisation and to avoid the limitations and problems of pure recommendation systems. The idea is that a combination of algorithms will provide more accurate and useful recommendations than a single algorithm, since other algorithms can compensate for the disadvantages of a single algorithm. Isinkaye et al. state that ”the combination of approaches can be done in any of the following ways: separate implementation of algorithms and combining the result, utilising some content-based filtering in a collaborative approach, utilising some collaborative filtering in content-based approach, creating a unified recommendation system that brings together both approaches” (Isinkaye et al., 2015).
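As a simple illustration of the first of these combination styles, the sketch below runs two recommenders separately and merges their scores with a weighted hybrid; the score values and weight are placeholders, not outputs of our prototype:

// Illustrative weighted hybrid: combine separately computed scores.
const collaborativeScores = { 'course-1': 0.8, 'course-2': 0.3 };
const contentBasedScores = { 'course-1': 0.4, 'course-2': 0.9 };

function weightedHybrid(cf, cb, weightCf = 0.6) {
  const items = new Set([...Object.keys(cf), ...Object.keys(cb)]);
  return [...items]
    .map((itemId) => ({
      itemId,
      score: weightCf * (cf[itemId] ?? 0) + (1 - weightCf) * (cb[itemId] ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}

console.log(weightedHybrid(collaborativeScores, contentBasedScores));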

Summary
Data and code evolve independently; the dilemma of a machine learning process is to bridge the data and the code in a controlled way. When features (or components) are developed independently, it is difficult to predict how they will interact when combined, and inadvertent behaviour occurs due to missing coordination with other features.

9.5 Summary
The process of software development is to break down large problems into smaller problems by building smaller components that solve those smaller problems and then composing these components to form a complete application. Software systems should be constructed from reusable components because the systems can then be tailored to the requirements of the customers, with the customer selecting from an ample space of configuration options. This provides a form of mass customisation by constructing individual solutions based on a portfolio of reusable software components. It introduces individualism into software production while still retaining the benefits of mass production. The need for individualism arises from differing software requirements regarding functionality, nonfunctional properties, and target platforms. The idea of feature orientation is to organise and structure the entire software product line process, and the software artefacts involved, as features. Features specify and communicate commonalities and differences of the products between stakeholders, and guide variation, reuse, and structure across all phases of the software life cycle. We have utilised first-principles thinking about composing software to abstract and reuse the concepts, shaping a foundation for composing software product lines.

Consequently, we have delved into functional thinking and explained why we believe the composition-based approach is the most capable path to compose software product lines with machine learning components. This approach has several strong points, including a well-known and established implementation technique, reuse within and beyond the product line, reuse of third-party implementations, reuse in distributed environments, uniform applicability to many languages, and separation of concerns (information hiding). We have also revealed a few weak points with this approach: there is no automated product derivation, glue code is required, and pre-planning is necessary to size components. We believe the benefits of the architectural structure and method outweigh the drawbacks, because when working with machine learning, we do not want to add more complexity to the problem at hand. The implementation strategy can almost be summed up in a single sentence: the system components are re-composable and de-composable and interoperate through standardised interfaces, which is principled on the original idea Alan Kay had for ”OOP” in its purest form. Composition-based software engineering does not regard automating product delivery with plug-compatible components as a priority. While it may be attainable to automate the entire process, it is not necessarily desirable. As discussed, producing accurate predictions that serve the customers' and end-users' needs requires extensive manual effort. We would not want to encourage a human-zero mindset but rather a human-plus-AI mindset. Viewing algorithms, tech, and people and process as three distinct parts: algorithms should account for 10%, the tech should account for 20%, and people and process should be responsible for 70% of the process to maximise the real outcome. Winning organisations will invest in human knowledge, not just AI and data. A component-based approach aligns well with this goal through a design-for-change mindset: during development, components can be exchanged as long as they adhere to the same interface. This substantially simplifies multiple aspects of software development with ML capabilities, as A/B testing can be achieved with ease (doing partial rollouts, reverting changes, etc.).

For a software product line to be successful, features have to be understood in terms of their interactions. Understanding features by their interactions can be hard when planning and implementing in isolation. Without proper design, interactions can lead to inadvertent behaviour. In ML components, more often than not, code belongs to a combination of features. This will harm our goal of feature modularity. Thus, we may take actions to address this issue and understand it in connection with the optional-feature problem, which is concerned with incorrect implementations of variability that reduce the intended variability or have adverse effects such as non-modular code.

Strategies to mitigate this problem include:

• Changing the feature model,

• having multiple implementations,

• moving code,

• or utilising a distinct module for coordination code.

There are benefits and drawbacks to each of these methods, but in practice, they should be mixed and matched to provide one coherent, functioning strategy.

10 Discussion
In this chapter, we present our theoretical contributions by answering the research questions, then we offer practical contributions discovered through our research, then we discuss validity threats to our research process and evaluation methods, and finally, we present related work.

10.1 Theoretical contributions
The theoretical contributions we have made through our research in this thesis are presented in the following sections by answering each research question.

RQ1: How possible is it to create machine learning components that work for multiple products in a software product line?
The essence of development is composition. The essence of software design is problem decomposition. You cannot assemble what you cannot take apart. In our effort to figure out whether it is possible to create components with machine learning capabilities for multiple products in a software product line, we researched several engineering methodologies and settled on a composition-based approach because of our statement above: the essence of development is composition.

Software should strive for simplicity and be aided by good mental models and abstractions. Software architecture has come to mean how parts of a system work together and communicate with each other. The most straightforward architecture is the "pipe-and-filter", in which an operator sends output from one program (filter) into another. A pipeline based on this notion is usually used to describe the overall process of the tasks that have to be done in succession when building machine learning applications. Whether it is a data pipeline or an ML pipeline, we believe its internal complexities can be componentised according to our findings in chapter 9.2.1. Architecture then describes the system properties, how components cooperate to meet the overall requirements. Interface matching is a system property, but the descriptions used to check a match are at the component level (i.e., the component model), and the system semantics applies to the component connections themselves, describing and constraining the interfaces.

We are not trying to impose any strict architectural decisions on the software product line other than that it should consist of composable components. The only restrictions would then concern how ML components need to interact to yield a useful result. With a pipeline, a system is a single line of components, each "provides" joined to the following "requires". This can, in essence, yield a web service consisting of multiple minor components. Two levels of abstraction are useful for software product line engineering with machine learning components: components as building blocks (lower level) that describe the system architecture (higher level) of the system. There is then room for abstracting as you would like, with additional middle layers that adapt the building blocks into even larger, more complex systems, according to our findings in chapter 9.1. As mentioned, the understanding of the system behaviour will come from the interactions and relationships between the component models observed at different levels. Successful abstraction means that the result is a set of independently useful and recomposable components. The ideal approach strikes the right balance between too much abstraction, where insights would not be useful, and too little abstraction, where little will be learned beyond what comes from observing reality.

Our point is that nothing hinders creating reusable machine learning components as long as the parts can be divided into smaller standardised components. Thus, it becomes more a question of how reusable they are.

RQ2: How reusable are machine learning components in a software product line?
A composition-based software engineering approach is desirable because building units of functionality is the goal. The attempt to entirely automate product delivery with plug-compatible components is not typically a priority. This aligns well with our knowledge of how machine learning models behave and their fragility. Small changes to code or data can drastically alter how the model performs. When we remove the trait of some software product lines that they shall be fully automated, the question becomes whether the component itself can be reused for other applications. As we have seen, the code can be reused; it is the data accompanying the component that cannot, unless it is a component that supports transfer learning. There is no catch-all optimisation that works on a single component and will also work on others, and herein lies the problem with reusing machine learning components. We could optimise the component to the best possible solution for a given configuration. The configuration would, though, change depending on customer and context; thus, we would rather look at an adaptive solution, since it will do better than a solution that is optimal sometimes and creates havoc at other times. Methods for optimisation can be effective for problems where the domain is reasonably static. Problems occur when the domain changes; to cope with these changes, solutions should be approached actively and continuously.

Although manual effort during application engineering increases the cost on a per-product basis, it does not change the overall expected benefits, according to findings in chapter 9.2.1. There is still the benefit of having the possibility to build new systems using the components developed for their predecessors. The factors that might cripple reuse are things that require data to be handled in a certain way (that cannot be defined as a variability) and how the components interact. Inadvertent behaviour occurs due to missing coordination between interacting components. In order to avoid that, machine learning components must, more often than not, be preplanned to ensure their compatibility. Besides that, there might be components whose code does not belong to a single component but to a combination, given the linear architecture of data and ML pipelines. This may harm modularity.

Building software product lines with machine learning components is suitable when feature selection (in SPL terms) is performed by developers (not customers) for a limited set of products. Machine learning components add restrictions to how a software product line can be architected. With the separation of concerns between code and data, components may be reused according to findings in chapter 9.4.

RQ3: How feasible is it to create and consume reusable machine learning models in a software product line?
Creating and consuming reusable machine learning models is feasible, as we have shown earlier in our theoretical discussion about the composition-based approach. Nevertheless, this approach hinges on the success and viability of the responsiveness between the interfaces of the components. In our understanding of the topic and from our evaluation, it is feasible both to create and to consume them, given the right time and resources to implement a proper solution. However, this thesis has had recommender systems and general web development in mind and has been implemented with JavaScript and its surrounding shared libraries. We are not familiar with the language designs in, for example, programming languages such as R. Thus, we find it hard to conceive a catch-all solution to API design that applies to ML components. However, an expert we interviewed noted that in a few cases it is beneficial to include default options that can be overridden by values passed as parameters, as it is easier to get the ML component quickly up and running, and time can then be spent adapting the component interaction (see chapter 8.3).
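A minimal sketch of that default-options pattern in JavaScript could look as follows; the option names are illustrative assumptions and not taken from any particular ML library:

// Sensible defaults get an ML component running quickly; callers override
// only what their product needs.
function createRecommender(options = {}) {
  const config = {
    neighbours: 10,
    minRatings: 5,
    metric: 'cosine',
    ...options, // caller-supplied values override the defaults
  };
  return {
    config,
    recommend: (userId) => {
      // A real implementation would use config.metric etc.; omitted here.
      return { userId, using: config };
    },
  };
}

const defaults = createRecommender();
const tuned = createRecommender({ neighbours: 25 });
console.log(defaults.config.neighbours, tuned.config.neighbours); // 10 25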

As an afterthought of the interviews, we understood that in some scenarios a composition-based approach might lead to sub-system optimisation, in which developers tasked with developing a component spend too much time optimising that particular component. This can be detrimental in companies where the same developers handle both application and domain engineering, because the software product line is best understood as a whole. There is a fine line between too little and too much optimisation. It is also important to call into question the constraints of a component. Developers might design around the constraints of that component instead of questioning them. It is important for developers to question the constraints of the component because there is plenty of research indicating that product errors reflect organisational errors. In this thesis, we focused mostly on the Business, Code and Architecture parts of the BAPO model and left out the Organisational part. This is a limitation of our study, which we accepted as we worked with a small company, but other researchers should investigate this aspect further.

RQ4: How can you support a software product line evolution containing machine learning components?
We have discussed a design-for-change mindset that is key to supporting the evolution of software product lines containing machine learning components. Along with the standard approaches to refactoring a software product line, we believe it is essential to replace components when making changes, and we believe this will be beneficial in the long term (see chapter 6.6). Let us assume you want to adjust or add functionality to a component. The entire component should be replaced by a newer version containing the changes, and properly A/B tested because of the fragility of ML components (described in chapter 7.2.4). A small change in code or input data can cause irreversible effects on the output (Ghorbani and Zou, 2017). In the typical training process, training data is collected, and a model is trained to recognise patterns in this data in order to make predictions about new data. In other words, ML systems learn about the data they are trained on, and learning algorithms are constructed to generalise from that data. The end result is that the models can be fragile and unpredictable. Thus, we would not want to make changes and deploy our models directly without proper testing to ensure relevant and bias-free results.
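A small sketch of what such an interface-preserving replacement with a partial rollout might look like is given below; the two model variants and the traffic share are purely illustrative:

// Two components behind the same predict(userId) interface (placeholders).
const modelA = { name: 'A', predict: (userId) => ['course-1', 'course-2'] };
const modelB = { name: 'B', predict: (userId) => ['course-3', 'course-1'] };

// Route a configurable share of traffic to the candidate component; reverting
// the change is just setting the share back to zero.
function createAbRouter(current, candidate, candidateShare = 0.1) {
  return (userId) =>
    Math.random() < candidateShare
      ? candidate.predict(userId)
      : current.predict(userId);
}

const predict = createAbRouter(modelA, modelB, 0.2);
console.log(predict('user-42'));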

The overall process of building the software product line can benefit from using a reactive approach (as described in chapter 9.1.2). Over time and evolution, all taxonomies eventually become wrong for new use-cases. Therefore, being agile and adaptable will help to cope with changes in demands. One expert also stated that dividing the development into domain- and application-specific teams can give better structure to areas of responsibility and make the process more efficient, according to findings in 8.4. A reactive approach is in many ways an ideal path to success when building software product lines because the developer cannot a priori conceive of all possible configurations, purposes, or problems the system might be confronted with. That said, components should, to a large extent, be preplanned when they have machine learning capabilities.

RQ5: How does a software product line affect the quality of its recommender system?
Ultimately, we believe a software product line will improve the speed and reduce the effort required to implement components with ML capabilities. Remember, the software product line is more than code; it is also processes and practices in the surrounding organisation. Building a software product line for a recommender system will help to establish repeatable procedures, resources and principles streamlined to ensure a satisfying result. Automatic derivation, however, can be compared to a horse wearing blinders; you might run faster, but you will also lose track of your surroundings. According to our survey in chapter 8.5, 54.6% of respondents expressed that they will likely lose trust in a recommender system if it yields bad predictions. Each configuration should, therefore, be optimised for the customer and the context to yield relevant results. Beyond the manual labour required to optimise (and adapt) the solution, having components to mix and match will ease the process of building the right recommender system. Trial and error through partial rollouts with good A/B testing is ultimately how recommender systems are deployed today, and a software product line does not have to change that.

10.2 Practical contributions
We have established a theoretical foundation in the thesis for how and why a composition-based approach to creating software product lines is ideal when the product line shall contain machine learning components. In practical terms, this implies building reusable components that can be composed to create an ML pipeline. The design-for-change mindset mentioned in chapter 9.2.1 allows for quick replacement of components for testing and improvement. The factor of reusability comes from thinking functionally when developing the components, by splitting data and code and creating suitable interfaces for the components to interact. Overall, the task of introducing machine learning components to a software product line, and having to maintain and evolve them, might not seem too daunting anymore.

10.3 Ethical Considerations
We have taken precautions to ensure that research is carried out in accordance with good research practices and the University of Oslo's ethical guidelines (Ethical guidelines, 2020). All interviews with the respondents have been conducted through voluntary participation and the respondents have had the right to withdraw from our study at any stage. They have all participated on the basis of informed consent. We have anonymised individuals and organisations participating in the research. Furthermore, any use of offensive or unacceptable language has been avoided in the formulation of the questionnaire and the interviews. All of our communication in relation to the research has been done with honesty and transparency. We have worked to ensure that we avoid biased data findings and we have not used any deception or exaggeration to fulfil the aims and objectives of this thesis. John of Salisbury expressed the metaphor of dwarfs standing on the shoulders of giants, in essence meaning ”discovering the truth by building on previous discoveries” (Standing on the shoulders of giants, 2020). Acknowledgement of the works of other authors used in any part of the dissertation has been done with the use of the Harvard referencing system.

10.4 Threats to validity
When conducting an evaluation, it is relevant to assess the threats to the validity of the study and the results. For the study we have conducted, we find this very important because of the number and magnitude of threats that exist when conducting empirical studies (Feldt and Magazinius, 2010). Validity in research is concerned with how the conclusion might be wrong and hence not correspond with reality. A specific action used to increase validity by addressing a specific threat is called a mitigation strategy (Feldt and Magazinius, 2010).

A number of models and lists of validity threats exist to help identify and mitigate the threats. Feldt and Magazinius present a list of seven validity threats relevant when conducting qualitative evaluation methods, shown in table 12.

No. | Validity threat type | Example of typical questions to be answered
V1 | Conclusion validity | Does the treatment/change we introduced have a statistically significant effect on the outcome we measure?
V2 | Internal validity | Did the treatment/change we introduced cause the effect on the outcome? Can other factors also have had an effect?
V3 | Construct validity | Does the treatment correspond to the actual cause we are interested in? Does the outcome correspond to the effect we are interested in?
V4 | External validity, Transferability | Is the cause and effect relationship we have shown valid in other situations? Can we generalize our results? Do the results apply in other contexts?
V5 | Credibility | Are we confident that the findings are true? Why?
V6 | Dependability | Are the findings consistent? Can they be repeated?
V7 | Confirmability | Are the findings shaped by the respondents and not by the researcher?

Table 12: Main types of validity threats. Table taken from Feldt and Magazinius (2010), p. 376.

There were several validity threats to our evaluation methods. Finding our test subjects was an example of theoretical sampling because the diversity and number of test subjects were limited. This is a validity threat that lowers the credibility (V5); had we had the time and capacity, we should therefore have interviewed more people with higher diversity in technical competence and roles to mitigate it. We mitigated the confirmability (V7) threat by having a less strict interview format and allowing for more dialogue with open questions. Having these interviews with other software developers increases the confirmability (V7) because the conclusions and findings are based not only on our own research and reflections but also on those of other developers.

The dependability (V6) threat was affected in that only one of us took notes during the interviews while the other asked questions. However, we mitigated this threat by discussing the results right after each interview, to evaluate the data we had collected and see if something was missing or unclear. We also alternated who took notes every other interview, to avoid biased notes.

We see a few threats to the statistical analysis we did of the survey results. There is a credibility (V5) and confirmability (V7) threat in that the participants of the survey had only read a short background before answering the questions, so their answers may be based on too little knowledge and may be somewhat random, and hence biased. We mitigated these threats by consulting an expert on statistics when formulating the questions.

Based on the number of participants we had in the survey, the credibility (V5) threat is lowered. For the interviews, it was lowered because of the experience the experts we chose had with software engineering. Because of their roles, they were highly relevant for our case study, which increases the construct validity (V3). The dependability (V6) threat could be lowered further by conducting more interviews with more experts, but the correlation between the results from all the experts implies that the results are consistent.

Prototyping with a recommender system is a specific application of ML, but our evaluations are also based on a general and high-level view of having ML components in an SPL. This was to increase the external validity (V4), so we could generalize the results beyond the prototype scope. Researching solutions that allow multiple products to consume the same ML models increases the transferability (V4), so that our research can be relevant to other companies. We can also offer “lessons learned” and “good practices” for having ML components in general with multiple products sharing functional requirements or functionalities.

Observing while the experts studied our prototype was to evaluate the internal validity (V2) of the prototypes. We asked questions related to the prototype afterwards to get results on whether this is a good solution and if our design and implementation of the prototype can be applied for multiple products, hence solving some of the problems introduced in the research questions. We also asked for improvements and changes that could affect how an SPL can interact with the ML components we have made, to see if this affects the results.

Conclusion validity (V1) is a result of knowledge gained through research, development and evaluation. We do not present any new theories; we simply present lessons learned and a process for how to reuse ML components in a product line, based on existing research and theories.

10.5 Related work
We have researched whether similar problems have been solved or studied before. As we have been reading articles and journals, nothing directly similar has been found. However, we see that Apel et al. (2013) introduce a feature-oriented approach to developing an SPL; they do not, however, focus on implementing ML components in it. Sculley et al. (2015) discuss the maintenance and risks of having ML components and study how ML components affect a larger system; they do not focus on having ML components serving multiple products in an SPL. Bacciu et al. (2015) suggest a machine learning approach to analysing patterns in different products to predict features for new products; they do not focus on reusability of code or components. Camillieri et al. (2016) also focus on a machine learning workflow when designing SPLs but concentrate on evolution mechanisms for software; they do not focus on implementing ML components but rather use an ML pipeline workflow. Temple et al. (2016) discuss the use of a machine learning approach to create constraints when configuring different products in the same SPL; they likewise do not focus on implementing ML components in the SPL and do not mention reuse in any sense.

We have been using these sources (among others) as a foundation and combining the lessons learned from the papers to solve our problems. No articles or journals have been published on creating a generic CSV loader or ML components that work in an SPL. This requires further research to see what the possibilities are, and the quality of an API that allows multiple products to consume the same ML model. We see this as an interesting field to further explore.

11 Conclusion
This research aimed to identify whether it is possible to create machine learning components in a reusable manner for a software product line. Based on our research, we found that it is possible to accomplish, but there are drawbacks and limitations to be considered. These include that the goal of automatic product derivation might not be attainable depending on the type of ML problem the practitioner is trying to solve. In our case, a recommender system might not give the best predictions unless manual quality assurance is involved in testing the end product. The experience and results gained in this research indicate that a composition-based approach with components may serve as an excellent starting point in which the architecture and its components are easily reusable, maintainable, scalable and evolvable.

While our limited prototype (or artefact) and experience in machine learning limit the generalisability of the results, this approach provides new insights into how to combine machine learning engineering with software product line engineering. This research clearly illustrates that a composition-based approach with a design-for-change mindset will work. Still, it raises the question of whether automatic product derivation is possible at all.

Based on these conclusions, practitioners should consider whether to spend effort creating a reusable ML model or whether that time is better spent on product-specific implementations that increase stability and could yield better results. Using software product lines as an all-encompassing strategy to handle all resources and processes in a software life cycle may be beneficial. Providing a reusable process to develop, maintain, quality-assure, and evolve will yield significant benefits outside the scope of pure code and component reuse. To better understand these results, future studies could address other ways in which overall software systems with machine learning capabilities could grow in complexity and how to build them sustainably. Further research is needed to determine the effects of testing, and the relationship between refactoring and coherence of results in regards to ML models and the overarching topic.

In essence, our research helps to solve the problem by providing a first investigation of an approach to engineer the software product line architecture, providing lessons learned and practical insights. At the time of writing this thesis, there is next to no literature on the topic of software product lines in combination with machine learning. Therefore, we believe we have addressed a gap in knowledge that might prove useful in the years to come as more and more companies adopt ML into their organisations.

12 References
Aggarwal, C., Hinneburg, A. and Keim, D., 2001. On the Surprising Behavior of Distance Metrics in High Dimensional Space. In Proceedings of the 8th International Conference on Database Theory (ICDT ’01), 1973, pp.420-434.

Al Mamun, M.A. et al., 2019. Evolution of technical debt: An exploratory study. Joint Proceedings of the International Workshop on Software Measurement and the International Conference on Software Process and Product Measurement (IWSM Mensura 2019): Haarlem, The Netherlands, October 7-9, 2019, pp.87–102.

Anguera, M., Portell, M., Chacón-Moscoso, S. and Sanduvete-Chaves, S., 2018. Indirect Observation in Everyday Contexts: Concepts and Methodological Guidelines within a Mixed Methods Framework. Frontiers in Psychology, 9.

Apel, S., Batory, D., Kästner, C. and Saake, G., 2013. Feature-Oriented Software Product Lines. pp.1-278.

Arya, D., Venkataraman, G., Grover, A. and Kenthapadi, K., 2017. Candidate Selection for Large Scale Personalized Search and Recommender Systems. Association for Computing Machinery, pp.1391-1393.

Assawiel, N. (2018). What to do when your training and testing data come from different distributions. [online] freeCodeCamp.org. Available at: https://www.freecodecamp.org/news/what-to-do-when-your-training-and-testing-data-come-from-different-distributions-d89674c6ecd8/ [Accessed 11 Sep. 2019].

Bacciu, D., Gnesi, S., & Semini, L. (2015). Using a Machine Learning Approach to Implement and Evaluate Product Line Features. 11th International Workshop on Automated Specification and Verification of Web System (WWV’15). EPTCS 188, 2015, pp. 75-83. DOI: 10.4204/EPTCS.188.8

Bajo, A. (2020). A Production Machine Learning Pipeline for Text Classification with fastText. [online] Medium. Available at: https://towardsdatascience.com/a-production-machine-learning-pipeline-for-text-classification-with-fasttext-7e2d3132c781 [Accessed 5 Feb. 2020].

Bhatia, M., 2018. Your Guide To Qualitative And Quantitative Data Analysis Methods.

[online] Atlan — Humans of Data. Available at: https://humansofdata.atlan.com/2018/09/qualitative-quantitative-data-analysis-methods/ [Accessed 23 April 2020].

Bosch, J., 2017. Structure Eats Strategy – Software Driven World. [online] Janbosch.com. Available at: https://janbosch.com/blog/index.php/2017/11/25/structure-eats-strategy/ [Accessed 7 April 2020].

Breuel, C., 2020. ML Ops: Machine Learning As An Engineering Discipline. [online] Medium. Available at: https://towardsdatascience.com/ml-ops-machine-learning-as-an-engineering-discipline-b86ca4874a3f [Accessed 24 May 2020].

Brownlee, J. (2020). A Tour of Machine Learning Algorithms. [online] Machine Learning Mastery. Available at: https://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/ [Accessed 30 Jan. 2020].

Brownlee, J., 2019. A Gentle Introduction To Cross-Entropy For Machine Learning. [online] Machine Learning Mastery. Available at: https://machinelearningmastery.com/cross-entropy-for-machine-learning/ [Accessed 14 April 2020].

Camillieri, C., Parisi, L., Blay-Fornarino, M., Precioso, F., Riveill, M., Cancela Vaz, J., Mayerhofer, T., Pierantonio, A., Schätz, B. and Tamzalit, D., 2016. Towards a software product line for machine learning workflows: Focus on supporting evolution. Journal of Proceedings of the 10th Workshop on Models and Evolution co-located with ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems and Software, 1706, pp.65-70.

Cardoza, C., 2018. Google Introduces Tensorflow Hub For Reusable Machine Learning Models - SD Times. [online] SD Times. Available at: https://sdtimes.com/ai/google-introduces-tensorflow-hub-reusable-machine-learning-models/ [Accessed 19 March 2020].

Chauhan, N. (2019). Decision Tree Algorithm — Explained. [online] Medium. Available at: https://towardsdatascience.com/decision-tree-algorithm-explained-83beb6e78ef4 [Accessed 5 Feb. 2020].

Chen, L., Ali Babar, M. and Nuseibeh, B., 2013. Characterizing Architecturally Significant Requirements. IEEE Software, 30(2), pp. 38-45.

Chen, Long-Sheng & Hsu, Fei-Hao & Chen, Mu-Chen & Hsu, Yuan-Chia. (2008). Developing recommender systems with the consideration of product profitability for sellers. Information Sciences. 178. pp. 1032-1048.

Clear, J., 2017. First Principles: Elon Musk On The Power Of Thinking For Yourself. [online] James Clear. Available at: https://jamesclear.com/first-principles [Accessed 4 May 2020].

Covington, P., Adams, J. and Sargin, E., 2016. Deep Neural Networks for YouTube Recommendations. Association for Computing Machinery, 110, pp.191-198.

Crang, M. and Cook, I., 2007. Doing Ethnography. SAGE Publications Inc.

Das, A., Mathieu, C. and Ricketts, D. (2009). Maximizing profit using recommender systems. ArXiv, abs/0908.3633, pp.1-5.

Deng, H. (2019). Recommender Systems in Practice. [online] Medium. Available at: https://towardsdatascience.com/recommender-systems-in-practice-cef9033bb23a [Accessed 8 May 2019].

Deshpande, M. and Karypis, G., 2004. Item-based top- N recommendation algorithms. ACM Transactions on Information Systems (TOIS), 22(1), pp.143-177.

Di Stefano, J. and Menzies, T., 2002. Machine Learning for Software Engineering: Case Studies in Software Reuse. IEEE Transactions on Applications and Industry, pp.246-251.

Eichberg, M., Klose, K., Mitschke, R. and Mezini, M., 2010. Component Composition Using Feature Models. Component-Based Software Engineering, pp.200-215.

Elliott, E., 2018. Composing Software: An Exploration of Functional Programming and Object Composition in JavaScript. Leanpub, pp.1-246.

En.wikipedia.org. 2020. Comma-Separated Values. [online] Available at: https://en.wikipedia.org/wiki/Comma-separated_values [Accessed 30 March 2020].

En.wikipedia.org. 2020. Generic Programming. [online] Available at: https://en.wikipedia.org/wiki/Generic_programming [Accessed 23 March 2020].

En.wikipedia.org. 2020. Pearson Correlation Coefficient. [online] Available at: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient [Accessed 24 March 2020].

En.wikipedia.org. 2020. Sigmoid Function. [online] Available at: https://en.wikipedia.org/wiki/Sigmoid_function [Accessed 14 April 2020].

En.wikipedia.org. 2020. Standing On The Shoulders Of Giants. [online] Available at: https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants [Accessed 1 June 2020].

Feldt, R. and Magazinius, A., 2010. Validity Threats in Empirical Software Engineering Research - An Initial Survey. SEKE, pp.374-379.

Ferguson, R., 2018. Decisions For Sustaining A Software Product Line. [online] Insights.sei.cmu.edu. Available at: https://insights.sei.cmu.edu/sei_blog/2018/10/decisions-for-sustaining-a-software-product-line.html [Accessed 12 April 2020].

Fleder, Daniel and Hosanagar, Kartik. (2007). Recommender systems and their impact on sales diversity. EC’07 - Proceedings of the Eighth Annual Conference on Electronic Commerce. pp. 192-199.

Gamma, E., Helm, R., Johnson, R. and Vlissides, J., 1994. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, p.32.

Gandhi, R. (2018). Naive Bayes Classifier. [online] Medium. Available at: https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c [Accessed 7 Feb. 2020].

Ghorbani, A. and Zou, J., 2017. Why Are Deep Networks Fragile: Deformation of Intertwined Data. International Conference on Machine Learning (ICML).

Goasduff, L. (2015). Taking a First Step to Advanced Analytics. [online] Gartner.com. Available at: https://www.gartner.com/smarterwithgartner/taking-a-first-step-to-advanced-analytics/ [Accessed 4 Sep. 2019].

Gohrani, K., 2019. Different Types Of Distance Metrics Used In Machine Learning. [online] Medium. Available at: https://medium.com/@kunal_gohrani/different-types-of-distance-metrics-used-in-machine-learning-e9928c5e26c7 [Accessed 30 March 2020].

Goldberg, Y., 2020. Node.Js Best Practices. [online] GitHub. Available at: https://github.com/goldbergyoni/nodebestpractices [Accessed 23 May 2020].

Gowda, D., 2017. Data Shuffling - Why It Is Important In Machine Learning & How To Do It?. [online] Linkedin.com. Available at: https://www.linkedin.com/pulse/data-shuffling-why-important-machine-learning-how-do-deepak-n-gowda/ [Accessed 13 April 2020].

Gregor, Shirley and Hevner, Alan. (2013). Positioning and Presenting Design Science Research for Maximum Impact. MIS Quarterly. pp. 337-356.

Gupta, A. (2019). ML — Semi-Supervised Learning. [online] GeeksforGeeks. Available at: https://www.geeksforgeeks.org/ml-semi-supervised-learning/ [Accessed 4 Feb. 2020].

Gupta, A., 2018. Understanding Logistic Regression. [online] GeeksforGeeks. Available at: https://www.geeksforgeeks.org/understanding-logistic-regression/ [Accessed 15 April 2020].

Hamlet, D., 2010. Composing Software Components. New York: Springer.

Heller, M. (2019). Machine learning algorithms explained. [online] InfoWorld. Available at: https://www.infoworld.com/article/3394399/machine-learning-algorithms-explained.html [Accessed 7 Jan. 2020].

Hevner, A.R., March, S.T., Park, J. and Ram, S. (2004). Design Science in Information Systems Research. Management Information Systems Quarterly, pp. 75-105.

Hosch, W., 2011. Euclidean Space. [online] Encyclopedia Britannica. Available at: https://www.britannica.com/science/Euclidean-space [Accessed 24 March 2020].

Isinkaye, F., Folajimi, Y. and Ojokoh, B., 2015. Recommendation systems: Principles, methods and evaluation. Egyptian Informatics Journal, 16(3), pp.261-273.

Jannach, D. and Adomavicius, G. (2017). Price and Profit Awareness in Recommender Systems. ArXiv, abs/1707.08029, pp.1-6.

Kundu, S., 2019. Various Types Of Distance Metrics Machine Learning. [online] Medium. Available at: https://medium.com/analytics-vidhya/various-types-of-distance-metrics-machine-learning-cc9d4698c2da [Accessed 25 March 2020].

Liao, K. (2018). Prototyping a Recommender System Step by Step Part 1: KNN Item-Based Collaborative Filtering. [online] Medium. Available at: https://towardsdatascience.com/prototyping-a-recommender-system-step-by-step-part-1-knn-item-based-collaborative-filtering-637969614ea [Accessed 13 Feb. 2020].

Loshin, D. (2016). Supervised vs. Unsupervised Learning (Part 3 in a Series). [online] Transforming Data with Intelligence. Available at: https://tdwi.org/articles/2016/03/17/supervised-vs-unsupervised-learning.aspx [Accessed 13 Feb. 2020].

Luo, S. (2018). Intro to Recommender System: Collaborative Filtering. [online] Medium. Available at: https://towardsdatascience.com/intro-to-recommender-system-collaborative-filtering-64a238194a26 [Accessed 12 Feb. 2020].

Masters, M., 2010. An Overview Of Requirements Elicitation. [online] Modernanalyst.com. Available at: https://www.modernanalyst.com/Resources/Articles/tabid/115/ID/1427/An-Overview-of-Requirements-Elicitation.aspx [Accessed 7 April 2020].

McCaw, B., 2019. The Batch: Companies Slipping On AI Goals, Self Training For Better Vision, Muppets And Models, China Vs US?, Only The Best Examples, Proliferating Patents. [online] Blog.deeplearning.ai. Available at: https://blog.deeplearning.ai/blog/the-batch-companies-slipping-on-ai-goals-self-training-for-better-vision-muppets-and-models-china-vs-us-only-the-best-examples-proliferating-patents [Accessed 27 May 2020].

Mitra, A., 2020. Methods Of Study Designs- Observational Studies & Surveys. [online] Medium. Available at: https://towardsdatascience.com/methods-of-study-designs-observational-studies-surveys-22f0a04c7446 [Accessed 21 April 2020].

ML Glossary. 2017. Loss Functions. [online] Available at: https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html [Accessed 14 April 2020].

Morisio, M., Ezran, M. and Tully, C., 2002. Success and failure factors in software reuse. IEEE Transactions on Software Engineering, 28(4), pp.340-357.

Mwiti, D. (2018). How to build a Simple Recommender System in Python. [online] Medium. Available at: https://towardsdatascience.com/how-to-build-a-simple-recommender-system-in-python-375093c3fb7d [Accessed 10 Sep. 2019].

Northrop, L., 2010. Introduction to Software Product Lines. Software Product Lines: Going Beyond, pp.521-522.

Orlikowski, W. and Iacono, C. (2001). Research Commentary: Desperately Seeking the “IT” in IT Research—A Call to Theorizing the IT Artifact. Information Systems Research, 12(2), pp.121-134.

Pandey, P. (2019). The Remarkable world of Recommender Systems. [online] Medium. Available at: https://towardsdatascience.com/the-remarkable-world-of-recommender-systems-bff4b9cbe6a7 [Accessed 14 Feb. 2020].

Pandey, P., 2019. Understanding The Mathematics Behind Gradient Descent. [online] Medium. Available at: https://towardsdatascience.com/understanding-the-mathematics-behind-gradient-descent-dde5dc9be06e [Accessed 14 April 2020].

Peffers, K., Tuunanen, T., Rothenberger, M. and Chatterjee, S. (2008). A Design Science Research Methodology for Information Systems Research. Journal of Management Information Systems, 24(3), pp.45-77.

Pollock, T., 2019. The Difference Between Structured, Unstructured & Semi-Structured Interviews. [online] Oliver Parks Consulting LLC - Technology Sector Recruitment Experts. Available at: https://www.oliverparks.com/blog-news/the-difference-between-structured-unstructured-amp-semi-structured-interviews [Accessed 19 April 2020].

Presthus, Wanda and Munkvold, Bjørn Erik. (2016). How to frame your contribution to knowledge? A guide for junior researchers in informaton systems. Paper presented at NOKO- BIT 2016, vol. 24. pp.1-14.

Qaheaven.blogspot.com. 2011. BAPO Model. [online] Available at: http://qaheaven.blogspot.com/2011/01/bapo-model.html [Accessed 16 April 2020].

Ram, S. (2003). Dr. Alan Kay On The Meaning Of "Object-Oriented Programming". [online] Userpage.fu-berlin.de. Available at: http://www.purl.org/stefan_ram/pub/doc_kay_oop_en [Accessed 4 May 2020].

Ray, S. (2017). Essentials of Machine Learning Algorithms (with Python and R Codes). [online] Analytics Vidhya. Available at: https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/ [Accessed 4 Feb. 2020].

Robinson, A. (2017). How To Calculate Euclidean Distance. [online] Sciencing.com. Available at: https://sciencing.com/how-to-calculate-euclidean-distance-12751761.html [Accessed 22 March 2020].

Rocca, B. (2019). Introduction to recommender systems. [online] Medium. Available at: https://towardsdatascience.com/introduction-to-recommender-systems-6c66cf15ada [Accessed 11 Nov. 2019].

Sander, K. (2019). Induktiv Og Deduktiv Studie. [online] eStudie.no. Available at: https://estudie.no/induktiv-deduktiv/ [Accessed 16 April 2020].

Sanjay, M. (2018). Why And How To Cross Validate A Model?. [online] Medium. Available at: https://towardsdatascience.com/why-and-how-to-cross-validate-a-model-d6424b45261f [Accessed 14 April 2020].

Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., Chaudhary, V., Young, M., Crespo, J. and Dennison, D. (2015). Hidden Technical Debt in Machine Learning Systems. Advances in Neural Information Processing Systems 28 (NIPS 2015), pp.1-9.

Seif, G. (2020). An Easy Introduction To Machine Learning Recommender Systems - KDnuggets. [online] KDnuggets. Available at: https://www.kdnuggets.com/2019/09/machine-learning-recommender-systems.html [Accessed 7 April 2020].

Sharma, N. (2019). Importance Of Distance Metrics In Machine Learning Modelling. [online] Medium. Available at: https://towardsdatascience.com/importance-of-distance-metrics-in-machine-learning-modelling-e51395ffe60d [Accessed 22 March 2020].

Soni, D. (2020). Supervised vs. Unsupervised Learning. [online] Medium. Available at: https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d [Accessed 30 Jan. 2020].

Srivastava, T. (2018). Introduction to k-Nearest Neighbors: A powerful Machine Learning Algorithm (with implementation in Python & R). [online] Analytics Vidhya. Available at: https://www.analyticsvidhya.com/blog/2018/03/introduction-k-neighbours-algorithm-clustering/ [Accessed 5 Feb. 2020].

Starkweather, D. and Kay Moske, D. (2011). Multinomial Logistic Regression. Science Direct, pp.1-6.

Statistics How To (2020). Correlation Coefficient: Simple Definition, Formula, Easy Calculation Steps. [online] Available at: https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/correlation-coefficient-formula/ [Accessed 24 March 2020].

Strunk Jr., W. (1999). The Elements of Style, Fourth Edition. Longman, p.35.

Swaminathan, S. (2018). Logistic Regression — Detailed Overview. [online] Medium. Available at: https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc [Accessed 15 Apr. 2020].

Tan, L., Lin, Y. and Ye, H. (2012). Quality-Oriented Software Product Line Architecture Design. Journal of Software Engineering and Applications, 05(07), pp.472-476.

Teknomo, K. (2015). Chebyshev Distance. [online] People.revoledu.com. Available at: https://people.revoledu.com/kardi/tutorial/Similarity/ChebyshevDistance.html [Accessed 31 March 2020].

Temple, P., Galindo, J., Acher, M. and Jézéquel, J. (2016). Using machine learning to infer constraints for product lines. Proceedings of the 20th International Systems and Software Product Line Conference (SPLC '16).

TensorFlow (2020). TensorFlow Tensors. [online] Available at: https://www.tensorflow.org/guide/tensor [Accessed 14 April 2020].

Uio.no (2020). Ethical Guidelines. [online] Available at: https://www.uio.no/english/about/regulations/ethical-guidelines/ [Accessed 1 June 2020].

van der Linden, F., Schmid, K. and Rommes, E. (2007). Software Product Lines in Action: The Best Industrial Practice in Product Line Engineering. Springer Berlin Heidelberg, pp.1-333.

Walsham, G. (2006). Doing interpretive research. European Journal of Information Systems, 15(3), pp.320-330.

Weisstein, E. (2020). Covariance. [online] Mathworld.wolfram.com. Available at: https://mathworld.wolfram.com/Covariance.html [Accessed 24 March 2020].

Zinkevich, M. (2019). Rules of Machine Learning: Best Practices for ML Engineering. [online] Google Developers. Available at: https://developers.google.com/machine-learning/guides/rules-of-ml/ [Accessed 30 May 2019].

Appendices

Appendix A Product requirements from Company A and Company B

Appendix B Interview templates

Appendix C Survey questions

Appendix D Code extract from the prototype

A Product requirements
A.1 Company A - product requirements
The product requirements for Company A's product. Written by Sebastian for Snapper Net Solutions AS.

Company A - Product requirements

User stories

Handbook

As a user, I want quick access to information that is relevant to me.

As a user, I want to read information about the various work tasks in the store.

As a user, I want information about the goods we sell in the store.

As a user, I want information about how the application (Product A) works.

As a user, I want information about our services and offers.

As a user, I want information about how our other applications work (e.g. Course A).

As a user, I want information about our concept (e.g. the concept handbook).

As a user, I want to be able to search for (filter) the information I need.

Catalogues

As a user, I want information that is relevant to me and the store for the current week.

As a user, I want to see customer satisfaction statistics.

Miscellaneous

As a user, I want to be independent and not rely on a superior to complete elementary tasks in the application.


Courses

As a user, I want access to all courses that are relevant to me.

As a user, I want an overview of how many courses I have completed.

Highscore

As a user, I want to see how I am doing compared to my colleagues.

As a user, I want to see how I am doing compared to employees in the cooperative.

As a user, I want to see how I am doing compared to others nationwide.

As a user, I want to see how my store is doing compared to other stores in the cooperative.

As a user, I want to see how my store is doing compared to other stores nationwide.

Track (Løype)

As a user, I want to see my own progression through the track.

As a user, I want to see which nano-courses I have available.

As a user, I want to know when the next track will be launched.

As a user, I want to be able to go back and see how many points I got on a nano-course.

Login

As a user, I want to be able to get help recovering my password if I have forgotten it.


Store manager administration:

As a store manager, I want to see the progress of all my employees.

As a store manager, I want to see which employees are not active.

As a store manager, I want to see my store's completion rate.

A.2 Company B - product requirements
The product requirements for Company B's product. Written by Sebastian for Snapper Net Solutions AS.

Company B - Product requirements

User stories

Handbook

As a user, I want quick access to information that is relevant to me.

As a user, I want to read information about the various work tasks in the store.

As a user, I want information about the goods we sell in the store.

As a user, I want information about how the application (Company B App) works.

As a user, I want information about our services and offers.

As a user, I want information about how our other applications work.

As a user, I want information about our concept (e.g. the concept handbook).

As a user, I want to be able to search for (filter) the information I need.

Miscellaneous

As a user, I want to be independent and not rely on a superior to complete elementary tasks in the application.


Courses

As a user, I want access to all courses that are relevant to me.

As a user, I want an overview of how many courses I have completed.

Highscore

As a user, I want to see how I am doing compared to my colleagues.

As a user, I want to see how I am doing compared to employees in the cooperative.

As a user, I want to see how I am doing compared to others nationwide.

As a user, I want to see how my store is doing compared to other stores in the cooperative.

As a user, I want to see how my store is doing compared to other stores nationwide.

Track (Løype)

As a user, I want to see my own progression through the track.

As a user, I want to see which nano-courses I have available.

As a user, I want to know when the next track will be launched.

As a user, I want to be able to go back and see how many points I got on a nano-course.

Map

As a user, I want to be able to choose which track I can see.

As a user, I want to be able to create multiple maps (different views).


Login

As a user, I want to be able to get help recovering my password if I have forgotten it.

B Interview templates
B.1 Architecture interview template
Interview template from an architectural viewpoint.

Subject-Matter Experts Analysis - Code & Architecture

Interviewer: Date: Time:

Observer: Place:

Code being analysed:

Expert: Gender:

1. Are there any repetitive lines of code or opportunities for useful abstraction?

-

2. How much effort would you estimate you would need to create a separate version of this product?

-

3. Do you understand how to use [this] interface?

-

4. How easy is it to understand?

-

5. Does this code accomplish the author's purpose?

-

6. Do you see potential for useful abstractions?

-

7. Does this code follow standard patterns/guidelines?

-

8. Is the code easy to maintain?

-

9. Can I unit test / debug the code easily to find the root cause?

-

10. Is this function or class too big? Does it have too many responsibilities?

-

11. Is the code thoroughly commented?

-

a. Does the intent comment match the logic?

-

b. Does the intent comment have any typos?

-

12. Does this code change compile-time or run-time dependencies between projects?

-

13. Are any of your objections based on subjective preferences?

-

14. Would you have written it in the same way?

-

15. Is the performance acceptable with large volumes of data?

-

16. Does the code meet the non-functional requirements?

-

17. Does the code follow the defined architecture?

-

18. Does the code follow OOAD (Object-Oriented Analysis & Design) principles?

-

a. Does the code follow the single responsibility principle?

-

b. Does the code follow the open/closed principle?

-

c. Does the code use dependency injection?

-

19. Is there any hard-coding that should use constants or configuration values instead?

-

20. Does it use framework features wherever possible?

-

21. Is the code reusable?

-

a. DRY (Do not Repeat Yourself) principle: the same code should not be repeated more than twice.

-

b. Consider reusable services, functions and components.

-

c. Consider generic functions and classes.

-

B.2 Business and process interview template
Interview template from the business and process point of view.

Subject-Matter Experts Analysis - Business & Process

Interviewer: Date: Time:

Observer: Place:

Code being analysed:

Expert: Gender:

Business view

1. How will the software product line prototype impact business performance?

-

2. How can software product line engineering be introduced in the company?

-

3. How marketable would our software product line prototype be for the company?

-

4. How would the software product line prototype evolve over time?

-

5. What is the relationship between the individual products in the software product line?

-

6. Can any of the products from the same product line supplant each other over time on the market?

-

7. Can any of the products from the same product line support each other over time on the market?

-

8. Based on the prototype we have made, how long do you think it will take to mature the software product line?

-

9. In a young market, the required product variety is usually much lower than in a mature market. How mature would you consider the market our software product line competes in?

-

The first dimension describes how many different products are available on the market at the same time. The second dimension describes how fast a new product is supplanted by a newer one.

10. The overall product line life-cycle can exhibit different forms of dynamism, as shown in the figure above. Which kind of life cycle is most appropriate for our software product line with regard to the characteristics of the market?

-

Process view

1. Do you think the process of adopting new products to the model would be cost-effective in the long term?

-

2. Do you think the process of adopting new products to the model would be time-consuming and demanding?

-

3. Does it require a cross-functional team to make instances of products in the SPL (that can use the recommender system)?

-

3.1. What competence is required in the cross-functional team?

-

3.2. Would the team be divided into domain- and application-engineering groups?

-

4. In terms of reliability, safety and usability, how much would a change to a software product line approach benefit these product qualities?

-

The KANO model is an approach used to define new products on the market. The distinction between delighters, satisfiers, and basic requirements provides the basis for determining the chances of a product on the market.

● Delighters: These go beyond standard customer expectations – and thus beyond the competition. In order to be successful, a product should have some delighters.

● Satisfiers: Customer satisfaction is roughly proportional to the degree of satisfaction of the requirement.

● Basic requirements: These correspond to the fundamental expectations of the customers. As a consequence, these must be fulfilled in order for a product to be successful.

5. Which category would you classify the software product line prototype to be in?

-

6. How many products do you envision the software product line supporting?

-

7. How many products would you estimate to be similar? And how many products would require product-specific configurations?

-

C Survey - Google Form
The survey we published through Google Forms, containing 11 questions on recommender systems and how people interact with them.


D Code extract from the prototype
D.1 Engine
Code extract of the engine.

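The engine extract itself is not reproduced in this version of the document. Purely as an illustration of the kind of recommender engine discussed in the thesis, the sketch below shows a minimal item-based collaborative-filtering engine in Python. The class name RecommenderEngine, the use of NumPy and cosine similarity, and the example data are assumptions made for this sketch and are not taken from the prototype's actual code.

import numpy as np

class RecommenderEngine:
    """Item-based collaborative filtering over a user-item rating matrix.

    Illustrative sketch only; names and structure are assumptions, not the
    prototype's actual implementation.
    """

    def __init__(self, ratings):
        # ratings: 2-D array, rows = users, columns = items, 0 = no rating.
        self.ratings = np.asarray(ratings, dtype=float)
        # Pre-compute item-to-item similarities (items are columns, so transpose).
        self.item_similarity = self._cosine_similarity(self.ratings.T)

    @staticmethod
    def _cosine_similarity(matrix):
        # Cosine similarity between every pair of rows in `matrix`.
        norms = np.linalg.norm(matrix, axis=1, keepdims=True)
        norms[norms == 0] = 1.0          # avoid division by zero for empty rows
        normalised = matrix / norms
        return normalised @ normalised.T

    def recommend(self, user_index, k=3):
        # Score unrated items by the similarity-weighted ratings of rated items.
        user_ratings = self.ratings[user_index]
        scores = self.item_similarity @ user_ratings
        scores[user_ratings > 0] = -np.inf   # never recommend already-rated items
        top = np.argsort(scores)[::-1][:k]
        return [int(i) for i in top if scores[i] > -np.inf]


if __name__ == "__main__":
    # Tiny example: 4 users x 5 items, 0 meaning "not rated".
    ratings = [
        [5, 3, 0, 1, 0],
        [4, 0, 0, 1, 1],
        [1, 1, 0, 5, 4],
        [0, 0, 5, 4, 0],
    ]
    engine = RecommenderEngine(ratings)
    print(engine.recommend(user_index=0, k=2))

The sketch keeps the similarity computation separate from the recommendation step so that a different distance or similarity metric could be substituted without changing the rest of the engine.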

D.2 CSV Loader
Code extract of the CSV loader.

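The CSV loader extract is likewise not reproduced here. The following is a minimal illustrative sketch of a CSV loader in Python using the standard library csv module; the function name load_ratings, the column names userId, itemId and rating, and the file name ratings.csv are hypothetical and may differ from the prototype's actual data layout.

import csv

def load_ratings(path, user_col="userId", item_col="itemId", rating_col="rating"):
    """Read a ratings CSV into a list of (user, item, rating) tuples.

    Column names are assumptions for this sketch; the prototype's CSV
    layout may differ.
    """
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        for line in reader:
            rows.append((line[user_col], line[item_col], float(line[rating_col])))
    return rows


if __name__ == "__main__":
    # Example usage with a hypothetical file name; requires ratings.csv to exist.
    for user, item, rating in load_ratings("ratings.csv")[:5]:
        print(user, item, rating)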
