
Automated Extraction of Feature and Variability Information from Natural Language Requirement Specifications

DISSERTATION

for the award of the academic degree Doktoringenieur (Dr.-Ing.)

accepted by the Faculty of Computer Science of the Otto-von-Guericke-Universität Magdeburg

by M.Sc. Yang Li
born on 14.06.1987 in Hebei, China

Reviewers
Prof. Dr. Gunter Saake
Prof. Dr. Andreas Nürnberger
Prof. Dr. Rick Rabiser

Magdeburg, 30.10.2020

Abstract

Software product lines support structured reuse of software artifacts to realize the maintenance and evolution of the typically large number of variants, which promotes the industrialization of software development, especially for software-intensive products. Feature and variability information extraction from different artifacts is an indispensable activity to support the systematic integration of single software systems and software product lines. However, for a legacy system, it is non-trivial to gain information about commonalities and differences of the variants. Beyond manually extracting commonalities and variabilities, a variety of approaches, such as feature location in source code and feature extraction from requirements, have been proposed to provide automatic identification of features and their variation points. Compared with source code, requirements contain more complete variability information and provide traceability links to other artifacts from early development phases. In this thesis, we provide a systematic literature review, which contains a multi-dimensional overview of approaches to feature extraction from natural language documents. Based on the observations from this review, we provide feasible and accurate approaches to improve the efficiency of feature extraction. To achieve this goal, we first explore the application of deep learning technologies in feature extraction.
Second, we propose a hybrid approach based on multiple natural language processing and data mining techniques to extract features and variability information. Third, in order to provide understandable notations for features, we propose an approach combining keyword extraction and machine learning methods to predict feature-related terms. Fourth, we apply the proposed feature extraction approaches to analyze the requirements from a real-world scenario, where we adjust the framework and combine other algorithms to account for the particularities of real-world requirements. We empirically show how our proposed approaches can be used to extract features and variation points, and the results show that using them can benefit the extraction process.

Zusammenfassung

Software product lines support the structured reuse of software artifacts in order to realize the maintenance and further development of the typically large number of variants, which promotes the industrialization of software development, especially for software-intensive products. Extracting feature and variability information from different artifacts is an indispensable activity for supporting the systematic integration of individual software systems and software product lines. For a legacy system, however, it is not trivial to obtain information about the commonalities and differences of the variants. In addition to manually extracting commonalities and variabilities, a variety of approaches have been proposed, e.g., feature location in source code and feature extraction from requirements, to identify features and their variation points automatically. Compared with source code, requirements contain more comprehensive variability information and provide traceability links to other artifacts from early phases of software development. In this thesis, we provide a systematic literature review that contains a multi-dimensional overview of approaches to feature extraction from natural language documents. Based on the observations from this study, we propose practicable and accurate approaches for improving the efficiency of feature extraction. To achieve this goal, we first investigate the application of deep learning technologies to feature extraction. Second, we propose a hybrid approach based on multiple natural language processing and data mining techniques to extract feature and variability information. Furthermore, we present an approach that combines keyword extraction and machine learning methods to predict feature-related terms, so that understandable notations for features can be provided. Finally, we apply the previously presented feature extraction approaches to analyze the requirements from a real-world scenario in practice, adjusting the framework and combining other algorithms with regard to the particularities of real-world requirements. We show empirically how the approaches we propose can be used to extract features and variation points. At the same time, the results show that using these approaches can benefit the extraction process.

Acknowledgements

I would like to express my deepest gratitude to Gunter Saake. He gave me the opportunity to pursue my Ph.D. under his supervision, provided an excellent research environment, and gave me the freedom to choose my research direction.

I would like to thank Sandro Schulze for his valuable and constructive suggestions during the planning and development of my research. His attentive guidance, comments, and feedback improved my academic writing skills.
During the last four years, we had numerous fruitful discussions that had a major impact on my research. His willingness to give his time so generously has been very much appreciated.

I would like to offer my special thanks to all my colleagues in our team, without whom these fruitful discussions would not have been possible. Their advice has been a great help in my research. I am particularly grateful for the assistance given by Xiao Chen, Sebastian Krieter, Jacob Krüger, Wolfram Fenske, David Broneske, Juliana Alves Pereira, Mustafa Al-Hajjaji, Gabriel Campero Durand, Fabian Benduhn, Jens Meinicke, and Anja Buch. They not only supported me in my research, but also helped me solve the problems I encountered in life.

I would also like to thank all the researchers I met on the path of my research. In particular, I thank Thomas Fogdal, Helene Scherrebeck, Jiahua Xu, Stefania Gnesi, and Laura Semini. Their collaboration, feedback, and comments had a great impact on my research. Moreover, I would like to thank Andreas Nürnberger and Rick Rabiser for being the reviewers of my thesis.

Last but not least, I want to thank my girlfriend Bowen, who brings a lot of joy and happiness to our life. I am very grateful to my parents and brother for their selfless support and love.

Contents

List of Figures
List of Tables
List of Acronyms

1 Introduction
  1.1 Goal of the Thesis
  1.2 Structure of the Thesis

2 Background
  2.1 Software Product Line Engineering
    2.1.1 Domain Engineering
    2.1.2 Application Engineering
    2.1.3 Feature Model
    2.1.4 Gap between SPL and Traditional Software Reuse
  2.2 Natural Language Processing
    2.2.1 Preprocessing
    2.2.2 Word Embedding
    2.2.3 Recognizing Textual Entailment
  2.3 Summary

3 Current Research on Feature and Variability Extraction
  3.1 Review Methodology
    3.1.1 Need for a Review
    3.1.2 Research Questions
    3.1.3 Search Strategy
    3.1.4 Conducting the Review
  3.2 Results
    3.2.1 Results of Studies Search
    3.2.2 Answering Research Questions
  3.3 Discussion
  3.4 Threats to Validity
  3.5 Related Work
  3.6 Summary

4 An Initial Self-Learning Structure for Feature Extraction
  4.1 Methodology
    4.1.1 Overview
    4.1.2 Laplacian Eigenmaps
    4.1.3 Convolutional Neural Network
    4.1.4 Clustering
  4.2 Preliminary Result
    4.2.1 Discussion
  4.3 Related Work
  4.4 Summary

5 VarMine: Reverse Engineering Variability in A Hybrid Way
  5.1 VarMine in a Nutshell
  5.2 Semantic Similarity Network
    5.2.1 Word Level Similarity
    5.2.2 Requirement Level Similarity
  5.3 Feature and Variability Extraction
    5.3.1 Feature Extraction
    5.3.2 Optionality and Group Constraints Detection
    5.3.3 Cross-Tree Constraints Detection
  5.4 Evaluation
    5.4.1 Research Questions
    5.4.2 Case Study Description
    5.4.3 Clustering Evaluation
    5.4.4 Feature Model Evaluation
    5.4.5 Comparison with SOVA and ArborCraft
    5.4.6 Answering RQs
  5.5 Threats to Validity
  5.6 Related Work
  5.7 Summary

6 The Inference of the Notions of Features
  6.1 Methodology
    6.1.1 Dataset Generation
    6.1.2 Dataset Preprocessing
    6.1.3 Training Process
  6.2 Evaluation