Mining and Analyzing User Rationale in Software Engineering
Total Page:16
File Type:pdf, Size:1020Kb
University of Hamburg Mining and Analyzing User Rationale in Software Engineering Dissertation with the aim of achieving a doctoral degree (Dr. rer. nat) at the Faculty of Mathematics, Informatics, and Natural Sciences Department of Informatics of the University of Hamburg submitted by Zijad Kurtanović from Sarajevo Hamburg, 2018 Day of oral defense: 19.04.2018 Head of examination commission: Prof. Dr. Ingrid Schirmer Evaluators of the dissertation: 1. Prof. Dr. Walid Maalej 2. Prof. Dr. Nicole Novielli To my family, with love. Acknowledgments I would like to express my deep gratitude to Walid Maalej, for his trust, visionary guidance, and inspirational support. I feel honored to had him as my supervisor and cannot express how much I learned from him. Thank you for your invaluable feedback and encouragement to pursue this topic and the opportunity to learn and continuously grow in and beyond research. I also thank Nicole Novielli who accepted to be my second supervisor. Thank you for the inspiring talks and discussions on natural language processing during your stay in Hamburg and your valuable feedback. I am lucky to have worked with great colleagues at the Applied Software Tech- nology Group: Alexander Beifuß, Mathias Ellmann, Fariba Fazli, Davide Fucci, Marlo Häring, Chakajkla Jesdabodi, Timo Johann, Clara Marie Lüders, Na- talia Mannov, Daniel Martens, Lloyd Montgomery, Yen Dieu Pham, Christoph Stanik, Rebecca Tiarks, and Nedaa Zirjawi. I am grateful for all the fruitful discussions, your valuable feedback, all the shared experiences and memorable moments that shaped me and my research. Thank you for any kind of support. I am deeply thankful to all of you for amplifying my moments of joy and being there in tough times. I thank my students I have supervised or I have worked with, the co-authors, or any other persons who inspired or helped me during my studies, in particu- lar Alireza MollaAlizadeh Bahnemiri, Daniel Dabrowski, Jan Hennings, Hadeer Nabil, Ahmed Saad, Julian Schmidt, and Mansoureh Ziaei. I thank Dominik Herrmann for constructive discussions and feedback on sam- pling strategies. I gratefully thank Swapneel Sheth for his warm attitude, valu- able discussions, and help during his stay in Hamburg. I am thankful to René Schumann and Ingo J. Timm for inspiring and encouraging me to pursue doc- toral studies. I gratefully acknowledge the assistance and support of Heidi Oskarsson and Carina Volkmer, for their constant positive attitude and support particularly with organizational and bureaucratic obstacles at the University. I would like to thank the whole team of the Informatics Computing Center. A special thanks goes to Reinhard Zierke for his highly competent and professional support with the infrastructure. I am very grateful to had the opportunity to be part of the SCAn project and to work with great project colleagues from Hans-Bredow Institute, particularly Wiebke Loosen. The work in SCAn shaped my work in the final stage of my dissertation. I am also grateful for the opportunity to be part of the OPENREQ project and to work with highly talented individuals from research and industry. Finally, I am deeply grateful and can hardly thank my family enough, partic- ularly my dear parents for their unconditional love, sacrifice, and support. I am also deeply grateful to my brothers, my sister, and friends who were always there when I needed them most. Finally, I owe my deepest love and gratitude to my wife Selma for her enormous love and devotion, for her constant and unselfish support throughout this uncertain journey, and to our two lovely daughters that remind me every day how to enjoy life. A big thanks to everyone else I missed to mention who supported me throughout my doctoral studies. Abstract Rationale refers to the reasoning and justification behind human decisions, opin- ions, and beliefs. In software engineering, rationale is important for capturing and documenting requirements and design decisions and consequently organiz- ing and reusing knowledge in software organizations. While rationale knowledge typically originates from professional stakeholders involved in a software project (e.g., business analysts, developers, managers), nowadays there is a potential in augmenting this knowledge with the rationale of users, posted e.g., in app stores or social media. User feedback contains a significant amount of knowledge in- cluding rationale that we can mine and use for software engineering purposes. Unfortunately, studying and mining rationale from user feedback for software engineering has been so far deficiently researched. This thesis empirically studies rationale written by end users in online reviews using grounded theory approach and peer content analysis. We studied users reasoning and justification, for example how users explain their decisions, e.g. on upgrading, installing, or switching the application. We also studied the characteristics and frequency distribution of the identified rationale concepts, such as issues encountered, alternatives considered, or criteria for assessment. We found that criteria such as performance, compatibility, and usability, which play an important role during requirements analysis, system design, and project management activities, represent the most frequent user rationale concept. We also found that users express and justify their stances by criteria assessments. Using a manually labeled dataset of software reviews we studied how accu- rately we can automatically mine rationale concepts from reviews using super- vised machine learning and identified potentials and challenges. We also studied whether we can augment an industrial criteria dataset with our user rationale dataset to improve classification accuracy of non-functional requirements, by handling class imbalances and by enlarging the industrial dataset. We also used a dataset of pro and contra user comments on controversial issues to assess topic- independent lexical features and significance of comment’s parts (e.g., sentence position) for stance mining. We found classification and data insights for stance miners and discuss their potential for software engineering. Inspired from our studies and empirical findings, we introduce and discuss the Rationalytics framework and two prototypes as a proof of concept for rationale and stance mining tools for software engineering projects. Zusammenfassung Begründungen werden dazu verwendet um menschliche Entscheidungen, Mei- nungen und Überzeugungen zu rechtfertigen. In der Softwareentwicklung sind Begründungen wichtig, um Anforderungen und Designentscheidungen zu er- fassen und zu dokumentieren, und das folglich entstandene Wissen in Software- organisationen zu organisieren und wiederzuverwenden. Während Begründun- genswissen hauptsächlich von professionellen Stakeholdern stammen, die an einem Softwareprojekt beteiligt sind (z.B. Business-Analysten, Entwickler, Pro- jektmanager), besteht heutzutage das Potenzial, dieses Wissen durch die Be- gründungen der Softwarenutzer zu erweitern, die z.B. in App-Stores oder sozialen Medien veröffentlicht werden. Nutzerfeedback enthält eine erhebliche Menge an nützlichem Wissen, einschließlich Begründungen, die wir für die Softwareen- twicklung extrahieren und verwenden können. Das Studium und die Extraktion von Begründungen aus dem Nutzerfeedback für Softwareentwicklung-Zwecke wurde bisher jedoch unzureichend erforscht. Diese Arbeit untersucht empirisch Begründungen von Nutzern in Online- Bewertungen unter Anwendung des Grounded-Theory Ansatzes und der Peer- Inhaltsanalsyse. Wir haben Argumentationen und Rechtfertigungen studiert, beispielsweise wie Nutzer ihre Entscheidungen erklären, über die Aktualisierung, Installation oder den Wechsel der Anwendung. Außerdem untersuchten wir die Merkmale und Häufigkeitsverteilung der identifizierten Begründungskonzepte, z.B. aufgetretene Probleme, berücksichtigte Alternativen oder Bewertungskri- terien. Wir haben festgestellt, dass Kriterien wie Leistung, Kompatibilität und Benutzerfreundlichkeit, die bei Anforderungsanalyse, Systemgestaltung und Projektmanagement-Aktivitäten eine wichtige Rolle spielen, die häufigsten ver- wendeten Begründungskonzepte darstellen. Wir haben auch festgestellt, dass Nutzer ihre Positionen durch Kriterienbewertungen ausdrücken und rechtferti- gen. Anhand eines manuell beschrifteten Datensatzes von Software-Bewertungen haben wir untersucht, wie genau wir Begründungskonzepte aus Nutzerbewer- tungen mit überwachtem maschinellem Lernen automatisch gewinnen können und Potenziale und Herausforderungen identifiziert. Wir haben auch unter- sucht, ob wir einen industriellen Kriteriendatensatz mit unserem Nutzerdaten- satz ergänzen können, um durch Handhabung von Klassenungleichgewichten und Erweiterung des Datensatzes, die Klassifikationsgenauigkeit von nicht-funk- tionalen Anforderungen im Kriteriendatensatz zu verbessern. Wir verwende- ten auch einen Datensatz von Pro-Contra Nutzerkommentaren zu kontroversen Themen, um themenunabhängige lexikalische Merkmale sowie die Signifikanz von Kommentarteilen (z.B. Satzpositionen) für automatische Identifikation von Nutzerpositionen zu evaluiren. Wir fassten unsere Klassifikations- und Datenein- blicke für die Erkennung von Nutzerpositionen zusammen und diskutieren ihr Potenzial für Softwareentwicklung. Inspiriert von unseren Studien und empirischen Ergebnissen, stellen wir das Rationalytics-Framework vor, sowie zwei Prototypen als Konzeptnachweis für Werkzeuge zur Extraktion von Nutzerbegründungen und -haltungen für Softwareentwicklungs-Projekte.