Interpretability of Configurable Software in the Biosciences
Total Page:16
File Type:pdf, Size:1020Kb
Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2020 Interpretability of configurable software in the biosciences Mikaela Cashman McDevitt Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Recommended Citation McDevitt, Mikaela Cashman, "Interpretability of configurable software in the biosciences" (2020). Graduate Theses and Dissertations. 18184. https://lib.dr.iastate.edu/etd/18184 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Interpretability of configurable software in the biosciences by Mikaela Cashman A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Computer Science Program of Study Committee: Myra B. Cohen, Major Professor James Lathrop Robyn Lutz Samik Basu Julie Dickerson The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred. Iowa State University Ames, Iowa 2020 Copyright © Mikaela Cashman, 2020. All rights reserved. ii DEDICATION This thesis is dedicated to my husband John and our feline children Leela and Romana. iii TABLE OF CONTENTS Page LIST OF TABLES . vi LIST OF FIGURES . xiii ACKNOWLEDGMENTS . xvi ABSTRACT . xviii CHAPTER 1. INTRODUCTION . .1 1.1 Motivation . .4 CHAPTER 2. BACKGROUND . .9 2.1 Configurability in Software . .9 2.2 Software Product Lines . 13 2.3 Bioinformatics . 15 2.4 Synthetic Biology . 16 CHAPTER 3. CONFIGURABILITY IN BIOINFORMATICS SOFTWARE . 18 3.1 Motivating Examples . 18 3.2 Case Study - Experimenting with Configurations . 20 3.2.1 Bioinformatics Programs . 21 3.2.2 Creating the Models . 23 3.2.3 Measures of Variability . 27 3.2.4 Experimental Setup . 28 3.2.5 Sampling . 29 3.3 Results . 29 3.3.1 RQ1: Failure Detection . 29 3.3.2 RQ2 (a): Functionality . 31 3.3.3 RQ2 (b): Performance . 34 3.3.4 RQ3: Sampling . 36 3.4 Discussion and Lessons Learned . 37 3.4.1 Lessons Learned . 38 3.4.2 Suggestions for Improvement . 39 3.5 Conclusions and Future Work . 40 CHAPTER 4. END USER INTERPRETABILITY FRAMEWORK . 41 4.1 Introduction . 41 4.2 Motivation . 43 iv 4.2.1 State of the Art . 43 4.2.2 Towards Interpretability . 45 4.3 Related Work . 45 4.4 ICO Framework . 46 4.4.1 Distance . 46 4.4.2 Algorithm . 47 4.4.3 ICO Implementation Details . 50 4.5 Study Methodology . 51 4.5.1 Subjects . 52 4.5.2 Configuration Models . 53 4.5.3 Handling of Enumeration Configuration Options . 56 4.5.4 Metrics . 56 4.5.5 Threats to Validity . 59 4.6 Results . 60 4.6.1 RQ1: Can we convey information to assist in choosing of configuration options based on user goals? . 60 4.6.2 RQ2: Is the information ICO provides to a user useful? . 63 4.6.3 RQ3: How does the state the art in prediction compare to ICO? . 66 4.7 Conclusion and Future Work . 70 CHAPTER 5. ORGANIC SOFTWARE PRODUCT LINES . 78 5.1 Introduction . 78 5.2 Motivating Example . 81 5.3 Related Work . 83 5.4 Organic Software Product Lines (OSPLs) . 84 5.4.1 Assets . 85 5.4.2 Domain Engineering . 86 5.4.3 Application Engineering . 86 5.5 Empirical Study . 86 5.5.1 Subject Repository . 87 5.5.2 Study Objects . 87 5.5.3 Methodology . 89 5.6 Threats to Validity . 94 5.6.1 External Validity . 94 5.6.2 Internal Validity . 95 5.6.3 Construct Validity . 95 5.7 Results . 95 5.7.1 RQ1: Does a DNA repository have functions with the characteristics of a software product line? . 95 5.7.2 RQ2: Can we build feature models representing families of products from an existing DNA Repository? . 100 5.7.3 RQ3: What type of end-to-end analysis can we provide to developers of or- ganic programs? . 105 5.7.4 RQ4: How effective is automatically reverse engineering feature models in this domain? . 110 5.8 Implications: The Future of OSPL Engineering . 117 v 5.8.1 Need for Tools Supporting SPL Evolution . 117 5.8.2 Need for SPL Constructs that Support Duplicate Features . 118 5.8.3 Need for More Scalable Reverse Engineering . 119 5.8.4 Incomplete Feature Models . 119 5.8.5 Towards a Domain Specific Language . 120 5.9 Conclusions and Future Work . 120 CHAPTER 6. CONCLUSIONS AND FUTURE WORK . 123 BIBLIOGRAPHY . 125 APPENDIX A. CONFIGURATION MODELS . 137 APPENDIX B. FRAMEWORK OUTPUT . 143 APPENDIX C. SPL CONQUEROR TOOL . 151 vi LIST OF TABLES Page Table 3.1 Case Study Subjects . 22 Table 3.2 BLAST Configuration Model . 25 Table 3.3 MEGAHIT Configuration Model . 25 Table 3.4 FBA Configuration Models . 26 Table 3.5 FBA-MFA - Errors . 30 Table 3.6 BLAST Functional Variance for Use Case 1 . 32 Table 3.7 BLAST Functional Variance for Use Case 2 . 32 Table 3.8 FBA-GUI - Functional Variance . 34 Table 3.9 FBA-MFA Sampling Results . ..