Interacting Domain-Specific Languages with Biological
Total Page:16
File Type:pdf, Size:1020Kb
INTERACTING DOMAIN-SPECIFIC LANGUAGES WITH BIOLOGICAL PROBLEM SOLVING ENVIRONMENTS A Dissertation Submitted to the Graduate School of the University of Notre Dame in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy by Trevor M. Cickovski, M.S. Jes´usA. Izaguirre, Director Graduate Program in Computer Science and Engineering Notre Dame, Indiana April 2008 INTERACTING DOMAIN-SPECIFIC LANGUAGES WITH BIOLOGICAL PROBLEM SOLVING ENVIRONMENTS Abstract by Trevor M. Cickovski Iteratively developing a biological model and verifying results with lab obser- vations has become standard practice in computational biology. This process is currently facilitated by biological Problem Solving Environments (PSEs), multi- tiered and modular software frameworks which traditionally consist of two layers: a computational layer written in a high level language using design patterns, and a user interface layer which hides its details. Although PSEs have proven effective, they still enforce some communication overhead between biologists refining their models through repeated comparison with experimental observations in vitro or in vivo, and programmers actually implementing model extensions and modifications within the computational layer. I illustrate the use of biological Domain-Specific Languages (DSLs) as a middle- level PSE tier to ameliorate this problem by providing experimentalists with the ability to iteratively test and develop their models using a higher degree of expres- sive power compared to a graphical interface, while saving the requirement of general purpose programming knowledge. I develop two radically different biological DSLs: XML-based BioLogo will model biological morphogenesis using a cell-centered stochastic cellular automaton and translate into C++ modules for an object-oriented Trevor M. Cickovski PSE CompuCell3D, and MDLab will provide a set of high-level Python libraries for running molecular dynamics simulations, using wrapped functionality from the C++ PSE ProtoMol. I describe each language in detail, including its its roles within the larger PSE and its expressibility in terms of representable phenomena, and a discussion of observations from users of the languages. Moreover I will use these studies to draw general conclusions about biological DSL development, includ- ing dependencies upon the goals of the corresponding PSE, strategies, and tradeoffs. This thesis is dedicated to my parents and Lilly for all their love and support. However, I also would like to make the following acknowledgments. Special thanks to my advisor Dr. Jes´usIzaguirre for his outstanding mentorship, and to my committee, Dr. Greg Madey, Dr. Pat Flynn and Dr. Mark Alber. I also am thankful for the collaborations with the Bioinformatics group at Indiana University and the LCLS at Notre Dame, and particularly Dr. Chris Sweet for providing me with top-notch education in both Math 999 and Jokes 101. Someday I'll be able to pass the second. Finally, thanks to Scott, Margaret, Jen, and Nice Save! for the friendships and unforgettable times for which I will always be grateful. ii CONTENTS FIGURES . viii TABLES . xii ACKNOWLEDGMENTS . xiii CHAPTER 1: INTRODUCTION . 1 1.1 Contributions of This Thesis . 10 CHAPTER 2: BioLogo , A DOMAIN-SPECIFIC LANGUAGE FOR MOR- PHOGENESIS . 11 2.1 Introduction . 11 2.2 Mathematical Morphogenesis Models . 11 2.2.1 Turing Reaction-Diffusion PDEs . 12 2.2.2 Cellular Potts Model . 13 2.2.3 Cell Type Automata . 16 2.3 CompuCell3D Design and Interfacing of BioLogo . 17 2.4 Syntax and Examples . 25 2.4.1 CPM Energy Terms . 25 2.4.2 Cell Type Automata . 29 2.4.3 Reaction-Diffusion Partial Differential Equation Solvers . 31 2.4.4 Avian Limb Bud Growth . 35 2.4.5 In Vitro HUVEC Vasculogenesis . 43 2.5 BioLogo Tools Design . 48 2.5.1 Compiler Front End . 48 2.5.2 Compiler Back End (Plugin Generator) . 55 2.6 Observations . 62 2.6.1 Contributions . 62 2.6.2 Design Perspectives . 64 2.6.3 User Perspectives . 68 iii CHAPTER 3: MDLAB, A DOMAIN-SPECIFIC LANGUAGE FOR MOLEC- ULAR DYNAMICS . 76 3.1 Introduction . 76 3.2 Molecular Dynamics Mathematical Models . 77 3.3 ProtoMol Design and Interfacing of MDLab . 80 3.4 Syntax and Examples . 85 3.4.1 Simulation Protocols in MDL . 87 3.4.2 Defining New Propagation Schemes . 101 3.4.3 Defining New Forces . 120 3.4.4 Interfacing to Other Tools . 123 3.5 MDL Tool Design: Factories . 127 3.5.1 Propagator Factory . 127 3.5.2 Force Factory . 137 3.6 Observations . 141 3.6.1 Contributions . 141 3.6.2 Design Perspectives . 142 3.6.3 User Perspectives . 147 CHAPTER 4: CONCLUSIONS . 153 APPENDIX A: BIOLOGICAL TERMS: MORPHOGENESIS . 163 APPENDIX B: BACKUS-NAUR FORM OF BioLogo . 167 APPENDIX C: PYTHON SIMULATION IN CompuCell3D . 171 APPENDIX D: CompuCell3D CONFIGURATION FILES OF BioLogo VERIFICATION SIMULATIONS . 173 D.1 Avian Limb Bud Growth . 174 D.2 HUVEC Vasculogenesis (Unscripted) . 175 D.3 HUVEC Vasculogenesis (Scripted) . 176 APPENDIX E: GENERATED C++/PYTHON FOR VARIOUS EXTENSIONS177 E.1 Avian Limb: Cell Type Automaton . 178 E.2 Avian Limb: Haptotaxis . 183 E.3 HUVEC Vasculogenesis: C++ PDE Solver . 188 E.4 HUVEC Vasculogenesis: Python PDE Solver . 191 APPENDIX F: INTERMEDIATE CODE FOR VERIFICATION SIMULA- TIONS . 195 iv APPENDIX G: OTHER MATHEMATICAL MODELS . 197 G.1 Basic Cell Sorting Type Automaton . 197 G.2 Schnakenberg PDEs . 200 APPENDIX H: BIOLOGICAL TERMS: MOLECULAR DYNAMICS . 203 APPENDIX I: MDL API DESCRIPTION . 205 I.1 Package src . 206 I.1.1 Modules . 206 I.2 Package src.factories . 207 I.2.1 Modules . 207 I.3 Module src.factories.ForceFactory . 208 I.3.1 Class ForceFactory . 208 I.4 Module src.factories.PropagatorFactory . 215 I.4.1 Functions . 215 I.4.2 Variables . 215 I.4.3 Class PropagatorFactory . 215 I.5 Package src.forces . 220 I.5.1 Modules . 220 I.6 Module src.forces.HDForce . 221 I.6.1 Functions . 221 I.6.2 Class HDForce . 221 I.7 Package src.modifiers . 222 I.7.1 Modules . 222 I.8 Module src.modifiers.averagePositions . 223 I.8.1 Functions . 223 I.8.2 Class ReducedHessAngleList . 223 I.9 Module src.modifiers.friction . 225 I.9.1 Functions . 225 I.10 Module src.modifiers.mollify . 226 I.10.1 Functions . 226 I.11 Package src.propagators . 227 I.11.1 Modules . 227 I.12 Package src.propagators.methods . 228 I.12.1 Modules . 228 I.13 Module src.propagators.methods.bbk . 229 I.13.1 Functions . 229 I.13.2 Variables . 229 I.14 Module src.propagators.methods.impulse . 230 I.14.1 Functions . 230 I.14.2 Variables . 230 I.15 Module src.propagators.methods.leapfrog . 231 I.15.1 Functions . 231 I.15.2 Variables . 231 I.16 Module src.propagators.methods.takahashi . 232 I.16.1 Functions . 232 v I.16.2 Variables . 232 I.17 Module src.propagators.methods.velocityscale . 233 I.17.1 Functions . 233 I.17.2 Variables . 233 I.18 Package src.propagators.objects . 234 I.18.1 Modules . 234 I.19 Module src.propagators.objects.HMCMDL . 235 I.19.1 Variables . 235 I.19.2 Class HMCMDL . 235 I.20 Module src.propagators.objects.MTS . 237 I.20.1 Class MTS . 237 I.20.2 Class MOLLY . 237 I.21 Module src.propagators.objects.NosePoincGL . 238 I.21.1 Variables . ..