Accelerating Chemical Simulation Through Model Modification
Masaryk University
Faculty of Informatics

Accelerating Chemical Simulation through Model Modification

Doctoral Thesis

Jana Pazúriková

Brno, Fall 2017

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Jana Pazúriková

Advisor: Luděk Matyska

Acknowledgement

The years of my doctoral studies have been filled with life lessons and people who helped me to get through them. Let’s tackle those lessons first.

Working on such a huge project for years, mostly alone, as I have, can often be (and sure was) overwhelming. I have fought self-doubt and imposter syndrome almost every week, and I have not won. However, if you open up and share, you find out that everyone has their doubts. And suddenly, you are not alone.

I have invested enormous time and energy in the research of one of the problems, and the results did not correspond to the effort. I ended up with more problems to solve than I had at the beginning and little to show for years of focus. Science can be like that sometimes: even the most elegant solutions might not work despite reasonable expectations. Leaving behind so much of my work, so much of my time, has been the most challenging and at the same time the most freeing thing I have done. And I learned not to linger on time already spent, just as I try not to linger on material things.

And then a side project turned out nicely and offered an unexpected opportunity leading to an elegant method. And I learned you should always keep a door open to another direction.

After that I came across a problem that seemed too easy at first glance, not scientific enough.
And I learned that if you dig deeper and do things properly, you can come up with a solution that surprises you with its simplicity. Just because it’s science, it does not have to be complicated.

During this journey, several people supported me and helped me to get up and move forward again and again. Professor Matyska has often calmed my worries and offered new insight. Scientists I have co-operated with, Aleš Křenek, Radka Svobodová and Vojtěch Spiwok, have always been supportive of me as a scientist-in-training and provided me with valuable feedback. My colleagues in the Laboratory of Advanced Network Technologies, Víťa, Pavel, Milan, Fila and Honza, have shared their experience and life lessons during our conversations in the kitchen.

I would not be where I am now without my family. Their love and care have shaped me as a person. And finally, Tomáš has stood right beside me the whole time as an infinite source of hugs that soothe my soul and lift my spirit.

This work was supported by the Czech Science Foundation (15-17269S) and LM2015047 Czech National Infrastructure for Biological Data. Computational resources were provided by CESNET LM2015042 and the CERIT Scientific Cloud LM2015085, provided under the programme “Projects of Large Research, Development, and Innovations Infrastructures”.

Abstract

Computer modeling and simulations have supported research in almost all scientific areas. Especially in the life sciences, they offer new insights for thinking and new possibilities for experimenting. One of the most powerful computational methods for simulating chemical processes is molecular dynamics. In this type of simulation, particles move due to their potential and kinetic energy. Several models, differing in scale and level of approximation, describe those particles and their potential energy, ranging from quantum to coarse-grained models.
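As a minimal illustration of such a simulation loop, the sketch below integrates one particle in a toy one-dimensional harmonic potential using the velocity Verlet scheme. The integrator, the potential, and the names `harmonic_force` and `velocity_verlet` are assumptions chosen for illustration, not the implementation used in this thesis; real force fields and simulation engines are far more elaborate. What the sketch shows is the shape of the computation: one force evaluation, the expensive part in practice, per iteration of a time-stepping loop.

```python
def harmonic_force(x, k=1.0):
    # Toy potential U(x) = k * x**2 / 2, so F(x) = -dU/dx = -k * x.
    # In real molecular dynamics this call is the costly force-field evaluation.
    return -k * x


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Advance position x and velocity v by n_steps steps of size dt."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update (kick)
        x = x + dt * v_half                # full-step position update (drift)
        f = force(x)                       # exactly one force evaluation per step
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
    return x, v


# Usage: roughly one period (t = 628 * 0.01 = 6.28) of a unit-mass,
# unit-stiffness oscillator started at rest at x = 1.
x, v = velocity_verlet(1.0, 0.0, mass=1.0, dt=0.01, n_steps=628)
energy = 0.5 * v**2 + 0.5 * x**2   # kinetic + potential energy, initially 0.5
print(f"x = {x:.5f}, v = {v:.5f}, energy = {energy:.6f}")
```

Each step needs the force from the previous step's position, so the loop is inherently sequential in time; this is the dependency that parallel-in-time schemes try to relax.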
Two main issues hamper the wider application of molecular dynamics simulations: the cost and the accuracy of models. High computational demands stem from the need for long simulation timescales, dictated either by the physical reality of our world or by the requirements placed on scientific methods. Except for quantum models of particles and interactions, all models approximate reality rather coarsely. Complex nonlinear behavior is replaced by a scalar constant or an empirical function, all to simulate faster and therefore reach longer timescales and larger systems. Computer science can contribute to solving these issues, as their subproblems often relate to high performance computing, numerical methods, optimization problems, and others.

In this dissertation, we propose a methodology to deal with such demanding computations. Many of them share a common concept: the evaluation of a computationally expensive function in a loop. The usual approaches aim to accelerate the function, but they are beginning to hit their limits. We, on the contrary, turn our attention to the loop. First, we build a model of the problem that emphasizes the iterative character of the computation. Then we modify the execution of the loop in various ways that lead to the acceleration of the whole computation. We apply this methodology to three demanding computational problems from chemical simulations: two regarding long timescales and one regarding accuracy. We change the scheme of the loop to add another level of parallelization, we reduce the number of the function’s evaluations by reusing calculations from previous iterations, or we omit iterations that do not contribute to the result.

In the first problem, we applied a parallel-in-time computation scheme to increase the scalability of molecular dynamics simulation. Even though we successfully performed the first such simulation of a biomolecular system, many issues need to be addressed before routine use.
We offer a rigorous analysis of these issues, supported by experiments.

In the second problem, we approximated the calculation of the mean square distance between many molecular structures, a demanding part of metadynamics. Metadynamics biases the molecular dynamics simulation to accelerate the occurrence of rare events. Our method significantly reduces the overhead of metadynamics without a loss of accuracy.

In the third problem, we simplified the algorithm for the parametrization of atomic charges, which is essentially an optimization problem. Through systematic analysis, we found that state-of-the-art approaches are needlessly complicated. We developed a method that matches or surpasses their accuracy and runs faster.

The achieved results demonstrate that the proposed methodology, focusing on the iterative evaluation of a bottleneck function, can be successfully applied to difficult computational problems. The results also show that even mature computational problems like molecular dynamics simulations can benefit from a systematic application of state-of-the-art computer science.

Keywords

modelling, simulation, computational chemistry, approximation, optimization, acceleration

Contents

Acronyms
Glossary
1 Introduction
1.1 Simulating Chemical Processes
1.1.1 Accuracy of Models
1.1.2 Computational Demands
1.2 Motivation and Current Limits
1.3 Acceleration and Its Difficulties
1.4 Our Approach
1.4.1 Structure of the Thesis
2 Parallel-in-Time Molecular Dynamics
2.1 Problem
2.1.1 Molecular Dynamics
2.1.2 Parallel-in-Time Computation
2.1.3 Motivation
2.1.4 Problem Description and Solution Proposal
2.2 Model of Computation
2.3 Modified Model of Computation
2.4 Analysis of Problematic Aspects
2.4.1 Prototype and Setup of Experiments
2.4.2 Differences between Gravitational and Electrostatic N-body Problem
2.4.3 Application of Molecular Dynamics to Parallel-in-Time Integration Scheme
2.4.4 Overview of Problematic Aspects
2.4.5 Theoretical Speedup
2.5 Conclusion
2.5.1 Future Work
3 Approximation of Mean Square Distance Computations in Metadynamics
3.1 Problem
3.1.1 Metadynamics
3.1.2 Motivation
3.1.3 Problem Description and Solution Proposal
3.2 Model of Computation
3.3 Approximated Model of Computation
3.4 Accuracy and Speedup
3.4.1 Number of MSD Computations
3.4.2 Implementation
3.4.3 Datasets and Computational Details
3.4.4 Theoretical Speedup
3.4.5 Practical Speedup
3.4.6 Accuracy Evaluation
3.5 Conclusion
3.5.1 Future Work
4 Simplified Optimization Method in Atomic Charges Parametrization
4.1 Problem
4.1.1 Atomic Charges
4.1.2 Electronegativity Equalization Method and its Parametrization
4.1.3 Motivation
4.1.4 Problem Description and Solution Proposal
4.2 Model of Computation
4.3 Analysis of the Optimization Space and the Fitness Landscape
4.3.1 Optimization Space
4.3.2 Fitness Landscape
4.4 Simplified Model of Computation
4.5 Accuracy and Speedup
4.5.1 Implementation
4.5.2 Evaluation Methods
4.5.3 Accuracy Evaluation
4.5.4 Theoretical Speedup
4.5.5 Practical Speedup and Scalability
4.6 Conclusion
4.6.1 Future Work
5 Conclusions
Bibliography
A Author’s Publications

List of Tables

2.1 List of datasets.
3.1 Percentage of computation time spent in metadynamics (MTD), CV evaluation (CV), distance computation (MSD), and rotation matrix computation (DSYEVR).
3.2 Values of the current-close structure distance threshold and their influence on K, the average number of steps between reassignments of the close structure.
3.3 Speedup of the whole simulation for various threshold values and thus various maximal speedups of MSD computations.
3.4 Comparison of original and close structure trajectories.