Pattern Mining and Visualization for Molecular Dynamics
Total Page:16
File Type:pdf, Size:1020Kb
PATTERN MINING AND VISUALIZATION FOR MOLECULAR DYNAMICS SIMULATION by Xue Kong A Thesis Submitted to the Faculty of The College of Computer Science and Engineering In Partial Fulfillment of the Requirements for the Degree of Master of Science Florida Atlantic University Boca Raton, Florida August 2014 ACKNOWLEDGEMENTS I would like to thank my advisor, Dr. Xingquan Zhu, for all his help, patience, and encouragement. He gave me this great opportunity to study with him. In order to achieve my goal for my Master’s thesis, he put forth great efforts and enthusiasm to address my questions and concerns. Thank you to my committee members, Dr. Taghi Khoshgoftaar and Dr. Zvi Roth, for all of their guidance, expertise, and support throughout this long journey. Thanks to my work office staff – International Student and Scholars Office – for all of their encouragement. I would like to thank Andy Plotkin, Gabby Helo, and Caitlin (Ella) Connolly for taking their precious time to review my Master’s thesis. Last but not least, I would like to thank my parents, my family, computing science staff, and my friends for all their support and encouragement. iii ABSTRACT Author: Xue Kong Title: Pattern Mining and Visualization for Molecular Dynamics Simulation Institution: Florida Atlantic University Thesis Advisor: Dr. Xingquan Zhu Degree: Master of Science Year: 2014 Molecular dynamics is a computer simulation technique for expressing the ultimate details of individual particle motions and can be used in many fields, such as chemical physics, materials science, and the modeling of biomolecules. In this thesis, we study visualization and pattern mining in molecular dynamics simulation. The molecular data set has a large number of atoms in each frame and range of frames. The features of the data set include atom ID; frame number; position in x, y, and z plane; charge; and mass. The three main challenges of this thesis are to display a larger number of atoms and range of frames, to visualize this large data set in 3-dimension, and to cluster the abnormally shifting atoms that move with the same pace and direction in different frames. Focusing on these three challenges, there are three contributions of this thesis. First, we design an abnormal pattern mining and visualization framework for molecular dynamics simulation. The proposed framework can visualize the clusters of abnormal shifting atom groups in a three-dimensional space, and show their temporal relationships. Second, we iv propose a pattern mining method to detect abnormal atom groups which share similar movement and have large variance compared to the majority atoms. We propose a general molecular dynamics simulation tool, which can visualize a large number of atoms, including their movement and temporal relationships, to help domain experts study molecular dynamics simulation results. The main functions for this visualization and pattern mining tool include atom number, cluster visualization, search across different frames, multiple frame range search, frame range switch, and line demonstration for atom motions in different frames. Therefore, this visualization and pattern mining tool can be used in the field of chemical physics, materials science, and the modeling of biomolecules for the molecular dynamic simulation outcomes. v DEDICATION To my parents, who have always encouraged and supported me. PATTERN MINING AND VISUALIZATION FOR MOLECULAR DYNAMICS SIMULATION LIST OF TABLES ................................................................................................................... x LIST OF FIGURES ................................................................................................................ xi 1. Introduction .......................................................................................................................... 1 1.1 Introduction .................................................................................................................... 1 1.2 Motivations ..................................................................................................................... 6 1.3 The Need for Visualization Tool and Pattern Mining ................................................. 9 1.3.1 The Need for Visualization Tool ............................................................................ 9 1.3.2 The Need for Pattern Mining ................................................................................ 10 1.4 The Challenges of Molecular Dynamics .................................................................... 13 1.4.1 Time and Size ........................................................................................................ 13 1.4.2 Interaction of the Atoms ....................................................................................... 14 1.4.3 Pattern Mining ....................................................................................................... 14 1.5 Contribution .................................................................................................................. 15 1.6 Thesis Organization ..................................................................................................... 15 2. Background and Related Work ......................................................................................... 17 2.1 Introduction .................................................................................................................. 17 2.2 Simple Introduction to General Molecular Dynamics Tools .................................... 17 2.2.1 Visual Molecular Dynamics ................................................................................. 18 2.2.2 USCF Chimera ...................................................................................................... 19 2.2.3 YASARA ............................................................................................................... 19 vii 2.3 Pattern Mining for Molecular Dynamics Simulation................................................. 19 2.3.1 Data Mining ........................................................................................................... 19 2.3.2 Pattern Mining ....................................................................................................... 20 2.3.3 Pattern Mining for Visualization .......................................................................... 21 2.3.4 Pattern Mining for Molecular Dynamics Simulation .......................................... 22 2.4 Abnormal Pattern Discovery from Molecular Dynamics Simulation Data.............. 22 2.5 Summary ....................................................................................................................... 24 3. A Framework for Visualization tool for molecular dynamics ........................................ 25 3.1 Introduction .................................................................................................................. 25 3.1.1 Introduction ........................................................................................................... 25 3.1.2 The Framework of the Proposed Visualization Tool .......................................... 25 3.2 Key Challenges for Our Visualization Tool ............................................................... 27 3.2.1 File Reading Latency ............................................................................................ 27 3.2.2 Frame Rendering Latency..................................................................................... 28 3.2.3 Displaying 3-Dimensional Visualization of Data ............................................... 28 3.3.3 User-Interface ........................................................................................................ 32 3.3.3 Display Window .................................................................................................... 34 3.4 Tool Performance ......................................................................................................... 42 3.5 Summary ....................................................................................................................... 43 4. A Framework for pattern mining tool for molecular dynamics ...................................... 44 4.1 Clustering...................................................................................................................... 44 4.1.1 Clustering ............................................................................................................... 44 4.1.2 Similarity and Normalization ............................................................................... 46 4.2 Methodologies .............................................................................................................. 48 viii 4.3 Hierarchical Clustering ................................................................................................ 48 4.3.1 Agglomerative Algorithm ..................................................................................... 49 4.3.2 User-Interface ........................................................................................................ 50 4.4 Spectral Clustering ....................................................................................................... 57 4.4.1 Spectral Clustering ................................................................................................ 57 4.4.2 User-Interface .......................................................................................................