A Review of Geometric, Topological and Graph Theory Apparatuses for the Modeling and Analysis of Biomolecular Data
Total Page:16
File Type:pdf, Size:1020Kb
A review of geometric, topological and graph theory apparatuses for the modeling and analysis of biomolecular data Kelin Xia1 ∗and Guo-Wei Wei2;3 y 1Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371 2Department of Mathematics Michigan State University, MI 48824, USA 3Department of Biochemistry & Molecular Biology Michigan State University, MI 48824, USA January 31, 2017 Abstract Geometric, topological and graph theory modeling and analysis of biomolecules are of essential im- portance in the conceptualization of molecular structure, function, dynamics, and transport. On the one hand, geometric modeling provides molecular surface and structural representation, and offers the basis for molecular visualization, which is crucial for the understanding of molecular structure and interactions. On the other hand, it bridges the gap between molecular structural data and theoretical/mathematical mod- els. Topological analysis and modeling give rise to atomic critic points and connectivity, and shed light on the intrinsic topological invariants such as independent components (atoms), rings (pockets) and cavities. Graph theory analyzes biomolecular interactions and reveals biomolecular structure-function relationship. In this paper, we review certain geometric, topological and graph theory apparatuses for biomolecular data modeling and analysis. These apparatuses are categorized into discrete and continuous ones. For discrete approaches, graph theory, Gaussian network model, anisotropic network model, normal mode analysis, quasi-harmonic analysis, flexibility and rigidity index, molecular nonlinear dynamics, spectral graph the- ory, and persistent homology are discussed. For continuous mathematical tools, we present discrete to continuum mapping, high dimensional persistent homology, biomolecular geometric modeling, differential geometry theory of surfaces, curvature evaluation, variational derivation of minimal molecular surfaces, atoms in molecule theory and quantum chemical topology. Four new approaches, including analytical minimal molecular surface, Hessian matrix eigenvalue map, curvature map and virtual particle model, are introduced for the first time to bridge the gaps in biomolecular modeling and analysis. Emphasis is given to the connections of existing biophysical models/methods to mathematical theories, such as graph theory, Morse theory, Poicar´e-Hopf theorem, differential geometry, differential topology, algebraic topology and geometric topology. Potential new directions and standing open problems are briefly discussed. Key words: Molecular bioscience, Molecular biophysics, Graph theory, Graph spectral theory, Graph Lapla- cian, Differential geometry, Differential topology Persistent homology, Morse theory, Conley index, Poincar´e- Hopf index, Laplace-Beltrami operator, Dynamical system. ∗E-mail: [email protected] yE-mail: [email protected] 1 Contents 1 Introduction 4 2 Discrete apparatuses for biomolecules7 2.1 Graph theory related methodologies.................................8 2.1.1 Elementary graph theory...................................9 2.1.2 Gaussian network model (GNM)............................... 10 2.1.3 Anisotropic network model (ANM).............................. 11 Generalized GNM and generalized ANM........................... 12 2.1.4 Normal mode analysis (NMA) and quasi-harmonic analysis................ 13 Standard NMA......................................... 14 Essential dynamics and quasi-harmonic analysis....................... 14 2.1.5 Flexibility rigidity index (FRI)................................ 15 Multiscale FRI......................................... 16 Consistency between GNM and FRI............................. 16 Anisotropic FRI........................................ 17 2.1.6 Spectral graph theory..................................... 18 Graph decomposition and graph cut............................. 19 Ratio cut and Laplaician matrix............................... 20 Normalized cut and normalized Laplacian matrix...................... 21 Graph Laplacian and continuous Laplace operator..................... 22 Modularity........................................... 22 2.1.7 Molecular nonlinear dynamics................................. 24 Stability analysis........................................ 25 2.2 Persistent homology.......................................... 26 2.2.1 Simplicial homology and persistent homology........................ 26 Simplicial complex....................................... 26 Homology............................................ 27 Cechˇ complex, Rips complex and alpha complex...................... 28 General filtration processes.................................. 28 Persistent homology...................................... 29 2.2.2 Multiscale persistent homology................................ 30 2.2.3 Topology based quantitative modeling............................ 30 3 Continuous apparatuses for biomolecules 31 3.1 Geometric representation....................................... 31 Non-smooth biomolecular surface representations...................... 31 Smooth biomolecular surface representations........................ 31 Discrete to continuum mapping................................ 32 3.2 Multiresolution and multidimensional persistent homology..................... 32 Multiresolution persistent homology............................. 32 Multidimensional persistent homology............................ 33 3.3 Differential geometry theory of surfaces............................... 34 Surface elements and immersion............................... 34 First fundamental form.................................... 34 Gauss map........................................... 34 Weingarten map........................................ 34 Second and third fundamental form............................. 35 Principal curvature....................................... 35 Gaussian and mean curvature................................. 35 3.4 Differential geometry modeling and computation.......................... 36 3.4.1 Minimal molecular surface.................................. 36 2 3.4.2 Scalar field curvature evaluation............................... 37 Algorithm I........................................... 37 Algorithm II.......................................... 38 3.4.3 Analytical minimal molecular surface............................. 39 3.5 Scalar and vector field topology.................................... 40 3.5.1 Critical points and their classification............................ 41 3.5.2 Vector field topology..................................... 42 Morse theory.......................................... 43 3.5.3 Topological characterization of chemical bonds....................... 43 The Laplacian of electron density............................... 43 Identifying noncovalent interactions............................. 44 3.6 Geometric-topological (Geo-Topo) fingerprints of scalar fields................... 44 3.6.1 Geo-Topo fingerprints of Hessian matrix eigenvalue maps................. 44 3.6.2 Geo-Topo fingerprints of curvature maps.......................... 46 Gaussian and mean curvature maps............................. 48 Maximal and minimal curvature maps............................ 48 3.7 Eigenvector field analysis....................................... 48 3.7.1 Virtual particle model..................................... 50 3.7.2 Eigenvector analysis...................................... 53 3.8 Demonstrations............................................. 53 3.8.1 Case studies........................................... 53 Fullerene C20 .......................................... 53 An α-helix structure...................................... 54 A β-sheet structure....................................... 56 3.8.2 Persistent homology for scalar field analysis......................... 56 4 Concluding remarks 58 3 1 Introduction Life science is regarded the last forefront in natural science and the 21st century will be the century of biological sciences. Molecular biology is the foundation of biologic sciences and molecular mechanism, which is governed by all the valid mechanics, including quantum mechanics when it is relevant, and is the ultimate truth of life science. One trend of biological sciences in the 21st century is that many traditional disciplines, such as epidemiology, neuroscience, zoology, physiology and population biology, are transforming from macroscopic and phenomenological to molecular-based sciences. Another trend is that biological sciences in the 21st century are transforming from qualitative and descriptive to quantitative and predictive, as many other disciplines in natural science have done in the past. Such a transformation creates unprecedented opportunities for mathematically driven advances in life science.287 Biomolecules, such as proteins and the nucleic acids, including DNA and RNA, are essential for all known forms of life, such as animals, fungi, protists, archaea, bacteria and plants. Indeed, proteins perform a vast variety of biological functions, including membrane channel transport, signal transduction, organism structure supporting, enzymatic catalysis for transcription and the cell cycle, and immune agents. In contrast, nucleic acids function in association with proteins and are essential players in encoding, transmitting and expressing genetic information, which is stored through nucleic acid sequences, i.e., DNA or RNA molecules and trans- mitted via transcription and translation processes.