An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation Advances in Experimental Medicine and Biology

Advances in Experimental Medicine and Biology, Volume 797

Editorial Board:
IRUN R. COHEN, The Weizmann Institute of Science, Rehovot, Israel
ABEL LAJTHA, N.S. Kline Institute for Psychiatric Research, Orangeburg, NY, USA
JOHN D. LAMBRIS, University of Pennsylvania, Philadelphia, PA, USA
RODOLFO PAOLETTI, University of Milan, Milan, Italy

For further volumes: www.springer.com/series/5584

Editors:
Gregory R. Bowman, University of California, Berkeley, CA, USA
Frank Noé, Freie Universität Berlin, Berlin, Germany
Vijay S. Pande, Department of Chemistry, Stanford University, Stanford, CA, USA

ISSN 0065-2598
ISSN 2214-8019 (electronic)
ISBN 978-94-007-7605-0
ISBN 978-94-007-7606-7 (eBook)
DOI 10.1007/978-94-007-7606-7
Springer Dordrecht Heidelberg New York London
Library of Congress Control Number: 2013956358
© Springer Science+Business Media Dordrecht 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Contents

1 Introduction and Overview of This Book
  Gregory R. Bowman, Vijay S. Pande, and Frank Noé
2 An Overview and Practical Guide to Building Markov State Models
  Gregory R. Bowman
3 Markov Model Theory
  Marco Sarich, Jan-Hendrik Prinz, and Christof Schütte
4 Estimation and Validation of Markov Models
  Jan-Hendrik Prinz, John D. Chodera, and Frank Noé
5 Uncertainty Estimation
  Frank Noé and John D. Chodera
6 Analysis of Markov Models
  Frank Noé and Jan-Hendrik Prinz
7 Transition Path Theory
  Eric Vanden-Eijnden
8 Understanding Protein Folding Using Markov State Models
  Vijay S. Pande
9 Understanding Molecular Recognition by Kinetic Network Models Constructed from Molecular Dynamics Simulations
  Xuhui Huang and Gianni De Fabritiis
10 Markov State and Diffusive Stochastic Models in Electron Spin Resonance
  Deniz Sezer and Benoît Roux
11 Software for Building Markov State Models
  Gregory R. Bowman and Frank Noé

Contributors

Gregory R. Bowman: Departments of Molecular & Cell Biology and Chemistry, University of California, Berkeley, CA, USA; University of California, Berkeley, USA
John D. Chodera: Memorial Sloan-Kettering Cancer Center, New York, NY, USA
Gianni De Fabritiis: Computational Biophysics Laboratory (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Barcelona, Spain
Xuhui Huang: Department of Chemistry, Division of Biomedical Engineering, Center of Systems Biology and Human Health, Institute for Advance Study, The Hong Kong University of Science and Technology, Kowloon, Hong Kong
Frank Noé: Freie Universität Berlin, Berlin, Germany
Vijay S. Pande: Department of Chemistry, Stanford University, Stanford, CA, USA; Stanford University, Stanford, CA, USA
Jan-Hendrik Prinz: Freie Universität Berlin, Berlin, Germany
Benoît Roux: Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, USA
Marco Sarich: Freie Universität Berlin, Berlin, Germany
Christof Schütte: Freie Universität Berlin, Berlin, Germany
Deniz Sezer: Faculty of Engineering and Natural Sciences, Sabancı University, Istanbul, Turkey
Eric Vanden-Eijnden: Courant Institute, New York University, New York, NY, USA

Acronyms

ESR: Electron spin resonance
MD: Molecular dynamics (simulation)
MSM: Markov state model
PCCA: Perron cluster cluster analysis
TPT: Transition path theory
TPS: Transition path sampling

Mathematical Symbols

$T(\tau)$: A transition probability matrix (row-stochastic) in $\mathbb{R}^{n \times n}$ describing the probabilities of hopping amongst a discrete set of states. The elements $T_{ij}(\tau)$ give the probability of an $i \to j$ transition during a time interval $\tau$.
$\hat{T}(\tau)$: An estimate of $T(\tau)$ from trajectory data.
$C(\tau)$: A transition count matrix (row-dominant) in $\mathbb{R}^{n \times n}$ describing the number of transitions observed amongst a discrete set of states. The elements $c_{ij}(\tau)$ count the number of $i \to j$ transitions observed, each of which occurred during a time interval $\tau$.
$\tau$: The time resolution (or lag time) of a model.
$n$: The number of discrete states.
$p(t)$: A (column) vector in $\mathbb{R}^n$ where the entry $p_i(t)$ specifies the probability of being in state $i$ at time $t$.
$\pi$: A (column) vector in $\mathbb{R}^n$ where the entry $\pi_i$ specifies the equilibrium probability of being in state $i$.
$\lambda_i$: The $i$th largest eigenvalue of a transition probability matrix $T$. The largest eigenvalue is $\lambda_1$ and eigenvalues are ordered such that $1 = \lambda_1 > \lambda_2 > \lambda_3$.
$\psi_i$: The $i$th right eigenvector of a transition probability matrix $T$ in $\mathbb{R}^n$. The first right eigenvector is $\psi_1$.
$\phi_i$: The $i$th left eigenvector of a transition probability matrix $T$ in $\mathbb{R}^n$.
The first left eigenvector is $\phi_1$.
$\chi_i$: An indicator function for state $i$ that is 1 within state $i$ and 0 elsewhere. It may also refer to the degree of membership in state $i$.
$\theta_i$: An experimental observable characteristic of state $i$.
$q_i$: The committor probability for state $i$. That is, the probability of reaching some predefined set of final states from state $i$ before reaching some predefined set of initial states.
$\Omega$: A continuous state space (including positions and momenta).
$x(t)$: A state in $\Omega$ (including positions and momenta) at time $t$.
$\mu(x)$: The stationary density of $x$.
$p(x, y; \tau)$: The transition probability density to $y \in \Omega$ after time $\tau$ given the system is in $x \in \Omega$.
$\mathcal{T}(\tau)$: A transfer operator that propagates the continuous dynamics for a time $\tau$.
$m$: The number of dominant eigenfunctions/eigenvalues considered.
$S_1, \ldots, S_n$: Discrete sets which partition the state space $\Omega$.
$\mu_i(x)$: The local stationary density restricted to a discrete state $i$.
$\langle f, g \rangle$: The scalar product $\langle f, g \rangle = \int f(x)\, g(x)\, dx$.
$\langle f, g \rangle_\mu$: The weighted scalar product $\langle f, g \rangle_\mu = \int \mu(x)\, f(x)\, g(x)\, dx$.

1 Introduction and Overview of This Book

Gregory R. Bowman, Vijay S. Pande, and Frank Noé

Computer simulations are a powerful way of understanding molecular systems, especially those that are difficult to probe experimentally. However, to fully realize their potential, we need methods that can provide understanding, make a quantitative connection with experiment, and drive efficient simulations.

The main purpose of this book is to introduce Markov state models (MSMs) and demonstrate that they meet all three of these requirements. In short, MSMs are network models that provide a map of the free energy landscape that ultimately determines a molecule’s structure and dynamics. These maps can be used to understand a system, predict experiments, or decide where to run new simulations to refine the map. Protein folding and function will often be used to illustrate the principles in this book, as these problems have largely driven the development of MSMs; however, the methods are equally applicable to other molecular systems and possibly entirely different problems. Whether you are an experimentalist interested in understanding a bit of theory and how it could complement your work, or a theorist seeking to understand the details of these methods, we hope this book will be useful to you.

This introduction provides a brief overview of the background leading to the development of MSMs, what MSMs are, and the contents of this book.

1.1 Background

Molecular systems are exquisitely sensitive to atomistic details: for example, a single point mutation can have dramatic effects on protein folding or function. A complete understanding would therefore require atomically detailed models that capture both the thermodynamics and kinetics of the system of interest. There are many powerful experimental methods for probing the structure and dynamics of molecular systems but, currently, none can provide a complete understanding of a system.

Structural biologists have developed a range of methods for building atomically detailed models of proteins and other molecules; however, we
