Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1871

Effect of Macromolecular Crowding on Diffusive Processes

ZAHEDEH BASHARDANESH

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6214 ISBN 978-91-513-0785-5 UPPSALA urn:nbn:se:uu:diva-395119 2019 Dissertation presented at Uppsala University to be publicly examined in BMC: A1:111a, Husargatan 3, Uppsala, Monday, 9 December 2019 at 09:15 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Prof. Dr. Lars Schäfer (Ruhr-Universität Bochum ).

Abstract Bashardanesh, Z. 2019. Effect of Macromolecular Crowding on Diffusive Processes. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1871. 50 pp. Uppsala: Uppsala University. ISBN 978-91-513-0785-5.

Macromolecular crowding are innate to cellular environment. Understanding their effect on cellular components and processes is essential. This is often neglected in dilute experimental setup both and in silico. In this thesis I have dealt with challenges in biomolecular simulations at two levels of modeling, Brownian Dynamics (BD) and Molecular Dynamics (MD). Conventional BD simulations become inefficient since most of the computational time is spent propagating the particles towards each other before any reaction takes place. Event- driven algorithms have proven to be several orders of magnitude faster than conventional BD algorithms. However, the presence of diffusion-limited reactions in biochemical networks lead to multiple rebindings in case of a reversible reaction which deteriorates the efficiency of these types of algorithms. In this thesis, I modeled a reversible reaction coupled with diffusion in order to incorporate multiple rebindings. I implemented a Green's Function Reaction Dynamics (GFRD) algorithm by using the analytical of the reversible reaction diffusion equation. I show that the algorithm performance is independent of the number of rebindings. Nevertheless, the gain in computational power still deteriorates when it comes to the simulation of crowded systems. However, given the effects of macromolecular crowding on diffusion coefficient and kinetic parameters are known, one can implicitly incorporate the effect of crowding into coarse-grain algorithms by choosing right parameters. Therefore, understanding the effect of crowding at atomistic resolution would be beneficial. I studied the effect of high concentration of on diffusive properties at atomistic level with MD simulations. The findings emphasize the effect of chemical interactions at atomistic level on mobility of macromolecules. Simulating macromolecules in high concentration raised challenges for atomistic physical models. Current force fields lead to aggregation of at high concentration. I probed scenarios based on weakening and strengthening -protein and protein-water interactions, respectively. Furthermore, I built a cytoplasmic model at atomistic level based on the data available on . This model was simulated in time and space by MD simulation package, GROMACS. Through this model, it is possible to study structural and dynamical properties under cellular like environment at physiological concentration.

Zahedeh Bashardanesh, Department of and Molecular Biology, Computational Biology and Bioinformatics, Box 596, Uppsala University, SE-751 24 Uppsala, Sweden.

© Zahedeh Bashardanesh 2019

ISSN 1651-6214 ISBN 978-91-513-0785-5 urn:nbn:se:uu:diva-395119 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-395119) To Alex and Sophie

List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Bashardanesh, Z. and Lötstedt, P. Efficient Green’s Function Reaction Dynamics (GFRD) simulations for diffusion-limited, reversible reactions. J. Comp. Phys. 2018, 357, pp78-99

II Bashardanesh, Z. and van der Spoel, D. Impact of Dispersion Coefficient on Simulations of Proteins and Organic Liquids. J. Phys. Chem. 2018, 122, pp8018-8027

III Bashardanesh, Z., Haiyang Z., Elf, J. and van der Spoel, D. Rotational and translational diffusion of proteins as a function of concentration Submitted

IV Bortot, L. O., Bashardanesh, Z. and van der Spoel, D. Making Soup: Preparing and Validating Molecular Simulations of the Bacterial Cytoplasm. Submitted

Reprints were made with permission from the publishers.

Contents

1 Introduction ...... 9 1.1 Macromolecular Crowding ...... 9 1.2 Quantitative/Computational Biology ...... 11 1.3 Thesis outline ...... 12

2 Molecular Dynamics ...... 13 2.1 Equation of Motion ...... 14 2.2 Potential Function ...... 15 2.2.1 Electrostatic Interactions ...... 15 2.2.2 Harmonic Potentials ...... 15 2.2.3 Van der Waals Interactions ...... 16 2.2.4 Alternatives and Extensions ...... 17 2.3 Parameter Optimization ...... 19 2.4 Force Calculation ...... 20

3 Brownian Dynamics - Smoluchowski Approach ...... 21 3.1 Deterministic Modelling ...... 21 3.2 Stochastic Modelling ...... 22 3.3 Green’s Function Reaction Dynamics ...... 24

4 Summary of papers ...... 27 4.1 Paper I: Efficient Green’s Function Reaction Dynamics (GFRD) simulations for diffusion-limited, reversible reactions . 27 4.2 Paper II: Impact of Dispersion Coefficient on Simulations of Proteins and Organic Liquids ...... 28 4.3 Paper III: Rotational and translational diffusion as a function of concentration ...... 31 4.4 Paper IV: Making Soup: Preparing and Validating Molecular Simulations of the Bacterial Cytoplasm ...... 31

5 Concluding Remarks ...... 33

6 Svensk Sammanfattning ...... 37

7 Acknowledgments ...... 39

References ...... 41

1. Introduction

1.1 Macromolecular Crowding Biology is the scientific field that studies living organisms. The most basic unit at which life happens is a cell. Cells are capable of performing complex functions to keep them alive. These cellular functions, such as protein synthe- sis, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) translation, are outcomes of multiple physio-chemical processes at various temporal- and spatial scales. Studying cellular functions requires experimental setups with appropriate tools for the temporal- and spatial scale of the ques- tion. Generally speaking, biological experiments are divided in two categories. In vivo refers to the experiments that are performed in living cells, and in vitro to those in which the components of living cells are isolated from their original environment. The results from in vitro experiments do not fully predict the outcomes of in vivo experiments. One reason is the different composition of the intracellular environment and the test tubes. The intracellular environment is composed of wide range of biomolecular assemblies, from macromolecules such as nucleic acids, proteins and carbo- hydrates, to much smaller molecules such as metabolites, ions and water. The cell interior is estimated to be up to 40% macromolecules [153, 89, 33]. Water and small molecules are the remaining constituents. For an illustration of in- terior of cell see Fig. 1.1. Numerous attempts have been made to make the in vitro milieu resemble the cellular environment by addition of high concentra- tions of synthetic polymers or cell lysate [27, 95, 64]; however, to what extent in vivo measurements are reproducible with artificial supplements is still am- biguous [145, 96]. The effects of the intracellular environment on cellular processes is referred to as (macromolecular) crowding. Some of the problems addressed under crowding conditions are [133], protein-DNA binding [47] and dynamical properties such as diffusion [82, 144, 145].

Macromolecular Crowding and Structure Much of the knowledge about biomolecular structures at atomistic resolution are derived from two experimental techniques, X-ray crystallography and nu- clear magnetic resonance (NMR) spectroscopy [23, 17, 121]. It is not clear how the structure and stability of biomolecules would be altered under non- physiological conditions, but for advances in in vivo NMR see [120].

9 (a)(b)

Figure 1.1. a) Graphic illustration of Escherichia coli by David S. Goodsell. Free im- age downloaded from the website http://pdb101.rcsb.org/sci-art/goodsell-gallery. b) All-atom representation of E. coli cytoplasm pertaining to about 1/5000 of its volume.

Macromolecular Crowding and Dynamics Molecular transportation throughout the cell takes place mainly by two means; active transport or through random movements. An active transport is a pro- cess in which energy is expended to move substances from lower concen- tration to higher concentration. On the other hand, random movements of molecules are due to their thermal energy and collisions with neighboring molecules such as water molecules. The substance is transported without ex- pense of energy, down the concentration gradient. This passive transport is often referred to as Brownian motion [14] an it plays an important role in the cellular processes taking place globally throughout the cell. The presence of molecules in a wide range of sizes, shapes and electro- static properties creates a heterogeneous environment that affects the diffusive processes [34, 75]. Diffusive properties of macromolecules are crucial in determining the ki- netic properties of cellular processes, because an alteration in rates of diffu- sive processes could change the response of the biochemical network [127]. Knowledge about the alteration in the diffusional properties of biomolecules in living cells or cell-like environments is important in order to understand cellular biology at the molecular level.

Excluded Volume and Non-Specific Interactions This crowding leads to a diminished volume available to biomolecules. Its effect has been studied for a long time using a range of experiments, theory, and simulations. For instance, macromolecular crowding has been imitated by artificial crowding agents in vitro. In these studies [151], the association rates

10 between biomolecules accelerate as the result of high concentration of arti- ficial crowders and in turn, the chemical reaction equilibrium shifts towards the associated state. Proteins fold more compactly and the free energy of pro- tein unfolding increases under these conditions as the excluded volume leaves less conformational space for unfolded proteins. For a review on these stud- ies see [151]. These studies, to some extent, are consistent with predictions from simple theoretical models known as excluded-volume [152, 112], which takes the steric effects into account. However, it is not obvious how well non- reactive polymers can replace biomolecular entities that contain surfaces with positive, negative and neutral charges. Nevertheless, they provide a suitable model for steric repulsion. More recently, reactive agents, such as cell lysates or protein crowders are used in macromolecular crowding studies, but the results are at times contrary to those studies in which just the inert crowders are used [146]. Both steric re- pulsion and chemical interactions must be considered to understand the effects of macromolecular crowding [118]. Nonspecific interactions between proteins and crowders are also investi- gated by simulations [94, 42, 41]. For a review about the effects of macro- molecular crowding on biomolecular properties in silico see [44].

The biggest bottleneck in understanding cellular processes under crowded condition is the limitation of the experimental techniques. There are rarely any experimental techniques that can give atomistic resolution of the interior of the cellular environment. Therefore, theoretical and computational tools are valuable resources to give insights and provide new hypotheses.

1.2 Quantitative/Computational Biology Quantitative biology spans a wide range of problems, from evolution and ecol- ogy, to cellular biology and biochemical reactions. From a methodological point of view, this field also spans a wide range of mathematical tools [106], from mean-fields [99] to quantum mechanical descriptions of biochemical re- actions [30]. Biological systems are intrinsically complex. Their overall behavior is the result of interplays of many molecular components in a complex network. Even though the completed human project [139, 29] has yielded the blueprints of all human genes, this is not sufficient to predict the complex inter- action of various networks. Computational approaches are used to capture the features of these complex biological networks. As an example, see the review on mathematical models used in constructing gene regulatory networks [26]. Certainly, an analytical solution to mathematical models is preferable to nu- merical because obtaining statistical properties then becomes easier. Unfortunately, reaching an analytical solution for complex systems such as

11 those in biology is not doable. For these complex problems only numerical approaches can provide solutions. There are efficient algorithms developed in a variety of packages [69, 70, 68] for nonlinear but simple systems. However, new algorithms and models are needed when the degrees of freedom in the problem grow or when stochasticity plays an important role, as happens in biological systems [109]. The generated output from quantitative models is often compared with the experimental output. One advantage of computational biology over experi- mental practice is that it is often cheaper and faster. The possibility of probing scenarios that are not feasible in an experimental setup is another advantage of computational studies. The comparison of generated data from simulations and experiments is a necessary step to either get insight into the underlying mechanism, or to test a new hypothesis by experiment, or to change/improve the underlying quantitative models. An important step in setting up computational experiments is choosing the level of details of the quantitative model. As a rule of thumb, more detailed models have higher accuracy, at the expense of costly (sometimes close to impossible) calculations. However, the nature of the question is the determi- nant of the required level of details. For example, understanding the enthalpic contribution to the observed crowding effect cannot be resolved using coarse- grained models, such as simple Brownian Dynamics. On the other hand, un- derstanding the effects of crowding on macroscopic behavior requires models that can resolve the system for a couple of hours of their life time which is currently impossible with atomistic level models, e.g., Molecular Dynamics. This is the classic trade-off between accuracy and efficiency. This trade-off is shown in this thesis between two models, Molecular Dynamics and Brownian Dynamics, that are often used in biomolecular simulations.

1.3 Thesis outline The introductory chapter is meant to give the background on the biological problem that is the main focus of the work and to name the tools. In chap- ters 2 and 3, the tools are explained in more details. Chapter 2 deals with the numerical method known as Molecular Dynamics (MD) in solving atomistic movements of molecules. Chapter 3 deals with an analytical model (diffu- sion equation) and the numerical method Green’s Function Reaction Dynam- ics (GFRD) to solve the Brownian movements of molecules. Chapter 4 sum- marizes the scientific papers/manuscripts that have been produced during the thesis work. Chapter 5 finalizes the thesis with concluding remarks on the state-of-the-art.

12 2. Molecular Dynamics

In order to investigate problems pertaining to living organisms quantitatively, one needs a proper level tool from mathematics, statistics and computational sciences. All living organisms are composed of one or multiple cells that consume energy, grow, and multiply to keep the organism alive, the study of which is collectively called cellular biology. In turn, the fate of each single cell is governed by the interplay of its sub-components. These cellular pro- cesses occur at a wide range of temporal and spatial scales. Therefore, one needs to know in advance at which physical level the problem of interest is sufficiently and/or efficiently described. For example, questions regarding the dynamical properties of cellular component at a spatial scale equivalent to the size of the cell cannot be resolved with tools based on electron movement de- scribed by quantum mechanics. For a broad overview of biological processes at spatial scale, see Fig. 2.1. However, an atomic resolution investigation of biomolecules at a larger scale than the size of the molecules would be bene- ficial in understanding some dynamical properties. This has not been much practiced due to high computational costs until recently [45, 78, 150]. Any physical model is based on certain physical and mathematical assump- tions and approximations. This practice of simplification allows us to solve larger scale problems. For example, the time-dependent Schrödinger equation could be used, in principle, to predict all properties of any molecule; how- ever, as the number of particles increases the procedure ceases to be efficient if practical at all. Accordingly, Molecular Dynamics, the subject matter of this chapter, is the result of Born-Oppenheimer approximation [19] which assumes the motion of nuclei and electrons of an atom in a molecule are separable. Since the weight of nuclei is by far larger than that of the electrons, the move- ment of atoms are predictable by their nuclear movement and the electrons are treated in an average fashion, for instance through partial charges. This ap- proximation allows us to build models of molecules and their dynamics based on simple physical laws, such as, Newton Equations of motion, Hooke’s law and Coulomb’s law. Molecular dynamics (MD) simulation with biological relevance was de- veloped in the 70s for systems with a couple of hundreds of atoms. It has advanced since then, to include e.g., studies of enzymatic reactions [147], protein folding [93] and protein dynamics in water [81]. The applicability of MD simulations to systems with hundreds of thousands of atoms would not be practical without novel algorithmic advances in energy calculations. These are now possible thanks to high performance computing [38] and par- allelization [11, 63], or the use of graphical processing units [53, 117, 1]. MD

13 Atom Metabolite Protein Cytoplasm Bacterium Prey Predator Interaction

Quantum Mechanics Molecular Dynamics Brownian Dynamics Differential Equations

Figure 2.1. Representation of wide range of spatial scale; from atoms to individuals can now be applied to larger biological systems with explicit [12], membrane embedded proteins [8] or macromolecular complexes such as the nucleosome [113], ribosomes [129] and highly complex systems in the pres- ence of explicit water such as whole viruses [45, 78], fractions of Mycoplasma genitalium [150] and Paper IV. In this chapter, the equation of motion is introduced, the potential func- tion and the models behind its constituents are explained, and some notes on parametrization of functional forms and calculation of forces are provided.

2.1 Equation of Motion Molecular Dynamics simulations are in fact numerical techniques to solve the N-body problem, N being the number of atoms representing a (bio)molecular system. Molecular dynamics simulation yields a deterministic trajectory of the state of the (bio)molecular system. In this model, the system represented in 3D (or 2D) coordinates evolves through Newton’s equation of motion, d2x F = m (2.1) dt2 where F is the force exerted on the particle with mass m at the position x. Eq. (2.1) is not solvable analytically for systems with more than two particles like biomolecular systems. Numerical approximations are used instead to give a trajectory of positions at discrete time points of all particles, as opposed to closed-form analytical solutions. Given the initial position and velocity of all particles, one can use finite difference methods to solve Eq. (2.1) numerically. There are multiple methods of numerical integration to solve ordinary dif- ferential equations. Examples are Euler integration, Leap-frog integration and Verlet integration. Verlet method is the most used one in MD simulations [3]. However, before the integration step in any MD algorithm, the left hand of Eq. (2.1), the force between each two atoms, has to be calculated. The calculation of force through energy is explained in the subsequent section.

14 δ+

rij δ−

rik δ− rjk

Figure 2.2. Representation of three non-bonded atoms with partial electrostatic + − − charges (δ, δ, δ ) at distances rij, rik and r jk from each other.

2.2 Potential Function To calculate the forces exerted on particles at each instant of time, the relation between force and energy is used, dU(x) F = − (2.2) dx U(x) is the potential energy at position x. The determination of potential energy for a complex system such as biomolecules is an active field of re- search [111]. Different energy functions can have different functional forms and different parameters, the determination of which are collectively called force-field development. Examples are AMBER [108, 142, 65, 86], CHARMM [66, 90, 18] and OPLS [58]. Below the most common functional form will be de- scribed. However, there are other functional forms that are more accurate, sometimes at the cost of larger calculations [111].

2.2.1 Electrostatic Interactions In classical point-charge force fields, the electrostatic interaction (Fig. 2.2) is calculated as the sum of interactions between pairs of atoms modeled as point charges through Coulomb’s law,

1 qiq j Vij(rij)= (2.3) 4πε0 rij

where qi and q j are the partial charges of atoms i and j, ε0 the dielectric constant, and rij the distance between atom i and j.

2.2.2 Harmonic Potentials When atoms share an electron, i.e., they are covalently bonded, the charge dis- tribution around the nucleus cannot be assumed to be symmetrical anymore.

15 l

θ

Figure 2.3. Representation of three atoms with two covalent bonds at equilibrium distance l and equilibrium angle θ,

Furthermore, the distance between charges are very small (less than a couple of Ångström). Therefore, simple electrostatic models cannot precisely mea- sure the contribution of the energies of the bonded atoms to the total energy of the systems. Consequently, their electrostatic energy contribution is excluded from the calculation of the total energy of the system. In these instances, i.e., covalently bonded atoms (Fig. 2.3), the potential energy for inter-atomic in- teraction between two and three atoms at respectively one and two covalent bonds apart are simply modeled by Hooke’s law. The potneitla energy per- taining to stretching a covalent bond is given by: k v(l)= b (l − l )2 (2.4) 2 0 where kb is the stretching constant of the bond and l0 the reference bond length. The potential energy pertaining to bending a two-covalent bond is given by: k v(θ)= a (θ − θ )2 (2.5) 2 0 where ka is the bending constant of the angle and θ0 the reference angle.

2.2.3 Van der Waals Interactions Van der Waals interactions are a combination of attractive and repulsive forces between non-bonded atoms (Fig. 2.4). The nature of the energy resulting from these forces shows that it vanishes at a very large distance and is very repulsive at a short distance. The energy reaches a minimum at a distance that is called equilibrium distance. In quantum mechanics terms, the repulsion is due to the overlap of the electron clouds of two atoms. It has been shown that the repulsive potential is accurately represented by an exponential function for some noble gases [21]. Attractive forces at intermediate and larger distances are due to the instantaneous created dipole that induces dipole-dipole moment in the neighboring atoms. These forces are called London dispersion forces, whose potential is the inverse of the sixth power of the distance between two atoms. [79].

16 rij

rik

rjk

Figure 2.4. Representation of three non-bonded atoms at distances rij, rik and r jk from each other.

The number of van der Waals interactions in a biomolecular system con- sisting of N atoms scale as N2. This prohibits the use of quantum mechan- ical models to measure the contribution of these interactions to the potential energy. The most famous model of van der Waals potential function is the Lennard-Jones (12-6) potential [80], in which the attractive part is modeled as a function of r−6 and the repulsive part as a function of r,−12 12 6 σij σij Uvdw = 4εij − . (2.6) rij rij

Here, εij is the well depth and σij is the distance at which the energy be- tween atom i and j is zero.

2.2.4 Alternatives and Extensions Most of the time, the functional forms used in the force fields are chosen such that their derivative is easily obtainable and the computational demand of their calculations is not large. However, this leads to the classic compromise between computational efficiency and accuracy. Below, alternative functions to, or the extension of, the functional forms mentioned above are introduced and discussed briefly.

Harmonic Potentials The harmonic potential functions (Eqs. (2.4) and (2.5)) for a typical bond work reasonably well for biomolecules under normal conditions. However, this model is a simple approximation of more accurate inter-atomic potentials such as the Morse potential [98]. The Morse potential describes the potential energy even when the bond deviates to its bond breaking point and dissocia- tion. The Morse potential has the following form:

2 v(l)=De (1 − exp(−a(l − l0))) (2.7)

17 φ

Figure 2.5. Representation of four atoms with three covalent bonds and the torsional angle φ.

where De is the depth of potential energy minimum, a = ω μ/2De, μ is the reduced mass, and ω is the frequency of the bond vibration. ω is related to the stretching constant of the bond, k,byω = k/μ, and l0 is the reference value of the bond. The Morse potential describes the inter-atomic interaction correctly, but it is not often used in force field calculation, It has a high com- putational cost for calculation of exponential function and it also has an extra parameter compared to the harmonic model. To approximate the Morse po- tential more accurately one can add cubic, quadratic, and higher terms to the simple harmonic potentials for a bond-stretching model (Eq. (2.4)). This in- clusion of an angle-bending model (Eq. (2.5)) can also give better accuracy of force fields, but at the expense of more parameter estimations.

Torsional Term The torsional energy contributed from the rotation around one bond (see Fig. 2.5) is represented by,

v(φ)=kt (1 + cos(mφ − γ)) (2.8) where kt is the force constant of torsion, φ is the current torsional angle, m is the multiplicity (the number of energy minima as the bond is rotated 360o), and γ is the phase angle (0 or π).

Polarized Charges Point-charge models, as described previously, are broadly used in many biomolec- ular force fields, despite the importance of the polarization effect in condensed phases or biomolecular systems. Polarization effect refers to the fact that the charge density around the nucleus is not fixed—its distribution varies depend- ing on the environment. For example, in the simple case of a water molecule, the dipole moment of water varies from 1.9D in the gas phase [83] to 2.1D in a water cluster [54] and to 2.9D in bulk water [9]. In a more complex system such as protein ligand binding, the explicit polarization treatment of electrostatic charges yields a reliable prediction of binding free energy [62]. However, in most force fields, the electrostatic properties of atoms are treated

18 by placing a point charge at the site of the nucleus where its magnitude is determined beforehand and kept constant.

Van der Waals Potentials Other alternatives to van der Waals potential include 7-14LJ, 6-10LJ and the Buckingham potential. For more details, see [79].

Putting all the functional forms together the expression below needs to be calculated in order to calculate Eq. (2.2),

kbm 2 kan 2 U = ∑ (lm − lm,0) + ∑ (θn − θn,0) bonds 2 angles 2

σ 12 σ 6 + ε ij − ij + qiq j ∑∑ 4 ij πε (2.9) i j=i rij rij 4 0rij

where, m and n enumerate the number of bonds and angles, respectively, and i and j enumerate the number of atoms N. In the next section the determi- nation of the parameters that are introduced above will be explained. Param- eterization is of utmost importance in developing force fields after choice of functional forms.

2.3 Parameter Optimization The main two challenges in developing force fields are the choice of func- tional forms (discussed in previous section) and accurate parameters. First, the functional forms need to be computationally efficient and realistic enough to capture meaningful physical behavior. Seconds, correct parametrization is crucial in force-field development, which depends on abundant and reliable data from experiments and/or QM measurements, and efficient, reliable opti- mization algorithms. Roughly speaking, the parametrization protocol includes the choice of target data, the order of parameters to be determined, and opti- mization methods. Each force field contains a large number of parameters, some of them are more crucial than others in reproducing certain properties. These properties include structural properties such as conformational energies and vibrational frequencies, and thermodynamic properties such as densities and heat of vaporization. Due to the important role of the non-bonded parameters in thermodynamic properties and their role in biomolecular structure and interactions, their proper optimization is essential for any reliable biomolecular force field. Non-bonded

19 parameters include those in van der Waals interactions and electrostatic inter- actions. One approach is to perform the parametrization by trial and error, i.e., the parameters are refined gradually to give better fits to the targeted data. Another approach is to define a penalty function and perform least-squares fitting [84]. Because different force fields have different functional forms and hence pa- rameters and/or different parametrization protocols, it is not possible to use one sets of parameters for all force fields, i.e., the parameters are not transfer- able among different force fields.

2.4 Force Calculation To calculate force through potential function (Eqs. (2.2) and (2.9)), one needs the spatial derivative of the latter equation. The potential functions described above have rather simple analytical derivation which makes them favorable to calculate. As is clear from Eq. (2.9), the calculation of energy for bonded terms scale as N, the number of particles. However, it scales as N2 for non- bonded terms. The magnitude of non-bonded interactions decreases as 1/r−1 for electrostatic interactions, and as 1/r−6 for van der Waals interactions. Therefore, one solution is to calculate the non-bonded interactions only for atoms falling within a certain distance rc, known as the cutoff distance. This solution is not free from artifacts created by truncating a smooth function abruptly and solutions for it have been proposed [134]. In fact, the effect of long-range interactions on the biomolecular systems have been researched since 1993. For example, for electrostatic interactions, a method widely used is the Particle Mesh Ewald (PME) sum [31, 38]. The magnitude of van der Waals interactions falls off rapidly as r−6, since their magnitudes are all neg- ative (unlike electrostatic interaction) the effect of neglecting them is quite large. For long-range van der Waals interactions the addition of an analyti- cal correction known as dispersion correction is proposed to account for those interactions beyond the cutoff distance [3]. The method of PME has been proposed and implemented for van der Waals interactions in the GROMACS software [148].

20 3. Brownian Dynamics - Smoluchowski Approach

As was mentioned in the introduction, the dimension of a problem determines the choice of research tool. Molecular Dynamics (MD) is an appropriate tool when the question has an atomistic nature (Fig. 2.1) and a sub-microsecond temporal scale. However, when the problem at hand is to understand the interactions and reactions of several species with hundreds to thousands of molecules for a duration of hours, MD simulations are not appropriate tools. A common mathematical tool to study (bio)chemical kinetics of this type of problem is differential equations. One of the earliest examples of using dif- ferential equations goes back to 1850 [149], when the conversion of sucrose to glucose and fructose was described by using ordinary differential equations (ODE). However, incorporation of the spatial gradient of the molecular species to the model is necessary when the problem of interest is subject to spatial non- homogeneity e.g., due to slow diffusion or compartmentalization in the interior of cells. For this, partial differential equations (PDE) are used to describe the changes in molecular species in a continuous manner with respect to time and space. These models are called reaction-diffusion systems. The diffusion term describes the transportation of matter and the reaction term models the trans- formation process. Reaction-diffusion systems have been used to describe a wide range of problems in physics, chemistry and biology [136]. An example is two populations of prey and predator and their spread in the environment. The populations can be modelled by two sets of coupled reaction-diffusion equations, describing the birth and decay of each species by natural causes and their interactions. The spread of each species is captured by a diffusion term with a predefined diffusion coefficient for each species, in a model known as the Lotka-Volterra [87, 141] model or prey-predator model [88, 20]. An- other example of reaction-diffusion systems to explain biological phenomena is in pattern formation such as stripes and spots (also known as Turing pat- terns [132]). These models, together with their analytical solutions [123], are widely used to explain Morphogenesis i.e., embryonic development of organ- isms.

3.1 Deterministic Modelling One way of deriving the diffusive part of the reaction-diffusion equation is through Fick’s first law. This law asserts that the flux is linearly related to the

21 gradient of concentration, ∂c j = −D (3.1) ∂x and the conservation of mass that it results in (expanded just in one dimension here): ∂c ∂ j = − . (3.2) ∂t ∂x Combining Eqs. (3.1) and (3.2) leads to the diffusion equation,

∂c ∂ 2c = D . (3.3) ∂t ∂x2 The reaction equation is derived through the law of mass action [55], which states that the rate of a chemical reaction is proportional to the masses of re- actants. ∂c = −k c (3.4) ∂t d

where, kd is the rate at which the reactant degrades. The so-called reaction diffusion equation is therefore written as:

∂c ∂ 2c = D − k c. (3.5) ∂t ∂x2 d In the macroscopic picture of (bio)chemical reactions above, it is assumed that the reagents of reactions have high concentrations. However, not all molecular species appear in high copy numbers in inter/intra cellular reac- tions. An example is gene expression, which involves low copy number of transcription factors, polymerase and mRNAs. Therefore, a macroscopic ap- proach is not a suitable analytical tool to explain and predict these processes. An alternative and accurate way of modeling cellular processes with a low copy number of molecules and with reactions happening at random times is stochastic modelling [51, 36].

3.2 Stochastic Modelling The chemical master equation (CME) is a widely used stochastic model to ex- plain and predict the molecular copy number of a species that changes through chemical reactions. CME is a set of Ordinary Differential Equations (ODEs) that govern the evolution of probability distribution functions of certain num- ber of molecules of each species at a certain time [136]. Except in a few special cases [46, 50], the CME is not solvable analytically. Stochastic simulation algorithms (SSAs) such as the one by Gillespie [49] are the most well-known and well-extended algorithms to simulate analyti- cally non-solvable systems. An essential assumption in CME is a well-stirred

22 environment where the diffusion is fast enough to create a homogeneous en- vironment in between reactions. Some cellular processes involve molecular species with slow diffusion that create a non-homogeneous environment. In such cases, CME is not an appropriate model. Cellular processes with re- actions happening stochastically in a non-uniform environment are primarily simulated by two particle-based models, Brownian Dynamics (BD) and Reac- tion Diffusion Master Equation (RDME). RDME dates back to 70s [76, 48, 102] and can be interpreted as a spatial extension of the CME. In RDME, the space is divided into sub-volumes with a mesh size small enough to guarantee the assumption of the uniform distri- bution of particles within each voxel. Therefore, the same procedure as for CME can be applied to each sub-volume for the reactions in addition to the diffusion of particles to adjacent voxels [40, 59, 143]. The state of the system is tracked by the copy number of each species in each sub-volume. In spite of the progress in speeding up [32] and adjusting RDME to different geometries and unstructured meshes [35], seeking the smallest mesh size to validate the approach is still controversial [39, 37, 71, 60]. It has been shown [37] that by decreasing the mesh size relative to molecule size, no bimolecular reactions can take place. Instead, the molecules simply diffuse and are transformed by monomolecular reactions. In contrast to RDME, the BD algorithm is a particle-based off-lattice model. Each particle in the system is characterized by its size and species and their positions are tracked individually. The particles undergo a random walk which is governed by the diffusion equation. In most of the BD algorithm develop- ments, the diffusing steps are sampled from a Gaussian distribution by choos- ing a fixed time step. As in RDME, the monomolecular reactions in BD algorithms are modeled as a Poisson process. However, the bimolecular reactions are not explicitly modeled in most of the algorithms [5, 110, 7]. For a discussion of different implementations of bimolecular reactions see [125]. Another drawback with time-driven BD algorithms is the large computation time due to small time steps. Increasing the time steps is a way of improving the efficiency; however, this could result in missing reactions between particles. To make sure no reaction will be missed, a regime of BD algorithms known as event-driven algorithms has been developed. In these algorithms, the next time step is calculated such that an interesting event happens, e.g., monomolec- ular or bimolecular reactions. One example of these algorithms is Green’s Function Reaction Dynamics (GFRD) [138, 137]. GFRD uses the exact solu- tion of the Smoluchowski equation to combine the diffusion in space and the reaction between particles into a single step. This allows GFRD to take large jumps in space and time to propagate the particles in systems with low concen- tration. Although the cost to compute every iteration is high, these algorithms are up to 5 orders of magnitude faster than time-driven algorithms [137]. In the GFRD algorithm, the calculated time to the next reaction is also compared

23 with the time step for the next diffusive step. Therefore, a cut-off distance is introduced for calculating the diffusion of particles. Introducing the cut-off distance makes the algorithm inaccurate to a certain extent, because there is a non-zero probability that the particle jumps to a distance beyond the cut- off distance. In a later version of the GFRD algorithm, eGFRD, this issue is resolved by introducing an absorbing boundary condition, see [103, 127].

3.3 Green’s Function Reaction Dynamics GFRD is an event-driven BD algorithm to simulate reaction diffusion systems at the particle level. A reaction diffusion system is composed of many particles that diffuse in space and react upon contact with each other. An analytical solution is not obtainable for a reaction-diffusion system with more than two particles. The original idea of GFRD was to divide the many-body system into one- and two-body problems and find an exact analytical solution for each system. The use of Green’s function solutions to set up an event-driven algorithm allows GFRD to take large jumps in space and time at each time step. Below the modeling behind GFRD is explained in more detail.

Biochemical reactions: monomolecular or bimolecular reactions A cellular process can be depicted schematically with three basic reactions below (as many as needed): /0 → X (3.6) X → ... (3.7) X +Y → ... (3.8) The monomolecular and bimolecular reactions above are reaction channels that cells use to transform one set of chemical species to another at a specific rate. Here the dynamics and kinetics of monomolecular and bimolecular re- actions are modelled without going into the details of atomistic resolution in MD. In fact, molecules are considered spheres with radii to model their sizes. The spheres are reactive symmetrically.

Molecular transportation is modeled as a random walk The diffusion equation can be derived from a random walk model. Here, we consider the random walk in one dimension, without loss of generality. We derive the equation for evolution of p(x,t), the probability density that the particle is at position x at time t. We assume that the random walk is an unbiased walk, which means the probability of going either right or left is equal (p = q = 1/2). We assume that a particle takes a jump of length Δx in one unit of time, Δt. If we look at the probability of being at position x in

24 one unit of time in the future, t + Δt, this happens either if the particle is at position x + Δx at time t and jumps to the left. Or if it jumps to the right when the particle is at position x − Δx at time t. Given the Markov property of such processes, one can sum over the probabilities as follows: 1 1 p(x,t + Δt)= p(x − Δx)+ p(x + Δx). (3.9) 2 2 Subtracting p(x,t) from both sides of Eq. (3.9) and dividing both sides by Δt and with a slight manipulation, it results in: p(x,t + Δt) − p(x,t) (Δx)2 p(x − Δx) − 2p(x,t)+p(x + Δx) = . (3.10) Δt 2Δt (Δx)2 The equation above becomes the diffusion equation at infinitesimal values for Δt and Δx:

∂ ( , )= ∂ 2 ( , ) t p x t D x p x t (3.11) (Δx)2 where is replaced with diffusion coefficient D. 2Δt Monomolecular reactions are modeled as Poisson processes The transformation of one species to another or the degradation of species in Eq. (3.7) is independent of molecular diffusion or any spatial gradient, unless the species is involved in localized biochemical reactions such as those hap- pening on the membrane. Moreover, the transformation is assumed to be in- stantaneous. For such description, a good approximation is a Poisson process with the predefined rate as the average lifetime of the corresponding species. Therefore, the waiting time for the next monomolecular reaction is distributed exponentially.

Bimolecular reactions are modeled with boundary conditions In contrast to monomolecular reactions, bimolecular reactions are not easily separable from their diffusive dynamics when the particles are in close vicinity of each other. Let’s consider an isolated pair of molecules, A and B, that diffuse with diffusion coefficients DA and DB and associate at the contact surface σ = σA + σB: k A + B −→a C (3.12) where σA and σB are the radii of particles A and B, respectively. Here, ka is the association rate at contact and C the product. The probability distribution function p(rA,rB,t) of the position of the particles, rA and rB at time t,is governed by the diffusion equation below, ∂ ( , , | , , )= Δ + Δ ( , , | , , ), t p rA rB t rA0 rB0 t0 DA rA DB rB p rA rB t rA0 rB0 t0 (3.13)

25 Δ where rA0 and rB0 are the initial positions of the particles at time t0 and is the Laplacian operator. In an attempt to solve the Eq. (3.13), one needs to simplify it by changing the position vectors rA and rB to the inter-particle distance vector r and the center of diffusion vector R through the following expression: R = αrA + βrB, (3.14) r =rB −rA (3.15)

where, α and β are chosen such that p(rA,rB,t|rA ,rB ,t) can be written 0 0 as p(R,r,t|R0,r0,t0) and further factorized in two independent processes with R r p(R,r,t|R0,r0,t0)=p (R,t|R0,t0)p (r,t|r0,t0). The rule above imposes conditions on the positive α and β as follows: ⎧ α ⎨ = DB , β DA ⎩ 2 2 α DA + β DB = DR.

Here, DR is the diffusion coefficient of the center of diffusion vector. Ap- plying the changes above we obtain the diffusion equation for the center of diffusion ∂t p(R,t|R0,t0)=DRΔp(R,t|R0,t0) (3.16) and the diffusion equation for the inter-particle distance vector

∂t p(r,t|r0,t0)=(DA + DB)Δp(r,t|r0,t0), σ ≤ r. (3.17) The solution to Eq. (3.16) with a point source initial condition and absorb- ing boundary condition at infinity is the Gaussian solution. The solution to Eq. (3.17) is derived in [25] with the following initial and boundary conditions: p(r,t0|r0,t0)=δ(r −r0) (3.18) 2 4πσ D∂r p(r,t|r0,t0)||r|=σ = ka p(r,t|r0,t0)||r|=σ (3.19) where the bimolecular reaction—association—is incorporated into the bound- ary condition, here ka is the intrinsic association rate. This problem is an extension of the Smoluchowski problem with a reactive boundary condition, see [28]. GFRD is slow compared to RDME specifically when it comes to the diffusion- limited reactions that are characteristic of many biological processes. After each dissociation event, the particles stay in the vicinity of each other due to low diffusivity. Then they re-associate before diffusion moves the particles apart. This problem is addressed in Paper I.

26 4. Summary of papers

4.1 Paper I: Efficient Green’s Function Reaction Dynamics (GFRD) simulations for diffusion-limited, reversible reactions The Green’s function reaction dynamics (GFRD) algorithm [138, 137] was proposed to resolve the inefficiency of Brownian Dynamics (BD) algorithms by taking larger time steps, as explained in Chapter 3, and to apply Smolu- chowski theory for bimolecular reactions. Another advantage of the GFRD algorithm is that it obeys the detailed balance principle when it comes to re- versible reactions. In GFRD, a pair of particles can associate when they reach each other’s contact surface and the products of the transformation of a single particle are put at contact surface. However, the efficiency of GFRD deteri- orates when the reaction system involves diffusion limited reactions. These reactions include molecular species that have a slower diffusion than their as- sociation rate. This results in multiple re-bindings after each dissociation until diffusion separates the particles from each other. This is shown schematically in Fig. 4.1. In GFRD, many calculations should be carried out for sampling time to the next association and dissociation events. However, in reversible GFRD (rGFRD) the time to separation is sampled once. In this paper we propose a model for a single particle that can dissociate and re-associate re- versibly until the diffusion moves the product away to a distance where the probability of rebinding is low. Consider the schematic reversible reaction

kd C  A + B (4.1) ka together with transformation or decay reactions

γ γ γ A →A ..., B →B ..., C →C .... (4.2)

Here, ka is the intrinsic association rate, kd the intrinsic dissociation rate, and γA, γB and γC are transformation or decay rates of particles A, B and C, re- spectively. The probability distribution function of inter-particle distance r between particles A and B satisfies the equation

∂t p(r,t)=DΔr p(r,t) − γu p(r,t) (4.3)

27 where D is the sum of diffusion coefficients of particles A and B and γu = γA + γB. The distance r is restricted to the shell σ ≤ r ≤ b (4.4) where σ, the sum of the radii of particles A and B is the distance at which a re- action can take place and b is an arbitrarily chosen distance with the absorbing boundary condition p(b,t)=0 (4.5) at which the separation of A and B occurs. The reaction at the contact surface is modelled as a flux at the inner boundary sphere through

2 4πσ D∂r p(r,t)|r=σ = ka p(r,t)|r=σ − kdQ(t) (4.6) The probability of being bound as C at time t is denoted by Q(t) and evolves as dQ(t) = −k Q(t) − γ Q(t)+k p(r,t)| =σ (4.7) dt d C a r We derive the analytical solution of the equation above in the rGFRD al- gorithm to speed up the algorithm. Fig. 4.2 shows the CPU time for both im- plementations of GFRD and rGFRD for different values of re-bindings, Nd. ka The relation between ka in Eq. 4.1, D in Eq. 4.3 and Nd is Nd = 1 + , kD kD = 4πσD. We have shown that by increasing the number of re-associations, the execution time of GFRD algorithm increases linearly, however, it remains constant for rGFRD algorithm (Fig. 4.2).

4.2 Paper II: Impact of Dispersion Coefficient on Simulations of Proteins and Organic Liquids Biological environments are densely packed with macromolecules. It is often difficult to get detailed insight into how the crowding affects what we know about cellular processes from experimental means. An alternative is to use computational techniques. The structure and dynamics of biomolecules can be tracked at high resolution at both spatial and temporal scales. Molecular Dynamics (MD) has the ability to get atomistic resolution of biomolecular dynamics and structure. However, MD simulations of concentrated models of proteins have shown non-physical coagulation of them [105, 2, 107]. The protein-protein interactions are too strong when simulated with standard force fields. One explanation is that the force fields have often been initially param- eterized for simple and small-model systems, or proteins in diluted solutions. One way to resolve this issue is to modify the protein-protein or protein-water interactions. Strengthening the protein-water Lennard-Jones interactions im- proves the properties of disordered proteins [15]. Another explanation of the

28 Figure 4.1. Comparison of the time sampling in GFRD and rGFRD. rGFRD samples a longer time step than GFRD when it comes to reversible reactions. In rGFRD, the idea is to integrate over the short-lived time steps of all re-association events after each dissociation. non-physical behavior of concentrated models suggests that the dispersion in- teractions may have been overestimated due to short cutoffs while parame- terizing force fields [97]. In this work, we tested the latter explanation by systematically reducing the dispersion coefficient in the 12-6 Lennard-Jones (LJ) potential. The LJ potential is given by: σ 12 σ 2 ( )= ε ij − ij = C12 − C6 UvdW rij 4 ij 12 6 (4.8) rij rij rij rij

where rij is the distance between atoms i and j, and σij is the distance at which the energy is zero, εij is the depth of the potential well. C12 and C6 are repulsion and dispersion coefficients, respectively. They are derived from the distance and well-depth parameters through = ε σ 12, = ε σ 6 . C12 4 ij ij C6 4 ij ij (4.9)

Here, we reduced the C6 for non-water atom types by 0, 1, 2 and 4 % to study the protein solutions and neat liquids. For neat liquids simulations, two general force fields were chosen: the generalized AMBER force field (GAFF) and the CHARMM general force field. The LJ long-range interactions were calculated with particle mesh Ewald (PME). For protein solutions, we tested two families

29 Figure 4.2. Comparison of the execution time for GFRD and rGFRD. Average execu- ka tion time of 10 simulations is shown for a reversible reaction A + B  C for a wide kd range of Nd using the original GFRD and the rGFRD. The parameters of the system 3 −1 3 −1 −1 are kD = 0.2 μm s , ka =(Nd − 1)kD μm s , kd = ka/(2VR) s and R = 1 μm.

of force fields: AMBER ff99SB-ILDN [142, 65, 86] and CHARMM27[90, 18]. We used two methods for long-range LJ interaction calculations: cut-off (for AMBER ff99SB-ILDN) and PME (for both AMBER ff99SB-ILDN and CHARMM27). We also simulated the protein systems with scaled AMBER force fields designed for crowded systems to compare them with ours. We found that the generalized force fields, AMBER and CHARMM, repro- duce the experimental densities for neat liquids with little or no corrections. Reducing the dispersion coefficient led to underestimation of densities. For biomolecular force fields, the highest degree of reduction of long-range forces resulted in less coagulation. However, the scaled AMBER force field designed by Best [15] still performed better. Since reducing the long-range attractive forces even more may lead to unfolding and destabilizing the biomolecules structure, this solution is not favorable. Furthermore, since biomolecules and liquids force fields share atom types, a different solution is desirable for de- signing a force field for crowded system.

30 4.3 Paper III: Rotational and translational diffusion as a function of concentration The interior of cells is densely packed (chapter 1). Macromolecular crowding affects the dynamical properties of proteins as well as the structural and kinetic properties [145, 33, 152, 118, 112]. It has been shown that the translational diffusion coefficient of Green Fluorescent Protein in E. coli measured in vivo is 10% of its in vitro value [34, 75]. However, an atomistic insight into the prob- lem has not been tractable by an experimental techniques so far. In this work, we used computational techniques to simulate a simple model of crowded en- vironment at atomistic level by increasing the concentration of proteins in a systematic way. We used Molecular Dynamics to simulate the systems for one microsecond with a force field designed for crowded systems [15]. The study was done for three proteins: Ubiquitin [140], 2kim [6], and 2k57 [115]. We focused on the dynamical properties of proteins, i.e., the translational and rotational diffusion. The translational diffusion coefficient and rotational cor- relation time (global tumbling) were calculated. These results were compared with hard sphere (HS) models that took into account the excluded volume ef- fect. The full atomistic model of the concentrated proteins probed the effect of electrostatic forces in addition to steric forces. We found that the reduction in rotational diffusion has a larger spread and it depends on the proteins in our study. However, the reduction in translational diffusion is similar for different proteins. The difference between our simulations (MD) and HS models are clear for the proteins in our study which suggests that the excluded volume ef- fect explains the reduction in translational diffusion to some extent. The effect of electrostatic interactions could be seen by formation of transient clusters in our study. This cannot be explained with simple HS models. Moreover, inclu- sion of explicit water and also study of rotational diffusion is not possible with HS models.

4.4 Paper IV: Making Soup: Preparing and Validating Molecular Simulations of the Bacterial Cytoplasm In this work we took a step forward towards whole-cell simulation. We built a model of E. coli at atomistic resolution that can be used for resolving its dynamics with MD simulations. This model is available as a Python script on github repository https://github.com/dspoel/soup. We used this model to set up a simulation of the system that represents 1/5000 of E. coli volume. Incorporating a full list of proteins, nucleic acids and metabolites in a full atomistic model is prohibitive in terms of computational resources. Therefore, we selected the most abundant non-ribosomal proteins; these account for 50% of the total non-ribosomal proteins numbers. To account for nucleic acids, we chose , because 74% of the dry weight of non-ribosomal RNAs is

31 composed of tRNA, i.e., 5% of the total protein weight corresponds to the total RNA weight. tRNAPhe was chosen as representative of tRNAs in our model because of the availability of the crystallographic structure [22]. The metabolite fraction of our cytoplasm model was built based on data that the number of metabolite molecules in the cytoplasm of E. coli is 42.86 times greater than the number of proteins [13]. We considered the most abun- dant metabolite as representative of each metabolite class, i.e., amino acids, nucleotides, central carbon intermediates and, cofactors. The number of water molecules to be added to the system depends on the total biomolecular mass and concentration. We chose 30% biomolecular con- centration and calculated the number of water molecules to reach this value. At the end, we added 0.15 M ionic strength to our model to include cations, like Mg2+ and K+ and anions Cl−. We used the Molecular Dynamics (MD) package, GROMACS 2018, to simulate the cytoplasm model in three replica for 1μ s for each replica. Additionally, we simulated each protein under a dilute condition for 200 ns for each replica. We validated our cytoplasm model by comparison of diffusion coefficient reduction with experimental results. Translational diffusion coefficient drops by around 85% for the proteins and 93% for the tRNA in our study. These findings are in agreement with experimental results showing that the diffusion coefficient of Green Fluorescent Protein (GFP) in E. coli decreased 90% when compared to in vitro measurements [34, 75]. Our results show also that the drop in diffusion coefficient is independent of the size of the molecules. We found that higher drop of diffusion coefficient of tRNA is because of aggre- gation of ATP−3 and FBP−4 around Mg+2 that is used as cation for tRNA molecules. Therefore, we treat tRNA as an outlier and it was excluded from the rest of the analysis. Furthermore, we suggest to use fully protonated ATP −3 and FBP−4 in the model to avoid the aggregation problem. In our study we found that most of the proteins are more flexible in cyto- plasm model than in the dilute condition. This is in agreement with previous computational study [42] and experiments [96, 146] showing that non-specific interactions between proteins and their native environment leads to proteins being less stable.

32 5. Concluding Remarks

Atomistic models of biological macromolecules provide the highest level of detail. However, for investigating macromolecular behavior in a cellular envi- ronment, this is not sufficient unless the whole entity of the biological organ- ism is modeled at an atomistic level. Whole cell modeling is not a new topic in biological studies. A mathematical model for single bacterial cell growth was proposed in late 70’s [122]. In this model, the concentration of the bacterial components was modelled for signal transduction, genetic regulatory networks and metabolic networks through equations (kinetic models). From a hypothetical whole-cell computational model, one can predict cellu- lar phenotypes based on their genome. Beside basic scientific purposes, whole cell modelling may have medical applications. The effect of new drugs can be investigated on the entire cell behavior rather than on a single target. Cellular environments are complex in many ways. Firstly, cellular pro- cesses occur at spatial and temporal scales that span over many orders of mag- nitudes. For example, macromolecular atomistic details have 0.1 nanome- ter resolution whereas whole biological processes happen at a few hundred nanometers to a hundreds of micrometers. From a temporal point of view, chemical reactions happen on the order of femtoseconds whereas cell divi- sion time for E. coli is 20 minutes. Secondly, cellular environments con- tain components at different level of concentrations and structural complexity. These range from macromolecules such as proteins, nucleic acids, and carbo- hydrates, to ions and water. All have different shapes and sizes in addition to different abundance. Another complexity pertains to cellular processes. Over the life of the cell, the different processes that take place are modulated in a complex way. Most often the cellular pathways are studied separately. From a technical point of view, whole cell modelling poses grand chal- lenges. In recent years, there has been an abundance of high-throughput data from various studies on cellular modules, e.g., transcriptomics, proteomics and metabolomics. However, acquiring homogeneous knowledge across dif- ferent cellular pathways through a technology that spans all cellular functions is still a big challenge. Furthermore, some cellular processes are more stud- ied than others. Signal transduction, metabolic network and regulatory net- works are far more investigated than cell differentiation and apoptosis [61]. Therefore, data curation is one of the grand challenges on the way towards building a whole cell model. Another difficulty is the discrepancy between models among different cellular pathways. For instance, most often flux bal- ance analysis (FBA) [104] is used in models, but ordinary differ- ential equations (ODEs) in modeling signal transduction pathways [74]. Even

33 though there is a possibility to model all these pathways with a unique model e.g., ODEs, the large temporal difference in various cellular processes leads to stiff ODE’s that are not easily integrable. Therefore, integrating the best model and all knowledge known for each pathway into a unified model are the greatest challenges on modelling the whole cell. For more details on principles and challenges in whole-cell mod- elling see [91, 73, 52, 126]. In spite of the challenges, there are examples of whole-cell models of dif- ferent organisms at different levels of approximation. A project known as E-cell started as in 1999 [131] to build a platform for a whole-cell simulation. Through this software a hypothetical cell with a minimal number of genes sufficient for its survival was built based on differential equations. This hypo- thetical cell is derived from Mycoplasma genitalium bacterium. M. genitalium has only ∼ 480 genes, almost one order of magnitude fewer than E. coli. Its genetic sequence has been published (http://www.tigr.org/), which makes it a good candidate for whole-cell modeling. Another whole-cell computational model of the bacterium M. genitalium at molecular level can be found in [72]. Other examples of organisms or cell type models are the computational one of erythrocytes [130], a data-driven model of E. coli [24], and those are found in [16, 110, 94, 4]. Furthermore, a model of cytoplasm at atomistic level has been built based on M. genitalium [41]. This model integrates genomic and metabolic data with physical and biochemical principles. In Paper IV,we have constructed an atomistic level of cytoplasm that pertains to 1/5000 of the volume of E. coli cytoplasm. Constructing atomistic resolution models of cellular environment, or cellu- lar types are extremely challenging due to several factors. The atomistic struc- ture of all cellular components is not available. Globular protein structures are either experimentally resolved or they can be modeled with reasonable ac- curacy through homology modelling [116]. However, membrane proteins or disordered proteins are more difficult targets to address experimentally. Like- wise, nucleic acids and cellular membranes have complex structures that are difficult to resolve. For a review on this issue see [67]. Macromolecular models reach 0.1 nm spatial resolution when modeled at atomistic level. Constructing a full atomistic model of the whole cell at this resolution is conceivable, but it would be prohibitively costly to investigate the dynamics of the model with e.g., Molecular Dynamics simulations. From a simple extrapolation in time, it has been estimated that the simulation of an E. coli bacterium would be possible in 25 years from now for one nanosecond with 1011 atoms [135]. However, no cellular processes happen at the nanosec- ond scale. More recently macromolecular crowding has been investigated for simple systems [42, 57, 56, 100]; see Paper III. Many of these studies on macro- molecular crowding show a trend in which compact native states are favored over extended states, due to entropic effects. However, the effects of non-

34 specific protein-protein interactions could have an opposing outcome depend- ing on the type of crowding agents [114]. These non-specific interactions should not be neglected in the study of the diffusive properties of macro- molecules. The attempts in systematically studying simple models for crowding has advantages that bring us to the next point in the challenges around atom- istic models, namely accurate physical models. The accuracy of MD simu- lations depends on the force-field models. In general, the force fields have been improved in reproducing experimental observations over time [77, 85, 10]. However, even state-of-the-art additive force fields lead to aggregation of biomolecules at cellular concentration levels [105, 2, 107]. This problem has been addressed in several studies [15, 107, 97] including Paper II. For recent reviews on force-field development see [101, 111]. Therefore, more accurate physical models and increased computational power would resolve the current challenges to some extent. For the time being, ap- proximative models are of great value for shedding some light on cellular pro- cesses. In spite of the high concentration of macromolecules and nucleic acids in the cellular environment, the major component is water molecules. There- fore, using implicit water with reduced dielectric constant to model the effect of cellular conditions would significantly reduce the computational costs. For more thorough discussions on modeling biomolecular simulations from atom- istic resolution towards coarse grained models see [43]. The highest level of approximation for cellular processes is achieved through kinetic models (ODEs). These are the simplest models and therefore computa- tional power is gained at the expense of losing details, including spatial detail. For a review on the importance of space in simulation of cellular processes see [128]. Another feature of cellular processes that is not captured with ODEs or even PDEs is the stochastic nature of certain cellular processes [92, 109]. From the other hand, one key feature of cellular processes that is captured with ODEs and PDEs is the reactions. Reactions in terms of bond-breaking processes are not modeled in MD simulations. In order to incorporate bio- chemical reaction into the simulations, one needs to study the problem at the quantum mechanics (QM) level or through quantum mechanics/molecular me- chanics (QM/MM) [119]. However, the increase in degrees of freedom in- creases the computational costs yet again. Reaction Diffusion Master Equation (RDME) and Brownian Dynamics (BD) are among formalisms that incorporate key features of cellular processes such as reactions, stochasticity, and space. Furthermore, they can simulate pro- cesses at the cellular time scale. The advantage of RDME over BD is the event-driven scheme of the algorithm which makes it faster; however BD algorithms retain spatial information more accurately. For more discussion on this regime of algorithms see [51]. Green’s Function Reaction Dynamics (GFRD) algorithm is yet another type of BD algorithm that takes advantage of an event-driven scheme to speed up the simulations [138, 137]. It is extended

35 to 1-D, 2-D and 3-D processes [124]. In Paper I, its efficiency is improved for diffusion-limited reactions. In spite of large computational overhead, GFRD algorithms are up to 5 orders of magnitude faster than conventional BD al- gorithms; however, they are still slower than the RDME type of algorithms. Nevertheless, when accurate spatial information and kinetic parameters are crucial to the investigation, the GFRD algorithm is a better candidate. For more details on the choice of parameters for stochastic simulation algorithms see [125]. Macromolecular crowding investigation with BD algorithms poses a prob- lem which is the high number of molecular collisions. From the other hand, BD algorithms are appropriate tools for processes that take place on large temporal scales, e.g., minutes to hours. This is prohibitive with e.g., MD simulations. In these instances, a lower level models, e.g., MD can be used to parametrize diffusion coefficient and/or kinetic parameters under crowding condition. These parameters can be used for higher level models, e.g., BD algorithms to see the effect of crowding on cellular processes.

36 6. Svensk Sammanfattning

Biologi är det vetenskapliga området som studerar levande organismer. Den mest grundläggande enheten som kan sägas leva är en cell. Cellerna kan själva utföra komplexa funktioner för att hålla dem vid liv. Dessa cellulära funk- tioner, t.ex. proteinsyntes, transkription av deoxiribonukleinsyra (DNA) och ribonukleinsyra (RNA) syntes, är resultat av flera skilda fysiokemiska pro- cesser som sker på olika tids- och rymdskalor. Dessutom består den intra- cellulära miljön av ett brett spektrum av biomolekylära enheter, från makro- molekyler som nukleinsyror, proteiner och kolhydrater, till mycket mindre de- lar som metaboliter, joner och givetvis vatten. Effekterna av den intracel- lulära miljön på cellulära processer kallas makromolekylär trängsel. Till ex- empel kan strukturella och dynamiska egenskaper förändras under olika grad av trängsel. Den största flaskhalsen för att förstå cellulära processer under dessa “trånga” är begränsningen hos de experimentella teknikerna. Det finns knappt några experimentella tekniker som kan ge ett upplösning på atomnivå av de innersta delarna av en cell. Därför är teoretiska verktyg och beräkningar värdefulla resurser för att få insikt i och ge upphov till nya hypoteser i cellens liv. I denna avhandling används två beräkningsmetoder, Molecular Dynamics (MD) och Brownian Dynamics (BD) simuleringer. MD-tekniker kan användas för att studera biomolekylers dynamik och interaktioner på atomnivå genom att lösa Newton-ekvationen numeriskt. BD-tekniker löser biomolekylär dy- namik och reaktioner på ett förenklat sätt, i det här fallet genom att beskriva hela proteiner som en sfär. Konventionella BD-simuleringar blir ofta ineffektiva då huvuddelen av beräkn- ingstiden spenderas för att sprida partiklarna mot varandra innan någon reak- tion äger rum. Händelsedrivna algoritmer har visat sig vara flera tiopotenser snabbare än konventionella BD-algoritmer. Närvaron av diffusionsbegränsade reaktioner i biokemiska nätverk leder emellertid till flera återbindningar vid en reversibel reaktion som försämrar effektiviteten hos dessa typer av algoritmer. I artikel I modellerade vi en reversibel reaktion i kombination med diffusion för att räkna för flera återbindningar. Vi har implementerat en Green’s Func- tion Reaction Dynamics (GFRD) algoritm genom att använda den analytiska lösningen för den reversibla reaktionsdiffusionsekvationen. Vi har visat att algoritmernas prestanda är oberoende av antalet återbindningar. Vinsten i beräkningskraften försämras emellertid när det gäller simulering av trånga system, men med tanke på effekterna av de makromolekylära n på

37 diffusionskoefficient och kinetiska parametrar är kända, kan man implicit in- tegrera effekten av trängsel i BD-algoritmer genom att välja rätt parametrar. Därför är det fördelaktigt att förstå effekten av trängsel på atomnivå. Vi studerade effekten av trängsel på diffusiva egenskaper på atomnivå med MD-simuleringar. Våra resultat i artikel III betonar effekten av kemiska in- teraktioner på atomnivå på makromolekylers rörlighet. En utmaning vat att nya mot atomistiska fysiska modeller s.k. kraftfält behövs ta fram för simu- lering av makromolekyler i höga koncentrationer, då befintigla kraftfält leder till aggregering av proteiner vid hög koncentration. I artikel II undersökte vi scenarier baserade på försvagning och förstärkning av växelverkan mellan proteiner respektive mellan proteiner och vatten. I artikel IV byggde vi en cy- toplasmisk modell baserad på tillgängliga data på cytoplasma från Escherichia coli, på atomnivå. Denna modell kan beräknas i såväl tid som rum med MD- simuleringspaket GROMACS. Genom denna modell är det möjligt att studera strukturella och dynamiska egenskaper i cellmiljö vid en fysiologisk relevant koncentration.

38 7. Acknowledgments

This thesis wouldn’t be possible without the support of so many people during the last many years. I would like to thank my supervisors and advisors. David van der Spoel, without your support, patience and scientific skills, I would have been lost. Per Lötstedt, thank you for helping me to save three years of my work, without your help my work would have been lost. Sorin Tanase, thank you for all the tutoring. I would also like to thank you, Elisabeth Larsson, for being a great mentor to me, being there all the time and advising me patiently. I would like to thank you Alex, for being extremely patient during all these years. I always found right directions in your neat view to the problems. I hope you recognize me not being a PhD student. I would like to thank you, my little Sophie, without you the last stretch of time wouldn’t have been as delightful. I would like to thank my friends and colleagues in David’s group, Madeleine, Mohammad, Leandro, Nina, Natasha and Daniel. Thanks for sharing your ex- periences in science and life with me. I would like to thank my friends I met during my “residency” at BMC that made my life happy, Alexandre, Beat, Miha, Paul, Sudarsan, Livia, Rezvan, Xana, Prune, Cedric, Zhen, Hugo, Sebastian, Aleksandra, Klev, Gurkan, Petar, Klaudia, Mirthe and Meike. Madeleine, thank you for informing me on Swedish life style and upbring- ing of kids. I admire your steady, calm and consistent attitude. Silvana, thank you for all the discussions on books, movies, friendships and relationships. I admire your point of view. Sofia, thank you for always directing me towards all positive points in life. I admire your optimism. Katrin, thank you for all the walkings during “mammaledighet”. I admire your calm attitude as well as all your energy for biking all the time. Jevgenia, you are my very first friend in Uppsala since I started my PhD. You’ve seen all my ups and downs, I promise from now it will be more ups than downs. I admire your thorough and deep viewpoint. Shabnam, thank you for inspiring me with your attitude. I admire your energy. Showgy, you’ve talked to me through the difficulties and never stopped be- lieving in me. I am so grateful for all your time, advice and support. It means a lot to me. And you Anja, we have shared a lot during last couple of years. I don’t remember how we started to have a lot in common and subsequently discussed

39 a lot. You helped me a lot in many ways, I know you know it too, without your inspiring talks I wouldn’t have made it. I would like also to thank my friends I got to know since I came to Sweden. I wouldn’t have made the first two years in Sweden without having you here in Uppsala; Fatemeh, Amin and Ershad. I would like to thank my family for never doubting me and always asking when I am finishing my PhD, patiently. The question I had no answer to until recently. Thank you Maman, Baba, Arnavaz, Ziba and Sajjad.

40 References

[1] Mark James Abraham, Teemu Murtola, Roland Schulz, Szilárd Páll, Jeremy C. Smith, Berk Hess, and Erik Lindahl. Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 1-2:19–25, 2015. [2] Luciano A Abriata and Matteo Dal Peraro. Assessing the potential of atomistic molecular dynamics simulations to probe reversible protein-protein recognition and binding. Scientif. Rep., 5:10549, 2015. [3] M P Allen andDJTildesley. Computer Simulation of Liquids. Oxford Science Publications, Oxford, 2nd edition edition, 2017. [4] Tadashi Ando and Jeffrey Skolnick. Crowding and hydrodynamic interactions likely dominate in vivo macromolecular motion. Proc. Natl. Acad. Sci. U.S.A., 107(43):18457–18462, 2010. [5] Steven S Andrews and Dennis Bray. Stochastic simulation of chemical reactions with spatial resolution and single molecule detail. Phys. Biol., 1(3):137, 2004. [6] J.M. Aramini, R.L. Belote, C.T. Ciccosanti, M. Jiang, B. Rost, R. Nair, G.V.T. Swapna, T.B. Acton, R. Xiao, J.K. Everett, and G.T. Montelione. Solution NMR structure of an o6-methylguanine methyltransferase family protein from Vibrio parahaemolyticus. Northeast Structural Genomics Consortium target VpR247. To be published. [7] Satya Nanda Vel Arjunan and Masaru Tomita. A new multicompartmental reaction-diffusion modeling method links transient membrane attachment of e. coli mine to e-ring formation. Syst. Synth. Biol., 4(1):35–53, 2010. [8] Walter L Ash, Marian R Zlomislic, Eliud O Oloo, and D Peter Tieleman. Computer simulations of membrane proteins. Biochimica et Biophysica Acta (BBA)-Biomembranes, 1666(1-2):158–189, 2004. [9] Y S Badyal, M.-L. Saboungi, D L Price, S D Shastri, D R Haeffner, and A K Soper. Electron distribution in water. J. Chem. Phys., 112(21):9206–9208, 2000. [10] Kyle A. Beauchamp, Yu-Shan Lin, Rhiju Das, and Vijay S. Pande. Are protein force fields getting better? a systematic benchmark on 524 diverse nmr measurements. J. Chem. Theory Comput., 8(4):1409–1414, 2012. [11] H Bekker,HJCBerendsen, E J Dijkstra, S Achterop, R van Drunen, D van der Spoel, A Sijbers, H Keegstra, B Reitsma, andMKRRenardus. Gromacs: A Parallel Computer for Molecular Dynamics Simulations. In R A de Groot and J Nadrchal, editors, Physics Computing 92, pages 252–256, Singapore, 1993. World Scientific. [12] Marie-Claire Bellissent-Funel, Ali Hassanali, Martina Havenith, Richard Henchman, Peter Pohl, Fabio Sterpone, David van der Spoel, Yao Xu, and Angel E. Garcia. Water Determines the Structure and Dynamics of Proteins. Chem. Rev., 116:7673–7697, 2016.

41 [13] Bryson D Bennett, Elizabeth H Kimball, Melissa Gao, Robin Osterhout, Stephen J Van Dien, and Joshua D Rabinowitz. Absolute metabolite concentrations and implied active site occupancy in escherichia coli. Nat. Chem. Biol., 5(8):593–599, 2009. [14] Howard C Berg. Random walks in biology. Princeton University Press, 1993. [15] Robert B Best, Wenwei Zheng, and Jeetain Mittal. Balanced protein–water interactions improve properties of disordered proteins and non-specific protein association. J. Chem. Theory Comput., 10(11):5113–5124, 2014. [16] Dominique J Bicout and Martin J Field. Stochastic dynamics simulations of macromolecular diffusion in a model of the cytoplasm of escherichia coli. J. Phys. Chem., 100(7):2489–2497, 1996. [17] Martin Billeter, Gerhard Wagner, and Kurt Wüthrich. Solution nmr structure determination of proteins revisited. J. Biomol. NMR, 42(3):155–158, 2008. [18] Par Bjelkmar, Per Larsson, Michel A Cuendet, Berk Hess, and Erik Lindahl. Implementation of the CHARMM Force Field in GROMACS: Analysis of Protein Stability Effects from Correction Maps, Virtual Interaction Sites, and Water Models. J. Chem. Theory Comput., 6:459–466, 2010. [19] Max Born and Robert Oppenheimer. Zur quantentheorie der molekeln. Ann. Phys., 389(20):457–484, 1927. [20] Fred Brauer, Carlos Castillo-Chavez, and Carlos Castillo-Chavez. Mathematical models in population biology and epidemiology, volume 40. Springer, 2012. [21] R A Buckingham. The Classical Equation of State of Gaseous Helium, Neon and Argon. Proc. R. Soc. London Ser. A, 168:264–283, 1938. [22] Robert T Byrne, Huw T Jenkins, Daniel T Peters, Fiona Whelan, James Stowell, Naveed Aziz, Pavel Kasatsky, Marina V Rodnina, Eugene V Koonin, and Andrey L Konevega. Major reorientation of trna substrates defines specificity of dihydrouridine synthases. Proc. Natl. Acad. Sci. U.S.A., 112(19):6033–6037, 2015. [23] Iain D Campbell. The march of structural biology. Nat. Rev. Mol. Cell Biol., 3(5):377, 2002. [24] Javier Carrera, Raissa Estrela, Jing Luo, Navneet Rai, Athanasios Tsoukalas, and Ilias Tagkopoulos. An integrative, multi-scale, genome-wide model reveals the phenotypic landscape of escherichia coli. Mol. Syst. Biol., 10(7), 2014. [25] HS Carslaw and JC Jaeger. Conduction of heat in solids: Oxford Science Publications. Oxford, England, 1959. [26] Lian En Chai, Swee Kuan Loh, Swee Thing Low, Mohd Saberi Mohamad, Safaai Deris, and Zalmiyah Zakaria. A review on the computational approaches for gene regulatory network construction. Comput. Biol. Med., 48:55–65, 2014. [27] Lisa M Charlton, Christopher O Barnes, Conggang Li, Jillian Orans, Gregory B Young, and Gary J Pielak. Residue-level interrogation of macromolecular crowding effects on protein stability. J. Am. Chem. Soc., 130(21):6826–6830, 2008. [28] Frank C Collins and George E Kimball. Diffusion-controlled reaction rates. J. Sci., 4(4):425–437, 1949.

42 [29] International Human Genome Sequencing Consortium et al. Initial sequencing and analysis of the human genome. Nature, 409(6822):860, 2001. [30] Qiang Cui. Perspective: Quantum mechanical methods in and . J. Chem. Phys., 145(14):140901, 2016. [31] T Darden, D York, and L Pedersen. Particle mesh Ewald: An N-log(N) method for Ewald sums in large systems. J. Chem. Phys., 98:10089–10092, 1993. [32] Johan Elf and Måns Ehrenberg. Spontaneous separation of bi-stable biochemical systems into spatial domains of opposite phases. IEE P. Syst. Biol., 1(2):230–236, 2004. [33] R John Ellis. Macromolecular crowding: an important but neglected aspect of the intracellular environment. Curr. Opin. Struct. Biol., 11(1):114–119, 2001. [34] Michael B Elowitz, Michael G Surette, Pierre-Etienne Wolf, Jeffry B Stock, and Stanislas Leibler. Protein mobility in the cytoplasm of escherichia coli. J. Bacteriol., 181(1):197–203, 1999. [35] Stefan Engblom, Lars Ferm, Andreas Hellander, and Per Lötstedt. Simulation of stochastic reaction-diffusion processes on unstructured meshes. SIAM J. Sci. Comput., 31(3):1774–1797, 2009. [36] Stefan Engblom, Andreas Hellander, and Per Lötstedt. Multiscale simulation of stochastic reaction-diffusion networks. In Stochastic Processes, Multiscale Modeling, and Numerical Methods for Computational Cellular Biology, pages 55–79. Springer, 2017. [37] Radek Erban and S Jonathan Chapman. Stochastic modelling of reaction–diffusion processes: algorithms for bimolecular reactions. Phys. Biol., 6(4):046001, 2009. [38] U Essmann, L Perera, M L Berkowitz, T Darden, H Lee, and L G Pedersen. A Smooth Particle Mesh Ewald Method. J. Chem. Phys., 103:8577–8592, 1995. [39] David Fange, Otto G Berg, Paul Sjöberg, and Johan Elf. Stochastic reaction-diffusion kinetics in the microscopic limit. Proc. Natl. Acad. Sci. U.S.A., 2010. [40] David Fange and Johan Elf. Noise-induced min phenotypes in e. coli. PLoS Comput. Biol., 2(6):e80, 2006. [41] Michael Feig, Ryuhei Harada, Takaharu Mori, Isseki Yu, Koichi Takahashi, and Yuji Sugita. Complete atomistic model of a bacterial cytoplasm for integrating physics, biochemistry, and systems biology. J. Mol. Graph. Mod., 58:1–9, 2015. [42] Michael Feig and Yuji Sugita. Variable interactions between protein crowders and biomolecular solutes are important in understanding cellular crowding. J. Phys. Chem. B., 116(1):599–605, 2011. [43] Michael Feig and Yuji Sugita. Reaching new levels of realism in modeling biological macromolecules in cellular environments. J. Mol. Graph. Mod., 45:144–156, 2013. [44] Michael Feig, Isseki Yu, Po-hung Wang, Grzegorz Nawrocki, and Yuji Sugita. Crowding in cellular environments at an atomistic level from computer simulations. J. Phys. Chem. B., 121(34):8009–8025, 2017. PMID: 28666087. [45] P L Freddolino, A S Arkhipov, S B Larson, A McPherson, and K Schulten. Molecular dynamics simulations of the complete satellite tobacco mosaic virus. Structure, 14:437–449, 2006.

43 [46] Chetan Gadgil, Chang Hyeong Lee, and Hans G Othmer. A stochastic analysis of first-order reaction networks. Bull. Math. Biol., 67(5):901–946, 2005. [47] Mahipal Ganji, Margreet Docter, Stuart FJ Le Grice, and Elio A Abbondanzieri. Dna binding proteins explore multiple local configurations during docking via rapid rebinding. Nucleic Acids Res., 44(17):8376–8384, 2016. [48] CW Gardiner, KJ McNeil, DF Walls, and IS Matheson. Correlations in stochastic theories of chemical reactions. J. Stat. Phys., 14(4):307–331, 1976. [49] Daniel T Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys., 22(4):403–434, 1976. [50] Daniel T Gillespie. Markov processes: an introduction for physical scientists. Elsevier, 1991. [51] Daniel T Gillespie, Andreas Hellander, and Linda R Petzold. Perspective: Stochastic algorithms for chemical kinetics. J. Chem. Phys., 138(17):05B201_1, 2013. [52] Arthur P Goldberg, Balázs Szigeti, Yin Hoon Chew, John AP Sekar, Yosef D Roth, and Jonathan R Karr. Emerging whole-cell modeling principles and methods. Curr. Opin. Biotechnol., 51:97–102, 2018. [53] Andreas W Götz, Mark J Williamson, Dong Xu, Duncan Poole, Scott Le Grand, and Ross C Walker. Routine microsecond molecular dynamics simulations with amber on gpus. 1. generalized born. J. Chem. Theory Comput., 8(5):1542–1555, 2012. [54] J K Gregory, D C Clary, K Liu, M G Brown, and R J Saykally. The water dipole moment in water clusters. Science, 275:814–817, 1997. [55] C M Guldberg and P Waage. Studies concerning affinity. Forhandlinger: Videnskabs-Selskabet i Christian, 1864. [56] Ryuhei Harada, Yuji Sugita, and Michael Feig. Protein crowding affects hydration structure and dynamics. J. Am. Chem. Soc., 134(10):4842–4849, 2012. [57] Ryuhei Harada, Naoya Tochio, Takanori Kigawa, Yuji Sugita, and Michael Feig. Reduced native state stability in crowded cellular environment due to protein–protein interactions. J. Am. Chem. Soc., 135(9):3696–3701, 2013. [58] Edward Harder, Wolfgang Damm, Jon Maple, Chuanjie Wu, Mark Reboul, Jin Yu Xiang, Lingle Wang, Dmitry Lupyan, Markus K. Dahlgren, Jennifer L. Knight, Joseph W. Kaus, David S. Cerutti, Goran Krilov, William L. Jorgensen, Robert Abel, and Richard A. Friesner. Opls3: A force field providing broad coverage of drug-like small molecules and proteins. J. Chem. Theory Comput., 12:281–296, 2016. [59] Johan Hattne, David Fange, and Johan Elf. Stochastic reaction-diffusion simulation with mesord. Bioinformatics, 21(12):2923–2924, 2005. [60] Stefan Hellander, Andreas Hellander, and Linda Petzold. Reaction-diffusion master equation in the microscopic limit. Phys. Rev. E, 85(4):042901, 2012. [61] Michael O Hengartner. The biochemistry of apoptosis. Nature, 407(6805):770, 2000. [62] Christian Hensen, Johannes C Hermann, Kwangho Nam, Shuhua Ma, Jiali Gao, and Hans-Dieter Höltje. A combined qm/mm approach to protein- ligand

44 interactions: polarization effects of the hiv-1 protease on selected high affinity inhibitors. J. Med. Chem., 47(27):6673–6680, 2004. [63] Berk Hess, Carsten Kutzner, David van der Spoel, and Erik Lindahl. GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. J. Chem. Theory Comput., 4(3):435–447, mar 2008. [64] Jiang Hong and Lila M Gierasch. Macromolecular crowding remodels the energy landscape of a protein by favoring a more compact unfolded state. J. Am. Chem. Soc., 132(30):10445–10452, 2010. [65] V. Hornak, R. Abel, A. Okur, B. Strockbine, A. Roitberg, and C. Simmerling. Comparison of multiple amber force fields and development of improved protein backbone parameters. Proteins: Struct. Funct. Gen., 65:712–725, 2006. [66] Jing Huang, Sarah Rauscher, Grzegorz Nawrocki, Ting Ran, Michael Feig, Bert L de Groot, Helmut Grubmüller, and Alexander D MacKerell Jr. Charmm36m: an improved force field for folded and intrinsically disordered proteins. Nat. Methods, 14(1):71, 2016. [67] Wonpil Im, Jie Liang, Arthur Olson, Huan-Xiang Zhou, Sandor Vajda, and Ilya A Vakser. Challenges in structural approaches to cell modeling. J. Mol. Biol., 428(15):2943–2964, 2016. [68] COMSOL, Inc. Comsol multiphysics. Stockholm, Sweden. [69] The MathWorks, Inc. Matlab and statistics toolbox. Natick, Massachusetts, United States. [70] Wolfram Research, Inc. Mathematica, Version 12.0. Champaign, IL, 2019. [71] Samuel A Isaacson. The reaction-diffusion master equation as an asymptotic approximation of diffusion to a small target. SIAM J. Appl. Math., 70(1):77–111, 2009. [72] Jonathan R Karr, Jayodita C Sanghvi, Derek N Macklin, Miriam V Gutschow, Jared M Jacobs, Benjamin Bolival Jr, Nacyra Assad-Garcia, John I Glass, and Markus W Covert. A whole-cell computational model predicts phenotype from genotype. Cell. J., 150(2):389–401, 2012. [73] Jonathan R Karr, Koichi Takahashi, and Akira Funahashi. The principles of whole-cell modeling. Curr. Opin. Microbiol., 27:18–24, 2015. [74] Hans A Kestler, Christian Wawra, Barbara Kracher, and Michael Kühl. Network modeling of signal transduction: establishing the global view. BioEssays, 30(11-12):1110–1125, 2008. [75] Michael C Konopka, Irina A Shkel, Scott Cayley, M Thomas Record, and James C Weisshaar. Crowding and confinement effects on protein diffusion in vivo. J. Bacteriol., 188(17):6115–6123, 2006. [76] Yoshiki Kuramoto. Effects of diffusion on the fluctuations in open chemical systems. Prog. of Theor. Phys., 52(2):711–713, 1974. [77] Oliver F Lange, David van der Spoel, and Bert L de Groot. Scrutinizing Molecular Mechanics Force Fields on the Submicrosecond Timescale with NMR Data. Biophys. J., 99:647–655, 2010. [78] Daniel S D Larsson, Lars Liljas, and David van der Spoel. Virus capsid dissolution studied by microsecond molecular dynamics simulations. PLoS Comput. Biol., 8(5):e1002502, 2012. [79] Andrew Leach. Molecular Modelling: Principles and Applications (2nd

45 Edition). Prentice Hall, 2 edition, April 2001. [80] J E Lennard-Jones. On the Determination of Molecular Fields. Proc. Royal Soc. Lond. Ser. A, 106:463–477, 1924. [81] Michael Levitt and Ruth Sharon. Accurate simulation of protein dynamics in solution. Proc. Natl. Acad. Sci. U.S.A., 85(20):7557–7561, 1988. [82] Conggang Li, Yaqiang Wang, and Gary J Pielak. Translational and rotational diffusion of a small globular protein under crowded conditions. J. Phys. Chem. B., 113(40):13390–13392, 2009. [83] DV Lide. The 84th edition of the crc handbook of chemistry and physics, 2004. [84] Shneior Lifson and Arieh Warshel. Consistent force field for calculations of conformations, vibrational spectra, and enthalpies of cycloalkane and n-alkane molecules. J. Chem. Phys., 49(11):5116–5129, 1968. [85] Kresten Lindorff-Larsen, Paul Maragakis, Stefano Piana, Michael P Eastwood, Ron O Dror, and David E Shaw. Systematic validation of protein force fields against experimental data. PloS one, 7(2):e32131, 2012. [86] Kresten Lindorff-Larsen, Stefano Piana, Kim Palmo, Paul Maragakis, John L Klepeis, Ron O Dror, and David E Shaw. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins, 78:1950–1958, 2010. [87] Alfred J Lotka. Contribution to the theory of periodic reactions. J. Chem. Phys., 14(3):271–274, 1910. [88] Alfred James Lotka. Elements of Physical Biology. Williams & Wilkins Company, 1925. [89] Katherine Luby-Phelps. Cytoarchitecture and physical properties of cytoplasm: volume, viscosity, diffusion, intracellular surface area. Int. Rev. Cytol., 192:189–221, 1999. [90] A D MacKerell Jr., M Feig, and C L Brooks III. Extending the treatment of backbone energetics in protein force fields: limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations,. J. Comput. Chem., 25:1400–1415, 2004. [91] Derek N Macklin, Nicholas A Ruggero, and Markus W Covert. The future of whole-cell modeling. Curr. Opin. Biotechnol., 28:111–115, 2014. [92] Harley H McAdams and Adam Arkin. Stochastic mechanisms in gene expression. Proc. Natl. Acad. Sci. U.S.A., 94(3):814–819, 1997. [93] J Andrew McCammon, Bruce R Gelin, and Martin Karplus. Dynamics of folded proteins. Nature, 267(5612):585, 1977. [94] Sean R McGuffee and Adrian H Elcock. Diffusion, crowding & protein stability in a dynamic molecular model of the bacterial cytoplasm. PLos Comput. Biol., 6(3):e1000694, 2010. [95] Andrew C Miklos, Conggang Li, Naima G Sharaf, and Gary J Pielak. Volume exclusion and soft interaction effects on protein stability under crowded conditions. Biochemistry, 49(33):6984–6991, 2010. [96] Andrew C Miklos, Mohona Sarkar, Yaqiang Wang, and Gary J Pielak. Protein crowding tunes protein stability. J. Am. Chem. Soc., 133(18):7116–7120, 2011. [97] Mohamad Mohebifar, Erin R Johnson, and Christopher N Rowley. Evaluating force-field london dispersion coefficients using the exchange-hole dipole

46 moment model. J. Chem. Theory Comput., 13(12):6146–6157, 2017. [98] Philip M Morse. Diatomic molecules according to the wave mechanics. ii. vibrational levels. Phys. Rev., 34(1):57, 1929. [99] James Dickson Murray. Mathematical Biology I: An Introduction, volume I. Springer-Verlag„ 2003. [100] Grzegorz Nawrocki, Po-hung Wang, Isseki Yu, Yuji Sugita, and Michael Feig. Slow-down in diffusion in crowded protein solutions correlates with transient cluster formation. J. Phys. Chem. B., 121(49):11072–11084, 2017. [101] Paul S Nerenberg and Teresa Head-Gordon. New developments in force fields for biomolecular simulations. Curr. Opin. Struct. Biol., 49:129–138, 2018. [102] Gregoire Nicolis. Self-organization in nonequilibrium systems: From Dissipative Structures to Order through Fluctuations. Wiley, 1977. [103] Tomas Opplestrup, Vasily V Bulatov, George H Gilmer, Malvin H Kalos, and Babak Sadigh. First-passage monte carlo algorithm: diffusion without all the hops. Phys. Rev. Lett., 97(23):230602, 2006. [104] Jeffrey D Orth, Ines Thiele, and Bernhard Ø Palsson. What is flux balance analysis? Nat. Biotechnol., 28(3):245, 2010. [105] Drazen Petrov and Bojan Zagrovic. Are current atomistic force fields accurate enough to study proteins in crowded environments? PLoS Comput. Biol., 10(5):e1003638, 2014. [106] Rob Phillips, Julie Theriot, Jane Kondev, and Hernan Garcia. Physical biology of the cell. Garland Science, 2012. [107] Stefano Piana, Alexander G. Donchev, Paul Robustelli, and David E. Shaw. Water dispersion interactions strongly influence simulated structural properties of disordered protein states. J. Phys. Chem. B, 119(16):5113–5123, 2015. [108] J W Ponder and D A Case. Force fields for protein simulations. Adv. Prot. Chem., 66:27, 2003. [109] Arjun Raj and Alexander van Oudenaarden. Nature, nurture, or chance: stochastic gene expression and its consequences. Cell. J., 135(2):216–226, 2008. [110] Douglas Ridgway, Gordon Broderick, Ana Lopez-Campistrous, Melania Ru’aini, Philip Winter, Matthew Hamilton, Pierre Boulanger, Andriy Kovalenko, and Michael J Ellison. Coarse-grained molecular simulation of diffusion and reaction kinetics in a crowded virtual cytoplasm. Biophys. J., 94(10):3748–3759, 2008. [111] Sereina Riniker. Fixed-charge atomistic force fields for molecular dynamics simulations in the condensed phase: An overview. J. Chem. Inf. Model., 58:565–578, 2018. [112] Germán Rivas and Allen P Minton. Macromolecular crowding in vitro, in vivo, and in between. Trends Biochem. Sci., 41(11):970–981, 2016. [113] Danilo Roccatano, Andre Barthel, and Martin Zacharias. Structural flexibility of the nucleosome core particle at atomic resolution studied by molecular dynamics simulation. Biopolymers, 85(5-6):407–421, 2007. [114] Matthias Roos, Maria Ott, Marius Hofmann, Susanne Link, Ernst Rössler, Jochen Balbach, Alexey Krushelnitsky, and Kay Saalwächter. Coupling and decoupling of rotational and translational diffusion of proteins under crowding conditions. J. Am. Chem. Soc., 138:10365–10372, 2016.

47 [115] P. Rossi, J.A. Aramini, D. Hang, R. Xiao, T.B. Acton, and G.T. Montelione. Solution NMR structure of putative lipoprotein from pseudomonas syringae gene locus PSPTO2350. Northeast Structural Genomics Target PsR76A. To be published. [116] Andrej Šali and Tom L Blundell. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 234(3):779–815, 1993. [117] Romelia Salomon-Ferrer, Andreas W Götz, Duncan Poole, Scott Le Grand, and Ross C Walker. Routine microsecond molecular dynamics simulations with amber on gpus. 2. explicit solvent particle mesh ewald. J. Chem. Theory Comput., 9(9):3878–3888, 2013. [118] Mohona Sarkar, Conggang Li, and Gary J Pielak. Soft interactions and crowding. Biophys. Rev., 5(2):187–194, 2013. [119] Hans Martin Senn and Walter Thiel. Qm/mm methods for biomolecular systems. Angew. Chem., Int. Ed. Engl., 48(7):1198–1229, 2009. [120] Zach Serber and Volker Dötsch. In-cell nmr spectroscopy. Biochemistry, 40(48):14317–14323, 2001. [121] Yigong Shi. A glimpse of structural biology through x-ray crystallography. Cell. J., 159(5):995–1014, 2014. [122] ML Shuler, S Leung, and CC Dick. A mathematical model for the growth of a single bacterial cell. Ann. N. Y. Acad. Sci., 326(1):35–52, 1979. [123] Joel Smoller. Shock waves and reaction diffusion equations, volume 258. Springer Science & Business Media, 2012. [124] Thomas R Sokolowski, Joris Paijmans, Laurens Bossen, Thomas Miedema, Martijn Wehrens, Nils B Becker, Kazunari Kaizu, Koichi Takahashi, Marileen Dogterom, and Pieter Rein ten Wolde. egfrd in all dimensions. J. Chem. Phys., 150(5):054108, 2019. [125] Thomas R Sokolowski and Pieter Rein ten Wolde. Spatial-stochastic simulation of reaction-diffusion systems. arXiv preprint arXiv:1705.08669, 2017. [126] Balázs Szigeti, Yosef D Roth, John AP Sekar, Arthur P Goldberg, Saahith C Pochiraju, and Jonathan R Karr. A blueprint for human whole-cell modeling. Curr. Opin. Struct. Biol., 7:8–15, 2018. [127] Koichi Takahashi, Sorin Tanase-Nicola,˘ and Pieter Rein Ten Wolde. Spatio-temporal correlations can drastically change the response of a mapk pathway. Proc. Natl. Acad. Sci. U.S.A., 107(6):2473–2478, 2010. [128] Kouichi Takahashi, Satya Nanda Vel Arjunan, and Masaru Tomita. Space in systems biology of signaling pathways–towards intracellular molecular crowding in silico. FEBS letters, 579(8):1783–1788, 2005. [129] Ignacio Tinoco Jr and Jin-Der Wen. Simulation and analysis of single-ribosome translation. Phys. Biol., 6(2):025006, 2009. [130] M Tomita. Whole-cell simulation: a grand challenge of the 21st century. Trends Biotechnol., 19(6):205–210, 2001. [131] M Tomita, K Hashimoto, K Takahashi, T S Shimizu, Y Matsuzaki, F Miyoshi, K Saito, S Tanida, K Yugi,JCVenter, and C A Hutchison. E-CELL: software environment for whole-cell simulation. Bioinformatics, 15(1):72–84, 1999. [132] Alan Mathison Turing. The chemical basis of morphogenesis. Phil. Trans. R. Soc. Lond. B, 237(641):37–72, 1952.

48 [133] Bert van den Berg, Rachel Wain, Christopher M Dobson, and R John Ellis. Macromolecular crowding perturbs protein refolding kinetics: implications for folding inside the cell. EMBO. J., 19(15):3870–3875, 2000. [134] D van der Spoel andPJvanMaaren. The Origin of Layer Structure Artifacts in Simulations of Liquid Water. J. Chem. Theory Comput., 2:1–11, 2006. [135] W F van Gunsteren, D Bakowies, R Baron, I Chandrasekhar, M Christen, X Daura, P Gee, D P Geerke, A Glattli, P H Hunenberger, M A Kastenholz, C Ostenbrink, M Schenk, D Trzesniak,NFAvanderVegt,andHBYu. Biomolecular modeling: Goals, problems, perspectives. Angew. Chem., Int. Ed. Engl., 45:4064–4092, 2006. [136] Nicolaas Godfried Van Kampen. Stochastic processes in physics and chemistry, volume 1. Elsevier, 1992. [137] Jeroen S van Zon and Pieter Rein Ten Wolde. Green’s-function reaction dynamics: a particle-based approach for simulating biochemical networks in time and space. J. Chem. Phys., 123(23):234910, 2005. [138] Jeroen S van Zon and Pieter Rein Ten Wolde. Simulating biochemical networks at the particle level and in time and space: Green’s function reaction dynamics. Phys. Rev. Lett., 94(12):128103, 2005. [139] J Craig Venter, Mark D Adams, Eugene W Myers, Peter W Li, Richard J Mural, Granger G Sutton, Hamilton O Smith, Mark Yandell, Cheryl A Evans, and Robert A Holt. The sequence of the human genome. Sci. Signal., 291(5507):1304, 2001. [140] S Vijay-Kumar, C E Bugg, and W J Cook. Structure of Ubiquitin Refined at 1.8 Å Resolution. J. Mol. Biol., 194:531–544, 1987. [141] Vito Volterra. Variazioni e fluttuazioni del numero d’individui in specie animali conviventi. C. Ferrari, 1927. [142] J. Wang, P. Cieplak, and P. A. Kollman. How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? J. Comput. Chem., 21:1049–1074, 2000. [143] Siyang Wang, Johan Elf, Stefan Hellander, and Per Lötstedt. Stochastic reaction–diffusion processes with embedded lower-dimensional structures. Bull. Math. Biol., 76(4):819–853, 2014. [144] Yaqiang Wang, Laura A Benton, Vishavpreet Singh, and Gary J Pielak. Disordered protein diffusion under crowded conditions. J. Phys. Chem. Lett., 3(18):2703–2706, 2012. [145] Yaqiang Wang, Conggang Li, and Gary J Pielak. Effects of proteins on protein diffusion. J. Am. Chem. Soc., 132(27):9392–9397, 2010. [146] Yaqiang Wang, Mohona Sarkar, Austin E Smith, Alexander S Krois, and Gary J Pielak. Macromolecular crowding and protein stability. J. Am. Chem. Soc., 134(40):16614–16618, 2012. [147] A Warshel and M Levitt. Theoretical studies of enzymic reactions - dielectric, electrostatic and steric stabilization of carbonium-ion in reaction of lysozyme. J. Mol. Biol., 103:227–249, 1976. [148] Christian L. Wennberg, Teemu Murtola, Berk Hess, and Erik Lindahl. Lennard-jones lattice summation in bilayer simulations has critical effects on surface tension and lipid properties. J. Chem. Theory Comput., 9:3527–3537,

49 2013. [149] Ludwig Wilhelmy. Ueber das gesetz, nach welchem die einwirkung der säuren auf den rohrzucker stattfindet. Ann. Phys., 157(12):499–526, 1850. [150] Isseki Yu, Takaharu Mori, Tadashi Ando, Ryuhei Harada, Jaewoon Jung, Yuji Sugita, and Michael Feig. Biomolecular interactions modulate macromolecular structure and dynamics in atomistic model of a bacterial cytoplasm. eLife, 5, 2016. [151] Huan-Xiang Zhou, Germán Rivas, and Allen P Minton. Macromolecular crowding and confinement: biochemical, biophysical, and potential physiological consequences. Annu. Rev. Biophys., 37:375–397, 2008. [152] Steven B Zimmerman and Allen P Minton. Macromolecular crowding: biochemical, biophysical, and physiological consequences. Annu. Rev. Biophys. Biomol. Struct., 22(1):27–65, 1993. [153] Steven B Zimmerman and Stefan O Trach. Estimation of concentrations and excluded volume effects for the cytoplasm of escherichia coli. J. Mol. Biol., 222(3):599–620, 1991.

50

Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1871 Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-395119 2019