List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I. Larsson, D. S. D., Liljas, L., van der Spoel, D. (2012) Dissolution Studied by Microsecond Molecular Dynamics Simulations. PLoS Comput. Biol., in press

II. Larsson, D. S. D., van der Spoel, D. (2012) Screening for the Location of RNA Using the Chloride Ion Distribution in Simulations of Virus . J. Chem. Theory Comput., submitted

III. Larsson, D. S. D.¹, Wang, Y.¹, van der Spoel, D. (2009) Encapsulation of Myoglobin in a Cetyl Trimethylammonium Bromide Micelle in Vacuo: A Simulation Study. Biochemistry, 48(5):1006-15

IV. Marklund, E. G., Larsson, D. S. D., van der Spoel, D., Patriksson, A., Caleman, C. (2009) Structural Stability of Electrosprayed Proteins: Temperature and Hydration Effects. Phys. Chem. Chem. Phys., 11(36):8069-78

V. Friemann, R., Larsson, D. S. D., Wang, Y., van der Spoel, D. (2009) Molecular Dynamics Simulations of a Membrane -Micelle Complex in Vacuo. J. Am. Chem. Soc., 131(46):16606-7

VI. van der Spoel, D., Marklund, E., Larsson, D. S. D., Caleman, C. (2011) , Lipids, and Water in the Gas Phase. Macromol. Biosci., 11(1):50-59

VII. Wang, F., Weckert, E., Ziaja, B., Larsson, D. S. D., van der Spoel, D. (2011) Coherent Diffraction of a Single Virus Particle: The Impact of a Water Layer on the Available Orientational Information. Phys. Rev. E, 83(3):031907

Reprints were made with permission from the publishers.

¹Shared first author.

Additional Publications

VIII. Spångberg, D., Larsson, D. S. D., van der Spoel, D. (2011) Trajectory NG - Portable, Compressed, General Molecular Dynamics Trajectories. J. Mol. Model., 17(10):2669-2685

IX. Brindefalk, B., Viklund, J., Larsson, D., Thollesson, M., Andersson, S. G. E. (2007) Origin and Evolution of the Mitochondrial Aminoacyl-tRNA Synthetases. Mol. Biol. Evol., 24(3):743-56

Abbreviations

AU        asymmetric unit
CP        coat protein
cryo-EM   cryo-electron microscopy
dsDNA     double-stranded deoxyribonucleic acid
dsRNA     double-stranded ribonucleic acid
EM        electron microscopy
ESI       electrospray ionization
MD        molecular dynamics
MS        mass spectrometry
PDB       Protein Data Bank
ssDNA     single-stranded deoxyribonucleic acid
ssRNA     single-stranded ribonucleic acid

Viruses

α3        Enterobacteria phage alpha3
BPMV      bean pod mottle virus
BBV       black beetle virus
CCMV      cowpea chlorotic mottle virus
CGMV      cucumber green mottle mosaic virus
CPV       canine parvovirus
DYMV      Desmodium yellow mottle virus
φX174     Enterobacteria phage phiX174
HIV       human immunodeficiency virus
MS2       Enterobacteria phage MS2
MVM       minute virus of mice
FHV       Flock House virus
PaV       Pariacoto virus
PhMV      physalis mottle virus
PrV       Providence virus
TBSV      tomato bushy stunt virus
TNSV      See STNV
TMSV      See STMV
TMV       tobacco mosaic virus
TAV       tomato aspermy virus
TYMV      turnip yellow mosaic virus
RMV       ribgrass mosaic virus
STNV      satellite tobacco necrosis virus
STMV      satellite tobacco mosaic virus

Contents

1 Introduction
    1.1 Outline of the papers
    1.2 Overarching aim

2 Computational methods to study biology
    2.1 Space and time
    2.2 A brief introduction to molecular dynamics
    2.3 Large scale simulations

3 Virus structural biology
    3.1 General characterization of viruses
    3.2 Virus structure

4 Protein structure in the gas phase
    4.1 Structure and function
    4.2 Structure determination methods
    4.3 Aerosolization

5 Summary of papers and conclusions
    5.1 Virus simulations
    5.2 Gas phase proteins structures
    5.3 Determining orientation of a water-covered virus in vacuum
    5.4 Future perspectives

6 Summary in Swedish/Sammanfattning på svenska

7 Acknowledgments

References

1. Introduction

Increasing our understanding of biology is an enterprise in learning more about ourselves. Modern biology has a firm foundation in the theory of natural selection by Charles Darwin (1809–1882), and all aspects of biology have to be studied in the light of evolution. The motor that currently drives the field forward is the advances made in molecular biology. Great strides in the development of instruments and techniques enable completely new types of questions to be answered and give us further insight into the properties of, and the interplay between, the macromolecules of the cell: proteins, nucleic acids, lipids and sugars [1].

However, all experimental methods have limitations, and this is where computational methods can be a powerful complement. Modeling and simulations can be used to study aspects that cannot be approached with experimental methods and can be used to predict, synthesize, interpret and extrapolate results. By combining experiments with theoretical models and powerful computational methods, we can advance beyond what is feasible to merely measure and observe.

1.1 Outline of the papers

The capsids of many viruses are well characterized through X-ray crystallography experiments. This is a powerful method to study the average structure, but the dynamics and the processes of assembly and disassembly are difficult to address. Viruses need a strong and stable capsid to protect the sensitive genome, but must also be able to disassemble and release the genomic content upon infection. Fulfilling both of these requirements presents a difficult design challenge, which viruses have been able to solve. Paper I looks at how STNV regulates the switching between two states.

Although the structure of the coat protein (CP) of many viruses is known, not much information is available about the structure of virus genomes. Crystal structures do not reveal much about the internal organization of virions unless the RNA/DNA adheres to the icosahedral symmetry of the outside. In paper II we investigate how much RNA packing information is encoded in the structure of the two virus capsids of STNV and STMV.

Several analytical methods used for analysis of protein structure and dynamics have to be performed in container-free environments, so the sample needs to be aerosolized. Examples are mass spectrometry (MS) [2] and a novel diffraction imaging technique that uses pulsed X-ray lasers [3, 4]. It is therefore important to ensure the structural integrity of the macromolecules under the conditions required for these experiments. Papers IV and VI address questions about the evaporative cooling and dehydration processes when proteins are brought from solution into the gas phase. Molecular layers of water can act as hydrating agents, but also as a sacrificial tamper against radiation damage in diffractive experiments. In papers III, V and VI we look into the organization of proteins together with lipids and detergent molecules, based on the idea of providing an encapsulation for proteins upon aerosolization.

Structural studies using X-ray lasers are currently focused on large reproducible objects, such as viruses [4] and nanocrystals [5, 6]. In paper VII the impact of a water layer on the possibility to reconstruct a virus structure was investigated.

1.2 Overarching aim

The aim of the work in this thesis is to use molecular dynamics (MD) to study aspects of molecular biology that are difficult or impossible to study by other means. Knowing the structure of proteins and macromolecular assemblies in the gas phase is important for interpreting experimental data. The dynamics and structure of expanded virus capsids have evaded characterization attempts, as has the structure of virus genomes. If we can understand the molecular basis of viruses, we are better equipped to combat disease-inducing viruses using a structural approach. Viruses and virus-like structures are also interesting platforms for the development of industrial applications in nanotechnology. Furthermore, a goal of this work has been to explore and advance the limits of simulating large systems for extended time periods.

2. Computational methods to study biology

Molecular biology rests on a foundation of chemistry and physics, which in turn can be described mathematically. The theoretical basis of the methods presented in this thesis stems from the classical physics of Sir Isaac Newton (1642–1727). The models used are conceptually simple, with a limited number of parameters that have been tuned to reproduce key experimental data. The quality of present-day models in tandem with powerful computer resources makes it possible to extrapolate the experimental observations with predictive power.

2.1 Space and time

Biological systems present a grand challenge for physicists to model. Almost all non-trivial biological systems span vast regimes in both the space and the time domain (Fig. 2.1). The relevant range in sizes goes all the way from single atoms, on the order of Ångströms (10⁻¹⁰ m), to macroscopic objects, such as ourselves, that can be measured using a tape measure (10⁰ m). In between these two extremes we find a wide array of important objects. Most globular proteins, such as many important enzymes, are on the order of a few nanometers (10⁻⁹ m). Spherical viruses can have a diameter of anywhere between 15 and 500 nanometers (10⁻⁸–10⁻⁷ m). Prokaryotic cells are often represented by the model species Escherichia coli, which is bean shaped with a length of a few micrometers (10⁻⁶ m). The size of organelle-equipped eukaryotic cells spans a wide range, but is usually several orders of magnitude larger than that of prokaryotes.

The range in the time domain is even greater than in space. The longest relevant time is that of evolutionary processes, which can take millions of years, but when studying molecular phenomena, the upper limit of relevant times can be considered that of protein synthesis, which happens on the order of minutes [7]. The fastest folding proteins can adopt their native conformation within tenths of milliseconds (10⁻⁴ s) [8], while peptides and secondary structure elements can fold in less than a microsecond (10⁻⁷ s) [9]. Collective motions in proteins happen over nanoseconds (10⁻⁹ s) [10], while the polar interactions between individual water molecules form and break on a time scale of about 2 picoseconds (10⁻¹² s) [11]. The fastest vibrations of atomic bonds occur on the order of 10 femtoseconds (10⁻¹⁴ s) at room temperature [12].

[Figure 2.1: diagram placing biological objects on a size axis (atom, protein, virus, prokaryotic cell, eukaryotic cell, multicellular organism; 10⁻¹² m to 10⁰ m) and processes on a time axis (bond vibration, hydrogen bond life-time, protein domain movements, protein folding, protein synthesis; 10⁻¹⁵ s to 10⁰ s).]

Figure 2.1. The vast size and time domains of central biological concepts make them a challenge to model computationally.

Modeling atomistic representations of biological systems over biologically significant time spans is challenging, but advances in computer technology have made it possible to simulate larger systems for longer times than ever before. With regard to size, supercomputers with large numbers of compute nodes working in parallel have enabled short simulations of systems with 100 million atoms [13]. The time aspect has been tackled by using the special-purpose computer Anton, which has the simulation algorithms implemented in hardware, to obtain millisecond-long trajectories of small proteins folding and unfolding [14].

The key to successfully performing large scale simulations lies in developing software that can harness the power of the computers. In paper I we have studied a virus system with 1.2 million atoms on the microsecond scale, pushing the limit of simulating large systems over long time periods. This study was made possible by the GROMACS simulation engine [15], an open-source, world-leading software package for simulation of biomolecules.

2.2 A brief introduction to molecular dynamics

Any type of modeling requires simplifications and generalizations to be made. For the study of molecular biology, the granularity level of atoms seems to be a well-balanced trade-off, yielding an accurate representation while still keeping a simple theoretical model with a limited number of tunable parameters. This strengthens the predictive power of the model.

In molecular dynamics [16], the movements of the atoms are determined by a set of mathematical equations and corresponding parameters, usually referred to as a force-field [17], that describe how the atoms interact with each other. The positions of the atoms therefore define an energy landscape that changes as the atoms move. To calculate the time series of atomic movements, Newton's second law of motion, $\vec{F} = m\vec{a}$, is applied and integrated for all atoms in a series of short time steps. The result is a deterministic trajectory in the phase space of the system. From this information many important statistical variables can be derived and compared to experimental observables.
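As a minimal sketch of how Newton's second law is integrated in short time steps, the following Python snippet propagates a single particle on a harmonic spring with the leap-frog scheme, a standard integration algorithm. It is an illustration only and not the GROMACS implementation; the function name `leap_frog`, the reduced units and the spring parameters are assumptions made for this example.

```python
import math

def leap_frog(x0, v0, force, mass, dt, n_steps):
    """Integrate F = m*a with the leap-frog scheme; return the positions."""
    x = x0
    # Leap-frog keeps velocities at half steps: shift v0 back by dt/2.
    v_half = v0 - 0.5 * dt * force(x0) / mass
    trajectory = [x]
    for _ in range(n_steps):
        v_half += dt * force(x) / mass   # v(t + dt/2) = v(t - dt/2) + a(t) dt
        x += dt * v_half                 # x(t + dt) = x(t) + v(t + dt/2) dt
        trajectory.append(x)
    return trajectory

# Hypothetical test system in reduced units: spring constant 1, mass 1.
k, m = 1.0, 1.0
period = 2.0 * math.pi * math.sqrt(m / k)
dt = period / 50.0                       # 50 steps per oscillation period
traj = leap_frog(x0=1.0, v0=0.0, force=lambda x: -k * x,
                 mass=m, dt=dt, n_steps=50)
```

After one full period the particle should be back close to its starting position, which is a quick way to check that the chosen time step is short enough for this toy system.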

Force fields

The most popular force-fields used for simulations of proteins and other biological macromolecules are AMBER [18], CHARMM [19] and OPLS/AA [20], as well as derivatives of these. They are built up from distance- and angle-based interactions, which are easily described mathematically. Atoms are modeled as soft spheres with a central point charge. The stiffness of the spheres is described by Lennard-Jones potentials (two-term polynomials where the terms are proportional to the inverse 6th and 12th powers of the radial distance) that represent the van der Waals interactions. The charge repulsions and attractions are described by Coulomb's law, in which the interaction energy is inversely proportional to the distance between the point charges. The atoms are linked together in networks, interconnected by harmonic springs. Harmonic springs also describe the bond angle vibrations, while the rotations around bonds are described by trigonometric expressions.
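The energy terms described above can be sketched as four small Python functions. This is a hedged illustration of the general functional forms, not the parameterization of AMBER, CHARMM or OPLS/AA: the 4ε form of the Lennard-Jones potential, the cosine form of the dihedral term, the function names and the approximate conversion factor `ke` (in kJ mol⁻¹ nm e⁻²) are common conventions assumed here.

```python
import math

def lennard_jones(r, epsilon, sigma):
    """Van der Waals term with inverse 12th- and 6th-power contributions."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, q1, q2, ke=138.935):
    """Electrostatic energy of two point charges; proportional to 1/r.
    ke is an approximate conversion factor (assumed unit system)."""
    return ke * q1 * q2 / r

def harmonic(x, x0, k):
    """Harmonic spring, used for both bond lengths and bond angles."""
    return 0.5 * k * (x - x0) ** 2

def periodic_dihedral(phi, k, multiplicity, phi0):
    """Trigonometric term for the rotation around a bond."""
    return k * (1.0 + math.cos(multiplicity * phi - phi0))
```

With these definitions the Lennard-Jones energy is zero at r = σ and has its minimum of −ε at r = 2^(1/6) σ, while doubling the distance halves the Coulomb energy.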

2.3 Large scale simulations

It is difficult to calculate trajectories for large systems over extended periods of time. To simplify the problem, it is important to reduce the number of interactions that have to be calculated. The simulation system for a globular object can be kept minimal by using a sphere-like simulation box, such as a rhombic dodecahedron, and applying periodic boundary conditions. Long-range electrostatic interactions beyond a cut-off of about 1 nm can be calculated against a mesh representation of the charge distribution, which reduces an O(n²) problem to O(n log n) [21]. Long-range van der Waals interactions (beyond the cut-off) can be corrected analytically.
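To illustrate why a sphere-like box keeps the system small, the sketch below compares the volume of a cubic box with that of a rhombic dodecahedron that has the same distance d between periodic images. The geometric factor √2/2 for the rhombic dodecahedron is a standard result; the function name and the value of d are assumptions chosen for the example.

```python
import math

def box_volume(shape, d):
    """Volume of a periodic unit cell whose nearest images are d apart."""
    factors = {
        "cubic": 1.0,
        "rhombic_dodecahedron": math.sqrt(2.0) / 2.0,
    }
    return factors[shape] * d ** 3

d = 20.0  # hypothetical image distance in nm
saving = 1.0 - box_volume("rhombic_dodecahedron", d) / box_volume("cubic", d)
```

For the same image distance, the rhombic dodecahedron needs roughly 29% less volume, and therefore correspondingly fewer solvent molecules, than the cube.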

Parallelization and scaling

To perform large MD simulations, the problem has to be parallelized. Supercomputers used for this purpose consist of many individual computers (nodes) connected by a high-speed network. MD simulations are by nature fairly easy to divide into smaller problems, and each subproblem can therefore be dispatched to a separate node. An increase in the size of the system can be counteracted by adding more nodes (called weak scaling). However, if the system size does not change, adding more nodes (called strong scaling) will not improve the performance beyond a certain point. In the limit where each node only calculates one interaction, it is not possible to subdivide the problem any further. The practical limit, however, is rather determined by the access to shared resources, such as the file storage or the network bandwidth. In MD simulations, the bottleneck is usually the communication between nodes when calculating long-range interactions and other global properties.

An increase in the system size can thus be compensated for by adding more compute nodes, but extending the simulation time is more difficult. Since each time step depends on the previous one, consecutive steps must be calculated in series. Therefore, the speed of the individual nodes imposes an upper limit on the number of time steps that can be calculated.
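The saturation described above can be illustrated with a deliberately simple toy model, in which the time per MD step is the perfectly divisible compute work plus a fixed communication cost. The model, the function names and the cost parameters `t_atom` and `t_comm` are assumptions for illustration only; they are not measurements from any real machine.

```python
def step_time(n_atoms, n_nodes, t_atom=1.0e-6, t_comm=1.0e-4):
    """Toy wall-clock time per MD step (seconds): divisible compute work
    plus a fixed communication cost once more than one node is used."""
    compute = n_atoms * t_atom / n_nodes
    comm = t_comm if n_nodes > 1 else 0.0
    return compute + comm

def fixed_size_speedup(n_atoms, n_nodes):
    """Speed-up for an unchanged system when going from 1 to n_nodes nodes."""
    return step_time(n_atoms, 1) / step_time(n_atoms, n_nodes)

def scaled_size_efficiency(atoms_per_node, n_nodes):
    """Efficiency when the system grows in proportion to the node count."""
    return step_time(atoms_per_node, 1) / step_time(atoms_per_node * n_nodes, n_nodes)
```

In this model the speed-up for a fixed system size levels off at the ratio between the single-node step time and the communication cost, no matter how many nodes are added, whereas a system that grows with the node count keeps a nearly constant efficiency.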

The integration time step

The total length of a trajectory depends on both the number and the length of the integration steps. When deciding that we require an atomistic model, we are anchored in the lower end of the time domain by the fastest vibrational modes. Accurate integration of the forces into velocities requires a short time step, at most 1/5th of the period of the fastest vibration [22]. Longer time steps can lead to poor conservation of the energy as well as of the linear momentum. They can also lead to unstable systems due to numerical overflow if atoms move on top of each other.

The limiting factor in most force-fields is the fast vibrational motion between bonded atoms. The spring constants of the bonds are very large relative to the other interactions in biomolecular systems. Bonds involving hydrogens are especially fast, since the vibration frequency decreases with increasing mass of the atoms (it is inversely proportional to the square root of the reduced mass). Deuterating the system, i.e. replacing all hydrogens with heavy hydrogens and thereby doubling the mass, reduces the frequency of the vibrations. When this is done in experiments, however, it has detrimental effects on the kinetics of metabolic processes in vivo [23].

Instead, in most large scale simulations of biological systems, the bond lengths are constrained to the equilibrium distance, meaning that after updating all the atomic positions, the atoms are rearranged to keep bonded atoms at constant separation. Constraining bonds to hydrogen atoms allows an integration time step of about 1 fs using standard integration algorithms, such as leap-frog. By also constraining all other bond lengths in the system, it is possible to use a time step of about 2 fs [24].

Analysis of the vibrational frequencies shows that the next set of fast vibrations are the angular vibrations of hydrogen atoms. By completely constraining the position of the hydrogen atoms relative to the heavy atoms, the time step can be increased to at least 4–5 fs [12]. This can be done by adding an improper constraint (i.e. between non-bonded atoms) between the hydrogen and an additional heavy atom. The bond angles in methyl and amine groups can be locked, while still allowing the entire group to rotate freely, by replacing the mass of the entire functional group with two interaction-less dummy masses. The original atoms become mass-less virtual sites, whose positions are determined relative to the dummy masses. The dummy masses and positions are chosen such that the total mass and the moment of inertia of the group remain the same. Forces on the virtual sites are mapped onto the dummy masses [12].
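The mass dependence of the bond vibrations discussed in this section can be made concrete with a short calculation for a harmonic bond. The sketch below is an illustration under stated assumptions: the force constant `K_CH` is an assumed, typical force-field value for a carbon-hydrogen bond, the masses are standard atomic masses, and the function name is hypothetical.

```python
import math

def vibration_period_fs(k, m1, m2):
    """Period (fs) of a harmonic bond with force constant k
    (kJ mol^-1 nm^-2) between masses m1 and m2 (atomic mass units)."""
    mu = m1 * m2 / (m1 + m2)                # reduced mass (u)
    omega = math.sqrt(k / mu)               # rad/ps, as 1 kJ/mol = 1 u nm^2 ps^-2
    return 2.0 * math.pi / omega * 1000.0   # convert ps to fs

K_CH = 284512.0   # assumed C-H force constant, kJ mol^-1 nm^-2
M_C, M_H, M_D = 12.011, 1.008, 2.014

period_ch = vibration_period_fs(K_CH, M_C, M_H)   # carbon-hydrogen bond
period_cd = vibration_period_fs(K_CH, M_C, M_D)   # carbon-deuterium bond
slowdown = period_cd / period_ch
```

With these assumed parameters the carbon-hydrogen period comes out at roughly 11 fs, in line with the order of 10 fs quoted for the fastest bond vibrations, and deuteration lengthens the period by a factor somewhat below √2 because the carbon atom also moves.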

Large-scale parallelization enabled us to simulate 1.2 million atoms for one billion integration time steps. The use of virtual sites allowed us to take time steps of 4 femtoseconds, resulting in a total of 4 microseconds of simulation time (paper I).
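The arithmetic behind these numbers is simple enough to check in a few lines of Python. The helper names are hypothetical; the step count and time step are the ones quoted above.

```python
FS_PER_US = 1.0e9  # femtoseconds per microsecond

def simulated_time_us(n_steps, dt_fs):
    """Total simulated time in microseconds."""
    return n_steps * dt_fs / FS_PER_US

def steps_needed(target_us, dt_fs):
    """Number of integration steps required to reach a target time."""
    return round(target_us * FS_PER_US / dt_fs)

total_us = simulated_time_us(10**9, 4.0)      # one billion steps of 4 fs
steps_with_2fs = steps_needed(total_us, 2.0)  # same time with 2 fs steps
```

One billion steps of 4 fs give 4 microseconds, and the same trajectory length with a 2 fs time step would require twice as many steps.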

3. Virus structural biology

3.1 General characterization of viruses

Viruses constitute a large and diverse group of life-like molecular assemblies [25]. Fundamentally, a virus is a packaged set of genes with the ability to infect a cellular host and redirect its metabolism into producing copies of the virus. Viruses balance on the limit of what can be considered life, but they fail to display some of the properties that are normally considered necessary [26]. With the discovery of the enormous mimiviruses, which display many of the traits associated with life, the line between viruses and other obligate intracellular parasites is nevertheless becoming more diffuse [27].

Sequencing of environmental samples, so-called metagenomic studies, has shown that viruses are the most abundant biological entities on the planet, outnumbering cells by a wide margin [28]. One estimate puts the number of marine virus plankton at 10³⁰ [29].

Viruses cause many widespread or severe diseases (e.g., AIDS, influenza, hepatitis and the common cold), as well as economically important crop and livestock diseases (e.g., foot-and-mouth disease and tobacco mosaic disease).

Viruses are also important model systems: most early genetic studies were made on virus genomes; virus polymerases and promoter sequences are valuable tools in biochemistry; and viruses can be utilized to modify the genomes of cell cultures through transduction.

3.1.1 Virus evolution

The origin of viruses is unclear and debated. The phylogenetic tree of cellular life is based on the similarity in a small set of core genes that are central for the cell metabolism; in particular, the gene coding for the 16S rRNA of the small ribosome subunit has been pivotal in defining the three domains: bacteria, archaea and eukarya [30]. However, no gene exists that is common to all viruses and that can be used to build a phylogeny, although the capsid proteins of many icosahedral viruses share a common fold, the so-called jelly roll fold (Fig. 3.1) [31]. The ubiquity of this feature has inspired theories of an ancient common ancestry for all viruses [32], since the fold of homologous proteins is conserved to a much greater extent than the amino acid sequence.

[Figure 3.1: topology diagram of the jelly roll fold with the strands labeled βB to βI and the termini marked Nt and Ct.]

Figure 3.1. The secondary structure of the shell domain of STNV showcasing the jelly roll fold of virus capsid proteins. The eight-stranded anti-parallel β-barrel is the central feature of the fold.

Perhaps a more likely scenario is that viruses can be seen as genes that have escaped from the cell. Many proteins form capsid-like structures (e.g., carboxysomes and microtubules). A hypothetical route for the genesis of a virus is a mutation that enables a protein to aggregate into a structure that captures its own mRNA and escapes from the cell.

Viruses evolve rapidly. Parasites in general tend to maintain a high mutation rate, which seems to help them evade the defense mechanisms of the host. The Red Queen hypothesis [33, 34] states that in the arms race between the host and the parasite "it takes all the running you can do, to keep in the same place"¹. Viruses often rely on fast, but less accurate, polymerases for the replication of their genome. The lack of proof-reading by these polymerases makes them more error-prone. Most mutations are detrimental for the function, but the large population of viruses guarantees that some of the new particles are identical to, or even improved versions of, the original.

Viruses also frequently acquire genes as fragments from the host genome or from co-infecting viruses. Exceptionally pathogenic strains of the seasonal flu (such as the Spanish flu in 1918) emerge when a virus obtains genetic fragments from another strain. This can be seen as a parallel to sexual reproduction in higher organisms.

¹Adapted from the novel "Through the Looking-Glass" by Lewis Carroll.

3.1.2 Virus classification

Viruses are diverse and heterogeneous, both in shape and in the way they function. There are many ways of categorizing them based on a number of different properties. The most important factors are the type and organization of the genome, the shape and composition of the capsid, the presence or absence of a membrane envelope, the host range and different life-cycle aspects.

[Figure 3.2: diagram of the Baltimore classes and their routes to mRNA: I dsDNA, II +ssDNA, III dsRNA, IV +ssRNA, V −ssRNA, VI +ssRNA (RT), VII dsDNA (RT).]

Figure 3.2. The Baltimore classification of viruses is based on the type of genome and how the mRNA is synthesized. Classes VI and VII are dependent on reverse transcriptase at some stage of the life-cycle.

The Baltimore classification is based on how the virus genome is encoded and how the virus synthesizes mRNA [35]. The genome can be RNA or DNA based, single- or double-stranded, and of positive or negative sense (i.e. the sequence is identical or complementary to the mRNA), and the synthesis can involve steps where reverse transcriptase polymerizes DNA from RNA templates (Fig. 3.2). The class IV viruses, which have a positive-sense single-stranded RNA (ssRNA) genome, can also directly use the genomic RNA for protein synthesis.
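The seven Baltimore classes can be written down as a small lookup table. The sketch below is only a convenient summary of the classification; the dictionary layout, the field names and the helper functions are hypothetical choices made for this example.

```python
# Baltimore classes: genome type and whether reverse transcription is involved.
BALTIMORE_CLASSES = {
    "I":   {"genome": "dsDNA",  "reverse_transcription": False},
    "II":  {"genome": "+ssDNA", "reverse_transcription": False},
    "III": {"genome": "dsRNA",  "reverse_transcription": False},
    "IV":  {"genome": "+ssRNA", "reverse_transcription": False},
    "V":   {"genome": "-ssRNA", "reverse_transcription": False},
    "VI":  {"genome": "+ssRNA", "reverse_transcription": True},
    "VII": {"genome": "dsDNA",  "reverse_transcription": True},
}

def genome_usable_as_mrna(baltimore_class):
    """True only for class IV, whose genomic RNA can be translated directly."""
    entry = BALTIMORE_CLASSES[baltimore_class]
    return entry["genome"] == "+ssRNA" and not entry["reverse_transcription"]

def classes_using_reverse_transcriptase():
    """Classes that depend on reverse transcriptase during the life-cycle."""
    return [name for name, entry in BALTIMORE_CLASSES.items()
            if entry["reverse_transcription"]]
```

Note that classes IV and VI share the same genome type; they differ in that class VI replicates through a DNA intermediate.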

[Figure 3.3: schematic of a virus life-cycle in a host cell, with the stages numbered 1 to 6 and calcium ions (Ca²⁺) marked.]

Figure 3.3. The life-cycle of a general virus involves (1) entry into the host cell, (2) disassembly, (3) copying of the genome, (4) expression of viral genes, (5) assembly and (6) exit from the cell. The diagram shows a proposed life-cycle for STNV. Depletion of the calcium ions upon entering the cell induces the disassembly of the capsid. The replication of the STNV genome (stage 3) is dependent on a polymerase provided by a co-infecting helper virus.

Taxonomy

The governing body in virus taxonomy is the International Committee on Taxonomy of Viruses (ICTV)¹. It uses bioinformatic aspects, such as the genome organization (type and order of genes) and homology (sequence similarity), as a basis for how viruses are categorized.

¹http://ictvonline.org/

Hierarchy of virus taxa
• (Order)
• Family
• (Sub-family)
• Genus
• Species

The main taxonomical levels are families (with the suffix -viridae), genera (with the suffix -virus) and species. Each genus is defined by a type species. Species are loosely defined and can consist of multiple isolates, strains or serotypes.

3.1.3 Virus life-cycle

Viruses are highly streamlined and lean machines that function in a very competitive environment with little room for unnecessary components. The structure of a virus is therefore intimately connected to its life-cycle. The steps involved in a simplified virus infection are sketched in Fig. 3.3, although the details depend strongly on the type of host cell. Viruses that integrate into the host genome may also have a lysogenic cycle, where they stay dormant and are propagated to daughter cells when the cell divides.

1. Entering the host cell

Many plant viruses require a vector, e.g., an insect or a fungus, to mechanically bypass the cell membrane and the resilient cell wall [36]. Prokaryote and animal viruses often bind to a specific receptor on the surface of the host cell. They sometimes enter by triggering an endocytosis response, where the virion is internalized in a membrane vesicle. Enveloped viruses can facilitate fusion of the envelope membrane with the membrane of the cell. Some phages stay on the outside of the cell and inject just the DNA or RNA into the cell.

2. Disassembly and release of the genome

The capsid needs to provide robust protection for the genome outside of the cell, but it is also necessary for the virion to be metastable and to disassemble once inside the cell. The timing of the disassembly is important, and changes in the environment can be used as cues. Triggering factors can be the lowering of pH inside endosomes that are targeted for fusion with degrading lysosomes, or the chelation of divalent ions, as studied in paper I. Mutations that increase the stability of the capsid can reduce or eliminate the infectiousness due to an inability to release the RNA or DNA [37].

3. Genome replication

The replication of the genome depends on the virus type. Many viruses encode their own polymerases, which might be faster (and less accurate) or accept other templates. The host has no suitable RNA-dependent RNA or DNA polymerases, and negative-sense RNA viruses and retroviruses are therefore obliged to encode and package an RNA-dependent polymerase.

4. Expression of viral genes

Positive-sense ssRNA viruses can immediately recruit the protein synthesis machinery of the cell, while other types of viruses first need to synthesize mRNA (Fig. 3.2). Negative-sense ssRNA viruses have polymerases packaged in the virion to facilitate the synthesis of mRNA. DNA viruses with eukaryotic hosts have to traverse the nuclear envelope to gain access to the DNA expression machinery. Virus genes often have strong promoters, which biases the cell to produce viral proteins. Cells have ribonucleases that specifically target unprotected RNA. The viral mRNA can therefore often form secondary structures that mimic the proteins that protect cellular mRNA from degradation and that facilitate transcription initiation. The virus might need to reprogram the cell cycle to enter a metabolically active phase.

5. Assembly of virions

It is common that viruses can self-assemble into mature virus particles. Some viruses require both the genome and the capsid proteins to be present. For other viruses, in particular many double-stranded DNA (dsDNA) viruses, the capsid forms first and ATP-driven motors then pump the nucleic acid molecule(s) into the capsid. Enveloped viruses often assemble at the surface of a cellular membrane where viral membrane proteins have already been inserted. Some viruses utilize scaffold proteins to aid the assembly. Some viruses also pack auxiliary proteins that are required during the initial stages of the infection, e.g., polymerases, into the virion. Virus genomes might have special sequences, packing signals, which facilitate self-recognition to promote, or at least bias, the packing of the native RNA or DNA.

6. Exit from the host cell

The virus particles can escape from the host cell through lysis, where the cell bursts. Enveloped viruses often bud off from the surface of the cell membrane. Plant cells are connected through openings called plasmodesmata, and most plant viruses encode movement proteins that grant them access to neighboring cells.

3.2 Virus structure

A minimal virus consists of an RNA or DNA molecule and an enclosure in the form of a protein capsid. The capsid should be a robust and inert enclosure for the fragile genome.

Viruses can superficially be grouped into rod-shaped and isometric/spherical (or some complex variant of these two main classes). The capsids of the former group commonly have helical symmetry, while capsids of the latter group usually have icosahedral symmetry. The symmetry enables the construction of a strong enclosure from monomers with maximum genetic efficiency. This is essential, since it takes three nucleotides to encode one amino acid, effectively making it impossible to construct a monolithic protein capsid that can encapsidate its own mRNA.

3.2.1 Rod-shaped viruses

In helical viruses, the CPs form a fiber by associating with the DNA or RNA and condensing into a single or double coiled structure. The helix can be characterized by the rise (axial distance between two consecutive subunits), pitch (axial distance for a complete turn) and amplitude (diameter).
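The three helix parameters fully determine where the subunits sit, which the following sketch makes explicit. The function names are hypothetical, and the numerical values are made-up illustrative numbers, not measurements of any real virus.

```python
import math

def subunits_per_turn(rise, pitch):
    """Number of subunits in one complete turn of the helix."""
    return pitch / rise

def helix_positions(n_subunits, rise, pitch, amplitude):
    """Cartesian coordinates of subunit centers along a helix."""
    radius = amplitude / 2.0                  # amplitude is the diameter
    dphi = 2.0 * math.pi * rise / pitch       # rotation per subunit (rad)
    return [(radius * math.cos(i * dphi),
             radius * math.sin(i * dphi),
             i * rise)
            for i in range(n_subunits)]

# Illustrative values in nm: 10 subunits per turn.
coords = helix_positions(n_subunits=11, rise=0.5, pitch=5.0, amplitude=18.0)
```

With these example values, the eleventh subunit lies directly above the first one, displaced along the axis by exactly one pitch.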

Tobacco mosaic virus

Fiber diffraction has been used with success to study the helical tobacco mosaic virus (TMV). The history of this feat is excellently reviewed in reference [37]. The 2.9 Å structure¹ reveals the structure of the CP and how it interacts with the RNA molecule [38].

¹PDB ID: 2TMV

Figure 3.4. Tobacco mosaic virus: rendition of a short segment of the helical fiber, with the CPs in one turn shown in white.

The RNA is sandwiched between two consecutive turns of the CP coil. It is embedded in a groove where appropriately positioned arginine side chains can interact with the phosphate backbone. Three nucleotides fit per protein, and one of the positions specifically favors the binding of a guanine base. The genome of TMV has an origin-of-assembly sequence with a repeated pattern of G residues at every third position, promoting the packing of the native RNA molecule. The 5′ end of the genome, on the other hand, lacks guanine in the first 69 positions, causing that end to be somewhat destabilized and facilitating access to the RNA molecule for ribosomes [37].

Using similar methods, the structures of both the protein and the RNA of two additional helical viruses have been solved. They are both closely related to TMV: cucumber green mottle mosaic virus (CGMV), at 3.40 Å¹, and ribgrass mosaic virus (RMV), at 2.90 Å².

¹PDB ID: 1CGM
²PDB ID: 1RMV

3.2.2 Spherical viruses

The capsid structure

Spherical viruses form capsids with icosahedral symmetry. An icosahedron is the Platonic solid that most closely approximates a sphere. It consists of 20 equilateral triangles and has 5:3:2 symmetry. The 5-fold symmetry axes pass through the 12 vertices, the 3-fold axes through the centers of the 20 triangular faces and the 2-fold axes through the midpoints of the 30 edges. In a frequently cited publication from 1962, Donald Caspar (1927–) and Sir Aaron Klug (1926–) laid the theoretical basis of the understanding of capsid symmetries and defined a nomenclature for describing icosahedral capsids in terms of triangles [39].
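The vertex, face and edge counts quoted above can be cross-checked with two short calculations: Euler's polyhedron formula, and a count of the proper rotations generated by the symmetry axes. This is a sketch for illustration; the function names are hypothetical.

```python
# Counts for an icosahedron, as quoted in the text.
VERTICES, FACES, EDGES = 12, 20, 30

def euler_characteristic(vertices, edges, faces):
    """V - E + F, which equals 2 for any convex polyhedron."""
    return vertices - edges + faces

def rotation_count(vertices, faces, edges):
    """Number of proper rotations: the identity plus the non-trivial
    rotations about each 5-, 3- and 2-fold axis. Every axis passes
    through two opposite vertices, faces or edges."""
    fivefold = (vertices // 2) * (5 - 1)
    threefold = (faces // 2) * (3 - 1)
    twofold = (edges // 2) * (2 - 1)
    return 1 + fivefold + threefold + twofold
```

The rotation count comes out at 60, which is the number of symmetry-equivalent positions on an icosahedral surface.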

Icosahedral symmetry

Small spherical viruses can have each of the 20 triangular faces composed of three CPs, where each protein has exactly equivalent surroundings. Such a capsid consists of 60 proteins, and one protein constitutes the icosahedral asymmetric unit (AU).

Because of genomic efficiency, larger capsids tend not to have larger capsid proteins, but rather consist of a greater number of protein subunits. It is not possible to place more than 60 proteins on an icosahedral surface such that each protein has exactly the same surroundings. Caspar and Klug suggested that proteins can accommodate slight differences in the interactions with neighboring proteins, a concept they named quasi-symmetry. With quasi-symmetry it is possible to tessellate a surface using multiple types of polygons, e.g., hexagons and pentagons¹. A plane can be tiled with hexagons in a honeycomb pattern. Curvature can be introduced by converting hexagons to pentagons, and it can be shown that at least 12 pentagons are required to form an enclosed surface.

In the most general case, the surface can be tessellated with equilateral triangles. The allowed number of triangles per icosahedron facet is described by the triangulation number, defined as $T = Pf^2$. $P$ is the primitive organization and can assume values according to the relationship $P = h^2 + hk + k^2$, where $h$ and $k$ are integers greater than or equal to zero that do not share common factors. $f$ represents subdivisions of the primitive triangles and can be any positive integer (although it is often assumed to be 1). Hence the triangulation numbers can be T = 1, 3, 4, 7, 9, 12, 13, 16, 19, 21, . . . (of these, the primitives with f = 1 are 1, 3, 7, 13, 19 and 21). Since each triangle consists of 3 proteins and each capsid has 20 faces, the total number of CPs is 60T [39].
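The rule for the triangulation number can be checked with a few lines of code. The following is a minimal sketch, not part of the original work; the function names are hypothetical and chosen only for illustration. It enumerates the primitives $P = h^2 + hk + k^2$ for coprime $h$ and $k$, multiplies them by $f^2$, and reports the corresponding number of coat proteins, 60T.

```python
from math import gcd, isqrt


def primitive_numbers(t_max):
    """Return sorted P = h^2 + h*k + k^2 <= t_max for coprime h >= k >= 0."""
    found = set()
    # P is symmetric in h and k, so it is enough to scan h >= k.
    for h in range(1, isqrt(t_max) + 1):
        for k in range(0, h + 1):
            if gcd(h, k) != 1:  # h and k must not share a common factor
                continue
            p = h * h + h * k + k * k
            if p <= t_max:
                found.add(p)
    return sorted(found)


def triangulation_numbers(t_max):
    """Return sorted T = P * f^2 <= t_max for every primitive P and f >= 1."""
    found = set()
    for p in primitive_numbers(t_max):
        f = 1
        while p * f * f <= t_max:
            found.add(p * f * f)
            f += 1
    return sorted(found)


def coat_protein_count(t):
    """Total number of coat proteins in a capsid with triangulation number t."""
    return 60 * t


if __name__ == "__main__":
    print(primitive_numbers(21))
    print(triangulation_numbers(21))
    print(coat_protein_count(3))  # a T=3 capsid such as TBSV
```

For an upper bound of 21 the sketch reproduces the series quoted in the text, and a T=3 capsid comes out at 180 coat proteins.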

Molecular switches

The model proposed by Caspar and Klug seems to hold true for most icosahedral virus capsids, ranging from the smallest viruses with a triangulation number of T = 1 (e.g., STNV and STMV, studied in papers I and II) to the largest known virus, the mimivirus [40], which has an estimated triangulation number in the range 972 ≤ T ≤ 1200.

Capsids with higher triangulation numbers that are constructed from a single type of monomer display quasi-symmetry. In some capsids the proteins can

¹ Like the iconic Adidas Telstar soccer ball from the 1970 FIFA World Cup.

adopt different conformations depending on the position in the lattice. Tomato bushy stunt virus (TBSV), a T=3 virus, has a loop sequence which is disordered in subunits at two of the three quasi-equivalent positions, but the loops from the proteins at the third position form a β-ring (or β-annulus) at the 3-fold icosahedral symmetry axis [41]. In Flock House virus (FHV), also a T=3 virus, duplex RNA at the 2-fold axis together with an ordered arm from the C-type subunits acts as a wedge and causes the subunit contact to be flatter than the contact at the quasi-2-fold interface [42].

Viruses with larger genomes can afford to have multiple CP genes, where different proteins are situated at different positions in the capsid lattice. Another “solution” is to have a leaky termination codon in the gene for the CP. This causes a fraction of the CPs to have an extra domain at the C-terminal end. When the capsid proteins are not identical, but similar, it is referred to as pseudo-symmetry. In many cases the genes for two or more monomers have mutated into a fusion product with multiple core domains.

In large viruses the proteins often pre-assemble into capsomeres consisting of five or six monomers, and these constitute the basic building blocks for the capsid assembly. In the case of the mimivirus, the protein monomer is a fusion protein with two jelly roll domains and each capsomere is made up of three proteins [40].

Divalent ions in capsids

Many viruses have binding sites for divalent ions, such as calcium, in the capsid. This is especially prevalent in non-enveloped ssRNA viruses that infect plants. It could be explained by a common origin or attributed to the specific physiology of plant cells causing convergent evolution, but it can also be observed in insect and fish viruses of the Nodaviridae family. Examples of viruses with capsids that bind calcium ions are TMV, STNV and TBMV.

The genome structure

Relatively little is known about the internal organization of icosahedral viruses. The high-resolution methods that are used to characterize the structure of the capsid rely on averaging over many copies. In X-ray crystallography the virus particles are packed in a lattice and the orientation of the individual particles depends only on the exterior contacts. Interior structures that do not adhere to the same symmetry can therefore be positioned in 60 different orientations and will not be distinguishable from the background. (Structure determination methods are described in more detail in Section 4.2.)

Single-stranded genomes

In a handful of icosahedral viruses it is possible to resolve parts of the nucleic acid molecules. Table 3.1 lists all 17 of these that have been deposited in the Protein Data Bank¹ (PDB). All of these have single-stranded genomes and

¹ http://www.pdb.org/

Table 3.1. A non-redundant table of icosahedral viruses in the PDB that contain nucleotides. ∗: Authors state that nucleotides were built on weak densities and may have partial occupancy. ∗∗: Average per virion for multipartite genomes. +: Non-genomic RNA. xx: Bipartite genome, crystal only had the shorter strand.

Name   PDB ID  Res. (Å)  Genome  T  Visible (nt)  Tot. (kb)  %     Structure, axis
BPMV   1BMV    3.00      ssRNA   3  33×20         3.7xx      18    clover, 3
BBV    2BBV    2.80      ssRNA   3  20×30         4.5        13    double helix, 2
CCMV   1CWP    3.20      ssRNA   3  10×60         3.0∗∗      20    helical
DYMV   1DDL    2.70      ssRNA   3  9×60          6.3        9∗    -
MS2    1ZDH    2.70      ssRNA   3  13×90         3.6        33+   hairpin, 2
FHV    2Z2Q    2.70      ssRNA   3  24×30         4.5        16    double helix, 2
PaV    1F8V    3.00      ssRNA   3  50×30         5.8        35    dodecahedron
PhMV   2XPJ    3.40      ssRNA   3  3×60          6.7        3     -
PrV    2QQP    3.80      ssRNA   4  8×30          8.7        3     double helix, 2
STMV   1A34    1.81      ssRNA   1  20×30 + 1×60  1.1        62∗   double helix, 2
STNV   3S4G    6.00      ssRNA   1  7×60          1.2        34+∗  double helix
TAV    1LAJ    3.40      ssRNA   3  3×60          3.2∗∗      6     helical
TYMV   2FZ2    2.90      ssRNA   3  3×60          6.3        3     -
CPV    1IJS    3.25      ssDNA   1  11×60         5.3        12    pseudo knot
α3     1M06    3.50      ssDNA   1  10×60         6.1        10∗   -
φX174  1RB8    3.50      ssDNA   1  10×60         5.4        11∗   -
MVM    1Z1C    3.50      ssDNA   1  22×60         5.1        26    pseudo knot

all are relatively small and non-enveloped. Figure 3.5 shows renditions of the RNA with the icosahedral symmetry applied for five of the structures in Table 3.1. Only the parts of the genome that have icosahedral symmetry can be seen in these structures. All the resolved nucleic acid is found at the capsid inner wall in close association with the CPs. Despite being single-stranded, the RNA or DNA can form duplex structures by folding back onto itself. A common feature of these structures is a double helix positioned horizontally on the 2-fold axis, which can be seen in BBV, FHV, PaV, PrV and STMV.

In most structures it is not possible to resolve the base identity of the nucleotides, although there are cases where there seems to be a bias for either a smaller pyrimidine or a larger purine base. The exception is MS2, where the crystal was soaked with an RNA oligomer that contained an operator sequence from the genome [43]. The operator forms a hairpin structure that binds specifically to CP dimers. During infection this is believed to act as a negative feedback loop that balances the amount of CP relative to the amount of genomic RNA.

Figure 3.5. RNA in the five viruses with the largest fraction of the genome having icosahedral symmetry. (Not to scale.) Panels: PaV, BPMV, CCMV, STMV, STNV.

Protein–genome interactions

Single-stranded RNA or DNA viruses commonly have a positively charged capsid interior, with basic residues lining the inside. A high density of arginine and lysine residues can facilitate RNA affinity and neutralize the phosphate backbone of the nucleic acid. Many of the smaller RNA viruses also have a positively charged N-terminal domain of the CP that forms a partially unstructured arm, which extends into the capsid cavity and likely binds the RNA. The number of charges on the arm correlates strongly with the length of the entire genome; there seems to be a linear relationship with a proportionality coefficient of 1.61 ± 0.03 [44].

Some viruses instead encode a positively charged peptide that is formed by proteolytic cleavage of the CP, e.g., nodaviruses [45]. The peptides associate with the genome. This can be seen as an evolutionary development of having a positive N-terminal arm. Such peptides play roles similar to those of the arginine-rich sperm nuclear proteins that condense the chromosomes in spermatozoa.

Double-stranded genomes None of the viruses with double-stranded genomes show any nucleic acid with icosahedral symmetry. This can possibly be due to life-cycle related issues. These viruses frequently package auxiliary proteins into the virion, e.g., to facilitate production of mRNA. Another important difference is that duplex

DNA and RNA have a much longer persistence length than their single-stranded counterparts. The increased stiffness of the fiber imposes additional constraints on how the genome can be packed.

Low resolution methods

Cryo-electron microscopy (cryo-EM) and, to some extent, neutron scattering have been used to characterize virus genomes at lower resolution. Some genomes have icosahedral symmetry at low resolution, e.g., the ssRNA genome of FHV, which forms a dodecahedral cage similar to what can be seen in the closely related PaV [46]. Interesting work has also been done to overcome the need to impose icosahedral symmetry on the virus. It has been shown in cryo-EM experiments that CPV, a single-stranded DNA (ssDNA) virus, binds to its host cell receptor asymmetrically in solution [47]. Using a similar approach, it could be shown in situ that the distribution of RNA in MS2 has some polarity when the virus is bound to a bacterial pilus [48].

Low-resolution images of double-stranded viruses sometimes show shells of density at specific radial distances. Many dsDNA bacteriophages have densely packed genomes, which they pump into pre-assembled capsids under high pressure using ATP-driven motors. Cryo-EM images of these virions show concentric layers of closely packed DNA [49]. Cryo-EM has also been useful for studying membrane-enveloped viruses, which do not easily lend themselves to crystallization [50].

The resolution one can obtain using neutron scattering is fairly limited, but a mixture of heavy and normal water can give selective contrast. Heavy water (²H₂O) has a higher scattering cross-section for neutrons than normal water. A 40% deuterium ratio matches the scattering of the solvent to that of protein, while 68% deuterium gives the same level as RNA. This method has been used to study the RNA content of STNV [51] (see also paper II).
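The contrast-matching idea in the last paragraph can be illustrated with a linear mixing rule. The sketch below is only an illustration under stated assumptions: it works with scattering length densities (SLD) rather than cross-sections, the two solvent values are approximate, commonly quoted numbers that are not taken from this thesis, and the function names are hypothetical. It shows what solvent SLD the 40% and 68% deuterium ratios from the text would correspond to.

```python
# Approximate neutron scattering length densities in units of 1e-6 Å^-2.
# ASSUMPTION: commonly quoted approximate values, not taken from the thesis.
SLD_H2O = -0.56
SLD_D2O = 6.36


def solvent_sld(d2o_fraction):
    """SLD of an H2O/D2O mixture, assuming ideal (linear) mixing."""
    if not 0.0 <= d2o_fraction <= 1.0:
        raise ValueError("d2o_fraction must be between 0 and 1")
    return SLD_H2O + d2o_fraction * (SLD_D2O - SLD_H2O)


def match_point(target_sld):
    """D2O fraction at which the solvent SLD equals target_sld."""
    return (target_sld - SLD_H2O) / (SLD_D2O - SLD_H2O)


if __name__ == "__main__":
    # Deuterium ratios quoted in the text for matching protein and RNA.
    for component, fraction in (("protein", 0.40), ("RNA", 0.68)):
        sld = solvent_sld(fraction)
        print(f"{component}: solvent SLD at {fraction:.0%} D2O = {sld:.2f}e-6 Å^-2")
```

At the protein match point only the RNA contributes contrast, and vice versa, which is what makes the method selective.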

3.2.3 Other arrangements

Not all viruses follow the paradigm of having either helical or icosahedral symmetry. Some viruses have local symmetries similar to those of icosahedral viruses, but are prolate (oval shaped), with icosahedron-like ends and additional rings of CPs inserted between them. An important example is the human immunodeficiency virus (HIV), which has a cone-shaped capsid with a T=3-like symmetry formed by hexamers and pentamers [52]. Many bacteriophages (i.e., viruses that infect bacteria) have a head structure, which usually has icosahedral symmetry (or is prolate), and an additional tail, which usually has helical symmetry. The geminiviruses have dumbbell-shaped capsids consisting of two fused “incomplete” icosahedra.

4. Protein structure in the gas phase

Several experimental methods require aerosolization of the sample prior to analysis. It is then important that the measured property does not change unexpectedly in the transition from solution to the gas phase. It is difficult to test the effect of vacuum on the structure of macromolecules other than indirectly. Therefore, theoretical methods are used to predict how vacuum conditions affect the solution structure.

4.1 Structure and function

There is an intimate connection between structure and function in biology. The introduction of the microscope by Antonie van Leeuwenhoek (1632–1723) enabled the study of features smaller than what the naked eye can make out. The advances made in the past decades have revolutionized biological research, and today we have access to structures at the atomic level. Without such knowledge, it is very difficult to understand, e.g., how drug molecules bind to proteins or how mutations might change the function of proteins. However, far from all proteins can be studied with current methods, and this has hampered progress in many research areas. It is therefore important to continue to develop novel experimental techniques and computational methods.

4.1.1 Protein structure

Proteins are ubiquitous in cells. They perform a wide array of tasks, including catalyzing chemical reactions (enzymes), transducing signals (receptors), regulating gene expression (transcription factors), facilitating transport across lipid membranes (channels and pumps), maintaining the shape of the cell (cytoskeleton) and many other important functions.

Proteins consist of amino acid residues connected by peptide bonds formed between the amine group and the carboxyl group of consecutive residues. Side chains give amino acids their unique chemical properties. A protein usually consists of anything from a few dozen to more than a thousand amino acid residues.

The backbone trace, known as the fold, is the scaffold that determines the general shape of the protein structure. The fold is very strongly conserved between related proteins and there seems to exist a finite number of fold families, much fewer than the total number of proteins [53].

In general, soluble proteins spontaneously fold into a unique structure, often illustrated as falling down a folding funnel. The folding is driven by the formation of polar interactions, in particular hydrogen bonds, and the burial of a hydrophobic core. Salt bridges between charged residues and covalent linkages between pairs of cysteine residues may stabilize the folded structure.

4.1.2 Membrane proteins

Membrane proteins can be divided into integral proteins, which have at least one domain embedded in the lipid bilayer, and peripheral proteins, which are only associated with the membrane surface. Integral membrane proteins that span the entire width of the bilayer are called transmembrane proteins.

The transmembrane domain often consists of one or multiple α-helices (e.g., G-protein coupled receptors) or a β-barrel (e.g., OmpA, studied in paper V). From sequence analyses that predict transmembrane domains, it has been estimated that 20–30% of the genes in most genomes encode membrane-associated proteins [54].

Many of the most important drug targets are membrane-associated proteins, due to their central role in intercellular signaling, membrane transport and cell regulation. Yet very few structures have been solved. Currently¹, the Membrane Protein Databank² holds the structures of a mere 286 unique proteins and peptides [55].

It is difficult to develop generally applicable methods to crystallize membrane proteins [56]. Membrane proteins are often difficult to over-express, resulting in little material to work with. The protocols used for crystallization of soluble proteins usually do not work, since it is often necessary to include detergent molecules. The resulting crystals can be fragile or may not diffract well enough.

4.2 Structure determination methods

Diffraction limits how small objects can be resolved with light of a certain wavelength. To study molecular structures at atomistic resolution we need to use X-rays or particle beams, such as electron beams. Electrons can be focused using magnets, and electron microscopes can therefore be designed using principles similar to those of light microscopes. X-rays, on the other hand, are much more difficult to focus. Instead, it is possible to record the diffraction pattern of the scattered light and then computationally perform the operation a lens would do.
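To make the wavelength dependence concrete, the sketch below evaluates the Abbe diffraction limit, d = λ / (2 NA), for visible light and for X-rays. This is an illustration only: the Abbe expression is the standard textbook estimate and is not stated in the text above, and the wavelengths and numerical apertures are assumed example values.

```python
def abbe_limit_nm(wavelength_nm, numerical_aperture=1.0):
    """Smallest resolvable distance d = wavelength / (2 * NA), in nanometers."""
    if wavelength_nm <= 0 or numerical_aperture <= 0:
        raise ValueError("wavelength and numerical aperture must be positive")
    return wavelength_nm / (2.0 * numerical_aperture)


if __name__ == "__main__":
    # ASSUMED example values: green light with a high-NA oil objective,
    # and X-rays with a wavelength of about 1 Å (0.1 nm).
    print(f"visible light: {abbe_limit_nm(550.0, 1.4):.0f} nm")
    print(f"X-rays:        {abbe_limit_nm(0.1):.2f} nm")
```

With these assumed numbers, visible light resolves features of roughly 200 nm, far larger than a protein, whereas X-rays reach below the length of a chemical bond.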

¹ As of April 15, 2012
² http://www.mpdb.tcd.ie/

X-ray crystallography

X-ray crystallography is the most prevalent and powerful technique for determining the structure of proteins. The majority of the structures deposited in the PDB were generated using this technique. The theory and the methods are well established, and the facilities and tools are well developed and easily accessible.

The general principle of the method is to shine high-energy rays onto an object and measure the scattered light. The diffraction pattern contains information that makes it possible to mathematically reconstruct the shape of the object. The diffraction pattern from a single molecule is unfortunately very weak. However, by using a crystal (i.e., many identical copies of the molecule on an ordered lattice), the light scattered from different molecules will interfere. The interference depends on the symmetries of the crystal lattice and will be destructive in most directions, but constructive in some. The amplification by the constructive interference makes it possible to detect and record the signal.

The greatest hurdle in X-ray crystallography is to express and crystallize the protein. Far from all proteins have an innate propensity to aggregate in an orderly fashion on a lattice, and those that lack it are difficult to study. Membrane proteins belong to this group, as do other classes of proteins, e.g., glycoproteins.

Cryo-electron microscopy

Electron microscopy (EM) uses beams of electrons to image small structures. In cryo-EM, the sample is prepared by flash freezing at liquid nitrogen temperature. This way, the water vitrifies before it has time to form ice crystals. High-resolution three-dimensional structures can be obtained by averaging hundreds or thousands of images of identical objects in different orientations. The strength of cryo-EM is that it does not require the sample to be in crystal form. It cannot reach as high a resolution as X-ray crystallography, but state-of-the-art structures come close to atomic resolution.

Nuclear magnetic resonance spectroscopy

Nuclear magnetic resonance (NMR) spectroscopy is a method based on the spin coupling between atomic nuclei. The output is a set of distance bounds between atoms. The main advantage of this method is that measurements can be done in solution, but it is limited in the size of the molecules it can be used to study. The 82 kDa malate synthase G is among the largest molecules solved with NMR so far (in combination with small-angle X-ray scattering measurements) [57].

Single-particle diffraction imaging

Single-particle diffraction imaging is a method that is still under development and cannot yet reach atomistic resolution [3, 4]. It uses the short and very intense photon pulses generated by accelerator-driven free-electron X-ray lasers.

Figure 4.1. Schematic of a basic electrospray ionization setup. (Labels in the figure: ESI capillary, heater, ion guide/drift tube, skimmer, pumps, to detection.)

The short pulse length and the great intensity of the X-rays remove the need for a crystal to amplify the diffraction signal.

4.3 Aerosolization

Electrospray ionization

Electrospray ionization (ESI) is a gentle aerosolization technique (reviewed in reference [58]). The general setup (Fig. 4.1) is a nozzle that sprays the solution into a chamber. A potential difference between the nozzle and the exit orifice ensures that there is a surplus of charged ion species close to the nozzle exit. Drops form and the electric field accelerates the charged drops across the chamber.

The fission of large drops into smaller ones is believed to be driven by Coulomb repulsion when evaporation causes the drop to shrink. The ions in the drops repel each other and end up on the surface. The number of charges that can be accommodated in a droplet of a given size is determined by the Rayleigh limit (studied by simulations in reference [59]).

The dehydration of particles in ESI is a fairly complex process and not entirely understood (studied in papers III–VI). The evaporation of water from the droplet surface is a stochastic process. Water molecules with high kinetic energy have a greater likelihood of breaking away, and therefore the average temperature of the remaining droplet decreases. As the average kinetic energy of the remaining water molecules decreases, the evaporation eventually levels off and residual water stays associated with the protein [60]. Many ESI setups therefore guide the aerosol through a heated capillary or have a counter flow of heated inert gas to remove all water.
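The Rayleigh limit mentioned above can be estimated numerically. The sketch below uses the standard continuum expression q = 8π (ε₀ γ r³)^(1/2), which is not given in the text and is included here only as an illustration. The surface tension of 0.072 N/m is an assumed room-temperature value for water, and the function name is hypothetical.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19    # elementary charge, C


def rayleigh_limit_charges(radius_m, surface_tension=0.072):
    """Maximum number of elementary charges on a droplet of given radius.

    Uses q = 8 * pi * sqrt(eps0 * gamma * r^3). The default surface tension
    (N/m) is an ASSUMED value for water near room temperature.
    """
    if radius_m <= 0 or surface_tension <= 0:
        raise ValueError("radius and surface tension must be positive")
    q = 8.0 * math.pi * math.sqrt(EPSILON_0 * surface_tension * radius_m ** 3)
    return q / E_CHARGE


if __name__ == "__main__":
    for radius_nm in (5, 10, 50):
        n = rayleigh_limit_charges(radius_nm * 1e-9)
        print(f"r = {radius_nm:3d} nm: about {n:.0f} elementary charges")
```

Because the limit scales as r^(3/2) while the volume scales as r³, a shrinking droplet that keeps its charge eventually exceeds the limit, which is consistent with the fission mechanism described in the text.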

Matrix-assisted laser desorption/ionization

Matrix-assisted laser desorption/ionization (MALDI) is a commonly used alternative to ESI, especially as an ionization technique in tandem with MS for the analysis of proteins and other macromolecules [61]. A UV laser beam, tuned to be absorbed specifically by a supporting matrix, induces an ablation

where matrix fragments and sample molecules are ejected (desorption). The ionization of the analyte molecules is believed to happen in the gas phase, when the analyte molecules interact with charged matrix fragments, although the details of the MALDI process are not well understood.

5. Summary of papers and conclusions

5.1 Virus simulations

Viruses are important and fascinating machines which often use simple chemical principles to perform seemingly complex tasks. Many of the questions about how they function could potentially be answered using computational methods, but this has proven to be technically quite difficult. Much of the problem lies in the large size of even the smallest viruses and the long time scales of the processes in such large systems. Previous studies have resorted to coarse-grained modeling of the virus components, but such simulations are relatively inaccurate when it comes to capturing the complex molecular interactions that govern the dynamics. The few all-atom simulations that have been published so far have covered limited simulation times [62–64].

5.1.1 Virus capsid swelling induced by calcium depletion

In paper I we characterized the structure of the swollen STNV capsid. In experiments, STNV particles swell when a calcium-chelating agent such as ethylenediaminetetraacetic acid (EDTA) is added [65]. STNV has calcium binding sites in between the CPs [66]. The extremely low concentration of free calcium in the cytoplasm of the host cell is thought to shift the equilibrium at the binding sites towards being unoccupied. Charge repulsion between the calcium ligands can then induce swelling of the capsid. This has been suggested to provide a simple but effective way for the virus to initiate dissolution at infection. The same mechanism has also been suggested for many other icosahedral plant viruses, e.g., CCMV [67].

In simulations without the calcium ions, the capsid expanded in good agreement with measurements. From the trajectories we were able to track the movement of water in and out of the native and the swollen capsid, and this yields, to our knowledge, the first estimate of the permeability of a virus capsid. Calcium binding sites close to the 3-fold axis drove the expansion, while the site at the 5-fold axis played a passive, structural role.

The scope of the simulations transcends what has previously been reported, and the results clearly show that processes in such large systems require extensive sampling in time.

Figure 5.1. STNV virion constructed from the crystal structure of the capsid protein [68] and the RNA genome modeled with the molecular modeling package MMB (MacroMolecularBuilder) [69]. (The front half of the capsid is not shown.)

5.1.2 Discerning the structure of virus genomes

The structural knowledge about the packing of icosahedral virus genomes is very limited and in most cases restricted to low-resolution models. In only 17 viruses (Table 3.1) does the RNA or DNA have icosahedral symmetry to the degree that X-ray crystallography can reveal high-resolution features.

In paper II we investigated two of these viruses: STMV¹ and STNV². Molecular dynamics simulations of the capsids were carried out both with and without the RNA fragments seen in the crystal structure. There was a pronounced overlap between the distribution of negatively charged chloride ions inside the empty capsid and the location of the crystallographic RNA. We also suggested where additional RNA could possibly be located based on the ion densities. Simulations of the ionic atmosphere could potentially be used to screen the structures of other virus capsids for nucleic acid binding pockets.

Based on the short helical fragments seen in the STNV capsid [70] and an analysis of potential hairpin packing signals in the genome of STNV [71], a model of the entire virion is currently being prepared (Fig. 5.1).

¹ PDB ID: 1A34
² PDB ID: 3S4G

5.2 Gas-phase protein structures

From a biological perspective the most relevant environment for protein folding is in solution. Since there is no evolutionary pressure to make proteins more stable in the gas phase, it would not be completely unexpected if proteins were to denature in the gas phase, but simulations suggest that this is not the case.

5.2.1 Globular soluble proteins

In papers IV and VI we studied a set of small to medium-sized globular proteins to see how they fare when progressively dehydrated. The proteins were:

• C-terminal fragment of ribosomal protein L7/L12 (68 amino acids)
• Insulin (51 aa)
• Lysozyme (129 aa)
• Myoglobin (152 aa)
• Trp-cage (20 aa)
• Ubiquitin (76 aa)

The protein structures were taken from the PDB and simulated using MD in vacuum with different levels of hydration, on the order of a couple of hydration shells thick. The main conclusion was that the proteins seem to be relatively stable in the gas phase over the time scales and temperature ranges studied.

Two important differences when removing the water are the lost opportunity to hydrogen bond with the solvent and that there is no longer an entropic penalty for exposing hydrophobic patches. Both α-helices and β-sheets are stabilized by backbone hydrogen bonds, making them relatively insensitive to the effects of removing the solvent (paper IV).

The tertiary structure could potentially be sensitive, especially in proteins with a large hydrophobic core. However, hydrophobic effects are probably more important for guiding the folding of proteins down the folding funnel. Once a protein is folded, the many polar interactions create a strong network which is difficult to disrupt in the gas phase. The energy barrier for unfolding in the gas phase is fairly high, as seen in gas-phase fluorescence probe experiments on Trp-cage [72].

In paper IV we increased the temperature of the starting droplet considerably. Although the temperature initially drops quite rapidly due to evaporative cooling, the highest-temperature systems leveled out at temperatures elevated relative to body temperature. Still, none of the proteins showed any tendency of large-scale unfolding with exposure of the core residues to the vacuum. So even if the solution structure might not represent the global energy minimum in the

gas phase, the proteins seem to be metastable, kinetically trapped in a deep energy well.

Although we used a fairly heterogeneous set of proteins, it could be argued that the sample size was limited. However, a larger-scale analysis of 30 proteins representing a wide spectrum of different fold families came to similar conclusions [73]. In that study it was also shown that rehydration into bulk water readily reverted most of the proteins back to solution structures.

Figure 5.2. Transmembrane domain of OmpA: the crystal structure (with the loops that could not be resolved missing) to the left, and embedded in a DPC micelle in bulk water and in vacuum to the right. Dark areas of the protein indicate high mobility. (Panels, left to right: Crystal, Bulk water, Vacuum.)

5.2.2 Micelle embedded proteins

MS of gas-phase complexes of proteins and detergent molecules suggests that protein-micelle complexes can survive the ESI process [74, 75]. In paper V we tested this by embedding the transmembrane domain (residues 1–171) of the outer membrane protein A of E. coli in a lipid micelle (Fig. 5.2). The transmembrane β-barrel was stable, but the micelle reorganized itself to maximize the number of intramolecular head-group interactions, and some of the tail groups flipped out. This is potentially very promising, since membrane proteins are some of the most interesting targets for single-molecule diffraction imaging.

The photon pulses from X-ray free-electron lasers are extremely short, on the femtosecond time scale. Yet concern has been raised that radiation damage might start to deteriorate the protein structure already during the exposure, causing blurred images. The ionizing events cause a build-up of charge, which will lead to a Coulomb-repulsion-driven explosion [3]. Simulations indicate that the explosion will start at the surface of the sample. It could therefore potentially be useful to have a sacrificial tamper layer to protect the sample [76]. MS data indicate that it is possible to embed non-membrane proteins in reverse detergent micelles in vacuum [77], and such micelles could possibly be candidates for a protective coating of protein samples. In paper III we mimicked the setup from the MS experiment and showed that, in the case of myoglobin, a reverse micelle could have a stabilizing effect in vacuum relative to the naked protein.

Figure 5.3. Virus in vacuum with a water layer.

5.3 Determining orientation of a water-covered virus in vacuum

Diffractive imaging of single particles will yield projection images of the object. Producing high-resolution three-dimensional structures will require combining a large set of images [78]. A prerequisite for this is that either the orientation of the object in each image is known, or that there is enough information in each image for the orientation to be determined.

In paper VII we predict how much a layer of water would interfere with the ability to determine the orientation of a small virus (Fig. 5.3). We find that the water does obfuscate the signal from the virus, but the peak in the signal

does not coincide with the peak from the water. This should make it feasible to separate the two signals, if the water layer is not too thick.

5.4 Future perspectives

5.4.1 Biology and computation

Computational biology is a rapidly growing field. The trend in the life sciences is towards large-scale projects (e.g., sequencing of genomes and even entire biotopes [29]) and high-throughput methods that generate large amounts of data. The increased sampling gives more reliable results, but the analysis as well as the synthesis of data from different experiments also requires more advanced computational methods.

In the field of molecular dynamics the possibilities grow as quickly as computer technology improves. The increased performance will allow us to improve the models (such as using polarizable force fields or quantum mechanical levels of modeling), to increase sampling in time and in phase space, and to model larger systems.

Some impressive results have been reported by the D. E. Shaw Research group, who developed Anton, the molecular dynamics computer [14]. Yet special-purpose computers are not the most promising hardware development. Instead, so-called streaming computers, which consist of many arithmetic-logic units in parallel, akin to the graphics cards in personal computers, can provide raw strength at a reasonable cost, making them a viable future computational platform. The greatest challenge lies in reformulating the problems to take full advantage of this kind of hardware.

5.4.2 New structure determination methods

Great strides have recently been made in diffractive imaging using X-ray free-electron lasers. Single images of virus particles have been recorded [4], and as the photon sources improve it may become possible to image smaller objects. Efforts to reconstruct three-dimensional virus structures could soon test the predictions in paper VII.

Another highly promising use of X-ray lasers is to image tiny protein crystals [5]. The crystal lattice greatly increases the diffraction signal, and the method can use crystals that are too small to be mounted in a synchrotron beam. This technique shows promise for structure determination of membrane proteins [6].

Electron microscopy-based methods are also becoming more widespread. Steady improvements have yielded much valuable low-resolution information, not only about the structures of virus genomes but also about dynamic processes, such as detailed three-dimensional views of the budding of HIV from infected cells [79].

6. Summary in Swedish/Sammanfattning på svenska

Exploring the Molecular Dynamics of Proteins and Viruses

Structure and function

There is a close connection between structure and function in biology. If we know the shape and structure of something, we can often understand and predict how it works. The invention of the microscope was a great advance that made it possible to study objects smaller than the naked eye can discern. However, the wave nature of light limits the resolution that can be achieved. By using X-rays one can reach atomic resolution and thereby study the molecules of life (proteins, nucleic acids, fats and sugars) in detail.

X-ray crystallography

Unfortunately, it is not entirely easy to image an object using X-rays. It is much harder to construct lenses and mirrors that focus X-rays than ones that focus visible light. Instead, the scattered light has to be collected and the images focused mathematically. In addition, the ionizing X-rays damage the sample, which makes it difficult to get enough light to the detector. But by illuminating a crystal consisting of a very large number of identical molecules, one obtains a lattice pattern in which the waves interfere alternately destructively and constructively. The amplification in the bright spots is strong enough that they can be measured before the sample disintegrates.

Proteins and other molecules are not rigid bodies; they are considerably flexible and mobile. They often interact with other molecules, for example by sticking together. The images obtained by X-ray crystallography are average structures of thousands of identical molecules, sitting frozen on an ordered grid. With the help of computers, however, we can study how they move and interact with each other.

Molecular dynamics

Molecular dynamics is a description of a molecular system based on classical mechanics. The atoms are modeled as soft spheres that interact with each other according to simple mathematical relations, which have been tuned to reproduce measured data. We can compute and sum the interactions between all atoms and predict how the positions of the atoms change a short time later. The new arrangement of the atoms changes how they interact with each other, and new changes in position can be computed. In this way the atoms can be moved step by step in a long time series.

From the positions and velocities of the atoms we can calculate macroscopic properties such as temperature, pressure, energy barriers, rate constants, etc. We can also interpret the time series as movies of the dance of the molecules.
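The stepwise scheme described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: the atoms interact only through a Lennard-Jones potential, all quantities are in reduced units, and the velocity Verlet integrator is used. The function names (`lj_forces`, `velocity_verlet`, `temperature`) are hypothetical, and the code is not the force field or integrator used in the simulations reported here.

```python
def lj_forces(pos, epsilon=1.0, sigma=1.0):
    """Sum pairwise Lennard-Jones interactions (assumed toy potential).

    Returns the force on every atom and the total potential energy.
    """
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    energy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[i][k] - pos[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            sr6 = (sigma * sigma / r2) ** 3
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # (-dU/dr) / r, so that the force vector on atom i is scale * d
            scale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += scale * d[k]
                forces[j][k] -= scale * d[k]
    return forces, energy


def kinetic_energy(vel, mass=1.0):
    """Total kinetic energy for atoms of equal mass."""
    return 0.5 * mass * sum(c * c for v in vel for c in v)


def temperature(vel, mass=1.0, k_b=1.0):
    """Instantaneous temperature from equipartition, using 3N degrees of freedom."""
    dof = 3 * len(vel)
    return 2.0 * kinetic_energy(vel, mass) / (dof * k_b)


def velocity_verlet(pos, vel, dt, n_steps, mass=1.0):
    """Move the atoms step by step; returns the final positions and velocities."""
    pos = [list(p) for p in pos]
    vel = [list(v) for v in vel]
    forces, _ = lj_forces(pos)
    for _ in range(n_steps):
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k] / mass  # half kick
                pos[i][k] += dt * vel[i][k]                   # drift
        forces, _ = lj_forces(pos)  # forces for the new arrangement
        for i in range(len(pos)):
            for k in range(3):
                vel[i][k] += 0.5 * dt * forces[i][k] / mass  # second half kick
    return pos, vel


if __name__ == "__main__":
    start = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.2, 0.0)]
    at_rest = [(0.0, 0.0, 0.0)] * 3
    final_pos, final_vel = velocity_verlet(start, at_rest, dt=0.001, n_steps=1000)
    print("temperature (reduced units):", temperature(final_vel))
```

With a sufficiently short time step, the sum of kinetic and potential energy stays nearly constant along the trajectory, which is a common sanity check for this kind of integrator.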

Viruses

Viruses cause disease in all kinds of animals, plants and single-celled organisms. In its simplest form, a virus particle consists of genetic material, in the form of RNA or DNA, packed inside a protective protein shell called a capsid. The virus is able to enter a host cell and reprogram it so that it produces new capsid proteins, genome molecules and other molecules that the virus needs. These parts are then assembled, often spontaneously, into new viruses.

Virus capsids are (in the vast majority of cases) either rod-shaped or spherical and consist of identical proteins joined in a symmetric pattern. It is relatively easy to study the architecture of virus capsids using X-ray crystallography, and we therefore know what they look like. Unfortunately, the interior is in most cases completely smeared out, so we do not know much about what the genetic material looks like inside the virus particle.

Virus dynamics

When a virus enters a cell, it needs to make its genetic material accessible to the cellular machinery that reads the genes and manufactures proteins.

In paper I we study a small plant virus called Satellite tobacco necrosis virus. This virus has calcium ions bound between the proteins, which are believed to play an important role during infection. Experiments have established that the virus particle swells when the calcium ions are removed. Plant cells keep the concentration of free calcium ions at a very low level by binding them to proteins or pumping them out. As a consequence, the virus is drained of ions, which leads to the capsid falling apart and the genetic material being released.

It has been difficult to image the swollen virus particle. We therefore used simulations to recreate the swelling of the virus. We could locate sites in the capsid where water could flow into and out of the swollen virus particle, and our conclusion was that the process that leads to the virus falling apart starts there.

The genetic material in viruses

Since the genetic material usually cannot be seen inside virus particles, in paper II we wanted to investigate it using molecular dynamics. The genetic material consists of nucleic acids, which are strongly negatively charged. Our hypothesis was that one could study where inside the virus capsid the genetic material is bound by analyzing how the negative ions from the solvent are distributed. There turned out to be a strong correlation between the ion distribution and some of the few known structures of RNA inside viruses.
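As an illustration of what analyzing an ion distribution can look like in practice, the sketch below averages ion positions from a set of simulation frames onto a cubic grid and lists the grid cells where the density is well above a reference value. This is a simplified example under stated assumptions, not the analysis used in paper II: the function names, the grid spacing, the reference (bulk) density and the enrichment factor are all hypothetical.

```python
import math
from collections import Counter


def ion_density_grid(frames, spacing):
    """Average ion number density per cubic voxel.

    frames  -- list of frames, each a list of (x, y, z) ion positions
    spacing -- voxel edge length, in the same length unit as the positions
    Returns a dict mapping voxel index (ix, iy, iz) to mean number density.
    """
    counts = Counter()
    for frame in frames:
        for x, y, z in frame:
            voxel = (
                math.floor(x / spacing),
                math.floor(y / spacing),
                math.floor(z / spacing),
            )
            counts[voxel] += 1
    voxel_volume = spacing ** 3
    n_frames = len(frames)
    return {v: c / (n_frames * voxel_volume) for v, c in counts.items()}


def enriched_voxels(density, bulk_density, factor):
    """Voxels whose density is at least `factor` times the reference density."""
    return sorted(v for v, rho in density.items() if rho >= factor * bulk_density)


if __name__ == "__main__":
    # Two made-up frames with three ions each (arbitrary length units).
    frames = [
        [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (5.5, 5.5, 5.5)],
        [(0.5, 0.5, 0.5), (0.9, 0.9, 0.9), (-0.5, 2.5, 3.5)],
    ]
    density = ion_density_grid(frames, spacing=1.0)
    print(enriched_voxels(density, bulk_density=0.5, factor=2.0))
```

In a real analysis the frames would come from a trajectory aligned to the capsid, and the enriched regions would be compared with known RNA positions.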

Proteins in vacuum

X-ray crystallography has one limitation that is hard to get around: one must be able to grow crystals of the object to be imaged. However, a new method under development makes it possible to image objects with X-rays without a crystal. This is thanks to the development of new radiation sources that can produce X-rays of very high intensity in very short pulses. To use this imaging method, however, the sample must be injected into a vacuum chamber. There are also other experimental methods, such as mass spectrometry, that require proteins to be exposed to vacuum-like conditions.

In papers III–VII we investigated how proteins, protein/lipid complexes and a virus particle behave in vacuum. We predict that the samples can withstand being in vacuum without falling apart or becoming deformed. The proteins can moreover be encapsulated in water layers or in lipid micelles for protection, although these can to some extent disturb the signal in the measurements.

The future of computational biology

Computational biology is a rapidly growing branch of the so-called life sciences (biology, medicine, pharmacy, biochemistry, biophysics and neighboring fields), driven by technological development both in the experimental disciplines and in the computer hardware itself. In molecular dynamics, the development is heading toward the use of graphics-card-like computers that contain thousands of parallel compute cores.

7. Acknowledgments

The direct and indirect support from many people made this thesis possible.

[The list of names that followed here is scrambled in the extracted text and only partly legible. Among the names that can be read are David van der Spoel, Janos Hajdu, Erik Marklund, Yaofeng Wang, Magnus Bergh, Gösta Huldt and Anders Eriksson, together with everyone at Xray and ICM, family and friends.]

Thank you very much, all of you!

References

[1] Marth, J. D. Nat. Cell Biol. 2008, 10, 1015–1015.
[2] Fenn, J.; Mann, M.; Meng, C.; Wong, S.; Whitehouse, C. Science 1989, 246, 64–71.
[3] Neutze, R.; Wouts, R.; van der Spoel, D.; Weckert, E.; Hajdu, J. Nature 2000, 406, 752–757.
[4] Seibert, M. M. et al. Nature 2011, 470, 78–81.
[5] Chapman, H. N. et al. Nature 2011, 470, 73–77.
[6] Johansson, L. C. et al. Nat. Meth. 2012, 9, 263–265.
[7] Johansson, M.; Bouakaz, E.; Lovmar, M.; Ehrenberg, M. Mol. Cell 2008, 30, 589–598.
[8] Burton, R. E.; Huang, G. S.; Daugherty, M. A.; Fullbright, P. W.; Oas, T. G. J. Mol. Biol. 1996, 263, 311–322.
[9] Kubelka, J.; Hofrichter, J.; Eaton, W. A. Curr. Opin. Struc. Biol. 2004, 14, 76–88.
[10] Henzler-Wildman, K.; Kern, D. Nature 2007, 450, 964–972.
[11] van der Spoel, D.; van Maaren, P. J.; Larsson, P.; Tîmneanu, N. J. Phys. Chem. B 2006, 110, 4393–4398.
[12] Feenstra, K. A.; Hess, B.; Berendsen, H. J. C. J. Comput. Chem. 1999, 20, 786–798.
[13] Mei, C.; Sun, Y.; Zheng, G.; Bohm, E.; Kale, L.; Phillips, J.; Harrison, C. Enabling and Scaling Biomolecular Simulations of 100 Million Atoms on Petascale Machines with a Multicore-optimized Message-driven Runtime. SC ’11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. New York, NY, USA, 2011.
[14] Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Shaw, D. E. Science 2011, 334, 517–520.
[15] Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. J. Chem. Theory Comput. 2008, 4, 435–447.
[16] van Gunsteren, W. F.; Berendsen, H. J. C. Angew. Chem. Int. Edit. 1990, 29, 992–1023.
[17] Levitt, M.; Lifson, S. J. Mol. Biol. 1969, 46, 269–279.
[18] Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Proteins 2006, 65, 712–725.
[19] Mackerell, A. D.; Feig, M.; Brooks, C. L. J. Comput. Chem. 2004, 25, 1400–1415.
[20] Jorgensen, W. L.; Tirado-Rives, J. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 6665–6670.
[21] Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. 1993, 98, 10089–10092.
[22] Mazur, A. K. J. Comput. Phys. 1997, 136, 354–365.
[23] Kushner, D. J.; Baker, A.; Dunstall, T. G. Can. J. Physiol. Pharm. 1999, 77, 79–88.
[24] Hess, B. J. Chem. Theory Comput. 2008, 4, 116–122.
[25] Acheson, N. H. Fundamentals of Molecular Virology; John Wiley & Sons Inc.: Hoboken, NJ, USA, 2007.
[26] Moreira, D.; Lopez-Garcia, P. Nat. Rev. Micro. 2009, 7, 306–311.
[27] Raoult, D.; Forterre, P. Nat. Rev. Micro. 2008, 6, 315–319.
[28] Edwards, R. A.; Rohwer, F. Nat. Rev. Micro. 2005, 3, 504–510.
[29] Parsons, R. J.; Breitbart, M.; Lomas, M. W.; Carlson, C. A. ISME J. 2012, 6, 273–284.
[30] Woese, C. R.; Kandler, O.; Wheelis, M. L. Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 4576–4579.
[31] Khayat, R.; Johnson, J. E. Structure 2011, 19, 904–906.
[32] Koonin, E.; Senkevich, T.; Dolja, V. Biol. Direct 2006, 1, 29.
[33] Van Valen, L. Evolutionary Theory 1973, 1, 1–30.
[34] Brockhurst, M. A. Science 2011, 333, 166–167.
[35] Baltimore, D. Bacteriol. Rev. 1971, 35, 235–241.
[36] Kassanis, B.; MacFarlane, I. J. Gen. Virol. 1968, 3, 227–232.
[37] Stubbs, G. Philos. Trans. R. Soc. Lond. B 1999, 354, 551–557.
[38] Namba, K.; Pattanayek, R.; Stubbs, G. J. Mol. Biol. 1989, 208, 307–325.
[39] Caspar, D. L. D.; Klug, A. Cold Spring Harb. Symp. Quant. Biol. 1962, 27, 1–24.
[40] Xiao, C.; Kuznetsov, Y. G.; Sun, S.; Hafenstein, S. L.; Kostyuchenko, V. A.; Chipman, P. R.; Suzan-Monti, M.; Raoult, D.; McPherson, A.; Rossmann, M. G. PLoS Biol. 2009, 7, e1000092.
[41] Olson, A.; Bricogne, G.; Harrison, S. J. Mol. Biol. 1983, 171, 61–93.
[42] Fisher, A. J.; Johnson, J. E. Nature 1993, 361, 176–179.
[43] Valegård, K.; Murray, J. B.; Stonehouse, N. J.; van den Worm, S.; Stockley, P. G.; Liljas, L. J. Mol. Biol. 1997, 270, 724–738.
[44] Belyi, V. A.; Muthukumar, M. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 17174–17178.
[45] Speir, J. A.; Taylor, D. J.; Natarajan, P.; Pringle, F. M.; Ball, L. A.; Johnson, J. E. Structure 2010, 18, 700–709.
[46] Tihova, M.; Dryden, K. A.; Le, T.-v. L.; Harvey, S. C.; Johnson, J. E.; Yeager, M.; Schneemann, A. J. Virol. 2004, 78, 2897–2905.
[47] Hafenstein, S.; Palermo, L. M.; Kostyuchenko, V. A.; Xiao, C.; Morais, M. C.; Nelson, C. D. S.; Bowman, V. D.; Battisti, A. J.; Chipman, P. R.; Parrish, C. R.; Rossmann, M. G. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 6585–6589.
[48] Toropova, K.; Stockley, P. G.; Ranson, N. A. J. Mol. Biol. 2011, 408, 408–419.
[49] Zhang, Z.; Greene, B.; Thuman-Commike, P. A.; Jakana, J.; Prevelige, P. E., Jr.; King, J.; Chiu, W. J. Mol. Biol. 2000, 297, 615–626.
[50] Mancini, E. J.; Clarke, M.; Gowen, B. E.; Rutten, T.; Fuller, S. D. Mol. Cell 2000, 5, 255–266.
[51] Bentley, G.; Lewit-Bentley, A.; Liljas, L.; Skoglund, U.; Roth, M.; Unge, T. J. Mol. Biol. 1987, 194, 129–141.
[52] Pornillos, O.; Ganser-Pornillos, B. K.; Yeager, M. Nature 2011, 469, 424–427.
[53] Murzin, A. G.; Brenner, S. E.; Hubbard, T.; Chothia, C. J. Mol. Biol. 1995, 247, 536–540.
[54] Krogh, A.; Larsson, B.; von Heijne, G.; Sonnhammer, E. L. J. Mol. Biol. 2001, 305, 567–580.
[55] Raman, P.; Cherezov, V.; Caffrey, M. Cell. Mol. Life Sci. 2006, 63, 36–51.
[56] Newby, Z. E. R.; O’Connell, J. D.; Gruswitz, F.; Hays, F. A.; Harries, W. E. C.; Harwood, I. M.; Ho, J. D.; Lee, J. K.; Savage, D. F.; Miercke, L. J. W.; Stroud, R. M. Nat. Protoc. 2009, 4, 619–637.
[57] Grishaev, A.; Tugarinov, V.; Kay, L.; Trewhella, J.; Bax, A. J. Biomol. NMR 2008, 40, 95–106.
[58] Wilm, M. Mol. Cell. Proteomics 2011.
[59] Caleman, C.; van der Spoel, D. Phys. Chem. Chem. Phys. 2007, 9, 5105–5111.
[60] Caleman, C.; van der Spoel, D. J. Chem. Phys. 2006, 125, 154508.
[61] Karas, M.; Hillenkamp, F. Anal. Chem. 1988, 60, 2299–2301.
[62] Freddolino, P. L.; Arkhipov, A. S.; Larson, S. B.; McPherson, A.; Schulten, K. Structure 2006, 14, 437–449.
[63] Zink, M.; Grubmüller, H. Biophys. J. 2009, 96, 1350–1363.
[64] Zink, M.; Grubmüller, H. Biophys. J. 2010, 98, 687–695.
[65] Unge, T.; Montelius, I.; Liljas, L.; Öfverstedt, L. G. Virology 1986, 152, 207–218.
[66] Liljas, L.; Unge, T.; Jones, T. A.; Fridborg, K.; Lovgren, S.; Skoglund, U.; Strandberg, B. J. Mol. Biol. 1982, 159, 93–108.
[67] Speir, J. A.; Munshi, S.; Wang, G. J.; Baker, T. S.; Johnson, J. E. Structure 1995, 3, 63–78.
[68] Jones, T. A.; Liljas, L. J. Mol. Biol. 1984, 177, 735–767.
[69] Flores, S. C.; Altman, R. B. RNA 2010, 16, 1769–1778.
[70] Lane, S. W.; Dennis, C. A.; Lane, C. L.; Trinh, C. H.; Rizkallah, P. J.; Stockley, P. G.; Phillips, S. E. V. J. Mol. Biol. 2011, 413, 41–50.
[71] Bunka, D. H.; Lane, S. W.; Lane, C. L.; Dykeman, E. C.; Ford, R. J.; Barker, A. M.; Twarock, R.; Phillips, S. E.; Stockley, P. G. J. Mol. Biol. 2011, 413, 51–65.
[72] Iavarone, A. T.; Patriksson, A.; van der Spoel, D.; Parks, J. H. J. Am. Chem. Soc. 2007, 129, 6726–6735.
[73] Meyer, T.; de la Cruz, X.; Orozco, M. Structure 2009, 17, 88–95.
[74] Ilag, L. L.; Ubarretxena-Belandia, I.; Tate, C. G.; Robinson, C. V. J. Am. Chem. Soc. 2004, 126, 14362–14363.
[75] Barrera, N. P.; Di Bartolo, N.; Booth, P. J.; Robinson, C. V. Science 2008, 321, 243–246.
[76] Hau-Riege, S. P.; Boutet, S.; Barty, A.; Bajt, S.; Bogan, M. J.; Frank, M.; Andreasson, J.; Iwan, B.; Seibert, M. M.; Hajdu, J.; Sakdinawat, A.; Schulz, J.; Treusch, R.; Chapman, H. N. Phys. Rev. Lett. 2010, 104, 064801.
[77] Sharon, M.; Ilag, L. L.; Robinson, C. V. J. Am. Chem. Soc. 2007, 129, 8740–8746.
[78] Huldt, G.; Szöke, A.; Hajdu, J. J. Struct. Biol. 2003, 144, 219–227.
[79] Carlson, L.-A.; Briggs, J. A.; Glass, B.; Riches, J. D.; Simon, M. N.; Johnson, M. C.; Müller, B.; Grünewald, K.; Kräusslich, H.-G. Cell Host Microbe 2008, 4, 592–599.