BUILDING BETTER MODELS: PROTOCOL OPTIMIZATION OF MOLECULAR DYNAMICS SIMULATIONS OF APOLIPOPROTEINS

A THESIS

Presented to the University Honors Program

California State University, Long Beach

In Partial Fulfillment

of the Requirements for the

University Honors Program Certificate

Yessica K. Gomez

Fall 2016

ABSTRACT

BUILDING BETTER MODELS: PROTOCOL OPTIMIZATION

OF MOLECULAR DYNAMICS SIMULATIONS

OF APOLIPOPROTEINS

By

Yessica K. Gomez

December 2016

Apolipoproteins are biologically ubiquitous lipid transport proteins that have been implicated in the onset of Alzheimer’s disease and serve as a target for therapies focused on drug delivery to the brain. Their distinct topology and electrostatic makeup have contributed to the lack of computational chemistry studies on them. In this study, we report a many-microsecond data set comprising a variety of molecular dynamics simulations meant to probe the effects of varying force field, long-range electrostatics, and cut-off distances on three different apolipoprotein structures.

The results display a clear consensus on the ideal settings to recreate biological conditions, with all three structures exhibiting the best behavior using the AMBER-94 force field with particle mesh Ewald (PME) and mid-range cut-off distances. These results are in line with previous force field studies and will serve as an initiation point for future high-temperature unfolding studies that will provide new details regarding the mechanics of the folding pathway.

ACKNOWLEDGEMENTS

My initial thanks goes to Dr. Sorin, without whose mentoring and resources this project would not have been possible. Thank you as well to all the members of the Sorin Lab I have worked with over the past three years – particularly Dakota and Xavier – and all others who collaborated with me on this project.

I wish to acknowledge the University Honors Program, especially Kashima and Lizette, for their academic advice and help in getting me through the program, and Dr. Thien, who was an excellent and knowledgeable 498 instructor. Additional thanks to all the friends I made in the UHP: your encouragement and empathy have been invaluable while writing this thesis.

Finally, thanks to my family, whose unwavering support over the course of my academic career has gotten me where I am today. I am eternally grateful for everything you have given me and sacrificed alongside me, especially in the four years over which this project was developed and completed.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

CHAPTER

1. INTRODUCTION

Apolipoproteins

Molecular Dynamics

2. METHODS

Choosing MD Parameters

MD Simulations

3. RESULTS AND DISCUSSION

Results

Discussion

Conclusion

APPENDIX

REFERENCES


LIST OF TABLES

TABLE

1. Final list of parameters chosen for comparison

2. Most ideal parameters for apolipoprotein simulation

3A. Differences in native state structural behavior based on initial parameters for 1LS4

3B. Differences in native state structural behavior based on initial parameters for 2KC3

3C. Differences in native state structural behavior based on initial parameters for 2L7B


LIST OF FIGURES

FIGURE

1. Size classes and composition of lipid micelles in humans

2. Starting structures for all three apolipoproteins

3. Time-dependent fluctuation of RMSD and radius of gyration


LIST OF ABBREVIATIONS

Apo E Apolipoprotein E

Apo III Apolipophorin III

BAPP Beta amyloid precursor protein

CT Carboxyl terminus

NT Amino terminus

MD Molecular dynamics

RMSD Root mean square deviation

PME Particle mesh Ewald

RF Reaction-field

VDW Van der Waals

SS Steady-state

DSSP Hydrogen bond estimation algorithm

AMBER Assisted Model Building with Energy Refinement


CHAPTER 1

INTRODUCTION

Apolipoproteins

Apolipoproteins are a diverse class of alpha-helical lipid transport proteins ubiquitous in the human body1. They are also found in other animals, such as insects, where they serve different structurally-mediated roles. Natively, they exist in an unbound, free-floating conformation where the helices bundle themselves together so the hydrophobic faces are hidden away from the surrounding water2. Most commonly, these bundles are found in small groups of around four, although additional helices may be present to serve as wrappers or tails.

Discovery of apolipoproteins began in the seventies3, and from there, it was quickly apparent that many different apolipoprotein families exist. In humans, there are currently six main classes: A-E and H. Each class is affiliated with transporting one or more different types of lipid micelle, which is a spherical amphipathic monolayer enclosing many free-floating cholesterol and lipid (fat) molecules4. A cutaway demonstrating the contents of these lipid micelles is shown in Figure 1b.

When these proteins are bound to a lipid package known as a micelle, the apo- prefix is dropped and they are simply referred to as lipoproteins. The association of activated lipoproteins with corresponding micelles is a physical transformation in which the native state opens up via some unfolding mechanism, the mechanics of which are still in contention among researchers5-7. This behavior is also illustrated when the protein attaches to a signal receptor, triggering some response within the cell2. Once bound, the lipid-protein complex can travel through the body via either the lymphatic or circulatory system. In addition, each apolipoprotein family is also associated with a particular organ or organs within the body. For example, the apolipoprotein E (apo E) family is associated primarily with transporting the chylomicron and low-density (LDL) classes of micelle to and from the brain8.

Figure 1. Size classes and composition of lipid micelles in humans. (a) Micelles are classified by density, and can be referred to as high, low, very low, or chylomicron. The lipoprotein wraps around the micelle via hydrophilic interactions with the polar lipid membrane heads. (b) The cutaway diagram demonstrates the variety of cholesterol and lipid molecules that rely on apolipoprotein association for proper transport8.

Apolipoprotein E

Apo E is especially important because its main transport location means it can cross the blood-brain barrier, a highly selective semi-permeable membrane that protects the central nervous system from the circulatory system2. This unique feature has made it the focus of a multitude of scientific studies that have allowed its other, equally interesting characteristics to come to light. It is now known that apo E exists in three separate genotypes: apo E2, apo E3, and apo E4, which can appear in any combination with each other depending on which two alleles a human has for the protein. While the structures of these isoforms vary by only a single nucleotide each, their biological function is extremely different9. Mutations in apolipoprotein E genotype, specifically from apo E2 or E3 to apo E4, have been correlated with improper interactions with amyloid beta peptides via the beta amyloid precursor protein (BAPP)10. These peptides are one of the precursors for fatty plaques in the brain, creating a strong link between these unfavorable contacts and the development of blood vessel-based diseases of the brain and heart, such as atherosclerosis, dementia, and Alzheimer's.

Alzheimer's Disease. Alzheimer's disease (AD) is a neurodegenerative disorder that manifests itself as a progressive loss of memory and psychological control. It affects people worldwide, yet there is currently no cure. One of the most compelling theories regarding the source of AD is plaque build-up in the brain caused by an improper or broken interaction between amyloid beta and apo E9. Barger et al. provided evidence of this by demonstrating that although BAPP levels are directly correlated with amyloid beta levels in healthy aging patients, those with AD had decreased BAPP levels and instead showed increased apo E levels.

Additionally, the brain plaques themselves showed high apo E content10. A third piece of evidence is current research suggesting that this protein may possess a natural propensity for misfolding, which can often lead to the formation of toxic, high-mass oligomers that aggregate into plaques5. Yet of the three possible apo E genotypes, only the apo E4 allele is currently implicated in the development of AD. Simply having this apo E4 genotype is considered a major genetic determinant in developing Alzheimer's disease, particularly in patients over age 85. The risk increases in an individual with two copies of the E4 allele by 50% to 90%, and by 45% in those with one copy of the allele10.

Atherosclerosis. Atherosclerosis is a cardiac disease analogous to AD, similarly characterized by lipid deposits in blood vessels where plaque rupture often leads to apoptosis, among other symptoms11. While previous studies by Zhou et al., which demonstrated spontaneous atherosclerotic lesions that grew over time after removal of apo E from mice11, initially seem to imply that apo E is necessary to curb the improper accumulation of LDL, later studies strongly indicated that this result was due mainly to the prevalence of the apo E3 isoform, which is nonmalignant. Results were expected to vary greatly in situations where the apo E4 allele was predominant, based on a progression of plaque build-up analogous to what has been observed in the brain5.

Apolipophorin III

Of the apolipoproteins found in invertebrates, apolipophorin III (apo III) is one of the most important due to its small size and simple composition. Apo III is a five-helix parallel bundle isolated from the hemolymph of Locusta migratoria, where it serves as a generalized transporter for fats, cholesterol, and hydrocarbons12. In addition to its compact nature, it is known to display a high level of general stability6,13. These characteristics create a protein that functions well as a simple model or test system.

Molecular Dynamics

Molecular dynamics simulation (MD), a widely useful tool based on computer modeling and programming, is applied herein to elucidate chemical kinetics of a protein system, with an emphasis on biologically relevant stability. An MD run can be thought of as a slideshow of a biological system over time, in which a three-dimensional structure of a protein is placed in a virtual box filled with solvent and forces are applied to each protein unit at a set time interval, known as a timestep. In an all-atom simulation, the protein units are atoms, with both the protein and the solvent represented as large groups of individual atoms with individual size and charge.

Forces are calculated using force fields, pre-defined mathematical models for calculating interactions between any two potential atoms14. The box itself does not float in a vacuum, but is always surrounded on all sides by boxes identical to itself, which experience the same forces and create a kind of matrix known as periodic boundary conditions, where going too far in any direction brings the molecule around to the other side. These virtual boxes are not inherently cubic, and many simulations are performed using hexagonal prisms or truncated octahedrons.
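The wrap-around behavior of periodic boundary conditions can be illustrated with a minimal Python sketch. It assumes the simplest case of a cubic box and treats one axis at a time; the function names and the 8.0 nm box edge are hypothetical and chosen only for illustration, and this is not how GROMACS itself implements the calculation.

```python
def wrap_coordinate(x, box_length):
    """Map a coordinate back into the primary box [0, box_length)."""
    return x % box_length


def minimum_image(dx, box_length):
    """Shortest separation along one axis between two particles,
    taking every periodic copy of the box into account."""
    return dx - box_length * round(dx / box_length)


box = 8.0  # hypothetical cubic box edge, in nm

# An atom that drifts past the upper face re-enters through the lower face.
wrapped = wrap_coordinate(8.5, box)

# Two atoms 7.0 nm apart inside the box are only 1.0 nm apart
# once the neighboring image of the box is considered.
separation = minimum_image(7.0, box)
```

Non-cubic shapes such as the truncated octahedron follow the same idea, but the wrapping has to be done with the full set of box vectors rather than a single edge length.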

While performing these simulations can be computationally expensive, there are a variety of simplifying assumptions which can be made to greatly speed up the run. The most effective are coarse-grained models, which treat large functional groups or even peptides as single units rather than considering every atom, and implicit solvent models, where an algorithm is applied to mimic the effects of solvent on a protein rather than putting individual solvent molecules in the box. Other methods include increasing the timestep or changing the box shape to remove nonessential water molecules15. The disadvantage of these methods is reduced physicality and greater potential for deviation compared to more detailed and exact methods. Overall, however, the computer-based method provides several advantages, such as the ability to achieve atomic levels of resolution which experimental chemists struggle with. It also gives researchers the ability to study transition states of a pathway, rather than just static folded and unfolded states15.

The simulations herein were performed using a program called GROMACS, an open-source molecular dynamics suite from the University of Groningen, Netherlands16. GROMACS includes all the tools necessary to execute a complete all-atom simulation: energy minimization of the protein, annealing of the water surrounding the protein, and running the simulation. It also has other tools available for installation, such as the module used to analyze general secondary structure, and analysis tools used to generate root mean square deviation (RMSD) and radius of gyration values17. While the GROMACS software already contains several preinstalled force fields, the ffAMBER family of force fields was employed in this study instead18. Assisted Model Building with Energy Refinement, known as AMBER for short, consists of a molecular simulation software package and a series of public domain force fields that can be used in many different modeling programs. Only the latter were used in this study. In addition, beginning with version 4.0, the ffAMBER files used in this study are now part of the default GROMACS distribution19. All simulations were carried out on the twelve-node computing cluster known as SPOT, which is housed in the CSU Long Beach University Library, where it allows lab researchers to perform and monitor simulations of biological molecules.

Significance

Prior computational work involving apolipoproteins has been very limited in scope. This may be due in part to the lack of available structures with sufficient resolution for molecular dynamics studies, stemming from the difficulties in working with larger proteins or those with a high number of charged side chains. While the first apolipoprotein structures were determined in the eighties, advancement to newer structural prediction and elucidation techniques, such as NMR, on this front has been slow1,20, and there remains a great deal to discover. Computational studies, particularly those employing MD simulation methods, have typically been of unusually short duration21-23, especially when compared to concurrent studies and predictions about the near future of protein simulation24. Despite this, there are a multitude of experimental studies being performed using human apolipoproteins, with significant attention being placed on the relation between apo E and neurological diseases25,26.

This study represents an initial effort to bridge the gap between experimentalists and computationalists in this field by providing the information necessary to model these proteins more effectively. It builds on preliminary work done by the same research group in assessing the structural effects of certain AMBER force fields on the small, flexible apo III27. It is also, to the author’s knowledge, the largest apolipoprotein MD simulation data set presented to date. Overall, the author hopes this study will encourage other researchers to pursue new avenues of computational investigation involving apolipoproteins.


CHAPTER 2

METHODS

To set up and successfully run the most stable, biologically realistic all-atom MD simulation of a helical apolipoprotein, multivariate testing of MD simulation parameters was applied. The expected result was an optimized protocol, which would then be applied to a large volume of parallel runs to generate an expansive, novel, and detailed data set.

Choosing MD Parameters

For this study, three apolipoproteins were chosen based on criteria which emphasized the need to compare all factors across different systems of varying function and size, with emphasis on apo E. The first structure (PDB ID: 2KC3) was taken from a 2009 NMR paper and contained the entire N-terminal (NT) domain of apo E320. The second structure (PDB ID: 2L7B) was also NMR-derived, but contained both the apo E NT and C-terminal (CT) domains1. This 2011 study characterized the structure as having fast, reversible transitions that indicated limited potential for conformational changes, and found it to be significantly larger than the NT-only structure at 307 versus 184 residues. This is due in part to the larger protein also displaying a series of small wrapper helices which wound around the main helix bundle. The third protein was an apo III structure (PDB ID: 1LS4), which, despite being complete, was the smallest of the three at 180 residues in size12. These variations in structure were necessary for establishing an optimized set of modeling conditions that would be valid for any lipoprotein. Starting structures for all three proteins were obtained from the RCSB databank and are displayed side by side in Figure 2.


Figure 2. Starting structures for all three apolipoproteins. These were generated using the Jmol viewer integrated into the RCSB website: (a) 1LS4, (b) 2KC3, (c) 2L7B. The rainbow color scheme goes from blue to red and NT to CT. The cartoon representation shows alpha helices as ribbons and both coil and turn sections as ropes.

For each production run, GROMACS employs a single MDP file which incorporates all the simulation run parameters. Using this MDP file, three other independent variables were identified for testing: force field, long-range electrostatics, and cut-off distance scheme. Four force fields were chosen from the dozens available in the molecular modelling community based on previous studies demonstrating their physical relevance, and particular attention was given to those known to favor helical conformations due to simplified bond energies, such as AMBER ff9418. Two kinds of long-range electrostatics were employed: reaction-field (RF) and particle mesh Ewald (PME). These algorithms provided a method for determining atom pair interactions beyond a certain distance, which was specified as described below. Both methods made use of periodic boundary conditions to create a repeating, seemingly endless environment which facilitated the calculation of long-distance interactions. Finally, cut-off schemes were a single variable used to represent four different parameters grouped as one. Cut-off distances for the switch-type simulations done here are the radii (denoted rin and rout) at which GROMACS switches from explicit calculation of pairwise Lennard-Jones and electrostatic interactions to a tapered calculation, followed by no calculation of interactions.
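The three regimes described above (full interaction, tapered interaction, no interaction) can be sketched in Python as a multiplicative switching factor. The quintic polynomial used here is an assumption chosen because it goes smoothly from 1 to 0 between the two radii; it is offered as an illustration and is not claimed to be the exact functional form applied by GROMACS 3.3. The function name and the example radii are likewise hypothetical.

```python
def switch_factor(r, r_in, r_out):
    """Scale factor applied to a pair interaction at separation r.

    r <= r_in          : interaction computed in full (factor 1)
    r_in < r < r_out   : interaction smoothly tapered
    r >= r_out         : interaction ignored (factor 0)
    """
    if r <= r_in:
        return 1.0
    if r >= r_out:
        return 0.0
    x = (r - r_in) / (r_out - r_in)  # 0 at r_in, 1 at r_out
    # Assumed quintic taper: value and first two derivatives are
    # continuous at both ends of the switching region.
    return 1.0 - 10.0 * x**3 + 15.0 * x**4 - 6.0 * x**5


# Hypothetical radii of 0.8 and 1.0 (same length unit as r).
inside = switch_factor(0.5, 0.8, 1.0)    # full interaction
midway = switch_factor(0.9, 0.8, 1.0)    # tapered interaction
outside = switch_factor(1.1, 0.8, 1.0)   # no interaction
```

Tapering the interaction over a finite range avoids the abrupt jump in energy that a plain truncation at a single radius would introduce.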

This ramping of both Coulomb and Van der Waals (VDW) interactions as a function of distance greatly speeds up the calculation in a way that discourages the potential for simulation artifacts. There are four such distances specified in GROMACS (version 3.3): rlist, rcoulomb, rvdw, and rvdw_switch, which set the neighbor-list cut-off, the Coulomb cut-off, the VDW cut-off, and the distance at which the VDW switch begins, respectively. The organization of distances into preset schemes facilitates large-scale comparison and is supported by prior work in this field. GROMACS reads these distances in nanometers, and values between 0.8 and 1.2 nm (8 to 12 Å) may be considered moderate; for comparison, the average distance at which a hydrogen bond forms is 3.0 Å28. These schemes, along with all other variables, are listed in Table 1 for reference.

Table 1. Final list of parameters chosen for comparison. The proteins are referred to by their PDB ID. The force fields are all AMBER, referred to as A- and distinguished by number. For convenience, the four cut-off distances are labelled as L (rlist), C (rcoulomb), V (rvdw), and S (rvdw_switch). All other abbreviations are as discussed in text.

PDB Structure   Force Field   Long-range electrostatics   Cut-off schemes (nm)
1LS4            A03           PME                         L=0.8, C=0.6, V=0.6, S=0.4
2KC3            A94           RF                          L=1.0, C=0.8, V=0.8, S=0.6
2L7B            A99                                       L=1.2, C=0.8, V=1.0, S=0.8
                A99φ                                      L=1.2, C=1.0, V=0.8, S=0.6
                                                          L=1.2, C=1.0, V=1.0, S=0.8
                                                          L=1.2, C=1.2, V=1.2, S=1.0
                                                          L=1.4, C=1.2, V=1.2, S=1.0
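As an illustration of how one cut-off scheme from Table 1 maps onto MDP settings, the following Python sketch renders a scheme as a text fragment. The option names (coulombtype, vdwtype, rlist, rcoulomb, rvdw, rvdw_switch) are standard GROMACS MDP keys and the distances are in nanometers, but the helper function, the use of vdwtype = switch (inferred from the "switch-type" description in the text), and the ordering check are assumptions made here and are not taken from the original run files.

```python
# Cut-off schemes from Table 1 as (rlist, rcoulomb, rvdw, rvdw_switch), in nm.
CUTOFF_SCHEMES = [
    (0.8, 0.6, 0.6, 0.4),
    (1.0, 0.8, 0.8, 0.6),
    (1.2, 0.8, 1.0, 0.8),
    (1.2, 1.0, 0.8, 0.6),
    (1.2, 1.0, 1.0, 0.8),
    (1.2, 1.2, 1.2, 1.0),
    (1.4, 1.2, 1.2, 1.0),
]


def mdp_fragment(scheme, coulombtype="PME"):
    """Render one cut-off scheme as MDP-style 'key = value' lines.

    coulombtype would be "PME" or "Reaction-Field" for the two
    long-range electrostatics methods compared in this study.
    """
    rlist, rcoulomb, rvdw, rvdw_switch = scheme
    # Basic ordering check (an assumption of this sketch, not a GROMACS rule):
    # the switch must start before the VDW cut-off, and no interaction
    # cut-off may exceed the neighbor-list radius.
    if not (rvdw_switch < rvdw <= rlist and rcoulomb <= rlist):
        raise ValueError(f"inconsistent cut-off scheme: {scheme}")
    settings = [
        ("coulombtype", coulombtype),
        ("vdwtype", "switch"),
        ("rlist", rlist),
        ("rcoulomb", rcoulomb),
        ("rvdw", rvdw),
        ("rvdw_switch", rvdw_switch),
    ]
    return "\n".join(f"{key:<12} = {value}" for key, value in settings)


fragment = mdp_fragment(CUTOFF_SCHEMES[2], coulombtype="PME")
```

Grouping the four distances into one tuple mirrors the way the study treats a cut-off scheme as a single variable.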


MD Simulations

Each protein was modelled using all-atom molecular dynamics simulations with the AMBER potential ported to the GROMACS 3.3 molecular modelling software16,29. A cubic box size of 80 Å was used for 1LS4 and 90 Å for both 2KC3 and 2L7B. On average, this required approximately 20,000 TIP3P water molecules as solvent30. Neutrality was obtained by adding both Cl- and Na+ ions randomly to counteract the net positive protein charge. All systems were annealed for 1.0 ns prior to the collection of production simulations. They ran at 1 atm and 300 K without any artificial biasing potentials or restraints, using a 2.0 fs timestep with the LINCS algorithm to constrain bonds31. Conformations were stored every 100 ps.
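The quoted solvent count can be sanity-checked with a short Python estimate. The sketch assumes bulk liquid water at roughly 33.4 molecules per cubic nanometer (about 55.5 mol/L) and ignores the volume taken up by the protein and ions, so it gives an upper bound rather than the actual number of waters placed by GROMACS. The function name is hypothetical.

```python
WATERS_PER_NM3 = 33.4  # approximate number density of liquid water near 300 K


def estimate_waters(edge_angstrom):
    """Upper-bound estimate of water molecules filling a cubic box."""
    edge_nm = edge_angstrom / 10.0          # 10 Å = 1 nm
    return WATERS_PER_NM3 * edge_nm ** 3    # density x volume


waters_1ls4 = estimate_waters(80.0)   # box used for 1LS4
waters_apo_e = estimate_waters(90.0)  # box used for 2KC3 and 2L7B
```

The two estimates bracket the figure of approximately 20,000 waters reported above, which is the expected outcome once the volume displaced by the protein is taken into account.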

Overall, fifty permutations were tested, with each permutation having four simultaneous clone runs that could be averaged to weigh against potential error. In some cases, duplicate sets of four were run. While some theoretically possible permutations were not chosen from the list for data collection, based on the current understanding of protein simulation, the results discussed in Chapter 3 are representative of the range of possibilities dictated by the software.

Following the completion of the runs, an analysis script was created using the Perl scripting language. This script was designed to make a text directory of all the existing simulations, and for each clone run in every group of simulations, generate RMSD and radius of gyration XVG files using GROMACS. Working from this directory, the script used the Statistics::Basic Perl module to calculate mean and standard deviation for these two variables across all clones in each simulation set. Spatial RMSD and radius of gyration are two of the most common metrics currently used to assess simulation stability based on deviations from the starting state, along with other factors such as solvent-accessible surface area.
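The per-set averaging step can be sketched in Python as follows. This is not the original Perl script: it assumes the GROMACS analysis tools wrote plain-text XVG output in which header lines begin with "#" or "@" and each data line holds a time followed by a value, and it uses the sample standard deviation, since the form used by the original script is not stated. All names and the sample data are hypothetical.

```python
import statistics


def read_xvg(text):
    """Return (time, value) pairs from XVG-formatted text."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":   # skip blank, comment, and header lines
            continue
        fields = line.split()
        rows.append((float(fields[0]), float(fields[1])))
    return rows


def clone_summary(clone_texts):
    """Mean and sample standard deviation pooled across all clone runs."""
    values = [value for text in clone_texts for _, value in read_xvg(text)]
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical RMSD output (time in ps, RMSD in nm) for two clones.
clone_a = "# comment line\n@ title \"RMSD\"\n0 0.10\n100 0.20\n"
clone_b = "0 0.30\n100 0.40\n"
mean_rmsd, sd_rmsd = clone_summary([clone_a, clone_b])
```

Pooling every frame from every clone is one of several reasonable choices; averaging each clone first and then averaging the clone means would weight clones of different length equally instead.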

A second script was developed in Python, using a translated framework of the first, to generate and aggregate line plots of RMSD and radius of gyration versus time for each simulation set using matplotlib32. From the line plots, a steady-state time was visually determined for each set of runs. The steady-state (SS) time represents a timestep t after which the system can be considered to have reached equilibrium, and is generally identified based on the relative convergence of the RMSD and radius of gyration of all the clone runs. This is necessary to prevent undue influence from early timeframes in which the protein typically undergoes excessive conformational changes to reach an initial energetically stable state.

After this information was determined for the entire data set, an updated version of the first Perl script was created which ran as before, but also ran secondary structure characterization with DSSP, calculated using only timesteps greater than t. It also gave new, post-equilibration runtimes. DSSP is a hydrogen bond estimation algorithm, interfaced with GROMACS, which classifies every residue at each timeframe in a simulation as one of the following: alpha helix, pi helix, three-ten helix, beta sheet, coil, or turn. Taken together, the changes in this structural timeline allow for monitoring of substructure formation and transition. It is also useful for visualization of the protein trajectory in molecular viewer programs such as VMD or PyMOL33,34, which can assist in identification of trends over time.
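The post-equilibration secondary structure tally can be sketched in Python as below. The sketch assumes each stored frame is available as a time plus a string of one-letter DSSP codes, one per residue, and counts H (alpha helix), G (three-ten helix), and I (pi helix) as helical. Which helix types the original script counted toward "helical residues" is not stated, so that grouping, the function name, and the sample frames are all assumptions.

```python
HELIX_CODES = {"H", "G", "I"}  # DSSP codes: alpha, three-ten, and pi helix


def mean_helical_residues(frames, t_ss):
    """Average number of helical residues over frames with time > t_ss.

    frames: iterable of (time_ps, dssp_string) pairs.
    t_ss:   steady-state time in ps; earlier frames are discarded.
    """
    counts = [
        sum(1 for code in dssp if code in HELIX_CODES)
        for time_ps, dssp in frames
        if time_ps > t_ss
    ]
    if not counts:
        raise ValueError("no frames after the steady-state time")
    return sum(counts) / len(counts)


# Hypothetical 8-residue peptide; "-" stands for any non-helical code.
frames = [
    (0.0, "--------"),     # discarded: before steady state
    (100.0, "-HHHHT--"),   # 4 helical residues
    (200.0, "-HHHGGG-"),   # 6 helical residues
]
average = mean_helical_residues(frames, t_ss=50.0)
```

Discarding frames before the steady-state time keeps the early relaxation of the starting structure from skewing the averages.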


CHAPTER 3

RESULTS AND DISCUSSION

Results

The initial round of results after running both scripts provided a graph of time-based trajectory for both RMSD and radius of gyration for each set of simulations, an example of which is given in Figure 3. As discussed previously, these results were used to determine SS times, which were incorporated into the Perl analysis script. The final execution of the updated script described in Chapter 2 gave a tab-delimited output that could be used to compare different runs, reproduced in the appendix as Table 3.

Figure 3. Representative plot of RMSD and radius of gyration fluctuation over time. The figure presents the fluctuation in the two reaction coordinates of interest, the y-axis, versus time, the x-axis. It was generated using the matplotlib library in Python, which automatically determined the domain and range based on file input. The vertical dashed line indicates the conservative estimate of where the group converges (t). A total of fifty such figures were generated across the entire data set.

The data set represented a total of 301 simulations, with a combined overall runtime of 8.47 microseconds (µs), of which 5.52 µs were post-equilibrium.

Discussion

To rank the results from Table 3, three characteristics were chosen to reflect relative stability: low RMSD, high number of helical residues, and small deviation of the calculated radius of gyration from the native radius of gyration. Due to the experimental origin of the structures employed in the study, greater emphasis was placed on the first two characteristics. The values were compared within each protein, as each protein had a nonnegligible difference in average structure. As an example, consider the contents of Table 3B. Ranked by RMSD, the three top sets are the ff94 sets using EWALD with cut-offs of L1.2-C0.8-V1.0-S0.8 and L1.2-C0.8-V1.0-S0.8, and the same cut-offs with the ff99φ force field (RMSD = 3.4, 3.6, and 3.7 Å, respectively). By helical content, the ranking becomes ff94 with EWALD and L1.2-C1.0-V0.8-S0.6 cut-offs, ff94 again with L1.2-C0.8-V1.0-S0.8, and the ff03 force field with EWALD and L1.2-C1.0-V0.8-S0.6 cut-offs, having 117.96, 117.15, and 116.62 helical residues on average, respectively. Only by combining these two metrics and validating the results against the initial radius of gyration (1.798 nm for 2KC3)20 do we obtain the most comprehensive result. Using this method, the ideal simulation settings found for each protein are given in Table 2 as follows:

Table 2. Ideal parameters for apolipoprotein simulation. As determined via weighted comparison of preliminary results from the previous table, using the method described in the text.

PDB    Force Field   Long-range electrostatics   MDP Cut-off Settings (nm)
                                                 L     C     V     S
1LS4   A94           EWALD                       1.2   1.0   1.0   0.8
2KC3   A94           EWALD                       1.2   0.8   1.0   0.8
2L7B   A94           EWALD                       1.2   1.0   1.0   0.8
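One way to combine the three stability metrics into a single ordering is a weighted sum of ranks, sketched below in Python. The text does not give the exact weighting used, so the weights here (full weight on RMSD and helical content, half weight on radius of gyration deviation) are an assumption that only reflects the stated greater emphasis on the first two metrics. The set names and numbers are hypothetical placeholders and are not results from this study.

```python
def rank(values, reverse=False):
    """Rank positions (1 = best); lowest value is best unless reverse=True."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def combined_ranking(sets, native_rg, weights=(1.0, 1.0, 0.5)):
    """Order simulation sets by a weighted sum of their per-metric ranks."""
    names = list(sets)
    rmsd_rank = rank([sets[n]["rmsd"] for n in names])                   # low is good
    helix_rank = rank([sets[n]["helical"] for n in names], reverse=True)  # high is good
    rg_rank = rank([abs(sets[n]["rg"] - native_rg) for n in names])       # near native is good
    w_rmsd, w_helix, w_rg = weights
    score = {
        n: w_rmsd * rmsd_rank[i] + w_helix * helix_rank[i] + w_rg * rg_rank[i]
        for i, n in enumerate(names)
    }
    return sorted(names, key=score.get)


# Hypothetical placeholder values for three simulation sets of one protein.
example_sets = {
    "set_a": {"rmsd": 3.4, "helical": 117.0, "rg": 1.80},
    "set_b": {"rmsd": 3.6, "helical": 118.0, "rg": 1.85},
    "set_c": {"rmsd": 4.5, "helical": 110.0, "rg": 1.95},
}
ordering = combined_ranking(example_sets, native_rg=1.798)
```

Working with ranks rather than raw values keeps metrics with different units on a common footing, at the cost of ignoring how large the gap between two neighboring sets actually is.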

Table 2 also demonstrates the high level of consistency among the results. For all three protein types, the force field and long-range electrostatic type were the same. In addition, the cut-off schemes were identical in two out of three cases, with the third differing at only a single value. This strong consensus is indicative of results with widespread applicability.

This result is not initially surprising because, as previously stated, force fields with simplified bonding energies were already likely candidates for helical modelling. AMBER ff94 in particular uses a form of RESP which causes it to slightly favor hidden nonpolar residues, a hallmark feature of the classic alpha helical structure, and a potential flaw which has been experimentally verified35. What was unusual was that ff94 is actually the oldest AMBER force field, and one of the earliest second-generation force fields, developed in 1995. At the time, it was a significant improvement over the first generation of force fields because it was developed alongside popular solvent models, such as TIP3P, and took into account the “overpolarization” inherent in the representation of liquid water36. In comparison, AMBER ff99 and ff96 were developed later, partially in an effort to address the issues inherent to the ff94 force field. These newer fields also have potential drawbacks, however, such as potential conflict between the two sets of internal dihedral angles, as illustrated by Hornak et al. in 200637. While it makes sense that an older algorithm could provide more reliable results, it is an interesting testament to the validity of the model that it performed so favorably with newer systems and software. The work done by the Sorin group had already shown that the AMBER-99SB force field (not tested herein) significantly destabilized the protein helix and induced apo III unfolding orders of magnitude faster than expected27. It also showed that ff03 lacked the experimentally observed three-ten helical content which is present with other force fields and overweighted the contribution of certain backbone configurations. However, it gave a favorable review of ff99φ, stating the average radius of gyration was well in agreement with the NMR structure. This remains generally true for the data in this study, although other force fields have performed demonstrably better than ff99φ. Comparatively, the recent MD simulation work published by Huang et al. employs the CHARMM27 force field, which is a very close derivative of the CHARMM22 force field created around the same time as ff9436. They report a maximum of approximately 3.0 Å for RMSD and 18.6 Å for radius of gyration for a four-helix apo E structure 165 residues in size, which is comparable to the results reported here, though it is important to note that the short simulation time they employed may not properly reflect the most energetically stable conformation.

The long-range electrostatics method chosen is also of importance moving forward as it has not been the default for most protein simulations. Reaction-field has been the preferred method due to its relative speed, but it may be that the long-range interactions of certain biological systems are better served by the PME method. Finally, the selected cut-off distances were the least contentious result, as it has been well established over numerous studies14,29 that cut-off distances should be of moderate length. Excessively long cut-offs bog down the simulation, and can decrease the rate so much as to render the calculation unfeasible. Conversely, excessively short cut-offs decrease the physicality of the simulation and risk leaving out important interactions at moderate distances.

Conclusion

In conclusion, we find that the best settings for this type of system are AMBER ff94, with PME and moderate cut-offs. The next steps for this study will be using these settings to collect a massive number of high-temperature unfolding simulations to elucidate the transition state pathway for the model apolipoproteins. This type of unfolding mechanism can be translated to the folding mechanism for the activation of the apolipoprotein E bundle via the principle of microscopic reversibility, which has been demonstrated using Monte Carlo computer simulations15. The nature of this mechanism is of particular interest given the extensive debate among researchers in this field20, particularly those looking at potential drug delivery applications25. Ideally, this project can provide novel insight into the behavior of apolipoproteins on the molecular level, as well as a foundation for future collaborations with experimentalists also studying these species.


APPENDIX

DIFFERENCES IN NATIVE STATE STRUCTURAL BEHAVIOR

BASED ON INITIAL PARAMETERS

Table 3A. Differences in native state structural behavior based on initial parameters for 1LS4.

REFERENCES


1 Chen, J., Li, Q. & Wang, J. Topology of human apolipoprotein E3 uniquely regulates its diverse biological functions. Proc Natl Acad Sci U S A 108, 14813-14818, doi:10.1073/pnas.1106420108 (2011).

2 Weers, P. M. et al. Lipid binding ability of human apolipoprotein E N-terminal domain isoforms: correlation with protein stability? Biophys Chem 100, 481-492 (2003).

3 Weisgraber, K. H. in Lipoproteins, Apolipoproteins, and Lipases Vol. 45 249-295 (Academic Press 1994).

4 Mabtech. Apolipoprotein Research - Mabtech (2014).

5 Narayanaswami, V., Kiss, R. S. & Weers, P. M. The helix bundle: a reversible lipid binding motif. Comp Biochem Physiol A Mol Integr Physiol 155, 123-133, doi:10.1016/j.cbpa.2009.09.009 (2010).

6 Soulages, J. L. & Bendavid, O. J. The lipid binding activity of the exchangeable apolipoprotein apolipophorin-III correlates with the formation of a partially folded conformation. Biochemistry 37, 10203-10210, doi:10.1021/bi980622l (1998).

7 Leon, L. J., Idangodage, H., Wan, C. P. L. & Weers, P. M. M. Apolipophorin III: Lipopolysaccharide binding requires helix bundle opening. Biochem Bioph Res Co 348, 1328-1333, doi:10.1016/j.bbrc.2006.07.199 (2006).

8 Lehninger, A. L. Principles of Biochemistry. 6th edn, (2012).

9 Poirier, J. Apolipoprotein E and Alzheimer's disease. Ann. N.Y. Acad. Sci. 972, 81-92 (2001).

10 Barger, S. W., DeWall, K. M., Liu, L., Mrak, R. E. & Griffin, W. S. T. Relationships between expression of apolipoprotein E and beta-amyloid precursor protein are altered in proximity to Alzheimer beta-amyloid plaques: Potential explanations from cell culture studies. J Neuropath Exp Neur 67, 773-783 (2008).

11 Zhou, J., Lhotak, S., Hilditch, B. A. & Austin, R. C. Activation of the unfolded protein response occurs at all stages of atherosclerotic lesion development in apolipoprotein E-deficient mice. Circulation 111, 1814-1821, doi:10.1161/01.Cir.0000160864.31351.C1 (2005).

12 Fan, D., Zheng, Y., Yang, D. & Wang, J. NMR solution structure and dynamics of an exchangeable apolipoprotein, Locusta migratoria apolipophorin III. J Biol Chem 278, 21212-21220, doi:10.1074/jbc.M208486200 (2003).

13 Weers, P. M., Abdullahi, W. E., Cabrera, J. M. & Hsu, T. C. Role of buried polar residues in helix bundle stability and lipid binding of apolipophorin III: destabilization by threonine 31. Biochemistry 44, 8810-8816, doi:10.1021/bi050502v (2005).

14 Pande, V. S., Sorin, E. J., Snow, C. D. & Rhee, Y. M. in Protein Folding, Misfolding and Aggregation: Classical Themes and Novel Approaches (ed Victor Munoz) 161-187 (Royal Society of Chemistry - Biomolecular Sciences, 2008).

15 Toofanny, R. D. & Daggett, V. Understanding protein unfolding from molecular simulations. Wiley Interdisciplinary Reviews: Computational Molecular Science 2, 405-423 (2012).

16 Lindahl, E., Hess, B. & van der Spoel, D. GROMACS 3.0: a package for molecular simulation and trajectory analysis. J Mol Model 7, 306-317 (2001).

17 Sorin, E. J. & Pande, V. S. Exploring the helix-coil transition via all-atom equilibrium ensemble simulations. Biophys J 88, 2472-2493, doi:10.1529/biophysj.104.051938 (2005).

18 Cornell, W. D. et al. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules (vol 117, pg 5179). J Am Chem Soc 118, 2309-2309, doi:10.1021/Ja955032e (1996).

19 Hess, B., Kutzner, C., van der Spoel, D. & Lindahl, E. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J Chem Theory Comput 4, 435-447, doi:10.1021/ct700301q (2008).

20 Sivashanmugam, A. & Wang, J. J. A Unified Scheme for Initiation and Conformational Adaptation of Human Apolipoprotein E N-terminal Domain upon Lipoprotein Binding and for Receptor Binding Activity. J Biol Chem 284, 14657-14666, doi:10.1074/jbc.M901012200 (2009).

21 Prevost, M. Anatomy by computer experiment of the exchange of a water molecule buried in human apolipoprotein E. Fold Des 3, 345-351, doi:10.1016/S1359-0278(98)00049-2 (1998).

22 Prevost, M. & Kocher, J. P. Structural characterization by computer experiments of the lipid-free LDL-receptor-binding domain of apolipoprotein E. Protein Engineering 12, 475-483, doi:10.1093/protein/12.6.475 (1999).

23 Ortmans, I. & Prevost, M. Analysis of the structural and dynamic properties of human N-terminal domain of apolipoprotein E by molecular dynamics simulations. J Phys Chem B 112, 8730-8736, doi:10.1021/jp8002678 (2008).

24 Snow, C. D., Sorin, E. J., Rhee, Y. M. & Pande, V. S. How Well Can Simulation Predict Protein Folding Kinetics and Thermodynamics? Annu. Rev. Biophys. Biomol. Struct. 34, 43-69 (2005).

25 Kim, S. H. et al. Targeted Intracellular Delivery of Resveratrol to Glioblastoma Cells Using Apolipoprotein E-Containing Reconstituted HDL as a Nanovehicle. PloS one 10, doi:10.1371/journal.pone.0135130 (2015).

26 Vasunilashorn, S. M. et al. Does Apolipoprotein E Genotype Increase Risk of Postoperative Delirium? Psychiatric research 23, 1029-1037 (2015).

27 Thompson, E. J., DePaul, A. J., Patel, S. S. & Sorin, E. J. Evaluating molecular mechanical potentials for helical peptides and proteins. PloS one 5, e10056, doi:10.1371/journal.pone.0010056 (2010).

28 Jeffrey, G. A. An introduction to hydrogen bonding. (Oxford University Press, 1997).

29 ffAMBER: AMBER force field ports for the GROMACS molecular dynamics suite (http://chemistry.csulb.edu/ffamber/, 2008).

30 Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. The Journal of Chemical Physics 79 (1983).

31 Hess, B., Bekker, H., Berendsen, H. J. C. & Fraaije, J. G. E. M. LINCS: A linear constraint solver for molecular simulations. J. Comput. Chem. 18, 1463-1472 (1997).

32 Hunter, J. D. Matplotlib: A 2D graphics environment. Computing In Science & Engineering 9, 90-95, doi:10.1109/MCSE.2007.55 (2007).

33 Humphrey, W., Dalke, A. & Schulten, K. VMD: visual molecular dynamics. J Mol Graph 14, 33-38, 27-38 (1996).

34 Schrodinger, LLC. The PyMOL Molecular Graphics System, Version 1.8 (2015).

35 Garcia, A. E. & Sanbonmatsu, K. Y. Alpha-helical stabilization by side chain shielding of backbone hydrogen bonds. Proc Natl Acad Sci U S A 99, 2782-2787, doi:10.1073/pnas.042496899 (2002).

36 Ponder, J. W. & Case, D. A. Force fields for protein simulations. Adv Protein Chem 66, 27-85 (2003).

37 Hornak, V. et al. Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins 65, 712-725, doi:10.1002/prot.21123 (2006).
