
Investigation of Protein/Ligand Interactions Relating Structural Dynamics to Function: Combined Computational and Experimental Approaches

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Ryan Elliott Pavlovicz

Graduate Program in Biophysics

The Ohio State University

2014

Dissertation Committee:

Chenglong Li, Advisor

Charles E. Bell

Michael E. Paulaitis

Copyright by

Ryan Elliott Pavlovicz

2014

Abstract

The use of computers in chemistry has matured significantly since the

introduction of the modern personal computer, leading to the development of

many tools that may be used to describe phenomena that are difficult or

otherwise impossible to observe experimentally. Computational chemistry is

becoming an integral component of many research programs, often resulting in

the formation of hypotheses that may be tested experimentally. This document details the application of computational tools to study ligand/receptor interactions

in two systems: the nicotinic acetylcholine receptor (nAChR) and the retinoic acid receptor (RAR).

The nAChR study details how a combination of homology modeling, molecular dynamics (MD), blind docking, and free energy analysis may be used to

determine the binding site of a ligand with few clues from experiment to guide the

search. Specifically, the binding site for a class of negative allosteric nAChR

modulators was successfully identified. The computationally predicted binding

site was verified by functional assays performed on receptors that were mutated

at the suspected binding site. Additionally, a comparison of structural data from

homologous proteins and MD simulations of the receptor in complex with an

allosteric modulator led to a proposed mode of allosteric antagonism that

involves inhibition of C loop closure, thereby preventing channel opening.


In the RAR study, both computation and experiment were applied to characterize

the activity of two β-apocarotenoids that have been previously described as

antagonists of all-trans retinoic acid (ATRA), the endogenous RAR agonist. The

activity of RAR ligands is related to how they influence the interaction between

the receptor and coactivator proteins that lead to gene transcription. The results

of isothermal titration calorimetry (ITC) experiments indicate that the

β-apocarotenoids induce an interaction between the receptor and coactivator that

is intermediate in strength between the unliganded and ATRA-bound receptor,

indicating that these compounds would be most accurately characterized as

partial agonists instead of antagonists. One of the partial agonists, β-apo-13-carotenone, exhibits an unexpectedly high affinity for RAR given its chemical differences from known high-affinity binders. Modeling this compound in the RAR binding site led to the hypothesis that a covalent interaction may be occurring between the carotenone and a conserved cysteine residue in the binding pocket.

While not conclusive, NMR and mass spectrometry experiments suggest that this interaction is indeed occurring.

Computational free energy analysis was also performed between the ligand-bound receptors and the coactivator. Using the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method applied to microsecond MD simulations, a very strong correlation was achieved between the computational binding energies and the experimental ITC data, providing support that the compounds were correctly modeled in the RAR binding pocket. Converged binding energy averages that led to the strong correlation with experiment were

contingent upon simulation lengths of ~1 μs, and inclusion of both the calculated

PBSA free energy of solvation and entropic components of binding was found to

strengthen the correlation.

Finally, Chapter 5 includes a study on the parameterization of a new atom type for use in the pair-wise additive AMBER force field. The sulfonium atom type is not included in the current set of parameters since it is relatively uncommon in biology. However, S-adenosylmethionine (SAM), the most common methyl donor in biology, is a notable exception. The development of sulfonium parameters required for the MD simulation of SAM is discussed in detail.


Dedication

This document is dedicated to my family.


Vita

1999 ...... North Royalton High School

2004 ...... B.S. Electrical and Computer Engineering, Ohio State University

2006-2014 ...... Graduate Research Assistant, Department of Pharmacy, Ohio State University

2008-2010 ...... American Foundation for Pharmaceutical Education Pre-Doctoral Fellowship

2010 ...... NSF East Asian and Pacific Summer Institutes for U.S. Graduate Students Research Fellowship (Institute of Biophysics, Beijing, China)

2010 ...... American Chemical Society Division of Medicinal Chemistry Pre-Doctoral Fellowship

2011 ...... Outstanding Student Achievement Award presented by the Ohio State University Biophysics Graduate Committee

2012 ...... Presidential Fellowship (Ohio State University Graduate School)


Publications

1.) Frey EN, Pavlovicz RE, Wegman CJ, Li C, Askwith CC. “Conformational changes in the lower palm domain of ASIC1a contribute to desensitization and RFamide modulation”, PLOS ONE, 8(8): e71733, 2013.

2.) Yi B, Long S, González-Cestari TF, Henderson BJ, Pavlovicz RE, Werbovetz K, Li C, McKay DB. “Discovery of benzamide analogs as negative allosteric modulators of human neuronal nicotinic receptors: pharmacophore modeling, rational design, and structure-activity relationship studies”, Bioorganic & Medicinal Chemistry, 21(15):473-43, 2013.

3.) Liu MJ, Bao S, Gálvez-Peralta M, Pyle CJ, Rudawsky AC, Pavlovicz RE, Killilea DW, Li C, Nebert DW, Wewers MD, Knoell DL. “ZIP8 regulates host defense through zinc-mediated inhibition of NF-κB.” Cell Reports, 3(2):386-400, 2013.

4.) Still P, Yi B, González-Cestari TF, Pan L, Pavlovicz RE, Chai H-B, Ninh T, Li C, Soejarto DD, McKay DB, Kinghorn AD. “Alkaloids from Microcos paniculata with cytotoxic and nicotinic receptor antagonistic activities”, Journal of Natural Products, 76(2):243-439, 2013.

5.) Koval OM, Snyder JS, Wolf RM, Pavlovicz RE, Cardona N, Glynn P, Leymaster ND, Dun W, Wright PJ, Qian L, Mitchell CC, Boyden PA, Binkley PF, Li C, Anderson ME, Mohler PJ, Hund TJ. “CaMKII-based regulation of voltage-gated Na+ channel in cardiac disease”, Circulation, 126(17):2084-94, 2012.

6.) Henderson BJ, González-Cestari TF, Yi B, Pavlovicz RE, Boyd RT, Li C, Bergmeier SC, McKay DB. “Defining the putative inhibitory site for a selective allosteric modulator of human α4β2 neuronal nicotinic receptors”, ACS Chemical Neuroscience, 3(9):682-92, 2012.

7.) Henderson BJ, Carper DJ, González-Cestari TF, Yi B, Mahasenan KV, Pavlovicz RE, Dalefield ML, Coleman RS, Li C, McKay DB. “Structure-activity relationship studies of sulfonylpiperazine analogues as novel negative allosteric modulators of human neuronal nicotinic receptors”, Journal of Medicinal Chemistry, 54(24):8681-92, 2011.

8.) Mahasenan KV, Pavlovicz RE, Henderson BJ, González-Cestari TF, Yi B, McKay DB, Li C. “Discovery of novel α4β2 neuronal nicotinic receptor modulators through structure-based virtual screening”, ACS Medicinal Chemistry Letters, 2(11):855-860, 2011.

9.) Pavlovicz RE, Henderson BJ, Bonnell AB, Boyd RT, McKay DB, Li C. “Identification of a novel negative allosteric site on human α4β2 and α3β4 neuronal nicotinic acetylcholine receptors”, PLOS ONE, 6(9): e24949, 2011.

10.) West MB, Wickham S, Quinalty LM, Pavlovicz RE, Li C, Hanigan M. “Autocatalytic cleavage of human gamma-glutamyl transpeptidase is highly dependent on N-glycosylation at asparagine 95”, Journal of Biological Chemistry, 286(33):28876-88, 2011.

11.) Henderson BJ, Pavlovicz RE, Allen JD, González-Cestari TF, Orac CM, Bonnell AB, Zhu MX, Boyd RT, Li C, Bergmeier SC, McKay DB. “Negative allosteric modulators that target human α4β2 neuronal nicotinic receptors”, Journal of Pharmacology and Experimental Therapeutics, 334(3):761-74, 2010.


12.) Doddapaneni K, Mahler B, Pavlovicz RE, Haushalter A, Yuan C, Wu Z. “Solution structure of RCL, a novel 2’-deoxyribonucleoside 5’-monophosphate N-glycosidase”, Journal of Molecular Biology, 394(3), 423-434, 2009.

13.) Tiwari R, Mahasenan K, Pavlovicz RE, Li C, Tjarks W. “Carborane clusters in computational drug design: a comparative docking evaluation using AutoDock, FlexX, Glide, and Surflex”, Journal of Chemical Information and Modeling, 49(6), 1581-1589, 2009.

14.) Liu Z, Liu S, Xie Z, Pavlovicz RE, Wu J, Chen P, Aimiuwu J, Pang J, Bhasin D, Neviani P, Fuchs JR, Plass C, Li PK, Li C, Huang THM, Wu LC, Rush L, Wang H, Perrotti D, Marcucci G, Chan KK. “Modulation of DNA methylation by a sesquiterpene lactone parthenolide”, Journal of Pharmacology and Experimental Therapeutics, 329(2), 505-514, 2009.

15.) González-Cestari TF, Henderson BJ, Pavlovicz RE, McKay SB, El-Hajj RA, Pulipaka AB, Orac CM, Reed DD, Boyd RT, Zhu MX, Li C, Bergmeier SC, McKay DB. “Effect of novel negative allosteric modulators of neuronal nicotinic receptors on cells expressing native and recombinant nicotinic receptors: implications for drug discovery”, Journal of Pharmacology and Experimental Therapeutics, 328(2), 504-515, 2009.

16.) Liu Z, Xie Z, Jones W, Pavlovicz RE, Liu S, Li PK, Lin J, Fuchs JR, Marcucci G, Li C, Chan KK. “Curcumin is a potent DNA hypomethylation agent”, Bioorganic & Medicinal Chemistry Letters, 19(3), 706-709, 2009.

Fields of Study

Major Field: Biophysics


Table of Contents

Abstract ...... ii

Dedication ...... v

Vita ...... vi

List of Tables ...... xii

List of Figures ...... xv

List of Abbreviations ...... xxi

Chapter 1 . Introduction ...... 1

1.1 Computers in Biochemistry ...... 1

1.2 Free Energy Calculations ...... 12

1.3. Dissertation Themes and Organization ...... 17

Chapter 2 . Identification of a Negative Allosteric Binding Site on the Nicotinic

Acetylcholine Receptor ...... 21

2.1 Introduction ...... 21

2.2 nAChR Background ...... 22

2.3 Homology Modeling ...... 29

2.4 Molecular Dynamics ...... 36

2.5 Blind Docking ...... 44

2.6 Focused Docking and Induced Fit Molecular Dynamics ...... 50

2.7 Binding Site Validation: Mutagenesis and Functional Assays ...... 53

2.8 Free Energy Analysis ...... 59

2.9 Mechanism of Allosteric Antagonism ...... 63

2.10 Conclusions ...... 70

Chapter 3 . Experimental Investigation of Retinoic Acid Receptor Antagonism .. 71

3.1 Introduction ...... 71

3.2 Nuclear Receptor Background ...... 72

3.3 NR LBD expression and purification ...... 103

3.4 Circular Dichroism Experiments ...... 116

3.5 Dimerization LC Experiments ...... 120

3.6 ITC experiments ...... 122

3.7 Origin of β-apo-13-carotenone binding affinity ...... 141

3.8 Conclusions ...... 164

Chapter 4 . Computational Investigation of Retinoic Acid Receptor Antagonism

...... 166

4.1 Introduction ...... 166

4.2 Ligand Parameterization ...... 166

4.3 Ligand Docking to RARα LBD ...... 181

4.4 Molecular Dynamics Simulations of RARα Complexes ...... 192

4.5. Free Energy Analysis ...... 212

4.6. Long Timescale Simulations of Apo RARα LBD ...... 233

4.7. Conclusions ...... 239

Chapter 5 . Force Field Parameterization of S-Adenosylmethionine ...... 240

5.1 Introduction ...... 240


5.2 SAM Background ...... 240

5.3 Survey of Existing SAM and SAH Structures ...... 249

5.4 Sulfonium Force Field Parameterization ...... 256

5.5 Derivation of Partial Atomic Charges for SAM ...... 290

References ...... 298

Appendix A. Nuclear Receptor Sequence Alignment ...... 316

Appendix B. Structure of Synthetic RAR Ligands Available from Tocris ...... 319

Appendix C. His-hRARα LBD Primary Sequence and Plasmid Sequence ...... 321

Appendix D. His-hRXRα LBD Primary Sequence and Plasmid Sequence ...... 323

Appendix E. FPLC Standards ...... 325

Appendix F. Detailed ITC Data ...... 327

Appendix G. AMBER Input Files for All-trans Retinoic Acid ...... 330

Appendix H. AMBER Input Files for TTNPB ...... 332

Appendix I. AMBER Input Files for β-apo-13-carotenone ...... 334

Appendix J. AMBER Input Files for β-apo-14’-carotenoic Acid ...... 336

Appendix K. Partial Atomic Charges for β-apo-13-carotenone Covalently Linked to Cysteine...... 338

Appendix L. CCS Library File for Implementation in AMBER ...... 340

Appendix M. CCS Parameter File for Implementation in AMBER ...... 347

Appendix N. EC 2.1.1 Members That Do Not Use SAM as a Methyl Donor ..... 348

Appendix O: Partial Atomic Charges for All SAM Conformations Used in

Multiconfiguration Fits ...... 350


List of Tables

Table 2.1. Sequence identity between template and model sequences...... 31

Table 2.2. Sequence similarity between template and model sequences...... 32

Table 2.3. Number of conformational clusters from MD simulations of nAChR models in three different binding states...... 38

Table 2.4. Average RMSDs for backbone atoms of ECD models from MD simulations in three states...... 41

Table 2.5. Measurements of agonist binding stability in MD simulations of -bound nAChRs...... 44

Table 2.6. Blind docking results for agonists to multiple nAChR conformations...... 49

Table 2.7. Effects of agonists and antagonists on wild type and mutant hα4β2 nAChRs...... 58

Table 2.8. MM-PBSA binding energy calculations for epibatidine- and KAB-18-bound receptors.

...... 61

Table 2.9. Survey of C loop closure for AChBP X-ray structures...... 65

Table 2.10. General ranges for C loop "openness" upon binding ligands of different pharmacological function...... 65

Table 2.11. Measurements of C loop closure for MD simulations of epibatidine-bound nAChRs. 66

Table 3.1. List of common NRs and their known endogenous ligands...... 74

Table 3.2. Apocarotenoid binding affinity for human RAR subtypes...... 101

Table 3.3. Percentage of folded His-hRARα LBD with added ethanol...... 120

Table 3.4. Summary of ITC experiments with hRARα LBD...... 137

Table 3.5. Summary of ITC experiments on hRARα C235A LBD...... 147

Table 3.6. Existing hRARα LBD crystal structures...... 162

Table 3.7. Conditions tested for β-apo-13-carotenone and RARα crystallization...... 163


Table 4.1. Comparison of calculated angles describing the tetrasubstituted carbon of compound

1...... 176

Table 4.2. Optimized angles and force constants for the tetrahedral linkage of β-apo-13-carotenone to cysteine...... 179

Table 4.3. Comparison of angle measurements from MD simulations using GAFF or MP2/6-311+G(d,p)-optimized parameters...... 180

Table 4.4. ATRA docking to RARα LBD...... 184

Table 4.5. Cluster analysis of β-apo-13-carotenone docked to RARα LBD...... 187

Table 4.6. Cluster analysis of β-apo-14'-carotenoic acid docked to RARα LBD...... 190

Table 4.7. List of RARα LBD simulations performed...... 192

Table 4.8. All-atom RMSD of ligands with respect to average structure...... 206

Table 4.9. Entropic and MM-PBSA deviations from 50 ps data for reduced data sets...... 219

Table 4.10. Summary of ITC Data of SRC-1 NR2 peptide binding ligand-bound RARα...... 221

Table 4.11. MM-PBSA binding energy components after 1.5 μs of MD simulation...... 222

Table 4.12. Detailed MM-PBSA components for receptor peptide interaction...... 223

Table 5.1. List of crystal structures binding SAM/SAH with a syn-conformation...... 255

Table 5.2. Experimental, force field, and ab initio sulfonium measurements...... 263

Table 5.3. Partial atomic charges for 2,3-butanedione...... 272

Table 5.4. Force constants derived for trimethylsulfonium...... 278

Table 5.5. RMSD between MM and QM frequencies for C3H9S+ with reparameterized V3...... 279

Table 5.6. H-C-S-C profiles...... 281

Table 5.7. Force constants derived for ethyldimethylsulfonium...... 284

Table 5.8. RMSD between MM and QM frequencies for C4H11S+ with reparameterized V3...... 287

Table 5.9. C-C-S-C dihedral scan results...... 287

Table 5.10. Final sulfonium parameters...... 290

Table 5.11. PDB structure used for derivation of partial atomic charges representing the anti-conformation...... 294

Table 5.12. PDB structures used for derivation of partial atomic charges representing the high anti-conformation...... 294

Table 5.13. PDB structures used for derivation of partial atomic charges representing the syn-conformation...... 294

Table 5.14. Comparison of partial atomic charges for SAM in multiple conformations...... 296

Table E.1. Contents of protein standard 1...... 325

Table E.2. Contents of protein standard 2...... 326

Table F.1. ATRA-bound hRARα LBD ITC results...... 327

Table F.2. TTNPB-bound hRARα LBD ITC results...... 327

Table F.3. β-apo-13-carotenone-bound hRARα LBD ITC results...... 327

Table F.4. β-apo-14'-carotenoic acid-bound hRARα LBD results...... 327

Table F.5. BMS 195614-bound hRARα LBD ITC results...... 328

Table F.6. β-apo-13-lycopenone-bound hRARα LBD ITC results...... 328

Table F.7. Untreated (apo) hRARα LBD ITC results...... 328

Table F.8. ATRA-bound C235A hRARα LBD ITC results...... 328

Table F.9. β-apo-13-carotenone-bound C235A hRARα LBD ITC results...... 329

Table K.1. Partial atomic charges for CCS...... 338

Table O.1. Partial atomic charges for SAM in anti-conformation...... 350

Table O.2. Partial atomic charges for SAM in high anti-conformation...... 350

Table O.3. Partial atomic charges for SAM in syn-conformation...... 352


List of Figures

Figure 1.1. Thermodynamic cycle implemented in the MM-PBSA protocol...... 15

Figure 1.2. Structural representation of the two receptors studied in this dissertation...... 17

Figure 2.1. Schematic of neuronal nAChR structure...... 23

Figure 2.2. Structure of acetylcholine binding protein (AChBP)...... 26

Figure 2.3. Numbered sequence alignment of AChBP and nAChR sequences used for modeling.

...... 31

Figure 2.4. Histograms of model energies per modeling iteration...... 35

Figure 2.5. RMSD plots for MD simulations of nAChR models...... 39

Figure 2.6. Average all-atom RMSDs for hα4β2 nAChR ECD model in three different binding states...... 42

Figure 2.7. Average all-atom RMSDs for hα3β4 nAChR ECD model in three different binding states...... 43

Figure 2.8. Compounds used in blind docking experiments...... 45

Figure 2.9. Blind docking modes compared to X-ray structures...... 49

Figure 2.10. Stability of KAB-18 at its proposed binding site...... 52

Figure 2.11. Detailed docking modes for negative allosteric nAChR modulators...... 53

Figure 2.12. Dose-response curves for epibatidine and KAB-18 on wild type and mutant hα4β2

nAChRs...... 57

Figure 2.13. Convergence of MM-PBSA calculations...... 62

Figure 2.14. C loop closure of AChBP bound to various ligands...... 64

Figure 2.15. Comparison of experimental DMXBA binding to computationally predicted KAB-

18/epibatidine binding...... 69

Figure 3.1. Domain organization of N-CoR1 and SMRT corepressor proteins...... 76

Figure 3.2. Domain organization of p160 family of coactivators...... 78


Figure 3.3. Domain organization of typical nuclear receptor...... 83

Figure 3.4. DNA-binding domain of estrogen receptor in complex with hormone response element...... 85

Figure 3.5. Diagram of apo hRXRα LBD...... 86

Figure 3.6. Crystal structure of apo hPPARγ...... 88

Figure 3.7. Crystal structure of agonist-bound hPPARγ in complex with SRC-1 NR-box 2 peptide.

...... 90

Figure 3.8. Antagonist-induced H12 conformation...... 92

Figure 3.9. Corepressor interactions with inverse agonist-bound NR LBDs...... 96

Figure 3.10. Extended helix 12 of apo hRXRα LBD interacts with coactivator binding pocket of a neighboring molecule...... 97

Figure 3.11. β-carotene and apocarotenoids...... 102

Figure 3.12. SDS-PAGE of His-hRARα LBD...... 105

Figure 3.13. Size exclusion chromatogram for His-hRARα LBD...... 107

Figure 3.14. SDS-PAGE for His-hRXRα LBD...... 109

Figure 3.15. Size exclusion chromatogram for His-hRXRα LBD...... 110

Figure 3.16. SDS-PAGE of His-hRARα/∆His-hRXRα LBD heterodimer purification...... 112

Figure 3.17. Size exclusion chromatogram for His-hRARα/∆His-hRXRα LBD purification...... 113

Figure 3.18. Purification of RARα/RXRα heterodimers...... 114

Figure 3.19. SDS-PAGE of His-hRARα LBD thrombin cleavage...... 116

Figure 3.20. Circular dichroism spectrum for His-hRARα LBD in solution with varied amounts of ethanol...... 118

Figure 3.21. Percentage of folded His-hRARα LBD with added ethanol...... 119

Figure 3.22. Size exclusion chromatograms of hRARα/hRXRα LBD complexes...... 122

Figure 3.23. Structures of RAR modulators used in ITC experiments...... 128

Figure 3.24. ITC results of SRC-1 NR2 peptide binding to ATRA-bound hRARα LBD...... 130

Figure 3.25. ITC results of SRC-1 NR2 peptide binding to TTNPB-bound hRARα LBD...... 131

Figure 3.26. ITC results of SRC-1 NR2 peptide binding to β-apo-13-carotenone-bound hRARα

LBD...... 132

Figure 3.27. ITC results of SRC-1 NR2 peptide binding to β-apo-14'-carotenoic acid-bound

hRARα LBD...... 133

Figure 3.28. ITC results of SRC-1 NR2 peptide binding to BMS614-bound hRARα LBD...... 134

Figure 3.29. ITC results of SRC-1 NR2 peptide binding to β-apo-13-lycopenone-bound hRARα

LBD...... 135

Figure 3.30. ITC results of SRC-1 NR2 peptide binding to apo hRARα LBD...... 136

Figure 3.31. Suggested mechanism of covalent bond formed between β-apo-13-carotenone and

C235 of hRARα LBD...... 142

Figure 3.32. Luffariellolide covalently binds to RAR LBD...... 143

Figure 3.33. ITC results of SRC-1 NR2 peptide binding to ATRA-bound hRARα C235A LBD. .. 145

Figure 3.34. ITC results of SRC-1 NR2 peptide binding to β-apo-13-carotenone-bound hRARα

C235A LBD...... 146

Figure 3.35. 13C-labeled β-apo-13-carotenone...... 148

Figure 3.36. 13C NMR spectra...... 150

Figure 3.37. ADEQUATE spectrum of free, triply-labeled β-apo-13-carotenone...... 154

Figure 3.38. ADEQUATE spectrum of triply-labeled β-apo-13-carotenone bound to hRARα LBD

(C203S, C336S)...... 155

Figure 3.39. ADEQUATE spectrum of triply-labeled β-apo-13-carotenone bound to hRARα LBD

(C203S, C336S, C235A)...... 156

Figure 3.40. Mass spectra of hRARα LBD...... 159

Figure 4.1. Structure of all-trans retinoic acid (ATRA)...... 169

Figure 4.2. Structure of TTNPB...... 170

Figure 4.3. Structure of β-apo-14'-carotenoic acid...... 171

Figure 4.4. Structure of β-apo-13-carotenone...... 172

Figure 4.5. Structure of CCS amino acid and 2-(methylthio)but-3-en-2-ol (1)...... 177

Figure 4.6. Measurement of the c2-c3-S angle from MD simulations using parameters refit to ab initio calculations or using GAFF parameters...... 181

Figure 4.7. Docking modes of ATRA...... 185

Figure 4.8. Docking modes of β-apo-13-carotenone...... 186

Figure 4.9. Proximity of β-apo-13-carotenone to C235...... 188

Figure 4.10. β-apo-14'-carotenoic acid extending outside the RAR binding cavity...... 189

Figure 4.11. Docking models of β-apo-14'-carotenoic acid...... 191

Figure 4.12. RAR LBD structural alignment and mobility...... 197

Figure 4.13. RARα LBD backbone RMSD...... 198

Figure 4.14. Average per residue RMSDs with respect to starting structure (1)...... 202

Figure 4.15. Average per residue RMSDs with respect to starting structure (2)...... 203

Figure 4.16. Average per residue RMSDs with respect to the average structure (1)...... 204

Figure 4.17. Average per residue RMSDs with respect to the average structure (2)...... 205

Figure 4.18. Induced fit MD of β-apo-14'-carotenoic acid...... 207

Figure 4.19. Deformation of L1-3 loop upon β-apo-14'-carotenoic acid binding...... 209

Figure 4.20. Y208-D288 distance...... 209

Figure 4.21. S213-D221 distance...... 210

Figure 4.22. Distance between β-apo-13-carotenone and C235 of RAR LBD...... 212

Figure 4.23. Entropy comparison between full and truncated systems...... 217

Figure 4.24. Computed entropic component, T∆S, of SRC-2 NR2 binding to ATRA-bound hRARα

LBD...... 219

Figure 4.25. Range of average deviations from 50 ps data for reduced entropy and MM-PBSA data sets...... 220

Figure 4.26. Computed binding energies for SRC-1 NR2 peptide binding to RAR LBD...... 222

Figure 4.27. Computed ∆Gbind distributions...... 226

Figure 4.28. Computed ∆Gbind distributions...... 227


Figure 4.29. Correlation between computed and experimental ∆G values for SRC-1 NR2 peptide

binding to RARα...... 228

Figure 4.30. Correlation coefficient over 1.5 μs of free energy calculations...... 229

Figure 4.31. Correlation coefficient and slope of regression lines for binding energy components.

...... 231

Figure 4.32. Comparison of SRC-1 NR2 binding energies to RARα bound to β-apo-13-carotenone both covalently and non-covalently...... 232

Figure 4.33. Deviation of apo RARα LBD simulations...... 235

Figure 4.34. Computed secondary structure of 4.75 μs RARα LBD simulations (run 1)...... 236

Figure 4.35. Computed secondary structure of 5 μs RARα LBD simulation (run 2)...... 237

Figure 4.36. Conformation of apo RAR LBD over 5 μs of explicit solvent simulation...... 238

Figure 5.1. SAM methyltransferase reaction...... 242

Figure 5.2. Radical SAM reaction...... 245

Figure 5.3. The SAM cycle...... 248

Figure 5.4. Folic acid cycle...... 249

Figure 5.5. Statistics of PDB crystal structures including SAM or SAH...... 250

Figure 5.6. Glycosidic (χ) torsion angle of SAM and SAH structures in the PDB...... 251

Figure 5.7. 44 SAM molecules in the anti-conformation...... 253

Figure 5.8. 184 SAM molecules in the high anti-conformation...... 253

Figure 5.9. 22 SAM molecules in the syn-conformation...... 254

Figure 5.10. Sulfur atom types parameterized in the general AMBER force field (GAFF)...... 260

Figure 5.11. Sample dihedral profiles with periodicity of 1, 2, and 3...... 267

Figure 5.12. Dihedral scan of 2,3-butanedione...... 268

Figure 5.13. Initial dihedral profile fitting test for 2,3-butanedione...... 270

Figure 5.14. Dihedral profiles for 2,3-butanedione with different charge sets...... 273

Figure 5.15. Optimized dihedral profiles for 2,3-butanedione...... 275

Figure 5.16. Force field parameters required for treatment of sulfonium center in SAM...... 276

Figure 5.17. Fitting of the H-C-S-C V3 parameter...... 280

Figure 5.18. H-C-S-C torsional profile of C3H9S+ with existing force field parameters...... 282

Figure 5.19. Fitting of the C-C-S-C V3 parameter...... 286

Figure 5.20. Comparison of C-C-S-C torsional profile with new and existing force field ...... 288

Figure 5.21. Absolute energy difference between QM and MM C-C-S-C torsional profiles...... 289

Figure 5.22. Geometry optimization of SAM...... 293

Figure 5.23. Partial atomic charges for SAM with three different χ angles...... 297

Figure B.1. Structure of synthetic RAR ligands...... 320

Figure E.1. FPLC chromatogram for protein standard set 1...... 325

Figure E.2. FPLC chromatogram for protein standard set 2...... 326


List of Abbreviations

ACBP = acyl-CoA binding protein

AChBP = acetylcholine binding protein

AdoHcy = S-adenosyl-L-homocysteine

AdoMet = S-adenosyl-L-methionine

AK = aspartate kinase

ATRA = all-trans retinoic acid

BPTI = basic pancreatic trypsin inhibitor

CREB = cAMP response element binding-protein

CBP = CREB binding-protein

CI = confidence interval

DBD = DNA-binding domain (of NR)

DMXBA = 3-(2,4-dimethoxybenzylidene)-anabaseine

DFT = density functional theory

EC number = enzyme commission number

ECD = extracellular domain

EM = electron microscopy

FEP = free energy perturbation

FF = force field

FPLC = fast protein liquid chromatography

GAFF = general AMBER force field

HAT = histone acetyltransferase

HDAC = histone deacetylase

ITC = isothermal titration calorimetry

LBD = ligand-binding domain


LES = locally enhanced sampling

LGA = Lamarckian genetic algorithm

MAT = methionine adenosyltransferase

MD = molecular dynamics

MEP = molecular electrostatic potential

MM = molecular mechanics

MM-PBSA = molecular mechanics Poisson-Boltzmann surface area

Phser = phosphohomoserine

pLGIC = pentameric ligand-gated ion channel

nAChR = nicotinic acetylcholine receptor

NAM = negative allosteric modulator

NR = nuclear receptor

NR2 = nuclear receptor box II (second NR interaction motif of SRC)

PDB = Protein Data Bank

PME = particle mesh Ewald

QM = quantum mechanics

RAR = retinoic acid receptor

REMD = replica exchange molecular dynamics

RESP = restrained electrostatic potential

RID = receptor-interaction domain (of SRC)

RMSD = root-mean-square deviation

RXR = retinoid X receptor

SAH = S-adenosyl-L-homocysteine

SAM = S-adenosyl-L-methionine

SAR = structure activity relationship

SD = standard deviation

SE = standard error

SRC = steroid receptor coactivator

TCEP = tris(2-carboxyethyl)phosphine

TI = thermodynamic integration

THF = tetrahydrofolate

TM = transmembrane


Chapter 1. Introduction

1.1 Computers in Biochemistry

Computation has been called the “third pillar of science” [1], serving as an

important link between theory and experiment. As applied in the field of

chemistry, computation is able to describe phenomena that are either difficult or

otherwise impossible to observe experimentally. In the past two decades, the

field of computational chemistry has become increasingly refined, now frequently

serving as a valuable complement to experiment; it can often provide insights to

drive the development of new hypotheses that may be tested experimentally. As

the field matures, it draws closer to attaining the significant goal of becoming a

reliably predictive method.

Perhaps the best way to chronicle the developments of computational chemistry

is to highlight the two Nobel Prizes awarded to scientists in the field. First, in

1998, the Nobel Prize in Chemistry was awarded to Walter Kohn for “his development of the density-functional theory” and to John Pople for his

“development of computational methods in quantum chemistry.” Both Kohn and

Pople significantly contributed to the solution of quantum mechanical (QM) electronic structure calculations through which the properties of chemical entities may be determined. Kohn helped develop density functional theory (DFT) where

functionals are used to describe electron density as an alternative method to

dealing with wavefunctions [2]. This method scales well with an increased

number of electrons and is therefore popular in electronic structure calculations

of many-electron systems such as those studied by material scientists.

Pople’s work, on the other hand, was in the ab initio (from first principles) solution

of the Schrödinger equation to determine the discretized energy states

(eigenvalues) of a chemical system described by a wavefunction composed of a

linear combination of atomic orbitals. Methods to speed up calculations, such as

the use of basis sets composed of Gaussian-type orbitals and the development

of efficient algorithms to solve the Schrödinger equation resulted in Gaussian, the

popular quantum chemistry software package [3-5]. By implementing the

quantum mechanical theory worked out in the 1920s by Schrödinger, Dirac,

Heisenberg, and others, the Gaussian program allowed for the application of

these methods by general chemists.

With the use of complete basis sets describing electron positions in combination

with the treatment of all possible configurations, ab initio calculations can

converge on the exact time-independent, non-relativistic solution of the

Schrödinger equation within the Born-Oppenheimer approximation (electron

motion decoupled from fixed nuclei). This, however, comes at a great

computational cost, and in some cases is only theoretically possible. Whereas DFT

scales no worse than O(N³), where N is the system size, modern treatments of

electronic structures such as MP2 or MP3 scale as O(N⁵) and O(N⁶), respectively.

Thus, while ab initio calculations may be carried out for small molecules with high

accuracy, such a treatment for large biomolecules including proteins or nucleic

acids is prohibitively costly. Therefore, approximate techniques must be implemented for the study of biological systems via computational chemistry.

The development of more approximate computational chemistry techniques was

highlighted by the 2013 Nobel Prize in Chemistry, which was awarded jointly to

Martin Karplus, Michael Levitt, and Arieh Warshel for their “development of multiscale models for complex chemical systems.” The work honored with this award laid the groundwork for the integration of quantum mechanical calculations such as those discussed above with classical molecular mechanics methods

(MM) in an approach called QM/MM. QM/MM simulations have application in the study of enzymatic reactions where only a select number of atoms are required

to be treated quantum mechanically (the atoms involved in bond breaking and

bond formation), while the remaining atoms may be treated with an atomistic

force field. By limiting the number of atoms treated by computationally expensive

QM methods, the QM/MM method allows for chemical reactions to be studied in

their native environments.

Similar to the case of quantum mechanics, the theoretical basis for classical

molecular mechanics simulations of chemical systems was developed well

before modern computers existed. The work of Hill and Westheimer in the 1940s

used Coulombic and van der Waals interactions to explain the interaction of

atoms [6,7]. These concepts were later incorporated into some of the first force

fields and MM programs by Lifson and Warshel in the late 60s, in which early

computations were used to measure energy differences between the

conformations of small molecules [8]. One of the earliest molecular dynamics

(MD) simulations was performed by Rahman in 1964, which consisted of 864

argon atoms using periodic boundary conditions [9]. In 1977, thirteen years after

the simulation by Rahman, the first MD simulation of a protein was performed by

McCammon in the Karplus lab. This was an 8.8 ps long simulation of basic

pancreatic trypsin inhibitor (BPTI) performed in vacuum [10]. Although this simulation was extremely short by modern standards, it helped to change the mindset that protein interiors were rigid and began to bring life to the static structures provided by crystallography. At the time, X-ray crystallography was the only way in which structures of biomolecules could be determined; the first de novo NMR protein structure, that of proteinase inhibitor IIA from bull

seminal plasma, was reported by the Wüthrich group in 1985 [11].

In the 35 years since the first protein MD simulation, great advances have been

made in the field of computational chemistry, coinciding with the remarkable

increase in availability and power of computer hardware. Throughout this time of

development, a significant trade-off between speed and accuracy of the

simulations has always been a consideration. Regarding the MD simulation of biomolecules, it is important to obtain an adequate amount of conformational

sampling, either through multiple short simulations or a single long simulation,

such that calculated properties have statistically converged to an average with acceptable errors. Different approaches may be used in order to achieve convergent results with limited computational power. One enhanced sampling method is replica exchange molecular dynamics (REMD), which runs several

simulations in parallel over a range of temperatures, allowing conformations to be

swapped between different temperatures based on satisfaction of a Metropolis

criterion. Because energy barriers are crossed more readily at higher temperatures, these exchanges make the enhanced sampling achieved there accessible to the lower-temperature simulations [12].
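
The exchange step described above can be sketched in a few lines. The text only names "a Metropolis criterion"; the acceptance rule used here, min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B T), is the form commonly used for temperature REMD and is an assumption about the exact variant intended. The function names, the kcal/mol units, and the example energies are illustrative only.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption


def exchange_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance probability for swapping the conformations of two
    replicas with potential energies e_i, e_j held at temperatures t_i, t_j."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    # Always accept when delta >= 0, otherwise accept with probability exp(delta)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(e_i, t_i, e_j, t_j, rng=random.random):
    """Return True if the swap is accepted for one random draw."""
    return rng() < exchange_probability(e_i, t_i, e_j, t_j)


# Hypothetical energies (kcal/mol) for neighbouring replicas at 300 K and 310 K
p_favourable = exchange_probability(-100.0, 300.0, -105.0, 310.0)
p_unfavourable = exchange_probability(-105.0, 300.0, -100.0, 310.0)
```

In this sketch a swap that moves the lower-energy conformation to the lower temperature is always accepted, while the reverse swap is accepted only part of the time, which is what lets conformations found at high temperature migrate down the temperature ladder.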

Another method to achieve enhanced sampling is to sample the conformation of

a solute in an implicitly solvated environment. In an MD simulation in which the

solute is explicitly solvated (i.e. all water molecules are considered), a considerable amount of the computational effort is applied to calculating the

interactions between the water molecules themselves. For example, when using

a 12-15 Å buffer of water from the solute surface to the edges of the unit

simulation cell, water atoms typically compose 85-90% of the total

atoms of the entire system. These numbers reflect the use of the TIP3P water

model, which uses three points or atoms to describe each water molecule. If a

more accurate four-point water model such as TIP4P-Ew were to be used, the

total percentage of solvent would increase to 90-95%. In an implicitly solvated

simulation, only the atoms of the solute are directly considered and the solvent

environment is replaced by a reaction field which acts upon each charge of the

solute. The reaction field may be calculated by the Poisson-Boltzmann (PB)

equation [13] or the more approximate (and more quickly calculated) generalized

Born (GB) method [14,15]. A tradeoff for the increased speed of simulations

implementing implicit solvents is a loss of atomistic detail, particularly at the solvent accessible surface of the solute.

In recent years, major breakthroughs in computational power have seen new

efforts applied to the production of very long explicitly solvated simulations that

extend into the microsecond or millisecond range. Three different approaches

have been used to produce exceptionally long MD simulations: supercomputing,

distributed computing, and the use of special-purpose hardware. The use of

supercomputers, in which thousands of processors work together via high-speed

connections, is a current standard in the production of long MD

trajectories. In 2013, the Schulten group reported an all-atom, explicitly solvated,

100 ns simulation of the full HIV-1 capsid totaling an extraordinary 64,423,983

atoms [16]. The simulation was carried out using 4,000 nodes (128,000

processors) of the $188 million Blue Waters supercomputer for a production rate

of 5-9 ns/day. To put the size of this simulation in perspective, the two targets

studied in this dissertation, the ligand-binding domains of the nicotinic

acetylcholine receptor (nAChR) and retinoic acid receptor (RAR), contain a

respective ~130,000 and ~35,000 atoms when explicitly solvated. Considering

that complete pairwise simulations scale as $O(N^2)$ while most modern MD

algorithms scale as $O(N \log N)$, doubling the number of atoms, $N$, in a

simulation will always decrease the simulation rate by more than a factor of 2.

Therefore, assuming $O(N \log N)$ scaling, the nAChR and RAR systems could be

theoretically simulated at rates of 3.8-6.8 μs/day and 6.7-12.1 μs/day using the supercomputer resources mentioned above. The longest simulations reported in this dissertation are two 5 μs RAR trajectories that were computed at an average rate of 42 ns/day.
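
As a rough illustration of the scaling argument above, the following sketch rescales a reference simulation rate under the assumption that the cost per time step grows as N log N. The function name is hypothetical, and the example uses only the capsid atom count, the 5-9 ns/day production rate, and the approximate nAChR system size quoted in the text; hardware and software differences are ignored.

```python
import math


def scaled_rate(ref_rate, ref_atoms, target_atoms):
    """Estimate the rate for a system of target_atoms atoms from a reference
    rate measured on ref_atoms atoms, assuming cost per step ~ N log N."""
    cost_ratio = (ref_atoms * math.log(ref_atoms)) / (
        target_atoms * math.log(target_atoms)
    )
    return ref_rate * cost_ratio


CAPSID_ATOMS = 64_423_983   # HIV-1 capsid system size quoted above
NACHR_ATOMS = 130_000       # approximate solvated nAChR system size quoted above

# Reference rates of 5 and 9 ns/day, converted to microseconds/day after scaling
nachr_low = scaled_rate(5.0, CAPSID_ATOMS, NACHR_ATOMS) / 1000.0
nachr_high = scaled_rate(9.0, CAPSID_ATOMS, NACHR_ATOMS) / 1000.0

# Doubling the atom count reduces the rate by more than a factor of two
doubling_factor = scaled_rate(1.0, NACHR_ATOMS, 2 * NACHR_ATOMS)
```

Under these assumptions the nAChR estimate falls in the range of a few microseconds per day, and the doubling factor comes out slightly below 0.5 because of the extra logarithmic term.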

In a more cost-effective approach, the Pande group has developed a distributed computing platform called Folding@Home which harnesses the unused computational power of thousands of volunteers around the world [17]. Currently,

170,601 computers are donating time to the project. A recent noteworthy result

from this application was the 2012 report of a total of 30 ms of simulation of Acyl-

CoA binding protein (ACBP) using the AMBER ff96 force field in combination with a GBSA implicit water model [18].

Finally, of particular note are the simulations performed on Anton, a special-purpose machine built by D. E. Shaw Research (DESRES) solely to produce very long MD simulations. In 2010, DESRES reported the first 1 ms all-atom

simulation of a solvated protein [19], the 58-residue BPTI, which was the subject

of the very first protein simulation by McCammon and Karplus 33 years earlier.

The DESRES simulation of BPTI was eight orders of magnitude longer than that

initial simulation and two orders of magnitude longer than the previously reported

longest all-atom MD simulation at the time [20]. The BPTI simulation began in the

native state and maintained a near-native conformation for the duration of the

simulation, transitioning between only about four different conformational states,

two of which had been described via NMR experiments [21]. In the same paper,

multiple 100 μs simulations of FiP35, an engineered WW domain protein with a

fast folding time, were also reported. These trajectories began in the unfolded

state and exhibited numerous folding and refolding events within the same

unbiased, all-atom simulation. The calculated folding time of 10 ± 3 μs was in

close agreement with the experimentally measured folding time of 14 μs [22].

While these simulations are impressive, they were performed using resources that few scientists may access. Both standard supercomputers and special-purpose hardware are very costly, presenting a significant barrier to access.

Distributed computing, on the other hand, provides a solution to the prohibitive costs, since all of the hardware resources are donated. However, access to this platform remains an issue. A recent advancement in the rewriting of MD code to make use of modern graphics processing units (GPUs) has dramatically improved simulation speeds available to general users [23,24]. Harnessing GPU technology, the development of which has been propelled by the profitable video game industry, enables the routine production of microsecond all-atom simulations at a small fraction of the cost of a supercomputer or special-purpose machine.

The recent advancements in simulation length have allowed for a more thorough analysis of modern force field accuracy. The earlier simulations by DESRES used a modified version of the AMBER ff99SB force field (ff99SB-ildn) [25]. While the successful folding of FiP35 in part validated the ability of the force field to describe a folding mechanism with rates comparable to experiment, this success does not guarantee that the force field will translate to other proteins that are either larger or possess different structural elements. In fact, it has been determined that a modified version of the CHARMM force field (CHARMM22*) is more robust at describing the folding of a diverse set of small proteins [26,27]. In the application of long timescale, all-atom MD simulations to the refinement of homology models, it was observed that the models tended to drift away from their native

structures and that the reason the native state was not achieved was more likely

due to force field deficiencies rather than poor conformational sampling [28].

A recent study compared the ability of 11 force fields (including many AMBER

variants) in five different solvent environments to reproduce 524 NMR chemical

shifts and J-couplings of various short peptides and ubiquitin. These results

indicated that the ff99SB-ildn-NMR and ff99SB-ildn-phi variants performed the

best when simulated in the TIP4P-Ew or TIP4P/2005 water environment [29].

The same study showed that the implicit solvent model (GB in this

study) performed the worst of the five solvent models tested. This indicated that

while the use of GB in folding simulations is able to achieve native

conformations, some of the finer dynamics are not well reproduced. Another

recent report studied four force fields in combination with four water models as

applied to the simulation of a protein crystal (scorpion toxin protein II) [30]. The

results of this study found that the pairing of ff99SB with the TIP3P water model

performed the best at maintaining lattice structure and reproducing experimental

B-factors.

A general conclusion of most recent studies is that current force fields can and

should be improved. The most used modern force fields such as the ff99SB variants of AMBER [25,31-33] and the CHARMM22 force field with CMAP corrections [34,35] originated in the late 90s and have since been only slightly

modified instead of completely reparameterized. Only the CHARMM protein force

field was parameterized for use with a specific water model (a modified form of

TIP3P), while most other force fields, including AMBER, suggest optimal

performance with TIP3P. A problem here is that the TIP3P water model was

parameterized in 1983 [36], predating the development of the Particle Mesh

Ewald (PME) method in 1993 which permits the efficient and accurate calculation

of long-range electrostatic interactions in a periodic system by solving for the

interaction energies in Fourier space [37,38]. Use of different mesh Ewald

methods in the simulation of explicitly solvated systems with periodic boundary

conditions is now the standard; however, TIP3P, the most common water model,

was never designed for use in such a configuration. In addition to TIP3P being

parameterized with interaction cutoffs instead of PME, this water model was also

only parameterized for use in ambient conditions and is known to have a

viscosity that is much lower than the experimental value. This raises the question of whether

the promising results observed in some recent computational folding experiments

are due to the cancellation of errors between the water model and protein force

fields, since folding in a less viscous environment would be expected to result in

computed folding rates that are faster than experiment. More recent water

models such as TIP4P-Ew [39] and TIP4P/2005 [40] were developed to be consistent with PME methods and are able to better recreate many different

properties of bulk water (density, freezing/vaporization temperatures,

self-diffusion coefficient, etc.) than TIP3P. Although this comes at the expense of an

additional ‘atom’ for each water molecule, the recent advances in computational

speed allow for the adoption of a more expensive water molecule.

New force fields are currently in development, being designed for use with

specific, more accurate water models. For instance, new amino acid charge sets

and Lennard-Jones parameters have been defined for the AMBER community that incorporate an implicit polarization effect from an average surrounding charge distribution of TIP4P-Ew water [41]. Newer force fields are also taking advantage of the ability to base parameterizations on higher-level QM calculations: the new AMBER charges were derived at the MP2/cc-pV(T+d)Z level, whereas the charges used in the ff99 variants were derived using HF/6-31G*.

It is likely that these new parameterization efforts will reach the limits of the approximations implemented in classical force fields, namely the use of fixed, atom-centered partial atomic charges that disregard any explicit treatment of electronic effects.

If these new force fields fail to faithfully replicate the dynamics of biomolecules, it is likely that the incorporation of electronic-level details such as polarization effects will be necessary to create more accurate models. Polarizable force fields have already been developed, with the AMOEBA force field being the current standard

[42,43]. The large AMBER and CHARMM communities have also developed polarizable force fields of their own [44,45]; however, they are not in common use because simulations with them are often 2-3 fold slower than with a classical force field. While properties such as solvation free energies and condensed phase dynamics of small peptides on a short time scale seem to be better reproduced by polarizable force fields such as AMOEBA [46], the behavior of

AMOEBA on long time scales remains unknown, making its overall accuracy much less defined than the classical force fields.

1.2 Free Energy Calculations

In addition to an atomistic description of protein folding, another major goal of computational chemistry is the accurate prediction of free energies. The ability to reliably predict binding free energies ($\Delta G_{\mathrm{bind}}$) could revolutionize the

pharmaceutical industry, making computational drug design a standard first

approach to the development of a new therapeutic compound. Although there

have already been significant steps made toward this goal, most notably with the

computational design of the first HIV protease inhibitors [47,48], computational

chemists still often serve in a support role on drug design teams instead of leading

the design process.

As is the case for general computational methods, a trade-off between speed

and accuracy for free energy calculations also exists. On one end of the

spectrum, there are fast ‘docking’ methods that determine the best fit between

two molecules, based on empirically weighted, physics-based energy functions.

These methods generally disregard any sort of dynamic interaction between the

ligand and receptor. Additionally, solvent effects are poorly treated or

disregarded altogether, resulting in large errors. For example, the Glide XP and

SP scoring functions have RMSDs of 2.26 and 3.18 kcal/mol with respect to

experimental binding affinities for a set of 198 molecules [49]. The latest

implementation of AutoDock, AutoDock Vina, has a reported standard error of

2.75 kcal/mol for 116 compounds not used in its training set [50]. In spite of these

drawbacks, docking techniques may be used to perform tasks that are not

possible with more time consuming scoring methods. Most notably, in a

technique known as virtual screening, the speed of docking algorithms is

harnessed in the identification of new potential therapeutic drugs from databases

that can be on the order of 1,000,000 compounds.
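
To put the docking errors quoted above into perspective, the standard relation ΔG = RT ln K_d implies that an error of δ in the binding free energy corresponds to a multiplicative error of exp(δ/RT) in the dissociation constant. The sketch below applies this relation to the three error values cited above; the temperature of 298.15 K and the function name are assumptions made for illustration and do not come from the cited studies.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def kd_fold_error(dg_error_kcal, temperature=298.15):
    """Multiplicative error in K_d implied by an error in the binding free
    energy (kcal/mol), using dG = RT ln(K_d). The temperature is an assumption."""
    return math.exp(dg_error_kcal / (R_KCAL * temperature))


# Error values quoted in the text for Glide XP, AutoDock Vina, and Glide SP
fold_xp = kd_fold_error(2.26)
fold_vina = kd_fold_error(2.75)
fold_sp = kd_fold_error(3.18)
```

Under these assumptions, errors of 2-3 kcal/mol translate into uncertainties of roughly two orders of magnitude in the predicted affinity, which is consistent with the use of docking for coarse ranking in virtual screening rather than for quantitative prediction.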

On the other end of the accuracy/computational expense spectrum are

thermodynamically rigorous free energy perturbation (FEP) and thermodynamic

integration (TI) methods [51-53]. FEP and TI are alchemical methods in which

one ligand is computationally transformed into another ligand along a coupling

parameter, λ, such that λ=0 represents an initial ligand and λ=1 represents a

second ligand. Convergence of the relative binding free energy difference, $\Delta\Delta G_{\mathrm{bind}}$,

is slow and costly, requiring individual simulations along the λ pathway that

exhaustively sample the protein, ligand, and protein/ligand complex in explicitly

solvated environments. The payoff for these time-consuming simulations is the

resulting, theoretically most accurate change in free energy. In addition to relative

free energy differences, absolute binding free energies may be calculated with

FEP/TI by transforming a ligand into ‘nothing’.
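
The text does not give working equations for these methods. As a minimal sketch, thermodynamic integration is commonly written as ΔG = ∫₀¹ ⟨∂U/∂λ⟩_λ dλ, and the relative binding free energy follows from a thermodynamic cycle as the difference between the transformation free energies in the complex and in solvent. The code below assumes that the ensemble averages ⟨∂U/∂λ⟩ have already been obtained from separate simulations at each λ value and integrates them with the trapezoidal rule; all function names and numbers are hypothetical.

```python
def thermodynamic_integration(lambdas, dudl_means):
    """Trapezoidal estimate of the integral of <dU/dlambda> over lambda.

    lambdas    -- increasing lambda values, typically spanning 0 to 1
    dudl_means -- ensemble-averaged dU/dlambda at each lambda (kcal/mol)
    """
    if len(lambdas) != len(dudl_means) or len(lambdas) < 2:
        raise ValueError("need at least two matching lambda windows")
    total = 0.0
    for k in range(len(lambdas) - 1):
        width = lambdas[k + 1] - lambdas[k]
        total += 0.5 * (dudl_means[k] + dudl_means[k + 1]) * width
    return total


def relative_binding_free_energy(dg_in_complex, dg_in_solvent):
    """Thermodynamic-cycle difference for transforming ligand 1 into ligand 2."""
    return dg_in_complex - dg_in_solvent


# Hypothetical window averages for illustration only
dg_example = thermodynamic_integration([0.0, 0.5, 1.0], [2.0, 1.0, 0.0])
ddg_example = relative_binding_free_energy(-3.0, -1.5)
```

In practice many more λ windows are used, and the cost noted above comes from converging each window average, not from the integration itself.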

Finally, between docking and FEP/TI on the spectrum of rigor, there is the

molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method for

computing $\Delta G_{\mathrm{bind}}$ [54,55]. As the name suggests, MM-PBSA is a molecular

dynamics-based technique, and in that respect is similar to FEP/TI. However,

MM-PBSA is often described as an ‘end-point’ free energy method where the receptor, ligand, and complex are considered individually, differentiating it from

FEP/TI which employs an alchemical approach to transition one bound state into another. A simplification employed in the MM-PBSA method is that while the MD

simulations involved in the calculations are performed in an explicitly solvated

environment, the free energies are calculated using an implicit water model. This

serves to eliminate the noise introduced by the many degrees of freedom of the

numerous water molecules, expediting convergence at the expense of a less

accurate treatment of solvation free energy and entropic effects. The MM-PBSA

method is implemented in Chapters 2 and 4 of this dissertation, with the

application of the method to long timescale simulations making up a significant

portion of Chapter 4. Thus, a more detailed description of the method is provided

below.

The MM-PBSA method follows the thermodynamic cycle illustrated in Figure 1.1,

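where the binding energy, $\Delta G_{\mathrm{bind}}$, is considered as the sum of three energy components:

$$\Delta G_{\mathrm{bind}} = \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solv}} - T\Delta S \tag{1}$$

The first component, $\Delta E_{\mathrm{MM}}$, is the intramolecular energy difference between the

complex and the sum of the receptor and ligand:

$$\Delta E_{\mathrm{MM}} = E_{\mathrm{MM,complex}} - \left(E_{\mathrm{MM,receptor}} + E_{\mathrm{MM,ligand}}\right) \tag{2}$$

$\Delta G_{\mathrm{solv}}$ is the free energy difference of solvation, and $T\Delta S$ is the temperature, $T$,

multiplied by the entropy difference, $\Delta S$.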

The intramolecular and solvation free energies can be further broken down into

the following components:

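$$\Delta E_{\mathrm{MM}} = \Delta E_{\mathrm{vdW}} + \Delta E_{\mathrm{ele}} + \Delta E_{\mathrm{int}} \tag{3}$$

where $E_{\mathrm{vdW}}$ is the van der Waals energy, $E_{\mathrm{ele}}$ is the electrostatic energy, and $E_{\mathrm{int}}$ is the internal energy of the molecule that is determined using classical force field equations.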
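
The bookkeeping in Equations (1)-(3) can be sketched as follows. This is a minimal illustration and not the implementation used in this dissertation: it assumes that per-snapshot energies for the complex, receptor, and ligand are already available, averages the end-point difference of Equation (2) over snapshots, and then combines the components. The function names and all numerical values are hypothetical.

```python
from statistics import mean


def endpoint_difference(complex_vals, receptor_vals, ligand_vals):
    """Snapshot average of X_complex - (X_receptor + X_ligand), the form of Eq. (2)."""
    return mean(
        c - (r + l) for c, r, l in zip(complex_vals, receptor_vals, ligand_vals)
    )


def binding_free_energy(d_e_vdw, d_e_ele, d_e_int, d_g_solv, t_delta_s):
    """Combine the components following Eq. (3) and then Eq. (1)."""
    d_e_mm = d_e_vdw + d_e_ele + d_e_int   # Eq. (3)
    return d_e_mm + d_g_solv - t_delta_s   # Eq. (1)


# Hypothetical van der Waals energies (kcal/mol) for three snapshots
d_e_vdw = endpoint_difference(
    [-410.0, -412.0, -408.0],   # complex
    [-350.0, -351.0, -349.0],   # receptor
    [-20.0, -21.0, -19.0],      # ligand
)

# Remaining components are placeholder values chosen only to show the arithmetic
dg_bind = binding_free_energy(d_e_vdw, -25.0, 0.0, 45.0, -12.0)
```

The same end-point difference would be applied to each energy component in turn; only the van der Waals term is shown here to keep the sketch short.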

Figure 1.1. Thermodynamic cycle implemented in the MM-PBSA protocol. The final binding energy, $\Delta G_{\mathrm{bind}}^{\mathrm{solv}}$, is calculated by first determining $\Delta G_{\mathrm{bind}}^{\mathrm{vacuum}}$, then accounting for the solvation effects of binding, $\Delta G_{\mathrm{solv}}$, using an implicit water model.

Finally, the solvation free energy may be decomposed into electrostatic and nonpolar contributions:

$$\Delta G_{\mathrm{solv}} = \Delta G_{\mathrm{elec}} + \Delta G_{\mathrm{np}} \tag{4}$$

The traditional method to implicitly calculate the electrostatic contribution of solvation, $\Delta G_{\mathrm{elec}}$, is to use the Poisson-Boltzmann equation. The nonpolar part of solvation can be thought of as the energy required to create a cavity in the water for the solute. Therefore, this cavitation term is dependent upon the solvent-accessible surface area (SASA) of the solute. The most common way to

calculate the nonpolar portion of the free energy of solvation uses the following

relationship described by Sitkoff et al. [56]:

$$G_{\mathrm{np}} = \gamma \cdot \mathrm{SASA} - \beta \tag{5}$$

where the solvent-accessible surface area is scaled by a factor γ that is often

considered the surface tension, and offset by a correction factor, β.
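
Equations (4) and (5) can be illustrated with a short sketch. The nonpolar term follows the sign convention of Equation (5) exactly as printed above, and the values of γ and β used here are placeholders chosen only to show the arithmetic; they are not the parameters of Sitkoff et al. or of this dissertation. The surface areas and function names are likewise hypothetical.

```python
def nonpolar_term(sasa, gamma, beta):
    """Nonpolar solvation term for one species, following Eq. (5) as printed."""
    return gamma * sasa - beta


def delta_g_solv(d_g_elec, d_g_np):
    """Solvation free energy difference from Eq. (4)."""
    return d_g_elec + d_g_np


GAMMA = 0.005   # placeholder surface tension, kcal/(mol A^2)
BETA = 0.9      # placeholder offset, kcal/mol

# Hypothetical surface areas (A^2) for complex, receptor, and ligand
g_np_complex = nonpolar_term(12000.0, GAMMA, BETA)
g_np_receptor = nonpolar_term(11500.0, GAMMA, BETA)
g_np_ligand = nonpolar_term(900.0, GAMMA, BETA)

# End-point difference: complex minus the sum of the separated species
d_g_np = g_np_complex - (g_np_receptor + g_np_ligand)

# Combine with a placeholder electrostatic solvation difference
d_g_solv_example = delta_g_solv(45.0, d_g_np)
```

Because surface area is buried on binding, the SASA-dependent part of the difference is negative in this example, which is the usual behaviour of this term.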

The original implementation of the MM-PBSA method treated the nonpolar

component of solvation as linear in the SASA; however, a newer, more physically

complete implementation considers both a repulsive and attractive component of

the nonpolar solvation energy:

$$G_{\mathrm{np}} = G_{\mathrm{rep}} + G_{\mathrm{att}} \tag{6}$$

Here, the repulsive component remains linearly correlated to the SASA as in (5),

while the attractive component, which relates to the van der Waals interaction

energy between the solute and solvent molecules, can be computed using a

surface-integration approach [57]. This implementation performed very well at

computing the nonpolar component of solvation for a test set of 42 small

molecules (correlation coefficient = 0.98); however, when the model was extended to

measure the nonpolar potential of mean force for two sets of cytosine-guanine

base pairs, the results were less impressive. The application of this new methodology to larger biomolecules was not assessed in that work and is addressed in Chapter 4 of this dissertation.

1.3 Dissertation Themes and Organization

The focus of this dissertation is the application of computational tools to describe biological phenomena, particularly the dynamic interaction between biomolecules. The targets studied in the following chapters are two biological receptors: the nicotinic acetylcholine receptor (nAChR) and retinoic acid receptor

(RAR), as illustrated in Figure 1.2.

Figure 1.2. Structural representation of the two receptors studied in this dissertation. A. Nicotinic acetylcholine receptor composed of five subunits (orange and blue) bound to acetylcholine (red) in the extracellular, ligand-binding domain. The position of a lipid bilayer is indicated in grey. B. Nuclear receptor showing the modular structure of the DNA-binding domain (bound to DNA, red and orange) and the ligand-binding domain (bound to ligand, purple). Images created by David S. Goodsell for RCSB PDB and made available through creative commons (http://creativecommons.org/licenses/by/3.0/us/#), used without alteration.

Some overlap in terminology arises and should be clarified, specifically for the

terms ‘receptor’ and ‘ligand’. The biological definition of a receptor corresponds

to a protein that translates an extracellular chemical signal (a ligand) into a

cellular response. In the cases of the nAChR and RAR, the endogenous ligands

are acetylcholine and retinoic acid, respectively. Binding of these ligands to their

respective receptors results in depolarization of the plasma membrane due to an

influx of positively charged ions in the case of nAChR, and gene transcription in

the case of RAR. In a biochemical sense, the term ‘receptor’ indicates a

particular type of protein, differentiating it from other classes of proteins such as

enzymes or structural proteins. A ‘ligand’ in this sense is generally limited to small, drug-like molecules or signaling peptides.

In the field of computational chemistry, particularly when considering free energy

calculations, the terms ‘ligand’ and ‘receptor’ carry far less restricted definitions.

Here, both the ligand and receptor may be any two molecules that interact. The

larger of the two interacting molecules is generally considered the receptor and is

often a protein (receptor, kinase, etc.) or nucleic acid. The ligand is generally a

small molecule or peptide, however it may also be another protein or nucleic

acid. Although the two targets considered in this dissertation are true biological

receptors, the computational methods applied herein may be used to describe

the dynamics and interactions between any two types of molecules.

Chapter 2 discusses the methods used to determine the binding site of a set of

nAChR negative allosteric modulators, given very little experimental data to limit

the search. A combination of molecular dynamics and blind docking was used on

homology models of two nAChR subtypes to identify a potential binding site. The

binding mode was further refined and characterized by MD simulations and MM-PBSA

free energy analysis of both wildtype and mutant receptors. Functional

assays were used with mutant receptors, confirming that the correct allosteric binding site was located. Finally, a mechanism of allosteric antagonism is proposed based on a review of structural nAChR homologues and dynamic data

from MD simulations.

Chapters 3 and 4 discuss experimental and computational results that describe

how two retinoic acid analogues act as antagonists of RAR. Chapter 3 contains

the experimental section that includes the results of isothermal calorimetry, NMR,

and mass spectrometry experiments characterizing the interaction of the ligands

with the receptor. Chapter 4 contains the computational results including docking

and dynamics results that complement the experimental findings in Chapter 3.

Detailed binding free energy calculations are also performed, revealing that for

the case of a peptide/protein interaction, strong correlation with experiment is

dependent upon sampling that extends into microsecond timescales.

Finally, Chapter 5 involves the parameterization of a new atom type for use with

the AMBER ff99 force field. Specifically, parameters to describe a sulfonium

atom type (sulfur atom with three substituents carrying a positive charge) are

derived. This chemical group is uncommon in biology and was therefore not

considered in the development of the common ff99 force field nor the general

AMBER force field (GAFF). However, a sulfonium center is present in S-adenosylmethionine (SAM), the most common methyl donor in biology.

Therefore, in order to perform MD simulations of SAM in complex with the 200+ types of SAM-dependent methyltransferases, sulfonium parameters were derived.

Chapter 2. Identification of a Negative Allosteric Binding Site on the Nicotinic Acetylcholine Receptor

2.1 Introduction

This chapter deals with computational methods used to identify the binding site of

a known modulator of a receptor, the nicotinic acetylcholine receptor (nAChR),

with very limited experimental data to limit the search space. Extensive homology

modeling was carried out to create a three-dimensional structure of the

extracellular domain (ECD) of the receptor, which was used to search for the

binding site of a set of experimentally-verified negative allosteric modulators

using blind docking methods. After potential binding modes were refined with molecular dynamics (MD) simulations, functional experiments were performed on mutant receptors to confirm the binding site; computational free energy analysis confirmed the experimental findings, providing additional support for the binding mode. Finally, based on a survey of experimental structures from a homologous protein, a mode of allosteric antagonism is proposed.

A majority of this chapter is adapted from the following reference made available from Creative Commons (http://creativecommons.org/licenses/by/2.5/):

Pavlovicz RE, Henderson BJ, Bonnell AB, Boyd RT, McKay DB, Li C. “Identification of a negative allosteric site on human α4β2 and α3β4 neuronal nicotinic acetylcholine receptors.” PLOS ONE, 6(9): e24949, 2011.

2.2 nAChR Background

Nicotinic acetylcholine receptors (nAChRs) are members of the pentameric

ligand-gated ion channel (pLGIC) family of membrane proteins, which also includes GABA$_{\mathrm{A}}$, glycine, serotonin, and zinc-activated receptors. pLGICs are

also known as Cys-loop receptors due to a common structural motif of a long

membrane-contacting loop formed by a disulfide bridge. nAChRs are cation-

specific, plasma membrane channels found throughout the central and peripheral

nervous systems [58-60], which may be classified as either muscle- or neuronal-

type receptors based on their subunit composition. There are numerous subtypes

of neuronal nAChRs, with α2-α10 and β2-β4 subunits arranging in either homo-

or heteropentameric assemblies. The heteromeric receptors contain both α and β

subunits, with a general stoichiometry of 2α:3β [61-63], although there is also evidence for (α4)3(β2)2 nAChRs [64,65]. The homomeric receptors are solely

comprised of α subunits and have five agonist binding sites. For heteromeric

receptors, agonist binding occurs at α(+)/β(-) interfaces, where the (+) notation implies the contribution of a principal binding feature called the ‘C loop’ to the binding interface and the (-) notation refers to the complementary subunit surface that completes the binding site. The general structure of nAChRs (Figure 2.1) is known from electron microscopy (EM) data of the Torpedo marmorata muscle-type receptor that has been refined to a resolution of 4 Å [66].

Figure 2.1. Schematic of neuronal nAChR structure. The proportions of the extracellular domain (A), transmembrane domain (B), and intracellular domain (C) are illustrated on the right while the modeled subunit stoichiometry and configuration for heteromeric neuronal nAChRs is illustrated on the left, including labels for the (+) and (-) side of each subunit and the location of each of the two agonist binding sites.

Physiologically, neuronal nAChRs are complex, participating in many neurological processes including cognition [67], pain sensation [68], and reward/addiction mechanisms [69,70]. In addition to nicotine addiction, these receptors have been linked to numerous neurological diseases and disorders including Parkinson’s disease [71], Alzheimer’s disease [71], schizophrenia [72], epilepsy [73], and lung cancer [74], making them important therapeutic targets.

Because the composition and distribution of nAChRs throughout the nervous

system are so varied, it is difficult to study the roles of the various nAChR subtypes in neuronal signaling pathways. In order to deduce these functional roles, there is a need for nAChR antagonists that selectively target specific receptor subtypes. Recently, a class of negative allosteric modulators (NAMs) has been described by the McKay group [75,76]. Some of these compounds,

including a molecule called KAB-18, exhibit preferential inhibition of hα4β2

nAChRs when compared to hα3β4 nAChRs, making them particularly interesting to study. Understanding where and how they act on the receptor could ultimately guide the design of more potent drugs that inhibit nAChRs of specific subunit compositions.

2.2.1 Structural Background: Acetylcholine Binding Protein

Structural comparison between the muscle-type nAChR and acetylcholine

binding protein (AChBP), a soluble pentamer found in molluskan species,

revealed that AChBP is a structural homologue of the extracellular domain (ECD)

of nAChRs [77]. AChBP structures have been reported for three different

molluskan species [77-79], and serve as the most complete templates for nAChR

ECD modeling. The most recent nAChR-related structure is that of the α1

extracellular domain of the mouse nAChR, which was solved at a resolution of

1.94 Å [80].

Great advancements in the field of nAChR structural studies came with the

results of genomic sequencing projects. In 2001, acetylcholine binding protein,

AChBP, was discovered in a cDNA library of the snail Lymnaea stagnalis and

has since been found to be a soluble homopentameric protein homologous to the

extracellular domain of LGICs [81]. AChBP homologues were later found in two

other molluskan species (Aplysia californica [82] and Bulinus truncatus [83]),

each sharing 15-28% sequence identity with all LGIC subunits and exhibiting

pharmacological binding to nAChR ligands similar to that displayed by the homomeric α7 nAChR. The high-resolution structures of AChBPs cocrystallized with various nAChR agonists and antagonists provide a good picture of how nAChR ECDs are structured, how they bind ligands, and how ligand binding may be linked to channel opening.

The nAChR ECD, as inferred from the numerous AChBP structures, is primarily composed of β-strands. As illustrated in Figure 2.2, each subunit contains a short

N-terminal helix followed by ten β-strands which form a four-strand and six-strand

β-sheet. These two sheets are associated in a β-sandwich motif, in which all strands but one (β8) are aligned in antiparallel fashion. Altogether, the topology takes the form of a modified immunoglobulin fold [77]. Two short 3₁₀ helices are also present in the structure, the first of which is found on the β2-β3 loop that projects near the N-terminus, making it one of the regions most distal from the membrane.

This loop forms the main immunogenic region (MIR) that is the antibody target of the autoimmune disease myasthenia gravis [84]. Another notable ECD region elucidated by AChBP structures is the ligand-binding site. This site, sometimes called the ‘aromatic nest’ or ‘aromatic cage’, is composed of three tyrosine and two tryptophan residues (Figure 2.2B), all of which are absolutely conserved among all of the nAChR subtypes. The ligand-binding site occurs at the interface between subunits, where one subunit, called the principal face, contributes four of the five aromatic ligand-binding residues. The complementary face contributes the final aromatic residue to complete the aromatic nest. Two tyrosine residues on the principal face are found on the β9-β10 loop, called the ‘C loop’. This loop, a defining characteristic of α subunits, is longer than the corresponding loop of β

subunits and contains a vicinal cysteine pair at its tip. nAChRs possess one

ligand binding site for each α subunit. Therefore, homopentameric nAChRs such as the α7 subtype contain five binding sites, while the α4β2 and α3β4 subtypes modeled in the following sections contain only two. The conformation of the C loop has been shown to adjust to ligands bound in the aromatic nest [85]. When bound to small agonists such as acetylcholine or nicotine [86], the C loop closes around the molecules. Larger molecules such as [85] or cobra toxin [87], on the other hand, force the C loop into a more ‘open’ conformation

and act as antagonists [85].


Figure 2.2. Structure of acetylcholine binding protein (AChBP). A. A subunit interface of the pentameric Lymnaea stagnalis AChBP is shown binding nicotine (magenta) from PDB ID: 1UW6. The Cys loop, named after the disulfide bond that gives it structure, is featured as well as the C loop, which forms a large part of the ligand binding site. One subunit is colored red (N-terminus) to blue (C-terminus) while the complementary subunit is in grey. B. Close-up view of the aromatic nest in the ligand binding site. Four of the aromatic residues that compose the aromatic nest (Y83, W143, Y185, and Y192) come from one subunit, while W53 is contributed by the adjoining subunit. Note the vicinal disulfide bridge at the tip of the C loop.

In 2007, the first mammalian nAChR subunit structure was solved, the

monomeric extracellular domain of the mouse α1 subunit (PDB ID: 2QC1),

providing more details on nAChR function. Although the subunit structure was

solved in monomeric form, it confirmed the high level of structural homology suspected for AChBP protomers and revealed two interesting features that likely relate to function: a well-ordered carbohydrate chain and a hydration pocket in the core of the subunit [80]. The N-linked glycosylation stems from an asparagine residue found on the Cys-loop of most subunits. The high-mannose carbohydrate chain is composed of two N-acetylglucosamines followed by eight mannose residues that stretch from the Cys-loop to the C loop. Mutation of this asparagine resulted in a loss of nAChR expression, indicating the importance of glycosylation in folding and trafficking of functional receptors. In addition, single channel patch-clamp recordings on nAChRs that had their carbohydrate chains removed post-expression revealed a decrease in both opening probability and total current measured per opening event [80]. These tests verified the importance of the glycosylation for proper receptor function, probably by linking the ligand-binding site near the C loop to the transmembrane helices, which come into contact with the Cys-loop, the point of origin of the carbohydrate chain.

The hydration pocket is composed of a serine and threonine residue that coordinate a single water molecule surrounded by the otherwise hydrophobic core of the ECD β-sandwich. This feature, not found in the molluskan AChBP structures, is located near the disulfide bond that characterizes the Cys-loop near the membrane surface. Mutational experiments that removed these hydrophilic

residues from the subunit core showed a ‘substantial’ loss of nAChR function,

indicating the importance of the hydration pocket to proper nAChR function [80].

It seems that the trapped water molecule inside each subunit allows for greater

mobility of the subunits during channel opening events.

2.2.2 Structural Background: Pentameric Ligand-Gated Ion Channels

More recent crystallographic efforts have revealed the structure of membrane-

spanning bacterial pLGICs, homologous to nAChRs. These distantly related

pLGICs share a common fold with metazoan nAChRs, but lack an N-terminal

helix, disulphide linkage in the Cys-loop, and an intracellular TM3-TM4 domain.

Like the discovery of AChBPs, these prokaryotic LGICs were discovered via

genome searches [88]. To date, full LGICs from the gram-negative Erwinia

chrysanthemi (ELIC) [89] and the cyanobacterium Gloeobacter violaceus (GLIC) [90,91]

have been crystallized. The ELIC structures are thought to represent the resting

or basal state of the receptor, while the two GLIC structures seem to have been

solved in an active, open state. Comparison of these structures reveals multiple

conformational changes that may be universal to all pLGICs. The transition

between the open and closed states involves both a quaternary twist of each

subunit in a counter-clockwise fashion when the receptors are viewed from the

extracellular side in addition to rigid body movements of the extracellular

domains by 8° around an axis parallel to the inner β-sheets [90]. Additionally,

downward movements of the β1-β2 loop of the ECD were observed, as well as movements of the Cys- and TM2-TM3 loops towards the periphery that tilt the TM2 and TM3 helices to open the central pore [90,91].

2.3 Homology Modeling

Multiple nAChR modeling studies have been previously reported, addressing

topics such as gating dynamics [92,93], agonist binding [94-96], agonist

selectivity [97,98], and allosteric modulator binding [99]. Most of these models

were built using a single crystal structure as a template [92,94-99], while some

studies have eschewed nAChR modeling altogether, using AChBP structures

directly in virtual screening attempts to identify novel nAChR ligands [100,101]. A

strength of the software used here to model the nAChR ECD, MODELLER, is

that it can incorporate structural information from multiple templates into a single

model. Therefore, the modeling described below takes advantage of the

numerous templates available by incorporating four crystal structures into the

model of the nAChR ECD.

Since most nAChR-related experimental structures support modeling of the ECD,

and this receptor domain is known to bind a number of ligands with varied

pharmacological effects, the ECD is the focus of this computational study. In particular, human (α4)2(β2)3 (hα4β2) and human (α3)2(β4)3 (hα3β4) extracellular

domains were modeled based on multiple crystallographic templates. Four

different crystallographic templates were used in the homology modeling

process: the AChBP structures of three molluskan species including Lymnaea

stagnalis (PDB IDs: 1UW6 [86]), Aplysia californica (2BYR [102]), and Bulinus

truncatus (2BJ0 [78]), as well as the mouse α1 ECD monomer (PDB ID: 2QC1

[80]). In order to model the pentameric ECD with the monomeric mouse α1 ECD,

an artificial α1 pentamer was created by superimposing the monomer over an

AChBP structure five separate times.

The alignment of the four template structures to the target sequences (Figure

2.3) was performed manually, although cues were taken from PSIPRED [103] and PHD [104] secondary structure predictions. The sequence identity and

sequence similarity between the template and target structures were calculated based on the alignment (Table 2.1 and Table 2.2). Overall, the sequences of the

AChBP templates and the human nAChR ECD targets are quite dissimilar, with 21-29% identity and 43-49% similarity. Given this low sequence identity, only a ‘low-accuracy’ homology model is anticipated, partly due

to a higher probability of alignment errors at this level of identity [105]. However,

based on the careful sequence alignments in Figure 2.3, most of the secondary

structural elements seem to be conserved between these two distantly related

proteins; a majority of the differences occur in the loop regions where many

insertions and deletions are present. The mouse α1 sequence shares a much

higher level of homology to the target sequences with 41-53% identity and 64-

70% similarity, bringing more certainty to the alignment and the overall quality of

the resulting models.
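The identity and similarity percentages discussed above can be illustrated with a short Python sketch. It is not the procedure used to produce Tables 2.1 and 2.2: the choice of denominator (aligned columns in which neither sequence has a gap) and the residue similarity groups in SIMILARITY_GROUPS are assumptions made only for illustration.

```python
# Hypothetical similarity groups; the grouping actually used for Table 2.2
# is not stated in the text, so this set is an assumption.
SIMILARITY_GROUPS = ["AVLIM", "FYW", "KRH", "DE", "ST", "NQ"]


def _compared_columns(seq_a, seq_b, gap="-"):
    """Yield residue pairs from aligned columns where neither side is a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    for a, b in zip(seq_a, seq_b):
        if a != gap and b != gap:
            yield a, b


def percent_identity(seq_a, seq_b):
    """Percent of gap-free aligned columns holding identical residues."""
    pairs = list(_compared_columns(seq_a, seq_b))
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def percent_similarity(seq_a, seq_b):
    """Percent of gap-free aligned columns holding identical or similar residues."""
    pairs = list(_compared_columns(seq_a, seq_b))
    if not pairs:
        return 0.0

    def similar(a, b):
        return a == b or any(a in g and b in g for g in SIMILARITY_GROUPS)

    return 100.0 * sum(1 for a, b in pairs if similar(a, b)) / len(pairs)
```

Because gapped columns are skipped, long insertions in loop regions do not lower the computed identity under this convention; a different denominator (for example, the full target length) would give lower values.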


Figure 2.3. Numbered sequence alignment of AChBP and nAChR sequences used for modeling. Templates (bold) are the acetylcholine binding protein from three molluskan species (Lymnaea stagnalis, Aplysia californica, and Bulinus truncatus) and the mouse α1 nAChR ECD. Targets are the human α3, α4, β2, and β4 nAChR ECDs. Magenta highlighting indicates a conserved residue, while turquoise highlighting indicates residue similarity. Light green bars above residues represent α helices, dark green bars represent 3₁₀ helices, and light blue arrows represent β strands. The alignment was performed manually with cues taken from AChBP X-ray structures and the secondary structure prediction algorithms PHD and PSIPRED.

Table 2.1. Sequence identity (%) between template and model sequences.

       Ls     Ac     Bt     mα1    hα3    rα3    hα4    hβ2    hβ4    rβ4
Ls     100
Ac     35.4   100
Bt     47.1   36.5   100
mα1    22.7   24.9   22.4   100
hα3    25.8   29.1   24.3   51.4   100
rα3    25.2   28.0   24.3   52.4   94.7   100
hα4    26.4   29.6   23.7   52.9   60.6   61.1   100
hβ2    24.5   26.1   27.8   44.9   49.5   51.0   54.0   100
hβ4    24.5   23.9   21.5   40.9   49.0   49.5   52.0   69.6   100
rβ4    23.9   22.2   21.5   41.4   48.0   61.5   50.0   68.6   92.8   100

Template sequences: AChBP of Lymnaea stagnalis (Ls), Aplysia californica (Ac), and Bulinus truncatus (Bt) and the mouse α1 ECD (mα1). Target sequences: the ECD of rat α3 and β4 subunits and human α3, α4, β2, and β4 subunits.


Table 2.2. Sequence similarity (%) between template and model sequences.

       Ls     Ac     Bt     mα1    hα3    rα3    hα4    hβ2    hβ4    rβ4
Ls     100
Ac     59.0   100
Bt     68.6   60.5   100
mα1    44.2   46.6   44.7   100
hα3    48.5   45.0   48.7   69.7   100
rα3    48.5   43.9   49.3   68.8   96.2   100
hα4    42.9   42.9   44.7   66.8   76.4   76.4   100
hβ2    45.8   43.9   47.2   65.7   68.7   70.7   69.2   100
hβ4    48.4   43.9   49.3   63.6   68.7   69.7   68.2   85.5   100
rβ4    48.4   43.3   49.3   64.1   67.7   68.7   67.7   85.0   97.6   100

Template sequences: AChBP of Lymnaea stagnalis (Ls), Aplysia californica (Ac), and Bulinus truncatus (Bt) and the mouse α1 ECD (mα1). Target sequences: the ECD of rat α3 and β4 subunits and human α3, α4, β2, and β4 subunits.

Following alignment, three-dimensional models were built with MODELLER9v1

[106] in an iterative fashion, with 200 models being built in each iteration. Since

the model assessment methods used in MODELLER were exclusively calibrated

with single-chain proteins, they are not suitable for selecting top structures

among the pentameric nAChR models. To more accurately select a top structure,

each model was scored with a molecular mechanics Poisson-Boltzmann surface

area (MM-PBSA) approach which includes the internal energy of the model as

well as its solvation free energy. Each model was solvated in a TIP3P water box,

energy minimized, stripped of its waters, then scored with an MM-PBSA

approach in the AMBER suite of programs [107].
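The selection step described above, ranking models by the sum of internal energy and solvation free energy and keeping the lowest, can be sketched as follows. This is a minimal illustration rather than the AMBER workflow itself: the ModelScore type, the model names, and the energy values are hypothetical placeholders, and the energies are assumed to have been computed beforehand.

```python
from typing import List, NamedTuple


class ModelScore(NamedTuple):
    """Energies for one homology model (hypothetical container)."""
    name: str
    e_mm: float    # internal (molecular mechanics) energy, kcal/mol
    g_solv: float  # solvation free energy, kcal/mol


def total_energy(model: ModelScore) -> float:
    """Score used for ranking: internal energy plus solvation free energy."""
    return model.e_mm + model.g_solv


def select_top_model(models: List[ModelScore]) -> ModelScore:
    """Return the model with the lowest total energy."""
    if not models:
        raise ValueError("no models to rank")
    return min(models, key=total_energy)


# Placeholder values for illustration only; they are not results from this work.
example_models = [
    ModelScore("model_001", e_mm=-21000.0, g_solv=-9500.0),
    ModelScore("model_002", e_mm=-21400.0, g_solv=-9300.0),
    ModelScore("model_003", e_mm=-20900.0, g_solv=-9700.0),
]
best = select_top_model(example_models)
```

In an iterative scheme such as the one described here, the model returned by select_top_model would be the one carried forward as an additional template for the next round.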

Molecular dynamics with locally enhanced sampling (LES) [108] was applied to

the top structure of the sixth modeling iteration to better sample the conformation

of the A loop and its connection to the adjoining β5 strand of each subunit. This approach was taken since the A loop (loop 5 in Figure 2.3, corresponding to residues 94-105 for α subunits and 96-107 for β subunits) is poorly aligned with

the AChBP sequences and while a one-to-one alignment exists with the mouse

α1 subunit, this loop exists at the subunit interface which is not present in the monomeric mouse structure. Five copies of each of the five LES regions (one region for each subunit) were created with the ADDLES module of AMBER. After solvating the structure with a TIP3P water box and adding counterions, approximately 4 ns of LES simulation was performed. In total, there were 55 LES residues for each of the five copies, leaving the remaining 987 residues of the

ECD to be treated classically. Over the 3.93 ns duration of the simulation, the all-atom root mean square deviation (RMSD) for residues in the LES regions reached a maximum of 5.31 Å with respect to the starting structure.

Comparing this deviation to the maximum RMSD of 3.40 Å exhibited by the non-

LES residues characterizes the enhanced sampling achieved by this method.
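For reference, the RMSD compared above is the square root of the mean squared displacement of matched atoms. The sketch below assumes the two coordinate sets list the same atoms in the same order and have already been superimposed; the fitting step and the trajectory handling actually used in this work are not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as the input) between two matched coordinate lists.

    Each argument is a sequence of (x, y, z) tuples. No superposition is
    performed here; the structures are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


def max_rmsd(reference, frames):
    """Largest RMSD from the reference over a series of frames."""
    return max(rmsd(reference, frame) for frame in frames)
```

Restricting the coordinate lists to the LES residues or to the non-LES residues before calling max_rmsd would give the two maxima compared in the text.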

Upon separation of the five LES copies, the energy of each model conformation during the simulation was calculated with the same MM-PBSA protocol as described for the initial homology modeling. The structure with the lowest computed energy during the MD simulation was selected as the final template for homology modeling.

Initially, a rat α3β4 (rα3β4) nAChR ECD model was built to complement experimental data that was available at the time. Later, human α3β4 and human

α4β2 models were built based on the final rat model. The rα3β4 ECD was modeled in seven iterations as illustrated in Figure 2.4, where each successive iteration added additional symmetry, distance, and secondary structural restraints as well as incorporating the best-ranking model from the previous iteration as a

fifth template structure. The same templates and restraints used to obtain the

final rat α3β4 nAChR model were also used to create the human α3β4 and α4β2

nAChR models.

As shown in Figure 2.4, each successive modeling iteration was able to

successfully decrease the calculated model energy. The MM-PBSA energy of the

rα3β4 model was reduced by 11.2% from the initial round of modeling through

the final iteration. Three modeling adjustments that made the most significant

improvements in the calculated energies included refinement of the alignment

with secondary structure assignments, incorporation of the mouse α1 monomer

as a fourth crystallographic template, and LES (locally enhanced sampling)

refinement of loop A, which yielded -4.47%, -2.45%, and -3.50% changes in total

computed energy respectively. Incorporation of the mouse α1 monomer into the

homology modeling process was particularly helpful in refining the conformation

of several loop regions: L1, L5 (A loop), L7 (Cys-loop), and L9 (F loop). The three

molluskan species for which AChBPs have been crystallized all have shorter or

longer sequences in these regions, implying altered loop conformations in the

human nAChRs, while the mouse α1 sequence has a one-to-one alignment in

these loop regions. In fact, the mouse α1 sequence shares a one-to-one

alignment with both human α targets considered in this study, except for a single

insertion found in the C loop of the α1 sequence.

In addition to template differences, the alignment used to create this model is

unique, particularly in loop regions, from those previously reported. This implies differences in model structure that will affect docking and dynamics results.


Figure 2.4. Histograms of model energies per modeling iteration. Model energies were calculated to include the internal energy (E_MM) in addition to solvation free energy calculated using the MM-PBSA method. Iterations 2-7 incorporate the top scoring model from the previous iteration as an additional template. Rat α3β4 ECD is modeled in iterations 1-7 (blue), human α3β4 in iteration 8 (orange), and human α4β2 in iteration 9 (red). 1. Two roughly aligned AChBP templates (PDB ID: 1UW6 and PDB ID: 2BYR) were used with symmetry restraints. 2. An additional AChBP template (PDB ID: 2BJ0) was included; template alignment was refined, secondary structure assignments and distance restraints of select conserved motifs were added. 3. β-sheet restraints were added. 4. Mouse α1 monomer (PDB ID: 2QC1) was included as a fourth crystallographic template; α1 template specifically used to refine loop 1; hydration pocket waters added. 5. α1 template was used to refine F loop conformation. 6. C loop conformation of β subunits was refined. 7. The A loop of all subunits was refined with a template modified by LES MD simulation; symmetry restraints were removed. 8. Human α3β4 ECD models were built using same alignments and constraints as in 6. 9. Human α4β2 ECD models were built using same alignments and constraints as in 6.

2.4 Molecular Dynamics

Prior to docking, molecular dynamics (MD) simulations of the hα4β2 and hα3β4

ECD models were carried out for two purposes: to test the stability of the models and to collect an ensemble of receptor conformations for use in docking studies.

2.4.1 MD Methods

Prior to simulation, the model was solvated in a TIP3P water box with a 15 Å

buffer around all edges of the protein. After solvation, the system was charge

neutralized by the addition of Na+ counterions and energy minimized by 500

steps of steepest descent minimization followed by 1500 steps of conjugate

gradient minimization. The system was equilibrated by first increasing the

temperature of the system from 0 K to 300 K over 200 ps in which all protein

atoms were fixed with a 50 kcal/mol harmonic potential. This proved to be an

important step, since it allowed the water molecules to fill in the gaps at the

protein/water interface that were left vacant by the solvating algorithm in the

LEaP module of AMBER. If the waters were not first allowed to equilibrate

around the protein, undesired side chain movements were observed that

detrimentally affected agonist docking to the agonist binding sites. A final 200 ps

of unrestrained MD completed the equilibration process. Production runs of 5 ns

followed the equilibration. All simulations used a heat bath coupling constant of

2.0 ps and were performed at 1 atm with a pressure relaxation time of 2.0 ps.

Nonbonded interaction calculations were cut off at 8 Å, while the electrostatic

energy was computed using the Particle Mesh Ewald method. The simulations

were run using the sander code of AMBER 9 with the ff99 force field. Constant volume and temperature MD simulations of the nAChR models used the SHAKE

algorithm as implemented by AMBER with a 2 fs time step. Snapshots were

captured at 200 ps intervals along the production run trajectories to form a set of

26 receptor conformations that were used for docking.
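The count of 26 conformations is consistent with sampling a 5 ns production run every 200 ps when the starting frame at 0 ps is included. The sketch below makes that arithmetic explicit; including the initial frame is an assumption inferred from the count, not something stated in the text, and the function name is hypothetical.

```python
def snapshot_times_ps(length_ps, interval_ps, include_start=True):
    """Times (in ps) at which snapshots are taken along a trajectory.

    With include_start=True the frame at 0 ps is counted, which is the
    reading assumed here to arrive at 26 snapshots.
    """
    if interval_ps <= 0 or length_ps < 0:
        raise ValueError("length must be non-negative and interval positive")
    n_intervals = length_ps // interval_ps
    first = 0 if include_start else 1
    return [i * interval_ps for i in range(first, n_intervals + 1)]


times = snapshot_times_ps(5000, 200)  # 5 ns production run, 200 ps spacing
```

Dropping the initial frame (include_start=False) would leave 25 snapshots instead.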

2.4.2 MD Results

Since the antagonists we are studying act in an allosteric fashion, it was

important to model the receptor in the presence of agonist as would occur in vivo.

To prepare the nAChR models for antagonist blind docking, MD simulations were conducted for the receptors in various binding states, including an unbound state, a binary complex bound to a single epibatidine molecule, and a ternary complex saturated with two epibatidine molecules. Epibatidine was selected as the agonist used in the model to most closely recreate the experimental conditions.

Conformational clustering of the MD simulations was performed using the k-means method with the kclust script from the MMTSB toolbox [109]. Receptor conformations were extracted at 1 ps intervals over 5 ns MD simulations and clustered based on their Cα atom RMSDs with a tolerance of 1 Å. As shown in

Table 2.3, a general decrease in sampled conformations for both hα4β2 and hα3β4 models was observed upon ligand binding. While the α3β4 MD simulations show a decrease in sampled receptor conformations with both agonist binding events, no change in the number of clusters was found in the α4β2 model upon binding the second epibatidine molecule. These results suggest that the ECDs favor particular conformations when bound to agonists, presumably those conformations that lead to channel opening. This is consistent with single channel experiments, which show that short openings result from singly-ligated nAChRs, while doubly-ligated nAChRs exhibit sustained openings [110].

Table 2.3. Number of conformational clusters from MD simulations of nAChR models in three different binding states.

                      Number of MD clustersᵃ
                      α4β2     α3β4
apo                   9        13
binary complexᵇ       6        12
ternary complexᶜ      6        10

ᵃ Results of k-means clustering from 5000 snapshots extracted at 1 ps intervals from 5 ns MD simulations using centroids of all Cα atoms with an RMSD tolerance of 1 Å
ᵇ Single epibatidine molecule bound to agonist binding site 1
ᶜ Epibatidine bound to both agonist binding sites

The stability of these simulations was quantified by all-atom RMSD analysis. It

was found that the Cys loops were conformationally unstable, leading to steadily

increasing RMSDs over the duration of the 5 ns simulations. However, when the

RMSDs were recalculated to exclude the Cys loop residues, the all-atom RMSD

for each model plateaued in the range of 2-3 Å, indicating stable MD trajectories

at this timescale (Fig. 2.5). The Cys-loops are some of the most variable regions

on the ECD models, which is not surprising since these loops are known to make contact with the membrane head groups as well as the M2-M3 loops of the transmembrane domain. Since these potentially stabilizing interactions are absent in the models which only include the ECD, the Cys-loops are free to sample conformations that do not reflect the physical reality of the full nAChR embedded in a plasma membrane.


Figure 2.5. RMSD plots for MD simulations of nAChR models. All-atom RMSD plots for hα4β2 (A) and hα3β4 (B) in three different states: unbound (apo), binary complex, and ternary complex. Dashed lines represent RMSD values for the entire extracellular domain models, while the solid lines represent the RMSD for the entire models excluding the Cys loop residues. Data was smoothed with a ±25 frame sliding window average.
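The smoothing mentioned in the caption of Figure 2.5 can be sketched as a centered moving average spanning 25 frames on either side of each point. How the window was treated at the two ends of the trajectory is not stated; truncating the window at the boundaries is an assumption of this sketch.

```python
def sliding_average(values, half_window=25):
    """Centered moving average over +/- half_window frames.

    Near the ends of the series the window is truncated to the frames
    that exist (an assumed edge treatment).
    """
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

With snapshots saved every 1 ps, a ±25 frame window averages over roughly 51 ps of trajectory around each point.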

The average RMSDs from the starting structures for the individual subunits in the three sampled binding states show that the MD trajectories are relatively stable

(Table 2.4). The maximal backbone RMSD average for a single ECD subunit is

4.56 Å, while the typical subunit only deviates an average of 2.02 Å from its initial

conformation over simulation times of 5 ns. Some regions, including the C, Cys, and L1 loops, are notably more variable in conformation than each subunit as a whole, while the A, B, and F loops are generally more stable.

Plots of the all-atom RMSDs on a per residue basis are shown in Figures 2.6-7,

illustrating the more conformationally variable regions of the receptor versus the

more stable regions. These plots once again emphasize the mobility of the Cys-

loops (residues 127-138 in α subunits and 129-140 in β subunits) compared to the rest of the ECD model.


Table 2.4. Average RMSDs for backbone atoms of ECD models from MD simulations in three states. All values are average RMSDs in Å.

α4β2                                               α3β4
Subunit   Region      apo    binary  ternary       Subunit   apo    binary  ternary
α4₁       all         1.91   1.62    1.40          α3₁       2.44   2.16    2.18
          C loop      2.60   1.07    1.30                    3.48   2.87    2.26
          F loop      1.84   1.64    1.14                    2.08   1.55    1.36
          A loop      1.29   1.42    1.33                    2.10   1.51    1.50
          Loop 1      3.08   2.55    2.57                    2.42   2.77    4.44
          Cys loop    4.35   2.15    2.04                    4.12   4.06    3.84
          B loop      1.49   0.92    0.95                    1.85   2.31    1.44
β2₁       all         1.82   1.77    1.66          β4₁       2.15   2.79    1.75
          C loop      3.65   2.10    1.92                    1.87   4.70    2.80
          F loop      1.60   1.59    1.33                    1.86   3.23    1.56
          A loop      1.47   1.37    0.95                    1.38   2.36    1.95
          Loop 1      1.57   1.66    1.92                    2.98   4.56    2.25
          Cys loop    3.00   2.49    2.73                    2.99   3.39    2.08
          B loop      1.22   1.42    1.13                    1.85   1.74    1.30
α4₂       all         1.55   2.38    1.27          α3₂       1.90   1.83    1.96
          C loop      2.27   2.98    2.19                    2.61   1.90    1.54
          F loop      2.22   2.54    1.15                    1.57   1.63    2.25
          A loop      1.34   2.41    1.26                    1.38   0.99    2.31
          Loop 1      1.64   1.75    1.21                    2.61   1.97    1.94
          Cys loop    1.60   1.59    1.33                    1.86   3.23    1.56
          B loop      1.47   1.37    0.95                    1.38   2.36    1.95
β2₂       all         1.57   1.66    1.92          β4₂       2.98   4.56    2.25
          C loop      3.00   2.49    2.73                    2.99   3.39    2.08
          F loop      1.22   1.42    1.13                    1.85   1.74    1.30
          A loop      1.55   2.38    1.27                    1.90   1.83    1.96
          Loop 1      2.27   2.98    2.19                    2.61   1.90    1.54
          Cys loop    2.22   2.54    1.15                    1.57   1.63    2.24
          B loop      1.34   2.41    1.26                    1.38   0.99    2.31
β2₂       all         1.64   1.75    1.21          β4₂       2.61   1.97    1.94
          C loop      1.60   1.59    1.33                    1.86   3.23    1.56
          F loop      1.47   1.37    0.95                    1.38   2.36    1.95
          A loop      1.57   1.66    1.92                    2.98   4.56    2.25
          Loop 1      3.00   2.49    2.73                    2.99   3.39    2.08
          Cys loop    1.22   1.42    1.13                    1.85   1.74    1.30
          B loop      1.55   2.38    1.27                    1.90   1.83    1.96

Binary complex (one bound epibatidine molecule at α/β interface, agonist binding site 1 in Figure 2.1) and ternary complex (an epibatidine molecule bound to each α/β interface, agonist binding sites 1 and 2 in Figure 2.1). MD snapshots were collected at 1 ps intervals from 5 ns long trajectories and all RMSD values are in reference to the initial structure of each trajectory.


Figure 2.6. Average all-atom RMSDs for hα4β2 nAChR ECD model in three different binding states. All-atom RMSD of each residue from the initial structure of a 5 ns MD simulation of three states: unbound (blue), bound to one epibatidine molecule at agonist binding site 1 (green), and bound to an epibatidine molecule at both agonist binding sites (red). Several loop regions are highlighted, including L1 (14-27), Cys-loop (127-138), F loop (159-174), and the α-subunit C loop (189-195).


Figure 2.7. Average all-atom RMSDs for hα3β4 nAChR ECD model in three different binding states. All-atom RMSD of each residue from the initial structure of a 5 ns MD simulation of three states: unbound (blue), bound to one epibatidine molecule at agonist binding site 1 (green), and bound to an epibatidine molecule at both agonist binding sites (red). Several loop regions are highlighted, including L1 (14-27), Cys-loop (127-138), F loop (159-174), and the α-subunit C loop (189-195).

Most nAChR agonists, including epibatidine, carry a positive charge at

physiologic pH. This plays a significant role in their binding to the nAChRs due to

cation-π interactions between the charged agonist and the cluster of aromatic

residues that forms the agonist binding site [111]. In addition to cation-π

interactions, proper fitting into the agonist binding site can allow for strong

hydrogen bond formation between the positively charged nitrogen of the agonist and the backbone carbonyl of Trp148, as observed in crystallographic structures [86,102] and proven important in mutational studies [111]. Both of these

interactions have been measured in our dynamics studies, with the results

presented in Table 2.5. In both cases of epibatidine binding to the α4β2 models,

the hydrogen bond between the agonist and Trp148 is observed, while initial

simulations of the epibatidine-bound α3β4 models did not indicate that the

hydrogen bond was formed. Inspection of the docked epibatidine conformation in the hα3β4 nAChR binding sites revealed that the hydrogen bonding interaction

was not occurring due to a 180° rotation of the bicyclic portion of the epibatidine molecules, which positioned the positively charged nitrogen atom in the direction opposite to that observed in the AChBP-bound conformation [102]. To remedy the

inaccurate epibatidine blind docking mode to the hα3β4 model, the binding mode

was remodeled based on the hα4β2 docking results that had a lower RMSD from

the crystallographic position. MD simulations of the remodeled hα3β4 epibatidine-bound nAChRs resulted in trajectories in which the agonist formed stable hydrogen bonds with the backbone oxygen atom of Trp148, consistent with the experimental binding mode.
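The two distances tracked in these simulations can be sketched as follows: the hydrogen bond distance between the charged nitrogen of the agonist and the backbone oxygen of Trp148, and the cation-π distance between the same nitrogen and the center of mass of the Trp148 indole group. The frame layout (a dict with the keys ligand_N, trp_O, indole_coords, and indole_masses) is a hypothetical structure chosen for illustration, and the use of the sample standard deviation is an assumption.

```python
import math
from statistics import mean, stdev


def distance(p, q):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def center_of_mass(coords, masses):
    """Mass-weighted mean position of a group of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[i] for c, m in zip(coords, masses)) / total for i in range(3)
    )


def summarize(values):
    """Mean and sample standard deviation of a series of distances."""
    return mean(values), stdev(values)


def interaction_distances(frames):
    """Summarize hydrogen bond and cation-pi distances over MD frames.

    Each frame is a dict with the hypothetical keys 'ligand_N', 'trp_O',
    'indole_coords', and 'indole_masses'.
    """
    hbond, cation_pi = [], []
    for frame in frames:
        hbond.append(distance(frame["ligand_N"], frame["trp_O"]))
        com = center_of_mass(frame["indole_coords"], frame["indole_masses"])
        cation_pi.append(distance(frame["ligand_N"], com))
    return summarize(hbond), summarize(cation_pi)
```

Each call returns two (mean, standard deviation) pairs, matching the form in which such measurements are usually tabulated.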

Table 2.5. Measurements of agonist binding stability in MD simulations of epibatidine-bound nAChRs.

                        hα4β2                               hα3β4
                        Agonist binding  Agonist binding    Agonist binding  Agonist binding
                        site 1           site 2             site 1           site 2
Hydrogen bonding interaction distance (Å)ᵃ
  binary complexᶜ       2.88 (0.13)      -                  2.84 (0.11)      -
  ternary complexᵈ      2.89 (0.13)      3.75 (1.16)        2.84 (0.12)      2.86 (0.13)
Cation-π interaction distance (Å)ᵇ
  binary complexᶜ       3.91 (0.39)      -                  3.34 (0.25)      -
  ternary complexᵈ      3.72 (0.38)      4.55 (0.75)        3.45 (0.27)      4.87 (0.49)

ᵃ Distance between positively charged N of epibatidine and backbone O of Trp148
ᵇ Distance between positively charged N of epibatidine and center of mass for the indole group of Trp148
ᶜ Single epibatidine molecule bound to agonist binding site 1
ᵈ Epibatidine bound to both agonist binding sites
Average measurements calculated from 5 ns MD simulations with standard deviations in parentheses

2.5 Blind Docking

A set of four nAChR negative allosteric modulators (NAMs) were used to search

for the unknown binding site. As illustrated in Figure 2.8B, these included COB-3,

KAB-18, APB-12, and PPB-9. Given that these compounds are all structurally

related, it was anticipated that each would bind at the same site on the receptor.

Docking methods were used to identify the binding site/mode of the compounds.

Typically, a binding site is already known and the docking algorithm is used to

determine a specific binding mode (conformation of the ligand in the pocket).

However, in this case, the site of binding was not known, therefore an approach

called ‘blind docking’ was used, in which the entire surface of the receptor was

treated as part of the search space. Only two pieces of data were available to

limit the search space. First, these molecules inhibit nAChR function in

an allosteric fashion, therefore the orthosteric (agonist-binding) site could be

discounted. This knowledge was worked into the docking protocol by using

epibatidine-bound receptor conformations. Second, one of the NAMs was shown to be selective for hα4β2 receptors over hα3β4 receptors. This information allowed

us to eliminate binding sites that shared identical amino acid sequences between

the two receptor subtypes.


Figure 2.8. Compounds used in blind docking experiments. A. Docked agonists included acetylcholine, nicotine, and epibatidine. B. Docked negative allosteric modulators included COB-3, PPB-9, APB-12, and KAB-18.

2.5.1 Docking Methods

Agonist structural coordinates were taken from the PDB and processed by the

LigPrep program of the Schrödinger suite to determine the ionization state of

each compound at pH 7 ± 2. All agonists were determined to carry a positive

charge within the pH range considered. All compounds were assigned Gasteiger charges and docked with the Lamarckian genetic algorithm (LGA) [112] in

AutoDock4 [113] with the maximum number of freely rotating bonds per ligand.

One hundred independent docking runs were completed for each ligand to each of the receptor conformations. A cutoff of 25,000,000 – 100,000,000 energy evaluations was used, depending on the number of rotatable bonds in the ligand, while all other docking parameters maintained the default setting.

Blind docking grids of size 90.00 Å × 90.00 Å × 56.25 Å with grid point spacing of

0.375 Å were constructed for each snapshot conformation with AutoGrid4. These grids were large enough to encompass the entire extracellular domain, only excluding the Cys-loop region, since docking results in this region are unrealistic due to the contact these loops make with the TM2-TM3 loops that are not part of these models.
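As a rough check on the grid dimensions quoted above, the number of grid intervals along each axis follows from dividing the box edge length by the point spacing. The sketch below assumes the AutoGrid convention that the per-axis point count setting gives the number of intervals, so that each axis holds one more point than intervals; the function names are illustrative and are not part of any AutoDock tool.

```python
def grid_intervals(box_lengths, spacing):
    """Number of grid intervals per axis for a box with the given edge lengths (Å)."""
    intervals = []
    for length in box_lengths:
        n = round(length / spacing)
        # Reject edge lengths that are not a whole multiple of the spacing.
        if abs(n * spacing - length) > 1e-6:
            raise ValueError(f"{length} Å is not a multiple of {spacing} Å")
        intervals.append(n)
    return intervals


def total_grid_points(intervals):
    """Total points in the map, assuming (intervals + 1) points along each axis."""
    total = 1
    for n in intervals:
        total *= n + 1
    return total


# Blind docking grid dimensions and spacing as quoted in the text.
blind = grid_intervals([90.00, 90.00, 56.25], 0.375)
print(blind, total_grid_points(blind))
```

At this spacing the blind docking box corresponds to several million grid points per atom-type map, which is one reason the search space is so much more demanding than a focused grid.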

The 100 docking positions for each ligand at each receptor conformation were clustered by their centroid points with a 4 Å tolerance. The four most populous clusters of each ligand were then clustered against those from the other receptor conformations. This clustering of clusters was based on the receptor residues that came into contact with each cluster instead of the

Cartesian coordinates attributed to the centroid-based clusters. This method allowed for the clusters from different time points to be compared to each other without having to worry about spatial drift or rotation of the receptor. A list of residues coming within 5 Å of each of the docked conformations for each centroid-based cluster was created with scripts utilizing functions available in the

Chimera program [114]. Clusters with residue lists that shared a 65% intersection were considered to belong to the same docking position.
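The two-stage clustering described above can be sketched as follows. This is a minimal illustration rather than the scripts used in this work: the greedy assignment rule, the definition of the 65% intersection (shared residues as a fraction of the smaller residue list), and all function names are assumptions made for the example.

```python
import math


def centroid(coords):
    """Geometric center of a list of (x, y, z) atom coordinates."""
    n = len(coords)
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))


def cluster_by_centroid(poses, tol=4.0):
    """Greedy clustering: a pose joins the first cluster whose seed centroid lies within tol (Å)."""
    clusters = []  # list of (seed_centroid, [pose indices])
    for i, pose in enumerate(poses):
        c = centroid(pose)
        for seed, members in clusters:
            if math.dist(c, seed) <= tol:
                members.append(i)
                break
        else:
            clusters.append((c, [i]))
    # Most populous clusters first.
    return sorted(clusters, key=lambda item: len(item[1]), reverse=True)


def residue_overlap(residues_a, residues_b):
    """Shared residues as a fraction of the smaller contact list (assumed definition)."""
    a, b = set(residues_a), set(residues_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def same_docking_site(residues_a, residues_b, threshold=0.65):
    """True when two contact lists overlap enough to count as one docking position."""
    return residue_overlap(residues_a, residues_b) >= threshold
```

Comparing contact-residue lists rather than coordinates is what makes the second stage insensitive to drift or rotation of the receptor between snapshots.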

After the initial round of blind docking to the unbound nAChR models, the epibatidine docking with the smallest RMSD from the AChBP binding mode (as found in PDB ID: 2BYQ) was kept as part of each nAChR structure. Each agonist-bound system was then resampled via an MD simulation using a protocol similar to that described above. A second epibatidine molecule was then docked to the models using the same ensemble blind docking method employed to dock the first compound. Again, the docking with the smallest RMSD from the AChBP binding mode at the second agonist binding site was added to the system to create a ternary complex: nAChR saturated with two agonist molecules.

Epibatidine was chosen as the agonist in the model to correspond to the agonist used in functional assays [115].

Upon creation of the ternary complex for both hα3β4 and hα4β2 nAChR models, the systems underwent one final MD simulation to create ensembles of epibatidine-bound receptor conformations. A final blind docking procedure was carried out with the antagonists illustrated in Fig. 2.8B. The results of the ensemble blind docking with the antagonists were clustered in the same fashion as the agonists in order to identify the most probable docking sites.

2.5.2 Docking Results

Flexibility of the agonist binding site has been documented by unbound and agonist-bound AChBP crystal structures [102]. When docking, this flexibility was accounted for by the use of multiple receptor conformations as extracted from

MD trajectories. From the MD simulations discussed in Section 2.4, a total of 26 snapshots were collected for each receptor subtype at regular 200 ps intervals. Initially, three different agonists with known experimental binding

modes to AChBP were blindly docked to validate the docking procedure. These

compounds, illustrated in Fig. 2.8A, include acetylcholine, nicotine, and

epibatidine. The docking results for the agonists near the agonist-binding site are

presented in Table 2.6 with representative docking modes illustrated in Fig. 2.9.

Although blind docking of the agonists to the hα3β4 snapshots was able to locate

both binding sites, only one of the two hα4β2 binding sites was properly located.

This was due to an unusual C-loop conformation at agonist binding site 1 in the

unbound state. Experimental binding affinities for all three agonists on human

nAChRs could not be found in the literature; however, the EC50 values for acetylcholine, nicotine, and epibatidine have been reported for recombinant hα4β2 and hα3β4 receptors expressed in HEK293 cells and Xenopus oocytes [116-118]. The docking energies for the agonists reproduced the experimental potency trends, with epibatidine binding more strongly than nicotine, which in turn displays greater binding affinity than acetylcholine. Additionally, the average docking

energies of the agonists all showed a preference to bind the hα4β2 models over the hα3β4 models, a trend that is also experimentally observed [116-118].


Figure 2.9. Blind docking modes compared to X-ray structures. Docking modes for epibatidine (A – magenta), nicotine (B – orange), and acetylcholine (C – green) to hα4β2 models compared to crystallographic binding modes (blue). Crystallographic structures for AChBP bound to epibatidine, nicotine, and carbamylcholine (PDB IDs: 2BYQ, 1UW6, and 1UV6 respectively) were superimposed on nAChR ECD models to determine RMSDs of the dockings.

Table 2.6. Blind docking results for agonists to multiple nAChR conformations.

hα4β2
                 Average docking       Expt. potency,    Cluster    Average docking
                 energy (kcal/mol)ᵃ    EC50 (μM)ᵇ        size       RMSD (Å)ᶜ
acetylcholine    -4.86                 100               132        3.13
nicotine         -6.59                 3.5               282        1.72
epibatidine      -7.83                 0.043             154        5.44

hα3β4
                 Average docking       Expt. potency,    Cluster    Average docking
                 energy (kcal/mol)ᵃ    EC50 (μM)ᵇ        size       RMSD (Å)ᶜ
acetylcholine    -4.66                 203.14            150        6.45
nicotine         -6.31                 40.3              90         4.97
epibatidine      -7.28                 0.151             149        7.42

ᵃ AutoDock energies
ᵇ Experimental agonist potencies from data reported in [116-118]
ᶜ RMSD measurements compared to corresponding AChBP crystal complexes
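The two trends drawn from Table 2.6 (docking energies ranking the agonists in the same order as their experimental potencies, and more favorable energies for hα4β2 than for hα3β4) can be checked directly from the tabulated values. The sketch below is only a bookkeeping aid; the dictionary layout and function names are illustrative.

```python
# (average docking energy in kcal/mol, experimental EC50 in μM), copied from Table 2.6.
TABLE_2_6 = {
    "hα4β2": {"acetylcholine": (-4.86, 100.0), "nicotine": (-6.59, 3.5), "epibatidine": (-7.83, 0.043)},
    "hα3β4": {"acetylcholine": (-4.66, 203.14), "nicotine": (-6.31, 40.3), "epibatidine": (-7.28, 0.151)},
}


def same_rank_order(entries):
    """True when sorting by docking energy and by EC50 gives the same ligand order."""
    by_energy = sorted(entries, key=lambda name: entries[name][0])
    by_ec50 = sorted(entries, key=lambda name: entries[name][1])
    return by_energy == by_ec50


def prefers_first(subtype_a, subtype_b):
    """True when every ligand has a more negative docking energy for subtype_a."""
    return all(
        TABLE_2_6[subtype_a][name][0] < TABLE_2_6[subtype_b][name][0]
        for name in TABLE_2_6[subtype_a]
    )


for subtype, entries in TABLE_2_6.items():
    print(subtype, same_rank_order(entries))
print(prefers_first("hα4β2", "hα3β4"))
```

With only three ligands per subtype, agreement in rank order is a weak test, so it is best read as a sanity check on the docking protocol rather than as a quantitative validation.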

Three antagonists, COB-3, PPB-9, and APB-12, were docked to the models using the same ensemble blind docking method that was used to dock the agonists. Based on LigPrep (Schrödinger, LLC) results, the antagonists are all positively charged at physiologic pH, protonated at the nitrogen atom of their piperidine/pyrrolidine moieties. Each antagonist also has one or more stereogenic centers. The two stereoisomers of each compound with the lowest computed energy were used in the blind docking study; each of these conformations had equatorial branching off of the heterocyclic moieties. The

antagonist docking site that was ultimately validated as the correct binding site was populated by 28.2% of the dockings to the epibatidine-bound model conformations. Three other sites were more prominently populated with alternate docking clusters; these had 69.6%, 57.1% and 47.1% rates of being identified as one of the four largest docking clusters for each antagonist that was docked. The positions of these other sites were all located on the inside of the doughnut-shaped extracellular domain facing the pore. They were either at subunit interfaces (both α/β and β/β) or tucked inside an A loop.

Some of the false positives observed in the blind docking can be attributed to the large search space used. Additionally, the use of a medium- to low-resolution homology model complicated the matter, possibly creating cavities in the surface that normally do not exist to which the ligands may preferentially dock. The use of multiple receptor conformations served two purposes. First, it could potentially relieve some of the bias that the models had toward the modeling templates

(AChBP structures). Second, the ensemble-based docking approach could help account for the known flexibility of the receptor, much of which is ligand-induced.

2.6 Focused Docking and Induced Fit Molecular Dynamics

One of the frequently occurring antagonist blind docking modes was investigated more closely by redocking the antagonist KAB-18 to the suspected allosteric site with focused docking grids. KAB-18 became a focus because it exhibits preferential antagonism of hα4β2 nAChRs versus hα3β4 nAChRs [115]. KAB-18 was docked to focused docking grids of size 37.5 Å × 36.0 Å × 37.5 Å with 0.375 Å point spacing. The grids were centered at an α/β interface, encompassing the

regions surrounding the epibatidine-bound agonist binding site. KAB-18 was

docked using AutoDock with similar parameters as the agonists, using a cutoff of

100,000,000 energy evaluations for the LGA. Recurring docking poses were

determined by clustering the docking results with an all-atom RMSD tolerance of

2 Å.
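A minimal sketch of clustering docked poses by all-atom RMSD with a 2 Å tolerance is given below. It assumes that every pose lists the same atoms in the same order, that no superposition is needed because all poses share the receptor coordinate frame, and that poses are supplied from best to worst docking energy so that each cluster is seeded by its lowest-energy member. The greedy assignment rule is an assumption of the example and may differ in detail from the clustering performed by AutoDock.

```python
import math


def rmsd(pose_a, pose_b):
    """All-atom RMSD (Å) between two poses with identical atom ordering."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))


def cluster_by_rmsd(poses, tol=2.0):
    """Assign each pose to the first cluster whose seed pose is within tol (Å)."""
    clusters = []  # each cluster is a list of pose indices; index 0 is the seed
    for i, pose in enumerate(poses):
        for members in clusters:
            if rmsd(pose, poses[members[0]]) <= tol:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Recurring poses then show up as clusters with many members, which is the criterion used above to pick candidate binding modes.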

The selection of a precise docking mode was aided by existing structure-activity

relationship (SAR) data that indicate modifying the terminal phenyl of the biphenyl group of KAB-18 to a succinimide moiety results in a loss of hα4β2 selectivity [116]. Additionally, modifying the length of the aliphatic linker on the opposite end of the antagonist was also shown to result in a loss of relative hα4β2 selectivity. Taking this into consideration, a binding mode in which the aforementioned regions of KAB-18 were found to associate with receptor residues that vary between the hα4β2 and hα3β4 nAChRs was selected. A subsequent MD simulation of this focused docking mode revealed two potentially important polar binding interactions. First was a hydrogen bond between the keto group of the ester linkage of KAB-18 and the hydroxyl oxygen of Thr58 of the β2 subunit. Second was a Coulombic interaction between the positively charged nitrogen of the piperidine group of KAB-18 and the carboxyl group of Glu60 of the

β2 subunit. These interactions and their stability in a 9 ns MD simulation are illustrated in Figure 2.10.

The refined binding mode is illustrated in Figure 2.11A, highlighting the amino acids with which the antagonist makes contact, while a superposition of the other

antagonist docking modes is found in Figure 2.11B. Interestingly, the residues that seem to confer selectivity for this binding mode, i.e. the sites of variation between the hα4β2 and hα3β4 subtypes (amino acids at positions 78, 110, 112,

118, 58, and 35), are all found on the β subunit, forming a band along the 6-membered β-sheet that creates the (-) side of the α/β interface (dark blue in Fig.

2.11A).

Figure 2.10. Stability of KAB-18 at its proposed binding site. A. Initial docking mode of KAB-18 (magenta) to the hα4β2 model in the presence of epibatidine (purple). The ligand binds at the interface between the α subunit (green ribbon) and the β subunit (blue ribbon). Dotted lines identify key polar interactions between the ligand and the receptor. B. Induced binding mode after 7 ns of MD simulation. C. Distance between positively charged piperidine nitrogen of KAB-18 and carboxyl oxygens of β2Glu60. D. Distance between keto oxygen of ester linkage of KAB-18 and hydroxyl oxygen of β2Thr58.


Figure 2.11. Detailed docking modes for negative allosteric nAChR modulators. A. Docking mode of KAB-18 (magenta) at the α4(+) (green) / β2(-) (blue) interface in the presence of the agonist epibatidine (purple). Residues varying between the β2 and β4 subunits are featured (dark blue). B. Superimposed Glide docking modes of KAB-18 (magenta), APB-12 (grey), PPB-9 (orange), and COB-3 (green) at the same binding site.

2.7 Binding Site Validation: Mutagenesis and Functional Assays

[The work presented in this section was primarily performed by Brandon

Henderson, but is included here as the experiments were specifically performed

to validate the computationally predicted binding mode.]

Based on the proposed interaction between KAB-18 and the hα4β2 ECD (Figure

2.11A), mutations were suggested that could experimentally validate the binding

mode. Initially, two mutations were tested: β2F118L and β2T58K. Both of these

mutations change sites on the β2 subunit to the corresponding amino acid found

on the β4 subunit. KAB-18 has no functional activity on human α3β4 nAChRs when tested at concentrations up to 100 μM [115] (higher concentrations were

not possible due to solubility limitations); therefore, it was thought that these

mutations would potentially decrease the observed potency of the molecule.

2.7.1 Experimental Methods

Human nAChR α4 and β2 full-length cDNAs in the vector pSP64 (poly A) were

obtained from Dr. Jon Lindstrom (University of Pennsylvania) and used as the

template for mutagenesis (β2) and for transient expression (α4 and β2). A single mutation was made in the β2 subunit using the QuikChange Lightning Multi Site-Directed Mutagenesis Kit (Stratagene) following the manufacturer's instructions.

Primers were designed using the QuikChange Primer Design Program

(Stratagene) and Oligo 4.0 (National Biosciences) and synthesized by Invitrogen.

Primers were designed to replace the threonine residue at position 58 in the hβ2

subunit with a lysine found at the similar position in the hβ4 subunit (T58K). The

following primer was designed to change the threonine (ACC) at position 58 in

the β2 subunit to lysine (AAG): β2 mutant 5'-

CCACCAATGTCTGGCTGAAGCAGGAGTGGGAAGATTATCG-3'. The

underlined nucleotides defined the mutation. Primers were also designed to

replace the phenylalanine residue at position 118 in the hβ2 subunit with a

leucine found at the similar position in the hβ4 subunit. The following primer was

designed to change the phenylalanine (TTC) at position 118 in the β2 subunit to

leucine (TTG): 5'-TCTCCTATGATGGTAGCATCTTGTGGCTGCCGCCTGC-3'

and 5'- GCAGGCGGCAGCCACAAGATGCTACCATCATAGGAGA-3'. It should

be noted that for the F118L mutation, an additional mutation (T) was introduced which did not change the coding sequence, but relaxed a potential loop in the

primer in order to allow for the generation of this mutation. The mutant hβ2

cDNAs were subcloned into pcDNA 3.1+Zeo (Invitrogen). The wild type hα4 and

hβ2 cDNAs were also subcloned into the pcDNA 3.1+ and pcDNA 3.1+ Zeo

vectors respectively. All cDNA clones were completely sequenced using a 3730

DNA Analyzer (Applied Biosystems) at the Ohio State University Plant-Microbe

Genomics Facility. DNAs used for transfection were purified using PureLink High

Pure Mini or Midi Kits (Invitrogen). HEK ts201 cells (kind gift of Dr. Rene Anand,

Ohio State University Department of Pharmacology) were transiently transfected

with wild-type hα4 and mutant hβ2 cDNAs or with wild-type hα4β2 cDNAs using Lipofectamine 2000 (Invitrogen) in 60 mm dishes. After 8 hours, the cells were replated into 96-well dishes for the intracellular calcium accumulation assays.
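The primer sequences quoted above lend themselves to a quick consistency check: the two F118L primers should be reverse complements of one another, and the stated codon changes (ACC to AAG for T58K, TTC to TTG for F118L) can be compared position by position. The helper names below are illustrative, and the sketch does not attempt to locate the reading frame within the primers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(codon_a, codon_b):
    """Count positions at which two equal-length codons differ."""
    return sum(a != b for a, b in zip(codon_a, codon_b))


# F118L primer pair copied from the text above.
F118L_FORWARD = "TCTCCTATGATGGTAGCATCTTGTGGCTGCCGCCTGC"
F118L_REVERSE = "GCAGGCGGCAGCCACAAGATGCTACCATCATAGGAGA"

print(reverse_complement(F118L_FORWARD) == F118L_REVERSE)
print(mismatches("ACC", "AAG"), mismatches("TTC", "TTG"))
```

A check of this kind catches transcription errors in primer sequences before they are ordered or reported.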

Calcium accumulation assays were performed as described previously [76] with

slight modifications. Briefly, HEK ts201 cells, transiently expressing wild type

hα4β2 nAChRs (hα4β2wt) or mutant hα4β2 nAChRs (hα4β2m), were plated on

96-well plates at a density of 2.6 × 10⁵ cells per well. Twenty-four hours after plating, the cells were washed and then incubated with fluo-4-AM for 30 minutes at 37°C followed by 30 minutes at 24°C. After incubation, cells were washed and fluorescence was measured at ~0.7 second intervals using a fluid handling integrated fluorescence plate reader (FlexStation, Molecular Devices,

Sunnyvale, CA). The experimental design involved three treatment groups: control-sham treated, control-epibatidine treated, and antagonist treated.

Functional responses were quantified by first calculating the net fluorescence

changes (the difference between control sham-treated and control agonist-

treated groups). Data were expressed as a percentage of control-epibatidine

treated groups. Results were calculated from the number of observations (n)

performed in triplicate. Curve fitting was performed by Prism software

(GraphPad, San Diego, CA). EC50 values, IC50 values, and Hill coefficients were obtained by averaging values generated from each individual concentration-

response curve. EC50 values and IC50 values were expressed as geometric

means (95% confidence limits). Experimental values were compared using the t-

test (p<0.005), as indicated.
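Two of the quantities described above can be illustrated briefly: the geometric mean used to summarize EC50 and IC50 values across individual curves, and the Hill-type relationship between antagonist concentration and remaining response. The sketch assumes a simple two-parameter inhibition curve running from 100% to 0% of control; the fitting model actually used in Prism may include additional parameters, and the example IC50 values are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean, i.e. the antilog of the mean of the log values."""
    logs = [math.log10(v) for v in values]
    return 10 ** (sum(logs) / len(logs))


def percent_response(conc, ic50, hill=1.0):
    """Percent of control response remaining at antagonist concentration conc.

    hill is the magnitude of the Hill coefficient (inhibition curves are
    often reported with a negative slope).
    """
    return 100.0 / (1.0 + (conc / ic50) ** hill)


# Hypothetical per-curve IC50 estimates (μM), for illustration only.
example_ic50s = [5.0, 10.0, 20.0]
print(geometric_mean(example_ic50s))
print(percent_response(10.0, 10.0))
```

Averaging on a log scale is the natural choice here because potency estimates from concentration-response curves tend to be log-normally distributed.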

2.7.2 Functional Results

Functional IC50 values for KAB-18 and control antagonists (d-tubocurarine and mecamylamine) as well as functional EC50 values for a control agonist (epibatidine) were obtained using a fluorescence calcium accumulation assay. Changes in the

IC50 values of KAB-18 were used to document a change in the apparent affinity

of KAB-18 as caused by mutation of the target amino acids. The IC50 for KAB-18 increased to 71.8 μM on the β2T58K mutant from the wild-type IC50 of 8.5 μM (Figure 2.12 and Table 2.7), an eight-fold decrease in observed potency. The effect of the β2F118L mutation was even more pronounced, with a loss of inhibitory activity for KAB-18 at concentrations up to 100 μM. It is important to note that these single point mutations did not affect apparent affinity for epibatidine (an agonist) at the orthosteric site adjacent to the site of allosteric modulation. Nor did the mutations alter the apparent affinity for tubocurarine (a competitive antagonist) or mecamylamine, a non-competitive antagonist which binds at a different

location on the receptor.

The results of the functional assays with the mutant receptors indicated that both

mutations are involved in the binding of KAB-18, while leaving the receptors

functionally intact. The F118L mutation resulted in a greater change of apparent

affinity between the ligand and receptor. Based on the modeling, phenylalanine

at position 118 seems to be involved in a π-π stacking interaction with the

terminal phenyl group of KAB-18, in addition to a potential cation-π interaction

with the positively charged piperidine moiety of KAB-18. Overall, these data

support the experimental findings that KAB-18 preferentially inhibits hα4β2 over

hα3β4 nAChRs [115] and are consistent with KAB-18 binding to the allosteric site predicted by computational modeling.

Figure 2.12. Dose-response curves for epibatidine and KAB-18 on wild type and mutant hα4β2 nAChRs. A. Functional response for epibatidine binding to hα4β2WT (wild type) and hα4β2 T58K/F118L mutant nAChRs. B. Functional response of KAB-18 on wild type and mutant nAChRs. Data are expressed as a percentage of control responses using 3 μM epibatidine. Values represent means ± SEMs (n = 5 – 7).

Table 2.7. Effects of agonists and antagonists on wild type and mutant hα4β2 nAChRs.

                         Wild type hα4β2                  hα4β2 T58K                       hα4β2 F118L
                         EC50 or IC50 valuesᵃ     nHᵇ     EC50 or IC50 valuesᵃ     nHᵇ     EC50 or IC50 valuesᵃ     nHᵇ
epibatidine (EC50)       36.8 (25.4-53.4) nM      0.9     29.2 (9.8-87.2) nM       0.7     23.7 (12.6-44.6) nM      0.8
d-tubocurarine (IC50)    5.5 (3.4-9.0) μM        -1.0     6.2 (2.1-18.5) μM       -0.6     6.5 (3.9-10.9) μM       -0.9
mecamylamine (IC50)      0.2 (0.1-0.3) μM        -1.4     0.2 (0.1-0.5) μM        -0.6     0.4 (0.3-0.5) μM        -1.1
KAB-18 (IC50)            10.0 (5.5-18.0) μM      -1.2     71.8 (48.3-107.3) μMᶜ   -1.0     >100 μMᶜ,ᵈ               --

ᵃ Values represent geometric means (confidence limits), n = 5-7
ᵇ nH, Hill coefficient
ᶜ Significantly different from wild type response, p<0.005
ᵈ Compound is insoluble at concentrations greater than 100 μM
Data ranges in parentheses

2.8 Free Energy Analysis

Following binding mode analysis and in conjunction with the experimental

mutagenesis, β2T58K and β2F118L hα4β2 nAChR models were computationally

built and evaluated. Methods to build the mutants and sample their dynamics

were similar to those described in Sections 2.3 and 2.4. Following MD simulations, binding energy analysis was carried out to correlate KAB-18 binding

with the experimental data in Section 2.7.

Binding free energies were calculated for six cases: epibatidine binding to both

hα4β2 and hα3β4 nAChR models, KAB-18 binding to both models in the

presence of epibatidine, and KAB-18 binding to the hα4β2 T58K and F118L

models. The standard AMBER MM-PBSA protocol [119] was applied to 1500

bound-state conformations, extracted at 1 ps intervals from the MD simulations

described above. The receptor systems were composed of full α/β ECD interfaces for both enthalpic and entropic calculations. Entropy values were calculated using normal mode analysis.

Convergence of the computed binding free energies was tracked to ensure sufficient sampling. Standard deviations of time averages are reported for sliding average data with a window size of 200 data points. Time averages at increasing intervals were computed (Figure 2.13) to quantify the convergence of each binding free energy. The average change in computed binding free energy between the first 1400 and first 1500 data points was 0.16 kcal/mol for the six cases reported, supporting the convergence of the values over the sampling period.

The results of the binding energy calculations are presented in Table 2.8. A binding energy of -17.46 kcal/mol for epibatidine binding alone to the hα4β2 model was computed, compared to the experimental range of -14.49 to -14.27 kcal/mol [117]. For epibatidine binding to the hα3β4 model, a binding energy of -14.91 kcal/mol was computed, compared to the experimental range of -13.25 to -13.19 kcal/mol [118]. These more computationally intensive free energy calculations yield numbers that follow the experimental binding trends for epibatidine in addition to being much closer estimates of the experimentally derived energies than the AutoDock scores reported in Table 2.6. In the presence of epibatidine, KAB-18 was predicted to bind more strongly to the hα4β2 nAChR model than the hα3β4 model, with computed binding energies of -6.25 kcal/mol and 11.25 kcal/mol respectively. This is in line with the experimental data.
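For readers who wish to relate the experimental free energy ranges quoted above to dissociation constants, the standard relation ΔG = RT ln(Kd) can be applied. The sketch below assumes a temperature of 298.15 K and a 1 M standard state; the temperature and the type of affinity constant used in the original conversion are not stated here, so the numbers it produces are indicative only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dg_from_kd(kd_molar, temp_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temp_k * math.log(kd_molar)


def kd_from_dg(dg_kcal, temp_k=298.15):
    """Dissociation constant (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(dg_kcal / (R_KCAL * temp_k))


# Lower and upper ends of the experimental range quoted for hα4β2.
for dg in (-14.49, -14.27):
    print(dg, kd_from_dg(dg))
```

Because the relation is logarithmic, a difference of roughly 1.4 kcal/mol at this temperature corresponds to a ten-fold change in affinity, which helps put the gap between computed and experimental values in context.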

The binding energies for KAB-18 bound to two hα4β2 models with mutations in the putative allosteric binding site were also assessed with the MM-PBSA method. KAB-18 was computed to bind slightly more weakly to the model with a T58K mutation on the β2 subunit, with a binding energy of -5.34 kcal/mol, a 0.91 kcal/mol difference from the wild-type binding energy. An F118L mutation on the β2 subunit resulted in a positive computed binding energy of 7.31 kcal/mol. Both of these in silico mutation experiments correspond with the functional data presented in Section 2.7.

Table 2.8. MM-PBSA binding energy calculations for epibatidine- and KAB-18-bound receptors.

                   hα4β2-WT            hα4β2-T58K       hα4β2-F118L      hα3β4-WT
epibatidine bindingᵃ
  ∆H               -33.28 (1.01)       --               --               -32.29 (0.96)
  -T∆S             15.82 (2.07)        --               --               17.37 (1.28)
  ∆G               -17.46 (2.32)       --               --               -14.91 (1.46)
  Expt. rangeᵇ     -14.49 to -14.27    --               --               -13.25 to -13.19
  distance 1ᶜ      2.86 (0.04)         --               --               2.83 (0.02)
  distance 2ᵈ      7.79 (0.30)         --               --               8.65 (0.27)
KAB-18 binding in presence of epibatidineᵉ
  ∆H               -28.27 (2.02)       -28.56 (2.69)    -21.30 (1.67)    -14.90 (0.98)
  -T∆S             22.02 (1.77)        23.22 (1.99)     28.61 (2.28)     26.15 (2.38)
  ∆G               -6.25 (2.86)        -5.34 (3.32)     7.31 (2.59)      11.25 (3.06)
  distance 1ᶜ      2.90 (0.06)         2.93 (0.07)      2.97 (0.11)      2.86 (0.03)
  distance 2ᵈ      11.83 (0.21)        13.76 (0.40)     16.84 (0.34)     13.61 (0.57)

Energies in kcal/mol; distances in Å.
ᵃ Binding at agonist binding site 2.
ᵇ Experimental binding affinities calculated from data reported by Parker et al. [120]
ᶜ Distance between the positively charged N atom in the bound epibatidine molecule and the backbone O atom of Trp148, quantifying epibatidine binding stability.
ᵈ Cα-Cα distance between α191 and β58 at the binding interface, quantifying C loop closure.
ᵉ Both compounds bound at agonist binding site 2.
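The entries of Table 2.8 are related by ∆G = ∆H + (-T∆S), and the effect of each mutation can be expressed as a difference in ∆G relative to the wild type. The sketch below recomputes these quantities from the tabulated values; the dictionary keys and function names are illustrative, and small discrepancies in the last digit are expected because the tabulated terms are rounded.

```python
# (∆H, -T∆S, ∆G) in kcal/mol, copied from Table 2.8.
TABLE_2_8 = {
    "epibatidine, hα4β2-WT": (-33.28, 15.82, -17.46),
    "epibatidine, hα3β4-WT": (-32.29, 17.37, -14.91),
    "KAB-18, hα4β2-WT": (-28.27, 22.02, -6.25),
    "KAB-18, hα4β2-T58K": (-28.56, 23.22, -5.34),
    "KAB-18, hα4β2-F118L": (-21.30, 28.61, 7.31),
    "KAB-18, hα3β4-WT": (-14.90, 26.15, 11.25),
}


def binding_free_energy(delta_h, minus_t_delta_s):
    """∆G as the sum of the enthalpic term and the -T∆S entropic term."""
    return delta_h + minus_t_delta_s


def relative_to(reference, other):
    """∆∆G of `other` relative to `reference`; positive means weaker binding."""
    return TABLE_2_8[other][2] - TABLE_2_8[reference][2]


for label, (dh, mtds, dg) in TABLE_2_8.items():
    print(f"{label}: recomputed {binding_free_energy(dh, mtds):.2f}, reported {dg:.2f}")
print(f"T58K vs WT: {relative_to('KAB-18, hα4β2-WT', 'KAB-18, hα4β2-T58K'):.2f}")
```

Laying the terms out this way makes it clear that the loss of KAB-18 binding in the F118L and hα3β4 models arises from both a less favorable ∆H and a larger entropic penalty.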


Figure 2.13. Convergence of MM-PBSA calculations. Average free energies of binding as a function of sampling period for A. epibatidine binding to hα4β2 model B. epibatidine binding to hα3β4 model C. KAB-18 binding to epibatidine-bound hα4β2 model D. KAB-18 binding to epibatidine-bound hα4β2 T58K model E. KAB-18 binding to epibatidine-bound hα4β2 F118L model F. KAB-18 binding to epibatidine-bound hα3β4 model. Calculated energies are presented as averages starting at time 0.
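The convergence measures used in this section (running averages from time 0, sliding averages over 200-point windows, and the change in the running average between the first 1400 and first 1500 data points) can be sketched as follows. This is an illustration of the bookkeeping only; the function names are hypothetical, and reading the reported standard deviation as the spread of the sliding-window averages is an assumption.

```python
import statistics


def cumulative_means(values):
    """Average of the first k values, for k = 1 .. len(values)."""
    means, running = [], 0.0
    for k, value in enumerate(values, start=1):
        running += value
        means.append(running / k)
    return means


def sliding_means(values, window=200):
    """Average over each consecutive window of the given size."""
    if window > len(values):
        raise ValueError("window is longer than the series")
    running = sum(values[:window])
    means = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        means.append(running / window)
    return means


def convergence_shift(values, n_short=1400, n_long=1500):
    """Absolute change in the running average between two sample counts."""
    return abs(statistics.fmean(values[:n_long]) - statistics.fmean(values[:n_short]))
```

Given a list of 1500 per-snapshot binding energies, `statistics.stdev(sliding_means(energies))` would give a spread for the windowed averages, and `convergence_shift(energies)` the quantity compared against the 0.16 kcal/mol figure quoted in the text.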

2.9 Mechanism of Allosteric Antagonism

As made apparent in numerous crystal structures of AChBP bound to various

ligands, C loop dynamics are an important aspect of ligand binding. When bound

to small agonists such as nicotine, acetylcholine, or epibatidine, the C loop takes

on a ‘closed’ or capped conformation, while competitive antagonists, which are

much larger than agonists, force the C loop into a more ‘open’ conformation

[102]. To track C loop dynamics in the MD simulations performed here, the Cα-

Cα distance was measured between αCys191 on the tip of the C loop and β58

(β2Thr58 / β4Lys58) on the opposite side of the interface as illustrated in Figure

2.14. These corresponding distances for 22 different AChBP crystal structures were measured (Table 2.9) and have been used to create generalized Cα-Cα ranges of C loop ‘openness’ for agonist, partial agonist, and antagonist binding in addition to unbound states which are grouped with the non-peptidic antagonists.

The general range for agonist binding, based on four X-ray structures, is 7.72 –

8.19 Å, compared to the unbound state which has a range of 15.36 – 15.72 Å based on two structures (Table 2.10).


Figure 2.14. C loop closure of AChBP bound to various ligands. Superposition of four crystal structures of AChBP in complex with various compounds to illustrate the difference in intersubunit distances between Cα of residue C191 of the α subunit C loop on the (+) side of the binding interface and Cα of residue 58 of the β subunit β2 strand on the (-) side of the interface. Only epibatidine is shown (pink surface) for clarity to highlight the ligand binding site. The tabulated Cα-Cα distances allow for quantification of the degree of C loop closure upon ligand binding.

Table 2.9. Survey of C loop closure for AChBP X-ray structures.

PDB ID    Structure         Compound name               Measurement of C      Compound type
          resolution (Å)                                loop closure (Å)ᵃ
2WNL      2.70              anabaseine                  7.72                  agonist
2BYQ      3.40              epibatidine                 7.80                  agonist
1UW6      2.20              nicotine                    7.93                  agonist
2BJ0      2.00              CXS                         8.10                  buffer
1UV6      2.50              carbamylcholine             8.16                  agonist
2BYS      2.05                                          8.19                  agonist
2BR7      3.00              HEPES                       8.33                  buffer
1I9B      2.70              HEPES                       9.30                  buffer
1UX2      2.20              HEPES                       9.34                  buffer
2WNJ      1.80              DMXBA                       9.75                  partial agonist
2WNC      2.20                                          10.13                 partial agonist
2WN9      1.75              4-OH-DMXBA                  12.30                 partial agonist
2X00      2.40              gymnodimine A               12.88                 antagonist
2BYN      2.02              PEG                         14.71                 buffer
2BYR      2.45              methyllycaconitine          14.64                 antagonist
2BG9      4.00              -                           15.36                 -
2W8E      1.90              -                           15.72                 -
2WZY      2.51              13-desmethyl spirolidine    16.05                 antagonist
1YI5      4.20                                          17.50                 peptidic antagonist
2BR8      2.40              α- PNIA                     18.76                 peptidic antagonist
2C9T      2.25              α-conotoxin IMI             19.13                 peptidic antagonist
2BYP      2.07              α-conotoxin IMI             19.24                 peptidic antagonist

ᵃ Average Cα-Cα distance between residues that correspond to C191 on the C loop of nAChR α subunits on the (+) side of the binding interface and residue 58 on the β2 strand of β subunits on the (-) side of the interface.

Table 2.10. General ranges for C loop "openness" upon binding ligands of different pharmacological function.

Ligand type               Average Cα-Cα range (Å)ᵃ
agonist                   7.72-8.19
partial agonist           9.75-12.30
antagonist / unbound      12.88-16.05
peptidic antagonist       17.50-19.24

ᵃ Average Cα-Cα distance between residues that correspond to C191 on the C loop of nAChR α subunits on the (+) side of the binding interface and residue 58 on the β2 strand of β subunits on the (-) side of the interface
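The ranges in Table 2.10 can be used as a simple lookup when interpreting a Cα-Cα distance measured in a simulation. The sketch below encodes the four ranges exactly as tabulated and reports distances that fall in the gaps between them separately; the function name and the label used for such gaps are illustrative choices.

```python
# (label, lower bound, upper bound) in Å, copied from Table 2.10.
C_LOOP_RANGES = [
    ("agonist", 7.72, 8.19),
    ("partial agonist", 9.75, 12.30),
    ("antagonist / unbound", 12.88, 16.05),
    ("peptidic antagonist", 17.50, 19.24),
]


def classify_c_loop(distance):
    """Return the Table 2.10 category whose range contains the distance."""
    for label, lower, upper in C_LOOP_RANGES:
        if lower <= distance <= upper:
            return label
    return "outside tabulated ranges"


for d in (7.80, 9.75, 14.64, 19.13, 8.94):
    print(d, classify_c_loop(d))
```

Since the ranges are derived from a small set of crystal structures, distances that land just outside a range are better treated as borderline than as unclassifiable.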

The average Cα-Cα distances from the 5 ns MD simulations of unbound agonist

binding sites (apo binding sites 1 & 2, binary complex binding site 1) all had

values between the partial agonist and unbound ranges defined in Table 2.10,

implying more “open” C loops. An exception was observed for agonist binding to

site 1 of the apo hα4β2 receptor. Here, the C loop is closed in the absence of

agonist, which may represent the closed unbound state observed by

Mukhtasimova et al. [121]. Upon agonist binding, the measured Cα-Cα distances

decreased to values in the ranges measured for agonist-bound and partial

agonist-bound receptors, consistent with structural data that implicates agonists

causing C loop closure to initiate channel opening [102]. In the bound states, the

low standard deviations indicate relatively stable C loop conformations; the

standard deviations for the time-averaged Cα-Cα distances are greater in the

unbound states.

Table 2.11. Measurements of C loop closure for MD simulations of epibatidine-bound nAChRs.

average Cα-Cα distance (Å)ᵃ
                       hα4β2                                hα3β4
                       agonist binding    agonist binding   agonist binding    agonist binding
                       site 1             site 2            site 1             site 2
apo                    8.94 (1.57)        15.08 (1.73)      12.68 (2.12)       15.06 (2.25)
binary complexᵇ        7.78 (0.46)        14.32 (3.00)      8.69 (0.48)        12.99 (1.60)
ternary complexᶜ       8.02 (0.45)        11.62 (1.79)      10.62 (1.35)       7.88 (0.38)

                       hα4β2                                hα3β4
                       average            minimum           average            minimum
                       distance (Å)       distance (Å)      distance (Å)       distance (Å)
epibatidine /          12.97 (1.49)       10.58             18.97 (2.95)       11.55
KAB-18 complexᵈ

ᵃ Same Cα-Cα measurement as defined in Table 2.9.
ᵇ Single epibatidine molecule bound to agonist binding site 1.
ᶜ Epibatidine bound to both agonist binding sites.
ᵈ Compounds bound to agonist binding site 2.
Data averaged over 5 ns MD simulations with standard deviations in parentheses.

In our computational KAB-18 binding studies, the dynamics of the C loop show

that even though epibatidine is forming a stable hydrogen bond with the carbonyl

oxygen of Trp148, the C loop is obstructed from closing to an agonist-bound

state due to the presence of KAB-18. The minimum Cα-Cα distances in

simulations of epibatidine and KAB-18-bound hα4β2 and hα3β4 nAChRs were

10.58 and 11.55 Å respectively, while the average values over 5 ns of simulation were larger at 12.97 and 18.97 Å respectively. This indicates a possible mechanism of noncompetitive antagonism: inhibition of C loop closure that is required for the channel to open while not interfering with agonist binding.

Although this is a known mode of antagonism for competitive antagonists

[102,122], this is the first time a negative allosteric modulator has been suggested to act in this fashion.
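The average and minimum Cα-Cα distances discussed above are straightforward to compute once the two Cα coordinates have been extracted from each snapshot. The sketch below assumes the coordinates are already available as (x, y, z) tuples and uses the sample standard deviation; the input layout, function names, and example coordinates are all hypothetical.

```python
import math
import statistics


def c_loop_distances(frames):
    """Cα-Cα distance (Å) per snapshot.

    Each frame is a pair of (x, y, z) tuples: the Cα of the C loop residue
    and the Cα of residue 58 across the interface.
    """
    return [math.dist(ca_c_loop, ca_res58) for ca_c_loop, ca_res58 in frames]


def summarize(distances):
    """Average, sample standard deviation, and minimum of a distance series."""
    return {
        "average": statistics.fmean(distances),
        "sd": statistics.stdev(distances),
        "minimum": min(distances),
    }


# Hypothetical coordinates for three snapshots, for illustration only.
frames = [
    ((0.0, 0.0, 0.0), (12.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 13.0, 0.0)),
    ((0.0, 0.0, 0.0), (0.0, 0.0, 14.0)),
]
print(summarize(c_loop_distances(frames)))
```

Reporting the minimum alongside the average is what allows the argument above: even the closest approach of the C loop in the KAB-18-bound simulations stays outside the agonist-bound range.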

Furthermore, superposition of the X-ray structure of AChBP in complex with the

α7 nAChR partial agonist, 3-(2,4-dimethoxybenzylidene)-anabaseine (DMXBA)

[123], to an MD snapshot of our equilibrated epibatidine- and KAB-18-bound

hα4β2 nAChR complex, reveals interesting similarities in ligand binding (Figure

2.15). The anabaseine portion of DMXBA superimposes well with the epibatidine

molecule bound in the nAChR model, while the dimethoxybenzylidene moiety of

DMXBA branches towards the (-) surface of the subunit interface to the same

region occupied by KAB-18 in the nAChR model. Anabaseine acts as a full α7

agonist, while the addition of the dimethoxybenzylidene group reduces the level of

efficacy, transforming the molecule into a partial agonist [123]. The experimental

Cα-Cα measurements of C loop closure for anabaseine average 7.72 Å in the

bound state while DMXBA measures 9.75 Å. KAB-18 seems to share some of the nAChR binding qualities that make DMXBA an antagonist; however, KAB-18 is able to more effectively prevent C loop closure while not competing for the agonist-binding site. These similar binding features coupled with varied degrees of C loop closure can provide some insight on what may differentiate a partial agonist from a full agonist or antagonist; pharmacological effects of a ligand binding at or near the orthosteric site are related to the degree to which the ligand induces or inhibits C loop closure.


Figure 2.15. Comparison of experimental DMXBA binding to computationally predicted KAB-18/epibatidine binding. The X-ray structure of DMXBA (orange) in complex with Aplysia californica AChBP (grey ribbon) superimposed on a hα4β2 nAChR ECD model (green and blue ribbon for α4 and β2 subunits respectively) bound to both epibatidine (purple) and the negative allosteric modulator KAB-18 (magenta). The C loops for each protein have been removed for clarity in the main figure, while the inset features the varied degree of C loop closure.

2.10 Conclusions

In conclusion, we have shown how a combination of homology modeling, molecular dynamics, and docking techniques can be used to identify the binding site of a ligand with little guiding experimental data. These techniques were specifically applied to locate and validate the binding site/mode of a class of nAChR negative allosteric modulators. Two mutations at the proposed binding site reduced the apparent affinity of KAB-18, a hα4β2-selective NAM, while not affecting the binding of a test agonist, competitive antagonist, or off-site antagonist, verifying the binding site. Finally, a survey of crystallographic structures and MD simulations of the KAB-18-bound receptor suggests that KAB-18 may act as an antagonist by preventing C-loop closure, while not affecting agonist binding.


Chapter 3. Experimental Investigation of Retinoic Acid Receptor Antagonism

3.1 Introduction

The transcription factor retinoic acid receptor (RAR) is activated by all-trans

retinoic acid (ATRA), its endogenous agonist. A major source of ATRA in the

body comes from symmetric cleavage of dietary β-carotene. Recently, it has

been discovered that asymmetric cleavage products of β-carotene, namely β-apo-14’-carotenoic acid and β-apo-13-carotenone, function as antagonists of retinoic acid receptor (RAR) gene transcription [124]. Although β-apo-14’-carotenoic acid remains a theoretical β-carotene metabolite, physiologically relevant concentrations of β-apo-13-carotenone have been identified in human plasma samples.

In the following two chapters, two major questions are addressed:

1.) What is the basis of retinoic acid receptor antagonism for β-apo-13-carotenone and β-apo-14’-carotenoic acid?

2.) What is the origin of the strong binding affinity observed between β-apo-13-carotenone and the retinoic acid receptor?

This chapter discusses the experimental work performed to address these questions, while Chapter 4 contains the computational results that complement the findings presented here.

3.2 Nuclear Receptor Background

Nuclear receptors (NRs) are metazoan, ligand-activated transcription factors responsible for the expression of gene programs related to nearly all aspects of life, including development, cell differentiation, immune response, reproduction, metabolism, and homeostasis. Unlike membrane-bound receptors, which may induce gene transcription by initializing intracellular signaling pathways, NRs are intracellular proteins that directly bind to genomic DNA at sites known as hormone response elements (HREs). Based on a genome-wide search for conserved NR domains, it has been determined that there are 48 human NRs

[125]. Of these 48 receptors, about half have known ligands, while the remainder are currently classified as “orphan” receptors. Interestingly, the number of NRs found in other model organisms with sequenced genomes has been found to vary dramatically. The fruit fly, Drosophila melanogaster, possesses only 21 NRs, while the nematode, Caenorhabditis elegans, has ~270 predicted NR genes

[126]. Retinoic acid receptor (RAR), the particular NR that is the focus of this study, was first cloned in 1987 [127,128], two years after the first human NR, the glucocorticoid receptor, was cloned [129].

3.2.1 NRs Are Ligand-Activated Transcription Factors

As with most transcription factors, the ability of an NR to bind a target DNA sequence

does not by itself allow the receptor to mediate transcriptional activity. Most transcription

factors are modular in structure and require one domain to bind DNA, while

another interacts with transcriptional machinery (RNA polymerase) or chromatin

remodelers that in turn interact with RNA polymerase. In the case of NRs,

agonist binding promotes the recruitment of proteins called steroid receptor

coactivators (SRCs) via a C-terminal region called the activation function 2 (AF2).

Although NRs also contain a ligand-independent AF1 region in the unstructured

N-terminal domain, it is the ligand-activated AF2 region that is a defining characteristic of NRs.

3.2.1.1 NR Ligands

Most characterized NR ligands are small, hydrophobic molecules such as steroid

hormones, thyroid hormone, vitamins, or metabolic intermediates. Some of the

most well characterized NRs respond to hormones including thyroid hormones

and steroids such as estradiol, progesterone, testosterone, cortisol, and

aldosterone. Other NRs respond to nutrients such as vitamin D and vitamin A or

metabolic intermediates. Table 3.1 lists the common NRs and their known

endogenous ligands. The receptors in this table are divided into three functional

classes. In the absence of ligand, class I NRs, which primarily bind steroid

hormones, remain in the cytosol bound to heat shock proteins (HSPs). Agonist

binding causes release from the HSPs, homodimerization, and translocation into

the nucleus where the receptors preferentially bind specific HREs. Class II NRs, on the other hand, are always located in the nucleus where they are often bound to DNA. In the absence of agonist, class II NRs may be bound to corepressor proteins that actively silence surrounding genes. Upon agonist binding, corepressors dissociate from the NR and coactivators are recruited to initiate transcription. These NRs typically form heterodimers with retinoid X receptors

(RXRs), while RXR itself may form homodimers or tetramers [130]. Finally, as in most biological classification systems, there are exceptions. While most NRs function as homo- or heterodimers, some have been found to act as monomers.

Although the two examples of this class presented in Table 3.1, nerve growth factor IB-like receptors (NGFIBs) and Rev-Erb receptors, have been observed to form dimers, each may also bind as monomers due to enhanced affinity for the core recognition DNA motif.

Table 3.1. List of common NRs and their known endogenous ligands.

Class      Full name                                     Abbreviation    Subtypes   Endogenous ligand
Class I    estrogen receptor                             ER              α, β       17β-estradiol, estrone, estriol
           progesterone receptor                         PR              --         progesterone
           androgen receptor                             AR              --         testosterone and dihydrotestosterone
           glucocorticoid receptor                       GR              --         cortisol
           mineralocorticoid receptor                    MR              --         aldosterone
Class II   thyroid hormone receptor                      TR              α, β       thyroid hormones, T3 and T4
           vitamin D receptor                            VDR             --         vitamin D
           retinoic acid receptor                        RAR             α, β, γ    all-trans retinoic acid (ATRA) or 9-cis-RA
           retinoid X receptor                           RXR             α, β, γ    9-cis-RA
           peroxisome proliferator-activated receptor    PPAR            α, β, γ    fatty acid metabolites
           liver X receptor                              LXR             α, β       oxysterols
           farnesoid X receptor                          FXR             --         bile acids
           pregnane X receptor                           PXR             --         xenobiotics
Class III  nerve growth factor IB-like                   NGFIB, Nur77    --         --
           Rev-Erb                                       Rev-Erb         α, β       heme

3.2.1.2 NR Corepressors

In the absence of a bound ligand, NRs that reside in the nucleus are often bound

to corepressors. These proteins, such as the nuclear receptor corepressor 1

(N-CoR1) [131] or the silencing mediator for retinoid or thyroid-hormone receptors

(SMRT, also known as N-CoR2) [132], recruit histone deacetylases (HDACs).

HDACs actively remodel chromatin by removing acetyl groups from histone lysines, increasing the formal charge of the side chains and strengthening the interaction with the negatively charged DNA backbone. Ultimately, this leads to

transcriptional silencing. In addition to corepressors binding to apo receptors, a subset of NR antagonists, called inverse agonists, can promote the recruitment

of corepressors.

Most NRs interact with corepressors very weakly; however, unliganded thyroid

hormone receptor (TR) and RAR exhibit strong repression of basal transcription

in the presence of corepressor proteins [131]. Additionally, LXR [133] and

Rev-Erb [134,135] have also been shown to interact with corepressors. Rev-Erb can

actually only act as a transcriptional repressor since it lacks the C-terminal region

that is responsible for the ligand-activated AF2 functionality. The AF2 region,

which maps to helix 12 of the NR ligand-binding domain (LBD) as described in

further detail in Section 3.2.3.4, undergoes a conformational change upon ligand

binding which has the dual effect of displacing corepressor proteins while forming

a binding pocket for coactivators.

A schematic of the functional N-CoR1 and SMRT corepressor domains is

illustrated in Figure 3.1. Here we see that there are three independent repression

domains (RD1-RD3) that are responsible for either directly recruiting HDACs or

binding to bridging proteins such as mSin3, which then recruit HDACs (HDAC1 in the case of mSin3) [136]. In spite of the large number of deacetylases found to associate with the N-CoR/SMRT corepressors, HDAC3 seems to be responsible for repressive activity [137]. Interestingly, although recombinant HDAC3 is inactive in deacetylase assays, SMRT-bound HDAC3 does display deacetylase activity. Therefore, RD2 has also been called a deacetylase activating domain

(DAD) [138]. At the C-terminus of the corepressor is the receptor interaction domain (RID) that contains two or three (I/L)XX(V/I)I interaction motifs (N-CoR1 contains three interaction motifs). Although the ~270 kDa N-CoR1 and SMRT corepressors are closely related, they do not seem to be redundant proteins, since knock-out of N-CoR1 was found to be lethal in mouse embryos [139].

Figure 3.1. Domain organization of N-CoR1 and SMRT corepressor proteins. Three repressive domains (RD1-RD3) either directly recruit histone deacetylase proteins (HDACs) or bind to proteins such as mSin3 which then binds to an HDAC. The nuclear receptor interaction domain (RID) interacts with the NR via two or three (I/L)XX(V/I)I motifs found at the C-terminal end of the corepressor.

3.2.1.3 NR Coactivators

The ligand-activated transcriptional response mediated by NRs is reliant upon

association with additional proteins in a ligand-dependent fashion. These

proteins are called coactivators and are responsible for multiple functions that

lead to downstream transcription. In particular, coactivators are associated with chromatin remodeling and recruitment of the basal transcription machinery.

Nuclear receptor coactivators are known by many names due to parallel discoveries. The first coactivators characterized belong to the p160 family of proteins. In particular, steroid receptor coactivator-1 (SRC-1) was first reported in

1995 through the use of a yeast two-hybrid screen of a human B-lymphocyte

cDNA library that used the hinge and LBD of hPR as bait [140]. In mice, a protein

homologous to SRC-1 was discovered and called nuclear receptor coactivator-1

(NCoA-1) [141]. Ultimately, the p160 family of coactivators was determined to be composed of three members. In this document, they will be referred to as steroid

receptor coactivators (SRCs), although the NCoA notation is perhaps the most

generalized name as SRCs bind to all nuclear receptors, not just those activated

by steroid hormones. It should be pointed out that SRC-2 is commonly known as

glucocorticoid receptor-interacting protein-1 (GRIP-1) or transcriptional mediators/intermediary factor 2 (TIF2), while SRC-3 has been called many names including p300/CBP/co-integrator-associated protein (p/CIP), ACTR [142],

activated in breast 1 (AIB1), receptor associated coactivator 3 (RAC3), and

thyroid hormone receptor activated molecule-1 (TRAM-1) [141].

The three p160 members share a similar domain layout as illustrated in Figure

3.2, acting as a platform for multiple protein interactions in addition to exhibiting

intrinsic histone acetyltransferase (HAT) activity [142]. The N-terminal region

contains a basic helix-loop-helix (bHLH) and Per-Arnt-Sim homology (PAS)

domain indicating dimerization and signaling abilities respectively. The central

region contains the nuclear receptor interaction domain (RID) and the C-terminal

region contains two autonomous transactivation domains (AD1 and AD2), each

with the ability to induce transcription. Both AD regions are responsible for

binding to secondary coactivator proteins: AD1 has been mapped to the residues

responsible for binding CREB-binding protein (CBP) [143], while AD2 binds to

coactivator-associated arginine methyltransferase (CARM1, also known as

protein arginine N-methyltransferase 4, PRMT4) [144]. CBP is closely related to

another coactivator, p300, and serves as a general integrator of transcriptional

signals.

Figure 3.2. Domain organization of p160 family of coactivators. Domains identified in all three members of the p160 family of coactivators include N-terminal basic helix-loop-helix (bHLH) and Per-Arnt-Sim homology (PAS) domains, a central nuclear receptor interaction domain (RID), and C-terminal transactivation domains (AD1 and AD2). AD1 binds the cAMP response element binding protein (CREB) binding protein (CBP) while AD2 binds to coactivator-associated arginine methyltransferase 1 (CARM1/PRMT4). Histone acetyltransferase (HAT) activity has also been mapped to the C-terminal region. The RID contains three LXXLL motifs called NR-box 1-3. Additionally, SRC-1 contains a fourth LXXLL motif at the extreme C-terminus.

Coactivators specifically interact with NRs via LXXLL motifs (L=leucine, X=any amino acid) that are found in a variety of proteins, including the p160/SRC and p300/CBP coactivator families. Additional coactivators such as receptor-interacting protein 140 (RIP-140), transcription intermediary factor 1 (TIF-1), and

TRIP-1 proteins also contain the interaction motif [145]. The central RID of the

SRC proteins contains three LXXLL motifs, while SRC-1 contains a fourth motif at its extreme C-terminus. The LXXLL motifs have been called NR boxes, resulting in abbreviations such as ‘SRC-1 NR2’ to describe the second NR box motif of the SRC-1 coactivator. Studies have found that the LXXLL motifs alone are not sufficient for receptor binding, while peptides extended to include flanking regions of the motifs are more potent binders. For example, 13- and 14-residue peptides of SRC-2 NR2 and NR3, respectively, have been shown to bind TRβ

[146]. In addition to increasing binding affinity, the flanking regions of the LXXLL motifs confer some level of specificity for the different NRs.
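The motif notation used in this section maps directly onto a regular expression. As a minimal sketch (the function name and the toy peptide are illustrative assumptions and do not correspond to any real coactivator or corepressor sequence), the coactivator LXXLL motif and the corepressor (I/L)XX(V/I)I motif described in Section 3.2.1.2 could be located in a protein sequence as follows:

```python
import re

# Coactivator NR box: L-x-x-L-L, where x is any residue.
# The lookahead makes overlapping hits visible as well.
NR_BOX = re.compile(r"(?=(L..LL))")

# Corepressor interaction motif as written in the text: (I/L)-x-x-(V/I)-I.
CORNR_BOX = re.compile(r"(?=([IL]..[VI]I))")


def find_motifs(sequence, pattern):
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy peptide used only to exercise the patterns.
toy_peptide = "AAALAALLGGGIAAVIGGLSSLLAA"

nr_hits = find_motifs(toy_peptide, NR_BOX)
cornr_hits = find_motifs(toy_peptide, CORNR_BOX)
print("LXXLL hits:", nr_hits)
print("(I/L)XX(V/I)I hits:", cornr_hits)
```

A pattern match of this kind only flags candidate motifs. Because the flanking residues contribute to both affinity and receptor specificity, a hit is not by itself evidence of receptor binding.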

The AD1 and AD2 activities of the p160 coactivators have been linked to the chromatin modification capabilities of p300/CBP and CARM1. Like p160, p300/CBP contains intrinsic HAT activity [147,148]. Additionally, p300/CBP further recruits p300/CBP-associated factor (P/CAF) that also exhibits HAT activity [149]. While CBP and its associated factors can acetylate histone lysine residues, CARM1 can asymmetrically methylate select histone arginine side chains. CBP and CARM1 act synergistically to enhance transcriptional activity since the presence of one or the other does not result in the same amount of transcriptional activity observed when both factors are present. Additionally,

CARM1 has been found to more efficiently methylate nucleosomes that have already been acetylated [150]. Although CARM1 and CBP are generally considered secondary coactivators for NRs, CBP/p300 also contains three

LXXLL motifs and can bind directly to NRs [151]. However, this interaction results in 50-100-fold less β-galactosidase activity than that observed for the p160 coactivators in a yeast two-hybrid assay, suggesting that the p160 coactivators are important as a platform for CBP binding [152,153].

CBP was initially identified as a coactivator of the CREB transcription factor. In addition to directly interacting with CREB and with NRs through the p160 coactivators, CBP interacts with AP-1 transcription factors (c-Jun, c-Fos), c-Myc, v-Myb, p53, Stat-1 and NF-κB. Thus, CBP is a general coactivator protein responsible for the transcriptional activity of several classes of transcription factors. Interestingly, in addition to methylating histone H3, CARM1 can methylate CBP at R600 of its KIX domain, disrupting its ability to interact with the

KID domain of CREB, thereby impairing cAMP-induced transcription. This molecular switch is thought to allow for cross-talk between the NR and cAMP signaling pathways and allow for proper distribution of the limited number of copies of nuclear CBP [150].

3.2.2 RARs Function as Heterodimers with RXRs

Like most other class II NRs, retinoic acid receptors (RARs) heterodimerize with

RXRs to form functional receptors. While the ligand-binding domain (LBD) serves as the primary platform for dimerization, a major functional consequence of

dimerization is the ability to bind to the proper DNA sites upstream of the genes

under the control of the particular NR.

3.2.2.1 Response Elements

The DNA sequences recognized by NRs are called hormone response elements

(HREs), or simply response elements (REs), and are typically found in the

promoter region of downstream genes. Two six base pair consensus sequences

have been determined: AGG/TTCA is preferentially recognized by ER, AR, GR,

and MR, and AGAACA is recognized by all other NRs. While most NRs will bind

the same response elements, variations of these sequences lead to greater

specificity for certain NRs. Additional RE specificity for most NRs is due to dimerization. In the context of dimerization, a recognition sequence is called a

‘half-site’ and the orientation of two half-sites can lead to more specific binding.

For example, half-sites may be oriented as palindromes, inverted palindromes, or direct repeats. Homodimers, such as those recognizing the AGG/TTCA half-site, have REs made up of palindromic half-sites, while most others recognize direct repeats. In addition to orientation, spacing between half-sites may also confer greater binding affinity between a specific NR and its corresponding RE. For example, the RAR-RXR dimer (in which the RAR is binding the upstream element) will preferentially bind direct repeats with a two base pair spacing

(DR2), while the RXR-TR dimer will preferentially bind DR4 REs [130].
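The direct-repeat nomenclature (DRn, two half-sites in the same orientation separated by n base pairs) can be made concrete with a short search routine. The following is only a sketch under stated assumptions: it scans a single strand, uses the AGG/TTCA consensus quoted above as the half-site, and the function name and the two toy sequences are hypothetical. Palindromic and inverted arrangements would additionally require the reverse complement and are not handled here.

```python
import re

# Consensus half-site AGG/TTCA from the text, i.e. AGGTCA or AGTTCA.
HALF_SITE = "AG[GT]TCA"


def find_direct_repeats(dna, spacer):
    """Return 0-based start indices of two same-strand half-sites
    separated by exactly `spacer` bases (a DR<spacer> element)."""
    pattern = re.compile(rf"(?=({HALF_SITE}[ACGT]{{{spacer}}}{HALF_SITE}))")
    return [m.start() for m in pattern.finditer(dna.upper())]


# Hypothetical toy sequences, not taken from any real promoter.
dr2_example = "CCAGGTCAGGAGTTCACC"    # half-site, 2 bp spacer, half-site
dr4_example = "TTAGGTCACTGAAGGTCATT"  # half-site, 4 bp spacer, half-site

print("DR2 sites:", find_direct_repeats(dr2_example, 2))
print("DR4 sites:", find_direct_repeats(dr4_example, 4))
```

Scanning the same sequence with several spacer lengths gives a quick, if crude, indication of which dimer a candidate element would be expected to favor according to the spacing preferences described above.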

Finally, it has been shown that NRs that bind DNA as monomers are able to bind with sufficient affinity due to additional contacts made outside of the six base pair half-site. In the case of NGFIB, the recognition sequence is extended by two bases upstream from the half-site with adenines at the -1 and -2 positions:

AAAGGTCA [154]. While the standard six base half-site is recognized in the

major groove of the DNA, the additional contacts made by NGFIB are in the

minor groove. Overall, the half-site consensus sequences are simplified models

for NR/DNA recognition. Based on what has been observed for NGFIB, it is likely

that DNA contacts outside of the traditionally-defined HREs contribute to the

specificity for other NRs to promote the transcription of particular genes.

3.2.2.2 NR Dimerization

Although dimerization can be seen as being most important for recognition and

binding of repeating REs, it is the ligand-binding domain (LBD), not the

DNA-binding domain (DBD), that serves as the primary platform for dimerization. In the

absence of the LBD, high-affinity RE binding is lost [155,156]. Therefore, it is too simplistic to think of the LBD and DBD as independent domains; the domains of

a nuclear receptor work together to specifically promote the transcription of particular genes.

3.2.3 NR Structure

Numerous crystal structures of nuclear receptor (NR) ligand binding domains

(LBDs) have revealed the large conformational changes that occur upon ligand

binding. Prior to these structures, NRs were known to contain two activation

functions (AFs). The N-terminus contains the ligand-independent AF-1 for which no structural data has been reported. At the C-terminus is the ligand-dependent

AF-2 that crystallographic studies have since mapped to α-helix 12 (H12) of the

LBD.

3.2.3.1 Modular Structure

The overall structure of a NR is modular with six different regions as illustrated in

Figure 3.3. The DNA-binding domain (DBD) and ligand-binding domain (LBD) of the receptors are the most conserved, particularly the DBD, and serve as signatures when identifying NRs in newly sequenced genomes. The other regions vary in their length and sequence. The A/B region allows ligand-independent transcription, while the function of the F region, which is completely absent in some NRs, has not been established. Region D, the linker between the structurally conserved DBD and LBD, allows for communication between the two domains during dimerization in addition to granting flexibility to the two DBDs of a

NR dimer to bind the half-site repeats of HREs with different orientations or spacing [155,157].

[Figure 3.3 schematic: regions in N- to C-terminal order: A/B, C (DBD), D, E (LBD), F]

Figure 3.3. Domain organization of typical nuclear receptor. The DNA-binding domain (DBD) and ligand-binding domain (LBD), C and E regions, are the most conserved and share a common fold among all NRs.

3.2.3.2 A/B Region

The A/B region is the most variable in both sequence and length. The human

RXRα A/B region is 200 residues long, while that for the human hepatocyte nuclear factor 4 γ (hHNF4γ) is only 44 residues long. Little is known about the structure of this region; it is considered an intrinsically disordered domain that can become more ordered upon interaction with a binding partner. This phenomenon has been observed in the case of ERα and GR interacting with

TATA-binding protein (TBP) [158,159]. Interestingly, the A/B region contains one

of two activation functions, AF1 and AF2, found on a typical nuclear receptor.

While AF2 is ligand-dependent, AF1 can be responsible for gene transcription in

the absence of an agonist molecule.

3.2.3.3 DNA-Binding Domain

The first structures of a DNA-binding domain (DBD) were the solution structure of

the glucocorticoid receptor in 1990 [160], followed by the crystal structure in 1991

(PDB ID: 1GLU [161]). The latter of these two structures was in complex with a

hormone response element (HRE), revealing the interaction between the DBD

and the major groove of the DNA. The highly conserved fold is composed of two

amphipathic α-helices that cross at about a 90° angle. Additionally, the DBD

includes two zinc fingers, each coordinating a Zn²⁺ ion with the side chains of

four cysteine residues (Figure 3.4A). The N-terminal zinc finger forms a

conventional treble clef motif, while the C-terminal zinc finger is a modified treble

clef motif [162]. As illustrated in Figure 3.4B, the first of the two helices inserts

into the major groove of the DNA, and is responsible for recognition of the HRE

sequence.


Figure 3.4. DNA-binding domain of estrogen receptor in complex with hormone response element. A. DBD monomer, highlighting its two zinc-binding motifs and two-helix composition. B. The DBD homodimer inserts into the major groove of the DNA (black and light grey), which consists of two palindromic half-sites separated by three base pairs. The DBDs are shown in ribbon representation with one monomer colored from red (N-terminus) to blue (C-terminus) and the second monomer colored grey. The structure illustrated here comes from PDB ID: 1HCQ [163].

3.2.3.4 Ligand-Binding Domain

The first published NR ligand-binding domain (LBD) crystal structure was that of the human RXRα (PDB ID: 1LBD [164]) in 1995. This structure revealed that the

LBD is composed of a novel fold consisting of 12 α-helices and a short β-hairpin.

The overall conformation has been described as an antiparallel α-helical sandwich in which helices H4, H5, H6, H8, and H9 are sandwiched between H1,

H2, and H3 on one side and H7, H10, and H11 on the other, as illustrated in

Figure 3.5A. This structure also revealed that the dimerization interface is largely

composed of residues on H10, including a highly conserved leucine residue

flanked by moderately conserved hydrophobic residues (Figure 3.5B).


Figure 3.5. Diagram of apo hRXRα LBD. A. Topology of monomer. B. Dimerization interface formed by H10. A leucine residue, highly conserved in all NRs, is shown in sphere representation.

In this structure, helix 12 (H12), which contains the activation function 2 (AF2)

region, is extended away from the core of the receptor. Subsequent

agonist-bound structures such as the hRARγ bound to ATRA (PDB ID: 2LBD, [165]), the

rat TRα1 bound to a T3 isostere (structure not available, [166]), and hERα bound

to 17β-estradiol (PDB ID: 1ERE, [167]), reveal that the tertiary structure of the

LBD is the same among different types of NR and that upon agonist binding, H12

undergoes a dramatic conformational change compared to the structure of apo hRXRα. In the agonist-bound state, the amphipathic H12 is closely associated

with the core of the receptor, in what has been called the ‘active’ conformation,

making contacts with H3, H4, H5, and the ligand, effectively sealing off one

apparent entry to the ligand-binding pocket. In addition to the reconfiguration of

H12 upon ligand binding, subsequent agonist-bound structures have shown that

H11 becomes continuous with H10, resulting in one less individual helix. In such

cases, the numbering of the secondary structures remains based on the apo

hRXRα structure, thus the C-terminal helix is still called H12 and not H11.

Based on the various crystal structures up to this point, a ‘mouse-trap

mechanism’ was proposed that suggested the dramatic conformational change in

H12 from extended to ‘closed’ was universal for ligand-induced NR

transcriptional activation [165,168]. However, subsequent crystal structures seem

to challenge this model. A particularly important report that helped clarify the

existing structural data was published by Nolte et al. in 1998 [169]. Here, two structures were reported: apo hPPARγ LBD (PDB ID: 1PRG) and rosiglitazone

(agonist)-bound hPPARγ LBD in complex with an 88 amino acid segment of the steroid receptor coactivator-1 (SRC-1) (PDB ID: 2PRG). Both structures were solved as homodimers. Superposition of the two monomers of the apo structure reveals asymmetry in the H12 position. In neither case is H12 extended in an

‘open’ conformation similar to the apo hRXRα LBD structure. Instead, H12 in one of the monomers takes on the ‘active’ conformation seen in the crystal structure of various agonist-bound LBDs, while the other H12 is still in a ‘closed’, yet slightly different conformation than the ‘active’ conformation as illustrated in

Figure 3.6. After superimposing the two structures with the ‘match’ command in

Chimera, the Cα RMSD between the two LBDs is 2.46 Å, while the Cα RMSD is reduced to 1.01 Å when the H11-H12 region is excluded from the calculation, highlighting that the two monomers share nearly identical conformations aside from the H11-H12 region.
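The effect of excluding a mobile region from a Cα RMSD calculation can be illustrated with a generic least-squares superposition. The sketch below uses the Kabsch algorithm in NumPy; it is not the implementation behind the Chimera ‘match’ command, the coordinates are synthetic, and it assumes that the superposition is recomputed on the retained atoms, which may differ from the procedure used for the values quoted above.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal
    rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)            # remove translation
    Qc = Q - Q.mean(axis=0)
    H = Pc.T @ Qc                      # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Synthetic "Calpha" coordinates: Q is a rotated and translated copy of P
# in which the last five atoms (a stand-in for a mobile region) are displaced.
rng = np.random.default_rng(0)
P = rng.normal(size=(30, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 3.0])
Q[-5:] += 4.0

keep = np.ones(len(P), dtype=bool)
keep[-5:] = False                      # exclude the mobile region

rmsd_all = kabsch_rmsd(P, Q)
rmsd_core = kabsch_rmsd(P[keep], Q[keep])
print(f"all atoms: {rmsd_all:.2f}  core only: {rmsd_core:.2e}")
```

In this toy case the core atoms are related by a pure rigid-body transformation, so the core-only RMSD falls to numerical zero while the all-atom value remains large, mirroring qualitatively the drop observed when the H11-H12 region is left out.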


Figure 3.6. Crystal structure of apo hPPARγ. A. Apo hPPARγ LBD homodimer (PDB ID: 1PRG). One monomer is colored blue, while the other is tan. B. Superimposed monomers from 1PRG (coloring same as in A) in addition to agonist-bound hRARγ LBD (PDB ID: 2LBD, pink) highlighting the similar overall structure, yet different H12 conformations.

Shortly after the publication of the apo hPPARγ by Nolte et al., another crystal

structure of apo hPPARγ LBD was published by a different group. This time, the

molecule was solved as a monomer with H12 in an agonist-bound conformation

(PDB ID: 3PRG, [170]). These two apo PPARγ structures provide conflicting data

for the ‘mouse-trap mechanism’ hypothesis since these unliganded structures

were exhibiting ‘closed’ or even ‘active’ H12 conformations.

3.2.3.4.1 Coactivator Interactions with the LBD

The structure of the hPPARγ dimer in complex with the 88-residue coactivator peptide (PDB ID: 2PRG) revealed the binding mode for the LXXLL motifs found in the receptor interaction domain (RID) of coactivator proteins [169]. The SRC

binding pocket, as shown in Figure 3.7, is partially created by H12, helping

explain the structural basis for the ligand-dependent transcription of NRs. Ligand

binding seems to bring together two of the most conserved regions of the NR

LBDs. In a sequence alignment of 18 NRs from nine different receptor types, one

region of the LBD was notably strongly conserved: the short span composed of

the C-terminal end of H3 through H4 contains six absolutely conserved residues

and six more weakly conserved sites (Figure 3.7C; see Appendix A for full

alignment). Since this region forms the largest part of the coactivator-binding

pocket, it seems that interactions at this site are evolutionarily conserved among

the superfamily of nuclear receptors. The other portion of the coactivator-binding

pocket is formed by H12. This amphipathic helix, which contains the AF-2 region,

has high similarity among the selected sequences in addition to one absolutely

conserved glutamate. As revealed by the crystal structure, this invariant

glutamate in conjunction with an absolutely conserved lysine at the end of H3

forms a ‘charge clamp’ interaction with the receptor interaction domain (RID) of

SRC-1. When mapped to the hRARα sequence, the charge clamp residues are

K244 and E412. Both of the LXXLL RID motifs form α-helices in which the backbone atoms of the N-terminal side of the helix interact with E412, while K244 interacts with the backbone of the C-terminal side of the helix (Figure 3.7B).

Therefore, in addition to the polar interaction between side chain and backbone atoms, the dipoles of the RID helices could help stabilize interaction with the receptor via the charge clamp residues. Similar interactions were observed soon

after in structures of both the hTRβ and hERα LBDs in complex with SRC-2 NR2 peptide [146,171].


Figure 3.7. Crystal structure of agonist-bound hPPARγ in complex with SRC-1 NR-box 2 peptide. A. hPPARγ monomer (PDB ID: 2PRG; chain B) bound to an agonist (van der Waals surface). The ribbon is colored from red (N-terminus) to blue (C-terminus), while the coactivator peptide is in magenta. B. Close-up view of peptide interactions with the receptor. Coloring is the same as in panel A, but the side chains of invariant residues are now shown. These include K244 of H3 and E412 of H12, which form the ‘charge clamp’ interaction. The leucine residues of the LXXLL motif of the peptide are shown as well as backbone atoms that interact with the charge clamp residues. Residue numbering is for hRARα. C. Sequence alignment of the regions forming the coactivator-binding pocket from 18 NRs representing nine different types of receptors. Note the string of absolutely conserved residues that form the H3-H4 loop. The residues forming the charge clamp are highlighted in magenta, strongly conserved residues in yellow, and more weakly conserved residues in blue. The arginine highlighted in green forms a salt bridge with the carboxylate terminus of most RAR ligands.


3.2.3.4.2 Antagonist-Induced NR LBD Conformations

As described in more detail in Section 3.2.4, antagonist activity can be classified

in several different ways. One major distinction can be made between partial and

full (or “pure”) antagonists. Whereas partial NR antagonists, more commonly called

partial agonists, promote some level of transcription above basal (unliganded)

levels, pure antagonists inhibit basal level transcription. Several crystal structures

have been solved of pure antagonists in complex with NR LBDs. These include hERα LBD bound to the selective estrogen receptor modulators (SERMs)

raloxifene (PDB ID: 1ERR, [167]) and tamoxifen (PDB ID: 3ERT, [171]) as well

as hRARα LBD bound to BMS614 (PDB ID: 1DKF, [172]). In each of these three

cases, the antagonists, which are larger than the agonists from which they were

derived, sterically inhibit H12 of the LBD from occupying the ‘active’

conformation, thereby directly preventing the formation of the coactivator binding

pocket. Furthermore, as shown in Figure 3.8, H12, which contains a degenerate

LXXLL motif itself, physically occupies the coactivator binding site.


Figure 3.8. Antagonist-induced H12 conformation. A. Structural comparison between an RAR agonist (ATRA) and antagonist (BMS614), highlighting in red the additional fragment that gives the molecule its antagonistic properties. B. ATRA-bound hRARα LBD conformation (PDB ID: 3A9E) bound to coactivator peptide (magenta). Helix 12 is colored orange. C. BMS614-bound hRARα LBD conformation (PDB ID: 1DKF) with H12 in orange. D. Overlay of panels B and C, highlighting that H12 in the antagonist-bound receptor occupies the coactivator-binding site. E. Superposition of BMS614-bound structure (rainbow ribbon) and ATRA-bound structure (light blue ribbon + magenta coactivator peptide). BMS614 is the grey molecule shown clashing with an isoleucine residue in H12 of the superimposed agonist-bound conformation.

Where the three antagonists described above act as pure antagonists, partial

agonists may induce yet another H12 conformation. In the crystal structure of the

hERβ LBD in complex with the partial agonist genistein (PDB ID: 1QKM, [173]),

H12 takes a conformation similar to, but slightly different from the pure antagonist

conformation. The helix occupies the coactivator binding site, yet at a different

angle than the H12 conformation induced by pure antagonists. Since genistein

does not have the bulky extension common to pure antagonists, the ligand does

not sterically preclude H12 from adopting the agonist conformation. Additional

insights on the way in which ligands can affect H12 positioning are provided by a pair of structures of 5,11-cis-diethyl-5,6,11,12-tetrahydrochrysene-2,8-diol (THC) bound to hERα and hERβ LBD (PDB IDs: 1L2I & 1L2J, [174]). While THC is a

hERα agonist, it acts as a pure hERβ antagonist in that it depresses transcription

below basal levels. In the THC-bound hERβ crystal structure, H12 is very similar

to the conformation induced by the ER partial agonist genistein. Like genistein,

THC does not possess a bulky side chain common to the pure antagonists

described above; therefore THC seems to act as a hERβ antagonist through a

different mechanism. This has been termed “passive antagonism” as opposed to

the “active antagonism” observed for pure antagonists [174]. Thus, one must

take care in interpreting the static conformations presented in crystal structures, since although both genistein and THC have been crystallized with the same ERβ H12 conformation that blocks the coactivator binding pocket, coactivators are able to bind genistein-bound ERβ with much greater affinity than THC-bound

ERβ. This has been demonstrated with fluorescence polarization studies of rhodamine-labeled coactivator peptides binding to ligand-saturated ERβ LBD,

where Kd_genistein = 104 nM, Kd_apo = 215 nM, and Kd_THC = 3.3 μM, suggesting that

THC stabilizes the crystallized conformation more so than genistein [174].
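To put the three dissociation constants on a common scale, the short sketch below converts them into fold-changes and approximate free energy differences relative to the apo receptor. The Kd values are those quoted from [174]; the temperature of 298.15 K and the function names are assumptions made only for this illustration, since the assay temperature is not stated here.

```python
import math

# Coactivator-peptide dissociation constants quoted in the text [174], in molar units.
KD_M = {"genistein": 104e-9, "apo": 215e-9, "THC": 3.3e-6}

R_KCAL = 1.987204e-3   # gas constant, kcal mol^-1 K^-1
TEMP_K = 298.15        # ASSUMED temperature; not given in the text


def fold_change(ligand, reference="apo"):
    """Ratio of peptide Kd values (ligand-bound receptor / reference receptor)."""
    return KD_M[ligand] / KD_M[reference]


def ddg_kcal(ligand, reference="apo", temp_k=TEMP_K):
    """Free energy change of peptide binding relative to the reference.

    Uses ddG = RT ln(Kd_ligand / Kd_reference). Positive values mean the
    peptide binds more weakly than it does to the reference receptor.
    """
    return R_KCAL * temp_k * math.log(fold_change(ligand, reference))


if __name__ == "__main__":
    for name in ("genistein", "THC"):
        print(f"{name}: {fold_change(name):.2f}-fold vs. apo, "
              f"ddG = {ddg_kcal(name):+.2f} kcal/mol")
```

Under these assumptions, THC weakens peptide binding roughly 15-fold relative to the apo receptor, whereas genistein strengthens it about two-fold, which is the asymmetry the passive antagonism argument relies on.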

While the structures of genistein and THC in complex with hERβ reveal a new

H12 conformation that can be stabilized by either a partial agonist or pure

antagonist, a structure of RXRαF318A in complex with oleic acid has revealed that a partial agonist may also induce the pure agonist-bound H12 conformation

[172]. Like genistein, oleic acid does not sterically inhibit the agonist-bound H12 conformation. Finally, in a study of RXR modulators, it has been shown how an agonist may be progressively transformed into a partial agonist and then a pure

antagonist by lengthening a side chain that interferes with the formation of the

agonist-bound H12 conformation. In this study, crystal structures of hRXRα in

complex with three different partial agonists were solved with H12 in the agonist-

bound conformation due to co-crystallization with a coactivator peptide containing

the LXXLL motif [175]. Fluorescence anisotropy studies of the RXRα conjugated

to a C-terminal fluorescein moiety revealed that while the agonist stabilized the

conformation of the C-terminal H12, the partial agonists decreased this stability.

However, the addition of a coactivator peptide was able to increase anisotropy in

the partial agonist-bound cases to levels observed with the agonist-bound

receptor [175]. Thus, while agonists induce a specific conformation that allows for

coactivator recruitment, partial agonists or antagonists have been shown to

stabilize alternate conformations. In the case of a partial agonist, the presence of

coactivators is able to shift the equilibrium towards the active form [172,175,176].

3.2.3.4.3 Corepressor Binding

As previously discussed in Section 3.2.1.2, in the absence of ligand, some NRs

including RAR and TR are commonly found in complex with corepressor proteins

that suppress gene transcription through HDAC activity. In certain cases, an

antagonist may strengthen the interaction between a NR and corepressor

protein. Such ligands may be specifically classified as inverse agonists. The first crystal structure of a NR LBD in complex with a corepressor peptide was that for

PPARα binding to a 22-residue SMRT NR2 motif (residues 2329-2358) (PDB ID:

1KKQ, [177]). As shown in Figure 3.9A, H12 in this structure is poorly structured and loosely packed against H3, a conformation not observed in any previous crystal structure. The corepressor fragment is situated in the coactivator-binding

pocket, yet with noticeable differences. The corepressor interacts with the LBD

as an amphipathic α-helix, like the coactivator, indicating how both coregulators

recognize the same groove. However, the SMRT NR2 helix is one turn longer

than the coactivator helix (three turns instead of two), illustrating how

discrimination between the two coregulators is controlled by the H12

conformation. In the agonist-bound case, H12 is positioned to form the charge

clamp interaction with the coactivator, which puts a strict two-helical turn limit on

the motif that may bind to the receptor. However, in an inverse agonist-bound

conformation, the charge clamp is disrupted, allowing for the longer corepressor

motif to bind.

More recently, in 2010, the structure of RARα LBD bound to an inverse agonist

(BMS493) and a fragment of the N-CoR1 NR1 provided insight on why certain apo NRs, such as RAR and TR, more effectively recruit corepressors than others. In this structure (PDB ID: 3KMZ, [178]), it was revealed that the N-CoR

NR1 motif forms a helix that is one turn longer than the SMRT NR2 motif that was previously co-crystallized with PPARα as described above. To allow for this longer helix to bind, H11 in RARα not only unfolds, but forms a short anti-parallel β-strand interaction with the corepressor (see Figure 3.9B). Thus, the ability of the RAR, and presumably TR, H11 region to switch to a β-strand allows it to interact more favorably with the corepressor through the longer NR1 motif. Not only does this allow more interactions between the longer helical motif and the receptor, but it also forms the β-strand interaction, which further stabilizes the NR-corepressor complex [178].

Figure 3.9. Corepressor interactions with inverse agonist-bound NR LBDs. A. PPARα LBD interacting with SMRT NR2 peptide (PDB ID: 1KKQ, [177]). PPARα is colored from red (N-terminus) to blue (C-terminus), the corepressor peptide is magenta, and the ligand is grey. Note that H11 remains intact, while H12 is disordered, yet resolved in the crystal structure. B. RARα LBD interacting with N-CoR1 NR1 peptide (PDB ID: 3KMZ, [178]). Coloring is the same as in Panel A. In this structure, H11 is extended, forming an anti-parallel β-strand interaction with the corepressor peptide. Also note that the NR1 motif forms a longer helix.

3.2.3.4.4 Evidence Against an Extended H12 Conformation

Structural knowledge of the interaction between the SRC LXXLL motifs and the

NR LBD allowed previous crystal structures to be interpreted in a new light. For example, the apo hRXRα LBD crystal structure showed that in the absence of

ligand, H12 was extended away from the core of the LBD. However, when

examining the crystallographic contacts made between monomers, it becomes

apparent that the extended H12 is making contact with a neighboring LBD.

Superimposing the coactivator peptide-bound hPPARγ structure with the apo hRXRα structure reveals that the amphipathic H12 is mimicking the LXXLL motif

interactions as shown in Figure 3.10.


Figure 3.10. Extended helix 12 of apo hRXRα LBD interacts with coactivator binding pocket of a neighboring molecule. A. Apo hRXRα LBD monomer with extended H12 conformation (PDB ID: 1LBD). B. Crystal packing of apo hRXRα. C. Two apo hRXRα LBDs interacting. D. Agonist-bound hRARα LBD (pink) in complex with coactivator peptide (green) (PDB ID: 3A9E). E. Overlay of structures from panels C and D. F. Close-up view of panel E, highlighting the extended hRXRα H12 interacting with its neighboring molecule at the coactivator-binding site.

This new insight casts doubt on whether the extended H12 conformation found in

the apo hRXRα LBD is a state sampled by the protein under physiological

conditions or if it is simply a crystallographic artifact. Crystal structures of other

NR LBDs have been solved with extended H12s. These include the 1998 structure of ERα LBD solved as a symmetric homodimer in which both H12s were extended away from the ligand-binding pocket (PDB ID: 1A52, [179]). A significant crystal-packing artifact, in which a pair of intermolecular disulfide bonds forms between neighboring hERα dimers, allows for the unusual extended

H12 conformation for the agonist-bound receptor. Like the original apo hRXRα structure, each extended H12 is interacting with the coactivator-binding site of a neighboring receptor. Additionally, in 2010, a structure of an agonist-bound hRARα/antagonist-bound mRXRα LBD was published in which the antagonist-bound mRXRα exhibited an extended H12 conformation while H12 of the hRARα

LBD was in the typical agonist-bound conformation (PDB ID: 3A9E, [180]). Again, the extended H12 in this structure interacted with the LXXLL binding site of a neighboring mRXRα molecule. Thus, these structures perhaps highlight the general flexibility of the H12 region; however, they do not support the idea that the extended conformation is the default state in the absence of ligand, since all of these structures seem to require stabilization through interactions with neighboring receptors. Even in the structure of the apo hPPARγ dimer where both H12s were in the ‘closed’ as opposed to extended state (PDB ID: 1PRG, Figure 3.6), the crystal contacts reveal that the monomer with the ‘active’ H12 conformation is stabilized by the H12 of the dimerization partner of a neighboring dimer. Also, the ‘active’ H12 conformation of the apo hPPARγ monomer from

PDB ID: 3PRG seems to be influenced by a neighboring molecule. However, this

appears to be the only apo LBD crystal structure in which H12 is not interacting

with the coactivator-binding site of a neighboring molecule. Instead, H12 of one

monomer is making crystallographic contacts with the C-terminus of H3 of an

adjacent monomer. Nevertheless, it seems that nearly all reports of apo H12

conformations are influenced by neighboring molecules in the crystal lattice.

3.2.4 NR Pharmacology

In terms of NR ligands, a small molecule must possess two qualities to be

considered biologically active. First it must bind to the receptor. Second, the

binding must have an effect on its transcriptional activity. The affinity that a

molecule has for a receptor is not associated with an ability to elicit or inhibit a

response. For example, (E)-4-[2-(5,6,7,8-tetrahydro-5,5,8,8-tetramethyl-2-naphthalenyl)-1-propenyl] benzoic acid (TTNPB) is a RAR ‘superagonist’. It binds to RARs with a 10-fold weaker affinity than ATRA, yet exhibits a 1000-fold greater potency [181], clearly indicating that binding affinity between ligand and receptor is not correlated with transcriptional activity. As discussed above, a NR ligand is better characterized by the H12 conformation that it stabilizes.

Some basic pharmacological terms are defined below as they relate to NRs.

These definitions will help clarify later discussions of experimental findings in an effort to categorize β-apo-13-carotenone and β-apo-14’-carotenoic acid.

Agonist: a molecule that induces a response. As applied to NRs, the response is gene transcription above basal levels.

Partial agonist: a molecule that enhances transcription above basal levels but not to the extent of the endogenous agonist. In the case of RARs, the endogenous ligand is ATRA.

‘Superagonist’: a molecule that enhances transcriptional activities beyond the

levels observed with the endogenous agonist.

Inverse agonist: a molecule that induces a response opposing the agonistic

response. In the context of NRs, an inverse agonist would recruit corepressor

proteins that actively repress gene transcription via histone deacetylase (HDAC)

activity. An inverse agonist may also be considered an antagonist as it inhibits

agonist activity.

Antagonist: an antagonist is defined as a molecule that inhibits agonist

responses. Partial agonists are also antagonists in that high levels may compete

with the endogenous agonist, resulting in less than maximal activity. An

antagonist may be further classified as being either ‘pure’ or ‘neutral’ if the

compound does not induce an effect opposing the agonistic response as seen in

inverse agonists.

3.2.5 β-Apocarotenoids Modulate RAR Activity

All-trans retinoic acid (ATRA), the acid form of vitamin A, is the endogenous

agonist for RARs. Plant-synthesized β-carotene is the dietary vitamin A source

for all animals; therefore pure carnivores must obtain their vitamin A from animal

stores that are often found in the form of retinol or retinyl esters. Typically, β-carotene is centrally cleaved by β,β-carotene-15,15’-oxygenase (BCO1) to form two equivalent retinal molecules, which are important chromophores that bind to opsins to enable color vision. Retinal may be enzymatically reduced to retinol, a storage form of vitamin A, or oxidized to retinoic acid.

It is possible that β-carotene is cleaved at sites other than the central 15-15’

double bond. For example, β,β-carotene-9’,10’-oxygenase (BCO2) can form β-apo-10’-carotenal and β-ionone [182]. Some of these asymmetric cleavage products, called β-apocarotenoids, have been found to be biologically active.

Specifically, β-apo-13-carotenone and β-apo-14’-carotenoic acid have been shown to be potent RAR antagonists [124] (Figure 3.11). β-apo-13-carotenone is particularly potent, exhibiting a binding affinity similar to ATRA (Table 3.2).

Additionally, this compound has been found at physiologically relevant concentrations in human plasma samples (3.8 ± 0.6 nM in samples from six individuals), suggesting it is a natural modulator of ATRA activity [124].

Table 3.2. Apocarotenoid binding affinity for human RAR subtypes.

                               Binding affinity*, Ki (nM)
β-apocarotenoid                hRARα      hRARβ      hRARγ
ATRA                           3 ± 1      4 ± 2      3 ± 1
β-apo-13-carotenone            5 ± 1      4 ± 2      4 ± 1
β-apo-14’-carotenoic acid      34         25         58

*As measured by radioligand displacement assays; data reproduced from [124].


Figure 3.11. β-carotene and apocarotenoids. A. β-carotene B. All-trans retinoic acid C. β-apo-14’-carotenoic acid D. β-apo-13-carotenone

The exact source of β-apo-13-carotenone is not yet known, and it has yet to be

determined if β-apo-14’-carotenoic acid is a naturally occurring β-carotene

metabolite. Nevertheless, given the structural similarities between ATRA and the

two aforementioned asymmetric cleavage products, it is interesting to find that

while ATRA is a potent agonist of RAR activity, the shorter (β-apo-13-

carotenone) and longer (β-apo-14’-carotenoic acid) apocarotenoids both act as

relatively potent antagonists. While β-apo-14’-carotenoic acid only differs from

ATRA in the length of its unsaturated hydrocarbon tail, β-apo-13-carotenone also

differs in its terminal functional group (ketone vs. carboxyl group). A large

majority of synthetic retinoids employ a carboxyl group to form a salt bridge with a conserved arginine residue (R276 in hRARα) at one end of the ligand-binding

pocket, mimicking the binding mode of ATRA. Of the 20 non-ATRA RAR ligands

available for purchase on the Tocris website, 19 contain a carboxyl group as

illustrated in Appendix B. ATRA precursors, retinol and retinal, which differ from

ATRA in that they lack the carboxyl terminus, have been measured to bind with

~4-7-fold and >200-fold weaker affinity than ATRA, respectively, while their

activities are reduced by 35- and 500-fold [183]. These findings highlight how the

oxidation of retinol to ATRA transforms a weak agonist into a strong agonist.

Therefore, it is curious that β-apo-13-carotenone, which lacks the carboxyl group

and is shorter than ATRA, exhibits such a favorable binding affinity.

This chapter addresses the experimental work carried out to understand the

activity of β-apo-13-carotenone and β-apo-14’-carotenoic acid, while the

following chapter details the computational results used to explain the

mechanism of RAR antagonism in greater detail.

3.3 NR LBD expression and purification

In the following sections, all discussion of RARα and RXRα refers to the ligand binding domains of the human receptors only, specifically residues 182-421 for

RARα and 213-452 for RXRα. No experiments made use of constructs containing additional domains. In all cases, the proteins contained an N-terminal

6xHis-tag, unless noted otherwise (e.g. ∆His-RXRα).

3.3.1 RARα LBD Expression and Purification

Plasmid containing the human RARα LBD (residues 182-421, MW = 30060.6699

Da, sequence found in Appendix C) with an N-terminal His-tag in the pET28a

vector, encoding kanamycin resistance, was obtained as a gift from Noa Noy,

Case Western Reserve University. This represents the full LBD, excluding the 41-residue C-terminal F domain. The plasmid was transformed into BL21-Gold(DE3) cells that were grown to an optical density of ~0.6 in lysogeny broth

(LB) with 0.05 mg/ml kanamycin. Protein expression was then induced with 0.5 M

isopropyl β-D-1-thiogalactopyranoside (IPTG) which promotes BL21(DE3) T7

RNA polymerase expression and therefore RARα LBD expression as it is

controlled by a T7 promoter in the pET28a vector. The protein was expressed for

5 hours at 30°C while shaking at 225 rpm. Following expression, the cells were

pelleted in a Beckman centrifuge with a JLA 16.25 rotor for 20 minutes at 4,000

rpm and 4°C.

The pelleted cells from 2 L of culture were resuspended in a solution containing

20 mM Tris-HCl, 500 mM NaCl, 5 mM imidazole, and one tablet of Mini, EDTA-

free Roche cOmplete Protease Inhibitor Cocktail, pH 8.0, then frozen overnight at

-80°C. The following day, the cells were thawed at 4°C and lysozyme and

deoxyribonuclease (DNase) were added to final concentrations of 0.1 and 0.001

mg/ml respectively prior to lysis via sonication. Cellular debris from the lysate

was pelleted with an Eppendorf 5810R centrifuge at maximal speed for 60

minutes at 4°C. The supernatant was filtered through glass wool and then a 0.22 μm filter prior to incubation with ~1 ml of Ni Sepharose™ 6 Fast Flow resin for affinity chromatography of the His-tagged protein. The protein was incubated with the

resin overnight on a rotary mixer at 4°C.

The protein was then purified by affinity chromatography by loading the nickel

resin onto a column and washing with increasing concentrations of imidazole

(wash buffer 1: 20 mM Tris-HCl, 500 mM NaCl, 5 mM imidazole; wash buffer 2:

wash buffer 1 + 50 mM imidazole; elution buffer: wash buffer 1 + 200 mM

imidazole). The column was first washed with 5x5 ml of wash buffer 1, followed

by 3x5 ml of wash buffer 2. Finally, the protein was eluted with 5x5 ml of elution

buffer. The purity of the elution fractions was analyzed by sodium dodecyl sulfate

polyacrylamide gel electrophoresis (SDS-PAGE) as illustrated in Figure 3.12.

Figure 3.12. SDS-PAGE of His-hRARα LBD. His-RARα LBD purified from 2 L of BL21(DE3) cell culture induced with 0.5 M IPTG for 5 hours at 30°C. Lane 1: molecular weight markers; Lane 2: flow through; Lanes 3-6: 5 ml buffer washes with 5 mM imidazole; Lanes 7-9: 5 ml buffer washes with 50 mM imidazole; Lanes 10-15: 5 ml elution fractions with 200 mM imidazole.

The elution fractions were further purified via size exclusion chromatography with a HiLoad 16/60 Superdex 200 prep grade column (GE Healthcare Life Sciences).

All 25 ml of the elution fractions were concentrated to < 3 ml with an Amicon Ultra

10,000 Da nominal molecular weight cut-off centrifugal filter (Millipore), so that the entire sample could be loaded onto the column in a single run. During the fast protein liquid chromatography (FPLC) run, the protein was exchanged into a buffer containing 10 mM Tris, 150 mM NaCl, and 1 mM tris(2-carboxyethyl)phosphine (TCEP), pH 7.5. A sample FPLC chromatogram of purified RARα LBD is presented in Figure 3.13. The major peak is found at a volume of 85.91 ml with a slight shoulder at 79.27 ml. RARα LBD has an expected molecular weight of 30,060.6699 Da. When compared to the elution profiles of protein standards of known molecular weight (see Appendix E), we find that the RARα peaks correspond to monomer and dimer fractions. When collected in 5 ml fractions, the protein eluted as a monomer in fractions 17-20, which were named fractions B8-B5. RARα LBD dimers were only observed when the protein solution was highly concentrated, which could explain some of the binding stoichiometries observed in ITC experiments carried out at high concentrations (see Section 3.6, Appendix F). Typical RARα LBD yields are around 25 mg of purified protein from 2 L of cell culture.


Figure 3.13. Size exclusion chromatogram for His-hRARα LBD. Chromatogram of His-hRARα LBD run on a HiLoad 16/60 Superdex 200 column. Dashed lines indicate the elution volumes of BSA (67 kDa), ovalbumin (43 kDa), and chymotrypsinogen A (25 kDa) protein standards (see Appendix E for details).

3.3.2 RXRα LBD Expression and Purification

Plasmid containing the human RXRα LBD (residues 214-452, MW = 28988.3574

Da, sequence found in Appendix D) with an N-terminal His-tag in the pET15b vector, encoding ampicillin resistance, was obtained from Xu Shen, Shanghai

Institute of Materia Medica who originally obtained it from Eric Xu, Van Andel

Research Institute. The plasmid was transformed into BL21-Gold(DE3) and expressed in the same manner as RARα LBD except for the use of 0.05 mg/ml ampicillin instead of kanamycin as a selection agent. An example SDS-PAGE

result is found in Figure 3.14 and an example size exclusion chromatogram is

found in Figure 3.15.

The two most significant differences noted between the RARα and RXRα LBD expression are the protein yields and the oligomeric states observed in the FPLC

profiles. While a typical RARα LBD yield from 2 L of BL21(DE3) cells is ~25 mg, a much larger amount of RXRα LBD is obtained with the same methods. Typical

RXRα LBD yields are ~50 mg, roughly twice that of RARα LBD. However, as illustrated in Figure 3.15, less than half of this protein is in the monomeric state.

Based on the elution volume of protein standards, RXRα LBD seems to predominantly elute as a tetramer (expected mass of 115.9 kDa), unlike RARα

LBD, which is predominantly monomeric. Additional chromatograms in Section

3.3.3 reveal a dimer peak between the two peaks in Figure 3.15, verifying the tetrameric state of RXRα LBD.
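The expected oligomer masses used to interpret the size exclusion profiles follow directly from the monomer masses quoted for the two constructs. The sketch below is only an illustration of that arithmetic; the function name is hypothetical, and the calculation assumes homo-oligomers of identical chains.

```python
# Monomer masses (Da) of the His-tagged LBD constructs, as quoted in the text.
MONOMER_DA = {"RARα LBD": 30060.6699, "RXRα LBD": 28988.3574}


def oligomer_mass_kda(monomer_da, n):
    """Expected mass in kDa of a homo-oligomer built from n identical chains."""
    return n * monomer_da / 1000.0


if __name__ == "__main__":
    for name, mass in MONOMER_DA.items():
        for n, label in ((1, "monomer"), (2, "dimer"), (4, "tetramer")):
            print(f"{name} {label}: {oligomer_mass_kda(mass, n):.2f} kDa")
```

For RXRα LBD this gives approximately 58.0 kDa for a dimer and 115.9 kDa for a tetramer, values that can be compared against the 67 kDa and 158 kDa standards in the chromatogram.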


Figure 3.14. SDS-PAGE for His-hRXRα LBD. His-hRXRα LBD purified from 2 L of BL21(DE3) cell culture induced with 0.5 M IPTG for 5 hours at 30°C. Lane 1: molecular weight markers; Lane 2: flow through; Lanes 3-7: 5 ml washes with 5 mM imidazole; Lanes 8-10: 5 ml washes with 50 mM imidazole; Lanes 11-15: 5 ml elution fractions with 200 mM imidazole.


Figure 3.15. Size exclusion chromatogram for His-hRXRα LBD. Chromatogram of His-hRXRα LBD run on a HiLoad 16/60 Superdex 200 column. Dashed lines indicate the elution volumes of aldolase (158 kDa), BSA (67 kDa), ovalbumin (43 kDa), and chymotrypsinogen A (25 kDa) protein standards (see Appendix E for details).

3.3.3 RARα/RXRα LBD Heterodimer Purification

Since neither RARα nor RXRα work alone to transcribe genes, RARα/RXRα LBD dimers were purified to test whether any of our compounds inhibit the observed transcriptional activity by disrupting the RAR/RXR dimerization. Initially, a copurification method was attempted to isolate RAR/RXR LBD heterodimers by creating a His-tag deletion mutant of RXR LBD. By incubating the cell lysates

containing His-RARα LBD and ∆His-RXRα LBDs together with a Ni Sepharose

resin, we hoped to purify His-RARα/RXRα LBD dimers in a single purification

step. During purification, it was thought that excess ∆His-RXRα would be washed

away leaving only His-RARα/∆His-RXRα heterodimers immobilized by the

column. Thus, the His-tag on RXRα LBD was removed using the QuikChange II

XL Site-Directed Mutagenesis Kit from Stratagene with the following primers:

RXR ∆His Forward: AAGAAGGAGATATACCATGACCAGCAGCGCCAAC

RXR ∆His Reverse: GTTGGCGCTGCTGGTCATGGTATATCTCCTTCTT

The mutagenesis removed 60 bases that encoded for the His-tag and thrombin

cleavage site. Resulting colonies were cultured and the plasmid DNA was

purified with a Qiagen Miniprep kit. The mutation was successful with high efficiency based on sequencing results from the Plant-Microbe Genomics Facility at OSU (eight of eight selected colonies had the His-tag properly removed).
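As a quick consistency check on the deletion primers, the sketch below confirms that the forward and reverse sequences are exact reverse complements of one another, as expected for a QuikChange-style primer pair. It also lists the codons that directly follow the start codon in the forward primer. The helper names are hypothetical and used only for this illustration.

```python
FORWARD = "AAGAAGGAGATATACCATGACCAGCAGCGCCAAC"
REVERSE = "GTTGGCGCTGCTGGTCATGGTATATCTCCTTCTT"

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primers_are_complementary(fwd, rev):
    """True if rev is exactly the reverse complement of fwd."""
    return reverse_complement(fwd) == rev.upper()


def codons_after_start(seq):
    """Split the sequence downstream of the first ATG into complete codons."""
    start = seq.find("ATG")
    downstream = seq[start + 3:]
    return [downstream[i:i + 3] for i in range(0, len(downstream) - 2, 3)]


if __name__ == "__main__":
    print(primers_are_complementary(FORWARD, REVERSE))
    print(codons_after_start(FORWARD))
```

The codons following the ATG (ACC, AGC, AGC, GCC, AAC) encode Thr-Ser-Ser-Ala-Asn, so the primer joins the start codon directly to the first residues of the RXRα LBD and skips the His-tag and thrombin site.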

To test the heterodimer purification strategy, 2 L of His-RARα LBD and 2 L of

∆His-RXRα LBD cell lysates were incubated overnight with ~1 ml of Ni

Sepharose™ 6 Fast Flow resin and the protein was purified following the protocol described in Section 3.3.1. SDS-PAGE analysis of the elution fractions clearly shows the presence of both His-RARα and ∆His-RXRα LBDs (Figure 3.16), however, the FPLC chromatogram shows that the eluted protein formed a mixture of LBD monomers, dimers, and tetramers (Figure 3.17).


Figure 3.16. SDS-PAGE of His-hRARα/∆His-hRXRα LBD heterodimer purification. His-hRARα/∆His-hRXRα LBD purified from 2 L of BL21(DE3) cell culture induced with 0.5 M IPTG for 5 hours at 30°C. Lane 1: molecular weight markers; Lane 2: flow through; Lanes 3-7: 5 ml washes with 5 mM imidazole; Lanes 8-10: 5 ml washes with 50 mM imidazole; Lanes 11-15: 5 ml elution fractions with 200 mM imidazole.


Figure 3.17. Size exclusion chromatogram for His-hRARα/∆His-hRXRα LBD purification. Chromatogram of His-hRARα/∆His-hRXRα LBD run on a HiLoad 16/60 Superdex 200 column. Initial profile (purple) shows mixture of monomer, dimer, and tetramer, as well as aggregate or precipitate at ~45 ml. Dimers could be purified by rerunning specific fractions.

The formation of RAR/RXR tetramers with 2:2 or 1:3 RAR:RXR stoichiometry

was not anticipated, complicating this method of RAR/RXR dimer purification. By

rerunning selected fractions of the initial gel filtration, dimers could ultimately be

purified from the monomers and tetramers. However, this often required several

runs over the Superdex 200 column, making this method inefficient.

Instead, it seems that a more effective way to purify RARα/RXRα heterodimers is to

first purify RARα and RXRα monomers individually, then combine equimolar

amounts of each LBD before purifying them via gel filtration. A sample

chromatogram of such a preparation is illustrated in Figure 3.18.


Figure 3.18. Purification of RARα/RXRα heterodimers. A. FPLC chromatogram of His-hRARα added to a molar equivalent of His-hRXRα. B. Zoomed in chromatogram of plot in panel A (purple) compared to the profile of the B8 fraction from the His-hRARα/∆His-hRXRα sample in Figure 3.17, repurified on the size exclusion column (yellow). A 1:4 hRARα:hRXRα sample (green) is included to indicate the elution volumes of monomers, dimers, and tetramers.

When compared to the B8 fraction of the His-RAR/∆His-RXR experiment (Figure 3.18B, yellow), this alternate method of RAR/RXR purification results in greater purity as evidenced by a smaller shoulder in the elution profiles, while yielding

nearly the same amount of total dimer. This comes at the cost of an FPLC run.

3.3.4 His-tag Cleavage

The hRARα LBD in the pET28a vector contains an additional 27 N-terminal residues that contain a 6xHis-tag and a thrombin cleavage site (LVPR↓GS).

MGSSHHHHHHSSGLVPRGSHMESYTLTPEVGEL
    └────┘   └────┘  └──────────► hRARα LBD


Likewise, the hRXRα LBD in the pET15b vector contains an additional 21 N-terminal residues, also with a 6xHis-tag and thrombin cleavage site.

MGSSHHHHHHSSGLVPRGSHMTSSANEDMP
    └────┘   └────┘  └───────► hRXRα LBD

The extra residues on the N-termini of the proteins can pose problems in some cases; for example, the disordered tails may prevent the formation of protein crystals in X-ray crystallography screening. In mass

spectrometry experiments, it has been shown that His-tagged proteins expressed in BL21(DE3) E. coli cells can become spontaneously gluconoylated or phosphogluconoylated, resulting in extra masses of +178 Da or +258 Da, respectively [184]. To cleave the His-tag from the recombinant protein, 250 units of thrombin (Amersham Biosciences, Amersham, UK) were added to the His-

RARα/His-RXRα LBD elution fractions from the Ni resin purification.

Thrombin has a reported optimal enzymatic activity at pH 8.3 and contains disulfide bonds. Therefore, to cleave the His-tag from the RARα or RXRα LBDs, it is most appropriate to add the thrombin after the initial Ni-affinity purification in which the elution buffer is at pH 8.0 and contains no reducing agents such as

DTT, β-mercaptoethanol, or TCEP. Additionally, although thrombin activity is reduced at lower temperatures, the His-tag cleavage was carried out at 4°C to

minimize protein precipitation. Also, the sample was not mixed during the

cleavage reaction, since earlier experiments revealed that the RARα LBD

samples were very sensitive to agitation, resulting in the visual accumulation of protein precipitate.

The expected mass difference between His-tagged and thrombin-cleaved RARα or RXRα LBD is ~1.88 kDa. This mass difference could be resolved via SDS-

PAGE with a 15% acrylamide content. As illustrated in Figure 3.19, when adding

250 units of thrombin to the His-hRARα LBD FPLC elution fractions, the His-tags were removed after ~2 days when incubating at 4°C without stirring.
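The ~1.88 kDa mass difference can be reproduced from the tag sequence alone. The sketch below assumes that thrombin cuts between the arginine and glycine of LVPR↓GS and uses standard average residue masses; the function names are hypothetical and introduced only for this illustration.

```python
# Standard average residue masses (Da) for the amino acids present in the tag.
RESIDUE_MASS_DA = {
    "G": 57.0519, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "L": 113.1594, "M": 131.1926, "H": 137.1411, "R": 156.1875,
}

TAG = "MGSSHHHHHHSSGLVPRGSHM"   # N-terminal tag residues shown in the text
CLEAVAGE_SITE = "LVPRGS"        # thrombin recognition site; cut follows the R


def released_fragment(tag=TAG, site=CLEAVAGE_SITE):
    """Residues removed from the N-terminus when thrombin cuts after Arg."""
    cut = tag.index(site) + site.index("R") + 1
    return tag[:cut]


def residue_mass_sum(seq):
    """Sum of residue masses; this equals the mass lost by the cleaved protein."""
    return sum(RESIDUE_MASS_DA[aa] for aa in seq)


if __name__ == "__main__":
    fragment = released_fragment()
    print(fragment, f"{residue_mass_sum(fragment):.1f} Da")
```

The released fragment is MGSSHHHHHHSSGLVPR (17 residues), and its residue masses sum to about 1882 Da. This is the amount by which the protein becomes lighter, since the water added on hydrolysis leaves with the fragment.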

Figure 3.19. SDS-PAGE of His-hRARα LBD thrombin cleavage. Fractions were taken at ~0h, 6h, 18h, 23h, 42h, 73h, and 91h. Thrombin, with a molecular weight of 36 kDa, can be seen as the faint bands above the RARα LBD.

3.4 Circular Dichroism Experiments

Since the RAR/RXR ligands are all very hydrophobic, they are insoluble in aqueous solutions. Therefore, all compounds were dissolved in ethanol. Typical experiments used a 2-3 fold molar excess of the ligands added to protein samples to ensure saturated binding. If the ligands were at low concentrations

and our protein at high concentrations, a large volume of ligand and therefore

ethanol would need to be delivered to the protein. This caused concern that a

high percentage of ethanol in the protein samples would lead to protein denaturation and/or precipitation. To test how much ethanol RARα tolerates in solution, a circular dichroism (CD) experiment was conducted.

The RARα sample used in the experiments was purified as described in Section

3.3.1 in 10 mM Tris-HCl, 150 mM NaCl, and 1 mM TCEP at pH 7.5. The sample was determined to have a concentration of 0.57 mg/ml (18.94 μM) based on UV absorbance at 280 nm. The molar extinction coefficient of the LBD was calculated based on amino acid content of the LBD [185]:

ε280 = #Trp*5500 + #Tyr*1490 + #Cystine*125

His-hRARα LBD contains 1 tryptophan residue, 4 tyrosines, and 0 disulfide bonds for a molar extinction coefficient of 11,460 M⁻¹ cm⁻¹ at 280 nm. An Aviv 62DS circular dichroism spectrophotometer was used for the experiments with a

0.1 cm path-length quartz cuvette. Readings were collected at 25°C from 260 to

190 nm at 0.5 nm steps with 5 s signal averaging at each wavelength. Final data

are the average of three independent runs from which baseline data of the

cuvette with only buffer has been subtracted.
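As a quick check of the concentration arithmetic above, the extinction coefficient formula and the Beer-Lambert law can be sketched in Python. The function names are illustrative, and the 1 cm path length for the UV measurement is an assumption, since the text does not state it.

```python
# Residue counts for His-hRARα LBD, as stated in the text.
N_TRP, N_TYR, N_CYSTINE = 1, 4, 0


def extinction_coefficient_280(n_trp, n_tyr, n_cystine):
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1) from residue counts."""
    return n_trp * 5500 + n_tyr * 1490 + n_cystine * 125


def absorbance(epsilon, conc_molar, path_cm=1.0):
    """Beer-Lambert law, A = epsilon * c * l (1 cm default path is an assumption)."""
    return epsilon * conc_molar * path_cm


epsilon_280 = extinction_coefficient_280(N_TRP, N_TYR, N_CYSTINE)
print(epsilon_280)  # 11460

# A280 that would correspond to the reported 18.94 µM sample in a 1 cm cell.
a280 = absorbance(epsilon_280, 18.94e-6)
print(round(a280, 3))

# Molecular weight implied by the reported 0.57 mg/ml and 18.94 µM values.
implied_mw = 0.57 / 18.94e-6  # (g/L) / (mol/L) = g/mol
print(round(implied_mw / 1000, 1), "kDa")
```

The implied molecular weight of roughly 30 kDa is only a consistency check between the two reported concentration values, not an independently measured mass.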

The cuvette was filled with 300 μl of sample, which was diluted with 0%, 5%,

7.5%, 10%, 15%, and 20% ethanol. Output from the spectrophotometer is given

in degrees of ellipticity (mdeg), which does not consider sample concentration

and is therefore inappropriate for the purposes of this experiment. Since the

sample concentration is diluted with the addition of various amounts of ethanol,

the CD data was converted to molar ellipticity so that the different samples could

be compared:

Molar ellipticity, [θ] = θ / (10 × concentration × path length) (deg M⁻¹ m⁻¹)

Here, degrees of ellipticity, θ, are in units of mdeg, concentration is in M, and path length is in cm. The baseline-corrected molar ellipticity spectra for the protein samples with increasing amount of ethanol are found in Figure 3.20.
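The conversion above can be sketched as a short Python function. This is a minimal illustration: the raw reading of -20 mdeg is a hypothetical value, and the dilution model (ethanol percentage taken as a volume fraction of the final sample) is an assumption about how the samples were prepared.

```python
def molar_ellipticity(theta_mdeg, conc_molar, path_cm):
    """[θ] = θ / (10 * c * l) with θ in mdeg, c in mol/L, l in cm -> deg M^-1 m^-1."""
    return theta_mdeg / (10.0 * conc_molar * path_cm)


def diluted_concentration(stock_molar, ethanol_fraction):
    """Protein concentration after dilution, assuming ethanol_fraction is the
    volume fraction of ethanol in the final sample (an assumption)."""
    return stock_molar * (1.0 - ethanol_fraction)


STOCK_MOLAR = 18.94e-6   # protein stock concentration from the text
PATH_CM = 0.1            # cuvette path length from the text
RAW_MDEG = -20.0         # hypothetical raw reading, for illustration only

for fraction in (0.0, 0.05, 0.075, 0.10, 0.15, 0.20):
    conc = diluted_concentration(STOCK_MOLAR, fraction)
    value = molar_ellipticity(RAW_MDEG, conc, PATH_CM)
    print(f"{fraction * 100:4.1f}% ethanol: c = {conc * 1e6:.2f} µM, [θ] = {value:.3e}")
```

The same raw reading maps to a larger-magnitude molar ellipticity as the sample becomes more dilute, which is why the raw mdeg output cannot be compared directly across ethanol levels.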

Figure 3.20. Circular dichroism spectrum for His-hRARα LBD in solution with varied amounts of ethanol.

From the numerous nuclear receptor LBD crystal structures, it is known that

these domains are almost completely α-helical in structure. This coincides with

the RARα LBD CD spectra in Figure 3.20, where characteristic α-helical spectra

are observed with minima at 208 and 222 nm. To determine the amount of

precipitated or unfolded protein at varying levels of ethanol, the molar ellipticity at

208 and 222 nm were compared to the sample without any added ethanol to estimate the percentage of folded protein remaining in the solution. The results for this analysis are found in Figure 3.21 and Table 3.3.

Figure 3.21. Percentage of folded His-hRARα LBD with added ethanol.

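Table 3.3. Percentage of folded His-hRARα LBD with added ethanol.

| Wavelength | 0% ethanol | 5% ethanol | 7.5% ethanol | 10% ethanol | 15% ethanol | 20% ethanol |
|---|---|---|---|---|---|---|
| 208 nm | 100 | 99.6 | 88.3 | 73.8 | 29.1 | 7.7 |
| 222 nm | 100 | 99.2 | 91.8 | 83.3 | 38.2 | 18.8 |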

Based on the results of this experiment, it has been determined that the addition

of a volume of no greater than 5% ethanol results in minimal loss of folded

protein and is therefore an acceptable limit when adding ethanol-dissolved

ligands to RARα LBD in future experiments.

3.5 Dimerization LC Experiments

The transcriptional ability of a nuclear receptor is partly based on its ability to

localize to its target genes, which requires recognition of a response element

(RE). As described in Section 3.2.2.1, most REs are formed by two six-base half-sites separated by 0 to 5 bases. Thus, dimerization, which is largely driven by the LBD, is essential for RE binding. To test if the antagonistic effect of β-apo-13-carotenone and β-apo-14’-carotenoic acid is related to the disruption of RAR/RXR dimerization, a size exclusion chromatography experiment was performed.

RARα/RXRα heterodimers were prepared by mixing 0.36 ml of 46 μM RARα with

0.49 ml of 33.8 μM RXRα for a final RARα/RXRα concentration of 19.5 μM.

Heterodimer samples were treated with 5 μl of either 15.53 mM β-apo-13-carotenone or 11.23 mM β-apo-14’-carotenoic acid, a 4.7 and 3.4-fold molar excess, respectively. The final volume of ethanol in each case was 0.58%, which, based on the CD experiments in Section 3.4, should have a minimal effect on the stability of the folded protein. Samples were mixed by pipetting then

incubated for 1+ hours on ice prior to loading onto a HiLoad 16/60 Superdex 200

gel filtration column, in which the protein buffer (10 mM Tris-HCl, 150 mM NaCl,

1 mM TCEP, pH 7.5) was used as the mobile phase with a flow rate of 1 ml/min.
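The sample preparation arithmetic above can be reproduced with a short Python sketch. The helper names are illustrative, and the only assumption is that volumes are additive on mixing.

```python
def mix(vol_a_ml, conc_a_um, vol_b_ml, conc_b_um):
    """Final volume (ml) and concentrations (µM) after combining two solutions,
    assuming additive volumes."""
    total_ml = vol_a_ml + vol_b_ml
    return total_ml, conc_a_um * vol_a_ml / total_ml, conc_b_um * vol_b_ml / total_ml


def fold_excess(ligand_mm, ligand_ul, receptor_um, receptor_ml):
    """Molar ratio of ligand to receptor (mM * µl = nmol; µM * ml = nmol)."""
    return (ligand_mm * ligand_ul) / (receptor_um * receptor_ml)


total_ml, rar_um, rxr_um = mix(0.36, 46.0, 0.49, 33.8)
print(f"RARα {rar_um:.1f} µM, RXRα {rxr_um:.1f} µM in {total_ml:.2f} ml")

carotenone_excess = fold_excess(15.53, 5.0, rar_um, total_ml)
carotenoic_excess = fold_excess(11.23, 5.0, rar_um, total_ml)
print(f"fold excess: {carotenone_excess:.1f} and {carotenoic_excess:.1f}")

# Ethanol content after adding 5 µl of ligand stock to the 850 µl sample.
ethanol_percent = 100.0 * 5.0 / (total_ml * 1000.0 + 5.0)
print(f"ethanol: {ethanol_percent:.2f}%")
```

These values agree with the 19.5 μM heterodimer concentration, the 4.7 and 3.4-fold excesses, and the 0.58% ethanol content reported in the text.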

The resulting chromatograms (Figure 3.22) indicate that dimerization is not

disrupted by treatment with either β-apo-13-carotenone or β-apo-14’-carotenoic acid. A reference chromatogram of a 1:4 ratio of RAR:RXR was used to identify the elution volumes of tetramers (~72 ml), dimers (~82 ml), and monomers (~88 ml). In both cases, the antagonist-treated heterodimers eluted as dimers.

Although slight shoulders are observed in the dimer peaks for the antagonist-treated samples, indicating a slight population of monomer, a similar shoulder is observed in the control case. The shoulders are slightly more prominent in the antagonist-treated cases than in the untreated control, most likely because the apocarotenoids contribute additional

absorbance at 280 nm, which is apparent when comparing the relative peak

absorbance of the dimers at ~82 ml.


Figure 3.22. Size exclusion chromatograms of hRARα/hRXRα LBD complexes. Heterodimer complexes were prepared with excess amounts of β-apo-13-carotenone (red) or β-apo-14’-carotenoic acid (blue), which both elute as dimers, similar to the untreated control (black). A 1 hRARα : 4 hRXRα mixture is included as a reference for the elution volumes of tetramers (~72 ml), dimers (~82 ml), and monomers (~87 ml).

3.6 ITC experiments

3.6.1 ITC Background

Through direct measurements of heats evolved during binding processes,

isothermal titration calorimetry (ITC) is the only experimental method that can

measure binding free energies (∆Gbind) in addition to the entropic and enthalpic components of binding (∆Sbind and ∆Hbind). This is accomplished by titrating a ligand into a sample cell containing a receptor. The sample cell is maintained at a

constant temperature with respect to a reference cell within an adiabatic chamber

by the application of a constant power. During a titration experiment, the amount

of power required to maintain a constant temperature within the sample cell is

recorded with each injection. Integrating these recorded powers over time results

in the total heat (∆H) of each injection. If concentrations of receptor and ligand

are chosen carefully, the full titration experiment will result in a saturation binding

curve. Given known receptor and ligand concentrations, fitting models to the

resulting curve yields the binding constant (Kb or Ka) and the stoichiometry of

binding (n). Thus, ∆H, Kb, and n are directly measured in an ITC experiment.

Using the relationship

$$\Delta G_{bind} = RT \ln K_d$$

where R is the gas constant (R = 1.985877534e-3 kcal K⁻¹ mol⁻¹), T is the

temperature in Kelvin, and the dissociation constant Kd = 1/Kb, the binding free

energy (∆Gbind) may be calculated. Finally, using the relationship

$$\Delta G_{bind} = \Delta H_{bind} - T\Delta S_{bind}$$

the entropic change attributed to the binding event (∆Sbind) may be calculated,

providing all of the components of the total binding energy.
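The two thermodynamic relationships can be illustrated with a short Python sketch, assuming the standard forms ΔG = RT ln Kd and ΔG = ΔH − TΔS. The gas constant is the value quoted in the text; the temperature of 298.15 K corresponds to the 25°C used for the ITC runs, and the example inputs are the averaged Kd and ΔH reported for the TTNPB-bound receptor in Table 3.4.

```python
import math

R_KCAL = 1.985877534e-3   # gas constant in kcal K^-1 mol^-1, as quoted in the text
TEMP_K = 298.15           # 25 °C


def delta_g_kcal(kd_molar, temp_k=TEMP_K):
    """Binding free energy (kcal/mol) from the dissociation constant: ΔG = RT ln Kd."""
    return R_KCAL * temp_k * math.log(kd_molar)


def delta_s_cal(delta_h_cal, delta_g_kcal_value, temp_k=TEMP_K):
    """Binding entropy (cal/mol/K) from ΔG = ΔH - TΔS, with ΔH in cal/mol."""
    return (delta_h_cal - delta_g_kcal_value * 1000.0) / temp_k


# Averaged values for the TTNPB-bound receptor (Table 3.4).
dg = delta_g_kcal(0.48e-6)
ds = delta_s_cal(-6069.0, dg)
print(f"ΔG = {dg:.2f} kcal/mol, ΔS = {ds:.2f} cal/mol/K")
```

Because the table reports averages over individual runs, values computed from the averaged Kd and ΔH match the tabulated ΔG and ΔS only approximately.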

ITC experiments have been applied in the past to nuclear receptor systems, most

often in measuring the interaction energy between coactivators and ligand

binding domains bound to various ligands. Most often, a peptide containing a

single LXXLL motif is used in the ITC experiments as an estimation of the full

coactivator binding energy. Some studies have also used a full coactivator

receptor interaction domain (RID) in the binding experiments. The following

experiments measure the binding affinity of the SRC-1 NR2 peptide (686-Ac-RHKILHRLLQEGS-NH2-698) to ligand-bound RARα.

Only a single report was found of ITC used to directly measure the binding affinity of a ligand to an NR LBD: danthron, an RXRα-specific agonist, was measured to have a Kd of 7.5 μM [186]. The lack of ligand binding studies is likely due to the practical limitations of ITC experiments. In particular, it is important that the ligand and receptor are in solutions that are well matched; otherwise, large heats of dilution during the titration experiment will mask any heats of binding. Given that most NR ligands are very hydrophobic, they are poorly soluble in the aqueous solutions required for protein stability. Additionally, ITC experiments are generally limited to ligands with dissociation constants, Kd, in the range of 1 mM to 10 nM. Direct measurement of high affinity ligands is made difficult by limitations on conditions required to obtain interpretable thermograms.

A parameter called the ‘c value’ is often used to determine the precision with which a Kd may be calculated from an isotherm obtained under specific conditions, where

$$c = n\,[M]\,K_b = \frac{n\,[M]}{K_d}$$

with [M] = the concentration of the receptor and n = the stoichiometry of binding.

Generally, c should fall within a window of 1 to 1000. More recent studies

suggest that a tighter window of 10 to 100 is more appropriate, with an optimal c

value of 40 [187]. Displacement ITC experiments may be designed to measure

picomolar binding affinities given that an additional ligand with a Kd that falls

within the range of direct measurement is available [188].
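To make the c value window concrete, the sketch below assumes the commonly used definition c = n[M]Kb = n[M]/Kd and a 1:1 stoichiometry (n = 1), and estimates the receptor concentration that would give the suggested optimum of c = 40. The Kd values are the averages reported in Table 3.4; the function names are illustrative.

```python
def c_value(n, receptor_molar, kd_molar):
    """c = n * [M] / Kd (dimensionless), assuming the common definition."""
    return n * receptor_molar / kd_molar


def receptor_conc_for_c(c_target, kd_molar, n=1.0):
    """Receptor concentration (mol/L) needed to reach a target c value."""
    return c_target * kd_molar / n


# Averaged peptide Kd values (µM) from Table 3.4; n = 1 is an assumption.
for label, kd_um in (("TTNPB", 0.48), ("apo", 5.97), ("BMS195614", 115.7)):
    conc_um = receptor_conc_for_c(40.0, kd_um * 1e-6) * 1e6
    print(f"{label}: {conc_um:.1f} µM receptor for c = 40")
```

For weak interactions the receptor concentration required for an optimal c value quickly becomes impractical for a protein that precipitates at high concentration, which is one reason weak binders yield less precise Kd estimates.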

3.6.2 Coactivator Binding ITC Methods

To probe the effects of ligand binding on the ability of the hRARα LBD to associate with coactivators, isothermal titration calorimetry experiments were conducted. In these experiments, a 2-3 fold molar excess of RAR ligand was added to purified His-hRARα LBD, then loaded into a dialysis cassette with a

10,000 molecular weight cutoff (Pierce) and dialyzed overnight against 4 L of

protein buffer (10 mM Tris, 150 mM NaCl, 1 mM TCEP, pH 7.5). The dialysate

was then used to solubilize lyophilized SRC-1 NR2, providing a matched buffer

to avoid large heats of dilution during ITC experiments. The SRC-1 NR2 peptide

(686-Ac-RHKILHRLLQEGS-NH2-698) was ordered from EZBiolab (Carmel, IN).

The peptide was prepared in 50:50 water:acetonitrile then made into aliquots which were flash frozen in liquid nitrogen and lyophilized to remove the solvent.

Typically, 0.6-1.2 mg SRC-1 NR2 peptide aliquots were appropriate for individual

ITC runs.

Since RAR ligands are very hydrophobic, the ligands worked with in this study were solubilized in ethanol. The addition of ethanol can have a denaturing/precipitating effect on the concentrated protein samples used for the

ITC experiments (see Section 3.4). Less than 5% ethanol was added to the protein samples in order to achieve a 2-3 fold molar excess of ligand while minimizing protein denaturation. Some protein precipitation is inevitable due to

the acidic nature of most RAR ligands. After the overnight dialysis, cloudy

precipitate was often observed in the samples. Therefore, prior to ITC

experiments, the precipitate was removed by centrifuging the dialyzed sample for

10 minutes at 10,000 rpm. The concentration of the precipitate-free, ligand-bound

protein solution was then determined by Bradford assay and aliquots of

lyophilized SRC-1 NR2 were solubilized with the dialysate to a 10-fold molar

excess of the protein sample.

All ITC experiments were performed with a VP-ITC Isothermal Titration

Calorimeter from MicroCal which contains a reaction cell of volume ~1.4 ml and a

syringe holding 295 μl of titrant. In order to load the reaction cell and syringe

without air bubbles, ~2.5 ml protein sample and ~600-700 μl of titrant need to be

prepared for each experiment. Since each ITC experiment includes a second

titration of SRC-1 NR2 into dialysate to correct the baseline for heats of dilution,

a total of ~1200-1400 μl titrant needs to be prepared for each experiment. An

initial injection of 5 μl was used in each experiment. The resulting data point was

discarded in all cases due to diffusion that may occur during equilibration. Initial

experiments used a 5 μl injection volume resulting in a total of 55 data points for

each run. Ultimately, it was decided that this was an unnecessarily high number

of titration points, and injection volumes were increased to 10 μl in later

experiments for a total of 28 data points. The time between injections was

adjusted for each case; however, equilibrium was generally reached within 210 to

300 seconds after each injection.

To prevent the formation of air bubbles in the reaction cell during the

experiments, protein and peptide samples were degassed at 25°C for 5 minutes

while slowly stirring with magnetic stir bars using a ThermoVac from MicroCal prior to loading samples into the ITC machine. All experiments were performed at

25°C while stirring at 307 revolutions/minute using a reference power of 12

μcal/sec.

3.6.3 Coactivator Binding ITC Results

ITC experiments were carried out to measure the binding affinity of the SRC-1

NR2 peptide to RARα saturated with six different ligands in addition to the apo

receptor. These compounds, as illustrated in Figure 3.23, include three

apocarotenoids: β-apo-14’-carotenoic acid and β-apo-13-carotenone, the two

compounds that are the primary focus of this study in addition to all-trans retinoic

acid (ATRA), the endogenous RAR agonist. Two other compounds were tested

for comparison purposes: 4-[(E)-2-(5,6,7,8-tetrahydro-5,5,8,8-tetramethyl)-2-

naphthalenyl)-1-propenyl]benzoic acid (TTNPB), a superagonist, and 4-[[[5,6-

Dihydro-5,5,-dimethyl-8-(3-quinolinyl)-2-naphthalenyl]carbonyl]amino]benzoic

acid (BMS195614 or BMS614), a neutral antagonist. Finally, β-apo-13-

lycopenone was also tested due to its structural similarities with β-apo-13-

carotenone, differing in the lack of a fused ring.

[Figure 3.23 structure labels: all-trans retinoic acid, β-apo-14’-carotenoic acid, β-apo-13-carotenone, β-apo-13-lycopenone, TTNPB, BMS195614]

Figure 3.23. Structures of RAR modulators used in ITC experiments. All-trans retinoic acid (ATRA) is the endogenous agonist, while synthetic compounds TTNPB and BMS195614 are a superagonist and neutral antagonist respectively. The three remaining compounds are eccentric β-carotene or lycopene cleavage products that antagonize hRARα activity.

All ligands were dissolved in ethanol and concentrations were quantified by UV absorbance at λmax wavelengths, using published molar extinction coefficients for the cases of ATRA (λmax = 350 nm; ε350 = 45,000 M⁻¹ cm⁻¹ [189]), β-apo-13-carotenone (λmax = 341 nm; ε341 = 25,300 M⁻¹ cm⁻¹ [124]), and β-apo-14’-carotenoic acid (λmax = 378 nm; ε378 = 52,200 M⁻¹ cm⁻¹ [124]). Concentrations of TTNPB, β-apo-13-lycopenone, and BMS614 were determined by weight.

Ultimately, precise concentrations were not required for the experiments

described here. However, a reasonable estimation of ligand concentrations

allowed the addition of 2-3 fold molar excess in a volume that did not exceed 5%

total ethanol added to the protein samples.

Each ITC experiment was repeated 3-4 times to provide average results with

small standard deviations. The resulting thermograms are found in Figures 3.24-3.30, while the average data are found in Table 3.4. More detailed data for the individual ITC experiments are found in Appendix F.


Figure 3.24. ITC results of SRC-1 NR2 peptide binding to ATRA-bound hRARα LBD.


Figure 3.25. ITC results of SRC-1 NR2 peptide binding to TTNPB-bound hRARα LBD.


Figure 3.26. ITC results of SRC-1 NR2 peptide binding to β-apo-13-carotenone-bound hRARα LBD.


Figure 3.27. ITC results of SRC-1 NR2 peptide binding to β-apo-14'-carotenoic acid-bound hRARα LBD.

Figure 3.28. ITC results of SRC-1 NR2 peptide binding to BMS614-bound hRARα LBD.


Figure 3.29. ITC results of SRC-1 NR2 peptide binding to β-apo-13-lycopenone-bound hRARα LBD.


Figure 3.30. ITC results of SRC-1 NR2 peptide binding to apo hRARα LBD.

Table 3.4. Summary of ITC experiments with hRARα LBD.

| Ligand | Kd (μM) | ∆H (cal/mol) | ∆S (cal/mol/K) | ∆G (kcal/mol) |
|---|---|---|---|---|
| TTNPB | 0.48 ± 0.07 | -6069 ± 239 | 8.57 ± 0.83 | -8.62 ± 0.08 |
| ATRA | 0.65 ± 0.01 | -5305 ± 124 | 10.5 ± 0.42 | -8.44 ± 0.02 |
| β-apo-14’-carotenoic acid | 1.28 ± 0.27 | -6511 ± 287 | 5.17 ± 1.26 | -8.05 ± 0.12 |
| β-apo-13-carotenone | 2.16 ± 0.05 | -5772 ± 544 | 6.56 ± 1.82 | -7.73 ± 0.01 |
| apo | 5.97 ± 0.94 | -4116 ± 444 | 10.12 ± 1.62 | -7.13 ± 0.09 |
| β-apo-13-lycopenone | 39.67 ± 10.0 | -14606 ± 2097 | -28.8 ± 8.46 | -6.03 ± 0.17 |
| BMS195614 | 115.7 ± 61.6 | 4593 ± 743 | 33.7 ± 1.79 | -5.45 ± 0.28 |

* Average data ± 1 SD from n=3

The range of binding affinities determined by this work is in line with previous

reports of coactivator binding to NRs characterized by ITC. The most extensive

ITC work performed on a nuclear receptor system was a study of the SRC-1 NR2

peptide (685-ERHKILHRLLQEG-697) binding to the constitutive androstane

receptor (CAR) [190]. The binding affinity of the coactivator peptide was measured for both monomeric CAR LBD and CAR/RXRα LBD dimers. The affinities

measured for the CAR/RXR dimer were very similar to those reported here. Apo

CAR/RXR bound the SRC-1 NR2 peptide with a Kd of 4.9±0.7 μM, similar to the

5.97±0.94 μM observed for the RARα LBD here. Interestingly, this study showed a significant difference in affinity of the peptide to the CAR monomer vs. the

CAR/RXR dimer. For example, the peptide was found to bind to the apo CAR monomer with Kd = 21.3±2.4 μM. With the agonist TCPOBOP bound, the peptide bound the CAR/RXR heterodimer with a Kd of 0.6±0.1 μM, compared with 8.7±0.8 μM for monomeric CAR, suggesting allosteric cooperativity in binding to dimers.


In a case that is more directly comparable to the experiments performed here,

Osz et al. measured the binding affinity for the SRC-1 NR2 peptide (676-

CPSSHSSLTERHKILHRLLQEGSPS-700) and SRC-2 receptor interaction

domain (RID, 622-828 and 632-772) to a TTNPB-saturated hRARα LBD complex

(176-421) to be 1.5±0.1 μM and 2.5±0.2 μM respectively [191]. This is a very

similar experiment to what was performed here in which the hRARα LBD

consisted of residues 182-421, however the SRC-1 NR2 peptide used in this

work (686-Ac-RHKILHRLLQEGS-NH2-698) was 12 residues shorter.

Nevertheless, the reported affinity of the SRC-1 to the TTNPB-bound hRARα

LBD complex by Osz et al. was similar to the result of 0.48±0.07 μM measured

here.

In a study by Pogenberg et al., the binding affinity of SRC-1 NR2 for monomeric hRARα (176-421) was measured via fluorescence anisotropy [192]. The coactivator peptide was the same as used in our ITC experiments, except for a fluorescein moiety conjugated to the N-terminus. The Pogenberg study measured the dissociation constant of the peptide to RAR that was bound to

TTNPB and BMS614 in addition to the apo receptor, with results of 0.53±0.05

μM, 92.70±20 μM, and 8.18±0.3 μM respectively. These values are very similar to the data reported here, especially for the TTNPB case. In addition, Pogenberg tested binding of the full SRC-1 receptor interaction domain (RID) and did not notice any significant difference between dissociation constants for the full RID

versus the NR2 fragment, leading to the conclusion that the 13-residue fragment is able to reproduce the interactions between the coactivator and the nuclear receptor [192].

Although agonist binding is generally a requirement for NR recruitment of coactivators, SRC-1 NR2 peptide binding was observed in the apo RAR ITC experiments. This is consistent with previous experiments in which full-length

SRC-1 was tested for binding to a variety of NRs using a yeast two-hybrid assay.

In this study, both RARα and RXRα were able to bind SRC-1 in the absence of

agonist, while GR, ER, AR, MR, PR, VDR, and TR were not [193]. This implies a

basal level of RARα-mediated transcription with suitable coactivator

concentrations, regardless of the presence of agonist.

Based on the results in Table 3.4, a clear trend is evident in that agonists induce stronger coactivator binding than antagonists. For example, when bound to

TTNPB, a superagonist, an affinity for the coactivator peptide is observed that is more favorable than for the ATRA-bound state. Additionally, binding to BMS614, a pure antagonist, resulted in the lowest observed affinity for SRC-1 NR2—lower than the apo (unliganded) state. This is in line with a similar correlation observed between coactivator peptide binding to ligand-bound ER as measured by fluorescence polarization, where it was found that a partial agonist induced a peptide binding affinity between the affinity observed for full agonist-bound and

unliganded receptors [174]. A pure antagonist, on the other hand, induced a


peptide binding affinity that was weaker than what was measured for the

unliganded receptor.

β-apo-14’-carotenoic acid and β-apo-13-carotenone induce coactivator binding

affinities that fall between those measured for the ATRA-bound and apo states.

Therefore, while these compounds were initially characterized as RAR antagonists [124], this binding data suggests that it would be most accurate to classify the compounds as partial agonists. β-apo-13-lycopenone, on the other hand, is a pure antagonist, inducing a coactivator binding affinity that is less favorable than observed for the apo receptor. The additional rotational freedom of the lycopenone likely destabilizes the H12 conformation, resulting in poor peptide binding, whereas β-apo-13-carotenone, with the fused ring, induces an H12 conformation that favors peptide binding.

Although a clear trend is observed for peptide binding affinity with respect to the type of ligand bound to the receptor (superagonist > agonist > apo > antagonist), the same cannot be said about the measured enthalpic and entropic components of the binding. All of the binding events are exothermic except for the case of

BMS614-bound RAR. The positive ∆H measured in the BMS614 experiments

was offset by a very favorable ∆S for an overall negative ∆G. The calculated

entropies of binding were favorable in all cases except for β-apo-13-lycopenone-bound RAR. Similar to the BMS614-bound case, the overall binding energy induced by the lycopenone was favorable due to compensation by a very favorable enthalpy change.

Overall, these data indicate that β-apo-13-carotenone and β-apo-14’-carotenoic acid are better classified as partial agonists instead of antagonists, and the effect of these two molecules is due to destabilization of the coactivator/receptor interaction with respect to the endogenous ligand, ATRA, but stabilization of the interaction with respect to the unliganded receptor.

3.7 Origin of β-apo-13-carotenone binding affinity

Most RAR ligands form a salt bridge with a conserved arginine at one end of the

binding pocket (R276 in hRARα), however β-apo-13-carotenone does not contain a carboxylate as is found in most RAR ligands (see Appendix B for structures of many carboxylate-containing RAR ligands). Additionally, the unsaturated tail of the carotenone is two carbons shorter than the endogenous agonist, ATRA, and therefore cannot physically fill the pocket to reach R276. This makes it difficult to rationalize the high affinity of β-apo-13-carotenone for RAR, which is similar to that of ATRA (ATRA Ki = 3 ± 1 nM; β-apo-13-carotenone Ki = 5 ± 1 nM) [124].

However, as detailed in Section 4.4.5, computational modeling and subsequent

molecular dynamics (MD) simulations suggest that, when bound to RAR, the

carbonyl carbon of β-apo-13-carotenone is within van der Waals contact distance

of the sulfur atom of a conserved RAR cysteine residue (C235) in the binding

pocket. This led to the hypothesis that the high affinity of the carotenone is due to a covalent interaction with the receptor by the mechanism proposed in Figure 3.31.

[Figure 3.31 labels: cysteine (in thiolate form), β-apo-13-carotenone, hemithioacetal product]

Figure 3.31. Suggested mechanism of covalent bond formed between β-apo-13-carotenone and C235 of hRARα LBD.

Recently, it has been reported that luffariellolide, a marine natural product found in several families of sponges, is a RAR partial agonist which forms a covalent bond with C235 [194], providing precedent for such an interaction. The structure of luffariellolide (Figure 3.32A) contains a trimethylcyclohexene ring similar to β-apo-13-carotenone; however, the former is longer than ATRA and terminates in a bulkier γ-hydroxybutenolide moiety, which serves as the point of conjugation to the receptor. Additionally, the covalent interaction between luffariellolide and the receptor forms an ether bond that is much more stable than the proposed hemithioacetal product formed between the receptor and β-apo-13-carotenone. As discussed in the following sections, the instability of this product makes it more difficult to observe experimentally.


Figure 3.32. Luffariellolide covalently binds to RAR LBD. A. Structure of luffariellolide (bottom) compared to the structure of retinoic acid (top). B. Crystal structure of luffariellolide bound to RARα LBD, covalently linked to C235 (PDB ID: 4DQM).

To investigate the possibility that β-apo-13-carotenone forms a covalent interaction with RAR, several experimental approaches were considered, including ITC, NMR, mass spectrometry, and X-ray crystallography. While none

of the following results conclusively confirm the formation of the covalent product,

much of the data is suggestive that the interaction is taking place.

3.7.1 ITC Experiments with C235A Mutant

If β-apo-13-carotenone forms a covalent interaction with C235 of RARα, then

mutation of this residue to alanine would prevent such an interaction from

occurring. ITC experiments were performed to determine if a C235A mutation

perturbed the ability of β-apo-13-carotenone to induce coactivator binding. Using the protocol described in Section 3.6.2, binding data was collected for His-hRARα C235A LBD that was saturated with either ATRA or β-apo-13-carotenone. The point mutation was made using a Stratagene QuikChange II XL

Site-Directed Mutagenesis Kit with the following primers:

RARα C235A Forward: CAGTGAACTCTCCACCAAGGCCATCATTAAGACTGTGGAG
RARα C235A Reverse: CTCCACAGTCTTAATGATGGCCTTGGTGGAGAGTTCACTG
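Mutagenesis primer pairs of this kind are designed as exact reverse complements of one another, which can be verified with a few lines of Python. The sketch below checks the C235A pair listed above; the helper name is illustrative.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


forward = "CAGTGAACTCTCCACCAAGGCCATCATTAAGACTGTGGAG"
reverse = "CTCCACAGTCTTAATGATGGCCTTGGTGGAGAGTTCACTG"

print(len(forward), len(reverse))              # 40 40
print(reverse_complement(forward) == reverse)  # True
```

The same check can be applied to any other primer pair before ordering, since a single transcription error in either strand would break the complementarity.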

Thermograms of the experiments are found in Figures 3.33 and 3.34 with a

summary of the thermodynamic data in Table 3.5.


Figure 3.33. ITC results of SRC-1 NR2 peptide binding to ATRA-bound hRARα C235A LBD.


Figure 3.34. ITC results of SRC-1 NR2 peptide binding to β-apo-13-carotenone-bound hRARα C235A LBD.


Table 3.5. Summary of ITC experiments on hRARα C235A LBD.

| Receptor | Ligand | Kd (μM) | ∆H (cal/mol) | ∆S (cal/mol/K) | ∆G (kcal/mol) |
|---|---|---|---|---|---|
| C235A | ATRA | 0.86 ± 0.07 | -5209 ± 225 | 10.3 ± 0.89 | -8.28 ± 0.05 |
| C235A | β-apo-13-carotenone | 2.18 ± 0.24 | -4977 ± 95 | 9.21 ± 0.45 | -7.72 ± 0.06 |
| Wild type | ATRA | 0.65 ± 0.01 | -5305 ± 124 | 10.5 ± 0.42 | -8.44 ± 0.02 |
| Wild type | β-apo-13-carotenone | 2.16 ± 0.05 | -5772 ± 544 | 6.56 ± 1.82 | -7.73 ± 0.01 |

* Average data ± 1 SD from n=3

When compared to the results with the wild type RAR LBD, the C235A mutation

was found to have negligible effects on the ability of the ligands to induce a

particular SRC-1 NR2 binding affinity. In fact, the overall binding affinity in the

ATRA-bound case was more perturbed by the mutation than the carotenone-bound case. This result indicates that if a covalent interaction occurs between the ligand and receptor, it does not alter the position or dynamics of the ligand in the binding pocket significantly enough to affect coactivator binding. However, taking a closer look at the thermodynamic data, while the energy components of the measured ∆Gs are not significantly different for the ATRA-bound receptors, this is not the case for β-apo-13-carotenone. Binding of the SRC-1 NR2 peptide to

RAR seems to be more entropically favored in the case of the carotenone bound to the mutant receptor. However, more repeat experiments would need to be performed to determine the significance of this observation given the fairly large standard deviation for the calculated ∆S in the case of the β-apo-13-carotenone-

bound wild type receptor. In the wild type case, two of the three measurements

had a lower ∆S while the third, which was performed at a later date, was more

in line with the ∆S values measured for the mutant receptor (see data in Appendix F for details). In the end, these data do not discount the possibility that the

cysteine in the binding pocket is required for the observed potency of β-apo-13-

carotenone.

3.7.2 NMR Experiments

Two nuclear magnetic resonance (NMR) experiments were designed to directly

detect the possible covalent interaction between β-apo-13-carotenone and the

RAR LBD. Based on the proposed mechanism of interaction illustrated in Figure

3.31, the carbonyl carbon of the ligand is expected to transition from sp2 to sp3 hybridization upon formation of the hemithioacetal product. The first experiment employed 13C direct detection using β-apo-13-carotenone with a single 13C label at the carbonyl carbon (Figure 3.35A), while the second experiment was a 2D

ADEQUATE (Adequate DoublE QUAntum Transfer Experiment) using a triply-labeled compound (Figure 3.35B).


Figure 3.35. 13C-labeled β-apo-13-carotenone. A. Singly-labeled compound at carbonyl carbon. B. Triply-labeled compound. 13C sites denoted by red asterisks.

The experiments here were limited to concentrations of 1-1.5 mM due to

instability of the RAR LBD at room temperature and at high concentrations. In

order to decrease the amount of protein precipitation during the experiments, two

surface-exposed cysteine residues were mutated to serine. The mutations were

made to C203 and C336, which are located on the highly mobile L1-3 loop and

the L8-9 loop respectively. Both of these sites are distant from the ligand binding

site and are therefore not expected to influence binding. Using the Stratagene

QuikChange II XL Site-Directed Mutagenesis Kit, the C203S mutation was made

using the following primers:

RARα C203S Forward: TCCCTGCCCTCAGCCAGCTGGGC
RARα C203S Reverse: GCCCAGCTGGCTGAGGGCAGGGA

while the C336S mutation was made using:

RARα C336S Forward: CATCTGCCTCATCAGCGGAGACCGCCA
RARα C336S Reverse: TGGCGGTCTCCGCTGATGAGGCAGATG

RAR constructs with these two surface-exposed cysteine mutations were made

both with and without the C235A mutation (C235 is the residue that covalently

interacts with luffariellolide and is suspected to interact in a similar fashion with β-

apo-13-carotenone). Use of RAR with the two surface mutations resulted in

significantly less observed precipitate after long NMR experiments. NMR

samples were prepared by first transferring the protein into a buffer containing 10

mM Tris-HCl, 150 mM NaCl, 3 mM TCEP, and 20% D2O, pH 7.5 via a PD-10

column. Next, the concentration of the protein was determined using the Bradford

assay and an equimolar amount of β-apo-13-carotenone was added prior to concentrating the sample to a volume of ~500 μl. The data was collected on a Bruker DRX400 NMR spectrometer at ambient temperature. For the

protein samples, 1,200-1,500 scans were performed. The resulting NMR spectra

are presented in Figure 3.36.

Figure 3.36. 13C NMR spectra. A. 2 mM 13C-labeled β-apo-13-carotenone in deuterated methanol. B. Buffer used in protein NMR experiments. C. ~1.5 mM RARα LBD (C203S, C336S) + 13C labeled β-apo-13-carotenone. D. ~1.5 mM RARα LBD (C203S, C336S, C235A) + 13C labeled β-apo-13-carotenone.

Panel A in Figure 3.36 confirms that β-apo-13-carotenone was successfully synthesized with a single 13C label at the carbonyl position given that carbonyl carbons of ketones have a typical 13C chemical shift in the range of 200-210

ppm. The other signal of this spectrum at 49.1 ppm is from the methanol used as

a solvent. For the protein samples (Figure 3.36C-D), one broad peak

corresponding to the protein itself was observed from 170-180 ppm. Additionally,

five large peaks were observed in the range of 60-80 ppm. Based on the NMR

spectrum for the protein buffer (Figure 3.36B), two of the peaks were due to Tris,

while the remaining three peaks were ultimately determined to be from the

glycerol that protects the spin concentrators used during sample preparation.

Later NMR experiments substituted Na2HPO4 for Tris and used thoroughly rinsed concentrators to eliminate these signals.

In the sample in which the binding pocket retained a cysteine at position 235, the carbonyl carbon was not observed (no signal ~200 ppm in Figure 3.36C) and two alternate signals were observed at ~117 and ~137 ppm. However, these may not be actual signals given that their amplitude is not much greater than the noise level. If the covalent interaction is occurring and the carbonyl carbon transitions from sp2 to sp3 hybridization, then the resulting NMR signal would be expected to

be in the range of 70-80 ppm. It is possible, though unlikely, that the peaks from the Tris and

glycerol obscure the new signal. It is also possible

that there is a fast transition between bond formation and bond breaking given

the relatively unstable hemithioacetal product that is expected upon conjugation

of β-apo-13-carotenone to the receptor. This could lead to an observed signal between 200 and 70 ppm, in the range of ~135 ppm.

In the sample with the C235A mutation in the binding pocket, a clear signal at

~202 ppm is present, indicating that the carotenone is present in the sample and

is chemically unaltered (Figure 3.36D). It is possible that an excess of ligand in the sample could be the cause of this signal. Although I was careful to add a 1:1 molar ratio of ligand to protein, errors in measured concentration of either protein or ligand could result in an excess of ligand added to the protein sample.

However, this error would have to be significant, almost unrealistically so, to result in the 200 ppm signal observed in Figure 3.36D. Nevertheless, to correct for this possibility in the ADEQUATE experiment that follows, a slight change was made to the sample preparation. Instead of adding the ligand to the protein sample after transfer into the D2O-containing buffer, the ligand was added first.

This way, any excess ligand should be retained in the PD-10 column.

Based on the presence of the carbonyl signal of β-apo-13-carotenone in the

C235A sample and the disappearance of this signal in the wild-type sample, it is tempting to conclude that the carotenone is being chemically altered by C235. However, in the end, the signal-to-noise ratio is too poor to draw any definitive conclusions. To raise the signal above the noise level, either the sample concentration or the sensitivity must be increased. Unfortunately, hRARα LBD is fairly unstable and precipitates at room temperature and high concentrations. Although the surface cysteine mutations partially alleviate this problem, substantial precipitation was still observed in the NMR samples after overnight experiments.

Direct detection of carbon in NMR experiments generally suffers from lower sensitivity (~6,000-fold) than proton detection. Therefore, in an attempt to increase signal, a set of ADEQUATE experiments was carried out. ADEQUATE

experiments are proton-detected experiments used to correlate pairs of bonded

13C atoms by probing protons that are directly bonded to the 13C atoms. Thus, in

this 2D experiment, cross-peaks arise at the chemical shifts of a proton and a 13C

atom two bonds away in a structure containing 1H-13C-13C. Using the triply 13C-labeled compound illustrated in Figure 3.35B, two proton peaks are expected: one corresponding to the three terminal methyl protons (~2 ppm) and one corresponding to the single proton of the methenyl group adjacent to the keto

carbon (~6 ppm). These signals would form cross-peaks with the ketone carbon

at ~200 ppm on the F1 axis. The ADEQUATE spectrum of the triply-labeled β-apo-13-carotenone at ~1 mM in ethanol is illustrated in Figure 3.37. Here we see the two expected signals; however, the signal-to-noise ratio is not as improved as was hoped, with a particularly weak signal from the methyl protons.

Nevertheless, samples were prepared of the carotenone with the same two protein constructs described above (hRARα LBD C203S,C336S and hRARα

C203S,C336S,C235A LBD). To eliminate the possibility of excess, unbound ligand, the carotenone was added to the protein prior to buffer transfer with the

PD-10 column. The new buffer had the following composition: 10 mM Na2HPO4,

150 mM NaCl, 20% D2O, pH 7.51 to eliminate signals from the Tris used in

previous experiments. Additionally, the Amicon 10,000 MWCO centrifugal

concentrators were thoroughly rinsed prior to concentrating the samples to ~1-

1.5 mM, to eliminate unwanted signals from the glycerol that protects the

concentrator membranes.


The results of the ADEQUATE experiments are presented in Figures 3.38 and

3.39. In both cases, the addition of β-apo-13-carotenone to the protein completely eliminates the signals observed for the unbound ligand, making the spectra uninterpretable.

Figure 3.37. ADEQUATE spectrum of free, triply-labeled β-apo-13-carotenone.


Figure 3.38. ADEQUATE spectrum of triply-labeled β-apo-13-carotenone bound to hRARα LBD (C203S, C336S).


Figure 3.39. ADEQUATE spectrum of triply-labeled β-apo-13-carotenone bound to hRARα LBD (C203S, C336S, C235A).

3.7.3 Mass Spectrometry Experiments

Mass spectrometry was used in an attempt to observe β-apo-13-carotenone

covalently linked to the RARα LBD. Two samples were prepared: one treated


with β-apo-13-carotenone, the other with ATRA. The expected results were that

only the mass of the protein would be observed for the ATRA-treated sample,

while a mass shift corresponding to the mass of the carotenone (258.4 Da) would

be observed in the other sample. Initial experiments revealed mass shifts of +258

and +178 for His-hRARα LBD treated with both ligands. It was ultimately

determined that these shifts were due to gluconoylation (+178 Da) and

phosphogluconoylation (+258 Da) that is often observed in His-tagged proteins

expressed in BL21(DE3) E. coli cell lines [184]. Since the expected mass shift

due to carotenone binding would overlap with the mass shift caused by

phosphogluconoylation, ∆His-RAR was used in subsequent experiments to

achieve cleaner data.

Ultimately, experiments were performed with ∆His-RAR C203S C336S that was

transferred into a solution of 150 mM ammonium acetate, pH ~7.1. The samples

were treated with a three-fold excess of either ATRA or β-apo-13-carotenone,

and incubated overnight. Based on the proposed reversible mechanism of

covalent interaction between β-apo-13-carotenone and RAR (Figure 3.31), the

conjugated hemithioacetal would be stabilized under acidic conditions, therefore

2 μl of formic acid was added to 250 μl of each 100 μM sample in order to shift the potential equilibrium toward the formation of the conjugated carotenone.

Then, to precipitate the protein, 750 μl of chloroform was added to each sample, which was subsequently vortexed and centrifuged, resulting in a protein precipitate sandwiched between an aqueous layer below and a chloroform layer above. It is interesting to note that in an earlier experiment with highly

concentrated NMR samples (1-1.5 mM), the chloroform layer for an ATRA-

treated sample was vibrantly yellow, while the chloroform layer for a β-apo-13-carotenone-treated sample was colorless (both compounds are yellow in solution). Additionally, the protein precipitate for the carotenone-treated sample was noticeably yellow on the side facing the chloroform layer, suggesting that the carotenone remained associated with the protein while ATRA did not.

After discarding the aqueous and chloroform fractions, another 250 μl of ammonium acetate (pH 4.5) and 500 μl of chloroform were added to the sample, which was again vortexed and centrifuged to wash away any unbound ligand.

Finally, the protein precipitate was dried with N2 gas and was resolubilized in 800

μl acetonitrile, 600 μl ammonium acetate, and 60 μl formic acid for a final protein

concentration of ~10 μM. Sonication and vortexing were used to completely resolubilize the pelleted protein, while the relatively high amount of formic acid was required to achieve good ionization. The masses of the samples were measured by electrospray mass spectrometry using a Waters Q-Tof Micro with a capillary voltage of 3,500 V, a source temperature of 80°C, a desolvation temperature of 150°C, a sample cone voltage of 35 V, and a collision energy of 6

V. The resulting spectra are found in Figure 3.40.


Figure 3.40. Mass spectra of hRARα LBD. A. ATRA sample. B. β-apo-13-carotenone sample. C. ∆His-RARα LBD + ATRA sample. D. ∆His-RARα LBD + β-apo-13-carotenone sample.


In addition to the protein+ligand samples, samples of the free ligands were also tested to ensure that they ionize under the same conditions used to observe the proteins. When prepared in 800 μl acetonitrile, 600 μl ammonium acetate, and 60 μl formic acid, both ATRA and β-apo-13-carotenone could be ionized and accurate masses could be measured (Figure 3.40 A and B); the molecular weights of

ATRA and β-apo-13-carotenone are 300.4 Da and 258.4 Da, respectively. As for the two protein samples, the mass spectra appear to be exactly the same except that free ligand is observed in the sample prepared with β-apo-13-carotenone

(Figure 3.40 D) while free ATRA is not observed (Figure 3.40 C).

Using the most prominent peak (m/z = 971.5543) and the adjacent +1 charge state (m/z = 939.2904), the charge state was calculated:

$$p_1 = 939.2904 = \frac{m + z}{z}, \qquad p_2 = 971.5543 = \frac{m + (z - 1)}{z - 1}, \qquad z = \frac{p_2 - 1}{p_2 - p_1}$$

This results in a near-integer charge of 30.0817 for m/z = 939.2904. Calculating the mass of the most prominent peak with z = 29 gives a result of 28146.0747 Da, which compares closely to the 28146.58 Da expected average molecular weight of the ∆His-hRARα C203S C336S LBD.
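The charge-state arithmetic above can be reproduced with a short script. This is a minimal sketch, assuming (as the expressions above do) that each charge is carried by an adduct of mass 1 Da; the function names `charge_from_adjacent_peaks` and `neutral_mass` are illustrative and are not part of any mass spectrometry package.

```python
def charge_from_adjacent_peaks(p_low, p_high, adduct_mass=1.0):
    """Charge z of the lower-m/z peak, given the adjacent peak at charge z - 1.

    Follows from m = z * (p_low - a) = (z - 1) * (p_high - a),
    where a is the mass added per charge (assumed to be 1 Da here).
    """
    return (p_high - adduct_mass) / (p_high - p_low)


def neutral_mass(mz, z, adduct_mass=1.0):
    """Neutral mass of a species observed at m/z with integer charge z."""
    return z * (mz - adduct_mass)


p1 = 939.2904  # peak carrying charge z
p2 = 971.5543  # most prominent peak, carrying charge z - 1

z_estimate = charge_from_adjacent_peaks(p1, p2)
z_p1 = round(z_estimate)          # nearest integer charge for p1
mass = neutral_mass(p2, z_p1 - 1)  # mass from the most prominent peak

print(f"z(p1) ~ {z_estimate:.4f}, mass = {mass:.4f} Da")
```

Passing `adduct_mass=1.00728` (the proton mass) instead of 1 Da would be the natural refinement if higher accuracy were needed.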

Although a mass shift was not observed that corresponded to the carotenone-conjugated receptor, these experiments show that β-apo-13-carotenone remained present in the protein sample after two chloroform washes while ATRA was fully washed away. Given the similar structure of the two compounds, we


would expect similar results from the chloroform washes. This leads us to suspect that β-apo-13-carotenone survived the washes due to the proposed covalent conjugation with the receptor and that the ligand was liberated from the protein during ionization, resulting in the observation of free β-apo-13-carotenone with the receptor.

3.7.4 Crystallography Screening

There are numerous NR crystal structures deposited in the Protein Data Bank

(PDB). Typically, these structures are for isolated DNA-binding domains (DBDs) or ligand-binding domains (LBDs). Currently there are 17 structures that include a RAR LBD. All three isoforms are represented: five RARα, three RARβ, and nine RARγ LBD structures are present. The five structures of RARα LBD are a mix of monomers and heterodimers cocrystallized with both agonists and antagonists. A summary of the existing RARα LBD crystal structures is found in

Table 3.6.


Table 3.6. Existing hRARα LBD crystal structures.

Release year  PDB ID  hRARα    His  mRXRα    His  RAR ligand                         RXR ligand             Coactivator peptide  Corepressor peptide
2000          1DKF    182-416  N    230-462  -    BMS614 (antagonist)                Oleic acid             --                   --
2010          3KMZ    176-421  Y    --       --   BMS493 (inverse agonist)           --                     --                   N-CoRNR1 2047-2065
2010          3KMR    176-421  Y    --       --   AM580 (agonist)                    --                     SRC-1 NR2 686-698    --
2010          3A9E    153-421  Y    228-467  -    ATRA (agonist)                     LG100754 (antagonist)  TIF-2 686-698        --
2012          4DQM    182-415  Y    --       --   Luffariellolide (partial agonist)  --                     SRC-1 NR4 1429-1441  --


Table 3.7. Conditions tested for β-apo-13-carotenone and RARα crystallization.

hRARα*       His  hRXRα        His  β-apo-13-carotenone  SRC-1 NR2        Chemical conditions                                                     Temperature  Total # of conditions
13.19 mg/ml  -    13.19 mg/ml  -    6-fold excess        --               Hampton Crystal Screen 1 & 2, PEG/Ion 1 & 2, Wizard I, II, III, and IV  room temp    384
6.34 mg/ml   -    6.34 mg/ml   -    6-fold excess        --               Hampton Crystal Screen 1 & 2, PEG/Ion 1 & 2, Wizard I, II, III, and IV  room temp    384
12.25 mg/ml  -    12.25 mg/ml  -    6-fold excess        6-fold excess    Hampton Crystal Screen 1 & 2, PEG/Ion 1 & 2, Wizard I, II, III, and IV  room temp    384
7.17 mg/ml   -    7.17 mg/ml   -    6-fold excess        6-fold excess    Hampton Crystal Screen 1 & 2, PEG/Ion 1 & 2, Wizard I, II, III, and IV  room temp    384
12.71 mg/ml  -    --           -    3-fold excess        --               Hampton Crystal Screen 1 & 2, PEG/Ion 1 & 2, Wizard I, II, III, and IV  room temp    384
6.42 mg/ml   -    --           -    3-fold excess        --               Hampton Crystal Screen 1 & 2, PEG/Ion 1 & 2, Wizard I, II, III, and IV  room temp    384
10.3 mg/ml   -    --           -    3-fold excess        3-fold excess    Hampton Crystal Screen 1 & 2, PEG/Ion 1 & 2, Wizard I, II, III, and IV  room temp    384
5.0 mg/ml    -    --           -    3-fold excess        3-fold excess    Hampton Crystal Screen 1 & 2, PEG/Ion 1 & 2, Wizard I, II, III, and IV  room temp    384
10.0 mg/ml   Y    --           -    2-fold excess        3-fold excess    JCSG Core Suite I, II, III, and IV                                      room temp    384
5.3 mg/ml    Y    --           -    2-fold excess        2.6-fold excess  JCSG Core Suite I, II, III, and IV                                      room temp    384
10 mg/ml     Y    --           -    2-fold excess        3-fold excess    JCSG Core Suite I, II, III, and IV                                      4°C          384
5.3 mg/ml    Y    --           -    2-fold excess        2.6-fold excess  JCSG Core Suite I, II, III, and IV                                      4°C          384

* ∆His-hRARα contained C203S C336S mutations


A total of 4,608 conditions (48 96-well plates) were set up in order to try to

crystallize β-apo-13-carotenone in complex with the RARα LBD. In addition to

different chemical conditions, variables that were altered included using hRARα

LBD (residues 176-421) both with and without an N-terminal His-tag as a

monomer or as a heterodimer with ∆His-hRXRα (residues 214-452), at room

temperature or 4°C, at higher or lower concentrations, or with or without a

coactivator peptide (SRC-1 NR2 residues 686-698). A complete summary of the

tested conditions is presented in Table 3.7.

The crystallography trials were set up in 96-well plates using the hanging drop

vapor diffusion technique in which 300 or 400 nl of sample was diluted 1:1 with

the reservoir solution on a plate seal. The plates were set up with a Mosquito

machine from TTP Labtech, using reservoir volumes of 100 μl. No crystals were

observed.

3.8 Conclusions

The experiments performed here indicate that both β-apo-14’-carotenoic acid and

β-apo-13-carotenone induce hRARα to bind the coactivator peptide with an affinity greater than observed for the unliganded receptor but less than the affinity of an ATRA-bound receptor. Based on these observations, the two β-apocarotenoids are more accurately categorized as RAR partial agonists instead of antagonists.


The origin of the high affinity of β-apo-13-carotenone for RARs is likely due to a covalent interaction occurring between the ligand and a conserved cysteine in the

RAR ligand-binding pocket. While not conclusive, NMR and mass spectrometry studies suggest that this interaction is taking place.


Chapter 4. Computational Investigation of Retinoic Acid Receptor Antagonism

4.1 Introduction

Where Chapter 3 details the experimental work performed to characterize the

antagonistic properties of β-apo-13-carotenone and β-apo-14’-carotenoic acid, the following chapter continues probing the same questions, but with a computational approach. Methods such as docking, molecular dynamics (MD)

simulations, and free energy analysis are employed to help explain how the two

aforementioned apocarotenoids modulate retinoic acid receptor (RAR) activity at

a molecular level. Additionally, long-timescale MD simulations of the apo RARα

LBD were performed to evaluate the mobility of helix 12 in the context of

investigating the ‘mouse trap model’ proposed for NR activation.

4.2 Ligand Parameterization

In order to carry out MD simulations of the RARα LBD in complex with the

compounds tested experimentally via ITC (Section 3.6), force field parameters

were prepared to properly describe the bond and angle equilibrium distances and

force constants so that the dynamics of the ligands would be properly simulated.

In all cases, the general AMBER force field (GAFF) was used to describe the


bond, angle, and dihedral parameters [195]. Partial atomic charges were

calculated using the restrained electrostatic potential (RESP) method [196] as

facilitated by the R.E.D. vIII Tools [197], in which a single conformation of each

ligand was reoriented three individual times to achieve a set of highly reproducible charges. More details on the benefits of using R.E.D. Tools for the derivation of partial atomic charges are given in Section 5.4.3.2.

To derive the charges for all-trans retinoic acid (ATRA), 4-[(E)-2-(5,6,7,8-tetrahydro-5,5,8,8-tetramethyl-2-naphthalenyl)-1-propenyl]benzoic acid (TTNPB), β-apo-13-carotenone, and β-apo-14’-carotenoic acid, initial structures were required. Coordinates for ATRA were extracted from the crystal structure of a hRARα/mRXRα LBD heterodimer binding ATRA and a synthetic antagonist (PDB

ID:3A9E [180]), while coordinates for TTNPB were extracted from the crystal structure of hRARβ in complex with TTNPB (PDB ID:1XAP [198]). Coordinates of the two β-apo-carotenoids were obtained through modification of ATRA using

Maestro (Schrödinger, LLC). Protons were added to each ligand followed by energy minimization in Gaussian 03 at the HF/6-31G* level. The optimized structures were then imported into R.E.D., which utilized Gaussian to calculate

molecular electrostatic potentials (MEPs) using the ‘RESP-A1’ model (HF/6-

31G*; Connolly surface algorithm; two-stage RESP fit qwt=0.0005/0.001). Each

molecule was reoriented three times and the final charges were fit by linear

regression to assure the proper formal charge on the molecule (-1 in the case of

all ligands except β-apo-13-carotenone which is neutral). The three carotenoids


were reoriented based on the C6, C7, and C8 atoms, while TTNPB was

reoriented based on the two atoms forming the double bond linker in addition to

the adjoining atom of the bicyclic ring system.

Finally, library files for each ligand were created using the antechamber module

of the AMBER suite of programs. All atom types and bonded parameters were from GAFF. Any missing parameters were either matched from existing parameters for similar atom types or empirically derived with the parmchk program. The GAFF atom types and derived partial atomic charges for each ligand are presented in Figures 4.1-4.4. Additionally, the library files (.prepi and .frcmod) required for MD simulations of RAR in complex with the ligands are included in Appendices G-J.


Figure 4.1. Structure of all-trans retinoic acid (ATRA). GAFF atom types (middle) and partial atomic charges (bottom) used in MD simulations.


Figure 4.2. Structure of TTNPB. GAFF atom types (middle) and partial atomic charges (bottom) used in MD simulations.


Figure 4.3. Structure of β-apo-14'-carotenoic acid. GAFF atom types (middle) and partial atomic charges (bottom) used in MD simulations.


Figure 4.4. Structure of β-apo-13-carotenone. GAFF atom types (middle) and partial atomic charges (bottom) used in MD simulations.

4.2.1 Parameterization of β-apo-13-carotenone Covalently Linked to Cysteine

Based on molecular dynamics simulations of β-apo-13-carotenone bound to the

RARα, it was observed that the carbonyl carbon of the ligand was in close proximity to the sulfur atom of C235 in the binding pocket. This led to the hypothesis that β-apo-13-carotenone could be gaining its apparent high


affinity for the receptor through covalent linkage with this cysteine residue. To

test how this covalent linkage would affect the dynamics of the RARα LBD, a new

modified cysteine amino acid library file was created.

Based on the orientation of β-apo-13-carotenone in the hRARα binding pocket

from the MD simulation analyzed in Section 4.5.2, the formation of the (S)-isomer appeared most probable. Therefore, force field parameters for the (S)-isomer were calculated and the new amino acid was given the three-letter identifier CCS

(Figure 4.5A).

The first step in parameterizing a non-standard amino acid is to obtain an optimized geometry of the structure of interest. Typically, multiple conformations of the same residue would be considered for determining the partial atomic charges since these charges change with conformation. However, in the case of β-apo-13-carotenone covalently linked to the hRARα LBD we do not expect any conformational flexibility of the side chain due to the enclosed nature of the binding pocket. Therefore, a single conformation similar to that observed in the MD simulation of β-apo-13-carotenone in the hRARα binding pocket was deemed sufficient for determining the partial atomic charges of CCS. The method used for determining the partial charges for CCS was the same as described for the ligands in Section 4.2, with slight modifications. The CCS molecule was capped with an acetyl group at its N-terminus and an N-methyl group at its C-terminus to create an Ac-CCS-Nme dipeptide. This followed the procedure used to determine the amino acid charges in the popular AMBER 94/99 force fields that are still in


use today [199]. This capping served to terminate the CCS amino acid in a more natural way and avoided the formal charges carried by –NH3+ and –COO− amino acid termini. After geometry optimization of the dipeptide at the HF/6-31G* level, charge fitting was implemented in R.E.D. with several restraints. While the caps remained on the CCS residue during charge fitting, they would ultimately be removed to leave an amino acid that can be linked in a polypeptide chain. To ensure an overall neutral CCS molecule that is compatible with the other amino acids of the AMBER 94/99 force field, the acetyl and N-methyl groups were forced to have a total charge of 0. Additionally, the backbone amide atoms were fixed to specific values determined by Cieplak et al. [199]. This follows the

AMBER philosophy of consensus backbone charges used for all amino acids.

Specifically, three sets of backbone charges are implemented in AMBER force fields: one for each set of residues that carry either a 0, +1, or -1 net charge. In the case of CCS, the neutral set of amide charges were used. Finally, the amide atoms of the N- and C- terminal caps were also fixed to specific charges determined for the AMBER 94/99 force fields. The final charges computed for

CCS are tabulated in Appendix K.

The remainder of the amino acid parameterization relied on a correct choice of atom types to describe the chemical nature of the molecule. Most of the bond, angle, and dihedral parameters that describe the connectivity of CCS could be adopted from atom types available in the ff99 and GAFF parameter sets. In particular, all possible CCS atoms that were chemically equivalent to atoms


found in cysteine were adopted for CCS and use the corresponding ff99 atom

types and parameters. Likewise, the same GAFF atom types used to simulate β-apo-13-carotenone were used to describe the corresponding portion of CCS.

Only one force field parameter was completely missing from the ff99 and GAFF data sets, while three others were found to deviate from the geometry of the optimized structure. These missing or inaccurate parameters were found at the stereogenic center formed by the conjugation of β-apo-13-carotenone to cysteine.

The sulfur atom, formerly of SH atom type, was converted to the S atom type that is found in methionine, as it better describes the connectivity of the atom. Beyond the sulfur atom, away from the backbone, CCS adopts GAFF atom types (GAFF atom types are all lower case, while the ff99 atom types are capitalized). The tetrasubstituted carbon is given the name ‘CD’ and has a c3 atom type.

Connected to ‘CD’ are the hydroxyl group (‘OE’ and ‘HOE’; atom types oh and ho, respectively), the methyl group (‘CE’, ‘HE1’, ‘HE2’, and ‘HE3’; atom types c3, hc, hc, and hc, respectively), and the beginning of the conjugated hydrocarbon chain (‘C1’ and ‘H1’; atom types c2 and ha, respectively).

One particular angle parameter describing a sulfur atom bonded to a sp3 carbon which is in turn bonded to a hydroxyl oxygen atom (ss-c3-oh) is not found in either the GAFF or ff99 parameter sets. Therefore new parameters were derived for this angle according to the methods used in the development of GAFF [195].

One slight change implemented here is that equilibrium angles were adopted


from a geometry optimized at the MP2/6-311+G(d,p) level of theory instead of the MP2/6-31G* level that was used in the development of GAFF. This change would likely result in more accurate parameterization due to the increased size of the basis set.

The molecule used to parameterize the missing angle parameter, 2-

(methylthio)but-3-en-2-ol (1), is illustrated in Figure 4.5B. After geometry optimization at the MP2/6-311+G(d,p) level of theory, the six angles that describe the geometry of the tetrasubstituted carbon atom were compared to the available parameters in either GAFF or ff99 (Table 4.1).

Table 4.1. Comparison of calculated angles describing the tetrasubstituted carbon of compound 1.

Angle      MP2/6-311+G(d,p) angles (°)  GAFF angles (°)  ∆θ
oh-c3-c3   105.158                      109.43           +4.272
oh-c3-ss   105.895                      110.05*          +4.155
oh-c3-c2   109.950                      109.72           -0.23
c2-c3-ss   114.519                      104.97           -9.549
ss-c3-c3   111.391                      112.69           +1.299
c3-c3-c2   109.624                      110.96           +1.336

* angle computed by empirical approach implemented in Antechamber.


Figure 4.5. Structure of CCS amino acid and 2-(methylthio)but-3-en-2-ol (1). A. Structure of CCS amino acid with the atom numbering used in the topology file. B. Minimal molecule, 1, used to describe the linkage of β-apo-13-carotenone with cysteine. Labels are for the mixed ff99/GAFF atom types used in the CCS parameter set.

As shown in Table 4.1, the oh-c3-ss angle parameter is missing from GAFF, but can be calculated using an empirical approach implemented by the parmchk program found in the AMBER suite of programs. However, the equilibrium angle calculated by parmchk is greater than 4° from the angle calculated using the ab

initio approach. The oh-c3-c3 and c2-c3-ss angles also deviate

substantially from the ab initio results, by +4.272° and -9.549°,

respectively.

Finally, the c3-ss-c3 angle was also shown to deviate -4.415° from the ab initio

result. The four equilibrium angles (θ) that deviate more than 2° from the existing

or empirically calculated AMBER force field parameters were changed to the

newly calculated angles and the corresponding force constants (kθ) were refit to better describe the behavior of the new CCS amino acid.
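The selection of angles for refitting can be illustrated with a few lines of code. This is a minimal sketch that takes the equilibrium angles quoted in Tables 4.1 and 4.2 and flags those whose force field value differs from the MP2/6-311+G(d,p) value by more than 2°; the dictionary layout and function names are illustrative only.

```python
# (force field or empirically estimated angle, MP2/6-311+G(d,p) angle), in degrees
ANGLES = {
    "oh-c3-c3": (109.43, 105.158),
    "oh-c3-ss": (110.05, 105.895),
    "oh-c3-c2": (109.72, 109.950),
    "c2-c3-ss": (104.97, 114.519),
    "ss-c3-c3": (112.69, 111.391),
    "c3-c3-c2": (110.96, 109.624),
    "c3-ss-c3": (99.92, 104.335),
}

THRESHOLD_DEG = 2.0  # deviation above which an angle is refit


def deviations(table):
    """Force field angle minus ab initio angle for each entry."""
    return {name: ff - qm for name, (ff, qm) in table.items()}


def needs_refit(table, threshold=THRESHOLD_DEG):
    """Names of the angles whose absolute deviation exceeds the threshold."""
    return [name for name, d in deviations(table).items() if abs(d) > threshold]


refit = needs_refit(ANGLES)
for name in refit:
    print(f"{name}: {deviations(ANGLES)[name]:+.3f} deg")
```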


The new force field parameters use the equilibrium angles from the MP2/6-

311+G(d,p) calculation and force constants were fit against vibrational frequencies determined at MP2/6-311+G(d,p) that were scaled by 0.9496 as

suggested by Scott and Radom [200]—a method used in GAFF parameter development. Partial atomic charges for 1 were determined using the R.E.D.

method in which the molecule was reoriented three times to reduce positional

biasing of the final charges. The force constants, kθ, in GAFF range from 60.6 to

68.3 kcal/mol/rad2 for the angles of interest, and were used as starting points in a systematic search for molecular mechanics (MM) force constants that reproduce

the quantum mechanical (QM) vibrational frequencies. The MM vibrational

frequencies were calculated using the nmode program in AMBER after the test molecule had been energy minimized to where the root-mean-square of the

Cartesian elements of the energy gradient was less than 1e-04 kcal/mol/Å (drms

= 0.0001).

The force constants were optimized by scanning test parameters in increments of

0.1 kcal/mol/rad2. The optimized parameters, listed in Table 4.2, decreased the

RMSD between the MM nmode vibrations and the scaled QM vibrational frequencies from 121.367 cm-1 for the GAFF parameters to 119.587 cm-1.
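The scanning procedure can be sketched as a one-dimensional grid search. The example below is only an illustration under stated assumptions: `toy_mm_frequencies` and its numbers are invented stand-ins for a normal-mode calculation (such as the nmode runs described above) and are not data from this work, and only a single force constant is varied, whereas the fit described here involved several angles.

```python
import math

SCALE = 0.9496  # scaling factor applied to the QM frequencies, as quoted above


def rmsd(calc, ref):
    """Root-mean-square deviation between two equal-length frequency lists."""
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(calc, ref)) / len(ref))


def scan_force_constant(mm_frequencies, qm_frequencies, k_min, k_max, step=0.1):
    """Scan k on a regular grid and return (best_k, best_rmsd).

    mm_frequencies: callable mapping a force constant to a list of MM
    frequencies (cm^-1); here a hypothetical placeholder for a normal-mode run.
    """
    target = [SCALE * f for f in qm_frequencies]
    n_steps = int(round((k_max - k_min) / step))
    best_k, best_rmsd = None, float("inf")
    for i in range(n_steps + 1):
        k = round(k_min + i * step, 6)
        score = rmsd(mm_frequencies(k), target)
        if score < best_rmsd:
            best_k, best_rmsd = k, score
    return best_k, best_rmsd


def toy_mm_frequencies(k):
    """Invented model: one mode scales with sqrt(k), one mode is unaffected."""
    return [300.0 * math.sqrt(k / 50.0), 1000.0]


# Synthetic "QM" frequencies constructed so that the known answer is k = 42.3.
toy_qm = [f / SCALE for f in toy_mm_frequencies(42.3)]
best_k, best_rmsd = scan_force_constant(toy_mm_frequencies, toy_qm, 0.0, 120.0)
print(best_k, best_rmsd)
```

In a real fit, the callable would wrap an energy minimization followed by a normal-mode calculation for each trial parameter set.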


Table 4.2. Optimized angles and force constants for the tetrahedral linkage of β-apo-13-carotenone to cysteine.

           GAFF                                     MP2/6-311+G(d,p)
Angle      θeq (°)  kθ (kcal/mol/rad2)     Angle      θeq (°)  kθ (kcal/mol/rad2)
oh-c3-c3   109.43   67.7                   oh-c3-c3   105.158  30.0
oh-c3-ss   110.05   61.9                   oh-c3-S    105.895  78.5
c2-c3-ss   104.97   63.6                   c2-c3-S    114.519  0.0
c3-ss-c3   99.92    60.6                   CT-S-c3    104.335  110.5

Interestingly, a kθ value of 0.0 for the c2-c3-S angle allowed for the best fit between QM and MM vibrational frequencies. Without a force constant, there is no energy penalty to maintain the 114.519° θeq angle that was calculated for 1.

Therefore, in MD simulations, the geometry for this angle will be dictated by the van der Waals and electrostatic forces of surrounding atoms. After 25 ns of MD simulation of the hRARα LBD with the CCS residue, the c2-c3-S angle converged to an average value of 111.18° with a standard deviation of 5.25°.

This value is more in line with the ab initio result of 114.52° than the equivalent

θeq value of 104.97° found in GAFF. Therefore, the lack of force constant to maintain the c2-c3-S angle does not result in a significant deviation from the desired θeq. Simulations of the hRARα LBD with CCS using both the GAFF parameters and the optimized parameters based on the ab initio calculations were carried out to measure the effect that the new parameterization had on the geometry of the CCS residue. Average θ values for the reparameterized angles are reported in Table 4.3. These values have been averaged over 25 ns simulations that were sampled at 50 ps intervals. Figure 4.6 illustrates the raw


data for the c2-c3-S measurements and shows that the data is relatively converged after 25 ns of sampling. Overall, the new force field parameters keep the geometry of the CCS amino acid more in line with the QM-optimized geometry. Notably, the CT-S-c3 angle in the GAFF simulation deviates significantly from the θeq value (measured value = 105.55°, θeq = 99.92°). With a

kθ value of 60.6 kcal/mol/rad2, this results in an average internal energy penalty of 0.585 kcal/mol for this angle alone, an 11-fold increase from the energy penalty obtained with the new parameters presented here. The preferred CT-S-c3 value in the GAFF simulation is nearly identical to that of the MD simulation with the ab initio-optimized parameter with θeq = 104.34°, further supporting the geometry optimized at the MP2/6-311+G(d,p) level of theory.
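The quoted energy penalty can be checked with the harmonic angle term. This is a minimal sketch, assuming the AMBER functional form E = kθ(θ − θeq)² with no factor of one half, and evaluating the energy at the average angle rather than averaging the energy over the trajectory; the input values are those listed in Tables 4.2 and 4.3, and the function name is illustrative.

```python
import math


def angle_energy(k_theta, theta_deg, theta_eq_deg):
    """Harmonic angle energy (kcal/mol) for k_theta in kcal/mol/rad^2.

    Assumes the AMBER-style form E = k * (theta - theta_eq)^2.
    """
    delta_rad = math.radians(theta_deg - theta_eq_deg)
    return k_theta * delta_rad ** 2


# CT-S-c3 angle: average MD angle versus equilibrium value
gaff_penalty = angle_energy(60.6, 105.55, 99.92)      # GAFF parameters
refit_penalty = angle_energy(110.5, 105.57, 104.335)  # refit parameters
ratio = gaff_penalty / refit_penalty

print(f"GAFF:  {gaff_penalty:.3f} kcal/mol")
print(f"Refit: {refit_penalty:.3f} kcal/mol")
print(f"Ratio: {ratio:.1f}")
```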

Table 4.3. Comparison of angle measurements from MD simulations using GAFF or MP2/6-311+G(d,p)-optimized parameters.

           GAFF                     MP2/6-311+G(d,p)
Angle      θeq (°)  MD θave (°)     θeq (°)  MD θave (°)
oh-c3-c3   109.43   107.10          105.158  103.15
oh-c3-S    110.05   106.85          105.895  104.59
c2-c3-S    104.97   105.97          114.519  111.18
CT-S-c3    99.92    105.55          104.335  105.57


Figure 4.6. Measurement of the c2-c3-S angle from MD simulations using parameters refit to ab initio calculations or using GAFF parameters. Top: raw data of angle measurements sampled at 50 ps intervals (light color) with a moving average of the data (dark color). Bottom: Running average of the measurements over 25 ns of MD simulation. Blue: parameterization based on MP2/6-311+G(d,p) calculations, Red: GAFF parameters.

The final library and parameter files required to implement the CCS amino acid in

AMBER simulations are supplied in Appendices L and M.

4.3 Ligand Docking to RARα LBD

The structures of β-apo-13-carotenone and β-apo-14’-carotenoic acid are very similar to the structure of ATRA, with the unsaturated chain of the former being two carbons shorter and the latter two carbons longer. Therefore, it is likely that the compounds bind to RAR in a similar manner as the endogenous ligand, for

which a crystal structure is available. To confirm the binding modes, docking

experiments were performed with both AutoDock 4 [201] and Glide (Schrödinger,

Inc.).

AutoDock dockings were performed with affinity grids 70x90x80 points in

dimension using a 0.375 Å grid point spacing and centered at the known agonist

binding site. Ligands were prepared with Gasteiger charges using

AutoDockTools (ADT) and were then docked to the receptor using the

Lamarckian genetic algorithm, which was limited to either 50,000,000 energy

evaluations or 100,000 generations. Each ligand was docked 100 independent

times and the resulting docking poses were clustered by all-atom RMSD with a

1.5 Å tolerance.
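The clustering step can be sketched as follows. This is a simplified greedy scheme written for illustration, not AutoDock's own code: poses are visited from lowest to highest energy and each one joins the first cluster whose seed lies within the RMSD tolerance. The function names and the toy coordinates are assumptions, and no superposition is applied because docked poses share the receptor frame.

```python
import math

def rmsd(a, b):
    """All-atom RMSD between two poses given as equal-length lists of (x, y, z)."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

def cluster_poses(poses, energies, tolerance=1.5):
    """Greedy clustering: visit poses from lowest to highest energy and add each
    to the first cluster whose seed lies within `tolerance` (in Å)."""
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters = []  # each cluster is a list of pose indices; element 0 is the seed
    for i in order:
        for members in clusters:
            if rmsd(poses[i], poses[members[0]]) <= tolerance:
                members.append(i)
                break
        else:
            clusters.append([i])  # no existing cluster is close enough
    return clusters

# Toy data: three two-atom "poses"; the first two differ by 0.5 Å, the third is far away.
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(9.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
]
energies = [-9.0, -8.8, -8.4]
clusters = cluster_poses(poses, energies)
print(clusters)
```

Seeding clusters with the lowest-energy poses means the reported cluster order follows energy rather than size, which is why size and energy ranks are compared separately when interpreting results.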

Dockings performed using software from Schrödinger, Inc. used the Glide XP

scoring function [202]. The receptor was prepared with the Protein Preparation

Wizard, placing protons to optimize hydrogen bonds while the overall receptor

structure was optimized with the OPLS force field.

The human RARα LBD used for docking and subsequent MD simulations came

from the crystal structure of a human RARα / murine RXRα LBD heterodimer in which ATRA was bound to the RAR LBD and the RXR LBD was bound to the synthetic antagonist LG100754 ((2E,4E,6Z)-3-methyl-7-(5,5,8,8-tetramethyl-3-propoxy-5,6,7,8-tetrahydronaphthalen-2-yl)octa-2,4,6-trienoic acid) (PDB ID:

3A9E [180]). Of the five RARα LBD structures available in the PDB, 3A9E was selected for two major reasons. First, the structure is completely resolved, unlike


many other NR LBDs, which often contain an unresolved region in the loop

connecting helices 1 and 3. Additionally, the RARα LBD from 3A9E is bound to

both ATRA and a coactivator peptide, which is helpful for the study of the two

antagonistic carotenoids that are the focus of this work.

4.3.1 ATRA Docking

Initially, the ability of the docking algorithms to recreate the binding mode of

ATRA was tested by stripping ATRA from the binding site of PDB ID: 3A9E and

re-docking it. Using heavy atom RMSD with respect to the crystallographic coordinates as a metric to gauge successful redocking, the Glide XP result was superior to the AutoDock results. The single Glide pose was very similar to the crystallographic binding mode, with an RMSD of 1.07 Å, while the best AutoDock pose had an RMSD of 1.29 Å and was part of a 13 pose cluster with an average

RMSD of 1.45 Å (0.15 Å standard deviation). In interpreting AutoDock results, typically either the largest cluster or the lowest energy cluster is selected for closer analysis, with a large, low-energy cluster being the optimal result. In the case of ATRA docking to RARα, neither the largest cluster nor the lowest energy cluster was the most similar to the experimental binding mode. As listed in Table

4.4, the AutoDock cluster with the smallest average RMSD from the experimental mode ranked third for size and second for energy. For this reason, the Glide XP docking modes for β-apo-13-carotenone and β-apo-14’-carotenoic acid were selected as the starting points for MD simulations.


Table 4.4. ATRA docking to RARα LBD.

AutoDock Clusters   Poses   Docking Score*   RMSD from 3A9E* (Å)
Cluster 1           47      -9.04 ± 0.17     2.44 ± 0.13
Cluster 2           20      -8.86 ± 0.10     1.87 ± 0.26
Cluster 3           13      -8.98 ± 0.10     1.45 ± 0.15
Cluster 4            8      -8.44 ± 0.05     2.52 ± 0.09
Cluster 5            3      -8.68 ± 0.02     2.33 ± 0.04
Cluster 6            2      -8.84 ± 0.03     2.67 ± 0.05
Cluster 7            2      -8.37 ± 0.03     9.19 ± 0.10
Cluster 8            1      -8.39            2.74
Cluster 9            1      -8.33            2.59
Cluster 10           1      -8.26            2.84
Cluster 11           1      -8.24            2.68
Cluster 12           1      -7.30            9.24
Glide XP Pose        1      --               1.07

* For AutoDock clusters with more than one member, scores and RMSD are presented as averages ± one standard deviation; poses were clustered with an RMSD tolerance of 1.5 Å
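The size and energy ranks of the cluster closest to the crystallographic pose can be recomputed from the averages in Table 4.4. The snippet below is only an illustrative check; the tuple layout and the helper name are assumptions made for the example.

```python
# (cluster id, number of poses, mean docking score, mean RMSD from 3A9E in Å), from Table 4.4
clusters = [
    (1, 47, -9.04, 2.44), (2, 20, -8.86, 1.87), (3, 13, -8.98, 1.45),
    (4, 8, -8.44, 2.52), (5, 3, -8.68, 2.33), (6, 2, -8.84, 2.67),
    (7, 2, -8.37, 9.19), (8, 1, -8.39, 2.74), (9, 1, -8.33, 2.59),
    (10, 1, -8.26, 2.84), (11, 1, -8.24, 2.68), (12, 1, -7.30, 9.24),
]

def rank_of(cluster_id, key):
    """1-based rank of a cluster after sorting all clusters with `key`."""
    ordered = sorted(clusters, key=key)
    return [c[0] for c in ordered].index(cluster_id) + 1

closest = min(clusters, key=lambda c: c[3])[0]       # smallest mean RMSD
size_rank = rank_of(closest, key=lambda c: -c[1])    # largest cluster first
score_rank = rank_of(closest, key=lambda c: c[2])    # most negative score first
print(closest, size_rank, score_rank)
```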


Figure 4.7. Docking modes of ATRA. A. Experimental structure of ATRA (blue) bound to RARα LBD (ribbon) from PDB ID:3A9E. Several features are highlighted, including helix 3 (magenta), helix 11 (orange), helix 12 (purple), and a coactivator peptide (green). B. Comparison of crystallographic ATRA position (blue) to Glide XP docking mode (pink), and AutoDock mode with overall lowest RMSD (green, from Cluster 3). C. Same as B, but with the AutoDock mode with lowest RMSD from the largest cluster (green).

As illustrated in Figure 4.7B, the representative AutoDock binding mode from

cluster 3 (RMSD = 1.29 Å) is a better match to the experimental binding mode at the tetramethylcyclohexene portion of the molecule than the Glide XP docking mode. However, the Glide mode is nearly identical near the carboxy terminus of

ATRA, resulting in the overall lower RMSD of 1.07 Å. Figure 4.7C illustrates a representative pose from the largest AutoDock cluster (RMSD = 2.16 Å) which is very similar to the Cluster 3 pose, except that the cyclohexene ring orientation is incorrect.


4.3.2. β-apo-13-carotenone Docking

Docking of β-apo-13-carotenone was more straightforward with much more

agreement between AutoDock and Glide as illustrated in Figure 4.8. Of the 100

AutoDock docking runs, 69 were clustered together in the lowest energy cluster,

which matched the Glide XP docking mode except for the orientation of the

ketone oxygen. Analysis of the β-apo-13-carotenone AutoDock results is found in

Table 4.5. Compared to the experimental binding mode of ATRA, the orientation is very similar in both the ring and unsaturated tail regions.

Figure 4.8. Docking modes of β-apo-13-carotenone. β-apo-13-carotenone docked to the RARα LBD (ribbon) in which part of helix 3 (magenta) is removed for clarity. Other features of the RAR LBD highlighted are helix 11 (orange) and helix 12 (purple). The Glide XP docking mode (pink) is shown in addition to representatives from the largest AutoDock cluster (green). The experimental binding mode of ATRA (blue) is shown for reference.


Table 4.5. Cluster analysis of β-apo-13-carotenone docked to RARα LBD.

AutoDock Clusters   Poses   Docking Score*   RMSD from Glide XP Pose* (Å)
Cluster 1           69      -7.62 ± 0.04     0.90 ± 0.15
Cluster 2           28      -7.59 ± 0.18     2.38 ± 0.05
Cluster 3            3      -7.41 ± 0.01     2.78 ± 0.06

* Scores and RMSDs are presented as averages ± one standard deviation; poses were clustered with an RMSD tolerance of 1.5 Å

Analysis of the β-apo-13-carotenone docking mode revealed that the carbonyl

carbon of the ligand comes within 3.74 Å of the sulfur atom of C235 (Figure

4.9A), suggesting a possible mechanism by which β-apo-13-carotenone binds with high affinity to the RARs. This cysteine, which is positioned on helix 3 of the receptor, is conserved in all three human RARs. In the RXRs, however, this position is substituted by a glutamine.

RARα  224  LWDKFSELSTKCIIKTVEFAK  244
RARβ  217  LWDKFSELATKCIIKIVEFAK  237
RARγ  226  LWDKFSELATKCIIKIVEFAK  246
RXRα  264  PVTNICQAADKQLFTLVEWAK  284
RXRβ  335  PVTNICQAADKQLFTLVEWAK  355
RXRγ  265  PVTNICHAADKQLFTLVEWAK  285

Figure 4.9. Proximity of β-apo-13-carotenone to C235. A. Docking modes of β-apo-13- carotenone with both Glide XP (pink) and AutoDock 4 (green) to the RARα LBD suggest a close proximity between the carbonyl carbon of the ligand and the sulfur atom of C235 on helix 3 of the receptor (magenta). The crystallographic position of ATRA is shown for reference (blue). B. Sequence alignment of all three human RARs show that the cysteine is conserved, while it is substituted by a glutamine in the three human RXRs.
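The conservation pattern in panel B can be read off programmatically. The sketch below assumes the six segments are aligned without gaps, as they appear in the figure, and locates the alignment column of C235 from the RARα numbering; the variable names are illustrative.

```python
# Helix 3 segments as listed in Figure 4.9B: (first residue number, sequence)
segments = {
    "RARα": (224, "LWDKFSELSTKCIIKTVEFAK"),
    "RARβ": (217, "LWDKFSELATKCIIKIVEFAK"),
    "RARγ": (226, "LWDKFSELATKCIIKIVEFAK"),
    "RXRα": (264, "PVTNICQAADKQLFTLVEWAK"),
    "RXRβ": (335, "PVTNICQAADKQLFTLVEWAK"),
    "RXRγ": (265, "PVTNICHAADKQLFTLVEWAK"),
}

# Alignment column of C235, counted from the start of the RARα segment
start, _ = segments["RARα"]
column = 235 - start

# Residue found in that column for each receptor (ungapped alignment assumed)
residue_at_column = {name: seq[column] for name, (_, seq) in segments.items()}
for name, residue in residue_at_column.items():
    print(name, residue)
```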

4.3.3. β-apo-14’-carotenoic Acid Docking

Docking of β-apo-14’-carotenoic acid was less trivial due to its extended tail.

Since the endogenous RAR ligand, ATRA, perfectly fills the binding cavity, β-apo-14’-carotenoic acid, which is two carbons longer than ATRA, can only dock to the RARα LBD due to a tunnel through which the carboxy tail may protrude, as illustrated in Figure 4.10.


Figure 4.10. β-apo-14'-carotenoic acid extending outside the RAR binding cavity. A. Glide XP docking mode of β-apo-14’-carotenoic acid (pink) to hRARα LBD (ribbon). Helix 3 (magenta), helix 11 (orange), and helix 12 (purple) of the receptor are highlighted and the crystallographic binding mode of ATRA (blue) is included for reference. B. Same as A, but with surface representation of receptor to highlight the tunnel allowing β-apo-14’-carotenoic acid to extend outside of the binding cavity.

The AutoDock cluster that is most similar to the Glide XP docking mode is

Cluster 5, which contained only eight of the 100 generated poses. Of the clusters

containing more than one member, Cluster 5 ranks fourth out of nine when

sorted by docking score. The details of the AutoDock results for β-apo-14’-carotenoic acid are presented in Table 4.6 and representative docking modes are illustrated in Figure 4.11.

Table 4.6. Cluster analysis of β-apo-14'-carotenoic acid docked to RARα LBD.

AutoDock Clusters   Poses   Docking Score*   RMSD from Glide XP Pose* (Å)
Cluster 1           33      -8.84 ± 0.20      2.88 ± 0.28
Cluster 2           26      -8.81 ± 0.09      2.41 ± 0.13
Cluster 3           10      -8.98 ± 0.17      2.22 ± 0.15
Cluster 4            9      -6.70 ± 0.15     12.42 ± 0.07
Cluster 5            8      -8.34 ± 0.17      1.27 ± 0.15
Cluster 6            4      -7.60 ± 0.12     11.03 ± 0.01
Cluster 7            3      -8.11 ± 0.10      2.87 ± 0.04
Cluster 8            3      -6.65 ± 0.07     12.16 ± 0.09
Cluster 9            2      -5.92 ± 0.08     12.73 ± 0.00
Cluster 10           1      -8.58             2.26
Cluster 11           1      -8.26             3.69

* Scores and RMSDs are presented as averages ± one standard deviation; poses were clustered with an RMSD tolerance of 1.5 Å


Figure 4.11. Docking models of β-apo-14'-carotenoic acid. AutoDock cluster representatives (green) from Cluster 1 (A), Cluster 2 (B), Cluster 3 (C), Cluster 4 (D), Cluster 5 (E), and Cluster 6 (F) with Glide XP docking pose (pink) and crystallographic ATRA binding mode (blue).

As detailed in Section 4.4.4, while the docking modes of β-apo-14’-carotenoic acid extend outside of the binding pocket, MD simulations of these docking modes quickly result in an induced fit effect in which the ligand fully enters the binding pocket. Upon entering the pocket, a salt bridge is formed between the carboxy terminus of the ligand and R276, similar to the interaction observed with

ATRA.


4.3.5 Preparation of RAR-TTNPB Complex

For MD simulation of RAR in complex with the super agonist TTNPB, docking

was not performed. Since a crystal structure of RARβ LBD exists in complex with

TTNPB (PDB ID: 1XAP [198]) and the RARα and RARβ LBDs are highly similar

(88% identity, 94% similarity, and no gaps), TTNPB was introduced into the

RARα binding pocket of 3A9E via superposition and coordinate transfer.

4.4 Molecular Dynamics Simulations of RARα Complexes

Molecular dynamics simulations of the RARα LBD in complex with a number of

ligands were performed to understand the way in which small molecules act as

agonist or antagonists for the receptor. In total, 19 long-timescale (1+ μs) MD simulations were performed for 18 different RARα states, as detailed in Table

4.7.

Table 4.7. List of RARα LBD simulations performed.

Ligand                          Coactivator Peptide   Simulation length
ATRA                            --                    1 μs
TTNPB                           --                    1 μs
β-apo-14’-carotenoic acid       --                    1 μs
β-apo-13-carotenone             --                    1 μs
Covalently linked carotenone    --                    1 μs
Apo (no ligand)                 --                    2 × 5 μs
ATRA                            SRC-1 NR2             1.5 μs
TTNPB                           SRC-1 NR2             1.5 μs
β-apo-14’-carotenoic acid       SRC-1 NR2             1.5 μs
β-apo-13-carotenone             SRC-1 NR2             1.5 μs
Covalently linked carotenone    SRC-1 NR2             1.5 μs
Apo (no ligand)                 SRC-1 NR2             1.5 μs
ATRA                            SRC-2 NR2             1 μs
TTNPB                           SRC-2 NR2             1 μs
β-apo-14’-carotenoic acid       SRC-2 NR2             1 μs
β-apo-13-carotenone             SRC-2 NR2             1 μs
Covalently linked carotenone    SRC-2 NR2             1 μs
Apo (no ligand)                 SRC-2 NR2             1 μs

In addition to observing structural effects of ligand binding, another purpose of

the MD simulations was to determine if computed free energies of binding

between ligand-bound RAR and the SRC-1 NR2 coactivator peptide correlate

with the experimental values measured in Section 3.6. Initially, the simulations

used the coactivator peptide that was cocrystallized with the RARα LBD in 3A9E.

It was later realized that this was actually the SRC-2 NR2 peptide and not the

SRC-1 NR2 peptide that was used in the experiments. The peptides are very similar to each other, but not identical:

SRC-1 NR2: COCH3-RHKILHRLLQEGS-NCH3
SRC-2 NR2: COCH3-KHKILHRLLQD-NCH3
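The differences between the two peptides can be listed with a short comparison. This is a minimal sketch that assumes the sequences align from their first residue without gaps and ignores the terminal caps.

```python
src1 = "RHKILHRLLQEGS"  # SRC-1 NR2, the peptide used in the experiments
src2 = "KHKILHRLLQD"    # SRC-2 NR2, the peptide first used in the simulations

# Position-by-position comparison over the shared length (no gaps assumed)
substitutions = [
    (i + 1, a, b) for i, (a, b) in enumerate(zip(src1, src2)) if a != b
]
extra = src1[len(src2):]  # residues present only in the longer SRC-1 peptide

print(substitutions)
print(extra)
```

Under that assumption the peptides differ by two conservative substitutions (R/K at position 1 and E/D at position 11) plus the two additional C-terminal residues of SRC-1 NR2.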

Therefore, the SRC-1 peptide was modeled into the coregulator binding pocket

and new MD trajectories were produced so that the simulated complex would

match the experimental conditions as closely as possible. Although the

simulations with the SRC-2 peptide are not ideal for correlating with the

experimental thermodynamic data, they still may offer interesting information

about ligand-bound receptor dynamics on μs timescales and are therefore included in some of the analysis that follows.

4.4.1 General Structural Preparation

In all cases the starting structure was that of the human RARα LBD in monomeric

form as taken from the hRARα/mRXRα LBD crystal structure (PDB ID: 3A9E) bound to the endogenous RAR ligand, all-trans retinoic acid (ATRA), and the synthetic RXRα antagonist, LG100754 ((2E,4E,6Z)-3-methyl-7-(5,5,8,8-tetramethyl-3-propoxy-5,6,7,8-tetrahydronaphthalen-2-yl)octa-2,4,6-trienoic acid)

[180]. RARα residues 177-415 were used in all simulations, in which the N- and

C- termini were capped with acetyl- and N-methyl- groups respectively. The

Protein Preparation Wizard of Maestro (Schrödinger, LLC) was used to add

protons, adjust ASN/GLN and HIS side chain orientation, and to optimize

hydrogen bonding networks prior to MD simulation. Additionally, PROPKA, as

implemented in Maestro, was used to determine the protonation state of titratable

side chains [203]. MD simulations were performed with the following histidine

protonation states: HID195, HID298, and HIE372 on the LBD and HID1472 and

HID1476 on the coactivator peptide, where HID denotes protonation on the δ1 nitrogen of the imidazole ring, while HIE denotes protonation on the ε2 nitrogen.

The ATRA-bound RAR LBD of 3A9E was cocrystallized with a 13-residue SRC-2

NR2 coactivator peptide that is similar to, but not exactly the same as, the peptide used in the ITC experiments. Therefore, to simulate receptor-coactivator complexes that most closely match those formed experimentally, the SRC-1 NR2 peptide was modeled in place of the SRC-2 NR2 peptide. Because the SRC-2 NR2 peptide in 3A9E was only 11 residues long, two residues of SRC-1 NR2 were added de novo with Modeller 9v8 [106].

4.4.2 MD Protocol

Using the LEaP module of the AMBER suite of programs, solutes were solvated in a truncated octahedron of TIP3P water that created a 12 Å buffer between the solute and the edges of the water box. To allow simulations with periodic


boundary conditions, the unit cell was charge neutralized with Na+ ions. Across all simulations, an average of ~10,300 water molecules were added, giving an average system size of ~34,900 atoms. Prior to simulation, each system was energy minimized with 500 steps of steepest descent minimization followed by

1500 steps of conjugate gradient minimization. Next, the systems were equilibrated by simulating for 200 ps during which the temperature was raised from 0 to 300 K. During this phase, the protein was held in place by harmonic restraints with a force constant of 50 kcal/mol/Å², which allowed the water to equilibrate around the surface of the protein, removing any cavities that may have existed at the solute/solvent interface. Finally, the restraints on the protein were released and the systems were simulated under isothermal-isobaric (NPT) conditions. The temperature was maintained at 300 K with the weak-coupling algorithm in

AMBER using a 2 ps coupling constant, while the pressure was maintained at 1 atm with isotropic position scaling and a pressure relaxation time of 2 ps.

Nonbonded interactions used a cutoff of 8 Å, and electrostatic energies were calculated with the Particle Mesh Ewald (PME) method [37]. The SHAKE algorithm was used to constrain bonds involving hydrogen atoms, enabling a 2 fs timestep.

Long time-scale simulations (≥ 1 μs) were performed using the parallelized,

GPU-enabled version of the pmemd (Particle Mesh Ewald Molecular Dynamics) code (pmemd.cuda.MPI) [23,24]. This implementation of the MD code produces greatly accelerated output, allowing for microsecond trajectories to be computed.


On a workstation running AMBER 12 on two GTX-680 GPUs in parallel, the

explicitly solvated ~35,000 atom RAR systems averaged output rates of ~42

ns/day, while newer workstations with GTX-780 cards perform ~15-20% faster.

4.4.3 Simulation Stability

As an initial assessment of the receptor stability, backbone atom RMSDs were

calculated for the MD simulations with respect to the starting structure of the

production runs (post-equilibration). In processing the MD trajectories, receptor conformations were collected every 250 ps, resulting in 4000 conformations/μs.

These snapshots were fit to the starting structure based on the backbone atoms

(C, N, CA, O) of residues belonging to core secondary structural elements. This included helices 1, 3, 4, 5, 7, and 8 as illustrated in Figure 4.12A. This selection purposefully did not include helices 10 and 12 as they are known to adopt alternate conformations when bound to certain ligands. When calculating the per residue RMSDs of the trajectories, inclusion of H10 and H12 during the fitting process could partially dampen the calculated RMSDs in these regions, making any structural changes that may occur in this region less apparent.
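The snapshot counts used throughout this analysis follow directly from the 250 ps sampling interval and the 2 fs timestep of the MD protocol. The short calculation below is only a bookkeeping sketch; the constant and function names are illustrative.

```python
TIMESTEP_FS = 2          # integration timestep from the MD protocol
SAVE_INTERVAL_PS = 250   # spacing between analyzed snapshots

def frames(sim_length_us):
    """Number of snapshots obtained from a trajectory of the given length (μs)."""
    return int(sim_length_us * 1_000_000) // SAVE_INTERVAL_PS

# Number of MD steps that elapse between two analyzed snapshots
steps_between_frames = SAVE_INTERVAL_PS * 1000 // TIMESTEP_FS
print(frames(1.0), frames(1.5), steps_between_frames)
```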


Figure 4.12. RAR LBD structural alignment and mobility. RARα LBD bound to ATRA (red) and coactivator peptide (purple). A. Helices used for structural alignment of MD snapshots (green). B. Most dynamic regions of the RAR LBD (orange). C. Proximity of L8-9 and L9-10 to RXR dimerization partner (blue).


Initial RMSD plots also indicated that the loop joining helices 8 and 9 (L8-9) exhibited significant deviation from both the starting and average structures. This deviation extended to the N-terminal portion of H9, therefore H9 was also excluded from the fitting process.

As illustrated in Figure 4.13, the LBDs are overall very stable on the μs timescale, with average backbone RMSDs ranging from 1.46 to 2.93 Å across the 18 trajectories analyzed. While some complexes deviate from the starting structures more than others, the overall range of average RMSDs is narrow and no significant trends could be inferred from these data.

Figure 4.13. RARα LBD backbone RMSD. Backbone RMSD for receptors bound to SRC-1 NR2 peptide (top), SRC-2 NR2 peptide (middle), or no peptide (bottom) and in complex with ATRA (red), TTNPB (orange), β-apo-14’-carotenoic acid (green), β-apo-13-carotenone (blue), covalently linked β-apo-13-carotenone (teal), or no ligand (purple).


Taking a closer look at the RMSD data by measuring the average RMSD from

the initial or overall average structures on a per residue basis provides a better

idea of which regions are fluctuating the most. As shown in Figures 4.14-17, in

addition to the N- and C-termini, residues in the L1-3, L8-9, and L9-10 loop

regions are observed to fluctuate the most (Figure 4.12B). The fluctuation of the

L1-3 loop is unsurprising, as this region is often unresolved in many RAR crystal

structures, suggesting that this loop is conformationally dynamic or unstructured.

The deviation of the N-terminus is also expected, since this region is normally

joined to the linker domain and the artificial truncation of the LBD leaves the N-

terminus unrestrained. Similarly, the C-terminus is artificially truncated. Where

the experiments described in Chapter 3 use residues 176-421, the simulated

LBDs use 175-415 due to terminal residues that were not resolved in 3A9E. The

full-length RARα terminates at residue 462, including a 41-residue F domain with

unknown function.

The short L9-10 loop contains S369, which is a known phosphorylation target of

mitogen- and stress-activated protein kinase 1 (MSK1). Phosphorylation at this

site enhances the ability of cyclin H to bind to the L8-9 loop, nearly 40 Å away,

which then allows for the association of cdk7 with cyclin H (components of

transcription factor II H (TFIIH)) and the subsequent phosphorylation of S77 in

the A/B domain, ultimately leading to enhanced promoter binding [204,205]. The

communication between these two sites has been studied computationally, in


which three 50 ns MD simulations of each form (phosphorylated and

unphosphorylated) of the RARα LBD were performed. The results of these

simulations showed no overall structural changes as a result of phosphorylation,

however more subtle local changes were observed such as salt bridge

reorientation and helical elongation that could help explain the allosteric effect

that phosphorylation has on cyclin H binding [206].

Neither the L8-9 nor the L9-10 loop directly contacts the ligand- or coactivator-binding sites; the loops lie ~17 Å and ~45 Å away from the coactivator

binding site respectively. Given that part of L8-9 is concealed by the RXR LBD in

the functional RAR-RXR heterodimer, this leaves little room for cyclin H binding.

Thus, it is unlikely that the large ~160 kDa coactivator proteins also interact with

this loop. Alternately, it is likely that when both are bound to the RARα LBD, the

37 kDa cyclin H and p160 coactivators interact with each other. However, in spite

of the lack of close proximity with the coactivator binding site, it cannot be discounted that these mobile loops modulate coactivator binding through an allosteric effect.

The L1-3 loop, on the other hand, is within close proximity to the ligand binding site, making contacts with R278 and the β-hairpin that forms the boundary of the binding site and interacts with the carboxy terminus of ATRA. This loop deviates the most in simulation with β-apo-14’-carotenoic acid. As discussed in the following section (Section 4.4.4), the tip of the β-hairpin of the LBD must extend away from the core of the receptor to accommodate the longer carotenoid in the


binding pocket. This seems to destabilize the L1-3 loop to a greater extent than observed for the LBD bound to the other ligands tested.

Figure 4.14. Average per residue RMSDs with respect to starting structure (1). All-atom RMSD with respect to the starting structure for each residue averaged over 1 μs simulations sampled every 250 ps. Simulations were either with a SRC-1 NR2 coactivator peptide (light color) or without a peptide (dark color) for RARα in complex with ATRA (red), TTNPB (orange), or β-apo-14’-carotenoic acid (green).

Figure 4.15. Average per residue RMSDs with respect to starting structure (2). All-atom RMSD with respect to the starting structure for each residue averaged over 1 μs simulations sampled every 250 ps. Simulations were either with a SRC-1 NR2 coactivator peptide (light color) or without a peptide (dark color) for RARα in complex with β-apo-13-carotenone (blue), covalently-bound carotenone (teal), or no ligand (purple).

Figure 4.16. Average per residue RMSDs with respect to the average structure (1). All-atom RMSD with respect to the average structure for each residue averaged over 1 μs simulations sampled every 250 ps. Simulations were either with a SRC-1 NR2 coactivator peptide (light color) or without a peptide (dark color) for RARα in complex with ATRA (red), TTNPB (orange), or β-apo-14’-carotenoic acid (green).

Figure 4.17. Average per residue RMSDs with respect to the average structure (2). All-atom RMSD with respect to the average structure for each residue averaged over 1 μs simulations sampled every 250 ps. Simulations were either with a SRC-1 NR2 coactivator peptide (light color) or without a peptide (dark color) for RARα in complex with β-apo-13-carotenone (blue), covalently-bound carotenone (teal), or no ligand (purple).

Measuring the average RMSD of the RAR ligands with respect to the average

structure of the MD simulations reveals an interesting trend in that compounds

that destabilize coactivator peptide binding the most also deviate the most from

the average structure. This held true for all three sets of simulations, as shown in

Table 4.8.

Table 4.8. All-atom RMSD of ligands with respect to average structure.

Ligand                          SRC-1          SRC-2          No coactivator
TTNPB                           0.62 (0.23)    0.61 (0.16)    0.71 (0.26)
ATRA                            0.68 (0.17)    0.73 (0.18)    0.74 (0.15)
β-apo-14’-carotenoic acid       0.92 (0.46)    0.95 (0.53)    0.92 (0.45)
β-apo-13-carotenone             1.41 (0.45)    1.40 (0.56)    1.77 (0.41)
Covalent β-apo-13-carotenone    0.87 (0.36)    0.77 (0.19)    1.17 (0.33)

Data averaged over 6000 data points from 1.5 μs simulations with SRC-1 NR2 peptide and 4000 from 1.0 μs simulations with SRC-2 NR2 peptide or no peptide

4.4.4 Induced fit MD of β-apo-14’-carotenoic acid

As discussed in Section 4.3.3, β-apo-14’-carotenoic acid was only able to dock to

the ATRA-bound RARα LBD crystal structure by extending through a tunnel on

the surface of the receptor that connects the binding cavity to the solvent. To

examine how stable these docking modes were, short MD simulations of four of

the AutoDock poses were performed. After 2.5 ns simulations, all four of these

docking modes fully entered the binding site. The carboxy terminus of the β-apo-14’-carotenoic acid molecules all formed salt bridges with R276, similar to the binding mode of ATRA, in addition to forming an interaction with S287 on the β-hairpin loop. As illustrated in Figure 4.18, the RAR binding pocket was able to accommodate the longer β-apo-14’-carotenoic acid due to an opening motion of the β-hairpin.


Figure 4.18. Induced fit MD of β-apo-14'-carotenoic acid. Initial AutoDock docking cluster representative of β-apo-14’-carotenoic acid from cluster 1 (orange, A), cluster 2 (yellow, B), cluster 3 (green, C), and cluster 7 (purple, D), docked to the RARα LBD (grey ribbon) in which helix 3 (magenta), helix 12 (purple), and the coactivator peptide (green) are highlighted. Part of helix 3 is removed for clarity. The receptor+ligand conformation after 2.5 ns of MD simulation is shown in light blue. E. Superposition of docking modes. F. Superposition of ligand conformations after 2.5 ns of MD simulation.


As discussed in Section 4.4.3, it was observed that the L1-3 loop was notably more dynamic in the β-apo-14’-carotenoic acid-bound simulations. A closer examination of these simulations reveals that two hydrogen bonds between the

L1-3 loop and the receptor are disrupted upon β-apo-14’-carotenoic acid binding.

These are between the side chains of Y208 on the L1-3 loop and D288 on the tip of the β-hairpin and between S213 of L1-3 and D221 of H3 (Figure 4.19A). As shown in Figures 4.20 and 4.21, of the three sets of simulations (with SRC-1

NR2, SRC-2 NR2, and no peptide) with six different liganded states, the two aforementioned hydrogen bonds are prominently formed in all cases except when bound to β-apo-14’-carotenoic acid. Not only are the hydrogen bonds broken, but L1-3 completely rearranges into an alternate conformation (Figure

4.19B), which breaks a β-bridge that forms between Y208 and β2 of the β-hairpin. In all three simulations of RARα bound to β-apo-14’-carotenoic acid, the loop deforms similarly, discounting the chance that a rare event is being sampled in the simulations.


Figure 4.19. Deformation of L1-3 loop upon β-apo-14'-carotenoic acid binding. A. Representative structure of ATRA-bound hRARα. B. Representative structure of β-apo-14’-carotenoic acid-bound hRARα. C. Superposition of panels A and B, rotated slightly. Bound ligands are represented as spheres. Unlabeled are S287 and R276 which interact with the ligand.

Figure 4.20. Y208-D288 distance. Distance data between Y208@OH and D288@OD1 or OD2 is from three simulations (with SRC-1 NR2, with SRC-2 NR2, with no coactivator peptide) for each ligand: ATRA (red), TTNPB (orange), β-apo-14’-carotenoic acid (green), β-apo-13-carotenone (blue), covalently linked β-apo-13-carotenone (teal), no ligand (purple).

Figure 4.21. S213-D221 distance. Distance data between S213@OG and D221@OD1 or OD2 is from three simulations (with SRC-1 NR2, with SRC-2 NR2, with no coactivator peptide) for each ligand: ATRA (red), TTNPB (orange), β-apo-14’-carotenoic acid (green), β-apo-13-carotenone (blue), covalently linked β-apo-13-carotenone (teal), no ligand (purple).

4.4.5 Analysis of β-apo-13-carotenone Simulations

Docking poses of β-apo-13-carotenone to the RAR LBD suggested that the

carbonyl carbon of the ligand may come into contact with the sulfur atom on the side chain of C235, leading to the formation of a covalent bond between the

ligand and receptor. Such an interaction could explain the highly favorable binding affinity of β-apo-13-carotenone for the receptor, which is otherwise unexplainable given that this ligand is shorter than ATRA, and lacks the carboxy terminus to interact with R276, yet maintains similar potency.

The distance between the carbonyl carbon of β-apo-13-carotenone and the sulfur atom of C235 was measured over the three μs-timescale carotenone-bound RARα MD simulations, as illustrated in Figure 4.22. Based on the van der Waals radii for

sulfur and carbon of 2.000 Å and 1.908 Å, respectively, as implemented in the

ff99SBildn force field, the two atoms would be contacting each other at an

interatom distance of 3.908 Å. Significant populations of the sampled

conformations fall within this range, indicating that β-apo-13-carotenone is indeed

poised for interaction with C235. The overall average C-S distances were

4.21±0.57 Å, 4.05±0.36 Å, and 4.91±0.98 Å, for simulations including SRC-1

NR2, SRC-2 NR2, and no coactivator peptide respectively. In the simulation without a peptide, the carotenone was noticeably less stable in the pocket, resulting in a bimodal distribution of C-S distances, while the presence of a

peptide limited the ligand fluctuations in the binding pocket, shifting the equilibrium towards conformations in which the ligand was closely situated to

C235.
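The contact criterion used here is simply the sum of the two van der Waals radii, and the population of close conformations is the fraction of sampled distances at or below that sum. The sketch below illustrates the bookkeeping; the distance list is a placeholder for illustration and is not simulation data, and the function name is an assumption.

```python
R_VDW_S = 2.000  # Å, sulfur radius quoted in the text
R_VDW_C = 1.908  # Å, carbon radius quoted in the text
CONTACT = R_VDW_S + R_VDW_C  # distance at which the two spheres touch

def contact_fraction(distances, cutoff=CONTACT):
    """Fraction of sampled C-S distances (Å) at or below the contact distance."""
    return sum(d <= cutoff for d in distances) / len(distances)

# Placeholder distances for illustration only; these are not simulation data.
example = [3.6, 3.8, 4.1, 4.4, 5.2]
print(round(CONTACT, 3), contact_fraction(example))
```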


Figure 4.22. Distance between β-apo-13-carotenone and C235 of RAR LBD. A. Sample MD conformation with short distance between sulfur atom of C235 and carbonyl carbon of β-apo-13-carotenone. B. Sample MD conformation with long distance between sulfur atom of C235 and carbonyl carbon of β-apo-13-carotenone. C. Distance between sulfur atom of C235 and carbonyl carbon of β-apo-13-carotenone bound in three different simulations: with SRC-1 NR2 peptide (cyan), with SRC-2 NR2 peptide (yellow), and with no coactivator peptide (red).

4.5. Free Energy Analysis

Methods used to compute the interaction energy between a ligand and its receptor range from crude and quick to thorough yet costly. On one end of the spectrum, molecular docking may be employed to quickly determine if one ligand


interacts more or less favorably with a receptor than another. These energies are

typically based on empirically-weighted, physics-based scoring functions with

errors in the range of 2-3 kcal/mol. On the other end of the spectrum are free

energy perturbation (FEP) and thermodynamic integration (TI) techniques that

are theoretically more rigorous, yet very costly calculations due to slowly converging energy values. In this section, the molecular mechanics Poisson-

Boltzmann surface area (MM-PBSA) approach is applied to calculate relative free

energy differences for coactivator peptide binding to the ligand-bound receptors.

As the name indicates, MM-PBSA is a MD-based technique, which likens it to

FEP and TI. However, where FEP and TI calculate free energy changes in

explicitly solvated environments, MM-PBSA attempts to avoid the difficulty in

measuring small energy changes in a noisy water environment by considering

solvation effects implicitly. Specifically, the polar component of solvation is

calculated using the Poisson-Boltzmann equation, while the nonpolar component

is modeled as a linear function of the solvent accessible surface area of the solute. More details on the

MM-PBSA method are described in Section 1.2.

In two early studies from the Kollman group in which the method was developed,

MM-PBSA was used to calculate the binding affinity of both small molecules and

peptides to protein targets. In a study by Wang and Kollman, the interaction

energy between HIV-1 reverse transcriptase (RT) and a number of small

molecule inhibitors was evaluated. A “satisfactory” rank order was achieved for

the energies of 12 analogs, with an average unsigned error of 0.97 kcal/mol


between the computed and experimental energies [207]. In a second report,

Massova and Kollman were able to reproduce binding affinities for the 12-residue N-terminal α-helix of the p53 tumor suppressor binding to the oncoprotein Mdm2 in good agreement with experimental values [208]. Both of

these studies calculated the nonpolar part of the solvation free energy, Gnp, as:

Gnp = SASA*γ - β

where the solvent accessible surface area (SASA) was scaled by a factor

γ = 0.00542 kcal/(mol·Å²) with an offset of β = 0.92 kcal/mol. These same

parameters are used in the following application of the MM-PBSA method.

4.5.1 Entropy Calculations and Sampling Considerations

One of the most difficult components of binding free energy to calculate is the

entropic component, -T∆S. In the MM-PBSA protocol, a standard way to estimate the entropy is through normal mode analysis (NMA). NMA estimates the entropy via a summation over the 3N-6 vibrational normal modes of a system with N atoms using the harmonic oscillator approximation, in which the vibrational frequencies are obtained from the eigenvalues of the mass-weighted 3N×3N Hessian matrix.
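The summation over normal modes can be illustrated with the standard harmonic oscillator expression for the vibrational entropy, S = R Σ [x/(e^x - 1) - ln(1 - e^-x)] with x = hcν/(kB·T). The following is a minimal sketch, not the implementation used in this work; the function name `vibrational_ts` and the choice of wavenumbers in cm^-1 as input are assumptions made for illustration.

```python
import math

# Physical constants (exact SI values; R converted to kcal/(mol K)).
H_PLANCK = 6.62607015e-34    # J s
C_LIGHT_CM = 2.99792458e10   # cm/s
K_BOLTZMANN = 1.380649e-23   # J/K
R_GAS = 1.98720425864e-3     # kcal/(mol K)


def vibrational_ts(wavenumbers_cm, temperature=298.15):
    """Return T*S_vib in kcal/mol for harmonic modes given in cm^-1.

    Hypothetical helper: each mode contributes independently, so the
    total is a plain sum over the supplied (nonzero) frequencies.
    """
    s_over_r = 0.0
    for nu in wavenumbers_cm:
        # Dimensionless ratio of the vibrational quantum to thermal energy.
        x = H_PLANCK * C_LIGHT_CM * nu / (K_BOLTZMANN * temperature)
        s_over_r += x / math.expm1(x) - math.log1p(-math.exp(-x))
    return R_GAS * temperature * s_over_r


# Low-frequency modes dominate the entropy: compare a soft and a stiff mode.
soft = vibrational_ts([100.0])
stiff = vibrational_ts([1000.0])
```

Because low-frequency modes contribute most of the entropy, small changes in the soft collective motions of a protein can shift the computed -T∆S noticeably, which is one reason careful minimization before NMA matters.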

Due to the harmonic oscillator approximation, the Hessian must be computed at an equilibrium geometry (stationary point) to avoid errors. Errors that arise from

computing the entropy via normal mode analysis at non-equilibrium geometries

are due to the uniform spacing of an infinite number of vibrational levels within

the harmonic oscillator approximation. In a physical system, bonds will eventually


break when stretched too far, making the number of vibrational levels finite.

Furthermore, the actual vibrational levels are not evenly spaced, but grow closer

together with increased vibrational quantum number [209].

Both normal mode calculations and the thorough energy minimization required to

reduce NMA errors are costly calculations. Therefore, tests were carried out to

determine the maximal sampling rate that could be used to reduce the number of

MD snapshots required to achieve averages with acceptable errors. Two different methods were tested on the first 250 ns of MD simulation of the SRC-2 NR2 peptide binding to ATRA-bound hRARα. First, the entropies were computed for the entire system (full LBD + peptide), then an alternate method was tested as proposed by Kongsted and Ryde [210]. In the Kongsted/Ryde approach, entropies are calculated via NMA using only the region within 8 Å of the ligand.

During energy minimization in this approach, an additional buffer region

extending to a 12 Å sphere around the ligand (including water molecules) is considered. The buffer region between 8 and 12 Å is kept frozen during the energy minimization to prevent large distortions in the geometry that may occur

due to the artificial truncation of the system. The following implementation of the

Kongsted/Ryde approach was altered slightly, in that explicit water molecules

were not considered during the energy minimization. Instead, a Generalized Born implicit solvation model (the Hawkins, Cramer, Truhlar implementation, GBHCT, igb=1) was used during the minimization to approximate the effect of water [211].


When comparing the entropy calculations for the full receptor/peptide complex to the calculations for the 8 Å protein sphere around the peptide, there appears to be a systematic decrease in the –T∆S values for the truncated system as shown in

Figure 4.23. However, measuring the deviation between the two techniques

(Figure 4.23, bottom) shows that this difference can vary over a wide energy range. When comparing the two methods over the 5,000 snapshots of the 250 ns simulation, there is an average 6.61 kcal/mol difference between the calculated

T∆S values with a standard deviation of 5.95 kcal/mol. If the same difference is calculated between the moving average data of the computed entropies (the representation plotted in Figure 4.23) the resulting value is nearly identical, at

6.60 kcal/mol, but with a much lower standard deviation of 1.08 kcal/mol. The most important part of the comparison of the two methods is the range of deviation and the standard deviation. Since MM-PBSA is more a relative free energy method than a means to calculate absolute free energies, the overall average difference between use of a full or truncated receptor in calculating entropies is not very important. However, for the truncated method to be reliable, the difference should be nearly constant so as to not introduce additional error into the overall ∆G calculations. Given the wide range of entropy differences between the two methods, it is concluded that the truncated receptor approach as implemented here does not reliably reproduce the entropy from the full system, and therefore should not be used.
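The comparison described above (mean and standard deviation of the per-snapshot difference, before and after smoothing) can be sketched as follows. This is an illustrative outline only: the helper names and the toy numbers are assumptions, and the centered window that is clipped at the trajectory ends is one of several reasonable smoothing conventions.

```python
import statistics


def moving_average(values, half_window):
    """Centered moving average; the window is clipped at both ends."""
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def difference_stats(full, truncated):
    """Mean and sample SD of the per-snapshot difference full - truncated."""
    diffs = [f - t for f, t in zip(full, truncated)]
    return statistics.mean(diffs), statistics.stdev(diffs)


# Toy data (hypothetical values, not taken from the simulations).
full_system = [10.0, 12.0, 8.0, 10.0]
truncated_system = [4.0, 4.0, 4.0, 4.0]

raw_mean, raw_sd = difference_stats(full_system, truncated_system)
smooth_mean, smooth_sd = difference_stats(
    moving_average(full_system, 1), moving_average(truncated_system, 1)
)
```

Smoothing leaves the mean difference essentially unchanged while shrinking its standard deviation, which mirrors the behavior reported for the raw and moving-average data.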


Figure 4.23. Entropy comparison between full and truncated systems. Top: Entropy calculations (-T∆S) using the complete hRARα LBD (blue) and a truncated receptor within 8 Å of the peptide (red). Data is smoothed with a moving average using a ±50 data point window. Bottom: Entropy difference between the two methods.

For entropy calculations using the full receptor, tests were performed over the

first 250 ns of MD simulation of the SRC-2 NR2 peptide binding to an ATRA-bound hRARα LBD. Conformations were sampled every 50 ps, for a total of

5,000 individual conformations. Results for the entropy calculations using the full

LBD are shown in Figure 4.24. Tests were also performed to determine the

amount of potential error introduced into the ∆S calculations when reduced data

sets were considered. As illustrated in the lower two panels of Figure 4.24, only

slight deviations from the full 5,000-point data set occur when the sampling interval is increased from 50 ps (full data set) to 100 ps.

Overall, sampling intervals of 100, 150, 200, 250, 500, and 1000 ps were tested. The average deviation from the full data set was subset dependent. For example, when sampling at 250 ps intervals (every fifth data point), the full data set may be divided into five reduced subsets whose averages deviate by 0.14 to 0.21 kcal/mol from the average calculated entropy for the full data set. In general, as the sampling interval increases, so does the range of deviations between the average energies calculated from reduced and full data sets. The ranges of deviations for both calculated entropies and MM-PBSA energies are compared to the averages for the full data set in Table 4.9 and Figure 4.25. Ultimately, a sampling interval of 250 ps was chosen because it reduced the number of calculations needed to compute the binding free energy over 1 μs simulations while maintaining minimal deviation from energies calculated from a more densely sampled trajectory.
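The subset test described above can be sketched in a few lines: a series sampled every 50 ps is split into interleaved subsets (a stride of 5 corresponds to 250 ps), and the average of each subset is compared with the average of the full series. The function name and the toy input are assumptions for illustration.

```python
def interleaved_subset_deviations(values, stride):
    """Absolute deviation of each interleaved subset mean from the full mean.

    Subset k holds values[k], values[k + stride], values[k + 2*stride], ...
    so a stride of s yields s distinct reduced data sets.
    """
    full_mean = sum(values) / len(values)
    deviations = []
    for offset in range(stride):
        subset = values[offset::stride]
        deviations.append(abs(sum(subset) / len(subset) - full_mean))
    return deviations


# Toy series standing in for per-snapshot energies (hypothetical values).
series = [float(i) for i in range(10)]
dev_stride_2 = interleaved_subset_deviations(series, 2)
dev_stride_5 = interleaved_subset_deviations(series, 5)
```

The minimum and maximum of the returned list correspond to the "Minimum" and "Maximum" deviation columns reported for each sampling interval.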


Figure 4.24. Computed entropic component, T∆S, of SRC-2 NR2 binding to ATRA-bound hRARα LBD. Top: Data collected at 50 ps intervals over 250 ns of MD simulation (grey) and a moving average of the raw data with a ±25 data point window (black). Middle: Running average of the raw data (black) with running averages of data sampled every 100 ps (blue, points 1,3,5…; green, points 2,4,6…). Bottom: Deviation in kcal/mol between the full and the reduced data sets.

Table 4.9. Entropic and MM-PBSA deviations from 50 ps data for reduced data sets.

Sampling    Data    Data points    Entropic deviation (kcal/mol)    MM-PBSA deviation (kcal/mol)
interval    sets    per set        Minimum      Maximum             Minimum      Maximum
100 ps      2       2500           0.05         0.05                0.07         0.07
150 ps      3       1666           0.07         0.13                0.09         0.15
200 ps      4       1250           0.10         0.17                0.07         0.19
250 ps      5       1000           0.14         0.21                0.13         0.18
500 ps      10      500            0.19         0.41                0.12         0.37
1000 ps     20      250            0.22         0.82                0.26         0.63


Figure 4.25. Range of average deviations from 50 ps data for reduced entropy and MM-PBSA data sets. As the sampling interval increases, so does the number of unique data sets that may be assembled from the original data set that was sampled every 50 ps. The average deviations of these smaller data sets from the full data set are plotted for both NMA entropy (blue circles) and EMM-PBSA (orange diamonds) calculations.

4.5.2 Complete Binding Free Energy Analysis

Implementing the MM-PBSA approach with entropies calculated using the full receptor via NMA and nonpolar solvation free energies calculated using

γ=0.00542 and β=0.92, the binding energies between ligand-bound hRARα LBD and the SRC-1 NR2 coactivator peptide were calculated. Initially, the energies were calculated over 1 μs MD simulations for the receptor in six states: bound to

TTNPB, ATRA, β-apo-14’-carotenoic acid, and β-apo-13-carotenone in addition to receptor covalently linked to the carotenone and the unbound (apo) state.

Given the promising initial results, these trajectories were each extended to 1.5 μs of total simulation time. The results of the free energy calculations are

presented in Figure 4.26 and Tables 4.11-12. As illustrated in the top panel of

Figure 4.26, the computed average ∆Gbind values followed the expected trend from the experimental binding values measured by ITC (Section 3.6, summarized in Table 4.10), in that TTNPB < ATRA < β-apo-14’-carotenoic acid <

β-apo-13-carotenone < apo.

Table 4.10. Summary of ITC Data of SRC-1 NR2 peptide binding ligand-bound RARα.

Ligand                        ∆G (kcal/mol)*
TTNPB                         -8.62 ± 0.08
ATRA                          -8.44 ± 0.02
β-apo-14’-carotenoic acid     -8.05 ± 0.12
β-apo-13-carotenone           -7.73 ± 0.01
apo                           -7.13 ± 0.09

* Average data ± 1 SD with n=3


Figure 4.26. Computed binding energies for SRC-1 NR2 peptide binding to RAR LBD. Data from RARα in six different liganded states: ATRA (red), TTNPB (orange), β-apo-14’-carotenoic acid (green), β-apo-13-carotenone (blue), covalently-linked β-apo-13-carotenone (teal), or no ligand (purple). Top: Cumulative average of ∆Gbind over 1.5 μs simulations sampled every 250 ps. Middle: Standard deviation of 250 ps data. Bottom: Standard error of 250 ps data.

Table 4.11. MM-PBSA binding energy components after 1.5 μs of MD simulation. Standard deviations are given in parentheses.

Receptor state                           ∆EMM       ∆EvdW     ∆Eel       ∆Gsolv     ∆EMM + ∆Gsolv    -T∆S      ∆G
TTNPB                                    -463.20    -52.60    -410.60    404.99     -58.21           36.48     -21.73
                                         (67.59)    (7.36)    (66.50)    (59.76)    (11.90)          (6.13)    (11.30)
ATRA                                     -427.02    -56.67    -370.35    368.36     -58.66           36.76     -21.90
                                         (52.53)    (6.31)    (50.66)    (46.30)    (9.53)           (5.65)    (9.56)
β-apo-14’-carotenoic acid                -447.24    -56.10    -391.41    389.39     -57.85           36.79     -21.07
                                         (53.70)    (8.47)    (52.91)    (48.24)    (10.26)          (6.34)    (10.59)
β-apo-13-carotenone                      -430.77    -56.02    -374.75    375.04     -55.73           36.67     -19.05
                                         (51.52)    (6.29)    (49.85)    (46.28)    (9.05)           (6.00)    (9.67)
Covalently-linked β-apo-13-carotenone    -390.34    -56.15    -334.19    333.91     -56.43           36.47     -19.96
                                         (50.57)    (6.63)    (48.52)    (44.37)    (9.85)           (5.62)    (9.78)
apo                                      -394.22    -53.40    -340.82    342.70     -51.52           34.66     -16.86
                                         (64.18)    (7.02)    (61.37)    (58.02)    (9.99)           (5.85)    (10.12)
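The columns of Table 4.11 are related by a simple sum, ∆G = ∆EMM + ∆Gsolv + (-T∆S). As a quick consistency check, the sketch below recombines the ATRA row; the function name is hypothetical and the inputs are the rounded averages from the table.

```python
def binding_free_energy(delta_e_mm, delta_g_solv, minus_t_delta_s):
    """Combine MM-PBSA terms (kcal/mol): dG = dE_MM + dG_solv + (-T dS)."""
    return delta_e_mm + delta_g_solv + minus_t_delta_s


# Averages for the ATRA-bound receptor, copied from Table 4.11.
delta_e_mm = -427.02
delta_g_solv = 368.36
minus_t_delta_s = 36.76

mm_pbsa_energy = delta_e_mm + delta_g_solv   # the "dE_MM + dG_solv" column
dg_atra = binding_free_energy(delta_e_mm, delta_g_solv, minus_t_delta_s)
```

The large, opposing ∆EMM and ∆Gsolv terms nearly cancel, so the final ∆G is a small difference of large numbers; this is why the standard deviations of the totals remain near 10 kcal/mol.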

Table 4.12. Detailed MM-PBSA components for receptor peptide interaction.

                    complex                 receptor                ligand
                    mean         std        mean         std        mean        std

1.) All-trans retinoic acid (ATRA)
EMM                 -4842.44     119.16     -4221.74     114.18     -193.68     43.92
EvdW                -1117.78     27.48      -1041.78     26.42      -19.32      6.79
Eelectrostatic      -8210.70     113.60     -7397.68     109.16     -442.67     42.69
Einternal           4486.03      45.93      4217.72      44.25      268.31      11.70
Gsolv               -2891.07     99.08      -2899.2      95.80      -360.23     37.41
Gsolv_ele           -2960.53     99.52      -2965.63     96.22      -371.33     37.69
Gsolv_np            69.46        1.22       66.44        1.19       11.10       0.57
EMM-PBSA            -7733.51     49.97      -7120.94     47.55      -553.91     13.05
-TS                 -2885.62     9.42       -2721.07     8.57       -201.31     3.99
G                   -10619.13    49.53      -9842.01     47.15      -755.22     12.39

2.) TTNPB
EMM                 -4738.87     129.82     -4087.96     118.70     -187.71     44.77
EvdW                -1095.79     28.71      -1024.42     27.08      -18.77      7.16
Eelectrostatic      -8135.43     125.64     -7288.20     115.36     -436.63     43.57
Einternal           4492.35      45.67      4224.66      44.21      267.69      11.50
Gsolv               -2903.63     107.55     -2939.97     97.44      -368.65     38.13
Gsolv_ele           -2974.66     108.13     -3007.65     97.90      -379.89     38.32
Gsolv_np            71.03        1.48       67.68        1.35       11.24       0.60
EMM-PBSA            -7642.50     50.52      -7027.93     48.35      -556.36     12.92
-TS                 -2894.03     9.64       -2728.04     9.13       -202.47     3.87
G                   -10536.53    50.00      -9755.96     47.88      -758.83     12.29

3.) β-apo-14’-carotenoic acid
EMM                 -4795.84     141.98     -4169.01     130.55     -179.58     44.92
EvdW                -1092.58     29.96      -1018.64     27.91      -17.84      7.60
Eelectrostatic      -8186.50     135.70     -7366.38     124.17     -428.98     42.93
Einternal           4483.24      45.29      4216.01      43.91      267.23      11.55
Gsolv               -2890.72     121.07     -2905.97     112.62     -374.13     37.47
Gsolv_ele           -2962.32     121.79     -2974.33     113.28     -385.51     37.78
Gsolv_np            71.60        1.70       68.35        1.53       11.38       0.67
EMM-PBSA            -7686.55     52.46      -7074.99     49.46      -553.71     13.87
-TS                 -2896.90     10.74      -2730.56     9.90       -203.13     4.53
G                   -10583.46    50.83      -9805.54     48.27      -756.84     12.63

4.) β-apo-13-carotenone
EMM                 -4778.88     139.74     -4161.48     132.66     -186.64     42.95
EvdW                -1099.66     29.55      -1024.65     28.43      -18.99      5.70
Eelectrostatic      -8156.79     132.86     -7347.03     125.64     -435.01     42.85
Einternal           4477.57      46.06      4210.20      44.53      267.36      11.69
Gsolv               -2857.04     115.70     -2859.73     108.54     -372.36     38.03
Gsolv_ele           -2928.16     116.48     -2927.65     109.28     -383.58     38.11
Gsolv_np            71.12        1.49       67.92        1.40       11.22       0.37
EMM-PBSA            -7635.92     51.91      -7021.20     50.44      -558.99     12.31
-TS                 -2895.20     10.60      -2729.12     9.63       -202.75     3.14
G                   -10531.12    51.06      -9750.32     49.49      -761.74     12.10

5.) covalently-linked β-apo-13-carotenone
EMM                 -4746.31     137.04     -4152.81     127.55     -203.16     42.80
EvdW                -1093.56     28.26      -1017.09     26.98      -20.32      6.53
Eelectrostatic      -8128.44     132.58     -7343.73     124.19     -450.52     41.71
Einternal           4475.70      44.79      4208.01      43.69      267.68      11.69
Gsolv               -2918.43     116.70     -2894.30     108.41     -358.04     36.56
Gsolv_ele           -2989.05     117.05     -2961.70     108.73     -369.14     36.76
Gsolv_np            70.62        1.33       67.40        1.19       11.10       0.50
EMM-PBSA            -7664.74     49.33      -7047.11     47.04      -561.20     12.83
-TS                 -2891.71     10.10      -2725.99     9.17       -202.19     3.71
G                   -10556.45    48.90      -9773.10     46.70      -763.39     12.30

6.) apo
EMM                 -4821.01     138.53     -4222.43     127.84     -204.36     45.27
EvdW                -1071.74     27.60      -996.89      26.68      -21.46      6.97
Eelectrostatic      -8183.77     132.74     -7390.86     123.21     -452.08     43.72
Einternal           4434.50      45.42      4165.32      43.97      269.18      11.53
Gsolv               -2826.38     111.99     -2811.45     103.52     -357.63     38.59
Gsolv_ele           -2897.15     112.59     -2879.06     104.08     -368.59     38.84
Gsolv_np            70.77        1.30       67.61        1.21       10.96       0.54
EMM-PBSA            -7647.40     53.47      -7033.89     50.76      -561.99     12.96
-TS                 -2865.55     10.25      -2698.65     9.60       -201.56     3.64
G                   -10512.95    51.82      -9732.53     49.53      -763.55     12.24

All energies in kcal/mol averaged over 6000 data points from 1.5 μs MD simulations.
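Each binding quantity in Table 4.11 follows from the per-species averages in Table 4.12 as the difference complex - receptor - ligand (where "ligand" here denotes the coactivator peptide). A minimal sketch using the ATRA block; the dictionary layout and function name are assumptions for illustration.

```python
def binding_difference(complex_value, receptor_value, ligand_value):
    """Binding contribution of one energy term: complex - receptor - ligand."""
    return complex_value - receptor_value - ligand_value


# ATRA-bound averages (kcal/mol) copied from Table 4.12,
# ordered as (complex, receptor, ligand).
atra = {
    "E_MM-PBSA": (-7733.51, -7120.94, -553.91),
    "-TS": (-2885.62, -2721.07, -201.31),
    "G": (-10619.13, -9842.01, -755.22),
}

deltas = {name: binding_difference(*triple) for name, triple in atra.items()}
```

The resulting differences reproduce the ATRA row of Table 4.11 to within rounding: about -58.66 for the MM-PBSA energy, 36.76 for -T∆S, and -21.90 kcal/mol for ∆G.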

Over the 6,000 data points sampled at 250 ps intervals from the 1.5 μs simulations, the standard deviations (SD) for the computed ∆G values range from 9.56 to 11.30 kcal/mol. As illustrated in Figures 4.27-4.28, the calculated energies form normal distributions with very broad ranges given that 95.4% of all the data should fall within ±2 SD of a normal distribution. With these broad distributions, convergence of the averages is a concern since we are attempting to correlate the computed binding energies to experimental data that ranges only

1.49 kcal/mol over five data points. To interpret confidence intervals for the computed averages, the standard error (SE) is computed as:

$$\mathrm{SE} = \frac{\mathrm{SD}}{\sqrt{N}}$$

where the number of samples, N, equals 6,000 in the case of the 1.5 μs trajectories. A range of ±1.96 SE corresponds to a confidence interval (CI) of 95%, indicating that there is a 95% chance that the computed averages fall within this range. In the six cases for which binding energies were computed, the SE ranges from 0.1248 to 0.1459 kcal/mol (95% CI ranges from ±0.2446 to ±0.2860 kcal/mol). In addition to calculations of CIs to interpret the overall averages, plots of the cumulative average ∆G values in the top panel of Figure 4.26 show that the averages qualitatively converge (the plots plateau) after ~1 μs.
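The standard error and the 95% confidence half-width quoted above follow directly from the standard deviation and the number of samples. A brief sketch, with hypothetical function names, using the largest SD reported (11.30 kcal/mol for the TTNPB-bound case) and N = 6,000:

```python
import math


def standard_error(sd, n_samples):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / math.sqrt(n_samples)


def confidence_half_width(sd, n_samples, z=1.96):
    """Half-width of the confidence interval; z = 1.96 gives 95%."""
    return z * standard_error(sd, n_samples)


se_ttnpb = standard_error(11.30, 6000)          # about 0.1459 kcal/mol
ci_ttnpb = confidence_half_width(11.30, 6000)   # about 0.2860 kcal/mol
```

Note that this expression treats the 6,000 snapshots as independent samples; correlated snapshots would make the true uncertainty larger than this estimate.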


Figure 4.27. Computed ∆Gbind distributions. MM-PBSA energies computed for SRC-1 NR2 peptide binding to hRARα LBD in complex with either ATRA (red), TTNPB (orange), or β-apo-14’-carotenoic acid (green). Left plots are histograms of normalized probability distribution functions (pdf) for 6,000 computed energies (top right). A cumulative running average of each data set is also presented (lower right). Dashed lines are the Gaussian distribution of the pdfs for the given average and std (left) and overall average (top right).

Figure 4.28. Computed ∆Gbind distributions. MM-PBSA energies computed for SRC-1 NR2 peptide binding to hRARα LBD in complex with either β-apo-13-carotenone (blue), covalently bound carotenone (teal), or no ligand (purple). Left plots are histograms of normalized probability distribution functions (pdf) for 6,000 computed energies (top right). A cumulative running average of each data set is also presented (lower right). Dashed lines are the Gaussian distribution of the pdfs for the given average and std (left) and overall average (top right).

A least-squares linear (first-degree polynomial) fit was calculated between the computed and experimental binding energies at each time point. The Pearson product-moment correlation coefficient, R, of the data to the regression line was used to determine the goodness of fit, with R=1 indicating perfect correlation (the computed averages fall exactly on the regression line). A general rule of thumb is that two sets of data are strongly correlated when 0.7 ≤ R < 0.9 and very strongly correlated when R ≥ 0.9. As shown in Figure 4.30, the computed and experimental ∆G values are very strongly correlated after 550 ns of simulation time. The data reach a maximum correlation of R = 0.994 at 661 ns; however, at this point the TTNPB data are clearly not converged. After 1 μs, all of the cumulative averages appear to be converged, and the average R value over the final 0.5 μs is 0.946.

The fit between the experimental data and the computed ∆G averages after 1.5 μs is presented in Figure 4.29. Here, the correlation coefficient has a value of 0.9719 and the CIs for the computed free energies overlap only for the TTNPB- and ATRA-bound cases.
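The regression and correlation described above can be reproduced approximately from the tabulated averages. The sketch below fits computed ∆G (Table 4.11) against experimental ∆G (Table 4.10) for the five non-covalent states; the function name is hypothetical, and because the inputs are rounded table values the results agree with the reported slope and R only to about two decimal places.

```python
import math


def linear_fit(x, y):
    """Least-squares line y = slope*x + intercept, plus Pearson R."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Order: TTNPB, ATRA, beta-apo-14'-carotenoic acid, beta-apo-13-carotenone, apo.
experimental = [-8.62, -8.44, -8.05, -7.73, -7.13]    # ITC, Table 4.10
computed = [-21.73, -21.90, -21.07, -19.05, -16.86]   # MM-PBSA, Table 4.11

slope, intercept, r = linear_fit(experimental, computed)
```

With these inputs the slope is close to 3.5 and R is close to 0.97, consistent with the values discussed in the text.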


Figure 4.29. Correlation between computed and experimental ∆G values for SRC-1 NR2 peptide binding to RARα. Experimental ITC data is compared to computed ∆G values for RARα LBD in five states: bound to ATRA (red), TTNPB (orange), β-apo-14’-carotenoic acid (green), β-apo-13-carotenone (blue), and no ligand (purple). Error bars are ±1.96 SE, representing 95% confidence intervals for the averages.


Figure 4.30. Correlation coefficient over 1.5 μs of free energy calculations. Top: Cumulative average of ∆G for interaction between SRC-1 NR2 peptide and RARα LBD bound to ATRA (red), TTNPB (orange), β-apo-14’-carotenoic acid (green), β-apo-13-carotenone (blue), and no ligand (purple). Bottom: Correlation coefficient, R, between computed free energies and experimental ITC data.

As apparent in Figure 4.29 and directly plotted in Figure 4.31B, the slope of the

regression line is greater than 1. After 1.5 μs, the slope of the regression line is

3.51, while the average slope over the final 0.5 μs is 3.68 with a standard

deviation of 0.18. Any slope deviating from 1 implies a diminished ability to calculate absolute free energies of binding; however, the strong correlation between computation and experiment indicates that reliable relative binding energies may be calculated. A slope greater than 1 serves to spread out the data, which actually helps achieve averages with non-overlapping CIs, allowing for unambiguous averages to be calculated.

Figure 4.31 also addresses a particular issue regarding how thorough a free

energy calculation needs to be to be considered significant. In particular, it is

fairly common to see reports that exclude entropy calculations when employing

the MM-PBSA method. One reason for this is that entropy calculations are the

most computationally intensive aspect of the MM-PBSA procedure, as described

in Section 4.5.1. Additionally, the use of NMA to calculate entropies is only an

approximation, and it is often assumed that when comparing the relative binding

energies for chemically similar ligands, the entropic differences would be

negligible. In Figure 4.31A, a strong correlation between computed and experimental binding energies is obtained with only the MM component; however, the slope of the regression is very large. Including the solvation effects of binding with the PBSA method slightly strengthens the correlation while dramatically decreasing the slope of the regression line toward a value of 1. Finally, when the ∆S values calculated with NMA are included, the correlation becomes even stronger, with the slope of the regression line coming even closer to 1. Thus, in the case studied here, the most accurate relative free energy differences are calculated when entropy changes are included. However, the gains made by including these costly calculations are not dramatic; it is therefore reasonable to exclude them if time or resources are limited.


Figure 4.31. Correlation coefficient and slope of regression lines for binding energy components. A. Pearson correlation coefficient, R, for various components of binding data (∆EMM = MM energy only, red; ∆EMM-PBSA = MM + PBSA solvation energy, green; ∆G = MM-PBSA energy + entropy, blue). B. Slope of regression line. Regression is for five data points: SRC-1 NR2 peptide binding to TTNPB-, ATRA-, β-apo-14’-carotenoic acid-, and β-apo-13-carotenone- bound RARα and unbound RARα.

4.5.3 Comparison of SRC-1 NR2 binding energies for RARα bound to β-apo-13-carotenone both covalently and non-covalently.

While not conclusive, the data gathered from experiments performed in Section 3.7 suggest that a covalent interaction is indeed occurring between β-apo-13-carotenone and C235 of hRARα. Additionally, as quantified with ITC experiments, disrupting this interaction by mutating C235 to alanine did not alter the ability of β-apo-13-carotenone to induce a specific coactivator peptide binding affinity. Comparing the computed binding energies from simulations involving β-apo-13-carotenone both covalently and non-covalently bound to RARα reveals that the computed energies are very similar. After 1 μs, the average peptide binding energies for receptor covalently and non-covalently bound to the carotenone were nearly identical, at -19.63 and -19.60 kcal/mol, respectively. After extending the calculations to 1.5 μs, the average binding energies diverged to -19.96 and -19.05 kcal/mol. Plots comparing the calculated binding energies for the two cases are presented in Figure 4.32 with comparison to the data for the ATRA-bound case to highlight the similarity of energies for the two β-apo-13-carotenone-bound simulations.


Figure 4.32. Comparison of SRC-1 NR2 binding energies to RARα bound to β-apo-13-carotenone both covalently and non-covalently. Interlaced histograms of normalized pdfs (left), Gaussian distributions for overall averages and standard deviations (middle), raw binding energies over 1.5 μs MD simulations (right, top), and cumulative averages of binding energy data (right, bottom). A. Comparison of covalently-bound (cyan) and non-covalently-bound (blue) β-apo-13-carotenone cases. B. Comparison of non-covalently-bound (blue) and ATRA-bound (red) cases.


4.6. Long Timescale Simulations of Apo RARα LBD

Finally, two independent simulations of the apo RARα LBD were extended to 5

μs to see how the protein behaves at these timescales. As discussed in Section

3.2.3.4, the initial crystal structure of an NR LBD was that of the RXRα in the apo state. In this structure, H12 was extended away from the core of the receptor.

Subsequent crystal structures of agonist-bound NR LBDs revealed an alternate

H12 conformation that formed the coactivator binding pocket. This led to the proposed ‘mouse-trap mechanism’ of ligand-induced transcription in which large scale H12 motions occur with ligand binding. Fluorescence anisotropy experiments have highlighted the increased mobility of H12 in unbound NRs, but do not provide evidence of an extended conformation [212]. More recently, hydrogen/deuterium exchange experiments [213] and computational modeling of fluorescence anisotropy decays [214] suggest that H12 is packed against the core of the receptor irrespective of ligand binding.

In the long timescale simulations performed here, an extended H12 state is not observed; however, there is significant mobility of the C-terminal region. The backbone RMSDs of the two simulations are presented in Figure 4.33A and are

measured with respect to the starting structure (the ATRA-bound conformation

from PDB ID: 3A9E). With overall average RMSDs of 2.76 and 3.51 Å, each LBD

remains well folded for the duration of the simulations. Backbone RMSD

measurements for only residues 391-415, which compose the C-terminal portion

of H10 through H12, indicate that this region deviates more from the agonist-bound conformation than the remainder of the receptor. In particular, ‘run 2’

(Figure 4.33A, bottom) reaches a maximal RMSD of 11.43 Å at ~4.39 μs,

indicating a particularly altered H12 conformation, as illustrated in Figure 4.36B.
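For reference, the RMSD reported here is the root-mean-square distance between matched atoms of a trajectory frame and the starting structure. The sketch below shows only that final calculation; it assumes the frame has already been superimposed on the reference (the fitting step is omitted), and the function name and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    No superposition is performed; the inputs are assumed to be
    pre-aligned and listed in the same atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: one of two atoms is displaced by 5 Angstroms.
reference = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
frame = [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]
example_rmsd = rmsd(reference, frame)
```

Restricting the atom lists to the backbone atoms of a residue range (for example residues 391-415) gives a regional RMSD of the kind compared with the whole-domain value above.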

Figures 4.34-4.35 show the stability of the simulations by calculating the

secondary structure content over time using the STRIDE algorithm [215] as

implemented by the ‘timeline’ feature in VMD [216]. While the secondary structures are generally unaltered for the LBD, it is interesting to see the unfolding of the C-terminal portion of H10 over time, which is more prominent in ‘run 2’. Conformation snapshots of ‘run 2’ are compared to the

ATRA-bound conformation in Figure 4.36, illustrating the various states sampled

over the 5 μs simulation.

Overall, these simulations highlight the flexibility of the H10-H12 region in the

unliganded state, but do not indicate that an extended H12 conformation is part

of this ensemble. These simulations do not discount the possibility that the

extended conformation would be sampled in trajectories that are extended

beyond 5 μs.


Figure 4.33. Deviation of apo RARα LBD simulations. A. Backbone atom RMSD of two independent 5 μs MD simulations. Solid line: all residues; dashed line: residues 391-415 (second half of H10 through H12). Data was collected at 250 ps intervals and smoothed with a moving average window of ±12.5 ns. B. Per residue RMSD with respect to the starting structure averaged over full trajectory. Colors correspond to individual runs in panel A.


Figure 4.34. Computed secondary structure of 4.75 μs RARα LBD simulations (run 1). Secondary structure predictions based on STRIDE algorithm [215] as implemented in VMD [216].

Figure 4.35. Computed secondary structure of 5 μs RARα LBD simulation (run 2). Secondary structure predictions based on STRIDE algorithm [215] as implemented in VMD [216].

[Figure 4.36 panel labels: A; B, with snapshots at 1.62 μs, 3.00 μs, 3.80 μs, 4.08 μs, 4.39 μs, and 5.00 μs]

Figure 4.36. Conformation of apo RAR LBD over 5 μs of explicit solvent simulation. A. Overlay of multiple conformational snapshots extracted from a 5 μs MD simulation of apo hRAR LBD, beginning in the agonist-bound conformation. Ribbon diagrams are colored from red (N-terminus) to blue (C-terminus). B. Individual conformations from indicated time (rainbow ribbon) compared to starting structure in agonist-bound conformation (light blue ribbon).


4.7. Conclusions

The computational modeling of β-apo-14’-carotenoic acid and β-apo-13-carotenone indicates that both compounds bind to RAR in a fashion similar to ATRA. The receptor is able to accommodate the longer β-apo-14’-carotenoic acid via an opening motion of the β-hairpin that forms one boundary of the binding pocket, while the shorter β-apo-13-carotenone is likely to gain its strong binding affinity through an interaction with C235 in the binding pocket that was observed in docking modes and subsequent MD simulations. Free energy analysis of the ligand-bound receptors interacting with a coactivator peptide correlated very strongly with experimental ITC data, strengthening the validity of the proposed binding modes. Finally, long timescale simulations of the apo RARα

LBD indicated that while the C-terminal portion of the receptor is particularly mobile in the absence of ligand, no evidence of an extended H12 conformation was observed.


Chapter 5. Force Field Parameterization of S-Adenosylmethionine

5.1 Introduction

S-adenosyl-L-methionine (SAM or AdoMet) is a common cofactor in enzymatic

reactions where it plays many roles, from methyl donor to radical source for

complex chemical reactions. The methyl group donated by SAM is bound to the

sulfur atom of the methionine portion of the molecule, forming a positively

charged sulfonium ion. Sulfonium groups are fairly unusual in biology and have

thus been often overlooked in the parameterization of common force fields, such

as AMBER [31,195]. This chapter describes the parameterization of a sulfonium

atom type for use with the popular ff99 AMBER force field, particularly for

molecular dynamics simulations that include SAM. Additionally, three unique sets of partial atomic charges are derived for SAM based on the finding that, when bound to biomolecules, the adenine group may assume a syn, anti, or high-anti conformation.

5.2 SAM Background

SAM is the most prominent methyl donor in biology. It acts as a cofactor for

numerous methyltransferase reactions in addition to catalyzing more complicated


chemical reactions by serving as a radical source. In addition to serving as a

cofactor, SAM may also bind to biomolecules to act as a transcriptional or

enzymatic modulator. Finally, as described in more detail in Section 5.2.4, SAM

is an intermediary in the biosynthesis of cysteine from methionine, the only pathway by which Metazoa are able to synthesize cysteine.

5.2.1 SAM-Dependent Methyltransferases

SAM is the most common methyl donor in biology. Of the 267 enzymatic reactions documented by the Nomenclature Committee of the International Union

of Biochemistry and Molecular Biology (NC-IUBMB) that involve the transfer of a

single methyl group (EC 2.1.1), 243 (91%) are SAM-dependent. Alternate methyl

donors include cobalt-corrin cofactors, 2-mercaptoethanesulfonic acid (coenzyme

M, CoM), and the various forms of folate (see Appendix N for SAM-independent

EC 2.1.1 members).

Using five known folds [217], SAM-dependent methyltransferases modify many

types of biomolecules by transfer of a methyl group to a nucleophilic atom on the

substrate via an SN2 reaction as shown in Figure 5.1. Known substrates include

DNA, RNA, proteins, fatty acids, polysaccharides, and small molecules. DNA

may be methylated at three different positions: m6A, m4C, m5C by DNA

methyltransferases (DNMTs). In prokaryotes, these forms of methylation are used to protect the bacterial genome from degradation as part of restriction modification systems used to destroy foreign DNA [218]. In eukaryotes, the m5C


mark is commonly found in CpG sequences, modifying gene expression in an

epigenetic fashion [219].

Figure 5.1. SAM methyltransferase reaction.

Further epigenetic controls are conferred by SAM-dependent methylation of arginine and lysine residues found in histones. Arginine residues may be monomethylated, symmetrically dimethylated, or asymmetrically dimethylated by protein arginine N-methyltransferases (PRMTs) [220], while lysine residues may be mono-, di-, or trimethylated by lysine-specific methyltransferases that often contain a catalytic SET (Su(var)3-9, Enhancer of Zeste, Trithorax) domain [221].

SAM-dependent methyltransferases are responsible for numerous RNA modifications that are functionally important in all three major types of RNA: tRNA, rRNA, and mRNA. In tRNA, modifications that have both structural and functional consequences have been identified. Members of the SPOUT superfamily of methyltransferases (SpoU and TrmD families) can methylate the

2’-O position of various RNAs. Position 18 on the D-loop is frequently 2’-O-methylated by TrmH (SpoU), which stabilizes the L-shape of tRNAs through interaction with pseudouridine at position 55 [222,223]. In yeast, Trm7p performs

2’-O-methylation of the anticodon wobble position (position 34) of tRNALeu,

tRNAPhe, and tRNATrp [224]. SAM-dependent methyltransferases are also

involved in the formation of hypermodified bases, such as wybutosine at position

37 of tRNAPhe, which stabilizes codon-anticodon interactions [225].

Ribosomal RNA (rRNA) methylation is less understood than the modifications that

occur in tRNA; however, a majority of the rRNA modifications occur in the active

sites, suggesting that they have functional significance [226]. Additional

modifications, such as N1-methylation of G745 and G748 of the 23S subunit

performed by RlmAI, confer resistance to MLS antibiotics (macrolide,

lincosamide, streptogramin B) [227], while the heat shock protein, FtsJ, methylates rRNA to ensure thermal stability [228].

In mRNA, one of the three enzymes required for addition of the 7-methylguanosine (m7G) 5’-terminal cap that is found in eukaryotes is a

SAM-dependent methyltransferase. More recently, it has been discovered

that m6A in mRNA plays a role in the regulation of circadian rhythms [229].

Finally, SAM is also important in the biosynthesis of numerous small molecules.

For example, SAM-dependent methyltransferases are involved in the synthesis

of epinephrine, creatine, and . Additionally, the caffeine biosynthetic

pathway contains three different SAM-dependent N-methyltransferases [230].

SAM is likewise important in the biosynthesis of spermidine and spermine [231],

which are important for cell longevity through the promotion of autophagy [232].

Instead of acting as a methyl donor, decarboxylated SAM donates a propylamino

group in the formation of these polyamines.

5.2.2 Radical SAM Enzymes

Radical SAM enzymes utilize Fe-S clusters to perform complex chemistries. This

large superfamily of over 600 members was only first identified in 2001 [233]. As

shown in Figure 5.2, these enzymes catalyze the transfer of an electron from a

[4Fe-4S] cluster to a SAM molecule to form methionine and a 5’-deoxyadenosyl

radical. The chemistry that can be carried out by this highly reactive radical

species is quite varied. In tRNA, for example, where SAM-dependent

methyltransferases are required for the biosynthesis of the hypermodified

wybutosine residue at position 37 of tRNAPhe, a radical SAM enzyme is also

required for the creation of the tricyclic base [225]. Additionally, in nearly all

tRNAs that read codons beginning with U, position 37 is modified by the radical

SAM enzyme, MiaB, to form 2-methylthio-6-isopentenyl adenosine (ms2i6A)

[234]. In this chemically difficult reaction, an aromatic C-H is converted to C-S.

Other radical SAM enzymes are involved in the biosynthesis of biotin, thiamine, and heme prosthetic groups.

Figure 5.2. Radical SAM reaction.

5.2.3 SAM as an Allosteric Modulator

As discussed in Section 5.2.4, SAM is a product of methionine biosynthesis. Methionine is a member of the aspartate family of amino acids that also includes lysine, threonine, isoleucine, and valine. Organisms have developed multiple feedback systems to control these pathways that involve the use of SAM as a regulatory molecule. In the Gram-negative Enterobacteriaceae, which includes E. coli, methionine synthesis is controlled at the transcriptional level through the use of the met repressor, MetJ [235]. Upon SAM binding, MetJ exhibits an increased affinity for at least nine DNA operators that are related to methionine biosynthesis, preventing their transcription [236].

In other prokaryotes, methionine synthesis is controlled at both the transcriptional and translational level via SAM riboswitches. Thus far, five SAM riboswitch motifs have been identified (SAM-I – SAM-V), which are often found in the 5’-untranslated region (UTR) of gene transcripts. SAM-I riboswitches form termination stems in RNA, causing the dissociation of RNA polymerase to halt

transcription [237]. SAM-III riboswitches prevent mRNA binding to the ribosome

by occluding the Shine-Dalgarno sequence [238].

Finally, feedback inhibition of the aspartate family of amino acids may occur at

the protein level through allosteric mechanisms. While Metazoa lack the ability for

methionine biosynthesis, plants are higher eukaryotes that have retained the

ability. In plants, there are seven allosteric enzymes that control the biosynthesis

of the aspartate family of amino acids, beginning with aspartate kinase (AK).

Arabidopsis thaliana AK is synergistically inhibited by SAM and lysine in an

allosteric fashion [239], while bacterial AKs have been found to be inhibited by

threonine [240] and lysine [241], but not SAM. Furthermore, in plants, the synthesis of methionine and threonine is branched at the production of phosphohomoserine (Phser), where both cystathionine γ-synthase and threonine synthase compete for Phser in the synthesis of methionine and threonine, respectively.

Plant threonine synthase is allosterically potentiated by SAM to regulate the

relative amounts of methionine and threonine synthesis [242].

5.2.4 SAM Biosynthesis

The structure of SAM was first reported in 1952, where it was given the name S-

adenosyl-methionine [243]. It had previously been called ‘active methionine’. As

shown in Figure 5.3, SAM is synthesized in vivo by an enzyme called methionine

adenosyltransferase (MAT) which catalyzes the transfer of the adenosine portion

of ATP to the sulfur atom of free methionine. There are three forms of MAT that

are produced from two genes. MAT2A is expressed in all tissues, while MAT1A

expression is mostly restricted to the liver where up to 50% of all methionine is

metabolized.

During a methyltransferase reaction, SAM is converted to S-adenosyl-homocysteine (SAH). SAM may ultimately be regenerated through what is called

the SAM cycle, in which SAH is hydrolyzed to adenosine and homocysteine. Via

the folic acid cycle, homocysteine is converted back to methionine, which may then

be converted back to SAM. As shown in Figure 5.4, N5-methyl-tetrahydrofolate (N5-methyl-THF), the second most common methyl donor in biology, serves as the methyl donor in the methionine synthase reaction that converts homocysteine to methionine. The carbon atom transferred to homocysteine through the folic acid cycle, which is also the carbon donated by SAM molecules, originates as the β-carbon of free serine. Instead of regeneration to methionine, homocysteine may exit the SAM cycle through a reaction with serine that ultimately leads to the synthesis of cysteine. Thus, Metazoan cysteine synthesis is dependent upon methionine consumption. Although cysteine is not strictly considered an essential amino acid, it can be considered a ‘semi-essential’ or ‘conditionally essential’ nutrient because its biosynthesis is dependent upon exogenous methionine.

Figure 5.3. The SAM cycle. SAM is formed by the transfer of the adenosyl group from an ATP molecule to methionine by methionine adenosyltransferase. After SAM is converted to SAH through a methyltransferase reaction, SAH is hydrolyzed into adenosine and homocysteine. The homocysteine molecule may then be metabolized to cysteine and α-ketobutyrate or regenerate into methionine via the folic acid cycle. The origin of the methyl group that is ultimately transferred to methyl acceptor is tracked in magenta.

Figure 5.4. Folic acid cycle. Homocysteine is converted into methionine by the addition of a methyl group from N5-methyl-THF that originates as a serine β-carbon. As in Figure 5.3, the methyl group that ultimately becomes the methyl donor of SAM is tracked in magenta. THF = tetrahydrofolate, DHF = dihydrofolate.

5.3 Survey of Existing SAM and SAH Structures

As of 11/09/2013, there were 250 crystal structures deposited to the Protein Data

Bank (PDB) containing SAM and 511 containing SAH. Figure 5.5 shows how the

number of structures has surged in the past twenty years, with the first SAH- and

SAM-bound structures appearing in 1994 and 1996, respectively. Overall, there

are half as many crystal structures containing SAM compared to SAH, perhaps due to

the reactive nature of SAM, particularly when bound to enzymes for which it serves as

a cofactor. The average resolution for each group of structures is similar at ~2.18 Å. An earlier survey of SAM crystal structures was performed by Markham et al.

in 2002; however, at that time, there were only 20 reported structures of SAM in

complex with biomolecules [244].

Figure 5.5. Statistics of PDB crystal structures including SAM or SAH. A. Number of structures deposited each year (blue, SAM; orange, SAH). B. Cumulative number of structures (blue, SAM; orange, SAH). C. Resolution distribution of SAM structures. D. Resolution distribution of SAH structures.

In the following analysis of SAM and SAH conformations, only the first instance

of SAM/SAH in each structure was considered, so that multimeric proteins would

not skew the statistics. However, multiple entries exist for several biomolecules,

such as 12 entries for the SAM-I riboswitch and five entries containing the

SET7/9 domain; these were not removed. Analysis of the SAM and SAH

structures in the PDB revealed three distinct populations of the glycosidic torsion angle (χ = O4’-C1’-N9-C4; see Figure 5.6 for illustration of SAM atom names and the χ angle).

Figure 5.6. Glycosidic (χ) torsion angle of SAM and SAH structures in the PDB. A. Atom naming for SAM. B. Atoms defining the χ angle (O4’-C1’-N9-C4) are emphasized. C. Histogram of χ angle for SAM in the PDB. D. Histogram of χ angle for instances of SAH in the PDB. A majority of the structures are divided among the anti- (χ < -135° or χ > 135°) and high anti-conformations (-135° < χ < 0°). A small population of SAM molecules is found in the syn-conformation (45° < χ < 90°), a conformation rarely seen for SAH.

Structures with 45°<χ<90° represent the syn-conformation, while structures in the anti- and high anti-conformation have χ angles centered at -165° and -106°, respectively. As shown in Figure 5.6, the high anti-conformation is favored for both SAM and SAH, however more so for SAM. Using the following definitions: high anti- = -135°<χ<0° and anti- = χ<-135°, χ>135°, the high anti-conformation is favored over the anti-conformation in SAM by a factor of 4.2. In the SAH structures, this factor is reduced more than two-fold to 1.8.
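The binning described above can be illustrated with a minimal Python sketch. It computes a signed dihedral from four atomic positions and assigns it to one of the χ ranges quoted in the text. The function names and the coordinates are hypothetical and serve only as an illustration; they are not the analysis code used for this survey.

```python
import math

def dihedral_deg(p1, p2, p3, p4):
    """Signed dihedral angle in degrees (-180 to 180) for four 3D points."""
    def sub(a, b):
        return [a[i] - b[i] for i in range(3)]
    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]
    def dot(a, b):
        return sum(a[i] * b[i] for i in range(3))

    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    # atan2 form avoids the sign ambiguity of an arccos-based formula
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def classify_chi(chi):
    """Bin a glycosidic torsion (degrees) using the ranges quoted in the text."""
    if 45.0 < chi < 90.0:
        return "syn"
    if -135.0 < chi < 0.0:
        return "high anti"
    if chi < -135.0 or chi > 135.0:
        return "anti"
    return "unclassified"

# Hypothetical coordinates in the order O4', C1', N9, C4 (not from a PDB entry)
chi_example = dihedral_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                           (0.0, 0.0, 1.0), (0.5, 1.0, 1.0))
label_example = classify_chi(chi_example)
print(round(chi_example, 2), label_example)
```

With the structure counts given in Figures 5.7 and 5.8 (184 high anti and 44 anti), the ratio 184/44 is approximately 4.2, which matches the factor quoted above for SAM.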

The conformations of SAM in each χ conformation were superimposed at the

C1’, O4’, C4’, and C3’ atoms of the ribose group as shown in Figures 5.7-9. With these visualizations, observations can be made regarding the distribution of methionine conformations and sugar puckers for each glycosidic conformation.

As noted by Markham et al., structures in the syn-conformation seem to be generally more folded than the extended conformations observed for those structures in the anti-conformation. Nevertheless, each group of structures displays a wide variety of methionine conformations. In Figure 5.7, there is a large number of structures with a similar methionine conformation, which represents the numerous SET domains that have been crystallized and does indicate a preferential binding mode for multiple types of SAM-dependent methyltransferases.

Figure 5.7. 44 SAM molecules in the anti-conformation. A. Top down view of ribose, highlighting the variety of methionine conformations. The dominant methionine conformation belongs to the large number of crystal structures of SAM bound to SET domains. B. Alternate view highlighting the C3’-endo preference.

Figure 5.8. 184 SAM molecules in the high anti-conformation. A. Top down view of ribose, highlighting the variety of methionine conformations. B. Alternate view highlighting the C2’-endo preference.

Figure 5.9. 22 SAM molecules in the syn-conformation. A. Top down view of ribose, highlighting the variety of methionine conformations. B. Alternate view.

The major difference between SAM bound in the anti- and high anti-conformation

is that the latter displays a strong preference for a C2’-endo sugar pucker, while

the former is primarily found with a C3’-endo pucker. In the syn-conformation, a

dominant sugar pucker is not apparent.

Interestingly, of the 22 structures of SAM in the syn-conformation, a majority are

biomolecules that bind SAM as a regulatory molecule in manners described in

Section 5.2.3. Table 5.1 lists the PDB entries in which SAM is bound in the syn-conformation. Twelve are SAM riboswitch structures and six are met repressor (MetJ) structures. Also included are structures of plant aspartate kinase (PDB ID: 2CDQ, [245]) and threonine synthase (PDB ID: 2C2B, [246]), where SAM is bound as an allosteric inhibitor and potentiator, respectively.

Table 5.1. List of crystal structures binding SAM/SAH with a syn-conformation.

PDB ID   Biomolecule                        χ angle (°)
SAM
3IQR     SAM-I riboswitch                   19.28
1XVA     Glycine N-methyltransferase        42.40
3GX6     SAM-I riboswitch                   46.46
2CDQ     aspartate kinase                   52.10
3KU1     putative tRNA methyltransferase    53.77
2C2B     threonine synthase                 54.89
3GX7     SAM-I riboswitch                   55.02
1CMA     met repressor, MetJ                56.34
3E5C     SAM-III riboswitch                 58.49
3V7E     SAM-I riboswitch                   59.12
4B5R     SAM-I riboswitch                   66.67
3GX5     SAM-I riboswitch                   68.14
1MJ0     met repressor, MetJ                68.25
1MJ2     met repressor, MetJ                68.31
4KQY     SAM-I riboswitch                   68.32
2YGH     SAM-I riboswitch                   70.30
1CMC     met repressor, MetJ                73.79
2YDH     SAM-I riboswitch                   75.68
1MJL     met repressor, MetJ                80.92
1MJQ     met repressor, MetJ                89.92
3IQN     SAM-I riboswitch                   89.80
2GIS     SAM-I riboswitch                   95.27
SAH
2VDW     guanine-N7-methyltransferase       78.66
3GX3     SAM-I riboswitch                   91.44

Only two of the structures containing SAM in the syn-conformation are methyltransferases. One is a putative tRNA methyltransferase from

Streptococcus pneumoniae (PDB ID: 3KU1, unpublished), while the other is rat glycine N-methyltransferase (GNMT) (PDB ID: 1XVA, [247]). It has been suggested that the syn-conformation crowds the reaction site and therefore is not a commonly found conformation when SAM is involved in methyltransferase reactions [ref]. However, analysis of these two structures reveals that they have a

C2’-endo sugar pucker, which seems to allow the adenine base to extend

further away from the sulfonium center while maintaining a syn-conformation.

Finally, two of the 511 crystal structures that include SAH are found in the syn-conformation. One is of the SAM-I riboswitch (PDB ID: 3GX3, [248]) and the

other is the mRNA 5’-capping enzyme, guanine-N7-methyltransferase of vaccinia

virus (PDB ID: 2VDW [249]). Interestingly, the human mRNA 5’-capping enzyme

has also been crystallized in complex with SAM (PDB ID: 3BGV, unpublished),

but in the high anti-conformation despite sharing a highly similar fold with the viral

enzyme.

5.4 Sulfonium Force Field Parameterization

Although SAM is the most common biological methyl donor, it contains a

relatively uncommon sulfonium moiety. A sulfonium atom type has not been explicitly parameterized in popular force fields such as ff99 of the AMBER

package. Markham et al. published a set of sulfonium parameters in 2002,

however, the methods used in the parameterization were not clearly described

and were not consistent with the way in which the AMBER force fields were

developed [244]. Therefore, new AMBER-compatible sulfonium parameters were derived for molecular dynamics simulations involving SAM.

5.4.1 AMBER Force Field

AMBER (Assisted Model Building with Energy Refinement) is one of the most widely used academic software suites for molecular mechanics (MM)-based molecular dynamics (MD) and free energy calculations. The most popular parameter set for MD simulations with AMBER is the ff99 force field [31], a two-body additive model in which atom-centered point charges are fit from quantum

mechanically-derived electrostatic potentials. The form of the potential is

purposefully simplistic:

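$$E = \sum_{\text{bonds}} k_b (b - b_{eq})^2 + \sum_{\text{angles}} k_\theta (\theta - \theta_{eq})^2 + \sum_{\text{dihedrals}} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right] + \sum_{i<j} \left[\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\varepsilon R_{ij}}\right]$$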

Bonded interactions are described by the first two terms in which harmonic potentials are used to model bond stretching and angle bending. In these terms, a force constant, kb or kθ, is used to maintain an equilibrium bond length or angle

measurement, beq or θeq. The third term is applied to dihedral angles to adjust

torsional profiles, while the final term describes the pair-wise additive van der

Waals and electrostatic interactions. The first two terms are expected to capture

the full interaction of bonded atoms, thus the pair-wise potential is not applied to

1-2 or 1-3 pairs (atoms that are separated by one or two bonds). For 1-4 pairs,

the van der Waals and electrostatic interactions are scaled down by factors of 2 and

1.2, respectively. Partial atom-centered point charges are derived using the

restrained electrostatic potential (RESP) method in which a least-squares

procedure is used to fit quantum mechanically (HF/6-31G*) calculated

electrostatic potential to the atoms of a molecule [250].
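As an illustration of the functional form described above, the following Python sketch evaluates each term for a single bond, angle, dihedral, or atom pair. It is a simplified sketch and not AMBER code: the function names are hypothetical, the Lennard-Jones term is written in the equivalent R*/ε form (A = εR*^12, B = 2εR*^6), the Coulomb conversion constant of 332.0522 kcal·Å/(mol·e^2) and a dielectric of 1 are assumptions, and the force constant of 227.0 kcal/mol/Å^2 in the example is only illustrative.

```python
import math

COULOMB = 332.0522  # kcal*Angstrom/(mol*e^2); assumed conversion constant

def bond_energy(k_b, b, b_eq):
    """Harmonic bond stretch, k_b in kcal/mol/Angstrom^2."""
    return k_b * (b - b_eq) ** 2

def angle_energy(k_theta, theta_deg, theta_eq_deg):
    """Harmonic angle bend, k_theta in kcal/mol/rad^2."""
    d = math.radians(theta_deg - theta_eq_deg)
    return k_theta * d * d

def dihedral_energy(v_n, n, phi_deg, gamma_deg=0.0):
    """Torsion term (V_n/2) * [1 + cos(n*phi - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(math.radians(n * phi_deg - gamma_deg)))

def pair_energy(r, r_star, eps, q_i, q_j, is_14=False):
    """Lennard-Jones plus Coulomb energy for one atom pair.

    For 1-4 pairs the van der Waals part is divided by 2 and the
    electrostatic part by 1.2.
    """
    lj = eps * ((r_star / r) ** 12 - 2.0 * (r_star / r) ** 6)
    coul = COULOMB * q_i * q_j / r
    if is_14:
        lj /= 2.0
        coul /= 1.2
    return lj + coul

# Example: a C-S bond stretched from 1.810 to 1.850 Angstrom (illustrative k_b)
e_bond = bond_energy(227.0, 1.850, 1.810)
# Example: two uncharged sulfur-like atoms (r = 2.0 each, eps = 0.25) at the LJ minimum
e_pair = pair_energy(4.0, 4.0, 0.25, 0.0, 0.0)
e_pair_14 = pair_energy(4.0, 4.0, 0.25, 0.0, 0.0, is_14=True)
print(e_bond, e_pair, e_pair_14)
```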

In the 14 years since ff99 was introduced, there has been a great leap forward in

both the hardware and software that drive MD simulations. When ff99 was

released, MD simulations in the range of 1-10 ns were standard. Now, reports on

simulations exceeding 1 μs have become routine [23,251]. With more exhaustive

sampling, deficiencies in the force field have been revealed and addressed with

slight modifications to ff99. In particular, the over-stabilization of α-helices was

corrected with new φ/ψ dihedral parameters in the ff99SB force field [32]. Later,

the side chain torsions of isoleucine, leucine, aspartate, and asparagine were

reparameterized to better reproduce high-level quantum mechanical calculations

in the ff99SBildn variant [25]. Additionally, improvements have been made to the

nucleic acid parameters. In ff99bsc0, the α/γ torsions of DNA backbone were

corrected [252], while the ff99χOL refinements adjusted the glycosidic torsion angle, χ, for simulations of RNA [253]. These improvements have been

incorporated into the latest version of the AMBER force field, called ff12.

Remarkably, the partial atomic charges for amino and nucleic acids for this force

field were derived in 1994 and have remained unchanged [254]. Simulations of

proteins using ff99 in combination with the TIP3P explicit water model have

yielded good results in long simulations of protein folding in spite of the protein

and water force fields not being specifically parameterized together [19]. Efforts

within the AMBER community to derive a new set of fixed charges and

parameters that are consistent with the more accurate TIP4P-Ew water model

have been recently reported [41].

5.4.2 Available Atom Types and Parameters

The ff99 variants of the AMBER protein force field contain only two sulfur atom

types, S and SH, which are used to describe the sulfur atoms found in cystine and cysteine residues, respectively. The general AMBER force field (GAFF) treats the possible sulfur bonds more thoroughly with eight different sulfur atom types, including four that are hypervalent: s4, s6, sx, and sy. Figure 5.10 illustrates the different sulfur atom types parameterized in ff99 and GAFF. The ss and sh atom types in GAFF correspond to the S and SH atom types found in ff99, while the s atom type represents the anionic form. An sp2 sulfur involved in a double bond is

represented by s2. Of the four hypervalent sulfur types, s4 and sx represent

sulfur atoms with four bonds to three substituents, while s6 and sy have six bonds to four substituents. The difference between sx/sy and s4/s6 is that the former are involved in conjugated systems. The atom type that is perhaps the closest to describing the nature of a sulfonium sulfur is s4. Although the s4 atom type describes sulfur bound to three substituents, the chemical differences between a neutral, hypervalent sulfur atom forming four bonds with three substituents and the positively charged sulfonium group make the s4 atom type a questionable choice for the simulation of SAM.

Figure 5.10. Sulfur atom types parameterized in the general AMBER force field (GAFF): s, sh, s2, ss, s4, s6, sx, and sy.

To see how the equilibrium bond lengths and angles of the AMBER sulfur atom

types compare to the sulfonium group found in SAM, the geometries of

trimethylsulfonium (C3H9S+) and ethyldimethylsulfonium (C4H11S+) were optimized using Gaussian 09 at the MP2 level of theory with three different basis sets: 6-31G*, 6-311+G(d,p), and aug-cc-pVDZ. Additionally, the corresponding bonds and angles were measured from the crystal structures of SAM described in

Section 5.3. The available bond and angle parameters for the most relevant

AMBER sulfur atom types are compared to the Gaussian-optimized and

crystallographic geometries in Table 5.2. The geometries derived by Markham et

al. are also included.

For the four bond and angle measurements relevant to the sulfonium structure of

SAM, the existing AMBER parameters do not appear to deviate wildly from the

crystallographic or quantum mechanically-derived geometries. Of the 244

SAM-containing crystal structures analyzed, 15 had a resolution of 1.5 Å or better,

90 structures were in the 1.5 Å to 2.0 Å range, 76 fell between 2.0 Å and 2.5 Å, and 63 structures had a resolution greater than 2.5 Å. For the C-S+ bond and C-S+-C angle, the highest resolution structural measurements correlate very well

with the QM geometries derived using the large aug-cc-pVDZ basis set. Although

the AMBER bond distances are roughly in line with the QM and structural

measurements, the C-S+-C angles in AMBER deviate from the MP2/aug-cc-pVDZ value by 1.416° to 4.515°

depending on the atom type used.

There is less agreement for the C-C-S+ angle measurements. While the average

crystallographic measurement for this angle remains consistent at approximately 113.5° for

all but the lowest resolution structures, this value deviates by ~2.5° from the

highest-level QM calculations. The AMBER C-C-S+ measurements for selected

sulfur atom types are also not consistent with the QM or crystallographic data.

Finally, the high-level QM S+-C-H angle measurements are smaller than those in

the AMBER force fields, while the angle used by Markham et al. is significantly smaller.

Together, these discrepancies warrant a closer examination of the ability of available AMBER parameters to treat the sulfonium center of SAM, if not necessitate the parameterization of a new sulfur atom type to more accurately model the bonds and angles required for the simulation of SAM.

Table 5.2. Experimental, force field, and ab initio sulfonium measurements.

Source                    C–S+ bond    C–C–S+ angle    C–S+–C angle    S+–C–H angle
Markham et al.            1.7798 Å     110.2762°       100.5458°       104.439°
Force fields (atom types)
ff99 (CT, S, H1)          1.810 Å      114.7°          98.9°           109.5°
GAFF (c3, ss, h1)         1.821 Å      112.69°         99.92°          109.34°
GAFF (c3, s4, h1)         1.807 Å      111.46°         96.82°          108.66°
Ab initio calculations
MP2/6-31G*                1.802 Å      111.345°        102.527°        108.658°
MP2/6-311+G(d,p)          1.799 Å      111.047°        102.052°        108.420°
MP2/aug-cc-pVDZ           1.816 Å      110.921°        101.336°        107.929°
Crystal structures*
All                       1.801 Å      113.462°        102.986°        --
≤ 1.5 Å                   1.817 Å      113.686°        101.556°        --
1.5 Å < x ≤ 2.0 Å         1.799 Å      113.529°        102.864°        --
2.0 Å < x ≤ 2.5 Å         1.800 Å      113.687°        102.796°        --
> 2.5 Å                   1.801 Å      113.041°        103.731°        --

* A total of 244 crystal structures containing SAM were used from the PDB. Three instances of a C-S+ bond and C-S+-C angle are found in SAM, and two C-C-S+ angles. Therefore, the ‘All’ row is the average over 732 or 488 individual measurements. There were 15 structures with a resolution ≤ 1.5 Å, 90 structures between 1.5 and 2.0 Å, 76 between 2.0 and 2.5 Å, and 63 with a resolution greater than 2.5 Å.
5.4.3 Parameterization Protocol

Since the purpose of the work described in this chapter is to derive a set of

sulfonium parameters that are to be used in MD simulations using the ff99 force

field, it makes sense that the same methods are used to derive these parameters

as were used in the development of ff99. This was the same philosophy used in

the development of the general AMBER force field (GAFF) that allows for the

simulation of a variety of organic molecules with ff99.

The first step in the parameterization of a new atom type is to select a set of

Lennard-Jones parameters used to calculate van der Waals interactions between atoms in close range. The new sulfur atom type presented here uses the same

Lennard-Jones parameters as were used for all sulfur atom types in ff94, ff99,

and GAFF (r = 2.0 Å, ε = 0.25 kcal/mol). Next, partial atomic charges were derived with the

RESP method in which an ab initio electrostatic potential (ESP) calculated at the

HF/6-31G* level is fit to atom centers via hyperbolic restraints [250]. Then, for a

set of equilibrium bond lengths and angles, force constants are determined to

best recreate experiment or high-level ab initio values. Force constants were

selected to best reproduce ab initio vibrational frequencies following the methods

used in the development of GAFF: ab initio vibrational frequencies were

calculated at MP2/6-311+G(d,p) and scaled by 0.9496 to best reproduce

experimental frequencies [200].
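For reference, the hyperbolic restraint mentioned above is commonly written in the RESP literature as a penalty added to the least-squares fit of the electrostatic potential. The block below sketches that objective function. The symbols a (restraint weight) and b (hyperbola tightness) do not appear elsewhere in this chapter, and the typical values noted in the comments (a = 0.0005 and 0.001 a.u. in the two fitting stages, b = 0.1 e) are the commonly cited defaults rather than values confirmed for this work.

```latex
% Objective minimized in a RESP fit: ESP misfit plus hyperbolic charge restraint.
% V_i^{QM}: ab initio (HF/6-31G*) potential at grid point i
% q_j: charge on atom j; r_{ij}: distance from atom j to grid point i
% a, b: restraint weight and tightness (commonly a = 0.0005 or 0.001 a.u., b = 0.1 e)
\chi^2 = \chi^2_{\mathrm{esp}} + \chi^2_{\mathrm{rstr}}, \qquad
\chi^2_{\mathrm{esp}} = \sum_i \left( V_i^{\mathrm{QM}} - \sum_j \frac{q_j}{r_{ij}} \right)^2, \qquad
\chi^2_{\mathrm{rstr}} = a \sum_j \left( \sqrt{q_j^2 + b^2} - b \right)
```

The restraint pulls poorly determined charges (typically those on buried atoms) toward zero while leaving well-determined charges largely unaffected.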

A Python program was written to systematically scan bond, angle, and dihedral

angle force constants to find the best fit to QM vibrational frequencies. In

molecular mechanics, the vibrational frequencies are calculated via normal mode

analysis, in which the frequencies are obtained from the eigenvalues of the 3N×3N

mass-weighted Hessian matrix, the matrix of second derivatives of the potential energy. The

fitting process is as follows:

1. Create a frcmod file with new force constants.
2. Create a topology file for the molecule.
3. Run an energy minimization to drms = 0.0001.
4. Compute normal modes for the energy-minimized structure.
5. Compare the MM and QM normal modes.
6. Repeat with new parameters, searching for the lowest RMSD between the MM and QM frequencies.
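The scan loop in the steps above can be sketched in a few lines of Python. This is not the program used in this work: the minimization and normal mode steps, which require AMBER, are replaced by a hypothetical stand-in (`harmonic_wavenumber`, an analytic diatomic model), and all function names and numerical inputs are assumptions chosen for illustration. Only the QM scaling factor of 0.9496 is taken from the text.

```python
import math

AVOGADRO = 6.02214076e23     # 1/mol
C_CM_PER_S = 2.99792458e10   # speed of light, cm/s
KCAL_TO_J = 4184.0

def harmonic_wavenumber(k_amber, m1, m2):
    """Wavenumber (cm^-1) of a diatomic with AMBER-style energy E = k (b - b_eq)^2.

    k_amber is in kcal/mol/Angstrom^2 and masses are in amu. Because the AMBER
    form omits the factor of 1/2, the harmonic force constant is 2*k_amber.
    """
    mu_kg = (m1 * m2 / (m1 + m2)) * 1.0e-3 / AVOGADRO
    k_si = 2.0 * k_amber * KCAL_TO_J / AVOGADRO / 1.0e-20  # N/m
    return math.sqrt(k_si / mu_kg) / (2.0 * math.pi * C_CM_PER_S)

def rmsd(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def scan_force_constant(candidates, qm_freqs, mm_freq_fn, qm_scale=0.9496):
    """Return (best_k, best_rmsd) over a grid of candidate force constants.

    mm_freq_fn(k) must return MM frequencies in the same order as qm_freqs;
    in a real workflow it would wrap steps 1-4 of the fitting process.
    """
    target = [qm_scale * f for f in qm_freqs]
    best_k = min(candidates, key=lambda k: rmsd(mm_freq_fn(k), target))
    return best_k, rmsd(mm_freq_fn(best_k), target)

# Toy usage: recover a known force constant for a C-S "diatomic" (illustrative only)
def toy_mm(k):
    return [harmonic_wavenumber(k, 12.011, 32.06)]

toy_qm = [harmonic_wavenumber(230.0, 12.011, 32.06) / 0.9496]
grid = [200.0 + 5.0 * i for i in range(13)]
best_k, best_err = scan_force_constant(grid, toy_qm, toy_mm)
print(best_k, best_err)
```

A grid search is used here only because it mirrors the systematic scan described in the text; when several force constants are fit at once, the same RMSD objective could be passed to a general-purpose optimizer instead.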

5.4.3.1 Torsion Profiles

The final step in parameterization is to alter the Vn terms to best reproduce ab

initio torsional profiles. The relative energies of the various torsional

conformations are related to all of the other force field parameters, therefore it is best that the Vn terms are set last. While there is a physical basis for the force

constants that describe the bond stretching and angle bending in that they are

related to the vibrational frequencies observed in infrared (IR) and Raman

spectroscopy, there is no physical basis for the dihedral torsion term of the force

field. Thus, the Vn terms may be considered correction factors to better recreate

the energy differences and barriers between molecular conformations. The

general AMBER philosophy is to include as few torsional terms as possible so

that the parameters are more transferable among chemical structures. However,

in the present case, since the sulfonium group is so uncommon, it is not likely

that one would find a need to transfer these specific parameters to a molecule

aside from SAM; therefore, we need not be concerned about limiting the number

of Vn terms used to model the torsional profile of the sulfonium center.

As introduced in Section 5.4.1, the dihedral term has the general form:

$$E_{\text{dihedral}} = \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right]$$

where the energy profile for the dihedral angle, φ, is a cosine function with a

force constant Vn and a periodicity, n. A phase shift of γ = 180° is required in

some cases to obtain the proper torsional profile. Typically, a periodicity of 1, 2,

or 3 should be sufficient to describe the torsional profiles found in organic

molecules. Examples of these various cases are illustrated in Figure 5.11. In practice, the dihedral potential term is implemented in AMBER as:

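$$E_{\text{dihedral}} = \frac{\text{PK}}{\text{IDIVF}}\left[1 + \cos(\text{PN}\cdot\phi - \text{PHASE})\right]$$

where PK is Vn/2, PN is the periodicity, PHASE is the phase shift, and IDIVF is a factor by which the torsional barrier is divided.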

The IDIVF factor is required since, in some cases, dihedral angles are not

explicitly defined between four specific atom types, but may incorporate

wildcards (such as X-c-c-X instead of c3-c-c-c3 to describe the central

torsion of 2,3-butanedione, illustrated in Figure 5.12). In cases where wildcards

are used, multiple dihedral angles may be found in a molecule that describe the

same torsion. In the example of 2,3-butanedione, it is appropriate to set IDIVF=4

if the single torsion of the molecule is defined in the force field parameters as X-c-c-X, since four different sets of atoms will be identified that describe the same

torsion: o(1)-c-c-c3(2), o(1)-c-c-o(2), c3(1)-c-c-c3(2), and c3(1)-c-c-o(2), where the numbers in parentheses distinguish the two halves of the molecule. The use of

IDIVF≠1 in these cases is important so that the same dihedral term is not added multiple times to the computed potential energy of the molecule. However, in the case where all four atoms of a dihedral angle are explicitly defined, as will be the case in the following sulfonium parameterization, IDIVF should be set to 1. In the following sections, all Vn values listed imply PK values, and thus refer to Vn/2.
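The role of IDIVF can be checked numerically with a short Python sketch. It assumes an idealized planar geometry at each sp2 carbon of 2,3-butanedione, so that the four atom sets matched by X-c-c-X have dihedral angles of either φ or φ + 180°. The function name, the PK value, and the angle are hypothetical choices for illustration.

```python
import math

def amber_dihedral(phi_deg, pk, pn, phase_deg=0.0, idivf=1):
    """One AMBER-style torsion term: (PK/IDIVF) * [1 + cos(PN*phi - PHASE)]."""
    return (pk / idivf) * (1.0 + math.cos(math.radians(pn * phi_deg - phase_deg)))

phi = 40.0  # hypothetical o-c-c-o dihedral angle in degrees
# Angles of the four matched atom sets, in the order listed in the text:
# o(1)-c-c-c3(2), o(1)-c-c-o(2), c3(1)-c-c-c3(2), c3(1)-c-c-o(2)
matched_angles = [phi + 180.0, phi, phi, phi + 180.0]

# Wildcard definition X-c-c-X: the term is applied four times, each divided by IDIVF = 4
wildcard_total = sum(
    amber_dihedral(a, pk=1.2, pn=2, phase_deg=180.0, idivf=4) for a in matched_angles
)
# Explicit definition of a single torsion with IDIVF = 1
explicit_total = amber_dihedral(phi, pk=1.2, pn=2, phase_deg=180.0, idivf=1)

print(wildcard_total, explicit_total)
```

Because the periodicity is 2, a 180° offset leaves each term unchanged, so the four wildcard contributions sum to the same energy as one explicitly defined torsion.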

Figure 5.11. Sample dihedral profiles with periodicity of 1, 2, and 3.

5.4.3.2 Test Case: 2,3-butanedione

An example dihedral profile produced during the GAFF parameterization for 2,3-

butanedione is reproduced in Figure 5.12, showing the MM fit to an ab initio

energy profile. The torsional profile was optimized with a V2 term of 1.2 kcal/mol which minimized the RMSD of relative energies to 0.24 kcal/mol with measurements taken at 30° intervals over 0 to 330°.

Figure 5.12. Dihedral scan of 2,3-butanedione. The MP4/6-311G(d,p)//MP2/6-31G* ab initio profile scanned from -180° to 180° is in red while the MM profile using GAFF is in blue. The relative conformational energies are given in kcal/mol. Left image reproduced from Wang et al. [195] with permission from John Wiley and Sons, Copyright © 2004 Wiley Periodicals, Inc.

Before fitting dihedral parameters for the new sulfonium atom type, attempts

were made to recreate Figure 5.12 using the methods described in the GAFF

parameterization paper [195]. The ab initio energy profile was calculated at the

MP4/6-311G(d,p)//MP2/6-31G* level and charges were computed with the RESP

method at the HF/6-31G* level. Then, using the same optimized bond and angle parameters derived for GAFF, an optimal V2 value (phase = 180°, multiplicity = 4) was scanned for, seeking to minimize the RMSD between the QM and MM relative energies over 13 points from 0 to 360°. As shown in Figure 5.13, an

optimal V2 value of 2.2 kcal/mol was determined, reducing the relative energy

RMSD to 0.432 kcal/mol.

Figure 5.13. Initial dihedral profile fitting test for 2,3-butanedione. A. Relative MM energy profiles with GAFF parameters, varying V2, to match the ab initio MP4/6-311G(d,p)//MP2/6-31G* profile (black). B. RMSD between MM and QM energies.
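The V2 scan described above can be sketched as a one-dimensional grid search over PK (Vn/2) that minimizes the RMSD between relative QM and MM energies. The sketch below makes a simplifying assumption: the torsion term is added to a fixed MM profile computed with V2 = 0, whereas a full treatment would re-minimize the structure at each scan point for every trial value. The function names and the synthetic energy profiles are hypothetical and are not data from this work.

```python
import math

def relative(energies):
    """Shift a profile so that its lowest point is zero."""
    lowest = min(energies)
    return [e - lowest for e in energies]

def rmsd(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def torsion_term(phi_deg, pk, pn=2, phase_deg=180.0):
    """PK * [1 + cos(PN*phi - PHASE)] with IDIVF folded into PK."""
    return pk * (1.0 + math.cos(math.radians(pn * phi_deg - phase_deg)))

def scan_v2(phis, qm_energies, mm_base, pk_values):
    """Return (rmsd, pk) for the PK value that best matches the QM profile."""
    qm_rel = relative(qm_energies)
    results = []
    for pk in pk_values:
        mm = [e + torsion_term(p, pk) for p, e in zip(phis, mm_base)]
        results.append((rmsd(relative(mm), qm_rel), pk))
    return min(results)

# Synthetic example: 13 points from 0 to 360 degrees in 30 degree steps
phis = list(range(0, 361, 30))
mm_base = [0.5 * math.cos(math.radians(p)) for p in phis]          # made-up V2 = 0 profile
qm_energies = [e + torsion_term(p, 1.2) for p, e in zip(phis, mm_base)]  # made-up target
best_rmsd, best_pk = scan_v2(phis, qm_energies, mm_base, [i / 10 for i in range(31)])
print(best_pk, best_rmsd)
```

Comparing relative rather than absolute energies matters here because the QM and MM profiles have different energy zeros.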

Aside from the different optimal V2 value determined in this test compared to

what was published (2.2 kcal/mol vs. 1.2 kcal/mol), significant differences were

observed between both the MM and QM profiles calculated here and those

published by Wang et al. The QM profile calculated at the MP4/6-311G(d,p)//MP2/6-31G* level (Figure 5.13A, black trace) deviates from the published profile in the

range of 30° to 120°. Most notably, the QM profile in Figure 5.13 has local

minima at ±30° instead of forming a single transition state at 0°. Additionally, the

barrier height in the MM profile is ~1 kcal/mol less than the computed QM barrier

height. This results in a strange optimized profile where a local minimum is formed

at 0° in order to minimize the RMSD between the QM and MM energy profiles.

Communications with Karl N. Kirschner at the Fraunhofer Institute for Algorithms and Scientific Computing helped lead to the realization that slight changes in the partial atomic charges used for the molecule could lead to appreciable changes in the computed dihedral profiles. This led to the adoption of RESP charges computed through routines implemented in R.E.D. (RESP ESP charge Derive) [197]. The main purpose of R.E.D. is to standardize the way in which RESP charges are computed so that they are more reproducible, regardless of which

QM software package is used. An important procedure implemented by R.E.D. is the use of multiple orientations in the derivation of a robust set of partial atomic

272

charges. It has been found that the precise position of the points defining a

molecular electrostatic potential (MEP) can have a large effect on the final RESP

charges calculated. The points of the MEP are ultimately determined by the

position of the molecule in Cartesian space, making molecular orientation very

important. R.E.D. allows the user to define a specific number of orientations to be

used during the charge fitting procedure. Reported accuracy of charges

computed with R.E.D. is within 0.0001 e.

Table 5.3 lists the charges that were initially used to produce the profiles in Figure 5.13. These used only a single orientation, while the charges produced by Karl Kirschner and the second attempt at computing the charges for 2,3-butanedione used the R.E.D. package with three molecular orientations. The use of multiple orientations in the derivation of partial atomic charges has a significant effect. The effect of the new charges on the computed MM dihedral profile for 2,3-butanedione is shown in Figure 5.14.

Table 5.3. Partial atomic charges for 2,3-butanedione.

Atoms          Initial Charges   Karl’s Charges   Final R.E.D. Charges
protons         0.13167           0.08610          0.08690
sp3 carbons    -0.45155          -0.27390         -0.27771
sp2 carbons     0.57356           0.52390          0.52520
oxygens        -0.51704          -0.50820         -0.50890

Figure 5.14. Dihedral profiles for 2,3-butanedione with different charge sets. A. Comparison of the profile with the initial charge set (blue) and with the charges computed by Karl Kirschner (green), both with V2 = 0 (dotted) and V2 = 1.2 (solid). B. RMSD difference between QM and MM profiles.

With the use of the new charges, the RMSD of the fit between the QM and MM profiles was improved from 0.4302 to 0.189 kcal/mol, and the optimum V2 value decreased slightly from 2.2 to 2.1 kcal/mol (Figure 5.14B). In spite of the improved RMSD, the optimization of V2 still resulted in an MM profile with a local minimum at 0°. This was largely due to the QM profile to which the V2 value was being fit.

After experimenting with different basis sets, a profile similar to what was published by Wang et al. was recreated by calculating the dihedral profile at the MP4/aug-cc-pVDZ//MP2/aug-cc-pVDZ level. In contrast, the GAFF paper describes the use of a smaller basis set (MP4/6-311G(d,p)//MP2/6-31G*). Refitting V2 to this new QM profile resulted in a minimum RMSD of 0.138 kcal/mol at an optimal V2 = 0.14 kcal/mol. This finally compares well to the optimal V2 value of 0.12 determined in the original GAFF parameterization.

Figure 5.15 illustrates the final MM fit to the new QM profile in which both profiles exhibit a single transition from -180 to 180°. This study implies that the basis set used to calculate the QM torsion profile in the development of GAFF was likely incorrectly reported, and that a larger basis set is necessary to obtain proper relative energy profiles.

Figure 5.15. Optimized dihedral profiles for 2,3-butanedione. A. Relative MM energy profiles with GAFF parameters, varying V2, to match the ab initio MP4/aug-cc-pVDZ//MP2/aug-cc-pVDZ profile (black). B. RMSD between MM and QM energies when scanning V2.

5.3.4 Sulfonium Parameterization

A total of six force field parameters, as illustrated in Figure 5.16, are required to completely describe the sulfonium center of SAM. These include a C-S+ bond; C-S+-C, S+-C-H, and C-C-S+ angles; and C-S+-C-H and C-C-S+-C dihedral parameters.

Figure 5.16. Force field parameters required for treatment of the sulfonium center in SAM (parameters numbered 1 to 6).

A new sulfonium atom type, called S3, is introduced in order to add these parameters to the ff99 AMBER force field. To keep the new parameters consistent with ff99, the same atom types used to describe methionine and adenine are used in this parameterization. To describe the connectivity at the sulfonium center, only a single carbon and a single hydrogen atom type are required in addition to the S3 atom. There is a single sp3 aliphatic carbon atom type in ff99, CT, making the selection easy. However, three different hydrogen atom types appeared to be viable candidates for use in this parameterization:

H1 = used in ff99 to describe a proton-carbon bond in which the carbon is bound to one electron-withdrawing group

HC = used in ff99 to describe a proton-carbon bond in which the carbon is not bound to any electron-withdrawing group

HP = used in ff99 to describe a proton-carbon bond in which the carbon is bound to a positively charged group

Optimal parameters were derived for each of these three hydrogen types in

combination with the new sulfur atom type to see which would yield the best

results.

The first round of parameterization was performed on trimethylsulfonium (C3H9S+), allowing for the parameterization of all but two of the types of interactions required to treat the sulfonium center of SAM detailed in Figure 5.16. The molecule was energy minimized and vibrational frequencies were calculated at the MP2/6-311+G(d,p) level in Gaussian. Then, using equilibrium geometries calculated at the MP2/aug-cc-pVDZ level as listed in Table 5.2, force constants were systematically scanned to obtain the best fit between the MM and scaled QM vibrational frequencies (QM frequencies were scaled by 0.9496 as described in Section 5.4.3). The results of this parameterization are found in Table 5.4.
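The figure of merit used in this fit, the RMSD between scaled QM frequencies and MM normal mode frequencies, can be written compactly. The sketch below is only illustrative: the frequencies are hypothetical, and pairing the modes by rank after sorting is an assumption made here, not a procedure stated in the text.

```python
import math

SCALE = 0.9496  # QM frequency scale factor quoted in the text

def frequency_rmsd(qm_freqs, mm_freqs, scale=SCALE):
    """RMSD (cm^-1) between scaled QM and MM frequencies.

    Modes are paired by rank after sorting, which is an assumption of
    this sketch.
    """
    if len(qm_freqs) != len(mm_freqs):
        raise ValueError("QM and MM mode counts differ")
    scaled_qm = sorted(scale * f for f in qm_freqs)
    mm = sorted(mm_freqs)
    squared = sum((q - m) ** 2 for q, m in zip(scaled_qm, mm))
    return math.sqrt(squared / len(mm))

# Hypothetical frequencies (cm^-1), not taken from this work.
qm = [250.0, 700.0, 1450.0, 3100.0]
# Hypothetical MM result sitting 10 cm^-1 above every scaled QM mode.
mm = [SCALE * f + 10.0 for f in qm]

example_rmsd = frequency_rmsd(qm, mm)
print(round(example_rmsd, 3))
```

In a force constant scan, this function would be evaluated once per trial parameter set, and the set with the lowest RMSD retained.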

Table 5.4. Force constants derived for trimethylsulfonium.

                   Force constants (kcal/mol)
Atom type          Kb(C-S)    Ka(C-S-C)   Ka(S-C-H)   V3(H-C-S-C)   V2(H-C-S-C)   V1(H-C-S-C)   RMSD (cm-1)*
S3 (using H1)      222.4      55.1        42.9        0.12          --            --            49.489
S3 (using HP)      224.1      59.6        43.3        0.14          --            --            49.461
S3 (using HC)      221.8      51.5        42.7        0.11          --            --            49.396
SP (with H1)**     345.2802   289.2187    33.0448     0.4750        0.5214        -0.3536       115.698
GAFF (s4)***       233.8      62.1        42.9        0.33          --            --            73.343
GAFF (ss)***       225.8      60.6        42.4        0.20          --            --            78.118
ff99 (S)****       227.0      62.0        50.0        0.33          --            --            67.891

* RMSD between 33 vibrational modes of C3H9S+ computed at MP2/6-311+G* and via normal mode analysis in AMBER
** Parameters derived in Markham et al. using new SP atom type with CT and H1
*** GAFF atom types = s4, c3, and h1 or ss, c3, and h1
**** ff99 atom types = S, CT, H1

S3 in conjunction with CT and either H1, HP, or HC resulted in force constants similar to those of the established AMBER force fields; however, those derived by Markham et al., who introduced SP as the sulfonium atom type, are significantly different. The Markham parameters perform the worst at recreating the ab initio vibrational frequencies of the trimethylsulfonium ion, with an RMSD of 115.7 cm-1 over the 33 frequencies. Of the established AMBER sulfur atom types tested, the S atom type from ff99 performed the best with an RMSD of 67.9 cm-1. The new S3 atom type was able to improve the fit to the QM data, and the choice of hydrogen atom type had little effect on the final results. Using S3 in combination with H1, HP, and HC resulted in RMSDs of 49.489, 49.461, and 49.396 cm-1, respectively.

Following the initial fit of parameters for trimethylsulfonium, the H-C-S+-C dihedral parameter (illustrated as parameter 5 in Figure 5.16) was reparameterized to best recreate the high-level ab initio torsional profile calculated at MP4/aug-cc-pVDZ//MP2/aug-cc-pVDZ. The results for S3 in combination with H1 are illustrated in Figure 5.17. An optimal V3=0.18 was determined, reducing the RMSD to 0.057 kcal/mol over 13 points from -180° to 0°.

This is a slight increase from V3=0.12, which was determined based on the fit to vibrational frequencies. Since the torsional profile has a significant impact on the conformations sampled during MD simulations, the V3 value used to optimize the

profile should be selected over the initial value that optimized the vibrational

frequencies. Adopting these new V3 values only slightly alters the fit to the

vibrational frequencies as shown in Table 5.5.

Table 5.5. RMSD between MM and QM frequencies for C3H9S+ with reparameterized V3.

Atom types   Original RMSD*   New RMSD**
S3,CT,H1     49.489           50.376
S3,CT,HP     49.461           50.140
S3,CT,HC     49.396           50.408

* RMSD using parameters listed in Table 5.4.
** RMSD using new V3 parameters listed in Table 5.6.

Figure 5.17. Fitting of the H-C-S-C V3 parameter. A. MM profiles with different V3 values compared to the ab initio torsional profile calculated at MP4/aug-cc-pVDZ//MP2/aug-cc-pVDZ. B. RMSD between QM and MM relative potential energies computed over 13 points at 15° intervals from -180° to 0° for various V3 values.

The results of the V3 reparameterization, found in Table 5.6, show the new parameters are a significant improvement over the established AMBER atom types available. The worst fit using the S3 atom type is in combination with the

HC hydrogen which produced an RMSD between the QM and MM profile of 0.072 kcal/mol when measured over 13 points from -180 to 0°. This is compared to the best case of the AMBER sulfur atom types tested (s4 from GAFF) which produces an RMSD that is four-fold greater. Once again, the parameters published by Markham et al. perform the worst, with an RMSD of 0.710 kcal/mol.

Table 5.6. H-C-S-C profiles.

Atom types     V3 (kcal/mol), IDIVF=1   RMSD (kcal/mol)*
H1-CT-S3-CT    0.18                     0.057
HP-CT-S3-CT    0.19                     0.052
HC-CT-S3-CT    0.17                     0.072
H1-CT-SP-CT    0.475                    0.710
X-CT-S-X       0.33                     0.431
X-c3-s4-X      0.20                     0.293
X-c3-ss-X      0.33                     0.444

* RMSD between QM and MM profiles over 13 points from -180° to 0°
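The comparison drawn from Table 5.6 reduces to a ratio of RMSD values. As a quick check of that arithmetic, the short sketch below uses the RMSD values transcribed from the table; the variable names are hypothetical.

```python
# RMSD values (kcal/mol) transcribed from Table 5.6.
profile_rmsd = {
    "H1-CT-S3-CT": 0.057,
    "HP-CT-S3-CT": 0.052,
    "HC-CT-S3-CT": 0.072,
    "H1-CT-SP-CT": 0.710,
    "X-CT-S-X": 0.431,
    "X-c3-s4-X": 0.293,
    "X-c3-ss-X": 0.444,
}

# Worst fit among the new S3 parameter sets.
s3_sets = [name for name in profile_rmsd if "-S3-" in name]
worst_s3 = max(profile_rmsd[name] for name in s3_sets)

# Best fit among the established AMBER sulfur atom types.
established = ["X-CT-S-X", "X-c3-s4-X", "X-c3-ss-X"]
best_established = min(established, key=profile_rmsd.get)

ratio = profile_rmsd[best_established] / worst_s3
markham_ratio = profile_rmsd["H1-CT-SP-CT"] / worst_s3
print(best_established, round(ratio, 2), round(markham_ratio, 2))
```

Even the least favorable S3 combination (with HC) therefore fits the QM profile roughly four times more closely than the best established atom type.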

Finally, Figure 5.18 illustrates the H-C-S-C dihedral profiles for the new sulfonium parameterization of this work (red trace) compared to profiles produced with the

AMBER and Markham parameters.

Figure 5.18. H-C-S-C torsional profile of C3H9S+ with existing force field parameters. Four existing force field parameters were compared to an ab initio profile calculated at MP4/aug-cc-pVDZ//MP2/aug-cc-pVDZ: GAFF parameters (h1-c3-s4-c3 and h1-c3-ss-c3), ff99 (H1-CT-S-CT), and those published by Markham et al. (H1-CT-SP-CT).

As described above, four of the six parameters required to describe the sulfonium center of SAM were derived using trimethylsulfonium (C3H9S+) as a model compound. To obtain force constants for the remaining two parameters (C-C-S+ angle and C-C-S+-C dihedral), parameterization was performed using ethyldimethylsulfonium (C4H11S+). Methods similar to those used in the parameterization with trimethylsulfonium were employed. First, the force constants were optimized to best reproduce high-level ab initio vibrational frequencies. Again, S3 was parameterized with the H1, HP, and HC hydrogen atom types bound to the carbons adjacent to the sulfur atom; however, HC was chosen as the appropriate atom type for the hydrogens bound to the terminal carbon of the ethyl group of ethyldimethylsulfonium. Additionally, there was some uncertainty as to which equilibrium value to choose for the C-C-S+ angle. As shown in Table 5.2, the C-S+ bond and C-S+-C angle determined at the highest level of ab initio theory tested correspond well with the corresponding bond and angle measured from the crystal structures with a resolution better than 1.5 Å. However, for the C-C-S+ angle there was a discrepancy: the angle measured at MP2/aug-cc-pVDZ was 110.921°, while the angle measurements in the high-resolution crystal structures containing SAM averaged 113.686°. Therefore, in addition to the multiple hydrogen types, two different C-C-S+ angles were tested in the parameterization. The derived force constants and the resulting RMSD between QM and MM frequencies are found in Table 5.7.

Table 5.7. Force constants derived for ethyldimethylsulfonium.

                              Force constants (kcal/mol)
Atom type         C-C-S       Ka(C-C-S)   V3(C-C-S-C)   V2(C-C-S-C)   V1(C-C-S-C)   RMSD (cm-1)*
S3 (using H1)     110.921°    49.3        0.04          --            --            47.678
S3 (using HC)     110.921°    50.0        0.07          --            --            48.084
S3 (using HP)     110.921°    48.7        0.02          --            --            47.163
S3 (using H1)     113.686°    50.0        0.06          --            --            47.500
S3 (using HC)     113.686°    49.9        0.07          --            --            47.904
S3 (using HP)     113.686°    48.8        0.05          --            --            47.016
SP (with H1)**    110.276°    96.06       1.1139        0.1433        0.0082        82.794
GAFF (s4)***      111.46°     68.32       0.20          --            --            67.415
GAFF (ss)***      112.69°     61.1        0.33          --            --            69.046
ff99 (S)****      114.7°      50.0        0.33          --            --            61.297

* RMSD between 42 vibrational modes of C4H11S+ computed at MP2/6-311+G* and via normal mode analysis in AMBER
** Parameters derived in Markham et al. using new SP atom type with CT, H1, and HC
*** GAFF atom types = s4, c3, h1, and hc or ss, c3, h1, and hc
**** ff99 atom types = S, CT, H1 and HC

As was the case for the parameterization involving trimethylsulfonium, the AMBER atom type with the minimum RMSD between QM and MM frequencies was the S atom type from ff99. In combination with the CT, H1, and HC atom types to model ethyldimethylsulfonium, the S atom of ff99 reproduced the 42 vibrational frequencies with an RMSD of 61.3 cm-1. The Markham parameters were the poorest performing parameters tested, with an RMSD of 82.8 cm-1. The best parameters for the new S3 atom type minimized the RMSD to 47.175 cm-1, which was in combination with the HP hydrogen and a 113.686° C-C-S+ angle as measured from the high-resolution crystal structures. However, all of the S3 parameters derived for the various hydrogen and C-C-S+ angle combinations performed similarly, with the RMSD between QM and MM frequencies ranging from 47.175 to 48.019 cm-1 over six combinations. In each instance, the use of 113.686° as the equilibrium C-C-S+ angle resulted in a marginally better fit between the QM and MM frequencies.

The final step in the parameterization of the sulfonium center of SAM was to fit the V3 parameter relating to the C-C-S+-C dihedral angle of ethyldimethylsulfonium by fitting the relative energies of an MM torsional profile to a QM profile calculated at an appropriate level of theory. The results for the case of S3 in combination with the H1 hydrogen atom type and an equilibrium C-C-S+ angle of 113.686° are shown in Figure 5.19. As was the case with optimizing the V3 force constant for the H-C-S+-C torsion of trimethylsulfonium, the initial V3 parameters derived for each case needed to be increased to better reproduce the QM profile. However, as shown in Table 5.8, this only resulted in a slight decrease in the ability of the parameters to recreate the QM-derived vibrational frequencies.

Figure 5.19. Fitting of the C-C-S-C V3 parameter. A. MM profiles with different V3 values compared to the ab initio torsional profile calculated at MP4/aug-cc-pVDZ//MP2/aug-cc-pVDZ. B. RMSD between QM and MM relative potential energies computed over 25 points at 15° intervals from -180° to 180° for various V3 values.

Table 5.8. RMSD between MM and QM frequencies for C4H11S+ with reparameterized V3.

Atom types   C-C-S+ angle   Original RMSD*   New RMSD**
S3,CT,H1     110.921°       47.678           47.693
S3,CT,HP     110.921°       47.163           47.195
S3,CT,HC     110.921°       48.084           48.093
S3,CT,H1     113.686°       47.500           47.528
S3,CT,HP     113.686°       47.016           47.060
S3,CT,HC     113.686°       47.904           47.923

* RMSD using parameters listed in Table 5.7.
** RMSD using new V3 parameters listed in Table 5.9.

The final V3 parameters for the C-C-S+-C dihedral angle of ethyldimethylsulfonium are presented in Table 5.9. As was the case in reproducing the vibrational frequencies, the parameters using the experimental C-C-S+ angle of 113.686° from the high-resolution crystal complexes that included SAM allowed for a more accurate fit to the QM C-C-S+-C dihedral profile. Overall, the best combination of atom types to describe the sulfonium center of SAM involves the use of the H1 atom type to describe the hydrogen atoms bound to the carbons adjacent to the sulfur atom. Of the established AMBER sulfur atom types, s4 from GAFF was found to outperform S and ss, and the SP atom type derived by Markham et al. performed very poorly in recreating the QM C-C-S+-C profile.

Table 5.9. C-C-S-C dihedral scan results.

Dihedral angle       C-C-S       V3 (kcal/mol)   RMSD (kcal/mol)*
CT-CT-S3-CT (H1)     110.921°    0.17            0.168
CT-CT-S3-CT (HP)     110.921°    0.21            0.133
CT-CT-S3-CT (HC)     110.921°    0.17            0.183
CT-CT-S3-CT (H1)     113.686°    0.23            0.104
CT-CT-S3-CT (HP)     113.686°    0.26            0.107
CT-CT-S3-CT (HC)     113.686°    0.22            0.116
CT-CT-SP-CT          110.276°                    3.420
CT-CT-S-CT           114.70°                     0.952
c3-c3-s4-c3          111.46°                     0.795
c3-c3-ss-c3          112.69°                     0.995

* RMSD between QM and MM profiles over 25 points from -180° to 180°

Figures 5.20 and 5.21 show how well the various parameters recreate the C-C-S+-C profile. The best performing established sulfur atom type, s4, does a good job of recreating the small energy barrier at 120°; however, as better illustrated in Figure 5.21, use of the s4 atom type overestimates the larger barriers by > 1 kcal/mol. The error for the new parameterization is < 0.2 kcal/mol over the entire dihedral scan.

Figure 5.20. Comparison of C-C-S-C torsional profile with new and existing force field parameters. Four existing force field parameters were compared to an ab initio profile calculated at MP4/aug-cc-pVDZ//MP2/aug-cc-pVDZ: GAFF parameters (c3-c3-s4-c3, orange and c3-c3-ss-c3, green), ff99 (CT-CT-S-CT, blue), and those published by Markham et al. (CT-CT-SP-CT, purple). The parameterization performed with the new S3 atom type is also included (CT-CT-S3-CT, red).

Figure 5.21. Absolute energy difference between QM and MM C-C-S-C torsional profiles. The absolute values of the energy difference between the QM and MM profiles plotted in Figure 5.20 are presented using the same color scheme for the various force field parameters.

Overall, the new parameters derived to model the sulfonium center of SAM are a large improvement over the existing AMBER sulfur types and the parameters that were derived by Markham et al. None of the available AMBER sulfur atom types were explicitly derived to treat sulfonium ions; it is therefore not unexpected that the new parameters presented here improve the description of the test molecules. The Markham parameters, however, were specifically derived to treat sulfonium ions, yet they perform the worst at describing the test compounds.

The final parameters best suited for modeling the sulfonium center of SAM are summarized in Table 5.10.

Table 5.10. Final sulfonium parameters.

Parameter      Equilibrium value   Force constant (kcal/mol)
S3-CT          1.816 Å             222.4
CT-S3-CT       101.336°            55.1
S3-CT-H1       107.929°            42.9
S3-CT-CT       113.686°            50.0
H1-CT-S3-CT    V3 potential        0.18
CT-CT-S3-CT    V3 potential        0.23
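For use in AMBER, parameters such as those in Table 5.10 are typically supplied through a force field modification (frcmod) file. The sketch below formats the tabulated values into BOND, ANGLE, and DIHE sections. It rests on several assumptions that should be checked before any real use: the dihedral phase of 0° and the periodicity of 3 are assumed here, the tabulated V3 is assumed to map directly onto the dihedral force constant column, and the MASS and NONBON entries that a new atom type such as S3 also requires are omitted because they are not given in this section. The function name and column widths are hypothetical.

```python
# Values transcribed from Table 5.10 (units as given in the table).
BONDS = [("S3", "CT", 222.4, 1.816)]
ANGLES = [
    ("CT", "S3", "CT", 55.1, 101.336),
    ("S3", "CT", "H1", 42.9, 107.929),
    ("S3", "CT", "CT", 50.0, 113.686),
]
DIHEDRALS = [
    ("H1", "CT", "S3", "CT", 0.18),
    ("CT", "CT", "S3", "CT", 0.23),
]

def frcmod_text(idivf=1, phase=0.0, periodicity=3.0):
    """Format the parameters as frcmod-style BOND/ANGLE/DIHE sections.

    phase and periodicity are assumptions of this sketch; idivf=1
    follows the IDIVF=1 heading of Table 5.6.
    """
    out = ["S3 sulfonium parameters (illustrative sketch)", "BOND"]
    for a, b, k, r0 in BONDS:
        out.append(f"{a:<2}-{b:<2}  {k:7.2f}  {r0:7.3f}")
    out += ["", "ANGLE"]
    for a, b, c, k, theta0 in ANGLES:
        out.append(f"{a:<2}-{b:<2}-{c:<2}  {k:7.2f}  {theta0:8.3f}")
    out += ["", "DIHE"]
    for a, b, c, d, v3 in DIHEDRALS:
        out.append(
            f"{a:<2}-{b:<2}-{c:<2}-{d:<2}  {idivf:2d}  {v3:6.3f}"
            f"  {phase:7.3f}  {periodicity:5.3f}"
        )
    out.append("")
    return "\n".join(out)

frcmod = frcmod_text()
print(frcmod)
```

Keeping the numerical values in one place and generating the file text from them makes it easy to swap in the alternative hydrogen atom types (HP or HC) tested above.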

5.5 Derivation of Partial Atomic Charges for SAM

The following section describes the methods used to derive sets of partial atomic charges for the simulation of SAM in complex with various biomolecules. Based on the findings from the survey of crystal structures containing SAM as described in Section 5.3, the nucleoside portion of SAM may be found in the anti-, high anti-, or syn-conformation. Therefore, general sets of partial atomic charges for each glycosidic conformation were derived to benefit the general simulation community.

5.5.1 Charge Fitting Procedure

Multiple conformations were selected for each glycosidic conformation, with the goal of representing a variety of methionine conformations. Before stripping the SAM coordinates from their respective PDB files, the structures were first protonated using the ‘AddH’ tool in Chimera such that proton placements optimized potential hydrogen bonding between the SAM molecules and their respective binding pockets. Then the SAM coordinates were stripped and the conformations were geometry optimized with Gaussian 09 at the HF/6-31G* level.

Geometry optimization in the gas phase was found to lead to proton transfer that neutralized the zwitterionic tail of SAM. Optimization in the gas phase also led to conformational changes resulting in the formation of intramolecular hydrogen bonds. Both of these results are undesirable, since under physiological conditions SAM should be a zwitterion, and calculation of molecular electrostatic potentials (MEPs) of structures with intramolecular hydrogen bonds would result in over-polarized partial atomic charges. To overcome the undesired conformations resulting from geometry optimization in the gas phase, a solvent reaction field using the polarizable continuum model of Gaussian with the integral equation formalism (IEFPCM) was used. An example of the improved optimization performance achieved through the use of a polarizable continuum model is illustrated in Figure 5.22.

After HF/6-31G* energy minimization, partial atomic charges were derived using the RESP method, in which the default two-stage fitting procedure was implemented. In the first stage, weak hyperbolic restraints of a=0.0005 were applied to all heavy atoms. In the second stage, the heavy atom charges were fixed to those of the first stage, while a stronger restraint of a=0.001 was applied to force dynamically equivalent hydrogens to bear the same partial atomic charge [250]. The RESP method as implemented in the R.E.D.-III.4 program was used to automate the procedure in addition to providing highly reproducible sets of charges [197]. Each conformation was reoriented three times using the C4’, O4’, and C1’ atoms of SAM as the basis for the reorientation (REMARK REORIENT C4’ O4’ C1’ | C1’ O4’ C4’ | C4’ C1’ O4’). The MEPs were calculated in the gas phase at the HF/6-31G* level, consistent with the methods used for the derivation of partial atomic charges for ff99 and GAFF. Due to fortuitous errors that arise with gas phase HF computations, the resulting charges are a close reflection of those in the aqueous state.
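The hyperbolic restraint used in the two RESP stages can be illustrated with a short function. This is a sketch of the commonly published form of the penalty, a·Σ(√(q² + b²) − b); the tightness parameter b = 0.1 e is the customary default and is an assumption here, since it is not stated in this section. The charges in the example are hypothetical.

```python
import math

def hyperbolic_restraint(charges, a, b=0.1):
    """Hyperbolic penalty: a * sum(sqrt(q^2 + b^2) - b).

    a is the restraint weight (0.0005 in stage one, 0.001 in stage two
    according to the text); b = 0.1 e is an assumed default.
    """
    return a * sum(math.sqrt(q * q + b * b) - b for q in charges)

# Hypothetical restrained charges (in units of e).
trial_charges = [-0.50, 0.20, 0.00]

stage_one = hyperbolic_restraint(trial_charges, a=0.0005)
stage_two = hyperbolic_restraint(trial_charges, a=0.001)
print(stage_one, stage_two)
```

The penalty vanishes for a charge of zero and grows nearly linearly for large charges, which is why it damps the magnitude of poorly determined charges without strongly affecting well determined ones.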

Figure 5.22. Geometry optimization of SAM. A. Ten protonated SAM conformations with similar χ angles. B. HF/6-31G* optimized geometries in vacuo. C. HF/6-31G* optimized geometries with IEFPCM, εr=4.0.

5.5.2 Charge Fitting Results

Numerous SAM structures were energy minimized at the HF/6-31G* level with IEFPCM, εr=78.3553, to simulate a water environment. While the implicit water environment prevented the formation of intramolecular hydrogen bonds and the transfer of protons, it did not prevent some structures from interconverting between the anti- and high anti-conformations. However, in the end, five unique, energy-minimized structures were obtained for each glycosidic conformation and used for derivation of partial atomic charges using the RESP method as implemented by R.E.D. These structures are listed in Tables 5.11-13.

Table 5.11. PDB structures used for derivation of partial atomic charges representing the anti-conformation.

PDB    Description                                 Starting χ   Min χ
2EGV   rRNA methyltransferase                      -179.62°     -171.32°
2YY8   tRNA methyltransferase (SPOUT)              -170.47°     -168.90°
3DCM   rRNA methyltransferase (SPOUT)              -167.50°     -165.30°
1MSK   Methionine synthase                         -167.12°     -162.28°
3OPE   Histone-lysine N-methyltransferase (SET)    -176.16°     -164.30°

Table 5.12. PDB structures used for derivation of partial atomic charges representing the high anti-conformation.

PDB    Description                                 Starting χ   Min χ
1RJD   Protein phosphatase methyltransferase 1     -130.77°     -124.27°
2QE6   Putative methyltransferase                  -124.37°     -125.47°
2Q6O   SAM-dependent chlorinase                    -123.89°     -108.08°
3FPJ   Nicotianamine synthase                      -116.67°     -116.98°
3BWC   Spermidine synthase                         -104.74°     -126.90°

Table 5.13. PDB structures used for derivation of partial atomic charges representing the syn-conformation.

PDB    Description                                 Starting χ   Min χ
1XVA   Glycine N-methyltransferase                 42.40°       62.96°
3GX6   SAM-I riboswitch                            46.46°       60.06°
2CDQ   Aspartate kinase                            52.10°       62.00°
2C2B   Threonine synthase                          54.89°       58.23°
3E5C   SAM-III riboswitch                          58.49°       61.83°
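The extent to which minimization moved each structure along χ can be read from the starting and minimized angles in these tables. The sketch below computes the signed change for the high anti set of Table 5.12, wrapping the difference into the range [-180°, 180°) so that changes across the ±180° boundary are handled correctly. The function name is hypothetical, and no conformational cutoffs are assumed.

```python
def angle_shift(start_deg, end_deg):
    """Signed change end - start, wrapped into [-180, 180) degrees."""
    return (end_deg - start_deg + 180.0) % 360.0 - 180.0

# (PDB ID, starting chi, minimized chi) transcribed from Table 5.12.
high_anti = [
    ("1RJD", -130.77, -124.27),
    ("2QE6", -124.37, -125.47),
    ("2Q6O", -123.89, -108.08),
    ("3FPJ", -116.67, -116.98),
    ("3BWC", -104.74, -126.90),
]

shifts = {pdb: angle_shift(start, end) for pdb, start, end in high_anti}
largest = max(shifts, key=lambda pdb: abs(shifts[pdb]))

for pdb, shift in shifts.items():
    print(f"{pdb}: {shift:+.2f} deg")
print("largest change:", largest)
```

The same function applies unchanged to the anti set of Table 5.11, where the angles lie close to the ±180° boundary.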

The resulting multiconformation, multiorientation partial atomic charges derived for each of the three glycosidic conformations are listed in Table 5.14 and visualized in Figure 5.23. The charges derived for the individual conformations are found in Appendix O. Overall, the charges for the anti- and high anti-conformations are very similar. Differences are observed for the syn-conformation, in which the C-S+ bonds are more polarized than in the two anti-conformations. Additionally, the C2’ of the syn-conformation carries a significantly greater positive charge than in the anti-conformations, likely due to the position of the adenine group with respect to the ribose.

Table 5.14. Comparison of partial atomic charges for SAM in multiple conformations.

                          χ orientation
Group        Atom         anti       high anti   syn
methionine   N            -0.4441    -0.5219     -0.5191
methionine   H1,2,3        0.3186     0.3444      0.3329
methionine   CA            0.2025     0.0610      0.2834
methionine   HA            0.0321     0.0758      0.0377
methionine   C             0.6877     0.8069      0.6252
methionine   O, OXT       -0.6668    -0.7014     -0.6520
methionine   CB           -0.0773    -0.0132     -0.0987
methionine   2,3HB         0.0685     0.0615      0.0721
methionine   CG           -0.1280    -0.1269     -0.2988
methionine   2,3HG         0.1267     0.1230      0.1743
methionine   SD            0.1316     0.1699      0.2824
methionine   CE           -0.0938    -0.1827     -0.2586
methionine   1,2,3HE       0.1245     0.1435      0.1614
ribose       C5’          -0.0775    -0.1238     -0.1456
ribose       H5’,’’        0.1407     0.1431      0.1439
ribose       C4’           0.0960     0.1442      0.0259
ribose       H4’           0.1717     0.1254      0.1590
ribose       O4’          -0.3736    -0.3963     -0.3485
ribose       C3’           0.1310     0.1037      0.1051
ribose       H3’           0.0459     0.0761      0.0610
ribose       O3’          -0.6347    -0.5751     -0.5780
ribose       HO3’          0.4671     0.4102      0.4217
ribose       C2’           0.0919     0.1130      0.3038
ribose       H2’           0.1278     0.0610      0.0407
ribose       O2’          -0.5422    -0.5483     -0.5866
ribose       HO2’          0.3874     0.4130      0.3879
ribose       C1’          -0.0246     0.0864     -0.0168
ribose       H1’           0.2044     0.1813      0.1558
adenine      N9            0.0001    -0.0326     -0.0119
adenine      C8            0.1398     0.1219      0.1226
adenine      H8            0.1587     0.1373      0.1744
adenine      N7           -0.5740    -0.5354     -0.5389
adenine      C5            0.0638     0.0133      0.0742
adenine      C6            0.6991     0.6777      0.6520
adenine      N6           -0.9016    -0.8644     -0.8602
adenine      H61,2         0.4144     0.4017      0.4062
adenine      N1           -0.7215    -0.7240     -0.6747
adenine      C2            0.5526     0.5755      0.4524
adenine      H2            0.0736     0.0717      0.0681
adenine      N3           -0.6923    -0.7388     -0.5841
adenine      C4            0.3241     0.4386      0.3153
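A simple consistency check on a charge set such as Table 5.14 is that the charges sum to the net charge of the molecule, which is +1 for zwitterionic SAM with its sulfonium center. The sketch below performs this sum for each conformation using the tabulated values. The atom multiplicities (for example, three equivalent protons for H1,2,3 and two equivalent oxygens for O, OXT) are read from the atom labels and are an interpretation made here rather than something stated explicitly in the table.

```python
# (atom label, number of equivalent atoms, anti, high anti, syn)
# Charges transcribed from Table 5.14; multiplicities inferred from labels.
CHARGES = [
    ("N", 1, -0.4441, -0.5219, -0.5191),
    ("H1,2,3", 3, 0.3186, 0.3444, 0.3329),
    ("CA", 1, 0.2025, 0.0610, 0.2834),
    ("HA", 1, 0.0321, 0.0758, 0.0377),
    ("C", 1, 0.6877, 0.8069, 0.6252),
    ("O,OXT", 2, -0.6668, -0.7014, -0.6520),
    ("CB", 1, -0.0773, -0.0132, -0.0987),
    ("2,3HB", 2, 0.0685, 0.0615, 0.0721),
    ("CG", 1, -0.1280, -0.1269, -0.2988),
    ("2,3HG", 2, 0.1267, 0.1230, 0.1743),
    ("SD", 1, 0.1316, 0.1699, 0.2824),
    ("CE", 1, -0.0938, -0.1827, -0.2586),
    ("1,2,3HE", 3, 0.1245, 0.1435, 0.1614),
    ("C5'", 1, -0.0775, -0.1238, -0.1456),
    ("H5',H5''", 2, 0.1407, 0.1431, 0.1439),
    ("C4'", 1, 0.0960, 0.1442, 0.0259),
    ("H4'", 1, 0.1717, 0.1254, 0.1590),
    ("O4'", 1, -0.3736, -0.3963, -0.3485),
    ("C3'", 1, 0.1310, 0.1037, 0.1051),
    ("H3'", 1, 0.0459, 0.0761, 0.0610),
    ("O3'", 1, -0.6347, -0.5751, -0.5780),
    ("HO3'", 1, 0.4671, 0.4102, 0.4217),
    ("C2'", 1, 0.0919, 0.1130, 0.3038),
    ("H2'", 1, 0.1278, 0.0610, 0.0407),
    ("O2'", 1, -0.5422, -0.5483, -0.5866),
    ("HO2'", 1, 0.3874, 0.4130, 0.3879),
    ("C1'", 1, -0.0246, 0.0864, -0.0168),
    ("H1'", 1, 0.2044, 0.1813, 0.1558),
    ("N9", 1, 0.0001, -0.0326, -0.0119),
    ("C8", 1, 0.1398, 0.1219, 0.1226),
    ("H8", 1, 0.1587, 0.1373, 0.1744),
    ("N7", 1, -0.5740, -0.5354, -0.5389),
    ("C5", 1, 0.0638, 0.0133, 0.0742),
    ("C6", 1, 0.6991, 0.6777, 0.6520),
    ("N6", 1, -0.9016, -0.8644, -0.8602),
    ("H61,2", 2, 0.4144, 0.4017, 0.4062),
    ("N1", 1, -0.7215, -0.7240, -0.6747),
    ("C2", 1, 0.5526, 0.5755, 0.4524),
    ("H2", 1, 0.0736, 0.0717, 0.0681),
    ("N3", 1, -0.6923, -0.7388, -0.5841),
    ("C4", 1, 0.3241, 0.4386, 0.3153),
]

CONFORMATIONS = ("anti", "high anti", "syn")

def net_charge(column):
    """Sum charge times multiplicity for one conformation column (0, 1, or 2)."""
    return sum(row[1] * row[2 + column] for row in CHARGES)

totals = {name: net_charge(i) for i, name in enumerate(CONFORMATIONS)}
n_atoms = sum(row[1] for row in CHARGES)
for name, total in totals.items():
    print(f"{name}: {total:+.4f} e")
```

Running such a check after transcribing charges into a library or parameter file helps catch copying errors before a simulation is set up.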

Figure 5.23. Partial atomic charges for SAM with three different χ angles (panels: anti, high anti, syn). Each set of RESP charges was determined using five unique conformations each reoriented three times. Charges are depicted on a color scale ranging from -1.0 e (red) to +1.0 e (blue), while actual charge extremes are -0.9016 e and +0.8069 e.

References

1. States U (2005) Computational Science: Ensuring America's Competitiveness. Washington: The Commission.

2. Kohn W, Sham LJ (1965) Self-Consistent Equations Including Exchange and Correlation Effects. Physical Review 140: 1133-1138.

3. Newton MD, Lathan WA, Hehre WJ, Pople JA (1970) Self-Consistent Molecular Orbital Methods .5. Ab-Initio Calculation Equilibrium Geometries and Quadratic Force Constants. Journal of Chemical Physics 52: 4064-4072.

4. Ditchfield R, Hehre WJ, Pople JA (1970) Self-Consistent Molecular Orbital Methods .6. Energy Optimized Gaussian Atomic Orbitals. Journal of Chemical Physics 52: 5001-5007.

5. Ditchfield R, Miller DP, Pople JA (1970) Self-Consistent Molecular-Orbital Methods .7. Convergence of Gaussian Expansions of Slater-Type Atomic Orbitals in Calculations of First-Order and Second-Order Properties. Journal of Chemical Physics 53: 613-619.

6. Hill TL (1946) Statistical Mechanics of Multimolecular Adsorption .2. Localized and Mobile Adsorption and Absorption. Journal of Chemical Physics 14: 441-453.

7. Westheimer FH, Mayer JE (1946) The Theory of the Racemization of Optically Active Derivatives of Diphenyl. Journal of Chemical Physics 14: 733-738.

8. Lifson S, Warshel A (1968) Consistent Force Field for Calculations of Conformations Vibrational Spectra and Enthalpies of Cycloalkane and N-Alkane Molecules. Journal of Chemical Physics 49: 5116-5129.

9. Rahman A (1964) Correlations in Motion of Atoms in Liquid Argon. Physical Review a-General Physics 136: A405-411.

10. McCammon JA, Gelin BR, Karplus M (1977) Dynamics of folded proteins. Nature 267: 585-590.

11. Williamson MP, Havel TF, Wuthrich K (1985) Solution Conformation of Proteinase Inhibitor-Iia from Bull Seminal Plasma by H-1 Nuclear Magnetic-Resonance and Distance Geometry. Journal of Molecular Biology 182: 295-315.

12. Sugita Y, Okamoto Y (1999) Replica-exchange molecular dynamics method for protein folding. Chemical Physics Letters 314: 141-151.

13. Sharp KA, Honig B (1990) Calculating Total Electrostatic Energies with the Nonlinear Poisson-Boltzmann Equation. Journal of Physical Chemistry 94: 7684-7692.

14. Jeancharles A, Nicholls A, Sharp K, Honig B, Tempczyk A, et al. (1991) Electrostatic Contributions to Solvation Energies - Comparison of Free-Energy Perturbation and Continuum Calculations. Journal of the American Chemical Society 113: 1454-1455.

15. Bashford D, Case DA (2000) Generalized born models of macromolecular solvation effects. Annual Review of Physical Chemistry 51: 129-152.

16. Zhao G, Perilla JR, Yufenyuy EL, Meng X, Chen B, et al. (2013) Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics. Nature 497: 643-646.

17. Shirts M, Pande VS (2000) COMPUTING: Screen Savers of the World Unite! Science 290: 1903-1904.

18. Voelz VA, Jager M, Yao SH, Chen YJ, Zhu L, et al. (2012) Slow Unfolded-State Structuring in Acyl-CoA Binding Protein Folding Revealed by Simulation and Experiment. Journal of the American Chemical Society 134: 12565-12577.

19. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, et al. (2010) Atomic-level characterization of the structural dynamics of proteins. Science 330: 341-346.

20. Freddolino PL, Park S, Roux B, Schulten K (2009) Force field bias in protein folding simulations. Biophys J 96: 3772-3780.

21. Wuthrich K, Wagner G (1975) NMR investigations of the dynamics of the aromatic amino acid residues in the basic pancreatic trypsin inhibitor. FEBS Lett 50: 265-268.

22. Liu F, Du D, Fuller AA, Davoren JE, Wipf P, et al. (2008) An experimental survey of the transition between two-state and downhill protein folding scenarios. Proc Natl Acad Sci U S A 105: 2369-2374.

23. Gotz AW, Williamson MJ, Xu D, Poole D, Le Grand S, et al. (2012) Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born. Journal of Chemical Theory and Computation 8: 1542-1555.

24. Salomon-Ferrer R, Gotz AW, Poole D, Le Grand S, Walker RC (2013) Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald. Journal of Chemical Theory and Computation 9: 3878-3888.

25. Lindorff-Larsen K, Piana S, Palmo K, Maragakis P, Klepeis JL, et al. (2010) Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins-Structure Function and Bioinformatics 78: 1950-1958.

26. Piana S, Lindorff-Larsen K, Shaw DE (2011) How Robust Are Protein Folding Simulations with Respect to Force Field Parameterization? Biophysical Journal 100: L47-L49.

27. Lindorff-Larsen K, Piana S, Dror RO, Shaw DE (2011) How Fast-Folding Proteins Fold. Science 334: 517-520.

28. Raval A, Piana S, Eastwood MP, Dror RO, Shaw DE (2012) Refinement of protein structure homology models via long, all-atom molecular dynamics simulations. Proteins 80: 2071-2079.

29. Beauchamp KA, Lin YS, Das R, Pande VS (2012) Are Protein Force Fields Getting Better? A Systematic Benchmark on 524 Diverse NMR Measurements. J Chem Theory Comput 8: 1409-1414.

30. Cerutti DS, Freddolino PL, Duke RE, Jr., Case DA (2010) Simulations of a protein crystal with a high resolution X-ray structure: evaluation of force fields and water models. J Phys Chem B 114: 12811-12824.

31. Wang JM, Cieplak P, Kollman PA (2000) How well does a restrained electrostatic potential (RESP) model perform in calculating conformational energies of organic and biological molecules? Journal of Computational Chemistry 21: 1049-1074.

32. Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, et al. (2006) Comparison of multiple amber force fields and development of improved protein backbone parameters. Proteins-Structure Function and Bioinformatics 65: 712-725.

33. Li DW, Bruschweiler R (2010) NMR-Based Protein Potentials. Angewandte Chemie-International Edition 49: 6778-6780.

34. MacKerell AD, Bashford D, Bellott M, Dunbrack RL, Evanseck JD, et al. (1998) All-atom empirical potential for molecular modeling and dynamics studies of proteins. Journal of Physical Chemistry B 102: 3586-3616.

35. Mackerell AD, Feig M, Brooks CL (2004) Extending the treatment of backbone energetics in protein force fields: Limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. Journal of Computational Chemistry 25: 1400-1415.

36. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML (1983) Comparison of Simple Potential Functions for Simulating Liquid Water. Journal of Chemical Physics 79: 926-935.

37. Darden T, York D, Pedersen L (1993) Particle Mesh Ewald - an N·log(N) Method for Ewald Sums in Large Systems. Journal of Chemical Physics 98: 10089-10092.

38. Essmann U, Perera L, Berkowitz ML, Darden T, Lee H, et al. (1995) A Smooth Particle Mesh Ewald Method. Journal of Chemical Physics 103: 8577-8593.

39. Horn HW, Swope WC, Pitera JW, Madura JD, Dick TJ, et al. (2004) Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. J Chem Phys 120: 9665-9678.

40. Abascal JL, Vega C (2005) A general purpose model for the condensed phases of water: TIP4P/2005. J Chem Phys 123: 234505.

41. Cerutti DS, Rice JE, Swope WC, Case DA (2013) Derivation of fixed partial charges for amino acids accommodating a specific water model and implicit polarization. J Phys Chem B 117: 2328-2338.

42. Ren P, Ponder JW (2002) Consistent treatment of inter- and intramolecular polarization in molecular mechanics calculations. J Comput Chem 23: 1497-1506.

43. Grossfield A, Ren P, Ponder JW (2003) Ion solvation thermodynamics from simulation with a polarizable force field. J Am Chem Soc 125: 15671-15682.

44. Duan Y, Wu C, Chowdhury S, Lee MC, Xiong G, et al. (2003) A point-charge force field for molecular mechanics simulations of proteins based on condensed-phase quantum mechanical calculations. J Comput Chem 24: 1999-2012.

45. Lamoureux G, Harder E, Vorobyov IV, Roux B, MacKerell AD (2006) A polarizable model of water for molecular dynamics simulations of biomolecules. Chemical Physics Letters 418: 245-249.

46. Ponder JW, Wu C, Ren P, Pande VS, Chodera JD, et al. (2010) Current status of the AMOEBA polarizable force field. J Phys Chem B 114: 2549-2564.

47. Roberts NA, Martin JA, Kinchington D, Broadhurst AV, Craig JC, et al. (1990) Rational design of peptide-based HIV proteinase inhibitors. Science 248: 358-361.

48. Wlodawer A, Vondrasek J (1998) Inhibitors of HIV-1 protease: a major success of structure-assisted drug design. Annu Rev Biophys Biomol Struct 27: 249-284.

49. Friesner RA, Murphy RB, Repasky MP, Frye LL, Greenwood JR, et al. (2006) Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J Med Chem 49: 6177-6196.

50. Trott O, Olson AJ (2010) AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 31: 455-461.

51. Zwanzig RW (1954) High-Temperature Equation of State by a Perturbation Method. I. Nonpolar Gases. Journal of Chemical Physics 22: 1420-1426.

52. Kirkwood JG (1935) Statistical mechanics of fluid mixtures. Journal of Chemical Physics 3: 300-313.

53. den Otter WK, Briels WJ (1998) The calculation of free-energy differences by constrained molecular-dynamics simulations. Journal of Chemical Physics 109: 4139-4146.

54. Srinivasan J, Miller J, Kollman PA, Case DA (1998) Continuum solvent studies of the stability of RNA hairpin loops and helices. J Biomol Struct Dyn 16: 671-682.

55. Kollman PA, Massova I, Reyes C, Kuhn B, Huo S, et al. (2000) Calculating structures and free energies of complex molecules: combining molecular mechanics and continuum models. Acc Chem Res 33: 889-897.

56. Sitkoff D, Sharp KA, Honig B (1994) Accurate Calculation of Hydration Free-Energies Using Macroscopic Solvent Models. Journal of Physical Chemistry 98: 1978-1988.

57. Tan C, Tan YH, Luo R (2007) Implicit nonpolar solvent models. Journal of Physical Chemistry B 111: 12263-12274.

58. Jensen AA, Frolund B, Liljefors T, Krogsgaard-Larsen P (2005) Neuronal nicotinic acetylcholine receptors: structural revelations, target identifications, and therapeutic inspirations. J Med Chem 48: 4705-4745.

59. Gotti C, Zoli M, Clementi F (2006) Brain nicotinic acetylcholine receptors: native subtypes and their relevance. Trends Pharmacol Sci 27: 482-491.

60. Albuquerque EX, Pereira EF, Alkondon M, Rogers SW (2009) Mammalian nicotinic acetylcholine receptors: from structure to function. Physiol Rev 89: 73-120.

61. Cooper E, Couturier S, Ballivet M (1991) Pentameric structure and subunit stoichiometry of a neuronal nicotinic acetylcholine receptor. Nature 350: 235-238.

62. Anand R, Conroy WG, Schoepfer R, Whiting P, Lindstrom J (1991) Neuronal nicotinic acetylcholine receptors expressed in Xenopus oocytes have a pentameric quaternary structure. J Biol Chem 266: 11192-11198.

63. Wang F, Gerzanich V, Wells GB, Anand R, Peng X, et al. (1996) Assembly of human neuronal nicotinic receptor alpha 5 subunits with alpha 3, beta 2, and beta 4 subunits. Journal of Biological Chemistry 271: 17656-17665.

64. Nelson ME, Kuryatov A, Choi CH, Zhou Y, Lindstrom J (2003) Alternate stoichiometries of alpha4beta2 nicotinic acetylcholine receptors. Mol Pharmacol 63: 332-341.

65. Gotti C, Clementi F, Fornari A, Gaimarri A, Guiducci S, et al. (2009) Structural and functional diversity of native brain neuronal nicotinic receptors. Biochem Pharmacol 78: 703-711.

66. Unwin N (2005) Refined structure of the nicotinic acetylcholine receptor at 4 Å resolution. J Mol Biol 346: 967-989.

67. Levin ED, Simon BB (1998) Nicotinic acetylcholine involvement in cognitive function in animals. Psychopharmacology 138: 217-230.

68. Damaj MI, Meyer EM, Martin BR (2000) The antinociceptive effects of alpha 7 nicotinic agonists in an acute pain model. Neuropharmacology 39: 2785-2791.

69. Dani JA, De Biasi M (2001) Cellular mechanisms of nicotine addiction. Pharmacology Biochemistry and Behavior 70: 439-446.

70. Tapper AR, McKinney SL, Nashmi R, Schwarz J, Deshpande P, et al. (2004) Nicotine activation of alpha 4* receptors: Sufficient for reward, tolerance, and sensitization. Science 306: 1029-1032.

71. Shimohama S (2009) Nicotinic receptor-mediated neuroprotection in neurodegenerative disease models. Biol Pharm Bull 32: 332-336.

72. Radek RJ, Kohlhaas KL, Rueter LE, Mohler EG (2010) Treating the cognitive deficits of schizophrenia with alpha4beta2 neuronal nicotinic receptor agonists. Curr Pharm Des 16: 309-322.

73. Steinlein OK, Bertrand D (2010) Nicotinic receptor channelopathies and epilepsy. Pflugers Arch 460: 495-503.

74. Hung RJ, McKay JD, Gaborieau V, Boffetta P, Hashibe M, et al. (2008) A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor subunit genes on 15q25. Nature 452: 633-637.

75. Mckay DB, Chang C, Gonzalez-Cestari TF, Mckay SB, El-Hajj RA, et al. (2007) Analogs of methyllycaconitine as novel noncompetitive inhibitors of nicotinic receptors: Pharmacological characterization, computational modeling, and pharmacophore development. Molecular Pharmacology 71: 1288-1297.

76. Gonzalez-Cestari TF, Henderson BJ, Pavlovicz RE, McKay SB, El-Hajj RA, et al. (2009) Effect of novel negative allosteric modulators of neuronal nicotinic receptors on cells expressing native and recombinant nicotinic receptors: implications for drug discovery. J Pharmacol Exp Ther 328: 504-515.

77. Brejc K, van Dijk WJ, Klaassen RV, Schuurmans M, van Der Oost J, et al. (2001) Crystal structure of an ACh-binding protein reveals the ligand-binding domain of nicotinic receptors. Nature 411: 269-276.

78. Celie PH, Klaassen RV, van Rossum-Fikkert SE, van Elk R, van Nierop P, et al. (2005) Crystal structure of acetylcholine-binding protein from Bulinus truncatus reveals the conserved structural scaffold and sites of variation in nicotinic acetylcholine receptors. J Biol Chem 280: 26457-26466.

79. Celie PH, Kasheverov IE, Mordvintsev DY, Hogg RC, van Nierop P, et al. (2005) Crystal structure of nicotinic acetylcholine receptor homolog AChBP in complex with an alpha-conotoxin PnIA variant. Nat Struct Mol Biol 12: 582-588.

80. Dellisanti CD, Yao Y, Stroud JC, Wang ZZ, Chen L (2007) Crystal structure of the extracellular domain of nAChR alpha1 bound to alpha-bungarotoxin at 1.94 Å resolution. Nat Neurosci 10: 953-962.

81. Karlin A (2001) The acetylcholine-binding protein: 'What's in a name?'. Pharmacogenomics J 1: 221-223.

82. Hansen SB, Talley TT, Radic Z, Taylor P (2004) Structural and ligand recognition characteristics of an acetylcholine-binding protein from Aplysia californica. Journal of Biological Chemistry 279: 24197-24202.

83. Celie PHN, Klaassen RV, van Rossum-Fikkert SE, van Elk R, van Nierop P, et al. (2005) Crystal structure of acetylcholine-binding protein from Bulinus truncatus reveals the conserved structural scaffold and sites of variation in nicotinic acetylcholine receptors. Journal of Biological Chemistry 280: 26457-26466.

84. Tzartos SJ, Cung MT, Demange P, Loutrari H, Mamalaki A, et al. (1991) The main immunogenic region (MIR) of the nicotinic acetylcholine receptor and the anti-MIR antibodies. Mol Neurobiol 5: 1-29.

85. Hansen SB, Sulzenbacher G, Huxford T, Marchot P, Taylor P, et al. (2005) Structures of Aplysia AChBP complexes with nicotinic agonists and antagonists reveal distinctive binding interfaces and conformations. Embo Journal 24: 3635-3646.

86. Celie PH, van Rossum-Fikkert SE, van Dijk WJ, Brejc K, Smit AB, et al. (2004) Nicotine and carbamylcholine binding to nicotinic acetylcholine receptors as studied in AChBP crystal structures. Neuron 41: 907-914.

87. Bourne Y, Talley TT, Hansen SB, Taylor P, Marchot P (2005) Crystal structure of a Cbtx-AChBP complex reveals essential interactions between snake alpha-neurotoxins and nicotinic receptors. EMBO J 24: 1512-1522.

88. Tasneem A, Iyer LM, Jakobsson E, Aravind L (2005) Identification of the prokaryotic ligand-gated ion channels and their implications for the mechanisms and origins of animal Cys-loop ion channels. Genome Biol 6: R4.

89. Hilf RJ, Dutzler R (2008) X-ray structure of a prokaryotic pentameric ligand-gated ion channel. Nature 452: 375-379.

90. Bocquet N, Nury H, Baaden M, Le Poupon C, Changeux JP, et al. (2009) X-ray structure of a pentameric ligand-gated ion channel in an apparently open conformation. Nature 457: 111-114.

91. Hilf RJ, Dutzler R (2009) Structure of a potentially open state of a proton-activated pentameric ligand-gated ion channel. Nature 457: 115-118.

92. Haddadian EJ, Cheng MH, Coalson RD, Xu Y, Tang P (2008) In silico models for the human alpha4beta2 nicotinic acetylcholine receptor. J Phys Chem B 112: 13981-13990.

93. Cheng X, Wang H, Grant B, Sine SM, McCammon JA (2006) Targeted molecular dynamics study of C-loop closure and channel gating in nicotinic receptors. PLoS Comput Biol 2: e134.

94. Le Novere N, Grutter T, Changeux JP (2002) Models of the extracellular domain of the nicotinic receptors and of agonist- and Ca2+-binding sites. Proc Natl Acad Sci U S A 99: 3210-3215.

95. Grazioso G, Cavalli A, De Amici M, Recanatini M, De Micheli C (2008) Alpha7 nicotinic acetylcholine receptor agonists: prediction of their binding affinity through a molecular mechanics Poisson-Boltzmann surface area approach. J Comput Chem 29: 2593-2602.

96. Huang X, Zheng F, Crooks PA, Dwoskin LP, Zhan CG (2005) Modeling multiple species of nicotine and deschloroepibatidine interacting with alpha4beta2 nicotinic acetylcholine receptor: from microscopic binding to phenomenological binding affinity. J Am Chem Soc 127: 14401-14414.

97. Huang X, Zheng F, Chen X, Crooks PA, Dwoskin LP, et al. (2006) Modeling subtype-selective agonists binding with alpha4beta2 and alpha7 nicotinic acetylcholine receptors: effects of local binding and long-range electrostatic interactions. J Med Chem 49: 7661-7674.

98. Huang X, Zheng F, Stokes C, Papke RL, Zhan CG (2008) Modeling binding modes of alpha7 nicotinic acetylcholine receptor with ligands: the roles of Gln117 and other residues of the receptor in agonist binding. J Med Chem 51: 6293-6302.

99. Iorga B, Herlem D, Barre E, Guillou C (2006) Acetylcholine nicotinic receptors: finding the putative binding site of allosteric modulators using the "blind docking" approach. J Mol Model 12: 366-372.

100. Babakhani A, Talley TT, Taylor P, McCammon JA (2009) A virtual screening study of the acetylcholine binding protein using a relaxed-complex approach. Comput Biol Chem 33: 160-170.

101. Ulens C, Akdemir A, Jongejan A, van Elk R, Bertrand S, et al. (2009) Use of acetylcholine binding protein in the search for novel alpha7 nicotinic receptor ligands. In silico docking, pharmacological screening, and X-ray analysis. J Med Chem 52: 2372-2383.

102. Hansen SB, Sulzenbacher G, Huxford T, Marchot P, Taylor P, et al. (2005) Structures of Aplysia AChBP complexes with nicotinic agonists and antagonists reveal distinctive binding interfaces and conformations. EMBO J 24: 3635-3646.

103. Jones DT (1999) Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol 292: 195-202.

104. Rost B, Sander C (1993) Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol 232: 584-599.

105. Baker D, Sali A (2001) Protein structure prediction and structural genomics. Science 294: 93-96.

106. Sali A, Blundell TL (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234: 779-815.

107. Case DA, Cheatham TE, 3rd, Darden T, Gohlke H, Luo R, et al. (2005) The Amber biomolecular simulation programs. J Comput Chem 26: 1668-1688.

108. Simmerling C, Miller JL, Kollman PA (1998) Combined locally enhanced sampling and particle mesh Ewald as a strategy to locate the experimental structure of a nonhelical nucleic acid. J Am Chem Soc 120: 7149-7155.

109. Feig M, Karanicolas J, Brooks CL, 3rd (2004) MMTSB Tool Set: enhanced sampling and multiscale modeling methods for applications in structural biology. J Mol Graph Model 22: 377-395.

110. (1993) Activation of Receptors Directly Coupled to Channels. In: Jackson MB, editor. Thermodynamics of Membrane Receptors and Channels. Boca Raton: CRC Press, Inc. pp. 250-268.

111. Xiu XA, Puskar NL, Shanata JAP, Lester HA, Dougherty DA (2009) Nicotine binding to brain receptors requires a strong cation-pi interaction. Nature 458: 534-537.

112. Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, et al. (1998) Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. J Comput Chem 19: 1639-1662.

113. Huey R, Morris GM, Olson AJ, Goodsell DS (2007) A semiempirical free energy force field with charge-based desolvation. J Comput Chem 28: 1145-1152.

114. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, et al. (2004) UCSF Chimera - A visualization system for exploratory research and analysis. J Comput Chem 25: 1605-1612.

115. Henderson BJ, Pavlovicz RE, Allen JD, Gonzalez-Cestari TF, Orac CM, et al. (2010) Negative Allosteric Modulators that Target Human alpha4beta2 Neuronal Nicotinic Receptors. J Pharmacol Exp Ther 334: 761-774.

116. Chavez-Noriega LE, Crona JH, Washburn MS, Urrutia A, Elliott KJ, et al. (1997) Pharmacological characterization of recombinant human neuronal nicotinic acetylcholine receptors h alpha 2 beta 2, h alpha 2 beta 4, h alpha 3 beta 2, h alpha 3 beta 4, h alpha 4 beta 2, h alpha 4 beta 4 and h alpha 7 expressed in Xenopus oocytes. J Pharmacol Exp Ther 280: 346-356.

117. Chavez-Noriega LE, Gillespie A, Stauderman KA, Crona JH, Claeps BO, et al. (2000) Characterization of the recombinant human neuronal nicotinic acetylcholine receptors alpha 3 beta 2 and alpha 4 beta 2 stably expressed in HEK293 cells. Neuropharmacology 39: 2543-2560.

118. Stauderman KA, Mahaffy LS, Akong M, Velicelebi G, Chavez-Noriega LE, et al. (1998) Characterization of human recombinant neuronal nicotinic acetylcholine receptor subunit combinations alpha 2 beta 4, alpha 3 beta 4 and alpha 4 beta 4 stably expressed in HEK293 cells. Journal of Pharmacology and Experimental Therapeutics 284: 777-789.

119. Kollman PA, Massova I, Reyes C, Kuhn B, Huo SH, et al. (2000) Calculating structures and free energies of complex molecules: Combining molecular mechanics and continuum models. Acc Chem Res 33: 889-897.

120. Parker MJ, Beck A, Luetje CW (1998) Neuronal nicotinic receptor beta 2 and beta 4 subunits confer large differences in agonist binding affinity. Molecular Pharmacology 54: 1132-1139.

121. Mukhtasimova N, Lee WY, Wang HL, Sine SM (2009) Detection and trapping of intermediate states priming nicotinic receptor channel opening. Nature 459: 451-455.

122. Bourne Y, Radic Z, Araoz R, Talley TT, Benoit E, et al. (2010) Structural determinants in phycotoxins and AChBP conferring high affinity binding and nicotinic AChR antagonism. Proc Natl Acad Sci U S A 107: 6076-6081.

123. Hibbs RE, Sulzenbacher G, Shi JX, Talley TT, Conrod S, et al. (2009) Structural determinants for interaction of partial agonists with acetylcholine binding protein and neuronal alpha 7 nicotinic acetylcholine receptor. Embo Journal 28: 3040-3051.

124. Eroglu A, Hruszkewycz DP, dela Sena C, Narayanasamy S, Riedl KM, et al. (2012) Naturally occurring eccentric cleavage products of provitamin A beta-carotene function as antagonists of retinoic acid receptors. J Biol Chem 287: 15886-15895.

125. Zhang Z, Burch PE, Cooney AJ, Lanz RB, Pereira FA, et al. (2004) Genomic analysis of the nuclear receptor family: new insights into structure, regulation, and evolution from the rat genome. Genome Res 14: 580-590.

126. Sluder AE, Maina CV (2001) Nuclear receptors in nematodes: themes and variations. Trends Genet 17: 206-213.

127. Giguere V, Ong ES, Segui P, Evans RM (1987) Identification of a receptor for the morphogen retinoic acid. Nature 330: 624-629.

128. Petkovich M, Brand NJ, Krust A, Chambon P (1987) A human retinoic acid receptor which belongs to the family of nuclear receptors. Nature 330: 444-450.

129. Hollenberg SM, Weinberger C, Ong ES, Cerelli G, Oro A, et al. (1985) Primary structure and expression of a functional human glucocorticoid receptor cDNA. Nature 318: 635-641.

130. Laudet V, Gronemeyer H (2002) The Nuclear Receptor Facts Book. London: Academic Press.

131. Horlein AJ, Naar AM, Heinzel T, Torchia J, Gloss B, et al. (1995) Ligand-independent repression by the thyroid hormone receptor mediated by a nuclear receptor co-repressor. Nature 377: 397-404.

132. Chen JD, Evans RM (1995) A transcriptional co-repressor that interacts with nuclear hormone receptors. Nature 377: 454-457.

133. Hu X, Li S, Wu J, Xia C, Lala DS (2003) Liver X receptors interact with corepressors to regulate gene expression. Mol Endocrinol 17: 1019-1026.

134. Downes M, Burke LJ, Bailey PJ, Muscat GE (1996) Two receptor interaction domains in the corepressor, N-CoR/RIP13, are required for an efficient interaction with Rev-erbA alpha and RVR: physical association is dependent on the E region of the orphan receptors. Nucleic Acids Res 24: 4379-4386.

135. Harding HP, Lazar MA (1995) The monomer-binding orphan receptor Rev-Erb represses transcription as a dimer on a novel direct repeat. Mol Cell Biol 15: 4791-4802.

136. Underhill C, Qutob MS, Yee SP, Torchia J (2000) A novel nuclear receptor corepressor complex, N-CoR, contains components of the mammalian SWI/SNF complex and the corepressor KAP-1. J Biol Chem 275: 40463-40470.

137. Lazar MA (2003) Nuclear receptor corepressors. Nucl Recept Signal 1: e001.

138. Guenther MG, Barak O, Lazar MA (2001) The SMRT and N-CoR corepressors are activating cofactors for histone deacetylase 3. Mol Cell Biol 21: 6091-6101.

139. Jepsen K, Hermanson O, Onami TM, Gleiberman AS, Lunyak V, et al. (2000) Combinatorial roles of the nuclear receptor corepressor in transcription and development. Cell 102: 753-763.

140. Onate SA, Tsai SY, Tsai MJ, O'Malley BW (1995) Sequence and characterization of a coactivator for the steroid hormone receptor superfamily. Science 270: 1354-1357.

141. Torchia J, Rose DW, Inostroza J, Kamei Y, Westin S, et al. (1997) The transcriptional co-activator p/CIP binds CBP and mediates nuclear-receptor function. Nature 387: 677-684.

142. Chen H, Lin RJ, Schiltz RL, Chakravarti D, Nash A, et al. (1997) Nuclear receptor coactivator ACTR is a novel histone acetyltransferase and forms a multimeric activation complex with P/CAF and CBP/p300. Cell 90: 569-580.

143. Voegel JJ, Heine MJ, Tini M, Vivat V, Chambon P, et al. (1998) The coactivator TIF2 contains three nuclear receptor-binding motifs and mediates transactivation through CBP binding-dependent and -independent pathways. EMBO J 17: 507-519.

144. Chen D, Ma H, Hong H, Koh SS, Huang SM, et al. (1999) Regulation of transcription by a protein methyltransferase. Science 284: 2174-2177.

145. Heery DM, Kalkhoven E, Hoare S, Parker MG (1997) A signature motif in transcriptional co-activators mediates binding to nuclear receptors. Nature 387: 733-736.

146. Darimont BD, Wagner RL, Apriletti JW, Stallcup MR, Kushner PJ, et al. (1998) Structure and specificity of nuclear receptor-coactivator interactions. Genes Dev 12: 3343-3356.

147. Bannister AJ, Kouzarides T (1996) The CBP co-activator is a histone acetyltransferase. Nature 384: 641-643.

148. Ogryzko VV, Schiltz RL, Russanova V, Howard BH, Nakatani Y (1996) The transcriptional coactivators p300 and CBP are histone acetyltransferases. Cell 87: 953-959.

149. Yang XJ, Ogryzko VV, Nishikawa J, Howard BH, Nakatani Y (1996) A p300/CBP-associated factor that competes with the adenoviral oncoprotein E1A. Nature 382: 319-324.

150. Xu W, Chen H, Du K, Asahara H, Tini M, et al. (2001) A transcriptional switch mediated by cofactor methylation. Science 294: 2507-2511.

151. Kamei Y, Xu L, Heinzel T, Torchia J, Kurokawa R, et al. (1996) A CBP integrator complex mediates transcriptional activation and AP-1 inhibition by nuclear receptors. Cell 85: 403-414.

152. Heery DM, Hoare S, Hussain S, Parker MG, Sheppard H (2001) Core LXXLL motif sequences in CREB-binding protein, SRC1, and RIP140 define affinity and selectivity for steroid and retinoid receptors. J Biol Chem 276: 6695-6702.

153. Yao TP, Ku G, Zhou N, Scully R, Livingston DM (1996) The nuclear hormone receptor coactivator SRC-1 is a specific target of p300. Proc Natl Acad Sci U S A 93: 10626-10631.

154. Wilson TE, Paulsen RE, Padgett KA, Milbrandt J (1992) Participation of non-zinc finger residues in DNA binding by two nuclear orphan receptors. Science 256: 107-110.

155. Chandra V, Huang P, Potluri N, Wu D, Kim Y, et al. (2013) Multidomain integration in the structure of the HNF-4alpha nuclear receptor complex. Nature 495: 394-398.

156. Glass CK, Devary OV, Rosenfeld MG (1990) Multiple cell type-specific proteins differentially regulate target sequence recognition by the alpha retinoic acid receptor. Cell 63: 729-738.

157. Rochel N, Ciesielski F, Godet J, Moman E, Roessle M, et al. (2011) Common architecture of nuclear receptor heterodimers on DNA direct repeat elements with different spacings. Nat Struct Mol Biol 18: 564-570.

158. Kumar R, Volk DE, Li J, Lee JC, Gorenstein DG, et al. (2004) TATA box binding protein induces structure in the recombinant glucocorticoid receptor AF1 domain. Proc Natl Acad Sci U S A 101: 16425-16430.

159. Warnmark A, Wikstrom A, Wright AP, Gustafsson JA, Hard T (2001) The N-terminal regions of estrogen receptor alpha and beta are unstructured in vitro and show different TBP binding properties. J Biol Chem 276: 45939-45944.

160. Hard T, Kellenbach E, Boelens R, Maler BA, Dahlman K, et al. (1990) Solution structure of the glucocorticoid receptor DNA-binding domain. Science 249: 157-160.

161. Luisi BF, Xu WX, Otwinowski Z, Freedman LP, Yamamoto KR, et al. (1991) Crystallographic analysis of the interaction of the glucocorticoid receptor with DNA. Nature 352: 497-505.

162. Grishin NV (2001) Treble clef finger--a functionally diverse zinc-binding structural motif. Nucleic Acids Res 29: 1703-1714.

163. Schwabe JW, Chapman L, Finch JT, Rhodes D (1993) The crystal structure of the estrogen receptor DNA-binding domain bound to DNA: how receptors discriminate between their response elements. Cell 75: 567-578.

164. Bourguet W, Ruff M, Chambon P, Gronemeyer H, Moras D (1995) Crystal structure of the ligand-binding domain of the human nuclear receptor RXR-alpha. Nature 375: 377-382.

165. Renaud JP, Rochel N, Ruff M, Vivat V, Chambon P, et al. (1995) Crystal structure of the RAR-gamma ligand-binding domain bound to all-trans retinoic acid. Nature 378: 681-689.

166. Wagner RL, Apriletti JW, McGrath ME, West BL, Baxter JD, et al. (1995) A structural role for hormone in the thyroid hormone receptor. Nature 378: 690-697.

167. Brzozowski AM, Pike AC, Dauter Z, Hubbard RE, Bonn T, et al. (1997) Molecular basis of agonism and antagonism in the oestrogen receptor. Nature 389: 753-758.

168. Moras D, Gronemeyer H (1998) The nuclear receptor ligand-binding domain: structure and function. Curr Opin Cell Biol 10: 384-391.

169. Nolte RT, Wisely GB, Westin S, Cobb JE, Lambert MH, et al. (1998) Ligand binding and co-activator assembly of the peroxisome proliferator-activated receptor-gamma. Nature 395: 137-143.

170. Uppenberg J, Svensson C, Jaki M, Bertilsson G, Jendeberg L, et al. (1998) Crystal structure of the ligand binding domain of the human nuclear receptor PPARgamma. J Biol Chem 273: 31108-31112.

171. Shiau AK, Barstad D, Loria PM, Cheng L, Kushner PJ, et al. (1998) The structural basis of estrogen receptor/coactivator recognition and the antagonism of this interaction by tamoxifen. Cell 95: 927-937.

172. Bourguet W, Vivat V, Wurtz JM, Chambon P, Gronemeyer H, et al. (2000) Crystal structure of a heterodimeric complex of RAR and RXR ligand-binding domains. Mol Cell 5: 289-298.

173. Pike AC, Brzozowski AM, Hubbard RE, Bonn T, Thorsell AG, et al. (1999) Structure of the ligand-binding domain of oestrogen receptor beta in the presence of a partial agonist and a full antagonist. EMBO J 18: 4608-4618.

174. Shiau AK, Barstad D, Radek JT, Meyers MJ, Nettles KW, et al. (2002) Structural characterization of a subtype-selective ligand reveals a novel mode of estrogen receptor antagonism. Nat Struct Biol 9: 359-364.

175. Nahoum V, Perez E, Germain P, Rodriguez-Barrios F, Manzo F, et al. (2007) Modulators of the structural dynamics of the retinoid X receptor to reveal receptor function. Proc Natl Acad Sci U S A 104: 17323-17328.

176. Bourguet W, Germain P, Gronemeyer H (2000) Nuclear receptor ligand-binding domains: three-dimensional structures, molecular interactions and pharmacological implications. Trends Pharmacol Sci 21: 381-388.

177. Xu HE, Stanley TB, Montana VG, Lambert MH, Shearer BG, et al. (2002) Structural basis for antagonist-mediated recruitment of nuclear co-repressors by PPARalpha. Nature 415: 813-817.

178. le Maire A, Teyssier C, Erb C, Grimaldi M, Alvarez S, et al. (2010) A unique secondary-structure switch controls constitutive gene repression by retinoic acid receptor. Nat Struct Mol Biol 17: 801-807.

179. Tanenbaum DM, Wang Y, Williams SP, Sigler PB (1998) Crystallographic comparison of the estrogen and progesterone receptor's ligand binding domains. Proc Natl Acad Sci U S A 95: 5998-6003.

180. Sato Y, Ramalanjaona N, Huet T, Potier N, Osz J, et al. (2010) The "Phantom Effect" of the Rexinoid LG100754: structural and functional insights. PLoS One 5: e15119.

181. Pignatello MA, Kauffman FC, Levin AA (1997) Multiple factors contribute to the toxicity of the aromatic retinoid, TTNPB (Ro 13-7410): binding affinities and disposition. Toxicol Appl Pharmacol 142: 319-327.

182. Kiefer C, Hessel S, Lampert JM, Vogt K, Lederer MO, et al. (2001) Identification and characterization of a mammalian enzyme catalyzing the asymmetric oxidative cleavage of provitamin A. J Biol Chem 276: 14110-14116.

183. Repa JJ, Hanson KK, Clagett-Dame M (1993) All-trans-retinol is a ligand for the retinoic acid receptors. Proc Natl Acad Sci U S A 90: 7293-7297.

184. Geoghegan KF, Dixon HB, Rosner PJ, Hoth LR, Lanzetti AJ, et al. (1999) Spontaneous alpha-N-6-phosphogluconoylation of a "His tag" in Escherichia coli: the cause of extra mass of 258 or 178 Da in fusion proteins. Anal Biochem 267: 169-184.

185. Gill SC, von Hippel PH (1989) Calculation of protein extinction coefficients from amino acid sequence data. Anal Biochem 182: 319-326.

186. Zhang H, Zhou R, Li L, Chen J, Chen L, et al. (2011) Danthron functions as a retinoic X receptor antagonist by stabilizing tetramers of the receptor. J Biol Chem 286: 1868-1875.

187. Broecker J, Vargas C, Keller S (2011) Revisiting the optimal c value for isothermal titration calorimetry. Anal Biochem 418: 307-309.

188. Sigurskjold BW (2000) Exact analysis of competition ligand binding by displacement isothermal titration calorimetry. Anal Biochem 277: 260-266.

189. Dufour E, Haertle T (1991) Binding of retinoids and beta-carotene to beta-lactoglobulin. Influence of protein modifications. Biochim Biophys Acta 1079: 316-320.

190. Wright E, Vincent J, Fernandez EJ (2007) Thermodynamic characterization of the interaction between CAR-RXR and SRC-1 peptide by isothermal titration calorimetry. Biochemistry 46: 862-870.

191. Osz J, Brelivet Y, Peluso-Iltis C, Cura V, Eiler S, et al. (2012) Structural basis for a molecular allosteric control mechanism of cofactor binding to nuclear receptors. Proc Natl Acad Sci U S A 109: E588-594.

192. Pogenberg V, Guichou JF, Vivat-Hannah V, Kammerer S, Perez E, et al. (2005) Characterization of the interaction between retinoic acid receptor/retinoid X receptor (RAR/RXR) heterodimers and transcriptional coactivators through structural and fluorescence anisotropy studies. J Biol Chem 280: 1625-1633.

193. Ding XF, Anderson CM, Ma H, Hong H, Uht RM, et al. (1998) Nuclear receptor-binding sites of coactivators glucocorticoid receptor interacting protein 1 (GRIP1) and steroid receptor coactivator 1 (SRC-1): multiple motifs with different binding specificities. Mol Endocrinol 12: 302-313.

194. Wang S, Wang Z, Lin S, Zheng W, Wang R, et al. (2012) Revealing a natural marine product as a novel agonist for retinoic acid receptors with a unique binding mode and inhibitory effects on cancer cells. Biochem J 446: 79-87.

195. Wang JM, Wolf RM, Caldwell JW, Kollman PA, Case DA (2004) Development and testing of a general amber force field. Journal of Computational Chemistry 25: 1157-1174.

196. Bayly CI, Cieplak P, Cornell WD, Kollman PA (1993) A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. J Phys Chem 97: 10269-10280.

197. Dupradeau FY, Pigache A, Zaffran T, Savineau C, Lelong R, et al. (2010) The R.E.D. tools: advances in RESP and ESP charge derivation and force field library building. Physical Chemistry Chemical Physics 12: 7821-7839.

198. Germain P, Kammerer S, Perez E, Peluso-Iltis C, Tortolani D, et al. (2004) Rational design of RAR-selective ligands revealed by RARbeta crystal stucture. EMBO Rep 5: 877-882.

199. Cieplak P, Cornell WD, Bayly C, Kollman PA (1995) Application of the Multimolecule and Multiconformational RESP Methodology to Biopolymers - Charge Derivation for DNA, RNA, and Proteins. Journal of Computational Chemistry 16: 1357-1377.

200. Scott AP, Radom L (1996) Harmonic vibrational frequencies: An evaluation of Hartree-Fock, Moller-Plesset, quadratic configuration interaction, density functional theory, and semiempirical scale factors. Journal of Physical Chemistry 100: 16502-16513.

201. Morris GM, Goodsell DS, Halliday RS, Huey R, Hart WE, et al. (1998) Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function. Journal of Computational Chemistry 19: 1639-1662.

202. Friesner RA, Murphy RB, Repasky MP, Frye LL, Greenwood JR, et al. (2006) Extra precision glide: Docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. Journal of Medicinal Chemistry 49: 6177-6196.

203. Olsson MHM, Sondergaard CR, Rostkowski M, Jensen JH (2011) PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pK(a) Predictions. Journal of Chemical Theory and Computation 7: 525-537.

204. Bruck N, Vitoux D, Ferry C, Duong V, Bauer A, et al. (2009) A coordinated phosphorylation cascade initiated by p38MAPK/MSK1 directs RARalpha to target promoters. EMBO J 28: 34-47.

205. Gaillard E, Bruck N, Brelivet Y, Bour G, Lalevee S, et al. (2006) Phosphorylation by PKA potentiates retinoic acid receptor alpha activity by means of increasing interaction with and phosphorylation by cyclin H/cdk7. Proc Natl Acad Sci U S A 103: 9548-9553.

206. Chebaro Y, Amal I, Rochel N, Rochette-Egly C, Stote RH, et al. (2013) Phosphorylation of the retinoic acid receptor alpha induces a mechanical allosteric regulation and changes in internal dynamics. PLoS Comput Biol 9: e1003012.

207. Wang J, Morin P, Wang W, Kollman PA (2001) Use of MM-PBSA in reproducing the binding free energies to HIV-1 RT of TIBO derivatives and predicting the binding mode to HIV-1 RT of efavirenz by docking and MM-PBSA. J Am Chem Soc 123: 5221-5230.

208. Massova I, Kollman PA (1999) Computational alanine scanning to probe protein-protein interactions: A novel approach to evaluate binding free energies. Journal of the American Chemical Society 121: 8133-8143.

209. Cramer CJ (2004) Essentials of Computational Chemistry: Theories and Models: Wiley; 2nd edition.

210. Kongsted J, Ryde U (2009) An improved method to predict the entropy term with the MM/PBSA approach. J Comput Aided Mol Des 23: 63-71.

211. Tsui V, Case DA (2001) Theory and applications of the generalized Born solvation model in macromolecular Simulations. Biopolymers 56: 275-291.

212. Kallenberger BC, Love JD, Chatterjee VK, Schwabe JW (2003) A dynamic mechanism of nuclear receptor activation and its perturbation in a human disease. Nat Struct Biol 10: 136-140.

213. Figueira AC, Saidemberg DM, Souza PC, Martinez L, Scanlan TS, et al. (2011) Analysis of agonist and antagonist effects on thyroid hormone receptor conformation by hydrogen/deuterium exchange. Mol Endocrinol 25: 15-31.

214. Batista MR, Martinez L (2013) Dynamics of nuclear receptor Helix-12 switch of transcription activation by modeling time-resolved fluorescence anisotropy decays. Biophys J 105: 1670-1680.

215. Frishman D, Argos P (1995) Knowledge-based protein secondary structure assignment. Proteins 23: 566-579.

216. Humphrey W, Dalke A, Schulten K (1996) VMD: visual molecular dynamics. J Mol Graph 14: 33-38, 27-38.

217. Schubert HL, Blumenthal RM, Cheng X (2003) Many paths to methyltransfer: a chronicle of convergence. Trends Biochem Sci 28: 329-335.

218. Wilson GG, Murray NE (1991) Restriction and modification systems. Annu Rev Genet 25: 585-627.

219. Adams RL (1990) DNA methylation. The effect of minor bases on DNA-protein interactions. Biochem J 265: 309-320.

220. Bedford MT (2007) Arginine methylation at a glance. J Cell Sci 120: 4243-4246.

221. Dillon SC, Zhang X, Trievel RC, Cheng X (2005) The SET-domain protein superfamily: protein lysine methyltransferases. Genome Biol 6: 227.

222. Kumagai I, Watanabe K, Oshima T (1980) Thermally induced biosynthesis of 2'-O-methylguanosine in tRNA from an extreme thermophile, Thermus thermophilus HB27. Proc Natl Acad Sci U S A 77: 1922-1926.

223. Urbonavicius J, Durand JM, Bjork GR (2002) Three modifications in the D and T arms of tRNA influence translation in Escherichia coli and expression of virulence genes in Shigella flexneri. J Bacteriol 184: 5348-5357.

224. Pintard L, Lecointe F, Bujnicki JM, Bonnerot C, Grosjean H, et al. (2002) Trm7p catalyses the formation of two 2'-O-methylriboses in yeast tRNA anticodon loop. EMBO J 21: 1811-1820.

225. Noma A, Kirino Y, Ikeuchi Y, Suzuki T (2006) Biosynthesis of wybutosine, a hyper-modified nucleoside in eukaryotic phenylalanine tRNA. EMBO J 25: 2142-2154.

226. Chow CS, Lamichhane TN, Mahto SK (2007) Expanding the nucleotide repertoire of the ribosome with post-transcriptional modifications. ACS Chem Biol 2: 610-619.

227. Liu M, Douthwaite S (2002) Resistance to the macrolide antibiotic tylosin is conferred by single methylations at 23S rRNA nucleotides G748 and A2058 acting in synergy. Proc Natl Acad Sci U S A 99: 14658-14663.

228. Bugl H, Fauman EB, Staker BL, Zheng F, Kushner SR, et al. (2000) RNA methylation under heat shock control. Mol Cell 6: 349-360.

229. Fustin JM, Doi M, Yamaguchi Y, Hida H, Nishimura S, et al. (2013) RNA-Methylation-Dependent RNA Processing Controls the Speed of the Circadian Clock. Cell 155: 793-806.

230. Misako K, Kouichi M (2004) Caffeine synthase and related methyltransferases in plants. Front Biosci 9: 1833-1842.

231. Tabor CW, Tabor H (1984) Polyamines. Annu Rev Biochem 53: 749-790.

232. Eisenberg T, Knauer H, Schauer A, Buttner S, Ruckenstuhl C, et al. (2009) Induction of autophagy by spermidine promotes longevity. Nat Cell Biol 11: 1305-1314.

233. Sofia HJ, Chen G, Hetzler BG, Reyes-Spindola JF, Miller NE (2001) Radical SAM, a novel protein superfamily linking unresolved steps in familiar biosynthetic pathways with radical mechanisms: functional characterization using new analysis and information visualization methods. Nucleic Acids Res 29: 1097-1106.

234. Pierrel F, Bjork GR, Fontecave M, Atta M (2002) Enzymatic modification of tRNAs: MiaB is an iron-sulfur protein. J Biol Chem 277: 13367-13370.

235. Old IG, Phillips SE, Stockley PG, Saint Girons I (1991) Regulation of methionine biosynthesis in the Enterobacteriaceae. Prog Biophys Mol Biol 56: 145-185.

236. Somers WS, Rafferty JB, Phillips K, Strathdee S, He YY, et al. (1994) The Met Repressor-Operator Complex - DNA Recognition by Beta-Strands. DNA Damage 726: 105-117.

237. Winkler WC, Nahvi A, Sudarsan N, Barrick JE, Breaker RR (2003) An mRNA structure that controls gene expression by binding S-adenosylmethionine. Nat Struct Biol 10: 701-707.

238. Fuchs RT, Grundy FJ, Henkin TM (2007) S-adenosylmethionine directly inhibits binding of 30S ribosomal subunits to the SMK box translational riboswitch RNA. Proc Natl Acad Sci U S A 104: 4876-4880.

239. Rognes SE, Lea PJ, Miflin BJ (1980) S-adenosylmethionine--a novel regulator of aspartate kinase. Nature 287: 357-359.

240. Marina P, Martinez-Costa OH, Calderon IL, Aragon JJ (2004) Characterization of the aspartate kinase from Saccharomyces cerevisiae and of its interaction with threonine. Biochem Biophys Res Commun 321: 584-591.

241. Moir D, Paulus H (1977) Properties and subunit structure of aspartokinase II from Bacillus subtilis VB217. J Biol Chem 252: 4648-4651.

242. Madison JT, Thompson JF (1976) Threonine synthetase from higher plants: stimulation by S-adenosylmethionine and inhibition by cysteine. Biochem Biophys Res Commun 71: 684-691.

243. Cantoni GL (1952) The Nature of the Active Methyl Donor Formed Enzymatically from L-Methionine and Adenosinetriphosphate. Journal of the American Chemical Society 74: 2942-2943.

244. Markham GD, Norrby PO, Bock CW (2002) S-adenosylmethionine conformations in solution and in protein complexes: conformational influences of the sulfonium group. Biochemistry 41: 7636-7646.


245. Mas-Droux C, Curien G, Robert-Genthon M, Laurencin M, Ferrer JL, et al. (2006) A novel organization of ACT domains in allosteric enzymes revealed by the crystal structure of Arabidopsis aspartate kinase. Plant Cell 18: 1681-1692.

246. Mas-Droux C, Biou V, Dumas R (2006) Allosteric threonine synthase. Reorganization of the pyridoxal phosphate site upon asymmetric activation through S-adenosylmethionine binding to a novel site. J Biol Chem 281: 5188-5196.

247. Fu Z, Hu Y, Konishi K, Takata Y, Ogawa H, et al. (1996) Crystal structure of glycine N-methyltransferase from rat liver. Biochemistry 35: 11985-11993.

248. Montange RK, Mondragon E, van Tyne D, Garst AD, Ceres P, et al. (2010) Discrimination between closely related cellular metabolites by the SAM-I riboswitch. J Mol Biol 396: 761-772.

249. De la Pena M, Kyrieleis OJ, Cusack S (2007) Structural insights into the mechanism and evolution of the vaccinia virus mRNA cap N7 methyl-transferase. EMBO J 26: 4913-4925.

250. Bayly CI, Cieplak P, Cornell WD, Kollman PA (1993) A Well-Behaved Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic Charges - the Resp Model. Journal of Physical Chemistry 97: 10269-10280.

251. Walker RC (2012) Supercomputer in a desktop: Routine microsecond molecular dynamics simulations of proteins on commodity hardware: Extreme GPU acceleration of AMBER. Abstracts of Papers of the American Chemical Society 243.

252. Perez A, Marchan I, Svozil D, Sponer J, Cheatham TE, 3rd, et al. (2007) Refinement of the AMBER force field for nucleic acids: improving the description of alpha/gamma conformers. Biophys J 92: 3817-3829.

253. Zgarbova M, Otyepka M, Sponer J, Mladek A, Banas P, et al. (2011) Refinement of the Cornell et al. Nucleic Acids Force Field Based on Reference Quantum Chemical Calculations of Glycosidic Torsion Profiles. J Chem Theory Comput 7: 2886-2902.

254. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM, et al. (1996) A second generation force field for the simulation of proteins, nucleic acids, and organic molecules (vol 117, pg 5179, 1995). Journal of the American Chemical Society 118: 2309-2309.


Appendix A. Nuclear Receptor Sequence Alignment


Appendix B. Structure of Synthetic RAR Ligands Available from Tocris


Figure B.1. Structure of synthetic RAR ligands. A. BMS 961 B. BMS 493 C. BMS 195614 D. AM 80 E. MM 11253 F. BMS 453 G. AC 55649 H. AC 261066 I. Adapalene J. BMS 753 K. BMS 961 L. AM 580 M. CD 437 N. LE 135 O. ER 50891 P. TTNPB Q. Tazarotene R. EC 23


Appendix C. His-hRARα LBD Primary Sequence and Plasmid Sequence


His-hRARα LBD primary sequence:

MGSSHHHHHH SSGLVPRGSH MESYTLTPEV GELIEKVRKA HQETFPALCQ LGKYTTNNSS   60
EQRVSLDIDL WDKFSELSTK CIIKTVEFAK QLPGFTTLTI ADQITLLKAA CLDILILRIC  120
TRYTPEQDTM TFSDGLTLNR TQMHNAGFGP LTDLVFAFAN QLLPLEMDDA ETGLLSAICL  180
ICGDRQDLEQ PDRVDMLQEP LLEALKVYVR KRRPSRPHMF PKMLMKITDL RSISAKGAER  240
VITLKMEIPG SMPPLIQEML ENSEGLD                                      267

His-hRARα LBD plasmid sequence:

M  G  S  S  H  H  H  H  H  H  S  S  G  L  V  P  R  G  S  H
atgggcagcagccatcatcatcatcatcacagcagcggcctggtgccgcgcggcagccat

M  E  S  Y  T  L  T  P  E  V  G  E  L  I  E  K  V  R  K  A
atggagagctacacgctgacgccggaggtgggggagctcattgagaaggtgcgcaaagcg

H  Q  E  T  F  P  A  L  C  Q  L  G  K  Y  T  T  N  N  S  S
caccaggaaaccttccctgccctctgccagctgggcaaatacactacgaacaacagctca

E  Q  R  V  S  L  D  I  D  L  W  D  K  F  S  E  L  S  T  K
gaacaacgtgtctctctggacattgacctctgggacaagttcagtgaactctccaccaag

C  I  I  K  T  V  E  F  A  K  Q  L  P  G  F  T  T  L  T  I
tgcatcattaagactgtggagttcgccaagcagctgcccggcttcaccaccctcaccatc

A  D  Q  I  T  L  L  K  A  A  C  L  D  I  L  I  L  R  I  C
gccgaccagatcaccctcctcaaggctgcctgcctggacatcctgatcctgcggatctgc

T  R  Y  T  P  E  Q  D  T  M  T  F  S  D  G  L  T  L  N  R
acgcggtacacgcccgagcaggacaccatgaccttctcggacgggctgaccctgaaccgg

T  Q  M  H  N  A  G  F  G  P  L  T  D  L  V  F  A  F  A  N
acccagatgcacaacgctggcttcggccccctcaccgacctggtctttgccttcgccaac

Q  L  L  P  L  E  M  D  D  A  E  T  G  L  L  S  A  I  C  L
cagctgctgcccctggagatggatgatgcggagacggggctgctcagcgccatctgcctc

I  C  G  D  R  Q  D  L  E  Q  P  D  R  V  D  M  L  Q  E  P
atctgcggagaccgccaggacctggagcagccggaccgggtggacatgctgcaggagccg

L  L  E  A  L  K  V  Y  V  R  K  R  R  P  S  R  P  H  M  F
ctgctggaggcgctaaaggtctacgtgcggaagcggaggcccagccgcccccacatgttc

P  K  M  L  M  K  I  T  D  L  R  S  I  S  A  K  G  A  E  R
cccaagatgctaatgaagattactgacctgcgaagcatcagcgccaagggggctgagcgg

V  I  T  L  K  M  E  I  P  G  S  M  P  P  L  I  Q  E  M  L
gtgatcacgctgaagatggagatcccgggctccatgccgcctctcatccaggaaatgttg

E  N  S  E  G  L  D  -  -
gagaactcagagggcctggactgatga
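The correspondence between the plasmid sequence and the primary sequence listed above can be checked by translating the coding strand with the standard genetic code. The following is a minimal sketch of such a check, not the procedure used in this work; the function name `translate` and the use of `X` for unrecognized codons are illustrative choices.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a coding-strand DNA string; '*' marks a stop codon.

    Codons that are not in the table (e.g. containing 'n') become 'X'.
    """
    dna = dna.lower().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First and last lines of the His-hRARα LBD plasmid sequence.
his_tag = translate(
    "atgggcagcagccatcatcatcatcatcacagcagcggcctggtgccgcgcggcagccat"
)
c_terminus = translate("gagaactcagagggcctggactgatga")
print(his_tag)
print(c_terminus)
```

Applying the same function to each line of the plasmid sequence and concatenating the results gives a string that can be compared directly with the primary sequence.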


Appendix D. His-hRXRα LBD Primary Sequence and Plasmid Sequence


His-hRXRα LBD primary sequence:

MGSSHHHHHH SSGLVPRGSH MTSSANEDMP VERILEAELA VEPKTETYVE ANMGLNPSSP   60
NDPVTNICQA ADKQLFTLVE WAKRIPHFSE LPLDDQVILL RAGWNELLIA SFSHRSIAVK  120
DGILLATGLH VHRNSAHSAG VGAIFDRVLT ELVSKMRDMQ MDKTELGCLR AIVLFNPDSK  180
GLSNPAEVEA LREKVYASLE AYCKHKYPEQ PGRFAKLLLR LPALRSIGLK CLEHLFFFKL  240
IGDTPIDTFL MEMLEAPHQM T                                            261

His-hRXRα LBD plasmid sequence:

M  G  S  S  H  H  H  H  H  H  S  S  G  L  V  P  R  G  S  H
atgggcagcagccatcatcatcatcatcacagcagcggcctggtgccgcgcggcagccat

M  T  S  S  A  N  E  D  M  P  V  E  R  I  L  E  A  E  L  A
atgaccagcagcgccaacgaggacatgccggtggagaggatcctggaggctgagctggcc

V  E  P  K  T  E  T  Y  V  E  A  N  M  G  L  N  P  S  S  P
gtggagcccaagaccgagacctacgtggaggcaaacatggggctgaaccccagctcgccg

N  D  P  V  T  N  I  C  Q  A  A  D  K  Q  L  F  T  L  V  E
aacgaccctgtcaccaacatttgccaagcagccgacaaacagcttttcaccctggtggag

W  A  K  R  I  P  H  F  S  E  L  P  L  D  D  Q  V  I  L  L
tgggccaagcggatcccacacttctcagagctgcccctggacgaccaggtcatcctgctg

R  A  G  W  N  E  L  L  I  A  S  F  S  H  R  S  I  A  V  K
cgggcaggctggaatgagctgctcatcgcctccttctcccaccgctccatcgccgtgaag

D  G  I  L  L  A  T  G  L  H  V  H  R  N  S  A  H  S  A  G
gacgggatcctcctggccaccgggctgcacgtccaccggaacagcgcccacagcgcaggg

V  G  A  I  F  D  R  V  L  T  E  L  V  S  K  M  R  D  M  Q
gtgggcgccatctttgacagggtgctgacggagcttgtgtccaagatgcgggacatgcag

M  D  K  T  E  L  G  C  L  R  A  I  V  L  F  N  P  D  S  K
atggacaagacggagctgggctgcctgcgcgccatcgtcctctttaaccctgactccaag

G  L  S  N  P  A  E  V  E  A  L  R  E  K  V  Y  A  S  L  E
gggctctcgaacccggccgaggtggaggcgctgagggagaaggtctatgcgtccttggag

A  Y  C  K  H  K  Y  P  E  Q  P  G  R  F  A  K  L  L  L  R
gcctactgcaagcacaagtacccagagcagccgggaaggttcgctaagctcttgctccgc

L  P  A  L  R  S  I  G  L  K  C  L  E  H  L  F  F  F  K  L
ctgccggctctgcgctccatcgggctcaaatgcctggaacatctcttcttcttcaagctc

I  G  D  T  P  I  D  T  F  L  M  E  M  L  E  A  P  H  Q  M
atcggggacacacccattgacaccttccttatggagatgctggangcgccgcaccaaatg

T  -
acttag
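The plasmid sequence above contains one ambiguous base call (`n`) in the codon `gan`, which is annotated as E. The sketch below enumerates the residues consistent with such a codon under the standard genetic code, treating `n` as any of the four bases. The helper name `possible_residues` is hypothetical, and the listing is an illustration rather than part of the original analysis.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def possible_residues(codon):
    """Return the set of residues a codon may encode when 'n' is any base."""
    choices = [BASES if base == "n" else base for base in codon.lower()]
    return {CODON_TABLE["".join(bases)] for bases in product(*choices)}


# The ambiguous codon from the His-hRXRα LBD plasmid sequence.
ambiguous = possible_residues("gan")
print(sorted(ambiguous))
```

The annotated residue is therefore one of the possibilities allowed by the ambiguous read, but the read alone does not distinguish it from the alternative.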


Appendix E. FPLC Standards

Figure E.1. FPLC chromatogram for protein standard set 1 (labeled peaks: 2000, 440, 158, 43, and 13.7 kDa). Contents are listed in Table E.1.

Table E.1. Contents of protein standard 1.

Name              MW (kDa)   Concentration (mg/ml)   Elution time (ml)
Blue Dextran      2000       1                       49.51
Ferritin          440        0.5                     58.29
Aldolase          158        6                       72.09
Ovalbumin         43         8                       84.30
Ribonuclease A    13.7       10                      99.78


Figure E.2. FPLC chromatogram for protein standard set 2 (labeled peaks: 669, 232, 67, and 25 kDa). Contents are listed in Table E.2.

Table E.2. Contents of protein standard 2.

Name                  MW (kDa)   Concentration (mg/ml)   Elution time (ml)
Thyroglobulin         669        2                       51.16
Catalase              232        5                       68.23
BSA                   67         6                       78.28
Chymotrypsinogen A    25         4                       94.21
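Standards such as these are commonly used to build a size-exclusion calibration curve by regressing log10(MW) against elution position. The sketch below fits such a line to the four globular standards in Table E.1. It is an illustration under stated assumptions, not the analysis performed in this work: Blue Dextran is omitted on the assumption that it marks the void volume, and the function names are hypothetical.

```python
import math

# (name, MW in kDa, elution position in ml) taken from Table E.1.
# Blue Dextran is left out: it is assumed here to mark the void volume.
STANDARDS = [
    ("Ferritin", 440.0, 58.29),
    ("Aldolase", 158.0, 72.09),
    ("Ovalbumin", 43.0, 84.30),
    ("Ribonuclease A", 13.7, 99.78),
]


def fit_calibration(standards):
    """Least-squares fit of log10(MW) = slope * elution + intercept."""
    xs = [elution for _, _, elution in standards]
    ys = [math.log10(mw) for _, mw, _ in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(elution, slope, intercept):
    """Apparent molecular weight (kDa) for a given elution position (ml)."""
    return 10 ** (slope * elution + intercept)


slope, intercept = fit_calibration(STANDARDS)
for name, mw, elution in STANDARDS:
    print(f"{name:15s} listed {mw:6.1f} kDa, fitted "
          f"{estimate_mw(elution, slope, intercept):6.1f} kDa")
```

An apparent molecular weight obtained this way depends on molecular shape as well as mass, so it is best read as an estimate for globular proteins run on the same column under the same conditions.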


Appendix F. Detailed ITC Data

Table F.1. ATRA-bound hRARα LBD ITC results.

Exp. #  Date      [RARα] (μM)  [NR2] (mM)  N                Kd (μM)  ∆H (cal/mol)    ∆S (cal/mol/K)  ∆G (kcal/mol)  C value
1       11/16/12  50.90        0.5060      1.02 ± 0.00476   0.65     -5132 ± 36.08   11.1            -8.441         77.8
2       11/17/12  47.50        0.4640      0.935 ± 0.00575  0.67     -5371 ± 47.92   10.2            -8.412         70.8
3       02/21/13  44.00        0.4403      0.930 ± 0.00642  0.64     -5414 ± 59.87   10.2            -8.455         68.8
Ave.                                       0.962            0.65     -5307           10.5            -8.44

Table F.2. TTNPB-bound hRARα LBD ITC results.

Exp. #  Date      [RARα] (μM)  [NR2] (mM)  N                Kd (μM)  ∆H (cal/mol)    ∆S (cal/mol/K)  ∆G (kcal/mol)  C value
1       11/28/12  51.57        0.5120      0.750 ± 0.00274  0.58     -6130 ± 32.98   7.97            -8.51          88.91
2       11/29/12  28.25        0.2940      0.730 ± 0.00262  0.45     -5751 ± 33.73   9.74            -8.65          62.78
3       01/09/13  53.20        0.5318      0.816 ± 0.00350  0.41     -6326 ± 44.62   8.01            -8.71          129.8
Ave.                                       0.765            0.48     -6069           8.57            -8.63

Table F.3. β-apo-13-carotenone-bound hRARα LBD ITC results.

Exp. #  Date      [RARα] (μM)  [NR2] (mM)  N                Kd (μM)  ∆H (cal/mol)    ∆S (cal/mol/K)  ∆G (kcal/mol)  C value
1       11/08/12  58.00        0.6330      0.963 ± 0.00503  2.10     -6075 ± 45.18   5.61            -7.75          27.6
2       11/12/12  56.60        0.5970      0.988 ± 0.00690  2.21     -6233 ± 66.02   4.97            -7.71          25.61
3       03/20/13  63.28        0.6336      0.911 ± 0.00604  2.17     -5009 ± 48.93   9.11            -7.73          29.16
Ave.                                       0.954            2.16     -5772           6.56            -7.73


Table F.4. β-apo-14'-carotenoic acid-bound hRARα LBD ITC results.

Exp. #  Date      [RARα] (μM)  [NR2] (mM)  N                Kd (μM)  ∆H (cal/mol)    ∆S (cal/mol/K)  ∆G (kcal/mol)  C value
1       07/11/12  104.79       1.046       0.938 ± 0.00420  1.01     -6531 ± 44.14   5.52            -8.18          105.9
2       02/22/13  87.78        0.888       0.931 ± 0.00646  1.65     -6851 ± 67.32   3.48            -7.89          53.2
3       02/26/13  81.00        0.8089      0.998 ± 0.00551  1.17     -6150 ± 49.32   6.51            -8.09          69.3
Ave.                                       0.956            1.28     -6511           5.17            -8.05

Table F.5. BMS 195614-bound hRARα LBD ITC results.

Exp. #  Date      [RARα] (μM)  [NR2] (mM)  N                Kd (μM)  ∆H (cal/mol)    ∆S (cal/mol/K)  ∆G (kcal/mol)  C value
1       11/19/12  107.7        0.971       1.07 ± 0.0293    73.0     3655 ± 173.8    31.2            -5.65          1.5
2       11/21/12  102.6        1.015       1.08 ± 0.103     202.8    5473 ± 767      35.3            -5.05          0.5
3       11/27/12  138.99       1.43        0.996 ± 0.0135   71.4     4653 ± 98.03    34.6            -5.66          1.9
4       03/22/13  188.06       1.877       0.442 ± 0.0733   226.2    9004 ± 1740     46.9            -4.98          0.8
Ave.                                                        115.8    5696            37.0            -5.45

Table F.6. β-apo-13-lycopenone-bound hRARα LBD ITC results.

Exp. #  Date      [RARα] (μM)  [NR2] (mM)  N                Kd (μM)  ∆H (cal/mol)    ∆S (cal/mol/K)  ∆G (kcal/mol)  C value
1       05/02/13  64.05        0.6408      0.485 ± 0.0117   25.64    -12500 ± 379.8  -20.9           -6.27          2.5
2       05/03/13  127.37       1.2780      0.639 ± 0.0172   48.30    -13320 ± 481.2  -24.9           -5.89          2.6
3       05/05/13  87.72        0.875       0.208 ± 0.0272   45.05    -17999 ± 2592   -40.5           -5.92          1.9
Ave.                                       0.444            39.66    -14606          -28.77          -6.03

Table F.7. Untreated (apo) hRARα LBD ITC results.

Exp. #  Date               [RARα] (μM)  [NR2] (mM)  N                Kd (μM)  ∆H (cal/mol)    ∆S (cal/mol/K)  ∆G (kcal/mol)  C value
1       11/02/12 (Wt)      71.4         0.35777     0.508 ± 0.0117   23.47    -5792 ± 209.5   1.76            -6.32          1.5
2       10/16/12 (C160S)   65.5         1.267       0.401 ± 0.0189   4.93     -3603 ± 229.6   12.2            -7.24          6.6
3       10/14/12 (C27S)    112.3        1.2076      0.425 ± 0.00719  7.19     -4059 ± 99.68   9.91            -7.01          7.8
4       10/24/12 (wt)      79.14        0.7874      0.389 ± 0.00815  5.78     -4687 ± 136.1   8.25            -7.15          6.8
Ave.                                                0.431            5.97     -4450           10.12           -7.13

Table F.8. ATRA-bound C235A hRARα LBD ITC results.

Exp. #  Date      [RARα] (μM)  [NR2] (mM)  N                Kd (μM)  ∆H (cal/mol)    ∆S (cal/mol/K)  ∆G (kcal/mol)  C value
1       12/14/12  43.1         0.422       0.784 ± 0.00313  0.82     -4952 ± 27.87   11.2            -8.29          52.6
2       12/20/12  46.8         0.464       1.03 ± 0.0753    0.96     -5499 ± 82.2    9.09            -8.21          48.7
3       12/21/12  78.7         0.789       0.929 ± 0.00316  0.79     -5175 ± 32.28   10.6            -8.34          99.9
Ave.                                       0.914            0.86     -5208           10.3            -8.28

Table F.9. β-apo-13-carotenone-bound C235A hRARα LBD ITC results.

Exp. #  Date      [RARα] (μM)  [NR2] (mM)  N                Kd (μM)  ∆H (cal/mol)    ∆S (cal/mol/K)  ∆G (kcal/mol)  C value
1       12/13/12  85.16        0.8500      1.05 ± 0.00429   1.02     -4745 ± 28.69   11.5            -8.17          83.5
2       04/29/13  43.20        0.4247      0.974 ± 0.0106   2.14     -5095 ± 58.7    8.86            -7.74          20.2
3       04/30/13  39.18        0.3924      0.878 ± 0.00529  1.92     -4860 ± 40.21   9.85            -7.80          20.4
4       05/01/13  68.12        0.6802      0.951 ± 0.00427  2.49     -4977 ± 32.80   8.94            -7.64          27.3
Ave.                                       0.963            1.89     -4919           9.79            -7.84
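The thermodynamic columns in these tables are linked by the standard relations ∆G = RT ln(Kd) and ∆G = ∆H − T∆S, which gives a simple consistency check on any row. The sketch below applies both to experiment 1 of Table F.1. The temperature of 298.15 K is an assumption made for illustration (it is not stated in this appendix), and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
T_ASSUMED = 298.15     # assumed experimental temperature in K (25 °C)


def dg_from_kd(kd_molar, temperature=T_ASSUMED):
    """Binding free energy (kcal/mol) from a dissociation constant in M."""
    return R_KCAL * temperature * math.log(kd_molar)


def dg_from_dh_ds(dh_cal, ds_cal_per_k, temperature=T_ASSUMED):
    """Binding free energy (kcal/mol) from ∆H (cal/mol) and ∆S (cal/mol/K)."""
    return (dh_cal - temperature * ds_cal_per_k) / 1000.0


# Table F.1, experiment 1: Kd = 0.65 μM, ∆H = -5132 cal/mol, ∆S = 11.1 cal/mol/K
dg_kd = dg_from_kd(0.65e-6)
dg_hs = dg_from_dh_ds(-5132.0, 11.1)
print(f"from Kd: {dg_kd:.3f} kcal/mol, from dH and dS: {dg_hs:.3f} kcal/mol")
```

Under the assumed temperature, both routes should land close to the tabulated ∆G for that experiment; small differences are expected because the tabulated Kd and ∆S are rounded.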


Appendix G. AMBER Input Files for All-trans Retinoic Acid

REA.prepi:

0 0 2

This is a remark line
molecule.res
REA   INT  0
CORRECT     OMIT DU   BEG
  0.0000
   1  DUMM  DU    M    0  -1  -2     0.000        .0        .0    .00000
   2  DUMM  DU    M    1   0  -1     1.449        .0        .0    .00000
   3  DUMM  DU    M    2   1   0     1.522     111.1        .0    .00000
   4  C16   c3    M    3   2   1     1.540   111.208   180.000  -0.16090
   5  H161  hc    E    4   3   2     1.084    32.004  -165.480   0.03440
   6  H162  hc    E    4   3   2     1.087   133.216  -122.205   0.03440
   7  H163  hc    E    4   3   2     1.086   108.185    99.088   0.03440
   8  C1    c3    M    4   3   2     1.542    83.409   -11.141   0.18360
   9  C17   c3    3    8   4   3     1.541   108.012   -66.592  -0.07170
  10  H171  hc    E    9   8   4     1.083   110.961    63.148   0.01120
  11  H172  hc    E    9   8   4     1.085   111.772  -176.984   0.01120
  12  H173  hc    E    9   8   4     1.087   110.521   -56.677   0.01120
  13  C2    c3    M    8   4   3     1.540   107.354   174.644  -0.08810
  14  H21   hc    E   13   8   4     1.089   108.954   -45.008   0.01800
  15  H22   hc    E   13   8   4     1.088   109.124    70.934   0.02230
  16  C3    c3    M   13   8   4     1.522   112.743  -165.386  -0.06160
  17  H31   hc    E   16  13   8     1.088   110.468  -177.367   0.01420
  18  H32   hc    E   16  13   8     1.086   110.350   -59.828   0.03770
  19  C4    c3    M   16  13   8     1.522   109.349    61.125  -0.02760
  20  H41   hc    E   19  16  13     1.091   109.810    77.237   0.02950
  21  H42   hc    E   19  16  13     1.089   110.236  -166.968   0.02060
  22  C5    c2    M   19  16  13     1.516   113.710   -44.617   0.02710
  23  C18   c3    3   22  19  16     1.512   112.294  -167.042  -0.21570
  24  H181  hc    E   23  22  19     1.080   112.990  -165.973   0.06040
  25  H182  hc    E   23  22  19     1.088   109.637   -44.822   0.06040
  26  H183  hc    E   23  22  19     1.088   110.956    72.614   0.06040
  27  C6    ce    M   22  19  16     1.335   123.208    14.273  -0.11130
  28  C7    ce    M   27  22  19     1.492   122.169  -177.016  -0.06540
  29  H7    ha    E   28  27  22     1.077   116.290  -119.347   0.08880
  30  C8    cf    M   28  27  22     1.330   124.492    62.807  -0.15680
  31  H8    ha    E   30  28  27     1.079   117.790    -0.612   0.11120
  32  C9    cf    M   30  28  27     1.470   127.086   179.106   0.02610
  33  C19   c3    3   32  30  28     1.509   117.743     0.328  -0.10900
  34  H191  hc    E   33  32  30     1.080   112.300   179.653   0.03420
  35  H192  hc    E   33  32  30     1.088   110.915   -59.860   0.03420
  36  H193  hc    E   33  32  30     1.088   110.928    58.987   0.03420
  37  C10   ce    M   32  30  28     1.340   118.134  -179.671  -0.24410
  38  H10   ha    E   37  32  30     1.080   116.689     0.005   0.14930
  39  C11   ce    M   37  32  30     1.456   129.205  -179.933  -0.13330
  40  H11   ha    E   39  37  32     1.073   118.534    -0.120   0.17200
  41  C12   cf    M   39  37  32     1.338   121.961   179.837  -0.19580
  42  H12   ha    E   41  39  37     1.080   117.617     0.083   0.15140
  43  C13   cf    M   41  39  37     1.468   127.352  -179.878   0.03030
  44  C20   c3    3   43  41  39     1.510   117.932    -0.093  -0.05880
  45  H201  hc    E   44  43  41     1.077   109.550   179.766   0.02080
  46  H202  hc    E   44  43  41     1.090   110.634   -59.108   0.02080
  47  H203  hc    E   44  43  41     1.089   110.690    59.274   0.02080
  48  C14   ce    M   43  41  39     1.336   117.512  -179.792  -0.19930
  49  H14   ha    E   48  43  41     1.079   117.335     0.622   0.07010
  50  C15   c     M   48  43  41     1.542   130.650   178.205   0.83570
  51  O2    o     E   50  48  43     1.233   118.037    14.389  -0.78590
  52  O1    o     M   50  48  43     1.230   112.195  -165.907  -0.78590

LOOP
   C6    C1

IMPROPER
   C4    C18   C5    C6
   C5    C1    C6    C7
   C6    C8    C7    H7
   C7    C9    C8    H8
   C19   C10   C9    C8
   C11   C9    C10   H10
   C10   C12   C11   H11
   C11   C13   C12   H12
   C20   C14   C13   C12
   C15   C13   C14   H14
   C14   O2    C15   O1

DONE
STOP

REA.frcmod:

remark goes here
MASS

BOND

ANGLE
c3-c3-ce   63.700     110.960   same as c2-c3-c3
c3-cf-ce   65.700     117.400   same as c3-c2-ce

DIHE
c3-c3-ce-c2   1    0.000         0.000           2.000      same as X -c2-c3-X
c3-c3-ce-ce   1    0.000         0.000           2.000      same as X -c2-c3-X
cf-cf-c3-hc   1    0.000         0.000           2.000      same as X -c2-c3-X
hc-c3-cf-ce   1    0.000         0.000           2.000      same as X -c2-c3-X

IMPROPER
c3-c3-c2-ce         1.1          180.0         2.0          Using default value
c2-c3-ce-ce         1.1          180.0         2.0          Using default value
ce-cf-ce-ha         1.1          180.0         2.0          Using default value
ce-cf-cf-ha         1.1          180.0         2.0          Using default value
c3-ce-cf-cf         1.1          180.0         2.0          Using default value
c -cf-ce-ha         1.1          180.0         2.0          Using default value
ce-o -c -o          1.1          180.0         2.0          General improper

NONBON
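Each atom record in a prepi file carries eleven whitespace-separated fields: index, atom name, atom type, tree type, three connectivity indices, bond length, angle, dihedral, and partial charge. The sketch below reads such records and sums the partial charges, which is a quick sanity check on a ligand library. It is an illustrative parser written for this appendix, not part of AMBER; the function name is hypothetical, and the sample holds only the dummy atoms and the carboxylate records from REA.prepi rather than the full file.

```python
def parse_prepi_atoms(text):
    """Return (name, atom_type, charge) for each non-dummy atom record.

    A record is recognized as a line with exactly 11 whitespace-separated
    fields whose first field is an integer and last field is a float.
    """
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11:
            continue
        try:
            int(fields[0])
            charge = float(fields[10])
        except ValueError:
            continue
        if fields[2] == "DU":  # skip the three leading dummy atoms
            continue
        atoms.append((fields[1], fields[2], charge))
    return atoms


# Excerpt only: dummy atoms plus the carboxylate group of REA.prepi.
SAMPLE = """\
   1  DUMM  DU    M    0  -1  -2     0.000        .0        .0    .00000
   2  DUMM  DU    M    1   0  -1     1.449        .0        .0    .00000
   3  DUMM  DU    M    2   1   0     1.522     111.1        .0    .00000
  50  C15   c     M   48  43  41     1.542   130.650   178.205   0.83570
  51  O2    o     E   50  48  43     1.233   118.037    14.389  -0.78590
  52  O1    o     M   50  48  43     1.230   112.195  -165.907  -0.78590
"""

atoms = parse_prepi_atoms(SAMPLE)
fragment_charge = sum(charge for _, _, charge in atoms)
print(atoms)
print(f"charge of excerpt: {fragment_charge:.4f}")
```

Passing the complete text of a prepi file to the same function gives the net charge of the whole residue.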


Appendix H. AMBER Input Files for TTNPB

TTB.prepi:

0 0 2

This is a remark line
molecule.res
TTB   INT  0
CORRECT     OMIT DU   BEG
  0.0000
   1  DUMM  DU    M    0  -1  -2     0.000        .0        .0    .00000
   2  DUMM  DU    M    1   0  -1     1.449        .0        .0    .00000
   3  DUMM  DU    M    2   1   0     1.522     111.1        .0    .00000
   4  O     o     M    3   2   1     1.540   111.208   180.000  -0.78630
   5  C8    c     M    4   3   2     1.232   113.794    92.593   0.80290
   6  O1    o     E    5   4   3     1.231   130.212     4.396  -0.78630
   7  C7    ca    M    5   4   3     1.549   114.849  -175.615   0.02170
   8  C9    ca    B    7   5   4     1.389   120.978   179.606  -0.19570
   9  H9    ha    E    8   7   5     1.073   117.775    -0.409   0.13850
  10  C10   ca    S    8   7   5     1.386   121.218   179.932  -0.12940
  11  H10   ha    E   10   8   7     1.076   119.603   178.405   0.09960
  12  C6    ca    M    7   5   4     1.390   121.029     0.283  -0.18210
  13  H6    ha    E   12   7   5     1.074   117.780    -0.089   0.13880
  14  C5    ca    M   12   7   5     1.383   121.064   179.646  -0.16830
  15  H5    ha    E   14  12   7     1.078   119.790  -179.594   0.11120
  16  C4    ca    M   14  12   7     1.395   121.005     0.942   0.04350
  17  C3    cf    M   16  14  12     1.483   119.049  -179.027  -0.21560
  18  H3    ha    E   17  16  14     1.079   114.114    41.295   0.12070
  19  C2    ce    M   17  16  14     1.333   129.262  -136.916   0.02010
  20  C     c3    3   19  17  16     1.512   124.317     3.039  -0.07500
  21  H1    hc    E   20  19  17     1.086   111.071   131.939   0.03410
  22  H4    hc    E   20  19  17     1.088   111.034  -109.557   0.03410
  23  H7    hc    E   20  19  17     1.080   111.570    11.415   0.03410
  24  C11   ca    M   19  17  16     1.497   119.075  -177.038   0.01200
  25  C16   ca    S   24  19  17     1.386   122.012  -131.531  -0.28340
  26  H16   ha    E   25  24  19     1.073   118.048     0.618   0.21030
  27  C12   ca    M   24  19  17     1.392   121.082    48.581  -0.06500
  28  H12   ha    E   27  24  19     1.075   119.670     1.995   0.10670
  29  C13   ca    M   27  24  19     1.377   120.520  -178.635  -0.31020
  30  H13   ha    E   29  27  24     1.075   117.989  -179.893   0.17440
  31  C14   ca    M   29  27  24     1.397   122.560    -0.366   0.01320
  32  C15   ca    M   31  29  27     1.399   117.663    -1.344  -0.00050
  33  C17   c3    M   32  31  29     1.539   122.979  -179.628   0.23300
  34  C18   c3    3   33  32  31     1.541   111.417   134.088  -0.24560
  35  H18   hc    E   34  33  32     1.088   109.735  -172.971   0.04910
  36  H19   hc    E   34  33  32     1.085   111.127   -53.536   0.04910
  37  H17   hc    E   34  33  32     1.084   112.693    67.708   0.04910
  38  C19   c3    3   33  32  31     1.542   109.430  -106.015  -0.23190
  39  H20   hc    E   38  33  32     1.087   110.528  -179.610   0.05110
  40  H21   hc    E   38  33  32     1.085   110.844   -60.063   0.05110
  41  H22   hc    E   38  33  32     1.083   111.953    59.718   0.05110
  42  C20   c3    M   33  32  31     1.538   110.371    15.238  -0.07810
  43  H23   hc    E   42  33  32     1.087   109.066    77.338   0.03270
  44  H24   hc    E   42  33  32     1.087   109.047  -167.216   0.02710
  45  C21   c3    M   42  33  32     1.521   112.960   -44.585  -0.11730
  46  H25   hc    E   45  42  33     1.088   110.120  -175.827   0.01890
  47  H26   hc    E   45  42  33     1.086   109.460   -59.448   0.03830
  48  C22   c3    M   45  42  33     1.538   112.781    62.157   0.27660
  49  C24   c3    3   48  45  42     1.542   110.092    76.638  -0.26110
  50  H27   hc    E   49  48  45     1.087   110.652    59.065   0.05650
  51  H28   hc    E   49  48  45     1.085   110.810   178.687   0.05650
  52  H29   hc    E   49  48  45     1.084   111.878   -61.667   0.05650
  53  C23   c3    M   48  45  42     1.541   107.222  -165.742  -0.18550
  54  H30   hc    E   53  48  45     1.087   109.814   -52.517   0.03490
  55  H31   hc    E   53  48  45     1.085   111.062    66.906   0.03490
  56  H32   hc    E   53  48  45     1.083   112.638  -171.927   0.03490

LOOP
   C4    C10
   C15   C16
   C22   C14

IMPROPER
   C7    O     C8    O1
   C8    C6    C7    C9
   C7    C10   C9    H9
   C4    C9    C10   H10
   C5    C7    C6    H6
   C6    C4    C5    H5
   C5    C10   C4    C3
   C4    C2    C3    H3
   C     C11   C2    C3
   C16   C12   C11   C2
   C11   C15   C16   H16
   C11   C13   C12   H12
   C12   C14   C13   H13
   C22   C13   C14   C15
   C17   C16   C15   C14

DONE
STOP

TTB.frcmod:

remark goes here
MASS

BOND

ANGLE
ca-cf-ce   65.200     123.080   same as c2-cf-ca
cf-ce-c3   64.300     122.890   same as c2-ce-c3
cf-ce-ca   65.200     123.080   same as c2-ce-ca

DIHE
ca-ca-cf-ha   1    6.650       180.000           2.000      same as X -c2-cf-X
ca-ca-cf-ce   1    6.650       180.000           2.000      same as X -c2-cf-X
cf-ce-c3-hc   1    0.000         0.000           2.000      same as X -c2-c3-X
cf-ce-ca-ca   1    2.550       180.000           2.000      same as X -c2-ca-X
c3-ce-ca-ca   1    2.550       180.000           2.000      same as X -c2-ca-X
hc-c3-ce-ca   1    0.000         0.000           2.000      same as X -c2-c3-X

IMPROPER
ca-o -c -o          1.1          180.0         2.0          General improper
c -ca-ca-ca         1.1          180.0         2.0          Using default value
ca-ca-ca-ha         1.1          180.0         2.0          General improper
ca-ca-ca-cf         1.1          180.0         2.0          Using default value
ca-ce-cf-ha         1.1          180.0         2.0          Using default value
c3-ca-ce-cf         1.1          180.0         2.0          Using default value
ca-ca-ca-ce         1.1          180.0         2.0          Using default value
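The four numbers on each DIHE line of a frcmod file are the divider IDIVF, the force constant PK (kcal/mol), the phase (degrees), and the periodicity PN, which enter the AMBER torsion term as E(φ) = (PK/IDIVF)[1 + cos(PN·φ − phase)]. The sketch below evaluates this term for the `ca-ca-cf-ce` entry of TTB.frcmod to show where its minima and maxima fall. The function name is hypothetical, and the listing is an illustration rather than code used in this work.

```python
import math


def torsion_energy(phi_deg, idivf, pk, phase_deg, pn):
    """AMBER torsion term in kcal/mol for a dihedral angle given in degrees.

    abs(pn) is used because a negative periodicity in AMBER parameter files
    only signals that further terms follow for the same dihedral.
    """
    angle = math.radians(abs(pn) * phi_deg - phase_deg)
    return (pk / idivf) * (1.0 + math.cos(angle))


# ca-ca-cf-ce from TTB.frcmod: IDIVF = 1, PK = 6.650, phase = 180.0, PN = 2
profile = {
    phi: torsion_energy(phi, 1, 6.650, 180.0, 2.0) for phi in (0, 90, 180)
}
for phi, energy in profile.items():
    print(f"phi = {phi:3d} deg: {energy:6.3f} kcal/mol")
```

With a phase of 180° and a periodicity of 2, the term is lowest at the planar arrangements (0° and 180°) and highest at 90°, consistent with a parameter that favors conjugation across the bond.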


Appendix I. AMBER Input Files for β-apo-13-carotenone

A13.prepi:

0 0 2

This is a remark line
molecule.res
A13   INT  0
CORRECT     OMIT DU   BEG
  0.0000
   1  DUMM  DU    M    0  -1  -2     0.000        .0        .0    .00000
   2  DUMM  DU    M    1   0  -1     1.449        .0        .0    .00000
   3  DUMM  DU    M    2   1   0     1.522     111.1        .0    .00000
   4  C14   c3    M    3   2   1     1.540   111.208   180.000  -0.20710
   5  H12   hc    E    4   3   2     1.084   152.993    -5.595   0.05050
   6  H13   hc    E    4   3   2     1.087    76.375    97.946   0.05050
   7  H14   hc    E    4   3   2     1.085    47.863   -32.051   0.05050
   8  C1    c3    M    4   3   2     1.542    89.331  -149.909   0.14680
   9  C15   c3    3    8   4   3     1.541   108.103   111.409  -0.07260
  10  H15   hc    E    9   8   4     1.084   111.109    62.701   0.01670
  11  H16   hc    E    9   8   4     1.084   111.782  -177.131   0.01670
  12  H17   hc    E    9   8   4     1.087   110.434   -56.897   0.01670
  13  C2    c3    M    8   4   3     1.540   107.388    -7.463  -0.05980
  14  H1    hc    E   13   8   4     1.088   109.035   -44.794   0.02550
  15  H2    hc    E   13   8   4     1.087   109.053    71.143   0.02550
  16  C3    c3    M   13   8   4     1.522   112.805  -165.306  -0.06110
  17  H3    hc    E   16  13   8     1.086   110.449  -177.522   0.02800
  18  H4    hc    E   16  13   8     1.086   110.428   -59.923   0.02800
  19  C4    c3    M   16  13   8     1.523   109.425    61.114  -0.05070
  20  H5    hc    E   19  16  13     1.091   109.867    76.951   0.04000
  21  H6    hc    E   19  16  13     1.088   110.339  -167.066   0.04000
  22  C5    c2    M   19  16  13     1.516   113.693   -44.691   0.03540
  23  C16   c3    3   22  19  16     1.512   112.289  -166.977  -0.23480
  24  H18   hc    E   23  22  19     1.080   113.300  -166.231   0.06860
  25  H19   hc    E   23  22  19     1.087   109.563   -45.284   0.06860
  26  H20   hc    E   23  22  19     1.087   110.753    72.113   0.06860
  27  C6    ce    M   22  19  16     1.334   122.892    14.625  -0.10240
  28  C7    ce    M   27  22  19     1.491   121.836  -177.875  -0.00540
  29  H7    ha    E   28  27  22     1.076   116.139  -118.127   0.09150
  30  C8    cf    M   28  27  22     1.328   124.617    63.914  -0.19060
  31  H8    ha    E   30  28  27     1.077   118.378    -0.501   0.11370
  32  C9    cf    M   30  28  27     1.475   126.302   179.197   0.07440
  33  C17   c3    3   32  30  28     1.509   117.666     0.108  -0.08560
  34  H21   hc    E   33  32  30     1.081   112.908   179.762   0.04110
  35  H22   hc    E   33  32  30     1.086   110.460    59.102   0.04110
  36  H23   hc    E   33  32  30     1.085   110.421   -59.538   0.04110
  37  C10   ce    M   32  30  28     1.338   117.822  -179.900  -0.25310
  38  H9    ha    E   37  32  30     1.077   117.761     0.044   0.16510
  39  C11   ce    M   37  32  30     1.460   127.673  -179.905  -0.08410
  40  H10   ha    E   39  37  32     1.073   117.857    -0.115   0.18470
  41  C12   cf    M   39  37  32     1.331   122.974   179.972  -0.34030
  42  H11   ha    E   41  39  37     1.076   120.915    -0.013   0.16950
  43  C13   c     M   41  39  37     1.484   125.316   179.981   0.62870
  44  O1    o     E   43  41  39     1.197   119.547   179.993  -0.54700
  45  C18   c3    M   43  41  39     1.516   119.791    -0.027  -0.16790
  46  H24   hc    E   45  43  41     1.080   108.579  -179.961   0.04500
  47  H25   hc    E   45  43  41     1.085   110.938    59.897   0.04500
  48  H26   hc    E   45  43  41     1.086   110.901   -59.876   0.04500

LOOP
   C6    C1

IMPROPER
   C4    C16   C5    C6
   C5    C1    C6    C7
   C6    C8    C7    H7
   C7    C9    C8    H8
   C17   C10   C9    C8
   C11   C9    C10   H9
   C10   C12   C11   H10
   C13   C11   C12   H11
   C18   C12   C13   O1

DONE
STOP

A13.frcmod:

remark goes here
MASS

BOND

ANGLE
c3-c3-ce   63.700     110.960   same as c2-c3-c3
c3-cf-ce   65.700     117.400   same as c3-c2-ce

DIHE
c3-c3-ce-c2   1    0.000         0.000           2.000      same as X -c2-c3-X
c3-c3-ce-ce   1    0.000         0.000           2.000      same as X -c2-c3-X
cf-cf-c3-hc   1    0.000         0.000           2.000      same as X -c2-c3-X
hc-c3-cf-ce   1    0.000         0.000           2.000      same as X -c2-c3-X

IMPROPER
c3-c3-c2-ce         1.1          180.0         2.0          Using default value
c2-c3-ce-ce         1.1          180.0         2.0          Using default value
ce-cf-ce-ha         1.1          180.0         2.0          Using default value
ce-cf-cf-ha         1.1          180.0         2.0          Using default value
c3-ce-cf-cf         1.1          180.0         2.0          Using default value
c -ce-cf-ha         1.1          180.0         2.0          Using default value
c3-cf-c -o         10.5          180.0         2.0          General improper torsional angle (2 general atom types)

NONBON


Appendix J. AMBER Input Files for β-apo-14’-carotenoic Acid

A14.prepi:

0 0 2

This is a remark line
molecule.res
A14   INT  0
CORRECT     OMIT DU   BEG
  0.0000
   1  DUMM  DU    M    0  -1  -2     0.000        .0        .0    .00000
   2  DUMM  DU    M    1   0  -1     1.449        .0        .0    .00000
   3  DUMM  DU    M    2   1   0     1.522     111.1        .0    .00000
   4  C16   c3    M    3   2   1     1.540   111.208   180.000  -0.18120
   5  H161  hc    E    4   3   2     1.084    35.351  -161.594   0.04050
   6  H162  hc    E    4   3   2     1.087   133.430  -110.318   0.04050
   7  H163  hc    E    4   3   2     1.086   110.656   105.705   0.04050
   8  C1    c3    M    4   3   2     1.541    79.305    -3.180   0.17490
   9  C17   c3    3    8   4   3     1.541   108.102   -65.408  -0.07310
  10  H171  hc    E    9   8   4     1.083   110.943    63.040   0.01290
  11  H172  hc    E    9   8   4     1.084   111.800  -176.984   0.01290
  12  H173  hc    E    9   8   4     1.088   110.447   -56.665   0.01290
  13  C2    c3    M    8   4   3     1.540   107.380   175.773  -0.10790
  14  H21   hc    E   13   8   4     1.089   108.948   -44.921   0.02550
  15  H22   hc    E   13   8   4     1.087   109.055    70.993   0.03190
  16  C3    c3    M   13   8   4     1.522   112.783  -165.363  -0.03860
  17  H31   hc    E   16  13   8     1.087   110.494  -177.419   0.00860
  18  H32   hc    E   16  13   8     1.086   110.365   -59.787   0.03200
  19  C4    c3    M   16  13   8     1.522   109.347    61.142  -0.03040
  20  H41   hc    E   19  16  13     1.091   109.826    77.175   0.03120
  21  H42   hc    E   19  16  13     1.088   110.270  -167.009   0.02180
  22  C5    c2    M   19  16  13     1.516   113.696   -44.618   0.03020
  23  C18   c3    3   22  19  16     1.512   112.277  -167.043  -0.21580
  24  H181  hc    E   23  22  19     1.079   113.097  -166.075   0.06030
  25  H182  hc    E   23  22  19     1.089   109.644   -44.913   0.06030
  26  H183  hc    E   23  22  19     1.089   110.920    72.494   0.06030
  27  C6    ce    M   22  19  16     1.334   123.175    14.340  -0.11340
  28  C7    ce    M   27  22  19     1.491   122.130  -177.124  -0.05170
  29  H7    ha    E   28  27  22     1.077   116.261  -119.005   0.08820
  30  C8    cf    M   28  27  22     1.330   124.535    63.064  -0.15860
  31  H8    ha    E   30  28  27     1.078   117.860    -0.594   0.10880
  32  C9    cf    M   30  28  27     1.472   126.991   179.186   0.02870
  33  C19   c3    3   32  30  28     1.509   117.674     0.273  -0.08260
  34  H191  hc    E   33  32  30     1.081   112.326   179.685   0.03050
  35  H192  hc    E   33  32  30     1.087   110.842   -59.810   0.03050
  36  H193  hc    E   33  32  30     1.086   110.874    59.074   0.03050
  37  C10   ce    M   32  30  28     1.339   118.063  -179.712  -0.25250
  38  H10   ha    E   37  32  30     1.079   116.827     0.006   0.15770
  39  C11   ce    M   37  32  30     1.458   128.963  -179.958  -0.11900
  40  H11   ha    E   39  37  32     1.074   118.401    -0.094   0.17080
  41  C12   cf    M   39  37  32     1.338   121.998   179.873  -0.23120
  42  H12   ha    E   41  39  37     1.079   117.655     0.084   0.14900
  43  C13   cf    M   41  39  37     1.461   127.296  -179.964   0.02310
  44  C20   c3    3   43  41  39     1.509   117.932     0.008  -0.04160
  45  H201  hc    E   44  43  41     1.080   111.945  -179.986   0.02060
  46  H202  hc    E   44  43  41     1.088   110.981   -59.522   0.02060
  47  H203  hc    E   44  43  41     1.088   110.988    59.548   0.02060
  48  C14   ce    M   43  41  39     1.342   118.072  -179.976  -0.11640
  49  H14   ha    E   48  43  41     1.079   116.704    -0.008   0.12110
  50  C21   ce    M   48  43  41     1.457   129.050  -179.953  -0.22660
  51  H2    ha    E   50  48  43     1.074   120.207    -0.094   0.16740
  52  C22   cf    M   50  48  43     1.329   122.727   179.968  -0.17140
  53  H1    ha    E   52  50  48     1.081   120.396    -0.008   0.09820
  54  C23   c     M   52  50  48     1.538   123.861   179.996   0.83680
  55  O2    o     E   54  52  50     1.231   113.662  -179.990  -0.79420
  56  O1    o     M   54  52  50     1.232   115.751    -0.035  -0.79420

LOOP
   C6    C1

IMPROPER
   C4    C18   C5    C6
   C5    C1    C6    C7
   C6    C8    C7    H7
   C7    C9    C8    H8
   C19   C10   C9    C8
   C11   C9    C10   H10
   C10   C12   C11   H11
   C11   C13   C12   H12
   C20   C14   C13   C12
   C21   C13   C14   H14
   C14   C22   C21   H2
   C23   C21   C22   H1
   C22   O2    C23   O1

DONE
STOP

A14.frcmod:

remark goes here
MASS

BOND

ANGLE
c3-c3-ce   63.700     110.960   same as c2-c3-c3
c3-cf-ce   65.700     117.400   same as c3-c2-ce

DIHE
c3-c3-ce-c2   1    0.000         0.000           2.000      same as X -c2-c3-X
c3-c3-ce-ce   1    0.000         0.000           2.000      same as X -c2-c3-X
cf-cf-c3-hc   1    0.000         0.000           2.000      same as X -c2-c3-X
hc-c3-cf-ce   1    0.000         0.000           2.000      same as X -c2-c3-X

IMPROPER
c3-c3-c2-ce         1.1          180.0         2.0          Using default value
c2-c3-ce-ce         1.1          180.0         2.0          Using default value
ce-cf-ce-ha         1.1          180.0         2.0          Using default value
ce-cf-cf-ha         1.1          180.0         2.0          Using default value
c3-ce-cf-cf         1.1          180.0         2.0          Using default value
c -ce-cf-ha         1.1          180.0         2.0          Using default value
cf-o -c -o          1.1          180.0         2.0          General improper torsional angle (1 general atom type)

NONBON

Appendix K. Partial Atomic Charges for β-apo-13-carotenone Covalently Linked to Cysteine

Table K.1. Partial atomic charges for CCS.

                                 ff99 reference          CCS
Group                 Atom     Atom     Partial      Atom     Partial
                      Name     Type     Charge       Type     Charge
ACE                   CH3      CT       -0.3662               -0.3541
                      HH31     HC        0.1123                0.1429
                      HH32     HC        0.1123                0.0688
                      HH33     HC        0.1123                0.1131
                      C        C         0.5972                0.5972
                      O        O        -0.5679               -0.5679
CYS                   N        N        -0.4157      N        -0.4157
                      H        H         0.2719      H         0.2719
                      CA       CT        0.0213      CT        0.0016
                      HA       H1        0.1124      H1        0.1458
                      CB       CT       -0.1231      CT       -0.0398
                      HB2      H1        0.1112      H1        0.0560
                      HB3      H1        0.1112      H1        0.0560
                      SG       SH       -0.3119      S        -0.2672
                      C        C         0.5973      C         0.5973
                      O        O        -0.5679      O        -0.5679
β-apo-13-carotenone   CD                             c3        0.2961
                      OE                             oh       -0.5959
                      HOE                            ho        0.4408
                      CE                             c3       -0.2098
                      HE1                            hc        0.0657
                      HE2                            hc        0.0657
                      HE3                            hc        0.0657
                      C1                             c2       -0.1404
                      H1                             ha        0.1476
                      C2                             ce       -0.0599
                      H2                             ha        0.1482
                      C3                             ce       -0.2939
                      H3                             ha        0.1469
                      C4                             cf        0.0635
                      C5                             c3       -0.0905
                      H51                            hc        0.0413
                      H52                            hc        0.0413
                      H53                            hc        0.0413
                      C6                             cf       -0.1555
                      H6                             ha        0.1150
                      C7                             ce       -0.0312
                      H7                             ha        0.0928
                      C8                             ce       -0.1124
                      C9                             c2        0.0333
                      C10                            c3       -0.2218
                      H101                           hc        0.0618
                      H102                           hc        0.0618
                      H103                           hc        0.0618
                      C11                            c3       -0.0453
                      H111                           hc        0.0388
                      H112                           hc        0.0388
                      C12                            c3       -0.0665
                      H121                           hc        0.0282
                      H122                           hc        0.0282
                      C13                            c3       -0.0582
                      H131                           hc        0.0244
                      H132                           hc        0.0244
                      C14                            c3        0.1527
                      C15                            c3       -0.1000
                      H151                           hc        0.0243
                      H152                           hc        0.0243
                      H153                           hc        0.0243
                      C16                            c3       -0.2062
                      H161                           hc        0.0502
                      H162                           hc        0.0502
                      H163                           hc        0.0502
NME                   N        N        -0.4157               -0.4157
                      H        H         0.2719                0.2719
                      CH3      CT       -0.1490               -0.1169
                      HH31     H1        0.0976                0.0905
                      HH32     H1        0.0976                0.0982
                      HH33     H1        0.0976                0.0720
TOTAL                                                          0.0001
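The TOTAL entry of Table K.1 can be checked by summing the CCS-column charges. The short Python sketch below does this per group and overall; the values are transcribed from the table, while the dictionary name `ccs_charges` and the group keys are illustrative choices rather than anything used in the original work.

```python
# CCS-column partial charges transcribed from Table K.1, grouped as in the table.
ccs_charges = {
    "ACE": [-0.3541, 0.1429, 0.0688, 0.1131, 0.5972, -0.5679],
    "CYS": [-0.4157, 0.2719, 0.0016, 0.1458, -0.0398,
            0.0560, 0.0560, -0.2672, 0.5973, -0.5679],
    "carotenone": [
        0.2961, -0.5959, 0.4408,                    # CD, OE, HOE
        -0.2098, 0.0657, 0.0657, 0.0657,            # CE, HE1-HE3
        -0.1404, 0.1476, -0.0599, 0.1482,           # C1, H1, C2, H2
        -0.2939, 0.1469, 0.0635,                    # C3, H3, C4
        -0.0905, 0.0413, 0.0413, 0.0413,            # C5, H51-H53
        -0.1555, 0.1150, -0.0312, 0.0928,           # C6, H6, C7, H7
        -0.1124, 0.0333,                            # C8, C9
        -0.2218, 0.0618, 0.0618, 0.0618,            # C10, H101-H103
        -0.0453, 0.0388, 0.0388,                    # C11, H111, H112
        -0.0665, 0.0282, 0.0282,                    # C12, H121, H122
        -0.0582, 0.0244, 0.0244,                    # C13, H131, H132
        0.1527,                                     # C14
        -0.1000, 0.0243, 0.0243, 0.0243,            # C15, H151-H153
        -0.2062, 0.0502, 0.0502, 0.0502,            # C16, H161-H163
    ],
    "NME": [-0.4157, 0.2719, -0.1169, 0.0905, 0.0982, 0.0720],
}

# Round to four decimals, the precision at which the charges are tabulated.
group_totals = {group: round(sum(values), 4) for group, values in ccs_charges.items()}
net_charge = round(sum(sum(values) for values in ccs_charges.values()), 4)

for group, total in group_totals.items():
    print(f"{group:12s} {total:8.4f}")
print(f"{'TOTAL':12s} {net_charge:8.4f}")
```

With these values the ACE and NME caps each sum to zero, so the residual of 0.0001 comes from the cysteine and β-apo-13-carotenone atoms together, which are the 56 atoms that make up the CCS residue.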

Appendix L. CCS Library File for Implementation in AMBER

!!index array str "CCS" !entry.CCS.unit.atoms table str name str type int typex int resx int flags int seq int elmnt dbl chg "N" "N" 0 1 131075 7 7 -0.415700 "H" "H" 0 1 131075 8 1 0.271900 "CA" "CT" 0 1 131075 9 6 0.001600 "HA" "H1" 0 1 131075 10 1 0.145800 "C" "C" 0 1 131075 11 6 0.597300 "O" "O" 0 1 131075 12 8 -0.567900 "CB" "CT" 0 1 131075 19 6 -0.039800 "HB2" "H1" 0 1 131075 20 1 0.056000 "HB3" "H1" 0 1 131075 21 1 0.056000 "SG" "S" 0 1 131075 22 16 -0.267200 "CD" "c3" 0 1 131075 23 6 0.296100 "CE" "c3" 0 1 131075 24 6 -0.209800 "HE1" "hc" 0 1 131075 25 1 0.065700 "HE2" "hc" 0 1 131075 26 1 0.065700 "HE3" "hc" 0 1 131075 27 1 0.065700 "OE" "oh" 0 1 131075 28 8 -0.595900 "HOE" "ho" 0 1 131075 29 1 0.440800 "C1" "c2" 0 1 131075 30 6 -0.140400 "H1" "ha" 0 1 131075 31 1 0.147600 "C2" "ce" 0 1 131075 32 6 -0.059900 "H2" "ha" 0 1 131075 33 1 0.148200 "C3" "ce" 0 1 131075 34 6 -0.293900 "H3" "ha" 0 1 131075 35 1 0.146900 "C4" "cf" 0 1 131075 36 6 0.063500 "C5" "c3" 0 1 131075 37 6 -0.090500 "H51" "hc" 0 1 131075 38 1 0.041300 "H52" "hc" 0 1 131075 39 1 0.041300 "H53" "hc" 0 1 131075 40 1 0.041300 "C6" "cf" 0 1 131075 41 6 -0.155500 "H6" "ha" 0 1 131075 42 1 0.115000 "C7" "ce" 0 1 131075 43 6 -0.031200 "H7" "ha" 0 1 131075 44 1 0.092800 "C8" "ce" 0 1 131075 45 6 -0.112400 "C9" "c2" 0 1 131075 46 6 0.033300 "C10" "c3" 0 1 131075 47 6 -0.221800 "H101" "hc" 0 1 131075 48 1 0.061800 "H102" "hc" 0 1 131075 49 1 0.061800 "H103" "hc" 0 1 131075 50 1 0.061800 "C11" "c3" 0 1 131075 51 6 -0.045300 "H111" "hc" 0 1 131075 52 1 0.038800 "H112" "hc" 0 1 131075 53 1 0.038800 "C12" "c3" 0 1 131075 54 6 -0.066500 "H121" "hc" 0 1 131075 55 1 0.028200 "H122" "hc" 0 1 131075 56 1 0.028200

"C13" "c3" 0 1 131075 57 6 -0.058200 "H131" "hc" 0 1 131075 58 1 0.024400 "H132" "hc" 0 1 131075 59 1 0.024400 "C14" "c3" 0 1 131075 60 6 0.152700 "C16" "c3" 0 1 131075 61 6 -0.206200 "H161" "hc" 0 1 131075 62 1 0.050200 "H162" "hc" 0 1 131075 63 1 0.050200 "H163" "hc" 0 1 131075 64 1 0.050200 "C15" "c3" 0 1 131075 65 6 -0.100000 "H151" "hc" 0 1 131075 66 1 0.024300 "H152" "hc" 0 1 131075 67 1 0.024300 "H153" "hc" 0 1 131075 68 1 0.024300 !entry.CCS.unit.atomspertinfo table str pname str ptype int ptypex int pelmnt dbl pchg "N" "N" 0 -1 0.0 "H" "H" 0 -1 0.0 "CA" "CT" 0 -1 0.0 "HA" "H1" 0 -1 0.0 "C" "C" 0 -1 0.0 "O" "O" 0 -1 0.0 "CB" "CT" 0 -1 0.0 "HB2" "H1" 0 -1 0.0 "HB3" "H1" 0 -1 0.0 "SG" "S" 0 -1 0.0 "CD" "c3" 0 -1 0.0 "CE" "c3" 0 -1 0.0 "HE1" "hc" 0 -1 0.0 "HE2" "hc" 0 -1 0.0 "HE3" "hc" 0 -1 0.0 "OE" "oh" 0 -1 0.0 "HOE" "ho" 0 -1 0.0 "C1" "c2" 0 -1 0.0 "H1" "ha" 0 -1 0.0 "C2" "ce" 0 -1 0.0 "H2" "ha" 0 -1 0.0 "C3" "ce" 0 -1 0.0 "H3" "ha" 0 -1 0.0 "C4" "cf" 0 -1 0.0 "C5" "c3" 0 -1 0.0 "H51" "hc" 0 -1 0.0 "H52" "hc" 0 -1 0.0 "H53" "hc" 0 -1 0.0 "C6" "cf" 0 -1 0.0 "H6" "ha" 0 -1 0.0 "C7" "ce" 0 -1 0.0 "H7" "ha" 0 -1 0.0 "C8" "ce" 0 -1 0.0 "C9" "c2" 0 -1 0.0 "C10" "c3" 0 -1 0.0 "H101" "hc" 0 -1 0.0 "H102" "hc" 0 -1 0.0 "H103" "hc" 0 -1 0.0 "C11" "c3" 0 -1 0.0 "H111" "hc" 0 -1 0.0 "H112" "hc" 0 -1 0.0 "C12" "c3" 0 -1 0.0

"H121" "hc" 0 -1 0.0 "H122" "hc" 0 -1 0.0 "C13" "c3" 0 -1 0.0 "H131" "hc" 0 -1 0.0 "H132" "hc" 0 -1 0.0 "C14" "c3" 0 -1 0.0 "C16" "c3" 0 -1 0.0 "H161" "hc" 0 -1 0.0 "H162" "hc" 0 -1 0.0 "H163" "hc" 0 -1 0.0 "C15" "c3" 0 -1 0.0 "H151" "hc" 0 -1 0.0 "H152" "hc" 0 -1 0.0 "H153" "hc" 0 -1 0.0 !entry.CCS.unit.boundbox array dbl -1.000000 0.0 0.0 0.0 0.0 !entry.CCS.unit.childsequence single int 2 !entry.CCS.unit.connect array int 1 5 !entry.CCS.unit.connectivity table int atom1x int atom2x int flags 1 2 1 1 3 1 3 4 1 3 5 1 3 7 1 5 6 1 7 8 1 7 9 1 7 10 1 10 11 1 11 12 1 11 16 1 11 18 1 12 13 1 12 14 1 12 15 1 16 17 1 18 19 1 18 20 1 20 21 1 20 22 1 22 23 1 22 24 1 24 25 1 24 29 1 25 26 1 25 27 1 25 28 1 29 30 1 29 31 1

31 32 1 31 33 1 33 34 1 33 48 1 34 35 1 34 39 1 35 36 1 35 37 1 35 38 1 39 40 1 39 41 1 39 42 1 42 43 1 42 44 1 42 45 1 45 46 1 45 47 1 45 48 1 48 49 1 48 53 1 49 50 1 49 51 1 49 52 1 53 54 1 53 55 1 53 56 1 !entry.CCS.unit.hierarchy table str abovetype int abovex str belowtype int belowx "U" 0 "R" 1 "R" 1 "A" 1 "R" 1 "A" 2 "R" 1 "A" 3 "R" 1 "A" 4 "R" 1 "A" 5 "R" 1 "A" 6 "R" 1 "A" 7 "R" 1 "A" 8 "R" 1 "A" 9 "R" 1 "A" 10 "R" 1 "A" 11 "R" 1 "A" 12 "R" 1 "A" 13 "R" 1 "A" 14 "R" 1 "A" 15 "R" 1 "A" 16 "R" 1 "A" 17 "R" 1 "A" 18 "R" 1 "A" 19 "R" 1 "A" 20 "R" 1 "A" 21 "R" 1 "A" 22 "R" 1 "A" 23 "R" 1 "A" 24 "R" 1 "A" 25 "R" 1 "A" 26 "R" 1 "A" 27

"R" 1 "A" 28 "R" 1 "A" 29 "R" 1 "A" 30 "R" 1 "A" 31 "R" 1 "A" 32 "R" 1 "A" 33 "R" 1 "A" 34 "R" 1 "A" 35 "R" 1 "A" 36 "R" 1 "A" 37 "R" 1 "A" 38 "R" 1 "A" 39 "R" 1 "A" 40 "R" 1 "A" 41 "R" 1 "A" 42 "R" 1 "A" 43 "R" 1 "A" 44 "R" 1 "A" 45 "R" 1 "A" 46 "R" 1 "A" 47 "R" 1 "A" 48 "R" 1 "A" 49 "R" 1 "A" 50 "R" 1 "A" 51 "R" 1 "A" 52 "R" 1 "A" 53 "R" 1 "A" 54 "R" 1 "A" 55 "R" 1 "A" 56 !entry.CCS.unit.name single str "CCS" !entry.CCS.unit.positions table dbl x dbl y dbl z 4.256594 3.738532 -0.225958 4.996083 3.342082 -0.760757 4.440020 5.099789 0.263016 4.540068 5.084973 1.336261 3.228430 5.997452 -0.001959 2.931742 6.853149 0.785288 5.727189 5.634298 -0.399061 5.662562 5.447217 -1.464837 6.571359 5.063826 -0.026014 6.105843 7.407026 -0.308047 6.904349 7.710568 1.335168 6.164619 7.038184 2.492121 5.105047 7.260207 2.462108 6.298622 5.963562 2.481124 6.578855 7.415935 3.421066 6.843707 9.101713 1.480211 5.935643 9.368858 1.561164 8.372963 7.356354 1.397196 8.800607 7.566203 2.365351 9.143224 6.853616 0.444867 8.723751 6.671881 -0.524772 10.560689 6.552313 0.659888 10.918306 6.762006 1.654988 11.437851 6.059854 -0.219297

11.109548 5.719253 -1.652228 10.077452 5.917247 -1.901179 11.728685 6.296628 -2.332452 11.305544 4.668565 -1.850239 12.823179 5.837476 0.235719 13.017576 6.098741 1.263354 13.827473 5.359187 -0.491519 13.657996 5.117911 -1.526333 15.219743 5.167811 0.009502 15.506322 4.301358 0.982481 14.504911 3.367336 1.623513 13.570165 3.310584 1.087539 14.926771 2.366187 1.682509 14.291263 3.672961 2.645615 16.902307 4.123004 1.543409 17.292241 3.165428 1.198302 16.835440 4.037892 2.625007 17.863071 5.240635 1.159531 18.885197 4.944683 1.376480 17.661357 6.122348 1.760605 17.703554 5.558336 -0.320369 17.924242 4.658850 -0.891305 18.425121 6.308657 -0.633305 16.286376 6.043785 -0.677507 16.146600 5.971555 -2.210454 16.978438 6.487994 -2.680277 16.154550 4.942494 -2.560509 15.235997 6.442718 -2.562395 16.103584 7.512524 -0.248468 16.823237 8.148731 -0.757299 15.109686 7.870752 -0.495577 16.239284 7.639753 0.819452 !entry.CCS.unit.residueconnect table int c1x int c2x int c3x int c4x int c5x int c6x 0 53 0 0 0 0 !entry.CCS.unit.residues table str name int seq int childseq int startatomx str restype int imagingx "CCS" 1 69 1 "?" 0 !entry.CCS.unit.residuesPdbSequenceNumber array int 0 !entry.CCS.unit.solventcap array dbl -1.000000 0.0 0.0 0.0 0.0 !entry.CCS.unit.velocities table dbl x dbl y dbl z 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Appendix M. CCS Parameter File for Implementation in AMBER

CCS.frcmod
MASS

BOND
S-c3  227.0   1.810       S-CT parameters from parm99

ANGLE
S-c3-c3    61.100     112.69    ss-c3-c3 from GAFF
S-c3-c2     0.00      114.519   MP2/6-311+G(d,p)
S-CT-HC    42.500     108.76    hc-c3-ss from GAFF
CT-S-c3   110.50      104.335   MP2/6-311+G(d,p) c3-ss-c3 from GAFF
S-c3-oh    78.500     105.895   MP2/6-311+G(d,p)
oh-c3-c3   30.000     105.158   MP2/6-311+G(d,p)
c3-c3-ce   63.700     110.960   same as c2-c3-c3 -- from A13.frcmod
c3-cf-ce   65.700     117.400   same as c3-c2-ce -- from A13.frcmod

DIHE
X -c3-S -X    3    1.000         0.000           3.000
c3-c3-ce-c2   1    0.000         0.000           2.000
c3-c3-ce-ce   1    0.000         0.000           2.000
cf-cf-c3-hc   1    0.000         0.000           2.000
hc-c3-cf-ce   1    0.000         0.000           2.000

IMPROPER
c2-c3-c2-ha         1.1          180.0         2.0
c2-ha-c2-ha         1.1          180.0         2.0
c3-c3-c2-ce         1.1          180.0         2.0
c2-c3-ce-ce         1.1          180.0         2.0
ce-cf-ce-ha         1.1          180.0         2.0
ce-cf-cf-ha         1.1          180.0         2.0
c3-ce-cf-cf         1.1          180.0         2.0
c -ce-cf-ha         1.1          180.0         2.0
c3-cf-c -o         10.5          180.0         2.0

NONBON
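A parameter file of this kind is organized as a title line followed by keyword-delimited sections. As a rough illustration of that layout, the Python sketch below groups the entries of a frcmod-style text under their section keywords. The function name `split_frcmod` and the sample text are hypothetical; the sketch assumes each keyword appears alone on its own line and makes no attempt to reproduce the fixed-column parsing that AMBER itself performs.

```python
from typing import Dict, List, Optional

# Section keywords as they are printed in the listings in these appendices.
SECTION_NAMES = ("MASS", "BOND", "ANGLE", "DIHE", "IMPROPER", "NONBON")


def split_frcmod(text: str) -> Dict[str, List[str]]:
    """Group the non-blank entry lines of a frcmod-style text by section keyword."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines()[1:]:  # the first line is a free-form title
        stripped = line.strip()
        if stripped in SECTION_NAMES:
            current = stripped
            sections[current] = []
        elif stripped and current is not None:
            sections[current].append(stripped)
    return sections


# Hypothetical excerpt assembled from a few entries of the listing above.
sample = """\
illustrative excerpt, not a complete parameter file
MASS

BOND
S-c3 227.0 1.810 S-CT parameters from parm99

ANGLE
S-c3-c3 61.100 112.69 ss-c3-c3 from GAFF
c3-c3-ce 63.700 110.960 same as c2-c3-c3

NONBON
"""

sections = split_frcmod(sample)
for name, entries in sections.items():
    print(f"{name}: {len(entries)} entries")
```

Keeping each entry as an unsplit string avoids misreading atom-type fields such as "X -c3-S -X" that contain embedded spaces.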

Appendix N. EC 2.1.1 Members That Do Not Use SAM as a Methyl Donor

2.1.1.3 = thetin-homocysteine S-methyltransferase = dimethylsulfonioacetate + L-homocysteine = S-methylthioglycolate + L-methionine

2.1.1.5 = betaine-homocysteine S-methyltransferase = trimethylammonioacetate + L-homocysteine = dimethylglycine + L-methionine

2.1.1.13 = methionine synthase = 5-methyltetrahydrofolate + L-homocysteine = tetrahydrofolate + L-methionine

2.1.1.14 = 5-methyltetrahydropteroyltriglutamate—homocysteine S-methyltransferase = 5-methyltetrahydropteroyltri-L-glutamate + L-homocysteine = tetrahydropteroyltri-L-glutamate + L-methionine

2.1.1.19 = trimethylsulfonium—tetrahydrofolate N-methyltransferase = trimethylsulfonium + tetrahydrofolate = dimethylsulfide + 5-methyltetrahydrofolate

2.1.1.21 = methylamine—glutamate N-methyltransferase = methylamine + L-glutamate = NH3 + N-methyl-L-glutamate

2.1.1.45 = thymidylate synthase = 5,10-methylenetetrahydrofolate + dUMP = dihydrofolate + dTMP

2.1.1.54 = deoxycytidylate C-methyltransferase = 5,10-methylenetetrahydrofolate + dCMP = dihydrofolate + deoxy-5-methylcytidylate

2.1.1.63 = methylated-DNA—[protein]-cysteine S-methyltransferase = DNA (containing 6-O-methylguanine) + protein L-cysteine = DNA (without 6-O-methylguanine) + protein S-methyl-L-cysteine

2.1.1.74 = methylenetetrahydrofolate-tRNA-(uracil54-C5)-methyltransferase (FADH2-oxidizing) = 5,10-methylenetetrahydrofolate + uracil54 in tRNA + FADH2 = tetrahydrofolate + 5-methyluracil in tRNA + FAD

2.1.1.86 = tetrahydromethanopterin S-methyltransferase = 5-methyl-5,6,7,8-tetrahydromethanopterin + 2-mercaptoethanesulfonate = 5,6,7,8-tetrahydromethanopterin + 2-(methylthio)ethanesulfonate

2.1.1.90 = methanol—corrinoid protein Co-methyltransferase = methanol + a [Co(I) methanol-specific corrinoid protein] = a [methyl-Co(III) methanol-specific corrinoid protein] + H2O

2.1.1.148 = thymidylate synthase (FAD) = 5,10-methylenetetrahydrofolate + dUMP + NADPH + H+ = dTMP + tetrahydrofolate + NADP+

2.1.1.245 = 5-methyltetrahydrosarcinapterin:corrinoid/iron-sulfur protein Co-methyltransferase = a [methyl-Co(III) corrinoid Fe-S protein] + tetrahydrosarcinapterin = a [Co(I) corrinoid Fe-S protein] + 5-methyltetrahydrosarcinapterin

2.1.1.246 = [methyl-Co(III) methanol-specific corrinoid protein]:coenzyme M methyltransferase = a [methyl-Co(III) methanol-specific corrinoid protein] + coenzyme M = methyl-CoM + a [Co(I) methanol-specific corrinoid protein]

2.1.1.247 = [methyl-Co(III) methylamine-specific corrinoid protein]:coenzyme M methyltransferase = a [methyl-Co(III) methylamine-specific corrinoid protein] + coenzyme M = methyl-CoM + a [Co(I) methylamine-specific corrinoid protein]

2.1.1.248 = methylamine—corrinoid protein Co-methyltransferase = methylamine + a [Co(I) methylamine-specific corrinoid protein] = a [methyl-Co(III) methylamine-specific corrinoid protein] + ammonia

2.1.1.249 = dimethylamine—corrinoid protein Co-methyltransferase = dimethylamine + a [Co(I) dimethylamine-specific corrinoid protein] = a [methyl-Co(III) dimethylamine-specific corrinoid protein] + methylamine

2.1.1.250 = trimethylamine—corrinoid protein Co-methyltransferase = trimethylamine + a [Co(I) trimethylamine-specific corrinoid protein] = a [methyl-Co(III) trimethylamine-specific corrinoid protein] + dimethylamine

2.1.1.251 = methylated-thiol—coenzyme M methyltransferase = methanethiol + coenzyme M = methyl-CoM + hydrogen sulfide (overall reaction) (1a) methanethiol + a [Co(I) methylated-thiol-specific corrinoid protein] = a [methyl-Co(III) methylated-thiol-specific corrinoid protein] + hydrogen sulfide (1b) a [methyl-Co(III) methylated-thiol-specific corrinoid protein] + coenzyme M = methyl-CoM + a [Co(I) methylated-thiol-specific corrinoid protein]

2.1.1.252 = tetramethylammonium—corrinoid protein Co-methyltransferase = tetramethylammonium + a [Co(I) tetramethylammonium-specific corrinoid protein] = a [methyl-Co(III) tetramethylammonium-specific corrinoid protein] + trimethylamine

2.1.1.253 = [methyl-Co(III) tetramethylammonium-specific corrinoid protein]:coenzyme M methyltransferase = a [methyl-Co(III) tetramethylammonium-specific corrinoid protein] + coenzyme M = methyl-CoM + a [Co(I) tetramethylammonium-specific corrinoid protein]

2.1.1.258 = 5-methyltetrahydrofolate:corrinoid/iron-sulfur protein Co-methyltransferase = a [methyl-Co(III) corrinoid Fe-S protein] + tetrahydrofolate = a [Co(I) corrinoid Fe-S protein] + 5-methyltetrahydrofolate

2.1.1.269 = dimethylsulfoniopropionate demethylase = S,S-dimethyl-β-propiothetin + tetrahydrofolate = 3-(methylthio)propanoate + 5-methyltetrahydrofolate

Appendix O. Partial Atomic Charges for All SAM Conformations Used in Multiconfiguration Fits

Table O.1. Partial atomic charges for SAM in anti-conformation.

Atom         2EGV     2YY8     3DCM     1MSK     3OPE  Combined
N         -0.2266  -0.3136  -0.2754  -0.2859  -0.2319  -0.4441
H1,2,3     0.2600   0.2977   0.2880   0.2918   0.2640   0.3186
CA         0.1624   0.0832  -0.0143   0.0945   0.1357   0.2025
HA         0.0187   0.0910   0.1374   0.0469   0.0257   0.0321
C          0.6463   0.6766   0.6880   0.7218   0.6708   0.6877
O, OXT    -0.6642  -0.6552  -0.6704  -0.6805  -0.6744  -0.6668
CB        -0.1296  -0.0806  -0.0544  -0.0124  -0.1241  -0.0773
2,3HB      0.1034   0.0752   0.0376   0.0380   0.1076   0.0685
CG        -0.0343  -0.0547  -0.0378  -0.0375  -0.0193  -0.1280
2,3HG      0.0738   0.0951   0.1254   0.1159   0.0581   0.1267
SD         0.1765   0.0926   0.0998   0.0546   0.1626   0.1316
CE        -0.0858  -0.0300  -0.1351  -0.0175  -0.0894  -0.0938
1,2,3HE    0.1245   0.1083   0.1232   0.1150   0.1239   0.1245
C5’       -0.0706  -0.2282  -0.0051  -0.1712  -0.0644  -0.0775
H5’,’’     0.1321   0.2129   0.1458   0.1688   0.1318   0.1407
C4’        0.0007   0.0153   0.1023   0.0998   0.0268   0.0960
H4’        0.1981   0.2233   0.1827   0.1741   0.1752   0.1717
O4’       -0.3217  -0.4058  -0.3947  -0.3508  -0.2787  -0.3736
C3’        0.2778   0.2649   0.0736   0.1051   0.2085   0.1310
H3’        0.0017   0.0475   0.0749   0.0578   0.0353   0.0459
O3’       -0.6402  -0.5869  -0.6817  -0.6845  -0.6402  -0.6347
HO3’       0.4629   0.3972   0.4959   0.4820   0.4689   0.4671
C2’        0.0021   0.0248   0.0867   0.1767   0.0994   0.0919
H2’        0.1352   0.1612   0.1272   0.1005   0.1238   0.1278
O2’       -0.5859  -0.6266  -0.6258  -0.6546  -0.5910  -0.5422
HO2’       0.4388   0.4483   0.4543   0.4581   0.4325   0.3874
C1’       -0.0196   0.0035   0.0536  -0.0174  -0.0619  -0.0246
H1’        0.2032   0.2023   0.2316   0.2229   0.1847   0.2044
N9         0.0147  -0.0318  -0.0598  -0.0551  -0.0445   0.0001
C8         0.0952   0.2193   0.1952   0.0749   0.1441   0.1398
H8         0.1676   0.1198   0.1286   0.1967   0.1519   0.1587
N7        -0.5573  -0.5917  -0.5799  -0.5551  -0.5669  -0.5740
C5         0.0825   0.0574   0.0468   0.0715   0.0592   0.0638
C6         0.7151   0.7114   0.7253   0.6260   0.6640   0.6991
N6        -0.9261  -0.9036  -0.9370  -0.8303  -0.8510  -0.9016
H61,2      0.4242   0.4130   0.4220   0.3922   0.4027   0.4144
N1        -0.7267  -0.7387  -0.7361  -0.7014  -0.7116  -0.7215
C2         0.5662   0.5896   0.5619   0.5410   0.5683   0.5526
H2         0.0706   0.0633   0.0698   0.0751   0.0728   0.0736
N3        -0.6989  -0.7259  -0.6992  -0.7007  -0.7172  -0.6923
C4         0.2949   0.3256   0.3463   0.4052   0.3666   0.3241

Table O.2. Partial atomic charges for SAM in high anti-conformation.

Atom         1RJD     2QE6     3FPJ     2Q6O     3BWC  Combined
N         -0.3278  -0.3755  -0.2818  -0.3532  -0.2015  -0.5219
H1,2,3     0.2986   0.3155   0.2886   0.3066   0.2693   0.3444
CA        -0.0001  -0.0298  -0.0543   0.1266   0.0065   0.0610
HA         0.1175   0.1267   0.1240   0.0455   0.0941   0.0758
C          0.7882   0.8035   0.8003   0.6611   0.6738   0.8069
O, OXT    -0.7190  -0.7208  -0.7167  -0.6619  -0.6727  -0.7014
CB        -0.0592  -0.1061  -0.0753   0.0060  -0.1019  -0.0132
2,3HB      0.0498   0.0754   0.0818   0.0399   0.0865   0.0615
CG         0.1168   0.0092  -0.0383  -0.0873   0.0445  -0.1269
2,3HG      0.0538   0.1199   0.1183   0.1128   0.1033   0.1230
SD         0.2048   0.0646   0.0707   0.0798   0.1049   0.1699
CE        -0.0552  -0.0503  -0.1335  -0.0591  -0.2283  -0.1827
1,2,3HE    0.1095   0.1244   0.1405   0.1069   0.1280   0.1435
C5’       -0.1656  -0.1610  -0.1429  -0.0786   0.0569  -0.1238
H5’,’’     0.1272   0.1579   0.1524   0.1545   0.0804   0.1431
C4’        0.0499   0.2434   0.0752   0.0559   0.2369   0.1442
H4’        0.2145   0.1312   0.1205   0.1685   0.0845   0.1254
O4’       -0.4301  -0.5371  -0.2929  -0.3206  -0.4785  -0.3963
C3’        0.0830   0.0580   0.2687   0.1115   0.1014   0.1037
H3’        0.1095   0.0989   0.0245   0.0934   0.0750   0.0761
O3’       -0.6799  -0.6569  -0.6722  -0.6430  -0.5850  -0.5751
HO3’       0.5089   0.4711   0.4828   0.4737   0.4123   0.4102
C2’        0.1238   0.0883   0.1419   0.1604   0.0875   0.1130
H2’        0.0802   0.0791   0.0429   0.0130   0.0539   0.0610
O2’       -0.6527  -0.6182  -0.6582  -0.6368  -0.5809  -0.5483
HO2’       0.4650   0.4542   0.4688   0.4576   0.4438   0.4130
C1’        0.1482   0.1954   0.0419   0.0247   0.1693   0.0864
H1’        0.1995   0.1875   0.1800   0.2004   0.2009   0.1813
N9        -0.0454  -0.0069   0.0016  -0.0226  -0.0229  -0.0326
C8         0.0898   0.1023   0.0783   0.1668   0.1078   0.1219
H8         0.1561   0.1518   0.1591   0.1417   0.1472   0.1373
N7        -0.5231  -0.5323  -0.5451  -0.5742  -0.5334  -0.5354
C5         0.0012   0.0278   0.0421   0.0312   0.0063   0.0133
C6         0.6990   0.6722   0.6786   0.7044   0.6835   0.6777
N6        -0.8835  -0.8592  -0.8714  -0.8842  -0.8645  -0.8644
H61,2      0.4089   0.4009   0.4051   0.4073   0.3990   0.4017
N1        -0.7437  -0.7347  -0.7300  -0.7374  -0.7328  -0.7240
C2         0.6166   0.6054   0.5987   0.5882   0.5882   0.5755
H2         0.0662   0.0639   0.0703   0.0702   0.0654   0.0717
N3        -0.7860  -0.7588  -0.7423  -0.7343  -0.7447  -0.7388
C4         0.4479   0.4060   0.3982   0.4025   0.4362   0.4386

Table O.3. Partial atomic charges for SAM in syn-conformation.

Atom         1XVA     3GX6     2CDQ     2C2B     3E5C  Combined
N         -0.2567  -0.2727  -0.3459  -0.3005  -0.2266  -0.5191
H1,2,3     0.2831   0.2873   0.3056   0.2809   0.2750   0.3329
CA         0.0400  -0.0302   0.1074   0.1444   0.0076   0.2834
HA         0.0862   0.1348   0.0975   0.0382   0.1027   0.0377
C          0.6997   0.7080   0.6320   0.6583   0.6756   0.6252
O, OXT    -0.6774  -0.6838  -0.6504  -0.6700  -0.6713  -0.6520
CB        -0.0897  -0.0677  -0.1253  -0.0288  -0.0792  -0.0987
2,3HB      0.0688   0.0762   0.0730   0.0681   0.0679   0.0721
CG        -0.0362  -0.1424   0.0589  -0.0386  -0.0824  -0.2988
2,3HG      0.1076   0.1461   0.0599   0.0794   0.1424   0.1743
SD         0.1193   0.1166   0.1308   0.1760   0.1926   0.2824
CE        -0.1883  -0.1616  -0.0161  -0.1344  -0.1033  -0.2586
1,2,3HE    0.1371   0.1484   0.0790   0.1325   0.1262   0.1614
C5’        0.0318  -0.0380   0.0071  -0.1233  -0.0900  -0.1456
H5’,’’     0.1009   0.1313   0.0959   0.1337   0.0644   0.1439
C4’        0.0286   0.0256   0.0599   0.0947   0.0233   0.0259
H4’        0.1721   0.1558   0.1078   0.1859   0.1959   0.1590
O4’       -0.4009  -0.3503  -0.3476  -0.4133  -0.3338  -0.3485
C3’        0.1527   0.2455   0.0504   0.2098   0.1713   0.1051
H3’        0.0445   0.0113   0.0996   0.0128   0.0766   0.0610
O3’       -0.6256  -0.6965  -0.6349  -0.7347  -0.6669  -0.5780
HO3’       0.4399   0.4713   0.4546   0.4986   0.4933   0.4217
C2’        0.1955   0.1878   0.3284   0.1934   0.2012   0.3038
H2’        0.1547   0.0816   0.0697   0.0822   0.0630   0.0407
O2’       -0.6533  -0.6471  -0.6455  -0.6598  -0.6907  -0.5866
HO2’       0.4472   0.4474   0.4344   0.4566   0.4707   0.3879
C1’        0.0283   0.0328  -0.0177   0.0496  -0.0017  -0.0168
H1’        0.1693   0.1642   0.1520   0.1864   0.1882   0.1558
N9        -0.0072   0.0095   0.0133  -0.0333  -0.0547  -0.0119
C8         0.1316   0.1359   0.1296   0.1467   0.1837   0.1226
H8         0.1658   0.1544   0.1664   0.1637   0.1575   0.1744
N7        -0.5382  -0.5589  -0.5395  -0.5475  -0.5645  -0.5389
C5         0.0477   0.1156   0.0546   0.1235   0.1488   0.0742
C6         0.6572   0.6845   0.7067   0.6034   0.6218   0.6520
N6        -0.8359  -0.9055  -0.9190  -0.8108  -0.8343  -0.8602
H61,2      0.3945   0.4207   0.4218   0.3906   0.3998   0.4062
N1        -0.6951  -0.6686  -0.6880  -0.6615  -0.6524  -0.6747
C2         0.4961   0.4357   0.4077   0.4334   0.3871   0.4524
H2         0.0616   0.0889   0.0907   0.0648   0.0766   0.0681
N3        -0.6852  -0.5549  -0.5097  -0.5262  -0.4879  -0.5841
C4         0.3931   0.1991   0.2755   0.2465   0.2209   0.3153
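One way to read these tables is to compare each combined-fit charge with the arithmetic mean of the five single-structure charges in the same row. The Python sketch below does this for three rows of Table O.3. It is only a reading aid: the mean is used here as a reference point and is not the multiconfiguration fitting procedure itself, and the names `rows` and `compare_to_mean` are illustrative.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Three rows transcribed from Table O.3 (syn): five single-structure charges
# (1XVA, 3GX6, 2CDQ, 2C2B, 3E5C) followed by the combined-fit charge.
rows: Dict[str, Tuple[List[float], float]] = {
    "N": ([-0.2567, -0.2727, -0.3459, -0.3005, -0.2266], -0.5191),
    "SD": ([0.1193, 0.1166, 0.1308, 0.1760, 0.1926], 0.2824),
    "C4": ([0.3931, 0.1991, 0.2755, 0.2465, 0.2209], 0.3153),
}


def compare_to_mean(table: Dict[str, Tuple[List[float], float]]) -> Dict[str, Tuple[float, float]]:
    """Return, per atom, (mean of single-structure charges, combined minus mean)."""
    result = {}
    for atom, (single, combined) in table.items():
        average = mean(single)
        result[atom] = (round(average, 4), round(combined - average, 4))
    return result


comparison = compare_to_mean(rows)
for atom, (average, difference) in comparison.items():
    print(f"{atom:3s} mean = {average:8.4f}   combined - mean = {difference:8.4f}")
```

For these rows the combined value differs visibly from the simple mean (by about -0.24 for N, for example), so the combined column should not be read as an average of the single-structure columns.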
