Investigation of Protein/Ligand Interactions Relating Structural Dynamics to Function: Combined Computational and Experimental Approaches
DISSERTATION
Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University
By
Ryan Elliott Pavlovicz
Graduate Program in Biophysics
The Ohio State University
2014
Dissertation Committee:
Chenglong Li, Advisor
Charles E. Bell
Michael E. Paulaitis
Copyright by
Ryan Elliott Pavlovicz
2014
Abstract
The use of computers in chemistry has matured significantly since the
introduction of the modern personal computer, leading to the development of
many tools that may be used to describe phenomena that are difficult or
otherwise impossible to observe experimentally. Computational chemistry is
becoming an integral component of many research programs, often resulting in
the formation of hypotheses that may be tested experimentally. This document details the application of computational tools to study ligand/receptor interactions
in two systems: the nicotinic acetylcholine receptor (nAChR) and the retinoic acid receptor (RAR).
The nAChR study details how a combination of homology modeling, molecular dynamics (MD), blind docking, and free energy analysis may be used to
determine the binding site of a ligand with few clues from experiment to guide the
search. Specifically, the binding site for a class of negative allosteric nAChR
modulators was successfully identified. The computationally predicted binding
site was verified by functional assays performed on receptors that were mutated
at the suspected binding site. Additionally, a comparison of structural data from
homologous proteins and MD simulations of the receptor in complex with an
allosteric modulator led to a proposed mode of allosteric antagonism that
involves inhibition of C loop closure, thereby preventing channel opening.
In the RAR study, both computation and experiment were applied to characterize
the activity of two β-apocarotenoids that have been previously described as
antagonists of all-trans retinoic acid (ATRA), the endogenous RAR agonist. The
activity of RAR ligands is related to how they influence the interaction between
the receptor and coactivator proteins that lead to gene transcription. The results
of isothermal titration calorimetry (ITC) experiments indicate that the
β-apocarotenoids induce an interaction between the receptor and coactivator that
is intermediate in strength between those of the unliganded and ATRA-bound receptors,
suggesting that these compounds would be most accurately characterized as
partial agonists instead of antagonists. One of the partial agonists, β-apo-13-carotenone, exhibits an unexpectedly high affinity for RAR given its chemical differences from known high-affinity binders. Modeling this compound in the RAR binding site led to the hypothesis that a covalent interaction may be occurring between the carotenone and a conserved cysteine residue in the binding pocket.
While not conclusive, NMR and mass spectrometry experiments suggest that this interaction is indeed occurring.
Computational free energy analysis was also performed on the interaction between the ligand-bound receptors and the coactivator. Applying the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method to microsecond MD simulations yielded a very strong correlation between the computational binding energies and the experimental ITC data, providing support that the compounds were correctly modeled in the RAR binding pocket. Converged
binding energy averages that led to the strong correlation with experiment were
contingent upon simulation lengths of ~1 μs, and inclusion of both the calculated
PBSA free energy of solvation and entropic components of binding were found to
strengthen the correlation.
Finally, Chapter 5 includes a study on the parameterization of a new atom type for use in the pair-wise additive AMBER force field. The sulfonium atom type is not included in the current set of parameters since it is relatively uncommon in biology. However, S-adenosylmethionine (SAM), the most common methyl donor in biology, is a notable exception. The development of sulfonium parameters required for the MD simulation of SAM is discussed in detail.
Dedication
This document is dedicated to my family.
Vita
1999 ...... North Royalton High School
2004 ...... B.S. Electrical and Computer Engineering, Ohio State University
2006-2014 ...... Graduate Research Assistant, Department of Pharmacy, Ohio State University
2008-2010 ...... American Foundation for Pharmaceutical Education Pre-Doctoral Fellowship
2010 ...... NSF East Asian and Pacific Summer Institutes for U.S. Graduate Students Research Fellowship (Institute of Biophysics, Beijing, China)
2010 ...... American Chemical Society Division of Medicinal Chemistry Pre-Doctoral Fellowship
2011 ...... Outstanding Student Achievement Award presented by the Ohio State University Biophysics Graduate Committee
2012 ...... Presidential Fellowship (Ohio State University Graduate School)
Publications
1.) Frey EN, Pavlovicz RE, Wegman CJ, Li C, Askwith CC. “Conformational changes in the lower palm domain of ASIC1a contribute to desensitization and RFamide modulation”, PLOS ONE, 8(8): e71733, 2013.
2.) Yi B, Long S, González-Cestari TF, Henderson BJ, Pavlovicz RE, Werbovetz K, Li C, McKay DB. “Discovery of benzamide analogs as negative allosteric modulators of human neuronal nicotinic receptors: pharmacophore modeling, rational design, and structure-activity relationship studies”, Bioorganic & Medicinal Chemistry, 21(15):473-43, 2013.
3.) Liu MJ, Bao S, Gálvez-Peralta M, Pyle CJ, Rudawsky AC, Pavlovicz RE, Killilea DW, Li C, Nebert DW, Wewers MD, Knoell DL. “ZIP8 regulates host defense through zinc-mediated inhibition of NF-κB.” Cell Reports, 3(2):386-400, 2013.
4.) Still P, Yi B, González-Cestari TF, Pan L, Pavlovicz RE, Chai H-B, Ninh T, Li C, Soejarto DD, McKay DB, Kinghorn AD. “Alkaloids from Microcos paniculata with cytotoxic and nicotinic receptor antagonistic activities”, Journal of Natural Products, 76(2):243-439, 2013.
5.) Koval OM, Snyder JS, Wolf RM, Pavlovicz RE, Cardona N, Glynn P, Leymaster ND, Dun W, Wright PJ, Qian L, Mitchell CC, Boyden PA, Binkley PF, Li C, Anderson ME, Mohler PJ, Hund TJ. “CaMKII-based regulation of voltage-gated Na+ channel in cardiac disease”, Circulation, 126(17):2084-94, 2012.
6.) Henderson BJ, González-Cestari TF, Yi B, Pavlovicz RE, Boyd RT, Li C, Bergmeier SC, McKay DB. “Defining the putative inhibitory site for a selective allosteric modulator of human α4β2 neuronal nicotinic receptors”, ACS Chemical Neuroscience, 3(9):682-92, 2012.
7.) Henderson BJ, Carper DJ, González-Cestari TF, Yi B, Mahasenan KV, Pavlovicz RE, Dalefield ML, Coleman RS, Li C, McKay DB. “Structure-activity relationship studies of sulfonylpiperazine analogues as novel negative allosteric modulators of human neuronal nicotinic receptors”, Journal of Medicinal Chemistry, 54(24):8681-92, 2011.
8.) Mahasenan KV, Pavlovicz RE, Henderson BJ, González-Cestari TF, Yi B, McKay DB, Li C. “Discovery of novel α4β2 neuronal nicotinic receptor modulators through structure-based virtual screening”, ACS Medicinal Chemistry Letters, 2(11):855-860, 2011.
9.) Pavlovicz RE, Henderson BJ, Bonnell AB, Boyd RT, McKay DB, Li C. “Identification of a novel negative allosteric site on human α4β2 and α3β4 neuronal nicotinic acetylcholine receptors”, PLOS ONE, 6(9): e24949, 2011.
10.) West MB, Wickham S, Quinalty LM, Pavlovicz RE, Li C, Hanigan M. “Autocatalytic cleavage of human gamma-glutamyl transpeptidase is highly dependent on N-glycosylation at asparagine 95”, Journal of Biological Chemistry, 286(33):28876-88, 2011.
11.) Henderson BJ, Pavlovicz RE, Allen JD, González-Cestari TF, Orac CM, Bonnell AB, Zhu MX, Boyd RT, Li C, Bergmeier SC, McKay DB. “Negative allosteric modulators that target human α4β2 neuronal nicotinic receptors”, Journal of Pharmacology and Experimental Therapeutics, 334(3):761-74, 2010.
12.) Doddapaneni K, Mahler B, Pavlovicz RE, Haushalter A, Yuan C, Wu Z. “Solution structure of RCL, a novel 2’-deoxyribonucleoside 5’-monophosphate N-glycosidase”, Journal of Molecular Biology, 394(3), 423-434, 2009.
13.) Tiwari R, Mahasenan K, Pavlovicz RE, Li C, Tjarks W. “Carborane clusters in computational drug design: a comparative docking evaluation using AutoDock, FlexX, Glide, and Surflex”, Journal of Chemical Information and Modeling, 49(6), 1581-1589, 2009.
14.) Liu Z, Liu S, Xie Z, Pavlovicz RE, Wu J, Chen P, Aimiuwu J, Pang J, Bhasin D, Neviani P, Fuchs JR, Plass C, Li PK, Li C, Huang THM, Wu LC, Rush L, Wang H, Perrotti D, Marcucci G, Chan KK. “Modulation of DNA methylation by a sesquiterpene lactone parthenolide”, Journal of Pharmacology and Experimental Therapeutics, 329(2), 505-514, 2009.
15.) González-Cestari TF, Henderson BJ, Pavlovicz RE, McKay SB, El-Hajj RA, Pulipaka AB, Orac CM, Reed DD, Boyd RT, Zhu MX, Li C, Bergmeier SC, McKay DB. “Effect of novel negative allosteric modulators of neuronal nicotinic receptors on cells expressing native and recombinant nicotinic receptors: implications for drug discovery”, Journal of Pharmacology and Experimental Therapeutics, 328(2), 504-515, 2009.
16.) Liu Z, Xie Z, Jones W, Pavlovicz RE, Liu S, Li PK, Lin J, Fuchs JR, Marcucci G, Li C, Chan KK. “Curcumin is a potent DNA hypomethylation agent”, Bioorganic & Medicinal Chemistry Letters, 19(3), 706-709, 2009.
Fields of Study
Major Field: Biophysics
Table of Contents
Abstract ...... ii
Dedication ...... v
Vita ...... vi
List of Tables ...... xii
List of Figures ...... xv
List of Abbreviations ...... xxi
Chapter 1. Introduction ...... 1
1.1 Computers in Biochemistry ...... 1
1.2 Free Energy Calculations ...... 12
1.3 Dissertation Themes and Organization ...... 17
Chapter 2. Identification of a Negative Allosteric Binding Site on the Nicotinic Acetylcholine Receptor ...... 21
2.1 Introduction ...... 21
2.2 nAChR Background ...... 22
2.3 Homology Modeling ...... 29
2.4 Molecular Dynamics ...... 36
2.5 Blind Docking ...... 44
2.6 Focused Docking and Induced Fit Molecular Dynamics ...... 50
2.7 Binding Site Validation: Mutagenesis and Functional Assays ...... 53
2.8 Free Energy Analysis ...... 59
2.9 Mechanism of Allosteric Antagonism ...... 63
2.10 Conclusions ...... 70
Chapter 3. Experimental Investigation of Retinoic Acid Receptor Antagonism ...... 71
3.1 Introduction ...... 71
3.2 Nuclear Receptor Background ...... 72
3.3 NR LBD expression and purification ...... 103
3.4 Circular Dichroism Experiments ...... 116
3.5 Dimerization LC Experiments ...... 120
3.6 ITC experiments ...... 122
3.7 Origin of β-apo-13-carotenone binding affinity ...... 141
3.8 Conclusions ...... 164
Chapter 4. Computational Investigation of Retinoic Acid Receptor Antagonism ...... 166
4.1 Introduction ...... 166
4.2 Ligand Parameterization ...... 166
4.3 Ligand Docking to RARα LBD ...... 181
4.4 Molecular Dynamics Simulations of RARα Complexes ...... 192
4.5 Free Energy Analysis ...... 212
4.6 Long Timescale Simulations of Apo RARα LBD ...... 233
4.7 Conclusions ...... 239
Chapter 5. Force Field Parameterization of S-Adenosylmethionine ...... 240
5.1 Introduction ...... 240
5.2 SAM Background ...... 240
5.3 Survey of Existing SAM and SAH Structures ...... 249
5.4 Sulfonium Force Field Parameterization ...... 256
5.5 Derivation of Partial Atomic Charges for SAM ...... 290
References ...... 298
Appendix A. Nuclear Receptor Sequence Alignment ...... 316
Appendix B. Structure of Synthetic RAR Ligands Available from Tocris ...... 319
Appendix C. His-hRARα LBD Primary Sequence and Plasmid Sequence ...... 321
Appendix D. His-hRXRα LBD Primary Sequence and Plasmid Sequence ...... 323
Appendix E. FPLC Standards ...... 325
Appendix F. Detailed ITC Data ...... 327
Appendix G. AMBER Input Files for All-trans Retinoic Acid ...... 330
Appendix H. AMBER Input Files for TTNPB ...... 332
Appendix I. AMBER Input Files for β-apo-13-carotenone ...... 334
Appendix J. AMBER Input Files for β-apo-14’-carotenoic Acid ...... 336
Appendix K. Partial Atomic Charges for β-apo-13-carotenone Covalently Linked to Cysteine...... 338
Appendix L. CCS Library File for Implementation in AMBER ...... 340
Appendix M. CCS Parameter File for Implementation in AMBER ...... 347
Appendix N. EC 2.1.1 Members That Do Not Use SAM as a Methyl Donor ..... 348
Appendix O. Partial Atomic Charges for All SAM Conformations Used in Multiconfiguration Fits ...... 350
List of Tables
Table 2.1. Sequence identity between template and model sequences...... 31
Table 2.2. Sequence similarity between template and model sequences...... 32
Table 2.3. Number of conformational clusters from MD simulations of nAChR models in three different binding states...... 38
Table 2.4. Average RMSDs for backbone atoms of ECD models from MD simulations in three states...... 41
Table 2.5. Measurements of agonist binding stability in MD simulations of epibatidine-bound nAChRs...... 44
Table 2.6. Blind docking results for agonists to multiple nAChR conformations...... 49
Table 2.7. Effects of agonists and antagonists on wild type and mutant hα4β2 nAChRs...... 58
Table 2.8. MM-PBSA binding energy calculations for epibatidine- and KAB-18-bound receptors...... 61
Table 2.9. Survey of C loop closure for AChBP X-ray structures...... 65
Table 2.10. General ranges for C loop "openness" upon binding ligands of different pharmacological function...... 65
Table 2.11. Measurements of C loop closure for MD simulations of epibatidine-bound nAChRs. 66
Table 3.1. List of common NRs and their known endogenous ligands...... 74
Table 3.2. Apocarotenoid binding affinity for human RAR subtypes...... 101
Table 3.3. Percentage of folded His-hRARα LBD with added ethanol...... 120
Table 3.4. Summary of ITC experiments with hRARα LBD...... 137
Table 3.5. Summary of ITC experiments on hRARα C235A LBD...... 147
Table 3.6. Existing hRARα LBD crystal structures...... 162
Table 3.7. Conditions tested for β-apo-13-carotenone and RARα crystallization...... 163
Table 4.1. Comparison of calculated angles describing the tetrasubstituted carbon of compound 1...... 176
Table 4.2. Optimized angles and force constants for the tetrahedral linkage of β-apo-13-carotenone to cysteine...... 179
Table 4.3. Comparison of angle measurements from MD simulations using GAFF or MP2/6-311+G(d,p)-optimized parameters...... 180
Table 4.4. ATRA docking to RARα LBD...... 184
Table 4.5. Cluster analysis of β-apo-13-carotenone docked to RARα LBD...... 187
Table 4.6. Cluster analysis of β-apo-14'-carotenoic acid docked to RARα LBD...... 190
Table 4.7. List of RARα LBD simulations performed...... 192
Table 4.8. All-atom RMSD of ligands with respect to average structure...... 206
Table 4.9. Entropic and MM-PBSA deviations from 50 ps data for reduced data sets...... 219
Table 4.10. Summary of ITC Data of SRC-1 NR2 peptide binding ligand-bound RARα...... 221
Table 4.11. MM-PBSA binding energy components after 1.5 μs of MD simulation...... 222
Table 4.12. Detailed MM-PBSA components for receptor peptide interaction...... 223
Table 5.1. List of crystal structures binding SAM/SAH with a syn-conformation...... 255
Table 5.2. Experimental, force field, and ab initio sulfonium measurements...... 263
Table 5.3. Partial atomic charges for 2,3-butanedione...... 272
Table 5.4. Force constants derived for trimethylsulfonium...... 278
Table 5.5. RMSD between MM and QM frequencies for C3H9S+ with reparameterized V3...... 279
Table 5.6. H-C-S-C profiles...... 281
Table 5.7. Force constants derived for ethyldimethylsulfonium...... 284
Table 5.8. RMSD between MM and QM frequencies for C4H11S+ with reparameterized V3...... 287
Table 5.9. C-C-S-C dihedral scan results...... 287
Table 5.10. Final sulfonium parameters...... 290
Table 5.11. PDB structure used for derivation of partial atomic charges representing the anti-conformation...... 294
Table 5.12. PDB structures used for derivation of partial atomic charges representing the high anti-conformation...... 294
Table 5.13. PDB structures used for derivation of partial atomic charges representing the syn-conformation...... 294
Table 5.14. Comparison of partial atomic charges for SAM in multiple conformations...... 296
Table E.1. Contents of protein standard 1...... 325
Table E.2. Contents of protein standard 2...... 326
Table F.1. ATRA-bound hRARα LBD ITC results...... 327
Table F.2. TTNPB-bound hRARα LBD ITC results...... 327
Table F.3. β-apo-13-carotenone-bound hRARα LBD ITC results...... 327
Table F.4. β-apo-14'-carotenoic acid-bound hRARα LBD results...... 327
Table F.5. BMS 195614-bound hRARα LBD ITC results...... 328
Table F.6. β-apo-13-lycopenone-bound hRARα LBD ITC results...... 328
Table F.7. Untreated (apo) hRARα LBD ITC results...... 328
Table F.8. ATRA-bound C235A hRARα LBD ITC results...... 328
Table F.9. β-apo-13-carotenone-bound C235A hRARα LBD ITC results...... 329
Table K.1. Partial atomic charges for CCS...... 338
Table O.1. Partial atomic charges for SAM in anti-conformation...... 350
Table O.2. Partial atomic charges for SAM in high anti-conformation...... 350
Table O.3. Partial atomic charges for SAM in syn-conformation...... 352
List of Figures
Figure 1.1. Thermodynamic cycle implemented in the MM-PBSA protocol...... 15
Figure 1.2. Structural representation of the two receptors studied in this dissertation...... 17
Figure 2.1. Schematic of neuronal nAChR structure...... 23
Figure 2.2. Structure of acetylcholine binding protein (AChBP)...... 26
Figure 2.3. Numbered sequence alignment of AChBP and nAChR sequences used for modeling...... 31
Figure 2.4. Histograms of model energies per modeling iteration...... 35
Figure 2.5. RMSD plots for MD simulations of nAChR models...... 39
Figure 2.6. Average all-atom RMSDs for hα4β2 nAChR ECD model in three different binding states...... 42
Figure 2.7. Average all-atom RMSDs for hα3β4 nAChR ECD model in three different binding states...... 43
Figure 2.8. Compounds used in blind docking experiments...... 45
Figure 2.9. Blind docking modes compared to X-ray structures...... 49
Figure 2.10. Stability of KAB-18 at its proposed binding site...... 52
Figure 2.11. Detailed docking modes for negative allosteric nAChR modulators...... 53
Figure 2.12. Dose-response curves for epibatidine and KAB-18 on wild type and mutant hα4β2 nAChRs...... 57
Figure 2.13. Convergence of MM-PBSA calculations...... 62
Figure 2.14. C loop closure of AChBP bound to various ligands...... 64
Figure 2.15. Comparison of experimental DMXBA binding to computationally predicted KAB-18/epibatidine binding...... 69
Figure 3.1. Domain organization of N-CoR1 and SMRT corepressor proteins...... 76
Figure 3.2. Domain organization of p160 family of coactivators...... 78
Figure 3.3. Domain organization of typical nuclear receptor...... 83
Figure 3.4. DNA-binding domain of estrogen receptor in complex with hormone response element...... 85
Figure 3.5. Diagram of apo hRXRα LBD...... 86
Figure 3.6. Crystal structure of apo hPPARγ...... 88
Figure 3.7. Crystal structure of agonist-bound hPPARγ in complex with SRC-1 NR-box 2 peptide...... 90
Figure 3.8. Antagonist-induced H12 conformation...... 92
Figure 3.9. Corepressor interactions with inverse agonist-bound NR LBDs...... 96
Figure 3.10. Extended helix 12 of apo hRXRα LBD interacts with coactivator binding pocket of a neighboring molecule...... 97
Figure 3.11. β-carotene and apocarotenoids...... 102
Figure 3.12. SDS-PAGE of His-hRARα LBD...... 105
Figure 3.13. Size exclusion chromatogram for His-hRARα LBD...... 107
Figure 3.14. SDS-PAGE for His-hRXRα LBD...... 109
Figure 3.15. Size exclusion chromatogram for His-hRXRα LBD...... 110
Figure 3.16. SDS-PAGE of His-hRARα/∆His-hRXRα LBD heterodimer purification...... 112
Figure 3.17. Size exclusion chromatogram for His-hRARα/∆His-hRXRα LBD purification...... 113
Figure 3.18. Purification of RARα/RXRα heterodimers...... 114
Figure 3.19. SDS-PAGE of His-hRARα LBD thrombin cleavage...... 116
Figure 3.20. Circular dichroism spectrum for His-hRARα LBD in solution with varied amounts of ethanol...... 118
Figure 3.21. Percentage of folded His-hRARα LBD with added ethanol...... 119
Figure 3.22. Size exclusion chromatograms of hRARα/hRXRα LBD complexes...... 122
Figure 3.23. Structures of RAR modulators used in ITC experiments...... 128
Figure 3.24. ITC results of SRC-1 NR2 peptide binding to ATRA-bound hRARα LBD...... 130
Figure 3.25. ITC results of SRC-1 NR2 peptide binding to TTNPB-bound hRARα LBD...... 131
Figure 3.26. ITC results of SRC-1 NR2 peptide binding to β-apo-13-carotenone-bound hRARα LBD...... 132
Figure 3.27. ITC results of SRC-1 NR2 peptide binding to β-apo-14'-carotenoic acid-bound hRARα LBD...... 133
Figure 3.28. ITC results of SRC-1 NR2 peptide binding to BMS614-bound hRARα LBD...... 134
Figure 3.29. ITC results of SRC-1 NR2 peptide binding to β-apo-13-lycopenone-bound hRARα LBD...... 135
Figure 3.30. ITC results of SRC-1 NR2 peptide binding to apo hRARα LBD...... 136
Figure 3.31. Suggested mechanism of covalent bond formed between β-apo-13-carotenone and C235 of hRARα LBD...... 142
Figure 3.32. Luffariellolide covalently binds to RAR LBD...... 143
Figure 3.33. ITC results of SRC-1 NR2 peptide binding to ATRA-bound hRARα C235A LBD. .. 145
Figure 3.34. ITC results of SRC-1 NR2 peptide binding to β-apo-13-carotenone-bound hRARα C235A LBD...... 146
Figure 3.35. 13C-labeled β-apo-13-carotenone...... 148
Figure 3.36. 13C NMR spectra...... 150
Figure 3.37. ADEQUATE spectrum of free, triply-labeled β-apo-13-carotenone...... 154
Figure 3.38. ADEQUATE spectrum of triply-labeled β-apo-13-carotenone bound to hRARα LBD (C203S, C336S)...... 155
Figure 3.39. ADEQUATE spectrum of triply-labeled β-apo-13-carotenone bound to hRARα LBD (C203S, C336S, C235A)...... 156
Figure 3.40. Mass spectra of hRARα LBD...... 159
Figure 4.1. Structure of all-trans retinoic acid (ATRA)...... 169
Figure 4.2. Structure of TTNPB...... 170
Figure 4.3. Structure of β-apo-14'-carotenoic acid...... 171
Figure 4.4. Structure of β-apo-13-carotenone...... 172
Figure 4.5. Structure of CCS amino acid and 2-(methylthio)but-3-en-2-ol (1)...... 177
Figure 4.6. Measurement of the c2-c3-S angle from MD simulations using parameters refit to ab initio calculations or using GAFF parameters...... 181
Figure 4.7. Docking modes of ATRA...... 185
Figure 4.8. Docking modes of β-apo-13-carotenone...... 186
Figure 4.9. Proximity of β-apo-13-carotenone to C235...... 188
Figure 4.10. β-apo-14'-carotenoic acid extending outside the RAR binding cavity...... 189
Figure 4.11. Docking models of β-apo-14'-carotenoic acid...... 191
Figure 4.12. RAR LBD structural alignment and mobility...... 197
Figure 4.13. RARα LBD backbone RMSD...... 198
Figure 4.14. Average per residue RMSDs with respect to starting structure (1)...... 202
Figure 4.15. Average per residue RMSDs with respect to starting structure (2)...... 203
Figure 4.16. Average per residue RMSDs with respect to the average structure (1)...... 204
Figure 4.17. Average per residue RMSDs with respect to the average structure (2)...... 205
Figure 4.18. Induced fit MD of β-apo-14'-carotenoic acid...... 207
Figure 4.19. Deformation of L1-3 loop upon β-apo-14'-carotenoic acid binding...... 209
Figure 4.20. Y208-D288 distance...... 209
Figure 4.21. S213-D221 distance...... 210
Figure 4.22. Distance between β-apo-13-carotenone and C235 of RAR LBD...... 212
Figure 4.23. Entropy comparison between full and truncated systems...... 217
Figure 4.24. Computed entropic component, T∆S, of SRC-2 NR2 binding to ATRA-bound hRARα LBD...... 219
Figure 4.25. Range of average deviations from 50 ps data for reduced entropy and MM-PBSA data sets...... 220
Figure 4.26. Computed binding energies for SRC-1 NR2 peptide binding to RAR LBD...... 222
Figure 4.27. Computed ∆Gbind distributions...... 226
Figure 4.28. Computed ∆Gbind distributions...... 227
Figure 4.29. Correlation between computed and experimental ∆G values for SRC-1 NR2 peptide binding to RARα...... 228
Figure 4.30. Correlation coefficient over 1.5 μs of free energy calculations...... 229
Figure 4.31. Correlation coefficient and slope of regression lines for binding energy components...... 231
Figure 4.32. Comparison of SRC-1 NR2 binding energies to RARα bound to β-apo-13-carotenone both covalently and non-covalently...... 232
Figure 4.33. Deviation of apo RARα LBD simulations...... 235
Figure 4.34. Computed secondary structure of 4.75 μs RARα LBD simulations (run 1)...... 236
Figure 4.35. Computed secondary structure of 5 μs RARα LBD simulation (run 2)...... 237
Figure 4.36. Conformation of apo RAR LBD over 5 μs of explicit solvent simulation...... 238
Figure 5.1. SAM methyltransferase reaction...... 242
Figure 5.2. Radical SAM reaction...... 245
Figure 5.3. The SAM cycle...... 248
Figure 5.4. Folic acid cycle...... 249
Figure 5.5. Statistics of PDB crystal structures including SAM or SAH...... 250
Figure 5.6. Glycosidic (χ) torsion angle of SAM and SAH structures in the PDB...... 251
Figure 5.7. 44 SAM molecules in the anti-conformation...... 253
Figure 5.8. 184 SAM molecules in the high anti-conformation...... 253
Figure 5.9. 22 SAM molecules in the syn-conformation...... 254
Figure 5.10. Sulfur atom types parameterized in the general AMBER force field (GAFF)...... 260
Figure 5.11. Sample dihedral profiles with periodicity of 1, 2, and 3...... 267
Figure 5.12. Dihedral scan of 2,3-butanedione...... 268
Figure 5.13. Initial dihedral profile fitting test for 2,3-butanedione...... 270
Figure 5.14. Dihedral profiles for 2,3-butanedione with different charge sets...... 273
Figure 5.15. Optimized dihedral profiles for 2,3-butanedione...... 275
Figure 5.16. Force field parameters required for treatment of sulfonium center in SAM...... 276
Figure 5.17. Fitting of the H-C-S-C V3 parameter...... 280
Figure 5.18. H-C-S-C torsional profile of C3H9S+ with existing force field parameters...... 282
Figure 5.19. Fitting of the C-C-S-C V3 parameter...... 286
Figure 5.20. Comparison of C-C-S-C torsional profile with new and existing force field ...... 288
Figure 5.21. Absolute energy difference between QM and MM C-C-S-C torsional profiles...... 289
Figure 5.22. Geometry optimization of SAM...... 293
Figure 5.23. Partial atomic charges for SAM with three different χ angles...... 297
Figure B.1. Structure of synthetic RAR ligands...... 320
Figure E.1. FPLC chromatogram for protein standard set 1...... 325
Figure E.2. FPLC chromatogram for protein standard set 2...... 326
List of Abbreviations
ACBP = acyl-CoA binding protein
AChBP = acetylcholine binding protein
AdoHcy = S-adenosyl-L-homocysteine
AdoMet = S-adenosyl-L-methionine
AK = aspartate kinase
ATRA = all-trans retinoic acid
BPTI = basic pancreatic trypsin inhibitor
CREB = cAMP response element binding-protein
CBP = CREB binding-protein
CI = confidence interval
DBD = DNA-binding domain (of NR)
DMXBA = 3-(2,4-dimethoxybenzylidene)-anabaseine
DFT = density functional theory
EC number = enzyme commission number
ECD = extracellular domain
EM = electron microscopy
FEP = free energy perturbation
FF = force field
FPLC = fast protein liquid chromatography
GAFF = general AMBER force field
HAT = histone acetyltransferase
HDAC = histone deacetylase
ITC = isothermal titration calorimetry
LBD = ligand-binding domain
LES = locally enhanced sampling
LGA = Lamarckian genetic algorithm
MAT = methionine adenosyltransferase
MD = molecular dynamics
MEP = molecular electrostatic potential
MM = molecular mechanics
MM-PBSA = molecular mechanics Poisson-Boltzmann surface area
Phser = phosphohomoserine
pLGIC = pentameric ligand-gated ion channel
nAChR = nicotinic acetylcholine receptor
NAM = negative allosteric modulator
NR = nuclear receptor
NR2 = nuclear receptor box II (second NR interaction motif of SRC)
PDB = Protein Data Bank
PME = particle mesh Ewald
QM = quantum mechanics
RAR = retinoic acid receptor
REMD = replica exchange molecular dynamics
RESP = restrained electrostatic potential
RID = receptor-interaction domain (of SRC)
RMSD = root-mean-square deviation
RXR = retinoid X receptor
SAH = S-adenosyl-L-homocysteine
SAM = S-adenosyl-L-methionine
SAR = structure activity relationship
SD = standard deviation
SE = standard error
SRC = steroid receptor coactivator
TCEP = tris(2-carboxyethyl)phosphine
TI = thermodynamic integration
THF = tetrahydrofolate
TM = transmembrane
Chapter 1. Introduction
1.1 Computers in Biochemistry
Computation has been called the “third pillar of science” [1], serving as an
important link between theory and experiment. As applied in the field of
chemistry, computation is able to describe phenomena that are either difficult or
otherwise impossible to observe experimentally. In the past two decades, the
field of computational chemistry has become increasingly refined, now frequently
serving as a valuable complement to experiment; it can often provide insights to
drive the development of new hypotheses that may be tested experimentally. As
the field matures, it draws closer to attaining the significant goal of becoming a
reliably predictive method.
Perhaps the best way to chronicle the developments of computational chemistry
is to highlight the two Nobel Prizes awarded to scientists in the field. First, in
1998, the Nobel Prize in Chemistry was awarded to Walter Kohn for “his development of the density-functional theory” and to John Pople for his
“development of computational methods in quantum chemistry.” Both Kohn and
Pople contributed significantly to quantum mechanical (QM) electronic structure calculations, through which the properties of chemical entities may be determined. Kohn helped develop density functional theory (DFT), in which
functionals are used to describe electron density as an alternative to
working with wavefunctions [2]. This method scales well with an increased
number of electrons and is therefore popular in electronic structure calculations
of many-electron systems such as those studied by materials scientists.
Pople’s work, on the other hand, was in the ab initio (from first principles) solution
of the Schrödinger equation to determine the discretized energy states
(eigenvalues) of a chemical system described by a wavefunction composed of a
linear combination of atomic orbitals. Methods to speed up calculations, such as
the use of basis sets composed of Gaussian-type orbitals and the development
of efficient algorithms to solve the Schrödinger equation resulted in Gaussian, the
popular quantum chemistry software package [3-5]. By implementing the
quantum mechanical theory worked out in the 1920s by Schrödinger, Dirac,
Heisenberg, and others, the Gaussian program allowed for the application of
these methods by general chemists.
With the use of complete basis sets describing electron positions in combination
with the treatment of all possible configurations, ab initio calculations can
converge on the exact time-independent, non-relativistic solution of the
Schrödinger equation within the Born-Oppenheimer approximation (electron
motion decoupled from fixed nuclei). This, however, comes at a great
computational cost, and in some cases is only theoretically possible. Whereas DFT
scales no worse than O(N³), where N is the system size, modern treatments of
electronic structure such as MP2 or MP3 scale as O(N⁵) and O(N⁶), respectively.
Thus, while ab initio calculations may be carried out for small molecules with high
accuracy, such a treatment for large biomolecules including proteins or nucleic
acids is prohibitively costly. Therefore, approximate techniques must be implemented for the study of biological systems via computational chemistry.
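To make the consequence of these scaling laws concrete, the short sketch below compares how the cost of each method grows when the system size is multiplied by a given factor. It is illustrative only: it assumes idealized power-law scaling with the exponents quoted above, ignores all prefactors, and the names `SCALING_EXPONENTS`, `relative_cost`, and `cost_table` are introduced here purely for the example.

```python
# Illustrative only: growth in cost under idealized power-law scaling.
# Exponents follow the text (DFT ~ N^3, MP2 ~ N^5, MP3 ~ N^6). Prefactors are
# ignored, so only ratios between system sizes are meaningful.

SCALING_EXPONENTS = {"DFT": 3, "MP2": 5, "MP3": 6}


def relative_cost(size_ratio, exponent):
    """Factor by which cost grows when the system size grows by size_ratio."""
    return size_ratio ** exponent


def cost_table(size_ratio):
    """Cost growth factor for each method at the given size ratio."""
    return {
        method: relative_cost(size_ratio, exponent)
        for method, exponent in SCALING_EXPONENTS.items()
    }


if __name__ == "__main__":
    for ratio in (2, 10, 100):
        print(f"size x{ratio}: {cost_table(ratio)}")
```

Under these assumptions, a tenfold increase in system size raises the cost by a factor of 10³ for DFT but 10⁶ for MP3, which is why the gap between a small molecule and a protein is so difficult to bridge with correlated ab initio methods.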
The development of more approximate computational chemistry techniques was
highlighted by the 2013 Nobel Prize in Chemistry, which was awarded jointly to
Martin Karplus, Michael Levitt, and Arieh Warshel for their “development of multiscale models for complex chemical systems.” The work honored with this award laid the groundwork for the integration of quantum mechanical calculations such as those discussed above with classical molecular mechanics (MM) methods
in an approach called QM/MM. QM/MM simulations have application in the study of enzymatic reactions where only a select number of atoms are required
to be treated quantum mechanically (the atoms involved in bond breaking and
bond formation), while the remaining atoms may be treated with an atomistic
force field. By limiting the number of atoms treated by computationally expensive
QM methods, the QM/MM method allows for chemical reactions to be studied in
their native environments.
Similar to the case of quantum mechanics, the theoretical basis for classical
molecular mechanics simulations of chemical systems was developed well
before modern computers existed. The work of Hill and Westheimer in the 1940s
used Coulombic and van der Waals interactions to explain the interaction of
atoms [6,7]. These concepts were later incorporated into some of the first force
fields and MM programs by Lifson and Warshel in the late 1960s, in which early
computations were used to measure energy differences between the
conformations of small molecules [8]. One of the earliest molecular dynamics
(MD) simulations was performed by Rahman in 1964, which consisted of 864
argon atoms using periodic boundary conditions [9]. In 1977, thirteen years after
the simulation by Rahman, the first MD simulation of a protein was performed by
McCammon in the Karplus lab. This was an 8.8 ps long simulation of basic
pancreatic trypsin inhibitor (BPTI) performed in vacuum [10]. Although this simulation was extremely short by modern standards, it helped to change the mindset that protein interiors were rigid and began to bring life to the static structures provided by crystallography. At the time, X-ray crystallography was the only way in which structures of biomolecules could be determined; the first de novo NMR protein structure, that of proteinase inhibitor IIA from bull
seminal plasma, was reported by the Wüthrich group in 1985 [11].
In the 35 years since the first protein MD simulation, great advances have been
made in the field of computational chemistry, coinciding with the remarkable
increase in availability and power of computer hardware. Throughout this time of
development, a significant trade-off between speed and accuracy of the
simulations has always been a consideration. Regarding the MD simulation of biomolecules, it is important to obtain an adequate amount of conformational
sampling, either through multiple short simulations or a single long simulation,
such that calculated properties have statistically converged to an average with acceptable errors. Different approaches may be used in order to achieve convergent results with limited computational power. One enhanced sampling method is replica exchange molecular dynamics (REMD), which runs several
simulations in parallel over a range of temperatures, allowing conformations to be
swapped between different temperatures based on satisfaction of a Metropolis
criterion. This permits enhanced sampling experienced only at higher temperatures to be accessible to the lower temperature simulations due to
lowered energy barriers [12].
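The exchange step of REMD can be sketched as follows. This is a minimal illustration of the standard Metropolis criterion for swapping two replicas, not the implementation of any particular MD package; the function names and the example energies are hypothetical, and energies are assumed to be potential energies in kcal/mol.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis probability of swapping the configurations of replica i
    (potential energy e_i, temperature t_i) and replica j (e_j, t_j)."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    # P = min(1, exp(delta)); the branch avoids overflow for large delta.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the swap is accepted for this attempt."""
    return rng.random() < exchange_probability(e_i, e_j, t_i, t_j)


if __name__ == "__main__":
    # Hypothetical energies for two neighbouring replicas at 300 K and 320 K.
    print(exchange_probability(-110.0, -100.0, 300.0, 320.0))
```

A swap that hands the lower-energy configuration to the colder replica is always accepted, while the reverse swap is accepted with a probability that decays as the temperature gap or energy gap grows. This is why replica temperatures are typically spaced closely enough for the energy distributions of neighbours to overlap.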
Another method to achieve enhanced sampling is to sample the conformation of
a solute in an implicitly solvated environment. In an MD simulation in which the
solute is explicitly solvated (i.e. all water molecules are considered), a considerable amount of the computational effort is applied to calculating the
interactions between the water molecules themselves. For example, when using
a 12-15 Å buffer of water from the solute surface to the edges of the unit
simulation cell, the water atoms typically compose 85-90% of the total
atoms of the entire system. These numbers reflect the use of the TIP3P water
model, which uses three points or atoms to describe each water molecule. If a
more accurate four-point water model such as TIP4P-Ew were to be used, the
total percentage of solvent would increase to 90-95%. In an implicitly solvated
simulation, only the atoms of the solute are directly considered and the solvent
environment is replaced by a reaction field which acts upon each charge of the
solute. The reaction field may be calculated by the Poisson-Boltzmann (PB)
equation [13] or the more approximate (and more quickly calculated) generalized
Born (GB) method [14,15]. A tradeoff for the increased speed of simulations
implementing implicit solvents is a loss of atomistic detail, particularly at the solvent accessible surface of the solute.
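As an illustration of how a GB-type reaction field term can be evaluated, the sketch below uses the pairwise functional form commonly attributed to Still and co-workers. It is a simplified example under stated assumptions: the effective Born radii are supplied as inputs (their calculation is what distinguishes the various GB models and is not shown), no salt effects are included, and the function name and default dielectric constants are illustrative choices.

```python
import math

COULOMB = 332.0637  # electrostatic constant in kcal*Angstrom/(mol*e^2)


def gb_polar_energy(charges, coords, born_radii, eps_in=1.0, eps_out=78.5):
    """Polar solvation energy (kcal/mol) from a Still-type GB expression.

    charges    : partial charges in units of e
    coords     : list of (x, y, z) positions in Angstrom
    born_radii : effective Born radii in Angstrom (assumed precomputed)
    """
    prefactor = -0.5 * COULOMB * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):  # includes i == j, which gives the Born self-energy
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            rr = born_radii[i] * born_radii[j]
            f_gb = math.sqrt(r2 + rr * math.exp(-r2 / (4.0 * rr)))
            total += charges[i] * charges[j] / f_gb
    return prefactor * total


if __name__ == "__main__":
    # A single monovalent ion with a hypothetical Born radius of 2 Angstrom.
    print(gb_polar_energy([1.0], [(0.0, 0.0, 0.0)], [2.0]))
```

For a single ion the expression reduces to the Born equation, which provides a simple sanity check. The double loop also makes clear that the cost depends only on the number of solute atoms, in contrast to an explicitly solvated system.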
In recent years, major breakthroughs in computational power have seen new
efforts applied to the production of very long explicitly solvated simulations that
extend into the microsecond or millisecond range. Three different approaches
have been used to produce exceptionally long MD simulations: supercomputing,
distributed computing, and the use of special-purpose hardware. The use of
supercomputers, in which thousands of processors work together via high-speed
connections, has been the standard approach for the production of long MD
trajectories. In 2013, the Schulten group reported an all-atom, explicitly solvated,
100 ns simulation of the full HIV-1 capsid totaling an extraordinary 64,423,983
atoms [16]. The simulation was carried out using 4,000 nodes (128,000
processors) of the $188 million Blue Waters supercomputer for a production rate
of 5-9 ns/day. To put the size of this simulation in perspective, the two targets
studied in this dissertation, the ligand-binding domains of the nicotinic
acetylcholine receptor (nAChR) and retinoic acid receptor (RAR), contain a
respective ~130,000 and ~35,000 atoms when explicitly solvated. Considering
that complete pairwise simulations scale as O(N^2) while most modern MD
algorithms scale as O(N log N), doubling the number of atoms, N, in a
simulation will always decrease the simulation rate by more than a factor of 2.
Therefore, assuming O(N log N), the nAChR and RAR systems could be
theoretically simulated at rates of 3.8-6.8 μs/day and 6.7-12.1 μs/day using the supercomputer resources mentioned above. The longest simulations reported in this dissertation are two 5 μs RAR trajectories that were computed at an average rate of 42 ns/day.
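The extrapolation for the nAChR system can be reproduced with a few lines of arithmetic. The sketch below assumes that the cost per time step is exactly proportional to N log N and that the hardware is used with the same efficiency for the smaller system, which is an idealization; the function and constant names are illustrative.

```python
import math


def scaled_rate(ref_rate, ref_atoms, target_atoms):
    """Extrapolate a simulation rate to a different system size,
    assuming the cost per step scales as N log N."""
    ref_cost = ref_atoms * math.log(ref_atoms)
    target_cost = target_atoms * math.log(target_atoms)
    return ref_rate * ref_cost / target_cost


CAPSID_ATOMS = 64_423_983                 # HIV-1 capsid system [16]
CAPSID_RATES_NS_PER_DAY = (5.0, 9.0)      # reported production rates
NACHR_ATOMS = 130_000                     # approximate, explicitly solvated


def nachr_rate_range():
    """Estimated (low, high) nAChR rates in ns/day on the same resources."""
    return tuple(scaled_rate(rate, CAPSID_ATOMS, NACHR_ATOMS)
                 for rate in CAPSID_RATES_NS_PER_DAY)


if __name__ == "__main__":
    low, high = nachr_rate_range()
    print(f"nAChR estimate: {low / 1000:.1f}-{high / 1000:.1f} microseconds/day")
```

Because the ratio of two N log N terms does not depend on the base of the logarithm, the choice of natural logarithm here does not affect the estimate.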
In a more cost-effective approach, the Pande group has developed a distributed computing platform called Folding@Home, which harnesses the unused computational power of thousands of volunteers around the world [17]. Currently,
170,601 computers are donating time to the project. A recent noteworthy result
from this application was the 2012 report of a total of 30 ms of simulation of Acyl-CoA
binding protein (ACBP) using the AMBER ff96 force field in combination with a GBSA implicit water model [18].
Finally, of particular note are the simulations performed on Anton, a special-purpose machine built by D. E. Shaw Research (DESRES) solely to produce very long MD simulations. In 2010, DESRES reported the first 1 ms all-atom
simulation of a solvated protein [19], the 58-residue BPTI, which was the subject
of the very first protein simulation by McCammon and Karplus 33 years earlier.
The DESRES simulation of BPTI was eight orders of magnitude longer than that
initial simulation and two orders of magnitude longer than the previously reported
longest all-atom MD simulation at the time [20]. The BPTI simulation began in the
native state and maintained a near-native conformation for the duration of the
simulation, transitioning between only about four different conformational states,
two of which had been described via NMR experiments [21]. In the same paper,
multiple 100 μs simulations of FiP35, an engineered WW domain protein with a
fast folding time, were also reported. These trajectories began in the unfolded
state and exhibited numerous folding and refolding events within the same
unbiased, all-atom simulation. The calculated folding time of 10 ± 3 μs was in
close agreement with the experimentally measured folding time of 14 μs [22].
While these simulations are impressive, they were performed using resources that few scientists may access. Both standard supercomputers and special-purpose hardware are very costly, presenting a significant barrier to access.
Distributed computing, on the other hand, provides a solution to the prohibitive costs, since all of the hardware resources are donated. However, access to this platform remains an issue. A recent advancement in the rewriting of MD code to make use of modern graphics processing units (GPUs) has dramatically improved simulation speeds available to general users [23,24]. Harnessing GPU technology, the development of which has been propelled by the profitable video game industry, enables the routine production of microsecond all-atom simulations at a small fraction of the cost of a supercomputer or special-purpose machine.
The recent advancements in simulation length have allowed for a more thorough analysis of modern force field accuracy. The earlier simulations by DESRES used a modified version of the AMBER ff99SB force field (ff99SB-ildn) [25]. While the successful folding of FiP35 in part validated the ability of the force field to describe a folding mechanism with rates comparable to experiment, this success does not guarantee that the force field will perform as well for other proteins that are either larger or possess different structural elements. In fact, it has been determined that a modified version of the CHARMM force field (CHARMM22*) is more robust at describing the folding of a diverse set of small proteins [26,27]. In the application of long timescale, all-atom MD simulations to the refinement of homology models, it was observed that the models tended to drift away from their native
structures and that the reason the native state was not achieved was more likely
due to force field deficiencies rather than poor conformational sampling [28].
A recent study compared the ability of 11 force fields (including many AMBER
variants) in five different solvent environments to reproduce 524 NMR chemical
shifts and J-couplings of various short peptides and ubiquitin. These results
indicated that the ff99SB-ildn-NMR and ff99SB-ildn-phi variants performed the
best when simulated in the TIP4P-Ew or TIP4P/2005 water environment [29].
The same study showed that the implicit solvent model (GB in this
study) performed the worst of the five solvent models tested. This indicated that
while the use of GB in folding simulations is able to achieve native
conformations, some of the finer dynamics are not well reproduced. Another
recent report studied four force fields in combination with four water models as
applied to the simulation of a protein crystal (scorpion toxin protein II) [30]. This
study found that the pairing of ff99SB with the TIP3P water model
performed the best at maintaining lattice structure and reproducing experimental
B-factors.
A general conclusion of most recent studies is that current force fields can and
should be improved. The most widely used modern force fields, such as the ff99SB variants of AMBER [25,31-33] and the CHARMM22 force field with CMAP corrections [34,35], originated in the late 1990s and have since been only slightly
modified instead of completely reparameterized. Only the CHARMM protein force
field was parameterized for use with a specific water model (a modified form of
TIP3P), while most other force fields, including AMBER, suggest optimal
performance with TIP3P. A problem here is that the TIP3P water model was
parameterized in 1983 [36], predating the development of the Particle Mesh
Ewald (PME) method in 1993 which permits the efficient and accurate calculation
of long-range electrostatic interactions in a periodic system by solving for the
interaction energies in Fourier space [37,38]. Use of different mesh Ewald
methods in the simulation of explicitly solvated systems with periodic boundary
conditions is now the standard; however, TIP3P, the most common water model,
was never designed for use in such a configuration. In addition to TIP3P being
parameterized with interaction cutoffs instead of PME, this water model was also
only parameterized for use in ambient conditions and is known to have a
viscosity that is much lower than the experimental value. This raises the question of whether
the promising results observed in some recent computational folding experiments
are due to the cancellation of errors between the water model and protein force
fields, since folding in a less viscous environment would be expected to result in
computed folding rates that are faster than experiment. More recent water
models such as TIP4P-Ew [39] and TIP4P/2005 [40] were developed to be consistent with PME methods and are able to better recreate many different
properties of bulk water (density, freezing/vaporization temperatures,
self-diffusion coefficient, etc.) than TIP3P. Although this comes at the expense of an
additional ‘atom’ for each water molecule, the recent advances in computational
speed allow for the adoption of a more expensive water molecule.
New force fields are currently in development, being designed for use with
specific, more accurate water models. For instance, new amino acid charge sets
and Lennard-Jones parameters have been defined for the AMBER community that incorporate an implicit polarization effect from an average surrounding charge distribution of TIP4P-Ew water [41]. Newer force fields are also taking advantage of the ability to base parameterizations on higher-level QM calculations: the new AMBER charges were derived at the MP2/cc-pV(T+d)Z level, whereas the charges used in the ff99 variants were derived using HF/6-31G*.
It is likely that these new parameterization efforts will reach the limits of the approximations implemented in classical force fields, namely the use of fixed, atom-centered partial atomic charges that disregard any explicit treatment of electronic effects.
If these new force fields fail to faithfully replicate the dynamics of biomolecules, it is likely that the incorporation of electronic-level details such as polarization effects will be necessary to create more accurate models. Polarizable force fields have already been developed, with the AMOEBA force field being the current standard
[42,43]. The large AMBER and CHARMM communities have also developed polarizable force fields of their own [44,45]; however, they are not in common use due to a decreased simulation rate, which is often 2- to 3-fold slower than that of a classical force field. While properties such as solvation free energies and condensed phase dynamics of small peptides on a short time scale seem to be better reproduced by polarizable force fields such as AMOEBA [46], the behavior of
AMOEBA on long time scales remains unknown, making its overall accuracy much less defined than the classical force fields.
1.2 Free Energy Calculations
In addition to an atomistic description of protein folding, another major goal of computational chemistry is the accurate prediction of free energies. The ability to reliably predict binding free energies (∆Gbind) could revolutionize the
pharmaceutical industry, making computational drug design a standard first
approach to the development of a new therapeutic compound. Although there
have already been significant steps made toward this goal, most notably with the
computational design of the first HIV protease inhibitors [47,48], computational
chemists still often serve a support role on drug design teams instead of leading
the design process.
As is the case for general computational methods, a trade-off between speed
and accuracy for free energy calculations also exists. On one end of the
spectrum, there are fast ‘docking’ methods that determine the best fit between
two molecules, based on empirically weighted, physics-based energy functions.
These methods generally disregard any sort of dynamic interaction between the
ligand and receptor. Additionally, solvent effects are poorly treated or
disregarded altogether, resulting in large errors. For example, the Glide XP and
SP scoring functions have RMSDs of 2.26 and 3.18 kcal/mol with respect to
experimental binding affinities for a set of 198 molecules [49]. The latest
implementation of AutoDock, AutoDock Vina, has a reported standard error of
2.75 kcal/mol for 116 compounds not used in its training set [50]. In spite of these
drawbacks, docking techniques may be used to perform tasks that are not
possible with more time consuming scoring methods. Most notably, in a
technique known as virtual screening, the speed of docking algorithms is
harnessed in the identification of new potential therapeutic drugs from databases
that can be on the order of 1,000,000 compounds.
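To put the scoring errors quoted above in perspective, an error in binding free energy can be converted into a multiplicative error in the predicted dissociation constant through the standard relation ∆Gbind = RT ln Kd. The sketch below is an illustrative calculation only; the temperature of 298.15 K is an assumption, as the cited studies may have used different conditions, and the function name is hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def fold_error_in_kd(error_kcal_per_mol, temperature=298.15):
    """Multiplicative error in Kd implied by a given free energy error,
    using delta_G = RT ln(Kd)."""
    return math.exp(error_kcal_per_mol / (R_KCAL * temperature))


if __name__ == "__main__":
    # Errors quoted in the text for Glide XP, Glide SP, and AutoDock Vina.
    for label, err in [("Glide XP", 2.26), ("Glide SP", 3.18), ("AutoDock Vina", 2.75)]:
        print(f"{label}: {err} kcal/mol ~ {fold_error_in_kd(err):.0f}-fold in Kd")
```

Under this assumption, an error of 2.75 kcal/mol corresponds to an uncertainty of roughly two orders of magnitude in the predicted affinity, which is why docking scores are better suited to enriching compound libraries than to ranking closely related ligands.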
On the other end of the accuracy/computational expense spectrum are
thermodynamically rigorous free energy perturbation (FEP) and thermodynamic
integration (TI) methods [51-53]. FEP and TI are alchemical methods in which
one ligand is computationally transformed into another ligand along a coupling
parameter, λ, such that λ=0 represents an initial ligand and λ=1 represents a
second ligand. Convergence of the relative binding free energy differences, ∆∆Gbind,
is slow and costly, requiring individual simulations along the λ pathway that
exhaustively sample the protein, ligand, and protein/ligand complex in explicitly
solvated environments. The payoff for these time-consuming simulations is the
resulting, theoretically most accurate change in free energy. In addition to relative
free energy differences, absolute binding free energies may be calculated with
FEP/TI by transforming a ligand into ‘nothing’.
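The bookkeeping behind a TI calculation can be sketched as follows. The example assumes that the ensemble average of ∂U/∂λ has already been obtained from a separate simulation at each λ window, integrates these averages with the trapezoidal rule (one of several possible quadrature schemes), and combines the bound and unbound transformations through the usual thermodynamic cycle. The function names and the numerical values are hypothetical.

```python
def ti_free_energy(lambdas, dudl_means):
    """Integrate <dU/dlambda> over lambda with the trapezoidal rule.

    lambdas    : increasing lambda values, typically spanning 0 to 1
    dudl_means : ensemble-averaged dU/dlambda (kcal/mol) at each lambda
    """
    if len(lambdas) != len(dudl_means) or len(lambdas) < 2:
        raise ValueError("need matching lists with at least two lambda windows")
    total = 0.0
    for k in range(len(lambdas) - 1):
        width = lambdas[k + 1] - lambdas[k]
        total += 0.5 * width * (dudl_means[k] + dudl_means[k + 1])
    return total


def relative_binding_free_energy(dg_bound, dg_free):
    """ddG_bind for ligand 1 -> ligand 2: the transformation carried out in
    the protein/ligand complex minus the same transformation free in solvent."""
    return dg_bound - dg_free


if __name__ == "__main__":
    windows = [0.0, 0.25, 0.5, 0.75, 1.0]
    bound = ti_free_energy(windows, [1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical
    free = ti_free_energy(windows, [0.5, 1.5, 2.5, 3.5, 4.5])    # hypothetical
    print(relative_binding_free_energy(bound, free))
```

In practice the number and spacing of λ windows must be chosen so that the integrand is smooth between neighbouring windows, which is one source of the computational expense described above.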
Finally, between docking and FEP/TI on the spectrum of rigor, there is the
molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) method for
computing ∆Gbind [54,55]. As the name suggests, MM-PBSA is a molecular
dynamics-based technique, and in that respect is similar to FEP/TI. However,
MM-PBSA is often described as an ‘end-point’ free energy method where the receptor, ligand, and complex are considered individually, differentiating it from
FEP/TI which employs an alchemical approach to transition one bound state into another. A simplification employed in the MM-PBSA method is that while the MD
simulations involved in the calculations are performed in an explicitly solvated
environment, the free energies are calculated using an implicit water model. This
serves to eliminate the noise introduced by the many degrees of freedom of the
numerous water molecules, expediting convergence at the expense of a less
accurate treatment of solvation free energy and entropic effects. The MM-PBSA
method is implemented in Chapters 2 and 4 of this dissertation, with the
application of the method to long timescale simulations making up a significant
portion of Chapter 4. Thus, a more detailed description of the method is provided
below.
The MM-PBSA method follows the thermodynamic cycle illustrated in Figure 1.1,
where the binding energy, ∆Gbind, is considered as the sum of three energy components:
$$\Delta G_{\mathrm{bind}} = \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solv}} - T\Delta S \qquad (1)$$
The first component, ∆EMM, is the intramolecular energy difference between the
complex and the sum of the receptor and ligand:
$$\Delta E_{\mathrm{MM}} = E_{\mathrm{MM,complex}} - \left(E_{\mathrm{MM,receptor}} + E_{\mathrm{MM,ligand}}\right) \qquad (2)$$
∆Gsolv is the free energy difference of solvation, and T∆S is the temperature, T,
multiplied by the entropy difference, ∆S.
The intramolecular and solvation free energies can be further broken down into
the following components:
$$\Delta E_{\mathrm{MM}} = \Delta E_{\mathrm{vdW}} + \Delta E_{\mathrm{ele}} + \Delta E_{\mathrm{int}} \qquad (3)$$
where EvdW is the van der Waals energy, Eele is the electrostatic energy, and Eint is the internal energy of the molecule that is determined using classical force field equations.
Figure 1.1. Thermodynamic cycle implemented in the MM-PBSA protocol. The final binding energy, $\Delta G_{\mathrm{bind}}^{\mathrm{solv}}$, is calculated by first determining $\Delta G_{\mathrm{bind}}^{\mathrm{vacuum}}$, then accounting for the solvation effects of binding, ∆Gsolv, using an implicit water model.
Finally, the solvation free energy may be decomposed into electrostatic and nonpolar contributions:
$$\Delta G_{\mathrm{solv}} = \Delta G_{\mathrm{elec}} + \Delta G_{\mathrm{np}} \qquad (4)$$
The traditional method to implicitly calculate the electrostatic contribution of solvation, ∆Gelec, is to use the Poisson-Boltzmann equation. The nonpolar part of solvation can be thought of as the energy required to create a cavity in the water for the solute. Therefore, this cavitation term is dependent upon the solvent accessible surface area (SASA) of the solute. The most common way to
calculate the nonpolar portion of the free energy of solvation uses the following
relationship described by Sitkoff et al. [56]:
$$G_{\mathrm{np}} = \gamma \cdot \mathrm{SASA} - \beta \qquad (5)$$
where the solvent accessible surface area is scaled by a factor γ that is often
considered the surface tension, and offset by a correction factor, β.
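The way in which equations (1) through (4) combine can be sketched in a few lines. This is a minimal illustration rather than the protocol used in later chapters: it assumes that the individual energy terms have already been computed for the complex, receptor, and ligand in each trajectory frame, that the nonpolar term is supplied directly rather than derived from the SASA, and that the entropy term is provided separately. All class names, function names, and numerical values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FrameEnergies:
    """Energy terms (kcal/mol) for one species in one trajectory frame."""
    e_vdw: float   # van der Waals energy
    e_ele: float   # electrostatic energy
    e_int: float   # internal (bonded) energy
    g_elec: float  # polar solvation free energy, e.g. from PB
    g_np: float    # nonpolar solvation free energy

    @property
    def e_mm(self):
        return self.e_vdw + self.e_ele + self.e_int   # components of eq. (3)

    @property
    def g_solv(self):
        return self.g_elec + self.g_np                # components of eq. (4)


def difference(complex_, receptor, ligand, term):
    """Complex minus (receptor + ligand) for the named term, as in eq. (2)."""
    return getattr(complex_, term) - (getattr(receptor, term) + getattr(ligand, term))


def mmpbsa_binding_energy(frames, t_delta_s=0.0):
    """Average eq. (1) over a list of (complex, receptor, ligand) triples."""
    d_e_mm = mean(difference(c, r, l, "e_mm") for c, r, l in frames)
    d_g_solv = mean(difference(c, r, l, "g_solv") for c, r, l in frames)
    return d_e_mm + d_g_solv - t_delta_s


if __name__ == "__main__":
    # One frame with hypothetical energies.
    frame = (FrameEnergies(-50.0, -100.0, 200.0, -300.0, 5.0),
             FrameEnergies(-30.0, -60.0, 150.0, -280.0, 4.0),
             FrameEnergies(-5.0, -10.0, 50.0, -40.0, 2.0))
    print(mmpbsa_binding_energy([frame]))
```

When the receptor and ligand energies are extracted from the same complex trajectory, the internal energy terms cancel exactly, so the noise in the final estimate is dominated by the nonbonded and solvation contributions.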
The original implementation of the MM-PBSA method treated the nonpolar
component of solvation as linear in the SASA; however, a newer, more physically
complete implementation considers both a repulsive and attractive component of
the nonpolar solvation energy:
$$G_{\mathrm{np}} = G_{\mathrm{rep}} + G_{\mathrm{att}} \qquad (6)$$
Here, the repulsive component remains linearly correlated to the SASA as in (5),
while the attractive component, which relates to the van der Waals interaction
energy between the solute and solvent molecules, can be computed using a
surface-integration approach [57]. This implementation performed very well at
computing the nonpolar component of solvation for a test set of 42 small
molecules (correlation coefficient = 0.98); however, when extending the model to
measure the nonpolar potential of mean force for two sets of cytosine-guanine
base pairs, the results are less impressive. The application of this new methodology was not assessed for the treatment of larger biomolecules, which is addressed in Chapter 4 of this dissertation.
1.3 Dissertation Themes and Organization
The focus of this dissertation is the application of computational tools to describe biological phenomena, particularly the dynamic interaction between biomolecules. The targets studied in the following chapters are two biological receptors: the nicotinic acetylcholine receptor (nAChR) and retinoic acid receptor
(RAR), as illustrated in Figure 1.2.
Figure 1.2. Structural representation of the two receptors studied in this dissertation. A. Nicotinic acetylcholine receptor composed of five subunits (orange and blue) bound to acetylcholine (red) in the extracellular, ligand-binding domain. The position of a lipid bilayer is indicated in grey. B. Nuclear receptor showing the modular structure of the DNA-binding domain (bound to DNA, red and orange) and the ligand-binding domain (bound to ligand, purple). Images created by David S. Goodsell for RCSB PDB and made available through creative commons (http://creativecommons.org/licenses/by/3.0/us/#), used without alteration.
Some overlap in terminology arises and should be clarified, specifically for the
terms ‘receptor’ and ‘ligand’. The biological definition of a receptor corresponds
to a protein that translates an extracellular chemical signal (a ligand) into a
cellular response. In the cases of the nAChR and RAR, the endogenous ligands
are acetylcholine and retinoic acid, respectively. Binding of these ligands to their
respective receptors results in depolarization of the plasma membrane due to an
influx of positively charged ions in the case of nAChR, and gene transcription in
the case of RAR. In a biochemical sense, the term ‘receptor’ indicates a
particular type of protein, differentiating it from other classes of proteins such as
enzymes or structural proteins. A ‘ligand’ in this sense is generally limited to small, drug-like molecules or signaling peptides.
In the field of computational chemistry, particularly when considering free energy
calculations, the terms ‘ligand’ and ‘receptor’ carry far less restricted definitions.
Here, both the ligand and receptor may be any two molecules that interact. The
larger of the two interacting molecules is generally considered the receptor and is
often a protein (receptor, kinase, etc.) or nucleic acid. The ligand is generally a
small molecule or peptide, however it may also be another protein or nucleic
acid. Although the two targets considered in this dissertation are true biological
receptors, the computational methods applied herein may be used to describe
the dynamics and interactions between any two types of molecules.
Chapter 2 discusses the methods used to determine the binding site of a set of
nAChR negative allosteric modulators, given very little experimental data to limit
the search. A combination of molecular dynamics and blind docking was used on
a homology model of two nAChR subtypes to identify a potential binding site. The
binding mode was further refined and characterized by MD simulations and MM-
PBSA free energy analysis of both wild-type and mutant receptors. Functional
assays were used with mutant receptors, confirming that the correct allosteric binding site was located. Finally, a mechanism of allosteric antagonism is proposed based on a review of structural nAChR homologues and dynamic data
from MD simulations.
Chapters 3 and 4 discuss experimental and computational results that describe
how two retinoic acid analogues act as antagonists of RAR. Chapter 3 contains
the experimental section that includes the results of isothermal calorimetry, NMR,
and mass spectrometry experiments characterizing the interaction of the ligands
with the receptor. Chapter 4 contains the computational results including docking
and dynamics results that complement the experimental findings in Chapter 3.
Detailed binding free energy calculations are also performed, revealing that for
the case of a peptide/protein interaction, strong correlation with experiment is
dependent upon sampling that extends into microsecond timescales.
Finally, Chapter 5 involves the parameterization of a new atom type for use with
the AMBER ff99 force field. Specifically, parameters to describe a sulfonium
atom type (sulfur atom with three substituents carrying a positive charge) are
derived. This chemical group is uncommon in biology and was therefore not
considered in the development of the common ff99 force field nor the general
AMBER force field (GAFF). However, a sulfonium center is present in S-adenosylmethionine (SAM), the most common methyl donor in biology.
Therefore, in order to perform MD simulations of SAM in complex with the 200+ types of SAM-dependent methyltransferases, sulfonium parameters were derived.
Chapter 2. Identification of a Negative Allosteric Binding Site on the Nicotinic Acetylcholine Receptor
2.1 Introduction
This chapter deals with computational methods used to identify the binding site of
a known modulator of a receptor, the nicotinic acetylcholine receptor (nAChR),
with very limited experimental data to limit the search space. Extensive homology
modeling was carried out to create a three-dimensional structure of the
extracellular domain (ECD) of the receptor, which was used to search for the
binding site of a set of experimentally-verified negative allosteric modulators
using blind docking methods. After potential binding modes were refined with molecular dynamics (MD) simulations, functional experiments were performed on mutant receptors to confirm the binding site; computational free energy analysis confirmed the experimental findings, providing additional support for the binding mode. Finally, based on a survey of experimental structures from a homologous protein, a mode of allosteric antagonism is proposed.
A majority of this chapter is adapted from the following reference made available from Creative Commons (http://creativecommons.org/licenses/by/2.5/):
Pavlovicz RE, Henderson BJ, Bonnell AB, Boyd RT, McKay DB, Li C. “Identification of a negative allosteric site on human α4β2 and α3β4 neuronal nicotinic acetylcholine receptors.” PLOS ONE, 6(9): e24949, 2011.
2.2 nAChR Background
Nicotinic acetylcholine receptors (nAChRs) are members of the pentameric
ligand-gated ion channel (pLGIC) family of membrane proteins, which also includes GABAA, glycine, serotonin, and zinc activated receptors. pLGICs are
also known as Cys-loop receptors due to a common structural motif of a long
membrane-contacting loop formed by a disulfide bridge. nAChRs are cation-
specific, plasma membrane channels found throughout the central and peripheral
nervous systems [58-60], which may be classified as either muscle- or neuronal-
type receptors based on their subunit composition. There are numerous subtypes
of neuronal nAChRs, with α2-α10 and β2-β4 subunits arranging in either homo-
or heteropentameric assemblies. The heteromeric receptors contain both α and β
subunits, with a general stoichiometry of 2α:3β [61-63], although there is also evidence for (α4)3(β2)2 nAChRs [64,65]. The homomeric receptors are solely
comprised of α subunits and have five agonist binding sites. For heteromeric
receptors, agonist binding occurs at α(+)/β(-) interfaces, where the (+) notation implies the contribution of a principal binding feature called the ‘C loop’ to the binding interface and the (-) notation refers to the complementary subunit surface that completes the binding site. The general structure of nAChRs (Figure 2.1) is known from electron microscopy (EM) data of the Torpedo marmorata muscle-type receptor that has been refined to a resolution of 4 Å [66].
Figure 2.1. Schematic of neuronal nAChR structure. The proportions of the extracellular domain (A), transmembrane domain (B), and intracellular domain (C) are illustrated on the right while the modeled subunit stoichiometry and configuration for heteromeric neuronal nAChRs is illustrated on the left, including labels for the (+) and (-) side of each subunit and the location of each of the two agonist binding sites.
Physiologically, neuronal nAChRs are complex, participating in many neurological processes including cognition [67], pain sensation [68], and nicotine reward/addiction mechanisms [69,70]. In addition to nicotine addiction, these receptors have been linked to numerous neurological diseases and disorders including Parkinson’s disease [71], Alzheimer’s disease [71], schizophrenia [72], epilepsy [73], and lung cancer [74], making them important therapeutic targets.
Because the composition and distribution of nAChRs throughout the nervous
system are so varied, it is difficult to study the roles of the various nAChR subtypes in neuronal signaling pathways. In order to deduce these functional roles, there is a need for nAChR antagonists that selectively target specific receptor subtypes. Recently, a class of negative allosteric modulators (NAMs) has been described by the McKay group [75,76]. Some of these compounds,
including a molecule called KAB-18, exhibit preferential inhibition of hα4β2
nAChRs when compared to hα3β4 nAChRs, making them particularly interesting to study in order to understand where and how they act on the receptor so as to ultimately design more potent drugs that inhibit nAChRs of specific subunit compositions.
2.2.1 Structural Background: Acetylcholine Binding Protein
Structural comparison between the muscle-type nAChR and acetylcholine
binding protein (AChBP), a soluble pentamer found in molluskan species,
revealed that AChBP is a structural homologue of the extracellular domain (ECD)
of nAChRs [77]. AChBP structures have been reported for three different
molluskan species [77-79], and serve as the most complete templates for nAChR
ECD modeling. The most recent nAChR-related structure is that of the α1
extracellular domain of the mouse nAChR, which was solved at a resolution of
1.94 Å [80].
Great advancements in the field of nAChR structural studies came with the
results of genomic sequencing projects. In 2001, acetylcholine binding protein,
AChBP, was discovered in a cDNA library of the snail Lymnaea stagnalis and
has since been found to be a soluble homopentameric protein homologous to the
extracellular domain of LGICs [81]. AChBP homologues were later found in two
other molluskan species (Aplysia californica [82] and Bulinus truncatus [83]),
each sharing 15-28% sequence identity with all LGIC subunits and exhibiting
binding pharmacology toward nAChR ligands similar to that displayed by the homomeric α7 nAChR. The high-resolution structures of AChBPs cocrystallized with various nAChR agonists and antagonists provide a good picture of how nAChR ECDs are structured, how they bind ligands, and how ligand binding may be linked to channel opening.
The nAChR ECD, as inferred from the numerous AChBP structures, is primarily composed of β-strands. As illustrated in Figure 2.2, each subunit contains a short
N-terminal helix followed by ten β-strands which form a four-strand and six-strand
β-sheet. These two sheets are associated in a β-sandwich motif, where all but one strand (β8) are aligned in antiparallel fashion. Altogether, the topology takes the form of a modified immunoglobulin fold [77]. Two short 3₁₀ helices are also present in the structure, the first of which is found on the β2-β3 loop that projects near the N-terminus, making it one of the regions most distal from the membrane.
This loop forms the main immunogenic region (MIR) that is the antibody target of the autoimmune disease myasthenia gravis [84]. Another notable ECD region elucidated by AChBP structures is the ligand-binding site. This site, sometimes called the ‘aromatic nest’ or ‘aromatic cage’, is composed of three tyrosine and two tryptophan residues (Figure 2.2B), all of which are absolutely conserved among all of the nAChR subtypes. The ligand-binding site occurs at the interface between subunits, where one subunit, called the principal face, contributes four of the five aromatic ligand-binding residues. The complementary face contributes the final aromatic residue to complete the aromatic nest. Two tyrosine residues on the principal face are found on the β9-β10 loop, called the ‘C loop’. This loop, a defining characteristic of α subunits, is longer than the corresponding loop of β
subunits and contains a vicinal cysteine pair at its tip. nAChRs possess one
ligand binding site for each α subunit. Therefore, homopentameric nAChRs such as the α7 subtype contain five binding sites, while the α4β2 and α3β4 subtypes modeled in the following sections contain only two. The conformation of the C loop has been shown to adjust to ligands bound in the aromatic nest [85]. When bound to small agonists such as acetylcholine or nicotine [86], the C loop closes around the molecules. Larger molecules such as methyllycaconitine [85] or cobra toxin [87], on the other hand, force the C loop into a more ‘open’ conformation
and act as antagonists [85].
Figure 2.2. Structure of acetylcholine binding protein (AChBP). A. A subunit interface of the pentameric Lymnaea stagnalis AChBP is shown binding nicotine (magenta) from PDB ID: 1UW6. The Cys loop, named after the disulfide bond that gives it structure, is featured, as well as the C loop, which forms a large part of the ligand binding site. One subunit is colored red (N-terminus) to blue (C-terminus) while the complementary subunit is in grey. B. Close-up view of the aromatic nest in the ligand binding site. Four of the aromatic residues that compose the aromatic nest (Y83, W143, Y185, and Y192) come from one subunit, while W53 is contributed by the adjoining subunit. Note the vicinal disulfide bridge at the tip of the C loop.
In 2007, the first mammalian nAChR subunit structure was solved, the
monomeric extracellular domain of the mouse α1 subunit (PDB ID: 2QC1),
providing more details on nAChR function. Although the subunit structure was
solved in monomeric form, it confirmed the high level of structural homology suspected for AChBP protomers and revealed two interesting features that likely relate to function: a well-ordered carbohydrate chain and a hydration pocket in the core of the subunit [80]. The N-linked glycosylation stems from an asparagine residue found on the Cys-loop of most subunits. The high-mannose carbohydrate chain is composed of two N-acetylglucosamines followed by eight mannose residues that stretch from the Cys-loop to the C loop. Mutation of this asparagine resulted in a loss of nAChR expression, indicating the importance of glycosylation in folding and trafficking of functional receptors, while single channel patch-clamp recordings on nAChRs that had their carbohydrate chains removed post-expression revealed a decrease in both opening probability and total current measured per opening event [80]. These tests verified the importance of the glycosylation for proper receptor function, probably by linking the ligand-binding site near the C loop to the transmembrane helices which come into contact with the Cys-loop, the point of origin of the carbohydrate chain.
The hydration pocket is composed of a serine and a threonine residue that coordinate a single water molecule surrounded by the otherwise hydrophobic core of the ECD β-sandwich. This feature, not found in the molluskan AChBP structures, is located near the disulfide bond that characterizes the Cys-loop near the membrane surface. Mutational experiments that removed these hydrophilic
residues from the subunit core showed a ‘substantial’ loss of nAChR function,
indicating the importance of the hydration pocket to proper nAChR function [80].
It seems that the trapped water molecule inside each subunit allows for greater
mobility of the subunits during channel opening events.
2.2.2 Structural Background: Pentameric Ligand-Gated Ion Channels
More recent crystallographic efforts have revealed the structure of membrane-
spanning bacterial pLGICs, homologous to nAChRs. These distantly related
pLGICs share a common fold with metazoan nAChRs, but lack an N-terminal
helix, disulphide linkage in the Cys-loop, and an intracellular TM3-TM4 domain.
Like the discovery of AChBPs, these prokaryotic LGICs were discovered via
genome searches [88]. To date, full LGICs from the gram-negative bacterium Erwinia
chrysanthemi (ELIC) [89] and the cyanobacterium Gloeobacter violaceus (GLIC) [90,91]
have been crystallized. The ELIC structures are thought to represent the resting
or basal state of the receptor, while the two GLIC structures seem to have been
solved in an active, open state. Comparison of these structures reveals multiple
conformational changes that may be universal to all pLGICs. The transition
between the open and closed states involves both a quaternary twist of each
subunit in a counter-clockwise fashion, when the receptors are viewed from the
extracellular side, and rigid-body movements of the extracellular
domains by 8° around an axis parallel to the inner β-sheets [90]. Additionally,
a downward movement of the β1-β2 loop of the ECD was observed, as well as movements of the Cys- and TM2-TM3 loops towards the periphery that tilt the
TM2 and TM3 helices to open the central pore [90,91].
2.3 Homology Modeling
Multiple nAChR modeling studies have been previously reported, addressing
topics such as gating dynamics [92,93], agonist binding [94-96], agonist
selectivity [97,98], and allosteric modulator binding [99]. Most of these models
were built using a single crystal structure as a template [92,94-99], while some
studies have eschewed nAChR modeling altogether, using AChBP structures
directly in virtual screening attempts to identify novel nAChR ligands [100,101]. A
strength of the software used here to model the nAChR ECD, MODELLER, is
that it can incorporate structural information from multiple templates into a single
model. Therefore, the modeling described below takes advantage of the
numerous templates available by incorporating four crystal structures into the
model of the nAChR ECD.
Since most nAChR-related experimental structures support modeling of the ECD,
and this receptor domain is known to bind a number of ligands with varied
pharmacological effects, the ECD is the focus of this computational study. In particular, human (α4)2(β2)3 (hα4β2) and human (α3)2(β4)3 (hα3β4) extracellular
domains were modeled based on multiple crystallographic templates. Four
different crystallographic templates were used in the homology modeling
process: the AChBP structures of three molluskan species, Lymnaea
stagnalis (PDB ID: 1UW6 [86]), Aplysia californica (2BYR [102]), and Bulinus
truncatus (2BJ0 [78]), as well as the mouse α1 ECD monomer (PDB ID: 2QC1
[80]). In order to model the pentameric ECD with the monomeric mouse α1 ECD,
an artificial α1 pentamer was created by superimposing the monomer onto an
AChBP structure five separate times, once for each subunit position.
The alignment of the four template structures to the target sequences (Figure
2.3) was performed manually, although cues were taken from PSIPRED [103] and PHD [104] secondary structure predictions. The sequence identity and
sequence similarity between the template and target structures were calculated based on the alignment (Table 2.1 and Table 2.2). Overall, the sequences of the
AChBP templates and the human nAChR ECD targets are quite dissimilar, with 21-29% identity and 43-49% similarity. Given this low sequence identity, only a ‘low-accuracy’ homology model is anticipated, partly because of
the higher probability of alignment errors at this level of identity [105]. However,
based on the careful sequence alignments in Figure 2.3, most of the secondary
structural elements seem to be conserved between these two distantly related
proteins; a majority of the differences occur in the loop regions where many
insertions and deletions are present. The mouse α1 sequence shares a much
higher level of homology to the target sequences with 41-53% identity and 64-
70% similarity, bringing more certainty to the alignment and the overall quality of
the resulting models.
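The identity values reported in Table 2.1 follow from a column-by-column comparison of the aligned sequences. A minimal Python sketch of that calculation is shown here. The function name, the toy sequences, and the choice to normalize by the number of columns in which neither sequence has a gap are assumptions made for illustration, since the text does not state which denominator was used.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two rows of an alignment.

    Assumption: only columns where neither sequence has a gap are counted
    in the denominator. Other conventions (e.g. dividing by the shorter
    sequence length) would give slightly different numbers.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns containing an insertion or deletion
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned fragments (hypothetical, not taken from Figure 2.3).
template = "ACD-FW"
target = "ACE-FW"
identity = percent_identity(template, target)
```

A similarity score could be computed the same way by counting a column as a match whenever both residues fall in the same physicochemical group; the grouping scheme would be a further assumption.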
Figure 2.3. Numbered sequence alignment of AChBP and nAChR sequences used for modeling. Templates (bold) are the acetylcholine binding protein from three molluskan species (Lymnaea stagnalis, Aplysia californica, and Bulinus truncatus) and the mouse α1 nAChR ECD. Targets are the human α3, α4, β2, and β4 nAChR ECDs. Magenta highlighting indicates a conserved residue, while turquoise highlighting indicates residue similarity. Light green bars above residues represent α helices, dark green bars represent 3₁₀ helices, and light blue arrows represent β strands. The alignment was performed manually with cues taken from AChBP X-ray structures and the secondary structure prediction algorithms PHD and PSIPRED.
Table 2.1. Sequence identity (%) between template and model sequences.
       Ls     Ac     Bt     mα1    hα3    rα3    hα4    hβ2    hβ4    rβ4
Ls     100
Ac     35.4   100
Bt     47.1   36.5   100
mα1    22.7   24.9   22.4   100
hα3    25.8   29.1   24.3   51.4   100
rα3    25.2   28.0   24.3   52.4   94.7   100
hα4    26.4   29.6   23.7   52.9   60.6   61.1   100
hβ2    24.5   26.1   27.8   44.9   49.5   51.0   54.0   100
hβ4    24.5   23.9   21.5   40.9   49.0   49.5   52.0   69.6   100
rβ4    23.9   22.2   21.5   41.4   48.0   61.5   50.0   68.6   92.8   100
Template sequences: AChBP of Lymnaea stagnalis (Ls), Aplysia californica (Ac), and Bulinus truncatus (Bt) and the mouse α1 ECD (mα1). Target sequences: the ECD of rat α3 and β4 subunits and human α3, α4, β2, and β4 subunits.
Table 2.2. Sequence similarity (%) between template and model sequences.
       Ls     Ac     Bt     mα1    hα3    rα3    hα4    hβ2    hβ4    rβ4
Ls     100
Ac     59.0   100
Bt     68.6   60.5   100
mα1    44.2   46.6   44.7   100
hα3    48.5   45.0   48.7   69.7   100
rα3    48.5   43.9   49.3   68.8   96.2   100
hα4    42.9   42.9   44.7   66.8   76.4   76.4   100
hβ2    45.8   43.9   47.2   65.7   68.7   70.7   69.2   100
hβ4    48.4   43.9   49.3   63.6   68.7   69.7   68.2   85.5   100
rβ4    48.4   43.3   49.3   64.1   67.7   68.7   67.7   85.0   97.6   100
Template sequences: AChBP of Lymnaea stagnalis (Ls), Aplysia californica (Ac), and Bulinus truncatus (Bt) and the mouse α1 ECD (mα1). Target sequences: the ECD of rat α3 and β4 subunits and human α3, α4, β2, and β4 subunits.
Following alignment, three-dimensional models were built with MODELLER9v1
[106] in an iterative fashion, with 200 models being built in each iteration. Since
the model assessment methods used in MODELLER were exclusively calibrated
with single-chain proteins, they are not suitable for selecting top structures
among the pentameric nAChR models. To more accurately select a top structure,
each model was scored with a molecular mechanics Poisson-Boltzmann surface
area (MM-PBSA) approach which includes the internal energy of the model as
well as its solvation free energy. Each model was solvated in a TIP3P water box,
energy minimized, stripped of its waters, then scored with an MM-PBSA
approach in the AMBER suite of programs [107].
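The selection step described above amounts to summing the internal (molecular mechanics) energy and the solvation free energy of each minimized model and keeping the lowest total. The Python sketch below illustrates only that bookkeeping. The function name, model names, and energy values are hypothetical, and the energies themselves would come from the AMBER MM-PBSA calculation rather than from this code.

```python
def rank_models(components):
    """Rank models by total score = internal energy + solvation free energy.

    `components` maps a model name to a tuple (e_mm, g_solv), both in
    kcal/mol. Returns the names sorted from lowest to highest total
    together with the dictionary of totals.
    """
    totals = {name: e_mm + g_solv for name, (e_mm, g_solv) in components.items()}
    ranked = sorted(totals, key=totals.get)
    return ranked, totals


# Hypothetical energies for three models (not data from this study).
example = {
    "model_001": (-41250.0, -9800.0),
    "model_002": (-41900.0, -9450.0),
    "model_003": (-41500.0, -9990.0),
}
ranked, totals = rank_models(example)
best = ranked[0]  # model with the lowest combined energy
```

In an iterative protocol of the kind described here, `best` would be the structure carried forward as an additional template for the next round.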
Molecular dynamics with locally enhanced sampling (LES) [108] was applied to
the top structure of the sixth modeling iteration to better sample the conformation
of the A loop and its connection to the adjoining β5 strand of each subunit. This approach was taken because the A loop (loop 5 in Figure 2.3, corresponding to residues 94-105 for α subunits and 96-107 for β subunits) is poorly aligned with
the AChBP sequences, and while a one-to-one alignment exists with the mouse
α1 subunit, this loop lies at the subunit interface, which is not present in the monomeric mouse structure. Five copies of each of the five LES regions (one region for each subunit) were created with the ADDLES module of AMBER. After solvating the structure with a TIP3P water box and adding counterions, approximately 4 ns of LES simulation was performed. In total, there were 55 LES residues for each of the five copies, leaving the remaining 987 residues of the
ECD to be treated classically. Over the 3.93 ns duration of the simulation, the all-atom root mean square deviation (RMSD) for residues in the LES regions reached a maximum of 5.31 Å with respect to the starting structure.
Comparing this deviation to the maximum RMSD of 3.40 Å exhibited by the non-
LES residues characterizes the enhanced sampling achieved by this method.
Upon separation of the five LES copies, the energy of each model conformation during the simulation was calculated with the same MM-PBSA protocol as described for the initial homology modeling. The structure with the lowest computed energy during the MD simulation was selected as the final template for homology modeling.
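The comparison of maximum RMSD between LES and non-LES residues can be expressed as a short calculation over a trajectory. The following Python sketch assumes that every frame has already been superimposed on the starting structure and that atoms are stored as (x, y, z) tuples in a fixed order; the function names and the index lists are illustrative and are not taken from the original analysis scripts.

```python
import math


def rmsd(coords, ref):
    """Root mean square deviation between two equally ordered atom lists."""
    if len(coords) != len(ref) or not coords:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    sq_sum = sum(
        (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
        for (x, y, z), (rx, ry, rz) in zip(coords, ref)
    )
    return math.sqrt(sq_sum / len(coords))


def max_rmsd(trajectory, ref, atom_indices):
    """Largest RMSD reached by a subset of atoms over all frames.

    Assumption: frames are pre-aligned to `ref`, so no fitting is done here.
    """
    sub_ref = [ref[i] for i in atom_indices]
    return max(
        rmsd([frame[i] for i in atom_indices], sub_ref) for frame in trajectory
    )


# Toy two-atom system: atom 1 plays the role of an LES-region atom.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frames = [reference, [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]]
les_max = max_rmsd(frames, reference, [1])
rest_max = max_rmsd(frames, reference, [0])
```

Calling `max_rmsd` once with the LES atom indices and once with the remaining indices gives the two numbers that are contrasted in the text.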
Initially, a rat α3β4 (rα3β4) nAChR ECD model was built to complement experimental data that was available at the time. Later, human α3β4 and human
α4β2 models were built based on the final rat model. The rα3β4 ECD was modeled in seven iterations as illustrated in Figure 2.4, where each successive iteration added further symmetry, distance, and secondary structural restraints and incorporated the best-ranking model from the previous iteration as a
fifth template structure. The same templates and restraints used to obtain the
final rat α3β4 nAChR model were also used to create the human α3β4 and α4β2
nAChR models.
As shown in Figure 2.4, each successive modeling iteration
decreased the calculated model energy. The MM-PBSA energy of the
rα3β4 model was reduced by 11.2% from the initial round of modeling through
the final iteration. Three modeling adjustments that made the most significant
improvements in the calculated energies included refinement of the alignment
with secondary structure assignments, incorporation of the mouse α1 monomer
as a fourth crystallographic template, and LES (locally enhanced sampling)
refinement of loop A, which yielded -4.47%, -2.45%, and -3.50% changes in total
computed energy respectively. Incorporation of the mouse α1 monomer into the
homology modeling process was particularly helpful in refining the conformation
of several loop regions: L1, L5 (A loop), L7 (Cys-loop), and L9 (F loop). The three
molluskan species for which AChBPs have been crystallized all have shorter or
longer sequences in these regions, implying altered loop conformations in the
human nAChRs, while the mouse α1 sequence has a one-to-one alignment in
these loop regions. In fact, the mouse α1 sequence shares a one-to-one
alignment with both human α targets considered in this study, except for a single
insertion found in the C loop of the α1 sequence.
In addition to template differences, the alignment used to create this model
differs, particularly in loop regions, from those previously reported. This implies differences in model structure that will affect docking and dynamics results.
Figure 2.4. Histograms of model energies per modeling iteration. Model energies were calculated to include the internal energy (EMM) in addition to solvation free energy calculated using the MM-PBSA method. Iterations 2-7 incorporate the top scoring model from the previous iteration as an additional template. Rat α3β4 ECD is modeled in iterations 1-7 (blue), human α3β4 in iteration 8 (orange), and human α4β2 in iteration 9 (red).
1. Two roughly aligned AChBP templates (PDB ID: 1UWG and PDB ID: 2BYR) were used with symmetry restraints.
2. An additional AChBP template (PDB ID: 2BJ0) was included; template alignment was refined, and secondary structure assignments and distance restraints of select conserved motifs were added.
3. β-sheet restraints were added.
4. Mouse α1 monomer (PDB ID: 2QC1) was included as a fourth crystallographic template; α1 template specifically used to refine loop 1; hydration pocket waters added.
5. α1 template was used to refine F loop conformation.
6. C loop conformation of β subunits was refined.
7. The A loop of all subunits was refined with a template modified by LES MD simulation; symmetry restraints were removed.
8. Human α3β4 ECD models were built using same alignments and constraints as in 6.
9. Human α4β2 ECD models were built using same alignments and constraints as in 6.
2.4 Molecular Dynamics
Prior to docking, molecular dynamics (MD) simulations of the hα4β2 and hα3β4
ECD models were carried out for two purposes: to test the stability of the models and to collect an ensemble of receptor conformations for use in docking studies.
2.4.1 MD Methods
Prior to simulation, the model was solvated in a TIP3P water box with a 15 Å
buffer around all edges of the protein. After solvation, the system was charge
neutralized by the addition of Na+ counterions and energy minimized by 500
steps of steepest descent minimization followed by 1500 steps of conjugate
gradient minimization. The system was equilibrated by first increasing the
temperature of the system from 0 K to 300 K over 200 ps, during which all protein
atoms were restrained with a 50 kcal/mol harmonic potential. This proved to be an
important step, since it allowed the water molecules to fill in the gaps at the
protein/water interface that were left vacant by the solvating algorithm in the
LEaP module of AMBER. If the waters were not first allowed to equilibrate
around the protein, undesired side chain movements were observed that
detrimentally affected agonist docking to the agonist binding sites. A final 200 ps
of unrestrained MD completed the equilibration process. Production runs of 5 ns
followed the equilibration. All simulations used a heat bath coupling constant of
2.0 ps and were performed at 1 atm with a pressure relaxation time of 2.0 ps.
Nonbonded interaction calculations were cut off at 8 Å, while the electrostatic
energy was computed using the Particle Mesh Ewald method. The simulations
were run using the sander code of AMBER 9 with the ff99 force field. Constant
volume and temperature MD simulations of the nAChR models used the SHAKE
algorithm as implemented by AMBER with a 2 fs time step. Snapshots were
captured at 200 ps intervals along the production run trajectories to form a set of
26 receptor conformations that were used for docking.
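The figure of 26 conformations is consistent with sampling a 5 ns production run every 200 ps when both the first and last frames are retained. The short Python check below makes that arithmetic explicit; the function name is illustrative, and the inclusion of the frame at time zero is an assumption, since the text does not say which endpoints were kept.

```python
def snapshot_times(total_ps, interval_ps):
    """Times (in ps) of evenly spaced snapshots, including both endpoints."""
    return list(range(0, total_ps + 1, interval_ps))


# 5 ns production run sampled every 200 ps.
times = snapshot_times(5000, 200)
n_snapshots = len(times)
```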
2.4.2 MD Results
Since the antagonists we are studying act in an allosteric fashion, it was
important to model the receptor in the presence of agonist as would occur in vivo.
To prepare the nAChR models for antagonist blind docking, MD simulations were conducted for the receptors in various binding states, including an unbound state, a binary complex bound to a single epibatidine molecule, and a ternary complex saturated with two epibatidine molecules. Epibatidine was selected as the agonist used in the model to most closely recreate the experimental conditions.
Conformational clustering of the MD simulations was performed using the k-means method with the kclust script from the MMTSB toolbox [109]. Receptor conformations were extracted at 1 ps intervals over 5 ns MD simulations and clustered based on their Cα atom RMSDs with a tolerance of 1 Å. As shown in
Table 2.3, a general decrease in sampled conformations for both hα4β2 and hα3β4 models was observed upon ligand binding. While the α3β4 MD simulations show a decrease in sampled receptor conformations with both agonist binding events, no change in number of clusters was found in the α4β2 model upon binding the second epibatidine molecule. These results suggest that the ECDs favor particular conformations when bound to agonists, presumably those conformations that lead to channel opening. This is consistent with single
channel experiments which show that short openings result from singly-ligated
nAChRs, while doubly-ligated nAChRs exhibit sustained openings [110].
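To show how an RMSD tolerance translates into a cluster count, the Python sketch below uses a simple leader-style scheme: each frame joins the first cluster whose representative lies within the tolerance, and otherwise starts a new cluster. This is a deliberately simplified stand-in and is not the fixed-radius k-means procedure implemented in kclust. It also assumes that frames are pre-aligned lists of Cα coordinates; all names are illustrative.

```python
import math


def rmsd(a, b):
    """RMSD between two pre-aligned, equally ordered coordinate lists."""
    sq_sum = sum(
        (x - u) ** 2 + (y - v) ** 2 + (z - w) ** 2
        for (x, y, z), (u, v, w) in zip(a, b)
    )
    return math.sqrt(sq_sum / len(a))


def leader_cluster(frames, tolerance=1.0):
    """Assign each frame to the first representative within `tolerance` (Å).

    Returns one integer label per frame. The number of distinct labels is
    the number of clusters. Results depend on frame order, unlike k-means.
    """
    representatives = []
    labels = []
    for frame in frames:
        for idx, rep in enumerate(representatives):
            if rmsd(frame, rep) <= tolerance:
                labels.append(idx)
                break
        else:
            representatives.append(frame)
            labels.append(len(representatives) - 1)
    return labels


# Toy one-atom "conformations" (hypothetical coordinates).
frames = [[(0.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
labels = leader_cluster(frames, tolerance=1.0)
n_clusters = len(set(labels))
```

A smaller cluster count for a trajectory at fixed tolerance indicates that fewer distinct conformations were visited, which is the quantity compared across binding states in Table 2.3.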
Table 2.3. Number of conformational clusters from MD simulations of nAChR models in three different binding states.
                      Number of MD clusters (a)
                      α4β2    α3β4
apo                   9       13
binary complex (b)    6       12
ternary complex (c)   6       10
(a) Results of k-means clustering from 5000 snapshots extracted at 1 ps intervals from 5 ns MD simulations using centroids of all Cα atoms with an RMSD tolerance of 1 Å
(b) Single epibatidine molecule bound to agonist binding site 1
(c) Epibatidine bound to both agonist binding sites
The stability of these simulations was quantified by all-atom RMSD analysis. It
was found that the Cys loops were conformationally unstable, leading to steadily
increasing RMSDs over the duration of the 5 ns simulations. However, when the
RMSDs were recalculated to exclude the Cys loop residues, the all-atom RMSD
for each model plateaued in the range of 2-3 Å, indicating stable MD trajectories
at this timescale (Fig. 2.5). The Cys-loops are some of the most variable regions
on the ECD models, which is not surprising since these loops are known to make contact with the membrane head groups as well as the M2-M3 loops of the transmembrane domain. Since these potentially stabilizing interactions are absent in the models which only include the ECD, the Cys-loops are free to sample conformations that do not reflect the physical reality of the full nAChR embedded in a plasma membrane.
Figure 2.5. RMSD plots for MD simulations of nAChR models. All-atom RMSD plots for hα4β2 (A) and hα3β4 (B) in three different states: unbound (apo), binary complex, and ternary complex. Dashed lines represent RMSD values for the entire extracellular domain models, while the solid lines represent the RMSD for the entire models excluding the Cys loop residues. Data was smoothed with a ±25 frame sliding window average.
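The smoothing mentioned in the caption of Figure 2.5 is a centered moving average spanning 25 frames on either side of each point. A minimal Python sketch is given below. How the original analysis treated the first and last 25 frames is not stated, so the shrinking window at the ends of the series is an assumption, as is the function name.

```python
def sliding_average(values, half_width=25):
    """Centered moving average over a window of +/- `half_width` frames.

    Assumption: near the ends of the series the window is truncated to the
    frames that exist, rather than padded or dropped.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Small demonstration with a +/- 1 frame window.
demo = sliding_average([1.0, 2.0, 3.0, 4.0, 5.0], half_width=1)
```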
The average RMSDs from the starting structures for the individual subunits in the three sampled binding states show that the MD trajectories are relatively stable
(Table 2.4). The maximal backbone RMSD average for a single ECD subunit is
4.56 Å, while the typical subunit only deviates an average of 2.02 Å from its initial
conformation over simulation times of 5 ns. Some regions, including the C, Cys, and L1 loops, are more variable in conformation than each subunit as a whole, while the A, B, and F loops are generally more stable.
Plots of the all-atom RMSDs on a per residue basis are shown in Figures 2.6 and 2.7,
illustrating the more conformationally variable regions of the receptor versus the
more stable regions. These plots once again emphasize the mobility of the Cys-
loops (residues 127-138 in α subunits and 129-140 in β subunits) compared to the rest of the ECD model.
Table 2.4. Average RMSDs (Å) for backbone atoms of ECD models from MD simulations in three states.
                      α4β2                                 α3β4
Subunit   Region      apo    binary  ternary    Subunit    apo    binary  ternary
α4₁       all         1.91   1.62    1.40       α3₁        2.44   2.16    2.18
          C loop      2.60   1.07    1.30                  3.48   2.87    2.26
          F loop      1.84   1.64    1.14                  2.08   1.55    1.36
          A loop      1.29   1.42    1.33                  2.10   1.51    1.50
          Loop 1      3.08   2.55    2.57                  2.42   2.77    4.44
          Cys loop    4.35   2.15    2.04                  4.12   4.06    3.84
          B loop      1.49   0.92    0.95                  1.85   2.31    1.44
β2₁       all         1.82   1.77    1.66       β4₁        2.15   2.79    1.75
          C loop      3.65   2.10    1.92                  1.87   4.70    2.80
          F loop      1.60   1.59    1.33                  1.86   3.23    1.56
          A loop      1.47   1.37    0.95                  1.38   2.36    1.95
          Loop 1      1.57   1.66    1.92                  2.98   4.56    2.25
          Cys loop    3.00   2.49    2.73                  2.99   3.39    2.08
          B loop      1.22   1.42    1.13                  1.85   1.74    1.30
α4₂       all         1.55   2.38    1.27       α3₂        1.90   1.83    1.96
          C loop      2.27   2.98    2.19                  2.61   1.90    1.54
          F loop      2.22   2.54    1.15                  1.57   1.63    2.25
          A loop      1.34   2.41    1.26                  1.38   0.99    2.31
          Loop 1      1.64   1.75    1.21                  2.61   1.97    1.94
          Cys loop    1.60   1.59    1.33                  1.86   3.23    1.56
          B loop      1.47   1.37    0.95                  1.38   2.36    1.95
β2₂       all         1.57   1.66    1.92       β4₂        2.98   4.56    2.25
          C loop      3.00   2.49    2.73                  2.99   3.39    2.08
          F loop      1.22   1.42    1.13                  1.85   1.74    1.30
          A loop      1.55   2.38    1.27                  1.90   1.83    1.96
          Loop 1      2.27   2.98    2.19                  2.61   1.90    1.54
          Cys loop    2.22   2.54    1.15                  1.57   1.63    2.24
          B loop      1.34   2.41    1.26                  1.38   0.99    2.31
β2₂       all         1.64   1.75    1.21       β4₂        2.61   1.97    1.94
          C loop      1.60   1.59    1.33                  1.86   3.23    1.56
          F loop      1.47   1.37    0.95                  1.38   2.36    1.95
          A loop      1.57   1.66    1.92                  2.98   4.56    2.25
          Loop 1      3.00   2.49    2.73                  2.99   3.39    2.08
          Cys loop    1.22   1.42    1.13                  1.85   1.74    1.30
          B loop      1.55   2.38    1.27                  1.90   1.83    1.96
Binary complex (one bound epibatidine molecule at α/β interface, agonist binding site 1 in Figure 2.1) and ternary complex (an epibatidine molecule bound to each α/β interface, agonist binding sites 1 and 2 in Figure 2.1). MD snapshots were collected at 1 ps intervals from 5 ns long trajectories and all RMSD values are in reference to the initial structure of each trajectory.
Figure 2.6. Average all-atom RMSDs for hα4β2 nAChR ECD model in three different binding states. All-atom RMSD of each residue from the initial structure of a 5 ns MD simulation of three states: unbound (blue), bound to one epibatidine molecule at agonist binding site 1 (green), and bound to an epibatidine molecule at both agonist binding sites (red). Several loop regions are highlighted, including L1 (14-27), Cys-loop (127-138), F loop (159-174), and the α-subunit C loop (189-195).
Figure 2.7. Average all-atom RMSDs for hα3β4 nAChR ECD model in three different binding states. All-atom RMSD of each residue from the initial structure of a 5 ns MD simulation of three states: unbound (blue), bound to one epibatidine molecule at agonist binding site 1 (green), and bound to an epibatidine molecule at both agonist binding sites (red). Several loop regions are highlighted, including L1 (14-27), Cys-loop (127-138), F loop (159-174), and the α-subunit C loop (189-195).
Most nAChR agonists, including epibatidine, carry a positive charge at
physiologic pH. This plays a significant role in their binding to the nAChRs due to
cation-π interactions between the charged agonist and the cluster of aromatic
residues that forms the agonist binding site [111]. In addition to cation-π
interactions, proper fitting into the agonist binding site can allow for strong
hydrogen bond formation between the positively charged nitrogen of the agonist and the backbone carbonyl of Trp148, as observed in crystallographic structures [86,102] and proven important in mutational studies [111]. Both of these
interactions have been measured in our dynamics studies, with the results
presented in Table 2.5. In both cases of epibatidine binding to the α4β2 models,
the hydrogen bond between the agonist and Trp148 is observed, while initial
simulations of the epibatidine-bound α3β4 models did not indicate that the
hydrogen bond was formed. Inspection of the docked epibatidine conformation in the hα3β4 nAChR binding sites revealed that the hydrogen bonding interaction
was not occurring due to a 180° rotation of the bicyclic portion of the epibatidine
molecules, positioning the positively charged nitrogen atom in the direction
opposite to that observed in the AChBP-bound conformation [102]. To remedy the
inaccurate epibatidine blind docking mode to the hα3β4 model, the binding mode
was remodeled based on the hα4β2 docking results that had a lower RMSD from
the crystallographic position. MD simulations of the remodeled hα3β4 epibatidine-bound nAChRs resulted in trajectories in which the agonist formed stable hydrogen bonds with the backbone oxygen atom of Trp148, consistent with the experimental binding mode.
Table 2.5. Measurements of agonist binding stability in MD simulations of epibatidine-bound nAChRs.
                        hα4β2                          hα3β4
                        Agonist binding site           Agonist binding site
                        site 1         site 2          site 1         site 2
Hydrogen bonding interaction distance (Å) (a)
  binary complex (c)    2.88 (0.13)    -               2.84 (0.11)    -
  ternary complex (d)   2.89 (0.13)    3.75 (1.16)     2.84 (0.12)    2.86 (0.13)
Cation-π interaction distance (Å) (b)
  binary complex (c)    3.91 (0.39)    -               3.34 (0.25)    -
  ternary complex (d)   3.72 (0.38)    4.55 (0.75)     3.45 (0.27)    4.87 (0.49)
(a) Distance between positively charged N of epibatidine and backbone O of Trp148
(b) Distance between positively charged N of epibatidine and center of mass for the indole group of Trp148
(c) Single epibatidine molecule bound to agonist binding site 1
(d) Epibatidine bound to both agonist binding sites
Average measurements calculated from 5 ns MD simulations with standard deviations in parentheses
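The two quantities in Table 2.5 are both simple distances evaluated on every frame and then averaged. The Python sketch below shows one way to compute them. It assumes that the indole center of mass is taken over heavy atoms only and that the reported spread is a sample standard deviation; neither detail is given in the text, and the function names and coordinates are illustrative.

```python
import math
import statistics

# Standard atomic masses (u) for the heavy atoms found in an indole ring.
MASS = {"C": 12.011, "N": 14.007}


def center_of_mass(atoms):
    """Center of mass of a list of (element, (x, y, z)) tuples."""
    total = sum(MASS[el] for el, _ in atoms)
    return tuple(
        sum(MASS[el] * xyz[k] for el, xyz in atoms) / total for k in range(3)
    )


def hbond_distance(n_xyz, o_xyz):
    """Distance between the charged N of the agonist and a backbone O."""
    return math.dist(n_xyz, o_xyz)


def cation_pi_distance(n_xyz, indole_atoms):
    """Distance between the charged N and the indole center of mass."""
    return math.dist(n_xyz, center_of_mass(indole_atoms))


def distance_stats(distances):
    """Mean and sample standard deviation of a per-frame distance series."""
    return statistics.mean(distances), statistics.stdev(distances)


# Toy geometry (hypothetical): a two-carbon "ring" centered at x = 1.
ring = [("C", (0.0, 0.0, 0.0)), ("C", (2.0, 0.0, 0.0))]
d_pi = cation_pi_distance((1.0, 0.0, 3.0), ring)
mean_d, sd_d = distance_stats([2.0, 3.0, 4.0])
```

Applying `distance_stats` to the per-frame output of either distance function yields a value and spread in the same form as the table entries.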
2.5 Blind Docking
A set of four nAChR negative allosteric modulators (NAMs) were used to search
for the unknown binding site. As illustrated in Figure 2.8B, these included COB-3,
KAB-18, APB-12, and PPB-9. Given that these compounds are all structurally
related, it was anticipated that each would bind at the same site on the receptor.
Docking methods were used to identify the binding site/mode of the compounds.
Typically, a binding site is already known and the docking algorithm is used to
determine a specific binding mode (conformation of the ligand in the pocket).
However, in this case, the site of binding was not known; therefore an approach
called ‘blind docking’ was used, in which the entire surface of the receptor was
treated as part of the search space. Only two pieces of data were available to
limit the search space. First, these molecules inhibit nAChR function in
an allosteric fashion; therefore the orthosteric (agonist-binding) site could be
discounted. This knowledge was worked into the docking protocol by using
epibatidine-bound receptor conformations. Secondly, one of the NAMs was shown to
be selective for hα4β2 receptors over hα3β4 receptors. This information allowed
us to eliminate binding sites that shared identical amino acid sequences between
the two receptor subtypes.
Figure 2.8. Compounds used in blind docking experiments. A. Docked agonists included acetylcholine, nicotine, and epibatidine. B. Docked negative allosteric modulators included COB-3, PPB-9, APB-12, and KAB-18.
2.5.1 Docking Methods
Agonist structural coordinates were taken from the PDB and processed by the
LigPrep program of the Schrödinger suite to determine the ionization state of
each compound at pH 7 ± 2. All agonists were determined to carry a positive
charge within the pH range considered. All compounds were assigned Gasteiger charges and docked with the Lamarckian genetic algorithm (LGA) [112] in
AutoDock4 [113] with the maximum number of freely rotating bonds per ligand.
One hundred independent docking runs were completed for each ligand to each of the receptor conformations. A cutoff of 25,000,000 – 100,000,000 energy evaluations was used, depending on the number of rotatable bonds in the ligand, while all other docking parameters were kept at their default settings.
Blind docking grids of size 90.00 Å × 90.00 Å × 56.25 Å with grid point spacing of
0.375 Å were constructed for each snapshot conformation with AutoGrid4. These grids were large enough to encompass the entire extracellular domain, only excluding the Cys-loop region, since docking results in this region are unrealistic due to the contact these loops make with the TM2-TM3 loops that are not part of these models.
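The stated box dimensions and spacing fix the number of grid intervals along each edge, which gives a sense of the size of the maps involved. The Python sketch below performs that arithmetic only. The function name is illustrative, and the mapping of these numbers onto the actual AutoGrid4 input parameters is not described in the text and is not assumed here.

```python
def grid_intervals(edge_lengths, spacing):
    """Number of grid intervals along each box edge.

    Raises ValueError if an edge is not an integer multiple of the spacing.
    """
    counts = []
    for length in edge_lengths:
        n = length / spacing
        if abs(n - round(n)) > 1e-9:
            raise ValueError(f"{length} is not a multiple of {spacing}")
        counts.append(round(n))
    return tuple(counts)


# Box dimensions and spacing as given in the text (Å).
intervals = grid_intervals((90.00, 90.00, 56.25), 0.375)

# If both faces of the box carry grid points, each edge has one more
# point than it has intervals.
total_points = 1
for n in intervals:
    total_points *= n + 1
```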
The 100 docking positions for each ligand at each receptor conformation were clustered by their centroid points with a 4 Å tolerance. The four most populous clusters of each ligand were then clustered against those from the other receptor conformations. This clustering of clusters was based on the receptor residues that came into contact with each cluster instead of the
Cartesian coordinates attributed to the centroid-based clusters. This method allowed clusters from different time points to be compared with each other without correcting for spatial drift or rotation of the receptor. A list of residues coming within 5 Å of each of the docked conformations for each centroid-based cluster was created with scripts utilizing functions available in the
Chimera program [114]. Clusters with residue lists that shared a 65% intersection were considered to belong to the same docking position.
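The residue-based "clustering of clusters" can be sketched in a few lines of Python. This is an illustration only, not the scripts used in the study: the text does not say how the 65% intersection is normalized, so the sketch assumes shared residues divided by the size of the smaller list, uses a simple greedy grouping, and the cluster names and residue labels are invented.

```python
def overlap_fraction(residues_a, residues_b):
    """Shared residues divided by the smaller list size (assumed definition)."""
    a, b = set(residues_a), set(residues_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def merge_by_contacts(clusters, threshold=0.65):
    """Greedy grouping: a cluster joins the first group whose seed contact list
    overlaps it by at least `threshold`; otherwise it seeds a new group."""
    groups = []  # list of (seed residue set, [cluster names])
    for name, residues in clusters.items():
        for seed, members in groups:
            if overlap_fraction(seed, residues) >= threshold:
                members.append(name)
                break
        else:
            groups.append((set(residues), [name]))
    return [members for _, members in groups]


# Hypothetical contact lists (residues within 5 Å of each centroid-based cluster).
example = {
    "snap00_c1": {"b2:T58", "b2:E60", "b2:F118", "a4:C191"},
    "snap01_c2": {"b2:T58", "b2:E60", "b2:F118", "a4:Y189"},
    "snap02_c1": {"b2:K10", "b2:D12", "a4:R20"},
}
print(merge_by_contacts(example))
# [['snap00_c1', 'snap01_c2'], ['snap02_c1']]
```

Because the comparison uses residue identities rather than coordinates, no structural alignment of the snapshots is needed before grouping.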
After the initial round of blind docking to the unbound nAChR models, the epibatidine docking with the smallest RMSD from the AChBP binding mode (as found in PDB ID: 2BYQ) was kept as part of each nAChR structure. Each agonist-bound system was then resampled via an MD simulation using a protocol similar to that described above. A second epibatidine molecule was then docked to the models using the same ensemble blind docking method employed to dock the first compound. Again, the docking with the smallest RMSD from the AChBP binding mode at the second agonist binding site was added to the system to create a ternary complex: nAChR saturated with two agonist molecules.
Epibatidine was chosen as the agonist in the model to correspond to the agonist used in functional assays [115].
Upon creation of the ternary complex for both hα3β4 and hα4β2 nAChR models, the systems underwent one final MD simulation to create ensembles of epibatidine-bound receptor conformations. A final blind docking procedure was carried out with the antagonists illustrated in Fig. 2.8B. The results of the ensemble blind docking with the antagonists were clustered in the same fashion as the agonists in order to identify the most probable docking sites.
2.5.2 Docking Results
Flexibility of the agonist binding site has been documented by unbound and agonist-bound AChBP crystal structures [102]. When docking, this flexibility was accounted for by the use of multiple receptor conformations as extracted from
MD trajectories. From the MD simulations discussed in Section 2.4, a total of 26 snapshots were collected for each receptor subtype at regular 200 ps intervals. Initially, three different agonists with known experimental binding
modes to AChBP were blindly docked to validate the docking procedure. These
compounds, illustrated in Fig. 2.8A, include acetylcholine, nicotine, and
epibatidine. The docking results for the agonists near the agonist-binding site are
presented in Table 2.6 with representative docking modes illustrated in Fig. 2.9.
Although blind docking of the agonists to the hα3β4 snapshots was able to locate
both binding sites, only one of the two hα4β2 binding sites was properly located.
This was due to an unusual C-loop conformation at agonist binding site 1 in the
unbound state. Experimental binding affinities for all three agonists on human
nAChRs could not be found in the literature; however, the EC50 values for
acetylcholine, nicotine, and epibatidine have been reported for recombinant
hα4β2 and hα3β4 receptors expressed in HEK293 cells and Xenopus oocytes [116-118].
The docking energies for the agonists reproduced the experimental potency
trends, with epibatidine binding more strongly than nicotine, which in turn displays
greater binding affinity than acetylcholine. Additionally, the average docking
energies of the agonists all showed a preference to bind the hα4β2 models over the hα3β4 models, a trend that is also experimentally observed [116-118].
Figure 2.9. Blind docking modes compared to X-ray structures. Docking modes for epibatidine (A – magenta), nicotine (B – orange), and acetylcholine (C – green) to hα4β2 models compared to crystallographic binding modes (blue). Crystallographic structures for AChBP bound to epibatidine, nicotine, and carbamylcholine (PDB IDs: 2BYQ, 1UW6, and 1UV6, respectively) were superimposed on nAChR ECD models to determine RMSDs of the dockings.
Table 2.6. Blind docking results for agonists to multiple nAChR conformations.

hα4β2
                 Average docking        Expt. potency,   Cluster   Average docking
                 energy (kcal/mol)^a    EC50 (μM)^b      size      RMSD (Å)^c
acetylcholine    -4.86                  100              132       3.13
nicotine         -6.59                  3.5              282       1.72
epibatidine      -7.83                  0.043            154       5.44

hα3β4
                 Average docking        Expt. potency,   Cluster   Average docking
                 energy (kcal/mol)^a    EC50 (μM)^b      size      RMSD (Å)^c
acetylcholine    -4.66                  203.14           150       6.45
nicotine         -6.31                  40.3             90        4.97
epibatidine      -7.28                  0.151            149       7.42

^a AutoDock energies
^b Experimental agonist potencies from data reported in [116-118]
^c RMSD measurements compared to corresponding AChBP crystal complexes
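The agreement between docking energies and experimental potencies in Table 2.6 is a statement about rank order, which can be checked directly. The following Python sketch transcribes the table values and compares the two rankings for each subtype; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# (average docking energy in kcal/mol, experimental EC50 in μM) from Table 2.6
TABLE_2_6 = {
    "hα4β2": {
        "acetylcholine": (-4.86, 100.0),
        "nicotine": (-6.59, 3.5),
        "epibatidine": (-7.83, 0.043),
    },
    "hα3β4": {
        "acetylcholine": (-4.66, 203.14),
        "nicotine": (-6.31, 40.3),
        "epibatidine": (-7.28, 0.151),
    },
}


def rank_by(rows, column):
    """Agonist names sorted by the chosen column, lowest value first."""
    return sorted(rows, key=lambda name: rows[name][column])


for subtype, rows in TABLE_2_6.items():
    by_energy = rank_by(rows, 0)  # most favorable docking energy first
    by_ec50 = rank_by(rows, 1)    # most potent agonist first
    print(subtype, by_energy, by_energy == by_ec50)

# Every agonist docks more favorably to the hα4β2 models than to hα3β4.
prefers_a4b2 = all(
    TABLE_2_6["hα4β2"][name][0] < TABLE_2_6["hα3β4"][name][0]
    for name in TABLE_2_6["hα4β2"]
)
print(prefers_a4b2)  # True
```

With only three agonists per subtype, matching rank order is weak evidence on its own, which is consistent with the text treating it as a validation check rather than a quantitative result.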
Three antagonists, COB-3, PPB-9, and APB-12, were docked to the models using the same ensemble blind docking method that was used to dock the agonists. Based on LigPrep (Schrödinger, LLC) results, the antagonists are all positively charged at physiologic pH, protonated at the nitrogen atom of their piperidine/pyrrolidine moieties. Each antagonist also has one or more stereogenic centers. The two stereoisomers of each compound with the lowest computed energy were used in the blind docking study; each of these conformations had equatorial branching off of the heterocyclic moieties. The
antagonist docking site that was ultimately validated as the correct binding site was populated by 28.2% of the dockings to the epibatidine-bound model conformations. Three other sites were more prominently populated with alternate docking clusters; these had 69.6%, 57.1% and 47.1% rates of being identified as one of the four largest docking clusters for each antagonist that was docked. The positions of these other sites were all located on the inside of the doughnut-shaped extracellular domain facing the pore. They were either at subunit interfaces (both α/β and β/β) or tucked inside an A loop.
Some of the false positives observed in the blind docking can be attributed to the large search space used. Additionally, the use of a medium- to low-resolution homology model complicated the matter, possibly creating surface cavities that do not normally exist and to which the ligands may preferentially dock. The use of multiple receptor conformations served two purposes. First, it could potentially relieve some of the bias that the models had toward the modeling templates
(AChBP structures). Additionally, the ensemble-based docking approach could help account for the known flexibility of the receptor, much of which is ligand-induced.
2.6 Focused Docking and Induced Fit Molecular Dynamics
One of the frequently occurring antagonist blind docking modes was investigated more closely by redocking the antagonist KAB-18 to the suspected allosteric site with focused docking grids. KAB-18 became a focus because it exhibits preferential antagonism of hα4β2 nAChRs versus hα3β4 nAChRs [115]. KAB-18 was docked to focused docking grids of size 37.5 Å × 36.0 Å × 37.5 Å with 0.375 Å point spacing. The grids were centered at an α/β interface, encompassing the
regions surrounding the epibatidine-bound agonist binding site. KAB-18 was
docked using AutoDock with similar parameters as the agonists, using a cutoff of
100,000,000 energy evaluations for the LGA. Recurring docking poses were
determined by clustering the docking results with an all-atom RMSD tolerance of
2 Å.
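Clustering docking results by an all-atom RMSD tolerance can be illustrated with a simple greedy scheme. The Python sketch below is an assumption-laden illustration rather than the clustering code used by the docking software: poses are visited in order of increasing energy, each pose joins the first cluster whose lowest-energy member lies within the tolerance, RMSD is computed without superposition on identically ordered atoms, and the coordinates and energies are invented.

```python
import math


def rmsd(pose_a, pose_b):
    """All-atom RMSD (Å) between two poses with identical atom ordering."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def cluster_poses(poses, energies, tolerance=2.0):
    """Return clusters as lists of pose indices; the first index is the seed."""
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters = []
    for i in order:
        for members in clusters:
            if rmsd(poses[members[0]], poses[i]) <= tolerance:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Three hypothetical two-atom poses: pose 1 is shifted 1 Å from pose 0, pose 2 by 5 Å.
poses = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 5.0, 0.0), (1.0, 5.0, 0.0)],
]
energies = [-7.0, -6.5, -6.8]
print(cluster_poses(poses, energies))  # [[0, 1], [2]]
```

Recurring poses then correspond to the clusters with the most members.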
The selection of a precise docking mode was aided by existing structure-activity
relationship (SAR) data indicating that modifying the terminal phenyl of the biphenyl group of KAB-18 to a succinimide moiety results in a loss of hα4β2 selectivity [116]. Additionally, modifying the length of the aliphatic linker on the opposite end of the antagonist was also shown to result in a loss of relative hα4β2 selectivity. Taking this into consideration, a binding mode in which the aforementioned regions of KAB-18 were found to associate with receptor residues that vary between the hα4β2 and hα3β4 nAChRs was selected. A subsequent MD simulation of this focused docking mode revealed two potentially important polar binding interactions. First was a hydrogen bond between the keto group of the ester linkage of KAB-18 and the hydroxyl oxygen of Thr58 of the β2 subunit. Second was a Coulombic interaction between the positively charged nitrogen of the piperidine group of KAB-18 and the carboxyl group of Glu60 of the
β2 subunit. These interactions and their stability in a 9 ns MD simulation are illustrated in Figure 2.10.
The refined binding mode is illustrated in Figure 2.11A, highlighting the amino acids with which the antagonist makes contact, while a superposition of the other
antagonist docking modes is found in Figure 2.11B. Interestingly, the residues that seem to confer selectivity for this binding mode, i.e. the sites of variation between the hα4β2 and hα3β4 subtypes (amino acids at positions 78, 110, 112,
118, 58, and 35), are all found on the β subunit, forming a band along the 6-membered β-sheet that creates the (-) side of the α/β interface (dark blue in Fig.
2.11A).
Figure 2.10. Stability of KAB-18 at its proposed binding site. A. Initial docking mode of KAB-18 (magenta) to the hα4β2 model in the presence of epibatidine (purple). The ligand binds at the interface between the α subunit (green ribbon) and the β subunit (blue ribbon). Dotted lines identify key polar interactions between the ligand and the receptor. B. Induced binding mode after 7 ns of MD simulation. C. Distance between positively charged piperidine nitrogen of KAB-18 and carboxyl oxygens of β2Glu60. D. Distance between keto oxygen of ester linkage of KAB-18 and hydroxyl oxygen of β2Thr58.
Figure 2.11. Detailed docking modes for negative allosteric nAChR modulators. A. Docking mode of KAB-18 (magenta) at the α4(+) (green) / β2(-) (blue) interface in the presence of the agonist epibatidine (purple). Residues varying between the β2 and β4 subunits are featured (dark blue). B. Superimposed Glide docking modes of KAB-18 (magenta), APB-12 (grey), PPB-9 (orange), and COB-3 (green) at the same binding site.
2.7 Binding Site Validation: Mutagenesis and Functional Assays
[The work presented in this section was primarily performed by Brandon
Henderson, but is included here as the experiments were specifically performed
to validate the computationally predicted binding mode.]
Based on the proposed interaction between KAB-18 and the hα4β2 ECD (Figure
2.11A), mutations were suggested that could experimentally validate the binding
mode. Initially, two mutations were tested: β2F118L and β2T58K. Both of these
mutations change sites on the β2 subunit to the corresponding amino acid found
on the β4 subunit. KAB-18 has no functional activity on human α3β4 nAChRs when tested at concentrations up to 100 μM [115] (higher concentrations were
not possible due to solubility limitations); therefore, it was thought that these
mutations would potentially decrease the observed potency of the molecule.
2.7.1 Experimental Methods
Human nAChR α4 and β2 full-length cDNAs in the vector pSP64 (poly A) were
obtained from Dr. Jon Lindstrom (University of Pennsylvania) and used as the
template for mutagenesis (β2) and for transient expression (α4 and β2). A single mutation was made in the β2 subunit using the QuikChange Lightning Multi Site-Directed
Mutagenesis Kit (Stratagene) following the manufacturer's instructions.
Primers were designed using the QuikChange Primer Design Program
(Stratagene) and Oligo 4.0 (National Biosciences) and synthesized by Invitrogen.
Primers were designed to replace the threonine residue at position 58 in the hβ2
subunit with a lysine found at the similar position in the hβ4 subunit (T58K). The
following primer was designed to change the threonine (ACC) at position 58 in
the β2 subunit to lysine (AAG): β2 mutant 5'-
CCACCAATGTCTGGCTGAAGCAGGAGTGGGAAGATTATCG-3'. The
underlined nucleotides defined the mutation. Primers were also designed to
replace the phenylalanine residue at position 118 in the hβ2 subunit with a
leucine found at the similar position in the hβ4 subunit. The following primer was
designed to change the phenylalanine (TTC) at position 118 in the β2 subunit to
leucine (TTG): 5'-TCTCCTATGATGGTAGCATCTTGTGGCTGCCGCCTGC-3'
and 5'- GCAGGCGGCAGCCACAAGATGCTACCATCATAGGAGA-3'. It should
be noted that for the F118L mutation, an additional mutation (T) was introduced which did not change the coding sequence, but relaxed a potential loop in the
primer in order to allow for the generation of this mutation. The mutant hβ2
cDNAs were subcloned into pcDNA 3.1+Zeo (Invitrogen). The wild type hα4 and
hβ2 cDNAs were also subcloned into the pcDNA 3.1+ and pcDNA 3.1+ Zeo
vectors respectively. All cDNA clones were completely sequenced using a 3730
DNA Analyzer (Applied Biosystems) at the Ohio State University Plant-Microbe
Genomics Facility. DNAs used for transfection were purified using PureLink High
Pure Mini or Midi Kits (Invitrogen). HEK ts201 cells (kind gift of Dr. Rene Anand,
Ohio State University Department of Pharmacology) were transiently transfected
with wild-type hα4 and mutant hβ2 cDNAs or with wild-type hα4β2 cDNAs using Lipofectamine
2000 (Invitrogen) in 60 mm dishes. After 8 hours, the cells were replated into
96-well dishes for the intracellular calcium accumulation assays.
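The two F118L primers listed above are given as a sense/antisense pair, so one should be the reverse complement of the other. A minimal Python check of that relationship follows; the sequences are copied from the text, while the variable and function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of an uppercase DNA sequence written 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


# Primer sequences as printed in the methods (5' to 3').
f118l_sense = "TCTCCTATGATGGTAGCATCTTGTGGCTGCCGCCTGC"
f118l_antisense = "GCAGGCGGCAGCCACAAGATGCTACCATCATAGGAGA"

print(reverse_complement(f118l_sense) == f118l_antisense)  # True
```

The same helper could be used to derive the antisense partner of the T58K primer, for which only one strand is listed.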
Calcium accumulation assays were performed as described previously [76] with
slight modifications. Briefly, HEK ts201 cells, transiently expressing wild type
hα4β2 nAChRs (hα4β2wt) or mutant hα4β2 nAChRs (hα4β2m), were plated on
96-well plates at a density of 2.6 × 10^5 cells per well. Twenty-four hours after plating, the cells were washed and then incubated with fluo-4-AM for 30 minutes at 37°C followed by 30 minutes at 24°C. After incubation, cells were washed and fluorescence was measured at ~0.7 second intervals using a fluid handling integrated fluorescence plate reader (Flex Station, Molecular Devices,
Sunnyvale, CA). The experimental design involved three treatment groups: control-sham treated, control-epibatidine treated, and antagonist treated.
Functional responses were quantified by first calculating the net fluorescence
changes (the difference between control sham-treated and control agonist-
treated groups). Data were expressed as a percentage of control-epibatidine
treated groups. Results were calculated from the number of observations (n)
performed in triplicate. Curve fitting was performed by Prism software
(GraphPad, San Diego, CA). EC50 values, IC50 values, and Hill coefficients were obtained by averaging values generated from each individual concentration-
response curve. EC50 values and IC50 values were expressed as geometric
means (95% confidence limits). Experimental values were compared using the
t-test (p<0.005), as indicated.
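For readers unfamiliar with concentration-response analysis, the sketch below shows a four-parameter logistic (Hill) response and a geometric mean of per-curve IC50 values in Python. It assumes the common convention in which inhibition curves carry a negative Hill slope, matching the sign of the coefficients reported for antagonists in this chapter. The exact model fitted in Prism is not stated in the text, and the replicate IC50 values shown are hypothetical.

```python
from statistics import geometric_mean


def percent_of_control(conc, ic50, hill_slope=-1.0, top=100.0, bottom=0.0):
    """Four-parameter logistic response; a negative slope describes inhibition."""
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill_slope)


# At a concentration equal to the IC50 the response is halfway between top and bottom.
print(percent_of_control(conc=8.5, ic50=8.5, hill_slope=-1.2))  # 50.0

# Hypothetical IC50 values (μM) from individual concentration-response curves.
replicate_ic50s = [5.0, 10.0, 20.0]
print(round(geometric_mean(replicate_ic50s), 6))  # 10.0
```

Averaging on a logarithmic scale, as the geometric mean does, is the usual choice because EC50 and IC50 values tend to be log-normally distributed.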
2.7.2 Functional Results
Functional IC50 values for KAB-18 and control antagonists (d-tubocurarine and mecamylamine) as well as functional EC50 values for a control agonist (epibatidine) were obtained using a fluorescence calcium accumulation assay. Changes in the
IC50 values of KAB-18 were used to document a change in the apparent affinity
of KAB-18 as caused by mutation of the target amino acids. The IC50 for KAB-18
increased to 71.8 μM on the β2T58K mutant from the wild-type IC50 of 8.5 μM
(Figure 2.12 and Table 2.7), an eight-fold decrease in observed potency. The effect of the β2F118L mutation was even more pronounced, with a loss of inhibitory activity for KAB-18 at concentrations up to 100 μM. It is important to note that these single point mutations did not affect the apparent affinity for epibatidine
(an agonist) at the orthosteric site adjacent to the site of allosteric modulation. Nor did the mutations alter the apparent affinity for tubocurarine (a competitive antagonist)
or mecamylamine, a non-competitive antagonist which binds at a different
location on the receptor.
The results of the functional assays with the mutant receptors indicated that both
mutated residues are involved in the binding of KAB-18, and that the mutations leave the receptors
functionally intact. The F118L mutation resulted in a greater change of apparent
affinity between the ligand and receptor. Based on the modeling, phenylalanine
at position 118 seems to be involved in a π-π stacking interaction with the
terminal phenyl group of KAB-18, in addition to a potential cation-π interaction
with the positively charged piperidine moiety of KAB-18. Overall, these data
support the experimental findings that KAB-18 preferentially inhibits hα4β2 over
hα3β4 nAChRs [115] and are consistent with KAB-18 binding to the allosteric site predicted by computational modeling.
Figure 2.12. Dose-response curves for epibatidine and KAB-18 on wild type and mutant hα4β2 nAChRs. A. Functional response for epibatidine binding to hα4β2WT (wild type) and hα4β2 T58K/F118L mutant nAChRs. B. Functional response of KAB-18 on wild type and mutant nAChRs. Data are expressed as a percentage of control responses using 3 μM epibatidine. Values represent means ± SEMs (n = 5 – 7).
Table 2.7. Effects of agonists and antagonists on wild type and mutant hα4β2 nAChRs.

                        Wild type hα4β2                hα4β2 T58K                       hα4β2 F118L
                        EC50 or IC50 values^a   nH^b   EC50 or IC50 values^a     nH^b   EC50 or IC50 values^a   nH^b
epibatidine (EC50)      36.8 (25.4-53.4) nM     0.9    29.2 (9.8-87.2) nM        0.7    23.7 (12.6-44.6) nM     0.8
d-tubocurarine (IC50)   5.5 (3.4-9.0) μM        -1.0   6.2 (2.1-18.5) μM         -0.6   6.5 (3.9-10.9) μM       -0.9
mecamylamine (IC50)     0.2 (0.1-0.3) μM        -1.4   0.2 (0.1-0.5) μM          -0.6   0.4 (0.3-0.5) μM        -1.1
KAB-18 (IC50)           10.0 (5.5-18.0) μM      -1.2   71.8 (48.3-107.3) μM^c    -1.0   >100 μM^c,d             --

^a Values represent geometric means (confidence limits), n = 5-7. Data ranges in parentheses.
^b nH, Hill coefficient
^c Significantly different from wild type response, p<0.005
^d Compound is insoluble at concentrations greater than 100 μM
2.8 Free Energy Analysis
Following binding mode analysis and in conjunction with the experimental
mutagenesis, β2T58K and β2F118L hα4β2 nAChR models were computationally
built and evaluated. Methods to build the mutants and sample their dynamics
were similar to those described in Sections 2.3 and 2.4. Following MD simulations, binding energy analysis was carried out to correlate KAB-18 binding
with the experimental data in Section 2.7.
Binding free energies were calculated for six cases: epibatidine binding to both
hα4β2 and hα3β4 nAChR models, KAB-18 binding to both models in the
presence of epibatidine, and KAB-18 binding to the hα4β2 T58K and F118L
models. The standard AMBER MM-PBSA protocol [119] was applied to 1500
bound-state conformations, extracted at 1 ps intervals from the MD simulations
described above. The receptor systems were composed of full α/β ECD interfaces for both enthalpic and entropic calculations. Entropy values were calculated using normal mode analysis.
Convergence of the computed binding free energies was tracked to ensure sufficient sampling. Standard deviations of time averages are reported for sliding average data with a window size of 200 data points. Time averages at increasing intervals were computed (Figure 2.13) to quantify the convergence of each binding free energy. The average change in computed binding free energy between the first 1400 and first 1500 data points was 0.16 kcal/mol for the six cases reported, supporting the convergence of the values over the sampling period.
The results of the binding energy calculations are presented in Table 2.8. A binding energy of -17.46 kcal/mol for epibatidine binding alone to the hα4β2 model was computed, compared to the experimental range of -14.49 to -14.27 kcal/mol [117]. For epibatidine binding to the hα3β4 model, a binding energy of -14.91 kcal/mol was computed, compared to the experimental range of -13.25 to -13.19 kcal/mol [118]. These more computationally intensive free energy calculations yield numbers that follow the experimental binding trends for epibatidine in addition to being much closer estimates of the experimentally derived energies than the AutoDock scores reported in Table 2.6. In the presence of epibatidine, KAB-18 was predicted to bind more strongly to the hα4β2 nAChR model than the hα3β4 model, with computed binding energies of -6.25 kcal/mol and 11.25 kcal/mol, respectively. This is in line with the experimental data.
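Experimental binding free energies such as the ranges quoted above are related to dissociation constants through ΔG = RT ln Kd. The Python sketch below shows the conversion in both directions. The temperature of 298.15 K is an assumption, as is the idea that the experimental ranges were derived with exactly this relation; the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol·K)


def dg_from_kd(kd_molar, temperature=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature * math.log(kd_molar)


def kd_from_dg(dg_kcal, temperature=298.15):
    """Dissociation constant (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(dg_kcal / (R_KCAL * temperature))


for dg in (-14.49, -14.27, -13.19):
    print(f"{dg:7.2f} kcal/mol -> Kd = {kd_from_dg(dg):.2e} M")
```

Under the assumed temperature, free energies near -14 kcal/mol correspond to dissociation constants in the tens of picomolar, so differences of a few kcal/mol between computed and experimental values represent large differences in affinity.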
The binding energies for KAB-18 bound to two hα4β2 models with mutations in the putative allosteric binding site were also assessed with the MM-PBSA method.
KAB-18 was computed to bind slightly more weakly to the model with a T58K mutation on the β2 subunit, with a binding energy of -5.34 kcal/mol, a 0.91 kcal/mol difference from the wild-type binding energy. An F118L mutation on the β2 subunit resulted in a positive computed binding energy of 7.31 kcal/mol. Both of these in silico mutation experiments correspond with the functional data presented in
Section 2.7.
Table 2.8. MM-PBSA binding energy calculations for epibatidine- and KAB-18-bound receptors.

                  hα4β2-WT           hα4β2-T58K       hα4β2-F118L      hα3β4-WT
epibatidine binding^a
  ∆H              -33.28 (1.01)      --               --               -32.29 (0.96)
  -T∆S            15.82 (2.07)       --               --               17.37 (1.28)
  ∆G              -17.46 (2.32)      --               --               -14.91 (1.46)
  Expt. range^b   -14.49 – -14.27    --               --               -13.25 – -13.19
  distance 1^c    2.86 (0.04)        --               --               2.83 (0.02)
  distance 2^d    7.79 (0.30)        --               --               8.65 (0.27)
KAB-18 binding in presence of epibatidine^e
  ∆H              -28.27 (2.02)      -28.56 (2.69)    -21.30 (1.67)    -14.90 (0.98)
  -T∆S            22.02 (1.77)       23.22 (1.99)     28.61 (2.28)     26.15 (2.38)
  ∆G              -6.25 (2.86)       -5.34 (3.32)     7.31 (2.59)      11.25 (3.06)
  distance 1^c    2.90 (0.06)        2.93 (0.07)      2.97 (0.11)      2.86 (0.03)
  distance 2^d    11.83 (0.21)       13.76 (0.40)     16.84 (0.34)     13.61 (0.57)

^a Binding at agonist binding site 2.
^b Experimental binding affinities calculated from data reported by Parker et al. [120]
^c Distance between the positively charged N atom in the bound epibatidine molecule and the backbone O atom of Trp148, quantifying epibatidine binding stability.
^d Cα-Cα distance between α191 and β58 at the binding interface, quantifying C loop closure.
^e Both compounds bound at agonist binding site 2.
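Each ∆G entry in Table 2.8 is the sum of the corresponding ∆H and -T∆S entries. The Python sketch below transcribes the tabulated means and verifies that bookkeeping, then computes the change in ∆G for each KAB-18 system relative to wild-type hα4β2. The data layout and tolerance are illustrative choices; the rounding tolerance allows for the two-decimal precision of the table.

```python
# (∆H, -T∆S, ∆G) in kcal/mol, transcribed from Table 2.8 (means only).
TABLE_2_8 = {
    ("epibatidine", "hα4β2-WT"): (-33.28, 15.82, -17.46),
    ("epibatidine", "hα3β4-WT"): (-32.29, 17.37, -14.91),
    ("KAB-18", "hα4β2-WT"): (-28.27, 22.02, -6.25),
    ("KAB-18", "hα4β2-T58K"): (-28.56, 23.22, -5.34),
    ("KAB-18", "hα4β2-F118L"): (-21.30, 28.61, 7.31),
    ("KAB-18", "hα3β4-WT"): (-14.90, 26.15, 11.25),
}

# ∆G = ∆H + (-T∆S); residuals should vanish within rounding of the table.
residuals = {
    key: dh + minus_tds - dg for key, (dh, minus_tds, dg) in TABLE_2_8.items()
}
print(all(abs(r) < 0.015 for r in residuals.values()))  # True

# Change in computed KAB-18 binding free energy relative to wild-type hα4β2.
wild_type_dg = TABLE_2_8[("KAB-18", "hα4β2-WT")][2]
for (ligand, receptor), (_, _, dg) in TABLE_2_8.items():
    if ligand == "KAB-18" and receptor != "hα4β2-WT":
        print(f"{receptor}: ∆∆G = {dg - wild_type_dg:+.2f} kcal/mol")
```

The T58K entry reproduces the 0.91 kcal/mol difference quoted in the text, and the table shows that the unfavorable entropic term outweighs the enthalpic term for the F118L and hα3β4 systems.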
Figure 2.13. Convergence of MM-PBSA calculations. Average free energies of binding as a function of sampling period for A. epibatidine binding to hα4β2 model B. epibatidine binding to hα3β4 model C. KAB-18 binding to epibatidine-bound hα4β2 model D. KAB-18 binding to epibatidine-bound hα4β2 T58K model E. KAB-18 binding to epibatidine-bound hα4β2 F118L model F. KAB-18 binding to epibatidine-bound hα3β4 model. Calculated energies are presented as averages starting at time 0.
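The convergence measures described for the MM-PBSA data (averages starting at time 0, the change between the first 1400 and first 1500 points, and the spread of 200-point sliding averages) can be sketched as follows in Python. The per-snapshot energies here are synthetic Gaussian noise generated with a fixed seed, standing in for real MM-PBSA output, and the function names are hypothetical.

```python
import random
from itertools import accumulate
from statistics import stdev


def cumulative_average(values):
    """Average of the first k values for k = 1 .. len(values)."""
    return [total / (i + 1) for i, total in enumerate(accumulate(values))]


def sliding_averages(values, window=200):
    """Mean of every contiguous window of `window` consecutive values."""
    if len(values) < window:
        raise ValueError("series is shorter than the window")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Synthetic stand-in for 1500 per-snapshot binding energies (kcal/mol).
rng = random.Random(0)
energies = [rng.gauss(-6.25, 3.0) for _ in range(1500)]

running = cumulative_average(energies)
drift = abs(running[1499] - running[1399])  # first 1500 vs first 1400 points
spread = stdev(sliding_averages(energies, window=200))
print(f"drift = {drift:.3f} kcal/mol, sliding-average SD = {spread:.3f} kcal/mol")
```

A small drift between the last two cumulative averages, as reported in the text, indicates that adding more snapshots would change the estimate very little.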
2.9 Mechanism of Allosteric Antagonism
As made apparent in numerous crystal structures of AChBP bound to various
ligands, C loop dynamics are an important aspect of ligand binding. When bound
to small agonists such as nicotine, acetylcholine, or epibatidine, the C loop takes
on a ‘closed’ or capped conformation, while competitive antagonists, which are
much larger than agonists, force the C loop into a more ‘open’ conformation
[102]. To track C loop dynamics in the MD simulations performed here, the Cα-
Cα distance was measured between αCys191 on the tip of the C loop and β58
(β2Thr58 / β4Lys58) on the opposite side of the interface as illustrated in Figure
2.14. These corresponding distances for 22 different AChBP crystal structures were measured (Table 2.9) and have been used to create generalized Cα-Cα ranges of C loop ‘openness’ for agonist, partial agonist, and antagonist binding in addition to unbound states which are grouped with the non-peptidic antagonists.
The general range for agonist binding, based on four X-ray structures, is 7.72 –
8.19 Å, compared to the unbound state which has a range of 15.36 – 15.72 Å based on two structures (Table 2.10).
Figure 2.14. C loop closure of AChBP bound to various ligands. Superposition of four crystal structures of AChBP in complex with various compounds to illustrate the difference in intersubunit distances between Cα of residue C191 of the α subunit C loop on the (+) side of the binding interface and Cα of residue 58 of the β subunit β2 strand on the (-) side of the interface. Only epibatidine is shown (pink surface) for clarity to highlight the ligand binding site. The tabulated Cα-Cα distances allow for quantification of the degree of C loop closure upon ligand binding.
Table 2.9. Survey of C loop closure for AChBP X-ray structures.

PDB ID   Structure        Compound name              Measurement of C        Compound type
         resolution (Å)                              loop closure (Å)^a
2WNL     2.70             anabaseine                 7.72                    agonist
2BYQ     3.40             epibatidine                7.80                    agonist
1UW6     2.20             nicotine                   7.93                    agonist
2BJ0     2.00             CXS                        8.10                    buffer
1UV6     2.50             carbamylcholine            8.16                    agonist
2BYS     2.05             lobeline                   8.19                    agonist
2BR7     3.00             HEPES                      8.33                    buffer
1I9B     2.70             HEPES                      9.30                    buffer
1UX2     2.20             HEPES                      9.34                    buffer
2WNJ     1.80             DMXBA                      9.75                    partial agonist
2WNC     2.20             tropisetron                10.13                   partial agonist
2WN9     1.75             4-OH-DMXBA                 12.30                   partial agonist
2X00     2.40             gymnodimine A              12.88                   antagonist
2BYN     2.02             PEG                        14.71                   buffer
2BYR     2.45             methyllycaconitine         14.64                   antagonist
2BG9     4.00             -                          15.36                   -
2W8E     1.90             -                          15.72                   -
2WZY     2.51             13-desmethyl spirolidine   16.05                   antagonist
1YI5     4.20             cobratoxin                 17.50                   peptidic antagonist
2BR8     2.40             α-conotoxin PNIA           18.76                   peptidic antagonist
2C9T     2.25             α-conotoxin IMI            19.13                   peptidic antagonist
2BYP     2.07             α-conotoxin IMI            19.24                   peptidic antagonist

^a Average Cα-Cα distance between residues that correspond to C191 on the C loop of nAChR α subunits on the (+) side of the binding interface and residue 58 on the β2 strand of β subunits on the (-) side of the interface.
Table 2.10. General ranges for C loop "openness" upon binding ligands of different pharmacological function.

                         average Cα-Cα range (Å)^a
agonist                  7.72-8.19
partial agonist          9.75-12.30
antagonist / unbound     12.88-16.05
peptidic antagonist      17.50-19.24

^a Average Cα-Cα distance between residues that correspond to C191 on the C loop of nAChR α subunits on the (+) side of the binding interface and residue 58 on the β2 strand of β subunits on the (-) side of the interface.
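The ranges in Table 2.10 can be used as a simple lookup for interpreting a measured Cα-Cα distance. The Python sketch below encodes the four ranges and reports which one, if any, contains a given value. Returning no label for distances that fall in the gaps between ranges is a design choice of this illustration, not a rule stated in the text; the test values are time-averaged distances reported in this chapter.

```python
# (label, lower bound, upper bound) in Å, from Table 2.10.
C_LOOP_RANGES = [
    ("agonist", 7.72, 8.19),
    ("partial agonist", 9.75, 12.30),
    ("antagonist / unbound", 12.88, 16.05),
    ("peptidic antagonist", 17.50, 19.24),
]


def classify_c_loop(distance):
    """Return the Table 2.10 category containing `distance`, or None if it falls in a gap."""
    for label, low, high in C_LOOP_RANGES:
        if low <= distance <= high:
            return label
    return None


for value in (7.78, 8.94, 12.97, 18.97):
    print(value, classify_c_loop(value))
# 7.78 agonist
# 8.94 None
# 12.97 antagonist / unbound
# 18.97 peptidic antagonist
```

Because the ranges are built from a small number of crystal structures, values that fall between them are best read as intermediate rather than unclassifiable.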
The average Cα-Cα distances from the 5 ns MD simulations of unbound agonist
binding sites (apo binding sites 1 & 2, binary complex binding site 2) generally had
values between the partial agonist and unbound ranges defined in Table 2.10,
implying more “open” C loops. An exception was observed for agonist binding
site 1 of the apo hα4β2 receptor. Here, the C loop is closed in the absence of
agonist, which may represent the closed unbound state observed by
Mukhtasimova et al. [121]. Upon agonist binding, the measured Cα-Cα distances
decreased to values in the ranges measured for agonist-bound and partial
agonist-bound receptors, consistent with structural data that implicates agonists
causing C loop closure to initiate channel opening [102]. In the bound states, the
low standard deviations indicate relatively stable C loop conformations; the
standard deviations for the time-averaged Cα-Cα distances are greater in the
unbound states.
Table 2.11. Measurements of C loop closure for MD simulations of epibatidine-bound nAChRs.

average Cα-Cα distance (Å)^a
                       hα4β2                                 hα3β4
                       agonist binding    agonist binding    agonist binding    agonist binding
                       site 1             site 2             site 1             site 2
apo                    8.94 (1.57)        15.08 (1.73)       12.68 (2.12)       15.06 (2.25)
binary complex^b       7.78 (0.46)        14.32 (3.00)       8.69 (0.48)        12.99 (1.60)
ternary complex^c      8.02 (0.45)        11.62 (1.79)       10.62 (1.35)       7.88 (0.38)

                       hα4β2                                 hα3β4
                       average            minimum            average            minimum
                       distance (Å)       distance (Å)       distance (Å)       distance (Å)
epibatidine /
KAB-18 complex^d       12.97 (1.49)       10.58              18.97 (2.95)       11.55

^a Same Cα-Cα measurement as defined in Table 2.9.
^b Single epibatidine molecule bound to agonist binding site 1.
^c Epibatidine bound to both agonist binding sites.
^d Compounds bound to agonist binding site 2.
Data averaged over 5 ns MD simulations with standard deviations in parentheses.
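Entries of the kind shown in Table 2.11 are summary statistics of a per-frame Cα-Cα distance. A minimal Python sketch of that reduction follows. The coordinates are invented for illustration, the function names are hypothetical, and the use of the sample standard deviation is an assumption since the text does not say which estimator was used.

```python
import math
import statistics


def ca_distance(p, q):
    """Euclidean distance (Å) between two Cα positions given as (x, y, z)."""
    return math.dist(p, q)


def summarize(distances):
    """Mean, sample standard deviation, and minimum of a distance series."""
    return statistics.mean(distances), statistics.stdev(distances), min(distances)


# Hypothetical Cα positions of αCys191 and β58 in three trajectory frames.
frames = [
    ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (12.0, 0.0, 0.0)),
    ((0.0, 0.0, 0.0), (14.0, 0.0, 0.0)),
]
series = [ca_distance(a, b) for a, b in frames]
average, deviation, lowest = summarize(series)
print(f"{average:.2f} ({deviation:.2f}), minimum {lowest:.2f}")
# 12.00 (2.00), minimum 10.00
```

Reporting the minimum alongside the average, as done for the KAB-18 complexes, shows whether the C loop ever approached an agonist-like closed distance during the simulation.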
In our computational KAB-18 binding studies, the dynamics of the C loop show
that even though epibatidine is forming a stable hydrogen bond with the carbonyl
oxygen of Trp148, the C loop is obstructed from closing to an agonist-bound
state due to the presence of KAB-18. The minimum Cα-Cα distances in
simulations of epibatidine- and KAB-18-bound hα4β2 and hα3β4 nAChRs were
10.58 and 11.55 Å respectively, while the average values over 5 ns of simulation were larger at 12.97 and 18.97 Å respectively. This indicates a possible mechanism of noncompetitive antagonism: inhibition of C loop closure that is required for the channel to open while not interfering with agonist binding.
Although this is a known mode of antagonism for competitive antagonists
[102,122], this is the first time a negative allosteric modulator has been suggested to act in this fashion.
Furthermore, superposition of the X-ray structure of AChBP in complex with the
α7 nAChR partial agonist 3-(2,4-dimethoxybenzylidene)-anabaseine (DMXBA)
[123] onto an MD snapshot of our equilibrated epibatidine- and KAB-18-bound
hα4β2 nAChR complex reveals interesting similarities in ligand binding (Figure
2.15). The anabaseine portion of DMXBA superimposes well with the epibatidine
molecule bound in the nAChR model, while the dimethoxybenzylidene moiety of
DMXBA branches towards the (-) surface of the subunit interface to the same
region occupied by KAB-18 in the nAChR model. Anabaseine acts as a full α7
agonist, while the addition of the dimethoxybenzylidene group reduces the level of
efficacy, transforming the molecule into a partial agonist [123]. The experimental
Cα-Cα measurements of C loop closure for anabaseine average 7.72 Å in the
bound state while DMXBA measures 9.75 Å. KAB-18 seems to share some of the nAChR binding qualities that make DMXBA an antagonist; however, KAB-18 is able to more effectively prevent C loop closure while not competing with the agonist-binding site. These similar binding features coupled with varied degrees of C loop closure can provide some insight into what may differentiate a partial agonist from a full agonist or antagonist: the pharmacological effects of a ligand binding at or near the orthosteric site are related to the degree to which the ligand induces or inhibits C loop closure.
Figure 2.15. Comparison of experimental DMXBA binding to computationally predicted KAB-18/epibatidine binding. The X-ray structure of DMXBA (orange) in complex with Aplysia californica AChBP (grey ribbon) superimposed on a hα4β2 nAChR ECD model (green and blue ribbon for α4 and β2 subunits respectively) bound to both epibatidine (purple) and the negative allosteric modulator KAB-18 (magenta). The C loops for each protein have been removed for clarity in the main figure, while the inset features the varied degree of C loop closure.
2.10 Conclusions
In conclusion, we have shown how a combination of homology modeling, molecular dynamics, and docking techniques can be used to identify the binding site of a ligand with little guiding experimental data. These techniques were specifically applied to locate and validate the binding site/mode of a class of nAChR negative allosteric modulators. Two mutations at the proposed binding site reduced the apparent affinity of KAB-18, a hα4β2-selective NAM, while not affecting the binding of a test agonist, competitive antagonist, or off-site antagonist, verifying the binding site. Finally, a survey of crystallographic structures and MD simulations of the KAB-18-bound receptor suggests that KAB-18
may act as an antagonist by preventing C-loop closure, while not affecting agonist binding.
Chapter 3. Experimental Investigation of Retinoic Acid Receptor Antagonism
3.1 Introduction
The transcription factor retinoic acid receptor (RAR) is activated by all-trans
retinoic acid (ATRA), its endogenous agonist. A major source of ATRA in the
body comes from symmetric cleavage of dietary β-carotene. Recently, it has
been discovered that asymmetric cleavage products of β-carotene, namely β-apo-14’-carotenoic acid and β-apo-13-carotenone, function as antagonists of retinoic acid receptor (RAR) gene transcription [124]. Although β-apo-14’-carotenoic acid remains a theoretical β-carotene metabolite, physiologically relevant concentrations of β-apo-13-carotenone have been identified in human plasma samples.
In the following two chapters, two major questions are addressed:
1.) What is the basis of retinoic acid receptor antagonism for β-apo-13-carotenone and β-apo-14’-carotenoic acid?
2.) What is the origin of the strong binding affinity observed between β-apo-13-carotenone and the retinoic acid receptor?
This chapter discusses the experimental work performed to address these questions, while Chapter 4 contains the computational results that complement the findings presented here.
3.2 Nuclear Receptor Background
Nuclear receptors (NRs) are metazoan, ligand-activated transcription factors responsible for the expression of gene programs related to nearly all aspects of life, including development, cell differentiation, immune response, reproduction, metabolism, and homeostasis. Unlike membrane-bound receptors, which may induce gene transcription by initializing intracellular signaling pathways, NRs are intracellular proteins that directly bind to genomic DNA at sites known as hormone response elements (HREs). Based on a genome-wide search for conserved NR domains, it has been determined that there are 48 human NRs
[125]. Of these 48 receptors, about half have known ligands, while the remainder are currently classified as “orphan” receptors. Interestingly, the number of NRs found in other model organisms with sequenced genomes has been found to vary dramatically. The fruit fly, Drosophila melanogaster, possesses only 21 NRs, while the nematode, Caenorhabditis elegans, has ~270 predicted NR genes
[126]. Retinoic acid receptor (RAR), the particular NR that is the focus of this study, was first cloned in 1987 [127,128], two years after the first human NR, the glucocorticoid receptor, was cloned [129].
3.2.1 NRs Are Ligand-Activated Transcription Factors
As with most transcription factors, the ability of an NR to bind a target DNA sequence
does not by itself allow it to mediate transcriptional activity. Most transcription
factors are modular in structure and require one domain to bind DNA, while
another interacts with transcriptional machinery (RNA polymerase) or chromatin
remodelers that in turn interact with RNA polymerase. In the case of NRs,
agonist binding promotes the recruitment of proteins called steroid receptor
coactivators (SRCs) via a C-terminal region called the activation function 2 (AF2).
Although NRs also contain a ligand-independent AF1 region in the unstructured
N-terminal domain, it is the ligand-activated AF2 region that is a defining characteristic of NRs.
3.2.1.1 NR Ligands
Most characterized NR ligands are small, hydrophobic molecules such as steroid
hormones, thyroid hormone, vitamins, or metabolic intermediates. Some of the
most well characterized NRs respond to hormones including thyroid hormones
and steroids such as estradiol, progesterone, testosterone, cortisol, and
aldosterone. Other NRs respond to nutrients such as vitamin D and vitamin A or
metabolic intermediates. Table 3.1 lists the common NRs and their known
endogenous ligands. The receptors in this table are divided into three functional
classes. In the absence of ligand, class I NRs, which primarily bind steroid
hormones, remain in the cytosol bound to heat shock proteins (HSPs). Agonist
binding causes release from the HSPs, homodimerization, and translocation into
the nucleus where the receptors preferentially bind specific HREs. Class II NRs, on the other hand, are always located in the nucleus where they are often bound to DNA. In the absence of agonist, class II NRs may be bound to corepressor proteins that actively silence surrounding genes. Upon agonist binding, corepressors dissociate from the NR and coactivators are recruited to initiate transcription. These NRs typically form heterodimers with retinoid X receptors
(RXRs), while RXR itself may form homodimers or tetramers [130]. Finally, as in most biological classification systems, there are exceptions. While most NRs function as homo- or heterodimers, some have been found to act as monomers.
Although the two examples of this class presented in Table 3.1, nerve growth factor IB-like receptors (NGFIBs) and Rev-Erb receptors, have been observed to form dimers, each may also bind as a monomer due to enhanced affinity for the core recognition DNA motif.
Table 3.1. List of common NRs and their known endogenous ligands.
Class      Full name                                     Abbreviation    Subtypes   Endogenous ligand
Class I    estrogen receptor                             ER              α, β       17β-estradiol, estrone, estriol
Class I    progesterone receptor                         PR              --         progesterone
Class I    androgen receptor                             AR              --         testosterone and dihydrotestosterone
Class I    glucocorticoid receptor                       GR              --         cortisol
Class I    mineralocorticoid receptor                    MR              --         aldosterone
Class II   thyroid hormone receptor                      TR              α, β       thyroid hormones, T3 and T4
Class II   vitamin D receptor                            VDR             --         vitamin D
Class II   retinoic acid receptor                        RAR             α, β, γ    all-trans retinoic acid (ATRA) or 9-cis-RA
Class II   retinoid X receptor                           RXR             α, β, γ    9-cis-RA
Class II   peroxisome proliferator-activated receptor    PPAR            α, β, γ    fatty acid metabolites
Class II   liver X receptor                              LXR             α, β       oxysterols
Class II   farnesoid X receptor                          FXR             --         bile acids
Class II   pregnane X receptor                           PXR             --         xenobiotics
Class III  nerve growth factor IB-like                   NGFIB, Nur77    --         --
Class III  Rev-Erb                                       Rev-Erb         α, β       heme
3.2.1.2 NR Corepressors
In the absence of a bound ligand, NRs that reside in the nucleus are often bound
to corepressors. These proteins, such as the nuclear receptor corepressor 1 (N-
CoR1) [131] or the silencing mediator for retinoid or thyroid-hormone receptors
(SMRT, also known as N-CoR2) [132], recruit histone deacetylases (HDACs).
HDACs actively remodel chromatin by removing acetyl groups from histone lysines, increasing the formal charge of the side chains and strengthening the interaction with the negatively charged DNA backbone. Ultimately, this leads to
transcriptional silencing. In addition to corepressors binding to apo receptors, a subset of NR antagonists, called inverse-agonists, can promote the recruitment
of corepressors.
Most NRs interact with corepressors very weakly; however, unliganded thyroid
hormone receptor (TR) and RAR exhibit strong repression of basal transcription
in the presence of corepressor proteins [131]. Additionally, LXR [133] and Rev-
Erb [134,135] have also been shown to interact with corepressors. Rev-Erb can
actually only act as a transcriptional repressor since it lacks the C-terminal region
that is responsible for the ligand-activated AF2 functionality. The AF2 region,
which maps to helix 12 of the NR ligand-binding domain (LBD) as described in
further detail in Section 3.2.3.4, undergoes a conformational change upon ligand
binding which has the dual effect of displacing corepressor proteins while forming
a binding pocket for coactivators.
A schematic of the functional N-CoR1 and SMRT corepressor domains is
illustrated in Figure 3.1. Here we see that there are three independent repression
domains (RD1-RD3) that are responsible for either directly recruiting HDACs or
binding to bridging proteins such as mSin3, which then recruit HDACs (HDAC1 in the case of mSin3) [136]. In spite of the large number of deacetylases found to associate with the N-CoR/SMRT corepressors, HDAC3 seems to be responsible for repressive activity [137]. Interestingly, although recombinant HDAC3 is inactive in deacetylase assays, SMRT-bound HDAC3 does display deacetylase activity. Therefore, RD2 has also been called a deacetylase activating domain
(DAD) [138]. At the C-terminus of the corepressor is the receptor interaction domain (RID) that contains two or three (I/L)XX(V/I)I interaction motifs (N-CoR1 contains three interaction motifs). Although the ~270 kDa N-CoR1 and SMRT corepressors are closely related, they do not seem to be redundant proteins, since knock-out of N-CoR1 was found to be lethal in mouse embryos [139].
Figure 3.1. Domain organization of N-CoR1 and SMRT corepressor proteins. Three repressive domains (RD1-RD3) either directly recruit histone deacetylase proteins (HDACs) or bind to proteins such as mSin3 which then binds to an HDAC. The nuclear receptor interaction domain (RID) interacts with the NR via two or three (I/L)XX(V/I)I motifs found at the C-terminal end of the corepressor.
3.2.1.3 NR Coactivators
The ligand-activated transcriptional response mediated by NRs is reliant upon
association with additional proteins in a ligand-dependent fashion. These
proteins are called coactivators and are responsible for multiple functions that
lead to downstream transcription. In particular, coactivators are associated with chromatin remodeling and recruitment of the basal transcription machinery.
Nuclear receptor coactivators are known by many names due to parallel discoveries. The first coactivators characterized belong to the p160 family of proteins. In particular, steroid receptor coactivator-1 (SRC-1) was first reported in
1995 through the use of a yeast two-hybrid screen of a human B-lymphocyte
cDNA library that used the hinge and LBD of hPR as bait [140]. In mice, a protein
homologous to SRC-1 was discovered and called nuclear receptor coactivator-1
(NCoA-1) [141]. Ultimately, the p160 family of coactivators was determined to be composed of three members. In this document, they will be referred to as steroid
receptor coactivators (SRCs), although the NCoA notation is perhaps the most
generalized name as SRCs bind to all nuclear receptors, not just those activated
by steroid hormones. It should be pointed out that SRC-2 is commonly known as
glucocorticoid receptor-interacting protein-1 (GRIP-1) or transcriptional mediators/intermediary factor 2 (TIF2), while SRC-3 has been called many names including p300/CBP/co-integrator-associated protein (p/CIP), ACTR [142],
activated in breast 1 (AIB1), receptor associated coactivator 3 (RAC3), and
thyroid hormone receptor activated molecule-1 (TRAM-1) [141].
The three p160 members share a similar domain layout as illustrated in Figure
3.2, acting as a platform for multiple protein interactions in addition to exhibiting
intrinsic histone acetyltransferase (HAT) activity [142]. The N-terminal region
contains basic helix-loop-helix (bHLH) and Per-Arnt-Sim homology (PAS)
domains, indicating dimerization and signaling abilities, respectively. The central
region contains the nuclear receptor interaction domain (RID), and the C-terminal
region contains two autonomous transactivation domains (AD1 and AD2), each
with the ability to induce transcription. Both AD regions are responsible for
binding to secondary coactivator proteins: AD1 has been mapped to the residues
responsible for binding CREB-binding protein (CBP) [143], while AD2 binds to
coactivator-associated arginine methyltransferase (CARM1, also known as
protein arginine N-methyltransferase 4, PRMT4) [144]. CBP is closely related to
another coactivator, p300, and serves as a general integrator of transcriptional
signals.
Figure 3.2. Domain organization of p160 family of coactivators. Domains identified in all three members of the p160 family of coactivators include N-terminal basic helix-loop-helix (bHLH) and Per-Arnt-Sim homology (PAS) domains, a central nuclear receptor interaction domain (RID), and C-terminal transactivation domains (AD1 and AD2). AD1 binds the cAMP response element binding protein (CREB) binding protein (CBP) while AD2 binds to coactivator-associated arginine methyltransferase 1 (CARM1/PRMT4). Histone acetyltransferase (HAT) activity has also been mapped to the C-terminal region. The RID contains three LXXLL motifs called NR-box 1-3. Additionally, SRC-1 contains a fourth LXXLL motif at the extreme C-terminus.
Coactivators specifically interact with NRs via LXXLL motifs (L=leucine, X=any amino acid) that are found in a variety of proteins, including the p160/SRC and p300/CBP coactivator families. Additional coactivators such as receptor-interacting protein 140 (RIP-140), transcription intermediary factor 1 (TIF-1), and
TRIP-1 proteins also contain the interaction motif [145]. The central RID of the
SRC proteins contains three LXXLL motifs, while SRC-1 contains a fourth motif at its extreme C-terminus. The LXXLL motifs have been called NR boxes, resulting in abbreviations such as ‘SRC-1 NR2’ to describe the second NR box motif of the SRC-1 coactivator. Studies have found that the LXXLL motifs alone are not sufficient for receptor binding, while peptides extended to include flanking regions of the motifs are more potent binders. For example, 13 and 14-residue peptides of SRC-2 NR2 and NR3 respectively have been shown to bind TRβ
[146]. In addition to increasing binding affinity, the flanking regions of the LXXLL motifs confer some level of specificity for the different NRs.
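The motif definitions used in this section translate directly into simple pattern searches. The following is a minimal sketch, assuming a protein sequence given as a one-letter string; the function name and the example peptide are hypothetical and invented for illustration only (the peptide is not a real coactivator sequence). As noted above, a motif match alone does not imply receptor binding, since flanking residues contribute to affinity and specificity.

```python
import re

# Lookahead patterns so that overlapping motif occurrences are all reported.
NR_BOX = re.compile(r"(?=(L..LL))")            # coactivator motif: LXXLL
CORNR_BOX = re.compile(r"(?=([IL]..[VI]I))")   # corepressor motif: (I/L)XX(V/I)I

def find_motifs(sequence, pattern):
    """Return (1-based start position, matched residues) for each motif hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]

# Hypothetical peptide, made up for this example.
peptide = "MSDLAKLLQEPTGILSRLLNNK"

print(find_motifs(peptide, NR_BOX))      # LXXLL hits
print(find_motifs(peptide, CORNR_BOX))   # (I/L)XX(V/I)I hits
```

A search of this kind only flags candidate NR boxes; it says nothing about which receptors a given box prefers.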
The AD1 and AD2 activities of the p160 coactivators have been linked to the chromatin modification capabilities of p300/CBP and CARM1. Like p160, p300/CBP contains intrinsic HAT activity [147,148]. Additionally, p300/CBP further recruits p300/CBP-associated factor (P/CAF) that also exhibits HAT activity [149]. While CBP and its associated factors can acetylate histone lysine residues, CARM1 can asymmetrically methylate select histone arginine side chains. CBP and CARM1 act synergistically to enhance transcriptional activity since the presence of one or the other does not result in the same amount of transcriptional activity observed when both factors are present. Additionally,
CARM1 has been found to more efficiently methylate nucleosomes that have already been acetylated [150]. Although CARM1 and CBP are generally considered secondary coactivators for NRs, CBP/p300 also contains three
LXXLL motifs and can bind directly to NRs [151]. However, this interaction results in 50-100-fold less β-galactosidase activity than that observed for p160 activity in a yeast two-hybrid assay, suggesting that the p160 coactivators are important as a platform for CBP binding [152,153].
CBP was initially identified as a coactivator of the CREB transcription factor. In addition to directly interacting with CREB and with NRs through the p160 coactivators, CBP interacts with AP-1 transcription factors (c-Jun, c-Fos), c-Myc, v-Myb, p53, Stat-1 and NF-κB. Thus, CBP is a general coactivator protein responsible for the transcriptional activity of several classes of transcription factors. Interestingly, in addition to methylating histone H3, CARM1 can methylate CBP at R600 of its KIX domain, disrupting its ability to interact with the
KID domain of CREB, thereby impairing cAMP-induced transcription. This molecular switch is thought to allow for cross-talk between the NR and cAMP signaling pathways and allow for proper distribution of the limited number of copies of nuclear CBP [150].
3.2.2 RARs Function as Heterodimers with RXRs
Like most other class II NRs, retinoic acid receptors (RARs) heterodimerize with
RXRs to form functional receptors. While the ligand-binding domain (LBD) serves as the primary platform for dimerization, a major functional consequence of
dimerization is the ability to bind to the proper DNA sites upstream of the genes
under the control of the particular NR.
3.2.2.1 Response Elements
The DNA sequences recognized by NRs are called hormone response elements
(HREs), or simply response elements (REs), and are typically found in the
promoter region of downstream genes. Two six base pair consensus sequences
have been determined: AGG/TTCA is preferentially recognized by ER, AR, GR,
and MR, and AGAACA is recognized by all other NRs. While most NRs will bind
the same response elements, variations of these sequences lead to greater
specificity for certain NRs. Additional RE specificity for most NRs is due to dimerization. In the context of dimerization, a recognition sequence is called a
‘half-site’ and the orientation of two half-sites can lead to more specific binding.
For example, half-sites may be oriented as palindromes, inverted palindromes, or direct repeats. Homodimers, such as those recognizing the AGG/TTCA half-site, have REs made up of palindromic half-sites, while most others recognize direct repeats. In addition to orientation, spacing between half-sites may also confer greater binding affinity between a specific NR and its corresponding RE. For example, the RAR-RXR dimer (in which the RAR is binding the upstream element) will preferentially bind direct repeats with a two base pair spacing
(DR2), while the RXR-TR dimer will preferentially bind DR4 REs [130].
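The direct-repeat nomenclature (DRn, where n is the number of base pairs between two half-sites in the same orientation) can be illustrated with a short search routine. This is a sketch under stated assumptions: the function name is hypothetical, the half-site is passed as a regular-expression fragment (here AG[GT]TCA, encoding the AGG/TTCA consensus written above), only one DNA strand is scanned, and the test sequences are invented for illustration.

```python
import re

def find_direct_repeats(dna, half_site="AG[GT]TCA", max_spacing=5):
    """Locate two same-orientation half-sites separated by 0..max_spacing bp.

    Returns sorted (1-based start, spacing) tuples; spacing n denotes a DRn element.
    """
    dna = dna.upper()
    hits = []
    for n in range(max_spacing + 1):
        # Lookahead keeps overlapping candidates from being skipped.
        pattern = re.compile(f"(?={half_site}[ACGT]{{{n}}}{half_site})")
        hits.extend((m.start() + 1, n) for m in pattern.finditer(dna))
    return sorted(hits)

# Invented sequences: one DR2-type and one DR4-type arrangement.
print(find_direct_repeats("TTAGGTCAAAAGTTCAGG"))
print(find_direct_repeats("AGGTCACAGGAGGTCA"))
```

Palindromic and inverted-palindromic arrangements would additionally require searching for the reverse complement of the half-site, which is omitted here for brevity.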
Finally, it has been shown that NRs that bind DNA as monomers are able to bind with sufficient affinity due to additional contacts made outside of the six base pair half-site. In the case of NGFIB, the recognition sequence is extended by two bases upstream from the half-site with adenines at the -1 and -2 positions:
AAAGGTCA [154]. While the standard six base half-site is recognized in the
major groove of the DNA, the additional contacts made by NGFIB are in the
minor groove. Overall, the half-site consensus sequences are simplified models
for NR/DNA recognition. Based on what has been observed for NGFIB, it is likely
that DNA contacts outside of the traditionally-defined HREs contribute to the
specificity for other NRs to promote the transcription of particular genes.
3.2.2.2 NR Dimerization
Although dimerization can be seen as being most important for recognition and
binding of repeating REs, it is the ligand-binding domain (LBD), not the DNA-
binding domain (DBD) that serves as the primary platform for dimerization. In the
absence of LBD, high affinity RE binding is lost [155,156]. Therefore, it is too simplistic to think of the LBD and DBD as independent domains; the domains of
a nuclear receptor work together to specifically promote the transcription of particular genes.
3.2.3 NR Structure
Numerous crystal structures of nuclear receptor (NR) ligand binding domains
(LBDs) have revealed the large conformational changes that occur upon ligand
binding. Prior to these structures, NRs were known to contain two activation
functions (AFs). The N-terminus contains the ligand-independent AF-1 for which no structural data has been reported. At the C-terminus is the ligand-dependent
AF-2 that crystallographic studies have since mapped to α-helix 12 (H12) of the
LBD.
3.2.3.1 Modular Structure
The overall structure of a NR is modular with six different regions as illustrated in
Figure 3.3. The DNA-binding domain (DBD) and ligand-binding domain (LBD) of the receptors are the most conserved, particularly the DBD, and serve as signatures when identifying NRs in newly sequenced genomes. The other regions vary in their length and sequence. The A/B region allows ligand-independent transcription, while the function of the F region, which is completely absent in some NRs, has not been established. Region D, the linker between the structurally conserved DBD and LBD, allows for communication between the two domains during dimerization in addition to granting flexibility to the two DBDs of a
NR dimer to bind the half-site repeats of HREs with different orientations or spacing [155,157].
[Figure 3.3 schematic: regions A/B, C (DBD), D, E (LBD), and F, arranged from N-terminus to C-terminus.]
Figure 3.3. Domain organization of typical nuclear receptor. The DNA-binding domain (DBD) and ligand-binding domain (LBD), C and E regions, are the most conserved and share a common fold among all NRs.
3.2.3.2 A/B Region
The A/B region is the most variable in both sequence and length. The human
RXRα A/B region is 200 residues long, while that for the human hepatocyte nuclear factor 4 γ (hHNF4γ) is only 44 residues long. Little is known about the structure of this region; it is considered an intrinsically disordered domain that can become more ordered upon interaction with a binding partner. This phenomenon has been observed in the case of ERα and GR interacting with
TATA-binding protein (TBP) [158,159]. Interestingly, the A/B region contains one
of two activation functions, AF1 and AF2, found on a typical nuclear receptor.
While AF2 is ligand-dependent, AF1 can be responsible for gene transcription in
the absence of an agonist molecule.
3.2.3.3 DNA-Binding Domain
The first structures of a DNA-binding domain (DBD) were the solution structure of
the glucocorticoid receptor in 1990 [160], followed by the crystal structure in 1991
(PDB ID: 1GLU [161]). The latter of these two structures was in complex with a
hormone response element (HRE), revealing the interaction between the DBD
and the major groove of the DNA. The highly conserved fold is composed of two
amphipathic α-helices that cross at about a 90° angle. Additionally, the DBD
includes two zinc fingers, each coordinating a Zn2+ ion with the side chains of
four cysteine residues (Figure 3.4A). The N-terminal zinc finger forms a
conventional treble clef motif, while the C-terminal zinc finger is a modified treble
clef motif [162]. As illustrated in Figure 3.4B, the first of the two helices inserts
into the major groove of the DNA, and is responsible for recognition of the HRE
sequence.
Figure 3.4. DNA-binding domain of estrogen receptor in complex with hormone response element. A. DBD monomer, highlighting its two zinc-binding motifs and two-helix composition. B. The DBD homodimer inserts into the major groove of the DNA (black and light grey), which consists of two palindromic half-sites separated by three base pairs. The DBDs are shown in ribbon representation with one monomer colored from red (N-terminus) to blue (C-terminus) and the second monomer colored grey. The structure illustrated here comes from PDB ID: 1HCQ [163].
3.2.3.4 Ligand-Binding Domain
The first published NR ligand-binding domain (LBD) crystal structure was that of the human RXRα (PDB ID: 1LBD [164]) in 1995. This structure revealed that the
LBD is composed of a novel fold consisting of 12 α-helices and a short β-hairpin.
The overall conformation has been described as an antiparallel α-helical sandwich in which helices H4, H5, H6, H8, and H9 are sandwiched between H1,
H2, and H3 on one side and H7, H10, and H11 on the other, as illustrated in
Figure 3.5A. This structure also revealed that the dimerization interface is largely
composed of residues on H10, including a highly conserved leucine residue
flanked by moderately conserved hydrophobic residues (Figure 3.5B).
Figure 3.5. Diagram of apo hRXRα LBD. A. Topology of monomer. B. Dimerization interface formed by H10. A leucine residue, highly conserved in all NRs, is shown in sphere representation.
In this structure, helix 12 (H12), which contains the activation function 2 (AF2)
region, is extended away from the core of the receptor. Subsequent agonist-
bound structures such as the hRARγ bound to ATRA (PDB ID: 2LBD, [165]), the
rat TRα1 bound to a T3 isostere (structure not available, [166]), and hERα bound
to 17β-estradiol (PDB ID: 1ERE, [167]), reveal that the tertiary structure of the
LBD is the same among different types of NR and that upon agonist binding, H12
undergoes a dramatic conformational change compared to the structure of apo hRXRα. In the agonist-bound state, the amphipathic H12 is closely associated
with the core of the receptor, in what has been called the ‘active’ conformation,
making contacts with H3, H4, H5, and the ligand, effectively sealing off one
apparent entry to the ligand-binding pocket. In addition to the reconfiguration of
H12 upon ligand binding, subsequent agonist-bound structures have shown that
H11 becomes continuous with H10, resulting in one less individual helix. In such
cases, the numbering of the secondary structures remains based on the apo
hRXRα structure; thus the C-terminal helix is still called H12 and not H11.
Based on the various crystal structures up to this point, a ‘mouse-trap
mechanism’ was proposed that suggested the dramatic conformational change in
H12 from extended to ‘closed’ was universal for ligand-induced NR
transcriptional activation [165,168]. However, subsequent crystal structures seem
to challenge this model. A particularly important report that helped clarify the
existing structural data was published by Nolte et al. in 1998 [169]. Here, two structures were reported: apo hPPARγ LBD (PDB ID: 1PRG) and rosiglitazone
(agonist)-bound hPPARγ LBD in complex with an 88 amino acid segment of the steroid receptor coactivator-1 (SRC-1) (PDB ID: 2PRG). Both structures were solved as homodimers. Superposition of the two monomers of the apo structure reveals asymmetry in the H12 position. In neither case is H12 extended in an
‘open’ conformation similar to the apo hRXRα LBD structure. Instead, H12 in one of the monomers takes on the ‘active’ conformation seen in the crystal structure of various agonist-bound LBDs, while the other H12 is still in a ‘closed’, yet slightly different conformation than the ‘active’ conformation as illustrated in
Figure 3.6. After superimposing the two structures with the ‘match’ command in
Chimera, the Cα RMSD between the two LBDs is 2.46 Å, while the Cα RMSD is reduced to 1.01 Å when the H11-H12 region is excluded from the calculation, highlighting that the two monomers share nearly identical conformations aside from the H11-H12 region.
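The effect of excluding a mobile region from an RMSD calculation can be sketched as follows. This is an illustrative example only, not the procedure used to obtain the values above: it assumes the two coordinate sets have already been superimposed, it does not re-fit after the exclusion, and the function name and toy coordinates are hypothetical.

```python
import math

def ca_rmsd(coords_a, coords_b, exclude=frozenset()):
    """RMSD over paired, already-superimposed Cα coordinates.

    coords_a, coords_b: dicts mapping residue number -> (x, y, z) in Å.
    exclude: residue numbers to omit (e.g. a mobile H11-H12 region).
    """
    shared = sorted((set(coords_a) & set(coords_b)) - set(exclude))
    if not shared:
        raise ValueError("no residues left to compare")
    total = sum(math.dist(coords_a[r], coords_b[r]) ** 2 for r in shared)
    return math.sqrt(total / len(shared))

# Toy coordinates (not taken from any PDB entry): only residue 4 differs.
monomer_a = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (2.0, 0.0, 0.0), 4: (3.0, 0.0, 0.0)}
monomer_b = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (2.0, 0.0, 0.0), 4: (3.0, 4.0, 0.0)}

print(ca_rmsd(monomer_a, monomer_b))               # all residues
print(ca_rmsd(monomer_a, monomer_b, exclude={4}))  # mobile residue excluded
```

Because a single displaced segment enters the sum as squared distances, it can dominate the overall RMSD even when the remainder of the fold is essentially identical.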
Figure 3.6. Crystal structure of apo hPPARγ. A. Apo hPPARγ LBD homodimer (PDB ID: 1PRG). One monomer is colored blue, while the other is tan. B. Superimposed monomers from 1PRG (coloring same as in A) in addition to agonist-bound hRARγ LBD (PDB ID: 2LBD, pink) highlighting the similar overall structure, yet different H12 conformations.
Shortly after the publication of the apo hPPARγ by Nolte et al., another crystal
structure of apo hPPARγ LBD was published by a different group. This time, the
molecule was solved as a monomer with H12 in an agonist-bound conformation
(PDB ID: 3PRG, [170]). These two apo PPARγ structures provide conflicting data
for the ‘mouse-trap mechanism’ hypothesis since these unliganded structures
were exhibiting ‘closed’ or even ‘active’ H12 conformations.
3.2.3.4.1 Coactivator Interactions with the LBD
The structure of the hPPARγ dimer in complex with the 88-residue coactivator peptide (PDB ID: 2PRG) revealed the binding mode for the LXXLL motifs found in the receptor interaction domain (RID) of coactivator proteins [169]. The SRC
binding pocket, as shown in Figure 3.7, is partially created by H12, helping
explain the structural basis for the ligand-dependent transcription of NRs. Ligand
binding seems to bring together two of the most conserved regions of the NR
LBDs. In a sequence alignment of 18 NRs from nine different receptor types, one
region of the LBD was notably strongly conserved: the short span composed of
the C-terminal end of H3 through H4 contains six absolutely conserved residues
and six more weakly conserved sites (Figure 3.7C; see Appendix A for full
alignment). Since this region forms the largest part of the coactivator-binding
pocket, it seems that interactions at this site are evolutionarily conserved among
the superfamily of nuclear receptors. The other portion of the coactivator-binding
pocket is formed by H12. This amphipathic helix, which contains the AF-2 region,
has high similarity among the selected sequences in addition to one absolutely
conserved glutamate. As revealed by the crystal structure, this invariant
glutamate in conjunction with an absolutely conserved lysine at the end of H3
forms a ‘charge clamp’ interaction with the receptor interaction domain (RID) of
SRC-1. When mapped to the hRARα sequence, the charge clamp residues are
K244 and E412. Both of the LXXLL RID motifs form α-helices in which the backbone atoms of the N-terminal side of the helix interact with E412, while K244 interacts with the backbone of the C-terminal side of the helix (Figure 3.7B).
Therefore, in addition to the polar interaction between side chain and backbone atoms, the dipoles of the RID helices could help stabilize interaction with the receptor via the charge clamp residues. Similar interactions were observed soon
after in structures of both the hTRβ and hERα LBDs in complex with SRC-2 NR2 peptide [146,171].
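The distinction between absolutely conserved and more weakly conserved alignment positions can be made concrete with a per-column tally. The sketch below is an assumption-laden illustration: the function name, the 80% cutoff for "strong" conservation, and the toy alignment are all hypothetical and do not reproduce the criteria or sequences used for the alignment in Appendix A.

```python
from collections import Counter

def classify_columns(aligned_seqs, strong_fraction=0.8):
    """Label each alignment column as 'absolute', 'strong', or 'variable'.

    aligned_seqs: equal-length strings; '-' marks a gap.
    strong_fraction: assumed cutoff for the most common residue's frequency.
    """
    labels = []
    for column in zip(*aligned_seqs):
        residues = Counter(res for res in column if res != "-")
        top = residues.most_common(1)[0][1] if residues else 0
        if top == len(column):
            labels.append("absolute")      # one residue in every sequence, no gaps
        elif top / len(column) >= strong_fraction:
            labels.append("strong")
        else:
            labels.append("variable")
    return labels

# Toy alignment of five invented four-residue segments.
toy_alignment = ["KLAE", "KLGE", "KIAD", "KLSE", "KLTE"]
print(classify_columns(toy_alignment))
```

A frequency cutoff of this kind ignores chemical similarity between residues; schemes based on substitution matrices would treat, for example, leucine and isoleucine as a conservative exchange.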
Figure 3.7. Crystal structure of agonist-bound hPPARγ in complex with SRC-1 NR-box 2 peptide. A. hPPARγ monomer (PDB ID: 2PRG; chain B) bound to an agonist (van der Waals surface). The ribbon is colored from red (N-terminus) to blue (C-terminus), while the coactivator peptide is in magenta. B. Close-up view of peptide interactions with the receptor. Coloring is the same as in panel A; however, the side chains of invariant residues are now shown. This includes K244 of H3 and E412 of H12, forming the ‘charge clamp’ interaction. The leucine residues of the LXXLL motif of the peptide are shown as well as backbone atoms that interact with the charge clamp residues. Residue numbering is for hRARα. C. Sequence alignment of the regions forming the coactivator-binding pocket from 18 NRs representing nine different types of receptors. Note the string of absolutely conserved residues that form the H3-H4 loop. The residues forming the charge clamp are highlighted in magenta, strongly conserved residues in yellow, and more weakly conserved residues in blue. The arginine highlighted in green forms a salt bridge with the carboxylate terminus of most RAR ligands.
3.2.3.4.2 Antagonist-Induced NR LBD Conformations
As described in more detail in Section 3.2.4, antagonist activity can be classified
several different ways. One major distinction can be made between partial and
full (or “pure”) antagonists. Whereas partial NR antagonists, more commonly called
partial agonists, promote some level of transcription above basal (unliganded)
levels, pure antagonists inhibit basal level transcription. Several crystal structures
have been solved of pure antagonists in complex with NR LBDs. These include hERα LBD bound to the selective estrogen receptor modulators (SERMs)
raloxifene (PDB ID: 1ERR, [167]) and tamoxifen (PDB ID: 3ERT, [171]) as well
as hRARα LBD bound to BMS614 (PDB ID: 1DKF, [172]). In each of these three
cases, the antagonists, which are larger than the agonists from which they were
derived, sterically inhibit H12 of the LBD from occupying the ‘active’
conformation, thereby directly preventing the formation of the coactivator binding
pocket. Furthermore, as shown in Figure 3.8, H12, which contains a degenerate
LXXLL motif itself, physically occupies the coactivator binding site.
Figure 3.8. Antagonist-induced H12 conformation. A. Structural comparison between an RAR agonist (ATRA) and antagonist (BMS614), highlighting in red the additional fragment that gives the molecule its antagonistic properties. B. ATRA-bound hRARα LBD conformation (PDB ID: 3A9E) bound to coactivator peptide (magenta). Helix 12 is colored orange. C. BMS614-bound hRARα LBD conformation (PDB ID: 1DKF) with H12 in orange. D. Overlay of panels B and C, highlighting that H12 in the antagonist-bound receptor occupies the coactivator-binding site. E. Superposition of BMS614-bound structure (rainbow ribbon) and ATRA-bound structure (light blue ribbon + magenta coactivator peptide). BMS614 is the grey molecule shown clashing with an isoleucine residue in H12 of the superimposed agonist-bound conformation.
Whereas the three antagonists described above act as pure antagonists, partial
agonists may induce yet another H12 conformation. In the crystal structure of the
hERβ LBD in complex with the partial agonist genistein (PDB ID: 1QKM, [173]),
H12 takes a conformation similar to, but slightly different from the pure antagonist
conformation. The helix occupies the coactivator binding site, yet at a different
angle than the H12 conformation induced by pure antagonists. Since genistein
does not have the bulky extension common to pure antagonists, the ligand does
not sterically preclude H12 from adopting the agonist conformation. Additional
insights on the way in which ligands can affect H12 positioning are provided by a
pair of structures of 5,11-cis-diethyl-5,6,11,12-tetrahydrochrysene-2,8-diol (THC)
bound to hERα and hERβ LBD (PDB IDs: 1L2I & 1L2J, [174]). While THC is a
hERα agonist, it acts as a pure hERβ antagonist in that it depresses transcription
below basal levels. In the THC-bound hERβ crystal structure, H12 is very similar
to the conformation induced by the ER partial agonist genistein. Like genistein,
THC does not possess a bulky side chain common to the pure antagonists
described above; therefore THC seems to act as a hERβ antagonist through a
different mechanism. This has been termed “passive antagonism” as opposed to
the “active antagonism” observed for pure antagonists [174]. Thus, one must
take care in interpreting the static conformations present in crystal structures,
since although both genistein and THC have been crystallized with the same
ERβ H12 conformation that blocks the coactivator binding pocket, coactivators are able to bind genistein-bound ERβ with much greater affinity than THC-bound
ERβ. This has been demonstrated with fluorescence polarization studies of rhodamine-labeled coactivator peptides binding to ligand-saturated ERβ LBD,
where Kd_genistein = 104 nM, Kd_apo = 215 nM, and Kd_THC = 3.3 μM, suggesting that
THC stabilizes the crystallized conformation more so than genistein [174].
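The size of the difference between these dissociation constants can be made concrete with a short calculation. The sketch below is only illustrative: it takes the three Kd values quoted above, and it assumes a temperature of 298.15 K, which is not stated for the fluorescence polarization experiments. The function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
TEMP_K = 298.15       # ASSUMPTION: temperature is not given in the text

# Coactivator peptide dissociation constants quoted above, in molar units
KD_MOLAR = {"genistein": 104e-9, "apo": 215e-9, "THC": 3.3e-6}


def binding_free_energy(kd_molar, temp_k=TEMP_K):
    """Standard binding free energy, dG = RT ln(Kd), in kcal/mol."""
    return R_KCAL * temp_k * math.log(kd_molar)


def fold_weaker(kd_a, kd_b):
    """How many fold weaker binding is with Kd a than with Kd b."""
    return kd_a / kd_b


thc_vs_genistein = fold_weaker(KD_MOLAR["THC"], KD_MOLAR["genistein"])
thc_vs_apo = fold_weaker(KD_MOLAR["THC"], KD_MOLAR["apo"])
ddg_thc_vs_genistein = (
    binding_free_energy(KD_MOLAR["THC"])
    - binding_free_energy(KD_MOLAR["genistein"])
)

print(f"THC vs genistein: {thc_vs_genistein:.1f}-fold weaker coactivator binding")
print(f"THC vs apo:       {thc_vs_apo:.1f}-fold weaker coactivator binding")
print(f"ddG (THC - genistein): {ddg_thc_vs_genistein:.2f} kcal/mol")
```

Under the assumed temperature, the roughly 32-fold difference in Kd corresponds to about 2 kcal/mol, which gives a sense of how strongly THC disfavors coactivator binding relative to genistein despite the similar crystallized H12 conformation.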
While the structures of genistein and THC in complex with hERβ reveal a new
H12 conformation that can be stabilized by either a partial agonist or pure
antagonist, a structure of RXRαF318A in complex with oleic acid has revealed that a partial agonist may also induce the pure agonist-bound H12 conformation
[172]. Like genistein, oleic acid does not sterically inhibit the agonist-bound H12 conformation. Finally, in a study of RXR modulators, it has been shown how an agonist may be progressively transformed into a partial agonist and then a pure
antagonist by lengthening a side chain that interferes with the formation of the
agonist-bound H12 conformation. In this study, crystal structures of hRXRα in
complex with three different partial agonists were solved with H12 in the agonist-
bound conformation due to co-crystallization with a coactivator peptide containing
the LXXLL motif [175]. Fluorescence anisotropy studies of the RXRα conjugated
to a C-terminal fluorescein moiety revealed that while the agonist stabilized the
conformation of the C-terminal H12, the partial agonists decreased this stability.
However, the addition of a coactivator peptide was able to increase anisotropy in
the partial agonist-bound cases to levels observed with the agonist-bound
receptor [175]. Thus, while agonists induce a specific conformation that allows for
coactivator recruitment, partial agonists or antagonists have been shown to
stabilize alternate conformations. In the case of a partial agonist, the presence of
coactivators is able to shift the equilibrium towards the active form [172,175,176].
3.2.3.4.3 Corepressor Binding
As previously discussed in Section 3.2.1.2, in the absence of ligand, some NRs
including RAR and TR are commonly found in complex with corepressor proteins
that suppress gene transcription through HDAC activity. In certain cases, an
antagonist may strengthen the interaction between a NR and corepressor
protein. Such ligands may be specifically classified as inverse agonists. The first crystal structure of a NR LBD in complex with a corepressor peptide was that for
PPARα binding to a 22-residue SMRT NR2 motif (residues 2329-2358) (PDB ID:
1KKQ, [177]). As shown in Figure 3.9A, H12 in this structure is poorly structured and loosely packed against H3, a conformation not observed in any previous crystal structure. The corepressor fragment is situated in the coactivator-binding
pocket, yet with noticeable differences. The corepressor interacts with the LBD
as an amphipathic α-helix, like the coactivator, indicating how both coregulators
recognize the same groove. However, the SMRT NR2 helix is one turn longer
than the coactivator helix (three turns instead of two), illustrating how
discrimination between the two coregulators is controlled by the H12
conformation. In the agonist-bound case, H12 is positioned to form the charge
clamp interaction with the coactivator, which puts a strict two-helical turn limit on
the motif that may bind to the receptor. However, in an inverse agonist-bound
conformation, the charge clamp is disrupted, allowing for the longer corepressor
motif to bind.
More recently, in 2010, the structure of RARα LBD bound to an inverse agonist
(BMS493) and a fragment of the N-CoR1 NR1 provided insight on why certain apo NRs, such as RAR and TR, more effectively recruit corepressors than others. In this structure (PDB ID: 3KMZ, [178]), it was revealed that the N-CoR
NR1 motif forms a helix that is one turn longer than the SMRT NR2 motif that was previously co-crystallized with PPARα as described above. To allow for this longer helix to bind, H11 in RARα not only unfolds, but forms a short anti-parallel
β-strand interaction with the corepressor (see Figure 3.9B). Thus, the ability of the H11 region of RAR, and presumably TR, to switch to a β-strand allows it to interact more favorably with the corepressor through the longer NR1 motif. Not only does this allow more interactions between the longer helical motif and the
receptor, but it also forms the β-strand interaction, which further stabilizes the NR-
corepressor complex [178].
Figure 3.9. Corepressor interactions with inverse agonist-bound NR LBDs. A. PPARα LBD interacting with SMRT NR2 peptide (PDB ID: 1KKQ, [177]). PPARα is colored from red (N-terminus) to blue (C-terminus), the corepressor peptide is magenta, and the ligand is grey. Note that H11 remains intact, while H12 is disordered, yet resolved in the crystal structure. B. RARα LBD interacting with N-CoR1 NR1 peptide (PDB ID: 3KMZ, [178]). Coloring is the same as in Panel A. In this structure, H11 is extended, forming an anti-parallel β-strand interaction with the corepressor peptide. Also note that the NR1 motif forms a longer helix.
3.2.3.4.4 Evidence Against an Extended H12 Conformation
Structural knowledge of the interaction between the SRC LXXLL motifs and the
NR LBD allowed previous crystal structures to be interpreted in a new light. For example, the apo hRXRα LBD crystal structure showed that in the absence of
ligand, H12 was extended away from the core of the LBD. However, when
examining the crystallographic contacts made between monomers, it becomes
apparent that the extended H12 is making contact with a neighboring LBD.
Superimposing the coactivator peptide-bound hPPARγ structure with the apo hRXRα structure reveals that the amphipathic H12 is mimicking the LXXLL motif
interactions as shown in Figure 3.10.
Figure 3.10. Extended helix 12 of apo hRXRα LBD interacts with coactivator binding pocket of a neighboring molecule. A. Apo hRXRα LBD monomer with extended H12 conformation (PDB ID: 1LBD). B. Crystal packing of apo hRXRα. C. Two apo hRXRα LBDs interacting. D. Agonist-bound hRARα LBD (pink) in complex with coactivator peptide (green) (PDB ID: 3A9E). E. Overlay of structures from panels C and D. F. Close-up view of panel E, highlighting the extended hRXRα H12 interacting with its neighboring molecule at the coactivator-binding site.
This new insight casts doubt on whether the extended H12 conformation found in
the apo hRXRα LBD is a state sampled by the protein under physiological
conditions or if it is simply a crystallographic artifact. Crystal structures of other
NR LBDs have been solved with extended H12s. These include the 1998 structure of ERα LBD solved as a symmetric homodimer in which both H12s were extended away from the ligand-binding pocket (PDB ID: 1A52, [179]). A significant crystal-packing artifact, in which a pair of intermolecular disulfide bonds forms between neighboring hERα dimers, allows for the unusual extended
H12 conformation for the agonist-bound receptor. Like the original apo hRXRα structure, each extended H12 is interacting with the coactivator-binding site of a neighboring receptor. Additionally, in 2010, a structure of an agonist-bound hRARα/antagonist-bound mRXRα LBD was published in which the antagonist-bound mRXRα exhibited an extended H12 conformation while H12 of the hRARα
LBD was in the typical agonist-bound conformation (PDB ID: 3A9E, [180]). Again, the extended H12 in this structure interacted with the LXXLL binding site of a neighboring mRXRα molecule. Thus, these structures perhaps highlight the general flexibility of the H12 region. However, they do not support the idea that the extended conformation is the default state in the absence of ligand, since all of these structures seem to require stabilization through interactions with neighboring receptors. Even in the structure of the apo hPPARγ dimer where both H12s were in the ‘closed’ as opposed to extended state (PDB ID: 1PRG,
Figure 3.6), the crystal contacts reveal that the monomer with the ‘active’ H12 conformation is stabilized by the H12 of the dimerization partner of a neighboring
dimer. Also, the ‘active’ H12 conformation of the apo hPPARγ monomer from
PDB ID: 3PRG seems to be influenced by a neighboring molecule. However, this
appears to be the only apo LBD crystal structure in which H12 is not interacting
with the coactivator-binding site of a neighboring molecule. Instead, H12 of one
monomer is making crystallographic contacts with the C-terminus of H3 of an
adjacent monomer. Nevertheless, it seems that nearly all reports of apo H12
conformations are influenced by neighboring molecules in the crystal lattice.
3.2.4 NR Pharmacology
In terms of NR ligands, a small molecule must possess two qualities to be
considered biologically active. First it must bind to the receptor. Second, the
binding must have an effect on its transcriptional activity. The affinity that a
molecule has for a receptor is not associated with an ability to elicit or inhibit a
response. For example, (E)-4-[2-(5,6,7,8-tetrahydro-5,5,8,8-tetramethyl-2-naphthalenyl)-1-propenyl]benzoic acid (TTNPB) is a RAR ‘superagonist’. It binds to RARs with a 10-fold weaker affinity than ATRA, yet exhibits a 1000-fold greater potency [181], clearly indicating that binding affinity between ligand and receptor is not necessarily correlated with transcriptional activity. As discussed above, a NR ligand is better characterized by the H12 conformation that it stabilizes.
Some basic pharmacological terms are defined below as they relate to NRs.
These definitions will help clarify later discussions of experimental findings in an effort to categorize β-apo-13-carotenone and β-apo-14’-carotenoic acid.
Agonist: a molecule that induces a response. As applied to NRs, the response is gene transcription above basal levels.
Partial agonist: a molecule that enhances transcription above basal levels but
not to the extent of the endogenous agonist. In the case of RARs, the
endogenous ligand is ATRA.
‘Superagonist’: a molecule that enhances transcriptional activities beyond the
levels observed with the endogenous agonist.
Inverse agonist: a molecule that induces a response opposing the agonistic
response. In the context of NRs, an inverse agonist would recruit corepressor
proteins that actively repress gene transcription via histone deacetylase (HDAC)
activity. An inverse agonist may also be considered an antagonist as it inhibits
agonist activity.
Antagonist: a molecule that inhibits agonist
responses. Partial agonists are also antagonists in that high levels may compete
with the endogenous agonist, resulting in less than maximal activity. An
antagonist may be further classified as being either ‘pure’ or ‘neutral’ if the
compound does not induce an effect opposing the agonistic response as seen in
inverse agonists.
3.2.5 β-Apocarotenoids Modulate RAR Activity
All-trans retinoic acid (ATRA), the acid form of vitamin A, is the endogenous
agonist for RARs. Plant-synthesized β-carotene is the dietary vitamin A source
for all animals; therefore pure carnivores must obtain their vitamin A from animal
stores that are often found in the form of retinol or retinyl esters. Typically, β-
carotene is centrally cleaved by β,β-carotene-15,15’-oxygenase (BCO1) to form two equivalent retinal molecules, which are important chromophores that bind to opsins to enable color vision. Retinal may be enzymatically reduced to retinol, a
storage form of vitamin A, or oxidized to retinoic acid.
It is possible that β-carotene is cleaved at sites other than the central 15-15’
double bond. For example, β,β-carotene-9’,10’-oxygenase (BCO2) can form β-apo-10’-carotenal and β-ionone [182]. Some of these asymmetric cleavage products, called β-apocarotenoids, have been found to be biologically active.
Specifically, β-apo-13-carotenone and β-apo-14’-carotenoic acid have been shown to be potent RAR antagonists [124] (Figure 3.11). β-apo-13-carotenone is particularly potent, exhibiting a binding affinity similar to ATRA (Table 3.2).
Additionally, this compound has been found at physiologically relevant concentrations in human plasma samples (3.8 ± 0.6 nM in samples from six individuals), suggesting it is a natural modulator of ATRA activity [124].
Table 3.2. Apocarotenoid binding affinity for human RAR subtypes.*
                             Binding affinity, Ki (nM)
β-apocarotenoid              hRARα    hRARβ    hRARγ
ATRA                         3 ± 1    4 ± 2    3 ± 1
β-apo-13-carotenone          5 ± 1    4 ± 2    4 ± 1
β-apo-14’-carotenoic acid    34       25       58
*As measured by radioligand displacement assays; data reproduced from [124].
Figure 3.11. β-carotene and apocarotenoids. A. β-carotene B. All-trans retinoic acid C. β-apo-14’-carotenoic acid D. β-apo-13-carotenone
The exact source of β-apo-13-carotenone is not yet known, and it has yet to be
determined if β-apo-14’-carotenoic acid is a naturally occurring β-carotene
metabolite. Nevertheless, given the structural similarities between ATRA and the
two aforementioned asymmetric cleavage products, it is interesting to find that
while ATRA is a potent agonist of RAR activity, the shorter (β-apo-13-
carotenone) and longer (β-apo-14’-carotenoic acid) apocarotenoids both act as
relatively potent antagonists. While β-apo-14’-carotenoic acid only differs from
ATRA in the length of its unsaturated hydrocarbon tail, β-apo-13-carotenone also
differs in its terminal functional group (ketone vs. carboxyl group). A large
majority of synthetic retinoids employ a carboxyl group to form a salt bridge with a conserved arginine residue (R276 in hRARα) at one end of the ligand-binding
pocket, mimicking the binding mode of ATRA. Of the 20 non-ATRA RAR ligands
available for purchase on the Tocris website, 19 contain a carboxyl group as
illustrated in Appendix B. ATRA precursors, retinol and retinal, which differ from
ATRA in that they lack the carboxyl terminus, have been measured to bind with
~4-7-fold and >200-fold weaker affinity than ATRA, respectively, while their
activities are reduced by 35- and 500-fold [183]. These findings highlight how the
oxidation of retinol to ATRA transforms a weak agonist into a strong agonist.
Therefore, it is curious that β-apo-13-carotenone, which lacks the carboxyl group
and is shorter than ATRA, exhibits such a favorable binding affinity.
This chapter addresses the experimental work carried out to understand the
activity of β-apo-13-carotenone and β-apo-14’-carotenoic acid, while the
following chapter details the computational results used to explain the
mechanism of RAR antagonism in greater detail.
3.3 NR LBD Expression and Purification
In the following sections, all discussion of RARα and RXRα refers to the ligand binding domains of the human receptors only, specifically residues 182-421 for
RARα and 213-452 for RXRα. No experiments made use of constructs containing additional domains. In all cases, the proteins contained an N-terminal
6xHis-tag, unless noted otherwise (e.g. ∆His-RXRα).
3.3.1 RARα LBD Expression and Purification
Plasmid containing the human RARα LBD (residues 182-421, MW = 30060.6699
Da, sequence found in Appendix C) with an N-terminal His-tag in the pET28a
vector, encoding kanamycin resistance, was obtained as a gift from Noa Noy,
Case Western Reserve University. This represents the full LBD, excluding the 41-
residue C-terminal F domain. The plasmid was transformed into BL21-
Gold(DE3) cells that were grown to an optical density of ~0.6 in lysogeny broth
(LB) with 0.05 mg/ml kanamycin. Protein expression was then induced with 0.5 M
isopropyl β-D-1-thiogalactopyranoside (IPTG) which promotes BL21(DE3) T7
RNA polymerase expression and therefore RARα LBD expression as it is
controlled by a T7 promoter in the pET28a vector. The protein was expressed for
5 hours at 30°C while shaking at 225 rpm. Following expression, the cells were
pelleted in a Beckman centrifuge with a JLA 16.25 rotor for 20 minutes at 4,000
rpm and 4°C.
The pelleted cells from 2 L of culture were resuspended in a solution containing
20 mM Tris-HCl, 500 mM NaCl, 5 mM imidazole, and one tablet of Mini, EDTA-
free Roche cOmplete Protease Inhibitor Cocktail, pH 8.0, then frozen overnight at
-80°C. The following day, the cells were thawed at 4°C and lysozyme and
deoxyribonuclease (DNase) were added to final concentrations of 0.1 and 0.001
mg/ml respectively prior to lysis via sonication. Cellular debris from the lysate
was pelleted with an Eppendorf 5810R centrifuge at maximal speed for 60
minutes at 4°C. The supernatant was filtered through glass wool then a 0.22 μm filter prior to incubation with ~1 ml of Ni Sepharose™ 6 Fast Flow resin for affinity
chromatography of the His-tagged protein. The protein was incubated with the
resin overnight on a rotary mixer at 4°C.
The protein was then purified by affinity chromatography by loading the nickel
resin onto a column and washing with increasing concentrations of imidazole
(wash buffer 1: 20 mM Tris-HCl, 500 mM NaCl, 5 mM imidazole; wash buffer 2:
wash buffer 1 + 50 mM imidazole; elution buffer: wash buffer 1 + 200 mM
imidazole). The column was first washed with 5x5 ml of wash buffer 1, followed
by 3x5 ml of wash buffer 2. Finally, the protein was eluted with 5x5 ml of elution
buffer. The purity of the elution fractions was analyzed by sodium dodecyl sulfate
polyacrylamide gel electrophoresis (SDS-PAGE) as illustrated in Figure 3.12.
Figure 3.12. SDS-PAGE of His-hRARα LBD. His-RARα LBD purified from 2 L of BL21(DE3) cell culture induced with 0.5 M IPTG for 5 hours at 30°C. Lane 1: molecular weight markers; Lane 2: flow through; Lanes 3-6: 5 ml buffer washes with 5 mM imidazole; Lanes 7-9: 5 ml buffer washes with 50 mM imidazole; Lanes 10-15: 5 ml elution fractions with 200 mM imidazole.
The elution fractions were further purified via size exclusion chromatography with a HiLoad 16/60 Superdex 200 prep grade column (GE Healthcare Life Sciences).
All 25 ml of the elution fractions were concentrated to < 3 ml with an Amicon Ultra
10,000 Da nominal molecular weight cut-off centrifugal filter (Millipore), so that the entire sample could be loaded onto the column in a single run. During the fast protein liquid chromatography (FPLC) run, the protein was exchanged into a buffer containing 10 mM Tris, 150 mM NaCl, and 1 mM tris(2-carboxyethyl)phosphine (TCEP), pH 7.5. A sample FPLC chromatogram of purified RARα LBD is presented in Figure 3.13. The major peak is found at a volume of 85.91 ml with a slight shoulder at 79.27 ml. RARα LBD has an expected molecular weight of 30,060.6699 Da. When compared to the elution profiles of protein standards of known molecular weight (see Appendix E), we find that the RARα peaks correspond to monomer and dimer fractions. When collected in 5 ml fractions, the protein eluted as a monomer in fractions 17-20, which were named fractions B8-B5. RARα LBD dimers were only observed when the protein solution was highly concentrated, which could explain some of the binding stoichiometries observed in ITC experiments carried out at high concentrations (see Section 3.6, Appendix F). Typical RARα LBD yields are around 25 mg of purified protein from 2 L of cell culture.
Figure 3.13. Size exclusion chromatogram for His-hRARα LBD. Chromatogram of His-hRARα LBD run on a HiLoad 16/60 Superdex 200 column. Dashed lines indicate the elution volumes of BSA (67 kDa), ovalbumin (43 kDa), and chymotrypsinogen A (25 kDa) protein standards (see Appendix E for details).
3.3.2 RXRα LBD Expression and Purification
Plasmid containing the human RXRα LBD (residues 214-452, MW = 28988.3574
Da, sequence found in Appendix D) with an N-terminal His-tag in the pET15b vector, encoding ampicillin resistance, was obtained from Xu Shen, Shanghai
Institute of Materia Medica who originally obtained it from Eric Xu, Van Andel
Research Institute. The plasmid was transformed into BL21-Gold(DE3) and expressed in the same manner as RARα LBD except for the use of 0.05 mg/ml ampicillin instead of kanamycin as a selection agent. An example SDS-PAGE
result is found in Figure 3.14 and an example size exclusion chromatogram is
found in Figure 3.15.
The two most significant differences noted between the RARα and RXRα LBD expression are the protein yields and the oligomeric states observed in the FPLC
profiles. While a typical RARα LBD yield from 2 L of BL21(DE3) cells is ~25 mg, a much larger amount of RXRα LBD is obtained with the same methods. Typical
RXRα LBD yields are ~50 mg, roughly twice that of RARα LBD. However, as illustrated in Figure 3.15, less than half of this protein is in the monomeric state.
Based on the elution volume of protein standards, RXRα LBD seems to predominantly elute as a tetramer (expected mass of 115.9 kDa), unlike RARα
LBD, which is predominantly monomeric. Additional chromatograms in Section
3.3.3 reveal a dimer peak between the two peaks in Figure 3.15, verifying the tetrameric state of RXRα LBD.
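The oligomer assignments above rest on simple mass arithmetic, which can be sketched as follows. The monomer masses are those quoted in Sections 3.3.1 and 3.3.2 and the standard masses are those named in the figure captions; the helper names are hypothetical, and this is only a bookkeeping aid, not a calibration of the column (elution volumes of the standards are given in Appendix E and are not reproduced here).

```python
# Monomer masses (Da) of the His-tagged LBDs, as quoted in the text
MONOMER_MASS_DA = {"RARα": 30060.6699, "RXRα": 28988.3574}

# Protein standards named in the chromatogram captions (kDa)
STANDARDS_KDA = {
    "chymotrypsinogen A": 25.0,
    "ovalbumin": 43.0,
    "BSA": 67.0,
    "aldolase": 158.0,
}


def oligomer_mass_kda(monomer_da, n_subunits):
    """Expected mass of a homo-oligomer in kDa."""
    return n_subunits * monomer_da / 1000.0


def bracketing_standards(mass_kda):
    """Return the names of the nearest standards below and above a mass."""
    below = [n for n, m in STANDARDS_KDA.items() if m <= mass_kda]
    above = [n for n, m in STANDARDS_KDA.items() if m > mass_kda]
    lower = max(below, key=STANDARDS_KDA.get) if below else None
    upper = min(above, key=STANDARDS_KDA.get) if above else None
    return lower, upper


rxr_tetramer_kda = oligomer_mass_kda(MONOMER_MASS_DA["RXRα"], 4)
rxr_dimer_kda = oligomer_mass_kda(MONOMER_MASS_DA["RXRα"], 2)
heterodimer_kda = (MONOMER_MASS_DA["RARα"] + MONOMER_MASS_DA["RXRα"]) / 1000.0

for label, mass in [
    ("RXRα dimer", rxr_dimer_kda),
    ("RXRα tetramer", rxr_tetramer_kda),
    ("RARα/RXRα heterodimer", heterodimer_kda),
]:
    print(f"{label}: {mass:.1f} kDa, between {bracketing_standards(mass)}")
```

The tetramer mass falls between the BSA and aldolase standards, whereas the dimer falls between ovalbumin and BSA, which is the ordering expected for the peaks discussed above.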
Figure 3.14. SDS-PAGE for His-hRXRα LBD. His-hRXRα LBD purified from 2 L of BL21(DE3) cell culture induced with 0.5 M IPTG for 5 hours at 30°C. Lane 1: molecular weight markers; Lane 2: flow through; Lanes 3-7: 5 ml washes with 5 mM imidazole; Lanes 8-10: 5 ml washes with 50 mM imidazole; Lanes 11-15: 5 ml elution fractions with 200 mM imidazole.
Figure 3.15. Size exclusion chromatogram for His-hRXRα LBD. Chromatogram of His-hRXRα LBD run on a HiLoad 16/60 Superdex 200 column. Dashed lines indicate the elution volumes of aldolase (158 kDa), BSA (67 kDa), ovalbumin (43 kDa), and chymotrypsinogen A (25 kDa) protein standards (see Appendix E for details).
3.3.3 RARα/RXRα LBD Heterodimer Purification
Since neither RARα nor RXRα works alone to transcribe genes, RARα/RXRα LBD dimers were purified to test whether any of our compounds inhibit the observed transcriptional activity by disrupting RAR/RXR dimerization. Initially, a copurification method was attempted to isolate RAR/RXR LBD heterodimers by creating a His-tag deletion mutant of RXR LBD. By incubating the cell lysates
containing His-RARα LBD and ∆His-RXRα LBDs together with a Ni Sepharose
resin, we hoped to purify His-RARα/RXRα LBD dimers in a single purification
step. During purification, it was thought that excess ∆His-RXRα would be washed
away leaving only His-RARα/∆His-RXRα heterodimers immobilized by the
column. Thus, the His-tag on RXRα LBD was removed using the QuikChange II
XL Site-Directed Mutagenesis Kit from Stratagene with the following primers:
RXR ∆His Forward: AAGAAGGAGATATACCATGACCAGCAGCGCCAAC
RXR ∆His Reverse: GTTGGCGCTGCTGGTCATGGTATATCTCCTTCTT
The mutagenesis removed 60 bases that encoded for the His-tag and thrombin
cleavage site. Resulting colonies were cultured and the plasmid DNA was
purified with a Qiagen Miniprep kit. The mutation was successful with high efficiency based on sequencing results from the Plant-Microbe Genomics Facility at OSU (eight of eight selected colonies had the His-tag properly removed).
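As a quick consistency check on the deletion primers, the sketch below confirms that the reverse primer is the reverse complement of the forward primer and that the forward primer places the start codon directly in front of the first RXRα LBD residues. The primer sequences are those listed above; the function names and the reduced codon table (covering only the codons present in this primer) are illustrative assumptions, as is the identification of the deleted residues with the 20 tag residues that follow the initiator methionine in the construct shown in Section 3.3.4.

```python
FORWARD = "AAGAAGGAGATATACCATGACCAGCAGCGCCAAC"
REVERSE = "GTTGGCGCTGCTGGTCATGGTATATCTCCTTCTT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the codons that occur downstream of the start codon in this primer
CODON_TABLE = {"ATG": "M", "ACC": "T", "AGC": "S", "GCC": "A", "AAC": "N"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate_from_start(seq):
    """Translate from the first ATG to the end of the sequence."""
    start = seq.find("ATG")
    if start < 0:
        raise ValueError("no start codon found")
    coding = seq[start:]
    usable = len(coding) - len(coding) % 3
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))


# ASSUMPTION: the deleted residues are those between the initiator Met
# and the first LBD residue in the tagged construct (Section 3.3.4).
DELETED_RESIDUES = "GSSHHHHHHSSGLVPRGSHM"

print(reverse_complement(FORWARD) == REVERSE)
print(translate_from_start(FORWARD))
print(len(DELETED_RESIDUES) * 3, "bases deleted")
```

The translated product begins with MTSSAN, matching the start of the RXRα LBD sequence, and 20 codons account for the 60 deleted bases.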
To test the heterodimer purification strategy, 2 L of His-RARα LBD and 2 L of
∆His-RXRα LBD cell lysates were incubated overnight with ~1 ml of Ni
Sepharose™ 6 Fast Flow resin and the protein was purified following the protocol described in Section 3.3.1. SDS-PAGE analysis of the elution fractions clearly shows the presence of both His-RARα and ∆His-RXRα LBDs (Figure 3.16); however, the FPLC chromatogram shows that the eluted protein formed a mixture of LBD monomers, dimers, and tetramers (Figure 3.17).
Figure 3.16. SDS-PAGE of His-hRARα/∆His-hRXRα LBD heterodimer purification. His- hRARα/∆His-hRXRα LBD purified from 2 L of BL21(DE3) cell culture induced with 0.5 M IPTG for 5 hours at 30°C. Lane 1: molecular weight markers; Lane 2: flow through; Lanes 3-7: 5 ml washes with 5 mM imidazole; Lanes 8-10: 5 ml washes with 50 mM imidazole; Lanes 11-15: 5 ml elution fractions with 200 mM imidazole.
Figure 3.17. Size exclusion chromatogram for His-hRARα/∆His-hRXRα LBD purification. Chromatogram of His-hRARα/∆His-hRXRα LBD run on a HiLoad 16/60 Superdex 200 column. Initial profile (purple) shows mixture of monomer, dimer, and tetramer, as well as aggregate or precipitate at ~45 ml. Dimers could be purified by rerunning specific fractions.
The formation of RAR/RXR tetramers with 2:2 or 1:3 RAR:RXR stoichiometry
was not anticipated, complicating this method of RAR/RXR dimer purification. By
rerunning selected fractions of the initial gel filtration, dimers could ultimately be
purified from the monomers and tetramers. However, this often required several
runs over the Superdex 200 column, making this method inefficient.
Instead, it seems that a more effective way to purify RARα/RXRα heterodimers is to
first purify RARα and RXRα monomers individually, then combine equimolar
amounts of each LBD before purifying them via gel filtration. A sample
chromatogram of such a preparation is illustrated in Figure 3.18.
Figure 3.18. Purification of RARα/RXRα heterodimers. A. FPLC chromatogram of His-hRARα added to a molar equivalent of His-hRXRα. B. Zoomed in chromatogram of plot in panel A (purple) compared to the profile of the B8 fraction from the His-hRARα/∆His-hRXRα sample in Figure 3.17, repurified on the size exclusion column (yellow). A 1:4 hRARα:hRXRα sample (green) is included to indicate the elution volumes of monomers, dimers, and tetramers.
When compared to the B8 fraction of the His-RAR/∆His-RXR experiment (Figure
3.18B, yellow), this alternate method of RAR/RXR purification results in greater
purity, as evidenced by a smaller shoulder in the elution profile, while yielding
nearly the same amount of total dimer. This comes at the cost of an additional FPLC run.
3.3.4 His-tag Cleavage
The hRARα LBD in the pET28a vector contains an additional 27 N-terminal residues that contain a 6xHis-tag and a thrombin cleavage site (LVPR↓GS).
MGSSHHHHHHSSGLVPRGSHMESYTLTPEVGEL
    └────┘   └────┘ └──────────► hRARα LBD
Likewise, the hRXRα LBD in the pET15b vector contains an additional 21 N-
terminal residues, also with a 6xHis-tag and thrombin cleavage site.
MGSSHHHHHHSSGLVPRGSHMTSSANEDMP
    └────┘   └────┘ └───────► hRXRα LBD
The extra residues on the N-termini of the proteins can pose problems in some
cases. For example, the disordered tails may prevent the formation of
protein crystals during X-ray crystallography screening. In mass
spectrometry experiments, it has been shown that His-tagged proteins expressed in BL21(DE3) E. coli cells can become spontaneously gluconoylated or phosphogluconoylated, resulting in extra masses of +178 Da or +258 Da, respectively [184]. To cleave the His-tag from the recombinant protein, 250 units of thrombin (Amersham Biosciences, Amersham, UK) were added to the His-
RARα/His-RXRα LBD elution fractions from the Ni resin purification.
Thrombin has a reported optimal enzymatic activity at pH 8.3 and contains disulfide bonds. Therefore, to cleave the His-tag from the RARα or RXRα LBDs, it is most appropriate to add the thrombin after the initial Ni-affinity purification in which the elution buffer is at pH 8.0 and contains no reducing agents such as
DTT, β-mercaptoethanol, or TCEP. Additionally, although thrombin activity is reduced at lower temperatures, the His-tag cleavage was carried out at 4°C to
minimize protein precipitation. Also, the sample was not mixed during the
cleavage reaction, since earlier experiments revealed that the RARα LBD
samples were very sensitive to agitation, resulting in the visible accumulation of protein precipitate.
The expected mass difference between His-tagged and thrombin-cleaved RARα or RXRα LBD is ~1.88 kDa. This mass difference could be resolved via SDS-
PAGE with a 15% acrylamide content. As illustrated in Figure 3.19, when adding
250 units of thrombin to the His-hRARα LBD FPLC elution fractions, the His-tags were removed after ~2 days when incubating at 4°C without stirring.
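The expected mass shift upon cleavage can be reproduced from the N-terminal sequence shown above. The following is a minimal sketch under two stated assumptions: thrombin cuts between R and G of the LVPR↓GS site, and the initiator methionine is still present on the tagged protein. The residue masses are standard average values; the function names are hypothetical.

```python
# Average residue masses (Da) for the amino acids present in the tag
RESIDUE_MASS_DA = {
    "G": 57.0519, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "L": 113.1594, "M": 131.1926, "H": 137.1411, "R": 156.1875,
}

TAGGED_N_TERMINUS = "MGSSHHHHHHSSGLVPRGSHM"  # residues preceding the LBD
THROMBIN_SITE = "LVPRGS"                     # cleavage occurs after the R


def thrombin_cleave(seq, site=THROMBIN_SITE):
    """Split seq at the thrombin site into (released, retained) fragments."""
    idx = seq.find(site)
    if idx < 0:
        raise ValueError("thrombin site not found")
    cut = idx + 4  # position immediately after LVPR
    return seq[:cut], seq[cut:]


def residue_mass_sum(peptide):
    """Sum of residue masses; equals the mass lost from the tagged protein."""
    return sum(RESIDUE_MASS_DA[aa] for aa in peptide)


released, retained = thrombin_cleave(TAGGED_N_TERMINUS)
mass_shift_da = residue_mass_sum(released)

print(f"released fragment: {released} ({len(released)} residues)")
print(f"residues left on the LBD N-terminus: {retained}")
print(f"expected mass difference: {mass_shift_da / 1000:.2f} kDa")
```

Because hydrolysis adds one water across the two products, the difference between the tagged and cleaved protein masses equals the sum of the residue masses of the released fragment, which is about 1.88 kDa under these assumptions.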
Figure 3.19. SDS-PAGE of His-hRARα LBD thrombin cleavage. Fractions were taken at ~0h, 6h, 18h, 23h, 42h, 73h, and 91h. Thrombin, with a molecular weight of 36 kDa, can be seen as the faint bands above the RARα LBD.
3.4 Circular Dichroism Experiments
Since the RAR/RXR ligands are all very hydrophobic, they are insoluble in aqueous solutions. Therefore, all compounds were dissolved in ethanol. Typical experiments used a 2-3 fold molar excess of the ligands added to protein samples to ensure saturated binding. If the ligands were at low concentrations
and our protein at high concentrations, a large volume of ligand and therefore
ethanol would need to be delivered to the protein. This caused concern that a
high percentage of ethanol in the protein samples would lead to protein denaturation and/or precipitation. To test how much ethanol RARα tolerates in solution, a circular dichroism (CD) experiment was conducted.
The RARα sample used in the experiments was purified as described in Section
3.3.1 in 10 mM Tris-HCl, 150 mM NaCl, and 1 mM TCEP at pH 7.5. The sample was determined to have a concentration of 0.57 mg/ml (18.94 μM) based on UV absorbance at 280 nm. The molar extinction coefficient of the LBD was calculated based on amino acid content of the LBD [185]:
ε280 = #Trp*5500 + #Tyr*1490 + #Cystine*125
His-hRARα LBD contains 1 tryptophan residue, 4 tyrosines, and 0 disulfide
bonds for a molar extinction coefficient of 11,460 M⁻¹ cm⁻¹ at 280 nm. An Aviv
62DS circular dichroism spectrophotometer was used for the experiments with a
0.1 cm path-length quartz cuvette. Readings were collected at 25°C from 260 to
190 nm at 0.5 nm steps with 5 s signal averaging at each wavelength. Final data
are the average of three independent runs from which baseline data of the
cuvette with only buffer has been subtracted.
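The extinction coefficient calculation and its use in estimating concentration can be sketched as below. The coefficients per residue are those in the expression above and the residue counts are those stated for His-hRARα LBD. The absorbance value and the 1 cm path length used in the example are hypothetical, chosen only to illustrate the Beer-Lambert step, and are not measurements from this work.

```python
def extinction_coefficient_280(n_trp, n_tyr, n_cystine):
    """Molar extinction coefficient at 280 nm in M^-1 cm^-1."""
    return n_trp * 5500 + n_tyr * 1490 + n_cystine * 125


def molar_concentration(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/L."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a280 / (epsilon * path_cm)


# Residue counts stated in the text for His-hRARα LBD
EPSILON_RAR = extinction_coefficient_280(n_trp=1, n_tyr=4, n_cystine=0)

# HYPOTHETICAL absorbance reading, for illustration only
example_a280 = 0.5
example_conc_uM = molar_concentration(example_a280, EPSILON_RAR) * 1e6

print(f"epsilon(280 nm) = {EPSILON_RAR} M^-1 cm^-1")
print(f"A280 = {example_a280} corresponds to {example_conc_uM:.2f} uM")
```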
The cuvette was filled with 300 μl of sample, which was diluted with 0%, 5%,
7.5%, 10%, 15%, and 20% ethanol. Output from the spectrophotometer is given
as ellipticity in millidegrees (mdeg), which does not account for sample concentration
and is therefore inappropriate for the purposes of this experiment. Since the
sample concentration is diluted with the addition of various amounts of ethanol,
the CD data was converted to molar ellipticity so that the different samples could
be compared:
Molar ellipticity, [θ] = θ / (10 × concentration × path length)   (deg M⁻¹ m⁻¹)
Here, the measured ellipticity, θ, is in units of mdeg, concentration is in M, and path length is in cm. The baseline-corrected molar ellipticity spectra for the protein samples with increasing amounts of ethanol are found in Figure 3.20.
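The conversion can be sketched in a few lines. This is an illustration under stated assumptions rather than the processing script used for Figure 3.20: the raw ellipticity value is hypothetical, and "x% ethanol" is assumed to mean that ethanol makes up that fraction of the final sample volume, so that the protein concentration scales by (1 - x). The stock concentration and path length are those given above.

```python
def molar_ellipticity(theta_mdeg, conc_molar, path_cm):
    """Convert ellipticity in mdeg to molar ellipticity (deg M^-1 m^-1).

    The factor of 10 combines mdeg -> deg (1/1000) and cm -> m (x100).
    """
    if conc_molar <= 0 or path_cm <= 0:
        raise ValueError("concentration and path length must be positive")
    return theta_mdeg / (10.0 * conc_molar * path_cm)


def diluted_concentration(stock_molar, ethanol_fraction):
    """ASSUMPTION: ethanol_fraction is the fraction of the final volume."""
    if not 0.0 <= ethanol_fraction < 1.0:
        raise ValueError("ethanol fraction must be in [0, 1)")
    return stock_molar * (1.0 - ethanol_fraction)


STOCK_MOLAR = 18.94e-6  # protein stock concentration from the text
PATH_CM = 0.1           # cuvette path length from the text
RAW_THETA_MDEG = -20.0  # HYPOTHETICAL instrument reading

for fraction in (0.0, 0.05, 0.20):
    conc = diluted_concentration(STOCK_MOLAR, fraction)
    value = molar_ellipticity(RAW_THETA_MDEG, conc, PATH_CM)
    print(f"{fraction:.0%} ethanol: c = {conc * 1e6:.2f} uM, [theta] = {value:.3e}")
```

The same raw reading maps to a larger magnitude of molar ellipticity as the sample becomes more dilute, which is why the raw mdeg output cannot be compared directly across ethanol concentrations.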
Figure 3.20. Circular dichroism spectrum for His-hRARα LBD in solution with varied amounts of ethanol.
From the numerous nuclear receptor LBD crystal structures, it is known that
these domains are almost completely α-helical in structure. This coincides with
the RARα LBD CD spectra in Figure 3.20, where characteristic α-helical spectra
are observed with minima at 208 and 222 nm. To determine the amount of
precipitated or unfolded protein at varying levels of ethanol, the molar ellipticities at
208 and 222 nm were compared to those of the sample without any added ethanol to estimate the percentage of folded protein remaining in the solution. The results for this analysis are found in Figure 3.21 and Table 3.3.
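The comparison described here amounts to a ratio of molar ellipticities at each wavelength. A minimal sketch follows; the numerical inputs are hypothetical placeholders rather than values read from the spectra, and the function name is an assumption.

```python
def percent_folded(molar_ellip_sample, molar_ellip_reference):
    """Percentage of helical signal retained relative to the 0% ethanol sample."""
    if molar_ellip_reference == 0:
        raise ValueError("reference molar ellipticity must be non-zero")
    return 100.0 * molar_ellip_sample / molar_ellip_reference


WAVELENGTHS_NM = (208, 222)

# HYPOTHETICAL molar ellipticities (deg M^-1 m^-1), for illustration only
reference = {208: -1.00e6, 222: -0.95e6}  # no added ethanol
sample = {208: -0.90e6, 222: -0.76e6}     # with added ethanol

folded = {
    wl: percent_folded(sample[wl], reference[wl]) for wl in WAVELENGTHS_NM
}

for wl in WAVELENGTHS_NM:
    print(f"{wl} nm: {folded[wl]:.1f}% of the reference signal")
```

Note that this ratio cannot distinguish unfolded protein from protein lost to precipitation, since both reduce the helical signal remaining in solution.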
Figure 3.21. Percentage of folded His-hRARα LBD with added ethanol.
Table 3.3. Percentage of folded His-hRARα LBD with added ethanol.
                          % Ethanol added
Wavelength      0       5       7.5     10      15      20
208 nm          100     99.6    88.3    73.8    29.1    7.7
222 nm          100     99.2    91.8    83.3    38.2    18.8
Based on the results of this experiment, adding ethanol to a final volume
of no greater than 5% results in minimal loss of folded
protein and is therefore an acceptable limit when adding ethanol-dissolved
ligands to RARα LBD in future experiments.
3.5 Dimerization LC Experiments
The transcriptional ability of a nuclear receptor is partly based on its ability to
localize to its target genes, which requires recognition of a response element
(RE). As described in Section 3.2.2.1, most REs are formed by two six-base half-sites separated by 0 to 5 bases. Thus, dimerization, which is largely driven by the
LBD, is essential for RE binding. To test whether the antagonistic effect of β-apo-13-carotenone and β-apo-14’-carotenoic acid is related to the disruption of
RAR/RXR dimerization, a size exclusion chromatography experiment was performed.
RARα/RXRα heterodimers were prepared by mixing 0.36 ml of 46 μM RARα with
0.49 ml of 33.8 μM RXRα for a final RARα/RXRα concentration of 19.5 μM.
Heterodimer samples were treated with 5 μl of either 15.53 mM β-apo-13-carotenone or 11.23 mM β-apo-14’-carotenoic acid, a 4.7- and 3.4-fold molar excess, respectively. The final volume of ethanol in each case was 0.58%, which,
based on the CD experiments in Section 3.4, should have a minimal effect on
the stability of the folded protein. Samples were mixed by pipetting then
incubated for at least 1 hour on ice prior to loading onto a HiLoad 16/60 Superdex 200
gel filtration column, in which the protein buffer (10 mM Tris-HCl, 150 mM NaCl,
1 mM TCEP, pH 7.5) was used as the mobile phase with a flow rate of 1 ml/min.
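The concentrations and molar excesses quoted above can be checked with simple mixing arithmetic. The sketch below is an illustration under two assumptions not stated outright in the text: that each ligand was added to the full 0.85 ml heterodimer mixture, and that the ligand stocks were dissolved in pure ethanol. The function names are illustrative only.

```python
def nmol_from_ml_um(volume_ml: float, conc_um: float) -> float:
    """Amount in nmol from a volume in ml and a concentration in uM."""
    return volume_ml * conc_um


def nmol_from_ul_mm(volume_ul: float, conc_mm: float) -> float:
    """Amount in nmol from a volume in ul and a concentration in mM."""
    return volume_ul * conc_mm


# Protein mixing step from the text.
rar_nmol = nmol_from_ml_um(0.36, 46.0)
rxr_nmol = nmol_from_ml_um(0.49, 33.8)
total_ml = 0.36 + 0.49
# The heterodimer amount is limited by the less abundant partner.
dimer_nmol = min(rar_nmol, rxr_nmol)
dimer_um = dimer_nmol / total_ml

# Ligand additions: 5 ul of each stock (assumed to be in 100% ethanol).
ligand_ul = 5.0
excess_apo13 = nmol_from_ul_mm(ligand_ul, 15.53) / dimer_nmol
excess_apo14 = nmol_from_ul_mm(ligand_ul, 11.23) / dimer_nmol
ethanol_percent = 100.0 * ligand_ul / (total_ml * 1000.0 + ligand_ul)

print(f"heterodimer: {dimer_um:.1f} uM")
print(f"molar excess: {excess_apo13:.1f}x and {excess_apo14:.1f}x")
print(f"ethanol: {ethanol_percent:.2f}% v/v")
```

Under these assumptions the script gives values that agree with those reported in the text.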
The resulting chromatograms (Figure 3.22) indicate that dimerization is not
disrupted by treatment with either β-apo-13-carotenone or β-apo-14’-carotenoic acid. A reference chromatogram of a 1:4 ratio of RAR:RXR was used to identify the elution volumes of tetramers (~72 ml), dimers (~82 ml), and monomers (~88 ml). In both cases, the antagonist-treated heterodimers eluted as dimers.
Although slight shoulders are observed in the dimer peaks for the antagonist-treated samples, indicating a small population of monomer, a similar shoulder is observed in the control case. The shoulders are slightly more prominent in the antagonist-treated cases than in the untreated control, most likely because the apocarotenoids contribute additional
absorbance at 280 nm, which is apparent when comparing the relative peak
absorbance of the dimers at ~82 ml.
Figure 3.22. Size exclusion chromatograms of hRARα/hRXRα LBD complexes. Heterodimer complexes were prepared with excess amounts of β-apo-13-carotenone (red) or β-apo-14’-carotenoic acid (blue), which both elute as dimers, similar to the untreated control (black). A 1 hRARα : 4 hRXRα mixture is included as a reference for the elution volumes of tetramers (~72 ml), dimers (~82 ml), and monomers (~87 ml).
3.6 ITC Experiments
3.6.1 ITC Background
Through direct measurements of heats evolved during binding processes,
isothermal titration calorimetry (ITC) is the only experimental method that can
measure binding free energies (∆Gbind) in addition to the entropic and enthalpic components of binding (∆Sbind and ∆Hbind). This is accomplished by titrating a ligand into a sample cell containing a receptor. The sample cell is maintained at a
constant temperature with respect to a reference cell within an adiabatic chamber
by the application of a constant power. During a titration experiment, the amount
of power required to maintain a constant temperature within the sample cell is
recorded with each injection. Integrating these recorded powers over time results
in the total heat (∆H) of each injection. If concentrations of receptor and ligand
are chosen carefully, the full titration experiment will result in a saturation binding
curve. Given known receptor and ligand concentrations, fitting models to the
resulting curve yields the binding constant (Kb or Ka) and the stoichiometry of
binding (n). Thus, ∆H, Kb, and n are directly measured in an ITC experiment.
Using the relationship
\[ \Delta G_{bind} = -RT \ln K_b = RT \ln K_d \]
where R is the gas constant (R = 1.985877534e-3 kcal K⁻¹ mol⁻¹), T is the
temperature in Kelvin, and the dissociation constant Kd = 1/Kb, the binding free
energy (∆Gbind) may be calculated. Finally, using the relationship
\[ \Delta G_{bind} = \Delta H_{bind} - T \Delta S_{bind} \]
the entropic change attributed to the binding event (∆Sbind) may be calculated,
providing all of the components of the total binding energy.
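The two thermodynamic relationships above can be applied as follows. This is a minimal sketch rather than the analysis procedure used in this work: the function names are illustrative, the gas constant is the value quoted in the text, and the Kd, ∆H, and temperature in the example are hypothetical inputs chosen only to show the arithmetic.

```python
import math

# Gas constant in kcal K^-1 mol^-1, as quoted in the text.
R_KCAL = 1.985877534e-3


def binding_free_energy(kd_molar: float, temp_k: float) -> float:
    """Binding free energy (kcal/mol): dG = RT ln Kd = -RT ln Kb."""
    return R_KCAL * temp_k * math.log(kd_molar)


def binding_entropy(dg: float, dh: float, temp_k: float) -> float:
    """Binding entropy (kcal K^-1 mol^-1) from dG = dH - T*dS."""
    return (dh - dg) / temp_k


# Hypothetical inputs: Kd = 1 uM, dH = -5.0 kcal/mol, T = 298.15 K.
temp = 298.15
dg = binding_free_energy(1e-6, temp)
ds = binding_entropy(dg, dh=-5.0, temp_k=temp)
print(f"dG = {dg:.2f} kcal/mol")
print(f"-T*dS = {-temp * ds:.2f} kcal/mol")
```

Reporting the entropic contribution as -T∆S keeps all three terms in the same units, so the enthalpic and entropic parts can be compared directly.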
ITC experiments have been applied in the past to nuclear receptor systems, most
often in measuring the interaction energy between coactivators and ligand
binding domains bound to various ligands. Typically, a peptide containing a
single LXXLL motif is used in the ITC experiments as an estimation of the full
coactivator binding energy. Some studies have also used a full coactivator
receptor interaction domain (RID) in the binding experiments. The following
experiments measure the binding affinity of the SRC-1 NR2 peptide (686-Ac-RHKILHRLLQEGS-NH2-698) to ligand-bound RARα.
Only a single report was found of ITC used to directly measure the binding affinity of a ligand to an NR LBD: danthron, an RXRα-specific agonist, was measured to have a Kd of 7.5 μM [186]. The lack of ligand binding studies is likely due to the practical limitations of ITC experiments. In particular, it is important that the ligand and receptor are in well-matched solutions; otherwise, large heats of dilution during the titration experiment will mask any heats of binding. Because most NR ligands are very hydrophobic, they are poorly soluble in the aqueous solutions required for protein stability. Additionally, ITC experiments are generally limited to ligands with dissociation constants, Kd, in the range of 1 mM to 10 nM. Direct measurement of high-affinity ligands is made difficult by limitations on the conditions required to obtain interpretable thermograms.
A parameter called the ‘c value’ is often used to determine the precision with which a Kd may be calculated from an isotherm obtained under specific conditions, where