University of Massachusetts Medical School eScholarship@UMMS

GSBS Dissertations and Theses Graduate School of Biomedical Sciences

2018-01-31

Investigating the Structural Basis for Human Disease: APOBEC3A and Profilin

Tania V. Silvas University of Massachusetts Medical School



Repository Citation Silvas TV. (2018). Investigating the Structural Basis for Human Disease: APOBEC3A and Profilin. GSBS Dissertations and Theses. https://doi.org/10.13028/M2KD6T. Retrieved from https://escholarship.umassmed.edu/gsbs_diss/955


INVESTIGATING THE STRUCTURAL BASIS FOR HUMAN DISEASE: APOBEC3A AND PROFILIN

A Dissertation Presented

By

TANIA VANESSA SILVAS

Submitted to the Faculty of the University of Massachusetts Graduate School of Biomedical Sciences, Worcester in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

JANUARY 31st, 2018

BIOCHEMISTRY AND MOLECULAR PHARMACOLOGY


INVESTIGATING THE STRUCTURAL BASIS FOR HUMAN DISEASE: APOBEC3A AND PROFILIN

A Dissertation Presented

By

TANIA VANESSA SILVAS

This work was undertaken in the Graduate School of Biomedical Sciences BIOCHEMISTRY AND MOLECULAR PHARMACOLOGY

Under the mentorship of

Celia A. Schiffer, PH.D., Thesis Advisor

Jeremy Luban, M.D., Member of Committee

Brian A. Kelch, PH.D., Member of Committee

Katherine Fitzgerald, PH.D., Member of Committee

Catherine L. Drennan, PH.D., External Member of Committee

Dan Bolon, PH.D., Chair of Committee

ANTHONY CARRUTHERS, PH.D., Dean of the Graduate School of Biomedical Sciences

JANUARY 31st, 2018


Acknowledgments

I would like to thank several people for their support, advice, and friendship throughout my graduate career. First, I thank my family, who instilled in me the importance of education and of following my passions in life. I thank my mother, nana, my uncle Ramon, and Ruben for all the sacrifices they endured in order to raise my sister and me.

Next, I would like to thank the Biochemistry and Molecular Pharmacology department for providing an amazingly friendly and warm environment to work in.

I would like to thank my committee members Dr. Daniel Bolon, Dr. Brian Kelch, Dr. Jeremy Luban, Dr. Mohan Somasundaran, Dr. Katherine Fitzgerald, and Dr. Catherine Drennan for their insightful comments and for keeping me focused throughout my graduate career.

I thank all past and current members of the Schiffer lab. It was an amazing experience working in such a large group of people from many different scientific and cultural backgrounds. I thank Ellen Nalivaika not only for making sure everything in the lab runs smoothly but also for all of your help on the APOBEC project and for being the go-to person for any questions I had about running experiments in the lab. I thank Dr. Nese Kurt-Yilmaz for being an amazing mentor in writing and in patience. You helped turn my convoluted ideas and explanations into simple and clear text, and taught me how to stay level-headed when things got difficult in lab. Thank you to Christine Pruitte, Candice Dufour, and Christina Zollo for all of your support and help; I always enjoyed your warm, welcoming, positive energy despite all the crazy paperwork we had to deal with.

I thank my rotation mentors Dr. Shivender Shandilya and Dr. Markus Bohn for introducing me to the APOBEC field; I was always amazed by your excitement and enthusiasm for this project, even through all the challenges. I also thank you for teaching me purification, biochemical assays, and crystallography. Thank you to Shurong Hou for being the best lab and bay mate ever. From the many hours we spent together troubleshooting experiments to brainstorming ideas about the implications of our results, we made a great team. It was always a pleasure working with you, and I am grateful for your friendship and optimism.

I would like to thank Dr. Bill Royer for being a great mentor, especially in crystallography. I thank Dr. Mohan Somasundaran for his mentorship in the APOBEC project and for all of his support and advice throughout the years.

I thank Dr. Brian Kelch for being an incredible mentor since I was a rotation student in his lab. I value all your mentorship in crystallography, biochemistry, and structural analysis, and I thank you for entertaining my random and sometimes off-the-wall general scientific inquiries. I also thank his lab members Dr. Brendan Hilbert, Dr. Christl Gaubitz, Janelle Hayes, and Nicholas Stone for their friendship and help with crystallography and electron microscopy.

I would like to thank Dr. Daryl Bosco and Dr. Siva Boopathy for a wonderful collaboration on the Profilin project. I would also like to thank Dr. Hiroshi Matsuo and Dr. Wazo Myint for their collaboration on the APOBEC project.

Thank you to Dr. Kristina Prachanranarong, Dr. Djade Soumana, and Dr. Kuan Hung Lin for being amazing lab mates and for all your support and friendship over the years. From consolation ramen dinners to dragging couches up to my third-story apartment window, you guys were always there for me. I would also like to thank Dr. Furkan Ayaz, Dr. Mary Munson, Ashley Mathew, Florian Leidner, Dr. Madhvi Koli, Dr. Sagar Kathuria, Dr. Caroline Duffy, and Dr. Laura Deveau for their advice, friendship, and encouragement.

I would like to thank Dr. Aneth Canale and Pamela Cote for being such great friends throughout grad school. We met as classmates in our first year and have stuck together since. I will miss our lunch dates and our girls' nights out. I'll be forever grateful for our friendship.

Finally, I would like to thank my thesis advisor, Celia Schiffer, for being an amazing mentor. I am so grateful for your support not only in my scientific endeavors but also in my personal growth throughout the past six years. Your encouragement and compassion, even through the difficult times, are unlike anything I have ever experienced from a mentor. I grew up in an environment where aggression and intimidation are the main means of gaining power and respect. You have opened my eyes to the strength that is in kindness, and I strive to put what I learned from you into practice in my career and in my personal relationships. Thank you for believing in me, even when I didn’t.


Abstract

Analyzing protein tertiary structure is an effective method for understanding protein function. In my thesis study, I aimed to understand how surface features of proteins can affect the stability and specificity of enzymes. I focus on two proteins that are involved in human disease, Profilin (PFN1) and APOBEC3A (A3A). When these proteins are functioning correctly, PFN1 modulates actin dynamics and A3A inhibits retroviral replication. However, mutations in PFN1 are associated with amyotrophic lateral sclerosis (ALS), while the overexpression of A3A is associated with the development of cancer. Currently, the pathological mechanism of PFN1 in this fatal disease is unknown, and although the sequence context of DNA mutation is known to vary among A3s, the mechanism of substrate sequence specificity is not well understood.

To understand how mutations in Profilin could lead to ALS, I solved the structures of WT PFN1 and two ALS-related PFN1 mutants. Our collaborators demonstrated that ALS-linked mutations severely destabilize the native conformation of PFN1 in vitro and cause accelerated turnover of the PFN1 protein in cells. This mutation-induced destabilization can account for the high propensity of ALS-linked variants to aggregate and also provides a rationale for their reported loss-of-function phenotypes in cell-based assays. The source of this destabilization was illuminated by my X-ray crystal structures of several PFN1 proteins. I found an expanded cavity near the protein core of the destabilized M114T variant. In contrast, the E117G mutation only modestly perturbs the structure and stability of PFN1, an observation consistent with the occurrence of this mutation in the control population. These findings suggest that a destabilized form of PFN1 underlies PFN1-mediated ALS pathogenesis.

To characterize A3A’s substrate specificity, we solved the structures of apo and DNA-bound A3A. I then used a systematic approach to quantify affinity for substrate as a function of sequence context, pH, and substrate secondary structure. I found that A3A’s preferred ssDNA binding motif is (T/C)TC(A/G) and that A3A can bind RNA in a sequence-specific manner. The affinity for substrate increased as pH decreased. Furthermore, A3A binds more tightly to its substrate binding motif when the motif lies in the loop region of a folded nucleic acid than when it lies in a linear sequence. This result suggests that the structure of DNA, and not just its chemical identity, modulates A3 affinity and specificity for substrate.


Table of Contents

Title Page
Acknowledgments
Abstract
Table of Contents
List of Tables
List of Figures
Preface

Chapter I: Introduction
I.a. Approaches to elucidating the mechanism of biological processes
I.b. Human Cytidine Deaminases
I.b.1. Activity of human cytidine deaminases in the cell
I.b.1.i. Initial elucidation of APOBEC3 activity
I.b.1.ii. Functions of APOBEC3 beyond retroviral restriction
I.b.1.iii. Other members of the human APOBEC superfamily
I.b.2. Consequences of mis-regulated cytidine deamination activity
I.b.3. Enzymology of human ssDNA deaminating APOBECs
I.b.3.i. Comparison of human ssDNA-deaminating APOBEC protein structures
I.b.3.ii. The active sites of ssDNA cytidine deaminases are highly conserved
I.b.3.iii. Conserved mechanism of cytidine deamination
I.b.4. Substrate specificity of human ssDNA deaminating APOBECs
I.b.5. Scope of thesis part 1
I.c. Human Profilin 1
I.c.1. Activity of human Profilin 1 in the cell: modulator of actin polymerization
I.c.2. Consequences of mutant Profilin 1: Amyotrophic Lateral Sclerosis
I.c.2.i. Amyotrophic Lateral Sclerosis (ALS)
I.c.2.ii. Single point mutations in PFN1 are associated with fALS
I.c.3. Scope of thesis part 2
I.d. References

Chapter II: The ssDNA Mutator APOBEC3A Is Regulated by Cooperative Dimerization
II.a. Abstract
II.b. Introduction
II.c. Results
II.c.1. APOBEC3A Preferentially Binds the TTC Trinucleotide Sequence
II.c.2. Crystal Structure of APOBEC3A
II.c.3. Assessing the Functional Significance of the Crystallographic Dimer
II.d. Discussion
II.e. Methods
II.f. References

Chapter III: Crystal structure of APOBEC3A bound to single-stranded DNA reveals structural basis for cytidine deamination and specificity
III.a. Abstract
III.b. Introduction
III.c. Results and Discussion
III.c.1. A3A-ssDNA co-crystal structure
III.c.2. Recognition of the targeted cytidine
III.c.3. Specificity for pyrimidines at -1 position
III.c.4. The conserved N57 is central to the active site geometry
III.c.5. H29 coordinates the ssDNA binding in the active site
III.c.6. A3A and rA3G-NTD differ in DNA binding
III.c.7. Molecular recognition in polynucleotide deaminases
III.d. Methods
III.e. References

Chapter IV: Substrate Sequence Selectivity of APOBEC3A Implicates Intra-DNA Interactions
IV.a. Abstract
IV.b. Introduction
IV.c. Results
IV.c.1. A3A binding to ssDNA is context dependent
IV.c.2. A3A affinity for ssDNA is pH dependent
IV.c.3. Substrate recognition is dependent on thymidine directly upstream of target deoxycytidine, with preference for pyrimidines over purines
IV.c.4. A3A preference for binding to substrate over product is context dependent
IV.c.5. Positive correlation between sequence preference of binding and enzymatic activity
IV.c.6. Structural basis for A3A specificity for binding to preferred recognition sequence
IV.c.7. A3A bends ssDNA to potentially allow for intra-DNA interaction between -2 and +1 nucleotides
IV.c.8. Length of ssDNA affects affinity of A3A for substrate sequence
IV.c.9. A3A prefers binding to target sequence in the loop of structured hairpins
IV.d. Discussion
IV.e. Methods
IV.f. References

Chapter V: Structural basis for mutation-induced destabilization of profilin 1 in ALS
V.a. Abstract
V.b. Introduction
V.c. Results
V.c.1. ALS-Linked Mutations Destabilize PFN1 in Vitro
V.c.2. ALS-Linked PFN1 Exhibits Faster Turnover in a Neuronal Cell Line
V.c.3. ALS-Linked Mutations Induce a Misfolded Conformation Within PFN1
V.c.4. A Source of Mutation-Induced Destabilization Revealed by X-Ray Crystallography of PFN1
V.d. Discussion
V.e. Methods
V.f. References

Chapter VI: Discussion
VI.a. APOBEC3A
VI.a.1. Using A3A to understand the structural basis for substrate recognition in ssDNA deaminating APOBECs
VI.a.2. Nucleic acid-bound structures of APOBECs
VI.a.3. Implications for the role of the A3A homodimer observed in the apo A3A structure
VI.a.4. Applications for identifying APOBEC signature sequences in a quantitative manner
VI.a.5. pH dependence of APOBEC activity
VI.a.6. Implications of deamination of RNA by APOBEC
VI.b. Profilin 1
VI.b.1. Elucidating the structural basis for mutation-induced destabilization of profilin 1 in ALS
VI.b.2. Characterizing the conformation and local stability of PFN1-ALS mutants in solution
VI.b.3. Elucidating the role of single point mutations on the function of PFN1
VI.b.4. Designing small molecule therapies to stabilize PFN1 disease causing mutants
VI.c. References


List of Tables

Table I.1. Consensus sequence for deamination activity of human AID and APOBEC proteins
Table II.1. ssDNA Binding Affinity and Cooperativity of APOBEC3A and Interface Mutants
Table II.2. Crystallographic Statistics for APOBEC3A Structure
Table II.3. Nucleotide sequences for ssDNA oligomers used in APOBEC3A binding experiments
Table III.1. Data collection and refinement statistics
Table III.2. Conservation of DNA coordinating residues between human A3 domains (catalytic and inactive pseudo-catalytic)
Table IV.1. A3A affinity for DNA sequences used in this analysis
Table IV.2. A3A affinity for ssDNA Poly A-TTC in a range of pHs
Table IV.3. A3A enzyme activity for DNA sequences
Table V.1. Summary of experimental stability and binding measurements for PFN1 variants
Table V.2. Crystallographic and refinement statistics of human PFN1 structures


List of Figures

Figure I.1: A comprehensive elucidation of a biological process
Figure I.2: APOBEC3s deaminate cytidines in ssDNA
Figure I.3: Functions and malfunctions of the ssDNA deaminating APOBECs
Figure I.4: APOBEC3 family of cytidine deaminases
Figure I.5: Structures of all human ssDNA deaminating APOBEC domains determined to date
Figure I.6: Sequence alignment of ssDNA proteins
Figure I.7: Tertiary structure analysis highlights similarities in the active sites of ssDNA deaminating APOBECs
Figure I.8: Deamination reaction mechanism
Figure I.9: Elucidating the substrate sequence dependence of A3A affinity to ssDNA
Figure I.10: Actin dynamics
Figure I.11: PFN1 can increase actin polymerization
Figure I.12: Location of PFN1 mutations associated with ALS
Figure II.1: APOBEC3A Binding Specifically to Trinucleotide Deamination Motifs
Figure II.2: Crystal Structure of APOBEC3A
Figure II.3: Superposition of the two monomers in the crystallographic dimer
Figure II.4: Residues Contributing to Interface Formation Are Determinants for Cooperativity and Affinity
Figure II.5: N-terminally truncated mutant of APOBEC3A Binding to ideal substrate
Figure II.6: Binding of catalytically inactive E71G APOBEC3A variant
Figure II.7: Residues Implicated in Deamination Activity
Figure II.8: Residues Implicated in DNA Binding
Figure III.1: Sequence alignment of APOBEC3s
Figure III.2: Crystal structure of A3A in complex with substrate DNA
Figure III.3: Secondary structure elements of A3A
Figure III.4: Comparison of bound and unbound crystal structures of A3A
Figure III.5: A3A–ssDNA atomic interactions
Figure III.6: Structural model of the A3A catalytically active site
Figure III.7: Close-up views of the conserved asparagine and sugar arrangement
Figure III.8: View of the atomic interactions between rA3G-NTD and ssDNA (5K83)
Figure III.9: Structure and substrate-binding similarity between A3A and RNA deaminase TadA
Figure III.10: Simulated annealing omit map confirms ssDNA positioning
Figure III.11: Modeling in the electron density
Figure IV.1: A3A specificity to ssDNA background and substrate
Figure IV.2: A3A affinity to ssDNA at different pHs
Figure IV.3: A3A specificity for nucleotides flanking substrate cytidine
Figure IV.4: A3A specificity for poly A xTCx
Figure IV.5: Binding affinity versus enzyme activity
Figure IV.6: A3A recognition of substrate cytidine and pyrimidines at -1
Figure IV.7: ssDNA is bent within the complex with A3A
Figure IV.9: A3A affinity to ssDNA of varied lengths
Figure IV.10: A3A specificity for substrate in loop region of stem-loop nucleic acids
Figure IV.11: A3A affinity to ssRNA
Figure V.1: A comparison of PFN1 C71G purified from the soluble lysate of Escherichia coli vs. from inclusion bodies
Figure V.2: ALS-linked mutations destabilize PFN1
Figure V.3: All PFN1 variants unfold by a two-state process
Figure V.4: ALS-linked PFN1 variants exhibit faster turnover in a neuronal cell line
Figure V.5: The turnover of insoluble PFN1 in SKNAS cells
Figure V.6: ALS-linked PFN1 variants retain the same secondary structure as PFN1 WT
Figure V.7: Analysis of PFN1 proteins by native PAGE and analytical size-exclusion chromatography
Figure V.8: Superimposition of the crystal structures for PFN1 WT, E117G, and M114T
Figure V.9: Structural changes induced by the M114T mutation revealed in double difference plots
Figure V.10: Structure of actin–PFN1–VASP peptide ternary complex with the actin and poly-L-proline binding residues mapped on PFN1
Figure V.11: Actin and poly-L-proline binding residues exhibit relatively high double difference values
Figure V.12: The calculated α-carbon B factors for all PFN1 structures
Figure V.13: ALS-linked PFN1 variants retain the ability to bind poly-L-proline
Figure V.14: The binding of PFN1 proteins to G-actin
Figure V.15: The M114T mutation causes a surface-exposed pocket to expand into the core of the PFN1 protein
Figure V.16: Electrostatic surface potential (ESP) of PFN1 WT and PFN1 M114T
Figure VI.1: Solved structures of APOBEC-poly nucleic acid complexes
Figure VI.2: Active site view of APOBEC-poly nucleic acid complexes
Figure VI.3: macA3H RNA binding residues
Figure VI.4: A3F-CTD Poly T ssDNA binding residues
Figure VI.5: Compilation of A3 apo and bound structures
Figure VI.6: Proposed model of A3 homodimer cooperatively binding to ssDNA
Figure VI.7: Active site histidines in APOBEC enzymes
Figure VI.8: Potential structural mechanism for A3A binding RNA


Preface

Chapter II is a collaborative study that has been previously published as:

Bohn MF, Shandilya SMD, Silvas TV, Nalivaika EA, Kouno T, Kelch BA, Ryder SP, Kurt-Yilmaz N, Somasundaran M, Schiffer CA. The ssDNA Mutator APOBEC3A Is Regulated by Cooperative Dimerization. Structure. 2015 May 5;23(5):903-911. doi: 10.1016/j.str.2015.03.016. Epub 2015 Apr 23. PubMed PMID: 25914058; PubMed Central PMCID: PMC4874493.

Contributions from Tania V. Silvas:

I contributed to the cloning and overexpression of A3A mutants along with Markus F. Bohn. I contributed to the structural analyses and interpretation of the A3A homodimer along with Bohn MF, Shandilya SMD, Somasundaran M, and Schiffer CA.


Chapter III is a collaborative study that has been previously published as:

Kouno T, Silvas TV, Hilbert BJ, Shandilya SMD, Bohn MF, Kelch BA, Royer WE, Somasundaran M, Kurt Yilmaz N, Matsuo H, Schiffer CA. Crystal structure of APOBEC3A bound to single-stranded DNA reveals structural basis for cytidine deamination and specificity. Nat Commun. 2017 Apr 28;8:15024. doi: 10.1038/ncomms15024. PubMed PMID: 28452355; PubMed Central PMCID: PMC5414352.

Contributions from Tania V. Silvas:

I contributed to the structural analyses and interpretation of the A3A-ssDNA complex in this study.


Chapter IV is a collaborative study that has been previously published as:

Silvas TV, Hou S, Myint W, Somasundaran M, Kelch BA, Matsuo H, Kurt-Yilmaz N, Schiffer CA. Substrate sequence selectivity of APOBEC3A implicates intra-DNA interactions. bioRxiv 176297; doi: https://doi.org/10.1101/176297

Contributions from Tania V. Silvas:

I devised the concept of this manuscript. I cloned, expressed, and purified A3A for this study. I performed all fluorescence anisotropy-based binding assays and the analysis of the data for this study. I performed structural analyses with assistance from Shurong Hou, Nese Kurt-Yilmaz, and Celia A. Schiffer. I created all figures and tables for this manuscript. I interpreted the data and wrote the manuscript with the assistance of Nese Kurt-Yilmaz and Celia A. Schiffer.


Chapter V is a collaborative study that has been previously published as:

Boopathy S*, Silvas TV*, Tischbein M, Jansen S, Shandilya SM, Zitzewitz JA, Landers JE, Goode BL, Schiffer CA, Bosco DA. Structural basis for mutation-induced destabilization of profilin 1 in ALS. Proc Natl Acad Sci U S A. 2015 Jun 30;112(26):7984-9. doi: 10.1073/pnas.1424108112. Epub 2015 Jun 8. PubMed PMID: 26056300; PubMed Central PMCID: PMC4491777.

* Authors contributed equally to this work

Contributions from Tania V. Silvas:

I crystallized WT and mutant PFN1. I solved and refined the crystal structures of WT, E117G, and M114T PFN1 and performed subsequent structural analyses with oversight from S. Shandilya and CA Schiffer. I interpreted the structural data and wrote the corresponding methods, results, and discussion sections with the assistance of Nese Kurt-Yilmaz and Celia Schiffer.


Chapter I

Introduction


I.a. Approaches to elucidating the mechanism of biological processes

Core strategies for understanding any biological process can be consolidated into three approaches: cellular biology, molecular biology, and biochemistry (Figure I.1).

Cellular biology focuses on the different functions of cells, and of organelles within a cell, that are involved in a particular biological process in an organism. Molecular biology includes techniques that study the effects of gain-of-function and loss-of-function mutants of genes that play a role in biological processes. Biochemical approaches include elucidating the structure and function of biomolecules, as well as how the functions of these biomolecules affect biological processes.

A comprehensive elucidation of a biological process provides a solid foundation for determining the mechanism of disease and for designing therapies. For instance, combining the three core approaches explained above was essential to understanding the causes of cancer and ALS, which will be described in more detail below. This thesis explains how studying the structure and function of two different proteins led to a better understanding of their roles in biological processes. Results from this thesis not only demonstrate that biochemistry is a key component of a comprehensive understanding of the biological malfunctions that ultimately lead to disease, but also build a foundation from which to design new and innovative therapies for familial ALS (fALS) caused by PFN1 mutants and for APOBEC3-induced cancers.


Figure I.1: A comprehensive elucidation of a biological process. A combination of cellular, molecular, and biochemical approaches can be used to elucidate biological processes. Each approach focuses on the functions of cells, genes, or biomolecules involved in a particular process.


I.b. Human Cytidine Deaminases

I.b.1. Activity of human cytidine deaminases in the cell

I.b.1.i. Initial elucidation of APOBEC3 activity

The APOBEC3 (A3) family was first discovered through database studies as APOBEC1-like genes, but the function of the gene products had not yet been elucidated (1). Soon after, the APOBEC3G gene was found to be expressed in primary human T cells, the target cell type for HIV, and in other nonpermissive cells. When A3G was overexpressed in permissive cells, it rendered Vif-deficient HIV non-infectious. Thus, the APOBEC3G gene was first identified as an anti-HIV factor (2). Further HIV-focused studies showed that A3G deaminates cytidines in the single-stranded DNA (ssDNA) intermediate of replicating HIV genomes (3, 4).

Eventually, all members of the A3 family of cytidine deaminases were found to restrict replication of retroviruses and retrotransposons by inducing hypermutations in the viral genome (Figure I.2) (2, 3, 5-9). During reverse transcription, A3 enzymes deaminate cytidines in the minus-strand ssDNA intermediate into uridines. During second-strand DNA synthesis, adenosines are incorporated across from these uridines, resulting in G-to-A hypermutations in the plus strand. Proteins and proviruses produced from this hypermutated viral genome are defective, thus preventing further viral replication (10).
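The strand bookkeeping behind G-to-A hypermutation can be made concrete with a short sketch. It is illustrative only: the plus strand is written in DNA letters for simplicity, and the deamination rule (every minus-strand C whose 5′ neighbor is `upstream` becomes U) is a hypothetical simplification, since real A3 enzymes act probabilistically. The default `upstream="T"` mimics a TC preference such as that of A3A, while `upstream="C"` mimics the CC context typical of A3G.

```python
# Illustrative sketch of APOBEC3-induced G-to-A hypermutation.
# Hypothetical rule: every minus-strand C whose 5' neighbour is `upstream`
# is deaminated to U. Real A3 enzymes act probabilistically.

# Base pairing used for strand synthesis; U templates incorporation of A.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def reverse_complement(seq: str) -> str:
    """Return the strand synthesized on `seq`, written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def deaminate(minus: str, upstream: str = "T") -> str:
    """Convert C to U wherever C is immediately preceded (5') by `upstream`."""
    edited = list(minus)
    for i in range(1, len(minus)):
        if minus[i] == "C" and minus[i - 1] == upstream:
            edited[i] = "U"
    return "".join(edited)


def hypermutate(plus: str, upstream: str = "T") -> str:
    minus = reverse_complement(plus)     # minus-strand cDNA (single-stranded)
    edited = deaminate(minus, upstream)  # A3 deaminates C -> U on the ssDNA
    return reverse_complement(edited)    # second-strand synthesis puts A opposite U


def g_to_a_sites(original: str, mutated: str) -> list[int]:
    """0-based positions where a G in `original` became an A in `mutated`."""
    return [i for i, (a, b) in enumerate(zip(original, mutated)) if a == "G" and b == "A"]


plus = "AGAGGA"
print(hypermutate(plus))                      # AAAGAA
print(g_to_a_sites(plus, hypermutate(plus)))  # [1, 4]
```

A minus-strand 5′-TC-3′ pairs with a plus-strand 5′-GA-3′, so with the TC rule only the G of each GA dinucleotide is mutated, while the first G of the GG pair (position 3) is untouched.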


Figure I.2: APOBEC3s deaminate cytidines in ssDNA. A schematic of G-to-A mutation catalyzed by APOBEC3s. Gray ribbons represent DNA. Green nucleotides represent normal progression of complementary strand synthesis in the absence of A3. Red nucleotides represent the nucleotide change in the presence of A3 protein.


I.b.1.ii. Functions of APOBEC3 beyond retroviral restriction

Besides inhibiting retroviruses and retroelements, as described above, A3s also restrict DNA viruses. A3s can restrict nuclear-replicating ssDNA viruses such as adeno-associated virus (11). A3s can also restrict nuclear-replicating dsDNA viruses such as hepatitis B virus, herpesviruses, and human papillomavirus (HPV) (12-15).

A3s mediate the clearance of foreign DNA from monocytes and macrophages. In particular, A3A, which is primarily expressed in phagocytic cells, was found to deaminate exogenous DNA, creating a substrate for UNG2-mediated excision while leaving genomic DNA intact (16, 17). A3A also deaminates methylated cytidine more efficiently than other deaminases such as A3G or AID, giving A3A the unique ability to clear methylated foreign DNA from the cell (18-20).

A role for A3G in promoting double-strand break (DSB) repair was discovered by studying cancer cells that express high levels of A3G. Poor survival rates of patients undergoing chemotherapy for lymphoma and glioblastoma were found to correlate with cancer cells that overexpress the A3G gene (21-23). Studies of lymphoma cells showed that A3G is transiently recruited to the nucleus and accumulates at DSB sites after irradiation. A3G can directly promote DSB repair through its enzymatic activity by deaminating cytidines in resected ssDNA, which may lead to the recruitment of repair factors such as base-excision repair proteins. A3G can also contribute to DSB repair through its ability to bind two 3’ ssDNA ends simultaneously, promoting ssDNA end joining (24).


I.b.1.iii. Other members of the human APOBEC superfamily

The APOBEC superfamily of human cytidine deaminases consists of APOBEC1 (A1), APOBEC2 (A2), the APOBEC3 (A3) subfamily, APOBEC4 (A4), and activation-induced cytidine deaminase (AID). Besides the A3 subfamily, A1 and AID are the best-studied members of the APOBEC superfamily. The physiological functions of the remaining APOBEC proteins, A2 and A4, have remained elusive. Neither A2 nor A4 has yet been shown to be catalytically active (25-27). A2 is primarily expressed in heart and skeletal muscle, while A4 expression in humans has not been detected (25).

A1 modulates lipid metabolism and transport in the small intestine by editing apolipoprotein B (apoB) mRNA (28, 29). A1 specifically deaminates a single cytidine in apoB mRNA, C6666, creating a premature stop codon that yields a truncated apoB protein. This truncated apoB protein is essential for proper transport of dietary lipids from the intestines to other locations in the body (30-32).
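Why a single C-to-U edit truncates the protein can be shown with a toy reading frame. In apoB mRNA, the edited C6666 lies in a CAA (glutamine) codon, which becomes the UAA stop codon. The 15-nucleotide sequence below is a hypothetical stand-in, not the apoB transcript, and the helper names are illustrative only.

```python
from typing import Optional

# Toy illustration: a single C-to-U edit can create a premature stop codon.
# The sequence used below is a hypothetical reading frame, NOT apoB mRNA.

STOP_CODONS = {"UAA", "UAG", "UGA"}


def edit_c_to_u(mrna: str, position: int) -> str:
    """Deaminate the C at 1-based `position` to U (the kind of edit A1 makes)."""
    if mrna[position - 1] != "C":
        raise ValueError(f"position {position} is {mrna[position - 1]!r}, not 'C'")
    return mrna[: position - 1] + "U" + mrna[position:]


def first_stop_codon(mrna: str) -> Optional[int]:
    """1-based number of the first in-frame stop codon (frame starts at index 0)."""
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i : i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


toy = "AUGGCUCAAGGCUAA"       # AUG GCU CAA GGC UAA
edited = edit_c_to_u(toy, 7)  # CAA (Gln) -> UAA (stop)
print(first_stop_codon(toy), first_stop_codon(edited))  # 5 3
```

The edit does not shorten the mRNA; it only moves the first in-frame stop codon upstream, so translation terminates early and a shorter protein is made.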

AID plays an essential role in adaptive immunity, regulating antibody maturation and diversification in activated B cells (33). AID catalyzes the diversification of immunoglobulin gene sequences through three pathways: somatic hypermutation, gene conversion, and class switch recombination (34, 35). All three pathways are consequences of AID’s cytidine deaminase activity on the ssDNA of immunoglobulin genes (36).


I.b.2. Consequences of mis-regulated cytidine deamination activity

The ability of some APOBECs, such as the APOBEC3 subfamily and AID, to deaminate cytidines in DNA has made them a double-edged sword. If not properly regulated, DNA-editing APOBECs that also have access to the nucleus can deaminate the host genome, which may instigate cancer. In the APOBEC3 family, A3B and A3H localize to the nucleus, while A3A and A3G are transiently recruited to the nucleus (24, 37-39). When overexpressed, A3B and A3H have been described as major endogenous sources of mutations in various types of human cancer, such as breast, bladder, head and neck, cervical, and lung cancer (39-41). Like A3B and A3H, AID localizes to the nucleus and, when overexpressed, can lead to cancer, as seen in non-Hodgkin B cell lymphomas (42, 43).


Figure I.3: Functions and malfunctions of the ssDNA deaminating APOBECs. Schematic of the effects of deaminating deoxycytidine into deoxyuridine by ssDNA-deaminating APOBECs. Green arrows represent cellular functions that the APOBEC family helps promote or inhibit. The red arrow represents the consequence of overexpression of A3/AID.


I.b.3. Enzymology of human ssDNA deaminating APOBECs

I.b.3.i. Comparison of human ssDNA-deaminating APOBEC protein structures

Considering the similarities in function and malfunction between the seven A3 proteins and AID, the remainder of this introduction will focus on these eight proteins.

Each protein in the A3 subfamily contains either one or two zinc-coordinating (Z) domains (Figure I.4.A) (44). These A3 Z domains fall into three distinct phylogenetic groups (Z1, Z2, and Z3) (Figure I.4.B) (45). A3A (Z1), A3C (Z2), and A3H (Z3) consist of a single cytidine deaminase Z domain, while A3B, A3D, A3F, and A3G contain both an N-terminal pseudo-catalytic Z domain (all Z2) and a catalytically active C-terminal Z domain (Z1, Z2, Z2, and Z1, respectively) (46). Adding to the complexity, A3H has seven haplotypes with varying stability and enzymatic efficiency (47). AID is a single protein that consists of one catalytically active zinc-coordinating domain (48). Comparison of the available structures of zinc-coordinating domains of human ssDNA-deaminating APOBECs shows that they share a conserved overall fold (Figure I.5).
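The domain organization described above can be summarized as a small lookup table. The sketch below is a hypothetical Python encoding of the assignments stated in the text (the names `A3_DOMAINS`, `active_domain`, and `proteins_with_active` are illustrative only); it makes explicit which phylogenetic Z group supplies the catalytically active domain of each A3.

```python
# Hypothetical encoding of the A3 domain assignments described in the text.
# Each list runs N- to C-terminus; for two-domain A3s the N-terminal domain
# is pseudo-catalytic and the C-terminal domain is catalytically active.
A3_DOMAINS = {
    "A3A": ["Z1"],
    "A3B": ["Z2", "Z1"],
    "A3C": ["Z2"],
    "A3D": ["Z2", "Z2"],
    "A3F": ["Z2", "Z2"],
    "A3G": ["Z2", "Z1"],
    "A3H": ["Z3"],
}


def active_domain(protein: str) -> str:
    """Return the Z group of the catalytically active domain.

    Single-domain A3s use their only domain; two-domain A3s use the
    C-terminal (last) domain.
    """
    return A3_DOMAINS[protein][-1]


def proteins_with_active(z_group: str) -> list[str]:
    """List the A3 proteins whose active domain belongs to `z_group`."""
    return sorted(p for p in A3_DOMAINS if active_domain(p) == z_group)


if __name__ == "__main__":
    for z in ("Z1", "Z2", "Z3"):
        print(z, proteins_with_active(z))
```

Grouping by the active domain rather than by domain count shows, for example, that A3A shares its active-domain group (Z1) with the C-terminal domains of A3B and A3G.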


Figure I.4: APOBEC3 family of cytidine deaminases. A. Schematic of the seven human APOBEC3s. N-terminal and C-terminal domains are shown as ovals. B. Phylogenetic tree of A3 Z domains. Active domains are marked with an orange star. Z1 domains are in red, Z2 domains are in blue, and the Z3 domain is in green.


Figure I.5 Structures of all human ssDNA-deaminating APOBEC domains determined to date. Cartoon representation of all apo crystal or NMR structures of human ssDNA-deaminating APOBEC domains determined to date. Z1 domains are shown in red, Z2 domains in blue, and AID in purple. Zinc is depicted as orange spheres. Starred (*) PDB IDs denote structures solved by the Schiffer lab.


I.b.3.ii The active sites of ssDNA cytidine deaminases are highly conserved.

Although human cytidine deaminases play different roles in the cell, the mechanism of cytidine deamination is conserved among these enzymes. They convert cytidine into uridine by catalyzing the replacement of the NH2 group of the cytidine base with an oxygen (Figure I.8.A). Deamination is carried out by a catalytic glutamate in the active site pocket, together with a coordinated zinc ion and water molecule (Figures I.6 and I.7B). Coordination of the water molecule by the zinc ion in the active site generates a zinc-bound hydroxide.

Sequence alignment of all catalytically active domains of ssDNA-deaminating APOBECs reveals that residues necessary for catalysis, such as the catalytic glutamate and the zinc-coordinating residues, are 100% identical in all domains (orange star and green diamonds; Figure I.6). Two residues vary within the active site cavity, and the four loops that surround the active site vary in sequence and length (Figure I.6); these variable regions are likely determinants of specificity. Putative residues responsible for substrate binding reside in Loops 1, 3, 5, and 7, and homology within these loops varies (red dashed boxes; Figure I.6).
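Figure I.6 shades alignment columns by percent identity (identical, 80-100%, and 60-80%). The sketch below shows one common way such per-column values can be computed and binned; it assumes identity is the fraction of all sequences that share the most common residue in a column, counts gaps as non-matching, and assigns exact boundary values (80%, 60%) to the higher bin. These choices, the function names, and the toy columns are hypothetical and are not taken from the actual alignment.

```python
from collections import Counter


def column_identity(column: str) -> float:
    """Percent of sequences sharing the most common residue in one column.

    Gaps ('-') never count as the consensus residue (an assumption).
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    _, top_count = Counter(residues).most_common(1)[0]
    return 100.0 * top_count / len(column)


def shade(identity: float) -> str:
    """Map percent identity onto the Figure I.6 legend categories.

    Boundary handling (80% -> light blue, 60% -> teal) is an assumption.
    """
    if identity == 100.0:
        return "blue"
    if identity >= 80.0:
        return "light blue"
    if identity >= 60.0:
        return "teal"
    return "none"


# Hypothetical columns for 8 active domains (AID + 7 A3s); not real data.
columns = {
    "catalytic E": "EEEEEEEE",
    "toy pocket 1": "TTTTTTSA",
    "toy pocket 2": "YYYYYYYF",
    "toy loop": "KRND-GQS",
}
for name, col in columns.items():
    pid = column_identity(col)
    print(f"{name}: {pid:.1f}% -> {shade(pid)}")
```

Under these assumptions a fully conserved column such as the catalytic glutamate scores 100%, while a loop column with little shared sequence falls below every shading threshold.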

Mapping homologous residues onto the crystal structure of A3A reveals that most identical residues are internal to the protein structure (Figure I.7A). These residues may be important for the structural similarity found among all APOBEC domains (Figure I.5). Surface-exposed, 100% identical residues are functional residues, such as those responsible for zinc coordination (Figure I.7B).

The surface representation reveals localized patches of mixed homology on the surface of A3A (Figure I.7C). These patches may contribute to conserved or non-conserved DNA binding properties, depending on the homology present in each patch. For example, a surface view of the A3A active site illustrates that the cytidine binding pocket is lined mostly with 100% identical residues (Figure I.7D). The non-identical pocket residues, loop 1 T31 (teal; T/S/A), loop 3 A71 (marine; A/V), loop 5 I96 (teal; T/I/L), and loop 7 Y130 (marine; Y/F), are highly conserved. These minor differences in chemistry within the cytidine binding pocket play a role in the differences in catalytic efficiency between APOBEC enzymes, for instance by modulating the stability of APOBEC-cytidine interactions, which could affect the rate of binding or catalysis.

Loops that line the cytidine binding pocket are the least conserved part of the APOBEC active site (Figure I.7D). Variations in loop length and chemistry may be responsible for the differences in APOBEC substrate sequence specificity. Further analysis of A3 active site features will be discussed in Chapter IV.


Figure I.6: Sequence alignment of ssDNA cytidine deaminase proteins. Sequence alignment of human AID and the seven members of the APOBEC3 subfamily. The catalytic glutamate is denoted by an orange star. Zinc-coordinating residues are denoted with green diamonds. Identical residues are highlighted in blue, residues 80-100% identical in light blue, and 60-80% identical in teal. Active site loops are denoted by horizontal red lines. Residues that make up the active site pocket are highlighted with red dashed boxes.


Figure I.7: Tertiary structure analysis highlights similarities in the active sites of ssDNA-deaminating APOBECs. A. Cartoon view of the A3A crystal structure highlighting the conservation of the active domains of ssDNA-deaminating APOBECs shown in Fig I.6. B. Highly conserved residues are important structural and functional components of APOBEC proteins; the residues responsible for zinc coordination, necessary for catalysis, are 100% conserved and shown as sticks. C. Surface view of A3A illustrating the location of conserved residues on the protein surface. D. 180° rotation of C reveals the active site of A3A, highlighted with a light orange box. The cytidine binding pocket is highly conserved between APOBEC proteins. Loops proposed to be responsible for ssDNA binding, labeled with red circles, are not conserved. Zinc is shown as a light orange sphere. Identical residues are highlighted in blue, residues 80-100% identical in marine, and 60-80% identical in teal.


I.b.3.iii Conserved mechanism of cytidine deamination

When a cytidine is properly positioned within the active site, the oxygen of the zinc hydroxide can make a nucleophilic attack on the C4 carbon of the cytidine base, the carbon that bears the NH2 group. The N3 nitrogen of the base may then be protonated by the OE1 oxygen of the catalytic glutamate. The proton from this newly added hydroxyl group is then passed to the OE2 oxygen of the catalytic glutamate and transferred to the NH2 group of the cytidine base, creating the NH3 leaving group. When ammonia is released from the base, the active site can be reset by the introduction of a new water molecule, which is deprotonated by the carboxylate group of the catalytic glutamate to re-form the conserved zinc-water coordination necessary for another round of catalysis (Figure I.8B) (49, 50). Recognition of cytidine by the catalytic pocket of A3 proteins will be explored in Chapter III and the Discussion section of this thesis.
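The net reaction described above is a hydrolytic deamination: deoxycytidine plus water gives deoxyuridine plus ammonia. As a quick sanity check of that stoichiometry, the sketch below counts atoms on each side using the standard molecular formulas of 2'-deoxycytidine (C9H13N3O4) and 2'-deoxyuridine (C9H12N2O5). The tiny formula parser is a hypothetical helper that handles only formulas without parentheses; it is illustrative, not a general chemistry tool.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C9H13N3O4'.

    Handles element symbols with optional counts; no parentheses.
    """
    atoms = Counter()
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[element] += int(count) if count else 1
    return atoms


def side_total(formulas: list[str]) -> Counter:
    """Sum atom counts over every species on one side of a reaction."""
    total = Counter()
    for f in formulas:
        total += parse_formula(f)
    return total


# Net deamination: deoxycytidine + water -> deoxyuridine + ammonia
reactants = ["C9H13N3O4", "H2O"]
products = ["C9H12N2O5", "NH3"]

left, right = side_total(reactants), side_total(products)
print(dict(left), dict(right), left == right)
```

The balance makes the chemistry of Figure I.8A explicit: the base loses one nitrogen and gains one oxygen, with the extra hydrogens leaving as NH3.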


Figure I.8: Deamination reaction mechanism. A. Schematic of deoxycytidine conversion to deoxyuridine by ssDNA cytidine deaminases. The leaving NH2 group of the cytidine base is in green; the added oxygen is in red. The cytidine deaminase is depicted as a gray cartoon. B. Schematic of the steps of deamination by cytidine deaminases.


I.b.4. Substrate specificity of human ssDNA deaminating APOBECs

While all A3 and AID enzymes deaminate cytidines in ssDNA and share a conserved catalytic site and overall fold, they have different preferences for substrate sequence motifs. A3s and AID prefer to deaminate the second cytidine in either CC or TC dinucleotide motifs (16, 51-55). Additionally, sequence analysis has shown that not every cognate dinucleotide motif is deaminated (39, 56-59) (Table I.1). A3 sequence specificity will be explored further in Chapter IV and in the Discussion section of this thesis.
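Because A3s and AID prefer the second cytidine of a TC or CC dinucleotide, candidate target sites in a ssDNA sequence can be enumerated by scanning for those dinucleotides. The function below is a hypothetical sketch (its name, parameter, and 0-based indexing are illustrative assumptions). As noted above, not every such site is actually deaminated, so it lists candidates only.

```python
def candidate_targets(seq: str, upstream_bases: str = "TC") -> list[int]:
    """Return 0-based positions of cytidines whose 5' neighbour is allowed.

    `upstream_bases` is the set of permitted 5' neighbour bases; the
    default "TC" covers both the TC and the CC dinucleotide contexts.
    """
    seq = seq.upper()
    return [
        i for i in range(1, len(seq))
        if seq[i] == "C" and seq[i - 1] in upstream_bases
    ]


# Hypothetical 5'->3' ssDNA substrate
substrate = "ATTCAGCCATCGA"
print(candidate_targets(substrate))       # [3, 7, 10]  (TC or CC context)
print(candidate_targets(substrate, "T"))  # [3, 10]     (TC context only)
```

The cytidine at position 6 is skipped because its 5' neighbour is G, whereas position 7 qualifies through its CC context.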

Structures of A3s alone have not led to an explanation for the differences in sequence specificity between A3s. Since A3s and AID active domains share an overall fold, major structural features cannot be responsible for differences in specificity. The catalytic site residues of these enzymes are also highly conserved (Figure II.7A). Thus, it is unclear which residues of the active site (Figure II.7B) are responsible for the differences in sequence specificity among AID and A3s.

Secondary structure of the substrate DNA may also play a role in A3/AID specificity. A3s and AID are not capable of deaminating cytidines within duplex DNA (60-62); they can deaminate cytidines only in ssDNA, such as single-stranded regions near duplex DNA or the loop region of hairpin DNA (51, 60).
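The single-strand requirement can be combined with a dinucleotide scan: given a secondary structure in dot-bracket notation, only unpaired cytidines in TC or CC context remain candidate targets. This is a hypothetical sketch; it assumes the structure is supplied from elsewhere (for example, a folding prediction or experiment), does not check base-pair complementarity, and uses an invented hairpin as input.

```python
def unpaired_positions(structure: str) -> set[int]:
    """Return 0-based indices of unpaired bases ('.') in dot-bracket notation.

    Raises ValueError if the brackets are unbalanced.
    """
    depth = 0
    unpaired = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced structure")
        elif ch == ".":
            unpaired.add(i)
    if depth != 0:
        raise ValueError("unbalanced structure")
    return unpaired


def single_stranded_targets(seq: str, structure: str) -> list[int]:
    """Cytidines in TC/CC context that are not base-paired."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    seq = seq.upper()
    free = unpaired_positions(structure)
    return [
        i for i in range(1, len(seq))
        if seq[i] == "C" and seq[i - 1] in "TC" and i in free
    ]


# Hypothetical hairpin: 5-bp stem (GCATC/GATGC) with a TTCA loop
hairpin = "GCATCTTCAGATGC"
dot_bracket = "(((((....)))))"
print(single_stranded_targets(hairpin, dot_bracket))  # [7]
```

The TC at positions 3-4 lies in the stem and is excluded, leaving only the loop cytidine at position 7.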

Environmental factors have also been shown to affect the efficiency of cytidine deamination. In particular, pH modulates the activity of some ssDNA deaminases, including A3A and A3G, but not others (63-65). Whether the sequence preference for deamination is due to direct interaction with the ssDNA substrate or to other factors that inhibit APOBECs, such as substrate secondary structure, is still not clear (58, 60, 66).


Chapter III, Chapter IV, and the Discussion section attempt to clarify the protein, DNA, and environmental contributions to the affinity and specificity of ssDNA cytidine deaminases, with a focus on A3A.


Table I.1. Consensus sequence for deamination activity of human AID and APOBEC proteins.

Enzyme    Consensus Sequence
AID       (A/T)(A/G)C(A/C/T)
A3A       TTCA
A3B       ATCA
A3C       TC
A3D       TC
A3F       TTC(A/T)
A3G       (C/T)CC(A/C/T)
A3H       TCA
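The bracketed notation in Table I.1 maps directly onto regular expressions: each (X/Y) alternative becomes a character class [XY]. The sketch below performs that conversion and reports every (possibly overlapping) match on the given strand. The helper names and the test sequence are hypothetical, and a match indicates only a preferred context, not confirmed deamination. The same notation also covers the A3A binding motif (T/C)TC(A/G) described in the scope section.

```python
import re

# Consensus sequences from Table I.1
CONSENSUS = {
    "AID": "(A/T)(A/G)C(A/C/T)",
    "A3A": "TTCA",
    "A3B": "ATCA",
    "A3C": "TC",
    "A3D": "TC",
    "A3F": "TTC(A/T)",
    "A3G": "(C/T)CC(A/C/T)",
    "A3H": "TCA",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Turn '(A/T)'-style alternatives into '[AT]' character classes."""
    pattern = re.sub(
        r"\(([ACGT](?:/[ACGT])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )
    # Zero-width lookahead so that overlapping matches are all reported.
    return re.compile(f"(?=({pattern}))")


def find_motif(seq: str, consensus: str) -> list[int]:
    """0-based start positions of all motif matches on this strand."""
    regex = consensus_to_regex(consensus)
    return [m.start() for m in regex.finditer(seq.upper())]


seq = "GATTCATCAGTTCT"  # hypothetical ssDNA, 5'->3'
for enzyme in ("A3A", "A3B", "A3F", "A3H"):
    print(enzyme, find_motif(seq, CONSENSUS[enzyme]))
print("A3A binding motif", find_motif(seq, "(T/C)TC(A/G)"))
```

Using a lookahead rather than a plain pattern keeps overlapping sites: for example, both TCA positions in this sequence are reported for A3H.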


I.b.5. Scope of thesis part 1.

This thesis attempts to fill the gaps in our understanding of how A3A affinity for ssDNA depends on substrate sequence (Figure I.9). First, I investigate specific binding of A3A to ssDNA using biochemical data combined with analysis of a novel crystal structure of apo A3A (Chapter II). Next, I elucidate the structural basis and mechanism of A3A ssDNA recognition through analysis of the first crystal structure of A3A in complex with ssDNA. With this structure, I also pinpoint residues that confer A3A specificity for CC/TC motifs (Chapter III). Finally, I identify A3A’s ssDNA binding motif as (T/C)TC(A/G), find that binding affinity correlates with enzymatic activity, validate that A3A binds RNA in a sequence-specific manner, and determine that A3A binds more tightly to its substrate binding motif when the motif is in a loop than when it is in a linear oligonucleotide (Chapter IV). In Chapter V, I also discuss how revealing the structural and biochemical mechanism of A3A’s sequence specificity can be applied to substrate recognition by other A3s and to the design of cancer therapies.


Figure I.9: Elucidating the substrate sequence dependence of A3A affinity to ssDNA.


I.c. Human Profilin 1

I.c.1. Activity of human Profilin 1 in the cell: modulator of actin polymerization

Profilin 1 (PFN1) is an evolutionarily conserved protein that is expressed in all tissue types except skeletal muscle (67, 68). The best-studied biochemical function of PFN1 is its regulation of actin polymerization (68-70). The ability to modulate actin dynamics, along with its ubiquitous expression, allows PFN1 to play a role in a variety of cellular processes, such as endocytosis, membrane trafficking, cell motility, and neuronal growth and differentiation (71).

In the absence of PFN1, ATP-bound monomeric G-actin molecules can associate with one another to form stable oligomers. These tetramers act as seeds for the assembly of actin filaments (F-actin) (Figure I.10). Toward the minus end of F-actin, actin hydrolyzes its bound ATP to ADP. These ADP-bound actin molecules have lower affinity for F-actin and are released from the minus end. G-actin released from F-actin eventually exchanges ADP for ATP, allowing ATP-bound G-actin to form new seeds or to be incorporated into the plus end of an elongating actin filament.

PFN1 can modulate actin polymerization in three ways, all of which rely on PFN1’s ability to bind monomeric actin and are regulated by PFN1 protein levels in the cell. When PFN1 protein levels are high, PFN1 can inhibit actin polymerization by binding and thus sequestering monomeric G-actin. When PFN1 protein levels are low, PFN1 can catalyze the release of ADP from G-actin monomers, allowing faster turnover of ADP-bound G-actin into ATP-bound G-actin (Figure I.11). At low protein levels, PFN1 can also catalyze the elongation of actin filaments by simultaneously binding ATP-bound G-actin and other proteins that increase the rate of actin filament formation.


Figure I.10 Actin dynamics. ATP-bound monomeric actin (G-actin; green cartoon) oligomerizes into a stable tetramer. This seed tetramer allows nucleation to occur with the addition of G-actin monomers. F-actin filaments are formed by the addition of more G-actin monomers to the barbed “+ end” of nucleated actin. Within the F-actin filament, actin hydrolyzes ATP (orange) into ADP (gray), releasing Pi (phosphate, shown as an orange circle). ADP-bound actin (pink cartoon) is then released from the pointed “- end”. ADP-bound actin cannot be added to the F-actin filament until it exchanges ADP for ATP.


Figure I.11 PFN1 can increase actin polymerization. PFN1 (blue cartoon) can increase the rate of actin polymerization in two ways. 1: PFN1 can bind to ADP-bound actin, increasing the rate of ADP-for-ATP exchange and allowing faster recycling of actin released from F-actin filaments. 2: PFN1 can also simultaneously bind G-actin and other proteins in order to catalyze the elongation of actin filaments.


I.c.2. Consequences of mutant Profilin 1: Amyotrophic Lateral Sclerosis

I.c.2.i. Amyotrophic Lateral Sclerosis (ALS)

Amyotrophic Lateral Sclerosis (ALS) is an incurable and fatal neurodegenerative disorder. In the United States alone, 30,000 individuals are living with ALS. The lifetime risk is 1:1,000, and the male-to-female ratio is 3:2. ALS is a late-onset disease; its peak onset is at 65 years of age for both males and females. The disease progresses quickly; the survival rate 80 months after the onset of symptoms is about fifteen percent. To date, there are only symptomatic and supportive treatments for this terminal disease.

ALS is caused by progressive death of upper and lower motor neurons (72). The loss of motor neurons first impairs motor coordination, leading to paralysis and eventual respiratory failure. The initial clinical phenotype can vary but most commonly presents as spasticity, muscle atrophy, and difficulty speaking or swallowing.

Patients with ALS fall under two categories: sporadic (sALS) and familial (fALS). Patients with no known family history of ALS are classified as sALS and make up 90% of all known cases. Of the 10% of patients with the inherited form of ALS, only 50% have a known genetic etiology (73).
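The two percentages above combine multiplicatively. As a worked example, the share of all ALS patients who have familial ALS with a known genetic etiology is roughly:

```latex
\[
P(\text{fALS with known etiology})
  = P(\text{fALS}) \times P(\text{known etiology} \mid \text{fALS})
  = 0.10 \times 0.50 = 0.05
\]
```

That is, about one in twenty ALS patients carries a familial form whose genetic cause has been identified.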

The first gene associated with ALS, superoxide dismutase (SOD), is the most prevalent and best-studied ALS gene. Within the past decade, more than 20 other genes have been associated with ALS (74). Most recently, single point mutations in PFN1 were found to be associated with ALS (75).


I.c.2.ii. Single point mutations in PFN1 are associated with fALS

Mutations in PFN1 were recently identified in both fALS and sALS patients (75, 76). The four single point PFN1 mutants identified, C71G, M114T, E117G, and G118V, vary in pathogenicity. M114T, G118V, and C71G have a high propensity to aggregate in cultured mammalian cells, with C71G being the most aggregation-prone. These three mutants were also shown to inhibit axon outgrowth. Although E117G was initially found in fALS patients, in-cell and in vitro studies found E117G to be more wild-type-like. These findings suggested that PFN1 mutants cause ALS by modulating the cytoskeletal pathway in neuronal cells. Since C71G is the most pathogenic of these mutants in patients, with the largest change in aggregation potential and the greatest decrease in axon outgrowth, the structural integrity of PFN1 mutants may be the root cause of pathogenicity.

fALS point mutations localize to one area of the protein and are found both in surface-exposed loops and buried within the core of the protein (Figure I.12). The fact that the most pathogenic mutant, C71G, not only shows the highest aggregation potential but is also located in the core of the protein strengthens the hypothesis that these mutants may modulate the structural integrity of PFN1. Although this initial hypothesis seems promising, the mechanism of PFN1-mediated ALS pathogenesis has yet to be elucidated. Chapter V and the Discussion section of this thesis will attempt to address the role of PFN1 mutants in the development of ALS.

Figure I.12: Location of PFN1 mutations associated with ALS.


I.c.3. Scope of thesis part 2.

This thesis attempts to fill the gaps in the structural basis of single point mutations in PFN1 that lead to fALS. First, I identify and characterize structural differences of fALS-linked PFN1 mutants by solving the crystal structures of wild-type PFN1, the low-pathogenicity PFN1 mutant E117G, and a fALS-linked PFN1 mutant, M114T. Next, I identify and describe a deep cavity extending into the core of PFN1 M114T, absent from the wild-type and E117G structures, that may be the source of destabilization for this fALS-linked mutant and may be an important factor in fALS-linked PFN1 mutant pathology. Finally, I present preliminary modeling of a more unstable and severe fALS-linked PFN1 mutant, C71G, which further supports the hypothesis that perturbation of the PFN1 core is a common cause of instability in fALS-linked PFN1 mutants.


I.d. References:

1. Jarmuz A, Chester A, Bayliss J, Gisbourne J, Dunham I, Scott J, Navaratnam N. An anthropoid-specific locus of orphan C to U RNA-editing enzymes on 22. Genomics. 2002;79(3):285-96. PubMed PMID: 11863358.

2. Sheehy AM, Gaddis NC, Choi JD, Malim MH. Isolation of a human gene that inhibits HIV-1 infection and is suppressed by the viral Vif protein. Nature. 2002;418(6898):646-50. PubMed PMID: 12167863.

3. Mangeat B, Turelli P, Caron G, Friedli M, Perrin L, Trono D. Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts. Nature. 2003;424(6944):99-103. PubMed PMID: 12808466.

4. Zhang H, Yang B, Pomerantz RJ, Zhang C, Arunachalam SC, Gao L. The cytidine deaminase CEM15 induces hypermutation in newly synthesized HIV-1 DNA. Nature. 2003;424(6944):94-8. PubMed PMID: 12808465.

5. Zheng YH, Irwin D, Kurosu T, Tokunaga K, Sata T, Peterlin BM. Human APOBEC3F is another host factor that blocks human immunodeficiency virus type 1 replication. J Virol. 2004;78(11):6073-6. PubMed PMID: 15141007.

6. Dang Y, Siew LM, Wang X, Han Y, Lampen R, Zheng YH. Human cytidine deaminase APOBEC3H restricts HIV-1 replication. J Biol Chem. 2008;283(17):11606-14. doi: 10.1074/jbc.M707586200. PubMed PMID: 18299330; PMCID: PMC2430661.

7. Dang Y, Wang X, Esselman WJ, Zheng YH. Identification of APOBEC3DE as another antiretroviral factor from the human APOBEC family. J Virol. 2006;80(21):10522-33. doi: 10.1128/JVI.01123-06. PubMed PMID: 16920826; PMCID: PMC1641744.

8. Bogerd HP, Wiegand HL, Doehle BP, Lueders KK, Cullen BR. APOBEC3A and APOBEC3B are potent inhibitors of LTR-retrotransposon function in human cells. Nucleic Acids Res. 2006;34(1):89-95. doi: 10.1093/nar/gkj416. PubMed PMID: 16407327; PMCID: PMC1326241.

9. Muckenfuss H, Hamdorf M, Held U, Perkovic M, Lower J, Cichutek K, Flory E, Schumann GG, Munk C. APOBEC3 proteins inhibit human LINE-1 retrotransposition. J Biol Chem. 2006;281(31):22161-72. doi: 10.1074/jbc.M601716200. PubMed PMID: 16735504.

10. Harris RS, Bishop KN, Sheehy AM, Craig HM, Petersen-Mahrt SK, Watt IN, Neuberger MS, Malim MH. DNA deamination mediates innate immunity to retroviral infection. Cell. 2003;113(6):803-9. PubMed PMID: 12809610.


11. Chen H, Lilley CE, Yu Q, Lee DV, Chou J, Narvaiza I, Landau NR, Weitzman MD. APOBEC3A is a potent inhibitor of adeno-associated virus and retrotransposons. Curr Biol. 2006;16(5):480-5. doi: 10.1016/j.cub.2006.01.031. PubMed PMID: 16527742.

12. Janahi EM, McGarvey MJ. The inhibition of hepatitis B virus by APOBEC cytidine deaminases. J Viral Hepat. 2013;20(12):821-8. doi: 10.1111/jvh.12192. PubMed PMID: 24304451.

13. Vieira VC, Leonard B, White EA, Starrett GJ, Temiz NA, Lorenz LD, Lee D, Soares MA, Lambert PF, Howley PM, Harris RS. Human papillomavirus E6 triggers upregulation of the antiviral and cancer genomic DNA deaminase APOBEC3B. MBio. 2014;5(6). doi: 10.1128/mBio.02234-14. PubMed PMID: 25538195; PMCID: PMC4278539.

14. Wang Z, Wakae K, Kitamura K, Aoyama S, Liu G, Koura M, Monjurul AM, Kukimoto I, Muramatsu M. APOBEC3 deaminases induce hypermutation in human papillomavirus 16 DNA upon beta interferon stimulation. J Virol. 2014;88(2):1308-17. doi: 10.1128/JVI.03091-13. PubMed PMID: 24227842; PMCID: PMC3911654.

15. Suspene R, Aynaud MM, Koch S, Pasdeloup D, Labetoulle M, Gaertner B, Vartanian JP, Meyerhans A, Wain-Hobson S. Genetic editing of herpes simplex virus 1 and Epstein-Barr herpesvirus genomes by human APOBEC3 cytidine deaminases in culture and in vivo. J Virol. 2011;85(15):7594-602. doi: 10.1128/JVI.00290-11. PubMed PMID: 21632763; PMCID: PMC3147940.

16. Stenglein MD, Burns MB, Li M, Lengyel J, Harris RS. APOBEC3 proteins mediate the clearance of foreign DNA from human cells. Nat Struct Mol Biol. 2010;17(2):222-9. PubMed PMID: 20062055.

17. Land AM, Law EK, Carpenter MA, Lackey L, Brown WL, Harris RS. Endogenous APOBEC3A DNA cytosine deaminase is cytoplasmic and nongenotoxic. J Biol Chem. 2013;288(24):17253-60. doi: 10.1074/jbc.M113.458661. PubMed PMID: 23640892; PMCID: PMC3682529.

18. Wijesinghe P, Bhagwat AS. Efficient deamination of 5-methylcytosines in DNA by human APOBEC3A, but not by AID or APOBEC3G. Nucleic Acids Res. 2012;40(18):9206-17. doi: 10.1093/nar/gks685. PubMed PMID: 22798497; PMCID: PMC3467078.

19. Suspene R, Aynaud MM, Vartanian JP, Wain-Hobson S. Efficient deamination of 5-methylcytidine and 5-substituted cytidine residues in DNA by human APOBEC3A cytidine deaminase. PLoS One. 2013;8(6):e63461. doi: 10.1371/journal.pone.0063461. PubMed PMID: 23840298; PMCID: PMC3688686.

20. Carpenter MA, Li M, Rathore A, Lackey L, Law EK, Land AM, Leonard B, Shandilya SM, Bohn MF, Schiffer CA, Brown WL, Harris RS. Methylcytosine and normal cytosine deamination by the foreign DNA restriction enzyme APOBEC3A. J Biol Chem. 2012;287(41):34801-8. doi: 10.1074/jbc.M112.385161. PubMed PMID: 22896697; PMCID: PMC3464582.

21. Jais JP, Haioun C, Molina TJ, Rickman DS, de Reynies A, Berger F, Gisselbrecht C, Briere J, Reyes F, Gaulard P, Feugier P, Labouyrie E, Tilly H, Bastard C, Coiffier B, Salles G, Leroy K, Groupe d'Etude des Lymphomes de lA. The expression of 16 genes related to the cell of origin and immune response predicts survival in elderly patients with diffuse large B-cell lymphoma treated with CHOP and rituximab. Leukemia. 2008;22(10):1917-24. doi: 10.1038/leu.2008.188. PubMed PMID: 18615101; PMCID: PMC2675107.

22. Prabhu P, Shandilya SM, Britan-Rosich E, Nagler A, Schiffer CA, Kotler M. Inhibition of APOBEC3G activity impedes double-stranded DNA repair. FEBS J. 2016;283(1):112-29. doi: 10.1111/febs.13556. PubMed PMID: 26460502; PMCID: PMC4712096.

23. Wang Y, Wu S, Zheng S, Wang S, Wali A, Ezhilarasan R, Sulman EP, Koul D, Alfred Yung WK. APOBEC3G acts as a therapeutic target in mesenchymal gliomas by sensitizing cells to radiation-induced cell death. Oncotarget. 2017;8(33):54285-96. doi: 10.18632/oncotarget.17348. PubMed PMID: 28903341; PMCID: PMC5589580.

24. Nowarski R, Wilner OI, Cheshin O, Shahar OD, Kenig E, Baraz L, Britan-Rosich E, Nagler A, Harris RS, Goldberg M, Willner I, Kotler M. APOBEC3G enhances lymphoma cell radioresistance by promoting cytidine deaminase-dependent DNA repair. Blood. 2012;120(2):366-75. doi: 10.1182/blood-2012-01-402123. PubMed PMID: 22645179; PMCID: PMC3398754.

25. Liao W, Hong SH, Chan BH, Rudolph FB, Clark SC, Chan L. APOBEC-2, a cardiac- and skeletal muscle-specific member of the cytidine deaminase supergene family. Biochem Biophys Res Commun. 1999;260(2):398-404. doi: 10.1006/bbrc.1999.0925. PubMed PMID: 10403781.

26. Harris RS, Petersen-Mahrt SK, Neuberger MS. RNA editing enzyme APOBEC1 and some of its homologs can act as DNA mutators. Mol Cell. 2002;10(5):1247-53. PubMed PMID: 12453430.

27. Lada AG, Krick CF, Kozmin SG, Mayorov VI, Karpova TS, Rogozin IB, Pavlov YI. Mutator effects and mutation signatures of editing deaminases produced in bacteria and yeast. Biochemistry (Mosc). 2011;76(1):131-46. PubMed PMID: 21568845; PMCID: PMC3906858.

28. Glickman RM, Rogers M, Glickman JN. Apolipoprotein B synthesis by human liver and intestine in vitro. Proc Natl Acad Sci U S A. 1986;83(14):5296-300. PubMed PMID: 3460091; PMCID: PMC323938.


29. Greeve J, Altkemper I, Dieterich JH, Greten H, Windler E. Apolipoprotein B mRNA editing in 12 different mammalian species: hepatic expression is reflected in low concentrations of apoB-containing plasma lipoproteins. J Lipid Res. 1993;34(8):1367-83. PubMed PMID: 8409768.

30. Kane JP, Hardman DA, Paulus HE. Heterogeneity of apolipoprotein B: isolation of a new species from human chylomicrons. Proc Natl Acad Sci U S A. 1980;77(5):2465-9. PubMed PMID: 6930644; PMCID: PMC349420.

31. Chen SH, Habib G, Yang CY, Gu ZW, Lee BR, Weng SA, Silberman SR, Cai SJ, Deslypere JP, Rosseneu M, et al. Apolipoprotein B-48 is the product of a messenger RNA with an organ-specific in-frame stop codon. Science. 1987;238(4825):363-6. PubMed PMID: 3659919.

32. Innerarity TL, Boren J, Yamanaka S, Olofsson SO. Biosynthesis of apolipoprotein B48-containing lipoproteins. Regulation by novel post-transcriptional mechanisms. J Biol Chem. 1996;271(5):2353-6. PubMed PMID: 8576187.

33. Muramatsu M, Sankaranand VS, Anant S, Sugai M, Kinoshita K, Davidson NO, Honjo T. Specific expression of activation-induced cytidine deaminase (AID), a novel member of the RNA-editing deaminase family in germinal center B cells. J Biol Chem. 1999;274(26):18470-6. PubMed PMID: 10373455.

34. Muramatsu M, Kinoshita K, Fagarasan S, Yamada S, Shinkai Y, Honjo T. Class switch recombination and hypermutation require activation-induced cytidine deaminase (AID), a potential RNA editing enzyme. Cell. 2000;102(5):553-63. PubMed PMID: 11007474.

35. Arakawa H, Hauschild J, Buerstedde JM. Requirement of the activation-induced deaminase (AID) gene for immunoglobulin gene conversion. Science. 2002;295(5558):1301-6. doi: 10.1126/science.1067308. PubMed PMID: 11847344.

36. Schrader CE, Linehan EK, Mochegova SN, Woodland RT, Stavnezer J. Inducible DNA breaks in Ig S regions are dependent on AID and UNG. J Exp Med. 2005;202(4):561-8. doi: 10.1084/jem.20050872. PubMed PMID: 16103411; PMCID: PMC2212854.

37. Koito A, Ikeda T. Intrinsic immunity against retrotransposons by APOBEC cytidine deaminases. Front Microbiol. 2013;4:28. doi: 10.3389/fmicb.2013.00028. PubMed PMID: 23431045; PMCID: PMC3576619.

38. Suspene R, Mussil B, Laude H, Caval V, Berry N, Bouzidi MS, Thiers V, Wain-Hobson S, Vartanian JP. Self-cytoplasmic DNA upregulates the mutator enzyme APOBEC3A leading to chromosomal DNA damage. Nucleic Acids Res. 2017;45(6):3231-41. doi: 10.1093/nar/gkx001. PubMed PMID: 28100701; PMCID: PMC5389686.


39. Starrett GJ, Luengas EM, McCann JL, Ebrahimi D, Temiz NA, Love RP, Feng Y, Adolph MB, Chelico L, Law EK, Carpenter MA, Harris RS. The DNA cytosine deaminase APOBEC3H haplotype I likely contributes to breast and lung cancer mutagenesis. Nat Commun. 2016;7:12918. doi: 10.1038/ncomms12918. PubMed PMID: 27650891; PMCID: PMC5036005.

40. Burns MB, Temiz NA, Harris RS. Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat Genet. 2013;45(9):977-83. doi: 10.1038/ng.2701. PubMed PMID: 23852168.

41. Roberts SA, Lawrence MS, Klimczak LJ, Grimm SA, Fargo D, Stojanov P, Kiezun A, Kryukov GV, Carter SL, Saksena G, Harris S, Shah RR, Resnick MA, Getz G, Gordenin DA. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat Genet. 2013;45(9):970-6. doi: 10.1038/ng.2702. PubMed PMID: 23852170; PMCID: PMC3789062.

42. Greeve J, Philipsen A, Krause K, Klapper W, Heidorn K, Castle BE, Janda J, Marcu KB, Parwaresch R. Expression of activation-induced cytidine deaminase in human B-cell non-Hodgkin lymphomas. Blood. 2003;101(9):3574-80. doi: 10.1182/blood-2002-08-2424. PubMed PMID: 12511417.

43. Hardianti MS, Tatsumi E, Syampurnawati M, Furuta K, Saigo K, Nakamachi Y, Kumagai S, Ohno H, Tanabe S, Uchida M, Yasuda N. Activation-induced cytidine deaminase expression in follicular lymphoma: association between AID expression and ongoing mutation in FL. Leukemia. 2004;18(4):826-31. doi: 10.1038/sj.leu.2403323. PubMed PMID: 14990977.

44. Conticello SG, Thomas CJ, Petersen-Mahrt SK, Neuberger MS. Evolution of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases. Mol Biol Evol. 2005;22(2):367-77. PubMed PMID: 15496550.

45. LaRue RS, Andresdottir V, Blanchard Y, Conticello SG, Derse D, Emerman M, Greene WC, Jonsson SR, Landau NR, Lochelt M, Malik HS, Malim MH, Munk C, O'Brien SJ, Pathak VK, Strebel K, Wain-Hobson S, Yu XF, Yuhki N, Harris RS. Guidelines for naming nonprimate APOBEC3 genes and proteins. J Virol. 2009;83(2):494-7. Epub 2008/11/07. doi: 10.1128/JVI.01976-08. PubMed PMID: 18987154; PMCID: PMC2612408.

46. Munk C, Willemsen A, Bravo IG. An ancient history of gene duplications, fusions and losses in the evolution of APOBEC3 mutators in mammals. BMC Evol Biol. 2012;12:71. doi: 10.1186/1471-2148-12-71. PubMed PMID: 22640020; PMCID: PMC3495650.

47. Wang X, Abudu A, Son S, Dang Y, Venta PJ, Zheng YH. Analysis of human APOBEC3H haplotypes and anti-human immunodeficiency virus type 1 activity. J Virol. 2011;85(7):3142-52. doi: 10.1128/JVI.02049-10. PubMed PMID: 21270145; PMCID: PMC3067873.

48. Qiao Q, Wang L, Meng FL, Hwang JK, Alt FW, Wu H. AID Recognizes Structured DNA for Class Switch Recombination. Mol Cell. 2017;67(3):361-73 e4. doi: 10.1016/j.molcel.2017.06.034. PubMed PMID: 28757211.

49. Betts L, Xiang S, Short SA, Wolfenden R, Carter CW, Jr. Cytidine deaminase. The 2.3 Å crystal structure of an enzyme: transition-state analog complex. J Mol Biol. 1994;235(2):635-56. PubMed PMID: 8289286.

50. Davidson NO, Anant S, MacGinnitie AJ. Apolipoprotein B messenger RNA editing: insights into the molecular regulation of post-transcriptional cytidine deamination. Curr Opin Lipidol. 1995;6(2):70-4. PubMed PMID: 7773570.

51. Pham P, Bransteitter R, Petruska J, Goodman MF. Processive AID-catalysed cytosine deamination on single-stranded DNA simulates somatic hypermutation. Nature. 2003;424(6944):103-7. doi: 10.1038/nature01760. PubMed PMID: 12819663.

52. Beale RC, Petersen-Mahrt SK, Watt IN, Harris RS, Rada C, Neuberger MS. Comparison of the differential context-dependence of DNA deamination by APOBEC enzymes: correlation with mutation spectra in vivo. J Mol Biol. 2004;337(3):585-96. PubMed PMID: 15019779.

53. Esnault C, Heidmann O, Delebecque F, Dewannieux M, Ribet D, Hance AJ, Heidmann T, Schwartz O. APOBEC3G cytidine deaminase inhibits retrotransposition of endogenous retroviruses. Nature. 2005;433(7024):430-3. doi: 10.1038/nature03238. PubMed PMID: 15674295.

54. Hultquist JF, Lengyel JA, Refsland EW, LaRue RS, Lackey L, Brown WL, Harris RS. Human and rhesus APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H demonstrate a conserved capacity to restrict Vif-deficient HIV-1. J Virol. 2011;85(21):11220-34. doi: 10.1128/JVI.05238-11. PubMed PMID: 21835787; PMCID: PMC3194973.

55. Burns MB, Lackey L, Carpenter MA, Rathore A, Land AM, Leonard B, Refsland EW, Kotandeniya D, Tretyakova N, Nikas JB, Yee D, Temiz NA, Donohue DE, McDougle RM, Brown WL, Law EK, Harris RS. APOBEC3B is an enzymatic source of mutation in breast cancer. Nature. 2013;494(7437):366-70. doi: 10.1038/nature11881. PubMed PMID: 23389445.

56. Larijani M, Petrov AP, Kolenchenko O, Berru M, Krylov SN, Martin A. AID associates with single-stranded DNA with high affinity and a long complex half-life in a sequence-independent manner. Mol Cell Biol. 2007;27(1):20-30. doi: 10.1128/MCB.00824-06. PubMed PMID: 17060445; PMCID: PMC1800660.


57. Rogozin IB, Diaz M. Cutting edge: DGYW/WRCH is a better predictor of mutability at G:C bases in Ig hypermutation than the widely accepted RGYW/WRCY motif and probably reflects a two-step activation-induced cytidine deaminase-triggered process. J Immunol. 2004;172(6):3382-4. PubMed PMID: 15004135.

58. Liddament MT, Brown WL, Schumacher AJ, Harris RS. APOBEC3F properties and hypermutation preferences indicate activity against HIV-1 in vivo. Curr Biol. 2004;14(15):1385-91. PubMed PMID: 15296757.

59. Chan K, Roberts SA, Klimczak LJ, Sterling JF, Saini N, Malc EP, Kim J, Kwiatkowski DJ, Fargo DC, Mieczkowski PA, Getz G, Gordenin DA. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat Genet. 2015;47(9):1067-72. doi: 10.1038/ng.3378. PubMed PMID: 26258849; PMCID: PMC4594173.

60. Holtz CM, Sadler HA, Mansky LM. APOBEC3G cytosine deamination hotspots are defined by both sequence context and single-stranded DNA secondary structure. Nucleic Acids Res. 2013;41(12):6139-48. doi: 10.1093/nar/gkt246. PubMed PMID: 23620282; PMCID: PMC3695494.

61. Iwatani Y, Takeuchi H, Strebel K, Levin JG. Biochemical activities of highly purified, catalytically active human APOBEC3G: correlation with antiviral effect. J Virol. 2006;80(12):5992-6002. PubMed PMID: 16731938.

62. Dickerson SK, Market E, Besmer E, Papavasiliou FN. AID mediates hypermutation by deaminating single stranded DNA. J Exp Med. 2003;197(10):1291-6. doi: 10.1084/jem.20030481. PubMed PMID: 12756266; PMCID: PMC2193777.

63. Pham P, Landolph A, Mendez C, Li N, Goodman MF. A biochemical analysis linking APOBEC3A to disparate HIV-1 restriction and skin cancer. J Biol Chem. 2013;288(41):29294-304. doi: 10.1074/jbc.M113.504175. PubMed PMID: 23979356; PMCID: PMC3795231.

64. Love RP, Xu H, Chelico L. Biochemical analysis of hypermutation by the deoxycytidine deaminase APOBEC3A. J Biol Chem. 2012;287(36):30812-22. doi: 10.1074/jbc.M112.393181. PubMed PMID: 22822074; PMCID: PMC3436324.

65. Harjes S, Solomon WC, Li M, Chen KM, Harjes E, Harris RS, Matsuo H. Impact of H216 on the DNA binding and catalytic activities of the HIV restriction factor APOBEC3G. J Virol. 2013;87(12):7008-14. Epub 2013/04/19. doi: 10.1128/JVI.03173-12. PubMed PMID: 23596292; PMCID: PMC3676121.

66. Ara A, Love RP, Chelico L. Different mutagenic potential of HIV-1 restriction factors APOBEC3G and APOBEC3F is determined by distinct single-stranded DNA scanning mechanisms. PLoS Pathog. 2014;10(3):e1004024. doi: 10.1371/journal.ppat.1004024. PubMed PMID: 24651717; PMCID: PMC3961392.


67. Witke W, Podtelejnikov AV, Di Nardo A, Sutherland JD, Gurniak CB, Dotti C, Mann M. In mouse brain profilin I and profilin II associate with regulators of the endocytic pathway and actin assembly. EMBO J. 1998;17(4):967-76. doi: 10.1093/emboj/17.4.967. PubMed PMID: 9463375; PMCID: PMC1170446.

68. Kwiatkowski DJ, Bruns GA. Human profilin. Molecular cloning, sequence comparison, and chromosomal analysis. J Biol Chem. 1988;263(12):5910-5. PubMed PMID: 3356709.

69. Schluter K, Schleicher M, Jockusch BM. Effects of single amino acid substitutions in the actin-binding site on the biological activity of bovine profilin I. J Cell Sci. 1998;111(Pt 22):3261-73. PubMed PMID: 9788869.

70. Karakesisoglou I, Schleicher M, Gibbon BC, Staiger CJ. Plant profilins rescue the aberrant phenotype of profilin-deficient Dictyostelium cells. Cell Motil Cytoskeleton. 1996;34(1):36-47. doi: 10.1002/(SICI)1097-0169(1996)34:1<36::AID-CM4>3.0.CO;2-G. PubMed PMID: 8860230.

71. Witke W. The role of profilin complexes in cell motility and other cellular processes. Trends Cell Biol. 2004;14(8):461-9. doi: 10.1016/j.tcb.2004.07.003. PubMed PMID: 15308213.

72. Bosco DA, LaVoie MJ, Petsko GA, Ringe D. Proteostasis and movement disorders: Parkinson's disease and amyotrophic lateral sclerosis. Cold Spring Harb Perspect Biol. 2011;3(10):a007500. doi: 10.1101/cshperspect.a007500. PubMed PMID: 21844169; PMCID: PMC3179340.

73. Norris F, Shepherd R, Denys E, U K, Mukai E, Elias L, Holden D, Norris H. Onset, natural history and outcome in idiopathic adult motor neuron disease. J Neurol Sci. 1993;118(1):48-55. PubMed PMID: 8229050.

74. Siddique T, Deng HX. Genetics of amyotrophic lateral sclerosis. Hum Mol Genet. 1996;5 Spec No:1465-70. PubMed PMID: 8875253.

75. Wu CH, Fallini C, Ticozzi N, Keagle PJ, Sapp PC, Piotrowska K, Lowe P, Koppers M, McKenna-Yasek D, Baron DM, Kost JE, Gonzalez-Perez P, Fox AD, Adams J, Taroni F, Tiloca C, Leclerc AL, Chafe SC, Mangroo D, Moore MJ, Zitzewitz JA, Xu ZS, van den Berg LH, Glass JD, Siciliano G, Cirulli ET, Goldstein DB, Salachas F, Meininger V, Rossoll W, Ratti A, Gellera C, Bosco DA, Bassell GJ, Silani V, Drory VE, Brown RH, Jr., Landers JE. Mutations in the profilin 1 gene cause familial amyotrophic lateral sclerosis. Nature. 2012;488(7412):499-503. doi: 10.1038/nature11280. PubMed PMID: 22801503; PMCID: PMC3575525.

76. Smith BN, Vance C, Scotter EL, Troakes C, Wong CH, Topp S, Maekawa S, King A, Mitchell JC, Lund K, Al-Chalabi A, Ticozzi N, Silani V, Sapp P, Brown RH, Jr., Landers JE, Al-Sarraj S, Shaw CE. Novel mutations support a role for Profilin 1 in the pathogenesis of ALS. Neurobiol Aging. 2015;36(3):1602.e17-27. doi: 10.1016/j.neurobiolaging.2014.10.032. PubMed PMID: 25499087; PMCID: PMC4357530.

Chapter II

The ssDNA Mutator APOBEC3A Is Regulated by Cooperative Dimerization


II.a. Abstract

Deaminase activity mediated by the human APOBEC3 family of proteins contributes to genomic instability and cancer. APOBEC3A is by far the most active member of this family and can cause rapid cell death when overexpressed, but how APOBEC3 activity is regulated at the molecular level is generally unclear. In this study, we elucidate the biochemical and structural basis of APOBEC3A substrate binding and specificity. We find that specific binding of single-stranded DNA is regulated by the cooperative dimerization of APOBEC3A. The crystal structure reveals this homodimer to be a symmetric domain swap of the N-terminal residues. This dimer interface provides insights into how cooperative protein-protein interactions may affect function in the APOBEC3 enzymes and provides a potential scaffold for strategies aimed at reducing their mutation load.


II.b. Introduction

Several exogenous and endogenous factors act as mutagens and contribute to carcinogenesis. The APOBEC3 proteins have been described as a major endogenous source of mutations in various types of cancer. Acting on chromosomal DNA, the APOBEC3 family of cytidine deaminases can introduce G-to-A hypermutations, as observed in clusters of APOBEC3-mediated mutational signatures in breast cancer genomes (3). APOBEC3B (A3B) was recently identified as a direct enzymatic source of this type of clustered mutation (4). In addition to breast cancer, several other cancers, such as bladder, head and neck, cervical, and lung cancer, exhibit a similar genomic mutation pattern (5, 6). Urothelial bladder cancer shows the most pronounced contribution of APOBEC3-mediated hypermutation to the overall mutation load (8). In lung cancer, APOBEC3-induced genomic instability appears to increase over time as the tumor progresses (9). APOBEC3A (A3A) shares the same genomic locus as A3B but is much more catalytically active and has been potentially linked to breast cancer (10, 11).

APOBEC3 proteins belong to a superfamily of deaminases and catalyze a zinc-dependent deamination of cytidine to uridine (12, 13). Common ancestry links the seven proteins of the contiguous human APOBEC3 locus (14) and allows classification based on phylogeny (15). A3A, A3C, and A3H comprise a single, catalytically active deaminase domain, whereas A3B, A3D, A3F, and A3G are two-domain proteins with an N-terminal pseudocatalytic deaminase domain (NTD) and a C-terminal catalytic domain (CTD). The spatial extent of the substrate-accommodating region around the active site appears to determine whether a deaminase domain is catalytically active (16). The APOBEC3 proteins act on ssDNA to introduce strand-coordinated G-to-A point mutations. These mutations not only compromise the informational integrity of DNA but may also lead to double-strand breaks (4, 17), contributing to the genomic damage observed in cancer genomes (18, 19).

Four members of the APOBEC3 family (A3D, A3F, A3G, and A3H) exert strong selective pressure on HIV-1 in the absence of Vif (20-32). These proteins are incorporated into budding virions and, upon subsequent infection of a target cell, deaminate the newly reverse-transcribed viral ssDNA, leading either to direct degradation of the highly mutated product (33) or to detrimental G-to-A mutations (23, 34).

A3G and A3F form high-molecular-mass complexes with polynucleotides that are relevant for their biological function (35, 36). The four antiretroviral APOBEC3 proteins were recently shown to form multimeric complexes in living cells (37). Over the last few years, atomic force microscopy (AFM) studies have provided insights into the mechanistic details of this complex formation (38-40). The crystal structures of A3C (41), A3F-CTD (42), and A3G-CTD (43-45), together with the nuclear magnetic resonance (NMR) structures of A3G-CTD (46, 47) and A3A (48), have provided further insights into the structural factors influencing this activity. However, significant details are still missing because there are no APOBEC3-ssDNA complex structures that could illuminate the molecular basis of complex formation.

The functional oligomerization state of A3 enzymes, and whether cooperativity contributes to their DNA-binding affinity, are not clear. The two-domain A3G binds ssDNA both as a monomer and as a dimer, but dimerization is not induced by ssDNA binding (39). This is consistent with the observation that A3G binds its substrate noncooperatively (49, 50) with high affinity (KD ~ 50–75 nM) (49, 51, 52).

The single-domain A3A has been reported to be monomeric both in vitro, in solution as detected by AFM (40), and in living cells as monitored by fluorescence fluctuation spectroscopy (FFS) (37). Depending on the experimental conditions, the binding affinity of catalytically active A3A for ssDNA substrate is reported to be 100-fold (2, 48) or 50-fold (53) lower than that of A3G. These reports are further complicated by the observation that catalytically active A3A is the most potent enzyme of the A3 family, with deamination rates up to 10-fold above those of APOBEC3G (54).

Catalytically inactive A3A, on the other hand, dimerizes readily on the target ssDNA substrate (40). Moreover, catalytically inactive A3A binds substrates with an affinity similar to that of A3G (55). The fundamental reasons for these apparent discrepancies are not well understood.

In this study, we explore the structural and biochemical basis of A3A ssDNA-binding activity and the direct functional impact of cooperative dimerization on binding affinity. We developed a novel fluorescence anisotropy-based high-throughput binding assay, solved a high-resolution crystal structure, and generated a range of A3A mutants to demonstrate that catalytically inactive A3A binds ssDNA with high affinity and specificity while exhibiting a high degree of cooperativity. Cooperative dimerization of APOBEC3A provides fundamental insights into the function of the entire APOBEC3 family of proteins and into their relevance for anti-retroviral and anti-cancer therapies.


II.c. Results

II.c.1. APOBEC3A Preferentially Binds the TTC Trinucleotide Sequence.

APOBEC3 enzymes recognize target cytidines for deamination within the context of specific trinucleotide motifs. To quantitatively assess motif specificity, we developed a binding assay using short, fluorescently labeled 15-mer oligonucleotides, each with a single target site for deamination. A catalytically inactive mutant of A3A (E72A/C171A) (Figure II.1A; Table II.1) was used to determine the dissociation constants for several trinucleotide deamination motifs. This variant of A3A binds the target trinucleotide motif (5’-TTC-3’) with high affinity (KD = 77 ± 3 nM) and strong specificity compared with an adenine polynucleotide. This affinity is comparable to that reported for A3G (49). Other cytidine-containing trinucleotide motifs known to be A3A substrates are bound with similar affinities: KD = 114 ± 4 nM for 5’-TCC-3’ and KD = 100 ± 4 nM for 5’-CCC-3’. These data are consistent with previously reported values for trinucleotide motifs (53), but in our observations 5’-TTC-3’ is slightly preferred, with a KD that is ~23–37 nM lower.
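The KD and Hill-coefficient values in this section come from fitting fluorescence anisotropy titrations, but the fitting procedure is not spelled out here. The following is only a minimal sketch under stated assumptions: anisotropy converts linearly to the fraction of labeled DNA bound (no brightness change on binding), protein is in large excess over the labeled oligonucleotide so that free protein is approximately total protein, and binding follows a Hill model. For transparency it uses the classic Hill-plot linearization instead of the nonlinear least-squares fit that would normally be preferred. All function names and the anisotropy limits are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def fraction_bound(r: float, r_free: float, r_bound: float) -> float:
    """Convert a measured anisotropy r into the fraction of labeled DNA bound.

    Assumes the fluorophore's brightness does not change on binding;
    otherwise a quantum-yield correction would be needed.
    """
    return (r - r_free) / (r_bound - r_free)


def hill_fraction(protein_nm: float, kd_nm: float, n_h: float) -> float:
    """Hill model: fraction of DNA bound at a given protein concentration (nM)."""
    return protein_nm**n_h / (kd_nm**n_h + protein_nm**n_h)


def fit_hill_plot(protein_nm: Sequence[float], theta: Sequence[float]) -> Tuple[float, float]:
    """Estimate (KD, nH) by linear regression on the Hill plot.

    log10(theta / (1 - theta)) = nH * log10([P]) - nH * log10(KD)
    Points near 0 or 1 are dropped because the log-odds transform amplifies their noise.
    """
    xs: List[float] = []
    ys: List[float] = []
    for p, t in zip(protein_nm, theta):
        if 0.05 < t < 0.95:
            xs.append(math.log10(p))
            ys.append(math.log10(t / (1.0 - t)))
    if len(xs) < 2:
        raise ValueError("need at least two points with 0.05 < theta < 0.95")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n_h = sxy / sxx                     # slope of the Hill plot = Hill coefficient
    intercept = mean_y - n_h * mean_x   # intercept = -nH * log10(KD)
    kd_nm = 10 ** (-intercept / n_h)
    return kd_nm, n_h


if __name__ == "__main__":
    # Synthetic, noise-free titration generated with KD = 77 nM and nH = 2
    # (the TTC values above); the anisotropy limits are hypothetical.
    conc = [10, 20, 40, 80, 160, 320, 640]
    r_free, r_bound = 0.05, 0.15
    aniso = [r_free + (r_bound - r_free) * hill_fraction(c, 77.0, 2.0) for c in conc]
    theta = [fraction_bound(r, r_free, r_bound) for r in aniso]
    kd, n = fit_hill_plot(conc, theta)
    print(f"KD = {kd:.1f} nM, nH = {n:.2f}")  # prints: KD = 77.0 nM, nH = 2.00
```

With real, noisy data, a direct nonlinear fit of the untransformed anisotropy values is usually more robust. The linearized form is shown here only because it makes the meaning of the slope (nH) and intercept (KD) explicit.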

Figure II.1. APOBEC3A Binding Specifically to Trinucleotide Deamination Motifs. A. Fluorescence anisotropy measurements show how introduction of a single TTC motif into a polyA background leads to high-affinity binding. Other motifs have a similar effect, but TTC appears to be the preferred substrate. B. APOBEC3A binding to an ideal substrate consisting of a polyT oligomer containing a single cytidine residue. Varying the number of cytidines has no pronounced effect on APOBEC3A affinity. C. Difference in affinity for the target deamination site versus a polyT oligomer. Binding is cooperative, with higher affinity for the oligomer containing the TTC motif (polyT_1C). APOBEC3A has the same affinity for the deaminated base (polyT_1U) as for the polyT background; binding is specific to the substrate and not to the product. The error bars represent the SD calculated for each measurement point from three independent repeats.

Table II.1. ssDNA Binding Affinity and Cooperativity of APOBEC3A and Interface Mutants

The Hill coefficient (nH) is expected to be 1 for noncooperative binding and to approach 2 for fully cooperative binding by a dimer. A3A refers to the inactive variant E72A. Fold-changes in KD (second column) are relative to the polyT_1C substrate (bold type). See also Table S1 for oligomer sequences.

Using substrates based on the high-affinity trinucleotide (5’-TTC-3’), a polythymidine oligomer containing two cytidine bases was identified as the tightest binding partner for A3A (Figure II.1B; Table II.1), with KD = 44 ± 2 nM. Varying the number of consecutive cytidines from one to four did not significantly alter substrate affinity. An oligonucleotide consisting solely of thymidine, lacking cytidine, still binds the enzyme (Figure II.1C), but with an affinity an order of magnitude weaker than in the presence of a single cytidine (KD = 502 ± 27 nM).

A second parameter that changes upon introduction of a target site is the Hill coefficient of the binding curve. When the substrate contains a cytidine, the Hill coefficient is ~2 (Figures II.1A and II.1B; Table II.1), whereas it is ~1 in the absence of a cytidine. This difference may imply that APOBEC3A is in a monomeric, low-affinity state in the absence of a target cytidine but assumes a dimeric, high-affinity state when encountering a deamination substrate. Such a binding mechanism predicts that the affinity for the product of the deamination reaction would also be weaker. To test this hypothesis, we replaced the cytidine base with a uracil; in this case, both the affinity and the Hill coefficient are similar to those for the all-thymidine oligomer (1U: KD = 434 ± 35 nM, h = 0.80 ± 0.04; TTT: KD = 502 ± 27 nM, h = 0.96 ± 0.04) (Figure II.1C; Table II.1).
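The distinction between a Hill coefficient of ~2 and one of ~1 can be made concrete with the Hill equation, written here with the symbols used above (KD and nH, with [P] the protein concentration). Whether the fits in Table II.1 used exactly this form is an assumption. Under this model, the concentration range needed to move from 10% to 90% saturation depends only on nH:

```latex
\begin{aligned}
\theta &= \frac{[P]^{n_H}}{K_D^{n_H} + [P]^{n_H}}
  \;\Longleftrightarrow\; \frac{\theta}{1-\theta} = \left(\frac{[P]}{K_D}\right)^{n_H} \\
[P]_{\theta=0.1} &= K_D \, 9^{-1/n_H}, \qquad [P]_{\theta=0.9} = K_D \, 9^{1/n_H} \\
\frac{[P]_{\theta=0.9}}{[P]_{\theta=0.1}} &= 9^{2/n_H} = 81^{1/n_H}
\end{aligned}
```

For nH = 1 (as for the all-thymidine and uridine-containing oligomers), the titration spans an 81-fold concentration range. For nH = 2 (the cytidine-containing substrates), it spans only 9-fold, which is the sharper transition expected if binding is coupled to dimerization.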


II.c.2. Crystal Structure of APOBEC3A.

The crystal structure of A3A-E72A-C171A was determined to 2.85 Å resolution with two molecules per asymmetric unit (Figure II.2A) forming a homodimer. The A3A dimer crystallized in space group P6₅22 and refined with good statistics (Table II.2).

As in the NMR solution structure (48), A3A has a canonical DNA cytosine deaminase fold composed of five β-strands, six α-helices, and the catalytic zinc-binding site. The zinc ion is coordinated directly by H70, C101, and C106. Because the catalytic glutamate is mutated (E72A), it is also associated with what appears to be a second zinc ion located 3.3 Å away, which appears to stabilize the geometry around the catalytic zinc (Figure II.2B). This type of site, with two zinc ions in close proximity, resembles the cocatalytic zinc sites found in class III hydrolases (56). Comparison with the NMR ensemble of catalytically active A3A (48) shows that the loop connecting the zinc-coordinating cysteine residues has moved. This shift is necessary to conserve the active-site geometry in the presence of the E72A mutation and allows recruitment of T31 to coordinate the second ion. The loop contains 104-WG-105, an insertion unique to A3A and the closely related C-terminal domain of A3B. All zinc-coordinating residues are located in helices α2 and α3, which also provide a structural backbone for the catalytic pocket.

Figure II.2. Crystal Structure of APOBEC3A. A. Crystallographic dimer of APOBEC3A. The two monomers are in orange and cyan, with the metal ions at the active sites depicted as steel gray spheres and chloride ions at the dimerization interface shown as green spheres. B. Close-up view of the active site. Recruitment of a second metal via a threonine residue (T31) protects active-site geometry in the presence of a mutation of the catalytic glutamate residue. C. Surface representation of the dimeric structure in (A) reveals a groove connecting both active sites via the dimer interface. D. SiteMap prediction (7) of putative binding sites (blue) on a wireframe representation of the surface in (C) matches the groove connecting both active sites. E. Electrostatic potential map, same orientation as (A) and (C). Positive (blue) and negative (red) charges are indicated on the surface. The groove connecting the two active sites is mainly positively charged. See also Figure S1 for a superposition of both monomers.

Table II.2. Crystallographic Statistics for APOBEC3A Structure

The crystal structure reveals that A3A forms a dimer that defines the asymmetric unit. This dimer is formed via a symmetric domain swap between two A3A molecules. The root-mean-square deviation (RMSD) between the two molecules is 0.64 Å, with the largest deviations at residues 25–27, 50, and 86–87 (Cα-Cα distances of 1.3 Å) (Figure II.3). Seventeen residues forming the N-terminal loop regions of both molecules engage in an intimate handshake (Figure II.2A) that buries >1,000 Å² and acts as the dimerization interface. The dimerization interface lies away from the active sites and coordinates three metal ions through residues H11 and H56 from both chains and a network of water molecules. In addition, the K30 side chains from both molecules are coordinated via a water molecule at the interface, bridging the two molecules (Figure II.4A).
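The 0.64 Å RMSD and the per-residue Cα deviations plotted in Figure II.3 are standard superposition metrics. Below is a minimal sketch of how they can be computed. It assumes the two monomers have already been optimally superposed (for example, by the Kabsch algorithm in a crystallographic package) and that Cα atoms are paired by residue number. The function names, the toy coordinates, and the 1.0 Å cutoff are hypothetical; they are not taken from the deposited structure or from the analysis actually used here.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_deviations(chain_a: Sequence[Coord], chain_b: Sequence[Coord]) -> List[float]:
    """Per-residue C-alpha distances (Å) between two chains that are ALREADY superposed.

    Assumes the i-th coordinate in each chain belongs to the same residue number.
    """
    if len(chain_a) != len(chain_b):
        raise ValueError("chains must contain the same number of matched C-alpha atoms")
    return [math.dist(a, b) for a, b in zip(chain_a, chain_b)]


def rmsd(deviations: Sequence[float]) -> float:
    """Root-mean-square of the per-residue deviations."""
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def most_deviating(residues: Sequence[int], deviations: Sequence[float], cutoff: float) -> List[int]:
    """Residue numbers whose C-alpha deviation is at or above the cutoff (Å)."""
    return [r for r, d in zip(residues, deviations) if d >= cutoff]


if __name__ == "__main__":
    # Toy coordinates (hypothetical): four residues, one of which is shifted
    # by 1.3 Å along x in chain B.
    residues = [24, 25, 26, 27]
    chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    chain_b = [(0.0, 0.0, 0.0), (5.1, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    dev = ca_deviations(chain_a, chain_b)
    print(f"RMSD = {rmsd(dev):.2f} Å; most deviating: {most_deviating(residues, dev, 1.0)}")
```

Because RMSD averages squared deviations over all matched atoms, a few local shifts of ~1.3 Å are compatible with a sub-ångström global RMSD. This is why the per-residue view in Figure II.3 adds information beyond the single number.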

On the surface of the dimer (Figure II.2C), a symmetric groove 36 Å in length connects the active-site regions of the two molecules via the dimerization interface. SiteMap (7) identifies this groove as a potential ligand-binding site with a calculated pocket volume of ~800 Å³ (Figure II.2D). Based on the surface electrostatic potential, the groove is mostly positively charged (Figure II.2E). Residues H16, R28, K30, H56, and K60 contribute prominently to both the accessible pocket volume and the charge.


Figure II.3. Superposition of the Two Monomers in the Crystallographic Dimer. The superposition is colored by (A) monomer (orange and cyan) and (B) α-carbon deviation between the two monomers, increasing from dark to lighter shades of purple. Regions displaying the highest deviations are indicated by residue numbers.


Figure II.4. Residues Contributing to Interface Formation Are Determinants for Cooperativity and Affinity. A. The dimer interface in the APOBEC3A crystal structure. Residues forming the dimerization interface are shown as sticks, and zinc (gray), chloride (green), and water (red) as spheres. Side-chain oxygen and nitrogen atoms are colored red and green, respectively. B. Bar graphs show how point mutations at the highlighted sites affect KD and the Hill coefficient. WT represents data collected for A3A-E72A-C171A. KD (rel.) represents the fold change in KD relative to A3A-E72A-C171A binding polyT_1C. The error bars represent the SD from three independent repeats.


II.c.3. Assessing the Functional Significance of the Crystallographic Dimer.

The crystal structure allowed us to identify and test potential determinants of substrate recognition and binding. Based on the crystallographic dimer, we engineered a series of mutant constructs to probe whether the observed dimer plays a role in substrate binding. If the crystallographic dimer indeed corresponds to a biochemically relevant structure, then the designed mutations should affect substrate recognition and thereby provide insights into it.

Because the crystallographic interface brings the N-terminal 17 residues into close proximity, we tested an N-terminally truncated version of A3A to investigate whether the crystallographic dimer is the structure responsible for the observed cooperativity. Under the same experimental conditions as above, the truncated protein lost its high affinity for the ideal substrate (Figure II.5), and both the expression yield and the solubility of the construct were severely compromised.

To confirm that the active-site rearrangement caused by the E72A mutation was not responsible for the observed differential substrate recognition, we repeated the binding experiment presented in Figure II.1C with an E72Q variant. A3A E72Q exhibits the same marked increase in affinity and cooperativity upon encountering the target cytidine (Figure II.6; Table II.1).

Figure II.5. N-terminally Truncated APOBEC3A Mutant Binding to the Ideal Substrate. Fluorescence anisotropy measurements show that deleting the N-terminus (A3Ad17) impairs specific, high-affinity binding of APOBEC3A to polyT_1C (ideal substrate). The higher protein concentrations used in Figure II.1 could not be reached because of the instability of the A3Ad17 protein, which precluded accurate determination of KD or the Hill coefficient.

Figure II.6. Binding of the Catalytically Inactive E72Q APOBEC3A Variant. Fluorescence anisotropy measurements of the catalytically inactive variant E72Q show the same substrate-dependent cooperativity as observed for E72A.

The crystal structure was further used to narrow down the determinants of dimerization and to identify sites amenable to single amino acid substitutions, based on the charge distribution and interatomic distances described above. A series of mutant proteins was then engineered to measure the contribution of these individual amino acids to cooperativity and affinity (Figure II.4). From the structure, two pairs of residues were identified that might contribute to dimer formation. The first pair, H11 and H56, forms the base of the dimerization interface (Figure II.4A). Mutations at these residues not only disrupt the protein-protein interface but also severely affect the cooperativity of binding (H11A, KD = 855 ± 311 nM, h = 0.7 ± 0.1; H56A, KD = 86 ± 13 nM, h = 0.9 ± 0.1). A second pair of charged residues, H16 and K30, located closer to the surface of the groove (Figure II.4A), also markedly reduces cooperativity and greatly compromises affinity when mutated (H16A, KD = 584 ± 92 nM, h = 1.2 ± 0.2; K30E, KD = 284 ± 20 nM, h = 1.2 ± 0.1). H56A was the only mutation that drastically reduced cooperativity while maintaining substrate affinity. The results show that both sets of interactions predicted by the dimerization interface observed in the crystal structure are critical to cooperative DNA recognition in solution.
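The classification in the preceding paragraph (which mutations lose cooperativity, which lose affinity, and which lose both) can be restated as a small decision rule. The sketch below uses only the mean KD and Hill values quoted above. The reference KD of 56 nM for polyT_1C is taken from the Discussion. The 2-fold affinity cutoff and the h < 1.5 cooperativity cutoff are hypothetical choices made for illustration, not criteria stated in this study, and Table II.1 may normalize its fold-changes differently.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds chosen for illustration only.
AFFINITY_LOSS_FOLD = 2.0   # KD more than 2-fold weaker counts as loss of affinity
COOPERATIVE_MIN_H = 1.5    # Hill coefficient below this counts as loss of cooperativity

# Assumed reference: catalytically inactive A3A on polyT_1C (KD quoted in the Discussion).
REFERENCE_KD_NM = 56.0


@dataclass(frozen=True)
class BindingResult:
    variant: str
    kd_nm: float  # mean KD reported in the text (SD omitted)
    hill: float   # mean Hill coefficient reported in the text


def classify(result: BindingResult, ref_kd_nm: float = REFERENCE_KD_NM) -> Tuple[float, bool, bool]:
    """Return (fold change in KD, affinity lost?, cooperativity lost?)."""
    fold = result.kd_nm / ref_kd_nm
    return fold, fold > AFFINITY_LOSS_FOLD, result.hill < COOPERATIVE_MIN_H


def separation_of_function(results: List[BindingResult]) -> List[str]:
    """Variants that lose cooperativity but keep near-reference affinity."""
    hits = []
    for r in results:
        _, affinity_lost, cooperativity_lost = classify(r)
        if cooperativity_lost and not affinity_lost:
            hits.append(r.variant)
    return hits


MUTANTS = [
    BindingResult("H11A", 855.0, 0.7),
    BindingResult("H56A", 86.0, 0.9),
    BindingResult("H16A", 584.0, 1.2),
    BindingResult("K30E", 284.0, 1.2),
]

if __name__ == "__main__":
    for m in MUTANTS:
        fold, aff, coop = classify(m)
        print(f"{m.variant}: {fold:.1f}-fold KD, affinity lost={aff}, cooperativity lost={coop}")
    print("separation of function:", separation_of_function(MUTANTS))
```

Under these assumed cutoffs, only H56A (about 1.5-fold weaker KD, h = 0.9) falls into the "cooperativity lost, affinity kept" class. This matches the statement in the text, although the exact boundaries are arbitrary.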


II.d. Discussion

In this study, we characterized the cooperativity of specific A3A binding to ssDNA substrate, determined the crystal structure of the A3A dimer, and engineered mutations to interrogate the functional implications of this dimer interface. We found that A3A recognizes substrate cooperatively and with high affinity and specificity. Key to this recognition is the A3A dimer, which forms an extensive, positively charged groove connecting the active sites of both monomers. A3A exists as both a monomer and a dimer, in solution and bound to substrate (40, 55), yet the functional implication of the transition between these states was missing. The identification of this substrate-binding groove and the mutational analysis provide key insights into the structural basis of A3A substrate specificity and help to explain the previously reported apparent discrepancies regarding A3A function.

The contiguous, positively charged groove on the A3A surface is consistent with DNA recognition and binding. In fact, many residues around the active-site region appear to be involved in substrate binding. Our structure unifies many of the residues previously associated with ssDNA binding by A3A (1, 2) by mapping them to a contiguous band on the molecular surface that crosses the dimer interface (Figure II.7; Figure II.8). This bridges the data from the two previous studies, in which relatively few of the identified residues overlapped (Figure II.7, yellow). The differences are most likely due to the experimental conditions (Figure II.7, green, (2) and red, (1)). Our crystal structure demonstrates the consistency of both studies, in that all residues identified by either study lie within the groove. Indeed, two residues outside the active site that were identified in both studies, K30 and K60, which respectively bridge the dimer interface and contribute to the groove’s charge, affect substrate affinity and deamination (1, 2). The H56A mutant, which has a significant impact on dimerization but a much smaller effect on substrate affinity, seems to allow a separation of function and shows that high affinity can also be achieved in the monomeric state.

Figure II.7. Residues Implicated in Deamination Activity. Shown are substrate-binding surfaces determined independently by enzymatic activity (red) (1) or by chemical shift perturbation accompanied by enzymatic activity (green) (2). Residues identified in both studies are colored yellow. See also Figure S4.

Figure II.8. Residues Implicated in DNA Binding. Residues identified as involved in DNA binding by (A) Didier Trono’s group (1) and (B) Angela Gronenborn’s and Judith Levin’s groups (2) are colored red on the APOBEC3A dimeric structure. The two monomers are in cyan and orange, and the metal ions are shown as spheres as in Figures II.2–II.4.

Although A3A can and does exist in monomeric form in solution and in the cell (37, 40, 55), our analysis strongly implies that the high-affinity DNA-binding functional form is a homodimer formed by swapping the N-terminal loop. A naturally occurring isoform of A3A that lacks the initial 12 residues (57, 58) was reported to be 5-fold less active than the full-length enzyme in an in vitro deamination assay (54). This relatively modest reduction in activity could arise because key interface residues, such as H16, K30, and H56, lie outside the first 12 residues.

Dimerization via the N-terminus could be communicated through K30 and H56, which are involved in interface formation, to the neighboring residues T31 and N57, which form the pocket containing the catalytic site. Residues H56 and K30, which form the top of the groove at the dimerization interface, are also positioned to interact favorably with the negatively charged phosphates of the substrate. They could thus communicate not only dimerization but, more specifically, dimerization on the substrate to the active-site region.


The necessity for A3A to form a cooperative dimer for high-affinity binding explains the apparent discrepancy between its high enzymatic activity and the great variation in reported substrate affinities (2, 48, 53, 59). As we show, affinity drops by an order of magnitude, and the Hill coefficient drops dramatically, when binding to product is compared with binding to substrate. Since most binding experiments to determine affinity are conducted at equilibrium and A3A has very fast deamination rates, experiments done with active enzyme will observe binding to the reaction product instead of the substrate. Earlier AFM observations (40) corroborate weaker, monomeric binding to product: after incubation with substrate, active A3A was observed predominantly in monomeric form, whereas inactive A3A formed dimers on the substrate. A recent study using the catalytically deficient E72A mutant also measured a similarly high affinity for the ssDNA substrate (55). With active A3A, the substrate would quickly have been converted to product and would therefore appear to bind with low affinity. Taken together, the differences between binding product and binding substrate explain why high-affinity, dimeric binding could only be observed for catalytically inactive enzyme.
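The argument above, that an equilibrium measurement with active A3A effectively reports product binding, can be illustrated with a deliberately crude model. This sketch assumes first-order conversion of substrate to product at a hypothetical rate. It treats the observed signal as a population-weighted average of the substrate (polyT_1C) and product (polyT_1U) binding curves, using the mean values reported in this chapter (the substrate Hill coefficient is given only as "~2"). The model ignores competition between the two species, protein depletion, and any coupling between binding and catalysis, so it shows the trend only. It is not a model used in this study.

```python
import math

# Means reported in this chapter for the catalytically inactive enzyme.
# The substrate Hill coefficient is given only as "~2", so 2.0 is an approximation.
SUBSTRATE_KD_NM, SUBSTRATE_H = 56.0, 2.0   # polyT_1C
PRODUCT_KD_NM, PRODUCT_H = 434.0, 0.80     # polyT_1U


def hill(protein_nm: float, kd_nm: float, n_h: float) -> float:
    """Hill model: fraction of DNA bound at a given protein concentration (nM)."""
    return protein_nm**n_h / (kd_nm**n_h + protein_nm**n_h)


def apparent_fraction_bound(protein_nm: float, minutes: float, k_per_min: float) -> float:
    """Crude mixture model of an equilibrium read-out taken with an active enzyme.

    Substrate is converted to product with a hypothetical first-order rate constant
    k_per_min, and the signal is the population-weighted average of substrate and
    product binding. Competition and free-protein depletion are ignored.
    """
    substrate_left = math.exp(-k_per_min * minutes)
    return (substrate_left * hill(protein_nm, SUBSTRATE_KD_NM, SUBSTRATE_H)
            + (1.0 - substrate_left) * hill(protein_nm, PRODUCT_KD_NM, PRODUCT_H))


if __name__ == "__main__":
    k = 0.5  # hypothetical deamination rate constant, 1/min
    for t in (0, 2, 10, 30):
        print(f"t = {t:>2} min: apparent fraction bound at 100 nM = "
              f"{apparent_fraction_bound(100.0, t, k):.2f}")
```

At 100 nM protein, the apparent fraction bound in this toy model falls from about 0.76 (substrate curve) toward about 0.24 (product curve) as the incubation proceeds. An experimenter fitting such data would therefore infer a weaker, less cooperative interaction than the one the enzyme actually makes with its substrate.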

A cooperative model of APOBEC3 activity also explains how the required fidelity of substrate recognition can be achieved despite high deamination activity. As the most active deaminase of the APOBEC3 family, A3A also serves as the most effective restrictor of foreign DNA (57). A3A is less discriminatory than other APOBEC3 enzymes and is the only known family member reported to deaminate modified cytidine residues (54, 60, 61), and it has also been implicated in demethylation pathways (62, 63). At the same time, the random nature of mutations introduced by A3A can be very detrimental to cell viability (4, 57) or can be a source of mutations in cancer, as has been shown for its close relative A3B, whose catalytic domain shares 97% similarity with A3A. The cooperative model of substrate interaction for A3A may have implications for how the mutation load inflicted by A3A is regulated in vivo. At low concentrations and in the dense milieu of the cell, the enzyme would encounter a short, exposed ssDNA substrate mostly as a monomer with only modest affinity. At higher concentrations and in the presence of longer stretches of foreign ssDNA, A3A can act with very high specificity and affinity on a target sequence, leading to the observed high rates of deamination. In addition, the change in binding affinity for a stretch of thymidine bases (KD = 502 ± 27 nM) upon introduction of a single cytidine (KD = 56 ± 2 nM) suggests a possible mechanism of substrate binding: A3A initially binds the thymidine bases with low affinity, then identifies the target cytidine and binds more tightly.
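The effect of a single cytidine on affinity can also be expressed as a difference in apparent binding free energy. This assumes the assays were run near 25 °C (the temperature is not stated in this section) and treats the reported KD values as apparent equilibrium dissociation constants, even though the cytidine-containing curves are cooperative. With these assumptions, the two values quoted above give:

```latex
\Delta\Delta G_{\mathrm{app}} = RT \ln\frac{K_D^{\,\mathrm{polyT}}}{K_D^{\,\mathrm{polyT\_1C}}}
= \left(1.987 \times 10^{-3}\ \mathrm{kcal\,mol^{-1}\,K^{-1}}\right)(298.15\ \mathrm{K})
  \ln\frac{502\ \mathrm{nM}}{56\ \mathrm{nM}}
\approx 0.593 \times 2.19 \approx 1.3\ \mathrm{kcal\,mol^{-1}}\ (\approx 5.4\ \mathrm{kJ\,mol^{-1}})
```

A roughly 9-fold change in KD (502/56 ≈ 9.0) therefore corresponds to only about 1.3 kcal/mol of additional apparent binding free energy contributed by recognition of the target cytidine.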

The cooperative binding model may also provide insights into the evolution of the APOBEC3 domain structure. Four of the seven members of the human APOBEC3 protein family (A3B, A3D, A3F, and A3G) comprise two cytidine deaminase domains connected via a short linker. This repertoire of double-domain APOBEC3 proteins likely evolved from single-domain precursors through a series of events (14, 64). The evolutionary linkage between the members of the modern primate APOBEC3 locus can be understood from a prototypical set of single-domain APOBEC3 proteins, one for each Z domain subtype (15). In primates, A3A is the sole member comprising a single Z1 domain, sharing phylogenetic origin with the catalytic domains of A3B and A3G. Both of these enzymes possess a pseudocatalytic N-terminal Z2 domain, which is required for efficient substrate binding but does not catalyze a deamination reaction (65, 66). The functional A3A dimer identified here might explain why APOBEC3 proteins formed two-domain fusions. With a single target site, one of the A3A monomers does not act on the target cytidine but is involved in substrate binding. The evolution of double-domain enzymes appears to have allowed a separation of function between binding and catalysis, leading to less active proteins that became more specific for their targets. The interdomain linker region has recently been shown to contain the determinants of processivity in A3G and A3F, and alterations in the linker can impair enzyme function (67).

Damage in ssDNA has been shown to be a major source of mutation clusters in cancer (18, 19) and can also contribute to the diversity of viral genomes (68, 69). Targeting the activity of deaminases may therefore open new strategies for treating infectious diseases and cancer. The insights into the structural mechanism of substrate binding described in this study could help guide efforts to alleviate the detrimental mutagenic activity of cellular deaminases.


II.e. Methods

II.e.1. Expression and Purification of APOBEC3A-E72A-C171A.

Escherichia coli BL21 DE3 Star (Stratagene) cells were transformed with a pColdIII vector (Takara Biosciences) encoding a glutathione S-transferase (GST)-based fusion construct. The E72A mutation was chosen to render the protein catalytically inactive, and C171A to increase solubility. Expression was carried out at 16 °C for 22 hr in lysogeny broth medium containing 1 mM isopropyl β-D-1-thiogalactopyranoside and 100 µg/ml ampicillin. Cells were pelleted, resuspended in purification buffer (50 mM Tris-HCl [pH 8.0], 300 mM NaCl, 1 mM DTT) and disrupted by sonication. Cellular debris was removed by centrifugation (45,000 × g, 30 min, 4 °C). The fusion protein was isolated using glutathione Sepharose (GE Healthcare), and the GST tag was removed by PreScission protease digestion overnight at 4 °C. Size-exclusion chromatography on a HiLoad 16/60 Superdex 75 column (GE Healthcare) served as the final purification step.


II.e.2. Crystallization and Structural Data Analysis of APOBEC3A-E72A-C171A.

The protein solution was concentrated to 19.5 mg/ml in crystallization buffer (50 mM Tris-HCl [pH 8.0], 150 mM NaCl, 1 mM DTT, 50 µM ZnCl2), and crystals were grown at 4 °C in crystallization solution (100 mM sodium cacodylate [pH 6.0], 40% 2-methyl-2,4-pentanediol, 8% PEG8000) by sitting-drop vapor diffusion over 3 years.

Diffraction experiments were conducted at 100 K using a rotating anode X-ray source (Rigaku Micromax-007 HF) and a charge-coupled device detector (Rigaku Saturn 944).

Data were indexed and scaled using HKL2000 (70). CC1/2 and CC* were used to determine the resolution cutoff of the data. The molecular replacement solution was calculated with Phaser (71) using PDB ID 3V4K as the search model (44). The structure was rebuilt using phenix.autobuild (72). An automated pipeline (REdiii) was used to process data from subsequent diffraction experiments (73). Multiple crystals were exposed for 2 to 4 min per oscillation image; only one produced diffraction spots beyond 3 Å. Model building and refinement were carried out using Coot (74) and phenix.refine (72). Molecular graphics images were generated using PyMOL (Schrödinger LLC) (75). SiteMap (7) was used to identify binding sites and evaluate their volumes, using a fine grid to search around the Zn2+ ions in the dimerization interface.
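The sketch below illustrates how CC1/2 and CC* relate when a resolution cutoff is chosen. It uses the Karplus and Diederichs relation CC* = sqrt(2·CC1/2 / (1 + CC1/2)). The per-shell statistics and the CC1/2 threshold of 0.3 are hypothetical placeholders. They are not the data or criterion used for this structure; the actual cutoff was chosen as described above.

```python
import math
from typing import List, Optional, Tuple


def cc_star(cc_half: float) -> float:
    """CC* = sqrt(2*CC1/2 / (1 + CC1/2)), an estimate of the correlation of merged data with the true signal.

    Non-positive CC1/2 values are clamped to 0 for convenience (illustrative choice).
    """
    if cc_half <= 0.0:
        return 0.0
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))


def pick_cutoff(shells: List[Tuple[float, float]], min_cc_half: float = 0.3) -> Optional[float]:
    """Return d_min (Å) of the last consecutive shell with CC1/2 >= min_cc_half.

    shells: (d_min in Å, CC1/2), ordered from low to high resolution.
    min_cc_half: hypothetical threshold for illustration only.
    """
    cutoff = None
    for d_min, cc_half in shells:
        if cc_half < min_cc_half:
            break
        cutoff = d_min
    return cutoff


# Hypothetical per-shell statistics (not data from this study).
shells = [(4.0, 0.99), (3.5, 0.95), (3.2, 0.80), (3.0, 0.45), (2.9, 0.20)]
for d_min, cc_half in shells:
    print(f"{d_min:.1f} Å  CC1/2={cc_half:.2f}  CC*={cc_star(cc_half):.2f}")
print("cutoff:", pick_cutoff(shells), "Å")
```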


II.e.3. High-Throughput DNA Binding Assay.

Carboxytetramethylrhodamine (5’-TAMRA)-labeled ssDNA (IDT) served as the substrate (sequences are listed in Table II.3). Substrate (10 nM) was added to A3A-E72A-C171A at concentrations ranging from 10 nM to 20 µM, and to a control without protein. The A3A protein concentrations were 0, 10, 25, 50, 75, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, and 750 nM; and 1, 1.25, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, and 20 µM. The mixtures were incubated for 1 hr at room temperature in nonbinding 96-well plates (Greiner) in 50 mM MES buffer (pH 6.0), 100 mM NaCl, and 2 mM tris(2-carboxyethyl)phosphine, in a total reaction volume of 150 µl per well. Fluorescence anisotropy was measured in triplicate experiments using an EnVision plate reader (PerkinElmer) equipped with the Optimized Tamra Acyclo Prime SNP Label detection kit, with excitation at 531 nm and detection of polarized emission at 579 nm.
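Fluorescence anisotropy is derived from the parallel and perpendicular emission intensities. The minimal sketch below shows that calculation and the triplicate summary (mean and SD). The G-factor of 1.0, the intensity values and the helper names are assumptions for illustration. The plate reader software may report polarization (P) rather than anisotropy (r); in that case r = 2P/(3 − P) converts between the two.

```python
import statistics
from typing import Sequence, Tuple


def anisotropy(i_parallel: float, i_perpendicular: float, g_factor: float = 1.0) -> float:
    """Steady-state anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp)."""
    corrected_perp = g_factor * i_perpendicular
    return (i_parallel - corrected_perp) / (i_parallel + 2.0 * corrected_perp)


def polarization_to_anisotropy(p: float) -> float:
    """Convert polarization P (a fraction, not mP) to anisotropy: r = 2P / (3 - P)."""
    return 2.0 * p / (3.0 - p)


def summarize_replicates(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across replicate wells (needs n >= 2)."""
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical intensities for three replicate wells at one protein concentration.
wells = [(1200.0, 800.0), (1180.0, 805.0), (1215.0, 795.0)]
r_values = [anisotropy(par, perp, g_factor=1.0) for par, perp in wells]
mean_r, sd_r = summarize_replicates(r_values)
print(f"r = {mean_r:.4f} ± {sd_r:.4f} (SD, n={len(r_values)})")
```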

Data analysis was performed in Prism (GraphPad) by least-squares fitting of the measured fluorescence anisotropy values (Y) at different protein concentrations (X) to a single-site binding curve with a Hill slope, a nonspecific linear term, and a constant background:

$$Y = \frac{B_{max} X^{h}}{K_d^{h} + X^{h}} + NS \cdot X + \mathrm{Background}$$

where Kd is the equilibrium dissociation constant, h is the Hill coefficient, Bmax is the extrapolated maximum anisotropy at complete binding, and NS is the slope of the nonspecific linear term. The SD was calculated for each measurement point from the three independent repeats and is shown as error bars in the corresponding data figures.
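The same model can be fitted outside Prism with SciPy's `curve_fit`. The sketch below fits the Hill-type single-site term plus the nonspecific linear term and constant background to synthetic anisotropy values generated from hypothetical parameters. It is not the Prism analysis used here, and it reads the upper end of the titration series as micromolar. Bounds keep Kd and h positive so that X^h/(Kd^h + X^h) stays defined.

```python
import numpy as np
from scipy.optimize import curve_fit


def binding_model(x, bmax, kd, h, ns, background):
    """Y = Bmax*X^h/(Kd^h + X^h) + NS*X + Background, with X and Kd in nM."""
    xh = np.power(x, h)
    return bmax * xh / (np.power(kd, h) + xh) + ns * x + background


# Titration series from the Methods in nM (the 1-20 points converted from µM).
conc_nm = np.array(
    [0, 10, 25, 50, 75, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, 750]
    + [c * 1000 for c in (1, 1.25, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20)],
    dtype=float,
)

# Synthetic "measurements": hypothetical parameters plus Gaussian noise (not real data).
rng = np.random.default_rng(seed=0)
true_params = (0.15, 56.0, 1.5, 1e-6, 0.05)  # Bmax, Kd (nM), h, NS (per nM), Background
y_obs = binding_model(conc_nm, *true_params) + rng.normal(0.0, 0.001, conc_nm.size)

# Bounded least-squares fit (trust-region reflective); x_scale="jac" helps with
# parameters that differ by orders of magnitude (e.g. NS vs. Kd).
p0 = [y_obs.max() - y_obs.min(), 100.0, 1.0, 0.0, y_obs.min()]
lower = [0.0, 1e-3, 0.1, -np.inf, -np.inf]
upper = [np.inf, np.inf, 10.0, np.inf, np.inf]
popt, pcov = curve_fit(binding_model, conc_nm, y_obs, p0=p0,
                       bounds=(lower, upper), x_scale="jac")
perr = np.sqrt(np.diag(pcov))

for name, value, err in zip(["Bmax", "Kd (nM)", "h", "NS", "Background"], popt, perr):
    print(f"{name}: {value:.4g} ± {err:.2g}")
```

With real triplicate data, passing the per-point SDs as `sigma=` together with `absolute_sigma=True` would weight the fit by the measured error bars.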


Table II.3. Nucleotide sequences for ssDNA oligomers used in APOBEC3A binding experiments


II.f. References:

1. Bulliard Y, Narvaiza I, Bertero A, Peddi S, Rohrig UF, Ortiz M, Zoete V, Castro-Diaz N, Turelli P, Telenti A, Michielin O, Weitzman MD, Trono D. Structure-function analyses point to a polynucleotide-accommodating groove essential for APOBEC3A restriction activities. J Virol. 2011;85(4):1765-76. doi: 10.1128/JVI.01651-10. PubMed PMID: 21123384; PMCID: PMC3028873.

2. Mitra M, Hercik K, Byeon IJ, Ahn J, Hill S, Hinchee-Rodriguez K, Singer D, Byeon CH, Charlton LM, Nam G, Heidecker G, Gronenborn AM, Levin JG. Structural determinants of human APOBEC3A enzymatic and nucleic acid binding properties. Nucleic Acids Res. 2014;42(2):1095-110. doi: 10.1093/nar/gkt945. PubMed PMID: 24163103; PMCID: PMC3902935.

3. Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, Raine K, Jones D, Hinton J, Marshall J, Stebbings LA, Menzies A, Martin S, Leung K, Chen L, Leroy C, Ramakrishna M, Rance R, Lau KW, Mudie LJ, Varela I, McBride DJ, Bignell GR, Cooke SL, Shlien A, Gamble J, Whitmore I, Maddison M, Tarpey PS, Davies HR, Papaemmanuil E, Stephens PJ, McLaren S, Butler AP, Teague JW, Jonsson G, Garber JE, Silver D, Miron P, Fatima A, Boyault S, Langerod A, Tutt A, Martens JW, Aparicio SA, Borg A, Salomon AV, Thomas G, Borresen-Dale AL, Richardson AL, Neuberger MS, Futreal PA, Campbell PJ, Stratton MR, Breast Cancer Working Group of the International Cancer Genome C. Mutational processes molding the genomes of 21 breast cancers. Cell. 2012;149(5):979-93. doi: 10.1016/j.cell.2012.04.024. PubMed PMID: 22608084; PMCID: PMC3414841.

4. Burns MB, Lackey L, Carpenter MA, Rathore A, Land AM, Leonard B, Refsland EW, Kotandeniya D, Tretyakova N, Nikas JB, Yee D, Temiz NA, Donohue DE, McDougle RM, Brown WL, Law EK, Harris RS. APOBEC3B is an enzymatic source of mutation in breast cancer. Nature. 2013;494(7437):366-70. doi: 10.1038/nature11881. PubMed PMID: 23389445.

5. Burns MB, Temiz NA, Harris RS. Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat Genet. 2013;45(9):977-83. doi: 10.1038/ng.2701. PubMed PMID: 23852168.

6. Roberts SA, Lawrence MS, Klimczak LJ, Grimm SA, Fargo D, Stojanov P, Kiezun A, Kryukov GV, Carter SL, Saksena G, Harris S, Shah RR, Resnick MA, Getz G, Gordenin DA. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat Genet. 2013;45(9):970-6. doi: 10.1038/ng.2702. PubMed PMID: 23852170; PMCID: PMC3789062.


7. Halgren TA. Identifying and characterizing binding sites and assessing druggability. J Chem Inf Model. 2009;49(2):377-89. doi: 10.1021/ci800324m. PubMed PMID: 19434839.

8. Cancer Genome Atlas Research N. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature. 2014;507(7492):315-22. doi: 10.1038/nature12965. PubMed PMID: 24476821; PMCID: PMC3962515.

9. de Bruin EC, McGranahan N, Mitter R, Salm M, Wedge DC, Yates L, Jamal-Hanjani M, Shafi S, Murugaesu N, Rowan AJ, Gronroos E, Muhammad MA, Horswell S, Gerlinger M, Varela I, Jones D, Marshall J, Voet T, Van Loo P, Rassl DM, Rintoul RC, Janes SM, Lee SM, Forster M, Ahmad T, Lawrence D, Falzon M, Capitanio A, Harkins TT, Lee CC, Tom W, Teefe E, Chen SC, Begum S, Rabinowitz A, Phillimore B, Spencer-Dene B, Stamp G, Szallasi Z, Matthews N, Stewart A, Campbell P, Swanton C. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science. 2014;346(6206):251-6. doi: 10.1126/science.1253462. PubMed PMID: 25301630; PMCID: PMC4636050.

10. Caval V, Suspene R, Shapira M, Vartanian JP, Wain-Hobson S. A prevalent cancer susceptibility APOBEC3A hybrid allele bearing APOBEC3B 3'UTR enhances chromosomal DNA damage. Nat Commun. 2014;5:5129. doi: 10.1038/ncomms6129. PubMed PMID: 25298230.

11. Nik-Zainal S, Wedge DC, Alexandrov LB, Petljak M, Butler AP, Bolli N, Davies HR, Knappskog S, Martin S, Papaemmanuil E, Ramakrishna M, Shlien A, Simonic I, Xue Y, Tyler-Smith C, Campbell PJ, Stratton MR. Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer. Nat Genet. 2014;46(5):487-91. doi: 10.1038/ng.2955. PubMed PMID: 24728294; PMCID: PMC4137149.

12. Betts L, Xiang S, Short SA, Wolfenden R, Carter CW, Jr. Cytidine deaminase. The 2.3 Å crystal structure of an enzyme: transition-state analog complex. J Mol Biol. 1994;235(2):635-56. PubMed PMID: 8289286.

13. Wilson DK, Rudolph FB, Quiocho FA. Atomic structure of adenosine deaminase complexed with a transition-state analog: understanding catalysis and immunodeficiency mutations. Science. 1991;252(5010):1278-84. PubMed PMID: 1925539.

14. Wedekind JE, Dance GS, Sowden MP, Smith HC. Messenger RNA editing in mammals: new members of the APOBEC family seeking roles in the family business. Trends Genet. 2003;19(4):207-16. PubMed PMID: 12683974.


15. LaRue RS, Andresdottir V, Blanchard Y, Conticello SG, Derse D, Emerman M, Greene WC, Jonsson SR, Landau NR, Lochelt M, Malik HS, Malim MH, Munk C, O'Brien SJ, Pathak VK, Strebel K, Wain-Hobson S, Yu XF, Yuhki N, Harris RS. Guidelines for naming nonprimate APOBEC3 genes and proteins. J Virol. 2009;83(2):494-7. Epub 2008/11/07. doi: 10.1128/JVI.01976-08. PubMed PMID: 18987154; PMCID: PMC2612408.

16. Shandilya SM, Bohn MF, Schiffer CA. A computational analysis of the structural determinants of APOBEC3's catalytic activity and vulnerability to HIV-1 Vif. Virology. 2014;471-473C:105-16. doi: 10.1016/j.virol.2014.09.023. PubMed PMID: 25461536.

17. Landry S, Narvaiza I, Linfesty DC, Weitzman MD. APOBEC3A can activate the DNA damage response and cause cell-cycle arrest. EMBO Rep. 2011;12(5):444-50. doi: 10.1038/embor.2011.46. PubMed PMID: 21460793; PMCID: PMC3090015.

18. Roberts SA, Sterling J, Thompson C, Harris S, Mav D, Shah R, Klimczak LJ, Kryukov GV, Malc E, Mieczkowski PA, Resnick MA, Gordenin DA. Clustered mutations in yeast and in human cancers can arise from damaged long single-strand DNA regions. Mol Cell. 2012;46(4):424-35. doi: 10.1016/j.molcel.2012.03.030. PubMed PMID: 22607975; PMCID: PMC3361558.

19. Sakofsky CJ, Roberts SA, Malc E, Mieczkowski PA, Resnick MA, Gordenin DA, Malkova A. Break-induced replication is a source of mutation clusters underlying kataegis. Cell Rep. 2014;7(5):1640-8. doi: 10.1016/j.celrep.2014.04.053. PubMed PMID: 24882007; PMCID: PMC4274036.

20. Bishop KN, Holmes RK, Sheehy AM, Davidson NO, Cho SJ, Malim MH. Cytidine deamination of retroviral DNA by diverse APOBEC proteins. Curr Biol. 2004;14(15):1392-6. PubMed PMID: 15296758.

21. Dang Y, Wang X, Esselman WJ, Zheng YH. Identification of APOBEC3DE as another antiretroviral factor from the human APOBEC family. J Virol. 2006;80(21):10522-33. doi: 10.1128/JVI.01123-06. PubMed PMID: 16920826; PMCID: PMC1641744.

22. Harari A, Ooms M, Mulder LC, Simon V. Polymorphisms and splice variants influence the antiretroviral activity of human APOBEC3H. J Virol. 2009;83(1):295-303. doi: 10.1128/JVI.01665-08. PubMed PMID: 18945781; PMCID: PMC2612324.


23. Harris RS, Bishop KN, Sheehy AM, Craig HM, Petersen-Mahrt SK, Watt IN, Neuberger MS, Malim MH. DNA deamination mediates innate immunity to retroviral infection. Cell. 2003;113(6):803-9. PubMed PMID: 12809610.

24. Hultquist JF, Lengyel JA, Refsland EW, LaRue RS, Lackey L, Brown WL, Harris RS. Human and rhesus APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H demonstrate a conserved capacity to restrict Vif-deficient HIV-1. J Virol. 2011;85(21):11220-34. doi: 10.1128/JVI.05238-11. PubMed PMID: 21835787; PMCID: PMC3194973.

25. Lecossier D, Bouchonnet F, Clavel F, Hance AJ. Hypermutation of HIV-1 DNA in the absence of the Vif protein. Science. 2003;300(5622):1112. PubMed PMID: 12750511.

26. Liddament MT, Brown WL, Schumacher AJ, Harris RS. APOBEC3F properties and hypermutation preferences indicate activity against HIV-1 in vivo. Curr Biol. 2004;14(15):1385-91. PubMed PMID: 15296757.

27. Mangeat B, Turelli P, Caron G, Friedli M, Perrin L, Trono D. Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts. Nature. 2003;424(6944):99-103. PubMed PMID: 12808466.

28. OhAinle M, Kerns JA, Li MM, Malik HS, Emerman M. Antiretroelement activity of APOBEC3H was lost twice in recent human evolution. Cell Host Microbe. 2008;4(3):249-59. PubMed PMID: 18779051.

29. Sheehy AM, Gaddis NC, Choi JD, Malim MH. Isolation of a human gene that inhibits HIV-1 infection and is suppressed by the viral Vif protein. Nature. 2002;418(6898):646-50. PubMed PMID: 12167863.

30. Wiegand HL, Doehle BP, Bogerd HP, Cullen BR. A second human antiretroviral factor, APOBEC3F, is suppressed by the HIV-1 and HIV-2 Vif proteins. EMBO J. 2004;23(12):2451-8. doi: 10.1038/sj.emboj.7600246. PubMed PMID: 15152192; PMCID: PMC423288.

31. Zhang H, Yang B, Pomerantz RJ, Zhang C, Arunachalam SC, Gao L. The cytidine deaminase CEM15 induces hypermutation in newly synthesized HIV-1 DNA. Nature. 2003;424(6944):94-8. PubMed PMID: 12808465.

32. Zheng YH, Irwin D, Kurosu T, Tokunaga K, Sata T, Peterlin BM. Human APOBEC3F is another host factor that blocks human immunodeficiency virus type 1 replication. J Virol. 2004;78(11):6073-6. PubMed PMID: 15141007.


33. Weil AF, Ghosh D, Zhou Y, Seiple L, McMahon MA, Spivak AM, Siliciano RF, Stivers JT. Uracil DNA glycosylase initiates degradation of HIV-1 cDNA containing misincorporated dUTP and prevents viral integration. Proc Natl Acad Sci U S A. 2013;110(6):E448-57. doi: 10.1073/pnas.1219702110. PubMed PMID: 23341616; PMCID: PMC3568341.

34. Loeb LA, Essigmann JM, Kazazi F, Zhang J, Rose KD, Mullins JI. Lethal mutagenesis of HIV with mutagenic nucleoside analogs. Proc Natl Acad Sci U S A. 1999;96(4):1492-7. PubMed PMID: 9990051; PMCID: PMC15492.

35. Wang T, Tian C, Zhang W, Luo K, Sarkis PT, Yu L, Liu B, Yu Y, Yu XF. 7SL RNA mediates virion packaging of the antiviral cytidine deaminase APOBEC3G. J Virol. 2007;81(23):13112-24. doi: 10.1128/JVI.00892-07. PubMed PMID: 17881443; PMCID: PMC2169093.

36. Wedekind JE, Gillilan R, Janda A, Krucinska J, Salter JD, Bennett RP, Raina J, Smith HC. Nanostructures of APOBEC3G support a hierarchical assembly model of high molecular mass ribonucleoprotein particles from dimeric subunits. J Biol Chem. 2006;281(50):38122-6. PubMed PMID: 17079235.

37. Li J, Chen Y, Li M, Carpenter MA, McDougle RM, Luengas EM, Macdonald PJ, Harris RS, Mueller JD. APOBEC3 multimerization correlates with HIV-1 packaging and restriction activity in living cells. J Mol Biol. 2014;426(6):1296-307. doi: 10.1016/j.jmb.2013.12.014. PubMed PMID: 24361275; PMCID: PMC3977201.

38. Shlyakhtenko LS, Lushnikov AY, Li M, Lackey L, Harris RS, Lyubchenko YL. Atomic force microscopy studies provide direct evidence for dimerization of the HIV restriction factor APOBEC3G. J Biol Chem. 2011;286(5):3387-95. Epub 2010/12/03. doi: 10.1074/jbc.M110.195685. PubMed PMID: 21123176; PMCID: PMC3030345.

39. Shlyakhtenko LS, Lushnikov AY, Miyagi A, Li M, Harris RS, Lyubchenko YL. Atomic force microscopy studies of APOBEC3G oligomerization and dynamics. J Struct Biol. 2013;184(2):217-25. doi: 10.1016/j.jsb.2013.09.008. PubMed PMID: 24055458; PMCID: PMC3844295.

40. Shlyakhtenko LS, Lushnikov AJ, Li M, Harris RS, Lyubchenko YL. Interaction of APOBEC3A with DNA assessed by atomic force microscopy. PLoS One. 2014;9(6):e99354. doi: 10.1371/journal.pone.0099354. PubMed PMID: 24905100; PMCID: PMC4048275.


41. Kitamura S, Ode H, Iwatani Y. Structural Features of Antiviral APOBEC3 Proteins are Linked to Their Functional Activities. Front Microbiol. 2011;2:258. doi: 10.3389/fmicb.2011.00258. PubMed PMID: 22203821; PMCID: PMC3243911.

42. Bohn MF, Shandilya SM, Albin JS, Kouno T, Anderson BD, McDougle RM, Carpenter MA, Rathore A, Evans L, Davis AN, Zhang J, Lu Y, Somasundaran M, Matsuo H, Harris RS, Schiffer CA. Crystal structure of the DNA cytosine deaminase APOBEC3F: the catalytically active and HIV-1 Vif-binding domain. Structure. 2013;21(6):1042-50. doi: 10.1016/j.str.2013.04.010. PubMed PMID: 23685212; PMCID: PMC3805256.

43. Holden LG, Prochnow C, Chang YP, Bransteitter R, Chelico L, Sen U, Stevens RC, Goodman MF, Chen XS. Crystal structure of the anti-viral APOBEC3G catalytic domain and functional implications. Nature. 2008. PubMed PMID: 18849968.

44. Li M, Shandilya SM, Carpenter MA, Rathore A, Brown WL, Perkins AL, Harki DA, Solberg J, Hook DJ, Pandey KK, Parniak MA, Johnson JR, Krogan NJ, Somasundaran M, Ali A, Schiffer CA, Harris RS. First-in-class small molecule inhibitors of the single-strand DNA cytosine deaminase APOBEC3G. ACS Chem Biol. 2012;7(3):506-17. Epub 2011/12/21. doi: 10.1021/cb200440y. PubMed PMID: 22181350; PMCID: PMC3306499.

45. Shandilya SM, Nalam MN, Nalivaika EA, Gross PJ, Valesano JC, Shindo K, Li M, Munson M, Royer WE, Harjes E, Kono T, Matsuo H, Harris RS, Somasundaran M, Schiffer CA. Crystal structure of the APOBEC3G catalytic domain reveals potential oligomerization interfaces. Structure. 2010;18(1):28-38. doi: 10.1016/j.str.2009.10.016. PubMed PMID: 20152150; PMCID: PMC2913127.

46. Chen KM, Harjes E, Gross PJ, Fahmy A, Lu Y, Shindo K, Harris RS, Matsuo H. Structure of the DNA deaminase domain of the HIV-1 restriction factor APOBEC3G. Nature. 2008;452(7183):116-9. doi: 10.1038/nature06638. PubMed PMID: 18288108.

47. Harjes E, Gross PJ, Chen KM, Lu Y, Shindo K, Nowarski R, Gross JD, Kotler M, Harris RS, Matsuo H. An extended structure of the APOBEC3G catalytic domain suggests a unique holoenzyme model. J Mol Biol. 2009;389(5):819-32. PubMed PMID: 19389408.

48. Byeon IJ, Ahn J, Mitra M, Byeon CH, Hercik K, Hritz J, Charlton LM, Levin JG, Gronenborn AM. NMR structure of human restriction factor APOBEC3A reveals substrate binding and enzyme specificity. Nat Commun. 2013;4:1890. doi: 10.1038/ncomms2883. PubMed PMID: 23695684; PMCID: PMC3674325.


49. Chelico L, Pham P, Calabrese P, Goodman MF. APOBEC3G DNA deaminase acts processively 3' --> 5' on single-stranded DNA. Nat Struct Mol Biol. 2006. PubMed PMID: 16622407.

50. Chelico L, Sacho EJ, Erie DA, Goodman MF. A model for oligomeric regulation of APOBEC3G cytosine deaminase-dependent restriction of HIV. J Biol Chem. 2008;283(20):13780-91. PubMed PMID: 18362149.

51. Iwatani Y, Takeuchi H, Strebel K, Levin JG. Biochemical activities of highly purified, catalytically active human APOBEC3G: correlation with antiviral effect. J Virol. 2006;80(12):5992-6002. PubMed PMID: 16731938.

52. Yu Q, Konig R, Pillai S, Chiles K, Kearney M, Palmer S, Richman D, Coffin JM, Landau NR. Single-strand specificity of APOBEC3G accounts for minus-strand deamination of the HIV genome. Nat Struct Mol Biol. 2004;11(5):435-42. PubMed PMID: 15098018.

53. Love RP, Xu H, Chelico L. Biochemical analysis of hypermutation by the deoxycytidine deaminase APOBEC3A. J Biol Chem. 2012;287(36):30812-22. doi: 10.1074/jbc.M112.393181. PubMed PMID: 22822074; PMCID: PMC3436324.

54. Carpenter MA, Li M, Rathore A, Lackey L, Law EK, Land AM, Leonard B, Shandilya SM, Bohn MF, Schiffer CA, Brown WL, Harris RS. Methylcytosine and normal cytosine deamination by the foreign DNA restriction enzyme APOBEC3A. J Biol Chem. 2012;287(41):34801-8. doi: 10.1074/jbc.M112.385161. PubMed PMID: 22896697; PMCID: PMC3464582.

55. Logue EC, Bloch N, Dhuey E, Zhang R, Cao P, Herate C, Chauveau L, Hubbard SR, Landau NR. A DNA sequence recognition loop on APOBEC3A controls substrate specificity. PLoS One. 2014;9(5):e97062. doi: 10.1371/journal.pone.0097062. PubMed PMID: 24827831; PMCID: PMC4020817.

56. Auld DS. Zinc coordination sphere in biochemical zinc sites. Biometals. 2001;14(3-4):271-313. PubMed PMID: 11831461.

57. Stenglein MD, Burns MB, Li M, Lengyel J, Harris RS. APOBEC3 proteins mediate the clearance of foreign DNA from human cells. Nat Struct Mol Biol. 2010;17(2):222-9. PubMed PMID: 20062055.

58. Thielen BK, McNevin JP, McElrath MJ, Hunt BV, Klein KC, Lingappa JR. Innate immune signaling induces high levels of TC-specific deaminase activity in primary monocyte-derived cells through expression of APOBEC3A isoforms. J Biol Chem. 2010;285(36):27753-66. doi: 10.1074/jbc.M110.102822. PubMed PMID: 20615867; PMCID: PMC2934643.

59. Pham P, Landolph A, Mendez C, Li N, Goodman MF. A biochemical analysis linking APOBEC3A to disparate HIV-1 restriction and skin cancer. J Biol Chem. 2013;288(41):29294-304. doi: 10.1074/jbc.M113.504175. PubMed PMID: 23979356; PMCID: PMC3795231.

60. Suspene R, Aynaud MM, Vartanian JP, Wain-Hobson S. Efficient deamination of 5-methylcytidine and 5-substituted cytidine residues in DNA by human APOBEC3A cytidine deaminase. PLoS One. 2013;8(6):e63461. doi: 10.1371/journal.pone.0063461. PubMed PMID: 23840298; PMCID: PMC3688686.

61. Wijesinghe P, Bhagwat AS. Efficient deamination of 5-methylcytosines in DNA by human APOBEC3A, but not by AID or APOBEC3G. Nucleic Acids Res. 2012;40(18):9206-17. doi: 10.1093/nar/gks685. PubMed PMID: 22798497; PMCID: PMC3467078.

62. Franchini DM, Schmitz KM, Petersen-Mahrt SK. 5-Methylcytosine DNA demethylation: more than losing a methyl group. Annu Rev Genet. 2012;46:419-41. doi: 10.1146/annurev-genet-110711-155451. PubMed PMID: 22974304.

63. Guo JU, Su Y, Zhong C, Ming GL, Song H. Hydroxylation of 5-methylcytosine by TET1 promotes active DNA demethylation in the adult brain. Cell. 2011;145(3):423-34. doi: 10.1016/j.cell.2011.03.022. PubMed PMID: 21496894; PMCID: PMC3088758.

64. Jarmuz A, Chester A, Bayliss J, Gisbourne J, Dunham I, Scott J, Navaratnam N. An anthropoid-specific locus of orphan C to U RNA-editing enzymes on chromosome 22. Genomics. 2002;79(3):285-96. PubMed PMID: 11863358.

65. Chelico L, Prochnow C, Erie DA, Chen XS, Goodman MF. A structural model for deoxycytidine deamination mechanisms of the HIV-1 inactivation enzyme APOBEC3G. J Biol Chem. 2010. PubMed PMID: 20212048.

66. Haché G, Liddament MT, Harris RS. The retroviral hypermutation specificity of APOBEC3F and APOBEC3G is governed by the C-terminal DNA cytosine deaminase domain. J Biol Chem. 2005;280(12):10920-4. PubMed PMID: 15647250.


67. Ara A, Love RP, Chelico L. Different mutagenic potential of HIV-1 restriction factors APOBEC3G and APOBEC3F is determined by distinct single-stranded DNA scanning mechanisms. PLoS Pathog. 2014;10(3):e1004024. doi: 10.1371/journal.ppat.1004024. PubMed PMID: 24651717; PMCID: PMC3961392.

68. Kim EY, Bhattacharya T, Kunstman K, Swantek P, Koning FA, Malim MH, Wolinsky SM. Human APOBEC3G-mediated editing can promote HIV-1 sequence diversification and accelerate adaptation to selective pressure. J Virol. 2010;84(19):10402-5. doi: 10.1128/JVI.01223-10. PubMed PMID: 20660203; PMCID: PMC2937764.

69. Sadler HA, Stenglein MD, Harris RS, Mansky LM. APOBEC3G contributes to HIV-1 variation through sublethal mutagenesis. J Virol. 2010;84(14):7396-404. doi: 10.1128/JVI.00056-10. PubMed PMID: 20463080; PMCID: PMC2898230.

70. Otwinowski Z, Minor W. Processing of X-ray diffraction data collected in oscillation mode. Macromolecular Crystallography, Pt A. 1997;276:307-26. doi: 10.1016/S0076-6879(97)76066-X. WOS:A1997BH42P00020.

71. McCoy AJ, Grosse-Kunstleve RW, Adams PD, Winn MD, Storoni LC, Read RJ. Phaser crystallographic software. J Appl Crystallogr. 2007;40(Pt 4):658-74. doi: 10.1107/S0021889807021206. PubMed PMID: 19461840; PMCID: PMC2483472.

72. Adams PD, Afonine PV, Bunkoczi G, Chen VB, Echols N, Headd JJ, Hung LW, Jain S, Kapral GJ, Grosse Kunstleve RW, McCoy AJ, Moriarty NW, Oeffner RD, Read RJ, Richardson DC, Richardson JS, Terwilliger TC, Zwart PH. The Phenix software for automated determination of macromolecular structures. Methods. 2011;55(1):94-106. doi: 10.1016/j.ymeth.2011.07.005. PubMed PMID: 21821126; PMCID: PMC3193589.

73. Bohn MF, Schiffer CA. REdiii: a pipeline for automated structure solution. Acta Crystallogr D Biol Crystallogr. 2015;71(Pt 5):1059-67. doi: 10.1107/S139900471500303X. PubMed PMID: 25945571; PMCID: PMC4427196.

74. Emsley P, Cowtan K. Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr. 2004;60(Pt 12 Pt 1):2126-32. doi: 10.1107/S0907444904019158. PubMed PMID: 15572765.

75. DeLano W. The PyMOL Molecular Graphics System. 2002.


Chapter III

Crystal structure of APOBEC3A bound to single-stranded DNA reveals structural basis for cytidine deamination and specificity


III.a. Abstract

Nucleic acid editing enzymes are essential components of the immune system that lethally mutate viral pathogens and somatically mutate immunoglobulins, and they also contribute to the diversification and lethality of cancers. Among these enzymes are the seven human APOBEC3 deoxycytidine deaminases, each with unique target sequence specificity and subcellular localization. While their enzymology and biological consequences have been extensively studied, the mechanism by which APOBEC3s recognize and edit DNA remains elusive. Here we present the 2.2 Å crystal structure of a cytidine deaminase in complex with ssDNA bound in the active site. This structure not only visualizes the active site of APOBEC3A poised for catalysis, but also pinpoints the residues that confer specificity towards CC/TC motifs. The APOBEC3A–ssDNA complex defines the 5’–3’ directionality and the subtle conformational changes that clench the ssDNA within the binding groove. It reveals an architecture and mechanism of ssDNA recognition that are likely conserved among all polynucleotide deaminases, thereby opening the door to the design of mechanism-based therapeutics.


III.b. Introduction

Apolipoprotein B messenger RNA-editing enzyme, catalytic polypeptide-like (APOBEC3) proteins are single-stranded DNA (ssDNA) deoxycytidine deaminases that are among the fastest-evolving proteins in the human genome (1). APOBEC3s catalyze a zinc-dependent deamination of cytidine (C) to uridine (U) (2-5). The seven APOBEC3 enzymes are clustered on chromosome 22 (6). Although each APOBEC3 has a single catalytic active site, the human genome includes three single-domain (APOBEC3A, C and H) and four double-domain (APOBEC3B, D, F and G) enzymes. The double-domain enzymes consist of a catalytically active C-terminal domain (CTD) and an inactive pseudo-catalytic N-terminal domain (NTD) that can bind but not edit nucleic acids. Four of the seven APOBEC3 enzymes (APOBEC3D, APOBEC3F, APOBEC3G and APOBEC3H) have been implicated as HIV-1 host restriction factors (7-13). The APOBEC3 enzymes act on ssDNA to introduce C-to-U modifications that create G-to-A point mutations on the paired strand, because U is read as T during replication. Such mutations in ssDNA can lead to double-strand breaks, which may result in the genomic DNA damage observed in cancer (14-20).
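A toy model of replication makes the chain from C-to-U deamination to a G-to-A mutation concrete. The sketch below uses hypothetical helper names. It deaminates one C in a 5'→3' ssDNA string and then synthesizes the complementary strand, pairing U with A exactly as T would be paired. It deliberately ignores repair (for example by uracil DNA glycosylase) and any polymerase behavior beyond base pairing.

```python
# Pairing rules; U (from a deaminated C) pairs with A, exactly as T does.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(strand: str, position: int) -> str:
    """Return the strand with the C at `position` (0-based, 5'->3') converted to U."""
    if strand[position] != "C":
        raise ValueError(f"position {position} holds {strand[position]!r}, not C")
    return strand[:position] + "U" + strand[position + 1:]


def copy_strand(template: str) -> str:
    """Return the newly synthesized complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


original = "TTTTTTTCTTTTTTT"             # 15-mer with a single target C (index 7)
edited = deaminate(original, 7)          # C -> U in the ssDNA
print(copy_strand(original))             # AAAAAAAGAAAAAAA
print(copy_strand(edited))               # AAAAAAAAAAAAAAA  (G -> A)
print(copy_strand(copy_strand(edited)))  # TTTTTTTTTTTTTTT  (C -> T after a second round)
```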

In the last decade, our laboratories (21-27), along with others (28-42), have solved crystal and nuclear magnetic resonance (NMR) structures of single domains of human APOBEC3s (Figure III.1). These proteins share the same overall fold (42) and deaminate cytosines in ssDNA, but vary in their substrate specificity, processivity, catalytic rate and ability to restrict HIV-1. All APOBEC3 domains contain an HAE-x(28)-C-x(2-4)-C zinc-binding motif. The carboxylate group of the catalytic glutamic acid stabilizes the transition state and the proton transfer during catalysis, in which a water molecule coordinated by the catalytic zinc is the sole proton source for the amino group and the N3 atom of cytosine (2,44,45). The specificity of different APOBECs has been elucidated by determining their preferred mutagenic hotspot sequences: 5’-CC/TC-3’ for APOBEC3A (studied here) (46), 5’-TC-3’ for APOBEC3F and 5’-CC-3’ for APOBEC3G (10,47,48). APOBEC3G deaminates hotspots closer to the 5’ end of ssDNA more efficiently than those closer to the 3’ end (28,30,32,49), but the underlying mechanism for this preference is not known.

Figure III.1. Sequence alignment of APOBEC3s. A) Alignment of A3A with sequences of catalytically active A3 domains whose crystal structures have been determined. B) Alignment of A3A with sequences of inactive pseudo-catalytic domains whose structures have been solved.

Several alternative ssDNA-binding models for APOBEC3G-CTD and APOBEC3A have been proposed (21,29,35,36). Most recently, the crystal structure of the inactive pseudo-catalytic rhesus macaque APOBEC3G-NTD (rA3G-NTD) (Figure III.1B) in complex with poly-dT ssDNA has been reported (42). However, only one complete deoxythymidine (dT) was resolved in this structure, bound in a shallow cleft far from the pseudo-catalytic zinc-binding motif. This complex therefore did not reveal how substrate (dC) or product (dU) may be accommodated for the deamination reaction. The details of the ssDNA-binding and -editing mechanisms, and the molecular basis underlying the substrate nucleotide sequence specificities of APOBEC3 enzymes, remain elusive.


APOBEC3A (A3A) is a single-domain enzyme with the highest catalytic activity among the human APOBEC3 proteins (50). While its DNA-editing activity, which inhibits the replication of retroelements, is beneficial for genome stability, increased expression or defective regulation of A3A could lead to mutagenesis of the human genome and contribute to carcinogenesis (51). The structure of A3A was initially determined by NMR (35), and chemical shift perturbation data suggested some preference for DNA over RNA (36). However, mutations of residues predicted to be involved in DNA targeting had variable effects on deamination activity, and the detailed mechanism by which A3A binds its DNA substrate remains elusive (35,36).

In this study, we determined the crystal structure of a ssDNA:deaminase complex, that is, a polynucleotide substrate bound at the active site of the catalytic domain of an APOBEC3 protein. Previously, we solved the crystal structure of unliganded, inactive A3A (26) and measured a potent binding affinity of ~60 nM for substrate ssDNA, whereas the product bound with an order of magnitude lower affinity. Here we present the 2.2 Å crystal structure of this A3A variant in complex with a substrate DNA oligonucleotide containing a single 5’-TC-3’ deamination target sequence in a poly-dT background. The central nucleotides comprising the 5’-TCT-3’ motif are well ordered and bound at the active site, revealing the intermolecular interactions that define specificity for the bases at each of these three positions. The target deoxycytidine (dC0) is bound in a reaction-competent coordination at the active site. This A3A–ssDNA structure elucidates the molecular basis of nucleotide preferences in the substrate motif and provides key insights into the overall molecular mechanisms of DNA editing by cytidine deaminases.
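A simple sequence scan can illustrate the hotspot preferences described in this chapter: 5’-CC/TC-3’ for APOBEC3A, 5’-TC-3’ for APOBEC3F and 5’-CC-3’ for APOBEC3G. The sketch below uses hypothetical names. It reduces each preference to the identity of the base immediately 5’ of the target C and reports the −1/0/+1 context of each hit. This dinucleotide-only simplification ignores the +1 base, structural context and processivity, so it is not a predictive model of deamination.

```python
import re
from typing import Dict, List

# Hypothetical table: allowed base immediately 5' of the target C (position -1).
HOTSPOT_5PRIME: Dict[str, str] = {
    "APOBEC3A": "[CT]",  # 5'-CC/TC-3'
    "APOBEC3F": "T",     # 5'-TC-3'
    "APOBEC3G": "C",     # 5'-CC-3'
}


def target_cytidines(seq: str, enzyme: str) -> List[int]:
    """0-based indices of C's (5'->3' sequence) whose 5' neighbor fits the enzyme's hotspot."""
    # A one-base lookbehind checks position -1 without consuming it, so adjacent
    # hotspots such as the two C's in "TCC" are both reported.
    pattern = re.compile(f"(?<={HOTSPOT_5PRIME[enzyme]})C")
    return [m.start() for m in pattern.finditer(seq.upper())]


def trinucleotide_context(seq: str, index: int) -> str:
    """Return the -1/0/+1 context around `index` (shorter at the sequence ends)."""
    return seq[max(0, index - 1): index + 2]


oligo = "TTTTTTTCTTTTTTT"  # 15-mer used for co-crystallization
for enzyme in HOTSPOT_5PRIME:
    hits = target_cytidines(oligo, enzyme)
    print(enzyme, hits, [trinucleotide_context(oligo, i) for i in hits])
```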


III.c. Results and Discussion

III.c.1. A3A–ssDNA co-crystal structure.

A3A (E72A/C171A) (26) was used for co-crystallization with ssDNA. The E72A mutation inactivates the enzyme, which permits the formation of stable complexes, and C171A increases solubility. The crystal structure of A3A (E72A/C171A) in complex with ssDNA was determined by molecular replacement at 2.2 Å resolution (Figure III.2A-C and Figure III.3). A 15-mer DNA oligonucleotide containing a single target deoxycytidine (5’-TTTTTTTCTTTTTTT-3’) was co-crystallized with A3A; this oligonucleotide binds A3A with ~60 nM affinity (26). The final refinement of the structure resulted in R-factor/R-free values of 0.177/0.225, respectively (Table III.1).
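The R-factor and R-free quoted above compare observed structure-factor amplitudes with those calculated from the model: R = Σ| |Fobs| − k|Fcalc| | / Σ|Fobs|. R-free is the same quantity, computed on a small set of reflections withheld from refinement. The following is a minimal sketch of that definition on synthetic data. The single least-squares scale factor k is a simplification; refinement programs use more elaborate scaling and bulk-solvent models, and the function names here are hypothetical.

```python
import numpy as np


def r_factor(f_obs: np.ndarray, f_calc: np.ndarray) -> float:
    """Crystallographic R = sum| |Fo| - k|Fc| | / sum |Fo|.

    k is one least-squares scale factor that puts |Fc| on the scale of
    |Fo| (a simplification of real refinement scaling).
    """
    fo = np.abs(f_obs)
    fc = np.abs(f_calc)
    k = np.sum(fo * fc) / np.sum(fc * fc)
    return float(np.sum(np.abs(fo - k * fc)) / np.sum(fo))


def r_work_r_free(f_obs: np.ndarray, f_calc: np.ndarray, free_mask) -> tuple:
    """Return (R-work, R-free) for the working and test (free) reflection sets."""
    free_mask = np.asarray(free_mask, dtype=bool)
    r_work = r_factor(f_obs[~free_mask], f_calc[~free_mask])
    r_free = r_factor(f_obs[free_mask], f_calc[free_mask])
    return r_work, r_free


# Synthetic amplitudes only: 10,000 made-up reflections, ~5% flagged as free.
rng = np.random.default_rng(0)
f_calc = rng.gamma(shape=2.0, scale=100.0, size=10_000)
f_obs = f_calc * rng.normal(1.0, 0.2, size=f_calc.size)
free = rng.random(f_calc.size) < 0.05
r_work, r_free = r_work_r_free(f_obs, f_calc, free)
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f}")
```

The synthetic amplitudes are not refined against anything, so R-work and R-free come out similar here. In a real refinement, a gap such as 0.177 versus 0.225 arises because the model is fitted to the working set only.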

There was a single A3A–ssDNA complex in the asymmetric unit. Crystal contacts with symmetry-related complexes did not correspond to the zinc-coordinated dimer interface we observed in the apo A3A crystal structure (26).

The apo A3A crystallization condition included an excess of zinc (50 µM ZnCl₂), whereas no zinc was added for the A3A–ssDNA complex; this may have destabilized the dimer in this crystal form. We observed cooperativity upon DNA binding in solution and interrogated it with site-directed mutagenesis (26). This cooperativity implies that A3A is capable of binding ssDNA as a dimer, at least transiently. Nevertheless, cooperativity does not seem to be essential, as the monomeric H56A mutant of A3A binds substrate DNA with similar affinity (26). Most likely, both the monomeric and dimeric forms of A3A play a role in recognizing substrates in solution.
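Cooperative binding is commonly summarized with the Hill equation, θ = [L]^n / (K^n + [L]^n). Here K is the concentration at half-saturation, and n > 1 indicates positive cooperativity. This section reports cooperativity only qualitatively. The sketch below therefore uses K = 60 nM and hypothetical Hill coefficients purely to show how n changes the shape of a binding curve. It is not necessarily the model used to analyze the data in (26). A full monomer-dimer equilibrium would require a more detailed model.

```python
import numpy as np


def hill_fraction_bound(ligand_nM, k_half_nM: float, n: float = 1.0) -> np.ndarray:
    """Fractional saturation from the Hill equation.

    theta = L^n / (K^n + L^n), where K is the ligand concentration giving
    half-saturation and n is the Hill coefficient (n > 1: positive
    cooperativity; n = 1: independent, non-cooperative binding).
    """
    ligand = np.asarray(ligand_nM, dtype=float)
    return ligand**n / (k_half_nM**n + ligand**n)


# Concentrations spanning fourfold below and above K (values chosen for illustration)
concentrations = np.array([15.0, 60.0, 240.0])
for hill_n in (1.0, 2.0):  # hypothetical Hill coefficients for comparison
    theta = hill_fraction_bound(concentrations, k_half_nM=60.0, n=hill_n)
    print(f"n = {hill_n}: " + ", ".join(f"{t:.2f}" for t in theta))
```

With n = 2 the curve is steeper around K: occupancy is lower below K and higher above it than in the non-cooperative case.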

ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms15024


Figure III.2. Crystal structure of A3A in complex with substrate DNA. A. A3A structure with a 2Fo − Fc electron density map contoured at 1σ. The protein is presented as a green-colored ribbon diagram, and the bound DNA is in stick representation (carbons and phosphates, orange; nitrogens, blue; oxygens, red). A zinc ion at the active center is depicted as a magenta-colored sphere. The side chains of the zinc-coordinating residues H70, C101 and C106 are shown as sticks (carbons, green; nitrogens, blue; oxygens, red; sulfurs, yellow). DNA binding at the active site of A3A is presented in B. ribbon and C. surface representation. D. Conformational changes of residues R28, H29 and Y132 upon DNA binding are indicated by arrows. Side chains are in stick representation, with white- and green-colored carbons for the apo (PDB code 4XXO) (26) and DNA-bound forms, respectively. Surface electrostatic potentials of E. apo and F. DNA-bound A3A are colored red to blue for negative and positive charges, respectively, using a scale of -5 to +5 kT/e.
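The electrostatic scale in Figure III.2E,F is given in units of kT/e. Converting it to millivolts makes the range easier to compare with other maps. This sketch assumes T = 298.15 K, which the legend does not state. The constants are the exact SI (CODATA 2018) values.

```python
# Exact SI values (CODATA 2018)
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C


def kt_over_e_millivolts(temperature_k: float = 298.15) -> float:
    """Thermal voltage kT/e in millivolts at the given temperature (assumed 298.15 K)."""
    return K_B * temperature_k / E_CHARGE * 1e3


unit = kt_over_e_millivolts()
print(f"1 kT/e = {unit:.2f} mV; +/-5 kT/e = +/-{5 * unit:.1f} mV")
```

At this temperature, the ±5 kT/e color range spans roughly ±128 mV.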


Figure III.3. Secondary structure elements of A3A. The A3A structure is shown as a ribbon diagram in green. A zinc ion at the active site is depicted as a magenta-colored sphere. The zinc-coordinating residues H70, C101 and C106 are in stick representation (carbons, green; nitrogens, blue; oxygens, red; sulfurs, yellow). The A3A structure consists of the following elements:

- helix α1 (residues 15–22)
- strand β1 (32–41)
- strand β2 (44–47 and 53–56, with a break or “kink” in the strand)
- helix α2 (71–82)
- strand β3 (89–96)
- helix α3 (106–116)
- strand β4 (120–126)
- helix α4 (136–146)
- strand β5 (149–152)
- helix α5 (155–165)
- helix α6 (179–195)
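The residue ranges in the Figure III.3 legend can be used as a simple lookup that maps a residue number to its secondary-structure element or to the loop between elements. This makes it easy to check which loop each DNA-contacting residue belongs to. The sketch below makes one assumption about nomenclature: loops are numbered sequentially between consecutive elements (Loop 1 between α1 and β1, Loop 2 between β1 and β2, and so on), and the β2 kink is reported separately rather than counted as a loop. The helper name is hypothetical.

```python
from typing import List, Tuple

# Secondary-structure elements of A3A from the Figure III.3 legend
# (inclusive residue ranges). Beta2 is split by a kink at residues 48-52.
ELEMENTS: List[Tuple[str, List[Tuple[int, int]]]] = [
    ("α1", [(15, 22)]),
    ("β1", [(32, 41)]),
    ("β2", [(44, 47), (53, 56)]),
    ("α2", [(71, 82)]),
    ("β3", [(89, 96)]),
    ("α3", [(106, 116)]),
    ("β4", [(120, 126)]),
    ("α4", [(136, 146)]),
    ("β5", [(149, 152)]),
    ("α5", [(155, 165)]),
    ("α6", [(179, 195)]),
]


def assign(residue: int) -> str:
    """Return the element, loop or kink containing a residue number.

    Assumption: Loop i lies between element i and element i + 1 in
    sequence order; the gap inside beta2 is labelled as its kink.
    """
    for name, segments in ELEMENTS:
        for start, end in segments:
            if start <= residue <= end:
                return name
    b2_segments = dict(ELEMENTS)["β2"]
    if b2_segments[0][1] < residue < b2_segments[1][0]:
        return "β2 kink"
    for i in range(len(ELEMENTS) - 1):
        prev_end = ELEMENTS[i][1][-1][1]
        next_start = ELEMENTS[i + 1][1][0][0]
        if prev_end < residue < next_start:
            return f"Loop {i + 1}"
    return "terminus/unassigned"


for res in (28, 29, 31, 57, 98, 99, 130, 131, 132):
    print(res, assign(res))
```

Under this numbering, R28/H29/T31 fall in Loop 1, N57 in Loop 3, W98/S99 in Loop 5 and Y130–Y132 in Loop 7. These assignments are consistent with how the text places these residues.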


Table III.1. Data collection and refinement statistics (molecular replacement)

                                   A3A/DNA complex
Data collection
  Space group                      I222
  Cell dimensions
    a, b, c (Å)                    56.6, 72.7, 115.0
    α, β, γ (°)                    90.0, 90.0, 90.0
  Resolution (Å)                   50.00–2.20 (2.24–2.20)*
  Rmerge                           9.1 (52.8)
  I/σI                             27.9 (3.0)
  Completeness (%)                 98.2 (82.9)
  Redundancy                       13.1 (8.4)
Refinement
  Resolution (Å)                   50.00–2.20
  No. of reflections               11,542
  Rwork/Rfree                      0.177/0.225
  No. of atoms
    Protein                        1,469
    Ligand/ion                     71/3
    Water                          100
  B-factors
    Protein                        35.8
    Ligand/ion                     50.9/55.1
    Water                          38.7
  r.m.s. deviations
    Bond lengths (Å)               0.011
    Bond angles (°)                1.040

*Highest-resolution shell is shown in parentheses.
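The cell dimensions in Table III.1 can be turned into a unit-cell volume. Combined with a molecular mass, that volume gives a Matthews coefficient, V_M = V_cell / (Z × MW). This sketch assumes the orthorhombic I222 cell, which has eight asymmetric units per cell, and one complex per asymmetric unit, as stated in the text. The molecular mass used below is a hypothetical placeholder, not a value from this work, so the printed V_M is illustrative only.

```python
def orthorhombic_cell_volume(a: float, b: float, c: float) -> float:
    """Unit-cell volume (cubic Å) for an orthorhombic cell (all angles 90°)."""
    return a * b * c


def matthews_coefficient(cell_volume: float, z: int, mw_daltons: float) -> float:
    """Matthews coefficient V_M (cubic Å per Da) = V_cell / (Z * MW)."""
    return cell_volume / (z * mw_daltons)


# Cell dimensions from Table III.1; I222 has 8 asymmetric units per cell.
volume = orthorhombic_cell_volume(56.6, 72.7, 115.0)
n_asu = 8
complexes_per_asu = 1  # "a single A3A-ssDNA complex in the asymmetric unit"
print(f"V_cell = {volume:,.0f} Å^3; V per asymmetric unit = {volume / n_asu:,.0f} Å^3")

# HYPOTHETICAL mass: replace with the sequence-derived mass of protein + DNA.
mw_complex_hypothetical = 28_000.0
vm = matthews_coefficient(volume, n_asu * complexes_per_asu, mw_complex_hypothetical)
print(f"V_M (with placeholder mass) = {vm:.2f} Å^3/Da")
```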


The target deoxycytidine (dC0) and the flanking deoxythymidines (dT-1 and dT1) were well ordered in the electron density. So were one additional deoxyribose at the 5’-end and one phosphate at the 3’-end (5’-sugar-dT-1-dC0-dT1-phosphate-3’; Figure III.2A). Of the nearly 1,280 Å² of surface area on the resolved DNA, ~620 Å² is buried in the interface with A3A. The central cytidine (dC0) and the preceding thymidine (dT-1) are accommodated in a deep groove formed by Loops 1, 3, 5 and 7 of A3A (Figure III.2B and Figure III.1A). The bound DNA adopts an irregular conformation that encircles the side chain of H29 (Figure III.2C).
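Buried surface area is usually obtained by comparing the solvent-accessible surface area (SASA) of a partner on its own with its SASA in the complex. The worked equation below assumes that "surface area on the resolved DNA" refers to the SASA of the ordered DNA atoms computed in isolation. Under that reading, roughly half of the ordered DNA surface is buried by A3A.

```latex
\[
\mathrm{BSA}_{\mathrm{DNA}}
  = \mathrm{SASA}_{\mathrm{DNA}}^{\mathrm{free}} - \mathrm{SASA}_{\mathrm{DNA}}^{\mathrm{complex}},
\qquad
f_{\mathrm{buried}}
  = \frac{\mathrm{BSA}_{\mathrm{DNA}}}{\mathrm{SASA}_{\mathrm{DNA}}^{\mathrm{free}}}
  \approx \frac{620\ \text{\AA}^2}{1280\ \text{\AA}^2}
  \approx 0.48
\]
```

Equivalently, about 660 Å² of the resolved DNA surface would remain solvent exposed in the complex.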

Compared to apo A3A, the side chains of R28 and H29 in Loop 1 and of Y132 in Loop 7 adopt different rotamers. These changes are accompanied by a more subtle reorganization of N57–A72 in Loop 3 (Figure III.2D, Figure III.1 and Figure III.4). The rest of the enzyme, including the active site, remains essentially unchanged. The groove differs significantly from all previously suggested models of how ssDNA binds to A3s (21,29,35,36). This includes the recent structure of the pseudo-catalytic A3G-NTD in complex with poly-dT ssDNA (42). The conformational change allows the groove to sequester the ssDNA by forming a more complementary molecular surface. The groove becomes more complementary both in van der Waals packing and in its electrostatic (electropositive) character (Figure III.2E,F).
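Rotamer changes such as those of R28, H29 and Y132 are usually described by side-chain dihedral angles; χ1, for example, is the dihedral N–CA–CB–CG. The section does not report χ values. The following is therefore only a generic sketch of how such a change could be measured from apo and DNA-bound coordinates. The coordinates in the usage example are toy values, and the helper names are hypothetical.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees, in [-180, 180]) defined by four points.

    For a side-chain chi1 angle the points are N, CA, CB and the gamma
    atom (e.g. CG for Arg, His or Tyr).
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Components of the outer bonds perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def angular_change(chi_apo: float, chi_bound: float) -> float:
    """Smallest signed rotation (degrees) taking chi_apo to chi_bound."""
    return (chi_bound - chi_apo + 180.0) % 360.0 - 180.0


# Toy coordinates: the same N, CA, CB with the gamma atom in two positions.
n_atom, ca, cb = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
chi_apo = dihedral_deg(n_atom, ca, cb, (0.0, 1.0, 1.0))
chi_bound = dihedral_deg(n_atom, ca, cb, (-1.0, 0.0, 1.0))
print(f"chi1 apo = {chi_apo:.1f}, bound = {chi_bound:.1f}, "
      f"change = {angular_change(chi_apo, chi_bound):.1f} degrees")
```

Wrapping the difference into [-180, 180) avoids reporting a 340° change when the side chain has only rotated by 20°.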


Figure III.4. Comparison of bound and unbound crystal structures of A3A. A. Distance difference matrix between the apo (PDB code 4XXO) (26) and DNA-bound forms of A3A. All possible inter-Cα distances were calculated within the apo and DNA-bound forms of A3A. Each distance in the apo form was subtracted from the corresponding distance in the DNA-bound form. The resulting distance difference matrix is displayed as a contour plot (blue and red for negative and positive values, respectively, with a scale of -3.3 to +3.3 Å). The secondary structure elements of the DNA-bound A3A are indicated along the matrix; red- and green-colored rectangles depict α-helices and β-strands, respectively.

Figure III.4. Comparison of bound and unbound crystal structures of A3A (Continued). B. The distance difference matrix of selected A3A residues forming the interface with the bound DNA, showing the changes between the apo and DNA-bound structures. The residues were grouped into three categories, indicated by red, green and blue. Red- and green-colored residues move farther apart from each other when the substrate DNA binds to A3A. Blue-colored residues move closer to both the red- and green-colored residues.

Figure III.4. Comparison of bound and unbound crystal structures of A3A (Continued). Locations of the selected residues from the three groups on the A3A structure. C. Side chains displayed in stick representation, with arrows indicating side chain conformational changes. D. Spheres indicating the positions of Cα atoms (apo, gray; bound, colored according to the three groups as in panel B). Arrows depict the changes in Cα position upon DNA binding. The DNA molecule bound to A3A is in stick representation (carbons and phosphates, orange; nitrogens, blue; oxygens, red), and three nucleotides (dT-1, dC0 and dT1) are shown.
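The distance difference matrix in Figure III.4A follows directly from the procedure in the legend. All inter-Cα distances are computed in each structure, and the apo matrix is subtracted from the DNA-bound one. Below is a minimal NumPy sketch. It assumes the two Cα coordinate arrays list the same residues in the same order. Reading coordinates from PDB 4XXO and the complex is left out, and the toy coordinates are hypothetical.

```python
from typing import Optional

import numpy as np


def ca_distance_matrix(ca_coords) -> np.ndarray:
    """All pairwise Cα–Cα distances (Å) for an (N, 3) coordinate array."""
    xyz = np.asarray(ca_coords, dtype=float)
    diff = xyz[:, None, :] - xyz[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def distance_difference_matrix(apo_ca, bound_ca, clip: Optional[float] = 3.3) -> np.ndarray:
    """DNA-bound minus apo Cα distance matrix.

    Positive entries: residue pairs that move apart on DNA binding;
    negative entries: pairs that move closer together. Optional clipping
    to +/-clip Å mimics a fixed color scale and only affects display.
    """
    apo = ca_distance_matrix(apo_ca)
    bound = ca_distance_matrix(bound_ca)
    if apo.shape != bound.shape:
        raise ValueError("apo and bound structures must contain matched residues")
    ddm = bound - apo
    return ddm if clip is None else np.clip(ddm, -clip, clip)


# Toy example: three "residues"; the second one moves 1 Å away on binding.
apo_xyz = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bound_xyz = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(np.round(distance_difference_matrix(apo_xyz, bound_xyz), 3))
```

The matrix is built from internal distances, so it does not depend on how the two structures are superposed. This makes it a convenient way to localize binding-induced changes.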

III.c.2. Recognition of the targeted cytidine.

The deoxycytidine that is the target of the deamination reaction (dC0) is well coordinated and buried within the active site of A3A. The cytidine ring is located directly over the hydroxyl group of the T31 side chain. This hydroxyl likely hydrogen bonds to the π-orbital cloud of the base ring while simultaneously coordinating the O4’ atom of the deoxyribose (Figure III.5A). Residue Y130 contributes to the positioning of dC0 by forming a T-shaped π–π interaction with the pyrimidine ring. The hydroxyl group of Y130 further forms a hydrogen bond with the 5’-phosphate of dC0 (Figure III.5B). The H70 side chain is positioned over the N1 atom of dC0, where it could potentially form a π–π stacking interaction (Figure III.5A). The backbone NH of A71 hydrogen bonds to O2 of dC0. In addition, the carbonyl oxygen atoms of W98 and S99 form a bifurcated hydrogen bond to the NH2 of the cytosine. This bond appears both to support the positioning of dC0 and to dictate the specificity for cytosine over thymine.
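The text contrasts Y130's T-shaped π–π contact with the potential parallel stacking of H70 over dC0, and that contrast is essentially geometric. One common way to quantify it is the angle between the best-fit planes of the two rings. The sketch below assumes ring-atom coordinates have already been extracted from the model; the toy hexagons in the usage example are hypothetical. Its 30° and 60° cutoffs are illustrative assumptions, not values from this work. A fuller analysis would also consider the distance between ring centroids and their lateral offset.

```python
import numpy as np


def ring_normal(ring_coords) -> np.ndarray:
    """Unit normal of the best-fit plane through ring atoms (via SVD)."""
    xyz = np.asarray(ring_coords, dtype=float)
    centered = xyz - xyz.mean(axis=0)
    # The right-singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(centered)
    return vt[-1]


def interplanar_angle_deg(ring_a, ring_b) -> float:
    """Angle between two ring planes, folded into the range 0-90 degrees."""
    cos_angle = abs(float(np.dot(ring_normal(ring_a), ring_normal(ring_b))))
    return float(np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0))))


def classify_geometry(angle_deg: float, stacked_max: float = 30.0,
                      tshape_min: float = 60.0) -> str:
    """Coarse label from the interplanar angle; the cutoffs are assumptions."""
    if angle_deg <= stacked_max:
        return "parallel (stacked)"
    if angle_deg >= tshape_min:
        return "T-shaped (edge-to-face)"
    return "intermediate"


# Toy rings: a hexagon in the xy-plane, one stacked above it, one perpendicular.
theta = np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
ring_xy = np.stack([np.cos(theta), np.sin(theta), np.zeros(6)], axis=1)
ring_stacked = ring_xy + np.array([0.0, 0.0, 3.5])
ring_perp = np.stack([np.cos(theta), np.zeros(6), np.sin(theta)], axis=1) + np.array([0.0, 3.0, 0.0])

for label, other in (("stacked", ring_stacked), ("perpendicular", ring_perp)):
    angle = interplanar_angle_deg(ring_xy, other)
    print(f"{label}: {angle:.1f} degrees -> {classify_geometry(angle)}")
```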

As expected for a catalytic A3 domain, electron density that fits a zinc ion was observed. The zinc is coordinated by H70, C101 and C106, together with additional density that fits a Cl⁻ ion; both assignments were confirmed by anomalous difference calculations. To prevent catalysis, our A3A construct was inactivated by an E72A mutation, which left the geometry of the active site intact (Figure III.5A). In place of the E72 side chain, we observe electron density that fits a water molecule. Molecular modelling of E72 into this space shows that the side chain would be positioned just proximal to the deamination target, the C4-NH2 moiety of the cytosine


Figure III.5. A3A–ssDNA atomic interactions. Interactions between A3A and A. the target nucleotide base (dC0), B. the DNA backbone flanking dC0 and C. the nucleotide at the -1 position (dT-1). D. Interactions between the H29 side chain and the substrate DNA. Side chains of A3A residues (carbons, green) and the DNA (carbons and phosphates, orange) are in stick representation, with other atoms colored as in Figure III.2B. Spheres indicate the zinc ion (Zn) at the active center (magenta), the zinc-liganded chloride (Cl; green) and a water molecule (W; red). Dashed lines depict estimated hydrogen bonds (orange) and π–orbital interactions (black).
between helpsdirectionality stabilize the 20-OH the and geometry of ssDNA H70 would of binding the by forming a hydrogen bond dT1 dT1 À À chainsedge of A3Aof the residues thymine (carbons base faces green)(d) these Interactions and Loop the DNA 7 between residues, (carbons H29 and and side makes chainDNAoccur, and the backbone therefore substrate requiring DNA.and the Side a sugar conformationalto in O3 a0 C2atom0-endo rearrangement of conformation dC0, which for helps stabilize the geometry of the The conserved N57 is central to the active site geometry. N57 of phosphatesthree hydrogen orange) are bonds: in stick O2 representation, atomchains with of A3A Y132 with residues backboneother atoms (carbons amide, coloured green) N3 (Fig. andRNA the 2b; modification. DNA Supplementary (carbons Thus, and these Fig. three 4a)DNA pivotal and backbone induces interactions anda backbone of the N57 sugar in a C20-endo conformation A3A is completely conserved among the catalytically active as in Fig. 1b. A zinc ion (Zn) at the activephosphates centre, orange) the zinc-liganded are in stick chlorine representation,deformation with other due atoms to steric coloured hindrance(Fig. with 2b; O5 Supplementary0 of the target dC Fig.0. 4a) and induces a backbone APOBEC protein domains, while inactive pseudo-catalytic A3 (Cl)4 and water molecule (W) are indicatedas in Fig. by 1b. spheres A zinc coloured ionNATURE (Zn) magenta, at COMMUNICATIONS the activeThe centre, N57| 8:15024 the side zinc-liganded | DOI: chain 10.1038/ncomms15024 forms chlorine a hydrogen |deformation www.nature.com/naturecommunications bond with due the to stericbackbone hindrance with O5 of the target dC . domains have a conserved glycine (Supplementary Fig. 1; 0 0 green and red, respectively. 
Estimated(Cl) hydrogen and water bonds molecule and p–orbital (W) are indicatedNH by of spheres T31, positioning coloured magenta, the T31 sideThe chain N57 to side hydrogen chain forms bond toa hydrogen bond with the backbone dT dT Supplementary Table 1), and widely conserved among other interactions are depictedÐ1 by dashedgreen lines coloured and red, orange respectively. andÐ1 black, Estimated hydrogenthe p-orbital bonds and cloudp–orbital of the dC0 NHbase of ring, T31, thus positioning ensuring the the T31 side chain to hydrogen bond to dC0 dC0 cytidine/cytosine deaminases from Escherichia coli through Homo respectively. geometry of the target nucleotide within the active site (Fig. 2a; interactions are depicted by dashed lines coloured52 orange and black, the p-orbital cloud of the dC0 base ring, thus ensuring the sapiensSupplementary. The structure Fig. 4a). explains Finally, this the strong N57 side conservation, chain packs as N57 against Figure 2 | A3A–ssDNA atomic interactions.respectively.Stereo-view of the of A3A is central in recognizing ssDNAgeometry with three of the key target distinct nucleotide within the active site (Fig. 2a; Specificity for pyrimidines at 1 position. The deoxythymidine both the deoxyribose ring of dC0Supplementary, stabilizing the Fig. orientation 4a). Finally, of the N57 side chain packs against interactions between A3A and (a) theÀ target nucleotide base (dC0), (b) the interactions: The side chain of N57 determines the 50–30 at the 50-side of the target (the Specificity1 position; for dT pyrimidines1) has extensive at 1 positionsugar plane,. The and deoxythymidine H70, which coordinates zinc. Although RNA DNA backbone flanking dC0 (c) nucleotideÀ at 1 positionÀ (dT 1). 
directionality of ssDNA binding byboth forming the deoxyribosea hydrogen53,54 bond ring of dC0, stabilizing the orientation of van der Waals contacts with three residuesÀ from LoopÀ 7 (Y130,À deaminase activity has been reported for A3A , if the sugar (d) Interactions between H29 side chainat the and 50 the-side substrate of the DNA. target Side (the 1to position; O30 atom dT of dC1) has0, which extensive helps stabilizesugar plane, the geometry and H70, of the which coordinates zinc. Although RNA D131 and Y132) and W98 in Loop 5 (Fig. 2c). The Watson–CrickÀ was a riboseÀ a steric clash between the 2 -OH and H70 would chains of A3A residues (carbons green)van and der the Waals DNA (carbons contacts and with threeDNA residues backbone from Loopand the 7 (Y130, sugar indeaminase a C2 -endo0 activity conformation has been reported for A3A53,54, if the sugar edge of the thymine base faces these Loop 7 residues, and makes occur, therefore requiring a conformational0 rearrangement for phosphates orange) are in stick representation,D131 and with Y132) other and atoms W98 coloured in Loop(Fig. 5 (Fig. 2b; 2c). Supplementary The Watson–Crick Fig. 4a)was and a induces ribose a a steric backbone clash between the 2 -OH and H70 would three hydrogen bonds: O2 atom with Y132 backbone amide, N3 RNA modification. Thus, these three pivotal interactions of N57 0 as in Fig. 1b. A zinc ion (Zn) at the activeedge centre, of the the thymine zinc-liganded base chlorine faces thesedeformation Loop 7 residues, due to steric and hindrancemakes occur, with O5 therefore0 of the target requiring dC0. a conformational rearrangement for (Cl) and water molecule (W) are indicated by spheres coloured magenta, The N57 side chain forms a hydrogen bond with the backbone 4 three hydrogen bonds:NATURE O2 COMMUNICATIONS atom with Y132| 8:15024 backbone | DOI: 10.1038/ncomms15024 amide, N3 RNA | www.nature.com/naturecommunications modification. 
Thus, these three pivotal interactions of N57 green and red, respectively. Estimated hydrogen bonds and p–orbital NH of T31, positioning the T31 side chain to hydrogen bond to interactions are depicted by dashed lines4 coloured orange and black, the p-orbital cloudNATURE of COMMUNICATIONS the dC0 base ring,| 8:15024 thus | DOI: ensuring 10.1038/ncomms15024 the | www.nature.com/naturecommunications respectively. geometry of the target nucleotide within the active site (Fig. 2a; Supplementary Fig. 4a). Finally, the N57 side chain packs against Specificity for pyrimidines at 1 position. The deoxythymidine both the deoxyribose ring of dC , stabilizing the orientation of À 0 at the 50-side of the target (the 1 position; dT 1) has extensive sugar plane, and H70, which coordinates zinc. Although RNA van der Waals contacts with threeÀ residues fromÀ Loop 7 (Y130, deaminase activity has been reported for A3A53,54, if the sugar D131 and Y132) and W98 in Loop 5 (Fig. 2c). The Watson–Crick was a ribose a steric clash between the 20-OH and H70 would edge of the thymine base faces these Loop 7 residues, and makes occur, therefore requiring a conformational rearrangement for three hydrogen bonds: O2 atom with Y132 backbone amide, N3 RNA modification. Thus, these three pivotal interactions of N57

4 NATURE COMMUNICATIONS | 8:15024 | DOI: 10.1038/ncomms15024 | www.nature.com/naturecommunications

(Figure III.6) and poised for the deamination reaction (2). After catalysis and the subsequent release of NH3, this coordination, along with the interactions with W98 and S99, would be unfavorable for the product uridine. Overall, multiple interactions of the substrate cytosine with A3A active site residues ensure the specific recognition and geometry required for the deamination reaction and product release.


Figure III.6. Structural model of the A3A catalytically active site. The target nucleotide base (dC0) bound at the A3A active site, where the catalytic E72 side chain was modelled in place of the alanine present at this position in the crystal structure. Zinc, the coordinated water (W), the carbonyl oxygen of dC0 and the carboxyl oxygen of the E72 side chain are connected by dashed lines in magenta.

III.c.3. Specificity for pyrimidines at -1 position.

The deoxythymidine at the 5’-side of the target (the -1 position; dT-1) has extensive van der Waals contacts with three residues from Loop 7 (Y130, D131 and Y132) and with W98 in Loop 5 (Figure III.5C). The Watson–Crick edge of the thymine base faces these Loop 7 residues and makes three hydrogen bonds: the O2 atom with the Y132 backbone amide, N3 with the D131 side chain carboxylate and O4 with a water molecule. In addition, the D131 side chain forms a salt bridge with the R189 side chain in helix 6, which stabilizes the overall hydrogen bonding configuration of Loop 7 to the thymine base. This coordination appears critical, as residue 189 is conserved as a basic residue (Arg/Lys) only in catalytically active A3 domains (Figure III.1 and Table III.2). At the -1 position, deoxycytidine could form similar, but slightly rearranged, interactions, as its N3 atom lacks the proton needed to hydrogen bond with D131. Indeed, although A3A has dual specificity for 5’-TC-3’ and 5’-CC-3’ (40), there is a preference for thymidine at the -1 position. However, Loop 7 of A3A, in particular residues Y130 and D131, would likely preclude a larger purine base from fitting in this position, thus defining the T/C specificity of A3A.
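The -1 preference described above can be restated as a simple sequence rule. The sketch below uses a hypothetical helper, `classify_targets` (an illustrative name, not part of this work), that scans a 5'→3' ssDNA string and labels each cytidine by its -1 neighbor: T as the preferred 5'-TC context, C as the accepted 5'-CC context, and a purine as disfavored, following the Loop 7 argument. It encodes only the qualitative ranking given in the text and does not model binding affinity or deamination rates.

```python
# Hypothetical illustration: rank candidate A3A target cytidines by the -1 base,
# following the qualitative T > C >> purine preference described in the text.
MINUS1_CLASS = {
    "T": "preferred (5'-TC)",
    "C": "accepted (5'-CC)",
    "A": "disfavored (5'-RC)",
    "G": "disfavored (5'-RC)",
}


def classify_targets(seq: str) -> list[tuple[int, str, str]]:
    """Return (0-based index, -1/0/+1 context, class) for each C that has a 5' neighbor."""
    seq = seq.upper()
    sites = []
    for i in range(1, len(seq)):
        if seq[i] != "C":
            continue
        minus1 = seq[i - 1]
        plus1 = seq[i + 1] if i + 1 < len(seq) else "-"  # '-' marks the 3' end
        label = MINUS1_CLASS.get(minus1, "unknown")       # e.g. ambiguity codes
        sites.append((i, minus1 + "C" + plus1, label))
    return sites


if __name__ == "__main__":
    for site in classify_targets("TTCTTCCAGC"):
        print(site)
```

The +1 base is reported but not scored, in keeping with the apparent lack of +1 specificity discussed for H29.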


Table III.2. Conservation of DNA-coordinating residues between human A3 domains (catalytic and inactive pseudo-catalytic)

Included domains. Catalytic: A3A, A3B-CTD, A3C, A3D-CTD, A3G-CTD, A3F-CTD. Inactive pseudo-catalytic: A3B-NTD, A3D-NTD, A3G-NTD (rA3G-NTD), A3F-NTD.

Residue (A3A numbering) | Catalytic | Inactive pseudo-catalytic
28 | Arg | Arg (Leu in A3G-NTD)
29 | His / Asn / Arg | Asn / Asp / Ser
31 | Ser / Thr | Val / Thr
57 | Asn | Gly
70 | His (Zn) | His (Zn)
71 | Ala | Ala / Pro
72 | Glu (catalytic) | Glu (Zn)
98 | Trp | Trp
99 | Ser | Ser / Thr / Asn
101 | Cys (Zn) | Cys (Zn)
106 | Cys (Zn) | Cys (Zn)
130 | Tyr | Tyr
131 | Asp / Tyr | Tyr
132 | Asp / Tyr / Phe | Phe / Tyr
189 | Arg / Lys | Met / Thr

*APOBEC3H not included.
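Table III.2 can be queried directly to see which positions separate the two groups of domains. The sketch below transcribes the table into Python sets (A3A numbering; APOBEC3H excluded, as in the table) and reports positions whose residue sets do not overlap at all, and positions that carry one identical residue in both groups. It is a set comparison on the tabulated identities only, not a substitute for a multiple sequence alignment, and the function names are illustrative.

```python
# Residue identities per position, transcribed from Table III.2 (A3A numbering).
CATALYTIC = {
    28: {"Arg"}, 29: {"His", "Asn", "Arg"}, 31: {"Ser", "Thr"}, 57: {"Asn"},
    70: {"His"}, 71: {"Ala"}, 72: {"Glu"}, 98: {"Trp"}, 99: {"Ser"},
    101: {"Cys"}, 106: {"Cys"}, 130: {"Tyr"}, 131: {"Asp", "Tyr"},
    132: {"Asp", "Tyr", "Phe"}, 189: {"Arg", "Lys"},
}
PSEUDO = {
    28: {"Arg", "Leu"}, 29: {"Asn", "Asp", "Ser"}, 31: {"Val", "Thr"}, 57: {"Gly"},
    70: {"His"}, 71: {"Ala", "Pro"}, 72: {"Glu"}, 98: {"Trp"},
    99: {"Ser", "Thr", "Asn"}, 101: {"Cys"}, 106: {"Cys"}, 130: {"Tyr"},
    131: {"Tyr"}, 132: {"Phe", "Tyr"}, 189: {"Met", "Thr"},
}


def discriminating_positions(a, b):
    """Positions whose residue sets in the two groups share no residue."""
    return sorted(p for p in a if a[p].isdisjoint(b[p]))


def invariant_positions(a, b):
    """Positions with one identical residue in both groups."""
    return sorted(p for p in a if len(a[p]) == 1 and a[p] == b[p])


if __name__ == "__main__":
    print(discriminating_positions(CATALYTIC, PSEUDO))
    print(invariant_positions(CATALYTIC, PSEUDO))
```

Only positions 57 (Asn versus Gly) and 189 (Arg/Lys versus Met/Thr) come out as fully discriminating, which matches the two residues singled out in the text.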


III.c.4. The conserved N57 is central to the active site geometry.

N57 of A3A is completely conserved among the catalytically active APOBEC protein domains, while inactive pseudo-catalytic A3 domains have a conserved glycine at this position (Figure III.1 and Table III.2); it is also widely conserved among other cytidine/cytosine deaminases from Escherichia coli through Homo sapiens (52). The structure explains this strong conservation, as N57 of A3A is central to recognizing ssDNA through three distinct key interactions. First, the side chain of N57 determines the 5’–3’ directionality of ssDNA binding by forming a hydrogen bond to the O3’ atom of dC0, which helps stabilize the geometry of the DNA backbone and the sugar in a C2’-endo conformation (Figure III.5B and Figure III.7A), and it induces a backbone deformation through steric hindrance with O5’ of the target dC0. Second, the N57 side chain forms a hydrogen bond with the backbone NH of T31, positioning the T31 side chain to hydrogen bond to the π-orbital cloud of the dC0 base ring and thus ensuring the geometry of the target nucleotide within the active site (Figure III.5A and Figure III.7A). Finally, the N57 side chain packs against both the deoxyribose ring of dC0, stabilizing the orientation of the sugar plane, and H70, which coordinates zinc. Although RNA deaminase activity has been reported for A3A (53,54), if the sugar were a ribose, a steric clash between the 2’-OH and H70 would occur, so RNA modification would require a conformational rearrangement. Thus, these three pivotal interactions of N57 organize the enzyme–substrate complex so that it is poised for catalytic turnover.
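The C2’-endo conformation mentioned above is conventionally quantified by the pseudorotation phase angle P of the sugar ring (Altona and Sundaralingam). The sketch below computes P and the puckering amplitude from the five endocyclic torsions ν0 to ν4 and assigns the named pucker of the 36° sector containing P (C3’-endo centered at 18°, C2’-endo at 162°); naming by sector is a simplification that ignores the envelope/twist distinction. The torsions in the usage example are generated synthetically from the pseudorotation model, not taken from the A3A–ssDNA coordinates, and the function names are illustrative.

```python
import math

# Names of the ten 36-degree pseudorotation sectors, starting at P = 0.
PUCKERS = ["C3'-endo", "C4'-exo", "O4'-endo", "C1'-exo", "C2'-endo",
           "C3'-exo", "C4'-endo", "O4'-exo", "C1'-endo", "C2'-exo"]
_S = math.sin(math.radians(36)) + math.sin(math.radians(72))


def pseudorotation(nu0, nu1, nu2, nu3, nu4):
    """Return (P, nu_max) in degrees from endocyclic torsions nu0..nu4 (degrees).

    nu0 = C4'-O4'-C1'-C2', nu1 = O4'-C1'-C2'-C3', nu2 = C1'-C2'-C3'-C4',
    nu3 = C2'-C3'-C4'-O4', nu4 = C3'-C4'-O4'-C1'.
    """
    num = (nu4 + nu1) - (nu3 + nu0)   # proportional to nu_max * sin(P)
    den = 2.0 * nu2 * _S              # proportional to nu_max * cos(P)
    p = math.degrees(math.atan2(num, den)) % 360.0
    nu_max = math.hypot(num, den) / (2.0 * _S)
    return p, nu_max


def pucker_name(p):
    """Name of the 36-degree sector containing phase angle p."""
    return PUCKERS[int(p // 36.0) % 10]


def model_torsions(p, nu_max):
    """Synthetic torsions from nu_j = nu_max * cos(P + 144*(j - 2)), j = 0..4."""
    return [nu_max * math.cos(math.radians(p + 144.0 * (j - 2))) for j in range(5)]


if __name__ == "__main__":
    p, amp = pseudorotation(*model_torsions(162.0, 38.0))
    print(round(p, 1), round(amp, 1), pucker_name(p))  # 162.0 38.0 C2'-endo
```

Using `atan2` rather than a plain arctangent keeps P in the correct quadrant, so South (C2’-endo) and North (C3’-endo) puckers are not confused.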


Figure III.7. Close-up views of the conserved asparagine and sugar arrangement. The selected amino acid residues in A. A3A in complex with ssDNA (this study; only dC0 is shown), B. DNA cytidine deaminase from bacteriophage S-TIM5 (PDB code 4P9C) (56) in complex with deoxyuridine monophosphate (dUMP), C. RNA cytidine deaminase from mouse (PDB code 2FR6) (55) in complex with cytidine (rC) and D. RNA cytidine deaminase from human (PDB code 1MQ0) (57) in complex with the deaminase inhibitor 1-β-ribofuranosyl-1,3-diazepinone (indicated by asterisk) are shown as stick models (carbons of the conserved asparagine residue, green; other carbons, white). The substrates are colored orange for carbons and phosphates. The conserved asparagine and the neighboring histidine/cysteine are drawn with van der Waals surfaces.


The three central interactions mediated by N57 are strictly conserved in the active site geometry of other cytidine deaminases (55-57), where the asparagine side chain (1) hydrogen bonds to the substrate backbone, (2) packs against the sugar to maintain its orientation and (3) packs against the side chain of the zinc-coordinating residue (Figure III.7B-D). The RNA cytidine deaminases replace the zinc-coordinating histidine with a relatively small amino acid, cysteine, which permits a ribose ring to fit (Figure III.7C,D). This explains why, although N57 is not located directly at the active site, even the conservative N57Q or N57D mutations severely disrupt deaminase activity (29,52,58). Thus, our A3A–ssDNA structure reveals that the conservation of N57 is critical for proper orientation of the substrate within the active sites of cytidine deaminases.


III.c.5. H29 coordinates the ssDNA binding in the active site.

H29 is the other linchpin in ssDNA binding to A3A. H29 of A3A corresponds to H216 in the catalytic domain of A3G (A3G-CTD), which abolishes activity when mutated to alanine (21). Maximal catalytic activity occurs at pH 5.5 for both A3G-CTD (59) and A3A (52), suggesting that the histidine is protonated. Interestingly, this histidine is not completely conserved in other A3s, where this position is sometimes an arginine or an asparagine. The H216R mutation in A3G and the H29R mutation in A3A resulted in reduced but still significant catalytic activity (41,59).

In the apo A3A crystal structure (26), H29 is involved in crystal contacts and rotated away from the active site (Figure III.2D). In the NMR structure of A3A (PDB code 2M65) (35), the H29 side chain is solvent exposed and its rotamer is not defined in solution. Thus, upon ssDNA binding, the side chain of H29 selects a rotamer that interacts extensively with the substrate, latching the active site to permit catalysis. Once catalysis occurs, H29 needs to rotate out of this position to release the deaminated product. H29 forms hydrogen bonds to the backbone phosphates of dT-1, dC0 and dT1, and to the deoxyribose of dT1 (Figure III.2D-F and Figure III.5D). The side chain of H29 is crucial in dT1 recognition, with the imidazole ring positioned to form π–π interactions with the pyrimidine ring of dT1. This relatively non-specific stacking interaction explains the apparent lack of specificity at the +1 position. Overall, our structure reveals the unique role of H29 in positioning the substrate ssDNA through a series of coordinated hydrogen bonds and stacking interactions, essentially latching the ssDNA and the target dC0 within the active site.
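The inference that H29 is protonated at the pH 5.5 activity optimum rests on the Henderson–Hasselbalch relation. The sketch below computes the protonated fraction of a titratable group; it assumes, for illustration only, a pKa of about 6.0, typical of a solvent-exposed histidine imidazole. The actual pKa of H29 in the A3A–ssDNA complex is not reported in this section and could be shifted by its environment.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Protonated fraction of an acid group: 1 / (1 + 10**(pH - pKa))."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


ASSUMED_HIS_PKA = 6.0  # illustrative value, not measured for H29

if __name__ == "__main__":
    for ph in (5.5, 6.0, 7.4):
        frac = fraction_protonated(ph, ASSUMED_HIS_PKA)
        print(f"pH {ph}: {frac:.2f} protonated")
```

Under this assumption roughly three quarters of the imidazole would be protonated at pH 5.5, compared with about 4% at pH 7.4.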


III.c.6. A3A and rA3G-NTD differ in DNA binding.

The recent structure of ssDNA bound to the inactive pseudo-catalytic domain rA3G-NTD (42) is not that of a substrate complex and displays a binding mode that is incompatible with catalysis. In contrast to our structure, the single base ordered in that structure is not coordinated within the binding pocket (Figure III.8); rather, a sugar is partially buried in the pocket.

More specifically, comparing the A3A–ssDNA with the rA3G-NTD–ssDNA structure: H70, W98, S99 and Y130 in A3A (H65, W94, S95 and Y125 in rA3G-NTD) are conserved in the two proteins’ sequences and interact with ssDNA; however, their interactions with the ssDNA share no similarities (Figure III.5A, Figure III.8 and Figure III.9). H70 of A3A forms a π-hydrogen bond with dC0, while H65 of rA3G-NTD forms a hydrogen bond with the C3’-carbonyl group of the ribose of dT0. W98 and S99 of A3A use their backbone carbonyl groups to hydrogen bond with the amino group of the target cytidine (dC0), while W94 of rA3G-NTD stacks with the pyrimidine of dT0. Y130 of A3A forms a π–π interaction with dC0 and a hydrogen bond with the phosphate between dC0 and dT-1, while Y125 of rA3G-NTD forms a hydrogen bond with the C3’-carbonyl group of the ribose of dT1. Many of these interactions preclude the interactions observed in A3A–ssDNA (Figure III.2 and Figure III.5). Amino acids with more extensive interactions with substrate ssDNA are not conserved in sequence or structure, including H29, which is D in rA3G-NTD, T31, which is V, and the critical N57, which is G (Figure III.1). In addition, interactions at the -1 and +1 positions are not observed in


Figure III.8. View of the atomic interactions between rA3G-NTD and ssDNA (PDB code 5K83). rA3G-NTD (gray ribbon) bound to ssDNA (orange sticks: the dT0 base, the backbone of dT1 and the sugar of dT0); magenta and red spheres are water and Zn, respectively.


Supplementary Fig. 6. (a) 2Fo-Fc and Fo-Fc maps of the bound DNA. The A3A and DNA structures are represented as a cartoon and a stick model, respectively. The 2Fo-Fc map is shown as a cyan mesh (1.0σ); the Fo-Fc map is shown in blue (3.0σ) and red (-3.0σ). (b) The simulated-annealing composite omit map of the bound DNA. The 2mFo-DFc map (pink) is contoured at 1.0σ.

Figure III.9. Structure and substrate-binding similarity between A3A and RNA deaminase TadA. A. A3A structure (green ribbon) bound to substrate DNA (orange sticks, as in Figure III.2B). Three DNA nucleotides (dT-1, dC0 and dT1) are displayed. A zinc ion at the active center is coordinated by H70 (helix α2), C101 and C106 (helix α3). B. TadA structure (60) (PDB code 2B3J) (grey ribbon) bound to substrate RNA (orange sticks). Three RNA nucleotides (rU-1, Neb0 (nebularine) and rC1) are displayed. Zinc-coordinating residues H53 (helix α2), C83 and C86 (helix α3) are shown in stick representation (carbons, white; nitrogens, blue; oxygens, red; sulfurs, yellow).

Figure III.9. Structure and substrate-binding similarity between A3A and RNA deaminase TadA. (Continued) C. rA3G-NTD structure (grey ribbon) bound to ssDNA (orange sticks); only the base and sugar of dT0 and the backbone of dT1 are mapped. Surface representation of the nucleotide-binding site of D. A3A, E. TadA and F. A3G-NTD. Close-up view of the active site of G. A3A, H. TadA and I. A3G-NTD. The catalytic glutamic acid side chain was modelled in place of the alanine at position 72 in the A3A crystal structure.


the rA3G-NTD–ssDNA complex structure, as only a single dT0 is ordered in the electron density. Critically, the target cytidine (dC0) in A3A is positioned for deamination, while the non-substrate dT0 in rA3G-NTD is not located close to the catalytic Zn2+. This binding mode corresponds to a much lower affinity of the pseudo-catalytic rA3G-NTD for ssDNA (~1.6 µM) (42), consistent with non-specific binding, compared with the ~60 nM (26) we observed for substrate ssDNA binding to A3A. While the rA3G-NTD–dT structure may represent a mechanism by which non-substrate ssDNA binds A3 domains, the A3A–ssDNA structure presented here elucidates the mechanism by which ssDNAs are recognized as substrates by catalytically active A3s.

112

III.c.7. Molecular recognition in polynucleotide deaminases.

Our crystal structure of the A3A–ssDNA complex and the crystal structure of Staphylococcus aureus tRNA adenosine deaminase (TadA) in complex with RNA (2B3J) (60) are both structures of single-stranded polynucleotide deaminases bound to their substrates. Although their substrates differ (TadA deaminates adenosine in the anticodon stem-loop of tRNAArg2, whereas A3A deaminates cytosines in ssDNA), their active sites are similar in that both have an HAEx~30Cx2-4C zinc-binding motif. We observe the most striking similarity in the phosphate-sugar backbone traces of RNA (TadA) and ssDNA (A3A) (Figure III.9): the 5′–3′ directionality is the same, and the polynucleotide is sharply bent with the target nucleotide deep in the active-site pocket. Five nucleotides in the anticodon stem-loop of tRNAArg2 adopt the C2′-endo ribose conformation typical of DNA, explaining how the RNA forms a backbone conformation similar to that of the ssDNA bound to A3A (Figure III.9A,D,G). This remarkable similarity of the phosphate-sugar backbone despite different substrates (tRNA for TadA and ssDNA for A3A) implies that HAEx~30Cx2-4C-type zinc-dependent deaminases share an evolutionarily conserved substrate-binding topology as well as catalytic mechanism.
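To make the HAEx~30Cx2-4C notation concrete, the following Python sketch scans a protein sequence for that pattern with a regular expression. It is an illustration only: the spacer bounds (24–36 residues between E and the first C, 2–4 between the two cysteines) are assumptions chosen to bracket the approximate "x~30" and "x2-4" in the text, the strict "HAE" triad follows the notation used here, and the function name `find_zinc_motifs` is hypothetical.

```python
import re
from typing import List, Tuple


def find_zinc_motifs(seq: str,
                     spacer: Tuple[int, int] = (24, 36),
                     inner: Tuple[int, int] = (2, 4)) -> List[Tuple[int, int, str]]:
    """Scan a protein sequence for H-A-E-x(spacer)-C-x(inner)-C.

    Returns (start, end, matched_text) with 0-based, end-exclusive positions.
    The spacer bounds are assumptions bracketing 'x~30' and 'x2-4'.
    """
    pattern = re.compile(
        r"HAE[A-Z]{%d,%d}C[A-Z]{%d,%d}C"
        % (spacer[0], spacer[1], inner[0], inner[1])
    )
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


# Synthetic test sequence (not a real protein): HAE + 30 spacer residues + C-AA-C
toy = "MK" + "HAE" + "G" * 30 + "CAAC" + "LL"
print(find_zinc_motifs(toy))
```

Because the quantifiers are greedy, the regex reports the longest spacer compatible with a complete motif at each start position; a real annotation would also allow the residue variation seen across deaminase families.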

This crystal structure of an ssDNA substrate–enzyme complex reveals how substrate recognition occurs in single-stranded polynucleotide-modifying enzymes, including APOBEC family members and other ssDNA deaminases. This is in contrast to the pseudo-catalytic domain A3G-NTD (42), which is not in a substrate complex and has only a single base ordered in the structure, only partially buried in the binding pocket, displaying a very dissimilar binding mode (Figure III.9C,F,I). The striking similarity of A3A–ssDNA (Figure III.9A,D,G) to the TadA–tRNA complex (Figure III.9B,E,H) implies structural and mechanistic conservation among single-stranded nucleotide-modifying enzymes that have evolved distinct specificities. These specificities may be leveraged for targeted gene editing. APOBEC1 and other cytidine deaminases were recently combined with CRISPR/Cas9 technology for direct ‘base editing’ to correct point mutations without the need for a donor template or double-stranded DNA breaks (61). By leveraging the directionality, specificity and ssDNA-binding architecture revealed by our A3A–ssDNA complex, base-editing technologies could become even more targeted and specific, expanding the scope and effectiveness of genome editing.

III.d. Methods

III.d.1. Preparation of protein and DNA.

A3A(E72A/C171A) protein was prepared as described previously (26). Briefly, the protein was expressed in E. coli strain BL21 DE3 Star (Stratagene) cells carrying the pCold-GST-A3A(E72A/C171A) vector. Expression was induced with 1 mM isopropyl β-D-1-thiogalactopyranoside at 16 °C for 22 h in lysogeny broth medium containing 100 µg ml⁻¹ ampicillin. Cells were pelleted, resuspended in purification buffer (50 mM Tris-HCl (pH 8.0), 300 mM NaCl and 1 mM dithiothreitol) and lysed by sonication. Cellular debris was removed by centrifugation (45,000g, 30 min, 4 °C). The protein was purified as a GST-fused protein with glutathione-immobilized resin (Clontech). After digestion with HRV 3C protease, the protein was further purified on a size-exclusion column (GE Healthcare) equilibrated with buffer (10 mM Tris-HCl (pH 8.0), 200 mM NaCl and 1 mM dithiothreitol). The fraction containing the monomeric form was collected and concentrated for crystallization. The purity and integrity of A3A(E72A/C171A) were confirmed by SDS–polyacrylamide gel electrophoresis. E72A inactivates the enzyme, while C171A (distal to the active site) enhances the solubility of the expressed protein.
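As a worked example of the buffer recipes above, the sketch below converts the stated millimolar concentrations into grams per litre. The formula weights are standard values (Tris base 121.14 g/mol, NaCl 58.44 g/mol, dithiothreitol 154.25 g/mol); the assumption that Tris-HCl is prepared from Tris base titrated to pH 8.0 with HCl, and the helper `grams_needed`, are illustrative and not part of the original protocol.

```python
from typing import Dict

# Standard formula weights (g/mol). Assumes the Tris-HCl buffer is made from
# Tris base titrated to pH 8.0 with HCl (the added HCl is not counted here).
FORMULA_WEIGHTS = {"Tris base": 121.14, "NaCl": 58.44, "DTT": 154.25}

# Concentrations (mM) from the purification and size-exclusion buffers
PURIFICATION_BUFFER_MM = {"Tris base": 50.0, "NaCl": 300.0, "DTT": 1.0}
SEC_BUFFER_MM = {"Tris base": 10.0, "NaCl": 200.0, "DTT": 1.0}


def grams_needed(recipe_mm: Dict[str, float], volume_l: float) -> Dict[str, float]:
    """Mass (g) of each component for `volume_l` litres: mM / 1000 * L * g/mol."""
    return {name: conc_mm / 1000.0 * volume_l * FORMULA_WEIGHTS[name]
            for name, conc_mm in recipe_mm.items()}


for label, recipe in (("purification", PURIFICATION_BUFFER_MM), ("SEC", SEC_BUFFER_MM)):
    masses = grams_needed(recipe, volume_l=1.0)
    print(label, {k: round(v, 3) for k, v in masses.items()})
```

Dithiothreitol is usually added fresh just before use, so in practice it is often kept out of the stock and dosed separately.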

The DNA oligo d(TTTTTTTTCTTTTTT) was synthesized (Integrated DNA Technologies) and mixed with the purified A3A(E72A/C171A) protein at a molar ratio of 2:1.
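The 2:1 molar ratio can be turned into pipetting volumes with a short calculation. In this sketch, reading the ratio as DNA:protein, the ~23 kDa molecular weight for A3A(E72A/C171A), the 50 µl protein volume and the 5 mM oligo stock are all assumptions for illustration; only the 2:1 ratio and the ~20 mg ml⁻¹ protein concentration come from the text, and the function names are hypothetical.

```python
def mg_per_ml_to_um(conc_mg_ml: float, mw_da: float) -> float:
    """mg/ml equals g/L; divide by g/mol for mol/L, then scale to µM."""
    return conc_mg_ml / mw_da * 1e6


def oligo_volume_ul(protein_ul: float, protein_um: float,
                    oligo_stock_um: float, ratio: float = 2.0) -> float:
    """Volume of oligo stock giving `ratio` mol of DNA per mol of protein.

    µl x µM = pmol, so (ratio * pmol of protein) / µM of stock = µl of stock.
    """
    return ratio * protein_ul * protein_um / oligo_stock_um


# Assumptions: ~23 kDa protein, 50 µl of protein, hypothetical 5 mM oligo stock
protein_um = mg_per_ml_to_um(20.0, 23_000.0)   # roughly 870 µM
print(round(protein_um), round(oligo_volume_ul(50.0, protein_um, 5_000.0), 1))
```

The calculation ignores the dilution caused by adding the oligo; a concentrated oligo stock keeps that dilution small.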


III.d.2. Crystallization and data collection.

Crystals of the A3A(E72A/C171A)–DNA complex were grown by the hanging-drop vapour-diffusion method over a reservoir of 100 mM MOPS (pH 6.5), 50 mM MgCl2, 50 mM CaCl2, 23% polyethylene glycol 3,350 and 15% 2-methyl-2,4-pentanediol. Drops were formed by mixing 1 µl of A3A(E72A/C171A)–DNA solution (~20 mg ml⁻¹ protein) with 1 µl of reservoir solution and equilibrated over the reservoir at 20 °C. Microseeding was performed using a cat whisker, yielding larger crystals suitable for X-ray diffraction.

Crystals were flash-frozen directly in the cryogenic stream. Diffraction data were collected using an in-house MicroMax-007 HF X-ray source (Rigaku) with a copper anode at a wavelength of 1.54178 Å and a Saturn 944 HG detector (Rigaku). The crystals belonged to space group I222 with unit cell dimensions a = 56.6 Å, b = 72.7 Å, c = 115.0 Å (Table III.1). The collected intensities were indexed, integrated, corrected for absorption and scaled using HKL2000 (62).
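The reported cell parameters allow a quick Matthews-coefficient estimate of crystal packing. For an orthorhombic cell the volume is V = abc; space group I222 has eight asymmetric units per cell (four symmetry operators doubled by body centring); V_M = V/(n·M), and the solvent fraction is approximately 1 − 1.23/V_M. The number of molecules per asymmetric unit (one) and the mass (~23 kDa, protein only) are assumptions for illustration, not values from the text.

```python
from typing import Tuple


def orthorhombic_cell_volume(a: float, b: float, c: float) -> float:
    """Unit-cell volume (Å^3); all cell angles are 90° in I222."""
    return a * b * c


def matthews(volume_a3: float, n_asu: int, mass_da: float,
             n_mol_per_asu: int = 1) -> Tuple[float, float]:
    """Return (V_M in Å^3/Da, approximate solvent fraction).

    Solvent fraction = 1 - 1.23 / V_M (assumes protein density ~1.35 g/cm^3).
    """
    vm = volume_a3 / (n_asu * n_mol_per_asu * mass_da)
    return vm, 1.0 - 1.23 / vm


volume = orthorhombic_cell_volume(56.6, 72.7, 115.0)   # about 4.73e5 Å^3
# I222: 4 symmetry operators x 2 (body centring) = 8 asymmetric units per cell.
# Assumed mass: ~23 kDa protein only; including the bound oligo lowers V_M.
vm, solvent = matthews(volume, n_asu=8, mass_da=23_000.0)
print(f"V = {volume:.0f} Å^3, V_M = {vm:.2f} Å^3/Da, solvent ≈ {solvent:.0%}")
```

Values of V_M between roughly 1.7 and 3.5 Å³/Da are typical for protein crystals, so the estimate is a plausibility check on the assumed one-molecule asymmetric unit rather than a measurement.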


III.d.3. Structure determination.

The protein structure was solved by molecular replacement using a previously determined apo A3A(E72A/C171A) crystal structure (PDB code 4XXO) (26) as the search model in Phaser (63). Model building of the protein and bound DNA was performed manually in Coot (64), and refinement was carried out in Phenix (65, 66). A simulated-annealing omit map was calculated to confirm the ssDNA positioning (Figure III.10). The first nine residues and the side chains of residues R10, H11, H16, K30, N42, V46, K47, Q50, Q58, K60, L62, L63, F66, Y67, D177, E181 and N196 of A3A(E72A/C171A) were not modelled owing to a lack of electron density. Residues N42–T44 and L62–G65 were partially disordered; because of poor electron density, occupancies were set to 0.5 for residues N42–T44, L62 and G65, and to 0.75 for residues L63 and C64. An electron density adjacent to the active-site zinc ion was assigned as chloride on the basis of zinc-ligand statistics (67), resulting in a good fit without residual difference-density peaks (Figure III.11). The identification of the active-site zinc is further supported by the highest peak, 9.5σ, in an anomalous difference Fourier map at this position; a smaller peak of 5.6σ in this map is present at the assigned chloride position. The final model was refined to R(work)/R(free) values of 0.177/0.225 at 2.20 Å resolution (Table III.1). The quality of the final model was assessed with MolProbity (68), which indicated that 96.2% of the residues were in favored regions of the Ramachandran plot and that there were no Ramachandran outliers.
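The R(work)/R(free) statistics follow the standard crystallographic residual R = Σ| |Fobs| − k|Fcalc| | / Σ|Fobs|, evaluated separately on the working reflections and on a small held-out test set (commonly 5–10% of reflections) that is excluded from refinement. The stdlib sketch below only illustrates the definition: the toy amplitudes are hypothetical, and the single least-squares scale factor k stands in for the fuller scaling and bulk-solvent modelling that Phenix performs.

```python
from typing import List, Sequence, Tuple


def r_work_r_free(f_obs: Sequence[float], f_calc: Sequence[float],
                  is_free: Sequence[bool]) -> Tuple[float, float]:
    """R = sum| |Fo| - k|Fc| | / sum|Fo| for the working and test sets.

    The scale k is fitted by least squares on the working set only and then
    applied to both sets, so the test reflections play no role in fitting.
    """
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]
    k = sum(o * c for o, c in work) / sum(c * c for _, c in work)

    def residual(pairs: List[Tuple[float, float]]) -> float:
        return sum(abs(o - k * c) for o, c in pairs) / sum(o for o, _ in pairs)

    return residual(work), residual(test)


# Toy amplitudes (hypothetical); the last reflection is flagged as a test reflection.
print(r_work_r_free([10.0, 20.0, 30.0, 41.0], [5.0, 10.0, 15.0, 20.0],
                    [False, False, False, True]))
```

Keeping the test set out of the scale fit mirrors why R(free) is an unbiased check: a model overfitted to the working data lowers R(work) without a matching drop in R(free).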


Supplementary Fig. 5. Stereo view of the atomic interactions between rA3G-NTD and ssDNA (5K83). rA3G-NTD (gray ribbon) bound to ssDNA (orange sticks: the dT0 base, the backbone of dT1 and the sugar of dT0); magenta and red spheres are water and Zn, respectively.

Figure III.10. Simulated-annealing omit map confirms ssDNA positioning. A. 2Fo-Fc and Fo-Fc maps of the bound DNA. The A3A and DNA structures are represented as cartoon and stick models, respectively. The 2Fo-Fc map is indicated with a cyan-colored mesh (1.0σ), and the Fc-Fo map is depicted in blue (3.0σ) and red (-3.0σ). B. The simulated-annealing composite omit map of the bound DNA. The 2mFo-DFc map (pink) is contoured at 1.0σ.


Figure III.11. Modeling in the electron density next to the active-site zinc (Zn) with A. a chloride ion (Cl) versus B. a water molecule (W). The 2Fo-Fc map is indicated with a cyan-colored mesh (1.0σ), and the Fc-Fo map is depicted by meshes colored in blue (3.0σ) and red (-3.0σ).


III.d.4. Structure analysis.

Figures of structure models were generated with PyMOL (69), which was also used to model the catalytic E72 side chain in Figs 3 and 4. The electrostatic distribution of A3A(E72A/C171A) was calculated and visualized using the PDB2PQR server (70) and PyMOL with the APBS plugin, with the cysteine modelled as a thiolate anion (S⁻) and solutes excluded. Solvent-accessible and buried surface areas were calculated with PISA (71). The local root-mean-square deviation between the apo and DNA-bound forms of A3A(E72A/C171A) was calculated using Molmol (72). Distance difference matrices between the apo and DNA-bound forms of A3A(E72A/C171A) were calculated and visualized using a custom script written in Xcode for macOS (https://developer.apple.com/xcode/).
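The distance difference matrix itself was computed with a custom Xcode script that is not reproduced here. As an illustration of the idea only, the Python sketch below builds intramolecular Cα–Cα distance matrices for two conformations and subtracts them; because internal distances are unchanged by rigid-body motion, no superposition is required. The coordinate lists, their residue matching and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """All pairwise Euclidean distances (Å) between Cα atoms."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def distance_difference_matrix(apo: Sequence[Coord],
                               bound: Sequence[Coord]) -> List[List[float]]:
    """DDM[i][j] = d_ij(bound) - d_ij(apo); lists must be residue-matched."""
    if len(apo) != len(bound):
        raise ValueError("apo and bound must list the same residues in order")
    d_apo, d_bound = distance_matrix(apo), distance_matrix(bound)
    n = len(apo)
    return [[d_bound[i][j] - d_apo[i][j] for j in range(n)] for i in range(n)]


# Hypothetical three-residue example: the second residue moves 1 Å along x.
apo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
bound = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
for row in distance_difference_matrix(apo, bound):
    print(["%.2f" % v for v in row])
```

Positive entries mark residue pairs that move apart on DNA binding and negative entries pairs that move closer, which is what makes the matrix useful for spotting local rearrangements such as loop movements.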


III.e. References

1. Sawyer, S. L., Emerman, M. & Malik, H. S. Ancient adaptive evolution of the primate antiviral DNA-editing enzyme APOBEC3G. PLoS Biol. 2, E275 (2004).

2. Betts, L., Xiang, S., Short, S. A., Wolfenden, R. & Carter, C. W. Jr. Cytidine deaminase. The 2.3 Å crystal structure of an enzyme: transition-state analog complex. J. Mol. Biol. 235, 635–656 (1994).

3. Jarmuz, A. et al. An anthropoid-specific locus of orphan C to U RNA-editing enzymes on chromosome 22. Genomics 79, 285–296 (2002).

4. Wedekind, J. E., Dance, G. S., Sowden, M. P. & Smith, H. C. Messenger RNA editing in mammals: new members of the APOBEC family seeking roles in the family business. Trends Genet. 19, 207–216 (2003).

5. Conticello, S. G., Thomas, C. J., Petersen-Mahrt, S. K. & Neuberger, M. S. Evolution of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases. Mol. Biol. Evol. 22, 367–377 (2005).

6. LaRue, R. S. et al. Guidelines for naming nonprimate APOBEC3 genes and proteins. J. Virol. 83, 494–497 (2009).

7. Sheehy, A. M., Gaddis, N. C., Choi, J. D. & Malim, M. H. Isolation of a human gene that inhibits HIV-1 infection and is suppressed by the viral Vif protein. Nature 418, 646–650 (2002).

8. Sheehy, A. M., Gaddis, N. C. & Malim, M. H. The antiretroviral enzyme APOBEC3G is degraded by the proteasome in response to HIV-1 Vif. Nat. Med. 9, 1404–1407 (2003).

9. Malim, M. H. & Emerman, M. HIV-1 accessory proteins--ensuring viral survival in a hostile environment. Cell Host Microbe 3, 388–398 (2008).

10. Malim, M. H. APOBEC proteins and intrinsic resistance to HIV-1 infection. Philos. Trans. R. Soc. Lond. B Biol. Sci. 364, 675–687 (2009).

11. Koning, F. A., Goujon, C., Bauby, H. & Malim, M. H. Target cell-mediated editing of HIV-1 cDNA by APOBEC3 proteins in human macrophages. J. Virol. 85, 13448–13452 (2011).

12. Berger, G. et al. APOBEC3A is a specific inhibitor of the early phases of HIV-1 infection in myeloid cells. PLoS Pathog. 7, e1002221 (2011).


13. Harris, R. S., Hultquist, J. F. & Evans, D. T. The restriction factors of human immunodeficiency virus. J. Biol. Chem. 287, 40875–40883 (2012).

14. Chan, K. et al. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet. 47, 1067–1072 (2015).

15. Hoopes, J. I. et al. APOBEC3A and APOBEC3B preferentially deaminate the lagging strand template during DNA replication. Cell Rep. 14, 1273–1282 (2016).

16. Kazanov, M. D. et al. APOBEC-induced cancer mutations are uniquely enriched in early-replicating, gene-dense, and active chromatin regions. Cell Rep. 13, 1103–1109 (2015).

17. Roberts, S. A. et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat. Genet. 45, 970–976 (2013).

18. Burns, M. B. et al. APOBEC3B is an enzymatic source of mutation in breast cancer. Nature 494, 366–370 (2013).

19. Burns, M. B., Temiz, N. A. & Harris, R. S. Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat. Genet. 45, 977–983 (2013).

20. Sakofsky, C. J. et al. Break-induced replication is a source of mutation clusters underlying kataegis. Cell Rep. 7, 1640–1648 (2014).

21. Chen, K. M. et al. Structure of the DNA deaminase domain of the HIV-1 restriction factor APOBEC3G. Nature 452, 116–119 (2008).

22. Harjes, E. et al. An extended structure of the APOBEC3G catalytic domain suggests a unique holoenzyme model. J. Mol. Biol. 389, 819–832 (2009).

23. Shandilya, S. M. et al. Crystal structure of the APOBEC3G catalytic domain reveals potential oligomerization interfaces. Structure 18, 28–38 (2010).

24. Li, M. et al. First-in-class small molecule inhibitors of the single-strand DNA cytosine deaminase APOBEC3G. ACS Chem. Biol. 7, 506–517 (2012).

25. Bohn, M. F. et al. Crystal structure of the DNA cytosine deaminase APOBEC3F: the catalytically active and HIV-1 Vif-binding domain. Structure 21, 1042–1050 (2013).

26. Bohn, M. F. et al. The ssDNA mutator APOBEC3A is regulated by cooperative dimerization. Structure 23, 903–911 (2015).


27. Kouno, T. et al. Structure of the Vif-binding domain of the antiviral enzyme APOBEC3G. Nat. Struct. Mol. Biol. 22, 485–491 (2015).

28. Chelico, L., Pham, P., Calabrese, P. & Goodman, M. F. APOBEC3G DNA deaminase acts processively 3′ → 5′ on single-stranded DNA. Nat. Struct. Mol. Biol. 13, 392–399 (2006).

29. Holden, L. G. et al. Crystal structure of the anti-viral APOBEC3G catalytic domain and functional implications. Nature 456, 121–124 (2008).

30. Chelico, L., Sacho, E. J., Erie, D. A. & Goodman, M. F. A model for oligomeric regulation of APOBEC3G cytosine deaminase-dependent restriction of HIV. J. Biol. Chem. 283, 13780–13791 (2008).

31. Furukawa, A. et al. Structure, interaction and real-time monitoring of the enzymatic reaction of wild-type APOBEC3G. EMBO J. 28, 440–451 (2009).

32. Chelico, L., Prochnow, C., Erie, D. A., Chen, X. S. & Goodman, M. F. A structural model for deoxycytidine deamination mechanisms of the HIV-1 inactivation enzyme APOBEC3G. J. Biol. Chem. 285, 16195–16205 (2010).

33. Kitamura, S. et al. The APOBEC3C crystal structure and the interface for HIV-1 Vif binding. Nat. Struct. Mol. Biol. 19, 1005–1010 (2012).

34. Siu, K. K., Sultana, A., Azimi, F. C. & Lee, J. E. Structural determinants of HIV-1 Vif susceptibility and DNA binding in APOBEC3F. Nat. Commun. 4, 2593 (2013).

35. Byeon, I. J. et al. NMR structure of human restriction factor APOBEC3A reveals substrate binding and enzyme specificity. Nat. Commun. 4, 1890 (2013).

36. Mitra, M. et al. Structural determinants of human APOBEC3A enzymatic and nucleic acid binding properties. Nucleic Acids Res. 42, 1095–1110 (2014).

37. Lu, X. et al. Crystal structure of DNA cytidine deaminase ABOBEC3G catalytic deamination domain suggests a binding mode of full-length enzyme to single-stranded DNA. J. Biol. Chem. 290, 4010–4021 (2015).

38. Shi, K., Carpenter, M. A., Kurahashi, K., Harris, R. S. & Aihara, H. Crystal structure of the DNA deaminase APOBEC3B catalytic domain. J. Biol. Chem. 290, 28120–28130 (2015).

39. Nakashima, M. et al. Structural insights into HIV-1 Vif-APOBEC3F interaction. J. Virol. 90, 1034–1047 (2015).

40. Shaban, N. M., Shi, K., Li, M., Aihara, H. & Harris, R. S. 1.92 Angstrom zinc-free APOBEC3F catalytic domain crystal structure. J. Mol. Biol. 428, 2307–2316 (2016).

41. Byeon, I. J. et al. Nuclear magnetic resonance structure of the APOBEC3B catalytic domain: structural basis for substrate binding and DNA deaminase activity. Biochemistry 55, 2944–2959 (2016).

42. Xiao, X., Li, S. X., Yang, H. & Chen, X. S. Crystal structures of APOBEC3G N-domain alone and its complex with DNA. Nat. Commun. 7, 12193 (2016).

43. Shandilya, S. M., Bohn, M. F. & Schiffer, C. A. A computational analysis of the structural determinants of APOBEC3’s catalytic activity and vulnerability to HIV-1 Vif. Virology 471–473, 105–116 (2014).

44. Carlow, D. C., Short, S. A. & Wolfenden, R. Complementary truncations of a hydrogen bond to ribose involved in transition-state stabilization by cytidine deaminase. Biochemistry 37, 1199–1203 (1998).

45. Snider, M. J., Reinhardt, L., Wolfenden, R. & Cleland, W. W. ¹⁵N kinetic isotope effects on uncatalyzed and enzymatic deamination of cytidine. Biochemistry 41, 415–421 (2002).

46. Chen, H. et al. APOBEC3A is a potent inhibitor of adeno-associated virus and retrotransposons. Curr. Biol. 16, 480–485 (2006).

47. Chiu, Y. L. & Greene, W. C. The APOBEC3 cytidine deaminases: an innate defensive network opposing exogenous retroviruses and endogenous retroelements. Annu. Rev. Immunol. 26, 317–353 (2008).

48. Goila-Gaur, R. & Strebel, K. HIV-1 Vif, APOBEC, and intrinsic immunity. Retrovirology 5, 51 (2008).

49. Furukawa, A. et al. Quantitative analysis of location- and sequence-dependent deamination by APOBEC3G using real-time NMR spectroscopy. Angew. Chem. Int. Ed. 53, 2349–2352 (2014).

50. Carpenter, M. A. et al. Methylcytosine and normal cytosine deamination by the foreign DNA restriction enzyme APOBEC3A. J. Biol. Chem. 287, 34801–34808 (2012).


51. Pham, P., Landolph, A., Mendez, C., Li, N. & Goodman, M. F. A biochemical analysis linking APOBEC3A to disparate HIV-1 restriction and skin cancer. J. Biol. Chem. 288, 29294–29304 (2013).

52. Bulliard, Y. et al. Structure-function analyses point to a polynucleotide-accommodating groove essential for APOBEC3A restriction activities. J. Virol. 85, 1765–1776 (2011).

53. Sharma, S., Patnaik, S. K., Kemer, Z. & Baysal, B. E. Transient overexpression of exogenous APOBEC3A causes C-to-U RNA editing of thousands of genes. RNA Biol. (2016).

54. Sharma, S. et al. APOBEC3A cytidine deaminase induces RNA editing in monocytes and macrophages. Nat. Commun. 6, 6881 (2015).

55. Teh, A. H. et al. The 1.48 Å resolution crystal structure of the homotetrameric cytidine deaminase from mouse. Biochemistry 45, 7825–7833 (2006).

56. Marx, A. & Alian, A. The first crystal structure of a dTTP-bound deoxycytidylate deaminase validates and details the allosteric-inhibitor binding site. J. Biol. Chem. 290, 682–690 (2015).

57. Chung, S. J., Fromme, J. C. & Verdine, G. L. Structure of human cytidine deaminase bound to a potent inhibitor. J. Med. Chem. 48, 658–660 (2005).

58. Marx, A., Galilee, M. & Alian, A. Zinc enhancement of cytidine deaminase activity highlights a potential allosteric role of loop-3 in regulating APOBEC3 enzymes. Sci. Rep. 5, 18191 (2015).

59. Harjes, S. et al. Impact of H216 on the DNA binding and catalytic activities of the HIV restriction factor APOBEC3G. J. Virol. 87, 7008–7014 (2013).

60. Losey, H. C., Ruthenburg, A. J. & Verdine, G. L. Crystal structure of Staphylococcus aureus tRNA adenosine deaminase TadA in complex with RNA. Nat. Struct. Mol. Biol. 13, 153–159 (2006).

61. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).

62. Otwinowski, Z. & Minor, W. Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326 (1997).

63. Bunkoczi, G. et al. Phaser.MRage: automated molecular replacement. Acta Crystallogr. D Biol. Crystallogr. 69, 2276–2286 (2013).

64. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 60, 2126–2132 (2004).

65. Adams, P. D. et al. The Phenix software for automated determination of macromolecular structures. Methods 55, 94–106 (2011).

66. Echols, N. et al. Graphical tools for macromolecular crystallography in PHENIX. J. Appl. Crystallogr. 45, 581–586 (2012).

67. Laitaoja, M., Valjakka, J. & Janis, J. Zinc coordination spheres in protein structures. Inorg. Chem. 52, 10983–10991 (2013).

68. Chen, V. B. et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 66, 12–21 (2010).

69. Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 1.8 (2015).

70. Dolinsky, T. J., Nielsen, J. E., McCammon, J. A. & Baker, N. A. PDB2PQR: an automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Res. 32, W665–W667 (2004).

71. Krissinel, E. & Henrick, K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372, 774–797 (2007).

72. Koradi, R., Billeter, M. & Wüthrich, K. MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph. 14, 51–55 (1996).


Chapter IV

Substrate Sequence Selectivity of APOBEC3A Implicates Intra-DNA Interactions


IV.a. Abstract

The APOBEC3 (A3) family of human cytidine deaminases is renowned for providing a first line of defense against many exogenous and endogenous retroviruses. However, the ability of these proteins to deaminate deoxycytidines in ssDNA makes A3s a double-edged sword: when overexpressed, A3s can mutate endogenous genomic DNA, resulting in a variety of cancers. Although the sequence context for mutating DNA varies among A3s, the mechanism underlying substrate sequence specificity is not well understood. To characterize the substrate specificity of A3A, we used a systematic approach to quantify its affinity for substrate as a function of sequence context, length, secondary structure, and solution pH. We identified the A3A ssDNA binding motif as (T/C)TC(A/G), which correlated with enzymatic activity. We also validated that A3A binds RNA in a sequence-specific manner. A3A bound more tightly to the substrate binding motif within a hairpin loop than within a linear oligonucleotide, suggesting that A3A affinity is modulated by substrate structure. Based on these findings and previously published A3A–ssDNA co-crystal structures, we propose a new model in which intra-DNA interactions underlie the molecular mechanism of A3A sequence preference. Overall, the sequence and structural preferences identified for A3A lead to a new paradigm for identifying A3A’s involvement in the mutation of endogenous or exogenous DNA.


IV.b. Introduction

The APOBEC3 (short for “apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like”) family of human cytidine deaminases provides a first line of defense against many exogenous and endogenous retroviruses such as HIV-1 and the retroelement LINE-1 (1-6). APOBEC3 (A3) proteins restrict replication of retroviruses by inducing hypermutations in the viral genome (7). A3s deaminate deoxycytidines in ssDNA to deoxyuridines during reverse transcription. This results in G-to-A hypermutations, as adenosines are incorporated opposite uridines during second-strand DNA synthesis. While all A3 enzymes deaminate deoxycytidines in ssDNA, they have distinct, context-dependent substrate specificities, resulting in different mutation frequencies at individual deoxycytidines. Some A3s deaminate the second deoxycytidine in a CC sequence, while others deaminate deoxycytidine in a TC context (8-10).

However, not every cognate dinucleotide motif (CC or TC) in the ssDNA of the HIV genome is deaminated (11). Nevertheless, hypermutation of a viral genome results in defective proteins and proviruses, decreasing the probability of further viral replication (12).

Beyond restricting viral replication, the ability of A3s to deaminate deoxycytidines in ssDNA has made A3s a double-edged sword. When overexpressed, A3s can mutate the host genome, resulting in a variety of cancers. The identities and patterns of the mutations observed in cancer genomes can reveal the source of these mutations. Recently, the search for the deaminase(s) responsible for kataegic mutations found in breast cancer was narrowed down to APOBEC3B by comparing all known APOBEC mutational signatures and eliminating APOBEC3G and other deaminases as potential mutational contributors (9, 13). Soon after, APOBEC3B was found to be correlated with a variety of other cancers, such as ovarian, cervical, bladder, lung, and head and neck cancers; signature sequence analysis also contributed to these conclusions (14, 15). Most recently, APOBEC3H, which has a different sequence preference from APOBEC3B, has also been implicated in breast and lung cancer (16). Thus, defining A3 sequence specificity can help identify the roles of A3s in viral restriction and in cancer.

A3 signature sequences proposed for deaminating deoxycytidines range from dinucleotide to tetranucleotide motifs (8-11, 16-21). Although A3s are known to have varied sequence preferences, quantitative and systematic studies of sequence specificity are incomplete. Recently, crystal structures of APOBEC3A (A3A) and APOBEC3B-CTD (an active-site A3A chimera) bound to ssDNA have been solved (20, 22). However, despite these breakthrough structures, the molecular mechanism by which the nucleotides flanking the TC dinucleotide contribute to substrate sequence specificity remains unclear.

A3A is a single-domain enzyme with the highest catalytic activity among human APOBEC3 proteins (23) and a known restriction factor for the retroelement LINE-1 and HPV (24, 25). A3A can also contribute to carcinogenesis when its expression is increased or its regulation is defective (26). A3A is the only A3 for which both intact apo and substrate-bound structures have been determined (19, 20, 22, 27, 28). Initial substrate specificity studies using NMR chemical shift perturbation suggested a preference for DNA over RNA (19). Since A3A is the best biochemically characterized human A3 cytidine deaminase, and thus a critical benchmark within the family, we chose A3A to elucidate the extended characteristics of ssDNA specificity.

To determine the substrate specificity of A3A, we systematically quantified the affinity of A3A for nucleic acid substrates as a function of substrate sequence, length, secondary structure, and solution pH. We identified the preferred A3A ssDNA binding motif, (T/C)TC(A/G), and found that this sequence correlated with enzymatic activity. We also determined that A3A can bind RNA in a sequence-specific manner. Surprisingly, A3A’s signature sequence was necessary but not sufficient to account for A3A’s high affinity for ssDNA.

Significantly, A3A bound more tightly to the motif in longer oligonucleotides and in the context of a hairpin loop. Using recently published structures of A3A complexed with ssDNA from our lab and others, we propose a structural model for the molecular mechanism of this enhanced affinity, in which intra-DNA interactions contribute to A3A recognition of the cognate sequence. This model provides insights into how the nucleotides flanking the canonical TC sequence may contribute to the substrate sequence preference of A3A.
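The (T/C)TC(A/G) motif can be searched for directly in a DNA sequence. The sketch below is a minimal illustration, not the analysis used in this work: the function name is hypothetical, it scans only the strand given, and it returns the 0-based index of the target deoxycytidine (the C of the central TC). The motif cannot overlap itself, so a non-overlapping regex scan misses no sites.

```python
import re
from typing import List

# (T/C)TC(A/G): -2 pyrimidine, -1 T, target C, +1 purine.
A3A_MOTIF = re.compile(r"[TC]TC[AG]")


def a3a_target_positions(seq: str) -> List[int]:
    """0-based indices of the target C in (T/C)TC(A/G) sites (one strand only)."""
    return [m.start() + 2 for m in A3A_MOTIF.finditer(seq.upper())]


print(a3a_target_positions("AAATTCAAAA"))        # -> [5]  (TTCA site)
print(a3a_target_positions("TTTTTTTTCTTTTTT"))   # -> []   (TC followed by T)
```

A binary match like this ignores the graded preferences measured in the binding assays (for example, the weaker effect at -2 than at -1), so it flags candidate sites rather than predicting relative deamination rates.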


IV.c. Results

IV.c.1. A3A binding to ssDNA is context dependent.

To interrogate the substrate sequence preference of A3A, we systematically quantified the changes in binding affinity of catalytically inactive A3A bearing the mutation E72A for a library of labeled ssDNA sequences using a fluorescence anisotropy-based DNA binding assay (28). First, to ensure that the measured affinity was due entirely to the sequence of interest and not to nonspecific binding or undesired secondary-structure effects, an appropriate control background sequence was identified. The dissociation constants (Kd values) for homopolymeric 12-mer ssDNA sequences (Poly A, Poly T and Poly C) were determined (Figure IV.1A). Poly G was not tested owing to its propensity to form secondary structures. Poly T (750 ± 44 nM), which had previously been used as a background sequence (28), bound to A3A with approximately 2-fold higher affinity than Poly C (1,600 ± 117 nM). Thus, without a greater sequence context for A3A to target, Poly C was only weakly bound. A3A had the lowest affinity for Poly A, with a Kd of >11,000 nM (Table IV.1).

For all subsequent assays, Poly A was used as the background, as there is no detectable binding of A3A to Poly A.

The specificity of A3A for substrate versus product was measured by binding to Poly A with a single C versus Poly A with a single U (Figure IV.1B). Surprisingly, the presence of a single deoxycytidine in a Poly A background was not sufficient for binding with appreciable affinity. The affinity of A3A for Poly A-C (5A-1C-6A) (>5,000 nM) is similar to its affinity for Poly A-U (5A-1U-6A) (>6,500 nM) and even for the background Poly A. This is in contrast to A3A’s more than ten-fold specificity for a single C over U in a Poly T background (35 ± 2 nM and 500 ± 23 nM, respectively) (Figure IV.1C), as we previously measured (28). This strong context dependence in discriminating substrate C from product U in a Poly A versus a Poly T background indicates that A3A relies heavily on the identity of the surrounding nucleotide sequence to recognize and bind substrate deoxycytidine.

[Figure IV.1 plots: anisotropy (mA; normalized in panel A) versus A3A E72A concentration (nM). Panel B template AAAAANAAAAAA; panel C template TTTTTNTTTTTT.]

Figure IV.1. A3A specificity for ssDNA background and substrate. Fluorescence anisotropy of TAMRA-labeled ssDNA sequences binding to A3A(E72A). A. Binding of A3A to homopolynucleotides (12-mers): Poly A (blue), Poly T (red) and Poly C (green). B. Binding to Poly A (blue), 5A-C-6A (red) and 5A-U-6A (green). C. Binding to Poly T (blue), 5T-C-6T (red) and 5T-U-6T (green).

Table IV.1. A3A affinity for DNA sequences used in this analysis
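Fold differences in Kd can also be expressed as a binding free-energy difference, ΔΔG = RT ln(Kd,weak/Kd,tight). The sketch below applies this relation to the Poly T values quoted above (35 nM for a single C versus 500 nM for a single U); the assay temperature is not given here, so 25 °C is an assumption, and the function name is hypothetical.

```python
import math

R_KCAL = 8.314462618 / 4184.0  # gas constant in kcal mol^-1 K^-1


def ddg_kcal(kd_weak_nm: float, kd_tight_nm: float, temp_c: float = 25.0) -> float:
    """ΔΔG = RT ln(Kd_weak / Kd_tight): binding penalty of the weaker ligand."""
    return R_KCAL * (temp_c + 273.15) * math.log(kd_weak_nm / kd_tight_nm)


# Poly T background: single C (35 nM) versus single U (500 nM); 25 °C assumed.
fold = 500.0 / 35.0
print(f"{fold:.1f}-fold, ΔΔG ≈ {ddg_kcal(500.0, 35.0):.2f} kcal/mol")
```

Kd values reported only as lower bounds (for example, >5,000 nM) yield only lower bounds on ΔΔG, so comparisons among the weakest binders are not meaningful in this form.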


IV.c.2. A3A affinity for ssDNA is pH dependent.

A systematic measurement of A3A affinity over a broad range of pH values was performed to verify and quantify the pH dependence of A3A binding to substrate ssDNA (21, 26) and to set a reference pH for subsequent experiments. The Kd of A3A for TTC in a Poly A background was determined at pH values from 4.0 to 9.0 in 0.5-unit increments (Figure IV.2 and Table IV.2). A3A had the highest affinity for Poly A-TTC at pH 5.5, with a Kd of 68 ± 3 nM. The isotherms for A3A binding ssDNA at pH values below 6.0 show a secondary binding event that may be due to non-specific binding or aggregation (Figure IV.2A). A steady decrease in the affinity of A3A for ssDNA was also observed as the pH was increased above 6 (Figure IV.2B), in agreement with the decreased deamination activity at higher pH (26). Overall, A3A affinity also correlated with reported deamination activity determined using a different assay at pH 7.5 (30). Interestingly, A3A had no appreciable affinity for Poly A-TTC above pH 8.0. Since A3A is stable at these higher pH values, the lower affinity for ssDNA at increased pH is likely due not to aggregation but to the protonation state of His29, as previously described (26); His29 has been reported to coordinate ssDNA (31). Therefore, all subsequent binding experiments were performed at pH 6.0 to avoid any potential secondary binding events or aggregation of the protein.
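The argument that the protonation state of His29 governs binding across this pH range can be made quantitative with the Henderson–Hasselbalch relation, f_protonated = 1/(1 + 10^(pH − pKa)). The His29 pKa is not reported here; the sketch assumes pKa = 6.0, close to that of a free histidine side chain, purely for illustration, and ignores any shift caused by the local protein environment or bound DNA.

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Henderson–Hasselbalch: fraction of the protonated (imidazolium) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa = 6.0 is an assumed, free-histidine-like value; His29's pKa is not given here.
for ph in (5.5, 6.0, 7.0, 7.5, 8.0):
    print(ph, round(fraction_protonated(ph), 3))
```

Under this assumption the protonated fraction falls from about 76% at pH 5.5 to about 1% at pH 8.0, a trend qualitatively compatible with the loss of affinity above pH 8.0, though the true pKa of His29 would need to be measured.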


Figure IV.2. A3A affinity for ssDNA at different pH values. Fluorescence anisotropy of TAMRA-labeled ssDNA 4A-TTC-6A binding to A3A(E72A). A. Binding of A3A to ssDNA at pH 6.0 (blue), 6.5 (red), 7.0 (green), 7.5 (orange), 8.0 (purple), 8.5 (black) and 9.0 (brown). B. Binding of A3A to ssDNA at pH 4.0 (blue), 4.5 (red), 5.0 (green), 5.5 (orange) and 6.0 (purple).


Table IV.2. A3A affinity for ssDNA Poly A-TTC across a range of pH values


IV.c.3. Substrate recognition is dependent on thymidine directly upstream of target deoxycytidine, with preference for pyrimidines over purines.

To study the effect of nucleotide identity at position -1 relative to the target deoxycytidine (NC) on A3A affinity for substrate (Figure IV.3A), the Kd values of A3A for (4A)-NC-(6A) with N = T, A, C or G in a Poly A background were determined. A3A preferred TC (143 ± 4 nM), followed by CC (250 ± 14 nM). Interestingly, AC and GC both bound A3A very weakly (>5,000 and >6,500 nM, respectively), confirming a preference for pyrimidines (T or C) over purines (A or G) at the -1 position, with T as the strongest binder.
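Affinity differences of this kind are often easier to compare as differences in binding free energy, ΔΔG = RT ln(Kd / Kd,ref), where a positive value indicates weaker binding than the reference and the concentration unit cancels in the ratio. The sketch below is an illustrative calculation, not part of the original analysis; it assumes a temperature of 298 K (room temperature), and for AC and GC, whose Kd values are only lower bounds, the result is likewise only a lower bound.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3
# ASSUMPTION: room temperature (25 degrees C) for the overnight incubations.
T_ROOM_K = 298.15


def ddg_kcal(kd_ref_nm: float, kd_nm: float, temp_k: float = T_ROOM_K) -> float:
    """Binding free-energy difference RT*ln(Kd / Kd_ref), in kcal/mol.

    Positive values mean weaker binding than the reference. Units cancel in
    the ratio, so both Kd values only need to share the same unit.
    """
    return R_KCAL * temp_k * math.log(kd_nm / kd_ref_nm)


if __name__ == "__main__":
    kd_tc = 143.0  # nM, TC used as the reference
    others = [("CC", 250.0), ("AC (lower bound)", 5000.0), ("GC (lower bound)", 6500.0)]
    for label, kd in others:
        print(f"{label}: ddG = {ddg_kcal(kd_tc, kd):+.2f} kcal/mol")
```

With these assumptions CC binds roughly 0.3 kcal/mol more weakly than TC, while the purine-containing AC and GC are at least about 2 kcal/mol weaker.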

The effect of the sequence surrounding the cognate dinucleotide deamination motif (TC) on A3A affinity for ssDNA was determined by first testing the change in affinity for all nucleotide substitutions at the -2 position, (3A)-NTC-(6A). A3A prefers a pyrimidine over a purine at the -2 position (Figure IV.3B): TTC and CTC had similar affinities (90 ± 1 nM and 85 ± 1 nM, respectively), both higher than those of the purine-containing ATC and GTC (145 ± 2 nM and 150 ± 3 nM, respectively). Although weaker than at the -1 position, there is thus a preference for the smaller pyrimidines at position -2. Next, the effect of the +1 position on A3A affinity for TC was determined (Figure IV.3C). A3A did not strongly prefer any particular nucleotide at the +1 position, although it disfavored T (209 ± 5 nM versus 145 ± 2 nM for the Poly A background).


Figure IV.3. A3A specificity for nucleotides flanking the substrate cytidine. Fluorescence anisotropy of TAMRA-labeled ssDNA sequences binding to A3A(E72A). A. Binding of A3A to ssDNA with changes at the -1 position of substrate C, and TU (purple), in a Poly A background (12-mers): 4A-AC-6A (blue), 4A-TC-6A (red), 4A-CC-6A (green), and 4A-GC-6A (orange). B. Binding of A3A to ssDNA with changes at the -2 position in a TC context in a Poly A background (12-mers): 4A-ATC-6A (blue), 4A-TTC-6A (red), 4A-CTC-6A (green), and 4A-GTC-6A (orange). C. Binding of A3A to ssDNA with changes at the +1 position in a TC context in a Poly A background (12-mers): 4A-TCA-6A (blue), 4A-TCT-6A (red), 4A-TCC-6A (green), and 4A-TCG-6A (orange). D. Three substrate sequences, TTCA (green), ATCG (red) and ATCA (blue), shown as closed circles, with the three corresponding product sequences TTUA, ATUG and ATUA shown as open circles.


Figure IV.4. A3A specificity for poly A xTCx. Binding affinity of A3A(E72A) to TAMRA-labeled ssDNA sequences in a Poly A background. Gray boxes bin sequences by -2 nucleotide identity. Colors represent +1 nucleotide identity: A (blue), T (red), C (green), G (orange). Consensus sequence derived from these Kd values is shown above the graph.


Finally, to identify any interdependency between nucleotide identity at the -2 and +1 positions, the affinity of A3A for (3A)-NTCN-(5A) was determined (Figure IV.4, Table IV.1). A3A preferred pyrimidines at the -2 position regardless of the nucleotide at +1, and disfavored T at the +1 position regardless of the nucleotide at -2. Most interestingly, A3A preferred a pyrimidine at -2 when there was a purine at +1. However, the reverse was not true: a purine at -2 combined with a pyrimidine at +1 did not yield comparable affinities. In fact, the weakest binders (ATCT and GTCT) were those with a purine at -2 and a pyrimidine at +1. Thus, the sequences broadly fall into three classes of binders, high affinity (80–130 nM), medium affinity (150–165 nM), and weak affinity (210–220 nM), and (T/C)TC(A/G) emerges as the preferred sequence for ssDNA recognition by A3A.
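The classification above can be expressed as a small lookup, shown here as a hypothetical Python helper. It only encodes the approximate Kd ranges and the (T/C)TC(A/G) consensus stated in the text; the function names are illustrative, and because the text gives ranges rather than cutoffs, values falling between the ranges are deliberately left unassigned.

```python
import re

# Consensus (T/C)TC(A/G) for the window -2, -1, 0 (target C), +1.
CONSENSUS = re.compile(r"[TC]TC[AG]")

# Approximate Kd ranges (nM) quoted in the text for each class. Values that
# fall between these ranges are left unassigned because no cutoffs were given.
KD_CLASSES = [("high", 80.0, 130.0), ("medium", 150.0, 165.0), ("weak", 210.0, 220.0)]


def affinity_class(kd_nm: float) -> str:
    """Return 'high', 'medium', 'weak', or 'unassigned' for a Kd in nM."""
    for name, lo, hi in KD_CLASSES:
        if lo <= kd_nm <= hi:
            return name
    return "unassigned"


def matches_consensus(window: str) -> bool:
    """True if a 4-nt window spanning positions -2..+1 fits (T/C)TC(A/G)."""
    return CONSENSUS.fullmatch(window.upper()) is not None


if __name__ == "__main__":
    for motif in ("TTCA", "CTCG", "ATCA", "ATCT"):
        print(motif, matches_consensus(motif))
```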


IV.c.4. A3A preference for binding to substrate over product is context dependent.

A3A’s affinity for substrate C was compared to product U in the context of variations of the signature A3A substrate sequence (T/C)TC(A/G). The affinities of three substrate sequences, TTCA, ATCG and ATCA, were compared to those of the corresponding product sequences (Figure IV.3D). For all three, binding affinity was substantially lower for the corresponding product sequences TTUA, ATUG and ATUA, with the largest loss for ATUA. Thus, the decrease in affinity for product relative to substrate was context dependent.


IV.c.5. Positive correlation between sequence preference of binding and enzymatic activity.

Although enzymatic activity and binding affinity are not expected to be directly correlated, the trends in specificity would likely be similar. Thus, A3A’s deamination activity was determined for variations of the signature sequence (T/C)TC(A/G) using a ¹H NMR-based A3 deaminase activity assay. High-affinity (TTCA and TTCG), medium-affinity (ATCA, ATCG, GTCA, GTCG, TTCT) and weak-affinity (ATCT and GTCT) sequences were tested (Table IV.3) to determine the correlation between binding and activity. Overall, activity measured by NMR follows the same trend as affinity from the binding assay (Figure IV.5; R² = 0.61). This indicates that, in general, substrate sequences with high, medium and weak binding affinity are also processed in the same order.
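The agreement summarized in Figure IV.5 (R² = 0.61) is presumably the coefficient of determination of a linear fit of activity against binding free energy. For a single predictor this equals the squared Pearson correlation, which can be computed as in the minimal sketch below (standard library only; the helper name is hypothetical). Because R² is unchanged by a linear rescaling of the x values, it does not depend on the sign convention or concentration unit used when converting Kd into ΔG.

```python
from statistics import fmean


def r_squared(xs, ys):
    """Coefficient of determination of an ordinary least-squares line of ys on xs.

    For simple linear regression this equals the squared Pearson correlation.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired values")
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

The individual data points of Figure IV.5 are not tabulated in this chapter, so none are reproduced here; the function would be called with the nine ΔG values and the matching activities.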


Table IV.3. A3A enzyme activity for DNA sequences

[Figure IV.5 plot: activity (reactions/min) versus ΔG of binding (kcal/mol) for high-, medium- and low-affinity sequences; R² = 0.61.]

Figure IV.5. Binding affinity versus enzyme activity. Activity of active A3A measured by the NMR-based deamination assay, plotted against the free energy of binding (∆G = −RT ln Kd) calculated from the binding affinities of nine 12-mers. These nine comprise 2 high-affinity (green), 5 medium-affinity (orange) and 2 weak-affinity (red) sequences.


IV.c.6. Structural basis for A3A specificity for binding to preferred recognition sequence.

To determine the structural basis for the A3A consensus sequence (T/C)TC(A/G), crystal structures of A3A bound to ssDNA recently determined by our group and others (PDB ID: 5KEG and 5SWW) were analyzed (20, 22). In these structures the target deoxycytidine is well coordinated and buried within the active site of A3A (Figure IV.6A). The thymidine at position -1 makes extensive contacts with loop 7 (Y130, D131 and Y132) and van der Waals contacts with loop 5 (W98) (Figure IV.6B). The Watson-Crick edge of the thymidine base faces the loop 7 residues and makes three hydrogen bonds: one with the backbone nitrogen of Y132 and two, one of them water mediated, with the D131 side chain. The D131 side chain further forms a salt bridge with R189, which stabilizes the overall hydrogen-bonding configuration of loop 7 with the thymine base. This coordination appears critical, as residue 189 is conserved as a basic residue (Arg/Lys) in catalytically active A3 domains. It also explains why thymidine is favored at the -1 position: if the -1 position is modeled as a cytidine, the N3 atom lacks the proton needed to hydrogen bond with D131 (Figure IV.6C), so a cytidine would be less well coordinated and therefore less preferred. Residues Y130 and D131 in loop 7 would physically preclude a larger purine base from fitting in this position (as modeled in Figure IV.6D). Thus, the T specificity at the -1 position is consistent with the crystal structures.


Figure IV.6. A3A recognition of substrate cytidine and pyrimidines at -1. Crystal structure of A3A(E72A/C171A) shown in surface view (gray) bound to Poly T-1C ssDNA represented as sticks (PDB ID: 5KEG). A. Substrate cytidine (orange sticks) is buried in the active site of A3A. Residues interacting with the cytidine are shown as green sticks. B. The -1 nucleotide thymidine (orange sticks) surrounded by Y130, D131 and Y132 of loop 7 (light blue sticks), W98 of loop 5 (pink sticks), and R189 (green sticks). C. Cytidine modeled into the -1 position (orange sticks). The N3 atom lacks a proton to hydrogen bond with D131, indicated with a red X. D. Adenosine modeled into the -1 position (orange sticks) shows severe van der Waals clashes if occupying the same site as the pyrimidines. Other nucleotides are shown as orange sticks. Hydrogen bonds and salt bridges are shown as black dashed lines. Water is shown as red spheres. Nitrogen and oxygen atoms of residues and nucleic acids are in blue and red, respectively.


Although A3A prefers (T/C)TC(A/G), neither of the co-crystal structures has the optimal nucleotide identity at both the -2 and +1 positions (20, 22). Specificity for a pyrimidine at the -2 position was therefore not evident in the available A3A–ssDNA structures. For instance, even though the 5KEG structure contains a preferred pyrimidine at the -2 position, this thymidine is disordered in the complex.

However, in both structures the base at +1 (a pyrimidine, T, in 5KEG and a purine, G, in 5SWW) stacks with the critical His 29 (Figure IV.7A & B) (20, 22). This type of histidine π-π stacking can occur with either a purine or a pyrimidine. However, at pH 6 protonated histidine prefers to stack with a purine base over a pyrimidine, with thymidine stacking being the least preferred (32). Thus, the base-stacking potential with protonated His 29 provides a strong rationale for the preference for purines, and the disfavoring of thymidine, at the +1 position relative to the substrate deoxycytidine observed in our biochemical assays (Figure IV.4).


Figure IV.7. ssDNA is bent within the complex with A3A. Crystal structure of A3A shown in surface and cartoon representation (gray) bound to ssDNA displayed as orange sticks. A. +1 thymidine (light blue) interacting with His 29 (light green sticks) through aromatic stacking (PDB ID: 5KEG). B. +1 guanine (light blue) also interacting with His 29 (light green sticks) through aromatic stacking (PDB ID: 5SWW). C. A3A(E72A/C171A) with TTTTTTTTCTTTTTT (PDB ID: 5KEG). D. A3A(E72A) with AAAAAAATCGGGAAA (PDB ID: 5SWW). Other nucleotides are shown as orange sticks, while water (red), zinc (blue), and chloride (gray) in the active site are shown as spheres. Nitrogen and oxygen atoms of residues and nucleic acids are in blue and red, respectively. E. Schematic of hydrogen bonding between a pyrimidine (pink) at -2 and a purine (light blue) at +1, enabled by bending of the DNA upon A3A binding. F. Model of intra-DNA base interactions upon binding of A3A to ssDNA. The A3A(E72A)–ssDNA complex (PDB ID: 5SWW) was used to model the A3A signature sequence CTCG bound at the active site. A3A is shown as gray surface and cartoon, His 29 as light green sticks, and the original ssDNA as orange sticks with +1G in light blue. The adenosine at the -2 position was switched to cytosine (pink), with hydrogen bonds to +1G displayed as yellow dashes.


IV.c.7. A3A bends ssDNA to potentially allow for intra-DNA interaction between -2 and +1 nucleotides.

A common feature between the two A3A–ssDNA complex structures is that the ssDNA forms a “U” shape in the active site (Figure IV.7C & D) (20, 22).

This U shape of the bound polynucleotide may be conserved among deaminases, including adenosine deaminases (20, 33). In both A3A-ssDNA structures, the U shape of the ssDNA orients the -2 and +1 bases in close proximity to each other. Thus, we hypothesized that the observed sequence preference (Figure IV.4) for the -2 position is a result of intra-DNA interactions rather than specific interactions with the protein.

To determine the potential for intra-DNA interactions when A3A is bound to a (T/C)TC(A/G) signature sequence, molecular models were built based on the crystal structures of A3A bound to ssDNA (PDB ID: 5KEG and 5SWW) (20, 22). In these models the bases of the -2 and +1 nucleotides are oriented so that they form hydrogen bonds, with the larger purine at the +1 position stacking on His 29 and the smaller pyrimidine at -2 coordinating the +1 base (Figure IV.7E and 7F). Reversing the nucleotides at the +1 and -2 positions would not fit nearly as well, which could explain the lower affinity for purine-TC-pyrimidine sequences. Thus, the structural model explains the preference for (T/C)TC(A/G) and suggests that stabilizing these intra-DNA interactions may further increase affinity.


IV.c.8. Length of ssDNA affects affinity of A3A for substrate sequence.

If bending of the ssDNA is important for substrate recognition, binding affinity may be expected to depend on substrate length. To determine whether DNA beyond the four-nucleotide signature sequence contributes to binding, the length of the ssDNA containing the recognition sequence in Poly A-TTC (AAA TTCA AAA AAA) was varied. A competition assay with oligonucleotides of different lengths was performed to test the effect of ssDNA length on affinity for substrate (Figure IV.8). Length was varied from a single nucleotide flanking one end of TTCA (TTCAA and ATTCA) up to three nucleotides flanking either end, adding one nucleotide at a time. Surprisingly, a single nucleotide flanking the TTCA signature sequence was not enough to permit binding (Figure IV.8A), and even three nucleotides on either side did not restore A3A binding to the affinity observed for Poly A-TTC (AAA TTCA AAA AAA) (Figure IV.8B). Thus, sequence beyond the recognition motif affects binding affinity, with longer sequences preferred even though the additional nucleotides are not expected to make direct contacts with A3A, consistent with the model that intra-DNA interactions modulate A3A affinity.


Figure IV.8. A3A affinity for ssDNA of varied lengths. Fluorescence anisotropy of TAMRA-labeled ssDNA 3A-TTCA-6A binding to A3A(E72A) in competition with unlabeled ssDNA of different lengths. A. Binding to labeled ssDNA of A3A preincubated with unlabeled 3A-TTCA-6A (red), 1A-TTCA (blue), or TTCA-1A (green). B. Binding to labeled ssDNA of A3A preincubated with unlabeled 3A-TTCA-6A (red), 2A-TTCA-3A (blue), 3A-TTCA-2A (green), 2A-TTCA-2A (blue), 1A-TTCA-2A (purple), 2A-TTCA-1A (black), or 1A-TTCA-1A (gray).


IV.c.9. A3A prefers binding to target sequence in the loop of structured hairpins.

Another implication of this model is that pre-bent DNA could be a better substrate for A3A binding, as A3A would not have to pay the entropic cost of bending the DNA. Such bending could be achieved either by the intra-DNA interactions modeled in Figure IV.7F or by placing the target within the loop of a hairpin. To determine the significance of a bent, U-shaped DNA structure in the mechanism of A3 binding, we tested A3A affinity for a target deoxycytidine in the loop region of a DNA hairpin. The hairpin sequence was based on a previously identified potential RNA substrate of A3A from succinate dehydrogenase complex iron sulfur subunit B (SDHB) (34). The affinity for TTC in the loop region of hairpin DNA was higher than that in linear DNA (26 nM vs 90–127 nM, respectively). As expected, A3A had a higher affinity for the DNA hairpin whose loop region contains TTC than for one with AAA (26 nM vs ~676 nM, respectively) (Figure IV.9A). Interestingly, the Kd for the hairpin (26 nM) is comparable to that for a single C in a Poly T background (35 nM) (28). This may imply that Poly T DNA adopts a hairpin structure in solution, as has been reported (35).

A3A affinity for a target cytidine in the loop region of an RNA hairpin was also tested, comparing the exact SDHB hairpin RNA sequence, with UC in the loop, to a modified SDHB hairpin RNA in which AUC was replaced with AAA. A3A showed specific affinity for the hairpin RNA containing UC compared to AA (37 nM vs 202 nM, respectively) (Figure IV.9B). In contrast to what has been previously proposed (19), we found that A3A binds RNA with high affinity and specificity. Furthermore, A3A has a higher affinity for AUC in the loop region of a hairpin than for UUC in a linear sequence (Figure IV.10). The linear RNA containing the potential UUC substrate sequence showed no measurable binding, comparable to linear RNA without a potential substrate sequence. Overall, A3A has higher affinity for a target sequence within a pre-ordered loop region than in linear DNA, and specific affinity for RNA hairpins containing a substrate site.


Figure IV.9. A3A specificity for substrate in the loop region of stem-loop nucleic acids. Fluorescence anisotropy of TAMRA-labeled hairpin DNA and RNA binding to A3A(E72A). A. Binding of A3A to a DNA version of the SDHB hairpin RNA containing TTC (dark blue) or AAA (light blue) in the loop region. B. Binding of A3A to the SDHB hairpin RNA (dark orange) and to the same RNA sequence with the UC replaced by AA in the loop region of the hairpin (light orange).


Figure IV.10. A3A affinity for ssRNA. Fluorescence anisotropy of TAMRA-labeled ssRNA sequences binding to A3A(E72A): Poly A ssRNA containing UUCA (blue) and Poly A ssRNA (red).


IV.d. Discussion

A3A is a single-domain enzyme with the highest catalytic activity among the human APOBEC3 proteins (23); it is a known restriction factor (24, 25) and also likely contributes to carcinogenesis (26). In this study we quantified the ssDNA specificity of A3A and identified its consensus signature sequence as (T/C)TC(A/G). The TC dinucleotide preference of A3A, previously found through activity assays (10, 20, 21), was confirmed and expanded to a preference for pyrimidine-TC-purine. Surprisingly, context matters: the background nucleotide sequence impacts binding affinity, with essentially no binding observed for Poly A 1C (Figure IV.1B), while Poly T 1C binds with 35 ± 2 nM affinity (28). Furthermore, the length of the ssDNA in which (T/C)TC(A/G) is embedded also modulates affinity (Figure IV.8). Structural analysis of the two A3A–ssDNA complexes, which contain two distinct but suboptimal ssDNA sequences, led us to develop a model in which intra-DNA interactions underlie A3A’s specificity for ssDNA. In contrast to previous results (27) implicating the -2 position as defining specificity, the base at this position observed in both A3A–ssDNA co-crystal structures does not make any specific interactions with the protein. Rather, the hydrogen-bonding edge of the -2 base is in close proximity to the corresponding edge of the +1 base, suggesting that intra-DNA interactions may determine the preference. Our molecular modeling indicated that such interactions could stabilize the U-shaped DNA conformation within the A3A active site, explaining the -2 position specificity.

We found that A3A binds RNA in a highly specific, structural context-dependent manner. Previous reports (19) suggested that A3A bound RNA only weakly and did not deaminate it. However, the potential substrate sequence in that work was designed to lack secondary structure, which, in light of our results on hairpin versus linear RNAs, may have inadvertently precluded RNA deamination. Recently, A3G and A3A were implicated in deaminating RNA within proposed RNA hairpins in whole-cell lysates, but the specificity was not quantified (34, 36). Intriguingly, our data show that A3A binds RNA hairpins with affinity similar to that for DNA hairpins (37 nM versus 26 nM), which suggests that the RNA-editing activity of A3A might be more prevalent than previously anticipated. Future experiments will be needed to determine whether A3A’s catalytic efficiency is similar for DNA and RNA hairpins.

The comprehensive identification of A3A signature sequences and of its preference for loop structures will enable a more accurate evaluation of A3 activity based on sequence analysis. Previous studies used only a single A3 signature sequence to implicate A3s in viral restriction or cancer progression. In contrast, our study suggests that evidence of A3 activity would be more accurately determined using a set of sequences. In the case of A3A, we identified four nearly equivalent substrate signature sequences, TTCA, TTCG, CTCA, and CTCG, which should be used to identify A3A’s involvement in mutagenesis. We also found a positive correlation between A3A’s sequence preference for binding and its enzymatic activity. This correlation not only supports the use of a DNA binding assay with inactive enzyme as a reliable method for studying the specificity of A3s, but also suggests that affinity for substrate is a driving factor for catalysis. Thus, factors that enhance or perturb binding, such as pH or nucleic acid structure, would be expected to modulate deamination activity.
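To illustrate scoring a sequence with the full set of A3A signature sequences rather than a single motif, a minimal scan is sketched below. It is a hypothetical helper under stated assumptions (plain regular-expression matching, no weighting by loop propensity, flanking length or per-motif affinity), not a validated mutational-signature tool. When both strands are requested, the same motif on the complementary strand is found as its reverse complement, (T/C)GA(A/G), on the given strand.

```python
import re

# (T/C)TC(A/G) read 5'->3' on the given strand; the target C is at offset 2.
PLUS_MOTIF = re.compile(r"[TC]TC[AG]")
# Reverse complement of the motif, (T/C)GA(A/G), as seen on the given strand
# when the target C lies on the complementary strand; that C pairs with the
# G at offset 1.
MINUS_MOTIF = re.compile(r"[TC]GA[AG]")


def find_a3a_sites(seq: str, both_strands: bool = False):
    """Return sorted (position, strand, motif) tuples.

    position is the 0-based index, on the given strand, of the target C
    (strand '+') or of the G paired with the target C (strand '-').
    """
    seq = seq.upper()
    hits = [(m.start() + 2, "+", m.group()) for m in PLUS_MOTIF.finditer(seq)]
    if both_strands:
        hits += [(m.start() + 1, "-", m.group()) for m in MINUS_MOTIF.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    print(find_a3a_sites("AATTCAGCTCGA"))  # [(4, '+', 'TTCA'), (9, '+', 'CTCG')]
```

Non-overlapping matching is sufficient here because two (T/C)TC(A/G) windows cannot overlap. Loop propensity, which the text argues should also weight each site, would be an additional score applied to each hit.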

In addition to using the full set of A3A signature sequences, estimates of the probability of mutagenesis should not be based solely on nucleotide sequence but should also be weighted by the propensity of the target sequence to lie within a structured loop. Secondary structure prediction software could be used to identify the consensus sequence in loop regions of structured DNA or RNA. The A3A signature sequences we identified, (T/C)TC(A/G), not only account for discrepancies among the A3A target sequences reported in the literature, such as TTCA versus CTCG (20, 21), but also lead us to advocate a new paradigm for identifying A3A’s involvement in mutation of endogenous or exogenous DNA.

Designing inhibitors or activators of A3s has been extremely challenging. Our results indicate a need to incorporate the structural context of the target deoxycytidine into therapeutic design. Larger U-shaped macrocycles, which would mimic the U shape of the bound ssDNA, may serve as more appropriate starting scaffolds for designing cancer therapies targeting A3s. Macrocycles have recently been shown to have good drug-like properties and may offer a strategy to target these critical enzymes (37).


IV.e. Methods

IV.e.1. Cloning of APOBEC3A E72A overexpression construct.

The pColdII His-6-SUMO-A3A(E72A) construct was built by first cloning the SUMO gene from pOPINS His-6-SUMO into the pColdII His-6 vector (Takara Biosciences) using the NdeI and KpnI restriction sites. The human APOBEC3A coding sequence from pColdIII GST-A3A(E72A, C171A) was then cloned into the pColdII His-6-SUMO vector using KpnI and HindIII. The C171A mutation was reverted to the wild-type residue by site-directed mutagenesis, resulting in the catalytically inactive pColdII His-6-SUMO-APOBEC3A(E72A) overexpression construct used for all binding experiments in this study.


IV.e.2. Expression and purification of APOBEC3A E72A.

Escherichia coli BL21 DE3 Star (Stratagene) cells were transformed with the pColdII His-6-SUMO-APOBEC3A(E72A) vector described above. The E72A mutation was chosen to render the protein catalytically inactive. Expression was carried out at 16 °C for 22 hours in lysogeny broth containing 0.5 mM IPTG and 100 µg/mL ampicillin. Cells were pelleted, resuspended in purification buffer (50 mM Tris-HCl [pH 7.4], 300 mM NaCl, 1 mM DTT) and lysed with a cell disruptor. Cellular debris was removed by centrifugation (45,000 × g, 30 min, 4 °C). The fusion protein was purified using HisPur Ni-NTA resin (Thermo Scientific). The His6-SUMO tag was removed by Ulp1 protease digestion overnight at 4 °C. Untagged A3A(E72A) was separated from the tag and Ulp1 protease using HisPur Ni-NTA resin. Size-exclusion chromatography on a HiLoad 16/60 Superdex 75 column (GE Healthcare) was used as the final purification step. Purified recombinant A3A was confirmed to be free of nucleic acid before binding experiments by its OD260/280 ratio of 0.54.


IV.e.3. Oligo source and preparation.

Labeled and unlabeled oligonucleotides used in this assay were obtained from Integrated DNA Technologies (IDT). Labeled oligonucleotides used in the fluorescence anisotropy-based binding assay carry a TAMRA fluorophore at their 5′ end and were resuspended in ultra-pure water at a concentration of 20 µM. Unlabeled oligonucleotides used for the competition assays were resuspended in ultra-pure water to a concentration of 4 mM.


IV.e.4. Fluorescence anisotropy based DNA binding assay.

The fluorescence anisotropy-based DNA binding assay was performed as described (28) with minor alterations. 5′-TAMRA-labeled oligonucleotide at a fixed concentration of 10 nM was added to A3A(E72A) in 50 mM MES buffer (pH 6.0), 100 mM NaCl, 0.5 mM TCEP, in a total reaction volume of 150 µL per well in nonbinding 96-well plates (Greiner). The binding assay with APOBEC3B-CTD E255A was performed in 50 mM Tris buffer (pH 7.4), 100 mM NaCl, 0.5 mM TCEP. The APOBEC3 concentration was varied, with each concentration measured in triplicate wells. Plates were incubated overnight at room temperature.

For the pH dependence experiments, the buffers used were sodium acetate for pH 4.0–5.0, MES for pH 5.5–6.5, HEPES for pH 7.0–8.0, and Tris for pH 8.5–9.0. The assay was otherwise performed as described above. For the competition assays, A3A(E72A) was used at a fixed concentration of 300 nM, and unlabeled oligonucleotide was added at concentrations ranging from 0 to 6.1 µM. A3A(E72A) was pre-incubated with unlabeled oligonucleotide for an hour in assay buffer; labeled DNA was then added and the plates were incubated overnight at room temperature.

For all experiments, fluorescence anisotropy was measured using an EnVision plate reader (PerkinElmer), with excitation at 531 nm and detection of polarized emission at 579 nm. To analyze data and determine Kd values, Prism (GraphPad) was used for least-squares fitting of the measured fluorescence anisotropy values (Y) at different protein concentrations (X) to a single-site binding curve with Hill slope, a nonspecific linear term, and a constant background:

$$Y = \frac{B_{max} X^{h}}{K_d^{h} + X^{h}} + NS \cdot X + \text{Background}$$

where Kd is the equilibrium dissociation constant, h is the Hill coefficient, Bmax is the extrapolated maximum anisotropy at complete binding, and NS is the slope of the nonspecific linear term.
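The fitting equation can be written directly as a model function. The minimal Python sketch below mirrors its functional form with illustrative parameter names; it is not the Prism implementation. A useful sanity check is that, without the nonspecific term, the specific signal reaches half of Bmax when X equals Kd, whatever the Hill coefficient.

```python
def anisotropy_model(x_nm, bmax, kd_nm, h, ns, background):
    """Y = Bmax*X^h / (Kd^h + X^h) + NS*X + Background.

    x_nm: protein concentration (nM); bmax: maximal specific anisotropy change
    (e.g. mA); kd_nm: dissociation constant (nM); h: Hill coefficient;
    ns: slope of the nonspecific linear term (mA per nM); background: constant
    background anisotropy (mA), i.e. the value of Y at X = 0.
    """
    xh = x_nm ** h
    return bmax * xh / (kd_nm ** h + xh) + ns * x_nm + background


if __name__ == "__main__":
    # Illustrative parameters only (not fitted values from this study).
    for x in (10.0, 100.0, 143.0, 1000.0):
        print(x, round(anisotropy_model(x, 100.0, 143.0, 1.0, 0.0, 40.0), 2))
```

Because the function uses only arithmetic operators it also accepts NumPy arrays, so it could be passed as the model to a general least-squares routine such as `scipy.optimize.curve_fit(anisotropy_model, x, y, p0=...)` if one wished to reproduce such fits outside Prism.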


IV.e.5. ¹H NMR-based A3 deaminase activity assay.

Deaminase activity of A3A was determined by assaying active enzyme against linear DNA substrates and measuring product formation by ¹H NMR. Active A3A protein (50 nM) was assayed against linear DNA substrates (200 µM) in buffer containing 50 mM MES pH 6.0, 100 mM NaCl, 0.5 mM TCEP, and 5% D₂O. Experiments were performed on 9-mer substrates containing the target sequences AA(A/G/T)TC(A/G/T)AAA, at 40 °C to prevent the DNA from oligomerizing at high concentration. Spectra were recorded on a Bruker Avance III NMR spectrometer operating at a ¹H Larmor frequency of 600 MHz and equipped with a cryogenic probe. Product concentration was estimated from peak integrals with Topspin 3.5 software (Bruker Biospin Corporation, Billerica, MA) using an external standard. Activity was determined from the initial rate of product formation obtained by fitting a first-order exponential to the progress curve. Rate errors were estimated by Monte Carlo simulation using 100 synthetic data sets, taking the residuals of the initial fit to the experimental data as the concentration error.
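The rate analysis described above can be sketched as follows, assuming a progress curve of the form P(t) = P∞(1 − e^(−kt)), whose initial slope is P∞·k. This is a standard-library-only illustration with hypothetical function names, not the analysis code used for this study: the optimizer (a golden-section search over ln k, with the amplitude solved by linear least squares at each step) and the reading of "residuals as the concentration error" as Gaussian noise whose standard deviation equals the residual RMS are both assumptions.

```python
import math
import random
import statistics


def _profile_sse(ts, ys, k):
    """For fixed k, solve the amplitude of y = A*(1 - exp(-k*t)) by linear
    least squares and return (A, sum of squared residuals)."""
    fs = [1.0 - math.exp(-k * t) for t in ts]
    amp = sum(f * y for f, y in zip(fs, ys)) / sum(f * f for f in fs)
    sse = sum((y - amp * f) ** 2 for f, y in zip(fs, ys))
    return amp, sse


def fit_first_order(ts, ys, k_lo=1e-4, k_hi=10.0, iters=100):
    """Golden-section search over ln(k) for the least-squares first-order fit.

    Assumes k lies in [k_lo, k_hi] (inverse time units of ts) and at least
    one time point is greater than zero. Returns (A, k).
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(k_lo), math.log(k_hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc = _profile_sse(ts, ys, math.exp(c))[1]
    fd = _profile_sse(ts, ys, math.exp(d))[1]
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = _profile_sse(ts, ys, math.exp(c))[1]
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = _profile_sse(ts, ys, math.exp(d))[1]
    k = math.exp(0.5 * (a + b))
    return _profile_sse(ts, ys, k)[0], k


def initial_rate_with_error(ts, ys, n_sets=100, seed=0):
    """Initial rate A*k of the best fit and its Monte Carlo standard deviation.

    ASSUMPTION: each synthetic data set is the fitted curve plus Gaussian noise
    whose SD equals the RMS residual of the fit to the experimental data.
    """
    amp, k = fit_first_order(ts, ys)
    fitted = [amp * (1.0 - math.exp(-k * t)) for t in ts]
    rms = math.sqrt(sum((y - f) ** 2 for y, f in zip(ys, fitted)) / len(ys))
    rng = random.Random(seed)
    rates = []
    for _ in range(n_sets):
        synthetic = [f + rng.gauss(0.0, rms) for f in fitted]
        a_s, k_s = fit_first_order(ts, synthetic)
        rates.append(a_s * k_s)
    return amp * k, statistics.stdev(rates)
```

With time in minutes and product concentration in µM, the first returned value is the initial rate in µM/min and the second is its Monte Carlo standard deviation.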


IV.e.6. Molecular Modeling.

The crystal structures of A3A bound to ssDNA (PDB ID: 5KEG and 5SWW) were used for molecular modeling (20, 22). The DNA sequence was first mutated using Coot (29). The complex structure was then prepared and minimized with the Protein Preparation Wizard in Maestro (Schrödinger) at pH 6.0, with all other settings at their defaults.


IV.f. References:

1. Sheehy AM, Gaddis NC, Choi JD, Malim MH. Isolation of a human gene that inhibits HIV-1 infection and is suppressed by the viral Vif protein. Nature. 2002;418(6898):646-50. PubMed PMID: 12167863.

2. Zheng YH, Irwin D, Kurosu T, Tokunaga K, Sata T, Peterlin BM. Human APOBEC3F is another host factor that blocks human immunodeficiency virus type 1 replication. J Virol. 2004;78(11):6073-6. PubMed PMID: 15141007.

3. Dang Y, Siew LM, Wang X, Han Y, Lampen R, Zheng YH. Human cytidine deaminase APOBEC3H restricts HIV-1 replication. J Biol Chem. 2008;283(17):11606-14. doi: 10.1074/jbc.M707586200. PubMed PMID: 18299330; PMCID: PMC2430661.

4. Dang Y, Wang X, Esselman WJ, Zheng YH. Identification of APOBEC3DE as another antiretroviral factor from the human APOBEC family. J Virol. 2006;80(21):10522-33. doi: 10.1128/JVI.01123-06. PubMed PMID: 16920826; PMCID: PMC1641744.

5. Bogerd HP, Wiegand HL, Doehle BP, Lueders KK, Cullen BR. APOBEC3A and APOBEC3B are potent inhibitors of LTR-retrotransposon function in human cells. Nucleic Acids Res. 2006;34(1):89-95. doi: 10.1093/nar/gkj416. PubMed PMID: 16407327; PMCID: PMC1326241.

6. Muckenfuss H, Hamdorf M, Held U, Perkovic M, Lower J, Cichutek K, Flory E, Schumann GG, Munk C. APOBEC3 proteins inhibit human LINE-1 retrotransposition. J Biol Chem. 2006;281(31):22161-72. doi: 10.1074/jbc.M601716200. PubMed PMID: 16735504.

7. Mangeat B, Turelli P, Caron G, Friedli M, Perrin L, Trono D. Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts. Nature. 2003;424(6944):99-103. PubMed PMID: 12808466.

8. Hultquist JF, Lengyel JA, Refsland EW, LaRue RS, Lackey L, Brown WL, Harris RS. Human and rhesus APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H demonstrate a conserved capacity to restrict Vif-deficient HIV-1. J Virol. 2011;85(21):11220-34. doi: 10.1128/JVI.05238-11. PubMed PMID: 21835787; PMCID: PMC3194973.

9. Burns MB, Lackey L, Carpenter MA, Rathore A, Land AM, Leonard B, Refsland EW, Kotandeniya D, Tretyakova N, Nikas JB, Yee D, Temiz NA, Donohue DE, McDougle RM, Brown WL, Law EK, Harris RS. APOBEC3B is an enzymatic source of mutation in breast cancer. Nature. 2013;494(7437):366-70. doi: 10.1038/nature11881. PubMed PMID: 23389445.

10. Stenglein MD, Burns MB, Li M, Lengyel J, Harris RS. APOBEC3 proteins mediate the clearance of foreign DNA from human cells. Nat Struct Mol Biol. 2010;17(2):222-9. PubMed PMID: 20062055.

11. Liddament MT, Brown WL, Schumacher AJ, Harris RS. APOBEC3F properties and hypermutation preferences indicate activity against HIV-1 in vivo. Curr Biol. 2004;14(15):1385-91. PubMed PMID: 15296757.

12. Harris RS, Bishop KN, Sheehy AM, Craig HM, Petersen-Mahrt SK, Watt IN, Neuberger MS, Malim MH. DNA deamination mediates innate immunity to retroviral infection. Cell. 2003;113(6):803-9. PubMed PMID: 12809610.

13. Taylor BJ, Nik-Zainal S, Wu YL, Stebbings LA, Raine K, Campbell PJ, Rada C, Stratton MR, Neuberger MS. DNA deaminases induce break-associated mutation showers with implication of APOBEC3B and 3A in breast cancer kataegis. Elife. 2013;2:e00534. doi: 10.7554/eLife.00534. PubMed PMID: 23599896; PMCID: PMC3628087.

14. Burns MB, Temiz NA, Harris RS. Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat Genet. 2013;45(9):977-83. doi: 10.1038/ng.2701. PubMed PMID: 23852168.

15. Leonard B, Hart SN, Burns MB, Carpenter MA, Temiz NA, Rathore A, Vogel RI, Nikas JB, Law EK, Brown WL, Li Y, Zhang Y, Maurer MJ, Oberg AL, Cunningham JM, Shridhar V, Bell DA, April C, Bentley D, Bibikova M, Cheetham RK, Fan JB, Grocock R, Humphray S, Kingsbury Z, Peden J, Chien J, Swisher EM, Hartmann LC, Kalli KR, Goode EL, Sicotte H, Kaufmann SH, Harris RS. APOBEC3B Upregulation and Genomic Mutation Patterns in Serous Ovarian Carcinoma. Cancer Res. 2013;73(24):7222-31. doi: 10.1158/0008-5472.CAN-13-1753. PubMed PMID: 24154874; PMCID: PMC3867573.

16. Starrett GJ, Luengas EM, McCann JL, Ebrahimi D, Temiz NA, Love RP, Feng Y, Adolph MB, Chelico L, Law EK, Carpenter MA, Harris RS. The DNA cytosine deaminase APOBEC3H haplotype I likely contributes to breast and lung cancer mutagenesis. Nat Commun. 2016;7:12918. doi: 10.1038/ncomms12918. PubMed PMID: 27650891; PMCID: PMC5036005.

17. Ara A, Love RP, Chelico L. Different mutagenic potential of HIV-1 restriction factors APOBEC3G and APOBEC3F is determined by distinct single-stranded DNA scanning mechanisms. PLoS Pathog. 2014;10(3):e1004024. doi: 10.1371/journal.ppat.1004024. PubMed PMID: 24651717; PMCID: PMC3961392.

18. Holtz CM, Sadler HA, Mansky LM. APOBEC3G cytosine deamination hotspots are defined by both sequence context and single-stranded DNA secondary structure. Nucleic Acids Res. 2013;41(12):6139-48. doi: 10.1093/nar/gkt246. PubMed PMID: 23620282; PMCID: PMC3695494.

19. Mitra M, Hercik K, Byeon IJ, Ahn J, Hill S, Hinchee-Rodriguez K, Singer D, Byeon CH, Charlton LM, Nam G, Heidecker G, Gronenborn AM, Levin JG. Structural determinants of human APOBEC3A enzymatic and nucleic acid binding properties. Nucleic Acids Res. 2014;42(2):1095-110. doi: 10.1093/nar/gkt945. PubMed PMID: 24163103; PMCID: PMC3902935.

20. Shi K, Carpenter MA, Banerjee S, Shaban NM, Kurahashi K, Salamango DJ, McCann JL, Starrett GJ, Duffy JV, Demir O, Amaro RE, Harki DA, Harris RS, Aihara H. Structural basis for targeted DNA cytosine deamination and mutagenesis by APOBEC3A and APOBEC3B. Nat Struct Mol Biol. 2017;24(2):131-9. doi: 10.1038/nsmb.3344. PubMed PMID: 27991903; PMCID: PMC5296220.

21. Byeon IJ, Byeon CH, Wu T, Mitra M, Singer D, Levin JG, Gronenborn AM. Nuclear Magnetic Resonance Structure of the APOBEC3B Catalytic Domain: Structural Basis for Substrate Binding and DNA Deaminase Activity. Biochemistry. 2016;55(21):2944-59. doi: 10.1021/acs.biochem.6b00382. PubMed PMID: 27163633; PMCID: PMC4943463.

22. Kouno T, Silvas TV, Hilbert BJ, Shandilya SMD, Bohn MF, Kelch BA, Royer WE, Somasundaran M, Kurt Yilmaz N, Matsuo H, Schiffer CA. Crystal structure of APOBEC3A bound to single-stranded DNA reveals structural basis for cytidine deamination and specificity. Nat Commun. 2017;8:15024. doi: 10.1038/ncomms15024. PubMed PMID: 28452355; PMCID: PMC5414352.

23. Carpenter MA, Li M, Rathore A, Lackey L, Law EK, Land AM, Leonard B, Shandilya SM, Bohn MF, Schiffer CA, Brown WL, Harris RS. Methylcytosine and normal cytosine deamination by the foreign DNA restriction enzyme APOBEC3A. J Biol Chem. 2012;287(41):34801-8. doi: 10.1074/jbc.M112.385161. PubMed PMID: 22896697; PMCID: PMC3464582.

24. Bogerd HP, Wiegand HL, Hulme AE, Garcia-Perez JL, O'Shea KS, Moran JV, Cullen BR. Cellular inhibitors of long interspersed element 1 and Alu retrotransposition. Proc Natl Acad Sci U S A. 2006;103(23):8780-5. doi: 10.1073/pnas.0603313103. PubMed PMID: 16728505; PMCID: PMC1482655.


25. Vartanian JP, Guetard D, Henry M, Wain-Hobson S. Evidence for editing of human papillomavirus DNA by APOBEC3 in benign and precancerous lesions. Science. 2008;320(5873):230-3. doi: 10.1126/science.1153201. PubMed PMID: 18403710.

26. Pham P, Landolph A, Mendez C, Li N, Goodman MF. A biochemical analysis linking APOBEC3A to disparate HIV-1 restriction and skin cancer. J Biol Chem. 2013;288(41):29294-304. doi: 10.1074/jbc.M113.504175. PubMed PMID: 23979356; PMCID: PMC3795231.

27. Byeon IJ, Ahn J, Mitra M, Byeon CH, Hercik K, Hritz J, Charlton LM, Levin JG, Gronenborn AM. NMR structure of human restriction factor APOBEC3A reveals substrate binding and enzyme specificity. Nat Commun. 2013;4:1890. doi: 10.1038/ncomms2883. PubMed PMID: 23695684; PMCID: PMC3674325.

28. Bohn MF, Shandilya SM, Silvas TV, Nalivaika EA, Kouno T, Kelch BA, Ryder SP, Kurt-Yilmaz N, Somasundaran M, Schiffer CA. The ssDNA Mutator APOBEC3A Is Regulated by Cooperative Dimerization. Structure. 2015;23(5):903-11. doi: 10.1016/j.str.2015.03.016. PubMed PMID: 25914058.

29. Emsley P, Cowtan K. Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr. 2004;60(Pt 12 Pt 1):2126-32. doi: 10.1107/S0907444904019158. PubMed PMID: 15572765.

30. Love RP, Xu H, Chelico L. Biochemical analysis of hypermutation by the deoxycytidine deaminase APOBEC3A. J Biol Chem. 2012;287(36):30812-22. doi: 10.1074/jbc.M112.393181. PubMed PMID: 22822074; PMCID: PMC3436324.

31. Harjes S, Solomon WC, Li M, Chen KM, Harjes E, Harris RS, Matsuo H. Impact of H216 on the DNA binding and catalytic activities of the HIV restriction factor APOBEC3G. J Virol. 2013;87(12):7008-14. doi: 10.1128/JVI.03173-12. PubMed PMID: 23596292; PMCID: PMC3676121.

32. Churchill CD, Wetmore SD. Noncovalent interactions involving histidine: the effect of charge on pi-pi stacking and T-shaped interactions with the DNA nucleobases. J Phys Chem B. 2009;113(49):16046-58. doi: 10.1021/jp907887y. PubMed PMID: 19904910.

33. Losey HC, Ruthenburg AJ, Verdine GL. Crystal structure of Staphylococcus aureus tRNA adenosine deaminase TadA in complex with RNA. Nat Struct Mol Biol. 2006;13(2):153-9. PubMed PMID: 16415880.

34. Sharma S, Patnaik SK, Taggart RT, Kannisto ED, Enriquez SM, Gollnick P, Baysal BE. APOBEC3A cytidine deaminase induces RNA editing in monocytes and macrophages. Nat Commun. 2015;6:6881. doi: 10.1038/ncomms7881. PubMed PMID: 25898173; PMCID: PMC4411297.

35. Johnson AT, Wiest O. Structure and dynamics of poly(T) single-strand DNA: implications toward CPD formation. J Phys Chem B. 2007;111(51):14398-404. doi: 10.1021/jp076371k. PubMed PMID: 18052367.

36. Sharma S, Patnaik SK, Taggart RT, Baysal BE. The double-domain cytidine deaminase APOBEC3G is a cellular site-specific RNA editing enzyme. Sci Rep. 2016;6:39100. doi: 10.1038/srep39100. PubMed PMID: 27974822; PMCID: PMC5156925.

37. Heinis C. Drug discovery: tools and rules for macrocycles. Nat Chem Biol. 2014;10(9):696-8. doi: 10.1038/nchembio.1605. PubMed PMID: 25038789.


Chapter V

Structural basis for mutation-induced destabilization of profilin 1 in ALS


V.a. Abstract

Mutations in profilin 1 (PFN1) are associated with amyotrophic lateral sclerosis (ALS); however, the pathological mechanism of PFN1 in this fatal disease is unknown. We demonstrate that ALS-linked mutations severely destabilize the native conformation of PFN1 in vitro and cause accelerated turnover of the PFN1 protein in cells. This mutation-induced destabilization can account for the high propensity of ALS-linked variants to aggregate and also provides a rationale for their reported loss-of-function phenotypes in cell-based assays. The source of this destabilization is illuminated by the X-ray crystal structures of several PFN1 proteins, which reveal an expanded cavity near the protein core of the destabilized M114T variant. In contrast, the E117G mutation only modestly perturbs the structure and stability of PFN1, an observation consistent with the occurrence of this mutation in the control population. These findings suggest that a destabilized form of PFN1 underlies PFN1-mediated ALS pathogenesis.


V.b. Introduction

Mutations in the profilin 1 gene (PFN1) were recently associated with both familial and sporadic forms of amyotrophic lateral sclerosis (ALS) (1, 2), an incurable and fatal neurodegenerative disease that primarily targets motor neurons (3). The etiology of sporadic ALS is poorly understood, whereas familial ALS is caused by inheritable genetic defects in defined genes such as PFN1 (3).

PFN1 is a 15-kDa protein that is best known for its role in actin dynamics in the context of endocytosis, membrane trafficking, cell motility, and neuronal growth and differentiation (4). In addition to binding monomeric actin (G-actin), PFN1 also binds a host of different proteins through their poly-L-proline motifs, as well as lipids such as phosphatidylinositol 4,5-bisphosphate (4, 5). However, little is known about the mechanism(s) underlying PFN1-mediated ALS pathogenesis. The observation that most ALS-linked PFN1 variants are highly prone to aggregation in cultured mammalian cells suggests that disease-causing mutations induce an altered, or misfolded, conformation within PFN1 (2). Protein misfolding is a hallmark feature of most neurodegenerative diseases, including ALS (3), and can contribute to disease through both gain-of-toxic-function and loss-of-normal-function mechanisms (6). Although mutations in PFN1 cause ALS through a dominant inheritance mode (2), there is some evidence supporting a loss-of-function mechanism for mutant PFN1. For example, ALS-linked mutations were shown to abrogate the binding of PFN1 to actin (2) and to impair the incorporation of PFN1 into cytoplasmic stress granules during arsenite-induced stress (7) in cultured cells. Moreover, ectopic expression of these variants in murine motor neurons led to a reduction in both axon outgrowth and growth cone size, consistent with a loss of function through a dominant-negative mechanism (2).

Although ALS-linked mutations were shown to induce PFN1 aggregation, the effect of these mutations on protein stability and structure has not been studied. Because the impact of disease-causing mutations on protein stability varies from protein to protein (8–10), these parameters must be determined empirically. Here, we demonstrate that certain familial ALS-linked mutations severely destabilize PFN1 in vitro and cause faster turnover of the protein in neuronal cells. To gain insight into the source of this mutation-induced instability, the 3D structures of three PFN1 proteins, including the WT protein, were solved by X-ray crystallography. We discovered that the M114T mutation created a cleft that extended into the interior of PFN1. Further, we predict that C71G, the most severely destabilizing mutation, also creates a cavity near the core of the PFN1 protein, proximal to the cleft formed by M114T. Experimental mutations that create enlarged pockets or cavities are known to destabilize a protein's native conformation (11), and there are several examples of mutation-induced cavity formation occurring in nature and disease (12, 13).

Interestingly, E117G, the variant predicted to be the least pathogenic according to recent genetic studies (2, 14), was relatively stable and closely resembled the WT protein in every assessment performed herein. These data implicate a destabilized form of PFN1 in ALS pathogenesis and call for therapeutic strategies that can stabilize mutant PFN1.

V.c. Results

V.c.1. ALS-Linked Mutations Destabilize PFN1 in Vitro.

To investigate the effect of ALS-linked mutations on the stability of PFN1, PFN1 proteins were expressed and purified from Escherichia coli and subjected to chemical and thermal denaturation analyses. A novel purification protocol that includes sequential cation-exchange and gel filtration chromatography steps was developed here and applied to all PFN1 variants (V.d. Methods). PFN1 C71G was found to be highly prone to aggregation in E. coli, consistent with observations that this variant exhibits particularly low solubility in mammalian cells (2), and was therefore isolated from inclusion bodies (V.d. Methods). The biochemical properties of PFN1 C71G purified from inclusion bodies are indistinguishable from those of PFN1 C71G purified from the soluble lysate of E. coli, as determined by several assays (Figure V.1), providing confidence that PFN1 proteins purified by these two methods can be directly compared.

Figure V.1. A comparison of PFN1 C71G purified from the soluble lysate of Escherichia coli vs. from inclusion bodies. A. Equilibrium unfolding and B. thermal denaturation curves (described in Figure V.2) for PFN1 C71G purified from the soluble lysate and from inclusion bodies. The apparent melting temperature of PFN1 C71G purified from inclusion bodies (34.62 ± 0.05 °C) is the same as that of PFN1 C71G purified from the soluble lysate (34.60 ± 0.03 °C). C. PFN1 C71G has similar affinities for poly-L-proline, as determined by the binding assay described in Figure V.14, irrespective of whether this variant was purified from the soluble lysate or from inclusion bodies.

Boopathy et al. www.pnas.org/cgi/content/short/1424108112 3 of 10

Figure V.2. ALS-linked mutations destabilize PFN1. Chemical and thermal denaturation studies reveal that ALS-linked variants C71G, M114T, and G118V, but not E117G, are severely destabilized relative to PFN1 WT. A. Equilibrium unfolding curves for PFN1 WT and ALS-linked variants generated by measuring the intrinsic tryptophan fluorescence of the indicated protein equilibrated in increasing concentrations of urea. Data were processed to obtain the center of mass (COM) of the emission spectrum and then fit to a two-state model for protein folding. The resulting fits are displayed as solid lines. The corresponding thermodynamic parameters obtained from the fitted data are shown in Table V.1. B. Thermal denaturation profiles of PFN1 proteins, measured by SYPRO Orange fluorescence as a function of increasing temperature, were used to determine the apparent Tm, which is the temperature corresponding to a normalized fluorescence signal of 0.50, as denoted by the intersection of the dashed lines for each curve.

Table V.1. Summary of experimental stability and binding measurements for PFN1 variants.

Variant | ΔG° (kcal·mol−1)* | m (kcal·mol−1·M−1)* | Cm (M)* | Tm, protein alone (°C)† | Tm, + 4 mM proline (°C)† | Kd, poly-L-proline (μM)†,‡
WT | 7.04 ± 0.49 | 2.25 ± 0.16 | 3.13 ± 0.31 | 54.68 ± 0.04 | 57.25 ± 0.03 | 463 ± 26
C71G | 1.89 ± 0.70 | 1.95 ± 0.40 | 0.97 ± 0.41 | 34.60 ± 0.03 | 39.96 ± 0.03 | 687 ± 77
M114T | 3.51 ± 0.40 | 2.51 ± 0.24 | 1.40 ± 0.21 | 42.62 ± 0.03 | 46.52 ± 0.02 | 572 ± 23
E117G | 6.90 ± 0.74 | 2.49 ± 0.26 | 2.77 ± 0.42 | 51.05 ± 0.04 | 53.78 ± 0.03 | 407 ± 27
G118V | 3.70 ± 0.44 | 2.20 ± 0.23 | 1.68 ± 0.26 | 42.84 ± 0.04 | 46.92 ± 0.04 | 397 ± 40

ΔG°, m, and Cm were obtained from equilibrium unfolding (N ⇌ U). *Errors are shown as SD. †Errors are shown as SE. ‡Kd values are reported in terms of proline residues.

Boopathy et al. PNAS | June 30, 2015 | vol. 112 | no. 26 | 7985

Figure V.3. All PFN1 variants unfold by a two-state process. A–E. PFN1 variants denatured in urea were refolded by diluting the urea. The final concentration of PFN1 in each sample was 10 μM, and tryptophan fluorescence was used to monitor folding. The equilibrium transition regions overlay closely for the unfolding and refolding curves, indicating that the unfolding reaction is reversible. Filled and open circles represent unfolding and refolding, respectively. F. The two-state unfolding of PFN1 observed by intrinsic fluorescence (data from Figure V.2A; Fluor) was verified by CD measurements for PFN1 WT and M114T. The protein concentrations used were 2 μM and 10 μM for tryptophan fluorescence and CD measurements, respectively. The y axis on the left is the mean residue ellipticity at 220 nm (MRE220) obtained from CD experiments, whereas the y axis on the right reflects the change in the COM (as shown in Figure V.2). The thermodynamic parameters obtained by fitting the CD data agree well with those obtained from the fluorescence data (Table V.1) and are as follows: for WT, ΔG° = 7.16 ± 0.11 kcal·mol−1, m = 2.36 ± 0.04 kcal·mol−1·M−1, Cm = 3.03 ± 0.07 M; for M114T, ΔG° = 4.35 ± 0.10 kcal·mol−1, m = 2.95 ± 0.06 kcal·mol−1·M−1, Cm = 1.47 ± 0.05 M.

To examine the stability of PFN1 proteins, fluorescence from tryptophans (W4 and W32) in PFN1 WT and ALS-linked variants was measured as a function of increasing urea concentration (Figure V.2A). To ensure reversibility, the reciprocal analysis was also performed, in which PFN1 proteins denatured in urea were refolded upon dilution with buffer (Figure V.3A–E). Only one transition was observed between the folded or native (N) and unfolded (U) states for all PFN1 proteins, indicative of a two-state (N ⇌ U) unfolding mechanism. This two-state unfolding model was further substantiated by an unfolding study of two PFN1 proteins (WT and M114T) using CD spectroscopy (Figure V.3F). The following thermodynamic parameters were determined by fitting the fluorescence data to a two-state folding model: apparent ΔG°, the free energy of folding; m, the denaturant dependence of ΔG°; and Cm, the midpoint of the unfolding transition (Table V.1). Both ΔG° and Cm were reduced for ALS-linked variants relative to PFN1 WT, particularly for the variants C71G, M114T, and G118V, indicating that these variants are severely destabilized compared with PFN1 WT (Figure V.2A and Table V.1). Differential scanning fluorimetry (DSF) with SYPRO Orange, a fluorescent indicator of hydrophobic regions exposed upon protein unfolding, was used next to determine the apparent melting temperature, Tm, for all PFN1 proteins used in this study (15). Consistent with the chemical denaturation results, all ALS-linked variants except E117G exhibited a Tm at least 10 °C lower than that of WT (Figure V.2B and Table V.1). Based on the denaturation studies, C71G emerges as the most destabilizing mutation in the context of PFN1, whereas the E117G mutation has a relatively modest impact on PFN1 stability.
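The two-state analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the fitting code used in this work: it assumes the standard linear extrapolation model, ΔG([urea]) = ΔG° − m·[urea], with ΔG° treated as the stability of the native state (so that K = [U]/[N] = exp(−ΔG/RT)), an assumed temperature of 25 °C, and hypothetical helper names (`spectral_com`, `fraction_unfolded`, `midpoint`, `ddg_vs_wt`). Under this model Cm is simply ΔG°/m, which gives a quick internal consistency check on the fitted parameters in Table V.1.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal·mol^-1·K^-1
T_K = 298.15       # ASSUMED temperature (25 °C); not stated in this section

# Apparent two-state parameters from Table V.1 (urea, tryptophan fluorescence):
# variant -> (dG0 in kcal·mol^-1, m in kcal·mol^-1·M^-1)
TABLE_V1 = {
    "WT": (7.04, 2.25),
    "C71G": (1.89, 1.95),
    "M114T": (3.51, 2.51),
    "E117G": (6.90, 2.49),
    "G118V": (3.70, 2.20),
}


def spectral_com(wavelengths, intensities):
    """Center of mass of an emission spectrum: intensity-weighted mean wavelength."""
    total = sum(intensities)
    return sum(w * i for w, i in zip(wavelengths, intensities)) / total


def delta_g(dg0, m, urea):
    """Linear extrapolation: stability (kcal·mol^-1) at a given urea concentration (M)."""
    return dg0 - m * urea


def fraction_unfolded(dg0, m, urea, temp=T_K):
    """Two-state N <-> U with K = [U]/[N] = exp(-dG/RT); returns K / (1 + K)."""
    k = math.exp(-delta_g(dg0, m, urea) / (R_KCAL * temp))
    return k / (1.0 + k)


def midpoint(dg0, m):
    """Cm: the urea concentration at which dG = 0, i.e. dG0 / m."""
    return dg0 / m


def ddg_vs_wt(variant):
    """Loss of stability relative to WT (kcal·mol^-1); positive = less stable."""
    return TABLE_V1["WT"][0] - TABLE_V1[variant][0]


if __name__ == "__main__":
    for name, (dg0, m) in TABLE_V1.items():
        print(f"{name}: Cm = {midpoint(dg0, m):.2f} M, "
              f"ddG = {ddg_vs_wt(name):.2f} kcal/mol")
```

The Cm values computed as ΔG°/m reproduce the Cm column of Table V.1 to within rounding (for WT, 7.04/2.25 ≈ 3.13 M), and the loss of stability relative to WT ranges from about 0.14 kcal·mol−1 for E117G to about 5.15 kcal·mol−1 for C71G. A real fit of the COM data would also include linear baselines for the native and unfolded states; they are omitted here to keep the sketch short.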

mutation The of in PFN1 theturnover C71G of was both faster PFN1 in C71Gthe soluble and M114Tfrac- PFN1was measured C71G was as a found function to of be increasing highly prone urea concentrationcontext to aggregation of PFN1, whereas (Fig. in the E117G mutation has a relatively here and applied to all PFN1 variants (Materials and Methods). tioned into the insoluble fraction (Fig. 2 A and B) as reportedtionoccurred compared significantly with the faster insoluble than fraction that of PFN1 (Fig. S3 WT.), likely As early be- as E.1A). coli To, consistent ensurepreviously reversibility, with (2). observations The the turnover reciprocal that of both analysis thismodest PFN1 variant was impact C71G also exhibited on PFN1 per-and stability. M114T PFN1 C71G was found to be highly prone to aggregationFig. 1. ALS-linked in mutations destabilize PFN1. Chemical and thermal de- cause2.5 h, clearance the majority of insoluble of PFN1 C71G cellular and aggregates M114T within by the the quality soluble naturationparticularlyformed, studies where reveal low that denatured solubility ALS-linked variants in PFN1 C71G, mammalian M114T, proteins and G118V, cells in ureaALS-Linked (2), were and PFN1 therefore refolded Exhibits Faster Turnover in a Neuronal Cell Line. The E. coli, consistent with observations that this variant exhibited occurred significantly faster than that of PFN1 WT. As earlycontrolfraction as machinery had already is less degraded efficient (Fig. compared 2 A and withC). the This turnover decrease of in butwasupon not E117G, isolated dilution are severely2.5 from h, with destabilized the inclusion buffer majority relative (Fig. to bodies of PFN1 PFN1S2 WT.A (A (–)EquilibriumMaterialsE C71G). Only and and oneturnover M114T Methods transition rate within for). 
proteins was Thethe soluble with destabilizing mutations is often particularly low solubility in mammalian cells (2), and thereforeunfolding curves for PFN1 WT and ALS-linked variants generated by mea- faster relative to their WTsmaller, cosolubleunterparts, soluble PFN1 generally species content because (17). was Although not simply PFN1 due G118V to further was destabi- PFN1 ag- suringbiochemicalobserved the intrinsic between tryptophanfraction properties fluorescence the had folded already of of thePFN1 or indicated degraded native C71G protein (N) (Fig. equil- purifiedand 2 unfoldedAdestabilizedand fromC). (U) proteins inclusionThis states decrease are misfolded in and targeted for degradation was isolated from inclusion bodies (Materials and Methodsibrated). in The increasing concentrations of urea. Data were processed to obtain lizedgregation, to a similar which degree could as confound M114T in our vitro, analysis, the turnover as evidenced of this by bodies aresoluble indistinguishable PFN1 content from was PFN1 not simplyC71Gby purifieddue the cellularto further from quality the PFN1 control ag- machinery (16). To determine biochemical properties of PFN1 C71G purified fromthe inclusionfor center all ofPFN1 mass (COM) proteins, of the emission indicative spectrum and of then a fit two-state to a two- (N$U) unfolding variantthe concomitant within the soluble clearance fraction of PFN1 seemed from slower the in insoluble cells (Fig. fraction 2C), statesolublemechanism. model for lysate proteingregation, folding. This of E. The two-state coli resulting whichas fits determined could are unf displayedolding confound as solid by model lines. several ourwhether was analysis, assays further the results ( asFig. 
evidenced sub-of S1 our), in vitro by denaturation studies extend to bodies are indistinguishable from PFN1 C71G purifiedThe from corresponding the thermodynamic parameters obtained from the fitted a cellular environment, V5-taggedwhichat the PFN1 may early variants reflect time were a points stabilizing tran- of cycloheximide effectofotherproteinsand/orfactors exposure (Fig. 2B). The dataproviding are shown in Tableconfidencethe 1. concomitant (B) Thermal that denaturation clearancePFN1 profiles proteins of of PFN1 PFN1 pro- purified fromsiently the transfected by insoluble these into two human fraction neuronal SKNAS cells, and PFN1 soluble lysate of E. coli as determined by several assays (Fig.stantiated S1), with an unfolding study of two PFN1 proteins (WT and thatfaster interact turnover with PFN1 of PFN1 in the C71G cellular and milieu M114T (4), or in that cells this closely variant cor- teinsmethods measured by can SYPROat the be Orange earlydirectly fluorescence time compared. points as a function of cycloheximide of increasingF turnover exposure was assessed (Fig. by 2B tracking). The V5-PFN1 protein expression providing confidence that PFN1 proteins purified by thesetemperatureM114T) two were using used to CD determine spectroscopy the apparent Tm (,Fig. which S2 is the). tem- The following thermo- faster turnover of PFN1 C71G and M114Tover a 12.5-h in cells time course closely in theis cor-relates not presence properly with of cycloheximide. handled their reduced At by the quality stabilities control in machinery vitro, confirming in the cell. the methods can be directly compared. peraturedynamicTo corresponding examine parameters to the0.50 fluorescence stability were signaldetermi of as PFN denotedned1proteins,fluorescencefrom by by the fitting in- the the start fluorescence of the experiment (tIn= fact,0 of the we cycloheximide detected atime low level of insoluble PFN1 G118V that tersection of the dashedrelates lines for with each curve. 
their reduced stabilities in vitro, confirming thedestabilizing effect of the C71G and M114T mutations. We note To examine the stability of PFN1proteins,fluorescencefromtryptophansdata to a two-state (W4 and folding W32) model: in PFN1 apparent WT andΔG°, ALS-linkedcourse), the free all V5-tagged energy variants of PFN1 variants were expressed at similar destabilizing effect of the C71G and M114Tlevels except mutations. that V5-PFN1 We C71G,notepersistedthat M114T, the throughout turnover and G118V of parti-the PFN1 12.5-h C71G time course was faster (Fig. in 2B theand solubleFig. S3 frac-). tryptophans (W4 and W32) in PFN1 WT and ALS-linkedwas variantsfolding; measured m, the as denaturant a function dependence of increasing of ureaG°; concentration and C ,themid- (Fig. here and appliedthat to all the PFN1 turnover variants (Materials of PFN1 and C71G MethodsΔ). wastioned faster intom in the the insoluble soluble fraction frac-tion (Fig. compared 2 A and B) as with reported the insoluble fraction (Fig. S3), likely be- 1A). To ensure reversibility, the reciprocal analysispreviously was (2). also The turnover per- of both PFN1 C71G and M114T was measured as a function of increasing urea concentrationPFN1point (Fig. C71G of wasthetion found unfolding compared to be highly transition with prone the to(Table aggregation insoluble 1). Both in fractionΔG° and (Fig. Cm S3were), likelyALS-Linked be-cause clearance Mutations of Induce insoluble a Misfolded cellular Conformation aggregates byWithin the PFN1. quality E. coli, consistent with observations that this variant exhibited occurred significantly faster than that of PFN1 WT. As early as 1A). 
To ensure reversibility, the reciprocal analysis was alsoformed,reduced per- wherefor ALS-linked denatured variants PFN1 relative proteins to PFN1 in urea2.5 WT, h, the were particularlymajority refolded of PFN1 C71GWe and reasoned M114T within that the ALS-linked soluble variants must undergo some de- BIOPHYSICS AND particularly low solubilitycause clearancein mammalian of cells insoluble (2), and therefore cellular aggregates by the qualitycontrol machinery is less efficient compared with the turnover of formed, where denatured PFN1 proteins in urea were refoldeduponfor the dilution PFN1 variants with buffer C71G, (Fig. M114T, S2 A– andE). G118V, Onlyfraction one indicating transition had already these degraded was gree (Fig. 2 ofA and structuralC). This decrease or conformational in change to account for their was isolated from inclusion bodies (Materials and Methods). The COMPUTATIONAL BIOLOGY control machinery is less efficient comparedsoluble PFN1 with content the turnover was notsmaller, simply of duesoluble to further speciesPFN1 ag- (17). Although PFN1 G118V was destabi- upon dilution with buffer (Fig. S2 A–E). Only one transitionTablebiochemicalobserved wasV.1.Summary properties between of PFN1the of foldedexperimental C71G purified or native from stability inclusion(N) and and unfolded binding (U) measurements states variants aresmaller, severely soluble destabilized species compared (17). Although with PFN1gregation, PFN1 WT G118V which (Fig. 
could was 1A confound destabi-destabilization.lized our analysis, to a similar as evidencedHowever, degree by ALS-causing as M114T in mutations vitro, the did turnover not perturb of this bodies are indistinguishable from PFN1 C71G purified from the observed between the folded or native (N) and unfolded (U)forfor PFN1 states all PFN1 variants proteins, indicative of a two-statethe (N concomitant$U) unfolding clearance of PFN1 from the insoluble fraction soluble lysate of E.lized coli as to determined a similar by severaldegree assays as M114T(Fig. S1), in vitro, the turnover of thisvariant within the soluble fraction seemed slower in cells (Fig. 2C), for all PFN1 proteins, indicative of a two-state (N$U) unfolding at the early time points of cycloheximide exposure (Fig. 2B). The providingmechanism. confidencevariant This that withinPFN1 two-state proteins the soluble purified unfolding fraction by these model two seemed was slower further in cells sub- (Fig. 2whichC), may reflect a stabilizing effectofotherproteinsand/orfactors Table 1. Summary of experimental stability andfaster binding turnover measurements of PFN1 C71G and for M114T PFN1 in cells variants closely cor- mechanism. This two-state unfolding model was further methodsstantiated sub- can be with directlywhich an compared. may unfolding reflect a study stabilizing of two effec PFN1tofotherproteinsand/orfactorsrelates proteins with their(WT reduced and stabilities in vitro, confirming the To examine the stability of PFN1proteins,fluorescencefrom that interact with PFN1 in the cellular milieu (4), or that this variant stantiated with an unfolding study of two PFN1 proteins (WT and destabilizing effect of the C71G and M114T mutations.† We note tryptophansM114T) (W4 using andthat W32) CD interact in spectroscopy PFN1 withEquilibrium WT PFN1 and ALS-linked ( inFig. the unfolding S2 cellular variantsF). 
The (N milieu$ followingU)* (4), or thermo- that this variant Tm, °C that the turnover of PFN1 C71Gis was not faster properly in the soluble handled frac- by the quality control machinery in the†, cell.‡ M114T) using CD spectroscopy (Fig. S2F). The following thermo- Binding to poly-L-proline wasdynamic measured as parametersis a function not properly of increasing were handled determi urea concentration byned the quality by (Fig. fitting controltion the compared machinery fluorescence with the in insoluble the cell.In fraction fact, we (Fig. detected S3), likely be- a low level of insoluble PFN1 G118V that dynamic parameters were determined by fitting the fluorescence1A). To ensure reversibility, the reciprocal–1 analysis was also per-–1 –1 dataVariant to a two-stateInΔ fact,G°, kcal foldingwe·mol detected model: am, low apparent kcal level·molofΔ·G°,M insolublecause the clearance free PFN1C energym,M ofG118V insoluble of cellular thatpersisted Proteinalone aggregates throughout by the+ quality4 the mM 12.5-h proline time course (Fig.Kd 2,BμMand Fig. S3). data to a two-state folding model: apparent ΔG°, the freeformed, energy where of denaturedpersisted PFN1 throughout proteins in the urea 12.5-h were refolded time coursecontrol (Fig. machinery 2B isand lessFig. efficient S3). compared with the turnover of uponfolding;WT dilution m, with the buffer7.04 denaturant (Fig.± S20.49A–E). dependence Only one transition2.25 ± of0.16 wasΔG°;smaller, and solubleC 3.13m,themid- species± 0.31 (17). Although 54.68 PFN1± 0.04 G118V was destabi- 57.25 ± 0.03 463 ± 26 folding; m, the denaturant dependence of ΔG°; and Cm,themid-observed between the folded or native (N) and unfolded (U) states pointC71G of the unfolding 1.89 ± 0.70 transition (Table1.95 1).± 0.40 BothlizedΔG° to aand 0.97 similar C±m degree0.41were as M114TALS-Linked 34.60 in vitro,± 0.03 the Mutations turnover of 39.96 this Induce± 0.03 a Misfolded Conformation687 ± 77 Within PFN1. 
point of the unfolding transition (Table 1). Both ΔG° andfor Cm allwere PFN1 proteins,ALS-Linked indicative Mutations of a two-state Induce (N$U) a unfolding Misfoldedvariant Conformation within the soluble Within fraction PFN1. seemed slower in cells (Fig. 2C), mechanism.reducedM114T This for two-stateALS-linked 3.51 ± unf0.40olding variants model was relative2.51 further± to0.24 sub- PFN1 WT, 1.40 particularly± 0.21We 42.62 reasoned± 0.03 that ALS-linked 46.52 ± 0.02 variants must572 undergo± 23 some de- BIOPHYSICS AND reduced for ALS-linked variants relative to PFN1 WT, particularly We reasoned that ALS-linked variantswhich must may reflect undergo a stabilizing some effec de-tofotherproteinsand/orfactorsBIOPHYSICS AND stantiatedfor the with PFN1 an unfolding variants study C71G, of two PFN1 M114T, proteins and (WT and G118V,that interact indicating with PFN1 these in the cellulargree milieu of (4), structural or that this variantor conformational change to account for their for the PFN1 variants C71G, M114T, and G118V, indicatingE117G these gree 6.90 of structural± 0.74 or conformational2.49 ± 0.26 change to 2.77 account± 0.42 for their 51.05 ± 0.04 53.78 ± 0.03 407 ± 27 COMPUTATIONAL BIOLOGY M114T)variants using are CD severelyspectroscopy destabilized (Fig. S2F). The comparedfollowing thermo- withis PFN1 not properly WT handled (Fig.1 byA the qualitydestabilization. control machineryCOMPUTATIONAL BIOLOGY However, in the cell. ALS-causing mutations did not perturb variants are severely destabilized compared with PFN1 WTdynamic (Fig.G118V 1 parametersA destabilization. were 3.70 determi± 0.44ned However, by fitting the ALS-causing2.20 fluorescence± 0.23In mutations fact, we 1.68 detected did± 0.26 not a low perturb level 42.84of insoluble± 0.04 PFN1 G118V 46.92 that ± 0.04 397 ± 40 data to a two-state folding model: apparent ΔG°, the free energy of persisted throughout the 12.5-h time course (Fig. 2B and Fig. S3). 
folding;*Errors m, the are denaturant shown as dependence SD. of ΔG°; and Cm,themid- † point of the unfolding transition (Table 1). Both ΔG° and Cm were ALS-Linked Mutations Induce a Misfolded Conformation Within PFN1. Table 1. Summary of experimental stability and binding Table measurementsErrors 1. are Summaryshown for as PFN1 SE. of experimental variants stability and binding measurements for PFN1 variants reduced‡ for ALS-linked variants relative to PFN1 WT, particularly We reasoned that ALS-linked variants must undergo some de- BIOPHYSICS AND Kd values are reported in terms of proline residues. for the PFN1 variants C71G, M114T, and† G118V, indicating these gree of structural or conformational change to account for† their $ COMPUTATIONAL BIOLOGY Equilibrium unfolding (N$U)* EquilibriumTm, °C unfolding (N U)* Tm, °C variants are severely destabilized compared with PFN1 WT (Fig. 1A destabilization. However, ALS-causing†,‡ mutations did not perturb †,‡ Binding to poly-L-proline Binding to poly-L-proline –1 –1 –1 –1 –1 –1 Variant ΔG°, kcal·mol m, kcal·mol ·M CVariantBoopathym,M et al. ProteinaloneΔG°, kcal·mol + 4m, mM kcal proline·mol ·M CKmd,,MμM Proteinalone PNAS+ 4 mM| June proline 30, 2015 | vol. 112 | Kno.d, μ 26M | 7985 Table 1. 
Summary of experimental stability and binding measurements for PFN1 variants † WT 7.04 ± 0.49 2.25 ± 0.16 3.13WT± 0.31 54.687.04Equilibrium±±0.040.49 unfolding (N 57.25$U)*2.25± 0.03± 0.16 3.13463Tm,±°C±0.3126 54.68 ± 0.04 57.25 ± 0.03 463 ± 26 †,‡ Binding to poly-L-proline C71G 1.89 ± 0.70 1.95 ± 0.40 0.97C71G± 0.41 34.60 1.89–1 ±±0.030.70 –1 39.96–1 1.95± 0.03± 0.40 0.97687±±0.4177 34.60 ± 0.03 39.96 ± 0.03 687 ± 77 Variant ΔG°, kcal·mol m, kcal·mol ·M Cm,M Proteinalone + 4 mM proline Kd, μM M114T 3.51 ± 0.40 2.51 ± 0.24 1.40M114T± 0.21 42.62 3.51±±0.030.40 46.522.51± 0.02± 0.24 1.40572±±0.2123 42.62 ± 0.03 46.52 ± 0.02 572 ± 23 E117G 6.90 ± 0.74 2.49 ± 0.26 2.77WT ± 0.427.04 ± 51.050.49 ± 0.042.25 ± 0.16 53.78 ± 3.130.03± 0.31 54.68 ± 0.04407 ± 57.2527 ± 0.03 463 ± 26 C71GE117G 1.89 ± 0.70 6.90 ± 0.741.95 ± 0.402.49 0.97 ±±0.410.26 34.60 ± 0.03 2.77 ± 0.42 39.96 ± 0.03 51.05 ± 0.04687 ± 77 53.78 ± 0.03 407 ± 27 G118V 3.70 ± 0.44 2.20 ± 0.23 1.68M114TG118V± 0.26 3.51 ± 42.840.40 3.70±±0.040.442.51 ± 0.24 46.922.20± 1.400.04±±0.210.23 42.62 ± 0.03 1.68397±±0.26 46.5240 ± 0.02 42.84 ± 0.04572 ± 23 46.92 ± 0.04 397 ± 40 E117G 6.90 ± 0.74 2.49 ± 0.26 2.77 ± 0.42 51.05 ± 0.04 53.78 ± 0.03 407 ± 27 *Errors are shown as SD. G118V 3.70 ± 0.44 2.20 ± 0.23 1.68 ± 0.26 42.84 ± 0.04 46.92 ± 0.04 397 ± 40 † *Errors are shown as SD. Errors are shown as SE. † *ErrorsErrors are shown are shown as SD. as SE. ‡ † Kd values are reported in terms of proline residues. Errors‡ are shown as SE. ‡ Kd values are reported in terms of proline residues. Kd values are reported in terms of proline residues.
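The apparent Tm in Fig. 1B and Table 1 is read off a normalized SYPRO Orange melt curve as the temperature at which the signal crosses 0.50. The Python sketch below illustrates that read-out by linear interpolation between the two bracketing samples, and then compares the protein-alone Tm values transcribed from Table 1 with WT. It assumes the curve has already been baseline-corrected and scaled to 0–1; the function names and the synthetic sigmoid used to exercise it are hypothetical and are not the analysis software used in the study.

```python
import math
from typing import Dict, Sequence


def apparent_tm(temps: Sequence[float], signal: Sequence[float], level: float = 0.5) -> float:
    """Temperature at which a normalized melt curve first reaches `level`.

    Assumes `temps` is sorted in ascending order and `signal` rises from ~0
    (folded) to ~1 (unfolded). The crossing is found by linear interpolation
    between the two samples that bracket it.
    """
    if len(temps) != len(signal) or len(temps) < 2:
        raise ValueError("need two or more paired (temperature, signal) samples")
    for i in range(1, len(temps)):
        y0, y1 = signal[i - 1], signal[i]
        if y0 <= level <= y1 and y1 > y0:
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (level - y0) * (t1 - t0) / (y1 - y0)
    raise ValueError("signal never crosses the requested level")


# Protein-alone Tm values (°C) transcribed from Table 1.
TM_PROTEIN_ALONE: Dict[str, float] = {
    "WT": 54.68, "C71G": 34.60, "M114T": 42.62, "E117G": 51.05, "G118V": 42.84,
}


def tm_shift_vs_wt(tms: Dict[str, float] = TM_PROTEIN_ALONE) -> Dict[str, float]:
    """How many °C lower each variant's Tm is than WT (positive = less stable)."""
    return {name: round(tms["WT"] - tm, 2) for name, tm in tms.items() if name != "WT"}


if __name__ == "__main__":
    # Synthetic, noise-free sigmoid centred on 54.68 °C (hypothetical data).
    temps = [30.0 + 0.5 * i for i in range(101)]
    curve = [1.0 / (1.0 + math.exp((54.68 - t) / 1.5)) for t in temps]
    print(round(apparent_tm(temps, curve), 2))
    print(tm_shift_vs_wt())
```

Interpolating between samples keeps the read-out independent of any particular curve model; the 0.5 criterion works only because the traces are first normalized to the folded and unfolded plateaus.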

Boopathy et al. PNAS | June 30, 2015 | vol. 112 | no. 26 | 7985


V.c.2. ALS-Linked PFN1 Exhibits Faster Turnover in a Neuronal Cell Line.

The turnover rate for proteins with destabilizing mutations is often faster than that of their WT counterparts, generally because destabilized proteins are misfolded and targeted for degradation by the cellular quality control machinery (16). To determine whether the results of our in vitro denaturation studies extend to a cellular environment, V5-tagged PFN1 variants were transiently transfected into human neuronal SKNAS cells, and PFN1 turnover was assessed by tracking V5-PFN1 protein expression over a 12.5-h time course in the presence of cycloheximide. At the start of the experiment (t = 0 of the cycloheximide time course), all V5-tagged PFN1 variants were expressed at similar levels, except that V5-PFN1 C71G, M114T, and G118V partitioned into the insoluble fraction (Figure V.4A and B), as reported previously (2). The turnover of both PFN1 C71G and M114T occurred significantly faster than that of PFN1 WT. As early as 2.5 h, the majority of PFN1 C71G and M114T within the soluble fraction had already been degraded (Figure V.4A and C). This decrease in soluble PFN1 content was not simply due to further PFN1 aggregation, which could confound our analysis, as evidenced by the concomitant clearance of PFN1 from the insoluble fraction at the early time points of cycloheximide exposure (Figure V.4B). The faster turnover of PFN1 C71G and M114T in cells closely correlates with their reduced stabilities in vitro, confirming the destabilizing effect of the C71G and M114T mutations. We note that the turnover of PFN1 C71G was faster in the soluble fraction than in the insoluble fraction (Figure V.5), likely because


A Source of Mutation-Induced Destabilization Revealed by X-Ray Crystallography of PFN1. Crystal structures of PFN1 proteins were determined to identify regions within mutant PFN1 that are conformationally distinct from PFN1 WT at atomic resolution. PFN1 WT, E117G, and M114T produced crystals that diffracted at relatively high resolution (∼2.2 Å; Table S1). The 3D structure of human PFN1 WT agrees well with previously determined structures (20–22). PFN1 WT and E117G crystallized in the same space group, C121, whereas M114T crystallized in the P6 space group, with two molecules (designated as chains A and B) in the asymmetric unit (Table S1). Residues 22–36, 46–52, 101–105, 112–120, and 125–128 within PFN1 were used for Cα superimposition of the four molecules (PFN1 WT, M114T chains A and B, and E117G). In agreement with the biochemical analyses described above (Table 1 and Fig. S4), the secondary and tertiary structures of all three PFN1 proteins, including chains A and B of M114T, are highly similar (Fig. 3). Although the space groups for PFN1 WT and M114T crystals were different, we calculated the double difference plots between these and the other PFN1 structures to get a sense for structural perturbations potentially induced by the ALS-linked mutations. Double difference plots were constructed by calculating the distances between all of the Cα atoms in PFN1 WT and an ALS-linked variant separately, and then plotting the difference of the difference between PFN1 structures as described previously (23). Virtually no structural deviations were observed between PFN1 WT and E117G, whereas moderate differences were detected between WT and M114T (Fig. S6).

Next we sought to determine whether these moderate structural changes between PFN1 WT and M114T mapped to regions involved in PFN1 function, namely to residues that make contact with actin (24–31) or poly-L-proline (21, 22, 24, 32, 33). The ternary complex comprised of PFN1 WT, actin, and the poly-L-proline peptide derived from vasodilator-stimulated phosphoprotein (VASP) (21) (PDB ID code 2PAV) is shown in Fig. 4. Residues with the highest (0.3 Å or greater) average of absolute double difference (Avg-Abs-DD) values between PFN1 WT and M114T chain B (Fig. S6C) were mapped onto PFN1 WT (Fig. S7). PFN1 M114T chain B was used for this and all subsequent structural comparisons because chain B had lower B factors compared with chain A (Fig. S8). Indeed, several PFN1 residues that reportedly make contacts with actin (V119, H120, G122, and K126) and poly-L-proline (W4, Y7, H134, and S138) also have relatively high Avg-Abs-DD values (Fig. S7).

Fig. 3. Superimposition of the crystal structures for PFN1 WT, E117G, and M114T. (A and B) The secondary and tertiary structures for PFN1 WT (green), E117G (mustard), M114T chain A (pink), and B (red) are highly superimposable. For each structure, sticks and spheres denote the side chains and van der Waals radii, respectively, for residues at positions 114 and 117. Residue 117 is located within a solvent-exposed flexible loop that has no discernible secondary structure, whereas Met114 is located within a β-sheet toward the interior of the protein. (B) A zoomed cartoon representation showing residues within 4 Å of residue 114. The side chains of these residues are indicated as sticks with nitrogen, oxygen, and sulfur atoms indicated in blue, red, and yellow, respectively. The van der Waals radii of the atoms comprising residue 114 are reduced upon mutation of methionine (green and mustard structures) to threonine (red and pink structures).

7986 | www.pnas.org/cgi/doi/10.1073/pnas.1424108112 Boopathy et al.
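The double difference analysis described in the crystallography text compares each structure only with itself before comparing structures, so it does not depend on a superposition and can be applied to crystals in different space groups. A minimal Python sketch, assuming Cα coordinates (Å) for the same matched residues have already been extracted from each model; the per-residue averaging convention (mean of |DD| over all other residues) and the helper names are assumptions for illustration, not a reproduction of the procedure in ref. (23).

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(ca: Sequence[Coord]) -> List[List[float]]:
    """All pairwise Cα–Cα distances (Å) within a single structure."""
    return [[math.dist(a, b) for b in ca] for a in ca]


def double_difference(ca_ref: Sequence[Coord], ca_var: Sequence[Coord]) -> List[List[float]]:
    """DD[i][j] = d_var(i, j) - d_ref(i, j) for residues matched by index."""
    if len(ca_ref) != len(ca_var):
        raise ValueError("structures must contain the same matched residues")
    d_ref, d_var = distance_matrix(ca_ref), distance_matrix(ca_var)
    n = len(ca_ref)
    return [[d_var[i][j] - d_ref[i][j] for j in range(n)] for i in range(n)]


def avg_abs_dd(dd: List[List[float]]) -> List[float]:
    """Per-residue mean of |DD| over all other residues (assumed convention)."""
    n = len(dd)
    if n < 2:
        raise ValueError("need at least two residues")
    return [sum(abs(dd[i][j]) for j in range(n) if j != i) / (n - 1) for i in range(n)]


def flag_residues(labels: Sequence[str], scores: Sequence[float], cutoff: float = 0.3) -> List[str]:
    """Residues whose Avg-Abs-DD meets the cutoff (0.3 Å in the text)."""
    return [lab for lab, s in zip(labels, scores) if s >= cutoff]


if __name__ == "__main__":
    # Toy four-residue "structure" with 3.8 Å Cα spacing (hypothetical).
    ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    # Rotating 90° about z and translating leaves every DD at ~0 ...
    moved = [(-y + 10.0, x + 5.0, z) for x, y, z in ref]
    print(max(abs(v) for row in double_difference(ref, moved) for v in row))
    # ... whereas displacing residue 3 by 2 Å is picked up locally.
    var = list(ref)
    var[2] = (7.6, 2.0, 0.0)
    scores = avg_abs_dd(double_difference(ref, var))
    print(flag_residues(["R1", "R2", "R3", "R4"], scores))
```

The rigid-body test in the usage block is the point of the method: a whole-molecule rotation or translation, as between different crystal packings, contributes nothing to DD, so only internal rearrangements remain.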

Figure V.4. ALS-linked PFN1 variants exhibit faster turnover in a neuronal cell line. SKNAS cells transiently transfected with V5-PFN1 constructs were treated with cycloheximide (CHX) for up to 12.5 h, during which time lysates were collected and probed by Western analysis with a V5-specific antibody to assess the rate of PFN1 turnover in cells. A and B. A representative Western blot analysis of soluble and insoluble fractions from cell lysates demonstrates a decrease in V5-PFN1 protein with time. GAPDH serves as a loading control for the soluble fraction. C. Densitometry analysis of A reveals that the turnover of PFN1 C71G and M114T is significantly faster than that of PFN1 WT. Statistical significance was determined using a two-way ANOVA followed by Tukey’s post hoc analysis (*P < 0.05, **P < 0.01, #P < 0.0001). Error bars represent SEM. WT and E117G, n = 3; G118V, M114T, and C71G, n = 4 independent experiments.
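The densitometry in Figure V.4C expresses the remaining V5-PFN1 relative to the start of cycloheximide treatment. A minimal Python sketch of that bookkeeping, assuming band intensities have already been quantified: each soluble-fraction V5 band is divided by its GAPDH loading control and then by the time-0 ratio, and replicate experiments are summarized as mean ± SEM. The function names and numbers are hypothetical. The sketch does not reproduce the two-way ANOVA with Tukey's post hoc test used for significance, and the insoluble-fraction data in Figure V.5 were instead normalized to the PFN1 C71G time-0 band.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def fraction_remaining(v5: Sequence[float], gapdh: Sequence[float]) -> List[float]:
    """V5-PFN1 signal at each time point as a fraction of time 0.

    v5 and gapdh are band intensities (arbitrary densitometry units) listed in
    time order, index 0 being the start of cycloheximide treatment.
    """
    if len(v5) != len(gapdh) or not v5:
        raise ValueError("need paired, non-empty V5 and GAPDH intensities")
    ratios = [band / load for band, load in zip(v5, gapdh)]  # loading-corrected signal
    return [r / ratios[0] for r in ratios]                   # scale so t = 0 equals 1


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean across independent experiments."""
    if len(values) < 2:
        raise ValueError("SEM needs at least two replicates")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    # Hypothetical intensities for one variant at 0, 2.5 and 5 h.
    print(fraction_remaining([1000.0, 500.0, 250.0], [2000.0, 2000.0, 1000.0]))
    # Hypothetical fraction remaining at 2.5 h from three replicates.
    print(mean_sem([0.4, 0.5, 0.6]))
```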

Fig. S2. All PFN1 variants unfold by a two-state process. (A–E) PFN1 variants denatured in urea were refolded by diluting the urea. The final concentration of PFN1 in each sample was 10 μM, and tryptophan fluorescence was used to monitor folding. The equilibrium transition regions overlay closely for the unfolding and refolding curves, indicating that the unfolding reaction is reversible. Filled and open circles represent unfolding and refolding, respectively. (F) The two-state unfolding of PFN1 observed by intrinsic fluorescence (data from Fig. 1A; Fluor) was verified by CD measurements for PFN1 WT and M114T. The concentration of protein used was 2 μM and 10 μM for tryptophan fluorescence and CD measurements, respectively. The y axis on the left is the mean residue ellipticity at 220 nm (MRE220) obtained from CD experiments, whereas the y axis on the right reflects the change in the COM (as shown in Fig. 1). The thermodynamic parameters obtained by fitting the CD data agree well with those obtained from the fluorescence data (Table 1) and are as follows: for WT, ΔG° = 7.16 ± 0.11 kcal·mol⁻¹, m = 2.36 ± 0.04 kcal·mol⁻¹·M⁻¹, Cm = 3.03 ± 0.07 M; for M114T, ΔG° = 4.35 ± 0.10 kcal·mol⁻¹, m = 2.95 ± 0.06 kcal·mol⁻¹·M⁻¹, Cm = 1.47 ± 0.05 M.
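The ΔG°, m, and Cm values in Table 1 and in the Fig. S2 legend come from fitting a two-state (N⇌U) model in which stability varies linearly with denaturant concentration. The Python sketch below illustrates that model in its simplest linear-extrapolation form: it generates a noise-free fraction-unfolded curve from the WT parameters, recovers ΔG° and m from a straight-line fit of −RT ln K against urea concentration inside the transition region, and checks that ΔG° ≈ m·Cm holds for the Table 1 entries. It assumes the raw signal (COM or MRE220) has already been converted to fraction unfolded and that measurements were made at 25 °C; neither assumption comes from this section, and this linearized shortcut is a stand-in, not necessarily the fitting procedure the authors used.

```python
import math
from typing import Sequence, Tuple

R_KCAL = 1.987204e-3  # gas constant, kcal·mol⁻¹·K⁻¹
T_ASSUMED = 298.15    # 25 °C; assumed, not stated in this section


def fraction_unfolded(urea_m: float, dg0: float, m: float, temp_k: float = T_ASSUMED) -> float:
    """Two-state N⇌U model with ΔG(D) = ΔG° - m·D and K_unf = exp(-ΔG(D)/RT)."""
    k_unf = math.exp(-(dg0 - m * urea_m) / (R_KCAL * temp_k))
    return k_unf / (1.0 + k_unf)


def fit_linear_extrapolation(urea_m: Sequence[float], f_u: Sequence[float],
                             temp_k: float = T_ASSUMED,
                             lo: float = 0.05, hi: float = 0.95) -> Tuple[float, float, float]:
    """Return (ΔG°, m, Cm) using only points inside the transition region.

    Within lo <= f_U <= hi, ΔG(D) = -RT ln[f_U / (1 - f_U)] is well determined;
    a least-squares line ΔG(D) = ΔG° - m·D gives ΔG° (intercept), m (minus the
    slope) and Cm = ΔG°/m, the urea concentration at which f_U = 0.5.
    """
    xs, ys = [], []
    for d, f in zip(urea_m, f_u):
        if lo <= f <= hi:
            xs.append(d)
            ys.append(-R_KCAL * temp_k * math.log(f / (1.0 - f)))
    if len(xs) < 2:
        raise ValueError("too few points in the transition region")
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    dg0 = y_bar - slope * x_bar
    m = -slope
    return dg0, m, dg0 / m


# (ΔG° kcal·mol⁻¹, m kcal·mol⁻¹·M⁻¹, Cm M) transcribed from Table 1.
TABLE_1 = {
    "WT": (7.04, 2.25, 3.13), "C71G": (1.89, 1.95, 0.97), "M114T": (3.51, 2.51, 1.40),
    "E117G": (6.90, 2.49, 2.77), "G118V": (3.70, 2.20, 1.68),
}

if __name__ == "__main__":
    urea = [0.25 * i for i in range(33)]  # 0-8 M urea (hypothetical grid)
    curve = [fraction_unfolded(d, 7.04, 2.25) for d in urea]
    print(fit_linear_extrapolation(urea, curve))
    for name, (dg0, m, cm) in TABLE_1.items():
        # In a two-state model ΔG(Cm) = 0, so m·Cm should agree with ΔG°.
        print(name, dg0, round(m * cm, 3))
```

Restricting the fit to roughly 5–95% unfolded avoids the plateaus, where f/(1 − f) is dominated by baseline error in real data.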

Figure V.5. The turnover of insoluble PFN1 in SKNAS cells. The experiment was carried out as described in Figure V.4, and a representative Western blot analysis of the insoluble fraction is shown in Figure V.4B. The data above reflect the densitometry results from an average of n = 2 (M114T) or n = 3 (C71G and G118V) independent experiments, and error bars represent SEM. Each sample was normalized to the PFN1 C71G band corresponding to “time 0.” The turnover of C71G within the insoluble fraction was slower relative to C71G within the soluble fraction (compare this graph to that in Figure V.4C). There was relatively less M114T and G118V in the insoluble fraction compared with C71G, and the small fraction of insoluble G118V persisted throughout the experimental time course.

Boopathy et al. www.pnas.org/cgi/content/short/1424108112 4 of 10


clearance of insoluble cellular aggregates by the quality control machinery is less efficient than the turnover of smaller, soluble species (17). Although PFN1 G118V was destabilized to a similar degree as M114T in vitro, the turnover of this variant within the soluble fraction seemed slower in cells (Figure V.4C). This may reflect a stabilizing effect of other proteins and/or factors that interact with PFN1 in the cellular milieu (4), or it may indicate that this variant is not properly handled by the quality control machinery in the cell. In fact, we detected a low level of insoluble PFN1 G118V that persisted throughout the 12.5-h time course (Figure V.4B and Figure V.5).
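The turnover measurements in Figure V.4 and Figure V.5 rest on normalizing each band to its "time 0" value. The sketch below shows how such normalized densitometry values could be summarized by a rate constant and a half-life. It assumes single-exponential (first-order) decay, and the band intensities are hypothetical, not values from this study.

```python
import math


def normalize_to_time_zero(intensities):
    """Express each band intensity as a fraction of the time-0 band."""
    ref = intensities[0]
    return [v / ref for v in intensities]


def first_order_fit(times_h, fractions):
    """Least-squares fit of ln(fraction) = -k*t, a line through the origin.

    Assumes single-exponential decay and that every fraction is > 0.
    Returns (k in 1/h, half-life in h).
    """
    num = sum(t * math.log(f) for t, f in zip(times_h, fractions))
    den = sum(t * t for t in times_h)
    k = -num / den
    return k, math.log(2) / k


# Hypothetical band intensities over a 12.5-h chase (not data from this study)
times = [0.0, 2.5, 5.0, 7.5, 10.0, 12.5]
bands = [1000.0 * math.exp(-0.1 * t) for t in times]
k, t_half = first_order_fit(times, normalize_to_time_zero(bands))
print(f"k = {k:.3f} per h, half-life = {t_half:.2f} h")
```

The log-linear fit can pass through the origin because every normalized series starts at 1. A species that does not decay toward zero over the time course, such as the persistent insoluble G118V, would need a model with a plateau term instead.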


V.c.3. ALS-Linked Mutations Induce a Misfolded Conformation Within PFN1.

We reasoned that ALS-linked variants must undergo some degree of structural or conformational change to account for their destabilization. However, ALS-causing mutations did not perturb the secondary structural elements of PFN1 as determined by CD spectroscopy (Figure V.6). In addition, the urea denaturation analysis yielded similar m values for all PFN1 variants, which suggested that these proteins adopt similar tertiary structures as well (Table V.1) (18). To probe further for potential structural differences between PFN1 WT and ALS-linked variants, we subjected these proteins to native gel electrophoresis, a biochemical technique capable of detecting conformational differences between misfolded variants and their WT counterparts (19). PFN1 WT and E117G migrated predominantly as single, distinct bands with similar mobility, whereas multiple bands of slower mobility were observed for PFN1 variants C71G, M114T, and G118V (Figure V.7A). The slower-mobility bands likely reflect a larger hydrodynamic volume due to partial unfolding of these variants. In addition, PFN1 C71G, M114T, and G118V produced relatively large-molecular-weight species that were retained in the stacking gel and unable to electrophorese through the separating native gel, but these species were resolubilized under the conditions used for the denaturing gel (Figure V.7A). Analytical size-exclusion chromatography revealed that all PFN1 proteins eluted as expected for soluble, monomeric PFN1 (Figure V.7B–G). However, despite equal loading of PFN1


Figure V.6. ALS-linked PFN1 variants retain the same secondary structure as PFN1 WT. A–D. Far UV CD spectra for the indicated PFN1 variant (10 μM) overlaid with the CD spectrum for PFN1 WT (10 μM).






Figure V.7. Analysis of PFN1 proteins by native PAGE and analytical size-exclusion chromatography. A. PFN1 proteins (10 μg) were subjected to native (Top) or denaturing (Bottom) gel electrophoresis and detected with Coomassie Brilliant Blue stain. The mobility of native PFN1 WT is indicated. PFN1 E117G migrates with a slightly faster mobility than PFN1 WT owing to the addition of a negatively charged amino acid. Misfolded ALS-linked PFN1 variants migrate with slower mobility and form aggregated species that are retained in the stacking gel. This gel is representative of n = 2 experiments using proteins from different purification preparations. B–F. The indicated PFN1 protein (40 μg) was subjected to analytical size-exclusion chromatography using a Superdex 75 column. A single peak corresponding to the expected elution volume (∼15 mL) for monomeric PFN1 was detected for all PFN1 proteins. The experiments were carried out in duplicate for each variant, indicated by solid (n = 1 experiment) and dashed (n = 2 experiment) lines. The average relative peak area ± the SD is indicated to the right of each curve. Despite equal sample loading, the peak area of PFN1 C71G and M114T is lower than that of WT (within error), consistent with a reduced level of soluble protein for these ALS-linked variants. G. An overlay of B–F for the n = 1 experiment demonstrates a similar elution profile for all PFN1 proteins.


proteins onto the analytical size-exclusion column, the peak area corresponding to soluble monomer PFN1 is reduced for ALS-linked variants, particularly for the most aggregation-prone variant, C71G. These data are consistent with a loss of soluble monomer PFN1 in the form of insoluble species that cannot pass through the analytical size-exclusion column filter.
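The reduced peak area can be quantified by integrating each chromatogram over the monomer peak and dividing by the WT area. The sketch below is a minimal illustration of that calculation under several assumptions: a flat zero baseline, a shared elution-volume grid for all traces, and a hypothetical 13–17 mL integration window around the ∼15 mL monomer peak. The Gaussian traces are invented for the demonstration, and the software actually used for peak integration is not specified here.

```python
import math


def trapezoid_area(x, y):
    """Area under y(x) by the trapezoidal rule, assuming a zero baseline."""
    return sum(0.5 * (x[i] - x[i - 1]) * (y[i] + y[i - 1]) for i in range(1, len(x)))


def peak_area(volumes_ml, signal, window=(13.0, 17.0)):
    """Integrate the signal between two elution volumes (window is assumed)."""
    pts = [(v, s) for v, s in zip(volumes_ml, signal) if window[0] <= v <= window[1]]
    xs, ys = zip(*pts)
    return trapezoid_area(xs, ys)


def relative_peak_area(volumes_ml, signal, wt_signal, window=(13.0, 17.0)):
    """Peak area of a variant expressed as a fraction of the WT peak area."""
    return peak_area(volumes_ml, signal, window) / peak_area(volumes_ml, wt_signal, window)


def gaussian(v, height, centre=15.0, width=0.4):
    """Hypothetical peak shape used only to generate demo traces."""
    return height * math.exp(-0.5 * ((v - centre) / width) ** 2)


vols = [10.0 + 0.01 * i for i in range(1001)]  # 10-20 mL grid
wt = [gaussian(v, 100.0) for v in vols]
variant = [gaussian(v, 60.0) for v in vols]
print(f"relative peak area = {relative_peak_area(vols, variant, wt):.2f}")
```

Expressing each variant relative to WT follows the "average relative peak area" reported in Figure V.7. Loss of protein to insoluble species then appears directly as a ratio below 1.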


V.c.4. A Source of Mutation-Induced Destabilization Revealed by X-Ray Crystallography of PFN1.

Crystal structures of PFN1 proteins were determined to identify regions within mutant PFN1 that are conformationally distinct from PFN1 WT at atomic resolution. PFN1 WT, E117G, and M114T produced crystals that diffracted at relatively high resolution (∼2.2 Å; Table V.2). The 3D structure of human PFN1 WT agrees well with previously determined structures (20–22). PFN1 WT and E117G crystallized in the same space group, C121, whereas M114T crystallized in the P6 space group, with two molecules (designated as chains A and B) in the asymmetric unit (Table V.2).

Residues 22–36, 46–52, 101–105, 112–120, and 125–128 within PFN1 were used for Cα superimposition of the four molecules (PFN1 WT, M114T chains A and B, and E117G). In agreement with the biochemical analyses described above (Table V.1 and Figure V.6), the secondary and tertiary structures of all three PFN1 proteins, including chains A and B of M114T, are highly similar (Figure V.8). Although the PFN1 WT and M114T crystals belonged to different space groups, we calculated double difference plots between these and the other PFN1 structures to assess structural perturbations potentially induced by the ALS-linked mutations. Double difference plots were constructed by calculating the distances between all pairs of Cα atoms within PFN1 WT and within an ALS-linked variant separately, and then plotting the difference between these two sets of distances, as described previously (23). Virtually no structural


Table V.2. Crystallographic and refinement statistics of human PFN1 structures

Parameter | WT | E117G | M114T
Resolution (Å) | 2.16 | 2.17 | 2.23
Space group | C121 | C121 | P6
a (Å) | 74.26 | 73.65 | 81.69
b (Å) | 31.84 | 31.71 | 81.69
c (Å) | 61.02 | 60.54 | 65.35
α | 90° | 90° | 90°
β | 122.66° | 122.03° | 90°
γ | 90° | 90° | 120°
Z (molecules per asymmetric unit) | 1 | 1 | 2
Rmerge (linear) | 0.075 | 0.036 (2.25–2.17 Å: 0.095) | 0.147
I/σ(I) | 13.3 (2.16 Å: 2.63) | 12.2 (2.25–2.17 Å: 26.1) | 12.4
Completeness (%) | 99.28 | 99.49 (2.25–2.17 Å: 100) | 99.58
Total no. of reflections | 20,783 | 16,453 | 76,801
No. of unique reflections | 6,416 | 6,422 (2.25–2.17 Å: 644) | 12,156
Rfactor | 0.2159 (2.72–2.16 Å: 0.2592) | 0.1965 (2.73–2.17 Å: 0.1926) | 0.1952 (2.45–2.23 Å: 0.2375)
Rfree | 0.2469 (2.72–2.16 Å: 0.2837) | 0.2139 (2.33–2.17 Å: 0.2528) | 0.2383 (2.45–2.23 Å: 0.2871)
RMSD, bond lengths (Å) | 0.002 | 0.003 | 0.003
RMSD, bond angles (°) | 0.62 | 0.67 | 0.61
Temperature (°C) | −80 | −80 | −80
Residues missing | 1, 2, 59–62, 81, 82, 93–95, 140 | 1, 93–97 | Chain A: 1, 2, 57–62, 92–96; Chain B: 1, 13, 91–97
PDB ID | 4XIL | 4X1M | 4X25

Values in parentheses are for the indicated resolution shell.
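As a consistency check on Table V.2, the unit-cell dimensions can be converted into a Matthews coefficient (VM, the crystal volume per dalton of protein). This sketch rests on several assumptions. It takes a PFN1 mass of about 15 kDa, which is not given in the table. It counts 4 asymmetric units per cell for C121 and 6 for P6. It reads Z as the number of molecules per asymmetric unit, consistent with the two M114T chains. Under these assumptions, all three crystals give VM values near 2.0–2.1 Å³/Da, within the range commonly observed for protein crystals. The solvent fraction uses the usual approximation 1 − 1.23/VM.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in Å^3 from cell edges (Å) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


# Asymmetric units per unit cell (symmetry operators, including C-centring)
ASU_PER_CELL = {"C121": 4, "P6": 6}
MW_PFN1 = 15_000.0  # assumed approximate PFN1 mass in Da (not given in Table V.2)


def matthews_coefficient(volume, space_group, z_asu, mw_da=MW_PFN1):
    """V_M = cell volume / (molecules per cell x molecular mass), in Å^3/Da."""
    return volume / (ASU_PER_CELL[space_group] * z_asu * mw_da)


# a, b, c, alpha, beta, gamma, space group, Z (molecules per asymmetric unit)
cells = {
    "WT": (74.26, 31.84, 61.02, 90, 122.66, 90, "C121", 1),
    "E117G": (73.65, 31.71, 60.54, 90, 122.03, 90, "C121", 1),
    "M114T": (81.69, 81.69, 65.35, 90, 90, 120, "P6", 2),
}
vm = {}
for name, (a, b, c, al, be, ga, sg, z) in cells.items():
    vol = cell_volume(a, b, c, al, be, ga)
    vm[name] = matthews_coefficient(vol, sg, z)
    solvent = 1 - 1.23 / vm[name]  # standard Matthews estimate of solvent fraction
    print(f"{name}: V = {vol:,.0f} Å^3, V_M = {vm[name]:.2f} Å^3/Da, solvent ~ {solvent:.0%}")
```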


Figure V.8. Superimposition of the crystal structures for PFN1 WT, E117G, and M114T. A and B. The secondary and tertiary structures for PFN1 WT (green), E117G (mustard), and M114T chains A (pink) and B (red) are highly superimposable. For each structure, sticks and spheres denote the side chains and van der Waals radii, respectively, for residues at positions 114 and 117. Residue 117 is located within a solvent-exposed flexible loop that has no discernible secondary structure, whereas Met114 is located within a β-sheet toward the interior of the protein. B. A zoomed cartoon representation showing residues within 4 Å of residue 114. The side chains of these residues are indicated as sticks with nitrogen, oxygen, and sulfur atoms indicated in blue, red, and yellow, respectively. The van der Waals radii of the atoms comprising residue 114 are reduced upon mutation of methionine (green and mustard structures) to threonine (red and pink structures).

7986 | www.pnas.org/cgi/doi/10.1073/pnas.1424108112 Boopathy et al.


deviations were observed between PFN1 WT and E117G, whereas moderate differences were detected between WT and M114T (Figure V.9).
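The double difference comparison described above can be sketched in a few lines. Each structure contributes only its own intramolecular Cα–Cα distances, so no superposition or common crystallographic frame is needed. This is why structures from different space groups can still be compared. This minimal version makes several assumptions. Coordinates are supplied as a hypothetical {residue_number: (x, y, z)} mapping. DD is taken as variant minus WT. The comparison is restricted to residues modelled in both structures, which matters because different residues are missing in each model (Table V.2). The toy coordinates are illustrative only.

```python
import math


def distance_matrix(coords):
    """All pairwise Cα–Cα distances for a {residue_number: (x, y, z)} mapping."""
    return {(i, j): math.dist(coords[i], coords[j]) for i in coords for j in coords}


def double_difference(ref, var):
    """DD[i, j] = d_ij(variant) - d_ij(reference) over residues present in both.

    Only intramolecular distances are used, so the result does not depend on
    how either molecule is positioned or oriented in its crystal.
    """
    common = sorted(set(ref) & set(var))
    d_ref = distance_matrix({r: ref[r] for r in common})
    d_var = distance_matrix({r: var[r] for r in common})
    return common, {key: d_var[key] - d_ref[key] for key in d_ref}


# Toy Cα coordinates in Å (hypothetical, four residues)
wt = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (3.8, 3.8, 0.0), 4: (0.0, 3.8, 1.0)}

# Same molecule rotated 90° about z and translated: DD should be ~0 everywhere
moved = {r: (-y + 10.0, x - 5.0, z + 3.0) for r, (x, y, z) in wt.items()}
_, dd_rigid = double_difference(wt, moved)

# Residue 3 displaced by 1 Å: only pairs involving residue 3 change
perturbed = dict(wt)
perturbed[3] = (4.8, 3.8, 0.0)
residues, dd_pert = double_difference(wt, perturbed)
print({k: round(v, 3) for k, v in dd_pert.items() if v})
```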

Next we sought to determine whether these moderate structural changes between PFN1 WT and M114T mapped to regions involved in PFN1 function, namely to residues that make contact with actin (24–31) or poly-L-proline (21, 22, 24, 32, 33). The ternary complex composed of PFN1 WT, actin, and the poly-L-proline peptide derived from vasodilator-stimulated phosphoprotein (VASP) (21) (PDB ID code 2PAV) is shown in Figure V.10. Residues with the highest (0.3 Å or greater) average of absolute double difference (Avg-Abs-DD) values between PFN1 WT and M114T chain B (Figure V.9C) were mapped onto PFN1 WT (Figure V.11). PFN1 M114T chain B was used for this and all subsequent structural comparisons because chain B had lower B factors than chain A (Figure V.12). Indeed, several PFN1 residues that reportedly make contacts with actin (V119, H120, G122, and K126) and poly-L-proline (W4, Y7, H134, and S138) also have relatively high Avg-Abs-DD values (Figure V.11).
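The residue selection used for Figure V.11 can be sketched as follows. The sketch assumes that Avg-Abs-DD for residue i is the mean of |DD(i, j)| over all other residues j, since the exact averaging is not spelled out here. Residues at or above the 0.3 Å cutoff are then intersected with the actin- and poly-L-proline-binding residues listed in the Figure V.10 caption. The optional `exclude` set mirrors the removal of residues that differ between M114T chains A and B. The DD values in the demo are hypothetical.

```python
# Contact residues listed in the Figure V.10 caption (residue numbers only).
ACTIN_BINDING = {61, 70, 72, 73, 74, 75, 83, 89, 91, 97, 98, 100,
                 119, 120, 122, 125, 126, 129, 130}
POLYPRO_BINDING = {4, 7, 10, 13, 28, 30, 32, 134, 138, 140}


def avg_abs_dd(residues, dd):
    """Mean |DD[i, j]| over all j != i for each residue i (assumed definition)."""
    scores = {}
    for i in residues:
        vals = [abs(dd[(i, j)]) for j in residues if j != i]
        scores[i] = sum(vals) / len(vals)
    return scores


def flag_residues(residues, dd, threshold=0.3, exclude=frozenset()):
    """Residues with Avg-Abs-DD >= threshold (Å), minus excluded ones,
    plus the subsets that overlap the actin and poly-L-proline sites."""
    scores = avg_abs_dd(residues, dd)
    flagged = {r for r, s in scores.items() if s >= threshold} - set(exclude)
    return flagged, flagged & ACTIN_BINDING, flagged & POLYPRO_BINDING


# Hypothetical DD values for three residues (not data from this study).
toy_res = [4, 50, 126]
toy_dd = {(i, j): 0.0 for i in toy_res for j in toy_res}
toy_dd[(4, 126)] = toy_dd[(126, 4)] = 0.8
scores = avg_abs_dd(toy_res, toy_dd)
flagged, actin_hits, polypro_hits = flag_residues(toy_res, toy_dd)
print(sorted(flagged), sorted(actin_hits), sorted(polypro_hits))
```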

To assess whether these mutation-induced structural changes are sufficient to alter the normal binding interactions of PFN1, we first monitored changes in the intrinsic tryptophan fluorescence of PFN1 as a function of poly-L-proline peptide concentration (Figure V.13A). The effect of ALS-linked mutations on the PFN1-poly-L-proline interaction was modest: the apparent dissociation constants (Kd) were within twofold for all PFN1 proteins in this study (Table V.1). In fact, excess concentrations of poly-L-





Figure V.9. Structural changes induced by the M114T mutation revealed in double difference plots. Double difference plots (Left) of WT vs. E117G (A), WT vs. M114T chain A (B), WT vs. M114T chain B (C), and M114T chains A vs. B (D). The Avg-Abs-DD values are plotted as a function of residue number for each structural comparison (Middle); these plots indicate which residues undergo a structural change between the proteins being compared. Residues with Avg-Abs-DD values of 0.3 Å or greater are plotted onto the structure (Right) of PFN1 WT (A–C) and PFN1 M114T chain A (D) in green. Residues not used in this analysis are colored black.


Figure V.10. Structure of actin–PFN1–VASP peptide ternary complex with the actin and poly-L-proline binding residues mapped on PFN1. The X-ray structure of the PFN1 WT (gray)–actin (blue)–poly-L-proline peptide (gold) complex (PDB ID code 2PAV) is shown. Residues reportedly involved in actin binding (V61, K70, S72, V73, I74, R75, E83, R89, K91, P97, T98, N100, V119, H120, G122, N125, K126, Y129, and E130) and poly-L-proline binding (W4, Y7, N10, A13, S28, S30, W32, H134, S138, and Y140) are highlighted in blue and gold, respectively. The sites of ALS-linked mutations investigated in this study are highlighted and labeled in black with side chains displayed as black sticks. Residues involved in actin or poly-L-proline binding that also exhibit Avg-Abs-DD values of 0.3 Å or greater between PFN1 WT and M114T chain B (W4, K126, and S138) are labeled in black (the remaining residues that fulfill this criterion are shown in Figure V.11).

Boopathy et al. PNAS | June 30, 2015 | vol. 112 | no. 26 | 7987

Figure V.11. Actin and poly-L-proline binding residues exhibit relatively high double difference values. Residues that have Avg-Abs-DD values of 0.3 Å or greater that are also engaged in actin binding (V119, H120, G122, and K126) or poly-Pro binding (W4, Y7, H134, and S138) are mapped onto the structure of PFN1 WT in magenta. All other residues with Avg-Abs-DD values of 0.3 Å or greater are highlighted in green. Residues with Avg-Abs-DD values of 0.3 Å or greater between chain A and chain B of M114T (Figure V.9D) were excluded from this analysis. Residues not used in this analysis are colored black.



Figure V.12. The calculated α-carbon B factors for all PFN1 structures. Cartoon representations of WT (A), E117G (B), and M114T chains A (C) and B (D). Residues are colored according to the α-carbon B factors using the scale shown at the bottom. The average α-carbon B factors for the WT, E117G, M114T chain A, and M114T chain B structures are 30.52, 22.94, 29.47, and 27.33 Å², respectively. Because the average B factor is higher for M114T chain A, M114T chain B was used for structural analyses unless otherwise noted.

Fig. S9. Electrostatic surface potential (ESP) of PFN1 WT and PFN1 M114T. A comparison of the ESP for PFN1 WT (A) and M114T (B) around the surface pocket (for WT) and cleft (for M114T) shown in Fig. 7. The ESP was calculated using Maestro (Schrödinger, LLC). The Red_White_Blue color scheme was used to depict the ESP of both surfaces, where red denotes negative, blue denotes positive, and white denotes neutral ESP. The minimum and maximum values are −0.12 and 0.12, respectively. The cleft (boxed region in B) formed by M114T exposes a deeper pocket composed of hydrophobic residues that would otherwise be buried beneath the surface-exposed pocket (boxed region in A) in PFN1 WT.

Figure V.13. ALS-linked PFN1 variants retain the ability to bind poly-L-proline. A. Binding of PFN1 to the poly-L-proline peptide was monitored by measuring
The data points were fit using Importantly, the X-ray crystal structures reveal a possible the intrinsic tryptophan fluorescence of the indicated PFN1 protein as a mechanism by which ALS-linked mutations destabilize PFN1. functionaone-sitetotalbindingmodelinGraphPadPrismandtheapparentdisso- of increasing peptide concentration. The data points were fit using a Residues Thr90, Met114, and Gln18 contribute to the formation oneciation-site constants total binding (Kd )model obtained in GraphPad from the Prism fit are and shown the inapparent Table 1. dissociation Note that the concentration of the peptide is reported in terms of [proline] because of a surface exposed pocket that was detected using SiteMap constants (Kd) obtained from the fit are shown in Table 1. Note that the the peptide stock is supplied as a mixture of poly-L-proline species (Materials (Fig. 7). Mutation of methionine to threonine at position 114 concentration of the peptide is reported in terms of [proline] because the peptideand Methods stock ).is ( Bsupplied)DSFwasperformedasdescribedinFig.1 as a mixture of poly-L-proline speciesB in (Materials the presence and increased the size of this pocket, thereby forming a cleft, because Methods).(dashed lines) and absence (solid lines) of 4 mM proline. The presence of the residues nearby failed to rearrange and compensate for the B.proline DSF increaseswas performed the Tm asfor described all PFN1 in proteins Figure usedV.2B inin thisthe presence study (Table (dashed 1), as loss of van der Waals contacts (Fig. 7B). This cleft is expected to lines)illustrated and absence here for (solid WT, C71G, lines) andof 4 mM M114T. proline. The presence of proline increases the Tm for all PFN1 proteins used in this study (Table 1), as illustrated here for WT, C71G, and M114T. Boopathy et al. PNAS | June 30, 2015 | vol. 112 | no. 26 | 7987

205

proline effectively stabilized all PFN1 proteins as determined by DSF, with the largest increase in Tm observed upon poly-L-proline peptide binding to C71G

(Figure V.13B and Table V.1). Next, we measured the binding capacity of our

PFN1 proteins for G-actin by comparing their concentration-dependent abilities to suppress spontaneous polymerization of pyrenyliodoacetamide-labeled actin monomers (34). This assay is based on the fact that PFN1 binds G-actin and inhibits actin nucleation in the absence of formins (34). As expected, increasing concentrations of recombinant PFN1 WT reduced the rate of actin polymerization, whereas the H120E variant that exhibits impaired binding to actin failed to suppress actin polymerization to the same extent (Figure V.14). Of the four ALS-linked variants, only G118V was defective in suppressing actin polymerization, which was most apparent at the highest concentration of PFN1 used in this assay, although this effect did not reach statistical significance

(Figure V.14). These data argue against a general mechanism for PFN1- mediated ALS pathogenesis that involves impaired direct binding between PFN1 and either poly-L-proline or actin.
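The Kd values above come from fitting tryptophan-fluorescence titrations to a one-site total binding model in GraphPad Prism. As a hedged illustration of that model, the sketch below fits Y = Bmax·X/(Kd + X) + NS·X + Background using a swapped-in method rather than Prism's nonlinear regression: for each trial Kd on a log-spaced grid, the three linear parameters are solved by ordinary least squares, and the Kd with the smallest residual sum of squares is kept. The function names, grid bounds, and synthetic data are assumptions for illustration only.

```python
# Hypothetical sketch of a one-site total binding fit (not GraphPad Prism's algorithm).
# Model: Y = Bmax * X / (Kd + X) + NS * X + Background
# For a fixed Kd the model is linear in (Bmax, NS, Background), so Kd is scanned on a
# log-spaced grid and the linear part is solved by least squares at each step.


def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def linear_fit_at_kd(xs, ys, kd):
    """Least-squares (Bmax, NS, Background) for a fixed Kd, plus the residual sum of squares."""
    rows = [(x / (kd + x), x, 1.0) for x in xs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    params = solve3(ata, aty)
    ssr = sum((y - sum(p * v for p, v in zip(params, r))) ** 2 for r, y in zip(rows, ys))
    return params, ssr


def fit_one_site_total(xs, ys, kd_min=1e-3, kd_max=1e3, steps=2000):
    """Return (Kd, Bmax, NS, Background) with the lowest residual sum of squares on the grid."""
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        params, ssr = linear_fit_at_kd(xs, ys, kd)
        if best is None or ssr < best[0]:
            best = (ssr, kd, params)
    _, kd, (bmax, ns, background) = best
    return kd, bmax, ns, background


# Usage with synthetic, noise-free data (true Kd = 2.0 in the same units as X):
xs = [0, 0.25, 0.5, 1, 2, 4, 8, 16, 32]
ys = [-50 * x / (2.0 + x) + 0.1 * x + 100 for x in xs]
print(fit_one_site_total(xs, ys))
```

The grid-plus-linear-solve design avoids needing initial guesses for Bmax, NS, and Background. Its resolution in Kd is set by the grid spacing (about 0.7% per step with the defaults).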

Importantly, the X-ray crystal structures reveal a possible mechanism by which ALS-linked mutations destabilize PFN1. Residues Thr90, Met114, and Gln18 contribute to the formation of a surface-exposed pocket that was detected using SiteMap (Figure V.15). Mutation of methionine to threonine at position 114 increased the size of this pocket, thereby forming a cleft, because the nearby residues failed to rearrange and compensate for the loss of van der Waals contacts (Figure V.15B). This cleft is expected to exert a destabilizing effect on the native conformation of PFN1 owing to this loss of van der Waals contacts and the reduced hydrophobicity of the threonine side chain relative to that of methionine (11). Moreover, hydrophobic residues that are otherwise buried in the PFN1 WT structure were exposed by the cleft in the PFN1 M114T structure (Figure V.15 and Figure V.16). To investigate the potential impact of the C71G mutation on PFN1 structure, the cysteine side chain of residue 71 was removed in the PFN1 WT structure using PyMOL to mimic a glycine residue. Interestingly, this mutation is predicted to form a void in the core of the protein that partially overlaps with the cleft observed in the PFN1 M114T crystal structure (Figure V.15B). Analysis using PyMOL and SiteMap suggests that, unlike the solvent-accessible WT and M114T pockets, the proposed C71G void is buried within the core of the protein. Solvent-inaccessible voids have a more destabilizing effect than solvent-exposed cavities (11, 35), providing a possible explanation for why the C71G mutation is more destabilizing than M114T (Figure V.2).

Figure V.14. The binding of PFN1 proteins to G-actin. Polymerization of monomeric rabbit muscle actin (3 μM, 5% pyrene-labeled) was monitored in the presence of increasing concentrations of WT or ALS-linked PFN1 variants and used to derive relative rates of polymerization (n = 3). The variant H120E, which is impaired in binding to actin, fails to suppress spontaneous actin polymerization as effectively as WT PFN1. Although G118V is relatively weak in suppressing actin polymerization, the data did not reach statistical significance. Statistical significance was determined using a two-way ANOVA followed by Tukey's post hoc analysis. **P ≤ 0.01 for WT vs. H120E at 7 μM concentration. No other significant comparisons with WT were obtained. Other significant comparisons included C71G vs. H120E and E117G vs. H120E (P ≤ 0.05) at 7 μM concentration. Error bars represent SD.

Figure V.15. The M114T mutation causes a surface-exposed pocket to expand into the core of the PFN1 protein. A. Residues are depicted as described in Figure V.8. The van der Waals radii of residues 90, 114, and 18 are in contact in the PFN1 WT structure (Top). These contacts are reduced by the M114T mutation (Bottom) owing to the smaller size of threonine, leading to an enlargement of the surface-exposed pocket. B. PFN1 WT is shown with a transparent surface, and the secondary structure is shown in cartoon representation. The surface pocket volume for PFN1 WT (green) and the cleft volume for PFN1 M114T chain B (red) are depicted as opaque surfaces and were generated using SiteMap. The predicted cavity (blue) for PFN1 C71G (generated using PyMOL) overlays with the M114T void and, unlike the WT and M114T volumes, is not surface-exposed. The insets (Right) show the aforementioned voids for WT (Top), M114T chain B (Middle), and C71G (Bottom).
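The argument above rests on the van der Waals contacts lost when Met114 is replaced by the shorter threonine. The sketch below counts such contacts under a common criterion: two atoms touch when their distance is at most the sum of their van der Waals radii plus a tolerance (0.5 Å here, an arbitrary choice). The radius table (Bondi values), the function name, and the toy coordinates are illustrative assumptions. This is not the PyMOL or SiteMap analysis used in this work.

```python
import math

# Bondi van der Waals radii in angstroms (assumed values; other radius sets exist).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def vdw_contacts(res_a, res_b, tolerance=0.5):
    """List atom pairs whose separation is <= r_a + r_b + tolerance (angstroms).

    Each residue is a list of (atom_name, element, (x, y, z)) tuples.
    Returns (atom_a, atom_b, distance) tuples with distance rounded to 0.01 A.
    """
    contacts = []
    for name_a, elem_a, xyz_a in res_a:
        for name_b, elem_b, xyz_b in res_b:
            d = math.dist(xyz_a, xyz_b)
            if d <= VDW_RADII[elem_a] + VDW_RADII[elem_b] + tolerance:
                contacts.append((name_a, name_b, round(d, 2)))
    return contacts


# Toy coordinates (hypothetical, not taken from the PFN1 crystal structures):
partner = [("NE2", "N", (0.0, 0.0, 0.0))]
long_side_chain = [("SD", "S", (3.6, 0.0, 0.0)), ("CE", "C", (3.2, 1.0, 0.0))]
short_side_chain = [("OG1", "O", (5.5, 0.0, 0.0)), ("CG2", "C", (5.8, 1.2, 0.0))]
print(len(vdw_contacts(partner, long_side_chain)))   # 2
print(len(vdw_contacts(partner, short_side_chain)))  # 0
```

In the toy example, the side chain whose terminal atoms reach toward the partner makes two contacts, while the shorter one, whose atoms sit farther away, makes none. This mirrors qualitatively the loss of contacts described for M114T.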

Figure V.16. Electrostatic surface potential (ESP) of PFN1 WT and PFN1 M114T. A comparison of the ESP for PFN1 WT (A) and M114T (B) around the surface pocket (for WT) and cleft (for M114T) shown in Figure V.15. The ESP was calculated using Maestro (Schrödinger, LLC). The Red_White_Blue color scheme was used to depict the ESP of both surfaces, where red denotes negative, blue denotes positive, and white denotes neutral ESP. The minimum and maximum values are −0.12 and 0.12, respectively. The cleft (boxed region in B) formed by M114T exposes a deeper pocket composed of hydrophobic residues that would otherwise be buried beneath the surface-exposed pocket (boxed region in A) in PFN1 WT.

V.d. Discussion

Here we show that ALS-linked mutations severely destabilize PFN1 (Figure V.2) and alter its native conformation (Figure V.8). Changes in protein stability owing to disease-causing mutations, whether these mutations stabilize or destabilize the protein, are thought to play a pivotal role in various disease mechanisms (13). In the context of ALS, disease-linked mutations destabilize Cu,Zn-superoxide dismutase (SOD1) (9) but instead hyperstabilize TAR DNA-binding protein 43 (TDP-43) (8, 10, 36). These findings underscore the importance of defining the toxic properties of disease-linked proteins, which in turn directs the rational design of therapeutic strategies against those offending proteins (3).

Our X-ray crystal structures of PFN1 proteins illuminate a probable source of mutation-induced destabilization. An enlarged surface pocket, or void, forms as a result of the M114T mutation (Figure V.15). The destabilizing effect of similar voids has been demonstrated by systematic site-directed mutagenesis of lysozyme and is thought to arise from a loss of hydrophobic interactions (11, 35). Examples of mutation-induced cavity formation and destabilization have also been observed in nature (13). Interestingly, modeling the removal of the cysteine side chain at position 71 creates an internal cavity that is predicted to partially overlap the cleft formed by M114T, raising the intriguing possibility that both mutations destabilize PFN1 through a common mechanism that involves the loss of hydrophobic and van der Waals contacts within the same region of PFN1 (Figure V.15). Because G118V is located within a solvent-exposed flexible loop, it is difficult to predict whether this mutation propagates structural changes to the same region affected by M114T. We note that the phi and psi angles of Gly118 fall in a region of the Ramachandran plot that is generally disallowed for a valine residue; we therefore speculate that the G118V mutation also induces a conformational change within PFN1 that allows valine to adopt energetically more favorable dihedral angles.
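Whether a residue's backbone lies in an allowed Ramachandran region is judged from its phi and psi dihedral angles. The sketch below computes a dihedral angle from four atomic positions using the standard atan2 formulation. Here phi(i) uses C(i−1), N(i), CA(i), C(i), and psi(i) uses N(i), CA(i), C(i), N(i+1). The function names are hypothetical, and the sketch does not classify angles into allowed or disallowed regions for glycine or valine.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, -180 to 180) defined by four points.

    Positive when, looking along p1 -> p2, the p1-p0 bond must turn clockwise
    to eclipse the p2-p3 bond (the usual IUPAC convention).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) for one residue from its own and flanking backbone atoms."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Geometry check: outer bonds eclipsed, anti, and at a right angle.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # 0.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # 90.0
```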

Our study also provides insight into the relative pathogenicity of ALS-linked PFN1 variants. The pathogenicity of the E117G variant was called into question after it was detected in the control population (2, 14, 37, 38). Moreover, this variant exhibited mild phenotypes compared with other ALS-linked PFN1 variants in cell-based functional experiments (2, 7). Here, the E117G mutation had only a modest effect on the stability and structure of PFN1 (Table V.1 and Figure V.9), supporting the view that E117G is a risk factor for disease rather than overtly pathogenic (1, 14). Further, the E117G mutation was detected in sporadic ALS and frontotemporal lobar degeneration cases (14, 37–40), consistent with the idea that environmental factors and/or genetic modifiers contribute to PFN1 E117G toxicity. In fact, proteasome inhibition triggered the aggregation of PFN1 E117G (2), suggesting that cellular stress may exacerbate PFN1 misfolding and dysfunction in vivo.

Although the mechanism by which PFN1 contributes to ALS has yet to be fully elucidated, the destabilized mutant PFN1 species identified here could serve as an upstream trigger for either loss-of-function or gain-of-toxic-function mechanisms. Several cell-based investigations support a loss-of-function mechanism for ALS-linked PFN1 variants with respect to actin binding (2), actin dynamics (2), and stress granule assembly (7). For example, PFN1 variants immunoprecipitated less actin from mammalian cells than PFN1 WT did (2). Our in vitro results suggest that this is unlikely to be due to a general defect in the inherent ability of mutant PFN1 to bind actin directly (Figure V.14); instead, it may be a consequence of mutant PFN1 being sequestered away from actin and/or engaged in other aberrant interactions within the cell. Moreover, ALS-linked mutations do not simply abrogate the direct-binding interaction between PFN1 and the poly-L-proline motif (Figure V.13A) that is present in many biological PFN1 ligands. These data, however, do not rule out the possibility that mutation-induced misfolding and destabilization culminate in defective actin homeostasis in vivo. PFN1 plays a complex role in actin homeostasis, requiring coordinated interactions between PFN1 and many other cellular factors that ultimately dictate the fate of different actin networks within the cell (41).
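Figure V.14 reports relative rates of polymerization derived from pyrene-actin fluorescence. The source does not spell out how each rate was extracted, so the sketch below makes one common, explicitly assumed choice. The rate is taken as the steepest least-squares slope over any window of consecutive points, and the relative rate is that slope divided by the slope for actin alone. The function names, the window size, and the synthetic traces are hypothetical.

```python
import math


def max_window_slope(times, signal, window=5):
    """Steepest least-squares slope of `signal` vs `times` over `window` consecutive points."""
    if len(times) != len(signal) or len(times) < window:
        raise ValueError("need equal-length series with at least `window` points")
    best = float("-inf")
    for start in range(len(times) - window + 1):
        t = times[start:start + window]
        s = signal[start:start + window]
        t_mean = sum(t) / window
        s_mean = sum(s) / window
        num = sum((ti - t_mean) * (si - s_mean) for ti, si in zip(t, s))
        den = sum((ti - t_mean) ** 2 for ti in t)
        best = max(best, num / den)
    return best


def relative_rate(times, signal_with_pfn1, signal_actin_alone, window=5):
    """Polymerization rate with PFN1 relative to actin alone (1.0 = no suppression)."""
    return (max_window_slope(times, signal_with_pfn1, window)
            / max_window_slope(times, signal_actin_alone, window))


# Synthetic logistic-shaped traces (arbitrary units); by construction the
# "with PFN1" trace rises at half the maximal rate of the actin-alone trace.
times = [0.5 * i for i in range(200)]
actin_alone = [1 / (1 + math.exp(-(t - 30) / 5)) for t in times]
with_pfn1 = [1 / (1 + math.exp(-(t - 60) / 10)) for t in times]
print(round(relative_rate(times, with_pfn1, actin_alone), 2))
```

A value near 1.0 would indicate no suppression, and lower values indicate stronger suppression of spontaneous polymerization. This is the trend the text describes for WT PFN1 relative to H120E.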

The misfolding of PFN1 variants may also induce gain of toxic functions and interactions, the latter via aberrant protein–protein interactions through exposed hydrophobic patches, such as those detected for PFN1 M114T (Figure V.16). Further, aggregation of PFN1 variants could sequester other vital proteins, including those with poly-L-proline binding motifs (4), culminating in compromised actin and/or cellular homeostasis (6).


Although the downstream effects of ALS-linked PFN1 on actin dynamics and other cellular processes have not been elucidated, our data identify misfolded and destabilized PFN1 as a potential upstream trigger of the adverse events that culminate in ALS, opening new avenues for therapeutic advancement in ALS. One potential direction is the development of pharmacological chaperones (16). For example, small molecules that fill the void formed by the M114T mutation are expected to stabilize the protein (35). Our data with poly-L-proline (Figure V.13B) suggest that small molecules binding to other regions of PFN1 could also stabilize the protein. We posit that stabilizing mutant PFN1 will restore the normal structure and function of the protein, thereby preventing the pathogenic cascade leading to ALS.
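The stabilization argument above, and the poly-L-proline data it cites (Figure V.13B), measure stability as a shift in DSF melting temperature. The Tm-extraction procedure belongs to Figure V.2B and is not reproduced here, so the sketch below assumes one common approach. Tm is taken as the temperature of maximal dF/dT, estimated by central differences and refined by a parabola through the peak and its two neighbors. The sketch assumes evenly spaced temperatures; the function names and the synthetic melt curves are hypothetical.

```python
import math


def melting_temperature(temps, fluor):
    """Tm as the temperature of maximal dF/dT (assumes evenly spaced temperatures)."""
    if len(temps) != len(fluor) or len(temps) < 5:
        raise ValueError("need at least five paired temperature/fluorescence points")
    # Central-difference derivative; deriv[j] belongs to temps[j + 1].
    deriv = [(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
             for i in range(1, len(temps) - 1)]
    k = max(range(len(deriv)), key=deriv.__getitem__)
    tm = temps[k + 1]
    if 0 < k < len(deriv) - 1:
        y0, y1, y2 = deriv[k - 1], deriv[k], deriv[k + 1]
        curvature = y0 - 2 * y1 + y2
        if curvature != 0:
            step = temps[k + 2] - temps[k + 1]
            tm += 0.5 * (y0 - y2) / curvature * step  # vertex of the fitted parabola
    return tm


def delta_tm(temps, fluor_ligand, fluor_apo):
    """Tm shift on ligand binding; positive values indicate stabilization."""
    return melting_temperature(temps, fluor_ligand) - melting_temperature(temps, fluor_apo)


# Synthetic two-state melt curves (arbitrary units) sampled every 0.5 °C from 25 to 95 °C.
temps = [25 + 0.5 * i for i in range(141)]
apo = [1 / (1 + math.exp((50.0 - t) / 2.0)) for t in temps]
bound = [1 / (1 + math.exp((54.3 - t) / 2.0)) for t in temps]
print(round(melting_temperature(temps, apo), 2))
print(round(delta_tm(temps, bound, apo), 2))
```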


V.e. Methods

V.e.1. Cloning and overexpression of PFN1.

A pET vector containing human PFN1 flanked by NdeI and EcoRI restriction sites was kindly provided by Bruce Goode, Brandeis University, Waltham, MA. The mutant PFN1 DNA (2) was amplified using primers 5′-GGACCATATGGCCGGGTGGAAC-3′ and 5′-GCCTGAATTCTCAGTACTGGGAACGC-3′ and ligated into the pET vector using the NdeI and EcoRI restriction sites. BL21 (DE3) pLysS cells (200132; Agilent Technologies) transformed with PFN1 constructs were cultured in LB containing 100 μg·mL–1 ampicillin and 34 μg·mL–1 chloramphenicol at 37 °C to an OD600 of 0.7, at which point PFN1 expression was induced by addition of 1 mM isopropyl β-D-thiogalactopyranoside (0487; Amresco) for either 3 h at 37 °C (WT and E117G) or 24 h at 18 °C (C71G, M114T, and G118V). Cells were harvested by centrifugation and stored until purification.
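As an illustration only (not part of the original workflow), the primers above can be checked in silico for the restriction sites they are meant to carry. The Python sketch below uses hypothetical helper names (`reverse_complement`, `site_position`) and assumes uppercase DNA input. It locates the NdeI site (CATATG) in the forward primer and the EcoRI site (GAATTC) in the reverse primer, and reads the reverse primer on the coding strand.

```python
# Hypothetical sanity check of the PFN1 cloning primers quoted in the text.
NDEI_SITE = "CATATG"   # NdeI recognition sequence (cuts CA^TATG)
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence (cuts G^AATTC)

FORWARD_PRIMER = "GGACCATATGGCCGGGTGGAAC"
REVERSE_PRIMER = "GCCTGAATTCTCAGTACTGGGAACGC"

# Translation table for Watson-Crick complements (uppercase DNA only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_position(seq: str, site: str) -> int:
    """Return the 0-based index of the first occurrence of `site`, or -1."""
    return seq.find(site)


if __name__ == "__main__":
    print("NdeI site in forward primer at index", site_position(FORWARD_PRIMER, NDEI_SITE))
    print("EcoRI site in reverse primer at index", site_position(REVERSE_PRIMER, ECORI_SITE))
    # On the coding strand, the stop codon (TGA) sits just upstream of the EcoRI site.
    print("Reverse primer, coding strand:", reverse_complement(REVERSE_PRIMER))
```

Read this way, the ATG inside the NdeI site opens the reading frame (ATG GCC GGG TGG AAC, Met-Ala-Gly-Trp-Asn), and the reverse primer's coding-strand sequence places a TGA stop codon immediately before GAATTC.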


V.e.2. Purification of Recombinant PFN1.

Cells containing recombinant PFN1 were lysed by sonication in 10 mM citrate and 10 mM NaCl, pH 5.0 (buffer A), containing protease inhibitor (11873580001; Roche). The lysate was cleared by centrifugation and applied to a Nuvia cPrime hydrophobic cation exchange column (35-mL column volume; 156-3402; Bio-Rad) preequilibrated with buffer A, using an ÄKTAPurifier FPLC system (GE Healthcare). Bound impurities were eluted with a 200-mL linear gradient of 10 mM citrate and 1 M NaCl, pH 5.0 (buffer B). PFN1-containing fractions eluted at 100% buffer B, ∼300 mL from the start of the gradient.

SDS/PAGE was used to identify PFN1-containing fractions, which were pooled and dialyzed into buffer A with 6,000–8,000 molecular weight cut-off dialysis tubing (8015-40; Membrane Filtration Products, Inc.) before being applied to an anion (Q-resin) exchange column (17-0510-01; GE Healthcare). PFN1 eluted in the flow-through and was concentrated to 1–2 mL using stirred ultrafiltration cells (5123 and 5121; Millipore) and then applied to a Sephacryl S-100 HR size-exclusion column (17-1194-01; GE Healthcare) preequilibrated with PBS. PFN1 proteins eluted at ∼200 mL and were >95% pure as assessed by SDS/PAGE with Coomassie Brilliant Blue staining. The identity and purity of the PFN1 proteins were verified by intact mass analysis at the Proteomics and Mass Spectrometry Facility (University of Massachusetts Medical School). The concentration of PFN1 was determined spectrophotometrically from the absorbance at 280 nm using a molar extinction coefficient of 18,450 M–1·cm–1. Aliquots of PFN1 proteins were stored at –80 °C, typically at concentrations between 60 and 600 μM.
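The spectrophotometric concentration measurement is a Beer-Lambert calculation, c = A280 / (ε·l). The minimal Python sketch below uses the extinction coefficient quoted above; the function name, the 1-cm default path length, and the optional dilution factor are illustrative assumptions rather than details from the original protocol.

```python
# Beer-Lambert conversion for PFN1 stocks (illustrative sketch).
EPSILON_280_PFN1 = 18_450.0  # M^-1 cm^-1, value quoted in the text


def concentration_uM(a280: float, epsilon: float = EPSILON_280_PFN1,
                     path_cm: float = 1.0, dilution_factor: float = 1.0) -> float:
    """Return concentration in micromolar from A280 via c = A / (epsilon * l).

    `dilution_factor` scales the result back to the undiluted stock when the
    absorbance was read on a diluted aliquot.
    """
    molar = a280 / (epsilon * path_cm)
    return molar * dilution_factor * 1e6


if __name__ == "__main__":
    # Hypothetical reading: A280 = 0.369 on a 10-fold dilution in a 1-cm cell.
    print(round(concentration_uM(0.369, dilution_factor=10), 1))  # 200.0
```

With this coefficient, the 60–600 μM storage range corresponds to an undiluted A280 of roughly 1.1–11 in a 1-cm cell, so such stocks would presumably be read after dilution.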

When PFN1 C71G was purified from inclusion bodies, BL21 (DE3) pLysS cells expressing C71G were cultured as described for PFN1 WT, and C71G-containing inclusion bodies were extracted as previously described (42). Inclusion bodies were solubilized at ambient temperature in 50 mM Tris·HCl, pH 7.0, containing 5 mM EDTA, 5 mM DTT, and 3 M guanidinium hydrochloride (buffer C). The solubilized inclusion bodies were diluted in buffer C to a PFN1 C71G concentration of ∼5 mg·mL–1. PFN1 C71G was refolded at ambient temperature in buffer A containing 0.5 M L-arginine, under conditions where the final concentrations of guanidinium hydrochloride and C71G were below 0.1 M and 0.2 mg·mL–1, respectively. The refolded protein was dialyzed into buffer A at 4 °C and purified using a Sephacryl S-100 HR column as described for PFN1 WT.
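The two refolding constraints fix a minimum dilution of the solubilized C71G. Starting from 3 M guanidinium hydrochloride and ∼5 mg·mL–1 protein, reaching 0.1 M denaturant needs 30-fold and reaching 0.2 mg·mL–1 protein needs 25-fold, so the denaturant is limiting. The Python sketch below makes that arithmetic explicit; the helper names are hypothetical.

```python
# Hypothetical helpers for planning the C71G refolding dilution.
def min_fold_dilution(denaturant_stock_M: float, denaturant_limit_M: float,
                      protein_stock_mg_mL: float, protein_limit_mg_mL: float) -> float:
    """Smallest fold dilution that reaches both the denaturant and protein limits."""
    return max(denaturant_stock_M / denaturant_limit_M,
               protein_stock_mg_mL / protein_limit_mg_mL)


def final_concentrations(fold: float, denaturant_stock_M: float,
                         protein_stock_mg_mL: float) -> tuple:
    """Return (denaturant in M, protein in mg/mL) after a `fold`-fold dilution."""
    return denaturant_stock_M / fold, protein_stock_mg_mL / fold


if __name__ == "__main__":
    fold = min_fold_dilution(3.0, 0.1, 5.0, 0.2)
    print(fold)                                  # minimum fold dilution
    print(final_concentrations(40.0, 3.0, 5.0))  # a 40-fold dilution stays below both limits
```

Because the text specifies final concentrations *below* the limits, the dilution must exceed 30-fold; the 40-fold case is an arbitrary example, not the value used in the thesis.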


V.e.3. Equilibrium Unfolding Experiments.

For equilibrium unfolding experiments using tryptophan fluorescence, solutions of increasing urea concentration were prepared from a concentrated stock solution of 10.546 M urea in PBS using a Hamilton Microlab 500 titrator. PFN1 was mixed into the urea solutions to a final concentration of 2 μM with 1 mM tris(2-carboxyethyl)phosphine (TCEP), and the samples were equilibrated for 15–30 min. The intrinsic tryptophan fluorescence of PFN1 was measured at 25 °C with a T-format Horiba Fluorolog fluorimeter using an excitation wavelength of 295 nm. Three emission spectra (310–450 nm) were collected for each sample and averaged. The urea concentration in each sample was measured with an Abbe refractometer after data acquisition. Data were processed to obtain the center of mass (COM) of the emission spectrum.
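The COM is the intensity-weighted mean position of the emission spectrum, so a red shift on unfolding raises its value. The Python sketch below averages replicate spectra and computes the COM in wavelength units. This is an assumption: some analyses compute the COM in wavenumber instead, and the text does not say which convention Savuka used. The function names are hypothetical.

```python
# Sketch of spectral center-of-mass (COM) analysis for Trp emission spectra.
from typing import List, Sequence


def average_spectra(spectra: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point mean of replicate spectra recorded on the same wavelength grid."""
    n = len(spectra)
    return [sum(values) / n for values in zip(*spectra)]


def spectral_center_of_mass(wavelengths_nm: Sequence[float],
                            intensities: Sequence[float]) -> float:
    """Intensity-weighted mean wavelength: sum(lambda * I) / sum(I)."""
    if len(wavelengths_nm) != len(intensities):
        raise ValueError("wavelength and intensity arrays differ in length")
    total = sum(intensities)
    if total == 0:
        raise ValueError("spectrum has zero total intensity")
    return sum(w * i for w, i in zip(wavelengths_nm, intensities)) / total


if __name__ == "__main__":
    wavelengths = [340.0, 350.0, 360.0]               # toy grid, not 310-450 nm data
    replicates = [[1.0, 2.0, 1.0], [1.0, 2.0, 1.0], [1.0, 2.0, 1.0]]
    print(spectral_center_of_mass(wavelengths, average_spectra(replicates)))
```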

The COM was fit to a two-state transition model as previously described, and the thermodynamic parameters apparent ΔG° (the free energy of folding), m (the denaturant dependence of ΔG°), and Cm (the midpoint of the unfolding transition) were determined with the program Savuka (43, 44). Because the quantum yields of the native and unfolded states were within a factor of 2 of each other, the use of COM analysis is justified. We explicitly checked this with a rigorous global analysis using singular value decomposition, which showed that the fit of the urea-dependence basis vector gave thermodynamic parameters within the error of the COM and CD spectroscopy analyses, with no indication of non-two-state behavior.

For equilibrium unfolding experiments using CD spectroscopy, PFN1 (10 μM) was equilibrated in various concentrations of urea as described above, and CD spectra were acquired from 215 nm to 260 nm using a Jasco J-810 spectropolarimeter. Three spectra were averaged, and the mean residue ellipticity (MRE) at 220 nm was plotted as a function of urea concentration and fit to a two-state equilibrium unfolding model.
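For readers unfamiliar with the two-state model, the Python sketch below evaluates the standard linear-extrapolation form. The stability varies linearly with urea, ΔG([urea]) = ΔG° − m·[urea], and the observed signal (COM or MRE) is a population-weighted sum of linear native and unfolded baselines; at Cm = ΔG°/m half the protein is unfolded. ΔG° is written here as a positive unfolding free energy, the magnitude of the folding free energy named above. This sketch only evaluates the model; the thesis fits were done in Savuka. Fitting in Python would need a nonlinear least-squares routine such as `scipy.optimize.curve_fit`, which is not shown. All function and parameter names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def fraction_unfolded(urea_M: float, dG_unf: float, m_value: float,
                      temp_K: float = 298.15) -> float:
    """Two-state fraction unfolded with linear extrapolation.

    dG_unf: stability without denaturant (kcal/mol, positive for a folded protein)
    m_value: denaturant dependence of the stability (kcal mol^-1 M^-1)
    """
    dG = dG_unf - m_value * urea_M  # unfolding free energy at this urea concentration
    # f_U = K / (1 + K) with K = exp(-dG / RT), rearranged to 1 / (1 + exp(dG / RT)).
    return 1.0 / (1.0 + math.exp(dG / (R_KCAL * temp_K)))


def two_state_signal(urea_M, dG_unf, m_value, y_native, s_native,
                     y_unfolded, s_unfolded, temp_K=298.15):
    """Observed signal as a population-weighted sum of linear baselines."""
    f_u = fraction_unfolded(urea_M, dG_unf, m_value, temp_K)
    native = y_native + s_native * urea_M
    unfolded = y_unfolded + s_unfolded * urea_M
    return native * (1.0 - f_u) + unfolded * f_u


def midpoint_M(dG_unf: float, m_value: float) -> float:
    """Cm: the urea concentration at which half the protein is unfolded."""
    return dG_unf / m_value
```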

For protein refolding experiments, a concentrated stock of PFN1 (100–250 μM) denatured in urea (4–8.5 M) was diluted in urea/PBS to obtain a series of samples with decreasing urea concentrations, 10 μM PFN1, and 1 mM TCEP. Samples were equilibrated for 30 min before acquisition of fluorescence emission spectra as described above.


V.e.4. DSF.

Samples containing WT or mutant PFN1 (20 μM) in PBS with 20× SYPRO Orange (S6651; Invitrogen) were pipetted in quadruplicate into a 384-well plate and subjected to heat denaturation using a Bio-Rad CFX384 Touch Real-Time PCR Detection System. The temperature was increased from 25 °C to 100 °C in 0.3 °C increments, and at each increment fluorescence intensities were acquired using the HEX channel (excitation 515–535 nm, emission 560–580 nm). PFN1 proteins were analyzed alone and in the presence of the poly-L-proline peptide (molecular weight 1,000–10,000; P2254; Sigma). Because this peptide was supplied by the manufacturer as a mixture of poly-L-proline species, its concentration is reported here in units of proline (molecular weight 115.13 g·mol–1). For experiments with the poly-L-proline peptide, PFN1 was prepared with 4 mM proline. The fluorescence intensities of the four replicates were averaged, normalized to the maximum fluorescence intensity, and plotted as a function of temperature to obtain melting curves, which were fit with a sigmoidal function in GraphPad Prism to determine the midpoint of the transition, or apparent Tm.
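Two calculations in this DSF procedure are easy to make concrete. First, expressing poly-L-proline in proline units means that 4 mM proline corresponds to 4 × 115.13 / 1000 ≈ 0.46 mg·mL–1 of peptide. Second, the apparent Tm is the midpoint of a sigmoidal melt. The Python sketch below implements a Boltzmann sigmoid (one common choice; the exact Prism equation used is not stated in the text). It also gives a quick Tm estimate from the steepest point of the normalized curve. That estimate is a swapped-in shortcut, not the fitting procedure used in the thesis. All names are hypothetical.

```python
import math
from typing import List, Sequence

PROLINE_MW = 115.13  # g/mol, value used in the text to express poly-L-proline in proline units


def proline_mM_to_mg_per_mL(proline_mM: float) -> float:
    """Convert a proline-unit concentration (mM) to mass concentration (mg/mL)."""
    return proline_mM * PROLINE_MW / 1000.0


def normalize_to_max(values: Sequence[float]) -> List[float]:
    """Scale a melt curve so that its maximum fluorescence equals 1."""
    peak = max(values)
    return [v / peak for v in values]


def boltzmann(temp_C: float, bottom: float, top: float, tm: float, slope: float) -> float:
    """Boltzmann sigmoid; equals (bottom + top) / 2 when temp_C == tm."""
    return bottom + (top - bottom) / (1.0 + math.exp((tm - temp_C) / slope))


def tm_from_max_derivative(temps: Sequence[float], signal: Sequence[float]) -> float:
    """Quick Tm estimate: midpoint of the temperature interval with the steepest rise."""
    best_t, best_slope = float("nan"), -math.inf
    for t0, t1, f0, f1 in zip(temps, temps[1:], signal, signal[1:]):
        d = (f1 - f0) / (t1 - t0)
        if d > best_slope:
            best_slope, best_t = d, 0.5 * (t0 + t1)
    return best_t


if __name__ == "__main__":
    temps = [25.0 + 0.3 * i for i in range(251)]  # 25-100 C in 0.3 C steps, as in the text
    melt = normalize_to_max([boltzmann(t, 0.1, 1.0, 55.0, 1.5) for t in temps])  # toy curve
    print(round(tm_from_max_derivative(temps, melt), 2))
```

The derivative shortcut assumes a single rising transition; real DSF curves often decline after the peak, which is one reason a fitted sigmoid over a chosen range is generally preferred.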


V.e.5. Measuring PFN1 Turnover in Cells.

Human SKNAS cells were cultured in DMEM (11965; Gibco) containing 10% (vol/vol) FBS (F4135; Sigma-Aldrich) and 1% (wt/vol) penicillin and streptomycin (10378; Gibco) under standard culture conditions (37 °C, 5% CO2/95% air). SKNAS cells were transiently transfected with 0.5 μg of V5-PFN1 plasmids (2) in 24-well plates using 1.75 μL NeuroMag (NM50500; OZ Biosciences) diluted in Opti-MEM (38915; Invitrogen). After 12 h of V5-PFN1 expression, translation was inhibited with 30 μg·mL–1 cycloheximide (C7698; Sigma-Aldrich). At specific time points over a 12.5-h time course following cycloheximide addition, cells were lysed in RIPA buffer (BP-115-500; Boston BioProducts) supplemented with protease inhibitors (11836170001; Roche), and lysates were centrifuged at 19,357 × g for 15 min, after which the supernatant (containing soluble PFN1) was collected. The remaining pellet (containing insoluble PFN1) was washed once with RIPA lysis buffer, centrifuged again, and resolubilized in 8 M urea, in a volume equal to that of the corresponding soluble fraction. The protein concentration of the soluble fractions was determined using a bicinchoninic acid assay (23227; Thermo Scientific Pierce). Samples were processed and subjected to Western blot and densitometry analyses essentially as described (19). Western blots were probed using V5-specific (1:1,000, R96025; Invitrogen) and GAPDH-specific (1:20,000, G9545; Sigma) antibodies. Bands corresponding to soluble V5-PFN1 were normalized to the loading control, GAPDH, and then to the 0 h cycloheximide band for each protein. For each biological replicate, visible bands corresponding to insoluble V5-PFN1 were normalized to their respective 0 h PFN1 C71G band. Statistical significance was determined using a two-way ANOVA followed by Tukey’s post hoc analysis.
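The two densitometry normalizations described above can be written out explicitly. In this minimal Python sketch (hypothetical helper names), soluble V5-PFN1 is first divided by GAPDH and then by the 0 h ratio, assumed to be the first element of each list. Insoluble V5-PFN1 is divided by the 0 h PFN1 C71G band of the same replicate. The statistical test is not reproduced.

```python
from typing import List, Sequence


def normalize_soluble(v5_bands: Sequence[float], gapdh_bands: Sequence[float]) -> List[float]:
    """Soluble V5-PFN1: divide by GAPDH, then by the 0 h ratio (assumed at index 0)."""
    ratios = [v5 / gapdh for v5, gapdh in zip(v5_bands, gapdh_bands)]
    baseline = ratios[0]
    return [r / baseline for r in ratios]


def normalize_insoluble(v5_bands: Sequence[float], c71g_0h_band: float) -> List[float]:
    """Insoluble V5-PFN1: divide by the 0 h PFN1 C71G band of the same replicate."""
    return [v5 / c71g_0h_band for v5 in v5_bands]


if __name__ == "__main__":
    # Hypothetical densitometry values for one protein across a time course.
    print(normalize_soluble([100.0, 50.0, 25.0], [10.0, 10.0, 10.0]))
```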


V.e.6. CD Spectroscopy.

CD spectra of WT or mutant PFN1 (10 μM in PBS) were acquired from 190 nm to 260 nm at a scan speed of 2 s per wavelength, in a 1-mm cuvette at 25 °C, using an AVIV Biomedical CD spectrometer (model 400). Data reflect the average of five blank-subtracted scans. The resulting ellipticity curves were transformed to mean residue ellipticity as described (45).
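For reference, the Python sketch below shows the conventional conversion from observed ellipticity in millidegrees to mean residue ellipticity (deg·cm2·dmol–1): [θ] = θ / (10 · C · l · n), with C the molar protein concentration, l the path length in cm, and n the number of residues. Some conventions use n − 1 peptide bonds instead; the exact form in reference (45) should be checked. It also shows blank-subtracted averaging of replicate scans. The residue count of 140 in the example is an illustrative assumption, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_minus_blank(scans: Sequence[Sequence[float]], blank: Sequence[float]) -> List[float]:
    """Average replicate CD scans point by point and subtract the buffer blank."""
    n = len(scans)
    return [sum(col) / n - b for col, b in zip(zip(*scans), blank)]


def mean_residue_ellipticity(theta_mdeg: float, conc_M: float, path_cm: float,
                             n_residues: int) -> float:
    """Return [theta] in deg cm^2 dmol^-1 = theta(mdeg) / (10 * C * l * n)."""
    return theta_mdeg / (10.0 * conc_M * path_cm * n_residues)


if __name__ == "__main__":
    # 10 uM protein in a 1-mm (0.1 cm) cell, as in the text; 140 residues is illustrative.
    print(round(mean_residue_ellipticity(-14.0, 10e-6, 0.1, 140)))
```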


V.e.7. Acidic Native PAGE.

The method for acidic native PAGE analysis of basic proteins described by the Mario Lebendiker laboratory (wolfson.huji.ac.il/purification/) was used.

Briefly, 29:1 acrylamide-bisacrylamide (BP1408-1; Fisher Scientific) native gels were cast with 7.5% (wt/vol) polyacrylamide in the resolving gel, pH 4.3, and 3% (wt/vol) polyacrylamide in the stacking gel, pH 6.8. The gel sample containing WT or mutant PFN1 (0.8 μg·μL–1) was prepared under native conditions using ice-cold acetate–KOH, pH 6.8, and 10% (vol/vol) glycerol with 0.025% (wt/vol) methylene blue. PFN1 proteins (10 μg) were loaded onto the gel and subjected to reversed-polarity electrophoresis under ice-cold conditions for 2 h at 100 V.

The protein bands were visualized with Coomassie Brilliant Blue as described above for denaturing gels.


V.e.8. Analytical Size-Exclusion Chromatography.

WT or mutant PFN1 (50 μL of PFN1 at 0.8 μg·μL–1) was subjected to analytical size-exclusion chromatography at 4 °C using a Superdex 75 column (17-5174-01; GE Healthcare) equilibrated with PBS at a flow rate of 0.5 mL·min−1. For each trial (n = 2), elution profiles were acquired using absorbance at 280 nm and normalized to the peak value of WT PFN1. The area under the peak was calculated using GraphPad Prism.
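The normalization and peak-area steps can be sketched as follows. Each A280 trace is divided by the WT peak height, and the area is approximated with the trapezoidal rule, a standard numerical integration method. Prism's area-under-curve analysis can additionally apply a baseline, which this minimal Python sketch omits. The function names are hypothetical.

```python
from typing import List, Sequence


def normalize_to_reference_peak(trace: Sequence[float],
                                reference: Sequence[float]) -> List[float]:
    """Scale an A280 trace by the peak height of the reference (WT) trace."""
    peak = max(reference)
    return [v / peak for v in trace]


def trapezoid_area(x: Sequence[float], y: Sequence[float]) -> float:
    """Area under y(x) by the trapezoidal rule (no baseline correction)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:]))


if __name__ == "__main__":
    volumes_mL = [0.0, 1.0, 2.0]   # toy elution volumes
    a280 = [0.0, 1.0, 0.0]         # toy triangular peak
    print(trapezoid_area(volumes_mL, a280))
```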


V.e.9. Protein Crystallization and X-Ray Structural Determination.

PFN1 crystals were grown by hanging-drop vapor diffusion after mixing PFN1 protein 1:1 with reservoir solution, at 25 °C for WT and E117G and at 18 °C for M114T. The reservoir solution for WT contained 50 mM KH2PO4, 36% (wt/vol) PEG 8,000, and 100 mM MES, pH 6.0. The reservoir solution for E117G contained 50 mM KH2PO4, 41% (wt/vol) PEG 8,000, and 100 mM MES, pH 6.0. The reservoir solution for M114T contained 750 mM sodium citrate, 200 mM NaCl, and 100 mM Tris, pH 7.5.

E117G crystals were soaked in cryoprotectant composed of 25% (vol/vol) ethylene glycol and 75% (vol/vol) reservoir solution, and M114T crystals were passed through mineral oil before mounting for data collection. Diffraction data were collected using a Rigaku 007 MicroMax HF rotating anode X-ray generator, under a nitrogen cryostream at 100 K (Oxford Cryosystems), on a Saturn944+ CCD detector.

The data were processed using xia2 (46) [running XDS (47)] for WT and M114T and HKL2000 (HKL Research) for E117G. All three structures were solved via molecular replacement with Phaser (48) using the profilin structure PDB ID code 1FIK (20) as the starting model, followed by multiple rounds of manual model building in Coot (49). WT was refined with PHENIX (50) and E117G with REFMAC5 (51) using standard refinement protocols. M114T was refined with PHENIX using twin refinement, with the twin law {h,-h-k,l} applied throughout refinement, because the data were highly twinned, with a twin fraction estimated at 0.48.


V.e.10. Structural Analysis.

SiteMap (Schrödinger, LLC) was used to identify and evaluate the mutation-site cavity volumes. Figures were generated using PyMOL (Schrödinger, LLC).


V.e.11. Poly-L-Proline Peptide Binding Experiments.

Binding of PFN1 to poly-L-proline was measured as previously described (52), by monitoring the intrinsic tryptophan fluorescence of WT or ALS-linked PFN1 (2 μM) at 25 °C as a function of increasing concentrations of the poly-L-proline peptide described above. The samples were excited at 295 nm, and three emission spectra between 310 nm and 450 nm were collected for each sample and averaged. The fluorescence emission intensity at 323 nm was baseline-corrected, normalized, plotted as a function of poly-L-proline concentration, and fit to a one-site total binding model in GraphPad Prism to yield apparent Kd values.
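As a sketch of the model named above, assuming the form used by GraphPad Prism's "One site, total" equation: a saturable specific term Bmax·X/(Kd + X) plus a linear nonspecific term NS·X and a constant background. X would be the poly-L-proline concentration in proline units. The specific term is half-maximal at X = Kd. This Python function only evaluates the model; the fitting itself was done in Prism. The function and parameter names are hypothetical.

```python
def one_site_total(x: float, bmax: float, kd: float,
                   ns: float = 0.0, background: float = 0.0) -> float:
    """One site, total binding: saturable hyperbola plus linear nonspecific term."""
    specific = bmax * x / (kd + x)    # half of bmax when x == kd
    nonspecific = ns * x              # linear, non-saturating component
    return specific + nonspecific + background


if __name__ == "__main__":
    # Hypothetical parameters: normalized signal change of 1 and Kd of 5 (proline units).
    for x in (0.0, 5.0, 50.0):
        print(x, round(one_site_total(x, bmax=1.0, kd=5.0), 3))
```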


V.e.12. Inhibition of Spontaneous Actin Assembly.

Gel-filtered monomeric rabbit muscle actin (3 μM, 5% pyrene-labeled) was converted to Mg–ATP–actin immediately before use in each reaction and mixed with 7 μL of PFN1 WT or mutant at different concentrations (or control buffer) and 3 μL of 20× initiation mix (40 mM MgCl2, 10 mM ATP, and 1 M KCl) in 60-μL reactions. Actin polymerization was monitored over time at 365 nm excitation and 407 nm emission in a PTI fluorometer at 25 °C. Average relative rates of actin polymerization (n = 3) were determined from the slopes of the assembly curves during the first 500 s of each reaction and plotted against increasing concentrations of PFN1 (WT or mutant). Statistical significance was determined using a two-way ANOVA followed by Tukey’s post hoc analysis.
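The rate measurement reduces to a least-squares slope of pyrene fluorescence over the first 500 s. The minimal Python sketch below computes that slope and expresses it relative to a reference reaction. Using the actin-alone (buffer) control as the reference is an assumption; the text says "relative rates" without naming the reference. The function names are hypothetical.

```python
from typing import Sequence


def initial_slope(times_s: Sequence[float], signal: Sequence[float],
                  window_s: float = 500.0) -> float:
    """Ordinary least-squares slope of signal vs. time for t <= window_s."""
    pts = [(t, y) for t, y in zip(times_s, signal) if t <= window_s]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    return sxy / sxx


def relative_rate(slope_with_pfn1: float, slope_reference: float) -> float:
    """Polymerization rate relative to a reference reaction (assumed: actin alone)."""
    return slope_with_pfn1 / slope_reference


if __name__ == "__main__":
    times = list(range(0, 1001, 10))
    # Toy trace: linear assembly for 500 s, then a plateau that the window excludes.
    trace = [2.0 * t if t <= 500 else 1000.0 for t in times]
    print(initial_slope(times, trace))
```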


V.f. References:

1. Smith BN, et al. (2015) Novel mutations support a role for Profilin 1 in the pathogenesis of ALS. Neurobiol Aging 36(3):1602.e17–1602.e27.

2. Wu CH, et al. (2012) Mutations in the profilin 1 gene cause familial amyotrophic lateral sclerosis. Nature 488(7412):499–503.

3. Bosco DA, LaVoie MJ, Petsko GA, Ringe D (2011) Proteostasis and movement disorders: Parkinson’s disease and amyotrophic lateral sclerosis. Cold Spring Harb Perspect Biol 3(10):a007500.

4. Witke W (2004) The role of profilin complexes in cell motility and other cellular processes. Trends Cell Biol 14(8):461–469.

5. Lambrechts A, et al. (1997) The mammalian profilin isoforms display complementary affinities for PIP2 and proline-rich sequences. EMBO J 16(3):484–494.

6. Winklhofer KF, Tatzelt J, Haass C (2008) The two faces of protein misfolding: Gain- and loss-of-function in neurodegenerative diseases. EMBO J 27(2):336–349.

7. Figley MD, Bieri G, Kolaitis RM, Taylor JP, Gitler AD (2014) Profilin 1 associates with stress granules and ALS-linked mutations alter stress granule dynamics. J Neurosci 34(24):8083–8097.

8. Austin JA, et al. (2014) Disease causing mutants of TDP-43 nucleic acid binding domains are resistant to aggregation and have increased stability and half-life. Proc Natl Acad Sci USA 111(11):4309–4314.

9. Rotunno MS, Bosco DA (2013) An emerging role for misfolded wild-type SOD1 in sporadic ALS pathogenesis. Front Cell Neurosci 7:253.

10. Watanabe S, Kaneko K, Yamanaka K (2013) Accelerated disease onset with stabilized familial amyotrophic lateral sclerosis (ALS)-linked mutant TDP-43 proteins. J Biol Chem 288(5):3641–3654.

11. Eriksson AE, et al. (1992) Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science 255(5041):178–183.


12. Joerger AC, Ang HC, Fersht AR (2006) Structural basis for understanding oncogenic p53 mutations and designing rescue drugs. Proc Natl Acad Sci USA 103(41):15056–15061.

13. Yue P, Li Z, Moult J (2005) Loss of protein structure stability as a major causative factor in monogenic disease. J Mol Biol 353(2):459–473.

14. Fratta P, et al. (2014) Profilin1 E117G is a moderate risk factor for amyotrophic lateral sclerosis. J Neurol Neurosurg Psychiatry 85(5):506–508.

15. Vedadi M, et al. (2006) Chemical screening methods to identify ligands that promote protein stability, protein crystallization, and structure determination. Proc Natl Acad Sci USA 103(43):15835–15840.

16. Ringe D, Petsko GA (2009) What are pharmacological chaperones and why are they interesting? J Biol 8(9):80.

17. Verhoef LG, Lindsten K, Masucci MG, Dantuma NP (2002) Aggregate formation inhibits proteasomal degradation of polyglutamine proteins. Hum Mol Genet 11(22):2689–2700.

18. Myers JK, Pace CN, Scholtz JM (1995) Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding. Protein Sci 4(10):2138–2148.

19. Rotunno MS, et al. (2014) Identification of a misfolded region in superoxide dismutase 1 that is exposed in amyotrophic lateral sclerosis. J Biol Chem 289(41):28527–28538.

20. Fedorov AA, Pollard TD, Almo SC (1994) Purification, characterization and crystallization of human platelet profilin expressed in Escherichia coli. J Mol Biol 241(3):480–482.

21. Ferron F, Rebowski G, Lee SH, Dominguez R (2007) Structural basis for the recruitment of profilin-actin complexes during filament elongation by Ena/VASP. EMBO J 26(21):4597–4606.

22. Mahoney NM, Rozwarski DA, Fedorov E, Fedorov AA, Almo SC (1999) Profilin binds proline-rich ligands in two distinct amide backbone orientations. Nat Struct Biol 6(7):666–671.

23. Prabu-Jeyabalan M, Nalivaika EA, Romano K, Schiffer CA (2006) Mechanism of substrate recognition by drug-resistant human immunodeficiency virus type 1 protease variants revealed by a novel structural intermediate. J Virol 80(7):3607–3616.

24. Cedergren-Zeppezauer ES, et al. (1994) Crystallization and structure determination of bovine profilin at 2.0 Å resolution. J Mol Biol 240(5):459–475.

25. Chik JK, Lindberg U, Schutt CE (1996) The structure of an open state of beta-actin at 2.65 Å resolution. J Mol Biol 263(4):607–623.

26. Hájková L, Björkegren Sjögren C, Korenbaum E, Nordberg P, Karlsson R (1997) Characterization of a mutant profilin with reduced actin-binding capacity: Effects in vitro and in vivo. Exp Cell Res 234(1):66–77.

27. Korenbaum E, et al. (1998) The role of profilin in actin polymerization and nucleotide exchange. Biochemistry 37(26):9274–9283.

28. Porta JC, Borgstahl GE (2012) Structural basis for profilin-mediated actin nucleotide exchange. J Mol Biol 418(1-2):103–116.

29. Schutt CE, Myslik JC, Rozycki MD, Goonesekere NC, Lindberg U (1993) The structure of crystalline profilin-beta-actin. Nature 365(6449):810–816.

30. Sohn RH, Chen J, Koblan KS, Bray PF, Goldschmidt-Clermont PJ (1995) Localization of a binding site for phosphatidylinositol 4,5-bisphosphate on human profilin. J Biol Chem 270(36):21114–21120.

31. Suetsugu S, Miki H, Takenawa T (1998) The essential role of profilin in the assembly of actin for microspike formation. EMBO J 17(22):6516–6526.

32. Björkegren C, Rozycki M, Schutt CE, Lindberg U, Karlsson R (1993) Mutagenesis of human profilin locates its poly(L-proline)-binding site to a hydrophobic patch of aromatic amino acids. FEBS Lett 333(1-2):123–126.

33. Ostrander DB, Ernst EG, Lavoie TB, Gorman JA (1999) Polyproline binding is an essential function of human profilin in yeast. Eur J Biochem 262(1):26–35.

34. Pollard TD, Cooper JA (1984) Quantitative analysis of the effect of Acanthamoeba profilin on actin filament nucleation and elongation. Biochemistry 23(26):6631–6641.

35. Eriksson AE, Baase WA, Wozniak JA, Matthews BW (1992) A cavity-containing mutant of T4 lysozyme is stabilized by buried benzene. Nature 355(6358):371–373.


36. Ling SC, et al. (2010) ALS-associated mutations in TDP-43 increase its stability and promote TDP-43 complexes with FUS/TLS. Proc Natl Acad Sci USA 107(30):13318–13323.

37. Dillen L, et al. (2013) Explorative genetic study of UBQLN2 and PFN1 in an extended Flanders-Belgian cohort of frontotemporal lobar degeneration patients. Neurobiol Aging 34(6):1711.e1–1711.e5.

38. van Blitterswijk M, et al. (2013) Profilin-1 mutations are rare in patients with amyotrophic lateral sclerosis and frontotemporal dementia. Amyotroph Lateral Scler Frontotemporal Degener 14(5-6):463–469.

39. Tiloca C, et al. (2013) Screening of the PFN1 gene in sporadic amyotrophic lateral sclerosis and in frontotemporal dementia. Neurobiol Aging 34(5):1517.e9–1517.e10.

40. Yang S, et al. (2013) Mutation analysis and immunopathological studies of PFN1 in familial and sporadic amyotrophic lateral sclerosis. Neurobiol Aging 34(9):2235.e7–2235.e10.

41. Rotty JD, et al. (2015) Profilin-1 serves as a gatekeeper for actin assembly by Arp2/3-dependent and -independent pathways. Dev Cell 32(1):54–67.

42. Palmer I, Wingfield PT (2012) Preparation and extraction of insoluble (inclusion-body) proteins from Escherichia coli. Curr Protoc Protein Sci Chap 6, Unit 6.3.

43. Bilsel O, Yang L, Zitzewitz JA, Beechem JM, Matthews CR (1999) Time-resolved fluorescence anisotropy study of the refolding reaction of the alpha-subunit of tryptophan synthase reveals nonmonotonic behavior of the rotational correlation time. Biochemistry 38(13):4177–4187.

44. Greene RF, Jr, Pace CN (1974) Urea and guanidine hydrochloride denaturation of ribonuclease, lysozyme, alpha-chymotrypsin, and beta-lactoglobulin. J Biol Chem 249(17):5388–5393.

45. Mackness BC, Tran MT, McClain SP, Matthews CR, Zitzewitz JA (2014) Folding of the RNA recognition motif (RRM) domains of the amyotrophic lateral sclerosis (ALS)-linked protein TDP-43 reveals an intermediate state. J Biol Chem 289(12):8264–8276.

46. Winter G (2010) xia2: An expert system for macromolecular crystallography data reduction. J Appl Cryst 43:186–190.


47. Kabsch W (2010) XDS. Acta Crystallogr D Biol Crystallogr 66(Pt 2):125–132.

48. McCoy AJ, et al. (2007) Phaser crystallographic software. J Appl Cryst 40(Pt 4):658–674.

49. Emsley P, Cowtan K (2004) Coot: Model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 60(Pt 12 Pt 1):2126–2132.

50. Adams PD, et al. (2010) PHENIX: A comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr D Biol Crystallogr 66(Pt 2):213–221.

51. Murshudov GN, Vagin AA, Dodson EJ (1997) Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr 53(Pt 3):240–255.

52. Lu J, Pollard TD (2001) Profilin binding to poly-L-proline and actin monomers along with ability to catalyze actin nucleotide exchange is required for viability of fission yeast. Mol Biol Cell 12(4):1161–1175.


Chapter VI

Discussion


VI.a.1. Using A3A to understand the structural basis for substrate recognition in ssDNA-deaminating APOBECs

ssDNA-deaminating APOBECs are essential components of the immune system. The A3 subfamily of human cytidine deaminases is renowned for providing a first line of defense against many exogenous and endogenous retroviruses, while AID is responsible for somatically mutating immunoglobulins. However, the ability of these proteins to deaminate deoxycytidines in ssDNA makes APOBECs a double-edged sword. When APOBECs are overexpressed, the resulting misregulated deaminase activity can contribute to genomic instability and cancer.

The enzymology and biological consequences of APOBEC function have been extensively studied. However, as most studies focus on double-domain A3s (1-3), the mechanism by which APOBEC3s recognize and edit DNA remains largely elusive. The N-terminal domains of these double-domain A3s are insoluble and thus very difficult to study in vitro (4). Studying the active C-terminal domain alone is not sufficient: it binds substrates only weakly (5) compared with the full-length protein, and the N-terminal domain can influence the activity of double-domain A3s (6-9). In contrast to isolated CTDs of double-domain A3s, single-domain A3s can bind substrate ssDNA with Kd values as low as 100 nM (10, 11). These challenges with double-domain A3s warrant alternative strategies to elucidate the specificity and structures of A3 complexes.


A3A is a single-domain enzyme with the highest catalytic activity among the deaminases of the APOBEC superfamily (12) and a known inhibitor of HPV and of the retroelement LINE-1 (13, 14). When A3A is overexpressed or misregulated, it can also contribute to carcinogenesis (15). Because A3A is soluble in vitro as an intact protein, is catalytically efficient, and is involved in both viral restriction and cancer, it can serve as a critical benchmark for understanding the function of the APOBEC superfamily. Therefore, I chose to focus on A3A to understand the structural basis for substrate recognition in ssDNA-deaminating APOBECs.

As described in Chapter III of this thesis, the crystal structure of A3A in complex with ssDNA not only visualizes the active site poised for catalysis by A3A, but also pinpoints the residues that confer specificity toward CC/TC motifs. The A3A–ssDNA complex structure also defines the 5’–3’ directionality of the bound ssDNA and the subtle conformational changes that clench the ssDNA within the binding groove, revealing an architecture and mechanism of ssDNA recognition that are likely conserved among all polynucleotide deaminases.

Although our structure of A3A bound to ssDNA, along with another A3 co-crystal structure (16, 17), elucidated the structural basis of substrate specificity for TC, the molecular mechanism underlying specificity for the sequence flanking the TC dinucleotide remained unclear. To elucidate the substrate specificity of A3 that cannot be explained by these enzyme–substrate structures, I took a systematic approach to quantify the affinity for substrate as a function of sequence context, length, secondary structure, and solution pH, as described in Chapter IV. The A3A ssDNA binding motif was identified as (T/C)TC(A/G), which correlated with enzymatic activity. A3A’s ability to bind RNA in a sequence-specific manner was also validated. A3A was found to bind more tightly to the substrate binding motif within a hairpin loop than within a linear oligonucleotide, suggesting that A3A affinity is modulated by substrate structure.
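To make the (T/C)TC(A/G) motif concrete, the hypothetical Python helper below (not part of the Chapter IV analysis) lists candidate target cytosines in a DNA sequence. A regular-expression lookahead allows overlapping matches. The helper considers primary sequence only; it ignores the secondary-structure and pH effects that, as described above, also modulate A3A affinity. It also does not handle RNA (U in place of T).

```python
import re
from typing import List

# (T/C)TC(A/G); the zero-width lookahead lets overlapping motifs all be reported.
_A3A_MOTIF = re.compile(r"(?=([TC]TC[AG]))")


def find_a3a_motifs(seq: str) -> List[int]:
    """Return 0-based indices of the target C (the C of TC) in each motif match."""
    return [m.start() + 2 for m in _A3A_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_a3a_motifs("CTCGTTCA"))         # two candidate target cytosines
    print(find_a3a_motifs("TTTTTTTTCTTTTTT"))  # PolyT-1C: TC is followed by T, so no match
```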

Based on these findings and previously published A3A–ssDNA co-crystal structures, I proposed a new model, involving intra-DNA interactions, for the molecular mechanism underlying A3A sequence preference.

The crystal structure of DNA-bound A3A elucidates the architecture and mechanism of ssDNA recognition that are likely conserved among all polynucleotide deaminases, thereby opening the door to the design of mechanism-based therapeutics. The signature sequence and substrate structural preferences identified for A3A lead to a new paradigm for identifying A3A’s involvement in the mutation of endogenous or exogenous DNA.


VI.a.2. Nucleic acid-bound structures of APOBECs.

In Chapter III, I describe the architecture and mechanism of A3A ssDNA substrate recognition, which are likely conserved among all polynucleotide deaminases. Since Chapter III was published, three other APOBEC–nucleic acid structures have been solved. Figure VI.1 consolidates the key structures discussed in Chapters III and IV (Figure VI.1.A, B, and C), as well as the structures published since (Figure VI.1.D, E, and F). Comparison of APOBECs bound to nucleic acids with a substrate cytosine (Figure VI.2.A, B, and F) to those without (Figure VI.2.C, D, E, and F) reveals the potential for alternative nucleic acid binding sites outside the canonical active site of these proteins.

A novel human A3F-CTD co-crystal structure was solved using non-substrate Poly T ssDNA (Figure VI.1.D) (18). The A3F–Poly T structure is examined in more detail in section VI.a.3. This structure also reveals a unique zinc-coordinated dimer interface. As described in Chapter II, our apo-A3A homodimer crystal structure was also coordinated by zinc. Further studies elucidating the role of zinc in modulating APOBEC oligomerization are necessary to determine the role of zinc beyond its function in the active site of APOBECs.

The first A3H structure, of macaque A3H (macA3H) bound to dsRNA, was recently published (Figure VI.1.E) (19). This is also the first structure of an APOBEC bound to RNA, albeit of unknown sequence (the resolution was not high enough to identify the nucleotides of the co-purified RNA). Sequence alignment with other active APOBEC domains reveals that the residues involved in macA3H–RNA interactions are not conserved, other than in human A3H (Figure VI.3), suggesting that this RNA binding mode may be A3H specific.

Figure VI.1. Solved structures of APOBEC–nucleic acid complexes. A. Our crystal structure of human A3A bound to the PolyT-1C ssDNA sequence 5’-TTTTTTTTCTTTTTT-3’ (PDB ID: 5KEG). B. Human A3B–A3A active-site chimera bound to ssDNA 5’-TTTTCAT-3’ (PDB ID: 5TD5). C. Primate A3G-NTD bound to the Poly T ssDNA sequence 5’-TTTTTTTTTT-3’ (PDB ID: 5K83). D. Human A3F-CTD bound to the Poly T ssDNA sequence 5’-TTTTTTTTTT-3’ (PDB ID: 5W2M). Note that the ssDNA and the zinc binding site are located outside the active site. E. Macaque A3H bound to dsRNA of unknown sequence (PDB ID: 5W3V). F. MBP-fused human AID bound to dsDNA (5’-GTTCAAGGCCAG-3’ and 5’-CTGGCCTTGAAC-3’) and deoxycytidine monophosphate (PDB ID: 5W0U). All protein structures are shown as gray transparent surfaces and cartoons. Zinc is shown as marine spheres. Nucleic acids are shown as orange sticks.

Figure VI.2. Active site view of APOBEC–nucleic acid complexes. A. PolyT-1C ssDNA with the substrate deoxycytidine bound to the active site of human A3A (PDB ID: 5KEG). B. ssDNA with the substrate deoxycytidine bound to the active site of the human A3B–A3A active-site chimera (PDB ID: 5TD5). C. Poly T ssDNA bound to the active site of primate A3G-NTD, with only the nucleotide at the substrate position visible (PDB ID: 5K83). D. Poly T ssDNA bound to the “back side” of human A3F-CTD (PDB ID: 5W2M). E. dsRNA of unknown sequence bound to the “top” of macaque A3H (PDB ID: 5W3V). F. Deoxycytidine monophosphate bound to the active site of MBP-fused human AID; dsDNA is also bound to the “top” of the protein (PDB ID: 5W0U). All protein structures are shown as gray cartoons. Zinc is shown as marine spheres. Nucleic acids are shown as orange sticks.

Figure VI.3. macA3H RNA-binding residues. Sequence alignment of human AID, the active domains of the seven APOBEC3 subfamily members, and macA3H. Orange dots mark macA3H residues whose side chains interact with dsRNA; the corresponding orange boxes highlight the residue identity of the other APOBEC3s at each position. Grey dots mark macA3H residues with backbone interactions with dsRNA; the corresponding grey boxes highlight the residue identity of the other APOBEC3s at each position. The catalytic glutamate is denoted by an orange star. Zinc-coordinating residues are denoted by green diamonds. Identical residues are highlighted in blue, residues 80–100% identical in light blue, and 60–80% identical in teal. Active site loops are denoted by red lines. Residues that make up the active site pocket are highlighted with red dashed boxes.

The first AID structure was published as an MBP–human AID fusion protein bound to dsDNA and deoxycytidine monophosphate (Figure VI.1.F and Figure VI.2.F) (20). However, AID does not bind dsDNA in solution, and the only contacts in the crystal structure are with the phosphate backbone. The authors suggest that the dsDNA–AID interaction is a crystallization artifact: the dsDNA stacks in the crystal lattice and likely neutralizes the repulsion between the highly positively charged AID monomers. Despite major efforts by these authors, a structure of AID bound to substrate DNA in a biologically relevant manner remains elusive and is still necessary to elucidate the molecular mechanism of AID substrate recognition.

To summarize, the nucleic acid-bound structures of APOBEC superfamily members reveal both specific and nonspecific modes of binding, some unique to individual proteins and others possibly shared within the family.


VI.a.3. Implications for the role of the A3A homodimer observed in the apo-A3A structure

As described in Chapter II of this thesis, cooperative oligomerization of A3A was found to regulate the specific binding of A3A to ssDNA. Additionally, A3A forms multiple oligomeric states in solution. Our apo-A3A crystal structure reveals A3A as a homodimer with symmetric domain swapping of N-terminal residues. Mutating the homodimer interface observed in this structure decreased both the affinity for substrate and the Hill coefficient relative to wild type. These results suggest that the homodimer interface seen in our apo-A3A crystal structure mediates cooperative A3A protein-protein interactions that affect A3A activity.
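The Hill coefficient discussed above comes from fitting binding isotherms to the Hill equation, theta = [L]^n / (K^n + [L]^n), where theta is the fraction of DNA bound, [L] the A3A concentration, K the concentration giving half-maximal binding, and n the Hill coefficient (n > 1 indicates positive cooperativity). The minimal sketch below evaluates that equation and recovers n as the slope of a Hill plot. It assumes the A3A concentration is approximately the free concentration (DNA well below K), and the concentrations and parameter values are arbitrary illustrations, not the fitted values from Chapter II.

```python
import math


def hill_fraction_bound(conc: float, k_half: float, n: float) -> float:
    """Fraction of DNA bound according to the Hill equation.

    conc   -- A3A concentration (assumed ~free, i.e. DNA well below k_half)
    k_half -- concentration giving half-maximal binding, same units as conc
    n      -- Hill coefficient; n > 1 means positive cooperativity
    """
    return conc**n / (k_half**n + conc**n)


def hill_slope(c1: float, theta1: float, c2: float, theta2: float) -> float:
    """Slope of log10(theta / (1 - theta)) versus log10(conc) between two points.

    For data that follow the Hill equation this slope equals n.
    """
    y1 = math.log10(theta1 / (1.0 - theta1))
    y2 = math.log10(theta2 / (1.0 - theta2))
    return (y2 - y1) / (math.log10(c2) - math.log10(c1))


# Illustrative comparison: same K, non-cooperative (n = 1) vs cooperative (n = 2).
for n in (1.0, 2.0):
    curve = [round(hill_fraction_bound(c, 100.0, n), 3) for c in (25, 50, 100, 200, 400)]
    print(f"n = {n}: {curve}")
```

Lowering n with K held fixed leaves the midpoint unchanged but broadens the binding transition; the interface mutants described above showed this loss of cooperativity together with weaker affinity, which would appear as a larger K.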

Considering these results, one might expect a structure of an A3A-substrate complex to show a stoichiometry greater than 1:1. However, our co-crystal structure of A3A bound to ssDNA, presented in Chapter III, contained a single monomer in the asymmetric unit, bound to ssDNA with 1:1 stoichiometry. The apparent discrepancy between the results in Chapter II and the complex seen in Chapter III may be explained by a new hypothesis for the mechanism of A3A substrate binding, prompted by the third structure of an A3-ssDNA complex, reported after the publication of Chapters II and III.

The structure of human A3F-CTD bound to a non-substrate poly-T ssDNA revealed a second, nonspecific polynucleotide binding site (Figure VI.1.D) (18). This binding site is located on the "back side" of APOBEC proteins relative to the substrate binding site seen in our A3A-ssDNA structure (Figure VI.2.D). More rigorous structural analysis, combined with homology analysis across the other active APOBEC domains (Figure VI.4), suggests that, in contrast with the authors' conclusions, this second binding site may in fact be conserved among APOBEC proteins.

The ssDNA used to obtain this co-crystal structure (a poly-T 10-mer oligonucleotide) may not have bound to the active site of A3F-CTD because the sequence did not contain a substrate cytidine. The fluorescence anisotropy-based binding assays described in Chapter II also used poly-T ssDNA as a background sequence for studying A3A substrate binding. Using the poly-T sequence as a background for studying A3A substrate specificity resulted in a considerable amount of background binding, as described in Chapter IV. Thus, the nonspecific binding we observed in Chapter IV may be due to this second, nonspecific binding site. This second site could also explain the effect of ssDNA length on A3A binding affinity, with longer oligonucleotides binding more tightly, as described in Chapter IV and in previous reports on A3G substrate length dependence (21-23).

Together, the A3A homodimer from Chapter II, the A3A-ssDNA structure from Chapter III, and the A3F-CTD-ssDNA structure, along with the associated biochemical studies, suggest a new model for A3A binding to ssDNA (Figures VI.5 and VI.6). Figure VI.5 illustrates how an A3A homodimer could bind ssDNA both in the active site and at the distal, nonspecific nucleic acid binding site. Figure VI.6 illustrates how an A3A homodimer could bind two strands of single-stranded nucleic acid. Note the antiparallel directionality that results from this model (Figure VI.6A). Interestingly, this model places many homologous APOBEC residues along the path of nucleic acid binding (Figure VI.6B, C, and D). Additionally, this model requires two A3A proteins to bind one strand of single-stranded nucleic acid simultaneously (Figure VI.6D), and it would explain the structural mechanism of the cooperative binding observed for A3A, as described in Chapter II, as well as for A3F and AID (18, 20). Crystallographic, mutational, and biochemical studies are needed to determine the biological relevance of this model.

Figure VI.4. A3F-CTD poly-T ssDNA binding residues. A. A3F-CTD backbone interactions with the ssDNA phosphate backbone. B. Sequence alignment of residues involved in protein backbone-nucleic acid backbone interactions across active APOBEC domains. C. Base stacking of the Y333 side chain with a thymine base. D. Sequence alignment of residues involved in aromatic stacking protein-nucleic acid interactions across active APOBEC domains. For A and C, residues within 4 Å of the ssDNA are shown as green sticks, hydrogen bonds as gray dashes, and ssDNA as orange sticks; oxygen and nitrogen are colored red and blue, respectively. For B and D, the alignment includes human AID, the active domains of the seven members of the APOBEC3 subfamily, and macA3H. Orange dots mark A3F-CTD residues with side chain interactions with ssDNA, with corresponding orange boxes highlighting the residue identity of other APOBEC3s at each position. Grey dots mark A3F-CTD residues with backbone interactions with ssDNA, with corresponding grey boxes highlighting the residue identity of other APOBEC3s at each position. Identical residues are highlighted in blue, residues 80-100% identical in light blue, and 60-80% identical in teal.

Figure VI.5: Compilation of A3 apo and bound structures. A. Our co-crystal structure of the A3A-ssDNA complex aligned with the A3F-CTD-ssDNA structure. A3A is shown as a gray cartoon and the ssDNA bound to A3A as green sticks; the poly-T ssDNA from the A3F-CTD-ssDNA structure is depicted as orange sticks. Zinc is shown as a marine sphere. B. The compiled structure in A aligned to the homodimer in our apo-A3A crystal structure. A3A from the complex structure is shown as a gray surface. DNA is depicted as described for A.

Figure VI.6: Proposed model of an A3 homodimer cooperatively binding to ssDNA. A. Compiled structure as in Figure VI.5. ssDNA from solved crystal structures is depicted as light green sticks, and the modeled directionality of ssDNA binding is shown as a green arrow. B. Compiled structures as in A, with homologous residues highlighted on the A3A surface. C. 90º rotation of B, with poly-T ssDNA from the A3F-CTD structure shown as orange sticks. D. The view in C with the modeled directionality of ssDNA binding shown as a green arrow. A3A from the complex structure is shown as a gray surface. Identical residues are highlighted in blue, residues 80-100% identical in light blue, and 60-80% identical in teal.


VI.a.4. Applications for identifying APOBEC signature sequences in a quantitative manner

Previous studies identified the dinucleotide motif TC as the A3A signature sequence (17, 24, 25). I confirmed and expanded A3A's signature sequence to the 4-mer motif (T/C)TC(A/G). The comprehensive identification of A3A signature sequences in Chapter IV enables a more accurate evaluation of A3A activity based on sequence analysis. Previous studies used only a single A3A signature sequence to implicate A3A in viral restriction or cancer progression. Since I have identified four nearly equivalent substrate signature sequences, TTCA, TTCG, CTCA, and CTCG, I propose using this set of sequences, rather than a single one, as a more accurate method to identify A3A's involvement in mutagenesis.
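As a minimal sketch of the set-based approach proposed above, the code below scans a DNA sequence for all four signature 4-mers (TTCA, TTCG, CTCA, and CTCG) rather than for TC alone, and reports the position of each target C. Scanning both strands is an assumption made here because sequence data are usually reported relative to one reference strand; the function names are hypothetical, and this is not the analysis pipeline used in Chapter IV.

```python
import re

# The four near-equivalent A3A signature 4-mers, i.e. (T/C)TC(A/G).
A3A_MOTIFS = ("TTCA", "TTCG", "CTCA", "CTCG")
# Zero-width lookahead so that overlapping matches are all reported.
_MOTIF_RE = re.compile(r"(?=([TC]TC[AG]))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_a3a_motifs(seq: str) -> list[tuple[int, str, str]]:
    """Return (top-strand index of the target site, strand, motif), sorted.

    On the "-" strand the index is that of the top-strand G paired with the target C.
    """
    seq = seq.upper()
    hits = [(m.start() + 2, "+", m.group(1)) for m in _MOTIF_RE.finditer(seq)]
    for m in _MOTIF_RE.finditer(reverse_complement(seq)):
        # Map the target C in the reverse complement back to top-strand coordinates.
        hits.append((len(seq) - 1 - (m.start() + 2), "-", m.group(1)))
    return sorted(hits)


def motif_counts(seq: str) -> dict[str, int]:
    """Count occurrences of each signature 4-mer on both strands."""
    counts = {motif: 0 for motif in A3A_MOTIFS}
    for _, _, motif in find_a3a_motifs(seq):
        counts[motif] += 1
    return counts


print(find_a3a_motifs("GTTCAGCTCGA"))  # [(3, '+', 'TTCA'), (8, '+', 'CTCG')]
```

Reporting per-motif counts, instead of a single pooled TC count, keeps the four signatures separable, so their relative contributions can be compared against the near-equivalent preferences measured in Chapter IV.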

The quantitative and systematic approach I took to determine A3A's signature sequence can also be applied to identifying the signature sequences of other APOBECs. To date, no other study has determined any APOBEC signature sequence in a comprehensive and quantitative manner. Thus, the signature sequences previously identified may not represent the actual signature sequences of these enzymes. Additionally, the signature sequences of many other APOBECs are unknown beyond their dinucleotide motifs. Determining the sequence specificity of other APOBECs will provide a strong foundation for more accurately identifying the roles of APOBECs in cancer, viral restriction, and other cellular functions that rely on or are affected by modifications of ssDNA or RNA.
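Applying the same systematic approach to another APOBEC would begin by enumerating every candidate context around the target cytidine. The sketch below, a hypothetical helper rather than the Chapter IV protocol, lists all 64 4-mers in the same register as (T/C)TC(A/G) (positions -2, -1, 0, and +1, with C at position 0). Given a table of apparent dissociation constants measured by the user, it returns the contexts bound within a chosen fold of the tightest one; the 2-fold default is an arbitrary assumption, not a threshold used in this work.

```python
from itertools import product

BASES = "ACGT"


def candidate_contexts(target: str = "C") -> list[str]:
    """All 4-mers with the target base at position 0: N(-2) N(-1) target N(+1)."""
    return [m2 + m1 + target + p1 for m2, m1, p1 in product(BASES, repeat=3)]


def near_equivalent(kd_by_context: dict[str, float], fold: float = 2.0) -> list[str]:
    """Contexts whose apparent K_d is within `fold` of the tightest binder.

    Lower K_d means tighter binding; the fold cutoff is an arbitrary choice.
    """
    best = min(kd_by_context.values())
    return sorted(ctx for ctx, kd in kd_by_context.items() if kd <= fold * best)


contexts = candidate_contexts()
print(len(contexts), contexts[:4])  # 64 ['AACA', 'AACC', 'AACG', 'AACT']
```

Fixing the register in this way makes results from different enzymes directly comparable, since every enzyme is scored on the same 64 contexts.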


VI.a.5. pH dependence of APOBEC activity

In Chapter IV, A3A affinity for its signature sequence was measured systematically over a broad range of pH values to verify and quantify the pH dependence of A3A binding to substrate ssDNA (15, 25). A3A affinity increased as pH decreased. This pH dependence may relate to the cellular compartment in which A3A is maximally active. As described in Chapter III, analysis of our A3A-ssDNA co-crystal structure revealed a structural basis for this pH dependence: the active site His29 in loop 1 of A3A can hydrogen bond with the ssDNA backbone and the sugars of the -1 and 0 nucleotides. This hydrogen bond network could only form at pH 6.5 and below, when the histidine is protonated. Thus, the protonation state of His29 may be responsible for the change in affinity with pH described in Chapter IV and for the maximal catalytic activity at pH 6.0 reported in previous studies (15).
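The link between pH and the His29 hydrogen bond network can be made quantitative with the Henderson-Hasselbalch relation: the protonated fraction of an imidazole side chain is 1 / (1 + 10^(pH - pKa)). This minimal sketch assumes the textbook pKa of about 6.0 for a free histidine side chain. The pKa of His29 within A3A has not been measured here and could be shifted by its local environment.

```python
def fraction_protonated(ph: float, pka: float = 6.0) -> float:
    """Fraction of a histidine imidazole in the protonated (imidazolium) form.

    Henderson-Hasselbalch: [HisH+]/[His] = 10**(pKa - pH), so the protonated
    fraction is 1 / (1 + 10**(pH - pKa)). The default pKa of 6.0 is the
    approximate free-histidine value and is only an assumption for His29.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


for ph in (5.5, 6.0, 6.5, 7.0, 7.5):
    print(f"pH {ph}: {fraction_protonated(ph):.2f} protonated")
```

Under this assumption, the side chain is half protonated at pH 6.0 and less than 10% protonated at pH 7.0, so a protonation-dependent contact would weaken steeply across the near-neutral range. Rerunning the calculation with a different `pka` shows how sensitive that conclusion is to the residue's environment.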

The activity of A3G has also been shown to be pH dependent: experiments with A3G-CTD showed that cytidine deamination increases as pH decreases (2). These authors attributed this pH dependence to His216, which is also located in loop 1 of the A3G-CTD active site (2). Thus, active site histidines in A3A and A3G may represent a mechanism for regulating enzymatic activity that is conserved across proteins and may help define specificity within the APOBEC superfamily. Furthermore, the maximal activity of A3A and A3G in the acidic pH range warrants further study of their roles in endosome-related functions, such as foreign DNA sensing, and in potential exosome-related functions, such as cell-cell signaling, embryonic morphogenesis, and the regulation of host-pathogen interactions, as well as in the progression of neurodegenerative disease and cancer.

Through sequence analysis of other ssDNA-deaminating APOBECs, I found that A3H is the only other APOBEC enzyme with a histidine in its active site (Figure VI.7A). Unlike A3A and A3G, whose active site histidines are in loop 1, the active site histidine of A3H is located in loop 7. A homology model of A3H places this histidine in approximately the same region as the histidines of A3A and A3G, although in this model the A3H histidine is flipped away from the active site (Figure VI.7B). I propose that, upon substrate binding, the A3H active site histidine could flip back toward the active site to hydrogen bond with the substrate backbone, similar to what was seen for A3A.

A3H has been proposed to be the least stable and least active of the APOBECs; consequently, its activity and substrate specificity have been difficult to characterize.

Studying A3H activity and specificity in a quantitative and systematic way, as described in Chapter IV for A3A, may reveal conditions in which A3H is more active than previously reported. Further studies of A3H activity and specificity are warranted to determine whether A3H is also regulated by pH.


Figure VI.7. Active site histidines in APOBEC enzymes. A. Sequence alignment of the active site loops of all members of the APOBEC superfamily. B. Solved structures of A3A and A3G-CTD and a homology model of A3H. Surfaces are depicted in gray, Z1 domains as red cartoon (A3A and A3G-CTD), and the Z3 domain as green cartoon (A3H). The active site histidine is shown as orange sticks, with nitrogen and oxygen atoms colored blue and red, respectively. Active site zinc is shown as gray spheres.


VI.a.6. Implications of deamination of RNA by APOBEC

Chapter IV describes, for the first time, A3A's ability to bind RNA in a highly specific, structural context-dependent manner. Previous reports suggested that A3A binds RNA only weakly and is not an RNA deaminase; however, these experiments were performed with unstructured RNA (23). Our finding that A3A binds hairpin RNA but not linear ssRNA, together with a recent study describing A3A's preference for deaminating cytidines in the loop region of stable RNA hairpins (26), indicates that A3A's RNA deamination activity is highly dependent on sequence context and secondary structure.

Analysis of our A3A-ssDNA structure from Chapter III also suggests a potential structural basis for A3A binding and deaminating RNA. A3A could conceivably bind RNA in the same manner as substrate DNA. The extra 2'-oxygen of the cytidine ribose could easily be accommodated by the highly conserved residue Tyr130 and the zinc-coordinating His70 (Figure VI.8A and B). Other ribose 2'-oxygens in close proximity to A3A, at the -1 and -2 positions, could also be accommodated by residues Tyr132, Gly27, and His29 (Figure VI.8C and D). A crystal structure of A3A bound to RNA is essential for determining the mechanism of A3A RNA deamination and would provide a foundation for using structural analysis to identify other A3s with the potential to deaminate RNA. A3A's ability to edit RNA opens up a new class of potential substrates that would broaden the biological role of this enzyme.
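Identifying residues that could accommodate the extra 2'-oxygen of an RNA substrate, as in Figure VI.8, amounts to a distance query around the modeled oxygen position. The minimal sketch below is written in plain Python. It assumes that the 2'-oxygen coordinate has already been modeled onto the DNA sugar and that atoms are supplied as (residue, atom, coordinates) tuples. The 4 Å default is similar to the cutoff used for Figure VI.4. The function name is hypothetical, and the example coordinates are invented for illustration.

```python
import math


def residues_near(point, atoms, cutoff=4.0):
    """Residues with any atom within `cutoff` angstroms of `point`.

    point -- (x, y, z) of the modeled ribose 2'-oxygen
    atoms -- iterable of (residue_label, atom_name, (x, y, z))
    Returns (residue_label, closest_distance) pairs, nearest first.
    """
    closest = {}
    for residue, _atom, xyz in atoms:
        d = math.dist(point, xyz)
        # Keep only the closest atom of each residue that lies within the cutoff.
        if d <= cutoff and d < closest.get(residue, math.inf):
            closest[residue] = d
    return sorted(closest.items(), key=lambda item: item[1])


# Invented coordinates, for illustration only.
toy_atoms = [
    ("Tyr130", "OH", (0.0, 0.0, 3.0)),
    ("Tyr130", "CZ", (0.0, 0.0, 4.5)),
    ("His70", "NE2", (0.0, 3.5, 0.0)),
    ("Gly27", "CA", (6.0, 0.0, 0.0)),
]
print(residues_near((0.0, 0.0, 0.0), toy_atoms))  # [('Tyr130', 3.0), ('His70', 3.5)]
```

For real entries such as 5KEG or 5SWW, a parsed structure and a spatial index (for example, Biopython's `Bio.PDB.NeighborSearch`) would replace this linear scan.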


Figure VI.8. Potential structural mechanism for A3A binding RNA. A. Our A3A-PolyT-1C co-crystal structure (PDB ID: 5KEG) and B. A3A-ATCG structure (PDB ID: 5SWW), illustrating the substrate cytidine and the A3A residues near the position the extra oxygen would occupy in an RNA cytidine substrate. C. Residues near the -1T and D. near the -2A ribose oxygen positions in PDB 5SWW. A3A residues in proximity to the extra ribose oxygen are shown as purple sticks. DNA is shown in orange. The location of the extra oxygen is depicted in green. The catalytic zinc is shown as gray-blue spheres and the active site oxygen as a red sphere. Oxygen and nitrogen atoms are colored red and blue, respectively.


VI.b.1. Elucidating the structural basis for mutation-induced destabilization of profilin 1 in ALS

Profilin-1 (PFN1) is a small protein that binds monomeric actin to enhance actin polymerization, a critical process for axon outgrowth in motor neurons. Mutations in the PFN1 gene were recently associated with both familial and sporadic forms of ALS (12, 27). Although ALS-linked mutations have been shown to induce PFN1 aggregation, the effect of these mutations on protein stability and structure has not been studied (12). Chapter V focuses on determining the effects of these disease-causing mutations on the structure and stability of PFN1 in order to elucidate the mechanism by which these single point mutations cause disease.

To visualize the effects of disease-causing single point mutations on PFN1 structure, I solved the crystal structures of three PFN1 proteins, including the WT protein. Analysis of these crystal structures revealed that, compared with WT PFN1, the M114T mutation creates a cleft that extends into the interior of the protein. Additionally, I developed a model in which the most severely destabilizing mutation, C71G, creates a cavity near the core of PFN1, proximal to the cleft formed by M114T. This model is based on studies of other proteins demonstrating that single point mutations that create buried (non-surface-exposed) cavities can severely destabilize a protein's native conformation (28, 29). The structure of E117G, a variant predicted to have low pathogenicity, closely resembled the WT PFN1 structure, consistent with the occurrence of this mutation in control populations in previous studies (12).

Overall, Chapter V implicates a destabilized form of PFN1 in ALS pathogenesis.


VI.b.2. Characterizing the conformation and local stability of PFN1-ALS mutants in solution.

In Chapter V, the structural effects of two of the four ALS-associated PFN1 mutations were elucidated. Although many crystallization trials were performed for the remaining two mutants, G118V and C71G, their structures remain elusive. To characterize PFN1 mutants that are not amenable to crystallographic studies, I propose studying the low-energy native states of these mutants in solution using hydrogen exchange (HX) coupled with high-resolution mass spectrometry (MS). This approach could also be applied to M114T to complement the structural analysis described in Chapter V. Specifically, HX-MS could determine whether the cleft revealed by our crystal structure is also present in PFN1 M114T in solution. Furthermore, native-state HX-MS could be used to characterize the local stability of PFN1-ALS mutants in solution by probing their high-energy states. Since partial unfolding of a protein can cause it to aggregate, identifying changes in the local stability of these mutants may help explain why they are more prone to aggregation.
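Native-state HX data are commonly interpreted in the EX2 regime. There, the observed exchange rate of a backbone amide is k_obs ≈ K_op · k_int when K_op is much less than 1. Here K_op is the equilibrium constant for the opening event that exposes the amide, and k_int is the intrinsic exchange rate of an unprotected amide. The apparent free energy of opening is then ΔG_HX = -RT ln(k_obs / k_int), and the protection factor is k_int / k_obs. The sketch below simply evaluates these relations. It assumes EX2 behavior and that k_int has been obtained separately (it is usually predicted from sequence, pH, and temperature), and its function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal / (mol K)


def protection_factor(k_int: float, k_obs: float) -> float:
    """How many times slower an amide exchanges than it would if fully unprotected."""
    return k_int / k_obs


def delta_g_hx(k_int: float, k_obs: float, temp_k: float = 298.15) -> float:
    """Apparent opening free energy in kcal/mol, assuming EX2 exchange.

    Under EX2, k_obs ~= K_op * k_int, so dG = -RT ln(k_obs / k_int) = RT ln(P).
    """
    return R_KCAL * temp_k * math.log(k_int / k_obs)


def delta_delta_g_hx(k_int: float, k_obs_wt: float, k_obs_mut: float,
                     temp_k: float = 298.15) -> float:
    """Local stability change at one site; positive means the mutant is less stable than WT."""
    return delta_g_hx(k_int, k_obs_wt, temp_k) - delta_g_hx(k_int, k_obs_mut, temp_k)


print(round(delta_g_hx(1000.0, 1.0), 2))  # 4.09
```

Mapping per-site ΔΔG_HX values onto the WT structure would show whether destabilization in a mutant is localized, for example around the M114T cleft or the putative C71G cavity, or is spread across the protein.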


VI.b.3. Elucidating the role of single point mutations on the function of PFN1.

PFN1 is best known for its role in actin dynamics in the context of endocytosis, membrane trafficking, cell motility, and neuronal growth and differentiation (30). In addition to binding monomeric (G-) actin, PFN1 binds a host of other proteins through their poly-L-proline motifs, as well as lipids such as phosphatidylinositol 4,5-bisphosphate (30, 31). Beyond their effects on protein stability, determining how the disease-causing PFN1 mutations affect ligand binding could further elucidate the mechanisms that contribute to their pathogenicity.
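Differences in ligand binding between WT and mutant PFN1, for example binding to poly-L-proline peptides, actin, or phosphatidylinositol 4,5-bisphosphate, are often summarized as a binding free energy change, ΔΔG_bind = RT ln(K_d,mutant / K_d,WT), where positive values indicate weaker binding by the mutant. This minimal sketch assumes that both dissociation constants were measured under identical conditions and in the same units. The function name is hypothetical, and no measured PFN1 values are implied.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal / (mol K)


def ddg_binding(kd_wt: float, kd_mut: float, temp_k: float = 298.15) -> float:
    """Binding free energy change (kcal/mol) of a mutant relative to WT.

    dG_bind = RT ln(K_d) (1 M standard state), so the difference is
    RT ln(K_d,mut / K_d,WT); positive means the mutant binds more weakly.
    """
    if kd_wt <= 0 or kd_mut <= 0:
        raise ValueError("dissociation constants must be positive")
    return R_KCAL * temp_k * math.log(kd_mut / kd_wt)


# A 10-fold weaker K_d at 25 °C corresponds to about 1.36 kcal/mol.
print(f"{ddg_binding(1.0, 10.0):.2f} kcal/mol")
```

Expressing each binding change on this energy scale puts ligand-binding defects on the same footing as the stability changes discussed above, which makes it easier to judge which effect dominates for a given mutation.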


VI.b.4. Designing small molecule therapies to stabilize PFN1 disease causing mutants.

Data described in Chapter V identify misfolded and destabilized PFN1 as a potential upstream trigger of the adverse events that culminate in ALS, opening up new avenues for therapeutic approaches to ALS. One approach to designing therapies for PFN1-associated ALS is to develop specific pharmacological chaperones for each ALS-causing PFN1 mutant. For example, a small molecule that fills the cleft formed by the M114T mutation or the putative cavity formed by the C71G mutation could be designed to stabilize PFN1. Stabilizing ALS-linked PFN1 mutants could restore the normal structure and function of the protein, thereby preventing the pathogenic cascade leading to ALS.

A second therapeutic approach was developed based on the stabilizing effect of poly-L-proline on mutant PFN1 shown in Chapter V. We hypothesize that small molecules designed to bind allosteric sites of mutant PFN1 could stabilize the protein. Such an allosteric stabilizer could provide a single therapy for any ALS-associated PFN1 mutant.


VI.c. References:

1. Harjes E, Gross PJ, Chen KM, Lu Y, Shindo K, Nowarski R, Gross JD, Kotler M, Harris RS, Matsuo H. An extended structure of the APOBEC3G catalytic domain suggests a unique holoenzyme model. J Mol Biol. 2009;389(5):819-32. PubMed PMID: 19389408.

2. Harjes S, Solomon WC, Li M, Chen KM, Harjes E, Harris RS, Matsuo H. Impact of H216 on the DNA binding and catalytic activities of the HIV restriction factor APOBEC3G. J Virol. 2013;87(12):7008-14. Epub 2013/04/19. doi: 10.1128/JVI.03173-12. PubMed PMID: 23596292; PMCID: PMC3676121.

3. Furukawa A, Nagata T, Matsugami A, Habu Y, Sugiyama R, Hayashi F, Kobayashi N, Yokoyama S, Takaku H, Katahira M. Structure, interaction and real-time monitoring of the enzymatic reaction of wild-type APOBEC3G. EMBO J. 2009;28(4):440-51. PubMed PMID: 19153609.

4. Kouno T, Luengas EM, Shigematsu M, Shandilya SM, Zhang J, Chen L, Hara M, Schiffer CA, Harris RS, Matsuo H. Structure of the Vif-binding domain of the antiviral enzyme APOBEC3G. Nat Struct Mol Biol. 2015;22(6):485-91. doi: 10.1038/nsmb.3033. PubMed PMID: 25984970; PMCID: PMC4456288.

5. Ara A, Love RP, Chelico L. Different mutagenic potential of HIV-1 restriction factors APOBEC3G and APOBEC3F is determined by distinct single-stranded DNA scanning mechanisms. PLoS Pathog. 2014;10(3):e1004024. doi: 10.1371/journal.ppat.1004024. PubMed PMID: 24651717; PMCID: PMC3961392.

6. Feng Y, Chelico L. Intensity of deoxycytidine deamination of HIV-1 proviral DNA by the retroviral restriction factor APOBEC3G is mediated by the noncatalytic domain. J Biol Chem. 2011;286(13):11415-26. doi: 10.1074/jbc.M110.199604. PubMed PMID: 21300806; PMCID: PMC3064197.

7. Bulliard Y, Turelli P, Rohrig UF, Zoete V, Mangeat B, Michielin O, Trono D. Functional analysis and structural modeling of human APOBEC3G reveal the role of evolutionarily conserved elements in the inhibition of human immunodeficiency virus type 1 infection and Alu transposition. J Virol. 2009;83(23):12611-21. doi: 10.1128/JVI.01491-09. PubMed PMID: 19776130; PMCID: PMC2786736.

8. Song C, Sutton L, Johnson ME, D'Aquila RT, Donahue JP. Signals in APOBEC3F N-terminal and C-terminal deaminase domains each contribute to encapsidation in HIV-1 virions and are both required for HIV-1 restriction. J Biol Chem. 2012;287(20):16965-74. doi: 10.1074/jbc.M111.310839. PubMed PMID: 22451677; PMCID: PMC3351310.

9. Pak V, Heidecker G, Pathak VK, Derse D. The role of amino-terminal sequences in cellular localization and antiviral activity of APOBEC3B. J Virol. 2011;85(17):8538-47. doi: 10.1128/JVI.02645-10. PubMed PMID: 21715505; PMCID: PMC3165795.

10. Love RP, Xu H, Chelico L. Biochemical analysis of hypermutation by the deoxycytidine deaminase APOBEC3A. J Biol Chem. 2012;287(36):30812-22. doi: 10.1074/jbc.M112.393181. PubMed PMID: 22822074; PMCID: PMC3436324.

11. Bohn MF, Shandilya SM, Silvas TV, Nalivaika EA, Kouno T, Kelch BA, Ryder SP, Kurt-Yilmaz N, Somasundaran M, Schiffer CA. The ssDNA Mutator APOBEC3A Is Regulated by Cooperative Dimerization. Structure. 2015;23(5):903-11. doi: 10.1016/j.str.2015.03.016. PubMed PMID: 25914058.

12. Wu CH, Fallini C, Ticozzi N, Keagle PJ, Sapp PC, Piotrowska K, Lowe P, Koppers M, McKenna-Yasek D, Baron DM, Kost JE, Gonzalez-Perez P, Fox AD, Adams J, Taroni F, Tiloca C, Leclerc AL, Chafe SC, Mangroo D, Moore MJ, Zitzewitz JA, Xu ZS, van den Berg LH, Glass JD, Siciliano G, Cirulli ET, Goldstein DB, Salachas F, Meininger V, Rossoll W, Ratti A, Gellera C, Bosco DA, Bassell GJ, Silani V, Drory VE, Brown RH, Jr., Landers JE. Mutations in the profilin 1 gene cause familial amyotrophic lateral sclerosis. Nature. 2012;488(7412):499-503. doi: 10.1038/nature11280. PubMed PMID: 22801503; PMCID: PMC3575525.

13. Bogerd HP, Wiegand HL, Hulme AE, Garcia-Perez JL, O'Shea KS, Moran JV, Cullen BR. Cellular inhibitors of long interspersed element 1 and Alu retrotransposition. Proc Natl Acad Sci U S A. 2006;103(23):8780-5. doi: 10.1073/pnas.0603313103. PubMed PMID: 16728505; PMCID: PMC1482655.

14. Vartanian JP, Guetard D, Henry M, Wain-Hobson S. Evidence for editing of human papillomavirus DNA by APOBEC3 in benign and precancerous lesions. Science. 2008;320(5873):230-3. doi: 10.1126/science.1153201. PubMed PMID: 18403710.

15. Pham P, Landolph A, Mendez C, Li N, Goodman MF. A biochemical analysis linking APOBEC3A to disparate HIV-1 restriction and skin cancer. J Biol Chem. 2013;288(41):29294-304. doi: 10.1074/jbc.M113.504175. PubMed PMID: 23979356; PMCID: PMC3795231.


16. Kouno T, Silvas TV, Hilbert BJ, Shandilya SMD, Bohn MF, Kelch BA, Royer WE, Somasundaran M, Kurt Yilmaz N, Matsuo H, Schiffer CA. Crystal structure of APOBEC3A bound to single-stranded DNA reveals structural basis for cytidine deamination and specificity. Nat Commun. 2017;8:15024. doi: 10.1038/ncomms15024. PubMed PMID: 28452355; PMCID: PMC5414352.

17. Shi K, Carpenter MA, Banerjee S, Shaban NM, Kurahashi K, Salamango DJ, McCann JL, Starrett GJ, Duffy JV, Demir O, Amaro RE, Harki DA, Harris RS, Aihara H. Structural basis for targeted DNA cytosine deamination and mutagenesis by APOBEC3A and APOBEC3B. Nat Struct Mol Biol. 2017;24(2):131-9. doi: 10.1038/nsmb.3344. PubMed PMID: 27991903; PMCID: PMC5296220.

18. Fang Y, Xiao X, Li SX, Wolfe A, Chen XS. Molecular Interactions of a DNA Modifying Enzyme APOBEC3F Catalytic Domain with a Single-Stranded DNA. J Mol Biol. 2018;430(1):87-101. doi: 10.1016/j.jmb.2017.11.007. PubMed PMID: 29191651; PMCID: PMC5738261.

19. Bohn JA, Thummar K, York A, Raymond A, Brown WC, Bieniasz PD, Hatziioannou T, Smith JL. APOBEC3H structure reveals an unusual mechanism of interaction with duplex RNA. Nat Commun. 2017;8(1):1021. doi: 10.1038/s41467-017-01309-6. PubMed PMID: 29044109; PMCID: PMC5647330.

20. Qiao Q, Wang L, Meng FL, Hwang JK, Alt FW, Wu H. AID Recognizes Structured DNA for Class Switch Recombination. Mol Cell. 2017;67(3):361-73 e4. doi: 10.1016/j.molcel.2017.06.034. PubMed PMID: 28757211.

21. Chelico L, Pham P, Calabrese P, Goodman MF. APOBEC3G DNA deaminase acts processively 3' --> 5' on single-stranded DNA. Nat Struct Mol Biol. 2006. PubMed PMID: 16622407.

22. Byeon IJ, Ahn J, Mitra M, Byeon CH, Hercik K, Hritz J, Charlton LM, Levin JG, Gronenborn AM. NMR structure of human restriction factor APOBEC3A reveals substrate binding and enzyme specificity. Nat Commun. 2013;4:1890. doi: 10.1038/ncomms2883. PubMed PMID: 23695684; PMCID: PMC3674325.

23. Mitra M, Hercik K, Byeon IJ, Ahn J, Hill S, Hinchee-Rodriguez K, Singer D, Byeon CH, Charlton LM, Nam G, Heidecker G, Gronenborn AM, Levin JG. Structural determinants of human APOBEC3A enzymatic and nucleic acid binding properties. Nucleic Acids Res. 2014;42(2):1095-110. doi: 10.1093/nar/gkt945. PubMed PMID: 24163103; PMCID: PMC3902935.


24. Stenglein MD, Burns MB, Li M, Lengyel J, Harris RS. APOBEC3 proteins mediate the clearance of foreign DNA from human cells. Nat Struct Mol Biol. 2010;17(2):222-9. PubMed PMID: 20062055.

25. Byeon IJ, Byeon CH, Wu T, Mitra M, Singer D, Levin JG, Gronenborn AM. Nuclear Magnetic Resonance Structure of the APOBEC3B Catalytic Domain: Structural Basis for Substrate Binding and DNA Deaminase Activity. Biochemistry. 2016;55(21):2944-59. doi: 10.1021/acs.biochem.6b00382. PubMed PMID: 27163633.

26. Sharma S, Baysal BE. Stem-loop structure preference for site-specific RNA editing by APOBEC3A and APOBEC3G. PeerJ. 2017;5:e4136. doi: 10.7717/peerj.4136. PubMed PMID: 29230368; PMCID: PMC5723131.

27. Smith BN, Vance C, Scotter EL, Troakes C, Wong CH, Topp S, Maekawa S, King A, Mitchell JC, Lund K, Al-Chalabi A, Ticozzi N, Silani V, Sapp P, Brown RH, Jr., Landers JE, Al-Sarraj S, Shaw CE. Novel mutations support a role for Profilin 1 in the pathogenesis of ALS. Neurobiol Aging. 2015;36(3):1602 e17-27. doi: 10.1016/j.neurobiolaging.2014.10.032. PubMed PMID: 25499087; PMCID: PMC4357530.

28. Eriksson AE, Baase WA, Zhang XJ, Heinz DW, Blaber M, Baldwin EP, Matthews BW. Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science. 1992;255(5041):178-83. PubMed PMID: 1553543.

29. Joerger AC, Ang HC, Fersht AR. Structural basis for understanding oncogenic p53 mutations and designing rescue drugs. Proc Natl Acad Sci U S A. 2006;103(41):15056-61. doi: 10.1073/pnas.0607286103. PubMed PMID: 17015838; PMCID: PMC1635156.

30. Witke W. The role of profilin complexes in cell motility and other cellular processes. Trends Cell Biol. 2004;14(8):461-9. doi: 10.1016/j.tcb.2004.07.003. PubMed PMID: 15308213.

31. Lambrechts A, Verschelde JL, Jonckheere V, Goethals M, Vandekerckhove J, Ampe C. The mammalian profilin isoforms display complementary affinities for PIP2 and proline-rich sequences. EMBO J. 1997;16(3):484-94. doi: 10.1093/emboj/16.3.484. PubMed PMID: 9034331; PMCID: PMC1169652.
