<<

MECHANISMS OF CRISPR-MEDIATED

IMMUNITY IN

by

Paul Bertram Geert van Erp

A dissertation submitted in partial fulfillment of the requirements for the degree

of

Doctor of Philosophy

in

Microbiology

MONTANA STATE UNIVERSITY Bozeman, Montana

November 2019

iii

©COPYRIGHT

by

Paul Bertram Geert van Erp

2019

All Rights Reserved

iv

ACKNOWLEDGEMENTS

My doctoral work has been an adventurous journey that has only been possible thanks to the help and support of many people. First and foremost, I’d like to thank my advisor Blake Wiedenheft. Blake has taught me how to ask good scientific questions, to consider why one would care for the answers, and how one can convey them precisely and clearly to others. His enthusiasm for scientific discovery is infectious, and some of my favorite times in the Wiedenheft lab were spent brainstorming during lab meetings or impromptu meetings in Blake’s office. Thank you for being an awesome mentor!

I’d like to thank all past and present members of the Wiedenheft lab whom have made it such a fun and stimulating place to do research, especially MC, Sarah, Royce,

Ryan, and Emma. Also, I’d like to thank my undergraduate students for their commitment and enthusiasm. I’m grateful for my committee members and our collaborators that have helped made this work possible, especially Ailong Ke’s lab, and Brian Bothner’s lab and

Angela Patterson in particular. I’m grateful for the wisdom of Coach Corey Wayne; thanks to you I’ve become more certain, balanced, and peaceful.

I’d like to thank my friends, both in Bozeman and in the Netherlands. Good friends are hard to come by, and you’ve made life outside the lab highly enjoyable. A special thanks goes out to my parents, Jan and José, and my brother, Hannes. Bedankt voor alle steun, jullie staan altijd voor mij te juichen en zijn altijd geïnteresseerd in mijn onderzoek, ook als het ingewikkeld is. Ik hou van jullie. Finally, I want to thank my biggest supporter, best friend, and love of my life, Jenny. Your love and support have been invaluable during this journey and I can wait to see what the future brings for us! v

TABLE OF CONTENTS

1. THE HISTORY AND MARKET IMPACT OF CRISPR RNA-GUIDED NUCLEASES ...... 1

Contribution of Authors and Co-Authors ...... 1 Manuscript Information ...... 2 Abstract ...... 4 Main text ...... 5 A goldmine for biotechnology ...... 6 Why all the fuss? ...... 7 The Emerging Market ...... 11 Acknowledgements...... 14 References ...... 15

2. CONFORMATIONAL REGULATION OF CRISPR-ASSOCIATED NUCLEASES ...... 20

Contribution of Authors and Co-Authors ...... 20 Manuscript Information ...... 21 Abstract ...... 23 Introduction ...... 24 Regulation of Class 2 Cas nucleases ...... 27 Type II systems ...... 27 Type V systems ...... 31 Type VI systems ...... 32 Regulation of Class 1 nucleases that are essential for interference...... 33 Type I systems ...... 33 Type III systems...... 37 Perspective ...... 38 Acknowledgements...... 39 References ...... 40

3. CRYSTAL STRUCTURE OF THE CRISPR RNA-GUIDED SURVEILLANCE COMPLEX FROM ESCHERICHIA COLI ...... 50

Contribution of Authors and Co-Authors ...... 50 Manuscript Information ...... 52 Abstract ...... 54 Introduction ...... 55 Results ...... 56 Overview of the Cascade structure ...... 56 vi

TABLE OF CONTENTS CONTINUED

Mechanism of RNA recognition by the Cas6e endonuclease ...... 58 Assembly of the Cas7 backbone...... 60 Programmed tail assembly ...... 66 Discussion ...... 70 Acknowledgements...... 72 References ...... 74 Materials and Methods ...... 78 Cascade expression and purification ...... 78 Cascade crystallization and structure determination ...... 78 Labeling of ...... 80 Electrophoretic Mobility Shift Assay (EMSA) ...... 80 Structure alignments and graphical rendering ...... 81 Supplemental Figures ...... 82 Supplemental Tables ...... 96 Supplemental Movies ...... 98 Supplemental References ...... 99

4. MECHANISM OF CRISPR-RNA GUIDED RECOGNITION OF DNA TARGETS IN ESCHERICHIA COLI ...... 101

Contribution of Authors and Co-Authors ...... 101 Manuscript Information ...... 102 Abstract ...... 104 Introduction ...... 105 Materials and Methods ...... 107 Cascade expression and purification ...... 107 Cascade crystallization and structure determination ...... 108 Structural analysis and graphical rendering ...... 109 Electrophoretic Mobility Shift Assay (EMSA) ...... 110 Curing Assays ...... 111 Results ...... 112 A lysine-rich vise is essential for dsDNA binding and unwinding...... 112 PAM recognition and strand invasion...... 116 An arginine relay facilitates conformational rearrangements during target binding...... 119 Discussion ...... 123 Acknowledgments ...... 128 References ...... 129 Supplemental Figures ...... 134 Supplemental Tables ...... 146 Supplemental Movies ...... 150 vii

TABLE OF CONTENTS CONTINUED

5. STRUCTURAL BASIS FOR PROMISCUOUS PAM RECOGNITION IN THE TYPE I-E CASCADE FROM E. COLI ...... 151

Contribution of Authors and Co-Authors ...... 151 Manuscript Information ...... 152 Abstract ...... 154 Main text ...... 155 Materials and Methods ...... 165 Expression and purification of Cascade, Cascade-dsDNA complex and Cas3 ...... 165 Crystallization and structure determination of Cascade/partial R-loop complex ...... 167 Electrophoretic mobility shift assay ...... 167 Cascade-mediated Cas3 DNA cleavage assay...... 168 Acknowledgments ...... 169 References ...... 170 Supplemental Figures ...... 174 Supplemental Tables ...... 183

6. CONFORMATIONAL DYNAMICS OF DNA BINDING AND CAS3 RECRUITMENT BY THE CRISPR RNA-GUIDED CASCADE COMPLEX ...... 185

Contribution of Authors and Co-Authors ...... 185 Manuscript Information ...... 187 Abstract ...... 189 Introduction ...... 190 Results ...... 192 DNA binding ...... 195 Path of the Displaced DNA Strand ...... 198 Recruitment of the nuclease-helicase Cas3...... 202 Discussion ...... 206 Acknowledgments ...... 210 Materials and Methods ...... 210 Hydrogen-Deuterium Exchange coupled with Mass Spectrometry (HDX-MS): ...... 210 Cascade expression and purification ...... 213 Cas3 expression and purification ...... 214 Oligoduplex preparation and DNA labeling...... 215 Electrophoretic mobility shift assay (EMSA) ...... 215 viii

TABLE OF CONTENTS CONTINUED

Plasmid-curing assays ...... 216 Cascade/Cas3 mediated DNA degradation ...... 216 References ...... 218 Supplemental Figures ...... 222 Supplemental Tables ...... 236

7. PROTEIN OVEREXPRESSION REDUCES SPECIFIC PHAGE INFECTIVITY IN PROKARYOTIC ARGONAUTE SCREEN ...... 237

Contribution of Authors and Co-Authors ...... 237 Manuscript Information ...... 238 Abstract ...... 240 Introduction ...... 241 Results ...... 245 Discussion ...... 253 Materials and Methods ...... 256 E. coli strains ...... 256 Argonaute selection and cloning...... 256 Phage preparation and purification ...... 257 Phage plate spot plaque assays ...... 258 Phylogenetic Analysis ...... 258 pAgo guide Analysis ...... 259 References ...... 260 Supplementary Figures ...... 264 Supplementary Tables ...... 266

8. GENERAL CONCLUSION AND OUTLOOK...... 267

The Type I-E CRISPR System ...... 268 Target Binding and PAM Recognition ...... 269 Conformational Control and Cas3 Recruitment ...... 270 References ...... 273

REFERENCES CITED ...... 275

ix

LIST OF TABLES

Table Page

1.1 Industry interests in Cas9-based technologies...... 12

S3.1. Data collection and refinement statistics ...... 96

S3.2. DNA substrates used for band shift assays...... 97

S4.1. used ...... 146

S4.2. Data collection and refinement statistics ...... 147

S4.3. DNA substrates used for band shift assays ...... 148

S4.4. Primers used for site directed mutagenesis ...... 149

S5.1. Oligonucleotides used in this study...... 183

S5.2. Data collection and refinement statistics...... 184

S6.1. Oligonucleotides used in this study...... 236

S7.1. Primers used in Overlap Extension PCR and Recombination in vivo...... 266

x

LIST OF FIGURES

Figure Page

1.1. Cas9 delivery and repair of the target DNA...... 10

2.1. Diverse CRISPR systems defend against DNA- and RNA-based invaders ...... 26

2.2. Class 2 Nuclease activation mechanisms...... 29

2.3. Regulation of Class 1 Cas nucleases...... 36

3.1. X-ray crystal structure of Cascade...... 57

3.2. Mechanism of crRNA recognition by Cas6e...... 59

3.4. Assembly of backbone creates an interwoven structure that presents segments of the crRNA for target binding...... 64

3.5. Mechanism of tail assembly...... 68

S3.1. Electron Density of Cascade...... 82

S3.2. Superposition of the two Cascade molecules in the asymmetric unit...... 83

S3.3. Schematic of amino acid contacts with the crRNA...... 84

S3.4. Structural Comparison of Cas6e proteins...... 85

S3.5. Structural similarities between Cas7 CRISPR proteins...... 86

S3.6. Surface features of Cascade...... 87

S3.7. Every 6th- of the crRNA-guide does not participate in target recognition...... 88

S3.8. Cse2 proteins form a dimer that assembles along the belly of Cascade...... 89

S3.9. Structures of Cas5 proteins...... 90 xi

LIST OF FIGURES CONTINUED

Figure Page

S3.10. Docking the atomic model of Cascade into the cryo-EM density of Cascade bound to dsDNA...... 92

S3.11. Conserved mechanism for recognition of the 5'-handle...... 93

S3.12. Cse1 is a metal binding protein...... 94

S3.13. A-form ordering of RNA by diverse protein platforms...... 95

4.1. A lysine-rich vise is critical for binding dsDNA...... 115

4.2. The L1-helix and a long β-hairpin in the Cse1 subunit are involved in PAM recognition and duplex destabilization...... 118

4.3. Conserved arginines on the Cse2 subunits participate in a relay that stabilizes target binding induced conformational changes...... 122

4.4. DNA target recognition and binding by Cascade...... 127

S4.1. Representative agarose gels used to screen colonies in plasmid curing assays...... 134

S4.2. Superposition of Cascade structures...... 135

S4.3. The ΔKKKK mutant expresses and purifies similar to the wild type Cascade complex...... 136

S4.4. Cascade relies on a lysine-rich vise for binding dsDNA targets...... 138

S4.5. Structural models generated using Molecular Dynamics Flexible Fitting...... 139

S4.6. Sequence and structural alignment of PAM sensing features from E. coli and S. thermophilus...... 140

S4.7. The L1-helix and β-hairpin are required for high-affinity binding to dsDNA targets...... 142 xii

LIST OF FIGURES CONTINUED

Figure Page

S4.8. Conserved arginines stabilize flipped-out bases on the DNA target...... 143

S4.9. Mutations that perturb the arginine relay result in a dsDNA binding defect...... 144

S4.10. Aspartate residue conservation in Cas7 proteins from Type I and Type III CRISPR systems ...... 145

5.1. Foreign DNA bound Cascade structure...... 158

5.2. PAM recognition by Cse1 subunit of Cascade...... 161

5.3. Active guidance of the non-target DNA strand by Cascade...... 163

5.4. Model for PAM-dependent directional R-loop formation in Type I CRISPR-Cas system...... 164

S5.1. Electron density of nucleic acids in Cascade-dsDNA complex structure...... 174

S5.2. Comparison between the Cascade-partial R-loop crystal structure and the Cascade-full R-loop EM reconstruction...... 175

S5.3. Conformational changes in Cascade upon partial R-loop formation...... 176

S5.4. Local conformational rearrangement in Cse1 at the proposed Cas3 binding site...... 177

S5.5. Guided entry of dsDNA into Cascade...... 178

S5.6. Influence of PAM-1 base-pair composition on Cascade binding affinity and Cas3 cleavage efficiency ...... 179

S5.7. Conservation of PAM recognition elements...... 181

6.1. Binding of dsDNA increases the stability of Cascade...... 194 xiii

LIST OF FIGURES CONTINUED

Figure Page

6.2. Peptide-resolution mapping of deuterium uptake reveals areas of Cascade that become more or less protected upon dsDNA binding ...... 197

6.3. Using HDX to identify residues involved in binding the displaced DNA strand...... 200

6.4. Predicting the Cas3 interaction site using HDX...... 204

S6.1. DNA binding increases the stability of Cascade...... 222

S6.2. Heat and coverage maps for all Cascade subunits...... 229

S6.3. HD-exchange over time...... 230

S6.4. Peptide covering the lysine-rich helix (Cas7 134-148) displays a bimodal distribution ...... 231

S6.5. Sequence alignment of Cse1 shows little conservation in L3 and L4 ...... 232

S6.6. Structures of the Cse1 subunit ...... 233

S6.7. Cascade mutants express and purify as WT Cascade...... 234

S6.8. Deletion of Loop3 results in a major dsDNA binding defect...... 235

7.1. Selection of pAgo proteins...... 244

7.2. Overexpression of some pAgos results in specific antiviral activity...... 248

7.3. Recombinantly expressed pAgo are loaded with guide sequence matching phages...... 250

7.4. Protein expression in general cause reduction in phage infectivity...... 253

S7.1 E. coli optimized Prokaryotic Argonautes...... 264 xiv

ABSTRACT

Prokaryotes are under constant threat from foreign genetic elements such as and plasmids. To defend themselves against these genetic invaders prokaryotes have evolved extensive defense mechanisms. In this thesis I explore two such defense systems: prokaryotic Argonautes and CRISPR-systems. CRISPR-systems acquire short sequences derived from foreign genetic elements and store them in the CRISPR locus. In subsequent rounds of infection these stored sequences are used as guides by Cas proteins to target the invaders. Escherichia coli K-12 contains a type I-E CRISPR system, consisting of two CRISPR loci and eight cas genes. five of these cas genes, together with and 61-nucleotide CRISPR-RNA guide form the RNA-guided surveillance complex Cascade. This complex finds and binds foreign DNA targets that are complementary to its RNA guide. After target binding the helicase/nuclease Cas3 is recruited to the Cascade-DNA complex for destruction of the target. The goal of this research is to understand the molecular mechanisms that lead to target recognition and destruction in the type I-E CRISPR systems. Atomic resolution structures of the proteins involved in these CRISPR systems provide the blueprints of these proteins machines. Structure guided mutational analysis coupled with in vivo and in vitro biochemical experiments are used to investigate the underlying molecular mechanisms of this CRISPR system. Together, these results explain the rules of target recognition and Cas3 recruitment. Prokaryotic Argonautes have been hypothesized to defend against mobile genetic elements such as plasmids and viruses through guided nuclease activity. To test this hypothesis, we overexpressed 8 phylogenetically diverse prokaryotic Argonautes proteins in Escherichia coli and challenged them with seven . This resulted in robust protection against phage Lambda and phage P1 by four of the tested Argonautes, while little impact on phage infectivity was observed for the other phages tested. However, control experiments with a nuclease inactive Argonaute mutant and expression of an unrelated control protein showed similar protection against phage Lambda and phage P1. Collectively, our data suggest that protein overexpression in general, rather than Argonaute expression in particular, results in protection against 2 specific phages.

Correspondence: [email protected] xv

PREFACE AND OUTLINE

Bacteria and archaea are under constant threat from genetic invaders such as plasmids and viruses. To defend themselves against these threats they have evolved numerous defense systems. CRISPR (Clustered Regularly Interspaced Palindromic

Repeat)-Cas (CRISPR associated) systems are unique in their ability to provide inheritable and adaptable defense against such invaders. CRISPR loci consist of short fragments of foreign derived DNA termed spacers, separated by palindromic repeat sequences. Next to this locus lie the CRISPR-associated cas genes that code for the protein machinery of the

CRISPR systems. CRISPR loci were initially discovered in the 1980s, but only in the latter half of the previous decade did it become clear that they are part of an intricate prokaryotic defense system. The current decade has revealed that CRISPR systems are both common and diverse, with over 50% of bacteria and 90% of archaea containing at least one of over

30 different types of CRISPR systems. While CRISPR systems are diverse, their general mechanisms of action are divided into three main stages. In the first stage, Adaptation, short segments of foreign DNA termed protospacers are selected by Cas1-Cas2 and integrated as new spacers into the CRISPR locus. During the second stage, Biogenesis, the

CRISPR locus is transcribed into RNA and processed into mature CRISPR (crRNA)- guides. Together, Cas proteins and crRNAs assemble into crRNA-guided surveillance complexes. In the final stage of CRISPR immunity, Interference, the surveillance complexes identify targets that are complementary to the crRNA guide. Target recognition activates intrinsic nuclease activity in the surveillance complex, recruits a trans-acting nuclease or both. xvi

CRISPR has been brought to the scientific and general public limelight by the applications that have been derived from CRISPR systems in the form of genetic tools that are transforming medicine, molecular biology, and agriculture. While mostly derived from the Type-II ribonucleoproteins such as Cas9, recently we have seen the first tools and applications derived from Type I CRISPR system. This thesis highlights my scientific contributions to our understanding of the Type I-E CRISPR system of Escherichia coli, one of the best studied systems that became an early model CRISPR system.

At the beginning of my doctoral research, the major players in this system were known. Cas1 and Cas2 are responsible for new spacer acquisition. Cascade (CRISPR associated complex for antiviral defense) is the surveillance complex that searches for complementary DNA that matches its CRISPR-RNA (crRNA) guide. Cas3 is the helicase/nuclease that is recruited to the target bound Cascade complex for destruction of the invading target. What remained to be elucidated was how these proteins and protein complexes function in molecular detail. Atomic resolution structures of the protein components of the Type I-E CRISPR system have enabled us to understand the coordinated molecular mechanisms of Cascade assembly, target surveillance, and Cas3 recruitment.

Finally, the last chapter of this thesis delves into a different defense system, the prokaryotic Argonautes (pAgo). The role of pAgo proteins remains largely unknown, but they have some similarities to CRISPR systems. Like CRISPRs, prokaryotic Argonautes proteins rely on guide nucleic acids to identify complementary foreign . In some cases, they act as nucleases and cleave their targets. In the last chapter I explore the xvii ability of prokaryotic Argonaute proteins to act as an antiviral defense system in

Escherichia coli.

Chapter 1 provides a historical perspective on the repurposing of CRISPR-RNA guided nucleases as biotechnological tools. We compare the CRISPR revolution to the earlier biotechnological adaptation of a prokaryotic defense system; the restriction endonucleases. Restriction enzymes paved the way towards the first recombinant DNA technologies and revolutionized molecular biology back in the 1970s. Similarly, CRISPR-

RNA guided nucleases are transforming by allowing efficient and precise alteration of sequences of eukaryotes. We summarize the emerging market for programmable nucleases and provide an overview of the major players in the CRISPR marketplace.

Chapter 2 discusses how CRISPR-associated nuclease are regulated through conformational control. In the interference stage of CRISPR-mediated immunity, the RNA guided effector proteins identify foreign targets and either activate intrinsic nuclease activity, or recruit separate nucleases to the RNA-guided surveillance complexes for destruction of the target. Proper identification of valid targets and tight regulation of nuclease activity is critical for avoiding autoimmunity. We summarize how the major types of CRISPR systems use both orthosteric and allosteric conformational control to activate the nuclease activity.

Chapter 3 presents the crystal structure of the CRISPR-RNA guided surveillance complex Cascade from the Type I-E CRISPR system of E. coli. This structure explains how 11 proteins subunits together with a 61-nucleotide CRISPR-RNA guide come together xviii and form the Cascade complex. We show how conserved sequence at the 3’ and 5’ end of the RNA guide is recognized by protein subunits of the complex. Every 6th nucleotide of the crRNA spacer is not available for base pairing with the target DNA strand, and mismatches at these positions have no impact on the ability of Cascade to bind the target with high affinity. This structure represents the first atomic resolution structure of the

Cascade complex and serves as a blueprint for understanding the molecular mechanisms of Cascade.

Chapter 4 compares the structures of Cascade before and after target DNA binding.

We use molecular dynamic simulations to investigate the conformational changes that take place upon DNA binding and use these insights to make functional predictions on the mechanisms of target binding. We test these predictions through in vivo and in vitro experiments using mutational analysis. This reveals how a lysine-vise in the tail of the complex interacts with DNA non-specifically and is critical for positioning the target for

Protospacer Adjacent Motif (PAM) recognition. We investigate amino acid residues involved in PAM recognition and reveal how a relay of salt bridges facilitates sliding of the Cse2 (Cas11) subunits along the belly of the complex upon target binding.

Chapter 5 presents the crystal structure of the E. coli Cascade complex bound to a partial dsDNA target. This structure explains how the PAM sequence is recognized by three amino acid motifs in the Cse1 (Cas8e) subunit of the Cascade complex. It how the

PAM duplex is recognized, from the minor grove, which explains the promiscuous PAM tolerance of the Cascade complex. The structure shows how dsDNA binding leads to conformational rearrangement of the belly and tail subunits, which induces a local xix conformational change in the Cse1 subunit that exposes the recruitment site for the helicase/nuclease Cas3.

Chapter 6 investigates the conformational dynamics of the Cascade complex during target binding, which primes the complex for Cas3 recruitment. Hydrogen-

Deuterium exchange coupled to Mass Spectrometry (HDX-MS) is utilized to investigate the dynamics of Cascade before and after target binding. We use these data to generate a map of the complex that highlights areas that become more, and less dynamic upon target binding. We use this map to make functional prediction on the location of the non-target

DNA strand and the recruitment site of Cas3. These predictions are tested using in vivo and in vitro experiments which show that the non-target strand is loosely bound by the belly subunits of the complex. The map highlights residues in the Cse1 subunit near the local conformational change shown in Chapter 5. We show that mutation of these residues completely abolishes Cas3 mediated target degradation indicating that they are part of the

Cas3 recruitment site. Collectively, these results show that target binding induces a local conformational change that exposes the Cas3 recruitment site on the Cse1 subunit.

Chapter 7 explores the biological role of prokaryotic Argonaute (pAgo) proteins.

Argonaute proteins are found in all three domains of life. In eukaryotes, Argonautes play a central role in RNA interference (RNAi), but their biological role in prokaryotes is largely unknown. We hypothesized that pAgo proteins can provide antiviral defense through guided nuclease activity. To test this hypothesis, we overexpressed a diverse set of pAgos in E. coli and challenged them with 7 different E. coli phages belonging to three different viral families. While no direct antiviral activity by pAgos was observed, we did see strong xx reduction in viral infectivity of specific phages related to protein expression. Furthermore, we investigated the guide sequences that pAgos acquire when recombinantly expressed in

E. coli and show that pAgos associate with guides targeting phages that are derived from region of homology between the E. coli and phage . The mechanism of protein expression mediated reduction in viral infectivity, and the potential role of pAgos in antiviral defense remains unclear.

1

CHAPTER ONE

THE HISTORY AND MARKET IMPACT OF CRISPR RNA-GUIDED NUCLEASES

Contribution of Authors and Co-Authors

Manuscript in Chapter 1

Author: Paul B.G. van Erp

Contributions: Wrote the manuscript

Co-Author: Gary Bloomer

Contributions: Wrote the manuscript

Co-Author: Royce Wilkinson

Contributions: Wrote the manuscript

Co-Author: Blake Wiedenheft

Contributions: Wrote the manuscript

2

Manuscript Information

Paul B.G. van Erp, Gary Bloomer, Royce Wilkinson, Blake Wiedenheft

Current Opinion in Virology

Status of Manuscript: ____ Prepared for submission to a peer-reviewed journal ____ Officially submitted to a peer-reviewed journal ____ Accepted by a peer-reviewed journal X Published in a peer-reviewed journal

Elsevier Volume 12 June 2015 Pages 85-90 DOI: 10.1016/j.coviro.2015.03.011 3

THE HISTORY AND MARKET IMPACT OF CRISPR RNA-GUIDED NUCLEASES

Paul B.G. van Erp 1, Gary Bloomer 2, Royce Wilkinson 1, Blake Wiedenheft 1

1Department of Microbiology and Immunology, 2Technology Transfer Office

Montana State University, Bozeman MT 59717, USA.

4

Abstract

The interface between viruses and their hosts’ are hot spots for biological and biotechnological innovation. Bacteria use restriction endonucleases to destroy invading

DNA, and industry has exploited these enzymes for molecular cut-and-paste reactions that are central to many recombinant DNA technologies. Today, another class of nucleases central to adaptive immune systems that protect bacteria and archaea from invading viruses and plasmids are blazing a similar path from basic science to profound biomedical and industrial applications.

5

Main text

In retrospect, we probably should have anticipated that bacteria and archaea would have sophisticated immune systems. After all, viruses are the most abundant biological agents on the planet, causing roughly 1023 infections every second (1-3). The selective pressures imposed by viral predation have resulted in the evolution of numerous phage defense systems, but it was only recently that sophisticated adaptive defense systems were identified in both bacteria and archaea (4-7).

Initial indications that Clusters of Regularly Interspaced Short Palindromic Repeats

(CRISPRs) were part of an adaptive defense system came from a series of bioinformatics observations revealing that the short spacer sequences embedded in CRISPRs were sometimes identical to sequences found in phages and plasmids (8-10). These observations led to the hypothesis that CRISPRs are central components of an adaptive immune system, and in 2007 Barrangou et. al. provided the first demonstration of adaptive immunity in bacteria by monitoring CRISPR loci in phage-challenged cultures of Streptococcus thermophilus (11). This paper showed that CRISPRs evolve by acquiring short fragments of phage-derived DNA and strains with new spacers are resistant to these phages. It was immediately clear that this paper would serve as a foundation for an emerging team of scientists interested in understanding the mechanisms of adaptive defense systems in bacteria and archaea, but few of us anticipated the broader impacts of these discoveries for new applications in genome engineering.

Building on this initial foundation, a series of mechanistic studies showed that

CRISPR loci are transcribed and processed into a library of small CRISPR derived 6

(crRNAs) that guide dedicated nucleases to complementary nucleic acid targets (5-7, 12,

13). In nature, these RNA-guided nucleases provide bacteria and archaea with sequence specific resistance to previously encountered genetic parasites. However, sequence specific nucleases have considerable value in biotechnology and one of these CRISPR-associated nucleases (i.e. Cas9) has recently been co-opted for new applications in biomedical, bioenergy, and agricultural sciences (14-17).

A goldmine for biotechnology

The molecular interface between a parasite and its host is a hot spot for innovation.

A resistant host has a competitive advantage over a susceptible host, but an obligatory parasite faces extinction unless it is able to subvert host defense mechanisms. This conflict results in an accelerated rate of evolution that stimulates genetic innovation on both sides of this molecular arms race.

Genes at the interface of a genetic conflict have proven to be a goldmine for enzymes with activities that can be creatively repurposed for applications in biotechnology.

In the 1970s, basic research on bacteriophages led to the discovery of DNA restriction endonucleases, which transformed molecular biology by making it possible to cleave specific DNA sequences (18). The discovery of these enzymes paved the way for the emergence of recombinant DNA technologies, and in 1978 Werner Arber, Daniel Nathans, and Hamilton Smith shared the Nobel Prize in Physiology or Medicine "for the discovery of restriction enzymes and their application to problems of " (19).

Identification and application of type II restriction enzymes, which are integral to almost 7 every aspect of DNA manipulation, effectively triggered the emergence of a global biotech industry.

Like restriction enzymes, CRISPR systems evolved as components of a prokaryotic defense system. However, the mechanisms of sequence recognition by these enzymes are fundamentally different. Unlike DNA restriction enzymes, which typically rely on protein mediated recognition of 4 to 8 base pairs; CRISPR-associated nucleases are guided by base pairing between an RNA-guide and a complementary target. The implications of this targeting mechanism have triggered a sea-change in biology and now the historical precedent of nucleases in biotechnology seems poised to repeat itself.

Why all the fuss?

Reverse genetics is a powerful method used to determine the biological function of a specific gene. This approach is used routinely to determine gene function in organisms with simple genomes, but existing techniques are not applicable for high-throughput genetic screens in organisms with large genomes and multiple chromosomes. However, the recently discovered mechanism of DNA cleavage by CRISPR RNA-guided nuclease Cas9

(20), has transformed the field of genetics by allowing efficient and precise genetic manipulation of diverse eukaryotic genomes, including human cells (14-17). To repurpose the Cas9 nuclease for targeted genome editing, the cas9 gene has been codon optimized for expression in eukaryotic systems and tagged with a nuclear localization signal (NLS)

(21-23). Cas9 and the guide RNA have been delivered to eukaryotic cells by transient transfections with expression vectors (24) or purified Cas9/sgRNA (25, 26), viral using Lentiviruses (27-29) or Adeno-Associated (30-33), and 8 cytoplasmic or nuclear injections (34-37) (Figure 1.1). In each case, RNA-guide targeting of Cas9 to a complementary DNA target results in a double-stranded DNA break (DSB) at the target site. These lesions are typically repaired by non-homologous end joining (NHEJ), which is an error-prone process that is often accompanied by insertion or deletion of (indels) at the targeted site, resulting in a genetic knock-out of the targeted gene due to a frameshift mutation. Alternatively, DSBs are repaired by homology directed repair

(HDR). In most systems, HDR is an inefficient process that requires a DNA donor with sequence homology to regions of the genome that flank the DSB (38). Using Cas9 in combination with a DNA donor provides a method to target cleavage and repair of naturally occurring genetic defects, add foreign DNA encoding genes with new functions to specific locations, or precisely excise defined fragment of DNA (14-16, 39).

Cas9 is not the first programmable nuclease developed for engineering eukaryotic genomes (40). Some of the earliest methods for introducing targeted genome modifications relied on meganucleases (e.g. HO and I-SceI), which are endonucleases that have long recognition sequences (12 to 40 base pairs) (41). The enhanced specificity of these nucleases, as compared to the 4 to 8 base pairs recognized by most restriction enzymes, presents an opportunity to target specific locations in large eukaryotic genomes. However, these enzymes rely on protein-mediated recognition of the target DNA, and reprogramming these proteins to target new DNA sequences has been challenging due to the integrated nature of the DNA binding and nuclease domains of these proteins. To address this issue, non-natural chimeric nucleases composed of distinct DNA binding and nuclease domains 9 have been engineered. The most celebrated examples of these are Zinc finger nucleases

(ZFNs) and activator-like effector nucleases (TALENs).

Zinc fingers (ZFs) are sequence specific DNA binding domains found in eukaryotic transcription factors and transcription activator-like effectors (TALEs) are modular DNA binding proteins made by bacterial that infect plants. The mechanism of DNA recognition by these proteins is well understood and the modular nature of these interactions has been exploited for reprogramming. While the success of these proteins for targeted genome engineering cannot be overstated, they both rely on protein-mediated recognition of the DNA, which means that every new target requires the engineering of new proteins. In contrast to these techniques, Cas9 is an RNA-guided nuclease that relies on complementary base pairing and protein mediated recognition of an adjacent short sequence motif, commonly referred to as the PAM (protospacer adjacent motif) (42).

PAMs are typically 2 to 5 basepairs in length, which means that PAMs occur at high frequency and are rarely a limitation when designing RNA guides to specific target sequences. However, Cas9 proteins from different organisms often recognize different

PAM sequences, so in rare instances where a target sequence does not contain a PAM recognized by one Cas9 (i.e. Cas9 from Streptococcus pyogenes recognizes a 5’-NGG

PAM), then another Cas9 with a distinct PAM recognition sequence may be used (i.e. Cas9 from Neisseria meningitides recognizes a 5′-NNNNGATT PAM) (43). The diversity of

Cas9 proteins and the simplicity of RNA-guided programing abrogates the need for sophisticated protein engineering, and affords rapid generation of designer nucleases. In fact, guide RNAs that target Cas9 to as many as 20,000 different genes and 1,800 10 in the human genome have been generated in a single experiment (28, 29, 44).

These whole-genome knockout techniques are transforming functional genomics and redefining the possibilities of reverse genetics.

Figure 1.1. Cas9 delivery and repair of the target DNA. Cas9 and a single-guide RNA (sgRNA) have been delivered to eukaryotic cells by transient transfections with either expression vectors encoding cas9 and a sgRNA or purified Cas9/sgRNA complex, cytoplasmic or nuclear injections using mRNA, purified Cas9/sgRNA or expression, or by transduction using Lentiviruses or Adeno-Associated Virus. Cas9 identifies its target by protein mediated PAM (yellow) recognition and base pairing between the sgRNA and the DNA target. Target recognition activates the nuclease sites (red triangles), resulting in double stranded break (DSBs) 3-4 nucleotides downstream from the PAM. DSBs can be repaired by non-homologous end joining (NHEJ) or homology directed repair (HDR). NHEJ results in insertion or deletions (indels), which often results in a frameshift mutation. HDR relies on homologous recombination with a donor DNA molecule. This donor DNA can be used to specifically insert desired sequences.

11

The Emerging Market

CRISPR applications span most every industry that involves biological systems

(Table 1.1). Danisco (DuPont) was an initial pioneer of commercial use of CRISPR technology to enhance viral immunity in bacteria used to make yogurts and cheese, but other markets have been rapidly emerging. Applications in agriculture have lower regulatory hurdles than biomedical application and some of these markets are anticipated to produce earlier returns on investments. Dow Agrosciences has co-developed intellectual property (IP) with Sangamo Biosciences for developing genetically modified crops using

Cas9, and Cellectis Plant Sciences is leveraging its relationship with parent Cellectis SA to move the technology into crops. Similarly, Recombinetics Inc. is using TALENs, ZFNs and Cas9 to enhance productivity in the livestock industry (45). While the USDA has yet to decide on how it will treat genomes edited using Cas9, it has already ruled that ZFNs and TALENs do not fall under their governance (46). This saves an average 5.5 years and

$35 million in related regulatory costs for bringing a product to market (47). Similar treatment of CRISPR-based genome editing may stimulate economic activity around the development of new agricultural and industrial products. 12

Table 1.1 Industry interests in Cas9-based technologies.

In addition to Cas9-based applications in the agricultural industry, market segments for Cas9 endonucleases within the human health sectors are experiencing frenetic growth.

These markets include: gene-, cell-, and immuno-therapy, fast and efficient development of transgenic research animals, drug discovery, as well as target validation and screening.

It is difficult to accurately estimate the value of the nascent market for CRISPR RNA- guided nucleases in the biomed sector, but documents from the initial public offering (IPO) of Horizon Discovery Group, plc., which has in-licensed Cas9 IP, indicate a market size of

$46 billion (48) and recent private equity financings of Cas9-based genome engineering companies include: Caribou Biosciences, (undisclosed venture estimated at $2.9 million 13 from Novartis), CRISPR Therapeutics, ($25 million), Recombinetics, Inc. ($5 million),

Intellia Therapeutics ($15 million), and Editas Medicine ($43 million). In total, companies with an interest in using Cas9 for applications related to gene therapy have raised over

$600 million in venture capital and public markets since the beginning of 2013. The pace of this activity is remarkable given that the first granted patent for the use of CRSIPR technology in eukaryotic cells was issued April 15, 2014.

Commercial interest in Cas9 IP has not escaped the interest of big pharmaceuticals.

Novartis has partnered with a tier I private equity firm, Atlas Ventures to commit $15 million to kick start Intellia Therapeutics, and Pfizer partner Cellectis SA will be using

Cas9-based technologies to make T-cells with chimeric antigen receptors. In January of this year, AstraZeneca announced four partnerships with academia around the use of Cas9 nucleases to validate new drug targets.

In addition to the agricultural and biomedical sectors, the research tools market is also embracing CRISPR-based technology. Sigma-Aldrich has in-licensed technology in order to make, use, and distribute tools for the generation of modified plant and animal models, custom cell line creation and for pooled genetic screening. Perhaps the financial activity in each of these market sectors is heightened by over-exuberance common to the early market development. However, given the scope of current applications across multiple industries, we see no limits to research or financial commitments in this space.

14

Acknowledgements

We are grateful to Jennifer A. Doudna for valuable feedback on this manuscript.

Research in the Wiedenheft lab is supported by the National Institutes of Health

(P20GM103500 and R01GM108888), the National Science Foundation EPSCoR (EPS-

110134), the M.J. Murdock Charitable Trust, and the Montana State University

Agricultural Experimental Station.

15

References

1. Rodriguez-Valera F, Martin-Cuadrado AB, Rodriguez-Brito B, Pasic L, Thingstad TF, Rohwer F, Mira A: Explaining microbial population genomics through phage predation. Nature Reviews Microbiology 2009, 7:828-836.

2. Suttle CA: Marine viruses--major players in the global ecosystem. Nat Rev Microbiol 2007, 5:801-812.

3. Weinbauer MG: Ecology of prokaryotic viruses. FEMS Microbiol Rev 2004, 28:127-181.

4. Labrie SJ, Samson JE, Moineau S: resistance mechanisms. Nat Rev Microbiol 2010, 8:317-327.

5. Sorek R, Lawrence CM, Wiedenheft B: CRISPR-mediated adaptive immune systems in bacteria and archaea. Annu Rev Biochem 2013, 82:237-266.

6. van der Oost J, Westra ER, Jackson RN, Wiedenheft B: Unravelling the structural and mechanistic basis of CRISPR-Cas systems. Nat Rev Microbiol 2014, 12:479- 492.

7. Barrangou R, Marraffini LA: CRISPR-Cas Systems: Prokaryotes Upgrade to Adaptive Immunity. Mol Cell 2014, 54:234-244.

8. Bolotin A, Ouinquis B, Sorokin A, Ehrlich SD: Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 2005, 151:2551-2561.

9. Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E: Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 2005, 60:174-182.

10. Pourcel C, Salvignol G, Vergnaud G: CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 2005, 151:653-663.

11. Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P: CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007, 315:1709-1712. ** This paper provides the first direct evidence for adaptive immunity in bacteria.

16

12. Hochstrasser ML, Doudna JA: Cutting it close: CRISPR-associated endoribonuclease structure and function. Trends Biochem Sci 2015, 40:58-66.

13. Reeks J, Naismith JH, White MF: CRISPR interference: a structural perspective. Biochem J 2013, 453:155-166.

14. Wilkinson R, Wiedenheft B: A CRISPR method for genome engineering. F1000Prime Rep 2014, 6:3.

15. Hsu PD, Lander ES, Zhang F: Development and applications of CRISPR-Cas9 for genome engineering. Cell 2014, 157:1262-1278.

16. Doudna JA, Charpentier E: Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 2014, 346:1258096.

17. Carroll D: Genome engineering with targetable nucleases. Annu Rev Biochem 2014, 83:409-439.

18. Roberts RJ: How restriction enzymes became the workhorses of molecular biology. Proceedings of the National Academy of Sciences of the United States of America 2005, 102:5905-5908.

19. The Nobel Prize in Physiology or Medicine 1978. Edited by: Nobel Media 2015.

20. Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E: A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science 2012, 337:816-821. **This paper provides mechanistic insights that explain how Cas9 can be used as a programmable nuclease for genome engineering.

21. Cong L, Ran FA, Cox D, Lin SL, Barretto R, Habib N, Hsu PD, Wu XB, Jiang WY, Marraffini LA, et al.: Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 2013, 339:819-823. **This is one of the first demonstations that Cas9 can be used for precise multiplexed genome engineering in mouse and human cells.

22. Jinek M, East A, Cheng A, Lin S, Ma E, Doudna J: RNA-programmed genome editing in human cells. Elife 2013, 2:e00471. **This is one of the first demonstations that Cas9 can be used for precise genome engineering in human cells.

23. Mali P, Yang LH, Esvelt KM, Aach J, Guell M, DiCarlo JE, Norville JE, Church GM: RNA-Guided Human Genome Engineering via Cas9. Science 2013, 339:823-826. 17

**This is one of the first demonstations that Cas9 can be used for precise genome engineering in human cells, including induced pluripotent stem cells.

24. https://http://www.addgene.org/CRISPR/.

25. Kim S, Kim D, Cho SW, Kim J, Kim JS: Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins. Genome Res 2014, 24:1012-1019.

26. Zuris JA, Thompson DB, Shu Y, Guilinger JP, Bessen JL, Hu JH, Maeder ML, Joung JK, Chen ZY, Liu DR: Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo. Nat Biotechnol 2015, 33:73-80.

27. Kabadi AM, Ousterout DG, Hilton IB, Gersbach CA: Multiplex CRISPR/Cas9- based genome engineering from a single lentiviral vector. Nucleic Acids Res 2014, 42:e147.

28. Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, Mikkelsen TS, Heckl D, Ebert BL, Root DE, Doench JG, et al.: Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 2014, 343:84-87. *This paper demonstates how lentiviral delivery of sgRNAs can be used to generate human genome knockout libraries for genome-scale screening.

29. Wang T, Wei JJ, Sabatini DM, Lander ES: Genetic Screens in Human Cells Using the CRISPR-Cas9 System. Science 2014, 343:80-84. *This paper demonstates how lentiviral delivery systems can be used to generate human genome knockout libraries.

30. Cheng R, Peng J, Yan Y, Cao P, Wang J, Qiu C, Tang L, Liu D, Tang L, Jin J, et al.: Efficient gene editing in adult mouse livers via adenoviral delivery of CRISPR/Cas9. FEBS Lett 2014, 588:3954-3958.

31. Ding Q, Strong A, Patel KM, Ng SL, Gosis BS, Regan SN, Cowan CA, Rader DJ, Musunuru K: Permanent alteration of PCSK9 with in vivo CRISPR-Cas9 genome editing. Circ Res 2014, 115:488-492.

32. Maggio I, Holkers M, Liu J, Janssen JM, Chen X, Goncalves MA: Adenoviral vector delivery of RNA-guided CRISPR/Cas9 nuclease complexes induces targeted mutagenesis in a diverse array of human cells. Sci Rep 2014, 4:5105.

33. Holkers M, Maggio I, Henriques SF, Janssen JM, Cathomen T, Goncalves MA: Adenoviral vector DNA for accurate genome editing with engineered nucleases. Nat Methods 2014, 11:1051-1057. 18

34. Horii T, Arai Y, Yamazaki M, Morita S, Kimura M, Itoh M, Abe Y, Hatada I: Validation of microinjection methods for generating knockout mice by CRISPR/Cas-mediated genome engineering. Sci Rep 2014, 4:4513.

35. Long C, McAnally JR, Shelton JM, Mireault AA, Bassel-Duby R, Olson EN: Prevention of muscular dystrophy in mice by CRISPR/Cas9-mediated editing of germline DNA. Science 2014, 345:1184-1188.

36. Niu Y, Shen B, Cui Y, Chen Y, Wang J, Wang L, Kang Y, Zhao X, Si W, Li W, et al.: Generation of gene-modified cynomolgus monkey via Cas9/RNA-mediated gene targeting in one-cell embryos. Cell 2014, 156:836-843.

37. Wang H, Yang H, Shivalila CS, Dawlaty MM, Cheng AW, Zhang F, Jaenisch R: One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 2013, 153:910-918.

38. Heyer WD, Ehmsen KT, Liu J: Regulation of homologous recombination in eukaryotes. Annu Rev Genet 2010, 44:113-139.

39. Terns RM, Terns MP: CRISPR-based technologies: prokaryotic defense weapons repurposed. Trends Genet 2014, 30:111-118.

40. Carroll D: Genome Editing by Targeted Chromosomal Mutagenesis. Methods in Molecular Biology 2014, 1239:1-13.

41. Chevalier BS, Stoddard BL: Homing endonucleases: structural and functional insight into the catalysts of intron/intein mobility. Nucleic Acids Research 2001, 29:3757-3774.

42. Anders C, Niewoehner O, Duerst A, Jinek M: Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 2014, 513:569-573.

43. Hou Z, Zhang Y, Propson NE, Howden SE, Chu L-F, Sontheimer EJ, Thomson JA: Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis. Proc Natl Acad Sci U S A 2013.

44. Sanjana NE, Shalem O, Zhang F: Improved vectors and genome-wide libraries for CRISPR screening. Nat Methods 2014, 11:783-784.

45. Tan W, Carlson DF, Lancto CA, Garbe JR, Webster DA, Hackett PB, Fahrenkrug SC: Efficient nonmeiotic allele introgression in livestock using custom endonucleases. PNAS 2013, 110:16526–16531.

19

46. Glorikian H: Gene Editing Will Change Everything—Just Not All at One Time. Edited by; 2015.

47. Voytas DF, Gao C: Precision genome engineering and agriculture: opportunities and regulatory challenges. PLoS Biol 2014, 12:e1001877.

48. Horizon Discovery Group, Admission Document; 2015.

20

CHAPTER TWO

CONFORMATIONAL REGULATION OF CRISPR-ASSOCIATED NUCLEASES

Contribution of Authors and Co-Authors

Manuscript in Chapter 2

Author: Ryan N. Jackson

Contributions: Wrote the manuscript

Co-Author: Paul B.G. van Erp

Contributions: Wrote the manuscript

Co-Author: Samuel H. Sternberg

Contributions: Wrote the manuscript

Co-Author: Blake Wiedenheft

Contributions: Wrote the manuscript

21

Manuscript Information

Ryan N. Jackson, Paul B.G. van Erp, Samuel H. Sternberg, Blake Wiedenheft

Current Opinion in Microbiology

Status of Manuscript: ____ Prepared for submission to a peer-reviewed journal ____ Officially submitted to a peer-reviewed journal ____ Accepted by a peer-reviewed journal X Published in a peer-reviewed journal

Elsevier Volume 37 1 June 2017 Pages 110-119 DOI: 10.1016/j.mib.2017.05.010

22

CONFORMATIONAL REGULATION OF CRISPR-ASSOCIATED

Ryan N. Jacksona, Paul B.G. van Erpb, Samuel H. Sternbergc, and Blake Wiedenheftb

aDepartment of Chemistry and Biochemistry, Utah State University, Logan UT 84322, bDepartment of Microbiology and Immunology, Montana State University, Bozeman MT

59717, cCaribou Biosciences, Inc. Berkeley, CA, 94710

23

Abstract

Adaptive immune systems in bacteria and archaea rely on small CRISPR-derived

RNAs (crRNAs) to guide specialized nucleases to foreign nucleic acids. The activation of these nucleases is controlled by a series of molecular checkpoints that ensure precise cleavage of nucleic acid targets, while minimizing toxic off-target cleavage events. In this review, we highlight recent advances in understanding regulatory mechanisms responsible for controlling the activation of these nucleases and identify emerging regulatory themes conserved across diverse CRISPR systems. 24

Introduction

Nucleases that degrade DNA and RNA are indispensable for diverse biological functions (1). To avoid toxicity associated with aberrant activity, nucleases are often controlled by substrate binding at (orthosteric), or away from (allosteric) the active site.

Recent evidence suggests that CRISPR (Clustered Regularly Interspaced Short

Palindromic Repeat)-associated (Cas) nucleases, which are essential components of adaptive immune systems that protect bacteria and archaea from infection by viruses and plasmids, rely on orthosteric and allosteric control (2,3). Presumably these regulatory mechanisms evolved to efficiently eliminate foreign DNA or RNA, while avoiding autoimmune reactions associated with destruction of the bacterial or archaeal genome. The programmable nature of CRISPR-associated nucleases (e.g. Cas9) has given rise to a powerful new method for genome engineering, and a comprehensive understanding of how these nucleases function is necessary for safe implementation (4-7).

CRISPR RNA-guided adaptive immune systems are structurally and functionally diverse, consisting of two Classes (1 and 2), six Types (I-VI), and more than nineteen subtypes distinguished by CRISPR repeat sequence and cas genes (2,8-10) (Figure 2.1).

Despite this diversity, all CRISPR systems rely on specialized nucleases to execute three stages of adaptive immunity; acquisition, CRISPR RNA biogenesis, and interference (2,3).

During acquisition, a nuclease active complex comprised of Cas1 and Cas2 proteins inserts foreign nucleic acid targets (about 20 – 40 nt in length) called

‘protospacers’ into the spacer-repeat array at the leader end of the CRISPR locus. CRISPR loci are transcribed, and Cas or RNAse III enzymes process these transcripts into libraries 25 of small CRISPR-derived RNAs (crRNA). Each crRNA assembles with Cas proteins into surveillance complexes that use the crRNA to bind complementary DNA (Type I, II, V) or

RNA (Type III, VI) (Figure 2.1). Target binding induces conformational rearrangements within the surveillance complex that activate either cis-acting nuclease domains located within the complex, or a trans-acting nuclease that is recruited for destruction of bound targets. 26

Figure 2.1. Diverse CRISPR systems defend against DNA- and RNA-based invaders. (A) Phylogenetic tree of CRISPR system Types with nucleic acid substrates that are bound and cleaved. DNA targeting nucleases are labeled red and RNA targeting nucleases are labeled blue. Genes coding for multi-subunit complexes are indicated with the brackets. (B) Cartoon representations of each CRISPR system targeting DNA phage, RNA phage, or both. 27

Structures of Cas nucleases involved in spacer acquisition (e.g. Cas1) and crRNA biogenesis (e.g. Cas6), suggest substrate binding at the enzymatic active site (i.e. orthosteric) induces conformational rearrangements that regulate the nuclease (11-13). In contrast, Cas nucleases involved in interference rely on a combination of orthosteric and allosteric activation mechanisms. Here we review the regulatory mechanisms that control

Cas nucleases involved in interference.

Regulation of Class 2 Cas nucleases

Class 2 systems consist of three different Types (i.e. Type II, V, and VI) that encode single-subunit Cas nucleases (2,8,9) (Figure 2.2). The Type II (Cas9) systems gained considerable notoriety in 2012 and early 2013 when the programmable nature of these

RNA-guided nucleases was exploited for precise cleavage of DNA in human cells (for a review of the primary literature, see (4-6)). In the last five years, these nucleases have ushered in a new era of genome editing technologies that have considerable potential for applications in molecular medicine, industrial biotechnology, and agriculture.

Type II systems

The implementation of Type II systems for genome editing applications requires a fundamental understanding of how these RNA-guided nucleases assemble, bind to double- stranded DNA (dsDNA), and then cleave both strands of the dsDNA target. Initial biochemical work on the Cas9 ribonucleoprotein from Streptococcus pyogenes showed that the crRNA base-paired to a trans-activating tracrRNA, and that these two RNAs could be fused into a single-guide RNA (sgRNA) capable of directing the Cas9 nuclease to a 28 complementary DNA (14,15). Subsequent structural and biochemical studies demonstrated that Cas9 is a dynamic enzyme, and that target cleavage requires sequential completion of multiple regulatory checkpoints.

Cas9 adopts a bi-lobed architecture in which two adjacent nuclease domains (RuvC and HNH) reside within the nuclease lobe (NUC lobe), and an alpha-helical lobe (also known as the recognition or REC lobe) interacts with the guide RNA–target DNA heteroduplex (16-21). In the absence of RNA, Cas9 adopts a nuclease-inactive conformation, which undergoes a dramatic rearrangement upon binding the guide RNA

(16) (Figure 2.2A). Cas9 relies on base-pairing between the crRNA guide and DNA targets, but single-molecule experiments performed using Cas9 from Streptococcus pyogenes showed that Cas9 must first recognize a tri-nucleotide motif called a PAM (Protospacer

Adjacent Motif) located next to the DNA target sequence (22). In addition to PAM recognition, high-affinity binding requires precise complementarity between the DNA target and a PAM-proximal “seed sequence” followed by directional unwinding of the

DNA from the seed to the PAM-distal end of the target (23) (Figure 2.2A). Chromatin immunoprecipitation followed by sequencing (ChIP-seq), revealed that as many as 15 mismatches are allowed outside of the seed sequence for stable binding, but incomplete

RNA-DNA hybrids are often not cleaved by Cas9 (24-26). 29

Figure 2.2. Class 2 Nuclease activation mechanisms. (A) Activation of the Type II Cas9 nuclease (grey with red outline) relies on several regulatory checkpoints. Cas9 adopts a bi- lobed structure that consists of a nuclease-lobe (Nuc lobe) and an alpha-helical recognition lobe (Rec lob). Single-guide-RNA loading causes a conformational rearrangement in the Rec-lobe. To bind duplex DNA, a PAM motif (dark red) is recognized followed by 3’ seed complementation. Directional unwinding of the duplex to the PAM-distal side of the guide induces a conformational rearrangement of two peptide linkers (L1 and L2 colored blue and orange) to stabilize the active conformation of the HNH and RuvC nuclease domains (colored red) to make a blunt-end cut in duplex DNA. (B) Type V nucleases (Cpf1, C2c1, and C2c3) also adopt a bi-lobed architecture, are loaded with an RNA-guide and use PAM and seed recognition to initiate target binding with DNA. However, the DNA cleavage mechanisms may involve the RuvC domain and Nuc domain, or just the RuvC domain. C. Like Type II and V, the Type VI (C2c2) proteins adopt a bi-lobed architecture and are programmed with an RNA-guide. Type VI nucleases bind RNA with a central seed within the RNA-guide, and an adjacent sequence called the PFS (Protospacer Flanking Site) plays a role in nuclease activation. 30

Initial structures of DNA-bound Cas9 could not explain why stable binding and incomplete R-loop formation fail to activate Cas9 cleavage. In these structures, the active site of the HNH nuclease domain is ~30 Å away from the scissile phosphate of the target

DNA strand (17,18,27) (Figure 2.2A), suggesting that a major conformational rearrangement was necessary to activate Cas9-mediated DNA cleavage. To test this hypothesis and clarify the substrate features required for Cas9 activation, Förster

Resonance Energy Transfer (FRET)-based assays were used to measure conformational changes in the HNH domain upon binding to perfectly complementary and mismatched

DNA targets (28). This work revealed that the HNH nuclease domain adopts an inactive conformation on mismatched DNA substrates that are stably bound by Cas9, explaining why off-target DNA binding events do not always lead to off-target DNA cleavage. Full complementarity between target DNA and the crRNA guide drives conformational changes in the HNH domain that trigger target-strand cleavage, and allosterically activates cleavage of the non-complementary strand by the RuvC domain, ensuring concerted production of double-strand breaks (28).

A recent structure of Cas9 bound to a long dsDNA substrate provides a structural basis for the concerted activation of the Cas9 nuclease domains (29) (Figure 2.2). As observed in all other Cas9 structures, the HNH domain is inserted between motifs II and

III in the RuvC domain, and is connected to these motifs by two peptide linkers, L1 and

L2. Comparison with previous Cas9 structures reveals that L1 and L2 undergo extensive rearrangement after binding a dsDNA-target. L1 interacts with the PAM-distal side of the

RNA-DNA hybrid, while a conserved phenylalanine on L2 stacks with the fourth 31 of the protospacer on the non-complementary strand. These interactions stabilize a ~180° rotation of the HNH domain and position the active site in proximity to the scissile phosphate of the complementary strand. Additionally, these rearrangements position the scissile phosphate of the non-complementary strand into the active site of the

RuvC nuclease. To avoid steric clashes, these large conformational rearrangements can only be achieved if both L1 and L2 move in concert, which explains why HNH and RuvC activation is coupled. Cas9 nucleases from organisms other than S. pyogenes share a similar

HNH and RuvC domain linkage, suggesting this concerted activation mechanism may be conserved across all Cas9 enzymes (16,19-21).

Type V systems

Like Cas9, the Type V enzymes – Cas12a, Cas12b, and Cas12c; also known as

Cpf1, C2c1, and C2c3 – contain a putative nuclease domain (Nuc) inserted between motifs

II and III of a RuvC domain (9,30-35). Point mutations in the Nuc domain of Cas12a (Cpf1) inhibited cleavage of the complementary strand, while mutations in the RuvC active site disrupted cleavage of both strands (31). These data suggested RuvC-mediated cleavage of the non-complementary strand is a pre-requisite for cleavage of the complementary strand by the Nuc domain. However, a structure of Cas12b (C2c1) bound to a long single-stranded

DNA suggests the RuvC nuclease may be responsible for cleaving both strands of the DNA in sequential order (34). Indeed, a recent study of Cas12a provides biochemical evidence that a single active site within the RuvC domain cleaves both DNA strands using the same catalytic mechanism, and that activation of this nuclease function requires substantial conformational rearrangements to unblock the previously occluded active site (36). While 32 it appears that Cas12 enzymes recognize DNA targets in a distinct way from Cas9, both enzyme families have evolved mechanisms to sequester the active sites in their nuclease domains until RNA-DNA heteroduplex formation triggers activation.

Type VI systems

Unlike most Class 2 systems, which target dsDNA, the Type VI Cas nucleases

(Cas13, also known as C2c2) bind and cleave ssRNA using two HEPN (Higher Eukaryote and Prokaryote Nucleotide-binding) domains (9,37-39) (Figure 2.2B). The HEPN RNase domains are activated by ssRNA base-pairing with the crRNA guide, but nuclease activity is suppressed if complementarity extends beyond the guide and into the repeat-derived portion of the crRNA (37). Complementarity to the repeat-derived sequence of the crRNA may explain the inhibition of Cas13 (C2c2) nuclease activity, similar to regulation of

DNase activity in Type III systems (see below). Notably, unlike other Class 2 systems,

RNA binding by Cas13 activates a non-sequence specific and trans-acting RNase activity

(37,38); a property which has recently been harnessed for sensitive RNA and DNA diagnostics (40). Thus, target-induced conformational rearrangements are required to activate the promiscuous HEPN nuclease domains. While recent structures have shed light on Cas13 conformational changes that occur upon crRNA binding (39), further structural and biophysical studies are necessary to fully understand the conformational rearrangements that occur upon target RNA binding and the mechanism of RNA cleavage in trans.

33

Regulation of Class 1 nucleases that are essential for interference.

Class 1 CRISPR systems are phylogenetically and functionally diverse, but a unifying feature of these systems is that they all rely on multi-subunit crRNA-guided surveillance complexes for detection of invading nucleic acids (8,41). Class 1 systems consist of two well-studied Types (Type I and III), and a recently identified Type IV system that has not been experimentally tested and is beyond the scope of this review.

Type I systems

Phylogenetic studies have divided Type I systems into seven sub-types, I-A to I-F and I-U (8). All Type I CRISPR systems rely on multi-subunit surveillance complexes for crRNA-guided detection of dsDNA targets, and a Cas3 nuclease/helicase that is responsible for target degradation.

Foreign DNA detection by the Type I-E system in Escherichia coli relies on a large crRNA-guided complex called Cascade (CRISPR-associated complex for antiviral defense) (42,43). Cascade searches for targets by first localizing to PAM sequences, which are recognized by the Cas8e subunit (also called Cse1) (23,44-46). PAM detection destabilizes the target duplex and facilitates crRNA-guided sampling of the PAM- proximal DNA seed sequence (47). Complementarity between the crRNA-guide and the seed leads to directional unwinding of the DNA duplex that proceeds away from the PAM and triggers a conformational change in Cascade that is necessary for recruitment of the trans-acting Cas3 nuclease/helicase, which degrades the DNA target (23,48-51). However, there is an accumulating body of work showing that Cascade may adopt distinct conformational states based on interactions with specific PAM and protospacer 34 combinations, and that these distinct conformations may regulate the activity of Cas3

(44,49,51-56).

Cascade elicits two distinct immune responses upon target binding: interference or priming (Figure 2.3). During interference, the Cas8e subunit adopts a “closed” conformation capable of recruiting a nuclease active Cas3, which processively degrades the target DNA (44,50,51,57-59). However, strict sequence requirements present a potential weakness in the immune system, because mutations in the PAM, seed, or specific positions of the protospacer, allow viruses to escape CRISPR-Cas immunity (47,60,61).

Bacteria with Type I immune systems can restore immunity against PAM or protospacer

“escape” mutants using a positive feedback loop that rapidly updates the CRISPR locus with new spacers derived from the phage or plasmid genome (61-64). This process of rapid acquisition, called “priming”, requires a target that is partially complementary to the crRNA-guide, Cas3, and two proteins that are essential for adaptation (i.e. Cas1 and Cas2)

(62,65). 35

36

Figure 2.3. Regulation of Class 1 Cas nucleases. (A) Type I-E Cascade binds to both interference (red) and priming (orange) DNA targets. Binding of dsDNA causes a conformational change in the Cse1 and Cse2 subunits (blue arrows). When bound to interference targets the Cse1 subunit adopts a closed conformation that recruits nuclease active Cas3 (red). When bound to priming targets the Cse1 subunit adopts an open conformation that does not recruit Cas3 directly, but relies on the presence of Cas1-Cas2 for the recruitment of nuclease repressed Cas3 (gray). Cas3 helicase activity produces looped sections of ssDNA which can act as sources of new spacers. Co-localization of Cas1-Cas2 may ensure that the complementary strand fragments get predominantly acquired during priming. (B) In Type III systems the multi-subunit complexes bind ssRNA complementary to the crRNA-guide and RNase active sites in the Cas7 backbone subunits (blue) cleave the RNA at regular 60-nt internals. In addition, RNA binding also activates a non-specific DNase activity of the Cas10 subunit, which in some systems also requires recognition of the 3’-end flanking sequence (red). Cleavage and release of the RNA deactivates the Cas10 subunit (gray) and ensures temporal control.

Mutated targets that elicit a priming response correspond to an “open” conformation of the Cas8e subunit, which does not effectively recruit nuclease active Cas3

(44,51,52) (Figure 2.3). However, in the presence of Cas1 and Cas2, Cascade bound to a priming target efficiently recruits nuclease-repressed Cas3 (44). Recent structural and biochemical studies performed using the Type I-F immune system from Pseudomonas aeruginosa, in which Cas2 and Cas3 are fused into a single polypeptide (Cas2/3), indicated that Cas1 and Cas2/3 form a complex, and that Cas1 represses Cas2/3 nuclease activity until activated by the target bound surveillance complex (66).

Collectively, these results suggest that the conformational state of Cascade recruits a nuclease active Cas3 for direct degradation (interference) or a nuclease repressed Cas1-

2/3 complex for rapid adaptation (priming) (44). However, in both instances Cas3 cleavage products are precursors for CRISPR adaptation (67), suggesting that priming targets do elicit some level of Cas3 nuclease activity and that primed adaptation also takes place during the interference response. It is clear Cas1, 2, and 3 work together to generate and 37 integrate new sequences into CRISPR loci, but the mechanism of allosteric regulation of

Cas3 activity by the conformational state of Cascade remains poorly understood.

Type III systems

Like Type I systems, the Type III systems also rely on large multi-subunit crRNA- guided surveillance complexes that are divided into sub-types (i.e., Type III-A to III-D)

(8). Initially these subtypes were functionally divided into either DNA or RNA targeting immune systems. However, a series of recent experiments revealed a sophisticated mechanism that explains how crRNA-guided binding to complementary RNA allosterically activates a non-sequence specific DNase activity (Figure 2.3) (68). This model reconciles earlier distinctions between RNA or DNA targeting and reveals that Type

III systems cleave both RNA and DNA targets.

The RNase and DNase active sites in Type III complexes are spatially separated, functionally integrated, and temporally controlled. RNA binding drives a conformational rearrangement that allosterically activates non-sequence-specific DNase activity in the

Cas10 subunit (69-75). The RNA target is cleaved in regular 6-nt intervals and structures of Type III complexes have identified residues in the backbone (Cas7-family) and belly subunits that are necessary for RNA cleavage (41,68,76). According to the current model, periodic cleavage of the RNA target, leads to dissociation of the RNA fragments, which restores the complex to a DNase inactive state (68,70-72).

In addition to complementarity between the crRNA-guide and the RNA protospacer, sequence 3’ of the protospacer regulates DNase activity. Complementarity between the crRNA and the RNA target that extends beyond the guide and into the 5’- 38 repeat of the crRNA inhibits DNase activation in all Type III systems (70-74,75,77), but requirements for DNase activation seem to vary between systems. In Thermotoga maritima, the DNase function is activated by protospacers lacking flanking sequence, suggesting complementation between the crRNA-guide and RNA target is the only signal necessary for DNase activation and base pairing with the 5’-repeat is inhibitory (71).

However, in several other systems, hybridization with the protospacer sequence alone is not sufficient for DNase activation. In these systems, the DNase remains inactive unless a non-complementary sequence extends from the protospacer into the 5’-repeat (72,73,75), and in some systems (e.g. Type III-B Cmr system in P. furiosus) this 3’ non- complementary RNA sequence must meet specific sequence requirements (70). The mechanistic distinctions between these systems and the precise role of the 3’ RNA in controlling DNase activation awaits structural studies of Type III complexes bound to

RNAs with 3’-flanking sequence.

Perspective

CRISPR-associated nucleases are phylogenetically and functionally diverse, yet common regulatory themes are beginning to emerge. All CRISPR systems that target DNA rely on protein mediated PAM recognition and crRNA-guided recognition of the protospacer. However, binding does not necessarily result in target cleavage. Recent data indicates that different PAM and protospacer combinations result in distinct conformational states that regulate nuclease activity. These new insights are revealing additional layers of sophistication in prokaryotic adaptive immune responses, and 39 understanding how target selection impacts the activity of these nucleases will have important implications for applications in genome editing.

Acknowledgements

Research in the Wiedenheft lab is supported by the National Institutes of Health

(P20GM103500, P30GM110732-03, R01GM110270, and R01GM108888), the National

Science Foundation EPSCoR (EPS-110134), the M. J. Murdock Charitable Trust, a young investigator award from Amgen, Gordon and Betty Moore Foundation, and the Montana

State University Agricultural Experimental Station. Research in the Jackson Lab is supported by Utah State University New Faculty Start-up funding from the Department of

Chemistry and Biochemistry, the Research and Graduate Studies Office, and the College of Science.

40

References

Paper of particular interest, published within the period of review, have been highlighted as:  of special interest

 of outstanding interest

1 W. Yang. Nucleases: diversity of structure, function and mechanism. Q Rev Biophys, 44 (2010), pp. 1–93

2 P. Mohanraju, K.S. Makarova, B. Zetsche, F. Zhang, E.V. Koonin, J. van der Oost Diverse evolutionary roots and mechanistic variations of the CRISPR-Cas systems. Science, 353 (2016), aad5147

3 A.V. Wright, J.K. Nuñez, J.A. Doudna. Biology and Applications of CRISPR Systems: Harnessing Nature’s Toolbox for Genome Engineering. Cell, 164 (2016), pp. 29–44

4 S.H. Sternberg, J.A. Doudna. Expanding the Biologist’s Toolkit with CRISPR- Cas9. Mol Cell, 58 (2015), pp. 568–574

5 W. Jiang, L.A. Marraffini. CRISPR-Cas: New Tools for Genetic Manipulations from Bacterial Immunity Systems. Annu Rev Microbiol, 69 (2015), pp. 209–228

6 A.C. Komor, A.H. Badran, D.R. Liu. CRISPR-Based Technologies for the Manipulation of Eukaryotic Genomes. Cell, 168 (2017), pp. 20–36

7 P.B. van Erp, G. Bloomer, R. Wilkinson, B. Wiedenheft. The history and market impact of CRISPR RNA-guided nucleases. Curr Opin Virol, 12 (2015), pp. 85–90

8 K.S. Makarova, Y.I. Wolf, O.S. Alkhnbashi, F. Costa, S.A. Shah, S.J. Saunders, R. Barrangou, S.J.J. Brouns, E. Charpentier, D.H. Haft, et al. An updated evolutionary classification of CRISPR–Cas systems. Nat Rev Microbiol, 13 (2015), pp. 722-736

9 S. Shmakov, O.O. Abudayyeh, K.S. Makarova, Y.I. Wolf, J.S. Gootenberg, E. Semenova, L. Minakhin, J. Joung, S. Konermann, K. Severinov, et al. Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems. Mol Cell, 60 (2015), pp. 385–397

41

10 D. Burstein, L.B. Harrington, S.C. Strutt, A.J. Probst, K. Anantharaman, B.C. Thomas, JA Doudna, JF Banfield. New CRISPR–Cas systems from uncultivated microbes. Nature, 542 (2017), pp. 237–241

11 J.K. Nuñez, L.B. Harrington, P.J. Kranzusch, A.N. Engelman, J.A. Doudna. Foreign DNA capture during CRISPR–Cas adaptive immunity. Nature, 527 (2015), pp. 535-538

12 J. Wang, J. Li, H. Zhao, G. Sheng, M. Wang, M. Yin, Y. Wang. Structural and Mechanistic Basis of PAM-Dependent Spacer Acquisition in CRISPR-Cas Systems. Cell, 163 (2015), pp. 840-53

13 D.G. Sashital, M. Jinek, J.A. Doudna. An RNA-induced conformational change required for CRISPR RNA cleavage by the endoribonuclease Cse3. Nat Struct Mol Biol, 18 (2011), pp. 680–687

14 E. Deltcheva, K. Chylinski, C.M. Sharma, K. Gonzales, Y. Chao, Z.A. Pirzada, M.R. Eckert, J. Vogel, E. Charpentier. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III Nature 471 (2011), pp. 602–607

15 M. Jinek, K. Chylinski, I. Fonfara, M. Hauer, J.A. Doudna, E. Charpentier. A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science, 337 (2012), pp. 816–821

16 M. Jinek, F. Jiang, D.W. Taylor, S.H. Sternberg, E. Kaya, E. Ma, C. Anders, M. Hauer, K. Zhou, S. Lin, et al. Structures of Cas9 Endonucleases Reveal RNA- Mediated Conformational Activation. Science, 343 (2014), p.1247997

This manuscript describes the apo Cas9 structures of Streptococcus pyogenes and Actinomyces naeslundii, which revealed apo-Cas9 adopts a catalytically inactive conformation. Electron Microscopic reconstructions of guide-RNA bound Cas9 revealed conformational rearrangements enable DNA binding.

17 H. Nishimasu, F.A. Ran, P.D. Hsu, S. Konermann, S.I. Shehata, N. Dohmae, R. Ishitani, F. Zhang, O. Nureki. Crystal Structure of Cas9 in Complex with Guide RNA and Target DNA. Cell, 156 (2014), pp. 935-949

This paper presents the crystal structure of Streptococcus pyogenes Cas9 loaded with sgRNA and bound to target DNA.

42

18 C. Anders, O. Niewoehner, A. Duerst, M. Jinek. Structural basis of PAM- dependent target DNA recognition by the Cas9 endonuclease. Nature, 513 (2014), pp. 569–573

This manuscript describes the crystal structure of Streptococcus pyogenes Cas9- sgRNA complex bound to a DNA target with 5’-NGG-3’ PAM. This structure explains how the PAM sequence is recognized and how it contributes to duplex strand separation.

19 H. Nishimasu, L. Cong, W.X. Yan, F.A. Ran, B. Zetsche, Y. Li, A. Kurabayashi, R. Ishitani, F. Zhang, O. Nureki. Crystal Structure of Staphylococcus aureus Cas9 Cell, 162 (2015), pp. 1113–1126

This paper reports the crystal structure of Staphylococcus aureus Cas9 in complex with sgRNA and target DNA. It reveals how the canonical 5′-NNGRRT-3′ PAM is recognized and highlights structural conservation and divergence compared to spyCas9

20 M. Yamada, Y. Watanabe, J.S. Gootenberg, H. Hirano, F.A. Ran, T. Nakane, R. Ishitani, F. Zhang, H. Nishimasu, O. Nureki. Crystal Structure of the Minimal Cas9 from Campylobacter jejuni Reveals the Molecular Diversity in the CRISPR- Cas9 Systems. Mol Cell, 65 (2017), pp. 1109–1121

This paper reports the crystal structure of Campylobacter jejuni Cas9 bound to sgRNA and target DNA. The structure reveals that the guide RNA contains a triple helix and explains how the canonical 5′-NNNVRYM-3′ PAM is recognized on both DNA strands.

21 H. Hirano, J.S. Gootenberg, T. Horii, O.O. Abudayyeh, M. Kimura, P.D. Hsu, T. Nakane, R. Ishitani, I. Hatada, F. Zhang, et al. Structure and Engineering of Francisella novicida Cas9. Cell, 164 (2016), pp. 950-961

This paper reports the crystal structure of Francisella novicida Cas9 bound to sgRNA and target DNA. The structure shows how the canonical 5′-NGG-3′ PAM is recognized. Structure guided mutations are used to create a variant that recognizes a relaxed 5′-YG-3′ PAM.

22 S. H. Sternberg, S. Redding, M. Jinek, E. C. Greene, J. A. Doudna. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature, 507 (2014), pp. 62–67

The authors use single molecule techniques to show that Cas9 first interrogates the PAM before hybridization with the protospacer.

43

23 M.D. Szczelkun, M.S. Tikhomirova, T. Sinkunas, G. Gasiunas, T. Karvelis, P. Pschera, V Siksnys, R Seidel. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes. Proc Natl Acad Sci U S A, 111 (2014), pp. 9798–9803

This paper reports single-molecule DNA supercoiling experiments that show in Cascade and Cas9 R-loop formation proceeds directionally starting at the PAM. PAM mutations affect R-loop formation rates while protospacer distal mutations affect R-loop stability.

24 X. Wu, D.A. Scott, A.J. Kriz, A.C. Chiu, P.D. Hsu, D.B. Dadon, A.W. Cheng, A.E. Trevino, S Konermann, S Chen, et al. Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Nat Biotechnol, 32 (2014), pp. 670–676

25 S.Q. Tsai, Z. Zheng, N.T. Nguyen, M. Liebers, V.V. Topkar, V. Thapar, N. Wyvekens, C. Khayter, A.J. Iafrate, L.P. Le, et al. GUIDE-seq enables genome- wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat Biotechnol, 33 (2014), pp. 187–197

26 F.A. Ran, L. Cong, W.X. Yan, D.A. Scott, J.S. Gootenberg, A.J. Kriz, B. Zetsche, O. Shalem, X. Wu, K.S. Makarova, et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature, 520 (2015), pp. 186–191

27 F. Jiang, K. Zhou, L. Ma, S. Gressel, J.A. Doudna. A Cas9-guide RNA complex preorganized for target DNA recognition. Science, 348 (2015), pp. 1477–1481

This paper describes the crystal structure of Streptococcus pyogenes Cas9 bound to sgRNA and reveals the seed sequence of the single-guide-RNA is pre-ordered in a pseudo A-form configuration.

28 S.H. Sternberg, B. LaFrance, M. Kaplan, J.A. Doudna. Conformational control of DNA target cleavage by CRISPR–Cas9. Nature, 527 (2015), pp. 110–113

This paper reports single molecule work that shows Cas9 binding and cleavage of target DNA first requires recognition of a PAM.

29 F. Jiang, D.W. Taylor, J.S. Chen, J.E. Kornfeld, K. Zhou, A.J. Thompson, E. Nogales, J.A. Doudna. Structures of a CRISPR-Cas9 R-loop complex primed for DNA cleavage. Science, 351 (2016), pp. 867–871

This paper reported the structure of Cas9 bound to a duplex DNA target with a non-complementary R-loop. The structure captures the HNH nuclease domain in an active conformation and explains the concerted cleavage mechanism of duplex DNA. 44

30 B. Zetsche, J.S. Gootenberg, O.O. Abudayyeh, I.M. Slaymaker, K.S. Makarova, P. Essletzbichler, S.E. Volz, J. Joung, J. van der Oost, A. Regev, et al. Cpf1 Is a Single RNA-Guided Endonuclease of a Class 2 CRISPR-Cas System. Cell, 163 (2015), pp. 759–771

31 T. Yamano, H. Nishimasu, B. Zetsche, H. Hirano, I.M. Slaymaker, Y. Li, I. Fedorova, T. Nakane, K.S. Makarova, E.V. Koonin, et al. Crystal Structure of Cpf1 in Complex with Guide RNA and Target DNA. Cell, 165 (2016), pp. 949- 962

32 D. Dong, K. Ren, X. Qiu, J. Zheng, M. Guo, X. Guan, H. Liu, N. Li, B. Zhang, D. Yang, et al. The crystal structure of Cpf1 in complex with CRISPR RNA. Nature, 532 (2016), pp. 522–526

33 P. Gao, H Yang, KR Rajashankar, Z Huang, DJ Patel. Type V CRISPR-Cas Cpf1 endonuclease employs a unique mechanism for crRNA-mediated target DNA recognition. Cell Res, 26 (2016), 901–913

34 L Liu, P Chen, M Wang, X Li, J Wang, M Yin, Y Wang. C2c1-sgRNA Complex Structure Reveals RNA-Guided DNA Cleavage Mechanism. Mol Cell, 65 (2017), pp. 310–322

35 H Yang, P Gao, KR Rajashankar, DJ Patel. PAM-Dependent Target DNA Recognition and Cleavage by C2c1 CRISPR-Cas Endonuclease. Cell, 167 (2016), pp. 1814–1828

36 Swarts DC, van der Oost J, Jinek M. Structural Basis for Guide RNA Processing and Seed-Dependent DNA Targeting by CRISPR-Cas12a. Molecular Cell, 66 (2017), pp. 221–233.e4

37 O.O. Abudayyeh, J.S. Gootenberg, S. Konermann, J. Joung, I.M. Slaymaker, D.B.T. Cox, S. Shmakov, K.S. Makarova, E. Semenova, L. Minakhin, et al. C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector. Science, 353 (2016), aaf5573

38 A. East-Seletsky, M.R. O’Connell, S.C. Knight, D. Burstein, J.H.D. Cate, R. Tjian, JA Doudna. Two distinct RNase activities of CRISPR-C2c2 enable guide-RNA processing and RNA detection. Nature, 538 (2016), pp. 270-273

39 L. Liu, X. Li, J. Wang, M. Wang, P. Chen, M. Yin, J. Li, G. Sheng, Y. Wang. Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities. Cell, 168 (2017), pp. 121–134

45

40. J. S. Gootenberg, O. O. Abudayyeh, J. W. Lee, P. Essletzbichler, A. J. Dy, J. Joung, V. Verdine, N. Donghia, N. M. Daringer, C. A. Freije, et al. Nucleic acid detection with CRISPR-Cas13a/C2c2. Science, 356 (2017), pp. 438–442

41 R.N. Jackson, B. Wiedenheft. A Conserved Structural Chassis for Mounting Versatile CRISPR RNA-Guided Immune Responses. Mol Cell, 58 (2015) pp. 722- 728

42 S.J.J. Brouns, M.M. Jore, M. Lundgren, E.R. Westra, R.J.H. Slijkhuis, A.P.L. Snijders, M.J. Dickman, K.S. Makarova, E.V. Koonin, J. van der Oost. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science, 321 (2008), pp. 960–964

43 M.M. Jore, M. Lundgren, E. van Duijn, J.B. Bultema, E.R. Westra, S.P. Waghmare, B. Wiedenheft, U. Pul, R. Wurm, R. Wagner, et al. Structural basis for CRISPR RNA-guided DNA recognition by Cascade. Nat Struct Mol Biol, 18 (2011), pp. 529–536

44 S. Redding, S.H. Sternberg, M. Marshall, B. Gibb, P. Bhat, C.K. Guegler, B. Wiedenheft, J.A. Doudna, E.C. Greene. Surveillance and Processing of Foreign DNA by the Escherichia coli CRISPR-Cas System. Cell, 163 (2015), pp. 1–12

The authors use single molecule techniques to show that nuclease active Cas3 is recruited to Cascade bound to interference targets. Cascade bound to priming targets only recruits nuclease inactive Cas3 in the presence of Cas1 and Cas2. Furthermore, they show that Cas3 nuclease activity produces short segments of ssDNA.

45 R.P. Hayes, Y. Xiao, F. Ding, P.B.G. van Erp, K. Rajashankar, S. Bailey, B. Wiedenheft, A. Ke. Structural basis for promiscuous PAM recognition in type I–E Cascade from E. coli. Nature, 530 (2016), pp. 499–503

Crystal structure of Cascade bound to DNA containing a double stranded PAM. This structure explains the PAM recognition mechanism and PAM preferences of Cascade.

46 D.G. Sashital, B. Wiedenheft, J.A. Doudna. Mechanism of foreign DNA selection in a bacterial adaptive immune system. Mol Cell, 46 (2012), 606–615

47 E. Semenova, M.M. Jore, K.A. Datsenko, A. Semenova, E.R. Westra, B. Wanner, J. van der Oost, S.J.J. Brouns, K. Severinov: Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc Natl Acad Sci U S A, 108 (2011), pp. 10098–10103

46

48 M. Rutkauskas, T. Sinkunas, I. Songailiene, M.S. Tikhomirova, V. Siksnys, R. Seidel. Directional R-Loop Formation by the CRISPR-Cas Surveillance Complex Cascade Provides Efficient Off-Target Site Rejection. Cell Rep, 10 (2015), pp. 1534–1543

49 P.B.G. van Erp, R.N. Jackson, J. Carter, S.M. Golden, S. Bailey, B. Wiedenheft Mechanism of CRISPR-RNA guided recognition of DNA targets in Escherichia coli. Nucleic Acids Res, 43 (2015), pp. 8381–8391

50 M.L. Hochstrasser, D.W. Taylor, P. Bhat, C.K. Guegler, S.H. Sternberg, E. Nogales, J.A. Doudna. CasA mediates Cas3-catalyzed target degradation during CRISPR RNA-guided interference. Proc Natl Acad Sci U S A, 111 (2014), pp. 6618-6623

51 C. Xue, N.R. Whitis, D.G. Sashital. Conformational Control of Cascade Interference and Priming Activities in CRISPR Immunity. Mol Cell, 64 (2016), pp. 826–834

The authors show that the Cse1 subunit of Cascade adopts a closed conformation when bound to interference targets and an open conformation when bound to priming targets. They demonstrate that the closed conformation is associated with interference while the open conformation attenuates interference, but increases priming.

52 C. Xue, A.S. Seetharam, O. Musharova, K. Severinov, S.J.J. Brouns, A.J. Severin, DG Sashital. CRISPR interference and priming varies with individual spacer sequences. Nucleic Acids Res, 43 (2015), pp. 10831–10847

53 T.R. Blosser, L. Loeff, E.R. Westra, M. Vlot, T. Kunne, M. Sobota, C. Dekker, S.J.J. Brouns, C. Joo. Two Distinct DNA Binding Modes Guide Dual Roles of a CRISPR-Cas Protein Complex. Mol Cell, 58 (2015), pp. 60–70

The authors show that Cascade adopts distinct conformations when bound to priming or interference targets.

54 R.N. Jackson, S.M. Golden, P.B.G. van Erp, J. Carter, E.R. Westra, S.J.J. Brouns, J. van der Oost, T.C. Terwilliger, R.J. Read, B. Wiedenheft. Crystal structure of the CRISPR RNA-guided surveillance complex from Escherichia coli. Science, 345 (2014), pp. 1473–1479

55 S. Mulepati, A. Heroux, S. Bailey. Crystal structure of a CRISPR RNA-guided surveillance complex bound to a ssDNA target. Science, 345 (2014), pp. 1479– 1484

47

56 H. Zhao, G. Sheng, J. Wang, M. Wang, G. Bunkoczi, W. Gong, Z. Wei, Y. Wang. Crystal structure of the RNA-guided immune surveillance Cascade complex in Escherichia coli. Nature, 515 (2014), pp. 147-150

57 S. Mulepati, S. Bailey. In Vitro Reconstitution of an Escherichia coli RNA-guided Immune System Reveals Unidirectional, ATP-dependent Degradation of DNA Target. J Biol Chem, 288 (2013), pp. 22184–22192

58 T. Sinkunas, G. Gasiunas, S.P. Waghmare, M.J. Dickman, R. Barrangou, P. Horvath, Siksnys V. In vitro reconstitution of Cascade-mediated CRISPR immunity in Streptococcus thermophilus. EMBO J, 32 (2013), pp. 385–394

59 E.R. Westra, P.B.G. van Erp, T. Kunne, S.P. Wong, R.H.J. Staals, C.L.C. Seegers, S. Bollen, M.M. Jore, E. Semenova, K. Severinov, et al. CRISPR immunity relies on the consecutive binding and degradation of negatively supercoiled invader DNA by Cascade and Cas3. Mol Cell, 46 (2012), pp. 595–605

60 E.R. Westra, E. Semenova, K.A. Datsenko, R.N. Jackson, B. Wiedenheft, K. Severinov, S.J.J. Brouns. Type I-E CRISPR-Cas Systems Discriminate Target from Non-Target DNA through Base Pairing-Independent PAM Recognition. PLoS Genet, 9 (2013), p. e1003742

61 P.C. Fineran, M.J.H Gerritzen, M. Suarez-Diez, T. Kunne, J. Boekhorst, S.A.F.T. van Hijum, R.H.J. Staals, S.J.J. Brouns. Degenerate target sites mediate rapid primed CRISPR adaptation. Proc Natl Acad Sci U S A, 111 (2014), pp. e1629- e1238

62 K.A. Datsenko, K. Pougach, A. Tikhonov, B.L. Wanner, K. Severinov, E. Semenova. Molecular memory of prior infections activates the CRISPR/Cas adaptive bacterial immunity system. Nat Comms 3, (2012) p. 945

63 E. Savitskaya, E. Semenova, V. Dedkov, A. Metlitskaya, K. Severinov. High- throughput analysis of type I-E CRISPR/Cas spacer acquisition in E. coli. RNA Biol, 10 (2013) pp. 716-725

64 C. Richter, R.L. Dy, R.E. McKenzie, B.N.J. Watson, C. Taylor, J.T. Chang, M.B. McNeil, R.H.J. Staals, P.C. Fineran. Priming in the Type I-F CRISPR-Cas system triggers strand-independent spacer acquisition, bi-directionally from the primed protospacer. Nucleic Acids Res, 42 (2014), pp. 8516–8526

48

65 D. Vorontsova, K.A. Datsenko, S. Medvedeva, J. Bondy-Denomy, E.E. Savitskaya, K. Pougach, M. Logacheva, B. Wiedenheft, A.R. Davidson, K. Severinov, et al. Foreign DNA acquisition by the I-F CRISPR–Cas system requires all components of the interference machinery. Nucleic Acids Res, 43 (2015), pp. 10848–10860

66. M. F. Rollins, S. Chowdhury, J. Carter, S. M. Golden, R. A. Wilkinson, J. Bondy- Denomy, G. C. Lander, B. Wiedenheft. Cas1 and the Csy complex are opposing regulators of Cas2/3 nuclease activity. Proc Natl Acad Sci U S A, (2017), doi:10.1073/pnas.1616395114

67 T. Kunne, S.N. Kieper, J.W. Bannenberg, A.I.M. Vogel, W.R. Miellet, M.D. Klein, Suarez- M. Diez, S.J.J. Brouns. Cas3-Derived Target DNA Degradation Fragments Fuel Primed CRISPR Adaptation. Mol Cell, 63 (2016), pp. 852–864

The authors show in vitro Cascade and Cas3 activity produces spacer precursors enriched in PAM sequences that are integrated into the CRISPR locus by Cas1- Cas2. These precursors are produced from both interference and priming targets.

68 G. Tamulaitis, Č. Venclovas, V. Siksnys. Type III CRISPR-Cas Immunity: Major Differences Brushed Aside. Trends Microbiol, 25 (2017) pp. 49–61

69 D.W. Taylor, Y. Zhu, R.H.J. Staals, J.E. Kornfeld, A. Shinkai, J. van der Oost, E. Nogales, J.A. Doudna. Structures of the CRISPR-Cmr complex reveal mode of RNA target positioning. Science, 348 (2015), pp. 581-585

70 J.R. Elmore, N.F. Sheppard, N. Ramia, T. Deighan, H. Li, R.M. Terns, M.P. Terns. Bipartite recognition of target RNAs activates DNA cleavage by the Type III-B CRISPR–Cas system. Gene Dev, 30 (2016), pp. 447–459.

The authors show that in the Type III-B Cmr complex from Pyrococcus furiosus binding of target RNA, and recognition of the 3'-side or "rPAM" of the target RNA, activates non-sequence specific DNase activity.

71 M.A. Estrella, F.T. Kuo, S. Bailey. RNA-activated DNA cleavage by the Type III- B CRISPR–Cas effector complex. Gene Dev, 30 (2016), 460–470.

This paper reports that in the Type III-B Cmr complex from Thermotoga maritima binding of complementary ssRNA activates non-sequence specific ssDNA nuclease activity.

49

72 M. Kazlauskiene, G. Tamulaitis, G. Kostiuk, Č. Venclovas, V. Siksnys. Spatiotemporal Control of Type III-A CRISPR-Cas Immunity: Coupling DNA Degradation with the Target RNA Recognition. Mol Cell, 62 (2016), pp. 295–306.

This paper shows that in the Type III-A Csm complex from Streptococcus thermophilus binding of target RNA activates DNase activity, while RNA cleavage and release deactivates DNase activity.

73 T.Y. Liu, A.T. Iavarone, J.A. Doudna. RNA and DNA Targeting by a Reconstituted Thermus thermophiles Type III-A CRISPR-Cas System. PLoS One, 12 (2017), p. e0170552

This manuscript reports in the Type III-A Csm complex from Thermus thermophilus complementary RNA binding activates non-sequence specific ssDNase. DNase activation also requires a lack of complementarity with the 5’- repeat of the crRNA.

74 J. Zhang, S. Graham, A. Tello, H. Liu, M.F. White. Multiple nucleic acid cleavage modes in divergent type III CRISPR systems. Nucleic Acids Res, 44 (2016), pp. 1789–1799

This paper shows that in the Type III-B and III-D complexes from Sulfolobus solfataricus complementary RNA binding activates cleavage of plasmid DNA in vitro.

75 W. Han, Y. Li, L. Deng, M. Feng, W. Peng, S. Hallstrøm, J. Zhang, N. Peng, Y.X. Liang, M.F. White, et al. A type III-B CRISPR-Cas effector complex mediating massive target DNA destruction. Nucleic Acids Res (2016), doi:10.1093/nar/gkw1274

The authors show that in the Type III-B Cmr complex from Sulfolobus islandicus cleaves RNA binding activates ssDNase activity, but activation required non- complementarity between the 5’-handle of the guide and 3’-flank of the RNA target.

76 T. Osawa, H. Inanaga, C. Sato, T. Numata. Crystal Structure of the CRISPR-Cas RNA Silencing Cmr Complex Bound to a Target Analog. Mol Cell 58, 2015, pp. 418–430

77 L.A. Marraffini, E.J. Sontheimer. Self versus non-self discrimination during CRISPR RNA-directed immunity. Nature, 463 (2010), pp. 568–571

50

CHAPTER THREE

CRYSTAL STRUCTURE OF THE CRISPR RNA-GUIDED SURVEILLANCE

COMPLEX FROM ESCHERICHIA COLI

Contribution of Authors and Co-Authors

Manuscript in Chapter 3

Author: Ryan N. Jackson

Contributions: designed the research, determined the structure and performed biochemical analyses, wrote the manuscript

Co-Author: Sarah M. Golden

Contributions: assisted with diffraction data collection and processing, performed biochemical analyses

Co-Author: Paul B.G. van Erp

Contributions: performed biochemical analyses

Co-Author: Joshua Carter

Contributions: determined the structure

Co-Author: Edze R. Westra

Contributions: wrote the manuscript

Co-Author: Stan J. J. Brouns

Contributions: wrote the manuscript

Co-Author: John van der Oost

Contributions: wrote the manuscript

Co-Author: Thomas C. Terwilliger 51

Contributions: determined the structure

Co-Author: Randy J. Read

Contributions: determined the structure

Co-Author: Blake Wiedenheft

Contributions: designed the research, determined the structure, wrote the manuscript

52

Manuscript Information

Ryan N. Jackson, Sarah M. Golden, Paul B. G. van Erp, Joshua Carter, Edze R. Westra, Stan J. J. Brouns, John van der Oost, Thomas C. Terwilliger, Randy J. Read, Blake Wiedenheft

Science

Status of Manuscript: ____ Prepared for submission to a peer-reviewed journal ____ Officially submitted to a peer-reviewed journal ____ Accepted by a peer-reviewed journal X Published in a peer-reviewed journal

American Association for the Advancement of Science Volume 345 Issue 6203 19 Sep 2014 Pages 1473-1479 DOI: 10.1126/science.1256328

53

CRYSTAL STRUCTURE OF THE CRISPR RNA-GUIDED SURVEILLANCE

COMPLEX FROM ESCHERICHIA COLI

Ryan N. Jackson1, Sarah M. Golden1, Paul B. G. van Erp1, Joshua Carter1, Edze

R. Westra2†, Stan J. J. Brouns2, John van der Oost2, Thomas C. Terwilliger3, Randy J.

Read4, Blake Wiedenheft1

1Department of Microbiology and Immunology, Montana State University,

Bozeman MT 59717, USA. 2Laboratory of Microbiology, Department of Agrotechnology and Food Sciences, Wageningen University, Dreijenplein 10, 6703 HB Wageningen,

Netherlands. 3Bioscience Division, Los Alamos, NM 87545 4Department of Haematology,

University of Cambridge, Cambridge Institute for Medical Research, Cambridge CB2

0XY, U.K. †Environment and Sustainability Institute, University of Exeter, Penryn

Campus, Penryn, Cornwall TR10 9FE, England.

54

Abstract

Clustered regularly interspaced short palindromic repeats (CRISPRs) are essential components of RNA-guided adaptive immune systems that protect bacteria and archaea from viruses and plasmids. In Escherichia coli, short CRISPR-derived RNAs (crRNAs) assemble into a 405 kDa multi-subunit surveillance complex called Cascade (CRISPR- associated complex for antiviral defense). Here we present the 3.24 Å resolution x-ray crystal structure of Cascade. Eleven proteins and a 61-nucleotide crRNA assemble into a sea-horse-shaped architecture that binds double-stranded DNA targets complementary to the crRNA-guide sequence. Conserved sequences on the 3’- and 5’-ends of the crRNA are anchored by proteins at opposite ends of the complex, while the guide sequence is displayed along a helical assembly of six interwoven subunits that present 5-nucleotide segments of the crRNA in pseudo A-form configuration. The structure of Cascade suggests a mechanism for assembly and provides insights into the mechanisms of target recognition.

55

Introduction

CRISPR loci provide the molecular memory of an adaptive immune system that is prevalent in bacteria and archaea (1-5). Each CRISPR locus consists of a series of short repeats separated by non-repetitive spacer sequences acquired from foreign genetic elements such as viruses and plasmids. CRISPR loci are transcribed, and the long primary transcripts are processed into a library of short CRISPR-derived RNAs (crRNAs) that contain sequences complementary to previously encountered invading nucleic acids.

CRISPR-associated (Cas) proteins bind each crRNA, and the resulting ribonucleoprotein complexes target invading nucleic acids complementary to the crRNA guide. Targets identified as foreign are subsequently degraded by dedicated nucleases.

Phylogenetic and functional studies have identified three main CRISPR-system

Types (I, II, and III) and 11 subtypes (IA-F, IIA-C, IIIA-B) (6). The Type IE system from

Escherichia coli K12 consists of a CRISPR locus and eight cas genes (Figure 3.1A). Five of the cas genes in this system encode for proteins that assemble with the crRNA into a large complex called Cascade (CRISPR-associated complex for antiviral defense) (7).

Efficient detection of invading DNA relies on complementary base pairing between the

DNA target and crRNA-guide sequence, as well as recognition of a short sequence motif immediately adjacent to the target called a protospacer-adjacent motif (PAM) (8-10).

Target recognition by Cascade triggers a conformational change and recruits a trans-acting nuclease-helicase (Cas3) that is required for destruction of invading target (8, 11-15).

However, an atomic resolution understanding of Cascade assembly and CRISPR RNA- guided surveillance has not been available. 56

To understand the mechanism of crRNA-guided surveillance by Cascade, we determined the 3.24 Å resolution x-ray crystal structure of the complex (Figure 3.1). The structure explains how the 11 proteins assemble with the crRNA into an interwoven architecture that presents discrete segments of the crRNA for complementary base pairing.

Overall, the Cascade structure reveals features required for complex assembly, and provides insights into the mechanisms of target recognition.

Results

Overview of the Cascade structure

We determined the x-ray crystal structure of Cascade by molecular replacement, using the 8 Å cryo-EM map as a search model (Figure 3.1, Table S3.1, and Figure S3.1)

(12). Initial phases were improved and extended to 3.24 Å by averaging over non- crystallographic symmetry (16). The asymmetric unit contains two copies of Cascade that superimpose with an average root-mean-square deviation (rmsd) of 1.29 Å for equivalently positioned C atoms (Figure S3.2). Here we focus our description on complex 1, but both assemblies consist of 11 protein subunits and a single 61-nt crRNA that traverses the length of the complex. Nine of the 11 Cas proteins make direct contact with the crRNA and eight of the nine RNA-binding proteins contain a modified RNA Recognition Motif (RRM)

(Figure 3.1B-D). The 5’ and 3’ ends of the crRNA are derived from the repeat region of the CRISPR RNA, and are bound at opposite ends of the sea-horse-shaped complex. Cas6e binds the 3’ end of the crRNA at the head of the complex, while the 5’-end of the crRNA is sandwiched between three protein subunits (Cas5, Cas7.6 and Cse1) in the tail (Figure 57

3.1C). The head and tail of the complex are connected along the belly by two Cse2 subunits and by a helical backbone of six Cas7 proteins (Cas7.1 - Cas7.6). This assembly creates an interwoven ribonucleoprotein structure that kinks the crRNA at 6-nt intervals (Figure

3.1D).

Figure 3.1. X-ray crystal structure of Cascade. (A) The Type IE CRISPR-mediated immune system in E. coli K12 consists of eight cas genes and one CRISPR locus. The CRISPR locus consists of a series of 29-nucleotide repeats (black diamonds) separated by 32-nucleotide spacer sequences (red cylinders). (B) Orthogonal views of the Cascade structure. (C) Schematic of Cascade colored according to panel B. Kinked bases are numbered. (D) Cascade consists of an uneven stoichiometry of five different Cas proteins and a single crRNA. The ‘thumb’ of each backbone protein folds over the top of the crRNA creating a kink in the RNA at 6-nt intervals (-1, 6, 12, 18, 24, and 30). 58

Mechanism of RNA recognition by the Cas6e endonuclease

Cas6 family proteins are phylogenetically diverse, but all Cas6 proteins are metal- independent endoribonucleases that selectively bind and cleave long CRISPR RNA transcripts (Figure 3.2A) (7, 8, 17-20). Our structure reveals that the E. coli Cas6e protein consists of tandem RRMs connected by an eight-residue linker (Figure 3.2). Each RRM

(also called a ferredoxin-like fold) consists of a conserved β1 α1 β2 β3 α2 β4 arrangement in which the β-strands are arranged in a four-stranded antiparallel β-sheet and the two helices pack together on one side of the sheet. The β-sheets in each of the two RRMs face one another, creating a V-shaped cleft along one face of the protein (Figure 3.2B). This cleft was initially predicted to bind RNA (21), but a positively charged surface on the opposite face of the protein makes significant electrostatic contacts with the 3′ strand of the crRNA stem-loop (Figure 3.2C). In addition, a positively charged groove-loop

(residues 90-119) on the C-terminal RRM domain makes extensive electrostatic contacts with the major groove of the crRNA stem-loop, including base specific contacts with C49,

G48, and G51 (Figure 3.2D and Figure S3.3). Residues at the base of this loop (N91, K94,

N99, R102, C112, I116) make base-specific (G35, U36, U37) and hydrophobic (A34) contacts with nucleotides 5’ of the stem-loop (Figure 3.2D, Figure S3.3, and S3.4). We expect other Cas6e proteins make similar contacts, but this portion of the crRNA has not been included in previous studies. 59

Figure 3.2. Mechanism of crRNA recognition by Cas6e. (A) Schematic of Cas6e bound to the stem-loop of the CRISPR RNA repeat. (B) Structure of Cas6e bound to the 3’ stem- loop of the crRNA. A β-hairpin, referred to as the ‘groove-loop’, inserts into the major groove of the crRNA stem-loop. Cas6e binding positions the scissile phosphate into the endonuclease active site. (C) Electrostatic surface representation of Cas6e illustrates how the positively charged ‘groove-loop’ fits into the major groove of the crRNA stem-loop. (D) The ‘groove-loop’ makes sequence specific interactions with nucleotides 5’ of the stem-loop. 60

Pre-cleavage recognition of the CRISPR RNA relies on base-specific interactions within the stem-loop and nucleotides on the 3’ side of the stem-loop, which help position the scissile phosphate in the active site (18). After cleavage of the primary CRISPR transcript, Cas6e remains tightly associated with the stem-loop of the mature crRNA

(Figure 3.3). The V-shaped cleft, opposite the RNA binding face of Cas6e, provides a binding site for a short helix from Cas7.1 that tethers Cas6e to the helical backbone of

Cascade (Figure 3.3B-D).

Figure 3.3. Connecting the head to the backbone. (A) Schematic of Cascade highlighting the connection between Cas6e and Cas7.1. (B-C) A short helix located on the thumb of Cas7.1 fits in a groove between the N- and C-terminal RRMs on Cas6e. The Cas6e ‘helix- binding groove’ is located opposite the crRNA-binding surface. (D) Conserved hydrophobic residues (Phe200, Thr201 and Trp199) are positioned in binding pockets in the Cas6e V-shaped cleft.

Assembly of the Cas7 backbone.

The backbone of Cascade is composed of six Cas7 proteins that oligomerize along the crRNA forming an interwoven architecture that presents the crRNA-guide sequence in 61 six discrete segments (Figure 3.4A and Figure S3.3). Each segment consists of a buried nucleotide followed by five solvent-accessible bases that are ordered in a pseudo A-form configuration by interactions with three different protein subunits (Figure 3.4A-C). The

Cas7 protein folds into a structure shaped like a right hand (Figure 3.4C and Figure S3.5)

(22, 23). This shape is created by a modified RRM that forms the palm, a helical domain resembles fingers (residues 59-181), a 30 amino acid loop takes on the shape of a thumb

(residues 193-223), and two smaller loops inserted in the RRM form a web between the thumb and the fingers (Figure 3.4C). Unlike most RRMs, which bind to RNA using conserved residues positioned on the face of the antiparallel β-sheet, our structure reveals a series of interactions with the phosphate backbone that are primarily limited to the first

α-helix (α1) of the RRM, the web, and the thumb (Figure 3.4C and Figure S3.5). The first

α-helix of most RRMs is positioned on the backside of the β-sheet, and does not directly contact the RNA. However, in Cas7 the α1-helix is positioned perpendicular to and off to one side of the central β-sheet (Figure 3.4C). Conserved residues in the α1-helix interact with three consecutive phosphates in a way that introduces two consecutive ~90° turns in the backbone of the crRNA (Figure 3.4, Figure S3.3 and S3.6). These chicanes in the crRNA occur at a regular 6-nt periodicity defined by the distance between α1-helices on adjacent Cas7 subunits. Each chicane is separated by 5 bases presented to the solvent in pseudo A-form, while the 6th base is flipped out of the helical presentation and covered by the thumb of an adjacent Cas7 molecule (Figure 3.4). The position of each thumb is stabilized by electrostatic interactions with the 1-helix on the palm of an adjacent 62 molecule and mutations in the thumb of homologous Cas7 proteins have been shown to significantly reduce RNA binding affinities (23). 63

64

Figure 3.4. Assembly of backbone creates an interwoven structure that presents segments of the crRNA for target binding. (A) The Cas7 subunits bind the crRNA in a right-handed helical arrangement where the “thumbs” of Cas7.2 to Cas7.6 fold over the top of the crRNA, kinking every 6th nucleotide. (B) Cas7 binding subdivides the crRNA into six segments that are pre-ordered in an A-form like conformation. Idealized RNA:DNA hybrids are superimposed on each pre-ordered segment of the crRNA and the rmsd for each section is indicated. (C) Each Cas7 subunit is shaped like a right hand with fingers (helical domain), palm (modified RRM), webbing, and a thumb. The inset is a zoomed in view of segments 1-5 superimposed on one another. Key residues on the thumbs that flank each segment are indicated. (D) Electrophoretic mobility shift assays of dsDNA substrates that contain mismatches with the crRNA at 6-nt intervals. Equilibrium dissociation constants (KD) are an average from three independent experiments.

The interwoven arrangement of interlocking Cas7 subunits divides the crRNA into six segments (Figure 3.4A and D). The first five segments consist of a pattern of five ordered nucleotides that are book-ended by thumbs that fold over every 6th-nucleotide in the crRNA-guide sequence. This suggests that every 6th-nucleotide of the crRNA-guide may not participate in target recognition. To test this hypothesis, we determined binding affinity for Cascade to double-stranded DNA targets that were either 100% complementary to the crRNA-guide or mismatched at 6-nt intervals (Figure 3.4D and Table S3.2). The equilibrium dissociation constant (KD) for a target that contains a PAM (5’-CAT-3’), and a target sequence complementary to the crRNA-guide is 1.6 nM (Figure 3.4D and Figure

S3.7). Mutations in the target that disrupt base pairing at every 6th position (positions 6, 12,

18, 24 and 30) have no measurable defect in target binding, whereas mutations on either side of every 6th position result in major binding defects (Figure 3.4D). Mutations at positions 5, 11, 17, 23, and 29, result in binding affinities that are more than two orders of magnitude weaker than targets that are either 100% complementary or mutated at every 6th position. The binding defect is even more pronounced for targets with mismatches at positions 7, 13, 19, 25 and 31 (Figure 3.4D). 65

Complementarity between the crRNA-guide and the target is critical at positions 1-

5, 7 and 8 (9, 10, 24, 25). This portion of the crRNA-guide is called the ‘seed’ sequence and it has been suggested that helical ordering of these bases may explain their importance in target binding. However, the helical arrangement of bases in segment 1 (positions 1-5) of the crRNA-guide is not significantly different from segments 2, 3, 4, and 5 (Figure 3.4C).

In fact, the ordered nucleotides in segments 1 through 5 superimpose with an average rmsd of 0.45Å, suggesting that the importance of the seed in target recognition may have more to do with the location of this sequence relative to the PAM, rather than preferential pre- ordering of the bases.

The helical display of each segment is induced by amino acids (T201, L214, W199 and F200) located on the Cas7 thumbs that stack with bases on the 5’ and 3’ ends of each segment (Figure 3.4C and Figure S3.3). The first two bases in each segment are ordered in an A-form configuration, but the third base is nudged out of ideal A-form by a conserved methionine (M166) that inserts between the 3rd- and 4th-nucleotides in each segment

(Figure 3.4C). Many of the amino acids important for ordering the bases in each segment are located on the thumbs that flank segments 1-5. Segment 6 is not flanked by a thumb on the 3’ end and the bases in these segments are more flexible (Figure 3.4B and Figure S3.5).

Unlike the other five Cas7 subunits, the thumb on Cas7.1 contains a short helix that inserts into the hydrophobic V-shaped cleft of Cas6e connecting the Cas6e head to the Cas7 backbone (Figure 3.3).

In addition to the Cas7 backbone, Cas6e is connected to the body of Cascade via interactions with Cse2 (Figure 3.1 and Figure S3.8). The Cse2 proteins form a head-to-tail 66 dimer that assembles along the belly of Cascade making contacts with the thumb and web of Cas7 proteins (Figure 3.1C and Figure S3.8). Although the Cse2 subunits do not make direct contacts with the crRNA, electrostatic calculations show that both faces of the Cse2 dimer are positively charged, indicating a possible role for Cse2 in stabilizing the bound and displaced strands of the DNA target (Figure S3.8) (12, 26). Comparison of the two

Cascade assemblies in the asymmetric unit reveals that Cse2.1 of assembly 2 is shifted by

7 Å away from the equivalent position in assembly 1 and Cas6e is rotated ~16° (Figure

S3.2). This suggests that rotation of the head can influence the position of the Cse2 subunits

(12).

Programmed tail assembly

CRISPR RNA processing results in a library of mature crRNAs that have a conserved 8-nt ‘handle’ on the 5’-end that is derived from the CRISPR repeat sequence

(Figure 3.1 and 3.2). These nucleotides, numbered -8 to -1 according to convention, function as a molecular signal that initiates assembly of Cas7.6, Cas5e and Cse1. The α1- helix of Cas7.6 introduces a final 5’-chicane in the crRNA by interacting with nucleotides that straddle the boundary between the 5’-handle and seed sequence (Figure 3.5). If the oligomeric assembly of Cas7s were to continue along the crRNA in the 5’ direction then the remaining 6-nts would be ordered across the web of the next Cas7 subunit. However, these six nucleotides (-8 to -3) are recognized by Cas5e, which may block propagation of

Cas7 oligomerization at the 5’-end of the crRNA, induce a conformational change in the finger domain of Cas7.6, and provide a platform for the recruitment of Cse1 to the tail

(Figure 3.5). 67

The Cas5e protein adopts a ‘right-handed fist-shape’ structure where the thumb arches across the top of the fist (Figure 3.5B and Figure S3.9). The fist is composed of a modified RRM that includes a 50 amino acid insertion between β-strands 2 and 3 that takes on the shape of a thumb. The Cas5e thumb, which bears no recognizable sequence similarity to the Cas7 thumb, performs a very similar function by folding over the top of the kinked base (nucleotide -1), and positions the first nucleotide of the seed in an A-form configuration (Figure 3.5B and Figure S3.3). However, unlike the straight thumb on Cas7 proteins, the Cas5e thumb arches over the top of the fist and interacts with the finger domain of Cas7.6. Structural alignments of Cas7.6 with the other Cas7 subunits reveal a

~180° rotation of the finger domain that accommodates the Cas5e thumb and creates a 28

Å gap between the finger domains of Cas7.5 and Cas7.6 (Figure 3.5C). A recent cryo-EM structure of Cascade bound to dsDNA reveals that the enlarged separation between these two domains accommodates the dsDNA target (11). Modeling our crystal structure into the cryo-EM density reveals a lysine-rich helix (K137, K138, K141, and K144) on Cas7.5 and

Cas7.6 that may play a role in stabilizing the dsDNA during target recognition (Figure

S3.10). 68

Figure 3.5. Mechanism of tail assembly. (A) Schematic view of the 5’ tail. Base specific binding pockets and the 1-helix of Cas5e are highlighted. (B) Cas5e (orange) is composed of a modified RRM and a thumb that interacts with Cas7.6 and the crRNA. The AAC triplet of the 5’ handle is indicated, and insets highlight the three base-specific binding pockets. (C) The finger domain of Cas7.6 (blue) is rotated 180° relative to the finger domain of the other Cas7 proteins (white). This rotation increases the distance between the finger domains from 16 to 28 Å. The thumb of Cas5e would clash with the canonical orientation of the Cas7 finger domain (white), suggesting that the rotation of the Cas7.6 finger domain is influenced by Cas5e binding. (D) The L1-helix of Cse1 fits snugly into a pore created by the thumb of Cas5e. Inset shows the base-specific interactions made between L1 residues and the AAC triplet of the 5’-handle.

The last 7-nts on the 5’-end of the crRNA form a unique S-shaped curve that follows along the arch of the Cas5e thumb, swings across the web of Cas7.6 and the final 69

3 bases (-8A, -7U, and –6A) fit into base-specific binding pockets positioned along the top of the glycine rich α1-helix on Cas5e (Figure 3.5 and Figure S3.9). Nucleotides -5A, -4A, and -3C stack into a well-ordered triplet, while the cytosine at position -2 hangs vertically behind this triplet, and hydrogen bonds with the phosphate of nucleotide -4A. Mutations at the -2 position interfere with Cascade assembly (9), and the structure reveals that the cytosine at this position participates in maintaining the S-shaped curve in the 5’-handle

(Figure 3.5B).

Cas5 structures from distantly related CRISPR systems also contain a glycine rich

α1-helix and a positively charged binding pocket that may play a similar role in recognition of nucleotides in the 5’-handle of the crRNA (Figure S3.9) (27-29). Cas5 proteins from

Type IC systems have an additional C-terminal extension that contains an endonuclease active site (29). In these systems, Cas5d is the CRISPR-specific endoribonuclease responsible for CRISPR RNA processing and structural alignments with Cas5e suggest that Cas5d endonucleases may recognize the 5’-handle of the crRNA rather than the 3’ stem-loop (Figure S3.11). These observations may explain why Cas5d proteins no longer cleave crRNA substrates when portions of the 5’-handle are mutated or removed (27, 29).

The Cas5e thumb arches over the top of the fist creating a cylindrical pore that permits access to the nucleotides in the 5’-handle (Figure 3.5B-D). This pore is a docking module for a short α-helix on Cse1. This helix is on a loop, previously called L1 (residues

130-143) that is disordered in the crystal structure of the Cse1 protein from T. thermophilus

(30, 31). In the Cascade structure the L1-helix inserts into the Cas5e helix-binding pore and makes base-specific interactions with the AAC triplet (Figure 3.5A and D) (30). 70

Cse1 is a large two-domain protein that adopts a unique globular fold that contains a metal-ion coordinated by four cysteines (C140, C143, C250, and C253), and a C-terminal four-helix bundle (Figure S3.12). The metal-ion binding motif creates a knob on the end of a loop that may be involved in positioning the L1-helix for docking. In addition to the docking interaction by L1, the globular domain of Cse1 also makes contacts with the modified RRM of Cas5e, and the four-helix bundle on Cse1 extends off the top of the globular domain, making contacts with the C-terminal domain of Cse2.2 (Figure S3.8).

This interaction completes the structural bridge that connects the four-helix bundle of the

Cse1 tail to the Cas6e head.

Discussion

The x-ray crystal structure of Cascade explains how the 12 subunits of this complex assemble into an RNA-guided surveillance machine that targets dsDNA. CRISPR RNA processing by Cas6e is essential for RNA-guided protection from invading DNA (7). Cas6e recognizes the CRISPR RNA repeat sequence through interactions with the RNA stem- loop and specific interactions with bases on the 5’- and 3’- sides of the stem-loop (Figure

3.2 and Figure S3.4) (18). After cleavage Cas6e remains tightly associated with the 3’ stem- loop of the mature crRNA and this sub-complex may serve as a platform for the ordered assembly of the remaining 10 protein subunits that compose the backbone, tail, and belly of Cascade (Supplemental movie S3.1).

Unlike Cas6e and Cas5e, which make sequence-specific interactions with portions of the CRISPR repeat sequence, the Cas7 proteins polymerize along the crRNA via non- sequence specific interactions (Figure 3.4). The structure of Cascade reveals a common 71 thumb-like feature on Cas7 and Cas5e proteins that is critical to the oligomeric assembly of the helical backbone. The thumb of each Cas7 protein folds over the top of the crRNA and fits into a positively charged crease on the palm of the adjacent Cas7 protein (Figure

3.4). This assembly creates an interwoven architecture that simultaneously protects the crRNA from degradation by cellular nucleases, while presenting a series of 5-nts segments for complementary base pairing to a target. EM structures of crRNA-guided surveillance complexes from Type I, Type III-A and Type III-B systems reveal a similar helical backbone structure, suggesting that this architecture may be a conserved feature of Type I and Type III CRISPR-systems (11, 12, 23, 24, 29, 32-34). Indeed, crystal structures of

Csa2 (Type IA) (23) and Csm3 (Type IIIA) (22) reveal modified RRMs with large disordered loops at the same location as the E. coli Cas7 thumb, and a mutation in the predicted thumb of Csa2 has been shown to disrupt crRNA binding (Figure S3.5) (23).

Pre-ordering of crRNA-guide plays an important role in target recognition by reducing the entropic penalty associated with helix formation and provides a thermodynamic advantage for target binding (25). Argonaute proteins enhance target detection using a similar strategy and a structural comparison of Cascade to eukaryotic

Argonautes reveals a similar “kink helix” positioned between nucleotides 6 and 7 (Figure

S3.13) (35). However, in Argonautes there is no thumb that covers the kinked base, and it is expected that target hybridization may release the RNA for contiguous duplex formation

(36). Recent crystal structures of the Cas9 protein suggest a similar protein mediated pre- ordering of the RNA-guide (37, 38), and a structure of the target bound complex suggests that the RNA-DNA hybrid forms a contiguous A-form duplex (38). 72

Target detection by Cascade relies on protein-mediated recognition of a three- nucleotide PAM and crRNA-guided hybridization to the target. PAM recognition has been proposed to destabilize the target DNA duplex and initiate crRNA-guided strand invasion.

Loop-1 (L1) in Cse1 has been implicated in this process and the structure explains why mutations in L1 result in Cascade assembly defects (Figure 3.5) (30). However, the structure of Cascade without DNA does not explain how Cascade recognizes the PAM.

Structures of Cascade in association with DNA and Cas3 may provide additional insights into the interplay of Cascade and Cas3 in the process of RNA-guided DNA interference.

Acknowledgements

The authors are grateful to Jane Richardson and David Richardson for technical suggestions and discussion. Airlie McCoy for implementing the EM scale factor refinement in Phaser. X-ray diffraction data was collected with assistance from Jay Nix at

ALS beamline 4.2.2 (DE-AC02-05CH11231), Ruslan Sanishvili and Craig Ogata at APS beamline 23-ID (Y1-GM-1104), the Structural Biology Center at APS 19-ID (DE-AC02-

06CH11357) and SSRL (DE-AC02-76SF00515 and P41GM103393). ERW received funding from the People Program (Marie Curie Actions) of the European Union’s Seventh

Framework Program (FP7/2007-2013) under REA grant agreement n⁰ [327606]. SJJB is supported by a Vidi grant from the Netherlands Organization of Scientific Research

(864.11.005) and JvdO by a Vici grant (865.05.001). RJR is supported by a Principal

Research Fellowship from the Wellcome Trust (grant No. 082961/Z/07/Z) and a grant

(GM063210) from the NIH. JC is supported by a grant for undergraduate research from 73 the Howard Hughes Medical Institute (#52006931). RNJ is supported by the NRSA postdoctoral fellowship (F32 GM108436) from the NIH. Research in the Wiedenheft lab is supported by the National Institutes of Health (P20GM103500 and R01GM108888), the

National Science Foundation EPSCoR (EPS-110134), the M.J. Murdock Charitable Trust, and the Montana State University Agricultural Experimental Station. Atomic coordinates have been deposited into the Protein Data Bank with accession code 4TVX.

74

References

1. R. Sorek, C. M. Lawrence, B. Wiedenheft, CRISPR-mediated adaptive immune systems in bacteria and archaea. Annu Rev Biochem 82, 237-266 (2013).

2. J. van der Oost, E. R. Westra, R. N. Jackson, B. Wiedenheft, Unravelling the structural and mechanistic basis of CRISPR-Cas systems. Nat Rev Microbiol 12, 479-492 (2014).

3. J. Bondy-Denomy, A. R. Davidson, To acquire or resist: the complex biological effects of CRISPR-Cas systems. Trends in microbiology 22, 218-225 (2014). 4. B. Wiedenheft, S. H. Sternberg, J. A. Doudna, RNA-guided genetic silencing systems in bacteria and archaea. Nature 482, 331-338 (2012).

5. J. Reeks, J. H. Naismith, M. F. White, CRISPR interference: a structural perspective. Biochem J 453, 155-166 (2013).

6. K. S. Makarova et al, Evolution and classification of the CRISPR-Cas systems. Nat Rev Microbiol 9, 467-477 (2011).

7. S. J. Brouns et al, Small CRISPR RNAs guide antiviral defense in prokaryotes. Science 321, 960-964 (2008).

8. M. M. Jore et al, Structural basis for CRISPR RNA-guided DNA recognition by Cascade. Nat Struct Mol Biol 18, 529-536 (2011).

9. E. R. Westra et al, Type I-E CRISPR-Cas Systems Discriminate Target from Non-Target DNA through Base Pairing-Independent PAM Recognition. PLoS Genet 9, e1003742 (2013).

10. E. Semenova et al, Severinov, Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc Natl Acad Sci U S A 108, 10098-10103 (2011).

11. M. L. Hochstrasser et al, CasA mediates Cas3-catalyzed target degradation during CRISPR RNA-guided interference. Proc Natl Acad Sci U S A, (2014).

12. B. Wiedenheft et al, Structures of the RNA-guided surveillance complex from a bacterial immune system. Nature 477, 486-489 (2011).

13. E. R. Westra et al, CRISPR immunity relies on the consecutive binding and degradation of negatively supercoiled invader DNA by Cascade and Cas3. Mol Cell 46, 595-605 (2012). 75

14. R. N. Jackson, M. Lavin, J. Carter, B. Wiedenheft, Fitting CRISPR-associated Cas3 into the Helicase Family Tree. Current Opinion in Structural Biology 24, 106–114 (2014).

15. E. R. Westra, D. Swarts, R. Staals, M. M. Jore, S. J. J. Brouns, J. v. d. Oost, The CRISPRs, They Are A-Changin': How Prokaryotes Generate Adaptive Immunity. Annu. Rev. Genet. 46, 311-339 (2012).

16. T. C. Terwilliger, Finding non-crystallographic symmetry in density maps of macromolecular structures. J Struct Funct Genomics 14, 91-95 (2013).

17. E. M. Gesner, M. J. Schellenberg, E. L. Garside, M. M. George, A. M. MacMillan, Recognition and maturation of effector RNAs in a CRISPR interference pathway Nat Struct Mol Biol 18, 688-692 (2011).

18. D. G. Sashital, M. Jinek, J. A. Doudna, An RNA induced conformational change required for CRISPR RNA cleavage by the endonuclease Cse3. Nat Struct Mol Biol 18, 680-687 (2011).

19. J. Carte, R. Wang, H. Li, R. M. Terns, M. P. Terns, Cas6 is an endoribonuclease that generates guide RNAs for invader defense in prokaryotes. Genes and Development 22, 3489-3496 (2008).

20. R. E. Haurwitz, M. Jinek, B. Wiedenheft, K. Zhou, J. A. Doudna, Sequence- and structure-specific RNA processing by a CRISPR endonuclease. Science 329, 1355-1358 (2010).

21. A. Ebihara, M. Yao, R. Masui, I. Tanaka, S. Yokoyama, S. Kuramitsu, Crystal structure of hypothetical protein TTHB192 from Thermus thermophilus HB8 reveals a new protein family with an RNA recognition motif-like domain. Protein Sci 15, 1494-1499 (2006).

22. A. Hrle, A. A. Su, J. Ebert, C. Benda, L. Randau, E. Conti, Structure and RNA- binding properties of the type III-A CRISPR-associated protein Csm3. RNA Biol 10, 1670-1678 (2013).

23. N. G. Lintner et al, Structural and Functional Characterization of an Archaeal Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated Complex for Antiviral Defense (CASCADE). J Biol Chem 286, 21643-21656 (2011).

24. B. Wiedenheft,et al, RNA-guided complex from a bacterial immune system enhances target recognition through seed sequence interactions. Proc Natl Acad Sci U S A 108, 10092-10097 (2011). 76

25. T. Kunne, D. C. Swarts, S. J. Brouns, Planting the seed: target recognition of short guide RNAs. Trends in microbiology 22, 74-83 (2014).

26. K. H. Nam, Q. Huang, A. Ke, Nucleic acid binding surface and dimer interface revealed by CRISPR-associated CasB protein structures. FEBS letters 586, 3956- 3961 (2012).

27. E. L. Garside et al, Cas5d processes pre-crRNA and is a member of a larger family of CRISPR RNA endonucleases. RNA Biol 18, 2020-2028 (2012).

28. Y. Koo, D. Ka, E. J. Kim, N. Suh, E. Bae, Conservation and variability in the structure and function of the Cas5d endoribonuclease in the CRISPR-mediated microbial immune system. J Mol Biol 425, 3799-3810 (2013).

29. K. H. Nam et al, Cas5d Protein Processes Pre-crRNA and Assembles into a Cascade-like Interference Complex in Subtype I-C/Dvulg CRISPR-Cas System. Structure 20, 1574-1584 (2012).

30. D. G. Sashital, B. Wiedenheft, J. A. Doudna, Mechanism of foreign DNA selection in a bacterial adaptive immune system. Mol Cell 48, 606-615 (2012).

31. S. Mulepati, A. Orr, S. Bailey, Crystal structure of the largest subunit of a bacterial RNA-guided immune complex and its role in DNA target binding. J Biol Chem 287, 22445-22449 (2012).

32. C. Rouillon et al, Structure of the CRISPR interference complex CSM reveals key similarities with cascade. Mol Cell 52, 124-134 (2013).

33. M. Spilman et al, Structure of an RNA silencing complex of the CRISPR-Cas immune system. Mol Cell 52, 146-152 (2013).

34. R. H. Staals et al, Structure and activity of the RNA-targeting Type III-B CRISPR-Cas complex of Thermus thermophilus. Mol Cell 52, 135-145 (2013).

35. N. T. Schirle, I. J. MacRae, The crystal structure of human Argonaute2. Science 336, 1037-1040 (2012).

36. G. Sheng et al, Structure-based cleavage mechanism of Thermus thermophilus Argonaute DNA guide strand-mediated DNA target cleavage. Proc Natl Acad Sci U S A 111, 652-657 (2014).

37. M. Jinek et al, Structures of Cas9 endonucleases reveal RNA-mediated conformational activation. Science 343, 1247997 (2014); published online EpubMar 14 (10.1126/science.1247997). 77

38. H. Nishimasu et al, Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell 156, 935-949 (2014).

78

Materials and Methods

Cascade expression and purification

The protein and RNA components of Cascade were cloned and co-expressed in E. coli Bl21 (DE3) cells using four different expression vectors (7, 8, 12). Expression was induced using 0.2 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG) at OD600nm=0.5, and grown overnight at 16-18°C. Cells were pelleted by centrifugation (5,000g for 10 min), suspended in buffer (100mM Tris-HCl pH 8.0, 150mM NaCl, 1mM EDTA, and 1mM

TCEP), and frozen at -80°C. Cells were lysed by sonication and lysates were clarified by centrifugation (22,000g for 20 min). The protein and RNA components of Cascade self- assemble in vivo and the complex was affinity purified using an N-terminal Strep-II tagged

Cse2 subunit with Strep-Tactin Superflow Plus resin (Qiagen). Cascade was eluted from the Strep-Tactin resin in lysis buffer containing 2.5 mM desthiobiotin. The Strep-II tag was removed with HRV-3C protease, followed by a second purification using the Strep-Tactin

Superflow Plus resin. Cascade was concentrated and further purified using a HiLoad 16/60

Superdex200 gel filtration column (GE Healthcare) equilibrated with gel filtration buffer

(50mM Tris-HCl, pH 7.5).

Cascade crystallization and structure determination

Cascade crystals were grown at 4, 12, 18 and 24°C from equal volumes (2μl + 2μl) of concentrated Cascade protein solution (A280=30 to 45) and mother liquor (0.1 M HEPES pH 6.5-7.5, 0.05 – 0.20 M NaSCN, 0.150 NaCl, and 8-14% (w/v) PEG 8000) using hanging drop vapor diffusion or microbatch. Hanging drop experiments were carried out using

EasyXtal 15-Well DG-Tool (Qiagen) plates and microbatch experiments were performed 79 using Impact Plates (HR3-293 Hampton Research), and ClearSeal film (HR4-521).

Cascade crystals generally appeared after one to two weeks of incubation and were cryo- protected for x-ray diffraction experiments using a stepwise exchange within the crystallization drop that gradually increased the concentration of PEG 400 to 20% (v/v). In the initial step, 1μl of mother liquor containing a 5% higher concentration of PEG 8000 was added to the drop. The concentration of PEG 400 was then raised from 0-20% in 5% v/v increments by removing 1μl of the crystallization drop and adding 1 μl of cryo- protectant solution. After about 30 seconds in the final cryo-protectant, crystals were scooped and flash-cooled in liquid Nitrogen. Diffraction data were collected at beamline

4.2.2 of the Advanced Light Source (Lawrence Berkeley National Laboratory), beamlines

19-ID and 23-ID of the Advanced Photon Source (Argonne National Laboratory), and beamline 12-2 of the Stanford Synchrotron Radiation Light source (SLAC National

Accelerator Laboratory). Data processing was carried out using HKL2000, XDS, and

Aimless (39-41). The data collection statistics are shown in Table S3.1. The crystals belong to space group P212121 and contain two molecules of Cascade in the asymmetric unit related by non-crystallographic symmetry.

The 8 Å cryo-EM structure was used as a search model for molecular replacement using Phaser (42). Two copies were found with a clear signal (TFZ=20.3 for the second copy), and scale factor refinement reduced the magnification of the cryo-EM structure by a factor of 0.949. Phases were improved and extended to 3.24 Å by iterative rounds of density modification using the Autobuild wizard (43) of the Phenix software suite (44).

Density modification included non-crystallographic symmetry (NCS) averaging both over 80 the two copies of the Cascade complex and over the multiple copies of the Cas7 and Cse2 subunits within each complex. NCS operators within the copies of the complex were identified using phenix.find_ncs_from_density (16). The final atomic model was completed using iterative rounds of model building in COOT (45), refinement with

Phenix.refine (46), and validation using Molprobity (47). Refinement statistics for the model are provided in Table S3.1.

Labeling of oligonucleotides

Oligonucleotides (Integrated DNA Technologies) listed in Table S3.2 were 5’-end labeled with γ32P-ATP (PerkinElmer) using T4 polynucleotide kinase (NEB). Labeled oligonucleotides were purified by phenol/chloroform extraction followed by Micro Bio-

Spin P-30 columns (Bio-Rad). dsDNA was prepared by mixing labeled oligonucleotides with a >10 times molar excess of the complementary , in hybridization buffer (20mM HEPES pH 7.5, 75mM NaCl, 2mM EDTA, 10% glycerol, and 0.01% bromophenol blue). The mixture was incubated at 95°C for 5 minutes, and gradually cooled to 25°C in a thermocycler. Duplexed oligonucleotides were further purified by 15% native polyacrylamide gel (1x TBE buffer run at 25mA for 40 minutes at 4°C), and recovered from the gel by overnight precipitation.

Electrophoretic Mobility Shift Assay (EMSA)

Increasing concentrations of Cascade were incubated with oligonucleotides in a buffer containing 20mM HEPES pH 7.5, 75 mM NaCl, 2 mM EDTA, 1 mM TCEP 10% glycerol and 0.01% bromophenol blue. Samples were incubated for 30 minutes at 37°C and loaded onto a 6% native polyacrylamide gel and visualized using a phosphorimager 81

(GE Healthcare). Gels were dried, exposed to phosphor storage screens, and scanned with a Typhoon (GE Healthcare) phosphorimager. The signals from the bound and unbound

DNA fractions were quantified using ImageQuant software (GE Healthcare). After background subtraction, the fractions of bound oligonucleotides were plotted against the total Cascade concentration. The data were fit by nonlinear regression analysis using the equation:

푀1 ∗ [푃7 퐶푎푠푐푎푑푒] 퐹푟푎푐푡푖표푛 푏표푢푛푑 퐷푁퐴 = 푡표푡푎푙 (퐾푑 + [푃7 퐶푎푠푐푎푑푒]푡표푡푎푙 )

Where M1 is the amplitude of the binding curve. Reported KDs are the average of three independent experiments.

Structure alignments and graphical rendering

Comparison of homologous structures was performed using the secondary structure matching tool in COOT (45), or with the align tool in PyMol (48). Figures were rendered using PyMol or Chimera (48, 49). Movies were made using Chimera (49).

82

Supplemental Figures

Figure S3.1. Electron Density of Cascade. Electron density map (mesh) at 3.24 Å resolution obtained by molecular replacement using the EM density, followed by averaging over non-crystallographic symmetry. The refined model of Cascade is shown as sticks.

83

Figure S3.2. Superposition of the two Cascade molecules in the asymmetric unit. (A) The asymmetric unit contains two molecules of Cascade. (B) The two molecules superimpose with a rmsd of 1.29 Å. The primary difference between the two molecules is a rotation of the head.

84

Figure S3.3. Schematic of amino acid contacts with the crRNA. Amino acids are shown using three letter abbreviations and colored according to the key in the lower right corner. The segments of the crRNA are labeled and kinked bases are numbered. 85

Figure S3.4. Structural Comparison of Cas6e proteins. (A) Structure of E. coli Cas6e (yellow) bound to the 3’-end of the crRNA. (B) Structure of T. thermophilus Cas6e (gray) bound to a non-hydrolysable substrate (PDB: 2Y8W) (18). This structure reveals protein specific interaction 3’ of the cleavage site. (C) Superposition of E. coli Cas6e (yellow) and T. thermophilus Cas6e (gray). The extended RNA “groove-binding loop” in the E. coli protein is labeled. (D) Cartoons of the E. coli Cas6e (yellow), T. thermophilus Cas6e (gray) and their respective crRNA sequences. Together these structures suggest that Cas6e proteins recognize bases on both sides of the stem-loop and that after cleavage Cas6e releases the 3’ cleavage product (far right). 86

Figure S3.5. Structural similarities between Cas7 CRISPR proteins. (A) The three different morphologies of the Cas7 proteins from Cascade are shown in complex with their respective crRNA substrates. (B) Structural alignments of the different morphologies of Cas7. The thumb of Cas7.1 is rotated 73˚ relative to Cas7.5 (left). The thumb of Cas7.1 also forms a short helix that fits into the hydrophobic groove of Cas6e, serving as a tether between the head and backbone. The finger domain of Cas7.6 is rotated 180˚ relative to Cas7.5 (right). (C) Cas7 homologues in different CRISPR systems and their structural similarities to Cas7.5. Both Csa2 (Type I-A) (23) and Csm3 (Type III-A) (22) exhibit similar architecture to the Cas7 protein from E. coli, consisting of a modified RRM (palm domain), a helical fingers domain and two loops that make the webbing between the palm and thumb. The thumb is disordered in Csa2 and Csm3, however the structural similarities suggest a conserved role for the thumb in assembly of the Cas7 backbone across diverse CRISPR systems.

87

Figure S3.6. Surface features of Cascade. (A) Cascade shown as surface representation and colored by conservation. Conserved residues are colored orange and variable residues are white. (B) Individual components of Cascade colored by conservation. The palm and thumb of Cas7 proteins are the most conserved surfaces. (C) Cascade shown as an electrostatic surface. Positively charged residues are colored blue, negative as red and hydrophobic as white. (D) Individual components of Cascade colored according to electrostatic surface potential. The nucleic acid binding surfaces are positively charged. 88

Figure S3.7. Every 6th-nucleotide of the crRNA-guide does not participate in target recognition. (A) Data from electrophoretic mobility shift assays of dsDNA substrates that contain mismatches with the crRNA at 6-nt intervals (Figure 3.4D). The data were fit with a standard binding isotherm (Materials and Methods). The concentrations of Cascade are 0, 0.1 pM, 0.5 pM, 1 nM, 5 nM, 10 nM, 100 nM, 1 uM, and 10 uM. (B) Average KD and standard error of the mean (SEM) from three independent experiments are plotted for each of the four different dsDNA substrates (Table S3.2). 89

Figure S3.8. Cse2 proteins form a dimer that assembles along the belly of Cascade. (A) Two Cse2 proteins form a dimer that parallels the helical Cas7 backbone and connects the head (Cas6e) to the tail (Cse1) of Cascade. Each Cse2 protein contacts three Cas7 subunits through interactions with the thumb and webbing motifs. (B) Two parallel tracks of positive surface charge (dashed yellow line) may stabilize the target and displaced strand of DNA. 90

Figure S3.9. Structures of Cas5 proteins. (A) Atomic models of six Cas5-like proteins from Type I and Type III immune systems. Each of the structures consists of a modified RRM and a thumb. The structures were superimposed using the secondary structure matching program in COOT (45). The only structure that contains RNA is Cas5e from Cascade, however the crRNA binding pocket appears to be conserved in the other Cas5- 91 like structures and the 5’-handle of the crRNA has been modeled into the binding pocket of these structures. (B) Surface representation of the Cas5-like proteins with the 5’-handle of the crRNA modeled according to panel A. (C) Electrostatic surface representation of the Cas5-like proteins with the 5’-handle of the crRNA modeled according to panel A. (D) Superposition of the E. coli Cas5e and B. halodurans Cas5c (29). (E) A co-crystal structure of Cmr3 (red) and Cmr2 (pink) from P. furiosus (50). Cmr3 (red) is a Cas5-like protein that may cap the end of the CMR complex. The N-terminal RRM of Cmr3 and Cas5d are superimposed. Superposition of the Cmr3-Cmr2 complex onto Cas5e positions the Cmr2 protein in a similar location to Cse1 in Cascade (far right). 92

Figure S3.10. Docking the atomic model of Cascade into the cryo-EM density of Cascade bound to dsDNA. (A) Atomic models of Cascade docked into the cryo-EM density of Cascade bound to dsDNA. The dsDNA target is shown as a red surface. There is a 16 Å gap between the finger domains of the Cas7 proteins except between Cas7.5 to Cas7.6. A conformational change in Cas7.6 creates a larger gap between Cas7.5 and Cas7.6 subunits allowing space for a nucleic acid duplex. (B) Zoomed in view of dsDNA target bound to Cascade. Lysine-rich helices on the finger domains of Cas7.5 and Cas7.6 are positioned to interact with the negatively charged backbone of the dsDNA target. 93

Figure S3.11. Conserved mechanism for recognition of the 5'-handle. (A) Superposition of Cas5e from E. coli Cascade and Cas5d from Bacillus halodurans (PDB: 4F3M) (29). Cas5d has an additional C-terminal extension that contains an endonuclease active site (red). (B) The aligned 5'-handle from E. coli is modeled onto the surface of Cas5d and the putative trajectory of the extra three nucleotides observed in Type IC repeats is indicated with a dashed line. (C) Schematic of Cas6e-mediated crRNA processing and Cas5e binding in CRISPR subtype IE. (D) Schematic of a putative Cas5d-mediated crRNA processing mechanism in CRISPR subtype IC, in which Cas5d recognizes the 5'-handle sequence. 94

Figure S3.12. Cse1 is a metal binding protein. (A) Cse1 (purple) is a large two-domain protein consisting of a unique globular fold and a C-terminal four-helix bundle. The globular domain contains a metal-ion coordinated by four cysteines (black box). The inset on the right shows the electron density map (white mesh), model of the Cse1 metal binding site (purple) and the metal-ion (blue). (B) Excitation scan indicating that zinc is present in the crystal. (C) Fluorescence scan indicating that zinc is present in the crystal. 95

Figure S3.13. A-form ordering of RNA by diverse protein platforms. (A) The crystal structures of E. coli Cascade, Cas9 (PDB: 4OO8), and Argonaute (PDB: 4OLA) with guide RNAs (above). The protein in each of these systems pre-orders the RNA in a pseudo A- form conformation (below).

96

Supplemental Tables

Table S3.1. Data collection and refinement statistics

Crystal morphology Shard Data collection Beamline SSRL 12-2 Space group P212121 Cell dimensions a, b, c (Å) 201.0, 214.4, 217.9 () 90.0, 90.0, 90.0 CC1/2 0.994 (0.736) Completeness (%) 99.9 (100.0) Redundancy 6.8 (6.9) Rmerge(%) 14.1 (92.6) I / σI 8.5 (2.2)

Refinement Resolution† (Å) 39-3.24 (3.35-3.24)* No. reflections 174565 Rwork/Rfree (%) 22.09/26.47 No. non-hydrogen atoms 53326 Marcomolecules 53324 Zn 2 Mean B-factors 89.5 Macromolecules 89.5 Zn 182.8 RMSD Bond lengths (Å) 0.008 Bond angles (deg) 1.42 Ramachandran Favored (%) 96 Outliers (%) 0.17 Clashscore 14.2

*Values in parentheses are for highest-resolution shell. †Resolution limits use the criterion of I/σI > 2.0 CC1/2 is the percentage of correlation between intensities from random half-datasets (51).

97

Table S3.2. DNA substrates used for band shift assays.

Description Sequence

Target strand (top) and non-target strand (bottom);

PAM in black; the target is blue and mutations are red

72 bp dsDNA 100% 1 6 12 18 24 30

complementary to 3’-CGCGCCGTTCGGCTTTCGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’

crRNA-guide 5’-GCGCGGCAAGCCGAAAGCATGACGGTATTGTTCAGATCCTGGCTTGCCAACAG TGATTGCTCGGGAGTCGCT-3’

72 bp dsDNA with 1 5 11 17 23 29

mutations at 3’-CGCGCCGTTCGGCTTTCGTACTGCCTTAACATGTCTATGACCGTACGGTCGTCACTAACGAGCCCTCAGCGA-5’

positions 5, 11, 5’-GCGCGGCAAGCCGAAAGCATGACGGAATTGTACAGATACTGGCATGCCAGCAGTGATTGCTCGGGAGTCGCT-3’

17, 23, and 29.

72 bp dsDNA with 1 6 12 18 24 30

mutations at 3’-CGCGCCGTTCGGCTTTCGTACTGCCACAACAATTCTAGTACCGATCGGTTTTCACTAACGAGCCCTCAGCGA-5’

positions 6, 12, 5’-GCGCGGCAAGCCGAAAGCATGACGGTGTTGTTAAGATCATGGCTAGCCAAAAGTGATTGCTCGGGAGTCGCT-3’

18, 24, and 30.

72 bp dsDNA with 1 7 13 19 25 31

mutations at 3’-CGCGCCGTTCGGCTTTCGTACTGCCATTACAAGCCTAGGTCCGAATGGTTGCCACTAACGAGCCCTCAGCGA-5’

positions 7, 13, 5’-GCGCGGCAAGCCGAAAGCATGACGGTAATGTTCGGATCCAGGCTTACCAACGGTGATTGCTCGGGAGTCGCT-3’

19, 25, and 31.

98

Supplemental Movies

Movie S3.1. Cas proteins and their interactions with the crRNA. Cascade is a sea- horse-shaped complex designed to bind double-stranded DNA targets complementary to the crRNA-guide sequence. The complex consists of 11 proteins and a 61-nt crRNA. The CRISPR locus is transcribed and partial in the E. coli repeat sequence results in long CRISPR RNA consisting of a series of 29-nt repeats with stable stem-loop structures that are separated by 32-nt spacer sequences. Cas6e is a CRISPR-specific endoribonuclease that interacts with the crRNA stem-loop. Cas6e cleaves the crRNA on the 3’-side of the stem-loop forming a mature 61-nt crRNA consisting of the 32-nt spacer (red) flanked by portions of the repeat. After cleavage Cas6e remains tightly associated with the 3’ stem-loop of the mature crRNA and a hydrophobic V-shaped cleft on the back of Cas6e provides a binding site for a short helix from Cas7.1 that tethers Cas6e to the helical backbone of Cascade. The backbone of Cascade is composed of six Cas7 proteins that oligomerize along the crRNA forming an interwoven architecture that divides the crRNA in six discrete segments. Each Cas7 protein resembles a ‘right hand’ that grips the crRNA through non- sequence specific interactions on the palm, web and fingers. The thumb of each Cas7 subunit folds over the top of the RNA and slots into a positively charged crease in the palm of the adjacent molecule. The 8-nts on the 5’-end of the crRNA function as a molecular signal that initiates assembly of the tail. The Cas5e protein specifically binds nucleotides in the 5’-handle and a thumb folds over the top of the crRNA and arches across the palm of Cas7.6. The Cas5e thumb creates a pore that provides access to nucleotides in the 5’- handle. This pore is a docking module for a short helix on Cse1. The two Cse2 subunits assemble along the belly of Cascade. Although the Cse2 subunits do not make direct contacts with the crRNA, electrostatic calculations show that both faces of the Cse2 dimer are positively charged, indicating a possible role for Cse2 in stabilizing the bound and displaced strands of the DNA target. The interlocking assembly of the Cascade backbone segments the crRNA into six sections. The first five segments consist of a buried nucleotide followed by five solvent accessible bases that are ordered in a pseudo A-form configuration. Previous biochemical and genetic studies has shown that the nucleotides in segment 1 (positions 1-5) are critical for target recognition. This portion of the crRNA-guide is called the ‘seed’ sequence and it has been suggested that helical ordering of these bases may explain their importance in target binding. However, the helical arrangement of bases in segment 1 are not significantly different from those in segments 2, 3, 4, and 5, suggesting that the importance of the seed may have more to do with its proximity to the PAM, rather than preferential pre-ordering of the bases. A surface representation of Cascade colored according to phylogenetic conservation. White signifies variable, while residues colored orange are conserved. Disassembly of the complex into individual subunits reveals that the most highly conserved subunits appear along the palm, web and finger domains of Cas7. https://science.sciencemag.org/content/suppl/2014/08/06/science.1256328.DC1 99

Supplemental References

39. Z. Otwinowski, W. Minor, Processing of X-ray diffraction data collected in oscillation mode. Method Enzymol 276, 307-326 (1997).

40. W. Kabsch, Xds. Acta crystallographica. Section D, Biological crystallography 66, 125-132 (2010).

41. P. R. Evans, G. N. Murshudov, How good are my data and what is the resolution? Acta Crystallogr D 69, 1204-1214 (2013).

42. A. J. Mccoy, R. W. Grosse-Kunstleve, P. D. Adams, M. D. Winn, L. C. Storoni, R. J. Read, Phaser crystallographic software. J Appl Crystallogr 40, 658-674 (2007).

43. T. C. Terwilliger et al, Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. Acta crystallographica. Section D, Biological crystallography 64, 61-69 (2008).

44. P. D. Adams et al, PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta crystallographica. Section D, Biological crystallography 66, 213-221 (2010).

45. P. Emsley, K. Cowtan, Coot: model-building tools for molecular graphics. Acta Crystallogr D 60, 2126-2132 (2004).

46. P. V. Afonine et al, Towards automated crystallographic structure refinement with phenix.refine. Acta crystallographica. Section D, Biological crystallography 68, 352-367 (2012).

47. V. B. Chen et al, MolProbity: all-atom structure validation for macromolecular crystallography. Acta crystallographica. Section D, Biological crystallography 66, 12-21 (2010).

48. W. L. DeLano. (2002).

49. T. D. Goddard, C. C. Huang, T. E. Ferrin, Software extensions to UCSF chimera for interactive visualization of large molecular assemblies. Structure 13, 473-482 (2005).

50. Y. Shao, A. I. Cocozaki, N. F. Ramia, R. M. Terns, M. P. Terns, H. Li, Structure of the Cmr2-Cmr3 Subcomplex of the Cmr RNA Silencing Complex. Structure 21, 376-384 (2013). 100

51. P. A. Karplus, K. Diederichs, Linking crystallographic model and data quality. Science 336, 1030-1033 (2012).

101

CHAPTER FOUR

MECHANISM OF CRISPR-RNA GUIDED RECOGNITION OF DNA TARGETS IN

ESCHERICHIA COLI

Contribution of Authors and Co-Authors

Manuscript in Chapter 4

Author: Paul B.G. van Erp

Contributions: designed the research, performed experiments, analyzed the data, wrote the manuscript

Author: Ryan N. Jackson

Contributions: designed the research, performed experiments, analyzed the data, wrote the manuscript

Author: Joshua Carter

Contributions: designed the research, performed experiments, analyzed the data

Co-Author: Sarah M. Golden

Contributions: performed experiments, analyzed the data

Co-Author: Scott Bailey

Contributions: wrote the manuscript

Co-Author: Blake Wiedenheft

Contributions: designed the research, analyzed the data, wrote the manuscript

102

Manuscript Information

Paul B. G. van Erp, Ryan N. Jackson, Joshua Carter, Sarah M. Golden, Scott Bailey, Blake Wiedenheft

Nucleic Acids Research

Status of Manuscript: ____ Prepared for submission to a peer-reviewed journal ____ Officially submitted to a peer-reviewed journal ____ Accepted by a peer-reviewed journal X Published in a peer-reviewed journal

Oxford University Press Volume 43 Issue 17 30 Sep 2015 Pages 8381-8391 DOI: 10.1093/nar/gkv793

103

MECHANISM OF CRISPR-RNA GUIDED RECOGNITION OF DNA TARGETS IN

ESCHERICHIA COLI

Paul B. G. van Erpa1, Ryan N. Jacksona1, Joshua Cartera1, Sarah M. Goldena, Scott

Baileyb, Blake Wiedenhefta aDepartment of Microbiology and Immunology, Montana State University, Bozeman MT

59717. bDepartment of Biochemistry and Molecular Biology, Johns Hopkins School of

Public Health, Baltimore, MD 21205

1P.B.G.v.E., R.N.J. and J.C. contributed equally to this work

104

Abstract

In bacteria and archaea, short fragments of foreign DNA are integrated into

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat) loci, providing a molecular memory of previous encounters with foreign genetic elements. In Escherichia coli, short CRISPR-derived RNAs (crRNAs) are incorporated into a multi-subunit surveillance complex called Cascade (CRISPR-associated complex for antiviral defense).

Recent structures of Cascade capture snap-shots of this seahorse-shaped RNA-guided surveillance complex before and after binding to a DNA target. Here we determine a 3.2

Å x-ray crystal structure of Cascade in a new crystal form that provides insight into the mechanism of dsDNA binding. Molecular dynamic simulations performed using available structures reveal functional roles for residues in the tail, backbone, and belly subunits of

Cascade that are critical for binding dsDNA. Structural comparisons are used to make functional predictions and these predictions are tested in vivo and in vitro. Collectively, the results in this study reveal underlying mechanisms involved in target-induced conformational changes and highlight residues important in DNA binding and PAM recognition.

105

Introduction

All immune systems must recognize and eliminate foreign invaders while avoiding self-antigens that would lead to an autoimmune reaction. In many bacteria (~50%) and most archaea (~90%), immunity to invading genetic parasites is achieved by integrating short fragments of invading DNA into Clustered Regularly Interspaced Short Palindromic

Repeats (CRISPRs) (1-8). Transcripts from these loci are processed into short CRISPR- derived RNAs (crRNAs) that assemble with CRISPR-associated (Cas) proteins into large ribonucleoprotein surveillance complexes that use the crRNA as a guide to bind complementary DNA targets, called protospacers. While all CRISPR-systems rely on crRNAs for sequence specific detection of invading nucleic acids, phylogenetic analyses of CRISPR loci and cas genes have identified three distinct CRISPR-system types (Type

I, II, and III) that are further separated into at least 11 subtypes (Type I-A to I-E, Type II-

A to II-C, and Type III-A to III-C) (9).

CRISPR RNA-guided (crRNA) surveillance complexes must distinguish “non- self” (viral and plasmid DNA) from “self” (chromosomal DNA) sequences that are complementary to the crRNA-guide. Type I and Type II CRISPR-systems identify “non- self” targets through recognition of a protospacer adjacent motif (PAM) (10-17). In contrast, Type III complexes prevent self-targeting by detecting complementary base pairing that extends beyond the crRNA-guide and into the 5’-handle (7,18,19). Foreign

DNA recognition results in conformational changes in the CRISPR surveillance machinery that activate cis- or trans-acting nucleases for target destruction. 106

Escherichia coli K12 contains a Type IE system that consists of eight cas genes and a downstream CRISPR locus (Figure 4.1A). Five of these cas genes encode proteins that assemble (Cse11, Cse22, Cas76, Cas5e1, and Cas6e1) with a 61 nt crRNA into a large seahorse-shaped complex called Cascade (CRISPR-associated complex for antiviral defense) (Figure 4.1) (10,20,21). Recently, three x-ray crystal structures and an additional cryo-EM reconstruction of Cascade were determined at different stages of DNA target surveillance (22-25). The crystal structures explain how six Cas7 proteins assemble into a helical backbone with the 32nt crRNA-guide sequence presented in short helical segments, while head and tail proteins interact with conserved regions of the 3’ and 5’ repeat sequences, positioned at opposite ends of the complex (22-24). The unbound and ssDNA- bound crystal structures adopt different conformational states, and the cryo-EM reconstruction offers new insights into the mechanism of binding double-stranded

(dsDNA), which is the physiologically relevant target. Collectively, these structures offer snap-shots of Cascade at different stages of DNA surveillance, however the contemporaneous nature of these publications has precluded a comparison of these structures.

Here we present a 3.2 Å x-ray crystal structure of Cascade in a new crystal form, which reveals interactions between protein and nucleic acid subunits. This structure, combined with insights from recent x-ray and cryo-EM structures of Cascade, offer a mechanistic explanation for target induced conformational rearrangements and reveals new functions for specific residues involved in target recognition. These structure-guided insights are used to test function using a combination of in vivo and in vitro assays. 107

Collectively, the data presented here support a DNA targeting mechanism where positively charged residues in the backbone make non-sequence specific interactions with dsDNA substrates that are necessary to initiate RNA-guided strand invasion. Conserved residues along the belly stabilize directional hybridization with the crRNA, driving the conformational rearrangements that lock the complex on the DNA target and recruit the

Cas3 nuclease for target destruction.

Materials and Methods

Cascade expression and purification

Protein expression and purification was performed using previously described methods (10,21). Briefly, cas genes and CRISPR RNAs were co-expressed in E. coli Bl21

(DE3) cells using three or four different expression vectors, where either Cse2 or Cas7 is fused to an N-terminal Strep tag (Supplemental Table S4.1). Cells were grown in LB-media under selection and induced at an OD600nm of 0.5 using 0.2 mM isopropyl-β-D-

1-thiogalactopyranoside (IPTG). Cells were cultured overnight at 16°C, pelleted by centrifugation (5,000g for 10 min), suspended in lysis buffer (100mM Tris-HCl pH 8.0,

150mM NaCl, 1mM EDTA, 1mM TCEP and 5% glycerol), and frozen at -80°C. Cells were lysed by sonication and lysates were clarified by centrifugation (22,000g for 30 min).

Cascade self-assembles in vivo and the complex was affinity purified on StrepTrap HP resin (GE) using N-terminal Strep-II tags on either Cse2 or Cas7. Elution of Cascade was performed using the lysis buffer supplemented with 2.5 mM desthiobiotin. The Strep-II tag was removed with HRV-3C protease, followed by a second purification using StrepTrap

HP resin. Cascade was concentrated prior to gel filtration chromatography using either a 108

10/300 Superose6 column (GE Healthcare) equilibrated with 50mM Tris-HCl pH 7.5,

100mM NaCl, 1mM TCEP and 5% glycerol, or a 26/60 Superdex 200 (GE Healthcare) equilibrated with 50mM Tris-HCl pH 7.5.

Cascade crystallization and structure determination

Cascade crystals were grown using hanging-drop vapor diffusion at 4, 12, 18 and

24°C from equal volumes (2μl + 2μl) of concentrated Cascade protein (A280=30 to 45) and mother liquor (0.1 M HEPES pH 7.0, 0-0.1 M KCl, and 8-14% (w/v) PEG 8000). The crystal used in this study grew in the presence of an 11-nt ssDNA containing a 5’-CTT-3’

PAM and 8-nts (5'-AATACCGT-3') complementary to the crRNA-guide sequence in a 2:1 oligonucleotide:protein ratio. However, the target DNA is not observed in the electron density. Hanging-drop experiments were carried out in EasyXtal 15-Well DG-Tool plates

(Qiagen). Crystals generally appear after three weeks of incubation. Crystals were harvested and cryo-protected in mother liquor supplemented with 20% (v/v) PEG 400 before flash-cooling in liquid Nitrogen. Diffraction data were collected on beamline 23-ID at the Advanced Photon Source at the Argonne National Lab (Supplemental Table S4.2).

Data processing was carried out using XDS and Aimless (26,27). A previously determined structure of Cascade was used as an initial search ensemble to determine the structure by molecular replacement using Phaser (28). Model building was performed using COOT

(29), the model was refined using Phenix.refine (30), and validated using Molprobity

(Supplemental Table S4.2) (31).

109

Structural analysis and graphical rendering

Structures were analyzed and figures were rendered using PyMol or Chimera

(32,33). Movies were made using Chimera. Molecular dynamics simulations were performed using Cascade structures (PDB ID: 4TVX, 4QYZ, and EMDB ID: 5929)

(22,23,25). Two different Molecular dynamic simulations we preformed. In the first simulation we used atomic coordinates from the unbound structure of Cascade (4TVX), whereas the second simulation used atomic coordinates from Cascade (4QYZ) bound to ssDNA. However, parts of the ssDNA bound structure are disordered and not present in the x-ray density. To create a model based on the ssDNA bound structure, the unbound

(4TVX) and ssDNA bound (4QYZ) structures of Cascade were superimposed using the

Cas7 backbone (rmsd 0.71 Å). The aligned unbound structure was used to model amino acid positions not observed in the ssDNA bound structure. For both simulations, the models were refined using Molecular Dynamics Flexible Fitting (MDFF) by generating force potentials from a simulated 5-Å resolution map of the unbound Cascade structure or the ssDNA bound Cascade structure, while simultaneously enforcing harmonic restraints to maintain secondary structure (34,35). This simulation ran for 100-picoseconds (ps) followed by a 2-ps energy minimization in Nanoscale Molecular Dynamics (NAMD) (36).

This atomic model was then fit into the ~9 Å resolution cryo-EM map of Cascade bound to dsDNA (25), and B-form DNA was modeled into density at the tail of the complex using

Chimera. The 9 Å resolution cryo-EM density accommodates two distinct starting models, each positioning the globular domain of Cse1 (Cas8) in a different conformational state.

These differences are derived from whether we used the ssDNA bound or unbound x-ray 110 crystal structure as the basis to model the conformational state of the Cse1 subunit. Atomic coordinates of these models were refined using MDFF for 150-ps, followed by a 2-ps energy minimization. All simulations were performed at 300˚K with 2-femtosecond time steps. Correlation coefficients were determined using Chimera’s Fit in Map function.

Electrophoretic Mobility Shift Assay (EMSA)

Oligonucleotides () listed in Supplemental Table S4.3 were 5’-end labeled with γ32P-ATP (PerkinElmer) using T4 polynucleotide kinase (NEB). Labeled oligonucleotides were purified by phenol/chloroform extraction followed by MicroSpin G-

25 columns (GE Healthcare). dsDNA was prepared by mixing labeled oligonucleotides with a >5 fold molar excess of the complementary oligonucleotide, in hybridization buffer

(20mM HEPES pH 7.5, 75mM NaCl, 2mM EDTA, 10% glycerol, and 0.01% bromophenol blue). The mixture was incubated at 95°C for 5 minutes, and gradually cooled to 25°C in a thermocycler. Oligonucleotide duplexes were gel purified, ethanol precipitated and recovered in hybridization buffer.

Increasing concentrations of Cascade were incubated with oligonucleotides in hybridization buffer plus 1mM TCEP. Samples were incubated for 15 minutes at 37 °C, loaded onto a 6% native polyacrylamide gel, and run for 3 hours at 150 Volts at 4 °C. Gels were dried, exposed to phosphor storage screens, and scanned with a Typhoon (GE

Healthcare) phosphorimager. Bound and unbound DNA fractions were quantified using

ImageQuant software (GE Healthcare) or GelQuant.NET software

(biochemlabsolutions.com). After background subtraction, the fractions of bound 111 oligonucleotides were plotted against total Cascade concentration. The data were fit by nonlinear regression analysis using the equation:

푀1 ∗ [퐶푎푠푐푎푑푒] 퐹푟푎푐푡푖표푛 푏표푢푛푑 퐷푁퐴 = 푡표푡푎푙 [ ] (퐾푑 + 퐶푎푠푐푎푑푒 푡표푡푎푙 )

Where M1 is the amplitude of the binding curve. Reported KDs are the average of three independent experiments and error bars represent standard deviations.

Plasmid Curing Assays

Plasmid curing assays were preformed according to previously described methods

(37). In brief, E. coli BL21 (AI) cells were transformed with pWUR547 (CRISPR 7xP7), pWUR397 (Cas3), and pWUR400 (Cse1, Cse2, Cas7, Cas5e, and Cas6e) plasmids. These cells were made electrocompetent using standard methods and transformed with an equal molar mixture of pUC19 (non-target) and pUC19-P7-CAT (target) containing a 350-bp insert of the phage P7 genome flanked by a 5’-CAT-3’ PAM (Supplemental Table S4.1).

Cells used for a negative control (immunodefective) were identical except pWUR547

(CRISPR 7xP7) was replaced with a non-targeting CRISPR (pWUR630), which targets a protospacer that is not present on either pUC19 or pUC19-P7-CAT plasmids. After transformation the cells were grown for 2 hours at 37°C in 1 ml of LB-media supplemented with 100 µg/ml ampicillin, 0.2 % L-arabinose, and 0.2 mM IPTG. Cells were plated on

LB-agar plates supplemented with 100 µg/ml ampicillin, 0.2 % L-arabinose and 0.2 mM

IPTG at 37°C overnight. The next day, 40 colonies were randomly screened for the 112 presence of pUC19 or pUC-P7-CAT by colony PCR using forward (5’-

CAGTGAGCGCAACGCAATTA) and reverse (5’- AACTATGCGGCATCAGAGCA) primers that amplify a 454-bp region of pUC19 or a 768-bp region in pUC-P7-CAT. PCR reactions were separated by agarose gel electrophoresis and stained with sybr safe

(Supplemental Figure S4.1). The error bars represent standard error of the mean calculated from three independent experiments. Mutants of Cascade were made by site directed mutagenesis using the pWUR400 plasmid as a template. Primers for mutagenesis are listed in Supplemental Table S4.4.

Results

A lysine-rich vise is essential for dsDNA binding and unwinding.

We determined a 3.2 Å x-ray structure of Cascade in a new crystal form using molecular replacement methods (Figure 4.1 and Supplemental Table S4.2). The asymmetric unit contains two structurally similar seahorse-shaped Cascade assemblies that superimpose on two previously determined structures of Cascade with a root-mean-square deviation (rmsd) of less than 1.1 Å for equivalently positioned C atoms (Supplemental

Figure S4.2 and Supplemental Table S4.2) (22,24). However, the arrangement of Cascade molecules in the new crystal form provides new insights into the mechanism of dsDNA binding. Symmetrically related Cascade complexes are packed in a head-to-tail and head- to-belly arrangement (Figure 4.1B and 1C), with the 3’-stem-loop of one Cascade placed between two positively charged lysine-rich helices (K137, K138, K141, and K144) on

Cas7.5 and Cas7.6 subunits. The backbone of Cascade is composed of six Cas7 subunits that share a ‘right-hand’ morphology consisting of fingers, palm, and a thumb (Figure 4.1). 113

The lysine-rich helix (K-rich helix) is located on the ‘pinky’ of the fingers domain of

Cas7.1 to Cas7.5, but the fingers domain of Cas7.6 is rotated ~180 degrees. Previously we speculated that the lysine-rich helix on Cas7.5 and Cas7.6 may play a role in stabilizing dsDNA during target recognition and our new crystal structure reveals non-sequence specific interactions between the lysines and the negatively charged phosphate backbone

(Figure 4.1D) (22). To test the importance of these lysines in Cascade-mediated immunity, we made a quadruple mutant of Cas7 (K137A, K138A, K141A, and K144A) denoted

KKKK, and tested the mutant using an in vivo plasmid curing assay. Plasmid curing assays indicate that the KKKK mutation results in severe immunodeficiency (Figure

4.1E).

To model potential interactions between the K-rich helices and dsDNA, we docked the atomic coordinates of Cascade and B-form dsDNA into the 9 Å cryo-EM reconstruction of Cascade and then used Molecular Dynamics Flexible Fitting (MDFF) to fit the atomic model into the EM density while maintaining bond angles and distances (34,35,38). This model indicates that the 3’-end of a bound dsDNA target is positioned between the K-rich helices of Cas7.5 and Cas7.6. We hypothesized that the K-rich helices of Cas7.5 and Cas7.6 function as a molecular vise-grip that binds the phosphate backbone of dsDNA and positions the target for PAM recognition and crRNA-guided strand invasion (Figure 4.1G and Supplemental Movie S4.1). To test this hypothesis, we purified KKKK Cascade

(Supplemental Figure S4.3A), and performed electrophoretic mobility shift assays

(EMSAs) using dsDNA containing a protospacer and a 5’-CAT-3’ PAM. Wild type

Cascade bound to a 72-bp dsDNA target containing a protospacer and PAM with high 114 affinity (KD=1.14 nM), while the KKKK mutation resulted in a severe binding defect

(KD>1000 nM) (Figure 4.1H and Supplemental Figure S4.4B). In contrast to dsDNA targets, the KKKK mutation results in modest binding defects for ssDNA targets Figure

4.1H). These biochemical data indicate the immunodeficiency of the KKKK mutant is a result of a dsDNA-binding defect.

Structural models of Cascade bound to dsDNA suggest that at least twelve additional base pairs on the 3’-side of the protospacer sequence are required to extend through the K-rich helices. Accordingly, the model predicts that 3’-truncations of the dsDNA will disrupt high-affinity binding to Cascade. To test this model, we performed

EMSAs using a series of dsDNA substrates with identical protospacers, but incrementally shorter 3’-ends. We observed a direct correlation between the length of the 3’-end and binding affinity. Double-stranded DNA substrates with 3’-ends that extend 8-bps beyond the PAM bind with high affinity (KD<5 nM), while substrates with shorter 3’-ends display significant binding defects (Figure 4.1H and Supplemental Figure S4.4B). Binding affinities directly correlated with the length of the 3’-end and dsDNA substrates that do not reach the lysine vise (+3 and +0) result in binding affinities comparable to non-target DNA

(KD>1000 nM). However, these binding defects are primarily limited to dsDNA substrates.

The length of the 3’-end had no significant impact on binding affinities for short ssDNA substrates, though longer ssDNA did bind with lower affinity to the KKKK mutant

(Figure 4.1H and Supplemental Figure S4.4C). Together, these results suggest the K-rich vise positions dsDNA for subsequent PAM scanning and duplex unwinding.

115

Figure 4.1. A lysine-rich vise is critical for binding dsDNA. (A) The CRISPR-mediated adaptive immune system in Escherichia coli (Type IE) consists of eight cas genes (arrows) and a CRISPR locus (black diamonds and red cylinders). Five of the cas genes (colored arrows) encode proteins that assemble with a 61-nt crRNA to from the seahorse-shaped complex composed of an unequal number of five different Cas proteins. The stoichiometry of Cas proteins in Cascade is indicated above the arrows. (B) Cartoon schematic of how three symmetry related Cascade complexes pack together in the crystal in a head-to-tail and head-to-belly arrangement. Subunits are colored according to the scheme used in panel A. (C) Surface rendition of symmetry related Cascade complexes. (D) The interface between the 3’ stem-loop of one Cascade assembly with lysine-rich (K-rich) helices of another. Positively charged K-rich helices are indicated in blue. (E) Plasmid curing results. The KKKK mutation results in immunodeficiency. (F) Model of Cascade bound to dsDNA docked into the cryo-EM density EMDB 5929. The K-helices of Cas7.5 and Cas7.6 are indicated. (G) Schematic of Cascade binding dsDNA. The PAM and number of base pairs required to reach through the K-helices (+12) are indicated. (H) Equilibrium dissociation constants of Cascade and KKKK Cascade with various dsDNA and ssDNA substrates containing 3’-ends with variable lengths. 116

PAM recognition and strand invasion.

Cascade distinguishes self from non-self sequences through recognition of a tri- nucleotide PAM sequence located adjacent to the protospacer (12,39). Previous structural, biochemical, and genetic work suggested that residues 125 to 131 of Cse1 form a loop (L1) that participates in PAM recognition (40). The crystal structure of Cascade reveals that L1 includes a short α-helix that extends through a pore formed by Cas5e (22,24), and that the

L1-helix interacts with in the 5’-handle of the crRNA (Figure 4.2A) (25,40).

However, DNA binding induces a conformational change in Cascade, and L1 is disordered in the ssDNA bound structure of Cascade (23). This suggests that conformational changes induced by DNA binding may extract the L1-helix from the Cas5e pore. To clarify the functional role of the L1-helix, we compared unbound structures of Cascade to atomic models of Cascade bound dsDNA that we generated using MDFF (Figure 4.2A-B). MDFF models generated using atomic coordinates from the ssDNA bound structure indicate that

L1 moves out of the Cas5e pore and repositions itself next to the PAM. However, a similar simulation performed using atomic coordinates from the unbound structure of Cascade reveals only modest changes in the position of residues in the L1-helix (Supplemental

Figure S4.5). The different position of L1 between these models reflects bias introduced by the position of L1 in the starting model. Regardless of the starting model, both simulations indicate that two residues in the L1-helix (T125 and N126) become more accessible for PAM interactions. Furthermore, both models indicate that a -hairpin on

Cse1 (residues 343 to 366) positions itself between the complementary and displaced strands of the protospacer. 117

To test the importance of the L1-helix and the -hairpin in dsDNA target recognition, we mutated L1 and the -hairpin of Cse1 to mimic the Cse1 sequence from

Streptococcus thermophilus (Supplemental Figure S4.6), which recognizes a 5’-AAT-3’

PAM rather than a 5’-CAT-3’ PAM recognized by E. coli Cascade (41). Mutations in L1 were restricted to solvent accessible residues (i.e. T125N and N126K) (Figure 4.2A-B).

The -hairpin is surface exposed in both MDFF models and 6 residues (343 to 366) were replaced with the corresponding amino acids from the S. thermophilus Cse1 sequence

(Supplemental Figure S4.6). Mutations in either the L1-helix or the -hairpin result in immunodeficiencies comparable to controls performed using a non-targeting CRISPR

(Figure 4.2C). To determine if the immunodeficiencies were due to assembly defects, we purified Cascade complexes containing either the L1-helix or the -hairpin mutations. The

L1-helix and -hairpin mutants express and purify like wild type, suggesting that these mutations do not perturb assembly of the complex (Figure 4.2D). Next, we used the purified complexes to perform EMSAs to determine if these mutations perturb DNA binding. The L1-helix and -hairpin mutations result in only minor binding defects for ssDNA, but these same mutants result in significant binding defects for dsDNA substrates that contain a 5’-CAT-3’ or a 5’-AAT-3’ PAM, and a protospacer (Figure 4.2E and

Supplemental Figure S4.7). These results indicate that residues in the L1-helix or the - hairpin are not necessary for recognition of ssDNA targets and that the L1-helix or the - hairpin are not individually responsible for PAM recognition, but they are critical for binding dsDNA. 118

Figure 4.2. The L1-helix and a long β-hairpin in the Cse1 subunit are involved in PAM recognition and duplex destabilization. (A) In the structure of Cascade prior to DNA binding, L1 sits inside the Cas5e pore. F129, V130, and N131 make contact with nucleobases in the 5’-handle of the crRNA. T125 and N126 are solvent accessible. (B) Molecular Dynamic Flexible Fitting (MDFF) was used to model atomic coordinates of Cascade prior to target binding, into the cryo-EM density of Cascade bound to a dsDNA 119 target with a PAM. The model is colored according to changes in distance relative to the PAM. Motion towards the PAM is colored orange and motion away from the PAM is blue. In the simulation, the L1-helix is positioned proximal to the PAM and the -hairpin is positioned between single-strand regions of the DNA target. (C) Mutations in the L1-helix (T125N/N126K) or the -hairpin (R351G/N353P/A355S/S356R) result in immunodeficiency. (D) Elution profile of the L1-helix and -hairpin mutants. The insert shows a Commassie blue-stained SDS-PAGE gel (top) and a denaturing polyacrylamide gel of phenol extracted crRNA isolated from each of the Cascade complexes (bottom). (E) Equilibrium dissociation constants of Cascade, the L1-helix and -hairpin mutants for dsDNA substrates containing either 5’-CAT-3’ or 5’-AAT-3’ PAM and for ssDNA substrates containing a 5’-CAT-3’ PAM.

An arginine relay facilitates conformational rearrangements during target binding.

Comparison of unbound and DNA bound x-ray structures reveals that the helical

Cas7 backbone of Cascade is a rigid structure that does not move upon target binding

(Figure 4.3) (22-24). In fact, the hexameric Cas7 backbones from these two structures superimpose with an rmsd of 0.71 Å over all equivalently positioned C atoms. In contrast to the structurally rigid Cas7 backbone, target binding triggers a dramatic conformational rearrangement in the belly (Cse2) and tail (Cse1) subunits. The two Cse2 subunits slide

~10 Å towards the tail upon target binding, and the four-helix bundle of Cse1 tilts ~14 Å closer to the bound ssDNA (Figure 4.3A and Supplemental Movie S4.1). To understand the functional implications of this conformational rearrangement, we examined interactions between the Cas7 and Cse2 subunits in the unbound and ssDNA-bound crystal structures. In the unbound state, two conserved arginines (R27 and R101) on each Cse2 subunit form salt-bridges with conserved aspartic acid residues (D22) displayed on the

Cas7 subunits. Target binding induces a conformational change that moves the Cse2 subunits down the Cas7 backbone. The R27 and R101 residues form new contacts with kinked-out bases on the bound DNA target, and compensatory salt bridges are formed 120 between additional arginines on Cse2 (R107 and R119) and D22 residues further down the backbone (Figure 4.3B-C).

This “molecular relay” of arginines anchors the Cse2 subunits to the Cascade complex during target binding, and results in a net gain of four ionic interactions with the kinked out bases. We hypothesized that newly formed interactions between flipped-out bases and the arginines on Cse2 would contribute to the stability of the target bound conformational state. To test this hypothesis we performed EMSAs using dsDNA and ssDNA targets that are missing a nucleobase at every 6th position of the target strand protospacer (Supplementary Table S4.3). We observed a 5-fold binding defect in abasic ssDNA substrates (Supplemental Figure S4.8), suggesting that interactions between the

Cse2 arginines R27 and R101, and kinked out bases contribute to target binding. However, no significant difference was observed for binding abasic dsDNA substrates, which may not be surprising given that the relatively modest contribution gained by stabilizing the flipped out bases may be masked in dsDNA substrates where abasic substitutions compensate for the binding defect observed in ssDNA by destabilizing the dsDNA duplex.

Salt bridges between D22 residues on Cas7 and conserved arginines on Cse2 (R27,

R101, R107, and R119), appear to be critical for anchoring the Cse2 subunits to the Cas7 backbone (Figure 4.3 B-C). To determine the importance of the salt bridges in Cascade- mediated immunity we made point mutants of the salt-bridging arginines on Cse2 (R27A,

R101A, R107A, and R119A), and performed in vivo plasmid-curing assays (Figure 4.3D).

Individual arginine mutations of R27A, R107A, and R119A had no detectable effect on immunity, while R101A resulted in an immunodeficiency. Next, we made double mutants 121 of the arginine residues that participate in salt bridging before (R27A/R101A) or after

(R107A/R119A) target binding. The double mutant R27A/R101A was immunocompromised, while R107A/R119A showed no significant immune defect, indicating that salt bridges formed before target binding are critical for immunity while compensatory interactions made after target binding are not essential in an overexpression system. Consistent with these results, the quadruple arginine mutant (R27A, R101A,

R107A, and R119A), denoted RRRR, resulted in an immune defect comparable to R101A or the double R27A/R101A mutant. To confirm that the immune defect was a result of disrupting the observed salt-bridges we mutated the aspartic acid residue (D22A) on Cas7.

The D22A mutation resulted in a plasmid-curing defect similar to R101A, R27A/R101A, and RRRR, suggesting that salt-bridge formation between Cse2 R101 and Cas7 D22 is important for immunity.

To determine if the immunodeficiencies observed for the Cse2 double mutants

R27A/R101A and R107A/R119A and the Cas7 D22A mutant are a consequence of perturbing Cascade assembly, target binding, or Cas3 recruitment, we expressed and purified mutant Cascade complexes using a strep-tag on the N-terminus of Cas7. These mutations all result in Cascade complexes lacking the Cse2 subunits (Figure 4.3E). A faint

Cse2 band is observed in D22A Cascade preparations. One explanation for this observation is that Cas7 has a second aspartic acid residue at position 21 that may partially compensate for the D22A mutations during purification. To determine the consequence of Cse2 destabilization on target binding we performed EMSAs using ssDNA and dsDNA targets.

Cascade lacking Cse2 is unable to bind dsDNA tightly and has a 15-fold decrease in affinity 122 for ssDNA targets (Figure 4.3F and Supplemental Figure S4.9). These data indicate that

Cse2 facilitates binding to ssDNA, and that Cse2 is critical for efficient binding to dsDNA substrates (17,21,42,43).

Figure 4.3. Conserved arginines on the Cse2 subunits participate in a relay that stabilizes target binding induced conformational changes. (A) Overlay of Cascade crystal structures before (gray) and after (colored) target binding. (B - C) Schematic of the conformational change induced by target binding. The two Cse2 subunits move ~10-Å down the backbone of Cascade. Salt bridges between the belly (Cse2 R27 and R101) and 123 the backbone (Cas7 D22) are broken. R27 and R101 are repositioned to stabilize the flipped-out bases on the DNA target strand and compensatory salt bridges are formed by Cse2 R107 and R119 with D22 residues on Cas7 subunits further down the backbone. (D) Plasmid curing assays reveal that R101A mutations result in a strong immune system defects, while individual mutation of the other Cse2 arginines shows little or no measurable defect. The D22A mutation also results in immunodeficiency. (E) Elution profile of the WT Cascade and different mutants. The insert shows a Commassie blue-stained SDS- PAGE gel (top) and a denaturing polyacrylamide gel of phenol extracted crRNA isolated for the Cascade complexes (bottom). The Cse2 R27A/R101A, Cse2 R107A/R119A, and Cas7 D22A mutants lack Cse2. (F) Equilibrium dissociation constants of Cascade and Cascade mutants for a 72-bp dsDNA target and a 72-nt ssDNA target containing a 5’-CAT- 3’ PAM.

Discussion

Cascade is a crRNA-guided surveillance complex that efficiently binds foreign

DNA and recruits a trans-acting nuclease for target degradation (41,44-46). Here we determined the x-ray structure of Cascade in a new crystal form that reveals non-sequence specific interactions between K-rich helices located on the Cas7.5 and Cas7.6 backbone subunits and the negatively charged crRNA from a symmetry related molecule (Figure

4.1D). Collectively, we show that the positive charge on the K-rich helix is necessary for dsDNA, but not ssDNA binding, and that the K-rich vise plays a fundamental role in dsDNA binding that is equivalent to the importance of the protospacer and the PAM. dsDNA substrates that contain a PAM and a protospacer, but do not have 3’-ends long enough to reach the K-rich vise are bound with affinities similar to dsDNAs lacking a PAM or protospacer (Figure 4.1H) (10-12,40). Our structural models indicate that 12-bps of duplex DNA stretch from the 3’-end of the protospacer through the K-rich vise, and we show that substrates with 3’-ends that extend across the K-rich vise are bound with high- affinity. This distance is similar to the 9 base pairs protected by Cascade in nuclease protection assays (10). 124

PAM recognition by Cascade is promiscuous, and different PAMs elicit distinct immune responses (39). Recognition of a protospacer flanked by one of the five different

‘interference’ PAMs elicits a response that involves the recruitment of Cas3 for degradation of the DNA target (39,47). However, protospacers flanked by one of 22 different ‘priming’ PAMs are still recognized by Cascade, but rather than eliciting Cas3 to degrade the target, these interactions appear to elicit a ‘priming’ response that promotes the rapid acquisition of adjacent protospacers (25,39,48). While the mechanism of PAM sensing remains unknown, we propose that PAM-dependent and PAM-priming modes of protospacer recognition will require the K-rich vise on Cas7.5 and Cas7.6, the L1-helix on

Cse1, the -hairpin on Cse1, and residues involved in the arginine relay that reposition the two Cse2 subunits during target binding (Figure 4.4). Residues in the L1-helix on Cse1 have previously been implicated in PAM detection (15,25), but structures of Cascade prior to engaging a DNA target suggested that these residues (F129, V130, and N131) are buried and unavailable for PAM interactions. However, target binding introduces a conformational change that may liberate the L1-helix for participation in PAM scanning.

Our structural models and biochemical studies also identified a -hairpin (residues 343-

366) that is important for dsDNA target binding and may be involved in PAM sensing. Our attempts to change PAM detection from 5’-CAT-3’ to 5’-AAT-3’ by simply swapping solvent accessible residues in the L1-helix, or -hairpin from E. coli for residues found at equivalent positions in the Cse1 subunit from S. thermophilus failed. We think that this failure reflects our incomplete understanding of PAM recognition and we anticipate that higher-resolution structures of Cascade bound to a dsDNA target containing a PAM will 125 provide more insight into the mechanism of PAM sensing. While the resolution of our models precludes an atomic-resolution understanding of PAM-recognition, we anticipate that PAM detection momentarily stalls Cascade, and that protracted lifetimes at PAM motifs may explain the proficiency of recognizing protospacers flanked by PAMs. This model is supported by recent evidence showing that the Csy complex (Type 1-F) preferentially interacts with dsDNA substrates containing PAMs, and that Cas9, a Type II-

A crRNA-guided surveillance complex, preferentially samples PAM-rich regions of the lambda phage genome (14,16).

PAM recognition initiates directional hybridization between the dsDNA target and complementary crRNA, displacing the non-complementary strand of the DNA target and creating an R-loop that is predicted to be stabilized by Cse1 and Cse2 subunits (17,43).

Complete R-loop formation results in a locking mechanism that increases the grip of

Cascade on DNA substrates and Rutkauskas et al demonstrated that the locking mechanism is required for Cas3 recruitment to bound targets (43). The structural analysis presented here reveals a series of conserved arginine resides on the Cse2 subunits that may contribute to locking by stabilizing flipped out bases on the target strand and forming salt-bridges with conserved acidic residues (i.e. D22) displayed on the Cas7 backbone (Supplemental

Figure S4.10). Interestingly, the D22 residue is also strictly conserved in the Cas7 proteins of the Type III Cmr and Csm complexes, where it is required for endonucleolytic cleavage of RNA substrates (Supplemental Figure S4.10) (22). Although Cascade has no detectable

RNase activity (data not shown), we show that D22 still plays a critical role in Cascade assembly and function. It is likely that this strictly conserved residue is derived from a 126 common ancestor of Type I and Type III systems, but its function has evolved to match the needs of these diverse surveillance systems. 127

Figure 4.4. DNA target recognition and binding by Cascade. (A - B) A lysine-rich vise makes non-sequence specific interacts with dsDNA. (C) DNA binding induces a conformational change and the L1-helix and -hairpin are positioned for PAM scanning. (D) Directional hybridization between the DNA target and complementary crRNA drives 128 an arginine relay along the belly subunits, in which arginines interact with kinked out bases from the bound target strand and conserved aspartates on the Cas7 backbone. (E) The conformational rearrangement locks Cascade on the target and signals recruitment of the Cas3 helicase-nuclease. (F - G) Cas3 is recruited to the Cascade-DNA complex and the DNA is degraded by Cas3. Degradation of the DNA may release Cascade for another round of target detection, however turnover of Cascade has not been demonstrated.

Acknowledgments

We thank Stan J.J. Brouns for providing plasmids used in this study and Brandon

Smart and Maya Tsidulko for their help with statistical analysis. We are grateful to the members of the Wiedenheft lab for technical assistance and helpful discussions. X-ray diffraction data was collected using beamline 23-ID-B at the Advance Photon Source

(ACB-12002, AGM-12006, DE-AC02-06CH11357), and beamline 12-2 at the Stanford

Synchrotron Radiation Lightsource (DE-AC02-76SF00515 and P41GM103393). Atomic coordinates have been deposited into the Protein Data Bank with accession code 5CD4

129

References

1. Barrangou, R. and Marraffini, L.A. (2014) CRISPR-Cas Systems: Prokaryotes Upgrade to Adaptive Immunity. Mol Cell, 54, 234-244.

2. Jiang, F. and Doudna, J.A. (2015) The structural biology of CRISPR-Cas systems. Curr Opin Struct Biol, 30, 100-111.

3. Reeks, J., Naismith, J.H. and White, M.F. (2013) CRISPR interference: a structural perspective. Biochem J, 453, 155-166.

4. Sorek, R., Lawrence, C.M. and Wiedenheft, B. (2013) CRISPR-mediated adaptive immune systems in bacteria and archaea. Annu Rev Biochem, 82, 237- 266.

5. van der Oost, J., Westra, E.R., Jackson, R.N. and Wiedenheft, B. (2014) Unravelling the structural and mechanistic basis of CRISPR-Cas systems. Nat Rev Microbiol, 12, 479-492.

6. Gasiunas, G., Sinkunas, T. and Siksnys, V. (2014) Molecular mechanisms of CRISPR-mediated microbial immunity. Cell Mol Life Sci, 71, 449-465.

7. Jackson, R.N. and Wiedenheft, B. (2015) A conserved structural chassis for mounting versatile CRISPR RNA-guided immune responses. Mol Cell, 58, 722– 728.

8. Tsui, T.K. and Li, H. (2015) Structure Principles of CRISPR-Cas Surveillance and Effector Complexes. Annu Rev Biophys.

9. Makarova, K.S., Haft, D.H., Barrangou, R., Brouns, S.J., Charpentier, E., Horvath, P., Moineau, S., Mojica, F.J., Wolf, Y.I., Yakunin, A.F. et al. (2011) Evolution and classification of the CRISPR-Cas systems. Nat Rev Microbiol, 9, 467-477.

10. Jore, M.M., Lundgren, M., van Duijn, E., Bultema, J.B., Westra, E.R., Waghmare, S.P., Wiedenheft, B., Pul, U., Wurm, R., Wagner, R. et al. (2011) Structural basis for CRISPR RNA-guided DNA recognition by Cascade. Nat Struct Mol Biol, 18, 529-536.

130

11. Semenova, E., Jore, M.M., Datsenko, K.A., Semenova, A., Westra, E.R., Wanner, B., van der Oost, J., Brouns, S.J. and Severinov, K. (2011) Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc Natl Acad Sci U S A, 108, 10098-10103.

12. Westra, E.R., Semenova, E., Datsenko, K.A., Jackson, R.N., Wiedenheft, B., Severinov, K. and Brouns, S.J. (2013) Type I-E CRISPR-Cas Systems Discriminate Target from Non-Target DNA through Base Pairing-Independent PAM Recognition. PLoS Genet, 9, e1003742.

13. Anders, C., Niewoehner, O., Duerst, A. and Jinek, M. (2014) Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature, 513, 569-573.

14. Rollins, M.F., Schuman, J.T., Paulus, K., Bukhari, H.S. and Wiedenheft, B. (2015) Mechanism of foreign DNA recognition by a CRISPR RNA-guided surveillance complex from Pseudomonas aeruginosa. Nucleic Acids Res, 43, 2216-2222.

15. Sashital, D.G., Jinek, M. and Doudna, J.A. (2011) An RNA induced conformational change required for CRISPR RNA cleavage by the endonuclease Cse3. Nat Struct Mol Biol, 18, 680-687.

16. Sternberg, S.H., Redding, S., Jinek, M., Greene, E.C. and Doudna, J.A. (2014) DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature, 507, 62-67.

17. Szczelkun, M.D., Tikhomirova, M.S., Sinkunas, T., Gasiunas, G., Karvelis, T., Pschera, P., Siksnys, V. and Seidel, R. (2014) Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes. Proc Natl Acad Sci U S A, 111, 9798-9803.

18. Marraffini, L.A. and Sontheimer, E.J. (2010) Self versus non-self discrimination during CRISPR RNA-directed immunity. Nature, 463, 568-571.

19. Samai, P., Pyenson, N., Jiang, W., Goldberg, G.W., Hatoum-Aslan, A. and Marraffini, L.A. (2015) Co-transcriptional DNA and RNA cleavage during type III CRISPR-Cas immunity. Cell, 161, 1164–1174.

20. Brouns, S.J., Jore, M.M., Lundgren, M., Westra, E.R., Slijkhuis, R.J., Snijders, A.P., Dickman, M.J., Makarova, K.S., Koonin, E.V. and van der Oost, J. (2008) Small CRISPR RNAs guide antiviral defense in prokaryotes. Science, 321, 960- 964.

131

21. Wiedenheft, B., Lander, G.C., Zhou, K., Jore, M.M., Brouns, S.J.J., van der Oost, J., Doudna, J.A. and Nogales, E. (2011) Structures of the RNA-guided surveillance complex from a bacterial immune system. Nature, 477, 486-489.

22. Jackson, R.N., Golden, S.M., van Erp, P.B., Carter, J., Westra, E.R., Brouns, S.J., Van Der Oost, J., Terwilliger, T.C., Read, R.J. and Wiedenheft, B. (2014) Crystal structure of the CRISPR RNA-guided surveillance complex from Escherichia coli. Science, 345, 1473-1479.

23. Mulepati, S., Heroux, A. and Bailey, S. (2014) Crystal structure of a CRISPR RNA-guided surveillance complex bound to a ssDNA target. Science, 345, 1479- 1484.

24. Zhao, H., Sheng, G., Wang, J., Wang, M., Bunkoczi, G., Gong, W., Wei, Z. and Wang, Y. (2014) Crystal structure of the RNA-guided immune surveillance Cascade complex in Escherichia coli. Nature, 515, 147-150.

25. Hochstrasser, M.L., Taylor, D.W., Bhat, P., Guegler, C.K., Sternberg, S.H., Nogales, E. and Doudna, J.A. (2014) CasA mediates Cas3-catalyzed target degradation during CRISPR RNA-guided interference. Proc Natl Acad Sci U S A, 111, 6618-6623.

26. Evans, P.R. and Murshudov, G.N. (2013) How good are my data and what is the resolution? Acta Crystallogr D, 69, 1204-1214.

27. Kabsch, W. (2010) Xds. Acta crystallographica. Section D, Biological crystallography, 66, 125-132.

28. Mccoy, A.J., Grosse-Kunstleve, R.W., Adams, P.D., Winn, M.D., Storoni, L.C. and Read, R.J. (2007) Phaser crystallographic software. J Appl Crystallogr, 40, 658-674.

29. Emsley, P. and Cowtan, K. (2004) Coot: model-building tools for molecular graphics. Acta Crystallogr D, 60, 2126-2132.

30. Afonine, P.V., Grosse-Kunstleve, R.W., Echols, N., Headd, J.J., Moriarty, N.W., Mustyakimov, M., Terwilliger, T.C., Urzhumtsev, A., Zwart, P.H. and Adams, P.D. (2012) Towards automated crystallographic structure refinement with phenix.refine. Acta crystallographica. Section D, Biological crystallography, 68, 352-367.

132

31. Chen, V.B., Arendall, W.B., 3rd, Headd, J.J., Keedy, D.A., Immormino, R.M., Kapral, G.J., Murray, L.W., Richardson, J.S. and Richardson, D.C. (2010) MolProbity: all-atom structure validation for macromolecular crystallography. Acta crystallographica. Section D, Biological crystallography, 66, 12-21.

32. DeLano, W.L. The PyMOL Molecular Graphics System, Version 1.7.4 Schrödinger, LLC. (2002).

33. Goddard, T.D., Huang, C.C. and Ferrin, T.E. (2005) Software extensions to UCSF chimera for interactive visualization of large molecular assemblies. Structure, 13, 473-482.

34. Chan, K.Y., Trabuco, L.G., Schreiner, E. and Schulten, K. (2012) Cryo-electron microscopy modeling by the molecular dynamics flexible fitting method. Biopolymers, 97, 678-686.

35. Trabuco, L.G., Villa, E., Mitra, K., Frank, J. and Schulten, K. (2008) Flexible fitting of atomic structures into electron microscopy maps using molecular dynamics. Structure, 16, 673-683.

36. Kale, L., Skeel, R., Bhandarkar, M., Brunner, R., Gursoy, A., Krawetz, N., Phillips, J., Shinozaki, A., Varadarajan, K. and Schulten, K. (1999) NAMD2: Greater scalability for parallel molecular dynamics. Journal of Computational Physics, 151, 283-312.

37. Almendros, C., Guzman, N.M., Diez-Villasenor, C., Garcia-Martinez, J. and Mojica, F.J. (2012) Target Motifs Affecting Natural Immunity by a Constitutive CRISPR-Cas System in Escherichia coli. PLoS One, 7, e50797.

38. Trabuco, L.G., Villa, E., Schreiner, E., Harrison, C.B. and Schulten, K. (2009) Molecular dynamics flexible fitting: a practical guide to combine cryo-electron microscopy and X-ray crystallography. Methods, 49, 174-180.

39. Fineran, P.C., Gerritzen, M.J., Suarez-Diez, M., Kunne, T., Boekhorst, J., van Hijum, S.A., Staals, R.H. and Brouns, S.J. (2014) Degenerate target sites mediate rapid primed CRISPR adaptation. Proc Natl Acad Sci U S A, 111, 1629-1638.

40. Sashital, D.G., Wiedenheft, B. and Doudna, J.A. (2012) Mechanism of foreign DNA selection in a bacterial adaptive immune system. Mol Cell, 48, 606-615.

41. Sinkunas, T., Gasiunas, G., Waghmare, S.P., Dickman, M.J., Barrangou, R., Horvath, P. and Siksnys, V. (2013) In vitro reconstitution of Cascade-mediated CRISPR immunity in Streptococcus thermophilus. Embo Journal, 32, 385-394.

133

42. Westra, E.R., Nilges, B., van Erp, P.B., van der Oost, J., Dame, R.T. and Brouns, S.J. (2012) Cascade-mediated binding and bending of negatively supercoiled DNA. RNA Biol, 9.

43. Rutkauskas, M., Sinkunas, T., Songailiene, I., Tikhomirova, M.S., Siksnys, V. and Seidel, R. (2015) Directional R-Loop Formation by the CRISPR-Cas Surveillance Complex Cascade Provides Efficient Off-Target Site Rejection. Cell Rep.

44. Westra, E.R., Swarts, D., Staals, R., Jore, M.M., Brouns, S.J.J. and Oost, J.v.d. (2012) The CRISPRs, They Are A-Changin': How Prokaryotes Generate Adaptive Immunity. Annu. Rev. Genet., 46, 311-339.

45. Jackson, R.N., Lavin, M. and Wiedenheft, B. (2014) Fitting CRISPR-associated Cas3 into the Helicase Family Tree. Current Opinion in Structural Biology, 24, 106–114.

46. Mulepati, S. and Bailey, S. (2013) In Vitro Reconstitution of an Escherichia coli RNA-guided Immune System Reveals Unidirectional, ATP-dependent Degradation of DNA Target. J Biol Chem, 288, 22184-22192.

47. Westra, E.R., van Erp, P.B.G., Künne, T., Wong, S.P., Staals, R.H.J., Seegers, C.L.C., Bollen, S., Jore, M.M., Semenova, E., Severinov, K. et al. (2012) CRISPR immunity relies on the consecutive binding and degradation of negatively supercoiled invader DNA by Cascade and Cas3. Mol Cell, 46, 595- 605.

48. Blosser, T.R., Loeff, L., Westra, E.R., Vlot, M., Kunne, T., Sobota, M., Dekker, C., Brouns, S.J. and Joo, C. (2015) Two distinct DNA binding modes guide dual roles of a CRISPR-Cas protein complex. Mol Cell, 58, 60-70. 134

Supplemental Figures

Figure S4.1. Representative agarose gels used to screen colonies in plasmid curing assays. (A) Agarose gel of PCR products from 20 colonies of E. coli that contain a Type IE CRISPR programed to target a high-copy number (pUC19) plasmid (positive control). Only one colony contained the target plasmid (lane1, 768-bp fragment), while 19 colonies contained the non-target plasmid (454-bp fragment). (B) Agarose gel of PCR products from 20 colonies of E. coli that contain a Type IE CRISPR-mediated programed to target a sequence that is not represented in the plasmid (negative control). Twelve colonies contained the target plasmid, seven colonies contained the non-target plasmid, and one colony contained both plasmids (lane 20).

135

Figure S4.2. Superposition of Cascade structures. (A) All six Cascade assemblies from the three available structures are aligned. Cas6e head adopts different conformations that vary by ~6 Å. (B) Overlay of the ssDNA-bound Cascade structure (colored) onto all the apo structures (transparent white). Cas6e of the ssDNA-bound structure adopts a conformation similar to that observed in apo Crystal assemblies, but target binding causes a conformational rearrangement in the Cse2 and Cse1 subunits, shown as colored arrows. The grey arrow indicates the variable position of Cas6e in the apo structures. 136

Figure S4.3. The ΔKKKK mutant expresses and purifies similar to the wild type Cascade complex. (A) Elution profile of WT Cse2-tagged Cascade (green) and Cse2- tagged ΔKKKK (blue). The insert shows a Commassie blue-stained SDS-PAGE gel (top) and a denaturing polyacrylamide gel of phenol-extracted crRNA isolated for the Cascade complexes (bottom). ΔKKKK expresses and purifies as WT Cascade. (B) Elution profile of Cse2-tagged Cascade (blue) and Cas7-tagged Cascade (red). The broader peak of the Cas7-tagged Cascade consists of WT Cascade and CascadeΔCse2 complexes. 137

138

Figure S4.4. Cascade relies on a lysine-rich vise for binding dsDNA targets. (A) Schematic of the crRNA and a 72-bp dsDNA target. The protospacer (red) is flanked by 19-bps (brown) on the 5’-end and 0 to 21-bps on the 3’-end. (B) Electrophoretic mobility shift assays (EMSA) performed using Cascade or ΔKKKK-Cascade, with dsDNA substrates that have different 3’- ends. Only dsDNA targets that reach the lysine-rich vise (K-rich vise) are bound with high affinity. Deletion of the lysine vise (ΔKKKK) results in a binding defect. (C) EMSA performed with Cascade or ΔKKKK-Cascade, using either 32-nt or 72-nt ssDNA targets. Mutation of the K-rich vise (ΔKKKK-Cascade) has little effect on ssDNA binding. Imagines of gels used for Electrophoretic Mobility Shift Assays were saved as JPGs and the brightness and contrast have been altered to enhance visibility.

139

Figure S4.5. Structural models generated using Molecular Dynamics Flexible Fitting. (A) Atomic coordinates of Cascade were modeled into the cryo-EM density of Cascade bound to a dsDNA target with a PAM (magenta) using Molecular Dynamics Flexible Fitting (MDFF). The model is colored according to changes in distance relative to the PAM. Atoms that move towards the PAM are colored orange and away from the PAM are blue. (B) Same as in panel A, only the atomic coordinates of the ssDNA bound structure of Cascade were used as the starting model. In this simulation the L1-helix moves out of the Cas5e pore, and is repositioned next to the PAM. 140

Figure S4.6. Sequence and structural alignment of PAM sensing features from E. coli and S. thermophilus. (A) Amino acid sequence alignment of the L1 and β-hairpin features of Cse1 from E. coli and S. thermophilus. (B) Structural comparison of the L1 and β-hairpin features of Cse1 from E. coli (left) and a homology model of Cse1 from S. thermophilus (right). 141

142

Figure S4.7. The L1-helix and β-hairpin are required for high-affinity binding to dsDNA targets. (A) EMSAs of WT Cascade, the L1-helix mutant (Cse1 T125N/N126K), and the β-hairpin mutant (Cse1 R351G/N353P/A355S/S356R) performed using a 72-bp dsDNA target. (B) Mutation of the L1-helix or the β-hairpin has little effect on ssDNA binding. EMSAs of Cascade, the L1-helix mutant, and the β-hairpin mutant, were performed using 72-nt ssDNA target.

143

Figure S4.8. Conserved arginines stabilize flipped-out bases on the DNA target. (A) Equilibrium dissociation constants of Cascade for 72-bp targets and 72-bp targets that are abasic at the positions corresponding to the flipped-out bases in the target strand. (B) Representative, EMSA performed using Cascade and ssDNA targets with or without abasic nucleotides a every 6th position of the protospacer. The absence of a nucleobase results in a modest but reproducible binding defect. 144

Figure S4.9. Mutations that perturb the arginine relay result in a dsDNA binding defect. (A) EMSAs performed using Cascade or Cascade mutants (Cse2R27A/R101A or Cse2R107A/R119A) with 72-bp dsDNA target. Absence of the Cse2 subunits in the mutants leads to a severe dsDNA-binding defect. Cas7-tagged Cascade has a higher KD (13.3 nM) than Cse2-tagged Cascade because it contains a subpopulation of Cascades lacking Cse2 (Figure S4.3). (B) EMSA of Cascade and arginine or aspartic acid mutants with 72-nt ssDNA target. These mutations result in a 15-fold binding defect. Visible is the subpopulation of CascadeΔCse2 binding ssDNA in the WT Cascade lanes.

145

Figure S4.10. Aspartate residue conservation in Cas7 proteins from Type I and Type III CRISPR systems. (A) Amino acid sequence alignments of the Cas7 subunit from Type I-E (Cas7), Type III-A (Csm3), and Type III-B (Cmr4). The conserved aspartate is indicated on each alignment. (B) Cas7 subunits are shaped like a right hand with palm, fingers, and thumb domains. The conserved aspartate is located between the palm and thumb domains. PDB codes are indicated. (C) Views of the conserved aspartate in context of the Cascade complex (left) and Cmr Complex (right). In Cascade, D22 forms a salt bridge with R27, whereas in the Cmr Complex D31 forms part of the RNase active site. 146

Supplemental Tables

Table S4.1. Plasmids used

Plasmid Description source pWUR397 Cas3 in pRSF-1b Brouns et al. (1) pWUR400 cse1-cse2-cas7-cas5-cas6e in pCDF-1b Brouns et al. (1) pWUR407 cse1-cse2 in pRSF-1b Brouns et al. (1) pWUR408 cse1 in pRSF-1b Brouns et al. (1) pWUR515 cas7 with Strep-tag II (N-term)-cas5-cas6e in pET52b Brouns et al. (2) pWUR547 E. coli R44 CRISPR, 7x spacer P7 in pACYCDuet-1 Jore et al. (3) pWUR610 pUC-J3; pUC19 containing J3-protospacer on a 350 bp phage λ amplicon Semenova et al. (4) pWUR613 pUC-P7-CGT; pUC19 containing R44-protospacer with CGT PAM on a 350 bp phage P7 Jore et al. (3) amplicon pWUR630 E. coli CRISPR, 4x spacer J3 in pACYCDuet-1 Semenova et al. (4) pWUR656 cse2 with Strep-tag II (N-term)-cas7-cas5-cas6e in pCDF-1b Westra et al. (5) pUC-P7-CAT pUC19 containing R44-protospacer with CAT PAM on a 350 bp phage P7 amplicon This study

147

Table S4.2. Data collection and refinement statistics

Crystal morphology Shard Data collection Beamline APS 23-ID Space group P212121 Cell dimensions a, b, c (Å) 105.99, 244.80, 426.74    (deg) 90.0, 90.0, 90.0 CC1/2 0.984 (0.566) Completeness (%) 100 (100.0) Redundancy 7.3 (6.9) Rmerge(%) 0.304 (1.369) I / σI 8.0 (2.1)

Refinement Resolution† (Å) 49.74-3.20 (3.25-3.20)* No. reflections 1347007 Rwork/Rfree (%) 25.0/27.6

RMSD Bond lengths (Å) 0.005 Bond angles (deg) 0.839 Ramachandran Favored (%) 94.6 Outliers (%) 0.58 Clashscore 4.3

*Values in parentheses are for highest-resolution shell. †Resolution limits use the criterion of I/σI > 2.0

148

Table S4.3. DNA substrates used for band shift assays

Description Sequence target strand (top) and non-target strand (bottom); PAM in black and spacer in blue

72 bp dsDNA target 3’-CGCGCCGTTCGGCTTTCGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 5’-GCGCGGCAAGCCGAAAGCATGACGGTATTGTTCAGATCCTGGCTTGCCAACAG TGATTGCTCGGGAGTCGCT-3’ (+21) 66 bp dsDNA target 3’-GTTCGGCTTTCGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 5’-CAAGCCGAAAGCATGACGGTATTGTTCAGATCCTGGCTTGCCAACAG TGATTGCTCGGGAGTCGCT-3’ (+15) 62 bp dsDNA target 3’-GGCTTTCGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 5’-CCGAAAGCATGACGGTATTGTTCAGATCCTGGCTTGCCAACAG TGATTGCTCGGGAGTCGCT-3’ (+11) 60 bp dsDNA target (+9) 3’-CTTTCGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 5’-GAAAGCATGACGGTATTGTTCAGATCCTGGCTTGCCAACAG TGATTGCTCGGGAGTCGCT-3’

58 bp dsDNA target (+7) 3’-TTCGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 5’-AAGCATGACGGTATTGTTCAGATCCTGGCTTGCCAACAG TGATTGCTCGGGAGTCGCT-3’

56 bp dsDNA target (+5) 3’-CGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 5’-GCATGACGGTATTGTTCAGATCCTGGCTTGCCAACAG TGATTGCTCGGGAGTCGCT-3’

54 bp dsDNA target (+3) 3’-TACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 5’-ATGACGGTATTGTTCAGATCCTGGCTTGCCAACAG TGATTGCTCGGGAGTCGCT-3’

51 bp dsDNA target (+0) 3’-TGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 5’-ACGGTATTGTTCAGATCCTGGCTTGCCAACAG TGATTGCTCGGGAGTCGCT-3’

72 bp a-basic dsDNA 3’-CGCGCCGTTCGGCTTTCGTACTGCCA-AACAA-TCTAG-ACCGA-CGGTT-TCACTAACGAGCCCTCAGCGA-5’ 5’-GCGCGGCAAGCCGAAAGCATGACGGTATTGTTCAGATCCTGGCTTGCCAACAG TGATTGCTCGGGAGTCGCT-3’ target 72 bp dsDNA target 3’-CGCGCCGTTCGGCTTTCGTAATGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 5’-GCGCGGCAAGCCGAAAGCATTACGGTATTGTTCAGATCCTGGCTTGCCAACAG TGATTGCTCGGGAGTCGCT-3’ 5’-AAT-3’ PAM 72 nt ssDNA target (+21) 3’-CGCGCCGTTCGGCTTTCGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 66 nt ssDNA target (+15) 3’-GTTCGGCTTTCGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 62 nt ssDNA target (+11) 3’-GGCTTTCGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 60 nt ssDNA target (+9) 3’-CTTTCGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 58 nt ssDNA target (+7) 3’-TTCGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 56 nt ssDNA target (+5) 3’-CGTACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 54 nt ssDNA target (+3) 3’-TACTGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 51 nt ssDNA target (+0) 3’-TGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 72 nt a-basic ssDNA 3’-CGCGCCGTTCGGCTTTCGTACTGCCA-AACAA-TCTAG-ACCGA-CGGTT-TCACTAACGAGCCCTCAGCGA-5’ target 72 nt ssDNA target 3’-CGCGCCGTTCGGCTTTCGTAATGCCATAACAAGTCTAGGACCGAACGGTTGTC ACTAACGAGCCCTCAGCGA-5’ 5’-AAT-3’ PAM

149

Table S4.4. Primers used for site directed mutagenesis

Primer Sequence (5`-3`) description

PvE 80 gctgttcttgctGAAGATATTGCCGCCATAC cas7 K137A K138A K141A K144A (fw) (ΔKKKK) PvE 81 gagcagagcagcATCATCCAGATTATCAGCC cas7 K137A K138A K141A K144A (rv) (ΔKKKK) PvE 106 cagagccgcATTCTTGAACGGCGTCATG cse1 R351G N353P A355S S356R (fw) (β-hairpin) PvE 114 tggatttccATATCCCCCCATAATCAATTC cse1 R351G N353P A355S S356R (rv) (β-hairpin) PvE 132 CatgACGGTATTGTTCAGATCCTGGCTTGCCAACAG pUC-p7-CAT (fw) PvE 135 CTTTCGAGCTTGCCGATCAGCTTGG pUC-P7-CAT (rv) PvE 148 GCAAATTAGAgctGTTTCAGAACCTGATGAATTAC cse2 R27A(fw) PvE 149 GCACATGATCCATTATCC cse2 R27A(rv) PvE 150 TAACGAGCGCgctATCTTTCAATTAATTCGG cse2 R101A(fw) PvE 151 ATTCTTCCACTATTGGCTAAAG cse2 R101A(rv) PvE 152 TCAATTAATTgctGCTGACAGAACAGCCGATATGGTC cse2 R107A(fw) PvE 153 AAGATACGGCGCTCGTTA cse2 R107A(rv) PvE 154 TAACGAGCGCgctATCTTTCAATTAATT cse2 R101A R107A(fw) PvE 166 AAGCGGCGCGaacaagTGTGCATTTGTCAATCAAC cse1 T125N N126K (fw) (L1) PvE 167 ACCCCAGCCAACAGTTTTTC cse1 T125N N126K (rv) (L1) PvE 176 CCAGTTACGTgctTTACTTACTCACGC cse2 R119A(fw) PvE 177 ACCATATCGGCTGTTCTG cse2 R119A(rv)

150

Supplemental Movies

Movie S4.1. Mechanism of dsDNA binding by Cascade. Cascade is a seahorse-shaped ribonucleoprotein complex consisting of 11 protein subunits and a crRNA guide molecule (red). X-ray crystal structures of Cascade determined before and after binding a complementary DNA target reveal a conformational change that is primarily restricted to the belly (Cse2) and tail (Cse1) subunits. To understand the mechanism of DNA binding, the unbound structure of Cascade was docked into the 9 Å cryo-EM density and a piece of B-form dsDNA (green) with PAM sequence (pink) was modeled into density near the tail. Lysine rich helices (blue) are present on the Cas7 backbone subunits. The lysine rich helices of Cas7.5 and Cas7.6 form a vise, which interacts with the phosphate backbone of the DNA target. Molecular dynamics flexible fitting (MDFF) was used to model atomic coordinates of the unbound Cascade structure into the cryo-EM density. The simulation shows that upon dsDNA binding the L1-helix slightly out of a pore formed by Cas5e. In contrast to the modest movement in L1, a large β-hairpin is repositioned in between the complementary and displaced strands of the DNA target. Residues that move towards the PAM are colored orange in the simulation. In addition to conformational changes in the tail, target binding triggers an “arginine relay” in which specific salt bridges between arginine residues on Cse2 (R27 and R101) and aspartic acid residues (D22) on the Cas7 subunits, are broken and compensatory salt bridges are formed (R107 R119) with acidic residues further down the Cas7 backbone. The arginines R27 and R101 interact with the flipped-out bases of the target DNA strand. https://academic.oup.com/nar/article/43/17/8381/2414437#supplementary-data

151

CHAPTER FIVE

STRUCTURAL BASIS FOR PROMISCUOUS PAM RECOGNITION IN THE

TYPE I-E CASCADE FROM E. COLI

Contribution of Authors and Co-Authors

Manuscript in Chapter 5

Author: Robert P. Hayes

Contributions: designed the research, determined the structure and performed biochemical analyses, wrote the manuscript

Co-Author: Yibei Xiao

Contributions: contributed to biochemical analysis

Co-Author: Fran Ding

Contributions: contributed to biochemical analysis

Co-Author: Paul B.G. van Erp

Contributions: contributed to biochemical analysis

Co-Author: Kanagalaghatta Rajashankar

Contributions: assisted with diffraction data collection and processing

Co-Author: Scott Bailey

Contributions: contributed to assay setup

Co-Author: Blake Wiedenheft

Contributions: data interpretation, wrote the manuscript

Co-Author: Ailong Ke

Contributions: designed the research, wrote the manuscript 152

Manuscript Information

Robert P. Hayes, Yibei Xiao, Fran Ding, Paul B. G. van Erp, Kanagalaghatta Rajashankar, Scott Bailey, Blake Wiedenheft, Ailong Ke

Nature

Status of Manuscript: ____ Prepared for submission to a peer-reviewed journal ____ Officially submitted to a peer-reviewed journal ____ Accepted by a peer-reviewed journal X Published in a peer-reviewed journal

Nature Publishing Group Volume 530 Issue 7591 25 Feb 2016 Pages 499-503 DOI: 10.1038/nature16995

153

STRUCTURAL BASIS FOR PROMISCUOUS PAM RECOGNITION IN THE

TYPE I-E CASCADE FROM E. COLI

Robert P. Hayes1, Yibei Xiao1, Fran Ding1, Paul B. G. van Erp4, Kanagalaghatta

Rajashankar2, Scott Bailey3, Blake Wiedenheft4, Ailong Ke1

1Department of Molecular Biology and Genetics, Cornell University, 253 Biotechnology

Building, Ithaca, NY 14853, USA. 2Department of Chemistry and Chemical Biology,

Cornell University, NE-CAT, Advanced Photon Source, Argonne National Laboratory,

Argonne, IL 60439, USA. 3Department of Biochemistry and Molecular Biology, Johns

Hopkins University, Bloomberg School of Public Health, Baltimore, MD 21205, USA.

4Department of Microbiology and Immunology, Montana State University, Bozeman,

MT 59717, USA.

154

Abstract

Clusters of regularly interspaced short palindromic repeats (CRISPRs) function in tandem with cas (CRISPR-associated) operon to form an RNA-based adaptive immune system against foreign genetic elements in prokaryotes (1). Type I systems account for

95% of known CRISPR systems, and have been utilized to control and cell fate (2, 3). During CRISPR RNA (crRNA)-guided interference, Cascade (CRISPR- associated complex for antiviral defense) facilitates crRNA-guided invasion of double- stranded DNA (dsDNA) for complementary base-pairing with the target DNA strand, while displacing the non-target strand to form an R-loop (4, 5). Cas3 nuclease/helicase is recruited subsequently to degrade two DNA strands (4, 6, 7). Protospacer adjacent motif

(PAM) flanking target DNA is crucial for self vs. foreign discrimination (4, 8-16). Here we present a 2.45 Å crystal structure of E. coli Cascade bound to a foreign dsDNA target.

The 5’-ATG PAM is recognized in double-stranded form, from the minor groove side, by three structural features in Cse1. The promiscuity inherent to minor groove DNA recognition rationalizes the puzzling observation that a single Cascade can respond to several distinct PAM sequences. Optimal PAM recognition coincides with a wedge inserted into dsDNA, initiating the directional target strand displacement for segmented base-pairing with crRNA. The non-target strand is guided along a parallel path 25 Å apart, and the R-loop structure is further stabilized by locking this strand behind Cse2 dimer.

These observations provide the structure basis to understand the PAM-dependent directional R-loop formation process (17, 18).

155

Main text

Differentiating between ‘self’ and ‘non-self’ antigens is critical in CRISPR systems, as foreign target sequences (protospacers) are identical to sequences recorded in the host

CRISPR locus (spacers). In Type I and II CRISPR systems, foreign DNA detection relies on protein-mediated recognition of a short PAM sequence (4, 8-16). Whereas PAM recognition in Cas9-based Type II systems has been elucidated based on major-groove

DNA contact (19, 20), it remains unclear as to whether Cascade recognizes PAM from

DNA major or minor groove (12), in ss- or ds-form (4, 21). A particularly puzzling observation is the promiscuity in PAM recognition. Five PAM sequences (5’-ATG, AAG,

AGG, GAG, and TAG reading from the non-target strand) are capable of triggering robust

CRISPR interference via E. coli Cascade (4, 11, 21, 22). Crystal structures of E. coli

Cascade in free- and ssDNA-bound forms revealed multiple conformational states, and provided valuable insights about the crRNA-guided ssDNA recognition mechanism (21,

23, 24), however, the mechanisms for dsDNA entry, PAM recognition, and R-loop formation remain poorly defined.

To understand the PAM-dependent foreign DNA recognition mechanism, we determined the 2.45 Å crystal structure of E. coli Cascade bound to a partially duplexed dsDNA that forms a partial R-loop (Figure 5.1A-C, Figure S5.1); such DNA substrates were efficiently bound by T. fusca Type I-E Cascade, and the resulting complex specifically recruited Cas3 (25). This structure agrees well with the cryo-EM reconstruction of the full R-loop/Cascade complex, underlining its validity in explaining the R-loop formation process (21) (Figure S5.2, Supplementary Video 1). Comparison with 156 high-resolution crystal structures suggests the R-loop formation requires both the sliding of Cse1-CTD (C-terminal domain)-Cse2.1-Cse2.2, as seen in the ssDNA-bound structure, and the engagement of Cse1-NTD (N-terminal domain) as seen in the free-Cascade structure (21, 23, 24) (Figure S5.3, Supplementary Videos 2, 3). In addition, a localized conformational change takes place near the putative Cas3 binding site (21), which is only observed in this R-loop bound structure, therefore may play a role in Cas3 recruitment

(Figure S5.4). dsDNA enters Cascade between Cas7.5 and Cas7.6, contacted by the lysine- rich helices (16, 26) (Figure 5.1D-E, Figure S5.5). DNA bifurcates underneath PAM. The entire target DNA strand flips to form the segmented DNA-crRNA duplex, as observed in the Cascade-ssDNA structure (27). The 10-nt non-target strand is guided to a parallel path

25 Å apart by sequence nonspecific contacts, an active mechanism to prevent R-loop collapse (Figure 5.1D-E). Modeling dsDNA beyond PAM would project it across Cse1-

CTD without severe steric clashes (Figure 5.1F), which illustrates a possible PAM- searching scenario (21).

The 5’-ATG PAM sequence is recognized in the double-stranded form, from the minor groove side, by the Cse1 subunit of the Cascade (Figure 5.2A-B). This rather surprising mode of recognition strongly biases towards the target DNA strand, which explains the previous observations that mismatched PAMs could be tolerated, provided the target strand sequence was optimal (4, 21). Three structural features in Cse1 are involved in PAM recognition: the glutamine-wedge, the glycine-loop and a lysine-finger (Figure

5.2A-B). Most studies showed that only CT-1−GNT-1 is tolerated at the -1 PAM position

(PAM-1) in E. coli (5, 21), although recent high throughput analyses revealed spacer- 157 dependent tolerance of other base-pairs at PAM-1 (28). In our structure, CT-1−GNT-1 is specified by two main-chain protein contacts (Figure 5.2A-D). The amide of A355 in the glutamine-wedge donates a H-bond to O2 of CT-1, specifying a pyrimidine in the target strand. The carbonyl of G157 in the glycine-loop accepts a water-mediated H-bond from

N2 of GNT-1, although GNT-1-to-inosine substitution assayed by Electrophoretic Mobility

Shift Assays (EMSAs) suggests this contact only provides minor discrimination against

ANT-1 (Figure S5.6A). Cascade’s affinity for different PAM-1 combinations agreed well with our structure-based prediction that the target strand contact is the major determinant of specificity (Figure S5.6B). The narrow dynamic range of Kd differences, however, does not seem to support an absolute CT-1−GNT-1 specification at PAM-1. Indeed, Cas3 was capable of cleaving each PAM-containing target, provided the Cascade concentration is above the Kd of the corresponding target (Figure S5.6C). These results echo the recent analysis in suggesting that PAM-1 readout could be further complicated by Cascade/Cas3 expression level and spacer content (28). 158

a b

c f

d e

Figure 5.1. Foreign DNA bound Cascade structure. (A) Arrangement of I-E CRISPR- cas locus in the E. coli K12 genome. The color schemes are preserved throughout all figures. (B) Nucleic acid sequences and (C) overall views of foreign DNA-bound Cascade structure. (D) Entry of dsDNA between Cas7.5 and Cas 7.6, PAM recognition by three structural elements in Cse1-NTD (magenta), and partial R-loop underneath PAM. (E) Schematic of Cascade-DNA contacts around PAM regions. Hydrogen bonds and electrostatic contacts as dashed lines, hydrophobic interactions as solid lines, and waters as blue circles. (F) Modeling of Cascade sampling B-form dsDNA for PAM.

159

Importantly, PAM recognition coincides with the insertion of the glutamine-wedge into the path of dsDNA underneath PAM (Figure 5.2A, D). The tip residue Q354 stacks underneath CT-1 and, together with N353, sterically displaces the first two nucleotides in the target strand protospacer, forcing them to rotate outwards. Given its strategic location, it is rather surprising that this wedge is not highly conserved in sequence (Figure S5.7).

Indeed, tip residue substitutions (Q354A, N353A, and Q354A/N353A) had little effect in

EMSA assays. In contrast, trimming this wedge (NNQAS352-356/GG) led to 100-fold

DNA binding defect. These data suggest the wedge functions largely through a steric interference mechanism, to nucleate the displacement of target strand nucleotide upon

PAM recognition (Figure 5.2E-F). A serine-to-phosphate ‘lock’ is essential in initiating the target DNA strand flipping in Cas9 (19). T125 is in a similar location in our structure but contacts the +1 bridging oxygen instead, and T125A substitution had negligible defect

(Figure 5.2E).

Recognition at PAM-2 is promiscuous by E. coli Cascade. Only GT-2-CNT-2 is rejected at this position; the other three combinations lead to efficient interference (22).

Here the glycine-loop residues (159-161) assume a Type II β-turn. This lip-like structure introduces DNA bending at AT-2-TNT-2 and ‘bites’ onto PAM-1 base-pair in conjunction with the glutamine-wedge underneath; TNT-2 retreats backwards and tilts upwards (Figure

5.2B, D). The rim of the glycine-loop explores shape-complementarity to AT-2, and donates a weak H-bond to N3 of AT-2. G160A substitution disrupts the shape complementarity and reduced Cascade binding affinity by ~100-fold, and Cas3 cleavage to baseline level (Figure

5.2E-F). Rejection of a GT-2-CNT-2 pair at PAM-2 can be rationalized by the fact that N2 160 amine of GT-2 would introduce steric clashes against the glycine loop; whereas a TT-2-ANT-

2 or CT-2-GNT-2 pair would not, based on modeling (Figure S5.8A). Indeed, removal of this amine by inosine substitution largely rescued the DNA-binding defect, confirming that N2 of GT-2 is the anti-determinant for PAM-2 specificity (Figure S5.8B). An equivalent glycine-rich loop is present in all known Cse1 structures; they likely play a similar minor groove DNA recognition function (Figure S5.7).

PAM-3 is typically specified as a pyrimidineT-purineNT pair by E. coli Cascade (4,

11, 21), TAG PAM is also tolerated, but none of the GT-3-CNT-3 containing PAMs lead to interference (22, 28). Here the only contact is a favorable electrostatic interaction from a lysine-finger (K268) to O2 of TT-3, which rationalizes the strong preference for pyrimidine at this position (Figure 5.2B, D). K268A mutation had >8-fold DNA binding and >10-fold

Cas3 cleavage defects, emphasizing its positive contribution to PAM-3 recognition (Figure

5.2E-F). Interestingly, K268A mutant still retained wild-type level discrimination against

5’CTG-PAM (Figure S5.8C). GT-3–to-inosine substitution ultimately proved that, similar to the scenario at PAM-2, the N2 amine of GT-3 serves as a strong anti-determinant in the rejection of GT-3–containing PAMs (Figure S5.8D). Interestingly, K268 makes a weaker electrostatic interaction to CT-4 (Figure 5.2B, D), which imply certain level of sequence discrimination at PAM-4 as well. 161

Figure 5.2. PAM recognition by Cse1 subunit of Cascade. (A)(B) Detailed views featuring PAM recognition by the glutamine-wedge (and its involvement in target DNA strand displacement), glycine-loop, and lysine-finger of Cse1. The 2FO-FC electron density map is displayed at 1.5 σ. (C) Summarization of the five interference PAMs in E. coli Type I-E CRISPR system, with the observed Cascade contacts marked. (D) Top-down views of PAM recognition at each base-pair. Left: stringent recognition of PAM-1. Middle: shape complementarity to PAM-2. Right: Electrostatic contacts to PAM-3 and -4. (E) Mutagenesis assayed by Cascade-binding (EMSA) and (F) Cas3-mediated DNA cleavage to evaluate the observed PAM contacts.

162

Detailed structure dissection also helps us rationalize the self-avoidance mechanism. The spacers in E. coli CRISPR loci are ‘protected’ by a 5’-CCG-PAM. This

PAM combines the least-preferred nucleotides in each PAM position (GNT-3GNT-2CNT-1), rationalizing the inability of Cascade to form an R-loop in the CRISPR loci, despite a perfect spacer match.

The non-target strand is guided sequence nonspecifically along the Cascade surface, 20-25 Å away from the target strand (Figure 5.3A). The sugar-phosphates of nucleotides 1-3 are contacted by K163, G169 and K296 of Cse1-NTD. Nucleotides 6-9 traverse across Cse1-CTD, contacted by a string of positively charged residues

(R393/K394/K488/H489/R491) (Figure 5.3A). Y397 and L481 further interdigitate between di-nucleotide stacks to strengthen this interaction. The redundant interactions were not disrupted by a point mutation (Y397A). A double mutant (488-489 KH/AA) neutralizing a positive patch, however, led to a 4-fold binding defect (Figure 5.3B). The

10th/last nucleotide rests at an intersection between Cse1-CTD and Cse2.1. To resolve whether the following non-target residues traverse across the surface or backside of the

Cse2 dimer (trench route), we further determined a 3.2 Å structure where the non-target

DNA strand is 22-nt longer. Although most of the additional residues are not resolved, the electron density clearly reveals that the following non-target strand residues take the trench route downwards through the gap between Cse1-CTD and Cse2.1, attracted by the favorable electrostatic environment therein (Figure 5.3C-E). The non-target strand sequestration to the Cse2’s backside explains the corresponding protection pattern in 163 footprinting analysis (5). It likely corresponds to the extra “locking” step after most of the

R-loop forms, as the R-loop collapse would be kinetically slow after sequestration (17).

a b

c

d

Figure 5.3. Active guidance of the non-target DNA strand by Cascade. (A) Sequence nonspecific contacts guide the 10-nt non-target strand (red sticks) underneath PAM along the surface of Cse1-NTD (cyan), and across a binding site on Cse1-CTD (green). (B) EMSA evaluating non-target contacts by Cascade. (C)(D) Nucleic acid sequence and Zoom-in view of the 3.2 Å structure of Cascade programmed with a 22-nt longer non-target strand. The 2FO-FC electron density map at 1.0 σ clearly reveals that the longer non-target DNA strand takes the trench route. Inset illustrates the favorable electrostatic surface. 164

a b c d

Figure 5.4. Model for PAM-dependent directional R-loop formation in Type I CRISPR-Cas system. (A) Cascade samples dsDNA minor groove for various sequence combinations (yellow sticks). (B) Left: interference and priming PAMs lead to longer dwell time and local DNA bending. Middle: Interference PAMs allow optimal minor groove interaction, which is coupled with the glutamine wedge insertion and disruption of the first 2-bp of protospacer. Right: Directional DNA melting leads to segmented DNA/crRNA duplex formation at the target strand side, and favorable sugar-phosphate contacts at the non-target side, leading to seed bubble stabilization. (C) Further DNA unwinding leads to the non-target strand sequestration to the backside of Cse2 dimer, locking the R-loop in place. (D) R-loop formation is accompanied by a pivoting motion in Cse1-CTD and a sliding motion in Cse2 dimer. Local rearrangement occurs in Cse1 (depicted as a flag), licensing Cas3 recruitment.

In summary, our structural analysis provides important insights about the PAM- dependent directional R-loop formation process (17) (Figure 5.4). Recognition of an interference PAM by Cascade coincides with the wedge-mediated displacement of the first two target strand nucleotides, initiating DNA unwinding. The directional DNA melting ensures the ordered guidance of the non-target DNA strand ~25Å away from the target strand, as a mechanism to stabilize the seed bubble. Further R-loop propagation leads to non-target strand sequestration behind Cse2 dimer, locking the R-loop in place.

Conformational changes accompany the process and reorganize the Cse1 surface, paving the way for Cas3 binding. The active guidance of the non-target DNA strand is a theme not observed in Cas9-DNA structures (19, 20). It rationalizes the observation that the Cascade- bound R-loop is significantly more stable than that by Cas9 (17). Besides five interference 165

PAMs, Twenty-one other PAM sequences stimulate the “primed adaptation” in E. coli (22), where Cascade and Cas3 actively recruit the spacer acquisition Cas1/Cas2 complex (29,

30). Priming PAMs may lead to suboptimal Cse1 contacts and non-canonical R-loops.

Such T-loops are usually not completely unwound and more difficult to form (18), requiring much higher Cascade concentration (30) and favorable DNA torque (17). They also fail to directly recruit Cas3 (30), which may reflect that the non-target DNA strand is misguided, as Cas3 recruitment is contingent on physically interacts with non-target strand as well (25). Overall, our work provides a framework for future biochemical and structural studies to better understand the interference mechanism in Type I CRISPR-Cas systems.

Materials and Methods

Expression and purification of Cascade, Cascade-dsDNA complex and Cas3

Sequence information for primers and the synthetic CRISPR expression cassette can be found in Extended Data Table 1. Cascade expression was similar to (26). Briefly, cse1 was PCR amplified from E. coli K12 genomic DNA and cloned into pRSF-Duet-

ORF1 vector (KanR), between NcoI and NotI restriction sites (Extended Data Table 1). The cse2-cas7-cas5e-cas6e sub-operon was cloned into pET52b (AmpR) between NcoI and

NotI; as a Precission cleavable His6 fusion at the N-terminal of cse2. The pre-crRNA expression cassette was synthesized by Life Technologies and cloned into the pHSG-398 vector (CamR) (Extended Data Table 1). E. col BL21 (DE3) star cells containing the 3

° plasmids were grown in LB medium at 37 C to O.D. 600 of 0.6. Cascade expression was induced by the addition of 0.5 mM isopropyl-β-D-1-thiogalactopyranoside (IPTG) and cells were further cultured at 20 °C for 12 hr. 166

The cells ware disrupted by sonication in buffer A (50mM Tris pH 8.0, 20mM imidazole, 300 mM NaCl), loaded to Ni-NTA column, and eluted with buffer A supplemented with 300 mM imidazole. The His6 tag was cleaved by Precision protease, and back-adsorbed with a second Ni-NTA column binding step. Cascade was concentrated and buffer exchanged into buffer B (20 mM Tris pH 7.5, 150 mM NaCl, 2 mM DTT), and further purified on Superdex 200 prep grade column (GE healthcare). Free-Cascade containing fractions were pooled, concentrated to 15 mg/mL, flash-frozen, and stored at -

80 oC.

Cascade-dsDNA complex was prepared by mixing free-Cascade with dsDNA R- loop mimicking substrate. DNA substrates were chemically synthesized from IDT

(Extended Data Table 1). The non-target strand was annealed with the target strand at a

1.5:1 molar ratio. The resulting R-loop mimicking substrate was mixed with Cascade at a

2:1 molar ratio, incubated at room temperature for 30 minutes, and re-purified on Superdex

200. Cascade-dsDNA complex fractions were pooled and utilized in crystallization trials.

The Cascade-dsDNA complex containing the 32-nt non-target strand overhang was also obtained using the above protocol, except the His tag was not cleaved. Cascade mutants were constructed with site-directed mutagenesis, and purified using the same method as free-Cascade, except that the N-terminal His6 tag was left intact. Cascade integrity was checked using SDS-PAGE (Figure S5.4B).

The Cas3 gene was amplified from E. coli K12 genomic DNA and cloned between

BamHI and XhoI into the pET28a-SUMO plasmid. E. coli BL21 (DE3) star cells were

° grown in LB medium at 20 C to O.D.600 of 0.3, induced with 0.2 mM IPTG, and further 167 cultured at 20 °C for 12 hr. The Ni-NTA and SEC purification procedures were similar to the procedure mentioned above. The monomeric SUMO-Cas3 fractions were pooled, concentrated to 2 mg/mL, flash-frozen and stored at -80 °C until usage in biochemical assays.

Crystallization and structure determination of Cascade/partial R-loop complex

Cascade complex crystals were grown using the hanging drop vapor-diffusion method by mixing 2 L of purified Cascade-ds-DNA complex (15 mg/mL) with 2 L of mother liquor (1.6 M Na/K Phosphate pH 6.2) at 18 °C. Initial crystals appeared after 2 weeks and grew to full size after ~6 weeks. Crystals were cryoprotected in motherliquor supplemented with 20% ethylene glycol and flash frozen in liquid nitrogen. Diffraction data were collected at Advanced Photon Source – NECAT beamline 24-ID-C and were processed with HKL2000 (31). The structure was solved by molecular replacement with

PDB: 4QYZ as the search model. Iterative model building and refinement was conducted with COOT (32) and PHENIX (33). A summary of the diffraction and refinement statistics can be found in Extended Data Table 2. The Ramachandran plot for the Cascade-dsDNA

10-nt overhang structure indicated 95.9% of residues in the favored region, 3.9% allowed, and 0.2% outliers. The Ramachandran plot for the Cascade ds-DNA 32-nt overhang structure indicated 91.9% of residues in the favored region, 7.3% allowed, and 0.8% outliers. Figures were generated using Pymol (34) and CCP4mg (35).

Electrophoretic mobility shift assay

Fluorescent dsDNA target substrates were generated for biochemical assays. The crRNA-matching targets with varied PAMs were cloned into pCDF-Duet between the PstI 168 and NcoI sites Extended Data Table 1. The dsDNA1 and dsDNA2 substrates were PCR amplified from the plasmid using the indicated fluorescent oligonucleotides (5’ 6-FAM for the non-target strand and 5’ Cy5 for the target strand). The dsDNA3 substrates were prepared by oligonucleotide annealing. All dsDNA substrates were subsequently gel- purified. The dsDNA1 substrate (5’ATG-PAM) was used for all main-text EMSA and Cas3 cleavage assays. The dsDNA2 substrates were used for the EMSA shown in Figure

S5.7. The dsDNA3 substrates were used in the EMSA shown in Table S5.1. DNA binding was conducted in 20 mM Tris pH 7.5, 150 mM NaCl, 5% glycerol. The dsDNA substrate concentration was held constant at 3 nM and Cascade concentration was titrated as indicated. The Cascade and dsDNA were incubated at 37 ˚C for 30 minutes in 20 mM Tris pH 7.5, 150 mM NaCl, 5% glycerol. EMSA was carried out at 4 ˚C on 2% agarose gels.

Fluorescent signals were scanned using a Typhoon 9200 scanner.

Cascade-mediated Cas3 DNA cleavage assay

Cascade-R-loops were pre-formed by mixing 40 nM Cascade or Cascade mutants with 6 nM fluorescent dsDNA target in binding buffer (5 mM HEPES pH 7.5, 60 mM KCl) at 37 ˚C for 30 minutes. Cascade-R-loops were then mixed with 500 nM SUMO-Cas3 in

DNA cleavage RXN buffer (5 mM HEPES pH 7.5, 60 mM KCl, 10 mM MgCl2, 10 M

CoCl2). Either 2 mM ATP or 2 mM AMPPNP was added and the reaction was incubated at 37 ˚C for 30 minutes. Cy5 and 6-FAM fluorescent signals were recorded by Typhoon

9200 scanner. The wild-type E. coli Cascade specifically nicked the non-target DNA strand

~10-12 nt into the R-loop region in the presence of a non-hydrolyzable ATP analog,

AMPPNP. Addition of ATP triggered processive degradation of the non-target DNA strand 169 and distributive degradation of the target strand upstream of the R-loop. These results are consistent with previous studies (11, 21).

Acknowledgments

This work is supported by NIH grants GM102543 and GM086766 to A.K,

GM097330 to S.B. and GM108888 to B.W. NE-CAT beamlines were supported by NIH grants P41 GM103403 and S10 RR029205. We thank Gerald Feigenson and John Mallon for technical help, Ilya Finkelstein and Ian Price for helpful discussions.

170

References

1. van der Oost J, Westra ER, Jackson RN, Wiedenheft B. Unravelling the structural and mechanistic basis of CRISPR-Cas systems. Nature reviews Microbiology. 2014;12(7):479-92.

2. Luo ML, Mullis AS, Leenay RT, Beisel CL. Repurposing endogenous type I CRISPR-Cas systems for programmable gene repression. Nucleic acids research. 2015;43(1):674-81.

3. Caliando BJ, Voigt CA. Targeted DNA degradation using a CRISPR device stably carried in the host genome. Nature communications. 2015;6:6989.

4. Westra ER, van Erp PB, Kunne T, Wong SP, Staals RH, Seegers CL, et al. CRISPR immunity relies on the consecutive binding and degradation of negatively supercoiled invader DNA by Cascade and Cas3. Molecular cell. 2012;46(5):595-605.

5. Jore MM, Lundgren M, van Duijn E, Bultema JB, Westra ER, Waghmare SP, et al. Structural basis for CRISPR RNA-guided DNA recognition by Cascade. Nature structural & molecular biology. 2011;18(5):529-36.

6. Brouns SJ, Jore MM, Lundgren M, Westra ER, Slijkhuis RJ, Snijders AP, et al. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science. 2008;321(5891):960-4.

7. Wiedenheft B, Lander GC, Zhou K, Jore MM, Brouns SJ, van der Oost J, et al. Structures of the RNA-guided surveillance complex from a bacterial immune system. Nature. 2011;477(7365):486-9. 8. Westra ER, Semenova E, Datsenko KA, Jackson RN, Wiedenheft B, Severinov K, et al. Type I-E CRISPR-cas systems discriminate target from non-target DNA through base pairing-independent PAM recognition. PLoS genetics. 2013;9(9):e1003742.

9. Marraffini LA, Sontheimer EJ. Self versus non-self discrimination during CRISPR RNA-directed immunity. Nature. 2010;463(7280):568-71.

10. Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Almendros C. Short motif sequences determine the targets of the prokaryotic CRISPR defence system. Microbiology. 2009;155(Pt 3):733-40.

171

11. Mulepati S, Bailey S. In vitro reconstitution of an Escherichia coli RNA-guided immune system reveals unidirectional, ATP-dependent degradation of DNA target. The Journal of biological chemistry. 2013;288(31):22184-92.

12. Rollins MF, Schuman JT, Paulus K, Bukhari HS, Wiedenheft B. Mechanism of foreign DNA recognition by a CRISPR RNA-guided surveillance complex from Pseudomonas aeruginosa. Nucleic acids research. 2015;43(4):2216-22.

13. Sashital Dipali G, Wiedenheft B, Doudna Jennifer A. Mechanism of Foreign DNA Selection in a Bacterial Adaptive Immune System. Molecular cell. 2012;46(5):606-15.

14. Sinkunas T, Gasiunas G, Waghmare SP, Dickman MJ, Barrangou R, Horvath P, et al. In vitro reconstitution of Cascade-mediated CRISPR immunity in Streptococcus thermophilus. The EMBO journal. 2013;32(3):385-94.

15. Sternberg SH, Redding S, Jinek M, Greene EC, Doudna JA. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature. 2014;507(7490):62-7.

16. van Erp PB, Jackson RN, Carter J, Golden SM, Bailey S, Wiedenheft B. Mechanism of CRISPR-RNA guided recognition of DNA targets in Escherichia coli. Nucleic acids research. 2015.

17. Rutkauskas M, Sinkunas T, Songailiene I, Tikhomirova MS, Siksnys V, Seidel R. Directional R-Loop Formation by the CRISPR-Cas Surveillance Complex Cascade Provides Efficient Off-Target Site Rejection. Cell reports. 2015.

18. Blosser TR, Loeff L, Westra ER, Vlot M, Kunne T, Sobota M, et al. Two distinct DNA binding modes guide dual roles of a CRISPR-Cas protein complex. Molecular cell. 2015;58(1):60-70.

19. Anders C, Niewoehner O, Duerst A, Jinek M. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature. 2014;513(7519):569- 73.

20. Nishimasu H, Cong L, Yan WX, Ran FA, Zetsche B, Li Y, et al. Crystal Structure of Staphylococcus aureus Cas9. Cell. 2015;162(5):1113-26.

21. Hochstrasser ML, Taylor DW, Bhat P, Guegler CK, Sternberg SH, Nogales E, et al. CasA mediates Cas3-catalyzed target degradation during CRISPR RNA- guided interference. Proceedings of the National Academy of Sciences of the United States of America. 2014;111(18):6618-23.

172

22. Fineran PC, Gerritzen MJ, Suarez-Diez M, Kunne T, Boekhorst J, van Hijum SA, et al. Degenerate target sites mediate rapid primed CRISPR adaptation. Proceedings of the National Academy of Sciences of the United States of America. 2014;111(16):E1629-38.

23. Zhao H, Sheng G, Wang J, Wang M, Bunkoczi G, Gong W, et al. Crystal structure of the RNA-guided immune surveillance Cascade complex in Escherichia coli. Nature. 2014;515(7525):147-50.

24. Jackson RN, Golden SM, van Erp PB, Carter J, Westra ER, Brouns SJ, et al. Structural biology. Crystal structure of the CRISPR RNA-guided surveillance complex from Escherichia coli. Science. 2014;345(6203):1473-9.

25. Huo Y, Nam KH, Ding F, Lee H, Wu L, Xiao Y, et al. Structures of CRISPR Cas3 offer mechanistic insights into Cascade-activated DNA unwinding and degradation. Nature structural & molecular biology. 2014;21(9):771-7.

26. Jackson RN, Golden SM, van Erp PB, Carter J, Westra ER, Brouns SJ, et al. Crystal structure of the CRISPR RNA-guided surveillance complex from Escherichia coli. Science. 2014;345(6203):1473-9.

27. Mulepati S, Heroux A, Bailey S. Crystal structure of a CRISPR RNA-guided surveillance complex bound to a ssDNA target. Science. 2014;345(6203):1479- 84.

28. Xue C, Seetharam AS, Musharova O, Severinov K, Brouns SJ, Severin AJ, et al. CRISPR interference and priming varies with individual spacer sequences. Nucleic acids research. 2015.

29. Datsenko KA, Pougach K, Tikhonov A, Wanner BL, Severinov K, Semenova E. Molecular memory of prior infections activates the CRISPR/Cas adaptive bacterial immunity system. Nature communications. 2012;3:945. 30. Redding S, Sternberg SH, Marshall M, Gibb B, Bhat P, Guegler CK, et al. Surveillance and Processing of Foreign DNA by the Escherichia coli CRISPR- Cas System. Cell. 2015.

31. Otwinowski Z, Minor W. Processing of X-ray Diffraction Data Collected in Oscillation Mode Methods in Enzymology. 1997;276(Macromolecular Crystallography, part A):307-26.

32. Emsley P, Cowtan K. Coot: model-building tools for molecular graphics. Acta crystallographica Section D, Biological crystallography. 2004;60(Pt 12 Pt 1):2126-32.

173

33. Adams PD, Afonine PV, Bunkoczi G, Chen VB, Davis IW, Echols N, et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta crystallographica Section D, Biological crystallography. 2010;66(Pt 2):213-21.

34. Schrodinger, LLC. The PyMOL Molecular Graphics System, Version 1.3r1. 2010.

35. Collaborative Computational Project N. The CCP4 suite: programs for protein crystallography. Acta crystallographica Section D, Biological crystallography. 1994;50(Pt 5):760-3.

174

Supplemental Figures

Figure S5.1. Electron density of nucleic acids in Cascade-dsDNA complex structure. 2FO-FC electron density map (1.0 σ). Nucleic acid strands are shown as sticks and colored as previously indicated.

175

Figure S5.2. Comparison between the Cascade-partial R-loop crystal structure and the Cascade-full R-loop EM reconstruction. Rigid body docking of the partial R-loop Cascade crystal structure into the EM reconstruction (EMD-5929) of the full R-loop Cascade illustrates a similar overall conformation, with a correlation value of 0.83.

176

Figure S5.3. Conformational changes in Cascade upon partial R-loop formation. Comparison of free- (PDB: 4TVX; in a darker shade) and partial R-loop bound Cascade (this study, in lighter shade). Arrows indicate the direction of the movement. The C- terminal domain (CTD) of Cse1 pivots 30˚ about a hinge at the Cse1 NTD-CTD interface. The amplified motion at the tip of the Cse1-CTD slides with the Cse2 dimer ~12 Å relative to the Cas7 scaffold, and protrudes upwards into the R-loop. Cse1 NTD remains in the docked position, stabilized by the clamping of Cse1 L1 loop onto the exposed tri-nucleotide motif on crRNA 5’-handle.

177

Figure S5.4. Local conformational rearrangement in Cse1 at the proposed Cas3 binding site. (A) Vector map showing global and local conformational changes in Cse1. Cse1-NTD (in cyan) undergoes moderate movement. Cse1-CTD (in green) swings about a pivoting point (W307) as part of the global conformational changes, The Cse1 NTD/CTD interface (in orange) undergoes significant local conformational rearrangement upon partial R-loop formation. (B) SDS-PAGE of wild type and mutant Cascades used in mutagenesis. All mutants contained stoichiometric amounts of Cse1, except W307A, in which Cse1 failed to assemble. (C) Detailed structure rearrangement at Cse1 NTD/CTD interface, involving amino acids 300-326. Different colors are used to differentiate structural elements (amino acid numbers tabulated to the right side). Partial R-loop and free-Cascade structures are rendered in solid and semi-transparent cartoons, respectively. (D) W307 from Cse1-NTD rotates inside a hydrophobic socket at Cse1-CTD during the conformational change (free-Cascade-CTD rendered in a darker shade of green). Disruption of this interaction (W307A) caused Cse1 dissociation from Cascade. (E) Docking of our structure into the cryo-EM reconstruction of the crosslinked Cascade- dsDNA-Cas3 complex (EMD-5930). Cas3 density and the location of the local rearrangement in Cse1 are circled to highlight their proximity.

178

Figure S5.5. Guided entry of dsDNA into Cascade. A series of positively charged residues from Cas7.5, Cas7.6, Cas5e and Cse1 (dark blue surfaces) guide the dsDNA into Cascade and towards PAM recognition elements in Cse1-NTD.

179

Figure S5.6. Influence of PAM-1 base-pair composition on Cascade binding affinity and Cas3 cleavage efficiency. (A) Inosine substitution of GNT-1 led to a mild 2-fold binding defect, suggesting the N2 amine of GNT-1 is a minor determinant of specificity. (B) Affinity of E. coli Cascade for all four base-pair compositions at PAM-1. Kd for 5’- ATG, ATA, ATC, and ATT PAMs were measured at ~10, 40, 100, and 100 nM, respectively. (C) Tighter Cascade binding correlated well Cas3 cleavage efficiency (5’- ATG>ATA>>ATT≈ ATC). Cascade concentration above the Kd led to efficient Cas3 cleavage. Experiments were done in triplicate and representative results are shown.

180

181

Figure S5.7. Conservation of PAM recognition elements. (A) Sequence alignment of Cse1 proteins with known structures. Identical residues are highlighted in red, conserved residues in red text. DNA-contacting residues in the Cse1-NTD and CTD are marked by cyan and green asterisks, respectively. The glycine loop and glutamine wedge are in blue and orange boxes, respectively. Residues mediating the W307 ball-and-socket interaction are marked with black asterisks. (B) Structural alignment of Cse1 PAM recognition elements. Apo Cse1 from T. thermophillus (PDB: 4AN8), A. ferrooxidans (PDB: 4H3T) and T. fusca (PDB: 3WVO) were superimposed with the Cse1 in our partial R-loop forming E. coli Cascade structure. Despite sequence variation, a glycine-rich loop is present in each Cse1 structure, and likely plays a similar function to recognize PAM from the minor groove (left inset). The glutamine-wedge protrusion is highly conserved in 3-D. Each wedge features a long side chain at the tip (right inset), which likely stacks underneath PAM in a similar fashion. 182

Figure S5.8. Rationalization of PAM-2 and PAM-3 specificity using nucleotide substitution and modeling. (A) Modeling of alternative base-pairs at PAM-2 suggesting that only the N2 amine of GT-2 would cause steric clashes with Cα of G160 (lower right quadrant), this amine therefore may serve as the anti-determinant for the rejection of GT- CNT at PAM-2. (B) EMSAs demonstrating that removal of this amine by inosine substitution rescues the Cascade binding defect. (C) Whereas K268A contained reduced affinity for the correct PAM, it still possesses very strong discrimination against GT-CNT at PAM-3 (5’-CTG), suggesting the further presence of a mechanism to reject GT-3. (D) Inosine substitution of GT−3 restored the Cascade binding Kd to ~40 nM, leading to the conclusion that the N2 amine of GT−3 is a minor determinant of specificity. Experiments were done in triplicate and representative results are shown.

183

Supplemental Tables

Table S5.1. Oligonucleotides used in this study.

184

Table S5.2. Data collection and refinement statistics.

185

CHAPTER SIX

CONFORMATIONAL DYNAMICS OF DNA BINDING AND CAS3 RECRUITMENT

BY THE CRISPR RNA-GUIDED CASCADE COMPLEX

Contribution of Authors and Co-Authors

Manuscript in Chapter 6

Author: Paul B.G. van Erp

Contributions: designed the research, performed experiments, analyzed the data, wrote the manuscript

Author: Angela Patterson

Contributions: performed experiments, analyzed the data, wrote the manuscript

Co-Author: Ravi Kant

Contributions: designed the research, performed experiments, analyzed the data

Co-Author: Luke Berry

Contributions: analyzed the data, wrote the manuscript

Co-Author: Sarah M. Golden

Contributions: performed experiments

Co-Author: Brittney L. Forsman

Contributions: performed experiments

Co-Author: Joshua Carter

Contributions: analyzed the data

Co-Author: Ryan N. Jackson

Contributions: analyzed the data 186

Co-Author: Brian Bothner

Contributions: designed the research, analyzed the data, wrote the manuscript

Co-Author: Blake Wiedenheft

Contributions: designed the research, analyzed the data, wrote the manuscript

187

Manuscript Information

Paul B.G. van Erp, Angela Patterson, Ravi Kant, Luke Berry, Sarah M. Golden, Brittney L. Forsman, Joshua Carter, Ryan N. Jackson, Brian Bothner, and Blake Wiedenheft

ACS Chemical Biology

Status of Manuscript: ____ Prepared for submission to a peer-reviewed journal ____ Officially submitted to a peer-reviewed journal ____ Accepted by a peer-reviewed journal X Published in a peer-reviewed journal

ACS Publications Volume 13 Issue 2 16 Oct 2017 Pages 481-490 DOI: 10.1021/acschembio.7b00649

188

CONFORMATIONAL DYNAMICS OF DNA BINDING AND CAS3 RECRUITMENT

BY THE CRISPR RNA-GUIDE CASCADE COMPLEX

Paul B.G. van Erp1,§, Angela Patterson2,§, Ravi Kant2, Luke Berry2, Sarah M.

Golden1, Brittney L. Forsman1, Joshua Carter1, Ryan N. Jackson3, Brian Bothner2,*, and

Blake Wiedenheft1,*

1Department of Microbiology and Immunology, 2Department of Chemistry and

Biochemistry, Montana State University, Bozeman, MT 59717, USA, 3Department of

Chemistry and Biochemistry, Utah State University, Logan UT 84322, USA.

§These authors contributed equally. *These authors jointly supervised this work.

189

Abstract

Bacteria and archaea rely on CRISPR (clustered regularly interspaced short palindromic repeats) RNA-guided adaptive immune systems for sequence specific elimination of foreign nucleic acids. In Escherichia coli, short CRISPR-derived RNAs

(crRNAs) assemble with Cas (CRISPR-associated) proteins into a 405-kilodalton multi- subunit surveillance complex called Cascade (CRISPR-associated complex for antiviral defense). Cascade binds foreign DNA complementary to the crRNA guide and recruits

Cas3, a trans-acting nuclease-helicase required for target degradation. Structural models of

Cascade have captured static snapshots of the complex in distinct conformational states, but conformational dynamics of the 11-subunit surveillance complex have not been measured. Here we use hydrogen-deuterium exchange coupled to mass spectrometry

(HDX-MS) to map conformational dynamics of Cascade onto the three-dimensional structure. New insights from structural dynamics are used to make functional predictions about the mechanisms of the R-loop coordination and Cas3 recruitment. We test these predictions in vivo and in vitro. Collectively, we show how mapping conformational dynamics onto static 3D-structures adds an additional dimension to the functional understanding of this biological machine.

190

Introduction

The adaptive immune system in Escherichia coli (type I-E) consists of eight cas genes flanked by a CRISPR locus (Figure 6.1A). Five of the cas genes encode proteins

(Cse11, Cse22, Cas76, Cas51, Cas6e1) that assemble with a 61-nucleotide CRISPR- derived RNA (crRNA) into a seahorse-shaped complex called Cascade (CRISPR- associated complex for antiviral defense), with subunits that resemble a head, backbone, belly, and tail (Figure 6.1B) (1-4). Cascade is a crRNA-guided surveillance complex that identifies foreign DNA and recruits a trans-acting nuclease-helicase (i.e. Cas3) for target degradation (5-7).

Target DNA recognition by Cascade relies on protein-mediated recognition of a three-nucleotide PAM (protostpacer adjacent motif) and a protospacer sequence that is complementary to the crRNA-guide (1, 8-10). Structures of E. coli Cascade before binding

DNA (3, 4, 11), bound to ssDNA (12), bound to a partially duplexed dsDNA target (8), and recent structures of Cascade from Thermobifida fusca bound to a dsDNA target (13), provide snapshots of the complex poised at different stages of antiviral defense.

Collectively, these structures explain how DNA binding triggers a conformational rearrangement in which the two belly subunits (i.e. Cse2) slide along the ridged Cas7 backbone via a molecular relay that relies on a series of conserved arginine residues (11).

One of the Cse2 subunits abuts the C-terminal domain (CTD) of Cse1 (also called Cas8e), and the DNA-induced transition of Cse2 is thought to shove the CTD of Cse1 between the two strands of the DNA duplex (3, 4, 8, 12). While the CTD of Cse1 appears to function like a molecular pry bar that helps separate the two strands of the DNA duplex, three 191 structural features (i.e. lysine finger, glycine rich loop and glutamine wedge) in the N- terminal domain of Cse1 are involved in PAM recognition (8). PAM recognition relies on sequence- and shaped-based recognition of the minor groove, resulting in a diverse combination of different PAMs that are recognized with varying affinities (7, 8, 14, 15).

While a variety of PAMs support target binding, different PAM and protospacer combinations have been shown to elicit distinct biological outcomes (14, 16, 17). Binding to a limited number of PAMs elicits an “interference response”, which is characterized by recruitment of a nuclease active form of Cas3 that processively degrades the target (7, 17-

20). Alternatively, some PAMs or mutations at specific positions in the protospacer, reduce interference to levels below what is required for clearance of plasmids or virus elimination, and trigger a “priming response.” Priming is a positive feedback mechanism characterized by the rapid integration of new fragments of foreign DNA into the CRISPR locus, which are used to restore immunity (14, 16, 17, 21). Priming relies on all the of the CRISPR machinery (Cascade, Cas1, Cas2, and Cas3) in addition to a partial complementary target.

Recently, Förster resonance energy transfer (FRET) studies have been used to monitor the conformational state of Cascade in association with targets that elicit priming or interference. Xue et al. used structures of Cascade to position FRET dyes at specific locations in the tail of Cascade to monitor the conformational state of Cse1 in response to binding DNA targets that elicit priming or interference (19, 22). These studies demonstrated that Cascade adopts distinct conformational states based on sequence characteristics of the dsDNA target and that the conformational state of the Cse1 tail subunit correlates with the relative rates of interference and priming. While structure 192 guided placement of FRET dyes provides a powerful means for measuring distances between two selected landmarks, this approach cannot provide a global understanding of the complex conformational dynamics of a large multi-subunit machine like Cascade.

Rather than examining movements of specific locations on Cascade, here we use hydrogen-deuterium exchange coupled to mass spectrometry (HDX-MS) to investigate protein interactions and dynamics of Cascade before and after DNA binding. HDX measures the stability of hydrogen bond networks and solvent accessibility on global and local scales (23, 24) , enabling a mechanistic analysis of coordinated changes associated with target binding. To our knowledge, this is the first application of HDX-MS to a ribonucleoprotein of this complexity (i.e., 5 unique proteins totaling >1,450 amino acid residues). Peptide-resolution HDX-MS experiments reveal residues predicted to be involved in binding the displaced DNA strand (i.e. R-loop) and regions of the Cse1 subunit that are part of the Cas3 binding site. We test the function of these residues using in vivo and in vitro assays and show that HDX-guided mutations at specific locations eliminate

Cas3-mediated DNA degradation without altering Cascade’s ability to bind dsDNA.

Results

Structural studies have captured snapshots of Cascade before and after target binding, but these static structures provide limited information about the conformational dynamics of the complex, which are important for function (3, 4, 8, 12). To measure global changes in the stability and dynamics of Cascade upon target binding, we used intact protein HDX-MS to measure HD-exchange in all 11-protein subunits of Cascade, both before and after binding DNA. We diluted unbound Cascade and Cascade bound to a 72- 193 dsDNA target containing a 3’-TTC-5’ PAM into deuterated buffer and then analyzed both complexes by Liquid Chromatography Mass Spectrometry (LCMS) over 15 time points (Figure 6.1 and Supplementary Dataset 1). At each time point, dsDNA bound

Cascade exhibited less deuterium uptake than the unbound form, though the difference was less pronounced at later time points. Overall, the intact protein HDX-MS revealed a net stabilization of each protein subunit in the complex upon dsDNA binding (Figure 6.1C).

This stabilization was further confirmed using differential scanning fluorimetry, which showed that Cascade bound to a dsDNA target has a melting temperature that is approximately 10°C above the melting temperature of unbound Cascade (Figure S6.1).

Having established that DNA binding globally decreases deuterium uptake and increases stability, we set out to map these changes to specific regions of the complex. To accomplish this, we digested Cascade and dsDNA-bound Cascade with pepsin (i.e., a low specificity protease), prior to LCMS. Deuterium uptake curves were generated by quenching the HDX reaction after 0, 0.16, 0.5, 2.5, 5.0, 10, 30 and 180 minutes. 255 peptides, representing more than 90% of all Cascade protein sequences, were analyzed

(Figure S6.2). Parent spectra (MS1) were used to determine the number of deuterons incorporated per peptide, and the difference in deuteron incorporation between peptides isolated from dsDNA bound and unbound states of Cascade were used to calculate changes in HD-exchange (Supplementary Dataset 1). Overall, 185 of the 255 peptides showed increased deuterium incorporation over the time course in either or both complexes, while

70 peptides did not. After binding dsDNA, 22 of the 255 peptides increased deuterium 194 incorporation, while 119 peptides had decreased deuterium incorporation compared to the unbound complex.

Figure 6.1. Binding of dsDNA increases the stability of Cascade. (A) The type 1-E CRISPR system of E. coli consists of eight cas genes and a CRISPR locus. Five cas genes (colored) code for proteins that assemble with a crRNA into the Cascade complex. (B) Schematic representation of three stages of Cascade during the interference process. Subunits colored according to Figure 6.1A. (C) Deuterium uptake curves for the individual subunits of Cascade. Error bars show the standard deviation (n=3) and are smaller than the circular symbols used to plot the data. Both the unbound and dsDNA bound Cascade complexes were incubated in deuterated buffer for 0.5 to 77.5 minutes. Colors of the data points on the uptake curves correspond with the subunits of Cascade as colored in Figure 6.1A. Differences in deuterium uptake were determined by subtracting deuterium uptake of the unbound complex from the dsDNA bound complex at 77.5 minutes. The percentage change in uptake was mapped onto the dsDNA bound structure of Cascade (5H9F). Increasing blue hue indicates greater protection (i.e., less deuterium uptake) in the dsDNA bound form. Overall, the dsDNA bound form of Cascade takes up less deuterium and is more protected.

HD-exchange profiles are not uniformly distributed across the complex or within an individual subunit. By analyzing the distribution of deuterium uptake over the time course, changes in the stability and dynamics of hydrogen bond networks upon dsDNA binding can be determined. Early time points report on changes in solvent accessibility 195 upon dsDNA binding, whereas later time points provide information on the stability of hydrogen bonding networks and the participation of peptide backbone amides in protein secondary structures that are protected from deuterium exchange because of noncovalent interactions (25). Regions of the complex that displayed significant protection upon dsDNA binding (i.e. decreased deuterium exchange) were investigated as potential DNA binding sites (Figure 6.2A, blue). We also analyzed peptides that assimilate more deuterium (i.e. become less protected) after binding DNA (Figure 6.2A, red). An increase in exchange indicates greater solvent exposure or dynamics. These regions are anticipated to have functional roles, such as docking sites for the nuclease-helicase Cas3.

DNA binding

To validate the HDX data we examined regions of the complex known to interact with DNA. The expectation is that DNA binding will stabilize the peptide backbone and physically occlude these regions from HD-exchange. Recognition of the PAM sequence takes place in the tail of the complex and involves three structural elements in the Cse1 subunit: lysine finger, glycine loop, and glutamine wedge (8). The glutamine wedge (Cse1 amino acids 352-358) forms a beta-hairpin that is positioned (i.e. wedged) between the complementary and non-complementary strands of the DNA target (Figure 6.2). In the dsDNA bound complex, a peptide corresponding to this feature (amino acids 345-358) exchanges less deuterium over time, than the same peptide in the unbound complex (Figure

6.2B). This indicates that the glutamine wedge is protected from exchange and is less dynamic when Cascade is bound to DNA. Similarly, a peptide covering the PAM sensing lysine finger (Cse1 amino acid 268) also exchanges less deuterium upon dsDNA binding 196

(Figure S6.3). Results for the glutamine wedge and lysine finger are consistent with the structural models, however the glycine loop (Cse1 amino acids 157-161) showed no increase in protection upon dsDNA binding, which was surprising since the crystal structure indicates that this feature interacts with the minor groove of the double-stranded

PAM (Figure S6.3). This suggests, that in solution, the glycine loop may be more dynamic after DNA binding than is indicated by the crystal structure.

The Cse1 subunit is critical for binding DNA and the conformational state of this subunit has been implicated in recruitment and allosteric regulation of Cas3 (7, 19, 26, 27).

In structures of Cascade without DNA or bound to dsDNA, the Cse1 subunit is tethered to

Cas5 via a short loop (L1, amino acids 125-131) on Cse1 that is buried in a pore on Cas5

(3). However, in the structure of Cascade bound to a complementary ssDNA target, the N- terminal domain (NTD) of Cse1 comes “unhitched” from the complex and L1 loop is disordered in the structure (12). Collectively, these structures demonstrate that the NTD of

Cse1 adopts conformationally distinct positions and recent FRET experiments by Xue et al. suggest that NTD of Cse1 is conformationally dynamic (19). While the structures of

Cascade without DNA suggest that Cse1 is in a “closed” conformation where L1 is buried in the Cas5 pore, our HDX data reveals that the L1 peptide of Cse1 (amino acids 118-129) is significantly more protected (less HD-exchange) in the dsDNA bound complex, as compared to the unbound complex (Figure 6.2B and Figure S6.3). These data suggest that the Cse1 subunit is dynamic in the absence of DNA, sampling both the “open” and “closed” conformations. 197

Figure 6.2. Peptide-resolution mapping of deuterium uptake reveals areas of Cascade that become more or less protected upon dsDNA binding. (A) The percentage difference in deuterium uptake between the unbound and dsDNA bound complex for individual peptides at the 180-minute time point was mapped onto the dsDNA bound structure (5H9F). Blue indicates less deuterium incorporation in the dsDNA bound form (protection) while the red indicates more deuterium incorporation in the dsDNA bound form (de-protection). White indicates regions where there is no difference in deuterium uptake or peptides where no data is available. The inset shows the ribbon structure of the Cse1 subunit with the glutamine wedge and Loop 1 showing increased protection. The Cse1 subunit consists of a four-helix bundle (CTD) and globular domain (NTD). (B) Deuterium uptake curves show increased protection in the dsDNA bound form for peptides in Loop 1 (p-value<0.01 for all time points) and the glutamine wedge (p-value<0.01 for all time points after 2.5 minutes). Error bars represent the standard deviation of three replicates.

Target recognition by Cascade relies on both specific- and non-sequence specific interactions with DNA. Recently, we showed that a lysine-rich helix (amino acid residues 198

137-144) on the fingers domain of two adjacent Cas7 subunits (i.e. Cas7.5 and Cas7.6) creates a lysine-rich vise that makes non-sequence specific interactions with dsDNA and that these interactions are essential for target binding (11). We predicted that dsDNA would result in less HD-exchange for peptides corresponding to the lysine-rich vise. To test this prediction, we examined a peptide covering the lysine-rich helices of Cas7 (amino acid residues 134-148). This peptide showed a small degree of protection upon dsDNA binding

(Figure S6.3 and S6.4). However, there are six copies of the Cas7 subunit in the surveillance complex and only two of these subunits (Cas7.5 and Cas7.6) interact with the phosphate backbone of the incoming DNA (11). The Cas7 peptide corresponding to the lysine-rich vise (residues 134-148) should have a different exchange rates depending on their position in the complex. Peptides from Cas7.5 and Cas7.6 should have less HD- exchange than the same peptides form Cas7.1-7.4, and this difference should result in a bimodal distribution of these peptides. In fact, we observe a bimodal behavior, which is consistent with a subset of the Cas7 subunits having increased protection upon DNA binding (Figure S6.4).

Path of the Displaced DNA Strand

One of the outstanding questions about the mechanism of target binding is the location of the displaced DNA strand. Once Cascade binds to a dsDNA target, the complementary strand of the DNA target base pairs with the crRNA, while the non- complementary strand is displaced and forms an R-loop (2, 28). Structures of E. coli

Cascade bound to dsDNA showed 10 to 12 nucleotides of the displaced DNA strand, but it remains unclear if this fragment reflects the path of the displaced strand when the 199 complex binds to a full-length target duplex (8). In the structure, the partial displaced strand is stabilized by interactions with residues on the Cse1 subunit of Cascade and we expected that the remaining part of the displaced strand would interact with specific residues on Cas subunits as it traversed form the PAM proximal tail to the PAM distal head. These stabilizing interactions would shield specific amino acid residues from HD-exchange. To trace the path of the displaced strand, we used our HDX-MS data to identify peptides that show greater protection from exchange in the dsDNA bound form (Figure 6.3A).

Analysis of the tail of Cascade revealed a peptide in the C-terminal domain (CTD) of Cse1 (amino acid residues 381-398) with significant protection (p<0.01) in the dsDNA bound form of the complex compared to the unbound complex (Figure 6.3A and Figure

S6.3) (24). This is consistent with the path of the displaced strand in the dsDNA bound structure from Hayes et al (8). However, to clarify the mechanism of protection at this location, we conducted an additional comparison by performing HDX-MS on Cascade bound to ssDNA (Supplemental Dataset 1). Comparison of the dsDNA and ssDNA bound complexes reveal regions of protection specifically resulting from the presence of the displaced strand. Importantly, this comparison showed equivalent levels of protection in the ssDNA bound and dsDNA bound form of Cascade, indicating that the increase in protection observed in the dsDNA bound structure is not necessarily due to direct interactions with the displaced strand, but likely an allosteric stabilization of the hydrogen- bonding network across this peptide. This result is consistent with a recent structure of T. fusca Cascade bound to a complete dsDNA target. In this structure, the R-loop is disordered in the region of the Cse1-CTD (13). 200

Figure 6.3. Using HDX to identify residues involved in binding the displaced DNA strand. (A) Differences in deuterium uptake between the unbound and dsDNA bound complex for individual peptides at the 180-minute time point are mapped onto the dsDNA bound structure. Only peptides that are more protected in the dsDNA bound complex (blue) are shown. The proposed path of the displaced strand is indicated (green). (B) Deuterium incorporation for peptides Cse2 6-27 and Cas7 296-311 at the 180-minute time point is displayed for Cascade without DNA (teal), ssDNA bound (green), and dsDNA bound complexes (red). The Cse2 peptide shows increased protection in the dsDNA bound complex, as compared to the unbound or the ssDNA bound complex (p<0.01). The Cas7 peptide shows the most protection in the dsDNA bound complex. Error bars represent the standard deviation of three replicates. (C) HDX-guided mutation of positively charged residues on Cse2 (R12E), Cse2 (R53D, K77D, K78D, R89D, R135D, R143D and K158D, named Cse2 charge swap), and Cas7 (K296E, K299E, K301E) were tested in a plasmid- curing assay to determine the biological impact of mutating these residues. The Cas7 mutant did not have a significant impact while the Cse2 mutants displayed a small, but significant defect. The single mutation of Cse2 R12E had an equal impact as mutation of seven Cse2 residues. The error bars represent the standard error of the mean of three replicates.

Once the displaced strand leaves the four-helix bundle, it must travel up the complex where it is expected to form interactions with the belly or backbone subunits of the complex. The HDX data revealed two distinct areas of increased protection that may accommodate the displaced strand (Figure 6.3A). The upper route travels across the top of the Cse2 subunits and includes a peptide that shows strong protection (residues 6-27). The 201 lower path of protection is formed by peptides covering residues 296-311 of the Cas7 subunits. This lower route is part of the basic cleft that is formed by the Cse2 and Cas7 subunits and has previously been hypothesized to accommodate the displaced strand (3, 8,

12). Interestingly, the Cse2 peptide corresponding to the “upper path” (amino acid residues

6-27) showed comparable levels of exchange in the unbound and ssDNA bound forms, while it was more protected from exchange in the dsDNA bound form (p<0.001), suggesting that this peptide may interact with the displaced strand (Figure 6.3B and Figure

S6.3). Conversely, the Cas7 peptide corresponding to the “lower path” (amino acid residues

296-311) showed comparable levels of exchange in both the ssDNA and dsDNA bound states (p>0.05), suggesting that the enhanced stability measured by HDX upon dsDNA binding is not a result of stable interactions with the displaced strand (Figure 6.3B and

Figure S6.3). To test the role of these residues in stabilizing the displaced strand we mutated arginine 12 on Cse2 to glutamic acid (Cse2 R12E) and three lysine residues on

Cas7 to glutamic acids (Cas7 K296E, K299E, and K301E). We then tested the impact of these mutations using a plasmid-curing assay that measures the in vivo efficiency of

Cascade/Cas3 to detect and degrade a target plasmid (Figure 6.3C) (11, 29). The Cse2 mutant (i.e. Cse2 R12E) displayed small, but reproducible defects, while the Cas7 mutant did not result in a measurable defect. We compared the HDX-guided mutants to a mutant we previously made based on electrostatic surface potential, where we mutated seven positively charged residues on Cse2 that line the basic cleft (Cse2 R53D, K77D, K78D,

R89D, R135D, R143D and K158D, named Cse2 charge swap) (Figure 6.3C) (3, 12). The charge swap of seven Cse2 residues resulted in a small defect comparable to the mutation 202 of the single arginine 12 on Cse2. Collectively this suggests that precise coordination of the R-loop, may not be critical for immune system function.

Recruitment of the nuclease-helicase Cas3.

DNA binding triggers a conformational change in Cascade that recruits Cas3 to the

Cse1 subunit (3, 4, 6, 8, 11, 30). We hypothesized that dsDNA binding exposes a Cas3 docking site on Cascade and that these residues would display greater HD-exchange after

Cascade binds dsDNA. To test this hypothesis, we compared HD-exchange before and after dsDNA binding and identified a large beta-hairpin, termed Loop 3 (L3, amino acids

285-296) on Cse1 that is more flexible after DNA binding (Figure 6.4A). The increased exchange in L3 coincides with movement of a short loop called L4 (amino acids 316-326)

(Figure S6.5 and S6.6). L4 is positioned next to L3 in the unbound and ssDNA bound

Cascade structures, but moves 11 Å away from L4 upon dsDNA binding (Figure S6.6A).

To assess the functional importance of L3 and L4 we compared the Cse1 protein structure from E. coli to homologous protein structures from Thermobifida fusca, Thermus thermophiles, and Acidimicrobium ferrooxidans (8, 11, 13, 31-33). While there is no detectable amino acid sequence conservation, L3 and L4 are conserved structural features

(Figure S6.5 and S6.6). Based on structural conservation and dynamic response to dsDNA binding, we hypothesized that L3 and L4 may participate in Cas3 recruitment. To test this hypothesis, we deleted L3 (L285-K296, ΔL3) and performed plasmid-curing assays.

Deletion of L3 caused a plasmid-curing defect, indicating that L3 has an important functional role in DNA binding, Cas3 recruitment, or both (Figure 6.4B). To clarify the function of L3, we expressed and purified the Cse1ΔL3 deletion mutant of Cascade. Size 203 exclusion chromatography and SDS-PAGE gel electrophoresis confirmed that the complex assembles and purifies like wild type Cascade (Figure S6.7A). However, Electrophoretic

Mobility Shift Assays (EMSAs) revealed that the L3 deletion results in a DNA binding defect (KD= 135.2 nM) compared to wild type Cascade (KD= 1.5 nM) (Figure 6.4C and

Figure S6.8), which indicates that defects in DNA binding are responsible for the immune defect observed in the plasmid-curing assays. To further investigate the function of L3, we focused on two conserved lysine residues (Cse1 K289 and K290) at the apex of L3. Alanine substitution of these two lysine residues (Cse1 K289A/K290A) resulted in a similar plasmid-curing defect as deletion of the entire 11 residue loop (Figure 6.4B). Like the ΔL3 mutant, the Cse1 K289A/K290A mutant expressed and purified similar to wild type (Figure

S6.7B). However, the Cse1 K289A/K290A mutation does not impair DNA binding (KD=

1.8 nM) (Figure 6.4C and Figure S6.8). This suggested that the lysine residues might be involved in Cas3 recruitment or activation. To determine the role of the lysine residues, we performed an in vitro cleavage assay in which Cse1 K289A/K290A Cascade was pre- bound to dsDNA followed by the addition of Cas3. However, in contrast to our prediction,

Cascade with the Cse1 K289A/K290A mutation results in a very modest DNA degradation defect (Figure 6.4D). We speculate that the discrepancy between the in vivo plasmid-curing assay (defect) and the in vitro DNA degradation assay (no defect) might be explained by high Cas3 concentrations (300 nM) and low of sensitivity in our DNA degradation assay.

The mechanistic basis of this distinction requires further investigation, but the result illustrates the importance of performing both in vivo and in vitro experimentation. 204

Figure 6.4. Predicting the Cas3 interaction site using HDX. (A) Ribbon representation of Cse1 colored according to the percentage difference in deuterium uptake of the dsDNA bound complex minus the unbound form of the Cascade complex at the 180-minute time point. Areas in red are more available for HDX after dsDNA binding. dsDNA is colored green. (B) Residues in three of the peptides (L3, H1 and H2) that exchange more deuterium after binding dsDNA and one peptide that exchanges less deuterium (H3) after binding dsDNA, were targeted for mutagenesis and tested using plasmid-curing assays. Deletion of Loop 3 (Cse1 ΔL285-K296, denoted ΔL3) or mutation of the lysine residues present at the tip of Loop 3 (Cse1 K289A/K290A, denoted KK) result in large immune system defects. Mutation of alpha helices H1 (Cse1 N379A/E380K), H2 (Cse1 N332A, Q333A, R335E) and H3 (Cse1 R194E/K197E) resulted in immune system defects for H1 and H3. Mutants showing defects were selected for further study. The error bars represent the standard error of the mean of three replicates. (C) Measurement of the binding affinity for 205

72 bp dsDNA containing a protospacer and interference PAM (3’-TAC-5’ or 3’-TTC-5’) using Electrophoretic Mobility Shift Assays (EMSAs). Only the deletion of Loop 3 resulted in a major binding defect. Error bars represent the standard deviation of three replicates. (D) Denaturing polyacrylamide gels showing time courses of Cascade/Cas3 mediated DNA degradation. 72 bp dsDNA containing P7 protospacer and 3’-TTC-5’ PAM was pre- bound to wild type Cascade or the Cascade mutants and incubated for various lengths of time with Cas3. Wild type Cascade facilitates destruction of the DNA by Cas3. The H1 mutant severely reduces the ability of Cas3 to degrade the DNA, indicating that this mutant blocks Cas3 recruitment or activation. The percentage of DNA degraded after 60 minutes is indicated with the error representing the standard deviation of three replicates.

In addition to L3, we identified other surface exposed peptides on Cse1 with changes in their HDX profile upon DNA binding. We made mutations in three distinct locations that cluster on an exposed surface of Cse1 that is anticipated to be involved in

Cas3 recruitment (Figure 6.4A). Two alpha-helices (H1 and H2) that showed increased exchange upon dsDNA binding and one alpha helix (H3) with a slight decrease in HD- exchange were selected for mutational analysis. Polar residues with uncharged side chains were mutated to alanine residues, and residues with charged side chains were swapped for the opposite charge. The H1 mutation (Cse1 N379A/E380K) and the H3 mutation (Cse1

R194E/K197E) both result in pronounced defects in the plasmid-curing assay, while the

H2 mutation (Cse1 N332A, Q333A and R335E) results in a modest immune system defect

(Figure 6.4B). To clarify the functional importance of H1 and H3 we expressed and purified these two mutants. The expression levels and stoichiometry of the purified mutants were comparable to wild type Cascade (Figure S6.7C). In vitro binding assays revealed that the H1 mutant binds dsDNA with wild type affinity (KD=0.86 nM), while the H3 mutant has a small defect (KD=4.87 nM) (Figure 6.4C and Figure S6.8). To test the function of these residues in Cascade-mediated recruitment of Cas3, we performed DNA cleavage assays. The H1 mutant binds DNA like wild type Cascade, but does not facilitate Cas3- 206 mediated DNA degradation, while the H3 mutant, which has a modest binding defect, degrades DNA similar to wild type (Figure 6.4D). A high concentration of Cascade (300 nM) was used in the DNA cleavage assays to compensate for the small DNA binding defect observed. These data indicate that residues in H1 are critical for Cas3 recruitment or activation, while the plasmid-curing defect of the H3 mutant is caused by the small DNA binding defect.

Discussion

Structures of Cascade have captured the complex in several distinct conformational states (3, 4, 6, 8, 11, 12, 34). Collectively, these structural models reveal a series of conformational changes that coincide with DNA recognition and Cas3 recruitment. Here, we add an additional dimension to this work by performing HDX-MS on Cascade before and after target recognition. HDX-MS analysis reveals specific regions of the complex that have more or less HD-exchange after binding DNA. Mapping this data onto 3D-structures of Cascade highlights regions of functional importance that have been previously identified, as well as functionally important regions that do not stand out in static structures.

The lysine-rich vise of Cas7 and the PAM sensing domain of Cse1 are critical for

DNA binding (8, 11). Slow deuteration of a peptide is indicative of low solvent exposure or extensive hydrogen bonding and we measure significant decreases in deuterium incorporation after dsDNA binding for peptides that cover the lysine-rich vise, and two the

PAM sensing elements (i.e., lysine finger and glutamine wedge). However, the glycine- rich loop (amino acids 157-161), which has previously been shown to intercalate in the minor groove of the PAM duplex is not protected from HD-exchange in the dsDNA bound 207 complex (Figure S6.3). These results suggest that the glycine-rich loop may be more dynamic after binding than indicated by the crystal structure.

While the mechanism of PAM sensing and directional hybridization of the crRNA guide to the DNA target is relatively well understood, the location of the displayed strand has not been determined for the E. coli Cascade complex. Based on the structure of E. coli

Cascade bound to a partially duplexed DNA target, we expected the displaced strand to traverse over the C-terminal four-helix bundle of Cse1 and continue along a basic cleft formed by the Cse2 and Cas7 subunits (Figure 6.3) (12). Indeed, peptides covering the C- terminal four-helix bundle of Cse1 (amino acids 381-398) incorporate significantly less deuterium after dsDNA binding. However, this same peptide is also protected (i.e. less deuterium incorporation) in the ssDNA bound complex, suggesting that the lack of deuterium uptake is due to changes in hydrogen bonding networks, rather than direct interactions with the displaced strand. This conclusion is consistent with the recent structure of T. fusca Cascade bound to a dsDNA target, which shows no density for the displaced DNA strand in this region (13).

Following the displaced strand beyond the four-helix bundle we predicted that the remaining portion of the R-loop would be coordinated by residues that form a positively charged cleft between Cse2 and Cas7 (12). To test the role of this cleft in DNA binding, we flipped the charge on seven positively charged residues on Cse2 (i.e. R53D, K77D,

K78D, R89D, R135D, R143D and K158D) and three positively charged residues on Cas7

(i.e. K296E, K299E and K301E). Remarkably, neither of these mutants resulted in a pronounced immune defect (Figure 6.3C). Using the HD-exchange data, rather than 208 focusing strictly on electrostatic surface potential, we identified a peptide on Cse2 (amino acids 6-27) that exchanges significantly (p<0.001) less deuterium upon binding of dsDNA.

Mutation of a single arginine (Cse2 R12) resulted in a small, but reproducible immune defect, which is consistent with a minor role for this peptide in stabilizing the displaced strand. In support of this conclusion, the recent T. fusca Cascade structure reveals an equivalently positioned arginine (R16) that is located ~4 Å from the phosphate backbone of the displaced strand (13). While we initially expected to identify an extensive network of non-sequence specific, protein-mediated interactions involved in R-loop stabilization, our results and recent structures of the Cascade complex from T. fusca reveal a poorly ordered displaced strand that makes few direct interactions with the Cas proteins.

Collectively this suggests that precise coordination of the R-loop at positions distal to the

PAM, may not be critical for immune system function.

In addition to DNA binding, Cse1 also plays a critical role in Cas3 recruitment (6,

7, 20). We hypothesized that regions of Cse1 that incorporate more deuterium after DNA binding, may be involved in Cas3 recruitment. To test this hypothesis, we designed HDX- guided mutations in L3, H1, and H2 regions of Cse1 (Figure 6.4). Mutations in L3 and H1 result in significant immune defects, while the H2 mutation has no measurable phenotype.

We expected that the increase in HD-exchange would be indicative of regions involved in

Cas3 recruitment; however, only the H1 mutant blocks Cas3-mediated DNA degradation, suggesting that this may be a docking site for Cas3.

Conformational dynamics are critical to the function of any molecular machine.

Magnetic tweezers have been used to measure the dynamics of torque-dependent R-loop 209 formation and DNA dissociation for Cascade (28, 35). These experiments reveal conformational changes in the DNA target and recent FRET studies have been used to identify distinct conformational states of Cascade that are controlled by the sequence of the

DNA target (19, 22, 26). FRET has been used to show that binding to DNA targets containing specific PAM or protospacer mutations correspond with a rotation of the N- terminal domain in Cse1. This conformation is referred to as the “open” state, which does not effectively recruit nuclease-active Cas3, but recruits a nuclease-repressed form of Cas3, in the presence of Cas1 and Cas2 (18). Collectively these studies suggest that the conformational state of Cascade is determined by the sequence of the DNA ligand, and that these conformations control the immune response by regulating the activity of Cas3 (3, 18,

19).

While structure guided placement of FRET dyes provides a powerful means for measuring distances between two selected landmarks, this approach cannot provide a global understanding of the complex conformational dynamics of a large multi-subunit machine like Cascade. Here we demonstrate how the high-spatial resolution of FRET is complemented by the peptide-resolution of HDX-MS. In this paper, we validate the method by comparing our results to previous insights (e.g. PAM binding residues) and provide new information about the mechanism of DNA binding (e.g. coordination of the displaced strand), and Cas3 recruitment (e.g. role of H1). Future experiments performed using HDX-

MS should focus on Cascade bound to DNA targets that have previously been shown to elicit interference or priming responses. These experiments will be critical to understanding 210 how DNA-induced conformational states of Cascade control the downstream immune response.

Acknowledgments

Research in the Wiedenheft lab is supported by the National Institutes of Health

(P20GM103500, P30GM110732-03, R01GM110270, and R01GM108888), the National

Science Foundation EPSCoR (EPS-110134), the M. J. Murdock Charitable Trust, a young investigator award from Amgen, Montana University System Research Initiative, Gordon and Betty Moore Foundation, and the Montana State University Agricultural Experimental

Station. Research in the Bothner lab is supported by National Institutes of Health (R01

AI081961, R21 AI126583). The Proteomics, Metabolomics, and Mass

Spectrometry facility at MSU received support from the Murdock Charitable Trust and NIGMS of the National Institutes of Health (P20GM103474). The content is solely the responsibility of the authors and does not necessarily represent the official views of the

National Institutes of Health.

Materials and Methods

Hydrogen-Deuterium Exchange coupled with Mass Spectrometry (HDX-MS):

Three forms of the Cascade complex were tested using HDX-MS: unbound

Cascade, Cascade bound to a ssDNA substrate, and Cascade bound to a dsDNA bound substrate (Supplemental Dataset 1). First Cascade was expressed and purified (see below) and pre-bound to ssDNA or dsDNA substrate, by incubating unbound Cascade with an excess of the appropriate oligonucleotide at 37 °C for 15 minutes. Excess oligonucleotides 211 were removed from the ssDNA and dsDNA bound Cascade preparations by gel filtration as described below (Figure S6.6D). Cascade (10mg/mL) was diluted 10-fold into reaction buffer (10mM sodium acetate, 50mM NaCl, pH 7.0) which was made in D2O. Samples were removed at different time points and quenched to stop the exchange after 0, 0.16, 0.5,

2.5, 5.0, 10, 30 and 180 minutes. At each time point, 10 µL was withdrawn and placed into quenching/digestion solution at pH 2.5 containing 1% formic acid (FA, Sigma) and 0.2 mg/mL pepsin (Sigma) at 0°C. After one and half minutes of digestion, the tube was frozen in liquid N2 and stored at -80°C until liquid chromatography-mass spectrometry (LC-MS) analysis. For Intact protein HDX-MS the reaction started in the same manner as the pepsin- based HDX-MS. Time points were collected at 0.5, 6, 11.5, 17, 22.5, 28, 33.5, 39, 44.5,

50, 55.5, 61, 66.5, 72, and 77.5 minutes. The intact protein HDX-MS time points were quenched in the same manner; however, the intact protein quench solution did not contain pepsin, and tubes were immediately frozen in liquid N2 and stored at -80°C until liquid chromatography-mass spectrometry (LC-MS) analysis

LC-MS analysis of intact Cascade and Cascade peptides was completed on a 1290

UPLC series chromatography stack (Agilent Technologies) coupled directly to a 6538

UHD Accurate-Mass Q-TOF LC/MS mass spectrometer (Agilent Technologies). Before electrospray-time of flight (ESI-TOF) analysis, peptides were separated on a Reverse Phase

(RP) column (Phenomenex Onyx Monolithic C18 column, 100 x 2 mm) at 1°C using a flow rate of 600 µL/min in the following conditions: 1min, 5% B; 1.0-9.0min, 5-45% B;

9.0-9.80 min, 95% B; 9.80-9.90 min, 5% B solvent A = 0.1% formic acid (FA, Sigma) in water (ThermoFisher) and solvent B = 0.1% FA in acetonitrile (ACN, ThermoFisher). Data 212 was acquired at 2 Hz sec-1 over the scan range 50-1700 m/z in positive mode. Electrospray settings were: nebulizer set to 3.7 bar, drying gas 8.0 L/min, drying temperature 350○C, and capillary voltage 3.5 kV.

Data processing was carried out in Agilent MassHunter Qualitative Analysis version 6.0. Unique peptides were identified by using Peptide Analysis Worksheet version

2000.6.8.0 (PAWs, ProteoMetrics LLC.) with an accuracy of 10 ppm. Peptide sequences were also confirmed by MS-MS data acquisition, with a scan range of 50-1700 m/z (auto

MS-MS)) with an isolation width of 4 m/z and an acquisition rate of 1 spectrum/s. Different collision energy voltage (20V, 25V, and 30V) and linear gradient voltages were applied in auto MS-MS mode. MS-MS data were analyzed and mapped to confirm the corresponding sequence by using Peptide Shaker version 0.41.1 paired with Search GUI version 1.30.1

(Compomics) (36).

Deuterium incorporation in each peptide was calculated using the program

HDExaminer (version 1.3.0 beta 6, Sierra Analytics Inc.) which measures the shift of the centroid peak of the peptides collected throughout the time course in different samples.

The shift of the centroid was then compared to a non-deuterated control (no exchange) and a fully deuterated sample (24-hour reaction) to determine the extent of exchange using methods previously reported by Guttman et al. (23). Back-exchange during chromatography was estimated from a fully deuterated protein. The 24-hour time point was denoted as 100% (푚100) exchange. The non-deuterated control was denoted as 0% (푚0%) exchange. Percentage deuteration for each peptide at different time points was calculated by following formula. 213

푚 − 푚 % Deuteration = 푡 0% 푚100% − 푚0%

푚푡 = deuterium uptake at different times

Statistical analysis on deuterium uptake was performed using MEMHDX 24. P-values less than 0.05 were considered statistically significant. For the uptake curves (Figure 6.2 6.3 and Figure S6.3), peptides that were the most representative of the data were displayed.

Uptake information for all peptides can be found in Supplementary Data Set 1.

Nested or sister peptides that shared the same N-terminal or C-terminal residue and varied in length were manually resolved to determine the contributions of the different amides to the overall uptake of the peptides. The overlapping regions of the peptides were assumed to take up the same amount of deuterium and the non-overlapping regions were assumed to be responsible for the rest of the uptake in the longer peptide. For example, if the peptide covering residues 1-4 took up two deuterium and the peptide covering residues

1-6 took up three deuterium, it would be assumed that residues 5 and 6 together took up, on average, one deuterium. The uptake percentage of the resolved peptides of the unbound complex were then subtracted from the dsDNA bound complex and mapped onto the structural model of dsDNA bound Cascade (5H9F) using Chimera version 1.10.2 (UCSF

Resource for Biocomputing, Visualization, and Informatics), to create the difference maps shown in Figures 6.1-4 (37).

Cascade expression and purification

Cascade and Cascade mutants were expressed and purified using previously described methods (11). Briefly, cas genes and CRISPR RNAs were coexpressed in E. coli

Bl21 (DE3) cells with Cse2 fused to an N-terminal Strep tag. Cells were grown in LB 214 media at 37°C under antibiotic selection and induced at an OD600nm of 0.5 using 0.2 mM isopropyl-D-1-thiogalactopyranoside (IPTG). Cells were cultured overnight at 20 ◦C, pelleted by centrifugation and suspended in lysis buffer (100 mM Tris–HCl pH 8.0, 150 mM NaCl, 1 mM Ethylenediaminetetraaceticacid (EDTA), 1 mM tris (2- carboxyethyl) phosphine (TCEP) and 5% glycerol), and frozen at −80 °C. Cells were lysed by sonication and lysates were clarified by centrifugation. Cascade self-assembles in vivo and the complex was affinity purified on StrepTrap HP resin (GE) using N-terminal Strep-II tags on Cse2. Elution of Cascade was performed using the lysis buffer supplemented with 2.5 mM desthiobiotin. The Strep-II tag was removed with HRV-3C protease. Cascade was concentrated prior to gel filtration chromatography using 10/300 Superose6, 16/60

Superose6 or 26/60 Superose6 columns (GE Healthcare) equilibrated with 50 mM Tris–

HCl pH 7.5, 100 mM NaCl, 1 mM EDTA, 1 mM TCEP and 5% glycerol (Figure S6.6) and stored at -80 °C until use. Stoichiometry of the different Cascade mutants was analyzed by

SDS-PAGE gel and the presence of crRNA was checked by phenol/chloroform extraction of the protein followed by denaturing polyacrylamide gel (Figure S6.6).

Cas3 expression and purification

Cas3 was expressed and purified using previously described methods (5, 6).

Briefly, the Cas3 gene fused to an N-terminal His6-MBP (maltose binding protein) tag was expressed from pHMGWA plasmid in E. coli BL21 (DE3) cells. Cells were grown in LB media at 20°C under antibiotic selection and expression was induced at an OD600 nm of 0.3 using 0.2 mM IPTG. Cells were pelleted by centrifugation and suspended in lysis buffer

(20 mM TRIS-HCl pH 8.0, 200 mM NaCl, 1 mM TCEP and 10 % glycerol). Cells were 215 lysed by sonication and lysates were clarified by centrifugation. Cas3 was affinity purified using Ni-NTA superflow resin (Qiagen), washed with lysis buffer supplemented with 5 mM Imidazole followed by a 1 M Nacl wash. Cas3 was eluted with lysis buffer supplemented with 250 mM Imidazole. Cas3 was concentrated prior to gel filtration chromatography using 16/60 Superoses6 column equilibrated with 20 mM TRIS-HCl pH

8.0, 200 mM NaCl, 1 mM TCEP and 5 % glycerol and stored at -80°C until use. Protein purity was checked by SDS-PAGE gel (Figure S6.6E).

Oligoduplex preparation and DNA labeling

Oligonucleotides (Supplemental Table 1) were 5-end labeled with [ɣ-32P] ATP

(PerkinElmer) using T4 polynucleotide kinase (NEB). Labeled oligonucleotides were purified by phenol/chloroform extraction followed by MicroSpin G-25 columns (GE

Healthcare). Oligonucleotides were gel purified on a denaturing polyacrylamide gel and recovered. dsDNA was prepared by mixing labeled oligonucleotides with a 5 fold molar excess of the complementary oligonucleotide in hybridization buffer (20 mM HEPES pH

7.5, 100 mM KCl and 5 % glycerol). The mixture was heated to 95°C for 5 minutes and allowed to cool to room temperature. Duplexed oligonucleotides were purified on a native polyacrylamide gel and recovered in hybridization buffer. Duplexed oligonucleotides were used for EMSA and Cascade/Cas3 mediated DNA degradation assays.

Electrophoretic mobility shift assay (EMSA)

Electrophoretic mobility shift assays were carried out as described previously (11).

The reported KDs are the average from three independent experiments and error bars 216 represent the standard deviation. Oligonucleotides used for EMSAs are listed in

Supplemental Table 1. The non-target DNA strand was labeled.

Plasmid-curing assays

Plasmid-curing assays were carried out as described previously (11). Briefly, competent E. coli cells containing plasmids encoding Cascade, Cas3, and a crRNA targeting the pUC-P7-TAC plasmid were transformed with an equal molar mixture of pUC19 (non-target) and pUC-p7-TAC (target) plasmids. Expression of Cascade, Cas3, and the crRNA was induced using 0.2 mM isopropyl-D-1-thiogalactopyranoside (IPTG) and cells were plated on LB-agar plates. Forty individual colonies were screened using PCR designed to discriminate target from non-target plasmids based on size. Three replicates were carried out for each experiment and the average value of the ratio of target to non- target plasmid was calculated and plotted in the graphs (Figure 6.3 6.4). An active immune system will destroy the target plasmid and result in a value close to 0, while an inactive immune system will preserve the original 1:1 ratio of target to non-target plasmid and result in a value close to 1. The error bars represent standard error of the mean calculated from three independent experiments.

Cascade/Cas3 mediated DNA degradation

Duplexed oligonucleotides were [ɣ-32P] ATP labeled on the 5’end of the non-target

DNA strand and incubated with 300 nM Cascade or Cascade mutants for 15 minutes at

37°C in reaction buffer (20 mM HEPES pH 7.5, 100 mM KCl, 10 mM MgCl2, 10 mM

CaCl2, 100 µM NiSO4, 100 µM CoCl2 and 2 mM TCEP) supplemented with 2 mM ATP

(Lanes marked No ATP did not receive ATP, lanes marked EDTA contained 17 mM 217

EDTA, Figure 6.4D). Reactions were incubated on ice for 5 minutes prior to addition of

Cas3 to a final concentration of 300 nM. After the addition of Cas3 reactions were incubated for 60 minutes at 37 °C except for the samples that were part of the time course

(triangle in Figure 6.4D, incubated for 0, 15 and 60 minutes). Once the reactions were complete the samples were quenched by the addition of EDTA and Sodium Dodecyl

Sulphate (SDS) to a final concentration of 8 mM EDTA and 1 % SDS. Samples were mixed with 2x Formamide loading buffer, heated to 95°C C for 5 minutes and run on a pre-run 7

M Urea, 15 % polyacrylamide (29:1) gel in Tris-Taurine-EDTA (TTE) buffer. Gels were dried, exposed to phosphor storage screens and scanned with a Typhoon phosphorimaginer.

Degraded and undegraded DNA fractions were quantified using GelQuant.NET software and used to calculate the fraction of degraded DNA compared to the total DNA. Reported degradation percentages are the average of three independent experiments with the error representing the standard deviation.

218

References

1. Brouns SJ, Jore MM, Lundgren M, Westra ER, Slijkhuis RJ, Snijders AP, et al. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science. 2008;321(5891):960-4.

2. Jore MM, Lundgren M, van Duijn E, Bultema JB, Westra ER, Waghmare SP, et al. Structural basis for CRISPR RNA-guided DNA recognition by Cascade. Nat Struct Mol Biol. 2011;18(5):529-36.

3. Jackson RN, Golden SM, van Erp PB, Carter J, Westra ER, Brouns SJ, et al. Crystal structure of the CRISPR RNA-guided surveillance complex from Escherichia coli. Science. 2014;345(6203):1473-9.

4. Zhao H, Sheng G, Wang J, Wang M, Bunkoczi G, Gong W, et al. Crystal structure of the RNA-guided immune surveillance Cascade complex in Escherichia coli. Nature. 2014;515(7525):147-50.

5. Mulepati S, Bailey S. In Vitro Reconstitution of an Escherichia coli RNA-guided Immune System Reveals Unidirectional, ATP-dependent Degradation of DNA Target. J Biol Chem. 2013;288(31):22184-92.

6. Hochstrasser ML, Taylor DW, Bhat P, Guegler CK, Sternberg SH, Nogales E, et al. CasA mediates Cas3-catalyzed target degradation during CRISPR RNA-guided interference. Proc Natl Acad Sci U S A. 2014;111(18):6618-23.

7. Westra ER, van Erp PBG, Künne T, Wong SP, Staals RHJ, Seegers CLC, et al. CRISPR immunity relies on the consecutive binding and degradation of negatively supercoiled invader DNA by Cascade and Cas3. Mol Cell. 2012;46(5):595-605.

8. Hayes RP, Xiao Y, Ding F, van Erp PBG, Rajashankar K, Bailey S, et al. Structural basis for promiscuous PAM recognition in type I–E Cascade from E. coli. Nature. 2016;530(7591):499-503.

9. Semenova E, Jore MM, Datsenko KA, Semenova A, Westra ER, Wanner B, et al. Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence. Proc Natl Acad Sci U S A. 2011;108(25):10098-103.

10. Jung C, Hawkins JA, Jones SK, Jr., Xiao Y, Rybarski JR, Dillard KE, et al. Massively Parallel Biophysical Analysis of CRISPR-Cas Complexes on Next Generation Sequencing Chips. Cell.170(1):35-47.e13.

219

11. van Erp Paul BG, Jackson RN, Carter J, Golden SM, Bailey S, Wiedenheft B. Mechanism of CRISPR-RNA guided recognition of DNA targets in Escherichia coli. Nucleic Acids Research. 2015;43(17):8381-91.

12. Mulepati S, Heroux A, Bailey S. Crystal structure of a CRISPR RNA-guided surveillance complex bound to a ssDNA target. Science. 2014;345(6203):1479-84.

13. Xiao Y, Luo M, Hayes RP, Kim J, Ng S, Ding F, et al. Structure Basis for Directional R-loop Formation and Substrate Handover Mechanisms in Type I CRISPR-Cas System. Cell. 2017;170(1):48-60.e11.

14. Fineran PC, Gerritzen MJ, Suarez-Diez M, Kunne T, Boekhorst J, van Hijum SA, et al. Degenerate target sites mediate rapid primed CRISPR adaptation. Proc Natl Acad Sci U S A. 2014;111(16):1629-38.

15. Westra ER, Semenova E, Datsenko KA, Jackson RN, Wiedenheft B, Severinov K, et al. Type I-E CRISPR-Cas Systems Discriminate Target from Non-Target DNA through Base Pairing-Independent PAM Recognition. PLoS Genet. 2013;9(9):e1003742.

16. Datsenko KA, Pougach K, Tikhonov A, Wanner BL, Severinov K, Semenova E. Molecular memory of prior infections activates the CRISPR/Cas adaptive bacterial immunity system. Nat Commun. 2012;3:945.

17. Richter C, Dy RL, McKenzie RE, Watson BN, Taylor C, Chang JT, et al. Priming in the Type I-F CRISPR-Cas system triggers strand-independent spacer acquisition, bi-directionally from the primed protospacer. Nucleic Acids Res. 2014;42(13):8516-26.

18. Redding S, Sternberg Samuel H, Marshall M, Gibb B, Bhat P, Guegler Chantal K, et al. Surveillance and Processing of Foreign DNA by the Escherichia coli CRISPR- Cas System. Cell. 2015;163(4):854-65.

19. Xue C, Whitis NR, Sashital DG. Conformational Control of Cascade Interference and Priming Activities in CRISPR Immunity. Molecular Cell. 2016;64(4):826-34.

20. Xue C, Seetharam AS, Musharova O, Severinov K, J. Brouns SJ, Severin AJ, et al. CRISPR interference and priming varies with individual spacer sequences. Nucleic Acids Research. 2015;43(22):10831-47.

21. Swarts DC, Mosterd C, van Passel MW, Brouns SJ. CRISPR Interference Directs Strand Specific Spacer Acquisition. PLoS One. 2012;7(4):e35888.

220

22. Blosser TR, Loeff L, Westra ER, Vlot M, Kunne T, Sobota M, et al. Two distinct DNA binding modes guide dual roles of a CRISPR-Cas protein complex. Mol Cell. 2015;58(1):60-70.

23. Guttman M WD, Engen JR, Lee KK. Analysis of overlapped and noisy hydrogen/deuterium exchange mass spectra. J Am Soc Mass Spectrom 2013;24(12):1906-12.

24. Hourdel V VS, O'Brien DP, Chenal A, Chamot-Rooke J, Dillies MA, Brier S. MEMHDX: An interactive tool to expedite the statistical validation and visualization of large HDX-MS datasets. 2016.

25. Engen JR. Analysis of Protein Conformation and Dynamics by Hydrogen/Deuterium Exchange MS. Analytical Chemistry. 2009;81(19):7870-5.

26. Jackson RN, van Erp PBG, Sternberg SH, Wiedenheft B. Conformational regulation of CRISPR-associated nucleases. Current Opinion in Microbiology. 2017;37:110-9.

27. Sashital DG, Jinek M, Doudna JA. An RNA induced conformational change required for CRISPR RNA cleavage by the endonuclease Cse3. Nat Struct Mol Biol. 2011;18(6):680-7.

28. Szczelkun MD, Tikhomirova MS, Sinkunas T, Gasiunas G, Karvelis T, Pschera P, et al. Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes. Proc Natl Acad Sci U S A. 2014;111(27):9798-803. 29. Almendros C, Guzman NM, Diez-Villasenor C, Garcia-Martinez J, Mojica FJ. Target Motifs Affecting Natural Immunity by a Constitutive CRISPR-Cas System in Escherichia coli. PLoS One. 2012;7(11):e50797.

30. Wiedenheft B, Sternberg SH, Doudna JA. RNA-guided genetic silencing systems in bacteria and archaea. Nature. 2012;482(7385):331-8.

31. Mulepati S, Orr A, Bailey S. Crystal structure of the largest subunit of a bacterial RNA-guided immune complex and its role in DNA target binding. J Biol Chem. 2012;287(27):22445-9.

32. Sashital DG, Wiedenheft B, Doudna JA. Mechanism of foreign DNA selection in a bacterial adaptive immune system. Mol Cell. 2012;48(5):606-15.

33. Tay M, Liu S, Yuan YA. Crystal structure of Thermobifida fuscaCse1 reveals target DNA binding site. Protein Science. 2015;24(2):236-45.

221

34. Wiedenheft B, Lander GC, Zhou K, Jore MM, Brouns SJJ, van der Oost J, et al. Structures of the RNA-guided surveillance complex from a bacterial immune system. Nature. 2011;477(7365):486-9.

35. Rutkauskas M, Sinkunas T, Songailiene I, Tikhomirova MS, Siksnys V, Seidel R. Directional R-Loop Formation by the CRISPR-Cas Surveillance Complex Cascade Provides Efficient Off-Target Site Rejection. Cell Rep. 2015.

36. Vaudel M, Burkhart JM, Zahedi RP, Oveland E, Berven FS, Sickmann A, et al. PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat Biotech. 2015;33(1):22-4.

37. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, et al. UCSF Chimera—A visualization system for exploratory research and analysis. Journal of Computational Chemistry. 2004;25(13):1605-12.

222

Supplemental Figures

Figure S6.1. DNA binding increases the stability of Cascade. Differential scanning fluorimetry of unbound Cascade (blue) and the dsDNA bound form of Cascade (red). The increase in melting temperature after dsDNA binding indicates that the complex is more stable.

223

224

225

226

227

228

229

Figure S6.2. Heat and coverage maps for all Cascade subunits. Heat map (colored) and coverage map (blue) for each subunit (Cse1 A, Cse2 B, Cas7 C, Cas5e D, and Cas6e E). The heat map shows the percentage of deuterium exchanged for each peptide at every time point. Each condition (no DNA, dsDNA bound, and ssDNA bound) are displayed as separate rows respectively, with each block further divided into rows indicating the tested time points. The percent deuterium uptake is indicated by differing colors. Coverage maps for each subunit are shown below each heat map. Each line represents a peptide used to measure the deuterium uptake in all conditions. Over 90% sequence coverage was achieved for all peptides.

230

Figure S6.3. HD-exchange over time. The number of deuteriums incorporated by specific peptides are plotted over time. HD-exchange is measured for peptides from Cascade prior to binding DNA (blue), bound to ssDNA (green), and dsDNA (red) at time points 10 and 30 seconds, 2.5, 5, 10, 30 and 180 minutes. With the exception of the 10-minute data point from the unbound form of Cascade, which was only performed once, the remaining data points are the average of three replicates with the standard deviation shown as error bars. Error bars are too small to see in some cases. The maximum number of exchangeable amide hydrogens is indicated and calculated by subtracting two possible exchangeable amides hydrogens from the peptide length (back exchange) and one additional for each proline present. 231

Figure S6.4. Peptide covering the lysine-rich helix (Cas7 134-148) displays a bimodal distribution. Measured isotopic distributions for the Cas7 peptide 134-148 in Cascade prior to binding DNA, bound to ssDNA, and dsDNA bound. The expected isotopic distribution during HDX-MS is shown in green. In the absence of DNA (left column), the peptide shows a normal isotopic distribution, which shifts to the right over time. This shows the incorporation of deuterium. In the ssDNA bound form (middle column), the peptide shows bimodal behavior, which appears at the “leading edge” (red arrows) of the normal Gaussian distribution. This leading edge appears after 30 seconds and becomes continuously more exaggerated until 30 minutes when the isotopic envelope re-adopts a near Gaussian distribution. When dsDNA is bound (right column), the bimodal behavior is observed at the first-time point. Similar to behavior measured for the ssDNA bound complex, the profile returns to a near Gaussian distribution by 30 minutes. However, the centroid of the isotopic distribution for the dsDNA bound form of Cascade lies at a lower m/z value than in the ssDNA bound form. This shows that the dsDNA bound form of this peptide is more protected from deuterium exchange than the ssDNA bound form.

232

Figure S6.5. Sequence alignment of Cse1 shows little conservation in L3 and L4. Protein sequence alignment of Cse1 subunits of different organisms for which structures are available. Escherichia coli (5CD4), Thermobifida fusca YX (5U0A), Thermus thermophilus (4AN8), Thermobifida fusca (3WVO) and Acidimicrobium ferrooxidans (4H3T). Identical residues are highlighted in red and similar residues are in red text. L3 (blue box) and L4 (green, underlined) are indicated and show little sequence conservation. 233

Figure S6.6. Structures of the Cse1 subunit. (A) Structures of the E. coli Cse1 subunit from the forked dsDNA bound Cascade (left, 5H9F) and from the unbound Cascade (right, 5CD4). L4 in the dsDNA bound structure has moved 11 Å away from L3. (B) The Cse1 subunit form Thermobifida fusca YX (5U0A), Thermobifida fusca (3WVO), Thermus thermophilus (4AN8) and Acidimicrobium ferrooxidans (4H3T) all share the conserved structural features L3 and L4.

234

Figure S6.7. Cascade mutants express and purify as WT Cascade. (A) Elution profile of WT Cascade (black) and ΔLoop3 (Cse1 ΔL285-K296, red). The insert shows a SDS- PAGE gel and (top) and denaturing polyacrylamide gel (bottom). (B) Elution profile of WT Cascade (black), Cascade H1 (Cse1 N379A/E380K, blue) and Cascade H3 (Cse1 R194E/K197E, purple). (C) Elution profile of WT Cascade (black) and Cascade K289A/K290A (Cse1 K289A/K290A, green). (D) Elution profile of WT Cascade (black), Cascade bound to 72-nt ssDNA containing P7 protospacer and 3’-TTC-5’ PAM (orange) and Cascade bound to 72-bp dsDNA containing P7 protospacer and 3’-TTC-5’PAM. (E) Elution profile of Cas3. The insert shows an SDS-PAGE gel. 235

Figure S6.8. Deletion of Loop3 results in a major dsDNA binding defect. Electrophoretic Mobility Shift Assays (EMSA) show equilibrium dissociation constants for Cascade and Cascade mutants binding to 72-bp dsDNA containing P7 protospacer and 3’-TTC-5’ or 3’-TAC-5’ PAM. Only deletion of Loop 3 resulted in a major binding defect. The H3 mutant resulted in a minor binding defect. Equilibrium dissociation constant (KD) is calculated using three replicates. Error represents standard deviation.

236

Supplemental Tables

Table S6.1. Oligonucleotides used in this study.

237

CHAPTER SEVEN

PROTEIN OVEREXPRESSION REDUCES SPECIFIC PHAGE INFECTIVITY IN

PROKARYOTIC ARGONAUTE SCREEN

Contribution of Authors and Co-Authors

Manuscript in Chapter 7

Author: Paul B.G. van Erp

Contributions: designed the research, performed experiments, analyzed the data, wrote the manuscript

Co-Author: Tanner Wiegand

Contributions: preformed phylogenetic analysis, analyzed the data

Co-Author: Royce A. Wilkinson

Contributions: performed experiments

Co-Author: Laina Hall

Contributions: performed experiments

Co-Author: Dominick Faith

Contributions: performed experiments

Co-Author: Blake Wiedenheft

Contributions: designed the research, analyzed the data, wrote the manuscript

238

Manuscript Information

Paul B.G. van Erp, Tanner Wiegand, Royce A. Wilkinson, Laina Hall, Dominick Faith, Blake Wiedenheft

Status of Manuscript: X Prepared for submission to a peer-reviewed journal ____ Officially submitted to a peer-reviewed journal ____ Accepted by a peer-reviewed journal Published in a peer-reviewed journal

239

PROTEIN OVEREXPRESSION REDUCES SPECIFIC PHAGE INFECTIVITY IN

PROKARYOTIC ARGONAUTE SCREEN

Paul B.G. van Erp, Tanner Wiegand, Royce A. Wilkinson, Laina Hall, Dominick Faith,

Blake Wiedenheft

Department of Microbiology and Immunology, Montana State University, Bozeman, MT

59717, USA.

240

Abstract

Argonaute (Ago) proteins are present in all three domains of life and are involved in nucleic acid guided silencing and interference pathways. Eukaryotic Argonautes (eAgo) form the catalytic core of the RNA interference (RNAi) pathway, which is involved in gene silencing, transposon silencing and antiviral defense. However, the biological role of

Argonaute proteins in bacteria and archaea remains undetermined. Prokaryotic Argonautes

(pAgos) have been hypothesized to defend against mobile genetic elements such as plasmids and viruses. To test this hypothesis, we overexpressed 8 phylogenetically diverse pAgo proteins in Escherichia coli and challenged them with 7 bacteriophages spanning the

Myo-, Sipho-, and Podoviridae families. This resulted in robust protection against phage

λvir and phage P1 (up to 400,000-fold reduced infectivity) by 4 of the tested pAgos, while little impact on phage infectivity was observed for the other phages tested. However, control experiments with a nuclease inactive pAgo mutant and expression of an unrelated gene, and showed similar protection against phage λvir and phage P1. Collectively, our data suggest that protein overexpression in general, rather than pAgo expression in particular, results in protection against 2 specific phages.

241

Introduction

Argonaute (Ago) proteins are found in all three domains of life (1, 2). In

Eukaryotes, Argonautes play a central role in small RNA-guided gene silencing processes collectively known as RNA interference (RNAi). RNAi is involved in many cellular functions such as antiviral defense, gene regulation and transposon regulation. Eukaryotic

Argonaute (eAgo) proteins make up the catalytic core of the RNAi pathways and are loaded with small interfering RNAs (siRNAs), microRNAs (miRNAs) or Piwi-interacting RNAs

(piRNAs) that act as guides to identify RNA targets through sequence complementarity (3,

4). This leads to either target cleavage through intrinsic endonuclease activity, or to silencing of the RNA target through blocking of translation. Members of the Argonaute protein family are also present in prokaryotes. A recent comprehensive genomic and phylogenetic analysis by Ryazansky et al. identified prokaryotic Argonaute (pAgo) proteins in 17 % of bacterial and 25 % of archaeal genera (3). The prokaryotic Argonautes show far greater phylogenetic diversity compared to the eukaryotic Argonautes whom cluster as a single branch in the Argonaute family tree (Figure 7.1A). This greater diversity is also reflected in the nucleic acid guide and target versatility, as well as the domain architecture of the pAgos. Both short RNA and/or DNA guides are utilized by pAgos to target complementary RNA and/or DNA molecules, although the pAgos studied thus far appear to be predominantly DNA-targeting proteins (4, 5). Structurally, pAgos also show greater diversity than eAgos. All eAgo proteins have a bilobed architecture consisting of four domains separated by two linkers: N-terminal, L1 (Linker 1), PAZ (PIWI-Argonaute-

Zwille), L2 (Linker 2), MID (Middle) and PIWI (P-element induced Wimpy testis) 242 domains (Figure 7.1C). The pAgos are divided into two main phylogenetic groups: the

“long-pAgos” and “short-pAgos” (1-3). The long-pAgos consist of the same four domains as the eAgos, the short-pAgos only contain the MID and PIWI domains. The PIWI domain of most of the eAgos and a subset of the long-pAgos contains the conserved catalytic tetrad of amino acids (DEDX, where X is D, H or K) that is responsible for endonucleolytic cleavage of the target. Short-pAgos do not contain the catalytic tetrad and are either nuclease inactive or are fused to or found next to additional putative nucleases. Among the

Long-pAgos there is a secondary division in Long-A and Long-B pAgos clades (3). All the long-B pAgos lack the conserved catalytic tetrad and are thought to be nuclease inactive.

While in some of the long-A pAgos the catalytic tetrad is also mutated, the majority of

Long-A pAgos contain the catalytic tetrad and are predicted to be nuclease active. Previous research has mainly focused on pAgos from the Long-A clade that have an intact catalytic tetrad (6-14). Only two pAgos from the Long-B clade have been studied to date

(Rhodobacter sphaeroides Argonaute, RsAgo, and Archaeoglobus fulgidus Argonaute

AfAgo) and no short pAgos have been studied (15, 16).

While the biological role of pAgo proteins remains unclear they have been implicated in defense against foreign genetic elements based on the functions of eAgos, the genomic location of pAgo in areas of prokaryotic genomes enriched in genes with defense functions, and on limited experimental data (1-3, 17, 18). RsAgo, Thermus thermophilus Ago (TtAgo), Pyrococcus furiosus Ago (PfAgo) and Methanocaldococcus jannaschii Ago (MjAgo) have been shown to interfere with intercellular plasmid levels, and plasmid transformation efficiency of cells expressing these Argonautes (9, 10, 13, 16). 243

TtAgo and MjAgo can generate their own guide sequences in a process termed “chopping” by non-specifically cleaving double-stranded DNA and retaining one of the DNA strands as a guide that is used in subsequent rounds of targeted DNA degradation (13, 19). The

RNA-guided RsAgo samples the transcriptome and associates with guides from expressed genes (16). How other pAgos obtain their guide sequences is unknown. pAgo proteins have long been hypothesized to function as an antiviral defense system. Here, we directly test this hypothesis by overexpressing 8 different pAgos in E. coli and challenging with 7 different dsDNA phages. While we observed strong reduction in phage infectivity for two of the tested phages, control experiments revealed that this antiviral activity was not due to the pAgos in particular, but rather protein expression in general. This antiviral effect is related to the E. coli cell line, expression plasmid and protein being expressed. 244

Figure 7.1. Selection of pAgo proteins. (A) Maximum-likelihood phylogenetic tree of 1010 pAgo and 8 eAgo sequences. Long-A, Long-B and short pAgos are color coded. Some Long pAgos have lost N-terminal and/or PAZ domains and are truncated. The branch of Short pAgos that are fused to SIR2 domains are indicated in brown. The 8 pAgos studied here are indicated on the outer rim. (B) Atomic resolution structures (boxed) of RsAgo (PDB 5awh), TtAgo (PBD 4nca) and MpAgo (PDB 5i4a), and RaptorX structural prediction of the remaining 5 pAgos. Domain of pAgos are colored according to Figure 1C. NgAgo and PxAgo contain additional domains in yellow. (C) The selected 8 pAgos. Host species, guide and target identities are indicated on the left. Right side of the figure shows domain architecture of the pAgos. Amino acid numbers of the different domain are indicated. The active site residues are indicated in yellow. Mutated active site residues of are indicated in red. NgAgo is fused to a single stranded oligonucleotide binding domain (SSB). PxAgo is a short pAgo fused to a SIR2 domain and contains an APAZ domain (brown). The APAZ domain has structural homology to N-terminal L1, MID and L2 domains.

245

Results

We set out to directly test the hypothesis that pAgo proteins provide antiviral defense. Previous research shows that pAgo proteins can be recombinantly expressed in E. coli and are loaded with guide sequences predominantly derived from their expression plasmids, that allow them to degrade complementary plasmid targets in vitro (8-10, 16).

Moreover, RsAgo actively degrades plasmid DNA when recombinantly expressed in E. coli (16). This suggest that pAgos will function in vivo when recombinantly expressed in

E. coli. We set out to test and compare multiple pAgo proteins from diverse prokaryotic organisms. To select pAgo proteins of interest, we constructed a phylogenetic tree using

1010 pAgo sequences identified by Ryazansky et al. and used it to guide selection of 8 prokaryotic Argonautes (Figure 7.1A) (3). We selected four previously studied pAgos:

Rhodobacter sphaeroides Ago (RsAgo), Thermus thermophilus Ago (TtAgo), Marinitoga piezophila Ago (MpAgo) and Natronobacterium gregoryi Ago (NgAgo) and four novel pAgos: Microcystis aeruginosa Ago (MaAgo), Clostridium sartagoforme Ago (CsAgo)

Paraburkholderia xenovorans Ago (PxAgo) and Pseudomonas aeruginosa Ago (PaAgo).

Five of these pAgos belong to the Long-A clade (CsAgo, NgAgo, TtAgo, MpAgo and

MaAgo) one to the Long-B clade (RsAgo) and two to the short pAgo clade (PxAgo and

PaAgo). Atomic resolution structures have been solved for TtAgo, RsAgo and MpAgo, and we started out by running structural predictions on the remaining 5 pAgos using the

RaptorX webserver (Figure 7.1B) (20). CsAgo and MaAgo adopt a classical Long-A pAgo fold most closely related to the eAgo from Silkworm (Bombyx mori) and the bacterial

Aquifex aeolicus Argonaute (AaAgo), respectively (14, 21). Both pAgos contain the 246 catalytic tetrad DEDD suggesting they are nuclease active. NgAgo also adopts the Long-

A pAgo fold with a catalytic tetrad consisting of DEDD, however the first 128 amino acids on the N-terminal side are predicted to fold as an Oligonucleotide Binding domain (OB- fold domain) with structural homology to ssDNA binding domains of the archaea

Methanosarcina mazei and Saccharolobus solfataricus (Figure 7.1B+C) (22). This homology is not detectable at the amino acid level by BLAST-P but only at the higher order structural level using the RaptorX webserver (20).

Previous work suggests NgAgo utilizes DNA guides to target complementary

RNA, however these results have been disputed and recently NgAgo has been implicated in facilitating homologous recombination through guided DNA endonuclease activity (6,

22-25). PxAgo belongs to the short pAgo clade and contains an APAZ (Analogue of PAZ) domain and a domain with homology to eukaryotic deacetylases of the SIR2 family on the

N-terminal side (Figure 7.1B). APAZ domains are often found associated with or fused to short pAgos and are predicted to have structural, but not sequence homology to PAZ domains. Similarly, SIR2 like domains are regularly found in proximity of or fused to short pAgos (Figure 7.1A, brown branch) and have been suggested to act as nucleases in the context of pAgos (1-3). Structural prediction of PxAgo shows the APAZ domain has homology to the N-terminal, L1 and PAZ domains of Long pAgos, albeit with shorter sequence (Figure 7.1C). The Argonaute from P. aeruginosa belongs to the short pAgo clade and consist only of the MID and PIWI domains but is found in close proximity to two putative proteins of unknown function. Upstream of the PaAgo lies a 551 amino acid open reading frame that overlaps 11 bp with the start codon of PaAgo. 53 bp downstream 247 of the of PaAgo lies a 214 amino acid open reading frame that codes for a protein that belongs to the Domain of Unknown Function 2026 (DUF2026) protein family

(Figure 7.1C). We hypothesized that these two associated proteins might be required for

PaAgo function and set out to test the function of these three proteins combined. PaAgo in the remainder of the paper refers to PaAgo together with the up and downstream genes. In total we cloned 7 individual Argonautes and the PaAgo with up and down stream genes into the high copy number expression vector pRSF-1b and transformed them into E. coli

BL21 (DE3) for testing in antiviral activity assays.

We selected an array of 7 dsDNA E. coli phages spanning the Myoviridae (T4 and

P1) Siphoviridae (T5, λvir and SECφ18) and Podoviridae (T3 and T7) families for testing against the pAgos. Five of these phages have a virulent lifecycle (T4, T5, SECφ18, T3 and

T7) while two are temperate phages (P1 and λ) that can exist as lysogens, even though the virulent (intemperate) mutant of Lambda (λvir) that has lost its ability to lysogenize, is being used in our experiments (26). Phage P1 exist as a stable plasmid during lysogenic infection and does not integrate into the host chromosome as phage λ does (27). We preformed spot plaque assays with the 7 phages on lawns of E. coli BL21 (DE3) carrying the pAgo plasmids. 7 phages over 8-logs dilution were stamped on lawns of E. coli cells using a 96- well pinning tool (Figure 7.2A). This allowed comparison of how individual pAgos affect all 7 phages simultaneously. Phages were stamped on lawns of cells induced and uninduced for pAgo expression. A fold protection score was calculated that shows the difference in phage infectivity on the pAgo expressing cells versus the non-expressing cells. This resulted in 4 pAgos (RsAgo, NgAgo, TtAgo and PxAgo) that provide robust protection 248 against phage λvir (up to 400,000-fold) and phage P1 (up to 7,000-fold, Figure 7.2B +C).

Little to no protection was observed against the other dsDNA phages tested. This was surprising since phage λvir and P1 belong to different viral families (Siphoviridae and

Myoviridae, respectively) and other phages belonging to the same families showed no protection (SECφ18, T5 and T4).

Figure 7.2. Overexpression of some pAgos results in specific antiviral activity. (A) Overview of the phage plate spot assays. E. coli cell containing pAgo plasmids are plate on induction and non-induction agar plates. 7 dsDNA phages over 8-logs dilution are stamped on the agar plates and plaques are observed after overnight incubation. (B) Table showing results of the phage plate spot plaque assays. For each of the pAgos a fold protection score is calculated for each phage that shows the fold reduction in phage infectivity between the pAgo expressing plates and the pAgo non-expressing plates. Values are the average of three independent replicates. (C) Example of phage plate spot assays. Shown are E. coli BL21 (DE3) cells not containing a pAgo plasmid (left), containing TtAgo plasmid on plates without induction (middle), and containing TtAgo plasmid on plates with induction (right). Infectivity of phage P1 and λvir is greatly reduced on the TtAgo expressing plate (red box).

249

We wondered what phages λvir and P1 had in common that could explain the high levels of protection offered against these phages. The E. coli cell line used in the experiments contains the DE3 λ-lysogen integrated into the genome. This could provide a possible source of guide sequences for the pAgos to target phage λvir that are already present in the cell before phage infection occurs. BLAST-N alignments of the E. coli BL21

(DE3) genome against the phage genomes revealed that next to λvir, P1, T3 and T7 share identical sequence with E. coli BL21 (DE3) (Figure 7.3A). P1 contains the small mobile genetic element “Insertion Sequence 1” (IS1) that is present in ±29 copies in the E. coli

BL21 (DE3) genome (27-29). Phage T3 and T7 match the T7 RNA polymerase gene present on the DE3 lysogen (77% and 100% identity, respectively). We hypothesized that the pAgos are loaded with guide sequences derived from the λDE3 lysogen and the IS1 elements already present in the cells before phage infection, and that these guides allow the pAgos to target and neutralize λvir and P1 upon infection.

To investigate if pAgos acquire guide sequences derived from the E. coli genome that match the phages, we reanalyzed previously published datasets of guide sequences extracted from RsAgo (RNA guides) when expressed in E. coli BL21 (DE3) and of TtAgo

(DNA guides) when expressed in E. coli KRX (9, 16). We hypothesized that the pAgos may have a better chance of neutralizing invading phages if they are loaded with guides matching these phages, before phage infection occurs. First, guides derived from the respective expression plasmids were separated from guides derived from the E. coli genomes. Next, the E. coli derived guides were mapped against the 7 phage genomes. This resulted in a set of guide sequences that are associated with RsAgo or TtAgo that are 250 derived from the E. coli genome or transcriptome, but also match the phage genomes

(Figure 7.3B).

Figure 7.3. Recombinantly expressed pAgo are loaded with guide sequence matching phages. (A) Schematic representation of the E. coli BL21 (DE3) genome with sequence that shares homology with the tested phages indicated. (B) Alignment of RsAgo and TtAgo guide sequences to phage genomes when recombinantly expressed in E. coli (BL21 (DE3), RsAgo and KRX, TtAgo). On the left side the three phages that share sequence homology with E. coli are shown. The majority of guides matching the phages are derived from homologous sequence. On the right side the three phages that do not share sequence homology with E. coli are shown. The guides matching these phages are derived from random sequence matches.

Unexpectedly, both datasets contained guides matching all 7 phages. We did not anticipate this since several of the of the phages do not share homologous sequence with the E. coli genome. Closer inspection of the data revealed that some of the guides that 251 match both the phages and E. coli genomes derive from random short sequences matches that do not match in the flaking regions of the guide origin. This sequence homology is not picked up by BLAST-N alignments because it requires a minimal 28 bp match, longer than the ~14-25 nt guides. Guides matching phages T4, T5 and SECφ18 whom do not share sequence homology to the E. coli genome, only came from these random short sequence matches (Figure 7.3B). Guides matching phages P1, λvir and T7 are predominantly derived from the longer stretches of homology shared between the phages and E. coli genomes, such as the DE3 lysogen, cryptic , insertion sequences and the T7 RNA polymerase gene (Figure 7.3B). There is a clear difference in the data between the number of guides targeting the phages that share homology (P1, λvir and T7) and the ones that don’t

(T4, T5 and SECφ18). For the phages that share homology we found an average of 61.8 and 555.7 guides per kb of phage genome for the RsAgo and TtAgo datasets, respectively.

The phages that have no homology only match with 3.4 and 18.1 guides per kb of phage genomes for RsAgo and TtAgo respectively. This indicates that there are 30-60-fold more guides targeting the phages that share homology with the host than the phages that do not share homology. We did not include phage T3 in the homology containing phages because it’s RNA polymerase gene only matches with 77% identity to the T7 RNA polymerase gene in the E. coli genome and yields few fully matching guides sequences that are used in our analysis. The guides targeting all 7 phages combined only make up 0.23 and 0.67 % of the total number of guides derived from the E. coli genome for RsAgo and TtAgo, respectively. This indicates that only a few of the pAgo proteins present in the cell before phage infection are pre-loaded with guides targeting phage. From this analysis we 252 concluded that pAgos can be loaded with guide sequences derived from the E. coli genome that match phages before phage infection takes place.

The majority of RsAgo guide sequences targeting phage P1 map back to the IS1 element shared between E. coli BL21 (DE3) and phage P1 (Figure 7.3B). In order to test if the antiviral activity identified in the pAgo screen is mediated through sequence homology between the phages and the E. coli host, we tested a mutant of phage P1 where the IS1 element has been removed (P1virΔIS1). This mutant phage P1 virΔIS1 is derived from a virulent mutant (P1vir) (30, 31). We hypothesized that the antiviral activity of the pAgos against P1 phage was mediated by pAgos loaded with guides derived from the IS1 elements in the E. coli genome, and that the P1virΔIS1 phage would escape pAgo mediated antiviral activity because it is lacking the IS1 element. The guide analysis revealed that

88% of the RsAgo guides derived from the E. coli BL21 (DE3) genome that match phage

P1, target the IS1 element in P1. We anticipated that deletion of this IS1 element would abolish or greatly reduce the antiviral activity observed. Against our expectations, E. coli

BL21 (DE3) expressing pAgos showed similar levels of antiviral activity against all P1 phage variants (P1, P1vir and P1virΔIS1) (Figure 7.4A). This unexpected result prompted us to preform additional control experiments in order to validate the specific antiviral activity of the four pAgos against the two phages. We constructed the active site mutant of TtAgo which has a mutated residue in the catalytic tetrad (D478A) and is nuclease inactive. This active site mutant should not be able to degrade the viral genome and should not provide antiviral activity mediated through targeted nuclease activity. In addition, we used the pRSF-1b vector to express an unrelated protein (Cas8e, subunit of the Cascade complex of 253 the Type 1-E CRISPR system of E. coli K-12), which by itself does not have antiviral activity (32). Plaque assays with E. coli BL21 (DE3) expressing the mutant TtAgo

(D478A) or Cas8e showed similar levels of reduction in infectivity of phage λvir and P1, while the other 5 phages remained unaffected (Figure 7.4B). This suggested that the antiviral activity detected in our pAgo screen is not due to specific activity of the

Argonautes, but rather related to protein expression in general.

Figure 7.4. Protein expression in general cause reduction in phage infectivity. (A) Table showing results of phage plate spot plaque assays with the four active pAgos and the three phage P1 variants. Infectivity of all three phages is reduced by pAgo expression, and the IS1 element is not required for antiviral activity. Values are the average of three independent replicates. (B) Table showing results of phage plate spot plaque assays with the TtAgo active site mutant D478A and the inert Cas8e protein. These two proteins have no guided nuclease activity and show that the reduction in phage infectivity is related to protein expression in general, not pAgo expression in particular.

Discussion

The biological role of prokaryotic Argonautes proteins is unknown, but it is speculated that pAgos are involved in defense against mobile genetic elements, such as plasmid, transposons and phages. Here, we set out to directly test the hypothesis that pAgos can defend the cell against invading bacteriophages. We overexpressed 8 prokaryotic

Argonaute proteins in E. coli and challenged with 7 dsDNA E. coli phages. We observed 254 strong antiviral activity against two specific phages, however this activity was not related to pAgo expression in particular but to protein expression in general.

Analysis of pAgo guide sequences revealed that a some of the pAgo proteins are loaded with guides matching phages when expressed in E. coli. The majority of the guides that match to the 7 tested phages are derived from regions of the E. coli genome that share sequence homology with the phages and target phages P1, λvir or T7. No reduction in T7 infectivity was observed and any antiviral activity against phages P1 and λvir mediated by the pAgos is obscured in our screen by the protein expression mediated reduction in viral infectivity of these two phages. Therefore, it remains unknow if pAgos can provide antiviral defense.

Interestingly, our analysis of guide sequence does reveal similarities between the

RsAgo and TtAgo datasets. Both pAgos appear to acquire guides from similar genes in similar numbers. For example, the percentage of guides that match phage λ from the total number of guides that match the E. coli genomes is highly similar for both datasets (both

0.098 %). Furthermore, the most common guide targeting phage λ in both datasets is derived from the same lysis gene (RzpD, Figure 7.3B), possibly pointing towards a conserved pattern of guide acquisition. Olovnikov et al. showed that RNA guides targeting Insertion Sequence 4 (IS4) are highly enriched in RsAgo when it is expressed in the native host R. sphaeroides. We observed that both RsAgo and TtAgo acquire guides targeting IS1 when recombinantly expressed in E. coli. IS1 is also found in phage P1 and guides targeting phage P1 are predominantly derived from this insertion sequence in the case of RsAgo. Guides targeting phage P1 in the TtAgo dataset predominantly come from 255

IS5, an insertion sequence that is not present in E. coli Bl21 (DE3), but is present in E. coli

KRX, the host strain used to express and purify this Argonaute. In our analysis, the predominant guide matching IS5 is the 17th most common guide derived from the E. coli

KRX genome and highly overrepresented in the dataset (Figure 7.3B). This indicates that guides targeting IS elements are efficiently acquired by TtAgo. The association of pAgos with guides derived from Insertion Sequences suggest pAgo may play a role in suppression or spread of these elements. However, bioinformatic analysis has shown no decrease in transposable elements in prokaryotic genomes that contain Argonaute proteins (3).

The protein expression mediated antiviral activity seen against phages λvir and P1 in our screen remains puzzling. This activity is specific against 2 phages while the other 5 phages show no defect in infectivity. We observed that this activity is related to the E. coli strain used. Expression of pAgos in E. coli BL21-AI and T7 express lysY E. coli, showed similar antiviral activity while no antiviral activity was observed when pAgos are expressed in E. coli MDS42 or E. coli KRX (data not shown). Interestingly, all strains that showed the antiviral activity are derived from E. coli B stains, while the strains that showed no activity are derived from E. coli K12 strains. Furthermore, when we expressed the pAgos from the lower copy number plasmid pCDF-1b we did not observe any antiviral activity (data not shown), suggesting the activity is also related to the expression level of the protein. The mechanism of protein expression mediated antiviral activity is the subject of ongoing research.

256

Materials and Methods

E. coli strains

Experiments were carried out in a legacy E. coli BL21 (DE3) strain available in the laboratory unless stated otherwise. Additional strains used: E. coli KRX (Promega), E. coli

MDS42 ScarabXpress® T7 lac (Scarab genomics).

Argonaute selection and cloning

Plasmids containing Thermus thermophilus Argonaute (TtAgo), Marinitoga piezophila Argonaute (MpAgo) and Natronobacterium gregoryi Argonaute (NgAgo) were obtained from Addgene (#53079, #79249, #78253, respectively). Rhodobacter sphaeroides

Argonaute (RsAgo, WP_011910606) was amplified from genomic DNA from R. sphaeroides strains ATCC 17025. Pseudomonas aeruginosa Argonaute (PaAgo,

WP_016263567) and the upstream and downstream flanking genes (WP_016263566 and

WP_025982329) were amplified from genomic DNA from P. aeruginosa PACS2.

Microcystis aeruginosa Argonaute (MaAgo, WP_012265209), Clostridium sartagoforme

Argonaute (CsAgo, WP_016205751) and Paraburkholderia xenovorans Argonaute

(PxAgo, WP_011488074) were obtained as E. coli optimized gene blocks from Genscript

(Figure S7.1). All pAgos except PaAgo, were individually cloned into the expression vector pRSF-1b under control of an T7 promotor using OEPR cloning (Overlap Extension

PCR and Recombination in vivo) with primers listed in Table S7.1 (33). Strep-II tags followed by HRV3C cleavage sites were added to the N-terminal ends of the pAgo proteins in the pRSF-1b vectors using Q5 site directed mutagenesis (NEB). PaAgo was cloned with up, - and downstream flanking genes into pRSF-1b. A Strep-II tag followed by HRV3C 257 cleavage site was added to the upstream flanking gene (WP_016263566) in the pRSF-1b vector.

Phage preparation and purification

Phages T3 and λvir were obtained from the Félix d'Hérelle Reference Center for bacterial viruses. Phage P1vir was obtained from the Coli Genetics Stock Center. Phages

T4, T5 and P1 were kindly provided by U. Qimron. Phages SECphi18 and T7 were kindly provided by R. Sorek. Phage P1virΔIS1 was kindly provided by T. Fehér. All phages except

P1virΔIS1 were grown up and purified from E. coli BL21 (DE3). Phage P1virΔIS1 was grown up and purified from the insertion sequence free E. coli MDS42. Phages were purified according to the “Phage on tap” protocol with minor alterations (34). Briefly, E. coli cells were grown in 100 ml LB supplemented with 2 mM MgSO4 and 2 mM CaCl2 to and OD600 of ± 0.4. 100 µl of high titer phage lysate was added and cells were incubated

overnight at 37 ֯C with agitation. The next day, cells were spun down and supernatant was collected. 2 ml of chloroform was added to the lysates and vortexed for 2 minutes and incubated at room temperature for 15 minutes. Lysates were centrifuged and the supernatant was separated from the chloroform and pellet using a serological pipet. Lysates were filtered-sterilized using a 0.22 µm filter to yield cell free lysates. Lysates were concentrated using a 20 ml Corning Spin-X UF centrifugal concentrator with a 400 kDa molecular weight cut-off. Concentrated lysates were further washed with 2 x 20 ml of SM- buffer (100 mM NaCl, 8 mM MgSO4, 50 mM Tris-HCl pH 7.4) in order to exchange the buffer and achieve desired phage titer. 100 µl aliquots of phage lysate were mixed with

.µl of 50 % glycerol in SM-buffer and frozen at -80 ֯C until use 100 258

Phage plate spot plaque assays

E. coli BL21 (DE3) cells carrying pAgo plasmids were grown in LB + kanamycin to an optical density of ± OD600=1.0. Once grown, 800 µl of culture was mixed with 10 ml of 0.6 % (top) agar either with or without 0.1 mM IPTG and plated on square agar plates containing 1% agar with or without 0.1mM IPTG, to generate induced (+IPTG) and uninduced (-IPTG) plates. As an additional control, E. coli BL21 (DE3) not carrying plasmids was also included. Plates were dried for ±30 minutes at room temperature. Once dry, the E. coli lawns were challenged with phage by stamping on 7 dsDNA phages (T3,

T7, SECφ18, T5, λvir, P1 and T4) spanning a range of 8-logs dilution (Figure 7.2A) using a pinning tool (VP407A from V&P scientific) that transfers approximately 3 µl of phage

suspension per pin. After overnight incubation at 37 ֯C, plates were imagined, and plaques assessed. Infectivity was counted as the highest fold dilution at which a minimum of 2 plaques were visible. Assays were run in triplicate and the average infectivity calculated.

A fold protection score was generated by dividing the average infectivity value of the uninduced plate with the induced plate. Experiments shown in Figure 7.4B were not performed in triplicate.

Phylogenetic Analysis

1010 pAgo protein sequences were retrieved from NCBI using accession numbers from Ryazansky et al. (3). 8 eAgo protein sequences were added identified by Swarts et al

(2). Redundant sequences were removed with CD-HIT using a similarity threshold of 0.9, then locally aligned using MAFFT (35, 36). Domain borders were defined using Batch CD-

Search from NCBI and columns representing MID and PIWI domains were manually 259 extracted and trimmed with trimAl to remove columns composed of more than 50% gaps

(37-39). Sixteen sequences that appeared to be only partial protein sequences were manually removed from the trimmed alignment, before a phylogenetic tree was made using

FastTree, with the WAG and discrete gamma models (40). The maximum-likelihood tree was formatted using the ggtree package in R (41). pAgo guide Analysis

Previously published datasets of small RNA associated with RsAgo when expressed in E. coli BL21 (DE3) was obtained from the Gene Expression Omnibus

(GSM1208314) and small interfering associated with TtAgo when expressed E. coli

KRX was obtained from Sequence Read Archive (SRS540697) (9, 16). FASTQ files were filtered for high quality reads (Phread > 20) and redundant reads were concatenated and converted to FASTA using the FASTX-Toolkit (42). Using BOWTIE and a custom R- script, small RNA and DNA reads shorter than 14 nt, and reads that aligned to the pAgo expression plasmids were removed from each dataset (43). BOWTIE was used to align the remaining reads to the bacteriophage and bacterial genomes, with 0 mismatches allowed.

Alignments were analyzed in Excel and graphed in R.

260

References

1. Makarova KS, Wolf YI, van der Oost J, Koonin EV. Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements. Biology Direct. 2009;4(1):29.

2. Swarts DC, Makarova K, Wang Y, Nakanishi K, Ketting RF, Koonin EV, et al. The evolutionary journey of Argonaute proteins. Nature structural & molecular biology. 2014;21(9):743-53.

3. Ryazansky S, Kulbachinskiy A, Aravin AA. The Expanded Universe of Prokaryotic Argonaute Proteins. mBio. 2018;9(6):e01935-18.

4. Lisitskaya L, Aravin AA, Kulbachinskiy A. DNA interference and beyond: structure and functions of prokaryotic Argonaute proteins. Nat Commun. 2018;9(1):5165.

5. Willkomm S, Makarova KS, Grohmann D. DNA silencing by prokaryotic Argonaute proteins adds a new layer of defense against invading nucleic acids. FEMS microbiology reviews. 2018;42(3):376-87.

6. Fu L, Xie C, Jin Z, Tu Z, Han L, Jin M, et al. The prokaryotic Argonaute proteins enhance homology sequence-directed recombination in bacteria. Nucleic Acids Res. 2019;47(7):3568-79.

7. Kuzmenko A, Yudin D, Ryazansky S, Kulbachinskiy A, Aravin AA. Programmable DNA cleavage by Ago nucleases from mesophilic bacteria Clostridium butyricum and Limnothrix rosea. Nucleic Acids Res. 2019;47(11):5822-36.

8. Hegge JW, Swarts DC, Chandradoss SD, Cui TJ, Kneppers J, Jinek M, et al. DNA- guided DNA cleavage at moderate temperatures by Clostridium butyricum Argonaute. Nucleic Acids Res. 2019;47(11):5809-21.

9. Swarts DC, Jore MM, Westra ER, Zhu Y, Janssen JH, Snijders AP, et al. DNA- guided DNA interference by a prokaryotic Argonaute. Nature. 2014;507(7491):258-61.

10. Swarts DC, Hegge JW, Hinojo I, Shiimori M, Ellis MA, Dumrongkulraksa J, et al. Argonaute of the archaeon Pyrococcus furiosus is a DNA-guided nuclease that targets cognate DNA. Nucleic Acids Research. 2015;43(10):5120-9. 261

11. Kaya E, Doxzen KW, Knoll KR, Wilson RC, Strutt SC, Kranzusch PJ, et al. A bacterial Argonaute with noncanonical guide RNA specificity. Proceedings of the National Academy of Sciences. 2016;113(15):4057-62.

12. Willkomm S, Oellig CA, Zander A, Restle T, Keegan R, Grohmann D, et al. Structural and mechanistic insights into an archaeal DNA-guided Argonaute protein. Nature Microbiology. 2017;2:17035.

13. Zander A, Willkomm S, Ofer S, van Wolferen M, Egert L, Buchmeier S, et al. Guide-independent DNA cleavage by archaeal Argonaute from Methanocaldococcus jannaschii. Nature Microbiology. 2017;2:17034.

14. Yuan Y-R, Pei Y, Ma J-B, Kuryavyi V, Zhadina M, Meister G, et al. Crystal Structure of A. aeolicus Argonaute, a Site-Specific DNA-Guided Endoribonuclease, Provides Insights into RISC-Mediated mRNA Cleavage. Molecular Cell. 2005;19(3):405-19.

15. Ma JB, Yuan YR, Meister G, Pei Y, Tuschl T, Patel DJ. Structural basis for 5'-end- specific recognition of guide RNA by the A. fulgidus Piwi protein. Nature. 2005;434(7033):666-70.

16. Olovnikov I, Chan K, Sachidanandam R, Newman Dianne K, Aravin Alexei A. Bacterial Argonaute Samples the Transcriptome to Identify Foreign DNA. Molecular Cell. 2013;51(5):594-605.

17. Makarova KS, Wolf YI, Koonin EV. Comparative genomics of defense systems in archaea and bacteria. Nucleic Acids Research. 2013;41(8):4360-77.

18. Koonin EV. Evolution of RNA- and DNA-guided antivirus defense systems in prokaryotes and eukaryotes: common ancestry vs convergence. Biology Direct. 2017;12(1):5.

19. Swarts DC, Szczepaniak M, Sheng G, Chandradoss SD, Zhu Y, Timmers EM, et al. Autonomous Generation and Loading of DNA Guides by Bacterial Argonaute. Mol Cell. 2017;65(6):985-98.e6.

20. Kallberg M, Wang H, Wang S, Peng J, Wang Z, Lu H, et al. Template-based protein structure modeling using the RaptorX web server. Nat Protoc. 2012;7(8):1511-22.

21. Matsumoto N, Nishimasu H, Sakakibara K, Nishida KM, Hirano T, Ishitani R, et al. Crystal Structure of Silkworm PIWI-Clade Argonaute Siwi Bound to piRNA. Cell. 2016;167(2):484-97.e9. 262

22. Lee KZ, Mechikoff MA, Kikla A, Liu A, Pandolfi P, Gimble FS, et al. NgAgo- enhanced homologous recombination in E. coli is mediated by DNA endonuclease activity. bioRxiv. 2019:597237.

23. Sunghyeok Y, Taegeun B, Kyoungmi K, Omer H, Seung Hwan L, Yoon Young K, et al. DNA-dependent RNA cleavage by the Natronobacterium gregoryi Argonaute. bioRxiv. 2017.

24. Lee SH, Turchiano G, Ata H, Nowsheen S, Romito M, Lou Z, et al. Failure to detect DNA-guided genome editing using Natronobacterium gregoryi Argonaute. Nat Biotechnol. 2016;35(1):17-8.

25. Wu Z, Tan S, Xu L, Gao L, Zhu H, Ma C, et al. NgAgo-gDNA system efficiently suppresses hepatitis B virus replication through accelerating decay of pregenomic RNA. Antiviral research. 2017;145:20-3.

26. Hershey DA, ed. The Bacteriophage Lambda. Cold Spring Harbor Laboratory Press. 1971.

27. Lobocka MB, Rose DJ, Plunkett G, 3rd, Rusin M, Samojedny A, Lehnherr H, et al. Genome of bacteriophage P1. J Bacteriol. 2004;186(21):7032-68.

28. Jeong H, Barbe V, Lee CH, Vallenet D, Yu DS, Choi SH, et al. Genome sequences of Escherichia coli B strains REL606 and BL21(DE3). J Mol Biol. 2009;394(4):644-52.

29. Studier FW, Daegelen P, Lenski RE, Maslov S, Kim JF. Understanding the differences between genome sequences of Escherichia coli B strains REL606 and BL21(DE3) and comparison of the E. coli B and K-12 genomes. J Mol Biol. 2009;394(4):653-80.

30. Luria SE, Adams JN, Ting RC. Transduction of lactose-utilizing ability among strains of E. coli and S. dysenteriae and the properties of the transducing phage particles. Virology. 1960;12:348-90.

31. Feher T, Karcagi I, Blattner FR, Posfai G. Bacteriophage in the lytic state using the lambda red recombinases. Microbial biotechnology. 2012;5(4):466-76.

32. Brouns SJ, Jore MM, Lundgren M, Westra ER, Slijkhuis RJ, Snijders AP, et al. Small CRISPR RNAs guide antiviral defense in prokaryotes. Science. 2008;321(5891):960-4.

263

33. Liu CJ, Jiang H, Wu L, Zhu LY, Meng E, Zhang DY. OEPR Cloning: an Efficient and Seamless Cloning Strategy for Large- and Multi-Fragments. Sci Rep. 2017;7:44648.

34. Bonilla N, Rojas MI, Netto Flores Cruz G, Hung SH, Rohwer F, Barr JJ. Phage on tap-a quick and efficient protocol for the preparation of bacteriophage laboratory stocks. PeerJ. 2016;4:e2261.

35. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics (Oxford, England). 2006;22(13):1658-9.

36. Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33(2):511-8.

37. Marchler-Bauer A, Bryant SH. CD-Search: protein domain annotations on the fly. Nucleic Acids Res. 2004;32(Web Server issue):W327-31.

38. Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, et al. CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 2011;39(Database issue):D225-9.

39. Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics (Oxford, England). 2009;25(15):1972-3.

40. Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490.

41. Yu G, Smith DK, Zhu H, Guan Y, Lam TT-Y. ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017;8(1):28-36.

42. Hannon GJ. FASTX-Toolkit 2010 [Available from: http://hannonlab.cshl.edu/fastx_toolkit.

43. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.

264

Supplementary Figures

Figure S7.1 E. coli optimized Prokaryotic Argonautes MaAgo ATGAACTACACCGCGGCCAACACGGCAAATAGTCCGATCTTCCTGTCCGAAATTAGCTCTCTGACCCTGAAAAATAGTT GCCTGAACTGTTTTCAACTGAATCACCAGGTGACGCGTAAAATCGGTAACCGCTTTAGCTGGCAGTTCTCTCGCAAATT TCCGGATGTCGTGGTTATTTTCGAAGACAATTGCTTTTGGGTTCTGGCTAAAGATGAAAAATCTATCCCGTCCCTGCAAC AGTGGAAAGAAGCGCTGTCGGACATTCAGGAAGTGCTGCGTGAAGATATCGGCGACCATTATTACAGCATTCACTGGC TGAAAGATTTTCAGATTACCGCGCTGGTTACGGCCCAACTGGCAGTCCGCATTCTGAAAATCTTTGGTAAATTCTCCGA TCCGATCGTTTTCCCGAAAGACTCACAGATTTCGGAAAACCAAGTGCAGGTTCGTCGCGAAGTGAATTTTTGGGCTGAA ATTATCAACGATACCGACCCGGCGATCTGTCTGACGGTCGATAGTTCCATTGTGTATTCAGGCGACCTGGAACAGTTTT ATGAAAATCATCCGTACCGTCAAGATGCCGTCAAACTGCTGGTTGGCCTGAAAGTTAAAGACCGCGAAACCAACGGTA CGGCTAAAATTATCCGTATTGCGGGCCGCATCGGTGAACGTCGCGAAGATCTGCTGACCAAAGCAACGGGTTCAATCTC GCGTCGCAAACTGGAAGAAGCCCACCTGGGTCAACCGGTCGTGGCAGTGCAGTTCGGTAAAAATCCGCAAGAATACAT CTACCCGCTGGCAGCTCTGAAACCGTGGGTTACCGATGAAGACGAAAGCCTGTTCCAGGTCAACTATGGCAATCTGCTG AAAGCCACGAAAATCTTTTATGCAGAACGTCAGGAACTGCTGAAACTGTACAAACAAGAAGCTCAGAAAGCGCTGAAC AATTTTGGTTTCCAGCTGCGCGAAAAATCTATCAATTCTCAAGAATACCCGGAACTGTTCTGGACCCCGAGCATTTCTAT CGAACAGACGCCGATTCTGTTTGGCCAAGGTGAACGTGGCGAAAAACGCGAAATTATCAAAGGTCTGAGCAAAGGCGG TGTCTATAAACGTCATCGCGAATACGTGGATCCGGCCCGTAAAATTCGCCTGGCTATCCTGAAACCGGCGAACCTGAAA GTTGGCGACTTCCGTGAACAGCTGGAAAAACGCCTGAAACTGTACAAATTCGAAACCATCCTGCCGCCGGAAAACCAG ATTAATTTCTCTGTGGAAGGCCTGGGTTTTGAAAAACGTGCCCGCCTGGAAGAAGCAGTTGATCGTCTGATTGGCGTCG AAATCCCGGTGGACATTGCCCTGGTCTTCCTGCCGCAGGAAGATCGCAACGCAGACAATACCGAAGAAGGTAGTCTGT ATTCCTGGATTAAACGTAAATTTCTGGGCCGCGGTGTGATCACCCAGATGATCTACGAAAAAACGCTGAACGATAAAA GCAACTACAAAAACATCCTGAACCAGGTTGTCCCGGGCATTCTGGCGAAACTGGGTAACCTGCCGTATGTTCTGGCTGA ACCGCTGGAAATCGCGGATTACTTCATTGGCCTGGACGTGGGTCGTATGCCGAAGAAAAACCTGCCGGGCTCACTGAA CGTTTGCGCGTCGGTCCGCCTGTATGGCAAACAGGGTGAATTTGTGCGTTGTCGCGTTGAAGATAGTCTGACCGAAGGT GAAGAAATCCCGCAGCGTATTCTGGAAAATTGCCTGCCGCAGGCCGAACTGAAAAACCAAACGGTGCTGATTTACCGC GATGGCAAATTCCAGGGTAAAGAAGTTGAAAATCTGCTGGCACGTGCACGTGCAATCAACGCAAAATTCATCCTGGTT GAATGTTACAAAACCGGCATCCCGCGTCTGTACAATCTGCAACAGAAACAGATTAACGCACCGAGCAAAGGTCTGGCT CTGGCGCTGTCTAATCGCGAAGTGATTCTGATCACCTCACAAGTGTCGGAACAGATCGGTGTTCCGCGTCCGCTGCGTC TGAAAGTCCATGAACTGGGTGAACAACGTAACCTGAAACAGCTGGTTGATACCACGCTGAAACTGACCCTGCTGCACT ATGGCAGCCTGAAAGATCCGCGCCTGCCGATTCCGCTGTATGGTGCCGACATTATCGCATACCGTCGTCTGCAAGGCAT TTACCCGTCCCTGCTGGAAGATGATTGTCAGTTTTGGCTG CsAgo ATGAAAGAATTCAACGTCATTACCGAATTCAAAAACGGCATCAACTCTAAAAGTATCGAAATCTACATCTACAAAATG ATGGTGCGTGATTTCGAAAAACGCCACAACGAAAATTATGACGTCGTGAAAGAACTGATCAACCTGAACAATAACTCC ACCATCGTGTTTTATGAACAATACATCGCATCATTCAAAGAAATCGAAAAATGGGGTAACGAACAGTACATCAACGTT GAAAAACGTGCTATCAACCTGGAATCAAACGAAAAGAAAATTCTGGAACGCCTGCTGCTGAAAGAAATCAAAAACAA CATCGATAACAACAAATATAAAGTTGTCAAAGACAGCATCTACATCAACAAACCGGTTTACAACGAAAAAGGCATCAA AATCGATCGTTACTTCAACCTGGACATTAATGTCGAATCGAACGGCGATATCATCATCGGTTTCGACATCAGCCATAAC TTCGAATACATCAACACCCTGGAATACGAAATTAAAAACAACAACATCAAAATCGGTGATCGCGTCAAAGACTATTTTT ACAACCTGACGTATGAATACGTGGGCATTGCGCCGTTCACCATCTCGGAAGAAAATGAATATATGGGTTGCAGCATCGT TGATTACTACGAAAACAAAAACCAGAGCTACATCGTCAACAAACTGCCGAAAGACATGAAAGCCATCCTGGTGAAAAA TAACAAAAACTCCATCTTCCCGTACATCCCGTCACGTCTGAAGAAAGTGTGTCGCTTTGAAAATCTGCCGCAGAACGTG CTGCGTGATTTCAATACCCGCGTTAAACAAAAAACGAACGAAAAAATGCAGTTCATGGTCGACGAAGTGATCAACATC GTCAAAAACTCTGAACACATCGATGTGAAAAAGAAAAACATGATGTGCGACAACATTGGCTATAAAATCGAAGATCTG CAACAGCCGGACCTGCTGTTTGGTAATGCACGTGCACAGCGTTATCCGCTGTACGGTCTGAAAAACTTCGGTGTTTACG AAAACAAACGTATCGAAATCAAATACTTCATCGATCCGATCCTGGCCAAATCGAAAATGAACCTGGAAAAAATCAGCA AATTCTGTGACGAACTGGAACAGTTCAGCTCTAAACTGGGCGTTGGTCTGAATCGCGTCAAACTGAACAACATCGTGAA CTTCAAAGAAATCCGTATGGATAACGAAGACATTTTCTCTTACGAAATCCGCAAAATCGTTAGTAACTACAATGAAACC ACGATTGTCATCCTGTCAGAAGAAAACCTGAACAAATACTACAACATCATTAAGAAAACCTTTTCGGGCGGTAACGAA GTTCCGACGCAGTGCATTGGTTTCAATACCCTGTCCTACACGGAGAAAAACAAAGATTCAATTTTTCTGAACATCCTGC TGGGCGTCTACGCAAAAAGCGGTATTCAGCCGTGGATCCTGAATGAAAAACTGAACTCTGATTGTTTCATCGGCCTGGA CGTTAGTCGTGAAAACAAAGTGAATAAAGCTGGTGTTATTCAAGTGGTTGGCAAAGATGGTCGCGTCCTGAAAACGAA AGTGATCAGTTCCTCACAGAGTGGCGAAAAAATTAAACTGGAAACCCTGCGTGAAATTGTGTTTGAAGCAATCAATTCC TATGAAAACACCTACCGCTGTAAACCGAAACATATCACGTTCCACCGTGATGGTATTAACCGCGAAGAACTGGAAAAC CTGAAAAATACCATGACGAATCTGGGCGTTGAATTCGATTACATCGAAATCACCAAAGGTATCAACCGTCGCATTGCTA CGATCAGCGAAGGCGAAGAATGGAAAACCATTATGGGTCGTTGCTATTACAAAGATAATTCTGCGTATGTGTGTACCA CGAAACCGTACGAAGGCATCGGTATGGCCAAACCGATTCGCATCCGTCGCGTGTTTGGCACGCTGGATATCGAAAAAA TCGTTGAAGACGCATACAAACTGACCTTCATGCATGTGGGTGCTATTAACAAAATCCGTCTGCCGATTACCACGTATTA CGCGGATCTGTCGAGCACCTATGGCAACCGTGACCTGATTCCGACCAACATTGACACCAACTGCCTGTATTTCATC

265

PxAgo ATGCATATCCACGCAGCACTGCAGACCGGTAGCTCTGTCCGTCGCCTGGGTGTTGATGCACTGGTCCGTTCACTGGCTG TTTCGCAGGACCGTCCGCTGATGCTGTTTCTGGGTGCAGGTGCATCCATGACCTCAGGCATGCCGTCCGCGAATCAATG TATCTGGGAATGGAAACGTGATATTTTTCTGTCAAACAATCCGGGCATCGAAGAACAGTTCTCGGAACTGAGCCTGCCG TCTGTGCGTGATCGCATTCAGACCTGGCTGGACCGTCAACGCTGCTATCCGGTTGCGGGCCATCCGGATGAATATGGTG CGTACATTGAAGCCTGTTTTTCACGCTCGGATGACCGTCGCCGTTATTTCGAACGTTGGGTGAAACAATCGACGCCGCA CACCGGTTACCGTCTGCTGGCAGAACTGGCAGCTAGCGGTCTGATCCAGACCGTGTGGACCACGAACTTTGATGGTCTG ATTGCACGTGCGGCCGTTGCTACGAATCTGACCAGCATTGAAATCGGCATTGATTCTCAACAGCGTCTGTATCGTGCAC CGGGTAAAGACGAACTGGCATGCGTTTCTATGCATGGTGATTATCGTTACGACCGCCTGAAAAACAGTCCGGGCGAAC TGGCGCAGGTCGAAGTGCAACTGCGTGATAGCCTGATTGAAGCACTGCGTACGCACACCGTCGTGGTTGCAGGCTATA GTGGTCGTGATGAATCCGTGATGCAGGCATTTCGCCAATACGCAGCTTCTGGTCCGGCTCGTACGGATCTGCCGCTGTT CTGGACCCAGTATGGCGAAGATCCGCCGCTGGACACGGTTAGCGCGTTTCTGTCTACCAATGATGACGAACCGTCACGC TTTATCGTGCCGGGTGTTTCGTTCGATGACCTGATGCGCCGTCTGGCACTGTACCTGAGCAAAGGCCCGGCTCGTGATC GCGTCAACAAAATTCTGGACGAACATGCAACCACGCCGGTGAATCAGCTGACGGCTTTCGGTCTGCCGCCGCTGCCGC CGACCGGTCTGATCAAATCTAACGCGATTCCGCTGACCCCGCCGCAGGAACTGCTGGAATTTGATCTGCATCAATGGCC GGCAAGTGGTACGGTCTGGGCAACCCTGCGTGAACTGGGTGATAAACACAATTTTGTGGCGGCCCCGTTCCGCAGTAA AATCTATGCTATCGCGATTGCCGAATCCCTGCGTCTGGCCTTTGGCGAAAACCTGAAAGGTGAAATCAAACGCGTGCCG CTGAATGATGACGATCTGCGTTACGAAGATGGTGTTATTAACCAGCTGGTCCGCCGTGCAACGGTGCTGGCTCTGTCCG CAAAAGCTAATTGCCCGAGTGATGGCGAATCCCTGATTTGGACCTCAGAAAAAGTTGAAAACCTGCGTCTGGACCGCG TGGATTGGAAAGTTCATCAGGCCGTTCTGGTCCAAATCCGCCCGCTGGGTACCGAAATGGCACTGGTTCTGAAACCGAC GCTGTATGTCACCGATAAAAGTGGTGCGATTGCCCCGAAAGACACGGAACGTCTGGTCAAACAGCGCGTGCTGGGCTA CCAACACAACAAAGAATTTAATGATGCGACCGAAGCATGGCGCCGTCGCCTGGTGCCGCAGCGTGACTTTCATGTTCG CTTCCCGGACCACGAAGATGGCATCGACCTGACCTTTTCGGGTCGTCCGCTGTTCGCACGCATTACGGATGAACGTGAA CGCACCGTGAGCCTGAGTTCCGCACAGGAACTGGCAGCTCGTCAGGCAGGTCTGCAACTGGCAGAACCGCGTCTGAAA TTTGCACGTAAAAGCGCAGCAGGTCTGGCATTCGATACCCATCCGGTGCGTGGTCTGATCAACAATCGCCCGTTCGACT CATCGCTGACCACGACCGGCATCGCGAGCTCTATTCGCGTTGGTATTATCGCACCGGCTCAAGATGCCACCCGTGTTCA TCAGTATCTGTCCCAACTGCACGTCGCAGCTCAGCCGGGCAAAGACGCGGATTATCTGCCGCCGTTTCCGGGTTTCGCG TCAGCCTACCAGTGTCCGCTGGAAATTCCGGCCGTGGGTGAACAGAGCTTTGTTCAACTGGATGAACCGGACTCTATGA CGCCGAGTTCCGCACGTGCTCTGGCAGGTGCAATCACCCGTAGCATTGCAAGCCTGTCTGCCAGTCAGCGCCCGGATGT TACCATTATCTATGTCCCGGACCGTTGGGCCCCGCTGCGTAACTACATGATCGACGATGAAGAATTTGATCTGCACGAC TTCGTTAAAGCGGCCGCAATTCCGAAAGGTTGCGCAACCCAGTTTGTCGAAGAAGATACGCTGCGCAATACCCAACAG CAATGTCGTGTGCGTTGGTGGCTGAGTCTGGCCCTGTATGTTAAATCCATGCGTACGCCGTGGACCCTGGAAGGCCTGA GTGAAAAATCCGCATACGTCGGCCTGGGTTTCTCTGTGAAACGTAAAACGACCCAGAACGCAGGCGCTCATGTCGTGCT GGGTTGCTCACACCTGTATTCGCCGAATGGCATTGGTCTGCAGTTTCGCCTGTCTAAAATCGAAGATCCGATTATGCGT AACAAAAATCCGTTTATGAGTTTCGACGATGCGCGTCGCCTGGGCGAAGGTATCCGCGAACTGTTTTTCGCTGCACAGC TGCGTCTGCCGGAACGTGTTGTCATTCATAAACAAACGCCGTTCCTGCGTGAAGAACGTAGCGGTCTGCAGGCAGGTCT GGAAGGTGTCGCTTGTGTGGAACTGCTGCAGATTTTCGTCGACGATACCCTGCGTTATGTGGCGAGCCACCCGACGTCT GATGGCAAATTTGAAACCGACAACTACCCGATCCGTCGCGGTACGACCGTGGTTATTGACGATCATACGGCCCTGCTGT GGGTGCACGGTGCAAGTACCGCTCTGAATCCGCGTCGCCATTATTTTCAGGGCAAACGTCGCATCCCGGCACCGCTGGT GATTCGTCGCCACGCTGGCACGACCGATCTGATGACCATCGCGGACGAAGTTCTGGGTCTGTCAAAAATGAACTTCAAC TCGTTCGATCTGTACGGCCAGCTGCCGGCGACGATTGAAACCAGCCGTCGCGTGGCCAAAATTGGTGCTCTGCTGGACC GCTTTTCGGAACACTCGTATGACTACCGCCTGTTTATG

266

Supplementary Tables

Table S7.1. Primers used in Overlap Extension PCR and Recombination in vivo. Lowercase sequence matches pRSF-1b backbone, uppercase sequence matches Argonautes Primer Description Sequence (5'-3') PvE 334 pRSF-1b backbone amplification FW taattaacctaggctgctg PvE 335 pRSF-1b backbone amplification RV catggtatatctccttattaaagttaaac PvE 336 PxAgo OEPR FW ctttaataaggagatataccatgCATATCCACGCAGCACTG PvE 337 PxAgo OEPR RV gtggcagcagcctaggttaattaCATAAACAGGCGGTAGTC PvE 338 CsAgo OEPR FW ctttaataaggagatataccatgAAAGAATTCAACGTCATTACC PvE 339 CsAgo OEPR RV gtggcagcagcctaggttaattaGATGAAATACAGGCAGTTG PvE 340 MaAgo OEPR FW ctttaataaggagatataccatgAACTACACCGCGGCCAAC PvE 341 MaAgo OEPR RV gtggcagcagcctaggttaattaCAGCCAAAACTGACAATCATCTTC PvE 342 NgAgo OEPR FW ctttaataaggagatataccatgACAGTGATTGACCTCGATTC PvE 343 NgAgo OEPR RV gtggcagcagcctaggttaattaGAGGAATCCGACATTAGAC PvE 344 RsAgo OEPR FW ctttaataaggagatataccatgGCCCCAGTGCAGGCTGCC PvE 345 RsAgo OEPR RV gtggcagcagcctaggttaattaTAGGAACCAGCGGCTCCACTTG PvE 346 MpAgo OEPR FW ctttaataaggagatataccatgTACCTGAATCTGTATAAAATCG PvE 347 MpAgo OEPR RV gtggcagcagcctaggttaattaGTACGGAATATACAGCTTC PvE 348 PaAgo 3 gene OEPR FW ctttaataaggagatataccatgCGAGCGCATCCAGCTCAC PvE 349 PaAgo 3 gene OEPR RV gtggcagcagcctaggttaattaCCAACCCCCTTCGACAGC PvE 350 TtAgo OEPR FW ctttaataaggagatataccatgAACCACCTTGGAAAAACG PvE 351 TtAgo OEPR RV gtggcagcagcctaggttaattaAACGAAGAAGAGCTTTTC

267

GENERAL CONCLUSION AND OUTLOOK

All living organisms face the threat of viral predation and have evolved extensive antiviral defense mechanisms to mitigate this threat. Prokaryotes, the most abundant living organisms, are often outnumbered 10 to 1 by bacteriophages in their natural environments and therefore, require efficient defense systems for survival. Viruses on the other hand, are obligate intercellular parasites that rely on infection of the host for perpetuating their existence, driving them to evolve countermeasures against the prokaryotic defense mechanisms. This evolutionary arms race has given rise to many adaptations, both on the host and phage side, and has played a large role in driving evolution.

The interface between virus and host is an ideal arena for scientific discovery.

Firstly, it facilitates the study of evolution as it happens, as the short replication times of prokaryotes and their phages allow one to study many generations in a short period of time.

Secondly, this interface has proven to be a goldmine for discovery of new proteins that can be adapted as biotechnological tools.

During my doctoral program, the use of CRISPR derived tools for has rapidly expanded. CRISPR based genetic modification is now common in the plant breeder world, and soon we will likely see the first CRIPSR edited crops reach the marketplace. In medicine, the first clinical trials with CRISPR based therapies are underway, testing potential treatments for cancer, blood disorders, and blindness. More controversial modification of the human germline has also taken place, and the first

CRISPRed humans have been born. 268

Looking beyond CRISPR to the next generation of biotechnological tools,

Argonaute proteins come to mind. Similar to CRISPR, the Argonautes are nucleic acid guided nuclease, which makes them ideal for similar applications as CRISPR derived genetic tools.

Prokaryotic Argonautes proteins represent an underexplored research area were many exciting discoveries are yet to be made. Likely, the phylogenetic diversity of the pAgos points toward functional diversity, which potentially entails another goldmine for discovery of new tools. Already, we have seen the first applications derived from pAgo proteins. PfAgo has been repurposed as an Artificial (ARE), that can be programed to cleave at any desired location, and can be utilized in

(1). NAVIGATER is a TtAgo based tool that is used to detect rare nucleic acids in a complex sample, similar to the CRISPR based tools such as Cas9-DASH, DETECTR, and

SHERLOCK (2-6). As we gain a better understanding on the mechanisms of action of prokaryotic Argonaute proteins, we will surely see many more applications derived from this protein family.

The Type I-E CRISPR System

During my thesis work, considerable progress has been made elucidating the mechanism of action of the Type I-E CRISPR system. Atomic resolution structures of the different components of the system have facilitated more informed questions about the internal workings of the system.

The structures of Cascade before and after target binding, discussed in Chapters 3 and 5, together with the single-strand DNA bound structure of Cascade (7), provide 269 snapshots of the complex during different stages of antiviral defense. Together, these structures show the conformational changes that take place upon DNA binding that lead to

Cas3 recruitment and target degradation. Combined, these structures explain how the PAM sequence is recognized, how directional hybridization of the crRNA guide to the target strand takes place, how full-length hybridization leads to Cascade locking, and how the

Cas3 recruitment site is formed upon target binding.

Target Binding and PAM Recognition

The Cascade target search starts with non-specific interactions of Cascade with dsDNA. Through sliding and hopping interactions, Cascade initially searches for PAM sequences (8). This reduces the complexity of the target search and allows Cascade to quickly scan the vast sequence space present in the cell. The non-specific DNA interactions rely on the lysine-vise discussed in Chapter 4, a large solvent exposed β-hairpin (Loop3) discussed in Chapter 6, and a positively charged patch on the Cse1 (Cas8e) subunit (8, 9).

Cascade complexes missing the Cse1 subunit lack any non-specific DNA binding activity in vivo (10).

After non-specifically binding to dsDNA, Cascade must search for a correct PAM sequence. Recognition of the PAM takes place in double stranded form, from the minor grove and is facilitated by three motifs present in the Cse1 subunit; lysine finger, glycine loop, and glutamate wedge (Chapter 5). These three motifs specify the PAM preference of the system (3’-TTC-5’ on the target strand 5’-AAG-3’ on the non-target strand). 3’-

TTC-5’ represents the optimal PAM sequence, but it is not the only functional PAM that can be bound with high affinity. The minor grove PAM recognition is promiscuous, and 270 type I-E Cascade tolerates up to ~20 different PAM sequences that are bound with varying degrees of affinity. Only four-to-six of these PAM sequences support interference levels high enough to clear plasmids from cells or provide protection against phage (11-15). The

-3 and -2 positions of the PAM are recognized by the lysine finger and glycine loop and exclude G-nucleotides on the target strand. This avoids autoimmunity by ensuring that

Cascade cannot bind to the CRISPR locus from which the guide was derived. The spacer sequences in the CRISPR locus contain the 3’-GGC-5’ PAM sequence, which is highly unfavorable for Cascade binding (11, 12, 15). Possibly, the type I-E CRISPR system has evolved high stringency PAM recognition by the Cas1-Cas2 complex, during the acquisition of new spacers in the adaptation stage (16), while PAM recognition by Cascade during the interference stage has much lower stringency. This ensures that the CRISPR array is loaded with spacers derived from 3’-TTC-5’ PAM containing protospacers, while maintaining the potential benefit that targets with mutated PAM sequences can still be subjected to the interference machinery, either by direct interference or by priming.

Conformational Control and Cas3 Recruitment

The Cse1 subunit is loosely associated within the Cascade complex and recent in vivo work shows it is disassociated from the complex about 40% of the time (10). This loose attachment of Cse1 allows this subunit to adopt distinct conformations within the complex. The structures of Cascade during different stages of antiviral defense, together with single molecule FRET experiments have shown that Cse1 can adopt “open and closed” conformations (7, 14, 17, 18). Strong interference PAMs lead to a predominantly closed conformation that exposes the Cas3 recruitment site, while less optimal PAM 271 sequences drive Cse1 to adopt the open conformation, where the Cas3 recruitment site is buried. This conformational control of the nuclease activity of this system is discussed in

Chapter 2.

The local conformational change in the Cse1 subunit that completes the Cas3 recruitment site is only visualized when both the unbound (Chapter 3) and dsDNA bound

(Chapter 5) structures of Cascade are compared. In Chapter 6, HDX-MS is used to investigate the dynamics of the unbound and dsDNA bound complexes. The HDX map, together with the structures, allowed us to identify specific residues on Cse1. These residues form part of the Cas3 recruitment site that only fully forms upon target binding.

Structures of Thermobifida fusca Cascade during different stages of target recognition show an equivalent local conformational change (19) and a recent structure of target-bound

T. fusca Cascade in complex with Cas3 visualizes the entire Cas3 interaction site (20). This structure shows the equivalent residues in T. fusca, identified in the HDX map for E. coli

Cascade, interacting with Cas3. Finally, the Cascade-Cas3 structure also shows that steric clashing that prevent Cas3 binding when Cascade is in the unbound or in a ssDNA bound conformation. Together, these structures explain how the open conformation of Cse1 leads to reduced Cas3 recruitment and target degradation.

This elegant mechanism of conformational control provides an additional target verification step for the type I-E CRISPR systems. Not only is a sequence match required between the crRNA guide and target DNA, a PAM sequence is also required which facilitates Cse1 to adopt the closed conformation. This ensures that the PAM sequence is not only read during initial target binding, but also during the target degradation step. This 272 feature may prevent autoimmunity during host genome replication, when the CRISPR locus is briefly single stranded and can be bound by Cascade. In this way, even if Cascade binds to the CRISPR locus, the additional PAM read-out by Cse1 will ensure the Cas3 recruitment site is buried and no genome degradation takes place.

During my PhD research I have contributed to the study of CRISPR mediated immunity in Escherichia coli. Studying the protein components of this system helps us understand in molecular detail how these biological machines function. This in turn facilitates discovery of new genetic tools and improvement of existing ones. The rapid progress being made in CRISPR research promises many new exciting challenges and opportunities for the future. I am excited to learn and discover more about the molecular machines that make up prokaryotic antiviral defense systems.

273

References

1. Enghiad B, Zhao H. Programmable DNA-Guided Artificial Restriction Enzymes. ACS synthetic biology. 2017;6(5):752-7.

2. Chen JS, Ma E, Harrington LB, Da Costa M, Tian X, Palefsky JM, et al. CRISPR- Cas12a target binding unleashes indiscriminate single-stranded DNase activity. Science (New York, NY). 2018;360(6387):436-9.

3. Gootenberg JS, Abudayyeh OO, Lee JW, Essletzbichler P, Dy AJ, Joung J, et al. Nucleic acid detection with CRISPR-Cas13a/C2c2. Science (New York, NY). 2017;356(6336):438-42.

4. Song J, Powell MJ, Liu W, Chen J, Bau H. Highly sensitive detection of rare mutant alleles by combining argonaute-based enrichment and XNA-PCR. Journal of Clinical Oncology. 2019;37(15_suppl):e14605-e.

5. Gu W, Crawford ED, O’Donovan BD, Wilson MR, Chow ED, Retallack H, et al. Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications. Genome Biology. 2016;17(1):41.

6. Song J, Hegge JW, Mauk MG, Chen J, Till JE, Bhagwat N, et al. Highly Specific Enrichment of Rare Nucleic Acids using Thermus Thermophilus Argonaute with Applications in Cancer Diagnostics. bioRxiv. 2019:491738.

7. Mulepati S, Heroux A, Bailey S. Structural biology. Crystal structure of a CRISPR RNA-guided surveillance complex bound to a ssDNA target. Science (New York, NY). 2014;345(6203):1479-84.

8. Dillard KE, Brown MW, Johnson NV, Xiao Y, Dolan A, Hernandez E, et al. Assembly and Translocation of a CRISPR-Cas Primed Acquisition Complex. Cell. 2018;175(4):934-46.e15.

9. Xue C, Zhu Y, Zhang X, Shin YK, Sashital DG. Real-Time Observation of Target Search by the CRISPR Surveillance Complex Cascade. Cell reports. 2017;21(13):3717-27.

10. Vink JNA, Martens KJA, Vlot M, McKenzie RE, Almendros C, Bonilla BE, et al. Direct visualization of native CRISPR target search in live bacteria reveals Cascade DNA surveillance mechanism. bioRxiv. 2019:589119.

274

11. Fineran PC, Gerritzen MJ, Suarez-Diez M, Kunne T, Boekhorst J, van Hijum SA, et al. Degenerate target sites mediate rapid primed CRISPR adaptation. Proceedings of the National Academy of Sciences of the United States of America. 2014;111(16):E1629-38.

12. Musharova O, Sitnik V, Vlot M, Savitskaya E, Datsenko KA, Krivoy A, et al. Systematic analysis of Type I-E Escherichia coli CRISPR-Cas PAM sequences ability to promote interference and primed adaptation. Molecular microbiology. 2019;111(6):1558-70.

13. Xue C, Seetharam AS, Musharova O, Severinov K, Brouns SJ, Severin AJ, et al. CRISPR interference and priming varies with individual spacer sequences. Nucleic acids research. 2015;43(22):10831-47.

14. Xue C, Whitis NR, Sashital DG. Conformational Control of Cascade Interference and Priming Activities in CRISPR Immunity. Molecular cell. 2016;64(4):826-34.

15. Cooper LA, Stringer AM, Wade JT. Determining the Specificity of Cascade Binding, Interference, and Primed Adaptation In Vivo in the Escherichia coli Type I-E CRISPR-Cas System. mBio. 2018;9(2).

16. Nuñez JK, Kranzusch PJ, Noeske J, Wright AV, Davies CW, Doudna JA. Cas1– Cas2 complex formation mediates spacer acquisition during CRISPR–Cas adaptive immunity. Nature Structural &Amp; Molecular Biology. 2014;21:528.

17. Hayes RP, Xiao Y, Ding F, van Erp PB, Rajashankar K, Bailey S, et al. Structural basis for promiscuous PAM recognition in type I-E Cascade from E. coli. Nature. 2016;530(7591):499-503.

18. Jackson RN, Golden SM, van Erp PB, Carter J, Westra ER, Brouns SJ, et al. Structural biology. Crystal structure of the CRISPR RNA-guided surveillance complex from Escherichia coli. Science (New York, NY). 2014;345(6203):1473- 9.

19. Xiao Y, Luo M, Hayes RP, Kim J, Ng S, Ding F, et al. Structure Basis for Directional R-loop Formation and Substrate Handover Mechanisms in Type I CRISPR-Cas System. Cell. 2017;170(1):48-60.e11.

20. Xiao Y, Luo M, Dolan AE, Liao M, Ke A. Structure basis for RNA-guided DNA degradation by Cascade and Cas3. Science (New York, NY). 2018;361(6397).

275

REFERENCES CITED

Abudayyeh, O. O., J. S. Gootenberg, S. Konermann, J. Joung, I. M. Slaymaker, D. B. Cox, S. Shmakov, K. S. Makarova, E. Semenova, L. Minakhin, K. Severinov, A. Regev, E. S. Lander, E. V. Koonin, and F. Zhang. 2016. 'C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector', Science, 353: aaf5573.

Adams, P. D., P. V. Afonine, G. Bunkoczi, V. B. Chen, I. W. Davis, N. Echols, J. J. Headd, L. W. Hung, G. J. Kapral, R. W. Grosse-Kunstleve, A. J. Mccoy, N. W. Moriarty, R. Oeffner, R. J. Read, D. C. Richardson, J. S. Richardson, T. C. Terwilliger, and P. H. Zwart. 2010. 'PHENIX: a comprehensive Python-based system for macromolecular structure solution', Acta Crystallogr D Biol Crystallogr, 66: 213- 21.

Afonine, P. V., R. W. Grosse-Kunstleve, N. Echols, J. J. Headd, N. W. Moriarty, M. Mustyakimov, T. C. Terwilliger, A. Urzhumtsev, P. H. Zwart, and P. D. Adams. 2012. 'Towards automated crystallographic structure refinement with phenix.refine', Acta Crystallogr D Biol Crystallogr, 68: 352-67.

Almendros, C., N. M. Guzman, C. Diez-Villasenor, J. Garcia-Martinez, and F. J. Mojica. 2012. 'Target Motifs Affecting Natural Immunity by a Constitutive CRISPR-Cas System in Escherichia coli', PLoS One, 7: e50797.

Anders, Carolin, Ole Niewoehner, Alessia Duerst, and Martin Jinek. 2014. 'Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease', Nature, 513: 569-73.

Barrangou, R., C. Fremaux, H. Deveau, M. Richards, P. Boyaval, S. Moineau, D. A. Romero, and P. Horvath. 2007. 'CRISPR provides acquired resistance against viruses in prokaryotes', Science, 315: 1709-12.

Barrangou, R., and L. A. Marraffini. 2014. 'CRISPR-Cas Systems: Prokaryotes Upgrade to Adaptive Immunity', Mol Cell, 54: 234-44.

Blosser, T. R., L. Loeff, E. R. Westra, M. Vlot, T. Kunne, M. Sobota, C. Dekker, S. J. Brouns, and C. Joo. 2015. 'Two distinct DNA binding modes guide dual roles of a CRISPR-Cas protein complex', Mol Cell, 58: 60-70.

Blosser, T. R., L. Loeff, E. R. Westra, M. Vlot, T. Kunne, M. Sobota, C. Dekker, S. J. J. Brouns, and C. Joo. 2015. 'Two distinct DNA binding modes guide dual roles of a CRISPR-Cas protein complex', Mol Cell, 58: 60-70.

276

Bolotin, A., B. Ouinquis, A. Sorokin, and S. D. Ehrlich. 2005. 'Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin', Microbiology, 151: 2551-61.

Bondy-Denomy, J., and A. R. Davidson. 2014. 'To acquire or resist: the complex biological effects of CRISPR-Cas systems', Trends Microbiol, 22: 218-25.

Bonilla, N., M. I. Rojas, G. Netto Flores Cruz, S. H. Hung, F. Rohwer, and J. J. Barr. 2016. 'Phage on tap-a quick and efficient protocol for the preparation of bacteriophage laboratory stocks', PeerJ, 4: e2261.

Brouns, S. J., M. M. Jore, M. Lundgren, E. R. Westra, R. J. Slijkhuis, A. P. Snijders, M. J. Dickman, K. S. Makarova, E. V. Koonin, and J. van der Oost. 2008. 'Small CRISPR RNAs guide antiviral defense in prokaryotes', Science, 321: 960-4.

Burstein, D., L. B. Harrington, S. C. Strutt, A. J. Probst, K. Anantharaman, B. C. Thomas, J. A. Doudna, and J. F. Banfield. 2017. 'New CRISPR-Cas systems from uncultivated microbes', Nature, 542: 237-41.

Capella-Gutierrez, S., J. M. Silla-Martinez, and T. Gabaldon. 2009. 'trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses', Bioinformatics, 25: 1972-3.

Carroll, Dana. 2014. 'Genome Editing by Targeted Chromosomal Mutagenesis', Methods in Molecular Biology, 1239: 1-13.

Carroll, D. 2014. 'Genome engineering with targetable nucleases', Annu Rev Biochem, 83: 409-39.

Carte, Jason , Ruiying Wang, Hong Li, Rebecca M. Terns, and Michael P. Terns. 2008. 'Cas6 is an endoribonuclease that generates guide RNAs for invader defense in prokaryotes', Genes and Development, 22: 3489-96.

Chan, K. Y., L. G. Trabuco, E. Schreiner, and K. Schulten. 2012. 'Cryo-electron microscopy modeling by the molecular dynamics flexible fitting method', Biopolymers, 97: 678-86.

Chen, J. S., E. Ma, L. B. Harrington, M. Da Costa, X. Tian, J. M. Palefsky, and J. A. Doudna. 2018. 'CRISPR-Cas12a target binding unleashes indiscriminate single- stranded DNase activity', Science, 360: 436-39.

277

Chen, V. B., W. B. Arendall, 3rd, J. J. Headd, D. A. Keedy, R. M. Immormino, G. J. Kapral, L. W. Murray, J. S. Richardson, and D. C. Richardson. 2010. 'MolProbity: all-atom structure validation for macromolecular crystallography', Acta Crystallogr D Biol Crystallogr, 66: 12-21.

Cheng, R., J. Peng, Y. Yan, P. Cao, J. Wang, C. Qiu, L. Tang, D. Liu, L. Tang, J. Jin, X. Huang, F. He, and P. Zhang. 2014. 'Efficient gene editing in adult mouse livers via adenoviral delivery of CRISPR/Cas9', FEBS Lett, 588: 3954-8.

Chevalier, B. S., and B. L. Stoddard. 2001. 'Homing endonucleases: structural and functional insight into the catalysts of intron/intein mobility', Nucleic Acids Research, 29: 3757-74.

Collaborative Computational Project, Number. 1994. 'The CCP4 suite: programs for protein crystallography', Acta Crystallogr D Biol Crystallogr, 50: 760-3.

Cong, L., F. A. Ran, D. Cox, S. L. Lin, R. Barretto, N. Habib, P. D. Hsu, X. B. Wu, W. Y. Jiang, L. A. Marraffini, and F. Zhang. 2013. 'Multiplex Genome Engineering Using CRISPR/Cas Systems', Science, 339: 819-23.

Cooper, L. A., A. M. Stringer, and J. T. Wade. 2018. 'Determining the Specificity of Cascade Binding, Interference, and Primed Adaptation In Vivo in the Escherichia coli Type I-E CRISPR-Cas System', MBio, 9.

Datsenko, K. A., K. Pougach, A. Tikhonov, B. L. Wanner, K. Severinov, and E. Semenova. 2012. 'Molecular memory of prior infections activates the CRISPR/Cas adaptive bacterial immunity system', Nat Commun, 3: 945.

DeLano, W.L. 2002. 'The PyMOL Molecular Graphics System '. http://www.pymol.org.

Deltcheva, E., K. Chylinski, C. M. Sharma, K. Gonzales, Y. Chao, Z. A. Pirzada, M. R. Eckert, J. Vogel, and E. Charpentier. 2011. 'CRISPR RNA maturation by trans- encoded small RNA and host factor RNase III', Nature, 471: 602-7.

Dillard, K. E., M. W. Brown, N. V. Johnson, Y. Xiao, A. Dolan, E. Hernandez, S. D. Dahlhauser, Y. Kim, L. R. Myler, E. V. Anslyn, A. Ke, and I. J. Finkelstein. 2018. 'Assembly and Translocation of a CRISPR-Cas Primed Acquisition Complex', Cell, 175: 934-46.e15.

Ding, Q., A. Strong, K. M. Patel, S. L. Ng, B. S. Gosis, S. N. Regan, C. A. Cowan, D. J. Rader, and K. Musunuru. 2014. 'Permanent alteration of PCSK9 with in vivo CRISPR-Cas9 genome editing', Circ Res, 115: 488-92.

278

Dong, D., K. Ren, X. Qiu, J. Zheng, M. Guo, X. Guan, H. Liu, N. Li, B. Zhang, D. Yang, C. Ma, S. Wang, D. Wu, Y. Ma, S. Fan, J. Wang, N. Gao, and Z. Huang. 2016. 'The crystal structure of Cpf1 in complex with CRISPR RNA', Nature, 532: 522-6.

Doudna, J. A., and E. Charpentier. 2014. 'Genome editing. The new frontier of genome engineering with CRISPR-Cas9', Science, 346: 1258096.

East-Seletsky, A., M. R. O'Connell, S. C. Knight, D. Burstein, J. H. Cate, R. Tjian, and J. A. Doudna. 2016. 'Two distinct RNase activities of CRISPR-C2c2 enable guide- RNA processing and RNA detection', Nature, 538: 270-73.

Ebihara, A., M. Yao, R. Masui, I. Tanaka, S. Yokoyama, and S. Kuramitsu. 2006. 'Crystal structure of hypothetical protein TTHB192 from Thermus thermophilus HB8 reveals a new protein family with an RNA recognition motif-like domain', Protein Sci, 15: 1494-9.

Elmore, J. R., N. F. Sheppard, N. Ramia, T. Deighan, H. Li, R. M. Terns, and M. P. Terns. 2016. 'Bipartite recognition of target RNAs activates DNA cleavage by the Type III-B CRISPR-Cas system', Genes Dev, 30: 447-59.

Emsley, P., and K. Cowtan. 2004. 'Coot: model-building tools for molecular graphics', Acta Crystallographica Section D-Biological Crystallography, 60: 2126-32.

Engen, John R. 2009. 'Analysis of Protein Conformation and Dynamics by Hydrogen/Deuterium Exchange MS', Analytical Chemistry, 81: 7870-75.

Enghiad, B., and H. Zhao. 2017. 'Programmable DNA-Guided Artificial Restriction Enzymes', ACS Synth Biol, 6: 752-57.

Estrella, M. A., F. T. Kuo, and S. Bailey. 2016. 'RNA-activated DNA cleavage by the Type III-B CRISPR-Cas effector complex', Genes Dev, 30: 460-70.

Evans, P. R., and G. N. Murshudov. 2013. 'How good are my data and what is the resolution?', Acta Crystallographica Section D-Biological Crystallography, 69: 1204-14.

Feher, T., I. Karcagi, F. R. Blattner, and G. Posfai. 2012. 'Bacteriophage recombineering in the lytic state using the lambda red recombinases', Microb Biotechnol, 5: 466- 76.

Fineran, P. C., M. J. Gerritzen, M. Suarez-Diez, T. Kunne, J. Boekhorst, S. A. van Hijum, R. H. Staals, and S. J. Brouns. 2014. 'Degenerate target sites mediate rapid primed CRISPR adaptation', Proc Natl Acad Sci U S A, 111: 1629-38.

279

Fu, L., C. Xie, Z. Jin, Z. Tu, L. Han, M. Jin, Y. Xiang, and A. Zhang. 2019. 'The prokaryotic Argonaute proteins enhance homology sequence-directed recombination in bacteria', Nucleic Acids Res, 47: 3568-79.

Gao, P., H. Yang, K. R. Rajashankar, Z. Huang, and D. J. Patel. 2016. 'Type V CRISPR- Cas Cpf1 endonuclease employs a unique mechanism for crRNA-mediated target DNA recognition', Cell Res, 26: 901-13.

Garside, E. L., M. J. Schellenberg, E. M. Gesner, J. B. Bonanno, J. M. Sauder, S. K. Burley, S. C. Almo, G. Mehta, and A. M. Macmillan. 2012. 'Cas5d processes pre-crRNA and is a member of a larger family of CRISPR RNA endonucleases', RNA Biol, 18: 2020-28.

Gasiunas, G., T. Sinkunas, and V. Siksnys. 2014. 'Molecular mechanisms of CRISPR- mediated microbial immunity', Cell Mol Life Sci, 71: 449-65.

Gesner, E.M., M.J. Schellenberg, E.L. Garside, M.M. George, and A.M. MacMillan. 2011. 'Recognition and maturation of effector RNAs in a CRISPR interference pathway ', Nat Struct Mol Biol, 18: 688-92.

Glorikian, Harry. 2015. 'Gene Editing Will Change Everything—Just Not All at One Time'. http://www.genengnews.com/insight-and-intelligence/gene-editing-will-change- everything-just-not-all-at-one-time/77900351/.

Goddard, T. D., C. C. Huang, and T. E. Ferrin. 2005. 'Software extensions to UCSF chimera for interactive visualization of large molecular assemblies', Structure, 13: 473-82.

Gootenberg, J. S., O. O. Abudayyeh, J. W. Lee, P. Essletzbichler, A. J. Dy, J. Joung, V. Verdine, N. Donghia, N. M. Daringer, C. A. Freije, C. Myhrvold, R. P. Bhattacharyya, J. Livny, A. Regev, E. V. Koonin, D. T. Hung, P. C. Sabeti, J. J. Collins, and F. Zhang. 2017. 'Nucleic acid detection with CRISPR-Cas13a/C2c2', Science, 356: 438-42.

Gu, W., E. D. Crawford, B. D. O’Donovan, M. R. Wilson, E. D. Chow, H. Retallack, and J. L. DeRisi. 2016. 'Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications', Genome Biology, 17: 41.

Guttman M, Weiss DD, Engen JR, Lee KK. 2013. 'Analysis of overlapped and noisy hydrogen/deuterium exchange mass spectra.', J Am Soc Mass Spectrom., 24: 1906- 12.

280

Han, W., Y. Li, L. Deng, M. Feng, W. Peng, S. Hallstrom, J. Zhang, N. Peng, Y. X. Liang, M. F. White, and Q. She. 2017. 'A type III-B CRISPR-Cas effector complex mediating massive target DNA destruction', Nucleic Acids Res, 45: 1983-93.

Hannon, G.J. 2010. 'FASTX-Toolkit'. http://hannonlab.cshl.edu/fastx_toolkit.

Haurwitz, R. E., M. Jinek, B. Wiedenheft, K. Zhou, and J. A. Doudna. 2010. 'Sequence- and structure-specific RNA processing by a CRISPR endonuclease', Science, 329: 1355-8.

Hayes, Robert P., Yibei Xiao, Fran Ding, Paul B. G. van Erp, Kanagalaghatta Rajashankar, Scott Bailey, Blake Wiedenheft, and Ailong Ke. 2016. 'Structural basis for promiscuous PAM recognition in type I–E Cascade from E. coli', Nature, 530: 499- 503.

Hegge, J. W., D. C. Swarts, S. D. Chandradoss, T. J. Cui, J. Kneppers, M. Jinek, C. Joo, and J. van der Oost. 2019. 'DNA-guided DNA cleavage at moderate temperatures by Clostridium butyricum Argonaute', Nucleic Acids Res, 47: 5809-21.

Hershey, D.A., ed. 1971. 'The Bacteriophage Lambda', Cold Spring Harbor Laboratory Press.

Heyer, W. D., K. T. Ehmsen, and J. Liu. 2010. 'Regulation of homologous recombination in eukaryotes', Annu Rev Genet, 44: 113-39.

Hirano, H., J. S. Gootenberg, T. Horii, O. O. Abudayyeh, M. Kimura, P. D. Hsu, T. Nakane, R. Ishitani, I. Hatada, F. Zhang, H. Nishimasu, and O. Nureki. 2016. 'Structure and Engineering of Francisella novicida Cas9', Cell, 164: 950-61.

Hochstrasser, M. L., and J. A. Doudna. 2015. 'Cutting it close: CRISPR-associated endoribonuclease structure and function', Trends Biochem Sci, 40: 58-66.

Hochstrasser, M. L., D. W. Taylor, P. Bhat, C. K. Guegler, S. H. Sternberg, E. Nogales, and J. A. Doudna. 2014. 'CasA mediates Cas3-catalyzed target degradation during CRISPR RNA-guided interference', Proc Natl Acad Sci U S A, 111: 6618-23.

Holkers, M., I. Maggio, S. F. Henriques, J. M. Janssen, T. Cathomen, and M. A. Goncalves. 2014. 'Adenoviral vector DNA for accurate genome editing with engineered nucleases', Nat Methods, 11: 1051-7.

Horii, T., Y. Arai, M. Yamazaki, S. Morita, M. Kimura, M. Itoh, Y. Abe, and I. Hatada. 2014. 'Validation of microinjection methods for generating knockout mice by CRISPR/Cas-mediated genome engineering', Sci Rep, 4: 4513.

281

Hou, Zhonggang, Yan Zhang, Nicholas E. Propson, Sara E. Howden, Li-Fang Chu, Erik J. Sontheimer, and James A. Thomson. 2013. 'Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis.', Proc Natl Acad Sci U S A.

Hourdel V, Volant S, O'Brien DP, Chenal A, Chamot-Rooke J, Dillies MA, Brier S. 2016. 'MEMHDX: An interactive tool to expedite the statistical validation and visualization of large HDX-MS datasets.'.

Hrle, A., A. A. Su, J. Ebert, C. Benda, L. Randau, and E. Conti. 2013. 'Structure and RNA- binding properties of the type III-A CRISPR-associated protein Csm3', RNA Biol, 10: 1670-8.

Hsu, P. D., E. S. Lander, and F. Zhang. 2014. 'Development and applications of CRISPR- Cas9 for genome engineering', Cell, 157: 1262-78.

Huo, Y., K. H. Nam, F. Ding, H. Lee, L. Wu, Y. Xiao, M. D. Farchione, Jr., S. Zhou, K. Rajashankar, I. Kurinov, R. Zhang, and A. Ke. 2014. 'Structures of CRISPR Cas3 offer mechanistic insights into Cascade-activated DNA unwinding and degradation', Nat Struct Mol Biol, 21: 771-7.

Jackson, R. N., S. M. Golden, P. B. van Erp, J. Carter, E. R. Westra, S. J. Brouns, J. van der Oost, T. C. Terwilliger, R. J. Read, and B. Wiedenheft. 2014. 'Structural biology. Crystal structure of the CRISPR RNA-guided surveillance complex from Escherichia coli', Science, 345: 1473-9.

Jackson, Ryan N., Matthew Lavin, and Blake Wiedenheft. 2014. 'Fitting CRISPR- associated Cas3 into the Helicase Family Tree', Current Opinion in Structural Biology, 24: 106–14.

Jackson, Ryan N., Paul B. G. van Erp, Samuel H. Sternberg, and Blake Wiedenheft. 2017. 'Conformational regulation of CRISPR-associated nucleases', Current Opinion in Microbiology, 37: 110-19.

Jackson, R. N., and B. Wiedenheft. 2015. 'A conserved structural chassis for mounting versatile CRISPR RNA-guided immune responses', Mol Cell, 58: 722–28.

Jeong, H., V. Barbe, C. H. Lee, D. Vallenet, D. S. Yu, S. H. Choi, A. Couloux, S. W. Lee, S. H. Yoon, L. Cattolico, C. G. Hur, H. S. Park, B. Segurens, S. C. Kim, T. K. Oh, R. E. Lenski, F. W. Studier, P. Daegelen, and J. F. Kim. 2009. 'Genome sequences of Escherichia coli B strains REL606 and BL21(DE3)', J Mol Biol, 394: 644-52.

Jiang, F., and J. A. Doudna. 2015. 'The structural biology of CRISPR-Cas systems', Curr Opin Struct Biol, 30: 100-11. 282

Jiang, F., D. W. Taylor, J. S. Chen, J. E. Kornfeld, K. Zhou, A. J. Thompson, E. Nogales, and J. A. Doudna. 2016. 'Structures of a CRISPR-Cas9 R-loop complex primed for DNA cleavage', Science, 351: 867-71.

Jiang, F., K. Zhou, L. Ma, S. Gressel, and J. A. Doudna. 2015. 'STRUCTURAL BIOLOGY. A Cas9-guide RNA complex preorganized for target DNA recognition', Science, 348: 1477-81.

Jiang, W., and L. A. Marraffini. 2015. 'CRISPR-Cas: New Tools for Genetic Manipulations from Bacterial Immunity Systems', Annu Rev Microbiol, 69: 209-28.

Jinek, M., K. Chylinski, I. Fonfara, M. Hauer, J. A. Doudna, and E. Charpentier. 2012. 'A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity', Science, 337: 816-21.

Jinek, M., A. East, A. Cheng, S. Lin, E. Ma, and J. Doudna. 2013. 'RNA-programmed genome editing in human cells', Elife, 2: e00471.

Jinek, M., F. Jiang, D. W. Taylor, S. H. Sternberg, E. Kaya, E. Ma, C. Anders, M. Hauer, K. Zhou, S. Lin, M. Kaplan, A. T. Iavarone, E. Charpentier, E. Nogales, and J. A. Doudna. 2014. 'Structures of Cas9 endonucleases reveal RNA-mediated conformational activation', Science, 343: 1247997.

Jore, M. M., M. Lundgren, E. van Duijn, J. B. Bultema, E. R. Westra, S. P. Waghmare, B. Wiedenheft, U. Pul, R. Wurm, R. Wagner, M. R. Beijer, A. Barendregt, K. Zhou, A. P. Snijders, M. J. Dickman, J. A. Doudna, E. J. Boekema, A. J. Heck, J. van der Oost, and S. J. Brouns. 2011. 'Structural basis for CRISPR RNA-guided DNA recognition by Cascade', Nat Struct Mol Biol, 18: 529-36.

Jung, Cheulhee, John A. Hawkins, Stephen K. Jones, Jr., Yibei Xiao, James R. Rybarski, Kaylee E. Dillard, Jeffrey Hussmann, Fatema A. Saifuddin, Cagri A. Savran, Andrew D. Ellington, Ailong Ke, William H. Press, and Ilya J. Finkelstein. 'Massively Parallel Biophysical Analysis of CRISPR-Cas Complexes on Next Generation Sequencing Chips', Cell, 170: 35-47.e13.

Kabadi, A. M., D. G. Ousterout, I. B. Hilton, and C. A. Gersbach. 2014. 'Multiplex CRISPR/Cas9-based genome engineering from a single lentiviral vector', Nucleic Acids Res, 42: e147.

Kabsch, W. 2010. 'Xds', Acta Crystallogr D Biol Crystallogr, 66: 125-32.

Kale, L., R. Skeel, M. Bhandarkar, R. Brunner, A. Gursoy, N. Krawetz, J. Phillips, A. Shinozaki, K. Varadarajan, and K. Schulten. 1999. 'NAMD2: Greater scalability for parallel molecular dynamics', Journal of Computational Physics, 151: 283-312. 283

Kallberg, M., H. Wang, S. Wang, J. Peng, Z. Wang, H. Lu, and J. Xu. 2012. 'Template- based protein structure modeling using the RaptorX web server', Nat Protoc, 7: 1511-22.

Karplus, P. A., and K. Diederichs. 2012. 'Linking crystallographic model and data quality', Science, 336: 1030-3.

Katoh, K., K. Kuma, H. Toh, and T. Miyata. 2005. 'MAFFT version 5: improvement in accuracy of multiple sequence alignment', Nucleic Acids Res, 33: 511-8.

Kaya, Emine, Kevin W. Doxzen, Kilian R. Knoll, Ross C. Wilson, Steven C. Strutt, Philip J. Kranzusch, and Jennifer A. Doudna. 2016. 'A bacterial Argonaute with noncanonical guide RNA specificity', Proceedings of the National Academy of Sciences, 113: 4057-62.

Kazlauskiene, M., G. Tamulaitis, G. Kostiuk, C. Venclovas, and V. Siksnys. 2016. 'Spatiotemporal Control of Type III-A CRISPR-Cas Immunity: Coupling DNA Degradation with the Target RNA Recognition', Mol Cell, 62: 295-306.

Kim, S., D. Kim, S. W. Cho, J. Kim, and J. S. Kim. 2014. 'Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins', Genome Res, 24: 1012-9.

Komor, A. C., A. H. Badran, and D. R. Liu. 2017. 'CRISPR-Based Technologies for the Manipulation of Eukaryotic Genomes', Cell, 168: 20-36.

Koo, Y., D. Ka, E. J. Kim, N. Suh, and E. Bae. 2013. 'Conservation and variability in the structure and function of the Cas5d endoribonuclease in the CRISPR-mediated microbial immune system', J Mol Biol, 425: 3799-810.

Koonin, Eugene V. 2017. 'Evolution of RNA- and DNA-guided antivirus defense systems in prokaryotes and eukaryotes: common ancestry vs convergence', Biology Direct, 12: 5.

Kunne, T., S. N. Kieper, J. W. Bannenberg, A. I. Vogel, W. R. Miellet, M. Klein, M. Depken, M. Suarez-Diez, and S. J. Brouns. 2016. 'Cas3-Derived Target DNA Degradation Fragments Fuel Primed CRISPR Adaptation', Mol Cell, 63: 852-64.

Kunne, T., D. C. Swarts, and S. J. Brouns. 2014. 'Planting the seed: target recognition of short guide RNAs', Trends Microbiol, 22: 74-83.

Kuzmenko, A., D. Yudin, S. Ryazansky, A. Kulbachinskiy, and A. A. Aravin. 2019. 'Programmable DNA cleavage by Ago nucleases from mesophilic bacteria Clostridium butyricum and Limnothrix rosea', Nucleic Acids Res, 47: 5822-36. 284

Labrie, S. J., J. E. Samson, and S. Moineau. 2010. 'Bacteriophage resistance mechanisms', Nat Rev Microbiol, 8: 317-27.

Langmead, B., C. Trapnell, M. Pop, and S. L. Salzberg. 2009. 'Ultrafast and memory- efficient alignment of short DNA sequences to the human genome', Genome Biol, 10: R25.

Lee, Kok Zhi, Michael A. Mechikoff, Archana Kikla, Arren Liu, Paula Pandolfi, Frederick S. Gimble, and Kevin V. Solomon. 2019. 'NgAgo-enhanced homologous recombination in E. coli is mediated by DNA endonuclease activity', bioRxiv: 597237.

Lee, S. H., G. Turchiano, H. Ata, S. Nowsheen, M. Romito, Z. Lou, S. M. Ryu, S. C. Ekker, T. Cathomen, and J. S. Kim. 2016. 'Failure to detect DNA-guided genome editing using Natronobacterium gregoryi Argonaute', Nat Biotechnol, 35: 17-18.

Li, W., and A. Godzik. 2006. 'Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences', Bioinformatics, 22: 1658-9.

Lintner, N. G., M. Kerou, S. K. Brumfield, S. Graham, H. Liu, J. H. Naismith, M. Sdano, N. Peng, Q. She, V. Copie, M. J. Young, M. F. White, and C. M. Lawrence. 2011. 'Structural and Functional Characterization of an Archaeal Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated Complex for Antiviral Defense (CASCADE)', J Biol Chem, 286: 21643-56.

Lisitskaya, L., A. A. Aravin, and A. Kulbachinskiy. 2018. 'DNA interference and beyond: structure and functions of prokaryotic Argonaute proteins', Nat Commun, 9: 5165.

Liu, C. J., H. Jiang, L. Wu, L. Y. Zhu, E. Meng, and D. Y. Zhang. 2017. 'OEPR Cloning: an Efficient and Seamless Cloning Strategy for Large- and Multi-Fragments', Sci Rep, 7: 44648.

Liu, L., P. Chen, M. Wang, X. Li, J. Wang, M. Yin, and Y. Wang. 2017. 'C2c1-sgRNA Complex Structure Reveals RNA-Guided DNA Cleavage Mechanism', Mol Cell, 65: 310-22.

Liu, L., X. Li, J. Wang, M. Wang, P. Chen, M. Yin, J. Li, G. Sheng, and Y. Wang. 2017. 'Two Distant Catalytic Sites Are Responsible for C2c2 RNase Activities', Cell, 168: 121-34.e12.

Liu, T. Y., A. T. Iavarone, and J. A. Doudna. 2017. 'RNA and DNA Targeting by a Reconstituted Thermus thermophilus Type III-A CRISPR-Cas System', PLoS One, 12: e0170552.

285

Lobocka, M. B., D. J. Rose, G. Plunkett, 3rd, M. Rusin, A. Samojedny, H. Lehnherr, M. B. Yarmolinsky, and F. R. Blattner. 2004. 'Genome of bacteriophage P1', J Bacteriol, 186: 7032-68.

Long, C., J. R. McAnally, J. M. Shelton, A. A. Mireault, R. Bassel-Duby, and E. N. Olson. 2014. 'Prevention of muscular dystrophy in mice by CRISPR/Cas9-mediated editing of germline DNA', Science, 345: 1184-8.

Luria, S. E., J. N. Adams, and R. C. Ting. 1960. 'Transduction of lactose-utilizing ability among strains of E. coli and S. dysenteriae and the properties of the transducing phage particles', Virology, 12: 348-90.

Ma, J. B., Y. R. Yuan, G. Meister, Y. Pei, T. Tuschl, and D. J. Patel. 2005. 'Structural basis for 5'-end-specific recognition of guide RNA by the A. fulgidus Piwi protein', Nature, 434: 666-70.

Maggio, I., M. Holkers, J. Liu, J. M. Janssen, X. Chen, and M. A. Goncalves. 2014. 'Adenoviral vector delivery of RNA-guided CRISPR/Cas9 nuclease complexes induces targeted mutagenesis in a diverse array of human cells', Sci Rep, 4: 5105.

Makarova, K. S., D. H. Haft, R. Barrangou, S. J. Brouns, E. Charpentier, P. Horvath, S. Moineau, F. J. Mojica, Y. I. Wolf, A. F. Yakunin, J. van der Oost, and E. V. Koonin. 2011. 'Evolution and classification of the CRISPR-Cas systems', Nat Rev Microbiol, 9: 467-77.

Makarova, K. S., Y. I. Wolf, O. S. Alkhnbashi, F. Costa, S. A. Shah, S. J. Saunders, R. Barrangou, S. J. Brouns, E. Charpentier, D. H. Haft, P. Horvath, S. Moineau, F. J. Mojica, R. M. Terns, M. P. Terns, M. F. White, A. F. Yakunin, R. A. Garrett, J. van der Oost, R. Backofen, and E. V. Koonin. 2015. 'An updated evolutionary classification of CRISPR-Cas systems', Nat Rev Microbiol, 13: 722-36.

Makarova, Kira S., Yuri I. Wolf, and Eugene V. Koonin. 2013. 'Comparative genomics of defense systems in archaea and bacteria', Nucleic Acids Research, 41: 4360-77.

Makarova, Kira S., Yuri I. Wolf, John van der Oost, and Eugene V. Koonin. 2009. 'Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements', Biology Direct, 4: 29.

Mali, P., L. H. Yang, K. M. Esvelt, J. Aach, M. Guell, J. E. DiCarlo, J. E. Norville, and G. M. Church. 2013. 'RNA-Guided Human Genome Engineering via Cas9', Science, 339: 823-26.

286

Marchler-Bauer, A., and S. H. Bryant. 2004. 'CD-Search: protein domain annotations on the fly', Nucleic Acids Res, 32: W327-31.

Marchler-Bauer, A., S. Lu, J. B. Anderson, F. Chitsaz, M. K. Derbyshire, C. DeWeese- Scott, J. H. Fong, L. Y. Geer, R. C. Geer, N. R. Gonzales, M. Gwadz, D. I. Hurwitz, J. D. Jackson, Z. Ke, C. J. Lanczycki, F. Lu, G. H. Marchler, M. Mullokandov, M. V. Omelchenko, C. L. Robertson, J. S. Song, N. Thanki, R. A. Yamashita, D. Zhang, N. Zhang, C. Zheng, and S. H. Bryant. 2011. 'CDD: a Conserved Domain Database for the functional annotation of proteins', Nucleic Acids Res, 39: D225-9.

Marraffini, L. A., and E. J. Sontheimer. 2010. 'Self versus non-self discrimination during CRISPR RNA-directed immunity', Nature, 463: 568-71.

Matsumoto, N., H. Nishimasu, K. Sakakibara, K. M. Nishida, T. Hirano, R. Ishitani, H. Siomi, M. C. Siomi, and O. Nureki. 2016. 'Crystal Structure of Silkworm PIWI- Clade Argonaute Siwi Bound to piRNA', Cell, 167: 484-97.e9.

Mccoy, A. J., R. W. Grosse-Kunstleve, P. D. Adams, M. D. Winn, L. C. Storoni, and R. J. Read. 2007. 'Phaser crystallographic software', Journal of Applied Crystallography, 40: 658-74.

Mohanraju, P., K. S. Makarova, B. Zetsche, F. Zhang, E. V. Koonin, and J. van der Oost. 2016. 'Diverse evolutionary roots and mechanistic variations of the CRISPR-Cas systems', Science, 353: aad5147.

Mojica, F. J., C. Diez-Villasenor, J. Garcia-Martinez, and C. Almendros. 2009. 'Short motif sequences determine the targets of the prokaryotic CRISPR defence system', Microbiology, 155: 733-40.

Mojica, F. J., C. Diez-Villasenor, J. Garcia-Martinez, and E. Soria. 2005. 'Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements', J Mol Evol, 60: 174-82.

Mulepati, S., and S. Bailey. 2013. 'In Vitro Reconstitution of an Escherichia coli RNA- guided Immune System Reveals Unidirectional, ATP-dependent Degradation of DNA Target', J Biol Chem, 288: 22184-92.

Mulepati, S., A. Heroux, and S. Bailey. 2014. 'Crystal structure of a CRISPR RNA-guided surveillance complex bound to a ssDNA target', Science, 345: 1479-84.

Mulepati, S., A. Orr, and S. Bailey. 2012. 'Crystal structure of the largest subunit of a bacterial RNA-guided immune complex and its role in DNA target binding', J Biol Chem, 287: 22445-9.

287

Musharova, O., V. Sitnik, M. Vlot, E. Savitskaya, K. A. Datsenko, A. Krivoy, I. Fedorov, E. Semenova, S. J. J. Brouns, and K. Severinov. 2019. 'Systematic analysis of Type I-E Escherichia coli CRISPR-Cas PAM sequences ability to promote interference and primed adaptation', Mol Microbiol, 111: 1558-70.

Nam, K. H., C. Haitjema, X. Q. Liu, F. Ding, H. W. Wang, M. P. DeLisa, and A. L. Ke. 2012. 'Cas5d Protein Processes Pre-crRNA and Assembles into a Cascade-like Interference Complex in Subtype I-C/Dvulg CRISPR-Cas System', Structure, 20: 1574-84.

Nam, K. H., Q. Huang, and A. Ke. 2012. 'Nucleic acid binding surface and dimer interface revealed by CRISPR-associated CasB protein structures', FEBS Lett, 586: 3956-61.

Nishimasu, H., L. Cong, W. X. Yan, F. A. Ran, B. Zetsche, Y. Li, A. Kurabayashi, R. Ishitani, F. Zhang, and O. Nureki. 2015. 'Crystal Structure of Staphylococcus aureus Cas9', Cell, 162: 1113-26.

Nishimasu, H., F. A. Ran, P. D. Hsu, S. Konermann, S. I. Shehata, N. Dohmae, R. Ishitani, F. Zhang, and O. Nureki. 2014. 'Crystal structure of Cas9 in complex with guide RNA and target DNA', Cell, 156: 935-49.

Niu, Y., B. Shen, Y. Cui, Y. Chen, J. Wang, L. Wang, Y. Kang, X. Zhao, W. Si, W. Li, A. P. Xiang, J. Zhou, X. Guo, Y. Bi, C. Si, B. Hu, G. Dong, H. Wang, Z. Zhou, T. Li, T. Tan, X. Pu, F. Wang, S. Ji, Q. Zhou, X. Huang, W. Ji, and J. Sha. 2014. 'Generation of gene-modified cynomolgus monkey via Cas9/RNA-mediated gene targeting in one-cell embryos', Cell, 156: 836-43.

Nunez, J. K., L. B. Harrington, P. J. Kranzusch, A. N. Engelman, and J. A. Doudna. 2015. 'Foreign DNA capture during CRISPR-Cas adaptive immunity', Nature, 527: 535- 8.

Nuñez, James K., Philip J. Kranzusch, Jonas Noeske, Addison V. Wright, Christopher W. Davies, and Jennifer A. Doudna. 2014. 'Cas1–Cas2 complex formation mediates spacer acquisition during CRISPR–Cas adaptive immunity', Nature Structural &Amp; Molecular Biology, 21: 528.

Olovnikov, Ivan, Ken Chan, Ravi Sachidanandam, Dianne K Newman, and Alexei A Aravin. 2013. 'Bacterial Argonaute Samples the Transcriptome to Identify Foreign DNA', Molecular Cell, 51: 594-605.

Osawa, T., H. Inanaga, C. Sato, and T. Numata. 2015. 'Crystal structure of the CRISPR- Cas RNA silencing Cmr complex bound to a target analog', Mol Cell, 58: 418-30.

288

Otwinowski, Z., and W. Minor. 1997. 'Processing of X-ray diffraction data collected in oscillation mode', Macromolecular Crystallography, Pt A, 276: 307-26.

Pettersen, Eric F., Thomas D. Goddard, Conrad C. Huang, Gregory S. Couch, Daniel M. Greenblatt, Elaine C. Meng, and Thomas E. Ferrin. 2004. 'UCSF Chimera—A visualization system for exploratory research and analysis', Journal of Computational Chemistry, 25: 1605-12.

Pourcel, C., G. Salvignol, and G. Vergnaud. 2005. 'CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies', Microbiology, 151: 653-63.

Price, M. N., P. S. Dehal, and A. P. Arkin. 2010. 'FastTree 2--approximately maximum- likelihood trees for large alignments', PLoS One, 5: e9490.

Ran, F. A., L. Cong, W. X. Yan, D. A. Scott, J. S. Gootenberg, A. J. Kriz, B. Zetsche, O. Shalem, X. Wu, K. S. Makarova, E. V. Koonin, P. A. Sharp, and F. Zhang. 2015. 'In vivo genome editing using Staphylococcus aureus Cas9', Nature, 520: 186-91.

Redding, S., S. H. Sternberg, M. Marshall, B. Gibb, P. Bhat, C. K. Guegler, B. Wiedenheft, J. A. Doudna, and E. C. Greene. 2015. 'Surveillance and Processing of Foreign DNA by the Escherichia coli CRISPR-Cas System', Cell.

Reeks, J., J. H. Naismith, and M. F. White. 2013. 'CRISPR interference: a structural perspective', Biochem J, 453: 155-66.

Richter, C., R. L. Dy, R. E. McKenzie, B. N. Watson, C. Taylor, J. T. Chang, M. B. McNeil, R. H. Staals, and P. C. Fineran. 2014. 'Priming in the Type I-F CRISPR-Cas system triggers strand-independent spacer acquisition, bi-directionally from the primed protospacer', Nucleic Acids Res, 42: 8516-26.

Roberts, R. J. 2005. 'How restriction enzymes became the workhorses of molecular biology', Proceedings of the National Academy of Sciences of the United States of America, 102: 5905-08.

Rodriguez-Valera, F., A. B. Martin-Cuadrado, B. Rodriguez-Brito, L. Pasic, T. F. Thingstad, F. Rohwer, and A. Mira. 2009. 'Explaining microbial population genomics through phage predation', Nature Reviews Microbiology, 7: 828-36.

Rollins, M. F., S. Chowdhury, J. Carter, S. M. Golden, R. A. Wilkinson, J. Bondy-Denomy, G. C. Lander, and B. Wiedenheft. 2017. 'Cas1 and the Csy complex are opposing regulators of Cas2/3 nuclease activity', Proc Natl Acad Sci U S A, 114: E5113-e21.

289

Rollins, M. F., J. T. Schuman, K. Paulus, H. S. Bukhari, and B. Wiedenheft. 2015. 'Mechanism of foreign DNA recognition by a CRISPR RNA-guided surveillance complex from Pseudomonas aeruginosa', Nucleic Acids Res, 43: 2216-22.

Rouillon, C., M. Zhou, J. Zhang, A. Politis, V. Beilsten-Edmands, G. Cannone, S. Graham, C. V. Robinson, L. Spagnolo, and M. F. White. 2013. 'Structure of the CRISPR interference complex CSM reveals key similarities with cascade', Mol Cell, 52: 124-34.

Rutkauskas, M., T. Sinkunas, I. Songailiene, M. S. Tikhomirova, V. Siksnys, and R. Seidel. 2015. 'Directional R-Loop Formation by the CRISPR-Cas Surveillance Complex Cascade Provides Efficient Off-Target Site Rejection', Cell Rep.

Ryazansky, Sergei, Andrey Kulbachinskiy, and Alexei A. Aravin. 2018. 'The Expanded Universe of Prokaryotic Argonaute Proteins', mBio, 9: e01935-18.

Samai, Poulami , Nora Pyenson, Wenyan Jiang, Gregory W. Goldberg, Asma Hatoum- Aslan, and Luciano A. Marraffini. 2015. 'Co-transcriptional DNA and RNA cleavage during type III CRISPR-Cas immunity.', Cell, 161: 1164–74.

Sanjana, N. E., O. Shalem, and F. Zhang. 2014. 'Improved vectors and genome-wide libraries for CRISPR screening', Nat Methods, 11: 783-4.

Sashital, D. G., M. Jinek, and J. A. Doudna. 2011. 'An RNA-induced conformational change required for CRISPR RNA cleavage by the endoribonuclease Cse3', Nat Struct Mol Biol, 18: 680-7.

Sashital, Dipali G. , Blake Wiedenheft, and Jennifer A. Doudna. 2012. 'Mechanism of foreign DNA selection in a bacterial adaptive immune system', Mol Cell, 48: 606- 15.

Savitskaya, E., E. Semenova, V. Dedkov, A. Metlitskaya, and K. Severinov. 2013. 'High- throughput analysis of type I-E CRISPR/Cas spacer acquisition in E. coli', RNA Biol, 10: 716-25.

Schirle, N. T., and I. J. MacRae. 2012. 'The crystal structure of human Argonaute2', Science, 336: 1037-40.

Schrodinger, LLC. 2010. "The PyMOL Molecular Graphics System, Version 1.3r1." In.

Semenova, E., M. M. Jore, K. A. Datsenko, A. Semenova, E. R. Westra, B. Wanner, J. van der Oost, S. J. Brouns, and K. Severinov. 2011. 'Interference by clustered regularly interspaced short palindromic repeat (CRISPR) RNA is governed by a seed sequence', Proc Natl Acad Sci U S A, 108: 10098-103. 290

Shalem, O., N. E. Sanjana, E. Hartenian, X. Shi, D. A. Scott, T. S. Mikkelsen, D. Heckl, B. L. Ebert, D. E. Root, J. G. Doench, and F. Zhang. 2014. 'Genome-scale CRISPR- Cas9 knockout screening in human cells', Science, 343: 84-7.

Shao, Yaming , Alexis I. Cocozaki, Nancy F. Ramia, Rebecca M. Terns, Michael P. Terns, and Hong Li. 2013. 'Structure of the Cmr2-Cmr3 Subcomplex of the Cmr RNA Silencing Complex', Structure, 21: 376-84.

Sheng, G., H. Zhao, J. Wang, Y. Rao, W. Tian, D. C. Swarts, J. van der Oost, D. J. Patel, and Y. Wang. 2014. 'Structure-based cleavage mechanism of Thermus thermophilus Argonaute DNA guide strand-mediated DNA target cleavage', Proc Natl Acad Sci U S A, 111: 652-7.

Shmakov, S., O. O. Abudayyeh, K. S. Makarova, Y. I. Wolf, J. S. Gootenberg, E. Semenova, L. Minakhin, J. Joung, S. Konermann, K. Severinov, F. Zhang, and E. V. Koonin. 2015. 'Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems', Mol Cell, 60: 385-97.

Sinkunas, T., G. Gasiunas, S. P. Waghmare, M. J. Dickman, R. Barrangou, P. Horvath, and V. Siksnys. 2013. 'In vitro reconstitution of Cascade-mediated CRISPR immunity in Streptococcus thermophilus', Embo Journal, 32: 385-94.

Song, Jinzhao, Jorrit W. Hegge, Michael G. Mauk, Junman Chen, Jacob E. Till, Neha Bhagwat, Lotte T. Azink, Jing Peng, Moen Sen, Jazmine Mays, Erica Carpenter, John van der Oost, and Haim H. Bau. 2019. 'Highly Specific Enrichment of Rare Nucleic Acids using Thermus Thermophilus Argonaute with Applications in Cancer Diagnostics', bioRxiv: 491738.

Song, Jinzhao, Michael Joseph Powell, Wei Liu, Junman Chen, and Haim Bau. 2019. 'Highly sensitive detection of rare mutant alleles by combining argonaute-based enrichment and XNA-PCR', Journal of Clinical Oncology, 37: e14605-e05.

Sorek, R., C. M. Lawrence, and B. Wiedenheft. 2013. 'CRISPR-mediated adaptive immune systems in bacteria and archaea', Annu Rev Biochem, 82: 237-66.

Spilman, M., A. Cocozaki, C. Hale, Y. Shao, N. Ramia, R. Terns, M. Terns, H. Li, and S. Stagg. 2013. 'Structure of an RNA silencing complex of the CRISPR-Cas immune system', Mol Cell, 52: 146-52.

Staals, R. H., Y. Agari, S. Maki-Yonekura, Y. Zhu, D. W. Taylor, E. van Duijn, A. Barendregt, M. Vlot, J. J. Koehorst, K. Sakamoto, A. Masuda, N. Dohmae, P. J. Schaap, J. A. Doudna, A. J. Heck, K. Yonekura, J. van der Oost, and A. Shinkai. 2013. 'Structure and activity of the RNA-targeting Type III-B CRISPR-Cas complex of Thermus thermophilus', Mol Cell, 52: 135-45. 291

Sternberg, S. H., and J. A. Doudna. 2015. 'Expanding the Biologist's Toolkit with CRISPR- Cas9', Mol Cell, 58: 568-74.

Sternberg, S. H., B. LaFrance, M. Kaplan, and J. A. Doudna. 2015. 'Conformational control of DNA target cleavage by CRISPR-Cas9', Nature, 527: 110-3.

Sternberg, S. H., S. Redding, M. Jinek, E. C. Greene, and J. A. Doudna. 2014. 'DNA interrogation by the CRISPR RNA-guided endonuclease Cas9', Nature, 507: 62-7.

Studier, F. W., P. Daegelen, R. E. Lenski, S. Maslov, and J. F. Kim. 2009. 'Understanding the differences between genome sequences of Escherichia coli B strains REL606 and BL21(DE3) and comparison of the E. coli B and K-12 genomes', J Mol Biol, 394: 653-80.

Sunghyeok, Ye, Bae Taegeun, Kim Kyoungmi, Habib Omer, Lee Seung Hwan, Kim Yoon Young, Lee Kang-In, Kim Seokjoong, and Kim Jin-Soo. 2017. 'DNA-dependent RNA cleavage by the Natronobacterium gregoryi Argonaute', bioRxiv.

Suttle, C. A. 2007. 'Marine viruses--major players in the global ecosystem', Nat Rev Microbiol, 5: 801-12.

Swarts, Daan C., Jorrit W. Hegge, Ismael Hinojo, Masami Shiimori, Michael A. Ellis, Justin Dumrongkulraksa, Rebecca M. Terns, Michael P. Terns, and John van der Oost. 2015. 'Argonaute of the archaeon Pyrococcus furiosus is a DNA- guided nuclease that targets cognate DNA', Nucleic Acids Research, 43: 5120-29.

Swarts, Daan C., Matthijs M. Jore, Edze R. Westra, Yifan Zhu, Jorijn H. Janssen, Ambrosius P. Snijders, Yanli Wang, Dinshaw J. Patel, Jose Berenguer, Stan J. J. Brouns, and John van der Oost. 2014. 'DNA-guided DNA interference by a prokaryotic Argonaute', Nature, 507: 258-61.

Swarts, Daan C., Kira Makarova, Yanli Wang, Kotaro Nakanishi, René F. Ketting, Eugene V. Koonin, Dinshaw J. Patel, and John van der Oost. 2014. 'The evolutionary journey of Argonaute proteins', Nature structural & molecular biology, 21: 743- 53.

Swarts, D. C., C. Mosterd, M. W. van Passel, and S. J. Brouns. 2012. 'CRISPR Interference Directs Strand Specific Spacer Acquisition', PLoS One, 7: e35888.

Swarts, D. C., M. Szczepaniak, G. Sheng, S. D. Chandradoss, Y. Zhu, E. M. Timmers, Y. Zhang, H. Zhao, J. Lou, Y. Wang, C. Joo, and J. van der Oost. 2017. 'Autonomous Generation and Loading of DNA Guides by Bacterial Argonaute', Mol Cell, 65: 985-98.e6.

292

Swarts, D. C., J. van der Oost, and M. Jinek. 2017. 'Structural Basis for Guide RNA Processing and Seed-Dependent DNA Targeting by CRISPR-Cas12a', Mol Cell, 66: 221-33.e4.

Szczelkun, M. D., M. S. Tikhomirova, T. Sinkunas, G. Gasiunas, T. Karvelis, P. Pschera, V. Siksnys, and R. Seidel. 2014. 'Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes', Proc Natl Acad Sci U S A, 111: 9798-803.

Tamulaitis, G., C. Venclovas, and V. Siksnys. 2017. 'Type III CRISPR-Cas Immunity: Major Differences Brushed Aside', Trends Microbiol, 25: 49-61.

Tan, W., D. F. Carlson, C. A. Lancto, J. R. Garbe, D. A. Webster, P. B. Hackett, and S. C. Fahrenkrug. 2013. 'Efficient nonmeiotic allele introgression in livestock using custom endonucleases', PNAS, 110: 16526–31.

Tay, Melanie, Su Liu, and Y. Adam Yuan. 2015. 'Crystal structure of Thermobifida fuscaCse1 reveals target DNA binding site', Protein Science, 24: 236-45.

Taylor, D. W., Y. Zhu, R. H. Staals, J. E. Kornfeld, A. Shinkai, J. van der Oost, E. Nogales, and J. A. Doudna. 2015. 'Structural biology. Structures of the CRISPR-Cmr complex reveal mode of RNA target positioning', Science, 348: 581-5.

Terns, R. M., and M. P. Terns. 2014. 'CRISPR-based technologies: prokaryotic defense weapons repurposed', Trends Genet, 30: 111-8.

Terwilliger, T. C. 2013. 'Finding non-crystallographic symmetry in density maps of macromolecular structures', J Struct Funct Genomics, 14: 91-5.

Terwilliger, T. C., R. W. Grosse-Kunstleve, P. V. Afonine, N. W. Moriarty, P. H. Zwart, L. W. Hung, R. J. Read, and P. D. Adams. 2008. 'Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard', Acta Crystallogr D Biol Crystallogr, 64: 61-9.

Trabuco, L. G., E. Villa, K. Mitra, J. Frank, and K. Schulten. 2008. 'Flexible fitting of atomic structures into electron microscopy maps using molecular dynamics', Structure, 16: 673-83.

Trabuco, L. G., E. Villa, E. Schreiner, C. B. Harrison, and K. Schulten. 2009. 'Molecular dynamics flexible fitting: a practical guide to combine cryo-electron microscopy and X-ray crystallography', Methods, 49: 174-80.

293

Tsai, S. Q., Z. Zheng, N. T. Nguyen, M. Liebers, V. V. Topkar, V. Thapar, N. Wyvekens, C. Khayter, A. J. Iafrate, L. P. Le, M. J. Aryee, and J. K. Joung. 2015. 'GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases', Nat Biotechnol, 33: 187-97.

Tsui, T. K., and H. Li. 2015. 'Structure Principles of CRISPR-Cas Surveillance and Effector Complexes', Annu Rev Biophys. van der Oost, J., E. R. Westra, R. N. Jackson, and B. Wiedenheft. 2014. 'Unravelling the structural and mechanistic basis of CRISPR-Cas systems', Nat Rev Microbiol, 12: 479-92. van Erp, P. B., G. Bloomer, R. Wilkinson, and B. Wiedenheft. 2015. 'The history and market impact of CRISPR RNA-guided nucleases', Curr Opin Virol, 12: 85-90. van Erp, Paul B.G., Ryan N. Jackson, Joshua Carter, Sarah M. Golden, Scott Bailey, and Blake Wiedenheft. 2015. 'Mechanism of CRISPR-RNA guided recognition of DNA targets in Escherichia coli', Nucleic Acids Research, 43: 8381-91.

Vaudel, Marc, Julia M. Burkhart, Rene P. Zahedi, Eystein Oveland, Frode S. Berven, Albert Sickmann, Lennart Martens, and Harald Barsnes. 2015. 'PeptideShaker enables reanalysis of MS-derived proteomics data sets', Nat Biotech, 33: 22-24. Vink, Jochem N.A., Koen J.A. Martens, Marnix Vlot, Rebecca E. McKenzie, Cristóbal Almendros, Boris Estrada Bonilla, Daan J.W. Brocken, Johannes Hohlbein, and Stan J.J. Brouns. 2019. 'Direct visualization of native CRISPR target search in live bacteria reveals Cascade DNA surveillance mechanism', bioRxiv: 589119.

Vorontsova, D., K. A. Datsenko, S. Medvedeva, J. Bondy-Denomy, E. E. Savitskaya, K. Pougach, M. Logacheva, B. Wiedenheft, A. R. Davidson, K. Severinov, and E. Semenova. 2015. 'Foreign DNA acquisition by the I-F CRISPR-Cas system requires all components of the interference machinery', Nucleic Acids Res, 43: 10848-60.

Voytas, D. F., and C. Gao. 2014. 'Precision genome engineering and agriculture: opportunities and regulatory challenges', PLoS Biol, 12: e1001877.

Wang, H., H. Yang, C. S. Shivalila, M. M. Dawlaty, A. W. Cheng, F. Zhang, and R. Jaenisch. 2013. 'One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering', Cell, 153: 910-8.

Wang, J., J. Li, H. Zhao, G. Sheng, M. Wang, M. Yin, and Y. Wang. 2015. 'Structural and Mechanistic Basis of PAM-Dependent Spacer Acquisition in CRISPR-Cas Systems', Cell, 163: 840-53.

294

Wang, T., J. J. Wei, D. M. Sabatini, and E. S. Lander. 2014. 'Genetic Screens in Human Cells Using the CRISPR-Cas9 System', Science, 343: 80-84.

Weinbauer, M. G. 2004. 'Ecology of prokaryotic viruses', FEMS Microbiol Rev, 28: 127- 81.

Westra, E. R., B. Nilges, P. B. van Erp, J. van der Oost, R. T. Dame, and S. J. Brouns. 2012. 'Cascade-mediated binding and bending of negatively supercoiled DNA', RNA Biol, 9.

Westra, E. R., E. Semenova, K. A. Datsenko, R. N. Jackson, B. Wiedenheft, K. Severinov, and S. J. Brouns. 2013. 'Type I-E CRISPR-Cas Systems Discriminate Target from Non-Target DNA through Base Pairing-Independent PAM Recognition', PLoS Genet, 9: e1003742.

Westra, Edze R., Daan Swarts, Raymond Staals, Matthijs M. Jore, Stan J.J. Brouns, and John van der Oost. 2012. 'The CRISPRs, They Are A-Changin': How Prokaryotes Generate Adaptive Immunity', Annu. Rev. Genet., 46: 311-39.

Westra, Edze R. , Paul B. G. van Erp, Tim Künne, Shi Pey Wong, Raymond H. J. Staals, Christel L. C. Seegers, Sander Bollen, Matthijs M. Jore, Ekaterina Semenova, Konstantin Severinov, Willem M. de Vos, Remus T. Dameb, Renko de Vries, Stan J. J. Brouns, and John van der Oost. 2012. 'CRISPR immunity relies on the consecutive binding and degradation of negatively supercoiled invader DNA by Cascade and Cas3', Mol Cell, 46: 595-605.

Wiedenheft, Blake, Gabriel C. Lander, Kaihong Zhou, Matthijs M. Jore, Stan J.J. Brouns, John van der Oost, Jennifer A. Doudna, and Eva Nogales. 2011. 'Structures of the RNA-guided surveillance complex from a bacterial immune system', Nature, 477: 486-9.

Wiedenheft, B., S. H. Sternberg, and J. A. Doudna. 2012. 'RNA-guided genetic silencing systems in bacteria and archaea', Nature, 482: 331-8.

Wiedenheft, B., E. van Duijn, J. Bultema, S. Waghmare, K. Zhou, A. Barendregt, W. Westphal, A. Heck, E. Boekema, M. Dickman, and J. A. Doudna. 2011. 'RNA- guided complex from a bacterial immune system enhances target recognition through seed sequence interactions', Proc Natl Acad Sci U S A, 108: 10092-7.

Wilkinson, R., and B. Wiedenheft. 2014. 'A CRISPR method for genome engineering', F1000Prime Rep, 6: 3.

295

Willkomm, S., K. S. Makarova, and D. Grohmann. 2018. 'DNA silencing by prokaryotic Argonaute proteins adds a new layer of defense against invading nucleic acids', FEMS Microbiol Rev, 42: 376-87.

Willkomm, Sarah, Christine A. Oellig, Adrian Zander, Tobias Restle, Ronan Keegan, Dina Grohmann, and Sabine Schneider. 2017. 'Structural and mechanistic insights into an archaeal DNA-guided Argonaute protein', Nature Microbiology, 2: 17035.

Wright, A. V., J. K. Nunez, and J. A. Doudna. 2016. 'Biology and Applications of CRISPR Systems: Harnessing Nature's Toolbox for Genome Engineering', Cell, 164: 29-44.

Wu, X., D. A. Scott, A. J. Kriz, A. C. Chiu, P. D. Hsu, D. B. Dadon, A. W. Cheng, A. E. Trevino, S. Konermann, S. Chen, R. Jaenisch, F. Zhang, and P. A. Sharp. 2014. 'Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells', Nat Biotechnol, 32: 670-6.

Wu, Z., S. Tan, L. Xu, L. Gao, H. Zhu, C. Ma, and X. Liang. 2017. 'NgAgo-gDNA system efficiently suppresses hepatitis B virus replication through accelerating decay of pregenomic RNA', Antiviral Res, 145: 20-23.

Xiao, Y., M. Luo, R. P. Hayes, J. Kim, S. Ng, F. Ding, M. Liao, and A. Ke. 2017. 'Structure Basis for Directional R-loop Formation and Substrate Handover Mechanisms in Type I CRISPR-Cas System', Cell, 170: 48-60.e11.

Xiao, Yibei, Min Luo, Robert P. Hayes, Jonathan Kim, Sherwin Ng, Fang Ding, Maofu Liao, and Ailong Ke. 2017. 'Structure Basis for Directional R-loop Formation and Substrate Handover Mechanisms in Type I CRISPR-Cas System', Cell, 170: 48- 60.e11.

Xue, Chaoyou, Arun S. Seetharam, Olga Musharova, Konstantin Severinov, Stan J. J. Brouns, Andrew J. Severin, and Dipali G. Sashital. 2015. 'CRISPR interference and priming varies with individual spacer sequences', Nucleic Acids Research, 43: 10831-47.

Xue, Chaoyou, Natalie R. Whitis, and Dipali G. Sashital. 2016. 'Conformational Control of Cascade Interference and Priming Activities in CRISPR Immunity', Molecular Cell, 64: 826-34.

Xue, C., Y. Zhu, X. Zhang, Y. K. Shin, and D. G. Sashital. 2017. 'Real-Time Observation of Target Search by the CRISPR Surveillance Complex Cascade', Cell Rep, 21: 3717-27.

296

Yamada, M., Y. Watanabe, J. S. Gootenberg, H. Hirano, F. A. Ran, T. Nakane, R. Ishitani, F. Zhang, H. Nishimasu, and O. Nureki. 2017. 'Crystal Structure of the Minimal Cas9 from Campylobacter jejuni Reveals the Molecular Diversity in the CRISPR- Cas9 Systems', Mol Cell, 65: 1109-21.e3.

Yamano, T., H. Nishimasu, B. Zetsche, H. Hirano, I. M. Slaymaker, Y. Li, I. Fedorova, T. Nakane, K. S. Makarova, E. V. Koonin, R. Ishitani, F. Zhang, and O. Nureki. 2016. 'Crystal Structure of Cpf1 in Complex with Guide RNA and Target DNA', Cell, 165: 949-62.

Yang, H., P. Gao, K. R. Rajashankar, and D. J. Patel. 2016. 'PAM-Dependent Target DNA Recognition and Cleavage by C2c1 CRISPR-Cas Endonuclease', Cell, 167: 1814- 28.e12.

Yang, W. 2011. 'Nucleases: diversity of structure, function and mechanism', Q Rev Biophys, 44: 1-93.

Yu, Guangchuang, David K. Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. 'ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data', Methods in Ecology and Evolution, 8: 28-36.

Yuan, Yu-Ren, Yi Pei, Jin-Biao Ma, Vitaly Kuryavyi, Maria Zhadina, Gunter Meister, Hong-Ying Chen, Zbigniew Dauter, Thomas Tuschl, and Dinshaw J. Patel. 2005. 'Crystal Structure of A. aeolicus Argonaute, a Site-Specific DNA-Guided Endoribonuclease, Provides Insights into RISC-Mediated mRNA Cleavage', Molecular Cell, 19: 405-19.

Zander, Adrian, Sarah Willkomm, Sapir Ofer, Marleen van Wolferen, Luisa Egert, Sabine Buchmeier, Sarah Stöckl, Philip Tinnefeld, Sabine Schneider, Andreas Klingl, Sonja-Verena Albers, Finn Werner, and Dina Grohmann. 2017. 'Guide- independent DNA cleavage by archaeal Argonaute from Methanocaldococcus jannaschii', Nature Microbiology, 2: 17034.

Zetsche, B., J. S. Gootenberg, O. O. Abudayyeh, I. M. Slaymaker, K. S. Makarova, P. Essletzbichler, S. E. Volz, J. Joung, J. van der Oost, A. Regev, E. V. Koonin, and F. Zhang. 2015. 'Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR- Cas system', Cell, 163: 759-71.

Zhang, J., S. Graham, A. Tello, H. Liu, and M. F. White. 2016. 'Multiple nucleic acid cleavage modes in divergent type III CRISPR systems', Nucleic Acids Res, 44: 1789-99.

297

Zhao, H., G. Sheng, J. Wang, M. Wang, G. Bunkoczi, W. Gong, Z. Wei, and Y. Wang. 2014. 'Crystal structure of the RNA-guided immune surveillance Cascade complex in Escherichia coli', Nature, 515: 147-50.

Zuris, J. A., D. B. Thompson, Y. Shu, J. P. Guilinger, J. L. Bessen, J. H. Hu, M. L. Maeder, J. K. Joung, Z. Y. Chen, and D. R. Liu. 2015. 'Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo', Nat Biotechnol, 33: 73-80.

'https://www.addgene.org/CRISPR/'.

'Horizon Discovery Group, Admission Document'. 2015. http://www.horizondiscovery.com/media/item/201.

'The Nobel Prize in Physiology or Medicine 1978'. 2015. Nobel Media http://www.nobelprize.org/nobel_prizes/medicine/laureates/1978/.