Precise and Expansive Genomic Positioning for CRISPR Edits

by Noah Michael Jakimo
B.S., California Institute of Technology (2010)
S.M., Massachusetts Institute of Technology (2015)

Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Media Arts and Sciences at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2019 © Massachusetts Institute of Technology 2019. All rights reserved.

Author ...... Signature redacted Program in Media Arts and Sciences, School of Architecture and Planning May 3, 2019

Certified by ...... Signature redacted Joseph M. Jacobson Associate Professor of Media Arts and Sciences Thesis Supervisor

Accepted by ...... Signature redacted Tod Machover Academic Head, Program in Media Arts and Sciences


Precise and Expansive Genomic Positioning for CRISPR Edits
by Noah Michael Jakimo
Submitted to the Program in Media Arts and Sciences, School of Architecture and Planning on May 3, 2019, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Media Arts and Sciences

Abstract

The recent harnessing of microbial adaptive immune systems, known as CRISPR, has enabled genome-wide engineering across all domains of life. A new generation of gene-editing tools has been fashioned from the natural DNA/RNA-targeting ability of certain CRISPR-associated (Cas) proteins and their guide RNA, which work together to recognize and defend against infectious genetic threats. This straightforward RNA-programmed sequence recognition by CRISPR has facilitated its rapid global impact on genetic research, diagnostics, therapeutics, and bioproduction. An ideal DNA-editing platform would achieve perfect accuracy on any desired cellular and genomic target. CRISPR systems, however, have limited target fidelity and range, in part due to their evolutionary pressures to defend microbes from fast-mutating viruses without self-targeting their own guide RNA. These natural limitations of CRISPR can especially constrain gene-editing in animals and plants, which are more vulnerable to off-target activity occurring in one of their trillions of cells with genomes that are 1000x larger than those of unicellular microbes that natively harbor CRISPR systems. This thesis overcomes three critical challenges for precise and broad gene-editing of complex organisms: 1) engineering a means of specificity for the type of cells to edit, 2) improving target-matching accuracy, and 3) broadening the editable portion of the genome.

This thesis addresses these challenges by integrating custom-developed computational design tools and biological validation of the resulting novel CRISPR systems: 1) To target within multicellular heterogeneity, new oligonucleotide-sensing structural motifs are designed and embedded into guides that can potentially control CRISPR nuclease activity based on cell-type transcriptome patterns; 2) To discern among increased similarity between a target and non-target sequences in larger genomes, base-pairing thermostability principles are employed to tune the biochemical composition of guides that can evade subtly mismatched off-target sites; 3) To expand the reach of editing techniques with narrow windows of operation, such as base-editing, bioinformatics workflows are created that discover previously uncharacterized Cas proteins with novel target scope. This thesis demonstrates the effectiveness of these strategies in the context of in vitro, bacterial, and human cell culture assays, and contributes advancements in the precision and generality of CRISPR gene-editing.

Thesis Supervisor: Joseph M. Jacobson
Title: Associate Professor of Media Arts and Sciences

Precise and Expansive Genomic Positioning for CRISPR Edits
by Noah Michael Jakimo

The following people served as readers for this thesis:

Signature redacted Professor Joseph M. Jacobson ...... Thesis Supervisor Associate Professor of Media Arts and Sciences, MIT

Signature redacted Professor Edward S. Boyden III ...... Thesis Reader Associate Professor of Media Arts and Sciences, MIT

Signature redacted Professor George M. Church ...... Thesis Reader Professor of Genetics, Harvard Medical School

Dedication

For Teri, who makes every day wonderful. We are lucky to have each other.

Acknowledgments

My thesis was made possible by the kind and generous support of my advisor, mentors, labmates in Molecular Machines, co-workers at the MIT Media Lab (ML) and Center for Bits and Atoms (CBA), friends, family, and, of course, my wife, Teri. I would like to give special thanks:

To Joe, for the opportunity to learn, invent, explore, and discover together;

To Pranam, for our balanced and productive partnership;

To Lisa and Thras, for cohesive team efforts through struggles and successes;

To Neil and Joi, for the feeling of belonging in both the CBA and Media Lab;

To CBA and ML, for being empowering environments for myself and the world;

To George and Ed, for sharing your prolific genius and caring guidance;

To CBA and ML admins, for enabling our capabilities and focus;

To Linda, for overseeing our degree-seeking academic trajectory;

To Naama and David, for vital experience and knowledge;

To mom and dad, Benay and Alan, for a lifetime of loving encouragement;

To Teri, for shared love and inspiration.

From the bottom of my heart, thank you!

Contents

1 CRISPR Systems for Gene Editing
1.1 Introduction
1.2 Native CRISPR Mechanics
1.3 CRISPR for Gene-Editing
1.3.1 Range
1.3.2 Cas9 Specificity
1.3.3 Cas9 Activity
1.4 Thesis Contributions

2 DNA/RNA Chimeric CRISPR Guides Enhance Target Specificity for Streptococcus pyogenes Cas9 (SpCas9)
2.1 Introduction
2.2 Results and Discussion
2.2.1 DNA Substitutions in Cas9 gRNA Improve Mismatch Sensitivity
2.2.2 R-Loop Expansion Kinetics Determine Melt-Guide Specificity
2.2.3 Melt-Guides Reduce Off-Target Genome Editing
2.2.4 Conclusions
2.3 Materials and Methods
2.3.1 Cas9-Guide in vitro DNA Digestions
2.3.2 Preparation of Single-Stranded Target DNA Substrates
2.3.3 Genomic Indel Production and Measurements
2.3.4 Sequence Information

3 Single-Base PAM Specificity of a Highly-Similar SpCas9 Ortholog from Streptococcus canis
3.1 Introduction
3.2 Results
3.2.1 Identification of SpCas9 Homologs
3.2.2 Determination of PAM Sequences Recognized by ScCas9
3.2.3 Assessment of ScCas9 PAM Specificity in Human Cells
3.2.4 Off-Target Analysis of ScCas9
3.2.5 ScCas9 Genome Editing Capabilities
3.2.6 Investigation of Sequence Conservation Between S. canis and Other Streptococcus Cas9 Orthologs
3.2.7 Genus-wide Prediction of Divergent Streptococcus Cas9 PAMs
3.3 Discussion
3.4 Materials and Methods
3.4.1 Identification of Cas9 Homologs and Generation of Plasmids
3.4.2 PAM-SCANR Assay
3.4.3 Cell Culture and Gene Modification Analysis
3.4.4 Base Editing Analysis with Traffic Light Reporter
3.4.5 Base Editing Evaluation Program
3.4.6 SPAMALOT Pipeline
3.4.7 Statistical Analysis

4 A Cas9 with Complete PAM Recognition for Adenine Dinucleotides from Streptococcus macacae
4.1 Introduction
4.2 Materials and Methods
4.2.1 Selection of Streptococcus Cas9 Orthologs of Interest
4.2.2 PAM-SCANR Bacterial Fluorescence Assay
4.2.3 Purification of and DNA cleavage with Selected Nucleases
4.2.4 Gene Modification Analysis and Software
4.2.5 Base Editing Analysis and Software

5 RNA-Switched Cas9 Guides Engineered by Strand-Displacement
5.1 Introduction
5.2 Computational Design of Switchable Guide RNA
5.3 Validation of the Toehold-Gated Strand-Displacement Mechanism
5.4 Demonstration of OFF-to-ON and ON-to-OFF Switchable Guide RNA
5.5 Materials and Methods
5.5.1 In vitro Cas9 Cleavage Assays
5.5.2 Nupack swigRNA Design

6 Concluding Remarks

7 Bibliography

Chapter 1

CRISPR Systems for Gene Editing

[Figure 1.1: line chart titled "Growth in CRISPR and Gene Editing Publications Since 2010", covering 2010 through 2018, with series for "Gene Editing" in abstract, "TALEN" in abstract, "CRISPR" in abstract, and a "CRISPR" 2-year doubling projection.]

Figure 1.1 Growth trends in "CRISPR" and related literature since 2010 using data from the Dimensions research knowledge database (https://app.dimensions.ai).

1.1 Introduction

CRISPR refers to Clustered Regularly Interspaced Short Palindromic Repeats that encode genetic memory cassettes for adaptive immune systems in bacteria and archaea. [98, 7, 94] CRISPR is best known, however, for accelerating the field of gene editing (Figure 1.1). [140, 76] This chapter reviews the native microbial defense system mechanics for CRISPR, the properties of CRISPR that facilitated its immediate impact on gene editing, and certain constraints on CRISPR that this thesis overcomes to achieve more accurate and general DNA-targeting.

1.2 Native CRISPR Mechanics

A variety of CRISPR-associated (Cas) proteins append and access the CRISPR genetic cassette in order to surveil and protect against infectious mobile genetic elements (MGEs), such as viral bacteriophages (Figure 1.2). [97] Cas1 and Cas2 work together to recognize new threats by sampling sequences from invading MGEs and storing them as spacers between CRISPR repeats. [104] Other Cas enzymes then clear threats by using guide RNA (gRNA) transcribed from the CRISPR cassette to direct the enzyme's nucleolytic cleavage of familiar MGE targets. Such enzymes are multi-protein complexes in Class 1 (Type I and III) CRISPR systems or single proteins in Class 2 CRISPR systems, which include Cas9 (Type II), Cas12 (Type V), Cas13 (Type VI), and Cas14 (Type V). [64, 147, 1, 49]

Figure 1.2 Illustration of the mechanics for spacer acquisition and RNA-directed cleavage of an invading genetic threat in the typical Class 2 Type II CRISPR system of human-infecting Streptococcus pyogenes [Partial illustration credit: Lisa Nip]

1.3 CRISPR for Gene-Editing

Global efforts have feverishly adapted CRISPR-Cas enzymes and their gRNA for programmable, multi-purpose genome engineering. [143, 121, 35, 140, 76] This surge in attention to CRISPR follows from seminal work published in the summer of 2012 that demonstrated Cas9-gRNA from Streptococcus pyogenes, a bacterium that infects humans, functions as a directed endonuclease on double-stranded DNA (dsDNA). [64, 40] From then on, it was evident that Cas9-gRNA could fulfill the essential activity of previous gene editing platforms, to break DNA in a sequence-specific manner (Figure 1.3B). [21, 93] By the end of 2013, many studies had already shown the effectiveness of CRISPR-Cas for genome engineering in a variety of model organisms. [37, 60, 101, 142, 65]

Figure 1.3 (A) Protein and RNA encodings of gene-editing platforms for common DNA sequence recognition (B) Classic gene-editing outcomes upon double-strand break induction of endogenous repair pathways with and without a donor template

The immediate progress in applying CRISPR-Cas to gene-editing benefited from methods developed over several years for preceding platforms. These older platforms (e.g. meganucleases, zinc-finger nucleases, transcription activator-like effector nucleases, and recombinases) by contrast rely on protein-encoded specificity to achieve DNA sequence recognition (Figure 1.3A). [67, 6, 92, 9] Their design and/or assembly are therefore more expensive, less compact, and more complex than the simple base-pairing principles and minimal DNA synthesis required for building target-addressable gRNA for Cas9. [134, 108, 111] Additionally, methods developed for existing oligonucleotide-based transcriptome engineering platforms (e.g. short-hairpin RNA interference and anti-sense oligonucleotides) further prepared RNA-programmed CRISPR-Cas to quickly eclipse its genome engineering predecessors. [127, 105, 96, 115]

CRISPR systems were discovered in a variety of microbes before they were found in Streptococcus pyogenes. [98, 97, 7] More recent discoveries reveal a vast diversity of CRISPR systems with distinct origins, as well as divergent RNA- and DNA-targeting properties. Nevertheless, the exceptional effectiveness and versatility of Cas9 from Streptococcus pyogenes (SpCas9), the first well-characterized single-protein CRISPR enzyme, allow it to remain the most frequently employed for genome engineering applications [78]. For all the impressive impact of SpCas9 on genome engineering, its evolution was instead within the context of a microbial adaptive immune system. [94] Therefore the range, specificity, and activity of SpCas9 were not under selective pressure to be ideal for its biotechnological usage. As this thesis heavily leverages the wealth of investigations and engineering applications of SpCas9, the remainder of this chapter highlights how the wild-type form of this specific enzyme can limit genome engineering. The subsequent chapters of this thesis present novel improvements for more precise and expansive CRISPR-Cas9 gene editing.

1.3.1 Cas9 Range

SpCas9 recognizes target sites that match a 20 nucleotide (nt) "spacer" sequence in the guide RNA and are positioned next to a short protospacer adjacent motif (PAM) [102]. The PAM is primarily determined by affinity to the nuclease's PAM interaction (PI) domain. The PAM requirement therefore helps distinguish a target from the CRISPR spacer-repeat cassette that encodes a guide for that target, as the PAM is absent from the start of each repeat sequence. This self-discrimination requirement sets an evolutionary lower bound on PAM length, opposing the selective pressure to shorten PAM sequences and thereby maximize the number of valid PAM targets in genetic threats. Natural PAM sequences usually fall in the range of 2-6 bases long (Table 1.1). The PAM for SpCas9 is a guanine dinucleotide offset by one nucleotide (NGG). In a genome with unbiased nucleotide content, GG and its complement (CC) should together have an average spacing of 1/(2 x (1/4)^2) = 8 nt. In the genome of a phage that infects Streptococcus pyogenes, this average spacing is 14.8 nt. For the human genome, the average spacing is 10.1 nt. Single-nucleotide precision editing techniques, such as base-editing, are sensitive to the distance of a PAM relative to the position of the desired base for editing. The window of operation can be as narrow as 3 nt, which raises the need for Cas nucleases with broader and/or alternative PAM recognition.

CRISPR Endonuclease                   PAM Recognition    Reference
Streptococcus pyogenes Cas9           5'-NGG-3'          [64]
Streptococcus canis Cas9              5'-NG-3'           [This work]
Streptococcus macacae Cas9            5'-NAA-3'          [This work]
Streptococcus thermophilus Cas9       5'-NNAGAAW-3'      [26]
Neisseria meningitidis Cas9           5'-NNNNGMTT-3'     [32]
Staphylococcus aureus Cas9            5'-NNGRRT-3'       [109]
Campylobacter jejuni Cas9             5'-NNNVRYAC-3'     [36]
Acidaminococcus Cas12a                5'-TTTN-3'         [147]
Bacillus hisashii Cas12b              5'-TTN-3'          [130]

W = A or T; M = A or C; R = A or G; V = G or C or A; Y = C or T

Table 1.1 PAM Recognition of Natural CRISPR Endonucleases
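The PAM spacing estimate in this section can be checked numerically. The following is a minimal Python sketch, not the pipeline used in this thesis: the function names are hypothetical, PAM sites are counted on both strands with overlapping matches allowed, and the unbiased-genome expectation assumes independent, equally likely bases.

```python
import re

# IUPAC codes that appear in Table 1.1, mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "W": "AT", "M": "AC", "R": "AG", "V": "ACG", "Y": "CT",
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def pam_pattern(pam):
    """Compile a zero-width lookahead so that overlapping PAM sites are all found."""
    body = "".join("[" + IUPAC[code] + "]" for code in pam)
    return re.compile("(?=" + body + ")")


def count_pam_sites(seq, pam):
    """Count PAM occurrences on the given strand and on its reverse complement."""
    pattern = pam_pattern(pam)
    return sum(len(pattern.findall(strand))
               for strand in (seq, reverse_complement(seq)))


def average_spacing(seq, pam):
    """Average number of nucleotides per PAM site, counting both strands."""
    sites = count_pam_sites(seq, pam)
    return len(seq) / sites if sites else float("inf")


def expected_spacing_unbiased(pam):
    """Expected spacing for independent, equally likely bases (both strands)."""
    p = 1.0
    for code in pam:
        p *= len(IUPAC[code]) / 4
    return 1 / (2 * p)


print(expected_spacing_unbiased("NGG"))  # 8.0
```

Applying `average_spacing` to a real genome sequence with the PAM "GG" would give the empirical counterpart of the 8 nt figure; the leading N of NGG does not change the expectation because it matches any base.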

1.3.2 Cas9 Specificity

Phages mutate rapidly, so it is advantageous for guides to recognize similar targets that are not perfectly matched. The SpCas9 nuclease has a natural tolerance for roughly up to three mismatches in the target. [137] Based on this property, the equations below approximate the expected number of off-target sites for SpCas9's native operation in S. pyogenes and its biotechnological use on the human genome. Assuming independence between target and genome sequences, as well as unbiased nucleotide content, the probability that a random genomic position is targeted is

\[ p(\text{offtarget} \mid \text{target}) = \left(\frac{1}{4}\right)^{2} \times \sum_{i=0}^{3} \binom{20}{i} \left(\frac{3}{4}\right)^{i} \left(\frac{1}{4}\right)^{20-i} = 1.85 \times 10^{-9} \]

Since the size of the Streptococcus pyogenes genome is

\[ G_{Sp} = 1.95 \times 10^{6} \text{ nt} \]

and the size of the human genome is

\[ G_{Hs} = 3.23 \times 10^{9} \text{ nt}, \]

it follows that the expected number of SpCas9 off-targets in Streptococcus pyogenes is

\[ E_{Sp}[\text{offtarget} \mid \text{target}] = G_{Sp} \times p(\text{offtarget} \mid \text{target}) = 0.0036 \text{ offtargets} \]

and the expected number of SpCas9 off-targets in humans is

\[ E_{Hs}[\text{offtarget} \mid \text{target}] = G_{Hs} \times p(\text{offtarget} \mid \text{target}) = 5.98 \text{ offtargets}. \]

We conclude that off-target effects are to be expected when using SpCas9 on the human genome, as others have observed.
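The arithmetic behind these estimates can be reproduced with a short Python sketch. It assumes the probability takes a binomial form: two fixed PAM bases (the GG of NGG), each matching with probability 1/4, multiplied by the chance that a 20 nt protospacer carries at most three mismatches. The function and parameter names are illustrative and not taken from the thesis.

```python
from math import comb


def offtarget_probability(spacer_len=20, max_mismatches=3, pam_fixed_bases=2):
    """Probability that a random position has the PAM and at most
    `max_mismatches` mismatches to the spacer, assuming unbiased bases."""
    pam = 0.25 ** pam_fixed_bases
    spacer = sum(
        comb(spacer_len, i) * 0.75 ** i * 0.25 ** (spacer_len - i)
        for i in range(max_mismatches + 1)
    )
    return pam * spacer


def expected_offtargets(genome_size_nt, p):
    """Expected number of targeted positions in a genome of the given size."""
    return genome_size_nt * p


p = offtarget_probability()
print(f"{p:.2e}")                                # 1.85e-09
print(f"{expected_offtargets(1.95e6, p):.4f}")   # 0.0036 (S. pyogenes)
print(f"{expected_offtargets(3.23e9, p):.2f}")   # 5.98 (human)
```

Because the perfect-match term contributes only 1 part in roughly 32,500 to the sum, starting the sum at zero or at one mismatch gives the same values at the precision quoted here.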

1.3.3 Cas9 Activity

The dsDNA substrate of SpCas9 reflects the polynucleotide form of the type of viral bacteriophages that infect Streptococcus bacteria. Streptococcus lack the repair pathways to efficiently end-join broken DNA, so the break on a phage's genome by an RNA-guided SpCas9 ultimately prevents the phage from replicating and proliferating within the microbial population. Certain genome engineering applications resembling the native function of SpCas9 can take advantage of a destructive repair-less cut; however, many more applications require specific DNA repair outcomes after breakage. Several important applications even operate best when one or both Cas9 nuclease domains are made catalytically dead. Table 1.2 summarizes engineered variants of SpCas9 that have been used to achieve an assortment of DNA modifications. [68, 140, 119, 110, 56]

Variant                                       Break Type   Application
No Repair + Cas9                              DSB          Replication Disruption
nCas9/dCas9 + Fusion to Deaminase             SSB/NB       Base-Editing
dCas9 +/- Fusion to Repressor                 NB           CRISPR Interference
dCas9 + Fusion to Activator                   NB           CRISPR Activation
End-Ligation + Cas9/paired nCas9              DSB/SSB      Insertion/Deletion Mutagenesis
Donor DNA + End-Ligation + Cas9               DSB          Donor Insertion
Donor DNA + Templated-Repair + Cas9/nCas9     DSB/SSB      Homologous Recombination

nCas9 = nickase Cas9, dCas9 = dead Cas9
DSB = Double-Strand Break, SSB = Single-Strand Break, NB = No Break

Table 1.2 Examples of Cas9 Usage for Genome Engineering Applications

1.4 Thesis Contributions

The key contributions of this thesis are summarized by chapter as follows:

Chapter 2 - Independently discovered that substituting DNA bases for RNA in guides can reduce off-target cleavage. Uniquely demonstrated a gene-editing efficiency increase by combining these guides with exonuclease over-expression to affect DNA repair outcome.

Chapter 3 - Discovered and validated the natural single-base 5'-NG-3' PAM recognition of Streptococcus canis Cas9. Built a new computational pipeline for discovering CRISPR enzyme PAM target ranges.

Chapter 4 - Discovered and validated the 5'-NAA-3' PAM recognition of Streptococcus macacae Cas9. Engineered an efficient gene-editing chimeric nuclease by swapping the PAM-interacting domain of SpCas9 with that of Streptococcus macacae Cas9.

Chapter 5 - Independently discovered how to engineer oligo-triggered strand-displacement control switches into guides. The designs in this thesis are unique in their independence between the sequence of the triggering oligo and the target of the guide.

Chapter 2

DNA/RNA Chimeric CRISPR Guides Enhance Target Specificity for Streptococcus pyogenes Cas9 (SpCas9)

This chapter is adapted from Jakimo, Chatterjee, and Jacobson. Chimeric CRISPR guides enhance Cas9 target specificity. bioRxiv (2017)

2.1 Introduction

The recent discoveries, characterizations, and modifications of natural oligonucleotide-guided nucleases (OGNs) associated with CRISPR and RNAi have empowered a genome-editing revolution [65, 93, 21, 76]. Low barriers for OGNs' cost and design drive their widespread adoption over alternatives, including modular base-recognition domains (i.e., transcription activator like effector, zinc finger, and pumilio assemblies), which can be hard to synthesize, or meganucleases, which are difficult to engineer for new targets [111, 108, 2, 134]. Unlike protein-directed systems, OGNs also permit employing predictable nucleic acid chemistry and biophysics to alter native features [51, 59, 82, 138].

Among the most important properties dictating the usage of a nucleic acid recognition system is its specificity. Thus, the desire to identify new methods diminishing potentially toxic or detrimental off-target activity has prompted many to measure and improve mismatch discrimination for RNA-guided SpCas9 - the most prevalent OGN and henceforth referred to as Cas9 [117, 137, 27]. Up to now, others have increased its precision through broad approaches, such as controlling duration of exposure, enforcing co-localization on adjacent targets, or destabilizing binding affinity by minor variation [24, 110, 71, 38]. Here we present chimeric mismatch-evading lowered-thermostability guides (melt-guides) that replace most gRNA spacer positions with DNA bases to suppress mismatched targets under Cas9's catalytic threshold.

In this work, we confirm by in vitro cleavage assays that melt-guides can direct Cas9 with substantially enhanced mismatch discrimination. Moreover, we verify in vivo that melt-guides can achieve efficient mutagenesis with greater precision by providing deep sequencing data from transfected HEK293T cells stably expressing Cas9.

[Figure 2.1: panels A-C, with labels for DNA:DNA, DNA:RNA, and RNA:RNA duplexes and for B-form and A-form double helices.]

Figure 2.1 Annotated 3D structure of a target-guide-Cas9 R-loop based on PDB 5F9R shown above the 2D structure of a melt-guide. Red and yellow spheres highlight RNA hydroxyls eliminated and retained, respectively, in initial melt-guide designs. Proposed model of relative R-loop expansion rate differences (represented by arrow sizes and directions) that increase mismatch sensitivity for melt-guides compared to gRNA. Red segments indicate mismatches between guide and target.

2.2 Results and Discussion

2.2.1 DNA Substitutions in Cas9 gRNA Improve Mismatch Sensitivity

Efforts that have measured and modeled Cas9 target recognition imply a mechanism that includes incremental strand invasion between gRNA spacer and target sequence [33, 66]. After prerequisite binding to a short protospacer adjacent motif (PAM), Cas9 helps stabilize DNA unwinding at a potential target as the guide displaces its DNA:DNA base-pairs with RNA:DNA base-pairs (Figure 2.1) [63]. After the resulting structure, called an R-loop, expands beyond a ~15 base-pair exchange, Cas9 can then create a double-strand DNA break [62, 68]. Although RNA nucleotides differ subtly from DNA nucleotides by the addition of a 2'-hydroxyl in RNA, this difference favors A-form double helix structures for RNA base-pairing. The tighter helicity of A-form structures is more thermostable than the B-form double helix structures of DNA:DNA hybridization because of the better protection of hydrophobic nucleobase stacking. [100] Motivated by studies on RNA:DNA chimera hybridization indicating more DNA content generally decreased duplex stability, we rationally designed chimeric melt-guides promoting the rehybridization of mismatched R-loops (Figure 2.1) [131, 100]. As illustrated, we selected candidate DNA-tolerant positions in gRNA by excluding most positions containing RNA-specific 2'-hydroxyl contacts with Cas9 that may help maintain assembly of active OGN.
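The position-selection step described above can be expressed as a small Python sketch for writing out a chimeric spacer. Everything concrete here is an assumption made for illustration: the spacer sequence is arbitrary, the set of positions retained as RNA is hypothetical rather than the set chosen in this work, and the token notation (an "r" prefix for RNA bases, bare letters for DNA bases) is simply a convenient way to print mixed oligos.

```python
def chimeric_spacer(spacer_dna, rna_positions):
    """Write a spacer as a DNA/RNA chimera.

    spacer_dna    -- spacer given in DNA letters, 5' to 3'
    rna_positions -- 1-based positions (from the 5' end) kept as RNA
    Returns a list of tokens: 'rA', 'rC', 'rG', 'rU' for RNA, 'A', 'C', 'G', 'T' for DNA.
    """
    tokens = []
    for position, base in enumerate(spacer_dna.upper(), start=1):
        if position in rna_positions:
            # RNA uses uracil where DNA uses thymine.
            tokens.append("r" + ("U" if base == "T" else base))
        else:
            tokens.append(base)
    return tokens


# Hypothetical example: a 20 nt spacer with only the three 3'-most positions kept as RNA,
# giving a "3 RNA - 17 DNA" composition.
example_spacer = "ACGTACGTACGTACGTACGT"   # arbitrary sequence, not a target from this work
tokens = chimeric_spacer(example_spacer, rna_positions={18, 19, 20})
print(" ".join(tokens))
print(sum(not t.startswith("r") for t in tokens), "DNA bases")  # 17 DNA bases
```

Keeping the RNA positions as an explicit set makes it straightforward to exclude whichever positions are judged to make RNA-specific 2'-hydroxyl contacts with Cas9.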

For a standard target sequence from human VEGFA, we used commercially synthesized chimeric melt-guides and corresponding on- and off-target DNA substrates to compare a melt-guide's mismatch discrimination to canonical gRNA when directing DNA cleavage. Figure 2.2 shows that a melt-guide containing 17 DNA bases was functional in a 4-hour digestion assay with purified Cas9 and produced 74% of the amount of cleaved on-target substrate that gRNA did. The same melt-guide resulted in no detectable cleavage for all surveyed two-mismatch off-targets, which, in many cases, gRNA-Cas9 cut faster than the on-target substrate. Furthermore, on a challenging single-mismatch substrate that has been reported to be just as frequently an off-target for wild-type and high-fidelity Cas9 (hfCas9) variants, the melt-guide reduced the digested fraction by four-fold [124, 71].
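Substrates in these assays are grouped by how many positions differ from the on-target sequence. A minimal Python sketch of that bookkeeping follows; the function name and the sequences in the example are placeholders, not substrates from this work.

```python
def mismatch_positions(on_target, candidate):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(on_target) != len(candidate):
        raise ValueError("sequences must have the same length")
    return [
        position
        for position, (a, b) in enumerate(zip(on_target.upper(), candidate.upper()), start=1)
        if a != b
    ]


# Placeholder sequences: the candidate differs at positions 2 and 7.
print(mismatch_positions("ACGTACGTAC", "AGGTACCTAC"))  # [2, 7]
```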

Additional in vitro assays demonstrate the generality of designing melt-guides for different genomic targets, but likewise reveal that targets comprising high GC and/or pyrimidine target content can limit sufficient destabilization to avoid cutting certain multi-mismatched sequences, even with melt-guides containing only DNA in the spacer (Figure 2.3) [47]. This limitation can be used to inform target-selection for a given application or it can be potentially overcome through combination of orthogonal destabilization techniques, such as truncating guide or complexing it with hfCas9. We have identified other nucleotide-type substitutions that also enhance specificity, including phosphorothioate (PS) internucleotide linkages [125, 149].
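Because high GC and/or pyrimidine content is flagged here as limiting destabilization, a candidate target could be screened on those two fractions before a melt-guide is designed. The Python sketch below only computes the fractions; any cutoff for rejecting a target would be an additional assumption, and none is proposed here.

```python
def base_composition(target):
    """Return (GC fraction, pyrimidine fraction) of a DNA target sequence."""
    seq = target.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(base in "GC" for base in seq) / len(seq)
    pyrimidine = sum(base in "CT" for base in seq) / len(seq)
    return gc, pyrimidine


# Placeholder sequence, not a target from this work.
gc, pyr = base_composition("GGCCACGT")
print(gc, pyr)  # 0.75 0.5
```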

[Figure 2.2: gel lanes for guides with spacer compositions of 20 RNA - 0 DNA (gRNA), 9 RNA - 11 DNA, 6 RNA - 14 DNA, 3 RNA - 17 DNA, and 0 RNA - 20 DNA (melt-guides).]

Figure 2.2 Contrast-adjusted gel image of 4-hour Cas9 in vitro digests of targets with mismatches ranging from 0 to 3 using gRNA or melt-guide.

[Figure 2.3: panels A and B, gel images for matched and mismatched targets.]

Figure 2.3 Contrast-adjusted gel image of 4-hour Cas9 in vitro digests of targets with mismatches ranging from 0 to 3 using gRNA or melt-guide.

[Figure 2.4: two time-course plots of cleaved fraction against digestion time in minutes, one over a shorter and one over a longer time range, for gRNA and melt-guide.]

Figure 2.4 Melt-guide (gray) and gRNA (black) cleavage time courses for no-mismatch target with Cas9, plotted alongside least-squares fitted logarithmic functions (dashed curves)
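A least-squares logarithmic fit of the kind mentioned in the caption can be sketched in a few lines of Python. The exact functional form used for the figure is not stated, so the model f(t) = a + b ln(t) is an assumption, and the data points below are synthetic values generated from known coefficients purely to show that the fit recovers them; they are not measurements from this work.

```python
import math


def fit_log(times, fractions):
    """Least-squares fit of fractions = a + b * ln(times); returns (a, b)."""
    xs = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(fractions) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, fractions))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic, noise-free points generated from a = 0.10 and b = 0.15 (illustration only).
times_min = [1, 5, 15, 60, 240]
synthetic = [0.10 + 0.15 * math.log(t) for t in times_min]
a, b = fit_log(times_min, synthetic)
print(round(a, 6), round(b, 6))  # 0.1 0.15
```

Since the model is linear in ln(t), ordinary linear regression on the log-transformed times gives the least-squares solution in closed form, with no iterative optimizer needed.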

2.2.2 R-Loop Expansion Kinetics Determine Melt-Guide Specificity

Whereas Cas9 is known to rapidly cleave DNA, its rates with melt-guides slowed appreciably (Figure 2.4) [128]. In order to confirm that R-loop expansion contributes more than mismatched hybridization to this change in kinetics, we ran time-coursed digestions using substrates that were either double-stranded (ds) or single-stranded (ss) along the target (Figure 2.5). Within minutes, Cas9 with canonical gRNA was able to cut both ds- and ss-target to near completion. For melt-guide-directed cleavage, we instead observed steady digestion of no-mismatch ds-targets over several hours, yet rates on ss-targets were about as rapid as gRNA's and at similar timescales in the presence and absence of mismatches. The fast error-prone cuts we detected upon removing strand-displacement from cleavage dynamics support that R-loop destabilization is the main contributor to melt-guides' improved specificity.

Future single-molecule fluorescence resonance energy transfer (FRET) measurements of melt-guides can be used to obtain finer detail of recognition kinetics and Cas9 conformational changes, complementary to previous work using gRNA [23, 133].

While we noticed melt-guides that include an all-DNA spacer did not introduce drastic structural changes that would have prevented cleavage, it is unclear whether such guides more closely adopt A-form or B-form duplexes with their target. This uncertainty arises from antagonistic influences of Cas9 pre-loading guide in an unpaired A-form versus the favored B-forming tendency of DNA:DNA dimers [102, 48]. The exact extent to which the helicity is altered for melt-guides in oligonucleotide-protein complexes could be solved from a crystal structure of the bound melt-guide OGN.

Figure 2.5 Inverted contrast-adjusted gel images of short (left-side within each quadrant) and long (right-side within each quadrant) Cas9 digests of double-stranded (left-half) and single-stranded (right-half) targets with two mismatches (bottom) or none (top) using gRNA or melt-guides with spacers containing either all-DNA, 3 DNA distributed as previously described, or an additional 3 DNA fill-in.

2.2.3 Melt-Guides Reduce Off-Target Genome Editing

To test the use of melt-guides for genome editing, we first transfected EMX1-targeting melt-guide oligos (5 DNA substitutions into the 5'-most positions) pre-assembled with Cas9 protein into HEK293T cells and enzymatically measured insertion/deletion (indel) mutations. The melt-guide edited on-target with almost the same efficiency as gRNA and successfully evaded the off-target site edited by gRNA (Figure 2.6). We transfected VEGFA-targeting melt-guide oligos with mostly DNA content into HEK293T cells stably expressing Cas9 and enzymatically measured insertion/deletion (indel) mutations (Figure 2.6). Initial attempts yielded unsatisfactorily low mutagenesis, which we believe resulted from unfavorable relative rates of: (i) guide oligo degradation, (ii) slower R-loop expansion, and (iii) errorless non-homologous end-joining (NHEJ) repair [132, 119]. We tried counteracting degradation with oligo lifetime-lengthening modifications (e.g., phosphorothioate or inverted terminal bases and 2'-O-methyl RNA substitutions on non-spacer guide positions) and partially restored cleavage rates by using fewer DNA substitutions in melt-guides [51]. Since these tactics did not lead to substantial improvement, we later pursued methods that could bias genomic double-strand breaks towards more error-prone repair.

[Figure 2.6: panels for EMX1 indel generation (T7E1) from guide-Cas9 RNP delivered by lipofection into HEK293 cell culture, and VEGFA indel generation (NGS) from guide delivered by lipofection into Cas9-stable HEK293 cell culture, each showing on-target and off-target sites.]

Figure 2.6 Dual y-axis chart shows deep sequencing indel measurements on-target and at a known off-target, comparing mutagenesis by gRNA to melt-guides designed with DNA substitutions in their first 11 positions with and without Trex2 overexpression (blue and light blue, respectively). Nucleic acid-type content in the guide's spacer is noted in parentheses.

Overexpression of the mammalian 3' exonuclease Trex2, associated with DNA damage processing, has been reported to raise indel rates for various sequence-specific gene editing systems without causing toxicity [25, 13, 16]. Therefore we added Trex2 expression plasmid to transfections and measured effected mutations by deep sequencing (Figure 2.6) [106]. We found a melt-guide containing mostly DNA in spacer bases produced indel percentages above 25% on-target, which acceptably translates to 70% of gRNA's rate. Crucially, on an off-target where we detected gRNA-induced mutations, melt-guides' indel percentages fell below our no-guide negative control. Between melt-guide types, single-molecule gRNA (sgRNA) length melt-guides consistently generated more than double the indel rate of melt-guides derived from shorter CRISPR RNA (crRNA) sequence, which need to duplex with trans-activating crRNA (tracrRNA). Despite Trex2 addition increasing indel percentages roughly seven-fold for both melt-guide types, the exonuclease had marginal impact on gRNA-directed mutation rates.

Figure 2.7 Inverted gel image of Cas9 digest of 0-mismatch VEGFA target with almost all-DNA melt-guide of crRNA-length. Lane labels: tracrRNA, crRNA.

Others have achieved enhanced Cas9 specificity and could maintain high indel rates on-target without an accessory exonuclease [71, 124]. However, their experiments relied on transcribing all OGN components to abundant cellular concentrations.

On one hand, a similar Trex2 supplementation strategy may benefit applications where some components are delivered as oligo or protein - which may include DNA-guided editing with Argonaute [30, 83]. On the other hand, a reverse-transcribable melt-guide with only DNA bases could lessen dependence on Trex2 for efficient mutagenesis. Towards that end, we show in vitro cleavage directed by tracrRNA in duplex with a crRNA-length melt-guide containing all DNA outside of the spacer sequence (Figure 2.7). Chimeras with such sparse RNA content are furthermore likely resistant to most RNA endonucleases.

2.2.4 Conclusions

In the case of Cas9, we improve the precision of target activity in vitro and in vivo with mismatch-evading lowered-thermostability guides. We believe melt-guides should be extensible to the expanding collection of CRISPR systems by extrapolating either from chimeric oligo libraries to scan nucleotide-type substitution or from published crystal structure data to avoid disrupting RNA-specific interactions (i.e., Cpf1 guide's pseudoknots) [144, 12]. Given the minimal RNA content that we found to be sufficient for guiding Cas9, additional protein engineering - perhaps through homolog alignments - may enable the realization of all-DNA melt-guides.

2.3 Materials and Methods

2.3.1 Cas9-Guide in vitro DNA Digestions

Mixed nucleotide-type and RNA oligos, designed as Cas9 guides for selected standard genomic targets, were obtained from Integrated DNA Technologies (IDT). A 1 µM dilution was prepared for stocks of guide derived from sgRNA or crRNA and the latter was combined with equimolar tracrRNA (GE Dharmacon). Reactions consisted of 20 M pre-annealed guide stock, 20 nM purified Cas9 from New England BioLabs (NEB), 10x NEB reaction buffer, and 500 g of IDT-synthesized dsDNA target in 30 µl mixes. Samples were incubated at 37 °C and digested products separated by TAE-gel electrophoresis. Images of cleaved fractions from SYBR-Safe dsDNA gel stain (Thermo Fisher) under a blue light lamp were quantified using ImageJ software.

2.3.2 Preparation of Single-Stranded Target DNA Substrates

Target substrates were PCR-amplified using a primer oligo set (IDT) with 5' phosphorylation for only the primer generating PAM-sided strands. Amplicons purified on anion-resin exchange columns (Qiagen) were digested by Lambda exonuclease (NEB), a 5'-to-3' enzyme that prefers phosphorylated ends of dsDNA, to yield ssDNA of the strand opposite of PAM. Following subsequent column purification, ssDNAs were annealed to a primer beginning at the PAM site of the removed strand and templated for extension by DNA polymerase (NEB).

2.3.3 Genomic Indel Production and Measurements

HEK293T cells stably expressing Cas9 purchased from GeneCopoeia were plated to 250,000 cells / 35 mm well in 2.5 ml Dulbecco's Modified Eagle's Medium with 10% Fetal Bovine Serum and incubated at 37 °C and 5% CO2. The next day, transfections via TransIT-X2 reagent (Mirus Bio) delivered a 25 nM final concentration of guide with or without 2.5 µg pExodus CMV.Trex2, which was a gift from Dr. Andrew Scharenberg (Addgene plasmid #40210). After an additional 48 hours, genomic DNA was isolated using Epicentre QuickExtract solution and indel production was visualized by a common T7 Endonuclease I assay (NEB) on amplicons from on-target and known off-target regions [139]. Amplicons were then prepared for deep sequencing with Nextera-XT tagmentation (Illumina) and run on a MiSeq 2x300 v3 kit (Illumina). Reads were analyzed using the CRISPResso software pipeline for precise indel percentages from biological and technical duplicates [106].

2.3.4 Sequence Information

Target Name    Sequence (Protospacer PAM)
VEGFA ON       GGTGAGTGAGTGTGTGCGTG TGG
VEGFA OFF1     GGTGAGTGAGTGTGTGTGTG GGG
VEGFA OFF2     GCTGAGTGAGTGTATGCGTG TGG
VEGFA OFF3     TGTGGGTGAGTGTGTGCGTG AGG
VEGFA OFF4     GGTGAACGAGTGTGTGCGTG TGG
VEGFA OFF5     GGTGAGTAGGTGTGTGCGTG TGG
VEGFA OFF6     AGAGAGTGAGTGTGTGCATG AGG
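To make the relationship between the on-target and off-target sequences listed above easier to see, the following minimal sketch reports where each off-target protospacer differs from the VEGFA on-target protospacer. The helper name `mismatch_positions` is introduced here for illustration only; the sequences are copied from the list above with PAMs omitted.

```python
# Protospacers (without PAMs) copied from the sequence list above.
ON_TARGET = "GGTGAGTGAGTGTGTGCGTG"
OFF_TARGETS = {
    "VEGFA OFF1": "GGTGAGTGAGTGTGTGTGTG",
    "VEGFA OFF2": "GCTGAGTGAGTGTATGCGTG",
    "VEGFA OFF3": "TGTGGGTGAGTGTGTGCGTG",
    "VEGFA OFF4": "GGTGAACGAGTGTGTGCGTG",
    "VEGFA OFF5": "GGTGAGTAGGTGTGTGCGTG",
    "VEGFA OFF6": "AGAGAGTGAGTGTGTGCATG",
}


def mismatch_positions(reference, candidate):
    """Return 1-based positions, counted 5' to 3', where the sequences differ."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have equal length")
    return [
        i
        for i, (ref_base, cand_base) in enumerate(zip(reference, candidate), start=1)
        if ref_base != cand_base
    ]


for name, protospacer in OFF_TARGETS.items():
    print(name, mismatch_positions(ON_TARGET, protospacer))
```

Under this comparison, OFF1 carries a single mismatch, OFF2 through OFF5 carry two each, and OFF6 carries three.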

Chapter 3

Single-Base PAM Specificity of a Highly-Similar SpCas9 Ortholog from Streptococcus canis

This chapter is adapted from Chatterjee*, Jakimo*, and Jacobson. Minimal PAM specificity of a highly similar SpCas9 ortholog. Science Advances (2018)

*Equal contribution

3.1 Introduction

RNA-guided endonucleases of the CRISPR-Cas system, such as Cas9 [64] and Cpf1 (also known as Cas12a) [147], have proven to be versatile tools for genome editing and regulation [107], which have numerous implications in medicine, agriculture, bioenergy, food security, nanotechnology, and beyond [8]. The range of targetable sequences is limited, however, by the need for a specific protospacer adjacent motif (PAM), which is determined by DNA-protein interactions, to immediately follow the DNA sequence specified by the single guide RNA (sgRNA) [64, 97, 120, 129, 63]. For example, the most widely used variant, Streptococcus pyogenes Cas9 (SpCas9), requires a 5'-NGG-3' motif downstream of its RNA-programmed DNA target [64, 8, 97, 120, 129, 63]. To relax this constraint, additional Cas9 and Cpf1 variants with distinct PAM requirements have been either discovered [109, 32, 70, 54, 50] or engineered [54, 50, 72, 39] to diversify the range of targetable DNA sequences. In total, these studies have provided only a handful of CRISPR effectors with minimal PAM requirements that enable wide targeting capabilities.

To help augment this list, we characterize an orthologous Cas9 protein from Streptococcus canis, ScCas9 (UniProt I7QXF2), possessing 89.2% sequence similarity to SpCas9. We find that despite such homology, ScCas9 prefers a more minimal 5'-NNG-3' PAM. To explain this divergence, we identify two significant insertions within its open reading frame (ORF) that differentiate ScCas9 from SpCas9 and contribute to its PAM-recognition flexibility. We show that ScCas9 can efficiently and accurately edit genomic DNA in mammalian cells, and investigate possible explanations for PAM divergence between Streptococcus orthologs. Finally, we construct a bioinformatics pipeline to explore the PAM specificities of other Streptococcus orthologs.

3.2 Results

3.2.1 Identification of SpCas9 Homologs

While numerous Cas9 homologs have been sequenced, only a handful of Streptococcus orthologs have been characterized or functionally validated. To explore this space, we curated all Streptococcus Cas9 protein sequences from UniProt [22], performed global pairwise alignments using the BLOSUM62 scoring matrix [52], and calculated percent sequence homology to SpCas9. From them, the Cas9 from Streptococcus canis (ScCas9) stood out, not only due to its remarkable sequence homology (89.2%) to SpCas9, but also because of a positive-charged insertion of 10 amino acids within the highly-conserved REC3 domain, in positions 367-376 (Figure 3.1). Exploiting both of these properties, we modeled the insertion within the corresponding domain of PDB 4OO8 [102] and, when viewed in PyMOL, noticed that it formed a "loop"-like structure, in which several positive-charged residues come into close proximity with the target DNA near the PAM. We further identified an additional insertion of two amino acids (KQ) immediately upstream of the two critical arginine residues necessary for PAM binding [4], in positions 1337-1338. We thus hypothesized that these insertions may affect the PAM specificity of this enzyme. To support this prediction, we computationally characterized the PAM for ScCas9, by first mapping spacer sequences from the Cas9-associated type II CRISPR loci in the Streptococcus canis genome [86] to viral and plasmid genomes using BLAST [3], extracting the sequences 3' to the mapped protospacers, and subsequently generating a WebLogo [23] representation of the aligned PAM sequences. Our analysis suggested a 5'-NNGTT-3' PAM. Intrigued by these novel motifs and motivated by the potentially reduced specificity at position 2 of the PAM sequence, we selected ScCas9 as a candidate for further PAM characterization and engineering.
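As a rough illustration of how a sequence logo summarizes aligned 3' flanking sequences, the sketch below computes per-position information content (2 bits minus Shannon entropy) for a set of equal-length flanks. The flank sequences and the function name `information_content` are invented for illustration and are not the BLAST hits from this work; any small-sample correction that a logo tool may apply is omitted.

```python
import math
from collections import Counter


def information_content(flanks):
    """Per-position information (bits) for equal-length DNA sequences.

    Computed as 2 - H, where H is the Shannon entropy of the base
    frequencies in that alignment column. No small-sample correction.
    """
    if not flanks or len({len(flank) for flank in flanks}) != 1:
        raise ValueError("need at least one flank, all of equal length")
    bits = []
    for column in zip(*flanks):
        counts = Counter(column)
        entropy = -sum(
            (count / len(column)) * math.log2(count / len(column))
            for count in counts.values()
        )
        bits.append(2.0 - entropy)
    return bits


# Hypothetical 5-nt sequences found 3' of mapped protospacers.
example_flanks = ["AAGTT", "CTGTT", "GCGTT", "TGGTA"]
for position, value in enumerate(information_content(example_flanks), start=1):
    print(position, round(value, 2))
```

In this toy input, position 3 is invariant (2 bits) while position 1 is uniformly distributed (0 bits), which is the pattern a conserved G flanked by unconstrained bases would produce.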

[Figure 3.1 panels: (A), (B), (C) Putative PAM (7 BLAST Hits)]

Figure 3.1 In silico characterization of ScCas9. (A) Global pairwise sequence alignment of SpCas9 and ScCas9. (B) Insertion of novel REC motif into PDB 4OO8 (18). (C) WebLogo (22) for sequences found at the 3' end of protospacer targets identified in plasmid and viral genomes using type II spacer sequences within S. canis as BLAST (21) queries.

3.2.2 Determination of PAM Sequences Recognized by ScCas9

Due to the relatively low number of protospacer targets, we validated the PAM binding sequence of ScCas9 utilizing an existing positive selection bacterial screen based on GFP expression conditioned on PAM binding, termed PAM-SCANR [85]. A plasmid library containing the target sequence followed by a randomized 5'-NNNNNNNN-3' (8N) PAM sequence was bound by a nuclease-deficient ScCas9 (and dSpCas9 as a control) and an sgRNA both specific to the target sequence and general for SpCas9 and ScCas9, allowing for the repression of lacI and expression of GFP. Plasmid DNA from FACS-sorted GFP-positive cells and pre-sorted cells was extracted and amplified, and enriched PAM sequences were identified by Sanger sequencing and visualized utilizing DNA chromatograms. Our results provided initial evidence that ScCas9 can bind to the minimal 5'-NNG-3' PAM, distinct from SpCas9's 5'-NGG-3' (Figure 3.2). We hypothesized that the previously described insertions may contribute to this flexibility, and thus engineered ScCas9 to remove either insertion or both, and subjected these variants to the same screen. Only removing the loop (ScCas9 Δ367-376 or ScCas9 ΔLoop) extended the PAM of ScCas9 to 5'-NAG-3', with reduced specificity for C and G at position 2, while only removing the KQ insertion (ScCas9 Δ1337-1338 or ScCas9 ΔKQ) reverted its specificity to a more 5'-NGG-3'-like PAM, with reduced specificity for A at position 2. Finally, the most SpCas9-like variant, where both insertions are removed (ScCas9 Δ367-376 Δ1337-1338 or ScCas9 ΔLoop ΔKQ), expectedly reverted its specificity back to 5'-NGG-3'. Thus, from a functional perspective, these insertions operate in tandem to reduce the specificity of ScCas9 from the canonical 5'-NGG-3' PAM to a more minimal 5'-NNG-3'.

To confirm the results of the library assay and to rule out limiting downstream requirements, we decided to elucidate the minimal PAM requirements of ScCas9 by utilizing fixed PAM sequences. We replaced the PAM library with individual PAM sequences, which were varied at positions 2, 4, and 5 to test each possible base. Our results demonstrate that while ScCas9 exhibits no clear additional base dependence, with activity for all base iterations at each position, ScCas9 ΔLoop ΔKQ demonstrates significant binding at 5'-NGG-3' PAM sequences and at some, but not all, 5'-NNGNN-3' motifs, indicating an intermediate PAM specificity between that of SpCas9 and ScCas9.
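To put the difference between 5'-NGG-3' and 5'-NNG-3' in concrete terms, the sketch below matches candidate sequences against a PAM written in IUPAC notation and counts how many concrete 3-nt sequences each pattern admits. The names `pam_matches` and `count_compatible_pams` are illustrative and do not come from this work; the code table is the standard IUPAC nucleotide code.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_matches(pattern, sequence):
    """Return True if a concrete DNA sequence satisfies an IUPAC PAM pattern."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, sequence))


def count_compatible_pams(pattern):
    """Count concrete DNA sequences of len(pattern) that satisfy the pattern."""
    return sum(
        pam_matches(pattern, "".join(bases))
        for bases in product("ACGT", repeat=len(pattern))
    )


print(count_compatible_pams("NGG"))  # 4 of the 64 possible 3-nt PAMs
print(count_compatible_pams("NNG"))  # 16 of the 64 possible 3-nt PAMs
print(pam_matches("NNG", "TAG"), pam_matches("NGG", "TAG"))  # True False
```

Relaxing position 2 from G to N therefore quadruples the number of admissible 3-nt PAMs, which is the sense in which 5'-NNG-3' is "more minimal".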

3.2.3 Assessment of ScCas9 PAM Specificity in Human Cells

Figure 3.2 PAM determination of engineered ScCas9 variants. (A) PAM binding enrichment on a 5'-NNNNNNNN-3' (8N) PAM library. PAM profiles are represented by Sanger sequencing chromatograms via amplification of PAM region following plasmid extraction of GFP+ E. coli cells. (B) Examination of PAM preference for ScCas9. For individual PAMs, all four bases were iterated at a single position (2, 4, 5). Each PAM-containing plasmid was electroporated in duplicates, subjected to FACS analysis, and gated for GFP expression. Subsequently, GFP expression levels were averaged. SD was used to calculate error bars, and statistical significance analysis was conducted using a two-tailed Student's t test as compared to the negative control.

We compared the PAM specificity of ScCas9 to SpCas9 in human cells by co-transfecting HEK293T cells with plasmids expressing these variants along with sgRNAs directed to a native genomic locus (VEGFA) with varying PAM sequences. We first tested editing efficiency at a site containing an overlapping PAM (5'-GGGT-3'). At 48 hours post-transfection, gene modification rates, as detected by the T7E1 assay, demonstrated comparable editing activities of SpCas9, ScCas9, and ScCas9 ΔLoop ΔKQ (Figure 3.3). Additionally, we constructed sgRNAs to sites with various non-overlapping 5'-NNGN-3' PAM sequences. While SpCas9's cleavage activity was impaired at other non-5'-NGG-3' sequences [57], ScCas9 maintained comparable activity to that of SpCas9 on its 5'-NGG-3' target across all tested targets with 5'-NNGN-3' PAM sequences. Consistent with our bacterial data, ScCas9 ΔLoop ΔKQ was able to cleave at the 5'-NGG-3' target, along with significant activity on the 5'-NNGA-3' target, with reduced gene modification levels at all other 5'-NNGN-3' targets. Overall, these results verify that ScCas9 can serve as an effective alternative to SpCas9 for genome editing in mammalian cells, both at overlapping 5'-NGG-3' and more minimal 5'-NNGN-3' PAM sequences.
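The text does not spell out how the cleaved-band fraction from a T7E1 gel is converted into percent gene modification. A commonly used estimate is 100 × (1 − sqrt(1 − f)), where f is the cleaved fraction of total band intensity; the sketch below assumes that formula. The function name and the band intensities are hypothetical.

```python
import math


def t7e1_percent_modification(uncleaved, cleaved_bands):
    """Estimate percent gene modification from T7E1 gel band intensities.

    Assumes the commonly used relation 100 * (1 - sqrt(1 - f)), where f is
    the summed cleaved-band intensity divided by total intensity.
    """
    cleaved = sum(cleaved_bands)
    total = uncleaved + cleaved
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cleaved = cleaved / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Hypothetical intensities: one parental band and two cleavage products.
print(round(t7e1_percent_modification(75.0, [15.0, 10.0]), 1))  # 13.4
```

The square root reflects that, after denaturation and re-annealing, only heteroduplexes containing at least one modified strand are cut, so the cleaved fraction overstates neither nor equals the underlying indel frequency directly.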

We assessed the PAM specificity of ScCas9 base editors by using a synthetic Traffic Light Reporter (TLR) [14] plasmid, containing an early stop codon upstream of a GFP ORF and downstream of an mCherry ORF. Successful A→G base editing using the ABE(7.10) architecture, as described in Gaudelli, et al. [41], converts an early, in-frame TAG stop codon to a TGG tryptophan codon, thus restoring GFP expression. After gating cells based on mCherry expression, we observed significant base editing efficiency at all 5'-NNGN-3' target PAM sequences for ScCas9-ABE(7.10), as compared to the SpCas9-ABE(7.10) architecture, which only demonstrates significant A→G conversion on the standard 5'-NGG-3' and tolerated 5'-NAG-3' motifs in this assay (Figure 3.3).

3.2.4 Off-Target Analysis of ScCas9

Figure 3.3 ScCas9 PAM specificity in human cells. (A) T7E1 analysis of indels produced at VEGFA loci with indicated PAM sequences. The Cas9 used is indicated above each lane. All samples were performed in biological duplicates. As a background control, SpCas9, ScCas9, and ScCas9 ΔLoop ΔKQ were transfected without targeting guide RNA vectors [(−) guide control]. (B) Quantitative analysis of T7E1 products. Unprocessed gel images were quantified by line scan analysis using Fiji (41), the total intensity of cleaved bands was calculated as a fraction of total product, and percent gene modification was calculated. All samples were performed in duplicates, and quantified modification values were averaged. SD was used to calculate error bars, and statistical significance analysis was conducted using a two-tailed Student's t test as compared to the negative control. (C) ScCas9-mediated A→G base editing. GFP+ cells were calculated as a percentage of mCherry+ (RFP+) cells for indicated PAM sequences using the TLR (25) with an early stop codon. All samples were performed in duplicates, and quantified percentages were averaged. SD was used to calculate error bars, and statistical significance analysis was conducted using a two-tailed Student's t test. RFP+, red fluorescent protein-positive.

We next sought to evaluate the accuracy of this enzyme in comparison to SpCas9. We utilized previous genome-wide analysis of SpCas9 targeting accuracy to select three genomic targets (VEGFA site 3, FANCF site 2, and DNMT1 site 4) that possess multiple off-target sites on which SpCas9 demonstrates activity [137]. Each of these three sites additionally possesses a single off-target that has been particularly difficult to mediate via engineering of high-fidelity Cas9 variants [124, 71, 19]. We first analyzed ScCas9's activity on these off-targets. After co-transfection of sgRNAs to the three aforementioned sites alongside both SpCas9 and ScCas9, we amplified genomic DNA flanking both the on-target and difficult off-target sequences to assess their genome modification activities. Consistent with previously-reported data [19], SpCas9 demonstrated high off-to-on targeting on all three examined targets (Figure 3.4). ScCas9 demonstrated comparable on-target activities for the three targets, but exhibited negligible activity on the VEGFA site 3 and DNMT1 site 4 off-targets, and a nearly 1.5-fold decrease in off-to-on target ratio for FANCF site 2, suggesting improved accuracy over SpCas9 on overlapping 5'-NGG-3' targets.

To examine ScCas9's accuracy across its wider PAM targeting range, we utilized a mismatch tolerance assay [19] on target sequences with 5'-NAG-3', 5'-NCG-3', 5'-NGG-3', and 5'-NTG-3' PAMs. We generated sgRNAs containing both single and adjacent double mismatches at every other base along each of the four on-target crRNA sequences, and subsequently measured the genome modification efficiencies for these mismatched sgRNAs. Our results demonstrate that ScCas9 generally tolerates single mismatches better than double mismatches for each analyzed spacer position, and is similarly less likely to tolerate mismatches within the seed region of the crRNA, though with greater sensitivity than SpCas9. Across all of the four PAM targets, ScCas9 does, however, tolerate mismatches within the middle of the crRNA sequence, with highest efficiencies reported for the 5'-NTG-3' target. SpCas9 expectedly demonstrates negligible genome modification activity on the 5'-NCG-3' and 5'-NTG-3' targets, but weakly tolerates single and double mismatches across the entire crRNA sequence, with reduced tolerance in the seed region, for the standard 5'-NGG-3' target, corroborating previous mismatch tolerance studies [19]. Finally, ScCas9 exhibits a similar mismatch tolerance profile to SpCas9 on the 5'-NAG-3' target, albeit with a higher reported on-target efficiency.
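The design of such a mismatch panel can be sketched as follows: starting at every other spacer position, either one base or two adjacent bases are replaced. The text does not state which substitute base was used at each position, so this sketch assumes, purely for illustration, that each mismatched base is swapped for its complement; the function name `mismatched_spacers` is likewise hypothetical.

```python
# Assumption for illustration: a mismatch is introduced by swapping a base
# for its Watson-Crick complement (the actual substitution rule is not
# given in the text).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mismatched_spacers(spacer, width, step=2):
    """Return {1-based start position: spacer with `width` adjacent mismatches}.

    Start positions advance by `step`, so step=2 gives every other base.
    """
    variants = {}
    for start in range(0, len(spacer) - width + 1, step):
        mutated = spacer[start:start + width].translate(COMPLEMENT)
        variants[start + 1] = spacer[:start] + mutated + spacer[start + width:]
    return variants


spacer = "GGTGAGTGAGTGTGTGCGTG"  # VEGFA on-target spacer from Section 2.3.4
singles = mismatched_spacers(spacer, width=1)
doubles = mismatched_spacers(spacer, width=2)
print(len(singles), len(doubles))  # 10 10
print(singles[1])
print(doubles[19])
```

For a 20-nt spacer this yields ten single-mismatch and ten double-mismatch guides per target, starting at positions 1, 3, ..., 19.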

3.2.5 ScCas9 Genome Editing Capabilities

Finally, to establish ScCas9 as a useful genome editing tool, we evaluated its ability to modify a variety of gene targets for a handful of different PAM sequences. We constructed sgRNAs to 24 targets within 9 endogenous genes in HEK293T cells, and evaluated on-target gene modification utilizing the T7E1 assay. Our results demonstrate that ScCas9 maintains comparable efficiencies to that of SpCas9 on 5'-NGG-3' sequences, as well as on selected 5'-NNG-3' PAM targets (Figure 3.4), supporting our previous findings. SpCas9 expectedly performs efficiently on 5'-NGG-3' and weakly on 5'-NAG-3' targets, but demonstrates negligible editing capabilities on 5'-NCG-3' and 5'-NTG-3' PAM sequences, as previously demonstrated. Notably, ScCas9 performed less effectively on selected target sequences in the Hemoglobin subunit delta (HBD) gene, while demonstrating higher efficiencies on 5'-NNG-3' sequences in VEGFA and DNMT1, for example. Such variation in efficiency within each PAM group and across different genes indicates that proper target selection within specified genomic regions is critical for successful ScCas9-mediated gene modification.

We subsequently measured the efficacy of ScCas9 integrated within the BE3 [77] and ABE(7.10) [41] base editing architectures on endogenous genomic loci. To evaluate the efficiency of base editing activities, we developed a simple, easy-to-use Python program, termed the Base Editing Evaluation Program (BEEP), that takes as input both a negative control ab1 Sanger sequencing file and the edited sample ab1 file and outputs the efficiency of an indicated base conversion at a specific position (read 5' to 3') along the target sequence. BEEP analysis on ab1 files, following transfection of ScCas9 base editors, genomic amplification, and subsequent Sanger sequencing, demonstrates that ScCas9 is capable of mediating C→T and A→G base conversion at both overlapping 5'-NGG-3' and nonoverlapping 5'-NNG-3' PAM sequences (Figure 3.4). While ScCas9 base editors perform efficiently on the non-5'-NGG-3' targets, as compared to SpCas9, ScCas9 is less effective at editing 5'-NGG-3' genomic targets than SpCas9 for both architectures, indicating that further development is necessary for broad usage of ScCas9 base editors.
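The kind of calculation such a program performs can be sketched as below. This is not BEEP's implementation, which is not described here in detail; it is a minimal illustration that assumes the four trace peak heights at the position of interest have already been extracted from the control and edited ab1 files, and that efficiency is taken as the edited sample's fraction of signal in the product base minus the same fraction in the control. All names and numbers are hypothetical.

```python
def base_fraction(peaks, base):
    """Fraction of total trace signal at one position carried by `base`."""
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return peaks[base] / total


def editing_efficiency(control_peaks, edited_peaks, product_base):
    """Percent conversion to `product_base`, background-subtracted.

    Assumed definition (not taken from BEEP): edited-sample fraction of the
    product base minus the control fraction, floored at zero.
    """
    background = base_fraction(control_peaks, product_base)
    signal = base_fraction(edited_peaks, product_base)
    return max(0.0, 100.0 * (signal - background))


# Hypothetical peak heights at one target adenine for an A-to-G edit.
control = {"A": 980, "C": 5, "G": 10, "T": 5}
edited = {"A": 700, "C": 0, "G": 300, "T": 0}
print(round(editing_efficiency(control, edited, "G"), 1))  # 29.0
```

Subtracting the control fraction guards against counting baseline trace noise at the position as editing.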

Figure 3.4 (A) Quantitative analysis of T7E1 products for indicated genomic on- and off-target editing. All samples were performed in duplicates, and quantified modification values were averaged. SD was used to calculate error bars, and statistical significance analysis was conducted using a two-tailed Student's t test as compared to each negative control. Mismatched positions within the spacer sequence are highlighted in red. (B) Efficiency heatmap of mismatch tolerance assay. Quantified modification efficiencies, as assessed by the T7E1 assay, are exhibited for each labeled single or double mismatch in the sgRNA sequence for each indicated PAM. (C) Dot plot of on-target modification percentages at various gene targets for indicated PAM, as assessed by the T7E1 assay. Duplicate modification percentages were averaged. (D) Genomic base-editing characterization. For each indicated PAM, a representative Sanger sequencing chromatogram is shown, demonstrating the most efficiently edited base in the target sequence. Percent edited values, as quantified by BEEP in comparison to an unedited negative control, were averaged, and SD was subsequently calculated.

3.2.6 Investigation of Sequence Conservation Between S. canis and Other Streptococcus Cas9 Orthologs

To further investigate the distinguishing motif insertions in ScCas9, we inserted the loop (SpCas9 ::Loop), the KQ motif (SpCas9 ::KQ), or both (SpCas9 ::Loop ::KQ) into the SpCas9 ORF and analyzed binding on the 8N library using PAM-SCANR. Of these variants, only SpCas9 ::KQ showed target binding affinity in the PAM-SCANR assay. Sequencing on enriched GFP-expressing cells demonstrated an unaffected preference for 5'-NGG-3' (Figure 3.5). FACS analysis on a fixed 5'-TGG-3' PAM confirmed these binding profiles, with SpCas9 ::KQ yielding half the fraction of GFP-positive cells compared to SpCas9. This data, in conjunction with the binding profiles of ScCas9 variants, suggests that while these insertions within ScCas9 do distinguish its PAM preference from SpCas9, other sequence features of ScCas9 also contribute to its divergence.

S. canis has been reported to infect dogs, cats, cows, and humans, and has been implicated as an adjacent evolutionary neighbor of S. pyogenes, as evidenced by various phylogenetic analyses [86, 113, 112]. In addition to sharing common hosts, we identified S. canis CRISPR spacers that map to phage lysogens in S. pyogenes genomes, which suggests they are overlapping viral hosts as well. This close evolutionary relationship has manifested itself in the sequence homology of ScCas9 and SpCas9, amongst other orthologous genes, predicted to be a result of lateral gene transfer (LGT) [86, 113, 112]. Nonetheless, from the alignment of SpCas9 and ScCas9, the first 1240 positions score with 93.5% similarity and the last 144 positions score with 52.8%. To account for the exceptional divergence in the PAM-interacting domain (PID) at the C-terminus of ScCas9 as well as the positive-charged inserted loop, we focused on alignment of the distinguishing sequences of ScCas9 to other Streptococcus Cas9 orthologs. Notably, the loop motif is present in certain orthologs, such as those from S. gordonii, S. anginosus, and S. intermedius, while the ScCas9 PID is mostly composed of disjoint sequences from other orthologs, such as those from S. phocae, S. varani, and S. equinus. Additional LGT events between these orthologs, as opposed to isolated divergence, more likely explain the differences between ScCas9 and SpCas9. Our demonstration that two insertion motifs in ScCas9 alter PAM preferences, yet do not abolish PAM binding when removed, suggests other functional evolutionary intermediates in the formation of effective PAM preferences.
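The region-by-region comparison quoted above (the first 1240 alignment positions versus the last 144) amounts to scoring separate windows of one pairwise alignment. The sketch below shows the idea using plain percent identity, which is a simpler stand-in for the BLOSUM62-based similarity reported in the text; the aligned sequences are short invented strings, not SpCas9 or ScCas9, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, start=0, end=None):
    """Percent of alignment columns in [start, end) with identical residues.

    Both inputs are rows of the same pairwise alignment ('-' marks a gap).
    Columns containing a gap count as non-identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a, aligned_b))[start:end]
    if not columns:
        raise ValueError("empty alignment window")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Invented toy alignment with a conserved start and a divergent end.
row_a = "MKRNYILGLDIGITSV"
row_b = "MKKNYILGL--GVTSV"
print(percent_identity(row_a, row_b, 0, 10))        # first 10 columns
print(round(percent_identity(row_a, row_b, 10), 1))  # remaining columns
```

Scoring windows separately makes a locally divergent region, such as a C-terminal domain, visible even when the whole-alignment score is high.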

SpCas9-NG and xCas9-3.7 both harbor various substitutions in their open reading frames (ORFs) that allow reduced specificity from the canonical 5'-NGG-3' to the more minimal 5'-NGN-3' PAM. Specifically, positions 1218-1219 for both enzymes have been shown to be the most consequential in terms of PAM recognition [86, 57]. To engineer ScCas9 to possess improved PAM targeting capabilities, we performed global pairwise alignments using the BLOSUM62 scoring matrix [14] of various Streptococcus Cas9 orthologs to SpCas9, xCas9-3.7, and SpCas9-NG at these critical residues. Our sequence alignment isolated a positive-charged lysine residue, derived from the S. gordonii Cas9 ORF. Substituting positive-charged residues into the PAM-interacting domain (PID) of Cas enzymes has been suggested to allow for the formation of novel PAM-proximal DNA contacts [102]. Motivated by this finding, we thus substituted the corresponding T1227K mutation into the ORF of ScCas9, generating ScCas9+ (Sc+) (Figure 3.6).

One of the defining characteristics of ScCas9's PAM flexibility is its employment of a positive-charged loop, in positions 367 to 376 of its ORF, which does not exist in SpCas9 or its engineered variants [3]. Our sequence alignments identified a divergent insertion from S. anginosus, which not only maintains the positive charge of the ScCas9 loop by compensating an extra lysine residue for a histidine, but also possesses an "SG" motif, a flexible sequence of residues used for linker design in protein engineering [41]. We therefore hypothesized that this novel loop may improve the targeting capabilities and efficiency of ScCas9 by allowing for more flexible protein-phosphate backbone contacts with the PAM sequence. Thus, we substituted the loop sequence from S. anginosus into the Sc+ ORF to generate ScCas9++ (Sc++).

We compared the PAM specificities and nucleolytic capabilities of Sc+ and Sc++ to SpCas9, xCas9-3.7, SpCas9-NG, and ScCas9 by transfecting HEK293T cells with plasmids expressing each variant individually alongside one of 16 sgRNAs, together directed to four genomic loci with diverse PAM sequences, collectively representing every base at each position in the PAM window. The sgRNA sequences were shifted by one base for xCas9-3.7 and SpCas9-NG to account for their reported 5'-NGN-3' PAM preferences, so as to compare these enzymes equivalently to ScCas9 variants with 5'-NNG-3' specificities. At 5 days post-transfection, indel formation was quantified from Sanger sequencing ab1 files using the TIDE algorithm [124] following PCR amplification of the target genomic region. Our results demonstrate that Sc+ and Sc++ can effectively edit across the various genomic loci, and demonstrate improved indel formation percentages for a majority of the targets tested (Figure 3.6). SpCas9, xCas9-3.7, and SpCas9-NG all edit on "GG" PAM targets, and maintain activity on various 5'-AGN-3' PAM sequences. xCas9-3.7 and SpCas9-NG additionally edit a few sites that harbor 5'-CGN-3' and 5'-TGN-3' sequences, but perform poorly on all tested 5'-NGC-3' PAM targets, consistent with previously reported data [4, 86, 23, 85, 57]. Sc+ and Sc++, on the other hand, improve greatly upon the editing capabilities of the wild-type ScCas9 enzyme, demonstrating improvement in indel formation efficiency on certain 5'-NNGC-3' targets, and even editing sites at which ScCas9, xCas9-3.7, and SpCas9-NG have negligible activity. We were also able to carry out base-editing on human genomic regions lacking "GG" dinucleotides by integrating Sc++ into the BE3 architecture.
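The one-base shift between guides for 5'-NNG-3' and 5'-NGN-3' enzymes follows from where the shared G sits within each PAM: it is the third PAM base for NNG and the second for NGN. A minimal sketch of that bookkeeping, using an invented sequence and a hypothetical helper name, is:

```python
def spacer_and_pam(seq, g_index, g_offset_in_pam, pam_len=3, spacer_len=20):
    """Return (spacer, PAM) for a target whose PAM contains the G at g_index.

    g_offset_in_pam is the 0-based position of that G within the PAM:
    2 for 5'-NNG-3' enzymes, 1 for 5'-NGN-3' enzymes.
    """
    if seq[g_index] != "G":
        raise ValueError("g_index must point at a G")
    pam_start = g_index - g_offset_in_pam
    spacer_start = pam_start - spacer_len
    if spacer_start < 0 or pam_start + pam_len > len(seq):
        raise ValueError("not enough flanking sequence around the G")
    return seq[spacer_start:pam_start], seq[pam_start:pam_start + pam_len]


# Invented 28-nt sequence with the G of interest at 0-based index 24.
locus = "ATCGATCGTTAGCCTAGGACTTCAGTAC"
print(spacer_and_pam(locus, 24, g_offset_in_pam=2))  # NNG-type enzyme
print(spacer_and_pam(locus, 24, g_offset_in_pam=1))  # NGN-type enzyme
```

Both guides are anchored on the same G, so the two enzyme classes are compared at effectively the same genomic site, with spacers offset by a single base.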

3.2.7 Genus-wide Prediction of Divergent Streptococcus Cas9 PAMs

Figure 3.5 (A) PAM binding enrichment on a 5'-NNNNNNNN-3' PAM library of ScCas9-like SpCas9 variants. The PAM-SCANR screen (23) was applied to variants of SpCas9 containing the loop, KQ insertions, or both. SpCas9 ::Loop and SpCas9 ::Loop ::KQ failed to demonstrate PAM binding and thus GFP expression. (B) FACS analysis of binding at 5'-NGG-3' PAM. All samples were performed in duplicates and averaged. SD was used to calculate error bars. (C) Sequence conservation of Streptococcus orthologs with ScCas9 as a reference. Each ortholog is referred to by its UniProt ID (16). The loop (367 to 376) and KQ (1337 and 1338) insertion alignments are indicated.

Figure 3.6 (A) Amino acid sequence of Sc++. SpCas9, SpCas9-NG, xCas9-3.7, and ScCas9 were aligned with various Streptococcus Cas9 orthologs, employing the BLOSUM62 scoring matrix, to identify the T1227K mutation derived from Streptococcus gordonii. Sequence alignment of ScCas9 with various Streptococcus Cas9 orthologs further isolated a novel loop structure from Streptococcus anginosus harboring an additional lysine residue and a flexible SG motif. (B) PAM binding analysis of single G PAM Cas9 variants on a 5'-NNNNNNNN-3' (8N) PAM library. Each dCas9 plasmid was electroporated in duplicates, subjected to FACS analysis, and gated for GFP expression. Subsequently, percentages of GFP-positive cells were averaged. Standard deviation was used to calculate error bars. (C) PAM binding enrichment visualization. PAM profiles are represented by DNA chromatograms via amplification of PAM region following plasmid extraction of GFP-positive E. coli cells and subsequent Sanger sequencing.

Demonstrations of efficient genome editing by Cas9 nucleases with distinct PAM specificity from several Streptococcus species, including S. canis, motivated us to develop a bioinformatics pipeline for discovering additional Cas9 proteins with novel PAM requirements in the Streptococcus genus. We call this method the Search for PAMs by ALignment Of Targets (SPAMALOT). Briefly, we mapped a 20 nt portion of spacers flanked by known Streptococcus repeat sequences to candidate protospacers that align with no more than two mismatches in phages associated with the genus [122]. We grouped 12 nt protospacer 3'-adjacent sequences from each alignment by genome and CRISPR repeat, and then generated group WebLogos [23] to compute presumed PAM features. The resulting WebLogos accurately reflect the known PAM specificities of Cas9 from S. canis (this work), S. pyogenes, S. thermophilus, and S. mutans (Figure 3.7) [129, 99, 36]. We identified a notable diversity in the WebLogo plots derived from various S. thermophilus cassettes with common repeat sequences, each of which could originate from any other such S. thermophilus WebLogo upon subtle specificity changes that traverse intermediate WebLogos among them. We observe a similar relationship between two S. oralis WebLogos that also share this repeat, as well as unique putative PAM specificities associated with CRISPR cassettes containing S. mutans-like repeats from the S. oralis, S. equinus, and S. pseudopneumoniae genomes.
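The mapping step of this pipeline can be illustrated with a small sketch: slide a 20 nt spacer along a phage sequence, keep windows with at most two mismatches, and record the 12 nt immediately 3' of each hit. This is a simplified stand-in and not the SPAMALOT code itself; it scans only the given strand, uses a naive sliding window in place of a real aligner, and all sequences and names are invented.

```python
def map_spacer(spacer, genome, max_mismatches=2, flank_len=12):
    """Find near-matches of `spacer` in `genome` and return their 3' flanks.

    Returns a list of (start, mismatches, flank) tuples, where start is the
    0-based index of the candidate protospacer. Only the given strand is
    scanned, and hits without a full-length flank are skipped.
    """
    hits = []
    n = len(spacer)
    for start in range(len(genome) - n - flank_len + 1):
        window = genome[start:start + n]
        mismatches = sum(a != b for a, b in zip(spacer, window))
        if mismatches <= max_mismatches:
            flank = genome[start + n:start + n + flank_len]
            hits.append((start, mismatches, flank))
    return hits


# Invented example: the protospacer differs from the spacer at one position.
toy_spacer = "ACGTTGCAAGCTTAGGCATC"
toy_phage = "TT" + "ACGTTGCAAGCTTAGGCTTC" + "TGGAATCCGTAC" + "AA"
print(map_spacer(toy_spacer, toy_phage))
```

Flanks collected this way, grouped by genome and CRISPR repeat, would then be the input to a sequence logo for each group.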

3.3 Discussion

As the growth and development of CRISPR technologies continue, the range of targetable sequences remains limited by the requirement for a PAM sequence flanking a given target site. While significant discovery and engineering efforts have been undertaken to expand this range [109, 32, 70, 54, 50, 72, 39], there are still only a handful of CRISPR endonucleases with minimal specificity requirements. Here, we have developed an analogous platform for genome editing using the Cas9 from Streptococcus canis, a highly-similar SpCas9 ortholog with affinity to minimal 5'-NNG-3' PAM sequences.

Established PAM engineering methods, such as random mutagenesis and directed evolution, can only generate substitution mutations in protein coding sequences. In fact, during the preparation of this manuscript, another group utilized phage-assisted continuous evolution (PACE) [31] to evolve an SpCas9 variant, xCas9(3.7), with preference for various 5'-NG-3' PAM sequences [58]. An alternative approach consists of inserting or removing motifs with specific properties, which may provide a sequence search space that more common mutagenic techniques cannot directly access. Here, we demonstrate an evolutionary example of this method with ScCas9, whose sequence disparities with SpCas9 include two divergent motifs that contribute to its minimal PAM sequence. Engineered variants lacking these motifs exhibit more stringent PAM specificities in our PAM determination assays, and the removal of both motifs reverts its PAM specificity back to a more 5'-NGG-3'-like preference. While minimal inconsistencies in PAM preference between the utilized assays may arise from PAM-dependent allosteric changes that drive DNA cleavage [4], the PAM flexibility of ScCas9, as compared to SpCas9, remains consistent in all tested contexts.

Figure 3.7 Spacer sequences found within the type II CRISPR cassettes associated with Cas9 ORFs from specified Streptococcus genomes were aligned to Streptococcus phage genomes to generate spacer-protospacer mappings. WebLogos (22), labeled with the relevant species, genome, and CRISPR repeat, were generated for sequences found at the 3' end of candidate protospacer targets with no more than two mismatches (2 mm). (A) PAM predictions for experimentally validated Cas9 PAM sequences in previous studies. (B) Novel PAM predictions of alternate S. thermophilus Cas9 orthologs with putative divergent specificities. (C) Novel PAM predictions of uncharacterized Streptococcus orthologs with distinct specificities.

To date, there are limited open-source tools or platforms specifically for the prediction of PAM sequences, though prior studies have conducted internal bioinformatics-based characterizations prior to experimental validation [109, 32, 70, 50, 19, 53]. Here, we have established SPAMALOT as an accessible resource that we share with the community for application to CRISPR cassettes from other genera. Future development will include broadening the scope of candidate targets beyond genus-associated phage to capture additional sequences that could be beneficial targets, such as lysogens in species that host the same phage. We hope that this pipeline can be utilized to more efficiently validate and engineer PAM specificities that expand the targeting range of CRISPR, especially for strictly PAM-constrained technologies such as base editing [41, 77] and homology repair induction [114].

Finally, because ScCas9 does not require any alterations to the sgRNA of SpCas9, and due to its significant sequence homology with SpCas9, we presume that identical modifications from previous studies [124, 71, 19] can be made to increase the accuracy and efficiency of the endonuclease and its variants, although it already demonstrates potential improved on-to-off activity as compared to the standard SpCas9 on 5'-NGG-3' targets. Additionally, while we have exhaustively evaluated the PAM specificity of ScCas9 on multiple targets in a variety of genome editing contexts, we do not rule out the possibility that there may exist untested 5'-NNG-3' genomic targets on which ScCas9 does not possess significant activity. Used together with SpCas9 and xCas9(3.7), however, ScCas9 expands the target range of currently-used Cas9 enzymes for genome editing purposes. With further development, we anticipate that this broadened Streptococcus Cas9 toolkit, containing both ScCas9 and additional, uncharacterized orthologs with expanded targeting range, will enhance the current set of CRISPR technologies.

3.4 Materials and Methods

3.4.1 Identification of Cas9 Homologs and Generation of Plasmids

The UniProt database [22] was mined for all Streptococcus Cas9 protein sequences, which were used as inputs to either the BioPython pairwise2 module or Geneious to conduct global pairwise alignments with SpCas9, using the BLOSUM62 scoring matrix [52], and subsequently calculate percent homology. The Cas9 from Streptococcus canis was codon optimized for E. coli, ordered as multiple gBlocks from Integrated DNA Technologies (IDT), and assembled using Golden Gate Assembly. The pSF-EF1-Alpha-Cas9WT-EMCV-Puro (OG3569) plasmid for human expression of SpCas9 was purchased from Oxford Genetics, and the ORFs of Cas9 variants were individually amplified by PCR to generate 35 bp extensions for subsequent Gibson Assembly into the OG3569 backbone. Engineering of the coding sequence of ScCas9 and SpCas9 for removal or insertion of motifs was conducted using either the Q5 Site-Directed Mutagenesis Kit (NEB) or Gibson Assembly. To generate ScCas9 base editing plasmids, pCMV-ABE(7.10) (Addgene plasmid #102919) and pCMV-BE3 (Addgene plasmid #73021) were received as gifts from David Liu. Similarly, the ORF of the ScCas9 D10A nickase was amplified by PCR to generate 35 bp extensions for subsequent Gibson Assembly into each base editing architecture backbone. sgRNA plasmids were constructed by annealing oligonucleotides coding for crRNA sequences as well as 4 bp overhangs, and subsequently performing a T4 DNA Ligase-mediated ligation reaction into a plasmid backbone immediately downstream of the human U6 promoter sequence. Assembled constructs were transformed into 50 µL NEB Turbo Competent E. coli cells, and plated onto LB agar supplemented with the appropriate antibiotic for subsequent sequence verification of colonies and plasmid purification.

3.4.2 PAM-SCANR Assay

Plasmids for the SpCas9 sgRNA and PAM-SCANR genetic circuit, as well as BW25113 ΔlacI cells, were generously provided by the Beisel Lab (North Carolina State University). Plasmid libraries containing the target sequence followed by either a fully-randomized 8-bp 5'-NNNNNNNN-3' library or fixed PAM sequences were constructed by conducting site-directed mutagenesis, utilizing the KLD enzyme mix (NEB) after plasmid amplification, on the PAM-SCANR plasmid flanking the protospacer sequence (5'-CGAAAGGTTTTGCACTCGAC-3'). Nuclease-deficient mutations (D10A and H850A) were introduced to the ScCas9 variants using Gibson Assembly as previously described. The provided BW25113 cells were made electrocompetent using standard glycerol wash and resuspension protocols. The PAM library and sgRNA plasmids, with resistance to kanamycin (Kan) and carbenicillin (Crb) respectively, were co-electroporated into the electrocompetent cells at 2.4 kV, outgrown, and recovered in Kan+Crb Luria Broth (LB) media overnight. The outgrowth was diluted 1:100, grown to ABS600 of 0.6 in Kan+Crb LB liquid media, and made electrocompetent. Indicated dCas9 plasmids, with resistance to chloramphenicol (Chl), were electroporated in duplicates into the electrocompetent cells harboring both the PAM library and sgRNA plasmids, outgrown, and collected in 5 mL Kan+Crb+Chl LB media. Overnight cultures were diluted to an ABS600 of 0.01 and cultured to an OD600 of 0.2. Cultures were analyzed and sorted on a FACSAria machine (Becton Dickinson). Events were gated based on forward scatter and side scatter and fluorescence was measured in the FITC channel (488 nm laser for excitation, 530/30 filter for detection), with at least 30,000 gated events for data analysis. Sorted GFP-positive cells were grown to sufficient density, and plasmids from the pre-sorted and sorted populations were then isolated, and the region flanking the nucleotide library was PCR amplified and submitted for Sanger sequencing (Genewiz). Bacteria harboring non-library PAM plasmids, performed in duplicates, were analyzed by FACS following electroporation and overnight incubation, and represented as the percent of GFP-positive cells in the population, utilizing standard deviation to calculate error bars. Additional details on the PAM-SCANR assay can be found in Leenay, et al. [85].

3.4.3 Cell Culture and Gene Modification Analysis

HEK293T cells were maintained in DMEM supplemented with 100 units/ml penicillin, 100 mg/ml streptomycin, and 10% fetal bovine serum (FBS). sgRNA plasmid (500 ng) and effector (nuclease, BE3, or ABE(7.10)) plasmid (500 ng) were transfected into cells as duplicates (2 × 10^5/well in a 24-well plate) with Lipofectamine 2000 (Invitrogen) in Opti-MEM (Gibco). At 48 hours post-transfection, genomic DNA was extracted using QuickExtract Solution (Epicentre), and genomic loci were amplified by PCR utilizing the KAPA HiFi HotStart ReadyMix (Kapa Biosystems). For base editing analysis, amplicons were purified and submitted for Sanger sequencing (Genewiz). For indel analysis, the T7E1 reaction was conducted according to the manufacturer's instructions and equal volumes of products were analyzed on a 2% agarose gel stained with SYBR Safe (Thermo Fisher Scientific). Unprocessed gel image files were analyzed in Fiji [118]. The cleaved bands of interest were isolated using the rectangle tool, and the areas under the corresponding peaks were measured and calculated as the fraction cleaved of the total product. Percent gene modification was calculated as follows [46]:

% gene modification = 100 × (1 - (1 - fraction cleaved)^(1/2))

All samples were performed in duplicates and percent gene modifications were averaged. Standard deviation was used to calculate error bars.
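The calculation can be sketched in Python as follows, assuming the commonly used T7E1 relation in which the square root of the uncleaved fraction is taken before subtracting from one. The function name and the band areas are hypothetical, and the areas are assumed to be background-subtracted peak areas exported from the gel analysis.

```python
def percent_gene_modification(cleaved_area, uncleaved_area):
    """Estimate percent gene modification from T7E1 gel band areas.

    Uses: 100 * (1 - sqrt(1 - fraction_cleaved)), where fraction_cleaved is
    the cleaved band area divided by the total (cleaved + uncleaved) area.
    """
    total = cleaved_area + uncleaved_area
    if total <= 0:
        raise ValueError("total band area must be positive")
    fraction_cleaved = cleaved_area / total
    return 100.0 * (1.0 - (1.0 - fraction_cleaved) ** 0.5)


# Hypothetical areas: 36% of the product is cleaved, giving about 20%.
example = percent_gene_modification(cleaved_area=36.0, uncleaved_area=64.0)
```

Duplicate wells would each be passed through this function before averaging, so that the reported mean and standard deviation refer to percent modification rather than to raw band areas.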

3.4.4 Base Editing Analysis with Traffic Light Reporter

HEK293T cells were maintained as previously described, and transfected with the corresponding sgRNA plasmids (333 ng), ABE7.10 plasmids (333 ng), and synthetically constructed TLR plasmids (333 ng) as duplicates (2 × 10^5/well in a 24-well plate) with Lipofectamine 2000 (Invitrogen) in Opti-MEM (Gibco). At 5 days post-transfection, cells were harvested and analyzed on a FACSCelesta machine (Becton Dickinson) for mCherry (561 nm laser excitation, 610/20 filter for detection) and GFP (488 nm laser excitation, 530/30 filter for detection) fluorescence. Cells expressing mCherry were gated and the percentage of GFP-positive cells within that subset was calculated. All samples were performed in duplicates and percentage values were averaged. Standard deviation was used to calculate error bars. The TLR spacer sequence is 5'-TTCTGTAGTCGACGGTACCG-3'.

3.4.5 Base Editing Evaluation Program

The Base Editing Evaluation Program (BEEP) was written in Python, employing the pandas data manipulation library and BioPython package. As inputs, the program requires a sample ab1 file, a negative control ab1 file, a target sequence, as well as the position of the specified base conversion, either handled as a .csv file for multiple sample analysis or for individual samples on the command line. Briefly, the provided target sequences are aligned to the base-calls of each input ab1 file to determine the absolute position of the target within the file. Subsequently, the peak values for each base at the indicated position in the spacer are obtained, and the editing percentage of the specified base conversion is calculated. Finally, a separate function normalizes the editing percentage to that of the negative control ab1 file to account for background signals of each base. The final base conversion percentage is outputted to the same .csv file for downstream analysis. The BEEP software can be downloaded at https://github.com/mitmedialab/BEEP.
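The last two steps can be illustrated with a minimal Python sketch that starts from peak heights already extracted at the target position. This is not the BEEP source code: the function names and peak values are hypothetical, and normalization is assumed here to be a simple subtraction of the negative-control percentage, which may differ from what BEEP actually does.

```python
def base_percentage(peaks, base):
    """Percent of total trace signal at one position attributed to `base`.

    `peaks` maps each of A, C, G, T to its peak height at that position.
    """
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return 100.0 * peaks[base] / total


def normalized_editing(sample_peaks, control_peaks, converted_base):
    """Editing percentage for `converted_base`, corrected for control background.

    Assumption: background is removed by subtracting the negative-control
    percentage and clamping at zero.
    """
    sample_pct = base_percentage(sample_peaks, converted_base)
    background_pct = base_percentage(control_peaks, converted_base)
    return max(0.0, sample_pct - background_pct)


# Hypothetical peak heights at a target cytosine for a C-to-T conversion.
sample = {"A": 10, "C": 600, "G": 5, "T": 385}
control = {"A": 10, "C": 960, "G": 10, "T": 20}
edited = normalized_editing(sample, control, "T")
```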

3.4.6 SPAMALOT Pipeline

All 11,440 Streptococcus bacterial and 53 Streptococcus-associated phage genomes were downloaded from NCBI. CRISPR repeats catalogued for the genus were downloaded from CRISPRdb hosted by University of Paris-Sud [45]. For each genome, spacers upstream of a specific repeat sequence were collected with a toolchain built around the fast and memory-efficient Bowtie 2 aligner [80]. Each genome- and repeat-type-specific collection of spacers was then matched to all phage genomes using the original Bowtie short-sequence alignment tool [81] to identify candidate protospacers with zero, at most one, or at most two mismatches. Unique candidates were input into the WebLogo 3 [23] command line tool for prediction of PAM features. The SPAMALOT software can be downloaded at https://github.com/mitmedialab/SPAMALOT.
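The spacer-to-protospacer matching step can be sketched in plain Python as a brute-force scan. This is only a stand-in for the Bowtie-based search and is far slower on real genomes. The function names, the toy phage sequence, and the single-strand search are simplifying assumptions.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_protospacers(spacer, genome, max_mismatches=2, flank_len=12):
    """Scan the forward strand of `genome` for near-matches to `spacer`.

    Returns (start, mismatches, flank) tuples, where `flank` is the
    `flank_len` bases immediately 3' of the candidate protospacer.
    Hits too close to the end to yield a full flank are skipped.
    """
    hits = []
    n = len(spacer)
    for start in range(len(genome) - n - flank_len + 1):
        mismatches = hamming(spacer, genome[start:start + n])
        if mismatches <= max_mismatches:
            flank = genome[start + n:start + n + flank_len]
            hits.append((start, mismatches, flank))
    return hits


# Toy example: a target with one mismatch relative to the spacer, embedded
# in a hypothetical phage sequence and followed by a 12 nt flank.
spacer = "CGAAAGGTTTTGCACTCGAC"
target = "CGAAAGGTTTTGCACTCGAT"
phage = "TTT" + target + "AGGTACCATGCA" + "CC"
hits = find_protospacers(spacer, phage)
```

The collected flanks, grouped by genome and repeat, are what would then be passed to a logo generator to suggest PAM features.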

3.4.7 Statistical Analysis

Data are shown as mean ± s.d. unless stated otherwise. Statistical analysis was performed using the two-tailed Student's t-test, utilizing the SciPy software package. Calculated p-values, as compared to the negative control, are represented as follows: *P<0.05, **P<0.01, ***P<0.001, and ****P<0.0001. Data were plotted using Matplotlib.
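For illustration, the reporting conventions above can be expressed in a short Python sketch that uses only the standard library. The function names are hypothetical, the "ns" label for non-significant values is an assumption not stated in the text, and the sample (n-1) standard deviation is assumed. The p-value itself would be obtained separately, for example from a two-tailed t-test in SciPy.

```python
from statistics import mean, stdev


def summarize(replicates):
    """Return (mean, sample standard deviation) for replicate measurements."""
    return mean(replicates), stdev(replicates)


def significance_stars(p_value):
    """Map a p-value to the star notation used in the figures.

    Thresholds follow the text: * < 0.05, ** < 0.01, *** < 0.001,
    **** < 0.0001. Returning "ns" otherwise is an assumed convention.
    """
    thresholds = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]
    for cutoff, stars in thresholds:
        if p_value < cutoff:
            return stars
    return "ns"


# Hypothetical duplicate measurements (percent gene modification).
avg, sd = summarize([10.0, 14.0])
label = significance_stars(0.005)
```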

Chapter 4

A Cas9 with Complete PAM Recognition for Adenine Dinucleotides from Streptococcus macacae

This chapter is adapted from Jakimo*, Chatterjee*, Nip*, and Jacobson. A Cas9 with Complete PAM Recognition for Adenine Dinucleotides. bioRxiv (2018) *Equal contribution

[Figure 4.1 panels: (A) Dinucleotide Content; (B) Trinucleotide Content]

Figure 4.1 (A) Dinucleotide and (B) Trinucleotide occurrences in the human reference genome GRCh38. Tallies were carried out using the compseq EMBOSS command line software tool. Dashed gray lines mark what the expected percentages would be for a uniform representation of all sequences of length 2 or 3.
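The kind of tally shown in Figure 4.1 can be sketched in Python as below. This is not the compseq tool: the function names are hypothetical, overlapping windows on a single strand are assumed, and windows containing ambiguous bases such as N are skipped.

```python
from collections import Counter


def kmer_percentages(sequence, k):
    """Percent occurrence of each k-mer over all overlapping windows.

    Windows containing characters other than A, C, G, T are ignored.
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: 100.0 * n / total for kmer, n in counts.items()}


def uniform_expectation(k):
    """Expected percent for each k-mer if all 4**k sequences were equally common."""
    return 100.0 / 4 ** k


# Toy sequence; a real tally would stream over each chromosome of GRCh38.
toy = kmer_percentages("AAAT", 2)
```

Under a uniform model each dinucleotide would appear at 6.25% and each trinucleotide at about 1.56%, which corresponds to the dashed reference lines described in the caption.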

4.1 Introduction

Biotechnologies based on CRISPR systems have enabled extensively precise and programmable genomic interfacing.[76] However, CRISPR-associated (Cas) enzymes are also collectively restrained from localizing to any position along DNA.[97, 129, 84] Current gaps in the sequences they are known to recognize prevent access to numerous genomic positions for powerful methods like base editing, which can only operate on a narrow window of nucleotides at fixed distances from a PAM.[77] Many AT-rich regions, in particular, have been excluded from compelling CRISPR applications because previously reported Cas9 and Cas12a endonucleases require targets to neighbor GC-content or more restrictive motifs, respectively.[150, 64, 147]

In this work, we introduce a Cas9 ortholog derived from Streptococcus macacae NCTC 11558 that can instead recognize a short 5'-NAA-3' PAM.[112] These sequences constitute 18.6% of the human genome, making adjacent adenines the most abundant dinucleotide (Figure 4.1). The importance of this alternative PAM recognition for a Cas9 is reinforced by recent work revealing that while many Cas12a (formerly known as Cpf1) orthologs have AT-rich PAM sequences and are highly accurate nucleases on double-stranded DNA (dsDNA), they will also indiscriminately digest single-stranded DNA (ssDNA) when bound to their targets.[18, 71] Such collateral activity may introduce unwanted risks around partially unpaired chromosomal structures, such as transcription bubbles, R-loops, and replication forks. Here we characterize the specificity and utility of engineered nucleases derived from Smac Cas9, by means of transcriptional repression in bacterial culture, in vitro digestion reactions, and both gene and base editing in a human cell line.

Figure 4.2 (A) Sequence alignment (Genewiz software) of Spy Cas9, its QQR variant, and Smac Cas9. The step in the underlining red line marks the joining of Spy Cas9 and Smac Cas9 to construct a Spy-mac Cas9 hybrid. The sequence logo (Weblogo online tool) immediately below the alignment depicts the conservation at 11 positions around the PAM-contacting arginines of Spy Cas9. (B) The domain organization of Spy Cas9 juxtaposed over a color-coded structure of RNA-guided, target-bound Spy Cas9 (PDB ID 5F9R). The two DNA strands are black with the exception of a magenta segment corresponding to the PAM. A blue-green-red color map is used for labeling the Cas9 PI domain and guide spacer sequence to highlight structures that confer sequence specificity and the prevalence of intra-domain contacts within the PI.[42] (C) A sequence logo generated online (WebLogo) that was input with putative PAM sequences found in Streptococcus phage and associated with close Smac Cas9 homologs.

To modify the ancestral 5'-NGG-3' PAM specificity of Spy Cas9, early and new reports have employed directed evolution (e.g., "VQR", "EQR", and "VRER" variants) and rational design informed by crystal structure (e.g., "QQR" and "NG" variants).[4, 73, 72, 103] These reports focused on the PAM-contacting arginine residues R1333 and R1335 that abolish function when exclusively mutated. While those studies could identify compensatory mutations resulting in altered PAM specificity, the Cas9 variants that they produced maintained a guanine preference in at least one position of the PAM sequence. We aimed to lift such GC-content prerequisites via a bioinformatics-driven strategy that mines natural PAM diversity in the Streptococcus genus. We then homed in on Smac Cas9 as having the potential to bear novel PAM specificity upon aligning 115 orthologs of Spy Cas9 from UniProt (limited to those with greater than a 70% pairwise BLOSUM62 score). From the alignment we found Smac Cas9 was one of two close homologs, along with a Streptococcus mutans B112SM-A Cas9 (Smut Cas9), with divergence at both of the positions aligned to the otherwise highly conserved PAM-contacting arginines (Figure 4.2). We thus hypothesized that Smac Cas9 had naturally co-evolved the necessary compensatory mutations to gain new PAM recognition. A small sample size of 13 spacers from its corresponding genome's CRISPR cassette prevented us from confidently inferring the Smac Cas9 PAM in silico. However, the possibility for Smac Cas9 requiring less GC-content in its PAM was supported by sequence similarities to the "QQR" variant that has 5'-NAAG-3' specificity, in addition to phage-originating spacers in cassettes associated with Smut Cas9, which were identified with the aid of our custom computational pipeline of scripts called SPAMALOT (Figure 4.3).[17]

We proceeded to experimentally assay the PAM preferences of several Streptococcus orthologs that change one or both of the critical PAM-contacts. Based on demonstrated examples of the PAM-interaction (PI) domain and guide RNA (gRNA) having cross-compatibility between Cas9 orthologs that are closely related and active, we constructed new variants by rationally exchanging the PI region of catalytically-"dead" Spy Cas9 (Spy dCas9) with those of the selected orthologs (Figure 4.4).[102, 11] Assembled variants, including Spy-mac dCas9, were separately co-transformed into E. coli cells, along with guide RNA derived from S. pyogenes and an 8-mer PAM library of uniform base representation in the PAM-SCANR genetic circuit, established by others.[85] The circuit usefully up-regulates a green fluorescent protein (GFP) reporter in proportion to PAM-binding strength. Therefore, we collected the GFP-positive cell populations by flow cytometry and Sanger sequenced them around the site of the PAM to determine position-wise base preferences in a corresponding variant's PAM recognition. Spy-mac dCas9, more so than Spy-mut dCas9, generated a trace profile that was most consistent with guanine-independent PAM recognition, along with a dominant specificity for adenine dinucleotides.

Figure 4.3 (A) Annotated CRISPR cassettes obtained from the genomes corresponding to orthologs that substitute both PAM-contacting arginine residues to glutamine. (B) Mappings of CRISPR cassette spacers to their putative target source for listed crRNA, identified via an online BLAST and/or SPAMALOT. SPAMALOT uncovered most cases of mismatch-tolerated mappings to Streptococcus phage. Underlined bases indicate mismatches that are tolerated for the mapping. Additional line spacing separates analysis for each CRISPR cassette.

Figure 4.4 (A) Chromatograms representing the PAM-SCANR based enrichment of variant-recognizing PAM sequences from a 5'-NNNNNNNN-3' library. (B) SYBR-stained agarose gels showing in vitro digestion of 10 nM 5'-NAAN-3' substrates upon 16 minutes of incubation with 100 nM of purified ribonucleoprotein enzyme assemblies. Arrows distinguish banding of the cleaved products from uncleaved substrate (top band). Matrix plots summarize cleaved fraction calculations, which were carried out in a custom script for processing gel images. (C) Timecourse measurements of target DNA substrate cleavage for Smac Cas9 and Spy-mac Cas9. (D) DNA substrate cleavage plotted as a function of 0.25:1, 1:1, and 4:1 molar ratios of ribonucleoprotein to target for wild-type Spy Cas9 and hybrid Spy-mac Cas9.

Next, we purified nuclease-active enzymes to continue probing the DNA target recognition potential and uniqueness of Spy-mac Cas9 (Figure 4.5).[5, 89] We individually incubated the ribonucleoprotein complex enzymes (composed of Cas9 + crRNA + tracrRNA) with double-stranded target substrates of all 5'/3'-neighboring base combinations at an adenine dinucleotide PAM (5'-NAAN-3'). Our brief 16-minute digestion indicated both wild-type Smac Cas9 and our designed Spy-mac Cas9 cleaved adjacently to 5'-NAAN-3' motifs more broadly and evenly than the previously reported QQR variant. Spy-mac Cas9 distinguished itself further with rapid DNA-cutting rates that resemble the fast digest kinetics of Spy Cas9 (Figure 4.4).[42] We ran reactions that used varying crRNA spacer lengths and tracrRNA sequence, as the latter differs slightly between the S. macacae and S. pyogenes genomes (Figure 4.5). Neither of these two parameters compensated for the slower cleavage rate of Smac Cas9, but we did notice marginal improvement in the activity of the wild-type form with its native tracrRNA, which comports with the interface of the guide-Cas9 interaction being mostly outside of the PI domain.

Figure 4.5 (A) SDS-PAGE gel image of Spy-mac Cas9 after purification by affinity chromatography. SYBR-stained agarose gels running in vitro digestion reactions are shown that assay dependencies on (B) crRNA spacer length and (C) tracrRNA sequence origin. (D) Sequence alignment (Genewiz software) of tracrRNA from S. pyogenes and S. mutans highlighted in a color code that reflects the base-pairing in their (E) duplex gRNA secondary structure. SYBR-stained agarose gels running in vitro digestion reactions are shown that assay dependencies on (F) positions 5-8 in the PAM sequence and (G) increments to the distribution of adenine content in positions 1-5 in the PAM sequence.

To crucially verify that an adenine dsDNA dinucleotide is sufficient for Cas9 PAM recognition, we confirmed Spy-mac Cas9 remains active on targets that set the next four downstream bases to the same nucleotide (e.g. 5'-TAAGXXXX-3', for X all fixed to A, C, G, or T; Figure 4.5). Additionally, we observed moderate yield of cleaved products on examples of 5'-NBBAA-3', 5'-NABAB-3', 5'-NBABA-3' PAM sequences (where B is the IUPAC symbol for C, G, or T), revealing an even broader tolerance for increments to the dinucleotide position or adenine adjacency. We anticipate future measurements of guide-loading, target-dissociation and R-loop expansion/contraction will provide more insights on the serendipitous catalytic benefit over Smac Cas9 from grafting its PI domain onto a truncated Spy Cas9.
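To make the degenerate PAM notation concrete, the following Python sketch expands a pattern written with IUPAC symbols into the explicit sequences it covers. The function name is hypothetical, and only the symbols used in this chapter (A, C, G, T, N, B) are included rather than the full IUPAC table.

```python
from itertools import product

# Subset of IUPAC nucleotide codes needed for the PAM patterns in the text.
IUPAC = {
    "A": "A",
    "C": "C",
    "G": "G",
    "T": "T",
    "N": "ACGT",  # any base
    "B": "CGT",   # not A
}


def expand_pam(pattern):
    """List every concrete sequence matched by a degenerate PAM pattern."""
    choices = [IUPAC[symbol] for symbol in pattern]
    return ["".join(bases) for bases in product(*choices)]


naan = expand_pam("NAAN")    # the 16 substrates of the 5'-NAAN-3' panel
nbbaa = expand_pam("NBBAA")  # shifted adenine dinucleotide patterns
```

Expanding 5'-NAAN-3' yields the 16 combinations of neighboring bases assayed in the digestion panel, while 5'-NBBAA-3' covers 36 sequences in which the adenine dinucleotide is displaced by two positions.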

Encouraged by the nucleolytic performance of Spy-mac Cas9, we investigated its capacity for gene modification in human cells. First, we transfected a human embryonic kidney (HEK293T) cell line with plasmids that encode Smac Cas9 or Spy-mac Cas9, and co-expressed single-guide RNA molecules that target the VEGFA gene locus at sites representing a breadth of 5'-NAAN-3' PAM diversity (Figure 4.6). Consistent with our in vitro observations, we found Spy-mac Cas9 was generally more efficient than Smac Cas9 at mediating enzymatically-detected (T7 Endonuclease I) genomic insertion/deletion (indel) mutations. Spy-mac Cas9 also proved capable of generating indels with variable efficiency on instances of any directly 5'- or 3'-neighboring base for 5'-NAAG-3' or 5'-CAAN-3' PAM sequences. To address sites with low modification rates, we introduced two mutations (R221K and N394K) into Spy-mac Cas9 that can raise gene knock-out percentages and had been previously identified by deep mutational scans of Spy Cas9. We refer to this variant as an "increased" editing Spy-mac Cas9 (iSpy-mac Cas9) due to its similarly elevated modification rates on our targets.[126]

We then benchmarked the gene editing performance of the nucleases derived from Streptococcus macacae Cas9 against orthologs of Cas12a by making use of their common AT-rich PAM specificity (Figure 4.6).[145, 39] We included Cas12a orthologs known for efficient gene editing from Acidaminococcus sp. BV3L6 (AsCas12a) and Lachnospiraceae bacterium ND2006 (LbCas12a).[69] Our selection of target sites permits overlapping PAM recognition between these Cas9 and Cas12a nucleases by guiding the Cas12a variants with the reverse complemented spacer sequences of those guiding our Cas9 variants. The Cas9 and Cas12a thereby targeted opposite strands, yet were constrained to recognize the same PAM site while preserving important features for guide RNA effectiveness (e.g. distribution of purines/pyrimidines and GC-content).[136, 79] We believe this is the first report of Cas12a and Cas9 activity being compared so explicitly on an endogenous genomic locus. For each site we examined, iSpy-mac Cas9 consistently generated a larger indel percentage than either AsCas12a or LbCas12a, never exhibiting less activity than the lower-editing of the two Cas12a proteins, if not generating the largest overall percentage (Figure 4.6).

Figure 4.6 (A) Schematic diagram for matching Cas9 and Cas12a guides in a manner that enforces their recognition of the same PAM sequence and therefore facilitates their comparison (a Cas12a vs Cas9 Comparator). (B) Dot plots of absolute and relative gene modification efficiency in HEK293T cells by Cas9 and Cas12a variants targeting common PAM sequences located in the VEGFA gene. Values were quantified in a T7EI-based assay and are consistent with biological duplicates that were run in parallel. (C) Genomic base editing demonstration for the targeted conversion of cytosines to thymines with Spy-mac nCas9-BE3. Analysis on the efficiency was carried out in our custom Sanger sequencing trace file processing script called BEEP.
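The guide-matching idea can be sketched in Python as follows. The spacer and PAM shown are hypothetical placeholders rather than the VEGFA sites used in this work, and practical details such as preferred Cas12a guide lengths are ignored.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical Cas9 site on the top strand: spacer followed by a 3' PAM.
cas9_spacer = "GATTACAGATTACAGATTAC"
cas9_pam = "CAAA"  # an instance of 5'-NAAN-3'

# Read on the opposite strand, the same site presents a 5' T-rich PAM
# followed by the reverse complement of the Cas9 spacer.
cas12a_pam = reverse_complement(cas9_pam)
cas12a_spacer = reverse_complement(cas9_spacer)
```

In this toy case the adenine-rich Cas9 PAM reads as 5'-TTTG-3' on the opposite strand, which is how a single site can serve both enzymes while each guide keeps the same base composition apart from strand.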

Lastly, we selected a window of four nucleotides in the VEGFA locus in a sequence context such that any other reported CRISPR endonuclease capable of gene modification would not allow their base-editing from cytosine to thymine.[95] We co-transfected a gRNA plasmid targeting a 5'-CAAC-3' PAM and a nickase form of Spy-mac Cas9 that was inserted into the previously reported base-editing BE3 architecture (Spy-mac nCas9-BE3).[77] Harvested cells exhibited robust base-editing, with conversions of cytosines to thymines ranging from 20% to 30% in our selected window (Figure 4.6). The significant gain in efficiency compared to the indel formation we measured for this target can likely be explained by the larger scale of our gene modification experiments, plus differing codon usage. Recent work shows that editing rates can be improved by such codon selection, as well as optimization to nuclear-localization sequences/linkers, protein solubility, delivery methods, and sortable labeling of transfected cells.[75, 141, 88, 28] Our method for checking Smac Cas9 against Cas12a does not apply to base-editing since the enzymes must target opposite strands in order to recognize the same PAM sequence and current base-editing methods only directly modify the non-target strand.[144, 87] Hence, most Cas9 base-editing architectures utilize their ability to nick on the target side for transferring a base-edit in a manner that templates from the modified opposing strand.[41]

In summary, we have identified a homolog of Spy Cas9 in Streptococcus macacae with native 5'-NAAN-3' PAM specificity. By leveraging the substantial background in the development and characterization of Spy Cas9, we engineered variants of Smac Cas9 that maintain minimal PAM specificity and achieve suitable activity for mediating edits on chromosomes in human cells.[61] This finding sets the path for engineering Smac Cas9 with other desirable properties and activities.[74, 56, 63, 58] As a result, Cas9 can now substitute for Cas12a and open access to more AT-content PAM sequences in the ever-growing list of genome engineering applications with CRISPR-Cas systems.

4.2 Materials and Methods

4.2.1 Selection of Streptococcus Cas9 Orthologs of Interest

All Streptococcus Cas9 orthologs were downloaded from the online UniProt database (https://www.uniprot.org/). These were downsampled following global protein alignment with a BLOSUM62 cost matrix in the Genewiz software package. Orthologs with less than 70% agreement with the Spy Cas9 sequence were then discarded. The remaining 115 orthologs were used to generate a sequence logo (Weblogo, http://weblogo.threeplusone.com/create.cgi), and were manually selected for divergence at positions aligned to residues critical for the PAM interaction of Spy Cas9.
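The two selection steps can be illustrated with a minimal Python sketch that operates on sequences already globally aligned to the reference. The function names and the toy peptides are hypothetical. Percent identity over aligned columns is used here as a simple stand-in for the agreement score reported by the alignment software, which may be computed differently.

```python
def percent_identity(aligned_ref, aligned_query):
    """Percent of aligned columns in which both sequences share a residue.

    Inputs are equal-length strings from a global alignment, with '-' for
    gaps. Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    columns = [(r, q) for r, q in zip(aligned_ref, aligned_query)
               if r != "-" or q != "-"]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for r, q in columns if r == q)
    return 100.0 * matches / len(columns)


def divergent_at(aligned_ref, aligned_query, ref_positions):
    """Query residues that differ at given 1-based reference positions."""
    differences = {}
    ref_index = 0
    for r, q in zip(aligned_ref, aligned_query):
        if r == "-":
            continue  # insertion in the query; no reference position consumed
        ref_index += 1
        if ref_index in ref_positions and q != r:
            differences[ref_index] = q
    return differences


# Toy alignment: reference arginines at positions 3 and 5 become glutamine.
ref = "MKRYRA-L"
query = "MKQYQAGL"
identity = percent_identity(ref, query)
changes = divergent_at(ref, query, {3, 5})
```

With real data, orthologs below the 70% cutoff would be dropped first, and the second function would then be applied at the reference positions of the PAM-contacting arginines.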

4.2.2 PAM-SCANR Bacterial Fluorescence Assay

Sequences encoding the PAM-interaction domains of selected Cas9 orthologs were synthesized as gBlock fragments by Integrated DNA Technologies (IDT) and inserted via a New England Biolabs (NEB) Gibson Assembly reaction into the C-terminus of a low-copy plasmid containing Spy dCas9 (Beisel Lab, NCSU). The hybrid protein constructs were transformed into electrocompetent E. coli cells with additional PAM-SCANR components as previously established.[85] Overnight cultures were analyzed and sorted on a Becton Dickinson (BD) FACSAria machine. Sorted GFP-positive cells were grown to sufficient density, and plasmids from the pre-sorted and sorted populations were then isolated. The region flanking the nucleotide library was PCR amplified and submitted for Sanger sequencing (Genewiz). The chromatograms from received trace files were inspected for post-sorted sequence enrichments relative to the pre-sorted library.

4.2.3 Purification of and DNA Cleavage with Selected Nucleases

The gBlock (IDT) encoding the PAM-interaction domain of S. macacae was inserted into a bacterial protein expression/purification vector containing wild-type S. pyogenes Cas9 fused to the His6-MBP-tobacco etch virus (TEV) protease cleavage site at the N-terminus (pMJ915 was a gift from , Addgene plasmid #69090). The resulting hybrid Spy-mac Cas9 protein expression construct was sequence-verified by a next-generation complete plasmid sequencing service (CCIB DNA Core Facility at Massachusetts General Hospital). The hybrid-protein construct was then transformed into BL21 Rosetta 2™ (DE3) (MilliporeSigma), and a single colony was picked for protein expression by inoculating into 2xYT with a final concentration of 1% glucose at 37 Celsius. This overnight starter culture was then used to inoculate 1 L of 2xYT and grown at 37 Celsius to a cell density of OD600 0.6, at which point the temperature was lowered to 18 Celsius and His-MBP-TEV-SpyMac Cas9 expression was induced by supplementing with 0.2 mM IPTG. Cells were grown for 18 hours before harvest.

Cells were then lysed with BugBuster™ Protein Extraction Reagent, supplemented with 1 mg/ml lysozyme solution (MilliporeSigma), 125 Units/gram cell paste of Benzonase™ Nuclease (MilliporeSigma), and complete, EDTA-free protease inhibitors (Roche Diagnostics Corporation). The lysate was clarified by centrifugation, including a final spin with a pre-chilled Steriflip™ 0.45 micron filter (MilliporeSigma). The clarified lysate was incubated with Ni-NTA resin (Qiagen) at 4 Celsius for 1 hour and subsequently applied to an Econo-Pac™ chromatography column (Bio-Rad Laboratories). The protein-bound resin was washed extensively with wash buffer (20 mM Tris pH 8.0, 800 mM KCl, 20 mM imidazole, 10% glycerol, 1 mM TCEP) and His-tagged Spy-mac protein was eluted in elution buffer (20 mM HEPES, pH 8.0, 500 mM KCl, 250 mM imidazole, 10% glycerol). ProTEV™ Plus protease (Promega, Madison) was added to the pooled fractions, which were dialyzed overnight into storage buffer (20 mM HEPES, pH 7.5, 500 mM KCl, 20% glycerol) at 4 Celsius using Slide-A-Lyzer™ dialysis cassettes with a molecular weight cut-off of 20 kDa (ThermoFisher Scientific). The sample was then incubated again with Ni-NTA resin for 1 hour at 4 Celsius with gentle rotation and applied to a chromatography column to remove the cleaved His tag. The protein was eluted with wash buffer (20 mM Tris pH 8.0, 800 mM KCl, 20 mM imidazole, 10% glycerol, 1 mM TCEP) and fractions containing cleaved protein were verified once more by SDS-PAGE and Coomassie staining, then pooled, buffer exchanged into storage buffer, and concentrated. The concentrated aliquots were measured based on their light absorption (Implen Nanophotometer) and flash-frozen at -80 Celsius for storage or used directly for in vitro cleavage assays.

The crRNA and tracrRNA guide components were procured in the form of HPLC-purified RNA oligos (IDT) and resuspended in 1X IDTE pH 7.5 solution (IDT). Duplex crRNA-tracrRNA guides were annealed at 1 uM concentration in duplex buffer (IDT) by a protocol of rapid melting followed by gradual cooling. Target substrates were PCR amplified from assemblies of the PAM-SCANR plasmid with a fixed PAM sequence. In vitro digestion reactions with 10 nM target and typically a 10-fold excess of enzyme components were prepared on ice and then incubated in a thermal cycler at 37 Celsius. Reactions were halted after at least 1 minute by heat denaturation at 65 Celsius for 5 minutes and run on a 2% TAE-agarose gel stained with DNA-intercalating SYBR dye (Invitrogen). Gel images were recorded from blue-light exposure and analyzed in a Python script adapted from https://github.com/jharman25/gelquant/. Cleavage fraction measurements were quantified by the relative intensity of substrate and product bands as follows:

$$\text{cleavage fraction} = \frac{\text{integrated intensity of product bands}}{\text{integrated intensity of all bands}}$$
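The ratio above can be written as a small helper. This is only a sketch of the arithmetic, not the gelquant-derived script itself. It assumes the integrated band intensities have already been extracted from the gel image, and the function name and example numbers are hypothetical.

```python
def cleavage_fraction(substrate_intensity: float, product_intensities) -> float:
    """Fraction of total band intensity found in the cleavage product bands."""
    product = sum(product_intensities)
    total = substrate_intensity + product
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return product / total


# Hypothetical intensities: one uncleaved substrate band and two product bands.
print(cleavage_fraction(600.0, [250.0, 150.0]))
```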

4.2.4 Gene Modification Analysis and Software

The gBlock (IDT) encoding the PAM-interaction domain of S. macacae was swapped into the Spy Cas9 mammalian expression plasmid OG5209 (Oxford Genetics). Plasmids for Cas12a protein plus Cas9 and Cas12a guide construction were gifts from Keith Joung (Addgene plasmid # 78741, 78742, 78743, 78744). HEK293T cells were maintained in DMEM supplemented with 100 units/ml penicillin, 100 mg/ml streptomycin, and 10% fetal bovine serum (FBS). sgRNA plasmid (62.5 ng) and nuclease plasmid (187.5 ng) were transfected into cells as duplicates (5 x 10^4/well in a 24-well plate) with Lipofectamine 3000 (Invitrogen) in Opti-MEM (Gibco). At 5 days post-transfection, genomic DNA was extracted using QuickExtract Solution (Epicentre), and genomic loci were amplified by PCR utilizing the KAPA HiFi HotStart ReadyMix (Kapa Biosystems). For indel analysis, the T7EI reaction was conducted according to the manufacturer's instructions and equal volumes of products were analyzed on a 2% agarose gel stained with SYBR Safe (Thermo Fisher Scientific). Gel image files were analyzed in a Python script adapted from https://github.com/jharman25/gelquant/. Boundaries of cleaved and uncleaved bands of interest were hard-coded for each duplicate set of Cas variants with a common target, and the areas under the corresponding peaks were measured and calculated as the fraction cleaved of the total product. Percent gene modification was calculated as follows:

$$\%\ \text{gene modification} = 100 \times \left(1 - (1 - \text{fraction cleaved})^{1/2}\right)$$
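As a worked example of this calculation, the sketch below assumes the widely used T7EI estimate in which the uncleaved fraction is raised to the power 1/2 (the approach described by Guschin et al. [46]). The function name and the example value are illustrative.

```python
import math


def percent_gene_modification(fraction_cleaved: float) -> float:
    """Estimate percent gene modification from the T7EI cleaved fraction.

    Assumes the square-root form of the estimate:
    100 * (1 - sqrt(1 - fraction_cleaved)).
    """
    if not 0.0 <= fraction_cleaved <= 1.0:
        raise ValueError("fraction_cleaved must lie between 0 and 1")
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# A cleaved fraction of 0.36 corresponds to about 20% gene modification.
print(round(percent_gene_modification(0.36), 2))
```

The square root reflects that heteroduplexes form by random re-annealing of modified and unmodified strands, so the cleaved fraction overstates the underlying indel frequency.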

4.2.5 Base Editing Analysis and Software

The gBlock (IDT) encoding the PAM-interaction domain of S. macacae was swapped into a mammalian expression plasmid for cytosine to thymine base-editing, which came as a gift of David Liu (Addgene plasmid # 73021). HEK293T (ATCC® CRL-3216™) cells (MilliporeSigma, Burlington, MA) were maintained in DMEM supplemented with 100 units/ml penicillin, 100 mg/ml streptomycin, and 10% fetal bovine serum (FBS). sgRNA (500 ng) and BE3 plasmids (500 ng) were transfected into cells as duplicates (2 x 10^5/well in a 24-well plate) with Lipofectamine 3000 (Invitrogen) in Opti-MEM (Gibco). At 5 days post-transfection, genomic DNA was extracted using QuickExtract Solution (Epicentre), and the VEGFA genomic locus was amplified by PCR utilizing the KAPA HiFi HotStart ReadyMix (Kapa Biosystems). Amplicons were purified and submitted for Sanger sequencing (Genewiz). For base conversion analysis, an automated Python script termed BEEP, employing the pandas data manipulation library and BioPython package, was utilized to align base-calls of an input ab1 file to first determine the absolute position of the target within the file, and subsequently measure the peak values for each base at the indicated position in the spacer. Finally, editing percentages of specified base conversions were calculated and normalized to that of an unedited control. Conversion efficiencies are reported as the average of two independent duplicate reactions ± standard deviation. The BEEP software can be downloaded at https://github.com/mitmedialab/BEEP.[17]
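The two analysis steps described above (locating the target in the base-calls, then converting peak values into an editing percentage) can be sketched as follows. This is not the BEEP implementation. It assumes the base-calls and per-base peak heights have already been read from the trace file, and it interprets normalization to the unedited control as subtracting the control's background signal. All names and numbers are hypothetical.

```python
def locate_target(basecalls: str, spacer: str) -> int:
    """Return the index of the spacer within the base-called read."""
    index = basecalls.find(spacer)
    if index < 0:
        raise ValueError("spacer not found in base-calls")
    return index


def base_fraction(peaks: dict, base: str) -> float:
    """Share of the total peak signal at one position that belongs to one base."""
    total = sum(peaks.values())
    return peaks.get(base, 0.0) / total if total else 0.0


def editing_percent(edited_peaks: dict, control_peaks: dict, to_base: str = "T") -> float:
    """Percent conversion to to_base at one position, minus the control background."""
    edited = base_fraction(edited_peaks, to_base)
    background = base_fraction(control_peaks, to_base)
    return 100.0 * max(edited - background, 0.0)


# Hypothetical peak heights at a target cytosine in edited and unedited samples.
edited = {"A": 0.0, "C": 600.0, "G": 0.0, "T": 400.0}
control = {"A": 0.0, "C": 1000.0, "G": 0.0, "T": 0.0}
print(locate_target("NNACGTACGT", "CGTA"), editing_percent(edited, control))
```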

Table T1 Sequence information for in vitro digest reactions

| Name | Sequence |
| --- | --- |
| crRNA | rCrGrArArArGrGrUrUrUrUrGrCrArCrUrCrGrArC... rGrUrUrUrUrArGrArGrCrUrArUrGrCrU |
| tracrRNA (Spy) | rArGrCrArUrArGrCrArArGrUrUrArArArArU... rArArGrGrCrUrArGrUrCrCrGrUrUrArUrCrA... rArCrUrUrGrArArArArArGrUrG... rGrCrArCrCrGrArGrUrCrGrGrUrGrCrUrU |
| PAM Target 5'-NNNN-3' | CGAAAGGTTTTGCACTCGACNNNNACCAACGAAAGGGCC |

Table T2 Sequence information for genome editing in human cells

| Name | Sequence |
| --- | --- |
| sgRNA for Cas9 | N20(Target) GTTTTAGAGCTATGCTG... GAAACAGCATAGCAAGTTAAAAT... AAGGCTAGTCCGTTATCAACTTGAAA... AAGTGGCACCGAGTCGGTGCTT polyT |
| gRNA for AsCas12a | TAATTTCTACTCTTGTAGAT N20(Target) polyT |
| gRNA for LbCas12a | AATTTCTACTAAGTGTAGAT N20(Target) polyT |
| Target for CAAATTCC PAM w/ Cas9 | GAACCCGGATCAATGAATAT |
| Target for CAAATTCC PAM w/ Cas12a | ATATTCATTGATCCGGGTTC |
| Target for CAACCCCA PAM w/ Cas9 | GCTCCCCGCTCCAACACCCT |
| Target for CAACCCCA PAM w/ Cas12a | AGGGTGTTGGAGCGGGGAGC |
| Target for CAAGCCGT PAM w/ Cas9 | GGGAAGTAGAGCAATCTCCC |
| Target for CAAGCCGT PAM w/ Cas12a | GGGAGATTGCTCTACTTCCC |
| Target for CAATGTGC PAM w/ Cas9 | GCCACAGTGTGTCCCTCTGA |
| Target for CAATGTGC PAM w/ Cas12a | TCAGAGGGACACACTGTGGC |
| Target for TAACCTCA PAM w/ Cas9 | GCTCAGGCCCTGTCCGCACG |
| Target for TAACCTCA PAM w/ Cas12a | CGTGCGGACAGGGCCTGAGC |
| Target for TAAGGCCC PAM w/ Cas9 | GTTCCATCGGTATGGTGTCC |
| Target for TAAGGCCC PAM w/ Cas12a | GGACACCATACCGATGGAAC |
| Target for GAAGTCGA PAM w/ Cas9 | GGTAGCAAGAGCTCCAGAGA |
| Target for GAAGTCGA PAM w/ Cas12a | TCTCTGGAGCTCTTGCTACC |
| Target for GAAAGTGA PAM w/ Cas9 | GATTGGCGAGGAGGGAGCAG |
| Target for GAAAGTGA PAM w/ Cas12a | CTGCTCCCTCCTCGCCAATC |
| Target for GAAACCAG PAM w/ Cas9 | GCCTGGAAATAGCCAGGTCA |
| Target for GAAACCAG PAM w/ Cas12a | TGACCTGGCTATTTCCAGGC |
| Target for AAACCAGC PAM w/ Cas9 | GCTGGAAATAGCCAGGTCAG |
| Target for AAACCAGC PAM w/ Cas12a | CTGACCTGGCTATTTCCAGC |
| Target for AAAGTGAG PAM w/ Cas9 | GTTGGCGAGGAGGGAGCAGG |
| Target for AAAGTGAG PAM w/ Cas12a | CCTGCTCCCTCCTCGCCAAC |
| Target for AAATTCCA PAM w/ Cas9 | GACCCGGATCAATGAATATC |
| Target for AAATTCCA PAM w/ Cas12a | GATATTCATTGATCCGGGTC |
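In Table T2, each Cas12a target reads as the reverse complement of the Cas9 target listed for the same PAM, which is consistent with the two nucleases addressing the same site from opposite strands. The short sketch below checks this for the first pair of entries. The helper name is our own, and the interpretation is an observation about the listed sequences rather than a statement from the methods.

```python
# Translation table for DNA complementation (uppercase A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


cas9_target = "GAACCCGGATCAATGAATAT"    # Target for CAAATTCC PAM w/ Cas9
cas12a_target = "ATATTCATTGATCCGGGTTC"  # Target for CAAATTCC PAM w/ Cas12a
print(reverse_complement(cas9_target) == cas12a_target)  # True
```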

Chapter 5

RNA-Switched Cas9 Guides Engineered by Strand-Displacement

This chapter is adapted from

Jakimo, Chatterjee, and Jacobson. ssRNA/DNA-Sensors via Embedded Strand-Displacement Programs in CRISPR/Cas9 Guides. BioRxiv (2018)

5.1 Introduction

Native ribonucleoprotein nucleases of CRISPR systems protect bacterial populations from viral proliferation with guide RNAs (gRNAs) that help recognize and damage invading genomic material [94] and, in some cases, trigger collateral digestion of single-stranded RNA or DNA [29, 19, 43]. While these CRISPR components confer certain sensing and actuation functionality, applications that demand more complex responses to physical and biochemical stimuli have led to an abundance of CRISPR-based engineering. To date, Cas9 from the Type II CRISPR system of S. pyogenes has dominated the focus of these efforts [64, 116, 140]. Even when only considering modifications to the SpCas9 guide, CRISPR activity has been coupled to a variety of stimuli, such as light absorption [59] and emission [82], aptamer-associated small molecules [135], RNA-binding proteins [90], ssDNA anti-sense oligos [34], and bacterial transcriptional regulators [15]. In this work we demonstrate the first gRNA-embedded molecular programs for conditional assembly of guide and Cas9 in response to RNA triggers.

5.2 Computational Design of Switchable Guide RNA

We implemented the low and high Cas9-affinity states of swigRNA with a toehold-gated strand-displacement reaction. Several groups have previously employed this method for controlling transcription termination [15], RNA processing [55], translation initiation [44], aptamer fluorescence [10], and nucleic acid-based logic [20, 148].

Figure 5.1 illustrates its relation to the stages of swigRNA-Cas9 activity, which are listed as follows: 1) Insertions or extensions in the swigRNA molecule disrupt structural features essential for its complexing with Cas9. 2) The added sequence content includes an unpaired "toehold" domain to promote hybridization with the triggering oligonucleotide. 3) A sufficiently matched trigger that base-pairs with the swigRNA toehold proceeds to incrementally displace base-pairing within the swigRNA. 4) Intramolecular and intermolecular strand exchanges promote alternative base-pairings within the swigRNA. The resulting structure then rescues Cas9 affinity.

Figure 5.1 Illustration of the toehold-gated strand-displacement steps for switching swigRNA-Cas9 activity using trigger trigRNA. Colored domains denote target-defining spacer sequence (green), regions that map to positions in sgRNA that base-pair (dark blue), regions that map to positions in sgRNA that do not base-pair (light blue), and inserted swigRNA-unique sequences (gray). Highlighted domains denote domains in swigRNA and trigRNA that duplex based on toeholding (yellow) or strand-displacement (red).

Figure 5.2 A) Modular strategies that impose RNA secondary structure constraints for swigRNA. B) Design, build, and test workflow for swigRNA evaluation.

We lay out general design principles for swigRNA (Figure 5.2) by incorporating other studies on the effectiveness of gRNA variants that contain target-independent sequence and structural changes [11, 102]. These principles exploit domains of a guide that can tolerate insertions or structure-preserving sequence substitutions. They allow terminal extensions, as well as resizing of loops and stems in either the 3'-proximal hairpins or at the native duplex junction between CRISPR RNA (crRNA) and trans-activating CRISPR RNA (tracrRNA). We generated sequences that satisfy our design constraints using the nucleic acid computational design and analysis package, NUPACK [146]. We were able to specify stable switching behavior by including OFF- and ON-state models in co-optimization tasks. In Figure 1C, we demonstrate functional switching by a standout design, which was evaluated by an in vitro Cas9 cleavage and gel shift assay [5].
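NUPACK optimizes sequences against thermodynamic ensembles, which is beyond a short example. The sketch below only illustrates the simpler necessary condition behind a two-state design: a candidate sequence must be able to form every base pair in each target structure, here written in dot-parens notation. The function names, the toy sequence, and the toy structures are hypothetical and are not taken from our designs.

```python
# Watson-Crick and G-U wobble pairs allowed in RNA helices.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure: str) -> dict:
    """Map each paired index to its partner for a dot-parens structure."""
    stack, table = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            table[i], table[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return table


def compatible(seq: str, structure: str) -> bool:
    """True if every pair required by the structure can form in the sequence."""
    if len(seq) != len(structure):
        return False
    table = pair_table(structure)
    return all((seq[i], seq[j]) in PAIRS for i, j in table.items() if i < j)


def can_adopt_all(seq: str, structures) -> bool:
    """True if the sequence is compatible with every listed state (e.g. OFF and ON)."""
    return all(compatible(seq, s) for s in structures)


print(compatible("GGGAAAUCCC", "(((....)))"), compatible("GGGAAAUCCC", "..((...))."))
```

A check like this can screen out sequences that cannot satisfy both states before a more expensive thermodynamic evaluation.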

5.3 Validation of the Toehold-Gated Strand-Displacement Mechanism

We further assessed the same design to validate the role of toehold-gated strand-displacement (Figure 5.3). In the absence of trigRNA, only a marginal fraction of target DNA substrate was cleaved after 8 hours of digestion with Cas9 and swigRNA. By contrast, a time course that included a two-fold excess of trigRNA yielded cleavage on roughly half of the targets within the first hour. In support of our proposed mechanism, we found that neither the toeholding nor the strand-displacing domain alone was sufficient for a trigger to induce cleavage.

Like trigRNA, full-length trigDNA was also effective at inducing switching, yet with slightly lower efficiency. We expect this decrease in performance results from the generally higher thermal stability of RNA:RNA hybridization relative to RNA:DNA. Nonetheless, we tested a variety of trigDNA with partial terminal truncations in order to probe the length of toeholding and strand-displacement needed for triggering swigRNA (Figure 5.3). We observed that 5'-end truncations in the toeholding domain of the trigger gradually reduced digestion rates. While 2-4 nt 3'-end truncations in the strand-displacing domain had minimal effects on cleavage yields, those that extended beyond 5 nt showed a drastic reduction.

We take these results to mean that toehold domain binding alone was insufficient to promote strand exchanges for stronger swigRNA-Cas9 affinity. Toeholding, however, facilitated invasion and displacement of base-pairing that inhibited Cas9 binding. Since near-complete displacement enabled formation of active swigRNA-Cas9, this switching after only partial invasion could suggest the remaining displacement occurred by thermal dehybridization ("breathing"), was promoted by intermediate strand-exchange structures, and was then stabilized by Cas9 binding.
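The truncation series can be described with a small bookkeeping helper. The sketch assumes a trigger laid out 5'-[toeholding domain][strand-displacing domain]-3', as implied by the truncation experiments above. The domain lengths, the placeholder sequence, and the function name are hypothetical.

```python
def truncate_trigger(trigger: str, toehold_len: int, five_prime: int = 0, three_prime: int = 0):
    """Truncate a trigger and report the remaining toehold and displacement lengths.

    Assumes the toeholding domain occupies the first toehold_len nucleotides
    (5' end) and the strand-displacing domain occupies the rest.
    """
    if five_prime < 0 or three_prime < 0 or five_prime + three_prime > len(trigger):
        raise ValueError("invalid truncation lengths")
    truncated = trigger[five_prime:len(trigger) - three_prime]
    toehold_left = min(max(toehold_len - five_prime, 0), len(truncated))
    displacement_left = len(truncated) - toehold_left
    return truncated, toehold_left, displacement_left


# Hypothetical 30-nt trigger: 10-nt toeholding domain plus 20-nt displacing domain.
trigger = "T" * 10 + "D" * 20  # placeholder letters mark the two domains
for cut5, cut3 in [(0, 0), (4, 0), (0, 3), (0, 6)]:
    _, toe, disp = truncate_trigger(trigger, 10, cut5, cut3)
    print(f"5'-cut={cut5} 3'-cut={cut3} -> toehold={toe} nt, displacement={disp} nt")
```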

Figure 5.3 A) 2D and 1D representations of base-pairing in a swigRNA that is tested over an 8 hour time-course with and without trigRNA. B) Cas9 digest reactions run with 5'-end truncations of trigRNA or trigDNA. C) Cas9 digest reactions run with 3'-end truncations of trigRNA or trigDNA.

5.4 Demonstration of OFF-to-ON and ON-to-OFF Switchable Guide RNA

We confirm the same design strategy generates additional swigRNA that have off-to-on performance for other triggers and targets (Figure 3A). We were surprised to discover alternative design strategies produced opposite on-to-off switching (Figure 5.4). These on-to-off swigRNA embed the entire toehold-gated strand-displacement program (in different orientations) within the junction of crRNA and tracrRNA. We expect this region is sterically sensitive to trigger hybridization for the stem lengths used in these designs. Lastly, we validate more swigRNA hybridization strategies that produce our intended off-to-on switching (Figure 3C).

Our work reveals several methods for coupling activity of the CRISPR/Cas9 system to sensing ssRNA or ssDNA molecules. We have also established a workflow that can now be applied to broader design libraries for new DNA targets and RNA triggers. We plan to compare predicted and actual swigRNA structures using techniques such as SHAPE-seq [91], in order to adjust our design principles for further generality.

Figure 5.4 A) Evaluation of additional off-to-on swigRNA based on interdomain hybridization between the stem-bulge and hairpins. B) Demonstration of on-to-off swigRNA based on intradomain hybridization within the stem-bulge. C) Evaluation of alternative off-to-on swigRNA based on hybridization between a 3'-terminal extension and the nexus and hairpins.

5.5 Materials and Methods

5.5.1 In vitro Cas9 Cleavage Assays

IDT DNA Ultramers of reverse-complemented swigRNA designs were transcribed using the NEB HiScribe T7 Quick High Yield RNA Synthesis Kit and purified using the Qiagen RNeasy Mini kit. Each cleavage reaction consisted of 30 nM S. pyogenes Cas9 nuclease (NEB), 30 nM swigRNA, 3 nM PCR-amplified DNA target substrate, and 60 nM HPLC-purified synthetic trigRNA or trigDNA from IDT. Unless otherwise specified, digests were incubated for 4 hours before 2% TAE agarose gel electrophoresis.
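For bench planning, the final concentrations above translate into pipetting volumes through the dilution relation C1 x V1 = C2 x V2. The sketch below applies it to one reaction. The stock concentrations and the 20 uL reaction volume are hypothetical placeholders, not values from our protocol.

```python
def reaction_volumes(final_nM: dict, stock_nM: dict, total_uL: float) -> dict:
    """Volume of each stock needed to reach its final concentration (C1*V1 = C2*V2)."""
    volumes = {
        name: final_nM[name] * total_uL / stock_nM[name] for name in final_nM
    }
    used = sum(volumes.values())
    if used > total_uL:
        raise ValueError("stocks are too dilute for the requested reaction volume")
    # Whatever volume remains is made up with reaction buffer and water.
    volumes["buffer/water"] = total_uL - used
    return volumes


final = {"Cas9": 30, "swigRNA": 30, "target": 3, "trigger": 60}     # nM, from the text
stock = {"Cas9": 300, "swigRNA": 300, "target": 30, "trigger": 600}  # nM, hypothetical
print(reaction_volumes(final, stock, total_uL=20.0))
```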

5.5.2 NUPACK swigRNA Design

We used the online NUPACK nucleic acid sequence design server (http://nupack.org/design/new) to co-optimize active and inactive states of swigRNA. We wrote Python code both to parametrically generate an input NUPACK script and to parse the output sequences into GenBank format.

Chapter 6

Concluding Remarks

Figure 6.1 Diagram of Cas-gRNA RNP that has its components annotated by the associated technological developments in this thesis.

This thesis achieves novel precision and range for CRISPR-Cas9 gene-editing by modifying separate components of the Cas9-gRNA ribonucleoprotein complex (Figure 6.1). Chapter 2 demonstrates DNA/RNA chimeric crRNA that can evade off-target genome editing. Chapters 3 and 4 uncover the first Cas9 enzymes (Sc and Smac) with natural single-base and AT-only, two-base PAM specificity. Chapter 5 shows how to engineer tracrRNA segments for switchable guides that can be activated by RNA signals. Integrating these distinct modifications into the same RNP will enable systems that combine their benefits for genome editing.

A critical contribution of this thesis is the broadened sequence space for efficient genome engineering. To date, Sc Cas9 appears to be the broadest and most efficient single-base PAM Cas9. Combining its range with the minimal adenine dinucleotide PAM recognition of iSpy-mac Cas9 opens unprecedented access to genomic targets. Whereas SpCas9 as an ABE base-editor could repair 41.1% of deleterious nonsense single-nucleotide polymorphisms, ABE base-editors from Sc++ and iSpy-mac could together repair 95.5% of such deleterious mutations on the human genome. These enzymes therefore present the opportunity to provide new cures for patient populations with genetic disorders or to more densely screen the function of sequences in the genome. Inspired by the diversity of life, the constraints of evolution, and the ability to engineer the bonds of biology, this thesis combines computation and experimentation to create new tools for precise and expansive CRISPR edits.

Figure 6.2 The expanded sequence targeting range of Sc and Smac derived Cas9 enzymes. [Panels: Streptococcus pyogenes PAM Targets, Streptococcus canis PAM Targets, Streptococcus macacae PAM Targets.]

Chapter 7

Bibliography


[1] Omar O Abudayyeh, Jonathan S Gootenberg, Patrick Essletzbichler, Shuo Han, Julia Joung, Joseph J Belanto, Vanessa Verdine, David B T Cox, Max J Kellner, Aviv Regev, Eric S Lander, Daniel F Voytas, Alice Y Ting, and Feng Zhang. Rna targeting with crispr-cas13. , 550:280-284, October 2017.

[2] Katarzyna P Adamala, Daniel A Martin-Alarcon, and Edward S Boyden. Programmable rna-binding protein composed of repeats of a single modular unit. Proceedings of the National Academy of Sciences, 113(19):E2579-E2588, 2016.

[3] S F Altschul, W Gish, W Miller, E W Myers, and D J Lipman. Basic local alignment search tool. Journal of molecular biology, 215:403-410, October 1990.

[4] Carolin Anders, Katja Bargsten, and Martin Jinek. Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease cas9. Molecular Cell, 61(6):895-902, mar 2016.

[5] Carolin Anders and Martin Jinek. In vitro enzymology of cas9. Methods in enzymology, 546:1-20, 2014.

[6] Sylvain Arnould, Christophe Perez, Jean-Pierre Cabaniols, Julianne Smith, Agnès Gouble, Sylvestre Grizot, Jean-Charles Epinat, Aymeric Duclert, Philippe Duchateau, and Frédéric Pâques. Engineered i-crei derivatives cleaving sequences from the human xpc gene can induce highly efficient gene correction in mammalian cells. Journal of molecular biology, 371:49-65, August 2007.

[7] Rodolphe Barrangou, Christophe Fremaux, Helene Deveau, Melissa Richards, Patrick Boyaval, Sylvain Moineau, Dennis A Romero, and Philippe Horvath. Crispr provides acquired resistance against viruses in prokaryotes. Science, 315(5819):1709-1712, 2007.

[8] Rodolphe Barrangou and Philippe Horvath. A decade of discovery: Crispr functions and applications. Nature microbiology, 2:17092, June 2017.

[9] Victoria M Bedell, Ying Wang, Jarryd M Campbell, Tanya L Poshusta, Colby G Starker, Randall G Krug, Wenfang Tan, Sumedha G Penheiter, Alvin C Ma, Anskar Y H Leung, Scott C Fahrenkrug, Daniel F Carlson, Daniel F Voytas, Karl J Clark, Jeffrey J Essner, and Stephen C Ekker. In vivo genome editing using a high-efficiency talen system. Nature, 491:114-118, November 2012.

[10] Sanchita Bhadra and Andrew D Ellington. A spinach molecular beacon triggered by strand displacement. Rna, 20(8):1183-1194, 2014.

[11] Alexandra E. Briner, Paul D. Donohoue, Ahmed A. Gomaa, Kurt Selle, Euan M. Slorach, Christopher H. Nye, Rachel E. Haurwitz, Chase L. Beisel, Andrew P. May, and Rodolphe Barrangou. Guide rna functional modules direct cas9 activity and orthogonality. Molecular Cell, 56(2):333-339, Oct 2014.

[12] David Burstein, Lucas B Harrington, Steven C Strutt, Alexander J Probst, Karthik Anantharaman, Brian C Thomas, Jennifer A Doudna, and Jillian F Banfield. New crispr-cas systems from uncultivated microbes. Nature, 542:237-241, February 2017.

[13] Michael T Certo, Kamila S Gwiazda, Ryan Kuhar, Blythe Sather, Gabrielle Curinga, Tyler Mandt, Michelle Brault, Abigail R Lambert, Sarah K Baxter, Kyle Jacoby, Byoung Y Ryu, Hans-Peter Kiem, Agnes Gouble, Frederic Paques, David J Rawlings, and Andrew M Scharenberg. Coupling endonucleases with dna end-processing enzymes to drive gene disruption. Nature methods, 9:973-975, October 2012.

[14] Michael T Certo, Byoung Y Ryu, James E Annis, Mikhail Garibov, Jordan Jarjour, David J Rawlings, and Andrew M Scharenberg. Tracking genome engineering outcome at individual dna breakpoints. Nature methods, 8:671-676, July 2011.

[15] James Chappell, Melissa K Takahashi, and Julius B Lucks. Creating small transcription activating rnas. Nature chemical biology, 11(3):214, 2015.

[16] Raj Chari, Prashant Mali, Mark Moosburner, and George M Church. Unraveling crispr-cas9 genome engineering parameters via a library-on-library approach. Nature methods, 12:823-826, September 2015.

[17] Pranam Chatterjee, Noah Jakimo, and Joseph M. Jacobson. Divergent pam specificity of a highly-similar spcas9 ortholog. bioRxiv, 2018.

[18] Janice S Chen, Enbo Ma, Lucas B Harrington, Maria Da Costa, Xinran Tian, Joel M Palefsky, and Jennifer A Doudna. Crispr-cas12a target binding unleashes indiscriminate single-stranded dnase activity. Science (New York, N. Y.), 360:436-439, April 2018.

[19] Janice S Chen, Enbo Ma, Lucas B Harrington, Xinran Tian, and Jennifer A Doudna. Crispr-cas12a target binding unleashes single-stranded dnase activity. bioRxiv, page 226993, 2017.

[20] Yuqi Chen, Yanyan Song, Fan Wu, Wenting Liu, Boshi Fu, Bingkun Feng, and Xiang Zhou. A dna logic gate based on strand displacement reaction and rolling circle amplification, responding to multiple low-abundance dna fragment input signals, and its application in detecting mirnas. Chemical communications, 51(32):6980-6983, 2015.

[21] Le Cong, F Ann Ran, David Cox, Shuailiang Lin, Robert Barretto, Naomi Habib, Patrick D Hsu, Xuebing Wu, Wenyan Jiang, Luciano A Marraffini, and Feng Zhang. Multiplex genome engineering using crispr/cas systems. Science (New York, N. Y.), 339:819-823, February 2013.

[22] UniProt Consortium. Uniprot: a worldwide hub of protein knowledge. Nucleic acids research, 47:D506-D515, January 2019.

[23] Gavin E Crooks, Gary Hon, John-Marc Chandonia, and Steven E Brenner. Weblogo: a sequence logo generator. Genome research, 14:1188-1190, June 2004.

[24] Kevin M Davis, Vikram Pattanayak, David B Thompson, John A Zuris, and David R Liu. Small molecule-triggered cas9 protein with improved genome-editing specificity. Nature chemical biology, 11:316-318, May 2015.

[25] Fabien Delacôte, Christophe Perez, Valérie Guyot, Marianne Duhamel, Christelle Rochon, Nathalie Ollivier, Rachel Macmaster, George H Silva, Frédéric Pâques, Fayza Daboussi, and Philippe Duchateau. High frequency targeted mutagenesis using engineered endonucleases and dna-end processing enzymes. PloS one, 8:e53217, 2013.

[26] Hélène Deveau, Rodolphe Barrangou, Josiane E Garneau, Jessica Labonté, Christophe Fremaux, Patrick Boyaval, Dennis A Romero, Philippe Horvath, and Sylvain Moineau. Phage response to crispr-encoded resistance in streptococcus thermophilus. Journal of bacteriology, 190:1390-1400, February 2008.

[27] John G Doench, Nicolo Fusi, Meagan Sullender, Mudra Hegde, Emma W Vaimberg, Katherine F Donovan, Ian Smith, Zuzana Tothova, Craig Wilen, Robert Orchard, Herbert W Virgin, Jennifer Listgarten, and David E Root. Optimized sgrna design to maximize activity and minimize off-target effects of crispr-cas9. Nature biotechnology, 34:184-191, February 2016.

[28] Katarzyna Duda, Lindsey A Lonowski, Michael Kofoed-Nielsen, Adriana Ibarra, Catherine M Delay, Qiaohua Kang, Zhang Yang, Shondra M Pruett-Miller, Eric P Bennett, Hans H Wandall, Gregory D Davis, Steen H Hansen, and Morten Frödin. High-efficiency genome editing via 2a-coupled co-expression of fluorescent proteins and zinc finger nucleases or crispr/cas9 nickase pairs. Nucleic acids research, 42:e84, June 2014.

[29] Alexandra East-Seletsky, Mitchell R O'Connell, David Burstein, Gavin J Knott, and Jennifer A Doudna. Rna targeting by functionally orthogonal type vi-a crispr-cas enzymes. Molecular cell, 66(3):373-383, 2017.

[30] Behnam Enghiad and Huimin Zhao. Programmable dna-guided artificial re- striction enzymes. ACS , 6:752-757, May 2017.

[31] Kevin M Esvelt, Jacob C Carlson, and David R Liu. A system for the continuous directed evolution of biomolecules. Nature, 472:499-503, April 2011.

[32] Kevin M Esvelt, Prashant Mali, Jonathan L Braff, Mark Moosburner, Stephanie J Yaung, and George M Church. Orthogonal cas9 proteins for rna-guided gene regulation and editing. Nature methods, 10:1116-1121, November 2013.

[33] Iman Farasat and Howard M Salis. A biophysical model of crispr/cas9 activity for rational design of genome editing and gene regulation. PLoS , 12:e1004724, January 2016.

[34] Quentin RV Ferry, Radostina Lyutova, and Tudor A Fulga. Rational design of inducible crispr guide rnas for de novo assembly of transcriptional programs. Nature communications, 8:14633, 2017.

[35] G Parker Flowers, Andrew T Timberlake, Kaitlin C McLean, James R Monaghan, and Craig M Crews. Highly efficient targeted mutagenesis in axolotl using cas9 rna-guided nuclease. Development (Cambridge, England), 141:2165-2171, May 2014.

[36] Ines Fonfara, Anaïs Le Rhun, Krzysztof Chylinski, Kira S Makarova, Anne-Laure Lécrivain, Janek Bzdrenga, Eugene V Koonin, and . Phylogeny of cas9 determines functional exchangeability of dual-rna and cas9 among orthologous type ii crispr-cas systems. Nucleic acids research, 42:2577-2590, February 2014.

[37] Ari E Friedland, Yonatan B Tzur, Kevin M Esvelt, Monica P Colaiácovo, George M Church, and John A Calarco. Heritable genome editing in c. elegans via a crispr-cas9 system. Nature methods, 10:741-743, August 2013.

[38] Yanfang Fu, Deepak Reyon, and J Keith Joung. Targeted genome editing in human cells using crispr/cas nucleases and truncated guide rnas. Methods in enzymology, 546:21-45, 2014.

[39] Linyi Gao, David B T Cox, Winston X Yan, John C Manteiga, Martin W Schneider, Takashi Yamano, Hiroshi Nishimasu, Osamu Nureki, Nicola Crosetto, and Feng Zhang. Engineered cpf1 variants with altered pam specificities. Nature biotechnology, 35:789-792, August 2017.

[40] Giedrius Gasiunas, Rodolphe Barrangou, Philippe Horvath, and Virginijus Siksnys. Cas9-crrna ribonucleoprotein complex mediates specific dna cleavage for adaptive immunity in bacteria. Proceedings of the National Academy of Sciences of the United States of America, 109:E2579-E2586, September 2012.

[41] Nicole M Gaudelli, Alexis C Komor, Holly A Rees, Michael S Packer, Ahmed H Badran, David I Bryson, and David R Liu. Programmable base editing of a-t to g-c in genomic dna without dna cleavage. Nature, 551:464-471, November 2017.

[42] Shanzhong Gong, Helen Hong Yu, Kenneth A. Johnson, and David W. Taylor. DNA unwinding is the primary determinant of CRISPR-cas9 activity. Cell Reports, 22(2):359-371, jan 2018.

[43] Jonathan S Gootenberg, Omar O Abudayyeh, Jeong Wook Lee, Patrick Essletzbichler, Aaron J Dy, Julia Joung, Vanessa Verdine, Nina Donghia, Nichole M Daringer, Catherine A Freije, Cameron Myhrvold, Roby P Bhattacharyya, Jonathan Livny, Aviv Regev, Eugene V Koonin, Deborah T Hung, Pardis C Sabeti, James J Collins, and Feng Zhang. Nucleic acid detection with crispr-cas13a/c2c2. Science (New York, N. Y.), 356:438-442, April 2017.

[44] Alexander A Green, Pamela A Silver, James J Collins, and Peng Yin. Toehold switches: de-novo-designed regulators of gene expression. Cell, 159(4):925-939, 2014.

[45] Ibtissem Grissa, Gilles Vergnaud, and Christine Pourcel. The crisprdb database and tools to display and to generate dictionaries of spacers and repeats. BMC bioinformatics, 8:172, May 2007.

[46] Dmitry Y Guschin, Adam J Waite, George E Katibah, Jeffrey C Miller, Michael C Holmes, and Edward J Rebar. A rapid and general assay for monitoring endogenous gene modification. Methods in molecular biology (Clifton, N.J.), 649:247-256, 2010.

[47] J I Gyi, G L Conn, A N Lane, and T Brown. Comparison of the thermodynamic stabilities and solution conformations of dna.rna hybrids containing purine-rich and pyrimidine-rich strands with dna and rna duplexes. Biochemistry, 35:12538-12548, September 1996.

[48] J I Gyi, A N Lane, G L Conn, and T Brown. Solution structures of dna.rna hybrids with purine-rich and pyrimidine-rich strands: comparison with the homologous dna and rna duplexes. Biochemistry, 37:73-80, January 1998.

[49] Lucas B Harrington, David Burstein, Janice S Chen, David Paez-Espino, Enbo Ma, Isaac P Witte, Joshua C Cofsky, Nikos C Kyrpides, Jillian F Banfield, and Jennifer A Doudna. Programmed dna destruction by miniature crispr-cas14 enzymes. Science (New York, N. Y.), 362:839-842, November 2018.

[50] Lucas B Harrington, David Paez-Espino, Brett T Staahl, Janice S Chen, Enbo Ma, Nikos C Kyrpides, and Jennifer A Doudna. A thermostable cas9 with increased lifetime in human plasma. Nature communications, 8:1424, November 2017.

[51] Ayal Hendel, Rasmus O Bak, Joseph T Clark, Andrew B Kennedy, Daniel E Ryan, Subhadeep Roy, Israel Steinfeld, Benjamin D Lunstad, Robert J Kaiser, Alec B Wilkens, Rosa Bacchetta, Anya Tsalenko, Douglas Dellinger, Laurakay Bruhn, and Matthew H Porteus. Chemically modified guide rnas enhance crispr-cas genome editing in human primary cells. Nature biotechnology, 33:985-989, Sep 2015.

[52] S Henikoff and J G Henikoff. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America, 89:10915-10919, November 1992.

[53] Claudio Hidalgo-Cantabrana, Alexandra B Crawley, Borja Sanchez, and Rodolphe Barrangou. Characterization and exploitation of crispr loci in bifidobacterium longum. Frontiers in microbiology, 8:1851, 2017.

[54] Hisato Hirano, Jonathan S Gootenberg, Takuro Horii, Omar O Abudayyeh, Mika Kimura, Patrick D Hsu, Takanori Nakane, Ryuichiro Ishitani, Izuho Hatada, Feng Zhang, Hiroshi Nishimasu, and Osamu Nureki. Structure and engineering of francisella novicida cas9. Cell, 164:950-961, February 2016.

[55] Lisa M Hochrein, Maayan Schwarzkopf, Mona Shahgholi, Peng Yin, and Niles A Pierce. Conditional dicer substrate formation via shape and sequence transduction with small conditional rnas. Journal of the American Chemical Society, 135(46):17322-17330, 2013.

[56] Liad Holtzman and Charles A Gersbach. Editing the epigenome: Reshaping the genomic landscape. Annual review of genomics and human genetics, 19:43-71, August 2018.

[57] Patrick D Hsu, David A Scott, Joshua A Weinstein, F Ann Ran, Silvana Konermann, Vineeta Agarwala, Yinqing Li, Eli J Fine, Xuebing Wu, Ophir Shalem, Thomas J Cradick, Luciano A Marraffini, Gang Bao, and Feng Zhang. Dna targeting specificity of rna-guided cas9 nucleases. Nature biotechnology, 31:827-832, September 2013.

[58] Johnny H Hu, Shannon M Miller, Maarten H Geurts, Weixin Tang, Liwei Chen, Ning Sun, Christina M Zeina, Xue Gao, Holly A Rees, Zhi Lin, and David R Liu. Evolved cas9 variants with broad pam compatibility and high dna specificity. Nature, 556:57-63, April 2018.

[59] Piyush K Jain, Vyas Ramanan, Arnout G Schepers, Nisha S Dalvie, Apekshya Panda, Heather E Fleming, and Sangeeta N Bhatia. Development of light-activated crispr using guide rnas with photocleavable protectors. Angewandte Chemie (International ed. in English), 55:12440-12444, Sep 2016.

[60] Li-En Jao, Susan R Wente, and Wenbiao Chen. Efficient multiplex biallelic zebrafish genome editing using a crispr nuclease system. Proceedings of the National Academy of Sciences of the United States of America, 110:13904-13909, August 2013.

[61] Fuguo Jiang and Jennifer A Doudna. Crispr-cas9 structures and mechanisms. Annual review of biophysics, 46:505-529, May 2017.

[62] Fuguo Jiang, David W Taylor, Janice S Chen, Jack E Kornfeld, Kaihong Zhou, Aubri J Thompson, Eva Nogales, and Jennifer A Doudna. Structures of a crispr-cas9 r-loop complex primed for dna cleavage. Science (New York, N. Y.), 351:867-871, February 2016.

[63] Fuguo Jiang, Kaihong Zhou, Linlin Ma, Saskia Gressel, and Jennifer A Doudna. Structural biology. a cas9-guide rna complex preorganized for target dna recognition. Science (New York, N. Y.), 348:1477-1481, June 2015.

[64] Martin Jinek, Krzysztof Chylinski, Ines Fonfara, Michael Hauer, Jennifer A Doudna, and Emmanuelle Charpentier. A programmable dual-rna-guided dna endonuclease in adaptive bacterial immunity. Science (New York, N. Y.), 337(6096):816-821, August 2012.

[65] Martin Jinek, Alexandra East, Aaron Cheng, Steven Lin, Enbo Ma, and Jennifer Doudna. Rna-programmed genome editing in human cells. eLife, 2:e00471, January 2013.

[66] Eric A. Josephs, D. Dewran Kocak, Christopher J. Fitzgibbon, Joshua McMenemy, Charles A. Gersbach, and Piotr E. Marszalek. Structure and specificity of the RNA-guided endonuclease cas9 during DNA interrogation, target binding and cleavage. Nucleic Acids Research, 43(18):8924-8941, sep 2015.

[67] M S Jurica, R J Monnat, and B L Stoddard. Dna recognition and cleavage by the laglidadg homing endonuclease i-crei. Molecular cell, 2:469-476, October 1998.

[68] Samira Kiani, Alejandro Chavez, Marcelle Tuttle, Richard N Hall, Raj Chari, Dmitry Ter-Ovanesyan, Jason Qian, Benjamin W Pruitt, Jacob Beal, Suhani Vora, et al. Cas9 grna engineering for genome editing, activation and repression. Nature methods, 12(11):1051-1054, 2015.

[69] Daesik Kim, Jungeun Kim, Junho K Hur, Kyung Wook Been, Sun-Heui Yoon, and Jin-Soo Kim. Genome-wide analysis reveals specificities of cpf1 endonucleases in human cells. Nature biotechnology, 34:863-868, August 2016.

[70] Eunji Kim, Taeyoung Koo, Sung Wook Park, Daesik Kim, Kyoungmi Kim, Hee-Yeon Cho, Dong Woo Song, Kyu Jun Lee, Min Hee Jung, Seokjoong Kim, Jin Hyoung Kim, Jeong Hun Kim, and Jin-Soo Kim. In vivo genome editing with a small cas9 orthologue derived from campylobacter jejuni. Nature communications, 8:14500, February 2017.

[71] Benjamin P Kleinstiver, Vikram Pattanayak, Michelle S Prew, Shengdar Q Tsai, Nhu T Nguyen, Zongli Zheng, and J Keith Joung. High-fidelity crispr-cas9 nucleases with no detectable genome-wide off-target effects. Nature, 529:490-495, January 2016.

[72] Benjamin P Kleinstiver, Michelle S Prew, Shengdar Q Tsai, Nhu T Nguyen, Ved V Topkar, Zongli Zheng, and J Keith Joung. Broadening the targeting range of staphylococcus aureus CRISPR-cas9 by modifying PAM recognition. Nature Biotechnology, 33(12):1293-1298, nov 2015.

[73] Benjamin P Kleinstiver, Michelle S Prew, Shengdar Q Tsai, Ved V Topkar, Nhu T Nguyen, Zongli Zheng, Andrew P W Gonzales, Zhuyun Li, Randall T Peterson, Jing-Ruey Joanna Yeh, Martin J Aryee, and J Keith Joung. Engineered crispr-cas9 nucleases with altered pam specificities. Nature, 523:481-485, July 2015.

[74] Benjamin P Kleinstiver, Shengdar Q Tsai, Michelle S Prew, Nhu T Nguyen, Moira M Welch, Jose M Lopez, Zachary R McCaw, Martin J Aryee, and J Keith Joung. Genome-wide specificities of crispr-cas cpf1 nucleases in human cells. Nature biotechnology, 34:869-874, August 2016.

[75] Luke W Koblan, Jordan L Doman, Christopher Wilson, Jonathan M Levy, Tristan Tay, Gregory A Newby, Juan Pablo Maianti, Aditya Raguram, and David R Liu. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nature biotechnology, 36:843-846, October 2018.

[76] Alexis C Komor, Ahmed H Badran, and David R Liu. Crispr-based technologies for the manipulation of eukaryotic genomes. Cell, 168:20-36, January 2017.

[77] Alexis C Komor, Yongjoo B Kim, Michael S Packer, John A Zuris, and David R Liu. Programmable editing of a target base in genomic dna without double-stranded dna cleavage. Nature, 533:420-424, May 2016.

[78] Eugene V Koonin, Kira S Makarova, and Feng Zhang. Diversity, classification and evolution of crispr-cas systems. Current opinion in microbiology, 37:67-78, June 2017.

[79] Maurice Labuhn, Felix F Adams, Michelle Ng, Sabine Knoess, Axel Schambach, Emmanuelle M Charpentier, Adrian Schwarzer, Juan L Mateo, Jan-Henning Klusmann, and Dirk Heckl. Refined sgrna efficacy prediction improves large- and small-scale crispr-cas9 applications. Nucleic acids research, 46:1375-1385, February 2018.

[80] Ben Langmead and Steven L Salzberg. Fast gapped-read alignment with bowtie 2. Nature methods, 9:357-359, March 2012.

[81] Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L Salzberg. Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome biology, 10:R25, 2009.

[82] Kunwoo Lee, Vanessa A Mackley, Anirudh Rao, Anthony T Chong, Mark A Dewitt, Jacob E Corn, and Niren Murthy. Synthetically modified guide rna and donor dna are a versatile platform for crispr-cas9 engineering. eLife, 6, May 2017.

[83] Seung Hwan Lee, Giandomenico Turchiano, Hirotaka Ata, Somaira Nowsheen, Marianna Romito, Zhenkun Lou, Seuk-Min Ryu, Stephen C Ekker, Toni Cathomen, and Jin-Soo Kim. Failure to detect dna-guided genome editing using natronobacterium gregoryi argonaute. Nature biotechnology, 35:17-18, November 2016.

[84] Ryan T Leenay and Chase L Beisel. Deciphering, communicating, and engineering the crispr pam. Journal of molecular biology, 429:177-191, January 2017.

[85] Ryan T. Leenay, Kenneth R. Maksimchuk, Rebecca A. Slotkowski, Roma N. Agrawal, Ahmed A. Gomaa, Alexandra E. Briner, Rodolphe Barrangou, and Chase L. Beisel. Identifying and visualizing functional PAM diversity across CRISPR-cas systems. Molecular Cell, 62(1):137-147, apr 2016.

[86] Tristan Lefébure, Vince P Richards, Ping Lang, Paulina Pavinski-Bitar, and Michael J Stanhope. Gene repertoire evolution of streptococcus pyogenes inferred from phylogenomic analysis with streptococcus canis and streptococcus dysgalactiae. PloS one, 7:e37607, 2012.

[87] Xiaosa Li, Ying Wang, Yajing Liu, Bei Yang, Xiao Wang, Jia Wei, Zongyang Lu, Yuxi Zhang, Jing Wu, Xingxu Huang, Li Yang, and Jia Chen. Base editing with a cpf1-cytidine deaminase fusion. Nature Biotechnology, 36(4):324-327, mar 2018.

[88] Xiquan Liang, Jason Potter, Shantanu Kumar, Yanfei Zou, Rene Quintanilla, Mahalakshmi Sridharan, Jason Carte, Wen Chen, Natasha Roark, Sridhar Ranganathan, Namritha Ravinder, and Jonathan D Chesnut. Rapid and highly efficient mammalian cell engineering via cas9 protein transfection. Journal of biotechnology, 208:44-53, August 2015.

[89] Steven Lin, Brett T Staahl, Ravi K Alla, and Jennifer A Doudna. Enhanced homology-directed human genome engineering by controlled timing of crispr/cas9 delivery. eLife, 3:e04766, December 2014.

[90] Yuchen Liu, Yonghao Zhan, Zhicong Chen, Anbang He, Jianfa Li, Hanwei Wu, Li Liu, Chengle Zhuang, Junhao Lin, Xiaoqiang Guo, et al. Directing cellular information flow via crispr signal conductors. Nature methods, 13(11):938, 2016.

[91] David Loughrey, Kyle E Watters, Alexander H Settle, and Julius B Lucks. Shape-seq 2.0: systematic optimization and extension of high-throughput chemical probing of rna secondary structure with next generation sequencing. Nucleic acids research, 42, December 2014.

[92] Morgan L Maeder, Stacey Thibodeau-Beganny, Anna Osiak, David A Wright, Reshma M Anthony, Magdalena Eichtinger, Tao Jiang, Jonathan E Foley, Ronnie J Winfrey, Jeffrey A Townsend, Erica Unger-Wallace, Jeffry D Sander, Felix Müller-Lerch, Fengli Fu, Joseph Pearlberg, Carl Göbel, Justin P Dassie, Shondra M Pruett-Miller, Matthew H Porteus, Dennis C Sgroi, A John Iafrate, Drena Dobbs, Paul B McCray, Toni Cathomen, Daniel F Voytas, and J Keith Joung. Rapid "open-source" engineering of customized zinc-finger nucleases for highly efficient gene modification. Molecular cell, 31:294-301, July 2008.

[93] Prashant Mali, Luhan Yang, Kevin M Esvelt, John Aach, Marc Guell, James E DiCarlo, Julie E Norville, and George M Church. Rna-guided human genome engineering via cas9. Science (New York, N. Y.), 339:823-826, February 2013.

[94] Luciano A Marraffini and Erik J Sontheimer. Crispr interference: Rna-directed adaptive immunity in bacteria and archaea. Nature Reviews Genetics, 11(3):181, 2010.

[95] Aamir Mir, Alireza Edraki, Jooyoung Lee, and Erik J. Sontheimer. Type II-c CRISPR-cas9 biology, mechanism, and application. ACS Chemical Biology, 13(2):357-365, dec 2017.

[96] Makoto Miyagishi and Kazunari Taira. U6 promoter-driven sirnas with four uridine 3' overhangs efficiently suppress targeted gene expression in mammalian cells. Nature biotechnology, 20:497-500, May 2002.

[97] F J M Mojica, C Diez-Villasenor, J Garcia-Martinez, and C Almendros. Short motif sequences determine the targets of the prokaryotic crispr defence system. Microbiology (Reading, England), 155:733-740, March 2009.

[98] Francisco J M Mojica, César Díez-Villaseñor, Jesús García-Martínez, and Elena Soria. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. Journal of molecular evolution, 60:174-182, February 2005.

[99] Maximilian Müller, Ciaran M Lee, Giedrius Gasiunas, Timothy H Davis, Thomas J Cradick, Virginijus Siksnys, Gang Bao, Toni Cathomen, and Claudio Mussolino. Streptococcus thermophilus crispr-cas9 systems enable specific editing of the human genome. Molecular therapy : the journal of the American Society of Gene Therapy, 24:636-644, March 2016.

[100] Shu-ichi Nakano, Takayuki Kanzaki, and Naoki Sugimoto. Influences of ribonucleotide on a duplex conformation and its thermal stability: study with the chimeric rna-dna strands. Journal of the American Chemical Society, 126:1088-1095, February 2004.

[101] Takuya Nakayama, Margaret B Fish, Marilyn Fisher, Jamina Oomen-Hajagos, Gerald H Thomsen, and Robert M Grainger. Simple and efficient crispr/cas9-mediated targeted mutagenesis in xenopus tropicalis. Genesis (New York, N. Y. : 2000), 51:835-843, December 2013.

[102] Hiroshi Nishimasu, F Ann Ran, Patrick D Hsu, Silvana Konermann, Soraya I Shehata, Naoshi Dohmae, Ryuichiro Ishitani, Feng Zhang, and Osamu Nureki. Crystal structure of cas9 in complex with guide rna and target dna. Cell, 156(5):935-949, 2014.

[103] Hiroshi Nishimasu, Xi Shi, Soh Ishiguro, Linyi Gao, Seiichi Hirano, Sanae Okazaki, Taichi Noda, Omar O Abudayyeh, Jonathan S Gootenberg, Hideto Mori, Seiya Oura, Benjamin Holmes, Mamoru Tanaka, Motoaki Seki, Hisato Hirano, Hiroyuki Aburatani, Ryuichiro Ishitani, Masahito Ikawa, Nozomu Yachie, Feng Zhang, and Osamu Nureki. Engineered crispr-cas9 nuclease with expanded targeting space. Science (New York, N. Y.), August 2018.

[104] James K Nuñez, Philip J Kranzusch, Jonas Noeske, Addison V Wright, Christopher W Davies, and Jennifer A Doudna. Cas1-cas2 complex formation mediates spacer acquisition during crispr-cas adaptive immunity. Nature structural & molecular biology, 21:528-534, June 2014.

[105] K K Ogilvie, N Usman, K Nicoghosian, and R J Cedergren. Total chemical synthesis of a 77-nucleotide-long rna sequence having methionine-acceptance activity. Proceedings of the National Academy of Sciences of the United States of America, 85:5764-5768, August 1988.

[106] Luca Pinello, Matthew C Canver, Megan D Hoban, Stuart H Orkin, Donald B Kohn, Daniel E Bauer, and Guo-Cheng Yuan. Analyzing crispr genome-editing experiments with crispresso. Nature biotechnology, 34:695-697, July 2016.

[107] Lei S Qi, Matthew H Larson, Luke A Gilbert, Jennifer A Doudna, Jonathan S Weissman, Adam P Arkin, and Wendell A Lim. Repurposing crispr as an rna-guided platform for sequence-specific control of gene expression. Cell, 152:1173-1183, February 2013.

[108] Cherie L Ramirez, Jonathan E Foley, David A Wright, Felix Müller-Lerch, Shamim H Rahman, Tatjana I Cornu, Ronnie J Winfrey, Jeffry D Sander, Fengli Fu, Jeffrey A Townsend, Toni Cathomen, Daniel F Voytas, and J Keith Joung. Unexpected failure rates for modular assembly of engineered zinc fingers. Nature methods, 5:374-375, May 2008.

[109] F Ann Ran, Le Cong, Winston X Yan, David A Scott, Jonathan S Gootenberg, Andrea J Kriz, Bernd Zetsche, Ophir Shalem, Xuebing Wu, Kira S Makarova, Eugene V Koonin, Phillip A Sharp, and Feng Zhang. In vivo genome editing using staphylococcus aureus cas9. Nature, 520:186-191, April 2015.

[110] F Ann Ran, Patrick D Hsu, Chie-Yu Lin, Jonathan S Gootenberg, Silvana Konermann, Alexandro E Trevino, David A Scott, Azusa Inoue, Shogo Matoba, Yi Zhang, and Feng Zhang. Double nicking by rna-guided crispr cas9 for enhanced genome editing specificity. Cell, 154:1380-1389, September 2013.

[111] Deepak Reyon, Shengdar Q Tsai, Cyd Khayter, Jennifer A Foden, Jeffry D Sander, and J Keith Joung. Flash assembly of talens for high-throughput genome editing. Nature biotechnology, 30:460-465, May 2012.

[112] Vincent P Richards, Sara R Palmer, Paulina D Pavinski Bitar, Xiang Qin, George M Weinstock, Sarah K Highlander, Christopher D Town, Robert A Burne, and Michael J Stanhope. Phylogenomics and the dynamic genome evolution of the genus streptococcus. Genome biology and evolution, 6:741-753, April 2014.

[113] Vincent P Richards, Ruth N Zadoks, Paulina D Pavinski Bitar, Tristan Lefébure, Ping Lang, Brenda Werner, Linda Tikofsky, Paolo Moroni, and Michael J Stanhope. Genome characterization and population genetic structure of the zoonotic pathogen, streptococcus canis. BMC microbiology, 12:293, December 2012.

[114] Christopher D Richardson, Graham J Ray, Mark A DeWitt, Gemma L Curie, and Jacob E Corn. Enhancing homology-directed genome editing by catalytically active and inactive crispr-cas9 using asymmetric donor dna. Nature biotechnology, 34:339-344, March 2016.

[115] David E Root, Nir Hacohen, William C Hahn, Eric S Lander, and David M Sabatini. Genome-scale loss-of-function screening with a lentiviral rnai library. Nature methods, 3:715-719, September 2006.

[116] Jeffry D Sander and J Keith Joung. Crispr-cas systems for editing, regulating and targeting genomes. Nature biotechnology, 32:347-355, April 2014.

[117] Kellie A Schaefer, Wen-Hsuan Wu, Diana F Colgan, Stephen H Tsang, Alexander G Bassuk, and Vinit B Mahajan. Unexpected mutations after crispr-cas9 editing in vivo. Nature methods, 14:547-548, May 2017.

[118] Johannes Schindelin, Ignacio Arganda-Carreras, Erwin Frise, Verena Kaynig, Mark Longair, Tobias Pietzsch, Stephan Preibisch, Curtis Rueden, Stephan Saalfeld, Benjamin Schmid, Jean-Yves Tinevez, Daniel James White, Volker Hartenstein, Kevin Eliceiri, Pavel Tomancak, and Albert Cardona. Fiji: an open-source platform for biological-image analysis. Nature methods, 9:676-682, June 2012.

[119] Jonathan L Schmid-Burgk, Klara Höning, Thomas S Ebert, and Veit Hornung. Crispaint allows modular base-specific gene tagging using a ligase-4-dependent mechanism. Nature communications, 7:12338, July 2016.

[120] Shiraz A Shah, Susanne Erdmann, Francisco J M Mojica, and Roger A Garrett. Protospacer recognition motifs: mixed identities and functional diversity. RNA biology, 10:891-899, May 2013.

[121] Qiwei Shan, Yanpeng Wang, Jun Li, and Caixia Gao. Genome editing in rice and wheat using the crispr/cas system. Nature protocols, 9:2395-2410, October 2014.

[122] Sergey A Shmakov, Vassilii Sitnik, Kira S Makarova, Yuri I Wolf, Konstantin V Severinov, and Eugene V Koonin. The crispr spacer space is dominated by sequences from species-specific mobilomes. mBio, 8, September 2017.

[123] Digvijay Singh, Samuel H Sternberg, Jingyi Fei, Jennifer A Doudna, and Taekjip Ha. Real-time observation of dna recognition and rejection by the rna-guided endonuclease cas9. Nature communications, 7:12778, September 2016.

[124] Ian M Slaymaker, Linyi Gao, Bernd Zetsche, David A Scott, Winston X Yan, and Feng Zhang. Rationally engineered cas9 nucleases with improved specificity. Science (New York, N.Y.), 351:84-88, Jan 2016.

[125] Nicholas M Snead, Julie R Escamilla-Powers, John J Rossi, and Anton P McCaffrey. 5' unlocked nucleic acid modification improves sirna targeting. Molecular therapy. Nucleic acids, 2:e103, July 2013.

[126] Jeffrey M. Spencer and Xiaoliu Zhang. Deep mutational scanning of s. pyogenes cas9 reveals important functional domains. Scientific Reports, 7(1), dec 2017.

[127] M L Stephenson and P C Zamecnik. Inhibition of rous sarcoma viral rna translation by a specific oligodeoxyribonucleotide. Proceedings of the National Academy of Sciences of the United States of America, 75:285-288, January 1978.

[128] Samuel H Sternberg, Benjamin LaFrance, Matias Kaplan, and Jennifer A Doudna. Conformational control of dna target cleavage by crispr-cas9. Nature, 527:110-113, November 2015.

[129] Samuel H Sternberg, Sy Redding, Martin Jinek, Eric C Greene, and Jennifer A Doudna. Dna interrogation by the crispr rna-guided endonuclease cas9. Nature, 507:62-67, March 2014.

[130] Jonathan Strecker, Sara Jones, Balwina Koopal, Jonathan Schmid-Burgk, Bernd Zetsche, Linyi Gao, Kira S Makarova, Eugene V Koonin, and Feng Zhang. Engineering of crispr-cas12b for human genome editing. Nature communications, 10:212, January 2019.

[131] N Sugimoto, S Nakano, M Katoh, A Matsumura, H Nakamuta, T Ohmichi, M Yoneyama, and M Sasaki. Thermodynamic parameters to predict stability of rna/dna hybrid duplexes. Biochemistry, 34:11211-11216, September 1995.

[132] Keiichiro Suzuki, Yuji Tsunekawa, Reyna Hernandez-Benitez, Jun Wu, Jie Zhu, Euiseok J. Kim, Fumiyuki Hatanaka, Mako Yamamoto, Toshikazu Araoka, Zhe Li, et al. In vivo genome editing via crispr/cas9 mediated homology-independent targeted integration. Nature, 540(7631):144-149, Nov 2016.

[133] Mark D Szczelkun, Maria S Tikhomirova, Tomas Sinkunas, Giedrius Gasiunas, Tautvydas Karvelis, Patrizia Pschera, Virginijus Siksnys, and Ralf Seidel. Direct observation of r-loop formation by single rna-guided cas9 and cascade effector complexes. Proceedings of the National Academy of Sciences of the United States of America, 111:9798-9803, July 2014.

[134] Ryo Takeuchi, Michael Choi, and Barry L Stoddard. Redesign of extensive protein-dna interfaces of meganucleases using iterative cycles of in vitro compartmentalization. Proceedings of the National Academy of Sciences, 111(11):4061-4066, 2014.

[135] Weixin Tang, Johnny H Hu, and David R Liu. Aptazyme-embedded guide rnas enable ligand-responsive genome editing and transcriptional activation. Nature communications, 8:15939, 2017.

[136] Summer B Thyme, Laila Akhmetova, Tessa G Montague, Eivind Valen, and Alexander F Schier. Internal guide rna interactions interfere with cas9-mediated cleavage. Nature communications, 7:11750, June 2016.

[137] Shengdar Q Tsai, Nhu T Nguyen, Jose Malagon-Lopez, Ved V Topkar, Martin J Aryee, and J Keith Joung. Circle-seq: a highly sensitive in vitro screen for genome-wide crispr-cas9 nuclease off-targets. Nature methods, 14:607-614, June 2017.

[138] Kumiko Ui-Tei, Yuki Naito, Shuhei Zenno, Kenji Nishi, Kenji Yamato, Fumitaka Takahashi, Aya Juni, and Kaoru Saigo. Functional dissection of sirna sequence by systematic dna substitution: modified sirna with a dna seed arm is a powerful tool for mammalian gene silencing with significantly reduced off-target effect. Nucleic acids research, 36:2136-2151, April 2008.

[139] Léna Vouillot, Aurore Thélie, and Nicolas Pollet. Comparison of t7e1 and surveyor mismatch cleavage assays to detect mutations triggered by engineered nucleases. G3 (Bethesda, Md.), 5:407-415, January 2015.

[140] Haifeng Wang, Marie La Russa, and Lei S Qi. Crispr/cas9 in genome editing and beyond. Annual review of biochemistry, 85:227-264, June 2016.

[141] Tina Wang, Ahmed H Badran, Tony P Huang, and David R Liu. Continuous directed evolution of proteins with improved soluble expression. Nature chemical biology, August 2018.

[142] Yueqiang Wang, Zhiqian Li, Jun Xu, Baosheng Zeng, Lin Ling, Lang You, Yazhou Chen, Yongping Huang, and Anjiang Tan. The crispr/cas system mediates efficient genome engineering in bombyx mori. Cell research, 23:1414-1416, December 2013.

[143] Wen Xue, Sidi Chen, Hao Yin, Tuomas Tammela, Thales Papagiannakopoulos, Nikhil S Joshi, Wenxin Cai, Gillian Yang, Roderick Bronson, Denise G Crowley, Feng Zhang, Daniel G Anderson, Phillip A Sharp, and Tyler Jacks. Crispr-mediated direct mutation of cancer genes in the mouse liver. Nature, 514:380-384, October 2014.

[144] Takashi Yamano, Hiroshi Nishimasu, Bernd Zetsche, Hisato Hirano, Ian M Slaymaker, Yinqing Li, Iana Fedorova, Takanori Nakane, Kira S Makarova, Eugene V Koonin, Ryuichiro Ishitani, Feng Zhang, and Osamu Nureki. Crystal structure of cpf1 in complex with guide rna and target dna. Cell, 165:949-962, May 2016.

[145] Takashi Yamano, Bernd Zetsche, Ryuichiro Ishitani, Feng Zhang, Hiroshi Nishimasu, and Osamu Nureki. Structural basis for the canonical and non-canonical PAM recognition by CRISPR-cpf1. Molecular Cell, 67(4):633-645.e3, aug 2017.

[146] Joseph N Zadeh, Conrad D Steenberg, Justin S Bois, Brian R Wolfe, Marshall B Pierce, Asif R Khan, Robert M Dirks, and Niles A Pierce. Nupack: analysis and design of nucleic acid systems. Journal of computational chemistry, 32(1):170-173, 2011.

[147] Bernd Zetsche, Jonathan S Gootenberg, Omar O Abudayyeh, Ian M Slaymaker, Kira S Makarova, Patrick Essletzbichler, Sara E Volz, Julia Joung, John van der Oost, Aviv Regev, Eugene V Koonin, and Feng Zhang. Cpf1 is a single rna-guided endonuclease of a class 2 crispr-cas system. Cell, 163:759-771, October 2015.

[148] David Yu Zhang and Erik Winfree. Robustness and modularity properties of a non-covalent dna catalytic reaction. Nucleic acids research, 38(12):4182-4197, 2010.

[149] Junbin Zhang, Jie Zheng, Chang Lu, Quan Du, Zicai Liang, and Zhen Xi. Modification of the sirna passenger strand by 5-nitroindole dramatically reduces its off-target effects. Chembiochem : a European journal of chemical biology, 13:1940-1945, September 2012.

[150] Min Zhang, Chengqi Wang, Thomas D Otto, Jenna Oberstaller, Xiangyun Liao, Swamy R Adapa, Kenneth Udenze, Iraad F Bronner, Deborah Casandra, Matthew Mayho, Jacqueline Brown, Suzanne Li, Justin Swanson, Julian C Rayner, Rays H Y Jiang, and John H Adams. Uncovering the essential genes of the human malaria parasite plasmodium falciparum by saturation mutagenesis. Science (New York, N.Y.), 360, May 2018.
