Engineering High-Precision CRISPR-Nuclease and Base Editor Technologies


Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:40050081

Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Developing High-Precision CRISPR-Cas9 Base Editor Technologies

A dissertation presented by

Jason Gehrke

to

The Department of Molecular and Cellular Biology

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in the subject of

Biochemistry

Harvard University

Cambridge, Massachusetts

May 2018

© 2018 Jason Gehrke

All rights reserved.

Dissertation advisor: Dr. J. Keith Joung Author: Jason Gehrke

Abstract

The human genome, comprising approximately three billion nucleotides separated across 23 chromosomes, contains a vast amount of sequence space encoding instructions for the production of RNA and protein molecules that make up the functional components of the cell. Some changes in these genetic sequences produce phenotypic changes in organisms, such as genetically heritable diseases in humans. Genome editing, loosely defined as creating targeted, sequence-specific changes in a desired cellular genome, promises to enable the correction of many disease-causing mutations. To enable these technologies to be used to their full potential in research and therapeutic settings, it is necessary to engineer genome editing technologies with extreme specificity and fidelity to the desired target sequence compared to all other possible sequences in the target genome. This work focuses on engineering CRISPR-Cas9 RNA-guided nucleases and base editor technologies to enhance their precision and genome-wide specificities, thus enabling researchers to more effectively use these tools to study fundamental biological principles or to translate these technologies into clinical settings.

Chapter 1 introduces relevant history of the genetic engineering field, including contemporary genome editing platforms such as CRISPR-Cas9 nuclease and base editor platforms.

Chapter 2 describes efforts towards engineering CRISPR-Cas9 systems that are dependent on specific epigenetic contexts adjacent to their target site in order to successfully induce DNA double strand breaks, adding additional layers of complexity and regulation of activity to the CRISPR-Cas9 platform.


Chapter 3 focuses on leveraging natural cytidine deaminase protein diversity, coupled with standard protein engineering techniques, to create base editor proteins able to programmably edit single nucleotides at desired on-target sites based on the sequence context of a given target base.

The technology described in this chapter greatly enhances the ability of the base editor platform to correct many disease-causing genomic SNPs with fewer or no bystander nucleotide editing events at the on-target site, and greatly reduces the rates of off-target editing compared to the original base editor platform. Importantly, we demonstrate that this technology can be used to more efficiently and precisely correct a disease-causing mutation in a model cell line and in erythroid precursor cells derived from a patient bearing this mutation.


Table of Contents

Abstract
Table of Contents
Acknowledgements
Chapter 1: Introduction
Repair of naturally-occurring or enzymatically-induced DNA lesions
Overview of genome editing tools
CRISPR and the democratization of genome editing
Base editor technologies
Genome-wide specificities of designer nucleases and base editors
Chapter 2: Engineering CRISPR-Cas9 for Improved Targeting Range, Delivery, and Control
Engineering transcription factor-specific CRISPR-Cas9 nucleases
Engineering minimal Cas9 orthologs for double stranded DNA binding and nuclease activity
Chapter 3: Engineering Sequence-Specific Cytidine Deaminases
Developing high-precision CRISPR-Cas9 base editors with minimized bystander and off-target mutations
Supplementary Figures
Methods
Chapter 4: Discussion and future directions
Supplementary Tables
Supplementary Table 1
Supplementary Table 2
Supplementary Table 3
Supplementary Table 4
Supplementary Table 5
Supplementary Table 6
References

Acknowledgements

I would like to express my sincere gratitude to Prof. J. Keith Joung for the years of scientific training and knowledge he has imparted to me, and the profound impact he has had on how I approach science, especially the field of genome editing. The creative autonomy given to me in proposing and pursuing ideas was essential for my development as a scientist. Though very few of my ideas became successful projects, each failed experiment taught me more about designing projects and experiments with scientific rigor, and I couldn’t have asked for a more supportive mentor through this process.

To my friends, particularly James Angstman, Diego Baptista, Jacques Carolan, Elliot Clark, Geoff Cockrell, Sandy Mattei, Grace Sager, Matt Smith and Cayla Zimmer: without you I would have never made it through graduate school. Your support and encouragement over the past five years, especially in difficult and trying times, have kept me going.

To my family, and especially to my parents and sister: your support of my decisions, good or bad, has gotten me here, and I cannot thank you enough for all that you have done for me.

Lastly, I’d like to thank all of the collaborators that I’ve worked with over the course of my graduate school experience. Whether we were successful in our endeavors or not, these were valuable experiences and I learned a great deal from every shared project. In particular, Prof. Dan Bauer and Drs. Yuxuan Wu and Jing Zeng were instrumental in aiding the translation of the base editor technologies I developed to patient-derived cells. I also thank my advisory committee members, Profs. Mo Khalil, David Liu, and Alex Schier, for the valuable advice and direction given to me over my graduate career, as well as the National Science Foundation for helping to fund my scientific education.


Chapter 1: Introduction

The genomes of organisms encode a set of instructions for the production of functional proteins and RNAs in the form of DNA sequence elements that collectively enable a given cell to perform standard housekeeping functions that meet the basic requirements of the cell, and also to sense external stimuli and respond through adaptation to its environment. Production of each of these elements is tightly regulated in the cell, and genomic alterations that result in misregulation of either protein-coding genes or non-coding RNA species can result in disease phenotypes in humans (Ptashne, 2014). The most well-understood disease-causing mutations reside in protein-coding genes, where changes in the genomic sequence often result in the incorporation of improper amino acids or even premature termination of protein translation, though mutations in the genomic sequences encoding long non-coding RNAs or microRNAs are increasingly being recognized as drivers of human diseases (Esteller, 2011). Critically, eukaryotic genomes encode vast numbers of as-yet incompletely understood sequences that contribute to the precise regulation of the genome, from the three-dimensional organization of nucleic acids to the control of gene expression by regulatory elements such as promoters or enhancers. The genetic engineering tools described in this work serve two important functions: (1) they allow users to precisely define the function of genetic elements by perturbing their sequences in cells, and (2) they hold the promise of allowing users to precisely correct disease-causing mutations in the context of therapeutic interventions.

The earliest forms of genetic engineering began between 10,000 and 15,000 years ago with the domestication of animals and crops. In both cases, selective breeding of individuals with desired traits resulted in offspring with those traits, and over many generations humans were able to produce crops and animals that fit their needs and desires. The basis of selective breeding as the propagation of heritable material only began to be elucidated by Gregor Mendel in 1865, and the molecular basis of the genetic material being passed down to new generations wasn’t understood until Alfred Hershey and Martha Chase used radiolabeled DNA to track its egress from bacteriophage viruses into bacteria in 1952 (Hershey and Chase, 1952).

The discovery of DNA as the molecular basis of phenotypic traits raised the possibility that researchers might be able to manipulate DNA directly in order to engineer desired phenotypic traits without the time-consuming efforts of early methods relying on selective breeding. To enable this modern form of genome engineering, though, it would also be necessary to invent methods to discover the genetic sequences of entire organismal genomes in order to manipulate them, a feat not required for engineering traits by selective breeding. The advent of high-throughput DNA sequencing capabilities has revolutionized our understanding of the molecular basis of biology, and the use of sequence-specific, customizable tools to easily manipulate these sequences in cells of a large swath of organisms, including humans, has come to be known as genome editing. Genome editing offers a vastly improved platform for rapidly generating animals or crops with desired traits, and is now beginning to be used to genetically alter specific cells or tissues of humans to alleviate disease phenotypes in clinical settings.

Repair of naturally-occurring or enzymatically-induced DNA lesions

Damage to genomic DNA in mammalian cells regularly occurs in the form of ionizing radiation-induced damage, oxidation or deamination of individual nucleotides, DNA replication errors, single strand breaks (nicks), or double strand breaks (DSBs) (Chang et al., 2017; Lieber, 2010). The generation of DSBs represents the most potent threat to genomic stability because repair of these lesions often requires processing of the DNA at the lesion site by DNA exonucleases and DNA polymerases, and can lead to loss of genomic sequence or fusion of distant genomic sequences by translocation. Because DSBs occur at an estimated frequency of 10 DSBs per day per cell in mammalian cells (Lieber and Karanjawala, 2004), and because native processes in vertebrate lymphocyte and immune system development such as V(D)J recombination and antibody affinity maturation require the induction and subsequent repair of DSBs or deamination of genomic DNA, mammalian cells have developed robust methods for recognizing and repairing DNA damage lesions.

Physiologically-occurring DSBs are largely repaired by two competing pathways: non-homologous end joining (NHEJ) or homology-directed repair (HDR). NHEJ works by re-joining the two freely-diffusing DNA ends in a manner mostly agnostic to the sequences of the two ends, whereas HDR requires a copy of the sequence bearing a DSB nearby to use as a template for repairing the break site. Because most breaks occur outside of the S phase of the cell cycle when a second copy of genomic sequence is in close proximity in the form of a sister chromatid, NHEJ is the predominant repair pathway in all mammalian cell types and is active in all phases of the cell cycle (Moore and Haber, 1996).

NHEJ repair of DSBs is an error-prone process. Binding to the DNA ends by the Ku protein recruits a suite of DNA processing proteins to the site of the DSB, including exonucleases, polymerases, and DNA ligase proteins (Jeggo, 1998; O’Driscoll and Jeggo, 2006). Upon recruitment to the DSB site, DNA exonucleases resect one or both DNA ends to produce short, single-stranded DNA ends that are compatible for ligation to each other. DNA polymerases also act at the DNA ends to fill in resected DNA ends, and sometimes act to add DNA sequences in a template-independent manner (Dyck et al., 1999; Sonoda et al., 2006). These processes work iteratively until successful re-ligation of the two DNA ends occurs and the DNA lesion has been repaired (Grawunder et al., 1997).

Because NHEJ repair involves iterative cycles of DNA resection and re-polymerization, and can include template-independent insertion of sequences by DNA polymerases, repair of a DSB by NHEJ often gains or loses information at the site of the DSB in the form of small insertion or deletion (indel) mutations (Chang et al., 2017). When a DSB is induced in a protein-coding region of the genome, the resulting indels can cause missense or nonsense mutations when translated into protein, thus inactivating the target gene. When targeted to a gene regulatory element such as a promoter or enhancer, indels can result in non-functional regulatory elements by altering or abrogating the DNA sequences which are bound by sequence-specific transcription factors. Because NHEJ is the most efficient DNA repair pathway, the simplest and most efficient genome editing methods rely on inducing indel formation at a target site in a functional genomic element.
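As a rough illustration of why small indels so often inactivate a protein-coding gene, the sketch below classifies an indel by whether its net length change preserves the triplet reading frame. The function name and the example sequences are hypothetical and are not taken from this work; the only biology assumed is that codons are read in groups of three nucleotides, so a net change that is not a multiple of three shifts the frame of everything downstream.

```python
def classify_indel(ref: str, alt: str) -> str:
    """Classify an indel in a coding sequence by its effect on the reading frame.

    ref and alt are the reference and edited sequences spanning the break site.
    This is a simplified model: it considers only the net length change.
    """
    delta = len(alt) - len(ref)
    if delta == 0:
        return "no length change"
    kind = "insertion" if delta > 0 else "deletion"
    # Codons are three nucleotides long, so only net changes that are
    # multiples of three keep the downstream reading frame intact.
    frame = "in-frame" if delta % 3 == 0 else "frameshift"
    return f"{frame} {kind}"


if __name__ == "__main__":
    # Hypothetical repair outcomes at a single break site.
    examples = [
        ("ATGGCCAAG", "ATGCCAAG"),      # 1 bp deleted
        ("ATGGCCAAG", "ATGAAG"),        # 3 bp deleted
        ("ATGGCCAAG", "ATGGTTCCAAG"),   # 2 bp inserted
    ]
    for ref, alt in examples:
        print(ref, "->", alt, ":", classify_indel(ref, alt))
```

Under this simplified model, roughly two out of every three possible indel lengths shift the reading frame, which is consistent with the use of NHEJ-mediated indels as a simple way to disrupt a gene.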

Though efficient, it is often difficult or impossible to use NHEJ to produce a user-defined mutation at a genomic site of interest. As a result, it becomes necessary to co-opt the more-precise homology-directed repair (HDR) class of pathways to incorporate specific mutations in a genome of interest.

The HDR class of pathways is required to resolve naturally-occurring lesions, such as those that form during DNA replication, in an error-free manner and is critical for limiting tumor formation during organismal development (Jasin and Haber, 2016). The primary determining factor for whether a DSB is repaired via NHEJ or HR is the stage of the mitotic cycle a particular cell is in when DSB repair is initiated (Chapman et al., 2012). NHEJ is the predominant pathway in the G1, S, and G2 phases, while HDR is active in the later stages of the S phase and in G2, when sister chromatids are present and able to act as donor molecules. Most of the proteins required for HDR are not well-expressed outside of the S and G2 phases to prevent large-scale chromosomal rearrangements (Orthwein et al., 2014). Repair of DSBs by HDR can occur via several mechanisms: (i) gene conversion by synthesis-dependent strand annealing, (ii) gene conversion with crossover via a double Holliday junction intermediate, and (iii) break-induced replication. The most common repair mechanism and the most relevant for discussion involving co-opting DNA repair processes for genome editing purposes is gene conversion by synthesis-dependent strand annealing (Paix et al., 2017). In this pathway, resection of each of the genomic DNA ends resulting from the DSB by exonucleases produces long ssDNA tails that are able to invade the donor DNA duplex based on sequence homology. Native DNA polymerases are then able to extend the ssDNA tails using the invaded DNA duplex strands as templates. The long ssDNA tails then hybridize according to sequence complementarity, and DNA polymerases and ligases cooperate to fully repair the lesion to produce intact genomic DNA (Jasin and Haber, 2016).

In mitotically-active populations of primary human cells, the efficiencies of gene conversion by HR can vary between 0.1% and 5% (Cong et al., 2013; Ran et al., 2013a), significantly lower than the rates of repair by NHEJ, regardless of the cell cycle phase in which the DSB undergoes repair. Because precise manipulation of genomic DNA in cells is highly desirable for genome editing purposes, enhancing HR efficiencies has been a significant point of emphasis in recent years. Some studies have concluded that HR efficiencies in mitotically-active cells can be improved by inhibiting DNA ligase IV (Chu et al., 2015; Maruyama et al., 2015), an essential component of the NHEJ pathway, though these claims have come under recent scrutiny (Greco et al., 2016). Increases in HR efficiencies have also been reported by chemically arresting populations of cells in phases of the cell cycle that favor increased HR (Lin et al.; Maeder et al., 2008), by rational design of ssDNA oligonucleotide donors that can serve as templates for HR (Richardson et al., 2016), or by engineering sequence-specific nucleases fused to parts of the human Geminin protein such that the fusion protein only localizes to the nucleus during the S phase of the cell cycle (Gutschner et al., 2016; Howden et al., 2016). However, each of these strategies has significant limitations. Inhibition of DNA ligase IV is likely to have deleterious effects due to global inhibition of a critical DNA repair pathway, especially in the context of systemic delivery of a small-molecule DNA ligase IV inhibitor to a whole organism where the majority of that organism’s cells are not targeted for genomic editing. Cell-cycle arrest remains relevant only for research purposes in vitro, as chemically arresting cells is highly deleterious to cell viability. Delivery of ssDNA oligonucleotide donor molecules in relevant concentrations to an organism is likely to be very challenging, and nuclease-Geminin fusion proteins offer only modest increases in rates of HDR relative to nucleases alone. Further, in post-mitotic cell populations, the efficiencies of gene conversion by HR are very low and are unlikely to be stimulated by any of these strategies, as post-mitotic populations lack the protein machinery necessary to undergo HR efficiently. Thus, methods to direct high-efficiency, precise, user-defined alterations to a target cellular genome regardless of cell cycle remain an area of intense research interest.

Overview of genome editing tools

Many organisms are capable of incorporating DNA provided by a user into their genomes through HDR at a specified locus without the use of a targeted nuclease to induce a DSB at or near the integration site, a process termed gene targeting. However, this is a low-efficiency process and requires that a selectable marker be incorporated with the sequence of interest. Selection based on this marker allows the cells with the rare HDR events incorporating the exogenous DNA to be separated from the majority of cells that did not incorporate the exogenous DNA into their genomes. Because of this limitation in efficiency, gene targeting remains intractable for most precision genome engineering applications, including potential therapeutic applications.

The field of genetics has long been dependent on tools and methodologies that create mutations in genetic elements in order to analyze the subsequent phenotype and thus biological function of that particular genetic element. A platform capable of creating mutations at a specific target site in the genome of a given cell at high efficiency would not only enable the study of DNA repair at a given chromosomal location, but would also allow researchers to create specific mutations associated with disease phenotypes or to remove or alter sequences or regulatory elements underlying important biological processes. Such a platform for making arbitrary, user-defined alterations to the mammalian genome would provide tools for researchers to dissect biological functions of specific genomic sequences, and might even provide a means to reverse disease-causing mutations in human cells. To make genome engineering tools efficient enough for regular use in research and potentially therapeutic applications, it was necessary to develop a new method for integrating exogenous DNA sequences into target genomes. The vast majority of modern genome editing tools now operate by introducing a DNA double-strand break (DSB) at a specific sequence in a mammalian genome. That DSB is then typically repaired by one of two competing pathways: error-prone non-homologous end joining (NHEJ) or homology-directed repair (HDR). The first evidence that nuclease-induced DSBs stimulated DNA repair came from studies of the MAT mating locus in Saccharomyces cerevisiae, in which expression of the HO endonuclease and the subsequent DSB induced by this protein enables the conversion of MAT alleles between a and α by HDR (Haber, 2012). Though mating type conversion in yeast is a natural process, the discovery of nuclease-induced allele conversion through HDR raised the possibility of directly manipulating the sequence of a eukaryotic genome by inducing a DSB at a specific site and co-opting the HDR pathway to use an exogenous repair template.

The observation that a DSB induced at a specific locus in a mammalian chromosome stimulates DNA repair was first made by Maria Jasin in 1994 (Rouet et al., 1994). Researchers led by Jasin used the mitochondrially-encoded Saccharomyces cerevisiae homing endonuclease I-SceI, which only induces DSBs at stretches of DNA with a specific 18 base pair sequence, to cut at a single recognition site that they had previously integrated into the mouse genome for the study of well-defined DSBs in mammalian cells. Jasin found that expression of I-SceI and subsequent DSB induction stimulated HDR at the integrated I-SceI site in the mouse genome by two orders of magnitude compared to controls with no induced DSB, and that these DSBs were also often repaired by error-prone NHEJ. This seminal observation formed the basis for the field we now know as genome editing.

Though the I-SceI homing endonuclease offered unprecedented opportunities to study the foundational aspects of DNA repair in mammalian cells, I-SceI itself is not amenable to customization in order to target user-defined genomic sequences. The first customizable nuclease platform came in the form of zinc finger nucleases (ZFNs), engineered zinc finger arrays attached to the non-sequence-specific FokI nuclease domain. Zinc finger motifs, short polypeptide sequences that chelate a zinc ion to fold into a specific local structure, are the most widely found motif mediating protein-DNA interactions in eukaryotic cells. The crystal structure of the Zif268 array of zinc finger domains revealed a predictable pattern of interactions between the protein and its substrate DNA (Pavletich and Pabo, 1991). Each zinc finger domain contacted approximately three nucleotides of DNA through interactions in the major groove, and the array as a whole was able to recognize a 9 base pair consensus sequence. The crystal structure of Zif268 in complex with its substrate DNA raised the possibility that each zinc finger motif might be modular, and that single motifs recognizing a specific three base pair sequence might be easily assembled together with more motifs to recognize a customizable target sequence.

In practice, creating customized zinc finger proteins that recognize a user-defined target sequence proved significantly more difficult than simply assembling individual zinc finger motifs into a longer array. Zinc finger motifs are not fully modular, and the fingers directly adjacent to a given zinc finger motif greatly affect its ability to bind to its target sequence. To overcome this limitation, Dr. J. Keith Joung and others designed and validated high-throughput strategies for the molecular selection of a given zinc finger array for a target sequence based on one- or two-hybrid bacterial selections. These selections significantly improved the quality and throughput with which engineered zinc finger protein arrays could be made. However, engineering these arrays was still beyond the abilities of the majority of laboratories and companies that needed customized ZFNs for research or therapeutic applications.

The discovery of Transcription Activator-Like Effectors (TALEs) set in motion the democratization of genome editing. TALEs are composed of arrays of domains, each of which recognizes a single nucleotide in a highly modular and easily programmable manner, and in which each domain differs from the others at only two residues. TALENs, fusions between TALEs and the FokI sequence-non-specific nuclease domain, are significantly easier than ZFNs to re-engineer for a desired target site and, like ZFNs, require two separate TALENs to bind adjacent to each other at a target site in a specific orientation to induce a DSB. However, because designing a new TALEN requires designing two novel proteins, each composed of 12-18 repeats of domains that differ from each other at only two positions, successfully cloning TALENs proved challenging. Complex cloning schemes were devised to facilitate the high-throughput production of plasmids encoding TALENs, but these schemes still remained largely out of reach of the average lab because they required specialized equipment such as liquid-handling robotics.

CRISPR and the democratization of genome editing

Soon after the advent of high-throughput methods for assembling plasmids encoding customized TALENs, the prokaryotic adaptive immune system described as clustered regularly interspaced short palindromic repeats (CRISPR) was shown to encode a protein component, now named Cas9, that acts as an RNA-guided DNA endonuclease both in vitro and in human cells. All the Cas9 protein requires to be programmed to target and induce a DSB at a specific sequence of genomic DNA is a guide RNA molecule, composed of a 20 nucleotide spacer sequence that binds to a target sequence by Watson-Crick base pairing and a scaffold RNA composed of a constant sequence, as well as a short protospacer adjacent motif (PAM) directly adjacent to the targeted sequence. The discovery and characterization of CRISPR systems and their encoded DNA endonucleases vastly simplified the process of directing nucleases to specific genomic sequences to manipulate genetic elements in desirable ways. The Cas9 protein has offered researchers the unparalleled ability to manipulate the genomes of nearly any organism in a highly targetable and specific manner, expanding our ability to directly investigate the functions of genetic elements and allowing us to better understand the fundamentals of biology. In addition, the Cas9 protein also offers advantages in some properties over other designer nuclease platforms for use in therapeutic interventions in humans.
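The targeting rule described above (a 20 nucleotide spacer match directly adjacent to a PAM) can be sketched as a simple sequence scan. This is a minimal illustration rather than a guide design tool: the NGG PAM used here is an assumption corresponding to the commonly used Streptococcus pyogenes Cas9 and is not specified in the text above, the function names are hypothetical, and no off-target or efficiency scoring is attempted.

```python
import re

SPACER_LENGTH = 20          # spacer length stated in the text
PAM_PATTERN = "[ACGT]GG"    # assumption: NGG PAM of S. pyogenes Cas9


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_target_sites(seq: str):
    """List candidate Cas9 sites as (strand, start, protospacer, pam) tuples.

    A site is a 20 nt protospacer immediately followed (5'->3') by a PAM.
    For the "-" strand, start is an index into the reverse complement.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{SPACER_LENGTH}}})({PAM_PATTERN}))")
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    # Hypothetical input sequence, for illustration only.
    example = "TTGACCTGAAGCTTACGGATCAGTCAGGTACC"
    for site in find_target_sites(example):
        print(site)
```

Retargeting the nuclease in this model amounts to choosing a different 20 nucleotide protospacer from the list, which reflects how little needs to change between targets compared with re-engineering a ZFN or TALEN protein.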

CRISPR systems are the product of a billion-year-old arms race between prokaryotes and the viruses that infect them. The defining features of CRISPR arrays, identical repetitive sequence elements separated by short, unique spacer elements, were first described by Francisco Mojica (Mojica et al., 1993, 2005), then experimentally demonstrated to be part of a bacterial adaptive immune system soon thereafter (Barrangou et al., 2007; Fineran and Charpentier, 2012; Horvath and Barrangou, 2010; Wiedenheft et al., 2012). Though the identical repetitive elements remained enigmatic at the time, Mojica noticed that the unique spacer sequences were homologous to sequences in the genomes of bacteriophage and invading plasmids. This was corroborated by evidence published by Rodolphe Barrangou in 2007 showing that, upon challenging bacteria with a phage capable of infecting those bacteria, the CRISPR arrays of the challenged bacteria expanded to include sequences homologous to the genomes of the infecting phage (Barrangou et al., 2007). Further, these expanded CRISPR arrays provided immunization against repeated infection with the same phage. Additional evidence for the presence of an RNA-targeted DNA endonuclease in CRISPR arrays came in 2010, when the CRISPR machinery was shown to be responsible for producing DSBs in plasmids in E. coli (Garneau et al., 2010).

In 2012, scientists led by Jennifer Doudna and Emmanuelle Charpentier first worked out the biochemical details of how CRISPR arrays and their associated protein-coding genes convey immunity to a host against invading genetic elements, followed shortly by a second group (Gasiunas et al., 2012; Jinek et al., 2012). Through a series of biochemical experiments, the researchers demonstrated that a protein encoded in CRISPR arrays and now called Cas9 functions as a DNA endonuclease and is guided to 20 nucleotide target sites directly adjacent to a protospacer adjacent motif (PAM) by dual RNA molecules termed crRNA and tracrRNA. Further, Doudna and colleagues found that they could combine both the crRNA and tracrRNA into a single guide RNA molecule (gRNA) composed of a 20 nucleotide spacer sequence homologous to the target DNA sequence, and a 100 nucleotide scaffold sequence that bound tightly to Cas9. Because the spacer sequence dictates the target site, Cas9 could be easily programmed to target user-defined DNA sequences by changing the sequence of the spacer to be homologous to any given target site. This work was followed shortly by other groups demonstrating that the Cas9 and gRNA components could be introduced to mammalian cells and efficiently induce DSBs at genomic target sites (Cong et al., 2013; Mali et al., 2013a).

The biochemical characterizations and successive demonstrations of the utility of the CRISPR-Cas9 system in mammalian cells spurred what has been the biggest paradigm shift in biological research since the discovery of DNA as the heritable genetic material. CRISPR-Cas9 can be easily programmed to target nearly any genetic element in mammalian cells and alter its functionality by introducing DSBs and, following DNA repair, indels with high frequencies at precise genomic locations. This has allowed researchers to directly interrogate the functions of protein coding genes and non-coding elements with unprecedented efficiency and at several orders of magnitude greater scale than was previously possible using sequence-specific designer nuclease platforms such as ZFNs and TALENs. Further, reprogramming the Cas9 protein to target new sequences is simple enough to allow any interested researcher to rapidly and cost-effectively target desired genomic sites in a wide range of organisms and cell types.

Base editor technologies

Precise, user-defined genome editing outcomes can be effected by co-opting the HDR pathway to insert a sequence encoded on an exogenously-provided template at a genomic site with an induced DSB in many mitotically-active cell types (Cox et al., 2015a; Jasin and Haber, 2016). While this strategy is capable of adding relatively large sequences of DNA into a target genome, the efficiencies of DSB repair by HR are significantly lower than those of NHEJ, and thus the fraction of cells in a population that have undergone precise repair by HR is low, with many of the cells having stochastically-produced indel mutations at the DSB site. However, a large fraction of known disease-causing human genetic variants are single nucleotide polymorphisms (SNPs), and the corresponding diseases can thus be modeled or corrected by changing the identity of the single pathogenic nucleotide to a non-pathogenic base at a given position (Landrum et al., 2016).

Base editor technologies (BEs) are fusion proteins comprising a cytidine deaminase (the rat APOBEC1 enzyme in the first description, and later the lamprey-derived PmCDA1), a single-strand nicking Cas9 (nCas9), and a uracil glycosylase inhibitor (UGI) domain. They are able to efficiently induce C to T base substitutions in the genomes of mammalian cells. These proteins are first guided to a target site by nCas9, which binds to the target site specified by its gRNA spacer sequence and forms a stable R loop. Because nCas9 recognizes its target site through Watson-Crick base pairing to the target strand of DNA, the non-target strand is left unpaired as single stranded DNA. A segment of this single stranded DNA then becomes accessible to the rat APOBEC1 protein, which binds to any cytidines in the single stranded DNA and enzymatically deaminates them. Deamination of cytidines produces uracil on the non-target DNA strand at these positions. Concomitant nicking of the target strand downstream of the editing window by nCas9 results in 3’→5’ resection of the target strand by native exonucleases. The target strand is then re-polymerized by endogenous polymerases, most notably the Rev1 protein, using the uracil-containing non-target strand as a template. Together, the enzymatic deamination of non-target strand cytidines coupled with the nicking activity of nCas9 strongly biases stable incorporation of the desired C to T mutation into the target genome. Because this process does not induce DNA double strand breaks, the repair of these enzymatically induced lesions does not compete with the NHEJ pathway. The genomic products resulting from base editing typically contain between 0.1% and 2% indel mutations, but can incorporate desired mutations at frequencies greater than 70%.
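The outcome described above, conversion of every accessible cytidine on the non-target strand to thymidine, can be sketched in a few lines of Python. This is a minimal illustration and not a model of editing efficiency: the editing window of protospacer positions 4 to 8 (counted from the PAM-distal end) is an assumption introduced here for illustration, and the function names and the example protospacer are hypothetical.

```python
def editable_positions(protospacer, window=(4, 8)):
    """Return 1-based positions of cytidines that fall inside the editing window.

    The default window (positions 4-8, PAM-distal end = position 1) is an
    illustrative assumption, not a value taken from the text.
    """
    start, end = window
    return [
        pos
        for pos, base in enumerate(protospacer.upper(), start=1)
        if start <= pos <= end and base == "C"
    ]


def simulate_cbe_edit(protospacer, window=(4, 8)):
    """Return the non-target strand after every in-window C is read as T."""
    targets = set(editable_positions(protospacer, window))
    return "".join(
        "T" if pos in targets else base
        for pos, base in enumerate(protospacer.upper(), start=1)
    )


if __name__ == "__main__":
    site = "GACCAGTCCTGAACGGTTCA"  # hypothetical 20-nt protospacer
    print(editable_positions(site))
    print(simulate_cbe_edit(site))
```

Note that every cytidine in the window is converted in this sketch, including cytidines other than the intended one; the sketch makes no attempt to weight positions by how efficiently they are deaminated.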

Base editors represent a unique genome editing platform that is highly differentiated from other existing technologies. Because BEs are able to incorporate precise, defined mutations at arbitrary target sequences without an exogenously-supplied donor DNA molecule to serve as a template, and with very few stochastic indel mutations, they have greatly improved the ability of researchers to precisely modify single nucleotides in mammalian genomes. In addition to their uses as tools to precisely introduce or correct pathogenic SNPs in cells, researchers have also leveraged BEs to make large numbers of guided mutations to protein-coding sequences using large libraries of gRNAs tiling an open reading frame for the purpose of creating protein variants with enhanced or otherwise desirable properties (Hess et al., 2016a).

The base editor platform has been further developed to also direct targeted genomic A to G base substitutions. Through molecular evolution, Gaudelli et al. were able to alter the substrate preference of the tRNA-specific adenosine deaminase protein TadA from tRNA to ssDNA (Gaudelli et al., 2017). These modified TadA-nCas9 fusion proteins, termed A base editors (ABEs), use enzymatic deamination of adenosine bases to produce inosine. During DNA repair or replication, this genomic inosine base pairs with cytidine, producing a G:C base pair where an A:T base pair previously existed. Further, because genomic inosine is a rare form of DNA damage, there are no efficient mechanisms for repairing these lesions. As a result, the induced inosine is rarely excised from the genome, so a DSB rarely forms in combination with the concomitant nick on the target strand by nCas9. This property results in greatly reduced indels produced by ABE relative to BE, typically on the order of 0.1% frequency.

Genome-wide specificities of designer nucleases and base editors

All sequence-specific genome editing proteins function by recognizing DNA through specific patterns of interactions with the target base pair sequence. In the cases of ZFNs or TALENs, each DNA binding array recognizes target sequences primarily through base-specific interactions between the protein and the major groove of the substrate DNA duplex. To stably bind to a given sequence, the ZFN or TALEN must have sufficient interaction energy with the substrate DNA molecule. When both monomers of a ZFN or TALEN pair bind to adjacent DNA sequences, the FokI nuclease domain on each monomer is able to dimerize and cleave the genomic DNA between the two monomer binding sites.

Off-target binding and cleavage by engineered, sequence-specific nucleases is a critical parameter to consider when designing new genome editing reagents. For obligate dimeric reagents that utilize the FokI nuclease domain such as ZFNs and TALENs, off-target sites can generally be classified into two categories: (i) homodimeric sites, and (ii) heterodimeric sites that imperfectly match the intended target sequence. Homodimeric off-targets occur when one of the two ZFN or TALEN monomers binds adjacent to a second copy of the same monomer, inducing a genomic DSB. These off-target sites can be minimized by targeting a unique sequence that does not have a similar sequence repeated adjacent to it, or by employing mutations in the FokI nuclease domain that require it to heterodimerize with the second monomer in the ZFN or TALEN pair.

Heterodimeric off-target sites represent the most significant class of genome-wide off-target sites by ZFNs and TALENs. The energetic requirements for stable binding for a typical ZFN and TALEN monomer can often be satisfied by a relatively large number of substrate nucleotide sequences that bear significant sequence homology to the target substrate, but differ from the intended target site by one or more nucleotides. Often, ZFNs constructed with arrays of three zinc fingers can tolerate sites that differ by up to 3 base pairs from their intended target substrate (Gabriel et al., 2011; Pattanayak et al., 2011). Similarly, TALEN monomers can also tolerate sites that differ by up to 12 base pairs from their target site (Guilinger et al., 2014a; Hockemeyer et al., 2011; Osborn et al., 2013; Pattanayak et al., 2014). This promiscuity often leads to both monomers of a ZFN or TALEN pair binding in close enough proximity and inducing DSBs at a relatively small subset of genomic sites. As a result, careful design of the ZFN or TALEN target site to avoid sites with high similarity to genomic off-target sites is a necessary consideration. Modifications to the proteins themselves can also abrogate most or all genomic off-targets. For instance, substitution of cationic residues for glutamine at specific positions in the C-terminal domains of TALENs decreases the positive charge of the protein, and thus makes the protein less likely to have sufficient interaction energy with an off-target site that does not perfectly match the target site (Guilinger et al., 2014a).

The mechanisms determining the specificities of RNA-guided endonucleases such as SpCas9 are significantly more complex than those governing off-target activities of ZFNs or TALENs. For instance, a catalytically-inactive form of Cas9 (dCas9) stably binds to a wide array of off-target sites in the genome, some with as little as 10 or 12 base pairs of homology to the 20 nucleotide target site (Kuscu et al., 2014; Wu et al., 2014). However, the advent of strategies for directly monitoring the sites at which SpCas9 is able to induce a DSB revealed the set of bona fide nuclease off-target sites for this protein to be a small subset of those sites to which it is able to bind (Crosetto et al., 2013; Frock et al., 2015; Kim et al., 2015; Tsai et al., 2015, 2017). Initial efforts at further improving SpCas9 genome-wide fidelity focused on adapting the naturally monomeric system into a dimeric system. By inactivating one of SpCas9’s two nuclease domains to produce single strand nicking SpCas9, several groups demonstrated that it was possible to direct this protein to target both DNA strands in adjacent sequences using two gRNA molecules, thus producing a DSB only when both strands are nicked in close proximity (Cho et al., 2014; Mali et al., 2013b; Ran et al., 2013b). Though effective in reducing off-target mutagenesis by SpCas9 to a significant degree, this system is not truly dimeric because each SpCas9 nickase is capable of nicking at all off-target sites specified by its gRNA, and single strand nicks can be converted to DSBs at low frequencies to produce indel mutations. To create a truly dimeric system using SpCas9, a strategy using catalytically-inactive SpCas9 (dSpCas9) fused to the FokI domain was developed (Guilinger et al., 2014b; Tsai et al., 2014). Targeting two dSpCas9-FokI molecules to adjacent DNA sequences allows the FokI domain to dimerize and induce a DSB only at genomic loci where the dSpCas9s stably bind.

Engineering dimeric, RNA-guided endonucleases greatly limits the targeting range of the system because of the requirement for two PAM sequences in the correct spacing and orientation relative to each other. As a result, a significant amount of work has been performed with the goal of creating monomeric RNA-guided endonucleases with very high genome-wide specificities.

Rational, structure-informed protein engineering efforts aimed at increasing the genome-wide fidelity of SpCas9 by reducing non-sequence-specific interactions between the protein and its substrate DNA yielded SpCas9 variants bearing amino acid substitutions that greatly decrease or completely ablate off-target effects (Kleinstiver et al., 2016; Slaymaker et al., 2016). Biochemical experiments later revealed that these increased-fidelity SpCas9 variants have unaltered binding affinities for off-target sites compared to the wild-type SpCas9, indicating that the target specificities of these proteins are improved by a mechanism distinct from target site affinity. Single molecule Förster resonance energy transfer experiments demonstrated that the increased fidelity mutations decreased the frequency with which SpCas9 is able to undergo an essential conformational change when bound to substrate DNA that activates it for endonucleolytic cleavage (Chen et al., 2017). These initial efforts at improving SpCas9 targeting accuracy have since been followed up by more elegant molecular evolution approaches (Casini et al., 2018; Lee et al., 2017), with one resulting variant, termed xCas9, having both relaxed PAM sequence requirements, and thus greatly improved targeting range, as well as greatly enhanced genome-wide fidelity compared to the wild-type protein (Hu et al., 2018).

Unlike their Cas9 nuclease counterparts, base editor technologies have been poorly characterized with respect to their off-target editing capabilities to date. One genome-wide, unbiased off-target discovery assay, termed modified digenome-seq, has been developed to detect base editor-specific off-target editing (Kim et al., 2017a), while other studies have used a combination of GUIDE-seq results and in silico off-target prediction based on sequence homology to the intended on-target site (Komor et al., 2016a; Rees et al., 2017a), a strategy that is unlikely to capture all off-target editing sites for a given BE and gRNA. The off-target editing capabilities of base editor technologies have been found to be primarily dictated by the specificity of the nCas9 part of the fusion protein. Importantly, BE3 has been found to have significantly fewer off-target editing sites compared to Cas9 nuclease at all sites that have been examined using modified digenome-seq. However, the off-target profile of a base editor protein with a given gRNA differs significantly from the profile of the Cas9 nuclease protein alone (Kim et al., 2017a). BE3 edits only a subset of the off-target sites edited by its SpCas9 nuclease counterpart, but also retains high-efficiency editing capabilities at sites containing single RNA base bulges of the gRNA spacer, sites that are rarely edited by SpCas9 nuclease alone.
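To make concrete what in silico prediction based on sequence homology involves, and why it can miss real off-target sites, the following Python sketch scans a sequence for candidate sites that lie next to an NGG PAM and differ from the spacer by at most a fixed number of mismatches. It is an illustrative sketch only: the function names, the mismatch threshold, and the example sequences are assumptions introduced here, and only the forward strand is searched.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(genome, spacer, max_mismatches=3):
    """Scan the forward strand for sites followed by an NGG PAM.

    Returns (start index, site, PAM, mismatch count) tuples. Sites that
    would require an RNA or DNA bulge to align are not detected, because
    only ungapped comparisons are made.
    """
    genome, spacer = genome.upper(), spacer.upper()
    n = len(spacer)
    hits = []
    for i in range(len(genome) - n - 2):
        site = genome[i:i + n]
        pam = genome[i + n:i + n + 3]
        if pam[1:] != "GG":
            continue
        mismatches = count_mismatches(site, spacer)
        if mismatches <= max_mismatches:
            hits.append((i, site, pam, mismatches))
    return hits


if __name__ == "__main__":
    spacer = "GACCAGTCCTGAACGGTTCA"           # hypothetical spacer
    near_match = "GACCAGTCCTGAACGGTTGA"       # one mismatch to the spacer
    genome = "TT" + spacer + "TGG" + "AAAA" + near_match + "AGG" + "CC"
    for hit in find_candidate_sites(genome, spacer):
        print(hit)
```

Because the comparison is ungapped, a scan of this kind cannot report sites containing single-base bulges, which is one reason a homology-only approach may not capture all editing sites for a given BE and gRNA.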

Because the nCas9 portion of the BE determines the set of guided off-target editing events for a given protein and gRNA, it is possible to leverage the Cas9 variants that have been engineered to have increased genome-wide fidelity as nucleases to create BEs with high genome-wide specificity. BEs incorporating mutations intended to enhance the genome-wide fidelity of SpCas9 nuclease also greatly improve the fidelity of BE3 (Rees et al., 2017a). Additionally, as with Cas9 nuclease (Zuris et al., 2015), delivering BE3 as a protein in complex with a gRNA molecule directly to target cell populations significantly enhances the genome-wide specificities of BE3 (Rees et al., 2017a). However, because PAM sequences remain a critical limiting factor for the BE platform, it may be necessary to incorporate different Cas9 proteins in order to target a given genomic site. This PAM targeting limitation can be partially alleviated by incorporating engineered Cas9 variants recognizing orthogonal PAM sequences, or by the use of native Cas9 proteins from orthogonal organisms that naturally recognize divergent PAM sequences (Kim et al., 2017e). In the first case, however, high fidelity mutations may not be compatible with all engineered SpCas9 variants recognizing orthogonal PAM sequences, and there are currently no other Cas9 proteins derived from orthogonal species that have high fidelity mutations described.
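The way a PAM requirement limits targeting range can be illustrated by enumerating the sites in a sequence that a given PAM makes available. The Python sketch below expands a PAM written in IUPAC nucleotide codes (for example NGG or NNGRRT) into a regular expression and lists every protospacer immediately 5’ of a matching PAM. The function names and the example sequences are hypothetical; the sketch searches the forward strand only and assumes a 3’ PAM, so it does not apply to enzymes whose PAM lies 5’ of the protospacer.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pam_to_regex(pam):
    """Translate an IUPAC PAM such as 'NNGRRT' into a regex fragment."""
    return "".join("[" + IUPAC[code] + "]" for code in pam.upper())


def find_target_sites(sequence, pam, spacer_len=20):
    """List (start, protospacer, PAM) for every forward-strand site.

    A lookahead is used so that overlapping sites are all reported.
    """
    pattern = re.compile(
        "(?=([ACGT]{%d})(%s))" % (spacer_len, pam_to_regex(pam))
    )
    return [
        (m.start(), m.group(1), m.group(2))
        for m in pattern.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    seq = "ACGT" * 5 + "CTGAGT"  # hypothetical sequence
    print(find_target_sites(seq, "NGG"))
    print(find_target_sites(seq, "NNGRRT"))
```

In this toy sequence no site is available under an NGG PAM while one is available under NNGRRT, which mirrors the argument that proteins recognizing different PAMs extend the set of targetable positions.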


Chapter 2: Engineering CRISPR-Cas9 for Improved Targeting Range, Delivery, and Control

Engineering transcription factor-specific CRISPR-Cas9 nucleases

Engineered targeted nucleases can be used to genetically correct disease-causing mutations in human cells. Such therapeutic strategies rely on the nuclease to introduce a sequence-specific DNA double strand break (DSB) at a specified site in the genome. For example, the specificity of RNA-guided nuclease (RGN) platforms such as CRISPR-Cas9 is primarily dictated by a guide RNA molecule (gRNA) bearing a spacer sequence with 20 nucleotides of complementarity to the target DNA site; other genome editing platforms, like zinc-finger (ZF) nucleases or TALE nucleases, derive their specificity from sequence-specific protein-DNA contacts but require more complicated engineering strategies to produce protein domains that specifically bind to user-defined sequences. Genome editing is achieved by leveraging endogenous cell machineries that repair these targeted DSBs either via an error-prone pathway termed non-homologous end joining (NHEJ) that introduces variable length indel mutations in a stochastic manner, or by more precise homology-directed repair (HDR) using a homologous exogenous “donor template” or a homologous sequence found within the genome itself (Rouet et al., 1994). Although genome-editing nucleases can robustly induce DSBs at their specified target sites, all nuclease platforms are also known to induce unwanted DSBs at sequences that resemble the intended target (Guilinger et al., 2014a; Pattanayak et al., 2011; Tsai and Joung, 2016). These off-target DSBs are efficiently repaired by NHEJ, resulting in unintended mutations at these sites, which can be distributed throughout the genome.

For therapeutic applications, a desirable capability would be to restrict nuclease activity not only to specific DNA sequences but also to particular epigenetic context(s), which in turn could represent a specific cell type; for example, only in cells that produce a disease phenotype or in which introduction of a genetic alteration would be expected to have a therapeutic benefit.

Having such a capability would enable limitation of the number and kinds of cells in which nucleases are active, and thus minimize the number of cells in which either on- or off-target DSBs might accrue. Existing strategies for performing genome editing in a cell-type-specific manner involve ex vivo sorting approaches to separate out relevant cell types (Perez et al., 2008), delivering nucleic acids encoding genome editing reagents in a virus with tropism towards a specific cell or tissue type (Nathwani et al., 2014) or the use of cell-type-specific regulatory elements (e.g., promoters and/or enhancers) to drive cell-type-specific expression of the nuclease(s) (Walther and Stein, 1996). Enrichment for a specific cell type by cell surface labeling and cell sorting is costly and laborious, and in some cases it may not be possible to differentiate between closely related cell types. Though some viruses have marked preference for cell type, the targetable cell types are limited, and systemic introduction often results in significant titers of virus in the liver regardless of the virus used. Further, it can often be difficult to evade a neutralizing host immune response. In addition, many cell-type-specific regulatory elements such as promoters exhibit leaky expression in related cell types, limiting their utility for genome editing applications that require tight control of nuclease activities. This strategy is also incompatible with delivery of RNA, purified nuclease proteins, or ribonucleoprotein (RNP) complexes to bulk populations of cells, delivery strategies that have demonstrably lower off-target nuclease effects than delivery of DNA encoding the genome editing reagents (Zuris et al., 2015).

Epigenetically regulated sequence-specific nucleases

To limit the activities of sequence-specific nucleases to particular cell types, we believed it may be advantageous to create a system in which their cleavage activities are contingent on the presence of specific transcription factors (TFs) or histone modifications adjacent to the target site. To create such a system, we genetically linked nucleases that on their own induce minimal or no DSBs to engineered affinity proteins (APs) that possess high affinities for specific TFs or post-translational histone modifications (Figure 1). Examples of APs include but are not limited to single chain antibodies, engineered fibronectin domains, engineered Staphylococcus aureus immunoglobulin binding protein A, and engineered nanobodies (Helma et al., 2015). The cleavage activities of these nuclease-AP fusions are dependent both on recognition of the target site specified by the nuclease as well as the presence of the AP binding partner in close proximity to the target site.

Figure 1. RGN nuclease activity dependent on a proximal transcription factor or histone modification. (a) A representation of an affinity protein, shown here as an scFv, covalently linked to an RGN targeted to a site within a gene. Because the binding partner of the scFv isn’t present at a site adjacent to the gRNA target site, the RGN is unable to stably bind its target site and thus is unable to induce a DSB. (b) Conversely, when the binding partner of the scFv is present adjacent to the gRNA target site, the scFv binds to its target, represented here as a transcription factor. This binding event stabilizes the RGN, causing it to induce a DSB at the target site. This DSB is repaired by NHEJ or by HDR.

Previous work has established that mutations made to residues in the PAM interacting domain of Streptococcus pyogenes Cas9 (SpCas9) abolish nuclease activity of the protein, and that fusion to a second engineered DNA binding domain (DBD) is sufficient to rescue SpCas9 nuclease activity in this context (Bolukbasi et al., 2015). It’s possible that this system would prove useful in creating nucleases that are active only in the context of a DNA-bound TF. However, it’s unlikely that this system would prove useful with a ubiquitous target (i.e., modifications made to histone proteins), as it relies entirely on the second DBD to provide initial specificity in the target search.

To engineer site-specific nucleases that are poised for cleavage activity (but unable to efficiently cleave their target site), we destabilized binding of the protein to its target sites by decreasing the non-specific affinity of the nuclease for DNA through targeted mutations to residues that contact the target or non-target DNA strands (Kleinstiver et al., 2016). The resulting Cas9 variants could also be used in conjunction with gRNAs that possess decreased affinity for their genomic target sites, such as: (i) gRNAs with spacer lengths of 19, 18, or 17 nt, (ii) gRNAs possessing one, two, or three intentional mismatches relative to the intended target site, (iii) gRNAs with 20 nts of complementarity to the on-target site that bear an additional appended 5’ G base (that is mismatched to the target DNA sequence), and (iv) a combination of any of these previously listed gRNA variations.

To test whether such an approach might be feasible, we first developed a system in which Cas9 variants bearing different sets of mutations intended to decrease non-specific affinity of the protein for DNA were genetically fused to an engineered zinc finger array (ZF292R) targeted to a single genomically integrated copy of an EGFP reporter gene. Introduction of a nuclease-induced DSB into the EGFP coding region that is then repaired via NHEJ can lead to frameshift mutations, causing cells to become EGFP-negative, a phenotype that can be quantitatively assayed using flow cytometry. We tested the activities of these variant nucleases with and without fusion to the ZF292R zinc finger array also targeted to EGFP together with four different gRNA variants targeting the same site in EGFP: (1) gRNA with 20 nt of homology to the target site and with an additional 5’ appended G that is mismatched to the target site sequence (gRNA1), (2) gRNA with 19 nt of homology to the target site and a 5’ 20th nt that is a G, which is mismatched to the target site (gRNA2), (3) gRNA with 18 nt of homology to the target site with two 5’ Gs mismatched to the target site (gRNA3), and (4) a perfectly matched gRNA with 17 nt of homology to the target site and no additional mismatched G nts (gRNA4). We found that two variants, Cas9 (R661A, Q695A) (Cas9 Var1) and Cas9 (R661A, Q926A) (Cas9 Var2), when tested with all four gRNAs showed increased nuclease activity when fused to ZF292R as judged by EGFP disruption assay (Figure 2a). We also performed TIDE, a sequencing-based indel quantification assay (Brinkman et al., 2014), to directly assess the nuclease activity of each of these nuclease complexes. In agreement with the flow cytometry assay, analysis of the cell populations by TIDE demonstrated increased rates of indel formation when both Cas9 variants were fused to ZF292R with all four gRNAs tested (Figure 2b).
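The four gRNA designs tested above differ only at the 5’ end of the spacer, which the following Python sketch makes explicit by deriving all four from a single 20-nt target site. This is a hedged illustration of the design rules as stated in the text: the function name, the example target site, and the upstream base are hypothetical and are not the EGFP sequences used in these experiments.

```python
def build_grna_variants(target_site, upstream_base):
    """Derive the four spacer designs from a 20-nt target site.

    target_site   : 20-nt protospacer, written 5' to 3' (PAM-proximal end last)
    upstream_base : genomic base immediately 5' of the protospacer, needed to
                    confirm that the G appended in gRNA1 is a mismatch
    """
    site = target_site.upper()
    if len(site) != 20:
        raise ValueError("expected a 20-nt target site")
    # Each added 5' G must be mismatched to the genomic base it replaces.
    if upstream_base.upper() == "G" or "G" in site[:2]:
        raise ValueError("a 5' G would match the target at this site")
    return {
        "gRNA1": "G" + site,       # 20 nt of homology plus one appended 5' G
        "gRNA2": "G" + site[1:],   # 19 nt of homology, 20th nt is a mismatched G
        "gRNA3": "GG" + site[2:],  # 18 nt of homology, two mismatched 5' Gs
        "gRNA4": site[3:],         # 17 nt of homology, no added G
    }


if __name__ == "__main__":
    variants = build_grna_variants("CACCAGTCCTGAACGGTTCA", upstream_base="T")
    for name, spacer in variants.items():
        print(name, len(spacer), spacer)
```

All four designs share the same 17 PAM-proximal nucleotides, so any difference in activity between them reflects changes at the PAM-distal end of the spacer.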

To provide proof of principle for creating nucleases with activities dependent on binding to a DNA-bound artificial transcription factor, we next developed a system in which ZF292R is genetically fused to a GCN4 peptide (GCN4-ZF292R) that can be bound tightly and specifically by an engineered scFv (scFv GCN4) (Wörn et al., 2000). We fused scFv GCN4 directly to Cas9 Var1 and evaluated whether Cas9 Var1-scFv GCN4 was able to disrupt EGFP when co-expressed with GCN4-ZF292R or H3 (1-38)-ZF292R (a fusion of the same ZF292R zinc finger array to the N-terminal 38 residues of histone H3). Indeed, Cas9 Var1-scFv GCN4 showed increased EGFP disruption when co-expressed with GCN4-ZF292R but not with H3 (1-38)-ZF292R using gRNA1 and gRNA2 (Figure 3a), indicating that the specific interaction between scFv GCN4

[Figure 2: two bar charts. Panel (a): EGFP disruption (%). Panel (b): indels (%). Conditions: Cas9, Cas9 Var1, Cas9 Var2, Cas9 Var1-ZF292R, and Cas9 Var2-ZF292R, each with gRNA1, gRNA2, gRNA3, gRNA4, or no gRNA.]

Figure 2. Characterizing the EGFP disruption activity of two Cas9 variants with or without fusion to ZF292R. (a) Cas9 variants were used alone or fused to ZF292R, an engineered zinc finger DNA binding domain with a binding site adjacent to the gRNA target site. (b) TIDE analysis of the same cell populations as in Figure 2a confirming that both Cas9 variants have greater capacity to induce indels when fused to ZF292R.

[Figure 3: two bar charts. Panel (a): EGFP disruption (%). Panel (b): indels (%). Conditions include cells alone, Cas9, H3 (1-38)-NLS-FLAG-ZF292R, GCN4 V1-NLS-FLAG-ZF292R, and Cas9 (R661A, Q695A)-scFv GCN4 expressed alone or together with a GCN4 or H3 fusion to ZF292R, each with gRNA1, gRNA2, gRNA5, or no gRNA.]

Figure 3: Characterizing the EGFP disruption activity of SpCas9 (R661A, Q695A)-scFv GCN4 when expressed alone or co-expressed with H3 (1-38)-ZF292R or GCN4-ZF292R. (a) Increased EGFP disruption activity by the SpCas9 variant is specific to co-expression with GCN4-ZF292R, suggesting that the interaction between GCN4-ZF292R and scFv GCN4 is mediating the increased EGFP disruption. Further, the perfectly matched gRNA5 restores SpCas9 (R661A, Q695A)-scFv GCN4 EGFP disruption activity to wild-type levels, indicating that the gRNA modifications outlined in Strategy #1 and Strategy #2 are important for inducible activity of the SpCas9 variants tested in this system. (b) TIDE analysis of the same cell populations from Figure 3a demonstrating that the interaction between GCN4-ZF292R and SpCas9 (R661A, Q695A)-scFv GCN4 stimulates indel formation at the EGFP target site.

and its cognate peptide is required for this phenotype. In agreement with the flow cytometry assay, analysis of these cell populations by TIDE demonstrated increased rates of indel formation by Cas9 Var1-scFv GCN4 only when co-expressed with GCN4-ZF292R and not H3 (1-38)-ZF292R (Figure 3b). Additionally, as a control, Cas9 Var1-scFv GCN4 was tested with a gRNA bearing 20 nt of perfect complementarity to a different target site in EGFP with no appended 5’ mismatched G (gRNA5) to ensure that the proteins retained nuclease activity comparable to wild-type SpCas9 in the absence of modifications made to the gRNA.

Conclusions and Future Directions

We have demonstrated that through a combination of limiting gRNA affinity and mutations made to the Cas9 protein intended to have a similar effect, it is possible to engineer a system in which Cas9 variants have limited nuclease activity on their own, but are rescued to wild-type or near-wild-type levels when fused to an AP that recognizes a DNA-bound TF in close proximity to the gRNA target site. These Cas9 variants possess wild-type activity when targeted to a site with a gRNA bearing 20 nucleotides of complementarity and no 5’ appended mismatched guanine, indicating that these proteins are finding their genomic target sites but are unable to achieve the conformational state necessary to efficiently activate the catalytic activities of the endonuclease domains of the protein when the gRNA is altered to minimize affinity for its target site. Moving forward, it’s clear that additional engineering of Cas9 will be necessary in order to reduce nuclease activity in the absence of an AP while maintaining wild-type or near-wild-type nuclease activity when fused to an AP.

Engineering minimal Cas9 orthologs for double stranded DNA binding and nuclease activity

For many research and therapeutic purposes, it would be useful to be able to deliver all components of a genome editing system in a single adeno-associated virus (AAV). AAVs are unique in their abilities to induce targeted delivery of a genome editing payload to specific cell or tissue types when given systemically, including to muscle, bone marrow or the central nervous system (Hocquemiller et al., 2016; Ponnazhagan et al., 1997; Wang et al., 2014, 2014). In some cases, this would encompass the gene for Cas9 and its cognate sgRNA as well as the regulatory elements necessary to express both. In other cases, it might be necessary to also encode a donor molecule to make precise homology-directed repair (HDR) alterations to the genome. In the case of BE technologies, it would be necessary to encode a full length base editing protein with UGI and a catalytically-active minimal deaminase domain. Though this may be feasible using active small Cas9 orthologs derived from S. aureus or S. thermophilus (Kleinstiver et al., 2015; Ran et al., 2015a), it would be difficult or impossible to also encode an sgRNA expression cassette in the same AAV because commonly used AAVs have a packaging limit of approximately 4.5 kb.

Further, it would be useful to have additional small Cas9s that are capable of targeting diverse protospacer adjacent motifs (PAMs) to increase the targeting range of genome editing technologies encoded by AAV.
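The packaging argument above is simple arithmetic, which the following Python sketch spells out: the coding length of a protein is three nucleotides per amino acid plus a stop codon, and whatever remains under the packaging limit is the space left for promoters, the sgRNA cassette, and any fused domains. The 4,500 bp limit is the approximate figure quoted in the text, the protein sizes are those listed in Table 1, and the function names are hypothetical. The sketch deliberately does not assign sizes to regulatory elements or fused domains, since those are not given here.

```python
AAV_PACKAGING_LIMIT_BP = 4500  # approximate limit quoted in the text


def coding_length_bp(protein_length_aa):
    """Coding sequence length in bp: three nt per residue plus a stop codon."""
    return (protein_length_aa + 1) * 3


def remaining_capacity_bp(protein_length_aa, limit_bp=AAV_PACKAGING_LIMIT_BP):
    """Space left for promoters, sgRNA cassette, fused domains, and so on."""
    return limit_bp - coding_length_bp(protein_length_aa)


if __name__ == "__main__":
    # Protein sizes in amino acids, as listed in Table 1.
    orthologs = {"SpCas9": 1368, "SaCas9": 1053, "CjCas9": 984}
    for name, size_aa in orthologs.items():
        print(name, coding_length_bp(size_aa), remaining_capacity_bp(size_aa))
```

Under these assumptions SpCas9 alone leaves only a few hundred base pairs of capacity, whereas the smaller orthologs leave well over a kilobase, which is the motivation for characterizing size-minimized Cas9 proteins.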

Type II-C Cas9s are a class of size-minimized proteins that are evolutionarily distant from the Type II-A Cas9s typically used in genome editing experiments (Table 1). The Type II-C Cas9s that have been investigated are unable to bind and cleave double stranded DNA (dsDNA). Because these proteins are able to act on single stranded DNA (ssDNA) and dsDNA in which the target site is mismatched to create a ‘bulge’, it’s thought that these proteins have deficiencies in their phosphate lock loops (PLLs), a conserved region that interacts with the phosphate backbone of dsDNA just upstream of the PAM and that is required for efficient R-loop formation. However, this class of proteins has been poorly characterized with respect to their nuclease activities in human cells. To further characterize this class of proteins in mammalian cells and determine whether a subset of these orthologs might be capable of enabling genome editing in the cellular milieu, we expressed a subset of proteins in U2OS cells targeted to a chromosomally-integrated

Species              Abbreviation  PAM          Class  Size (AA)  Spacer Length
F. novicida          FnCpf1        TTV          V      1,300      20, 23
M. bovoculi          MbCpf1        TTV          V      1,372      20
C. jejuni            CjCas9        NNNNACAC     II-C   984        22
N. cinerea           NcCas9        NNNNGTA      II-C   1,082      20, 23
P. lavamentivorans   PlCas9        NNNCAT       II-C   1,037      20, 23
C. lari              ClCas9        NNGGG (Y?)   II-C   1,003      20, 23
S. pasteurianus      SpastCas9     NNGTGA       II-A   1,130      20, 23
P. multocida         PmCas9        GNNNCNNA     II-C   1,056      20, 23
S. pyogenes          SpCas9        NGG          II-A   1,368      20
S. aureus            SaCas9        NNGRRT       II-A   1,053      20-24


Table 1. Cas9 orthologs with desirable characteristics. Orthologs highlighted in tan or blue boxes were poorly characterized or uncharacterized for nuclease activities in human cells. Orthologs that are smaller in coding sequence than SpCas9 or SaCas9 are in bold.

EGFP gene and measured their ability to disrupt this gene, either by steric hindrance of transcription through target site binding or by frameshift mutations resulting from NHEJ events induced by DNA double strand breaks (Figure 4). With the exception of CjCas9, none of the Type II-C Cas9s were functional in human cells. However, the smallest Type II-A ortholog, SpastCas9, efficiently disrupted the EGFP gene with all five gRNAs tested, providing additional PAM sequence recognition and expanding the targeting range of the CRISPR-Cas9 toolbox for genome editing in human cells. Going forward, we will work to determine which of these proteins will be candidates for further engineering to improve their activities on dsDNA, either through PLL mutagenesis screens or introduction of non-specific protein-DNA contacts.

[Figure 4: bar chart titled “Cas9 Ortholog Screen”. y-axis: EGFP Disruption (%). Series: no gRNA and EGFP sites 1 through 5. x-axis: Cas9 ortholog constructs tested with different spacer lengths.]

Figure 4. Cas9 orthologs have variable activities in human cells. U2OS cells bearing a chromosomally-integrated EGFP gene were transfected with the indicated Cas9 ortholog and gRNA and cells were analyzed for disruption of the EGFP gene by flow cytometry 48 hours post transfection.

Chapter 3: Engineering Sequence-Specific Cytidine Deaminases

Jason Gehrke1,2, Oliver Cervantes1, M. Kendell Clement1,3, Yuxuan Wu4, Jing Zeng4, Daniel E. Bauer4, Luca Pinello1,3, J. Keith Joung1,3

1Molecular Pathology Unit, Center for Cancer Research, and Center for Computational and Integrative Biology, Massachusetts General Hospital, Charlestown, Massachusetts 02129, USA.

2Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA.

3Department of Pathology, Harvard Medical School, Boston, Massachusetts 02115, USA.

4Division of Hematology/Oncology, Boston Children’s Hospital, Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Stem Cell Institute, Department of Pediatrics, Harvard Medical School, Boston, Massachusetts 02115, USA.

This chapter contains work from the manuscript “Developing high-precision CRISPR-Cas9 base editors with minimized bystander and off-target mutations”. The text and figures were modified to fit the format of this dissertation.

Acknowledgements

J.K.J. was supported by grants from the National Institutes of Health (R35 GM118158 and RM1HG009490), by the Desmond and Ann Heathwood MGH Research Scholar Award, and by a St. Jude Children’s Research Hospital Collaborative Research Consortium award. J.M.G. was supported by the National Science Foundation Graduate Research Fellowship Program. D.E.B. was supported by NHLBI (DP2OD022716, P01HL032262) and the St. Jude Children’s Research Hospital Collaborative Research Consortium. We thank Alexander Sousa for advice on performing GUIDE-seq experiments, Peter Cabeceiras for assistance producing lentivirus, and James Angstman, Vikram Pattanayak, and Alexandra Mattei for helpful discussions and comments.

Author contributions

J.M.G. conceived of the project, designed experiments and performed data analysis. O.C. performed all experiments with assistance from J.M.G. Y.W. and J.Z. performed experiments in β-thalassemia cells, and Y.W., J.Z., and D.E.B. designed and analyzed these experiments. Data analysis was performed by M.K.C. with assistance from L.P. J.K.J. conceived of experiments and directed the research. J.K.J. and J.M.G. wrote the manuscript with input from all the authors.

Competing interests

J.M.G. is a consultant for Beam Therapeutics. J.K.J. has financial interests in Beam Therapeutics, Editas Medicine, Monitor Biotechnologies, Pairwise Plants, Poseida Therapeutics, and Transposagen Biopharmaceuticals. J.K.J.’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies. J.M.G. and J.K.J. are inventors on a patent application that has been filed for engineered sequence-specific deaminase domains in base editor architectures.


Developing high-precision CRISPR-Cas9 base editors with minimized bystander and off-target mutations

Recently described base editor (BE) technology, which uses CRISPR-Cas9 to direct cytidine deaminase enzymatic activity to specific genomic loci, enables the highly efficient introduction of precise cytidine-to-thymidine (C to T) DNA alterations in many different cell types and organisms (Hess et al., 2016b; Kim et al., 2017c; Komor et al., 2016b; Nishida et al., 2016; Rees et al., 2017b; Shimatani et al., 2017). In contrast to genome-editing nucleases (Cox et al., 2015b; Doudna and Charpentier, 2014; Sander and Joung, 2014), BEs avoid the need to introduce double-strand breaks or exogenous donor DNA templates and induce lower levels of unwanted variable-length insertion/deletion mutations (indels) (Komor et al., 2016b, 2017; Nishida et al., 2016). However, existing BEs can also efficiently create unwanted C to T alterations when more than one C is present within the five base pair “editing window” of these proteins, a lack of precision that can cause potentially deleterious bystander mutations. Mutations in the cytidine deaminase enzyme can shorten the length of the editing window and thereby partially address this limitation, but these BE variants still do not discriminate among multiple cytidines within the narrowed window and also possess a more limited targeting range (Kim et al., 2017f). Here, we describe an alternative strategy for reducing bystander mutations using a novel BE architecture that harbors an engineered human APOBEC3A (eA3A) domain, which preferentially deaminates cytidines according to a TCR>TCY>VCN (R = A, G; V = G, A, C; Y = C, T) hierarchy. In direct comparisons with the widely used BE3 fusion in human cells, our eA3A-BE3 fusion exhibits comparable activities on cytidines in TC motifs but greatly reduced or no significant editing on cytidines in other sequence contexts. Importantly, we show that eA3A-BE3 can correct a human beta-thalassemia promoter mutation with much higher (>40-fold) precision than BE3, substantially minimizing the creation of an undesirable bystander mutation. Surprisingly, we also found that eA3A-BE3 shows reduced mutation frequencies on known off-target sites of BE3, even when targeting promiscuous homopolymeric sites. Our results validate a general strategy to improve the precision of base editors by engineering their cytidine deaminases to possess greater sequence specificity, an important proof-of-principle that should motivate the development of a larger suite of new base editors with such properties.

To engineer base editor fusions with greater precision within the editing window, we sought to leverage the natural diversity of cytidine deaminase proteins to employ a deaminase with greater sequence specificity than the rat APOBEC1 (rAPO1) deaminase present in the widely used BE3 architecture. BE3 consists of a Streptococcus pyogenes Cas9 nuclease bearing a mutation that converts it into a nickase (nCas9) fused to rAPO1 and a uracil glycosylase inhibitor (UGI) (Fig. 5a). We replaced the rAPO1 in BE3 with the human APOBEC3A (A3A) cytidine deaminase to create A3A-BE3 (Fig. 5a). We used A3A because previously published in vitro biochemical studies showed that it preferentially deaminates cytidines embedded in the context of a TCR trinucleotide motif (where R = A/G) (Kouno et al., 2017; Logue et al., 2014; Shi et al., 2017). To test the precision of base editing by A3A-BE3, we used a guide RNA targeted to a site in a single integrated EGFP reporter gene in human U2OS cells that bears both a cognate motif (TCG) and a non-cognate bystander (GCT) motif within the expected editing window. Surprisingly, A3A-BE3 did not exhibit preferential base editing of the cytidine in the TCG motif over the GCT motif (Fig. 5b).
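The distinction between cognate and bystander cytidines can be made concrete with a short sketch. The Python below labels each C in the editing window by its trinucleotide context (TCR, TCY, or VCN). It assumes the window spans protospacer positions 5 to 9 counted from the 5' end, as stated later in this chapter; the function names and the example protospacer are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_cytidine(seq: str, i: int) -> str:
    """Label the C at 0-based index i by its 5' and 3' neighbours."""
    if seq[i] != "C":
        raise ValueError("index does not point at a C")
    if i == 0:
        raise ValueError("no 5' neighbour available")
    five = seq[i - 1]
    three = seq[i + 1] if i + 1 < len(seq) else "N"
    if five == "T":
        if three in PURINES:
            return "TCR"
        if three in PYRIMIDINES:
            return "TCY"
        return "TCN"  # 3' neighbour unknown
    return "VCN"  # 5' neighbour is A, C or G


def cytidines_in_window(
    protospacer: str, window: Tuple[int, int] = (5, 9)
) -> List[Tuple[int, str]]:
    """Return (1-based position, context label) for each C in the window."""
    seq = protospacer.upper()
    labelled = []
    for pos in range(window[0], window[1] + 1):
        i = pos - 1
        if i < len(seq) and seq[i] == "C":
            labelled.append((pos, classify_cytidine(seq, i)))
    return labelled


# Hypothetical 20-nt protospacer with a TCG motif (C at position 6)
# and a GCT motif (C at position 9) inside the assumed window.
example = "AGAGTCGGCTAGCATGCATG"
print(cytidines_in_window(example))
```

In this illustration the C at position 6 is labelled as cognate (TCR) and the C at position 9 as a bystander (VCN), mirroring the TCG and GCT motifs of the EGFP reporter site.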

We hypothesized that the apparent loss of sequence preference by A3A on the EGFP site in the context of a base editor fusion might have been due to its increased proximity secondary to recruitment to that locus. We envisioned that sequence selectivity might be restored by reducing the non-specific binding energy of A3A for its substrate DNA. Based on co-crystal structures of A3A and a single-stranded DNA substrate and of A3A alone, we identified 11 residues in the deaminase that appear to mediate base-specific or non-specific contacts to the DNA or that are directly involved with dimerization or reside proximal to the dimerization interface (Fig. 5c) (Bohn et al., 2015; Kouno et al., 2017; Shi et al., 2017). Guided by this information, we created a series of 14 mutant A3A proteins bearing one or more amino acid substitutions at each of these positions (Fig. 5b). Testing of these mutated A3A proteins in the context of the A3A-BE3 architecture showed that most retained high activity on both the bystander and cognate motifs but that those bearing mutations at position N57 drastically reduced bystander motif alteration while retaining near-wild-type activity on the cognate motif (Fig. 5b). We also found we could further increase the preference of A3A for its cognate TCR motif by combining point mutations at residues N57, K60, or Y130 (Supplementary Fig. 1). This strategy yielded the N57Q/Y130F (QF) variant, which appeared to have similar sequence preferences to the N57A/G single mutation variants. We conclude that engineering mutations into A3A enables restoration of its cytidine deaminase sequence preference in the context of a BE fusion.

Figure 5: Engineering and characterization of an A3A-BE3 base editor that selectively edits Cs preceded by a 5’ T. (a) Schematic illustrating the architecture of the original BE3 fusion (consisting of rAPO1 linked to SpCas9 nickase and UGI) and the A3A-BE3 fusion. (b) Activities of BE3, A3A-BE3, and a series of A3A-BE3 variants bearing mutations in A3A on an integrated EGFP reporter gene target site bearing a cognate cytidine preceded by a 5’ T and a bystander cytidine preceded by a 5’ G in the editing window. (c) Schematic summarizing non-specific interactions between amino acid positions in A3A and its substrate single-stranded DNA derived from previously published co-crystal structures. (d) Heat maps showing C-to-T editing efficiencies for BE3, YE BE3s, and various A3A-BE3 variants at 12 endogenous human gene target sites, each bearing a cognate cytidine preceded by a 5’ T (indicated with a black arrow) and one or more bystander cytidines within the editing window. Editing efficiencies shown represent the mean of three biological replicates.

We next sought to more broadly test the precision of our A3A-BE3 fusions on a larger number of sites within endogenous human genes. To do this, we used 12 different gRNAs targeted to three different human genes and directly compared the editing activities of seven base editor fusions: three A3A-BE3 variants (bearing N57G, N57A, and N57Q/Y130F mutations in A3A), the original BE3, and three previously described BE3 variants, YE1, YE2, and YEE BE3 (YE BE3s), that possess point mutations in rAPO1 designed to slow its kinetic rate and thereby restrict the editing window (Kim et al., 2017f) (Fig. 5d). We found that among all seven base editor fusions tested, A3A (N57G)-BE3 displayed the highest activity at cognate motifs while minimizing bystander cytidine editing at all of the sites tested. At eight of 12 tested sites, A3A (N57G)-BE3 induced 5- to 264-fold (median of 11.2-fold) higher editing of cognate motifs than bystander motifs in the editing window. At the remaining four tested sites, A3A (N57G)-BE3 induced less than 5-fold higher editing of cognate than bystander motifs, but still edited bystander motifs at much lower frequencies than observed with BE3 while retaining high activity at the cognate motif. As expected, all three A3A-BE3 variants maintained a five-nucleotide editing window (approximately 5 to 9 nucleotides downstream from the 5’ end of the targeted sequence) similar to that of the wild-type A3A-BE3 enzyme. Attempts to increase editing efficiencies by introducing the 32-amino acid linker sequence from the recently described BE4 fusion (Komor et al., 2017) between eA3A and nSpCas9 did not substantially increase editing activity or alter the editing window length (data not shown). YE1 BE3 narrowed the editing window to approximately three nucleotides in most cases while still retaining catalytic activity at the cognate motif similar to BE3. YE2 BE3 failed to produce fewer bystander mutations compared to YE1 BE3, and YEE BE3 lost significant activity at 9 sites. Based on these results, we chose the A3A (N57G)-BE3 variant for additional characterization and refer to it hereafter as eA3A-BE3 (for engineered A3A-BE3).
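The cognate-to-bystander comparison reported above reduces to simple arithmetic per target site. The sketch below is illustrative only: the text does not state how sites with several bystander cytidines were summarized, so this version assumes the most heavily edited bystander is used as the denominator, and all site names and percentages are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Tuple


def cognate_to_bystander_ratio(cognate_pct: float,
                               bystander_pcts: List[float]) -> float:
    """Fold preference for the cognate C over the most-edited bystander C.

    Assumption: the highest bystander frequency is the denominator.
    Returns infinity when no bystander editing is detected.
    """
    worst = max(bystander_pcts)
    if worst == 0:
        return math.inf
    return cognate_pct / worst


# Hypothetical C-to-T editing frequencies (%): (cognate, [bystanders]).
sites: Dict[str, Tuple[float, List[float]]] = {
    "site_A": (40.0, [2.0, 1.0]),
    "site_B": (30.0, [10.0]),
    "site_C": (22.0, [0.5]),
}

ratios = {name: cognate_to_bystander_ratio(c, b) for name, (c, b) in sites.items()}
high_precision = [name for name, r in ratios.items() if r >= 5]

print(ratios)
print("median fold preference:", median(ratios.values()))
print("sites with ratio >= 5:", high_precision)
```

The threshold of 5 matches the cut-off used in the text to separate the eight high-ratio sites from the remaining four.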

To examine the purity of the edited alleles produced by eA3A-BE3, we performed a detailed analysis of the high-throughput sequencing results obtained at the 12 endogenous human gene target sites (Fig. 5d), which revealed that eA3A-BE3 induced significant differences in the frequencies of unwanted alterations compared with the original BE3. At 11 of the 12 sites, eA3A-BE3 showed an altered frequency of unwanted base substitutions (i.e., C to A or G) from what was observed with the original BE3 fusion, with increases in unwanted base substitutions at 7 of the 12 sites (Supplementary Fig. 2a). This finding provides additional support for the previously proposed hypothesis that processing of genomic lesions with multiple uracils by endogenous DNA repair machinery differs from those with single uracils (Komor et al., 2017). In addition, we observed eA3A-BE3 also induced fewer indels than BE3 at eight of the twelve tested sites (Supplementary Fig. 2b), suggesting that single nucleotide editing does not generally produce indels at substantially different frequencies than multi-base editing.

To attempt to improve the precision of eA3A-BE3 at sites that had cognate-to-bystander editing ratios of less than five, we sought to further reduce the catalytic efficiency of the A3A N57G deaminase. Our rationale for this strategy stemmed from the observation that at these four sites the majority of bystander deamination events occur within the same DNA strand as (i.e., in cis with) a cognate deamination event; bystander deamination without cognate deamination is found at fewer than 2% of all alleles at these sites while deamination of the cognate cytidine alone was found in at least 20% of alleles (Supplementary Fig. 3). To obtain a protein with a lower catalytic rate, we introduced into eA3A mutations at the positions homologous to three previously described residues (E38, A71 or I96) that modulate the catalytic activity of the human AID enzyme (Wang et al., 2009). We then screened these mutated eA3A-BE3 variants for activity on three of the genomic sites that had retained significant bystander deamination when edited with eA3A-BE3 (Supplementary Fig. 4). Mutations made to residues I96 and A71 greatly decreased mutation of bystander motifs at each of the three target sites while retaining 50-75% of A3A N57G BE3 activity at the cognate motif. These results suggest that it may be possible to further modify eA3A-BE3 using a set of defined mutations to tune precision at sites with less than optimal cognate-to-bystander editing ratios.

Having characterized the on-target activity of eA3A-BE3, we next sought to characterize and optimize its potential off-target activity. To do this, we used three different gRNAs (targeted to the EMX1, FANCF, and VEGFA genes) (Supplementary Table 3), for which a number of off-targets had been previously identified with the original BE3 by either Digenome-seq (performed with rAPO1-nSpCas9, also known as “BE3 ΔUGI”) (Kim et al., 2017b) or GUIDE-seq (performed with SpCas9 nuclease) (Komor et al., 2016b; Tsai et al., 2015). Using GUIDE-seq performed with SpCas9 nuclease, we also identified two potential off-target sites for a fourth gRNA (targeted to the CTNNB1 gene) (Supplementary Fig. 5). Finally, we identified some additional closely matched sequences for the CTNNB1-targeting gRNA in silico in the human reference genome using the Cas-OFFinder program (Bae et al., 2014). We then performed targeted amplicon sequencing of these 60 sites to assess base editing events induced by the original BE3 and eA3A-BE3 with the four gRNAs in human HEK293T cells. For two of the four gRNAs, on-target base editing efficiency of the cognate motif with eA3A-BE3 either matched or outperformed the original BE3, although we observed small to moderate decreases in editing efficiency using the CTNNB1 site 1 or VEGFA site 2 gRNAs (Figs. 6a – 6d and Supplementary Fig. 6). For 36 of the 60 potential off-target sites we examined, BE3 induced significant base editing events (compared to control amplicons from untransfected cells) (Figs. 6a – 6d and Supplementary Table 4). Surprisingly, at 34 of these 36 off-target sites, eA3A-BE3 induced significantly lower frequencies of base editing events with no significant detectable editing at 21 of these 36 sites (Figs. 6a – 6d). The N57G mutation in the A3A deaminase part of eA3A-BE3 is critical for this higher specificity because the A3A-BE3 fusion lacking this alteration showed higher off-target mutations with the EMX1 site 1 and FANCF site 1 gRNAs (Supplementary Fig. 7). The addition of mutations previously shown to improve the genome-wide specificity of SpCas9 (the “HF1” and “Hypa” mutations (Chen et al., 2017; Kleinstiver et al., 2016)) together with a second UGI domain further reduced the off-target base editing events (reducing them to undetectable levels for all but 5 of the 15 sites that still showed detectable edits with eA3A-BE3) (Figs. 6a – 6d). These higher-specificity variants also induce improved base editing product purity and reduced frequencies of indels at on-target sites (Supplementary Fig. 8), consistent with earlier studies that used similar strategies to achieve these outcomes for the original BE3 fusion (Komor et al., 2017; Wang et al., 2017).

Figure 6: Off-target editing activities of BE3 and eA3A-BE3 variants. On- and off-target editing frequencies of four gRNAs targeted to (a) EMX1 site 1, (b) VEGFA site 2, (c) FANCF site 1 or (d) CTNNB1 site 1 with BE3 or one of the indicated eA3A-BE3 variants. Percentage edits represent the sum of all edited Cs in the editing window and represent the mean of three biological replicates with error bars representing SEMs. Intended target sequence is shown at the top of each graph. On-target sites are marked with a black diamond to the left and mismatches or bulges in the various off-target sites are shown with colored boxes or a dash in gray boxes, respectively. Off-target sites that lose the cognate TC motif within the editing window, and thus might be expected to show lower off-target editing by eA3A, are noted with empty circles to the left. Asterisks indicate statistically significant differences in editing efficiencies observed between BE3 and eA3A-BE3 at each site (* p < 0.05, ** p < 0.005, *** p < 0.0005).

To test the eA3A-BE3 fusion on a disease-relevant mutation, we examined its activity on a common β-thalassemia allele found in China and some Southeast Asian populations (Cao and Galanello, 2010; Liang et al., 2017) for which single nucleotide editing is critical. Mutation of position -28 of the human HBB promoter from an A to G (and therefore a T to C on the complementary strand) results in β-thalassemia disease (Fig. 7a). The HBB -28 C mutation can be corrected with SpCas9 and a gRNA with the C falling within the predicted editing window. However, another C (at position -25 of the HBB promoter) is also present within the editing window of this gRNA and previous work has shown that mutation of this base can cause a β-thalassemia phenotype in humans independent of the nucleotide identity at the -28 position (Fig. 7a) (Eng et al., 2007; Li et al., 2015). We directly compared the abilities of the original BE3, the YE BE3s, and eA3A-BE3 to edit an integrated copy of 200 bps of mutant HBB promoter sequence encompassing the cytidines at positions -28 and -25 in HEK293T cells. (For technical reasons, all experiments targeting the HBB -28 (A>G) allele used a gRNA expressed with a self-cleaving hammerhead ribozyme on its 5’ end (Online Methods).) As expected, eA3A-BE3 showed higher precision than BE3 and the YE BE3s for selectively editing the -28 cytidine relative to the -25 cytidine (Fig. 7b). This resulted in substantially higher levels of perfectly corrected alleles bearing only a -28 C to T edit: 22.48% for eA3A-BE3 compared with 0.57%, 1.04%, 0.92%, and 0.76% for BE3, YE1 BE3, YE2 BE3, and YEE BE3, respectively (Fig. 7c). Analysis of eight potential off-target sites for the HBB-targeted gRNA (three identified by GUIDE-seq with SpCas9 nuclease (Supplementary Fig. 5) and five by in silico methods; Online Methods) showed that eA3A-BE3 induced significant off-target editing at two sites while BE3 induced significant editing at these same two sites and an additional third site, all at higher frequencies (Fig. 7d). As expected, use of the eA3A-HF1-BE3-2xUGI and eA3A-Hypa-BE3-2xUGI fusions with the HBB gRNA reduced the frequencies of off-target edits to undetectable levels at all eight sites we examined (Fig. 7d). Both high-fidelity base editor fusions also improved product purity, resulting in a reduction of unwanted -28 C to G edits that are also known to cause β-thalassemia from 16.3% with eA3A-BE3 to 8.8% and 7.5% with the HF1 and Hypa variants, respectively (Fig. 7c).
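The notion of a "perfectly corrected" allele can be illustrated with a small tally over sequencing outcomes. The sketch below is a simplified assumption of how such a classification might be set up: it considers only the bases observed at the -28 and -25 positions on the edited strand, ignores any other cytidines or indels, and uses hypothetical read counts rather than data from this study.

```python
from collections import Counter
from typing import Dict, Tuple


def classify_allele(base_m28: str, base_m25: str) -> str:
    """Classify an allele from the bases read at -28 and -25.

    The unedited mutant allele carries C at both positions; a perfect
    correction is C-to-T at -28 with the -25 C left untouched.
    """
    if base_m28 == "C" and base_m25 == "C":
        return "unedited"
    if base_m28 == "T" and base_m25 == "C":
        return "perfect"
    return "imperfect"  # e.g. -28 C-to-G, or any change at -25


def outcome_percentages(
    allele_counts: Dict[Tuple[str, str], int]
) -> Dict[str, float]:
    """Convert per-allele read counts into percentages per category."""
    totals: Counter = Counter()
    for (base_m28, base_m25), n in allele_counts.items():
        totals[classify_allele(base_m28, base_m25)] += n
    total_reads = sum(totals.values())
    if total_reads == 0:
        raise ValueError("no reads supplied")
    return {
        category: 100.0 * totals[category] / total_reads
        for category in ("unedited", "perfect", "imperfect")
    }


# Hypothetical read counts keyed by (base at -28, base at -25).
counts = {("C", "C"): 600, ("T", "C"): 250, ("T", "T"): 50, ("G", "C"): 100}
print(outcome_percentages(counts))
```

Under this scheme, a -28 C to G allele and a doubly edited allele both fall into the imperfect category, which is the sense in which bystander editing at -25 lowers the yield of perfectly corrected alleles.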

We next sought to determine whether eA3A-BE3 could be used to edit the HBB -28 (A>G) mutation at the endogenous gene locus in erythroid precursor cells derived from human CD34+ hematopoietic stem and progenitor cells. To do this, we purified eA3A-BE3 and A3A (N57Q)-BE3 proteins to near homogeneity (Supplementary Fig. 9a) and electroporated them as ribonucleoprotein (RNP) complexes (with the HBB -28 (A>G)-targeted gRNA) into human erythroid precursors obtained from a compound heterozygous β-thalassemia patient bearing a 4 bp deletion in exon 1 of one HBB allele (allele 1), and the HBB -28 (A>G) mutation in the second allele (allele 2). As expected, both proteins selectively edited the -28 position as compared to the -25 position of allele 2 (Supplementary Fig. 9b). Interestingly, editing of this site in erythroid precursors produced greater numbers of C>G transversions than in the 293T.HBB cell line, perhaps due to differences in the expression or functionality of DNA repair activities between the two cell types. To determine whether HBB expression was functionally altered by editing allele 2, we terminally differentiated the electroporated erythroid precursors and measured the expression of the globin genes HBA1/2, HBB, and HBG1/2 by real-time quantitative PCR (Supplementary Fig. 9c). Editing with eA3A-BE3 increased expression of HBB 2.6-fold compared to the mock control, whereas editing with A3A (N57Q)-BE3 increased expression of HBB 4.0-fold. Importantly, Cas9 nuclease did not alter HBB expression relative to the mock control, indicating that the single nucleotide substitutions induced by the base editors are responsible for increased HBB expression. Furthermore, while A3A (N57Q)-BE3 induced low but significant levels of off-target editing at four of six investigated sites, eA3A-BE3 induced significant off-target editing at only one of these sites and at lower frequency than observed with A3A (N57Q)-BE3 (Supplementary Fig. 9d).

Figure 7: On- and off-target activities of eA3A-BE3 variants at a β-thalassemia-causing mutation HBB -28 (A>G) sequence in human cells. (a) Schematic of the HBB -28 (A>G) mutation and potential base editing outcomes when targeting Cs at -28 and -25 in the editing window of an HBB-targeting gRNA. Mutations to the bystander cytidine at the -25 position are deleterious and cause β-thalassemia phenotypes independent of the identity of the -28 nucleotide. (b) Heat maps showing C-to-T editing efficiencies for BE3, YE BE3s, and various A3A-BE3 variants at the HBB -28 (A>G) target site in an integrated reporter in human HEK293T cells. The -28 C is indicated with a black arrow. Editing efficiencies shown represent the mean of three biological replicates. (c) Graph showing the frequencies of perfectly corrected (-28 C to T only) and other imperfectly edited (-28 C to G or other edited Cs) alleles by BE3, YE BE3 variants, and eA3A-BE3 variants. (d) On- and off-target editing frequencies of the HBB-targeted gRNA with BE3, YE BE3 variants, or eA3A-BE3 variants. Percentage edits represent the sum of all edited Cs in the editing window and represent the mean of three biological replicates with error bars representing SEMs. Intended target sequence is shown at the top. On-target site is marked with a black diamond to the left and mismatches or bulges in the various off-target sites are shown with colored boxes or a dash in gray boxes, respectively. Off-target sites that lose the cognate TC motif within the editing window, and thus might be expected to show lower off-target editing by eA3A, are noted with empty circles to the left. Asterisks indicate statistically significant differences in editing efficiencies observed between BE3 and eA3A-BE3 and between eA3A-BE3 and the untransfected control (* p < 0.05, ** p < 0.005, *** p < 0.0005).

Our study provides an important proof-of-principle illustrating how changing and engineering the cytidine deaminase in base editors can be used to optimize on-target precision and reduce off-target effects. Bioinformatic analysis of SNPs deposited in the ClinVar database indicates that eA3A-BE3 can be used to efficiently correct up to 376 disease-causing SNPs, including 171 with additional bystander cytidines in the editing window that would likely be edited by BE3 (Supplementary Fig. 10 and Supplementary Table 6). We envision that a large suite of base editor fusions can be engineered both by exploiting the rich diversity of naturally occurring cytidine deaminase domains and by modifying the function and activity of these enzymes using protein engineering and evolution. In our study, mutation of the N57 residue in the human A3A deaminase was critical to restoring its native target sequence precision in the context of a base editor and also to lowering its off-target base editing activity. Introduction of additional mutations at A3A residues I96 and A71 further refined this precision, albeit at the expense of the desired cognate activity. Furthermore, the eA3A deaminase we engineered might be incorporated into and used to reduce the off-target effects of other base editor architectures that use different Cas9 orthologues for which high-fidelity variants have not yet been described (e.g., SaCas9 from Staphylococcus aureus).
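The kind of targetability check behind the ClinVar analysis can be sketched as follows. This is not the pipeline used in the study; it is a minimal illustration that assumes a 20-nt spacer, an editing window at spacer positions 5 to 9, and PAMs of the form NGG, NGA, or NGT immediately 3' of the spacer, with the input sequence given on the strand that carries the target C. All function names and the example sequence are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# PAM classes named in the text; "N" matches any base.
PAM_PATTERNS = ("NGG", "NGA", "NGT")


def pam_matches(pam: str, pattern: str) -> bool:
    """True if pam fits the pattern, treating N as a wildcard."""
    return len(pam) == len(pattern) and all(
        p == "N" or p == b for p, b in zip(pattern, pam)
    )


def candidate_spacers(seq: str, target_idx: int, spacer_len: int = 20,
                      window: Tuple[int, int] = (5, 9)) -> List[Dict[str, Any]]:
    """List spacers that place the target C in the window next to a usable PAM."""
    seq = seq.upper()
    if seq[target_idx] != "C":
        raise ValueError("target_idx must point at a C")
    hits = []
    for pos in range(window[0], window[1] + 1):  # 1-based position in spacer
        start = target_idx - (pos - 1)
        pam_start = start + spacer_len
        if start < 0 or pam_start + 3 > len(seq):
            continue
        pam = seq[pam_start:pam_start + 3]
        if not any(pam_matches(pam, p) for p in PAM_PATTERNS):
            continue
        # Record every other C in the window together with its 5' neighbour.
        bystanders = []
        for wpos in range(window[0], window[1] + 1):
            i = start + wpos - 1
            if i != target_idx and seq[i] == "C":
                bystanders.append((wpos, seq[i - 1]))
        hits.append({
            "spacer": seq[start:pam_start],
            "pam": pam,
            "target_position": pos,
            "target_has_5prime_T": seq[target_idx - 1] == "T",
            "bystanders": bystanders,
            "bystanders_lack_5prime_T": all(b != "T" for _, b in bystanders),
        })
    return hits


# Hypothetical 23-nt sequence: a 20-nt spacer followed by a TGG PAM.
example = "AGAGTCGGCTAGCATGCATG" + "TGG"
for hit in candidate_spacers(example, target_idx=5):
    print(hit)
```

A site where the target C has a 5' T and every bystander C lacks one corresponds to the class of SNPs for which eA3A would be expected to edit more precisely than BE3.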

Relative to previously published studies, our strategy of using alternative and engineered cytidine deaminases provides a different and orthogonal approach to improve the precision of on-target editing. An earlier study introduced mutations into the rAPOBEC1 part of BE3 that shorten the editing window but this strategy narrows targeting range and does not permit predictable discrimination of base deamination when multiple cytidines are present in the window (as is the case with the β-thalassemia HBB -28 promoter mutation we successfully modified with eA3A-BE3 in this study). We also note that the YE BE3 variants that show the highest discrimination among multiple cytidines also typically show the greatest reductions in their overall base editing activity.

One limitation of eA3A-BE3 is a decreased targeting range due to the increased sequence requirements flanking the target cytidine, a restriction that might be addressed by using engineered SpCas9 PAM recognition variants and naturally occurring Cas9 orthologues with different PAM specificities. In this regard, we constructed eA3A-BE3 derivatives using the engineered VRQR or xCas9 variants of SpCas9 that have been reported to recognize sites with an NGA or NGN PAM sequence, respectively (Hu et al., 2018; Kleinstiver et al., 2016). We found that eA3A-BE3 (VRQR) robustly edited nine sites bearing NGA PAMs with high efficiencies and precision (Supplementary Fig. 11). eA3A-BE3 (xCas9) efficiently edited a subset of two target sites with NGT PAMs we tested while showing lower activities on five other sites with NGT, NGC, or NGA PAMs (Supplementary Fig. 11); for all seven of these sites, eA3A-BE3 (xCas9) generally showed higher efficiencies and higher precision than BE3 (xCas9) (Supplementary Fig. 11). Importantly, eA3A-BE3 (VRQR) also retained its improved off-target specificity relative to the original BE3 (VRQR) with two gRNAs targeted to sites containing NGA PAMs (Supplementary Fig. 12). Taken together, these results show that Cas9 variants with altered PAM recognition can be used with our eA3A-BE3 platform, suggesting that (like BE3) it behaves in a modular fashion with retention of higher on-target precision and off-target specificity even when constructed with engineered SpCas9 variants. To further expand the targeting range of the eA3A platform it may also be possible to engineer or evolve different sequence specificities into APOBEC enzymes in the context of a base editor architecture, as has been done with APOBEC enzymes in isolation (Rathore et al., 2013; Shi et al., 2017). Thus, in the longer-term, we envision that the targeting range restriction might eventually be completely overcome by creating a larger series of different base editors that collectively recognize cytidines embedded in any sequence context.


Supplementary Figures

Supplementary Figure 1: Base editing activities of engineered A3A-BE3 variants with mutations designed to disrupt non-specific interactions with substrate ssDNA. Graphs illustrating the frequencies of C to T editing by a series of A3A-BE3 variants containing various pairs of mutations in A3A on bystander and cognate cytidines at four endogenous human gene target sites in single replicate. The reference sequence of each target site is shown at the top of each graph.


Supplementary Figure 2: Comparison of product purities and indel mutation frequencies for BE3 and eA3A-BE3 on endogenous human gene target sites. (a) Graph showing normalized frequencies of cognate Cs edited to A, T, or G for twelve endogenous human gene target sites from Fig. 5d when targeting with BE3 or A3A (N57G)-BE3. (b) Graph showing indel mutation frequencies for BE3, the YE BE3s, and different engineered A3A-BE3 variants at the same 12 sites shown in (a). All data shown represent the mean of three biological replicates with error bars representing SEMs.


Supplementary Figure 3: eA3A-BE3-mediated editing of bystander cytidines typically occurs on the same allele as editing of cognate cytidines. Allele frequency tables for four sites on which eA3A-BE3 exhibits cognate-to-bystander editing ratios less than five. Arrows with red outlines indicate bystander cytidines in the editing window, while black arrows indicate the target cytidine. The predicted nicking site by nSpCas9 is indicated by black dashed line.


Supplementary Figure 4: Mutations designed to decrease the catalytic rate of eA3A can increase the cognate-to-bystander editing ratio of eA3A-BE3. Heat maps showing C-to-T editing efficiencies for eA3A-BE3 and three eA3A-BE3 variants bearing the indicated mutations at three endogenous human gene target sites, each bearing a cognate cytidine preceded by a 5’ T and one or more bystander cytidines within the editing window. Editing efficiencies shown represent the mean of three biological replicates. Graph showing cognate-to-bystander editing ratios for all three target sites with eA3A-BE3 and each of the three eA3A-BE3 variants. Data points shown represent ratios of the means of the three biological replicates performed for each base editor. Mean values are indicated by a line and error bars represent the SEM.


Supplementary Figure 5: SpCas9 nuclease off-targets discovered by GUIDE-seq for the CTNNB1 and HBB -28 (A>G) gRNAs. GUIDE-seq plots depicting the target site for both gRNAs (top of each figure panel) and off-target sites discovered by GUIDE-seq shown below with base positions containing mismatches to the on-target site highlighted with a colored box and RNA bulges indicated with a dash. Sites with more than one potential alignment (i.e. where either an RNA bulge or additional mismatches are both plausible target sites) are shown with brackets. GUIDE-seq read counts for each site discovered are shown in the right column. The on-target site for the CTNNB1 gRNA and the single mismatched on-target site for the HBB -28 (A>G) gRNA are indicated with a small black square to the left.


Supplementary Figure 6: On-target C to T editing efficiencies for each C in the editing window for sites in Figure 6a-c. Bar plots showing the C-to-T editing efficiencies for each C in the editing windows of the on-target sites for the three gRNAs depicted in Figure 6a-c. The target sequence for each gRNA is depicted above each plot.


Supplementary Figure 7: The N57G mutation is important for decreased off-target activity of eA3A-BE3. Editing frequencies for EMX1 site 1 and FANCF site 1 gRNAs with BE3, eA3A-BE3, and untransfected cells (from Figs. 6a and 6c) are re-plotted with editing frequencies with these same gRNAs observed with A3A-BE3 from a separate experiment. Sequences of the on- and off-target sites are shown to the left of the bar plot below the target sequence with mismatches relative to the on-target site highlighted with colored boxes and bulges with a grey highlighted dash. The on-target sites for the EMX1 site 1 and FANCF site 1 gRNAs are indicated with a small black square to the left.


Supplementary Figure 8: Effects of adding a second UGI domain and HF1 or Hypa high-fidelity mutations to eA3A-BE3 on base editing product purities and indel frequencies at four endogenous human gene target sites. Graph in top panel shows normalized frequencies of cognate Cs edited to A, T, or G for the four endogenous human gene target sites of Figures 6a – 6d when targeting each site with BE3, eA3A-BE3, or eA3A-BE3 variants incorporating HF1 or Hypa mutations and a second UGI domain. Graph in bottom panel shows indel frequencies induced by the same base editors at the same sites. All data shown represent the mean of three biological replicates and error bars represent SEMs.


Supplementary Figure 9: Efficient correction of the HBB -28 (A>G) pathogenic SNP in patient-derived erythroid precursor cells. (a) Coomassie-stained SDS-PAGE gel showing titrations of purified eA3A-BE3 and A3A (N57Q)-BE3 protein next to a protein ladder. Molecular weights are indicated to the left of the image. (b) C-to-T, -G or -A editing frequencies at the -25 and -28 positions of the HBB -28 (A>G) allele in patient-derived erythroid precursor cells. Sequencing reads were separated according to whether they contained the -CTTT deletion in exon 2 that is present only in the allele without the HBB -28 (A>G) mutation. (c) RT-qPCR data examining HBB and HBG1/2 globin mRNA expression normalized to expression of HBA1/2 from the edited populations shown in panel (b) following terminal erythroid differentiation. (d) Off-target editing by A3A (N57Q)-BE3 or eA3A-BE3 RNPs from the edited populations shown in panel (b) at six off-target sites. Percentage edits represent the sum of all C-to-D (D = A, G or T) editing frequencies in the editing window and represent the mean of three biological replicates with error bars representing SEMs. Intended target sequence is shown at the top. On-target site is marked with a black diamond to the left and mismatches or bulges in the various off-target sites are shown with colored boxes or a dash in gray boxes, respectively. Off-target sites that lose the cognate TC motif within the editing window, and thus might be expected to show lower off-target editing by eA3A, are noted with empty circles to the left. Asterisks indicate statistically significant differences in editing efficiencies observed between the indicated samples at each site (* p < 0.05, ** p < 0.005, *** p < 0.0005).

Supplementary Figure 10: eA3A enables high precision correction of 376 disease-causing SNPs deposited in the ClinVar database that can be targeted using NGG, NGA, or NGT PAMs. The chart on the left visualizes the 5’ nucleotide context of all T to C SNPs that could be corrected by C to T base editing, reside in the editing window (defined as positions 5-9 of the spacer sequence from the 5’ end), and are targetable using Cas9s recognizing NGG, NGA, or NGT PAMs. The panel on the right visualizes further analysis of the 376 SNPs in the left chart with a 5’ T (green), indicating whether bystander Cs in the editing window also have a 5’ T, or whether the 5’ nucleotide of the bystander C is not a T, indicating that eA3A would be able to more precisely target 171 disease-causing SNPs compared to BE3.
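The bystander classification described in this caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code used for the figure: it assumes a spacer written 5’ to 3’, the editing window of positions 5-9 defined above, and a hypothetical helper named classify_window. The example spacer is invented and is not one of the ClinVar sites.

```python
EDIT_WINDOW = range(5, 10)  # spacer positions 5-9, 1-based from the 5' end


def classify_window(spacer, target_pos):
    """Classify cytidines in the editing window of a spacer.

    spacer: protospacer sequence written 5' to 3' (hypothetical input).
    target_pos: 1-based position of the cytidine to be corrected.
    Returns (target_has_5prime_T, bystanders_with_5prime_T, other_bystanders).
    """
    spacer = spacer.upper()
    if target_pos not in EDIT_WINDOW or spacer[target_pos - 1] != "C":
        raise ValueError("target must be a C at spacer positions 5-9")
    tc_bystanders, other_bystanders = [], []
    for pos in EDIT_WINDOW:
        if pos == target_pos or spacer[pos - 1] != "C":
            continue
        # The 5' neighbour of 1-based position pos is at 0-based index pos - 2.
        if spacer[pos - 2] == "T":
            tc_bystanders.append(pos)
        else:
            other_bystanders.append(pos)
    target_tc = spacer[target_pos - 2] == "T"
    return target_tc, tc_bystanders, other_bystanders


# Invented spacer: target C at position 6 (5' T), bystander C at position 8 (5' A).
print(classify_window("GACTTCACGGATCCAGTTGA", 6))
```

Under these assumptions, a site whose target C has a 5’ T and whose bystander list with a 5’ T is empty would fall into the class where a TC-preferring deaminase is expected to be more selective than BE3.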

Supplementary Figure 11: eA3A-BE3 bearing VRQR or xCas9 mutations that alter PAM recognition specificity can be used to edit sites bearing NGA or NGT PAMs for highly precise single nucleotide editing. Heat maps showing C-to-T editing efficiencies for BE3 (VRQR), eA3A-BE3 (VRQR), BE3 (xCas9), and eA3A-BE3 (xCas9) at 14 endogenous human gene target sites. VRQR target sites all bear NGAN PAM sequences, while xCas9 target sites bear NGN PAM sequences. Each site bears a cognate cytidine preceded by a 5’ T and one or more bystander cytidines within the editing window. Editing efficiencies shown represent the mean of three biological replicates. Graph shows cognate-to-bystander C to T editing ratios for BE3 (VRQR), eA3A-BE3 (VRQR), BE3 (xCas9) and eA3A-BE3 (xCas9). Data points shown represent ratios calculated from the mean values shown in the heat maps. Median values are indicated by a red line and error bars represent interquartile ranges.

Supplementary Figure 12: eA3A retains high genome-wide fidelity when used with SpCas9 (VRQR). Off-target editing frequencies of two gRNAs targeted to PPP1R12C VRQR site 1 or 3 with BE3 (VRQR) or eA3A-BE3 (VRQR). Percentage edits represent the sum of all edited Cs in the editing window and represent the mean of three biological replicates with error bars representing SEMs. Intended target sequence is shown at the top of each graph. Mismatches or bulges in the various off-target sites are shown with colored boxes or a dash in gray boxes, respectively. Off-target sites that lose the cognate TC motif within the editing window, and thus might be expected to show lower off-target editing by eA3A, are noted with empty circles to the left. Asterisks indicate statistically significant differences in editing efficiencies observed between the indicated samples at each site (* p < 0.05, ** p < 0.005, *** p < 0.0005).

Supplementary Figure 13: Incorporating a self-cleaving hammerhead ribozyme on the 5’ end of the gRNA preserves perfect matching of the spacer and target site and rescues the activities of the eA3A-BE3 variants bearing HF1 and Hypa high-fidelity mutations. Heat maps showing C-to-T editing efficiencies for eA3A variants incorporating HF1 or Hypa mutations targeting HBB -28 (A>G) using a gRNA with a 5’ mismatched guanine (top), a self-cleaving hammerhead ribozyme with 6 nucleotides of self-complementarity (middle), or a self-cleaving hammerhead ribozyme with 8 nucleotides of self-complementarity (bottom). The 5’ mismatched guanine or the ribozyme sequence is shown in red, while the spacer sequence is shown in black. Editing efficiencies shown represent the mean of three biological replicates.

Supplementary Information

Protein sequences used in this study

A3A-BE3

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQ AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN THVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP WDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVI TDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRIC YLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRK KLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENP INASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAED AKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASM IKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILE KMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKN LPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVT VKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVL TLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTIL DFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQT VKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVL TRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKA GFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYK VREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQV NIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEK GKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKR MLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIE QISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTID RKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESIL MLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIK MLSGGSPKKKRKV

A3A (N57Q/Y130F)-BE3

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHQQ AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN THVRLRIFAARIFDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP WDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVI TDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRIC YLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRK KLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENP INASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAED
AKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASM IKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILE KMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKN LPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVT VKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVL TLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTIL DFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQT VKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVL TRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKA GFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYK VREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQV NIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEK GKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKR MLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIE QISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTID RKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESIL MLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIK MLSGGSPKKKRKV

eA3A-BE3

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHGQ AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN THVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP WDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVI TDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRIC YLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRK KLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENP INASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAED AKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASM IKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILE KMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKN LPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVT VKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVL TLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTIL DFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQT VKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVL TRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKA GFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYK VREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQV
NIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEK GKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKR MLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIE QISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTID RKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKETGKQLVIQESIL MLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIK MLSGGSPKKKRKV

eA3A-BE3-2xUGI

MEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKMDQHRGFLHNQ AKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGCAGEVRAFLQEN THVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDTFVDHQGCPFQP WDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESDKKYSIGLAIGTNSVGWAVI TDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRIC YLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRK KLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENP INASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAED AKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASM IKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILE KMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNREKIE KILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKN LPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVT VKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDNEENEDILEDIVL TLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLINGIRDKQSGKTIL DFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLAGSPAIKKGILQT VKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH PVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSFLKDDSIDNKVL TRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAERGGLSELDKA GFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYK VREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKA TAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQV NIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEK GKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKR MLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIE QISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTID RKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSGGSGGSTNLSDIIEKETGKQL VIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVIQDSN GENKIKMLSGGSGGSGGSTNLSDIIEKETGKQLVIQESILMLPEEVEEVIGNKPESDILVHT AYDESTDENVMLLTSDAPEYKPWALVIQDSNGENKIKMLSGGSPKKKRKV

6xHis-eA3A-BE3

MGSSHHHHHHMEASPASGPRHLMDPHIFTSNFNNGIGRHKTYLCYEVERLDNGTSVKM DQHRGFLHGQAKNLLCGFYGRHAELRFLDLVPSLQLDPAQIYRVTWFISWSPCFSWGC AGEVRAFLQENTHVRLRIFAARIYDYDPLYKEALQMLRDAGAQVSIMTYDEFKHCWDT
FVDHQGCPFQPWDGLDEHSQALSGRLRAILQNQGNSGSETPGTSESATPESDKKYSIGLA IGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETAEATRLKRTARR RYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHE KYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQ TYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPN FKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNT EITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQ EEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFY PFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQS FIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIV DLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLDN EENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSRKLIN GIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIANLA GSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEG IKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSDYDVDHIVPQSF LKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKFDNLTKAE RGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVS DFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYGDYKVYDVRKMI AKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEIVWDKGRDFATV RKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYS VLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYS LFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVE QHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAP AAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSTNLSDIIEKET GKQLVIQESILMLPEEVEEVIGNKPESDILVHTAYDESTDENVMLLTSDAPEYKPWALVI QDSNGENKIKMLSGGSPKKKRKV

Methods

Plasmids and oligonucleotides

Sequences of proteins and their expression plasmids used in this study are listed in the Supplementary Information. gRNA target sites and sequences of oligonucleotides used for on-target PCR amplicons for high-throughput sequencing in this study can be found in Supplementary Table 5. Sequences of oligonucleotides used to investigate off-target editing sites can be found in Supplementary Table 3. BE expression plasmids containing amino acid substitutions were generated by PCR and standard molecular cloning methods. gRNA expression plasmids were constructed by ligating annealed oligonucleotide duplexes into MLM3636 cut with BsmBI. All gRNAs except those targeting the HBB -28 (A>G) and CTNNB1 sites were designed to target sites containing a 5′ guanine nucleotide.

Human cell culture and transfection

U2OS.EGFP cells containing a single stably integrated copy of the EGFP-PEST reporter gene and HEK293T cells were cultured in DMEM supplemented with 10% heat-inactivated fetal bovine serum, 2 mM GlutaMax, penicillin and streptomycin at 37 °C with 5% CO2. The media for U2OS.EGFP cells was supplemented with 400 µg ml−1 Geneticin. Cell line identity was validated by STR profiling (ATCC), and cells were tested regularly for mycoplasma contamination.

U2OS.EGFP cells were transfected with 750 ng of plasmid expressing BE and 250 ng of plasmid expressing sgRNA according to the manufacturer’s recommendations using the DN-100 program and SE cell line kit on a Lonza 4-D Nucleofector. For HEK293T transfections, 75,000 cells were seeded in 24-well plates and 18 hours later were transfected with 600 ng of plasmid expressing BE and 200 ng of plasmid expressing sgRNA using TransIT-293 (Mirus) according to the manufacturer’s recommendations. For all targeted amplicon sequencing and GUIDE-seq experiments, genomic DNA was extracted 72 h post-transfection. Cells were lysed in lysis buffer containing 100 mM Tris-HCl pH 8.0, 150 mM NaCl, 5 mM EDTA, and 0.05% SDS and incubated overnight at 55 °C in an incubator shaking at 250 rpm. Genomic DNA was extracted from lysed cells using carboxyl-modified Sera-Mag Magnetic Speed-beads resuspended in 2.5 M NaCl and 18% PEG-6000 (magnetic beads).

The HEK293T.HBB cell line was constructed by cloning a 200 base pair fragment of the HBB promoter upstream of an EF1a promoter driving expression of the puromycin resistance gene in a lentiviral vector. The HBB -28 (A>G) mutation was inserted by PCR and standard molecular cloning methods. The lentiviral vector was transfected into 293FS cells and media containing viral particles was harvested after 72 hours. Media containing viral particles was serially diluted and added to 10 cm plates with approximately 10 million HEK293T cells. After 48 hours, media was supplemented with 2.5 µg ml−1 puromycin and cells were harvested from the 10 cm plate with the fewest surviving colonies to ensure single copy integration.

Peripheral blood was obtained from β-thalassemia patients following Boston Children’s Hospital institutional review board approval and patient informed consent. CD34+ hematopoietic stem and progenitor cells (HSPCs) were isolated using the Miltenyi CD34 Microbead kit (Miltenyi Biotec). HSPCs were cultured for 3 days for expansion in X-VIVO 15 (Lonza, 04-418Q) supplemented with 100 ng ml−1 human SCF, 100 ng ml−1 human thrombopoietin (TPO) and 100 ng ml−1 recombinant human Flt3-ligand (Flt3-L). For in vitro erythroid differentiation, HSPCs were cultured with erythroid differentiation medium (EDM) consisting of IMDM supplemented with 330 µg ml−1 holo-human transferrin, 10 µg ml−1 recombinant human insulin, 2 IU ml−1 heparin, 5% human solvent detergent pooled plasma AB, 3 IU ml−1 erythropoietin, 1% L-glutamine, and 1% penicillin/streptomycin. During days 0–7 of culture, EDM was further supplemented with 1 µM hydrocortisone (Sigma), 100 ng ml−1 human SCF, and 5 ng ml−1 human IL-3 (R&D). After day 7, erythroid precursors were cryopreserved. One day after thawing, 50,000 erythroid precursors were electroporated using the Lonza 4-D electroporator. During days 7–11 of culture, EDM was supplemented with 100 ng ml−1 human SCF only. During days 11–18 of culture, EDM had no additional supplements.

Off-target site selection and amplicon design

Two of the sites characterized here, EMX1 site 1 and FANCF, were previously characterized by modified Digenome-seq, an unbiased approach to discover BE3-specific off-target sites. All off-targets discovered by modified Digenome-seq were investigated, and these sites represent the most comprehensive off-target characterization because they were discovered de novo using BE3. The VEGFA site 2 target is a promiscuous, homopolymeric gRNA that was previously characterized by GUIDE-seq. Because the VEGFA site 2 gRNA has over one hundred nuclease off-target sites, we selected the 20 off-target sites with the highest number of GUIDE-seq reads that also reside in loci for which we were able to design unique PCR amplification primers for characterization here. The CTNNB1 and HBB -28 (A>G) gRNAs had not been previously characterized with respect to BE or nuclease off-target sites. We performed GUIDE-seq as previously described (Tsai et al., 2015) using these gRNAs to determine the SpCas9 nuclease off-target sites, and used Cas-OFFinder to predict all of the potential off-target sites with one RNA bulge and one mismatch. (GUIDE-seq and Cas-OFFinder analyses were performed using the hg38 reference genome.) This class of off-targets is more prevalent with BE3 than with nucleases (Kim et al., 2017b), and thus comprises sites that we were unlikely to discover by GUIDE-seq.

Primers were designed to amplify all off-target sites such that potential edited cytidines were within the first 100 base pairs of Illumina HTS reads. A total of six primer pairs encompassing EMX1 site 1, VEGFA site 2 and CTNNB1 site 1 off-target sites did not amplify their intended amplicon and were thus excluded from further analysis.

Targeted amplicon sequencing

On- and off-target sites were amplified from ~100 ng genomic DNA from three biological replicates for each condition. PCR amplification was performed with Phusion High Fidelity DNA Polymerase (NEB) using the primers listed in Supplementary Tables 3 and 5. 50 µl PCR reactions were purified with 1x volume magnetic beads. Amplification fidelity was verified by capillary electrophoresis on a Qiaxcel instrument. Amplicons with orthogonal sequences were pooled for each triplicate transfection and Illumina flow cell-compatible adapters were added using the NEBNext Ultra II DNA Library Prep kit according to manufacturer instructions. Illumina i5 and i7 indices were added by an additional 10 cycles of PCR with Q5 High Fidelity DNA Polymerase using primers from NEBNext Multiplex Oligos for Illumina (Dual Index Primers Set 1) and purified using 0.7x volume magnetic beads. Final amplicon libraries containing Illumina-compatible adapters and indices were quantified by droplet digital PCR and sequenced with 150 bp paired end reads on an Illumina MiSeq instrument. Sequencing reads were de-multiplexed by MiSeq Reporter then analyzed for base frequency at each position by a modified version of CRISPResso (Pinello et al., 2016). Indels were quantified in a 10 base pair window surrounding the expected cut site for each sgRNA.
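The indel quantification window can be illustrated with a short Python sketch. It is not the modified CRISPResso code used in this work. It assumes that the spacer matches the forward strand of the amplicon, that the expected cut site lies 3 bp upstream of the PAM (the usual SpCas9 position), and that the 10 base pair window is split symmetrically, 5 bp on each side. The function names and the example sequences are hypothetical.

```python
def indel_window(amplicon, spacer, half_width=5):
    """Return (start, end) 0-based half-open coordinates of the indel window.

    Assumes the spacer is found on the amplicon's forward strand and that
    the cut falls 3 bp upstream of the PAM, i.e. 3 bp before the spacer's
    3' end.
    """
    start = amplicon.upper().find(spacer.upper())
    if start < 0:
        raise ValueError("spacer not found on the forward strand")
    cut = start + len(spacer) - 3  # index of the first base 3' of the cut
    return max(0, cut - half_width), min(len(amplicon), cut + half_width)


def overlaps_window(indel_start, indel_end, window):
    """True if a half-open indel interval overlaps the half-open window."""
    return indel_start < window[1] and indel_end > window[0]


# Invented amplicon: 30 bp of flank, a 20-nt spacer, a TGG PAM, 30 bp of flank.
spacer = "GACTTCACGGATCCAGTTGA"
amplicon = "A" * 30 + spacer + "TGG" + "A" * 30
window = indel_window(amplicon, spacer)
print(window)
print(overlaps_window(50, 53, window))
```

A read-level pipeline would apply a check like overlaps_window to each aligned insertion or deletion and count only those that intersect the window.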

Expression of HBB -28 (A>G) gRNAs

In order to use eA3A BEs with the HF1 or Hypa mutations that decrease genome-wide off-target editing, it was necessary to use 20 nucleotides of spacer sequence in the gRNA with no mismatches between the spacer and target site (Kim et al., 2017d; Kleinstiver et al., 2016; Kulcsár et al., 2017). We expressed the HBB -28 (A>G) gRNA from a plasmid using the U6 promoter, which preferentially initiates transcription at a guanine nucleotide at the +1 position. To preserve perfect matching between the spacer and target site, we appended a self-cleaving 5’ hammerhead ribozyme that is able to remove the mismatched guanine at the 5’ end of the spacer (Kim et al., 2017d). This strategy rescued activity of HF1 eA3A BE3.9 or Hypa eA3A BE3.9 by approximately 1.4-fold compared to the gRNA with a 5’ mismatched guanine (Supplementary Fig. 13).
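The U6 constraint described above can be illustrated with a small Python sketch. It assumes that a guanine is prepended whenever the spacer does not already begin with G (the text does not state whether the mismatched G is added to or substituted into the spacer), and the function names and example spacers are hypothetical.

```python
def u6_spacer(spacer):
    """Return the spacer as a U6 promoter would be expected to express it.

    Assumption (not stated in the text): a G is prepended when the spacer
    does not already begin with G, leaving an extra, unpaired 5' G.
    Returns (expressed_spacer, has_extra_5prime_g).
    """
    spacer = spacer.upper()
    if spacer.startswith("G"):
        return spacer, False
    return "G" + spacer, True


def needs_ribozyme(spacer, high_fidelity):
    """Flag gRNAs where a 5' self-cleaving ribozyme may be worth testing."""
    _, extra_g = u6_spacer(spacer)
    return high_fidelity and extra_g


# Invented spacers, for illustration only.
print(needs_ribozyme("ACTTCACGGATCCAGTTGAC", high_fidelity=True))
print(needs_ribozyme("GACTTCACGGATCCAGTTGA", high_fidelity=True))
```

The flag is only a heuristic: it marks the case discussed here, a high-fidelity variant paired with a spacer that lacks a native 5’ G.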

Protein Purification

Proteins were expressed and purified as previously described (Rees et al., 2017b). Briefly, 8 liters of BL21 STAR (DE3) E. coli containing the plasmids pET-6xHis-eA3A-BE3 or pET-6xHis-A3A (N57Q)-BE3 were grown to OD600 = 0.7 in LB broth then cooled to 16 °C in an ice water bath. Protein expression was then induced by the addition of 0.5 mM IPTG and cultures were incubated overnight at 16 °C. Cells were harvested by centrifugation then lysed by sonication. Proteins were purified by Ni-NTA immobilized metal affinity chromatography then cation exchange using SP Sepharose resin. Following cation exchange, the elution buffer was diluted to a final salt concentration of 150 mM and the proteins were concentrated to 20 mg/ml and snap frozen in liquid nitrogen.

RNP electroporation

Electroporation was performed using a Lonza 4D Nucleofector (V4XP-3032 for 20 µl Nucleocuvette Strips) according to the manufacturer’s instructions. For 20 µl Nucleocuvette Strips, the RNP complex was prepared by mixing SpCas9 (200 pmol) and chemically modified synthetic sgRNA (200 pmol) purchased from Synthego and incubating for 15 minutes at room temperature immediately before electroporation. 50,000 erythroid precursors resuspended in 20 µl P3 solution were mixed with RNP and transferred to a cuvette for electroporation with program EO-100. The electroporated cells were resuspended in EDM for in vitro differentiation.

RT-qPCR quantification of globin induction

RNA isolation was performed with RNeasy columns (Qiagen, 74106) according to the manufacturer’s instructions. Reverse transcription was performed with the iScript cDNA synthesis kit (Bio-Rad, 170-8890). RT-qPCR was performed with iQ SYBR Green Supermix (Bio-Rad, 170-8880). The induction of globin gene expression was measured using primers amplifying HBG1/2 (5’-GGTTATCAATAAGCTCCTAGTCC-3’ and 5’-ACAACCAGGAGCCTTCCCA-3’), HBB (5’-TGAGGAGAAGTCTGCCGTTAC-3’ and 5’-ACCACCAGCAGCCTGCCCA-3’) and HBA1/2 (5’-GCCCTGGAGAGGATGTTC-3’ and 5’-TTCTTGCCGTGGCCCTTA-3’) (Ye et al., 2016).
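Normalizing a globin transcript to HBA1/2 can be sketched as follows. The text does not state which calculation was used, so this sketch assumes the common 2^(-ΔCt) approach with roughly 100% amplification efficiency for every primer pair. The function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target transcript relative to a reference transcript.

    Assumes the 2**(-dCt) calculation with ~100% PCR efficiency; neither
    assumption is stated in the text.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Invented Ct values: the target crosses threshold 2 cycles after HBA1/2.
print(relative_expression(ct_target=22.0, ct_reference=20.0))  # 0.25
```

If primer efficiencies differ noticeably from 100%, an efficiency-corrected calculation would be the safer choice.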

Statistical testing

All statistical testing was performed using two-tailed Student’s t-tests without assuming equal variances between samples, with correction for multiple comparisons according to the method of Benjamini, Krieger, and Yekutieli.
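As a rough illustration of the multiple-comparison step, the sketch below implements the two-stage linear step-up false discovery rate procedure of Benjamini, Krieger, and Yekutieli in plain Python. Whether this exact variant and this FDR level were used here is an assumption. The p-values are invented, and in practice they would come from the unequal-variance t-tests, which are not computed in this sketch.

```python
def bh_reject_count(pvalues, q):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up at level q."""
    m = len(pvalues)
    count = 0
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank * q / m:
            count = rank  # keep the largest rank that passes
    return count


def bky_two_stage(pvalues, q=0.05):
    """Two-stage step-up FDR procedure; returns one boolean per p-value."""
    m = len(pvalues)
    q1 = q / (1 + q)                      # stage 1 runs at a reduced level
    r1 = bh_reject_count(pvalues, q1)
    if r1 == 0:
        return [False] * m
    if r1 == m:
        return [True] * m
    m0_hat = m - r1                       # estimated number of true nulls
    q2 = q1 * m / m0_hat                  # stage 2 level, adapted to m0_hat
    r2 = bh_reject_count(pvalues, q2)
    threshold = sorted(pvalues)[r2 - 1]
    return [p <= threshold for p in pvalues]


# Invented p-values for five off-target sites.
print(bky_two_stage([0.001, 0.008, 0.039, 0.041, 0.60]))
```

With these invented values, stage 1 rejects two hypotheses, which raises the stage 2 level enough to call four of the five sites discoveries.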

Data Availability

High-throughput sequencing reads will be deposited in the NCBI Sequence Read Archive database prior to publication.

Chapter 4: Discussion and future directions

Moving forward, the field of gene editing is well-positioned to create vast amounts of knowledge detailing the specifics of how cells and cellular systems function at the molecular level. Unquestionably, the biggest contribution of the CRISPR-Cas9 platform is the new-found ability of any cell or molecular biology laboratory to quickly and efficiently mutagenize a given genetic element in cultured cells or whole organisms by providing Cas9 with a particular gRNA molecule. This system allows researchers to rapidly and precisely dissect the functions of genetic elements, even allowing for the precise determination of the function of short nucleotide motifs within larger genetic elements with unparalleled accuracy. No other programmable nuclease platform offers similar ease of use in targeting unique genetic elements, and as a result CRISPR-Cas9 will have an outsized impact on biological research relative to its potential uses in therapeutic genome editing. The development of novel methods to regulate and improve the activities of designer genome editing reagents with respect to precise on-target editing, degree of off-target editing and cell type-dependent activity is paramount to the successful implementation of these reagents in the laboratory to study biology and disease through precise genome editing, and also to the potential translation of these technologies to therapies intended to correct pathological genetic mutations in humans. Because designer nucleases such as CRISPR-Cas9 and base editor technologies are prone to off-target mutagenesis distributed throughout the genome, it is critical that significant progress is made towards restricting editing to the on-target genomic site. This is likely to be the most important issue facing the successful translation of these technologies to therapeutic settings.

The development of RNA-guided nucleases that have activity dependent on specific cellular or chromatin contexts represents a step forward in enabling tissue-specific genome editing when the nuclease technology is delivered systemically or encoded in a gene drive. Though it is possible to deliver designer nucleases packaged in viruses that have tropism towards a specific cell or tissue type (Ellis et al., 2013; Naso et al., 2017; Ran et al., 2015a; Vasileva and Jessberger, 2005), these viruses fail to exclusively deliver their payloads to the desired target tissue or cells, and they often have significant off-target tropism, especially in the liver. For some applications, it may be necessary to nearly or entirely avoid editing in off-target tissue or cell types, making additional layers of regulation of nuclease activity essential. Though there may be specific applications in which engineering nucleases dependent on specific epigenetic contexts proves critical, a different approach that conditionally stabilizes the expression of Cas9 protein based on the co-expression of a second protein may enable a more generalizable strategy for cell type-dependent nuclease activity (Tang et al., 2016).

Base editors incorporating the eA3A domain leverage the natural diversity of cytidine deaminase domains to greatly improve the precision of on-target base editing, enabling single nucleotide editing events at the vast majority of targeted DNA molecules with a large number of gRNAs. Critically, the development of eA3A-BEs required the use of common protein engineering techniques to enhance the native sequence preferences of the APOBEC3A cytidine deaminase domain in the context of the base editor architecture, indicating that it may be necessary to employ protein engineering of native deaminase domains in order to achieve certain desirable characteristics. Though it may be possible to use more complex protein engineering techniques to re-program the sequence preferences of the eA3A deaminase domain such that all nucleotide contexts can be recognized by discrete base editor proteins, it may also be necessary to employ orthogonal cytidine deaminase domains with divergent sequence preferences to access some of the available sequence space. There is an abundance of cytidine deaminase domains that have been identified from many organisms, many with their own unique sequence preferences (Table 2), and the work outlined here may also aid in further protein engineering efforts utilizing divergent deaminase domains. Interestingly, the most common 5’ nucleotide to a disease-causing SNP is a C (Supplementary Fig. 10), raising the possibility that human APOBEC3G might be used to selectively edit the second C in 5’ CC motifs, greatly expanding the targeting range of the sequence-specific deaminase platform. Further, it may also be possible to apply the concept of sequence-specific deaminase domains to the recently described A to G base editors (Gaudelli et al., 2017), enabling precise, single nucleotide correction of a different class of SNPs for research or therapeutic purposes.

Ortholog      Nucleotide sequence preference
hAID          5’-WRC
rAPOBEC1*     5’-TC ≥ CC ≥ AC > GC
mAPOBEC3      5’-TYC
hAPOBEC3A     5’-TCY
hAPOBEC3B     5’-TCR > TCT
hAPOBEC3C     5’-WYC
hAPOBEC3F     5’-TTC
hAPOBEC3G     5’-CCC
hAPOBEC3H     5’-TTCA ~ TTCT ~ TTCG > ACCCA > TGCA

Table 2. Previously characterized cytidine deaminase domains and their substrate sequence preferences. Nucleotide positions that are poorly specified or are permissive of two or more nucleotides are annotated according to IUPAC codes, where W = A or T, R = A or G, and Y = C or T.
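The IUPAC-coded preferences in Table 2 can be checked against a DNA sequence with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, the position of the deaminated C within each motif must be supplied by the caller (the table does not mark it), and only the IUPAC codes used in the table are included.

```python
# IUPAC codes used in Table 2, mapped to the bases they permit.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "R": "AG", "Y": "CT"}


def matches_motif(sequence, c_index, motif, c_offset):
    """Check whether the C at sequence[c_index] sits within an IUPAC motif.

    motif: motif written 5' to 3', e.g. "TCY".
    c_offset: 0-based position of the deaminated C within the motif
              (an assumption supplied by the caller).
    """
    sequence = sequence.upper()
    if sequence[c_index] != "C" or motif[c_offset] != "C":
        return False
    start = c_index - c_offset
    end = start + len(motif)
    if start < 0 or end > len(sequence):
        return False  # motif would run off the end of the sequence
    return all(base in IUPAC[code] for base, code in zip(sequence[start:end], motif))


# Invented 5-nt contexts with the candidate C at index 3.
print(matches_motif("GATCT", 3, "TCY", 1))  # True: TCT fits 5'-TCY
print(matches_motif("GATCA", 3, "TCY", 1))  # False: A is not permitted by Y
print(matches_motif("GAGCT", 3, "WRC", 2))  # True: AGC fits 5'-WRC
```

Ranked preferences such as 5’-TCR > TCT could be handled by testing the motifs in order and returning the first one that matches.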

Precise on-target base editing enables high-efficiency disease modeling and correction at the single nucleotide level. The ability to reliably manipulate the smallest component of the genome will be critical to the gene editing field moving forward; however, there are critical areas pertaining to off-target mutagenesis that must be addressed with base editor proteins. Because of the APOBEC enzymes’ natural ability to bind and deaminate cytidines in genomic DNA and cytidines in RNA, non-specific spurious deamination events are a possibly important source of off-target mutagenesis in the genome and transcriptome from BE technology. In theory, even if the BE’s nCas9 domain is eminently specific, this might do nothing to prevent the natural RNA- and ssDNA-targeting ability of the APOBEC enzyme from non-specifically deaminating globally across the transcriptome or whichever regions of the genome are exposed as ssDNA, such as actively transcribed regions or DNA undergoing replication. In fact, an E. coli-based assay examining deaminases showed that an actively transcribed region could be highly enriched (~7-530 fold) for C to T transition mutations when exposed to various overexpressed mammalian cytidine deaminase enzymes (Harris et al., 2002). Further, one group has found that co-expression of PmCda1 and nCas9 as two separate, untethered proteins in yeast cells results in similar levels of deamination at the sgRNA-specified target site as when the two components are expressed as direct fusion partners, demonstrating that these proteins are capable of deaminating ssDNA from solution without an affinity tether to the genomic location (Nishida et al., 2016). This concern is especially relevant now that scientists are becoming increasingly aware that R-loops are a more common occurrence in the genomes of eukaryotic cells than previously thought, thus creating many potential steady-state off-target ssDNA substrates where an APOBEC could bind and deaminate (Santos-Pereira and Aguilera, 2015).

While it is as yet unproven whether BE overexpression itself can sufficiently stimulate spurious deamination and mutagenesis on a global genomic scale, aberrant and over-active APOBEC deaminase activity is a known driver of tumorigenic mutagenesis (Rebhandl et al., 2015) and overexpression of at least hAPO3 has been shown to stimulate genomic cytidine hypermutation (Aynaud et al., 2012; Holtz et al., 2013; Shinohara et al., 2012; Suspène et al., 2005). Further, aberrant expression of cytidine deaminases has been associated with spurious deamination of RNA distributed throughout the transcriptome (Yamanaka et al., 1995), and this may result in the production of autoantigens implicated in human disease (Roth et al., 2018). Thus, it stands to reason that limiting the naturally global deaminating activity of over-expressed deaminases like BE will be important for translating BE technologies into therapeutic applications. Of note, since BE proteins include the UGI inhibitor to bias deamination events toward productive C to T mutations, it is possible that global off-target BE activity is even more mutagenic than the effects of aberrant deaminase activity alone during tumorigenesis.

The work outlined in this thesis will enable researchers to more precisely modify cellular genomes in the context of ex vivo delivery to cell populations or by delivery to whole organisms. However, this work also outlines only proof-of-concept experiments intended to demonstrate the efficacy of strategies to engineer more precise genome editing reagents. In particular, it will be critical to develop a suite of deaminase domains that recognize cytidines embedded in all possible di- or trinucleotide sequence contexts to incorporate into base editor architectures and to determine whether spurious genomic or transcriptomic deamination is occurring, and whether there exists a viable strategy to stop it. These strategies will greatly improve targeting range of the sequence-specific deaminase-BE platform pioneered by eA3A, enabling single nucleotide correction of a greater number of pathologic single nucleotide polymorphisms, and also enhance the safety profile of base editor proteins in their transition towards use in therapeutic genome editing.


Supplementary Tables

Supplementary Table 1: All on-target sites investigated by the EGFP disruption assay or T7E1 assay and sgRNA scaffold sequences for Cas9 orthologs.

Supplementary Table 2: All standard deviation values derived from high-throughput sequencing reads of the 12 endogenous sites depicted in Figure 5d.

Supplementary Table 3: All off-target sites investigated by high-throughput sequencing in this study identified by name and sequence. Amplicon oligonucleotides represent the forward and reverse primers used to amplify each site from genomic DNA. Sites are organized by gRNA then by the method used to discover each off-target site.

Supplementary Table 4: Table of C to T editing efficiencies for each off-target site investigated in this study for BE3 or untransfected control. Statistically significant differences in editing efficiencies observed between BE3 and the untransfected control are indicated by p-value (p < 0.05).

Supplementary Table 5: All on-target sites investigated by high-throughput sequencing in this study identified by name and sequence. Amplicon oligonucleotides represent the forward and reverse primers used to amplify each site from genomic DNA.

Supplementary Table 6: All disease-causing SNPs from the ClinVar database that are targetable using an NGG, NGA, or NGT PAM sequence and can be corrected using C to T base editing wherein the causal SNP is also preceded by a 5’ T.
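The selection rule described for Supplementary Table 6 can be illustrated with a minimal sketch. It assumes a 20-nt protospacer followed by a 3-nt PAM and an editing window spanning protospacer positions 4 to 8; the window, the function names, and the single-strand search are illustrative assumptions and are not taken from the thesis. A fuller analysis would also scan the opposite strand.

```python
# PAMs named in the Supplementary Table 6 caption (N = any base).
PAMS = ("NGG", "NGA", "NGT")


def matches_pam(triplet, pam):
    """Return True if a 3-nt sequence matches a PAM pattern with N wildcards."""
    return len(triplet) == len(pam) and all(
        p == "N" or p == s for p, s in zip(pam, triplet)
    )


def is_targetable(seq, c_index, window=(4, 8), spacer_len=20):
    """Check whether the C at c_index has a 5' T and a compatible PAM.

    window is the ASSUMED editing window, given as 1-based protospacer
    positions; it is a hypothetical parameter, not a value from the thesis.
    """
    seq = seq.upper()
    # The target base must be a C that is preceded by a 5' T.
    if c_index == 0 or seq[c_index] != "C" or seq[c_index - 1] != "T":
        return False
    # Try every protospacer that places the C inside the assumed window.
    for pos in range(window[0], window[1] + 1):
        start = c_index - (pos - 1)
        pam_start = start + spacer_len
        if start < 0 or pam_start + 3 > len(seq):
            continue
        if any(matches_pam(seq[pam_start:pam_start + 3], p) for p in PAMS):
            return True
    return False
```

For example, a 23-nt sequence with a TC dinucleotide at protospacer positions 4 and 5 and a trailing AGG would pass this check, whereas the same sequence with an AC dinucleotide or a trailing ACC would not.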


Supplementary Table 1

Target                            Target site
CjCas9 EGFP site 1                GGTGGCATCGCCCTCGCCCTCG
CjCas9 EGFP site 2                gCCGCGCCGAGGTGAAGTTCGAG
CjCas9 EGFP site 3                GCAGCTCGCCGACCACTACCAG
NcCas9 EGFP site 1 - 23 nt        GCAGATGAACTTCAGGGTCAGCT
NcCas9 EGFP site 2 - 23 nt        GGGGTAGCGGCTGAAGCACTGCA
NcCas9 EGFP site 3 - 23 nt        GAAGTCGTGCTGCTTCATGTGGT
NcCas9 EGFP site 4 - 23 nt        GCCGTCGCCGATGGGGGTGTTCT
NcCas9 EGFP site 1 - 20 nt        GATGAACTTCAGGGTCAGCT
NcCas9 EGFP site 2 - 20 nt        GTAGCGGCTGAAGCACTGCA
NcCas9 EGFP site 3 - 20 nt        GTCGTGCTGCTTCATGTGGT
NcCas9 EGFP site 4 - 20 nt        GTCGCCGATGGGGGTGTTCT
PmCas9 EGFP site 1 - 23 nt        GGCGAGGAGCTGTTGGGGT
PmCas9 EGFP site 2 - 23 nt        GGCGATGCTACGGCAAGCT
PmCas9 EGFP site 3 - 23 nt        gCAGGGCACGGGCAGCTTGCCGGT
PmCas9 EGFP site 4 - 23 nt        GGGTCTTTGCTCAGGGCGGACTG
PmCas9 EGFP site 1 - 20 nt        GAGGAGCTGTTGGGGT
PmCas9 EGFP site 2 - 20 nt        GATGCTACGGCAAGCT
PmCas9 EGFP site 3 - 20 nt        GGCACGGGCAGCTTGCCGGT
PmCas9 EGFP site 4 - 20 nt        GTCTTTGCTCAGGGCGGACTG
PlCas9 EGFP site 1 - 23 nt        GAGGAGCTGTTGGGGTGGT
PlCas9 EGFP site 2 - 23 nt        GCGGACTTGAAGAAGTCGTGCTG
PlCas9 EGFP site 3 - 23 nt        GCCGACCACTACCAGCAGAACAC
PlCas9 EGFP site 4 - 23 nt        GCGGCGGTCACGAACTCCAGCAG
PlCas9 EGFP site 1 - 20 nt        GAGCTGTTGGGGTGGT
PlCas9 EGFP site 2 - 20 nt        GACTTGAAGAAGTCGTGCTG
PlCas9 EGFP site 3 - 20 nt        GACCACTACCAGCAGAACAC
PlCas9 EGFP site 4 - 20 nt        GCGGTCACGAACTCCAGCAG
ClCas9 EGFP site 1 - 23 nt        GCCACAAGTTCAGCGTGTCCGGC
ClCas9 EGFP site 2 - 23 nt        GAAGTCGTGCTGCTTCATGTGGT
ClCas9 EGFP site 3 - 23 nt        GTAGTGGTTGTCGGGCAGCAGCA
ClCas9 EGFP site 4 - 23 nt        GACCATGTGATCGCGCTTCTCGT
ClCas9 EGFP site 1 - 20 nt        gACAAGTTCAGCGTGTCCGGC
ClCas9 EGFP site 2 - 20 nt        GTCGTGCTGCTTCATGTGGT
ClCas9 EGFP site 3 - 20 nt        GTGGTTGTCGGGCAGCAGCA
ClCas9 EGFP site 4 - 20 nt        gCATGTGATCGCGCTTCTCGT
T7E1 PCR amplification primer 1   ATGGTGAGCAAGGGCGAG
T7E1 PCR amplification primer 2   CACATTGATCCTAGCAGAAGCAC
CjCas9 sgRNA scaffold             GTTTTAGTCCCTGAAGGGACTAAAATAAAGAGTTTGCGGGACTCTGCGGGGTTACAATCCCCTAAAACCGCTTTTTT
                                  T
ClCas9 sgRNA scaffold             GTTTTAGTCTCTGAAAAGAGACTAAAATAAGTGGTTTTTGGTCATCCACGCAGGGTTACAATCCCTTTAAAACCATT
                                  AAAATTCAAATAAACTAGGTTGTATCAACTTAG
NcCas9 sgRNA scaffold             GTTGTAGCTCCCATTCTCGAAAGAGAACCGTTGCTACAATAAGGCCGTCTGAAAAGATGTGCCGCAACGCTCTGCC
                                  CCTTAAAGCTTCTGCTTTAAGGGGCATCGTTTATTTCGGTTAAAAATGCCGTCTGAAACCGGTTTTTAGGTTTCAGA
                                  CGGCA
PlCas9 sgRNA scaffold             GCTGCGGATTGCGGGAAATCGCTTTTCGCAAGCAAATTGACCCCTTGTGCGGGCTCGGCATCCCAAGGTCAGCTGC
                                  CGGTTATTATCGAAAAGGCCCACCGCAAGCAGCGCGTGGGCC
Pm sgRNA scaffold                 GTTGTAGTTCCCTCTCTCATTTCGCAGTGCTACAATGAAAATTGTTGCACTGCGAAATGAGAGACGTTGCTACAATA
                                  AGGCTTCTGAAAAGAAGACCGTAACGCTCTGCCCCTTGTGATTCTTAATTGCAAGGGGCATCG
Spast sgRNA scaffold              GTTTTTGTACTCGAAAGAGCCTACAAAGATAAGGCTTTATGCCGAATTCAAGCACCCCATGTTTTGACATGAGGTG
                                  C
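Most of the paired "23 nt" and "20 nt" entries in Supplementary Table 1 appear to be related by a simple rule: the 20 nt site is the 3'-most 20 nucleotides of the 23 nt site, with a lowercase g prepended when that truncated sequence does not already begin with G. The sketch below encodes that observed pattern. The rule and the function name are inferred from the table rows and are assumptions, not a procedure stated in the thesis, and a few rows (for example PmCas9 site 4) do not follow it exactly.

```python
def truncate_spacer(site, length=20):
    """Keep the 3'-most `length` nucleotides of a target site.

    ASSUMPTION: a lowercase 'g' is prepended when the truncated site does
    not start with G, mirroring the lowercase g seen in some table rows.
    """
    core = site[-length:].upper()
    return core if core.startswith("G") else "g" + core


# Rows taken from Supplementary Table 1.
print(truncate_spacer("GCAGATGAACTTCAGGGTCAGCT"))  # NcCas9 EGFP site 1
print(truncate_spacer("GCCACAAGTTCAGCGTGTCCGGC"))  # ClCas9 EGFP site 1
```

Applied to the NcCas9 site 1 and ClCas9 site 1 rows, this reproduces the corresponding 20 nt entries listed in the table.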


Supplementary Table 2

PPP1R12C site 1 Sample Nt G A C T C A C C C A 0.00354 0.00259 0.00238 0.00664 0.02600 0.00075 0.00060 0.00557 BE3 A 0 0 7 3 8 3 7 4 4 2 0.00619 0.00894 0.08213 0.00185 0.09274 0.00838 0.02217 0.00108 T 0 0 5 6 7 6 2 8 4 7 0.00023 0.00252 0.07495 0.00186 0.00919 0.02157 0.00553 C 0 0 9 2 6 1 0.12138 2 7 2 0.00240 0.00479 0.00292 0.00263 0.00155 7.45E- 0.00112 G 0 0 8 0.00383 3 6 2 9 06 7 0.00016 4.12E- 0.00520 0.00224 0.00939 0.00021 2.74E- 0.00263 YE1 BE3 A 0 0 2 05 9 2 7 4 05 3 0.00223 0.00070 0.00224 0.01382 0.00252 0.00086 0.00027 T 0 0 3 0.00095 4 2 4 9 6 4 0.00195 0.00322 0.00022 0.00254 0.00191 0.00104 0.00221 C 0 0 2 0.00055 3 5 8 1 3 6 0.00011 0.00044 0.00772 0.00022 0.00187 0.00083 0.00014 G 0 0 9 2 9 5 8 2 0.00015 4 0.00029 0.00019 0.00132 0.00047 0.00019 6.04E- 0.00040 0.00197 YE2 BE3 A 0 0 7 9 5 5 1 05 4 9 0.00051 0.05499 0.00017 0.00161 0.02857 0.00020 0.00044 T 0 0 6 0.00136 7 1 6 9 6 7 3.29E- 0.00056 0.04940 0.00029 0.02883 0.00055 0.00252 C 0 0 05 1 5 7 0.00738 1 2 7 0.00025 0.00426 0.00034 0.00557 0.00031 0.00035 0.00099 G 0 0 2 0.0006 7 9 3 3 5 6 0.00105 0.00040 0.00060 0.00102 0.00176 0.00023 6.16E- 0.00205 YEE BE3 A 0 0 7 2 8 3 7 3 05 7


0.00069 5.56E- 0.01114 0.00027 0.02248 0.00389 0.00044 0.00013 T 0 0 7 05 2 8 9 1 2 9 0.00064 0.00772 0.00032 0.02393 0.00375 0.00166 C 0 0 6 0.00056 4 5 5 9 0.00086 2 0.00028 0.00010 0.00042 0.00032 0.00036 0.00035 0.00025 G 0 0 6 3 0.00281 1 2 4 6 6 0.00088 0.00049 0.00179 0.00250 0.00117 8.06E- 0.00011 0.00510 A3A (N57A)-BE3 A 0 0 4 3 5 1 5 05 4 1 0.00127 0.00232 0.00558 0.00221 0.00637 0.00321 0.00389 0.00019 T 0 0 6 9 2 7 7 7 4 3 0.00022 0.00127 0.02002 0.00034 0.00473 0.00258 0.00252 0.00405 C 0 0 8 6 2 6 6 7 2 7 0.00016 0.01264 6.25E- 0.00046 0.00148 0.00085 G 0 0 4 0.00056 4 05 5 0.00055 7 1 A3A (N57Q/Y130F)- 0.00072 0.00134 0.00746 0.00153 0.00229 8.71E- 0.00212 BE3 A 0 0 2 2 6 5 5 05 0.00017 6 0.00081 0.00058 0.05266 0.00132 0.01244 0.00631 0.00031 T 0 0 9 8 3 5 9 2 0.00133 1 0.00035 0.04869 0.00072 0.01504 0.00668 0.00119 0.00105 C 0 0 2 0.00073 2 2 3 5 3 1 0.00025 2.43E- 0.00349 0.00051 0.00029 0.00028 3.35E- 0.00076 G 0 0 4 05 6 2 8 6 05 4 1.76E- 0.00174 0.00158 0.00042 0.00052 0.00017 0.00011 0.00157 eA3A-BE3 A 0 0 05 1 3 7 7 3 4 9 0.00612 0.00315 0.07331 0.00034 0.00185 0.00304 2.05E- T 0 0 4 4 5 8 6 9 0.00061 05 0.00571 0.00180 0.05817 0.00048 0.00093 0.00060 0.00136 C 0 0 3 7 5 2 3 0.00239 3 8 0.00039 0.00039 0.01355 0.00029 0.00039 0.00048 0.00023 G 0 0 4 4 6 3 6 7 0.00012 2


PPP1R12C site 2 Sample Nt G T C C G A C T C G 0.0009 0.0022 0.0003 0.0014 0.0019 0.0013 0.0015 0.0004 0.0003 BE3 A 51 0.0019 3 34 02 97 34 29 58 41 0.0004 0.0020 0.0237 0.0107 0.0012 0.0014 0.0257 0.0013 0.0028 0.0014 T 73 06 29 53 38 62 86 65 29 88 0.0017 0.0001 0.0231 0.0119 0.0002 0.0002 0.0254 0.0008 0.0030 0.0009 C 52 61 8 23 63 35 83 27 58 72 0.0020 0.0002 0.0009 0.0012 0.0014 0.0003 0.0046 0.0001 0.0002 0.0018 G 12 55 25 14 48 77 21 12 38 17 0.0013 0.0023 0.0006 0.0003 0.0006 0.0006 0.0012 0.0013 0.0002 0.0012 YE1 BE3 A 44 87 34 62 07 98 83 26 53 85 0.0008 0.0018 0.0035 0.0013 0.0007 0.0010 0.0310 0.0026 0.0037 0.0014 T 73 63 09 26 87 59 89 11 08 1 0.0014 7.26E- 0.0033 0.0002 0.0001 0.0008 0.0269 0.0012 0.0037 0.0011 C 77 05 01 27 28 03 62 63 03 45 0.0022 0.0011 0.0001 0.0009 0.0003 9.52E- 0.0029 0.0001 0.0001 0.0013 G 17 13 14 34 3 05 04 75 5 96 0.0007 0.0020 0.0010 5.23E- 0.0010 0.0037 0.0002 0.0012 7.45E- 0.0009 YE2 BE3 A 6 01 26 05 81 84 77 69 05 61 0.0012 0.0020 0.0010 0.0013 0.0022 0.0020 0.0112 0.0014 0.0016 0.0011 T 32 58 77 23 23 35 4 52 57 81 0.0011 0.0001 0.0017 0.0013 0.0001 0.0011 0.0115 0.0007 0.0020 0.0007 C 44 88 01 01 94 41 55 55 04 55 0.0024 0.0002 0.0006 0.0010 0.0018 0.0006 0.0004 0.0005 0.0025 G 03 26 96 48 58 16 0.0014 48 67 11 0.0017 0.0005 0.0005 0.0001 0.0010 0.0007 0.0001 0.0010 0.0004 0.0004 YEE BE3 A 13 95 49 1 26 95 43 62 73 47 0.0003 0.0011 0.0006 0.0006 0.0010 0.0012 0.0120 0.0019 0.0003 0.0005 T 59 32 56 61 74 57 54 99 8 75


0.0005 0.0003 0.0005 0.0001 0.0003 0.0005 0.0134 0.0003 0.0005 0.0007 C 82 09 84 96 67 1 88 06 77 71 0.0011 0.0001 0.0004 0.0005 0.0004 0.0014 0.0006 9.36E- 0.0004 G 0.0021 22 13 66 7 09 03 61 05 22 0.0019 0.0003 0.0011 0.0001 0.0013 0.0014 7.29E- 0.0014 0.0004 0.0001 A3A (N57A)-BE3 A 84 35 18 76 85 02 05 93 15 12 0.0014 0.0004 0.0035 0.0022 0.0001 0.0003 0.0025 0.0015 0.0169 0.0009 T 27 14 63 86 55 71 55 36 69 92 0.0006 0.0001 0.0039 0.0026 0.0002 0.0008 0.0045 0.0007 0.0171 0.0006 C 41 61 51 54 79 83 18 07 49 36 0.0015 9.55E- 0.0006 0.0013 0.0012 0.0002 0.0019 0.0002 0.0004 0.0011 G 06 05 83 94 11 56 74 47 64 99 A3A (N57Q/Y130F)- 0.0004 0.0013 0.0005 9.36E- 0.0002 0.0009 0.0007 0.0010 0.0004 0.0006 BE3 A 43 29 72 05 41 14 08 02 14 63 0.0003 0.0019 0.0031 0.0020 0.0010 0.0012 0.0015 0.0010 0.0010 0.0012 T 19 15 01 86 19 52 97 22 55 68 0.0013 0.0004 0.0029 0.0018 0.0003 0.0002 0.0015 0.0002 0.0009 0.0004 C 58 46 38 59 11 55 33 36 37 79 0.0012 0.0010 0.0003 0.0002 0.0013 0.0001 0.0002 4.09E- 0.0001 0.0015 G 58 18 57 54 23 56 07 05 45 28 0.0019 0.0022 0.0011 6.87E- 0.0012 0.0016 0.0003 0.0009 0.0004 0.0004 eA3A-BE3 A 57 22 39 05 16 27 69 06 51 09 0.0001 0.0022 0.0086 0.0020 0.0009 0.0011 0.0019 0.0006 0.0271 0.0011 T 51 55 07 15 51 7 44 12 84 29 0.0013 7.99E- 0.0092 0.0028 0.0002 0.0006 0.0002 0.0003 0.0265 0.0003 C 1 05 25 9 35 83 85 17 28 82 0.0008 0.0001 0.0007 0.0014 0.0016 0.0003 0.0018 0.0003 0.0004 0.0011 G 42 8 23 39 32 21 44 77 72 33


PPP1R12C site 3 Sample Nt G A C C C T C A G C 0.00029 0.00019 0.00236 6.19E- 0.00610 0.00594 0.00080 0.00044 BE3 A 0 0 5 3 4 05 7 9 5 8 0.00583 0.02080 0.03313 0.00036 0.04008 0.00575 0.00319 2.12E- T 0 0 2 3 3 6 5 7 9 05 0.00440 0.00012 0.03471 7.63E- 0.00063 C 0 0 5 0.01974 0.02916 7 9 0.00015 05 2 0.00172 0.00017 0.00074 4.16E- 0.00231 0.00016 G 0 0 2 0.00087 0.00161 7 1 05 7 3 0.00169 5.39E- 0.00135 4.89E- 0.00017 YE1 BE3 A 0 0 4.4E-05 1 0.00095 05 5 05 0.00012 4 0.00203 0.02344 0.01465 0.00021 0.02090 4.45E- 6.72E- T 0 0 7 8 1 8 6 05 0.00402 05 0.00479 0.02624 0.00374 2.22E- 0.01996 3.29E- 1.91E- 0.00010 C 0 0 8 1 4 05 1 05 06 8 0.00271 0.00110 0.01185 3.74E- 0.00414 1.31E- G 0 0 7 2 7 0.00025 0.00041 05 2 06 9.42E- 0.00181 0.00175 7.17E- 0.00461 0.00027 YE2 BE3 A 0 0 05 4 3 06 0.00605 3 0.00034 1 0.00168 0.00809 0.02368 0.00598 0.00100 9.31E- 6.55E- T 0 0 3 9 8 8 7 0.00465 05 06 0.00181 0.01188 0.02866 3.95E- 0.00529 8.68E- 1.36E- 0.00016 C 0 0 8 2 9 05 1 06 05 8 0.00359 0.00196 0.00673 0.00595 0.00024 4.59E- 0.00044 9.72E- G 0 0 5 9 4 6 8 05 6 05 1.64E- 7.74E- 3.21E- 0.00014 0.00396 0.00511 6.18E- 0.00025 YEE BE3 A 0 0 05 05 05 2 8 2 05 9 0.00405 0.00080 0.00068 0.00057 0.00495 0.00025 3.92E- T 0 0 8 1 0.01786 4 4 7 2 05


0.00804 0.00422 0.00618 6.61E- C 0 0 2 5 0.00957 6.3E-05 5 05 9E-05 0.00038 0.00334 0.00825 0.00047 0.00164 8.95E- 0.00040 G 0 0 0.004 6 8 9 3 05 3 0.00016 6.16E- 2.19E- 8.88E- 0.00377 0.00010 3.89E- 0.00016 A3A (N57A)-BE3 A 0 0 05 05 4.9E-05 05 8 7 05 2 0.00084 0.01147 0.01369 6.31E- 0.04431 7.84E- 0.00542 1.03E- T 0 0 5 8 5 06 9 05 5 05 0.00384 0.01356 0.02047 0.00015 0.04891 6.87E- 0.00013 9.56E- C 0 0 4 6 9 1 2 05 5 05 0.00306 0.00673 0.00023 3.97E- 0.00532 7.63E- G 0 0 1 0.00211 5 4 0.00837 05 9 05 A3A (N57Q/Y130F)- 0.00015 0.00015 4.51E- 0.00402 0.00500 0.00011 0.00045 BE3 A 0 0 9 1 0.00098 05 8 6 7 9 0.00178 0.00158 0.00399 0.00017 0.00390 0.00500 7.38E- 0.00010 T 0 0 7 2 8 9 5 4 05 2 0.00029 0.00332 0.00802 2.42E- 0.00214 2.08E- 5.11E- 0.00039 C 0 0 3 7 3 05 8 05 05 8 0.00223 0.00189 0.00500 0.00010 0.00202 1.87E- 0.00013 4.09E- G 0 0 9 6 5 9 5 05 9 05 0.00017 0.00026 7.96E- 3.48E- 0.00175 0.00354 5.38E- 7.16E- eA3A-BE3 A 0 0 7 7 05 05 6 7 05 06 0.00028 0.00078 0.01904 0.00099 0.02099 0.00364 0.00081 0.00108 T 0 0 3 8 5 9 5 4 1 1 0.02094 5.89E- 0.03519 2.58E- 4.51E- 0.00116 C 0 0 0.00332 0.00214 1 05 5 05 05 2 0.00161 0.00197 0.00090 0.01244 7.17E- 7.36E- G 0 0 0.00286 9 6 5 4 05 0.00091 05


PPP1R12C site 4

Sample                Nt  G        C        T        C        T        C        A        G        C        C
BE3                   A   0        0        0.00054  0.00096  0.00037  0.00091  0.001    2.3E-05  0.00031  0.00048
                      T   0        0        0.00075  0.01371  0.00089  0.0172   0.00072  0.00018  0.00063  0.00049
                      C   0        0        0.00021  0.01454  3.9E-05  0.01886  7.7E-05  0.00019  0.00049  0.00131
                      G   0        0        0.00012  0.00165  0.00051  0.00125  0.00027  0.00034  4.4E-05  0.00043
YE1 BE3               A   0        0        0.00036  0.00021  7.3E-05  0.00079  0.00063  0.00013  0.00024  8.8E-05
                      T   0        0        0.00064  0.04276  0.00071  0.03404  0.00051  0.0004   0.0006   0.00015
                      C   0        0        0.00014  0.04203  4.7E-05  0.03349  0.00013  9.3E-05  0.00068  0.00049
                      G   0        0        0.00032  0.00099  0.00081  0.00091  3.1E-05  0.00062  0.00028  0.00035
YE2 BE3               A   0        0        0.00015  0.00085  7.3E-05  0.0014   0.0009   0.0001   0.00021  0.00029
                      T   0        0        0.00033  0.01221  0.00016  0.01302  0.00089  0.00023  0.0001   0.0007
                      C   0        0        0.00016  0.01406  0.00013  0.01135  0.00024  0.00019  0.00018  0.00099
                      G   0        0        0.00017  0.00239  0.00034  0.00062  0.00022  0.00023  0.00044  4.3E-05
YEE BE3               A   0        0        0.00013  0.00022  2.7E-05  0.00041  0.00057  4.6E-05  0.00031  0.00047
                      T   0        0        8.4E-05  0.01313  0.00092  0.02159  0.00094  0.00018  4.3E-05  0.00139
                      C   0        0        0.0001   0.01415  0.00015  0.02205  0.00129  0.00012  0.00039  0.00095
                      G   0        0        0.00019  0.0011   0.00087  0.00116  0.00062  0.00021  0.00027  3.7E-05
A3A (N57A)-BE3        A   0        0        0.00017  7.3E-05  5.4E-05  0.00147  0.00066  7.7E-05  0.00019  0.00035
                      T   0        0        0.00044  0.00115  0.00106  0.02157  0.00085  0.00021  0.00086  0.00079
                      C   0        0        0.00041  0.00145  0.00043  0.02604  0.00062  6.8E-05  0.00092  0.00084
                      G   0        0        0.00015  0.00188  0.00064  0.00588  0.0002   0.00022  0.00012  0.00012
A3A (N57Q/Y130F)-BE3  A   0        0        0.00011  0.00016  0.00038  0.00056  0.00012  9.1E-05  0.00055  0.00035
                      T   0        0        0.00027  0.00364  0.00026  0.0399   0.0009   0.00034  0.00113  0.00139
                      C   0        0        0.00017  0.00197  0.00022  0.04047  0.00122  0.00011  0.00135  0.00107
                      G   0        0        0.00023  0.00231  0.00036  0.00168  0.00024  0.00033  5E-05    0.00012
eA3A-BE3              A   0        0        0.00031  0.00073  0.00025  0.00141  0.00079  2.2E-05  0.00024  0.00015
                      T   0        0        0.00055  0.00345  0.00041  0.00466  0.00051  0.0003   0.00027  0.00071
                      C   0        0        0.00057  0.00737  0.00011  0.00556  0.00021  0.00027  0.00047  0.00101
                      G   0        0        0.00014  0.00382  0.00031  0.00123  0.00046  0.00023  0.00047  0.00017

PPP1R12C site 5 Sample Nt G C T G A C T C A G 0.00900 0.00104 0.00348 0.00527 8.15E- 0.00183 7.97E- BE3 A 0 0 4 6 4 6 05 7 05 0.00045 0.00725 0.00035 0.00447 0.02056 0.00222 0.02291 4.92E- 1.52E- T 0 0 8 6 6 7 4 8 05 05 0.00049 0.02537 0.00218 0.02497 6.71E- 6.91E- C 0 0 0.00029 0.00018 5 5 8 8 05 05 0.00203 0.00158 0.00148 0.00046 4.51E- 0.00389 3.66E- 0.00053 G 0 0 6 1 7 8 05 7 05 4 2.57E- 0.00180 0.00647 1.42E- 0.00161 0.00027 0.00012 YE1 BE3 A 0 0 05 4.5E-05 3 6 05 6 1 7 0.00236 0.00111 0.00342 0.00032 0.00844 0.00035 9.04E- T 0 0 1 6 2 0.02212 6 6 9 05 0.00012 0.00174 9.13E- 0.01869 1.93E- 0.00889 3.73E- 0.00218 C 0 0 3 1 05 8 05 9 05 4 0.00290 0.00305 0.00032 0.00116 5.01E- 0.00196 G 0 0 0.00251 2 0.00171 4 1 3 05 7 2.03E- 7.61E- 0.00053 0.00798 2.64E- 0.00025 7.91E- 7.81E- YE2 BE3 A 0 0 05 06 5 3 05 2 05 05 0.00029 0.00193 0.00702 0.01395 2.64E- 0.00021 T 0 0 0.00353 9 3 4 0.00059 6 05 5 3.55E- 0.01114 1.27E- 0.01166 8.22E- 4.57E- C 0 0 06 0.00012 0.00014 6 05 8 05 05 0.00355 0.00041 0.00260 0.00386 0.00057 0.00203 2.94E- 0.00024 G 0 0 4 1 8 1 6 6 05 8


0.00447 0.00448 0.00629 0.00117 1.39E- 0.00035 0.00141 0.00013 YEE BE3 A 0 0 3 8 8 1 05 3 6 4 5.66E- 0.00594 0.01997 0.00016 0.00862 1.78E- 0.00155 T 0 0 0.00293 05 1 5 5 2 05 3 0.00386 0.02145 8.22E- 0.00805 0.00010 5.28E- C 0 0 1 3.5E-06 5.9E-05 7 05 8 2 06 0.00231 0.00442 0.00041 0.00031 0.00023 0.00021 0.00142 G 0 0 8 8 6 1 3 1 0.0015 4 0.00139 0.00140 0.00531 0.00159 1.52E- 0.00395 0.00469 0.00024 A3A (N57A)-BE3 A 0 0 1 8 1 1 05 2 5 9 0.00019 1.65E- 0.00481 0.01199 0.00027 0.00308 0.00626 0.00145 T 0 0 8 05 5 5 8 5 9 6 0.00102 0.00041 0.01076 7.05E- 0.01465 5.09E- 6.87E- C 0 0 3 0.00609 8 7 05 5 05 05 0.00261 0.00466 0.00091 0.00036 0.00033 0.01378 0.00152 0.00177 G 0 0 2 5 4 3 3 8 4 4 A3A (N57Q/Y130F)- 0.00136 0.00129 0.00032 0.00227 3.29E- 0.00209 0.00080 0.00023 BE3 A 0 0 7 6 5 2 05 6 7 3 9.33E- 9.82E- 0.00030 7.54E- 4.88E- 0.00125 T 0 0 0.00094 05 05 2 05 0.00897 05 6 2.61E- 0.00019 0.00170 4.68E- 0.00816 5.37E- 1.36E- C 0 0 05 7.9E-05 2 2 05 7 05 05 0.00045 0.00146 0.00061 0.00087 0.00015 0.00289 0.00080 0.00147 G 0 0 3 8 6 2 5 9 2 5 7.19E- 0.00025 0.00012 0.00180 0.00074 0.00433 0.00371 6.27E- eA3A-BE3 A 0 0 05 9 2 3 2 8 9 05 0.00012 0.00027 0.00109 0.00075 0.06744 8.89E- 0.00341 T 0 0 0.00221 6 9 1 1 8 05 8 0.00526 1.24E- 0.00018 0.00024 0.07722 7.11E- 0.00151 C 0 0 4 05 3 0.00033 5 1 05 8


0.00312 0.00012 0.00033 0.00038 0.00025 0.00543 0.00373 0.00499 G 0 0 6 1 9 3 3 6 7 9

PPP1R12C site 6 Sample Nt G G G G C T C A A C 7.13E- 0.0002 0.0020 0.0019 0.0001 0.0023 0.0016 0.0003 0.0016 BE3 A 05 53 3.4E-05 03 36 7 62 45 99 65 0.0002 0.0004 0.0002 0.0002 0.0429 0.0006 0.0502 0.0002 2.96E- 0.0027 T 25 95 21 06 91 31 41 41 05 18 1.46E- 0.0003 7.17E- 0.0445 6.96E- 0.0542 0.0015 0.0004 0.0043 C 2.4E-05 05 4 05 29 05 82 36 18 28 0.0001 0.0003 0.0005 0.0019 0.0027 0.0004 0.0071 0.0002 0.0001 G 44 83 36 92 82 53 26 9.7E-05 15 63 0.0001 0.0003 9.82E- 0.0020 0.0029 0.0001 0.0013 0.0008 0.0002 0.0009 YE1 BE3 A 6 04 05 77 01 07 68 66 51 72 0.0003 0.0002 0.0005 0.0001 0.0321 0.0003 0.0307 0.0001 0.0001 0.0011 T 04 18 42 37 51 69 32 65 64 66 0.0007 2.41E- 9.23E- 0.0320 0.0001 0.0312 0.0007 0.0001 0.0010 C 6E-05 57 05 05 36 83 65 9 66 04 0.0004 0.0006 0.0005 0.0021 0.0005 0.0002 0.0149 6.79E- 0.0005 G 92 15 75 4 67 74 95 05 0.0002 77 0.0001 7.66E- 0.0006 0.0010 0.0001 0.0001 0.0020 0.0010 0.0002 0.0004 YE2 BE3 A 28 05 51 83 75 09 67 01 16 11 0.0002 8.28E- 0.0001 0.0475 0.0005 0.0592 2.03E- 0.0017 T 14 05 51 6.9E-05 75 32 88 3E-05 05 29 1.39E- 0.0001 0.0478 0.0001 0.0666 0.0010 0.0002 0.0014 C 05 0 3E-05 02 99 54 23 23 42 26 0.0001 0.0001 0.0005 0.0010 0.0004 0.0003 0.0082 0.0001 1.07E- 0.0001 G 16 25 05 41 03 28 39 42 05 02 3.17E- 0.0001 0.0001 0.0004 0.0008 0.0001 0.0008 0.0003 0.0001 0.0002 YEE BE3 A 05 19 54 95 62 55 41 74 98 46


0.0002 2.99E- 0.0001 0.0001 0.0130 0.0002 0.0197 3.99E- 2.07E- 0.0006 T 21 05 25 69 84 35 77 05 05 94 5.18E- 3.74E- 0.0001 0.0123 0.0002 0.0273 0.0005 5.27E- 0.0009 C 06 05 5.2E-05 46 24 23 6 41 05 75 0.0002 0.0003 0.0004 0.0007 0.0001 0.0070 0.0002 0.0002 8.37E- G 14 7.9E-05 17 88 38 47 3 13 08 05 0.0002 0.0006 0.0002 0.0010 0.0017 8.06E- 0.0032 0.0008 0.0012 0.0006 A3A (N57A)-BE3 A 06 33 08 8 51 05 41 91 04 92 0.0004 0.0002 7.49E- 0.0102 0.0004 0.0094 0.0005 0.0015 T 01 0.0007 41 05 86 81 77 5.6E-05 22 45 3.32E- 0.0003 0.0004 0.0101 0.0003 0.0164 0.0008 0.0006 0.0020 C 05 81 5.9E-05 33 61 22 39 71 49 42 0.0003 0.0012 0.0004 0.0015 0.0003 8.78E- 0.0131 8.23E- 0.0001 0.0002 G 22 31 74 57 85 05 36 06 95 06 A3A (N57Q/Y130F)- 0.0001 0.0001 0.0001 0.0013 0.0004 0.0001 0.0029 0.0018 0.0003 0.0014 BE3 A 84 81 01 66 43 73 27 26 54 09 0.0002 0.0005 0.0001 0.0001 0.0150 0.0005 0.0074 0.0001 0.0002 0.0007 T 84 35 13 03 77 44 44 67 49 22 1.95E- 1.44E- 1.98E- 5.45E- 0.0151 0.0003 0.0084 0.0016 0.0001 0.0020 C 05 05 05 05 6 29 65 52 37 1 0.0004 0.0004 0.0001 0.0013 0.0004 5.13E- 0.0103 0.0003 0.0001 0.0002 G 83 54 75 19 92 05 09 39 04 29 0.0001 0.0001 0.0002 0.0012 0.0005 5.99E- 0.0037 0.0014 0.0012 0.0003 eA3A-BE3 A 07 32 05 43 26 05 64 22 77 18 0.0002 0.0002 0.0007 3.05E- 0.0068 0.0005 0.0277 0.0012 0.0013 0.0017 T 46 2 55 05 69 07 47 94 41 73 3.88E- 0.0001 6.12E- 0.0001 0.0070 0.0003 0.0306 0.0013 0.0002 0.0021 C 05 18 05 08 41 1 42 36 05 62 0.0001 0.0004 0.0005 0.0011 0.0005 0.0001 0.0052 0.0006 0.0002 0.0001 G 5 46 85 33 75 68 74 71 35 23


PPP1R12C site 7 Sample Nt G G C A C T C G G G 0.00601 0.00134 0.00136 0.01234 0.00499 0.00046 BE3 A 0 0 0.00086 8 1 1 5 0.0004 3 8 0.00181 0.00576 0.00199 0.00178 0.00099 0.00731 0.00536 0.00522 T 0 0 1 7 4 4 7 9 1 4 0.00272 0.00020 0.00315 0.00013 0.01042 0.00039 0.00021 0.00067 C 0 0 4 9 9 6 1 3 9 1 5.26E- 4.18E- 0.00250 0.00028 0.00292 0.00811 0.01057 0.00636 G 0 0 05 05 7 7 2 2 3 3 5.59E- 0.00433 0.03223 0.00274 0.01989 0.00020 0.00039 0.00055 YE1 BE3 A 0 0 05 6 7 6 8 3 5 1 0.00498 0.00323 0.00080 0.00306 0.01690 0.00345 0.00356 T 0 0 1 2 8 8 4 0.00341 9 2 0.00502 0.00112 0.01505 2.45E- 0.00393 7.36E- 8.42E- 0.00025 C 0 0 8 1 2 05 5 05 05 8 8.97E- 1.71E- 0.01799 0.00034 0.00368 0.00326 G 0 0 06 05 3 7 0.00094 6 0.00377 9 0.00018 0.00263 0.03023 0.00311 0.00646 0.00015 0.00068 0.00027 YE2 BE3 A 0 0 4 6 4 4 1 9 8 1 0.00609 0.00247 0.08352 0.00346 0.10979 0.00232 0.00237 0.00228 T 0 0 1 1 8 8 1 2 8 8 0.00014 0.11665 0.00014 2.77E- 8.55E- 0.00026 C 0 0 0.0063 7 9 8 0.09815 05 05 2 2.49E- 1.75E- 0.00289 0.00020 0.00245 0.00315 0.00282 G 0 0 05 05 7 6 0.00518 3 1 1 0.00045 0.00395 0.01589 0.00040 0.02342 7.57E- 0.00048 0.00018 YEE BE3 A 0 0 4 7 7 6 8 05 3 6


0.00306 0.00371 0.05258 0.00149 0.06542 0.00397 0.00402 0.00422 T 0 0 3 3 2 1 2 4 8 5 0.00264 0.00016 0.08677 5.57E- 0.09954 1.78E- 0.00015 9.08E- C 0 0 4 6 3 05 4 06 3 05 3.51E- 7.85E- 0.01829 0.00102 0.01069 0.00435 0.00450 G 0 0 05 05 4 9 3 0.0039 7 2 0.00209 0.00461 0.00367 0.00285 0.00030 0.00068 0.00154 A3A (N57A)-BE3 A 0 0 5 0.00573 9 2 9 9 1 3 0.00603 0.00615 0.00250 0.00424 0.03201 0.00608 0.00606 0.00640 T 0 0 2 2 9 3 9 2 8 6 0.00398 0.00032 0.02071 9.83E- 0.02042 0.00012 0.00019 7.98E- C 0 0 8 8 6 05 4 2 3 06 5.16E- 9.46E- 0.01358 0.00047 0.01445 0.00651 0.00655 0.00485 G 0 0 05 05 7 2 4 4 6 6 A3A (N57Q/Y130F)- 0.00020 0.00042 0.00749 1.26E- 8.74E- 0.00011 BE3 A 0 0 5 3 4 0.00021 0.01131 05 05 9 0.00581 0.00058 0.02298 0.00016 0.02930 0.00066 0.00051 T 0 0 7 4 6 2 3 9 4 0.00052 0.00441 0.00021 0.00375 1.44E- 0.05436 0.00013 0.00011 C 0 0 7 8 4 05 3 5 0.00012 8 0.01924 3.29E- 0.00079 0.00072 0.00051 G 0 0 0.01044 5.7E-05 6 05 0.01375 1 2 9 0.00013 0.01701 0.00084 0.00104 0.00573 0.00040 0.00149 eA3A-BE3 A 0 0 2 5 1 2 6 7 4 0.00015 0.01189 0.00688 0.02979 0.00128 0.01416 0.00718 0.00724 T 0 0 1 2 5 5 1 4 6 0.00741 0.01173 0.01004 0.02352 6.56E- 0.01708 0.00034 0.00011 0.00061 C 0 0 2 2 4 07 2 6 9 5 2.72E- 9.06E- 0.00024 0.03697 0.00712 0.00885 0.00787 G 0 0 05 05 0.00543 3 9 3 9 4


PPP1R12C site 8 Sample Nt G A G C T C A C T G 0.0093 0.0002 0.0059 0.0097 0.0044 BE3 A 0 0 3 3 2.7E-05 5 6 4 6.3E-05 6.1E-05 0.0040 0.0028 0.0081 0.0195 0.0074 0.0009 T 0 0 7.4E-05 5 9 3 0.0002 3 2 4 0.0012 0.0003 0.0027 0.0216 0.0096 0.0075 0.0094 C 0 0 2 9 8 9 2 7 5 6E-05 0.0106 0.0042 0.0075 0.0020 0.0010 G 0 0 2 1 7.5E-05 0.0076 6.3E-05 3 9 6 0.0002 0.0085 0.0004 0.0019 YE1 BE3 A 0 0 1 0.0005 0.0034 2 6 2 6.7E-06 1.2E-05 0.0006 0.0036 0.0185 0.0003 0.0084 T 0 0 0.0001 9 2 9 2 9 0.003 0.0016 0.0013 0.0023 0.0044 0.0084 0.0001 0.0074 0.0001 C 0 0 1 2 3 2 6 1 1 0.0001 0.0016 0.0025 0.0042 0.0016 0.0031 0.0016 G 0 0 2 2 1 5 2E-05 0.003 1 9 0.0061 0.0001 YE2 BE3 A 0 0 4.7E-05 3 0.0021 0.0015 1 6.1E-05 3.3E-05 6.8E-05 0.0035 0.0018 0.0283 0.0051 0.0035 0.0009 T 0 0 4E-05 7 9 2 6.9E-05 4 3 6 0.0009 0.0077 0.0042 0.0289 0.0001 0.0005 0.0009 C 0 0 5 6 2 9 1 4 9.5E-05 7 0.0009 0.0019 0.0002 0.0021 0.0045 0.0036 G 0 0 4 4 2 7 7.4E-05 4 6 7.2E-05 0.0023 0.0002 0.0081 0.0010 0.0017 0.0002 0.0067 YEE BE3 A 0 0 5 2 5 9 3 3 9 7.9E-06 0.0002 0.0008 0.0084 0.0515 0.0067 0.0022 0.0006 T 0 0 1 4 8 7 5 2 1 1.4E-05


0.0013 0.0489 0.0049 0.0072 0.0050 C 0 0 7.9E-05 6 9.6E-06 3 7 1 1 2.5E-05 0.0026 0.0019 0.0003 0.0015 0.0011 G 0 0 3 8 2 5 5.5E-05 0.0092 6 2.7E-06 0.0077 0.0045 0.0034 0.0103 0.0011 0.0076 A3A (N57A)-BE3 A 0 0 4 0.0007 1 5 7 7 4 1.4E-05 0.0014 0.0165 0.0047 0.0045 0.0078 0.0029 0.0002 T 0 0 6 2 7 3 7 2.2E-05 4 7 0.0136 0.0002 0.0026 0.0063 C 0 0 9.7E-05 5 4 0.0022 3 5 3.7E-06 2.5E-05 0.0021 0.0101 0.0001 0.0051 0.0047 0.0002 G 0 0 0.0093 7 2.2E-05 8 2 6 1 6 A3A (N57Q/Y130F)- 0.0038 0.0027 0.0039 0.0006 0.0040 0.0001 BE3 A 0 0 5 6 8 6 4 5 0.004 4E-06 0.0001 0.0013 0.0040 0.0138 0.0039 0.0051 T 0 0 3 9 8 4 4 4 0.0047 4.1E-06 0.0005 0.0056 C 0 0 6.4E-05 1 2.4E-05 0.0151 4E-05 3 3.3E-05 2.3E-05 0.0040 0.0008 0.0001 0.0019 0.0003 0.0006 G 0 0 5 5 2 3 5.9E-05 5 7 2.3E-05 0.0005 0.0057 0.0015 0.0107 0.0001 eA3A-BE3 A 0 0 3 5 3 1 0.0002 4 9.4E-05 6.2E-05 0.0071 0.0018 0.0038 0.0036 T 0 0 2E-05 9 1 0.0159 3.2E-05 4 8 4.1E-05 0.0070 0.0098 0.0001 0.0106 0.0018 C 0 0 5 1 1.2E-05 0.0008 8 4 1 8.7E-06 0.0031 0.0002 0.0059 0.0069 0.0019 G 0 0 0.0076 3 7 9 1.7E-05 5 7 1.2E-05
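Supplementary Table 2 lists, for each base editor and each protospacer position, one standard deviation per nucleotide (A, T, C, G). A minimal sketch of how such values could be derived from replicate allele-frequency tables follows. The input layout, the function name, and the use of the sample standard deviation (n - 1 denominator) are assumptions made for illustration; the excerpt does not state the number of replicates or the estimator used.

```python
from statistics import stdev

# Row order used in Supplementary Table 2.
NUCLEOTIDES = "ATCG"


def per_position_sd(replicates):
    """Standard deviation of nucleotide frequencies across replicates.

    replicates: list of replicate tables (HYPOTHETICAL layout); each table
    is a list with one dict per protospacer position, mapping nucleotide
    to the fraction of sequencing reads carrying that nucleotide.
    Returns one dict per position mapping nucleotide -> sample SD.
    """
    n_positions = len(replicates[0])
    return [
        {
            nt: stdev([rep[pos][nt] for rep in replicates])
            for nt in NUCLEOTIDES
        }
        for pos in range(n_positions)
    ]
```

Called with two or more replicate tables of equal length, the function returns one dictionary per position, which corresponds to one four-row block (A, T, C, G) under a given sample in the table.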


Supplementary Table 3

Identification Off-Target Off-Target site Amplicon Oligonucleotides method GAGTCTAAGCAGAAGAAGAAG modified digenome- EMX1.1 AG CCATCACGGCCTTTGCAAATAGAGCC seq modified digenome- EMX1.1 CTACTGTTTCACTGCCTACCTTCCTCTACTTC seq GAATCCAAGCAGAAGAAGAGA modified digenome- EMX1.2 AG CTGGTCGATCTGCCGGTCTGTGAG seq modified digenome- EMX1.2 AATGGGAACAGTTGGGAAGAGAAGTCGG seq AAGTCTGAGCACAAGAAGAAT modified digenome- EMX1.3 GG GGATGTAGTTCTGACATTCCTCCTGAGGG seq modified digenome- EMX1.3 GCTCTGTTGTTATTTTTTGGTCAATATCTGAAAGG seq GAATCCAAGAGAAGAAGAATG CTACTGGTGTAATGTTGTACAATAGCAGATCTCT modified digenome- EMX1.5 G AG seq modified digenome- EMX1.5 CTGAATGACAAATACTGCGTGATCTCACTCC seq GAGTCCTAGCAGGAGAAGAAG modified digenome- EMX1.6 AG GAGGAAGACCAGACTCAGTAAAGCCTGG seq modified digenome- EMX1.6 CTAACCACACAGTATCTAGCTGTCCTGTCTC seq GAGTCCAAGCAGTAGAGGAAG modified digenome- EMX1.7 GG CCTCTCAGAGGGTATTGTGACATGTTGCTG seq modified digenome- EMX1.7 GGCCAGGGCTTATGTGGAAGACTCAC seq GTGTCCTAGAGAAGAAGAAGG modified digenome- EMX1.8 G CTCTCCACAGAGTCACGAGCAGCC seq


modified digenome- EMX1.8 CCACCACTCTTGGAGGCAGAGGAG seq modified digenome- EMX1.9 AAGTCCGAGGAGAGGAAGAA GGGTCCTTGAGGTTGCTTATGCGG seq modified digenome- EMX1.9 GAGTCAGAGGTCACAAAAGAGGGGCC seq GAGGCCGAGCAGAAGAAAGAC modified digenome- EMX1.10 GG TGCGAGCCGCAAGCGCAGGAG seq modified digenome- EMX1.10 CCTTCCCTCAGCCACTTTATTTCATCCC seq AGTTCCAAGCAGAAGAAGCATG modified digenome- EMX1.11 G ACCATAGCAGTGTTATGACAAGGTGCTGTG seq modified digenome- EMX1.11 CCTTTCAGGGCCTCAAGTAATCCAAGG seq GAGTCCACACAGAAGAAGAAA CCCAAGAATGCAATTCCTAGGTCACAGGATAATT modified digenome- EMX1.12 GA G seq modified digenome- EMX1.12 GATAACCCAGATGCTGGGATTAGCTAACAAGG seq GAGTCCAAGAGAAGAAGTGAG modified digenome- EMX1.13 G CATGACCGTGTGGTTAGGAACAGCC seq modified digenome- EMX1.13 CTGTGTGCCTTTCAGCCTTCAACTGC seq GAGTCCTTGAGAAGAAGGAAG modified digenome- EMX1.14 G AGCCTGGAGTGCAGACAACTAATGTGG seq modified digenome- EMX1.14 CATCGCCAGTGCTACAAGATAAATCAGTGC seq GAATCCAAGCAGGAGAAGAAG modified digenome- EMX1.15 GA GGTGGGAACTAGGCAAGGGTCTCAG seq modified digenome- EMX1.15 GTTGGCCATTTGTATATCTTCTTTGGAGAAACATC seq


GTACCAGAGAGAAGAAGAGAG modified digenome- EMX1.16 GG GCCCAGAGTCTCCCTGAATGGAAGG seq modified digenome- EMX1.16 CACAGAGGGTTGTTTGCTTCTCTTTCATCC seq GAGTCCCAGCAAAAGAAGAAA GGAATCAATCAATGAAGTTGAAGAGAGAGCAAT modified digenome- EMX1.17 AG GG seq GTTTTGCTTTTCACCTAATTTGGGTAGTTTTTAGG modified digenome- EMX1.17 C seq AAGTCCAAGTGAAGAAGAAAG modified digenome- EMX1.18 GA GACTGACATTTGATAGAACAGATGGGTAGATCC seq modified digenome- EMX1.18 GTTGACCACAGCGTGGTTGACCTCTG seq AAGTCCATGCAGAAGAGGAAG modified digenome- EMX1.19 GG CCTGGATGCCCTGCAAATTGAGTACG seq modified digenome- EMX1.19 GTCACATACTCATTCCTCAACCAAGAGGTG seq GAGTCCTAGAGAAGAAAAAGG modified digenome- EMX1.20 GT GGGAGTTCTAGAAGCCCCTATATCGTGTTG seq modified digenome- EMX1.20 GGTCTTTCGTTGCTGCCCAGGAGAG seq GAGTCCCTCAGGAGAAGAAAG modified digenome- EMX1.21 GC GATCATTGAGAGGCTTTATTCACAGAAGCAGG seq modified digenome- EMX1.21 GACAAAGAAAGATGAATGCAGGGAGCTGTG seq ACGTCTGAGCAGAAGAAGAAT modified digenome- EMX1.22 GG TGCAGAGGTTCTGCCAGTGCCTC seq modified digenome- EMX1.22 CATGAATGTAGCTCGGGCACACTCAG seq GAGTTCCAGAAGAAGAAGAAG modified digenome- EMX1.23 AG CCCAAAGACAGGTCTTTTGAAATAACTCAGTCAG seq


modified digenome- EMX1.23 GGAAAACTCTTCTGGACATTGGTCAAGCAAAG seq GAGTCCTAAAGAAGAAGCAGG modified digenome- EMX1.24 G CAACATGTTTGATTGCTCTCCGCTCTACC seq modified digenome- EMX1.24 CAACTTGTGGATCATGGGTACTGAGTAATCAG seq CAGTCCAAACAGAAGAGGAAT modified digenome- EMX1.25 GG GGAAGATGTGTAACTCAGGAAGTAGAGACTCC seq CAAAGTCCATCATATTGTGTAATCTGTCAGTTTCA modified digenome- EMX1.25 GG seq TGAATCCCATCTCCAGCACCAG modified digenome- FANCF.1 G CTGGAGACCCTCCTGGTTAAGAGCATG seq modified digenome- FANCF.1 GACACACAGTGATCACCACAGCTACTG seq GGAGTCCCTCCTACAGCACCAG modified digenome- FANCF.3 G CACAGTGACAGAAGGCAGCCAAGG seq modified digenome- FANCF.3 CCTAGAGCTCCAAAGGGAATACAGCCC seq GGAGTCCCTCCTACAGCACCAG modified digenome- FANCF.4 G CACAGTGACAGAAGGCAGCCAAGG seq modified digenome- FANCF.4 GACAAAGACATGGATTCCCTGCTAGAGTTCC seq modified digenome- FANCF.5 GGAATCCCTTCTACAGCATCCTG CACCTCCACCTGAGGAAAGAGGCAG seq modified digenome- FANCF.5 CATTTTGAAAATGAGGGACATGGAATGCCTTAC seq GGAGTCCCTCCTGCAGCACCTG modified digenome- FANCF.6 A GCACACATAGTGGGTCGACGTGG seq modified digenome- FANCF.6 GCTCTCAGCAAGCGCCAACTCC seq


GGAACCCCGTCTGCAGCACCAG modified digenome- FANCF.7 G CACTGGGTGCTTAATCCGGCTCC seq modified digenome- FANCF.7 CACTGAAGAAGCAGGGCCACACC seq GTCTCCCCTTCTGCAGCACCAG modified digenome- FANCF.8 G CTGCATGACGTGCGAGCTGCC seq modified digenome- FANCF.8 CTGTGCTCTGCCCTGGCTGAGG seq AAAATCCCTTCCGCAGCACCTA modified digenome- FANCF.9 G GCTTTCAGCAAAACTGATTGATCTAGCTCTCC seq modified digenome- FANCF.9 CCACAAGGATTTAGGTATGCAAAGCCAGG seq modified digenome- FANCF.10 TGTATTTCTTCTGCCTCAGGCTG CCATTCTATGTTCTTAATGCCAAGTCACTCTGC seq modified digenome- FANCF.10 CCTCCCTATCCTAGCCCCTAGCAAAC seq GGAATATCTTCTGCAGCCCCAG modified digenome- FANCF.11 G CTGGGAATACCTACTGGAACTGCTCAGG seq modified digenome- FANCF.11 GTGGCTCCTCTCTAGCTGTACATGTGG seq GAGTGCCCTGAAGCCTCAGCTG modified digenome- FANCF.12 G GAACTTGGCTCACCACAGGCCTTG seq modified digenome- FANCF.12 GGGACACTTCCTCTCAGAACCCACTC seq ACCATCCCTCCTGCAGCACCAG modified digenome- FANCF.13 G GCAACCTAGAGAGATACTCACTGGCATGAC seq modified digenome- FANCF.13 CAAGGAACTAGAGCCTCGAGTAGTGGC seq TGAATCCTAACTGCAGCACCAG modified digenome- FANCF.14 G CATGTAACCTATCTTCAAACCCTGAAGCTGC seq


modified digenome- FANCF.14 GTGCTACAAGATTACAGACCGGGTGGC seq modified digenome- FANCF.15 CTCTGTCCTTCTGCAGCACCTGG CGGGATCCTCAACAGTCGAGGATCC seq modified digenome- FANCF.15 GGAAATGGCTGAAACTGGAGAAGTTTGGG seq TTAAACATCAGACTCATTTATCGTGGAGTGACTT VEGFA2.1 CTACCCCTCCACCCCGCCTCCGG G GUIDE-seq VEGFA2.1 CACTTATCTACGCCCCACTTTGACAAGG GUIDE-seq GACCCCCCCCACCCCGCCCCCG VEGFA2.2 G CCTCAGCACCGGTAGAAAGGTAATGC GUIDE-seq VEGFA2.2 CTGAGCAGAGGAATATGTGACATGAGGAG GUIDE-seq VEGFA2.3 TGCCCCCCCCACCCCACCTCTGG GGAACCAACCCTGCCGACACC GUIDE-seq VEGFA2.3 GTGCACAATAAATGTAATGTGCTCGAATCATCC GUIDE-seq GACCCCTCACACCCCGCCCCTG VEGFA2.4 G GGAAACGATTTCACTTCTGTGTTGGAGC GUIDE-seq VEGFA2.4 GCTGAAACATTTCCGTTTGTCCTGTAGG GUIDE-seq VEGFA2.5 GCTTCCCTCCACCCCGCATCCGG TGCTGCAGATGGCCAAGACGCTG GUIDE-seq VEGFA2.5 ACACCTCCTCTGCCAGGTCTTCCTG GUIDE-seq VEGFA2.6 CCCCCTCCCCGCCCCGCCCCCGG CACGATGCGCTGAAGACACTCCAG GUIDE-seq VEGFA2.6 CCGAGGGTGTGTCACTGCCTGG GUIDE-seq VEGFA2.7 ATTCCCCCCCACCCCGCCTCAGG CAACCAAGCCCATTTGTCCAGGAACC GUIDE-seq VEGFA2.7 GTCTCCCGAAGTTCTTGAGTCTAATCCTACTTC GUIDE-seq VEGFA2.8 CACCCCCCCCCCCCCACCTCCGG GCCCTTCGGATAAGTCTAGACTTGCAG GUIDE-seq VEGFA2.8 TGTTTCTTTCCCAGGTTAGCGGCC GUIDE-seq VEGFA2.10 CTCCCCACCCACCCCGCCTCAGG CTCATCACAAGATGACTATGTCCCTCTGG GUIDE-seq VEGFA2.10 GGGCTGTGGGCATTTTTGGTCTAGG GUIDE-seq AGGCCCCCACACCCCGCCTCAG VEGFA2.14 G CAGACCCATCATCTTCAGGAAGACGC GUIDE-seq VEGFA2.14 CCAACATACAAGACATGCAGATTTGATGCTGG GUIDE-seq


GTACCCCACCACCCCGCCCCAG VEGFA2.15 G AGCCTCCAGATGCAGGCCTG GUIDE-seq VEGFA2.15 CCCTGAGGCTCTTATCAAACAACTGCC GUIDE-seq GACACACCCCACCCCACCTCAG VEGFA2.19 G CCACCACTCCTAATTCACCCTCTGACAG GUIDE-seq VEGFA2.19 CCTCTTCCCACACTTCCATCTGCC GUIDE-seq VEGFA2.20 TGCCCCTCCCACCCCGCCTCTGG CCTCCTGTTGTCTACCTGTTCCTGTG GUIDE-seq VEGFA2.20 CCCTGGTTTAGAGCCAAAGAGAAGTG GUIDE-seq CACTCCCCCCACCCCGCCCCAG VEGFA2.21 G AGGAGCCTGGACAGACGAAGGC GUIDE-seq VEGFA2.21 GTTCAACTTGAAAGGAAACATTCCCCGTAAG GUIDE-seq GACCCCTCCCACCCCGACTCCG VEGFA2.23 G GTATCAGAGGAAATAACACTGGCCTTTTCCC GUIDE-seq VEGFA2.23 GCTGGGAATGGTCCGGGAAAGC GUIDE-seq ACTCCCCTCCACCCCGGCTCGG VEGFA2.24 G GGGAAAAAGGGAGAAGGAATCACAAAGG GUIDE-seq VEGFA2.24 GCAGTTGTTTACCTCATTCAGACTATGCCTTTC GUIDE-seq AACACGCCCCACCCCGCCCCAG VEGFA2.27 G CCTCAATCTGTTTGATCAGCTGAGAGTATCC GUIDE-seq VEGFA2.27 GTCTTCTGTTATTTAGTTCCTGCCCTTAGGC GUIDE-seq VEGFA2.28 GTCACTCCCCACCCCGCCTCTGG TCCAACTCTGAGGAAAATGCAAATCTCTCC GUIDE-seq VEGFA2.28 GGTGGCACTTGTTAGCATTTCAAGTTTCTG GUIDE-seq VEGFA2.29 CCCCCCCCCCCCCCCGCCTCCGG GGAAAGGACTTTTGGGGCAGAATTCC GUIDE-seq VEGFA2.29 GTGGACGTCAGTTTCTGGCTAGACC GUIDE-seq VEGFA2.30 TACCCCCCACACCCCGCCTCTGG CGGAGCTGTTTCTGCTGACTGTCG GUIDE-seq VEGFA2.30 GGAGACACGGTGCTCATGAGAGG GUIDE-seq CTGGACTCTGGAATCCATTCTG CTNNB1.1 G CAGAAAAGCGGCTGTTAGTCACTGGC GUIDE-seq CTNNB1.1 CTTACCAGCTACTTGTTCTTGAGTGAAGGACTG GUIDE-seq

108

GTGGACTCTGAAATCCATCCAG CTNNB1.2 G GGGGACTTCCAGTAGCTGAGTCCG GUIDE-seq CTNNB1.2 CCTTTCCTCATCCATGTTGTAAGGAGCCC GUIDE-seq CGCGCCTCTGCACTCCAGCCTG CTNNB1.3 G CACATCATAAAACATGCAAGCCAGAAGGC GUIDE-seq CTNNB1.3 CGAGAATCGCTTGAACCCAGGAGG GUIDE-seq CTNNB1.4 TGGACTCTGGAATCCATTCTGG GCGGCTGTTAGTCACTGGCAGC Cas-OFFinder CTNNB1.4 TACTCTTACCAGCTACTTGTTCTTGAGTGAAGG Cas-OFFinder CTNNB1.5 CTGGACTCTGGATCCAGTCGGG GGTAACTTGCTCAAGTTCAGACGTGCC Cas-OFFinder CTNNB1.5 CACATATGATCAGGTGAACTCATTGCTAAGCC Cas-OFFinder CTNNB1.6 CTGGACTCTGAATACATTCTGG CTCAAGGCCTTAAATTTGCCACTCCAG Cas-OFFinder CTNNB1.6 TCCTACAGGCTTTGGGTCTTGGAGG Cas-OFFinder CTNNB1.7 CTGGACTCTGAATACATTCTGG CAGGTCACAAGCTAGGACAGTGCC Cas-OFFinder CTNNB1.7 CACTCTGAGAACAGAATGCAGCCTCC Cas-OFFinder CTNNB1.8 CTGGACTCAGGATCCATTCAGG GACAGCTGCTTCACACTTCCTAAATAGCTG Cas-OFFinder CTNNB1.8 TGGTGGGTTCTGTCATCCTGTCTACC Cas-OFFinder HBB20.1 CTGACTTTATGCCCAGCCCTGG CAAGCTAGCAAGATAGCCGAGCATGG GUIDE-seq HBB20.1 CCCTGGAATGTGAGAGCCCATGTCAG GUIDE-seq HBB20.2 CTGACTCTAGGCCCAGCCCTGG CAAGTTAGAGCCCCCAAATGAAGCATTAGAAAG GUIDE-seq HBB20.2 GGCTAGGGCTAGTGTGAGGCATTTGATG GUIDE-seq HBB20.3 TTGACTCTAAGCCCAGCCCAGG GTCCCAGTTGTTGTCTCCCTAGATCTCC GUIDE-seq HBB20.3 GATTTTGTCACCACCAGGCCTGCC GUIDE-seq HBB20.4 CTGACTCTATGGCCAGCCCGGG ATGAGATTGATGCTGTGAGTAGGAAGCTCTG Cas-OFFinder HBB20.4 CAGGTACTGAATCAGGCAGGAAACCC Cas-OFFinder HBB20.5 CTGACTTCTATGCCAGGCCTGG GGTGGATATCTGTGTACCGCGTGG Cas-OFFinder HBB20.5 GTGGACTGGTCATGCTAGATCCTCACAG Cas-OFFinder HBB20.6 CTGCCTTCATGCCCAGCCCTGG GCATTCCCATCTTCACTGATTCAGCACC Cas-OFFinder HBB20.6 CTGGGGGTGAACCTCATACAAAACCC Cas-OFFinder PPP1R12C_VRQR_site 1.1 GTCCTCACTCCCCCCACCCAGGA GCTAGGCTATGCATGCCCATCAGC Cas-OFFinder

109

PPP1R12C_VRQR_site 1.1 CCAGAGACCATGAGTGGAAAATACCCAC Cas-OFFinder PPP1R12C_VRQR_site GGCCTCCCAGCCCCCACCCACG 1.2 A GGCCTGCACCAGTGCTAGGTG Cas-OFFinder PPP1R12C_VRQR_site 1.2 CCACGGCTGAGCAGAGGCTTC Cas-OFFinder PPP1R12C_VRQR_site 1.3 GTCCTCAAGCCCCTCACCCATGA GTGATGCAGTTCTCAGAGTTTCCACAAGG Cas-OFFinder PPP1R12C_VRQR_site 1.3 GGGTCTCTGTGGCAGGGACTTG Cas-OFFinder PPP1R12C_VRQR_site GTTCTCATGGCCTCCACCCAGG 1.4 A GAAATCACAGGGAGTCGAAGTTGCAC Cas-OFFinder PPP1R12C_VRQR_site 1.4 GAGGCTGCGAGATTCTAAACCTCCC Cas-OFFinder PPP1R12C_VRQR_site 1.5 GCCTCACGGCCCCAACCCAGGA ACTAGGACTTGTATCCTGTCCATTTACATCCC Cas-OFFinder PPP1R12C_VRQR_site 1.5 GGTGTTCCCTGGAGAAAGCAGCT Cas-OFFinder PPP1R12C_VRQR_site GGCCTCACAGACCCCACCCAGG 1.6 A CCTTGTCAATACAGGGAGGTAGCAAGG Cas-OFFinder PPP1R12C_VRQR_site 1.6 GGTTTGTGTGTGTGTGCACCTGC Cas-OFFinder PPP1R12C_VRQR_site GCCATCAAGGCCCCCACCCAGG 1.7 A TGGAAGAAGGAATCTAAGTGTTCCTTCAAGG Cas-OFFinder PPP1R12C_VRQR_site 1.7 GACATGGCAGAGCCAGGACTGG Cas-OFFinder PPP1R12C_VRQR_site 1.8 GTCCTGACCTCCCCCACCCATGA GGACACTAGAGAACCAGCTTTATTTCACACAAG Cas-OFFinder PPP1R12C_VRQR_site 1.8 CTGAATTCATCTGTTAACAACTGGGTACCTTAGC Cas-OFFinder

110

PPP1R12C_VRQR_site GGACTCTCTAAGGAAAGAAAG 3.1 A CCTCAGTCACATTTGATTGATCCAAGTCACTG Cas-OFFinder PPP1R12C_VRQR_site CGACGTCTTCTGGGGAGTATATTTAGTAAGAAAG 3.1 G Cas-OFFinder PPP1R12C_VRQR_site 3.2 GGGACTCTTTCAGAAAGAAAGA TTCCTTCCCTTTCTCTCTTTCATTTCTTTCTCC Cas-OFFinder PPP1R12C_VRQR_site 3.2 GGCATGCGTCTGTGGTACCAGC Cas-OFFinder PPP1R12C_VRQR_site GGGACTCTTTAATAAAAGAAAG GTAGCTACAGATAACTTAGAAACCAATAACTCAA 3.3 A CTC Cas-OFFinder PPP1R12C_VRQR_site 3.3 CATACTTAAGTGAGAAATGCAGTTGTTCTTTGGG Cas-OFFinder PPP1R12C_VRQR_site TGGAATCTTTAAGGAAAGAATG GAGTAATAATATTTGTCTTGCCATGGAGATGGAG 3.4 A TAG Cas-OFFinder PPP1R12C_VRQR_site 3.4 GAGCTCCTGTGACTGGCCCG Cas-OFFinder PPP1R12C_VRQR_site GGGTCTGTTTAAGGAAAGAAG 3.5 GA GCCCCATCGGATGATCATGGAGTCTC Cas-OFFinder PPP1R12C_VRQR_site CTCATGACCGAAGAAATACGTAGTTGAGTCAATA 3.5 G Cas-OFFinder PPP1R12C_VRQR_site GGGAGTCTTTAAGAAAGAAGG 3.6 A CTACTCTTTGCAGCAGTCATTGACCAGC Cas-OFFinder PPP1R12C_VRQR_site 3.6 GGAAGTTTATGAGTGGAGAAGAGGTCAGTTTCC Cas-OFFinder


Supplementary Table 4

| Off-target site | Untransfected C to T frequency | Untransfected stdev | BE3 C to T frequency | BE3 stdev | p-value |
|---|---|---|---|---|---|
| EMX1.1 | 0.030677176 | 0.002511368 | 6.337148682 | 0.284829 | 2.76186E-06 |
| EMX1.2 | 0.306149106 | 0.018842044 | 3.591416219 | 0.319943 | 5.91281E-05 |
| EMX1.4 | 0.542657909 | 0.192089145 | 38.25088283 | 2.514895 | 1.32127E-05 |
| EMX1.5 | 0.026094683 | 0.00289431 | 0.793126246 | 0.026465 | 9.64959E-07 |
| EMX1.6 | 0.147227291 | 0.009892076 | 1.662707495 | 0.01517 | 1.35916E-08 |
| EMX1.7 | 2.357731422 | 0.234637275 | 2.695272308 | 0.30094 | 0.200269674 |
| EMX1.8 | 0.218575123 | 0.036326933 | 2.131332623 | 0.048827 | 6.81663E-07 |
| EMX1.9 | 1.137223482 | 0.291699337 | 0.398920261 | 0.076996 | 0.013277161 |
| EMX1.10 | 0.068824155 | 0.006416406 | 4.767538783 | 0.105539 | 1.7075E-07 |
| EMX1.11 | 0.128030391 | 0.018875098 | 2.681487635 | 0.035925 | 4.25081E-08 |
| EMX1.12 | 0.757619239 | 0.008116935 | 2.997784547 | 1.16513 | 0.029103254 |
| EMX1.13 | 2.516570294 | 0.074656959 | 2.317298933 | 0.336179 | 0.372930685 |
| EMX1.14 | 0.551124044 | 0.038604591 | 1.148157335 | 0.462368 | 0.089734612 |
| EMX1.15 | 0.188943435 | 0.207959885 | 0.418487445 | 0.019853 | 0.129763969 |
| EMX1.16 | 0.062525925 | 0.002372976 | 0.163322736 | 0.027104 | 0.003031444 |
| EMX1.17 | 0.423686645 | 0.136747336 | 0.47759349 | 0.034052 | 0.543843185 |
| EMX1.18 | 0.025941349 | 0.010219315 | 0.520553262 | 0.111691 | 0.001577975 |
| EMX1.19 | 0.027964314 | 0.010220368 | 1.237570528 | 0.149198 | 0.000150606 |
| EMX1.21 | 0.170097903 | 0.007649646 | 0.529935155 | 0.081053 | 0.001564615 |
| EMX1.22 | 0.495762558 | 0.028875 | 1.341314073 | 0.140354 | 0.000516456 |
| EMX1.23 | 0.062489038 | 0.01013143 | 0.083575266 | 0.032944 | 0.349051594 |
| EMX1.24 | 0.284494922 | 0.003624453 | 2.069793803 | 0.035671 | 0.000110336 |
| EMX1.25 | 0.04894466 | 0.008271537 | 1.108632006 | 0.015402 | 4.93577E-08 |

| Off-target site | Untransfected C to T frequency | Untransfected stdev | BE3 C to T frequency | BE3 stdev | p-value |
|---|---|---|---|---|---|
| VEGFA2.01 | 0.199976454 | 0.011140387 | 94.10610953 | 7.3814 | 2.51044E-05 |
| VEGFA2.02 | 13.3138942 | 10.23565603 | 401.083855 | 6.55959 | 6.4268E-07 |
| VEGFA2.03 | 1.31657689 | 0.622887619 | 94.13045026 | 4.02201 | 2.4545E-06 |
| VEGFA2.04 | 0.161541842 | 0.046793775 | 24.25151342 | 4.766179 | 0.000938558 |
| VEGFA2.05 | 0.131881697 | 0.03841795 | 16.77966284 | 3.323891 | 0.000971977 |
| VEGFA2.06 | 0.319898585 | 0.092229144 | 4.63243667 | 0.584786 | 0.000227161 |
| VEGFA2.07 | 0.945249419 | 0.075774845 | 140.0804979 | 12.16058 | 3.82538E-05 |
| VEGFA2.08 | 0.216591621 | 0.028649668 | 36.53688149 | 9.085902 | 0.002283948 |
| VEGFA2.14 | 0.409198032 | 0.021461764 | 18.49116825 | 2.944809 | 0.000442615 |
| VEGFA2.15 | 0.217615734 | 0.020590659 | 16.62320649 | 4.127507 | 0.002333274 |
| VEGFA2.19 | 0.199606183 | 0.026537806 | 2.150334057 | 0.2408 | 0.000153283 |
| VEGFA2.20 | 0.004308993 | 0.001623388 | 0.447216794 | 0.070161 | 0.000397789 |
| VEGFA2.21 | 0.181352852 | 0.061146891 | 90.14731425 | 10.41895 | 0.000116436 |
| VEGFA2.23 | 0.257491863 | 0.097555443 | 56.96772914 | 5.776643 | 7.01878E-05 |
| VEGFA2.24 | 0.996773813 | 0.127840125 | 20.51226987 | 2.0683 | 8.26709E-05 |
| VEGFA2.27 | 0.003137833 | 0.000357578 | 0.00473461 | 0.00162 | 0.170823059 |
| VEGFA2.28 | 0.216025801 | 0.026086105 | 4.789261424 | 0.203316 | 2.67884E-06 |
| VEGFA2.30 | 0.195413629 | 0.040282967 | 64.34926328 | 7.069646 | 9.57208E-05 |

| Off-target site | Untransfected C to T frequency | Untransfected stdev | BE3 C to T frequency | BE3 stdev | p-value |
|---|---|---|---|---|---|
| FANCF.1 | 0.105941356 | 0.060237133 | 0.587412238 | 0.13126 | 0.0044664 |
| FANCF.3 | 0.40586541 | 0.156877385 | 0.427749261 | 0.086195 | 0.842648748 |
| FANCF.4 | 0.562018037 | 0.147751784 | 0.443579767 | 0.161988 | 0.402424981 |
| FANCF.5 | 0.024479804 | 0.142558487 | 0.050821616 | 0.049228 | 0.777336185 |
| FANCF.6 | 0.311915159 | 0.059847354 | 0.275217614 | 0.034312 | 0.408993925 |
| FANCF.7 | 1.003873447 | 0.018323751 | 7.381418191 | 1.677385 | 0.002753873 |
| FANCF.8 | 0.446993337 | 0.145528455 | 0.351119553 | 0.121254 | 0.43016899 |
| FANCF.9 | 1.784803672 | 0.105126929 | 0.37152623 | 0.09781 | 6.94407E-05 |
| FANCF.10 | 0.002713704 | 0.089779326 | 0.001383946 | 0.156031 | 0.990404514 |
| FANCF.11 | 0.004546931 | 0.050754936 | 0.003381969 | 0.157724 | 0.990866738 |
| FANCF.13 | 0.122486286 | 0.083515864 | 0.620573609 | 0.166192 | 0.009746199 |
| FANCF.14 | 0.460122699 | 0.114899618 | 0.145772595 | 0.099618 | 0.023160165 |
| FANCF.15 | 0.270148582 | 0.109225153 | 0.321150023 | 0.05034 | 0.503375948 |

| Off-target site | Untransfected C to T frequency | Untransfected stdev | BE3 C to T frequency | BE3 stdev | p-value |
|---|---|---|---|---|---|
| CTNNB1.5 | 0.068786004 | 0.050581997 | 0.064121168 | 0.003459 | 0.881101599 |
| CTNNB1.6 | 0.031040143 | 0.014165295 | 0.028824924 | 0.007873 | 0.824481224 |
| CTNNB1.7 | 0.017434923 | 0.006628665 | 0.027767951 | 0.007683 | 0.152552529 |
| CTNNB1.8 | 0.037541935 | 0.011102423 | 0.049989227 | 0.01496 | 0.311569231 |


Supplementary Table 5

| Target | Target site | Amplicon oligonucleotides |
|---|---|---|
| EGFP site 1 | CAGCTCGATGCGGTTCACCAGGG | GCACGACTTCTTCAAGTCCG, CTGCTTGTCGGCCATGATATAGACGTTGTGG |
| PPP1R12C site 1 | GACTCACCCAGGAGTGCGTTAGG | CCTGAGTAACTGAGGGGATTGGAATGCC, GGCTCCTGGACCCCATATGAAGGC |
| PPP1R12C site 2 | GTCCGACTCGGCCAGGTCCAGGG | AAGACAATCCTAGGAAGCAGGGTCAGC, GGGCCAACATCGCCGCCG |
| PPP1R12C site 3 | GACCCTCAGCCGTGCTGCTCGGG | GGAATTCTGCATCATGTGGGCGTGTCC, GGATCAGCCAGGAACGAAGACCACTGG |
| PPP1R12C site 4 | GCTCTCAGCCTGGAGACCACGGG | AAGACAATCCTAGGAAGCAGGGTCAGC, GGGCCAACATCGCCGCCG |
| PPP1R12C site 5 | GCTGACTCAGAGACCCTGAGTGG | CCACACCTAGGAGGATAAGAGGCACG, GGAAAGGCCTGTGATCTCCGTTTTCC |
| PPP1R12C site 6 | GGGGCTCAACATCGGAAGAGGGG | GCCAGGCAGATAGACCAGACTGAGC, GCCTCCTTACCATTCCCCTTCGACC |
| PPP1R12C site 7 | GGCACTCGGGGGCGAGAGGAGGG | CCAGGCCACAAGCTGCAGACAG, GCTCAGTCTGGTCTATCTGCCTGGC |
| PPP1R12C site 8 | GAGCTCACTGAACGCTGGCATGG | GCCTGGCCTTGGGAGCTTCC, CGGCTCTGCCATGGATCTGCCG |
| PPP1R12C site 9 | GCTGGCTCAGGTTCAGGAGAGGG | GCCAGGCAGATAGACCAGACTGAGC, GCCTCCTTACCATTCCCCTTCGACC |
| EMX1 site 1 | GAGTCCGAGCAGAAGAAGAAGGG | GGACAAAGTACAAACGGCAGAAGCTGG, GAGTGGCCAGAGTCCAGCTTGG |
| EMX1 site 2 | GTATTCACCTGAAAGTGTGC | TATCTCCAGGCTCCTGTCCATTCTGG, CAAACAAGCAAACTATGGCTCCAGCC |
| FANCF site 1 | GGAATCCCTTCTGCAGCACCTGG | GACGTAGGTAGTGCTTGAGACCGCC, AGTTGCCCAGAGTCAAGGAACACGG |
| CTNNB1 site 1 | CTGGACTCTGGAATCCATTCTGG | CAGAAAAGCGGCTGTTAGTCACTGGC, CTTACCAGCTACTTGTTCTTGAGTGAAGGACTG |
| VEGFA site 2 | GACCCCCTCCACCCCGCCTCCGG | CTGACCAGTCGCGCTGACGG, CAGAAGTTGGACGAAAAGTTTCAGTGCG |
| HBB -28 (A>G) lenti cassette | CTGACTTCTATGCCCAGCCC | GTGATCAGTGTGAGGGAGTGTAAAGCTGG, CCTAGGGTTGGCCAATCTACTCCC |
| PPP1R12C VRQR site 1 | GTCCTCACGGCCCCCACCCATGA | AGCTGTGCAGGAGCTCACTGC, CCCCTGGGAGCTAGAAGGACCC |
| PPP1R12C VRQR site 2 | GCCACTCACAGGCAGGGCTGGGA | CGAAGGGGATGGTCGTGCCTG, GGCTGCACCCCAGCTCTAAGG |
| PPP1R12C VRQR site 3 | GGGACTCTTTAAGGAAAGAAGGA | GCATGAGATGGTGGACGAGGAAGG, GTCGTGGCCGCCTCTACTCC |
| PPP1R12C VRQR site 4 | GACATCACGTGGTGCAGCGCCGA | GAGAAGGCCATCCTAAGAAACGAGAGATG, GCTGCCCAAGGATGCTCTTTCC |
| PPP1R12C VRQR site 5 | GGTCTCACTCAGGATCACACGGA | CCCCAAGAGGAACACAGCACAGGC, CAGGCCAAGGAGCTCCGTCTTGC |
| PPP1R12C VRQR site 6 | GAGCTTCACAAAGTGGGAACAGA | GGAGAACTGGAATGAGTTTCTGTGTCAATG, CGTGAGATGCTGCAGGGCAC |
| PPP1R12C VRQR site 7 | GAGCTTCACAAAGTGGGAACAGA | GGAGAACTGGAATGAGTTTCTGTGTCAATG, CGTGAGATGCTGCAGGGCAC |
| PPP1R12C VRQR site 8 | GGCTATCTGCAAACAGGAAGTGA | CCAGGCCTTCAGTGCTCAGTGG, AACCTCTTCCCTAGTCTGAGCACTGG |
| FANCF VRQR site 1 | GAATCCCTTCTGCAGCACCTGGA | GTGCTTGAGACCGCCAGAAGC, GAGTCAAGGAACACGGATAAAGACGCT |
| PPP1R12C NGT site 1 | GGAATTCTGCATCATGTGGG | AGAGAGAGGAAAACAGCAAGTAAGCCAGG, CCAGCTCTAAGGAGGGCGGGTC |
| PPP1R12C NGT site 2 | GAATCCACAGGAGAACGGGG | GCTGCCCAAATGAAAGGAGTGAGAGG, CCTTCTCCGACGGATGTCTCCCTTG |
| PPP1R12C NGT site 3 | GCGGTCTCAAGCACTACCTA | CCTCTTGCCTCCACTGGTTGTGC, GCAGCACCTGGATCGCTTTTCC |
| PPP1R12C NGT site 4 | GGTGTCACAGGCAGTCGCCTTGT | GACCCGCCCTCCTTAGAGCTGG, GACCCTCAGCCGTGCTGCTC |
| EMX1 NGC site 1 | GAGCTCACTTCTGTCCCCTGGGC | GCTCTCACTGCCAGACACAGAATAGGG, AGGAAGCGGGTCTGAGGGCAG |
| PPP1R12C NGA site 7 | GTGTCACGGATTTTCTGGATAGA | GCTAATTCCATGGCTTTAAGTGGAGTACTTCC, AATGTTCTCACCAAATATATGCCTTCGTGTGTC |
| PPP1R12C NGA site 8 | GAAAACTCTTCACTACTGAGGGA | CAGCTTTTCAAACCAATATTCCTGACACTGC, GCTTTTTAATAGAAGGCCCATTTGTAAGAATGTTG |
| HBB allele for erythroid precursors | CTGACTTCTATGCCCAGCCCTGG | CAGCATCAGGAGTGGACAGATCCC, CCTAGGGTTGGCCAATCTACTCCC |


Supplementary Table 6

# Cs in Editing editing pos chr Disease Name pam Guide seq 5’ base window window TTTAAGTTC 326 TATGCGAT 0 MT 'Cardiomyopathy_with_or_without_skeletal_myopathy' CGG TACCGG T AGTTC 1

ACTTCGAT 429 'Hypertension', '_hypercholesterolemia', AGAGTAAA 1 MT '_and_hypomagnesemia', '_mitochondrial' AGG TAATAGG T CGATA 1 TCTTCTAGT 999 ATAAATAG 7 MT 'Primary_familial_hypertrophic_cardiomyopathy' CGT TACCGT T CTAGT 1 GCGAGGTC 136 GACCTGTT 37 MT "Leber's_optic_atrophy" TGA AGGGTGA T GGTCG 1 AATTTATTC 144 AGGGGGA 95 MT "Leber's_optic_atrophy" TGG ATGATGG T TATTC 1 TCGTGGTC 146 GTAGTCCG 92 MT 'Diabetes-deafness_syndrome_maternally_transmitted' AGA TGCGAGA T GGTCG 1

180 GATAAGTC 650 'Strabismus|Seizures|Muscular_hypotonia|Global_developm TACCATCCT 9 1 ental_delay|Growth_delay|Infantile_muscular_hypotonia' AGG GCGAGG T AGTCT 1

119

229 CCGTCGTG 944 GCCCAGCA 2 16 'Surfactant_metabolism_dysfunction', '_pulmonary', '_3' GGT GGACGGT T CGTGG 1 249 GGTCTCTG 683 ACGTCTTCC 4 16 'Early_infantile_epileptic_encephalopathy_16' TGG TGGTGG T TCTGA 1 266 TCAGTTTCT 543 AAAAAGGA 1 17 'Lissencephaly_1' TGA AGCTGA T TTTCT 1 271 CTACAATTC 823 CTACCTGTC 0 9 'Retinal_cone_dystrophy_3B' GGG CGGGG T AATTC 1 277 CTCCTCGAT 603 'Congenital_long_QT_syndrome|Cardiovascular_phenotype| GCGCACCA 8 11 not_provided' GGT TGAGGT T TCGAT 1

314 ACTTCATTT 459 'Sideroblastic_anemia_with_B-cell_immunodeficiency', GACTACTTT 9 3 '_periodic_fevers', '_and_developmental_delay' TGG AATGG T CATTT 1 434 TTCATAGTC 098 'Spondylometaphyseal_dysplasia', '_megarbane-dagher- CTGCAGAG 5 16 melki_type' AGG GAGAGG T TAGTC 1 466 CGATCATT 746 GTGGAGCA 7 3 'Spinocerebellar_ataxia_29' AGT GGGCAGT T CATTG 1 491 TATGAAGT 214 CCATGATG 1 12 'Myokymia_1_with_hypomagnesemia' TGA TTTTTGA T AAGTC 1

120

522 TTTTCTATT 548 AGGCAGAA 6 11 'Beta-plus-thalassemia' AGA TCCAGA T CTATT 1 522 TTTTTCATT 548 AGGCAGAA 7 11 'beta_Thalassemia|Beta-plus-thalassemia|not_provided' AGA TCCAGA T TCATT 1 587 GTTCTCTG 494 AGTTTGTG 6 1 'Nephronophthisis_4' AGA CTTAAGA T TCTGA 1 708 GGGCATTC 207 GGAAGAAT 7 12 'Ehlers-Danlos_syndrome', '_type_8' AGA GAACAGA T ATTCG 1 717 TGGGGTCA 064 TAGTGGAA 8 19 'Leprechaunism_syndrome' AGT GAAGAGT T GTCAT 1 758 GAAGGGTC 599 GAGTCTCC 7 17 'Congenital_disorder_of_glycosylation_type_1F' TGT AGTCTGT T GGTCG 1

881 AAAGTCTG 166 'Congenital_disorder_of_glycosylation|Carbohydrate- TAGCAGAT 0 16 deficient_glycoprotein_syndrome_type_I|not_provided' GGA CTACGGA T TCTGT 1 998 TGCAGTGT 267 CTCTCTGCA 8 1 'Leber_congenital_amaurosis_9' GGG AAGGGG T GTGTC 1 106 AGCCTGTC 264 TGCTAACA 89 6 'Congenital_cataract' TGA AGTTTGA T TGTCT 1 117 ACTCTCTG 947 GTCTGGAG 81 1 'Homocysteinemia_due_to_MTHFR_deficiency' AGA GCCCAGA T TCTGG 1

121

119 'Charcot-Marie-Tooth_disease', '_axonal', AAGTCTTG 988 '_autosomal_recessive', '_type_2A2B|Charcot-Marie- TCTGGATG 17 1 Tooth_disease', '_type_2|not_provided' TGT CTGATGT T CTTGT 1 129 TCATATTCA 372 CAGGCTTG 02 11 'Sveinsson_choreoretinal_atrophy' AGG TAAAGG T ATTCA 1 134 AATTCGTTT 928 TCTTACAA 04 11 'Hypoparathyroidism_familial_isolated' CGG AATCGG T CGTTT 1 137 ATGAGGTC 160 TATTATGCT 80 X 'Spondyloepiphyseal_dysplasia_tarda' TGA TCATGA T GGTCT 1

146 CACAGAAT 832 'Asplenia|Duodenal_atresia|Reduced_number_of_intrahepat CGAGCTAC 43 12 ic_bile_ducts|Abnormal_biliary_tract_morphology' TGA CCCATGA T GAATC 1 156 GGCCTCGG 356 AGCTCATG 27 3 'Biotinidase_deficiency' AGA AACCAGA T TCGGA 1 185 ACAATGTC 754 TTCCTGCTT 19 X 'Atypical_Rett_syndrome|not_provided' AGT GAGAGT T TGTCT 1 185 ACACAGTC 819 TTAGGACA 36 X 'not_provided' TGT TCATTGT T AGTCT 1 206 GTGGTTCG 936 GCATCATA 85 14 'Amyotrophic_lateral_sclerosis_type_9' TGG GTGCTGG T TTCGG 1

122

219 ATTTCGAT 771 AAGGTGAA 74 X 'Snyder_Robinson_syndrome' CGT TCTTCGT T CGATA 1 222 TTCTCTAAA 622 GGGAAGTT 31 11 'Limb-girdle_muscular_dystrophy', '_type_2L|not_provided' AGG CGTAGG T CTAAA 1 233 CAGATCAG 206 'Severe_autosomal_recessive_muscular_dystrophy_of_childh AATCCCCC 39 13 ood_-_North_African_type' GGA ACTCGGA T TCAGA 1

234 TGTCTTGTC 241 'Hypertrophic_cardiomyopathy|Primary_familial_hypertrophi CCTGAAGG 48 14 c_cardiomyopathy|Familial_hypertrophic_cardiomyopathy_1' GGA TGAGGA T TTGTC 1 264 CTTCTTTTC 206 GGAAGGTT 95 4 'Adams-Oliver_syndrome_3' AGA TGGAGA T TTTTC 1 279 GACATCTG 833 GAGGGTCC 83 15 'Tyrosinase-positive_oculocutaneous_albinism|not_provided' TGG CCGATGG T TCTGG 1 287 CCAGTAGT 680 CGCCCTTG 36 14 'Rett_syndrome', '_congenital_variant' GGT CCCGGGT T TAGTC 1 309 CGACTCGG 764 AGAGACGC 39 7 'Isolated_growth_hormone_deficiency_type_1B' AGG CTGCAGG T TCGGA 1 312 GAATTCTTT 218 'Neurofibromatosis', AAAAAAGA 42 17 '_type_1|Juvenile_myelomonocytic_leukemia' AGA GAAAGA T TCTTT 1

123

312 CTTCTTCTG 590 TGAAGAGA 68 17 'Neurofibromatosis', '_type_1' TGA ACATGA T TTCTG 1 317 GGTATGGT 937 CTTCTAATC 95 11 'Congenital_ocular_coloboma|Coloboma_of_optic_disc' GGG GAAGGG T TGGTC 1 346 CCCTCGGG 479 'Deficiency_of_UDPglucose-hexose-1- TGCAGGTT 53 9 phosphate_uridylyltransferase' AGG TGTGAGG T CGGGT 1 348 GTCTCGTT 868 'Familial_platelet_disorder_with_associated_myeloid_malign GCAGCGCC 66 21 ancy' CGT AGTGCGT T CGTTG 1 374 CAAGTCGT 367 TAGCTGCC 29 9 'Primary_hyperoxaluria', '_type_II' AGG AACAAGG T TCGTT 1 383 ACAAGATC 698 GTCTACAG 63 X 'not_provided' AGG AAACAGG T GATCG 1 384 AGTGTATC 012 GTCTAGCA 80 X 'not_provided' AGA TGGCAGA T TATCG 1 384 GGCTGATC 014 ACCTCACG 14 X 'not_provided' AGG CTCCAGG T GATCA 1 384 TATAGTGT 036 CCCTAAAA 19 X 'not_provided' CGG GGCACGG T GTGTC 1 384 CAGGATAT 036 CGTTCCCAT 72 X 'not_provided' CGA CCCCGA T ATATC 1

124

384 TGTATCAA 089 TTACAGAC 37 X 'not_provided' GGA ACTTGGA T TCAAT 1

384 'Lymphoblastic_leukemia', '_acute', TGTCTCTTT 147 '_with_lymphomatous_features|Transitional_cell_carcinoma ATAGTAGT 90 8 _of_the_bladder|Astrocytoma|Glioblastoma' TGT CGATGT T TCTTT 1 384 CGCTGGTC 279 GAAAAATG 70 8 'Hartsfield_syndrome' AGA GCAAAGA T GGTCG 1 384 AAAGAAAT 280 CGCATGCA 48 8 'Hartsfield_syndrome' CGG GTGCCGG T AAATC 1 385 GAAGAAGT 804 CGAAGGCC 30 19 'Central_core_disease|not_provided' CGG ACCACGG T AAGTC 1 416 GTCATCGA 242 GGTTCTGC 36 17 'Pachyonychia_congenita_2|not_provided' TGG ATGGTGG T TCGAG 1 423 TCACTCTTG 296 TGTTTGTG 12 17 'Autoimmune_disease', '_multisystem', '_infantile-onset', '_1' AGA CCCAGA T TCTTG 1 430 GAACTCGG 592 TCCTGCGG 99 21 'Homocystinuria', '_pyridoxine-responsive' GGG GATGGGG T TCGGT 1

125

431 CCACAGTC 225 'Muscular_dystrophy-dystroglycanopathy_ GCTCTGGA 97 8 (congenital_with_brain_and_eye_anomalies)', '_type_a', '_12' CGA GCCACGA T AGTCG 1

431 'Hereditary_cancer- AATCTCAG 240 predisposing_syndrome|Hereditary_breast_and_ovarian_can AGTGTCCC 32 17 cer_syndrome|Breast-ovarian_cancer', ' GGT ATCTGGT T TCAGA 1 442 GTCCTCGA 866 ACAGCACC 71 21 'Polyglandular_autoimmune_syndrome', '_type_1' AGA CTCCAGA T TCGAA 1 458 TACCTCGA 981 TCGTGAGA 07 20 'Galactosialidosis', '_late_infantile' CGA AAGGCGA T TCGAT 1 484 GGTAGATT 563 CCAATGGC 16 13 'Retinoblastoma' GGG TTCTGGG T GATTC 1 485 CTGTCTGA 172 TGTCGATG 59 X 'not_provided' TGA TGGATGA T CTGAT 1 492 CAGCTGTC 003 TGTTGACC 95 19 'Progressive_familial_heart_block_type_1B' AGA GTGAAGA T TGTCT 1 494 CTTTCTGAT 579 'Methylmalonic_aciduria_due_to_methylmalonyl- GGAATTCC 23 6 CoA_mutase_deficiency' AGA TTTAGA T CTGAT 1 496 TGTTGGGT 193 CTGGCTTC 26 6 'Stomatocytosis_I' TGA CTCATGA T GGGTC 1

126

499 GTTGTTGT 634 CGAGCTGC 95 12 'Diffuse_palmoplantar_keratoderma', '_Bothnian_type' GGA AAGGGGA T TTGTC 1 504 GAATCTTG 953 ACTGTTCT 69 7 'Deficiency_of_aromatic-L-amino-acid_decarboxylase' TGT GCCATGT T CTTGA 1 505 CACGAGTC 272 'Mitochondrial_DNA_depletion_syndrome_1_ TCTTACTGA 65 22 (MNGIE_type)|not_provided' TGG GAATGG T AGTCT 1 516 GAGTCTGT 887 AAACCTAG 72 12 'Early_infantile_epileptic_encephalopathy_13' TGT GCAATGT T CTGTA 1

516 'Global_developmental_delay|Developmental_regression|De TGCCTCGA 996 velopmental_stagnation_at_onset_of_seizures|Generalized_t TCGGACTG 63 12 onic_seizures' TGT CAGCTGT T TCGAT 1 518 ACTGAAGT 067 CCTCCAGG 88 12 'Early_infantile_epileptic_encephalopathy_13' TGG ATGATGG T AAGTC 1 520 GAATCAAA 248 GAATTATTT 72 13 'Congenital_disorder_of_glycosylation_type_1P' TGA GTCTGA T CAAAG 1 523 CAGTGGTC 978 TGTCCGAT 21 1 'Meier-Gorlin_syndrome_1' TGA TCTGTGA T GGTCT 1 524 AACAAGTC 926 TGCCTCCTT 68 12 'Pachyonychia_congenita_3|not_provided' CGA CATCGA T AGTCT 1

127

535 TTTCTCAGA 346 AGGCTTCT 67 X 'not_provided' AGA ATGAGA T TCAGA 1 542 CAAATCTT 838 GGACCCAT 77 12 'Chronic_progressive_multiple_sclerosis' GGA GAAGGGA T TCTTG 1

547 'Malignant_melanoma_of_skin|Malignant_melanoma|Hemat GGACTTCG 280 ologic_neoplasm|Gastrointestinal_stromal_tumor|Adenocarc AGTTCAGA 55 4 inoma_of_stomach' AGG CATGAGG T TTCGA 1 575 GGACATTC 003 TCCAGCAG 37 12 'Interstitial_lung_and_liver_disease|not_provided' TGA TTGCTGA T ATTCT 1 621 TTGTAGGT 460 'Multiple_congenital_anomalies-hypotonia- CCCCATGG 23 18 seizures_syndrome_1' GGT GGCTGGT T AGGTC 1 646 GATGATTC 158 TTTAAAATT 94 8 'Spastic_paraplegia_5A' TGA TGATGA T ATTCT 1 676 CATTGAGT 119 CCTATAAG 34 11 'Mitochondrial_complex_I_deficiency' AGA CACGAGA T GAGTC 1 739 CTGGTGTC 581 'Mitochondrial_DNA-depletion_syndrome_3', GGATGTCA 87 2 '_hepatocerebral' TGA ATGATGA T TGTCG 1 781 GACTCTGT 093 GGACATGT 09 15 'Deafness', '_autosomal_recessive_48' CGT TTTCCGT T CTGTG 1

128

831 CTTTAAGTC 719 GTCTGTTT 50 6 'Immunodeficiency_23' AGA GGAAGA T AAGTC 1 831 TGAAATGT 887 CGGCACCA 55 6 'Immunodeficiency_23|Hyper-IgE_syndrome' GGG TCCTGGG T ATGTC 1 847 GCCTCGGT 625 GATGATCA 54 6 'Multicystic_renal_dysplasia', '_bilateral' AGT TCTCAGT T CGGTG 1 859 AACTTCGT 091 ATGGCTCG 37 16 'Immunodeficiency_32b' TGT GAAATGT T TCGTA 1 879 GTACTCTG 579 AGTTCCCTC 40 10 'Macrocephaly/autism_syndrome' CGT AGCCGT T TCTGA 1 890 CTTCATCG 142 'SQUAMOUS_CELL_CARCINOMA', '_BURN_SCAR-RELATED', ACACCATT 05 10 '_SOMATIC' CGA CTTTCGA T ATCGA 1 893 TGAAGATT 209 CTGGCATG 07 15 'not_provided' CGG CTCACGG T GATTC 1 897 GCGCAGTT 766 CCTCCCGCT 28 15 'Spondylocostal_dysostosis_2' TGG CGCTGG T AGTTC 1 940 ACTTCTGA 786 AGAACTGG 93 1 'Stargardt_disease_1|not_provided' CGT AACACGT T CTGAA 1 945 TTCCTTCAT 781 GCTACTGA 30 13 'Catel_Manzke_syndrome' TGT TGTTGT T TTCAT 1

129

951 GGATATCG 147 TTTCACGAT 33 4 'Acromesomelic_dysplasia', '_Demirhan_type' AGA GATAGA T ATCGT 1 955 GCCCTTTCA 440 GATGAAAA 94 14 'Sideroblastic_anemia_3', '_pyridoxine-refractory' AGA GAAAGA T TTTCA 1 100 AATTCGTG 527 TGATTGAA 696 13 'Propionic_acidemia' TGA GCCATGA T CGTGT 1 101 TGACATTC 353 GGGCTTTT 879 X 'X-linked_agammaglobulinemia' AGT GGTAAGT T ATTCG 1 101 GGGTCGGA 398 ATGACCCA 802 X 'Fabry_disease|Deoxygalactonojirimycin_response' TGG GATATGG T CGGAA 1 101 AAGTCTAA 726 AGAATCTG 785 13 'Spinocerebellar_ataxia_27' TGA TTTTTGA T CTAAA 1 104 CGTGGAGT 788 CCTTTGCCC 469 9 'Familial_hypoalphalipoproteinemia' TGA TTTTGA T GAGTC 1 108 CCCCATCG 694 TCCTCAGG 874 X 'Alport_syndrome', '_X-linked_recessive' GGA GATGGGA T ATCGT 1 109 GTCCTCGA 796 ACTTGCGG 638 12 'Skeletal_dysplasia' GGT GACAGGT T TCGAA 1 112 CTCCTGTC 472 GTTGTAGT 954 12 'Rasopathy|not_specified|not_provided' GGA GTCTGGA T TGTCG 1

130

112 TGCCTCGG 917 TTGGCTTG 796 1 'Erythrocyte_lactate_transporter_defect' CGA GGCCCGA T TCGGT 1 117 AGCCTCTG 642 GAGTGATA 577 7 'Cystic_fibrosis' AGG CCACAGG T TCTGG 1

119 'Noonan_syndrome- CATATTTCA 278 like_disorder_with_or_without_juvenile_myelomonocytic_le CATAGTTG 214 11 ukemia' TGT GAATGT T TTTCA 1 121 AGATGGTT 447 CTCTCCGT 464 6 'Oculodentodigital_dysplasia' CGA GGGGCGA T GGTTC 1 121 GTGCTCGA 520 TCCACTGG 044 10 'Crouzon_syndrome' GGG ATGTGGG T TCGAT 1 123 GCAATTTC 630 GTTTCTTGA 210 12 'Leukoencephalopathy_with_vanishing_white_matter' TGA CAGTGA T TTTCG 1 125 TGAAGGTC 061 TTCTAGTA 192 8 'Spastic_paraplegia_8' AGA ATACAGA T GGTCT 1 128 TGTTTTCAT 810 CATAGTAA 662 11 'Bleeding_disorder', '_platelet-type', '_21' CGG TAACGG T TTCAT 1 129 AAATCGGA 963 'Hypotonia', '_ataxia', TTTCCGGA 462 10 '_and_delayed_development_syndrome' GGA GGTTGGA T CGGAT 1

131

130 AACGGTGT 342 'CONGENITAL_DISORDER_OF_GLYCOSYLATION', CGAACGCC 034 2 '_TYPE_IIo|Congenital_disorders_of_glycosylation_type_II' TGG CGGGTGG T GTGTC 1 137 ATGTGTTC 569 GCGCAGGG 054 X 'Heterotaxy', '_visceral', '_X-linked' CGG AGCTCGG T GTTCG 1 140 GGTGATCT 753 'Malignant_melanoma|Cardiofaciocutaneous_syndrome_1|n TGGTCTAG 352 7 ot_provided' AGT CTACAGT T ATCTT 1 140 ATCATCTG 781 GAACAGTC 605 7 'Cardio-facio-cutaneous_syndrome|not_provided' AGG TACAAGG T TCTGG 1 143 CTTTGTCTG 332 ACAACAAT 755 7 'Myotonia_congenita|not_provided' TGG ACATGG T GTCTG 1 148 TTCCTCGTA 404 TCCCAATG 708 5 'Distal_hereditary_motor_neuronopathy_2D' AGG CTAAGG T TCGTA 1 149 GAATCTGT 209 GGTGATGT 342 3 'Deficiency_of_ferroxidase' TGT TTTCTGT T CTGTG 1 149 GATGATAT 980 CACAAAAG 866 5 'Diastrophic_dysplasia|Achondrogenesis', '_type_IB' TGG CCAATGG T ATATC 1 150 CATCTCTG 056 ACTGTGTC 097 5 'Hereditary_diffuse_leukoencephalopathy_with_spheroids' CGG TACACGG T TCTGA 1

132

154 AGCCATCG 904 ATTGCTGG 083 X 'Hereditary_factor_VIII_deficiency_disease' GGA AGAAGGA T ATCGA 1 154 TTACTTCAT 965 GGGGAAGT 996 X 'Hereditary_factor_VIII_deficiency_disease' AGA TGGAGA T TTCAT 1 156 GCGTGAGT 134 CTGAGAGC 875 1 'Primary_dilated_cardiomyopathy' TGG CGGCTGG T GAGTC 1 GCTCTCGG 156 AGGGCGA 136 'Congenital_muscular_dystrophy', '_LMNA- GGAGGAG 103 1 related|not_provided' AGA A T TCGGA 1 156 AAGTCTGA 579 TGCAGACC 109 2 'Diabetes_mellitus_type_2' AGG AGAAAGG T CTGAT 1 160 TTCTCAGT 706 GACACTGA 469 6 'Plasminogen_deficiency', '_type_I' TGA ACAGTGA T CAGTG 1 161 CTCTTCGA 306 AGGTCCCC 870 1 'Charcot-Marie-Tooth_disease', '_demyelinating', '_type_1b' CGT ACCTCGT T TCGAA 1 165 TGTCTCGA 069 ACGTTATA 101 3 'Sucrase-isomaltase_deficiency' TGA ACCATGA T TCGAA 1

166 ACTTCTATA 058 'Severe_myoclonic_epilepsy_in_infancy|Generalized_epilepsy GTATTGAA 684 2 _with_febrile_seizures_plus', '_type_2' AGG TAAAGG T CTATA 1

133

173 CATTTGTC 910 GATGGCCG 861 1 'Antithrombin_III_deficiency' GGA CTCTGGA T TGTCG 1 174 ACCCAATT 877 CATGAGCA 962 2 'Duane_syndrome_type_2' GGA CGTAGGA T AATTC 1 189 AGCTTCTTT 864 GTAGACAG 349 3 'Split-hand/foot_malformation_4' TGG GCATGG T TCTTT 1 190 ACTGAGTC 975 GATTTCTGT 830 2 'Immunodeficiency_31a' TGA GTCTGA T AGTCG 1 193 ACCTCGAA 643 TACACAGT 625 3 'Dominant_hereditary_optic_atrophy|not_provided' TGG ATGATGG T CGAAT 1 202 AGCTTCGA 532 AACAAGTA 658 2 'Primary_pulmonary_hypertension' TGT GACATGT T TCGAA 1 202 GTAGTTTTC 532 ACCTGGGA 735 2 'Primary_pulmonary_hypertension' GGT AGAGGT T TTTTC 1 202 AGTAGTTT 532 CTACCTGG 736 2 'Primary_pulmonary_hypertension' AGG GAAGAGG T GTTTC 1 212 AGGCGTCG 858 ACCAGCGA 813 1 'Posterior_column_ataxia_with_retinitis_pigmentosa' AGG GTACAGG T GTCGA 1 225 AATTCTAA 404 AATATTTTC 715 1 'Pelger-Hu\xc3\xabt_anomaly' TGT CTATGT T CTAAA 1

134

226 TTGGATTCT 984 GGGGCAAC 892 1 'Coenzyme_Q10_deficiency', '_primary', '_4' GGA GCGGGA T ATTCT 1 240 CATTCGGG 869 GGCAGCGA 326 2 'Primary_hyperoxaluria', '_type_I' TGG GCCGTGG T CGGGG 1 241 TGCAGGTC 767 ACCATCTCC 718 2 'D-2-hydroxyglutaric_aciduria_1' AGG TGGAGG T GGTCA 1 GGAGTCGC 123 ACCAAAAT 20 MT 'Mitochondrial_myopathy' GGG TTTTGGG T TCGCA 2

'Alpha-thalassemia-2', TCAGACTT 173 '_nondeletional|Hemoglobin_H_disease', CATTCAAA 692 16 '_nondeletional|not_specified|not_provided' AGG GACCAGG T ACTTC 2 134 AATTCGCA 818 AGTATGTC 5 4 'UV-sensitive_syndrome_3' GGT TTAGGGT T CGCAA 2 258 CTCCTCCTT 352 'Long_QT_syndrome_1|Congenital_long_QT_syndrome|not_ TGCGCTCC 9 11 provided' CGG CAGCGG T TCCTT 2 277 TGTCTCCTA 803 CTCGGTTC 0 11 'not_provided' CGG AGGCGG T TCCTA 2 278 GCCATCCTT 727 CCAGGAGG 8 Y '46', 'XY_sex_reversal', '_type_1' AGA CACAGA T TCCTT 2

135

288 CGCTCGGC 412 'Intrauterine_growth_retardation', '_metaphyseal_dysplasia', GAAGAAAT 3 11 '_adrenal_hypoplasia_congenita', '_and_genital_anomalies' GGG CTGCGGG T CGGCG 2 382 GCGTCGCT 409 AGTGCTCA 7 11 'Hyperphosphatasia_with_mental_retardation_syndrome_3' TGT CTTATGT T CGCTA 2 391 ACCACTTCT 251 GAAGAAGC 4 20 'Retinitis_pigmentosa' TGA TCTTGA T CTTCT 2 411 CTTGGCTTC 754 CTGGGTGA 1 19 'Cardiofaciocutaneous_syndrome_1|not_provided' AGG GAAAGG T GCTTC 2 477 GCCTGCTT 958 CCCATGAT 4 3 'Gillespie_syndrome' AGG GGCCAGG T GCTTC 2 494 CACCTCCTT 597 TGCCCATC 3 17 'Amyotrophic_lateral_sclerosis_18' AGG AGCAGG T TCCTT 2 522 TGCCCTCG 664 AGGTTGTC 5 11 'Hb_gambara' TGA CAGGTGA T CTCGA 2 522 CTGACTTCT 709 ATGCCCAG 9 11 'beta_Thalassemia|Beta-plus-thalassemia|not_provided' TGG CCCTGG T CTTCT 2 522 CTGACTTTC 710 'beta_Thalassemia|Beta_thalassemia_intermedia|Beta-plus- ATGCCCAG 0 11 thalassemia' TGG CCCTGG T CTTTC 2

136

525 CAGAGGTC 448 CTTTGACA 2 11 'Cyanosis', '_transient_neonatal' TGG GCTTTGG T GGTCC 2 860 AATACTTCT 489 GTAGAAAA 8 12 'Immunodeficiency_with_hyper_IgM_type_2' CGA CCACGA T CTTCT 2 881 GGGGTACT 108 CTCATTGA 6 16 'Carbohydrate-deficient_glycoprotein_syndrome_type_I' CGA ATTCCGA T TACTC 2 998 CGCTACTC 257 GGTACCAG 7 1 'Leber_congenital_amaurosis_9' TGT ATCTTGT T ACTCG 2 110 ATTGTGCT 221 'FRONTOTEMPORAL_DEMENTIA_WITH_TDP43_INCLUSIONS', CAGGTTCG 96 1 '_TARDBP-RELATED' TGG GCATTGG T TGCTC 2 111 CGGCAGTC 052 CGTCTGTG 83 19 'Familial_hypercholesterolemia' AGA ACTCAGA T AGTCC 2 111 GACGAATC 065 CCAGTGCT 92 19 'Familial_hypercholesterolemia' TGG CTGATGG T AATCC 2 117 TGGCTCCA 581 GCCCTCAG 86 8 'Congenital_heart_disease' GGA GAAGGGA T TCCAG 2 131 GCTCTTCCA 320 GCATGAAA 98 10 'Amyotrophic_lateral_sclerosis_type_12' AGA ATCAGA T TTCCA 2 137 TCACTTCCT 366 GTTTTCTTT 40 X 'Oral-facial-digital_syndrome' AGA CCAGA T TTCCT 2

137

137 CCACTTCCT 366 GGAAAGAA 56 X 'Oral-facial-digital_syndrome' AGA AACAGA T TTCCT 2 161 GGTGTCTG 889 CTGTCCAC 58 16 'Pseudoxanthoma_elasticum' TGG ACTCTGG T TCTGC 2 174 GGGAGCGT 969 CACCTTTCC 35 20 'Cataract_33', '_multiple_types' CGG CACCGG T GCGTC 2 182 GAATCGAC 916 AACATTAT 63 11 'Hermansky-Pudlak_syndrome_5' GGA GTTTGGA T CGACA 2 201 CTCTTCCAT 839 TCGCTTCTT 61 3 'Chronic_atrial_and_intestinal_dysrhythmia' TGT TATGT T TCCAT 2 233 CTCTCTGAC 407 CCTGATAT 15 13 'Spastic_ataxia_Charlevoix-Saguenay_type' AGT AGAAGT T CTGAC 2 264 GCACTCCTT 382 GGCATAGC 28 1 'Retinitis_pigmentosa_59' GGT GACGGT T TCCTT 2 303 GGATCCTG 139 AATGTACA 26 10 'Ataxia', '_spastic', '_4', '_autosomal_recessive' AGA GAGGAGA T CCTGA 2 312 CATTAACTC 584 CAAGCCCC 88 17 'Neurofibromatosis', '_type_1' CGA TTTCGA T AACTC 2 313 GAATCCTA 907 AAATTGCC 12 18 'Hypotrichosis_6' AGA TACAAGA T CCTAA 2

138

315 GAGCCATC 930 'Amyloidogenic_transthyretin_amyloidosis|AMYLOIDOSIS', TGCCTCTG 17 18 '_LEPTOMENINGEAL', '_TRANSTHYRETIN-RELATED' AGT GGTAAGT T CATCT 2 316 CGTTCGGC 685 TTGTGGTG 47 21 'Amyotrophic_lateral_sclerosis_type_1|not_provided' TGG TAATTGG T CGGCT 2 383 TTTAATTCC 673 TTCTCCGGT 68 X 'not_provided' AGT AAAGT T ATTCC 2 384 GATTCGGA 013 CACCCTGG 31 X 'not_provided' AGA CTAAAGA T CGGAC 2 384 AGCCCATC 013 GATAATTG 69 X 'not_provided' GGA GGATGGA T CATCG 2 384 TTCTGGCTC 119 TCTGGGCA 13 X 'not_provided' AGT AGCAGT T GGCTC 2 384 GCAGAGCT 407 CGAGCTGC 96 19 'King_Denborough_syndrome|not_provided' TGA TCCTTGA T AGCTC 2 403 ATCCTCCTT 650 CTTTCACTG 27 22 'not_provided' CGT GTCGT T TCCTT 2 408 ACATCGAC 198 AACCATCG 82 1 'DFNA_2_Nonsyndromic_Hearing_Loss' TGG GCTATGG T CGACA 2 408 TTCCTACTC 221 GGACAAAG 04 17 'Bullous_ichthyosiform_erythroderma|not_provided' GGG TTCGGG T TACTC 2

139

413 ACTTACTTC 441 TCTGGCTTC 26 X 'not_provided' CGT CTCGT T ACTTC 2 415 CTCTTACTC 714 GGATAAGG 90 17 'Epidermolytic_palmoplantar_keratoderma|not_provided' AGG TGCAGG T TACTC 2 427 GGGCTGCT 961 CCCCCAGC 96 17 'Pseudohypoaldosteronism_type_2B' TGT CGGCTGT T TGCTC 2 428 CCTCACTC 430 GTGGAAAG 25 8 'Dystonia_6', '_torsion' GGG AAACGGG T ACTCG 2 430 CATTGACT 539 CGCTGAAC 20 21 'Homocystinuria', '_pyridoxine-responsive' TGG TTCGTGG T GACTC 2 436 CAGACTTT 140 CCTGCTTA 18 6 'Xeroderma_pigmentosum', '_variant_type' AGG AAGAAGG T CTTTC 2 449 CCGCTCTG 152 CCAGCTAC 48 17 "Alexander's_disease" AGA ATCGAGA T TCTGC 2 456 TACGACTT 452 'Epileptic_encephalopathy', '_early_infantile', CCAAGGTT 20 5 '_24|not_provided' AGA AATTAGA T ACTTC 2 472 CACTTCCTT 347 TTTGTGCAT 84 11 'Xeroderma_pigmentosum', '_group_E' TGA TCTGA T TCCTT 2 479 ACCTCAAC 467 TCTGGGAC 80 17 "Pyridoxal_5'-phosphate-dependent_epilepsy" TGG CTGCTGG T CAACT 2

140

484 TCTTACTCG 598 GTCCAAAT 32 13 'Hereditary_cancer-predisposing_syndrome' TGT GCCTGT T ACTCG 2 490 GACCCTCT 544 ACAGGTGG 19 12 'not_provided' AGG GAAGAGG T CTCTA 2 494 CTATACTTC 561 AGCAGATG 49 6 'not_provided' TGG GATTGG T ACTTC 2 505 GTTAAGTC 272 CGGAGGG 23 22 'Mitochondrial_DNA_depletion_syndrome_1_ (MNGIE_type)' CGT GCCGCCGT T AGTCC 2 520 GTGTGCTT 191 CAAGAGCC 25 13 'Congenital_disorder_of_glycosylation_type_1P' AGA CTGCAGA T GCTTC 2 534 GCACACTT 315 CCTCTGGG 56 X '2-methyl-3-hydroxybutyric_aciduria|not_provided' TGG AGGCTGG T ACTTC 2 541 ATGCTTCA 580 CCCAAAGG 96 20 'Idiopathic_hypercalcemia_of_infancy' AGT AGTAAGT T TTCAC 2 551 CTCCTCCTT 540 CACCTGCTT 47 19 'Familial_restrictive_cardiomyopathy_1' GGT GAGGT T TCCTT 2 556 GACTCCAG 561 ATGTAACT 48 2 'Deafness', '_autosomal_recessive_70' TGG CTTATGG T CCAGA 2 644 TGTACTTCA 854 'Frontotemporal_dementia_and/or_amyotrophic_lateral_scle GGGAGGG 66 12 rosis_4' TGA AAACTGA T CTTCA 2

67490391  11  'Somatotroph_adenoma'  AGT  CACCAGCTCGCACTGGCAGTAGT  T  AGCTC  2
68528994  17  'Carney_complex', '_type_1'  TGA  GTTCTTCACCTCTAAAATAATGA  T  TTCAC  2
72122804  17  'Acampomelic_campomelic_dysplasia'  GGT  CTGGTACTCGTAATCCGGGTGGT  T  TACTC  2
72792329  9  'Deafness', '_autosomal_recessive_7'  TGG  ACCTCGCTGGGAAACAATGGTGG  T  CGCTG  2
76161738  4  'Epilepsy', '_progressive_myoclonic_4', '_with_or_without_renal_failure'  TGA  CTCTTCCATCCGCTGTTCCCTGA  T  TCCAT  2
77574326  X  'ATR-X_syndrome'  GGT  TTGACTATCACCGTTTAGATGGT  T  CTATC  2
83509324  X  'Deafness', '_X-linked_2'  CGA  TTCTTTCCTCTTTTTGTCTTCGA  T  TTCCT  2
85465289  8  'CARBONIC_ANHYDRASE_II_VARIANT'  GGT  AGTCCTCATGCCAGTGCTCAGGT  T  CTCAT  2
87957958  10  'Macrocephaly/autism_syndrome'  TGA  CAGCCGTCACCTGTGTGTGGTGA  T  CGTCA  2
94606939  9  'Fructose-biphosphatase_deficiency'  AGG  GGAGTCCATTTTGGTGGACAAGG  T  TCCAT  2

100989165  10  'Autosomal_dominant_progressive_external_ophthalmoplegia_with_mitochondrial_DNA_deletions_3|Sensory_ataxic_neuropathy'  GGA  AACAACTCGGCGGCTTCCCAGGA  T  ACTCG  2
102855151  12  'Phenylketonuria|not_provided'  AGA  AGACGTTCCTCAGTTCCTGCAGA  T  GTTCC  2
102855309  12  'Phenylketonuria|not_provided'  TGG  CATGTATCCCACTCGAGGGATGG  T  TATCC  2
105802291  8  'Double_outlet_right_ventricle'  GGT  CTGCGCTCGCGTGTGCGCATGGT  T  GCTCG  2
107704344  7  "Pendred's_syndrome"  TGT  AAAGAATCCCAAAGAATTGATGT  T  AATCC  2
107902340  7  'Maple_syrup_urine_disease', '_type_3'  TGT  TCATTTTCCTCAATGCAGACTGT  T  TTTCC  2
107947388  4  'Spastic_paraplegia_56', '_autosomal_recessive'  TGT  TTCTCCATGAACCTTTTCTATGT  T  CCATG  2
108695042  X  'Alport_syndrome', '_X-linked_recessive'  TGT  CAGCTCCTATCCAAGCACTGTGT  T  TCCTA  2
111401157  X  'Heterotopia'  AGT  AACCAGCTCGGGGCGCACAAAGT  T  AGCTC  2
111401175  X  'Heterotopia'  TGT  AGTCCTCGTTCTCCCTGGCCTGT  T  CTCGT  2

111410271  X  'Heterotopia'  TGA  ACCTCGCAGGCACTGAGTAATGA  T  CGCAG  2
111685059  X  'Epileptic_encephalopathy', '_early_infantile', '_36'  TGA  GCTGCTTCGCCAGTTCCAGCTGA  T  CTTCG  2

112477719  12  'Noonan_syndrome|Noonan_syndrome_1|Juvenile_myelomonocytic_leukemia|Metachondromatosis|LEOPARD_syndrome_1'  TGA  ATATCTGCATTGATGTAATCTGA  T  CTGCA  2
114659617  7  'Speech-language_disorder_1'  GGA  AATTCACAGCTGGTTTACACGGA  T  CACAG  2
116120318  6  'Metaphyseal_chondrodysplasia', '_Schmid_type'  TGT  TATTTTCCATACCACGTGCATGT  T  TTCCA  2
116120345  6  'Metaphyseal_chondrodysplasia', '_Schmid_type'  GGA  TCTTTACTCGTCAGATACCAGGA  T  TACTC  2
116832816  11  'Hyperalphalipoproteinemia_2'  AGT  GTCCTCAACGGTGCTCCAGTAGT  T  TCAAC  2
116836019  11  'Familial_visceral_amyloidosis', '_Ostertag_type'  AGG  GCGCTCGGCCGCGCGCCTTGAGG  T  TCGGC  2
116895184  6  'Mitchell-Riley_syndrome'  AGA  GTTTCCTGGAAGCAAGCTAAAGA  T  CCTGG  2

122254274  3  'Hypocalcemia', '_autosomal_dominant_1', '_with_bartter_syndrome'  GGT  CCCCTCCTTTTGGGCTCGCTGGT  T  TCCTT  2
123428027  X  'Mental_retardation', '_X-linked', '_syndromic', '_wu_type'  AGA  CGCTGCTTCCCTGACTGTGGAGA  T  GCTTC  2
127292785  2  'Xeroderma_pigmentosum', '_complementation_group_b'  AGA  GACTCCTTGGTGGCTATTGCAGA  T  CCTTG  2
128323100  9  'Coenzyme_Q10_deficiency', '_primary', '_7'  CGG  GCTGTCGGCCGCCGGCTCCGCGG  T  TCGGC  2
131508913  9  'Limb-girdle_muscular_dystrophy-dystroglycanopathy', '_type_C1'  AGA  GAGAGCATCCTCTGTTTCAAAGA  T  GCATC  2
132330432  9  'Amyotrophic_lateral_sclerosis_type_4'  AGT  AACATCAGCCAGTGTACTTCAGT  T  TCAGC  2
132675363  3  'Spinocerebellar_ataxia', '_autosomal_recessive_24'  TGT  CTGCTGCTCCCTGCAATTTCTGT  T  TGCTC  2
135648501  7  'Nephrotic_syndrome', '_type_13'  TGT  TATAGTTCCATACAGGCTCTTGT  T  GTTCC  2
136207917  X  'Myopathy', '_reducing_body', '_X-linked', '_childhood-onset'  AGT  CGTGACTCGCCATGAGACCAAGT  T  ACTCG  2
136330012  3  'Propionic_acidemia'  AGG  ATATCTGCATGTTTTCTCCAAGG  T  CTGCA  2

147828056  5  'Hereditary_pancreatitis'  CGT  AATACTCATCCCAATGAATGCGT  T  CTCAT  2
149503326  X  'Mucopolysaccharidosis', '_MPS-II'  TGG  AAAGACTCTTCCCACCGACATGG  T  ACTCT  2
151851392  5  'Hyperekplexia_hereditary'  AGA  ACCTCGGGCAGAGATGCTCGAGA  T  CGGGC  2
154031364  X  'Rett_syndrome'  TGG  GATTCTGACTTCACGGTAACTGG  T  CTGAC  2
154172969  1  'Congenital_myopathy_with_fiber_type_disproportion|not_provided'  GGA  CAACTCACGAGCCACCTACAGGA  T  TCACG  2

154532698  X  'G6PD_IOWA|G6PD_IOWA_CITY|G6PD_SPRINGFIELD|G6PD_WALTER_REED|Anemia', '_nonspherocytic_hemolytic', '_due_to_G6PD_deficiency'  AGA  GCGCTCGCACTGCTGGTGGAAGA  T  TCGCA  2

154534489  X  'G6PD_TAIWAN-HAKKA_2|Favism', '_susceptibility_to|Anemia', '_nonspherocytic_hemolytic', '_due_to_G6PD_deficiency|not_specified'  TGG  GATGCGGTCCCAGCCTCTGCTGG  T  CGGTC  2
154765486  X  'Dyskeratosis_congenita_X-linked'  TGA  AGCAACTTCGGATTCAGGTTTGA  T  ACTTC  2

156791137  7  'Triphalangeal_thumb|Polydactyly', '_preaxial_II'  AGA  ATTTCCTGTAATAAACACTAAGA  T  CCTGT  2
166039475  2  'Severe_myoclonic_epilepsy_in_infancy'  CGT  CCAAGTCCTACCAGGCTAAGCGT  T  GTCCT  2
173232997  5  'Atrioventricular_septal_defect', '_somatic'  TGA  GATCTCGACCTGCGTGGACGTGA  T  TCGAC  2
177248178  5  'Sotos_syndrome_1'  TGT  CCCTCGCAAAGAAAAAATAATGT  T  CGCAA  2

178149760  5  'Dyskeratosis_congenita_autosomal_recessive_1|Dyskeratosis_congenita', '_autosomal_recessive_2'  AGG  GGCTCACGATGAGTGCCTGGAGG  T  CACGA  2
180806529  1  'Basal_ganglia_calcification', '_idiopathic', '_6'  GGG  CGTTCACGTGTCCCCCCTTTGGG  T  CACGT  2
186209263  4  'Bietti_crystalline_corneoretinal_dystrophy'  AGA  ATACAGTCCCTGGGGCCAGCAGA  T  AGTCC  2
198900385  1  'Acute_myeloid_leukemia_with_maturation'  AGA  AGCACTTCGATTGAACTAAAAGA  T  CTTCG  2
219421340  2  'Myofibrillar_myopathy_1|not_provided'  GGA  AATCGTCCTGCAGGAGAGGGGGA  T  GTCCT  2
219927941  1  'Hypermanganesemia_with_dystonia_1'  AGG  CGGCGCTTCCGGGGGGCCTCAGG  T  GCTTC  2

247424375  1  'Familial_cold_urticaria|Chronic_infantile_neurological', '_cutaneous_and_articular_syndrome'  GGA  GTGCCTCTGACGAGCACATAGGA  T  CTCTG  2
1610780  6  'Axenfeld-Rieger_syndrome_type_3'  GGG  GGACCGCTCCCCCTTCTACCGGG  T  CGCTC  3
2114489  16  'Polycystic_kidney_disease', '_adult_type'  TGG  CTCAGCCTCGGTGCTCCAGGTGG  T  GCCTC  3
4699813  20  'Gerstmann-Straussler-Scheinker_syndrome|Genetic_prion_diseases'  CGT  GAGAACTCCACCGAGACCGACGT  T  ACTCC  3
5226635  11  'Hemoglobinopathy'  TGA  GGCACCTCTGCCACACTGAGTGA  T  CCTCT  3
6469444  1  'Distal_spinal_muscular_atrophy', '_autosomal_recessive_4'  TGA  GTCCTCCCTCCTTATCTACCTGA  T  TCCCT  3
11113701  19  'Familial_hypercholesterolemia'  TGG  CGTTTCCCTCTTCACGCCCTTGG  T  TCCCT  3
11116197  19  'Familial_hypercholesterolemia'  TGT  GATGCCATCGGGCCACTGAATGT  T  CCATC  3
13257474  19  'Episodic_ataxia_type_2'  TGT  ATTTCCTACGTCGTCTACTTTGT  T  CCTAC  3

16188907  16  'Pseudoxanthoma_elasticum'  AGT  AAAGCCTCTGTGACTCTCACAGT  T  CCTCT  3
16202069  16  'Pseudoxanthoma_elasticum'  CGT  ATGTCCTGCTGCTCAAACAGCGT  T  CCTGC  3
18507158  X  'Early_infantile_epileptic_encephalopathy_2'  AGG  TACCTCCACCTACAACCCCAAGG  T  TCCAC  3
20186338  X  'Coffin-Lowry_syndrome'  CGA  TCCCTTCCCAAGGAAAAGATCGA  T  TTCCC  3
24122943  3  'Thyroid_hormone_resistance', '_generalized', '_autosomal_dominant'  CGG  CCACCTCCATGTGCAGGAAGCGG  T  CTCCA  3
25123975  7  'Thrombocytopenia_4'  AGA  CTCTCACACAGCCGCCAATAAGA  T  CACAC  3
25234313  2  'Tatton-Brown-rahman_syndrome'  AGT  CCTCTCCGCTCCGCTGAAGGAGT  T  TCCGC  3

25891855  21  "Alzheimer's_disease|Alzheimer_disease", '_type_1|Cerebral_amyloid_angiopathy', '_APP-related|not_provided'  TGA  ATCTCCTGCAAAGAACACCTTGA  T  CCTGC  3

35657948  9  'Metaphyseal_chondrodysplasia', '_McKusick_type|Metaphyseal_dysplasia_without_hypotrichosis|Anauxetic_dysplasia|not_provided'  AGG  GGACTTCCCCCTAGGCGGAAAGG  T  TTCCC  3
38369806  X  'not_provided'  AGT  TTTGCCTTCATTGCAAGGGAAGT  T  CCTTC  3
38369817  X  'Ornithine_carbamoyltransferase_deficiency|not_provided'  AGG  AAGGACTCCCCTTGCAATAAAGG  T  ACTCC  3
38408951  X  'not_provided'  GGA  CAGACACTCGGATAAGCATGGGA  T  CACTC  3
38411923  X  'not_provided'  GGG  TCCACTCCTTCTGGCTTTCTGGG  T  CTCCT  3
38710286  19  'Focal_segmental_glomerulosclerosis_1'  GGG  TATGGCCTCCTCGTCGGGCCGGG  T  GCCTC  3

41224645  3  'Transitional_cell_carcinoma_of_the_bladder|Adenocarcinoma_of_lung|Malignant_melanoma_of_skin'  AGG  GCTCCTCCTCTGAGTGGTAAAGG  T  CTCCT  3
44150990  7  'Maturity-onset_diabetes_of_the_young', '__type_2|not_provided'  TGA  CTTCACCTCCTCCTTTCCTGTGA  T  ACCTC  3
44915231  17  "Alexander's_disease"  TGG  GCGAACCTCCTCGATGTAGCTGG  T  ACCTC  3

47683803  4  'Preeclampsia/eclampsia_5'  AGT  AAGGCACTCGCCTGTGTGACAGT  T  CACTC  3
55021095  X  'Hereditary_sideroblastic_anemia'  CGA  ATGATCACCTGGGCATGAGCCGA  T  TCACC  3
71785051  10  'Deafness', '_autosomal_recessive_12'  CGG  CCTCACCTCCAACATCACTGCGG  T  ACCTC  3
74674964  15  'Mental_retardation', '_autosomal_recessive_50'  TGA  TCACCTCCAGGTGAGTGTCCTGA  T  CTCCA  3
74719067  16  'Spastic_paraplegia_35'  TGA  CCACCGCTCCCTGTTCCACATGA  T  CGCTC  3
79557363  10  'Idiopathic_fibrosing_alveolitis', '_chronic_form'  TGG  GGAGACTCCCGCTACTCAGATGG  T  ACTCC  3
107690125  7  "Pendred's_syndrome|Enlarged_vestibular_aqueduct_syndrome|not_provided"  GGA  GAATCCCTAAGGAAGAGACTGGA  T  CCCTA  3

133362924  10  'Mitochondrial_short-chain_enoyl-coa_hydratase_1_deficiency|not_provided'  GGA  TCTTCCCGGTCATCCTGGCAGGA  T  CCCGG  3
136432986  9  'Joubert_syndrome'  CGT  CACTCCCTTCCTCTTCATCACGT  T  CCCTT  3

138827609  5  'Macular_dystrophy', '_patterned', '_2'  CGT  GGCTGCCTCGATGGCCGACTCGT  T  GCCTC  3
149498228  X  'Mucopolysaccharidosis', '_MPS-II'  AGA  GGGCACCTCGCCTGACAAACAGA  T  ACCTC  3
150660450  X  'Severe_X-linked_myotubular_myopathy'  TGA  GGTACTTCCTTATTCAACTGTGA  T  CTTCC  3
150951508  7  'Congenital_long_QT_syndrome|not_provided'  TGG  GACGTCGCCGAAGCCCACACTGG  T  TCGCC  3
153934388  X  'N-terminal_acetyltransferase_deficiency'  TGG  TGGCCTTCCCTGGCCCCAGGTGG  T  CTTCC  3
160736976  6  'Dysplasminogenemia'  AGT  CCACATCCCTGGCCCTGGCAAGT  T  ATCCC  3
177994185  5  'Pituitary_hormone_deficiency', '_combined_2'  AGT  GTCAGCCTCTGGGAGGAACCAGT  T  GCCTC  3
224503649  2  'Pseudohypoaldosteronism', '_type_2'  AGA  ACACACCTCACCTTTAACTTAGA  T  ACCTC  3
247425329  1  'Familial_cold_urticaria'  GGA  CAATCCCAGCTGGCTGGGCTGGA  T  CCCAG  3
26625603  22  'Cataract_23', '_multiple_types'  TGT  CCTCCTCCCGGCCTGCGGCCTGT  T  CTCCC  4

57765320  12  'Vitamin_D-dependent_rickets', '_type_1'  CGA  AAAATCCCCCCGCCACGTCCCGA  T  TCCCC  4
69964880  3  'Waardenburg_syndrome_type_2A'  CGG  ACTTCCCCTTATTCCATCCACGG  T  CCCCT  4
73641810  6  'Salla_disease|not_provided'  TGG  CATTTCCCCCCCTATTTTGCTGG  T  TCCCC  4
154969468  X  'Hereditary_factor_VIII_deficiency_disease'  AGT  GTGACCTCCGAGGAATATTGAGT  T  CCTCC  4

References

Aynaud, M.-M., Suspène, R., Vidalain, P.-O., Mussil, B., Guétard, D., Tangy, F., Wain-Hobson,

S., and Vartanian, J.-P. (2012). Human Tribbles 3 protects nuclear DNA from cytidine deamination by APOBEC3A. J. Biol. Chem. 287, 39182–39192.

Bae, S., Park, J., and Kim, J.-S. (2014). Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinforma. Oxf. Engl. 30,

1473–1475.

Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P., Moineau, S., Romero, D.A., and Horvath, P. (2007). CRISPR provides acquired resistance against viruses in prokaryotes.

Science 315, 1709–1712.

Bohn, M.-F., Shandilya, S.M.D., Silvas, T.V., Nalivaika, E.A., Kouno, T., Kelch, B.A., Ryder,

S.P., Kurt-Yilmaz, N., Somasundaran, M., and Schiffer, C.A. (2015). The ssDNA Mutator

APOBEC3A Is Regulated by Cooperative Dimerization. Struct. Lond. Engl. 1993 23, 903–911.

Bolukbasi, M.F., Gupta, A., Oikemus, S., Derr, A.G., Garber, M., Brodsky, M.H., Zhu, L.J., and

Wolfe, S.A. (2015). DNA-binding-domain fusions enhance the targeting range and precision of

Cas9. Nat. Methods 12, 1150–1156.

Brinkman, E.K., Chen, T., Amendola, M., and van Steensel, B. (2014). Easy quantitative assessment of genome editing by sequence trace decomposition. Nucleic Acids Res. 42, e168.

Cao, A., and Galanello, R. (2010). Beta-thalassemia. Genet. Med. Off. J. Am. Coll. Med. Genet.

12, 61–76.

154

Casini, A., Olivieri, M., Petris, G., Montagna, C., Reginato, G., Maule, G., Lorenzin, F., Prandi,

D., Romanel, A., Demichelis, F., et al. (2018). A highly specific SpCas9 variant is identified by in vivo screening in yeast. Nat. Biotechnol. 36, 265–271.

Chang, H.H.Y., Pannunzio, N.R., Adachi, N., and Lieber, M.R. (2017). Non-homologous DNA end joining and alternative pathways to double-strand break repair. Nat. Rev. Mol. Cell Biol. 18,

495–506.

Chapman, J.R., Taylor, M.R.G., and Boulton, S.J. (2012). Playing the End Game: DNA Double-

Strand Break Repair Pathway Choice. Mol. Cell 47, 497–510.

Chen, J.S., Dagdas, Y.S., Kleinstiver, B.P., Welch, M.M., Sousa, A.A., Harrington, L.B.,

Sternberg, S.H., Joung, J.K., Yildiz, A., and Doudna, J.A. (2017). Enhanced proofreading governs

CRISPR-Cas9 targeting accuracy. Nature 550, 407–410.

Cho, S.W., Kim, S., Kim, Y., Kweon, J., Kim, H.S., Bae, S., and Kim, J.-S. (2014). Analysis of off-target effects of CRISPR/Cas-derived RNA-guided endonucleases and nickases. Genome Res.

24, 132–141.

Chu, V.T., Weber, T., Wefers, B., Wurst, W., Sander, S., Rajewsky, K., and Kühn, R. (2015).

Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells. Nat. Biotechnol. 33, 543–548.

Cong, L., Ran, F.A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P.D., Wu, X., Jiang, W.,

Marraffini, L.A., et al. (2013). Multiplex genome engineering using CRISPR/Cas systems. Science

339, 819–823.

155

Cox, D.B.T., Platt, R.J., and Zhang, F. (2015a). Therapeutic genome editing: prospects and challenges. Nat. Med. 21, 121–131.

Cox, D.B.T., Platt, R.J., and Zhang, F. (2015b). Therapeutic genome editing: prospects and challenges. Nat. Med. 21, 121.

Crosetto, N., Mitra, A., Silva, M.J., Bienko, M., Dojer, N., Wang, Q., Karaca, E., Chiarle, R.,

Skrzypczak, M., Ginalski, K., et al. (2013). Nucleotide-resolution DNA double-strand breaks mapping by next-generation sequencing. Nat. Methods 10, 361–365.

Doudna, J.A., and Charpentier, E. (2014). The new frontier of genome engineering with CRISPR-

Cas9. Science 346, 1258096.

Dyck, E.V., Stasiak, A.Z., Stasiak, A., and West, S.C. (1999). Binding of double-strand breaks in

DNA by human Rad52 protein. Nature 398, 728–731.

Ellis, B.L., Hirsch, M.L., Porter, S.N., Samulski, R.J., and Porteus, M.H. (2013). Zinc-finger nuclease-mediated gene correction using single AAV vector transduction and enhancement by

Food and Drug Administration-approved drugs. Gene Ther. 20, 35–42.

Eng, B., Walker, L., Nakamura, L.M., Hoppe, C., Azimi, M., Lee, H., and Waye, J.S. (2007).

Three new beta-globin gene promoter mutations identified through newborn screening.

Hemoglobin 31, 129–134.

Esteller, M. (2011). Non-coding RNAs in human disease. Nat. Rev. Genet. 12, 861–874.

Fineran, P.C., and Charpentier, E. (2012). Memory of viral infections by CRISPR-Cas adaptive immune systems: acquisition of new information. Virology 434, 202–209.

156

Frock, R.L., Hu, J., Meyers, R.M., Ho, Y.-J., Kii, E., and Alt, F.W. (2015). Genome-wide detection of DNA double-stranded breaks induced by engineered nucleases. Nat. Biotechnol. 33, 179–186.

Fu, Y., Sander, J.D., Reyon, D., Cascio, V.M., and Joung, J.K. (2014). Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nat. Biotechnol. 32, 279–284.

Gabriel, R., Lombardo, A., Arens, A., Miller, J.C., Genovese, P., Kaeppel, C., Nowrouzi, A.,

Bartholomae, C.C., Wang, J., Friedman, G., et al. (2011). An unbiased genome-wide analysis of zinc-finger nuclease specificity. Nat. Biotechnol. 29, 816–823.

Garneau, J.E., Dupuis, M.-È., Villion, M., Romero, D.A., Barrangou, R., Boyaval, P., Fremaux,

C., Horvath, P., Magadán, A.H., and Moineau, S. (2010). The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature 468, 67–71.

Gasiunas, G., Barrangou, R., Horvath, P., and Siksnys, V. (2012). Cas9–crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc. Natl. Acad. Sci.

109, E2579–E2586.

Gaudelli, N.M., Komor, A.C., Rees, H.A., Packer, M.S., Badran, A.H., Bryson, D.I., and Liu, D.R.

(2017). Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage.

Nature 551, 464–471.

Grawunder, U., Wilm, M., Wu, X., Kulesza, P., Wilson, T.E., Mann, M., and Lieber, M.R. (1997).

Activity of DNA ligase IV stimulated by complex formation with XRCC4 protein in mammalian cells. Nature 388, 492–495.

157

Greco, G.E., Matsumoto, Y., Brooks, R.C., Lu, Z., Lieber, M.R., and Tomkinson, A.E. (2016).

SCR7 is neither a selective nor a potent inhibitor of human DNA ligase IV. DNA Repair 43, 18–

23.

Guilinger, J.P., Pattanayak, V., Reyon, D., Tsai, S.Q., Sander, J.D., Joung, J.K., and Liu, D.R.

(2014a). Broad specificity profiling of TALENs results in engineered nucleases with improved

DNA-cleavage specificity. Nat. Methods 11, 429–435.

Guilinger, J.P., Thompson, D.B., and Liu, D.R. (2014b). Fusion of catalytically inactive Cas9 to

FokI nuclease improves the specificity of genome modification. Nat. Biotechnol. 32, 577–582.

Gutschner, T., Haemmerle, M., Genovese, G., Draetta, G.F., and Chin, L. (2016). Post- translational Regulation of Cas9 during G1 Enhances Homology-Directed Repair. Cell Rep. 14,

1555–1566.

Haber, J.E. (2012). Mating-Type Genes and MAT Switching in Saccharomyces cerevisiae.

Genetics 191, 33–64.

Harris, R.S., Petersen-Mahrt, S.K., and Neuberger, M.S. (2002). RNA editing enzyme APOBEC1 and some of its homologs can act as DNA mutators. Mol. Cell 10, 1247–1253.

Helma, J., Cardoso, M.C., Muyldermans, S., and Leonhardt, H. (2015). Nanobodies and recombinant binders in cell biology. J. Cell Biol. 209, 633–644.

Hershey, A.D., and Chase, M. (1952). Independent functions of viral protein and nucleic acid in growth of bacteriophage. J. Gen. Physiol. 36, 39–56.

158

Hess, G.T., Frésard, L., Han, K., Lee, C.H., Li, A., Cimprich, K.A., Montgomery, S.B., and Bassik,

M.C. (2016a). Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells. Nat. Methods 13, 1036–1042.

Hess, G.T., Frésard, L., Han, K., Lee, C.H., Li, A., Cimprich, K.A., Montgomery, S.B., and Bassik,

M.C. (2016b). Directed evolution using dCas9-targeted somatic hypermutation in mammalian cells. Nat. Methods 13, 1036–1042.

Hockemeyer, D., Wang, H., Kiani, S., Lai, C.S., Gao, Q., Cassady, J.P., Cost, G.J., Zhang, L.,

Santiago, Y., Miller, J.C., et al. (2011). Genetic engineering of human pluripotent cells using

TALE nucleases. Nat. Biotechnol. 29, 731–734.

Hocquemiller, M., Giersch, L., Audrain, M., Parker, S., and Cartier, N. (2016). Adeno-Associated

Virus-Based Gene Therapy for CNS Diseases. Hum. Gene Ther. 27, 478–496.

Holtz, C.M., Sadler, H.A., and Mansky, L.M. (2013). APOBEC3G cytosine deamination hotspots are defined by both sequence context and single-stranded DNA secondary structure. Nucleic Acids

Res. 41, 6139–6148.

Horvath, P., and Barrangou, R. (2010). CRISPR/Cas, the immune system of bacteria and archaea.

Science 327, 167–170.

Howden, S.E., McColl, B., Glaser, A., Vadolas, J., Petrou, S., Little, M.H., Elefanty, A.G., and

Stanley, E.G. (2016). A Cas9 Variant for Efficient Generation of Indel-Free Knockin or Gene-

Corrected Human Pluripotent Stem Cells. Stem Cell Rep. 7, 508–517.

159

Hu, J.H., Miller, S.M., Geurts, M.H., Tang, W., Chen, L., Sun, N., Zeina, C.M., Gao, X., Rees,

H.A., Lin, Z., et al. (2018). Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature.

Jasin, M., and Haber, J.E. (2016). The democratization of gene editing: Insights from site-specific cleavage and double-strand break repair. DNA Repair 44, 6–16.

Jeggo, P.A. (1998). DNA breakage and repair. Adv. Genet. 38, 185–218.

Jinek, M., Chylinski, K., Fonfara, I., Hauer, M., Doudna, J.A., and Charpentier, E. (2012). A

Programmable Dual-RNA–Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science

337, 816–821.

Kim, D., Bae, S., Park, J., Kim, E., Kim, S., Yu, H.R., Hwang, J., Kim, J.-I., and Kim, J.-S. (2015).

Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells. Nat.

Methods 12, 237–243, 1 p following 243.

Kim, D., Lim, K., Kim, S.-T., Yoon, S., Kim, K., Ryu, S.-M., and Kim, J.-S. (2017a). Genome- wide target specificities of CRISPR RNA-guided programmable deaminases. Nat. Biotechnol. 35,

475–480.

Kim, D., Lim, K., Kim, S.-T., Yoon, S.-H., Kim, K., Ryu, S.-M., and Kim, J.-S. (2017b). Genome- wide target specificities of CRISPR RNA-guided programmable deaminases. Nat. Biotechnol. 35,

475–480.

160

Kim, K., Ryu, S.-M., Kim, S.-T., Baek, G., Kim, D., Lim, K., Chung, E., Kim, S., and Kim, J.-S.

(2017c). Highly efficient RNA-guided base editing in mouse embryos. Nat. Biotechnol. 35, 435–

437.

Kim, S., Bae, T., Hwang, J., and Kim, J.-S. (2017d). Rescue of high-specificity Cas9 variants using sgRNAs with matched 5’ nucleotides. Genome Biol. 18, 218.

Kim, Y.B., Komor, A.C., Levy, J.M., Packer, M.S., Zhao, K.T., and Liu, D.R. (2017e). Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371–376.

Kim, Y.B., Komor, A.C., Levy, J.M., Packer, M.S., Zhao, K.T., and Liu, D.R. (2017f). Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371–376.

Kleinstiver, B.P., Prew, M.S., Tsai, S.Q., Topkar, V.V., Nguyen, N.T., Zheng, Z., Gonzales,

A.P.W., Li, Z., Peterson, R.T., Yeh, J.-R.J., et al. (2015). Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481–485.

Kleinstiver, B.P., Pattanayak, V., Prew, M.S., Tsai, S.Q., Nguyen, N.T., Zheng, Z., and Joung, J.K.

(2016). High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects.

Nature 529, 490–495.

Komor, A.C., Kim, Y.B., Packer, M.S., Zuris, J.A., and Liu, D.R. (2016a). Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424.

161

Komor, A.C., Kim, Y.B., Packer, M.S., Zuris, J.A., and Liu, D.R. (2016b). Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424.

Komor, A.C., Zhao, K.T., Packer, M.S., Gaudelli, N.M., Waterbury, A.L., Koblan, L.W., Kim,

Y.B., Badran, A.H., and Liu, D.R. (2017). Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3, eaao4774.

Kouno, T., Silvas, T.V., Hilbert, B.J., Shandilya, S.M.D., Bohn, M.F., Kelch, B.A., Royer, W.E.,

Somasundaran, M., Kurt Yilmaz, N., Matsuo, H., et al. (2017). Crystal structure of APOBEC3A bound to single-stranded DNA reveals structural basis for cytidine deamination and specificity.

Nat. Commun. 8.

Kulcsár, P.I., Tálas, A., Huszár, K., Ligeti, Z., Tóth, E., Weinhardt, N., Fodor, E., and Welker, E.

(2017). Crossing enhanced and high fidelity SpCas9 nucleases to optimize specificity and cleavage. Genome Biol. 18, 190.

Kuscu, C., Arslan, S., Singh, R., Thorpe, J., and Adli, M. (2014). Genome-wide analysis reveals characteristics of off-target sites bound by the Cas9 endonuclease. Nat. Biotechnol. 32, 677–683.

Landrum, M.J., Lee, J.M., Benson, M., Brown, G., Chao, C., Chitipiralla, S., Gu, B., Hart, J.,

Hoffman, D., Hoover, J., et al. (2016). ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862-868.

Lee, J.K., Jeong, E., Lee, J., Jung, M., Shin, E., Kim, Y., Lee, K., Kim, D., Kim, J.-S., and Kim,

S. (2017). Directed evolution of CRISPR-Cas9 to increase its specificity. BioRxiv 237040.

162

Li, Z., Li, L., Yao, Y., Li, N., Li, Y., Zhang, Z., Yan, F., Qiu, H., Wu, C., and Zhang, Z. (2015). A novel promoter mutation (HBB: c.-75G>T) was identified as a cause of β(+)-thalassemia.

Hemoglobin 39, 115–120.

Liang, P., Ding, C., Sun, H., Xie, X., Xu, Y., Zhang, X., Sun, Y., Xiong, Y., Ma, W., Liu, Y., et al. (2017). Correction of β-thalassemia mutant by base editor in human embryos. Protein Cell 8,

811–822.

Lieber, M.R. (2010). The mechanism of double-strand DNA break repair by the nonhomologous

DNA end-joining pathway. Annu. Rev. Biochem. 79, 181–211.

Lieber, M.R., and Karanjawala, Z.E. (2004). Ageing, repetitive genomes and DNA damage. Nat.

Rev. Mol. Cell Biol. 5, 69–75.

Lin, S., Staahl, B.T., Alla, R.K., and Doudna, J.A. Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery. ELife 3.

Logue, E.C., Bloch, N., Dhuey, E., Zhang, R., Cao, P., Herate, C., Chauveau, L., Hubbard, S.R., and Landau, N.R. (2014). A DNA sequence recognition loop on APOBEC3A controls substrate specificity. PloS One 9, e97062.

Maeder, M.L., Thibodeau-Beganny, S., Osiak, A., Wright, D.A., Anthony, R.M., Eichtinger, M.,

Jiang, T., Foley, J.E., Winfrey, R.J., Townsend, J.A., et al. (2008). Rapid “open-source” engineering of customized zinc-finger nucleases for highly efficient gene modification. Mol. Cell

31, 294–301.

163

Mali, P., Yang, L., Esvelt, K.M., Aach, J., Guell, M., DiCarlo, J.E., Norville, J.E., and Church,

G.M. (2013a). RNA-guided human genome engineering via Cas9. Science 339, 823–826.

Mali, P., Aach, J., Stranges, P.B., Esvelt, K.M., Moosburner, M., Kosuri, S., Yang, L., and Church,

G.M. (2013b). CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat. Biotechnol. 31, 833–838.

Maruyama, T., Dougan, S.K., Truttmann, M.C., Bilate, A.M., Ingram, J.R., and Ploegh, H.L.

(2015). Increasing the efficiency of precise genome editing with CRISPR-Cas9 by inhibition of nonhomologous end joining. Nat. Biotechnol. 33, 538–542.

Mojica, F.J., Juez, G., and Rodríguez-Valera, F. (1993). Transcription at different salinities of

Haloferax mediterranei sequences adjacent to partially modified PstI sites. Mol. Microbiol. 9, 613–

621.

Mojica, F.J.M., Díez-Villaseñor, C., García-Martínez, J., and Soria, E. (2005). Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J. Mol.

Evol. 60, 174–182.

Moore, J.K., and Haber, J.E. (1996). Cell cycle and genetic requirements of two pathways of nonhomologous end-joining repair of double-strand breaks in Saccharomyces cerevisiae. Mol.

Cell. Biol. 16, 2164–2173.

Naso, M.F., Tomkowicz, B., Perry, W.L., and Strohl, W.R. (2017). Adeno-Associated Virus

(AAV) as a Vector for Gene Therapy. Biodrugs 31, 317–334.

164

Nathwani, A.C., Reiss, U.M., Tuddenham, E.G.D., Rosales, C., Chowdary, P., McIntosh, J., Della

Peruta, M., Lheriteau, E., Patel, N., Raj, D., et al. (2014). Long-term safety and efficacy of factor

IX gene therapy in hemophilia B. N. Engl. J. Med. 371, 1994–2004.

Nishida, K., Arazoe, T., Yachie, N., Banno, S., Kakimoto, M., Tabata, M., Mochizuki, M., Miyabe,

A., Araki, M., Hara, K.Y., et al. (2016). Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353.

O’Driscoll, M., and Jeggo, P.A. (2006). The role of double-strand break repair — insights from human genetics. Nat. Rev. Genet. 7, 45–54.

Orthwein, A., Fradet-Turcotte, A., Noordermeer, S.M., Canny, M.D., Brun, C.M., Strecker, J.,

Escribano-Diaz, C., and Durocher, D. (2014). Mitosis inhibits DNA double-strand break repair to guard against telomere fusions. Science 344, 189–193.

Osborn, M.J., Starker, C.G., McElroy, A.N., Webber, B.R., Riddle, M.J., Xia, L., DeFeo, A.P.,

Gabriel, R., Schmidt, M., von Kalle, C., et al. (2013). TALEN-based gene correction for epidermolysis bullosa. Mol. Ther. J. Am. Soc. Gene Ther. 21, 1151–1159.

Paix, A., Folkmann, A., Goldman, D.H., Kulaga, H., Grzelak, M.J., Rasoloson, D., Paidemarry,

S., Green, R., Reed, R.R., and Seydoux, G. (2017). Precision genome editing using synthesis- dependent repair of Cas9-induced DNA breaks. Proc. Natl. Acad. Sci. 114, E10745–E10754.

Pattanayak, V., Ramirez, C.L., Joung, J.K., and Liu, D.R. (2011). Revealing off-target cleavage specificities of zinc-finger nucleases by in vitro selection. Nat. Methods 8, 765–770.

165

Pattanayak, V., Guilinger, J.P., and Liu, D.R. (2014). Determining the specificities of TALENs,

Cas9, and other genome editing enzymes. Methods Enzymol. 546, 47–78.

Pavletich, N.P., and Pabo, C.O. (1991). Zinc Finger-DNA Recognition: Crystal Structure of a

Zif268-DNA Complex at $\overset{\circ}{\mathrm A}$. Science 252, 809–817.

Perez, E.E., Wang, J., Miller, J.C., Jouvenot, Y., Kim, K.A., Liu, O., Wang, N., Lee, G.,

Bartsevich, V.V., Lee, Y.-L., et al. (2008). Establishment of HIV-1 resistance in CD4+ T cells by genome editing using zinc-finger nucleases. Nat. Biotechnol. 26, 808–816.

Pinello, L., Canver, M.C., Hoban, M.D., Orkin, S.H., Kohn, D.B., Bauer, D.E., and Yuan, G.-C.

(2016). Analyzing CRISPR genome-editing experiments with CRISPResso. Nat. Biotechnol. 34,

695–697.

Ponnazhagan, S., Mukherjee, P., Wang, X.S., Qing, K., Kube, D.M., Mah, C., Kurpad, C., Yoder,

M.C., Srour, E.F., and Srivastava, A. (1997). Adeno-associated virus type 2-mediated transduction in primary human bone marrow-derived CD34+ hematopoietic progenitor cells: donor variation and correlation of transgene expression with cellular differentiation. J. Virol. 71, 8262–8267.

Ptashne, M. (2014). The Chemistry of Regulation of Genes and Other Things. J. Biol. Chem. jbc.X114.547323.

Ran, F.A., Hsu, P.D., Wright, J., Agarwala, V., Scott, D.A., and Zhang, F. (2013a). Genome engineering using the CRISPR-Cas9 system. Nat. Protoc. 8, 2281–2308.

166

Ran, F.A., Hsu, P.D., Lin, C.-Y., Gootenberg, J.S., Konermann, S., Trevino, A., Scott, D.A., Inoue,

A., Matoba, S., Zhang, Y., et al. (2013b). Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell 154, 1380–1389.

Ran, F.A., Cong, L., Yan, W.X., Scott, D.A., Gootenberg, J.S., Kriz, A.J., Zetsche, B., Shalem,

O., Wu, X., Makarova, K.S., et al. (2015a). In vivo genome editing using Staphylococcus aureus

Cas9. Nature 520, 186–191.

Ran, F.A., Cong, L., Yan, W.X., Scott, D.A., Gootenberg, J.S., Kriz, A.J., Zetsche, B., Shalem,

O., Wu, X., Makarova, K.S., et al. (2015b). In vivo genome editing using Staphylococcus aureus

Cas9. Nature 520, 186–191.

Rathore, A., Carpenter, M.A., Demir, Ö., Ikeda, T., Li, M., Shaban, N.M., Law, E.K., Anokhin,

D., Brown, W.L., Amaro, R.E., et al. (2013). The Local Dinucleotide Preference of APOBEC3G

Can Be Altered from 5′-CC to 5′-TC by a Single Amino Acid Substitution. J. Mol. Biol. 425,

4442–4454.

Rebhandl, S., Huemer, M., Greil, R., and Geisberger, R. (2015). AID/APOBEC deaminases and cancer. Oncoscience 2, 320–333.

Rees, H.A., Komor, A.C., Yeh, W.-H., Caetano-Lopes, J., Warman, M., Edge, A.S.B., and Liu,

D.R. (2017a). Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery. Nat. Commun. 8, 15790.

Rees, H.A., Komor, A.C., Yeh, W.-H., Caetano-Lopes, J., Warman, M., Edge, A.S.B., and Liu,

D.R. (2017b). Improving the DNA specificity and applicability of base editing through protein engineering and protein delivery. Nat. Commun. 8, 15790.

167

Richardson, C.D., Ray, G.J., DeWitt, M.A., Curie, G.L., and Corn, J.E. (2016). Enhancing homology-directed genome editing by catalytically active and inactive CRISPR-Cas9 using asymmetric donor DNA. Nat. Biotechnol. 34, 339–344.

Roth, S.H., Danan-Gotthold, M., Ben-Izhak, M., Rechavi, G., Cohen, C.J., Louzoun, Y., and

Levanon, E.Y. (2018). Increased RNA Editing May Provide a Source for Autoantigens in Systemic

Lupus Erythematosus. Cell Rep. 23, 50–57.

Rouet, P., Smih, F., and Jasin, M. (1994). Introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease. Mol. Cell. Biol. 14, 8096–8106.

Sander, J.D., and Joung, J.K. (2014). CRISPR-Cas systems for editing, regulating and targeting genomes. Nat. Biotechnol. 32, 347.

Santos-Pereira, J.M., and Aguilera, A. (2015). R loops: new modulators of genome dynamics and function. Nat. Rev. Genet. 16, 583–597.

Shi, K., Carpenter, M.A., Banerjee, S., Shaban, N.M., Kurahashi, K., Salamango, D.J., McCann, J.L., Starrett, G.J., Duffy, J.V., Demir, Ö., et al. (2017). Structural basis for targeted DNA cytosine deamination and mutagenesis by APOBEC3A and APOBEC3B. Nat. Struct. Mol. Biol. 24, 131–139.

Shimatani, Z., Kashojiya, S., Takayama, M., Terada, R., Arazoe, T., Ishii, H., Teramura, H., Yamamoto, T., Komatsu, H., Miura, K., et al. (2017). Targeted base editing in rice and tomato using a CRISPR-Cas9 cytidine deaminase fusion. Nat. Biotechnol. 35, 441–443.

Shinohara, M., Io, K., Shindo, K., Matsui, M., Sakamoto, T., Tada, K., Kobayashi, M., Kadowaki, N., and Takaori-Kondo, A. (2012). APOBEC3B can impair genomic stability by inducing base substitutions in genomic DNA in human cells. Sci. Rep. 2, 806.

Slaymaker, I.M., Gao, L., Zetsche, B., Scott, D.A., Yan, W.X., and Zhang, F. (2016). Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84–88.

Sonoda, E., Hochegger, H., Saberi, A., Taniguchi, Y., and Takeda, S. (2006). Differential usage of non-homologous end-joining and homologous recombination in double strand break repair. DNA Repair 5, 1021–1029.

Suspène, R., Henry, M., Guillot, S., Wain-Hobson, S., and Vartanian, J.-P. (2005). Recovery of APOBEC3-edited human immunodeficiency virus G→A hypermutants by differential DNA denaturation PCR. J. Gen. Virol. 86, 125–129.

Tang, J.C., Drokhlyansky, E., Etemad, B., Rudolph, S., Guo, B., Wang, S., Ellis, E.G., Li, J.Z., and Cepko, C.L. (2016). Detection and manipulation of live antigen-expressing cells using conditionally stable nanobodies. eLife 5, e15312.

Tsai, S.Q., and Joung, J.K. (2016). Defining and improving the genome-wide specificities of CRISPR-Cas9 nucleases. Nat. Rev. Genet. 17, 300–312.

Tsai, S.Q., Wyvekens, N., Khayter, C., Foden, J.A., Thapar, V., Reyon, D., Goodwin, M.J., Aryee, M.J., and Joung, J.K. (2014). Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat. Biotechnol. 32, 569–576.

Tsai, S.Q., Zheng, Z., Nguyen, N.T., Liebers, M., Topkar, V.V., Thapar, V., Wyvekens, N., Khayter, C., Iafrate, A.J., Le, L.P., et al. (2015). GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187–197.

Tsai, S.Q., Nguyen, N.T., Malagon-Lopez, J., Topkar, V.V., Aryee, M.J., and Joung, J.K. (2017). CIRCLE-seq: a highly sensitive in vitro screen for genome-wide CRISPR-Cas9 nuclease off-targets. Nat. Methods 14, 607–614.

Vasileva, A., and Jessberger, R. (2005). Precise hit: adeno-associated virus in gene targeting. Nat. Rev. Microbiol. 3, 837–847.

Walther, W., and Stein, U. (1996). Cell type specific and inducible promoters for vectors in gene therapy as an approach for cell targeting. J. Mol. Med. Berl. Ger. 74, 379–392.

Wang, D., Zhong, L., Nahid, M.A., and Gao, G. (2014). The potential of adeno-associated viral vectors for gene delivery to muscle tissue. Expert Opin. Drug Deliv. 11, 345–364.

Wang, L., Xue, W., Yan, L., Li, X., Wei, J., Chen, M., Wu, J., Yang, B., Yang, L., and Chen, J. (2017). Enhanced base editing by co-expression of free uracil DNA glycosylase inhibitor. Cell Res. 27, 1289–1292.

Wang, M., Yang, Z., Rada, C., and Neuberger, M.S. (2009). AID upmutants isolated using a high-throughput screen highlight the immunity/cancer balance limiting DNA deaminase activity. Nat. Struct. Mol. Biol. 16, 769–776.

Wiedenheft, B., Sternberg, S.H., and Doudna, J.A. (2012). RNA-guided genetic silencing systems in bacteria and archaea. Nature 482, 331–338.

Wörn, A., Auf der Maur, A., Escher, D., Honegger, A., Barberis, A., and Plückthun, A. (2000). Correlation between in vitro stability and in vivo performance of anti-GCN4 intrabodies as cytoplasmic inhibitors. J. Biol. Chem. 275, 2795–2803.

Wu, X., Scott, D.A., Kriz, A.J., Chiu, A.C., Hsu, P.D., Dadon, D.B., Cheng, A.W., Trevino, A.E., Konermann, S., Chen, S., et al. (2014). Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Nat. Biotechnol. 32, 670–676.

Yamanaka, S., Balestra, M.E., Ferrell, L.D., Fan, J., Arnold, K.S., Taylor, S., Taylor, J.M., and Innerarity, T.L. (1995). Apolipoprotein B mRNA-editing protein induces hepatocellular carcinoma and dysplasia in transgenic animals. Proc. Natl. Acad. Sci. 92, 8483–8487.

Ye, L., Wang, J., Tan, Y., Beyer, A.I., Xie, F., Muench, M.O., and Kan, Y.W. (2016). Genome editing using CRISPR-Cas9 to create the HPFH genotype in HSPCs: An approach for treating sickle cell disease and β-thalassemia. Proc. Natl. Acad. Sci. 113, 10661–10665.

Zuris, J.A., Thompson, D.B., Shu, Y., Guilinger, J.P., Bessen, J.L., Hu, J.H., Maeder, M.L., Joung, J.K., Chen, Z.-Y., and Liu, D.R. (2015). Cationic lipid-mediated delivery of proteins enables efficient protein-based genome editing in vitro and in vivo. Nat. Biotechnol. 33, 73–80.
