© 2016 Nature America, Inc. All rights reserved.

Articles

nature biotechnology, advance online publication

High-throughput mapping of regulatory DNA

Nisha Rajagopal, Sharanya Srinivasan, Kameron Kooshesh, Yuchun Guo, Matthew D Edwards, Budhaditya Banerjee, Tahin Syed, Bart J M Emons, David K Gifford & Richard I Sherwood

Quantifying the effects of cis-regulatory DNA on gene expression is a major challenge. Here, we present the multiplexed editing regulatory assay (MERA), a high-throughput CRISPR-Cas9–based approach that analyzes the functional impact of the regulatory genome in its native context. MERA tiles thousands of mutations across ~40 kb of cis-regulatory genomic space and uses knock-in green fluorescent protein (GFP) reporters to read out gene activity. Using this approach, we obtain quantitative information on the contribution of cis-regulatory regions to gene expression. We identify proximal and distal regulatory elements necessary for expression of four embryonic stem cell–specific genes. We show a consistent contribution of neighboring gene promoters to gene expression and identify unmarked regulatory elements (UREs) that control gene expression but do not have typical enhancer epigenetic or chromatin features. We compare thousands of functional and nonfunctional genotypes at a genomic location and identify the base pair–resolution functional motifs of regulatory elements.

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA. Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA. Program in Cancer, Stem Cells, and Developmental Biology, University of Utrecht, Utrecht, the Netherlands. Correspondence should be addressed to D.K.G. ([email protected]) or R.I.S. ([email protected]).

Received 11 May 2015; accepted 24 December 2015; published online 25 January 2016; doi:10.1038/nbt.3468

Gene regulation provides the basis for cell type–specific function. Although differences in cis-regulatory DNA are known to underlie human variation and disease, predicting the effects of cis-regulatory variants on gene expression remains challenging.

Important strides have been made over the past decade in cataloging cis-regulatory elements, such as enhancers and promoters, and chromatin states, such as active and poised. A histone modification code has been found to correlate with gene regulatory elements. Gene-expression reporter assays, which can now be done in high-throughput formats, have confirmed elements that are sufficient to activate gene expression in heterologous contexts. Additionally, techniques to identify distal DNA interactions have begun to associate enhancers with their cognate promoters, which often are not in close proximity and at times can be megabases apart.

However, existing techniques for identifying gene regulatory elements have several shortcomings. Reporter assays focus on regions that are sufficient to activate gene expression in a heterologous context and therefore cannot characterize elements that are necessary but not sufficient for gene expression or whose activity does not transfer to a non-native context. Additionally, genes can have many regulatory elements, and there is no high-throughput approach capable of systematically determining the relative importance of each element in influencing native gene expression levels. Efforts to test enhancers, predicted using histone modification data, in reporter assays have found that the majority of predicted enhancers do not activate gene expression as expected. This suggests that additional assays are required to decipher native regulation.

CRISPR-Cas9 has been used in genome-wide mutation screens to identify genes required for survival, drug resistance and tumor metastasis. In these screens, guide RNAs (gRNAs) targeting tens of thousands of sites within genes are cloned into lentiviral vectors and delivered as a pool into target cells along with Cas9. By identifying gRNAs that are enriched or depleted in the cells after selection for a desired phenotype, genes that are required for this phenotype can be systematically identified.

Here we develop CRISPR-Cas9–based MERA to analyze the regulatory genome at single-base resolution in its native context. MERA employs Cas9, which has been shown to cleave DNA when paired with a single gRNA. In MERA, Cas9-induced double-strand breaks (DSBs) are repaired in an error-prone fashion by cellular non-homologous end joining (NHEJ), inducing a wide range of mutations initiated at the cleavage site that are typically small (<10-bp) insertions or deletions (indels) but can include larger (>100-bp) indels or altered individual bases.

The MERA assay first carries out a high-throughput screen that maps the effects of genomic variation on gene expression. Selected elements can then be characterized by motif discovery and validated. We map functional elements that are required for gene expression by expressing gRNAs that tile a gene’s cis-regulatory region and measuring how likely each gRNA is to diminish gene expression. We then perform deep sequencing of the gRNA-induced mutations in targeted regions to reveal thousands of genotypes that either did or did not lose gene expression. This enables us to characterize the functional importance of each base. Finally, we validate the results of the MERA screen through the replacement of selected genomic elements by homologous recombination.

RESULTS
Developing the MERA assay
There are two distinctions between MERA and previous CRISPR-Cas9–based gene mutation screening approaches that spurred us to alter the mutation screening technique.

Figure 1 Multiplexed editing regulatory assay (MERA). (a) In MERA, a genomically integrated dummy gRNA is replaced with a pooled library of gRNAs through CRISPR-Cas9–based homologous recombination such that each cell receives a single gRNA. Guide RNAs are tiled across the cis-regulatory regions of a GFP-tagged gene locus, and cells are flow cytometrically sorted according to their GFP expression levels. Deep sequencing on each population is used to identify gRNAs preferentially associated with partial or complete loss of gene expression. (b) Zfp42GFP mESCs show uniformly strong GFP expression. After bulk gRNA integration, a subpopulation of cells lose GFP expression partially or completely. These cells are flow cytometrically isolated for deep sequencing. (c,d) Bulk reads for gRNAs are highly correlated between replicates from the Tdgf1 (c) or Zfp42 (d) libraries, indicating consistent and replicable integration rates.

[Figure 1 panels. (a) Schematic: ROSA locus with U6 promoter and dummy gRNA; gRNA library plus Cas9; homologous recombination; target gene locus with GFP; flow cytometric sorting; sequencing of gRNAs in each sorted population. (b) Flow cytometry histograms of cell count against GFP fluorescence. (c) log10 (bulk reads in replicate 2) against log10 (bulk reads in replicate 3), Spearman correlation = 0.64. (d) log10 (bulk reads in replicate 3) against log10 (bulk reads in replicate 4), Spearman correlation = 0.74.]

First, the targeted sites in our screen are often close together, so cells receiving more than one gRNA may undergo deletion instead of mutation of a region, which would complicate downstream analysis. Although this issue can be addressed for lentiviral libraries by lowering the multiplicity of infection (MOI), we sought a more elegant approach to limit cells to a single gRNA. Second, each gene for which we perform MERA requires a different gRNA library. All high-throughput CRISPR-Cas9–based approaches to date have required cloning gRNA libraries into a lentiviral vector and producing a batch of virus, a time-consuming process that would have to be done separately for each library. We sought an approach that would allow a library to be used on the day it arrives.

To enable the efficient targeting of precisely one regulatory element per cell, we devised a strategy that ensures that only one gRNA can be expressed per cell and allows gRNA libraries to be used without any molecular cloning into a delivery vector. We integrated a single copy of the gRNA expression construct (a U6 promoter driving expression of a dummy gRNA hairpin) into the universally accessible ROSA locus of mouse embryonic stem cells (mESCs) using CRISPR-Cas9–mediated homologous recombination (Fig. 1a). We then use CRISPR-Cas9–mediated homologous recombination to replace the dummy gRNA with a gRNA from our library. We use PCR to add 79- to 90-bp homology arms to our gRNA library, as we found that longer homology arms increase background cutting of gRNAs transcribed from unintegrated PCR fragments (Supplementary Fig. 1). We then introduce the pool of gRNA homology fragments into cells along with Cas9 and a gRNA plasmid that induces a DSB at the dummy gRNA site. In a substantial fraction of cells (~30%), the dummy gRNA is repaired by homologous recombination, creating a functional gRNA expression construct targeting a single genomic site from the library (Fig. 1a and Supplementary Fig. 2). Only random chance dictates which gRNA is integrated in each cell, allowing a pooled screen in which each cell expresses only one gRNA.

Of note, the genomic integration–based gRNA screening platform used in MERA could also be applied to other CRISPR-based high-throughput screens as long as the cell line used undergoes homologous recombination at appreciable frequency, and it could be modified to achieve expression of any set number of gRNAs per cell for combinatorial screening. Although the integration-based approach is thus ill-suited to in vivo screens or screens in cells with limited homologous recombination, it provides an alternative to lentiviral screening that substantially reduces the time, effort and cost involved in CRISPR library screening for applicable cell lines such as embryonic stem cells.

We generated GFP knock-in lines for four mESC-specific genes, Nanog, Rpp25, Tdgf1 and Zfp42 (Fig. 1b, Supplementary Fig. 3), and synthesized corresponding gRNA libraries, each with 3,908 gRNAs tiling cis-regulatory regions. In the case of Tdgf1, the library targeted the 40-kb region proximal to the gene in an unbiased manner. In other cases, we selected regions proximal to the gene most likely to be involved in regulation based on enhancer-like features that are a maximum of ~150 kb away from the gene, as well as distal regions up to 92 Mb away from the gene when ChIA-PET distal interaction data suggested a possible interaction with the target gene promoter. Among the 3,621 gRNAs found to be integrated in at least one of the three replicates of the Tdgf1 library, the mean distance between adjacent gRNAs was 11 bp. Of note, repetitive and unmappable genomic regions cannot be tiled with gRNAs, and gRNAs targeting regions whose sequences differ from those in the reference genome cannot be appropriately tiled without genome sequence data of the cell line.
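The tiling density reported for the Tdgf1 library (a mean distance of 11 bp between adjacent integrated gRNAs) is a simple statistic over sorted target coordinates. The following is a minimal sketch of that calculation, not the authors' code; the function name and the toy coordinates are illustrative assumptions.

```python
from statistics import mean


def mean_adjacent_spacing(positions):
    """Mean gap in bp between neighbouring gRNA target positions.

    `positions` is any iterable of integer genomic coordinates
    (hypothetical input format; duplicates are collapsed).
    """
    ordered = sorted(set(positions))
    if len(ordered) < 2:
        raise ValueError("need at least two distinct positions")
    gaps = [right - left for left, right in zip(ordered, ordered[1:])]
    return mean(gaps)


# Toy coordinates chosen for illustration only (gaps of 8, 13 and 12 bp).
toy_positions = [110_940_000, 110_940_008, 110_940_021, 110_940_033]
print(mean_adjacent_spacing(toy_positions))
```

Running the same function over only the gRNAs detected in the bulk population, rather than over the synthesized library, would give the spacing that is actually available to the screen.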

Figure 2 MERA enables systematic identification of required cis-regulatory elements for Tdgf1. (a) A genomic view of the 40-kb Tdgf1 proximal regulatory region showing the following in track order from top to bottom. (i) The locations of all 3,621 integrated gRNAs in any one of three biological replicates. (ii) gRNAs enriched in GFPneg (red) in any one of three replicates at P < 10^−10 using a binomial test as described in the Online Methods; bar height is proportional to the mean log-ratio of GFPneg to bulk reads across replicates. (iii) gRNAs enriched in GFPmedium cells (cyan) in any one of three replicates at P < 10^−10 using a binomial test as described in the Online Methods; bar height is proportional to the mean log-ratio of GFPmedium to bulk reads across replicates. (iv) Annotated genes. (v) Predicted enhancers (cyan, weak; red, strong). (vi) DNase I hotspot regions. (vii) Transcription factor binding density based on ChIP-seq data. (viii) H3K4me3 ChIP-seq data (blue). Several active regulatory elements coincide with dense clusters of overlapping gRNAs. Numerous gRNAs significantly enriched in the GFPneg population are also observed in regions devoid of regulatory element features (UREs). Genomic regions of interest are shaded, annotated above the plot, and described in further detail in the text. (b) Individual validation of specific gRNAs detected as enriched in the GFPneg population in the MERA assay using the self-cloning CRISPR system. The proportion of cells undergoing GFP loss upon incorporation of a particular gRNA divided by the proportion of cells undergoing GFP loss upon incorporation of GFP-targeting positive control gRNA is plotted against the actual genomic location of the gRNA. Negative controls, defined as gRNAs showing no reads in either GFP population but present in the bulk population, are highlighted in red. Error bars indicate experimental variability in two replicates. (c) Correlation of gRNAs significantly enriched in the GFPneg population in fixed-size bins varying from 100 bp to 1 kb for biological replicates in Tdgf1. (d) Fraction of GFPneg- or GFPmedium-enriched gRNAs among the different functional genomic categories surrounding the Tdgf1 gene. Error bars show variability due to the three biological replicates.

[Figure 2 panels. (a) Genome browser view, mm10, approximately 110,935,000 to 110,965,000, 10-kb scale bar, with tracks for bulk density, GFPneg/bulk, GFPmedium/bulk, genes (Tdgf1, Lrrc2), predicted enhancers, DNase hotspots, 20_TF_density and H3K4me3. Region labels follow the key EnhXXXXX = strong enhancer centered at 1109XXXXX and DHSXXXXX = DNase hotspot centered at 1109XXXXX. (b) Fraction of cells with GFP loss normalised to GFP-targeting gRNA, by gRNA location. (c) Pearson correlation coefficient against bin size (in bp) for Rep1 vs Mean(Rep2,Rep3), Rep2 vs Mean(Rep1,Rep3) and Rep3 vs Mean(Rep1,Rep2). (d) Fraction of gRNA significantly enriched for GFP loss, by category: unmarked, weak and strong predicted enhancer, DNase hotspot, intragenic, promoter, intragenic promoter, external promoter, negative control and GFP-targeting.]

Each library also contained ten positive control gRNAs targeting the GFP open reading frame that we expected would cause loss of GFP expression.

MERA screens identify required regulatory regions
We performed four biological replicate screens for […], two replicates for […] and a single replicate for […]. Selected screen hits were independently confirmed, as we describe below. Starting 1 week after electroporation, we collected genomic DNA of the unsorted library-integrated cells to examine differences in gRNA integration. Over 90% of correctly synthesized gRNAs were detected in the genomic DNA for both Tdgf1 and Zfp42 libraries (bulk density track, Figs. 2 and 3; Supplementary Figs. 5 and 6). In addition, gRNA integration rates in the bulk population showed concordance between the biological replicates (Fig. 1c,d and Supplementary Fig. 4a). All of the regulatory regions that we surveyed had sufficient coverage of gRNAs to allow us to assay their detailed function.

Library-integrated mESCs were then flow cytometrically sorted to identify gRNAs that induced loss of GFP expression. Separate GFPneg and GFPmedium populations were sorted in the Tdgf1 and Zfp42 experiments, whereas GFPneg and GFPmedium populations were combined in the Nanog and Rpp25 experiments because of incomplete population separation (Supplementary Fig. 3). The distribution of gRNA abundance in GFPneg and GFPmedium populations in all screens clearly indicates that a subset of genomic space is required for gene expression at all four gene loci (Figs. 2a and 3a, Supplementary Fig. 5).

Using the relative abundances of GFP coding region–targeting positive control gRNAs and the dummy gRNA as a negative control, we devised a method to detect gRNAs with statistically significant over-representation in GFPneg and GFPmedium populations (Online Methods, Supplementary Tables 1–4). We detected significant over-representation of nearly all integrated positive-control coding region–targeting gRNAs in all replicates (Supplementary Fig. 4b and Supplementary Table 5), suggesting that MERA robustly identifies gRNAs that induce loss of gene expression.

In our MERA screen of Tdgf1, we observed differential enrichment of gRNAs in established functional categories of genomic elements associated with gene regulation (Fig. 2a,d).
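The Figure 2 legend calls a gRNA enriched when a binomial test gives P < 10^−10 and scales bar height by the log-ratio of sorted to bulk reads. The exact null model is in the Online Methods, which are not part of this text, so the sketch below is only one plausible reading: it assumes that, under the null, a gRNA's share of reads in the sorted population equals its share in the bulk population. Function names and read counts are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability mass at k, for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    peak = max(logs)
    return min(1.0, math.exp(peak) * sum(math.exp(v - peak) for v in logs))


def enrichment(sorted_reads, sorted_total, bulk_reads, bulk_total):
    """Return (one-sided P value, log10 ratio of sorted to bulk read fractions).

    Assumed null: the gRNA's read fraction in the sorted population equals
    its bulk fraction. This is an illustration, not the published method.
    """
    if bulk_reads <= 0 or bulk_reads >= bulk_total:
        raise ValueError("bulk fraction must lie strictly between 0 and 1")
    p_null = bulk_reads / bulk_total
    p_value = binom_upper_tail(sorted_reads, sorted_total, p_null)
    if sorted_reads == 0:
        return p_value, float("-inf")
    log_ratio = math.log10((sorted_reads / sorted_total) / p_null)
    return p_value, log_ratio


# Hypothetical counts: a gRNA at 1% of bulk reads and 5% of GFPneg reads.
p_value, log_ratio = enrichment(50, 1000, 10, 1000)
print(p_value < 1e-10, round(log_ratio, 3))
```

The tail sum is linear in the sorted read total, which is acceptable for a sketch; a production analysis would more likely call a statistics library and correct for testing thousands of gRNAs.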

Figure 3 MERA enables systematic identification of required cis-regulatory elements for Zfp42. (a) A genomic view of the Zfp42 proximal regulatory region showing the following in track order. (i) The location of all 1,643 integrated gRNAs. (ii) gRNAs in GFPneg (red) in any one of four replicates at P < 10^−10 using a binomial test as described in the Online Methods; bar height is proportional to the mean log-ratio of GFPneg to bulk reads. (iii) Enriched gRNAs in GFPmedium cells (cyan) in any one of four replicates at P < 10^−10 using a binomial test as described in the Online Methods; bar height is proportional to the mean log-ratio of GFPmedium to bulk reads. (iv) Annotated genes. (v) Predicted enhancers (cyan, weak; red, strong). (vi) DNase I hotspot regions. (vii) Transcription factor binding density based on ChIP-seq data. (viii) H3K4me3 ChIP-seq data. Several active regulatory elements coincide with dense clusters of overlapping gRNAs. Genomic regions of interest are shaded, annotated above the plot and described in further detail in the text. (b) Correlation of gRNAs significantly enriched in the GFPneg population in fixed-size bins varying from 100 bp to 1 kb for biological replicates in Zfp42. (c) Fraction of GFPneg- and GFPmedium-enriched gRNAs among the different functional genomic categories surrounding the Zfp42 gene. Error bars show variability due to the four biological replicates.

[Figure 3 panels. (a) Genome browser view, mm10, approximately 43,200,000 to 43,350,000, 50-kb scale bar, with tracks for bulk density, GFPneg/bulk, GFPmedium/bulk, genes (Zfp42, Triml2), predicted enhancers, DNase hotspots, 20_TF_density and H3K4me3; shaded regions I–VII. (b) Pearson correlation coefficient between replicates against bin size (in bp) for Rep1–Rep4. (c) Fraction of gRNA significantly enriched for GFP loss in regions I–VII, by category: external promoter, promoter, strong enhancer, weak enhancer, DHS only and URE.]

The highest density of significant gRNAs in the Tdgf1 genomic region was observed at the Tdgf1 promoter, the strong proximal enhancer 4 kb upstream of Tdgf1 and the strong enhancer overlapping the Lrrc2 promoter (Fig. 2a,d). We also note that regulatory regions upstream of Tdgf1 tended to cause intermediate rather than complete loss of GFP (GFPneg/bulk in red versus GFPmedium/bulk in blue; Fig. 2a), suggesting that these enhancers are each responsible for only part of the overall Tdgf1 expression level in cells.

Surprisingly, we observed a novel class of genomic elements downstream of Tdgf1 (Fig. 2a, highlighted in gray) that did not coincide with any known markers of regulatory activity, such as H3K4me1, H3K27ac, H3K4me3, known transcription factor (TF)-binding sites, DNase I hypersensitivity sites, predicted DNase I hotspots or enhancers predicted from chromatin modifications. We designated such elements that do not contain any of these markers and were often over 1 kb in length as unmarked regulatory elements (UREs). UREs produced a loss of GFP comparable to that induced by some distant enhancers (Fig. 2d).

In our MERA screen of Zfp42, we also observed the strongest enrichment for GFP loss in the promoter and proximal enhancer regions (Fig. 3a,c) and observed the participation of the neighboring Triml2 promoter in regulating Zfp42 (Fig. 3a). We observed enrichment of gRNAs at UREs in regions II, III, VI and VII (Fig. 3a).

Validation of MERA hits
To determine the accuracy of the MERA screen in systematically determining required cis-regulatory regions, we first examined replicate consistency among our MERA data. Spatial patterns of GFPneg gRNA enrichment were largely conserved between replicates, with a Pearson correlation of 0.8 at a bin size of 300 bp (Figs. 2c and 3b). At an individual gRNA level, the overlap between gRNAs enriched in GFPneg cells between replicates was significant for all replicates (hypergeometric P value <0.001); however, it was not as high as for binned regions, likely because a single gRNA can cause thousands of distinct mutant genotypes with varying phenotypes.

To analyze false positives caused by off-target effects, we examined how putative off-target effects affect MERA results using a model based on GUIDE-Seq. We found that when we eliminated gRNAs with potential off-target effects from our analysis, the global distribution of significantly enriched gRNAs along the regulatory landscape of the gene was unaltered and relative contributions of different functional categories were unaffected (Supplementary Figs. 5a and 6a). Furthermore, several gRNAs with no predicted off-target effects supported the regulation of Zfp42 by the promoter of Triml2 and a URE region (Supplementary Fig. 6a), and none of these regions was more likely to contain off-target effects than other screened regions.

To analyze potential off-target effects with an independent method, we asked whether any gRNAs from the Tdgf1 library would extinguish Zfp42GFP activity and vice versa. We found that a much smaller percentage of cells lost GFP upon targeting by a mismatched gRNA library than upon targeting by the matched library (Supplementary Fig. 8). Sequencing revealed that the gRNAs enriched in GFPneg mismatched library–targeted cells were predominantly GFP control gRNAs, with a small number of non-clustered gRNAs displaying off-target activity.
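Replicate consistency is reported as a Pearson correlation between replicates over fixed-size genomic bins (100 bp to 1 kb). The sketch below shows one way such a comparison could be set up, assuming the per-bin value is the count of significantly enriched gRNAs in that bin; the published panels compare each replicate with the mean of the others, whereas this simplified version compares two replicates directly. All names and coordinates are illustrative.

```python
import math


def bin_counts(positions, start, end, bin_size):
    """Count enriched-gRNA positions per fixed-size bin over [start, end)."""
    n_bins = math.ceil((end - start) / bin_size)
    counts = [0] * n_bins
    for pos in positions:
        if start <= pos < end:
            counts[(pos - start) // bin_size] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical enriched-gRNA coordinates for two replicates in a 400-bp window.
rep1 = [5, 15, 120, 130, 310]
rep2 = [8, 125, 128, 305, 330]
for size in (100, 200):
    r = pearson(bin_counts(rep1, 0, 400, size), bin_counts(rep2, 0, 400, size))
    print(size, round(r, 3))
```

Larger bins pool neighbouring gRNAs, which is consistent with the observation in the text that agreement between replicates is higher for binned regions than for individual gRNAs.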

© 2016 Nature America, Inc. All rights reserved. GFP and right paired-end reads as features. ( Tcfcp2l1 and Sox2. ( enhancer region (gRNAs are shaded) showing overlap with DNase I hotspot and predicted enhancer regions, and transcription factor binding sites for Stat3, procedure involved in finding mutations induced by a particular gRNA. ( Figure 4 text. the in later described discovery, motif functional performing by cases several in hypothesis this We confirm sites. factor–binding such as elements transcription regulatory to functional likely disrupt the wide of spectrum mutations at target sites, which are differentially ( cells of <2% in loss GFP induce controls negative while cells, of 5–40% in of 20–40% GFP loss induced in elements regulatory distal targeting and cells those loss GFP induced regions promoter targeting those the targeting gRNAs (see expression Discussion gene to regard with phenotypes gene function, we found that regulatory mutations induce more variableto be equivalently likely to induce frameshift mutations that inactivate As opposed to ORF-targeting screens, in which all gRNAs are assumed rate. false-positive ( gRNAs trol con located similarly 5 to compared as screens these from gRNAs 2 Fig. including categories, ( promoters neighboring and functional UREs several within fell gRNAs These screen. MERA pooled the in activity their with correlated loss GFP of rate their whether determine to gRNAs individual introduced we ( effects gRNA on-target of result a is regions these in loss GFP that conclude to us leading effects, or off-target determined predicted experimentally computationally by replicated not is MERA in and UREs promoters neighboring enhancers, at loss GFP of enrichment nature biotechnology nature and Tdgf_gRNA_2 along the left paired end read. The location of the Stat3 binding site with its positive-strand motif is shown along the length of the gRNA. Supplementary Figs. 5a Figs. 
Supplementary a We next sought to determine the false-negative rate of MERA. MERA. of rate false-negative the determine to sought next We To determine the false-positive rate at the level of individual gRNAs, Classify filteredreadsinGFP Target-site cleavage insertions, deletionsandmismatchesasfeatures neg c Fig. 2 Fig. b genotypes at a base to the reference base to the Hellinger distance of the GFP pos

GFP classification rate 29/30 in loss GFP increased significantly confirmed We ). Functional motif discovery analysis of region-specific mutant genotypes at enhancers reveals required regulatory motifs. ( 1.0 0.2 0.4 0.6 0.8 ). In our individual follow-up assays, we found that that found we assays, follow-up individual our In ). b 0 Fig. 2 Fig. ). We assert that this phenotypic diversity results from from results diversity phenotypic this that We assert ). 0.2 1 –GFP b c ) ROC curve for fivefold classification of GFP ). Altogether, we conclude that MERA has a low low a has MERA that conclude we Altogether, ). GFP deletions Mismatches, insertions, pos neg 0.4 – andGFP ORF induced GFP loss in >90% of cells, cells, of >90% in loss GFP induced ORF calssificationrate

c and and advance online publication online advance NHEJ 0.6 6a NHE neg using , J Weighted Non-weighted d b Fig. 2 Fig. ) Motif logo for region mutated by gRNAs with base scores computed as log-ratios of the Hellinger distance of the ). 0.8 a GFP-sort , highlighted in gray, and gray,and in highlighted , 1 Paired-end 300-bp d

neg Filter readsbybasequality Sequencing sequencing

H log (H(GFP ,reference)/ 10 pos Supplementary Supplementary (

To assess the false-negative rate of MERA gRNAs, we examined regions in our data with strong likelihood of inducing GFP loss. We found that 10/10 GFP-targeting gRNAs were highly enriched in GFPneg cells in all four lines. Additionally, 67/83 gRNAs (81%) that target the first 700 bp of the Tdgf1 ORF were highly enriched in GFPneg cells. In the 500 bp around the Tdgf1 promoter region, 48/59 gRNAs (81%) induce loss of GFP in multiple replicates ([...]). Thus, a high percentage of gRNAs expected to have an effect on gene expression were enriched in GFPneg cells. It is unclear whether the 20% of gRNAs in these regions that do not induce GFP loss are false negatives or true negatives, as their mechanism of inducing GFP loss is not as direct as when the GFP ORF itself is targeted. However, even if this appreciable percentage of individual gRNAs are false negatives, it does not impair the ability of MERA to determine required regulatory regions, as the high density of gRNAs in a region (~1 per 8 bp) allows highly reproducible resolution at the level of 100-1,000 bp (Supplementary Fig. 4f).

We then asked whether annotated regulatory regions are necessary for gene function. An appreciable percentage of gRNAs induced significant GFP loss at 9/9 of predicted enhancers (−21 to +45 kb around Tdgf1) (Figs. 2c [...]) and 6/7 of predicted enhancers (±20 kb around Zfp42) (Figs. 2d [...]). However, there was substantial heterogeneity in the percentage of gRNAs within an enhancer that induce GFP loss, and some DNase-hypersensitive sites without enhancer histone modifications contain a high fraction of GFP loss-inducing gRNAs (Supplementary Tables 6 and 7), indicating that enhancer histone modifications do not entirely predict required regulatory regions. We cannot rule out the possibility that certain regions may suffer from systematic inefficiencies in gRNA targeting.

[Figure legend fragment, only partly recoverable: (a) A schematic of the [...]. (b) Plot showing the genomic regions surrounding two gRNAs at a proximal [...]; tracks shown: Tdgf_Enh_gRNA1, Tdgf_Enh_gRNA2 (110947456–110947478), H3K4me3, Stat3, Tcfcp2l1, Sox2, strong enhancer, DNase I hotspot, 20_TF_density. [...] GFPpos and GFPneg genotypes using mutations within −20 to +20 bp of the gRNA along the left paired-end read. [...] genotypes at a base to the reference base, caused by Tdgf_Enh_gRNA1. H(x, y) = Hellinger distance between x and y. Axes: distance along gRNA; bulk density; alignment of reads to reference sequence.]
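The Hellinger distance H(x, y) named in the figure legend, and the log-ratio of H(GFPneg, reference) to H(GFPpos, reference) that the motif logos plot, can be illustrated with a minimal sketch. It is not the authors' code. It assumes that genotypes are already aligned to the reference and written over the five states A, C, G, T and D (deletion); the function names, the natural logarithm and the small pseudocount that avoids division by zero are illustrative assumptions.

```python
import math

STATES = "ACGTD"  # four bases plus a deletion state (assumed encoding)


def base_distribution(genotypes, position):
    """Fraction of aligned genotypes showing each state at one position."""
    counts = {state: 0 for state in STATES}
    for genotype in genotypes:
        counts[genotype[position]] += 1
    total = len(genotypes)
    return [counts[state] / total for state in STATES]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 to 1)."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


def motif_score(neg_genotypes, pos_genotypes, reference, position, pseudo=1e-3):
    """Log-ratio of GFPneg to GFPpos deviation from the reference base.

    The pseudocount and the natural log are assumptions of this sketch.
    """
    ref = [1.0 if state == reference[position] else 0.0 for state in STATES]
    h_neg = hellinger(base_distribution(neg_genotypes, position), ref)
    h_pos = hellinger(base_distribution(pos_genotypes, position), ref)
    return math.log((h_neg + pseudo) / (h_pos + pseudo))


if __name__ == "__main__":
    reference = "ACGT"
    gfp_neg = ["ADGT", "ADGT", "ACGT", "ADGT"]  # toy genotypes
    gfp_pos = ["ACGT", "ACGT", "ACGT", "ADGT"]
    for i in range(len(reference)):
        print(i, reference[i], round(motif_score(gfp_neg, gfp_pos, reference, i), 3))
```

A positive score marks a base that is disrupted more often in GFPneg genotypes than in GFPpos genotypes; a base that is never mutated in either population scores zero.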


Gene regulatory trends emerging from MERA screens

Our MERA results revealed that [...] have different regulatory architectures (Figs. 2 [...]). [...] promoter were associated with clusters of highly enriched gRNAs in predicted enhancers and DNase I hotspots proximal to [...], and 20–40% of the tested gRNAs [...], with dense clusters of significant gRNAs in the proximal [...]. In contrast, the [...] gene shows a dense concentration of MERA significant gRNAs at its promoter and short ORF region. Other proximal regulatory regions of [...]. All regulatory regions within ±20 kb of the [...] gene had 12% of tested gRNAs resulting in GFP[...] cells. UREs were also seen in [...]. In [...], a distal ChIA-PET region >92 Mb away showed several strongly enriched gRNAs, whereas three other distal ChIA-PET regions showed no strongly enriched gRNAs, indicating that MERA is capable of measuring the functionality of long-distance chromatin interactions (Fig. 2 and Supplementary Fig. 9).

One observation common to all genes is the participation of the promoters of other genes in regulation. In some cases these promoters are several million bases away. Examples of foreign promoter involvement can be seen in the cases of the Lrrc2 promoter in Tdgf1, Scamp5 and Cox5a in Rpp25, and Mirc35hg in Nanog (Fig. 2 and Supplementary Fig. 9). Previous studies have documented the existence of dual-property elements that can act as either promoter or enhancer in different cellular contexts (ref. 33). Additionally, it is known that neighboring promoters often interact with each other (ref. 34) and that expression of neighboring genes is often coordinated (ref. 35). Here we observe that active promoters may coordinate gene expression patterns of neighboring genes by functioning as enhancers within the same cellular context.

[Figure 6: Local genotypes at an enhancer and a URE dictate expression phenotype. (a) MERA screen ratio of GFPneg/bulk reads for each gRNA at an upstream enhancer (left) and a downstream URE (right) region. (b) Flow cytometric measurement of GFP expression in clonal cell lines following CRISPR-induced deletion of the shaded regions from a show loss of GFP (first and third plots from left). CRISPR-mediated homology-directed repair (HDR) back to the wild-type genotype induced robust GFP recovery at both loci (second and fourth plots from left). (c) Tdgf1 RNA expression in wild-type mESCs (WT; left), clonal mESC lines with deletions of the enhancer and URE shaded in a (second and third from left), and bulk mESC lines following HDR back to the wild-type genotype (fourth and fifth from left), all normalized to wild-type expression level in two replicates. Panel labels: 65-bp enhancer deletion; 89-bp URE deletion; Enh deletion clone; URE deletion clone; HDR to wild type; 0.7%, 55.4%, 0%, 60.6%; WT, Enh-Del, URE-Del, Enh-Del-Rep, URE-Del-Rep; axes: GFP-loss ratio, GFP fluorescence, count, Tdgf1 RNA expression.]

[Figure 5: Functional motif discovery analysis of a URE reveals critical base positions involved in gene regulation. (a) Plot showing the genomic regions surrounding two Tdgf1 URE gRNAs (gRNAs are shaded) showing their lack of active histone modifications, known transcription factor binding sites, predicted enhancers or DNase I hotspots. (b) Receiver operating characteristic (ROC) curve for fivefold classification of GFPpos and GFPneg genotypes using mutations on the right paired-end read within −20 to +20 bp of Tdgf_URE_gRNA2. Unweighted classification (in blue) counts each unique genotype in the test set only once, while weighted classification (red) counts each unique genotype in the test set as many times as the number of reads assigned to it, for calculating sensitivity and specificity. (c) Fraction of unique genotypes in GFPpos and GFPneg populations with mutations at bases along the right paired-end read reveals pattern of cleavage around Tdgf_URE_gRNA2. (d) Motif logo for the region mutated by Tdgf_URE_gRNA2 along the right paired-end read with base scores computed as log-ratios of the Hellinger distance of the GFPneg genotypes at a base to the reference base to the Hellinger distance of the GFPpos genotypes at a base to the reference base. [...] GFP[...] classification rate shows a trend similar [...]. Tracks shown: Tdgf_URE_gRNA1 (110934270–11093429[...]), Tdgf_URE_gRNA2 (110934425–110934447), H3K4me3, 20_TF_density, DNase I hotspot, predicted enhancer; axes: distance along gRNA, fraction of unique genotypes, bulk density, log(H(GFPneg, reference)/H(GFPpos, reference)).]
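The difference between the unweighted and weighted ROC analyses in the Figure 5b legend (each unique genotype counted once, versus once per sequencing read assigned to it) can be made concrete with a small sketch. It assumes that a classifier has already produced a score per unique genotype, and it treats GFPneg as the positive class; both choices, and all names below, are illustrative rather than taken from the paper. The area under the curve is computed directly as the weighted probability that a positive genotype outscores a negative one.

```python
def weighted_auc(scores, labels, weights=None):
    """Area under the ROC curve, optionally weighting each genotype.

    scores  : classifier score per unique genotype (higher = more GFPneg-like)
    labels  : 1 for GFPneg genotypes, 0 for GFPpos genotypes (assumed coding)
    weights : reads assigned to each genotype; None counts each genotype once
    """
    if weights is None:
        weights = [1.0] * len(scores)
    positives = [(s, w) for s, l, w in zip(scores, labels, weights) if l == 1]
    negatives = [(s, w) for s, l, w in zip(scores, labels, weights) if l == 0]
    concordant = 0.0
    for s_pos, w_pos in positives:
        for s_neg, w_neg in negatives:
            if s_pos > s_neg:
                concordant += w_pos * w_neg
            elif s_pos == s_neg:
                concordant += 0.5 * w_pos * w_neg  # ties count as half
    total = sum(w for _, w in positives) * sum(w for _, w in negatives)
    return concordant / total


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.4, 0.3]
    labels = [1, 0, 1, 0]
    reads = [10, 1, 1, 10]  # toy read counts per unique genotype
    print("unweighted AUC:", weighted_auc(scores, labels))
    print("weighted AUC:  ", weighted_auc(scores, labels, reads))
```

In this toy case the two rare genotypes are the misranked pair, so weighting by read count raises the AUC; abundant misclassified genotypes would have the opposite effect.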

Functional motif discovery to examine MERA-predicted regulatory regions

The second phase of MERA uses functional motif discovery to identify the causal elements governing expression at MERA screen hits. Because Cas9 induces random mutations, a pool of mESCs treated with Cas9 and a single gRNA will contain thousands of distinct mutant genotypes centered on the gRNA cleavage site. Recently, TAL effector nucleases have been used to derive functional footprints of regulatory DNA (ref. 36). We hypothesized that we could pinpoint DNA sequence motif(s) that cause GFP loss by identifying sequence features that consistently differ between thousands of GFPpos and GFPneg genotypes at a given site (Fig. 4). Functional motif discovery is achieved by performing Cas9-mediated mutation by a selected individual gRNA, obtaining thousands of genotypes from both GFPpos and GFPneg (or GFPmedium/neg) cells by high-throughput sequencing, and then summarizing the observed genotypes as motifs that reveal which bases are important for gene expression (Fig. 4, Online Methods). Using differences in the fractions of GFPpos and GFPneg genotypes at positions along the gRNA, we defined a base-level importance score that was independent of the cutting biases of the gRNA and built a random-forest classifier (ref. 37) to gauge the accuracy of distinguishing GFPpos and GFPneg genotypes using base-level features (Online Methods).

We first tested to see whether functional motif discovery in Tdgf1 and Zfp42 enhancer regions would permit us to classify genotypes held out of initial algorithmic training as GFPpos or GFPneg. We selected two overlapping gRNAs for functional motif discovery in a Tdgf1 proximal enhancer that overlapped binding sites for the key mESC transcription factors Stat3, Sox2 and Tcfcp2l1, of which Stat3 is the only factor with a direct binding site (Fig. 4). Using mutations constrained between −20 and +20 bp of the gRNA in both the left and right paired-end reads, we were able to classify held-out genotypes with an area under the curve (AUC) of 0.81, and we observed an enrichment of the bases for the Stat3 motif (Fig. 4 and Supplementary Figs. 10 and 11). We achieved similar success at Zfp42 enhancer sites, identifying required bases around Nrf1 and p300 binding sites (Supplementary Figs. 12 and 13).

We next applied functional motif discovery to two gRNAs in a URE ~12 kb downstream of the Tdgf1 gene (Fig. 5). We obtained high classification accuracy for held-out genotypes from both gRNAs (AUC 0.81 and 0.76), and we observed blocks of consecutive bases whose deletion correlated with GFP loss, suggesting focal regions of the genome that are required for URE function (Fig. 5 and Supplementary Figs. 14 and 15). Altogether, we conclude that functional motif discovery is a valuable method for ascertaining which bases at MERA-identified regulatory regions are required for gene expression. In enhancer regions, these bases correspond to known binding motifs, and in UREs, we identify blocks of bases that are required for gene expression.

We then used homologous recombination to confirm that the enhancer and URE regulatory elements are truly required for gene expression in the third phase of MERA. We used flanking gRNAs to induce short (<100-bp) deletions in two regions predicted to induce GFP loss by our MERA screen, one in the Tdgf1 enhancer and one at a URE. As expected, we obtained a subset of cells that lost GFP expression, and clonal GFPneg lines containing the deletion genotype (Fig. 6). We then used homology-directed repair to restore the wild-type genotype in these cells, finding that a large percentage of cells at each site reverted to a GFPpos state (Fig. 6). We replicated this experiment in wild-type cells without a GFP allele, finding that clonal deletion cells lost Tdgf1 RNA expression, and clonal repaired lines restored Tdgf1 transcript expression (Fig. 6). This robust and straightforward relationship between local genotype and GFP expression provides compelling evidence that the local DNA sequence at a URE is required for Tdgf1 expression.

DISCUSSION

MERA offers an unbiased, high-resolution approach to directly interrogate the function of the regulatory genome. It not only provides a survey of required cis-regulatory elements but also enables functional motif discovery to dissect the precise nature of identified regulatory elements. We find evidence that neighboring gene promoters as well as unmarked regulatory elements (UREs) that are not associated with conventionally expected features such as DNase hypersensitivity and histone marks play unexpectedly large roles in controlling gene expression. This observation reinforces the importance of direct perturbation analysis to definitively characterize genome function, as we observe that correlative genome annotation does not fully predict regulatory requirement.

Although we do not yet have definitive data as to the function of UREs, we find that a URE downstream of the Tdgf1 gene is highly sensitive to base substitution at a string of consecutive bases, suggesting that its DNA sequence is crucial to its regulatory activity. Furthermore, we find the first half of this URE to be highly conserved (phastcons score >0.85, Supplementary Fig. 15e), indicating potential functional significance of the genomic region. Consistent with these data, UREs may be bound by RNA templates, uncharacterized protein factors, or precise spacer elements whose base sequence is of secondary importance. We cannot exclude the possibility that UREs are active only in a cellular subpopulation and thus conventionally expected DNase hypersensitivity and histone mark features are not detected when the entire cellular population is assayed.

We designed our gRNA libraries to target a mix of previously annotated and unannotated cis-regulatory regions, and thus we did not uniformly tile the proximal regions of any of these genes. Therefore, we cannot estimate the frequency of UREs, and we expect that future MERA screens with even more extensive coverage at more loci will elucidate how pervasive UREs and neighboring gene promoters are in the regulatory architecture of the genome.

MERA is complementary to high-throughput reporter assays, and future experiments including both approaches should provide insight into the degree of concordance between necessary and sufficient gene regulatory elements. MERA also enables assessment of the relative quantitative effects of distinct cis-regulatory elements on gene expression and could potentially provide insights into how regulatory regions combine to achieve desired levels of expression. We note that lentiviral delivery can be used to expand the range of cell types that can be analyzed by MERA. Extending MERA to explore how changes in individual cis-regulatory elements alter gene networks will aid our understanding of how cis-regulatory variants lead to human disease. We expect that the direct interrogation of variant locations discovered in genome-wide association studies by MERA will provide a rapid way to screen such variants for function in relevant cell types.

METHODS

Methods and any associated references are available in the online version of the paper.

Accession codes. Data have been submitted to the NCBI GEO database under accession code GSE76318.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

ACKNOWLEDGMENTS

The authors thank F. Zhang (Broad Institute of MIT and Harvard) for reagents, the MIT BioMicro Center for high-throughput sequencing assistance, and Y. Qiu for flow cytometric assistance. The authors acknowledge funding from the US National Institutes of Health to D.K.G. (1U01HG007037) and to R.I.S. (1K01DK101684-01) and Harvard Stem Cell Institute's Sternlicht Director's Fund award, Brigham and Women's Hospital BRI Innovation Fund, and Human Frontier Science Program grants to R.I.S.

AUTHOR CONTRIBUTIONS

Experiments were designed by N.R., R.I.S. and D.K.G. MERA experiments were performed by R.S., S.S., K.K., B.B. and B.J.M.E. N.R. and D.K.G. performed the computational analysis. Y.G., T.S. and M.D.E. helped with the computational analysis.

COMPETING FINANCIAL INTERESTS

The authors declare competing financial interests: details are available in the online version of the paper.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Jenuwein, T. & Allis, C.D. Translating the histone code. Science 293, 1074–1080 (2001).
2. Bernstein, B.E. et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326 (2006).
3. Rada-Iglesias, A. et al. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283 (2011).
4. Heintzman, N.D. et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature 459, 108–112 (2009).
5. Creyghton, M.P. et al. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc. Natl. Acad. Sci. USA 107, 21931–21936 (2010).
6. Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol. 30, 271–277 (2012).
7. Arnold, C.D. et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 339, 1074–1077 (2013).
8. Patwardhan, R.P. et al. Massively parallel functional dissection of mammalian enhancers in vivo. Nat. Biotechnol. 30, 265–270 (2012).
9. Fullwood, M.J. et al. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature 462, 58–64 (2009).
10. Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
11. Simonis, M. et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348–1354 (2006).
12. Dostie, J. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006).
13. Kwasnieski, J.C., Fiore, C., Chaudhari, H.G. & Cohen, B.A. High-throughput functional testing of ENCODE segmentation predictions. Genome Res. 24, 1595–1602 (2014).
14. Wang, T., Wei, J.J., Sabatini, D.M. & Lander, E.S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84 (2014).
15. Shalem, O. et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84–87 (2014).
16. Zhou, Y. et al. High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells. Nature 509, 487–491 (2014).
17. Koike-Yusa, H., Li, Y., Tan, E.P., Velasco-Herrera, Mdel.C. & Yusa, K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat. Biotechnol. 32, 267–273 (2014).
18. Chen, S. et al. Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis. Cell 160, 1246–1260 (2015).
19. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012).
20. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
21. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).
22. Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013).
23. Cradick, T.J., Fine, E.J., Antico, C.J. & Bao, G. CRISPR/Cas9 systems targeting β-globin and CCR5 genes have substantial off-target activity. Nucleic Acids Res. 41, 9584–9592 (2013).
24. Arbab, M., Srinivasan, S., Hashimoto, T., Geijsen, N. & Sherwood, R.I. Cloning-free CRISPR. Stem Cell Reports 5, 908–917 (2015).
25. Young, R.A. Control of the embryonic stem cell state. Cell 144, 940–954 (2011).
26. Yue, F. et al. Mouse ENCODE Consortium. A comparative encyclopedia of DNA elements in the mouse genome. Nature 515, 355–364 (2014).
27. Rajagopal, N. et al. RFECS: a random-forest based algorithm for enhancer identification from chromatin state. PLoS Comput. Biol. 9, e1002968 (2013).
28. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
29. John, S. et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nat. Genet. 43, 264–268 (2011).
30. Sherwood, R.I. et al. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat. Biotechnol. 32, 171–178 (2014).
31. Guo, Y., Mahony, S. & Gifford, D.K. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Comput. Biol. 8, e1002638 (2012).
32. Tsai, S.Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187–197 (2015).
33. Leung, D. et al. Integrative analysis of haplotype-resolved epigenomes across human tissues. Nature 518, 350–354 (2015).
34. Li, G. et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148, 84–98 (2012).
35. Woo, Y.H., Walker, M. & Churchill, G.A. Coordinated expression domains in mammalian genomes. PLoS One 5, e12158 (2010).
36. Vierstra, J. et al. Functional footprinting of regulatory DNA. Nat. Methods 12, 927–930 (2015).
37. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).

© 2016 Nature America, Inc. All rights reserved. the positive control gRNAs targeting the GFP coding region always inducex loss of cell number and diversity of the respective populationssequencing ( runs of bulk, GFP tion in GFP populations. Identification of gRNAs that are significantly enriched in GFP gRNAs. available of majority vast the of integration enables MERA thus and gRNA is approach taken, cloning a or as such MERA a approach lentiviral integration whether by unaffected is quality synthesis library Oligonucleotide esti we Hence, replicate. one least rate of gRNAs that are mate integration of that the is synthesized. >90% those at of library bulk the in detected were the in the gRNA for library or we errors, due integration sequenced to synthesis by was caused inefficient for gRNA. the to matches exact showed that reads sequenced of number the counting by obtained were populations the to for Counts sequence reads. each gRNA sequenced for GFP either gRNA designed the of matches exact and primer barcode, reads. MERA of Mapping ( controls positive as serve CTATGCTGGAAACAGCATAGCAAGTTTAAATAAGGCTAGT regions. ChIA-PET distal overlapping etc. modifications histone active enrichment, I DNase binding, p300 as such features enhancer other on based interactions filtered we regions, ChIA-PET of number large a of promoter. In case the with interact to predicted are that regions distal find promoter. to PolII kb ChIA-PET gene of data the we Further, used side on either 100 to up going enrichment I DNase strong of regions on based gRNAs for For region. designed this were that gRNAs 3908 profile to promoter TDGF the around Zfp42 for TDGF, libraries four of the to each gRNAs specific 3,908 designed MERA. for design Library METHODS ONLINE doi: GFP expression, which is consistent with our previous results showing that over -axis versus Step 1. 
The gRNA integration rate into cellular genomic DNA was found to be 93% to frame reading open GFP the gRNAs 10 targeting contained libraries All TTATATATCTTGTGGAAAGGACGAAACACC[GN 5. 4. 3. 2. 1. gRNAs: design to algorithm Wefollowing the used Tdgf1 10.1038/nbt.3468

c. b. a. We ordered gRNA libraries of 3,918 members from LC Sciences. LC from members 3,918 of We libraries gRNA ordered an rounding interest, sequences homologous to the U6 promoter of and gRNA hairpin. sequence sur and one, with begin already not genomic does protospacer the if G optional the targeting protospacer 20-bp region. the tile to gRNAs required 3908 the achieve “PAM” sequence: NGG the preceding immediately genome the to homology of bp 19–20 Libraries were ordered as 98- to 100-bp sequences containing a 19- to to 19- a containing sequences 100-bp to 98- as ordered were Libraries to able were we until bp 1 just by shifted gRNAs adjacent Wefiltered have should RNAs Guide format. following the in RNA guide Design strand. reverse and forward the both on sequences GG all Find RNA design. guide for interest of region Determine and and Zfp42

does not matter if the first G is in the genome. the in is G first the if matter not does sequence is NNNNNNNNNNNNNNNNNNNN NGG (N (GN GNNNNNNNNNNNNNNNNNNNN If a and b are not satisfied, the guide RNA sequence should be be should sequence RNA guide the satisfied, not are b and a If NNNNNNNN (GN NNNNNNNN (GN If the genome sequence is NGG GNNNNNNNNNNNNNNNNNNN We normalize the gRNA sequence read counts, which can vary between GNNNNNNNNNNNNNNNNNN (GN GNNNNNNNNNNNNNNNNNN (GN If a is not satisfied but GNNNNNNNNNNNNNNNNNN NGG NGG GNNNNNNNNNNNNNNNNNN but satisfied not is a If but appeared to be only 43% for 43% only to be but appeared Rpp25 neg y In order to detect gRNAs with statistically significant overrepresenta library were synthesized accurately. Among these, 1,718/1,723 1,718/1,723 these, Among accurately. synthesized were library 19 -axis limits) . In order to normalize these read ranges, we assume that 18 and GFP NGG), the guide RNA sequence should be GNNNNNNNNNNN NGG) is satisfied, the guide RNA sequence should be be should sequence RNA guide the satisfied, is NGG) Nanog . For TDGF we selected a −20 kb to +20 kb proximal region region proximal kb +20 to kb −20 a selected we TDGF For . Zfp42 medium , , Rpp25 Supplementary Methods Supplementary and found that only 1,723 of the 3,919 guide RNAs We mapped the sequence composing of sample sample of composing sequence the mapped We medium 19 populations, we perform a step-wise procedure. In addition to 10 GFP-targeting gRNAs, we we gRNAs, GFP-targeting 10 to addition In and and ); and GFP Zfp42 , we prioritized the design of 3,908 3,908 of design the prioritized we , Zfp42 neg populations due to differences in . In order to determine if this this if . to In determine order 18 ); Supplementary Fig. 4b ). 20 neg ) where the genomic genomic the where ) 18-20 , , GFP neg ]GTTTAAGAG and GFP medium 20 NGG)—it or bulk Nanog medium , c - - - , ,

Analysis of deep sequencing data sets. Since 99% of cells receiving a GFP-targeting gRNA lose GFP expression (Supplementary Fig. 4c), GFP^neg reads are always observed to be proportional to the bulk reads for the GFP-targeting gRNAs, to a much greater extent than for all guide RNAs (Supplementary Table 5). Hence, we predict the number of GFP^neg or GFP^medium reads we would see for each gRNA given its bulk count if it always caused GFP loss. In order to do this, we build two different kinds of linear models depending on the data available for each gene (Nanog, Rpp25, Tdgf1 or Zfp42).

I. In the first case, we have a GFP^neg as well as a GFP^medium population, along with 3 to 4 biological replicates per cell line. We assume that for any GFP-targeting gRNA, the majority of bulk reads are derived from the GFP^neg and to a lesser extent the GFP^medium population. However, each gRNA may also cause some intermediate loss of GFP due to variable mutations or imperfect sorting. In addition, there is a low GFP^pos population, which may be a small fraction of mutations induced by a particular gRNA that do not cause GFP loss. In order to transform the bulk reads to the GFP^neg scale, we model GFP^neg as the dependent variable, and GFP^medium and bulk reads as independent variables, using a generalized linear model (ref. 38). The intercept is modeled as being dependent on the gRNA but independent across replicates, while the slopes are considered as having a replicate-dependent component also. The model is of the form

y ~ x1 + x2 + (z11 | g1) + (x1 | g2) + (x2 | g2)

where y = GFP^neg, x1 = bulk, x2 = GFP^medium, z11 = intercept, g1 = grouping by gRNA, g2 = grouping by replicate. In order to transform the bulk reads to the GFP^medium scale, we use the same model but with y = GFP^medium, x1 = bulk, x2 = GFP^neg.

II. In the second case, we have only a GFP^neg population and at most 2 replicates. In this case we build an independent linear regression model for each replicate of the form

y ~ x1 + z11

where y = GFP^neg, x1 = bulk, z11 = intercept.

Step 2. Using the linear regression models, we now transform all bulk reads to either the GFP^neg or the GFP^medium scale, depending on whether we are interested in finding enriched gRNAs in the GFP^neg or GFP^medium populations, respectively. We now use the fact that since the dummy gRNA (negative control) should not occur in GFP^neg/GFP^medium cells, any reads corresponding to this gRNA in the GFP^neg/GFP^medium population are due to random chance. Hence, by dividing the GFP^neg/GFP^medium reads for the dummy gRNA by the number of bulk reads for the dummy gRNA transformed to the GFP^neg/GFP^medium scale, we can obtain the null probability of observing reads in the GFP^neg/GFP^medium population. We then use a binomial distribution to calculate the significance of the observed GFP^neg/GFP^medium reads for a gRNA based on this null probability, with the number of the gRNA's GFP^neg/GFP^medium reads as the number of successes, and the number of bulk-transformed reads for the gRNA as the number of trials.

Data sets for comparison and visualization with enriched gRNA. Enhancer predictions. The enhancer predictions were made using the RFECS method using 6 histone modifications from ENCODE, trained on p300 binding site data from mouse embryonic stem cells. Enhancers were separated into "strong" and "weak" categories based on presence of H3K27ac at levels greater than input. Further boundaries of enhancers were called using a Sobel edge-detection algorithm implemented in MATLAB. Edges were identified for input-subtracted RPKM (reads per kilobase per million)-normalized H3K27ac reads for strong enhancers and RPKM-normalized H3K4me1 reads for weak enhancers.

DNase I hotspot. We used the DNase-seq data set previously generated and called hotspots using a standard hotspot algorithm.

TF density. The GEM algorithm was applied to transcription factor ChIP-seq for the following data sets: Oct4, Nanog, Sox2, TCF3, p300, Smc1, CTCF, Smad3, c-Myc, Med12, Med1, E2F1, Esrrb, Klf4, n-Myc, Nr5a2, Stat3, Tcfcp2l1, Zfx.

The UCSC genome browser (ref. 39) was used to visualize the data and create genomic view snapshots for regulatory regions of various genes.
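The simpler of the two transformations in the analysis of deep sequencing data (one ordinary linear regression per replicate, GFP^neg reads against bulk reads plus an intercept) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fit_line` and `to_sorted_scale` are hypothetical, the fit is plain least squares on the GFP-targeting gRNAs, and the read counts in the usage note are made up. The mixed-effects model used when GFP^medium data and more replicates are available is not reproduced here.

```python
def fit_line(bulk_reads, sorted_reads):
    """Least-squares fit of sorted_reads ~ slope * bulk_reads + intercept.

    Assumption: both lists hold read counts for GFP-targeting gRNAs
    from a single replicate.
    """
    n = len(bulk_reads)
    if n < 2 or n != len(sorted_reads):
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(bulk_reads) / n
    mean_y = sum(sorted_reads) / n
    sxx = sum((x - mean_x) ** 2 for x in bulk_reads)
    if sxx == 0:
        raise ValueError("bulk read counts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(bulk_reads, sorted_reads))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_sorted_scale(bulk, slope, intercept):
    """Transform a bulk read count to the sorted-population (GFP^neg) scale."""
    return slope * bulk + intercept
```

For example, fitting on bulk counts `[10, 20, 30, 40]` and GFP^neg counts `[3, 5, 7, 9]` gives a slope of 0.2 and an intercept of 1, so a gRNA with 50 bulk reads would be expected to show about 11 GFP^neg reads if it always caused GFP loss.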

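The binomial significance calculation for enriched gRNAs can be illustrated as follows. This is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the test is taken to be one-sided (probability of seeing at least the observed number of sorted reads), and the bulk-transformed count is rounded to an integer number of trials.

```python
import math


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def enrichment_p_value(sorted_reads, bulk_transformed,
                       dummy_sorted_reads, dummy_bulk_transformed):
    """Significance of a gRNA's reads in a sorted population.

    The null probability is the dummy (negative control) gRNA's sorted reads
    divided by its bulk reads transformed to the same scale. Successes are the
    gRNA's sorted reads, trials are its bulk-transformed reads.
    """
    null_p = dummy_sorted_reads / dummy_bulk_transformed
    trials = int(round(bulk_transformed))  # assumption: round to whole reads
    return binom_upper_tail(sorted_reads, trials, null_p)
```

With a dummy gRNA showing 5 sorted reads per 100 bulk-transformed reads, a gRNA with 50 sorted reads out of 100 trials gets a vanishingly small p-value, while one with 5 sorted reads out of 100 is unremarkable.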
Functional motif discovery. Individual scCRISPR-mediated mutation by a selected gRNA was performed in a large pool of cells to create tens of thousands of unique mutated genotypes at the site. We then flow cytometrically sorted GFP^pos and GFP^medium/neg populations and performed 150-bp paired-end sequencing on regions surrounding each targeted site to obtain genotypic data on thousands of mutated regions that did and did not induce loss of GFP expression (Fig. 4a). Deep-sequencing data sets were filtered for sequence quality using a minimum base quality filter of 30. After stripping of barcodes, the length of each paired-end read was 145 bp. We aligned these 145-bp long genotypes to the reference genotype extended by 30 bp downstream (total of 175 bp). Alignment of sequenced reads to the reference genome was performed using the semi-global version of the Needleman-Wunsch algorithm (ref. 40) with a gap opening penalty of 8 and gap extension penalty of 4. The command in MATLAB used was:

nwalign(Reference_Seq, Genotype_seq,'alphabet','NT','gapopen',8,'ExtendGap',4,'glocal','true');

After globally aligning and filtering reads for sequence quality (per base quality ≥ 30), mismatches, deletions and insertions were counted with respect to the position in the reference. We observed long stretches of mutations with combinations of mismatches and deletions (Supplementary Fig. 10a). Hence, we defined a "length of disruption" as a continuous series of mutations with maximum intervening matches of <5 bases. We plotted the left and right ends of these disruptions and observed that the majority of disruptions originated within the gRNA, as expected, with very few short base mutations lying outside that could be assumed to be one or two sequencing errors (Supplementary Fig. 10). While a majority of disruptions extending beyond the ends of the guide RNA were enriched for the GFP^medium/neg population (Supplementary Fig. 11a,b; yellow-red versus blue), we also observed a mixed population of GFP^pos and GFP^medium/neg genotypes with deletions lying within −20/+20 bp of the gRNA. Since we wish to assess the local effect of the gRNA on GFP loss, we limited further analysis to genotypes with disruptions that originate within the gRNA and do not go beyond 20 bp of the gRNA.

Restricting our analysis to these genotypes, we observed increased mutation around the gRNA cleavage site in both GFP^pos and GFP^medium/neg populations (Supplementary Fig. 11c,d). Mismatch, deletion, and insertion mutations were all observed, with deletions predominating in the GFP^neg populations (Supplementary Figs. 13 and 14).

In order to develop a base-level motif logo, we defined a base-level score representing the deviation of the GFP^pos population from reference as compared to the deviation of the GFP^neg population from reference. In order to find the distance of a base from reference, we used the Hellinger distance measure (ref. 41) for finding the distance between two discrete distributions:

\[ H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2 } \]

Here, we had five possible values per base, which were the frequency of occurrence of each base type (A, C, T, and G) and a fifth, deletion (D). The motif score at any base was defined as:

\[ \text{base score} = \log \frac{H(\mathrm{GFP}^{neg}, \text{reference})}{H(\mathrm{GFP}^{pos}, \text{reference})} \]

These base scores were plotted as a motif logo along −20/+20 bp of the gRNA to indicate the relative importance of each base, independent of the cutting biases of the gRNA. It should be noted that since all mutations for GFP^pos as well as GFP^neg arise within the seed region of the gRNA, it is sometimes difficult to obtain a base-level importance score for these bases surrounding the cleavage site. However, due to the random lengths of stretches of mutations originating from the cleavage site, we can observe distinct sequence profiles emerging upstream and downstream of these bases.

Classification of GFP^pos and GFP^neg genotypes. We performed fivefold classification of unique genotypes in GFP^pos and GFP^neg populations using a parallelized random forest implemented in MATLAB. We represented mismatches, insertions and deletions within −20/+20 bp of the gRNA as features. For all of the bases within the gRNA we represented 5 possibilities: A, C, T, G, and deletion. The feature for a base was one of four values for a particular base or the integer number of deleted bases starting at that base. Converting this categorical representation to numeric format, we obtained 5×(length of gRNA + 40) features. Insertions were represented as the integer number of bases inserted immediately after each base of the gRNA and flanking boundaries. Hence, the total features were = 6×(length of gRNA + 40). We used 100 trees and ascertained that the out-of-bag classification error had reached convergence at this parameter value. The classification rate for a test-set genotype was computed in an unweighted manner by counting each test-set genotype only once. In the case of weighted accuracy measures, we weighted the accuracy of classification for each test-set genotype by the number of reads assigned to it.
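The base-level score from the functional motif discovery step compares how far the GFP^neg and GFP^pos base frequencies at a position deviate from the reference, using the Hellinger distance. The sketch below is a minimal illustration, not the authors' code. It assumes that the reference is a one-hot distribution on the reference base, that the score is the log of the GFP^neg distance over the GFP^pos distance, and that a small pseudocount (my addition) guards against a zero distance. All names are hypothetical.

```python
import math

BASES = ("A", "C", "T", "G", "D")  # four base types plus deletion (D)


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)


def frequencies(calls):
    """Frequency of A, C, T, G and deletion among the calls at one position."""
    if not calls:
        raise ValueError("no base calls at this position")
    return [calls.count(b) / len(calls) for b in BASES]


def base_score(neg_calls, pos_calls, ref_base, pseudocount=1e-6):
    """Log ratio of the GFP^neg and GFP^pos deviations from reference.

    pseudocount is an assumption added here to avoid log(0) or division
    by zero when a population matches the reference exactly.
    """
    reference = [1.0 if b == ref_base else 0.0 for b in BASES]
    h_neg = hellinger(frequencies(neg_calls), reference)
    h_pos = hellinger(frequencies(pos_calls), reference)
    return math.log((h_neg + pseudocount) / (h_pos + pseudocount))
```

A position that is mostly deleted in GFP^neg genotypes but mostly intact in GFP^pos genotypes gets a positive score under this convention, and the sign flips if the two populations are swapped.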

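The feature layout for genotype classification, 6×(length of gRNA + 40) numeric features per genotype, can be made concrete with a small encoder. This is one possible reading of the description, offered as a sketch: each position contributes a one-hot indicator for A, C, T or G, the number of deleted bases starting there, and the number of bases inserted immediately after it. The per-position ordering of features, the input format and all function names are assumptions. The random forest itself is not reproduced.

```python
BASES = "ACTG"


def encode_genotype(positions):
    """Encode one aligned genotype as a flat numeric feature vector.

    positions: one (base, deleted, inserted) tuple per reference position in
    the window, where base is 'A', 'C', 'T', 'G' or None if the position is
    deleted, deleted is the number of deleted bases starting at this position,
    and inserted is the number of bases inserted immediately after it.
    """
    features = []
    for base, deleted, inserted in positions:
        one_hot = [1 if base == b else 0 for b in BASES]
        features.extend(one_hot + [deleted, inserted])  # 6 features per position
    return features


def accuracy(correct, read_counts=None):
    """Unweighted accuracy, or read-weighted accuracy if counts are given."""
    if read_counts is None:
        return sum(correct) / len(correct)
    weighted_hits = sum(n for ok, n in zip(correct, read_counts) if ok)
    return weighted_hits / sum(read_counts)
```

For a 20-nt gRNA the window spans 60 positions, so `encode_genotype` returns 360 features. `accuracy([True, False], [9, 1])` returns 0.9, showing how a genotype backed by many reads dominates the weighted measure.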
Conservation of bases. We examined the vertebrate phastcons score for every base in the gRNA at the URE to see if there was a correspondence with the importance of the base for regulation as determined above (Supplementary Fig. 15e).

Off-target effect analysis. To analyze potential false positives caused by off-target effects, we built a model of CRISPR off-target cutting using data from 13 gRNAs that were generated by GUIDE-Seq. We found that, in inverse proportion to the GC content, gRNAs could tolerate a maximum total of 3–6 mismatches including the PAM region, with up to 3 mismatches in the seed region (9–20 bp) (Supplementary Fig. 7a–d). Using our MERA data, we defined a true negative set as gRNAs that were tested in all replicates but did not cause a significant loss of GFP. Based on the positive and negative sets, we created a set of five rules to predict off-target effects. Rules 1–4 set the number of total mismatches (between 3 and 6) and seed mismatches (2 or 3) tolerated, according to the total GC content of the gRNA (with thresholds between 9 and 16) and the GC content of its seed (>7 or ≤7). Rule 5 is that no adjacent pairs of mismatches in the seed region (8–20 bp) are allowed.

Our off-target effect model predicted that none of these negative-set gRNAs had a significant potential off-target effect overlapping a genomic site (Supplementary Fig. 7e). In the Tdgf1 library, 1,160/3,621 of the integrated gRNAs have potential off-target effects, and 150/925 of the gRNAs that were significantly enriched in the GFP^neg populations have one or two potential off-target sites within the topological domain containing the Tdgf1 gene as determined from mESC HiC data (ref. 42). In the Zfp42 library, 632/1,643 integrated guide RNAs have predicted off-target effects, and 34/332 of the gRNAs enriched in GFP^neg cells have predicted off-target effects in the topological domain containing the Zfp42 gene.

Our off-target predictions are overly cautious, as off-target cutting is typically much rarer than on-target cutting and the off-target sites predicted for MERA hits are often >200 kb away from the gene, reducing the likelihood of functional association. However, even this overestimate of off-target effects does not alter the patterns seen in MERA data.

Experimental methods. Experimental methods are described in the Supplementary Methods.

38. Nelder, J.A. & Wedderburn, R.W. Generalized linear models. J. R. Stat. Soc. Ser. A 135, 370–384 (1972).
39. Kent, W.J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
40. Needleman, S.B. & Wunsch, C.D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970).
41. Liese, F. & Miescke, K.J. Statistical Decision Theory: Estimation, Testing, and Selection (Springer, 2008).
42. Dixon, J.R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).

doi:10.1038/nbt.3468