© 2013 Nature America, Inc. All rights reserved.

ARTICLES
NATURE METHODS | ADVANCE ONLINE PUBLICATION

Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position

Jason D Buenrostro, Paul G Giresi, Lisa C Zaba, Howard Y Chang & William J Greenleaf

Department of Genetics, Stanford University School of Medicine, Stanford, California, USA. Program in Epithelial Biology, Stanford University School of Medicine, Stanford, California, USA. Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, California, USA. Correspondence should be addressed to H.Y.C. ([email protected]) or W.J.G. ([email protected]).

RECEIVED 4 JUNE; ACCEPTED 29 AUGUST; PUBLISHED ONLINE 6 OCTOBER 2013; DOI:10.1038/NMETH.2688

We describe an assay for transposase-accessible chromatin using sequencing (ATAC-seq), based on direct in vitro transposition of sequencing adaptors into native chromatin, as a rapid and sensitive method for integrative epigenomic analysis. ATAC-seq captures open chromatin sites using a simple two-step protocol with 500–50,000 cells and reveals the interplay between genomic locations of open chromatin, DNA-binding proteins, individual nucleosomes and chromatin compaction at nucleotide resolution. We discovered classes of DNA-binding factors that strictly avoided, could tolerate or tended to overlap with nucleosomes. Using ATAC-seq maps of human CD4+ T cells from a proband obtained on consecutive days, we demonstrated the feasibility of analyzing an individual's epigenome on a timescale compatible with clinical decision-making.

Eukaryotic genomes are packaged hierarchically into chromatin¹, and the nature of this packaging plays a central role in gene regulation²,³. Major insights into the epigenetic information encoded within the nucleoprotein structure of chromatin have come from high-throughput, genome-wide methods for separately assaying chromatin accessibility ('open chromatin'), nucleosome positioning and transcription factor (TF) occupancy⁴–⁹. Though they are powerful, published protocols for existing methods require millions of cells as starting material, involve complex and time-consuming sample preparations, and cannot simultaneously probe the interplay of nucleosome positioning, chromatin accessibility and TF binding. These limitations give rise to three shortcomings. First, current methods can average over and 'drown out' heterogeneity in cellular populations. Second, cells must often be grown ex vivo to obtain sufficient material, and such conditions perturb the in vivo context and modulate the epigenetic state in unknown ways. Third, input requirements often prevent application of these assays to well-defined clinical samples, thereby precluding generation of 'personal epigenomes' on diagnostic timescales. Here we report a robust and sensitive method for epigenomic profiling that can provide a multidimensional portrait of …

We used ATAC-seq to identify regions of open chromatin, identify nucleosome-free and nucleosome-bound positions in regulatory regions, and infer the positions of DNA-binding proteins using 'footprinting' in a B-cell line. We demonstrated that this method is compatible with clinical timescales and with standard blood draws by observing the open-chromatin landscape of a healthy volunteer.

RESULTS

ATAC-seq probes chromatin accessibility with transposomes

A hyperactive Tn5 transposase for high-throughput DNA sequencing can simultaneously fragment and tag a genome with sequencing adaptors (previously described as 'tagmentation')¹⁰,¹¹. Transposons have previously been shown to integrate into active regulatory elements in vivo¹². We therefore hypothesized that transposition by purified Tn5, a prokaryotic transposase, on small numbers of unfixed eukaryotic nuclei would interrogate regions of accessible chromatin. In ATAC-seq, Tn5 transposase loaded with adaptors integrates its adaptor payload into regions of accessible chromatin, whereas steric hindrance in less accessible chromatin makes such transposition less probable. Amplifiable DNA fragments suitable for high-throughput sequencing are therefore preferentially generated at locations of open chromatin (Fig. 1a). The entire assay and library construction can be carried out in a simple protocol involving Tn5 insertion followed by PCR. In contrast, published DNase-seq¹³ and formaldehyde-assisted isolation of regulatory elements with sequencing (FAIRE-seq)¹⁴ methods for assaying chromatin accessibility involve multistep protocols and many potentially loss-prone steps, such as adaptor ligation, gel purification and cross-link reversal (Fig. 1c).

We carried out ATAC-seq on sets of 50,000 and 500 unfixed nuclei isolated from the GM12878 human lymphoblastoid cell line (Encyclopedia of DNA Elements (ENCODE) Tier 1; ref. 15). We compared and validated the results against chromatin-accessibility data sets, including DNase-seq and FAIRE-seq¹⁶, which were generated from 1–50 million cells. At a locus previously highlighted by others, ATAC-seq had a signal-to-noise ratio similar to that of DNase-seq. The DNase-seq data were generated from ~3–5 orders of magnitude more cells (Fig. 1b).
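The sketch below illustrates why amplifiable fragments concentrate at open chromatin. It is a toy simulation, not the authors' model. The insertion probabilities inside and outside accessible intervals (`p_open`, `p_closed`) are hypothetical parameters, and so is the retained fragment-size window (`min_len`, `max_len`). Consecutive insertion sites bound a fragment, and only fragments short enough to amplify and sequence are kept.

```python
import random
from typing import List, Tuple


def simulate_insertions(length: int, open_intervals: List[Tuple[int, int]],
                        p_open: float, p_closed: float, seed: int = 0) -> List[int]:
    """Toy model: every base is an insertion site with probability p_open inside
    accessible intervals and p_closed elsewhere (both rates are hypothetical)."""
    rng = random.Random(seed)
    is_open = [False] * length
    for start, end in open_intervals:  # half-open [start, end) intervals
        for i in range(start, end):
            is_open[i] = True
    return [i for i in range(length)
            if rng.random() < (p_open if is_open[i] else p_closed)]


def fragments_from_insertions(insertions: List[int], min_len: int,
                              max_len: int) -> List[Tuple[int, int]]:
    """Consecutive insertion sites bound a fragment; keep only fragments whose
    size falls inside a hypothetical amplifiable/sequenceable window."""
    sites = sorted(insertions)
    return [(left, right) for left, right in zip(sites, sites[1:])
            if min_len <= right - left <= max_len]


# Usage: a 10-kb toy genome with a single 1-kb accessible region.
open_regions = [(4_000, 5_000)]
sites = simulate_insertions(10_000, open_regions, p_open=0.02, p_closed=0.0005)
frags = fragments_from_insertions(sites, min_len=20, max_len=500)
inside = sum(1 for a, b in frags if 4_000 <= a and b <= 5_000)
print(f"{inside} of {len(frags)} retained fragments lie inside the open region")
```

Sparse insertions in closed chromatin are usually spaced farther apart than the size window allows. As a result, most retained fragments in a run like this tend to come from the accessible interval.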

Peak intensities were highly reproducible between technical replicates (R = 0.98). They were also highly correlated between ATAC-seq and two different sources of DNase-seq data (R = 0.79 and R = 0.83; Supplementary Fig. 1). The majority of reads within peaks came from intersections of DNase and ATAC-seq peaks (Supplementary Fig. 2). We compared our data to DNase-hypersensitive sites identified in ENCODE DNase-seq data. Receiver operating characteristic curves demonstrated similar sensitivity and specificity for ATAC-seq and DNase-seq (Supplementary Fig. 3). ATAC-seq peak intensities also correlated well with markers of active chromatin and not with transposase sequence preference (Supplementary Figs. 4 and 5). Highly sensitive open-chromatin detection was maintained even when using 5,000 or 500 human nuclei as starting material (Supplementary Fig. 6), although sensitivity was diminished for smaller numbers of input material.

ATAC-seq insert sizes disclose nucleosome positions

We found that ATAC-seq paired-end reads produced detailed information about nucleosome packing and positioning. The insert size distribution of sequenced fragments from human chromatin had a clear periodicity of approximately 200 bp, suggesting that many fragments are protected by integer multiples of nucleosomes (Fig. 2a). This fragment size distribution also showed clear periodicity equal to the helical pitch of DNA (~10.5 bp). We partitioned the insert size distribution according to functional classes of chromatin, as defined by previous models¹⁷, and normalized to the global insert distribution (Online Methods). This revealed clear class-specific enrichments across the insert size distribution (Fig. 2b). This result demonstrated that these functional states of chromatin have an accessibility fingerprint that can be read out with ATAC-seq. The differential fragmentation patterns were consistent with the putative functional state of these classes. CCCTC-binding factor (CTCF)-bound regions were enriched for short fragments of DNA, whereas transcription start sites (TSSs) were differentially depleted for mono-, di- and trinucleosome-associated fragments. Transcribed and promoter-flanking regions were enriched for longer multinucleosomal fragments. This suggests that such regions represent a form of chromatin that is less accessible than that of TSSs.

Figure 1 | ATAC-seq probes open-chromatin state. (a) ATAC-seq reaction schematic. Transposase (green), loaded with sequencing adaptors (red and blue), inserts only in regions of open chromatin (nucleosomes in gray) and generates sequencing-library fragments that can be PCR-amplified. (b) Comparison of ATAC-seq to other open-chromatin assays at a locus in GM12878 lymphoblastoid cells. The lower ATAC-seq track was generated from 500 FACS-sorted cells. Bottom, composite locations of CTCF and histone modifications associated with active enhancers and promoters. DNase-seq (Duke) and FAIRE-seq (University of North Carolina, UNC) data were from the indicated ENCODE production centers. (c) Approximate reported input material and sample preparation time requirements for genome-wide methods of open-chromatin analysis.

[Figure 1 image: (a) Tn5 transposome inserting adaptors into open but not closed chromatin, followed by amplification; (b) browser tracks (hg19, chr19 q13.12, 50-kb scale) for ATAC-seq (50,000 cells per replicate), ATAC-seq (500 cells), DNase HS (ENCODE/Duke), FAIRE-seq (ENCODE/UNC), CTCF, H3K4me1, H3K27ac and H3K4me3; (c) number of cells and preparation time (days) for ATAC-seq, DNase-seq and FAIRE-seq.]

Figure 2 | ATAC-seq provides genome-wide information on chromatin compaction. (a) ATAC-seq fragment sizes generated from GM12878 nuclei. Inset, log-transformed histogram shows that clear periodicity persists to six nucleosomes. (b) Normalized read enrichments for seven classes of previously defined chromatin state.

[Figure 2 image: (a) normalized read density (×10⁻³) versus fragment length (bp), with nucleosome-free, mononucleosome, dimer, trimer, tetramer, pentamer and hexamer peaks and the DNA helical pitch (~10.5 bp) marked; (b) enrichment (−0.15 to 0.15) versus fragment length for the chromatin states CTCF, TSS, weak enhancer, enhancer, promoter flanking, transcribed and repressed.]
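The sketch below shows the kind of normalization behind Figure 2b. It assumes fragment lengths have already been assigned to chromatin-state classes, and that "normalizing to the global insert distribution" can be approximated by subtracting the pooled length distribution from each class's distribution. That subtraction is an assumption; the paper's exact procedure is in its Online Methods. The bin size and toy lengths are illustrative, and each class must contain at least one fragment. In practice the global distribution would come from all fragments, not only those in the listed classes.

```python
from collections import Counter
from typing import Dict, List


def length_distribution(lengths: List[int], bin_size: int) -> Dict[int, float]:
    """Fraction of fragments in each length bin, keyed by the bin's lower edge."""
    counts = Counter((n // bin_size) * bin_size for n in lengths)
    return {b: c / len(lengths) for b, c in counts.items()}


def class_enrichment(by_class: Dict[str, List[int]],
                     bin_size: int = 50) -> Dict[str, Dict[int, float]]:
    """Per-class fragment-length distribution minus the pooled distribution
    (an assumed normalization). Positive values mark over-represented lengths."""
    pooled = [n for lengths in by_class.values() for n in lengths]
    global_dist = length_distribution(pooled, bin_size)
    enrichment = {}
    for name, lengths in by_class.items():
        dist = length_distribution(lengths, bin_size)
        enrichment[name] = {b: dist.get(b, 0.0) - frac
                            for b, frac in sorted(global_dist.items())}
    return enrichment


# Usage with toy fragment lengths (bp) for two hypothetical chromatin states.
toy = {"CTCF": [60, 70, 80, 220], "Repressed": [60, 220, 400, 410]}
for state, profile in class_enrichment(toy, bin_size=100).items():
    print(state, profile)
```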

Finally, prior studies have shown that certain DNA sequences are refractory to nuclease digestion and are released as large, multinucleosome-sized fragments¹⁸,¹⁹. Subsequent studies showed that such fragments are condensed heterochromatin²⁰,²¹. Indeed, we found that repressed, untranscribed regions (as defined by the chromatin state model) are depleted for short fragments and enriched for phased multinucleosomal inserts, which is consistent with their expected inaccessible state. These data suggest that ATAC-seq reveals differentially accessible forms of chromatin, which have long been hypothesized to exist in vivo.

To explore nucleosome positioning within accessible chromatin in the GM12878 cell line, we partitioned our data into two sets: reads generated from putative nucleosome-free regions of DNA and reads likely derived from nucleosome-associated DNA (Online Methods and Supplementary Fig. 7). We used a simple heuristic that positively weights nucleosome-associated fragments and negatively weights nucleosome-free fragments (Online Methods). From it we calculated a nucleosome signal track, which we used to call nucleosome positions within regions of accessible chromatin (Fig. 3a). An example locus (Fig. 3a) contains a putative bidirectional promoter, with cap analysis of gene expression (CAGE) data showing two TSSs separated by ~700 bp. ATAC-seq revealed that these TSSs in fact lie in two distinct nucleosome-free regions separated by a single well-positioned mononucleosome (Fig. 3a). We compared ATAC-seq to micrococcal nuclease digestion followed by sequencing (MNase-seq)⁶,⁷. ATAC-seq data were more amenable to detecting nucleosomes within putative regulatory regions, because the majority of reads were concentrated within regions of accessible chromatin (Fig. 3b).

Figure 3 | ATAC-seq provides genome-wide information on nucleosome positioning in regulatory regions. (a) Locus containing two transcription start sites (TSSs) showing a nucleosome-free read track and calculated nucleosome track (Online Methods), as well as ENCODE DNase, MNase, H3K27ac, H3K4me3 and H2A.Z tracks for comparison. DNase-seq (University of Washington, UW) data were from the indicated ENCODE production center. (b) ATAC-seq (198 million paired reads) and MNase-seq (4 billion single-end reads from ref. …) nucleosome signal for all active TSSs (n = 64,836). TSSs are sorted by cap analysis of gene expression (CAGE) values. (c) TSSs are enriched for nucleosome-free fragments and show phased nucleosomes similar to those seen by MNase-seq at the −2, −1, +1, +2, +3 and +4 positions. (d) Relative fraction of nucleosome-associated versus nucleosome-free (NFR) bases in TSS and distal sites (Online Methods). (e) Hierarchical clustering of DNA-binding factor position with respect to the nearest nucleosome dyad within accessible chromatin.

[Figure 3 image: (a) chr19 locus (WDR6, THAP8) with ATAC-seq nucleosome-free and nucleosome signal tracks; (b, c) signal from −1 kb to +1 kb around TSSs sorted by expression; (d) nucleosome versus NFR base fractions at TSS and distal sites, annotated with median NFR sizes of 239 bp and 86 bp; (e) factor clusters labeled strongly nucleosome avoiding, CTCF-like, nucleosome insensitive and nucleosome adjacent/associated, colored by enrichment (−0.35 to 0.35).]
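The nucleosome signal heuristic can be illustrated with a per-base track. In this track, nucleosome-sized fragments add +1 and nucleosome-free fragments add −1 over the bases they cover. The size cutoffs (`NFR_MAX`, `NUC_RANGE`) are hypothetical placeholders, not the thresholds used in the paper. The run-midpoint caller (`call_positions`) is likewise a hypothetical stand-in, not the paper's nucleosome caller.

```python
from typing import List, Tuple

NFR_MAX = 100            # hypothetical: fragments <= 100 bp treated as nucleosome free
NUC_RANGE = (180, 247)   # hypothetical: size window treated as mononucleosomal


def nucleosome_signal(fragments: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-base score: +1 for every covering nucleosome-sized fragment and -1 for
    every covering nucleosome-free fragment; other sizes are ignored.
    Fragments are half-open [start, end) intervals inside [0, length]."""
    diff = [0] * (length + 1)  # difference array: O(1) work per fragment
    for start, end in fragments:
        size = end - start
        if size <= NFR_MAX:
            weight = -1
        elif NUC_RANGE[0] <= size <= NUC_RANGE[1]:
            weight = 1
        else:
            continue
        diff[start] += weight
        diff[end] -= weight
    track, running = [], 0
    for i in range(length):
        running += diff[i]
        track.append(running)
    return track


def call_positions(track: List[int], min_score: int) -> List[int]:
    """Midpoints of maximal runs with signal >= min_score (a crude stand-in
    for a dedicated nucleosome caller)."""
    calls, start = [], None
    for i, value in enumerate(track + [min_score - 1]):  # sentinel closes a final run
        if value >= min_score and start is None:
            start = i
        elif value < min_score and start is not None:
            calls.append((start + i - 1) // 2)
            start = None
    return calls


# Usage: one short (nucleosome-free) fragment, two overlapping nucleosome-sized
# fragments, and one 150-bp fragment that falls outside both windows.
frags = [(0, 50), (100, 300), (120, 320), (0, 150)]
track = nucleosome_signal(frags, 400)
print(call_positions(track, 2))  # prints [209]
```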

We averaged signal across all active TSSs. Nucleosome-free fragments were enriched at a canonical nucleosome-free promoter region overlapping the TSS. Our nucleosome signal, by contrast, was enriched both upstream and downstream of the active TSS and displayed characteristic phasing of upstream and downstream nucleosomes (Fig. 3c). Because ATAC-seq reads are concentrated at regions of open chromatin, we saw a strong nucleosome signal at the +1 nucleosome that decreased at the +2, +3 and +4 nucleosomes. In contrast, MNase-seq nucleosome signal increased at larger distances from the TSS, likely owing to overdigestion of more accessible nucleosomes²². Additionally, MNase-seq (4 billion reads) assays all nucleosomes, whereas reads generated from ATAC-seq (198 million paired reads) are concentrated at regulatory nucleosomes (Fig. 3b). Using our nucleosome calls²³, we further partitioned putative distal regulatory regions and TSSs into two groups: regions that were nucleosome free and regions that were predicted to be nucleosome bound. TSSs were enriched for nucleosome-free regions when compared to distal elements, which tend to remain nucleosome rich (Fig. 3d). These data suggest that ATAC-seq can provide high-resolution readouts of nucleosome-associated and nucleosome-free regions in regulatory elements genome wide.

ATAC-seq reveals patterns of nucleosome-TF spacing

Using chromatin immunoprecipitation followed by sequencing (ChIP-seq) data, we plotted the position of a variety of DNA-binding factors with respect to the dyad of the nearest nucleosome derived from ATAC-seq data. Unsupervised hierarchical clustering (Fig. 3e) revealed major classes of binding with respect to the proximal nucleosome:

(i) a strongly nucleosome-avoiding group of factors with stereotyped binding events ~180 bases from the nearest nucleosome dyad (comprising C-FOS, NFYA and IRF3);
(ii) a class of factors that 'nestle up' next to the expected end of nucleosome DNA contacts, which notably includes the chromatin looping factor CTCF and the cohesin-complex subunits RAD21 and SMC3;
(iii) a large class of TFs that have gradations of nucleosome-avoiding or nucleosome-overlapping binding behavior;
(iv) a final class of factors whose binding sites tend to overlap nucleosome-associated DNA.

Interestingly, this final class includes chromatin-remodeling factors such as CHD1 and SIN3A, as well as RNA polymerase II, which appears to be enriched at the nucleosome boundary. The interplay between precise nucleosome positioning and the locations of DNA-binding factors immediately suggests specific hypotheses for mechanistic studies.

ATAC-seq footprints infer factor occupancy genome wide

We reasoned that DNA sequences directly occupied by DNA-binding proteins are protected from transposition. The resulting sequence 'footprint' would reveal the presence of a DNA-binding protein at each site, analogous to DNase digestion footprints²⁴,²⁵. At a specific CTCF-binding site on chromosome 1, we observed a clear footprint (a deep notch in ATAC-seq signal) similar to footprints seen with DNase-seq data (Fig. 4a).

Figure 4 | ATAC-seq assays genome-wide factor occupancy. (a) CTCF footprints observed in ATAC-seq and DNase-seq data at a specific locus on chromosome 1. DNase-seq (Duke) and ChIP-seq (Broad Institute) data were from the indicated ENCODE production centers. (b) Aggregate ATAC-seq footprint for CTCF (motif shown) generated over binding sites within the genome. (c) CTCF predicted binding probability inferred from ATAC-seq data, position weight matrix (PWM) scores for the CTCF motif, and evolutionary conservation (PhyloP). Shown in the rightmost column is the CTCF ChIP-seq data (ENCODE) for this GM12878 cell line.
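The nucleosome-TF spacing analysis places each factor's binding sites relative to the nearest nucleosome dyad (Fig. 3e). The distance computation at its core can be sketched as below. The sketch assumes dyad positions and ChIP-seq binding-site positions on a single chromosome. The bin size is arbitrary, and the hierarchical clustering of per-factor profiles is not shown.

```python
import bisect
from collections import Counter
from typing import Dict, List


def distance_to_nearest_dyad(site: int, dyads: List[int]) -> int:
    """Signed distance (site minus dyad) to the closest nucleosome dyad.
    `dyads` must be sorted and non-empty; ties go to the lower-coordinate dyad."""
    if not dyads:
        raise ValueError("no nucleosome dyads supplied")
    i = bisect.bisect_left(dyads, site)
    candidates = [dyads[j] for j in (i - 1, i) if 0 <= j < len(dyads)]
    nearest = min(candidates, key=lambda d: abs(site - d))
    return site - nearest


def distance_profile(sites: List[int], dyads: List[int],
                     bin_size: int = 20) -> Dict[int, float]:
    """Fraction of binding sites per absolute-distance bin (lower bin edge -> fraction)."""
    if not sites:
        return {}
    ordered = sorted(dyads)
    bins = Counter((abs(distance_to_nearest_dyad(s, ordered)) // bin_size) * bin_size
                   for s in sites)
    return {b: n / len(sites) for b, n in sorted(bins.items())}


# Usage: hypothetical dyad calls and binding sites for two factors.
dyads = [1_000, 1_190, 1_380, 2_500]
print(distance_profile([1_095, 1_285], dyads))  # sites between dyads: prints {80: 1.0}
print(distance_profile([1_010, 1_370], dyads))  # sites on dyads: prints {0: 1.0}
```

The per-factor dictionaries could then be turned into aligned vectors and clustered hierarchically to group factors by spacing behavior. That step is not shown here.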

We averaged ATAC-seq signal over all expected locations of CTCF within the genome and observed a well-stereotyped footprint (Fig. 4b). It sits at the precise location of the CTCF motif, which coincides with the summit of the CTCF ChIP-seq signal. Similar results were obtained for a variety of common TFs (Supplementary Fig. 8). We inferred the CTCF binding probability from ATAC-seq insertion data, motif consensus score and evolutionary conservation, generating a posterior probability of CTCF binding at all loci (Fig. 4c)²⁶. Results using ATAC-seq closely recapitulated ChIP-seq binding data in this cell line and compared favorably to DNase-based factor occupancy inference (Supplementary Fig. 9). This suggests that factor occupancy data can be extracted from these ATAC-seq data, which would allow reconstruction of regulatory networks.

ATAC-seq enables epigenomic analysis on clinical timescales

Because ATAC-seq is rapid, information rich and compatible with small numbers of cells, we reasoned that it may serve as a powerful tool to look at aspects of an individual's epigenome on a timescale that would make clinical application feasible. We applied ATAC-seq to assay the epigenome of a healthy volunteer's T cells obtained via standard serial blood draws. With rapid T-cell enrichment and sample handling protocols (Online Methods), the total time required from blood draw to sequencing was approximately 275 min (Fig. 5a). We performed this procedure on three consecutive days (Fig. 5b). We then examined the ATAC-seq profile at the IL2 locus to show how regulatory information on an individual's epigenome can be obtained. IL-2 is a key cytokine that drives T-cell growth and functions in inflammatory and autoimmune diseases. Different TFs bind putative IL2 enhancers in a context-dependent manner²⁷. One might wish to identify the causal TF pathway in order to rationally target inhibition without exposing the patient to drugs unlikely to serve the therapeutic goal of IL-2 blockade. Furthermore, distinct drugs inhibit the activities of different TFs²⁸–³⁰ (Fig. 5c). ATAC-seq showed that in the proband's T cells, only nuclear factor of activated T cells (NFAT), but not two other drug targets, was engaged at the IL2 locus (Fig. 5c). Thus, ATAC-seq was able to provide clinically relevant information on the regulatory state of this individual.
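The aggregate CTCF footprint (Fig. 4b) averages insertion counts over all motif instances. A minimal sketch is below. It assumes a per-base Tn5 insertion-count track and motif centers annotated with strand. Minus-strand windows are reversed so that offsets are motif-relative. The `footprint_depth` ratio is a hypothetical summary of how deep the central notch is, not a statistic from the paper. It requires the profile to extend beyond the core on both sides.

```python
from typing import List, Tuple


def aggregate_footprint(insertions: List[int], motifs: List[Tuple[int, str]],
                        flank: int) -> List[float]:
    """Mean insertion count at each offset (-flank..+flank) from motif centers,
    reversing minus-strand windows so every window reads 5' to 3' on the motif."""
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for center, strand in motifs:
        lo, hi = center - flank, center + flank + 1
        if lo < 0 or hi > len(insertions):
            continue  # skip sites whose window runs off the track
        window = insertions[lo:hi]
        if strand == "-":
            window = window[::-1]
        for offset, count in enumerate(window):
            totals[offset] += count
        used += 1
    return [t / used for t in totals] if used else totals


def footprint_depth(profile: List[float], core: int) -> float:
    """Mean flanking signal divided by mean signal in the central +/- core bases;
    values well above 1 indicate a protected notch (hypothetical summary)."""
    mid = len(profile) // 2
    center = profile[mid - core: mid + core + 1]
    flanks = profile[:mid - core] + profile[mid + core + 1:]
    return (sum(flanks) / len(flanks)) / max(sum(center) / len(center), 1e-9)


# Usage: a toy track with two protected sites, one on each strand.
track = [5, 5, 1, 0, 1, 5, 5, 5, 5, 1, 0, 1, 5, 5]
profile = aggregate_footprint(track, [(3, "+"), (10, "-")], flank=2)
print(profile)  # prints [5.0, 1.0, 0.0, 1.0, 5.0]
print(footprint_depth(profile, core=1))
```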

Using ATAC-seq footprints, we generated the occupancy profiles of 89 TFs in proband T cells, enabling systematic reconstruction of regulatory networks. With this personalized regulatory map, we compared the genomic distribution of the same 89 TFs between proband CD4+ T cells and GM12878 B cells. TFs that exhibited large variation in distribution between T cells and B cells were enriched for T cell–specific factors (Fig. 5d). This analysis showed differential regulation by NFAT, whereas canonical CTCF occupancy was highly correlated between these two cell types. Supporting this interpretation, we noted specific loci where NFAT was localized to known T cell–specific genes, such as CD28 and the uncharacterized large intergenic noncoding RNA RP11-229C3.2 (Supplementary Fig. 10). Additionally, we applied ATAC-seq to CD4+ and CD8+ T cells and monocytes isolated by FACS from a single blood draw. This created a map of accessible chromatin and demonstrated that ATAC-seq is compatible with cellular enrichment using surface markers (Supplementary Fig. 11). Separately, allele-specific chromatin accessibility has been shown to be particularly relevant to our understanding of human disease³¹. As a proof of principle, we also used ATAC-seq to identify candidate allele-specific open-chromatin regions within the GM12878 cell line (Supplementary Fig. 12). These results demonstrate the feasibility of generating detailed personalized gene regulatory networks from clinical samples, opening the door for future diagnostic applications.

DISCUSSION

ATAC-seq allows simultaneous, genome-wide interrogation of factor occupancy, nucleosome positions in regulatory sites, and chromatin accessibility. It does so by considering the position of insertion and the distribution of insert lengths captured during the transposition reaction. Extant methods such as DNase-seq and MNase-seq can also provide this information, but each requires separate assays with large cell numbers.

Figure 5 | ATAC-seq enables real-time personal epigenomics. (a) Workflow from standard blood draws. (b) Serial ATAC-seq data from proband T cells over 3 d. (c) Application of ATAC-seq data to prioritize candidate TF drug targets. ATAC-seq footprint prediction (green track) is compared to ChIP-seq data (blue track; data from ref. …). Three TFs and their targeting drugs are shown. (d) Cell type–specific regulatory network from proband T cells compared with that of a GM12878 B cell line. Each row or column is the footprint profile of a TF versus that of all other TFs in the same cell type. Color indicates relative similarity (yellow) or distinctiveness (blue) in T versus B cells. The most highly differentially regulated TFs are indicated by a red box.

[Figure 5 image: (a) workflow of blood draw, CD4+ T-cell purification, transposition and amplification, and sequencing; (b) ATAC-seq tracks at the IL2 locus (chr4, hg19) on days 1, 2 and 3; (c) footprint predictions versus ChIP-seq for NFAT (cyclosporin, FK506), IRF4 (lenalidomide) and STAT3 (ruxolitinib); (d) TF footprint correlation matrices (−0.60 to 0.60) for CD4+ T cells and GM12878 B cells.]
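Figure 5d compares how each TF's footprint profile correlates with every other TF's profile, first within each cell type and then between T and B cells. The sketch below is one way to approximate this. It assumes each TF is summarized by a numeric profile vector of the same length in both cell types, which is a hypothetical representation. It builds a Pearson correlation matrix per cell type and scores each TF by the mean absolute change in its correlations. Profiles must not be constant, and at least two TFs must be shared between the cell types.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """TF-by-TF correlation of footprint profiles within one cell type."""
    names = sorted(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


def rewiring_scores(cell_a: Dict[str, List[float]],
                    cell_b: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean absolute change in each shared TF's correlations with the other
    shared TFs between two cell types (a hypothetical summary score)."""
    shared = sorted(set(cell_a) & set(cell_b))
    m_a = correlation_matrix({k: cell_a[k] for k in shared})
    m_b = correlation_matrix({k: cell_b[k] for k in shared})
    scores = {}
    for tf in shared:
        others = [o for o in shared if o != tf]
        scores[tf] = sum(abs(m_a[tf][o] - m_b[tf][o]) for o in others) / len(others)
    return scores


# Usage: toy profiles in which only the NFAT profile differs between cell types.
t_cells = {"CTCF": [1, 2, 3, 4], "RAD21": [1, 2, 3, 4.5], "NFAT": [1, 2, 3, 4]}
b_cells = {"CTCF": [1, 2, 3, 4], "RAD21": [1, 2, 3, 4.5], "NFAT": [4, 3, 2, 1]}
scores = rewiring_scores(t_cells, b_cells)
print(max(scores, key=scores.get))  # prints NFAT
```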

ATAC-seq also provides insert-size fingerprints of biologically relevant genomic regions, which suggests that it captures information on chromatin compaction. We expect ATAC-seq to have broad applicability, to add considerably to the genomics toolkit and to improve our understanding of gene regulation. This should be especially true when it is integrated with other powerful techniques such as FACS or laser-capture microdissection and with recent advancements in rare-cell RNA-seq³²,³³.

One potentially exciting application of ATAC-seq is to generate personal epigenomic profiles on a timescale compatible with clinical decision-making. We envision that ATAC-seq, coupled with ongoing improvements to sequencing and analysis turnaround times, will offer the possibility of a day-long turnaround time for a personal epigenomic map. Deeper analyses to generate global regulatory networks and infer other differences will take additional time. We anticipate that bioinformatic analyses, not the molecular biology or sequencing, will become the bottleneck of epigenomic studies in the future. Because ATAC-seq is compatible with FACS, it may enable studies of fully sorted and rare subpopulations from primary tissues. We expect that ATAC-seq can also be applied to study select cellular subpopulations at different points during development and aging, and during the progression of human diseases, including cancer, autoimmunity and neuropsychiatric disorders.

METHODS

Methods and any associated references are available in the online version of the paper.

Accession codes. NCBI Gene Expression Omnibus: raw data are available at accession number GSE47753.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

ACKNOWLEDGMENTS
We thank members of the Greenleaf and Chang labs for discussion, A. Burnet and the S. Kim lab and the Stanford flow-cytometry core facility for assistance with FACS sorting, A. Schep for modeling Tn5 insertion preference, and V. Risca for graphics. This work was supported by the US National Institutes of Health (H.Y.C., W.J.G. and J.D.B.), including RC4NS073015, U01DK089532 and U19AI057229; the Scleroderma Research Foundation (H.Y.C.); and the California Institute for Regenerative Medicine (H.Y.C.). H.Y.C. acknowledges support as an Early Career Scientist of the Howard Hughes Medical Institute. GM12878 cells were a gift from the Snyder laboratory (Stanford University).

AUTHOR CONTRIBUTIONS
J.D.B., P.G.G. and L.C.Z. performed the research. All authors designed experiments and interpreted the data. H.Y.C. and W.J.G. wrote the paper.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the online version of the paper.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Kornberg, R.D. Chromatin structure: a repeating unit of histones and DNA. Science 184, 868–871 (1974).
2. Kornberg, R.D. & Lorch, Y. Chromatin structure and transcription. Annu. Rev. Cell Biol. 8, 563–587 (1992).
3. Mellor, J. The dynamics of chromatin remodeling at promoters. Mol. Cell 19, 147–157 (2005).
4. Boyle, A.P. et al. High-resolution mapping and characterization of open chromatin across the genome. Cell 132, 311–322 (2008).
5. Thurman, R.E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
6. Schones, D.E. et al. Dynamic regulation of nucleosome positioning in the human genome. Cell 132, 887–898 (2008).
7. Valouev, A. et al. Determinants of nucleosome organization in primary human cells. Nature 474, 516–520 (2011).
8. Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007).
9. Gerstein, M.B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).
10. Goryshin, I.Y. & Reznikoff, W.S. Tn5 in vitro transposition. J. Biol. Chem. 273, 7367–7374 (1998).
11. Adey, A. et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 11, R119 (2010).
12. Gangadharan, S., Mularoni, L., Fain-Thornton, J., Wheelan, S.J. & Craig, N.L. DNA transposon Hermes inserts into DNA in nucleosome-free regions in vivo. Proc. Natl. Acad. Sci. USA 107, 21966–21972 (2010).
13. Song, L. & Crawford, G.E. DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb. Protoc. 2010, pdb.prot5384 (2010).
14. Simon, J.M., Giresi, P.G., Davis, I.J. & Lieb, J.D. Using formaldehyde-assisted isolation of regulatory elements (FAIRE) to isolate active regulatory DNA. Nat. Protoc. 7, 256–267 (2012).

15. The ENCODE Project Consortium. A user's guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol. 9, e1001046 (2011).
16. Giresi, P.G. & Lieb, J.D. Isolation of active regulatory elements from eukaryotic chromatin using FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements). Methods 48, 233–239 (2009).
17. Hoffman, M.M. et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res. 41, 827–841 (2013).
18. Prioleau, M.-N., Nony, P., Simpson, M. & Felsenfeld, G. An insulator element and condensed chromatin region separate the chicken β-globin locus from an independently regulated erythroid-specific folate receptor gene. EMBO J. 18, 4035–4048 (1999).
19. Ghirlando, R., Litt, M.D., Prioleau, M.-N., Recillas-Targa, F. & Felsenfeld, G. Physical properties of a genomic condensed chromatin fragment. J. Mol. Biol. 336, 597–605 (2004).
20. Kornberg, R.D. & Lorch, Y. Chromatin and transcription: where do we go from here. Curr. Opin. Genet. Dev. 12, 249–251 (2002).
21. Zhou, J., Fan, J.Y., Rangasamy, D. & Tremethick, D.J. The nucleosome surface regulates chromatin compaction and couples it with transcriptional repression. Nat. Struct. Mol. Biol. 14, 1070–1076 (2007).
22. Chen, K. et al. DANPOS: dynamic analysis of nucleosome position and occupancy by sequencing. Genome Res. 23, 341–351 (2013).
23. Kundaje, A. et al. Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements. Genome Res. 22, 1735–1747 (2012).
24. Hesselberth, J.R. et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat. Methods 6, 283–289 (2009).
25. Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 489, 83–90 (2012).
26. Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).
27. Fraser, J.D., Irving, B.A., Crabtree, G.R. & Weiss, A. Regulation of interleukin-2 gene enhancer activity by the T cell accessory molecule CD28. Science 251, 313–316 (1991).
28. Flanagan, W.M., Corthésy, B., Bram, R.J. & Crabtree, G.R. Nuclear association of a T-cell transcription factor blocked by FK-506 and cyclosporin A. Nature 352, 803–807 (1991).
29. Lopez-Girona, A. et al. Lenalidomide downregulates the cell survival factor, interferon regulatory factor-4, providing a potential mechanistic link for predicting response. Br. J. Haematol. 154, 325–336 (2011).
30. Verstovsek, S. et al. Safety and efficacy of INCB018424, a JAK1 and JAK2 inhibitor, in myelofibrosis. N. Engl. J. Med. 363, 1117–1127 (2010).
31. Maurano, M.T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
32. Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
33. Shalek, A.K. et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498, 236–240 (2013).
34. Jolma, A. et al. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Res. 20, 861–873 (2010).
35. Boyle, A.P. et al. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 21, 456–464 (2011).

ONLINE METHODS

doi: 10.1038/nmeth.2688

ATAC-seq protocol. The ATAC-seq protocol has three major steps: (1) prepare nuclei, (2) transpose and purify, and (3) PCR. To prepare nuclei, we spun 50,000 cells at 500g for 5 min, which was followed by a wash using 50 μL of 1× cold PBS and centrifugation at 500g for 5 min. Cells were lysed using cold lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2 and 0.1% IGEPAL CA-630). Immediately after lysis, nuclei were spun at 500g for 10 min using a refrigerated centrifuge. To avoid losing cells during the nuclei prep, we used a fixed-angle centrifuge and carefully pipetted away from the pellet after centrifugations. Immediately following the nuclei prep, the pellet was resuspended in the transposase reaction mix (25 μL 2× TD buffer, 2.5 μL transposase (Illumina) and 22.5 μL nuclease-free water). The transposition reaction was carried out for 30 min at 37 °C. Directly following transposition, the sample was purified using a Qiagen MinElute kit. Following purification, we amplified library fragments using 1× NEBNext PCR master mix and 1.25 μM of custom Nextera PCR primers 1 and 2 (Supplementary Table 1), using the following PCR conditions: 72 °C for 5 min; 98 °C for 30 s; and thermocycling at 98 °C for 10 s, 63 °C for 30 s and 72 °C for 1 min. To reduce GC and size bias in our PCR, we monitored the PCR reaction using qPCR in order to stop amplification before saturation. To do this, we amplified the full libraries for five cycles, after which we took an aliquot of the PCR reaction and added 10 μL of the PCR cocktail with SYBR Green at a final concentration of 0.6×. We ran this reaction for 20 cycles to determine the additional number of cycles needed for the remaining 45-μL reaction. The libraries were purified using a Qiagen PCR cleanup kit, yielding a final library concentration of ~30 nM in 20 μL. Libraries were amplified for a total of 10–12 cycles.

Low–cell number protocol. To prepare the 500- and 5,000-cell reactions, we used the same protocol with some notable exceptions: the transposition reaction was done in a 5-μL instead of a 50-μL reaction. Also, we eliminated the Qiagen MinElute purification before PCR and instead took the 5-μL reaction immediately after transposition directly into the 50-μL PCR.

Library QC and quantitation. During the ATAC-seq protocol, we chose to avoid a size-selection step to maximize the library complexity. The sequenced insert size is a distribution between 40 bp and 1 kb with a mean of ~120 bp. From the bioanalyzer gels we observed fragments >2 kb, which would make Qubit and other mass-based quantitation methods hard to interpret. For this reason we quantified our libraries using qPCR-based methods.

CD4+ T cell enrichment from peripheral blood. One green-top tube of whole blood was obtained from one normal volunteer three times over a 72-h period, under a Stanford University IRB-approved protocol. Informed consent was obtained. 5 mL of blood at each time point was negatively selected for CD4+ T cells using RosetteSep Human CD4+ T Cell Enrichment Cocktail (StemCell Technologies). The blood was incubated with the cocktail at 50 μL/mL for 20 min, diluted in an equal volume of PBS with 2% FBS and underlaid with 15 mL Ficoll-Paque Plus (GE). Blood was centrifuged for 20 min at 1,200g without brake, negatively selected cells were removed from the density medium–plasma interface, and cells were washed twice in PBS with 2% FBS.

FACS of peripheral blood leukocytes. GM12878 cells were stained with DAPI (NucBlue Fixed Cell Stain, Molecular Probes), and live cells were sorted using a FACSAria (BD Biosciences) with a 100-μm nozzle. One peripheral blood sample (buffy coat) was stained with BD Bioscience antibodies CD14-A-488 (M5E2, diluted 1:20), CD3-PE-Cy7 (SK7, 1:20), CD4-APC-Cy7 (RPA-T4, 1:20) and CD8 (RPA-T8, 1:20) for 20 min in the dark at RT. Cells were lysed using BD Pharm Lyse at 1:10 dilution in diH2O (BD) for 15 min, centrifuged for 5 min, washed twice with PBS with 2% FBS and resuspended in PBS with 2% FBS. 50,000 CD3+CD8+, CD3+CD4+ and CD14+ cell populations were sorted into PBS with 10% FBS.

Primary data processing. Data were collected using either 34 × 8 × 34 reads from a MiSeq or 50 × 8 × 50 reads on a HiSeq. Reads were aligned to hg19 using Bowtie (ref. 36) with the parameters -X2000 and -m1. These parameters ensured that fragments up to 2 kb were allowed to align (-X2000) and that only uniquely aligning reads were collected (-m1). For all data files, duplicates were removed using Picard. For peak-calling and footprinting, we adjusted the read start sites to represent the center of the transposon binding event. Previous descriptions of the Tn5 transposase show that the transposon binds as a dimer and inserts two adaptors separated by 9 bp (ref. 11). Therefore, all reads aligning to the + strand were offset by +4 bp, and all reads aligning to the – strand were offset by −5 bp.

ATAC-seq peak-calling. We used ZINBA to call all reported ATAC-seq peaks in this manuscript. ZINBA was run using a window size of 300 bp and an offset of 75 bp. Alignability was used to model the zero-inflated component, and the ATAC-seq read count was used for the background and enriched components. Enriched regions were identified as those with a posterior probability of >0.8.

ATAC-seq insertion size enrichment analysis within chromatin annotations. First, the distribution of paired-end sequencing fragment sizes overlapping each chromatin state (http://www.ensembl.org/info/docs/funcgen/regulatory_segmentation.html) was computed. The distributions were then normalized to the percent maximal within each state, and the enrichment was computed relative to the genome-wide set of fragment sizes.

Nucleosome positioning. To generate the nucleosome-position data track, we chose to split reads into various bins. Reads below 100 bp were considered nucleosome free, reads between 180 and 247 bp were considered to be mononucleosomes, reads between 315 and 473 bp were considered to be dinucleosomes, and reads between 558 and 615 bp were considered to be trinucleosomes (for determining cutoffs, see Supplementary Fig. 7). Dinucleosome reads were split into two reads, and trinucleosome reads were split into three reads. Reads were analyzed using Danpos and Dantools (ref. 22) using the parameters -p 1, -a 1, -d 20, -clonalcut 0. The background used was nucleosome-free reads (reads less than 100 bp), allowing an effective negative weighting of these reads.
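A minimal sketch of the fragment-length binning for the nucleosome track, in Python. The length cutoffs are taken from the text; the tuple layout, treating the ranges as inclusive, discarding lengths that fall between bins, and splitting di- and trinucleosome fragments into equal-length pieces are assumptions for illustration, because the text does not say how the split was positioned. Handing the output to Danpos/Dantools is not reproduced here.

```python
from typing import Iterable, Iterator, List, Tuple

Fragment = Tuple[str, int, int]  # (chrom, start, end); end exclusive (assumed layout)

NFR_MAX = 100  # fragments shorter than this are treated as nucleosome free
# (label, min length, max length, pieces); ranges treated as inclusive (assumption)
NUCLEOSOME_BINS = [
    ("mono", 180, 247, 1),
    ("di", 315, 473, 2),
    ("tri", 558, 615, 3),
]


def classify(length: int) -> str:
    """Assign a fragment length to a bin; lengths between bins become 'other'."""
    if length < NFR_MAX:
        return "nfr"
    for label, lo, hi, _ in NUCLEOSOME_BINS:
        if lo <= length <= hi:
            return label
    return "other"


def split_fragment(frag: Fragment, pieces: int) -> List[Fragment]:
    """Split a fragment into `pieces` contiguous, near-equal parts (assumed rule)."""
    chrom, start, end = frag
    step = (end - start) / pieces
    bounds = [start + round(i * step) for i in range(pieces)] + [end]
    return [(chrom, bounds[i], bounds[i + 1]) for i in range(pieces)]


def nucleosome_reads(frags: Iterable[Fragment]) -> Iterator[Tuple[str, Fragment]]:
    """Yield ('background', frag) for nucleosome-free fragments and
    ('nucleosome', piece) for each mono-, di- or trinucleosome piece;
    fragments that fall in no bin are dropped."""
    pieces_for = {label: n for label, _, _, n in NUCLEOSOME_BINS}
    for frag in frags:
        label = classify(frag[2] - frag[1])
        if label == "nfr":
            yield "background", frag
        elif label in pieces_for:
            for piece in split_fragment(frag, pieces_for[label]):
                yield "nucleosome", piece
```

For example, a 400-bp fragment lands in the dinucleosome bin and is emitted as two 200-bp pieces, whereas a 50-bp fragment is routed to the nucleosome-free background.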

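The read-start adjustment described under Primary data processing is a strand-dependent constant offset (+4 bp for + strand reads, −5 bp for – strand reads). A minimal Python sketch, assuming integer positions where `pos` is the 5' end of each read; the `(chrom, pos, strand)` record layout is hypothetical and stands in for whatever alignment format is actually used.

```python
from typing import List, Tuple

PLUS_STRAND_OFFSET = 4    # + strand read starts move +4 bp
MINUS_STRAND_OFFSET = -5  # - strand read starts move -5 bp

Read = Tuple[str, int, str]  # (chrom, 5' position, strand); hypothetical layout


def tn5_shift(pos: int, strand: str) -> int:
    """Return the adjusted start position for a read's 5' end."""
    if strand == "+":
        return pos + PLUS_STRAND_OFFSET
    if strand == "-":
        return pos + MINUS_STRAND_OFFSET
    raise ValueError(f"unknown strand: {strand!r}")


def shift_reads(reads: List[Read]) -> List[Read]:
    """Apply the strand-specific offset to every read."""
    return [(chrom, tn5_shift(pos, strand), strand) for chrom, pos, strand in reads]
```

Because the offsets are constants, the adjustment can be applied to every aligned read in a single pass before peak-calling and footprinting.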
The Danpos analysis allows calling multiple overlapping nucleosomes. Although generating nucleosome tracks using simple insert-size cutoffs may yield false positives due to other nucleosome-sized features, such as enhanceosomes, we observed that we faithfully recapitulated global features of nucleosome positioning genome wide (Fig. 3c).

ChIP-seq peak-calling and clustering. ChIP-seq data were downloaded from the UCSC ENCODE repository; for a complete list of data used, see Supplementary Table 2. Peaks were called using GEM (ref. 37), and the parameters used were -k_min 6 -k_max 20. Inputs were used as a control for peak-calling. Binding events were annotated by distance to the nearest dyad in bins of 10 bp. Factors were then normalized by gene, mean-centered and hierarchically clustered using Euclidean distance (ref. 38).

Footprinting using CENTIPEDE. The genome-wide set of motifs was obtained from the ENCODE motif repository (http://compbio.mit.edu/encode-motifs/). The input for Centipede (ref. 26) included the PWM score, conservation (PhyloP) and ATAC-seq counts within 100 bp of each genomic region matching a motif. ChIP-seq data were obtained from the UCSC ENCODE repository.

Comparison of transcription factor regulatory networks. Transcription factor regulatory networks were constructed by comparing the GENCODE v.14 genes with the genome-wide set of posterior probabilities estimated by Centipede for the respective cell types. The extent to which each gene was regulated by a transcription factor was determined by taking the weighted sum of the posterior probabilities for a given transcription factor mapping to the same chromosome. For each mapped motif, the posterior probability was weighted on the basis of the distance to the transcription start site of each gene. Transcription factor regulatory networks were compared by computing the correlation of each transcription factor in a given cell type with all transcription factors in the other cell type. The resulting correlation matrix was hierarchically clustered using the Pearson correlation coefficient and complete linkage (ref. 38).

Candidate IL2 enhancer analysis. We inspected ENCODE data on the UCSC genome browser to identify putative enhancers in one or more cell types that may be responsive to FDA-approved immunomodulatory drugs. We scanned the intergenic region upstream of IL2 in hg19 for (i) enhancer-associated histone marks (H3K4me1 and H3K27ac), (ii) binding by one or more TFs as confirmed by ChIP-seq and (iii) TF pathways targetable by a human therapeutic. This analysis identified IRF4- and STAT3-binding sites in addition to the known NFAT-responsive elements (ref. 28).

36. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
37. Guo, Y., Mahony, S. & Gifford, D.K. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Comput. Biol. 8, e1002638 (2012).
38. Eisen, M.B., Spellman, P.T., Brown, P.O. & Botstein, D. Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863–14868 (1998).
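The network construction described under "Comparison of transcription factor regulatory networks" can be sketched in Python as below. Following the text, each motif's CENTIPEDE posterior is weighted by its distance to a gene's TSS and summed per TF and gene on the same chromosome, and each TF in one cell type is then correlated with every TF in the other. The weighting function is not given in the text, so the exponential weight exp(−d/DECAY_BP) with a 10-kb DECAY_BP is a hypothetical choice; using Pearson correlation for the cross-cell-type comparison is also an assumption. The final hierarchical clustering (Pearson correlation, complete linkage) is omitted; scipy.cluster.hierarchy.linkage with method='complete' and metric='correlation' is one way to do it.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical length scale for the distance weight (not given in the text).
DECAY_BP = 10_000

MotifHit = Tuple[str, str, int, float]  # (tf, chrom, motif position, CENTIPEDE posterior)
Gene = Tuple[str, str, int]             # (gene id, chrom, TSS position)
Network = Dict[str, Dict[str, float]]   # network[tf][gene] -> score


def distance_weight(distance: int) -> float:
    """Assumed weight: decays exponentially with distance to the TSS."""
    return math.exp(-abs(distance) / DECAY_BP)


def regulatory_scores(hits: List[MotifHit], genes: List[Gene]) -> Network:
    """Sum distance-weighted posteriors of each TF's motif hits per gene,
    considering only hits on the gene's chromosome."""
    network: Network = {}
    for tf, chrom, pos, posterior in hits:
        row = network.setdefault(tf, {})
        for gene, gene_chrom, tss in genes:
            if gene_chrom == chrom:
                row[gene] = row.get(gene, 0.0) + posterior * distance_weight(pos - tss)
    return network


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else float("nan")


def compare_networks(net_a: Network, net_b: Network,
                     genes: List[Gene]) -> Dict[Tuple[str, str], float]:
    """Correlate each TF's gene-score vector in cell type A with every TF in B."""
    gene_ids = [gene for gene, _, _ in genes]

    def vector(net: Network, tf: str) -> List[float]:
        return [net[tf].get(g, 0.0) for g in gene_ids]

    return {
        (tf_a, tf_b): pearson(vector(net_a, tf_a), vector(net_b, tf_b))
        for tf_a in net_a
        for tf_b in net_b
    }
```

Comparing a network with itself is a quick sanity check: each TF correlates perfectly with itself, while TFs that score disjoint sets of genes come out negatively correlated.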