Aire controls gene expression in the thymic epithelium with ordered stochasticity

Matthew Meredith, David Zemmour, Diane Mathis & Christophe Benoist

The transcription factor Aire controls immunological tolerance by inducing the ectopic thymic expression of many tissue-specific genes, acting broadly by removing stops on the transcriptional machinery. To better understand Aire’s specificity, we performed single-cell RNA-seq and DNA-methylation analysis of Aire-sufficient and Aire-deficient medullary epithelial cells (mTECs). Each of Aire’s target genes was induced in only a minority of mTECs, independently of DNA-methylation patterns, as small inter-chromosomal clusters activated in concert in a proportion of mTECs. These microclusters differed between individual mice. Thus, our results suggest an organization of the DNA or of the epigenome that results from stochastic determinism but is ‘bookmarked’ and stable through mTEC divisions, which ensures more effective presentation of self antigens and favors diversity of self-tolerance between individuals.

Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, and Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women’s Hospital, Boston, Massachusetts, USA. Correspondence should be addressed to D.M. ([email protected]) or C.B. ([email protected]).

Received 13 February; accepted 9 July; published online 3 August 2015; doi:10.1038/ni.324

Nature Immunology, advance online publication. © 2015 Nature America, Inc. All rights reserved.

Aire is a fascinating transcription factor, with a unique function in promoting immunological tolerance of differentiating thymocytes. First, it induces ectopic expression in medullary epithelial cells (mTECs) of a large set of genes whose products are typically associated with fully differentiated parenchymal cells (so-called ‘peripheral tissue antigens’ (PTAs)) (ref. 2). In addition, Aire controls the expression of factors that modulate the presentation of peptides derived from these PTAs by major histocompatibility complex (MHC) molecules at the mTEC surface, or their cross-presentation by dendritic cells (ref. 3). These peptides mold the T cell repertoire by inducing negative selection of self-reactive specificities (refs. 4,5) or by promoting positive selection of regulatory T cells (ref. 6). The physiological consequences of Aire’s activity are profound, as humans and mice with loss-of-function mutations in loci encoding Aire develop multi-organ autoimmunity.

Even if its structural domains are shared with conventional motif-specific transcription factors, Aire is a very unusual factor. It affects transcription of a large number of genes and allows the expression of genes that would not generally be expected to be expressed in a given cell type. Aire contains a SAND domain typically involved in DNA binding, but it does not have a distinct DNA-binding motif, although it has been suggested to recognize methylated CpG residues in association with the methylated CpG–binding factor MBD1 (ref. 7). Instead, its transcriptional activity seems to depend on the recognition of nonspecific markers of chromatin with low activity, such as the hypomethylated amino-terminal tail of histone H3 (refs. 8,9) or transcriptional start sites (TSSs) with a surfeit of paused polymerases (ref. 10). Aire also interacts with a variety of non-specific elements of the transcriptional and splicing machinery (refs. 11,12). Indeed, data derived from a variety of experimental approaches indicate that Aire’s main modus operandi is to release pausing of promoter-proximal RNA polymerase II (refs. 10,13).

Aire’s action has an element of stochasticity. Single-cell PCR analysis suggests that individual mTECs, otherwise indistinguishable, express distinct patterns of PTAs (refs. 15–18). Gene-expression profiling of mTECs from individual mice also suggests that inter-individual ‘noise’ in gene expression between genetically identical mice is higher for genes that are targets of Aire than for the bulk of transcripts (ref. 19). Despite such clues, a coherent framework that explains Aire’s action in individual cells has remained elusive.

Single-cell transcriptome sequencing (scRNA-seq) has opened completely new vistas on the analysis of gene expression (ref. 20), by combining the global quality of genome-wide transcriptome profiling with the unique granularity brought by single-cell technologies such as flow cytometry. It can reveal unrecognized subpopulation structure and avoid erroneous averaging, and it can provide information on the fluctuations (‘noise’) in gene expression (refs. 21,22) in an otherwise homogeneous population of cells (refs. 23–25). Some of this noise can result from transcriptional bursting (ref. 26), but it may also reveal coordinated activation of specific transcriptional programs that can be important in determining cellular differentiation or responses. Technical innovations have made scRNA-seq more performant and robust, with molecular ‘barcoding’, cell ‘multiplexing’ and microfluidic devices.

The analysis of scRNA-seq data remains challenging, however. First, with molecular-conversion efficacies of 20% at best, the portion of the transcriptome with low expression is unreliably assessed in any one cell. Second, because real replicates are innately impossible by single-cell analysis, estimation of technical variance remains uncertain. Finally, the data must be interpreted in the context of sampling statistics, which makes analysis less intuitive than conventional profiling data and necessitates complex statistical models (refs. 25,28).

scRNA-seq seemed to provide a good opportunity to explore the distribution of PTA expression in individual mTECs.
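The first challenge listed above can be made concrete with a short calculation. The sketch below assumes, as a simplification that is not taken from the paper, that each mRNA molecule in a cell is converted and detected independently with a fixed probability; the function name and this sampling model are illustrative only.

```python
def dropout_probability(n_molecules, capture_efficiency=0.2):
    """Probability that none of n_molecules copies of a transcript is detected.

    Assumption (not from the paper): every molecule is captured
    independently with the same probability `capture_efficiency`.
    """
    if n_molecules < 0:
        raise ValueError("n_molecules must be non-negative")
    return (1.0 - capture_efficiency) ** n_molecules


# Dropout risk for a rare transcript (1 copy) versus a more abundant one (10 copies)
for copies in (1, 10):
    print(copies, round(dropout_probability(copies), 3))
```

Under this assumption a transcript present at one copy per cell would be missed in about 80% of cells, and one present at ten copies in about 11%, which is why absence of reads for weakly expressed genes needs statistical treatment.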

This perspective, much broader than that achieved earlier by PCR (refs. 15,17), allowed us to investigate how frequently individual genes that are targets of Aire are expressed and whether Aire changes the frequency of mTECs expressing particular transcripts or instead boosts the intensity of transcript expression in cells in which they are already present. Although Aire-induced gene expression proved to be very noisy, affecting genes with a low frequency of expression, we identified unexpected order in this chaos, detecting a large number of Aire-induced transcripts whose expression clustered in small groups of mTECs, with no apparent logic, and varied between individual mice. Our observations have direct implications for the efficiency of induction of tolerance and individual susceptibility to autoimmune deviation.

RESULTS

Range of Aire-induced gene expression

As a prelude to single-cell analysis, we performed RNA-seq analysis of bulk-sorted mTECs. We prepared CD45− Ly51lo MHCIIhi GFPhi cells from mice that express green fluorescent protein (GFP) driven by a bacterial artificial chromosome transgene encoding Aire (ref. 29), which we crossed with mice carrying the standard mutation of the gene encoding Aire to generate Aire-sufficient (called ‘wild-type’ here) and Aire-deficient littermates. In the libraries generated (11.8 × 10^6 to 31.3 × 10^6 mapped reads per sample), we observed a biased and very deep effect of Aire: of the 19,772 genes expressed in these data sets (at a threshold of 1 FPKM (fragments per kilobase of exon per million mapped reads)), 2,995 were ‘Aire-induced’ genes (at an arbitrary change in expression of over twofold) and 766 were ‘Aire-repressed’ genes (Fig. 1a). These results were consistent with published microarray analyses and subsequent RNA-seq data showing that Aire regulates a large fraction of the transcriptome. The sets of Aire-induced genes and ‘Aire-neutral’ genes (neither induced nor repressed by Aire) defined here were those tracked in the scRNA-seq analyses below.

In addition to the consequences on entire transcripts reported above, our RNA-seq data showed that Aire exerted further subtle effects on the use of individual exons within genes. Several transcripts whose overall levels were little affected in mTECs by the absence of Aire showed Aire-dependent inclusion of particular exons (Fig. 1b). A more complete analysis of this phenomenon revealed 3,219 such exons with Aire-dependent expression, in contrast to the majority of exons, whose representation correlated with that of the gene as a whole (175,216 exons; Fig. 1c). Alternative splicing has long been known to affect tolerance, as initially recognized for autoimmune responses to the product of the PTA-encoding gene Plp1. As has also been speculated before, Aire may help maximize exposure to genome-encoded peptides by enhancing exon inclusion, a property consistent with the splicing factors with which it interacts. Conversely, our analysis also revealed the presence of a set of exons whose abundance remained invariant in the presence or absence of Aire, while the whole transcript was induced (Fig. 1c). These exons were particularly prevalent at the beginning of the transcripts (Fig. 1d), consistent with the demonstration that the first exon shows comparatively little change in representation in the absence of Aire, reflective of polymerases that transcribe a short portion of the gene before stalling in Aire’s absence. The match between the degree of induction of genes and exons by Aire increased progressively along the transcript (Fig. 1d), which suggested that this effect might actually extend quite some distance from the TSS.

Figure 1 Aire increases the repertoire and diversity of the mTEC transcriptome. (a) Read counts (as FPKM) versus change in expression (wild-type/Aire-deficient (WT/KO)) of all genes in RNA-seq libraries generated from whole-mTEC RNA of Aire-deficient mice and their wild-type littermates, showing genes upregulated (red; Aire-induced genes) or downregulated (blue; Aire-repressed genes) by twofold or more, or with a change in expression of less than 1.1-fold (between dashed lines; Aire-neutral genes). (b) Read ‘pileups’ (peaks) in exons (black boxes, top) in the Aire-neutral gene Abcb1b (defined as in a) in wild-type and Aire-knockout mTECs, showing an Aire-induced (differentially spliced) exon (yellow). (c) Change in expression (as in a) of exons versus genes for mTEC RNA-seq data, showing exons upregulated more than twofold at the gene level but less than 1.1-fold at the exon level (green) or vice versa (purple); panel labels: 1,352 genes, 3,219 exons; 1,730 genes, 6,965 exons. (d) Distribution of change in expression (as in a) of exons in Aire-induced genes (left) and Aire-neutral genes (right), presented according to relative position within the gene (key); ‘density’ (vertical axis) indicates the number of genes. Data are representative of two experiments with results pooled from two mice per group (a; mean; biological duplicates) or two experiments (b–d).

Overall diversity in transcriptomes of individual mTECs

With the reference data above in hand, we proceeded with analyzing Aire-controlled gene expression in individual mTECs through scRNA-seq. We used index sorting of single cells into wells of microtiter plates, such that we could relate the marker characteristics to the RNA-seq profiles of the cells (Fig. 2a). We generated sequencing libraries from 360 single mTECs from two pairs of wild-type and Aire-deficient mice using a modified protocol from the original CEL-Seq technique (refs. 31–34). This protocol includes oligo(dT) priming with ‘barcodes’ to allow attribution of each sequence read to its cell of origin, as well as unique molecular identifiers for tagging of each original molecule to avoid artifacts from over-amplification of small initial numbers of molecules (ref. 35). Although most single-cell libraries yielded high-quality data, for robustness we restricted our further analysis to 201 cells that generated at least 1 × 10^4 unique mappable reads per cell (Fig. 2b).
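The barcode and unique-molecular-identifier scheme described above can be sketched in a few lines. This is a minimal illustration and not the authors’ pipeline: the read tuples, the function names and the choice to count each (cell, gene, identifier) combination once are assumptions made here; only the threshold of 1 × 10^4 unique reads per cell comes from the text.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Collapse reads into unique-molecule counts per cell and gene.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples (hypothetical
    input format). Reads sharing cell, gene and UMI are treated as
    amplification copies of one original molecule and counted once.
    """
    seen = defaultdict(set)
    for cell, umi, gene in reads:
        seen[(cell, gene)].add(umi)
    counts = defaultdict(dict)
    for (cell, gene), umis in seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)


def filter_cells(counts, min_unique_reads=10_000):
    """Keep cells whose total unique-molecule count reaches the threshold."""
    return {
        cell: genes
        for cell, genes in counts.items()
        if sum(genes.values()) >= min_unique_reads
    }


# Toy data: the duplicated read in cell "c1" is counted only once.
toy_reads = [
    ("c1", "AAAA", "Actb"),
    ("c1", "AAAA", "Actb"),
    ("c1", "CCCC", "Actb"),
    ("c2", "AAAA", "Gapdh"),
]
toy_counts = count_unique_molecules(toy_reads)
kept = filter_cells(toy_counts, min_unique_reads=2)  # tiny threshold for the toy data
```

Counting identifiers rather than raw reads is what protects low-input libraries from over-amplification artifacts; the per-cell total then serves as the quality filter.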

Several findings confirmed our single-cell data. First, there was good representation of the transcripts encoding MHC class II and housekeeping transcripts expected in mTECs (Fig. 2c). Second, the intensity of the Aire-GFP fluorescence, as detected by flow cytometry, matched the counts of transcripts encoding GFP and Aire in each cell (Fig. 2d; the Aire-deficient mutation abolishes function but not the transcript). Third, the total number of reads per gene, obtained by aggregation of all the scRNA-seq data sets, recapitulated well the data at the population level (Fig. 2e).

Figure 2 scRNA-seq analysis of mTECs. (a) Sorting of single mTECs (red) from wild-type mice for scRNA-seq analysis. (b) Quantification of unique mappable reads versus genes detected for each wild-type cell (WT) or Aire-deficient cell (KO) in the scRNA-seq data sets; gray (Excluded) indicates cells omitted from further analysis. (c) Read ‘pileups’ for five illustrative genes (columns) in the scRNA-seq data sets, in 26 representative cells (rows); only the carboxy-terminal exon is detected because the scRNA-seq technique tags only sequences adjacent to poly(A). Top, mRNA: blue (sense) or red (antisense). (d) Correlation between GFP fluorescence intensity during sorting (as in a) versus GFP transcript reads observed in each cell (Pearson r = 0.56); dot size indicates the number of reads from the Aire transcript (wedge). (e) Mean single-cell read counts per gene versus bulk read counts of those same genes, in wild-type cells (Pearson r = 0.72). Data are pooled from two independent experiments with two mice per genotype (a) or two independent experiments (b–e).

We then analyzed computationally the expression of Aire targets in these scRNA-seq data. We determined presence or absence of individual transcripts in each cell (Fig. 3); this revealed each of the following points, which we confirmed and substantiated in Figs. 4–6. First, Aire targeted mainly transcripts expressed at a low frequency (Aire-induced transcripts were more sparse than Aire-neutral transcripts). Second, Aire increased that frequency, which was higher in wild-type cells than in deficient cells. Third, discrete clusters of Aire-induced genes showed coordinated expression. Fourth, in accordance with that point, there were groups of mTECs with comparable expression of small gene clusters. Fifth, mTEC clusters were different in individual mice.

Aire targets mainly transcripts expressed at low frequency

We generated density plots of the frequency of mTECs expressing individual genes for Aire-induced, Aire-neutral or Aire-repressed mRNAs (transcripts matched for expression in bulk RNA-seq). This analysis showed that most Aire-induced genes were active in only 5–20% of the Aire-deficient mTECs sampled (Fig. 4a). Aire-neutral and Aire-repressed genes were more frequently expressed in these mTECs, and the difference was significant across the three expression levels (Fig. 4a). Two points indicated that this low frequency of Aire-induced transcripts was not merely a consequence of statistical sampling, which can be a concern for scRNA-seq. First, the frequency of false-negative results (‘dropouts’) from sampling is directly related to the intensity of expression, and these dropouts would be expected at the same frequency for expression-matched Aire-induced or Aire-neutral transcripts, which was not the case here (Fig. 4a). Second, we plotted the probability that cells with no reads for a given gene were statistical dropouts. Most Aire-induced transcripts had a very low probability of being a false-negative result (68.1%, with a nominal P value of <0.05; Fig. 4b).

Aire mainly increases the intensity of target-gene expression

The higher expression observed in wild-type mTECs than in deficient mTECs for a given Aire-induced gene in bulk population profiling might have resulted from an increase either in the amount of transcript per cell or in the proportion of cells expressing the transcript. When we compared the changes in mean expression intensity in wild-type versus Aire-deficient mTECs positive for a given transcript, as well as the change in the frequency of cells expressing this transcript, we found that Aire seemed to increase both (Fig. 4c). Curiously, we also observed a limited but significant shift between Aire-deficient mTECs and wild-type mTECs for transcripts in the Aire-neutral category (Fig. 4c), which indicated that Aire subtly activated the majority of genes in the cell. A potential confounder of this analysis is that a higher read number per cell leads to more frequent detection of positive cells simply because higher intensities favor lower dropout rates. To test for such bias, we plotted the frequency of cells expressing Aire-induced or Aire-neutral genes versus the mean intensity of expression in mTECs that did express those genes, in wild-type and Aire-deficient mTECs. This analysis showed that the presence of Aire resulted in a predominant shift in the distribution toward higher per-cell intensities, a shift that did not merely follow the main intensity-frequency relationship (Fig. 4d). Indeed, we found that the shift in expression intensity in Aire’s presence led to less increase in the frequency of expression of its targets than predicted from the dropout distribution of gene pairs randomly drawn from the Aire-neutral distribution (Supplementary Fig. 1). Thus, genes that are targets of Aire remained less frequently expressed than the genome-wide norm, even after transcriptional activation by Aire.
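The two quantities compared in this section, the frequency of expressing cells and the mean intensity within expressing cells, can be separated as in the sketch below. It is an illustration under stated assumptions rather than the published analysis: a cell is called positive when its count is above zero, no dropout modeling is applied, and the function names and toy counts are hypothetical.

```python
def frequency_and_intensity(counts_per_cell):
    """Return (fraction of positive cells, mean count within positive cells)."""
    positive = [c for c in counts_per_cell if c > 0]
    frequency = len(positive) / len(counts_per_cell)
    intensity = sum(positive) / len(positive) if positive else 0.0
    return frequency, intensity


def wt_over_ko_changes(wt_counts, ko_counts):
    """WT/KO ratios of frequency and of intensity for one gene.

    Returns None when the gene is undetected in all KO cells, because the
    ratios are then undefined (a choice made for this sketch).
    """
    wt_freq, wt_int = frequency_and_intensity(wt_counts)
    ko_freq, ko_int = frequency_and_intensity(ko_counts)
    if ko_freq == 0:
        return None
    return wt_freq / ko_freq, wt_int / ko_int


# Toy gene: detected in 4 of 8 wild-type cells and 2 of 8 Aire-deficient cells.
wt = [0, 6, 0, 10, 0, 8, 0, 4]
ko = [0, 0, 0, 3, 0, 4, 0, 0]
freq_change, intensity_change = wt_over_ko_changes(wt, ko)
```

Keeping the two ratios separate is what allows an increase in bulk expression to be attributed to more expressing cells, more transcript per expressing cell, or both.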

Figure 3 Summary of single-cell expression results. Presence (black) or absence (white) of individual transcripts in wild-type and Aire-deficient mTECs, for Aire-induced transcripts (left) and a set of Aire-neutral genes matched for mean expression in transcript-positive cells (right) from the scRNA-seq data (Fig. 2); genes are arranged in rows by hierarchical clustering; cells are arranged in columns according to genotype and mouse. The weighted probability of expression of each transcript in each cell was calculated by a published Bayesian approach; black squares indicate transcript presence regardless of intensity, and white squares indicate no expression (most at high confidence of not being dropouts by analysis with the SCDE (‘single-cell differential expression’) software package, as in Fig. 4b). Data are pooled from two independent experiments.

Figure 4 Aire increases the intensity and frequency of otherwise rare transcripts. (a) Distribution of the frequency of expression in single cells (scRNA-seq data of Fig. 2) from Aire-deficient mice (n = 2) for gene sets matched by expression in bulk RNA-seq data as low (1–5 FPKM), medium (10–25 FPKM) or high (50–100 FPKM). *P < 10^-15 (Wilcoxon test). (b) Bayesian confidence (P value) that genes considered unexpressed in a given cell were not sampling dropouts, for expression-matched Aire-induced and Aire-neutral genes (identified as in Fig. 1) in Aire-deficient mice (n = 2); ‘density’ (vertical axis) indicates the number of genes. (c) Change in expression intensity (mean count per gene in cells expressing the gene; wild-type/Aire-deficient) versus change in the frequency of expression (frequency of expressing cells; wild-type/Aire-deficient), calculated for expression-matched transcripts (window of 25–50 counts per gene) of Aire-induced, Aire-neutral or Aire-repressed genes (identified as in Fig. 1) in cells from wild-type and Aire-deficient mice (n = 2 per genotype) (χ² test). (d) Mean counts in transcript-positive cells versus frequency of expression for Aire-induced genes (red) relative to the genome-wide distribution (gray), in cells from wild-type and Aire-deficient mice (n = 2 per genotype); below, expansion of results above to focus on the shift in Aire-induced genes (identified as in Fig. 1). Data are pooled from two independent experiments.

Coordinate expression of discrete clusters of Aire-induced genes

The results presented above (short vertical streaks, Fig. 3) suggested that subsets of genes were expressed in concert. To better investigate such structures in the data, we calculated gene-by-gene correlations for all Aire-induced genes (on the basis of a weighted expression matrix; Fig. 5a) and performed a clustering partition using an affinity-propagation algorithm (ref. 36). We found a high degree of structure in the scRNA-seq data sets from wild-type mTECs, as 51% of Aire-induced transcripts grouped into 19 clusters with an internal mean correlation of >0.75 (Fig. 5a); these clusters were small (33–114 transcripts; median, 57) and were largely distinct from each other.
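The gene-by-gene correlation step described above can be illustrated with plain Pearson correlations across cells. This is a simplified sketch, not the authors’ procedure: it uses unweighted toy values rather than the weighted expression matrix, omits the affinity-propagation clustering, and treats a gene with no variation across cells as uncorrelated, all of which are assumptions made here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant genes are reported as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def gene_correlations(expression):
    """Map each unordered gene pair to its correlation across cells.

    `expression` maps a gene name to its list of per-cell values
    (same cell order for every gene).
    """
    genes = sorted(expression)
    return {
        (g1, g2): pearson(expression[g1], expression[g2])
        for i, g1 in enumerate(genes)
        for g2 in genes[i + 1:]
    }


# Toy matrix with hypothetical gene names: geneA and geneB are co-expressed
# in the same cells, geneC in the complementary cells.
toy_expression = {
    "geneA": [1, 0, 1, 0],
    "geneB": [1, 0, 1, 0],
    "geneC": [0, 1, 0, 1],
}
corr = gene_correlations(toy_expression)
```

Pairs with high correlation are the raw material for the clusters; a clustering algorithm would then group genes whose mutual correlations are consistently high.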

© 2015 Nature America, Inc. All rights reserved. expression when analyzed across the GNF (Genomics Institute across the of GNF (Genomics analyzed when the expression of specificity share members cluster did nor clusters; these of any in by genes or encoded pathway by shared any products function reveal to failed system) classification relationships’) evolutionary through analysis (‘protein PANTHER the or Database Signatures Molecular (by or the pathway analysis Gene-ontology clusters. small form these would that transcripts the between commonalities for Wesearched genes. interchromosomal discrete, of networks mTECsco-expressed ( across correlations on ‘rested’ little contributed events tion to the overall gene clusters, most of which Mup lated activity of local genomic segments, as illustrated for the clusters gene local into partition to clusters. co-expressed these of appearance the for required was that Aire and indicated results of these the significance substantiated ( genes Aire-neutral expression-matched from results calculated we ( mTECs Aire-deficient from sets data scRNA-seq dom permutations ( cluster and sizes were correlations internal not in ran 1,000 achieved not did ( which of structure cluster degree same the reproduce cells), between gene per levels expression shuffling ( permutations 1,000 with experiment * right). and (middle mouse in as (identified transcripts in sets data scRNA-seq wild-type the of clusters largest 23 the in chromosomes) (different interchromosomal or chromosome) same on 1 Mb than more of (distance intrachromosomal chromosome), same the on 1 Mb than less of (distance ( (right). mTECs Aire-deficient and wild-type in clusters of correlation internal ( >0.75). of correlation a mean and more genes 30 with than clusters as defined clusters significant (with iteration each in correlations significant of quantification and correlation within-cluster mean as ( values. 
expression clustering in as (identified genes Aire-induced for matrix 5 Figure nature immunology nature expression gene of compendium Foundation) Research Novartis b a P = 8 × 10

It has been reported that Aire-induced PTA-encoding genes tend tend genes PTA-encoding Aire-induced that reported been has It Genes Genes Genes Genes loci ( Randomized data

Aire coordinates discrete interchromosomal gene networks. ( networks. gene interchromosomal discrete coordinates Aire 3 Supplementary Fig. 3 6 WT −5 , with no preset number of clusters. ( clusters. of number preset no , with (Wilcoxon test); test); (Wilcoxon Gene-gene Gene-gene correlation correlatio 1 0 1 0 c ) Results of 1,000 random permutations and affinity-propagation clustering of the scRNA-seq data for wild-type cells, presented presented cells, wild-type for data scRNA-seq the of clustering affinity-propagation and permutations random 1,000 of ) Results Fig. Fig. 5 n

[Figure 5: heat maps of gene-gene correlation (genes by genes) for real and randomized data; plots of cluster size, cluster mean correlation, mean intracluster correlation and counts of significant clusters; bar plot of gene interactions per cluster (all, local, intrachromosomal, interchromosomal) for each wild-type (WT) mouse.] (a) Gene-by-gene Pearson correlations calculated from the weighted expression […] in all mTECs from wild-type mice (n = 2); genes are ordered according to affinity-propagation […]. (b) Correlations as in a, but in a control data matrix generated by random permutation of gene-[…]. (c) Correlations as in a, but for Aire-deficient mTECs (pooled from […]). (d) […] (n = 2 mice) (left), and size and […] calculated as in […]. (e) Gene-gene correlations between Aire-induced […] for all wild-type mTECs (left), and calculated independently in mTECs from each wild-type […]. (f) Quantification of significant gene-gene correlations defined as local […]. *P = 0.002 and **P = 0.001 (Wilcoxon test). Data are pooled from two independent experiments (a–c, e, f) or are from one […].

… nor did the promoter regions of cluster members show enrichment for binding motifs for a particular transcription factor (data not shown). Therefore, these co-expressed gene clusters seemed to be unrelated in terms of genomic position, biological function or transcriptional regulation.

Aire-induced gene networks define distinct mTEC subgroups
Given the small clusters of Aire-induced genes identified above, we sought to determine how their expression demarcated individual mTECs. Correlation analysis based on probability values for Aire-induced genes showed that wild-type mTECs partitioned into discrete groups (Fig. 6a). These groupings were based on inter-chromosomal gene networks, because the cell-to-cell correlation maps calculated on the basis of transcripts from one chromosome were reproduced, for the most part, when calculated with transcripts from other chromosomes (Fig. 6a, right). For a broader perspective on mTEC heterogeneity in wild-type and Aire-deficient thymi, we analyzed the cell-to-cell correlation matrix with t-SNE (ref. 40), a dimensionality-reduction algorithm that computes the probability that two cells are similar and displays the best fit in two-dimensional space. Aire-deficient mTECs tended to group close together at the center of the t-SNE plot (Fig. 6b). Wild-type mTECs, although distributed around the same center, radiated further (P < 10^−60 (Wilcoxon test)) and were more distant from each other (P < 10^−3 (Wilcoxon test)) than were Aire-deficient mTECs (Fig. 6b).
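The permutation control described for Fig. 5b can be illustrated with a minimal sketch. This is not the authors' code: the toy expression matrix, the correlation cutoff of 0.5 and all function names are assumptions made for illustration, and the published analysis used weighted expression values and affinity-propagation clustering rather than a fixed cutoff. The sketch shuffles each gene's values across cells independently, which preserves how often each gene is expressed but removes any co-expression, and then compares the number of strongly correlated gene pairs in the real and permuted matrices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def gene_gene_correlations(matrix):
    """matrix[g][c] is the value of gene g in cell c; returns {(i, j): r} for i < j."""
    n_genes = len(matrix)
    return {(i, j): pearson(matrix[i], matrix[j])
            for i in range(n_genes) for j in range(i + 1, n_genes)}


def permuted_control(matrix, rng):
    """Shuffle each gene's values across cells independently. Each gene keeps its
    expression frequency, but co-expression between genes is destroyed."""
    control = []
    for row in matrix:
        shuffled = list(row)
        rng.shuffle(shuffled)
        control.append(shuffled)
    return control


def count_strong_pairs(correlations, threshold=0.5):
    """Number of gene pairs whose correlation reaches the (assumed) threshold."""
    return sum(1 for r in correlations.values() if r >= threshold)


# Toy data (hypothetical): 200 cells; genes 0-2 are co-expressed in the same
# 20 cells, genes 3-7 are each expressed in a different set of 20 cells.
N_CELLS = 200
toy = [[1.0 if c < 20 else 0.0 for c in range(N_CELLS)] for _ in range(3)]
toy += [[1.0 if c % 10 == k else 0.0 for c in range(N_CELLS)] for k in range(3, 8)]

real = gene_gene_correlations(toy)
control = gene_gene_correlations(permuted_control(toy, random.Random(0)))
print("strong pairs, real data:", count_strong_pairs(real))
print("strong pairs, permuted control:", count_strong_pairs(control))
```

In this toy case only the three pairs formed by the co-expressed genes survive the cutoff in the real matrix, whereas the permuted matrix retains the same per-gene frequencies without any strongly correlated pair.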

Figure 6 Aire-dependent interchromosomal gene networks generate diverse and distinct mTEC subsets. (a) Cell-by-cell Pearson correlation between individual mTECs (scRNA-seq data of Fig. […]) among mTECs from wild-type mice (top) and Aire-deficient mice (bottom) (n = 2 per genotype per group), calculated for Aire-induced genes on different chromosomes. Clustering was determined by affinity propagation for genes on chromosome (Chr) 1 (left), and the same cell order was applied for correlation values calculated with Aire-induced genes from chromosome 2 or 7 (right). (b) Distribution of mTECs (calculated by t-SNE) from wild-type and Aire-deficient mice (n = 2 per genotype), based on the expression of Aire-induced genes (identified as in Fig. […]). *P < 10^−3 (Wilcoxon test). Data are pooled from two independent experiments.

Predictably, these small t-SNE groups coincided with the cell clusters identified above (Fig. 6). Thus, Aire diversified gene expression, not in a completely random fashion but with some degree of coordination between cells.

Difference in mTEC clusters in individual mice
It was already apparent that the small gene clusters expressed in mTECs were not shared by different mice (Fig. 5). Indeed, when we calculated gene-gene correlations independently from the scRNA-seq mTEC data sets from each wild-type mouse, correlations within a cluster applied for mTECs of only one mouse, but not those of the other mouse (Fig. 5f). Thus, these gene networks were most probably established by stochastic events and not 'hardwired' by molecular cues.

DNA methylation in mTECs does not account for Aire specificity
Epigenetic regulatory mechanisms are likely candidates for two prominent characteristics of Aire transcriptional specificity: the 'predilection' to activate infrequently expressed genes, and the interchromosomal clusters that were coordinately expressed in small groups of mTECs. The methylation of DNA at CpG dinucleotides is one such candidate, as variable but heritable methylation patterns could be at the root of this. Indeed, it has been proposed that Aire associates with the methylated CpG–binding protein MBD1 and uses this factor's ability to 'preferentially' recognize methylation at the motif TCGCA for 'preferential' targeting of PTA-encoding genes (ref. 7). In addition, analysis of DNA methylation by reduced representation bisulfite sequencing (RRBS) is inherently a single-cell methodology that measures the frequency of DNA-methylation marks at specific locations and was thus a good complement to the single-cell analyses reported above.

To determine their DNA-methylation status, we sorted mTECs as described above and processed their DNA for RRBS (ref. 41). The distribution of CpG methylation at various positions did not differ markedly between mTECs from wild-type mice and those from Aire-deficient mice, for Aire-induced and Aire-neutral loci (Fig. 7a). CpG positions upstream were uniformly […] enhancer elements were similarly represented in both sets of genes, and the region surrounding the TSS was unmethylated in both cases (Fig. 7a). This observation held for MBD1 sites in particular, which were uniformly unmethylated in all loci (Supplementary Fig. 4). In fact, the frequency of TCGCA sites was the same in Aire-induced TSSs and Aire-neutral TSSs.

Figure 7 Little or no difference in the amount of CpG methylation at Aire-induced genes versus that at Aire-neutral genes. (a) Frequency of methylated CpG residues ('density') in upstream regions (−1 to −50 kilobases (kb) from the TSS), promoter regions (−100 to −1 kb from the TSS) or intragenic regions (25% or more of gene length beyond the TSS) of Aire-induced or Aire-neutral genes (key), assessed in RRBS methylation libraries from mTECs of Aire-deficient mice (pooled from n = 2 mice). (b) Relationship between mean methylation frequency at intragenic CpGs and frequency of expression, for Aire-induced genes, in Aire-deficient mTECs. (c) Methylation frequency at each CpG position in DNA from wild-type mTECs versus Aire-deficient mTECs (data from a). Data are pooled from two independent experiments.
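The comparison summarized in Fig. 7c (per-CpG methylation in wild-type versus Aire-deficient mTECs) can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' pipeline: the read counts are invented, the function names are hypothetical, and the only rule taken from the paper is the coverage threshold of at least 20 reads per CpG given in the Methods.

```python
import math

MIN_COVERAGE = 20  # minimum reads per CpG, the threshold stated in the Methods


def methylation_percent(calls, min_coverage=MIN_COVERAGE):
    """calls maps a CpG position to (methylated_reads, total_reads).
    Returns {position: percent methylated} for sufficiently covered positions."""
    return {pos: 100.0 * methylated / total
            for pos, (methylated, total) in calls.items()
            if total >= min_coverage}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def compare_genotypes(wt_calls, ko_calls):
    """Correlate per-CpG methylation between two samples over the positions
    that pass the coverage filter in both samples."""
    wt = methylation_percent(wt_calls)
    ko = methylation_percent(ko_calls)
    shared = sorted(set(wt) & set(ko))
    r = pearson([wt[p] for p in shared], [ko[p] for p in shared])
    return shared, r


# Hypothetical read counts, for illustration only.
wt_calls = {("chr1", 100): (0, 40), ("chr1", 250): (38, 40),
            ("chr1", 900): (20, 40), ("chr2", 50): (5, 10)}
ko_calls = {("chr1", 100): (1, 50), ("chr1", 250): (45, 50),
            ("chr1", 900): (30, 50), ("chr2", 50): (30, 30)}

shared, r = compare_genotypes(wt_calls, ko_calls)
print(len(shared), "shared CpGs, Pearson r =", round(r, 3))
```

The position on chr2 is dropped because it has only 10 reads in the wild-type sample. A high correlation over the remaining positions is the kind of result that would indicate methylation profiles are largely independent of genotype.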

Reanalysis of published expression data (ref. 7) showed a limited transcriptional effect of MBD1 on mTECs, which overlapped very little with Aire's transcriptional signature (Supplementary Fig. 4c). This indicated that MBD1 had little or no role in the Aire-dependent expression of PTA-encoding genes.

CpG methylation increased in frequency at intragenic positions, and this trend was slightly less pronounced for Aire-induced loci than for expression-matched Aire-neutral loci (Fig. 7a). We sought to determine whether the frequency of intragenic CpG methylation might relate to the frequency of expression of corresponding Aire-induced genes in wild-type mTECs. The majority of intragenic CpGs were either not methylated or highly methylated, and both of these methylation statuses were associated with a range of expression frequencies (Fig. 7b). Finally, these methylation profiles, including the sites of variable methylation, were not Aire dependent, as shown by the high degree of correlation between wild-type mTECs and Aire-deficient mTECs (Fig. 7c). Therefore, Aire itself did not alter the DNA-methylation profiles in mTECs, and methylation patterns did not provide any obvious clue as to the frequency distribution of the genes that are targets of Aire.

DISCUSSION
Our study has revealed several novel aspects of Aire's function as a transcription factor, which probably have direct consequences on its function in the induction of central tolerance. First, Aire increased the diversity of the mTEC transcriptome by inducing thousands of Aire-dependent transcripts, as well as Aire-dependent exons in otherwise Aire-neutral genes. This observation was consistent with the prediction that ectopic PTA expression involves different splicing patterns from those in the tissues in which PTAs are 'normally' expressed (ref. 33). Alternative splicing is known to have important consequences on autoimmune responses (refs. 31,32), but Aire's role in this process has not been recognized thus far, to our knowledge. It seems likely that Aire's effect on differential exon inclusion is tied to its close interactions with splicing factors of the transcriptional machinery and its 'preferential' effect on spliced transcripts in cultured cells (refs. 11,14). This broad effect on the mTEC transcriptome maximizes the representation of peptides presented to developing thymocytes.

Second, Aire seemed to 'preferentially' target genes expressed in a minority of cells, and increased the intensity of expression of its target genes. This is consistent with the notion that Aire recognizes generic features of gene and chromatin organization, such as histone H3 with no methylation at Lys4 or promoters with a surfeit of paused polymerases (refs. 8,10,13,43). Thus, Aire had no particular specificity for PTAs but instead 'keyed' on these general features of poorly expressed genes.

However, what does an 'infrequently expressed' gene really mean, in terms of the regulation of gene expression, for a rather homogenous primary cell population like the Aire+ mTECs analyzed here? It is a notion far removed from the deterministic gene-expression programs usually envisaged for lineage differentiation but is instead related to notions of 'noisy' gene expression. There are several sources of noise in gene expression that can be important in cellular adaptation or in allowing progression through differentiation processes (refs. 24,44). Infrequent gene expression can correspond to a 'burst' of transcription, whereby any given gene is actively transcribed only a small fraction of the time (ref. 26) and produces relatively short-lived transcripts. Then, a low frequency of cells positive for the expression of a specific gene can simply indicate the odds of 'catching' a cell during such a transcriptional burst. Alternatively, low-frequency expression can result from a particular organization of the DNA or epigenetic modifications, which are set stochastically in every cell but are then stable for some period of time. Aire is found mainly in tight nuclear speckles (ref. 45) thought to be sites of active transcription, and it is possible that the set of genes ectopically expressed by an mTEC are those that have been 'threaded' into these Aire-containing speckles.

Clues to the basis of the low expression frequency of Aire-controlled genes might be found in the small co-regulated clusters of genes whose expression was shared by discrete groups of mTECs. The existence of these discrete microclusters of expression is not easily compatible with a 'burst' model, because it is unlikely that genes would 'burst' at the same time in different cells; instead, the discrete microclusters of expression are compatible with a model in which infrequent expression results from stable organization of the genome or epigenome. Because the genes within these expression clusters did not share sequence motifs or discernable chromosomal locations that might explain their coordinated transcription, their co-expression in a fraction of the mTECs is perhaps most easily interpreted in terms of clonal relationship. mTECs that share PTA clusters could plausibly be daughters of the same epithelial cell progenitor (ref. 46), which would indicate that the selection of Aire targets is 'bookmarked' within an mTEC clone across cell division. 'Bookmarking' (the recovery of gene-expression programs after mitosis; ref. 47) can be explained for conventional transcription by the persistence in mitotic cells of networks of specific transcription factors acting in trans, but it is puzzling for a mode of regulation that does not depend on the transcription factors that normally activate specific PTA-encoding genes (ref. 15). Epigenetic cues such as DNA modifications, albeit probably not methylation, or remanent histone marking might be involved in 'bookmarking' PTA expression. Of note, Brd4, which binds acetylated histones and promotes the release of stalled RNA polymerase II at the promoter, is involved in bookmarking (ref. 47) and is an essential Aire cofactor (H. Yoshida et al., personal communication); it is possible that Aire, together with Brd4 and other cofactors, forms trans-mitotically stable complexes with fixed DNA regions. Thus, one proposal might be that an inherently stochastic mechanism initially selects and marks groups of loci, whose co-expression is then bookmarked and transmissible. This scenario parallels the stochastically determined repertoire of activating and inhibitory receptors in natural killer cells.

In terms of tolerance induction, the low expression frequency of Aire-target genes and the existence of expression microclusters would indicate that mTECs are 'splitting the burden' of PTA expression and that there is a higher local concentration of any gene product than if all PTAs were uniformly expressed in all mTECs. Since immature thymocytes scan the thymic medulla (refs. 48,49), negative selection should be more effective with small pockets of antigen-positive presenting cells than with widespread but lower amounts of PTA expression in mTECs.

Notably, the co-expressed gene clusters were not the same in the two genetically identical wild-type mice whose mTECs we analyzed by scRNA-seq; this has important implications for the inter-individual variation in tolerance within a species. On the basis of microarray profiling data, Aire-induced transcripts have been shown to have significantly greater inter-individual variability than Aire-independent transcripts (ref. 19). Our data have now provided a cellular explanation for this observation. One caveat of our study here, however, is that we cannot formally know if these co-regulated clusters persist and represent constant inter-individual differences, or if they fluctuate and reflect the state of the mTEC pool at the time of cell preparation for scRNA-seq. However, since the expressed clusters of Aire-induced genes were not shared at the time of the experiment, the two mice analyzed exposed their immature thymocytes to slightly different sets of self peptides and thereby generated T cell repertoires with slightly different autoreactivities to peripheral tissues.

Such diversity may be favorable at the level of the species in ensuring a diversity of potential responses to pathogens without uniform 'holes' in the repertoire, albeit at the price of susceptibility to autoimmune diseases.

METHODS
Methods and any associated references are available in the online version of the paper.

Accession codes. GEO: single-cell transcriptomic analysis, GSE7079[…]. SRA: RNA-seq and methylation data, SRR203819[…], SRR203820[…], SRR203821[…].

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

ACKNOWLEDGMENTS
We thank S. Mostafavi for advice on computational analysis; M. Anderson (University of California, San Francisco) for the Aire-GFP line; and K. Hattori, G. Buruzula and K. Waraska for help with mice, sorting and sequencing. Supported by the US National Institutes of Health (DK060027; and a Children's Hospital in Pediatric Gastroenterology training grant for M.M.) and Boehringer Ingelheim Fonds (D.Z.).

AUTHOR CONTRIBUTIONS
M.M., data collection, data analysis and manuscript writing; D.Z., data analysis and manuscript writing; D.M., manuscript writing; C.B., data analysis and manuscript writing.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Peterson, P., Org, T. & Rebane, A. Transcriptional regulation by AIRE: molecular mechanisms of central tolerance. Nat. Rev. Immunol. 8, 948–957 (2008).
2. Anderson, M.S. et al. Projection of an immunological self shadow within the thymus by the aire protein. Science 298, 1395–1401 (2002).
3. Hubert, F.X. et al. Aire regulates the transfer of antigen from mTECs to dendritic cells for induction of thymic tolerance. Blood 118, 2462–2472 (2011).
4. Liston, A. et al. Aire regulates negative selection of organ-specific T cells. Nat. Immunol. 4, 350–354 (2003).
5. Anderson, M.S. et al. The cellular mechanism of Aire control of T cell tolerance. Immunity 23, 227–239 (2005).
6. Malchow, S. et al. Aire-dependent thymic development of tumor-associated regulatory T cells. Science 339, 1219–1224 (2013).
7. Waterfield, M. et al. The transcriptional regulator Aire coopts the repressive ATF7ip-MBD1 complex for the induction of immunotolerance. Nat. Immunol. 15, 258–265 (2014).
8. Org, T. et al. The autoimmune regulator PHD finger binds to non-methylated histone H3K4 to activate gene expression. EMBO Rep. 9, 370–376 (2008).
9. Koh, A.S. et al. Aire employs a histone-binding module to mediate immunological tolerance, linking chromatin regulation with organ-specific autoimmunity. Proc. Natl. Acad. Sci. USA 105, 15878–15883 (2008).
10. Giraud, M. et al. Aire unleashes stalled RNA polymerase to induce ectopic gene expression in thymic epithelial cells. Proc. Natl. Acad. Sci. USA 109, 535–540 (2012).
11. Abramson, J., Giraud, M., Benoist, C. & Mathis, D. Aire's partners in the molecular control of immunological tolerance. Cell 140, 123–135 (2010).
12. Gaetani, M. et al. AIRE-PHD fingers are structural hubs to maintain the integrity of chromatin-associated interactome. Nucleic Acids Res. 40, 11756–11768 (2012).
13. Oven, I. et al. AIRE recruits P-TEFb for transcriptional elongation of target genes in medullary thymic epithelial cells. Mol. Cell. Biol. 27, 8815–8823 (2007).
14. Giraud, M. et al. An RNAi screen for Aire cofactors reveals a role for Hnrnpl in polymerase release and Aire-activated ectopic transcription. Proc. Natl. Acad. Sci. USA 111, 1491–1496 (2014).
15. Villaseñor, J., Besse, W., Benoist, C. & Mathis, D. Ectopic expression of peripheral-tissue antigens in the thymic epithelium: probabilistic, monoallelic, misinitiated. Proc. Natl. Acad. Sci. USA 105, 15854–15859 (2008).
16. Taubert, R., Schwendemann, J. & Kyewski, B. Highly variable expression of tissue-restricted self-antigens in human thymus: implications for self-tolerance and autoimmunity. Eur. J. Immunol. 37, 838–848 (2007).
17. Derbinski, J. et al. Promiscuous gene expression patterns in single medullary thymic epithelial cells argue for a stochastic mechanism. Proc. Natl. Acad. Sci. USA 105, 657–662 (2008).
18. Pinto, S. et al. Overlapping gene coexpression patterns in human medullary thymic epithelial cells generate self-antigen diversity. Proc. Natl. Acad. Sci. USA 110, E3497–E3505 (2013).
19. Venanzi, E.S., Melamed, R., Mathis, D. & Benoist, C. The variable immunological self: genetic variation and nongenetic noise in Aire-regulated transcription. Proc. Natl. Acad. Sci. USA 105, 15860–15865 (2008).
20. Shapiro, E., Biezuner, T. & Linnarsson, S. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat. Rev. Genet. 14, 618–630 (2013).
21. Elowitz, M.B., Levine, A.J., Siggia, E.D. & Swain, P.S. Stochastic gene expression in a single cell. Science 297, 1183–1186 (2002).
22. Ozbudak, E.M. et al. Regulation of noise in the expression of a single gene. Nat. Genet. 31, 69–73 (2002).
23. Kalmar, T. et al. Regulated fluctuations in nanog expression mediate cell fate decisions in embryonic stem cells. PLoS Biol. 7, e1000149 (2009).
24. Shalek, A.K. et al. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature 510, 363–369 (2014).
25. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. 32, 381–386 (2014).
26. Ross, I.L., Browne, C.M. & Hume, D.A. Transcription of individual genes in eukaryotic cells occurs randomly and infrequently. Immunol. Cell Biol. 72, 177–185 (1994).
27. Wu, A.R. et al. Quantitative assessment of single-cell RNA-sequencing methods. Nat. Methods 11, 41–46 (2014).
28. Kharchenko, P.V., Silberstein, L. & Scadden, D.T. Bayesian approach to single-cell differential expression analysis. Nat. Methods 11, 740–742 (2014).
29. Gardner, J.M. et al. Deletional tolerance mediated by extrathymic Aire-expressing cells. Science 321, 843–847 (2008).
30. Sansom, S.N. et al. Population and single-cell genomics reveal the Aire dependency, relief from Polycomb silencing, and distribution of self-antigen expression in thymic epithelia. Genome Res. 24, 1918–1931 (2014).
31. Klein, L. et al. Shaping of the autoreactive T-cell repertoire by a splice variant of self protein expressed in thymic epithelial cells. Nat. Med. 6, 56–61 (2000).
32. Anderson, A.C. et al. High frequency of autoreactive myelin proteolipid protein-specific T cells in the periphery of naive mice: mechanisms of selection of the self-reactive repertoire. J. Exp. Med. 191, 761–770 (2000).
33. Keane, P., Ceredig, R. & Seoighe, C. Promiscuous mRNA splicing under the control of AIRE in medullary thymic epithelial cells. Bioinformatics 31, 986–990 (2015).
34. Hashimshony, T., Wagner, F., Sher, N. & Yanai, I. CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep. 2, 666–673 (2012).
35. Islam, S. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11, 163–166 (2014).
36. Bodenhofer, U., Kothmeier, A. & Hochreiter, S. APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463–2464 (2011).
37. Johnnidis, J.B. et al. Chromosomal clustering of genes controlled by the aire transcription factor. Proc. Natl. Acad. Sci. USA 102, 7233–7238 (2005).
38. Derbinski, J. et al. Promiscuous gene expression in thymic epithelial cells is regulated at multiple levels. J. Exp. Med. 202, 33–45 (2005).
39. Su, A.I. et al. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. USA 99, 4465–4470 (2002).
40. van der Maaten, L. & Hinton, G. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
41. Gu, H. et al. Preparation of reduced representation bisulfite sequencing libraries for genome-scale DNA methylation profiling. Nat. Protoc. 6, 468–481 (2011).
42. Ball, M.P. et al. Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells. Nat. Biotechnol. 27, 361–368 (2009).
43. Org, T. et al. AIRE activated tissue specific genes have histone modifications associated with inactive chromatin. Hum. Mol. Genet. 18, 4699–4710 (2009).
44. Raser, J.M. & O'Shea, E.K. Control of stochasticity in eukaryotic gene expression. Science 304, 1811–1814 (2004).
45. Tao, Y. et al. AIRE recruits multiple transcriptional components to specific genomic regions through tethering to nuclear matrix. Mol. Immunol. 43, 335–345 (2006).
46. Gill, J., Malin, M., Hollander, G.A. & Boyd, R. Generation of a complete thymic microenvironment by MTS24+ thymic epithelial cells. Nat. Immunol. 3, 635–642 (2002).
47. Zhao, R. et al. Gene bookmarking accelerates the kinetics of post-mitotic transcriptional re-activation. Nat. Cell Biol. 13, 1295–1304 (2011).
48. Le Borgne, M. et al. The impact of negative selection on thymocyte migration in the medulla. Nat. Immunol. 10, 823–830 (2009).
49. Merkenschlager, M., Benoist, C. & Mathis, D. Evidence for a single-niche model of positive selection. Proc. Natl. Acad. Sci. USA 91, 11694–11698 (1994).

ONLINE METHODS
doi:10.1038/ni.3247

Mice. All mice were housed and bred under specific-pathogen–free conditions at the Harvard Medical School Center for Animal Resources and Comparative Medicine (Institutional Animal Care and Use Committee protocol 2954). Adig mice (ref. 29), with Aire-driven expression of Igrp-GFP (sequence encoding GFP fused to the gene encoding islet-specific glucose-6-phosphate-related protein), were provided by M. Anderson.

Isolation of thymic epithelial cells. Thymus tissue was dissociated in RPMI medium and was digested for 30 min, with agitation every 10 min, with 0.5 mg/ml collagenase-dispase (Roche) and 0.2 mg/ml DNase (Sigma) in RPMI medium. Following staining with primary antibodies (allophycocyanin-conjugated antibody to (anti-) MHC class II (I-A–I-E) (M5/114.15.2), phycoerythrin-conjugated anti-Ly51 (6C3) and phycoerythrin–indodicarbocyanine (Cy5)–conjugated anti-CD45 (30-F11); all from BioLegend), samples underwent depletion of CD45+ cells by magnetic-activated cell separation with anti-PE beads (Miltenyi). DAPI− CD45− Ly51lo MHCIIhi GFPhi mTECs (5 × 10^4 to 10 × 10^4 per mouse) were sorted on a MoFlo (Cytomation) into Trizol for RNA preparation (for TruSeq library preparation) or into RPMI medium (Gibco) for RRBS library preparation. For scRNA-seq, similar gating, which also included GFPlo mTECs, was used for sorting at a density of one cell per well of a 96-well plate on a FACSAria sorter (BD).

TruSeq library preparation and analysis. Bulk RNA-seq libraries were prepared with TruSeq (Illumina) following the manufacturer's protocol from 5 × 10^4 to 10 × 10^4 sorted mTECs (one mouse) per sample. Sequencing (single-end, 50 bp) was performed on a HiSeq2000 (Illumina), and reads were aligned to the mm10 assembly of the mouse genome with the TopHat2 splice-junction mapper. Duplicated reads were filtered out from further analysis. Normalized counts per transcript (FPKM) were calculated with the Cufflinks suite of tools for analyses of RNA-Seq. Exon expression was calculated with the SeqMonk program for visualization and analysis of mapped sequence data (Babraham Institute).

RRBS library preparation and analysis. RRBS libraries were prepared as described (ref. 41) from 5 × 10^4 to 10 × 10^4 sorted mTECs (one mouse), except that EZ DNA Methylation-Direct (Zymo) was used for bisulfite conversion. Sequencing (single-end, 50–base pair (bp)) was performed on a HiSeq2000. Prior to alignment, reads were trimmed to remove adaptor sequences with the RRBS option of the automated wrapper script TrimGalore! (Babraham Institute). Trimmed reads were aligned to mm10 with the Bismark mapping tool (Babraham Institute), and methylation calls per CpG were calculated using SeqMonk (Babraham Institute). Only those CpG sites covered by at least 20 reads were considered for subsequent analysis. Relating of CpG positions to the closest genes was determined by SeqMonk relative to mm10.

scRNA-seq library construction. Single-cell RNA sequencing libraries were generated with a modified CEL-Seq protocol (ref. 34). First, single cells were index-sorted with a BD FACSAria II in 96-well hard-shell PCR plates (HSP9631; BioRad) filled with 4.4 µl of lysis buffer containing 0.125 µl of RNAseOut (40 U per 1 µl stock; 10777-019; Invitrogen), 0.25 µl of reverse-transcription primer (25 ng per 1 µl stock) and 4 µl of RNAse-free water (AM9932; Ambion). Each primer contained a T7 promoter, the 5′ TruSeq Illumina adaptor, unique molecular 'barcodes' (4–9 bp), a single cell DNA barcode sequence (8–16 bp) and oligo(dT) (24 bp). Each of three wells was filled with a different mixture to process the carrier RNA and into which single cells were not sorted. These wells contained 0.5 µl HeLa total RNA (1 µg/µl; AM7852; Ambion), 0.25 µl of a T7-oligo(dT) primer without the Illumina adaptor or barcodes (initially provided in the MessageAmp II aRNA Amplification Kit; AM1751; Ambion), 0.25 µl of RNAseOut and 3.5 µl of RNAse-free water. After cell sorting, the plates were quickly covered with an aluminum seal (AlumaSeal 96; F96100; Excel Scientific), then were vortexed for 10 s, centrifuged for 1 min at maximum speed (>2250g at 4 °C), frozen on dry ice and kept at −80 °C for up to 3 weeks.

transcribed with SuperScript II (18064-014; Invitrogen). 8.5 8.5 Invitrogen). (18064-014; II SuperScript with transcribed reverse- then was RNA ligated The lid). thermocycler (open °C 22 at h 1 for 1 per U (5 2 Ligase RNA U/ (40 Inhibitor RNAse then 2 and °C), ice, 80 in temperature, immediately lid placed were (thermocycler °C 70 at min 2 for incubated (10 2 buffer, RNA 1 Ligase 3 Enzymatics). (L6070L; truncated 2, Ligase RNA T4 with RNA treated the to ligated then was (RA3) 5 in 14 MinElute RNeasy an Cleanup Kit according to with the manufacturer’s protocol purified (Qiagen) and was eluted then was kinase polynucleotide T4 and for 1 h phosphatase at by with RNA 37 °C. incubation treated followed NEB), 1 0.5 water, free 5 (75 mM stock), 1.6 1.6 stock), mM (75 10.4 Ambion). (AM1354; kit scription 6 in elution with and of 1× volume XP RNA Beads with AMPure similarly performed was tion out in 50 carried was elution before min 15 for dried were Beads magnet. the on still while twice ethanol supernatant was removed carefully and the were beads washed with fresh 70% The magnet. the on min 5 for incubation then temperature, room at min 15 for incubation by followed beads, XP RNAClean Agencourt of volume 0.8× with mixed was pool Each plate). per pools (three cDNA HeLa library carrier with a cDNA together were 30 pooled libraries In single-cell our experiments, different barcodes were pooled into one tube with a Hela carrier cDNA library. beads (A63987; Beckman Coulter). First, single-cell cDNA libraries XP containing RNAClean Agencourt the using performed then were selection size and then incubated for 2.5 h at 16 °C (with lid thermocycler open). were cDNA plates clean-up The plates. the of well each to added was Mix Enzyme Synthesis 1 and Buffer Reaction Synthesis Strand Second 10× 12 containing Transcription Reverse Strand 15 NEBNext). (E6111L; module synthesis strand Second °C). 
50 temperature, lid (thermocycler °C 42 1 per 1 per U (200 ArrayScript 0.5 Invitrogen), 18427-013; stock; 1 containing Ambion) AM2048; Transcriptase; at the 5 65 °C. Then, the RNA at was dephosphorylated the 3 at min 5 for and °C 37 at min 30 for incubation followed pool, aRNA each of 1 per (5U phosphatase 2 containing 5 the The samples were Agilent). as treated then First, pico follows. Kit (5067-1513; of 1 by analysis 10 in twice, water. RNAse-free elution, by followed protocol, manufacturer’s the to ing accord Qiagen) (74204; Kit Cleanup MinElute RNeasy an with cleaned then addition rapid the by stopped was of 2 immediately reaction were the and then ice, °C), onto 105 transferred temperature, lid (thermocycler °C 94 at 16 to added was 2 4 Fragmentation E6150S). RNA (NEBNext Magnesium Module the with fragmented was (aRNA) RNA temperature, lid (thermocycler °C 37 at 70 °C). Illumina libraries were then as constructed h follows. First, the amplified 14 for incubation by followed 0.8 and Mix Enzyme 1.6 µ µ µ In vitro mRNA the with performed was transcription reverse Second-strand The size distribution and quantity of fragmented aRNA was then assessed assessed then was aRNA fragmented of quantity and distribution size The l with a vacuum concentrator (5–7 min at 55 °C). The 3 The °C). at 55 min (5–7 concentrator a vacuum l with l of RNAseOut and 2 2 and RNAseOut of l l of RNAse-free water and 2 2 and water RNAse-free of l µ µ µ µ ′ l of ATP (100 mM stock, ATP Tris buffered; R1441; Thermo Scientific), Scientific), Thermo R1441; ATP Tris stock, buffered; mM ATP of l (100 l of CTP (75 mM stock), 1.6 1.6 stock), mM (75 CTP of l µ end of the aRNA was dephosphorylated by the addition of 4 by addition the of aRNA end dephosphorylated the was l of 10× RNA Fragmentation Stop Solution. The fragmented aRNA was aRNA was fragmented The Solution. Stop RNA l Fragmentation of 10× M stock) was added to 5 5 to added was stock) M µ l l of RNAse-free water. 
The samples were then dried down to a volume of ′ l stock; AM2682; Ambion). The plates were then incubated for 2 h at h 2 for incubated then were plates The Ambion). AM2682; stock; l end by the addition of 30 transcription was then conducted with a MEGAshortscript T7 tran µ µ l of 10× Antarctic Phosphatase Reaction Buffer (M0289; NEB), NEB), (M0289; Buffer Reaction Phosphatase l of Antarctic 10× l of 10× Antarctic Phosphatase Reaction l Buffer, of Reaction 1 10× Phosphatase Antarctic µ µ l of each sample in a BioAnalyzer with an Agilent RNA an with Agilent 6000 l in a of sample BioAnalyzer each µ l of aRNA. Samples were immediately incubated for 2 min min 2 for incubated immediately were Samples aRNA. of l l of RNase-free water. RNase-free of l µ l of UTP (75 mM stock), 1.6 1.6 stock), mM (75 UTP of l µ µ µ l of RNAseOut was added to each 6- each to added was RNAseOut of l µ l of DMSO (D9170; Sigma) and 1 Sigma) l of (D9170; DMSO l stock; M0289; NEB) and 1 and NEB) M0289; l stock; µ µ µ l; Y9240L; Enzymatics) and 1.5 1.5 and Enzymatics) Y9240L; l; µ l of a mixture containing 1 1 containing mixture a of l l of stock) was added. The ligation was performed performed was ligation The added. was stock) of l l of T4 PolyNucleotide Kinase (10 U/ (10 Kinase PolyNucleotide T4 of l l of stock) and 0.25 0.25 and stock) of l µ l of RNAse-free water. A second bead purifica water. l bead of A RNAse-free second µ µ µ µ l of the RNA Fragmentation Buffer (10×), (10×), Buffer Fragmentation RNA the of l µ l l of a mixture containing 21.5 l of the treated RNA. The samples were were samples The RNA. 
treated the of l l of the fragmentation mix, containing containing mix, fragmentation the of l µ l of T7 10× Reaction Buffer, 1.6 1.6 Buffer, Reaction 10× T7 of l l of 10× First Strand Buffer, 0.25 0.25 Buffer, Strand First 10× of l µ µ l of a mix containing 1.6 1.6 containing mix a of l l of a mixture containing 0.5 0.5 containing mixture a of l µ l of RNAse-free water, 2 2 water, RNAse-free of l µ µ l of dNTP mix (10 mM each mM each (10 l mix of dNTP l of RNAse Inhibitor (40 U (40 Inhibitor RNAse of l nature immunology nature µ ′ l of GTP (75 mM stock), stock), mM (75 GTP of l end and phosphorylated µ µ µ l of RNAseOut to 16 to 16 l of RNAseOut l of the Second Strand Strand Second the of l l of 10× truncated T4 T4 truncated 10× of l µ µ ′ Illumina adaptor adaptor Illumina l l of 3 the l of truncated T4 T4 truncated of l µ l of the Second Second the of l µ µ l cDNA pool, pool, cDNA l l of a mixture mixture a of l µ µ µ l l of Antartic l l of RNAse- l; M0201S; M0201S; l; µ µ l l of a mix ′ l of ATP of l adaptor µ l of T7 T7 of l µ µ µ µ l of of l l of of l l of of l l of of l µ - - - l

8.5 µl of a mix containing 2 µl of RNA reverse-transcription primer (RTP primer; Illumina; […] µM stock) and 6.5 µl of RNAse-free water was added to 10 µl of the ligated RNA. The samples were then incubated for 2 min at 70 °C (thermocycler lid temperature, 80 °C) and then immediately placed on ice, and 10.5 µl of a mixture containing 4 µl of 5× First Strand Buffer, 0.5 µl of dNTP (25 mM mix), 2 µl of DTT (100 mM stock), 2 µl of RNAseOut and 2 µl of SuperScript II (200 U per µl of stock) was added. Reverse transcription was performed for 1 h at 50 °C (thermocycler lid temperature, 70 °C), and the library was then amplified with Kapa HotStart ReadyMix (KK2602; Kapa). 71 µl of the following mixture was added to each reverse-transcription reaction: 17 µl of RNAse-free water, 50 µl of Kapa 2× HotStart ReadyMix and 4 µl of P5_Rd1_Primer_F (10 µM stock). To each reaction, 4 µl of a uniquely indexed P7_Rd2_Primer_idxN_R (10 µM stock) was added, and PCR cycles were performed as follows: 95 °C for 3 min; 18 cycles of 20 s at 98 °C, 30 s at 60 °C and 30 s at 72 °C; and 5 min at 72 °C.

The PCR product was then cleaned up and size selected with two rounds of treatment with Agencourt AMPure XP Beads (A63880), as described above, with the following modifications. The first purification used 1× volume of beads and elution was performed in 32 µl of water; the second purification used 1.2× volume and 12 µl of elution water. The size distribution and quantity of the library was assessed by analysis of 1 µl of each sample in a BioAnalyzer with an Agilent High Sensitivity DNA Kit (5067-4626; Agilent). Samples were pooled for sequencing on a MiSeq (‘nano kits’) and HiSeq 2500 (rapid mode). Paired-end 50-bp sequencing was performed with custom primers (100 µM stock in water) as follows: 75 bp for Read 1 (custom_Read_1_seq), 7 bp for index sequencing (custom_i7_seq), and 25 bp for Read 2 (custom_Read_2_seq). Read 1 reads through the transcript sequence. Read 2 reads through the single-cell barcode and unique molecular identifiers.

scRNA-seq data processing. Raw data were processed with custom scripts. Raw reads were first trimmed with the FASTX toolkit, version 0.0.13 (fastx_trimmer -Q 33). Read 2 was trimmed for the extraction of the single-cell barcode (8 bp) and the unique molecular identifiers (4–8 bp), and Read 1 was trimmed to 30 bp to get rid of a potential oligo(dT) sequence. After merging of the different parts (barcode, unique molecular identifiers and transcript sequence), reads were filtered for quality (more than 80% of the sequence had a Sanger Phred+33 quality score of >33) with the command fastq_quality_filter -v -Q 33 -q 20 -p 80. Then the reads were assigned to each single cell through use of the 8-bp barcode and the script tool fastx_barcode_splitter.pl for a maximum of two mismatches. Reads assigned to each single cell were then trimmed again to retrieve the transcript sequence with the command fastx_trimmer.

Mapping was performed using TopHat2 to the mm10 mouse transcriptome and strand information was kept with the following options: tophat -p 2 --library-type fr-firststrand --read-mismatches 5 --read-gap-length 5 --read-edit-dist 5 --no-coverage-search --segment-length 15 --transcriptome-index. Duplicated mapping reads were filtered out with the barcodes as follows. First, duplicated mapped reads were marked via the command picard-tools-1.79/MarkDuplicates.jar. Then, the genomic positions of the duplicated reads were extracted, and for each of these positions, only reads with unique molecular identifiers were then kept. Reads that mapped to multiple positions were filtered out via the flag parameter 256 of the SAMTools software (utilities for manipulating alignments in the sequence alignment map format for storing large nucleotide sequence alignments). Finally, reads were assigned to genes through the use of the htseq-count software and the biomart_mm10_gene.gff transcriptome reference with the following options: -s yes -m intersection-nonempty. The script was modified to assign reads that overlapped in several genes to the one closest to a 3′ end.

Counts were normalized between cells by quantile normalization using the normalize.quantiles function (in the preprocessCore collection of pre-processing functions in software of the R project for statistical computing) to account for differences between cells in read depths. However, inherent sampling biases can occur during analysis of single-cell RNA-seq data that can cause some transcripts, particularly those with low expression, to remain undetected. These undetected events are known as ‘dropouts.’ Therefore, to account for these sampling biases, we calculated a probability that a given transcript was unsampled (versus genuinely unexpressed) with the scde.failure.probability function of the SCDE (‘single-cell differential expression’) package.

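To make the quantile-normalization step concrete, the following is a small pure-Python sketch of the general technique: every cell (column) is forced onto the same distribution, given by the mean of the sorted values at each rank. It is an illustration of the idea only, not a reimplementation of the preprocessCore `normalize.quantiles` function; in particular, ties are broken here by row order, which is an assumption and may differ from that function's tie handling. The function name `quantile_normalize` and the toy matrix are hypothetical.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x cells matrix given as a list of rows."""
    n_genes = len(matrix)
    n_cells = len(matrix[0])
    # Extract each cell's column of counts.
    columns = [[matrix[g][c] for g in range(n_genes)] for c in range(n_cells)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across cells of the value at each rank.
    rank_means = [sum(sc[r] for sc in sorted_cols) / n_cells for r in range(n_genes)]
    result = [[0.0] * n_cells for _ in range(n_genes)]
    for c, col in enumerate(columns):
        # Genes ordered by their count in this cell (stable sort breaks ties by row order).
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            result[g][c] = rank_means[rank]
    return result

counts = [
    [5, 4, 3],
    [2, 1, 4],
    [3, 4, 6],
    [4, 2, 8],
]
normalized = quantile_normalize(counts)
for row in normalized:
    print(row)
```

After normalization every column contains the same set of values, so differences in read depth between cells no longer shift the overall distribution.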
‘Confidence probabilities’ were calculated as 1 − SCDE dropout probability. An event with a confidence P value of less than 0.05 was considered a genuinely unexpressed event. Conversely, events with confidence values of greater than 0.95 were considered significantly confident.

To determine how frequently individual genes were expressed in the mTEC population, we calculated frequency of expression per gene as the number of mTECs expressing a given gene (specifically, if >0 reads were detected for a given transcript in a given cell) divided by the total number of mTECs; wild-type and Aire-deficient frequencies were calculated independently. Similarly, we calculated mean counts from expressing cells to determine the transcriptional output of individual genes when that gene is expressed; therefore, we simply averaged nonzero counts per gene for wild-type mTECs and Aire-deficient mTECs, separately.

Gene-set definitions. Aire-induced and Aire-neutral gene lists used in many of our analyses were defined above (Supplementary Fig. 1). Specifically, Aire-induced genes were those whose expression was at least twofold higher in wild-type mTECs than in Aire-deficient mTECs, at the population level. Aire-neutral genes were defined as those whose expression did not differ more than 1.1-fold in wild-type versus Aire-deficient mTECs.

To control for unrelated effects that could result simply from different levels of transcriptional output from different loci, we used expression-matched gene sets defined by scRNA-seq data in many of our analyses. For this, we selected genes in the expression windows for second-highest maximum read counts (that is, the number of counts per gene that was the second highest among all cells in that group) among our single-cell data. We specifically did not use maximum read counts, to avoid confounding outlier events.

Simulation of joined intensity-frequency distributions. We aimed at testing the change, between Aire-deficient mTECs and wild-type mTECs, in gene-expression frequency versus the change in mean expression for Aire-induced genes. To take into account the higher dropout probability observed for low expressed genes, we derived a null distribution for Aire-neutral genes of the changes in frequency that might result from changes in mean intensity in positive cells for each Aire-induced gene: for each Aire-induced gene (‘Gi’), we randomly sampled 50 random Aire-neutral genes expressed at the same level as Gi in Aire-deficient cells, and another 50 genes expressed at the same level as Gi in wild-type mTECs. We then calculated the average change in frequency for these 50 random pairs, and plotted it against their mean difference in expression (Fig. […]).

Correlation and clustering analyses. Gene-gene and cell-cell Pearson correlations were performed with the ‘cor’ function in R. To identify co-expressed gene networks and highly similar cell subsets, we clustered our expression data by affinity propagation based on Pearson correlations with the ‘corSimMat’ function in the apcluster package in R. We used a row-standardized expression matrix weighted by the confidence of expression (1 − dropout) as in published studies. Specifically, expression levels per gene were standardized among all cells using the scale function in R. For zero-read events in the raw count data, the standardized expression value was multiplied by the expression confidence value (1 − dropout) to correct for dropout biases. Affinity propagation was useful in this case, as it does not require a number of clusters known a priori.

To test the validity of the gene clusters observed in the wild-type data set, we shuffled our real data by randomly redistributing read counts per gene among wild-type cells with the sample function in R per row of the data matrix. We ran apcluster for the wild-type data and shuffled data with 1,000 permutations, storing cluster size and mean correlation per cluster for all permutations, with a custom script.

For cell clusters, affinity propagation with apcluster was used to determine cell groups on the basis of the expression of Aire-induced genes located on chromosome 1. To determine whether the same cell groups were still highly similar on the basis of the expression of Aire-induced genes from other chromosomes, we maintained the same order determined by our initial analysis using genes on chromosome 1 and calculated cell-cell Pearson correlations on the basis of Aire-induced genes from chromosomes 2 and 7.
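The per-gene summaries defined in this section (frequency of expression, mean of nonzero counts, and confidence as 1 − dropout probability) reduce to a few lines of arithmetic. The Python sketch below illustrates them on a toy vector of counts; the function names and the label "intermediate" for events between the two thresholds are assumptions for illustration and do not come from the authors' scripts.

```python
def expression_frequency(counts_per_cell):
    """Fraction of cells in which >0 reads were detected for one gene."""
    expressing = sum(1 for c in counts_per_cell if c > 0)
    return expressing / len(counts_per_cell)

def mean_nonzero(counts_per_cell):
    """Mean count among expressing cells only; None if no cell expresses the gene."""
    nonzero = [c for c in counts_per_cell if c > 0]
    return sum(nonzero) / len(nonzero) if nonzero else None

def confidence(dropout_probability):
    """Confidence probability, defined in the text as 1 - dropout probability."""
    return 1.0 - dropout_probability

def call_event(conf):
    """Apply the 0.05 / 0.95 thresholds from the text.

    The label for values between the thresholds is an assumption.
    """
    if conf < 0.05:
        return "genuinely unexpressed"
    if conf > 0.95:
        return "confident"
    return "intermediate"

wild_type_counts = [0, 4, 0, 8, 0]  # toy counts for one gene across five cells
print(expression_frequency(wild_type_counts))  # 2 of 5 cells express the gene
print(mean_nonzero(wild_type_counts))          # average over the 2 expressing cells
print(call_event(confidence(0.99)))
```

Computing the two summaries separately, as the text does, keeps "how many cells express the gene" apart from "how much each expressing cell produces".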

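The dropout weighting used before the correlation analyses can be sketched as follows: each gene's counts are standardized across cells, and the standardized value of every zero-read event is then multiplied by its confidence (1 − dropout). This Python sketch assumes the same centring and scaling as R's `scale` defaults (mean subtraction and division by the sample standard deviation); the function names and toy values are hypothetical, and genes with constant counts would need to be excluded beforehand because their standard deviation is zero.

```python
from statistics import mean, stdev

def standardize(values):
    """Centre and scale one gene across cells (sample SD, as R's scale() does by default)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

def weight_zero_events(raw_counts, dropout_probs):
    """Standardize a gene, then down-weight its zero-read events by (1 - dropout).

    Nonzero events keep their standardized value unchanged.
    """
    z = standardize(raw_counts)
    return [
        zi * (1.0 - p) if c == 0 else zi
        for zi, c, p in zip(z, raw_counts, dropout_probs)
    ]

raw = [0, 0, 3, 9]               # toy counts for one gene in four cells
dropout = [0.5, 0.0, 0.1, 0.0]   # toy dropout probabilities per cell
weighted = weight_zero_events(raw, dropout)
print(weighted)
```

In the toy example the first cell's zero is half as likely to be a true absence, so its standardized value is halved, while the second cell's zero is fully trusted and left as is.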
We used t-SNE computation to visualize the cell-cell heterogeneity we observed in our wild-type and Aire-deficient mTECs as a simple two-dimensional representation. We calculated t-SNE components on the basis of Pearson correlations of Aire-induced gene confidence probabilities with the t-SNE package (ref. 40).

Gene cluster chromosomal distances. To determine what genomic distances the components of the gene clusters spanned, we matched each gene per cluster with its most highly correlated partner (using the ‘cor’ function in R). On the basis of the TSS positions of the genes, each gene was designated as inter-chromosomal (located on different chromosomes), intrachromosomal (same chromosome but >1 Mb away), or local (same chromosome and <1 Mb away) on the basis of the distance to that gene’s most highly correlated partner.
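The three-way designation described above is a simple rule on chromosome identity and TSS distance, sketched here in Python. The function name and example coordinates are hypothetical; the text defines only ">1 Mb" and "<1 Mb", so assigning a distance of exactly 1 Mb to the intrachromosomal class is an assumption.

```python
ONE_MB = 1_000_000

def classify_partner(chrom_a, tss_a, chrom_b, tss_b):
    """Classify a gene relative to its most highly correlated partner by TSS position."""
    if chrom_a != chrom_b:
        return "inter-chromosomal"
    distance = abs(tss_a - tss_b)
    # Assumption: exactly 1 Mb is treated as intrachromosomal (the text leaves it undefined).
    return "local" if distance < ONE_MB else "intrachromosomal"

print(classify_partner("chr1", 10_000_000, "chr7", 10_000_000))
print(classify_partner("chr1", 10_000_000, "chr1", 15_000_000))
print(classify_partner("chr1", 10_000_000, "chr1", 10_400_000))
```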

50. Krueger, F. & Andrews, S.R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).