© 2015 Nature America, Inc. All rights reserved.

nature immunology  ADVANCE ONLINE PUBLICATION

ARTICLES

Single-cell transcriptome analysis reveals coordinated ectopic gene-expression patterns in medullary thymic epithelial cells

Philip Brennecke, Alejandro Reyes, Sheena Pinto, Kristin Rattay, Michelle Nguyen, Rita Küchler, Wolfgang Huber, Bruno Kyewski & Lars M Steinmetz

Expression of tissue-restricted self antigens (TRAs) in medullary thymic epithelial cells (mTECs) is essential for the induction of self-tolerance and prevents autoimmunity, with each TRA being expressed in only a few mTECs. How this process is regulated in single mTECs and is coordinated at the population level, such that the varied single-cell patterns add up to faithfully represent TRAs, is poorly understood. Here we used single-cell RNA sequencing and obtained evidence of numerous recurring TRA–co-expression patterns, each present in only a subset of mTECs. Co-expressed genes clustered in the genome and showed enhanced chromatin accessibility. Our findings characterize TRA expression in mTECs as a coordinated process that might involve local remodeling of chromatin and thus ensures a comprehensive representation of the immunological self.

1 Department of Genetics, Stanford University, School of Medicine, California, USA. 2 Stanford Genome Technology Center, Stanford University, California, USA. 3 European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany. 4 Division of Developmental Immunology, German Cancer Research Center, Heidelberg, Germany. 5 These authors contributed equally to this work. 6 These authors jointly directed this work. Correspondence should be addressed to L.M.S. ([email protected]), W.H. ([email protected]) or B.K. ([email protected]).

Received 10 February; accepted 8 July; published online 3 August 2015; doi:10.1038/ni.324

Discrimination between self and non-self, including self-tolerance, is a hallmark of the adaptive immune system, and when this subtle distinction fails, various autoimmune diseases have been shown to develop. Self-tolerance of T cells, as imposed in the thymus (i.e., central tolerance), relies on the exhaustive scanning of self antigens by maturing T cells. Distinct types of thymic antigen-presenting cells display a broad range of self antigens in a partly redundant and partly complementary fashion. Among the various thymic antigen-presenting cells, medullary thymic epithelial cells (mTECs) stand out due to their unique ability to ectopically express a wide range of tissue-restricted self antigens (TRAs). In mTECs, TRAs, whose expression outside of the thymus is tightly controlled in space and time, become accessible to developing T cells when they are still most responsive to tolerance imprinting. The induction of self-tolerance operates via two modes, either through the elimination of self-reactive T cells or by cell-fate diversion toward the regulatory T cell lineage. Typically, each TRA is expressed in only 1–3% of mTECs, and thus TRA expression follows a mosaic pattern. Therefore, the availability of self antigens is a potential limiting factor during the induction of self-tolerance.

Many aspects of the complex molecular regulation of thymic TRA expression are poorly understood; the transcriptional regulator Aire, which is responsible for the expression of a large part of the ectopically expressed TRAs in the thymus, represents a notable exception. Aire targets inactive chromatin either directly, by binding to the repressive chromatin mark H3K4me0 (histone H3 not methylated at Lys4) with its PHD1 finger domain, or indirectly, through its binding partners, such as the ATF7ip-MBD1 protein complex or Cdh4. These proteins are thought to recruit Aire to methylated CpG dinucleotides at repressed promoters and polycomb-silenced chromatin, respectively. Upon being recruited to silent chromatin, Aire is believed to promote ectopic expression of TRA-encoding genes by releasing stalled polymerase II from their promoters. Such studies indicate that Aire 'preferentially' targets inactive chromatin, potentially using multiple mechanisms.

However, it remains unclear which underlying rules govern the patterning of thymic TRA expression at the single-cell level, such that the composite of mTECs reliably covers the combined transcriptomes of peripheral tissues. It is also unclear whether each mTEC samples a random set of TRAs or whether there are constraints on the set of TRAs that individual mTECs express. Likewise, it remains elusive how thymic TRA expression is coordinated at the intra- and intercellular levels in time and space, as well as how stable these patterns are throughout the lifetime of an individual mTEC.

Published studies have addressed some of those questions by applying bulk transcriptome analysis, single-cell multiplex PCR and single-cell RNA sequencing (scRNA-seq). Such studies have indicated that single mTECs express genes encoding TRAs of diverse functional categories, which challenges the proposal that thymic TRA expression mimics tissue-specific gene-expression patterns at the single-cell level. However, while multiple studies using single-cell approaches have not discerned TRA–co-expression patterns in single mouse mTECs, a study of human mTECs has provided evidence of the co-regulation of TRAs within single cells.
Identifying the molecular mechanisms that regulate thymic TRA expression in single cells is key to understanding how the diversity of ectopically expressed self antigens, a prerequisite of self-tolerance, is generated in the mTEC compartment. Hence, we applied scRNA-seq to mouse mTECs and studied the single-cell expression profiles of 203 mature (MHCIIhi) mTECs, as well as three mature mTEC subsets selected for their expression of particular TRAs. We focused our study on mature mTECs, as they represent the mTEC subset mainly responsible for inducing self-tolerance in developing T cells by expressing the largest diversity of TRAs. At the same time, they are fully competent antigen-presenting cells with high surface expression of major histocompatibility complex class II (MHCII) and the maturation marker CD80 (B7-1). Using this genome-wide approach, we found that the mature mTEC population at large was composed of cells with numerous distinct co-expression clusters of TRA-encoding genes. Each cluster comprised only a fraction of all genes, and individual clusters were expressed only in a small subset of mTECs. Our findings characterize thymic TRA expression as a highly regulated process that ensures representation of the full diversity of self antigens in the mTEC compartment by assembling a composite population of recurrent and complementary co-expression clusters present in individual cells.

RESULTS

Comprehensive coverage of the immunological self by mTECs

To investigate the extent of heterogeneity and patterning of thymic TRA expression in single mTECs, we performed scRNA-seq on mature MHCIIhi mouse mTECs (called 'mature mTECs' here). We sorted single mature mTECs (PI−CD45−Ly51−EpCAM+MHCIIhi) from pooled thymic tissue (5–20 mice) of 4- to 6-week-old female C57BL/6 mice and generated 211 single-cell cDNA libraries using a modified version of the Smart-seq2 method (refs. 22,23). After implementing a data quality control, we retained 203 cells (96%) for further analysis (Supplementary Fig. 1 and Supplementary Code). For each mTEC, we counted the protein-coding genes and TRA-encoding genes (i.e., a subset of protein-coding genes) whose expression was detected by scRNA-seq. We found that the number of TRA-encoding genes detected within a single cell was proportional to the total number of genes detected (19% ± 3.6% of genes detected were classified as TRA-encoding genes) (Fig. 1a). We did not observe evidence of cell-to-cell variation in the proportion of expressed TRA-encoding genes, as the variation in the number of TRA-encoding genes detected per mTEC could be explained by varying sequencing coverage (Supplementary Fig. 1). Moreover, 95% of the 3,976 previously reported TRA-encoding genes were cumulatively detected in the 203 mature mTECs analyzed (Fig. 1b). In addition, the scRNA-seq assay cumulatively detected expression of 86% of all annotated protein-coding genes in the 203 mature mTECs analyzed (19,619 of 22,740 genes; release 75 of the Ensembl project of genome databases) (Fig. 1b), which indicated that nearly 90% of the protein-coding genome was sampled across a few hundred mature mTECs. These data documented a comprehensive representation of the immunological self in mature mTECs at the population level, as has been suggested before.

Next we used a published method (ref. 25) to identify genes whose expression was highly variable across the 203 single mTECs. This analysis revealed a high degree of heterogeneity in gene expression across mTECs, with 9,689 genes having a biological coefficient of variation larger than 50% (i.e., a squared coefficient of variation larger than 0.25) at a false-discovery rate (FDR) of 10% (Fig. 1c). This set of highly variable genes showed enrichment for TRA-encoding genes compared with the abundance of TRA-encoding genes among all protein-coding genes (odds ratio = 2.2, P < 2.2 × 10^−16 (Fisher's exact test)). More specifically, 26% of the highly variable genes encoded TRAs, while only 14% of the genes not detected as highly variable encoded TRAs. Thus, mature mTECs represented a cell type that was highly heterogeneous at the level of individual cells and yet collectively seemed to reliably express most of the genome.

TRA-encoding genes are generally expressed mosaically

Next we investigated the Aire dependence of TRA expression in single mature mTECs. For this analysis, we integrated our single-cell gene-expression data with the transcriptome atlas of 91 cell types (88 primary cell types and three cell lines) acquired by the FANTOM ('functional annotation of the mammalian genome') consortium and a list of Aire-regulated genes. We found that Aire-dependent genes were expressed in a smaller fraction of mTECs than were Aire-independent genes (Fig. 1d,e). Moreover, we found that genes with tissue-restricted expression patterns in the periphery of the body were expressed at a low frequency in single mTECs, regardless of Aire regulation (Fig. 1f,g). When we considered a set of 912 genes detected in at most 10 of the 91 cell types from the FANTOM data set, 522 genes were Aire dependent and 390 were Aire independent (Fig. 1f,g). Of the 522 Aire-dependent genes, 94% (492) were detected in less than 15% of our single mature mTECs (Fig. 1f). In a similar manner, of the 390 Aire-independent genes, 68% (265) were detected in less than 15% of mTECs (Fig. 1g). These results indicated that genes whose expression in the periphery of the body tends to be restricted to fewer cell types were generally expressed at a low frequency in mature mTECs, with a more pronounced effect for Aire-dependent genes.

Non-random TRA-expression patterns in single mature mTECs

Next we addressed whether TRA expression in single mTECs occurs randomly (i.e., without noticeable gene–co-expression patterns) or instead is governed by rules of gene co-regulation. Because the cell cycle was a potential confounding factor, due to many genes being co-regulated in a cell cycle–dependent manner, we first regressed out cell-cycle variation from the 203 mature mTEC single-cell transcriptomes by the scLVM ('single-cell latent variable model') method. Next we used clustering by the k-medoids algorithm to group highly variable Aire-dependent genes on the basis of their level of expression across cells and assessed the statistical stability of the clustering by resampling (Supplementary Code). We identified 11 stable gene clusters (A–K) that showed patterns of co-expression and one cluster (L) that grouped together genes for which the data provided no evidence of co-expression (Fig. 2a). Most of these co-expression patterns showed high expression in only a fraction of mature mTECs (Fig. 2b). This was consistent with the published identification of three distinct co-expression groups at low cell frequencies in human mTECs. We observed a notable exception for co-expression cluster B, which was present in a larger fraction of cells (Fig. 2b). These results suggested the existence of co-expression patterns in single mTECs and that the regulation of TRA-encoding genes followed discernible patterns in individual mature mTECs.

TRA co-expression regardless of Aire dependence

To further evaluate the concept of co-expression patterns in single mTECs, we chose an independent in silico analytical approach to assess the co-expression of TRA-encoding genes within mature mTECs (203 cells). For this, we selected an Aire-dependent TRA-encoding gene, Tspan8 (encoding tetraspanin-8), which belonged to cluster B (Fig. 2a). We detected Tspan8 expression in 66 of the 203 mature mTECs (~33%) (Fig. 2c).

Next we assessed each of the 9,689 highly variable genes (Fig. 1c) to determine whether they had higher expression in the 66 cells in which we detected Tspan8 mRNA than in the remaining 137 mTECs that lacked Tspan8 expression. Because both Aire-dependent genes and Aire-independent genes are concomitantly upregulated upon differentiation into mature mTECs, we considered both gene sets for this testing. Using this approach, we identified 595 genes as being co-expressed with Tspan8 at an FDR of 10%; we called this the 'Tspan8–co-expressed gene set' (Supplementary Table 1). This gene set consisted of 129 Aire-dependent genes and 466 Aire-independent genes (Supplementary Table 1). Consistent with the k-medoids clustering analysis (Fig. 2a), the 129 Aire-dependent genes showed much more overlap with the genes from cluster B than with genes of the other clusters (odds ratio = 22, P < 2.2 × 10^−16 (Fisher's exact test)).

We then independently confirmed the finding that the genes were indeed co-expressed with Tspan8 by using flow cytometry to sort single mTECs expressing Tspan8 on the cell surface, by a published procedure used for human mTECs. We sequenced single-cell cDNA libraries from 48 Tspan8+ mature mTECs (PI−CD45−CDR1−EpCAM+MHCIIhiTspan8+) (Supplementary Fig. 3). We found that the patterns of co-expression for both Aire-dependent genes and Aire-independent genes were highly concordant between these 48 sorted Tspan8+ mTECs and the 66 unselected mature mTECs in which the expression of Tspan8 mRNA was initially detected (Fig. 3a,b and Supplementary Fig. 4).

Figure 1 Mature mTECs show heterogeneous gene expression at the single-cell level but express a comprehensive set of TRA-encoding genes as a population. (a) Scatterplot of scRNA-seq data quantifying TRA-encoding genes with expression detected versus total genes with expression detected in single mature mTECs (n = 203) isolated from pooled thymic tissue of 4- to 6-week-old C57BL/6 wild type mice, presented as semitransparent symbols to prevent obscuring of data points by 'overplotting'. (b) Cumulative fraction of TRA-encoding genes and protein-coding genes detected by scRNA-seq as being expressed, plotted against an increasing number of mTECs (n = 203). (c) Identification of 9,689 genes with significantly highly variable expression across single mature mTECs (n = 203) by a published method: maroon symbols indicate genes with a biological squared coefficient of variation (SCV) of >0.25 at an FDR of 10%, classified as highly variable; gray symbols indicate all other genes; black symbols indicate external control 'spike-in' RNA; solid black line indicates model fit for technical noise; purple line indicates the biological squared coefficient of variation threshold of 0.25 (i.e., 50% coefficient of variation). (d,e) Aire-dependent genes (d) and Aire-independent genes (e) as a function of the number of mature mTECs (n = 203) in which expression of the genes was detected. (f,g) Quantification of tissues in which expression of individual genes was detected in the FANTOM data set, as a function of the number of mature mTECs (n = 203) in which expression of the gene was detected: each data point represents one Aire-dependent gene (f) or Aire-independent gene (g); maroon horizontal line indicates the threshold value of 10. Data are representative of 203 experiments with one cell in each.

Figure 2 Single mature mTEC transcriptomes reveal numerous low-frequency sets of co-expressed genes. (a) Pairwise Spearman correlation matrix of the expression profiles of 2,174 highly variable Aire-dependent genes (identified as in Fig. 1c) in clusters identified by k-medoids clustering; left margin, 12 gene clusters (A–L). (b) Expression of highly variable Aire-dependent genes across individual mature mTECs (n = 203): row order, as in a; columns indicate individual mature mTECs ordered by Tspan8 expression (low (left) to high (right)). (c) Tspan8 mRNA detected across mature mTECs (n = 203). Data are representative of 203 experiments with one cell in each.
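The grouping of genes by k-medoids clustering summarized in Figure 2 can be illustrated with a minimal sketch. It is not the authors' implementation (which is provided in their Supplementary Code and includes a resampling-based stability assessment that is omitted here). The distance (1 minus the Spearman correlation), the farthest-point initialization, the gene names and the expression values are all assumptions made for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5  # undefined (division by zero) for constant profiles


def k_medoids(dist, k, max_iter=100):
    """Alternate between assigning items to the nearest medoid and updating medoids."""
    n = len(dist)
    medoids = [0]  # assumed initialization: item 0, then farthest-point selection
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(max(candidates, key=lambda i: min(dist[i][m] for m in medoids)))
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster became empty
                updated.append(medoids[c])
                continue
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return labels, medoids


# Hypothetical expression values (rows: genes, columns: six single cells).
expression = {
    "gene_a1": [5, 4, 6, 0, 0, 0],
    "gene_a2": [3, 5, 4, 0, 1, 0],
    "gene_b1": [0, 0, 1, 6, 5, 4],
    "gene_b2": [0, 1, 0, 4, 6, 5],
}
genes = list(expression)
dist = [[0.0 if g == h else 1.0 - spearman(expression[g], expression[h]) for h in genes]
        for g in genes]
labels, medoids = k_medoids(dist, k=2)
clusters = dict(zip(genes, labels))
print(clusters)
```

In this toy example the two genes expressed in the first three cells end up in one cluster and the two genes expressed in the last three cells in the other, which mirrors the notion of a co-expression cluster that is active in only a subset of cells.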

Specifically, 96% of the genes belonging to the Tspan8–co-expressed gene set were also upregulated in the 48 sorted Tspan8+ cells (P < 2.2 × 10^−16 (t-test)).

To further confirm co-expression in mature mTECs for both Aire-dependent genes and Aire-independent genes, we repeated the strategy followed for Tspan8 for two additional TRA-encoding genes. First, we selected the gene encoding the cell-adhesion protein Ceacam1, an Aire-independent TRA-encoding gene detected as being co-expressed with Tspan8 (Supplementary Table 1). As we had done for Tspan8, we screened the 203 mature mTECs for the presence of Ceacam1 transcripts and detected expression of Ceacam1 in 15% of the mature mTECs (31 of the 203 cells). We found 65 genes (23 Aire-dependent genes and 42 Aire-independent genes) that were co-expressed with Ceacam1 with an FDR of 10%; we called this the 'Ceacam1–co-expressed gene set' (Supplementary Table 1). Next we confirmed the co-expression in this gene set by sequencing 30 single mTECs selected by flow cytometry for surface expression of Ceacam1 (PI−CD45−CDR1−EpCAM+MHCIIhiCeacam1+) (Supplementary Fig. 4). Of the 65 genes belonging to the Ceacam1–co-expressed gene set, 92% showed consistent upregulation in the Ceacam1+ mTECs selected by flow cytometry, compared with their expression in the unselected Ceacam1− mTECs (Fig. 3c,d; P = 9.8 × 10^−11 (t-test)).

Both Tspan8 and Ceacam1 were expressed relatively frequently across the mature mTEC population (33% and 15%, respectively). Thus, we also assessed a TRA-encoding gene, Klk5, that was expressed at a more representative frequency and was assigned to cluster D in the k-medoids clustering (Fig. 2a). As we had for Tspan8 and Ceacam1, we defined the 'Klk5–co-expressed gene set' on the basis of the detection of Klk5 transcripts in 13 of the 203 mature mTECs (6.4%) (Supplementary Table 1). The Klk5–co-expressed gene set consisted of 68 genes: 39 Aire-dependent genes and 29 Aire-independent genes (Supplementary Table 1). Consistent with the k-medoids clustering (Fig. 2a), these 39 Aire-dependent genes showed significant enrichment among the genes from cluster D compared with their abundance among the rest of the clusters (odds ratio = 4.7, P = 8.2 × 10^−5 (Fisher's exact test); Supplementary Fig. 5).

We experimentally confirmed the finding that the genes were indeed co-expressed with Klk5 by screening 562 mature mTEC cDNA libraries confirmed to be positive for the housekeeping gene Ubc (encoding ubiquitin C) by quantitative PCR. Of the 562 mTECs, 28 (5.0%) were also positive for Klk5 expression, as determined by quantitative PCR (data not shown).

Figure 3 Confirmation of co-expression in gene sets by independent experimental approaches. (a) Expression of genes in the Tspan8–co-expressed gene set in unselected mTECs (n = 203) and pre-selected Tspan8+ mTECs (n = 48): columns indicate individual cells (ordered by increasing Tspan8 mRNA as measured by scRNA-seq); rows indicate genes co-expressed with Tspan8 (Supplementary Table 1); left margin, Aire-dependent genes. (b) Distribution of changes in expression of the Tspan8–co-expressed gene set (Tspan8) or all other genes in the 48 Tspan8+ mature mTECs selected by flow cytometry versus the 137 unselected mature mTECs for which Tspan8 mRNA was not detected by scRNA-seq (P < 2.2 × 10^−16 (t-test)). (c) Expression of genes in the Ceacam1–co-expressed gene set in unselected mTECs (n = 203) and preselected Ceacam1+ mTECs (n = 30), presented as in a. (d) Distribution of changes in expression of the Ceacam1–co-expressed gene set (Ceacam1) or all other genes in preselected Ceacam1+ mTECs (n = 30) versus unselected Ceacam1− mTECs (n = 172), presented as in b (P = 9.8 × 10^−11 (t-test)). (e) Expression of genes in the Klk5–co-expressed gene set in unselected mTECs (n = 203) and preselected Klk5+ mTECs (with Klk5 mRNA detected by quantitative PCR (qPCR)) (n = 24), presented as in a. (f) Distribution of changes in expression of the Klk5–co-expressed gene set (Klk5) or all other genes in Klk5+ mTECs (n = 24) versus unselected Klk5− mTECs (n = 190), presented as in b (P = 8.2 × 10^−5 (t-test)). Data are representative of 251 experiments (a), 185 experiments (b), 233 experiments (c), 202 experiments (d), 227 experiments (e) or 214 experiments (f) with one cell in each.
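The enrichment statements in this section are reported as odds ratios with Fisher's exact test. The following sketch shows how both quantities can be obtained from a 2×2 contingency table. It is an illustration only: the function names and the toy counts are hypothetical, the test is shown in its one-sided form (the source does not state which alternative was used), and the sample odds ratio computed here may differ from the estimator reported by a given statistics package.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test.

    Probability, under the hypergeometric distribution with all margins fixed,
    of observing a top-left count of at least `a`.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(comb(row1, x) * comb(total - row1, col1 - x)
                     for x in range(a, upper + 1))
    return favourable / comb(total, col1)


# Hypothetical toy table: rows = gene in set / not in set,
# columns = gene in cluster / not in cluster.
print(odds_ratio(3, 1, 1, 3))
print(fisher_exact_greater(3, 1, 1, 3))

# Consistency check with the rounded percentages quoted in the text:
# 26% of highly variable genes versus 14% of the remaining genes encode TRAs.
# Using the percentages as stand-in counts per 100 genes is an assumption.
print(round(odds_ratio(26, 74, 14, 86), 1))  # 2.2
```

The last line reproduces, from the rounded percentages alone, the odds ratio of about 2.2 quoted for the enrichment of TRA-encoding genes among the highly variable genes.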

Next we sequenced the transcriptomes of 24 of the Klk5+ mTECs. In agreement with findings obtained for the 13 unselected mature mTECs in which we detected the expression of Klk5 transcripts, 71% of the genes from this defined co-expressed gene set (Supplementary Table 1) showed a consistent upregulation in the Klk5+ mTECs (Fig. 3e,f; P = 8.2 × 10^−5 (t-test)). Notably, this concordance was particularly pronounced for the genes neighboring Klk5 in the genome (discussed below).

In addition, while we found that the three co-expressed gene sets showed enrichment for TRA-encoding genes (P < 2.2 × 10^−16, P < 2.2 × 10^−16 and P = 1.3 × 10^−4 (Fisher's exact test)), they were not restricted to genes encoding products classified as TRAs (according to the TRA definition used in this study). Thus, we identified patterns of co-expression by initial transcriptome analysis of 203 single unselected mature mTECs and by transcriptome sequencing of subsets of mature mTECs pre-selected on the basis of surface expression of three TRAs of varying population frequency: Tspan8, Ceacam1 and Klk5.

Potential genealogies within mTEC co-expression groups

We found significant overlap of the genes in the Tspan8– and Ceacam1–co-expressed gene sets (odds ratio = 23.5, P = 7 × 10^−15 (Fisher's exact test); Supplementary Table 1). Specifically, 39 genes belonging to the Ceacam1–co-expressed gene set (i.e., 60%) were co-expressed with Tspan8. Despite such substantial overlap, we also identified 27 genes (40% of the Ceacam1–co-expressed gene set) that were co-expressed only with Ceacam1 and 557 (93% of the Tspan8–co-expressed gene set) that were co-expressed only with Tspan8. A model in which single mTECs would sequentially shift through distinct co-expression groups throughout their lifespan has been suggested, which would indicate the existence of overlapping co-expression patterns in mTECs during their transition between distinct groups.

To explore that hypothesis, we visualized the interrelationships of the expression profiles of all single mature mTECs (305 cells: 203 unselected mature mTECs, 48 Tspan8+ mature mTECs selected by flow cytometry, 30 Ceacam1+ mature mTECs selected by flow cytometry, and 24 Klk5+ mature mTECs selected by quantitative PCR) by principal-component analysis of the expression data of all genes co-expressed in the Tspan8– or Ceacam1–co-expressed gene sets (i.e., the union of the two co-expressed gene sets).

Figure 4 Co-expressed gene sets overlap, and corresponding mTECs are organized along a gradient. (a) Principal-component analysis of all mature mTECs sequenced (n = 305: 203 unselected mTECs, 48 Tspan8+ mTECs, 30 Ceacam1+ mTECs and 24 Klk5+ mTECs), based on expression of genes in the union of the Tspan8– and Ceacam1–co-expressed gene sets; dashed vertical line indicates the threshold of 10 along the PC1 projection. (b) Genes (rows) detected as being co-expressed with Tspan8 or Ceacam1 or both (left margin) in preselected mature Ceacam1+ mTECs (n = 30) (columns ordered by PC1); top, expression of Ceacam1 mRNA and Tspan8 mRNA in individual mTECs. Data are representative of 305 experiments (a) or 30 experiments (b) with one cell in each.

[…] with the Tspan8+ cells being separated further than Ceacam1+ cells from the rest of the cells (Fig. 4a). We found that 52% of the Tspan8+ mature mTECs had a PC1 projection (position along the horizontal axis) higher than 10, compared with 27% of the Ceacam1+ mTECs and none of the Klk5+ cells. Only 10% of the unselected cells had a PC1 projection higher than 10. These results suggested that a single gene-expression program was underlying most of the observed cell-to-cell variability of the selected mTECs and that the Tspan8+ mTECs had a more pronounced adoption of genes of this program than did the Ceacam1+ mTECs.

To further expand the findings reported above, we quantified Tspan8 mRNA and Ceacam1 mRNA (from the scRNA-seq analysis) in Tspan8+ and Ceacam1+ mTECs. The expression of Tspan8 correlated with the mean expression of all genes from the union of the Tspan8– and Ceacam1–co-expressed gene sets (Spearman correlation = 0.62; Fig. 4b). The correlation was still present when we considered only the Ceacam1+ mTECs (Spearman correlation = 0.35; Supplementary Fig. 6). Thus, the amount of Tspan8 mRNA in Ceacam1+ mTECs rose together with expression of the Ceacam1–co-expressed genes and with increasing similarity to Tspan8+ mTECs. These data were consistent with the hypothesis that individual mTECs transition from one co-expression group to another.

Clustering of co-expressed genes in the genome
One possible mechanism for the generation of co-expression patterns could be non-random local configurations of chromatin that would allow ectopic expression of neighboring genes regardless of their regulation in peripheral tissues. Ectopic expression of gene clusters in human and mouse mTECs has been reported (refs. 10,12,15,29,30). However, because inference of clustered gene expression from heterogeneous cell populations would be misleading due to averaging of different gene-expression patterns from individual cells, only transcriptome-wide single-cell analysis can adequately address this point. Thus, for each of the 11 co-expression clusters, we calculated the median of the genomic distance between each co-expressed gene and its nearest gene neighbor within the same cluster. For each of the 11 clusters, we constructed a null model that allowed us to estimate the expected median genomic distance between genes given the size of the respective cluster (Supplementary Code). On the basis of these null models, we found that the genes from 8 of the 11 gene clusters were located in significant genomic proximity (FDR of 10%). To visualize these effects, we plotted the localization of each of the 11 gene clusters resulting from the k-medoids clustering in a karyogram (Supplementary Fig. 7).
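The null model described above can be illustrated with a minimal sketch. This is not the authors' Supplementary Code: the function names, the toy coordinates, the treatment of genes that have no same-chromosome neighbor in the set, and the "+1" empirical P value are all assumptions made for illustration. The sketch computes the median nearest-neighbor genomic distance for a gene set and compares it with the same statistic for random gene sets of equal size; correction for multiple testing across clusters (the FDR step) is not shown.

```python
import random
from collections import defaultdict
from statistics import median


def median_nearest_neighbor_distance(genes):
    """Median distance (bp) from each gene to its nearest neighbor in the set.

    `genes` is an iterable of (chromosome, position) pairs. Genes that are
    alone on their chromosome have no neighbor and are skipped (an assumption
    of this sketch). Returns infinity if no gene has a neighbor.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in genes:
        by_chrom[chrom].append(pos)
    distances = []
    for positions in by_chrom.values():
        positions.sort()
        for i, pos in enumerate(positions):
            gaps = []
            if i > 0:
                gaps.append(pos - positions[i - 1])
            if i + 1 < len(positions):
                gaps.append(positions[i + 1] - pos)
            if gaps:
                distances.append(min(gaps))
    return median(distances) if distances else float("inf")


def proximity_p_value(cluster, background, n_perm=1000, seed=0):
    """Empirical P value for a cluster being more compact than random sets.

    Draws `n_perm` random gene sets of the same size as `cluster` from
    `background` and counts how often their median nearest-neighbor distance
    is at least as small as the observed one.
    """
    rng = random.Random(seed)
    observed = median_nearest_neighbor_distance(cluster)
    as_extreme = sum(
        median_nearest_neighbor_distance(rng.sample(background, len(cluster)))
        <= observed
        for _ in range(n_perm)
    )
    return observed, (as_extreme + 1) / (n_perm + 1)


# Hypothetical toy genome: 5 chromosomes with one gene every 1 Mb,
# plus a tight 10-gene cluster on chr1 with 10-kb spacing.
background = [(f"chr{c}", i * 1_000_000) for c in range(1, 6) for i in range(200)]
cluster = [("chr1", 500_000 + j * 10_000) for j in range(10)]
background += cluster

observed, p = proximity_p_value(cluster, background)
print(f"observed median distance: {observed:.0f} bp, empirical P = {p:.4f}")
```

Sampling gene sets of the same size as the cluster matters because the expected nearest-neighbor distance shrinks as a set grows; a single genome-wide threshold would favor large clusters.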

Despite being dispersed across the genome, many genes from the same gene co-expression cluster were located in close genomic proximity to each other (exemplified by co-expression cluster D; Fig. 5a,b). Some of these loci comprised gene families encompassing genes encoding structurally and functionally related products. For example, four genes in cluster D encoding products belonging to 'BPI fold–containing family B' ('bactericidal permeability-increasing protein-like 1') were located consecutively in the genome on chromosome 2 (Supplementary Fig. 8a), while two genes (Gstm2 and Gstm7) encoding products from the 'glutathione S-transferase-µ' family were close neighbors in the genome on chromosome 3 (Supplementary Fig. 8b). Notably, we also identified groups of neighboring genes that were co-expressed but encoded products with no obvious functional relationship (Supplementary Fig. 8c).

The locus encoding kallikrein-related peptidases (Klk) represented a prominent example of a structurally and functionally related family. The locus contains 27 genes encoding products belonging to the kallikrein-related peptidase family, located in close genomic proximity on chromosome 7 (Fig. 5c). Nine of these genes, including Klk5, were assigned to cluster D. Moreover, we explored the gene-expression patterns of the kallikrein genomic locus in our 203 unselected mature mTECs and the 24 Klk5+ mature mTECs selected by quantitative PCR. We found that Klk5 expression served as a proxy for the expression of neighboring genes (Fig. 5d and Supplementary Fig. 9). These results showed that the expression of TRA-encoding genes in mTECs involved co-expressed groups of genes located in close proximity in the genome.

Figure 5 Co-expressed genes cluster in the genome. (a) Karyogram of the genomic localization of genes in co-expressed cluster D. Chr, chromosome; Mb, megabases. (b) Distribution of expected median genomic distance between two genes in the genome (based on 1 × 10^3 permutations selecting random sets of genes of the same set size as gene set D); purple vertical line indicates median distance observed for the 115 co-expressed genes belonging to cluster D, which deviates from the null model (FDR = 10%). (c) Genomic region on chromosome 7 hosting genes encoding peptidases of the kallikrein (Klk) family; purple indicates genes assigned to cluster D (a); (+), plus strand; (–), minus strand. (d) Expression profiles for genes encoding kallikrein peptidases (ordered by genomic position as in c) in single unselected mature mTECs (n = 203) and Klk5+ mature mTECs (n = 24) selected by quantitative PCR (left margin (purple)), presented by decreasing Klk5 expression (top (highest) to bottom (lowest)); black box indicates mTECs for which Klk5 transcripts were detected by scRNA-seq. Data are representative of 203 experiments or 227 experiments (d) with one cell in each.
[Figure 5 axis labels: a, distance (Mb) along chromosomes 1–19 and X; b, expected genomic distance (Mb) versus frequency; c, position on chromosome 7 (Mb); d, cells ordered by Klk5 expression, z-score. Genes labeled in c and d include Ctu1, Klk14, Klk13, Klk12, Klk11, Klk10, Klk9, Klk8, Klk7, Klk6, Klk5, Klk4, Klk15, Klk1b8, Klk1b1, Klk1b9, Klk1b11, Klk1b26, Klk1b27, Klk1b21, Klk1b22, Klk1b16, Klk1b24, Klk1b3, Klk1b4, Klk1b5, Gm10109, 1700028J19Rik and Klk1.]

Promoters of co-expressed genes map to accessible chromatin
To directly assess the chromatin state of co-expressed genes, we assayed genome-wide DNA accessibility by the ATAC-seq method of epigenomic profiling (ref. 31), which is based on the 'preference' of the Tn5 transposase to integrate into un-compacted chromatin and thus allows direct measurement of chromatin accessibility. To obtain a sufficient number of surface TRA–specific mTECs required for this assay, we used human thymic tissue and sorted cells on the basis of two published human co-expressed gene sets: the CEACAM5 and MUC1 gene sets (ref. 12). We performed the ATAC-seq experiments with the respective surface TRA–positive and TRA-negative mTEC fractions. When we accounted for all protein-coding genes, there was no difference between the TRA-negative mTECs and the TRA-positive mTECs in their chromatin accessibility. However, we observed that loci that were co-expressed with the respective TRA (either CEACAM5 or MUC1) were significantly more accessible in the TRA-positive mTECs than in the TRA-negative mTECs (Fig. 6). Thus, gene co-expression in distinct mTEC subsets accompanied enhanced chromatin accessibility at the promoter regions of the respective loci.

Figure 6 Promoters of co-regulated genes show increased chromatin accessibility. (a) Chromatin accessibility for the CEACAM5–co-expressed gene set (288 genes) and all other protein-coding genes in CEACAM5+ mTECs versus CEACAM5− mTECs (n = 3 donors), assayed by bulk ATAC-seq and presented as moderated logarithmic 'fold' changes calculated by the DESeq2 method (ref. 44) (P = 1.2 × 10^−15 (t-test)). (b) Chromatin accessibility for the MUC1–co-expressed gene set (219 genes) in MUC1+ mTECs versus MUC1− mTECs, presented as in a (P = 1.1 × 10^−14 (t-test)). Data are representative of three experiments with one donor in each.

DISCUSSION
TRA expression in mTECs is essential for the induction of self-tolerance. However, its molecular regulation remains poorly understood. One open question relates to the regulation of TRA expression in single mTECs; i.e., to what extent the process is random or follows rules. Here, we applied scRNA-seq (refs. 22,23,32–36) and obtained evidence of numerous recurring co-expression patterns in mature mTECs. These patterns generally occurred at low cell frequencies. Co-expressed genes clustered in the genome, and their promoters displayed enhanced chromatin accessibility. Co-expressed gene sets formed mosaic patterns that faithfully 'added up' at the population level to present a comprehensive set of TRAs.

Mosaic gene-expression patterns in the thymus have been reported (refs. 10–12), and they allow a considerable diversity of antigens to be presented at the population level while limiting the number of TRA-encoding genes expressed in individual mTECs. As mTECs have a limited capacity for antigen presentation, restricting the number of ectopically expressed genes per cell seems to be crucial to ensuring epitope presentation at sufficient density to transmit a tolerogenic signal to maturing T cells.

It has been proposed that mosaic expression patterns arise by random induction of TRA-encoding genes in single mTECs (refs. 10,19). However, this model has been challenged by the discovery that subsets of human mTECs selected by flow cytometry for the expression of particular TRAs display differential gene-expression patterns (ref. 12). However, the preselected mTEC subsets analyzed previously represent only a narrow subset of the mTEC population, because those studies were constrained by the availability of antibodies suitable for flow cytometry. The data we have provided here substantially advance those findings, because the single-cell approach we used here addressed the issue of co-expression in a genome-wide unbiased way (i.e., no pre-selection required).

The current depth of analysis allowed us to identify 11 previously unknown co-expression patterns within the mature mTEC population. As the number of mature mTECs we sequenced was limited (203 cells), we expect this number to be an underestimate. Nevertheless, even this relatively small number of mTECs covered 95% of the reported TRA-encoding genes. Given the size of the mouse mTEC compartment (~1 × 10^5 cells), this finding indicates that the complete TRA repertoire would be covered multiple times within the thymic medulla, even with allowance for a generous error margin in our calculations. Hence, T cells would only have to scan sub-domains of this compartment for efficient induction of self-tolerance.

Moreover, by 'zooming in' on the co-expression groups identified, we observed a positive correlation between Tspan8 transcript levels and increased expression of genes co-expressed with Ceacam1 in both Tspan8+ cells and Ceacam1+ cells. This finding would be in line with the transitioning of individual cells between different co-expression groups, a concept that has been proposed in a model that postulates that individual mTECs transit between different TRA co-expression patterns and thus might express a sizeable portion of the TRA repertoire during their lifetime (ref. 12). Such a mechanism could further reduce the minimal number of mTECs any single T cell would need to interact with to encounter the full TRA repertoire, because a given mTEC could express different TRAs when re-encountering the same T cell during its sojourn in the medulla.

We were able to assign 71% of the TRA-encoding genes to a co-expressed gene set on the basis of 203 single mature mTECs. The remaining TRA-encoding genes either escaped detection of co-expression due to the limited sample size or represent some random sampling. In addition, the extent to which features such as mono-allelic expression versus bi-allelic expression, slippage of promoter usage resulting in truncated mRNA isoforms, and variable splicing patterns serve a role is unclear. Those last features might extend the diversity of thymic presentation of self antigens; at the same time, they might represent pitfalls of thymic TRA expression that potentially undermine the process of tolerance induction and lead to autoimmunity.

Our single-cell data showed that co-expressed genes tended to cluster in the genome. In conjunction with our ATAC-seq experiments, this suggests a potential mechanism for the generation of intra- and inter-chromosomal co-expression patterns. Such a mechanism would rely on local chromatin remodeling that allows neighboring genes to be co-expressed in a coordinated fashion in single mTECs, regardless of their distinct tissue-specific regulation in the periphery. Although the definition of TRAs is operational and is highly dependent on the thresholds used, our observation that co-expressed gene sets also contain genes that did not encode TRAs might also indicate that TRA expression promotes the expression of other genes adjacent to TRA-encoding genes. However, co-expressed gene sets showed enrichment for TRA-encoding genes, which would suggest that the mechanism underlying co-expression patterns in mTECs targets mainly genes whose expression in the periphery of the body is restricted to a small number of tissues.

Chromatin remodeling can affect nearby genes on the same chromosome but also genes nearby in the three-dimensional architecture of the nucleus. Correlation between gene co-expression and co-localization in 'transcription factories' has been described for lineage-specific gene regulation (ref. 40), and this might also be the case for thymic TRA expression. The finding that co-expressed gene clusters contained genes encoding products of unrelated biological function further supports our proposition that genomic position influences thymic TRA expression.

Epigenetic signatures specifying such 'accessible' chromatin stretches in mTECs have not yet been investigated genome wide. However, a study focusing on the casein locus in mouse mTECs has shown that ectopic expression of the gene encoding casein-β correlates with marks of active transcription (ref. 41). Thus, future studies should identify the molecular pathways that target co-expressed gene clusters and, moreover, define the transcriptional regulators that promote this process. In this context, spatially localized activation of gene expression by epigenetic remodeling, as proposed here for TRA expression in mTECs, has been reported for embryonic stem cells and cancer cells (refs. 42,43).

Why mTEC-mediated tolerance induction, which presumably evolved early in vertebrates, uses coordinated co-expression patterns in single cells remains an open question. If cells were to coordinate their expression programs with each other (for example, avoid expressing the same genes and thus ensure maximal coverage), then co-expression groups might provide a more economic means than a fully independent, cell-autonomous 'choice' of every single gene.

METHODS
Methods and any associated references are available in the online version of the paper.

Accession codes. ArrayExpress: sequencing data, E-MTAB-3346 and E-MTAB-3624.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

ACKNOWLEDGMENTS
We thank K. Hexel and S. Schmitt for single-cell sorting; S. Egle for technical help; C. Sebening and T. Loukanov (University of Heidelberg) for human thymic tissue; The Genomics Core Facility of the European Molecular Biology Laboratory for initial sequencing, and M. Miranda and E. Hopmans for support during subsequent sequencing at the Stanford Genome Technology Center; J. Buenrostro and C. Chabbert for discussions about ATAC-seq experiments and data, respectively; C. Michel and S. Anders for advice and comments on the manuscript; W. Wei and M. Sikora for help with data transfer; and The Central Animal Facility (German Cancer Research Center) for animal care. Supported by the European Union 7th Framework Programme (Health) via Project Radiant (W.H. and A.R.), The Helmholtz Center (K.R.), the Sonderforschungsbereich (DFG 938 to S.P.), the European Research Council (ERC-2012-AdG to B.K.) and the US National Institutes of Health (P01 HG000205 and R01 GM068717 to P.B., M.N. and L.M.S.).

AUTHOR CONTRIBUTIONS
P.B., S.P., B.K. and L.M.S. conceived of the project; P.B., S.P. and K.R. designed experiments; P.B. performed single-cell sequencing experiments, quantitative PCR confirmation experiments and ATAC-seq experiments; S.P. and K.R. performed experimental mTEC preparations and flow cytometry of single and bulk mTECs; S.P. helped with the ATAC-seq experiments; A.R. and W.H. designed analysis strategy and analyzed the data; A.R. prepared the figures; P.B., S.P., A.R., K.R., W.H., B.K. and L.M.S. interpreted the data and wrote the manuscript; M.N. and R.K. provided technical assistance; and L.M.S., B.K. and W.H. supervised the project.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Anderson, M.S. et al. Projection of an immunological self shadow within the thymus by the aire protein. Science 298, 1395–1401 (2002).
2. DeVoss, J.J. & Anderson, M.S. Lessons on immune tolerance from the monogenic disease APS1. Curr. Opin. Genet. Dev. 17, 193–200 (2007).
3. Hogquist, K.A., Baldwin, T.A. & Jameson, S.C. Central tolerance: learning self-control in the thymus. Nat. Rev. Immunol. 5, 772–782 (2005).
4. Klein, L., Kyewski, B., Allen, P.M. & Hogquist, K.A. Positive and negative selection of the T cell repertoire: what thymocytes see (and don't see). Nat. Rev. Immunol. 14, 377–391 (2014).
5. Derbinski, J., Schulte, A., Kyewski, B. & Klein, L. Promiscuous gene expression in medullary thymic epithelial cells mirrors the peripheral self. Nat. Immunol. 2, 1032–1039 (2001).
6. Kyewski, B. & Klein, L. A central role for central tolerance. Annu. Rev. Immunol. 24, 571–606 (2006).
7. Perry, J.S. et al. Distinct contributions of Aire and antigen-presenting-cell subsets to the generation of self-tolerance in the thymus. Immunity 41, 414–426 (2014).
8. Yang, S., Fujikado, N., Kolodin, D., Benoist, C. & Mathis, D. Regulatory T cells generated early in life play a distinct role in maintaining self-tolerance. Science 348, 589–594 (2015).
9. Malchow, S. et al. Aire-dependent thymic development of tumor-associated regulatory T cells. Science 339, 1219–1224 (2013).
10. Derbinski, J., Pinto, S., Rosch, S., Hexel, K. & Kyewski, B. Promiscuous gene expression patterns in single medullary thymic epithelial cells argue for a stochastic mechanism. Proc. Natl. Acad. Sci. USA 105, 657–662 (2008).
11. Cloosen, S. et al. Expression of tumor-associated differentiation antigens, MUC1 glycoforms and CEA, in human thymic epithelial cells: implications for self-tolerance and tumor therapy. Cancer Res. 67, 3919–3926 (2007).
12. Pinto, S. et al. Overlapping gene coexpression patterns in human medullary thymic epithelial cells generate self-antigen diversity. Proc. Natl. Acad. Sci. USA 110, E3497–E3505 (2013).
13. Mathis, D. & Benoist, C. Aire. Annu. Rev. Immunol. 27, 287–312 (2009).
14. Abramson, J., Giraud, M., Benoist, C. & Mathis, D. Aire's partners in the molecular control of immunological tolerance. Cell 140, 123–135 (2010).
15. Derbinski, J. et al. Promiscuous gene expression in thymic epithelial cells is regulated at multiple levels. J. Exp. Med. 202, 33–45 (2005).
16. Koh, A.S. et al. Aire employs a histone-binding module to mediate immunological tolerance, linking chromatin regulation with organ-specific autoimmunity. Proc. Natl. Acad. Sci. USA 105, 15878–15883 (2008).
17. Org, T. et al. The autoimmune regulator PHD finger binds to non-methylated histone H3K4 to activate gene expression. EMBO Rep. 9, 370–376 (2008).
18. Waterfield, M. et al. The transcriptional regulator Aire coopts the repressive ATF7ip-MBD1 complex for the induction of immunotolerance. Nat. Immunol. 15, 258–265 (2014).
19. Sansom, S.N. et al. Population and single cell genomics reveal the Aire-dependency, relief from Polycomb silencing and distribution of self-antigen expression in thymic epithelia. Genome Res. 24, 1918–1931 (2014).
20. Giraud, M. et al. Aire unleashes stalled RNA polymerase to induce ectopic gene expression in thymic epithelial cells. Proc. Natl. Acad. Sci. USA 109, 535–540 (2012).
21. Villaseñor, J., Besse, W., Benoist, C. & Mathis, D. Ectopic expression of peripheral-tissue antigens in the thymic epithelium: probabilistic, monoallelic, misinitiated. Proc. Natl. Acad. Sci. USA 105, 15854–15859 (2008).
22. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
23. Picelli, S. et al. Full-length RNA-seq from single cells using Smart-seq2. Nat. Protoc. 9, 171–181 (2014).
24. St-Pierre, C., Brochu, S., Vanegas, J.R., Dumont-Lagace, M., Lemieux, S. & Perreault, C. Transcriptome sequencing of neonatal thymic epithelial cells. Sci. Rep. 3, 1860 (2013).
25. Brennecke, P. et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095 (2013).
26. Forrest, A.R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
27. Buettner, F. et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat. Biotechnol. 33, 155–160 (2015).
28. Ohnishi, Y. et al. Cell-to-cell expression variability followed by signal reinforcement progressively segregates early mouse lineages. Nat. Cell Biol. 16, 27–37 (2014).
29. Gotter, J., Brors, B., Hergenhahn, M. & Kyewski, B. Medullary epithelial cells of the human thymus express a highly diverse selection of tissue-specific genes colocalized in chromosomal clusters. J. Exp. Med. 199, 155–166 (2004).
30. Johnnidis, J.B. et al. Chromosomal clustering of genes controlled by the aire transcription factor. Proc. Natl. Acad. Sci. USA 102, 7233–7238 (2005).
31. Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y. & Greenleaf, W.J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
32. Ramsköld, D. et al. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).
33. Tang, F. et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).
34. Tang, F. et al. RNA-Seq analysis to capture the transcriptome landscape of a single cell. Nat. Protoc. 5, 516–535 (2010).
35. Islam, S. et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 21, 1160–1167 (2011).
36. Islam, S. et al. Highly multiplexed and strand-specific single-cell RNA 5′ end sequencing. Nat. Protoc. 7, 813–828 (2012).
37. Le Borgne, M. et al. The impact of negative selection on thymocyte migration in the medulla. Nat. Immunol. 10, 823–830 (2009).
38. Pinto, S. et al. Misinitiation of intrathymic MART-1 transcription and biased TCR usage explain the high frequency of MART-1-specific T cells. Eur. J. Immunol. 44, 2811–2821 (2014).
39. Klein, L., Klugmann, M., Nave, K.A., Tuohy, V.K. & Kyewski, B. Shaping of the autoreactive T-cell repertoire by a splice variant of self protein expressed in thymic epithelial cells. Nat. Med. 6, 56–61 (2000).
40. Schoenfelder, S. et al. Preferential associations between co-regulated genes reveal a transcriptional interactome in erythroid cells. Nat. Genet. 42, 53–61 (2010).
41. Tykocinski, L.O. et al. Epigenetic regulation of promiscuous gene expression in thymic medullary epithelial cells. Proc. Natl. Acad. Sci. USA 107, 19426–19431 (2010).
42. Azuara, V. et al. Chromatin signatures of pluripotent cell lines. Nat. Cell Biol. 8, 532–538 (2006).
43. Bert, S.A. et al. Regional activation of the cancer genome by long-range epigenetic remodeling. Cancer Cell 23, 9–22 (2013).
44. Love, M.I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

ONLINE METHODS
Mice. C57BL/6 mice were used in this study for the isolation of mTECs. All breeding and cohort maintenance was performed in the central animal laboratory of the German Cancer Research Center (Deutsches Krebsforschungszentrum) under approved conditions in accordance with the European Convention for the Protection of Vertebrate Animals used for Experimental and other Scientific Purposes and the German Legislation.

Isolation of mouse medullary thymic epithelial cells. Mouse mTECs were isolated and purified as described (ref. 45) with pooling of cells from 5–20 mice per experiment. The pre-enriched stromal cell fraction, sorted for unselected mature mTECs (n = 211 cells), was stained with the following antibodies: peridinin chlorophyll protein (PerCP)–anti-CD45 (30-F11; BD Pharmingen), Alexa Fluor 647–anti-EpCAM (G8.8; prepared in-house) (ref. 46), phycoerythrin (PE)–anti-I-Ab (16-10A1; BD Biosciences) and fluorescein isothiocyanate (FITC)–anti-Ly51 (6C3; BD Biosciences).

For the selection of mTECs by expression of the surface TRAs Tspan8 (n = 48 cells) or Ceacam1 (n = 30 cells), the following antibodies were used in the antibody mixture: peridinin chlorophyll protein (PerCP)–anti-CD45 (30-F11; BD Pharmingen), Alexa Fluor 647–anti-EpCAM (G8.8; prepared in-house), FITC–anti-I-Ab (AF6-120.1; BD Pharmingen) and Pacific Blue–anti-CDR1 (CDR1 hybridoma; prepared in-house) (ref. 47), and either PE–anti-Tspan8 (657909; R&D Systems) or PE–anti-CD66a (anti-Ceacam1; CC1; eBioscience). Dead cells were excluded through the use of propidium iodide at a final concentration of 0.2 µg/ml. Cells were sorted on a BD FACSAria III cell sorter (BD Biosciences) by the single-cell sorting mode used in all the experiments. Single mature mTECs represent cells from pooled thymic tissue as described.

Single-cell RNA-seq. Single-cell sequencing libraries were prepared as reported (refs. 22,23) with the following modifications: 1 µl of a 1:1,000,000 dilution of ERCC Spike-In Mix (Life Technologies) in RNase-free water was included in a total volume of 5 µl lysis buffer. During analysis, sequencing reads mapping to ERCC 'spike-ins' were used for estimation of levels of technical 'noise' and for 'calling' of significantly highly variable genes by a published method (ref. 25). We used 19 cycles of initial PCR amplification and used a ratio of 0.6:1.0 (beads/total PCR volume; instead of 1.0:1.0) of Ampure XP beads (Beckman Coulter) for the first PCR purification to minimize primer dimer carryover. After the first PCR amplification, cDNA libraries were screened via quantitative PCR (we used a 1:10 dilution of purified cDNA libraries for quantitative PCR) for expression of a mouse housekeeping gene (Ubc), and the distribution of library size was checked on a Bioanalyzer instrument (Agilent) as reported (refs. 22,23). Only cDNA libraries that passed both quality controls were processed further. We used 100 pg of cDNA for the 'tagmentation' (transposase-based fragmentation) reaction and applied 12 cycles for the final enrichment PCR. The final purification step was performed with a ratio of 0.8:1.0 (as above) of Ampure SPRIselect beads (Beckman Coulter). We 'multiplexed' 24 samples per Illumina HiSeq sequencing lane and used 105–base pair paired-end sequencing. A HiSeq 2500 lane typically yielded between ~150 × 10^6 and ~200 × 10^6 reads.

ATAC-seq. Human thymic tissue was obtained from children in the course of corrective cardiac surgery at the Department of Cardiac Surgery, Medical School of the University of Heidelberg. Studies of human samples were approved by the Institutional Review Board of the University of Heidelberg (367/2002), and informed consent was obtained from all patients. Human mTEC subsets (MHCIIhi cells positive for surface TRAs and MHCIIhi cells negative for surface TRAs) were isolated and sorted by flow cytometry as described (ref. 12). ATAC-seq experiments were performed as reported (ref. 31) with the following modifications: 5 × 10^3 to 50 × 10^3 pooled cells (depending on mTEC subset frequency) were sorted in flow cytometry buffer (PBS containing 5% FCS) and were used for ATAC-seq experiments. We used 50% of each purified 'tagmentation' reaction for enrichment PCR (without five cycles of pre-amplification). Each enrichment PCR was monitored individually with the StepOnePlus Real-Time PCR System (Life Technologies), and the amplification reaction was stopped as soon as amplification approached saturation. After the enrichment PCR and subsequent purification of PCR products, we performed gel extraction (QIA MinElute Gel Extraction Kit; Qiagen) for removal of primer dimers. The final libraries were quantified by quantitative PCR and were 'multiplexed' and sequenced on a HiSeq 2500 sequencing machine (Illumina). 105–base pair paired-end sequencing was used, and samples yielded between 16,867,055 and 40,820,441 sequenced fragments.

Confirmation of the […]. Single-cell cDNA libraries of mature mTECs were prepared as described above. Libraries were purified after 19 cycles of PCR amplification with a ratio of 0.6:1.0 (as above) of Ampure XP beads (Beckman Coulter). Dilutions of 1:10 (in nuclease-free water) of the cDNA libraries were used for subsequent quantitative PCR pre-screening. Primers were designed with the NCBI Primer-BLAST tool. Single-cell cDNA libraries that were positive for expression of both […] sequencing. Since we used the 24-sample Illumina dual indexing kit, only 24 of the 28 […] Illumina sequencing.

Bioinformatics. […] sequenced fragments were considered for analysis. […] transcriptome, we tabulated the number of sequenced fragments that overlapped with each gene through the use of the HTSeq package for data processing, and normalized for sequencing depth by a published method […]. […] for technical variation, we used a published method […] biological coefficients of variation were larger than 50%, and we used this subset for further analysis. We used another published method […]
Forfurther each single-cell to the mouse reference genome (ENSEMBL release 75). Only uniquely mapped 2014-07-04) version program, the GSNAP ments (with nucleotide-alignment Supplementary Code Supplementary in the statistics, and summary figures of reported the all generation including data, the all of analysis the for used code R documented the containing flow availability. Code 2014-07-04. version GSNAP with 75) release (ENSEMBL genome reference human the to mapped were data ATAC-seqThe method. Benjamini-Hochberg the using genes done were corrections identify testing To ‘clue’. package R the Multiple test. Wilcoxon the we used genes, TRA-encoding with with co-expressed stability their assessed and computing) statistical for project R method the of ‘cluster’ (software package R (pam) the of medoids’ around ‘partitioning the by genes co-regulated of out’ by Weon cycle. the the cell the variation data explained groups identified

nes S & ue, . ifrnil xrsin nlss o sqec cut data. count sequence for analysis expression Differential W. Huber, & S. Anders, antibodies Monoclonal B.A. Kyewski, & J.R. Bender, L.M., Bolin, R.V., Rouse, murine the in heterogeneity Epithelial S. Hosier, & J. Truex, A., Nelson, A., Farr, K. Rattay, Genome Biol. Genome Cytochem. cells. epithelial thymic human and mouse of subsets with reactive medullary and subcapsular by expressed glycoprotein epithelium. surface cell a thymus: cells. epithelial thymic medullary in expression gene promiscuous modulates partner, interaction regulator Klk5 Klk5 and the housekeeping gene

t al. et 36 J. Histochem. Cytochem. Histochem. J. -positive cells (instead of the 28 identified) were subjected to to subjected were identified) 28 the of (instead cells -positive

, 1511–1517 (1988). 1511–1517 , 11 For the single-cell data, we mapped the sequenced read frag data, we For mapped the sequenced the single-cell oedmi-neatn poen iae , nvl autoimmune novel a 2, kinase protein Homeodomain-interacting We work and have reproducible a provide comprehensive , R106 (2010). R106 , . Klk5 J. Immunol. J. c-xrse gn st y uniaie PCR. quantitative by set gene –co-expressed

194

39 Ubc , 921–928 (2015). 921–928 , , 645–653 (1991). 645–653 , were for further processed Illumina 2 5 to identify genes whose genes whose to identify doi: 10.1038/ni.3246 4 2 8 J. Histochem. J. 7 . To account to ‘regress ‘regress to - - - - -
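The multiple-testing step named in the Bioinformatics section (Benjamini-Hochberg correction of the Wilcoxon test P values) can be illustrated with a minimal sketch. This is not the authors' code: their analysis was done in R and is provided in the Supplementary Code, whereas the block below is a plain-Python rendering of the standard Benjamini-Hochberg step-up adjustment. The function name `benjamini_hochberg` and the example P values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Illustrative helper (hypothetical name), not taken from the paper's
    Supplementary Code.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value (rank m) down to the smallest (rank 1),
    # scaling each by m / rank and enforcing monotonicity with a running
    # minimum, which also caps every adjusted value at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical P values, e.g. one per gene tested for co-expression.
    example = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(example, benjamini_hochberg(example)):
        print(f"p = {p:.3f}  adjusted = {q:.3f}")
```

Genes would then be called co-expressed when their adjusted P value falls below a chosen false discovery rate threshold; the threshold used in the study is not stated in this section.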