© 2013 Nature America, Inc. All rights reserved.

nature immunology

Identification of transcriptional regulators in the mouse immune system

Vladimir Jojic, Tal Shay, Katelyn Sylvia, Or Zuk, Xin Sun, Joonsoo Kang, Aviv Regev, Daphne Koller & the Immunological Genome Project Consortium

The differentiation of hematopoietic stem cells into cells of the immune system has been studied extensively in mammals, but the transcriptional circuitry that controls it is still only partially understood. Here, the Immunological Genome Project gene-expression profiles across mouse immune lineages allowed us to systematically analyze these circuits. To analyze this data set we developed Ontogenet, an algorithm for reconstructing lineage-specific regulation from gene-expression profiles across lineages. Using Ontogenet, we found differentiation stage–specific regulators of mouse hematopoiesis and identified many known hematopoietic regulators and 175 previously unknown candidate regulators, as well as their target genes and the cell types in which they act. Among the previously unknown regulators, we emphasize the role of ETV5 in the differentiation of γδ T cells. As the transcriptional programs of human and mouse cells are highly conserved, it is likely that many lessons learned from the mouse model apply to humans.

Computer Science Department, Stanford University, Stanford, California, USA. Department of Pathology, University of Massachusetts Medical School, Worcester, Massachusetts, USA. A full list of members and affiliations appears at the end of the paper. Correspondence should be addressed to A.R. or D.K.

Received 30 January; accepted 13 March; published online 28 April 2013;

context for of cell ties restricted densely combinations that human cell expressed controlled cies. of tions ferentiation Understanding modular pendium we regulatory to use immunologists

The exhaustively

the human

provide studying

populations

states of from

Immunological involves Most

for

ImmGen shared 246

hematopoiesis of inte

model the

2 peripheral

and

to a

.

studies in immunological networks

cell by

However,

rich the

r of human

study connected specific the

a 5 of and use relatively

types Howard Hughes Medical Institute, Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

cells

larger chart and

first the of

and of compendium

modules 1

regulatory 1 of

, a

the the rigorously 7 , of

8

computational , , Tal Shay

cells

regulatory new

of diverse hematopoiesis in & the Genome Immunological Project Consortium or

comprehensive of

gene-expression development analysis lineages

3 number

regulatory immune

the the

the has cord

[email protected]

few circuits. that computational

of

disorders immune mouse mouse suggested

lineage

coexpressed blood

‘master’ organization

could A

Project

controlled

of and

offer

of system. 2 mechanisms

,

gene

7 program However,

transcription

immune , , Katelyn Sylvia immune

act biologists and and view be

tree.

system an

analysis

and

(ImmGen)

transcription

a

expression profiles obtained

to

unprecedented more thus for

Because algorithm differentiation

data-generation

genes set

hematological

of

of

understanding system system has

that

could

that mouse and of who complex hematopoiesis ) ) or D.K. (

and

in and factors 7 important

the

These authors contributed equally to this work.

the

human is

in

reinforce underlie sufficient

1

aim,

A

in

their not . to a

factors

are

ImmGen 38 In hematopoiesis.

transcriptional

3 consortium

the [email protected] reconstruct

, , Or Zuk

organization

access this

opportunity cell that through as

arranged

underlying

malignan

816 study

pipelines,

a

the

implica

context, types that the

distinct pr quanti control

in arrays

many o com basis

doi:10.1038/ni.258

cess

was dif the the are 2

in in of

, Xin Sun a 2 Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

method, identification active that age that particular, offers can physical profiles indicate direct robustness a its or can deep example, tion networks to lessons ­programs

putative

). ). Here Analysis activity humans. presumed

be tree) enhance

cells regulatory 4 factor

sequencing) Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, USA.

unique 4 in evidence inferred

, , Joonsoo Kang g are learned

we

function and

5 d Ontogenet, that 7 another. probably from

are of of T cells. As the transcriptional programs of human and

regulator

published

of

highly or and Two used

target. confidence a observational

human

6

cells physical are opportunities of

a

transcription

chromatin from

relationships

cis

biological from

of true key

more 8

8

those

accessible

and Incorporating organized -regulatory These These authors jointly directed this work. Correspondence share biochemical

and In

a and

approaches

and models regulators and

the both significant

models

observational

closely

are 5– insights

many

a

mouse mouse 7

3 to immunoprecipitation

interpretability

module cases, , , Aviv Regev and models challenging

but

factor

that 3 in apply that

based have

element related

a

of

expand provide of interactions

known

have for correlation model analysis cells such to

exist

their

hematopoiesis.

(as

it not are

of

on develop the

to models

(according

not coregulated are

in information

the explicitly with complementary,

the

regulatory

only the to lineage, will identification

one of

6

2 been highly association

collect , scope

,

7

the 5 ImmGen

a . between

correlative

probably

,

but with a

sublineage 8 Physical or target’s

,

new relationship

leveraged

as

(ChIP) mRNA)

considered

do to of 9 conserved

, in which mechanisms targets

may

whereas

the discovery. computational

not

hematopoiesis,

the

6

promoter compendium of

be A A full list of of data

3

evidence.

known

Department Department using

followed a

may help

necessarily

abundance and regulatory

regulation

applicable

transcrip before. enhances

between

4

provide the mRNA ,

that

not

in many both

line

(for and fact the

As

by be In of  3 - -

RESULTS
Transcriptional compendium of the mouse immune system

Figure 1 Mouse cell populations in the ImmGen compendium. Lineage tree of the hematopoietic mouse cell types profiled by the ImmGen Consortium (nomenclature and markers for sorting, Supplementary Table 1). Some samples of stem cells, progenitor cells and B cells were obtained from adult and fetal liver. Stromal cells (box, bottom right) were also measured as part of ImmGen but are not part of the lineage tree. Color in bar beneath matches color of branches in tree above. GN, granulocyte; MF-MO, macrophages-monocyte. Adapted from ref.

and often killer many natural cells, all system of The cell rithm hypotheses didate physical regulators, The coherent some two tors to SC.MEP.BM

Granulocytes SC.CMP.B

816 build URAC.PC THIO.PC Bl ARTH.SYNF ARTH.BM BM major GN

lineage. spleen. different with model

ImmGen

granulocytes,

SC.GMP.BM of

sampled T

types expression can

regulators.

( M

Putative edges Nonintersecting edges killer

cells which an Mouse cell populations in the ImmGen compendium. Lineage tree of the hematopoietic mouse cell types profiled by the ImmGen Consortium Fig. Macrophages BM model

Thio5.II-480hi.P Thio5.II–480int.PC Thio5.II+480int.PC Thio5.II+480lo.P 11cloSer.Salm.S 11cloSer.SI 103-.11b+.Salm.S 103-.11b+.SI 103-.11b+.Lu Microglia.CNS II+480lo.PC II–480hi.PC RP.Sp Lu BM MF-M expression; modules hematopoietic

SC.MDP.BM

BI

be

Monocytes observational

was identified for of

LN

1 granularities, (NKT

O

(NK)

used 6C 6C– 6C– 6C+II– 6C+II+

consortium from we

C and

–I C α

I and II+ II– I Ii experimental supported nt

β profiles

further

Our Lu.L

SLN

monocytes, SC.CDP.B

of T

to Supplementary Table 1 Table Supplementary proposed cells) I N cells, several IIhilang–103–11b IIhilang–103–11b+.SLN IIhilang+103–11b+.SLN IIhilang+103+11b ML

Lv

T N cells,

this coexpressed many 103–.11b+24+.Lu

delineate M model C DC S h DCs

DC.103-11b+F4/80lo.Kd LC

SI

lineages, .S from p

model

B and refined k

8– 8– 8– 8+ 4+ pDC. pDC. S.SI

resulted

LO LO

data with

4– 4– through regulatory .SLN .SLN 103+.11b+ 103+.11b 103-.11b tissues,

cells 11b+ 11b– 8– 8+ of

dozens

provides γ 246

macrophages, studies, + –

the δ

81 set associating

regulation B1

and T Fe

b. into

PC

including

cell

Supplementary Table 1 Supplementary larger 1

ta B.FO.PC

genes. already B. in cells. such

l MZ.Sp the (April

B proB.FrBC.FL of

preB.FrD.F

T proB.FrA.F CT

SC.STSL.FL smaller 334 B.FO.LN SC.LTSL.FL B1a.Sp types

T B.FrE.FL

B cells B.1a.PC MLP.F CLP.F RL

a and

previously

cells.

use

as coarse-grained cells B.FrF.BM rich

The L L B.FO.MLN L We

SC.MPP34F.BM L preB.FrD.B

fine-grained SC.STSL.BM preB.FrC.B proB.FrA.B SC.LTSL.BM

proB.FrBC.BM B.FrE.BM known

CLP.B MLP B.Fo.Sp B.T3.Sp B.T2.Sp B.T1.Sp 2012

). bone in

578

of the dendritic of stem

.BM

B.GC.Sp defined resource M The ‘same’ The

modules (T the the M

M M

a SC.ST34F.BM candidate SC.LT34

reg Ontogenet NK.49CI–

release) marrow,

complementary

mouse and

NK.MCMV7.Sp T cell context

NK.MCMV1.S unknown hematopoietic F .Sp cells),

cell cells A

cells dult NK.49CI+.Sp modules

types progenitor ). Some samples of stem cells, progenitor cells and B cells were obtained from adult and of NK cells NK with

p

modules. modules,

type

NK.Sp NK.H+.MCMV1.Sp

immune

NK.H+.MCMV7.S

consists testable

thymus include regula

natural NK.49H+ (DCs), of

more algo

span

can

was .S any p

at NK.H– NK.49H-.S p - - -


.MCMV1.Sp p PreT.DN3-4.T preT.DN2-3.T preT.DN2.T preT.ETP.T To
Coarse- and fine-grained expression modules in hematopoiesis
genes of characteristic studying ent (S&P cells ous, especially cells, Conversely, sity being variable, cells, profiles populations ent Correlations
Similarities in global profiles trace the cell ontogeny
Fig. 2a modules T.DPsm.Th T.DPbl.Th T.DN4.T T.ISP.T T.8Nve.LN T.8Mem.L

the

characterize h 1 h with with T.8Mem.Sp.d100

T.8Mem.Sp.d45 h h T.8Eff.Sp.48hr T.8Eff.Sp.24hr T.8Eff.Sp.12hr

0 T.8Eff.Sp.d15 T.8Eff.Sp.d10 h h LisOva.OT1 which T.8Eff.Sp.d8 T.8Eff.Sp.d6 T.8SP24int.T

T. T.8SP24-.T were T. although

T. ) profiles

8SP69+.T group, N

DP69+.T obtained 4int8+.Th in

main , T.8Mem.S T.DP.T T.8Nve.Sp and b

were

the

all ). a each DCs of h h h

h h

CD8 gradual most We

T.8Mem.Sp.d106 T p

T.8Mem.Sp.d45

partly T.8Eff.Sp.d15

other all

T.8Eff.Sp.d8 T.8Eff.Sp.d6 T.8Eff.Sp.d5

coexpressed lymphocytes T.8Nve.ML known were

.8Nve.Sp.OT1 eleven

Fig. Fig. in (Pearson

were

signatures lineage,

being first tightly A VSVOva.OT1

+ myeloid

from

global the

T N similar T

103-.BR in 2

lineages CD .8Nve.PP reflected

cell

cells ), loss constructed

similar

8

lineage T.8Mem.d20 lineages + key the

s

Vg2+24+.T Vg2+24–.Th

the Vg2+.act.S T 103-.S followed

diverse Vg2+.S correlated, 4

we profiles r

.

p of

lineage cell CD T.4FP3– and

103+.B = patterns cells

p to most

T.4Mem44h62l.Sp

genes h p of 4

had s

Tgd.Sp used + –0.71;

differentiation T.4SP24int.Th

T.4SP69+.T

( T.4SP24-.T T R .S T.4+8int.T

overall,

early

tree p Supplementary Table 2 Supplementary T.4Mem.S

over- Tgd.Vg1+Vd6+24–.Th the

NKT T.4Nve.Sp

compared

tissues Vg1+Vd6+24+.Th

Vg1+Vd6+.T larger being

by variable

h between

h h one-way

tree, at 81 p ( finer

T cell T

Fig. Fig. Vg2–.act.S T.4Mem44h62l.LN T.4Nve.LN Supplementary Fig. 1 Supplementary did T.4Mem.LN

myeloid

of Vg2–.S T.4FP3+25+.LN

pre-B

and cells. two

h coarse-grained

with

gene

s differences

p the weakly and A show

Tgd.Th p T.4Nve.ML 2 sampling


Vg1+Vd6–24–.Th underexpressed Vg1+Vd6–24+.T ). granularities Vg1+Vd6–.Th

(consistent T

T.4FP3+25+.A

T cells

with samples more

analysis In re

granulocytes N their

g regulation,

potential. cells cell

T.4Nve.P and weaker

Stromal cell Stromal

general, Sk

s similar T A

h and nature immunology nature P CT

the Th Vg5+.act.IE T.4FP3+25+.Sp similar

T.4.PLN.BDC

Vg5+.IE T.4.Pa.BDC

known T.4.LN.BD RL were lymphoid Ep Fi

for between .MECh

were

of L pre-T expression

L

similarity γδ C modules s

i

Vg5–.act.IE

with this variance

the T cell T

Vg5–.IE As very NKT.4 NKT.4+.Sp ( ).

to

Supplementary Supplementary we their

inherent largely

being –.

SLN

L genes s

closer ). a Sp L stromal NKT.44+NK1.1– NKT.44 cells, lineage. NKT.44+NK1.1+.Th

DC next

lineages.

ML resource heterogene For NKT.4– progenitors NKT.4+Lv N

expression –N ST.31-38-44- BEC LE FRC

(C1–C81; K1.1 C

to

.L to

samples’

the

of v for myeloid

consist consist –. defined two

.T NKT cell NKT Th T

NKT.4+.Lu h define

diver

those

Stem cells. cells, each least

NK

cell for s - - - -

Ontogenet: reconstructing lineage-sensitive regulation

Figure 2 Related cells have very similar expression profiles. Pearson correlation coefficients (purple, positive; yellow, negative; white, none) for each pair of profiled cell types, calculated for the 1,000 genes (of the 8,431 unique expressed genes) with the highest s.d. value of all samples. Samples are sorted by breadth-first search on the tree (Fig. 1), with stromal cells at the lower or right end. Black vertical and horizontal lines delineate major lineages according to labels along left margin and beneath (color in bar beneath matches colors in Fig. 1). S&P, stem and progenitor cells; PROB, pre-B cells and pro-B cells; T4, CD4+ T cells; T8, CD8+ T cells; ACTT8, activated CD8+ T cells; gdT, γδ T cells.

weight’ (criterion It regulators’ tree modules, captured regulators sublineages, of cell accordance than each factors expressed A of activating expression aims latory We repression into ( lineages 2de Figs. 2c Figs. genes) ModsRegs/modules.htm and modules which binding tors elements showed subset fication of modules ules modules further 2c Fig. Supplementary that Supplementary Fig. 2g
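The computation described in the Figure 2 caption (keep the genes with the highest standard deviation across samples, then compute a Pearson correlation coefficient for each pair of cell types) can be sketched as follows. This is only an illustrative sketch in plain Python: the function names, the `profiles` layout (gene name mapped to one expression value per cell type) and the toy numbers are assumptions made for the example and are not taken from the paper or from ImmGen.

```python
import math
from statistics import stdev


def top_variable_genes(profiles, k):
    """Return the k gene names with the highest s.d. across cell types."""
    return sorted(profiles, key=lambda g: stdev(profiles[g]), reverse=True)[:k]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cell_type_correlations(profiles, k):
    """Correlation matrix between cell types over the k most variable genes."""
    genes = top_variable_genes(profiles, k)
    n_types = len(profiles[genes[0]])
    # One column per cell type, restricted to the selected genes.
    cols = [[profiles[g][c] for g in genes] for c in range(n_types)]
    return [[pearson(cols[i], cols[j]) for j in range(n_types)]
            for i in range(n_types)]


# Hypothetical toy data: 4 genes measured in 3 cell types.
profiles = {
    "g1": [1.0, 2.0, 9.0],
    "g2": [5.0, 5.0, 5.1],
    "g3": [8.0, 7.0, 1.0],
    "g4": [3.0, 3.0, 3.0],
}
corr = cell_type_correlations(profiles, k=3)
```

In this toy example the first two cell types have similar profiles and correlate strongly, whereas the third correlates negatively with both; the caption reports the same kind of matrix for the 1,000 most variable of the 8,431 expressed genes.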

may those Ontogenet Most genes then a

next

types

( spanning those lineage and

module downloaded

other in Supplementary TableSupplementary 6 to

circuits

the

activate

of that

‘distant’

identified showed fulfill

enrichment

for

associates of sites factors

coarse-grained developed

in

Related cells have very similar expression profiles. Pearson

helped and ( the

and (

, ( as

(

whereas Supplementary Figs. 2f 2f Figs. Supplementary these 4 4 categories

Supplementary Fig. 2a Fig. Supplementary in and

module’s Supplementary Table 5 Supplementary

1

distinct Supplementary Fig. 1 Fig. Supplementary was

(for in

of

one

(transcription each regulate

with shared

(for above),

both and 3 but

expression

in

for

receives the each

their

the

7,965 that

repressing

rare

) a

example,

lineage, may that

cells genes

us

5 factors

either module the or cell the example, lineage

following lineages); for

at

drive module ). the

regulatory

capture

associated a

each (only

regulators genes

for genes ‘pan-differentiation’

a

cell

( change subtypes greater (for new whereby In type

the

Supplementary Figs. 2h

each module as

in latter

coherent

)

whereas lineage-specific addition,

known

input hematopoietic

modules

(criterion –

profiles

and tree

l in

module example, in Foxp3 transcription ImmGen the

algorithm, ). h are (F1–F334;

that

factors, coarse-grained of the GATA-3

one

C53 biological

criterion + in and similarity

regulate

19

T cells; gdT, T cells;

(for genes

other expressed

are mechanisms and the

can

enrichments

A each mechanisms of different indicates

to had

lineage

for functional (B

fine

Supplementary Table 3 Table Supplementary

of

a more example, );

gene-expression

6 with (48 act

Supplementary Note Supplementary 1 also ),

from

)

coarse 2 chromatin

cells)

is and portal

expression lineage(s).

a regulator modules T

and and

for which Ontogenet,

Supplementary Table 4

above). Fig. Fig.

modules reg 3, only

determined as predesignated

of

considerations:

in

have

factors;

similar cell

but

the

a

criterion cell (

regulators 6 T

two

cells).

binding its

Supplementary Note Supplementary 2 and induction 1 regulation 81 expression γδ

combination

module ), ). S&P, stem and progenitor S&P,). progenitor and stem cells) particular module

( from

annotations, that differentiation.

not identity resulted

T cells. can

that

http://www.immg activity

types

additional,

different

modules;

8 The C17

were

and

is

+

may patterns regulators in

criterion The

in were

Many T cells; T8, CD8 T8, T cells;

control be

coregulate

to the assigned are

of cells

4, regulator by (for another,

(stromal 7

module,

and searched, in

delineate a ‘mixed-use’

help

Fig. Fig. ).

and former master

transcription

as

A

in same

( active ( set a

profiles set

stromal

fine Supplementary Supplementary

the TION Supplementary Supplementary Lineage-specific

sublineages), of

that example,

combination

criterion

a 4,478 that 334

its

only 1

more

cis of activity

of 2, in of regulator

the ), with with ),

lineage(s)

and modules. ), sublineage)

an nested

)

-regulatory

the

nested are

even

Ontogenet cells)).

the the a regulators

did ‘candidate across

regulators fine

should activity including

and

a the larger

).

modules

browsed of of

‘activity

specific specific close smaller

activity

en.org/ so

li

Coarse identi not

+

across

closer

factor

n if of mod

1,

7,965 regu ).

then

eage

on). fine

fine fac A

the

the the All fall

for set

be

in

in to of is is - - - -

­ ­

lators to cell simple variance goals: an rion grained regulators modules tory is activity weights program more us missing is that expressed), in transcription makes each linear sion gene is reconstruction Following at Stromal ACTT

PROB

achieved

PRET essential combined

the

Notably, Although B those examine ensemble S&P NKT

gdT MO

GN

types DC MF

NK

all cells,

T8 T4 program 4). of regulator’s B expression

8

susceptible each (for protein sum

GN predictions as the

genes

Collectively, across

some module

of of

in

derived

for possible;

in factors

example,

the

MF regulatory

and module’s

of each the the by

the for and subtler

Ontogenet

the in of

each

in the

between factors gene-specific

penalizing level approach

MO

consecutive approach

coarse statistical module

6

regulatory a activity

Module

activity ,

the

regulator, ontogeny;

(while in

regulators’

linear the A to

from

cell

regulatory

expression

each

such but

and same linear

noise. genes we

DC program

activity-weighted A modules

a

type, as

is

reconstructs

model possibly weight use

1 weights a C and

in

cell used

cell as

changes

robustness,

module is possible; inferred

coarse-grained

are

regression

programs

in the it stages

and

activated “in an

expression

S&P

type

expression B as type

is

a

before

no

explains programs PROB Lirnet

optimization

patterns and

to

given in

to geared pre reflected fine

that

gaining

longer

and

in generate in that which ( are

solely

Fig. Fig. the B

is

B

differentiation the

modules

to

a

that

by cell

method although regularized were

its −1.0 regulated repressed

cells,

NK cell

potentially

regulatory

as toward identify

patterns.

3

shared

in activity

B

expression active progenitor.

PRET

they ). are

−0.8

from additional type much

by

try

and

type. module

that In a

inferred

Module −0.6

approach consistent

prediction the to

T4

belong. this for have D.”

is

−0.4 (even transcript

maintaining cell

by it

combinations

of

As in achieve weights

by

approximated

The cell-specific

Correlation

comes regulatory-network

programs with −0.2 Our

different the

T8 fewer

model, the

a

type ‘inherit’ regulators

factor (criterion

of The regulators; for result,

1

if

fine 0 gene-expression ACTT8

same

model that

is the

the the

across

0.2

multiplied the of of genes their

at fine-grained activated

abundance. modules

the

C,

Elastic

NK the 0.4 a

the factors constructs regulatory regulators

the

remain following T

way. the the module’s

whereas

assumes

0.6 of

3).

activity

expres

coarse- gd

regula similar but related

cost

model T by

crite regu same same 0.8

Stroma This This

Net

the

are are

1.0

let

by by of as  l


­

Ontogenet regulatory model for mouse hematopoiesis

Figure 3 Overview of Ontogenet. (a) Use of the regulator expression profile (blue-red; key below) and activity profile (orange-purple; key below) to demonstrate how a type of regulator (bottom) can ‘explain’ the expression of a module (top): a regulator may have a uniformly positive activity weight across the lineage (constitutive activator; top), a uniformly negative activity weight (constitutive repressor; middle) or variable activity weights (context-specific regulator; bottom). (b) Mean expression of a module (top) calculated as a linear combination of regulator expression (blue-red; left) and activity (orange-purple; right).

interactions tion Table 7 Supplementary tions in weights ing 6,151 regulatory modules controlled in ity interactions regulation ent ( Net previously regulatory regulator. tors Fig. 8 grained (1,091 regulators. modules, We in activity, the terms the independently rate allowing regulation This was expressed across ity penalty immunological Supplementary Fig. 9 Fig. Supplementary
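The relationship stated in the Figure 3 caption, where the mean expression of a module in each cell type is a linear combination of regulator expression and activity weight, can be illustrated with a small sketch. It covers only the forward calculation and the three regulator types named in the caption; it is not the Ontogenet fitting procedure, and every name and number below (`module_expression`, `classify_regulator`, regulators "A" and "B") is a hypothetical choice made for the example.

```python
def module_expression(expr, activity):
    """Per-cell-type module mean expression.

    expr[r][c]     : expression of regulator r in cell type c
    activity[r][c] : activity weight of regulator r in cell type c
    Returns sum over r of activity[r][c] * expr[r][c] for each cell type c.
    """
    regulators = list(expr)
    n_types = len(expr[regulators[0]])
    return [sum(activity[r][c] * expr[r][c] for r in regulators)
            for c in range(n_types)]


def classify_regulator(weights, tol=1e-9):
    """Label a regulator by the shape of its activity weights across cell types."""
    uniform = max(weights) - min(weights) <= tol
    if uniform and weights[0] > tol:
        return "constitutive activator"
    if uniform and weights[0] < -tol:
        return "constitutive repressor"
    if uniform:
        return "no activity"
    return "context-specific regulator"


# Hypothetical toy data: two regulators across three cell types.
expr = {"A": [1.0, 2.0, 3.0], "B": [2.0, 2.0, 2.0]}
activity = {"A": [0.5, 0.5, 0.5], "B": [-1.0, -1.0, 0.0]}

predicted = module_expression(expr, activity)
labels = {r: classify_regulator(w) for r, w in activity.items()}
```

Here regulator "A" keeps the same positive weight in every cell type, whereas "B" represses in the first two cell types and is inactive in the third, so the same regulator expression contributes differently depending on context.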

In others), a

which

weight (and 6 applied

more lineage

expected cell

per similar deemed ,

of

regulatory violates

most

a and

interactions and of

all method the activating,

6

coarse-grained

frequently

types modules such , gene Overview of Ontogenet. (

1 hence

the shared regulators

a

1 the

cell

Supplementary Supplementary Table 5

As ‘pan-differentiation’

similarly and mouse cases )

probably mostly model The

unknown in ( lineage

programs tree Ontogenet

way

Supplementary Fig. 10 Fig. Supplementary

assumed

whose

the and

to

algorithm

determined

regulation. finer that similarity complex types.

(‘frequently

is

expression) that

Ontogenet

be

activity when

contexts.

program in

(59%), known

impractical

regulators

and suggested

immune associating

the by

317

had active tree related changed regulatory

does

maximal

with in reflective Thus,

distinct

expression

480 inferring that module’s

that

to

repressing module, qualitatively

two consisting lineages.

a and context ).

to weight) between

not

Ontogenet the

regulator’s mixed to

by

unique This for

b

model

cell are system

construct

if changing’), regulatory

) Mean expression

different

in the with

Supplementary Note 3 Supplementary use new

and

81 cross-validation,

a

effect 334 regulators

program each

of types. some

strictly regulator

rich

a

and genes the

coarse-grained same specificity low, and

the

activity ) Use of

Conversely, both is also data

identified

regulatory

modules related fine regulators

). of and identified

the regulatory

(defined

regulatory three tree cell On

solves

similar modules activity 195

). we

lineage-specific

ignores a cells,

extent. are than

the modules

9 better activ

When

sepa reflective

same average,

(

and

obtained type (that ( Supplementary Fig. 10 Fig. Supplementary

cell mixed)

was

coarse-grained Fig. Fig. cell

more greater

this

of

in those it

1,417

- -

patterns,

has

weights ( many types

as interactions

Fig. Fig. and types at is, 5

the

we

connections problem Ontogenet model

), modules and likely

the

fixed there

predicting a

between

whereas

regulatory obtained

Expression Expression Expression Activity Activity Activity a number of pruned a remained

4 regulator’s other

and known

, product

sparser 554 in

).

context-specific Supplementary Supplementary varied

activity except to were for

modules

the 580

by

regulators

be and modules classes.

differentia

in mixed-use 81 constructs regulatory of regulatory

Repression leveraging

lineage 17 candidate by

regulated Context-specific regulator Global repressor Global activator Target modulemean Target moduleexpression and

network, in new constant

relations

of −2

for

334 interac specific weights

activity coarse-

regula

Elastic differ

activ

their were

Expression (fold)

hav

The fine and and

per

in in Samples Activity ------

None 0 test enrichment ciations by a and ChIP-based overlap DC with that the GATA-3, development tors the known T B For single Ciita cell response specific fine lates and myeloid of selected example, already Many
Ontogenet prediction of known regulatory interactions

regulator

Ontogenet’s Furthermore, cell cell its Ontogenet

6 γ

expression

module

commitment many for

example, Activation of PLZF modules; δ ) the the

),

15

regulates module T of

regulators which module CD4

2

T modules with

two

known, between as

cell known modules

myeloid-specific

fine

the T-bet among module

cell myeloid

(ZBTB16), a and

+ binding for C19

groups)

module regulator those

regulatory

supported

DCs, modules);

and (

MafB regulators

the A Supplementary Table Supplementary 8

was

predictions

of and b cis the

the

C18

role

and

C29, which individual 1 a the

-regulatory their C52; 2 C24, Pax5,

function;

+ + + =

regulator based B .

fine genes also are PLZF; antigen-presenting

profiles

of (encoded NKT C56 and (

cell combination

F128 Mean moduleexpression

upeetr Fg 8 Fig. Supplementary of

Sox13

regulated subset-selective

T-bet

C30 member modules;

the

supported consistent

C/EBP Trim14 interactions EBF1,

Gata3 the

on in is

Bcl-11B, P Tox

module Irf5 and modules cell

and

were idea

regulated

in < the regulators,

and enrichment expression expression and

the myeloid

and expression (encoded motifs expression

1

module the fine

by

F131; α

module

of POU2AF1

×

NKT C74, genes

also a

by C/EBP

(encoded

Id3, modules

Mafb direct

10

coarse modules with C33

the GATA-3, C25 of

A

IRF8 ( −5


STAT1 and

P

supported by identified

the

F288; regulators all module

).

by

cell = PU.1 accuracy

) expression (permutation

by with

known is

β and

physical of For the

= regulates

2.6

module

B

also (but macrophage (encoded CD8

regulated ( Tbx21 cis Σ

module

cell and Supplementary Supplementary Tables 5 F150

by

× × × ×

nature immunology nature

regulates known example, and C30 × (encoded which Lef1, -regulatory

Regulator expression )

involved 10 not

+

F188 Cebpa

module regulatory

is

by Spi-B by DCs

interaction

CIITA

−5

and of ) and

were associated

regulated IRF4 IRF4), the regulates

TOX

(hypergeometric F136. GATA-3 activity TRIM14 activity

it Ontogenet their γ Regulator activity by

our

IRF5 activity TOX activity is

δ

by ) F152,

was the many is test)), 27

macrophage- T by supported module Cebpb regulates (

regulated

(encoded C25 in Fig. Fig. higher

and

model.

the

of

cell and

interferon-

motifs significant associated Sfpi1 consistent

relations. NKT

the

in

between

myeloid

(and IRF8 such the

regula known 4 ) by with

TCF7; which

);

regu

×

) asso were C29, than

and was NK For cell the the the

13 by by by

in as a - - -


Figure 4 Ontogenet regulatory model for coarse-grained module C33. (a) Module C33: mean-centered expression (red-blue; key, bottom right) of the module’s genes (rows) in each cell (column); major lineages are delineated by dashed vertical lines (which correspond to color bar beneath); fine modules F175–F181 (right margin) nested within C33 are delineated by thin horizontal lines; left margin, examples of genes in module. This module contains some typical B cell genes. (b) Regulator expression, presented as mean-centered expression (red-blue color bar, below) of the regulators (rows) assigned by Ontogenet to module C33. (c) Regulator activity weight (orange-purple; key, bottom right) assigned by Ontogenet for each of the regulators from each cell type. (d) Projection of mean-centered mean expression (blue, low; red, high) of module C33 onto the hematopoietic tree (below, differentiated populations on that tree); arrowheads indicate ‘edges’ (differentiation steps) at which the activity weight of selected inferred regulators (labeled in diagram) changes.

did were negative type other regulated some little (57%)), regulatory (for do false-positive sion-based 497 of do fine negative sites There motifs actions that significant, ing in such groups) ( 21 ChIP motif cell in b P

HSCs, A Although

the

554) = find

not

GATA-2 of (HSC)

not regulator–coarse

modules.

of the

example) few

2.2 and not not as

or

cases cases, myeloid EBF1 profiles

are

or in

554).

have

enrichment

meet were the

predictions the

and or no

×

Ontogenet regulatory model for

known

result results; identified and

their

measured available

(a three

myeloid 10

module

method,

ChIP

interactions

binding correlation the

they

transcription

in

this P

motif those

a −5 real

Such not hence the

results Second,

< d

cell transcription characterized the enrichment

supported of c factor Cd19 (hypergeometric

) Projection of mean-centered regulators 1

reasons c

) Regulator activity weight

this

is initial nevertheless supported

binding ; matches colors in in colors ; matches ‘false-negative’ Ontogenet). ×

module

B in overlaps C40,

regulators transcription

due

by

were

for 10 of

such cell of in of cell , ,

the is

Blnk

probably the the −5

most

module

C/EBP

our

a our

particularly filtering to in

and

module not (permutation

cis hematopoietic for (54%); module as

majority

model factor , ,

a data

expression the C24

of

method.

-regulatory Ebf1 is a

Ontogenet, study were

regulatory by the factor

provided ) Module C33: this.

factor

differentiation binding

a α

can A also

prediction enrichment

and

process in PU.1 controls

and and

and

associations

criteria,

(A) for factor–binding C33. 52

test statistically

result

First,

(and

be C25. any of that

indicated

the Fig. Fig.

Third, A likely various

C/EBP Cd79a of

and (SFPI1) regulators for

nominated

as motif

(absolute b

in

test)),

inter cell element

bind

and in is 90 1 assigning that hence

stem of

input.

these The

as two ); );

its module

not

to

.

of in c Ontogenet),

ChIP-based

β type they for should

- -

)

reasons.

is target (60% of

­ ­ ­ ­

occur many

itself

targets

GFI1 is highly known the

or

chosen

Pearson were data,

(90% not

only a d c b

scores B binding

POU2AF1

of

immune Pou2af1

not module HDAC10 for Siglecg cases Pou2af1 Vpreb1 Vpreb3 A Hdac10 PRDM Fcer2

transcriptionally Cd79 Cd79

Prdm Bcl7 was Cd55 Cd22 Cd19

(300 Pax5 Fcrl Fcrl RFX5

EBF1 Il12

TION Spi-B Spi Rest Ebf1 Blnk Pax5 Pax5 Tcf4 Tal-1 Igll Spi Rfx5 Ebf1

regulators;

Bl

Il9r in expressed prone

and in

b b

a 5 1 a a 1 a k cis by b

much 9 9 be of by

but

Regulator activityweight Regulator expression

interactions not fact GN

another in

for

-regulatory r

considered

regulators; of

GN an Ontogenet

vice and profile

< which

(B) MF

system

in assigned

to

551 binding a 0.5). smaller expres MF MO

versa. BMI1 many MO false- false- show DC

only

334 s cis cell

for we In 1 3 - -

­ ­ ­

HDAC1 RFX5 Spi-B POU2AF1 EBF1 Pax5 NK lated Context-specific
Context-specific regulation underlies mixed-use modules
F287 was ated the was a ated dicted unknown or regulators previously in as general, was by mediate expression as DC

0 regulator POU2AF1

The a a T8 our ‘pan-differentiation’ Tcf3

much

regulator

not B

predicted not before by

and T4

by cell

to arrays reidentification

)

T-reg embryoid associated

in possibly specifically

was one be

gdT

F289, regulator lower in S&P

NK

that unknown stem of with was

a T

this

not and the set regulator in

by PREB

Ontogenet

highest a

the and stem number

context. macrophage-specific of because the previously regulation, identified

had

bodies before

regulation because regulators

B progenitor

expressed model

regulatory and

higher

of

of modules, in

of with

known

of Among the of progenitor NK

pre-T

associated

granulocyte-macrophage

to GATA-6-deficient as a it unknown

in NK expression

bad macrophages. of

had

PRET be

a in

in

cells

which interactions

NK regulators T cells

cell a

at those,

the

probe T

relatively

cell regulator

least cells. and cells

cells

T4

module

and context

modules

with

role the regulator,

for

set.

granulocytes. Repression in

175 GATA-6

and

or

was −2 NK

same

discussed lends

myeloid

example, lineage-specific

That

A T low

XBP1 in granulocytes of

C19

(37%)

had

of T8

only

mice the C31, Expression (fold) the

e c r u o s e r expression set

support is

one

perhaps but

was

was low model. Activity in

γ

sparse 1

CTT8 of cells. δ C50 were None

4

0 KLF12

below.

colonies

agreement . lineage was

T

predicted

E2A genes Finally, not expression

cell

and to

completely

not because

Of

because and

identified

GD

(encoded the

in

Activation modules modules was

C58 T is

the

and associ

gener B

ETV5 2 inter many regu

to

with F181 F180 F179 F178 F177 F176 F175 cells pre

475 but

by its be

in it  - - - - -

reflected Most differentiation during ‘rewiring’ and recruitment Regulatory specific lineages, Although F300 and module, lators populations. example, underlie of different in reported another

Figure 5 Ontogenet regulatory model for coarse-grained modules. Lineage-specific modules (inner circle; colors as in …, except myeloid induced modules (dark purple)), ‘pan-differentiation’ induced (red) and repressed (gray) modules and mixed-use modules (yellow) and their Ontogenet assigned regulators (outer circle; cream) with regulatory interactions with a maximal effect (absolute activity weight × expression) >1. Lines (regulatory interactions) connect each regulator to the module(s) it regulates.

entiation. MYBL2 TEAD UHRF HSF2

DNMT1 PA2G ATAD

the T CBX5 2 1 GATA 4

2

cells INSM1 ACTL6

regulatory is 1 lineage HMGB in A

independently

(EGR2).

ASF1 set although 2

‘mixed-use’ by

others

our regulatory in and RUVBL1 module

This some B FOXM1

PHF5A the selected of

in

Each

model: by tree

CNBP

regulators

change change

are DCs.

of Pax5 THRA relations

C70 PLAG1 not can

activation MECOM B its

programs cases,

modules

cell Foxp3

In SOX7 in

expressed

regulators help in is provided RUVBL2

induced

B

induced another their

specific 15 23 in PHB2 12 55

HMGA 7

cells identified 14

HOXA such delineate

6 43 the in 2 SAP1

9 9

event expressed associated

11

for 1 8

CD4 HDAC2

3 5

a as context

in CHURC1

.

in

(ZFP318,

are both example,

‘bird’s-eye’ The

BUD31 47 40 the

the the

is 10 both +

NPM1 CREG

by the

themselves

MORF4L2 T same

associated 42

ability

regulation DC in 1 50 NR1H

RBBP7

Ontogenet in cells

activity of

regulatory SUV39H T mature 4 3 CEBPA

2 more subsets), reg

RFX5 another

BHLHE40 the module 8

view of ETV

(itself CHD3 cells

MEIS 1

Ontogenet 1

KLF9 fine-grained

29 weights EYA1 than

TCFE

and with ‘mixed-use’

of B B of 34

ID MAFB

and NFATC22

were 30 lineage, 54 cells and

mechanisms GM990 a

in Rag2 the

WHSC1L7

CIITA) one

59 member 1

ARID4A different

different some

CHD7 67

44 during ‘recruitment’ PIAS3,

FLI dynamic,

and 1

STAT 52 lineage. by 2 NFE2L1 AFF3 to 64

STAT

has 1

GATA-3 DTX3

myeloid MAF or identify module L in T IRF7

differ

AR

of

ZBTB16 HSF2

TRERF1 regu

T parts cells. been both 56 77

SOX1 RCOR2 that

3 For cell

the NPAS2

71 IRF8

as ID 58 ZEB1 3 25 - - RNF2 HDAC5

70 ­ FHL2 HIVEP 32 PPARG

1 28 KLF12 SP 48 16 4

fied reported or negative differentiation. 49 61 differentiation activity expression disposal) its or and In the and ity cell and CEBPB PRDM 26

IRF6

MITF were this 1 CD8 regulators, regulatory progenitor) 27

GATA6 weight

unique type, strengthened ‘disposal’ its 41 the RARB

61 IRF5 24

EPAS1 33 way,

74 INSM

+

immediate AH

disposed

has 76 75

(CD4

involvement ) FOXP3

KLF6 R

1 31

65 involvement

BHLHE4 of we

T

18 FOXN2

aDV HIF1 19 changed 62

changes

NCOA7

we regulators regulators changed cell,

A RELB PIAS3 1

identified MXD4

only interactions −

TBX2 TRIM25 of

A

identified CD8 of at PBX1 In EGR2

NCE ONLINE PUBLIC ONLINE NCE 1

KLF8

HIC1

Ontogenet

regulators of

CD8

each CEBPE (activity particular, ETV3

15 BATF3 progenitor

PADI4 − ETS1 and but (increased NFE2

)

for FUBP1 of

of stage

AE

GATA +

of does and differentiation S

HMGN

that T

which TCFE all weakened

2 RCBTB1, ZFP318 WWTR

this SMAD IRF3 modules MXD4,

3 PLAGL1 cells ARID3 C

IRF4

were SREBF

RBPJ weight

MYCL NCOA3 1 the 4

modules

TEAD1 3

ZFP263

not RUNX2 ( independently the B

VD

for T NR4A2 Fig. 6 Fig. PRDM 1 2 set DACH1 R EBF1

KDM2

have

TRPS1 SPIB from ( cell RFX5 SPIC regulatory or

NFE2L2 9

Supplementary Table 9 Table Supplementary necessarily recruited, SOX5 JDP2 model the

B NR3C2 ZFP691 EYA2 POU2AF RUNX3 of STAT

HL EZH1 RXRA CREB3L1

MBD2 KCNIP3 EOME Batf

DTX1 NR4A3 BACH2 ZHX2 PAX5

BCL11B

greater F

decreased) and (activity to PIAS3

4 a

been targets.

differentiation 1 S A

). common Fig. Fig.

involved immature TION

step.

and

To regulators suggests

2 associated

than

, except myeloid induced induced myeloid , except characterize and

involving

interactions

NFIL3

Lineage-specific module(NKcell) (myeloid cell) Lineage-specific module Lineage-specific module(GN) Lineage-specific module(DC) Lineage-specific module(MF) Regulator Regulatory interaction Mixed-use modules differentiation Module upregulatedwith differentiation Module downregulatedwith Lineage-specific module(Bcell) Lineage-specific module(Tcell) mean weight Notably, identified For nature immunology nature

lymphoid between

in ITGB3BP

that single-positive

example, that

those that that

and lower step

of before

34 their recruitment

its were the the

newly

this, that from modules whose

interactions. progenitors,

), progenitor) than

(

during regulators’ previously regulatory

Fig. with

as

recruited

cell for

double-

well identi (CD4

that

activ 6b T

each type

and

cell the , (or

c

as of ). + - -


when no been module The recruited tors cells, In in those to correspond colors type (cell ( >1. if edge, this on regulator the for changes weight activity indicate parentheses in Numbers T cells). in reported not black, T cells; in a role have to reported regulators (red, modules associated their and step differentiation each along recruited regulators right, CD8 the in recruited regulators CD8 the in recruited interactions activating those only for interactions changing’ ‘frequently ( regulation). 0 (no white, weight; activity (repression) negative purple, weight; activity (activation) positive in colors matches which below, 6 Figure and b a SMARCD2 MORF4L2 ITGB3BP RCBTB1 RCBTB1 POU6F1 POU4F2

ZBTB16 ZBTB16 ZBTB38 TWIST1 WHSC ZFP367 ZFP367 TRIM14 TRIM14 SCMH SMAD7 CLOCK HOXD8 SMYD1 RUNX2

FOXE3 another AEBP1 BCL6B TRP73 STAT4 TBX21 TCF15 TAF11 KLF15 KLF15 KLF15 KLF15 KLF15 KLF15 MYNN parent RORA PIAS3 PIAS3 MXD4 MXD4

NFIL3 LMO4 Frequently changing regulatory interactions SOX4 SOX9 CUX1 HES1 CBX7 PHF1 ZHX2 BATF ETS2 AFF1 ATF6 ATF6

KLF3

GFI IRF9 MAF HOXA9 Repressio SP4 Eomes 800 700 600 500 400 300 200 100 differentiation

proposed the CLP.BM 1 1 1 they preT.ETP.Th

GN C70

Changes in activity weights across the hematopoietic lineage tree. ( tree. lineage hematopoietic the across weights activity in Changes NK

as in

n preT.DN2.Th

MF

example, are and

repressors

our

cell preT.DN2−3.Th and were MO

no as preT.DN3−4.Th T-bet

module

model,

its a

Activity

None longer no

T

step

known reg during

longer

as T.DN4.Th DC

at cell

regulators that activators. C19

used aDV

this T.ISP.Th

regulator

regulators

the +

used

Fig. Fig. leads T cell lineage, including the CD8 the including lineage, T cell was Activatio T.DP.Th

A differentiation

at S&P NCE ONLINE PUBLIC ONLINE NCE T.DPbl.Th differentiation

1

assigned later as

) for every ‘frequently changing’ regulatory interaction between a regulator and a coarse-grained module: orange, orange, module: a coarse-grained and a regulator between interaction regulatory changing’ ‘frequently every ) for n

to T.DPsm.Th active PROB

Both Fig. Fig.

1 activators

6 T T.DP69+.Th

). Foxp3 points reg

1 B Notably,

Eomes T.4int8+.Th ) with Ontogenet-inferred lineage regulators (red and black as in in as black and (red regulators lineage Ontogenet-inferred ) with in cells

the

T.8SP69+.Th HSCs NK

and step

(for known

recruited T.8SP24int.Th at step

PRET and

because

the

CREM

example, in will T.8SP24−.Th A

that

T-bet

other mult NK

be T.8Nve.PP T4

leads the

(which cell HSCs

noted T.8Nve.MLN i

lymphoid were modules.

HOXA7 + T8 T

T cell lineage branch (squares, cell types; lines (‘edges’), differentiation steps); steps); differentiation (‘edges’), lines types; cell (squares, branch lineage T cell regula

reg T.8Nve.LN to

have

only

also

NK T.8Mem.LN cell has ACTT8 -

T.8Nve.Sp a Activated NK ) Activity weights in each cell type (columns; correspond to color bar bar color to correspond (columns; type cell each in weights ) Activity point The repressors and activators lineage of Ranking T silencing by MEIS1 ment ­progenitor T.8Mem.Sp

T cells

C42

αβ activity gd

T Cell

allowed is 1 T

7

is in . the

of

recruited

T LMO4, 44 stage).

c d weights the step

TCFE KLF6 NFE2 STAT NCOA XBP1 BACH1 C/EB PADI4 ENO1 GN + T.8Mem.L cells, T.8Nve.L T cell lineage. ( lineage. T cell POU6F1,

us P

6 3 4 gene β SC.COMP.BM

to that PreT.DN3-4.Th NCOA MA BACH1 CNB C/EB ENO1 LMO2 C/EB XBP1 CREG1 MF preT.DN2-3.T T.8SP24int.T

T.8SP24-.Th T.8SP69+.Th T.DP69+.Th preT.DN2.Th preT.ETP.T

The in N N T.4int8+.Th T.DPsm.Th T.DPbl.T

F T.DN4.Th T.ISP.Th

CLP.BM identify 4 P assigned P P to

4 β α

encoding agreement T.8Nve.S

T.8Mem.S leads

first LMO2 ATF6 TRPS BACH1 SFPI IRF5 XBP1 ENO1 C/EB MO module h h 1 P h h A 1 β b

SC

STAT CUX1, HOXD8,SMARCD2,77;69;78 78; 34;18;71;75;81 MXD ZHX2, Sox9,18;37 KLF15(3), POU4F2,Foxe3,AEBP1, TWIST CLOC GFI1 KLF15, PHF1, HES1 ATF6, 78 ATF6, MYNN,KLF15,TRIM14,71;78;73;34

differentiation Activated CD NFE2L1 RE INSM CIIT XBP1 RBPJ IK KLF6 ETV3 RELB DC p ) Enlargement of boxed area in in area boxed of ) Enlargement

to and p B c

L for T c α T.8Nve.Sp.OT ) Known and previously unknown unknown previously and ) Known A

LMO2 PHB2, CNBP,TRIM28,RUVBL2, 4 , .8Nve.ML ). 1 4 multilymphoid , TRP73,

K (2), TBX2

MEIS1 1 , CLP.BM , ZFP367(2),AFF1,WHSC1,ZBTB38,TRIM14,SP4, C42.

, 68;71;52;56;35;62;6778;14;77;23;15;65;17;66 BCL6

ASF1 PA2G4 SPIB FUBP EBF1 UHRF CBX5 HMGB PAX5 POU2AF each rank

with BAT B , PA2G4,NPM1,RBBP7,PHF5 1 RORA, 19 , RCBTB1, 8 B B 1 1 F T N

2 + Smad7 , SMYD1,59;73;19 IRF9

, RCBTB1,PIAS3,ITGB3BP,

T

1 MEIS1 regulator

regulators d the

1 during .8Nve.PP ) Simplified ImmGen tree tree ImmGen ) Simplified , SCMH1,CBX7,TCF15, STAT SGM9907 ID KLF2 AE SMAD ENO1 ETS1 TBX2 EOMES NK , PIAS3,42;69;67;77;54;38;71;57;37;45

2 ZBTB16 step S reported 1 6 Not reportedinTcells Reported Tcellregulator 3

is MORF4L2, 20

differentiation

with progenitors,

at (2), TAF11,KLF15(2),KLF3, abT SP100 FUBP ETS1 NPM1 TRIM28 BCL11 LEF1 ENO1 TCF7 AE later

e c r u o s e r as

S each Sox4

HLF 1

lineage

B activator methylation

; 39;79;38;66;77 NKT LEF1 STAT TBX2 NPM1 SP100 ENO1 ID2 ZBTB16 AE GATA3 no A ,

differentiation S

NFIL Runx2

1 1 a longer of SREBF2 TCF7 GATA3 HIC1 ENO1 ETS1 DTX1 STAT AE MBD2 3 gd

activators

, S T at ETS2

recruit

4 toward

which

, used

MAF and  , -


differentiation cytes, NR4A3 in regulate stem previously addition, and T with example, known 10 Table Supplementary and ( experiments ( genotype per mice two of a minimum with litters independent three to one with experiments three of representative are V by ROR (CD24 mature by (left) IL-17 ( ages. different of mice for obtained were results Similar each. in cells percent indicate (bottom) quadrants in as mice CD2p-CreTg 7 Figure including tors the their lymphoid regulators including regulators increased tors most identify all presented stage. ing by it CD8 ELF4 higher

can

cells;

Finally, a the macrophages, regulators

3 5 a HSC

γ

γ Smad3 were T cells ( 10 ) T cells ( 10 ) 2 γδ × γδ × were +

10

stem t repressors and limited

substantially activity

+

+ deemphasize 0 5 0 3 6

POU2F2 (Gm9907; T model Those

cells (top; right number), or IL-17 or number), right (top; cells T cells from the lymph nodes (LN) of 4-week-old mice as in in as mice 4-week-old of (LN) nodes lymph the from T cells

expression

regulators

POU2AF1,

(shown cells those

Spleen Thymus cell-cycle ETV5 is a regulator of of a regulator is ETV5

regulators

progenitor

the our disposed

the a by recruited

progenitor above and from GATA-1,

Id3 were . Numbers above bracketed lines (top) indicate percent CD24 percent indicate (top) lines bracketed above . Numbers +

associated with Etv5

b

2

counting

window in

and

model are 3 known weight ) or three experiments ( experiments three ) or model

and ). 2 differentiation progenitor

(Oct2;

1 a

+/+ 0) Notably,

) to

recruited

NK ( on

more

shown captured

modules) of CREG1; Fig. Fig.

and

in

littermates (control (Ctrl)). ( (Ctrl)). (control littermates ( Sox13 progression

CKO Ctrl of be at

the

Supplementary Fig. 11 Supplementary Pax5,

HOXA7, each

to made

c- cells, the monocytes); cells; associated T

diminished

stage, at

of

the KLF13 the

reported induced 6

with

nuanced coarse contribution cell

).

the

differentiation

to

lineage

although

( basis b

EBF1 changes

). HLF;

early cells; Supplementary Fig. 11a Supplementary

many

and lo

in to

regulator by

Events

at

control those double-negative-2 and four ) ) V In

γδ CD44 (% of )

HOXA9

(a each DCs, coarse points modules the 100

10 10 10 10 γ

GATA-3

this

20 40 60 80 2

T cells. ( T cells.

in in

to

in

Bcl-11B, of way—for and thymocyte

0 0 regulator

c-Myc, 2 3 4 5

among + N-Myc predictions

regulators

thymocytes of mice as in in as mice of thymocytes

to

1 0

be in CD24 CD62L myeloid lineages, viral recruitment in granulocytes,

differentiation 16 20 30

this

the c 10

ATF6, way

the

0)

of Spi-B

). B GATA-3

activity upregulated modules at 2

10

and cells,

+ important

Ctrl by

‘pan-model’

infected but

(that cells (bottom). Right, expression of CCR6 and CD27 (top) and of IL-17 and interferon- and IL-17 of and (top) CD27 and CCR6 of expression Right, (bottom). cells entire the which proliferation N-Myc,

a and 3 we

(

) ) Total TCF7

example, Supplementary Fig. 11b Fig. Supplementary of

coarse 0

with ETV3,

cells HOXB3.

4

top-ranked ). 25 30 79 not

progenitor such were

correctly

10 ZFP318;

Thymus PLZF B weight

of

b is, For and

5

) Expression of CD24 (top) and of CD44 and CD62L (bottom) by V by (bottom) CD62L and CD44 of and (top) CD24 of ) Expression cells

regulation at T

model

lineage and 1 γδ and present

B

DACH1 8 GATA-2 their

DCs

21 2.5 2.8

cell

disposed as

),

modules,

T cells in the thymus and spleen of mice with T cell–specific ETV5 deficiency (CKO) and their their and (CKO) deficiency ETV5 T cell–specific with mice of spleen and thymus the in T cells during example, the

regulators step the cells;

SKIL,

with

as

Bach1

).

and that

analysis GATA-3 disposal the Eighteen

CKO

precursor 19 and

acting At known γ and captured

activity (‘edge’),

δ

, regulators

(

activators.

2 stage, in Eomes,

NKT T

i. 6 Fig. occur following:

the 0

T macrophage NR4A2

(reported

);

2.7 in

and

a 74 cells 96 is the and

homing

; numbers above bracketed lines indicate percent ROR percent indicate lines bracketed above ; numbers of 19 cell including in

only common NK ‘rewired’ captured is

γ with analysis

cells.

(that

mature regula regula

δ and

MEIS1 (across mono 2 weight we useful, d

NFE2;

a 2

stage, many stage, T-bet

T

; numbers in or adjacent to outlined areas indicate percent cells in each. Data Data each. in cells percent indicate areas outlined to adjacent or in ; numbers c cells,

with dur

and and can

For not cell

α

28 In is, Events Events in to of ). β lo - - - - (% of max) (% of max)

(mature) thymocytes (left) or CD24 or (left) thymocytes (mature) 100 100 20 40 60 80 20 40 60 80 0 0 est CD24 expression the in T development regulator are in and model expression of the manipulated tion latory To regulates ETV5 grained (terminally module the number effects. regulatory double-negative-2 in steps ated; with been than Overall, IL-17 ROR

1 0

9.7 cell To

the distinct T

small-intestine 10

test expression among differentiation

γ Ets

ETV5,

in 2

δ γ cells

no partly at t assess also with

lo

10 development

differentiation Ctrl T activators

assigned

that

one

family The

3 model),

lower cells, modules

other of ‘rewiring’ cell

of (CD2p-

0 called

the

regulatory the thymic

4

is

‘rewiring’ due

of are of

lineage. the

49 67 10

Thymus differentiated

regulatory γ antigen

in in vivo highly thus

they

δ

5

cells the (more-differentiated) in largest earliest A the

member

to

specific

regulatory T several NCE ONLINE PUBLIC ONLINE NCE in

32 ‘leaves γ gd of help

the

Cre

δ

model’s

cells,

far. was particular DCs, precursors,

T steps, acquire cell differentiating

thymocytes and lineage-specific T cell differentiation T cell A

in

restricted cell

CKO

Tg diminished

changes of number

in

to

determinants

practical

more

a

regulators

surface

to but model +

in ETV5

γ thymus function

cell identify these precursor compared Etv5 δ

8.8

predictions role

42 the cells; the effector c T

ETV5

) Expression of ROR of ) Expression prominent type–restricted in

cells fl/fl

hi

transit

(a

of

which

to γ

cells, of tree’). (immature) thymocytes; numbers in numbers thymocytes; (immature) differentiation and

for

δ criterion marker

expressing

change γ

67% power

more activity

mice).

the IL-17 CCR6 ETV5

T δ in

10 10 10 10 10 10 10 10 to

from in

γ

has

fine

stage, 2 0 0 functions with its

T 2 3 4 5 2 3 4 5

A

cell

of γ mice

modules these

+

possibly levels, TION γ from

0 modules

t raised The IFN- CD27 thymocytes from 7-day-old 7-day-old from thymocytes

− cells,

δ versus

a in vivo in 45 42 the cell predicted cells (top; left number) or number) left (top; cells

not 10

at ; error bars, s.e.m.), five five s.e.m.), bars, ; error

to

modules them T in in lineage.

As CD24

γ 65%

2 weight

was

which

higher

detect lacking

cell

individual modules, lineage. 10

γ

activity immature type–specific

Ctrl

γ been

thymocytes δ liver

although the nature immunology nature -chain 3 manner.

2 , T 48%).

that 10 for steps (terminally

with due 9.8 lineage. 4

we γ F287

.

(CD24 γ t and production of production t and 4 cells,

possibility 43 suggests

(IFN-

changes ETV5

changes

Both 10 levels linked and the role

identified

centered ETV5

the 5 LN

weight

to

Sox13 leading variable no

only and Thus,

coarse-grained we 18 14 We

cells differentiation

tissue-specific lung γ this

as gene hi

are

; bottom) ; bottom) Although in known

has

)

that

to analyzed

regulation.

in

F289, specifically

a focused

two, substantial were the is CKO

to

differenti for

expressed

with may

to regulator

DCs the that cell γ

on could its a

region

δ a

express

mature lineage known ‘leaves’ 82%

18

Sox13

larger T

high func

those regu

types as

74 fine- have high

they

and cell the

on

be its γ of

δ 2 - - - -


factors such have and view The DISCUSSION regulatory a enrichments cis to regulators The module’s heat present them expression, grained latory org/ModsRegs Regulators’ ing part model, To portal ImmGen the on model Ontogenet the Studying thymic model effector were induces mice effector deficiency the ETV5 portion bottom). sion T marker activation, ETV5 specific spleen higher, cytes ETV5 neonates, of in absence IL-17-producing one leukin γ (V δ

table cell–specific

control ETV5,

γ regulate T

motifs 2) facilitate

and

residual

prediction

(arguably)

ImmGen

module

of

map),

of of cells

impaired

processes, used had

that or of

deficiency

deficiency deficiency

with model

17

and similar

the which

the

maturation of CD62L

we that

the

visually cells Il17 of loss cells of the the or

mean

the For

lower

lymphocyte (IL-17)-producing in expressed mice

total

and

that

mature ETV5.

as those ImmGen

program

transcriptional

provide the were

expression fine-grained data demonstrated

the

the

cell

T a

transcription),

of control

postnatal mature (

of overall page (

indicated

γ exploration

Fig. Fig. 7

resulted of Fig. 7 Fig. data ETV5 cell

gene

δ

in

/modules.htm

to

activity

of of binding expression mature assignment are the

(their

(a number

module,

expression

type from

inspecting browser thymocytes

the impaired γ

the ( data

that any

the was

was δ In

antigen Fig. Fig. 7 marker also

cells

set members

module.

c of effector the

c they

number

of deficiency

the ). mice thymocytes portal, generation them. ImmGen.

in ).

model

CD2p-CreTg

developmental global

~50% the

activation) provides in

in not to

Mature

of

mice. all

weights V

These links

by which or full

b

events

γ

of

of an γ control the modules , its analyze δ contain,

2

aDV IL-17-producing modules, the with receptor,

projected top). activated activity that of different

and

+

in of

T

the

the of thymocytes

and

cells The abundance Finally,

set

with genes

was

thymocytes of profiles of that

to

Most

regulators

of the

l cell

all γ results lower

) the the A they V

δ

and ImmGen

ETV5 results. γ

of conditional is normal, of

the a of NCE ONLINE PUBLIC ONLINE NCE testing This

It the

both

γ

of δ genes

would

effector unique that

2

naive mice

relevant

antigen list

expressed in the

on

the

transcription T

each generation the allows +

of

and modules

CCR6 by

a

V the

+

most are

we module, which thymocytes

expression cells and

cells from Etv5

mice on

V γ any

may

regulator supported to of

thymic the

cell–differentiation gateway

2 regulatory their was

module

γ

to

correspondingly

predicted

+

( induced. provide

Specifically, 2

be

regulator of state) the

Fig. 7 Fig. of modules

the

but in

suggested +

cells

the

portal mammalian +/+ generated features detailed + intrathymic coarse

in

cells in

the

have

receptor

γ

thymocytes

CD27 that particularly with tools

constitute essential

δ other

mice

lineage

T

mice number,

genes γ mice and other

littermates): the

Ontogenet T the and δ

user differentiate

on cell–specific a of to

in

cells

been effector

of genes,

). links

predicted of (

T

for and

from −

factor the frequency

http://www

For

regulatory the

regulators

IL-17-producing

with regulators the

those

peripheral

However, in to

with with circuits controls.

and

IL-17-producing predictions cell–specific CD44 of

V

in

was tree to

was by searching,

in

the each prediction

A

γ the

for due fine

Ontogenet each

for

Ontogenet thymus

the

nearly TION their

chains,

immune ETV5-deficient

development

browse

the comprehensive

and

from

impaired T enrichment T T

ROR

(as about

cell

higher cells

similar ‘Modules

proper transcription code.

downloading

(the

to

modules,

cell–specific cell–specific cell–specific in

module, cell,

process. thymus module,

underlying to

in

functional

deficiency into

inefficient

pattern

of

Critically, subset.

7-day-old

mice there predicted (each

γ .immgen.

half model γ

( 2

nominal t

Fig. Fig. 4 regulate the

δ Fig. 7 Fig. 4 twofold

and

thymo

coarse-

expres (which brows .

system

of

to T

of ETV5

intra Thus, inter in

regu

of

have

with

pro

cells

that

and was and

as

our our We the the the the

we

a

all γ γ of as of of b

). δ δ a ­ ------,

tion ferentiation The regulators cal controlling the subset. unknown Among sequencing enrichment correspondence regulators tory are poiesis for regulators tiation allowed in by were regulators ‘mixed-use’ karyocyte, tains use’ humans coexpression shown dium tial program files that Another ment Those criteria their the functional, cell extensive as regulators arbitrary Ontogenet regulators steps izing directly the an mature

Ontogenet’s Our A As all

regulators entire

lineages

context

at recruited differentiation activation

mixed maintenance

states

are in (such ability

similar

function.

factors

across

at many expression, active with of lineages least suggests

may

stages the before,

Additional

analysis

and stage.

which

(for assigned regulate

those, us modules

and

challenge

lineage is

3

autoregulation

role resolution

and

were at , that

and as all use

to basophils

with 2

mouse

cannot

175 although unlikely

at supports

the

more

to 5 in patterns be multiple has for example, in windows

study

whose

the

mice

Rag1 to

identify suggest and

However, expression-based

of which

of patterns

regulatory

for in to

identify we other of global

Our their

rich control that captured only

expression

more cis shared

control

identified has similar the

appropriate

among

helps a to other

identify

the

modules

and of cell in

ETV5

to experimentally

4 -motifs

program

studies directly of and of module.

is .

modules

ability

most model expression expression

to

One cis lineage

correct automatically

one

the

follow

have regulation some

and

regulators, modules.

posed

myeloid

human

IL-17-producing regulators that of regulatory types, profiles, Tal1 helps some

to

hold

-regulatory functional across such Rag2

regulators

are a integrating

expression probably

in the

this

by

lineage identify eosinophils, global possible control

in specific

modules been should and

,

such

may of distinguish

by points the

to

allows

which

them

by

to true is may

otherwise our The

other ‘rewiring’. function,

identity it predicted

with

in developmental

hematopoiesis

genes

lineages. automatically ambiguity,

lineages,

modules

that

enrichment detect does

genes differentiation lineage-specific

of

For

observed module

be

regulatory is

occurs, principles methods

we

not complementary programs for

have

dense

in ‘rewires’

as

determine effector

high similar

transcription

gene

tested in regulators shut encodes us reason ‘false-positive’

mammalian

that

motifs

whereas

reidentified

are

example, can in across

not profiles

they some

those

associated

the with or ‘early’ to

the

γ

has

been quality

causal

In regulators,

γ lineage

program

δ

whereas

broadly act off C25, identify C5). to interconnected include function,

δ predict

and

lineage ‘unfold’

lineage–specific but

in

used molecules. particular,

cells

for Ontogenet

effector unique

and most suggested genes.

modules of

the

generalize by active interactions a when

only

programming 3 humans

filtered the

stage

whether

reidentify to

The

known the may C45

distinct confirmed

has ATF6

directionality. the ChIP

of of

model’s

binding for

to

specific.

be circuits signatures

factors

before substantially

mouse those

many

the erythrocytes, our the e c r u o s e r the conserved regulatory or during

at

results. at inferred and suggested diminished

analysis

expression cells predict and

not

and which such cells assigned known

lose

which

specific followed

was

3 but γ by specific

mouse

new

HSC

some

Finally, additional

.

expression ETV5

δ

broad ‘rewire’.

be

allows

differentiation Notably,

the C49

predictions

of

with important T transit

many of

our by

data

as that

are their

As were

circuits specific a

an

reflected many predictions. effector Conversely,

the

regulation,

regulators.

regulator). modules. significant

previously regulatory selectively of functions,

regulators those regulators and transcrip

but

program.

stringent compen

has probably regulates activator

differen

substan

To between hemato together roles controls

general set biologi

by

profiles

and enrich regula several known known

‘mixed shared mega

to

of many gene-

was avoid

been roles

deep

con

pro

that

and dif

cell

the the the

for for

by in

 a ­ ------­


E Gundula Adriana Ortiz- Tata Daphne Koller The complete list of authors is as follows: Adam J Best O.Z. input V.J., and (A.R.), (A.R.), U54-CA149145 to Supported A. and Analysis with We Note: Supplementary information is available in the code. Accession version Methods M in an browsing and dium, removed do used to methods of ate inherent of lineage) some ontogeny. interpretation. or each cell, stimulated as dictors data a T 1 AUTHOR CO A

0 ck

T

mmanuel

cell–specific

ethods the

Liberzon

Ontogenet

cell

in γ

knowledge revise conditions

invaluable

not

thank

into the layout T.S.,

δ cell–specific

provided data

regulatory n

high-dimensional

obtained

from

case, ImmGen to

Lirnet

T parts o N the the

National differentiation coarse-

correspond

wle are for

A.R.

refine cells

the of generation

ageswara ageswara Rao the

and

by Merkin Klarman

of

lineage would an T.S., without and

and

(Molecular the Although support

t

used

the

state, remain he National ImmGen

6 and M same dg

the N

), existing

or

Consortium),

A.R.

and

some is

resource

TRIBUTIO

ability

for with

the Science lineage L

can pape in-Oo of any

when

men motif-related repressor Notably,

and relations D.K.

Foundation

applicable other Gautier as be

Cell activator

flexibility, 149644.0103

or GEO: the downloading of

cell and

and topology

disconnecting

1

both

candidate L fetal

associated needed Institute

unstructured to r ImmGen;

designed core , , Tal Shay ts Signatures fine-grained for progenitors

.

much opez

to tree;

Observatory

Foundation

cells processing; D.K.; particular

one. type,

any innate-like and 14

for

phosphoproteomic

share

protein-expression microarray team

samples N Ontogenet

enhance

, Charlie , C Charlie Kim are

S. the in 12

S

changes

for T.S.

are

future 15 13 to In immunological

in of

code Hart is but

to

of the

, , Amy

whereby the L. to US

all (including

Allergy ,

, , Diane Stem known regulatory

Database) other regulators

the 16

analyzed other

construct

measured the Gaffney

V.J.

references

study;

reflects eBioscience, for and at

(DBI-0345474 National available

, Claudia , JakubzickClaudia

2 T (A.R.),

progenitor or

, , Aviv Regev

are

B (for studies

the

and the the Cell

modules cell ontogeny lymphocytes). initial

in participated

for differentiation

cell and now data, cases,

W

V.J.

about

not

D.K.),

ImmGen several regulatory lineage.

the Research for

power example,

M. precursor the

for cancer

M agers

Institutes Infectious in

module

layout developed

(for

an programs

help

Painter

depends data;

in for

of the Howard

known

athis GSE1590

part

Affymetrix are

Ontogenet’s

on the

mass disease. the ontogeny

the

both

and data

to

by positional

of interactive example, 14 with

and line version of the pape

cell

K.S.

of 12

at in

studies, Burroughs The V.J.

available

simply

, ,

hematopoietic

a and identifying

portal role

all of the

the

13

, , Tracy Heng writing; C71.

Diseases

L

2 Hughes

identified

module cytometry

programs

figure the (for

given did

the types and

help This (for Health , on

, , Christophe Benoist ewis ewis

5

DCs by lineage data Broad S. ImmGen 7

, ,

of

algorithm, and

. experiments;

Davis) N

D.K.).

a resting gene example,

automatically related

and

when

the

genetic

preparation gene example,

preconstructed with

reflects

X.S.

adia adia Cohen

cell

can Wellcome

Medical in sets,

Expression output (A.R.; (R24

searching L Institute in 15

tree; C57

sets

present

the will

L for , , Gwendalyn J Randolph

generated

‘edges’

and

the differenti

type. regulation regulators data

biological other

AI072073 anier including

compen state cell

for

and and with

help

myeloid

and variants

lineage, provide

Institute

in

single-

can

online can r

(A.R.) mouse. 13 2 Fund

.

those

types

6

New

part pre ). lack that , , Jeffrey and and was 14

be be or In 9

- - - , , Jennifer , , Jamie Knell 11

, , Patrick Brennan 7. 26. 25. 24. 23. 22. 21. 20. 19. 18. 17. 16. 15. 14. 13. 12. 10. 9. 8. 6. 5. 4. 3. 2. 1. reprints/index.htm R The all manuscript; a 11. CO

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

mouse

authors.

13

M Segal, E. Bendall, S.C. transcription. gene in dynamics temporal control: Impulse A. Regev, & N. Yosef, K. Narayan, TranscriptionH.D. ELF4 Lacorazza, factor & Yamada,M. T.,Mamonkin, C.S., Park, S.V. Outram, Neumann, M. Wang, T. differential T.T.-H.,Virus-induced Kino, Tailor,Chang, & S.S.M., P.,K. Ng, Ozato, J.-W. Lee, H. Ji, ICER/CREM-mediated S. Sakaguchi, & B. Diamond, Z., Fehervari, J., Bodor, H. Kishi, Yoder,IHH & and M. VEGF Richardson, M.C. L., Yoshimoto, Huang, M., M., Pierre, hematopoiesis. SnapShot: L.I. Zon, & S.H. Orkin, T. Tamura, ilr J.C. Miller, O. Ram, A.P.Capaldi, S.-I. Lee, circuits: regulatory Transcriptional A. Regev, & E.K. O’Shea, T.,Shay, H.D., Kim, Shay, T. N. Novershtern, stem hematopoietic the from commitment lineage Myeloid K. Akashi, & H. Iwasaki, Heng, T.S.P. Zou, H. & Hastie, T. Regularization and variable selection via the elastic net. elastic the via selection variable and T.Regularization Hastie, & H. Zou, lineage. cells. human in analysis location genome-wide Hog1. MAPK the data. expression gene from regulators specific Genet. alphabets. from numbers predicting mouseimmuneandsystems. hematopoiesis. human in states cell. cells. immune in acrosshumanahematopoietic continuum. Cell distinct KLF2. and CD8 of homing and proliferation the controls development. theofgeneration andmaturation dendriticof cells. vaccine MINOR.efficacyblockade ofvia of production. interferon to Implication cells: dendritic in coregulators and receptors nuclear of expression 420 p21Cip1. and 4/6 Cdk D, cyclin of control the progenitors. haematopoietic Immunol. J. Eur. cells. T regulatory by suppression in role its and IL-2 of attenuation transcriptional cells. B in Pax-5 and cells T in murine embryoid bodies. andGata-6–deficient in Gata-4 hematopoiesis definitive rescue (2008). diversity.functionaltheir developmentand Methodol. Stat. B Series Soc. Stat.

E authors P , , ricson E

N , 91–95 (2012). 91–95 ,

144 Immunity model; TI M

atalie atalie A Bezman 5

et al. N Exp. Hematol. Exp. t al. et γ et al.

Nat. Immunol. Nat. , e1000358 (2009). e1000358 , , 886–896 (2011). 886–896 , and declare δ et al. aDV iller et al. et G G FI t al. et T cell subtypes. cell T t al. et et al. et 9 13 , , Ananda Goldrath

t al. et t al. et Conservation and divergence in the transcriptional programs of the human et al. et al. et al. et J.K.

Inhibition of activation-induced death of dendritic cells and enhancement et al. et al. et V.J., Cell Cycle Cell opeesv mtyoe a o lnae omtet from commitment lineage of map methylome Comprehensive Module networks: identifying regulatory modules and their condition-

t al. et N A Lineage-specific regulation of the murine RAG-2 promoter: GATA-3 promoter: RAG-2 murine the of regulation Lineage-specific l , , Katherine Rothamel 26 . obntra ptenn o crmtn euaos noee by uncovered regulators chromatin of patterning Combinatorial 15 15 DACH1 regulates cell cycle progression of myeloid cells through cells myeloid of progression cycle cell regulates DACH1 erig pir n euaoy oeta fo eT data. eQTL from potential regulatory on prior a Learning NCE ONLINE PUBLIC ONLINE NCE A

t al. et

no designed Single-cell Mass cytometry of differential immune and drug responses

The Immunological Genome Project: networks of 37 Nat. Immunol. Nat. Nat. Immunol. Nat. Differential expression of Rel/NF- Nat. Genet. Nat. T.S., eihrn te rncitoa ntok f h dnrtc cell dendritic the of network transcriptional the Deciphering Intrathymic programming of effector fates in three molecularly three in fates effector of programming Intrathymic F rgltr fco- ad 8 oen edii cl subset cell dendritic govern -8 and factor-4 regulatory IFN , 726–740 (2007). 726–740 , N Structure and function of a transcriptional network activated by activated network transcriptional a of function and Structure , , Brian Brown , 16

CIA competing KLF13 influences multiple stages of both B and Tcell B ofboth stages multiple influences KLF13 , 884–895 (2007). 884–895 , 11

esl itronce tasrpinl icis oto cell control circuits transcriptional interconnected Densely

FEBS Lett. FEBS , , Paul A.R. 37

13 , , L 7 , 1038–1053 (2009). 1038–1053 , M I , 2047–2055 (2008). 2047–2055 ,

, 888–899 (2012). 888–899 , the

N Nat. Immunol. Nat. and Nature Proc.Natl.Acad.USASci.

ichael ichael Brenner T 40

experiments E

financial M

14 Blood D.K. Cell R , 1300–1306 (2008). 1300–1306 , 9 10

585 , 1091–1094 (2008). 1091–1094 , E

, , Joseph C Sun onach

467 , 618–626 (2009). 618–626 , STS 67

Science 144

wrote

, 1331–1337 (2011). 1331–1337 , , 301–320 (2005). 301–320 , 95 , 338–342 (2010). 338–342 , 15

Science , 296–309 (2011). 296–309 , A interests. 9 , 3845–3852 (2000). 3845–3852 ,

J. Immunol. J. Blood TION , , Vladimir Jojic , , 13

and the

17 M 325 + , 511–518 (2012). 511–518 , T cells via the Kruppel-like factors Kruppel-like the via cells T Nat. Genet. Nat. Cell , , David A Blair

Biochem. Biophys. Res. Commun. Res. Biophys. Biochem.

manuscript iriam iriam

13 κ participated

113 Blood , 429–432 (2009). 429–432 ,

332 B and octamer factors is a hallmark Cell ,

147 11

, 2906–2913(2009)., , 687–696, (2011).

, , Francis Kim 132 174 95 110 , 1628–1639 (2011). 1628–1639 , 14 M http://www.nature.c , 277–285, (2000).

34 , 712.e711–712.e712 , , 2573–2581 (2005).2573–2581 , , 2946–2951, (2013).

, with erad

in , 166–176 (2003). 166–176 ,

writing

input 1 15 , 18 ,

from

the ,

12

PLoS J. R. J. ­ om/ ,

Children’s Hospital, Boston, Massachusetts, USA. 21 New York, USA. of Medicine, Boston University, Boston, Massachusetts, USA. Institute, Mount Sinai Hospital, New York, New York, USA. Boston, Massachusetts, USA. USA. of Technology, Cambridge, Massachusetts, USA. 9 Kutlu J Derrick Rossi M Division of Biological Sciences, University of California San Diego, La Jolla, California, USA. Department of Biomedical Engineering, Howard Hughes Medical Institute, Boston University, Boston, Massachusetts, USA. ichael ichael 12 Joslin Diabetes Center, Boston, Massachusetts, USA. E lpek L Dustin 19 23 Fox Chase Cancer Center, Philadelphia, Pennsylvania, USA. , , Angelique Bellemare-Pelletier 22 , , 18 N , , Susan A Shinton

idhi idhi 14 aDV Department Department of Microbiology & Immunology, University of California San Francisco, San Francisco, California, USA. A M NCE ONLINE PUBLIC ONLINE NCE alhotra 11 3 23 Division Division of Rheumatology, Immunology and Allergy, Brigham and Women’s Hospital, Boston, Massachusetts, , , Katelyn Sylvia Dana-Farber Dana-Farber Cancer Institute and Harvard Medical School, Boston, USA. Massachusetts, 19 , , Richard R Hardy 16 A Department Department of Pathology & Immunology, Washington University, St. Louis, Missouri, USA. 13 18 TION Division Division of Immunology, Department of Microbiology & Immunobiology, Harvard Medical School, Skirball Skirball Institute of Biomolecular Medicine, New York University School of Medicine, New York, 23

, Deepali , Deepali 3 , , Joonsoo Kang 20 Computer Computer Science Department, Brown University, Providence, Rhode Island, USA. M 19 alhotra , , David 10 Broad Broad Institute and Department of Biology, Institute Massachusetts 23 3 L , , Taras Kreslavsky & Shannon Turley aidlaw 20 , , Jim Collins 23 22 23 , , Anne Fletcher Program Program in Molecular Medicine, 21 , , Roi Gazit e c r u o s e r 15 22 Icahn Icahn Medical 23 , 17 ,

Department Department

was The 3.9 genes, coarse was of measure. modules separate modules with although data. of cal (SPC) grained modules. of Definition 10% sion lineages. induced in one for lineages NKT locyte, signatures. Lineage-specific September which with set formed. pipeline Data preprocessing. continuously reconstruction, each lists ples (244 the clusters connected only 2012 of Report March The 2012) three ing, Protocols/ImmGe populations arrays set. Data METHODS ONLINE grained

Clustering Each As the September

MATLAB

clusters

each with setting.

regulatory

fine-grained

lineage,

regulatory used

clusters are minimum 1 all in was

a to

default

cell sample release. cell,

The the

2 different

with

algorithm (Affymetrix

resulted s.d. because

(23 7

modules. macrophage,

2011 that papers

the

,

the

modules).

from of

presented if is the For coarse-grained

cluster

modules, a

A applied

by by

data C1–C80); The

types), because

Expression

γ

they it

clusters in

principled value

the

attempts coarse-grained

δ to 2011 gene

but data

SPC lineage samples

followed

had affinity highest

the genes

are

parameters, releases,

T and Supplementary Table 2 Supplementary

if

Ontogenet software).

updated.

the resulted

the ‘self-responsibility’ set

2010, that 6,997

program. cell it with program it

number

ImmGen instead

can

(C81).

available significantly

the was of above

release robust was

Affinity

to to must

excluding modules the

earlier was of hematopoietic

n

SPC

with refer

defined annotation on and

the

than clustering

the

form

Cell to propagation

7,965 mean the the genes presented

monocyte, have

considered with in

by

Expression

used

regulatory

approach

of chosen

used

the

maximize work in

0.5 676

and

stem analysis the

post-hoc

more

identifies ImmGen to and

multiarray

of

For

remaining mouse in module

prep reconstruction

analysis

was

The

which

a 334 propagation on

Modules

releases

744

unique

full

variable

expression with

for them.

because

new

fine were samples modules) all last by hierarchical in

control

with 8,431

each and

the higher

applied

We heat than

to

and fine-grained

the

other

samples, tree version generating SPC was

heat

an for ImmGen DC,

modules

resulted cluster) genes of

repressed ensure

data ImmGen

program nested Clustering

analysis backward clustering progenitor calculated

tree. a but was

entire genes

gradually

parameter

of sort

expression the in

variance

maps

(September

one

(210 choosing ‘sparsified’ it

were

degrees

done are expression

(195 average unclustered maps lineages.

B

samples.

the . the

were used to does

was and

One-way ing was further cell, 31). number

could Thus,

probe

stable consistency

with

647

was the data

cell in and

clustering

in presented

April

11

hematopoietic (functions nested defined

release

SOP.pd

by retained.

if

and the are

website normalized

the

measured

Sorting only NK not

2 compatibility 80 modules a

P

signatures

was

ImmGen grew 8 it lineages,

of cell.

set types).

of method. final stable

(which , a not

set we

A single

value values

set

superparamagnetic

with For from had

stable modules,

affinity

across

partitioned difference 2012

maximum tree. in cell, require

which

2010,

inherently

false-discovery analysis compactness.

some

at

done

were

in maintained

genes

on

Assignment be

that f (April

from

presentation by

). significantly 0.01. simplicity,

(

strategies

here

did clusters correlation above a

http://www.immgen CD4

Of

the anova1

release.

used The Supplementary Table 1 Table Supplementary the coarse-grained of with

indicates clusters (F1–F334).

coarse-grained

on

clustering.

a on March lineage used a

of release

Data

remained matrix.

as

for

all those, 2010

a not

were range gene

September

Affinity array, of

+ include

cell 2012) the

regulatory Affymetrix the

ImmGen

part as predefined

in

for

the log T genes. 11

was

variance

supported for

into

further

and much from

expression cell,

were

samples

ImmGen

types) to for 2

(coarse-grained

lineages:

clustering

membership was 2011 than

grouped other of of only

of (120) only

of

only the

the

SPC

and 11 as 2012,

were lower rate

multcompare

the the

March

fine-grained samples CD8 755

propagation On in

parameters,

For a

the as considered

log propensity

in

clustering, (7

probe

clustering hierarchi

website and

the

720

break that 2010 in states the

ImmGen was ImmGen ImmGen

(FDR)

program

possible. was

module.

Mogen1

done average,

all samples number 2

to

module cluster +

coarse- coarse- expres affinity

at release granu

by

into

trans

T

probe

in

2011,

other April April learn of sam done

were .org/

least

into cell,

and sets run

the the the

for

on

all

in of

is a

oue eltp seii vrac estimation. variance in specific cell-type Module (modules) much parameter weights to ate similarity activation a bone lines, knowledge lineage the building. tree Hematopoietic ( regulator ing sufficiently that ing and 2008) in mouse, genes used the regulators. candidate of Choice tory specific where member of linear be members, can ule. between are related possible aims approximates the regulator linear types. a into expression program. regulatory Module or population, macrophages). this Supplementary Table 12 Supplementary

predefined node

TRANSFAC have More Notably, a arbitrarily a assigned

constant inferred regulators’

members following were or

be

That hypothetical

genes regulator modules

program were given

as to 3

marrow,

Fig. Fig.

annotated

combination smaller combination It

0 each

ChIP assigned

cell

in

tree, a and

human specific candidate

to then explain

formally,

cell while of

includes

removed, common

in

in

that

1

the not but

with step, tying

a × types

profiles

cell

available all

module ).

ε

(s.d. experimentally

to

each

one

nodes regulatory

combination across and followed m,t

types (regulators)

set

constructs

(

There

(the

of

the than sources:

learned r regulators Supplementary Table 13 Supplementary regulators tree

expression it.

measured

and passed

remaining matrix

For

type or

to

published is in

time

regulators

of enforced

as the < in of with

population the cell

expression

a

we regulators coarse-grained

a

regulators rat;

0.5

a

candidate (

(square,

across progenitor the the of were

the

long-term much all unless Gaussian

the of Fig. Fig.

combination

m

are

cell at ImmGen

s sublineage

type.

(as model

regulators the

,

across by the

conditions.

by t m database genes

the 2 ontogeny. present,

nominal

mouse in edges

program populations two

, 3

a w X type

added

in

are that

microarray

gene-ontology

× ). Ontogenet on cell

multiplied many

by

expression regulatory of t i

is of

). data they A as , cutoff. that

determined (cell

the

Fig. Fig.

roots

was

the

considered estimated

module’s the that

for the a the

pattern regulators; the

= random in simple was t are type

orthologs

Consortium. stem or module

Ontogenet as although

are ∑

obtained

were

to activated The

(version

different size

a with

trees expression

types).

which

( model, not entire

1 gene-expression In of

array are members

common Supplementary TableSupplementary 13

a published

manually to and

). r This

Here,

group t its

r,t

Candidate

module,

a

as

cells of

Edges

that

hematopoietic

measured

the

by .

members of program as r m possibly is

regular highly

of

(

were

members We variable regulatory

fine-grained Supplementary Table 11 Table Supplementary

,

the

m data

represented position there or the from each

resulted

possible the

). the

, ,

tree: as

and

we

term

8.3) r t some of

cell connected

by

cell from a T

and model

regulatory

whose takes

potential

connected regulatory indicate

module.

of

all

regulators.

of allow effective

cells) regulator pruned, set)

ChIP

m t Each

correlated study 2

regulator’s

an linear

types;

is

long-term populations 9 the

distinct

the ‘transcription

+

a

for a

, with edges regulators the (for of

fetal

the a

ontogeny

and

cell

gene weight

e the

to

in

and

the

program

known the

expression module each a expression or

group

followed variance

genes

of

example model, be ,

JASPAR

a change clusters

t 0

578 tree in

Each type a

regulators following

liver.

to

regulators

are

either number a

expression partitioning module.

in mean human program

being and activity

The differentiation included

on program

general

terms

module We

more matrices

(>

activity

a

stem candidate hypothetical

( created

encoding DNA-binding

t

Fig. Fig.

the module regulatory tree

cell

. were

doi:

will Each

that 0.85) the is

). denote

of Hence

, module

the and described by in

database

consistent

factor granulocytes

Some

the

than

of

hematopoiesis basis

of of weights

activity 10.1038/ni.2587 type. cells did

Thus,

relating activity

1

not

as

the representation deep

assumption

that

consisting were

for less

in )

weight,

curated but

population variance

w parameters (PWMs)

with linear the ).

of

its was

input:

m of

not as

the

the

the necessarily

one Regulators

intermedi from each module activity’ of

Because

,

regulators molecules a

are r likely

for own

a

the , sequenc

module’s a assumed program

variance t

gene

(version

(dashed

another for the step,

built

module

weights

cluster weights activity activity above);

regula

(noisy) change the

sum

parent across not

which which

gene- mod genes motif adult

from

31 each s sub

best

and one

of cell

i

, t m

2

an

by its 3 ,

in as of of of

is is

3 2 , a a ; ;

D is (RT), across form written type between The of two type an eny An which known addition in correlated a sparsity Σ sion any cell another parts, where regression extension of the Regulatory program fitting as a penalized regression problem. are still pooled 10 ased member point convergence lems. such value Optimization a of term the

n t

single regulatory

1 1 the computes m the

densely a We Combining | ‘over-parameterization’ edge members.

important

typically

activity

regulatory

types. (differentiation) w specific one consistent

related key

problems

t estimator, t

as

||

1

Alternative 2 as m,r,t where

method number difference is

one assume . . Dw we J

all

t i

as

as (

More variance

The

,

gradient list representative. observation

a

and module. w || that of genes regulator

2

Dw

write ‘Elastic |.

Σ Σ cell m ) non-smooth

interconnected s of regulators

that problem

squared

, ( ( weights ||. is

cell

However,

to t m t t

the similarity

2 R f program

smaller 2 1 is thus

promotes the m generally, ), For

of

, 3

a

Thus,

input types

is of that

4 all the is

regulatory relatively ||,

programs

across

we promotes

and .

( chosen compactly types

differences this

, ) of

n

the Different

w x estimator

∈ to

descent

edges

A fused 1 1 methods, these where the m t i Net’ prunes

cell

, make r f

activity

r

only

terms

introduce

standard with

w

of

∑ objective to

with number is concatenated

l − and than

tree

may

considerations m,r,t

(

all of

1

∑ type. this that the t | ∑ our of t i

1

1 in , Such penalty. modules w w the function consistency

Lasso ,

,

a

special slow. a

w

(

replicates

as r

2

module r m is this the

r

t

change ( many

takes 2 1

t sparsity

the

program the regulatory

all s

small ∑ ∑ module 2

form m

Supplementary Table 13 Supplementary choices

1 weights regulatory | as penalty

Σ Σ across

)

of total between , such not proposed The w w t m

l

∈ 2 r m , ,

t r m

is

of

t aggressive m t be

regulatory

approach 2

is

r m ,

2 1

term , variances || | | f tree. We an the )

framework

w

, , a , , can ∈ −

somewhat the w w regulators, (

feasible,

r t

highly considerations biologically

In

estimated

number r m w a

w x

difference as Σ

the f vector L

m m t and

throughout

t i therefore m 2 1

, , modules

m t we

problem, , and

1

of

m

in together

We our t as ) ( form tends of projected || be

throughout −

penalty, w t

) 1 − relevant of is

variance t r

circuits. 2 the above,

is

the

denote a

the before program hence + , ∑

+ assessed

the

+ + note selection correlated regulatory , , penalty case,

given

to k t r

in

pruning 2 k |

of

programs 2

as , , of

parameters two r l .

of t r to

cell || 3 complicated

We

promoting

∑ variances its

3 same

T ||

activity 2 these

a of

opted

with

, r m regulators

that and , the

be 6 |

relevant

it

the

direct

this

is regression

m entries

that programs, which

which r

, ,

gradients,

by parent = That type. will differentiation, the

estimate m

, , must

the ||

fitting r t for as overly will

0 differentiation.

complete

J a

|| 2 2

sum

multiplication

in D more of may

t work

if 1

penalty

cell regulatory to

write a ) ( predictors

programs

t number

the

w can for the

Although is

optimization

cell correlated weights yields sum promote

)

gives use

of ). in r m

coarse-grained

2 λ be

k

because

of 2

type a be

such

procedure

, ,

,

This

aggressive by a modules

+ regulatory

than

the are be only

a s

κ

||

this matrix

type Σ absolute problem, can ‘regularized’ a

t

particular inappropriate, of

fine-grained D w

t m

2 and

m r

the

is primal counteracted ) ( a objective

2

rise t vector , m

actively w

|

sparsity

1 the

of composite

tree

term

w w composed be ten on for

is

|| is programs t fact the

and between ,

δ we 2 of 2 2

cell

the

computed predictors

to

a used, , , absolute of

yield

and

smooth

+ t r

with by

all members is Estimation

‘redundancy’

parent

preservation

values

is in dual use that

w

by

g relationship in

a but selects 2 1 size types smaller encoded

the

m the module

regulating

regulators a || − penalized

for in its

inducing . module.

methods different

but less

an compact

absolute

The because with

module

interior

penalty

regres (RE) related

matrix ontog w parent r m

of

can

fitting

by of

as Σ and prob unbi value

m , , their

than

only

m by

two and

less cell but

the the the |

t

an

be . | Σ

in as of 1

× E | a r .

ing based hibitively However, λ yield combination and already plished to fine-grained and modules. coarse-grained between transfer program Regulatory Finally mean otherwise. predictor Bayesian values ule complexity regulatory our criterion. information Bayesian with selection Model point This the We have program-fitting problem. optimization prototypical the Solving modules. where coarse-to-fine regulatory y t ,

that To 2 1 κ

=

for

penalties:

reformulate from

optimization δ ||

and

reformulation s been

regulatory

;

method simplify

expression A y

on

we

between 1 1 the each mt

of

we learned by + −

the δ information

different the

obtain Σ This matrix

The

a

expensive.

last expressed

the can program

m i

Furthermore, is properties. ∈ of w w 2 1 search models

Bayesian coarse-grained 2 1 m || of

through

3

the

fine-grained

introduction term 4 version objective absorb

n the 2 1 A y

0.001 ||

     

. minimize regulatory

programs

that λ m

problems minimize a 2 2

A + −

profile    , parameters

problem t model

x

0 choices y κ

0 for y enables discussion su

w of t i w ties of

  

,

as optimization l

in and w w and 0 bj

As

. −

criteria the

size information a || these       m the of

We Hence

ec

the

the     solutions, fine-grained

− can

of

||

by an our t t δ

20) 2 1

we m

2 2 term

w, of

      A

k

scanned use .

for (RT) program

modules straightforward of

above the

following || o

programs

of solving || Different

d z alternative, I parameters

be different

module

A k 1

t X y λ and note ,

objective    

these

we

l

(described both an

, I of + + w w of I

module’s + −

transformed k κ || 2       × m m

k

2 w w can and additional chose held-out the || y r

is

problem (T), that

w w m m

2 2 m

|| w

2 1 criterion sets with

− = three

the coarse-grained are of dependent

D w r r

m + +

|| in module

||

rewrite 2 2 general

δ optimization, quality. ′ combinations of 2 2

1

m

a

|| where g l

is

was convex ‘encouraged’ the + + of which the

+ +

+ + 2 2 coarse-grained with || genes the we

||

different then below). g l parameters

2 2 into parameters k g l Xw

g l 2

derivation

data optimal || optimal by geometric,

||

|| use penalty || coarse-grained (BIC)

into

the

that g , , form One cross-validation

a A D w d z

|| they adding

w z the and problem

1 r t

|| || on || m

or ,( = =

a 1 1 1 || objective

1 || we

first

way + model + −

w w to

2 2 the

through we are

data-fitting 1 and set We

w to we , ) term. m of

|| t T D d wish

compare

and of in m variables

||

e || can Dw have ||

g nested. to of set term

these ||

−7

note 0 1

module above introduce Dw depends fine-grained a the ||

Dw

parameters identify − + The = , primal

selection

||

select m

to of We e as introduce w

t −6

a

2 || range w || and cross

as

m

t r learn 1 .. that parameters program 1 parameters

|| ,…,

m

formulation for will

.

This

|| . models would follows:

. |

that

/ 1

||

as the

s fine-grained

and dual 1

only -

e a the regulatory-

(the

mt validation. . as denote w 3

the

particular

is approach spanning decouple best 0

w

w with

variable

and optimal module interior accom and model- be

similar m sched on result m sparse

.

||

pro one.

λ

The will

2 2 = the the the the

,

of

κ 0

w w cse.ucsc.edu/do browser We scanning. Motif (although example, constant step model be act lineage activity regulators than active in ranked lineage lineage a in deemed regulators. lineage of Choice ferentiation’. lated name regulators. of specific functions known for query Systematic ship and of relationships tory programs. of regulatory Post-processing with being Trace( umns corresponding edge Given then construct gle particular but The likelihood on This

r m lineage

Notably, Hence

5.5 which excess

their under-represented , ,

only degree

downloaded intuitively

the

computation

(described could program criterion

a were m t construct between with

2 1

of of

in on the B column in two

= the

regulators

repressor and

(

weight

target

module, during

website tradeoff active B

B

a the

the activity GATA-3) the LL of

the

we

activator

BI

its

number

regulators lineage they

that differentiation,

connected B of with for be

regulators

nodes this

all ) ( 9.0

w C

, ,

lineage(s).

+

window

w All can t r counterpart, for regulator

log freedom.

spurious.

) ( compares =

in our

module these κ

by

of are

modules

across simple:

too to

a a

on wnloads.htm

the

− = of

. ∑ above), procedure,

diag( PubMed

2 underexpressed

between terms

compute

limited we We

graph

The

weight

B. if promoter

having

and,

of

connected scale.

− =

identified model

the corresponding zeros

m

of the

and

are its if ∑

highest

the

automatically

two

∑ that scanned

in

c columns on 2 portion

its lineages. University matrix

average

m when ))

by log chosen may

had LL To

of thus t i an

which For

in

models, , degrees At

regulators was and

−1 ∑

window

the even

a

nodes average

is were

s queries

) (

data

this

2 BIC,

B. the make

w w which we nonzero

activity

B

t i

this

t m a scale. 1 2 each ,

very although expression

sequences be ′

basis

active,

the by positive.

We , ), l role. components

2

B

).

+ promoters

queried

BIC(

they

if of

s analysis as log

infrequently activity

( where

under-represented

of we

For the For cutoff,

will df

1

w x if

of t m

final here

their

this 2

regulators. of

use well

module

regulators the t i

(for

nodes

were For

activity , , A of ) (

likelihood to Hence, cell

queried

use freedom

act.

w

weight California with model. had activity

each

each (

have in

their )

A x more

differentiation

‘encoded’

A degrees be example, diag(

straightforward, lo a Analogously,

expression t i ∑ as

r,t its

type , weight

post-processing Second,

the t r done the

yet for typically given

g higher

,

correspond

high due to gene,

r

− of 1

T lineage,

that

regulators columns

average

the

weight in

Once

and

c formal, correlation that First, ∑

connected its would

mouse denote

mouse weight r m active ) We t is in

on 1

to and

the is

,

of lineage, r Santa baseline regulators was

somewhat

regulators was

is we

the

, , expression

a w

early

by because a r t

A placed noise

remains 30

w a freedom

we r m those a diagonal

were

graph. t r

degrees

is we

scanned

expression across

, regulatory

, or

model).

be upregulated in

negative. parent t July genes and we a (mm9)

2 that

a , , Cruz obtain

) very

, r t

in with

to column tree 2

had the

collected regulator

captured the and

we

consider

given component

a +

between lineage

expression differentiation)

will

significant

columns 2012. of

t

to are

low We

df

recruitment

that

) the

technically in

lineage for ( all low are

is

of deemed 2

‘hematopoietic graph

the were http://hgdown hence of matrix

remove the an For

) (

from

+ not

PubMed

counted

sums

freedom.

eliminate

We expression

cell enriched the

programs, given

cell same of

across expression. were post-processing cons optimal

region

regulators l

the

ranked was or oog

each in

matrix matrix reflect

the subsequently

highest will the of types

specific the

type with

of

T

downregu

t

a the

expression can regulatory frequently

associated by .

through regulators

matrix

. deemed

predictor

regulator cell

relation columns involved as

with

lineage- genome

have

analysis starting

The df(

regula

motifs.

all

overall t entries

higher have

all

would A.

in A 2 cutoff

a based types

, load.

total

w

that The sin col

(for

and and

dif

We the the the log

) an A.

= a a a

P Genomic PWM the PWM at resents by experimentally matrix element the site). ending from for old Motif-scoring threshold. region, genome. separately table each of in ing by targets. position from used unique original database 11 Table followed enrichment. events Binding fine-grained significant matrix ules i we expected background. ule k modules. in enrichment Motif LOD sidered quantile We

, in b

we (‘G’)

the enrichment. The the

P

the

computed

the transcription defined relative by the to

log k by

(

, whenever the module of position We computed

i,k

a

versus clustering k

the using

i

of

mouse ‘

hypergeometric

database

a the defined module In matrix -th’ k , regulators). at = likelihood

P a

))

by

200

by

of

position

-th’ we publications

one-sided )

MAX represented We

P

repository,

‘hit’

LO values

background data 0.2, exceeded position for phylogenetic

were

probability ‘hits’. the values deep a the

position. to

computed

base

the

motif modules. PWM. i D

a An

then the

with

coarse-grained

–1,000

that

gene for

-

of which , ( sets

given PWM-specific PWM determined

available. LOD

as M

the

sequencing information

downloaded

(version

start of

The size P were enrichment pairs 1,000 k j

p

) ,

MAX the

and

is obtained

found ratio value

each of e

IC all +200

rank-sum P ( symbol the The We

scores M

random defined

4

For (base

FDR the LOD = value represents site) supplementary the ) (

( modules downstream

considered of L k the k the × ,

base was

upeetr Tbe 11 Table Supplementary k -

P

r threshold compiled ∑ -th’

L

resource

target (LOD LOD

by )

target L =

Otherwise, k encountering We (base-pairs nucleotide each

+ = entire value the 1 8.3)

and

k

for

scores local

PWMs was ,

 

distribution pairs

for

comparing

for determined and pairs with where lo motif wherever

k k

automatically

content as

(

and

genomic threshold

2 gene of best ( g test.

i declared from

the 9

gene , enrichment, list.

score)

2

list

calculated

k)

, ChIP was

and For motif-matching

set

i ∑ The a

L

upstream r P human the

upstream the =

for

EnsemblCompara

k fine-grained 31 a

i k τ

=

1 genes

motif targets at L

We i motif

defined

k

of

set

An

∑ ,

j of k each 3 as .

max

calculated i the

all

4 =

at

the genes JASPAR 2

, ) , the

is of followed nucleotide

1 downstream

. the S

public

material

a for

S a genes

then i P

background:

the We of

position

all the

the

targets FDR ,

for

i,j each across

position

as one-to-one in

samples, r j GEO j

entire

in

, ( LOD . − + module ‘

instance 1,651 transcription

i

observing

P for We

separately scores

-th’

denote

computed of of the

this length

nucleotide that

P

in P j a used values p

)l 1

b

assigned

of motif.

e the the

(‘A’)

module

the represented data database (

(

by og

each

‘ (Gene

gene i all lists.

M

modules.

− set

k module; for PWMs ,

had 10% and j

j -th’

2 score , microarray

transcription

composition , MAX transcription an along lo

k gene (relative of

purpose k the of genes’

;

of = ) that

over the ) of

( g

k

.

360 sets The

original

a

The motif genes

FDR if the of 3 2

, (

designated ortholog

P

was

the

genes a 5 PWM j i for binding

Expression the P

. j b results to

symbols the

-

LOD(

size PWM-specific

the

start i b from passed (‘T’)

LOD (version (that

Only

the

motif ) to

the

information experiments

promoters. FDR

obtained

sequence coarse-grained S modules

doi:

used of

transcription as

each

M to capture best

motif

, of of assigned

promoter

r j of

site

5% ( ( entire

− + τ ,

publication i the

=

is,

Supplementary 10.1038/ni.2587 i,k

–1,000

, genes

the k and

in

j intersection and

event

was

the

this , start exists

,

of

start 0.3, cis for

were k

score the ) A, 1 were

on

), TRANSFAC

2008) higher )

in sites for

calculation

-regulatory  

the

each

P Omnibus)

k

defined this C,

calculated procedure serving

the

1 the

promoter given site the

by P -th’ k

included

reported

site)

We replaced all (

to bp

listed b accord

content

G i

( thresh j (‘C’)

,

mouse

of

in 3 MAX 2

j

effect,

entire

entire

to and mod mod genes motif ) 0 motif ChIP

−IC(k) or from

than con

start

rep

was and and and 109

the the the

T)

as as as of

= a

option. child that repressor sor posal activity weight change weight poietic regulators. of weight activity in change a with steps differentiation of Identification different culated table module functional EnsemblCompara. the broa (version gene enrichment. Functional two score fer the modules regulators. tions each ChIP modules. dates interactions, estimation regulatory two cis P overlap. program regulatory of significance the Estimating

-elements A values

in empirical recruitment

mouse

or groups,

dinstitute.org/gse of

hypergeometric

module.

ontology

of activity of for (parent for significance data

three

of

the is

tree,

(activity

the Those separately weight P

with

V.3)

have classes

Ontogenet three

0; the strengthening The values for

interactions

set, of group

child); gene The

regulators For

child

whereas the

groups

and

were weight the the Second, each ‘child’, P activity

more

each

ChIP

lineage-specific

(GO) and

groups

each is values

last

activity (parent

of weights symbol

of significance

Ontogenet). ‘universe positive members activity

repressor for downloaded

for functional

functional

the

all P

Only for

regulators module which overlap

is

interactions

and gene the

P

values we weight

in the

are coarse-grained a

0). explicitly modules

value including which Ontogenet )

weight activity

the

3

wherever (parent are calculated empirical

had

6

three-method For weight genes similar and Curated . sets

size’ resulted

for For

regulatory

and

the

were disposal was

is

of

of

ChIP

simplicity,

group. First, the

annotation is

(C5)

than

the positive

regulators of

from

the

is

each and

same); weight each included smaller

the takes

activity

calculated is

for the

are

‘universe calculated the

the

a

purpose

interactions

P gene

an positive);

information overlap

we in

others. from

one-to-one

all

the

the value three overlapping An edge

the

( group,

‘parent’

number empirical and

into

one

interactions,

calculated parent

activator and is functional

overlap, we overlap than

Broad sets weight FDR

enriched

the 0;

(C1,

in (differentiation that

fine-grained of

does account

of sizes’

The

regulation of omitted for child child

to

gene

the

(C2), the

activator Molecular

that ChIP was

the activity of

C2,

of are

the account Institute

are

P

hypergeometric is not.

multiplied of

ortholog because

recruitment

10% were following

clustering

activity

regulators regulators.

activity

value

the calculation compared

of

negative symbols

groups. C3 size assigned

modules each

motif and the

the including the

the

hypergeometric

was and the

weight

strengthening

of

overlap regulators from models

‘regulator Ontogenet for

modules,

two

website Signatures child);

the

weight

intersection step) gene weight

number exists

used C5).

The

the classifications:

and

by were

constant were methods

according that

For 10,000 with

hypergeometric

(parent the of is

We the of

fact

sets between

FDR

for activator

is

( negative (from

according enrichment.

P is example,

is

http://www. were

replaced the overlapping

weakening’

and the considered

larger

chosen report number of values

regulatory 0);

negative); that

the permuta Database (C3)

hemato

was possible

but activity activity activity (parent

repres of test

for

candi to entire

ChIP, some

than each each

dis

and and and two cal dif

the the

for for for

no

by to of

27. analyzed XX culture (LG.3A10; (UC3-10A6), (ebio17B7) CD24 described and Flow cytometry. sis ~80% starting strain). deficiency moter C57BL/6 Mice. analysis weight 37. 36. 35. 34. 33. 32. 31. 30. 29. 28.

of

Blatt, M., Wiseman, S. & Domany, E. Superparamagnetic clustering of data. of clustering Superparamagnetic Domany,E. & S. Wiseman, M., Blatt, hn, . Vredn JM, asl, .. Sn X FFrgltd t gns are genes Etv FGF-regulated X. Sun, & J.A. Hassell, J.M., Verheyden, Z., Zhang, A. Subramanian, A.J. Vilella, in L. Vandenberghe, & and S.P. Sparsity Boyd, K. Knight, & J. Zhu, S., Rosset, M., Saunders, R., Tibshirani, M.F. Berger, Badis, G. JASPAR: B. P.,W.W.Lenhard, Wasserman,W.,Engström, & Alkema, A., Sandelin, V. Matys, Frey, B.J. & Dueck, D. Clustering by passing messages between data points. Rev. Lett. Rev. (2009). essential for repressing Shh expression in mouse limb buds. (2005). 15545–15550 profiles. expression genome-wide interpreting for vertebrates. in trees phylogenetic 2004). UK, Cambridge, University, (2005). 91–108 lasso. fused the via smoothness preferences. sequence of analysis resolution Science Res. Acids an open-access database for eukaryotic binding profiles. eukaryotes. in regulation 315 Labeling

intranuclear

Cre-activity-reporter

deletion (HSA, of

Mice

, 972–976 (2007). 972–976 ,

supernatant across

The but

in

the with mice 2

4

324 (CD2p-CreTg

all .

CD25

and

et al.

are

lox t al. et 76

gene Kit with The

M1/69), 32

anti-V FlowJo

all from , 1720–1723 (2009). 1720–1723 , efficiency with , 3251–3254 (1996). 3251–3254 ,

P-flanked

t al. et part t al. et

(suppl. 1), D91–D94 (2004). D91–D94 1), (suppl.

Diversity and complexity in DNA recognition by transcription factors. (Invitrogen). anti-ROR Intracellular

RNFC n is oue RNCme: rncitoa gene transcriptional TRANSCompel: module its and TRANSFAC + cell staining following encoding lox et al. et CD44

and a

of BD aito i hmooan N bnig eeld y high- by revealed binding DNA homeodomain in Variation δ

nebCmaa eere: opee duplication-aware complete, GeneTrees: EnsemblCompara

P-flanked transgene

software 6.3 types anti-CD44

the Gene set enrichment analysis: a knowledge-based approach knowledge-based a analysis: enrichment set Gene

was

Biosciences). −

in

+ Etv5 (8F4H7B7), CD3 Nucleic Acids Res. Acids Nucleic

γ Etv5

model. (FoxP3

mice t

γ

CD2

(such biotinylated antibodies δ

(AFKJS-9; staining

Data

thymocyte

(Treestar). fl/fl

CD4 encoding

Etv5

(CD2p-CreTg

to Genome Res. Genome (IM7), ,

. . tt Sc Sre B tt Methodol. Stat. B Series Soc. Stat. R. J.

as were Staining backcrossed

generate

is ovx Optimization Convex

(Cytofix/Cytoperm GATA-3) CD8

alleles

anti-CCR6 Anti-V specifically

all were

acquired

with anti-CD62l

subsets, Cre −

from Cell

34

thymic Kit; mice

(

used: the , D108–D110 (2006). D108–D110 , recominase 19 γ Etv5

+ 1.1

Proc. Natl. Acad. Sci. USA Sci. Acad. Natl. Proc.

133 will Rosa-STOP

three

, 327–335 (2009). 327–335 ,

eBioscience) eBioscience);

FluoReporter on

as

with

fl/fl (140706) (2.11) deleted , 1266–1276 (2008). 1266–1276 ,

anti-TCR

not

inferred precursors an (MEL-15),

) times 3

T C 1. (Cambridge 11.7 Ch , 7 LSRII

Dev. Cell

be Kit;

were cell–specific

was driven

from fl/fl captured

to BD and from

δ

(BD)

purified

-EYFP).

were

crossed and

the Mini-Biotin-

(GL3),

(DN3) Biosciences) anti-IL-17A 16 the

by anti-CD27

the

, 607–613

C57BL/6 and

anti-V the done genome

by Science

Nucleic

analy ETV5

from

anti- Phys. were

with with pro 102

this 67

as γ 2 - - , ,