ARTICLES

Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance

James J Crowley, Vasyl Zhabotynsky, Wei Sun, Shunping Huang, Isa Kemal Pakatci, Yunjung Kim, Jeremy R Wang, Andrew P Morgan, John D Calaway, David L Aylor, Zaining Yun, Timothy A Bell, Ryan J Buus, Mark E Calaway, John P Didion, Terry J Gooch, Stephanie D Hansen, Nashiya N Robinson, Ginger D Shaw, Jason S Spence, Corey R Quackenbush, Cordelia J Barrick, Randal J Nonneman, Kyungsu Kim, James Xenakis, Yuying Xie, William Valdar, Alan B Lenarcic, Wei Wang, Catherine E Welsh, Chen-Ping Fu, Zhaojun Zhang, James Holt, Zhishan Guo, David W Threadgill, Lisa M Tarantino, Darla R Miller, Fei Zou, Leonard McMillan, Patrick F Sullivan & Fernando Pardo-Manuel de Villena

Complex human traits are influenced by variation in regulatory DNA through mechanisms that are not fully understood. Because regulatory elements are conserved between humans and mice, a thorough annotation of cis regulatory variants in mice could aid in further characterizing these mechanisms. Here we provide a detailed portrait of mouse gene expression across multiple tissues in a three-way diallel. Greater than 80% of mouse genes have cis regulatory variation. Effects from these variants influence complex traits and usually extend to the human ortholog. Further, we estimate that at least one in every thousand SNPs creates a cis regulatory effect. We also observe two types of parent-of-origin effects, including classical imprinting and a new global allelic imbalance in expression favoring the paternal allele. We conclude that, as with humans, pervasive regulatory variation influences complex genetic traits in mice and provide a new resource toward understanding the genetic control of transcription in mammals.

The genetic basis of most phenotypic variation can be assigned to variation in protein-coding, RNA or regulatory sequences. The importance of regulatory sequence has become increasingly apparent in recent studies comparing divergent taxa and populations and through the identification of thousands of SNPs that, although not predicted to change protein structure, are nonetheless strongly associated with human diseases and biomedical traits. Here we investigated the effects of genetic variation and parental origin on gene expression in multiple tissues in laboratory mice. The study design maximized the level of genetic variation while concurrently enhancing the capacity to assign transcripts to either one of the two parental alleles. Examination of allele-specific expression (ASE) can be used to detect allelic imbalance in transcription in heterozygous mice, a process that requires genetic or epigenetic variation in cis. Therefore, we designed our experiment to include reciprocal F1 hybrids to detect and quantify statistically significant allelic imbalance in expression for as many genes as possible.

Previous publications have examined allelic imbalance in F1 mice using RNA sequencing (RNA-seq) (Supplementary Table 1). Four studies examined brain, two used fetal placenta, one used adult liver, one reported multiple tissues and one used whole embryo. However, some of the conclusions of these RNA-seq studies have been controversial. A particularly controversial issue is the number of mouse genes subject to imprinting. Previous consensus estimates placed the number of imprinted genes in mouse at 100–200. An early application of RNA-seq in brain tissue yielded a small number of new imprinted transcripts, but two subsequent studies claimed identification of >1,300 new imprinted loci, including 347 autosomal genes with sex-specific imprinting. A reanalysis did not replicate these claims.

Affiliations: Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA. Department of Molecular and Cellular Medicine, Texas A&M Health Science Center, College Station, Texas, USA. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden. Present addresses: Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina, USA (D.L.A.) and Department of Computer Science, University of California, Los Angeles, Los Angeles, California, USA (W.W.). These authors contributed equally to this work. These authors jointly supervised this work. Correspondence should be addressed to F.P.-M.d.V. ([email protected]).

Received 17 September 2014; accepted 26 January 2015; published online 2 March 2015; doi:10.1038/ng.322

Nature Genetics, advance online publication. © 2015 Nature America, Inc. All rights reserved.

In the context of these findings, we sought to improve knowledge of the control of gene expression in mouse. To maximize generalizability, we studied related but divergent genomes. We selected three inbred mouse strains (CAST/EiJ, PWK/PhJ and WSB/EiJ) representative of three subspecies within the M. musculus species group (M. m. castaneus, M. m. musculus and M. m. domesticus, respectively). We chose these strains to maximize the level of genetic diversity (for example, 27.7 million SNPs and 4.6 million indels vary in these strains), the number of genes with expressed SNPs and/or indels (31,259 of 36,817 Ensembl v37 genes) and the number of such variants per gene (mean of 19.9, s.d. of 26.9). We conducted all possible pairwise crosses to form a 3 × 3 diallel (Fig. 1) and measured gene expression in brain, liver, kidney and lung with age- and sex-matched biological replicates for each of the nine possible genotypic combinations. We used RNA-seq to measure ASE in brain and microarrays to assess gene expression in brain, liver, kidney and lung. Inclusion of the array data allowed a detailed comparison of two major platforms for expression analysis, determination of the proportion of genetic effects that are missed by examining a single tissue and estimation of the degree to which strain, parent-of-origin and sex effects in brain are reproduced in other tissues.

In designing this experiment, we attempted to optimize the discovery of regulatory variation (Supplementary Table 2) and to address potential pitfalls. In particular, we included three genomes instead of two, allowing us to generalize our conclusions, to estimate the proportion of variants that have a regulatory effect and to assist the aims of large-scale projects such as the International Knockout Mouse Consortium, Collaborative Cross and Diversity Outbred. We also increased the sequencing depth and the number of replicates and included both sexes to improve power to detect ASE. We developed a new approach to diploid genome alignment, aligning to customized genomes ('pseudogenomes') created from the highest quality and most current genomic data available.

Allelic imbalance in expression for an F1 mouse requires the presence of a genetic or epigenetic regulatory variant acting in cis (trans-acting factors have an equal opportunity to affect both alleles). Regulatory variation in cis causes differential expression from the linked allele, which is detected by a statistically significant imbalance in the ASE derived from each parental allele in an F1 mouse. We observe cis regulatory effects for >80% of all testable genes. We also found that the number of imprinted genes was not substantially different from historical estimates, but we report a new genome-wide parent-of-origin allelic imbalance favoring expression of the paternal allele.

Figure 1 Diallel crossing scheme and sample sizes. We selected three divergent inbred strains representative of three subspecies within the M. musculus species group. We generated offspring from all possible pairwise crosses to form a 3 × 3 diallel, including age- and sex-matched biological replicates for each of the nine possible genotypic combinations. Mice were aged to 23 d and killed, and total RNA was extracted from whole brain, liver, kidney and lung. The sample size shown is for RNA-seq (52 females, 39 males). RNA-seq was performed on RNA extracted from brain, and microarrays were run on RNA extracted from brain, liver, kidney and lung. (Panel axes: dam and sire strain, each CAST/EiJ, PWK/PhJ or WSB/EiJ.)

RESULTS

Major drivers of differential gene expression

We hybridized brain, liver, kidney and lung RNA samples from the same mice used for RNA-seq to expression microarrays. Clustering of data from 384 microarrays (4 tissues × 96 samples) perfectly partitioned the samples by tissue (Supplementary Fig. 3a), indicating that the predominant predictor of gene expression is tissue type, even in the presence of extreme genetic diversity and representation of both sexes. After tissue, the samples partitioned by strain, then by parent of origin and finally by sex. Microarray data also showed that strain effects are commonly shared across different tissues (Supplementary Fig. 3b), suggesting that regulatory variation often acts in a similar manner across diverse tissues. Brain RNA-seq total read counts and microarray intensity values were highly correlated (median r = 0.86, range of 0.84–0.87).

Within each tissue, the overwhelming driver of differential gene expression was strain; this effect greatly exceeded the effects from sex and parent of origin (Fig. 2 and Supplementary Fig. 4). For RNA-seq, the first two principal components accounted for ~30% of the total variation in autosomal total read count (TReC). The remaining top ten principal components were also strongly determined by strain and, to a far lesser extent, parent of origin and sex, with no notable effects from the barcodes used for multiplexing (Supplementary Fig. 5). Within each tissue, the three inbred strains formed an equilateral triangle with the F1 samples located midway between the corresponding parental strains (Fig. 2). This indicates that there was no overall bias in the alignment of RNA-seq reads to these three equally divergent genomes. We also determined that the genetic architecture of regulatory variation in laboratory mice was mostly additive, as the F1 samples would not be located midway between the parental strains if dominance and parent-of-origin effects predominated.

Cis regulatory variation is pervasive in diverse mice

We found cis regulatory effects for 11,287 autosomal genes (89% of testable genes; Supplementary Table 3). More than 75% of these genes showed consistent additive effects, defined by having an additive TReC effect and an additive allele-specific read count (ASReC) effect in the same direction within a cross. The genes Fos and Mad1l1 are given as examples of the two situations described here. In the first, a gene showed allelic imbalance in expression for all three crosses, indicating that, at the cis level, the PWK/PhJ allele is stronger than the WSB/EiJ allele, which in turn is stronger than the CAST/EiJ allele. Furthermore, this cis effect is consistent with the differential gene expression of the parental inbreds, and the level of gene expression in the F1 mice can be explained as an additive effect. Some fraction of variants create strain effects that are undetectable in TReC or inconsistent between TReC and ASReC, owing to dominance and other effects. In the second example, a gene showed allelic imbalance in all F1 mice in a manner consistent with TReC in the parental inbreds, but the total levels of gene expression in the F1 mice were best explained as an effect of dominance or overdominance. Copy number variation can also lead to inconsistency between TReC and ASReC and result in underestimation of the number of genes with cis regulatory effects.
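As an illustration of what "allelic imbalance" means for a single gene in a single F1 mouse, the following minimal sketch tests whether allele-specific read counts depart from the 50:50 split expected without a cis effect. It uses an exact two-sided binomial test, which is a simplified stand-in chosen for this example and not the statistical model used in the study; the function name and the read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(allele_a_reads: int, allele_b_reads: int) -> float:
    """Two-sided exact binomial P value for H0: both alleles are equally expressed.

    Simplified illustration only: real ASE analyses must also handle
    overdispersion, replicates and mapping bias.
    """
    n = allele_a_reads + allele_b_reads
    if n == 0:
        raise ValueError("no allele-specific reads for this gene")
    k = min(allele_a_reads, allele_b_reads)
    # Probability of observing k or fewer reads from the weaker allele when p = 0.5.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The null distribution is symmetric, so the two-sided value doubles one tail.
    return min(1.0, 2 * lower_tail)


# Hypothetical counts: 90 reads from one parental allele, 53 from the other.
print(allelic_imbalance_pvalue(90, 53))
```

A balanced gene (for example, 5 and 5 reads) returns 1.0, whereas a gene with all 10 reads from one allele returns 2/1024.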

Of the 11,287 autosomal genes with cis regulatory effects, 4,113 (36%) were detected for all 3 pairs of strains, 5,065 (45%) were detected for 2 pairs and 2,109 (19%) were detected for 1 pair (Fig. 3a). Notably, all three subspecies contributed similarly to differential gene expression, indicating that there was no overall bias in read alignment to any one genome. Furthermore, the fold-change distribution of allelic imbalance effect sizes showed a similar pattern among the three crosses, and there was minimal skewing in the ratio of upregulated to downregulated genes in any given cross (Fig. 3b). We saw a similar pattern with the microarray data across the four tissues analyzed (Supplementary Fig. 6).

Phenotypic consequences and human relevance

To test the potential consequences of cis regulatory variation, we compared our results to a comprehensive set of phenotypes for 6,039 different knockout genes and 29 mouse phenotype dimensions (see URLs). Brain-expressed genes with cis regulatory variation were significantly more likely to be associated with a behavioral or neurological knockout phenotype in mice (P = 0.012) than brain-expressed genes with no cis effect. Furthermore, we found no such enrichment for the 1,348 genes that result in no overt aberrant phenotype after being knocked out (P = 0.56) or those associated with the 27 other phenotype dimensions.

To test the human relevance of mouse cis regulatory variation, we compared our results to those for human expression quantitative trait locus (eQTL) studies. These comparisons were restricted to only the genes that have a one-to-one ortholog for mouse and human (n = 15,312 genes; see URLs). Brain-expressed genes with a cis regulatory effect in mouse were much more likely to have a human peripheral blood eQTL (P = 7.8 × 10^−10). Published human brain eQTL studies had much smaller sample sizes; nonetheless, when comparing our results to a meta-analysis of 5 available data sets (total n = 439), we observed consistent enrichment (P = 0.04).

Proportion of SNPs with cis regulatory effects

In contrast to previous F1 RNA-seq studies, we included three genomes in our experimental design to allow for multiple pairwise comparisons. In our experiment, for >90% of the genome, pairwise comparisons were possible between the different subspecies (M. m. domesticus, M. m. musculus or M. m. castaneus), whereas, for the remainder of the genome, just one subspecies was represented. Therefore, we could make six comparisons: three between genomic regions of different subspecific origin and three between regions of the same subspecific origin. For each comparison, we examined the relationship between sequence diversity (SNPs/kb) and the fraction of genes that showed differential gene expression (additive, consistent strain effects). The result was a positive logarithmic correlation (Fig. 4), indicating that the number of functional regulatory variants per kilobase increases as the number of total variants per kilobase increases. Furthermore, within each pairwise comparison, sequence diversity was correlated with the fraction and magnitude of genes with differential gene expression (Supplementary Fig. 7), and this correlation replicated in all four tissues.

Figure 2 Principal components (PCs) of brain RNA-seq and microarray expression levels across four tissues. Each point represents one mouse, with shape indicating sex (circle, female; square, male) and color indicating genotype. For the F1 mice, the outer color indicates the maternal strain and the inner color indicates the paternal strain. (a) PC1 versus PC2 of the brain RNA-seq TReC for all autosomal genes. The three inbred strains form a near-perfect triangle with the F1 samples located between their corresponding parental strains. PC1 and PC2 account for 31% of the variance in TReC, indicating that genetic background is the overwhelming driver of gene expression difference, with its effect greatly exceeding those of parent of origin and sex. (b) PC1 versus PC2 of microarray expression values for all autosomal genes across four tissues (panels: brain, kidney, liver, lung). The pattern seen in brain extends to multiple diverse tissues.

Figure 3 Balanced contribution of different subspecies to the identification of cis-regulated genes. (a) Venn diagram showing the number of genes with allelic imbalance (false discovery rate (FDR) < 0.05) in each cross and the relationship to other crosses (crosses: CAST/EiJ vs. PWK/PhJ, PWK/PhJ vs. WSB/EiJ and CAST/EiJ vs. WSB/EiJ; region counts: 709, 1,729, 4,113, 1,689, 714, 1,647 and 686). (b) Distribution of allelic imbalance effect sizes for the 11,287 autosomal genes that showed allelic imbalance in expression for at least one cross. In each cross, the proportion is the fraction of allele-specific reads from the strain listed second in the legend (i.e., PWK/PhJ or WSB/EiJ). The inset magnifies the distribution of effect sizes in the vicinity of 0.5 and provides, in the background, the distribution of effect sizes for genes that did not reach statistical significance for a strain effect (filled distributions).
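The breakdown of the 11,287 genes with cis regulatory effects by the number of strain pairs can be checked against the Venn diagram counts of Figure 3a. The short sketch below does that arithmetic. The grouping of the seven region counts into single-pair, two-pair and three-pair regions is an assumption made for this example, inferred from the fact that those groupings sum to the totals quoted in the text; which cross each individual count belongs to is not needed here.

```python
# Region counts from the Figure 3a Venn diagram.
# Assumption: the grouping below is inferred from the sums, not stated in the text.
one_pair_regions = [709, 714, 686]        # detected in exactly 1 pair of strains
two_pair_regions = [1729, 1689, 1647]     # detected in exactly 2 pairs of strains
all_three_pairs = 4113                    # detected in all 3 pairs of strains


def breakdown(one_pair, two_pairs, three_pairs):
    """Return the total gene count, per-class counts and rounded percentages."""
    counts = {1: sum(one_pair), 2: sum(two_pairs), 3: three_pairs}
    total = sum(counts.values())
    percents = {k: round(100 * v / total) for k, v in counts.items()}
    return total, counts, percents


total, counts, percents = breakdown(one_pair_regions, two_pair_regions, all_three_pairs)
print(total)     # 11287
print(counts)    # {1: 2109, 2: 5065, 3: 4113}
print(percents)  # {1: 19, 2: 45, 3: 36}
```

The totals and rounded percentages agree with the values given in the text (36%, 45% and 19%).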

Each cis eQTL identified in this study was explained by at least one regulatory variant. Therefore, we could estimate the lower bound of the proportion of mutations that create a cis regulatory effect by dividing the number of cis eQTLs by the number of SNPs within genomic regions spanning all testable genes for a particular cross (Supplementary Fig. 8). The overall ratio was 0.10% (± 0.02%), such that approximately 1 in 1,000 SNPs creates a cis regulatory effect. This estimate was stable across all crosses examined and across all regions independently of their phylogenetic origin. This estimate also generalized to genes of varying size and levels of expression (Supplementary Fig. 9).

Classical imprinting is incomplete and under genetic control

We identified 95 genes with significant evidence of imprinting (Fig. 5a; full gene list in Supplementary Data Set). Significance was defined by a parent-of-origin effect P value <0.01 combined with a q value <0.01 or with evidence of imprinting from a hidden Markov model (see Supplementary Note and Zou et al.). Imprinted genes were found on 16 chromosomes, with 62 of the 95 genes residing in well-known imprinting clusters. There were 52 new imprinted genes and 43 genes with previous evidence of imprinting (see URLs).

Figure 4 Differential gene expression is positively correlated with sequence diversity at multiple evolutionary scales. Each square indicates the relationship between the local level of sequence diversity (SNPs/kb) and the fraction of genes that show differential gene expression (proportion of genes with additive, consistent strain effects), for regions of the genome with the same or different subspecific origin (indicated by dendrograms). Colored circles represent strain (magenta, PWK/PhJ; blue, WSB/EiJ; green, CAST/EiJ), and colored text represents the subspecific origin in the regions of the genome considered (magenta, M. m. musculus; blue, M. m. domesticus; green, M. m. castaneus). For each of the six pairwise comparisons, only expressed genes with allele-specific information were considered and only SNPs within the entire gene body (± 10 kb) were included. The portion of the genome considered for each of these six comparisons was, approximately, from left to right, 50 Mb, 150 Mb, 175 Mb and 2.25 Gb for the final three comparisons. (Axes: genetic diversity (SNPs/kb) versus differential gene expression (proportion); r² = 0.955.)

Figure 5 Imprinted genes in mouse brain. (a) Paternal expression ratio for 95 genes with a significant parent-of-origin effect. Each dot corresponds to a reciprocal cross (for example, CAST/EiJ × PWK/PhJ versus PWK/PhJ × CAST/EiJ), and dot size is proportional to the parent-of-origin effect P value. Genes known from the literature to be maternally expressed are shown in red, those known to be paternally expressed are shown in blue and new imprinted genes are shown in black (n = 54 new genes). Genes with a strain × parent of origin effect are underlined (n = 47 genes). (b) Distribution of the parental expression proportion in the vicinity of 0.5 for genes that are imprinted (lines) and, in the background, genes that did not reach statistical significance for parent of origin–dependent expression (filled distributions). (Axes: paternal expression proportion; number of genes (density).)

© 2015 Nature America, Inc. All rights reserved. the 19 autosomes, 15 had a higher proportion of genes whose neighbor expression from one parental allele tended to cluster ( 5 Table Supplementary ( 50% of expectation the from different significantly a proportion allele, paternal from the expression higher each cross for and each sex separately. Weeffect found that 54–60% of genes showed parent-of-origin the estimated we genes, imprinted beyond extended effects in parent-of-origin asymmetry this whether in brain whereas maternal expression predominates in placenta predominates expression paternal that observation the with ent ( allele maternal the than paternal the from expressed be to likely more times 1.5 were genes Imprinted Global ( which it ( could not, the of existence suggesting a more complex model effect strain single a by explained be could expression gene differential the which for those classes: 2 into genes 47 these We divided effect). origin of parent × (strain strength imprinting of the modifying effect strain a showed 95 47 the Of genes, cross. imprinted a within 3.2% of variance mean a with replicable, of The strength was imprinting ASReC (range highly of 50.6–99.7%). the of 75.6% average an represented reads paternal genes, expressed paternally For 51.5–97.9%). of (range ASReC the of 67% represented average an reads maternal genes, expressed maternally For plete. incom was imprinting genes, most For gene. each for imprinting of imprinting. on origin–dependent effects strain or of brain in imprinting of lack imprinting, parent sue-specific for criteria (median the expression meet not did but TReC (median = expressed 809, median ASReC = 143) for evaluation were (58%) 42 and ­identified variation) as being imprinted. exonic The remaining 31 genes containing were sufficiently and (expressed ( ( overexpression overexpression maternal paternal consistent consistent with Genes crosses. 
three all in consistent still but sexes both in consistent fully not ( island. CpG nearest the and TSS the between distance given a with genes of proportion ( genes inconsistent to relative away, ( overexpression maternal consistent with those whereas ( sexes both and crosses 3 all in overexpression ( autosome. every on overexpression maternal or paternal consistent with genes of number the on based clustering of proportion expected the denotes line red The autosomes. most on clustered are crosses) three all in (found imbalance ( present). were skew paternal no if expectation (the 0.5 of left the to values the of reflection a represents line red dashed The sexes. 2 × × 3 genes crosses ~10,000 for data aggregate in described genes imprinted 95 the except genes, all for expression paternal of proportion the of ( allele. paternal the of favor 6 Figure Nature Ge Nature d n n n ) Expanded analysis including genes genes including analysis Expanded ) cumulative the is Plotted 9,540). = islands, CpG to closer be to tend 467) = = 36) ( 36) = Allele-specific RNA-seq Allele-specific data allowed quantification of the strength

allelic Global allelic imbalance in in imbalance allelic Global Supplementary Table 4 Supplementary b ) Genes with consistent allelic allelic consistent with Genes ) c n ) Genes with consistent paternal paternal consistent with Genes ) Figure Figure etics

n imbalance = 116) tend to be farther farther be to tend 116) =

5 . The distribution reflects reflects distribution The . ADVANCE ONLINE PUBLICATION ONLINE ADVANCE P 03, ag = .109) sgetn tis suggesting 0.01–0.97), = range 0.37, = ). We also observed that genes with higher higher with genes that observed Wealso ).

a in ) Distribution Distribution )

favor ).

n Fig. 5 Fig.

= 1,631) are not different from inconsistent genes ( genes inconsistent from different not are 1,631) = of

[Figure 6: (a) density of the paternal expression proportion, observed versus expected if symmetric; (b) proportion of neighboring genes with the same parental skew, by chromosome, observed versus simulation; (c,d) cumulative proportion of genes by distance between the TSS and the nearest CpG island (bp) for paternal consistent, maternal consistent and inconsistent genes.]

[…] had the same parental skew in expression than expected by chance (P = 5.9 × 10^−24; Fig. 6b). This finding is consistent […]

We calculated a rough estimate of the number of genes with paternal overexpression by simply taking the difference between the number of genes with higher paternal expression and the number of genes with higher maternal expression. For example, for female CAST/EiJ × PWK/PhJ reciprocal hybrids, there were 1,652 more genes with allelic imbalance in favor of the paternal allele (6,790 paternally overexpressed genes minus 5,138 maternally overexpressed genes). The excess of genes with paternal overexpression ranged between 938 and 2,500 (across reciprocal crosses and stratified by sex; Supplementary Table 5). However, these numbers likely represent an underestimation because we conservatively assumed that all genes with higher maternal expression proportion are due to chance, while some occurred not by chance. Although we can estimate the number of genes with paternal overexpression, we lack sufficient power to identify all genes with modest parental overexpression while correcting for multiple testing.

To identify genomic features associated with parentally overexpressed genes, we first selected genes with consistent maternal or paternal overexpression in the three reciprocal crosses (with or without stratification by sex). These genes were not significantly clustered with known imprinted genes. However, when we examined the proximity of these genes to CpG islands, we found that the transcription start sites (TSSs) of genes with consistent overexpression of the paternal allele in all 3 crosses (n = 467 with and 3,338 without stratification by sex) were closer to CpG islands (P < 1 × 10^−5) than the TSSs for the remaining genes (Fig. 6c,d). We did not observe this effect among genes with consistent maternal overexpression (n = 116 and 1,631; P = 0.60). Note that, for the more restrictive group (consistently expressed in both sexes within each cross), there was further enrichment for genes with a TSS near a CpG island among paternally overexpressed genes and a significant depletion of genes with a TSS near a CpG island among maternally overexpressed genes (P = 1 × 10^−5 and P = 9.6 × 10^−3, binomial test). […] (n = 3,338) retain enrichment for CpG islands, whereas those with […] (n = 5,154).
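The rough estimate of paternal excess described above is a simple difference of counts. A minimal sketch follows, using the counts reported for female CAST/EiJ × PWK/PhJ reciprocal hybrids; the function name and the derived paternal fraction are illustrative additions, not part of the published analysis.

```python
def paternal_excess(n_paternal_higher: int, n_maternal_higher: int) -> int:
    """Excess of genes with higher paternal expression over those with
    higher maternal expression (a conservative, rough estimate)."""
    return n_paternal_higher - n_maternal_higher


# Counts quoted in the text for female CAST/EiJ x PWK/PhJ reciprocal hybrids.
n_paternal = 6790
n_maternal = 5138

excess = paternal_excess(n_paternal, n_maternal)

# Illustrative only: share of skewed genes that favor the paternal allele.
paternal_fraction = n_paternal / (n_paternal + n_maternal)

print(excess)
print(round(paternal_fraction, 3))
```

The difference treats every maternally skewed gene as chance, which is why the text describes the result as a likely underestimate.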

For genes consistently overexpressing the paternal allele, we observed that the size of the strain effect was significantly smaller than for other genes (P < 1.2 × 10^−4), implying that cis-acting regulatory elements have less impact on these genes. Interestingly, the proximity of a CpG island to a TSS was associated with smaller additive strain effect sizes, and genes with a TSS that overlapped a CpG island were also clustered in the genome. We conclude that, in addition to the statistically significant allelic imbalance observed at the gene level (imprinting), there is an association between the proximity of a CpG island to a TSS and a pervasive allelic imbalance favoring expression of the paternal allele in brain; this suggests that parent-of-origin–dependent methylation may be implicated in this phenomenon.

We were able to support this claim using a recently published whole-genome parent-of-origin brain DNA methylation data set from reciprocal hybrids of 129X1/SvJ and CAST/EiJ mice28. CpG islands that were preferentially methylated on the maternal allele were closer to […] with consistent overexpression from the paternal allele (Supplementary Fig. 12). This observed relationship between paternal overexpression and nearby maternal methylation is not simply the result of inherent differences between CpG islands with a paternal versus maternal methylation bias.

Two forms of dosage compensation on the X chromosome
Gene expression on the X chromosome in mammals is believed to be subject to two forms of dosage compensation. The first equalizes the expression of X-linked genes in females and males29,30, and the second equalizes the average expression of X-linked genes with the expression of autosomal genes31. In our data set, the overall level of X-chromosome gene expression was equivalent in males and females in all four tissues examined (Supplementary Fig. 10). These data indicate that the silencing of one X in females equalizes the average expression of X-linked genes between males and females29,30. In addition, X-chromosome gene expression was equivalent to that for the autosomes in all four tissues examined (Supplementary Fig. 11a). These data support the hypothesis that, during the evolution of mammalian sex chromosomes from a pair of autosomes, expression of X-linked genes was doubled to compensate for the degeneration of Y-chromosome homologs31. We also observed an effect of genotype at Xce (X-chromosome–controlling element)32 and a parent-of-origin effect in X-chromosome inactivation skewing in females (Supplementary Fig. 11b)33.

A total of 346 X-chromosome genes were found to possess a strain effect (77% of all expressed and testable genes), a rate slightly lower than that for autosomes. This difference is expected because of the reduction in power to detect effects on the X chromosome, as ASReC data can only be informative in female samples. Of the 527 testable X-linked genes, only 4 (0.76%) were differentially expressed between the sexes, a rate similar to the autosomes (0.28%). Overall, however, sex did account for ~12% of the variation in X-chromosome gene expression, with effects largely driven by one gene, Xist.

DISCUSSION
We find that more than 80% of mouse genes have expression levels dependent on genetic variation. The majority of these differentially expressed genes fit an additive model and are subject to regulatory variation acting in cis. These cis regulatory effects have functional consequences for mouse phenotypes and usually extend to the human ortholog. Furthermore, differential gene expression is positively correlated with sequence diversity at multiple evolutionary scales, and the proportion of mutations that create a cis regulatory effect has remained relatively constant as mouse subspecies evolved. Two types of parent-of-origin effects on gene expression were observed.

First, we demonstrated that the number of classically imprinted genes is not substantially different from historical estimates. Second, we observed a global allelic imbalance in favor of expression of the paternal allele at a large number of genes associated with CpG islands. For most genes, imprinting is incomplete, and […] can modify the strength of imprinting. Furthermore, we conclude that regulation of gene expression on the X chromosome is similar to that for the autosomes and includes two forms of dosage compensation. Finally, we developed improved analytical tools for RNA-seq analysis with broad usefulness in many species (see URLs; Supplementary Table 2)22–24. These tools improve the power to detect […] and allele-specific parent-of-origin effects while minimizing false discoveries and reference bias, detect and correct spurious transcriptome inference due to RNA-seq read misalignment and allow analysis of expression on the X chromosome without chromosome-wide confounding effects. Finally, a new likelihood-based method to jointly analyze TReC and ASReC from inbred and F1 hybrid mice (Supplementary Fig. 2) increases statistical power to detect genetic effects.

We found cis regulatory effects for 11,686 genes (85% of testable genes). This number exceeds all previous findings for mouse eQTL studies. We found that the expression of most transcripts shows an additive pattern of inheritance, consistent with studies in mouse35, human36 and plant37. Interestingly, many genes have inconsistent patterns of inheritance between TReC and ASReC. We have determined that, when one of the strains used to create the reciprocal F1 hybrid has a copy number gain, typically no SNPs and small indels are called in that strain in that genomic region4; this leads to allele-specific reads from that strain being undercounted. However, patterns of TReC, which are independent of variant calls, are still informative for copy number status.

Inbred mouse strains are assumed to possess a fixed genome across time, but mutations arise continuously. We observed two striking examples of de novo mutations altering gene expression via changes in gene dosage. Among the 96 samples included in the RNA-seq study, we identified one XO female caused by paternal nondisjunction (supported by genotyping) and another mouse with a ~250-kb duplication spanning 5 genes (Supplementary Fig. 13).

Pinpointing the genetic variants that underlie mouse QTLs has been challenging because the QTLs detected in experimental crosses often span hundreds of genes. The data described here can help investigators prioritize candidate genes on the basis of strain distribution patterns or tissue-specific expression. Furthermore, if differential expression of a particular gene is suspected to influence a phenotype, these data provide the means to create an 'allelic series', a set of animals bred intentionally to titrate the level of gene expression. This approach could complement or even incorporate gene-targeted knockout mice.

In humans, common disease-associated variants are enriched for regulatory DNA. Animal models for such regulatory variation are needed to provide a more detailed understanding of genotype-phenotype relationships. We have shown that eQTL patterns are often independent of species and tissue, such that eQTLs in human blood often have a counterpart in the mouse ortholog, providing a tractable model to assess the effect of regulatory variation on phenotype.

We have provided a lower-bound estimate of the proportion of variants that have a cis regulatory effect. We estimate that at least 1 in every 1,000 SNPs creates a cis regulatory effect. Therefore, at least 47,000 regulatory variants are segregating in the Collaborative Cross20 and Diversity Outbred21 populations. These regulatory variants likely contribute to the broad phenotypic distributions seen in those populations, and the small proportion of testable genes without regulatory variation (~15% in this study) are likely under selective pressure to maintain a constant gene expression level.
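The lower bound of one cis regulatory effect per 1,000 SNPs can be turned into expected counts with simple arithmetic. The sketch below is illustrative only: the helper name is hypothetical, the implied SNP total assumes the 47,000 figure was obtained by applying that same rate, and the input of 100 new mutations is used as an example value.

```python
# Lower-bound rate from the text: at least 1 cis regulatory effect per 1,000 SNPs.
SNPS_PER_CIS_EFFECT = 1000

# Reported lower bound on segregating regulatory variants (from the text).
SEGREGATING_REGULATORY_VARIANTS = 47_000


def expected_regulatory_variants(n_variants: int) -> float:
    """Expected number of cis regulatory variants among n_variants SNPs,
    under the lower-bound rate (hypothetical helper for illustration)."""
    return n_variants / SNPS_PER_CIS_EFFECT


# Assumption: the 47,000 figure comes from applying the same rate, which
# would imply roughly this many segregating SNPs.
implied_segregating_snps = SEGREGATING_REGULATORY_VARIANTS * SNPS_PER_CIS_EFFECT

# Example input: about 100 new mutations in one offspring.
new_regulatory_per_offspring = expected_regulatory_variants(100)
offspring_per_new_regulatory_variant = 1 / new_regulatory_per_offspring

print(implied_segregating_snps)
print(new_regulatory_per_offspring)
print(offspring_per_new_regulatory_variant)
```

Because the rate is a lower bound, each derived quantity is also a lower bound on the number of regulatory variants, not a point estimate.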

Furthermore, as humans and mice average ~100 de novo mutations per generation38,39, at least 1 in 10 offspring should have a new regulatory mutation. Given this proportion and the size of the human population, several million new regulatory variants are likely created each year.

There have been conflicting reports regarding the number of mouse genes subject to imprinting. If the definition of imprinting is restricted to genes that show significant allelic imbalance in expression favoring one parent, then our results indicate that the number of imprinted genes in mouse brain is in line with the historical consensus. However, parent-of-origin effects on gene expression appear to be asymmetric in mouse brain, with favored expression of the paternal allele. This affects many genes distributed across all the autosomes and is present in all three reciprocal crosses. The 467 genes that have consistent overexpression of the paternal allele in all 3 crosses and both sexes are strongly enriched for CpG islands near their TSSs and tend to show smaller strain effects relative to inconsistent genes (Fig. 6). In addition, genes with consistent overexpression of the paternal allele are associated with differentially methylated CpG islands. These observations suggest that differential resetting of parent-of-origin–dependent methylation marks during early development is likely the mechanism responsible for global allelic imbalance.

We hypothesize that this global imbalance is ancestral to classical imprinting. In other words, small parental differences in methylation at CpG islands close to the TSS may have been exploited by selection to create classical imprinting. We propose that the natural difference in size of strain effects between genes that are affected or not by this parent-of-origin effect could be explained by the fact that mutations in the promoters of genes of the latter type are likely to create strong cis regulatory variants. On the other hand, mutations in CpG islands will only have a minor effect on overall methylation. Lastly, the global allelic imbalance in favor of expression of the paternal allele may partly explain why the majority of the newly identified imprinted genes described here (37 of 54) show modest overexpression of the paternal allele and may also explain the surprisingly large number of genes found in 2 previous controversial studies of imprinting10,11.

We verified two forms of dosage compensation on the X chromosome. First, for most of the genes on the X chromosome, we found that males and females have similar expression (Supplementary Fig. 10). Although this has been demonstrated before using cell lines40,41, here we provide additional evidence in primary tissue samples. Furthermore, we confirm that it is rare for genes to escape X inactivation in mouse, with this occurring for just 1.1% of the genes that could be tested, all of which have been previously identified42–45. This finding stands in sharp contrast to the scenario in human females, where ~15% of X-chromosome genes are biallelically expressed46. Second, we found that the overall level of X-chromosome expression is roughly equivalent to expression on the autosomes (Ohno's hypothesis)31. Ohno's hypothesis was initially supported by three microarray studies across several eutherian species40,47,48 but then contradicted in 2010 by an RNA-seq analysis of mouse and human tissues49, and this controversy remains, despite multiple recent studies50–56. The main factor contributing to disparate results across studies has been whether genes with low expression are considered57,58. Because genes with no or low expression in somatic tissues are more abundant on the X chromosome than on autosomes50, their inclusion can lower median X:autosome expression ratios. Our analysis considers all genes on the X chromosome and clearly supports Ohno's hypothesis in mouse.
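The sensitivity of the X:autosome comparison to lowly expressed genes can be illustrated with a toy calculation. This is a minimal sketch on made-up expression values, not the analysis used in the study; the function name, the threshold parameter and the data are all hypothetical.

```python
from statistics import median


def x_to_autosome_ratio(expression, chromosome, min_expr=0.0):
    """Median X-linked expression divided by median autosomal expression,
    keeping only genes with expression >= min_expr."""
    x_values = [e for e, c in zip(expression, chromosome)
                if c == "X" and e >= min_expr]
    autosomal_values = [e for e, c in zip(expression, chromosome)
                        if c != "X" and e >= min_expr]
    return median(x_values) / median(autosomal_values)


# Hypothetical data: the X chromosome carries more silent or near-silent genes.
chromosome = ["X"] * 6 + ["1"] * 6
expression = [0.0, 0.0, 0.1, 8.0, 10.0, 12.0,   # X-linked genes
              0.0, 6.0, 8.0, 10.0, 12.0, 14.0]  # autosomal genes

ratio_all = x_to_autosome_ratio(expression, chromosome)             # all genes
ratio_filtered = x_to_autosome_ratio(expression, chromosome, 1.0)   # expressed only

print(ratio_all)
print(ratio_filtered)
```

In this toy data set the ratio rises from below 0.5 to 1.0 once silent genes are excluded, which mirrors the direction of the effect described above without saying anything about its size in real data.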

This form of dosage compensation provides strong evidence that the level of gene expression is under evolutionary pressure.

In summary, our study demonstrates that in the laboratory mouse the vast majority of genes are subject to cis regulatory variation. Mouse models incorporating regulatory variation should provide a powerful complement to null mutants19 in the search for the mechanisms underlying complex genetic traits in humans.

URLs. Scripts are provided to construct pseudogenomes (http://code.google.com/p/lapels/) and perform diploid alignment (http://code.google.com/p/suspenders/). An R package for jointly analyzing TReC and ASReC and to factor in X-inactivation skewing can be found at http://www.bios.unc.edu/~feizou/software/rxSeq. For detection and correction of spurious RNA-seq read misalignment (pseudogenes), access GeneScissors at http://csbio.unc.edu/genescissors/. Expression data can be viewed at http://csbio.unc.edu/gecco/. Knockout mouse phenotypes were acquired from http://www.informatics.jax.org/phenotypes.shtml. Orthologous genes for human and mouse were identified from Ensembl (http://www.ensembl.org/info/genome/compara/homology_method.html) using the category "ortholog_one2one." Genes with previous evidence of imprinting were identified by creating a union of the databases from the following websites: http://www.geneimprint.com/, http://igc.otago.ac.nz/ and http://www.mousebook.org/catalog.php?catalog=imprinting.

METHODS
Methods and any associated references are available in the online version of the paper.

Accession codes. Expression data can be acquired from the Gene Expression Omnibus (GEO) under accession GSE44555.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

ACKNOWLEDGMENTS
We thank P. Mieczkowski, A. Brandt, E. Malc, M. Vernon, J. Brennan and M. Calabrese for helpful discussions. Major funding was provided by National Institute of Mental Health/National Human Genome Research Institute Center of Excellence for Genome Sciences grants (P50MH090338 and P50HG006582, co-principal investigators F.P.-M.d.V. and P.F.S.). This work was also supported by grants R01GM074175 (principal investigator F.Z.) from the National Institute of General Medical Sciences and K01MH094406 (principal investigator J.J.C.) from the National Institute of Mental Health.

AUTHOR CONTRIBUTIONS
F.P.-M.d.V., J.J.C., L.M., F.Z., W.S., V.Z. and P.F.S. designed the study, and J.J.C. managed the project. J.J.C. and F.P.-M.d.V. drafted the manuscript, and all authors edited it. D.R.M., G.D.S., T.A.B., R.J.B., M.E.C., S.D.H., J.S.S., N.N.R., R.J.N., C.R.Q. and Y.X. bred the mice and collected tissues. J.D.C., C.J.B., Z.Y. and T.J.G. prepared samples for expression profiling. W.S., F.Z., V.Z., Y.K. and W.W. developed statistical models and conducted analyses. W.V., A.B.L., D.W.T., L.M.T., K.K., J.X., J.P.D., A.P.M. and D.L.A. contributed to data analysis and interpretation. S.H., I.K.P., J.R.W., C.E.W., C.-P.F., Z.Z., J.H., Z.G. and L.M. contributed to pseudogenome construction and RNA-seq read alignment.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. King, M.C. & Wilson, A.C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).
2. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

.google. ics.jax. online online ome/ om/ /  - - - . . © 2015 Nature America, Inc. All rights reserved. 31. 30. 29. 28. 27. 26. 25. 24. 23. 22. 21. 20. 19. 18. 17. 16. 15. 14. 13. 12. 11. 10. 9. 8. 7. 6. 5. 4. 3. s e l c i t r A 

Ohno, S. Ohno, ( mouse the of X-chromosome the in action Gene M.F. Lyon, single a by chromatin sex the of Formation R. Kinosita, W.D.& Kaplan, S., Ohno, W.Xie, H. Yang, Y. Kim, F.A.Wright, F.Zou, Zhang, Z. multi-alignment novel A W.Wang, & L. McMillan, C.Y., Kao, J., Holt, S., Huang, Outbred Diversity The K.L. Svenson, & S.C. Munger, D.M., Gatti, G.A., Churchill, Cross Collaborative the of architecture genome The Consortium. Cross Collaborative mammals. W.C.Skarnes, in imprinting Gametic D.P. Barlow, fire. under studies RNA E.C. Hayden, T.Babak, A. Goncalves, H. Okae, the in genes imprinted novel for survey A A.G. imprinted Clark, & P.D. of Soloway, X., Wang, evaluation Critical T. Babak, & D. Kooy, der van B., DeVeale, parent-of-origin Sex-specific C. Dulac, & D. Butler,Haig, J., J.E., Zhang, C., Gregg, C. Gregg, X. Wang, D.L. Nicolae, A.P.,Boyle, M.A., Schaub, Snyder,& S. Batzoglou, A., disease Kundaje, Linking M. L.A. Hindorff, Maurano, M.T. T.M.Keane, X. Gan, soito lc fr ua dsae ad traits. and diseases human for loci association DNA. regulatory in regulation. thaliana Nature expression. imprinted placenta-specific with mRNA-seq. by placenta mouse perspective. new a (2012). RNA-Seq: by expression gene brain. mouse the in expression allelic brain. mouse brain. mouse neonatal GWAS. from discovery enhance to (2012). 1748–1759 (1959). of cells liver in X-chromosome genome. mouse the in methylation DNA Genet. Nat. Transl.Psychiatry Genet. Nat. F 29 spurious transcriptome inference due to RNAseq reads misalignment. (2014). data. sequencing high-throughput for pipeline population. mouse population. reference genetic mouse function. gene mouse (1995). Biol. Curr. expression. gene mouse (2012). genome. human the in information regulatory with associations (2009). 9362–9367 1 reciprocal crosses and inbred lines. inbred and crosses reciprocal , 291–299 (2013). 291–299 ,

190 et al. et et al. et . t al. et t al. et et al. et Nature et al. et Sex Chromosomes and Sex Linked Genes Linked Sex and Chromosomes Sex et al. et al. et

et al. et t al. et 18 Nature

, 372–373 (1961). 372–373 , et al. et 43 46 et al. et Base-resolution analyses of sequence and parent-of-origin dependent parent-of-origin and sequence of analyses Base-resolution A novel statistical approach for jointly analyzing RNA-Seq data from data RNA-Seq analyzing jointly for approach statistical novel A mt-nlss f ee xrsin uniaie ri lc i brain. in loci trait quantitative expression gene of meta-analysis A , 1735–1741 (2008). 1735–1741 , Science et al. et et al. et et al. et et al. et Subspecific origin and haplotype diversity in the laboratory mouse. laboratory the in diversity haplotype and origin Subspecific Re-investigation and RNA sequencing-based identification of genes of identification sequencing-based RNA and Re-investigation GeneScissors: a comprehensive approach to detecting and correcting et al. utpe eeec gnms n tasrpoe for transcriptomes and genomes reference Multiple High-resolution analysis of parent-of-origin allelic expression in the in expression allelic parent-of-origin of analysis High-resolution , 648–655 (2011). 648–655 , (2014). 430–437 , Global survey of genomic imprinting by transcriptome sequencing. transcriptome by imprinting genomic of survey Global

Transcriptome-wide identification of novel imprinted genes in genes imprinted novel of identification Transcriptome-wide 477 Heritability and genomics of gene expression in peripheral blood. peripheral in expression gene of genomics and Heritability

Mouse genomic variation and its effect on phenotypes and gene and phenotypes on effect its and variation genomic Mouse

4 477 Trait-associated SNPs are more likely to be eQTLs: annotation eQTLs: be to likely more are Trait-associatedSNPs Mamm. Genome Mamm. Extensive compensatory Extensive Science Potential etiologic and functional implications of genome-wide of implications functional and etiologic Potential , e459 (2014). e459 , A conditional knockout resource for the genome-wide study of study genome-wide the for resource knockout conditional A Systematic localization of common disease-associated variation , 419–423 (2011). 419–423 ,

Nature 329 , 289–294 (2011). 289–294 , PLoS ONE PLoS Genome Res. Genome , 643–648 (2010). 643–648 ,

337

474 Genetics , 1190–1195 (2012). 1190–1195 , ats norvegicus Rattus

, 337–342 (2011). 337–342 , 3 PLoS Genet. PLoS

, e3839 (2008). e3839 , 23 Genetics

Nature Science 22 , 713–718 (2012). 713–718 , Genetics

Cell 189 , 2376–2384 (2012). 2376–2384 , cis

, 109–122 (2011). 109–122 , 148

484

190 - 329 u. o. Genet. Mol. Hum.

trans aaae (Oxford) Database

6 rc Nt. cd Si USA Sci. Acad. Natl. Proc. 197 , e1000888 (2010). e1000888 , , 816–831 (2012). 816–831 , , 428 (2012). 428 , . , 389–401 (2012). 389–401 , (Springer Verlag, 1967). Verlag, (Springer , 682–685 (2010). 682–685 , x. el Res. Cell Exp. regulation in the evolution of evolution the in regulation , 389–399 (2014). 389–399 , Science LS Genet. PLoS

u musculus Mus 270 Genome Res. Genome

2014 , 1610–1613 1610–1613 , 21 Bioinformatics 8 18 e1002600 , 548–558 , Arabidopsis 415–418 , bau057 ,

106 L.).

22 , ,

51. 50. 49. 45. 44. 43. 42. 41. 40. 39. 38. 37. 36. 35. 34. 33. 32. 58. 57. 56. 55. 54. 53. 52. 48. 47. 46.

hrhno PV, i R & ak PJ Eiec fr oae opnain between compensation dosage for Evidence P.J. Park, & R. P.V., Xi, Kharchenko, X. Deng, Y. Xiong, Carrel, L. & Willard, H.F. X-inactivation profile reveals extensive variability in X-linked A.M. Lopes, property intrinsic an is inactivation chromosome X from Escape L. Carrel, & N. Li, X from escape of survey Global C.M. Disteche, & J. F.,Yang,T.,Shendure, Babak, C.M. Johnston, H. Lin, of comparisons mutation: germline of basis Biological W.R. Lee, & J.B. Drost, A. Kong, E.E. Schadt, A.L. Price, patterns Inheritance G.A. Y.Woo,Shockley,Churchill, K.R., J., & Affourtit, X., Cui, D.L. Aylor, Calaway,J.D. Influence 3. X-chromosome. mouse the in elements Controlling B.M. Cattanach, u, N.K. Jue, chromosomes. sex the of compensation Dosage C.M. Disteche, X mammalian D. Brawand, in reduction Expression X. He, & J. Zhang, K., Xing, F., Lin, X. He, hyperactivation X-chromosome J.T. Lee, S.F.& Pinter, R.I., Sadreyev, E., Yildirim, H. Lin, V.Gupta, chromosome X active the of compensation Dosage C.M. Disteche, & D.K. Nguyen, and mice in inactivation X from Escape C.M. Disteche, F.& Yang, J.B., Berletch, gene expression in females. in expression gene Turner of basis molecular the and inactivation X Syndrome. for implications mouse: the the of mouse. in RNA-sequencing by inactivation complete. virtually is compensation dosage that genes. X-linked of silencing among Mutagen. Mol. Environ. rates mutation germline spontaneous risk. disease Nature (2011). individuals. unrelated or related in identity-by-descent F in levels transcript of Cross. mouse. (1970). 150 (2013). 150 approach. analytical on dependent is RNA-seq by chromosome 46 Nature USA Sci. compensation. Acad. Natl. dosage of hypothesis Ohno’s refutes evolution chromosome Biol. Mol. Struct. Nat. in mammals via nonlinear relationships between chromatin states and transcription. (2011). 1171–1172 cells is consistent with Ohno’s hypothesis. (2011). 
1171–1172 reply the X chromosome and autosomes in mammals. (2011). 1179–1185 mammals, in X-chromosome. (2006). 3 mammals. in humans. pn oh at o a X iie b rearrangement. by divided X an of parts both upon , 537–560 (2012). 537–560 ,

Jarid1c Genome Res. Genome et al. et 422 478 PLoS Genet. PLoS et al. et t al. et et al. et Genome Biol. Genome t al. et et al. et t al. et BMC Genomics BMC t al. et et al. et et al. et , 297–302 (2003). 297–302 , , 343–348 (2011). 343–348 , He et al. et et al. et Relative overexpression of X-linked genes in mouse embryonic stem embryonic mouse in genes X-linked of overexpression Relative Nature et al. et Caenorhabditis elegans Caenorhabditis Nat. Genet. Nat. et al. et locus. Evidence for compensatory upregulation of expressed X-linked genes X-linked expressed of upregulation compensatory for Evidence oae opnain n h mue aacs prglto and up-regulation balances mouse the in compensation Dosage ae of Rate Global analysis of X-chromosome dosage compensation. dosage X-chromosome of analysis Global N sqecn sos o oae opnain f h active the of compensation dosage no shows sequencing RNA Nat. Genet. Nat. a t al. et Genetic analysis of complex traits in the emerging Collaborative emerging the in traits complex of analysis Genetic Single-tissue and cross-tissue heritability of gene expression via expression gene of heritability cross-tissue and Single-tissue et al. et eemnto o dsg cmesto o te amla X mammalian the of compensation dosage of Determination Transcriptional changes in response to X chromosome dosage in dosage chromosome X to response in Transcriptionalchanges DVANCE ONLINE PUBLICATION ONLINE DVANCE Genetics of gene expression surveyed in maize, mouse and man. and mouse maize, in surveyed expression gene of Genetics The evolution of gene expression levels in mammalian organs. mammalian in levels expression gene of evolution The Genetic architecture of skewed X inactivation in the laboratory the in inactivation X skewed of architecture Genetic

488 21

9 ag-cl pplto suy f ua cl lns indicates lines cell human of study population Large-scale Proc. Natl. Acad. Sci. USA Sci. Acad. Natl. Proc. reply. 109

, e1003853 (2013). e1003853 ,

19

1 , 1213–1222 (2011). 1213–1222 , 11 25 e novo de hybrid mice. hybrid , 471–475 (2012). 471–475 ,

, 56–61 (2012). 56–61 , , 11752–11757 (2012). 11752–11757 , 38 , 213 (2010). 213 ,

(suppl. 26), 48–64 (1995). 48–64 26), (suppl.

11 42 PLoS Biol. PLoS Nature , 47–53 (2006). 47–53 , Nat. Genet. Nat. , 82 (2010). 82 , , 1043–1047 (2010). 1043–1047 , uain ad h iprac o fte’ ae to age father’s of importance the and mutations

434 and Genetics

5 , 400–404 (2005). 400–404 ,

Drosophila melanogaster Drosophila , e326 (2007). e326 , 43 Nat. Genet. Genome Res. Genome , 1171–1172 (2011). 1171–1172 ,

Nat. Genet. 174 Drosophila PLoS Genet. PLoS

105 , 627–637 (2006). 627–637 ,

43 , 17055–17060 (2008). 17055–17060 , ee. Res. Genet. PLoS Genet. PLoS

, 1169–1170 author reply

20

Nature Ge Nature mue ad human. and mouse, , 43 , 614–622 (2010). 614–622 ,

, 1167–1169 author 4 BMC Genomics BMC , e9 (2008). e9 , Annu. Rev. Genet. Rev. Annu. . Nat. Genet. Nat.

16 7 , e1001317 , 293–301 , J. Biol. J. n etics Proc.

14 43

5 , , ,

ONLINE METHODS
doi: 10.1038/ng.3222

Ethical statement. All mouse work was conducted in compliance with the Guide for the Care and Use of Laboratory Animals (Institute of Laboratory Animal Resources, National Research Council, 1996) and approved by the Institutional Animal Care and Use Committee of the University of North Carolina.

Mice. The mice used in this study were inbred and reciprocal F1 hybrids of the wild-derived strains CAST/EiJ, PWK/PhJ and WSB/EiJ. All mice were bred at the University of North Carolina from mice that were fewer than six generations removed from founders acquired from the Jackson Laboratory. Mice were maintained on a 14-h light, 10-h dark schedule with lights turned on at 6 a.m. The housing room was maintained at 20–24 °C with 40–50% relative humidity. Mice were housed in standard 20 cm × 30 cm ventilated polysulfone cages with laboratory-grade Bed-O-Cob bedding. Water and Purina Prolab RMH3000 were available ad libitum. A small section of PVC pipe and nestlet material were present in each cage for enrichment.

Tissue collection. Mice were killed at 23 ± 1 d of age by cervical dislocation without anesthesia (to avoid confounding effects on gene expression). All mice were euthanized between 10 and 12 a.m., immediately after removal from their home cage. Whole brain, liver (left lobe), kidneys (both) and lungs (all lobes) were rapidly dissected, snap frozen in liquid nitrogen and pulverized using a BioPulverizer unit (BioSpec Products).

RNA extraction. Total RNA was extracted from ~25 mg of powdered tissue using automated instrumentation (Maxwell 16 Tissue LEV Total RNA Purification Kit, Promega). RNA concentration was measured by fluorometry (Qubit 2.0 Fluorometer, Life Technologies), and RNA quality was verified using a microfluidics platform (Bioanalyzer, Agilent Technologies).

RNA-seq: sample preparation. The 96 samples were randomized to batches of 48 for library preparation using the Illumina TruSeq RNA Sample Preparation Kit v2 with 12 unique indexed adaptors (AD001–AD012). One microgram of total RNA per sample was used as input, and each sample was assigned at least two different barcodes. Libraries were quantified using fluorometry, and 12 randomly selected samples were pooled at equimolar concentrations before sequencing, yielding a total of eight multiplexed pools. The Illumina HiSeq 2000 was used to generate 100-bp paired-end reads. To account for lane and machine effects in sequence quality and cluster density, each sample was divided into four portions, and each portion was randomly assigned to one lane of one machine. The 384 portions (4 × 96 samples) can be partitioned into 18 groups (3 × 3 × 2) for each combination of paternal strain, maternal strain and sex. χ² tests confirmed no significant associations between these group indicators and assignments of barcodes or to sequencing lanes.

RNA-seq: alignment. We developed a customized RNA-seq alignment pipeline for mouse subspecies containing considerable genetic diversity (refs. 22–24). This approach has the advantage of incorporating all known strain-specific genetic variants into the alignment reference sequence to improve alignment quality and to minimize bias caused by differences in the genetic distance between the parental genomes and the reference sequence. First, reads from each F1 hybrid (six of the nine cells in the diallel) were aligned to the appropriate 'pseudogenomes' (ref. 22), representing each of the parental genomes using TopHat (v1.4; default parameters including segment length of 25 bp, two mismatches allowed per segment, two mismatches allowed per 100-bp read and a maximum indel of 3 bases). Pseudogenomes were approximations constructed by incorporating all known SNPs and indels into the C57BL/6 genome (mm9). We included all variants reported by a large-scale sequencing effort that included CAST/EiJ, PWK/PhJ and WSB/EiJ (June 2011 release). Second, we mapped coordinates from the pseudogenome-aligned reads to mm9 coordinates by updating the alignment positions and rewriting the CIGAR strings of each aligned read (which was necessary as indels alter the pseudogenome coordinates relative to mm9). Third, we annotated each aligned read to indicate the numbers of maternal and paternal alleles (SNPs and indels) observed in a given read and its paired-end mate.
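
As a rough illustration of the third step, the sketch below counts the maternal and paternal alleles observed across a read and its paired-end mate. It is not the authors' pipeline: the function name, the 'M'/'P' labels and the position-to-allele dictionaries are assumptions made for this example, as is the choice to count a site covered by both mates only once when the two mates agree.

```python
from collections import Counter


def annotate_pair(read_calls, mate_calls):
    """Count maternal ('M') and paternal ('P') alleles seen across a read pair.

    Each argument maps a variant position to the parental allele observed
    there (hypothetical representation). A site covered by both mates with
    the same allele is counted once.
    """
    observed = set(read_calls.items()) | set(mate_calls.items())
    counts = Counter(allele for _, allele in observed)
    return counts["M"], counts["P"]


# The first mate overlaps no variant; its partner carries two maternal alleles.
pair_counts = annotate_pair({}, {1050: "M", 1093: "M"})
```

In this example the first mate overlaps no heterozygous site, yet the pair is still informative because its mate carries two maternal alleles.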

Considering paired-end mates allowed the use of more paired-end reads in determining ASE. Finally, alignments to maternal and paternal pseudogenomes were merged by computing the proper union of the separate alignments (i.e., a read aligning to the same position in both alignments was counted once). This final step was applied separately to all the lanes of a sample, and the resulting alignment files were combined into a single alignment file. For inbred mice, only a single pseudogenome alignment was necessary, followed by the same remapping and annotation stages.

After alignment, we performed a series of quality control checks capitalizing on the expectations for the proportions of reads that should align to each parental strain for the sex chromosomes, autosomes and mitochondrial genome. Ninety samples passed quality control.

RNA-seq: read assignment. Three counts were obtained for each gene assessed with RNA-seq: the total number of paired-end reads (for both inbred and F1 mice; total read count, or TReC) and the numbers of maternal and paternal allele-specific paired-end reads (only for F1 hybrids; allele-specific read count, or ASReC). A paired-end read was allele specific if at least one end overlapped either a SNP or indel that was heterozygous between the maternal and paternal strains. If a paired-end read overlapped more than one heterozygous SNP or indel, it was assigned to a parent only if the alleles it reported were fully consistent (all from one parent and none from the other). We counted the number of reads mapped to a gene as the number of paired-end reads that overlapped the exonic regions of a gene using the R function isoform/countReads. Exon position information was assigned on the basis of transcriptome annotations from Ensembl (Release 66, based on mm9; accessed 14 February 2012). There was no need to correct for gene length because all analyses were gene specific and gene length was thus constant in comparisons of the expression of that gene across samples. We included the total number of reads for each sample as a covariate.

RNA-seq: statistical analysis. Statistical analysis is described in detail in Zou et al. as well as in the Supplementary Note.

Microarray: processing and quality control. Brain, liver, kidney and lung RNA from the same mice used for RNA-seq was hybridized to Affymetrix Mouse Gene 1.1 ST 96-Array Plate arrays using a GeneTitan instrument from Affymetrix according to the manufacturer's protocols. We used the robust multiarray average (RMA) method implemented in the Affymetrix gene expression console with default settings (median polish and sketch-quantile normalization) to estimate the normalized expression levels of transcripts. During normalization, we masked 78,632 probes (~10% of all probes) containing any known SNPs in the 3 mouse inbred strains. We used 28,310 probe sets after excluding control probe sets and those without mRNA annotation. To evaluate the overall performance of the arrays, we applied hierarchical clustering using the R function hclust with the average link function and principal-component analysis (PCA). For inbred strains and reciprocal F1 crosses between the inbred strains, we fitted linear fixed-effect models for each transcript to test for strain, parent-of-origin, dominance and sex effects (full details are provided below).

Microarray: statistical analysis. For inbred strains and reciprocal F1 crosses between the inbred strains, we fitted linear fixed-effect models for each transcript to test for strain, parent-of-origin, dominance and sex effects as follows:

$$y = b_0 + b_1\,\mathrm{strain} + b_2\,\mathrm{parent\ of\ origin} + b_3\,\mathrm{dominance} + b_4\,\mathrm{sex} + b_5\,\mathrm{strain} \times \mathrm{sex} + b_6\,\mathrm{parent\ of\ origin} \times \mathrm{sex} + b_7\,\mathrm{dominance} \times \mathrm{sex} + b_8\,\mathrm{plate} + b_9\,\mathrm{dissection}$$

where "strain" is a vector for comparisons of two inbred strains, "parent of origin" is a vector for comparisons of reciprocal F1 crosses, "dominance" indicates F1 crosses, "sex" indicates female, "plate" is a categorical variable indicating multiple 96-well plates and "dissection" is a categorical variable
indicating different dissection dates. We tested the strain, parent-of-origin, dominance and sex effects as follows:

Strain effect: $H_0: b_1 = b_5 = 0$ vs. $H_1: b_1 \neq 0$ or $b_5 \neq 0$
Parent-of-origin effect: $H_0: b_2 = b_6 = 0$ vs. $H_1: b_2 \neq 0$ or $b_6 \neq 0$
Dominance effect: $H_0: b_3 = b_7 = 0$ vs. $H_1: b_3 \neq 0$ or $b_7 \neq 0$
Sex effect: $H_0: b_4 = b_5 = b_6 = b_7 = 0$ vs. $H_1: b_4 \neq 0$ or $b_5 \neq 0$ or $b_6 \neq 0$ or $b_7 \neq 0$

For multiple-testing correction, we used the false discovery rate (FDR) and declared tests to be significant if the q value was <0.05.

Paternal expression bias in RNA-seq data. To quantify the paternal expression bias shown in Figure 6a, we permuted a random subset of 1,000 genes (minus known imprinted genes) 2,000 times. We used a random subset of genes to avoid P-value inflation due to possible correlation between genes (this test is therefore conservative yet still highly significant). For each random subset of 1,000 genes, we tested whether the paternal expression proportion was different from the expected 50% using a Wilcoxon rank-sum test (we used the median result from 2,000 simulations). These tests were performed separately for each combination of cross and sex (and were significant in each case) and then collapsed into one P value using Fisher's combined probability test.

This parent-of-origin effect on allelic imbalance was observed on every autosome, and there was evidence of clustering. To quantify the magnitude of the clustering shown in Figure 6, we performed the following procedure. For each cross and each sex, we checked whether neighboring genes had the same direction of parent-of-origin effect. We recorded the proportion of such genes within each chromosome after pooling results from three crosses and both sexes. Then we compared these chromosome-wise proportions with what would be expected under the null: $p^2 + (1 - p)^2$, where $p$ is the proportion of paternally overexpressed genes for the corresponding chromosome. We found that, for 15 of 19 autosomes, the observed proportion was higher than expected.

To further explore this clustering, we calculated the distance from the TSS to the nearest CpG island for all 467 genes that were consistently paternally overexpressed and all 116 genes that were consistently maternally overexpressed. We compared these distances to those for the remainder of expressed genes (with inconsistent parental expression) to generate respective distributions. Paternally overexpressed genes tended to be closer to CpG islands than inconsistent genes, and maternally overexpressed genes tended to be further away from CpG islands (Fig. 6). To formally test the significance of this difference, we randomly sampled 467 and 116 genes from the whole gene list and calculated
the mean squared deviation of the curves. We repeated this procedure 100,000 times and calculated the P value as the proportion of times where the mean squared deviation from randomly sampled genes was larger than the one from the unperturbed data. The resulting P values were <1 × 10^−5 for paternally overexpressed genes (out of 100,000 permutations, none was as extreme as the empirical result) and 1 × 10^−5 for maternally overexpressed genes.

Relationship between paternal expression bias and DNA methylation. We tested whether genes with consistent overexpression from the paternal allele were closer to CpG islands with parent-of-origin bias in methylation (Supplementary Fig. 10). To accomplish this, we used a data set (GSE33722; Gene Expression Omnibus) from a recent publication by Xie et al. (ref. 28). This data set consists of whole-genome brain DNA methylation data from reciprocal hybrids of 129X1/SvJ and Cast/EiJ mice. Because this data set included just one mouse per reciprocal cross, we first integrated CpG methylation counts over each CpG island and applied a simple filter criterion: if both mice had a maternal methylation proportion higher than the paternal proportion, we declared this CpG island to be preferentially maternally methylated, for the purposes of this analysis. Likewise, if both mice had a paternal methylation proportion higher than the maternal proportion, we declared the CpG island to be preferentially paternally methylated. The remaining CpG islands with no preferential methylation were used as a reference group.

Next, we calculated the distance from each gene's TSS to the closest CpG island for each parentally biased methylation group and examined the distribution of these distances with respect to parental overexpression. In other words, we examined distributions for all combinations of methylation group (maternal, paternal and other) and overexpressed group (paternal and maternal), six combinations in total. To avoid bias due to differential CpG island count per group, we calculated distance to a down-sampled subset equivalent to the smallest group, and to make the result more robust we used 10,000 permutations of the median distance between the TSS and the closest CpG island. A comparison of consistently paternally overexpressed genes and inconsistently expressed genes, using the following log ratio: log10 (paternally expressed: TSS to nearest CpG island (bp)/inconsistently expressed: TSS to nearest CpG island (bp)), is shown in Supplementary Figure 10. In short, this plot examines whether consistent paternally overexpressed genes tended to be closer than inconsistent genes to each class of CpG island. We found that paternally overexpressed genes had the greatest enrichment for maternally methylated CpG islands (permutation P = 0), followed by paternally methylated CpG islands (P = 0.0034). This greater enrichment for maternal over paternal methylated CpG islands was itself also significant (P = 0.0015).
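
The filter criterion and the log ratio described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the (maternal, paternal) tuple representation are hypothetical, and the down-sampling and 10,000-permutation steps are omitted.

```python
import math
from statistics import median


def classify_island(mouse_a, mouse_b):
    """Classify one CpG island from two mice (one per reciprocal cross).

    Each mouse is a (maternal_proportion, paternal_proportion) pair of
    methylation proportions integrated over the island (assumed layout).
    """
    mice = (mouse_a, mouse_b)
    if all(mat > pat for mat, pat in mice):
        return "maternal"
    if all(pat > mat for mat, pat in mice):
        return "paternal"
    return "other"  # no preferential methylation: reference group


def log10_distance_ratio(paternal_bp, inconsistent_bp):
    """log10 of the median TSS-to-island distance, paternal over inconsistent."""
    return math.log10(median(paternal_bp) / median(inconsistent_bp))


# Both mice show higher maternal methylation, so the island is "maternal".
label = classify_island((0.8, 0.2), (0.7, 0.3))
```

A negative log ratio in this sketch means that the paternally overexpressed genes sit closer to the CpG islands of that class than the inconsistently expressed genes do.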