Articles

Analysis of mammalian gene function through broad-based phenotypic screens across a consortium of mouse clinics

A full list of authors and affiliations appears at the end of the paper.

Received 2 February; accepted 25 June; published online 27 July 2015; doi:10.1038/ng.336

Nature Genetics ADVANCE ONLINE PUBLICATION
© 2015 Nature America, Inc. All rights reserved.

The function of the majority of genes in the mouse and human genomes remains unknown. The mouse embryonic stem cell knockout resource provides a basis for the characterization of relationships between genes and phenotypes. The EUMODIC consortium developed and validated robust methodologies for the broad-based phenotyping of knockouts through a pipeline comprising 20 disease-oriented platforms. We developed new statistical methods for pipeline design and data analysis aimed at detecting reproducible phenotypes with high power. We acquired phenotype data from 449 mutant alleles, representing 320 unique genes, of which half had no previous functional annotation. We captured data from over 27,000 mice, finding that 83% of the mutant lines are phenodeviant, with 65% demonstrating pleiotropy. Surprisingly, we found significant differences in phenotype annotation according to zygosity. New phenotypes were uncovered for many genes with previously unknown function, providing a powerful basis for hypothesis generation and further investigation in diverse systems.

Phenotypic annotations of knockout mutants have been generated for about a third of the genes in the mouse genome. However, the way in which the phenotype is screened is often dependent on the expertise and interests of the investigator, and in only a few cases has a broad-based assessment of phenotype been undertaken that encompassed the analysis of developmental, biochemical, physiological and organ systems. Assessing and cataloguing pleiotropy will be critical in beginning to understand the contribution of each gene to metabolic pathways, physiological and organ systems, and disease states and to interpret the effects of these contributions on an understanding of health and disease. Notably, the role of loci identified in human genetics studies will be underpinned by phenotypic analyses in the mouse, which will inform further studies of genetic and physiological systems in humans. Thus, systematic efforts to undertake broad-based phenotyping of mouse mutants and inbred strains will be of great value to understanding the genetic basis for phenotype and disease states.

It is recognized that any large-scale analysis of mammalian gene function by phenotyping of mouse mutants will require a number of important advances in phenotyping approaches and the scientific infrastructure to deliver large-scale data sets, together with the development of robust data acquisition, analysis and display tools. The delivery of a comprehensive functional annotation of the mouse genome is beyond the infrastructure and capacity of a single center, and a multicenter approach will be required. It is therefore vital to develop a phenotyping pipeline that has been validated across multiple centers and is robust to differences in time and place. The EUMORPHIA (European Union Mouse Research for Public Health and Industrial Applications) program reported the development of a set of robust phenotyping tests that was validated across our consortium and has subsequently been used in a variety of phenotyping projects. The European Mouse Phenotyping Resource of Standardised Screens (EMPReSS) database catalogs the standard operating procedures (SOPs) that were developed, including operational details and the parameters measured. More recently, a major single-center effort to analyze several hundred knockout lines through a phenotyping pipeline has illuminated the pleiotropy that can be found and the opportunities to uncover new gene functions.

The EMPReSS SOPs are the foundation for future large-scale phenotyping efforts, and the European Mouse Disease Clinic (EUMODIC) consortium has used a subset of these phenotyping procedures to undertake a multicenter, broad-based effort to characterize the phenotypes of 449 mutant mouse alleles. We report the application of new statistical approaches to the development of an experimental design that maximizes power to detect abnormal phenotypes. We apply Bayesian statistical methodologies for the analysis of the phenotype data acquired, with the aim of controlling the false discovery rate (FDR) and providing robust data on abnormal phenotypes at high confidence.

In summary, we have developed both experimental and statistical approaches for high-throughput, broad-based phenotyping and report here our first multicenter effort to catalog phenotypes for 320 mouse genes. These approaches identify extensive pleiotropy, along with a high discovery rate of abnormal phenotypes for genes with no previous annotation of function. Moreover, for a number of lines, we were able to compare phenotype annotations for homozygotes and heterozygotes, identifying significant differences in phenotype annotation according to zygosity.

RESULTS
The phenotyping pipeline
We have employed the EMPReSSslim pipeline for high-throughput phenotyping analysis, which was developed under the EUMORPHIA program and incorporates a standardized and validated set of tests underpinned by SOPs.

EMPReSSslim (Supplementary Fig. 1) comprises two component pipelines, each incorporating different tests, with a separate cohort of mice analyzed in each pipeline. EMPReSSslim encompasses 20 phenotyping tests, capturing 413 phenotyping parameters. The tests chosen cover a variety of disease-related and biological systems, including the metabolic, bone, cardiovascular, neurological and behavioral, sensory and hematological systems and clinical chemistry.

A statistical power analysis was performed to quantify the standardized effect size for the mutant genotype, $d$, that would be detectable under a variety of experimental workflows and analysis methods, where $d = |\beta_{\mathrm{geno}}^{\mathrm{mutant}} - \beta_{\mathrm{geno}}^{\mathrm{baseline}}| / \sigma$, the absolute difference between the mutant and baseline mean values scaled in units of the phenotypic standard deviation; calculations were based on attaining 80% power under a frequentist linear model with correlated observations (resulting from effects from day of analysis and litter) at a significance level of $1 \times 10^{-7}$, estimated to control the FDR at 5% (Fig. 1a, Supplementary Figs. 2 and 3, and Supplementary Note). This analysis demonstrated that considerable power is to be gained, first, by including, as was the practice in EUMODIC, the entire set of baseline data (for control C57BL/6N wild-type mice) in the analysis and, second, by phenotyping baseline mice on the same days as the mutants, which was achieved for approximately 71% of the data. Given that these two conditions are met, there is little difference in detectable effect size between phenotyping mutants assessed on a single day (the case for 32% of the lines) or across multiple days (the case for 68% of the lines).

In EUMODIC, we used a cohort sample size of 14, consisting of 7 males and 7 females. Under the most powerful design-analysis combination in Figure 1a, increasing the sample size from 14 to 20 mice would decrease the detectable d value from 1.64 to 1.39 (a 15% improvement), whereas decreasing the sample size from 14 to 8 mice would increase d from 1.64 to 2.14 (a 31% increase), illustrating that only a relatively small decrease in detectable effect size would be attained by increasing the sample size above 14. In establishing a minimum target number of baseline mice, we propose analysis over at least 50 d with mice from two or more litters represented on each day, as this approach provides relatively precise estimation of variance components in the multilevel model (Supplementary Fig. 3). In power calculations, a reduction in the number of days over which baseline mice were analyzed from 100 to 50 only increased the estimated detectable d value from 1.64 to 1.68 (a 3% increase).

Figure 1 Detectable effect size and the distribution of annotations. (a) Effect size, d, is shown as a function of sample size, under a variety of experimental workflows and analysis approaches. The two qualitative design choices under consideration were (i) whether mutant mice were phenotyped across multiple days with four mice assessed per day or all mice were phenotyped on a single day and (ii) whether baseline mice were phenotyped on the same day(s) as mutants (that is, whether the mutants were 'accompanied'). Two analytical approaches were compared: (i) analysis of all baseline data (all data) and (ii) analysis restricted to baseline data from mice phenotyped on the same day(s) as mutants (accompanied data only). Calculations were based on attaining 80% power while controlling the FDR at 5%. The variance components used in the power calculations were taken as the average estimates across all parameters and procedures: the variance proportion was 0.18 for effect from day of analysis, 0.12 for litter effects and 0.69 for residual effects (Supplementary Fig. 2 shows similar plots for procedure-specific variance components). (b) Histogram of the number of annotations per line, with each bar split into counts arising from homozygous and heterozygous lines. (c) Histogram of the number of annotations within each top-level MP ontology term, with each bar split into numbers arising from mutant lines with or without annotations in MGI.

Generation and assessment of viability and fertility of mouse mutants
Embryonic stem (ES) cell lines from the European Conditional Mouse Mutagenesis Program (EUCOMM) resource were injected to generate chimeras. Following the recovery of germline-transmitting progeny, for the majority of lines, heterozygotes were intercrossed to generate homozygous mutants. Of the lines analyzed (303), 187 were intercrossed and assessed for homozygous viability. Where we failed to recover homozygotes in sufficient numbers from the heterozygote intercrosses, we classified lines as either embryonic lethal (no homozygotes recovered from 28 progeny) or subviable (≤13% of 28 progeny). We found that, in total, 65 lines (34.8%) had homozygous mutants that were embryonic lethal and 22 lines (11.8%) had homozygous mutants that were subviable. Four lines (2.1%) showed a reduced lifespan (defined as death after weaning and before the end of a normal lifespan). Where homozygotes were embryonic lethal or subviable, we analyzed heterozygotes through EMPReSSslim. For many of the viable homozygote lines, we also assessed fertility. Of the 153 lines investigated, we found that 2.6% (4/153) showed reduced fertility, including 1 line with reduced fertility in both males and females, 1 line with reduced fertility in females and 2 lines with reduced fertility in males.
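As a rough illustration of how the detectable effect size d depends on the number of mutant mice, the sketch below uses a deliberately simplified setting. It assumes independent observations with unit variance and a two-sided z-test, and it ignores the day-of-analysis and litter correlations that the power analysis in the text accounts for. The function name, the baseline count of 2,000 mice and the two-sided reading of the significance level are assumptions made for illustration only, so the printed values are not expected to match the published figures exactly.

```python
from math import sqrt
from statistics import NormalDist


def detectable_effect_size(n_mutant, n_baseline, alpha=1e-7, power=0.80):
    """Smallest standardized effect d detectable with the requested power.

    Simplifying assumptions (not the model used in the paper):
    independent observations, unit phenotypic variance, two-sided z-test.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    # Standard error of the mutant-minus-baseline mean difference in s.d. units
    standard_error = sqrt(1 / n_mutant + 1 / n_baseline)
    return (z_alpha + z_power) * standard_error


if __name__ == "__main__":
    # Hypothetical baseline pool of 2,000 mice; cohort sizes taken from the text
    for n in (8, 14, 20):
        print(n, round(detectable_effect_size(n, n_baseline=2000), 2))
```

Even in this simplified form, the detectable effect shrinks only with the square root of the cohort size, which is in line with the observation that increasing the sample size above 14 yields a relatively small gain.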

To test the applicability of the new methods, we also analyzed an additional number of mutant lines, including ones with N-ethyl-N-nitrosourea (ENU)-generated and other targeted mutations and gene traps. In many cases, these lines were analyzed as heterozygotes and the appropriate background strain was used as a wild-type control (Online Methods).

Phenotype data acquisition and analysis
Data from mutants and controls analyzed through EMPReSSslim were captured in the EuroPhenome database. In addition, the data have been incorporated into the International Mouse Phenotyping Consortium (IMPC) portal. We have developed and implemented statistical models incorporating a broad range of characteristics common to high-throughput mouse phenotyping data, such as non-Gaussian response distributions, confounding variables, complex correlation structure, systematic drift in measurements over time, outliers and other data anomalies (Fig. [...], Online Methods and Supplementary Note).

Figure 2 Phenotyping variance. Comparison of estimated variance components across centers. Posterior medians (with error bars indicating 95% credible intervals) of total phenotypic standard deviation (top) and proportions of variance (bottom three panels) are shown for each quantitative parameter (labeled above) within each test (labeled below). For visual comparison, the total phenotypic standard deviations for each test were multiplicatively scaled to a mean of 1. AUC, area under the curve; BMD, bone mineral density; BMC, bone mineral content; NK, natural killer; HDL, high-density lipoprotein; LDL, low-density lipoprotein; IPGTT, intraperitoneal glucose tolerance test; BN, background noise; PP, prepulse; PPI, prepulse inhibition.

Phenotyping variance
We explored the potential for differences in phenotyping variance across centers in C57BL/6N control mice by estimating the variance components underlying each transformed quantitative parameter (Fig. 2 and Supplementary Fig. 1). The total phenotyping variance differed considerably across centers for some parameters, but this variation can only be viewed as a potential indication of more or less precise experimental measurement, owing to differences in equipment and, hence, measurement scale across centers. To examine scale-free measures of variation, we estimated the proportion of phenotyping variance attributable to day of analysis, litter and residual effects, with average proportions across all parameters of 18%, 12% and 69%, respectively. Of the three variance proportions, those for litter and residual effects were substantially composed of biological variation between litters and animals. In contrast, the variance proportion due to day of analysis was mainly driven by unmodeled experimental variation and can therefore indicate where experimental procedures could potentially be improved.

The variance proportion due to day of analysis was on occasion systematically higher at a particular center, for example, in the analysis of some calorimetry parameters at Institut Clinique de la Souris (ICS), some open-field parameters at MRC-Harwell and some acoustic startle parameters at Helmholtz Zentrum München (HMGU), with the corresponding variance proportions typically reflecting a day's worth of outlying baseline data. For some procedures, the variance proportion due to day of analysis was generally smaller at some centers than at others, potentially reflecting more consistent experimental protocols at those centers. Reflecting the between-center differences in variance observed, data analyses were restricted to within-center comparisons between controls and mutants.

Figure 3 Heat map comparing the annotations of reference lines across centers. Colors represent scaled genotype effect (posterior median/standard deviation), with blue or red indicating a mutant phenotype with decreased or increased strength relative to baseline mice, respectively. Significant annotations (FDR < 5%) are indicated by a black outline around the corresponding cell. WTSI, Wellcome Trust Sanger Institute. Reference lines shown: Aldh2 (hom), Arhgef4 (hom), Clk1 (hom), Elk4 (het), Elk4 (hom), Entpd1 (hom), Fam134c (hom), Fam63a (hom), Impad1 (het), Kdm4c (hom), Kdm8 (het), Mta3 (hom), Mysm1 (het), Mysm1 (hom), Nfya (het), Setmar (hom), Slc38a10 (hom), Snip1 (het), Sytl1 (hom), Tnfaip1 (hom), Ttll4 (hom) and Xbp1 (het).
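The scale-free variance proportions described for the phenotyping variance analysis (day of analysis, litter and residual effects) can be illustrated with a small sketch. It only shows how absolute variance components convert to proportions and to the implied correlation between two mice sharing a day and a litter; it is not the Bayesian multilevel estimation used by the authors. The function names and the example inputs are hypothetical.

```python
def variance_proportions(var_day, var_litter, var_residual):
    """Convert absolute variance components into scale-free proportions."""
    total = var_day + var_litter + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {
        "day": var_day / total,
        "litter": var_litter / total,
        "residual": var_residual / total,
        "total_sd": total ** 0.5,  # total phenotypic standard deviation
    }


def shared_day_litter_correlation(var_day, var_litter, var_residual):
    """Correlation between two mice measured on the same day from the same litter."""
    total = var_day + var_litter + var_residual
    return (var_day + var_litter) / total


if __name__ == "__main__":
    # Illustrative inputs equal to the average proportions quoted in the text
    components = (0.18, 0.12, 0.69)
    print(variance_proportions(*components))
    print(shared_day_litter_correlation(*components))
```

Working with proportions rather than raw variances sidesteps differences in equipment and measurement scale between centers, which is the motivation given in the text for the scale-free comparison.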

Nature Ge Nature FACS analysis length Heart weight/tibia chemistry Fasted clinical Body weight pressure Non−invasive blood Calorimetry Simplified IPGTT DEXA Open field Modified SHIRPA Grip strength Rotarod Acoustic startleandPPI Hot plate Clinical chemistry Hematology n etics - © 2015 Nature America, Inc. All rights reserved. one-tailed one-tailed in one center but not the other, 158 binomial of exact (71%; 222 cases made was call a where centers of pairs for compared were estimates effect when However, center. one than more in concordantly anno tated were (33%) 61 centers, the of one least at in annotated were that lines 183 the of Indeed, effects. smaller detect to power reduced ( ers cent across detected reproducibly less and weaker were sizes effect two across or ibly a annotated more of number whereas centers, gene demonstrated by,perturbations, for example, phen extreme Relatively of instances. in majority the observed cordance in about 8% of comparisons, between-center consistency was ( heterogeneity no Q statistically significant heterogeneity in 7% of comparisons (Cochran’s approximately 9% the threshold (using of comparisons in heterogeneity phenotypic between-center of levels high exhibit to heterogeneity of measures ( of across the centers effect genotype for each parameter, both visually at two or we more phenotyped estimates line compared each centers, the ( tests of phenotyping the examine reproducibility to between-center centers multiple the across lines mutant reference mutants. and controls between comparisons within-center to restricted were analyses data observed, variance in annotations. three least at with lines the for parameters of a subset displays only map heat the legibility, For red. in are lines non-EUCOMM for Labels cell. corresponding the around outline a black by indicated are < 5%) (FDR annotations Significant respectively. 
mice, baseline to relative strength increased or decreased with phenotype a mutant indicating red 4 Figure Nature Ge Nature the same direction ( Fig. Forelimb andhindlimbgripstrengthnormalizedagainstbodyweight test at FDR < 5%). Sixty-two percent of cases were estimated to have Most notably, EUMODIC analyzed a large set of 22 common common 22 of set large a analyzed EUMODIC notably, Most Fig. Fig. Forelimb andhindlimbgripstrengthmeasurementmean 3 Forelimb gripstrengthnormalizedagainstbodyweight and

Heat map of annotations for the complete data set. Colors represent scaled genotype effect (posterior median/standard deviation), with blue or blue with deviation), median/standard (posterior effect genotype scaled represent Colors set. data complete the for annotations of map Heat Forelimb gripstrengthmeasurementmean 3 and and Supplementary Figs. 4 ≤− P Genotype effect Non−EUCOMM EUCOMM n Start darkcarbondioxideproductio Mean cellhemoglobinconcentration = 1.2 × 10 × 1.2 = 7 etics Light carbondioxideproductio Dark carbondioxideproduction Start darkoxygenconsumptio Mean systolicarterialpressure Mean corpuscularhemoglobin −3 Whole-arena averagespeed Body massafterexperiment Aspartate aminotransferase Periphery permanencetim Periphery distancetraveled Respiratory exchangeratio Light oxygenconsumptio Dark oxygenconsumption Start darkheatproductio Grip strengthbodyweight Alanine aminotransferase Global prepulseinhibitio Periphery averagespeed Whole-arena restingtim Number ofcenterentries Center permanencetim 110dB startlemagnitude Center distancetraveled Prepulse inhibition,PP1 Prepulse inhibition,PP4 Prepulse inhibition,PP2 Prepulse inhibition,PP3 Lactate dehydrogenase Glucose responseAU Percentage centertim Bone area(BMC/BMD) Latency tocenterentry Distance traveled,tota Supplementary Fig. 4 Fig. Supplementary Center averagespeed PP1 startlemagnitude PP3 startlemagnitude PP2 startlemagnitude PP4 startlemagnitude White bloodcellcount Periphery restingtim Number ofrears,tota BMD (excludingskull) Light heatproduction Time offirstresponse 3 0 Alkaline phosphatase Dark heatproductio Red bloodcellcount Mean latencytofall Center restingtim DEXA bodyweight Locomotor activit Lean/Body weight BMC/Body weight Mean cellvolum HDL−cholesterol Total cholesterol Total cholesterol Mean pulserat Fat/Body weight Free fattyacids I

Platelet count Triglycerides Triglycerides Heart weight Total protein Phosphorus Hemoglobin 2 α Lean mass Supplementary Supplementary Fig. 5 Hematocrit Potassium Creatinine −amylase ≥ Fat mass = 0); thus, although there was considerable dis considerable was there although thus, 0); = Chloride Glucose Glucose Glycerol Albumin Calcium ADVANCE ONLINE PUBLICATION ONLINE ADVANCE Sodium 7 BMC Urea Iron C n e n n n n n n e e e e e e e y l l –10 Mysm1 (hom) Phex (hom) Umod (hom) ) displayed estimates of genotype effect in in effect genotype of estimates displayed ) Aco2 (het) Plcg2 (het)

Q Cidec (hom) Frmd5 (hom) Abcb11 (hom) Arvcf (hom) and and Ndufaf6 (het) Xbp1 (het) Optn (hom) Ndufa8 (het) HMGU Arhgef4 (hom) Hprt (hom) and 8430419L09Rik (hom) Hsf2bp (hom) Slc38a10 (hom) I Spryd3 (hom)

2 Dock10 (hom) Elk4 (hom) (ref. (ref. Dpp9 (het)

5 Rxfp2 (hom) Gsk3a (hom) ), consistent with there being being there with consistent ), ) and using two meta-analytical Trnt1 (het) Golm1 (hom) Entpd1 (hom) Wrnip1 (hom) Usp12 (hom)

). ). Overall, the data from the Cap2 (hom)

...the data between the centers highlight the concordance of reference lines while emphasizing the possibility of false negative results.

Figure 4 Heat map of phenotypic annotations from 449 mutant lines. Mutant lines are labeled by gene and zygosity (hom, het) and grouped by phenotyping center (panel labels include MRC-Harwell, WTSI and ICS). Tests shown: non-invasive blood pressure, calorimetry, simplified IPGTT, DEXA, open field, modified SHIRPA, grip strength, rotarod, acoustic startle and PPI, hot plate, clinical chemistry, hematology, heart weight/tibia length, fasted clinical chemistry and body weight.

Phenotype data
Thus far, we have accumulated phenotype data on 27,707 mice. In total, we generated 9,019,984 data points and ascribed 2,947 phenotype annotations to 449 mutant mouse alleles and 320 genes. The global representation of significant phenotypes in a heat map (Fig. 4) enables the visualization of consistent trends in significant hits across centers. In addition, this global heat map highlights a number of lines with multiple hits across non-correlated tests (for example, acoustic startle and dual-energy X-ray absorptiometry (DEXA)) and within a single test (for example, open field), as would be expected from a test measuring different aspects of the same phenotype. Moreover, it is apparent from the heat map that broad phenotypic effects are often but not always associated with a body weight phenotype.

We identified 2,316 non–body weight parameter annotations at an estimated annotation FDR of 2.2%. We found that 374 of the 449 mutant alleles representing 320 genes (83%) had at least one parameter annotation, at an estimated FDR per line of 11%. Multiple testing across several hundred parameters in a line causes the FDR for that line (11%) to be greater than the annotation FDR (2.2%). We found that 133 of 448 lines (30%) had at least one body weight parameter annotated, at an estimated FDR for the line of 5%. Sixty-five percent of lines (290/449) had more than one phenotypic hit. Overall, pleiotropy was effectively demonstrated with the pipelines used.
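The proportions quoted in the Phenotype data section can be re-derived from the counts given there. The following is a minimal sketch for checking that arithmetic; the dictionary labels and the `percent` helper are illustrative names and are not taken from the paper.

```python
# Counts quoted in the Phenotype data section: (numerator, denominator, quoted %).
# The labels are illustrative only, not identifiers used by the authors.
reported = {
    "alleles with at least one parameter annotation": (374, 449, 83),
    "lines with at least one body weight annotation": (133, 448, 30),
    "lines with more than one phenotypic hit": (290, 449, 65),
}


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


for label, (hits, total, quoted) in reported.items():
    print(f"{label}: {hits}/{total} = {percent(hits, total)}% (text quotes {quoted}%)")
```

Each recomputed value agrees with the rounded percentage given in the text.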

Figure 5 Phenotyping similarity. Classification of EUMODIC-MGI gene pairs as matched or unmatched on the basis of phenotype similarity. The receiver operating characteristic (ROC) curve plots the proportion of EUMODIC-MGI matched gene pairs correctly classified as matched against the proportion of unmatched gene pairs incorrectly classified as matched, as the phenotype similarity threshold is varied (area under the ROC curve = 0.674).

We also analyzed annotation hit rates according to zygosity. The proportion of lines with at least one annotation was higher for homozygotes at 88% (219 of 248 mutant lines tested) than for heterozygotes at 77% (151 of 197 lines tested), with this difference statistically significant (χ² test, P = 0.002). The mean number of annotations was 8.3 (s.e.m. = 0.8) for homozygotes, significantly higher than the 4.4 annotations (s.e.m. = 0.5) for heterozygotes (negative binomial generalized linear model (GLM), Wald test, P = 6 × 10⁻⁷). Nevertheless, the high annotation hit rate for heterozygotes underscores the usefulness of phenotyping heterozygotes and adds to the catalog of dosage-sensitive genes.

Finally, we assessed the performance of each individual phenotyping test by computing the hit rate for each procedure (Supplementary Fig. 6). First, as expected, the overall hit rates across tests showed considerable variation, ranging from higher rates for clinical chemistry (33%) and body weight (29%) to lower rates for hot-plate testing (4%) and heart weight/tibia length ratio (3%) testing. The distribution of phenotype outputs was similarly reflected in the number of annotations per top-level Mammalian Phenotype (MP) ontology term (Supplementary Fig. 7). Second, there were significant differences in hit rates across centers for 13 of the 20 tests (Fisher's exact test controlling FDR ≤ 5%), with the tendency for hit rates to be relatively high at MRC-Harwell and the Wellcome Trust Sanger Institute and lower at ICS (Supplementary Fig. 6). Variation in hit rates across centers is unsurprising given that a subset of lines, mainly non-EUCOMM lines, was selected on the basis of preexisting phenotypic information at some centers. Phenotypically selected lines are more likely to have broad-effect phenotypes, particularly when pleiotropy is taken into account. The effect of gene choice is illustrated in Figure 4, where a relatively small number of lines, preferentially non-EUCOMM lines (in red), contributed strongly to the annotation sets at MRC-Harwell and the Wellcome Trust Sanger Institute and, to a lesser extent, the HMGU annotation sets. At ICS, however, where non-EUCOMM lines were selected at random with respect to phenotype, there was a lower annotation rate (Supplementary Table 1). Although we attribute the observed differences mainly to effects from gene choice, we investigated the alternative explanation that phenotyping differences across centers could lead to variation in power and thus the annotation hit rate (Supplementary Data Set). Differences in sample size, unmodeled variation in baseline mice and heterogeneity in the phenotyping variance (particularly the proportion of variance due to the day of analysis) explained the variation in hit rate for a few particular parameters, but the extent of these effects was minor relative to the global impact of gene choice.

Homozygote and heterozygote comparisons
For 43 of the mutant genes, we analyzed both homozygotes and heterozygotes to compare phenotype outputs according to zygosity. The heterozygotes accumulated 101 parameter annotations in comparison to 410 parameter annotations for the homozygotes. We found 53 annotations in common between heterozygotes and homozygotes, which were confined to 11 of the 43 lines. Interestingly, we found that the effect sizes for parameters that were identified in both homozygotes and heterozygotes tended to be stronger in homozygotes.
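The zygosity comparison of annotation hit rates (219 of 248 homozygous lines versus 151 of 197 heterozygous lines with at least one annotation) can be approximated with a 2 × 2 χ² test. The sketch below is an assumption about the test variant: the text says only "χ² test", and the Yates continuity correction applied here is our choice, not a detail confirmed by the source. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square statistic and P value (1 d.f.) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates continuity correction (an assumption; the paper does not specify it).
        diff = max(diff - n / 2, 0.0)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper tail of chi-square equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Lines with / without at least one annotation, as reported in the text.
hom_hit, hom_total = 219, 248
het_hit, het_total = 151, 197

stat, p = chi_square_2x2(hom_hit, hom_total - hom_hit, het_hit, het_total - het_hit)
print(f"chi-square = {stat:.2f}, P = {p:.4f}")
```

Under this assumption the statistic is roughly 9.8 and the P value rounds to 0.002, in line with the reported value; without the correction the P value is somewhat smaller.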

Phenotype similarity to published data sets
We assessed the similarity between phenotypes observed in the EUMODIC data set and those for the genes in the Mouse Genome Informatics (MGI) database¹. We investigated the ability to classify EUMODIC-MGI gene pairs as matched or unmatched on the basis of phenotype similarity (Fig. 5) and found the phenotypes observed in EUMODIC to be significantly more similar to the MGI literature-curated phenotypes for alleles of the same gene than they were to phenotypes for alleles of different genes (P = 0.00048; Online Methods).

New gene function identified
Aside from genes with existing phenotype annotations, we analyzed a large class of genes with no previous annotations (Online Methods). Around half of the genes analyzed (179) had no previous annotations in the MGI curated database. We found that for 87.9% (152/179) of the genes in this class we were able to identify perturbed phenotypes. This discovery rate is similar to the overall discovery rate for all mutants in the EMPReSSslim pipeline, demonstrating that the pipeline is as efficient at uncovering phenotypes in mutants with phenotype-poor annotations as in those with phenotype-rich annotations.

For the class of genes with no previous annotations, we have undertaken an analysis to determine whether the new mouse models can provide functional knowledge about the role of human genome-wide association study (GWAS)-discovered loci, genes associated with rare diseases and genes associated with human genetic disorders in Online Mendelian Inheritance in Man (OMIM)¹⁵. Of the 152 genes with significant phenotypes identified by EUMODIC, 21 were orthologs for genes associated with GWAS loci, 20 were orthologs for genes involved in rare diseases in Orphanet¹⁶ and 36 were orthologs of genes associated with genetic disorders in OMIM (Online Methods). We investigated whether the phenotype data from the mice showed concordance with the human disease data (Online Methods). Of the 42 unique genes associated with disease in humans, 14 showed a phenotypic correlation with the mouse gene (Supplementary Table 2), demonstrating that the new mouse models recapitulate phenotypes that correlate with human disease and, in a number of cases, add functional data for known human diseases. In addition, this correlation demonstrates that mouse models are a valuable resource for studying the role of genes of unknown function.

To further investigate the role of the previously uncharacterized genes in disease, we examined three disease areas, including (i) metabolic disorders, comprising diabetes and obesity, (ii) bone and skeleton disorders, and (iii) neurological and behavioral disorders, to determine whether significant phenotype hits in mice can either singly or in combination indicate a potential disease model.
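The matched-versus-unmatched classification summarized by the ROC curve in Figure 5 can be illustrated with a short sketch: sweep a similarity threshold, record the proportion of matched and unmatched gene pairs called "matched" at each threshold, and integrate the resulting curve. The similarity scores below are made-up toy values, and `roc_points` and `auc` are hypothetical helper names; none of this reproduces the authors' similarity measure or data.

```python
def roc_points(matched, unmatched):
    """ROC points (false-positive rate, true-positive rate) as the threshold is lowered."""
    thresholds = sorted(set(matched) | set(unmatched), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # A pair is classified as "matched" when its similarity is at least t.
        tpr = sum(s >= t for s in matched) / len(matched)
        fpr = sum(s >= t for s in unmatched) / len(unmatched)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy phenotype-similarity scores (illustrative only, not EUMODIC or MGI data).
matched_scores = [0.9, 0.7, 0.6, 0.4]    # same gene in both resources
unmatched_scores = [0.8, 0.5, 0.3, 0.2]  # different genes

curve = roc_points(matched_scores, unmatched_scores)
print(f"AUC = {auc(curve):.3f}")  # 0.750 for these toy scores
```

An area of 0.5 corresponds to chance-level classification and 1.0 to perfect separation, which gives context for the reported area of 0.674.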

Figure 6 Analysis of genes with no previous annotations. (a–c) The Venn diagrams illustrate the distribution of genes with relevant phenotype hits in three disease areas: bone and skeleton (a), metabolism (b), and neurological and behavioral (c) disorders. For each area, we identified combinations of tests where a phenotype hit would be indicative of the relevant disease correlate and assigned genes accordingly. A total of 94 genes were identified across the 3 disease areas.

In each case, we identified combinations of tests where a phenotype hit would be indicative of the relevant disease correlate. Subsequently, we analyzed our set of genes with no previous annotations for phenotype hits in each test class and plotted each gene with one or more hits in a Venn diagram (Fig. 6). Our expectation is that genes with multiple hits represent interesting candidates for exploration and further validation. For each disease area, we have identified a large number of interesting candidate disease-associated genes, several of which had effects on diverse disease areas.

Sixty-nine genes exhibited effects on metabolic parameters, identifying a number of new metabolic loci. For example, mice from one mutant line showed reduced fasted blood glucose concentration and area under the glucose response curve, reduced concentrations of various blood lipids and reduced body weight. Classification of genes according to bone and skeletal parameters identified 39 genes, including the solute carrier Slc38a10, which has already been reported as an interesting candidate gene for bone disease¹⁷. Our analysis of the EUMODIC data set also annotated Slc38a10 in the domain corresponding to neurological disorders, providing a typical example of the pleiotropy that is observed by using the phenotyping pipeline. Of the 45 genes in the neurological and behavioral domain, we identified many candidate disease-associated genes. Interestingly, the knockout mouse of Elmod1, a gene with no existing functional information, showed increased activity (as measured in open-field and SHIRPA (SmithKline Beecham, Harwell, Imperial College, Royal London Hospital, phenotype assessment) analyses), a lack of fluidity in gait and increased frequency of trunk curling, reduced grip strength, reduced acoustic startle response at one amplitude and reduced prepulse inhibition across multiple amplitudes.

DISCUSSION
We have demonstrated the feasibility of large-scale, multicenter, broad-based phenotyping of mutant mouse lines for the generation of rich new phenotypic information. There were a number of new experimental and statistical developments that were required to undertake a multicenter approach to the large-scale phenotyping of mouse mutants.

First, a multicenter approach requires the use of robust, validated phenotyping tests, and EUMODIC employed the EMPReSS procedures in a common phenotyping pipeline, EMPReSSslim. In using these procedures, we undertook a statistical power analysis of the experimental design to determine the impact on the effect size of mutant genotypes under a variety of experimental workflows and analysis methods. This analysis underscored the usefulness of employing the entire baseline control set and phenotyping baseline mice on the same day as mutants. This analysis also indicated that reasonable power was provided by a cohort sample size of 14, with only modest power enhancements if the cohort size was increased. Nevertheless, increased power would potentially enhance between-center reproducibility.

Second, we developed and implemented new statistical models that addressed many of the features of large-scale, multivariate mouse phenotyping data sets, aiming to ensure the reproducibility of phenotype calls via the permutation-based control of FDR. In carrying out this analysis, we examined the phenotyping variance attributable to day of analysis, litter and residual effects. Whereas litter and residual effects reflect the biological variation between litters and animals, the variation due to day of analysis reflects experimental variation and was higher or lower for some tests at some centers. These analyses allow the consideration of unwanted variation underlying the reproducibility of phenotyping protocols and feed forward into test improvements in the future.

Third, we employed 22 reference lines to directly test between-center reproducibility. We found high levels of between-center heterogeneity in only 9% of phenotypic comparisons, whereas, in contrast, for 62% of parameters no heterogeneity was observed. These results indicate a high level of concordance for phenotyping tests across centers.

The analysis of the EUMODIC data set identified a significant number of pleiotropic lines, with 65% (290/449) having more than one phenotype annotation. A large number of lines (30% at an FDR of 5%) had at least one body weight parameter annotated, and it is noteworthy that there is strong association between non–body weight annotations and annotations to body weight parameters (Fig. 4). Thus, body weight is a potential early marker for pleiotropic phenotypic effects.

Intriguingly, we found a high phenotype hit rate for heterozygotes (77%), although the hit rates for homozygotes were higher than those for heterozygotes. Thus, analysis of heterozygotes significantly further enriches the data set and provides information on dosage-sensitive loci and their phenotypic effects. In this regard, the 43 lines where comparisons of both homozygotes and heterozygotes were analyzed showed that, although a considerable number of annotations are shared, a number of annotations are unexpectedly specific to heterozygotes. These data imply significant differences in pathway outcomes from the loss of a single copy versus two copies of each gene, and these dosage-sensitive annotations will merit further investigation. Such studies will potentially have a bearing on wider understanding of haploinsufficiency and its contribution to disease in the human population¹⁸.

The phenotype hit rates for genes without any previous annotation underline the value of broad-based phenotyping and the analysis methodologies that we developed. We extended the analysis of this class of genes, aiming to identify new candidate disease-associated genes. For three disease areas (metabolism, bone and skeleton, and neurological and behavior disorders), we identified parameter sets among which hits would be indicative of the relevant disease correlate and assigned genes with annotations to the appropriate disease areas. We identified a large number of genes (94) with single or multiple hits across the parameter sets. Some genes were exclusive to an individual disease area, whereas others had hits in multiple disease areas, reflecting the underlying pleiotropy identified by the program.

Notably, we uncovered new candidate disease-associated genes that merit further investigation. One such gene, Elmod1, belongs to the large class of genes expressed in the brain for which there is little if any functional information (the so-called 'ignorome'; ref. 19). Many of these genes are indistinguishable from well-studied genes in terms of connectivity, network and other protein characteristics. Elmod1 has recently been shown to be involved in auditory function²⁰, but no other functional attributes have been determined. However, Elmod1 is associated with a strong cis expression quantitative trait locus (eQTL) for brain expression, including regional brain expression. Moreover, variation in locomotor activity is known to map to the region of the Elmod1 locus on chromosome 9. Using the EUMODIC pipeline, we have been able to demonstrate the function of Elmod1 in several behavioral traits. Notably, we have also shown that the Elmod1 mutant displays a number of metabolic traits, further elaborating the functional characterization of this largely unexplored gene. This analysis underscores the diversity of hypotheses that might be generated from the development of a genome-wide data set.

In summary, the work described here demonstrates the usefulness of scaling phenotyping efforts from hundreds to thousands of mouse mutants as the international mouse genetics community embarks on the comprehensive annotation of all the protein-coding genes in the mouse genome⁸. Most notably, it provides fundamental insights into the experimental design and statistical analyses that will underpin large multicenter programs to gather and analyze phenotype data. As such, the work reported here paves the way toward a robust reference resource with a well-defined series of mutant alleles and a broad-based phenotyping data set accessible to the scientific community for further in-depth characterization.

URLs. European Mouse Disease Clinic (EUMODIC), http://www.eumodic.org/; European Mouse Phenotyping Resource of Standardised Screens (EMPReSS), http://empress.har.mrc.ac.uk/; EuroPhenome, http://www.europhenome.org/; EuroPhenome library, http://sourceforge.net/projects/europhenome; International Mouse Phenotyping Consortium (IMPC), http://www.mousephenotype.org/.

METHODS
Methods and any associated references are available in the online version of the paper.

Accession codes. The EuroPhenome database is an open access public database. All raw data can be downloaded from http://www.europhenome.org/rawdata.html. Additional information about data access and web services is available from the IMPC website at http://www.mousephenotype.org/.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

ACKNOWLEDGMENTS
The EUMODIC project was funded by European Commission contract number LSHG-CT-2006-037188. The work at the Toronto Centre for Phenogenomics (TCP) was funded under the NorCOMM project by the government of Canada through Genome Canada and Genome Prairie. The Institut Clinique de la Souris (ICS) has been supported by French state funds through the Agence Nationale de la Recherche under the framework program Investissements d'Avenir by ANR-10-IDEX-0002-02, ANR-10-LABX-0030-INRT and ANR-10-INBS-07 PHENOMIN. A full list of members of the EUMODIC consortium who contributed to the goals of the project is available in the Supplementary Note, and a list of partners is available at http://www.eumodic.org/partners.html.

AUTHOR CONTRIBUTIONS
M.H.d.A., K.P.S., Y.H. and S.D.M.B. conceived the study and directed the research. G.N., H. Morgan, A.-M.M. and S.D.M.B. wrote the manuscript. M. Selloum, J.K.W., R.R.-S., T. Sorg, S.W., H.F., M.F., D.J.A., N.C.A., T.A., A.A.-P., D.A.-H., G.A., P.A., S.A., A. Auburtin, A. Ayadi, J. Becker, L.B., E.B., R.B., M.-C.B., J. Bottomley, M.R.B., V.B., D.H.B., J.N.B., J.C.-W., H.C., M.-F.C., P.C., C.C., F.C., G.F.C., R. Combe, R. Cox, E.D., A. Dierich, B.D., A. Duchon, O.E., C.T.E., L.E.F., I.E., J.E., J.F., A.F., A. Gambadoro, L. Garrett, H.G., A.-K.G., L. Glasl, P.G., I.G.D.C., A. Götz, J.G., A. Guimond, W.H., G.H., S.M.H., H.H., T.H., R. Houghton, A.H., B.I., H.J., S.J., H.A.K., S.K., T.K.-R., M.K., T.K., V.L., E.l.M., T.L., A.L., C. McKerlie, J.-L.M., S. Marschall, M.M., H. Meziane, K.M., C. Mittelhauser, L.M., D.M., S. Muller, B.N., F.N., P.M.N., L.M.J.N., M.O., G.P., N.S.P., E.P., B.P.-D., A.P., C.P., P.P., L.P., O.P., D.R., S.R., L.Q.-F., M.M.Q., B.R., I.R., F.R., J. Rossant, M.R., J. Rozman, E.R., J.S., K.-H.S., E.S., A.S., H.S., R.S., M. Stewart, C.S., T. Stöger, M. Sun, D.S., L.T., I. Tilly, G.P.T.-V., M.T., I. Treise, E.V., D.V.-W., C.W., B.W., O.W., M.W., E.W., A. Wolter, W.W., A.Ö.Y., R.Z., A. Zimmer, A. Zimprich and V.G.-D. undertook mouse generation, phenotyping, and data acquisition and assessment from the phenotyping pipelines. G.N., H. Morgan, A.B., A.D.F., T.F., G.G., S.G., J.M.H., R. Hoehndorf, N.J., N.A.K., S.L., C.L., H. Maier, D.G.M., L.S., M. Simon, L.V., A. Walling, M.W.-D., H.W., J.W., C.H. and A.-M.M. developed data tools and databases, and carried out data and statistical analyses.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Blake, J.A. et al. The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse. Nucleic Acids Res. 42, D810–D817 (2014).
2. Brown, S.D., Wurst, W., Kuhn, R. & Hancock, J.M. The functional annotation of mammalian genomes: the challenge of phenotyping. Annu. Rev. Genet. 43, 305–333 (2009).
3. Brown, S.D., Hancock, J.M. & Gates, H. Understanding mammalian genetic systems: the challenge of phenotyping in the mouse. PLoS Genet. 2, e118 (2006).
4. Gailus-Durner, V. et al. Introducing the German Mouse Clinic: open access platform for standardized phenotyping. Nat. Methods 2, 403–404 (2005).
5. Wagner, G.P. & Zhang, J. The pleiotropic structure of the genotype-phenotype map: the evolvability of complex organisms. Nat. Rev. Genet. 12, 204–213 (2011).
6. Simon, M.M. et al. A comparative phenotypic and genomic analysis of C57BL/6J and C57BL/6N mouse strains. Genome Biol. 14, R82 (2013).
7. White, J.K. et al. Genome-wide generation and systematic phenotyping of knockout mice reveals new roles for many genes. Cell 154, 452–464 (2013).
8. Brown, S.D. & Moore, M.W. Towards an encyclopaedia of mammalian gene function: the International Mouse Phenotyping Consortium. Dis. Model. Mech. 5, 289–292 (2012).
9. Brown, S.D., Chambon, P., de Angelis, M.H. & Eumorphia Consortium. EMPReSS: standardized phenotype screens for functional annotation of the mouse genome. Nat. Genet. 37, 1155 (2005).
10. Mallon, A.M., Blake, A. & Hancock, J.M. EuroPhenome and EMPReSS: online mouse phenotyping resource. Nucleic Acids Res. 36, D715–D718 (2008).
11. Skarnes, W.C. et al. A conditional knockout resource for the genome-wide study of mouse gene function. Nature 474, 337–342 (2011).
12. Morgan, H. et al. EuroPhenome: a repository for high-throughput mouse phenotyping data. Nucleic Acids Res. 38, D577–D585 (2010).
13. Koscielny, G. et al. The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data. Nucleic Acids Res. 42, D802–D809 (2014).
14. Higgins, J.P., Thompson, S.G., Deeks, J.J. & Altman, D.G. Measuring inconsistency in meta-analyses. Br. Med. J. 327, 557–560 (2003).
15. McKusick, V.A. Mendelian Inheritance in Man and its online version, OMIM. Am. J. Hum. Genet. 80, 588–604 (2007).
16. Rath, A. et al. Representation of rare diseases in health information systems: the Orphanet approach to serve a wide range of end users. Hum. Mutat. 33, 803–808 (2012).
17. Bassett, J.H. et al. Rapid-throughput skeletal phenotyping of 100 knockout mice identifies 9 new genes that determine bone strength. PLoS Genet. 8, e1002858 (2012).
18. Huang, N., Lee, I., Marcotte, E.M. & Hurles, M.E. Characterising and predicting haploinsufficiency in the human genome. PLoS Genet. 6, e1001154 (2010).
19. Pandey, A.K., Lu, L., Wang, X., Homayouni, R. & Williams, R.W. Functionally enigmatic genes: a case study of the brain ignorome. PLoS ONE 9, e88889 (2014).
20. Johnson, K.R., Longo-Guess, C.M. & Gagnon, L.H. Mutations of the mouse ELMO domain containing 1 gene (Elmod1) link small GTPase signaling to actin cytoskeleton dynamics in hair cell stereocilia. PLoS ONE 7, e36074 (2012).

Martin Hrabě de Angelis, George Nicholson, Mohammed Selloum, Jacqueline K White, Hugh Morgan, Ramiro Ramirez-Solis, Tania Sorg, Sara Wells, Helmut Fuchs, Martin Fray, David J Adams, Niels C Adams, Thure Adler, Antonio Aguilar-Pimentel, Dalila Ali-Hadji, Gregory Amann, Philippe André, Sarah Atkins, Aurelie Auburtin, Abdel Ayadi, Julien Becker, Lore Becker, Elodie Bedu, Raffi Bekeredjian, Marie-Christine Birling, Andrew Blake, Joanna Bottomley, Michael R Bowl, Véronique Brault, Dirk H Busch, James N Bussell, Julia Calzada-Wack, Heather Cater, Marie-France Champy, Philippe Charles, Claire Chevalier, Francesco Chiani, Gemma F Codner, Roy Combe, Roger Cox, Emilie Dalloneau, André Dierich, Armida Di Fenza, Brendan Doe, Arnaud Duchon, Oliver Eickelberg, Chris T Esapa, Lahcen El Fertak, Irina Emelyanova, Jeanne Estabel, Jack Favor, Tanja Feigel, Ann Flenniken, Alessia Gambadoro, Lilian Garrett, Hilary Gates, Anna-Karin Gerdin, George Gkoutos, Lisa Glasl, Patrice Goetz, Isabelle Goncalves Da Cruz, Alexander Götz, Jochen Graw, Simon Greenaway, Alain Guimond, John M Hancock, Wolfgang Hans, Geoff Hicks, Sabine M Hölter, Heinz Höfler, Robert Hoehndorf, Tertius Hough, Richard Houghton, Anja Hurt, Boris Ivandic, Hughes Jacobs, Sylvie Jacquot, Nora Jones, Natasha A Karp, Hugo A Katus, Sharon Kitchen, Tanja Klein-Rodewald, Martin Klingenspor, Thomas Klopstock, Valerie Lalanne, Sophie Leblanc, Christoph Lengger, Elise le Marchand, Tonia Ludwig, Aline Lux, Colin McKerlie, Holger Maier, Jean-Louis Mandel, Susan Marschall, Manuel Mark, David G Melvin, Hamid Meziane, Kateryna Micklich, Christophe Mittelhauser, Laurent Monassier, David Moulaert, Stéphanie Muller, Beatrix Naton, Frauke Neff, Patrick M Nolan, Lauryl M J Nutter, Markus Ollert, Guillaume Pavlovic, Natalia S Pellegata, Emilie Peter, Benoit Petit-Demoulière, Amanda Pickard, Christine Podrini, Paul Potter, Laurent Pouilly, Oliver Puk, David Richardson, Stephane Rousseau, Leticia Quintanilla-Fend, Mohamed M Quwailid, Ildiko Racz, Birgit Rathkolb, Fabrice Riet, Janet Rossant, Michel Roux, Jan Rozman, Edward Ryder, Jennifer Salisbury, Luis Santos, Karl-Heinz Schäble, Evelyn Schiller, Anja Schrewe, Holger Schulz, Ralf Steinkamp, Michelle Simon, Michelle Stewart, Claudia Stöger, Tobias Stöger, Minxuan Sun, David Sunter, Lydia Teboul, Isabelle Tilly, Glauco P Tocchini-Valentini, Monica Tost, Irina Treise, Laurent Vasseur, Emilie Velot, Daniela Vogt-Weisenhorn, Christelle Wagner, Alison Walling,

Affiliations: Medical Research Council, MRC-Harwell, Harwell, UK. Department of Statistics, University of Oxford, Oxford, UK. Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. Department of Computer Science, University of Aberystwyth, Aberystwyth, UK. Institut Clinique de la Souris, PHENOMIN, Centre Européen de Recherche en Biologie et Médecine GIE (Groupement d’Intérêt Economique), Illkirch, France. Institut de Génétique et de Biologie Moléculaire et Cellulaire, Illkirch, France. CNRS UMR 7104, Illkirch, France. INSERM U964, Illkirch, France. Université de Strasbourg, Illkirch, France. Wellcome Trust Sanger Institute, Hinxton, UK. German Mouse Clinic, Institute of Experimental Genetics, Helmholtz Zentrum München–German Research Center for Environmental Health, München/Neuherberg, Germany. Experimental Genetics, Centre of Life and Food Sciences Weihenstephan, Technische Universität München, Freising, Germany. German Center for Diabetes Research (DZD), Neuherberg, Germany. Department of Dermatology and Allergy, Clinical Research Division of Molecular and Clinical Allergotoxicology, Technische Universität München, Munich, Germany. Division of Environmental Dermatology and Allergy (UDA), Helmholtz Zentrum München–German Research Center for Environmental Health, Munich, Germany. Institute of Molecular Animal Breeding and Biotechnology, Ludwig Maximilians Universität München, Munich, Germany. Department of Neurology, Klinikum der Ludwig Maximilians Universität München, Munich, Germany. Institute of Lung Biology and Disease, Helmholtz Zentrum München–German Research Center for Environmental Health, München/Neuherberg, Germany. Institute of Human Genetics, Helmholtz Zentrum München–German Research Center for Environmental Health, München/Neuherberg, Germany. Institute of Developmental Genetics, Helmholtz Zentrum München–German Research Center for Environmental Health, München/Neuherberg, Germany. Institute of Pathology, Helmholtz Zentrum München–German Research Center for Environmental Health, München/Neuherberg, Germany. Development Genetics, Centre of Life and Food Sciences Weihenstephan, Technische Universität München, Freising, Germany. Else Kröner-Fresenius Center for Nutritional Medicine, Technische Universität München, Freising-Weihenstephan, Germany. Institute for Medical Microbiology, Immunology and Hygiene, Technische Universität München, Munich, Germany. Deutsches Zentrum für Neurodegenerative Erkrankungen, Munich, Germany. Max Planck Institute of Psychiatry, Munich, Germany. Institute of Molecular Psychiatry, University of Bonn, Bonn, Germany. Department of Cardiology, Angiology and Pneumology, Heidelberg University Hospital, Heidelberg, Germany. Institute of Cell Biology and Neurology, CNR (National Research Council), Rome, Italy. Toronto Centre for Phenogenomics, Toronto, Ontario, Canada. Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada. Program in Developmental and Stem Cell Biology, Hospital for Sick Children, Toronto, Ontario, Canada. Manitoba Institute of Cell Biology, University of Manitoba, Winnipeg, Manitoba, Canada.

EUMODIC Consortium: a list of members and affiliations is provided in the Supplementary Note. These authors contributed equally to this work. Correspondence should be addressed to S.D.M.B. ([email protected]).

ONLINE METHODS

Mouse generation. Targeted ES cell clones obtained from the EUCOMM cell repository (EuMMCR) were injected into BALB/cAnN or C57BL/6J blastocysts for chimera generation. The resulting chimeras were mated to C57BL/6NTac mice, and the progeny were screened to confirm germline transmission. As part of the original targeting strategy, the ES cell clones were derived from one of four different C57BL/6N parental cell lines, namely JM8.F6, JM8.N4, JM8A3.N1 and JM8A1.N3. The JM8A1.N3 and JM8A3.N1 cell lines had been subjected to targeted repair to correct the non-agouti allele (ref. 21). Mice carrying targeted mutations were bred to C57BL/6NTac mice before the intercrossing of heterozygote carriers. Cohorts of at least seven homozygote mice of each sex per pipeline were generated by the most effective breeding scheme, depending on the mutant line and the mice available. If no homozygotes were obtained from 28 or more offspring of heterozygote intercrosses, the line was deemed nonviable. Similarly, if less than 13% of the pups resulting from intercrossing were homozygous, the line was judged as being subviable. In both circumstances, heterozygote mice were committed to the phenotyping pipelines. The fertility of both sexes of each line was also assessed during cohort generation. Mutant lines failing to produce any live pups when at least four homozygotes of either sex were mated with a non-homozygote mouse were assessed as sub-fertile. Phenotype cohorts from sub-fertile lines were obtained by breeding heterozygotes of the affected sex.

As both wild-type and mutant cohorts were analyzed through the phenotyping pipeline, randomization in the allocation of groups was not relevant. Although randomization of mice to experimental groups was not employed, there was no preferential selection of stock, either mutant or wild type, for phenotyping. Reflecting the high-throughput nature of the phenotyping pipeline, blinding to the identity of mutant lines during phenotyping was not employed. However, assessment of the effect of operator bias was a quality control step that was performed during the data analysis.

The targeted alleles were validated by conventional PCR for the presence of the 3′ loxP site and by non-radioactive Southern blot with neo or lacZ probes for the accuracy of homologous recombination events. Whenever permitted, two different sequences were employed for each arm. A number of other existing mutant lines, including ones with ENU-generated mutations, other targeted alleles and gene traps, were bred and analyzed through the EMPReSSslim pipeline. In total, mice from 449 lines were bred for phenotyping, of which 334 were EUCOMM lines. The total numbers generated and analyzed at each center were as follows: HMGU, 72; Wellcome Trust Sanger Institute, 101; MRC-Harwell, 141; and ICS, 136. In addition, 13 lines were analyzed through EMPReSSslim at TCP.

EUMODIC institutes that collect phenotyping data are guided by their own ethical review panels and licensing and accrediting bodies, reflecting the national legislation under which they operate. Details of their ethical review bodies and licenses are provided below. All efforts were made to minimize suffering by considerate housing and husbandry. All phenotyping procedures were examined for potential refinements that were disseminated throughout the consortium. Animal welfare was assessed routinely for all mice involved. The ethics committees involved were as follows: GMC Helmholtz Zentrum München: Regierung von Oberbayern, approval license 2532; MRC-Harwell: Animal Welfare and Ethical Review Board (AWERB), approval licenses PPL 30/2380 and PPL 30/2890; Wellcome Trust Sanger Institute: Animal Welfare and Ethical Review Board (AWERB), approval licenses PPL 80/2076 and PPL 80/2485; ICS Mouse Clinical Institute: Com’Eth (CNREA number 17) of the Ministry for Research, approval license internal numbers 2012-009 and 2014-024.

Data capture by EuroPhenome. The EMPReSS database (ref. 10) incorporates SOPs, measured data parameters and metadata from the EMPReSSslim pipelines. In addition, EMPReSS stores the MP ontology annotations for the majority of parameters, that is, the annotation that would be expected if the mutant were statistically different from the control. All of the phenotype data in EMPReSS have now been migrated to the newer international version of the database called IMPReSS, which holds all of the IMPC standardized phenotyping protocols. Further details on the implementation of the ARRIVE guidelines in EUMODIC and IMPC are described in Karp et al. (ref. 22). Data generated from EMPReSSslim by the four centers are stored in their local LIMS, backed by diverse database schemas running on different relational database management systems. The collection of phenotyping data at each center was guided by that center’s ethical review panel and licenses applicable to each country’s regulations. The data are transferred to EuroPhenome in a common standardized format. To assist in data export and improve standardization and data consistency, EuroPhenome provides a Java library for data to be exported. The informaticians at the centers use the library to represent the data necessary for export as an object model. The library then performs validation against the EMPReSS database and the schema. If this validation is successful, the data are output to XML, compressed and placed on an FTP site.

Each center’s FTP site is regularly checked by the EuroPhenome data capture system, and any new files are uploaded. The data are again verified against the schema and EMPReSS and are further checked for consistency against existing data within EuroPhenome. The results of the validation are provided to the upload sites in the form of XML log files and a web interface, the EuroPhenome Tracker. If validation is successful, the data are loaded into the EuroPhenome database. Data can be removed from the database by placing the files in the delete directory of the FTP site. The same process is employed to capture and validate the data before removal. The informatics architecture that supported EUMODIC has now been enhanced to support the larger IMPC project.

Statistical analysis. Bayesian linear and logistic multilevel regression models were applied to each transformed quantitative or dichotomized categorical phenotype at each center, with all baseline data at a center being included in the analysis. Sex, strain, litter, day of analysis and other experimental metadata (such as the equipment used and certain details of the procedure, such as how blood samples were handled) were included as covariates, and a penalized spline was incorporated to account for systematic mean changes over time in the baseline. Effects from day of analysis and litter were modeled hierarchically with variance components to allow for phenotypic correlation among groups of mice. The posterior evidence for a nonzero mutant genotype effect was summarized and used as a test statistic, and significance thresholds were chosen via a permutation-based approach to control the FDR at 5% for each test at each center (Supplementary Note). The R code to generate the results is available on request.

Phenotype similarity. We used the PhenomeNET (ref. 23) system to compute the semantic similarity between the phenotypes observed in EUMODIC and the phenotypes observed with alleles of the same genes in the MGI database. The data from the EUMODIC alleles were excluded from the MGI database for this analysis. To compare sets of phenotypes (either associated with a disease or observed in a mouse model) in PhenomeNET, we used the set-based simGIC semantic similarity measure. simGIC is a Jaccard index weighted with information content and comparing sets closed against the superclass relation. To compute the phenotypic similarity between the phenotypes observed in EUMODIC and those observed with alleles of the same genes in the MGI database, we searched MGI for genes with the unique MGI gene identifier in the EUMODIC data set, excluding all data integrated into MGI from EUMODIC. We tested the null hypothesis that phenotypic similarity between EUMODIC and MGI lines was independent of whether the lines were related to the same or different genes. To do this, for each EUMODIC gene, we ranked all MGI genes according to their phenotypic similarity to that gene, thereby yielding a rank (between 1 and 9,821, i.e., the number of genes in MGI) for each EUMODIC-MGI gene pair. We then performed a Wilcoxon rank-sum test comparing the distribution of ranks for matching EUMODIC-MGI gene pairs against the distribution for non-matching gene pairs.

Analysis of genes with no previous annotations. A subset of the genes with significant phenotype annotations were classified as having no previous annotation if they had no corresponding alleles in the MGI data set with a curated phenotype from the literature. While performing this analysis, data from this project and the Wellcome Trust Sanger Institute project were incorporated into MGI, so these gene allele combinations now show phenotypic annotations from these projects but remain without annotations from the literature. Two methods were implemented to study this set of ‘novel’ genes.

The first analysis identified human genes orthologous to the mouse genes in Ensembl v76 (ref. 24). Three data sets (GWAS-central (ref. 25), Orphanet and OMIM) were then mined to search for human diseases associated with these genes (ref. 26). All diseases with associations to these genes were extracted from Orphanet and OMIM. To limit our focus to robust statistical associations, in GWAS-central, we extracted data on associations with P < 1 × 10^-5. To find phenotype correlations between our new mouse phenotypes and human disease, we adopted a phenotype-centric approach. For all the retrieved human data sets, we mapped each phenotypic term to MeSH terms using the US National Institutes of Health MeSH Browser (ref. 27). To find equivalent mouse phenotypes, we manually mapped each higher-level MeSH term to the corresponding higher-level MP ontology term. Previous work has created hierarchical systems to integrate phenotype ontologies across species, but with this data set we found this automated approach problematic and instead adopted a manual process.

Second, in collaboration with experts in the domain and literature, three groups of phenotypic annotations were selected as representative of the three disease areas. The newly annotated genes were placed on the appropriate sections of each Venn diagram depending on the results of the pipeline with respect to these annotation parameters. In total, 94 genes were included in the Venn diagrams.
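The set-based comparison described under Phenotype similarity can be illustrated with a minimal Python sketch. It is not the PhenomeNET implementation: the toy ontology, the term and gene names, and the way information content is estimated (negative log of the fraction of annotated genes whose closed term set contains a term) are all assumptions made for illustration. The sketch closes each phenotype set against the superclass relation, computes an information-content-weighted Jaccard index, and ranks reference genes by similarity to a query gene; the resulting ranks are the kind of values that would then be passed to a rank-sum test.

```python
import math

# Hypothetical toy ontology: term -> set of direct superclasses.
PARENTS = {
    "MP:root": set(),
    "MP:behavior": {"MP:root"},
    "MP:metabolism": {"MP:root"},
    "MP:hyperactivity": {"MP:behavior"},
    "MP:obesity": {"MP:metabolism"},
}


def closure(terms):
    """Return the given terms together with all of their superclasses."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


def information_content(annotations):
    """Assumed IC estimate: -log of the fraction of genes annotated
    (after closure) with each term. Unused terms get no entry."""
    closed = [closure(terms) for terms in annotations.values()]
    total = len(closed)
    ic = {}
    for term in PARENTS:
        count = sum(term in c for c in closed)
        if count:
            ic[term] = -math.log(count / total)
    return ic


def sim_gic(a, b, ic):
    """Jaccard index over closed term sets, weighted by information content."""
    ca, cb = closure(a), closure(b)
    union = sum(ic.get(t, 0.0) for t in ca | cb)
    inter = sum(ic.get(t, 0.0) for t in ca & cb)
    return inter / union if union > 0 else 0.0


def rank_of_match(gene, query_terms, reference, ic):
    """Rank (1 = most similar) of `gene` among all reference genes,
    ordered by similarity to the query phenotype set."""
    ordered = sorted(
        reference,
        key=lambda g: sim_gic(query_terms, reference[g], ic),
        reverse=True,
    )
    return ordered.index(gene) + 1


if __name__ == "__main__":
    # Hypothetical reference annotations standing in for the MGI data set.
    reference = {
        "GeneA": {"MP:hyperactivity"},
        "GeneB": {"MP:obesity"},
        "GeneC": {"MP:behavior"},
    }
    ic = information_content(reference)
    query = {"MP:hyperactivity"}  # phenotypes observed for GeneA in the new screen
    print(rank_of_match("GeneA", query, reference, ic))
```

Under the null hypothesis tested in the text, the rank of the matching gene would be no better than the ranks of non-matching genes; a standard rank-sum routine from a statistics library could be applied to the two sets of ranks produced this way.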

21. Pettitt, S.J. et al. Agouti C57BL/6N embryonic stem cells for mouse genetic resources. Nat. Methods 6, 493–495 (2009).
22. Karp, N.A. et al. Applying the ARRIVE guidelines to an in vivo database. PLoS Biol. 13, e1002151 (2015).
23. Hoehndorf, R., Schofield, P.N. & Gkoutos, G.V. PhenomeNET: a whole-phenome approach to disease gene discovery. Nucleic Acids Res. 39, e119 (2011).
24. Cunningham, F. et al. Ensembl 2015. Nucleic Acids Res. 43, D662–D669 (2015).
25. Beck, T., Hastings, R.K., Gollapudi, S., Free, R.C. & Brookes, A.J. GWAS Central: a comprehensive resource for the comparison and interrogation of genome-wide association studies. Eur. J. Hum. Genet. 22, 949–952 (2014).
26. Kitsios, G.D., Tangri, N., Castaldi, P.J. & Ioannidis, J.P. Laboratory mouse models for the human genome-wide associations. PLoS ONE 5, e13782 (2010).
27. Nelson, S.J. & Schulman, J.L. Orthopaedic literature and MeSH. Clin. Orthop. Relat. Res. 468, 2621–2626 (2010).

doi:10.1038/ng.3360