REVIEWS

STRATEGIES FOR MAPPING AND CLONING QUANTITATIVE TRAIT IN RODENTS

Jonathan Flint, William Valdar, Sagiv Shifman and Richard Mott Abstract | Over the past 15 years, more than 2,000 quantitative trait loci (QTLs) have been identified in crosses between inbred strains of mice and rats, but less than 1% have been characterized at a molecular level. However, new resources, such as substitution strains and the proposed Collaborative Cross, together with new analytical tools, including probabilistic ancestral haplotype reconstruction in outbred mice, Yin–Yang crosses and in silico analysis of sequence variants in many inbred strains, could make QTL cloning tractable. We review the potential of these strategies to identify genes that underlie QTLs in rodents.

In the 1990s, when it first became possible to map the Collaborative Cross), techniques (such as in silico chromosomal locations of quantitative trait loci (QTLs) mapping and whole-genome expression studies) and in rodents, it looked as if QTL discovery would soon analytical tools (such as quantitative complementa- lead to the identification of genes involved in any med- tion tests and Yin–Yang crosses, see below for more ically important that could be modelled in information). mice or rats1–4. In this review, we discuss the realized or potential More than 10 years later, the field of QTL analysis is success of new approaches to identify genes that under- heading towards a crisis. The National Center for lie QTLs.We focus on methods to identify genetic vari- Biotechnology Information (NCBI) lists 700 QTLs that ants that have only a small effect on the phenotype (by are found in rats (see the web site in the which we mean that the variants account for 10% or Online links box), and the mouse genome database cur- less of the phenotypic variation of a quantitative rently contains 2,050 mouse QTLs (see the Mouse trait). In the past, robust and reliable detection of Genome Informatics web site in the Online links box). small-effect QTLs has been a problem5;but, as should Compare these figures with the number of candidate be evident from the number of QTLs that have been genes that are proposed to underlie QTLs in rodents reported, difficulties in QTL detection are not imped- (TABLES 1,2).Ifwe uncritically accept all the claims as cor- ing research. The methods might not be the most effi- rect, only about 20 genes have been identified. Even if no cient, but there is no doubt that QTLs can be detected more QTLs are mapped, at the present rate of progress at high levels of significance and that the findings can (20 genes identified in 15 years) it will take 1,500 years to be replicated, even for behavioural phenotypes6–9. Wellcome Trust Centre for find all the genes that underlie known QTLs. Although it is likely that only a small fraction of the Human , Oxford Several recent developments have been published total number of QTLs that segregate in inbred strain University, Roosevelt Drive, that might make what is currently almost impossible crosses are currently known to us (not to mention Oxford OX3 7BN, United more tractable. These include new genomic resources those in outbred stocks), without the ability to deter- Kingdom. Correspondence to J.F. (for example, the availability of sequences and mine the genes they represent, their detection is of e-mail: [email protected] sequence variants), animal resources (for example, limited interest. We urgently need advances in gene, doi:10.1038/nrg1576 chromosome substitution strains and the proposed not QTL, identification.

NATURE REVIEWS | GENETICS VOLUME 6 | APRIL 2005 | 271 © 2005 Nature Publishing Group

REVIEWS

Table 1 | Cloned quantitative trait loci Gene symbol Gene name Phenotype/function Variation (%) Gene identification method Reference* Alox15 Arachidonate Abnormal bone mass 4% Expression differences, analysis of 107 12/15 lipoxygenase knockout , pharmacological inhibition Rgs2 Regulator of Fear-related behaviour 5% Outbred mice, QTL–knockout interaction 31 G- signalling 2 Cyp11b1 Cytochrome P450, Abnormal blood pressure 5% Congenics, sequence variants 154 subfamily 11B, polypeptide 1 Lipc Hepatic lipase 7% Multiple crosses, knockout interaction 155 C5 Complement Experimental allergic 8% Expression differences, loss-of-function 106 component 5 asthma mutation Ace2 Angiotensin 1 Abnormal blood pressure 12% Knockout phenotype 156 converting enzyme 2 (peptidyl-dipeptidase A) Apoa2 Apolipoprotein A2 HDL cholesterol 23% Multiple-strain haplotypes, coding 88 sequence variants Mpdz Multiple PDZ domain Pentobarbital drug 25% -dependent differences in 16 protein withdrawal with seizures coding sequence and expression Ncf1 Neutrophil cytosolic Severe arthritis 25% Functional sequence change, 157 factor 1 pharmacological treatment Kcnj10 Potassium inwardly- Susceptibility to seizure 25% Coding sequence variants, multiple-strain 15 rectifying channel, haplotypes subfamily J, member 10 Ptprj (Scc1)Protein Susceptibility to colon 29% Recombinant haplotypes, sequence 158 tyrosine phosphatase differences, loss of heterozygosity type, J polypeptide in human cancer Tas1r3 Taste receptor, type 1, Saccharin response 30% Multiple-strain haplotypes, coding 159 member 4 sequence variants Nppa Natriuretic peptide Cardiac hypertrophy 44% Promoter variants increase gene 160 precursor A expression Cd36 Fatty acid translocase Abnormal fatty acid 50% Expression differences, loss-of-function 161 metabolism mutation, transgene complementation Pla2g2a Phospholipase A2, group IIA Modifier of tumours in 50% Frameshift loss of function and transgene 162 (platelets, synovial fluid) intestinal cancer complementation (cosmid) Mtap1a Microtubule-associated Modifier of hearing loss 57% Coding sequence variants, transgene 163 protein 1a complementation (cosmid) *The first publication of this material. HDL, high density lipoprotein.

Before proceeding, we should also be clear about probable effectiveness of the new approaches? To answer what we exclude from our discussion. In this paper, this question, we need to define success, which is not so we do not consider artificially induced mutations that straightforward. Two recent reviews that attempt this task present as QTLs, such as those that are due to mutage- identify 21 genes in total, but agree about the identifica- nesis, although it has been suggested that mutagenesis tion of only four genes in mice and five in rats11,12.As in the mouse can be used to determine the genetic many have pointed out, there is no single proof that con- basis of quantitative traits instead of QTL mapping10. firms a gene at a QTL and, to a large extent, each case has Screening mice for the quantitative effects of induced to be evaluated on its own strengths13,14.Replacing one mutations is doubtless an efficient method for the with another at a through targeted mutagene- identification of genes that are involved in the expres- sis with subsequent functional analysis provides a strin- sion of a phenotype, quantitative or otherwise. gent test, as does complementation, but these procedures However, our concern here is with the subset of poly- might not be possible (or even necessary) in every case: morphic loci that give rise to phenotypic variation in alternative evidence can provide sufficient proof that the natural populations (although we admit that labora- gene has been found. tory mouse populations are in many respects unnat- There are also issues about what to include as a ural). Mutagenesis does not distinguish between QTL. Strictly speaking, a QTL is any locus that con- these loci and the large number of loci that are func- tributes to a phenotype that is measured quantitatively. tionally invariant in nature, but in which mutations However, some investigators use a quantitative mea- can be induced. sure for a phenotype that others assess qualitatively (for example, susceptibility to cancer can be assessed as Lessons from previous success stories affected or not affected, or by a quantitative measure, Are there any lessons from success stories in the field of such as the number of tumours). To be comprehen- QTL mapping that might allow us to determine the sive, we have summarized data on candidate genes in

272 | APRIL 2005 | VOLUME 6 www.nature.com/reviews/genetics © 2005 Nature Publishing Group

REVIEWS

Table 2 | Cloned susceptibility loci Gene Gene Phenotype Penetrance Method of detection Reference* symbol Cblb Casitas B-lineage Type 1 High Nonsense mutation, transgene 164 lymphoma b complementation (cDNA) Stk6 Serine–threonine Skin tumour Low Multiple-strain haplotypes, 74 kinase 6 susceptibility outbred mice, expression difference Ctla4 Cytotoxic T-lymphocyte- Autoimmune Low Sequence variants, association 165 associated protein 4 disease studies in humans Il2 Interleukin 2 Type 1 diabetes Low Coding sequence variants, 166 differences in electrophoretic patterns Pctr1 Plasmacytoma resistance 1 Susceptibility to Low Coding sequence variants, 167 plasmacytoma protein variant less active B2m β-2-Microglobulin Type 1 diabetes Low Coding sequence variant, 168 transgene complementation (plasmid) *The first publication of this material.

TABLES 1,2. TABLE 1 lists candidate genes at loci where candidacy of Mpdz,but had a much smaller interval, the effects are expressed as the percentage of the total with only three known and three predicted genes to val- phenotypic variance. TABLE 2 lists candidate genes at idate, and so were able to make a more convincing case susceptibility loci, the effects of which are expressed as for gene identification16.Although TABLES 1,2 should not differences in penetrance. be interpreted as definitive lists, they do allow us to draw The weight of evidence in favour of each gene two conclusions of general importance. varies considerably. For example, Ferraro and col- leagues have proposed Kcnj10 as a that Effect size. First, the QTLs that have yielded genes have is involved at a seizure-susceptibility locus on the basis had exceptionally large EFFECT SIZES.To appreciate how of finding sequence variants in coding regions and large, we ideally need an unbiased estimate of the typical gene expression in the relevant tissue, but had to con- QTL effect size. Unfortunately this is difficult to obtain sider data for 120 genes within the region of study15. because, in cases of low statistical power to detect Shirley and colleagues applied similar criteria for the effect sizes, the contributions of detected QTLs are

50

Physiological QTL

Behavioural QTL

40

30

20 Number of QTLs

10

0 1234567891011121314151617181920212223242526272829 EFFECT SIZE Effect size (% variation) The percentage of the total phenotypic variation that is Figure 1 | The distribution of QTL effect sizes. Data are from 14 recent quantitative trait loci (QTL) mapping experiments attributable to a QTL. (including 244 significant QTLs). Behavioural QTL data are from REF.116. Data for physiological QTLs are from REFS 140–153.

NATURE REVIEWS | GENETICS VOLUME 6 | APRIL 2005 | 273 © 2005 Nature Publishing Group

REVIEWS

Box 1 | How QTL effect size depends on the mapping population The effect size of a quantitative trait locus (QTL) depends on the population in which the QTL alleles segregate. Here, we give a quantitative treatment and show that a genetic effect that produces a QTL with a 5% effect in a recombinant inbred (RI) population will produce a QTL that explains ~2.6% of the phenotypic variance in an F2 generation and ~1.3% in a backcross (BC). Consider a diallelic QTL at which a major allele has frequency p and physiological effect g,and a minor allele has frequency q = (1 – p). Let G be the genetic effect of the QTL on the phenotype for a given animal. The genetic variance, Var (G), in each population is 4g2pq for RI, 2g2pq for F2, and g2pq for BC (REF.5) .This shows that the genetic variance coming from the QTL — that is, the proportion of the phenotypic variance it explains — decreases in the ratio 4:2:1 for RI:F2:BC (FIG. 2).What does this do to the effect size of the QTL in these populations? In the simplest case where there are no other genetic components, the environmental variance Var(E) is constant and the effect size, θ,is shown in equation 1. = Var (G) (1) (Var(G)+Var(E)) The top part of the table shows θ for each population and how the effect of a QTL in an RI becomes smaller in F2 and BC populations132.(Note that ‘|’ in the table means ‘conditional on’.)

Measure RI F2 BC Single QTL system 4g2pq 2g2pq g2pq θ 4g2pq + Var(E) 2g2pq + Var(E ) g2pq + Var(E ) θ θ θ |θ θ RI RI RI RI θ θ 2 – RI 4 – 3 RI θ θ |{ RI = 5%} 5% 2.56% 1.30% Many QTLs system 4λΓ 2λΓ λΓ θ 1 4Γ + Var (E)2Γ + Var (E ) Γ + Var (E ) λθ λθ θ |θ θ 1,RI RI 1 1, RI 1, RI λ θ λ θ 2 – 1, RI 4 – 3 RI

But what if there are k QTLs that affect the trait? We take the simple case of k — unlinked, diallelic, independent, additive 2 QTLs. The genetic variance coming from the ith QTL in, for example, an RI, will be 4gi piqi, and the total genetic variance Var (G) in the RI will be kk 2 Var (G) = ∑Var (Gi) = 4 ∑ gi piqi (2) ii θ λ The bottom part of the table gives the effect size, 1,ofQTL 1, (which accounts for a fraction of the genetic variance) in RI, F2 and BC populations, as well as conditional effect sizes. In the table, we make the substitution defined in equation 3 for brevity. k Γ 2 = ∑ gi piqi (3) i The drop in effect size moving from RI–F2–BC is shallower when other genetic components have a large role in the phenotype. In the extreme case of no environmental variance (when other QTLs account for all the remaining phenotypic variance), there is no drop at all. At the other extreme, when λ = 1, the formulae are the same as in the top part of the table.

over-estimated17,18.With this caveat in mind, FIG. 1 FIGURE 1 shows that the distribution of effect sizes for shows a comparison of effect sizes from a reasonably both behavioural and physiological are complete survey of behavioural QTLs (94 significant almost identical, as are the average effects of the uncloned QTLs) and a compendium of physiological pheno- QTLs for these phenotypes, which are 5.8% and 5.3%, types (such as growth, weight, blood pressure and gall respectively. The average effect size of a cloned QTL is stones; note that we are using data from uncloned 26% (TABLE 1).Ifwe restrict our attention to the QTL QTLs). It is important to realize that effect sizes genes that have been most stringently confirmed (by depend on the mapping population in which they are complementation), then the average effect size is 50%. measured (see BOX 1 for a detailed discussion; see also Clearly, it is easier to identify genes when the QTL effect FIG. 2). Here, and in the rest of this review, we use effect sizes are large. size to mean the percentage of the total phenotypic In fact, the true effect sizes of most uncloned QTLs variance that is explained by the QTL segregating in an are almost certainly much smaller than the 5–6% aver- F2 intercross. age mentioned above, and this is not just because of

274 | APRIL 2005 | VOLUME 6 www.nature.com/reviews/genetics © 2005 Nature Publishing Group

REVIEWS

ab 100 5

RI RI

80 F2 4 F2

BC BC 60 3

40 2

20 1 % Effect size in X that is a 5% QTL RI % Effect % Effect size in X, given effect size in RI size in X, given effect % Effect 0 0 020406080100 0 5 20 40 60 80 100 % Effect size in RI Combined % effect size of all QTLs in RI Figure 2 | Effect size in F2 and BC relative to RI. a | Illustrates how a QTL’s effect size in F2 and BC populations relate to its effect size for a monogenic trait in an RI population. b | Shows the effect sizes in F2 and BC populations of what, in RI populations, is a 5% QTL for a single QTL that affects a multigenic trait ( is shown on the horizontal axis). Epistatic and gene–environment interactions require a more sophisticated treatment than is presented here. BC, backcross; RI, recombinant inbred.

statistical problems in their estimation. Perhaps the only does the sequence tell us which gene is the most most important finding to emerge from attempts to likely candidate, it also indicates what experiments are clone genes that underlie rodent QTLs is that a single needed to confirm the candidacy. For example, comple- genetic effect, as detected in an inbred strain cross, turns mentation will rescue the function of a null allele, or a out in most cases to be due to several physically linked biochemical or pharmacological intervention will deter- small effects. There are many examples of this phenom- mine whether an aberrant protein is involved. Unfortun- enon, including QTLs that influence seizures19,obesity20, ately, possessing a recognizable molecular signature growth21,blood pressure22–26, diabetes27, antibody pro- might be the exception rather than the rule for QTLs duction28 and infection29.Classical approaches to gene (although, admittedly, we have only a small sample of identification (BOX 2) generally assume that a single QTL cloned QTLs to assess). is just that, a single genetic effect. By not allowing for the Although SNPs that cause monogenic disease have a fractionation of the QTL, gene identification becomes strong tendency to occur at highly conserved amino much more frustrating. acids, this does not seem to be true of variants in com- Unfortunately, we still do not have a precise picture plex diseases. At least in human studies, the pattern of of the of QTLs in rodents, or indeed disease-associated coding variants is so far indistin- in any organism30.Just how small are the effects likely to guishable from the distribution that is found in the be? At the rarely achieved level of mapping where normal population36.Moreover, in many cases, no cod- genetic effects can be resolved into intervals of a hun- ing-sequence variant is found; instead many sequence dred kilobases or less, QTLs might further disintegrate. variants are detected in non-coding DNA in regions, the This has been found in studies of a behavioural QTLs in function of which (if any) is unclear31,37.In these cases, we mice31, and has been seen in similar work in other do not yet have an easy method to prove the involvement species32,33.Further high-resolution information at of a gene at a QTL. other QTLs might drive the estimate of average effect It is important to emphasize that identifying a gene is size below 5%. not the same as isolating a sequence variant (or quanti- In short, QTLs are predominantly loci with small tative trait nucleotide; QTN). Discovering a QTN does effects, so the problem of identifying the genes that not guarantee gene identification because the position underlie them is a problem of identifying the molecular of sequence variants does not necessarily coincide with basis of small genetic effects. Whether we call the locus a the position of the genes on which the QTNs have an QTL or not is of little importance; at the moment, the effect. For example, a QTN could lie within a regulatory distinction that makes cloning tractable is between large region that is megabases away from its cognate gene38. and small effect sizes. Unless we have methods that can Note also that it is a QTN, not a gene, that segregates in a identify the genes at QTLs that contribute 1 or 2% to the cross, so genetic strategies for QTL dissection focus on CONGENIC phenotypic variation, the backlog of uncloned QTLs sequence variants, not genes. A strain produced by a breeding strategy that delineates a will remain. genomic region containing a The measure of success. Before we look at new methods, trait locus. Recombinants Molecular signature. The second lesson from the success we should be clear about how well the old methods per- between two inbred strains are stories in TABLE 2 is that finding a recognizable molecular form (BOX 2).The relevant points (taken from a theoreti- backcrossed to produce a strain signature at a QTL, such as a sequence variant that cal literature that is almost as extensive as the reports on that carries a single segment from one strain on the genetic introduces a stop codon or that changes protein func- QTL mapping) are as follows: 300 F2 animals can be used background of the other. tion, makes QTL cloning tractable34,35.In such cases, not to map a QTL with an effect size of 5% onto a 40-cM

NATURE REVIEWS | GENETICS VOLUME 6 | APRIL 2005 | 275 © 2005 Nature Publishing Group

REVIEWS

Box 2 | Existing strategies for QTL mapping based on inbred strain crosses F2 intercrosses and backcrosses Two parental inbred strains are crossed to produce an F1, the starting point for the two most common mapping strategies. Intercrossing the F1 generates an F2 and backcrossing to one or other of the parental strains produces a backcross. The number of animals (n) needed to map an additive QTL in an F2 can be calculated using equation 1 (REFS 5,42). 2 1 – VQTL Z1 – α/2 n = + Z β (1) V (1 – V )1/2 1 – ) ( QTL )( QTL α α Z1 – α/2 is the cut-off point in a standard for a probability of 1 – /2 and is the significance threshold. If we use a genome-wide significance threshold of logP 4.3 (REF.133) from normal tables, the cut-off point is 3.89. If we set the

power at 90% (in other words, 10% chance of missing a true association) then we need a Z1 – β of 1.28 (from normal tables). VQTL is the proportion of variance that is attributable to the QTL. For a purely additive QTL that explains 5% of the variance, the calculation yields equation 2.

0.96 3.89 2 + 1.28 = 527 (2) (0.05) ( 0.97 ) Recombinant inbred (RI) lines A pair of inbred strains is intercrossed and their progeny is inbred to generate a panel of inbred animals, each with a different combination of the progenitor genomes. Once a set of RI lines has been genotyped, the marker data are available for all subsequent mapping experiments, which take place by determining the strains that are present at a marker and by correlating this strain distribution pattern with phenotypic differences. Using the terminology given above for power in an F2 intercross, the relationship between the number of animals (n) and the effect size of the QTL is given by equation 3 (REF.134). 2 n – 2 = (1 – VQTL)(Z1 – α/2 + Z1 – β) (3) (VQTL) Congenics and interval-specific congenics Congenics are still the mainstay of fine-mapping QTLs in rodents. By repeatedly backcrossing one strain onto another, it is possible to create animals that have a particular genomic region from one strain and the remainder of their genome from the other; subsequent intercrossing makes the genomic segment homozygous and the mouse fully inbred. However, inadequate or incorrect assumptions about the distribution of chromosome segments, the population structure, the marker spacing and the selection strategy might mean that the breeding does not go as predicted59. Flaherty and colleagues point out that transgenic mice, which are routinely created on a 129 background and then backcrossed onto another strain (usually C57BL/6), can be used as congenics for the interval around the engineered mutation135.An interval-specific congenic is made by first identifying an animal with a recombinant chromosome in a region of interest and then backcrossing it for several generations to remove other QTLs. Resultant animals are intercrossed, and homozygotes for the recombinant haplotype are selected. Recombinant congenics and genome-tagged mice Recombinant congenics are a type of RI line made by backcrossing and randomly fixing parts of the genome by inbreeding, so that each contains an average of 87.5% genes of a common background strain and 12.5% of a common donor136.Genome-tagged mice are sets of overlapping congenics with sufficient material from one strain crossed onto the other so that the whole genome is represented137. The 60 strains created by West and colleagues contain segments of about 23 cM that are INTROGRESSED from either DBA/2 or CAST/Ei onto C57BL/6 (REF.137). Recombinant progeny testing Animals that have a recombinant chromosome in a region of interest are backcrossed to a parental strain to determine the location of the QTL relative to the recombination point. Advanced intercross lines Two parental strains are crossed to produce an F1 that is intercrossed to produce an F2. Subsequent generations are produced by intercrossing, so that animals in an advanced intercross line accumulate new recombinants. Animals are maintained according to a pseudo-random breeding protocol to reduce loss of genetic variation in the population. An appropriate advanced intercross line could take at least 5 years to make, but it can be used to map many loci. Heterogeneous stocks Genetically, heterogeneous stocks are derived from inbred strains through a series of progressive intercrosses. Existing heterogeneous stocks are derived from eight strains, but any number can be used (an advanced intercross line is a heterogeneous stock derived from two strains). By subsequently maintaining the stock for many generations, using a pseudo-random breeding protocol, recombinants are introduced that allow high-resolution mapping. Traits are mapped by associating allelic variants with a phenotype, so that must be carried out for each experiment. Two heterogeneous stocks have been used for mouse mapping experiments: the older of the two (the Boulder heterogeneous stock), has been breeding for more than 60 generations and is derived from the C57BL/6, BALB/c, RIII, AKR, DBA/2, I, A/J and C3H strains138. The second heterogeneous stock (the Northport heterogeneous stock) has passed its fortieth 139 Introduction of a chromosomal generation , and is derived from the A/J, AKR/J, BALB/cJ, C3H/HeJ, C57BL/6J, CBA/J, DBA/2J and LP/J strains. Genetic segment from one strain into variation from many strains complicates analysis, but robust and powerful QTL detection is possible by estimating the another by interbreeding. probability that an allele descends from each progenitor strain70 (BOX 3).

276 | APRIL 2005 | VOLUME 6 www.nature.com/reviews/genetics © 2005 Nature Publishing Group

REVIEWS

interval with 50% power, using markers that are spaced QTLs in the first mouse CSS experiment51.Notably, this every 20 cM across the genome39.Increasing the marker is not a substantial saving on the numbers used for an density does not appreciably increase mapping resolu- F2 intercross (BOX 2). tion for small-effect QTLs40.Increasing the number of A comparison between parental and CSS strains animals increases the likelihood of QTL detection (so will only map a QTL to a chromosome. For higher- that with 800 animals there is a greater than 99% chance resolution mapping, CSS strains allow the rapid creation of detecting a QTL with an effect size of 5%; BOX 2), but of a congenic strain, either by using interval-specific con- does not increase resolution sufficiently to identify genic strains or by recombinant progeny testing42 (BOX 2). genes — even 1,000 animals will only map a 5% QTL Because of the relative increase in effect size, congenic onto an interval of about 10 cM (REF. 41).In summary, a construction and recombinant progeny testing will standard inbred-strain mapping experiment will need require 3–4 generations to reduce the interval to 1 cM, about 400 animals and will use 100 markers. rather than the 9–10 generations that are required when Starting from a cross between two inbred animals, it starting from an F2 intercross52. could take less than 6 months to map a QTL to a 40-cM QTL mapping in a CSS delivers researchers to the region. It will take a few more years to isolate a QTL by same point faster than classical strategies have led them, creating CONGENICS42 (BOX 2).Alternatively, by continuing but no farther. The main drawback of the method is to intercross beyond the F2 generation, fine mapping that it makes no allowances for the fractionation of a can be carried out in an advanced intercross line or a large QTL effect into many loci with smaller effects. This heterogeneous stock43 (BOX 2).After 6 years, we can rea- is the problem that has for so long beset the use of con- sonably expect traditional strategies to have resolved a genics for QTL dissection and gene identification. CSS 5% QTL into a region of about 1 cM, which in the mapping is a powerful method for the identification of mouse corresponds to 2 Mb on average. The average small-effect QTLs, but it does not offer advantages over gene density in the mouse is about 10 genes per other methods for the identification of genes. megabase, so a 1 cM QTL interval will contain about 20 genes (note that gene density varies significantly Collaborative Cross: a proposal across the genome). Keeping this in mind, we now turn Would the difficulties of QTL cloning be overcome if we to the new strategies to see how they fare in comparison. had 1,000 recombinant inbred (RI) lines, as suggested by the Complex Trait Consortium53? Although the Chromosome substitution strains Collaborative Cross — a panel of 1,000 RIs derived Chromosome substitution strains (CSSs) consist of a set from 8 parental strains — is only a proposal, not an of animals in which one chromosome is derived from existing resource, we discuss it here because it might one strain and all the rest are derived from another. For provide a tool for gene identification (although this is mice, a panel of 22 CSS strains is required (for 19 auto- not the sole reason for its creation53). somes, 2 sex and mitochondria), or 44 if a The use of RI lines for QTL mapping in rodents has reciprocal set (with both parents as progenitor donors) is so far been limited by the relatively small size of each set. used. QTL mapping is carried out through the relatively The BXD RI set (which is derived from C57BL/6 and simple process of comparing the phenotypes of each DBA/2 strains) consisting (until recently) of 26 lines strain with the parental background strain. CSS strains has 90% power to detect a QTL that explains half the were first used to map QTLs in mice in 1999 (REF. 44); phenotypic variance in a trait54.The comparable effect of theoretical aspects of the mapping process were subse- a QTL segregating in an F2 generation is half this value quently described for mice45 and rats46,47.The method (because the F2 animals are not inbred), or 25% of the has a long history in plant48 and Drosophila genetics49. phenotypic variance (BOX 1).What happens to power and The first complete CSS set, created from A/J and mapping resolution if we have 100 lines? C57BL/6 strains, was produced in 2004 and was used to We have investigated this question using a simula- detect QTLs across the mouse genome50,51. tion55. Fine mapping of a QTL with a 5% effect is effi- The ease of QTL detection using CSS strains results cient: 100 RI lines provide 80% power for QTL detection from two features: first, the background genetic variance at a 5% significance level. Mapping resolution is also is reduced so that each QTL explains a greater propor- good: the 95% confidence interval is 7 cM. Fortunately, tion of the total phenotypic variation (BOX 1).Second, a RI panels approaching this size are now available: the lower significance level is needed for QTL detection LXS panel consists of 77 lines56 and the expanded BXD because, compared with the ~100 markers tested in an set consists of 90 lines57.Therefore, the new RI sets can F2 generation, only 21 comparisons need to be made. deliver high resolution, although this is still insufficiently Singer and colleagues point out that the F2 intercross high to identify genes at small-effect QTLs. requires at least 35% more animals for QTL detection51. The proposed Collaborative Cross differs in two Nevertheless, Belknap estimates that to detect a QTL ways from standard RI lines. It is created from eight with an effect size of 6% and with 50% power will progenitors, not two as is the case with RI lines, so that require 20 CSS strains and 20 parental animals for each the set would contain more genetic diversity than a stan- comparison, or between 3 and 400 animals (depending dard RI. Second, the proposal is to create 1,000 lines, on how many background animals are used) for a ten times more than the largest available set. What genome scan52.Belknap’s estimate of 3 to 400 animals would be the advantage, for gene identification, in hav- agrees with the actual figure of 435 animals used to map ing such an RI set? Mapping resolution improves to

NATURE REVIEWS | GENETICS VOLUME 6 | APRIL 2005 | 277 © 2005 Nature Publishing Group

REVIEWS

Box 3 | Probabilistic ancestral haplotype reconstruction

Single-marker association mapping is an inefficient QTL m n p q method when the allele frequencies of the variant and 1 1 1 0 the quantitative trait locus (QTL) do not coincide A and when contrasting QTL alleles are in linkage disequilibrium (LD) with the same marker allele, as 2 1 1 1 illustrated in the figure. In this example, the B population from which all chromosomes descended consisted of four inbred strains. Inbred strains A and 3 1 0 1 C contain a QTL allele that increases the phenotype, C and strains B and D contain an allele that decreases the phenotype. A polymorphic marker, m, that is in complete LD with the QTL possessing four distinct 4 0 0 0 D alleles (labelled 1,2,3,4 in the figure) will show evidence of association. Consider what happens with a diallelic marker, n, that, in the ancestral population, has one allele ‘1’ that is physically linked to strains A, B and C and the other allele ‘0’ linked to strain D. Evidence of association will now be reduced because the marker fails to partition the into two equally contrasting groups. Things are even worse at markers p and q where both alleles 0 and 1 are linked to both increaser and decreaser QTL alleles; there is no detectable association at all. But the haplotypes that are defined by the markers p and q distinguish all four strains and so are surrogates for the efficient marker m. Therefore, multi-point haplotype-based mapping is generally more powerful than single-point mapping. Heterogeneous stocks are mosaics of eight inbred founder strains, and QTL mapping in these stocks is best approached by reconstructing the ancestral haplotype mosaic. However, the strain distribution patterns of neighbouring markers rarely distinguish between all founder strains, so instead we compute the probability that an animal is descended from a given pair of strains at a particular locus. A useful statistical model of the genome mosaic of a diploid heterogeneous stock animal is a pair of HIDDEN MARKOV MODELS, in which the hidden states of the model are the ancestral strains, and the observed data are the marker genotypes. A DYNAMIC PROGRAMMING ALGORITHM is then used to compute the probabilities of descent using all the available information. The marker alleles must be known in the founder strains for the method to work. Recently, we have shown that other outbred mice of unknown ancestry can be modelled and analysed as mosaics of standard inbred strains in the same way31.

about 4 cM when the number of lines increases to 1,000 understand the principle, we first explain RISTs, and further lines make it possible to map effects that are which are a particular example of YYCs42. smaller than 5%. As we have seen, QTLs have a habit of Assume that a QTL has been mapped in an inbred- disintegrating when they are dissected and we need the strain cross. RI strains that have recombinants within ability to find the pieces. the QTL interval are crossed with both parental If the Collaborative Cross were to be created, it inbred strains to produce two segregating popula- would enable us to map many of the 2,000 uncloned tions. In one cross, the RI and the parental line share QTLs into small genetic intervals. This is an attractive the QTL allele, so no QTL segregates; in the other proposition but it must be balanced against two cross, the RI and the parental line have a different important disadvantages. First, maintaining and dis- allele at the QTL, and the QTL segregates (hence the tributing an RI set of 1,000 lines poses significant name segregation test). Consequently, the QTL will be problems. Even a laboratory devoted to providing mapped to a position either above or below the recom- animals to the research community, such as the Jackson bination point in the RI that is used for the two crosses. Laboratory, is able to maintain only a few hundred The overall results from a selected set of RIs will locate strains as live stock. Second, the resolution of the the QTL to a small interval. HIDDEN MARKOV MODEL Collaborative Cross is not sufficient to identify genes: Resolution is determined solely by the number of A probabilistic description of a a 4-cM interval is roughly equivalent to 8 Mb of recombinations in the interval. For example, with 100 RI system in which the observed DNA, which will contain approximately 80 genes. lines, each having approximately 3.3 recombinations per data depends on the hidden Unfortunately, the Collaborative Cross would give us no 100 cM (REF. 56), the total number of recombinations is internal state of the system. The objective is usually to infer the help with the final goal of gene discovery. However, one 3.3 multiplied by the number of centimorgans in the likelihood that the system is in a solution might lie in the application of recombination genome (~1,400) to give 4,620 recombinations; so, particular hidden state, given the inbred segregation tests (RISTs) and its generalization, the expected resolution is 1,400/4,620 = 0.30 cM. This observed data. Yin-Yang crosses (YYCs). corresponds to an interval of 0.6 Mb. With 1,000 RI, the

DYNAMIC PROGRAMMING expected accuracy is 0.03cM (potentially a few hundred ALGORITHM Yin–Yang crosses kilobases). However, recombinations are not distributed An algorithm that finds the RISTs and YYCs, applied to a large number of RIs, will evenly in the genome, making the above calculation an optimum solution to a problem increase mapping resolution to the point where indi- approximation only59. involving N objects in terms of vidual genes can be identified. YYCs take advantage of RIST, as with congenic construction and other the solutions to a series of smaller problems that involve the abundance of recombination events in RI or strategies that seek to treat QTLs as Mendelian genetic subsets of the objects. inbred strains to improve mapping resolution58.To factors, will have problems with multiple linked QTLs;

278 | APRIL 2005 | VOLUME 6 www.nature.com/reviews/genetics © 2005 Nature Publishing Group

REVIEWS

but unlike the construction of a congenic line, a RIST many thousands are required77.However, QTL analysis in analysis is quick to carry out, so if multiple QTLs are outbred mouse populations is likely to be easier than its suspected, it will be possible to test that hypothesis with human equivalent. In fact, it would be as simple and pow- further RISTs (assuming appropriate recombinant lines erful as mapping in inbreds if each allele at every locus are available). Indeed, with 1,000 RI lines, it would be could be traced back to a unique progenitor strain, so possible to resolve most QTLs onto an interval that that (ANOVA) at a marker extracted contains a few genes in just two generations; an efficient maximal information for QTL detection (BOX 3). and relatively economical strategy. In some instances, In contrast to human populations, in which progeni- the resolution will be good enough to identify a gene tor alleles are unknown (although there are a few excep- underlying a QTL. tions, such as ADMIXTURE MAPPING78), one type of outbred The YYC approach is a generalization of the RIST mouse (heterogeneous stock) is derived from known method that treats inbred strains as if they were RI lines, ancestral inbred strains. This makes it possible, with the on the grounds that inbred strains descend from a rela- help of some mathematics, to derive the origin of each tively small number of founders. In this respect,YYCs allele and map QTLs at sub-centimorgan resolution make the same assumption as in silico mapping, which (BOX 3).Unfortunately, mapping a QTL to a megabase is discussed below. The large number of recombination interval is still insufficient to identify a single gene. events that have accumulated in a set of inbred strains Other outbred stocks might offer better resolution, but since their origin can be used to map the QTL to a small they lack the advantages of progenitor information region58.Similar to RISTs,YYCs are used as a fine map- from the heterogeneous stock. ping tool that is applied after the initial mapping of the Several methods have been developed that deal QTL. The allelic state of the QTL in multiple strains is with this problem in its various incarnations79–84.So revealed by crossing the two strains that were used ini- far, there has been only one application of this method tially to map the QTL and any new strain. YYCs estab- in mouse genetics. We have shown that sufficient lish the strain distribution pattern of the QTL, which is recombinants have accumulated in a commercially used to identify the list of variants in the region that available outbred stock (known as MF1) to map follows this distribution. genetic effects onto regions of only a few hundred kilo- The YYC method is based on the assumption of a bases or less31.Sequence analysis showed that variants

ANALYSIS OF VARIANCE single biallelic variant or haplotype that affects the trait in the MF1 strain were identical to those found in A statistical method to test the in the different strains. Therefore, as for the RIST inbred strains, and haplotype reconstruction revealed null hypothesis that the mean approach, the main limitation of YYC analysis with that more than 95% of the haplotypes could be values of two or more groups are inbred strains is that it is not suitable for multiple QTLs derived from known inbred strains. Therefore, we were equal. The variance around the in a region. Furthermore, QTLs that primarily have able to map the MF1 strain by probabilistic ancestral mean in groups is compared with the variance of the group epistatic effects will not be suitable for this kind of haplotype reconstruction (BOX 3).It is important to mean. In genetic applications, analysis. They will result in inconclusive results of the realize that our approach neither required us to recover the variance between families is Yin–Yang analysis,but this is unlikely to produce false the correct haplotypes, nor to identify the progenitor compared with the variance negative results. strains correctly. We found that a QTL for anxiety-like within families. A significant F-ratio implies that variance behaviour, previously mapped to an interval of less between families is larger than Outbred stocks than 1 cM, resolved into three smaller independent within families. Outbred stocks accumulate recombinants over time, QTLs, each about 100–200 kb wide. It would also be so that they offer high mapping resolution; potentially possible to combine pedigree and LINKAGE DISEQUILIBRIUM ADMIXTURE MAPPING high enough to identify candidate genes. The effects of (LD) data for fine mapping in the MF1 strain and, Genetic mapping using 79,81,85 individuals whose genomes individual genes can be mapped by association, as has indeed, in other outbred mice . are mosaics of fragments been successfully shown in human populations60–67.In Potentially, outbred animals could deliver a resolu- that are descended from a few cases, QTLs have been mapped in mouse popula- tion that would be sufficient to guarantee candidature of genetically distinct populations. tions: small-effect QTLs have been mapped to sub- a single gene. The disadvantages of the method lie in the This method exploits differences in allele frequencies in the centimorgan resolution using heterogeneous stock complexities of the analysis and the need for large num- 68–71 founders to determine ancestry mice (BOX 2);crosses using outbred Mus spretus bers of animals and high-density genotyping. For exam- at a locus in order to map traits, have been used first to map72,73 and then to identify ple, genome-wide mapping in heterogeneous stocks in a way that is broadly similar to serine–threonine kinase 6 (Stk6) as a candidate skin- (BOX 2) requires at least 6,000 SNP markers. To reduce an advanced intercross. tumour susceptibility gene74;outbred CD1 mice have false-positive results to acceptable levels with this

LINKAGE DISEQUILIBRIUM been used to map a susceptibility locus for pulmonary number of markers means imposing stringent signifi- The tendency for markers to adenoma75; and outbred MF1 strains were used to cance thresholds: by permutation, we estimate that the have correlated genotypes when identify regulator of G-protein signalling 2 (Rgs2) as a 5% and 1% significance levels correspond to P-values, they are physically close together. gene that influences anxiety in mice31. expressed as negative logarithms, of 4.4 and 5.1 Over several generations, recombination will break down Mapping in outbreds is not as straightforward as (REF. 86), and will require about 1,000 animals to pro- 87 linkage between markers and a mapping in inbred crosses: it requires higher density vide 80% power to detect a 5% QTL .More markers, QTL, so that linkage genotyping and larger numbers of animals. At first sight, and more animals, will be needed for whole-genome disequilibrium will only occur the method seems to have not only the advantages, but analysis of more complex outbred stocks. These fea- between markers that are close also to suffer from the same drawbacks, as human tures militate against whole-genome analyses using to a QTL. This explains why outbred animals can provide studies, where robust, replicable outbreds, but developments in high-throughput geno- high-mapping resolution. results have been hard to obtain76 and sample sizes of typing will make it feasible.

NATURE REVIEWS | GENETICS VOLUME 6 | APRIL 2005 | 279 © 2005 Nature Publishing Group

REVIEWS

Box 4 | QTL mapping using recombinant inbred lines versus in silico mapping Because recombinant inbred (RI) panels are descended from (usually) two inbred strains, their genomes are random mosaics of these founder haplotypes. Once established, an RI panel need only be genotyped once, using markers that distinguish between the founder strains. At a given marker, the pattern of genotypes across the panel is called a strain distribution pattern. For example, in a panel of eight RI strains, the SDP 0100111 means that the first, third and fourth strains carry the allele ‘0’ and the remainder the allele ‘1’ at the marker in question. To map a quantitative trait locus, RI animals across the panel are phenotyped (if possible, in replicate, to reduce non-genetic variance), and the values are correlated with the strain distribution patterns at each marker (see the Gene Network web site in the Online links box) to identify markers with statistically significant associations. Mapping resolution depends on the size of the panel, but is roughly equivalent to that of an intercross. Superficially, in silico mapping looks similar. A set of inbred strains (that have an unclear ancestry) replaces the panel of RI strains of known ancestry. An important difference is that regions of HAPLOTYPE SHARING between the inbred strains must first be predicted from runs of contiguous markers with shared genotypes. The inbred strains are phenotyped and correlated with the predicted haplotype distribution patterns as before. Because the blocks of haplotype sharing between inbreds are smaller than in RI lines, theoretically in silico mapping resolution should be greater for the same number of strains. The key difference is that mapping using RI lines is by descent, but is by inferred state for in silico mapping. RI mapping is uncontroversial; by contrast, the merits of in silico mapping are hotly debated.

Haplotype analysis and in silico mapping mosaic structure, which is consistent with their deriva- Many have pointed out that a QTL must be contained in tion from a relatively small number of Asian and a region where sequence divergence corresponds to European stocks95–97. genetic action. So, when QTLs have been mapped in dif- There is little doubt that in silico mapping is effective ferent combinations of inbred-strain crosses, the strain for monogenic, highly penetrant mutations95,98,99.The distribution pattern can be combined with mapping data crucial issue is whether in silico mapping performs well to refine the region that contains the functional vari- in detecting small-effect loci. Wiltshire and colleagues ant68,88–90.For example, a QTL identified in a cross have gone the furthest towards this goal. They used a between strain A and strain B that does not segregate in dense map (of 10,990 SNPs) that was typed in 48 strains a cross between strains A and C will be located where and applied a permutation method to determine the sig- sequences from strains A and C are the same, but those nificance of their results99.They demonstrated the effec- from strains A and B are different. This approach has tiveness of the method for localizing large effects (on been used to carry out mapping at an extremely high coat colour and taste preference) and then showed that resolution and to identify candidate genes68,88–90. for two multigenic (high-density A development of this idea is in silico mapping (BOX 4). lipoprotein cholesterol levels and gallstone forma- It relies on known phenotypic differences between inbred tion) they could find loci that corresponded to some strains (such as those being collected as part of the Mouse previously identified QTLs. These results are encour- Phenome Project91;see the Mouse Phenome Database in aging, although further work is required to distinguish the Online links box) and uses a high density of markers between true- and false-positive findings. to derive a strain distribution pattern that can be used for At present, two difficulties confront in silico map- mapping92.The proponents of this method argue that it ping: low power and the complex structure of the reduces the time required for analysis of genetic models of genomes of laboratory strains. At first sight, the problem complex disease “from many months to milliseconds”92. of power seems to be insuperable: how can the genetic In principle, in silico mapping is similar to mapping basis of complex traits be dissected with just a few dozen using RI lines — by establishing phylogenetic relation- animals, or even a few 100, when we know that success- ships between inbred strains from sequence informa- ful association studies in humans requires thousands of tion it should be possible to identify regions that are individuals77? The truth of this analogy depends on the identical by descent. The problem is that in contrast to a extent to which the inbred strains are related by descent. set of RI strains (or indeed all mapping experiments If the inbred strains are completely unrelated, the anal- that are derived from crosses between inbreds), in which ogy is correct and thousands of animals will be needed genotyping unambiguously defines descent so that for genetic mapping. This is partly because there will be genetic identities can be associated with phenotypic less LD between loci in unrelated individuals, so that similarities, finding sequence variants that are shared by more markers are needed to detect the QTL77.More a set of inbreds does not guarantee their descent from a unlinked markers will require an increase in the strin- common ancestor. gency of the significance threshold to avoid false-positive Nevertheless, several observations support the treat- results, which in turn requires an increase in the number ment of inbred strains as if they were derived from a set of animals to maintain power for QTL detection. of founder strains. For example, the genealogies of However, more animals are also needed because, in fully HAPLOTYPE SHARING many inbred strains can be traced back to a relatively outbred populations, more loci are likely to contribute to Sets of closely linked genetic small number of animals93,94.Moreover, the pattern of the genetic variance. Therefore, the relative contribution variants in different individuals that are identical by descent sequence variation between inbred strains initially indi- to the phenotypic variance that is attributable to a single around a locus. cated that the distribution of polymorphisms has a locus will be less than for the same locus in a population

280 | APRIL 2005 | VOLUME 6 www.nature.com/reviews/genetics © 2005 Nature Publishing Group

REVIEWS

Figure 3 | Haplotype complexity in inbred strains of mice. 700 variants with a strain distribution pattern frequency >1% between eight inbred strains across a 2-Mb region of mouse chromosome 1 are shown. The region is represented along the horizontal axis and scaled so that the nth coordinate from the left edge corresponds to the nth variant. The alternating grey and white tracks show the spatial arrangement of the 13 most common strain distribution patterns. Each track represents one strain distribution pattern, a bar on the track shows where the corresponding variant has that strain distribution pattern. The strains are A/J, AKR, BALB/cJ, C3H, C57BL/6J, DBA/2J, I and RIII. The data presented are from REF.37.

of individuals who have common ancestors (BOX 1). The power of in silico mapping will vary along the However, as we have seen, inbred strains do share genome depending on the complexity of the pattern of ancestors, so their degree of genetic relatedness is not sequence variation. There will be effects on both power equivalent to that in outbred humans. and resolution, depending on the number of progenitor To obtain a lower limit on the number of animals genomes that are assumed to give rise to the inbred needed to detect a genetic effect by in silico mapping, strains and on the size of the blocks of sequence identity. assume that all inbreds are derived from just two prog- Although initial surveys of polymorphisms have enitors; in other words, assume the inbreds are a set of revealed an apparently simple pattern of relatively large RI lines. Using a standard equation to calculate the blocks of sequence similarity and difference, detailed number of RI lines required to map a QTL, Darvasi analyses of multiple strains show that contiguous strain argued that between 40 and 150 inbred strains would distribution patterns are different, but consistent with provide 50% power to detect a QTL that explains the same phylogenetic tree. 5–20% of phenotypic variation in an F2 cross42.So,the FIGURE 3 shows a representative example of the com- smallest number needed to obtain sufficient power will plexities that have been discovered in a single 2-Mb be about 50 animals. However, it will almost certainly be region on mouse chromosome 1 (REF. 38).In the right more than that, because inbreds are not as closely half of the figure, the structure is relatively simple; related to each other as a set of RI lines. without too much loss of information, the distribu- Nevertheless, Peltz and colleagues claimed that they tion of sequence variants can be modelled as if the could detect QTLs using data from just eight inbred genomes were descended from just two progenitor strains. They make the point that by phenotyping more strains, with one strain distribution pattern accounting animals it is possible to increase the heritability by for 93% of the variants. In this case, in silico mapping reducing measurement error100.The drawback to this could detect QTLs (assuming that an adequate num- argument is that, even if all environmental noise could ber of strains has been analysed). However, the be removed in this way, the individual effect of each method cannot resolve QTLs into an interval that is QTL will still be limited by the number of other QTLs smaller than the block (which in this case is about that contribute to phenotypic variation. Good estimates 1 Mb). By contrast, the left half of the figure shows a of the average number of QTLs and their relative contri- much more complex structure, with several strain dis- butions to a typical complex trait are hard to come by, tribution patterns that alternate in an unpredictable but assuming an exponential distribution of effect sizes, fashion, but are still consistent with the same phylo- our own data on behavioural measures yield a mean of genetic tree. Much higher mapping resolution will be 7 QTLs (with an upper limit of 14 using a 95% confi- obtained in this area, but with some loss in the power dence interval (REF.101)). So, given the known distribu- to detect QTLs. tion and effect sizes of QTLs, it is extremely unlikely that The analysis of sequence variation described above a single effect accounting for more than 20% of the phe- was carried out using just eight strains, but we expect that notypic variance will occur in a comparison of inbred the results can be extrapolated to accommodate more strains. Therefore, we can be reasonably certain that strains. Importantly, the full complexity of the structure to tackle genetically complex phenotypes successfully, was only revealed when near-complete sequence infor- in silico mapping will require much larger numbers of mation became available. If the structure is mis-specified, strains than have so far been analysed. as is likely with a low density of markers (which in this The second difficulty for in silico mapping concerns particular region would be more than about 10 kb apart), the structure of the genomes of inbred strains. Is the differences between strains will not be identified and so degree of mosaicism sufficient to satisfy the assump- the power to detect QTLs will be reduced. tions of the mapping method? The idea that large seg- In silico mapping is an intriguing and attractive ments of the genomes of inbred mice maintained their method, but it does require the ability to infer identi- sequence identity over many megabases has recently cal sequences unequivocally when neighbouring been modified by several publications that describe a genotypes are the same. The recent announcement much more complex picture37,102,103. that 15 inbred mouse strains are to be resequenced

NATURE REVIEWS | GENETICS VOLUME 6 | APRIL 2005 | 281 © 2005 Nature Publishing Group

REVIEWS

means that questions about appropriate marker den- guarantee that the physical coincidence of gene expres- sity will soon be addressed (see the US National sion and QTL mapping data mean the two are causally Institute of Environmental Health Sciences Press related. This is a problem for Peltz and colleagues, who Releases web site), but the potential of in silico map- claim to have identified an allele-specific functional ping will only materialize when we have the complete genomic element that regulates the expression of the sequences of more than 50 inbred strains. histocompatibility 2, class II antigen E-α gene (H2-Ea)98. In silico mapping was used to map H2-Ea expression Gene-expression profiling variation to a 1-kb block that lies within the first intron Few strategies even attempt to turn a QTL from a locus of the gene, followed by in vitro assays to identify a into a molecular variant, which highlights the difficulty putative 45-bp regulatory element. But differential in confirming the role of a gene in a complex pheno- expression of H2-Ea is caused by several mechanisms, type. Gene-expression profiling is not a method for including a loss-of-function deletion of 600 bp in the QTL detection by itself, but when combined with promoter and first exon of the H2-Ea gene111,112. genetic mapping data it can help to identify candidates, even for behavioural phenotypes104. Quantitative complementation In most examples, researchers have compared the Quantitative complementation, or QTL–knockout gene-expression profiles of inbreds and congenics, to interaction, which was designed originally for QTL reduce the number of background strain differences work in Drosophila by Trudy MacKay113, is a method for and to leave just a small number of genes for further testing the candidacy of a gene at a QTL. It is oblivious consideration. The good news is that relatively few to the nature and position of the responsible sequence differences are found, even when a large chromosome variant, a feature that is both its strength and weakness, segment is derived from two genetically distant because whereas one interpretation of a positive result is strains105.Gene-expression profiling identified the allelism to the QTL, another is . Strictly speaking, fatty-acid translocase gene CD36 as a QTL that affects the test is not complementation in its classical form insulin fatty-acid metabolism, an example that other because it is testing for an interaction between the null researchers have attempted to emulate, with varying allele and the QTL, rather than for a main effect of either. degrees of success: complement factor 5 has been sug- For this reason, the name QTL–knockout interaction test gested as a gene at a QTL that influences susceptibility is preferable, as suggested by Darvasi114. in a model of asthma106, lipoxygenase (Alox15) has The experimental design is simple — it requires off- been proposed as a candidate gene for bone mass107, spring from four crosses. An inbred animal bearing one an interferon-inducible gene for susceptibility to an QTL allele (for example ‘high’) is mated to an inbred autoimmune condition (systemic lupus)108 and glu- animal with a null allele of the gene of interest (‘m’) and tathione S-transferase M2 has been identified as a also to the co-isogenic wild-type animal (‘wt’). A similar gene that might be involved in hypertension109. pair of crosses is established, but this time using an Although this method has strengths, there are also inbred strain with the alternative QTL allele (‘low’). If several caveats. First, differential gene expression is not the difference in mean phenotype between the high/m always a marker of a QTL. Variants that alter protein and low/m genotypes is greater than that between the structure might not alter expression levels, or there high/wt and low/wt genotypes then we have evidence of could be compensatory mechanisms that obscure the quantitative failure of the mutation to complement the effect of a QTL on expression. Todd and colleagues QTL alleles. This is detected as a statistical ‘cross’ (m or used microarray analysis to study a model of type I dia- wt) by ‘line’ (high or low) interaction in a two-way betes and reported that disease protection that is con- analysis of variance. One biological interpretation of a ferred by each of three known loci was not reflected in significant interaction is that the expression of the wild- global effects on the non-induced immune system, and type (that is, functional) gene is modulated by a QTL that gene-regulatory systems seemed to be remarkably allele on the homologous chromosome. The test does robust to genetic variation105. not implicate any particular QTL, which could be any- Second, expression differences might be restricted to where on the genome where the high and low strains certain tissues or developmental stages. For example, differ. expression of 5HT1a receptors (Htr1a) in the forebrain is QTL–knockout interaction tests require co-isogenic required to modulate anxiety during embryonic and fetal wild types, which can be difficult to obtain in mice. life, but it is not required for the same task in adult ani- Knockouts created in a 129 strain are usually back- mals110.Therefore, expression differences for genes that crossed onto a different strain (typically C57BL/6) so encode the serotonin receptor that are relevant to anxiety that often, no pure co-isogenic wild type is available. would not be detected in the adult brain, although the However, when the experimenter has only the hybrid phenotype is assessed in the adult animal. Gene-expres- to work with, the problem of mixed background can sion analyses will have to be carried out across a large sometimes be overcome by taking advantage of the number of tissues at different developmental times to mosaic nature of the mouse genome: some regions of determine whether the gene is differentially expressed. the 129 strain will be identical to the strain onto Third, finding a gene-expression difference within a which it has been backcrossed. Where the targeted relevant tissue in a relevant biochemical pathway does gene occurs in such a region (or in a region that is not prove the gene’s candidacy at the QTL. There is no known from genetic crosses not to carry QTLs that

282 | APRIL 2005 | VOLUME 6 www.nature.com/reviews/genetics © 2005 Nature Publishing Group

REVIEWS

influence the trait of interest), and the rest of the genetic architecture, rather than considering whether 129 strain has been removed by repeated backcrossing, there is a strategy that can reduce the genetic basis of then it should be possible to find an appropriate co-iso- any phenotype to a monogenic condition. genic wild type. For example, by extensive resequencing High-resolution mapping of small-effect loci will of the 129, C57BL/6J and DBA/2J strains, we showed that undoubtedly benefit from some of the new resources inbred C57BL/6 could be combined with a targeted we have reviewed. Although some, such as the use of mutation of the Rgs2 gene in a quantitative complemen- CSS, do little more than accelerate QTL discovery, and tation test31.This arduous task can be avoided in the others, such as in silico mapping, make claims that future once the relevant strains have been fully rese- have yet to be realized, there are grounds for opti- quenced. Alternatively, knockouts could be made and mism. The use of high-resolution mapping in outbred maintained on a single background or obtained by mice and the construction of a large set of RI strains screening the DNA of mutagenized inbred mice115. (the Collaborative Cross), combined with RISTs, will make high-resolution mapping routine; and once the Conclusion sequences of many inbred strains are available, in silico Throughout this review we have emphasized two mapping could become a powerful tool. Nevertheless, points: the importance of being able to resolve QTLs even the ability to map down to under a kilobase will that contribute to less than 5% of the phenotypic not always deliver genes into the hands of researchers. variance into small regions and the need for methods The availability of whole-genome analyses, at the level that identify genes, not sequence variants, at the QTL. of both sequence variation and gene expression, can We stress the first point because success stories in induce a hubris which blinds us to the fact that com- QTL cloning have identified genetic loci with atypi- plex phenotypes continue to frustrate genetic analysis. cally large effects that have obviously functional This is one reason why we have stressed gene iden- sequence changes. Therefore, an expectation has tification as a goal for QTL mapping — we lack arisen that loci with smaller effects can be character- methods that deliver genes. Almost all the methods ized using extensions of the same strategies: more we discuss are genetic mapping strategies that chase mice, more congenics and more expression arrays. the segregating sequence variants that are responsible However, methods that work for large-effect QTLs are for a QTL; they do not directly find genes. It is not our not well suited to finding genes that underlie most intention to diminish the importance of characteriz- QTLs. Not only do most QTLs typically have small ing sequence variants, as there are cases where we will effects116,but the type of sequence variant responsible, need to know the causal SNP (or whatever other form even when it occurs in a coding region, is also hard to the molecular pathology takes). Instead, we wish to recognize36, and the expression differences, if present, make it clear that the two goals are not necessarily the are subtle105. same. Stated simply, finding the QTN does not neces- Given its importance for determining the success sarily give us the quantitative trait gene. One explana- of QTL cloning, it is perhaps surprising that so little tion for the subtle effects of QTLs is that they lie in attention has been paid to emerging clues that reveal regulatory regions (the sequence features of which are considerable variation in the genetic architecture of poorly understood) that can lie at a considerable dis- phenotypes. It seems that the number of loci, their tance from cognate transcriptional units. For exam- relative effects, and how they interact with each other ple, in the polydactylous mouse mutant Sasquatch, and the environment, might vary considerably (Shh) is expressed at an ectopic site. between phenotypes, indicating that some pheno- Characterization of the mutant led to the identification of types might be more tractable to QTL cloning than an Shh enhancer element that lies within intron 5 of a others. For example, it is striking that susceptibility to novel gene, and is also involved in limb development infectious disease in inbred strains has frequently (limb region 1 (Lmbr1)128), that is situated 1 Mb from turned out to be due to large genetic effects, and, Shh129,130.Consider what could happen if a QTL had been where characterized, to null alleles or coding-sequence found at this regulatory region: the obvious, but erro- mutations117–122.Why this should be the case, when neous, conclusion would be that the Lmbr1 gene was the other apparently equally complex phenotypes (such quantitative trait gene. as , drug responses and obesity) have The transition from locus to gene remains a formi- much more complex genetic architecture, is unclear, dable task. None of the strategies we have reviewed but it might be a consequence of the nature and here provide a comprehensive solution to gene identi- extent of the selective pressure that is exerted by infec- fication after QTL mapping. Simple approaches, such tious agents. There are also differences in the extent to as investigating the phenotype of the knockout, will which gene-by-gene interactions (epistasis) shape a add weight to the candidature of the gene, but will not phenotype. Pervasive epistatic effects have been docu- prove it. Definitive studies, such as the targeted inter- mented in some autoimmune conditions123,morphol- change of nucleotides between two strains, will prove ogy124 and in susceptibility to some cancers125,126,but that a sequence variant operates as a QTL, but will not despite intensive searches, the genetic architecture unambiguously identify a gene. For that, we still need underlying fear-related phenotypes consists almost functional assays that make few assumptions about entirely of additive effects127.QTL cloning strategies the biology of candidate genes (for example, where need to accommodate all these possible types of and when they are expressed, and at what levels).

NATURE REVIEWS | GENETICS VOLUME 6 | APRIL 2005 | 283 © 2005 Nature Publishing Group

REVIEWS

Among the approaches described here, only the in RNAi technology might aid functional investiga- gene–knockout interaction test comes close to meeting tion of those cases in which QTLs operate by altering the necessary requirements. Where a null allele of the expression. It might soon be possible to reproduce relevant gene can be found, and an appropriate co-iso- naturally occurring variation, to abrogate the exp- genic strain is available, the test provides a simple test ression of either parental mRNA, or to generate of candidacy and interpretation of a positive result, null alleles on the appropriate background for allelism or epistasis provides important information. gene–knockout interaction testing. The challenge The current drawback is the lack of appropriate for the coming years is to develop new technologies mutants, but this might change with plans to knock that will accelerate gene identification after detection out all mouse genes131.Furthermore, developments of a QTL.

1. Hilbert, P. et al. Chromosomal mapping of two genetic loci 24. Garrett, M. R. & Rapp, J. P. Two closely linked interactive 40. Darvasi, A., Weinreb, A., Minke, V., Weller, J. I. & Soller, M. associated with blood-pressure regulation in hereditary blood pressure QTL on rat chromosome 5 defined using Detecting marker-QTL linkage and estimating QTL gene hypertensive rats. Nature 353, 521–529 (1991). congenic Dahl rats. Physiol. 8, 81–86 (2002). effect and map location using a saturated genetic map. 2. Jacob, H. et al. Genetic mapping of a gene causing 25. Garrett, M. R. & Rapp, J. P. Multiple blood pressure QTL on Genetics 134, 943–951 (1993). hypertension in the stroke-prone spontaneously rat chromosome 2 defined by congenic Dahl rats. Mamm. 41. Darvasi, A. & Soller, M. A simple method to calculate hypertensive rat. Cell 67, 213–224 (1991). Genome 13, 41–44 (2002). resolving power and confidence interval of QTL map 3. Lander, E. S. & Botstein, D. Mapping Mendelian factors 26. Frantz, S., Clemitson, J. R., Bihoreau, M. T., Gauguier, D. & location. Behav. Genet. 27, 125–132 (1997). underlying quantitative traits using RFLP linkage maps. Samani, N. J. Genetic dissection of region around the Sa 42. Darvasi, A. Experimental strategies for the genetic dissection Genetics 121, 185–199 (1989). gene on rat chromosome 1: evidence for multiple loci of complex traits in animal models. Nature Genet. 18, 19–24 4. Todd, J. A. et al. Genetic analysis of autoimmune type 1 affecting blood pressure. Hypertension 38, 216–221 (2001). (1998). diabetes mellitus in mice. Nature 351, 542–547 (1991). 27. Podolin, P. L. et al. Localization of two insulin-dependent An excellent review of QTL detection and fine-mapping 5. Lynch, M. & Walsh, B. Genetics and Analysis of Quantitative diabetes (Idd) genes to the Idd10 region on mouse methods in rodents, and the first description of the Traits (Sinauer Associates, Sunderland, Massachusetts, Chromosome 3. Mamm. Genome 9, 283–286 (1998). recombination inbred segregation test, which was then 1998). 28. Puel, A. et al. Identification of two quantitative trait loci extended to Yin–Yang crosses in reference 58. 6. Turri, M. G., Henderson, N. D., DeFries, J. C. & Flint, J. involved in antibody production on mouse chromosome 8. 43. Darvasi, A. & Soller, M. Advanced intercross lines, an Quantitative trait locus mapping in laboratory mice derived Immunogenetics 47, 326–331 (1998). experimental population for fine genetic mapping. Genetics from a replicated selection experiment for open-field activity. 29. Bihl, F., Brahic, M. & Bureau, J. F. Two loci, Tmevp2 and 141, 1199–1207 (1995). Genetics 158, 1217–1226 (2001). Tmevp3, located on the telomeric region of , The first description of advanced intercross lines for 7. Turri, M. G., Datta, S. R., DeFries, J., Henderson, N. D. & control the persistence of Theiler’s virus in the central QTL mapping in rodents. Flint, J. QTL analysis identifies multiple behavioral nervous system of mice. Genetics 152, 385–392 (1999). 44. Matin, A., Collin, G. B., Asada, Y., Varnum, D. & Nadeau, J. H. dimensions in ethological tests of anxiety in laboratory mice. 30. Mackay, T. F. The genetic architecture of quantitative traits. Susceptibility to testicular germ-cell tumours in a Curr. Biol. 11, 725–734 (2001). Annu. Rev. Genet. 35, 303–339 (2001). 129.MOLF-Chr 19 chromosome substitution strain. Nature 8. Henderson, N. D., Turri, M. G., DeFries, J. C. & Flint, J. QTL 31. Yalcin, B. et al. Genetic dissection of a behavioral Genet. 23, 237–240 (1999). Analysis of multiple behavioral measures of anxiety in mice. quantitative trait locus shows that Rgs2 modulates anxiety in The first use of a chromosome-substitution strain for Behav. Genet. 34, 267–293 (2004). mice. Nature Genet. 36, 1197–1202 (2004). QTL mapping. 9. Belknap, J. K. & Atkins, A. L. The replicability of QTLs for The complexity of QTL architecture in mice becomes 45. Nadeau, J. H., Singer, J. B., Matin, A. & Lander, E. S. murine alcohol preference drinking behavior across eight apparent in this paper, which used probabilistic Analysing complex genetic traits with chromosome independent studies. Mamm. Genome 12, 893–899 (2001). ancestral haplotype reconstruction in outbred mice substitution strains. Nature Genet. 24, 221–225 (2000). 10. Nadeau, J. H. & Frankel, W. N. The roads from phenotypic and a knockout interaction test to identify a candidate 46. Roman, R. J. et al. in Cold Spring Harbor Symposia on variation to gene discovery: mutagenesis versus QTLs. gene. Quantitative Biology Vol. LXVII 309–315 (Cold Sping Harbor Nature Genet. 25, 381–384 (2000). 32. Mackay, T. F. The genetic architecture of quantitative traits: Laboratory, New York, 2002). A discussion of the relative advantages of QTL lessons from Drosophila. Curr. Opin. Genet. Dev. 14, 47. Cowley, A. W. Jr, Liang, M., Roman, R. J., Greene, A. S. & mapping and mutagenesis for investigating the 253–257 (2004). Jacob, H. J. Consomic rat model systems for physiological molecular basis of complex traits. An excellent review of the genetic basis of complex genomics. Acta Physiol. Scand. 181, 585–592 (2004). 11. Korstanje, R. & Paigen, B. From QTL to gene: the harvest traits, from the view point of Drosophila genetics. 48. Law, C. N. The location of genetic factors affecting a begins. Nature Genet. 31, 235–236 (2002). 33. Steinmetz, L. M. et al. Dissecting the architecture of a quantitative character in wheat. Genetics 53, 487–498 12. Glazier, A. M., Nadeau, J. H. & Aitman, T. J. Finding genes quantitative trait locus in yeast. Nature 416, 326–330 (1966). that underlie complex traits. Science 298, 2345–2349 (2002). (2002). 49. Caligari, P. D. & Mather, K. Genotype–environment 13. Page, G. P., George, V., Go, R. C., Page, P. Z. & Allison, D. B. Even yeast have QTLs, and the formidable power of interaction. III. Interactions in Drosophila melanogaster. ‘‘Are we there yet?’’: Deciding when one has demonstrated yeast genetics was used here to show that neither Proc. R. Soc. Lond. B 191, 387–411 (1975). specific genetic causation in complex diseases and expression differences nor sequence variation are 50. Singer, J. B., Hill, A. E., Nadeau, J. H. & Lander, E. S. quantitative traits. Am. J. Hum. Genet. 73, 711–719 (2003). enough to identify their molecular basis. This paper 14. Abiola, O. et al. The nature and identification of quantitative Mapping quantitative trait loci for anxiety in chromosome trait loci: a community’s view. Nature Rev. Genet. 4, introduced reciprocal hemizygosity for gene substitution strains of mice. Genetics 15 September 2004 911–916 (2003). identification. (doi:10.1534/genetics.104.031492). 15. Ferraro, T. N. et al. Fine mapping of a seizure susceptibility 34. Aitman, T. J. et al. Quantitative trait loci for cellular defects in 51. Singer, J. B. et al. Genetic dissection of complex traits with locus on mouse chromosome 1: nomination of Kcnj10 as a glucose and fatty acid metabolism in hypertensive rats. chromosome substitution strains of mice. Science 304, causative gene. Mamm. Genome 15, 239–251 (2004). Nature Genet. 16, 197–201 (1997). 445–448 (2004). 16. Shirley, R. L., Walter, N. A., Reilly, M. T., Fehr, C. & Buck, K. J. 35. Vingsbo-Lundberg, C. et al. Genetic control of arthritis This paper describes the construction of the first Mpdz is a quantitative trait gene for drug withdrawal onset, severity and chronicity in a model for rheumatoid complete set of chromosome-substitution strains seizures. Nature Neurosci. 7, 699–700 (2004). arthritis in rats. Nature Genet. 20, 401–404 (1998). and their application in genome-wide QTL 17. Beavis, W. D. in Molecular Analysis of Complex Traits 36. Thomas, P. D. & Kejariwal, A. Coding single-nucleotide mapping. (ed. Paterson, A. H.) 123–150 (CRC, Boca Raton, Florida, polymorphisms associated with complex vs. Mendelian 52. Belknap, J. K. Chromosome substitution strains: some 1998). disease: evolutionary evidence for differences in molecular quantitative considerations for genome scans and fine 18. Beavis, W. D. in 49th Annual Corn and Sorghum Research effects. Proc. Natl Acad. Sci. USA 101, 15398–15403 mapping. Mamm. Genome 14, 723–732 (2003). Conference 252–268 (American Seed Trade Association, (2004). 53. Churchill, G. A. et al. The Collaborative Cross, a community Washington DC, 1994). The molecular basis of complex traits in humans is resource for the genetic analysis of complex traits. Nature 19. Legare, M. E., Bartlett, F. S. & Frankel, W. N. A major effect shown to differ from the molecular basis of disorders Genet. 36, 1133–1137 (2004). QTL determined by multiple genes in epileptic EL mice. owing to highly penetrant mutations. This paper describes the Collaborative Cross and Genome Res. 10, 42–48 (2000). 37. Yalcin, B. et al. Unexpected complexity in the haplotypes of explains what the proposed resource would provide 20. Stylianou, I. M. et al. Genetic complexity of an obesity QTL commonly used inbred strains of laboratory mice. Proc. Natl for complex trait analysis. (Fob3) revealed by detailed genetic mapping. Mamm. Acad. Sci. USA 101, 9734–9739 (2004). 54. Belknap, J. K., Mitchell, S. R., O’Toole, L. A., Helms, M. L. & Genome 15, 472–481 (2004). The first of a series of papers that showed that the Crabbe, J. C. Type I and type II error rates for quantitative 21. Christians, J. K. & Keightley, P. D. Fine mapping of a murine DNA-sequence relationship between inbred mouse trait loci (QTL) mapping studies using recombinant inbred growth locus to a 1.4-cM region and resolution of linked strains is remarkably complex, a finding with mouse strains. Behav. Genet. 26, 149–160 (1996). QTL. Mamm. Genome 15, 482–491 (2004). important implications for in silico mapping strategies 55. Valdar, W., Flint, J. & Mott, R. Simulating the collaborative 22. Ariyarajah, A. et al. Dissecting quantitative trait loci into (see also references 102 and 103). cross: power of QTL detection and mapping resolution opposite blood pressure effects on Dahl rat chromosome 8 38. Nobrega, M. A., Ovcharenko, I., Afzal, V. & Rubin, E. M. afforded by a large set of recombinant inbred strains. by congenic strains. J. Hypertens. 22, 1495–1502 (2004). Scanning human gene deserts for long-range enhancers. Genetics (in the press). 23. Alemayehu, A., Breen, L., Krenova, D. & Printz, M. P. Science 302, 413 (2003). 56. Williams, R. W. et al. Genetic structure of the LXS panel of Reciprocal rat chromosome 2 congenic strains reveal 39. van Ooijen, J. W. Accuracy of mapping quantitative trait loci recombinant inbred mouse strains: a powerful resource for contrasting blood pressure and heart rate QTL. Physiol. in autogamous species. Theor. Appl. Genet. 84, 803–811 complex trait analysis. Mamm. Genome 15, 637–647 Genomics 10, 199–210 (2002). (1992). (2004).

284 | APRIL 2005 | VOLUME 6 www.nature.com/reviews/genetics © 2005 Nature Publishing Group

REVIEWS

57. Peirce, J. L., Lu, L., Gu, J., Silver, L. M. & Williams, R. W. 84. Morris, A. P., Whittaker, J. C., Xu, C. F., Hosking, L. K. & 109. McBride, M. W. et al. Microarray analysis of rat chromosome A new set of BXD recombinant inbred lines from advanced Balding, D. J. Multipoint linkage-disequilibrium mapping 2 congenic strains. Hypertension 41, 847–853 (2003). intercross populations in mice. BMC Genet. 5, 7 (2004). narrows location interval and identifies mutation 110. Gross, C. et al. Serotonin1A receptor acts during 58. Shifman, S. & Darvasi, A. Mouse inbred strain sequence heterogeneity. Proc. Natl Acad. Sci. USA 100, development to establish normal anxiety-like behaviour in information and Yin–Yang crosses for QTL fine mapping. 13442–13446 (2003). the adult. Nature 416, 396–400 (2002). Genetics 1 November 2004 85. Meuwissen, T. H. & Goddard, M. E. Mapping multiple QTL A salutary lesson in the use of gene-expression (doi:10.1534/genetics.104.032474). using linkage disequilibrium and linkage analysis information methods: this paper is an example of a mutation in 59. Visscher, P. M. Speed congenics: accelerated genome and multitrait data. Genet. Sel. Evol. 36, 261–279 (2004). which the phenotype depends on where and when the recovery using genetic markers. Genet. Res. 74, 81–85 86. Churchill, G. A. & Doerge, R. W. Empirical threshold values mutation occurs. (1999). for quantitative trait mapping. Genetics 138, 963–971 111. Mathis, D. J., Benoist, C., Williams, V. E. 2nd, Kanter, M. & 60. Zhang, Y. et al. Positional cloning of a quantitative trait locus (1994). McDevitt, H. O. Several mechanisms can account for on chromosome 13q14 that influences immunoglobulin E 87. Mott, R. & Flint, J. Simultaneous detection and fine mapping defective E α gene expression in different mouse levels and asthma. Nature Genet. 34, 181–186 (2003). of quantitative trait Loci in mice using heterogeneous stocks. haplotypes. Proc. Natl Acad. Sci. USA 80, 273–277 (1983). 61. Lipkin, S. M. et al. The MLH1 D132H variant is associated Genetics 160, 1609–1618 (2002). 112. Jones, P. P., Murphy, D. B. & McDevitt, H. O. Variable with susceptibility to sporadic colorectal cancer. Nature 88. Wang, X., Korstanje, R., Higgins, D. & Paigen, B. Haplotype synthesis and expression of E α and Ae (E β) Ia polypeptide Genet. 36, 694–699 (2004). analysis in multiple crosses to identify a QTL gene. Genome chains in mice of different H-2 haplotypes. Immunogenetics 62. Guo, D. et al. A functional variant of SUMO4, a new Res. 14, 1767–1772 (2004). 12, 321–337 (1981). IκBα modifier, is associated with type 1 diabetes. 89. Manenti, G. et al. Haplotype sharing suggests that a 113. Long, A. D., Mullaney, S. L., Mackay, T. F. C. & Langley, C. H. Nature Genet. 36, 837–841 (2004). genomic segment containing six genes accounts for the Genetic interactions between naturally occurring alleles at 63. Stoll, M. et al. Genetic variation in DLG5 is associated with pulmonary adenoma susceptibility 1 (Pas1) locus activity in quantitative trait loci and mutant alleles at candidate loci inflammatory bowel disease. Nature Genet. 36, 476–480 mice. 23, 4495–4504 (2004). affecting bristle number in Drosophila melanogaster. (2004). 90. Park, Y. G., Clifford, R., Buetow, K. H. & Hunter, K. W. Multiple Genetics 144, 1497–1510 (1996). 64. Gretarsdottir, S. et al. The gene encoding cross and inbred strain haplotype mapping of complex-trait The first description of quantitative complementation phosphodiesterase 4D confers risk of ischemic stroke. candidate genes. Genome Res. 13, 118–121 (2003). testing for the investigation of candidate genes at Nature Genet. 35, 131–138 (2003). 91. Grubb, S. C., Churchill, G. A. & Bogue, M. A. A collaborative QTL, carried out in Drosophila melanogaster. 65. Prokunina, L. et al. A regulatory polymorphism in PDCD1 is database of inbred mouse strain characteristics. 114. Darvasi, A. Dissecting complex traits: the geneticists’ associated with susceptibility to systemic lupus Bioinformatics 20, 2857–2859 (2004). “around the world in 80 days”. Trends Genet. (in the press). erythematosus in humans. Nature Genet. 32, 666–669 92. Grupe, A. et al. In silico mapping of complex disease-related 115. Coghill, E. L. et al. A gene driven approach to the (2002). traits in mice. Science 292, 1915–1918 (2001). identification of ENU mutants in the mouse. Nature Genet. 66. Rioux, J. D. et al. Genetic variation in the 5q31 cytokine The paper that introduced in silico mapping to the 30, 255–256 (1999). gene cluster confers susceptibility to Crohn disease. Nature world of mouse genetics. 116. Flint, J. Analysis of quantitative trait loci that influence animal Genet. 29, 223–228 (2001). 93. Ferris, S. D., Sage, R. D. & Wilson, A. C. Evidence from behavior. J. Neurobiol. 54, 46–77 (2003). 67. Altshuler, D. et al. The common PPARγ Pro12Ala mtDNA sequences that common laboratory strains of inbred 117. Min-Oo, G. et al. Pyruvate kinase deficiency in mice protects polymorphism is associated with decreased risk of type 2 mice are descended from a single female. Nature 295, against malaria. Nature Genet. 35, 357–362 (2003). diabetes. Nature Genet. 26, 76–80 (2000). 163–165 (1982). 118. Mitsos, L. M. et al. Susceptibility to tuberculosis: a locus on 68. Hitzemann, R. et al. Multiple cross mapping (MCM) 94. Beck, J. A. et al. Genealogies of mouse inbred strains. mouse chromosome 19 (Trl-4) regulates Mycobacterium markedly improves the localization of a QTL for ethanol- Nature Genet. 24, 23–25 (2000). tuberculosis replication in the lungs. Proc. Natl Acad. Sci. induced activation. Genes Brain Behav. 1, 214–222 (2002). 95. Wade, C. M. et al. The mosaic structure of variation in the USA 100, 6610–6615 (2003). 69. Talbot, C. J. et al. Fine scale mapping of a genetic locus for genome. Nature 420, 574–578 (2002). 119. Diez, E. et al. Birc1e is the gene within the Lgn1 locus conditioned fear. Mamm. Genome 14, 223–230 (2003). 96. Lindblad-Toh, K. et al. Large-scale discovery and associated with resistance to Legionella pneumophila. 70. Mott, R., Talbot, C. J., Turri, M. G., Collins, A. C. & Flint, J. genotyping of single-nucleotide polymorphisms in the Nature Genet. 33, 55–60 (2003). A method for fine mapping quantitative trait loci in outbred mouse. Nature Genet. 24, 381–386 (2000). 120. Mitsos, L. M. et al. Genetic control of susceptibility to animal stocks. Proc. Natl Acad. Sci. USA 97, 12649–12654 97. Wiltshire, T. et al. Genome-wide single-nucleotide infection with Mycobacterium tuberculosis in mice. Genes (2000). polymorphism analysis defines haplotype patterns in Immunol. 1, 467–477 (2000). The introduction of probabilistic ancestral haplotype mouse. Proc. Natl Acad. Sci. USA 100, 3380–3385 (2003). 121. Vidal, S. M., Malo, D., Vogan, K., Skamene, E. & Gros, P. reconstruction for mapping QTL using heterogeneous References 95, 96 and 97 presented the first Natural resistance to infection with intracellular parasites: stocks of mice. description of genome-wide distribution of genetic isolation of a candidate for Bcg. Cell 73, 469–485 (1993). 71. Talbot, C. J. et al. High-resolution mapping of quantitative variation in the mouse, and suggested that the 122. Poltorak, A. et al. Defective LPS signaling in C3H/HeJ and trait loci in outbred mice. Nature Genet. 21, 305–308 (1999). observed mosaic structure was due to common C57BL/10ScCr mice: mutations in Tlr4 gene. Science 282, The first use of heterogeneous stock mice for QTL descent from a relatively few progenitors strains. 2085–2088 (1998). mapping 98. Liao, G. et al. In silico genetics: identification of a functional 123. Wanstrat, A. & Wakeland, E. The genetics of complex 72. Nagase, H. et al. Distinct genetic loci control development of element regulating H2-Eα gene expression. Science 306, autoimmune diseases: non-MHC susceptibility genes. benign and malignant skin tumours in mice. Nature Genet. 690–695 (2004). Nature Immunol. 2, 802–809 (2001). 10, 424–429 (1995). 99. Pletcher, M. T. et al. Use of a dense single nucleotide 124. Leamy, L. J., Routman, E. J. & Cheverud, J. M. An epistatic 73. Nagase, H., Mao, J. H. & Balmain, A. A subset of skin tumor polymorphism map for in silico mapping in the mouse. PLoS genetic basis for fluctuating asymmetry of mandible size in modifier loci determines survival time of tumor-bearing mice. Biol. 2, e393 (2004). mice. 56, 642–653 (2002). Proc. Natl Acad. Sci. USA 96, 15032–15037 (1999). 100. Usuka, J. et al. In silico mapping of mouse quantitative trait 125. van Wezel, T., Ruivenkamp, C. A., Stassen, A. P., Moen, C. J. 74. Ewart-Toland, A. et al. Identification of Stk6/STK15 as a loci. Science 5551, 2423 (2001). & Demant, P. Four new colon cancer susceptibility loci, candidate low-penetrance tumor-susceptibility gene in 101. Turri, M. G., De Fries, J. C., Henderson, N. D. & Flint, J. Scc6 to Scc9 in the mouse. Cancer Res. 59, 4216–4218 mouse and human. Nature Genet. 34, 403–412 (2003). Multivariate analysis of quantitative trait loci influencing (1999). 75. Manenti, G., Galbiati, F., Noci, S. & Dragani, T. A. Outbred variation in anxiety-related behavior in laboratory mice. 126. Fijneman, R. J., de Vries, S. S., Jansen, R. C. & Demant, P. CD-1 mice carry the susceptibility allele at the pulmonary Mamm. Genome 15, 69–76 (2004). Complex interactions of new quantitative trait loci, Sluc1, adenoma susceptibility 1 (Pas1) locus. Carcinogenesis 24, 102. Frazer, K. A. et al. Segmental phylogenetic relationships of Sluc2, Sluc3, and Sluc4, that influence the susceptibility to 1143–1148 (2003). inbred mouse strains revealed by fine-scale analysis of lung cancer in the mouse. Nature Genet. 14, 465–467 (1996). 76. Neale, B. M. & Sham, P. C. The future of association studies: sequence variation across 4.6 Mb of mouse genome. 127. Flint, J., De Fries, J. C. & Henderson, N. D. Little epistasis for gene-based analysis and replication. Am. J. Hum. Genet. Genome Res. 14, 1493–1500 (2004). anxiety-related measures in the DeFries strains of laboratory 75, 353–362 (2004). 103. Ideraabdullah, F. Y. et al. Genetic and haplotype diversity mice. Mamm. Genome 15, 77–82 (2004). 77. Zondervan, K. T. & Cardon, L. R. The complex interplay among wild-derived mouse inbred strains. Genome Res. 14, 128. Clark, R. M., Marker, P. C. & Kingsley, D. M. A novel among factors that influence allelic association. Nature Rev. 1880–1887 (2004). candidate gene for mouse and human preaxial Genet. 5, 89–100 (2004). 104. Sandberg, R. et al. Regional and strain-specific gene with altered expression in limbs of Hemimelic extra-toes 78. McKeigue, P. M. Prospects for admixture mapping of expression mapping in the adult mouse brain. Proc. Natl mutant mice. Genomics 67, 19–27 (2000). complex traits. Am. J. Hum. Genet. 76, 1–7 (2005). Acad. Sci. USA 97, 11038–11043 (2000). 129. Lettice, L. A. et al. A long-range Shh enhancer regulates 79. Perez-Enciso, M. Fine mapping of complex trait genes 105. Eaves, I. A. et al. Combining mouse congenic strains and expression in the developing limb and fin and is associated combining pedigree and linkage disequilibrium information: microarray gene expression analyses to study a complex with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735 a Bayesian unified framework. Genetics 163, 1497–1510 trait: the NOD model of type 1 diabetes. Genome Res. 12, (2003). (2003). 232–243 (2002). 130. Lettice, L. A. et al. Disruption of a long-range cis-acting 80. Meuwissen, T. H. & Goddard, M. E. Prediction of identity by An example of how gene-expression profiling might regulator for Shh causes preaxial polydactyly. Proc. Natl descent probabilities from marker-haplotypes. Genet. Sel. not be of help in gene identification, and a careful Acad. Sci. USA 99, 7548–7553 (2002). Evol. 33, 605–634 (2001). consideration of the limitations of this approach for 131. Austin, C. P. et al. The knockout mouse project. Nature 81. Meuwissen, T. H. & Goddard, M. E. Fine mapping of gene location. Genet. 36, 921–924 (2004). quantitative trait loci using linkage disequilibria with closely 106. Karp, C. L. et al. Identification of complement factor 5 as a 132. Belknap, J. K. Effect of within-strain sample size on QTL linked marker loci. Genetics 155, 421–430 (2000). susceptibility locus for experimental allergic asthma. Nature detection and mapping using recombinant inbred mouse 82. Farnir, F. et al. Simultaneous mining of linkage and linkage Immunol. 1, 221–226 (2000). strains. Behav. Genet. 28, 29–38 (1998). disequilibrium to fine map quantitative trait loci in outbred 107. Klein, R. F. et al. Regulation of bone mass in mice by the 133. Lander, E. & Kruglyak, L. Genetic dissection of complex half-sib pedigrees: revisiting the location of a quantitative lipoxygenase gene Alox15. Science 303, 229–232 (2004). traits: guidelines for interpreting and reporting linkage trait locus with major effect on milk production on bovine In contrast to reference 105, references 106 and 107 results. Nature Genet.11, 241–247 (1995). chromosome 14. Genetics 161, 275–287 (2002). are demonstrations of the power of gene-expression 134. Belknap, J. K., Mitchell, S. R., Otoole, L. A., Helms, M. L. 83. McPeek, M. S. & Strahs, A. Assessment of linkage profiling for gene location. & Crabbe, J. C. Type-I and type-II error rates for disequilibrium by the decay of haplotype sharing, with 108. Rozzo, S. J. et al. Evidence for an interferon-inducible gene, quantitative trait loci (Qtl) mapping studies using application to fine-scale genetic mapping. Am. J. Hum. Ifi202, in the susceptibility to systemic lupus. Immunity 15, recombinant inbred mouse strains. Behav. Genet. 26, Genet. 65, 858–875 (1999). 435–443 (2001). 149–160 (1996).

NATURE REVIEWS | GENETICS VOLUME 6 | APRIL 2005 | 285 © 2005 Nature Publishing Group

REVIEWS

135. Bolivar, V. J., Cook, M. N. & Flaherty, L. Mapping of 149. Vaughn, T. T. et al. Mapping quantitative trait loci for murine 162. Cormier, R. T. et al. Secretory phospholipase Pla2g2a quantitative trait loci with knockout/congenic strains. growth: a closer look at genetic architecture. Genet. Res. confers resistance to intestinal tumorigenesis. Nature Genet. Genome Res. 11, 1549–1552 (2001). 74, 313–322 (1999). 17, 88–91 (1997). 136. Moen, C. J. et al. The recombinant congenic strains — a 150. Cheverud, J. M. et al. Genetic architecture of adiposity in the 163. Ikeda, A. et al. Microtubule-associated protein 1A is a novel genetic tool applied to the study of colon tumor cross of LG/J and SM/J inbred mice. Mamm. Genome 12, modifier of tubby hearing (moth1). Nature Genet. 30, development in the mouse. Mamm. Genome 1, 217–227 3–12 (2001). 401–405 (2002). (1991). 151. Workman, M. S., Leamy, L. J., Routman, E. J. & 164. Yokoi, N. et al. Cblb is a major susceptibility gene for rat type 137. Iakoubova, O. A. et al. Genome-tagged mice (GTM): two Cheverud, J. M. Analysis of quantitative trait locus effects on 1 diabetes mellitus. Nature Genet. 31, 391–394 sets of genome-wide congenic strains. Genomics 74, the size and shape of mandibular molars in mice. Genetics (2002). 89–104 (2001). 160, 1573–1586 (2002). 165. Ueda, H. et al. Association of the T-cell regulatory gene 138. McClearn, G. E., Wilson, J. R. & Meredith, W. in 152. Reed, D. R. et al. Loci on chromosomes 2, 4, 9, and 16 for CTLA4 with susceptibility to autoimmune disease. Nature Contributions to Behavior-Genetic Analysis: the Mouse as a body weight, body length, and adiposity identified in a 423, 506–511 (2003). Prototype (eds Lindzey, G. & Thiessen, D.) 3–22 (Appleton genome scan of an F2 intercross between the 129P3/J and 166. Podolin, P. L. et al. Differential glycosylation of interleukin 2, Century Crofts, New York, 1970). C57BL/6ByJ mouse strains. Mamm. Genome 14, 302–313 the molecular basis for the NOD Idd3 type 1 diabetes gene? 139. Demarest, K., Koyner, J., McCaughran, J. Jr, Cipp, L. & (2003). Cytokine 12, 477–482 (2000). Hitzemann, R. Further characterization and high-resolution 153. Lammert, F., Carey, M. C. & Paigen, B. Chromosomal 167. Zhang, S. L. et al. Efficiency alleles of the Pctr1 modifier mapping of quantitative trait loci for ethanol-induced organization of candidate genes involved in cholesterol locus for plasmacytoma susceptibility. Mol. Cell Biol. 21, locomotor activity. Behav. Genet. 31, 79–91 (2001). gallstone formation: a murine gallstone map. 310–318 (2001). 140. Rocha, J. L., Eisen, E. J., Van Vleck, L. D. & Pomp, D. Gastroenterology 120, 221–238 (2001). 168. Hamilton-Williams, E. E. et al. Transgenic rescue implicates A large-sample QTL study in mice: I. Growth. Mamm. 154. Cicila, G. T. et al. High-resolution mapping of the blood β2-microglobulin as a diabetes susceptibility gene in Genome 15, 83–99 (2004). pressure QTL on using Dahl rat congenic nonobese diabetic (NOD) mice. Proc. Natl Acad. Sci. USA 141. Rocha, J. L., Eisen, E. J., Van Vleck, L. D. & Pomp, D. strains. Genomics 72, 51–60 (2001). 98, 11533–11538 (2001). A large-sample QTL study in mice: II. Body composition. 155. Farahani, P. et al. Reciprocal hemizygosity analysis of mouse Mamm. Genome 15, 100–113 (2004). hepatic lipase reveals influence on obesity. Obes. Res. 12, Acknowledgements 142. Brockmann, G. A. et al. QTLs for pre- and postweaning 292–305 (2004). The authors are supported by the Wellcome Trust. We would like to body weight and body composition in selected mice. 156. Crackower, M. A. et al. Angiotensin-converting enzyme 2 is thank C. Benoist for comments on the manuscript. Mamm. Genome 15, 593–609 (2004). an essential regulator of heart function. Nature 417, 143. Brockmann, G. A. et al. Single QTL effects, epistasis, 822–828 (2002). Competing interests statement and account for two-thirds of the phenotypic 157. Olofsson, P. et al. Positional identification of Ncf1 as a gene The authors declare no competing financial interests.

F2 variance of growth and obesity in DU6i x DBA/2 mice. that regulates arthritis severity in rats. Nature Genet. 33, Genome Res. 10, 1941–1957 (2000). 25–32 (2003). 144. Taylor, B. A., Wnek, C., Schroeder, D. & Phillips, S. J. 158. Ruivenkamp, C. A. et al. Ptprj is a candidate for the mouse Online links Multiple obesity QTLs identified in an intercross between the colon-cancer susceptibility locus Scc1 and is frequently NZO (New Zealand obese) and the SM (small) mouse deleted in human . Nature Genet. 31, 295–300 (2002). DATABASES strains. Mamm. Genome 12, 95–103 (2001). 159. Bachmanov, A. A. et al. Positional cloning of the mouse The following terms in this article are linked online to: 145. Pitman, W. A. et al. Quantitative trait locus mapping of genes saccharin preference (Sac) locus. Chem. Senses 26, Entrez: http://www.ncbi.nih.gov/Entrez that regulate HDL cholesterol in SM/J and NZB/B1NJ inbred 925–933 (2001). Rgs2 | Stk6 mice. Physiol. Genomics 9, 93–102 (2002). 160. Deschepper, C. F. et al. Functional alterations of the Nppa 146. Paigen, B. et al. Quantitative trait loci mapping for promoter are linked to cardiac ventricular hypertrophy in FURTHER INFORMATION cholesterol gallstones in AKR/J and C57L/J strains of mice. WKY/WKHA rat crosses. Circ. Res. 88, 223–228 Entrez Gene: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi? Physiol. Genomics 4, 59–65 (2000). (2001). CMD=Pager&DB=gene 147. Anunciado, R. V. et al. Quantitative trait locus analysis of 161. Aitman, T. J. et al. Identification of Cd36 (Fat) as an Gene Network: http://www.webqtl.org serum insulin, triglyceride, total cholesterol and phospholipid insulin-resistance gene causing defective fatty acid and Mouse Genome Informatics: http://www.informatics.jax.org

levels in the (SM/J x A/J)F2 mice. Exp. Anim. 52, 37–42 glucose metabolism in hypertensive rats. Nature Genet. 21, Mouse Phenome Database: http://aretha.jax.org/pub- (2003). 76–83 (1999). cgi/phenome/mpdcgi?rtn=docs/home 148. Colinayo, V. V. et al. Genetic loci for diet-induced The first demonstration that gene-expression US National Institute of Environmental Health Sciences Press atherosclerotic lesions and plasma lipids in mice. Mamm. profiling could be used to find genes that underlie Releases: http://www.niehs.nih.gov/oc/news/micedna.htm Genome 14, 464–471 (2003). QTLs. Access to this interactive links box is free online.

286 | APRIL 2005 | VOLUME 6 www.nature.com/reviews/genetics © 2005 Nature Publishing Group