484 Review TRENDS in Plant Science Vol.8 No.10 October 2003

Forward genetics and map-based cloning approaches

Janny L. Peters, Filip Cnudde and Tom Gerats

Department of Experimental Botany, Plant Genetics, University of Nijmegen, Toernooiveld 1, NL-6525 ED Nijmegen, The Netherlands

Whereas strategies seek to identify and select mutations in a known sequence, forward Glossary genetics requires the cloning of sequences underlying a Allelism test crosses: (complementation test for functional allelism) a test to particular mutant phenotype. Map-based cloning is determine whether two mutations are caused by the same gene. By crossing tedious, hampering the quick identification of candidate the two mutants, one obtains an individual in which one homologue of a genes. With the unprecedented progress in the sequen- particular chromosome carries one of the recessive mutations and the other homologue carries the other recessive mutation. If the phenotype of the cing of whole genomes, and perhaps even more with resulting heterozygote is wild type, we say that the two mutations complement the development of saturating marker technologies, each other, and they are usually in different genes. If the heterozygote shows the mutant phenotype, we say they do not complement each other and map-based cloning can now be performed so efficiently conclude that the mutations are in the same gene. A complementation group is that, at least for some plant model systems, it has a collection of mutations that affect the same gene. become feasible to identify some candidate genes Bacterial artificial chromosome (BAC): a vector used to clone DNA fragments of 100–300 kb in Escherichia coli cells. within a few months. This, in turn, will boost the use of Bulked segregant analysis (BSA): a rapid procedure for identifying markers in forward genetics approaches, as applied (for example) specific regions of the genome [34]. The method involves comparing two to isolating genes involved in natural variation and pooled DNA samples of individuals from a segregating population originating from a single cross. Within each pool, or bulk, the individuals are identical for genes causing phenotypic mutations as derived from the trait or gene of interest but are arbitrary for all other genes. Two pools with (second-site) mutagenesis screens. a different trait of interest are analysed to identify markers that distinguish them. Markers that are polymorphic between the pools will be genetically linked to the locus determining the trait used to construct the pools. There are basically two ways to link the sequence and CentiMorgans (cM): a measure of recombination frequency. This unit of function of a specific gene: forward and reverse genetics. linkage refers to the distance between two loci based on the number of recombination events occurring between them. Two loci are said to be 1 cM Reverse approaches rely upon sequence information as apart if recombination is observed between them in 1% of meioses. In retrieved from genome and EXPRESSED SEQUENCE TAG Arabidopsis, 1 cM is equivalent, on average, to 205 kb. (EST) (see Glossary) sequencing or TRANSCRIPT PROFILING Genetic marker: a readily identifiable genetic element that exists in two or more allelic forms and is inherited in association with particular genes and projects. The scientist starts with the selection of a specific genetic traits of interest. (set of) sequence(s) and tries to gain insight into the Epistasis: a condition in which one gene suppresses the expression of another underlying function(s) by selecting for mutations that gene (and therefore whatever phenotype it is responsible for) when the two genes are not alternate alleles of the same phenotype. The gene that does the disrupt the sequence and thereby, it is hoped, its function. suppressing is called the epistatic gene. The suppressed gene is called Tagging by either endogenous or heterologous transposa- hypostatic with regard to the epistatic gene. Expressed sequence tag (EST): a partial sequence obtained by performing a ble elements or T-DNA constructs provides a range of single raw sequence read from a random cDNA clone. ESTs have uses in the opportunities, for a restricted number of model species discovery of new genes, mapping and the identification of coding regions in [1,2]. These tagging approaches can identify the function genomic sequences. F2: the second generation of a cross, obtained by self-fertilization or of a specific gene by uncovering a specific phenotype. intercrossing of F1 individuals. Moreover, when a particular function is encoded by more Insertion/deletion (InDel) polymorphisms: DNA sequence variations that than one gene, reverse approaches are the only way to involve the insertion (or deletion) of one or more nucleotides. Linkage: a measure of the degree to which alleles at two loci fail to assort perform a step-by-step analysis of such redundant independently during meiosis and are inherited together more often than not. functions [3]. Marker-assisted selection (MAS): the use of linked genetic markers to select a A less-focused set of reverse approaches includes specific character or trait. Quantitative trait loci (QTLs): the genes that affect a trait that is measured on a antisense, cosuppression and RNA interference strategies quantitative scale. These traits are typically affected by more than one gene [4]. Although they indicate particular overall functions, and by environmental factors. Second-site mutation: a secondary mutation that suppresses another these strategies generally run the risk of putting a set mutation, so that the wild-type phenotype is restored in the double mutant. of related genes out of order, thereby obscuring the Single nucleotide polymorphisms (SNPs): a DNA sequence variation that contribution of each individual gene to the obtained involves a change in a single nucleotide. Targeting-induced local lesions in genomes (TILLING): a technique designed phenotype. Nonetheless, constructs targeting a single to identify ethyl methane sulfonate (EMS)-mutagenized plants that have a specific gene can be made and can be of value when mutation in a gene of interest whose sequence is known. TILLING combines working with species for which tagging approaches are pooling of DNA prepared from EMS-treated plants with the detection of a mutation by heteroduplex formation. not feasible. A much more widely applicable and promising Transcript profiling: measuring the mRNA levels of all expressed genes relative to one another. Corresponding author: Janny L. Peters ([email protected]). http://plants.trends.com 1360-1385/$ - see front matter q 2003 Elsevier Ltd. All rights reserved. doi:10.1016/j.tplants.2003.09.002 Review TRENDS in Plant Science Vol.8 No.10 October 2003 485

species-independent reverse approach is TARGETING- advances in sequencing projects, the wealth of available INDUCED LOCAL LESIONS IN GENOMES (TILLING) [5]. marker systems and the progress made in methods to For specific model systems, such as Arabidopsis, detect DNA polymorphisms make fast MBC of a gene in a tagging strategies now require only the screening of model species such as Arabidopsis feasible [10]. The several databases to identify relevant insertion lines definition of polymorphisms and thus of markers is no (http://www.arabidopsis.org/links/insertion.html). longer a limiting factor in MBC. Unfortunately, this is not the general situation. The availability of such outstanding services is limited to a Maps and markers handfulofspecies,suchasrice(http://www.tigr.org/tdb/ The first concept of a genetic map was presented by Alfred e2k1/osa1/), maize (http://www.tigr.org/tdb/tgi/maize/), H. Sturtevant [11], who ordered five sex-linked characters wheat (http://wheat.pw.usda.gov/index.shtml), tomato in a linear fashion on the Y chromosome of Drosophila (http://www.sgn.cornell.edu/), and to a lesser degree melanogaster. Today, whole genomes are being sequenced Antirrhinum (http://www.antirrhinum.net/) and Petunia, with ever-increasing speed. In addition to a range of with Lotus (http://www.kazusa.or.jp/lotus/) and Medicago microbial and animal genomes, the Arabidopsis [12] and (http://www.noble.org/medicago/) as strong runners-up. rice [13,14] genomes have almost been sequenced. In total, But even for these, the situation is in general hardly there are around 40 smaller and larger genome-sequen- comparable to what is available for Arabidopsis. cing projects in progress for plants, including species such The main subject of this article relates to the opposite as Avena sativa (oat), Medicago sativa and Medicago set of approaches to gene function analysis: forward trunculata, Lotus corniculata, different Brassica species, genetics, which aims to identify the sequence change banana, barley, coffee, cotton, Eucalyptus, maize, Populus, that underlies a specific mutant phenotype. The starting soybean and tomato [15]. Between the conception of the point is an already available or a specifically searched for first map and the sequencing of whole genomes lie about and predicted phenotypic mutant of interest. Such 100 years of incredible developments in theoretical under- mutants can be the result of deliberate mutagenesis or standing and technological breakthroughs, leading to the be defined on the basis of existing variation. When the building of increasingly dense genetic maps, culminating mutation causing the phenotype is the result of a T-DNA or in physical maps in which genes or markers are located at transposon insertion, rapid identification of the gene of a precise sequence position. As a result, the definition of interest is at least theoretically possible by locating the the sequence and function of a gene in model species such sequence tag and analysing its neighbouring sequences. as Arabidopsis or rice can be carried out within a four-year However, the identification of a gene affected by a PhD project, something completely unimaginable less chemically or radiation-induced mutation requires a than 20 years ago. more-laborious map-based cloning (MBC) approach in A genetic map is constructed on the basis of recombina- which markers linked to the mutated gene are used to tion events between two non-sister chromatids of each pair delimit the region containing the gene of interest. There of homologous chromosomes during meiosis (Box 1). A are several good reasons to screen for mutants in chemical genetic localization experiment determines the order of or radiation mutagenized populations: mutagenesis can be linked markers. The distance determination [in CENTI- applied to essentially any species of interest, the spectrum MORGANS ( cm) or percentage recombination] is relative of induced mutants is broader than with tagging (as in medieval times, when the distance between towns approaches, mutagenesis is usually more efficient and was measured in hours). Recombination frequencies vary SECOND-SITE MUTATIONS are thus easier to obtain. between different chromosome parts, physical conditions Until recently, MBC was a rather tedious, time- and sexes. As a consequence, the ratio between genetic and consuming and often unsuccessful approach, comparable physical distance is not constant over the length of the to trying to reach the top of Mount Everest: although chromosome. Also, genetic distance depends on the progress is initially easy, the final 100 kb or meters, parental combination used: closely related lines will respectively, are increasingly difficult to conquer. Com- exhibit an intrinsically higher recombination frequency bined with a fashionable view of functional gene definition than distantly related lines do. by sequencing and reverse genetics approaches, this has In the early days, progress in mapping was hampered diminished interest in MBC approaches. Lack of pheno- by the lack of sufficient markers that did not exhibit types in reverse screenings, mainly because of redun- EPISTATIC interactions. Mutants could only be analysed as dancy, and improvements in MBC approaches, especially phenotypic black boxes. The analysis of polymorphisms in the availability of saturating marker systems, have put isozymes was a first step towards denser maps that were both strategies back on a level playing field. Finally, easier to create [16]. Further significant advancements QUANTITATIVE TRAIT LOCI (QTLs) underlying natural included the invention of environmentally insensitive variation simply require map-based cloning for their DNA-based marker systems such as restriction fragment sequence identification. Cloning QTLs is, in principle, no length polymorphism (RFLP) analysis [17] and, even more different from cloning any other gene but can at times be so, the development of PCR-based markers such as problematic because of phenotype definition. The specifics random amplified polymorphic DNA (RAPD), simple of QTL mapping have been described in several recent sequence repeats (SSR) and amplified fragment length reviews [6–9] (http://naturalvariation.org/). polymorphisms (AFLPsw)(Table 1). Thus, the availability For reasons stated above, an efficient MBC procedure of markers is no longer a bottleneck and, today, this is true appears to be an important research tool. Today, the for essentially any living organism. Moreover, sequencing http://plants.trends.com 486 Review TRENDS in Plant Science Vol.8 No.10 October 2003

SSLP markers are PCR based, abundant and co-dominant, Box 1. Meiotic recombination and their detection is straightforward and inexpensive. Map-based cloning relies on a biological process that takes place at However, specific primers need to be designed and tested the heart of meiosis: the high-frequency genetic exchange events of for each putative SSLP marker. In sequenced organisms, meiotic recombination. Meiotic recombination not only ensures this is a relatively undemanding task but, in species being proper chromosome disjunction but also contributes to genetic examined for the first time, SSLP markers need to be diversity among the gametes. Each pair of homologous chromo- somes undergoes at least one crossover to allow correct segregation developed de novo, which is a time-consuming and costly at the first meiotic division. The double-strand-break repair model business: first, sequence segments harbouring SSLPs have [48] still serves as a guideline for the basic description of meiotic to be identified; then, specific PCR primers have to be recombination in yeast and other eukaryotes. developed and tested. The use of random primers (as in Recombination events are not uniformly distributed throughout RAPD analysis) avoids this up-front input, but the reprodu- the genome. Instead, the frequency of crossovers per unit physical distance can vary substantially, even within a single chromosome cibility of this approach is usually not up to the required [49]. In Saccharomyces cerevisiae, recombination hot spots corre- standards, particularly between different laboratories. The spond to the preferred sites for meiosis-specific double-strand detection of SINGLE NUCLEOTIDE POLYMORPHISMS (SNPs) breaks that initiate recombination. In yeast, there is also a positive and INSERTION/DELETION (INDEL) POLYMORPHISMS has correlation between the occurrence of hot spots and an open been simplified by recent developments in sequencing chromatin configuration [50], rather than with specific DNA sequences. Furthermore, in most organisms, recombination is technology and is relatively straightforward even in crop concentrated in the gene-rich (proximal) regions of the chromosome species, particularly when many EST sequences are and strongly suppressed over large intervals around the centro- available. SNPs and InDels represent a virtually inex- meres. Analysis using classical markers forms the basis for the haustible source of polymorphic markers. For Arabidopsis, present understanding of meiotic recombination, but genetic 37 344 SNPs and 18 579 InDels have been identified recombination values can also be estimated directly by counting chiasmata. In Arabidopsis, for example, meiotic cells exhibit an (http://www.arabidopsis.org/Cereon/index.html). Our cur- average of 9.24 chiasmata [51]. rent knowledge of SNPs and their application to crop Environmental factors, such as temperature, can influence the genetics has recently been reviewed [23]. frequency of meiotic recombination. However, most frequently, such The AFLP technique [24] is highly reproducible and differences have a genetic basis. In many organisms, including tomato [52] and Arabidopsis [53], recombination during male and does not require prior sequence information. It can thus be female gametogenesis occurs at different rates. Sequence diversity applied directly to any organism. Genomic restriction in interspecific crosses [52] and heterozygosity [54] also influence fragments are adaptor-ligated, and PCR primers are meiotic recombination. Finally, several genes have been identified designed based on adaptor and restriction site sequences. that are essential for normal meiotic levels of recombination. These Subsets of fragments can subsequently be specifically mostly encode proteins required for the synapsis of chromosomes or 0 for catalysing key steps in DNA breakage, repair and recombination amplified by adding random selective nucleotides at the 3 [55]. Disrupting these genes usually results in depressed levels of end of the primers (Figure 1). The number of selective recombination [56]. Mutant loci that increase meiotic recombination nucleotides needed depends on the size of the genome frequencies, such as Rm1 in Petunia [57] or xrs4 in Arabidopsis [58], studied. Thus, although it is based on restriction and remain exceptional. The variation in meiotic recombination frequen- random selection, the targeted subsets of PCR templates cies among eight accessions of Arabidopsis [59] also indicates the existence of recombination regulatory elements, which are interest- are highly specific and the procedure is robust. A great ing to plant breeders. advantage is that one basic set of primers can be used for organisms with comparable genome sizes. projects enable us to assign markers a physical position on Map-based cloning in Arabidopsis the map (units in base pairs). For example, a map To initiate MBC of a gene, genome-wide mapping harbouring .1250 AFLP markers localized directly on procedures have been developed based on SSLP markers the Arabidopsis sequence was developed by combining [25] and SNPs [26], among other techniques. We have experimental and in silico AFLP analysis [18]. The exploited the fact that multiple AFLP markers can be physical position of these and several hundreds of other obtained per primer combination to develop an AFLP- markers can be found at the TAIR website (http://www. based genome-wide mapping strategy [18,27]. Like all arabidopsis.org/). Consequently, in sequenced model genome-wide mapping strategies, it is helpful at the species, we can now focus on a precisely defined region first steps of the MBC process only, when no knowledge of the genome that contains the gene of interest. about the location of the gene of interest is available. Our strategy establishes LINKAGE to a ,6Mb region Map-based cloning strategies: balancing the available within 3 days by analysing only 20–30 mutant marker systems individuals from a segregating population with eight Map-based cloning strategies use the fact that, as primer combinations that provide a well-dispersed grid distances between the gene of interest and the analysed of 85 AFLP markers covering the genome. Further markers decrease, so does the frequency of recombination. analysis of ,120 mutant individuals will typically To position any locus in the Arabidopsis genome in a 10-kb exhaust the available Arabidopsis AFLP map [27] and interval, one would need ,12 000 well-positioned mar- identifies a 200–800kb region within 1–2 weeks. In kers. In itself, this is feasible with presently available addition to linkage, the AFLP-based approach will technologies. In recent Arabidopsis MBC projects, simple show non-linkage to the rest of the genome, which is sequence length polymorphism (SSLP) markers have most essential for recognizing situations involving gross commonly been used to map the gene of interest [19–22]. chromosomal re-arrangements. http://plants.trends.com Review TRENDS in Plant Science Vol.8 No.10 October 2003 487

Table 1. An overview of marker systems Marker Abbreviation Description Refs Restriction fragment length RFLP The occurrence of variation in the length of DNA fragments that are produced [17] polymorphism after cleavage with a restriction enzyme. There are differences in DNA lengths because of the presence or absence of recognition site(s) for that particular restriction enzyme. Variable number tandem repeat VNTR A short DNA sequence that is present as tandem repeats (motifs of 10–100 [60,61] Minisatellites bases) and in highly variable copy number. Simple sequence length SSLP The sequence of unique DNA that flanks microsatellites (i.e. tandemly repeated [62] polymorphism motifs of 1–6 bases) is used to construct oligonucleotide primers. Variation is Simple sequence repeats SSR based on size differences of the microsatellite. Short tandem repeats STR Sequence tagged microsatellite STMS site Microsatellite SSLP, SSR, STR, STMS and microsatellite are synonyms. Inter-simple sequence repeat ISSR The stretches of unique DNA in between the SSRs are amplified. SSRs such as [63,64] Single primer amplification SPAR (GATA)4 are used to prime PCR on genomic DNA. Tetra-nucleotide repeats reaction perform better than di- or tri-nucleotide repeats. Cleaved amplified polymorphic CAPS PCR amplified DNA that is not yet polymorphic is digested with restriction [65] sequences endonucleases to reveal polymorphism. Derived cleaved amplified dCAPS A restriction enzyme recognition site that includes the SNP is introduced into [66] polymorphic sequences the PCR product by a primer containing one or more mismatches to template DNA. The resulting PCR product is subjected to restriction enzyme digestion to determine the presence or absence of the SNP. Allele-specific polymerase chain AS-PCR Refers to amplification of specific alleles, or DNA sequence variants, at the [67,68] reaction same locus. Specificity is achieved by designing one or both PCR primers so PCR amplification of specific PASA that they partially overlap the site of sequence difference between the amplified alleles alleles. Only a template with exactly the same sequence as the target DNA will be amplified. Allele-specific oligonucleotide ASO Allele-specific associated primers ASAP

Single-nucleotide amplified SNAP Modified AS-PCR procedure for assaying SNPs. Adding an extra mismatch, [69] polymorphisms coupled with the natural (SNP) mismatch at the 30 end, produces a dramatic reduction in PCR product yield of the non-specific allele, but has a relatively minor effect on the amplification of the specific allele.

Single-strand confirmational SSCP DNA fragments (200–800 bp) amplified by PCR using specific20–25 bp [70] polymorphism primers, followed by gel-electrophoresis of single-strand DNA to detect nucleotide sequence variation. The method is based on the dependence of the electrophoretic mobility of single-strand DNA on the secondary structure (conformation) of the molecule, which changes significantly in case of mutation. SSCP provides a method to detect nucleotide variation without the need to sequence DNA samples. Random amplified polymorphic RAPD A PCR product obtained from genomic DNA using a single or a combination of [71] DNA typically 10-mer oligonucleotides. Variation is based on the position and orientation of primer-annealing sites and the interval they span. Arbitrary primed polymerase AP-PCR Related to RAPD; uses longer arbitrary primers than RAPD does. [72] chain reaction DNA amplification fingerprinting; DAF Related to RAPD: uses 5–8 bp primers to generate more fragments. [73] Multiple arbitrary amplicon MAAF Collective term for techniques using single arbitrary primers. [74] profiling [75] Sequence characterized SCAR DNA fragments amplified by PCR using specific15–30 bp primers designed amplified region from nucleotide sequences established in cloned RAPD fragments. By using longer PCR primers, SCAR does not face the problem of poor reproducibility.

Amplified fragment length AFLP AFLPs are DNA fragments (80–500 bp) obtained from endonuclease restriction, [24] polymorphism followed by ligation of oligonucleotide adapters to the fragments and selective amplification by PCR. The PCR-primers consist of a core sequence (part of the adapter), a restriction enzyme-specific sequence and 1–3 selective nucleotides. The AFLP fragments are separated by gel-electrophoresis and generally scored as a dominant marker. Selective amplification of SAMPL The template is identical to the AFLP template. The rare cutter primer is [76]

microsatellite polymorphic loci replaced by (CA)72(TA)22 or (GT)72(TA)2

Before embarking on a large screen to select recombi- end. Potential candidates should be tested in ALLELISM nants for fine mapping, it is crucial to confirm that there TEST CROSSES. Once it is established that an unidentified are no comparable (and cloned) mutants located in the gene is being dealt with, fine mapping should be initiated. identified region. The sequence-based map of Arabidopsis Because the advantage of AFLP (many physically dis- genes with mutant phenotypes [28] is a great help to this persed markers per primer combination) is lost at this http://plants.trends.com 488 Review TRENDS in Plant Science Vol.8 No.10 October 2003

Isolation of genomic DNA

AATTC------G AATTC------T TAA------T Digestion G------CTTAA G------AAT T------AAT

EcoRI adapter MseI adapter Adapter ligation CTCGTAGACTGCGTACCAATTC------TTACTCAGGACTCAT CTGACGCATGGTTAAG------AATGAGTCCTGAGTAGCAG

CTCGTAGACTGCGTACCAATTC------TTACTCAGGACTCAT Pre-amplification step ← +AATGAGTCCTGAGTAGCAG

GTAGACTGCGTACCAATTC→ CTGACGCATGGTTAAG------AATGAGTCCTGAGTAGCAG

GTAGACTGCGTACCAATTC------TTACTCAGGACTCATCGTC Selective amplification step ← +++AATGAGTCCTGAGTAG

∗GACTGCGTACCAATTC++→ CATCTGACGCATGGTTAAG------AATGAGTCCTGAGTAGCAG TRENDS in Plant Science

Figure 1. The amplified fragment length polymorphism (AFLPw) technique [24]. Genomic DNA is digested with a rare cutter (e.g. Eco RI) and a frequent cutter (e.g. Mse I), which results in three types of fragments. Double-stranded adapters are ligated to the ends of the fragments to generate template DNA for amplification. Point mutations are incorporated into the adaptor sequence to prevent restriction after successful ligation. Amplification of the fragments usually takes place in two steps called pre-amplifi- cation and selective amplification. Selective nucleotides are included at the 30 end of the PCR primers so that only a subset of the fragments will be amplified (* indicates a labelled primer). The number of selective nucleotides (indicated by þ) used depends on the size of the genome under investigation. To avoid nonspecific amplification, the difference between the number of selective nucleotides in the pre- and selective amplification steps should be #2. The resulting fragments are visualized by polyacrylamide gel electrophoresis. Polymorphisms detectable on the gels are caused by mutation(s) in the restriction site(s), selective nucleotide(s) or insertion or deletions within the AFLP fragments. AFLP is a registered trademark of Keygene (http://www.keygene-products.com/). point, we advise designing other PCR-based markers as Map-based cloning in other systems flanking markers. The InDel polymorphisms (http://www. Although MBC is now relatively straightforward in arabidopsis.org/Cereon/index.html) are a practical source Arabidopsis, these conclusions cannot be extrapolated for designing such PCR markers but any other easy-to-use directly to other systems. However, for the first steps of marker will do. The flanking PCR markers are tested on MBC projects, the AFLP-based genome-wide mapping 1000–2000 individuals from a segregating population. procedure can confidently be used in any organism for The markers can be analysed simultaneously, which which a dense AFLP map is available (e.g. barley [29], makes identifying recombinants in the region of interest maize [30], tomato [31], lettuce [32], Petunia [33]) or can be easy [27]. All recombinant plants are selected and F3 seeds created. When no AFLP map is available, the AFLP collected to determine whether wild-type recombinants procedure can still be used because no prior sequence are homozygous or heterozygous for the locus. The selected information is required. In case of a single-gene recombinants can subsequently be used to further delimit project, a useful modification of the strategy is first the area containing the gene of interest to a region of to identify linked markers using BULKED SEGREGANT approximately 0.025–0.050 cM, corresponding to, on ANALYSIS (BSA) [34], by performing a large set of AFLP average, 5.5–11.0 kb. Markers necessary for this process primer combinations on DNA pools of mutants and can be developed from the available Cereon InDel and wild types. Instead of AFLP markers, any other SNP collection [10]. During the process of MBC, it is efficient marker system can be used in combination important to keep checking the existing databases for with BSA. As a result, one should be able easily to candidate genes in the identified region because, for identify closely linked markers, enabling subsequent model organisms such as Arabidopsis, an increasing regional fine mapping, followed by joining of BACTERIAL number of genes is being characterized, which implies ARTIFICIAL CHROMOSOME (BAC) sequences to develop a that the chance that a gene has already been cloned is physical map. For fine mapping, microsynteny of the increasing. A diagram of the described MBC strategy is species under investigation with Arabidopsis (or rice) presented in Figure 2 and can be compared to similar has been used successfully in some cases [35].An schemes [10,19]. interesting and, for plants, relatively new approach to http://plants.trends.com Review TRENDS in Plant Science Vol.8 No.10 October 2003 489

Steps Required

Mutant and wild type in different F2 mapping population (4 months) genetic backgrounds

Establish linkage and non-linkage using an AFLP-based genome-wide mapping 20Ð30 appropriate F2 individuals strategy (identifies a <6 Mb region) (in most cases: F2 mutants), AFLP map (3 days)

Identify a 200Ð800 kb region containing About 120 appropriate F2 individuals the gene of interest with AFLP markers (in most cases: F2 mutants), AFLP map (1Ð2 weeks)

TAIR, sequence-based map of Check known mutants in the identified Arabidopsis genes with mutant region phenotypes

Identify and test flanking PCR markers for fine-mapping purposes (1Ð2 weeks) e.g. InDel polymorphisms

Select recombinants in a large mapping population with the flanking markers 1000Ð2000 F2 seeds (1 month)

Fine mapping: use recombinants to narrow down area with InDel e.g. InDel polymorphisms and SNPs polymorphisms and SNPs (1Ð2 months)

TRENDS in Plant Science

Figure 2. A map-based cloning procedure. The time (left) and material (right) needed to map a mutation in Arabidopsis thaliana is given for each step in the procedure. See Ref. [18] for the AFLP map referred to in steps 2 and 3. For the sequence-based map of Arabidopsis genes with mutant phenotypes referred to in step 4, see Ref. [28]. See http://www.arabidopsis.org/ for the TAIR database; see http://www.arabidopsis.org/Cereon/index.html for further details of InDel polymorphisms and SNPs. Abbreviations: AFLP, amplified fragment length polymorphism; TAIR, The Arabidopsis Information Resource; InDel, insertion/deletion; SNP, single-nucleotide polymorphism. identifying linked markers is linkage disequilibrium few genes per 100 kb. Overall, it might appear that cloning mapping in natural populations [23,36]. a gene from large genomes requires an effort similar to Although fine mapping in plants with large genomes that of cloning from small genomes, once sequence remains difficult and labour intensive in comparison to information and marker availability improves for the Arabidopsis, a rather surprising development is taking species under investigation. place in MBC of genes from organisms with larger genomes. Although genome sizes can vary up to a 1000- Mapping applications fold, genetic maps (and number of genes) in general tend to For applications such as MARKER-ASSISTED SELECTION be more or less equal in size. Where one thus would expect (MAS) [44] and breeding by design [45], the definition of to get into trouble with a huge physical: genetic distance a limited interval is sufficient and one can stop once the ratio for the larger genomes, there are now ample segment carrying the gene of interest has been minimized examples showing a remarkably favourable local ratio to satisfaction. This application can also be used in plants [8]. From the data obtained so far, it appears that in large with larger genomes. If one decides to proceed ultimately genomes, genes are often positioned in gene-rich regions to isolate the gene of interest, some points should be taken with a local Arabidopsis-like kb/cM ratio [37,38]. The into consideration. In principle, the mapping procedure mentioning of an average ratio then becomes irrelevant. In can be continued until two markers flank the gene of tomato, for instance, the average is 700 kb/cM [39] but, in interest. Even for Arabidopsis, this would require the specific cases, 1cM equalled ,100 kb [40], 160 kb [41], screening of an extremely large population, which effec- 280 kb [42] and even 5 kb [43]. By contrast, in low- tively disqualifies this approach because it is impractical. recombination regions, the gene density might be equally A more sensible strategy is to aim to identify a relatively low, thus restricting candidate gene analysis to one or a small region (e.g. 50 kb in Arabidopsis) containing the http://plants.trends.com 490 Review TRENDS in Plant Science Vol.8 No.10 October 2003 gene of interest. In Arabidopsis, a 50 kb region on average 13 Goff, S.A. et al. (2002) A draft sequence of the rice genome (Oryza contains between four and ten genes. One way to identify sativa L. ssp. japonica). Science 296, 92–100 the gene of interest is to order the appropriate TILLING 14 Yu, J. et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79–92 [5] lines and/or lines in which the candidate genes are 15 Bernal, A. et al. (2001) Genomes online database (GOLD): a monitor of tagged by a T-DNA or transposon. A comprehensive list of genome projects world-wide. Nucleic Acids Res. 29, 126–127 sequence tagged lines can be found at the website of the 16 Lewontin, R.C. and Hubby, J.L. (1966) A molecular approach to the Nottingham Arabidopsis Stock Centre (http://nasc.life. study of genetic heterozygosity in natural populations. Genetics 54, nott.ac.uk/); for obtaining TILLING lines, one should 595–609 17 Botstein, D. et al. (1980) Construction of a map in man consult the Arabidopsis TILLING project website (http:// using restriction fragment length polymorphisms. Am. J. Hum. Genet. tilling.fhcrc.org:9366/). Studying the phenotypes of such 32, 314–331 lines and eventually performing allelism tests might direct 18 Peters, J.L. et al. (2001) A physical amplified-fragment length you to the desired gene. Another rational approach is to polymorphism map of Arabidopsis. Plant Physiol. 127, 1579–1589 19 Lukowitz, W. et al. (2000) Positional cloning in Arabidopsis. Why it transform the mutant with subclones of the BAC(s) feels good to have a genome initiative working for you. Plant Physiol. containing the wild-type version of the gene of interest 123, 795–805 and to test whether the clone used for transformation can 20 Choe, S. et al. (2002) Arabidopsis brassinosteroid-insensitive dwarf12 complement the mutant of interest. Recently, BIBAC mutants are semidominant and defective in a glycogen synthase (binary-BAC) transformation libraries for Arabidopsis and kinase 3b-like kinase. Plant Physiol. 130, 1506–1515 21 Gonzalez-Guzman, M. et al. (2002) The short-chain alcohol dehydro- rice have been presented [46,47]. When the gene is genase ABA2 catalyzes the conversion of xanthoxin to abscisic eventually pinpointed, sequencing the mutant allele(s) aldehyde. Plant Cell 14, 1833–1846 and comparing these to the wild-type sequence will 22 Shirano, Y. et al. (2002) A gain-of-function mutation in an Arabidopsis definitely identify the desired gene. Toll interleukin1 receptor-nucleotide binding site-leucine-rich repeat type R gene triggers defence responses and results in enhanced disease resistance. Plant Cell 14, 3149–3162 Conclusion 23 Rafalski, A. (2002) Applications of single nucleotide polymorphsisms Reverse mutagenesis screens are popular because of their in crop genetics. Curr. Opin. Plant Biol. 5, 94–100 genome-wide reach and theoretical promise to capture all 24 Vos, P. et al. (1995) AFLP: a new technique for DNA fingerprinting. genes involved in a particular process. However, lack of Nucleic Acids Res. 23, 4407–4414 phenotypes is a general and disappointing feature of these 25 Ponce, M.R. et al. (1999) High-throughput genetic mapping in Arabidopsis thaliana. Mol. Gen. Genet. 261, 408–415 approaches. Here, we have argued that well-designed 26 Cho, R.J. et al. (1999) Genome-wide mapping with biallelic markers in forward mutagenesis screens together with a balanced Arabidopsis thaliana. Nat. Genet. 23, 203–207 strategy for MBC are an excellent approach for gene 27 Peters, J.L. et al. An AFLP-based genome-wide mapping strategy. function analysis. Although this is presently true only for Theor. Appl. Genet. (in press) model plant systems such as Arabidopsis and rice, once we 28 Meinke, D.W. et al. (2003) A sequence-based map of Arabidopsis genes with mutant phenotypes. Plant Physiol. 131, 409–418 have more sequence and marker availability it might also 29 Qi, X. et al. (1998) Use of locus-specific AFLP markers to construct a become the case for crop plants. high-density molecular map in barley. Theor. Appl. Genet. 96, 376–384 30 Vuylsteke, M. et al. (1999) Two high-density AFLP linkage maps of Zea References mays L.: analysis of distribution of AFLP markers. Theor. Appl. Genet. 1 Parinov, S. and Sundaresan, V. (2000) Functional genomics in 99, 921–935 Arabidopsis: large-scale insertional mutagenesis complements the 31 Haanstra, J.P. et al. (1999) An integrated high-density RFLP–AFLP genome sequencing project. Curr. Opin. Biotechnol. 11, 157 – 161 map of tomato based on two Lycopersicon esculentum £ L. pennellii F2 2 Bouche, N. and Bouchez, D. (2001) Arabidopsis gene knockout: populations. Theor. Appl. Genet. 99, 254–271 phenotypes wanted. Curr. Opin. Plant Biol. 4, 111–117 32 Jeuken, M. et al. (2001) An integrated interspecific AFLP map of 3 Meissner, R.C. et al. (1999) Function search in a large transcription lettuce (Lactuca) based on two L. sativa £ L. saligna F2 populations. factor gene family in Arabidopsis: assessing the potential of reverse Theor. Appl. Genet. 103, 638–847 genetics to identify insertional mutations in R2R3 Myb genes. Plant 33 Strommer, J. et al. (2002) AFLP maps of Petunia hybrida: building Cell 11, 1827–1840 maps when markers cluster. Theor. Appl. Genet. 105, 1000–1009 4 Matzke, M. et al. (2001) RNA: guiding gene silencing. Science 293, 34 Michelmore, R.W. et al. (1991) Identification of markers linked to 1080–1083 disease-resistance genes by bulked segregant analysis: a rapid method 5 Colbert, T.G. et al. (2001) High-throughput screening for induced point to detect markers in specific genomic regions by using segregating mutations. Plant Physiol. 126, 480–484 populations. Proc. Natl. Acad. Sci. U. S. A. 88, 9828–9832 6 Alonso-Blanco, C. and Koornneef, M. (2000) Naturally occurring 35 Oh, K. et al. (2002) Fine-mapping in tomato using microsynteny with variation in Arabidopsis: an underexploited resource for plant Arabidopsis genome: the Diageotropica (Dgt) locus. Genome Biol. 3, genetics. Trends Plant Sci. 5, 22–29 research0049 (http://genomebiology.com/2002/3/9/research/0049) 7 Flint, J. and Mott, R. (2001) Finding the molecular basis of 36 Brandon, S. et al. (2003) The lowdown on linkage disequilibrium. Plant quantitative traits: successes and pitfalls. Nat. Rev. Genet. 2, 437–445 Cell 15, 1502–1506 8 Remington, D.L. et al. (2001) Map-based cloning of quantitative trait 37 Fu, H.H. et al. (2001) The highly recombinogenic bz locus lies in an loci: progress and prospects. Genet. Res. 78, 213–218 unusually gene-rich region of the maize genome. Proc. Natl. Acad. Sci. 9 Yano, M. (2001) Genetic and molecular dissection of naturally U. S. A. 98, 8903–8908 occurring variation. Curr. Opin. Plant Biol. 4, 130–135 38 Brooks, S.A. (2002) Analysis of 106 kb of contiguous DNA sequence 10 Jander, G. et al. (2002) Arabidopsis map-based cloning in the post- from the D genome of wheat reveals high gene density and a complex genome era. Plant Physiol. 129, 440–450 arrangement of genes related to disease resistance. Genome 45, 11 Sturtevant, A.H. (1913) The linear arrangement of six sex-linked 963–972 factors in Drosophila, as shown by their mode of association. J. Exp. 39 Tanksley, S.D. et al. (1992) High density molecular linkage maps of the Zool. 14, 43–59 tomato and potato genomes. Genetics 132, 1141–1160 12 The Arabidopsis Genome Initiative (2000) Analysis of the genome 40 Ling, H.Q. et al. (2002) The tomato fer gene encoding a bHLH protein sequence of the flowering plant Arabidopsis thaliana. Nature 408, controls iron-uptake responses in roots. Proc. Natl. Acad. Sci. U. S. A. 796–815 99, 13938–13943 http://plants.trends.com Review TRENDS in Plant Science Vol.8 No.10 October 2003 491

41 Ling, H.Q. et al. (1999) Map-based cloning of chloronerva, a gene 60 Jeffreys, A.J. et al. (1985) Hypervariable ‘minisatellite’ regions in involved in iron uptake of higher plants encoding nicotianamine human DNA. Nature 314, 67–73 synthase. Proc. Natl. Acad. Sci. U. S. A. 96, 7098–7103 61 Jeffreys, A.J. et al. (1985) Individual-specific ‘fingerprints’ of human 42 Ballvora, A. et al. (2001) Chromosome landing at the tomato Bs4 locus. DNA. Nature 316, 76–79 Mol. Genet. Genomics 266, 639–645 62 Bell, C.J. and Ecker, J.R. (1994) Assignment of 30 microsatellite loci to 43 Eyal Fridman, E. et al. (2000) A recombination hotspot delimits a wild- the linkage map of Arabidopsis. Genomics 19, 137–144 species quantitative trait locus for tomato sugar content to 484 bp 63 Zietkiewicz, E. et al. (1994) Genome fingerprinting by simple sequence within an invertase gene. Proc. Natl. Acad. Sci. U. S. A. 97, 4718–4723 repeat (SSR)-anchored polymerase chain reaction amplification. 44 Dekkers, J.C. and Hospital, F. (2002) The use of in Genomics 20, 176–183 the improvement of agricultural populations. Nat. Rev. Genet. 3, 64 Gupta, M. et al. (1994) Amplification of DNA markers from 22–32 evolutionarily diverse genomes using single primers of simple- 45 Peleman, J.D. et al. (2003) Breeding by design. Trends Plant Sci. 8, sequence repeats. Theor. Appl. Genet. 89, 998–1006 330–334 65 Konieczny, A. and Ausubel, F.M. (1993) A procedure for mapping 46 Tao, Q. (2002) One large-insert plant-transformation-competent Arabidopsis mutations using co-dominant ecotype-specific PCR-based BIBAC library and three BAC libraries of japonica rice for genome markers. Plant J. 4, 403–410 research in rice and other grasses. Theor. Appl. Genet. 105, 1058–1066 66 Neff, M.M. et al. (1998) dCAPS, a simple technique for the genetic 47 Chang, Y.L. (2003) A plant-transformation competent BIBAC library analysis of single nucleotide polymorphisms: experimental appli- from the Arabidopsis thaliana Landsberg ecotype for functional and cations in Arabidopsis thaliana genetics. Plant J. 14, 387–392 comparative genomics. Theor. Appl. Genet. 106, 269–276 67 Bottema, C.D. and Sommer, S.S. (1993) PCR amplification of specific 48 Szostak, J.W. et al. (1983) The double-strand-break repair model for alleles: rapid detection of known mutations and polymorphisms. recombination. Cell 33, 25–35 Mutat. Res. 288, 93–102 49 Schmidt, R. et al. (1995) Physical map and organization of Arabidopsis 68 Gu, W-K. et al. (1995) Large scale, cost-effective screening of PCR thaliana chromosome 4. Science 270, 480–483 products in marker-assisted selection applications. Theor. Appl. Genet. 50 Wu, T.C. and Lichten, M. (1995) Factors that affect the location and 91, 465–470 frequency of meiosis-induced double-strand breaks in Saccharomyces 69 Drenkard, E. et al. (2000) A simple procedure for the analysis of single cerevisiae. Genetics 140, 55–66 nucleotide polymorphisms facilitates map-based cloning in Arabidop- 51 Armstrong, S.J. and Jones, G.H. (2003) Meiotic cytology and sis. Plant Physiol. 124, 1483–1492 chromosome behaviour in wild-type Arabidopsis thaliana. J. Exp. 70 Orita, M. et al. (1989) Detection of polymorphisms of human DNA by Bot. 54, 1–10 gel electrophoresis as single-strand conformation polymorphisms. 52 Ganal, M.W. and Tanksley, S.D. (1996) Recombination around the Proc. Natl. Acad. Sci. U. S. A. 86, 2766–2770 Tm2a and Mi resistance genes in different crosses of Lycopersicon 71 Williams, J.G. et al. (1990) DNA polymorphisms amplified by arbitrary peruvianum. Theor. J. Appl. Genet. 92, 101–108 primers are useful as genetic markers. Nucleic Acids Res. 18, 53 Armstrong, S.J. and Jones, G.H. (2001) Female meiosis in wild-type 6531–6535 Arabidopsis thaliana and in two meiotic mutants. Sex. Plant Reprod. 72 Welsh, J. and McClelland, M. (1991) Genomic fingerprints produced by 13, 177–183 PCR with consensus tRNA gene primers. Nucleic Acids Res. 19, 54 Barth, S. et al. (2001) Influence of genetic background and hetero- 861–866 zygosity on meiotic recombination in Arabidopsis thaliana. Genome 73 Caetano-Anolle´s, G. et al. (1991) DNA amplification fingerprinting 44, 971–978 using very short arbitrary oligonucleotide primers. Biotechnology 9, 55 Zickler, D. and Kleckner, N. (1999) Meiotic chromosomes: integrating 553–557 structure and function. Annu. Rev. Genet. 33, 603–754 74 Caetano-Anolle´s, G. et al. (1994) Multiple arbitrary amplicon profiling 56 Grelon, M. et al. (2001) AtSPO11-1 is necessary for efficient meiotic using short oligonucleotide primers. In Plant Genome Analysis recombination in plants. EMBO J. 20, 589–600 (Gresshoff, P.M., ed.), pp. 29–46, CRC Press 57 Cornu, A. et al. (1989) A genetic basis for variations in meiotic 75 Paran, I. and Michelmore, R.W. (1993) Development of reliable PCR recombination in Petunia hybrida. Genome 32, 46–53 based markers linked to downy mildew resistance genes in lettuce. 58 Masson, J.E. and Paszkowski, J. (1997) Arabidopsis thaliana mutants Theor. Appl. Genet. 85, 985–993 altered in homologous recombination. Proc. Natl. Acad. Sci. U. S. A. 76 Witsenboer, H. et al. (1998) Identification, genetic localization and 94, 11731–11735 allelic diversity of selectively amplified microsatellite polymorphic loci 59 Sanchez Moran, E. et al. (2002) Variation in chiasma frequency among (SAMPL) in lettuce and wild relatives (Lactuca spp.). Genome 40, eight accessions of Arabidopsis thaliana. Genetics 162, 1415–1422 923–936

Free journals for developing countries

The WHO and six medical journal publishers have launched the Access to Research Initiative, which enables ~70 developing countries to gain free access to biomedical literature through the Internet.

The science publishers, Blackwell, Elsevier Science, the Harcourt Worldwide STM group, Wolters Kluwer International Health and Science, Springer-Verlag and John Wiley, were approached by the WHO and the British Medical Journal in 2001. Initially, >1000 journals will be available for free or at significantly reduced prices to universities, medical schools, research and public institutions in developing countries. The second stage involves extending this initiative to institutions in other countries.

Gro Harlem Brundtland, director-general for the WHO, said that this initiative was ‘perhaps the biggest step ever taken towards reducing the health information gap between rich and poor countries’.

See http://www.healthinternetwork.net for more information.

http://plants.trends.com