Proc. Natl. Acad. Sci. USA Vol. 92, pp. 10824-10830, November 1995
Review

Gene disruptions using P transposable elements: An integral component of the Drosophila Genome Project

Allan C. Spradling*, Dianne M. Stern*, Istvan Kiss†, John Roote‡, Todd Laverty§, and Gerald M. Rubin§

*Howard Hughes Medical Institute Research Laboratories, Carnegie Institution of Washington, 115 West University Parkway, Baltimore, MD 21210; †Institute of [...], Biological Research Center, Hungarian Academy of Sciences, H-6701 Szeged, Hungary; ‡Department of Genetics, Cambridge University, Downing Street, Cambridge, CB2 3EH, England; and §Howard Hughes Medical Institute Research Laboratories, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720-3200

ABSTRACT Biologists require genetic as well as molecular tools to decipher genomic information and ultimately to understand gene function. The Berkeley Drosophila Genome Project is addressing these needs with a massive gene disruption project that uses individual, genetically engineered P transposable elements to target open reading frames throughout the Drosophila genome. DNA flanking the insertions is sequenced, thereby placing an extensive series of genetic markers on the physical genomic map and associating insertions with specific open reading frames and genes. Insertions from the collection now lie within or near most Drosophila genes, greatly reducing the time required to identify new mutations and analyze gene functions. Information revealed from these studies about P element site specificity is being used to target the remaining open reading frames.

Abbreviations: ORF, open reading frame; BDGP, Berkeley Drosophila Genome Project.

Genetics provides the most powerful approach available to understand the function of each human gene and to decipher the role played by the large noncoding component of the human genome. Model organisms, such as Drosophila melanogaster, share many genes with humans whose sequences and functions have been conserved. In addition to myriad similarities in cellular structure and function, humans and Drosophila share pathways for intercellular signaling (1), developmental patterning (2), learning and behavior (3), as well as tumor formation and metastasis (4). The fruit fly provides a powerful system to study the function of conserved genes since, unlike in humans, any open reading frame (ORF) within the fruit fly genome can be mutated and subjected to detailed functional analysis within the context of an intact organism.

The Berkeley Drosophila Genome Project (BDGP) has as its primary goal to map and sequence the ~120 Mb of DNA comprising the euchromatic (i.e., nonrepetitive) regions of the four Drosophila chromosomes (see, for example, ref. 5; W. Kimmerly, K. Stultz, K. Lewis, V. Lustre, S. Lewis, D. Sun, R. Romero, C. Martin, and M. Palazzolo, personal communication). Novel informatics tools are being developed to collect, analyze, and distribute these data. If this vast new store of structural information is to enhance our understanding of human biology, however, it must be accompanied by more efficient methods to correlate genes with functions. Currently, data base searches identify similarities to a given query sequence more than a third of the time. Moreover, these similarities are often associated with biological meaning because many of the entries in the public data base have been deposited by individuals involved in hypothesis-driven research. With the onset of whole [...] and large cDNA collections, database similarities will be found more frequently. However, because future database submissions will come primarily from genome-sequencing efforts, which select targets on the basis of developing complete data bases rather than on their biological function, the frequency of association of a query sequence with well-characterized biochemical mechanisms is unlikely to rise in parallel.

If the model organism genome projects are to be maximally useful in assigning function to human DNA sequences, they must be accompanied by genetic studies so that not only the sequences of the genes, but also their biological functions, are determined. To facilitate that end, BDGP has adopted a broad approach that combines the determination of the genomic sequence with the development and application of methods for large-scale functional analysis. In the past, the effort required to disrupt a particular gene or to identify the ORF responsible for a mutant strain's interesting properties has varied widely and has frequently slowed progress. Consequently, BDGP has undertaken a gene-disruption project to address this rate-limiting step in utilizing Drosophila to assign function to human genes.

The BDGP gene-disruption project consists of a large collection of Drosophila strains that each contain a single, genetically engineered P element inserted in a defined genomic region. The BDGP strain library, which is freely available from the Bloomington, IN, Drosophila stock center, greatly facilitates both gene disruption and mutant identification. Investigators interested in a particular sequenced ORF can often find a strain from the collection in which that ORF is insertionally mutated. Otherwise, insertions can usually be obtained close enough to the ORF of interest to rapidly generate the desired mutation. In addition, BDGP strains make it much easier to associate the thousands of genes Drosophila geneticists have defined over the last several decades with specific transcription units and their protein products. BDGP lines can be identified that either disrupt such genes or otherwise delimit their positions on the physical map to small molecular intervals. Finally, the inserted P elements in BDGP lines carry enhancer traps (6) that can be used to efficiently acquire information about the expression pattern of disrupted genes. The strains in the current collection disrupt 20-25% of essential genes, provide information on their expression patterns, and link the genetic, cytogenetic, and physical maps of the Drosophila genome at ~100-kb intervals. The approaches made possible by the BDGP strains are reducing or eliminating long delays in obtaining the tools needed to study gene function. In this report, we discuss the current status of the BDGP gene disruption library and report some of the results revealed by this project.

Origin of the BDGP Gene-Disruption Project

Transposable elements provide a potent means of correlating genetic and molecular information because they generate a simple, reproducible lesion upon insertion that can be detected much more easily than damage produced by other mutagens (7, 8). In D. melanogaster, the P transposable element has been particularly useful because it moves with high frequency but can be controlled by limiting the availability of an element-encoded transposase (9, 10). Early efforts focused on cloning specific genes by mobilizing large numbers of natural P elements (reviewed in ref. 11). The resulting genetic lines were unsuitable for genetic or phenotypic studies without extensive outcrossing to remove extraneous P elements and were rarely saved once flanking DNA had been cloned. P element-mediated transformation allowed strains containing just one or a few elements to be constructed (12). However, only a limited number of strains could be generated by microinjecting DNA, and the insertions could not be targeted into genes of particular interest.

In 1988, Cooley et al. (13) showed that individual, experimentally modified P elements can be mobilized in large genetic screens to generate thousands of stable mutant strains. About 15% of such transposon insertions disrupted a gene required for viability or fertility, while the remaining insertions presumably integrated into phenotypically silent genes or spacer regions. They proposed that single-insert stocks associated with mutations be collected and used to select a minimal subset of strains (an "insertion library") to maintain in a Stock Center as a community resource with diverse applications in genomic analysis. By identifying mutations that disrupted different genes from throughout the genome for the library, the inability to target P elements into particular sites would be largely overcome. The size of the library presented an obstacle: collections of Drosophila mutant stocks must remain small, since strains cannot be reliably maintained in a frozen state. This problem could be minimized, however, by eliminating redundant or aberrant lines so that the collection would not grow beyond the number of mutable genes and would remain small enough to maintain and distribute widely. The prospect of a P element insertion library was further enhanced when P element constructs that could be used for gene disruption were engineered to facilitate the cloning of flanking genomic DNA (14), the detection of gene expression patterns (6), the misexpression of genes in developmental patterns (15), and the mediation of site-specific recombination (16).

Numerous mutant strains containing single P element insertions were subsequently generated in several laboratories (refs. 17-21; M. Scott and M. Fuller, personal communication). However, it remained impractical for any single group to determine all their insertion sites and genetic properties. Without this information, an insertion library of maximal utility and feasible size could not be created, and the long-term maintenance of these strains remained in doubt. Recognizing that, for a model organism, genetic tools are highly synergistic with molecular and informatics tools, BDGP undertook to characterize available single P element insertion lines and to use them to generate a comprehensive gene-disruption library. As a byproduct, it was realized that the strains would prove extremely useful for linking the physical and genetic maps.

When the project began in 1993, more than 3000 strains in which single P element insertions had been associated with recessive lethal or sterile phenotypes were collected from six laboratories (Table 1). The gene-disruption library was assembled from this starting material, and it continues to be expanded today. In their starting state, the lines were of limited use to the research community and too numerous to be maintained in stock centers. Since most of the transposon insertion sites were unknown, investigators would have had to study thousands of lines instead of just the few whose P element insertions lay in a genomic region of interest. Furthermore, many strains in the collection mutated the same genes, owing to the existence of P element insertion hotspots. Others contained background mutations that were responsible for the mutant phenotype instead of the inserted transposon.

Table 1. Sources of P element lines

Originating laboratory    No. of strains   Insertions on II   Insertions on III   Reference(s)
A.C.S.                    1124             460                664                 13, 18
G.M.R.                    146              50                 96                  19
L. and Y.-N. Jan          201              0                  201                 17
M. Scott and M. Fuller    114              35                 79                  Personal communication
I.K.*                     1500             1500               0                   20
A. Laughon                99               0                  99                  21
Total                     3184             2045               1139

*Estimated number of independently derived lines with <3 insertions.

High-Resolution Cytogenetic Mapping of Transposon Insertion Sites

Accurately localizing the sites of transposon insertion in all the starting strains was the key to constructing a nonredundant library. In situ hybridization to salivary gland polytene chromosomes can physically map a given DNA sequence with an accuracy of 1-2 polytene bands (Fig. 1). The 4015 bands recognized on the major autosomes are divided into 485 polytene subdivisions (21A-100F), an average of 8 bands per subdivision. On the basis of microspectrophotometry and comparisons with chromosomal walks, an average band contains 25 kb of genomic DNA. Thus, if carried out with maximal accuracy, the insertions would provide valuable guideposts that could be used throughout the genome. So far the insertions in 2785 of the lines have been mapped by in situ hybridization (Fig. 1). These sites are distributed throughout the autosomes in a pattern indistinguishable from random (once the effects of a small number of P element insertion hotspots are removed). For example, 410 of 470 euchromatic subdivisions contain at least one insertion. At least 1200 different autosomal insertion sites have been resolved; hence, the average distance between elements within autosomal regions is about 85 kb.

We took advantage of the fact that many genes had independently been mutated in more than one line to determine how accurately the insertions had been localized. Since most Drosophila genes studied to date are smaller than 50 kb, one would expect the insertions in allelic lines to map within the same or an adjacent polytene band. Errors in localization would show up as allelic lines whose insertions had been mapped at more widely separated sites. Complementation crosses identified 750 lines that fell into 180 complementation groups with between 2 and 25 alleles each. The average standard deviation in the map position of the allelic insertions from this sample was only 0.61 ± 0.69 bands. Since these tests would have revealed mistakes in localization as large as 8-16 polytene chromosome bands, their reproducibility was unprecedented. Pictures of the hybridization signals from each line were digitally recorded and have been distributed as part of the BDGP data base (see Fig. 2). Thus, specific lines from the collection can be requested whose insertions have been shown to reside in virtually any genomic interval.

Disrupting Genes and Mapping Mutations on the Basis of Their Location

Simply knowing the sites of transposon insertion to high accuracy makes many types of genetic analysis possible when using lines from the BDGP library. Mutations are frequently desired in an ORF that was identified and cloned by sequence similarity. Once the cytogenetic location of any gene has been determined by in situ hybridization, lines from the BDGP library can be obtained whose insertions are located nearby. These transposons contain an easily scored eye-color marker and can be remobilized efficiently. Consequently, the inherent tendency of P elements to transpose "locally" (within about 100 kb) can be utilized to preferentially mutate the region containing the gene of interest (24). Moreover, by selecting for loss of the marker gene following remobilization or x-ray treatment, small chromosomal deletions in the surrounding region can be generated, some of which are likely to remove the desired locus.
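The candidate-selection step described above (finding mapped insertions close enough to a gene for local transposition) can be sketched as a simple window query. This Python sketch assumes insertion sites have already been converted to a linear band index along one chromosome arm; that encoding, the function name `lines_near`, and the line names are hypothetical. It uses the text's figures of ~25 kb per average band and ~100 kb for local transposition, and because in situ mapping is accurate only to 1-2 bands, the distances it reports are rough estimates.

```python
from dataclasses import dataclass

KB_PER_BAND = 25.0    # average polytene band size given in the text
LOCAL_HOP_KB = 100.0  # approximate range of "local" P element transposition

@dataclass(frozen=True)
class InsertionLine:
    name: str         # hypothetical line name
    band_index: int   # hypothetical linear band coordinate on one arm

def lines_near(target_band, lines, window_kb=LOCAL_HOP_KB, kb_per_band=KB_PER_BAND):
    """Return lines whose insertion maps within window_kb of target_band,
    nearest first. Mapping is only accurate to ~1-2 bands, so these are
    band-resolution estimates, not physical coordinates."""
    window_bands = window_kb / kb_per_band
    nearby = [line for line in lines
              if abs(line.band_index - target_band) <= window_bands]
    return sorted(nearby, key=lambda line: abs(line.band_index - target_band))

if __name__ == "__main__":
    library = [
        InsertionLine("line-A", 1000),
        InsertionLine("line-B", 1003),
        InsertionLine("line-C", 1010),
        InsertionLine("line-D", 998),
    ]
    for line in lines_near(1001, library):
        print(line.name, abs(line.band_index - 1001) * KB_PER_BAND, "kb (approx.)")
```

With a 100-kb window (four average bands), line-C, nine bands away, is excluded. Narrowing `window_kb` trades coverage for a higher chance that a local hop lands in the target gene.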
Frequently, mutations disrupting a biological process of interest have been identified in chemical mutagenesis screens, but their locations are not known accurately enough for efficient cloning. By using BDGP lines, any mutation can be accurately mapped by recombination relative to nearby insertions (see Fig. 1). This allows the mutation to be placed within a specific physical interval defined by its two nearest flanking insertions. Analysis of recombinants between the parental insert-bearing chromosomes, in particular the number of recombinants seen between the mutation and each P element, allows the approximate position of the mutation within the interval to be determined. Moreover, the resulting recombinant chromosomes can be scored for DNA sequence polymorphisms within the interval to further refine the position of the mutation (for example, see ref. 22).

The two insertions shown in Fig. 1 are separated by approximately 250 kb. From 23,000 progeny, a total of 46 recombinants in which crossing-over had occurred between the two elements were selected, simply by selecting flies that had lost both eye-color markers. This provides an average of about one crossover every 5 kb, which in most genomic regions would be sufficient to map a point to the site of one or a very few specific transcription units. The correct transcript can subsequently be identified by testing the ability of various DNA segments from the region to complement the mutant defect in transgenic flies or by comparing the sequence of mutant and wild-type alleles.

Identifying Disrupted Genes

As the size of the BDGP gene-disruption library grows, the density of insertions along the physical map will increase, and with it the power and precision of the local mutagenesis and recombination mapping methods described above. However, the fraction of Drosophila genes that have been directly disrupted by a transposon insertion in the collection will also grow and will reduce the need for these methods. Strains whose insertions directly disrupt genes of interest are ideal tools to expedite studies of gene function. For example, it is usually possible to identify rapidly the transcription unit an insertion has disrupted by looking in the mutant for altered transcripts near the insertion site. Unlike mutations generated by chemical mutagens or radiation, single P element insertions allow new alleles of the gene to be generated rapidly by imprecisely excising the original element. Studying a range of mutant alleles that includes true nulls is frequently important for understanding gene function. Imprecise excisions can be selected that delete the gene's promoter and coding sequences, revealing its true "null" phenotype.

Because of the importance of direct gene disruptions, a major goal of the BDGP is to identify as many lines as possible from the starting collection whose insertions disrupt different genes that cause scorable phenotypes. This requires proving that the mutant phenotype originally associated with the line is in fact caused by the transposable element insertion rather than by a secondary lesion that arose during the screen. Moreover, the mutation must define a previously undisrupted genetic locus and not simply represent another allele of a gene already disrupted within an existing strain in the library. Strains with these properties become a permanent part of the final library. The number of different genes that are ultimately represented should be limited only by an inherent site specificity that restricts the ability of P elements to mutate some loci and by the number of starting strains that can be generated and analyzed.

Two methods are being used to verify that the P element insertion is indeed the cause of the lethal mutation and thereby increase the size of the final gene-disruption library. The first is to identify loci with two or more independent P insertion alleles.

[Figure 1: (A) in situ hybridization signals for two insertion lines; (B) map of the two ry+ insertions, 250 kb (0.4 map units) apart, flanking a mutation m]

FIG. 1. Localization of insertions to precise genomic regions and their use in fine-scale recombination mapping. (A) Insertions in lines l(3)03999 and l(3)05822 were mapped to bands 90E1-2 and 90F9-10, respectively, by polytene chromosome in situ hybridization with an insertion-specific probe. Pictures of the hybridization signals similar to those shown were recorded with a video camera, digitized, and stored in the BDGP data base. (B) Recombinants between the two insertion sites, which each carry a functional rosy+ (ry+) eye-color gene, were recovered among the progeny of heterozygous females as flies with rosy mutant eye color. The location of two such crossovers is drawn relative to the position of an arbitrary mutation (m) located within the interval defined by the P element insertions. The inheritance of the mutation itself, or of molecular polymorphisms within the interval, among the recombinants can be used to assist in positionally cloning the mutated gene by more precisely determining its location. The blue line shows the position of a hypothetical recombination event that would produce a ry- m- chromosome, while the recombination event illustrated by the yellow line would produce a ry- m+ chromosome. Recombinants can be recognized by their unique ry- eye color, making it possible to readily isolate 50-100 independent recombinant chromosomes. The relative number of these two classes of recombinants gives an estimate of the location of the mutation within the interval between the P element insertions. Since the interval is only 250 kb, it is generally possible to map a mutation to a 20- to 30-kb region in this way. Further refinement and confirmation can be provided by assaying the recombinant chromosomes for the presence of physically mapped molecular polymorphisms (22).
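The interval-mapping arithmetic described above can be sketched in Python. The sketch assumes crossovers are distributed uniformly across the interval between the two insertions, which is an idealization because local recombination rates vary. Which recombinant class (ry- m- or ry- m+) corresponds to a crossover on the left or right of m depends on the phase of the parental chromosomes, so the hypothetical function `locate_mutation` takes counts already assigned to each side. The binomial standard error is one simple way to express precision and is not necessarily how the authors quantified it. The 250-kb and 46-recombinant figures come from the text. The 12/34 split is invented for illustration.

```python
import math

def kb_per_recombinant(interval_kb, n_recombinants):
    """Average physical spacing between sampled crossovers in the interval."""
    return interval_kb / n_recombinants

def locate_mutation(interval_kb, n_left, n_right):
    """Estimate the distance (kb) of mutation m from the left insertion.

    n_left:  recombinants whose crossover fell between the left insertion and m
    n_right: recombinants whose crossover fell between m and the right insertion
    Returns (estimate_kb, standard_error_kb) under a uniform-crossover,
    binomial model."""
    n = n_left + n_right
    if n == 0:
        raise ValueError("at least one recombinant is required")
    p = n_left / n
    se = math.sqrt(p * (1 - p) / n)
    return interval_kb * p, interval_kb * se

if __name__ == "__main__":
    # Figures from the text: 46 recombinants across ~250 kb.
    print(f"{kb_per_recombinant(250, 46):.1f} kb per recombinant")  # 5.4 kb
    # Hypothetical split of those 46 recombinants around a mutation.
    est, se = locate_mutation(250, 12, 34)
    print(f"m lies ~{est:.0f} +/- {se:.0f} kb from the left insertion")
```

Scoring more recombinants shrinks the standard error roughly as 1/sqrt(n). This is why recovering 50-100 recombinants, as the Fig. 1 legend describes, narrows the position to a few tens of kilobases.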

[Figure 2: Encyclopaedia of Drosophila windows for strain l(3)03999: P element summary (polytene location 90E1-2, STS DmO285, contig DS06686), STS record (primer pair, 171-bp product), and STS sequence window X3-03999]

FIG. 2. Data from the gene-disruption project as displayed by the Encyclopaedia of Drosophila database browser. The Encyclopaedia of Drosophila is a joint product of the BDGP and FlyBase (23). It is implemented in a version of ACeDB, a genome database program for the UNIX operating system by Richard Durbin (Sanger Center, Cambs, U.K.) and Jean Thierry-Mieg (Centre National de la Recherche Scientifique, Montpellier, France). The original customization of ACeDB for use with Drosophila was done by Suzanna Lewis (BDGP) to produce Flydb, the laboratory database of the BDGP, and further enhancements for the Encyclopaedia of Drosophila were made by Suzanna Lewis and Cyrus Harmon (BDGP). A Macintosh-compatible version of ACeDB was written by Frank Eeckman (Lawrence Berkeley Laboratory), Richard Durbin, and Cyrus Harmon and was further customized for the Encyclopaedia of Drosophila by Cyrus Harmon. The Encyclopaedia of Drosophila displays published and unpublished data of the BDGP, as well as data collected by FlyBase from the scientific literature and other genome projects. The individual windows illustrate the organization of information relevant to each strain in the BDGP gene-disruption library, using strain l(3)03999 as an example. The window entitled "P element: l(3)03999" summarizes general information, such as the structure of the inserted element, the polytene chromosome location of the insertion, the name of the sequence-tagged site (STS) derived from the insertion, the name of the contig in which this STS lies, the date the line was sent to the Stock Center, and the results of genetic complementation tests with other P element insertions and chromosomal deletions. Double clicking on individual items in this window opens other windows that display more detailed information. For example, selecting the slide number, 1.3.03999, opens a window that shows a digitized image of the chromosomal in situ hybridization used to map the insertion.
Double clicking in the expression field opens a window, entitled l(3)03999.embryos, that displays the embryonic expression pattern of the enhancer trap associated with this insertion. More information about the STS derived from this insertion can be obtained by opening the window entitled "STS: DmO285". From within this window, the window entitled X3-03999, which contains the sequence of the STS, can be opened.
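Each STS record in the browser pairs a forward and a reverse primer with an expected PCR product size. As a rough sketch of how that size follows from primer positions, the hypothetical Python function `pcr_product_size` below finds exact primer matches on the top strand of a template and returns the amplicon length. The forward primer is matched as given, and the reverse primer is matched through its reverse complement. The template and primers are toy sequences, not the DmO285 data. Real in-silico PCR tools also tolerate mismatches and search both strands, which this sketch does not do.

```python
def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def pcr_product_size(template, forward, reverse):
    """Length from the 5' end of the forward primer to the 5' end of the
    reverse primer, or None if either primer lacks an exact match in the
    expected orientation. Only the template's top strand is searched."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse)  # reverse primer's binding site on the top strand
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start

if __name__ == "__main__":
    toy = "TTTT" + "GATTACA" + "CGCGCG" + "GGTACCTT" + "TTTT"  # hypothetical template
    print(pcr_product_size(toy, "GATTACA", "AAGGTACC"))       # 21
```

The product length counts both primers plus the sequence between them (7 + 6 + 8 = 21 bases in the toy example).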

Control crosses using the starting lines show that there is a negligible probability that two lines will fail to complement for reasons unrelated to their cytogenetically similar P insertions. The crosses described above to test the accuracy of insert localization have so far identified 180 loci with multiple alleles. A second and more generally applicable method is to determine whether the mutant phenotype associated with each starting strain is uncovered when the region surrounding the insertion site is removed by a deletion. Small deletions are available that allow such tests to be carried out on the majority of the starting strains (see Fig. 3). An additional advantage of this method is that lines bearing background mutations can be positively identified, and the affected lines discarded. New deletions are being constructed to allow insertions in the remaining genomic regions to be tested by this method.

At present, slightly more than 700 lines have been incorporated into the disruption library, representing ~20% of the estimated 3500 autosomal genes that mutate to a scorable phenotype. All these lines have been deposited for public distribution at the Drosophila Stock Center in Bloomington, IN. Strains from the BDGP collection already rank among the most frequently requested items and accounted for more than 30% of all strains shipped during 1994, their first full year of availability.

The molecular structures of the genes disrupted in the library are being learned from two major sources. First, a flanking sequence tag is generated for each insertion to localize it precisely on the genomic sequence, to generate an STS for the BDGP physical mapping project, and to assist in gene identification. Generating the sequence tags is greatly facilitated by the presence of internal cloning sequences in the inserted transposons that allow 5' flanking DNA to be cloned by plasmid rescue (12). Subsequently, about 350 bp of 5' sequence is determined and compared with DNA sequence libraries to learn whether the insertion falls within a previously sequenced region. Additionally, the flanking sequence is translated in all six reading frames and compared with proteins from all organisms. The results of analyzing the first 280 flanking sequence tags showed that approximately 20% were homologous to Drosophila sequences previously reported in public data bases, leading in most cases to the molecular identification of the disrupted gene. Nearly half of these disrupt genes encoding proteins with extensive similarity to known human genes. Of course, these percentages underestimate the true fraction of insertions that disrupt known Drosophila genes and genes with mammalian relatives, because they are based on such a short sequence of genomic DNA and because the sequence of most human genes is not yet available for comparison.
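The flanking-sequence analysis described above translates each tag in all six reading frames before comparing it with protein databases. The Python sketch below shows only the six-frame step, using the standard genetic code. The function names are illustrative. Codons containing ambiguous bases (such as N, common in single-pass reads) are rendered as X. The subsequent database search is not shown, since the text does not name the tool used.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def reverse_complement(seq):
    """Reverse complement of a DNA string; N stays N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def translate(seq):
    """Translate codon by codon: '*' marks stops, 'X' marks codons with
    ambiguous or non-ACGT bases. A trailing partial codon is ignored."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translation(seq):
    """Map frame labels '+1'..'+3' and '-1'..'-3' to peptide strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

if __name__ == "__main__":
    for frame, peptide in six_frame_translation("ATGGCCTAA").items():
        print(frame, peptide)
```

Generating all six frames is necessary because a ~350-bp tag has unknown strand and phase relative to any gene it hits.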
[Figure 3: polytene banding pattern of subdivisions 63F-64D, the extent of Df(3L)GN24 (63F4,7 to 64C13,15), and complementation results for seven insertion strains, reading O X X X X X O]

FIG. 3. Complementation data illustrating the use of chromosomal deletions to verify insertion lines. Genetic complementation results from polytene subdivisions 63F-64D are summarized below a picture of the polytene banding pattern of this region. The chromosome region between bands 63F4-7 and 64C13-15, which polytene cytology shows to be missing in strain Df(3L)GN24, is indicated by a black box. Below the box are the names of seven strains associated with lethal mutations from the starting collection whose insertions were mapped to this region by in situ hybridization. An X indicates that the strain failed to complement the deletion, while an O indicates that the lethal mutation survives in combination with Df(3L)GN24. The perfect correlation between the deletion endpoints and complementation behavior indicates that the lethal mutations in the five lines lying inside Df(3L)GN24 are caused by their P element insertions. One line, l(3)02333 (not shown), contained an insertion at 64B16-17 but complemented Df(3L)GN24 and was therefore discarded.

The second approach to gene identification is the Drosophila research community, which has a close working relationship with BDGP. The chromosomal insertion-site data and genetic complementation results from each line are widely distributed via the Internet as part of the FlyBase (23) database project and in the form of an integrated genome browser known as "Encyclopaedia of Drosophila," which is produced in a collaborative effort by the BDGP and FlyBase and distributed on CD-ROM (Fig. 2). Information on the "enhancer-trap" expression patterns in these lines is also being collected and incorporated into the data base. These patterns are known to frequently reflect the expression pattern of the mutated gene, so this information allows genes to be selected for further study on the basis of their probable patterns of expression in diverse tissues. With this information, BDGP insertion lines located near genes currently under study by members of the research community are routinely requested from the Stock Center and molecularly characterized, and the results are reported.

Estimating Mutational Saturation by Using a 1.8-Mb Genomic Region

The genomic region surrounding the Alcohol dehydrogenase (Adh) gene in polytene divisions 34D-36A has been extensively studied (see ref. 25). A total of 48 complementation groups that mutate to lethality have been defined from mutagenesis screens that appear to have saturated mutable vital loci. We selected this region to estimate the final saturation level of autosomal loci that could be expected when using the starting strains. All lines failing to complement deficiencies known to uncover the 34D-36A region were identified and crossed to representative strains defining the 48 known vital loci. In addition, the cytogenetic locations of the insertions in these lines were determined to ensure that the mutations were associated with a P element insertion.

The results from this study were highly informative (Fig. 4). A total of 16 of the 48 vital genes (33%) were mutated by lines in the collection. Thus, if the 34D-36A region is representative, ~1/3 of all vital second chromosome genes (~500 genes) will have been mutated by verified single-insert lines when the analysis of the 3000 starting strains is complete. A slightly lower number of third chromosome genes (400) is predicted because fewer starting insertions were recovered on that chromosome. These results are also consistent with our previous estimates of inter-insert distances. Since the 34D-36A region is estimated to contain 1.8 Mb of DNA, the average distance between insertions mutating vital genes would be approximately 113 kb.

While not detracting from the usefulness of the collection, this is a lower value than would have been expected if P elements could inactivate any gene. The total number of independent second chromosome lines is approximately 2000, while 1600 lethal complementation groups are estimated to reside on this chromosome. If these loci were equally mutable by P elements, more than 67% of them (rather than 33%) would contain one or more mutations. On the basis of the success rates in screens that mobilized small natural elements, P elements were previously estimated to mutate about 50% of all genes (26). It may be possible to disrupt the remaining genes by mobilizing P elements containing sequences that modify their insertional specificity or by employing other transposable elements whose movement can be controlled. The two strongest candidates for alternative elements are hobo, an element in the Ac family (27), and Minos, an element in the mariner/Tc family (28). The feasibility of these approaches is currently being tested.

0) 0)fL OX) aUV : X ~ ~~~~A LouU') m 1 U-) .) 0a cZnc aLg a} am Xt, ... 0 v."n Cm 0 N.i U. \ \l NN 3 c'Cya :::::..:...... ;.;.; ...... i .;. 34E 35A 35B 35C 35D 35E 3SF 36A FIG. 4. Diagram of the vital genetic loci in the chromosomal region 34D-36A. This region contains approximately 1.8 Mb of DNA. The solid boxes represent the polytene chromosome bands of the cytogenetic map. The names and approximate positions of genetic loci are indicated (J.R. and M. Ashburner, unpublished data). The 48 loci shown can mutate to lethality; an additional 24 genes have been identified in this region that are not essential for viability. One-third (16/48) of the previously known vital complementation groups in this region were mutated by P element insertion strains in the current collection. These genes are highlighted. Downloaded by guest on September 26, 2021 Review: Spradling et al. Proc. Natl. Acad. Sci. USA 92 (1995) 10829 several specific genes (29, 30), but excep- Disrupting "Silent" Genes integration into their 5' flanking regions tions were also known. Fig. 5 shows the (A.C.S. and G.M.R., unpublished data). locations of 56 insertions present in start- The numb(er of transcripts appears to ex- The frequent proximity of the insertions ing lines (representing 49 different genes) ceed the nuamber of mutable loci in several to the 5' portion of the transcription unit whose positions have been determined at extensively(analyzed segments of the Dro- means that the gene disrupted in many the sequence level. Insertions are plotted sophila ge:nome (34, 35). In yeast and lines could be molecularly identified by for comparison relative to a "generic" Caenorhab'oditis elegans, the majority of comparing the DNA sequence flanking gene having one prior to the coding ORFs reviealed by genomic sequencing the insertion to genomic sequence data. sequences and one intron within coding have not been associated with mutant Final confirmation would depend on stud- sequences. 
The insertions studied exhibit phenotype!s, even in genomic regions that ies of the effects of the mutation of the a strong preference for 5' untranslated have been subjected to saturation mu- expression ofgene products at the levels of regions. The next most common integra- tagenesis (j'36, 37). The effects of disrupt- RNA and protein. ing ilent" genes revealed in these tion sites are located 100-200 bp upstream t ty be masked by the activity of Broadening Our Concept of Genomics of the site where transcription initiates. related geines or because their mutations Nine of the genes are frequent targets of rucedg efects that are not readily ap- Multicellular organisms are proving to P insertion and represent insertion "hot- prode dcasual observation under normal have far more in common at the biological spots" (Fig. 5, filled triangles). The laboratoryrconditions. To understand the level than previously suspected. Research mapped insertions in all but one hotspot function o)f such silent genes, it will be on the most tractable model systems, such are located in transcribed regions up- necessary tto obtain null mutations, just as as Drosophila, is greatly advancing our stream from the initiation AUG codon. in the case of genes causing obvious mu- understanding of what specific genes do, Even the few insertions that are found in tant pheno)types. In many cases, a pheno- including many that are directly relevant are clustered toward the 5' end, type can in fact be detected by using more to human biology and medicine. As sum- usually in introns within 5' untranslated specializedI assays-e.g., measuring circa- marized in this article, the scientists of the regions or else in the first intron to disrupt dian rhythhms (38). However, new ap- BDGP believe that the benefits of whole- coding sequences. 
The four coding se- proaches wvill be required since mutations genome analysis can best be realized if it quence insertions are located within the in silent geenes are unrecognized in most is coupled with powerful genetic tools. We first 80 amino acids from the N terminus, mutant scrreens. remain confident that resources, such as so the 5' preference applies to all types of Insertiornal mutagenesis with a single P the gene-disruption library, which inte- gene regions. element prrovides a powerful approach to grate molecular and genetic methods, will rmutations in silent genes when play an increasingly important role in fu- These observations strongly support obtaining ture research. previous anecdotal evidence that P ele- carried ouut in the context of whole- biological ments favor 5' gene regions for insertion. genome sequencing. Silent genes appear ar in structure and regulation to We thank G. Helt for carrying out database Preferential integration into 5' noncoding to. be simln similarity searches; M. Ashburner for advice regions is also observed with the yeast Tyl vital geneEs. Consequently, many P ele- and support; D. Nakahara, S. Sundby, W. Yu, transposon and other retroelements in ment inserrtion lines lacking any obvious and G. Doughty for help with in situ hybridiza- studies analyzing inactivated genes (31). phenotype probably contain silent-gene tion mapping of insertion lines; and A. Beaton It is likely that information on for help with genetic analysis. N. Patel provided Host proteins, including specific transcrip- thetiprs.sion pattern of silent genes pro- images of enhancer-trap expression patterns, tion factors with binding sites near tran- vided pby analysis of enhancer-trap inser- and H. Chang provided the recombination data scription start sites, are necessary for this tions will bie useful in guiding the choice of used in Fig. 1. We thank A. Fire, D. Koshland, specificity (32, 33). 
Regardless of biolog- specializedI phenotypic assays to perform M. Palazzolo, and D. Van Vactor for comments ical mechanisms underlying the tendency on individuual insertion lines. The identity on the manuscript. The BDGP is a consortium of the P element to insert close to the 5' of the ORIF disrupted by a silent insertion of the Drosophila Genome Center (funded by of National Center for Human Genome Research end of transcription units, the strength would be assigned on a tentative basis Grant P50-HG00750; G.M.R., Principal Inves- this preference greatly aids in identifying entirely b3y molecular data. In several tigator), the Lawrence Berkeley Laboratory the transcription unit affected by a P cases, P eliements in phenotypically silent Human Genome Center (funded by the De- element-induced mutation. lines have 1been shown to disrupt genes by partment of Energy; Mohan Narla, Acting Di- rector), and the Howard Hughes Medical In- 7vv stitute (through its support of work in the tV~11~ Number >15 G.M.R. and A.C.S. laboratories). This work was of supported by National Institutes of Health tV Grants P50-HG00750 and R03-HG01028. /VVvv4V alleles: 5-14 v A.C.S. and G.M.R. are Investigators of the Howard Hughes Medical Institute. J.R. was Vv_vv~ supported by a Medical Research Programme V <57 N0V_tVV V Grant to M. Ashburner and D. Gubb. V VV V 4V V V"5'VV4V V Vt pA 1. Pawson, T. & Bernstein, A. (1990) Trends Genet. 6, 350-356. intron intron 2. Krumlauf, R. (1992) BioEssays 14, 245- 5' FLK 5' UNT CODING 3' UNT 252. 3. Kandel, E. & Abel, T. (1995) Science 268, FIG. 5. Preferential P element insertion near the 5' end of transcription units. The insertion 825-826. site of 56 different mutagenic P elements whose insertion sites have been determined at the 4. Watson, K. L., Justice, R. W. & Bryant, nucleotide sequence level are shown. Each insertion site is plotted with respect to a simplified P. J. (1994) J. Cell Sci. 107 Suppl. 
18, "standard" gene containing one intron before and one intron after the AUG initiation site. The 19-33. gene regions shown are as follows: 5' flanking sequences, 5' FLK (line); 5' untranslated region, 5. Martin, C. H., Mayeda, C. A., Davis, 5' UNT (hatched box); coding region (solid boxes); 3' untranslated region, 3' UNT (hatched box); C. A., Ericsson, C. L., Knafels, J. D., Ma- pA, poly(A) addition signal. The position of each insertion was plotted proportionally within the thog, D., Celniker, S., Lewis, E. B. & gene region shown. Upstream insertions were plotted by assuming that the 5' flanking region Palazzolo, M. J. (1995) Proc. Natl. Acad. shown was 500 bp. Sci. USA 92, 8398-8402. Downloaded by guest on September 26, 2021 10830 Review: Spradling et al. Proc. Natl. Acad. Sci. USA 92 (1995) 6. O'Kane, K. & Gehring, W. R. (1987) Proc. 19. Gaul, U., Mardon, G. & Rubin, G. M. 29. Tsubota, S., Ashburner, M. & Schedl, P. Natl. Acad. Sci. USA 84, 9123-9127. (1992) Cell 68, 1007-1019. (1985) Mol. Cell. Biol. 5, 2567-2574. 7. Kleckner, N., Roth, J. & Botstein, D. 20. Torok, T., Tick, G., Alvarado, M. & Kiss, 30. Kelley, M. B., Kidd, S., Berg, R. L. & (1977) J. Mol. Biol. 116, 125-159. I. (1993) Genetics 135, 71-80. Young, M. W. (1987) Mol. Cell. Biol. 7, 8. Bingham, P. M., Levis, R. & Rubin, G. M. 21. Chang, Z., Price, B. D., Bockheim, S., 1545-1548. (1982) Cell 25, 693-704. Boedigheimer, M. J., Smith, R. & 31. Sandmeyer, S. B., Hansen, L. J. & 9. Spradling, A. C. & Rubin, G. M. (1982) Laughon, A. (1993) Dev. Biol. 160, 315- Chalder, D. L. (1990) Annu. Rev. Genet. Science 218, 341-347. 332. 24, 491-518. 10. Engels, W. R. (1984) Science 226, 1194- 32. Liebman, S. W. & Newnam, G. (1993) 22. Cutforth, T. & Rubin, G. M. (1994) Cell Genetics 133, 499-510. 1196. 77, 1027-1036. 11. Kidwell, M. G. (1987) Drosophila Info. 33. Kirchner, J., Connolly, C. M. & Sandm- 23. FlyBase (1994) Nucleic Acids Res. 22, eyer, S. B. (1995) Science 267, 1488-1491. Serv. 66, 81-83. 3456-3458. 
34. Bossy, B. L. M., Hall, C. & Spierer, P. 12. Rubin, G. M. & Spradling, A. C. (1982) 24. Tower, J., Karpen, G. H., Craig, N. & (1983) EMBO J. 3, 2537-2541. Science 218, 348-353. Spradling, A. C. (1993) Genetics 133, 347- 35. Kozlova, A., Semeshin, V. F., Tretyakova, 13. Cooley, L., Kelley, R. & Spradling, A. C. Kokoza, E. B., Pirrotta, V., Grafo- 1121-1128. 359. I. V., (1988) Science 239, 25. Ashburner, A., Thompson, P., Roote, J., datskaya, V. E., Belyaeva, E. S. & Zhimu- 14. Stellar, H. & Pirrotta, V. (1985) EMBO J. lev, I. F. (1994) Genetics 136, 1063-1073. 4, 167-171. Lasko, P. F., Grau, Y., El Messal, M., Roth, S. & Simpson, P. (1990) Genetics 36. Dujon, B., Alexandraki, D., Andre, B., 15. Brand, A. & Perrimon, N. (1993) Devel- Ansorge, W., Baladron, V., et al. (1994) opment (Cambridge, UK) 118, 401-415. 126, 679-694. 26. Kidwell, M. G. (1986) in Drosophila: A Nature (London) 369, 371-378. 16. Golic, K. & Lindquist, S. (1989) Cell 59, 37. Sulston, J., Du, Z., Thomas, K., Wilson, 499-509. Practical Approach, ed. Roberts, D. B. R., Hillier, L., Staden, R., Halloran, N., 17. Bier, E., Vaessin, H., Shepherd, S., Lee, (IRL, Oxford), pp. 59-81. Green, P., Thierry-Mieg, J., Qiu, L., Dear, K., McCall, K., Barbel, S., Ackerman, L., 27. Smith, D., Wohlgemuth, J., Calvi, B. R., S., Coulson, A., Craxton, M., Durbin, R., Carreto, R., Uemura, T., Grell, E., Jan, Franklin, I. & Gelbart, W. M. (1993) Ge- Berks, M., Metzstein, M., Hawkins, T., L. Y. & Jan, Y. N. (1989) Genes Dev. 3, netics 135, 1063-1076. Ainscough, R. & Waterston, R. (1992) 1273-1287. 28. Franz, G., Loukeris, T. G., Dialektaki, G., Nature (London) 356, 37-41. 18. Karpen, G. H. & Spradling, A. C. (1992) Thompson, C. R. L. & Savakis, C. (1994) 38. Konopka, R. J. & Benzer, S. (1971) Proc. Genetics 132, 737-753. Proc. Natl. Acad. Sci. USA 91, 4746-4750. Natl. Acad. Sci. USA 68, 2112-2116. Downloaded by guest on September 26, 2021