Human Chromosome-Specific Cdna Libraries: New Tools for Gene Identification and Genome Annotation

Downloaded from genome.cshlp.org on September 25, 2021 - Published by Cold Spring Harbor Laboratory Press RESEARCH Human Chromosome-specific cDNA Libraries: New Tools for Gene Identification and Genome Annotation Richard G. Del Mastro, 1'2 Luping Wang, ~'2 Andrew D. Simmons, Teresa D. Gallardo, 1 Gregory A. Clines, ~ Jennifer A. Ashley, 1 Cynthia J. Hilliard, 3 John J. Wasmuth, 3 John D. McPherson, 3 and Michael Lovett ~'4 1Department of Biochemistry and the McDermott Center for Human Growth and Development, The University of Texas Southwestern Medical Center, Dallas, Texas 75235-8591; 3Department of Biological Chemistry and the Human Genome Center, University of California, Irvine, California 9271 7 To date, only a small percentage of human genes have been cloned and mapped. To facilitate more rapid gene mapping and disease gene isolation, chromosome S-specific cDNA libraries have been constructed from five sources. DNA sequencing and regional mapping of 205 unique cDNAs indicates that 25 are from known chromosome S genes and 138 are from new chromosome S genes (a frequency of 79.5%}. Sequence complexity estimates indicate that each library contains -20% of the -SO00 genes that are believed to reside on chromosome 5. This study more than doubles the number of genes mapped to chromosome S and describes an important new tool for disease gene isolation. A detailed map of expressed sequences within the pressed Sequence Tags (eSTs)] (Adams et al. 1991, human genome would provide an indispensable 1992, 1993a,b; Khan et al. 1991; Wilcox et al. resource for isolating disease genes, and would 1991; Okubo et al. 1992). However, while ran- answer fundamental questions relating to gene dom sequencing does aid gene discovery, it pro- number, density, and evolution. Unfortunately, vides no mapping information per se. To provide this biological annotation of the physical map a useful resource for disease gene isolation eSTs has been slow in occurring; only 5213 human must be mapped in an expensive "top down" ap- "genes" have been regionally localized (Bowcock proach using genome-wide reagents (Cox et al. et al. 1995). Of these, only 311 gene sequences 1990; Green and Olson 1990; Khan et al. 1991; have been localized to human chromosome 5 Wilcox et al. 1991; Cohen et al. 1993; Polymero- and only 110 of these have been sublocalized poulos et al. 1993). Both approaches result in within this chromosome. Currently, two strate- gene maps (and eST data bases) that will be in- gies are used for mapping large numbers of genes: complete if only a limited spectrum of tissue (1) positional cloning approaches (The Hunting- types are sampled as cDNA libraries. In this re- ton's Disease Collaborative Research Group 1993; port, we describe the use of an entire human Miki et al. 1994; Lefebvre et al. 1995; Roy et al. chromosome to select cDNAs from complex 1995; Thompson et al. 1995) which use cloned cDNA pools (Lovett et al. 1991; Parimoo et al. genomic regions to isolate coding regions in a 1991; Morgan et al. 1992; Lovett 1994a,b); in ef- time-consuming "bottom up" approach that re- fect a positional cloning project on a chromo- sults in detailed transcription maps over small somal scale. Clones from these chromosome- (< 1 Mb) regions of the 3000-Mb human genome; specific cDNA libraries have been sequenced and and (2) random cDNA sequencing approaches mapped within existing chromosome-specific re- which yield DNA sequence information [Ex- agents, obviating the need to conduct genome- wide mapping, and considerably speeding up the ZThese authors contributed equally to this work. 4Corresponding author. analysis. Unlike conventional cDNA libraries, E-MAIL [email protected]; FAX (214) 648-1666. this new type of chromosome-specific reagent 5:185-194 @1995 by Cold Spring Harbor Laboratory Press ISSN 1054-9803/95 $5.00 GENOME RESEARCH @ 185 Downloaded from genome.cshlp.org on September 25, 2021 - Published by Cold Spring Harbor Laboratory Press DEL MASTRO ET AL. contains cDNAs that occur usually at very low S I 2 levels, now at relatively high levels [quasi-normalization (Weissman 1987)]. They can be constructed from many different tissue types and ANX6 ANX6 have wider applications beyond random sequencing strategies; for example, as physical I/ mapping reagents in hybridization-based analy- ses of genomic contigs. COUP COUP RESULTS Direct Selection CD74 Our starting source for human chromosome 5 ge- i[CO nomic DNA was a chromosome 5-specific cosmid library constructed by the Los Alamos National T ~'"iii ili ii~-iii ii'.:ii-~L'-~'~j.i Laboratories, Los Alamos, NM (Longmire et al. SPARC ::: ::: t t ::: :;: :t • : ,.: lt~,:t~,tt: T rill 1993). This arrayed library of 24,768 cosmid T !ii ii" ~ii ii~ iL~iii -~ii=-'i~!" clones represents -5 chromosome 5 equivalents and a DNA sequence complexity of -174 Mb. It contains at least one copy of all of the > 50 chro- A B mosome 5 loci that we have tested to date. (War- Figure 1 Examplesof qualitative and quantitative rington et al. 1991, Saltman et al. 1993), and a measures of enrichment of specific chromosome 5 small (< 5%) contamination with hamster ge- genes. (A) Four examples of autoradiographs from nomic DNA. Cosmid DNAs from all 24,768 Los Southern blots hybridized with the designated Alamos National Laboratories chromosome 5 cDNA probes. In each case the tracks contained 1 cosmids were prepared, biotinylated, and used en ~g of the starting HeLa cDNA (lane S), the primary masse in a set of five separate but parallel direct selected cDNAs (lane 1), and the secondary selected cDNA selection experiments (Lovett 1994). The cDNAs (lane 2). (B) The result of hybridizing the cDNA sources used were derived from human same cDNA probes to a sixfold density filter array of placenta, fetal brain, thymus, activated T-cells, cDNAs (576 clones of the 4608 total are shown and HeLa cells (Lovett 1994b). In all cases these here) arrayed in the template pattern shown at the were uncloned, PCR-amplifiable pools of cDNAs bottom. Each clone was spotted twice to yield a duplicate positive signal when hybridized with radi- (to maximize the complexity of cDNAs sampled), olabeled probes. Examples of positive signals within derived from cytoplasmic RNAs (to avoid ge- this array are shown for ANX6 (one hit), COUP (two nomic DNA contamination and hnRNA arti- hits), and CD74 (four hits). facts). Chromosome 5 cosmids and the cDNA pools were hybridized in solution (Lovett 1994a,b) and the cosmids plus cognate cDNAs captured on streptavidin-coated paramagnetic slightly decrease in abundance (a pattern that is beads. After washing, the cDNAs were eluted, characteristic of the occurrence of abundance PCR amplified, and recycled through an addi- normalization) in contrast with the other cDNAs tional round of selection. To assess enrichment which continue to increase in the secondary se- qualitatively, equal amounts of cDNA from the lection (Weissman 1987; Lovett at al. 1991; Pari- starting cDNA pool, primary selected cDNAs, and moo et al. 1991; Morgan et al. 1992; Lovett secondary selected cDNAs were analyzed by 1994a,b). Southern blotting and hybridization with a panel of eight chromosome 5 genes previously de- Quantitation of Enrichment scribed (Wang et al. 1989; Warrington et al. 1991). Figure 1A shows four examples of these The secondary selected cDNAs were then cloned controls (ANX6, SPARC, CD74, and COUP). All of into a plasmid vector and 4608 clones were ran- these genes show enrichments in the selected domly picked from each tissue source, for a total material. However, the COUP enrichment ap- of five selected cDNA libraries. These clones were pears to peak in the primary material and then formatted into high-density, duplicate clone ar- 186 ~ GENOME RESEARCH Downloaded from genome.cshlp.org on September 25, 2021 - Published by Cold Spring Harbor Laboratory Press CHROMOSOME-SPECIFIC cDNA LIBRARIES rays on filters and were hybridized with full- same material (Table 1). Substantial enrichments length control reporter genes to quantitate the have been obtained in these selections; -540-fold number of clones present in each library of 4608 for the CD74 gene and -40-fold for the RPL7 clones. Examples of the detected positives within cDNA which starts at a relatively high abundance the HeLa array are shown in Figure lB. To quan- and is actually selected by a pseudogene (see be- titate the abundance of the control cDNAs prior low). Some level of abundance normalization to enrichment, the starting HeLa cDNA was (Weissman 1987; Lovett at al. 1991; Parimoo et cloned in a bacteriophage vector and 500,000 re- al. 1991; Morgan et al. 1992; Lovett 1994a, b) is combinant plaques were screened by hybridiza- also evidenced in these figures, with the range of tion for the presence of the control genes. These abundance of the controls spanning 21-fold in data and data from other selected libraries are the HeLa selection (data not shown). The excep- summarized in Table 1. Not all of the control tion to these relatively uniform abundance num- genes are expressed in all of the starting tissues. bers is the placental selection, in which some in- However, in all cases but one (SPARC in the HeLa dividual controls are as high as 2.0% and others library), where a control could be detected by are as low as 0.02%. However, the placental se- PCR in the starting cDNA, it was also present at lected library was cloned after only one round of least once in the array of 4608 clones. As shown selection and it is the second round of selection in Figure 1A, the SPARC gene is enriched, but is that is designed to accomplish abundance nor- not present in the 4608 picked clones, indicating malization (Weissman 1987; Lovett at al.

Human Chromosome-Specific Cdna Libraries: New Tools for Gene Identification and Genome Annotation

(APOCI, -C2, and -E and LDLR) and the Genes C3, PEPD, and GPI (Whole-Arm Translocation/Somatic Cell Hybrids/Genomic Clones/Gene Family/Atherosclerosis) A

Mobile Genetic Elements in Streptococci

Genomics and Its Impact on Science and Society: the Human Genome Project and Beyond

From the Human Genome Project to Genomic Medicine a Journey to Advance Human Health

GENOME GENERATION Glossary

An Overview of the Independent Histories of the Human Y Chromosome and the Human Mitochondrial Chromosome

Library Construction and Screening

A Study of Visitors to Genome: the Secret of How Life Works

The Economic Impact and Functional Applications of Human Genetics and Genomics

Construction of Small-Insert Genomic DNA Libraries Highly Enriched

Cell Growth and Reproduction Lesson 6.2: Chromosomes and DNA Replication

Multiplexed Microbial Library Preparation Using Smrtbell