Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

RESEARCH An Expression-independent Catalog of from Human 22 James A. Trofatter, l'3 Kimberly R. Long, 1'3 Jill R. Murrell, 1 Christy J. Stotler, 1 James F. Gusella, 1'2 and Alan J. Buckler 1'4

Molecular Neurogenetics Unit, Departments of 1Neurology and 2Genetics, Massachusetts General Hospital and Harvard Medical School, Charlestown, Massachusetts 02129

To accomplish large-scale identification of genes from a single human chromosome, exon amplification was applied to large pools of clones from a flow-sorted human chromosome 22 cosmid library. Sequence analysis of more than one-third of the 6400 cloned products identified 35% of the known genes previously localized to this chromosome, as well as several unmapped genes and randomly sequenced cDNAs. Among the more interesting sequence similarities are those that represent novel human genes that are related to others with known or putative functions, such as one exon from a that may represent the human homolog of Drosophila Polycomb. it is anticipated that sequences from at least half of the genes residing on chromosome 22 are contained within this exon library. This approach is expected to facilitate fine-structure physical and transcription mapping of human , and accelerate the process of disease gene identification.

A primary goal of the initiative is proaches based on hybridization and biological the construction of fine-structure physical maps selection (Auch and Reth 1990; Duyk et al. 1990; of the chromosomes in anticipation of full DNA Buckler et al. 1991; Lovett et al. 1991; Parimoo et sequence analysis. However, probably the most al. 1991), have been proposed. Exon amplifica- important purpose of this mapping, the identifi- tion, an example of the latter category, relies on cation and placement of human genes, can be selection for functional splice sites flanking ex- carried out effectively before determining the ons and thereby, avoids problematic issues such complete sequence of the human genome, and as tissue specificity or relative mRNA abundance can aid in increasing the resolution of the phys- that are inherent to other gene identification ap- ical map. Low-resolution physical maps of hu- proaches. Recently, we have modified the exon man chromosomes have been described recently amplification technique to make it applicable to (Cohen et al. 1993), but considerably greater de- the isolation of gene sequences from very com- tail is needed to maximize their utility and pro- plex sources of genomic DNA (Church et al. ceed with large-scale sequencing. Identification 1994). As an initial test, we have applied this of a representative set of genes or gene fragments method to the isolation of large numbers of gene corresponding to a specific genomic region sequences from a single human chromosome. would satisfy many of the requirements for finer mapping and would add a level of functional sig- nificance to these evolving maps. The resulting RESULTS gene, or transcription maps, would provide a new Construction of a Chromosome-specific Exon framework for the study of structural, functional, Library and organizational aspects of chromosomes, and would lead to more efficient identification of Human chromosome 22 was chosen as a model genes involved in human disease. Consequently, for construction of exon libraries because of in- recently the development of methods for rapid tensive mapping and disease gene identification gene identification has received greater atten- efforts in this region of the genome. Plasmid tion, and numerous strategies, including ap- DNA was prepared from pooled clones of each of the 130-microtiter plates in an arrayed cosmid library constructed from flow-sorted human 3These authors contributed equally to this work. 4Corresponding author. chromosome 22 (LL22NC03), which represents E-MAIL [email protected]; FAX (617)726-5736. approximately five equivalents of chromosome

214 @GENOME RESEARCH 5:214-224 ©1995 by Cold Spring Harbor Laboratory Press ISSN 1054-9803/95 $5.00 Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

CHROMOSOME 22 EXON LIBRARY

22. An additional four pools were generated from utable to biases introduced during certain steps a human-specific subset of cosmids derived from of the procedure. Although cultured in separate a human-hamster hybrid cell line, GM10888, wells, growth differences among individual containing chromosome 22 as its sole human cosmid clones within each pool may have intro- component (Lichter et al. 1990). A total of 134 duced bias during the shotgun subcloning step. plate-pool cosmid DNAs were prepared, and each However, we have designed our approach to min- sample was digested and shotgun-cloned into the imize the probability that more than one exon in vivo splicing vector, pSPL3 (Church et al. will be isolated from the same cosmid (to maxi- 1994). Plasmid DNA from pSPL3 subclones (pools mize the uniformity of exon distribution across of 500-2000 insert-containing clones) was trans- the chromosome). By maintaining a high com- fected transiently into COS7 cells, which facili- plexity of target DNA (i.e., large numbers of tated SV40 large T-mediated plasmid amplifica- cosmids), the growth bias is likely to be distrib- tion and transcription. Cytoplasmic RNA derived uted across several cosmids in each pool, thus from these transfectants was used in RNA-PCR increasing the likelihood that an exon will be de- amplifications, and the resulting products were rived preferentially from several of the more cloned directionally as described in Methods. prevalent genomic clones. Bias in the library may Exon clones (48 from each pool, or -6400 clones) also be attributable to differential splicing effi- were arrayed, grown, and stored in 96-well mi- ciency of particular exons, as well as preferential crotiter plates. Approximately 24 clones from RNA-PCR amplification or exon cloning. These each of the first 100 pools were sequenced (a total aspects may be more difficult to control and are of 2304 sequences generated) in a single pass, likely to be dependent on the sequence compo- yielding 709 unique sequences, or an average of sition of exons or splice site sequences. As a re- 7.1 unique sequences per pool. To determine the sult, specific exons may have a reduced probabil- number of unique sequences in the rest of the ity of being trapped, but it is likely that other exon library, the remaining 24 exon clones from exons from the same gene will be identified. 10 pools were sequenced and compared to all of the 709 initial sequences. An additional 40 Sequence Data Base Comparisons unique sequences were produced, yielding an av- erage of four remaining unique clones per pool. The sequences were compared to those in public Therefore, we estimate that an average of 11.1 data bases (Altschul et al. 1990; BLAST compari- unique clones exist in each pool, and that se- son with Genbank and EMBL versions and up- quence has been produced for -64% (7.1 + 11.1) dates that were available 7/23/95), and a sum- of the unique clones in the first 100 pools. Be- mary of these results is presented in Tables 2 and cause there are 134 exon pools, our upper esti- 3. One hundred ninety-nine of the 709 sequences mate of the total number of unique sequences in (28%) analyzed are highly similar to known the entire exon library is 1487 (134 x 11.1). Thus, genes from a number of species. Included in the sequences that we have generated to date rep- these are 101 sequences that are identical (~97% resent approximately half (47%) of those present nucleotide identity) to segments of previously in the library. identified human genes. These can be subdivided Complete sequence was produced for 91% of into 48 sequences from 24 different genes that the clones analyzed, yielding a minimum average were mapped previously to chromosome 22 (in length of -125 bp per clone. This is close to the some cases, multiple exons were isolated from value of 135 bp that we have reported previously the same gene), and 53 other sequences corre- for completely sequenced sets of exons (Church sponding to heretofore unmapped genes and ex- et al. 1994), and likely will be similar when all the pressed sequence tags (EST). Included in the latter sequences are complete. We have estimated the group were sequences from RanGTPase- accuracy of the sequences produced to be activating 1 (Bischoff et al. 1995), phos- -99.5%, based on alignments to previously phatidlyinositol 4-kinase (Wong and Cantley known, well-characterized gene sequences (Ta- 1994), small nuclear ribonucleoprotein Sm D3 ble 1). (Lehmeier et al. 1994), glutathionine S-trans- An average threefold redundancy of clones ferase T1 (Pemble et al. 1994), and cadherin-13 corresponding to each unique sequence was ob- (Tanihara et al. 1994), which are now localized served, and is likely to be higher than this value provisionally to chromosome 22. In a few cases, when sequencing is complete; this may be attrib- the sequence identity was found with the corn-

GENOME RESEARCH@ 2 i 5 Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

TROFATTER ET AL.

Table 1. Summary of sequence data base search results

Sequence Representative number Length doue BLASTN Species Ao~e~ion % Sim BLASTX Species Accession % Sire

Identity, near Identity, or homologue

10 60 10H4 RanGTPase activating lm~eiu 1 HU X82260 100 RauGAPI HU X82260 100 12 164 1(}!-16 EST07064 HU T09171 95 13 206 1~t7 merlin (NF2) HU L11353 100 merlin (NF2) HU $33809 100 14 84 1(I-18 yg84aO3al Homo sat/ens eDNA IKI R53636 96 21 110 IICI0 lg lambda light chaLn* HU X51754 94 25 138 11D7 EST03946 HU T06057 90 28 88 12F2 casein kinase l-aiplm IK) M76343 69 casein idnase ! delta RA A46002 100 34 66 12G3 "lXJPl-like enhancer of split (TUPLEI) HU X75296 98 1UPl-like enhancer of split (TUPLEI) HU X75296 100 36 71 12G5 yg53a05.rl Homo saplem cDNA clone HU R25258 100 glutathlone reductese Pf X83603 78 37 69 12GI0 catechol methyltramfefase, upstream* RA Z12652 85 40 284 39E7 ydl8fO6.sl Homo sapiem cDNA clone I-KI T69835 92 42 144 13A7 fibelin A,B,C,D (FBLNI) ~ X53743 100 flbulin A,B,C,D (FBLNI) HU X53743 100 48 102 13144 fibulln A,B,C,D (FBLNI) I-~ X53743 100 flbulin A,B,C,D (FBLNI) HU X53743 100 59 67 14A9 ye81 c04arl Homo ~m cDNA ~ R01350 100 79 36 15FI 1 yh30cOZrl Homo sal/em cDNA I-~ R24354 100 80 153 16AI ~ I-RI X84076 100 DC_d:3~ HU X84076 100 86 111 16B2 l~hOq~fldylinodtol Inmfef protein IKI M73704 77 phmphatidylinositol transfer protein HU Q00169 86 100 39 19El pefoxisome prolifefator activated recept. I-KI L02932 100 101 130 19F..2 Ewing sKcoma (LZWSRI) HU X66899 100 Ewtug sarcoma (EWSRI) I-~ X66899 100 i 07 58 19F7 catechoi-O-methyltramfegase (COMT) HU M65212 100 catechol-O-melhylmmsfer~ HU M65212 I O0 117 117 20A2 yg59e09.rl I-lomo sapiem eDNA IKI R35102 100 128 123 21E3 merlin (NF2) I-KI L11353 I00 merlin (NF2) HU S33809 IGO 137 136 21F3 GM-CSF receptor beta drain (CSF2RB) HU M59941 98 GM-CSF receptor beta chain (CSF2RB) HU M59941 99 141 56 21FlO cDNA clone seq1298 I.U TIOG83 98 145 249 22A4 ym20f07.ri Homo sapiem cDNA ~ HI 5984 95 aconitue Pl P16276 96 158 75 24A7 yh83a06asl Homo sapiem cDNA ~ R33473 100 169 136 26A5 abulln A,B,C,D (FBLNI) HU X53743 100 fiMiin A,B,C,D (FBLNI) HU X53743 100 171 137 26B1 ya53d06al Homo sapiem eDNA HU T67209 100 probable glutathlonet, educ~se Ce P30635 66 172 206 26B2 partial cDNA sequence; doue c-21b05 IRI Z45264 99 174 154 ~B9 yf53a05al Homo ssplem cDNA HU R! 1814 96 175 75 28A1 EST02272 HU M85753 100 177 204 28A6 EST02272 HU M85753 98 180 141 29E3 lg lmnbda clmin variable HU M94113 I00 182 127 29E8 ye981ffT.rlHomo sapiens d)NA I-KI R07694 100 188 112 M Human homologue of yeast see7 l-ru M85169 77 Human homologne of yeast sect HU S24168 97 192 116 2P6 chlodde darnel HU X77197 89 chlodde dmnuel IRI X77197 97 211 193 32A1 partial cDNA sequence; clone c-2ka09 HU F07861 96 212 193 32A2 fetal-hng cDNA 5'.end seqnence HU D31239 I00 220 123 32B8 pedod clock protein MU P08399 85 228 94 331z4 ylO2el 0arl Homo sapiem eDNA l-ru R53774 96 235 111 36A1 gmmm-gtmtamyltranskt~e related 'HU M64099 100 p_nmm-glutamylUmsferase ndated I-RI A41125 100 251 100 39E2 l~t'lmtnl cluster tegkm (BCR) I-RJ U07000 100 258 165 3A4 yf13805al Homo sepiem d)NA IKI R07096 100 aiplm-aceutn CH $45673 62 265 118 3B12 . pst~M eDNA sequence;ekme c-I 0d08 W-R] Z391 I0 92 273 184 40AIO TI0 MU X"/4504 86 TI0 MU X74504 100 282 63 41E8 y193a07al Homo saplem eDNA W-R] I~0434 87 283 96 41E9 lqUPl -like e~lumcef c4fsplit (TUPLE 1) HU X75296 100 TUPl-like enhancer of split (TUPLEI) ~ X75296 100 284 96 41Ell ye83a05arl llomo s~ cDNA I-]U R02199 100 R01H2~ 8ene product Ce U00035 71 285 230 41Fl Ewtng sarcoma (EWSRI)* HU X66899 99 287 60 41F3 TUPI-Rke enhancer ~plit (TUPLEI) HU X'75296 100 291 70 42A7 celiulaf myosin heavy chain (MYH9) HU M81105 100 cellular myosin heavy chain HU A34876 ! 00 300 1 ! 3 43EA GM-CSF receptor beta dm/n (CSF2RB) HU M5994 1 ! 00 GM-CSF receptor beta chain (CSF2RB) ~ P32927 1O0 301 144 9H4 l-lk594-f Homo saplens d)NA ~ R41006 96 305 105 43F3 IMS2463 homeobox-l/ke geue HU U18977 90 307 220 43[:6 phesphalldylino,~ttol 4-1dnase IKI L36151 !00 phosphatidylinoaitol 4-kinase HU L36151 100 310 191 43F12 sepm~glne4ynlbetase IKI M27396 96 aspKagtne synfl~tase HU A27443 97 312 128 44A3 UK-HGMP ~quence ~) AAAASTE HU Z19815 88 326 149 45E6 EST05647 HU T07757 99 casein kinase I delta RA Q06486 100 329 251 45F3 beta-coymllin subunil beta B 1 BO X01808 86 beta-cWatallin sulmnit beta B 1 BO S07264 93 332 204 45F12 flbulin D (FBLNi) itU U01244 99 fibulin D (FBLNI) HU U01244 I00 333 126 46A1 ffiS UT6047 I.~ L30702 99 338 150 47A8 yg02812.fl Homo sapiens dDNA I-KI R17513 98 343 136 47B9 yd21~3.rl Homo ~iem eDNA I~ 1"78985 100 352 76 48F7 ~11 HU T08420 100 360 218 4F3 fl~roxine Mo~nue RA M21476 89 thyroxine deiodinase RA . P13700 95 364 53 4F8 ym43e06.rl Homo sapiens cDNA IKI HI 8565 87 368 125 50A! ye64b05,s I Homo ~m cDNA HU "1"99352 100 TI 0 MU X745O4 100 375 114 50B12 collasen alpha 3(VI) chain HU S13679 100 383 106 5D8 L-2R-beta HU A07797 I00 IL.2R-beta HU A07795 100 390 184 ~6 yg73f08.rl ~o SalOnS c.DNA HU R51689 97 FA6 MU P28658 94 395 82 7G'7 beta-aianlne symhase RA M97662 88 beta.alanine synthase RA Q03248 100 396 140 7G10 ~ point cluster gene (BCR) HU Y00661 95 Ixeak Mm cleste~ gene (BCR) HU U07000 95 400 176 81)12 d)NA done B270 HU T20010 I00 409 174 9AI0 SnRNP cote protein SaD3 I-~ U15009 100 SnRNP c~e protein Sin D3 HU U15009 100 415 145 9H12 c-sis (PDGFB) HU K01916 100 platelet-dm/ved growth factor B HU X00561 100 417 168 100El yo77d12.rl Homo sal~ens cDNA HU H30621 99 419 95 100E3 MDB 1128R Mus musculus cDNA MU R75186 92 421 111 100E7 IhoGAP protein HU Z23024 73 rhoGAP protein HU $34 296 91 430 206 100F8 E.VIX}~43 HU M78795 I00 440 50 27F8 NMDA receptor NR2C sulmnit* RA M91563 91 466 190 52A12 UK-HCaMPsequem~ ID AAADFQK i-gJ Z20962 99 470 122 52BI ! lmmmmglchulin lambda-like (IGLL8) HU 7,52204 100 471 96 52BI2 EST04544 HU T06655 89 473 141 53E3 rm inhtHtof I-RI M37191 90 Ras inhil~tor HU C38637 100 474 101 53E4 ye81c04~1 Homo sai~,ns eDNA HU R01350 100 491 154 54A5 partlai cDNA sequence; clone c-I va09 HU F07091 I00 495 138 54B 1 tm~ial cDNA sequence;clone c-Ogc09 I-KI Z42M 5 97

216 @ GENOME RESEARCH Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

Table 1. (continued)

497 136 54B4 NIBI246 Homo sapiens eDNA 3'end HU T16398 97 501 179 54B12 yml 5f07.ri Homo utdens cDNA HU HI 1873 100 509 235 25F7 L-2R-beta (L2RB) HU M26062 98 L-n-beta (IZ3RB) l-l[J A07795 95 515 7O $5A6 GTIL~ protein MU DI0715 97 GTP.bindflng l~otetn MU P32233 I00 518 94 56A!1 ARI wotetn (AR) mRNA HU U19345 1oo PDCdF respomtve elemem bind. protein MU U20282 94 520 50 56B7 ARI protein (AR) mRNA !-~ U19345 100 523 159 57E1 partial d~NA seqoeoce; cime c.zqe07 HU Z45647 91 531 68 57E!1 p300 protein I-U U01877 IOO tramcdptional adaptor protein p300 1~ A54277 1GO 533 112 57F1 ! 3BP-I, an SlD domain binding protein MU X87671 88 SH3 domain blnding protein MU X876"7! 91 543 105 58B4 Na~ glacme em'ampot'l~ Pi L02900 88 Na+/amlno add cotrampofm Pl P31636 94 545 141 58B11 yl~gd07a'l Homo mq~m c,DNA ILl RI 1 i00 100 562 211 60A12 ymSid05~rl Homo sapiens eDNA ~ H23042 100 575 177 61F4 HSLI3041 Hnman fetabMng dDNA HU D31239 96 590 127 63E3 mmwm-llatamyl tnumpeplidue (CLOT) HU M24903 100 592 153 63E6 pmiM ~NA sequeu~ elme e-ZlXl06 HU Z45600 98 593 70 63E7 ~m.51d0551 Hemo f~l~eW dDNA HU 1423042 100 616 160 65F3 B melanoma antigen (BAG~ mRNA IgtJ U19180 I00 predicted ttithorax protein Dm Z31725 57 620 76 81A4 pmtlal d)NA sequea~ doue c-2tl03 I-U P07808 97 625 77 66A5 eDNA sequence; done e-20d07 HU Z44432 96 630 75 66B2 lx,eMq~ut cluster ~eglou (BCR) 1413 M25947 100 breakpol~ cluster region (BCR) FIU B24847 ! 00 635 149 66BI0 brain tubuliu, alpha ¢Imln CH .100912 85 alpha-tubelin CH 1302552 95 636 60 67E1 smd~am/glucme eotmmp~tef RA DI6101 72 Na+/amino add eoDmpof~ Pl P31636 94 637 177 67F.A modifier I MU X56690 68 modtaec 3 MU P30658 85 650 194 68A5 btqmdkpoim cluster gene(BCR) HU M24603 100 break paint cluster Bene (BC]~,) l-'li3 M24603 100 651 123 68A6 vollage-gated sodium ~1 HU M94055 75 voltage*gated sodlmn channel HU A38195 I00 652 181 68A9 Ixeaiq~m da~er ~egion(BCR) HU Y0O661 ! 00 break pelm cluster ~ene (BCR) HU A28765 100 653 92 68A10 ~ lelmvtms HU M14123 91 x~.tf~ env poly~otein HU P ! 0267 83 672 71 70A5 neonatal glydne ~-ptor* RA X57281 89 687 148 71F11 lm3tein phosphatase T RA XT7237 91 i~otein phosphetase T RA U12203 160 696 199 73E6 anonymous ~ene i~ L18972 100 anonymous 8ene HU L! 8972 109 699 86 74A9 S-lae lectlu L- 14-11(LGALS2) HU M87857 100 704 96 75E4 ye98RYLrl ~ mplem dDNA i~ R07694 100 707 239 75E8 ~ 'i"09171 100 712 200 75F$ I..ZTR- ! mRNA HU D38496 100 ORF 882 gene prodmct Sc X87941 60 713 94 75F8 Ig ~ chain vm'lable* HU X56178 88 Ig lambda charm variable* I-E.; S13726 83 733 222 78F6 yc27d07.rl Homo sapiens cDNA If,J T67791 88 hypothetical in'otein YKL207w Sc P36039 68 737 129 79A12 glutatidone S-~ TI HILl X79389 98 glutathione S-tramferase TI HU X79389 97 739 177 79B5 glutmldoue S-Iramfetue TI HU X79389 98 glutamtoneS-transferase TI HU X79389 98 744 79 80F3 beta*adtqenergic~qor kinase 2 HU X69117 100 beta-adrenefglca~*pt~ kinase 2 I~ P35626 100 747 103 81A9 STS UT5309 1-112 I..30836 85 751 179 81B9 pt~tlal d3NA ~equenee; done c-zqe06 HU Z41320 I00 753 143 82E1 Ix~nt cluster~egio~ (BCR) HU Mi5025 100 tweakpotm cluster region (BCR) HU A26664 !00 757 98 82E12 N-type calcium charnel alphal MU U04999 81 voltage

4 177 10G5 dDNA eloee b12056 I-U T18862 76 25 138 111)7 EST03946 l-P2 TO6057 82 41 147 13A5 h~othetical protein 332 Cp JQ0367 65 44 215 13All malonyl e.~ A-acyl carder Ec P25715 61 66 157 14B6 paegdn HU M91585 67 pexegrin HU M9 ! 585 80 113 4O IBI cDNA clooe hbc787 FrO TIll61 85 114 164 IB2 F47A4.5 (3e Z49888 73 118 256 20A5 oxymefol-bindlng protein RB J05056 7 1 oxysteroi-binding protein I~ P22059 83 135 114 21F1 DNA binding protein (CI)C 10 parme0 Sp Z32838 71 143 246 22AI nemonal pentraxin 11[ i-U U29193 73 apexin GP U13236 84 148 108 22A10 off gene prodact Sc 7.,48149 64 151 92 23E1 growth factor reeeptor-heund protein 3 HU L295 ! 1 84 201 205 30112 ~-activating protein t'hoGAP ~ $34296 76 216 77 3ZAI0 lg lambda light chain HU X51754 80 258 165 3A4 alpha-actlnln HU P35609 63 262 189 3B2 CDI0 neutral e~ MU M81591 70 endothe li~c(mve~ng-etwyme 1 i-R] Z35307 8l 266 166 40AI fringe protein Dm L35770 54 275 96 40118 HUMXQG Human mRNA HU DI 6473 75 278 73 41E2 hb protein MU X81634 81 hb gem product MU X81634 9o 311 101 44A! semaphorin C MU X85992 64 344 114 47B10 transforming protein ray ~ S05382 71 350 177 48F3 e~eta glutathioue-S-Iransferase CH U13676 66 lheta gluta~ione~er~e CH U 13676 74 37 ! 116 50A7 eo~ell alpha 3('t/!) chain HU X52022 84 collagen Mpha 3(VI) dain HU S13679 82 461 59 51F12 y175al 2.sl Hou~ mqpiemcDNA HU R77368 71 526 136 57E4 LIMK CH D26310 86 LIM motif-contaildng protein kinase CH Jl:q)079 91 536 126 58A3 l:atrtlaleDNA sequence; clone c-2zg03 HU F09301 69

GENOME RESEARCH@ 21 7 Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

TROFATTER ET AL.

Table 1. (continued)

537 190 ~A6 ~a~ IM ll0k~ reg. subunit RA ~49~ 65 l~kDa ~n-~ng su~t ~ D379~ 556 51 ~ ~~uenoe ~gl MU L12133 88 576 137 6117/ ~NA done ~1~10 HU ~8~ 83 618 166 ~F6 y~l l.rl Hon~ sapienscDNA HU ~1~9 627 201 ~ ~ve im~ ~ ~or HU ~295 66 ~m~ Intestinal pepiidier~p~ RA ~3 63 665 91 69F1 iglyc~e diehycko~na~ HU ~3378 75 685 186 71~ D. me~m~ ~S Dm 7-.32012 64 a~ne ~celluhr ~aae YI ~9379 742 94 80El0 ~a~-~va~ng ~n ~ ~ S~296 787 180 ~1 ~y~-~ ~n Dm S3~7 53 804 198 88E4 ES'i~i HU T~2 75 806 152 ~ ~u~o~ ~mferaae RA X~6~ 73 ~u~one ~m~eraae ~ QOi 579 76 825 84 ~9 y~.sl ~mo ~ cDNA I-RJ ~9138 80 832 213 93~ ~flc.~158.3 Kd ~n ~103 ~ ~420 59 853 172 95~ cell ~m ~ ~ne ~ ~ X~579 869 129 ~ ~w~e~ GAS 6 ~A ~ ~ 63 fl~nln-2 ~ P35556 70 926 119 ~B9 ~9~11.~1 Homo ~a~ ~NA clone HU ~1~1 73

The criteria for categorizing similarities were based on BLASTN or BLASTX results (Altschul et al. 1990) and were as follows: Identity, near identity, or homolog, nucleic acid, or protein similarities t>85% and P value ~<10-s; strong similarity, nucleic acid, or protein similarity i>50% and P value ~<10-3. Asterisks denote sequence similarity to the complementary strand of the data base "hit." Species names: (BO) bovine; (Ce) Caenorhabditis elegans; (CH) chicken; (Cp) Clostridium perfringens; (Dd) Dictyostelium discoideum; (Din) Drosophila melanogaster; (Ec) Escherichiacoli; (GP) Cavia porcellus (guinea pig); (HU) human; (MU) mouse; (PI) pig; (RA) rat; (RB) rabbit; (Sc) Saccahromyces cerevesiae; (Sp) Schizosaccharomycespombe; (YI) Yarrowia lipolytica.

plementary strand of known genes. The most no- quence of exon 637 is highly similar to part of a table of these were the matches of sequences 285 common domain, termed the chromodomain, and 760, which were identical to the comple- found in genes whose products associate with mentary strands of Ewing sarcoma and cy- heterochromatin (James and Elgin 1986; Paro tochrome P450 IID6 gene sequences, respec- and Hogness 1991; Singh et al. 1991; Delmas et tively. In both cases, the match was found near al. 1993). The best studied of these genes are the 3' end of the mRNA sequences of these genes, Drosophila heterochromatin-associated protein, and the sequences flanking the aligned regions HP1, and Polycomb. Both of these genes have closely match consensus splice sites. Whether been shown to control, by repression, develop- these sequences represent artifacts or genes en- mental regulators such as homeotic genes. The coded on the overlapping DNA strand opposite chromodomain appears to be essential for assem- to the known genes remains unclear. bly of these into chromatin as part of a The remaining 98 sequences represent hu- multiple protein complex, as mutations or dele- man homologs of genes from other species, tions in this domain in the Polycomb protein members of gene families, or genes sharing abolish its ability to associate with heterochro- strong similarities with known genes. Among the matin (Messmer et al. 1992). Thus, the sequence more interesting sequences with similarities are represented by exon 637 may represent a novel those that represent novel human genes that are regulator of homeotic function in human devel- related to others with known or putative func- opment. Figure 1 is an amino acid alignment of tions. For example, the predicted amino acid se- exon 637 with the chromodomains of other pro-

Table 2. Summary of sequence data base comparisons

Category Number of sequences Percent of total

Identity to known human genes/ESTs 101 14.2 Near identity or strong similarity 98 13.8 Weak or no similarity 404 57.0 Repetitive sequence/artifact 106 15.0 Total unique sequences 709 100.0

Sequences have been placed into each category based on the results of BLASTN and BLASTX comparisons (Altschul et al. 1990). Criteria for similarity categories are described in the legend to Table 1.

218 ~ GENOME RESEARCH Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

CHROMOSOME 22 EXON LIBRARY

Table 3. Chromosome 22 gene sequences identified

Number of Locus Chromosomal matching name Gene name position sequences

ADRBK2 13-adrenergic receptor kinase c 11 1 ADTB1 13-adaptin c 12 1 BCR breakpoint cluster region c 11.2 7 COMT catechol-O-methyltransferase c 11.2 1 CSF2RB GM-CSF receptor, [3-chain cl 3 2 CYP2D6 a cytochrome P450 lID6 cl 3 1 DIAl NADH-cytochrome b5 reductase c1 3.3 2 EWSRa Ewing sarcoma ql 2.1-ql 2.2 2 FBLN1 fibulin-1 (isoforms A, B, C, D) ql 3.2-ql 3.3 4 GGT 7-glutamyl transpeptidase ql 1.2 1 GGT-rel GGT-related ql 1.2 1 IGLL8 immunoglobulin ;L-like ql 1.2 1 IGLV immunoglobulin light-chain variable ql 1.2 1 IL2RB interleukin 2 receptor, [3-subunit ql 3 2 LGALS2 S-lac lectin L-14-11 ql 2.2-ql 3 1 LIF leukemia inhibotoory factory ql 2 1 MYH9 cellular myosin heavy chain ql 2.3-ql 3.1 1 NF2 merlin, neurofibromatosis type 2 ql 2.1-ql 2.2 2 PDGFB c-sis, platelet-derived growth factor ql 3 1 PPAR peroxisome proliferation activated receptor ql 2-ql 3.1 1 TUPLE1 TUPI-like enhancer of split ql 1 5 anonymous gene ql 2 1 p300 transcription adaptor ql 3.2 1 LZTR-I ql 1 2 ARI ? 2

The previously localized genes identified by data base comparisons and the number of different sequences matching them are listed. a One of the sequencesmatching EWSRI and the matching sequencefor CYP2D6were identical to the mRNA complementary strands of these genes.

teins. Exon 637, however, does not contain the specific mRNA species exemplifies an advantage complete chromodomain, but begins several of the exon amplification approach: expression- amino acids downstream and continues beyond independent gene identification. The over- the carboxyl end of the motif. Interestingly, the whelming majority of human genes are ex- genomic structure of the Polycomb locus of pressed at low levels, producing low-abundance Drosophila has been determined, and the 5' end mRNA (Hastie and Bisho p 1976). Many other ap- of exon 637 is at the precise location of an in- proaches that require significant levels of gene tron-exon boundary within this gene (Paro and expression, or knowledge of tissue specificity, Hogness 1991). This suggests that some phyloge- may fail to identify such genes with any effi- netic conservation of genomic structure exists for ciency. This includes most large-scale random the 637 gene or that the 637 gene may represent cDNA sequencing strategies (Adams et al. 1991, the human homolog of Polycomb. 1992; Khan et al. 1992; Okubo et al. 1992), which In addition to exon 637, several of the se- are biased toward identification of mRNAs that quences appear to be closely related to genes in- are highly expressed in the tissue (or cells) from volved in growth regulatory, developmental, or which the library was generated, as well as ap- cell type-specific-processes. Isolation of these proaches using RNA derived from monochrom- types of genes, many of which are likely to be somal- or region-specific human-rodent hybrid representative of low-abundance or tissue- cell lines (Liu et al. 1989; Corbo et al. 1990; Jones

GENOME RESEARCH@ 219 Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

TROFATTER ET AL.

rived from a human-hamster hybrid that contains preferen- Exon637 ...... AAFEQKERERELYGPKK Musmod3 AAECILSKRLR LAFQKKEHEKEV.QNRK tially human chromosome Drospc ~AEKIIQ~RVKK~Q~ DIYEQTNKSSG..TPSK 22, but also retains chromo- Musmo~1 VVEKVLDRrVW~D~ AEFLQSQKTAHE ..... somes 9 and Y at a low fre- Humhpl VVEKVLDRRVVK~E~ SEFMKKYKKMKEGENNK quency. Thus, the possibility Celseq TVESILEHRKKKWY~DHT ~ EAFFTREAARKAEIKAK exists that some of the exons ~uomo~ vv~w~vv~~~~~ ~ EDFLNSQKAGKE ..... Droshpl AVEKIIDRRVRK~Y~PETE~N QQYEASRKDEEKSAASK originate from nonchromo- i some 22 genomic DNA, in- cluding hamster. This esti- "Chromodomain" mate has been confirmed by Figure I Alignment of the predicted peptide sequence of 637 with other mapping these sequences to chromodomain-containing proteins. Identities or conservative amino acid dif- Southern blots of human, ferences between 637 and each protein are shaded. The arrow indicates the location of an intron in the Drosophila melanogaster Polycomb gene. (Mus- hamster, and GM10888 rood3) Mouse modifier-3 (accession no. P30658); (Drospc) Drosophila mela- (monochromosome 22 hu- nogaster Polycomb (accession no. P26017); (Musmodl) mouse modifier-1 man-hamster hybrid) DNAs (accession no. P23197); (Humhpl) human HP1 homolog (accession no. (data not shown). Of 21 ran- $62077); (Celseq) Caenorhabditis elegans, hypothetical 33.8-kD protein (ac- domly chosen exons that hy- cession no. P34618); (Musmod2) mouse modifier-2 (accession no. P23198); bridized to the blots, 17 (81%) (Droshpl) Drosophila melanogaster heterochromatin protein HP1 (accession mapped to chromsome 22, 3 no. P05205). were of hamster origin, and 1 was human, but apparently not originating from chromo- et al. 1992). Direct or cDNA selection procedures some 22. These numbers are consistent with the (Parimoo et al. 1991; Lovett et al. 1991) have estimated nonchomosome 22 content of the been designed to minimize this problem of rep- starting cosmid library. The percentage of chro- resentation by enriching for rare mRNAs and mosome 22-specific sequences is likely to be complement exon amplification in that they can higher, as we excluded from our analysis exons enrich for these low abundance species, if they from previously mapped genes. are present. These approaches are also adaptable for en masse, region-specific gene identification, as Del Mastro et al. (1995) demonstrate using hu- DISCUSSION man as a target. The collection of clones described above repre- One hundred six of the 709 sequences sents one of the first large-scale, chromosome- (-15%) were artifacts or similar to repetitive ele- specific isolations of human gene sequences, and ments. The artifacts were derived from a number will serve as an inroad to the development of an of sources but originate primarily from pSPL3 integrated physical and transcription map of hu- and the pLawrist16 cosmid vector. The higher man chromosome 22. It is likely to be an invalu- prevalence of these clones as artifacts in these able tool for creating the high resolution maps assays, as compared to assays of single cosmid that are needed for sequencing of the human ge- clones, may be attributable to the vast molar ex- nome. Extrapolation of our results leads to a pre- cess of these sequences relative to the genomic diction that -1500 sequences will be generated sequences targeted for exon isolation. It should after identification of all unique sequences in this be noted that sequences from pSPL3 or pLawrist collection of clones (of which >1200 will be non- comprise -20% (10% each) of all clones in the repetitive and nonartifact), and an estimation library, based on sequencing of -2300 clones. that nearly half of these sequences have been These can be eliminated readily by hybridization produced to date. In this study 24 of 59 (41%) of detection, thereby significantly reducing the ef- the fully sequenced genes (nonpseudogenes) cur- fort required for sequencing of similar libraries. rently known to map to chromosome 22 were identified, suggesting that a similar percentage of all genes on this chromosome are represented by Localization of Exons to Chromosome 22 the sequences that have been generated thus far. The starting genomic DNA for these experiments Therefore, it is anticipated that exons from as was derived from flow-sorted chromosomes de- many as 80% of the genes on this chromosome

220 ~ll GENOME RESEARCH Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

CHROMOSOME 22 EXON LIBRARY

will be represented in this library after comple- tified thus far could be used to complete the Hu- tion of library sequencing. To further increase the man Genome Initiative's goal of one STS per 100 representation of genes in this library and elimi- kb for each chromosome. The conversion of ex- nate much of the bias in gene identification, sub- ons to cDNAs would then provide both ordering sequent isolation of exons will be performed on and orientation across groups of yeast artificial cosmids that stochastically failed to produce ex- chromosomes (YACs) or cosmids as well as con- ons in the initial library construction. firm any existing contig information. Also, the Several aspects of this approach make it ideal cross-species conservation of many exons allows for construction of detailed physical/ for effective comparative mapping of genes and transcription maps. First, the use of genomic for direct comparison of emerging physical maps DNA from a specific human chromosome allows in humans, mice, and other model genomes. positional information to be associated provi- Third, because exon amplification is not de- sionally to each exon, circumventing the need to pendent on the level or pattern of expression of localize cloned gene fragments that have been the gene that is isolated, representational biases isolated and sequenced as random cDNA, and inherent in tissues or cells from which cDNA li- provides a more direct approach to saturate spe- braries are constructed, are eliminated. Exons can cific regions of the genome with such sequences. be used as a DNA sequence source for determin- It should be noted, however, that sources such as ing the specific expression pattern of the gene flow-sorted chromosomes, or libraries con- from which it originated by using quantitative structed from them, frequently originate from assays of RNA expression, such as Northern blot- human-rodent somatic cell hybrids; a small frac- ting, S 1 nuclease, or RNase protection, in situ hy- tion of genomic DNA, hence exons, may origi- bridization, or by using PCR to detect the pres- nate from the rodent parent or from other hu- ence of exon sequences in a cDNA library man chromosomes present within the hybrid. (Church et al. 1993; Munroe et al. 1995). The The chromosome 22 cosmid library used in this resulting information allows for effective screen- study was derived from a hybrid cell line that also ing of appropriate cDNA libraries to saturate retains human chromosomes 9 and Y at a low quickly a given genomic region with genes. This frequency. We have estimated that 10%-20% of property of the technique has already been ap- the exon library contains sequences from ge- plied successfully to positional cloning efforts in nomic DNA not originating from human chro- Huntington's disease and neurofibromatosis 2 mosome 22, the majority of which is from ham- (Huntington's Disease Collaborative Research ster. Thus, although the exon library is highly Group 1993; Trofatter et al. 1993), resulting in enriched for chromosome 22 gene sequences, it is identification of the disease genes by isolation of not pure, and the sequences are annotated as 28 and 8 of their exons, respectively, as well as such. It should be noted, however, that no exons successful effort to identify several other human from known genes that map to any other human and mouse disease genes (Vidal et al. 1993; Vulpe chromosome were identified to date in these et al. 1993; Walker et al. 1993; Cachon-Gonzalez studies, whereas 48 exons from chromosome 22- et al. 1994; H~istbacka et al. 1994). specific genes were isolated. In addition, the re- The large-scale exon isolation approach that cent availability of high-quality monochromo- we have applied here is currently being trans- somal hybrids for most human chromosomes, ferred to several other human chromosomes. Our coupled with improved flow-sorting will reduce data suggest that this method could identify seg- dramatically the problem for future exon library ments from the majority of human genes before constructions. to the generation of the human genome's se- Second, the exons produced by this proce- quence. Moreover, the strategy would help to dure are consummate multi-purpose mapping re- achieve this goal by facilitating the necessary agents. Because the vast majority of exons are construction and comparison of fine structure single copy sequences, they can be used as hy- physical maps while simultaneously integrating bridization probes in filter-based mapping proce- them into transcription maps of greater utility to dures. Moreover, we have found that exon se- a wide range of researchers in genetics and biol- quences are easily converted to sequence-tagged ogy. Continued application of this strategy rep- sites (STS) for use in PCR-based mapping schemes resents a most effective and cost-efficient means (Green and Olson 1990). With an average spac- of substantiating the most prominent rationale ing of 60-70 kb, the chromosome 22 exons iden- for pursuing the Human Genome Initiative, cre-

GENOME RESEARCH ,@ 221 Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

TROFATTER ET AL. ation of the infrastructure needed to support the ACKNOWLEDGMENTS rapid and efficient discovery of genes causing hu- We thank Drs. M. Duyao, M.K. McCormick, D.J. Munroe, man disease. and M. Lovett for helpful comments, discussion, and crit- ical reading of the manuscript. We also thank the Lawrence Livermore National Laboratory for providing the METHODS flow-sorted chromosome 22 cosmid library. This work was supported by grants HG00672 (A.J.B.) and HG00169 Exon Library Construction (J.F.G.) from the National Institutes of Health. The publication costs of this article were defrayed in Exon amplification was performed as described (Church et part by payment of page charges. This article must there- al. 1994), with some modification. Cosmid-containing fore be hereby marked "advertisement" in accordance clones were propagated in 96-well microtiter plates, with 18 USC section 1734 solely to indicate this fact. pooled, and cosmid DNA purified using the alkaline lysis method. Before propagation, cosmids containing riboso- mal gene DNA (rDNA) were identified by hybridization and removed from each plate. This was done to insure REFERENCES against overrepresentation of chromosome 22 rDNA se- Adams, M.D., J.M. Kelley, J.D. Gocayne, M. Dubnick, quences in the amplified and cloned products. Shotgun M.H. Polymeropoulos, H. Xiao, C.R. Merril, A. Wu, B. cloning into pSPL3, transfections, and RNA isolations were Olde, R.F. Moreno, A.R. Kerlavage, W.R. McCombie, and performed as described (Church et al. 1994). Initially, J.C. Venter. 1991. Complementary DNA sequencing: RNA-based-PCR amplification (RT-PCR) and cloning of Expressed sequence tags and human genome project. the resulting products was performed as described (Church Science 252" 1651-1656. et al. 1994), but was replaced by ligation-independent cloning using uracil DNA glycosylase (UDG; Rashtchian et Adams, M.D., M. Dubnick, A.R. Kerlavage, R. Moreno, al. 1991). This entailed the replacement of oligodeoxynu- J.M Kelley, T.R. Utterback, J.W. Nagle, C. Fields, and J.C. cleotide primers SD2 and SA4 in the second PCR amplifi- Venter. 1992. Sequence identification of 2,375 human cation, with SDDU and SADU. The sequences of these brain genes. Nature 355: 632-634. primers are as follows:

SDDU: 5"-AUAAGCUUGAUCUCACAAGCTG Altschul, S.F., W. Gish, W. Miller, E. Myers, and D.J. CACGCTCTAG-3', Lippman. 1990. Basic local alignment search tool. J. Mol. Biol. 215" 403-410. SADU: 5'-UUCGAGUAGUACUTTCTATTCCT TCGGGCCTGT-3'. Auch, D. and M. Reth. 1990. Exon trap cloning: Using PCR to rapidly detect and clone exons from genomic Complementary primers were also designed for the clon- DNA fragments. Nucleic Acids Res. 18: 6743-6744. ing vector pBluescript IIKS + (Stratagene), surrounding the EcoRV site; these are as follows: Bischoff, F.R., H. Krebber, T. Kempf, I. Hermes, and H. Ponstingl. 1995. Human RanGTPase-activating protein BSDU: 5'-GAUCAAGCUUAUCGATACCGT RanGAP1 is a homologue of yeast Rnalp involved in CGACC-3', mRNA processing and transport. Proc. Natl. Acad. Sci. 92: 1749-1753. BSAU: 5'-AGUACUACUCGAAUTCCTGCA GCC-3'. Buckler, A.J., D.D. Chang, S.L. Graw, J.D. Brook, D.A. Haber, P.A. Sharp, and D.E. Housman. 1991. Exon Ten nanograms of EcoRV-digested pBluescript IIKS + was amplification: A strategy to isolate mammalian genes amplified with BSDU and BSAU. Fifty nanograms of the based on RNA splicing. Proc. Natl. Acad. Sci. amplified, linearized plasmid was mixed with 50-100 ng of 88: 4005-4009. RT-PCR product, and the mixture was digested with I unit of UDG (GIBCO-BRL) at 37°C in a 10-~1 volume of I × PCR Cachon-Gonzalez, M.B., S. Fenner, J.M. Coffin, C. buffer. The digested and annealed products were immedi- Moran, S. Best., and J.P. Stoye. 1994. Structure and ately transformed into Escherichia coli DH5~ host. UDG expression of the hairless gene of mice. Proc. Natl. Acad. cloning streamlined the procedure and completely elimi- Sci. 91" 7717-7721. nated a significant frequency of clone chimerism. Clones from each pool were picked, propagated, frozen, and stored in 96-well microtiter plates. Sequencing was per- Church, D.M., A.C. Rogers, S.L. Graw, D.E. Housman, formed using the method of Sanger et al. (1977). Se- J.F. Gusella, and A.J. Buckler. 1993. Identification of quences were automatically read using a Millipore Bioim- human chromosome 9 specific genes using exon age DNA sequence film reader operating on a Sun Sparc amplification. Human Mol. Genet. 2- 1915-1920. Station. Sequence data base comparisons were performed using the BLAST network service of the National Center for Church, D.M., C.J. Stotler, J.L. Rutter, J.R. Murrell, J.A. Biotechnology Information (Altschul et al. 1990). The se- Trofatter, and A.J. Buckler. 1994. Isolation of genes from quences have been deposited in Genbank with the follow- complex sources of mammalian genomic DNA using ing accession numbers: H55062-H55737. exon amplification. Nature Genet. 6: 98-105.

222 ~11GENOME RESEARCH Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

CHROMOSOME 22 EXON LIBRARY

Cohen, D., I. Chumakov, and J. Weissenbach. 1993. A and D3 from human small nuclear ribonucleoproteins: first-generation physical map of the human genome. Evidence for a direct D1-D2 interaction. Proc. Natl. Acad. Nature 366: 698-701. Sci. 91: 12317-12321.

Corbo, L., J.A. Maley, D.L. Nelson, and C.T. Caskey. Lichter, P., S.A. Ledbetter, D.H. Ledbetter, and D.C. 1990. Direct cloning of human transcripts with HnRNA Ward. 1990. Fluorescence in situ hybridization with Alu from hybrid cell lines. Science 249: 652-655. and L1 polymerase chain reaction probes for rapid characterization of human chromosomes in hybrid cell Delmas, V., D.G. Stokes, and R.P. Perry. 1993. A lines. Proc. Natl. Acad. Sci. 87: 6634-6638. mammalian DNA-binding protein that contains a chromodomain and an SNF2/SWI2-1ike helicase domain. Liu, P., R. Legerski, and M.J. Siciliano. 1989. Isolation of Proc. Natl. Acad. Sci. 90: 2414-2418. human transcribed sequences from human-rodent somatic cell hybrids. Science 246: 813-815. Del Mastro, R., L. Wang, A.D. Simmons, T.D. Gallardo, G.A. Clines, J.A. Ashley, C.J. Hilliard, J.J. Wasmuth, J.D. Lovett, M., J. Kere, and L.M. Hinton. 1991. Direct McPherson, and M. Lovett. 1995. Human selection: A method for the isolation of cDNAs encoded chromosome-specific cDNA libraries: New tools for gene by large genomic fragments. Proc. Natl. Acad Sci identification and genome annotation. Genome Res. 88: 9628-9632. 5: 185-194. Messmer, S., A. Franke, and R. Paro. 1992. Analysis of Duyk, G.M., S. Kim, R.M. Myers, and D.R. Cox. 1990. the functional role of the Polycomb chromo domain in Exon trapping: A genetic screen to identify candidate Drosophila melanogaster.Genes & Dev. 6: 1241-1254. transcribed sequences in cloned mammalian genomic DNA. Proc. Natl. Acad. Sci. 87: 8995-8999. Munroe, D.J., R. Loebbert, E. Bric, T. Whitton, D. Prawitt, D. Vu, A. Buckler, A. Winterpracht, B. Zabel, and Green, E. D. and M.V. Olson. 1990. Systematic screening D.E. Housman. 1995. Systematic screening of an arrayed of yeast artificial-chromosome libraries by use of the cDNA library by PCR. Proc. Natl. Acad. Sci. polymerase chain reaction. Proc. Natl. Acad. Sci. 92: 2209-2213. 87: 1213-1217. Okubo, K., N. Hori, R. Matoba, T. Niiyama, A. H~istbacka, J., A. de la Chapelle, M. Mahtani, G. Clines, Fukushima, Y. Kojima, and K. Matsubara. 1992. Large M.P. Reeve, M. Daly, B. Hamilton, K. Kusumi, B. Trivedi, scale cDNA sequencing for analysis of quantitative and A. Weaver, M. Lovett, A. Buckler, I. Kaitila, and E.S. qualitative aspects of gene expression. Nature Genet. Lander. 1994. The diastrophic dysplasia gene encodes a 2: 173-179. novel sulfate transporter: Positional cloning by fine-structure linkage disequilibrium mapping. Cell Paro, R. and D.S. Hogness. 1991. The Polycomb protein 78: 1073-1087. shares a homologous domain with a heterochromatin-associated protein of Drosophila. Proc. Hastie, N.D. and J.O. Bishop. 1976. The expression of Natl. Acad. Sci. 88: 263-267. three abundance classes of messenger RNA in mouse tissues. Cell 9: 761-774. Parimoo, S., S.R. Patanjali, H. Shulka, D.D. Chaplin, and S.M. Weissman. 1991. cDNA selection: Efficient PCR The Huntington's Disease Collaborative Research Group. approach for the selection of cDNAs in large genomic 1993. A novel gene containing a trinucleotide repeat DNA fragments. Proc. Natl. Acad. Sci. 88: 9623-9627. that is expanded and unstable on Huntington's disease chromosomes. Cell 72: 971-983. Pemble, S., K.R. Schroeder, S.R. Spencer, D.J. Meyer, E. James, T.C. and S.C.R. Elgin. 1986. Identification of a Hallier, H.M. Bolt, B. Ketterer, and J.B. Taylor. 1994. nonhistone chromosomal protein associated with Human glutathione S-transferase theta (GSTT1): cDNA heterochromatin in Drosophila melanogaster and its cloning and the characterization of a genetic gene. Mol. Cell. Biol. 6: 3862-3872. polymorphism. Biochem. J. 300: 271-276.

Jones, K.W., M. Chevrette, M.H. Shapero, and R.E.K. Rashtchian, A., G.W. Buchman, D.M. Schuster, and M. Fournier. 1992. Generation of region and species-specific Berninger. 1991. Uracil DNA glycosylase-mediated expressed gene probes from somatic cell hybrids. Nature cloning of polymerase chain reaction-amplified DNA: Genet. 1: 278-283. Application to genomic and cDNA cloning. Anal. Biochem. 206: 91-97. Khan, A.S., A.S. Wilcox, M.H. Polymeropoulos, J.A. Hopkins, T.J. Stevens, M. Robinson, A.K. Orpana, and Sanger, F., S. Nicklen, and A.R. Coulson. 1977. DNA J.M. Sikela. 1992. Single pass sequencing and physical sequencing with chain terminating inhibitors. Proc. Natl. and genetic mapping of human brain cDNAs. Nature Acad. Sci. 74: 5463-5467. Genet. 2: 180-185. Singh, P.B., J.R. Miller, J. Pearce, R. Kothary, R.D. Burton, Lehmeier, T., V.A. Raker, H. Hermann, and R. R. Paro, T.C. James, and S.J. Gaunt. 1991. A. sequence Luehrmann. 1994. cDNA cloning of the Sm proteins D2 motif found in a Drosophila heterochromatin protein is

GENOME RESEARCH @ 223 Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

TROFATTER ET AL. conserved in animals and plants. Nucleic Acids Res. 19: 789-794.

Tanihara, H., K. Sano, R.L. Heimark, T. St.John, and S. Suzuki. 1994. Cloning of five cadherins clarifies characteristic features of cadherin extracellular domain and provides further evidence for two structurally different types of cadherin. Cell Adhesion Commun. 2: 15-26.

Trofatter, J.A., M.M. MacCollin, J.L. Rutter, J.R. Murrell, M.P. Duyao, D.M. Parry, R. Eldridge, N. Kley, A.G. Menon, K. Pulaski, V. Haase, C. Ambrose, D. Munroe, C. Bove, J.L. Haines, R.L. Martuza, M.E. MacDonald, B.R. Seizinger, M.P. Short, A.J. Buckler, and J.F. Gusella. 1993. A novel moesin-, ezrin-, radixin-like gene is a candidate for the neurofibromatosis 2 tumor suppressor. Cell 72: 791-800.

Vidal, S.M., D. Malo, K. Vogan, E. Skamene, and P. Gros. 1993. Natural resistance to infection with intracellular parasites: Isolation of a candidate for Bcg. Cell 73: 469-485.

Vulpe, C., B. Levinson, S. Whitney, S. Packman, and J. Gitschier. 1993. Isolation of a candidate gene for Menkes disease and evidence that it encodes a copper-transporting ATPase. Nature Genet. 3: 7-13.

Walker, A.P., F. Muscatelli, and A.P. Monaco. 1993. Isolation of the human Xp21 glycerol kinase gene by positional cloning. Hum. Mol. Genet. 2: 107-114.

Wong, K. and L. Cantley. 1994. Cloning and characterization of a human phosphatidylinositol 4-kinase. J. Biol. Chem. 269: 28878-28884.

Received June 15, 1995; accepted in revised form September 22, 1995.

224 ~I GENOME RESEARCH Downloaded from genome.cshlp.org on September 26, 2021 - Published by Cold Spring Harbor Laboratory Press

An expression-independent catalog of genes from human chromosome 22.

J A Trofatter, K R Long, J R Murrell, et al.

Genome Res. 1995 5: 214-224 Access the most recent version at doi:10.1101/gr.5.3.214

References This article cites 40 articles, 21 of which can be accessed free at: http://genome.cshlp.org/content/5/3/214.full.html#ref-list-1

License

Email Alerting Receive free email alerts when new articles cite this article - sign up in the box at the Service top right corner of the article or click here.

To subscribe to Genome Research go to: https://genome.cshlp.org/subscriptions

Copyright © Cold Spring Harbor Laboratory Press