The G Protein-Coupled Receptor Repertoires of Human and Mouse
Total Page:16
File Type:pdf, Size:1020Kb
The G protein-coupled receptor repertoires of human and mouse Demetrios K. Vassilatis*†, John G. Hohmann*, Hongkui Zeng*, Fusheng Li, Jane E. Ranchalis, Marty T. Mortrud, Analisa Brown, Stephanie S. Rodriguez, John R. Weller, Abbie C. Wright, John E. Bergmann, and George A. Gaitanaris Primal, Incorporated, 1124 Columbia Street, Seattle, WA 98104 Communicated by William A. Goddard III, California Institute of Technology, Pasadena, CA, January 21, 2003 (received for review October 3, 2002) Diverse members of the G protein-coupled receptor (GPCR) super- To address these issues, we conducted a comprehensive family participate in a variety of physiological functions and are analysis that defined the endoGPCR superfamilies of two mam- major targets of pharmaceutical drugs. Here we report that the malian species, human and mouse.‡ Further analyses revealed repertoire of GPCRs for endogenous ligands consists of 367 recep- evolutionary relationships indicative of a high level of nonre- tors in humans and 392 in mice. Included here are 26 human and dundancy in the endoGPCR superfamily as well as phylogenetic 83 mouse GPCRs not previously identified. A direct comparison of relationships that are predictive of ligands and͞or functions for GPCRs in the two species reveals an unexpected level of orthology. numerous endoGPCRs. Expression profiling showed that indi- The evolutionary preservation of these molecules argues against vidual endoGPCRs are expressed in many tissues and that single functional redundancy among highly related receptors. Phyloge- tissues and brain regions express an unexpectedly large number netic analyses cluster 60% of GPCRs according to ligand preference, of endoGPCRs. However, each tissue expresses a unique com- allowing prediction of ligand types for dozens of orphan receptors. bination of endoGPCRs, indicating a combinatorial use of Expression profiling of 100 GPCRs demonstrates that most are endoGPCRs for the regulation of diverse functions. expressed in multiple tissues and that individual tissues express multiple GPCRs. Over 90% of GPCRs are expressed in the brain. Materials and Methods Strikingly, however, the profiles of most GPCRs are unique, yield- Database Mining and Phylogenetic Analyses. The transmembrane ing thousands of tissue- and cell-specific receptor combinations for domain regions of 254 known GPCRs (13) were each used as a PHARMACOLOGY the modulation of physiological processes. query in TBLASTN searches of the National Center for Biotechnol- ogy Information human genome database; GPCR Class A, B, and ammalian G protein-coupled receptors (GPCRs) consti- C Hidden Markov Model models were also used as queries to Mtute a superfamily of diverse proteins with hundreds of search the International Protein Index proteome database (see Fig. members (1). All members have seven transmembrane domains 5 and Supporting Materials and Methods, which are published as but, on the basis of shared sequence motifs, they are grouped into supporting information on the PNAS web site, www.pnas.org). four classes: A, B, C, and F͞S (2). Neighbor-joining phylogenetic trees were built based on class- GPCRs act as receptors for a multitude of different signals. specific alignments (HMMALIGN). Bootstrap consensus trees were One major group, referred to as chemosensory GPCRs (csG- plotted by using TREEVIEW (14). PCRs), are receptors for sensory signals of external origin that are sensed as odors, pheromones, or tastes (3–5). Most other Expression Analyses. RT-PCR͞RNA was prepared (Totally RNA GPCRs respond to endogenous signals, such as peptides, lipids, kit, Ambion, Austin, TX) from dissected tissues of adult mice neurotransmitters, or nucleotides (6, 7). These GPCRs, the and treated extensively with DNase. PCR was performed with subject of this report, are involved in numerous physiological cDNA prepared from 2, 20, or 200 ng of RNA in 25-l reactions processes, including the regulation of neuronal excitability, for 37 cycles. The ALPHA EASE program (Alpha Innotech, San metabolism, reproduction, development, hormonal homeosta- Leandro, CA) was used to estimate relative expression levels. sis, and behavior. Considering that endogenous ligands are required for regulating these processes, in this report, we refer In Situ Hybridization. In situ hybridization was performed as to this group of GPCRs as ‘‘endoGPCRs.’’ described (15) with minor modifications. A characteristic feature of endoGPCRs is that they are differentially expressed in many cell types in the body. This Results feature, together with their structural diversity, has proved The endoGPCR Repertoire. To define the full complement of important in medicinal chemistry. Of all currently marketed endoGPCRs in human and mouse, we embarked on a multistep drugs, Ͼ30% are modulators of specific endoGPCRs (8). How- process involving the identification of first known and then novel ever, only 10% of endoGPCRs are targeted by these drugs, genes. Initially, we searched the available public literature and emphasizing the potential of the remaining 90% of the GPCR sequence databases for human and mouse endoGPCRs and then superfamily for the treatment of human disease. performed sequence comparisons. This identified a unique gene Despite the importance of GPCRs in physiology and disease, set for each species and defined the human and mouse orthologs. the size of the endoGPCR superfamily is still uncertain. Celera’s In total, 338 endoGPCRs were identified in human and 301 in initial analysis of the human genome found 616 GPCRs, but not mouse. Sequence alignments indicated that 260 of these mole- distinguish among endoGPCRs and csGPCRs (9), whereas the cules are common to both species (Fig. 5). International Human Genome Sequencing Consortium reported We then asked whether the remaining endoGPCR genes (78 a total of 569 ‘‘rhodopsin-like’’ (i.e., Class A) GPCRs (10), implying the inclusion of odorant receptors but the exclusion of the many endoGPCRs that are not ‘‘rhodopsin-like.’’ Focusing Abbreviations: GPCR, G protein-coupled receptor; csGPCR, chemosensory GPCR. only on intronless genes, Takeda et al. (11) found 178 intronless Data deposition: The sequences reported in this paper have been deposited in the GenBank nonchemosensory GPCRs. In addition, whereas most, if not all, database (accession nos. AY255527–AY255623). csGPCRs are known to be selectively expressed in specific *D.K.V., J.G.H., and H.Z. contributed equally to this work. subsets of cells (11, 12), the expression patterns of most †To whom correspondence should be addressed. E-mail: [email protected]. endoGPCRs are incomplete or unknown. ‡The sequences can be accessed at www.primalinc.com. www.pnas.org͞cgi͞doi͞10.1073͞pnas.0230374100 PNAS ͉ April 15, 2003 ͉ vol. 100 ͉ no. 8 ͉ 4903–4908 Downloaded by guest on September 30, 2021 human and 41 in mouse), which did not show a counterpart in the as belonging to the same ‘‘family’’ are at least 40% homologous other species, might have undiscovered orthologs. Using the non- in protein sequence. We therefore assigned endoGPCRs that shared endoGPCRs as queries, the public human and mouse showed at least 40% sequence homology to the same family. In genome sequence databases were searched for orthologous genes this manner, 95 different families of endoGPCRs were identi- using TBLASTN (16). These studies identified mouse orthologs for 56 fied, including 18 families of orphan receptors that have not been of the human endoGPCRs, but no orthologs could be found for the previously described (Fig. 5). These studies assigned 12 of 143 remaining 22 (Fig. 5). No human orthologs were detected for 38 of human and 49 of 178 mouse orphan endoGPCRs to seven the mouse genes. Eight additional mouse genes were discovered different families of receptors that interact with known ligands using 11 rat trace amine receptor sequences as queries. In total, 35 and thus can be predicted to recognize ligands similar to those of these 46 mouse genes belonged to the trace amine receptor (17) detected by other family members. and Mas1-related gene families (18, 19). These studies increased the To further investigate sequence–ligand relationships among number of endoGPCRs to 341 in human and 365 in mouse, with 319 human endoGPCRs, we conducted a phylogenetic analysis. endoGPCRs shared by the two species (Fig. 5). endoGPCRs were aligned to the class-specific Hidden Markov We next undertook an exhaustive search for new human Model profile model by using HMMALIGN (21). These alignments endoGPCR genes. Two different approaches were used. In the were used for the construction of phylogenetic trees by CLUSTAL first, we used a homology-based strategy to search the human W (22). The phylogenetic trees were then overlaid with infor- genome sequence database for genes encoding endoGPCRs. mation on the ligand specificities of individual receptors, where Two hundred fifty-four known endoGPCRs, representative of all available. classes, were each used as an independent query in TBLASTN The combined phylogenetic͞ligand analyses of human endo- searches of all human chromosomes. These searches yielded GPCRs are shown in Fig. 1. The phylogenetic tree of the Class Ϸ500,000 matches, which were first reduced to Ϸ50,000 unique A receptors, the largest set, was composed of a number of major matches and then to 10,000 matches with homology to known branches that were progressively subdivided into smaller GPCRs (see Materials and Methods). Among these, hits repre- branches containing increasingly related endoGPCRs. The three senting