BMC Genomics BioMed Central

Research article, Open Access

The G protein-coupled receptor subset of the dog genome is more similar to that in humans than rodents

Tatjana Haitina1, Robert Fredriksson1, Steven M Foord2, Helgi B Schiöth*1 and David E Gloriam*2

Address: 1Department of Neuroscience, Functional Pharmacology, Uppsala University, BMC, Box 593, 751 24, Uppsala, Sweden and 2GlaxoSmithKline Pharmaceuticals, New Frontiers Science Park, 3rd Avenue, Harlow CM19 5AW, UK

* Corresponding authors: Helgi B Schiöth and David E Gloriam

Published: 15 January 2009 Received: 20 August 2008 Accepted: 15 January 2009 BMC Genomics 2009, 10:24 doi:10.1186/1471-2164-10-24 This article is available from: http://www.biomedcentral.com/1471-2164/10/24 © 2009 Haitina et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: The dog is an important model organism and it is considered to be closer to humans than rodents regarding metabolism and responses to drugs. The close relationship between humans and dogs over many centuries has led to the diversity of the canine species, important genetic discoveries and an appreciation of the effects of old age in another species. The superfamily of G protein-coupled receptors (GPCRs) is one of the largest families in most mammals and the most exploited in terms of drug discovery. An accurate comparison of the GPCR repertoires in dog and human is valuable for the prediction of functional similarities and differences between the species.

Results: We searched the dog genome for non-olfactory GPCRs and obtained 353 full-length GPCR gene sequences, 18 incomplete sequences and 13 pseudogenes. We established relationships between human, dog, rat and mouse GPCRs, resolving orthologous pairs and species-specific duplicates. We found that 12 dog GPCR genes are missing in humans while 24 human GPCR genes are not part of the dog GPCR repertoire. More orthologous pairs are conserved between dog and human than between human and either mouse or rat. In almost all cases the differences observed between the dog and human genomes coincide with other variations in the rodent species. Several GPCR gene expansions characteristic for rodents are not found in dog.

Conclusion: The repertoire of dog non-olfactory GPCRs is more similar to the repertoire in humans than to the one in rodents. The comparison of the dog, human and rodent repertoires revealed several examples of species-specific gene duplications and deletions. This information is useful in the selection of model organisms for pharmacological experiments.

Background

The dog is an important model in biomedical research for several reasons. Dogs have a unique evolutionary history. Since their domestication from the grey wolf in East Asia about 100 000 years ago dogs have shared living space and food sources with humans and have been selectively inbred with periodic population bottlenecks [1,2]. The American Kennel Club (AKC) and similar organizations


worldwide have provided easily accessible and extensive genealogies which provide unique opportunities for genetic analyses. Dogs show a high prevalence of specific diseases, such as blindness, heart disease, cataracts, epilepsy and deafness, that are relevant for human biology [3,4]. Dogs are susceptible to a wide variety of genetic diseases. For example, dogs have cancers that seem to affect just one breed or a few closely related breeds. The dog also has more similarities in general physiology, anatomy, disease susceptibility, morphological variation and behavioural traits to humans than the most frequently used experimental animals, mouse and rat. Regulatory authorities mandate the use of non-rodent species in safety assessments for new medicines and dogs are the most frequent choice. Furthermore, the dog is also an important model in evolutionary analysis in which its relative divergence in relation to other mammalian lineages allows for valuable comparisons. The sequence of the dog (Canis familiaris) genome has had contributions from two breeds, the boxer [5] and the poodle [6]. Analyses have revealed long-range haplotypes across the entire genomes, crucial for defining the nature of genetic diversity within and across breeds [5]. These maps provide good opportunities for genome-wide association studies to identify genes responsible for diseases and traits.

Automated gene predictions offer fast annotation of genomes but they are error-prone and need to be followed up by careful manual curation of the coding sequences. For instance, the Genscan gene prediction program has a sensitivity and specificity of about 90% for detecting exons, leading to frequent errors in multi-exon genes [7]. Our recent annotation of the G protein-coupled receptors (GPCRs) within the chicken genome showed that over 60% of the Genscan gene predictions with a human ortholog needed curation. Curation markedly increased the quality of the dataset, raising the average percentage identity between the human-chicken one-to-one orthologous pairs from 56% to 73% [8]. The quality of protein sequences has a significant impact on phylogenetic analyses and calculations of evolutionary distances. Accurate comparisons between dog and human, such as correct assignment of orthologous pairs, are crucial for the design and interpretation of physiological and pharmacological studies in which results are inferred between the species.

The superfamily of GPCRs is one of the largest groups of proteins within most mammals. GPCRs are signal mediators that have a prominent role in most major physiological processes at both the central and peripheral level [9]. It has been estimated that about 80% of all known hormones and neurotransmitters activate cellular signal transduction mechanisms via GPCRs [10]. Many GPCRs are able to form and function as heterodimers of two GPCR monomers (for example, GABABR1-GABABR2, TAS1R3-TAS1R1 and TAS1R3-TAS1R2) or even as heterodimers of a GPCR monomer and a receptor activity-modifying protein (RAMP) (for example, PTHR1-RAMP2) [11]. The key common structural components of the GPCRs are the seven transmembrane α-helices that span the cell membrane. GPCRs represent between 30-45% of the current drug targets [12,13] and many pharmaceutical companies devote up to 30% of their drug discovery efforts toward them [14]. Even so, they have an enormous unexploited therapeutic potential as drugs in the clinic target only 30 of the approximately 400 non-olfactory GPCRs [15].

The human GPCR repertoire has previously been divided into five main families (GRAFS): Glutamate (clan C), Rhodopsin (clan A, includes the olfactory receptors), Adhesion (clan B2), Frizzled/Taste2 and Secretin (clan B) [16]. The GRAFS families are found in all bilateral species [17]. The Rhodopsin family is the largest and includes hundreds of olfactory receptors (ORs). The Rhodopsin family also contains most of the GPCR drug targets, mainly amine and peptide receptors [18]. In humans, the second largest family is the Adhesion family. Adhesion GPCRs are characterized by long extracellular N-termini. Most of the receptors in this family are still orphans (i.e. their endogenous ligands are unknown) [19,20]. The Glutamate family includes receptors that are activated by glutamate, GABA and calcium as well as the two groups of sweet and umami taste receptors (TAS1Rs) and vomeronasal receptors type 2 (V2Rs) that recognize pheromones. The Secretin family has ligands that are large peptides such as secretin, parathyroid hormone, glucagon, glucagon-like peptide, calcitonin, vasoactive intestinal peptide, growth hormone releasing hormone and pituitary adenylyl cyclase activating protein. The Frizzled receptors bind, among others, the Wnt ligands and play an important role in embryonic development. The Taste2 or bitter taste receptor family was originally assigned together with the Frizzled family, but they form two very distinct clusters [16]. It is not clear if the Frizzled and Taste2 groups have a common evolutionary origin and in this study we describe them as two different families. Another family is the vomeronasal 1 receptors, abbreviated V1R, which does not display similarity to any of the GRAFS groups. The V1R family has many members in rodents [21], but very few in humans and was therefore not included in the original GRAFS classification. A consensus list of all human, mouse and rat 'non sensory' GPCRs is maintained by IUPHAR [22]. The GPCR repertoire has also been studied in detail in non-mammalian vertebrates such as the teleost pufferfish [23] and in invertebrates such as the lancelet [24] and the mosquito [25].

The sense of smell, or odorant detection, is strongly evolved in dogs for which 876 genes have been predicted


to encode olfactory receptors, a figure almost double the human repertoire and comparable to that of rodents [26]. The vomeronasal V2R receptors are also thought to serve an olfactory function [21], but surprisingly no such functional genes were found in dog, only pseudogenes. Rodents have a largely expanded V1R repertoire with over 100 genes, whereas dogs have 8 and humans have no V1Rs [21]. The dog also has 14 Taste2 receptors for bitter taste [27]. The non-sensory GPCRs have not previously been studied in dog.

In this study we provide the subset of the non-olfactory GPCRs in the dog genome. We have made comprehensive searches for dog GPCR genes, put extensive effort into manually correcting coding sequences and performed detailed phylogenetic analyses. Furthermore, we provide a comparison between the GPCR repertoires in human, dog, mouse and rat.

Results

We performed a comprehensive search for non-olfactory GPCR genes in the dog genome. A starting dataset was produced from BLASTN searches in the GenBank non-redundant database. This contained 325 full-length GPCRs and 5 pseudogenes. Around 13% of these needed manual curation because they had an incorrect composition of exons. TBLASTN and BLAT searches in the dog genome assembly completed the analysis. In total, 353 full-length sequences, 18 incomplete sequences and 13 pseudogenes were retrieved. A full-length dog GPCR gene has been defined as one that contains an intact transmembrane domain. The incomplete GPCR gene sequences are missing exons or parts thereof because they reside in genomic regions that have not been sequenced. It is also possible that whole GPCR genes are missing in the dog genome assembly and these can be very difficult to distinguish from those that do not exist in this species unless the specific genomic region is carefully analysed. The gene sequences of MAS1, NPY2R, GPR52 and GPR37L1 were found to include frameshifts and/or stop codons in the Broad Institute genome assembly (from the boxer). However, in a second BLAST search of these sequences in the TIGR poodle assembly [28] these 4 genes were found to be intact/full-length. This may either reflect sequencing issues or indicate real differences between breeds.

The dog GPCR gene sequences were divided into families in line with the GRAFS classification: Glutamate, Rhodopsin, Adhesion, Frizzled, Secretin, Taste2 and V1R families [29]. 18 genes do not have sequence similarity to any GPCR family and these were treated as a separate group called Other GPCRs according to our previous classification of the rat GPCRs [30], the only difference being that GPR149 was here moved to the Rhodopsin family. The numbers of genes in each GPCR family, including previously published sensory GPCRs for human, dog, mouse and rat, are presented in Table 1. A complete table of all GRAFS GPCR genes in dog, human, rat and mouse is presented in Additional file 1. The sequences of all dog GPCRs obtained in this study are included in Additional file 2.

We performed phylogenetic analyses of all dog and human GRAFS GPCR protein sequences and identified orthologs and species-specific genes. The latter represent paralogous genes that have arisen or been lost specifically in either human or dog or the lineages leading to them. Consensus trees of 100 Maximum Parsimony phylogenetic trees and the average amino acid sequence identities of receptor orthologs are presented in Figure 1 (Rhodopsin family) and Figure 2 (Glutamate, Adhesion, Frizzled and Secretin families). Dog genes missing in human are listed in Table 2, whereas human GPCR genes not found in the dog and/or rodent genomes are listed in Table 3.

We identified 267 Rhodopsin GPCR genes in dog and this can be compared with the corresponding number in human, which is 284 (Table 1 and Additional file 1). The average protein sequence identity is 86% between dog

Table 1: The number of GPCR genes in human, dog, mouse and rat.

GPCR family              Dog    Dog pseudogenes   Human   Mouse   Rat
Adhesion                 37     1                 33      30      30
Frizzled                 11     0                 11      11      10
Glutamate                22     0                 22      22      22
Rhodopsin non-olfactory  267    12                284     320     297
Secretin                 15     0                 15      15      15
Taste2                   14#    5#                25§     35§     35§
V1R                      8*     33*               5*      187*    106*
V2R                      0*     9*                0*      121*    79*
Other GPCRs              18     0                 18      19      19

*[21]; #[27]; §[30]
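As a rough way of reading Table 1, the sketch below sums the absolute differences in family size between dog and each of the other species. The distance measure, the function name `repertoire_distance` and the restriction to the six non-sensory families are assumptions introduced here for illustration; they are not a method used in this study, and family sizes alone say nothing about orthology.

```python
# Family sizes copied from Table 1 (non-sensory families only).
FAMILIES = ["Adhesion", "Frizzled", "Glutamate", "Rhodopsin", "Secretin", "Other"]
COUNTS = {
    "dog":   [37, 11, 22, 267, 15, 18],
    "human": [33, 11, 22, 284, 15, 18],
    "mouse": [30, 11, 22, 320, 15, 19],
    "rat":   [30, 10, 22, 297, 15, 19],
}


def repertoire_distance(species_a, species_b):
    """Sum of absolute differences in family size (illustrative measure only)."""
    return sum(abs(a - b) for a, b in zip(COUNTS[species_a], COUNTS[species_b]))


# Distance from dog to each of the other three species.
distances = {sp: repertoire_distance("dog", sp) for sp in ("human", "mouse", "rat")}
print(distances)  # {'human': 21, 'mouse': 61, 'rat': 39}
```

Under this simple measure the dog counts lie closest to the human counts, mainly because of the Rhodopsin family.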


Table 2: Dog GPCR genes that are missing or pseudogenes in human.

Family     GPCR         Human       Dog      Rat      Mouse
Adhesion   EMR2b        missing     present  missing  missing
Adhesion   EMR2c        missing     present  missing  missing
Adhesion   EMR2d        missing     present  missing  missing
Adhesion   EMR4b        missing     present  missing  missing
Adhesion   EMR4c        missing     present  missing  missing
Rhodopsin  GPR166P      pseudogene  present  present  pseudogene
Rhodopsin  GPR33        pseudogene  present  present  present
Rhodopsin  GPR79        pseudogene  present  present  present
Rhodopsin  TAAR4        pseudogene  present  present  present
Rhodopsin  GPR141b      missing     present  missing  missing
Rhodopsin  MRGPR-like1  missing     present  missing  missing
Rhodopsin  TRHR3        missing     present  missing  missing
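The rows of Table 2 can be tallied with a few lines of Python. The sketch below counts how many of the 12 dog genes are missing versus pseudogenes in human, and how many fall into the EGF-TM7 (EMR), MRGPR, TAAR and FPR groups. Assigning genes to those groups by name prefix (`SUBGROUP_PREFIXES`) is a simplification assumed here for illustration, not the classification procedure used in the study.

```python
from collections import Counter

# Rows copied from Table 2: gene, then status in human, dog, rat, mouse.
TABLE2 = """
EMR2b missing present missing missing
EMR2c missing present missing missing
EMR2d missing present missing missing
EMR4b missing present missing missing
EMR4c missing present missing missing
GPR166P pseudogene present present pseudogene
GPR33 pseudogene present present present
GPR79 pseudogene present present present
TAAR4 pseudogene present present present
GPR141b missing present missing missing
MRGPR-like1 missing present missing missing
TRHR3 missing present missing missing
"""

rows = [line.split() for line in TABLE2.strip().splitlines()]

# How the dog genes appear in human: missing or pseudogene.
human_status = Counter(row[1] for row in rows)

# Hypothetical prefix rule for the four variable subgroups.
SUBGROUP_PREFIXES = ("EMR", "MRGPR", "TAAR", "FPR")
in_subgroups = [row[0] for row in rows if row[0].startswith(SUBGROUP_PREFIXES)]

print(dict(human_status))   # {'missing': 8, 'pseudogene': 4}
print(len(in_subgroups))    # 7
```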

Table 3: Human GPCR genes that are missing (not found in genome assemblies) or are pseudogenes in dog and/or rodents.

Family     GPCR      Human    Dog         Rat         Mouse
Adhesion   EMR2      present  present     missing     missing
Adhesion   EMR3      present  present     missing     missing
Adhesion   GPR144    present  present     pseudogene  pseudogene
Rhodopsin  AGTR1     present  present     missing     missing
Rhodopsin  CCR1      present  present     missing     missing
Rhodopsin  FPR1      present  missing     present     present
Rhodopsin  FPRL2     present  missing     missing     missing
Rhodopsin  GPR109B   present  missing     missing     missing
Rhodopsin  GPR135    present  missing     present     present
Rhodopsin  GPR148    present  missing     missing     missing
Rhodopsin  GPR150    present  missing     present     present
Rhodopsin  GPR32     present  missing     pseudogene  pseudogene
Rhodopsin  GPR42     present  missing     missing     missing
Rhodopsin  GPR75     present  missing     present     present
Rhodopsin  GPR78     present  pseudogene  missing     missing
Rhodopsin  HTR1E     present  present     missing     missing
Rhodopsin  MAS1L     present  missing     missing     missing
Rhodopsin  MCHR2     present  present     missing     missing
Rhodopsin  MLNR      present  present     missing     pseudogene
Rhodopsin  MRGPRE    present  missing     present     present
Rhodopsin  MRGPRX1   present  missing     missing     missing
Rhodopsin  MRGPRX2   present  present     missing     missing
Rhodopsin  MRGPRX3   present  missing     missing     missing
Rhodopsin  MRGPRX4   present  missing     missing     missing
Rhodopsin  NPBWR2    present  missing     pseudogene  pseudogene
Rhodopsin  OPN1LW    present  missing     missing     missing
Rhodopsin  OXER1     present  present     missing     missing
Rhodopsin  P2RY11    present  present     missing     missing
Rhodopsin  P2RY4     present  pseudogene  present     present
Rhodopsin  P2RY8     present  present     missing     missing
Rhodopsin  RXFP4     present  pseudogene  pseudogene  present
Rhodopsin  SSTR4     present  missing     present     present
Rhodopsin  TAAR1     present  pseudogene  present     present
Rhodopsin  TAAR6     present  missing     present     present
Rhodopsin  TAAR8     present  missing     missing     missing
Rhodopsin  TAAR9     present  missing     present     present
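Table 3 can be summarized in the same way. The sketch below counts the dog status of each human gene and lists the genes that are absent or pseudogenized in dog while intact in both rodents. The variable names (`dog_status`, `dog_only_losses`) and the "intact in both rodents" filter are illustrative choices made here, not definitions from the study.

```python
from collections import Counter

# Rows copied from Table 3: gene, then status in human, dog, rat, mouse.
TABLE3 = """
EMR2 present present missing missing
EMR3 present present missing missing
GPR144 present present pseudogene pseudogene
AGTR1 present present missing missing
CCR1 present present missing missing
FPR1 present missing present present
FPRL2 present missing missing missing
GPR109B present missing missing missing
GPR135 present missing present present
GPR148 present missing missing missing
GPR150 present missing present present
GPR32 present missing pseudogene pseudogene
GPR42 present missing missing missing
GPR75 present missing present present
GPR78 present pseudogene missing missing
HTR1E present present missing missing
MAS1L present missing missing missing
MCHR2 present present missing missing
MLNR present present missing pseudogene
MRGPRE present missing present present
MRGPRX1 present missing missing missing
MRGPRX2 present present missing missing
MRGPRX3 present missing missing missing
MRGPRX4 present missing missing missing
NPBWR2 present missing pseudogene pseudogene
OPN1LW present missing missing missing
OXER1 present present missing missing
P2RY11 present present missing missing
P2RY4 present pseudogene present present
P2RY8 present present missing missing
RXFP4 present pseudogene pseudogene present
SSTR4 present missing present present
TAAR1 present pseudogene present present
TAAR6 present missing present present
TAAR8 present missing missing missing
TAAR9 present missing present present
"""

rows = [line.split() for line in TABLE3.strip().splitlines()]

# Status of each human gene in the dog genome (column 3 of each row).
dog_status = Counter(row[2] for row in rows)

# Genes not intact in dog but intact in both rat and mouse.
dog_only_losses = [
    gene for gene, _human, dog, rat, mouse in rows
    if dog != "present" and rat == "present" and mouse == "present"
]

print(dict(dog_status))      # {'present': 12, 'missing': 20, 'pseudogene': 4}
print(len(dog_only_losses))  # 10
```

The 20 missing and 4 pseudogene entries add up to the 24 human GPCR genes that are not part of the dog repertoire.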


Figure 1: Consensus tree of the human (hs) and dog (cf) Rhodopsin family based on 100 Maximum Parsimony phylogenetic trees. The sequence alignment used for the phylogenetic calculation was based on the transmembrane segments. A pie-chart displays the average pairwise percentages of protein sequence identity between human, mouse and dog one-to-one orthologs (dog/human 86.3%, mouse/human 83.3%, dog/mouse 81.8%).
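The average pairwise identities reported in the figures could be computed along the following lines. This is a minimal sketch that assumes the sequences are already aligned and that identity is counted only over columns where neither sequence has a gap; the study does not state which denominator was used, and the function names and toy sequences below are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


def mean_identity(pairs) -> float:
    """Average percent identity over a list of aligned ortholog pairs."""
    values = [percent_identity(a, b) for a, b in pairs]
    return sum(values) / len(values)


# Toy aligned fragments (made up for illustration, not real receptor sequences).
toy_pairs = [("MKTLLV-A", "MKTLIVGA"), ("GPCRAAAA", "GPCRAAAT")]
print(round(mean_identity(toy_pairs), 1))
```

Other conventions, such as dividing by the full alignment length or by the shorter sequence, give different values, so identities from different studies are only comparable when the convention matches.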


Figure 2: Consensus trees of the human (hs) and dog (cf) Adhesion, Frizzled, Glutamate and Secretin GPCR families. Each tree is based on 100 Maximum Parsimony trees. The sequence alignments used for phylogenetic calculations were based on the transmembrane segments. For each GPCR family a pie-chart displays the average pairwise percentages of protein sequence identity between human, mouse and dog one-to-one orthologs:

Family     dog/human  mouse/human  dog/mouse
Adhesion   83.1%      79.9%        78.1%
Frizzled   96.9%      95.3%        94.2%
Glutamate  89.0%      86.5%        86.0%
Secretin   88.5%      85.7%        83.4%
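The pie-chart values of Figure 2 can be checked for two patterns: the ranking of families by dog/human identity, and whether dog/human identity is the highest of the three species pairs in every family. The short sketch below does this; the dictionary layout and variable names are illustrative assumptions.

```python
# Average percent identities read from the pie charts in Figure 2.
IDENTITY = {
    "Adhesion":  {"dog/human": 83.1, "mouse/human": 79.9, "dog/mouse": 78.1},
    "Frizzled":  {"dog/human": 96.9, "mouse/human": 95.3, "dog/mouse": 94.2},
    "Glutamate": {"dog/human": 89.0, "mouse/human": 86.5, "dog/mouse": 86.0},
    "Secretin":  {"dog/human": 88.5, "mouse/human": 85.7, "dog/mouse": 83.4},
}

# Families ranked from most to least conserved between dog and human.
ranking = sorted(IDENTITY, key=lambda fam: IDENTITY[fam]["dog/human"], reverse=True)

# True if dog/human > mouse/human > dog/mouse holds in every family.
dog_human_highest = all(
    v["dog/human"] > v["mouse/human"] > v["dog/mouse"] for v in IDENTITY.values()
)

print(ranking)            # ['Frizzled', 'Glutamate', 'Secretin', 'Adhesion']
print(dog_human_highest)  # True
```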


and human one-to-one orthologs and this is higher than is observed for each of these two species to the mouse orthologs. For ease of discussion we present the Rhodopsin family of GPCRs according to their broad phylogenetic grouping [16] (see Additional file 1).

The Rhodopsin α subfamily in dog is missing the receptors GPR148, red opsin (OPN1LW), TAAR6, TAAR8 and TAAR9 (Table 3). In the rodent genomes, three of these receptors (GPR148, OPN1LW and TAAR8) are absent, whereas two (TAAR6 and TAAR9) are present. In dog GPR78 and TAAR1 are pseudogenes while TAAR4 is a full-length/intact gene in contrast to its human ortholog, which is a pseudogene. The gene sequences of dog ADRA1B, ADRA1D, ADRA2A, DRD4 and MTNR1B are incomplete.

In the Rhodopsin β subfamily one new dog gene, TRHR3, was identified. TRHR3 is not present in humans or rodents and the receptor with the highest amino acid identity, 59%, is the Xenopus laevis thyrotropin-releasing hormone receptor 3 (TRHR3, GenBank accession: CAD12656). Two Rhodopsin β subfamily receptors, GPR75 and GPR150, are missing in dog, but present in human and rodents. The dog NPFFR1 and NPFFR2 gene sequences are incomplete.

In the Rhodopsin γ subfamily the dog lacks the genes for FPR1, FPRL2, GPR32, NPBWR2 and SSTR4, which are all present in human (Table 3). GPR33, which is a pseudogene in human, is a full-length gene in both dog and rodents. In contrast, another gene, RXFP4, is a pseudogene in dog, but full-length in human and mouse. The dog KISS1R was found to have an incomplete sequence in the genome assembly.

In the Rhodopsin δ subfamily we identified one new dog member of the Mas-Related GPCR (MRG) cluster, MRGPR-like1. The dog assembly is missing GPR109B, GPR42 (FFAR1L), MAS1L, MRGPRE, MRGPRX1, MRGPRX3 and MRGPRX4, which are all present in human. GPR79 is a full-length gene in both dog and rodents, but is a pseudogene in human. In contrast, P2RY4 is a full-length gene in human and rodents, but a pseudogene in dog. The dog MRGPRX2 gene sequence is incomplete.

One additional new dog Rhodopsin GPCR was identified, GPR141b. The most similar receptor, human GPR141, is an orphan GPCR. GPR135, which is also an orphan Rhodopsin GPCR, was not found in dog. GPR166P, which is a pseudogene in human, was found to be a full-length gene in dog and appears to be functional. Two additional dog Rhodopsin GPCRs, DARC and GPR88, have only incomplete gene sequences.

Figure 2 displays consensus trees of 100 maximum parsimony phylogenetic trees of the Adhesion, Frizzled, Glutamate and Secretin families of GPCRs. All families are relatively well conserved in terms of sequence identity (in the order Frizzled > Glutamate > Secretin > Adhesion).

The results show that the Glutamate family is well conserved, having 22 orthologous receptor pairs and no species-specific genes in dog and human (Figure 2 and Additional file 1). The average protein sequence identity is 89% between dog and human orthologs and lower for each of these two species to the mouse orthologs. The sequence of dog GRM3 is incomplete.

The Adhesion family displays unconventional orthology relationships between dog and human. All 33 human Adhesion GPCRs are present in the dog genome. Interestingly, the dog also contains an additional 5 full-length genes (EMR2b, EMR2c, EMR2d, EMR4b and EMR4c) and 1 pseudogene, GPR133b. These Adhesion GPCR genes seem to be specific for the dog lineage as they have not been found in other mammals studied [8,30,31]. We performed a phylogenetic analysis based on the 5 dog-specific EMR receptor sequences together with the dog, human, cow and opossum EMR1-EMR4 and CD97. The phylogenetic analysis was based on the transmembrane regions and the resulting consensus tree is presented in Figure 3. The dog and human one-to-one Adhesion receptor orthologs have an average protein sequence identity of 83% and this is higher than each of these species has to its mouse counterparts (Figure 2). GPR144, EMR2 and EMR3, which are full-length in human but pseudogenes or missing in rodents (Table 3), appear to be functional (are full-length) in dog. The gene sequences of BAI1, EMR2d, EMR4c, GPR123 and GPR124 are incomplete.

The Frizzled family is well conserved between dog, mouse and human, having 11 orthologous receptor pairs and no species-specific genes in any of these species (Figure 2 and Additional file 1). A slight difference is observed for the rat Frizzled repertoire in which FZD10 appears as a pseudogene. The average amino acid identity is 96.9% between dog and human Frizzled orthologs. The gene sequence of dog FZD8 is incomplete.

The Secretin family has the same 15 members in human, dog, mouse and rat, i.e. their repertoires are identical. The average protein sequence identity between dog and human Secretin family GPCR orthologs is 88.5% (Figure 2).

The group defined as Other GPCRs includes 18 dog genes. One gene of this group, GPR172A, which is present in human but missing in rodents, was found to be missing in the dog genome. Another gene, TMEM185B, which is a pseudogene in human but full-length in rodents, appears to be functional (is a full-length gene) in dog.


Figure 3: Consensus tree of the EGF-TM7 Adhesion family GPCRs derived from 100 Maximum Parsimony phylogenetic trees. The sequence alignment used for the phylogenetic calculation was based on the transmembrane segments. The tree includes the dog, human, cow and opossum EMR1-EMR4 and CD97 receptors, the dog-specific EMR2b, EMR2c, EMR4b and EMR4c, and the cow EMR2e and EMR2f.

Discussion

In this study we present the overall repertoire of non-olfactory GPCRs in dog and compare it with its counterparts in human, rat and mouse. Comparison of the dog and human GPCR repertoires, tabulated in Tables 2 and 3, shows 12 GPCR genes that are only found in the dog genome. Moreover, 20 human GPCR genes were not found in dog while 4 human GPCR genes were found as pseudogenes in the dog genome. There are a variety of possible underlying reasons and consequences of why some receptors have been lost or duplicated in some species, but not affected in others. The general evolutionary


explanation is that gene repertoires are altered in response to the adaptation of the animals to environmental factors such as the availability of food, disease, predators and appropriate habitat. Behavioural factors such as cooperativity, care of young, learning and social hierarchy also come into play. Receptors would be gained if gene duplication offers a functional advantage by allowing for an increased or altered expression (e.g. change in tissue distribution or expression level) of the protein or the gain of a new function (e.g. a different ligand). If the redundancy is physiologically insignificant, gene duplicates are typically lost by pseudogenization.

The results for the dog genome provide interesting insight into the differential evolutionary pressures among the subgroups of GPCRs. The majority of the differences observed in this study (7 of 12 GPCRs present in dog but not in humans and 12 of 24 GPCRs present in humans but not found in dogs) are found in only four sub-groups: EGF-TM7 (epidermal growth factor GPCRs), MRGPRs (Mas-related GPCRs), TAAR (trace amine-associated receptors) and FPR (formyl peptide receptors). The TAAR family is known to be highly variable between the mammalian species. For example, the number of intact TAAR genes is 5, 15 and 22 in human, mouse and opossum, respectively [32]. In dogs, there are only 2 intact TAAR genes: TAAR4 and TAAR5. The olfactory system is also associated with high interspecies variation at the mammalian level [26]. Pseudogenes are common in the olfactory repertoire, a feature that may relate to its peculiar signalling system, based on an olfactory neuron that has to have a signalling neuron to allow its connection to the apparatus of perception. A relatively dysfunctional GPCR has relatively little consequence as a result. The strong variations in the repertoires of the TAAR family are consistent with a common engine of evolution: sensory perception.

Another sensory system, that perceives sweet or umami tastes, is mediated by three receptors in human, TAS1R1-3 (members of the Glutamate family). In cats TAS1R2 is a pseudogene and is therefore not available to form a critical heterodimer with TAS1R3, and this has resulted in loss of the ability to sense sweet tastes [33]. Dogs, unlike cats, are known to have an appetite for natural sugars. This fact is supported by genetics, as TAS1R1, TAS1R2 and TAS1R3 all have intact gene sequences in the dog genome and thus can encode functional proteins. In contrast, another receptor family that senses bitter tastants, the Taste2 receptors, has fewer members in dog than in many other mammals. The number of Taste2 receptors is 14 in dog, whereas the corresponding figures in human and mouse are 25 and 34, respectively [27]. The number of bitter taste receptors in a species is likely to correlate with exposure to environmental factors vital for survival, as bitter taste is an indicator of poison. Looking at other sensory genes such as the cluster of opsin receptors, the dog, like rodents, is lacking the Red opsin (OPN1LW) gene, which is essential for normal colour vision in human. This difference has however a more specific consequence as compared with changes in the other sensory gene repertoires.

There is a large interest in the Adhesion family receptors, many of which were recently discovered [34,35]. Adhesion receptors have unique configurations of functional domains within their N-termini and these are thought to play different physiological roles by mediating a variety of interactions with extracellular molecules. One group of Adhesion receptors, the EGF-TM7 GPCRs, are equipped with a variable number of epidermal growth factor (EGF) and calcium binding domains and are reported to be important components of the immune system [34]. In dog, two EMR2-like GPCRs have been reported previously [36] and here we present one additional EMR2 and two EMR4 gene duplicates. Moreover, we found additional EMR2 duplicates in cow (see Figure 3). The 5 dog-specific EMR receptors are here termed EMR2b, EMR2c, EMR2d, EMR4b and EMR4c. It has been suggested that EMR2 has a chimeric structure [36]. The seven transmembrane (7TM) segments of EMR2 are most similar to those in EMR3 while the EGF domains in EMR2 are almost identical to those in CD97 [36]. Interestingly, in our phylogenetic analysis based on the 7TM segments (Figure 3), EMR2 and EMR3 orthologs did not cluster together and instead receptor paralogs grouped together. This is in line with the previous hypothesis about chimeric gene structures in this group [36]. We find this pattern to be the same for the human, dog, cow and opossum receptors (Figure 3). The new genes that we found in dog provide additional evidence for the unique evolution of the EMR subfamily of Adhesion GPCRs, which seem to shuffle not only domains within the N-terminal region but also larger segments of the N-termini.

Dogs are commonly used as model organisms in toxicity and dose tests in drug development and it has been proposed that the immune system is more similar between dog and human than between mouse and human [37]. The EGF-TM7 receptors have a role in the immune system [20] and it is intriguing to speculate whether the additional members in dog may give this species an immunologic advantage. Chondroitin sulphate is a native ligand for both EMR2 and CD97, which can also bind decay-accelerating factor (DAF/CD55). The EGF domains in the N-termini of CD97 have been suggested to be essential for DAF/CD55 binding [38,39] while several other Adhesion GPCRs also have N-terminal EGF domains that could compensate for the gene difference in the mammalian gene repertoire. Interestingly, and a bit surprisingly, the formyl peptide receptors, FPR1 and FPRL2, could not be found in the dog genome. These receptors are also believed to participate in immune responses and respond to a large number of various ligands [40]. FPRL1 however has an intact gene sequence and appears to encode a functional receptor. It is possible that FPRL1 could have taken over the functions of two missing receptors in dog, pending that the corresponds to a shared function. The involvement of the FPR and EMR families in the immune system could also be indicative of an immunological selection pressure that affects diverse groups of receptors. In drug development it is crucial to select a model organism with genetics closely reflecting that of human, and a model organism might prove inadequate because of differences in the gene repertoire. A missing dog ortholog may cause difficulties in drug development because the preclinical studies always include at least one non-rodent species, usually dog. Dog is good for the assessment of toxicity: it is easier to spot the effect of a drug in dog and more analytical instruments can be used, e.g. electroencephalography and impedance cardiography. Differences in the immune system could also be potentially important for toxicity testing of a candidate drug, when it is crucial to have a complete and functional immune system.

There are several other differences between the dog and human genome that are mostly related to Rhodopsin GPCRs. The MAS-related GPCRs (MRGPRs) family (in the δ-cluster) shows large variation between humans and dog. Most of the MRGPRs are orphan receptors, but some of them have known native ligands, like β-alanine, BAM8-22, cortistatin and angiotensin 1–7 [41]. This family is also highly variable in rodents and it is reasonable to assume that each of these receptor families is under strong selection pressure that is very species dependent. The Rhodopsin family γ cluster also has several differences between the compared species. The neuropeptide B/W receptor 2 (NPBWR2) was not found in dog and is absent also in rodents and chicken. The somatostatin receptor 4 (SSTR4) seems to be missing only in dog, and all five somatostatin receptors (SSTR1-5) are present in both the human and rodent genomes. However, it needs to be noted that conclusions on the effect of genes not found in dog are somewhat preliminary, as it is possible that genes could be missing due to incompleteness of the genome assembly.

Our extensive dog GPCR gene sequence searches and the manual curation of the coding domains have resulted in an improved dataset compared to what was previously available in the public domain. We compared our dog GPCR dataset to what is available in the NCBI non-redundant (nr) database and found that our dataset contains 28 exclusive and 43 modified/curated full-length gene sequences. 282 of our full-length dog GPCR gene sequences have identical entries in nr. The corresponding numbers for the 13 dog GPCR pseudogenes identified in this study are: 2 identical in nr, 8 not found in nr and 3 modified/curated compared to nr. The higher number and significantly improved quality of the dog GPCR repertoire presented here clearly illustrates the value of manual sequence curation and extensive sequence collation from the genomic sequence.

In summary, we have presented the overall non-olfactory GPCR repertoire in dog and analysed it in relation to the human, rat and mouse counterparts. We have identified new genes and established the relationships of orthologs and species-specific receptors. This study describes in detail gene losses or duplications of GPCRs in both dog and rodents, and this information is useful for the selection of a model organism as it affects how to interpret pharmacological results.

Conclusion

We present the first overall analysis of the non-olfactory GPCR repertoire in dogs and compare this to the versions in mouse, rat and human. The receptor sequences have been manually curated to assure a higher level of completeness and quality. Our results show that the dog GPCR repertoire is more similar to that in human than rodents both with respect to the number of receptor family members and the sequence similarity of orthologs. The comparison of the GPCR repertoires revealed several examples of species-specific gene duplications and losses and these were described in detail both for dog and rodents. This information can be used to guide the selection of a model organism, as gene redundancies or absences can have a crucial effect on the outcome of pharmacological experiments and how they should be interpreted.

Methods

Identification of dog Glutamate, Rhodopsin, Adhesion, Frizzled and Secretin GPCRs

BLASTN searches in the non-redundant database

The human GRAFS GPCR genes were used as queries in individual BLASTN searches [42] against the NCBI non-redundant database [43]. For each query, the accession numbers of the 10 first hits (representing orthologs, paralogs and homologs) were collected and a non-redundant list was obtained, which was used to collect the sequences of the hits using fastacmd of the NCBI blast package.

Manual curation of predicted gene sequences

The dataset obtained from the above search contained many predicted genes and these were manually curated for incorrectly included or left out exons. Incorrect sequences were identified from pair-wise comparison with the human ortholog or most similar homolog. Missing exon sequences were obtained from BLAT searches in the dog genome assembly using the human protein sequence as a query. Because of the genomic complexity of the dog Adhesion GPCRs only their TM regions were checked and corrected. For other GPCR families the full-length sequences were curated.

TBLASTN searches in the dog genome assembly

Previously published datasets of the human, rat and mouse GPCR protein sequences were used as queries [30,44]. The query dataset was searched using TBLASTN against the May 2005 assembly of the dog genome with expectation cut-off value set to 1.0. The results were processed so that overlapping hits on the same strand were merged using a custom made Java program (available upon request). The coordinates of the merged hits were used to extract the corresponding sequences from the genome using fastacmd of the NCBI blast package. Each sequence was elongated upstream until the first start codon and downstream until the first stop codon using a custom made Java program (available upon request). The preliminary dataset was "cleaned" from non-GPCRs and GPCRs from other families by querying it against the Refseq database using BLASTN with the default settings. The criterion for including new GPCRs was that the first two hits belonged to the same GPCR family. Protein translations were obtained using transeq from the EMBOSS package [45] and from these the longest intact coding domain was extracted.

BLAT searches for missing dog orthologs

Missing dog orthologs were searched for using online BLAT [46]. Dog orthologs were defined as BLAT hits with higher sequence similarity to the human query than any other protein sequence in the Refseq database or in our dog GPCR repertoire dataset.

Naming dog GPCR gene sequences

The dog sequences were named according to official Gene names of human, rat and mouse orthologs. Genes that are specific to dog, i.e. not found in other species, were assigned the name of the most similar paralog with a lower-case single-character suffix appended, e.g. "EMR2b". Orthologous and paralogous gene relationships were initially assigned based on reciprocal BLAST searches of dog and human GPCRs and subsequently verified by phylogenetic analysis.

Phylogenetic analysis

Human and dog GPCR sequences were divided into Glutamate, Rhodopsin, Adhesion, Frizzled and Secretin families. Phylogenetic analysis was performed for each of the groups. The 7TM helices (excluding loops and N-/C-termini) for amino acid sequences were determined from a multiple alignment with bovine rhodopsin produced with ClustalW 1.81. The reason for using only the sequences of the helices is that these regions have family-wide similarity, whereas the similarity of other parts is typically subfamily-specific, hindering broader comparison. The alignment was bootstrapped 100 times and 100 Maximum Parsimony trees were calculated with Phylip 3.67 [47]. The consensus tree of the 100 Maximum Parsimony phylogenetic trees was calculated with Phylip 3.67, plotted with Treeview and manually edited in CANVAS.

GPCR family sequence similarity analysis

Human, mouse and dog GPCR protein sequences were divided into separate datasets for the Glutamate, Rhodopsin, Adhesion, Frizzled and Secretin families. Full-length amino acid sequences were aligned with ClustalW 1.81. The percentages of protein sequence identity were calculated individually for each orthologous pair from human, mouse and dog and used to derive an overall average for each GPCR family.

Abbreviations

ADRA1B, ADRA1D, ADRA2A: alpha adrenergic receptor; AKC: American Kennel Club; BAI1: brain-specific angiogenesis inhibitor 1; BLAST: Basic Local Alignment Search Tool; BLAT: BLAST-Like Alignment Tool; DAF: decay-accelerating factor; DARC: Duffy antigen receptor for chemokines; DRD4: dopamine receptor D4; EGF: epidermal growth factor; EMR: EGF-like module containing, mucin-like, hormone receptor-like; FPR1: formyl peptide receptor 1; FPRL2: formyl peptide receptor-like 2; GABA: gamma-aminobutyric acid; GPCR: G protein-coupled receptor; KISS1R: KiSS1-derived peptide receptor; MAS1L: MAS1 oncogene-like; MRGPR: MAS-related GPCR; MTNR1B: melatonin receptor 1B; NPBWR2: neuropeptides B/W receptor 2; NPFFR1, NPFFR2: neuropeptide FF receptor; NPY2R: neuropeptide Y receptor Y2; OPN1LW: opsin 1, long-wave-sensitive, Red opsin; PTHR1: parathyroid hormone receptor 1; P2RY4: pyrimidinergic receptor P2Y, G-protein coupled, 4; RXFP4: relaxin/insulin-like family peptide receptor 4; SSTR4: somatostatin receptor 4; TAAR: trace amine associated receptor; TM: transmembrane; TMEM185B: transmembrane protein 185B; TRHR3: thyrotropin-releasing hormone receptor 3; V1R: vomeronasal receptors type 1; V2R: vomeronasal receptors type 2.

Authors' contributions

TH performed the phylogenetic analyses, identified some of the gene sequences and wrote most of the manuscript. SMF participated in the design of the study and wrote parts of the discussion. HBS and RF participated in the design of the study and contributed to the manuscript. DEG conceived the study, participated in its design, identified most of the gene sequences and contributed to the manuscript. All authors have read and approved the final manuscript.
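As an illustration of the hit-merging step described under "TBLASTN searches in the dog genome assembly", the following is a minimal Python sketch of how overlapping hits on the same strand might be collapsed into single regions before sequence extraction. It is not the custom Java program used in this study, which is available only upon request. The record layout (Hit), the function name merge_overlapping_hits and the overlap rule (two hits are merged when they share at least one base) are assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One TBLASTN hit on the genome (hypothetical record layout)."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def merge_overlapping_hits(hits):
    """Merge hits that overlap on the same chromosome and strand.

    Assumed rule: two hits are merged when they share at least one base.
    Hits on opposite strands are never merged.
    """
    by_location = defaultdict(list)
    for hit in hits:
        by_location[(hit.chrom, hit.strand)].append(hit)

    merged = []
    for _, group in sorted(by_location.items()):
        group.sort(key=lambda h: (h.start, h.end))
        current = group[0]
        for hit in group[1:]:
            if hit.start <= current.end:
                # Overlap: extend the running region to cover this hit.
                current = current._replace(end=max(current.end, hit.end))
            else:
                # Gap: close the running region and start a new one.
                merged.append(current)
                current = hit
        merged.append(current)
    return merged


hits = [
    Hit("chr1", "+", 100, 250),
    Hit("chr1", "+", 200, 400),   # overlaps the first hit
    Hit("chr1", "-", 220, 300),   # same region, opposite strand
    Hit("chr1", "+", 900, 1000),  # no overlap with any other hit
]
merged_hits = merge_overlapping_hits(hits)
for hit in merged_hits:
    print(hit)
```

In this toy input the first two plus-strand hits collapse into one region spanning 100 to 400, while the minus-strand hit and the distant plus-strand hit are kept separate. The coordinates of such merged regions are what would then be passed on for sequence extraction and elongation to the nearest start and stop codons.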

Additional material

Additional file 1
Comparative table of the GRAFS family receptors in dog, human, rat and mouse. A table listing the dog (cf), human (hs), rat (rn) and mouse (mm) GPCRs of the Glutamate, Rhodopsin, Adhesion, Frizzled and Secretin families. Pseudogenes are marked with a "P" in the column to the right of the respective species. When several species- or lineage-specific duplicates (paralogs) exist the paralog with the highest sequence identity has been given as the primary ortholog whereas the other genes are present on separate rows in the table and without a counterpart in the other species.
[http://www.biomedcentral.com/content/supplementary/1471-2164-10-24-S1.xls]

Additional file 2
Dog GPCR amino acid sequences. Complete list of dog GPCR amino acid sequences and pseudogenes in FASTA format.
[http://www.biomedcentral.com/content/supplementary/1471-2164-10-24-S2.txt]

Acknowledgements

The studies were supported by the Swedish Research Council (VR), the foundation "Stiftelsen Olle Byggmastare", the Swedish Brain Foundation, Svenska Läkaresällskapet, the Göran Gustafsson foundation, Lars Hiertas Foundation, the Novo Nordisk Foundation, and the Magnus Bergvall Foundation.

References

1. Wayne RK, Geffen E, Girman DJ, Koepfli KP, Lau LM, Marshall CR: Molecular systematics of the Canidae. Systematic biology 1997, 46(4):622-653.
2. Vila C, Savolainen P, Maldonado JE, Amorim IR, Rice JE, Honeycutt RL, Crandall KA, Lundeberg J, Wayne RK: Multiple and ancient origins of the domestic dog. Science 1997, 276(5319):1687-1689.
3. Ostrander EA, Galibert F, Patterson DF: Canine genetics comes of age. Trends Genet 2000, 16(3):117-124.
4. Patterson DF: Companion animal medicine in the age of medical genetics. Journal of veterinary internal medicine/American College of Veterinary Internal Medicine 2000, 14(1):1-9.
5. Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, Clamp M, Chang JL, Kulbokas EJ 3rd, Zody MC, et al.: Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 2005, 438(7069):803-819.
6. Kirkness EF, Bafna V, Halpern AL, Levy S, Remington K, Rusch DB, Delcher AL, Pop M, Wang W, Fraser CM, et al.: The dog genome: survey sequencing and comparative analysis. Science 2003, 301(5641):1898-1903.
7. Rogic S, Mackworth AK, Ouellette FB: Evaluation of gene-finding programs on mammalian sequences. Genome research 2001, 11(5):817-832.
8. Lagerstrom MC, Hellstrom AR, Gloriam DE, Larsson TP, Schioth HB, Fredriksson R: The G Protein-Coupled Receptor Subset of the Chicken Genome. PLoS Comput Biol 2006, 2(6):e54.
9. Bockaert J, Pin JP: Molecular tinkering of G protein-coupled receptors: an evolutionary success. The EMBO journal 1999, 18(7):1723-1729.
10. Birnbaumer L, Abramowitz J, Brown AM: Receptor-effector coupling by G proteins. Biochimica et biophysica acta 1990, 1031(2):163-224.
11. Foord SM: Matching accessories. Sci STKE 2003, 2003(190):pe25.
12. Drews J: Drug discovery: a historical perspective. Science 2000, 287(5460):1960-1964.
13. Hopkins AL, Groom CR: The druggable genome. Nat Rev Drug Discov 2002, 1(9):727-730.
14. Klabunde T, Hessler G: Drug design strategies for targeting G-protein-coupled receptors. Chembiochem 2002, 3(10):928-944.
15. Tyndall JD, Sandilya R: GPCR agonists and antagonists in the clinic. Medicinal chemistry (Shariqah (United Arab Emirates)) 2005, 1(4):405-421.
16. Fredriksson R, Lagerstrom MC, Lundin LG, Schioth HB: The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints. Mol Pharmacol 2003, 63(6):1256-1272.
17. Fredriksson R, Schioth HB: The repertoire of G-protein-coupled receptors in fully sequenced genomes. Mol Pharmacol 2005, 67(5):1414-1425.
18. Attwood TK, Findlay JB: Fingerprinting G-protein-coupled receptors. Protein Eng 1994, 7(2):195-203.
19. Kop EN, Kwakkenbos MJ, Teske GJ, Kraan MC, Smeets TJ, Stacey M, Lin HH, Tak PP, Hamann J: Identification of the epidermal growth factor-TM7 receptor EMR2 and its ligand dermatan sulfate in rheumatoid synovial tissue. Arthritis Rheum 2005, 52(2):442-450.
20. Kwakkenbos MJ, Pouwels W, Matmati M, Stacey M, Lin HH, Gordon S, van Lier RA, Hamann J: Expression of the largest CD97 and EMR2 isoforms on leukocytes facilitates a specific interaction with chondroitin sulfate on B cells. Journal of leukocyte biology 2005, 77(1):112-119.
21. Grus WE, Shi P, Zhang J: Largest vertebrate vomeronasal type 1 receptor gene repertoire in the semiaquatic platypus. Mol Biol Evol 2007, 24(10):2153-2157.
22. IUPHAR Database of G Protein-Coupled Receptors [http://www.iuphar-db.org/index.jsp]
23. Metpally RP, Sowdhamini R: Genome wide survey of G protein-coupled receptors in Tetraodon nigroviridis. BMC Evol Biol 2005, 5:41.
24. Nordstrom KJ, Fredriksson R, Schioth HB: The amphioxus (Branchiostoma floridae) genome contains a highly diversified set of G protein-coupled receptors. BMC Evol Biol 2008, 8:9.
25. Hill CA, Fox AN, Pitts RJ, Kent LB, Tan PL, Chrystal MA, Cravchik A, Collins FH, Robertson HM, Zwiebel LJ: G protein-coupled receptors in Anopheles gambiae. Science 2002, 298(5591):176-178.
26. Quignon P, Giraud M, Rimbault M, Lavigne P, Tacher S, Morin E, Retout E, Valin AS, Lindblad-Toh K, Nicolas J, et al.: The dog and rat olfactory receptor repertoires. Genome Biol 2005, 6(10):R83.
27. Go Y: Proceedings of the SMBE Tri-National Young Investigators' Workshop 2005. Lineage-specific expansions and contractions of the bitter taste receptor gene repertoire in vertebrates. Mol Biol Evol 2006, 23(5):964-972.
28. TIGR Poodle Assembly [http://www.ncbi.nlm.nih.gov/genome/seq/BlastGen/BlastGen.cgi?taxid=9615]
29. Schioth HB, Fredriksson R: The GRAFS classification system of G-protein coupled receptors in comparative perspective. Gen Comp Endocrinol 2005, 142(1–2):94-101.
30. Gloriam DE, Fredriksson R, Schioth HB: The G protein-coupled receptor subset of the rat genome. BMC genomics 2007, 8:338.
31. Bjarnadottir TK, Fredriksson R, Hoglund PJ, Gloriam DE, Lagerstrom MC, Schioth HB: The human and mouse repertoire of the adhesion family of G-protein-coupled receptors. Genomics 2004, 84(1):23-33.
32. Hashiguchi Y, Nishida M: Evolution of trace amine associated receptor (TAAR) gene family in vertebrates: lineage-specific expansions and degradations of a second class of vertebrate chemosensory receptors expressed in the olfactory epithelium. Mol Biol Evol 2007, 24(9):2099-2107.
33. Li X, Li W, Wang H, Cao J, Maehashi K, Huang L, Bachmanov AA, Reed DR, Legrand-Defretin V, Beauchamp GK, et al.: Pseudogenization of a sweet-receptor gene accounts for cats' indifference toward sugar. PLoS genetics 2005, 1(1):27-35.
34. Bjarnadottir TK, Fredriksson R, Schioth HB: The adhesion GPCRs: a unique family of G protein-coupled receptors with important roles in both central and peripheral tissues. Cell Mol Life Sci 2007, 64(16):2104-2119.
35. Lagerstrom MC, Schioth HB: Structural diversity of G protein-coupled receptors and significance for drug discovery. Nat Rev Drug Discov 2008, 7(4):339-357.
36. Kwakkenbos MJ, Matmati M, Madsen O, Pouwels W, Wang Y, Bontrop RE, Heidt PJ, Hoek RM, Hamann J: An unusual mode of concerted evolution of the EGF-TM7 receptor chimera EMR2. Faseb J 2006, 20(14):2582-2584.
37. Felsburg PJ: Overview of immune system development in the dog: comparison with humans. Human & experimental toxicology 2002, 21(9–10):487-492.
38. Hamann J, Stortelers C, Kiss-Toth E, Vogel B, Eichler W, van Lier RA: Characterization of the CD55 (DAF)-binding site on the seven-span transmembrane receptor CD97. European journal of immunology 1998, 28(5):1701-1707.
39. Stacey M, Chang GW, Davies JQ, Kwakkenbos MJ, Sanderson RD, Hamann J, Gordon S, Lin HH: The epidermal growth factor-like domains of the human EMR2 receptor mediate cell attachment through chondroitin sulfate glycosaminoglycans. Blood 2003, 102(8):2916-2924.
40. Migeotte I, Communi D, Parmentier M: Formyl peptide receptors: a promiscuous subfamily of G protein-coupled receptors controlling immune responses. Cytokine & growth factor reviews 2006, 17(6):501-519.
41. Burstein ES, Ott TR, Feddock M, Ma JN, Fuhs S, Wong S, Schiffer HH, Brann MR, Nash NR: Characterization of the Mas-related gene family: structural and functional conservation of human and rhesus MrgX receptors. Br J Pharmacol 2006, 147(1):73-82.
42. The NCBI BLAST archive [http://www.ncbi.nlm.nih.gov/BLAST/download.shtml]
43. The NCBI databases [http://www.ncbi.nlm.nih.gov/Database/]
44. Bjarnadottir TK, Gloriam DE, Hellstrand SH, Kristiansson H, Fredriksson R, Schioth HB: Comprehensive repertoire and phylogenetic analysis of the G protein-coupled receptors in human and mouse. Genomics 2006, 88(3):263-273.
45. Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 2000, 16(6):276-277.
46. Kent WJ: BLAT–the BLAST-like alignment tool. Genome research 2002, 12(4):656-664.
47. Phylip software [http://evolution.gs.washington.edu/phylip.html]
