Genes Genet. Syst. (2011) 86, p. 109–116
Evolution of the CYP2D cluster in humans and four non-human primates

Yoshiki Yasukochi* and Yoko Satta
Department of Evolutionary Studies of Biosystems, the Graduate University for Advanced Studies, Shonan Village, Hayama, Kanagawa 240-0193, Japan

(Received 29 January 2011, accepted 29 March 2011)

The human cytochrome P450 2D6 (CYP2D6) is a primary enzyme involved in the metabolism of about 25% of commonly used therapeutic drugs. CYP2D6 belongs to the CYP2D subfamily, a gene cluster located on chromosome 22, which comprises the CYP2D6 gene and the pseudogenes CYP2D7P and CYP2D8P. Although the chemical and physiological properties of CYP2D6 have been extensively studied, there has been no study to date on the molecular evolution of the CYP2D subfamily in the human genome. Such knowledge could greatly contribute to the understanding of drug metabolism in humans because it tells us when and how the current metabolic system was constructed. It can also help to identify differences in the metabolism of exogenous substrates between humans and other animals, such as experimental animals. Here, we conducted a preliminary study of the evolution and gene organization of the CYP2D subfamily, focusing on humans and four non-human primates (chimpanzees, orangutans, rhesus monkeys, and common marmosets). Our results indicate that CYP2D7P was duplicated from CYP2D6 before the divergence between humans and great apes, whereas CYP2D6 and CYP2D8P were already present in the stem lineage of New World monkeys and Catarrhini. Furthermore, the origin of the CYP2D subfamily in the human genome can be traced back to before the divergence between amniotes and amphibians. Our analyses also show that the chimeric sequences reported for CYP2D6 and CYP2D7 in the chimpanzee genome appear to have been exchanged in its genome database.

Key words: CYP2D subfamily, CYP2D6, drug metabolism, primate

Edited by Ryo K. Takahashi
* Corresponding author. E-mail: [email protected]
Note: Supplementary materials in this article are at http://www.jstage.jst.go.jp/browse/ggs

INTRODUCTION

Cytochrome P450 2D6 (CYP2D6) is an important enzyme involved in the metabolism of about 25% of commonly used therapeutic drugs (Ingelman-Sundberg, 2005), showing a high affinity for alkaloids (Fonne-Pfister and Meyer, 1988). CYP2D6 belongs to the CYP2D subfamily, a gene cluster within a contiguous region of about 45 kb on chromosome 22 (Kimura et al., 1989), which in humans comprises the CYP2D6 gene and two pseudogenes, CYP2D7P and CYP2D8P. The nucleotide sequences of these pseudogenes are highly similar to those of the CYP2D6 gene.

In recent years, entire genome databases have become available from various species. In particular, the whole genome assemblies of four non-human primates have been released since 2006: that of the chimpanzee (Pan troglodytes), released by the Washington University Genome Sequencing Center (Pan_troglodytes-2.1; The Chimpanzee Sequencing and Analysis Consortium, 2005), the Sumatran orangutan (Pongo pygmaeus abelii), produced by the Genome Sequencing Center at Washington University School of Medicine in St. Louis in July 2007 (WUSTL version Pongo_abelii-2.0.2), the rhesus monkey (Macaca mulatta), released by the Macaque Genome Sequencing Consortium in February 2006 (v.1.0, Mmul_051212; Rhesus Macaque Genome Sequencing and Analysis Consortium, 2007) and a draft assembly of the common marmoset (Callithrix jacchus), produced by WUSTL School of Medicine Genome Sequencing Center [WUGSC 3.2 (GCA_000004665.1)]. Based on the information released, the CYP2D6 gene is located on chromosome 22 in the genomes of the chimpanzee (based on the chromosome naming system proposed by McConkey; McConkey, 2004) and the orangutan, and on chromosome 10 in the genome of the rhesus monkey, based on the chromosome numbering system of Rogers et al. (2006). No other annotations exist regarding the CYP2D6 gene in the common marmoset or other CYP2D genes in any of these species.

The genetic variability of CYP2D6 has been extensively studied in human populations due to its clinical importance (Xie et al., 2001; Bradford, 2002; Mizutani, 2003; Raimundo et al., 2004; Sistonen et al., 2007), but to our knowledge there has been no study to date exploring the molecular evolution of the CYP2D subfamily in the human genome. Study of the origin and evolution of this subfamily is important for understanding drug metabolism in humans because it tells us when and how we acquired a metabolic system for exogenous substrates. This knowledge can also reveal differences in the metabolism of substrates such as drugs between humans and other animals (e.g., experimental animals). Here, we preliminarily compare the organization of the CYP2D subfamily in the human genome with that in the genomes of the chimpanzee, orangutan, rhesus monkey and common marmoset, and also try to trace the evolutionary origin of the subfamily in animals.

MATERIALS AND METHODS

Genomic sequences of the chimpanzee, Sumatran orangutan, rhesus monkey and common marmoset genomes were used for the identification of CYP2D genes. Sequence data for humans, chimpanzees and rhesus monkeys were obtained from the NCBI genome database (http://www.ncbi.nlm.nih.gov/), whereas those of the orangutan and common marmoset were obtained from the UCSC genome database (http://genome.ucsc.edu/). CYP2D genes from non-human primates were identified by BLAST and Blat homology search, and gene order was examined with reference to human CYP2D sequences (Genbank accession numbers: M33387 and M33388). The genome databases of eight eutherians (house mouse, Mus musculus; Norway rat, Rattus norvegicus; rabbit, Oryctolagus cuniculus; cattle, Bos taurus; pig, Sus scrofa; horse, Equus caballus; giant panda, Ailuropoda melanoleuca and dog, Canis lupus familiaris), one marsupial (gray short-tailed opossum, Monodelphis domestica), one monotreme (platypus, Ornithorhynchus anatinus), one bird (chicken, Gallus gallus), one reptile (green anole lizard, Anolis carolinensis), two amphibians (African clawed frog, Xenopus laevis and western clawed frog, Xenopus tropicalis), three fishes (zebrafish, Danio rerio; medaka fish, Oryzias latipes and puffer fish, Takifugu rubripes) and one urochordate (sea squirt, Ciona intestinalis) were used to search for the origin of the CYP2D subfamily. The synteny to human CYP2D genes was predicted by the NCBI Map viewer (http://www.ncbi.nlm.nih.gov/mapview/) and the Ensembl Genome Browser (http://www.ensembl.org/index.html). The CYP2D genomic sequences of the gorilla (Gorilla gorilla), the mouse lemur (Microcebus murinus) and the tarsier (Tarsius syrichta) were also available, but the sequences are incomplete (i.e., they included many undetermined nucleotides). Thus, their sequences were excluded from the analysis.

The chimpanzee, orangutan and rhesus monkey sequences also contain a few undetermined nucleotides, but not as many as those of the gorilla, mouse lemur and tarsier. In addition, there are some frameshift and nonsense mutations in the putative coding region of some CYP2D sequences. Hence, the undetermined parts of sequences were determined and the deleterious mutations in some genes were confirmed. Genomic DNA samples of the chimpanzee, orangutan and rhesus monkey were provided by the Primate Research Institute of Kyoto University and the Max Planck Institute for Biology. DNA was amplified by PCR with primers (Table 1). PCR amplification was carried out using a DNA Thermal Cycler in a 25 μl reaction mixture with TaKaRa LA Taq Hot Start Version (TaKaRa Bio Inc.) or PCR Master Mix (Promega). PCR conditions were according to the manufacturer’s instructions. PCR products used for sequencing were purified with ExoSAP-IT (USB), and cycle sequencing was performed with the BigDye terminator v3.1 cycle sequencing kit (Applied Biosystems). Sequencing was conducted with an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems).

The alignment of sequence data was carried out using MEGA Ver. 4.1 Beta (Tamura et al., 2007). The alignment was later modified by hand, and the positions of deletions or insertions (indels) were excluded from subsequent analyses. A neighbor-joining (NJ) tree (Saitou and Nei, 1987) was reconstructed based on the empirical JTT substitution matrix (Jones et al., 1992). Bootstrap analysis was performed using 1,000 replications. Maximum likelihood (ML) and Bayesian phylogenetic trees were constructed with the PHYLIP 3.69 package (Felsenstein, 2009) and MrBayes ver. 3.1.2 (Ronquist and Huelsenbeck, 2003), respectively. Bootstrap analyses used 100 replicates for the ML trees. Distance was corrected by the JTT matrix-based method. A global rearrangement was allowed and the input order of OTUs was randomized with three jumbles. The Bayesian analysis was conducted with 7.1 × 10^5 generations and tree sampling every 100 generations. The first 1,775 trees were discarded as burn-in. Distance was corrected by the WAG matrix with gamma correction for site rate variation (Whelan and Goldman, 2001). The ML and Bayesian trees were visualized with TreeView version 1.6.6 (Page, 1996). Transposable elements were predicted by Repbase Update (Jurka et al., 2005) with the CENSOR (Kohany et al., 2006) and RepeatMasker (Smit et al., 1996–2010) software.
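Two bookkeeping steps from the methods above can be sketched in a few lines: dropping alignment columns that contain indels, and working out how many Bayesian samples remain after burn-in. This is a minimal illustration only, not the authors' pipeline (they used MEGA, PHYLIP and MrBayes). The helper names `strip_indel_columns`, `percent_identity` and `mcmc_sample_counts` are hypothetical, the toy sequences are invented, and the sample count assumes the starting tree at generation 0 is not counted.

```python
def strip_indel_columns(aligned, gap="-"):
    """Drop every alignment column in which at least one sequence has a gap."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    keep = [i for i in range(length) if all(seq[i] != gap for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def mcmc_sample_counts(generations, sample_every, burn_in):
    """Return (trees sampled, trees kept, burn-in fraction).

    Assumption: the tree at generation 0 is ignored.
    """
    sampled = generations // sample_every
    return sampled, sampled - burn_in, burn_in / sampled


# Toy alignment, invented for illustration (not CYP2D data).
toy = ["ACGT-ACGTA", "ACGTTACGAA", "AC-TTACGTA"]
clean = strip_indel_columns(toy)
print(clean)
print(percent_identity(clean[0], clean[1]))

# Values quoted in the text: 7.1 x 10^5 generations, sampling every 100
# generations, first 1,775 trees discarded as burn-in.
print(mcmc_sample_counts(710_000, 100, 1775))
```

With the figures quoted in the text, 7.1 × 10^5 / 100 gives 7,100 sampled trees, so discarding 1,775 corresponds to a 25% burn-in and leaves 5,325 trees for the consensus.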

Table 1. Primer list used in this study

Name | Primer sequence (5’ → 3’) | Specificity | Objective
Patr-CYP2D6F1b | ATCCACGTGACAGCTTTGAGGCTC | Chimpanzee CYP2D7* | Specific amplification of CYP2D7
Patr-CYP2D6R1b | GGCCGAGAGGATACTCAGGGGAT | Chimpanzee CYP2D7* | Specific amplification of CYP2D7
Patr-CYP2D7F1 | GTAGCCCAAGCAGCGCCGAC | Chimpanzee CYP2D7 | Determination of unknown sequence
Patr-CYP2D7R1 | TGCCCATCACCCACCGGCTTC | Chimpanzee CYP2D7 | Determination of unknown sequence
Patr-CYP2D7F1b | CCCAGAAGGCTTTGCAGGCTTCA | Chimpanzee CYP2D6* | Specific amplification of CYP2D6
Patr-CYP2D7R1b | CCGGGTGTCCCAGCAAAGTTCAT | Chimpanzee CYP2D6* | Specific amplification of CYP2D6
Patr-CYP2D7R1 SEQ1 | CCACCCTGACCACCTTTCC | Chimpanzee CYP2D7 | Determination of unknown sequence
Patr-CYP2D8F1 | GCTCCTGGCACGCTATGGACA | Chimpanzee CYP2D8P | Confirmation of deleterious mutation
Patr-CYP2D8R1 | CAGGGGTCGCTTTCCCAGTCCT | Chimpanzee CYP2D8P | Confirmation of deleterious mutation
Poab-CYP2D6F1 | GGATTTCGATTTTAGGTTTCTCCTCTGGGC | Orangutan CYP2D6 | Determination of unknown sequence
Poab-CYP2D6R1 | CTCAGTCCCTGGGCTTCCATGA | Orangutan CYP2D6 | Determination of unknown sequence
Poab-CYP2D6F1’ | TGAGCAGAGGTTGCATCATT | Orangutan CYP2D6 | Determination of unknown sequence
Poab-CYP2D6R1’ | ATCTGGGCAGTCAGAATTGG | Orangutan CYP2D6 | Determination of unknown sequence
Poab-CYP2D6F1 SEQ1 | TGCTCATGATCTTACACCCAG | Orangutan CYP2D6 | Determination of unknown sequence
Poab-CYP2D7F1 | GAGGCTGACGCCTTTCACCAC | Orangutan CYP2D7P | Determination of unknown sequence
Poab-CYP2D7R1 | GTCCTCAAAACTGATCTCCCCAAGTC | Orangutan CYP2D7P | Determination of unknown sequence
Poab-CYP2D7F1b | GCACTAAGGGGGAACTGG | Orangutan CYP2D7P | Determination of unknown sequence
Poab-CYP2D7R1b | TGAGATGTCCCTCCTCCTCA | Orangutan CYP2D7P | Determination of unknown sequence
Poab-CYP2D7F2 | TCGCACCTGGGCTGACATATAA | Orangutan CYP2D7P | Confirmation of deleterious mutation
Poab-CYP2D7R2 | TCTGACACTCCTTCCTGCCTC | Orangutan CYP2D7P | Confirmation of deleterious mutation
Poab-CYP2D7F2b | TCGGAGAGAGAGCTCAGG | Orangutan CYP2D7P | Confirmation of deleterious mutation
Poab-CYP2D7R2b | GAGCATCCAGGAAGTGTTCG | Orangutan CYP2D7P | Confirmation of deleterious mutation
Poab-CYP2D8F1 | GGGGAAGAGGGGCTTGTGAG | Orangutan CYP2D8P | Confirmation of deleterious mutation
Poab-CYP2D8R1 | CCACCTTTTGCCTGGCCACT | Orangutan CYP2D8P | Confirmation of deleterious mutation
Poab-CYP2D8F1’ | TCCAAGGAGCAGGGTTTG | Orangutan CYP2D8P | Confirmation of deleterious mutation
Poab-CYP2D8F1b | GGAAAAGCACAGGGTTGG | Orangutan CYP2D8P | Confirmation of deleterious mutation
Poab-CYP2D8R1b | GCAGTCCAGGCACCTCTC | Orangutan CYP2D8P | Confirmation of deleterious mutation
Mamu-CYP2D8F1 | ACCTTCCTGCTCCTCAAGGTCG | Rhesus monkey CYP2D8 | Determination of unknown sequence
Mamu-CYP2D8R1 | GCCTGATTTCCTAATTTAAACGGCACATA | Rhesus monkey CYP2D8 | Determination of unknown sequence
Mamu-CYP2D8F2 | CAGACATGGTCTAAAGAAATGAGTAAGTTGG | Rhesus monkey CYP2D8 | Determination of unknown sequence
Mamu-CYP2D8R2 | AACCCTGTGGTTTTCCTGGTCTTCCG | Rhesus monkey CYP2D8 | Determination of unknown sequence
Mamu-CYP2D8R2 SEQ1 | GCCTTTTTCTGTGTCCCAAC | Rhesus monkey CYP2D8 | Determination of unknown sequence
Mamu-CYP2D8F3 | AGCCCCAGTCTAGTAGGGAAGAC | Rhesus monkey CYP2D8 | Determination of unknown sequence
Mamu-CYP2D8R3 | ATCACCGAACCTGAGGGTGGTC | Rhesus monkey CYP2D8 | Determination of unknown sequence
Mamu-CYP2D8F3 SEQ1 | CCGCATGGAGCTCTTCCTC | Rhesus monkey CYP2D8 | Determination of unknown sequence
Mamu-CYP2D8R3 SEQ1 | TTAGCCAGGGGTGGTAACG | Rhesus monkey CYP2D8 | Determination of unknown sequence

*Primers used to confirm the “true” sequence of the chimpanzee CYP2D6 or CYP2D7 (see text and Fig. 2).
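For readers who want a quick sanity check on primers such as those in Table 1, the sketch below computes length, GC fraction, a rough melting temperature and the reverse complement for two of the listed chimpanzee primers. It is an illustration under stated assumptions: the function names are hypothetical, and the melting temperature uses the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a coarse estimate for short oligonucleotides. The paper itself states only that PCR conditions followed the manufacturer's instructions.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Two primer sequences copied from Table 1.
primers = {
    "Patr-CYP2D7F1": "GTAGCCCAAGCAGCGCCGAC",
    "Patr-CYP2D7R1": "TGCCCATCACCCACCGGCTTC",
}

for name, seq in primers.items():
    print(name, len(seq), round(gc_fraction(seq), 2),
          wallace_tm(seq), reverse_complement(seq))
```

The Wallace rule was chosen here only because it needs no salt or concentration parameters; nearest-neighbor models give more realistic values for primers of this length.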

RESULTS AND DISCUSSION

Gene organization of the CYP2D subfamily in five primate species was revealed by homology search with each human counterpart (Fig. 1). In these non-human primates, nucleotide sequences of eight fragments contained undetermined nucleotides or putative deleterious mutations in the referenced genome database of the CYP2D genes. To fill in sequence gaps and confirm the presence of deleterious mutations, the following fragments were sequenced: 837, 611 and 775 bp in the rhesus monkey CYP2D8P (-like) gene; 1,153 bp in the chimpanzee CYP2D7P; 121 bp in the chimpanzee CYP2D8P; 2,113 bp in the orangutan CYP2D6; 979 bp in the orangutan CYP2D7P and 2,011 bp in the orangutan CYP2D8P. These sequences are deposited in the DNA Data Bank of Japan (DDBJ) (Genbank accession numbers: AB594492–AB594499). A single CYP2D gene consisting of nine exons and eight introns ranged from 4,000 bp in the marmoset CYP2D6 to 5,200 bp in the orangutan CYP2D7P. Such differences in sequence length among species resulted from intron size variation. Human CYP2D8P orthologs were present in all primates, with those in the rhesus monkey and marmoset genomes being apparently functional due to the absence of putative premature stop codons in their coding regions. On the other hand, the CYP2D7P ortholog was not found in the rhesus monkey and marmoset and does not seem to be a pseudogene in the chimpanzee. We therefore named the ortholog in the chimpanzee without the character “P” indicating a pseudogene. In the orangutan this gene seems to have been pseudogenized independently from humans due to frameshift mutations (data not shown).

[Fig. 1 diagram. Gene order as labeled, 5’ to 3’: Homo sapiens (Chr. 22): CYP2D8P, CYP2D7P, CYP2D6. Pan troglodytes (Chr. 22): CYP2D8P, CYP2D7, CYP2D6 (NCBI annotation: CYP2D6 and LOC745989). Pongo abelii (Chr. 22): CYP2D8P, CYP2D7P, CYP2D6. Macaca mulatta (Chr. 10): CYP2D8, CYP2D6. Callithrix jacchus (chromosome shown as “?”): CYP2D8, CYP2D6. The diagram also carries the label “Insertion of CYP2D6 intron 6 fragment”.]

Fig. 1. Diagram of the organization of the CYP2D subfamily in five primate species. Only transposable elements detected by both methods (RepeatMasker and CENSOR) are shown. They are named following the nomenclature of RepeatMasker. Asterisks indicate CENSOR nomenclature. Closed triangles represent Alu elements. Gray triangles represent L1 or L2 elements. Open triangles represent other elements. Names of transposable elements in non-human primates whose location or order is identical to that in humans are not shown.

Kimura et al. (1989) reported that the exonic sequence of the human CYP2D7P shares a higher level of similarity to CYP2D6 than to CYP2D8P. The comparison of sequences performed here shows the same trend (Fig. 2). In the chimpanzee, nucleotide similarity between CYP2D6 and CYP2D7 and that between CYP2D6 and CYP2D8P were 97% and 93%, respectively, and in the orangutan 95% and 93%, respectively. These results indicate that the CYP2D7 gene was duplicated from CYP2D6 before the divergence between humans and great apes during the Miocene. On the other hand, CYP2D6 and CYP2D8 or CYP2D8P are present in all five primates, indicating that the origin of these genes in the human genome can be traced back to, at the latest, a stem lineage of New World monkeys and Catarrhini. Although the CYP2D genes are annotated in the mouse genome database, the CYP2D6 and CYP2D8P orthologous genes are ambiguous, as the mouse has nine active CYP2D products (Nelson et al., 2004).

[Fig. 2 alignment of variable sites. Rows, top to bottom: Patr CYP2D6 (DQ282164), Patr CYP2D7 (NW_001230982), Patr CYP2D6*, Hosa CYP2D6 (M33388), Patr CYP2D6 (NW_001230982), Patr CYP2D7*, Hosa CYP2D7P (M33387).]

Fig. 2. Alignment of the CYP2D6 and CYP2D7 or CYP2D7P genes from humans and chimpanzees. Hosa and Patr represent the human and chimpanzee, respectively. Dots indicate identity with the nucleotides of Patr CYP2D6 (DQ282164). Nucleotide position numbers at the top of the figure represent variable sites. Gray boxes indicate sequences that are putatively exchanged between the Patr CYP2D6 and CYP2D7 genes. The asterisk indicates the sequence modified by this study. Part of the modified sequences was confirmed by PCR using each gene-specific primer pair and sequencing (heavy underlines). In the actual sequences, the upper four and the lower three sequences of the alignment are located downstream and upstream, respectively.

To investigate the origin of the CYP2D subfamily in humans, synteny among the CYP2D candidate gene clusters of seventeen vertebrates and one urochordate was examined. The organization of the genes surrounding CYP2D candidate genes in amniotes (i.e., mammals, birds and reptiles) was relatively similar to that of genes surrounding human CYP2D genes, with the adjacent genes NDUFA6 and TCF20 being detected in all cases. In contrast, such regions in fishes and the sea squirt do not show significant similarity to those of amniotes. In the amphibian genome, NDUFA6 and TCF20 were not found, but other adjacent genes, SREBF2 and WBP2NL, were observed together with CYP2D orthologs. A BLAST search revealed that the human CYP2D6 gene showed relatively high similarity to the Cyp2k and Cyp2j genes in the zebrafish and the Cyp2j genes in the medaka fish and puffer fish genomes. However, the Ensembl Genome Browser revealed that CYP2J2 in the human was probably an ortholog of the zebrafish Cyp2j gene, and human CYP2W1 might be an ortholog of the zebrafish Cyp2k. The ML phylogeny based on Clan 2 (CYP1, CYP2, CYP17 and CYP21) amino acid sequences in the zebrafish and human genomes showed that sequences of the fish Cyp2ks and Cyp2js (Cyp2js were described as CYP2N, CYP2P, CYP2V and CYP2AD in the analysis) formed a monophyletic group with those of the human CYP2W1 and CYP2J2, respectively, but the fish has no CYP2D candidate (Goldstone et al., 2010). This previous study moreover revealed that Cyp2j genes shared synteny with the human CYP2J2. Nelson (2011) has also reported that some fish Cyp2k sequences are syntenic and therefore orthologous to CYP2W1 in chicken and mammals, although the similarity between Cyp2k and CYP2W sequences is low. These results therefore suggest that the origin of the CYP2D subfamily in primates might be traced back to a stem lineage of amniotes and amphibians. The CYP2D candidate of the sea squirt is annotated as CYP2D6 according to automated computational analysis of the NCBI annotation (Genbank accession number: XM_002128104). However, our results indicate that the annotated CYP2D6 in the sea squirt must be more closely related to another CYP2 gene in vertebrates.

We reconstructed NJ, ML and Bayesian trees based on amino acid sequences of CYP2D candidate genes together with human CYP2W1 and CYP2J2 (Fig. 3 and Supplementary Figs. S1–S3). Although the topologies of the three trees differed slightly from each other, the CYP2D candidate genes of amniotes and amphibians formed a monophyletic cluster with respect to the Cyp2k/Cyp2j genes of fishes and the CYP2W/CYP2J genes of humans (Supplementary Figs. S1–S3). This implies that the CYP2D subfamily could have already been present before the divergence of amniotes from amphibians.

[Fig. 3 tree, scale bar 0.05. Tip labels: Human CYP2D6; Pygmy chimpanzee CYP2D6; Chimpanzee CYP2D6; Orangutan CYP2D6; Orangutan CYP2D7P; Human CYP2D7P; Chimpanzee CYP2D7; Rhesus monkey CYP2D6; Rhesus monkey CYP2D8; Orangutan CYP2D8P; Human CYP2D8P; Chimpanzee CYP2D8P; Marmoset CYP2D6; Marmoset CYP2D8; Mouse Cyp2d22 (NM_001163472); Rat Cyp2d4 (NM_138515); Mouse Cyp2d26 (NM_029562); Rat Cyp2d2 (NM_012730); Rat Cyp2d3 (NM_173093); Rat Cyp2d1 (NM_153313); Rat Cyp2d5 (NM_173304); Mouse Cyp2d9 (NM_010006); Mouse Cyp2d12 (NM_201360); Mouse Cyp2d34 (NM_145474); Mouse Cyp2d10 (NM_010005); Mouse Cyp2d11 (NM_001104531); Rabbit CYP2D23 (NM_001168395); Rabbit CYP2D24 (NM_001168397); Rabbit CYP2D/II-like: LOC100348527 (XM_002721374); Rabbit CYP2D/I-like: LOC100348786 (XM_002721375); Rabbit CYP2D/I-like: LOC100349036 (XM_002721376); Cattle CYP2D14 (NM_174529); Cattle CYP2D-like: LOC785824 (NM_001080364); Pig CYP2D25 (NM_214394); Horse CYP2D50-like: LOC100056087 (XM_001502856); Horse CYP2D50-like: LOC100146596 (XM_001916743); Horse CYP2D50-like: LOC100070962 (XM_001502900); Horse CYP2D50-like: LOC100070895 (XM_001502807); Horse CYP2D50-like: LOC100146391 (XM_001917460); Horse CYP2D50 (NM_001111306); Giant panda CYP2D15-like (XM_002926679); Dog CYP2D15 (NM_001003333); Opossum CYP2D6-like (NC_008808); Platypus CYP2D6 (NW_001597728); Anole lizard CYP2D6 (ENSACAG00000003666*); Chicken cyp2d3-like (NM_001195557); Western clawed frog CYP2D4-like (XM_002933776); Western clawed frog CYP2D26-like (XM_002933762); African clawed frog cyp2d6-B-prov (BC054243); African clawed frog cyp2d6-a (NM_001093574); Western clawed frog cyp2d6 (NM_001015719).]

Fig. 3. Neighbor-joining tree of CYP2D genes of amniotes and amphibians based on amino acid sequences of the full-length coding region. The distance is corrected by the JTT matrix-based method. Only bootstrap values over 50% are shown. Numbers in parentheses are Genbank accession numbers, whereas numbers with an asterisk represent the Ensembl gene ID.

Although both the lizard and the chicken have a single CYP2D gene, amphibians and major mammalian orders possess multiple CYP2D genes and show an independent expansion of CYP2D. While primates have two to three CYP2D genes, rodents have five to seven, rabbits have five, and horses have six. This expansion of the CYP2D subfamily in herbivores is interesting, and might be related to the very high affinity of the CYP2D6 enzyme for plant toxins like alkaloids (Fonne-Pfister and Meyer, 1988). Kubota et al. (2011) have reported that the anole lizard appears to have a much larger set of CYP2 genes, especially CYP2G and CYP2AG, than the chicken or the zebra finch, Taeniopygia guttata. The number of lizard CYP2 genes also appears to be much larger than that of the human. However, the lizard genome has a single CYP2D candidate. It is interesting to examine why lineage-specific gene expansion has not occurred in the lizard CYP2D subfamily but has occurred in the CYP2G and CYP2AG subfamilies. Although it is not easy to prove, one may hypothesize that the lizard does not need to increase detoxification activity against plants, but does need to enlarge the functions of the CYP2G and CYP2AG enzymes. The CYP2G enzymes are known to be expressed specifically in the olfactory mucosa of several mammals (Larsson et al., 1989; Nef et al., 1989; Reed, 1993; Hua et al., 1997), although the CYP2AG enzyme has been identified only in the anole lizard. Many tetrapod vertebrates have a vomeronasal organ involved in CYP2G gene expression, and the organ is particularly well-developed in lizards and snakes (Schwenk, 1995). Although the functions of CYP2G and CYP2AG in the lizard are not yet known, the extent of expansion in different subfamilies is likely to reflect the requirement for these genes in a given environment.

Furthermore, we focused on the evolutionary mode of CYP2D in primates. Since CYP2D genes were still not annotated in non-human primates, CYP2D orthologies were examined among different primate species by searching for cladistic markers such as LINEs or SINEs in the CYP2D clusters. Various transposable elements, including Alu elements, were found in intergenic regions (Fig. 1). It has been reported that the high rate of gene conversion is related to the dense distribution of Alu elements in the human genome (Chen et al., 2007). Indeed, many studies have described gene conversion among human CYP2D genes (Kimura et al., 1989; Gonzalez and Nebert, 1990; Hanioka et al., 1990; Heim and Meyer, 1992; Masimirembwa et al., 1996). However, LINEs and SINEs were observed at identical sites among several primate genomes (Fig. 1). This indicates that gene conversion between flanking regions of different CYP2D genes is not obvious. Rather, each flanking region has maintained unique patterns of insertion of LINEs or SINEs.

In intron 1 of CYP2D8 or CYP2D8P, we detected AluSx-AluY-AluSg4 in hominoid genomes, AluSx-AluSg4-AluYRa1 in the rhesus monkey genome, and AluSx-AluSg4 in the marmoset (Fig. 1). This result indicates that AluSx-AluSg4 were inserted into intron 1 before the divergence between New World monkeys and Catarrhini, but after the divergence of the CYP2D6 and CYP2D8 genes.

In the chimpanzee, the CYP2D genes on the chromosome 22 genomic contig (Genbank accession number: NW_001230982.1) are CYP2D6, CYP2D7 (Entrez Gene ID: 745989), and CYP2D8P (Entrez Gene ID: 470229). However, according to the annotation in the database, the order of the genes in the chimpanzee genome was different from that in the human and orangutan genomes. In the chimpanzee, genes were ordered, from the proximal to distal regions, as CYP2D8P-CYP2D6-CYP2D7, whereas in humans and orangutans they were ordered as CYP2D8P-CYP2D7P-CYP2D6. The comparison of the CYP2D6 nucleotide sequence from the genome with that of the complete cds (Genbank accession number: DQ282164) revealed that the nucleotide sequence in a middle region of the genomic CYP2D6 in the database is different from the EST sequence (Fig. 2). We further performed PCR amplifications of the genomic DNA of the chimpanzee using specific primer pairs based on the annotated chimpanzee CYP2D6 and CYP2D7 sequences (Table 1). The sequences obtained with the CYP2D7- and CYP2D6-specific primers that are based on the NCBI annotation are orthologous to the human CYP2D6 and CYP2D7P, respectively. These results indicate that the CYP2D6 gene annotated in the reference genome of the chimpanzee should in fact be CYP2D7, and the predicted CYP2D7 should in fact be CYP2D6. This observation is strongly supported by an orthologous relationship of SINEs and LINEs between the chimpanzee and human genomes (Fig. 1).

In summary, we examined gene organization in the CYP2D subfamily in primates, other vertebrates and an invertebrate. The results revealed three findings: first, the origin of this subfamily could be traced back to a stem lineage of amniotes and amphibians; second, CYP2D6 and CYP2D8P in humans were already present before the divergence between New World monkeys and Catarrhini; and third, the expansion of CYP2D genes seems to reflect or be affected by the environment. We hope that gene organization in the CYP2D subfamily will be precisely investigated in various other species to confirm these findings. Future work including phylogenetic analyses and detection of gene conversions should further elucidate the molecular evolution of this gene cluster.

The authors are indebted to the Max Planck Institute for Biology and the Primate Research Institute of Kyoto University for their kind contribution to sample collection. This work was supported by Grant-in-Aid for Scientific Research (B) (21370106).

REFERENCES

Bradford, L. (2002) CYP2D6 allele frequency in European Caucasians, Asians, Africans and their descendants. Pharmacogenomics 3, 229–243.
Chen, J., Férec, C., and Cooper, D. (2007) Mechanism of Alu integration into the human genome. Genomic Med. 1, 9–17.
Felsenstein, J. (2009) PHYLIP (Phylogeny Inference Package) ver. 3.69. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle, USA.
Fonne-Pfister, R., and Meyer, U. (1988) Xenobiotic and endobiotic inhibitors of cytochrome P-450dbl function, the target of the debrisoquine/sparteine type polymorphism. Biochem. Pharmacol. 37, 3829–3835.
Goldstone, J. V., McArthur, A. G., Kubota, A., Zanette, J., Parente, T., Jönsson, M. E., Nelson, D. R., and Stegeman, J. J. (2010) Identification and developmental expression of the full complement of Cytochrome P450 genes in Zebrafish. BMC Genomics 11, 643.
Gonzalez, F. J., and Nebert, D. W. (1990) Evolution of the P450 gene superfamily: animal-plant ‘warfare’, molecular drive and human genetic differences in drug oxidation. Trends Genet. 6, 182–186.
Hanioka, N., Kimura, S., Meyer, U. A., and Gonzalez, F. J. (1990) The human CYP2D locus associated with a common genetic defect in drug oxidation: A G1934→A base change in intron 3 of a mutant CYP2D6 allele results in an aberrant 3’ splice recognition site. Am. J. Hum. Genet. 47, 994–1001.
Heim, M., and Meyer, U. (1992) Evolution of a highly polymorphic human cytochrome P450 gene cluster: CYP2D6. Genomics 14, 49–58.
Hua, Z., Zhang, Q. Y., Su, T., Lipinskas, T. W., and Ding, X. (1997) cDNA cloning, heterologous expression, and characterization of mouse CYP2G1, an olfactory-specific steroid hydroxylase. Arch. Biochem. Biophys. 340, 208–214.
Ingelman-Sundberg, M. (2005) Genetic polymorphisms of cytochrome P450 2D6 (CYP2D6): clinical consequences, evolutionary aspects and functional diversity. Pharmacogenomics J. 5, 6–13.
Jones, D. T., Taylor, W. R., and Thornton, J. M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282.
Jurka, J., Kapitonov, V., Pavlicek, A., Klonowski, P., Kohany, O., and Walichiewicz, J. (2005) Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467.
Kimura, S., Umeno, M., Skoda, R., Meyer, U., and Gonzalez, F. (1989) The human debrisoquine 4-hydroxylase (CYP2D) locus: sequence and identification of the polymorphic CYP2D6 gene, a related gene, and a pseudogene. Am. J. Hum. Genet. 45, 889–904.
Kohany, O., Gentles, A., Hankus, L., and Jurka, J. (2006) Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics 7, 474.
Kubota, A., Stegeman, J. J., Goldstone, J. V., Nelson, D. R., Kim, E. Y., Tanabe, S., and Iwata, H. (2011) Cytochrome P450
genes, pseudogenes and alternative-splice variants. Pharmacogenetics 14, 1–18.
Page, R. D. (1996) TreeView: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 12, 357–358.
Raimundo, S., Toscano, C., Klein, K., Fischer, J., Griese, E., Eichelbaum, M., Schwab, M., and Zanger, U. (2004) A novel intronic mutation, 2988G>A, with high predictivity for impaired function of cytochrome P450 2D6 in white subjects. Clin. Pharmacol. Ther. 76, 128–138.
Reed, C. J. (1993) Drug metabolism in the nasal cavity: relevance to toxicology. Drug Metab. Rev. 25, 173–205.
Rhesus Macaque Genome Sequencing and Analysis Consortium. (2007) Evolutionary and Biomedical Insights from the Rhesus Macaque Genome. Science 316, 222–234.
Rogers, J., Garcia, R., Shelledy, W., Kaplan, J., Arya, A., Johnson, Z., Bergstrom, M., Novakowski, L., Nair, P., Vinson, A., et al. (2006) An initial genetic linkage map of the rhesus macaque (Macaca mulatta) genome using human microsatellite loci.
Genomics 87, 30–38. CYP2 genes in the common cormorant: Evolutionary rela- Ronquist, F., and Huelsenbeck, J. P. (2003) MrBayes 3: tionships with 130 diapsid CYP2 clan sequences and chem- Bayesian phylogenetic inference under mixed models. ical effects on their expression. Comp. Biochem. Physiol. C Bioinformatics 19, 1572–1574. Toxicol. Pharmacol. 153, 280–289. Saitou, N., and Nei, M. (1987) The neighbor-joining method: a Larsson, P., Pettersson, H., and Tjälve, H. (1989) Metabolism of new method for reconstructing phylogenetic trees. Mol. aflatoxin B1 in the bovine olfactory mucosa. Carcinogene- Biol. Evol. 4, 406–425. sis 10, 1113–1118. Schwenk, K. (1995) Of tongues and noses: chemoreception in Masimirembwa, C., Persson, I., Bertilsson, L., Hasler, J., and lizards and snakes. Trends Ecol. Evol. 10, 7–12. Ingelman-Sundberg, M. (1996) A novel mutant variant of Sistonen, J., Sajantila, A., Lao, O., Corander, J., Barbujani, G., the CYP2D6 gene (CYP2D6*17) common in a black African and Fuselli, S. (2007) CYP2D6 worldwide genetic variation population: association with diminished debrisoquine shows high frequency of altered activity variants and no hydroxylase activity. Br. J. Clin. Pharmacol. 42, 713–719. continental structure. Pharmacogenet. Genomics 17, 93– McConkey, E. (2004) Orthologous numbering of great ape and 101. human is essential for comparative genomics. Smit, A. F. A., Hubley, R., and Green, P. (1996–2010) Repeat- Cytogenet. Genome Res. 105, 157–158. Masker Open-3.0. . Mizutani, T. (2003) PM frequencies of major CYPs in Asians and Tamura, K., Dudley, J., Nei, M., and Kumar, S. (2007) MEGA4: Caucasians. Drug Metab. Rev. 35, 99–106. Molecular Evolutionary Genetics Analysis (MEGA) software Nef, P., Heldman, J., Lazard, D., Margalit, T., Jaye, M., version 4.0. Mol. Biol. Evol. 24, 1596–1599. Hanukoglu, I., and Lancet, D. (1989) Olfactory-specific cyto- The Chimpanzee Sequencing and Analysis Consortium. (2005) chrome P-450. 
cDNA cloning of a novel neuroepithelial Initial sequence of the chimpanzee genome and comparison enzyme possibly involved in chemoreception. J. Biol. with the human genome. 7Nature 43 , 69–87. Chem. 264, 6780–6785. Whelan, S., and Goldman, N. (2001) A general empirical model Nelson, D. R. (2011) Progress in tracing the evolutionary paths of protein evolution derived from multiple protein families of cytochrome P450. Biochim. Biophys. Acta 1814, 14–18. using a maximum-likelihood approach. Mol. Biol. Evol. 18, Nelson, D. R., Zeldin, D. C., Hoffman, S. M., Maltais, L. J., 691–699. Wain, H. M., and Nebert, D. W. (2004) Comparison of cyto- Xie, H., Kim, R., Wood, A., and Stein, C. (2001) Molecular basis chrome P450 (CYP) genes from the mouse and human of ethnic differences in drug disposition and response. genomes, including nomenclature recommendations for Annu. Rev. Pharmacol. Toxicol. 41, 815–850.