View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Elsevier - Publisher Connector

FEBS Letters 584 (2010) 4775–4782

journal homepage: www.FEBSLetters.org

Phylogenetic analysis and evolution of aromatic amino acid hydroxylase ⇑ Jun Cao a, , Feng Shi b, Xiaoguang Liu a, Guang Huang a, Min Zhou a

a Institute of Life Science, Jiangsu University, Xuefu Road 301, Zhenjiang 212013, Jiangsu, PR China b Shandong Lvdu Bio-technique Industry, 169# Huanghe 2 Road, Binzhou 256600, Shandong, PR China

article info abstract

Article history: A study was performed to investigate the phylogenetic relationship among AAAH members and to Received 31 August 2010 statistically evaluate sequence conservation and functional divergence. In total, 161 were Revised 29 October 2010 identified from 103 species. Phylogenetic analysis showed that well-conserved subfamilies exist. Accepted 5 November 2010 Exon–intron structure analysis showed that the structures of AAAH were highly conserved Available online 10 November 2010 across some different lineage species, while some species-specific introns were also found. The Edited by Takashi Gojobori dynamic distribution of ACT domain suggested one gene fusion event has occurred in eukaryota. Significant functional divergence was found between some subgroups. Analysis of the site-specific profiles revealed critical amino acid residues for functional divergence. This study highlights the Keywords: Aromatic amino acid hydroxylase molecular evolution of this family and may provide a starting point for further experimental Phylogenetic analysis verifications. Evolution Ó 2010 Published by Elsevier B.V. on behalf of the Federation of European Biochemical Societies.

1. Introduction [5]. Alterations in the brain concentration of some neurotransmit- ter can produce behavior abnormalities such as depression, insom- Neurotransmitters are endogenous chemicals that are released nia, autism, eating disorders, despair, misery, aggression and from neurons. They usually act on receptors on post-synaptic cells schizophrenia [6–11]. In addition, hydroxylation of PAH is an and produce functional changes in the target cells. important step in phenylalanine catabolism and neurotransmitter (5-HT), dopamine (DA), norephinephrine (NE), epinephrine and biosynthesis and is linked to a severe variant of phenylketonuria their metabolites are conventional neurotransmitters in the brain. (PKU) in humans [1]. Calvo et al. (2008) also show that Pah knock- The synthesis of these neurotransmitters needs some catalytic out worms display serious cuticle abnormalities [12]. TH deficiency reaction of a class of rate-limiting , aromatic amino acid can cause the autosomal recessive form of dopa responsive dysto- hydroxylases (AAAH). AAAHs are family of non-heme, iron(II)- nia (Segawa’s disease) [13]. And several researches have shown dependent enzymes. According to the different substrates, they that aging are often associated with the TPH activity [3,14]. can be divided into three categories: phenylalanine-4-hydroxylase In phylogenetic terms, the structural features or expression pro- (PAH), (TH) and files of some AAAH homologs have been described partially in hu- (TPH). Among them, PAH catalyzes the conversion of phenylala- man [15,16], amphioxus [17], amoeba [18], tobacco hornworm nine to tyrosine, the rate limiting step in the catabolism of phenyl- [19]. Given the relatively high amino acid sequence similarity alanine [1]. TH catalyzes the hydroxylation of tyrosine to L-dopa, found among AAAH proteins [4], surprisingly, few studies have the rate-limiting step in the synthesis of catecholamines such as investigated their relationships. Since phylogenetic studies of pro- dopamine and noradrenaline [2]. And TPH catalyzes the rate tein family is a valuable tool to determine conserved or divergent limiting step in the biosynthesis of serotonin [3]. In the chemical regions, which can potentially lead to functional predictions [20]. reaction, all three enzymes require the reduced pterin tetrahydro- In this study we elucidated the evolutionary history of the AAAH biopterin as , as well as molecular oxygen and iron, to by a comprehensive bioinformatics/phylogenetic hydroxylate their amino acid substrate [4]. approach, which might lay the framework for further research into The aromatic amino acid hydrogenase-mediated neurotrans- function of these genes. mission plays a crucial role in mental and physical health. Some re- searches have also shown that the decline in physiological 2. Materials and methods functions is often associated with (or accompanied by) changes in various neurotransmitter levels in the central nervous system 2.1. Data collection

⇑ Corresponding author. To identify members of AAAH protein family, the key words of E-mail address: [email protected] (J. Cao). ‘‘aromatic amino acid hydroxylases’’ are firstly used as a query to

0014-5793/$36.00 Ó 2010 Published by Elsevier B.V. on behalf of the Federation of European Biochemical Societies. doi:10.1016/j.febslet.2010.11.005 4776 J. Cao et al. / FEBS Letters 584 (2010) 4775–4782 search against the NCBI database and superfamily HMM library and eight discrete categories and the Ka/Ks values are computed by cal- genome assignment server (http://supfam.org/SUPERFAMILY/)is culating the expectation of the posterior distribution [26]. also used as supplement. At the same time, the BLASTP and TBLASTN programs were used to search NCBI, Ensembl and FlyBase. 2.4. Functional divergence analysis All amino acid sequences are examined for potential aromatic ami- no acid hydroxylase domain for their authenticity using CD-Search To estimate the level of functional divergence and predict analyses. important amino acid residues for these functional differences among AAAH subfamilies, the coefficients of type-I functional divergence were calculated using the method suggested by Gu 2.2. Sequence alignment and phylogenetic analysis et al. [29,30]. The analysis was carried out with Diverge (version 2.0). This method is based on maximum likelihood procedures to Firstly, amino acid sequences of AAAH genes were aligned estimate significant changes in the site-specific shift of evolution- using Clustal X v2.0.12 program [21] with the default parameters. ary rate or site-specific shift of amino acid properties after the PHYLIP v.3.69 was used for our phylogenetic analysis employing emergence of two paralogous sequences. The advantage of this the neighbor joining (NJ) method with 1000 bootstrap values. method is that it uses amino acid sequences and, thereby, is not Meanwhile, to further verify the reliability of the NJ tree, maxi- sensitive to saturation of synonymous sites. Type I designates ami- mum likelihood (ML) analysis was also performed using ProtTest no acid configurations that are very conserved in gene 1 but highly v2.4 [22]. It is based on the PhyML program for ML optimizations, variable in gene 2, or vice versa, implying that these residues have and the best-fit model considers the relative rates of amino acid experienced altered functional constraints [29]. The coefficients of replacement and the evolutionary constraints imposed by conser- functional divergence values are significantly greater than 0, sug- vation of protein structure and function. The Akaike Information gesting site-specific altered selective constraints or a radical shift criterion (AIC) was implemented in ProtTest v2.4 to estimate the of amino acid physiochemical properties after gene duplication. most appropriate model of amino acid substitution for tree build- Moreover, a site-specific posterior analysis was used to predict ing analyses. Then, according to the best-fit model predicted by amino acid residues that were crucial for functional divergence ProtTest v2.4, a rooted maximum likelihood tree was constructed [31]. using the PhyML v3.0 online program [23], and the reliability of interior branches was assessed with 1000 bootstrap resamplings. Finally, the phylogenetic trees were displayed using MEGA v4 3. Results and discussion [24]. 3.1. Identification of AAAH genes 2.3. Positive selection assessment We collected AAAH family gene sequences through database Identification of site-specific positive and purifying selection search. The result revealed the presence of AAAH homologs in was calculated with The Selecton Server using a Bayesian inference much of organisms. For Eukaryota, we did not find definitive AAAH approach [25,26]. Since an evolutionary model describes in proba- orthologs in Fungi and higher Plants. While AAAH analogs mainly bilistic terms how characters are expressive enough to describe the exist in Alphaproteobacteria, Betaproteobacteria, Gammaproteo- biological reality. This Server can implement several evolutionary bacteria, Bacteroidetes, Actinobacteria and so on in Prokaryota. models, such as MEC (Mechanistic Empirical Combination Model), Considering incomplete nucleotides and proteins sequence dat-

M5 (gamma), M7 (beta), M8a (xs = 1) and M8 (xs P 1), each of abases of some species, the number of AAAH genes collected in a which assumes different biological assumptions and enable con- species varied from 1 to 5 in Eukaryota and only an AAAH gene ex- trasting different hypotheses through testing which model better isted in one species in Prokaryota. In total, 102 and 59 AAAH se- fits the data. M8 allows for positive selection operating on the pro- quences from 44 and 59 selected species in Eukaryota and tein. A proportion p0 of the sites are drawn from a beta distribution Prokaryota, respectively, were obtained (Supplementary data 1). (which is defined in the interval [0,1]), and a proportion p1(=1 p0) of the sites are drawn from an additional category xs (P1). Thus, 3.2. Phylogenetic analyses of AAAH gene lineages sites drawn from the beta distribution are sites experiencing puri- fying selection, whereas sites drawn from the xs category are sites To explore the phylogenetic relationship among AAAH paralo- experiencing either neutral or positive selection. M8a model is gous, a neighbor joining phylogenetic tree with 161 AAAH genes similar to the M8 model, except for the fact that it does not allow from 103 species was inferred from the amino acid sequences for positive selection by setting xs = 1. Thus, only neutral and puri- using the PHYLIP v.3.69. At the same time, a rooted maximum-like- fying selection are allowed here. M7 model is similar to M8, except lihood (ML) phylogenetic tree was aslo constructed using the for the fact that it assumes only a beta distribution with no addi- PhyML v3.0 online program [23] under the best-fit model (LG+I+G) tional category. Thus it allows mainly for purifying selection in for amino acid substitution was selected by ProtTest v2.4 [22] with the protein. M5 model assumes Ka/Ks among sites are gamma dis- discrete gamma distribution in four categories. All parameters tributed and thus may allow for purifying, neutral, and positive (gamma shape = 1.051; proportion of invariants = 0.008) were esti- selection [27]. The MEC model differs from the M models in that mated from the dataset. Tree topology assessed by ML method was it allows instantaneous substitutions between pairs of codons that substantially similar to the NJ tree (data not shown). Fig. 1 shows differ at 2 or 3 codon positions and in its ability to take into ac- the consensus phylogeny obtained for these sequences. As a whole, count the different replacement probabilities between amino acids eukaryotic AAAH genes originated from prokaryotic ones. In pro- [28]. The advantage of the MEC model over the other models pre- karyotes, AAAH members within the same species tend to be sented here is that by treating different amino-acid replacements grouped in the same branch, such as Gammaproteobacteria (de- differently, Ka is computed differently: under the MEC model, a po- noted by solid square, AAAH-gamma); Betaproteobacteria (de- sition with radical replacements will obtain a higher Ka value than noted by hollow diamond, AAAH-beta); Alphaproteobacteria a position with more moderate replacements. These models all as- (denoted by hollow triangle, AAAH-alpha) and so on. Phylogenetic sume a statistical distribution to account for heterogenous Ka/Ks analysis showed that three distinct clusters, named PAH (bootstrap values among sites. And the distributions are approximated using value is 198), TPH (bootstrap value is 607) and TH (bootstrap value J. Cao et al. / FEBS Letters 584 (2010) 4775–4782 4777

Fig. 1. Phylogenetic tree of 161 aromatic amino acid hydroxylases. This NJ tree was inferred from the amino acid sequences alignment using the PHYLIP v.3.69. This tree was validated by ML method under the best-fit model LG+I+G (selected by ProtTest v2.4) with discrete gamma distribution in four categories. All parameters (gamma shape = 1.051; proportion of invariants = 0.008) were estimated from the dataset. Red, blue and green evolutionary branches denote TH, TPH and PAH subfamilies in eukaryote, respectively. Evolutionary branch in black denotes AAAH members in prokaryote, in which different symbols are marked in different prokaryotic species, such as, hollow round: Actinobacteria; solid round: Green non-sulfur bacteria; hollow square: Bacteroidetes; solid square: Gammaproteobacteria; hollow triangle: Alphaproteo- bacteria; hollow diamond: Betaproteobacteria; solid diamond: Deltaproteobacteria; solid triangle in green: Acidobacteria; solid square in green: Archaea Korarchaeota; solid round in green: Firmicutes; solid diamond in green: Chlamydiae. The dynamic distribution of ACT domain in AAAH family has also been marked in this figure. That is, AAAH members with ACT domain are shown in red and those without ACT domain in black. is 477), were unambiguously separated from Protists and lower families. Our phylogenetic analysis suggests that the AAAH origi- Plants (Mosses/Green algae) before the divergence of Choanofla- nated by duplication and divergence of a common protein at the gellates in eukaryote. From Fig. 1, we also inferred that two major base of the eukaryotic tree. Both small-scale and large-scale gene duplications had occurred in TPH and TH subfamilies. TPH duplica- duplications are known to contribute to the complexity of eukary- tion, which occurred in the vertebrate lineages, ultimately led to otic organisms [34]. The high level of sequence identity between the emergence of two lineages which evolved into TPH1 (bootstrap different subfamilies suggests evolutionarily conserved functions. value is 1000) and TPH2 (bootstrap value is 1000). However, for TH While the presence of AAAHs in phylogenetically distant taxo- duplication, the condition is more complicated. Two TH genes nomic groups such as bacteria, insects and mammals highlight emerged as a consequence of a whole genome duplication before their general functional importance. the divergence of jawed vertebrates, but TH2 (bootstrap value is 999) was secondarily lost in eutherians [32]. Since whole genome 3.3. Exon-intron evolution of the AAAH family genes duplication has occurred in the ancestral vertebrate [33], and anal- ysis of a phylogenetic perspective may provide the basis for under- To examine the possible mechanisms of structural evolution of standing the functional diversity within conserved protein AAAH paralogous, we compared the exon-intron structures of 4778 J. Cao et al. / FEBS Letters 584 (2010) 4775–4782

Fig. 2. Conservation and variability of intron positions and phases in eukaryotic AAAH family. Three subgroups [PAH (a); TPH (b); TH (c)] are shown, respectively in here. The 0, 1 and 2 phase introns are marked with black, red and green lines, respectively. To facilitate the presentation, some intron such as Ia, Ig, Ii and so on are also named artificially according to their different inserted positions. individual AAAH genes in Eukaryota. Fig. 2 provides a detailed in the organization of gene. In Fig. 2, some conserved introns (such illustration of the distribution and position of introns within each as intron Ia, Ig, Ii, Ij, Il, Im, In) are also marked. While, from Fig. 2, of the AAAH paralogous. In order to facilitate the presentation, we also find that the gene structures of Arthropods, Nenatodes and we named some introns, such as intron Ia, Ig, Ih and so on accord- Flatworms for TH, TPH and PAH seem to have fewer conserved in- ing to their different inserted positions. In general, the positions of trons when compared to other eukaryotes. From a genomic per- some spliceosomal introns are conserved in orthologous genes spective, changes in exon-intron structures appear to be from protists, sea anemone, insect to mammalia whereas others predominantly dependent on lineage-specific trends [36]. Our are lineage-specific. Such as, intron Id is insect-specific only for analysis about AAAH provides further support to the hypothesis TPH and not likely gained by the ancestor. As well, intron Ik exists that intron conservation is usually found in slow-evolving lineages not only in TH subfamily of Vertebrata, Hemichordata and Echino- [37], but is rare in fast-evolving species like Caenorhabditis elegans dermate but also in PAH subfamily of Chordata and Echinoder- and Drosophila melanogaster [38,39]. This lineage-specific intron mate, but in other evolutionary branches (such as in TPH), this evolution trends might be connected with generation time and intron was lost. Intron Io exists specifically in PAH of ascidians population size, which might affect whether species evolve quickly and vertebrates, while is lost in Canis lupus (Fig. 2a). Intron Ih exists or slowly. Perhaps, introns are disfavoured to gene expression in not only in vertebrata TH gene but also in insect TPH lineage. So it these species that are under strong selective pressure for short seems that a intron gain has been occured in these groups (Fig. 2b genome replication time. And chosen in part for their short gener- and c). It has been suggested that abundance of intron loss occured ation times, for which short replication times are presumably an after segmental duplication [35]. Our study about TH and TPH advantage [39]. duplication also proved this. Excepting the duplicate gene TH2, in- Therefore, the exon-intron structure of the AAAH family is tron Is exists in all other TH group (Fig. 2c). So, the duplication highly dynamic such that evolution of this gene family must have leading to the emergence of TH1 and TH2 might promote the loss involved numerous intron losses or gains. In general, the structural of intron Is in TH2. This phenomena of intron loss following gene diversity of gene family members is a mechanism for the evolution duplication also occurs in TPH, in which intron Is is lost in TPH1 of multiple gene families, while intron loss or gain can be an (Fig. 2b). It is thus clear that duplication plays an important role important step in generating this structural diversity and complex- J. Cao et al. / FEBS Letters 584 (2010) 4775–4782 4779

Fig. 2 (continued) ity [31,40]. In this study, we analyzed the structural diversity of intron loss or gain events have occurred during the expansion of AAAH genes and found that intron loss and gain events occurred AAAH genes and generated diversity of gene structure. during the expansion and structural evolution of AAAH paralogous. And the number and position of intron loss or gain was distinctly 3.4. The dynamic distribution of ACT domain in AAAH genes different among them. On the average, intron positions have been shown to be remarkably well conserved over long evolutionary The ACT domain is one of many different small molecule-bind- time intervals [41]. In this paper, we observed that the intron posi- ing domains characterized by high sequence divergence and evolu- tions and phases of the AAAH family genes were well-conserved tionary mobility, which can be fused into a wide variety of proteins even in almost all of the lineages (Fig. 2). And some lineage-specific in the course of protein evolution [42]. The majority of these can 4780 J. Cao et al. / FEBS Letters 584 (2010) 4775–4782 bind amino acids that function to regulate some aspect of amino Table 1 acid metabolism, such as ACT domains in acetohydroxyacid syn- Likelihood values and parameter estimates for the AAH genes. thase [43]; 3-phosphoglycerate dehydrogenase [44]; chorismate Model Gene Ka/Ksa Log- Positively selected mutase [45] and so on. To describe the distribution of ACT domain branches likelihood sites in AAAH genes, CD conserved domain analyses were also per- MEC TH 0.092 23076.2 not found formed for all AAAH proteins. The results indicated that ACT do- PAH 0.109 22817.5 3, 4, 6, 11, 13 main is not present in all investigated species but first appeared TPH 0.176 17294.6 41, 167, 168 TPH1 0.083 7435.92 136 in the protists tetrahymena and its distribution was obviously dy- TPH2 0.073 9278.76 40 namic. Fig. 1 marked this change. AAAH genes with ACT domain AAAH-alpha 0.193 7902.06 2, 53 are shown in red and genes without ACT domain in black. That AAAH-beta 0.249 8150.08 2, 3, 5, 8, 31, 76, 80, is, ACT domain did not appear in prokaryotic AAAH genes but 95, 173, 215, 273, was later obtained in eukaryotic ones in evolution. 276, 277, 287, 289, 295 AAAH-gamma 0.220 10730.8 82, 85, 208 In general, the origin and evolution of multidomain proteins is usually driven by diverse processes including fusion, fission, do- M5(gamma) TH 0.174 23787.3 not found PAH 0.165 23613.7 not found main shuffling by genetic recombination, which play important TPH 0.238 17749.4 not found roles in the generation of new protein architectures [46,47]. The TPH1 0.084 7473.3 not found first presence of linked ACT domain in tetrahymena strongly sug- TPH2 0.077 9405.83 not found gests the gene fusion event occurred at least as early as the protists AAAH-alpha 0.239 8107.92 not found AAAH-beta 0.221 8383.13 not found appearing. Although different from most of ACT-containing pro- AAAH-gamma 0.252 11034.1 not found teins described above, the ACT domains in AAAH can not bind ami- M7(beta) TH 0.162 23736.3 not found no acids or substrate analogs [1,48], an allosteric effect has been PAH 0.152 23610.8 not found demonstrated in the ACT domain of PAH in mammalian [49]. So, TPH 0.204 17729.6 not found the acquisition of ACT domain is likely to reflect the demand for TPH1 0.077 7463.58 not found greater sophistication in protein function in complex eucaryotic TPH2 0.072 9395.16 not found species. Strangely, some eukaryotic AAAH genes (such as AAAHs AAAH-alpha 0.194 8097.31 not found AAAH-beta 0.201 8379.84 not found in moss, green algae and apicomplexans; THs in flatworms; TPH AAAH-gamma 0.211 11024.6 not found and TH in nenatodes; PAH in fruitfly; TPH and PAH in sea squirt; M8a(xs = 1) TH 0.173 23750.2 not found TPH2 in platypuses) have not ACT domain. We have no clear expla- PAH 0.158 23601.2 4 nation for this status, but it may be due to the functional inactiva- TPH 0.210 17731.5 not found tion or change of some genes. Overall, the presence of ACT domain TPH1 0.077 7466.88 not found in new AAAH may have had a role in contributing to the higher TPH2 0.073 9396.42 not found AAAH-alpha 0.203 8097.7 not found complexity of eucaryota. The effect of this on the structure and AAAH-beta 0.204 8379.45 not found function of closely related proteins in different species still remains AAAH-gamma 0.219 11023.9 not found to be examined, but our findings suggest that acquisition of ACT M8(xs P 1) TH 0.162 23752.2 not found domain may play an important role in protein evolution. PAH 0.166 23597.1 4, 5, 6, 11, 20 TPH 0.228 17741.1 not found 3.5. Variable selective pressures among amino acid sites TPH1 0.082 7435.92 136 TPH2 0.074 17741.1 not found AAAH-alpha 0.205 8092 not found The Ka/Ks metric is defined to measure selection pressure on AAAH-beta 0.252 8348 not found amino acid substitutions. A Ka/Ks ratio greater than 1 suggests po- AAAH-gamma 0.248 11026.4 4, 82 sitive selection and the ratio less than 1 suggests purifying selec- a The Ka/Ks ratio is the an average over all sites of gene branch alignments. tion. Amino acids in a protein sequence are expected to be under different selective pressure and to have different underlying Ka/ Ks ratios. To analyze positive or negative selection of specific ami- no acid sites within the full-length protein sequences of AAAH se- all others (Table 1). For example the MEC model predicted possible quences for different branches, substitution rate ratios of non- positively selected sites in all genes except TH. In contrast, the M8 synonymous (Ka) versus synonymous (Ks) mutations were calcu- model found no positively selected sites for TPH, TPH2, AAAH-al- lated with The Selecton Server using a Bayesian inference ap- pha, and AAAH-beta. In addition, while both models predicted proach [26]. The results showed that the Ka/Ks ratios of the five some sites for AAAH-gamma, far more were predicted by MEC subclades in eukaryota and three subclades in prokaryota are sig- (16) than M8 (2) and none of the sites were in common between nificantly different (Table 1). In general, compared with the five the two methods. Interestingly, much of the positively selected eukaryotic branches, three prokaryotic branches have higher Ka/ sites inferred in our research occurred on the branches right before Ks, suggesting an accelerated evolutionary rate in prokaryotic (TPH) or after (TPH1 and TPH2) duplication events. Functional dif- AAAH group. This study agrees with previous studies implying that ferentiation of duplicate genes is extremely important for genetic shorter generation times (or higher replication speed) are associ- novelty. Based on our phylogenetic analyses (Fig. 1), both TPH1 ated with accelerated evolutionary rates [50,51]. The Ka/Ks value and TPH2 come from the duplication of ancestral TPH on the emer- of TPH is higher and the Ka/Ks values of TPH1 and TPH2 are lower gence of vertebrate. And previous researches have also observed in Eukaryota. Despite the differences in Ka/Ks values among these that asymmetric evolution rate often consist in the duplicated subclades, all the estimated Ka/Ks values are substantially lower gene pairs [52,53]. So we suggest that TPH may play a broader role than 1, suggesting that the AAAH members within each subgroup before the onset of vertebrate, and the higher Ka/Ks in TPH could were under strong purifying selection pressure. We used MEC result from relaxation of functional constraint, and that TPH1 (Mechanistic Empirical Combination Model), M5 (gamma), M7 and TPH2 come from the TPH replication and likely refine and

(beta), M8a (xs = 1) and M8 (xs P 1) models to perform the tests. polarize the role of TPH in vertebrates, so they have a lower Ka/ The selection model M5 and M7 do not suggest presence of posi- Ks value. The putatively selected sites are also shown in Table 1, tively selected sites, whereas MEC and M8 models get similar re- which might be responsible for the functional divergence among sults for the PAH and TPH1 genes and give different results for hydroxylase genes. J. Cao et al. / FEBS Letters 584 (2010) 4775–4782 4781

Table 2 Functional divergence estimated in AAAH paralogous.

Comparison ha SEb LRTc N(0.5)d N(0.6)d TH/PAH 0.08 0.058434 1.874355 1 1 TH/TPH1 0.232 0.181894 1.62682 2 1 TH/TPH2 0.3 0.217097 1.909571 2 1 PAH/TPH1 0.0296 0.155206 0.036372 0 0 PAH/TPH2 0.6176 0.129162 22.86347 122 120 TPH1/TPH2 0.588 0.164921 12.71032 123 122 TH/AAAH-gamma 0.6128 0.054716 125.4314 75 56 TH/AAAH-alpha 0.5336 0.060708 77.258 53 46 TH/AAAH-beta 0.652 0.066395 96.43274 118 63 PAH/AAAH-gamma 0.5912 0.060716 94.81306 60 50 PAH/AAAH-alpha 0.632 0.068595 84.88775 90 56 PAH/AAAH-beta 0.7064 0.066342 113.378 119 102 TPH1/AAAH-gamma 0.3608 0.13534 7.106959 10 4 TPH1/AAAH-alpha 0.276 0.226091 1.490231 2 0 TPH1/AAAH-beta 0.5168 0.161215 10.27626 115 89 TPH2/AAAH-gamma 0.6208 0.148392 17.50178 122 114 TPH2/AAAH-alpha 0.5592 0.192622 8.427928 123 81 TPH2/AAAH-beta 0.812 0.204739 15.72941 129 128 AAAH-gamma/AAAH-alpha 0.3992 0.052172 58.54659 35 22 AAAH-gamma/AAAH-beta 0.3992 0.059226 45.43073 34 24 AAAH-alpha/AAAH-beta 0.1 0.059405 2.833705 3 2

a h is the coefficient of functional divergence. b SE: standard error. c LRT is a likelihood ratio test. d N(0.5) and N(0.6) means the numbers of divergent residues when the cut-off value is 0.5 and 0.6, respectively.

3.6. Analysis of functional divergence predicted to be functionally divergent were mapped onto topology models of human and Pseudomonas syringae AAAH members (Sup- Next, we further investigated whether amino acid substitutions plementary data 2). The predicted functional sites are not equally in the highly conserved hydroxylase domain could have caused distributed throughout the respective AAAH, but instead are clus- adaptive functional diversification. Type-I functional divergence tered in some areas. And several sites have been experimentally between gene clusters of the AAAH family were estimated by pos- verified, such as, P206S substitution can reduce thermal stability terior analysis using the DIVERGE program algorithms [29,30], and solubility of TPH2 [54]; The K274E mutation of PAH gene often which evaluates the shifted evolutionary rate and altered amino has physiological consequences related to amine neurotransmitter acid properties. Pairwise comparisons of paralogous members in function [55]; Y325 of PAH influences iron binding and coupling subfamilies AAAH-gamma, AAAH-beta and AAAH-alpha in pro- efficiency [56], and so on. The results of the functional divergence karyota and TH, PAH, TPH1 and TPH2 in eukaryota were carried analysis suggested that AAAH genes should be significantly func- out and the rate of amino acid evolution at each sequence position tionally divergent from each other, owing to the evolutionary rate was estimated, respectively. Our results, as shown in Table 2, indi- differences at some amino acid sites. Perhaps, amino acid muta- cated that the coefficient of functional divergence (h) values be- tions spured AAAH family genes to evolve some new functions tween AAAH subfamilies vary from 0.0296 to 0.812. These after divergence. Hence, functional divergence might reflect the observations indicated that there were significantly site-specific existence of long-term selective pressure. altered selective constraints on most members of the AAAH family, leading to subfamily-specific functional evolution after diversifica- 4. Conclusion tion. Moreover, some critical amino acid residues responsible for the functional divergence were predicted based on site-specific This study provides a comparative genomic analysis addressing profiles in combination with suitable cut-off values derived from the phylogenetic relationships and evolution of the aromatic ami- the posterior probability of each comparison. And the results indi- no acid hydroxylase gene family in Eukaryota. The results of the cate that distinct differences in the number and distribution of pre- phylogenetic analysis revealed that duplication and deletion had dicted sites for functional divergence within each pair. For occurred in TPH and TH subfamily. The exon-intron structure anal- example, no critical amino acid site was predicted for the subfam- ysis showed that the gene structures were diverse, while some in- ily PAH/TPH1 pair, while over 100 critical amino acids sites were tron positions and intron phases were highly conserved across predicted for the subfamily TPH2/PAH (122) and TPH1/TPH2 different lineages. Functional divergence analysis revealed critical (123) in eukaryota. In prokaryota, lower theta value (0.1) was de- amino acid residues, leading to subgroup-specific functional evolu- tected for the AAAH-alpha/AAAH-beta comparison. Interestingly, tion after their phylogenetic diversification. when compared AAAH members in eukaryotic and prokaryotic species, only 2 critical amino acid sites were predicted for the sub- Acknowledgment family TPH1/AAAH-alpha. Moreover, when the cut-off value is 0.6, no sites were predicted, implying a lower evolutionary rate be- This work is partly supported by Jiangsu University Senior Per- tween the TPH1/AAAH-alpha. Since TPH1 and TPH2 come from sonnel Research Grants to JC (10JDG027). the duplication of ancestral TPH on the emergence of vertebrate, higher seta value (0.588) between them indicates higher evolu- tionary rata after the emergence of two lineages TPH1 and TPH2. Appendix A. Supplementary data During the long period of evolution, the shifted evolutionary rate at specific amino acid sites with in each pair might facilitate the Supplementary data associated with this article can be found, in functional divergence of AAAH subfamilies. Next, residues the online version, at doi:10.1016/j.febslet.2010.11.005. 4782 J. Cao et al. / FEBS Letters 584 (2010) 4775–4782

References [28] Doron-Faigenboim, A. and Pupko, T. (2007) A combined empirical and mechanistic codon model. Mol. Biol. Evol. 24, 388–397. [29] Gu, X. (1999) Statistical methods for testing functional divergence after gene [1] Flatmark, T. and Stevens, R.C. (1999) Structural insight into the aromatic duplication. Mol. Biol. Evol. 16, 1664–1674. amino acid hydroxylases and their disease-related mutant forms. Chem. Rev. [30] Gu, X. (2001) Maximum-likelihood approach for gene family evolution under 99, 2137–2160. functional divergence. Mol. Biol. Evol. 18, 453–464. [2] Kopin, I.J., Gordon, E.K., Jimerson, D.C. and Polinsky, R.J. (1983) Relation between [31] Li, W., Liu, B., Yu, L., Feng, D., Wang, H. and Wang, J. (2009) Phylogenetic plasma and cerebrospinal fluid levels of 3-methoxy-4-hydroxyphenylglycol. analysis, structural evolution and functional divergence of the 12-oxo- Science 1219, 73–75. phytodienoate acid reductase gene family in plants. BMC Evol. Biol. 9, 90. [3] Sanatiago, M., Machado, A., Reinoso-Suarez, F. and Cano, F. (1988) Changes in [32] Yamamoto, K., Ruuskanen, J.O., Wullimann, M.F. and Vernier, P. (2010) Two biogenic amines in rat hippocampus during development and aging. Life Sci. tyrosine hydroxylase genes in vertebrates New dopaminergic territories 42, 2503–2508. revealed in the zebrafish brain. Mol. Cell Neurosci. 43, 394–402. [4] Teigen, K., McKinney, J.A., Haavik, J. and Martinez, A. (2007) Selectivity and [33] Dehal, P. and Boore, J.L. (2005) Two rounds of whole genome duplication in affinity determinants for ligand binding to the aromatic amino acid the ancestral vertebrate. PLoS Biol. 3, e314. hydroxylases. Curr. Med. Chem. 14, 455–467. [34] Gu, X., Wang, Y. and Gu, J. (2002) Age distribution of human gene families [5] Sparks, D., Markesbery, W. and Slevin, J. (1985) Age-associated alteration of shows significant roles of both large- and small-scale duplications in serotonergic synaptic neurochemistry in rat rostral hypothalamus. Neurobiol. vertebrate evolution. Nat. Genet. 31, 205–209. Aging. 6, 213–217. [35] Lin, H., Zhu, W., Silva, J.C., Gu, X. and Buell, C.R. (2006) Intron gain and loss in [6] Lerer, B., Gillon, D., Litchenberg, P., Gorfine, M. and Allolio, B. (1995) segmentally duplicated genes in rice. Genome Biol. 7, R41. Interrelationship of age, depression, and central serotonergic function: [36] Yandell, M., Mungall, C.J., Smith, C., Prochnik, S., Kaminker, J., Hartzell, G., evidence from fenfluramine challenge studies. Int. Psychogeriatr. 8, 83–102. Lewis, S. and Rubin, G.M. (2006) Large-scale trends in the evolution of gene [7] Maes, M., Ombelet, W., Verkerk, R., Bosmans, E. and Scharpe, S. (2001) Effects structures within 11 animal genomes. PLoS Comput. Biol. 2, e15. of pregnancy and delivery on the availability of plasma tryptophan to the [37] Putnam, N.H., Srivastava, M., Hellsten, U., et al. (2007) Sea anemone genome brain: relationships to delivery-induced immune activation and early anxiety reveals ancestral eumetazoan gene repertoire and genomic organization. and depression. Psychol. Med. 31, 847–858. Science 317, 86–94. [8] Van Praag, H.M. (2004) Can stress cause depression? Prog. Neuropsycho- [38] Roy, S.W. and Gilbert, W. (2005) Rates of intron loss and gain: implications for pharmacol. Biol. Psychiatry 28, 891–907. early eukaryotic evolution. Proc. Natl. Acad. Sci. USA 102, 5773–5778. [9] Maron, E. and Shlik, J. (2006) Serotonin function in panic disorder: important, [39] Roy, S.W. and Gilbert, W. (2006) The evolution of spliceosomal introns: but why? Neuropsychopharmacology 31, 1–11. patterns, puzzles and progress. Nat. Rev. Genet. 27, 211–221. [10] Popova, N.K. (2006) From genes to aggressive behavior: the role of [40] Zhang, Z. and Kishino, H. (2004) Genomic background predicts the fate of serotonergic system. Bioessays 28, 495–503. duplicated genes: evidence from the yeast genome. Genetics 166, 1995– [11] Jans, L.A., Riedel, W.J., Markus, C.R. and Blokland, A. (2007) Serotonergic 1999. vulnerability and depression: assumptions, experimental evidence and [41] Lander, E.S., Linton, L.M., Birren, B., et al. (2001) Initial sequencing and analysis implications. Mol. Psychiatry 12, 522–543. of the . Nature 409, 860–921. [12] Calvo, A.C., Pey, A.L., Ying, M., Loer, C.M. and Martinez, A. (2008) Anabolic [42] Anantharaman, V., Koonin, E.V. and Aravind, L. (2001) Regulatory potential, function of phenylalanine hydroxylase in Caenorhabditis elegans. FASEB J. 22, phyletic distribution and evolution of ancient, intracellular small-molecule- 346–3058. binding domains. J. Mol. Biol. 307, 1271–1292. [13] Bräutigam, C., Wevers, R.A., Jansen, R.J., Smeitink, J.A., de Rijk-van Andel, J.F., [43] Mendel, S., Elkayam, T., Sella, C., Vinogradov, V., Vyazmensky, M., Chipman, Gabreëls, F.J. and Hoffmann, G.F. (1998) Biochemical hallmarks of tyrosine D.M. and Barak, Z. (2001) Acetohydroxyacid synthase: a proposed structure hydroxylase deficiency. Clin. Chem. 44, 1897–1904. for regulatory subunits supported by evidence from mutagenesis. J. Mol. Biol. [14] Hussain, A.M. and Mitra, A.K. (2000) Effect of aging on tryptophan hydroxylase in 307, 465–477. rat brain: implications on serotonin level. Drug Metab. Dispos. 28, 1038–1042. [44] Schuller, D.J., Grant, G.A. and Banaszak, L.J. (1995) The allosteric ligand site in [15] Boularand, S., Darmon, M.C. and Mallet, J. (1995) The human tryptophan the V-max-type cooperative phosphoglycerate dehydrogenase. Nat. hydroxylase gene. An unusual splicing complexity in the 50-untranslated Struct. Biol. 2, 69–76. region. J. Biol. Chem. 270, 3748–3756. [45] Zhang, S., Pohnert, G., Kongsaeree, P., Wilson, D.B., Clardy, J. and Ganem, B. [16] Sugden, K., Tichopad, A., Khan, N., Craig, I.W. and D’Souza, U.M. (2009) Genes (1998) Chorismate mutase-prephenate dehydratase from Escherichia coli: within the serotonergic system are differentially expressed in human brain. study of catalytic and regulatory domains using genetically engineered BMC Beurosci. 10, 50. proteins. J. Biol. Chem. 273, 6248–6253. [17] Patton, S.J., Luke, G.N. and Holland, P.W. (1998) Complex history of a chromo- [46] Snel, B., Bork, P. and Huynen, M.A. (2002) Genomes in flux: the evolution of somal paralogy region: insights from amphioxus aromatic amino acid archaeal and proteobacterial gene content. Genome Res. 12, 17–25. hydroxylase genes and insulin-related genes. Mol. Biol. Evol. 15, 1373–1380. [47] Nilsen, T.W. and Graveley, B.R. (2010) Expansion of the eukaryotic proteome [18] Siltberg-Liberles, J., Steen, I.H., Svebak, R.M. and Martinez, A. (2008) The by alternative splicing. Nature 463, 457–463. phylogeny of the aromatic amino acid hydroxylases revisited by [48] Thorolfsson, M., Ibarra-Molero, B., Fojan, P., Petersen, S.B., Sanchez-Ruiz, J.M. characterizing phenylalanine hydroxylase from Dictyostelium discoideum. and Martinez, A. (2002) L-Phenylalanine binding and domain organization in Gene 427, 86–92. human phenylalanine hydroxylase: a differential scanning calorimetry study. [19] Gorman, M.J., An, C. and Kanost, M.R. (2007) Characterization of tyrosine Biochemistry 41, 7573–7585. hydroxylase from Manduca sexta. Insect Biochem. Mol. Biol. 37, 1327–1337. [49] Siltberg-Liberles, J. and Martinez, A. (2009) Searching distant homologs of the [20] Eisen, J.A. (1998) Phylogenomics: improving functional predictions for regulatory ACT domain in phenylalanine hydroxylase. Amino Acids 36, 235– uncharacterized genes by evolutionary analysis. Genome Res. 8, 163–167. 249. [21] Larkin, M.A., Blackshields, G., Brown, N.P., et al. (2007) Clustal W and Clustal X [50] Gu, X. and Li, W.H. (1992) Higher rates of amino acid substitution in rodents version 2.0. Bioinformatics 23, 2947–2948. than in humans. Mol. Phylogenet. Evol. 1, 211–214. [22] Abascal, F., Zardoya, R. and Posada, D. (2005) ProtTest: selection of best-fit [51] Duffy, S., Shackelton, L.A. and Holmes, E.C. (2008) Rates of evolutionary change models of protein evolution. Bioinformatics 21, 2104–2105. in viruses: patterns and determinants. Nat. Rev. Genet. 9, 267–276. [23] Guindon, S., Lethiec, F., Duroux, P. and Gascuel, O. (2005) PHYML Online – a [52] Conant, G.C. and Wagner, A. (2003) Asymmetric sequence divergence of web server for fast maximum likelihood-based phylogenetic inference. duplicate genes. Genome Res. 13, 2052–2058. Nucleic Acids Res. 33, W557–W559. [53] Fang, S., Ting, C.T., Lee, C.R., Chu, K.H., Wang, C.C. and Tsaur, S.C. (2009) [24] Tamura, K., Dudley, J., Nei, M. and Kumar, S. (2007) MEGA4: Molecular Molecular evolution and functional diversification of fatty acid desaturases Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. after recurrent gene duplication in Drosophila. Mol. Biol. Evol. 26, 1447–1456. 24, 1596–1599. [54] Cichon, S., Winge, I., Mattheisen, M., et al. (2008) Brain-specific tryptophan [25] Doron-Faigenboim, A., Stern, A., Mayrose, I., Bacharach, E. and Pupko, T. (2005) hydroxylase 2 (TPH2): a functional Pro206Ser substitution and variation in the Selecton: a server for detecting evolutionary forces at a single amino-acid site. 50-region are associated with bipolar affective disorder. Hum. Mol. Genet. 17, Bioinformatics 21, 2101–2103. 87–97. [26] Stern, A., Doron-Faigenboim, A., Erez, E., Martz, E., Bacharach, E. and Pupko, T. [55] Chao, H.M. and Richardson, M.A. (2002) Aromatic amino acid hydroxylase (2007) Selecton 2007: advanced models for detecting positive and purifying genes and schizophrenia. Am. J. Med. Genet. 114, 626–630. selection using a Bayesian inference approach. Nucleic Acids Res. 35, W506– [56] Miranda, F.F., Kolberg, M., Andersson, K.K., Geraldes, C.F. and Martinez, A. W511. (2005) The residue tyrosine 325 influences iron binding and [27] Yang, Z., Nielsen, R., Goldman, N. and Pedersen, A.M. (2000) Codon- coupling efficiency in human phenylalanine hydroxylase. J. Inorg. Biochem. substitution models for heterogeneous selection pressure at amino acid 99, 1320–1328. sites. Genetics 155, 431–449.