Molecular Evolution of PAS Domain-Containing Proteins of Filamentous Cyanobacteria Through Domain Shuffling and Domain Duplication
DNA Research 11, 69–81 (2004)

Rei Narikawa, Shinobu Okamoto, Masahiko Ikeuchi,∗ and Masayuki Ohmori

Department of Life Sciences (Biology), Graduate School of Arts and Sciences, University of Tokyo, Komaba, Meguro-ku, Tokyo 153-8902, Japan

(Received 15 December 2003; revised 11 March 2004)

Communicated by Satoshi Tabata
∗ To whom correspondence should be addressed. Tel. +81-3-5454-6641, Fax. +81-3-5454-4337, E-mail: [email protected] tokyo.ac.jp

Downloaded from https://academic.oup.com/dnaresearch/article/11/2/69/534432 by guest on 28 September 2021

Abstract

When the entire genome of a filamentous heterocyst-forming N2-fixing cyanobacterium, Anabaena sp. PCC 7120 (Anabaena) was determined in 2001, a large number of PAS domains were detected in signal-transducing proteins. The draft genome sequence is also available for the cyanobacterium, Nostoc punctiforme strain ATCC 29133 (Nostoc), that is closely related to Anabaena. In this study, we extracted all PAS domains from the Nostoc genome sequence and analyzed them together with those of Anabaena. Clustering analysis of all the PAS domains gave many specific pairings, indicative of evolutionary conservations. Ortholog analysis of PAS-containing proteins showed composite multidomain architecture in some cases of conserved domains and domains of disagreement between the two species. Further inspection of the domains of disagreement allowed us to trace them back in evolution. Thus, multidomain proteins could have been generated by duplication or shuffling in these cyanobacteria. The conserved PAS domains in the orthologous proteins were analyzed by structural fitting to the known PAS domains. We detected several subclasses with unique sequence features, which will be the target of experimental analysis.

Key words: cyanobacterium; PAS domain; domain shuffling; ortholog pair; molecular evolution

1. Introduction

Signal-transducing proteins usually have sensor modules and response modules.1 In bacteria, PAS domains,2 GAF domains,3 CBS domains,4 HAMP domains5 and FHA domains6 are mainly employed as the sensor modules, while histidine kinase (HK) domains,7 HPt domains,7 response regulator (RR) domains,7 adenylate cyclase domains,8 GGDEF domains,1 EAL domains,1 Ser/Thr kinase domains9 and DNA-binding domains are mainly used as the response modules.

The PAS domain is one of the important signaling modules that monitors changes in light, redox potential, oxygen, small ligands, or the overall energy level of a cell.10 It is widely distributed from bacteria to higher plants and animals. The PAS domain superfamily is highly diverse in sequence and in length but is conserved in the basic three-dimensional (3D) structure, consisting of PAS-core motif, helical connector and PAC motif. So far, five distinct structures are resolved: coumaric acid-binding blue light sensor PAS domain,11 heme-binding O2 sensor PAS domain,12,13 voltage sensor PAS domain,14 FMN-binding blue light sensor PAS domain,15 and small compound-binding PAS domain.16 However, there is still a huge number of PAS domains whose structure or function remains to be solved.

Kaneko et al. reported the complete sequence of the entire genome of a filamentous heterocyst-forming N2-fixing cyanobacterium, Anabaena sp. strain PCC 7120 (Anabaena).17 Later, domain analysis revealed that the Anabaena genome encodes a number of signal-transducing proteins including HK and RR proteins of the two-component regulatory system, PAS domain proteins, GAF domain proteins, and Ser/Thr-type protein kinase proteins.18 It was found that not only the number of such proteins but also the number of the signaling domains in single proteins is extremely large compared with other bacteria. In the total genome, 87 GAF domains were detected in 62 putative proteins, while 140 PAS domains were detected in 59 proteins. Moreover, many signaling proteins in Anabaena were characterized as multidomain proteins. For example, one gene (all2095) was predicted to encode ten PAS domains in tandem arrangement in addition to an HK domain. Another example, all0729, was predicted to encode three PAS domains, three GAF domains, one HK domain and two RR domains. To understand the complexity, we performed clustering analysis of the GAF and PAS domains within the Anabaena genome. It was found that only a few of them could be grouped together, while most of others were placed as single-component clades. The PAS domains in particular seem to be highly diverged according to the clustering analysis within the Anabaena genome. This suggests that, in most cases, even the tandemly arranged PAS domains were not generated by simple duplication. To study the evolutionary and physiological role of the multiple PAS domains, we must take into consideration the genome information of closely related cyanobacterial species.

Recently, genome projects have been finished or are in progress for more than ten cyanobacterial species. Of these, Nostoc punctiforme strain ATCC 29133 (Nostoc) is our choice for computational analysis in comparison with Anabaena. It is also a filamentous heterocyst-forming N2-fixing cyanobacterium. Nostoc has additional properties of symbiosis with plants and development of akinetes and hormogonia. Phylogenetically, it is closely related to Anabaena according to sequence analysis of rRNA and many protein coding genes, although significant differences in gene composition and cellular function such as symbiosis have been documented.19 In this study, we extracted all PAS domains from the draft genome sequence of Nostoc and analyzed them together with Anabaena. Detailed comparison between Nostoc and Anabaena gave us deeper insights into the evolution of signaling domains and proteins in cyanobacteria. Based on these analyses, we identified orthologous pairs of domains or genes as well as unique ones, which might have been generated by duplication or shuffling of domains, sets of domains, or genes. We further analyzed the orthologous PAS domains by structural fitting to the known domains.

2. Materials and Methods

2.1. Sequences

The whole set of potential proteins deduced from the complete genome of Anabaena was obtained from CyanoBase (http://www.kazusa.or.jp/cyano/cyano.html). A similar set of proteins from the draft sequence of Nostoc was obtained from NCBI (updated on 07-NOV-2002). Sequences from other species were also obtained from NCBI. The draft Nostoc sequence may contain some sequence errors and assembly errors. However, we detected only three frameshift mutations in the PAS domain-containing proteins of Nostoc. One of them, Npun3208-3209, is not likely due to a sequencing error because the mutation was a tandem duplication of 11 bp (see Results and Discussion, 3.4.2). The others (Npun5710-5709 and Npun2261-2260) may be caused by sequencing error or just pseudogenes. Regardless, they are categorized as Nostoc-specific proteins, and had little effect on the result of the analysis. Assembly error in the draft sequence may result in generation of hybrid ORFs. However, detailed comparison of the PAS domain-containing proteins did not show such an example except for Npun0339 (see Results and Discussion), which is, again, specific to Nostoc. In conclusion, the draft Nostoc sequence seems to be sufficient for the bioinformatic analysis presented here.

2.2. Informatics

Homology analysis was performed using the BLAST (NCBI-BLAST, version 2.1–2.2) and PSI-BLAST (version 2.1.1) programs20 running locally or on the Web (non-redundant GenBank, SwissProt in NCBI and GenomeNet and Cyanobase). Motif analysis was performed by Pfam21 and SMART22 searches locally or on the Web (http://www.sanger.ac.uk/Pfam/, http://smart.embl-heidelberg.de/). Multiple alignments were performed using CLUSTAL X.23 Phylogenetic trees were constructed using TreeView.24 Harplot analysis was performed with GENETYX®-MAC Ver. 10.1 (SOFTWARE DEVELOPMENT CO., LTD.). Secondary structure was predicted with PSIPRED.25

2.3. Detection of PAS domains

We previously detected 140 PAS domains from the Anabaena genome18 and constructed a sequence alignment. Based on this alignment, we built a custom profile HMM and extracted the PAS domains from Anabaena and Nostoc using the default search parameters of HMMER (version 2.2)26 with the cut off E-value of 10. The custom profile HMM and the extracted PAS sequences are available at the web site http://bio.c.u-tokyo.ac.jp/labs/ikeuchi/narikawa-resource. Moreover, we considered a PAS domain to be a false-positive and manually removed it whenever a significant part of the domain was also assigned to other known motifs having higher E-values. PAS domains from the genomes of 98 other prokaryote species were automatically detected when the E-value was over 0.1.

2.4. Clustering analysis

We performed bootstrap trials based on the multiple alignment of PAS domains from Anabaena and Nostoc. We regarded a group of PAS domains as distinct subclasses when clustered over the cutoff value (>500) of the bootstrap trials (1000).

2.5. Definition of ortholog

A specific definition of the ortholog is needed for multidomain signaling proteins. Since there is large variation in the conservation and composition of domains of such proteins, a BLAST search of whole proteins often gives confusing results. Here, we defined the ortholog of the multidomain signaling proteins as follows. First, we divided domains in a single protein into the sensor

mains. In total, 288 PAS domains were clustered into 64 subclasses, which might have kept common roles, if any, during evolution.
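
The subclass criterion of Section 2.4 can be illustrated with a short sketch. It assumes that bootstrap support has already been counted for each candidate clade by the alignment and tree software, and it only applies the stated rule that a clade must be recovered in more than 500 of the 1000 trials. The function name, the domain labels and the support counts are hypothetical and are not taken from the paper; nested clades that both pass the cutoff are not resolved here.

```python
from typing import Dict, FrozenSet, List

BOOTSTRAP_TRIALS = 1000   # number of bootstrap trials stated in Section 2.4
SUPPORT_CUTOFF = 500      # a clade must be recovered in more than this many trials


def supported_subclasses(clade_support: Dict[FrozenSet[str], int],
                         cutoff: int = SUPPORT_CUTOFF) -> List[FrozenSet[str]]:
    """Return the clades whose bootstrap count exceeds the cutoff, largest first."""
    kept = [clade for clade, count in clade_support.items() if count > cutoff]
    # Sort by descending size, then by member names, so the order is deterministic.
    return sorted(kept, key=lambda clade: (-len(clade), sorted(clade)))


# Hypothetical support counts out of BOOTSTRAP_TRIALS; labels are made up.
example = {
    frozenset({"ana_PAS_a", "nos_PAS_a"}): 987,
    frozenset({"ana_PAS_b", "nos_PAS_b", "nos_PAS_c"}): 612,
    frozenset({"ana_PAS_d", "nos_PAS_e"}): 500,   # equal to the cutoff, so dropped
    frozenset({"ana_PAS_f", "nos_PAS_g"}): 143,
}

subclasses = supported_subclasses(example)
for clade in subclasses:
    print(sorted(clade))
```

The comparison is strict because the text gives the cutoff as ">500"; a clade recovered in exactly 500 trials is therefore not treated as a subclass in this sketch.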
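
The hit screening described in Section 2.3 can be sketched in a similar way. The sketch assumes that profile HMM search results have already been parsed into simple records; it keeps PAS hits at or below the E-value cutoff of 10 and flags those that overlap a hit to another motif, so that they can be inspected by hand as in the paper. The record fields, the protein and motif labels, and the 0.5 overlap threshold standing in for "a significant part of the domain" are all assumptions made for illustration. The E-value comparison between overlapping motifs is left to the manual step.

```python
from dataclasses import dataclass
from typing import List

EVALUE_CUTOFF = 10.0      # cutoff stated in Section 2.3
OVERLAP_FRACTION = 0.5    # assumed reading of "a significant part"; not from the paper


@dataclass(frozen=True)
class Hit:
    protein: str
    motif: str
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    evalue: float


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Fraction of the residues of hit a that are also covered by hit b."""
    if a.protein != b.protein:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    return max(shared, 0) / (a.end - a.start + 1)


def pas_candidates(hits: List[Hit]) -> List[Hit]:
    """PAS hits with an E-value at or below the cutoff."""
    return [h for h in hits if h.motif == "PAS" and h.evalue <= EVALUE_CUTOFF]


def needs_manual_review(pas: Hit, hits: List[Hit]) -> bool:
    """True when a hit to another motif covers a large part of the PAS hit."""
    return any(h.motif != "PAS" and overlap_fraction(pas, h) >= OVERLAP_FRACTION
               for h in hits)


# Hypothetical parsed hits; none of these values come from the paper.
hits = [
    Hit("protA", "PAS", 10, 110, 1e-12),
    Hit("protA", "PAS", 150, 250, 3.0),
    Hit("protA", "GAF", 140, 260, 1e-20),   # covers the second PAS hit
    Hit("protB", "PAS", 5, 100, 25.0),      # above the cutoff, discarded
]

candidates = pas_candidates(hits)
flagged = [h for h in candidates if needs_manual_review(h, hits)]
print(len(candidates), len(flagged))
```

Flagging rather than deleting mirrors the text, where false positives were removed manually after inspection.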