International Symposium on Extremophiles and Their Applications 2005
Revisiting the genome sequence of Aeropyrum pernix K1

Takahisa Tajima, Satoshi Tanikawa, Syuji Yamazaki, Nobuyuki Fujita, Shigeaki Harayama
National Institute of Technology and Evaluation, Tokyo, Japan, [email protected]

Six years have passed since the first publication of the genome sequence of Aeropyrum pernix K1. To incorporate the growing body of information on gene functions and proteomes, we completely re-annotated the genome sequence. A. pernix K1 was the first aerobic hyperthermophilic crenarchaeon whose genome was completely sequenced (1), and it has thus played an important role as a model hyperthermophilic archaeon.

The original annotation was released in 1999. In it, all longest reading frames larger than 100 codons and starting with ATG or GTG were assigned as ORFs. Smaller ORFs of 50-99 codons with any similarity match or known protein motif were also included. Consequently, a total of 2,694 ORFs were assigned. This number seemed too large for the small genome size (1.67 Mb), suggesting that the ORF count was overestimated by roughly 1,000. Several third parties have independently re-annotated the genome of A. pernix K1 using the COG database, the Z curve method, and other approaches (2). The number of estimated ORFs ranged from 1,400 to 1,871, and some overlapping ORFs and long non-coding regions still remained in these annotations.

In this study, we predicted ORFs using the gene-finding program GLIMMER 2.0, allowing TTG as a potential start codon; TTG was recently shown to be the most abundant start codon in A. pernix K1 (3). Predicted ORFs were subjected to extensive manual curation using the results of BLASTP searches against the UniProt database and motif prediction by InterProScan. Since the first publication, many ORFs have been shown experimentally to code for functional proteins, and the crystal structures of dozens of proteins from A. pernix and related organisms have been determined.
We incorporated these most recent biochemical and structural data. More than 200 ORFs that had been annotated as hypothetical proteins were given functional annotations in categories such as DNA replication, DNA repair, transport, and fundamental metabolism, including glycolysis. The re-annotation of A. pernix K1 will be submitted to the DDBJ databank as an update to the current entry (accession numbers AP000058-AP000064). Additional information, including experimental references, expression data from proteomic analysis (3), and links to protein structure databases, will be available at the DOGAN website (http://www.bio.nite.go.jp/dogan/Top).

1. Kawarabayashi, Y. et al., 1999, DNA Res., 6, 83-101.
2. Natale, D. A. et al., 2000, Genome Biol., 1, RESEARCH0009; Pruitt, K. D. et al., 2003, Nucleic Acids Res., 31, 34-37; Guo, F. B. et al., 2004, DNA Res., 11, 361-370.
3. Yamazaki, S. et al., submitted for publication.
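
The ORF-assignment rule described in the abstract (the longest reading frame before each stop codon, above a codon threshold, with a configurable set of start codons) can be illustrated with a minimal sketch. This is not the code used for the 1999 annotation, nor GLIMMER 2.0, which relies on a statistical model rather than a plain length cutoff. The function name `longest_orfs`, its parameters, and the choice to treat the threshold as inclusive are assumptions made for illustration only.

```python
# Hypothetical sketch of a "longest reading frame" ORF scan.
# Not the original annotation pipeline and not GLIMMER.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orfs(seq, start_codons=("ATG", "GTG"), min_codons=100):
    """Scan all six frames and report the longest ORF ending at each stop.

    Returns tuples (strand, start, end, n_codons). Coordinates are 0-based,
    end-exclusive, and for the "-" strand refer to the reverse-complemented
    sequence. n_codons excludes the stop codon. Whether the threshold is
    inclusive is an assumption of this sketch.
    """
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            first_start = None  # earliest start codon since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon in STOP_CODONS:
                    if first_start is not None:
                        n_codons = (i - first_start) // 3
                        if n_codons >= min_codons:
                            found.append((strand, first_start, i + 3, n_codons))
                    first_start = None
                elif first_start is None and codon in start_codons:
                    first_start = i
    return found
```

Calling `longest_orfs(genome, start_codons=("ATG", "GTG", "TTG"))` extends each candidate to an upstream in-frame TTG where one exists, which is the effect of allowing TTG as a start codon in a length-based scan.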