PlasmoDB: a Functional Genomic Database for Malaria Parasites

Ali et al. BMC Genomics 2011, 12(Suppl 3):S6
http://www.biomedcentral.com/1471-2164/12/S3/S6

PROCEEDINGS Open Access

ApicoAlign: an alignment and sequence search tool for apicomplexan proteins

Jamshaid Ali†, Umadevi Paila†, Akash Ranjan*

From Asia Pacific Bioinformatics Network (APBioNet) Tenth International Conference on Bioinformatics – First ISCB Asia Joint Conference 2011 (InCoB/ISCB-Asia 2011)
Kuala Lumpur, Malaysia. 30 November - 2 December 2011

Abstract

Background: In recent years, many genomes have been sequenced, followed by genome annotation projects that help reveal the biological capabilities of the newly sequenced organisms. To improve the annotation of Plasmodium falciparum proteins, we earlier developed parasite-specific matrices (PfSSM) and demonstrated the better performance of Smat80 and PfFSmat60 over standard matrices (BLOSUM and PAM). Here we extend that study to nine apicomplexan species other than P. falciparum and develop a web application, ApicoAlign, to improve the annotation of apicomplexan proteins.

Results: The SMAT80 and PfFSmat60 matrices perform better than BLOSUM for apicomplexan proteins in detecting orthologs and in improving the alignment of these proteins with their potential orthologs, respectively. Database searches against the non-redundant (nr) database show that, for most of the apicomplexan proteins studied here, SMAT80 outperforms the BLOSUM series in terms of E-values, bit scores, percent identity, alignment length and mismatches. Using these matrices, we found orthologs in Arabidopsis thaliana for the rhomboid proteases of P. berghei, P. falciparum and P. vivax and for the large subunit of the U2 snRNP auxiliary factor of Cryptosporidium parvum. We also show improved pairwise alignments of proteins from the apicomplexans Cryptosporidium parvum and P. falciparum with their orthologs from other species using the PfFSmat60 matrix.
Conclusions: The SMAT80 and PfFSmat60 substitution matrices perform better than the BLOSUM series for apicomplexan proteins. Since they can help improve the annotation and functional characterization of apicomplexan genomes, we have developed ApicoAlign, a web server for finding orthologs and aligning apicomplexan proteins.

* Correspondence: [email protected]
† Contributed equally
Computational and Functional Genomics Group, Centre for DNA Fingerprinting and Diagnostics, A Sun Centre of Excellence in Medical Bioinformatics, Hyderabad 500001, India
© 2011 Ali et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

One of the important goals of the post-genomic era is to develop tools and services that help annotate the hypothetical/putative proteins of newly sequenced genomes. In Plasmodium falciparum, approximately 60% of genes showed no sequence similarity to known genes [1]. This organism shows an unusual amino acid composition and substitution pattern in its proteins owing to its extremely AT-rich genome [2,3]. As a result, many of its proteins show no or low sequence similarity to known proteins in the databases, which poses a major difficulty for genome annotation. To address this issue, we developed the symmetric Smat series and the asymmetric PfFSmat60 matrix and demonstrated that they perform better than standard matrices (BLOSUM and PAM) [2]. Here we extend the use of these matrices to better annotate the proteins of other apicomplexans: Plasmodium berghei, Plasmodium chabaudi, Plasmodium knowlesi, Plasmodium vivax, Plasmodium yoelii yoelii, Toxoplasma gondii, Cryptosporidium parvum, Theileria parva and Neospora caninum. After benchmarking the performance of these matrices on apicomplexan proteins, we developed ApicoAlign, a web server for finding orthologs and aligning apicomplexan proteins using this novel series of matrices.

Implementation

ApicoAlign is a web-based application written in Perl/CGI. The web server offers five applications for apicomplexan proteins: (1) Search Database, (2) Search a genome, (3) Reciprocal Hit, (4) Best Bidirectional Hit and (5) Pairwise Alignment. Sample input buttons are provided for some apicomplexan species to load sample protein sequences automatically into the required fields of each option. The parasite-specific symmetric matrices of the Smat series (Smat50, Smat60, Smat70, Smat80 and Smat90) are provided for the first four applications. Smat matrices were earlier shown to work best for database searches with P. falciparum proteins [2]; here we show that they also perform better for other apicomplexans, which broadens their utility. For comparison, the standard BLOSUM62 matrix and BLOSUM90, a matrix of similar entropy, are available in the drop-down menu. For the first four applications, the default gap opening and extension penalties are set to 10 and 1, respectively, values considered best for standard matrices with entropy similar to that of the Smat series. A few other combinations of gap opening and extension penalties are also provided for the user to try, and the E-value cut-off can be set by the user.

Search database

The non-redundant (nr), Swiss-Prot and PDB databases are provided for finding orthologs of apicomplexan proteins using parasite-specific and standard matrices. The input is a single protein sequence in FASTA format, which can be pasted into the text box provided or uploaded as a file.

Search a genome

This option finds hits for apicomplexan proteins across the genomes listed in the drop-down menu. The input is protein sequences in FASTA format, pasted into the text box or uploaded as a file (up to 5 MB).

Reciprocal hits

This option finds reciprocal hits for apicomplexan proteins across the genomes listed in the drop-down menu. The input is protein sequences in FASTA format, pasted into the text box or uploaded as a file (up to 5 MB).

Best Bidirectional Hit

The Best Bidirectional Hit (BBH) method [4] is used to search for potential orthologs of apicomplexan proteins across a range of organisms. The input for bidirectional ortholog detection is a protein sequence file for the query genome and one for the subject genome, both in FASTA format. The subject proteome may either be selected from the list of organisms provided on the web page or, for a user-specific sequence file, uploaded through the file upload option. Large sequence files may take longer to run, and the uploaded query and subject sequence files are limited to 25 MB in size.

Pairwise alignment

The pairwise alignment option uses the water program (EMBOSS package, version 6.3.1) [5] to perform a local alignment of an apicomplexan query protein with its potential ortholog. The asymmetric parasite-specific matrix PfFSmat60 is provided for these alignments, along with the standard matrices EBLOSUM62, EBLOSUM90 and EPAM200. PfFSmat60 was shown to perform best for pairwise alignments [2] in which the alignment spans motif-like regions of the protein. PfFSmat60 is a scaled version of a unique asymmetric matrix [2] and is used here to improve the alignment of an apicomplexan protein with a strongly suspected ortholog; users are therefore discouraged from applying it indiscriminately to non-orthologous proteins. The input is a single protein sequence in FASTA format for both the query and the subject. The user may supply the gap opening and extension penalties for the pairwise alignment or use the default values. PfFSmat60 was developed for Plasmodium falciparum and represents unidirectional substitutions [2]; in this study we extend its use to other apicomplexans. Consequently, one limitation of the pairwise alignment is that the query sequence must be the apicomplexan protein, so the query and subject proteins should not be swapped.

Results and discussion

To check whether the Plasmodium falciparum specific substitution matrices (SMAT and PfFSmat) perform better for other apicomplexan species, we carried out database searches against the non-redundant (nr) database and identified best bidirectional hits across different bacterial and eukaryotic genomes using the BLOSUM and SMAT series of matrices.

Amino acid composition of different apicomplexan species

In our earlier study, we showed that Plasmodium falciparum has biased amino acid choices in its proteins, which is one reason the standard BLOSUM and PAM matrices do not perform well for this organism [2]. Since the SMAT and PfFSmat60 matrices were originally developed for Plasmodium falciparum, we calculated the amino acid composition of all proteins in the apicomplexan genomes and compared them with that

[...]

apicomplexan species (except Toxoplasma gondii and Neospora caninum) show the minimum p-value with Mycobacterium tuberculosis, while for positively charged amino acids all apicomplexan species except Cryptosporidium parvum and Neospora caninum show the minimum p-value with Mycobacterium tuberculosis. We

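The reason the Pairwise alignment section forbids swapping query and subject under PfFSmat60 can be illustrated with a local alignment score. EMBOSS water performs Smith-Waterman local alignment with affine gap penalties; the sketch below is not water itself but a minimal score-only Smith-Waterman-Gotoh recurrence. It assumes a gap of length k costs gap_open + (k - 1) * gap_extend (defaults of 10 and 1, borrowed from the search applications for illustration) and uses a hypothetical three-letter TOY_MATRIX whose values are invented and are not PfFSmat60.

```python
from typing import Dict

NEG_INF = float("-inf")

# Hypothetical asymmetric matrix, read as TOY_MATRIX[query_residue][subject_residue].
# Values are invented for illustration; they are NOT PfFSmat60.
TOY_MATRIX: Dict[str, Dict[str, float]] = {
    "A": {"A": 2.0, "K": -1.0, "N": -1.0},
    "K": {"A": -1.0, "K": 2.0, "N": 1.0},   # query K vs subject N is rewarded...
    "N": {"A": -1.0, "K": -3.0, "N": 2.0},  # ...but query N vs subject K is penalised
}


def local_align_score(query: str, subject: str,
                      matrix: Dict[str, Dict[str, float]],
                      gap_open: float = 10.0, gap_extend: float = 1.0) -> float:
    """Best Smith-Waterman local score with affine gaps (Gotoh); no traceback.

    Assumed gap convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(query), len(subject)
    # H: best local score of an alignment ending at (i, j).
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    # E: best score ending with a subject residue against a gap in the query.
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    # F: best score ending with a query residue against a gap in the subject.
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            # The matrix is always indexed [query residue][subject residue].
            diag = H[i - 1][j - 1] + matrix[query[i - 1]][subject[j - 1]]
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(local_align_score("AKA", "ANA", TOY_MATRIX))  # prints 5.0
    print(local_align_score("ANA", "AKA", TOY_MATRIX))  # prints 2.0
```

Swapping the two sequences drops the score from 5.0 to 2.0 here only because K-to-N and N-to-K are scored differently; with a symmetric matrix such as BLOSUM62 and equal gap penalties in both directions, the two orders give the same score.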