Current Status and Future Perspectives of Bioinformatics in Tanzania
Total Page:16
File Type:pdf, Size:1020Kb
CURRENT STATUS AND FUTURE PERSPECTIVES OF BIOINFORMATICS IN TANZANIA Sylvester L. Lyantagaye Department of Molecular Biology and Biotechnology, College of Natural and Applied Sciences, University of Dar es Salaam, P.O. Box 35179, Dar es Salaam, Tanzania E-mail: [email protected], [email protected] ABSTRACT The main bottleneck in advancing genomics in present times is the lack of expertise in using bioinformatics tools and approaches for data mining in raw DNA sequences generated by modern high throughput technologies such as next generation sequencing. Although bioinformatics has been making major progress and contributing to the development in the rest of the world, it has still not yet fully integrated the tertiary education and research sector in Tanzania. This review aims to introduce a summary of recent achievements, trends and success stories of application of bioinformatics in biotechnology. The applications of bioinformatics in the fields such as molecular biology, biotechnology, medicine and agriculture, the global trend of bioinformatics, accessibility bioinformatics products in Tanzania, bioinformatics training initiatives in Tanzania, the future prospects of bioinformatics use in biotechnology globally and Tanzania in particular are reviewed. The paper is of interest and importance to rouse public awareness of the new opportunities that could be brought about by bioinformatics to address many research problems relevant to Tanzania and sub-Sahara Africa. Keywords: Bioinformatics, Biotechnology, Genomics, Tanzania. INTRODUCTION analysis, and management of biological Bioinformatics is conceptualising biology in information through computational terms of molecules (in the sense of physical techniques. It uses mathematics, biology, chemistry) and applying "informatics and computer science to understand the techniques" (derived from disciplines such biological importance of an extensive as applied mathematics, computer science variety of omics data (Reportlinker 2013). and statistics) to understand and organise the Bioinformatics technologies are used in information associated with these molecules, various pharmaceutical and biotechnology on a large scale (Luscombe et al. 2001). In sectors. The medical sector is driven by the short, bioinformatics is a management increasing use of bioinformatics for the drug information system for molecular biology discovery and development process. and has many practical applications. Students graduating in biology or Bioinformatics is the discipline that focuses biochemistry today ought to know their way on the conversion of experimental data into around the Linux command line. They biological knowledge and hypothesis (Bayat should also know bioinformatics tools such 2002). Bioinformatics is essential both, in as what “tab” output is and how to bring it deciphering genomic, transcriptomic, and into a spreadsheet or to a database, etc. proteomic data generated by high- throughput experimental technologies, and The evolution of bioinformatics, which in organizing data gathered from traditional started with sequence analysis (has led to biology and medicine. Bioinformatics high-throughput whole genome or involves retrieval, storage, processing, transcriptome sequencing today) is now 1 Lyantagaye S.L. – Current status and future perspectives… being directed towards recently emerging areas of integrative and translational APPLICATION OF BIOINFORMATICS genomics, and ultimately personalized DNAs for thousands of organisms have been medicine. This review aims to introduce sequenced since the Phage Φ-X174 was scientists from Tanzania and elsewhere to sequenced in 1977 (Sanger et al. 1977, the use of bioinformatics in their research www.kegg.jp). Bioinformatics analyses are and to present some of the bioinformatics used to determine genes that encode resources available in Tanzania. polypeptides (proteins), RNA genes, regulatory sequences, structural motifs, and Considerable efforts are required to provide repetitive sequences. A comparison of genes the necessary infrastructure for high- within a species or between different species performance computing, sophisticated can show similarities between protein algorithms, advanced data management functions, or relations between species. capabilities, and-most importantly-well trained and educated personnel to design, Today, computer algorithms implemented in maintain and use these environments. With programs such as BLAST (NCBI BLAST the presence of the STM-1 SEACOM Fiber- home) are used daily to search genomes of optic cable for broadband Internet thousands of organisms, containing billions connection in Tanzania, and given modern of nucleotides. These algorithms compensate computer infrastructure and reliable power for mutations (exchanged, deleted or supply, Tanzanians could have relatively inserted bases) in the DNA sequence, in easy access to the products of order to identify sequences that are related, bioinformatics. However, future use of this but not identical. Another aspect of technology in Tanzania hinges on the bioinformatics in sequence analysis is availability of bioinformatics knowledge in annotation, which involves computational the public domain. gene finding to search for protein-coding genes, RNA genes, and other functional High ignorance in the country of what sequences within a genome (Korf 2004, bioinformatics could be useful for, suggests Ratsch et al. 2007), but the programs a need for awareness campaigns through available for analysis of genomic DNA are journals and other publication media. The constantly changing and improving. few bioinformaticians in the country trained oversees when they come back their Other uses are genome assembly (Gascuel knowledge of bioinformatics becomes 2007), determination by gene expression of redundant because their employers did not the genes implicated in a disorder (Wagner know what they could be useful for. It is et al. 2010, Meiri et al. 2010), expression crucial that local academic institutions take data can be used to infer gene regulation: the responsibility to contribute to the one might compare microarray data from a production of a critical mass of wide variety of states of an organism to form bioinformaticians. This review also outlines hypotheses about the genes involved in each the most promising trends in bioinformatics, state. In a single-cell organism, one might potential and the current status of compare stages of the cell cycle, along with Bioinformatics in Tanzania, which may play various stress conditions (heat shock, a major role in the sensitization of the starvation, etc.) (Liu et al. 2009, Wang et al. authorities in both public and private sectors 2010). One can then apply clustering to seriously consider investing in algorithms to expression data to determine bioinformatics capacity building in the which genes are co-expressed. For example, country. the upstream regions (promoters) of co- 2 Tanz. J. Sci. Vol 39 2013 expressed genes can be searched for over- computer in one part of the world to use represented regulatory elements algorithms, data and computing resources on (Rakhshandehroo et al. 2009). servers in other parts of the world. The main advantages derived from the fact that end Protein microarrays and high Throughput users do not have to deal with software and Mass Spectrometry (HT MS) are very much database maintenance overheads. A variety involved in making sense of Bioinformatics of bioinformatics systems are being data (Simara et al. 2009,Yang et al. 2010), developed for specific purposes, e.g., analysis of mutations in cancer (Kim et al. eBioKit project has been running for a 2004, Lassmann et al. 2007), protein number of years with the aim to help in sub structure prediction (Chab et al. 2010, Peng Saharan Africa where bioinformatics is still and Xu 2010), establishment of the in juvenile stages. eBioKit contains all the correspondence between genes (orthology major open source databases and tools which analysis) or other genomic features in can be used for teaching and for small different organisms (Miller et al. 2004, Su et research projects (Fuxelius et al. 2010). al. 2005), the use of computer simulations of Another system that contains many tools that cellular subsystems (such as the networks of the eBioKit has but no databases is the metabolites and enzymes which comprise BioLinux project (Field et al. 2006). metabolism, signal transduction pathways and gene regulatory networks) to both GLOBAL TREND OF analyze and visualize the complex BIOINFORMATICS connections of these cellular processes For more than a century, vast progress has (Hucka et al. 2010, Larsen et al. 2010) and a been made in genetics and molecular variety of methods have been developed to biology. New high-throughput experimental tackle the protein-protein docking problem techniques continue to emerge rapidly. The (Wang et al. 2007, Marshall and Vakser automation of DNA sequencing set the stage 2005, Uchikoga and Hirokawa 2010). for the Human Genome Project in 1990 (Collins et al. 1998), which has led to Basic bioinformatics services are classified genomics (the branch of genetics that studies by the European Bioinformatics Institute organisms in terms of their full DNA (EBI) into three categories: Sequence Search sequences) and a range of related disciplines Services (SSS), Multiple Sequence such as transcriptomics (the study of the Alignment (MSA) and Biological Sequence complete gene expression state), proteomics Analysis (BSA) (McWilliam et al. 2009). (the