CURRENT STATUS AND FUTURE PERSPECTIVES OF IN TANZANIA

Sylvester L. Lyantagaye Department of Molecular Biology and Biotechnology, College of Natural and Applied Sciences, University of Dar es Salaam, P.O. Box 35179, Dar es Salaam, Tanzania E-mail: [email protected], [email protected]

ABSTRACT The main bottleneck in advancing genomics in present times is the lack of expertise in using bioinformatics tools and approaches for data mining in raw DNA sequences generated by modern high throughput technologies such as next generation sequencing. Although bioinformatics has been making major progress and contributing to the development in the rest of the world, it has still not yet fully integrated the tertiary education and research sector in Tanzania. This review aims to introduce a summary of recent achievements, trends and success stories of application of bioinformatics in biotechnology. The applications of bioinformatics in the fields such as molecular biology, biotechnology, medicine and agriculture, the global trend of bioinformatics, accessibility bioinformatics products in Tanzania, bioinformatics training initiatives in Tanzania, the future prospects of bioinformatics use in biotechnology globally and Tanzania in particular are reviewed. The paper is of interest and importance to rouse public awareness of the new opportunities that could be brought about by bioinformatics to address many research problems relevant to Tanzania and sub-Sahara Africa.

Keywords: Bioinformatics, Biotechnology, Genomics, Tanzania.

INTRODUCTION analysis, and management of biological Bioinformatics is conceptualising biology in information through computational terms of molecules (in the sense of physical techniques. It uses mathematics, biology, chemistry) and applying "informatics and computer science to understand the techniques" (derived from disciplines such biological importance of an extensive as applied mathematics, computer science variety of omics data (Reportlinker 2013). and statistics) to understand and organise the Bioinformatics technologies are used in information associated with these molecules, various pharmaceutical and biotechnology on a large scale (Luscombe et al. 2001). In sectors. The medical sector is driven by the short, bioinformatics is a management increasing use of bioinformatics for the drug information system for molecular biology discovery and development process. and has many practical applications. Students graduating in biology or Bioinformatics is the discipline that focuses biochemistry today ought to know their way on the conversion of experimental data into around the command line. They biological knowledge and hypothesis (Bayat should also know bioinformatics tools such 2002). Bioinformatics is essential both, in as what “tab” output is and how to bring it deciphering genomic, transcriptomic, and into a spreadsheet or to a database, etc. proteomic data generated by high- throughput experimental technologies, and The evolution of bioinformatics, which in organizing data gathered from traditional started with sequence analysis (has led to biology and medicine. Bioinformatics high-throughput whole genome or involves retrieval, storage, processing, transcriptome sequencing today) is now

1 Lyantagaye S.L. – Current status and future perspectives…

being directed towards recently emerging areas of integrative and translational APPLICATION OF BIOINFORMATICS genomics, and ultimately personalized DNAs for thousands of organisms have been medicine. This review aims to introduce sequenced since the Phage Φ-X174 was scientists from Tanzania and elsewhere to sequenced in 1977 (Sanger et al. 1977, the use of bioinformatics in their research www.kegg.jp). Bioinformatics analyses are and to present some of the bioinformatics used to determine genes that encode resources available in Tanzania. polypeptides (proteins), RNA genes, regulatory sequences, structural motifs, and Considerable efforts are required to provide repetitive sequences. A comparison of genes the necessary infrastructure for high- within a species or between different species performance computing, sophisticated can show similarities between protein algorithms, advanced data management functions, or relations between species. capabilities, and-most importantly-well trained and educated personnel to design, Today, computer algorithms implemented in maintain and use these environments. With programs such as BLAST (NCBI BLAST the presence of the STM-1 SEACOM Fiber- home) are used daily to search genomes of optic cable for broadband Internet thousands of organisms, containing billions connection in Tanzania, and given modern of nucleotides. These algorithms compensate computer infrastructure and reliable power for mutations (exchanged, deleted or supply, Tanzanians could have relatively inserted bases) in the DNA sequence, in easy access to the products of order to identify sequences that are related, bioinformatics. However, future use of this but not identical. Another aspect of technology in Tanzania hinges on the bioinformatics in sequence analysis is availability of bioinformatics knowledge in annotation, which involves computational the public domain. gene finding to search for protein-coding genes, RNA genes, and other functional High ignorance in the country of what sequences within a genome (Korf 2004, bioinformatics could be useful for, suggests Ratsch et al. 2007), but the programs a need for awareness campaigns through available for analysis of genomic DNA are journals and other publication media. The constantly changing and improving. few bioinformaticians in the country trained oversees when they come back their Other uses are genome assembly (Gascuel knowledge of bioinformatics becomes 2007), determination by gene expression of redundant because their employers did not the genes implicated in a disorder (Wagner know what they could be useful for. It is et al. 2010, Meiri et al. 2010), expression crucial that local academic institutions take data can be used to infer gene regulation: the responsibility to contribute to the one might compare microarray data from a production of a critical mass of wide variety of states of an organism to form bioinformaticians. This review also outlines hypotheses about the genes involved in each the most promising trends in bioinformatics, state. In a single-cell organism, one might potential and the current status of compare stages of the cell cycle, along with Bioinformatics in Tanzania, which may play various stress conditions (heat shock, a major role in the sensitization of the starvation, etc.) (Liu et al. 2009, Wang et al. authorities in both public and private sectors 2010). One can then apply clustering to seriously consider investing in algorithms to expression data to determine bioinformatics capacity building in the which genes are co-expressed. For example, country. the upstream regions (promoters) of co-

2 Tanz. J. Sci. Vol 39 2013

expressed genes can be searched for over- computer in one part of the world to use represented regulatory elements algorithms, data and computing resources on (Rakhshandehroo et al. 2009). servers in other parts of the world. The main advantages derived from the fact that end Protein microarrays and high Throughput users do not have to deal with software and Mass Spectrometry (HT MS) are very much database maintenance overheads. A variety involved in making sense of Bioinformatics of bioinformatics systems are being data (Simara et al. 2009,Yang et al. 2010), developed for specific purposes, e.g., analysis of mutations in cancer (Kim et al. eBioKit project has been running for a 2004, Lassmann et al. 2007), protein number of years with the aim to help in sub structure prediction (Chab et al. 2010, Peng Saharan Africa where bioinformatics is still and Xu 2010), establishment of the in juvenile stages. eBioKit contains all the correspondence between genes (orthology major open source databases and tools which analysis) or other genomic features in can be used for teaching and for small different organisms (Miller et al. 2004, Su et research projects (Fuxelius et al. 2010). al. 2005), the use of computer simulations of Another system that contains many tools that cellular subsystems (such as the networks of the eBioKit has but no databases is the metabolites and enzymes which comprise BioLinux project (Field et al. 2006). metabolism, signal transduction pathways and gene regulatory networks) to both GLOBAL TREND OF analyze and visualize the complex BIOINFORMATICS connections of these cellular processes For more than a century, vast progress has (Hucka et al. 2010, Larsen et al. 2010) and a been made in genetics and molecular variety of methods have been developed to biology. New high-throughput experimental tackle the protein-protein docking problem techniques continue to emerge rapidly. The (Wang et al. 2007, Marshall and Vakser automation of DNA sequencing set the stage 2005, Uchikoga and Hirokawa 2010). for the Human Genome Project in 1990 (Collins et al. 1998), which has led to Basic bioinformatics services are classified genomics (the branch of genetics that studies by the European Bioinformatics Institute organisms in terms of their full DNA (EBI) into three categories: Sequence Search sequences) and a range of related disciplines Services (SSS), Multiple Sequence such as transcriptomics (the study of the Alignment (MSA) and Biological Sequence complete gene expression state), proteomics Analysis (BSA) (McWilliam et al. 2009). (the study of the full set of proteins encoded The availability of these service-oriented by a genome), and metabolomics (the study bioinformatics resources freely on the web of comprehensive metabolite profiles), demonstrate the applicability of web based sometimes all these domains are collectively bioinformatics solutions, and range from a referred to as genomics. Genomics has collection of standalone tools with a already, and will have a major impact on the common data format under a single, advancement of society in many ways standalone or web-based interface, to (GMIS 2008). By understanding genes, or integrative, distributed and extensible the blue print, of how humans, plants, bioinformatics workflow. A variety of microorganisms and animals work, we can interfaces have been developed, e.g., Simple find solutions to some of the most pressing Object Access Protocol (SOAP) and challenges facing us today in health, Representational State Transfer (REST) for environment and agriculture. There is much a wide variety of bioinformatics applications potential for genomics to solve many of the allowing an application running on one major challenges we face today, and many

3 Lyantagaye S.L. – Current status and future perspectives…

which are still unknown. Genomics has growth of the bioinformatics market is greatly accelerated fundamental research in driven by decrease in cost of DNA molecular biology as it enables the sequencing, increasing government measurement of molecular processes initiatives and funding, and growing use of globally and from different points of view. bioinformatics in drug discovery and This led to a range of applications in the biomarkers development processes. It is biomedical sector and increasingly affects expected that the market will offer patient care (Collins et al. 2003). opportunities for bioinformatics solutions manufacturers with the introduction and One of the bottlenecks that prevent large- adoption of upcoming technologies such as scale implementation of genomics in health nanopore sequencing and cloud computing. care is the problem related to the However, factors such as scarcity of skilled management, analysis and interpretation of personnel to ensure proper use of large amounts of heterogeneous data that are bioinformatics tools and lack of integration measured for (patient) samples (van Kampen of a wide variety of data generated through and Horrevoets 2006), hence the emergence various bioinformatics platforms are of bioinformatics. hindering the growth of the market. Manufacturers of bioinformatics solutions In the recent years, the continuously will face further challenges with regard to changing market of bioinformatics industry consolidation and management of applications enjoys a wide acceptance owing high volume data”. to declining new drugs’ manufacturing, non- existence of potential drugs and expiry of Reportlinker (2013) further shows that North patents. In 2004-05, this market with a US America accounted for the largest market $1.4 billion has been competing with share of the bioinformatics market, followed unusual and innovative tools facilitating by Europe, in 2012. However, Asian and critical Research and Development (R&D) Latin American countries represent for the bioinformatics companies. The emerging markets, owing to a rise in “World Bioinformatics Market (2005- research outsourcing by pharmaceutical 2010)” (RNCOS 2006) indicates that the giants, increasing number of Contract bioinformatics market was at stability where Research Organizations (CROs), rise in 20% of the applications discovered were public and private sector investment, and based on genomics and proteomics, which growing industry -academia partnerships. boost growth of bioinformatics tools. As per The major players in the bioinformatics new research report “Global Bioinformatics market are Accelrys, Inc. (U.S.), Affymetrix, Market Outlook”, the market for Inc. (U.S.), Life Technologies Corporation bioinformatics will surge during 2011-2013 (U.S.), Illumina, Inc. (U.S.), and CLC bio. to a value of around US$ 6.2 Billion (Denmark). (RNCOS 2010). With the increasing R&D investment by companies on bioinformatics and the regulatory support in various IS BIOINFORMATICS ACCESSIBLE countries, it is anticipate that the IN TANZANIA? bioinformatics market will post high growth Public bioinformatics resources, such as rate in majority of the countries. According databanks and software tools that are crucial to Reportlinker (2013), the global for biotechnology projects, are today bioinformatics market was valued at $2.9 available via the Internet (Stajich and Lapp billion in 2012 and is poised to reach $7.5 2006, http://www.expasy.org/, billion by 2017 at a CAGR of 20.9%”. The http://www.ncbi.nlm.nih.gov). Scientists

4 Tanz. J. Sci. Vol 39 2013

need only a computer and an Internet Grand Challenges in Global Health funds the connection of a certain quality to use them. Genome Science Centre at SUA. The In 2010, the STM-1 SEACOM undersea laboratory is well equipped with high- Fibre-optic Cable arrived on the Tanzanian throughput facilities for genomics and good coastal shores. To the country IT users computing facilities but there is a lack of connection to the SEACOM cable means an bioinformaticians. increase of bandwidth capacity. For the University of Dar es Salaam (UDSM) for While the government through the National instance, the increase is from 10 mega-bits Commission for Science and Technology to 155 mega bits per second (UDSM 2010), (COSTEC) understands the potential of and this has resulted in tremendous R&D in Biotechnology to the nation’s improvement in the accessibility of the economic growth internet resources. With such resources the (http://www.costech.or.tz/?s=biotechnology) situation of a Tanzanian biologist is closer to , there is a dare need for awareness that of an academic biologist in an campaign on the significance of industrialized country. bioinformatics in biotechnology. It is the role of universities and other higher learning Modern bioinformatics research does not institution to design programmes, which also necessarily require more resources than any involves Bioinformatics in their curricula. other field of Computer Science; almost all With the recent development in registering processes can be efficiently designed and several new public and private higher modeled on a personal computer or learning institutions, if all joins the initiative workstation (Stevens 2006). If this basic that the UDSM is taking a leading role in infrastructure is sufficiently provided, Bioinformatics training, the government and biomathematics can be recommended to the private sector might get interested to universities in Tanzania as an up-to-date and invest. For the first time in the country promising research subject that does not “Introduction to Bioinformatics” has been require excessive resources. incorporated as a core course in the reviewed curricula for biotechnology However, the step from theoretical programmes of the UDSM commencing bioinformatics to applied bioinformatics 2010/11 academic year (ARIS 2010). A new requires a supportive research and taught Masters of Biochemistry programme, development climate that generates local which will also have a bioinformatics need for such research. Like in many other component, is being designed at the college developing African countries, little is known of Natural and Applied Sciences UDSM about bioinformatics in Tanzania. (Not published).

Biotechnology industry in Tanzania is still COSTECH in collaboration with the Global very poor and hardly any bioinformatics is Biodiversity Information Facility (GBIF) vividly talked about. There are several and other national and international partners biotechnology research works around the established an online information tool, country but no serious investments have Tanzania Biodiversity Information Facility been made for bioinformatics. Research (TanBIF) in 2007, to support policy and groups most likely to apply bioinformatics decision making process on biodiversity and are the Tanzania Genome Network (TGN) its related issues. TanBIF is an extensive, member groups and the Genome Science decentralized system of national biodiversity Centre at Sokoine University of Agriculture information units that intends to provide free (SUA). The Gates Foundation through the and universal access to data and information

5 Lyantagaye S.L. – Current status and future perspectives…

regarding Tanzania’s biodiversity. It is a country to teach a formal degree course national node of the Global Biodiversity (MBB, 2009). From 14th to 18th November Information Facility (GBIF). The 2011, UDSM also successfully organized informatics tool will enable analysis and and hosted a Bioinformatics short course, modeling of primary biodiversity data to which was sponsored by the network called support (biodiversity related) decision- Southern Africa Biochemistry and making activities such as land use planning, Informatics of Natural Products (SABINA). design of protected areas risk assessment, The target groups were postgraduate among others. The tool will allow users to students, academic staff, and researchers in perform meaningful analysis by integrating biological sciences and medical fields. The and using biodiversity data (including course also provided opportunity to occurrence data, species-level postgraduate research students to ask for data/ecosystem data) in combination with assistance from instructor for analysis of other types of data (geospatial, climatic, DNA sequences generated from their demographic, economic datasets). This research works. project is sponsored by the Department of Environment, Royal Danish Ministry of Although UDSM and some other Tanzanian Foreign Affairs and Capacity Enhancement institutions have for many years been Programme for Developing Countries conducting genomic studies, there have been (CEPDEC) (http://www.tanbif.or.tz). limited collaboration leading to uncoordinated and duplication of research, and under-utilization of resources (expertise, BIOINFORMATICS TRAINING equipment and other laboratory facilities). INITIATIVES AND SUCCESS STORIES To address these issues and in preparation FROM TANZANIA for future genomic revolution, researchers Bioinformatics is increasingly included in from different Tanzanian institutions the undergraduate curriculum for biology decided to establish the TGN. TGN was students, globally (Barker et al. 2013). established in 2011 with the aim to work Teaching bioinformatics is made difficult, together to consolidate existing and develop however, by the constraints of typical resources in genetic/genomic research in university computer classrooms. Some areas each institution to minimize duplication and of basic bioinformatics may be taught using ensure expertise and resources are utilized in such classrooms, where all that is required is an optimal manner. The TGN will facilitate an Internet connection and Web browser collaboration between members and support [e.g. BLAST (Altschul et al. 1997) searches development of centralized state-of-the-art at the NCBI (NCBI BLAST home)]. More facilities and resource centers to conduct in-depth teaching requires the re-creation of genetic/genomic research (Ishengoma et al. a bioinformatics research environment, 2012). TGN is made of 12 biomedical consisting of a Linux or UNIX operating research institutions and universities in system, standard GNU utilities Tanzania each with one or more contact (http://www.gnu.org), specialist persons. The members include: bioinformatics software, and sequence • Muhimbili University of Health and databases (Barker et al. 2013). Allied Sciences (MUHAS) • Ifakara Health Institute (IHI) It was not until 2009 the UDSM started • National Institute for Medical teaching an introduction to bioinformatics Research (NIMR) course (BN 205) for BSc students and • University of Dar es Salaam became the only training institution in the (UDSM)

6 Tanz. J. Sci. Vol 39 2013

• Sokoine University of Agriculture the aim that they will teach others at their (SUA) home institutions. H3ABioNet has a plan of • Ministry of Health and Social a series of courses throughout the funding Welfare (MoH&SW)/National period (http://www.h3abionet.org/); no Health Laboratory Quality doubt the project will leave Tanzania in a Assurance and Training Centre better position with regards to • Hubert Kairuki Memorial bioinformatics capacity. University (HKMU) • Nelson Mandela African Institute H3ABioNet is also funding eBioKit system of Science and Technology (NM- infrastructure and training to facilitate in AIST) teaching bioinformatics across the network. • African Malaria Network Trust Again, UDSM was the first in Tanzania to (AMANET) acquire an eBioKit system in October 2013. eBioKit is a compilation of online open • Kilimanjaro Christian Medical Centre (KCMC) source databases and software hosted on the same server. All of the ensemble servers and • African Academy of Public Health the biomart server are hosted by different (AAPH/ MDH)/ Harvard School of instances of Apache and some of the other Public Health server applications just behind a proxy. Due

to the coding of the ensembl servers, they In 2011, three TGN member institutions can't be behind a proxy server. For that (MDH, MUHAS and UDSM) joined Human reason, it's only possible to access the Heredity and Health in Africa servers by their IP and port number Bioinformatics Network (H3ABioNet). (Fuxelius et al. 2010). H3ABioNet is an NIH-funded (2013-2017)

Pan African Bioinformatics network CHALLENGES OF TEACHING comprising 32 Bioinformatics research groups distributed amongst 15 African BIOINFORMATICS IN TANZANIA AND HOW TO SOLVE THEM countries and 2 partner institutions based in In most cases the available computer the USA. The main goal of H3ABioNet is to laboratories contain few functional create a sustainable Pan African computers to train a class, so classes huddle Bioinformatics Network to support H3Africa around few working computers. Moreover, researchers and their projects through the nations’ energy supply would often be Bioinformatics capacity development on the cut off, making teaching bioinformatics on a African continent. The specific aims of computer rather difficult. But also, there is a H3ABioNet will be achieved through the serious shortage of skilled personnel to teach following main activities: Training and bioinformatics in the country. Education, Research and Tool Development, User Support and Communication, However, even if the power supply is Infrastructure Development, and Node intermittent and the Internet connections run Assessment and Accreditation at dial-up speed, it is still possible to conduct (http://www.h3abionet.org/). The bioinformatics activities. Determined involvement with H3ABioNet is bioinformaticians can start a study with just revolutionizing bioinformatics in Tanzania a computer and open-source software through knowledge transfer and downloaded through the Internet. infrastructure improvements. Through Alternatively, acquiring an US$3000 H3ABioNet, researchers and technical staff wealthy eBioKit system will make a big from MDH, MUHAS and UDSM have been difference. Meanwhile, awareness attending specialized training courses with

7 Lyantagaye S.L. – Current status and future perspectives…

campaigns should be constantly stages to manage their own specific data on appeal to the government and private sector indigenous biological species, on local to invest in bioinformatics. epidemiology and biodiversity programmes. These tasks clearly require that statisticians Many organizations use computer clusters to and informatics experts become advanced maximize processing time, increase database users of bioinformatics software and develop storage and implement faster data storing a capability to solve problems locally. This and retrieving techniques (Barker et al. process does not require large resources in it 2013). The major advantages of using but will allow developing countries to computer clusters are clear when an further investigate their own biological organization requires large scale processing resources. To facilitate this process like in bioinformatics teaching and research. biomathematics/bio-computing should be When used this way, computer clusters introduced to universities, and the offer: Cost efficiency: the cluster technique establishment of small software groups and is cost effective for the amount of power and companies should be encouraged. processing speed being produced. It is more efficient and much cheaper compared to Whereas the 1990s to 2000s have been other solutions like setting up mainframe characterized by genome projects that have computers. Processing speed: multiple high called for massive data processing solutions, speed computers work together to provided the challenge remains in the understanding unified processing, and thus faster of the results. At present, advanced processing overall. Improved network bioinformatics is concentrated in a few infrastructure: different Local Area Network research centres and private companies (LAN) topologies are implemented to form a around the world that have the capacity to computer cluster. These networks create a employ personnel with highly specialized highly efficient and effective infrastructure training. In spite of the fact that that prevents bottlenecks. Flexibility: unlike bioinformatics methods are freely mainframe computers, computer clusters can accessible, there is clearly a gap between the be upgraded to enhance the existing developing and the industrialized world, specifications or add extra components to which must be consciously narrowed. the system. High availability of resources: If Bioinformatics is indeed the enabling any single component fails in a computer technology for several fields of biomedical cluster, the other machines continue to and agricultural research. The use of provide uninterrupted processing (Bader and bioinformatics spreads freely through the Pennington 2001, Barker et al. 2013). Internet and it helps developing countries to Computer clustering, therefore, could be catch up with industrialized countries. All handy in bioinformatics teaching and this is based, however, on the principle that training institutions in Tanzania by pooling information resources worldwide remain together small computational resources freely accessible. If this should change in the available in various units. future, it might widen the North-South gap in biotechnology. FUTURE PERSPECTIVES OF BIOINFORMATICS CONCLUSION Any country intending to remain up-to-date There is unlimited opportunities provided by in the biomedical, biotechnological and the next generation sequencing technologies agricultural sectors, cannot disregard which has made affordable genetic data Bioinformatics. In addition to this general generation for science and medicine even in trend, developing countries may also want to developing countries. However, the

8 Tanz. J. Sci. Vol 39 2013

bottleneck in advancing these technologies scientists. Universities and other higher consists in a lack of expertise in using learning institutions in the country need to bioinformatics tools and approaches for data face the fact that the world has changed. The mining in raw DNA sequences. There is no paper is meant to rouse public awareness of need of very large investments to overcome these new opportunities to address many this limitation, but the need is in establishing research problems relevant to Tanzania and of education programs, training courses and sub-Sahara Africa. workshops for students, university staff members and medical practitioners and

REFERENCES Altschul SF, Madden TL, Schäffer AA, COSTECH: Zhang J, Zhang Z, Miller W and Lipman http://www.costech.or.tz/?s=biotechnolo DJ 1997 Gapped BLAST and PSI- gy, 21 November 2013. BLAST: a New Generation of Protein ExPasy: http://www.expasy.org/ Database Search Programs. Nucleic Field D, Tiwari B, Booth T, Houten S, Swan Acids Res. 25:3389-3402. D, Bertrand N and Thurston M2006 ARIS 2010 The Academic Registration Open Software for Biologists: From Information System Famine to Feast. Developing and (ARIS).https://aris2.udsm.ac.tz/ Deploying Specialized Computing Bader DA and Pennington R 2001 “Cluster Systems for Specific Research Computing: Applications''. Int. J. High Communities is Achievable, Cost Perform. Comput. Appl. 15:181-185. Effective and Has Wide-ranging Barker D, Ferrier DE, Holland PW, Mitchell Benefits. Nat. Biotechnol. 24: 801-803. JB, Plaisier H, Ritchie MG and Smart Fuxelius H, Bongcam E and Jaufeerally Y SD 20134273π: Bioinformatics 2010 The Contribution of the eBioKit to Education on Low Cost ARM Hardware. Bioinformatics Education in Southern BMC Bioinformatics 14: 243. Africa. EMBnet J. Available at: Bayat A 2002 Science, Medicine and the . 21 1022. Nov. 2013. Chubb D, Jefferys BR, Sternberg MJE and Gascuel O and Steel M 2007 Reconstructing Kelley LA 2010 Sequencing Delivers Evolution: New Mathematical and Diminishing Returns for Homology Computational Advances. Oxford Detection: Implications for Mapping the University Press, New York. Protein Universe. Bioinformatics 26: Genome Management Information System 2664-2671. (GMIS) 2008 Genomics and Its Impact Collins FS, Morgan M and Patrinos A 2003 on Science and Society, Human Genome The Human Genome Project: Lessons Program, U.S. Department of Energy: A from Large-scale Biology. Science 300: 2008 Primer. 286-90. GNU operating system: http://www.gnu.org Collins FS, Patrinos A, Jordan E, Hucka M, Bergmann F, Hoops S, Keating S, Chakravarti A, Gesteland R and Walters Sahle S, Schaff JC, Smith LP and L 1998 New Goals for the U.S. Human Wilkinson D 2010 The Systems Biology Genome Project: 1998-2003. Science Markup Language (SBML): Language 282: 682-689. Specification for Level 3 Version 1 Core (Release 1 Candidate). Nature Proceedings 1-167.

9 Lyantagaye S.L. – Current status and future perspectives…

Ishengoma DS, Makani J, All TGN Marshall G, Vakser L 2005 Protein-Protein members 2013 Tanzania Genome Docking Methods. Protein Reviews 3: Network: Setting up a Framework and 115-146. Foundation for Genomic Research in MBB: Tanzania. The 8th Scientific Meeting of http://www.mbb.udsm.ac.tz/mbb_files/m the African Society of Human Genetics olecular_biology&biotechnology1.html, (19th - 21st May 2013, at La Palm Beach 22 November 2013. Hotel, Accra, Ghana). Mcwilliam H, Valentin F, Goujon M, Li KEGG: http://www.kegg.jp WW, Narayanasamy M, Martin J, Miyar Kim IJ, Kang HC and Park JG 2004 T and Lopez R 2009 Web services at the Microarray Applications in Cancer European Bioinformatics Institute-2009. Research. Cancer Res. Treat. 36: 207– Nucl. Acids Res. 37 (Web Server issue): 213. W6–W10. Korf I 2004. "Gene Finding in Novel Meiri E, Levy A, Benjamin H, Ben-David Genomes". BMC Bioinformatics 5: 59- M, Cohen L, Dov A, Dromi N, Elyakim 67. E, Yerushalmi N, Zion O, Lithwick- Krampis K, Booth T, Chapman B, Tiwari B, Yanai G and Sitbon E 2010 Discovery of Bicak M, Field D and Nelson KE 2012 microRNAs and Other Small RNAs in Cloud BioLinux: Pre-configured and On- Solid Tumors. Nucleic Acids Res. demand Bioinformatics Computing for 38:6234-6246. the Genomics Community. BMC Miller W, Makova KD, Nekrutenko A, Bioinformatics 13: 1-8 Hardison RC 2004 Comparative Larsen M, Yamada KM and Musselmann K genomics. Annu. Rev. Genomics Hum. 2010 Systems Analysis of Salivary Genet. 5:15-56. Gland Development and Disease. WIREs NCBI: http://www.ncbi.nlm.nih.gov Syst. Biol. Med. 2: 670-682. NCBI BLAST Lassmann S, Weis R, Makowiec F, Roth J, home.http://blast.ncbi.nlm.nih.gov/Blast. Danciu M, Hopt U and Werner M 2007 cgi Array CGH Identifies Distinct DNA NEPAD: http://www.nepad.org Copy Number Profiles of Oncogenes and Pauling L, Corey RB and Branson HR Tumor Suppressor Genes in 1951The Structure of Proteins: Two Chromosomal- and Microsatellite- Hydrogen-bonded Helical unstable Sporadic Colorectal Configurations of the Polypeptide Chain. Carcinomas. J. Mol. Med. 85: 293-304. Proc. Natl. Acad. Sci. USA 37: 205-211. Liu X, Long F, Peng H, Aerni SJ, Jiang M, Peng J and Xu J 2010 Low-homology Sa´nchez-Blanco A, Murray JI, Preston Protein Threading. Bioinformatics 26: F, Mericle B, Batzoglou S, Myers EW i294-i300. and Kim SK 2009 Analysis of Cell Fate Rakhshandehroo M, Hooiveld G, Müller M from Single-Cell Gene Expression and Kersten S 2009 Comparative Profiles in C. elegans. Cell 139: 623– Analysis of Gene Regulation by the 633. Transcription Factor PPARα Between Luscombe NM, Greenbaum D and Gerstein Mouse and Human. PLoS ONE 4: e6796. M 2001What is Bioinformatics? An Rätsch G, Sonnenburg S, Srinivasan J, Witte Introduction and Overview. Yearbook of H, Müller KR, Sommer RJ, Schölkopf B Medical Informatics. Yale University, 2007 Improving the C. elegans Genome New Haven, USA. Annotation Using Machine Learning. PLoS Comput. Biol. 3: e20.

10 Tanz. J. Sci. Vol 39 2013

Reportlinker 2013 Genomics Industry: Regulation of Nitrogen Assimilation and Bioinformatics Market by Sector its Coupling to Photosynthesis. Nucleic (Molecular Medicine, Agriculture, Acids Res. 33: 5156-5171. Research & Forensic), Segment TANBIF: http://www.tanbif.or.tz (Sequencing Platforms, Knowledge Uchikoga N and Hirokawa T 2010 Analysis Management Tools & Data Analysis of Protein-protein Docking Decoys Services) & Application (Genomics, Using Interaction Fingerprints: Proteomics & Drug Design), Global Application to the Reconstruction of Forecasts to 2017. SOURCE CaM-Ligand Complexes. BMC Reportlinker. Bioinformatics 11: 236-246. http://www.reportlinker.com, 15 UDSM (University of Dar-es-salaam) 2010 November 2013. Connection of the University to The RNCOS 2010 Global Bioinformatics Market STM-1 SEACOM Fibre-optic Cable. Outlook. Publish Date: September 2010 Press Release Issued by the Public RNCOS 2006 Bioinformatics Market Relations Office 04/08/2010. Update, Publish Date: Jul 2006. van Kampen AHC and Horrevoets AJG http://www.rncos.com/Report/IM045.ht 2006 The Role of Bioinformatics in m Genomic Medicine. In: Pasterkamp G Sanger F, Air GM, Barrell BG, Brown NL, and de Kleijn DPV (Eds) Cardiovascular Coulson AR, Fiddes CA, Hutchinson CA Research, New Technologies, Methods, and Slocombe PM 1977 Nucleotide and Applications Springer, New York Sequence of Bacteriophage φX174 Wagner JR, Ge B, Pokholok D, Gunderson DNA. Nature 265: 687-695. KL, Pastinen T and Blanchette M 2010 Simara P, Koutna I, Stejskal S, Krontorad P, Computational Analysis of Whole- Rucka Z, Peterkova M and Kozubek M Genome Differential Allelic Expression 2009 Combination of mRNA and Protein Data in Human. PLoS Comput. Biol. 6: Microarray Analysis in Complex Cell 1-12. profiling. Neoplasma 56: 141-149. Wang C, Bradley P, Baker D 2007 Protein- Stajich JE and Lapp H 2006 Open Source protein Docking with Backbone tools and Toolkits for Bioinformatics: Flexibility. J. Mol. Biol. 373: 503-19. Significance, and Where are We? Brief Wang H, Liu Y, Briesemann M and Yan J Bioinform. 7: 287-296. 2010 Computational Analysis of Gene Stevens R. 2006 Trends in Cyber- Regulation in Animal Sleep Deprivation. infrastructure for Bioinformatics and Physiol. Genomics 42: 427-436. Computational Biology. CTWatch Q. 2. Yang HY, Kwon J, Cho EJ, Choi HI, Park http://www.ctwatch.org/quarterly/articles C, Park HR, Park SH, Chung KJ, Ryoo /2006/08/trends-in-cyberinfrastructure- ZY, Cho KO and Lee TH 2010 for-bioinformatics-and-computational- Proteomic Analysis of Protein biology/, 5 March 2014. Expression Affected by Peroxiredoxin V Su Z, Olman V, Mao F and Xu Y 2005 Knock-Down in Hypoxic Kidney. J. Comparative Genomics Analysis of Proteome Res. 9: 4003-4015. NtcA regulons in Cyanobacteria:

11