Proceedings of 11th National Science Symposium on Recent Trends in Science and Technology (February 03, 2019). Organized by Christ College, Rajkot & Sponsored by Gujarat State Biotechnology Mission (GSBTM), DST, Govt. of Gujarat.

VIROINFORMATICS: DATABASES AND TOOLS

Angelin George, John J. Georrge*
Department of Bioinformatics, Christ College, Rajkot-360 005, Gujarat
[email protected]

ABSTRACT

The amalgamation of virology and Bioinformatics has led to the development of a new field known as viroinformatics. More than 100 web servers and databases are currently available, which provide information on different viruses, for example, dengue virus, influenza virus, hepatitis virus, human immunodeficiency virus (HIV), hemorrhagic fever virus (HFV), human papillomavirus (HPV), and West Nile virus. These databases provide tools for homology modelling, phylogenetic tree construction, multiple sequence alignment, and 3D visualization. The need for computer-assisted analysis of the genome structure, function and evolution of viruses is increasing immensely to tackle various challenges in virology. This review presents an overview of the viroinformatics databases and tools that can contribute to the development of new potential drugs.

1. INTRODUCTION

Viruses are ubiquitous infectious agents known to infect all types of life forms, from animals and plants to microorganisms such as bacteria and archaea. Viruses cause various human diseases; these include common illnesses such as the common cold, influenza, chickenpox, and cold sores, and serious diseases such as Ebola virus disease, dengue fever, AIDS, avian influenza, and severe acute respiratory syndrome (SARS). The possible connection between human herpesvirus 6 (HHV6) and neurological diseases such as multiple sclerosis and chronic fatigue syndrome is under investigation (Komaroff, 2006). Approximately 6 million deaths occur every year due to viruses, despite the availability of effective vaccines and treatments for several diseases. Thus, it is crucial to develop remedies against these viral invaders. A large amount of genomic and experimental data is generated due to advances in molecular biology and Bioinformatics. To store, examine, and disseminate all this information, 144 viroinformatics resources have been developed. The International Committee on Taxonomy of Viruses (ICTV), which names and classifies viruses, lists 4,958 species. The genomes of 8,110 viral strains have been sequenced (NCBI Viral Genome Resource). Bioinformatics research, including data analysis and the development of tools and databases on microorganisms, is growing gradually (Abouelwafa et al., 2017; George et al., 2018; George et al., 2017; Georrge et al., 2011; Georrge et al., 2012; Georrge, 2016; Kotadiya et al., 2015; Lijo et al., 2012; Nishita et al., 2015; Nishita et al., 2016; Ranipa et al., 2018; Sakina et al., 2016; Sharma et al., 2014; Trivedi et al., 2016a, 2016b; Ukani et al., 2011; Vaidya et al., 2018). Powerful viroinformatics resources have been developed that provide unprecedented opportunities to address fundamental questions in virology.
Bioinformatic analysis of viruses includes the analysis of novel sequences, for example gene identification, functional annotation of genes, and analysis of phylogenetic relationships. Because viral genomes are small, it is possible to sequence large numbers of isolates, which calls for specific methods of analysis. However, the technologies currently available for viral genomes pose challenges because most analysis steps are not easily automated (Tumpey et al., 2005).

2. VIRUS-CENTERED RESOURCES

The biodiversity of viruses and its coverage of multiple scales make algorithm and software development challenging (Hufsky et al., 2018). Many new databases and tools have recently become available to virologists; they are discussed in the following sections.

Baculoviruses and Papillomaviruses

Baculoviruses are a family of viruses that infect insects. They have a large double-stranded DNA (dsDNA) genome that can accommodate multiple additional foreign genes (Kamita et al., 2010). The www.virology.ca database provides easy access to the genes, gene families, and genomes of different virus families, including baculoviruses. Human papillomavirus (HPV) infection is caused by a papillomavirus that can spread through skin-to-skin contact (Table 1). The Papillomavirus Episteme (PaVE) database contains curated papillomavirus genomic sequences and provides several web-based sequence analysis tools (Van Doorslaer et al., 2016).

www.christcollegerajkot.edu.in, © Christ College, Rajkot, India ISBN: 9788192952147

Table 1: Resources of Baculoviruses and Papillomaviruses

| Virus | Resource | Specific Features | URL |
|---|---|---|---|
| Baculoviruses | virology.ca | Database that provides access to viral genomic information | https://4virology.net/ |
| Papillomaviruses | PaVE | Access to papillomavirus sequences and analysis tools | http://pave.niaid.nih.gov |

Dengue Virus and West Nile Virus

Dengue virus is the causative agent of a common arthropod-borne viral disease in humans, with 50–100 million infections per year. To date, vaccines have been developed that target all serotypes of the dengue virus. A set of virus-specific databases is available at NCBI, referred to as the Virus Variation Resources (VVR) (Table 2). It is an integrated resource for the dengue virus and West Nile virus, where users can build complex queries and then apply various analysis tools to the results (Resch et al., 2009). DengueNet is the World Health Organization’s central data management system for the global epidemiological surveillance of dengue fever (DF) and dengue haemorrhagic fever (DHF) (Lawrence, 2002).

Table 2: Resources of Dengue Virus and West Nile Virus

| Virus | Resource | Specific Features | URL |
|---|---|---|---|
| Dengue Virus and West Nile Virus | NCBI-VVR | Set of virus-specific databases | http://www.ncbi.nlm.nih.gov/genomes/VirusVariation/ |
| Dengue Virus | DengueNet | Data management system for surveillance of dengue fever | - |

Influenza Virus

The human influenza virus is distributed worldwide. Influenza was brought to the forefront of the world’s attention by the recent emergence of highly pathogenic avian influenza virus (AIV; H5N1), which resulted in the death of more than 100 people and the slaughter of millions of poultry in Asia, Europe and Africa (World Health Organization, http://www.who.int). To date, ten web portals and tools have been developed solely for the influenza virus (Table 3). The Influenza Virus Database (IVDB) was the first information resource to combine Beijing Institute of Genomics (BIG) data with published influenza virus (IV) sequences, after expert curation to ensure a high standard of accuracy and completeness. IVDB currently contains 43,875 influenza virus nucleotide sequences, 53,983 CDS sequences and 53,983 protein sequences. The two main features of IVDB are (i) the Sequence Distribution Tool, which facilitates analysis of global IV transmission and evolution, and (ii) the IV Sequence Quality Filter System (Q-filter), which classifies and ranks the nucleotide sequences into 7 categories according to sequence content and integrity (Chang et al., 2006).

Table 3: Resources of Influenza Virus

| Resource | Specific Features | URL |
|---|---|---|
| NCBI-IVR | Integrated information resource and analysis platform for genetic, genomic, and phylogenetic studies of influenza virus | http://influenza.psych.ac.cn/ |
| FluTE | Influenza epidemic simulation tool | https://www.cs.unm.edu/~dlchao/flute/ |
| IRD | Provides various visualization and analysis tools for comparative genomics | https://www.fludb.org/brc/home.spg?decorator=influenza |
| FluGenome | Web portal for genotyping influenza A virus | https://omictools.com/flugenome-tool |
| IVDB | An integrated information resource and analysis platform for influenza virus research | http://influenza.big.ac.cn/ |
| EpiFlu | Provides access to influenza virus sequences and related clinical and epidemiological data associated with human viruses | http://platform.gisaid.org |
| GiRaF | Identifies influenza virus reassortments | http://kingsfordlab.cbd.cmu.edu/ |
| OpenFluDB | Contains genomic and protein sequences as well as epidemiological data from more than 25’000 isolates | http://openflu.vital-it.ch/browse.php |
| ISED | Establishes influenza genomic sequences and compares the user’s sequences with those of vaccine strains | http://influenza.cdc.go.kr |
| ATIVS | Analysis tools for influenza virus surveillance | http://influenza.nhri.org.tw/ATIVS/ |

NCBI-IVR is the most cited resource and provides tools for genome annotation of influenza virus, such as FLAN (FLu ANnotation), for user-provided influenza A virus or influenza B virus sequences (Bao et al., 2007). IRD is a publicly accessible resource that integrates genomic, proteomic, immune epitope, and surveillance data from various sources, including public databases, computational algorithms and scientific literature (Squires et al., 2012). FluGenome is a tool for genotyping influenza A virus and identifying reassortment events between divergent lineages (Lu et al., 2007). The ISED (Influenza Sequence and Epitope Database) was established to analyze drug resistance mutations in the user’s sequences and to provide information about epitopes. It also allows users to visualize epitope-matching structures (Yang et al., 2008). ATIVS is a web server that carries out both antigenic and genetic analyses of influenza isolates for influenza surveillance (Liao et al., 2009). The OpenFluDB database contains genomic and protein sequences and epidemiological data from more than 27 000 isolates, including virus type, host, geographical location and experimentally tested antiviral resistance (Liechti et al., 2010). The Influenza Primer Design Resource (IPDR) is a web interface that provides several essential tools to aid the design of oligonucleotides that may be used to develop better diagnostics (Bose et al., 2008). FluTE is a publicly available influenza epidemic simulation model that supports more realistic intervention strategies and can run on a personal computer (Chao et al., 2010).
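To give a feel for what a stochastic epidemic simulation does, the following is a minimal sketch of a discrete-time stochastic SIR (susceptible, infected, recovered) model. It is not FluTE's algorithm: FluTE is a far more detailed model, and the function name `simulate_sir`, the parameters `beta` and `gamma`, and all numeric values here are illustrative assumptions.

```python
import random


def simulate_sir(population, initial_infected, beta, gamma, days, seed=0):
    """Toy discrete-time stochastic SIR model (illustrative only).

    Each day, every susceptible person is infected with probability
    1 - (1 - beta / population) ** infected, and every infected person
    recovers with probability gamma. Returns a list of (S, I, R) tuples,
    one per day, starting with day 0.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    s, i, r = population - initial_infected, initial_infected, 0
    history = [(s, i, r)]
    for _ in range(days):
        # Probability that a susceptible escapes all i infected contacts,
        # turned into a per-person infection probability.
        p_infect = 1.0 - (1.0 - beta / population) ** i
        new_infections = sum(1 for _ in range(s) if rng.random() < p_infect)
        new_recoveries = sum(1 for _ in range(i) if rng.random() < gamma)
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: 1000 people, 5 initially infected, 60 days.
history = simulate_sir(1000, 5, beta=0.4, gamma=0.2, days=60, seed=1)
peak_day = max(range(len(history)), key=lambda day: history[day][1])
```

Because the model is stochastic, different seeds give different epidemic curves; an intervention could be explored in this toy setting by lowering `beta` and comparing many runs.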

HIV and Human T-lymphotropic virus

Tremendous efforts are under way to curb the spread of the Human Immunodeficiency Virus (HIV), and many resources have been developed for this purpose (Table 4). The LANL HIV database stores information such as genetic sequences and drug-resistance-associated mutations, and offers several tools: (1) Geography Search Interface: retrieves HIV sequences based on geographical distribution. (2) HIValign: uses the HMM alignment models already available in the database to align the query sequences. (3) Sequence Locator: finds the position of an HIV or SIV nucleotide or protein sequence. (4) Find Model: gives the best evolutionary model that fits the query sequences (Kuiken et al., 2003; Shaw et al., 2013). Some viruses mutate at high rates and rapidly develop resistance to existing antiviral drugs. As a result, HIV drug resistance is increasing worldwide. An essential resource, the Stanford HIV Drug Resistance Database (HIVDB), was developed to enable HIV care providers to interpret HIV drug resistance tests and choose the most appropriate treatment for their patients (Shaw et al., 2013). A similar interpretation system, the EuResist Network, explores several machine learning techniques to build a treatment response prediction engine (Zazzi et al., 2012). PIRSpred is a web server that predicts protein-inhibitor resistance and susceptibility for HIV-1 (Jenwitheesuk et al., 2005). HIV Therapy Simulator (HIVSIM) is software containing computer simulation models that help explore the efficacy of HIV therapy regimens (Lim et al., 2011).

Table 4: Resources of Human Immunodeficiency Virus

| Resource | Specific Features | URL |
|---|---|---|
| LANL HIV database | Stores HIV sequences, drug-resistance mutations and several other tools | http://hiv.lanl.gov |
| Stanford HIV drug resistance DB | Predicts drug-resistance mutations | http://hivdb.stanford.edu |
| EuResist Network | Treatment response prediction engine | https://www.euresist.org/ |
| PIRSpred | Predicts protein-inhibitor resistance for HIV-1 | http://protinfo.compbio.washington.edu/pirspred/ |
| SQUAT | Quality assessment tool | http://www.stat.brown.edu/CFAR/SQUAT |
| SCUEAL | Phylogenetic method for automatic subtyping of an HIV-1 sequence | http://www.datamonkey.org/dataupload_scueal.php |
| bNAber | Stores detailed information about HIV bNAbs and provides visualization tools | http://bnaber.org |
| HIVCD | Tool for contamination detection | http://sourceforge.net/projects/hivcd/ |
| vFitness | Tool developed to understand viral fitness | http://bis.urmc.rochester.edu/vFitness/ |
| HIV Systems Biology | Protein interaction, HIV replication cycle site | http://hivsystemsbiology.org |
| HIVSIM | Explores the efficacy of HIV therapy treatment | https://sites.google.com/site/hivsimulator/ |
| HIV positive selection mutation DB | Provides selection pressure maps of PR/RT | http://fold.doe-mbi.ucla.edu/HIV/ |

The SCUEAL web server implements a model-based phylogenetic method for automatically subtyping an HIV-1 sequence, assigning parental sequences in recombinant strains, and computing confidence levels for the inferred quantities (Pond et al., 2009). The Sequence Quality Analysis Tool (SQUAT), which runs in the R statistical environment, was created for quality assessment before sequence analysis (DeLong et al., 2012). The bNAber database provides access to detailed data on the rapidly growing list of broadly neutralizing HIV antibodies (bNAbs), including sequences and three-dimensional structures (Eroshkin et al., 2014).

HIV Contamination Detection (HIVCD) is an open-source tool used to make pairwise comparisons of HIV-1 pol gene sequences from patients across the United States, contributing to quality testing (Ebbert et al., 2013). The accurate estimation of viral fitness depends on complicated statistical methods. This led to the development of vFitness, a web-based computing tool for improving the estimation of in vitro HIV-1 fitness experiments (Ma et al., 2010). HIV Systems Biology is a website that collects big data on HIV and hosts tools such as Gene Overlapper, the HIV Replication Cycle site, GPS-Prot and AIDSVu (Bushman et al., 2013). Human T-cell lymphotropic virus type 1 (HTLV-1) is a human retrovirus that causes a type of cancer called adult T-cell leukaemia/lymphoma. The HTLV-1 Molecular Epidemiology Database stores annotated HTLV-1 sequences from clinical, epidemiological, and geographical studies (Araujo et al., 2012).

Hepatitis virus

Hepatitis is mainly caused by five unrelated hepatotropic viruses: hepatitis A virus (HAV), hepatitis B virus (HBV), hepatitis C virus (HCV), hepatitis D virus (HDV), and hepatitis E virus (HEV). HepSEQ is a public repository that contains data related to hepatitis B virus (HBV) infection collected from international sources (Table 5). There are four major sections in the web interface of HepSEQ. The first section shows a summary of the data in the repository. The second section allows the user to access and submit data. The third section generates pie and bar charts based on the relation between different factors. In the fourth section, three tools are available for sequence analysis: Sequence Matcher, Genotyper and Mutation Marker (Gnaneshan et al., 2006).

Table 5: Resources of Hepatitis B virus (HBV) and hepatitis C virus (HCV)

| Virus | Resource | Specific Features | URL |
|---|---|---|---|
| HBV | HepSEQ | Data repository of Hepatitis B | http://www.hepseq.org/ |
| HBV | HBVdb | Contains various sequences and analysis tools | http://hbvdb.ibcp.fr |
| HBV | SeqHepB | To determine resistance-associated mutations | http://www.seqhepb.com |
| HBV | HBVRegDB | Comparison and detection of regulatory elements in hepatitis B | http://lancelot.otago.ac.nz |
| HCV | euHCVdb | Resource of sequences, structures and tools | http://euhcvdb.ibcp.fr |

HBVdb facilitates the investigation of the genetic variability of Hepatitis B Virus (HBV) and allows users to annotate their sequences (Hayer et al., 2012). SeqHepB is both a sequence analysis program and a database containing data from multiple sources (Yuen et al., 2007). HBVRegDB is a tool that enables annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences (Panjaworayan et al., 2007). The European hepatitis C virus database (euHCVdb) is a library of computer-annotated sequences based on the reference genome (Combet et al., 2006).

ViPR

The Virus Pathogen Database and Analysis Resource (ViPR, www.viprbrc.org) contains various records, gene and protein annotations, immune epitopes, 3D structures, and host factor data. ViPR provides three main functions. First, it stores data from external and internal sources and groups these data into two main categories: virus families containing human priority pathogens and those posing possible public health threats. Second, users can perform analyses with the data analysis and visualization tools provided by ViPR. Third, the ViPR workbench allows results to be stored and retrieved when necessary (Araujo et al., 2012). Currently, ViPR contains a wide range of information on several human-pathogenic viruses belonging to the Arenaviridae, Bunyaviridae, Caliciviridae, Coronaviridae, Filoviridae, Flaviviridae, Hepeviridae, Herpesviridae, Paramyxoviridae, Picornaviridae, Poxviridae, Reoviridae, Rhabdoviridae, and Togaviridae families. ViPR draws its data from three different sources: (i) public archives, namely GenBank, UniProt, the Protein Data Bank (PDB, http://www.rcsb.org/pdb), the Immune Epitope Database and PubMed; (ii) novel derived data that ViPR produces with the help of various automated Bioinformatics and comparative genomics algorithms; and (iii) direct data submissions to ViPR from experiments and other independent institutions.

3. VIRUS-SPECIFIC TOOLS

Identity distribution and genotype sequencing are crucial for studying the viral genome. The identity distribution is plotted as a histogram in which each bar represents an interval of identities. A number of tools have been developed for the analysis of viral sequences; they are listed below.
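The identity distribution described above can be sketched in a few lines. The example below assumes the sequences are already aligned to the same length; the function names, the 10% bin width and the four toy sequences are illustrative choices, not taken from any particular tool.

```python
import math
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_histogram(aligned_seqs, bin_width=10):
    """Count pairwise identities per interval [lo, lo + bin_width)."""
    n_bins = math.ceil(100 / bin_width)
    counts = [0] * n_bins
    for a, b in combinations(aligned_seqs, 2):
        identity = percent_identity(a, b)
        # 100% identity is placed in the last bin.
        index = min(int(identity // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


def render(counts, bin_width=10):
    """Plain-text histogram: one bar of '#' characters per interval."""
    lines = []
    for index, count in enumerate(counts):
        lo = index * bin_width
        hi = min(lo + bin_width, 100)
        lines.append(f"{lo:3d}-{hi:3d}% | {'#' * count}")
    return "\n".join(lines)


# Toy aligned sequences (hypothetical data).
example = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTTT", "TTTTACGTAC"]
counts = identity_histogram(example)
print(render(counts))
```

Each bar corresponds to one identity interval, so a cluster of bars near 100% suggests closely related isolates, while bars at lower identities point to more divergent sequences.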

De novo assembly tools for viral genome

Velvet, ABySS, and Geneious are tools available for whole-genome assembly. Velvet is a set of algorithms that leverages short reads to produce useful assemblies based on de Bruijn graphs (Zerbino et al., 2008). ABySS (Assembly By Short Sequences) is a parallelized sequence assembler that increases the amount of memory available to the assembly process (Simpson et al., 2009). Geneious Basic is a software platform for the analysis and visualization of biological data (Kearse et al., 2012). Because of repetitive UTR regions, these tools cannot be used for complete viral genomes. As an alternative, algorithms such as SPAdes (Bankevich et al., 2012) and IDBA-UD (Peng et al., 2012), which were developed for single-cell assemblies, can be considered. VICUNA is another algorithm, designed to generate assemblies from heterogeneous populations (Yang et al., 2012). In addition, SOAPdenovo-Trans is not virus-specific but works efficiently for memory-efficient short-read de novo assembly (Luo et al., 2012).
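The de Bruijn graph idea behind assemblers such as Velvet can be illustrated with a toy example. The sketch below is not Velvet's implementation: it ignores sequencing errors, coverage and reverse complements, and the reads, the value of `k` and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph from the distinct k-mers of the reads.

    Nodes are (k-1)-mers; every k-mer adds one edge from its prefix
    to its suffix.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Walk every edge exactly once (Hierholzer's algorithm)."""
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for node, successors in graph.items():
        balance[node] += len(successors)
        for nxt in successors:
            balance[nxt] -= 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((n for n, b in balance.items() if b == 1), None)
    if start is None:
        start = next(iter(graph), None)
    if start is None:
        return []
    edges = {node: list(successors) for node, successors in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if edges.get(node):
            stack.append(edges[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the contig along the Eulerian path of the de Bruijn graph."""
    path = eulerian_path(build_de_bruijn(reads, k))
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])


# Three overlapping toy reads drawn from the sequence ATGGCGTGCA.
reads = ["ATGGCG", "GGCGTG", "CGTGCA"]
contig = assemble(reads, k=4)
```

In this toy case every (k-1)-mer occurs once, so the path is unique. Repeats longer than k-1 bases create nodes with several possible continuations, which is one way to see why repetitive regions are difficult for this class of assembler.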

Secondary structure prediction tools

Predicting the secondary structure of viral sequences is necessary to understand the regulatory functions of a virus. Advances in sequencing technology have led to the development of computational programs and tools such as Mfold, RNAfold and LocARNA that predict secondary structure. Mfold predicts RNA and DNA folding and hybridization (Zuker, 2003). RNAfold predicts the minimum free energy (MFE) secondary structure and calculates the equilibrium base-pairing probabilities (Gruber et al., 2008). LocARNA (Local Alignment of RNA) is a tool that can deal with pseudoknot-free RNA secondary structures (Will et al., 2007).
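The dynamic-programming principle behind pseudoknot-free structure prediction can be shown with the classic Nussinov base-pair maximization algorithm. This is a deliberately simplified stand-in: Mfold and RNAfold minimize free energy with thermodynamic parameters, whereas the sketch below only maximizes the number of base pairs. The minimum hairpin loop of 3 bases and the toy sequence are assumptions.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Fill the table dp[i][j] = maximum number of pairs in seq[i..j]."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp


def traceback(seq, dp, min_loop=3):
    """Recover one optimal structure in dot-bracket notation."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


# Toy hairpin-forming sequence (hypothetical).
rna = "GGGAAAUCCC"
table = nussinov(rna)
max_pairs = table[0][len(rna) - 1]
structure = traceback(rna, table)
```

Several structures can share the same maximum number of pairs, so the traceback returns only one of them; energy-based tools break such ties using stacking and loop energies.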

Virus genotyping and annotation

Genome annotation is a crucial step for identifying coding and noncoding regions. For instance, GLUE (Genes Linked by Underlying Evolution) is software for interpreting sequence data for different viruses and can also be used as a storage platform (Singer et al., 2018). It has a simple interface that is used in Bioinformatics pipelines. In addition, ATHLATES is software that determines HLA genotypes from Illumina exome sequencing. PriSM selects and matches primers for viral genome amplification (Yu et al., 2010).
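As a minimal illustration of separating coding from noncoding regions, the sketch below scans the three forward reading frames of a DNA sequence for open reading frames (ORFs). It is not how GLUE or any other tool named above works; real annotation also considers the reverse strand, alternative start codons, overlapping genes and homology to reference genomes. The function name, the `min_codons` threshold and the toy sequence are assumptions.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Find ORFs on the forward strand.

    Returns a list of (start, end, frame) tuples, where dna[start:end]
    runs from the start codon through the stop codon (end is exclusive)
    and min_codons is the minimum number of codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == START:
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence with a single ORF in reading frame 2 (hypothetical data).
sequence = "CCATGAAACCCGGGTAGTT"
orfs = find_orfs(sequence)
coding = [sequence[start:end] for start, end, _ in orfs]
```

Everything outside the reported intervals is treated as noncoding in this toy setting, which already hints at why compact viral genomes with overlapping reading frames need more careful, virus-aware annotation.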

4. CONCLUSION

This review covers the majority of databases and tools that contain essential information on several viruses. It should help virologists select the best tool for specific experiments. Bioinformatics tools facilitate the comparison of genetic and genomic data, which helps to understand the evolutionary relationships between various species.

5. REFERENCES

Abouelwafa Manal & Georrge John J. (2017). Ebola virus and its potential drug targets. Paper presented at the Proceedings of International Science Symposium on Recent Trends in Science and Technology (ISBN: 9788193347553).
Araujo Thessika Hialla Almeida, Souza-Brito Leandro Inacio, Libin Pieter, Deforche Koen, Edwards Dustin, de Albuquerque-Junior Antonio Eduardo, . . . Alcantara Luiz Carlos Junior. (2012). A public HTLV-1 molecular epidemiology database for sequence management and data mining. PloS one, 7(9), e42123.
Bankevich Anton, Nurk Sergey, Antipov Dmitry, Gurevich Alexey A, Dvorkin Mikhail, Kulikov Alexander S, . . . Prjibelski Andrey D. (2012). SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology, 19(5), 455-477.
Bose Michael E, Littrell John C, Patzer Andrew D, Kraft Andrea J, Metallo Jacob A, Fan Jiang & Henrickson Kelly J. (2008). The Influenza Primer Design Resource: a new tool for translating influenza sequence data into effective diagnostics. Influenza and other respiratory viruses, 2(1), 23-31.
Bushman Frederic D, Barton Spencer, Bailey Aubrey, Greig Caitlin, Malani Nirav, Bandyopadhyay Sourav, . . . Krogan Nevan. (2013). Bringing it all together: big data and HIV research. AIDS (London, England), 27(5), 835.
Chang Suhua, Zhang Jiajie, Liao Xiaoyun, Zhu Xinxing, Wang Dahai, Zhu Jiang, . . . Wang Jian. (2006). Influenza Virus Database (IVDB): an integrated information resource and analysis platform for influenza virus research. Nucleic acids research, 35(suppl_1), D376-D380.
Chao Dennis L, Halloran M Elizabeth, Obenchain Valerie J & Longini Jr Ira M. (2010). FluTE, a publicly available stochastic influenza epidemic simulation model. PLoS computational biology, 6(1), e1000656.
Combet Christophe, Garnier Nicolas, Charavay Celine, Grando Delphine, Crisan Daniel, Lopez Julien, . . . Hulo Chantal. (2006). euHCVdb: the European hepatitis C virus database. Nucleic acids research, 35(suppl_1), D363-D366.
DeLong Allison K, Wu Mingham, Bennett Diane, Parkin Neil, Wu Zhijin, Hogan Joseph W & Kantor Rami. (2012). Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase. AIDS research and human retroviruses, 28(8), 894-901.
Ebbert Mark TW, Mallory Melanie A, Wilson Andrew R, Dooley Shane K & Hillyard David R. (2013). Application of a new informatics tool for contamination screening in the HIV sequencing laboratory. Journal of Clinical Virology, 57(3), 249-253.
Eroshkin Alexey M, LeBlanc Andrew, Weekes Dana, Post Kai, Li Zhanwen, Rajput Akhil, . . . Godzik Adam. (2014). bNAber: database of broadly neutralizing HIV antibodies. Nucleic acids research, 42(D1), D1133-D1139.
George Rija & Georrge John J. (2018). Statistical analysis of industrially important thermophilic organisms producing alpha-amylase, DNA polymerase and protease. Paper presented at the Proceedings of 10th National Science Symposium on Recent Trends in Science and Technology (ISBN: 9788192952130).
George Rija, Thomas Sneha, Jacob Sarah & Georrge John J. (2017). Approaches for novel drug target identification. Paper presented at the Proceedings of International Science Symposium on Recent Trends in Science and Technology (ISBN: 9788193347553).

Georrge John J & Umrania Valentina. (2011). In silico identification of putative drug targets in Klebsiella pneumonia MGH78578.
Georrge John J & Umrania VV. (2012). Subtractive genomics approach to identify putative drug targets and identification of drug-like molecules for beta subunit of DNA polymerase III in Streptococcus species. Applied Biochemistry and Biotechnology, 167(5), 1377-1395.
Georrge John J. (2016). A Bioinformatics Approach for the Identification of Potential Drug Targets and Identification of Drug-like Molecules for Ribosomal Protein L6 of Staphylococcus species. Paper presented at the Proceedings of 9th National Level Science Symposium on Recent Trends in Science and Technology (ISBN: 9788192952123).
Gnaneshan Saravanamuttu, Ijaz Samreen, Moran Joanne, Ramsay Mary & Green Jonathan. (2006). HepSEQ: international public health repository for hepatitis B. Nucleic acids research, 35(suppl_1), D367-D370.
Gruber Andreas R, Lorenz Ronny, Bernhart Stephan H, Neuböck Richard & Hofacker Ivo L. (2008). The vienna RNA websuite. Nucleic acids research, 36(suppl_2), W70-W74.
Hayer Juliette, Jadeau Fanny, Deleage Gilbert, Kay Alan, Zoulim Fabien & Combet Christophe. (2012). HBVdb: a knowledge database for Hepatitis B Virus. Nucleic acids research, 41(D1), D566-D570.
Hufsky Franziska, Ibrahim Bashar, Beer Martin, Deng Li, Le Mercier Philippe, McMahon Dino P, . . . Marz Manja. (2018). Virologists—Heroes need weapons. PLoS pathogens, 14(2), e1006771.
Jenwitheesuk Ekachai, Wang Kai, Mittler John E & Samudrala Ram. (2005). PIRSpred: a web server for reliable HIV-1 protein-inhibitor resistance/susceptibility prediction. Trends in microbiology, 13(4), 150-151.
Kamita SG, Kang KD, Hammock BD & Inceoglu AB. (2010). 10 Genetically Modified Baculoviruses for Pest Insect Control. INSECT CONTROL.
Kearse Matthew, Moir Richard, Wilson Amy, Stones-Havas Steven, Cheung Matthew, Sturrock Shane, . . . Duran Chris. (2012). Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 28(12), 1647-1649.
Kotadiya Rohitkumar & Georrge John J. (2015). In silico approach to identify putative drugs from natural products for Human papillomavirus (HPV) which cause cervical cancer. Life Sciences Leaflets, 62, 1-13.
Kuiken Carla, Korber Bette & Shafer Robert W. (2003). HIV sequence databases. AIDS reviews, 5(1), 52.
Lawrence J. (2002). DengueNet–WHO’s internet based system for the global surveillance of dengue fever and dengue haemorrhagic fever. Weekly releases (1997–2007), 6(39), 1883.
Liao Yu-Chieh, Ko Chin-Yu, Tsai Ming-Hsin, Lee Min-Shi & Hsiung Chao A. (2009). ATIVS: analytical tool for influenza virus surveillance. Nucleic acids research, 37(suppl_2), W643-W646.
Liechti Robin, Gleizes Anne, Kuznetsov Dmitry, Bougueleret Lydie, Le Mercier Philippe, Bairoch Amos & Xenarios Ioannis. (2010). OpenFluDB, a database for human and animal influenza virus. Database, 2010.
Lijo John, Georrge John J. & Trupti Kholia. (2012). A Reverse Vaccinology Approach for the Identification of Potential Vaccine Candidates from Leishmania spp. Applied Biochemistry and Biotechnology, 167(5), 1340-1350.
Lim Huat Chye, Curlin Marcel E & Mittler John E. (2011). HIV Therapy Simulator: a graphical user interface for comparing the effectiveness of novel therapy regimens. Bioinformatics, 27(21), 3065-3066.
Luo Ruibang, Liu Binghang, Xie Yinlong, Li Zhenyu, Huang Weihua, Yuan Jianying, . . . Liu Yunjie. (2012). SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience, 1(1), 18.
Ma Jingming, Dykes Carrie, Wu Tao, Huang Yangxin, Demeter Lisa & Wu Hulin. (2010). vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments. BMC bioinformatics, 11(1), 261.
Nishita Vaishnav, Aparna Gupta, Sneha Paul & Georrge John J. (2015). Overview of computational vaccinology: vaccine development through information technology. Journal of Applied Genetics, 56(3), 381-391.
Nishita Vaishnav, Suvagiya Pratiksha & Georrge John J. (2016). Modeling mutations, docking, primer and probe designing of Cytochrome P450 2D6, a drug metabolizing enzyme. Paper presented at the Proceedings of 9th National Level Science Symposium on Recent Trends in Science and Technology (ISBN: 9788192952123).
Panjaworayan Nattanan, Roessner Stephan K, Firth Andrew E & Brown Chris M. (2007). HBVRegDB: annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences. Virology journal, 4(1), 136.
Peng Yu, Leung Henry CM, Yiu Siu-Ming & Chin Francis YL. (2012). IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28(11), 1420-1428.
Pond Sergei L Kosakovsky, Posada David, Stawiski Eric, Chappey Colombe, Poon Art FY, Hughes Gareth, . . . Frost Simon DW. (2009). An evolutionary model-based algorithm for accurate phylogenetic breakpoint mapping and subtype prediction in HIV-1. PLoS computational biology, 5(11), e1000581.
Ranipa Avani, Shrilal Anju, Nimavat Akash, Rank Jalpa, Kothari Ramesh & Georrge John J. (2018). Aspergillus flavus-A menace to farmers. Paper presented at the Proceedings of 10th National Science Symposium on Recent Trends in Science and Technology (ISBN: 9788192952130).
Resch Wolfgang, Zaslavsky Leonid, Kiryutin Boris, Rozanov Michael, Bao Yiming & Tatusova Tatiana A. (2009). Virus variation resources at the National Center for Biotechnology Information: dengue virus. BMC microbiology, 9(1), 65.
Sakina S. Vakhariya & Georrge John J. (2016). Curcumin: A multi-tasking molecule. Paper presented at the Proceedings of 9th National Level Science Symposium on Recent Trends in Science and Technology (ISBN: 9788192952123).
Sharma Arun, Dutta Prasun, Sharma Maneesh, Rajput Neeraj Kumar, Dodiya Bhavna, Georrge John J, . . . Bhardwaj Anshu. (2014). BioPhytMol: a drug discovery community resource on anti-mycobacterial phytomolecules and plant extracts. Journal of cheminformatics, 6(1), 46.
Shaw Timothy I & Zhang Ming. (2013). HIV N-linked glycosylation site analyzer and its further usage in anchored alignment. Nucleic acids research, 41(W1), W454-W458.
Simpson Jared T, Wong Kim, Jackman Shaun D, Schein Jacqueline E, Jones Steven JM & Birol Inanç. (2009). ABySS: a parallel assembler for short read sequence data. Genome research, 19(6), 1117-1123.
Singer Joshua B, Thomson Emma C, McLauchlan John, Hughes Joseph & Gifford Robert J. (2018). GLUE: A flexible software system for virus sequence data. BMC bioinformatics, 19(1), 532.
Squires R Burke, Noronha Jyothi, Hunt Victoria, García‐Sastre Adolfo, Macken Catherine, Baumgarth Nicole, . . . Larsen Christopher N. (2012). Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and other respiratory viruses, 6(6), 404-416.
Trivedi Gauravi & Georrge John J. (2016a). Bacteriocin producing bacteria from gut of Apis mellifera. Paper presented at the Proceedings of 9th National Level Science Symposium on Recent Trends in Science and Technology (ISBN: 9788192952123).
Trivedi Gauravi & Georrge John J. (2016b). Identification of novel drug targets and its Inhibitors from essential genes of human pathogenic Gram positive bacteria. Paper presented at the Proceedings of 9th National Level Science Symposium on Recent Trends in Science and Technology (ISBN: 9788192952123).
Ukani Hetal, Purohit Megha K, Georrge John J, Paul Sneha & Singh Satya P. (2011). HaloBase. Development of database system for halophilic bacteria and archaea with respect to proteomics, genomics and other molecular traits. J Sci Ind Res, 70, 976-981.
Vaidya Atman, Nair Varun S., Georrge John J. & P Singh S. (2018). Comparative Analysis of Thermophilic Proteases. Research Journal of Life Sciences, Bioinformatics, Pharmaceutical and Chemical Sciences, 04(06), 65-91. doi:10.26479/2018.0406.06
Van Doorslaer Koenraad, Li Zhiwen, Xirasagar Sandhya, Maes Piet, Kaminsky David, Liou David, . . . McBride Alison A. (2016). The Papillomavirus Episteme: a major update to the papillomavirus sequence database. Nucleic acids research, 45(D1), D499-D506.
Will Sebastian, Reiche Kristin, Hofacker Ivo L, Stadler Peter F & Backofen Rolf. (2007). Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. PLoS computational biology, 3(4), e65.

Yang Seok, Lee Joo-Yeon, Lee Joon Seung, Mitchell Wayne P, Oh Hee-Bok, Kang Chun & Kim Kyung Hyun. (2008). Influenza sequence and epitope database. Nucleic acids research, 37(suppl_1), D423-D430.
Yang Xiao, Charlebois Patrick, Gnerre Sante, Coole Matthew G, Lennon Niall J, Levin Joshua Z, . . . Henn Matthew R. (2012). De novo assembly of highly diverse viral populations. BMC genomics, 13(1), 475.
Yu Qing, Ryan Elizabeth M, Allen Todd M, Birren Bruce W, Henn Matthew R & Lennon Niall J. (2010). PriSM: a primer selection and matching tool for amplification and sequencing of viral genomes. Bioinformatics, 27(2), 266-267.
Yuen Lilly KW, Ayres Anna, Littlejohn Margaret, Colledge Danielle, Edgely Andrew, Maskill William J, . . . Bartholomeusz Angeline. (2007). SeqHepB: a sequence analysis program and relational database system for chronic hepatitis B. Antiviral research, 75(1), 64-74.
Zazzi Maurizio, Incardona Francesca, Rosen-Zvi Michal, Prosperi Mattia, Lengauer Thomas, Altmann Andre, . . . Kaiser Rolf. (2012). Predicting response to antiretroviral treatment by machine learning: the EuResist project. Intervirology, 55(2), 123-127.
Zerbino Daniel R & Birney Ewan. (2008). Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome research, 18(5), 821-829.
Zuker Michael. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic acids research, 31(13), 3406-3415.

How to cite this Book Chapter?

APA Style
Angelin George and John J. Georrge (2019). Viroinformatics: Databases and Tools. Recent Trends in Science and Technology-2019 (pp. 117-126). ISBN: 9788192952147. Rajkot, Gujarat, India: Christ Publications.

MLA Style
Angelin George and John J. Georrge. “Viroinformatics: Databases and Tools”. Recent Trends in Science and Technology-2019 (ISBN: 9788192952147). Rajkot, Gujarat, India: Christ Publications, 2019. pp. 117-126.

Chicago Style
Angelin George and John J. Georrge. “Viroinformatics: Databases and Tools”. In Recent Trends in Science and Technology-2019 (ISBN: 9788192952147), pp. 117-126. Rajkot, Gujarat, India: Christ Publications, 2019.
