Online Bioinformatics Resources

Dedan Githae, BecA-ILRI Hub, Nairobi, Kenya http://hub.africabiosciences.org/ http://www.Ilri.org/

Overview

Bioinformatics involves “using computers to store, retrieve, analyse..”

Resources available online:

• Information: scientific articles and literature (journals and biomedical search engines)

• Databases: primary (DNA), genome, , RNA, protein-protein interaction, metabolic pathway and other databases

• Tools / software: to manipulate and analyse biological data

Information: Scientific articles and literature

• Journal websites and e-journals (open-access / subscription)

Every major journal publishes its articles and abstracts on its website.

Information: searching for biological literature

• Biological literature “search engines”

A literature search engine lets you search for scientific articles across multiple databases. Examples include:

• NCBI PubMed (1966–present): covers not only clinical medicine but also many areas of biology, including biochemistry, molecular biology, genetics, physiology, microbiology and immunology.

• Web of Science: an online, subscription-based scientific citation indexing service maintained by Thomson Reuters.

• Scopus: a comprehensive, interdisciplinary database covering all of the sciences.

• Google Scholar: searches across a wide range of academic literature, drawing on information from journal publishers, university repositories and other websites identified as “scholarly”.

• AGRICOLA (National Agricultural Library): provides citations to the agricultural literature.
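PubMed can also be searched programmatically through NCBI's E-utilities. The sketch below (Python, standard library only) builds an ESearch request and reads the PMIDs out of its JSON reply. The function names are hypothetical helpers, not part of any NCBI library, and the network call is left out so the parsing can be checked offline.

```python
import json
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (searches an Entrez database such as PubMed).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(query: str, retmax: int = 20) -> str:
    """Build an ESearch URL that asks PubMed for matching PMIDs as JSON."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pubmed_ids(response_text: str) -> tuple[int, list[str]]:
    """Return (total number of hits, PMIDs on this page) from an ESearch JSON reply."""
    result = json.loads(response_text)["esearchresult"]
    return int(result["count"]), list(result["idlist"])
```

For example, opening `build_pubmed_search_url("malaria vaccine", retmax=5)` with `urllib.request.urlopen` and passing the decoded body to `parse_pubmed_ids` should give the total hit count and up to five PMIDs. Without an API key, NCBI asks clients to stay under three requests per second.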

Biological databases

Databases are repositories of biological data, both primary (experimentally derived) and secondary (derived by analysing primary data).

• Sequence databases

• Primary databases (DNA): consist of experimentally derived data, e.g. nucleotide sequences and 3D structures. Three primary nucleotide databases cooperate, as the International Nucleotide Sequence Database Collaboration (INSDC), to make submitted sequences publicly available:

• GenBank (USA): http://www.ncbi.nlm.nih.gov/nucleotide/

• European Nucleotide Archive (Europe): http://www.ebi.ac.uk/ena

• DNA Data Bank of Japan (DDBJ): http://www.ddbj.nig.ac.jp/

• Secondary databases: store data derived from the analysis or treatment of primary data, such as secondary structures, hydrophobicity plots and domains.
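A hydrophobicity plot is a good example of secondary data computed from a primary sequence. A minimal sketch, assuming the Kyte-Doolittle hydropathy scale and a sliding-window average; the 9-residue default window and the function name are illustrative choices, not taken from this material.

```python
# Kyte-Doolittle hydropathy scale: positive = hydrophobic, negative = hydrophilic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Average hydropathy over each sliding window of `window` residues.

    Element k of the result is the mean for residues k .. k+window-1
    (0-based), so plotting it against position gives a hydrophobicity plot.
    """
    seq = sequence.upper()
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    # Raises KeyError for non-standard letters such as X or B.
    values = [KYTE_DOOLITTLE[residue] for residue in seq]
    return [sum(values[k:k + window]) / window for k in range(len(seq) - window + 1)]
```

Long runs of high values in the returned profile are the usual hint of membrane-spanning segments.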

Biological databases: Protein

• UniProt (Universal Protein Resource): a database of protein sequences and functional information.

• Swiss-Prot (548,208 entries): manually annotated and reviewed

• TrEMBL (46,714,516 entries): automatically annotated and not reviewed

• PIR: Protein Information Resource

http://www.uniprot.org/ http://pir.georgetown.edu/
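Individual UniProt entries can be downloaded in FASTA format. This sketch assumes the current UniProt REST endpoint (`https://rest.uniprot.org/uniprotkb/<accession>.fasta`), which differs from the www.uniprot.org address above. The FASTA parser is a generic, hypothetical helper that should also work for FASTA files downloaded from GenBank or ENA.

```python
from urllib.parse import quote

# UniProt REST API base for UniProtKB entries.
UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def uniprot_fasta_url(accession: str) -> str:
    """URL returning a single UniProtKB entry in FASTA format."""
    return f"{UNIPROT_REST}/{quote(accession)}.fasta"


def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its sequence, joining wrapped lines."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

For instance, decoding `urllib.request.urlopen(uniprot_fasta_url("P69905")).read()` and passing it to `parse_fasta` should return one record, human haemoglobin subunit alpha. In this simple dictionary-based design, duplicate headers overwrite each other.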


• InterPro: provides functional analysis of proteins by classifying them into families and predicting domains and important sites. http://www.ebi.ac.uk/interpro/

• Pfam: protein families database of alignments and hidden Markov models (HMMs). http://pfam.xfam.org/

• PRINTS: protein “fingerprints”, i.e. groups of conserved motifs used to characterise a protein family

http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php

• ProDom: database of protein domain families: http://prodom.prabi.fr/prodom/current/html/home.php

• SignalP: predicts the presence of signal peptides and the location of their cleavage sites. http://www.cbs.dtu.dk/services/SignalP/

Metabolic pathways

BRENDA: comprehensive enzyme information

http://www.brenda-enzymes.org/

KEGG pathway database (Kyoto Encyclopedia of Genes and Genomes)

http://www.genome.jp/kegg/

Reactome

http://www.reactome.org/
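KEGG pathways can also be listed through the KEGG REST API. This is a sketch that assumes the `https://rest.kegg.jp/list/pathway/<organism>` operation and its tab-separated `ID<TAB>name` output. Some releases prefix IDs with `path:`, so the hypothetical parser strips that prefix if it is present. Check KEGG's terms of use before bulk downloads.

```python
# KEGG REST API base URL; "list/pathway/<org>" lists one organism's pathways.
KEGG_REST = "https://rest.kegg.jp"


def kegg_pathway_list_url(organism_code: str) -> str:
    """URL listing the pathways for a KEGG organism code such as 'hsa' (human)."""
    return f"{KEGG_REST}/list/pathway/{organism_code}"


def parse_kegg_list(text: str) -> dict[str, str]:
    """Parse KEGG's tab-separated 'ID<TAB>name' lines into a dict.

    An optional 'path:' prefix on the ID is dropped, so IDs read like 'hsa00010'.
    """
    pathways: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        entry_id, _, name = line.partition("\t")
        pathways[entry_id.strip().removeprefix("path:")] = name.strip()
    return pathways
```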

RNA databases

miRBase: microRNA database www.mirbase.org/

snoRNAdb: snoRNA database www.lowelab.ucsc.edu/snoRNAdb/

Rfam: database of RNA families http://rfam.xfam.org/

SILVA: database of ribosomal RNA www.arb-silva.de/

Biological Databases: Genome

Genome databases: examples include WormBase, Flu DB, the Saccharomyces Genome Database, VectorBase, FlyBase and ZFIN (zebrafish).

• Multi-genome: Ensembl provides genome databases for vertebrates and other eukaryotic species.

www.yeastgenome.org/ www.fudb.org www.zfin.org www.wormbase.org/ http://vectorbase.org/ http://www.metacyc.org/ KEGG: http://www.genome.jp http://web.expasy.org/pathways/ http://www.molecular-networks.com/databases/biopath http://www.ncbi.nlm.nih.gov/geo/ http://www.ebi.ac.uk/gxa http://world-2dpage.expasy.org/swiss-2dpage/

Biological Databases: Others

• Biochemical pathway databases: biological activities are orchestrated by various interacting molecules. These databases include KEGG, ExPASy, MetaCyc and BioPath.

• Gene expression data: DNA microarrays are arrays of probe molecules that bind specific DNA / mRNA sequences. Fluorescent labelling makes it possible to measure the expression level of each gene, e.g. NCBI GEO (Gene Expression Omnibus).

• 2D PAGE (two-dimensional polyacrylamide gel electrophoresis): allows quantitative study of protein concentrations in the cell.
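On a two-colour microarray, the ratio of the two fluorescent signals at each spot shows how a gene's expression changes between samples. This is a minimal sketch. It assumes background-corrected, normalised intensities, with the test sample in the red channel and the reference in green, and it uses a two-fold cut-off (|log2 ratio| ≥ 1). The gene names and numbers are hypothetical, and real analyses also apply replicate-based statistics.

```python
import math


def log2_ratios(red: dict[str, float], green: dict[str, float], pseudocount: float = 1.0) -> dict[str, float]:
    """Per-gene log2(red / green); the pseudocount avoids dividing by, or taking the log of, zero."""
    return {
        gene: math.log2((red[gene] + pseudocount) / (green[gene] + pseudocount))
        for gene in red
    }


def differentially_expressed(ratios: dict[str, float], threshold: float = 1.0) -> list[str]:
    """Genes whose |log2 ratio| >= threshold (1.0 means at least a two-fold change)."""
    return sorted(gene for gene, r in ratios.items() if abs(r) >= threshold)
```

Working in log2 makes up- and down-regulation symmetric: a four-fold increase is +2 and a four-fold decrease is -2.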

Biological Databases: Protein structure

• Protein Data Bank (PDB): experimentally determined protein structures, e.g. from X-ray crystallography and NMR.

• ModBase: a database of annotated comparative protein structure models (modelled proteins).

• SCOP (Structural Classification of Proteins): classifies structures into classes such as all-α, all-β, α+β, membrane and cell surface proteins, small proteins and coiled-coil proteins.

• CATH: a hierarchical domain classification of protein structures into Class, Architecture, Topology and Homologous superfamily levels.

www.pdb.org https://modbase.compbio.ucsf.edu/modweb/
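PDB entries are distributed as fixed-column text files, so simple structural measurements need only a few lines of code. This sketch assumes the legacy PDB format (not mmCIF) and considers only C-alpha atoms. The function names are illustrative.

```python
import math


def parse_ca_coordinates(pdb_text: str) -> dict[tuple[str, int], tuple[float, float, float]]:
    """Map (chain ID, residue number) -> (x, y, z) for C-alpha atoms in PDB-format text.

    Uses the fixed PDB columns: atom name 13-16, chain 22, residue number 23-26,
    x 31-38, y 39-46, z 47-54 (1-based, so the Python slices are shifted by one).
    Only ATOM records are read, so a calcium ion (HETATM ... CA) is ignored.
    Insertion codes are ignored, and if alternate locations exist the last one wins.
    """
    coords = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def ca_distance(coords, res_a, res_b) -> float:
    """Euclidean distance (in angstroms) between two residues' C-alpha atoms."""
    return math.dist(coords[res_a], coords[res_b])
```

Consecutive C-alpha atoms in a protein chain are roughly 3.8 Å apart, which provides a quick sanity check on parsed coordinates.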

Software / Tools as resources

• Journals such as Bioinformatics, Nucleic Acids Research, Journal of Molecular Biology and Protein Science publish papers on cutting-edge developments and innovations in computational biology methods.

• Most biological databases have software resource listings, e.g. sequence searching and visualisation resources (genome / alignment / genome level).

• Web servers: “simple” web implementations of the software, with clear inputs, outputs and parameters, graphical data representation and downloadable results, e.g. www.ebi.ac.uk

Protein Domains

A protein domain is a conserved part of a given protein sequence and structure that can evolve, function, and exist independently of the rest of the protein chain.

Molecular evolution uses domains as building blocks and these may be recombined in different arrangements to create proteins with different functions.

Databases include: SMART; PROSITE; NCBI; CATH

[Figure: Glycinamide ribonucleotide synthetase (GAR-SYN) from E. coli]

http://prosite.expasy.org/ http://smart.embl-heidelberg.de/

Motifs

A motif is a locally conserved region, or short sequence pattern, shared by a set of sequences (easily observed in a multiple sequence alignment); it can therefore indicate functional or structural similarity.

Motifs can be displayed as sequence logos or as patterns of amino acids.
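A motif written as a pattern of amino acids can be searched with ordinary regular expressions. In PROSITE syntax, elements are separated by `-`. `x` means any residue, `[ST]` means S or T, `{P}` means anything but P, `x(2,4)` is a repeat count, and `<` or `>` anchor the pattern to the N- or C-terminus. The N-glycosylation site, for example, is written `N-{P}-[ST]-{P}`. The translator below is a hypothetical helper that covers only those elements; it does not handle, for example, `>` inside brackets.

```python
import re

# One PROSITE element: a residue, 'x', [..] or {..}, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Z]|x)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        start_anchor, end_anchor = element.startswith("<"), element.endswith(">")
        m = _ELEMENT.fullmatch(element.strip("<>"))
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."                       # x: any amino acid
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # {P}: anything except P
        else:
            regex = core                      # N or [ST]: same meaning in regex syntax
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start_anchor else "") + regex + ("$" if end_anchor else ""))
    return "".join(parts)


def find_motif(pattern: str, sequence: str) -> list[int]:
    """1-based start positions of every match, including overlapping ones."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    finder = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in finder.finditer(sequence.upper())]
```

The lookahead matters because motif occurrences can overlap. A plain `finditer` on the bare pattern would skip any match that starts inside a previous one.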

[Figure: Sequence logo]
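In a sequence logo, the height of each stack is the information content of that alignment column, and each letter's share of the stack is its frequency. This sketch assumes an ungapped alignment and uses the simple definition R = log2(N) - H, where N is 4 for DNA and 20 for proteins. It omits the small-sample correction that logo tools commonly apply.

```python
import math
from collections import Counter


def column_information(column: list[str], alphabet_size: int = 4) -> tuple[float, dict[str, float]]:
    """Information content (bits) of one alignment column and each symbol's letter height."""
    counts = Counter(column)
    total = sum(counts.values())
    # Shannon entropy H of the observed symbol frequencies in this column.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    info = math.log2(alphabet_size) - entropy  # R = log2(N) - H, no small-sample correction
    heights = {symbol: (c / total) * info for symbol, c in counts.items()}
    return info, heights


def logo_heights(aligned: list[str], alphabet_size: int = 4) -> list[tuple[float, dict[str, float]]]:
    """Column-by-column logo data for an ungapped alignment of equal-length sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [column_information([s[i] for s in aligned], alphabet_size) for i in range(length)]
```

A fully conserved DNA column scores the maximum of 2 bits. A column in which all four bases are equally frequent scores 0 and draws no stack.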

Patterns of amino acids [PROSITE]: for example, the N-glycosylation site motif takes the form N-{P}-[ST]-{P}, meaning: Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro.

Typical biological problems

• What is the role of a particular gene?

• Does a particular gene help cause a disease?

• How does a drug affect a cell?

• Can we insert a gene into corn to protect it against diseases or pests?

• Can we design a drug to accomplish a particular purpose?

• Can we build a cell that eats pollution?

…

Quick links

• Database search:

• BLAST:

• GenBank (NCBI): http://blast.ncbi.nlm.nih.gov/Blast.cgi

• EMBL-EBI: http://www.ebi.ac.uk/Tools/sss/ncbiblast/

• DDBJ: http://blast.ddbj.nig.ac.jp/blastn?lang=en

• HMMER:

• HMMER (DNA & PROT): http://hmmer.janelia.org/search/phmmer

• HHpred (PROT): http://toolkit.lmb.uni-muenchen.de/hhpred

• Sequence analysis suites:

• EBI suite: http://www.ebi.ac.uk/services

• WebLogo: http://weblogo.berkeley.edu/logo.cgi

• Mobyle @ Pasteur: http://mobyle.pasteur.fr/cgi-bin/portal.py?#welcome

Current status:

1. There is a vast amount of data available out there.
2. There is a vast number of tools out there.
3. Select the ones best suited to your project.

Sequence alignment: arranging sequences (DNA / RNA / proteins) to identify regions of similarity that may be a consequence of functional, structural or evolutionary relationships.
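The simplest way to arrange two sequences for comparison is global alignment by dynamic programming (the Needleman-Wunsch algorithm). The sketch below assumes a toy scoring scheme: match +1, mismatch -1 and a linear gap penalty of -1. Real tools use substitution matrices such as BLOSUM62 and affine gap penalties, and BLAST uses a heuristic local search rather than this exhaustive global method.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, aligned a, aligned b)."""
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                                  # gap in b
                score[i][j - 1] + gap,                                  # gap in a
            )

    # Trace back from the bottom-right cell, following any move that produced the score.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns as a percentage of alignment length (gap columns included)."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)
```

For example, aligning ACGT with AGT should give a score of 2 with AGT written as A-GT, which is 75% identity. Counting gap columns in the denominator is only one of several percent-identity conventions.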

• Databases

• Software / servers

• Information

• Scope and applications of

• Databases and sequence retrieval

• Sequence alignment: homology, similarity, identity; BLAST

• Pairwise vs. multiple sequence alignment

• Protein structure: primary, secondary and tertiary structure

• Genomics: , assembly, annotation, functional genomics