Bioinformatics

Bioinformatics 3 Dept. of Computational Biology & Bioinformatics Bioinformatics - play with sequences & structures Dept. of Computational Biology & 4 Bioinformatics ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS Dept. of Computational Biology & Bioinformatics 5 WHAT IS BIOINFORMATICS? Computational Biology/Bioinformatics is the application of computer sciences and allied technologies to answer the questions of Biologists, about the mysteries of life. It has evolved to serve as the bridge between: Observations (data) in diverse biologically-related disciplines and The derivations of understanding (information) APPLICATIONS OF BIOINFORMATICS Computer Aided Drug Design Microarray Bioinformatics Proteomics Genomics Biological Databases Phylogenetics Systems Biology Dept. of Computational Biology & Bioinformatics 6 WHAT IS A BIO-SEQUENCE? DNA, RNA or protein information represented as a series of bases (or amino acids) that appear in bio-molecules. The method by which a bio- sequence is obtained is called Bio-sequencing. AGTCTTGATTCTTCTAGTTCTGC GTCCTGATAAGTCAGTGTCTCC TGAGTCTAGCTTCTGTCCATGCT DNA/ RNA ARNDGCAQTCEAGTHGILTKCCMAFTPGUTTSTCWTYAZGETGCANTDTWRDC SEQUENCE FPUQEGGAHTAILGDTTCLGKASTTTMCTFAEGWTCGUTWCCETSGTHCFPW RDTCAETTDUASGTTCCWTTEGGAAHILTDCNTTDCTTEAGGHTTTWC UWWE SPUSTTPPGTUCCQAWTTRDACCTCCLAKTSCWTCGUAWTGGMFACQEDT WRWEGSTPAWGYTTZWATEGGCHGILADDTCFTPCTACTGTW GRT DSTTFP PROTEIN UEEDCCCCGDATAWCCTUAWTCCGHTIGSTADTATKKTAGSCUTTNENDCFE SEQUENCE GWCRAAGTHCPPTTHHCTLADGTTTWCQTEGSTRCCNDACTTQAETGHILKM FPUSTWYZEGNDTWRDCCCFAPTUCQTEGGTHCILDCLK STMFWHATEWCU WISE SEQUENCESTHCFPWR ALIGNMENT?DTCEDUSTTW E GHILDArrangingNDTE DNA/proteinGHTWUWW sequencesESP sideUS byT PPside UtoQ studyWR theD extent of their similarity CCLKSWCUWMFCQEDTWRWESPWYZWEG Dept. of Computational Biology & Bioinformatics 7 HILDDFPTCTWRDSTTFPUEEDCCDTWCUW GHISTDTKKSUNENDCFEGWCRGHPPHHLD TWQES sequencing CRISIS AFTER DATA EXPLOSION!! Dept. of Computational Biology & Bioinformatics 8 DATA EXPLOSION TREND BIOLOGICAL SOLUTION?? DATABASES Dept. of Computational Biology & Bioinformatics 9 BIOLOGICAL DATABASES Dept. of Computational10 Biology & Bioinformatics WHAT IS A DATABASE? A structured set of data held in a computer, esp. one that is accessible in various ways. Dept. of Computational Biology & Bioinformatics 11 POPULAR DATABASE WEBSITES Dept. of Computational Biology & Bioinformatics 12 BIOLOGICAL DATABASES Dept. of Computational Biology & Bioinformatics 13 CLASSIFICATION OF BIOLOGICAL DATABASES Based on data source Based on data type Dept. of Computational Biology & Bioinformatics 14 BASED ON DATA SOURCE Dept. of Computational Biology & Bioinformatics 15 BIOLOGICAL DATABASES PRIMARY DATABASES SECONDARY DATABASES • First-hand information of • Derived from information experimental data from gathered in primary scientists and researchers database • Data not edited or validated • Data is manually curated • Raw and original and annotated submission of data • Data of highest quality as it • Made available to public for is double checked annotation Dept. of Computational Biology & Bioinformatics 16 PRIMARY DATABASES Database Website 1. NCBI (National Centre for Biotechnology www.ncbi.nlm.nih.gov Information) 2. DDBJ (DNA Data Bank of Japan) www.ddbj.nig.ac.jp 3. EMBL(European Molecular Biology Laboratory) www.ebi.ac.uk/embl 4. PIR (Protein Information Resource) www.pir.georgetown.edu 5. PDB (Protein Data Bank) www.rcsb.org/pdb 6. NDB( Nucleotide Data Bank) www.ndbserver.rutgers.edu 7. SwissProt (Protein- only sequence database) www.expasy.ch SECONDARY DATABASES Database Website 1. PROSITE (Protein domains, families, functional www.expasy.org/prosite sites) 2. Pfam (Protein families) www.sanger.ac.uk/pfam 3. SCOP (Structural Classification Of Proteins) www.scop.mrc-lmb.cam.ac.uk/scop 4. CATH (Class, Architecture, Topology, Homologous www.cathdb.info Super Family of Proteins) 5. OMIM (Online Mendelian Inheritance in Man) www.ncbi.nlm.nih/omim 6. KEGG (Kyoto Encyclopedia of Genes and www.genome.jp/kegg/pathway.html Genome) 7. MetaCyc (Enzyme Metabolic Pathways) www.metacyc.org Dept. of Computational Biology & Bioinformatics 18 Based on type of data Dept. of Computational Biology & Bioinformatics 19 BASED ON THE TYPE OF DATA BIOLOGICAL DATABASES NUCLEOTIDE SEQUENCE DATABASE PROTEIN SEQUENCE DATABASE GENOME DATABASE GENE EXPRESSION DATABASE ENZYME DATABASE STRUCTURE DATABASE PROTEIN INTERACTION DATABASE PATHWAY DATABASE LITERATURE DATABASE Dept. of Computational Biology & Bioinformatics 20 NUCLEOTIDE SEQUENCE DATABASES Dept. of Computational21 Biology & Bioinformatics NCBI- National Centre for Biotechnology Information Dept. of Computational Biology & 22 Bioinformatics EMBL – European Molecular Biology Lab Dept. of Computational Biology & Bioinformatics 23 DDBJ- DNA DATA BANK OF JAPAN Dept. of Computational Biology & Bioinformatics 24 PROTEIN SEQUENCE DATABASE Dept. of Computational25 Biology & Bioinformatics Dept. of Computational Biology & 26 Bioinformatics PDB- PROTEIN DATA BANK Dept. of Computational Biology & Bioinformatics 27 PATHWAY DATABASES Dept. of Computational28 Biology & Bioinformatics KEGG- KYOTO ENCYCLOPEDIA OF GENES AND GENOMES Dept. of Computational Biology & Bioinformatics 29 GENOME DATABASE Dept. of Computational Biology & Bioinformatics 30 WORMBASE : has the entire genome of C. elegans and other nematodes Dept. of Computational Biology & Bioinformatics 31 GENE EXPRESSION DATABASE Dept. of Computational Biology & Bioinformatics 32 Dept. of Computational Biology & Bioinformatics 33 Microarrays provide a means to measure gene expression Dept. of Computational Biology & 34 Bioinformatics Yeast genome on a chip ENZYME DATABASE Dept. of Computational Biology & Bioinformatics 36 ENZYME DATABASE OF ExPaSy server Dept. of Computational Biology & Bioinformatics 37 STRUCTURE DATABASE Dept. of Computational Biology & Bioinformatics 38 Dept. of Computational Biology & Bioinformatics 39 LITERATURE DATABASE Dept. of Computational Biology & Bioinformatics 40 Dept. of Computational Biology & Bioinformatics 41 Use of Databases in Biology- Sequence Analysis Dept. of Computational Biology & 42 Bioinformatics Where do we get these sequences from? Through genome sequencing projects Dept. of Computational Biology & 43 Bioinformatics • Submit sequences to biological databases • Biological databases helps in efficient manipulation of large data sets • Provides improved search sensitivity, search efficiency • Joining of multiple data sets • Databases allows the users to analyse the biological data sets DNA RNA Proteins Dept. of Computational Biology & 44 Bioinformatics Analysis of Nucleic acids & Protein Sequences • Sequence Analysis Process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods To understand its features, function, structure, or evolution To assign function to genes & proteins by the studying the similarities between the compared sequences Methodologies include: Sequence alignment Searches against biological databases Dept. of Computational Biology & 45 Bioinformatics • Sequence analysis in molecular biology includes a very wide range of relevant topics: The comparison of sequences in order to find similarity, infer if they are related (homologous) Identification of active sites, gene structures, reading frames etc. Identification of sequence differences and variations – SNP, Point mutations, identify genetic markers Revealing the evolution and genetic diversity of sequences and organisms Identification of molecular structure from sequence alone Dept. of Computational Biology & 46 Bioinformatics Sequence Alignment Relationships between these sequences are usually discovered by aligning them together assigning a score to the alignments Two main types of sequence alignment: Pair-wise sequence alignment - compares only two sequences at a time Multiple sequence alignment- compares many sequences Two important algorithms for aligning pairs of sequences : Needleman-Wunsch algorithm Smith-Waterman algorithm Dept. of Computational Biology & 47 Bioinformatics • Popular tools for sequence alignment include: Pair-wise alignment - BLAST Multiple alignment - ClustalW, MUSCLE, MAFFT, T-Coffee etc. • Alignment methods: Local alignments - Needleman–Wunsch algorithm Global alignments - Smith-Waterman algorithm Dept. of Computational Biology & 48 Bioinformatics Pair-wise alignment • Used to find the best-matching piecewise (local or global) alignments of two query sequences • Can only be used between two sequences at a time Dept. of Computational Biology & 49 Bioinformatics Multiple Sequence Alignment • Is an extension of pairwise alignment to incorporate more than two sequences at a time • Align all of the sequences in a given query set • Often used in identifying conserved sequence regions across a group of sequences hypothesized to be evolutionarily related • Alignments helps to establish evolutionary relationships by constructing phylogenetic trees Dept. of Computational Biology & 50 Bioinformatics Sequence Analaysis Tools Pair-wise alignment - BLAST • Basic Local Alignment Search Tool (BLAST) • Developed by Research staff at NCBI/GenBank as a new way to perform seq. similarity search • Available as free service over internet • Very fast ,Accurate and sensitive database searching • Server-NCBI Dept. of Computational Biology & 51 Bioinformatics Types of BLAST Programs: Dept. of Computational Biology & 52 Bioinformatics NCBI -BLAST Dept. of Computational Biology & 53 Bioinformatics Dept. of Computational Biology & 54

Bioinformatics

Zebrafish Disease Models to Study the Pathogenesis of Inherited Manganese Transporter Defects and Provide A

Glycomics Goes Visual and Interactive

The ELIXIR Core Data Resources: Fundamental Infrastructure for The

A Semantic Standard for Describing the Location of Nucleotide and Protein Feature Annotation Jerven T

Pathogenicity and Selective Constraint on Variation Near Splice Sites

HTG Data Input: Annotation File Output in DDBJ Flat File

Viroinformatics Investigation of B-Cell Epitope Conserved Region in SARS

Nucleic Acid Databases on the Web Richard Peters and Robert S

Bioinformatics Exercises: Bovine Lactate Dehydrogenase (LDH)

Biological Databases

Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases

Plant Protein Annotation in the Uniprot Knowledgebase