Bioinformatics Databases and Applications

Eitan Rubin, December 2002
Bioinformatics & Biological Computing Unit, Department of Biological Services

Outline
• Introduction
• A day in the life of a biologist
• Major databases
• Major tools

Life as a simple CS problem
[Diagram: Input1 and Input2 feed a single Algorithm, which produces Output]

A more realistic view
[Diagram: Input1, Input2, Algorithm1, Algorithm2, Algorithm3, a decision step, Output]

A typical real-life view
[Diagram: the previous flow repeated many times over]

The life cycle of a bioinformatics project
• Clearly define the goals
• Define a strategy
• Run the process
• QA & optimize
  – Controls
  – External knowledge
  – Re-sampling
  – Correlation

Positional cloning of disease X
[Figure labels: XM-417-L16, XM-417-L15]

Genome browser @ UCSC
Looking at the region of interest: chrX:98100000-98500000
Gene prediction programs suggest there are 6-8 genes in the region

Get mRNA @ NCBI
>unkown_protein
MRLTEKSEGEQQLKPNNSNAPNEDQEEEIQQSEQHTPARQRTQRADTQPSRCRLPSR
RTPTTSSDRTINLLEVLPWPTEWIFNPYRLPALFELYPEFLLVFKEAFHDISHCLKA
QMEKIGLPIILHLFALSTLYFYKFFLPTILSLSFFILLVLLLFIIVFILIFF

BLAST @ NCBI

Search for domains @ InterPro

Get predicted protein @ UCSC
>naharu.b
MSSRKQGSQPRGQQSAEEENFKKPTRSNMQRSKMRGASSGKKTAGPQQKN
LEPALPGRWGGRSAENPPSGSVRKTRKNKQKTPGNGDGGSTSEAPQPPRK
KRARADPTVESEEAFKNRMEVKVKIPEELKPWLVEDWDLVTRQKQLFQLP
AKKNVDAILEEYANCKKSQGNVDNKEYAVNEVVAGIKEYFNVMLGTQLLY
KFERPQYAEILLAHPDAPMSQVYGAPHLLRLFVRIGAMLAYTPLDEKSLA
LLLGYLHDFLKYLAKNSASLFTASDYKVASAEYHRKAL

Major databases
[Diagram; labels in extracted order]
• BIND; MINT; BRITE …
• PDB; HSSP
• Swiss-Prot; InterPro; LAMA; GO
• StackDB; Gencarta; Ensembl
• AAAAA ???
• EPD
• INSD (GenBank, EMBL, DDBJ)
• Specialized databases: FlyBase, YPD, UCSC, TAIR

INSD
• GenBank, EMBL, DDBJ
• CleanBank
• Divisions (EST, HTG)
• Specialized databases

Major tools
• Transcript modelling from ESTs
  – Sequencher, Staden, StackPACK
• Database searching
  – BLAST
  – BLAT
  – FASTA
• Multiple Sequence Alignment
  – ClustalX
  – MACAW

Major tools
• Gene prediction
• (EST) assembly
• Promoter finding
• ORF identification
• Similarity searching
• MSA
• Phylogenetic analysis
• Structure prediction
• Docking

ClustalX
• Stepwise tree-guided alignment
• “Bag full of tricks”
• Demo

The effect of parameters
[Figure: alignments with default parameters and with modified parameters]

Similarity searching
• SW (accelerated)
• BLAST
  + The NCBI environment; fast; wide dynamic range; availability
  - DNA very bad stats, poor for proteins
  ? Highly local
• FASTA
• BLAT
  + Lightning fast; focused
  - Limited dynamic range

MSA
• ClustalX
  + Fast; familiar
  - Global; one, not very accurate, algorithm
• MACAW
  + Very interactive; outstanding GUI; multiple algorithms
  - Immature; runs on PCs; incompatible
• BLOCKS maker
  + Fully automated; fast
  - Poor control; many mistakes
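
The similarity-searching slide lists SW, presumably Smith-Waterman, as the reference method that BLAST, FASTA and BLAT approximate for speed. The slides do not show the algorithm, so the following is only a minimal Python sketch of its scoring recurrence. The function name and the match, mismatch and gap values are illustrative assumptions, not taken from the talk; production tools typically use substitution matrices and affine gap penalties instead.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score of a and b (assumed linear gap penalty)."""
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1].
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # Flooring at 0 lets an alignment restart anywhere,
            # which is what makes the alignment local.
            curr[j] = max(0,
                          prev[j - 1] + pair,  # align the two residues
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))  # 8: four matches at +2 each
    print(smith_waterman_score("AAAA", "CCCC"))  # 0: nothing aligns
```

The sketch fills the full matrix, so its cost grows with the product of the two sequence lengths. That cost is why heuristic tools such as BLAST and BLAT are preferred for database-scale searches.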
