Bioinformatics databases and applications
Eitan Rubin, December 2002
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Outline
• Introduction • A day in the life of a biologist • Major databases • Major tools
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Outline
• Introduction • A day in the life of a biologist • Major databases • Major tools
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Life as a simple CS problem
Input1
Algorithm Output
Input2
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services A more realistic view
Input1 Algorithm2
Algorithm1 Output decision
Input2 Algorithm3
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services A typical real-life view
Input1 Algorithm2
Algorithm1 Output decision
Input2 Algorithm3
Input1 Input1 Algorithm2 Algorithm2
Algorithm1 Output decision Algorithm1 Output decision
Input2 Algorithm3 Input2 Algorithm3
Input1 Input1 Algorithm2 Algorithm2
Algorithm1 Output decision Algorithm1 Output decision
Input2 Algorithm3 Input2 Algorithm3
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services The life cycle of a bioinformatics project • Clearly define the goals • Define a strategy • Run the process • QA & optimize – Controls – External knowledge – Re-sampling – Correlation
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Outline
• Introduction • A day in the life of a biologist • Major databases • Major tools
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Positional cloning of disease X
XM-417-L16 XM-417-L15
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Genome browser @ UCSC Looking at the region of interest
chrX:98100000-98500000
Gene prediction program suggest there are 6-8 genes in the region
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Get mRNA @ NCBI
>unkown_protein MRLTEKSEGEQQLKPNNSNAPNEDQEEEIQQSEQHTPARQRTQRADTQPSRCRLPSR RTPTTSSDRTINLLEVLPWPTEWIFNPYRLPALFELYPEFLLVFKEAFHDISHCLKA Bioinformatics & Biological Computing Unit QMEKIGLPIILHLFALSTLYFYKFFLPTILSLSFFILLVLLLFIIVFILIFFEitan Rubin Department of Biological Services BLAST @ NCBI
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Search for domains @Interpro
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Search for domains @Interpro
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Get predicted protein @ UCSC
>naharu.b MSSRKQGSQPRGQQSAEEENFKKPTRSNMQRSKMRGASSGKKTAGPQQKN LEPALPGRWGGRSAENPPSGSVRKTRKNKQKTPGNGDGGSTSEAPQPPRK KRARADPTVESEEAFKNRMEVKVKIPEELKPWLVEDWDLVTRQKQLFQLP AKKNVDAILEEYANCKKSQGNVDNKEYAVNEVVAGIKEYFNVMLGTQLLY KFERPQYAEILLAHPDAPMSQVYGAPHLLRLFVRIGAMLAYTPLDEKSLA LLLGYLHDFLKYLAKNSASLFTASDYKVASAEYHRKAL Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Outline
• Introduction • A day in the life of a biologist • Major databases • Major tools
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services BIND; MINT; BRITE …
PDB
HSSP
Swissprot ; interpro; LAMA; GO
StackDB; Gencarta; Ensembl AAAAA
???
EPD INSD (genbank, EMBL, DDJB) Specialized databases: Flybase, YPD, UCSC, Bioinformatics & Biological Computing Unit TAIR Eitan Rubin Department of Biological Services INSD
• Genbank, EMBL, DDJB • CleanBank • Divisions (EST, HTG) • Specialized databases
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Major tools
• Transcript modelling from ESTs – Sequencher, Staden, StackPACK • Database searching – Blast – BLAT – Fasta • Multiple Sequence Alignment – ClustalX – MACAW Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Major tools
• Gene prediction • (EST) assembly • Promoter Finding • ORF identification • Similarity searching • MSA • Phylogenetic analysis • Structure prediction • Docking
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services ClustalX
• Stepwise tree-guided alignment • “Bag full of tricks” • Demo
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services The effect of parameters Default parameters
Modified parameters
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services The effect of parameters
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Major tools
• Gene prediction • (EST) assembly • Promoter Finding • ORF identification • Similarity searching • MSA • Phylogenetic analysis • Structure prediction • Docking
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services Similarity searching
• SW (accelerated) • BLAST + The NCBI environment, Fast, wide dynamic range, availability - DNA very bad stats, poor for proteins ? Highly local FASTA • BLAT + Lightening fast, focused - Limited dynamic range
Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services MSA
• ClustalX + Fast; familiar - Global; One, not very accurate algorithm • Macaw + Very interactive; outstanding GUI; multiple algorithms - Immature; runs on PCs; incompatible • BLOCKS maker + Fully automated; fast - Poor control; many mistakes Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services