Bioinformatics and applications

Eitan Rubin, December 2002

Bioinformatics & Biological Computing Unit Eitan Rubin Department of Biological Services

Outline

• Introduction
• A day in the life of a biologist
• Major databases
• Major tools

Life as a simple CS problem

[Diagram: Input1 and Input2 feed an Algorithm, which produces an Output]

A more realistic view

[Diagram: Input1, Input2, Algorithm1, Algorithm2, Algorithm3, Output, decision]
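One way to read the boxes named on this slide, sketched in Python. The wiring is an assumption: Algorithm1 combines the two inputs, and a decision on its output selects Algorithm2 or Algorithm3. The function bodies and the threshold are hypothetical placeholders; the slide only names the boxes.

```python
def algorithm1(input1: str, input2: str) -> float:
    """Hypothetical first step: fraction of identical positions."""
    pairs = list(zip(input1, input2))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def algorithm2(score: float) -> str:
    """Hypothetical follow-up when the inputs look similar."""
    return f"similar (identity {score:.2f}): continue with a detailed comparison"


def algorithm3(score: float) -> str:
    """Hypothetical follow-up when the inputs look dissimilar."""
    return f"dissimilar (identity {score:.2f}): try a different strategy"


def run_pipeline(input1: str, input2: str, threshold: float = 0.5) -> str:
    # The output of Algorithm1 drives the decision about what to run next.
    score = algorithm1(input1, input2)
    next_step = algorithm2 if score >= threshold else algorithm3
    return next_step(score)


print(run_pipeline("ACGTACGT", "ACGTACGA"))
print(run_pipeline("ACGTACGT", "TTTTTTTT"))
```

The point of the sketch is only that the second algorithm is chosen at run time from an intermediate result, rather than fixed in advance.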

A typical real-life view

[Diagram: the same flow (Input1, Input2, Algorithm1, Algorithm2, Algorithm3, Output, decision) repeated five times]

The life cycle of a bioinformatics project

• Clearly define the goals
• Define a strategy
• Run the
• QA & optimize
  – Controls
  – External
  – Re-
  – Correlation

Positional cloning of disease X

XM-417-L16 XM-417-L15

Genome browser @ UCSC

Looking at the region of interest

chrX:98100000-98500000

Gene prediction programs suggest there are 6-8 genes in the region
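A region string such as chrX:98100000-98500000 can be split into a chromosome name and two coordinates before it is passed to other tools. The following is a minimal sketch; `Region` and `parse_region` are hypothetical helper names, and treating both coordinates as inclusive is an assumption about the convention in use.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumes both coordinates are inclusive.
        return self.end - self.start + 1


def parse_region(text: str) -> Region:
    """Parse 'chrom:start-end', tolerating thousands separators."""
    cleaned = text.replace(",", "").strip()
    match = re.fullmatch(r"(chr\w+):(\d+)-(\d+)", cleaned)
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return Region(chrom, start, end)


region = parse_region("chrX:98100000-98500000")
print(region.chrom, region.start, region.end, region.length)
```

Under the inclusive assumption the region of interest spans roughly 400 kb, which gives a sense of scale for the 6-8 predictions.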

Get mRNA @ NCBI

>unknown_protein
MRLTEKSEGEQQLKPNNSNAPNEDQEEEIQQSEQHTPARQRTQRADTQPSRCRLPSR
RTPTTSSDRTINLLEVLPWPTEWIFNPYRLPALFELYPEFLLVFKEAFHDISHCLKA
QMEKIGLPIILHLFALSTLYFYKFFLPTILSLSFFILLVLLLFIIVFILIFF

BLAST @ NCBI
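A query like the one above is in FASTA format: a header line starting with ">" followed by one or more sequence lines. Before submitting such a record to a search it can be useful to read it programmatically. This is a minimal sketch; `read_fasta` is a hypothetical helper, and the example record uses a made-up name with the first 40 residues shown on the slide.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record name (first word of the header) to its sequence."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            records[name].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {key: "".join(parts) for key, parts in records.items()}


example = """>example_protein
MRLTEKSEGEQQLKPNNSNA
PNEDQEEEIQQSEQHTPARQ
"""
sequences = read_fasta(example.splitlines())
for record_name, sequence in sequences.items():
    print(record_name, len(sequence), sequence)
```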

Search for domains @ InterPro

Get predicted @ UCSC

>naharu.b
MSSRKQGSQPRGQQSAEEENFKKPTRSNMQRSKMRGASSGKKTAGPQQKN
LEPALPGRWGGRSAENPPSGSVRKTRKNKQKTPGNGDGGSTSEAPQPPRK
KRARADPTVESEEAFKNRMEVKVKIPEELKPWLVEDWDLVTRQKQLFQLP
AKKNVDAILEEYANCKKSQGNVDNKEYAVNEVVAGIKEYFNVMLGTQLLY
KFERPQYAEILLAHPDAPMSQVYGAPHLLRLFVRIGAMLAYTPLDEKSLA
LLLGYLHDFLKYLAKNSASLFTASDYKVASAEYHRKAL

BIND; MINT; BRITE …

PDB

HSSP

Swissprot ; ; LAMA; GO

StackDB; Gencarta; Ensembl AAAAA

???

EPD

INSD (GenBank, EMBL, DDBJ)

Specialized databases: Flybase, YPD, UCSC, TAIR

INSD

• GenBank, EMBL, DDBJ
• CleanBank
• Divisions (EST, HTG)
• Specialized databases

Major tools

• Transcript modelling from ESTs
  – Sequencher, Staden, StackPACK
• Similarity searching
  – BLAST
  – BLAT
  – FASTA
• Multiple sequence alignment
  – ClustalX
  – MACAW

Major tools

• prediction
• (EST) assembly
• Promoter finding
• ORF identification
• Similarity searching
• MSA
• Phylogenetic analysis
• Structure prediction
• Docking

ClustalX

• Stepwise tree-guided alignment
• “Bag full of tricks”
• Demo

The effect of parameters

[Figure panels: Default parameters; Modified parameters]
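To see why alignment parameters matter, here is a small sketch of pairwise global alignment (Needleman-Wunsch) in which the gap penalty can be changed. It is not ClustalX itself: ClustalX aligns many sequences and uses separate gap opening and extension penalties, whereas this sketch assumes a single linear gap penalty. The scoring values and the two test sequences are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, cols):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(top)), "".join(reversed(bottom))


for label, gap in [("expensive gaps", -3), ("cheap gaps", -1)]:
    total, row_a, row_b = global_align("GATTACA", "GCATGCT", gap=gap)
    print(label, total)
    print(row_a)
    print(row_b)
```

Running the same pair of sequences with different gap penalties can yield different alignments, which is the reason to inspect results under more than one parameter set.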

Major tools

• (EST) assembly
• Promoter finding
• ORF identification
• Similarity searching
• MSA
• Phylogenetic analysis
• Structure prediction
• Docking
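As an illustration of one item in this list, ORF identification, the following is a naive sketch that scans the three forward reading frames for an ATG followed by an in-frame stop codon. It assumes the standard stop codons, ignores the reverse strand, and uses a hypothetical `min_codons` cut-off; real ORF finders are considerably more careful.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based, end is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                # Count coding codons between the start and the stop codon.
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None
    return orfs


sequence = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(sequence):
    print(frame, start, end, sequence[start:end])
```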

Similarity searching

• SW (accelerated)
• BLAST
  + The NCBI environment, fast, wide dynamic range, availability
  - DNA very bad stats, poor for ? Highly local FASTA
• BLAT
  + Lightning fast, focused
  - Limited dynamic range
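Assuming "SW" here stands for Smith-Waterman local alignment, the exhaustive method that the faster heuristic tools approximate, its core can be sketched as a small dynamic program. This version computes only the best local score, uses a linear gap penalty, and has no acceleration at all; the scoring values are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below zero: a local alignment may start anywhere.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(smith_waterman_score("TTTACGTTT", "GGGACGTGGG"))
print(smith_waterman_score("AAAA", "CCCC"))
```

The running time grows with the product of the two sequence lengths, which is why database-scale searches rely on accelerated implementations or on heuristics.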

MSA

• ClustalX
  + Fast; familiar
  - Global; One, not very accurate
• MACAW
  + Very interactive; outstanding GUI; multiple
  - Immature; runs on PCs; incompatible
• BLOCKS maker
  + Fully automated; fast
  - Poor control; many mistakes