
Introduction to Bioinformatics

Stephen Taylor

Computational Biology Research Group

Background

Definition
• Bioinformatics is the computational analysis and storage of biological data

Derivation
• bio – biology
• informatique – French for ‘data processing’

Goal
• To discover new biological insights using computers and biology

Other related disciplines

Computational Biology
• broader term, more an approach as generic as biology itself

Chemoinformatics
• study and analysis of chemical information

Medical Informatics
• study, invention and implementation of structures and algorithms to improve communication, understanding and management of medical information

Mathematical Biology
• more theoretical. Things which are not necessarily algorithmic, not necessarily molecular in nature, and are not necessarily useful in analyzing collected data!

What is bioinformatics?

[Diagram labels: Experiment, Analysis, Hypothesis, Data, Result; Sequence, Structure, Function, Evolution, Pathway, Interaction, Mutation, Expression]

Why use bioinformatics

Find an answer quickly
• Most in silico biology is faster than in vitro

Massive amounts of data to analyse
• Need to make use of all information
• Not possible to do analysis by hand
• Can’t organise and store information only using lab note books
• Automation is key

However!
• All results of computer analysis should be verified by biologists

Bioinformatics databases

• Public databases are the most important entity in bioinformatics
• Store knowledge about
  • Sequence e.g. EMBL
  • Structure e.g. PDB
  • Pathways e.g. KEGG
  • Interactions e.g. DIP
  • Diseases e.g. OMIM
  • And many others …
• Can be searched in a variety of ways e.g. keyword, pattern, sequence
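As a rough illustration of searching by pattern, the sketch below runs a regular expression over a tiny in-memory collection of DNA strings. The record names (`seq1` to `seq3`), the helper `search_by_pattern` and the pattern itself are made up for this example. Real public databases provide their own search interfaces, which are not shown here.

```python
import re

# Toy "database": the identifiers are hypothetical, the strings are short DNA fragments.
records = {
    "seq1": "AGTACGTAGTAGCTGCTGCTACGTGCG",
    "seq2": "TCACGACGTAGATGCTAGCTGACTCG",
    "seq3": "GCTGCCAGCGACTCAGCTACGACTAG",
}

def search_by_pattern(db, pattern):
    """Return {record id: [match start positions]} for records matching the regex."""
    regex = re.compile(pattern)
    hits = {}
    for name, seq in db.items():
        positions = [m.start() for m in regex.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits

# Pattern: "GATGC", then any single base, then "AGC".
hits = search_by_pattern(records, r"GATGC[ACGT]AGC")
print(hits)
```

A keyword search would work the same way over the description text of each record instead of its sequence.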

Bioinformatics Tools

• Hundreds of computer programs
• Many freely available
• Generally available on UNIX or LINUX
• Often interact with bioinformatics databases
• Many accessible via the WWW
• Some require very powerful computers to run on
• Computational Biology Research Group provides an environment to do this

The Human Genome Project

Could not have been achieved without bioinformatics

Goals
• Determine the complete sequence of the 3 billion DNA subunits
• Discover all the human genes and make them accessible for further biological study

Need to bring together and store vast amounts of information from
• Lab equipment and experiments
• Computer Analysis
• Human Analysis
• Make visible to the world’s scientists

Central Dogma of Molecular Biology

(See http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml)

Overview of HGP bioinformatics

Assemble

Analyse

Annotate

Display

Assembly

Human genome is theoretically several long strings totalling 3 billion base pairs
• Assembled via hundreds of thousands of overlapping units or contigs to make a single consensus sequence
• Sequences collated using information stored on ABI sequencer
• Sequence assembly bioinformatics tools used to
  • Automatically assemble fragments
  • Hand finish using computer tools
• Requires constant reassembly and rebuilds as new data comes in
• E.g. PHRED/PHRAP and Staden
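To make the idea of assembling overlapping fragments concrete, here is a minimal greedy sketch that repeatedly merges the two fragments with the longest suffix/prefix overlap. It is a toy under strong assumptions (error-free reads, one strand, no repeats) and is not how PHRED/PHRAP or Staden work internally. The function names and the three example reads are invented for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    """Merge the best-overlapping pair until no overlaps remain; returns the contigs."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more: leave the rest as separate contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags

# Three made-up overlapping reads.
reads = ["AGTACGTAGT", "GTAGTAGCTG", "AGCTGCTGCT"]
contigs = greedy_assemble(reads)
print(contigs)
```

With real data, sequencing errors and repeated regions make this much harder, which is one reason hand finishing and constant rebuilds are needed.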

Analyse

Take the assembled string of nucleotides

AGTACGTAGTAGCTGCTGCTACGTGCGCTAGCTAGTACG TCACGACGTAGATGCTAGCTGACTCGATGCAGACTGCTA GCTGCCAGCGACTCAGCTACGACTAGCATCGGCGCTAG CATCGGCAGC…

Find genes

[Figure: DNA strand drawn 5’ to 3’]

• Train algorithm to look for features e.g.
  • Splice sites
  • Start / Stop codons
  • Codon frequency
  • Promoters
• Use existing biological information e.g. ESTs, cDNA
• Build a model of gene structure

Find translated protein(s)

Translate DNA to theoretical protein

[Figure: DNA strand drawn 5’ to 3’]

>Unknown Sequence
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF
DLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLR
VDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKY
R
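The translation step can be sketched in a few lines, assuming the standard genetic code and a single reading frame on one strand. The function name `translate` and the short DNA string are assumptions for this example. The DNA was chosen so that it encodes the first seven residues of the protein above and is not taken from a database.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna, frame=0):
    """Translate DNA codon by codon from the given frame, stopping at the first stop codon."""
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Illustrative DNA: seven codons followed by a stop codon (TAA).
print(translate("GTGCTGTCTCCTGCCGACAAGTAA"))
```

A fuller version would try all three frames on both strands and handle ambiguous bases. This sketch does neither.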

Find Function

Major challenge in bioinformatics
• Search the protein sequence vs database of proteins of known function*
• Protein domains are evolutionarily conserved
• Proteins that are similar in sequence across several species are likely to have a similar function
• BLAST:
  • A query sequence
  • Sequence database (protein or nucleotide)
  • Inspection of significant hits
• There are many other methods used to imply function!

* Many of the databases contain errors
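The logic of inferring function from the most similar known sequence can be sketched as below. This is not BLAST: it swaps in a much cruder measure (the fraction of shared 3-letter words) purely to show the query / database / best hit flow. The database entries `toy_1` and `toy_2`, their sequences and their function labels are all invented for this example.

```python
def kmers(seq, k=3):
    """Set of all overlapping words of length k in the sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets: a crude stand-in for an alignment score."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0

def best_hit(query, database, k=3):
    """Return (entry name, annotated function, score) for the most similar entry."""
    scored = [(similarity(query, seq, k), name) for name, (seq, _) in database.items()]
    score, name = max(scored)
    return name, database[name][1], score

# Hypothetical database of sequences with known (made-up) function labels.
database = {
    "toy_1": ("VLSAADKTNVKAAWSKV", "function A"),
    "toy_2": ("MKTAYIAKQRQISFVKSHFSRQ", "function B"),
}

name, putative_function, score = best_hit("VLSPADKTNVKAAWGK", database)
print(name, putative_function, round(score, 2))
```

As the footnote warns, the inferred function is only as reliable as the annotation of the best hit, so significant hits still need inspection.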

Example

Query sequence: Human, Unknown

Database: Uniprot

Search results: Chimp, Myoglobin; Pig, Myoglobin; Mouse, Myoglobin

Putative function = Human, Myoglobin

Annotate

• Results of raw gene analysis are FEATURES
• Integration of features, biological rules and knowledge make ANNOTATIONS
• Write these back to the database
• Automates what would have taken hundreds of scientists to do

Ensembl

Ensembl Genome Browser (www.ensembl.org)

UCSC Genome Browser - http://genome.ucsc.edu/

And this is just the start…

Bioinformatics is now in the ‘post-genomics’ era
• Transcriptomics
• Epigenomics
• Proteomics
• Comparative genomics
• Metabolic pathways
• Regulatory networks
• Virtual Cells
• Modelling Systems and Organs

Microarrays

• Used in large scale functional studies
• Looking for patterns of gene expression e.g.
  • Disease vs Normal
  • Over time
• Bioinformatics used to
  • Normalise images
  • Measure and adjust for variability
  • Analyse differentially expressed genes
  • Store data

Much more complicated data than sequences
http://www.ebi.ac.uk/microarray/biology_intro.html#Microarrays
• Transcriptome
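A disease vs normal comparison can be sketched as a log2 fold change per gene with a simple cut-off. The gene names, intensity values and threshold below are made up. A real analysis would first normalise the arrays and would use replicates and statistical tests rather than a bare threshold.

```python
import math

# Made-up expression intensities (arbitrary units), for illustration only.
disease = {"gene1": 800.0, "gene2": 100.0, "gene3": 50.0}
normal = {"gene1": 100.0, "gene2": 100.0, "gene3": 200.0}

def log2_fold_changes(a, b):
    """log2(a / b) for every gene measured in both conditions."""
    return {g: math.log2(a[g] / b[g]) for g in a if g in b}

def differentially_expressed(a, b, threshold=1.0):
    """Genes whose absolute log2 fold change is at least the threshold."""
    changes = log2_fold_changes(a, b)
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= threshold)

print(log2_fold_changes(disease, normal))
print(differentially_expressed(disease, normal))
```

A threshold of 1.0 on the log2 scale corresponds to at least a two-fold change up or down.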

Proteomics

[Workflow diagram labels: Sample, Proteome, Fractionation, Mass spectrometry, Protein Annotation / Bioinformatics]

Systems Biology

How everything fits together by taking a holistic view of a biological system
• DNA
• RNA
• Proteins
• Protein Interactions
• Networks
• Cells
• Organs

Will require a huge amount of data and corresponding computational infrastructure

Systems biology examples

• Dennis Bray (Cambridge)
  • http://www.pdn.cam.ac.uk/groups/comp-cell/
  • Modelling chemotaxis in bacteria
• Denis Noble (Oxford) - Science, Vol 295, Issue 5560, 1678-1682
  • Modelling the Heart--from Genes to Cells to the Whole Organ
• http://www.physiome.org/

List of useful websites

• WWW has driven a lot of bioinformatics research
• Low powered computers access very high powered computers
• Important to use all resources available to do research
• Hundreds of sites available

EBI - http://www.ebi.ac.uk/

NCBI - http://www.ncbi.nlm.nih.gov/

BioMart - http://www.biomart.org/

TIGR - http://www.tigr.org/

ExPASy - http://www.expasy.org/

PDB - http://www.rcsb.org/pdb/

Bioperl - http://bioperl.org/

CBRG - http://www.cbrg.ox.ac.uk

[email protected]