Introduction to Bioinformatics
Stephen Taylor
Computational Biology Research Group
Background
Definition
• Bioinformatics is the computational analysis and storage of biological data
Derivation
• bio – biology
• informatique – French for ‘data processing’
Goal
• To discover new biological insights using computers and biology
Other related disciplines
Computational Biology
• broader term, more an approach as generic as biology itself
Chemoinformatics
• study and analysis of chemical information
Medical Informatics
• study, invention and implementation of structures and algorithms to improve communication, understanding and management of medical information
Mathematical Biology
• more theoretical. Things which are not necessarily algorithmic, not necessarily molecular in nature, and are not necessarily useful in analyzing collected data!
What is bioinformatics?
[Diagram labels: Experiment, Analysis, Hypothesis, Sequence, Data, Structure, Result, Function, Evolution, Pathway, Interaction, Mutation, Expression]
Why use bioinformatics
Find an answer quickly
• Most in silico biology is faster than in vitro
Massive amounts of data to analyse
• Need to make use of all information
• Not possible to do analysis by hand
• Can’t organise and store information only using lab notebooks
• Automation is key
However!
• All results of computer analysis should be verified by biologists
Bioinformatics databases
• Public databases are the most important entity in bioinformatics
• Store knowledge about
  • Sequence e.g. EMBL
  • Structure e.g. PDB
  • Pathways e.g. KEGG
  • Interactions e.g. DIP
  • Diseases e.g. OMIM
  • And many others …
• Can be searched in a variety of ways e.g. keyword, pattern, sequence
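The three search styles named above (keyword, pattern, sequence) can be sketched against a toy in-memory record list. This is only an illustration: the record ids, descriptions and sequences are invented, the function names are hypothetical, and real databases such as EMBL or PDB are queried through their own interfaces rather than like this.

```python
import re

# Toy in-memory "database"; ids, descriptions and sequences are invented.
RECORDS = [
    {"id": "toy1", "description": "hypothetical kinase", "sequence": "MKNGTAAC"},
    {"id": "toy2", "description": "hypothetical transporter", "sequence": "MPPLLKRW"},
    {"id": "toy3", "description": "hypothetical kinase inhibitor", "sequence": "AANPSGG"},
]


def search_keyword(records, word):
    """Keyword search: case-insensitive match on the free-text description."""
    word = word.lower()
    return [r["id"] for r in records if word in r["description"].lower()]


def search_pattern(records, pattern):
    """Pattern search: a sequence motif written as a regular expression."""
    regex = re.compile(pattern)
    return [r["id"] for r in records if regex.search(r["sequence"])]


def search_sequence(records, fragment):
    """Sequence search: exact substring match (real tools use alignment)."""
    return [r["id"] for r in records if fragment in r["sequence"]]


print(search_keyword(RECORDS, "kinase"))
print(search_pattern(RECORDS, r"N[^P][ST][^P]"))
print(search_sequence(RECORDS, "LLKR"))
```

The pattern `N[^P][ST][^P]` reads as "N, then anything except P, then S or T, then anything except P"; it is used here only to show how a motif becomes a regular expression.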
Bioinformatics Tools
• Hundreds of computer programs
• Many freely available
• Generally available on UNIX or LINUX
• Often interact with bioinformatics databases
• Many accessible via the WWW
• Some require very powerful computers to run on
• Computational Biology Research Group provides an environment to do this
The Human Genome Project
Could not have been achieved without bioinformatics
Goals
• Determine the complete sequence of the 3 billion DNA subunits
• Discover all the human genes and make them accessible for further biological study
Need to bring together and store vast amounts of information from
• Lab equipment and experiments
• Computer Analysis
• Human Analysis
• Make visible to the world’s scientists
Central Dogma of Molecular Biology
(See http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml)
Overview HGP bioinformatics
Assemble
Analyse
Annotate
Display
Assembly
Human genome is theoretically several long strings totalling 3 billion base pairs
• Assembled via hundreds of thousands of overlapping units or contigs to make a single consensus sequence
• Sequences collated using information stored on ABI sequencer
• Sequence assembly bioinformatics tools used to
  • Automatically assemble fragments
  • Hand finish using computer tools
• Requires constant reassembly and rebuilds as new data comes in
• E.g. PHRED/PHRAP and Staden
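The idea of joining overlapping fragments into a consensus sequence can be sketched with a greedy merge. This is a simplified stand-in, not how PHRED/PHRAP or Staden work: it assumes error-free reads on one strand and ignores base quality, and the function names and the `min_len` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_len:
                        best_len, best_i, best_j = size, i, j
        if best_len == 0:
            break  # nothing overlaps any more: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping toy reads cut from one short stretch of sequence.
reads = ["AGTACGTAGTAG", "TAGTAGCTGCTG", "CTGCTGCTACGT"]
print(greedy_assemble(reads))
```

Greedy merging is easy to follow but can be misled by repeated regions, which is one reason real assemblies need hand finishing and constant rebuilding as new data arrives.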
Analyse
Take the assembled string of nucleotides
AGTACGTAGTAGCTGCTGCTACGTGCGCTAGCTAGTACG TCACGACGTAGATGCTAGCTGACTCGATGCAGACTGCTA GCTGCCAGCGACTCAGCTACGACTAGCATCGGCGCTAG CATCGGCAGC…
Find genes
5’ 3’
• Train algorithm to look for features e.g.
  • Splice sites
  • Start / Stop codons
  • Codon frequency
  • Promoters
• Use existing biological information e.g. ESTs, cDNA
• Build a model of gene structure
Find translated protein(s)
Translate DNA to theoretical protein
5’ 3’
>Unknown Sequence
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF
DLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLR
VDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKY
R
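Translating DNA to a theoretical protein, using start and stop codons as the simplest gene features, can be sketched as follows. It assumes the standard genetic code, the forward strand only and no introns, so it is far cruder than a trained gene finder. The function names and the toy DNA string are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA codon by codon; '*' marks a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )


def first_orf(dna: str) -> str:
    """Protein from the first ATG that is followed by an in-frame stop codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        protein = translate(dna[start:])
        stop = protein.find("*")
        if stop != -1:
            return protein[:stop]
        start = dna.find("ATG", start + 1)
    return ""


# Toy DNA: two flanking bases, a start codon, seven codons, a stop codon.
toy_dna = "CCATGGTGCTGTCTCCTGCCGACAAGTAACC"
print(first_orf(toy_dna))
```

A fuller version would also scan the reverse complement and all three reading frames, and would weigh codon frequency and splice sites as listed under gene finding.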
Find Function
Major challenge in bioinformatics
• Search the protein sequence vs database of proteins of known function*
• Protein domains are evolutionarily conserved
• Proteins that are similar in sequence across several species are likely to have a similar function
• BLAST:
  • A query sequence
  • Sequence database (protein or nucleotide)
  • Inspection of significant hits
• There are many other methods used to imply function!
* Many of the databases contain errors
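The principle of ranking database proteins by similarity to a query can be sketched with shared k-mers. This is not BLAST, which uses seeded local alignment and reports statistical significance. It is a much simpler Jaccard score on 3-mers, and the database entries, names and threshold are invented for illustration.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two k-mer sets (0.0 = nothing shared)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query: str, database: dict[str, str], threshold: float = 0.3):
    """Return (score, name) pairs at or above the threshold, best first."""
    scored = ((similarity(query, seq), name) for name, seq in database.items())
    return sorted((hit for hit in scored if hit[0] >= threshold), reverse=True)


# Invented toy database: names and sequences are placeholders, not real entries.
toy_database = {
    "toy_protein_A": "VLSPADKTNVKAAWGKV",
    "toy_protein_B": "MKTLLLTLVVVTIVCLD",
}
hits = search("VLSAADKTNVKAAWGKV", toy_database)
print(hits)
```

Any function inferred this way is only putative: the hits still need inspection, and the annotations in the database itself may be wrong.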
Example
Query sequence: Human, Unknown
Database: Uniprot database
Search results: Chimp, Myoglobin; Pig, Myoglobin; Mouse, Myoglobin
Putative function = Human, Myoglobin
Annotate
• Results of raw gene analysis are FEATURES
• Integration of features, biological rules and knowledge make ANNOTATIONS
• Write these back to the database
• Automates what would have taken hundreds of scientists to do
Ensembl
Ensembl Genome Browser (www.ensembl.org)
UCSC Genome Browser - http://genome.ucsc.edu/
And this is just the start…
Bioinformatics is now in the ‘post-genomics’ era
• Transcriptomics
• Epigenomics
• Proteomics
• Comparative genomics
• Metabolic pathways
• Regulatory networks
• Virtual Cells
• Modelling Systems and Organs
Microarrays
• Used in large scale functional studies
• Looking for patterns of gene expression e.g.
  • Disease vs Normal
  • Over time
• Bioinformatics used to
  • Normalise images
  • Measuring and adjusting for variability
  • Analysis of differentially expressed genes
  • Storing data
Much more complicated data than sequences
http://www.ebi.ac.uk/microarray/biology_intro.html#Microarrays
• Transcriptome
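Normalising two arrays and picking out differentially expressed genes in a disease vs normal comparison can be sketched as below. It assumes positive intensities that have already been extracted from the images. Median scaling and the log2 cut-off of 1.0 are illustrative choices, not a recommended pipeline, and the gene names and values are invented.

```python
import math
from statistics import median


def median_normalise(intensities: dict[str, float]) -> dict[str, float]:
    """Scale one array so that its median intensity becomes 1.0."""
    centre = median(intensities.values())
    return {gene: value / centre for gene, value in intensities.items()}


def log2_fold_changes(disease: dict[str, float], normal: dict[str, float]):
    """log2(disease / normal) per gene, after normalising each array."""
    d, n = median_normalise(disease), median_normalise(normal)
    return {gene: math.log2(d[gene] / n[gene]) for gene in d if gene in n}


def differentially_expressed(disease, normal, min_abs_log2: float = 1.0):
    """Genes whose expression changes at least two-fold in either direction."""
    changes = log2_fold_changes(disease, normal)
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= min_abs_log2)


# Invented intensities; the disease array is twice as bright overall.
normal = {"geneA": 100, "geneB": 200, "geneC": 400, "geneD": 200, "geneE": 200}
disease = {"geneA": 1600, "geneB": 400, "geneC": 800, "geneD": 400, "geneE": 100}
print(differentially_expressed(disease, normal))
```

Without the normalisation step every gene on the brighter array would look up-regulated, which is why adjusting for variability comes before comparing expression.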
Proteomics
Sample Proteome
Fractionation
Protein Annotation / Bioinformatics
Mass spectrometry
Systems Biology
How everything fits together by taking a holistic view of a biological system
• DNA
• RNA
• Proteins
• Protein Interactions
• Networks
• Cells
• Organs
Will require a huge amount of data and corresponding computational infrastructure
Systems biology examples
• Dennis Bray (Cambridge)
  • http://www.pdn.cam.ac.uk/groups/comp-cell/
  • Modelling chemotaxis in bacteria
• Denis Noble (Oxford) - Science, Vol 295, Issue 5560, 1678-1682
  • Modelling the Heart--from Genes to Cells to the Whole Organ
• http://www.physiome.org/
List of useful websites
• WWW has driven a lot of bioinformatics research
• Low powered computers access very high powered computers
• Important to use all resources available to do research
• Hundreds of sites available
EBI - http://www.ebi.ac.uk/
NCBI - http://www.ncbi.nlm.nih.gov/
BioMart - http://www.biomart.org/
TIGR - http://www.tigr.org/
ExPASy - http://www.expasy.org/
PDB - http://www.rcsb.org/pdb/
Bioperl - http://bioperl.org/
CBRG - http://www.cbrg.ox.ac.uk