CMSE 520
BIOMOLECULAR STRUCTURE, FUNCTION AND DYNAMICS
(Computational Structural Biology)

OUTLINE
- Review: Molecular biology
- Proteins: structure, conformation and function (5 lectures)
- Generalized coordinates, phi and psi angles
- DNA/RNA: structure and function (3 lectures)
- Structural and functional databases (PDB, SCOP, CATH, functional domain database, Gene Ontology)
- Use scripting languages (e.g. Python) to cross-reference between these databases: starting from sequence to find the function
- Relationship between sequence, structure and function
- Molecular modeling, homology modeling
- Conservation, ConSurf
- Relationship between function and dynamics
- Conformational changes in proteins (structural changes due to ligation, hinge motions, allosteric changes in proteins and the consequent change in function)
- Molecular Dynamics
- Monte Carlo
- Protein-protein interaction: recognition, structural matching, docking
- PPI databases: DIP, BIND, MINT, etc.

References:
CURRENT PROTOCOLS IN BIOINFORMATICS (e-book) (http://www.mrw.interscience.wiley.com/cp/cpbi/articles/bi0101/frame.html) Andreas D. Baxevanis, Daniel B. Davison, Roderic D.M. Page, Gregory A. Petsko, Lincoln D. Stein, and Gary D. Stormo (eds.) 2003 John Wiley & Sons, Inc.
INTRODUCTION TO PROTEIN STRUCTURE Branden, C. & Tooze, J., 2nd ed. 1999, Garland Publishing
COMPUTER SIMULATION OF BIOMOLECULAR SYSTEMS van Gunsteren, Weiner, Wilkinson
Internet sources

Human Genome Project
Two major goals:
1. DNA mapping
2. DNA sequencing
Ref: Department of Energy

Rapid growth in experimental technologies
- Microarray technologies: serial gene expression patterns and mutations
- Time-resolved optical, rapid mixing techniques: folding & function mechanisms (→ ns)
- Techniques for probing single molecule mechanics (AFM, STM) (→ pN)
→ more accurate models/data for computer-aided studies
Weiss, S. (1999). Fluorescence spectroscopy of single molecules. Science 283, 1676-1683.
Structural Biology/Molecular Biophysics
Most (all?) basic “life processes” are mediated by “machines” that represent the ultimate miniaturization achievable in a universe comprised of atoms and molecules. The goal is to understand the underlying principles that govern the operation of these molecular machines.

What this course is about
- overview of ways in which computers are used to solve problems in biology
- supervised learning of illustrative or frequently-used algorithms and programs and databases
- supervised learning of programming techniques and algorithms selected from these uses

Structure
- What do the molecules look like?
- How do we determine that experimentally?
- Are there general structural principles?
- How is this information organized?
- How do structural generalizations relate to simple physical/chemical principles?

Dynamics
Time is of the essence in biological processes; therefore, how do we understand time-dependent processes at the molecular level?
- How do we do this experimentally?
- How do we do this computationally?

Promising Future for Computational Biology
Exponential growth in data
- Sequence and structure data from experiments
- Computational technology

12,665 structures as of July 11, 2000
22,810 structures as of October 7, 2003
35,026 structures as of February 7, 2006
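As a rough check on the "exponential growth" claim, the three structure counts quoted above can be converted into a doubling time. The following is a minimal sketch that assumes pure exponential growth between consecutive data points; the names `COUNTS`, `PAIRWISE` and `doubling_time_years` are illustrative and not part of the course material.

```python
from datetime import date
from math import log

# Structure counts quoted on the slide: (date, number of structures)
COUNTS = [
    (date(2000, 7, 11), 12665),
    (date(2003, 10, 7), 22810),
    (date(2006, 2, 7), 35026),
]

def doubling_time_years(d0, n0, d1, n1):
    """Doubling time in years, assuming N(t) = N0 * 2**(t / T) between two points."""
    elapsed_years = (d1 - d0).days / 365.25
    return elapsed_years * log(2) / log(n1 / n0)

# Doubling time estimated from each consecutive pair of data points
PAIRWISE = [
    doubling_time_years(d0, n0, d1, n1)
    for (d0, n0), (d1, n1) in zip(COUNTS, COUNTS[1:])
]

if __name__ == "__main__":
    for t in PAIRWISE:
        print(f"doubling time: {t:.2f} years")
```

Under this assumption both intervals give a doubling time of a little under four years. With only three data points this is an order-of-magnitude estimate, not a fitted growth law.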
Rost, B. (1998). Marrying structure and genomics. Structure 6, 259-263

Large databases

Archival databanks of biological information
- Protein, DNA sequence databases
- Protein structure and nucleic acid databases
- Protein expression patterns
- Experimental techniques

Derived databanks
- Sequence motifs
- Mutations and variations in proteins
- Classifications and/or relationships

Databanks of web sites
- Databanks of databanks containing biological information
- Links between databanks

BIOINFORMATICS (definition)
Definition by Luscombe et al., Yale, Dept. of Molecular Biophysics and Biochemistry, 2001
“Bioinformatics is conceptualizing biology in terms of macromolecules (in the sense of physical chemistry) and then applying ‘informatics’ techniques (derived from disciplines such as applied maths, computer science, and statistics) to understand and organize the information associated with these molecules, on a large-scale”

COMPUTATIONAL BIOLOGY (definition)
Definition by NIH (working definition)
The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.
Information flow
A major task in computational molecular biology is to “decipher” information contained in biological sequences.
Since the nucleotide sequence of a genome contains all information necessary to produce a functional organism, we should in theory be able to duplicate this decoding using computers.
http://www-fp.mcs.anl.gov/~gaasterland/sg-review-slides.html
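The simplest instance of this "decoding" is the step from a DNA coding sequence to a protein sequence. The sketch below uses the standard genetic code and reads a single open reading frame from position 0. It ignores start-codon search, splicing and regulation, and raises `KeyError` on bases other than A, C, G, T/U. The function names and the example sequence are illustrative assumptions, not taken from the course material.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def transcribe(coding_dna):
    """Return the mRNA corresponding to a DNA coding strand (T -> U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna):
    """Translate mRNA codon by codon from position 0 until a stop codon."""
    dna = mrna.upper().replace("U", "T")  # the table is keyed on DNA letters
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)

if __name__ == "__main__":
    example_dna = "ATGGCCTAA"  # hypothetical sequence: Met, Ala, stop
    print(translate(transcribe(example_dna)))
```

Going from the resulting amino acid sequence to structure and function is the much harder part of the decoding, and is the subject of the rest of the course.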
Two major challenges after completion of the HGP: Structural Genomics and Functional Genomics
Schematic representation of the universe of proteins in a given organism
Kim, S.H. (1998). Nature Struct.Biol. 5, 643-645
Aim: “to construct the complete scheme of biological functions and cellular pathways for the entire organism”

What is the E-Cell Project?
The E-Cell Project is an international research project that aims to model and reconstruct biological phenomena in silico, and to develop the theoretical support, technologies and software platforms needed for precise whole-cell simulation.
Metabolism model of the model cell constructed with 127 genes
PROTEOMICS
Covers the following areas (but is not limited to them):
- Protein structure
  Primary structure: sequence of amino acids
  Secondary structure: local spatial arrangement
  Tertiary structure: three-dimensional native conformation
- Protein function: related to the 3-D shape of the protein
- Protein clusters: according to a specified characteristic
- Protein-protein interaction: interaction among a number of proteins
- Protein-DNA interaction: interaction between one protein and the genome