
Bioinformatics Algorithms
Protein structure prediction
David Hoksza
http://siret.ms.mff.cuni.cz/hoksza

Motivation
• Sequence → structure → function
• The number of available (protein) sequences grows much faster than the number of available 3D structures
• Given a protein sequence, we want to determine its structure
• This is the inverse of the protein structure design problem, where, given a structure, we want to find a sequence that codes for it

Structure → function
• Inferring function from structure
  • Detection of local structural motifs with functional roles
  • Analysis of surface clefts → catalytic sites
  • Conservation analysis
  • Quaternary structure (beware of false positives due to crystallization)
  • Buried and solvent-exposed residues
• Issues
  • Moonlighting proteins
    • Multiple functions carried out by a single domain
  • Conformational change upon binding
    • Ligand-bound state (holo structure) vs unbound state (apo structure)
  • Intrinsically disordered proteins (IDP)
    • Natively unfolded proteins

Sequence → structure
[Figure: size of common cores as a function of protein homology. If two proteins of length $n_1$ and $n_2$ have $c$ residues in the common core, the fractions of each sequence in the common core are $c/n_1$ and $c/n_2$. These values, connected by a bar, are plotted against the residue identity of the core.]
[Figure: the relation of residue identity and the r.m.s. deviation of the backbone atoms of the common cores of 32 pairs of homologous proteins.]
source: Chothia, Cyrus, and Arthur M. Lesk. "The relation between the divergence of sequence and structure in proteins." The EMBO journal 5.4 (1986): 823.
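The two quantities plotted in the Chothia and Lesk figures can be computed with a few lines of Python. This is only a minimal sketch: it assumes the common-core residues have already been identified and the two structures already superposed, and the function names and toy coordinates are illustrative, not taken from the paper.

```python
import math

def core_fractions(c, n1, n2):
    """Fractions of each sequence that lie in the common core: c/n1 and c/n2."""
    return c / n1, c / n2

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of 3D points.

    Assumption: the points are paired (i-th atom of A matches i-th atom of B)
    and the structures are already optimally superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy example: 80 core residues shared by proteins of length 100 and 160
fractions = core_fractions(80, 100, 160)

# Toy backbone atoms: structure B is structure A shifted by 1 Å along z
deviation = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
```

In practice the superposition step (for example the Kabsch algorithm) has to come first; it is left out here to keep the sketch short.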
Protein structure prediction tasks
• Secondary structure prediction
  • Assign each amino acid one of three (or more) states (helix, sheet, loop)
• Tertiary structure prediction
  • Assign each amino acid/atom its position in 3D space
• Interaction sites prediction
  • Tertiary structure (intra-molecular) contacts
  • Protein-protein/DNA/RNA sites prediction (inter-molecular quaternary structure contacts)
  • Protein-ligand (active sites/pockets) prediction

Protein structure determination
• X-ray crystallography (89%)
• NMR spectroscopy (8%)
• 3D (cryo) electron microscopy (EM) (2%)

X-ray crystallography
• A crystallized protein is exposed to an X-ray beam; the electrons scatter the beam, and the scattered waves interfere with each other, forming a diffraction pattern, which is observed
• The electron density of the crystal is determined by the positions of electrons (atoms) ↔ magnitudes and phases of the X-ray diffraction waves = diffraction pattern of the crystal
• Fourier transformation is used to estimate the electron density at each position
• Works only for proteins that form a crystal → suitable for rigid proteins but unsuitable for flexible proteins
source: https://www.nature.com/news/cryo-electron-microscopy-wins-chemistry-nobel-1.22738

X-ray crystallography – quality measures
• Resolution
  • 3 Å → secondary structure
  • 2.5 Å → side chains
  • <1 Å → hydrogen atoms
• R-factor
  • After structure reconstruction, a theoretical diffraction pattern can be computed → difference between the real and theoretical pattern expressed as a percentage (how well the model back-predicts the data)
  • Rule of thumb: a good structure should have an R-factor lower than resolution/10 (≤ 0.3 for 3 Å resolution)
• R(free)-factor
  • The R-factor computed when data set aside during refinement is used as the real pattern
• B-factor (temperature factor)
  • Thermal motion is present even in a crystal → extent to which the electron density is spread out for each atom
  • $B = 8\pi^2 U^2$, where $U^2$ is the mean square displacement of the atom
[Figure: electron density maps at 3.7 Å, 2.4 Å, 1.5 Å and 0.8 Å resolution]
source: Finding the best data for your needs in the PDB archive (EBI webinar, YouTube)

NMR spectroscopy
• A purified protein in solution is placed in a strong magnetic field and probed with radio waves; the observed resonances (each atom has a characteristic resonance in the magnetic field depending on its surroundings) are analyzed to build a model of atomic nuclei and bonded atoms
• Resonances indicate which atoms are close to each other → list of restraints used to build the model
• An NMR structure commonly includes an ensemble of structures that fit the restraints → diverse regions correspond to flexible parts
• Proteins in solution → works also for flexible proteins which can't be locked in a crystal
• Works for small to medium-sized proteins
[Figure: example structures, PDB IDs 6F0Y, 5MN3 and 6BNH]

NMR spectroscopy – quality measures
• Completeness of resonance assignments
  • Percentage of atoms for which the resonances were measured
• Statistically unusual resonances
• Random coil index
  • How well the resonances fit usual protein conformations such as secondary structure
source: Finding the best data for your needs in the PDB archive (EBI webinar, YouTube)

3D cryo-EM
• A beam of electrons and a system of electron lenses are used to image the biomolecule directly
• Cryo-EM
  • Vitrification: the protein solution is cooled so rapidly that water molecules do not have time to crystallize → thin layer of non-crystalline ice
  • Thousands of 2D projection images → 3D density map → fitting an atomic model to the map
• Chemistry Nobel Prize in 2017: Jacques Dubochet, Joachim Frank and Richard Henderson
• Ability to analyze large, complex and flexible structures
• Works for proteins in their native state
• Often breaks the 3 Å resolution barrier
[Figure: PDB ID 3j3q]
source: "Protein Data Bank: the single global archive for 3D macromolecular structure data." Nucleic Acids Research 47, no. D1 (2018): D520-D528.
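Returning to the X-ray quality measures listed earlier, the R-factor and B-factor are easy to illustrate numerically. The sketch below assumes the conventional definition $R = \sum \big| |F_{obs}| - |F_{calc}| \big| / \sum |F_{obs}|$ (the slides describe the R-factor only in words), uses the slide's relation $B = 8\pi^2 U^2$, and works on made-up structure-factor amplitudes; all function names are illustrative.

```python
import math

def r_factor(f_obs, f_calc):
    """Conventional R-factor: relative disagreement between observed and
    model-derived structure-factor amplitudes (returned as a fraction)."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def passes_rule_of_thumb(r, resolution_angstrom):
    """Rule of thumb from the slides: R-factor should not exceed resolution/10."""
    return r <= resolution_angstrom / 10

def b_factor(mean_square_displacement):
    """B = 8 * pi^2 * U^2, with U^2 given in Å^2."""
    return 8 * math.pi ** 2 * mean_square_displacement

# Toy amplitudes (hypothetical values, not real diffraction data)
observed = [100.0, 50.0, 50.0]
calculated = [90.0, 55.0, 45.0]

r = r_factor(observed, calculated)          # 20 / 200
acceptable = passes_rule_of_thumb(r, 3.0)   # compare against 0.3 for 3 Å data
b = b_factor(0.25)                          # U^2 = 0.25 Å^2
```

R(free) would use the same `r_factor` function, applied only to reflections that were set aside and never used during refinement.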
Protein folding
• Folding is the process through which a protein obtains its three-dimensional structure
• The protein tends to fold into the thermodynamically most favorable state, i.e. the state with the lowest free energy
• Folding is (mostly) determined by the protein's amino acid sequence through a thermodynamic process
  • Anfinsen's dogma

Anfinsen's dogma
• All information needed to fold a protein into its native structure is contained in its amino acid sequence
• Experiment with ribonuclease A (RNase A), a 124-residue extracellular enzyme with 4 disulfide bonds
• Observation
  1. SS bonds reduced using mercaptoethanol → denaturation with 8 M urea → inactive protein, flexible random polymer
  2. Removal of urea → oxidation of –SH groups back to SS bonds → regain of 90% of activity
• Control (proving that the protein was unfolded)
  • Change of the order of steps in the second phase → 1-2% of activity and random assortment of SS bonds

Levinthal's paradox
• Reaching the native folded state of a protein by a random search among all possible configurations would take an enormously long time
  • An unfolded polypeptide chain has many degrees of freedom
  • Even a small number of allowed $\phi$ and $\psi$ combinations leads to an astronomically large number of structures
• Proteins fold within seconds at most, which is a paradox → there must be a pathway or set of pathways leading to the energetically favorable conformation
• Biased search
  • When some conformations are considered stabilizing and preferred (energy bias), the folding time becomes reasonable [Zwanzig et al. "Levinthal's paradox." PNAS 89.1 (1992): 20-22]

Structure prediction

Template existence dependency of tertiary structure prediction approaches
[Figure: sequence identity axis divided into night, twilight zone (20% – 30%) and day]
• Night: combinatorial exploration of the folding space with the goal of finding the state with the lowest energy
• Day: utilization of existing structures
source: Krieger, E., Nabuurs, S. B. and Vriend, G. (2003) Homology Modeling, in Structural Bioinformatics

Model scoring

Energy/scoring functions
• The native structure is the lowest free energy conformation → need for a function capable of assessing the energy/quality of a proposed structure
• Approaches
  • Potential energy
    • Atom-level resolution
    • Based on energy terms
  • Knowledge-based scoring functions
    • Residue-level resolution
    • Recognizing good folds from existing knowledge (PDB)

Potential energy function
• A potential energy function defines the potential energy of a system as a function of the positions of all its atoms
• The behavior of a molecule can be described by the Schrödinger equation, which in general describes the behavior of a dynamic system
  • We need to consider not only the atoms of the molecule, but also the surrounding water molecules
  • Were we able to compute the energy of the system, we could use it to score our predictions
  • To solve the equation we need to consider all nuclei and electrons of the system and their interactions → impossible to solve for systems of more than a few atoms → potential energy function / force field
• Molecular mechanics force field
  • Consists of energetic contributions of covalent (bonded) and electrostatic (non-bonded) interactions
  • Each contribution consists of a functional form and its parametrization
  • Atoms are represented by their centers only, but that depends on the type of energy function

Potential energy function – covalent interactions
• Bond-length potential
  • Treating the bond as a spring and describing its energy by Hooke's law
  • Bonds between chemically similar atoms have similar lengths, thus we can assume the observed equilibrium is the one with minimum potential energy
  • $E_{bond} = K_r (r - r_{eq})^2$, where $K_r$ is the spring constant, $r$ the bond length and $r_{eq}$ the equilibrium bond length
• Bond-angle potential
  • Same as bonds
  • $E_{angle} = K_\theta (\theta - \theta_{eq})^2$
• Dihedral angle potential
  • Dihedral angles do not have a single energy minimum
  • Not sufficient on its own to represent the energy of a dihedral angle, and often combined with the electrostatic energy between the first and last of the atoms involved in the dihedral angle
  • $E_{dihedral} = \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right]$, where $V_n$ is the barrier height, $n$ the given number of energy minima and $\gamma$ the angular offset
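The three covalent terms of the force field (bond length, bond angle, dihedral angle) translate directly into code. The following is a minimal sketch of the functional forms only; the parameter values in the example are arbitrary placeholders chosen for illustration, not values from any real force field, and the function names are hypothetical.

```python
import math

def bond_energy(r, r_eq, k_r):
    """Harmonic (Hooke's law) bond term: E = K_r * (r - r_eq)^2."""
    return k_r * (r - r_eq) ** 2

def angle_energy(theta, theta_eq, k_theta):
    """Harmonic bond-angle term: E = K_theta * (theta - theta_eq)^2 (radians)."""
    return k_theta * (theta - theta_eq) ** 2

def dihedral_energy(phi, v_n, n, gamma):
    """Periodic dihedral term: E = V_n / 2 * (1 + cos(n * phi - gamma)).

    v_n is the barrier height, n the number of minima per full turn,
    gamma the angular offset (radians).
    """
    return v_n / 2 * (1 + math.cos(n * phi - gamma))

# Placeholder parameters (illustrative only)
e_bond = bond_energy(r=1.6, r_eq=1.5, k_r=300.0)
e_angle = angle_energy(theta=math.radians(112.0),
                       theta_eq=math.radians(109.5), k_theta=50.0)

# With n = 3 and gamma = 0 the dihedral term peaks at phi = 0
# and reaches a minimum at phi = 60 degrees
e_dihedral_max = dihedral_energy(0.0, v_n=2.0, n=3, gamma=0.0)
e_dihedral_min = dihedral_energy(math.pi / 3, v_n=2.0, n=3, gamma=0.0)

covalent_energy = e_bond + e_angle + e_dihedral_min
```

A full force field would sum these terms over every bond, angle and dihedral in the molecule and add the non-bonded contributions.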