Characterization of natural product biological imprints for computer-aided drug design applications Noe Sturm
To cite this version:
Noe Sturm. Characterization of natural product biological imprints for computer-aided drug design applications. Cheminformatics. Université de Strasbourg, 2015. English. NNT : 2015STRAF059. tel-01300872
HAL Id: tel-01300872 https://tel.archives-ouvertes.fr/tel-01300872 Submitted on 11 Apr 2016
HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. UNIVERSITÉ DE STRASBOURG
ÉCOLE DOCTORALE DES SCIENCES CHIMIQUES
Laboratoire d’Innovation Thérapeutique, UMR 7200 en cotutelle avec Eskitis Institute for Drug Discovery, Griffith University
THÈSE
présentée par Noé STURM
soutenue le : 8 Décembre 2015
pour obtenir le grade de : Docteur de l’université de Strasbourg Discipline/Spécialité : Chimie/Chémoinformatique
Caractérisation de l’empreinte biologique des produits naturels pour des applications de conception rationnelle de médicament assistée par ordinateur
THÈSE dirigée par :
KELLENBERGER Esther Professeur, Université de Strasbourg
QUINN Ronald Professeur, Université de Griffith, Brisbane, Australie
RAPPORTEURS :
IORGA Bogdan Chargé de recherche HDR, Institut de Chimie des Substances Naturelles, Gif-sur-Yvette
GÜNTHER Stefan Professeur, Université Albert Ludwigs, Fribourg, Allemagne
Acknowledgments
First of all, I would like to thank my two supervisors Professor Kellenberger Esther and Professor Quinn Ronald for their excellent assistance, both academic and personal in nature.
Esther, I remember that day, as I was still an undergraduate student. Next to a coffee achi e you asked the ope uestio : who’s keen for Australia?! As simple as it sounds, this question changed my life. Since then, you have shaped me, just like biosynthetic enzymes shape natural products and I sense that I will retain this imprint!
Now, my candidature has come to an end, it’s ti e to prove what I lear ed. I will remember this important period of my life with much nostalgia but I am also looking forward to the future. Really, I would like to express all my gratitude to you for your precious help, your opportunism and also for your patience (I know you would have needed this with me!). I owe you more than my PhD project.
Also, I would like to extend my gratitude to Professor Quinn without whom none of this could have happen. I really enjoyed being part of the Eskitis family. I learned a lot! I particularly enjoyed each of your intervention during group meetings. I’d like to thank you because you gave me the opportunity to study natural products and their biosynthetic origins. It is fascinating. I also would like to express my admiration to you for what you do. I ea , who would ’t wa t to have such an incredible driving power.
I also would like to thank Dr. Rognan Didier who has allowed me to work in his laboratory in the first place. Your gifted nature for science impresses me even today. It was an honor to work with your lab.
III
I would also like to thank Dr. Campitelli who was an excellent colleague at Eskitis
University. Also, as part of Eskitis engineering members, I would like to thanks Stephen
Toms for his sympathy and devotion to the group.
Then, I would like to thank my PhD colleagues. I’ll start with a special thanks to
Jeremy Desaphy and Jamel Meslamani. You guys are awesome! Both of you have guided my first steps. I also would like to thank other colleagues in Illkrich, Guillaume Bret,
Franck Da Silva, Ina Slynko and the ones that I am forgetting for their supportive behaviors.
At Eskitis, there are numerous past and present PhD colleagues that were just great with me. I have special thoughts to Liliana Pedro and her devotion to science.
Thanks to the French connection, Marie-Laure Vial, Ro ai Morvel Lepage and Fanny
Lombard for your efforts in creating social events at Eskitis. Benoît Serive (désolé de te mettre dans le lot des doctorants), Asmaa Boufridi (you are one of us) and Jan Lanz (a swiss is everything), I am glad you came at Eskitis. And last but not least thanks to Shaz for all the chicken drawings he left on my desk.
I would like to express a special thanks to Pauline Fabre for her very supportive and attentive behavior regarding me in all aspects of life. If I am here today, it is also thanks to you.
Last, I would like to thanks the members of my family, for their support during my candidature. It was always hard to leave France mainly because of you.
IV
Résumé
La comparaison de site peut-elle v rifier l’hypoth se: «Les origines biosynthétiques des produits naturels leurs confèrent des activités biologiques»? Pour répondre à cette question, nous avons développé un outil modélisant les propriétés accessibles au solvant des sites de liaison. La méthode a montré des aspects intéressants, mais elle souffre d’u e se si ilit au coordonnées atomiques. Cependant, des méthodes e ista tes ous o t per is de prouver ue l’hypoth se est valide pour la fa ille des flavo oïdes. Afi d’ te dre l’ tude, ous avo s d velopp u proc d auto ati ue capa le de rechercher des structures d’e zy es de biosynthèse de produits naturels disposant de sites actifs capables de lier une molécule de petite taille. Nous avons trouvé les structures de 117 enzymes.
Les structures nous ont permis de caractériser divers modes de liaison substrat-enzyme, nous indiquant que l’e prei te iologi ue des produits aturels ne correspond pas toujours au modèle « clé-serrure ».
V
Abstract
Can computational binding site similarity tools verify the hypothesis: Biosynthetic moldings give potent biological activities to natural products ? To answer this question, we designed a tool modeling binding site properties according to solvent exposure. The method showed interesting characteristics but suffers from sensitivity to atomic coordinates.
However, existing methods have delivered evidence that the hypothesis was valid for the flavonoid chemical class. In order to extend the study, we designed an automated pipeline capable of searching natural product biosynthetic enzyme structures embedding ligandable catalytic sites. We collected structures of 117 biosynthetic enzymes. Finally, according to structural investigations of biosynthetic enzymes, we characterized diverse substrate-enzyme binding-modes, suggesting that natural product biological imprints usually do not agree with the key-lock odel.
VI
Statement of originality
This work has not previously been submitted for a degree or diploma in any university.
To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made in the thesis itself.
______Brisbane, 7th October 2015 Noé Sturm
VII
Table of contents
Acknowledgments ______III Résumé ______V Abstract______VI Statement of originality ______VII Table of contents ______VIII Common abbreviations ______XII
Résumé (long) ______XIII
Summary ______XXVIII
Introduction ______34
Chapter 1. Development of a computational tool for binding site comparison ____ 39 1. SITEALIGN ______41 1.1. Overview of SiteAlign ______41 1.2. SiteAlign input ______41 1.3. The sphere representation of protein binding sites ______42 1.4. Scoring similarity between maps ______47 1.5. Structural alignment ______51 2. SOLVENT ACCESSIBLE BINDING SITE DEFINITION ______52 2.1. Solvent accessibility filter ______52 2.2. Binding site delimitation ______56 2.3. Binding site mouth detection ______57 2.4. Analysis of solvent accessible binding sites ______59 3. MODIFICATION OF SITEALIGN ______61 3.1. Physico-chemical descriptors ______62 3.2. Topological descriptor ______63 3.3. Residue fingerprint ______64 3.4. Scoring function ______64 3.5. Characterization of modifications impacting binding site comparison __ 65 4. BENCHMARKING VERSIONS OF SITEALIGN ______71 4.1. Similarity threshold definition ______71 4.2. Testing SiteAlign-5 against SiteAlign-4 ______73
VIII
CONCLUSION ______78 REFERENCES ______79 ANNEX 1 ______81
Chapter 2. Structural Insights into the Molecular Basis of the Ligand Promiscuity _ 83 ABSTRACT ______84 INTRODUCTION ______84 MATERIALS AND METHODS ______85 Identification in the sc-PDB of promiscuous ligands and their targets ______85 2D-description of ligands ______85 Conformational variability of protein-bound ligands ______85 2D comparison of the targets of promiscuous ligands ______85 3D comparison of the targets of promiscuous ligands ______86 3D comparison of binding sites for promiscuous ligands ______86 RESULTS AND DISCUSSION ______86 Setting up a data set of promiscuous ligands and their bound proteins ______86 Binding sites which accommodate the same ligand are not necessarily similar _ 87 Multiple binding modes explain why ligands can bind dissimilar sites ______88 Examples of multiple binding modes of a ligand ______90 Superpromiscous ligands have extreme binding modes ______91 Do the promiscuous ligands have specific characteristics? ______91 Do the promiscuous ligands have similar affinity for their different targets ____ 93 CONCLUSIONS ______93 REFERENCES ______93
Chapter 3. Similarity Between Flavonoid Biosynthetic Enzymes and Flavonoid Protein Targets Captured by Three-Dimensional Computing Approach ______96 Abstract______98 Introduction ______98 Results and Discussion ______99 Material and Methods ______102 Three-dimensional structures of protein binding sites ______102 Binding site comparison ______103 Virtual screening ______103 References ______103
IX
Chapter 4. Inventory of Natural Product Biological Enzymes in the Protein Databank ______105 INTRODUCTION ______106 1. METHODS AND MATERIALS ______110 1.1. Knowledge-based strategy for collecting the biosynthetic enzymes ___ 110 1.2. Top-down strategy for collecting the biosynthetic enzymes ______114 1.3. Automated process for catalytic site identification ______117 2. RESULTS ______122 2.1. Statistics for the Knowledge-based approach to collect biosynthetic enzymes ______122 2.2. Statistics for the Top-down approach to collect biosynthetic enzymes 133 2.3. Comparison of top-down and knowledge-based strategies ______136 3. DISCUSSION ______144 3.1. MetaCyc fi al set a d U iProt fi al set are overlappi g ut disti ct because source data are classified differently ______144 3.2. Top-Down strategy compared to knowledge-based strategy ______146 3.3. Chemical diversity of natural products in the dataset ______148 4. PERSPECTIVES ______157 4.1. Top-down ______157 4.2. Perspectives for ligandable natural products biosynthetic enzyme structures collection. ______165 CONCLUSION ______167 REFERENCES ______169 ANNEX 4 ______174
Chapter 5. Structural Investigations of Natrual Product Biological Imprints for Binding Site Comparison ______219 INTRODUCTION ______220 1. METHOD AND MATERIALS ______222 1.1. Virtual screening ______222 1.2. Chemical structure of substrates and products ______223 1.3. Molecular recognition ______223 RESULT & DISCUSSION ______223 Part 1: chemical structures of substrates and products ______224 Part 2: substrates molecular recognition ______228 Part 3: a new example of Protein Fold Topology ______237
X
CONCLUSION ______243 REFERENCES ______245 ANNEX 5 ______248
Conclusions ______257
XI
Common abbreviations
Standard amino-acid three letter codes
Ala Alanine Leu Leucine Arg Arginine Lys Lysine Asn Asparagine Met Methionine Asp Aspartic acid Phe Phenylalanine Cys Cysteine Pro Proline Glu Glutamic acid Ser Serine Gln Glutamine Thr Threonine Gly Glycine Trp Tryptophan His Histidine Tyr Tyrosine Ile Isoleucine Val Valine
Cα Alpha atom on amino-acid side chain Cβ Beta carbon on amino-acid side chain MOL2 Tripos format of molecular structures sc-PDB Screening Protein databank Å Angström distance unit (10-10 meter) HET HETero atom group identifier H-bond Non-covalent hydrogen bond EC Enzyme Commission number
XII
Résumé (long)
INTRODUCTION
Les produits aturels so t à l’origi e des pri cipes actifs de o reu dica e ts.1 Ces pri cipes actifs pre e t effet grâce à la for atio d’u co ple e prot i e-ligand résultant de la reconnaissance moléculaire du ligand par sa cible thérapeutique. Or, dans la nature, les produits naturels sont synthétisés par des enzymes de biosynthèse. Et, lors de la iosy th se, les pr curseurs d’u produit aturel i teragisse t avec des e zy es, formant ainsi des complexes enzyme-ligand. Partant du principe que deux protéines capables de former un complexe protéine-ligand avec un ligand de structure identique partagent des propriétés structurales locales aux sites de reconnaissance du ligand, nous avons postulé ue les ases structurales de la reco aissa ce ol culaire d’u pri cipe actif d’origi e aturelle par ses e zy es de iosy th se so t gale e t pr se tes chez les protéines responsables de son effet thérapeutique (figure 1). Cette hypothèse, désignée par « Protein Fold Topology » (PFT), a été initialement formulée par RJ Quinn en