EMBL-EBI/Wellcome Trust Course: Resources for Computational Drug Discovery

John P. Overington

Welcome!
• Introductions
• Overview of course material
• Why ChEMBL had to be built!
• The more questions you ask, the more you’ll learn what you want to know
• Your feedback is important to us

[Figure: flowchart from project idea to drug. Labels: Project idea; Phenotypic assay, Target-based assay; Previously screened, Novel; Unknown target structure, Known/similar target structure, Known/similar ligands; In-house compound collection, Commercial compounds, Virtual compounds; HTS, Screen, Purchase of specific compounds, Commission synthesis; Drug]

Overview of Material
• Monday pm
• Important safety stuff
• Meandering intro by jpo
• Perspectives and Challenges in Drug Discovery – Darren Green (GSK)
• Delegate presentations
– Tell us what you’re interested in, what you expect, and we’ll see what we can do

Overview of Material
• Tuesday
• Target Analysis and Selection
– Mike Barnes (Queen Mary College), Bissan Al-Lazikani (Institute of Cancer Research) and Anna Gaulton (EMBL-EBI)
– Pathways, druggability, genetics, disease linkage, target annotation, prioritisation
– Hands-on sessions

Overview of Material
• Wednesday
• Databases and resources for Compound Selection
– Anne Hersey (EMBL-EBI), John Irwin (UCSF), Noel O’Boyle (NextMove)
– ChEMBL, PubChem, ZINC, searching, complexities of compound structure representation
– Software tools
– Hands-on session

Overview of Material
• Thursday
• Computational Chemistry
– Val Gillet (Sheffield), Francis Atkinson (EMBL-EBI), George Papadatos (EMBL-EBI), Gary Battle (EMBL-EBI)
– Chemoinformatics in compound design, structure-based design, protein structure, target profiling
– Software tools: KNIME
– Hands-on session

Overview of Material
• Friday morning
– Drug Repurposing – John Overington (EMBL-EBI)
– Methods and Resources, Tactics for repurposing, Product Profile
– Hands-on session

What if things go wrong?
• If you get lost, need information on transport, miss a bus, lose your computer, are arrested, need a recommendation for a nice pub, want to buy an animal to take home, …
• Tom Hancocks during the course itself
• For medical emergencies – Telephone 999
• For everything else – Telephone 07557-767072

Mapping Drug and Medicinal Chemistry Space – The ChEMBL Database
John P. Overington
EMBL-EBI
[email protected]

Our Strategy and Hopes
• Comprehensively catalogue historical drug discovery
• Include successes and failures
• Large-scale abstraction and curation of primary literature
• Direct depositions
• Drugs can be small molecules, peptides, recombinant proteins, siRNA, cells, viruses, etc.
• ‘Learn’ rules for drug discovery ‘success’
• Target selection and prioritisation: druggability
• Lead discovery, optimisation, clinical candidate selection
• Develop approaches to new target classes, e.g. PPIs
• Drug combinations to improve safety and variability

Elements in Bioactive Molecules
[Figure: elements highlighted: H, C, N, O, F, S, Cl]

Organic Chemistry
• Carbon-based chemistry: the chemistry of life
– Natural: methane (CH4), carbon dioxide (CO2), β-carotene…
– Synthetic: armodafinil, atorvastatin

How Big Is Chemical Space?
• GDB databases from Jean-Louis Reymond, University of Berne, Switzerland
• GDB-13 database
– Small organic molecules of up to 13 atoms of C, N, O, S and Cl
– Simple chemical stability and synthetic feasibility rules
– 977,468,314 structures
– GDB-13 is the largest publicly available small organic molecule database

ADMET – The Rule of Five
• To be bioactive, a molecule needs to access its ‘target’
• ADMET: Absorption, Distribution, Metabolism, Excretion and Toxicity
– The body has evolved to be really choosy over the types of molecule it lets in and tolerates
• Chris Lipinski, while at Pfizer, uncovered the Rule of Five
– He proposed that an organic small molecule is likely to have poor oral drug properties if it has
• >500 molecular weight (size of molecule)
• >5 logP (greasiness of molecule)
• >5 hydrogen bond donors (polarity of molecule)
• >10 Ns and Os (polarity of molecule)
– Topical and parenterally dosed drugs are different

How Big Is Bioactive Chemical Space?
Likely to be ~10^19 organic small molecules obeying Lipinski’s Rule of Five

How Many Biological Targets Are There?
• Genes within the genome encode ‘target’ proteins
– Bioactive molecules usually interact with proteins
• Typical gene numbers in important genomes
– φX174 phage (a virus that infects bacteria): 11
– Escherichia coli (a bacterium): 4,377
– Plasmodium falciparum (the malaria parasite): 5,268
– Drosophila melanogaster (a fruit fly): ~17,000
– Homo sapiens: ~21,000

Chemical and Biological Spaces
• ~10,000,000,000,000,000,000 potential small bioactive molecules
– A 1 g sample of each would use all the Earth’s carbon
• ~100,000 potential relevant biological receptors
– Humans and pathogens

Chemogenomics
Exploration of bioactivity space at genomic scale
[Figure: Structure Activity Relationship (SAR) matrix of molecules against proteins, with regions labelled Drugs and ChEMBL. Molecules: drugs 10^3, screened molecules 10^7-8, all reasonable molecules 10^20. Proteins: drug targets 10^2, screened proteins 10^3, all reasonable proteins 10^6]
Presented to P&G, Cincinnati, April 2005, © 2005 Inpharmatica Ltd.

Clustering and Families
• Small molecules and targets can be organized into ‘families’
– This feature greatly helps analysis and data mining
[Figure: sets of related targets; sets of related small molecules]

Bio- and Chemoinformatics
• Bioinformatics largely reduces to comparison of 1-D ‘strings’
– Nucleic acid and protein sequences
– Alignment, searching, mapping, …
• Chemoinformatics is more complex
– Chemical structures are 2-D, so they are more difficult to represent on computers
– Alignment and searching are still areas of active research
– Mapping is largely solved by the InChI
– Open Standard and Software for unambiguous representation of chemical structures
• originally developed by NIST

Aspirin
InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)

Chemical Space
[Figure: nested sets: all compounds, drug-like compounds, available compounds]
Only certain molecules have features consistent with good pharmacological properties

Target Space
[Figure: nested sets: all targets, druggable targets, available targets]
Only certain targets have binding sites capable of ligand-efficient binding of drug-like ligands

Accessible Pharmacological Space
[Figure: overlap of compound and target space. Region labels: drug-like compounds but no complementary targets; druggable targets but no complementary compounds; available and complementary compounds for druggable targets; compounds for target but non-drug-like]

Drug Optimisation
[Figure: imidazole and triazole drug lineage, from prototype through 1st to 4th generation (chemical structure drawings not recoverable). After W. Sneader]
• Azomycin (1956): Streptomyces natural product, trichomonacidal, ‘toxic’
• Metronidazole 1962
• Tinidazole 1970
• Clotrimazole 1970
• Miconazole 1970
• Econazole 1972
• Ketoconazole 1978
• Terconazole 1980
• Sulconazole 1980
• Bifonazole 1981
• Itraconazole 1984
• Fluconazole 1988
• Voriconazole 2002
• Fosfluconazole 2004
• Posaconazole 2005

Drug Discovery
[Figure: pipeline grouped into Discovery, Development and Use]
• Stages: Target Discovery, Lead Discovery, Lead Optimisation, Preclinical Development, Phase 1, Phase 2, Phase 3, Launch (Phase 4)
• Activity labels recovered from the figure: target identification; microarray profiling; target validation; assay development; biochemistry; clinical/animal disease models; high-throughput screening (HTS); fragment-based screening; focused libraries; screening collection; medicinal chemistry; structure-based drug design; selectivity screens; ADMET screens; cellular/animal disease models; pharmacokinetics; toxicology; in vivo safety pharmacology; formulation; dose prediction; safety & tolerability; PK; efficacy; indication discovery, repurposing & expansion

ChEMBL content
[Figure: Med. Chem. SAR, Clinical Candidates, Drugs]
• >1,290,000 compound records
• >6,900,000 bioactivities
• ~44,500 abstracted papers
• ~8,000 targets
• ~12,000 clinical candidates
• ~1,600 drugs

Targets of Launched Drugs
Overington et al, Nat. Rev. Drug Disc., 5, pp. 993-996 (2006)

Different Types of Drugs

E. Lounkine et al, Nature (2012)

NFκB Pathway

Drug Approvals

FDA Approved Drugs

Affinity of Drugs for their ‘Targets’
Ki, Kd, IC50, EC50, & pA2 endpoints for drugs against their ‘efficacy targets’
[Figure: histogram. y-axis: frequency (0 to 400). x-axis: -log10 affinity from 2 to 12, corresponding to 10 mM, 1 mM, 100 µM, 10 µM, 1 µM, 100 nM, 10 nM, 1 nM, 100 pM, 10 pM, 1 pM]
Overington et al, Nature Rev. Drug Discov. 5 pp. 993-996 (2006)
Gleeson et al, Nature Rev. Drug Discov. 10 pp. 197-208 (2011)

Clinical Candidates
• Database of clinical development candidates
– Contains ~12,000 2-D structures/sequences
• Estimated size ~35-45,000 compounds
– Work in progress
• Deeper coverage of key gene families
• e.g. Protein kinases, 361 distinct clinical candidates

Pharma Industry Productivity
[Figure: file registration number (0 to 800,000) vs. year (1960 to 2010). Series: USAN date, Phase 2b date, ~Discovery date]
Overington, unpublished

Pharma Industry Productivity
[Figure: bar chart (y-axis 0 to 70) by file registration number range, in bins of 100,000 from 1 to 800,000]
• 16 drugs/100,000 compounds; 64 USANs/100,000 compounds
• 0.4 drugs/100,000 compounds; 1.9 USANs/100,000 compounds
• Large Pharma needs to synthesise and test ~250,000 compounds for each drug
Overington, unpublished

Clinical Candidates

What Is the ChEMBL Data?
Compound
>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSY
EEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRS
RYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEG
Inhibition
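
The Rule of Five thresholds given on the ADMET slide (molecular weight >500, logP >5, hydrogen bond donors >5, Ns and Os >10) can be sketched as a small check. This is only an illustration: the `Descriptors` class and the `rule_of_five_violations` function are hypothetical names not taken from the slides, the descriptor values are supplied by hand rather than computed from a structure, and the logP used for aspirin is an approximate figure assumed for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hand-supplied molecular descriptors (hypothetical container)."""
    mol_weight: float    # molecular weight in Da
    logp: float          # octanol/water partition coefficient (log scale)
    h_bond_donors: int   # count of hydrogen bond donors
    n_plus_o: int        # count of nitrogen plus oxygen atoms


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the Rule of Five thresholds that the molecule exceeds.

    Thresholds follow the slide: each is a strict 'greater than' test.
    """
    checks = [
        ("molecular weight > 500", d.mol_weight > 500),
        ("logP > 5", d.logp > 5),
        ("hydrogen bond donors > 5", d.h_bond_donors > 5),
        ("N + O atoms > 10", d.n_plus_o > 10),
    ]
    return [label for label, exceeded in checks if exceeded]


# Aspirin, C9H8O4: MW 180.16, one donor (the acid OH), four oxygens.
# The logP value of 1.2 is approximate and assumed for illustration.
aspirin = Descriptors(mol_weight=180.16, logp=1.2, h_bond_donors=1, n_plus_o=4)

# An invented molecule that exceeds every threshold.
oversized = Descriptors(mol_weight=600.0, logp=6.0, h_bond_donors=6, n_plus_o=11)

print(rule_of_five_violations(aspirin))
print(rule_of_five_violations(oversized))
```

Aspirin passes all four tests, so its list is empty, while the invented molecule is flagged on every criterion. In practice the descriptors would come from a cheminformatics toolkit rather than being typed in, and, as the slide notes, the rule concerns oral dosing only.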