Bioactivity Descriptors for Uncharacterized Chemical Compounds


ARTICLE | OPEN | https://doi.org/10.1038/s41467-021-24150-4

Martino Bertoni 1,6, Miquel Duran-Frigola 1,2,6 ✉, Pau Badia-i-Mompel 1,6, Eduardo Pauls 1, Modesto Orozco-Ruiz 1, Oriol Guitart-Pla 1, Víctor Alcalde 1, Víctor M. Diaz 3,4, Antoni Berenguer-Llergo 1, Isabelle Brun-Heath 1, Núria Villegas 1, Antonio García de Herreros 3 & Patrick Aloy 1,5 ✉

Chemical descriptors encode the physicochemical and structural properties of small molecules, and they are at the core of chemoinformatics. The broad release of bioactivity data has prompted enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Unfortunately, bioactivity descriptors are not available for most small molecules, which limits their applicability to a few thousand well characterized compounds. Here we present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for them. Our signaturizers relate to bioactivities of 25 different types (including target profiles, cellular response and clinical outcomes) and can be used as drop-in replacements for chemical descriptors in day-to-day chemoinformatics tasks. Indeed, we illustrate how inferred bioactivity signatures are useful to navigate the chemical space in a biologically relevant manner, unveiling higher-order organization in natural product collections, and to enrich mostly uncharacterized chemical libraries for activity against the drug-orphan target Snail1. Moreover, we implement a battery of signature-activity relationship (SigAR) models and show a substantial improvement in performance, with respect to chemistry-based classifiers, across a series of biophysics and physiology activity prediction benchmarks.
1 Joint IRB-BSC-CRG Programme in Computational Biology, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain.
2 Ersilia Open Source Initiative, Cambridge, UK.
3 Programa de Recerca en Càncer, Institut Hospital del Mar d’Investigacions Mèdiques (IMIM) and Departament de Ciències de la Salut, Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain.
4 Faculty of Medicine and Health Sciences, International University of Catalonia, Barcelona, Catalonia, Spain.
5 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain.
6 These authors contributed equally: Martino Bertoni, Miquel Duran-Frigola, Pau Badia-i-Mompel.
✉ email: [email protected]; [email protected]

NATURE COMMUNICATIONS | (2021) 12:3932 | https://doi.org/10.1038/s41467-021-24150-4 | www.nature.com/naturecommunications

Most of the chemical space remains uncharted and identifying its regions of biological relevance is key to medicinal chemistry and chemical biology1,2. To explore and catalog this vast space, scientists have invented a variety of chemical descriptors, which encode physicochemical and structural properties of small molecules. Molecular fingerprints are a widespread form of descriptors consisting of binary (1/0) vectors describing the presence or absence of certain molecular substructures. These encodings are at the core of chemoinformatics and are fundamental in compound similarity searches and clustering, and are applied to computational drug discovery (CDD), structure optimization, and target prediction.

The corpus of bioactivity records available suggests that other numerical representations of molecules are possible, reaching beyond chemical structures and capturing their known biological properties. Indeed, it has been shown that an enriched representation of molecules can be achieved through the use of bioactivity signatures3. Bioactivity signatures are multi-dimensional vectors that capture the biological traits of the molecule in a format that is akin to the structural descriptors or fingerprints used in the field of chemoinformatics. The first attempts to develop biological descriptors for chemical compounds encapsulated ligand-binding affinities4, and fingerprints describing the target profile of small molecules unveiled many unanticipated and physiologically relevant associations5. Currently, public databases contain experimentally determined bioactivity data for about a million molecules, which represent only a small percentage of commercially available compounds6 and a negligible fraction of the synthetically accessible chemical space7. In practical terms, this means bioactivity signatures cannot be derived for most compounds, and CDD methods are limited to using chemical information alone as a primary input, thereby hindering their performance and not fully exploiting the bioactivity knowledge produced over the years by the scientific community.

Recently, we integrated the major chemogenomics and drug databases in a single resource named the Chemical Checker (CC), which is the largest collection of small-molecule bioactivity signatures available to date8. In the CC, bioactivity signatures are organized by data type (ligand-receptor binding, cell sensitivity profiles, toxicology, etc.), following a chemistry-to-clinics rationale that facilitates the selection of relevant signature classes at each step of the drug discovery pipeline. In essence, the CC is an alternative representation of the small-molecule knowledge deposited in the public domain and, as such, it is also limited by the availability of experimental data and the coverage of its source databases (e.g., ChEMBL9 or DrugBank10). Thus, the CC is most useful when a substantial amount of bioactivity information is available for the molecules and remains of limited value for poorly characterized compounds11. In the current study, we present a methodology to infer CC bioactivity signatures for any compound of interest, based on the observation that the different bioactivity spaces are not completely independent, and thus similarities of a given bioactivity type (e.g., targets) can be transferred to other data kinds (e.g., therapeutic indications). Overall, we make bioactivity signatures available for […]

[…] structures of the molecules, targets and metabolic genes, network properties of the targets, cell response profiles, drug indications, and side effects, among others (Fig. 1a). In the CC, each molecule is annotated with multiple n-dimensional vectors (i.e., bioactivity signatures) corresponding to the spaces where experimental information is available. As a result, chemistry (A) signatures are widely available (~10^6 compounds), whereas cell-based assays (D) cover about 30,000 molecules, and clinical (E) signatures are known for only a few thousand drugs (Fig. 1b). We thus sought to infer missing signatures for any compound in the CC, based on the observation that the different bioactivity spaces are not completely independent and can be correlated.

Bioactivity signatures must be amenable to similarity calculations, ideally by conventional metrics such as cosine or Euclidean distances, so that short distances between molecule signatures reflect a similar biological behavior. Therefore, inference of bioactivity signatures can be posed as a metric learning problem where observed compound–compound similarities of a given kind are correlated to the full repertoire of CC signatures, so that similarity measures are possible for any compound of interest, including those that are not annotated with experimental data. In practice, for each CC space (Si), we tackle the metric learning problem with a so-called Siamese Neural Network (SNN), having as input a stacked array of CC signatures available for the compound (belonging to any of the A1–E5 layers, S1–S25) and as output, an n-dimensional embedding optimized to discern between similar and dissimilar molecules in Si. More specifically, we feed the SNN with triplets of molecules (an anchor molecule, one that is similar to the anchor (positive) and one that is not (negative)), and we ask the SNN to correctly classify this pattern with a distance measurement performed in the embedding space (Fig. 1a and Supplementary Fig. 1). We trained 25 such SNNs, corresponding to the 25 spaces available in the CC. We used 10^7 molecule triplets and chose an SNN embedding dimension of 128 for all CC spaces, scaling it to the norm so as to unify the distance magnitude across SNNs (see “Methods” section for details). As a result of this procedure, we obtained 25 SNN ‘signaturizers’ (S1–25), each of them devoted to one of the CC spaces (Si). A signaturizer takes as input the subset of CC signatures available for a molecule and produces a 128D signature that, in principle, captures the similarity profile of the molecule in the Si CC space, where experimental information may not be available for the compound.

To handle the acute incompleteness of experimental signatures accessible for training the SNNs (Fig. 1b), we devised a signature-dropout sampling scheme that simulates a realistic prediction scenario where, depending on the CC space of interest (Si), signatures from certain spaces will be available while others may not. For example, in the CC, biological pathway signatures (C3) are directly derived from binding signatures (B4), thus implying that, in a real B4 prediction case, C3 will never serve as a covariate. In practice, signature sampling probabilities for each CC space Si – […]