Automatic Functional Annotation of Predicted Active Sites: Combining PDB and Literature Mining

Kevin Nagel
Wolfson College

A dissertation submitted to the University of Cambridge for the degree of Doctor of Philosophy

European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
Email: [email protected]

January 2009

Declaration

This dissertation is the result of my own work, and includes nothing which is the outcome of work done in collaboration, except where specifically indicated in the text. The dissertation does not exceed the specified length limit of 300 pages as defined by the Biology Degree Committee. This thesis has been typeset in 12pt font using LaTeX 2ε according to the specifications defined by the Board of Graduate Studies and the Biology Degree Committee.

Summary

Kevin Nagel
European Bioinformatics Institute
University of Cambridge

Dissertation title: Automatic functional annotation of predicted active sites: combining PDB and literature mining.

Proteins are essential to cell function, and their function is mainly identified through biological experiments. Structural models of proteins help to explain their function but are not direct evidence for it. Nonetheless, we can mine structural databases, such as the Protein Data Bank (PDB), to extract shared structural components that are meaningful with regard to protein function.

This thesis applied mining techniques to PDB to identify evolutionarily conserved structural patterns, e.g. active sites. This analysis retrieved 3- and 4-bodies with assumed two- and three-way residue interactions, selected from a distribution analysis of residue triplets. A subset of the mined patterns is assumed to represent active sites, which should be confirmed by annotations gathered through automatic literature analysis.
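The idea of a distribution analysis over residue triplets can be illustrated with a small sketch. It is not the algorithm of this thesis: the coordinates are invented toy values rather than PDB data, the 6.0 angstrom cutoff is an arbitrary assumption, and the function names close_triplets and triplet_distribution are hypothetical. The sketch only shows one simple way to enumerate triplets of residues that lie mutually close in space and to count how often each residue-type combination occurs.

```python
from collections import Counter
from itertools import combinations
from math import dist

# Toy residues: (residue name, one representative 3D coordinate in angstroms).
# These values are invented for illustration and are not taken from PDB.
RESIDUES = [
    ("SER", (0.0, 0.0, 0.0)),
    ("HIS", (3.0, 0.0, 0.0)),
    ("ASP", (0.0, 4.0, 0.0)),
    ("GLY", (20.0, 20.0, 20.0)),
]


def close_triplets(residues, cutoff):
    """Yield sorted residue-name triplets whose three pairwise distances are all <= cutoff."""
    for triple in combinations(residues, 3):
        pairs = combinations(triple, 2)
        if all(dist(a[1], b[1]) <= cutoff for a, b in pairs):
            yield tuple(sorted(name for name, _ in triple))


def triplet_distribution(residues, cutoff=6.0):
    """Count how often each residue-type triplet occurs (cutoff is an assumed value)."""
    return Counter(close_triplets(residues, cutoff))


if __name__ == "__main__":
    for names, count in triplet_distribution(RESIDUES).items():
        print(names, count)
```

In a real analysis the counts would be accumulated over many non-redundant structures, and the resulting distribution would then be examined for combinations that occur more often than expected; how the thesis defines significance is described in its own chapters and is not reproduced here.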
Literature analysis for the functional annotation of proteins relies on the extraction of GO terms from the context of a protein mention. The annotation of protein residues requires the identification of chemical functions, which can be found in the context of residue mentions. MEDLINE abstracts have been processed to identify protein mentions in combination with species and residues (F1-measure 0.52; the F1-measure is a statistical measure of a test's accuracy based on the precision and recall of the test). The identified protein-species-residue triplets have been validated and benchmarked against reference data resources. Then, contextual features were extracted through shallow and deep parsing, and the features were classified into predefined categories (F1-measure ranging from 0.15 to 0.67). Furthermore, the feature sets have been aligned with annotation types in UniProtKB to assess the relevance of the annotations for ongoing curation projects. Altogether, the annotations have been assessed automatically and manually against reference data resources.

All of MEDLINE has been processed to extract annotations for residues. A subset of the identified catalytic sites could be cross-validated against the Catalytic Site Atlas (CSA; 44 out of 221). 429 out of 512 protein residues from MSDsite were then annotated with contextual data. Altogether, MEDLINE does not provide sufficient data to fully annotate the content of PDB. Conversely, residue annotation is achieved with a different feature set than that provided by GO, and incomplete annotations in the reference datasets can be filled from public literature.

Acknowledgements

This thesis would not have been possible without the support, direction, and love of a multitude of people. First, I would like to thank my supervisor Dietrich Rebholz-Schuhmann for his trust, encouragement, and for all his unconditional support and guidance. Dietrich has throughout given me opportunity and a sound research methodology.
Working with him I have learned the value of vision, and persistence in achieving it.

I am blessed to have had Tom Oldfield as my second supervisor. Ever since I was interviewed by Tom, he has been inspiring, helpful and most of all patient. I will look back fondly on our discussions, the "insights" into protein science he gave me, and the cheerful and motivational chats. I am deeply indebted to him for his belief in me.

I would like to thank my thesis committee members, Michael Ashburner, Kim Henrick, and Rob Russell, for their constructive comments and valuable criticism. They all seemed to find time for me despite their busy schedules. A special thank you must go to Kim Henrick; had he not encouraged me to pursue a research position I would not be a scientist now.

I would also like to acknowledge Antonio Jimeno for his time, patience, and suggestions, and especially for reminding me always to keep my focus. But most of all I will remember the great times we had cycling to and from work.

I would like to thank the past and present members of the Rebholz Group (Text Mining). During my years of research, the group has expanded and I have had the chance to learn from its members as well as to have fun with them.

I am also thankful to the European Molecular Biology Laboratory (EMBL) for the scholarship and the organised EMBL International PhD programme, throughout which I have had the chance to meet many talented and cheerful PhD students from the EMBL/EBI Hinxton.

A special thank you to Christina Granroth and Dagmar Harzheim, who proofread this thesis. Thank you, Dagmar, for helping me make clearer what I want to say.

Finally, I would like to acknowledge my wife Almut Nagel and my daughter Juli Nagel. Without Almut I would have become a working maniac with no joy in life; she helped me to maintain balance during my PhD research and also for the future. My special thanks and love go to Juli, aged one, from whom I have learned so much.
Contents

1 Introduction
  1.1 Proteins and functional sites
  1.2 Motivation
  1.3 Objective
  1.4 Related works
  1.5 Challenges
  1.6 Guide to remaining chapters

2 Background
  2.1 Protein related data resources
    2.1.1 Protein Data Bank
    2.1.2 Universal Protein Knowledge base
    2.1.3 Gene Ontology
    2.1.4 Biomedical literature
  2.2 Protein structure data mining
    2.2.1 Hypothesis-driven data analysis
    2.2.2 Discovery-driven data mining
  2.3 Biomedical literature mining
    2.3.1 Biological entity recognition
    2.3.2 Biological relation extraction
  2.4 Conclusion

3 Mining residue interactions as triads from PDB
  3.1 Algorithms
    3.1.1 Structural feature extraction
    3.1.2 Detection of significant configurations as interactions
    3.1.3 Grouping and selecting frequent configurations
  3.2 Analysing available non-redundant protein structure sets
  3.3 Evaluation methods
  3.4 Results
    3.4.1 Identification of residue interactions is dependent on data selection
    3.4.2 The interaction distance correlates with the distribution of residue triads
    3.4.3 Interaction classification is sensitive to the size of cross-validation
  3.5 Discussion
  3.6 Conclusion

4 Prediction of functions for mined residue triads
  4.1 Evaluation methods
  4.2 Results
    4.2.1 Identification of homologous metal binding sites
    4.2.2 Validation of convergent metal binding sites
    4.2.3 Recovering active sites and catalytic triads from the dataset
    4.2.4 Discovering the conserved serine residue in the catalytic triad (quartet)
  4.3 Discussion
  4.4 Conclusion

5 Identification of protein residues in MEDLINE
  5.1 Algorithms
    5.1.1 Protein and organism entity recognition
    5.1.2 Entity recognition of protein residue
    5.1.3 Association identification of the entity triplet organism, protein, and residue
  5.2 The construction of evaluation test corpora
  5.3 Evaluation methods
  5.4 Results
    5.4.1 Evaluation of organism, protein, and residue entity recognition
    5.4.2 Performance study on the entity triplet association
    5.4.3 Cross-validation of identified residues with UniProtKB
    5.4.4 Identified residues in MEDLINE for Uniprot/PDB proteins
  5.5 Discussion
  5.6 Conclusion

6 Information extraction from the context of a residue in text
  6.1 Algorithms
    6.1.1 Extraction of contextual features
    6.1.2 Categorisation of contextual