Annotation: Curation, Tools, Ontologies, Databases Genomics
Recommended publications
-
Fine–Mapping Identifies NAD–ME1 As a Candidate Underlying a Major
bioRxiv preprint doi: https://doi.org/10.1101/2020.09.07.285429; this version posted September 9, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. Fine–mapping identifies NAD–ME1 as a candidate underlying a major locus controlling temporal variation in primary and specialized metabolism in Arabidopsis Marta Francisco1, Daniel J. Kliebenstein2,3, Víctor M. Rodríguez1, Pilar Soengas1, Rosaura Abilleira1, María E. Cartea1 1Misión Biológica de Galicia (MBG-CSIC), P.O. Box 28, 36080, Pontevedra, Spain 2Department of Plant Sciences, University of California at Davis, Davis, CA 95616, USA. 3DynaMo Center of Excellence, University of Copenhagen, Thorvaldsensvej 40, DK-1871 Frederiksberg C, Denmark. Summary Plant metabolism is modulated by a complex interplay between internal signals and external cues. A major goal of all quantitative metabolomic studies is to clone the underlying genes to understand the mechanistic basis of this variation. Using fine-scale genetic mapping, in this work we report the identification and initial characterization of NAD-DEPENDENT MALIC ENZYME 1 (NAD-ME1) as the candidate gene underlying the pleiotropic network Met.II.15 QTL controlling variation in plant metabolism and circadian clock outputs in the Bay × Sha Arabidopsis population. -
Pancreatic Beta Cells Express a Diverse Set of Homeobox Genes
Proc. Natl. Acad. Sci. USA Vol. 91, pp. 12203-12207, December 1994 Biochemistry Pancreatic beta cells express a diverse set of homeobox genes (Lim motif/Lmx gene/Nkx gene/Alx gene/Vdx homeobox) ABRAHAM RUDNICK*†, THAI YEN LING*, HIROKI ODAGIRI*, WILLIAM J. RUTTER*‡, AND MICHAEL S. GERMAN*†§ *Hormone Research Institute and Departments of †Medicine and ‡Biochemistry and Biophysics, University of California, San Francisco, CA 94143-0534 Contributed by William J. Rutter, August 22, 1994 ABSTRACT Homeobox genes, which are found in all eukaryotic organisms, encode transcriptional regulators involved in cell-type differentiation and development. Several homeobox genes encoding homeodomain proteins that bind and activate the insulin gene promoter have been described. In an attempt to identify additional beta-cell homeodomain proteins, we designed primers based on the sequences of beta-cell homeobox genes cdx3 and lmx1 and the Drosophila homeodomain protein Antennapedia and used these primers to amplify inserts by PCR from an insulinoma cDNA library. [...] RIPE3B element (16) and the P1 element (8) [also called CT1 (9)] lie on either side of the IEB1 element. The A/T elements and the E boxes function synergistically: none of the elements can function in isolation, but combination of an E box and an A/T element results in dramatic activation of transcription (11, 16, 19). A number of complexes from beta-cell nuclei bind to the A/T elements (6, 8-11, 16, 19). Some proteins in these complexes have been cloned, and they all contain homeodomains. The A/T-binding proteins that have -
Molecular Basis for the Distinct Cellular Functions of the Lsm1-7 and Lsm2-8 Complexes
bioRxiv preprint doi: https://doi.org/10.1101/2020.04.22.055376; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. Molecular basis for the distinct cellular functions of the Lsm1-7 and Lsm2-8 complexes Eric J. Montemayor1,2, Johanna M. Virta1, Samuel M. Hayes1, Yuichiro Nomura1, David A. Brow2, Samuel E. Butcher1 1Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA. 2Department of Biomolecular Chemistry, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA. Correspondence should be addressed to E.J.M. ([email protected]) and S.E.B. ([email protected]). Abstract Eukaryotes possess eight highly conserved Lsm (like Sm) proteins that assemble into circular, heteroheptameric complexes, bind RNA, and direct a diverse range of biological processes. Among the many essential functions of Lsm proteins, the cytoplasmic Lsm1-7 complex initiates mRNA decay, while the nuclear Lsm2-8 complex acts as a chaperone for U6 spliceosomal RNA. It has been unclear how these complexes perform their distinct functions while differing by only one out of seven subunits. Here, we elucidate the molecular basis for Lsm-RNA recognition and present four high-resolution structures of Lsm complexes bound to RNAs. The structures of Lsm2-8 bound to RNA identify the unique 2′,3′ cyclic phosphate end of U6 as a prime determinant of specificity. In contrast, the Lsm1-7 complex strongly discriminates against cyclic phosphates and tightly binds to oligouridylate tracts with terminal purines. -
The ELIXIR Core Data Resources: Fundamental Infrastructure for The
Supplementary Data: The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences The “Supporting Material” referred to within this Supplementary Data can be found in the Supporting.Material.CDR.infrastructure file, DOI: 10.5281/zenodo.2625247 (https://zenodo.org/record/2625247).
Figure 1. Scale of the Core Data Resources
Table S1. Data from which Figure 1 is derived:
Year                         2013        2014        2015        2016        2017
Data entries                 765881651   997794559   1726529931  1853429002  2715599247
Monthly user/IP addresses    1700660     2109586     2413724     2502617     2867265
FTEs                         270         292.65      295.65      289.7       311.2
Figure 1 includes data from the following Core Data Resources: ArrayExpress, BRENDA, CATH, ChEBI, ChEMBL, EGA, ENA, Ensembl, Ensembl Genomes, EuropePMC, HPA, IntAct/MINT, InterPro, PDBe, PRIDE, SILVA, STRING, UniProt
● Note that Ensembl’s compute infrastructure physically relocated in 2016, so “Users/IP address” data are not available for that year. In this case, the 2015 numbers were rolled forward to 2016.
● Note that STRING made only minor releases in 2014 and 2016, in that the interactions were re-computed but the number of “Data entries” remained unchanged. The major releases that changed the number of “Data entries” happened in 2013 and 2015. So, for “Data entries”, the number for 2013 was rolled forward to 2014, and the number for 2015 was rolled forward to 2016.
Figure 2: Usage of Core Data Resources in research The following steps were taken: 1. API calls were run on open access full text articles in Europe PMC to identify articles that mention Core Data Resource by name or include specific data record accession numbers. -
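The first step described for Figure 2 (finding articles that mention a Core Data Resource by name) can be illustrated with a minimal sketch. It is not the ELIXIR pipeline: it makes no Europe PMC API calls and works on a few made-up article strings instead, it matches names only (not accession numbers), and the function name `count_articles_mentioning` and the toy texts are assumptions introduced here for illustration.

```python
import re
from collections import Counter

# A few resource names taken from the Figure 1 list above.
RESOURCE_NAMES = ["ArrayExpress", "BRENDA", "ChEMBL", "Ensembl", "UniProt"]


def count_articles_mentioning(texts, names=RESOURCE_NAMES):
    """Count, per resource name, how many texts mention it at least once.

    Hypothetical helper: whole-word, case-sensitive matching only.
    """
    counts = Counter()
    for text in texts:
        for name in names:
            # \b keeps "UniProt" from matching inside a longer token.
            if re.search(rf"\b{re.escape(name)}\b", text):
                counts[name] += 1
    return counts


# Made-up stand-ins for open access full-text articles.
toy_articles = [
    "Sequences were retrieved from UniProt and annotated with Ensembl gene IDs.",
    "Compound activities were taken from ChEMBL; UniProt accessions are in Table 2.",
    "No public databases were used in this study.",
]

mention_counts = count_articles_mentioning(toy_articles)
for name in RESOURCE_NAMES:
    print(name, mention_counts[name])
```

Each article is counted once per resource however often the name recurs, which matches the "articles that mention" wording. Name matching of this kind cannot tell "Ensembl" from "Ensembl Genomes", so a real analysis would need more careful disambiguation.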
Creating the Gene Ontology Resource: Design and Implementation
Resource Creating the Gene Ontology Resource: Design and Implementation The Gene Ontology Consortium2 The exponential growth in the volume of accessible biological information has generated a confusion of voices surrounding the annotation of molecular information about genes and their products. The Gene Ontology (GO) project seeks to provide a set of structured vocabularies for specific biological domains that can be used to describe gene products in any organism. This work includes building three extensive ontologies to describe molecular function, biological process, and cellular component, and providing a community database resource that supports the use of these ontologies. The GO Consortium was initiated by scientists associated with three model organism databases: SGD, the Saccharomyces Genome database; FlyBase, the Drosophila genome database; and MGD/GXD, the Mouse Genome Informatics databases. Additional model organism database groups are joining the project. Each of these model organism information systems is annotating genes and gene products using GO vocabulary terms and incorporating these annotations into their respective model organism databases. Each database contributes its annotation files to a shared GO data resource accessible to the public at http://www.geneontology.org/. The GO site can be used by the community both to recover the GO vocabularies and to access the annotated gene product data sets from the model organism databases. The GO Consortium supports the development of the GO database resource and provides tools enabling curators and researchers to query and manipulate the vocabularies. We believe that the shared development of this molecular annotation resource will contribute to the unification of biological information. 
As the amount of biological information has grown, it has become increasingly important to describe and classify biological objects in meaningful ways. [...] examining microarray expression data, sequencing genotypes from a population, or identifying all glycolytic enzymes is -
Bioinformatics Study of Lectins: New Classification and Prediction In
Bioinformatics study of lectins: new classification and prediction in genomes François Bonnardel To cite this version: François Bonnardel. Bioinformatics study of lectins: new classification and prediction in genomes. Structural Biology [q-bio.BM]. Université Grenoble Alpes [2020-..]; Université de Genève, 2021. English. NNT: 2021GRALV010. tel-03331649 HAL Id: tel-03331649 https://tel.archives-ouvertes.fr/tel-03331649 Submitted on 2 Sep 2021 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. THESIS submitted for the degree of DOCTOR OF THE UNIVERSITÉ GRENOBLE ALPES, prepared under a joint supervision (cotutelle) agreement between the Communauté Université Grenoble Alpes and the Université de Genève Specialties: Chemistry, Biology Ministerial decree: 6 January 2005 – 25 May 2016 Presented by François Bonnardel Thesis supervised by Dr. Anne Imberty and co-supervised by Dr/Prof. Frédérique Lisacek, prepared at the CERMAV laboratory, CNRS, the Computer Science Department, UNIGE, and the PIG team, SIB, in the Doctoral Schools EDCSV and UNIGE Etude bioinformatique des lectines: nouvelle classification et prédiction dans les génomes (Bioinformatics study of lectins: new classification and prediction in genomes) Thesis publicly defended on 8 February 2021 before a jury composed of: Dr. Alexandre de Brevern, UMR S1134, Inserm, Université Paris Diderot, Paris, France, Reviewer Dr. -
Genetics of Amyotrophic Lateral Sclerosis in the Han Chinese
Genetics of amyotrophic lateral sclerosis in the Han Chinese Ji He A thesis submitted for the degree of Master of Philosophy at The University of Queensland in 2015 The University of Queensland Diamantina Institute Abstract Amyotrophic lateral sclerosis (ALS) is the most frequently occurring neuromuscular degenerative disorder, and has an obscure aetiology. Whilst major progress has been made, the majority of the genetic variation involved in ALS is, as yet, undefined. In this thesis, multiple genetic studies have been conducted to advance our understanding of the genetic architecture of the disease. In light of the paucity of comprehensive genetic studies performed in Chinese populations, the present study focused on advancing our current understanding of the genetics of ALS in the Han Chinese population. To identify genetic variants altering risk of ALS, a genome-wide association study (GWAS) was performed. The study included 1,324 Chinese ALS cases and 3,115 controls. After quality control, a number of analyses were performed in a cleaned dataset of 1,243 cases and 2,854 controls: a genome-wide association analysis to identify SNPs associated with ALS; a genomic restricted maximum likelihood (GREML) analysis to estimate the proportion of the phenotypic variance in ALS liability due to common SNPs; and a gene-based analysis to identify genes associated with ALS. There were no genome-wide significant SNPs or genes associated with ALS. However, it was estimated that 17% (SE: 0.05; P = 6×10⁻⁵) of the phenotypic variance in ALS liability was due to common SNPs. The top associated SNP was within GNAS (rs4812037; P = 7×10⁻⁷). -
To Find Information About Arabidopsis Genes Leonore Reiser1, Shabari
UNIT 1.11 Using The Arabidopsis Information Resource (TAIR) to Find Information About Arabidopsis Genes Leonore Reiser1, Shabari Subramaniam1, Donghui Li1, and Eva Huala1 1Phoenix Bioinformatics, Redwood City, CA USA ABSTRACT The Arabidopsis Information Resource (TAIR; http://arabidopsis.org) is a comprehensive Web resource of Arabidopsis biology for plant scientists. TAIR curates and integrates information about genes, proteins, gene function, orthologs, gene expression, mutant phenotypes, biological materials such as clones and seed stocks, genetic markers, genetic and physical maps, genome organization, images of mutant plants, protein sub-cellular localizations, publications, and the research community. The various data types are extensively interconnected and can be accessed through a variety of Web-based search and display tools. This unit primarily focuses on some basic methods for searching, browsing, visualizing, and analyzing information about Arabidopsis genes and the genome. Additionally, we describe how members of the community can share data using TAIR’s Online Annotation Submission Tool (TOAST), in order to make their published research more accessible and visible. Keywords: Arabidopsis ● databases ● bioinformatics ● data mining ● genomics INTRODUCTION The Arabidopsis Information Resource (TAIR; http://arabidopsis.org) is a comprehensive Web resource for the biology of Arabidopsis thaliana (Huala et al., 2001; Garcia-Hernandez et al., 2002; Rhee et al., 2003; Weems et al., 2004; Swarbreck et al., 2008; Lamesch et al., 2010; Berardini et al., 2016). The TAIR database contains information about genes, proteins, gene expression, mutant phenotypes, germplasms, clones, genetic markers, genetic and physical maps, genome organization, publications, and the research community. In addition, seed and DNA stocks from the Arabidopsis Biological Resource Center (ABRC; Scholl et al., 2003) are integrated with genomic data, and can be ordered through TAIR. -
In Human Metabolism
Supporting Information (SI Appendix) Framework and resource for more than 11,000 gene-transcript-protein-reaction associations (GeTPRA) in human metabolism SI Appendix Materials and Methods Standardization of Metabolite IDs with MNXM IDs Defined in the MNXref Namespace. Information on metabolic contents of the Recon 2Q was standardized using MNXM IDs defined in the MNXref namespace available at MetaNetX (1-3). This standardization was to facilitate the model refinement process described below. Each metabolite ID in the Recon 2Q was converted to its MNXM ID accordingly. Metabolite IDs that could not be converted automatically were converted to MNXM IDs manually by comparing their compound structures and synonyms. In the final resulting SBML files, 97 metabolites were assigned arbitrary IDs (i.e., “MNXMK_” followed by four digits) because they were not covered by the MNXref namespace (i.e., metabolite IDs not converted to MNXM IDs). Refinement or Removal of Biochemically Inconsistent Reactions. Recon 2 was built upon metabolic genes and reactions collected from EHMN (4, 5), the first genome-scale human liver metabolic model HepatoNet1 (6), an acylcarnitine and fatty-acid oxidation model Ac-FAO (7), and a small intestinal enterocyte model hs_eIEC611 (8). Flux variability analysis (9) of the Recon 2Q identified blocked reactions coming from these four sources of metabolic reaction data. The EHMN caused the greatest number of blocked reactions in the Recon 2Q (1,070 reactions corresponding to 69.3% of all the identified blocked reactions). To refine the EHMN reactions, the following reactions were initially disregarded: 1) reactions having metabolite IDs not convertible to MNXM IDs; and 2) reactions without genes. -
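The ID standardization described above (convert each metabolite ID to an MNXM ID where a mapping exists, otherwise assign an arbitrary “MNXMK_” ID followed by four digits) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `standardize_ids`, the sequential numbering of placeholder IDs, and every ID in the toy mapping are hypothetical and were not looked up in MetaNetX.

```python
def standardize_ids(metabolite_ids, known_map, prefix="MNXMK_"):
    """Map source metabolite IDs to MNXM IDs.

    IDs missing from `known_map` get a placeholder of the form
    prefix + four digits. Sequential numbering is an assumption; the
    source only says the assigned IDs are arbitrary.
    """
    result = {}
    next_placeholder = 1
    for met_id in metabolite_ids:
        if met_id in result:
            continue  # already handled; keep the mapping stable
        if met_id in known_map:
            result[met_id] = known_map[met_id]
        else:
            result[met_id] = f"{prefix}{next_placeholder:04d}"
            next_placeholder += 1
    return result


# Toy mapping with made-up IDs, not real MNXref entries.
toy_map = {"glc_D": "MNXM1001", "atp": "MNXM1002"}

standardized = standardize_ids(["glc_D", "atp", "unknown_met", "glc_D"], toy_map)
print(standardized)
```

The manual step in the source (comparing compound structures and synonyms) would sit between the automatic lookup and the placeholder assignment; here it is represented only by whatever entries a curator adds to `known_map`.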
Identification of Novel Branch Points Reveals Insights Into RNA Processing
Identification of Novel Branch Points Reveals Insights into RNA Processing by Genevieve Michelle Gould B.A. Molecular and Cell Biology with an emphasis in Genetics, Genomics, and Development University of California, Berkeley (2009) Submitted to the Department of Biology in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 2015 © Massachusetts Institute of Technology 2015. All rights reserved. Signature of Author: Department of Biology, August 31, 2015 Certified by: Christopher B. Burge, Professor of Biology, Thesis Supervisor Accepted by: Michael Hemann, Associate Professor of Biology, Co-Chair, Biology Graduate Committee Identification of Novel Branch Points Reveals Insights into RNA Processing by Genevieve Michelle Gould Submitted to the Department of Biology on August 31, 2015 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Biology Abstract Pre-mRNA splicing is a ubiquitous process necessary for the production of functional eukaryotic mRNAs. The branch -
Redefining the Specificity of Phosphoinositide-Binding by Human
bioRxiv preprint doi: https://doi.org/10.1101/2020.06.20.163253; this version posted June 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. Redefining the specificity of phosphoinositide-binding by human PH domain-containing proteins Nilmani Singh1†, Adriana Reyes-Ordoñez1†, Michael A. Compagnone1, Jesus F. Moreno Castillo1, Benjamin J. Leslie2, Taekjip Ha2,3,4,5, Jie Chen1* 1Department of Cell & Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801; 2Department of Biophysics and Biophysical Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD 21205; 3Department of Biophysics, Johns Hopkins University, Baltimore, MD 21218; 4Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205; 5Howard Hughes Medical Institute, Baltimore, MD 21205, USA †These authors contributed equally to this work. *Correspondence: [email protected]. ABSTRACT Pleckstrin homology (PH) domains are presumed to bind phosphoinositides (PIPs), but specific interaction with and regulation by PIPs for most PH domain-containing proteins are unclear. Here we employed a single-molecule pulldown assay to study interactions of lipid vesicles with full-length proteins in mammalian whole cell lysates. -
A Flexible Microfluidic System for Single-Cell Transcriptome Profiling
www.nature.com/scientificreports A flexible microfluidic system for single‑cell transcriptome profiling elucidates phased transcriptional regulators of cell cycle Karen Davey1,7, Daniel Wong2,7, Filip Konopacki2, Eugene Kwa1, Tony Ly3, Heike Fiegler2 & Christopher R. Sibley1,4,5,6* Single cell transcriptome profiling has emerged as a breakthrough technology for the high‑resolution understanding of complex cellular systems. Here we report a flexible, cost‑effective and user‑friendly droplet‑based microfluidics system, called the Nadia Instrument, that can allow 3′ mRNA capture of ~50,000 single cells or individual nuclei in a single run. The precise pressure‑based system demonstrates highly reproducible droplet size, low doublet rates and high mRNA capture efficiencies that compare favorably in the field. Moreover, when combined with the Nadia Innovate, the system can be transformed into an adaptable setup that enables use of different buffers and barcoded bead configurations to facilitate diverse applications. Finally, by 3′ mRNA profiling asynchronous human and mouse cells at different phases of the cell cycle, we demonstrate the system’s ability to readily distinguish distinct cell populations and infer underlying transcriptional regulatory networks. Notably this provided supportive evidence for multiple transcription factors that had little or no known link to the cell cycle (e.g. DRAP1, ZKSCAN1 and CEBPZ). In summary, the Nadia platform represents a promising and flexible technology for future transcriptomic studies, and other related applications, at cell resolution. Single cell transcriptome profiling has recently emerged as a breakthrough technology for understanding how cellular heterogeneity contributes to complex biological systems. Indeed, cultured cells, microorganisms, biopsies, blood and other tissues can be rapidly profiled for quantification of gene expression at cell resolution.