Annotation: Curation, Tools, Ontologies, Databases Genomics


Annotation: curation, tools, ontologies, databases
Mike Cherry
Genomics Lecture 7
Genetics 211 - Winter 2014

Database of the Past
Staats, Joan. The Classified Bibliography of Inbred Strains of Mice. Science 119(3087): 295-296 (1954-02-26).

What’s this guy talking about?
"So you want to make a nnn, there are a couple of steps. You acquire data through partners. You do a bunch of engineering on that data to get it into the right format and conflate it with other sources of data, and then you do a bunch of operations, which is what this tool is about, to hand massage the data. And out the other end pops something that is higher quality than the sum of its parts."
Michael Weiss-Malik, The Atlantic 9/2012

Why curate publications?
• Standardize vocabulary – full-text search is easy to use, but it is difficult to know whether the search was complete
• Integrate results – why don’t publishers mandate submission of standardized data?
  • for decades, crystallographic data & coordinates and GenBank accession numbers have been required
  • GEO & SRA accessions should be required too, but this is not enforced

Manual Curation
• Read the published literature, or use tools to analyze results, to make the best annotation
• Identify the experimental methods used
• Connect associated IDs from ontologies/vocabularies, sequences and their IDs, and connections to other databases (pathway, chemical, orthology, interactions, disease, expression, etc.)

Examples of Curated Databases
• Protein Database, UniProtKB
• Genome, model organism database
• Chemical, ChEBI or PubChem
• Human Genetic Disease, OMIM
• Gene Function, Gene Ontology Consortium
• Gene Models, GENCODE & UCSC
• Sequence Variants, LSVD & HGMD
• Personal Genomics, Ingenuity & Omicia

How would you curate this paper?

Drosophila Hedgehog
The gene hedgehog is referred to in FlyBase by the symbol Dmel\hh (CG4637, FBgn0004644). It is a protein_coding_gene from Drosophila melanogaster. Its sequence location is 3R:18953425..18967881. It has the cytological map location 94E1. There is experimental evidence that it has the molecular function: cell surface binding; protein binding. There is experimental evidence for 24 unique biological process terms, many of which group under: anatomical structure development; cellular component movement; organ morphogenesis; regulation of biological process; sensory organ development; reproductive process in a multicellular organism; organ development; regulation of protein transport; cell projection organization; open tracheal system development; central nervous system development. 169 alleles are reported. The phenotypes of these alleles are annotated with: organ system; adult segment; adult mesothoracic segment; peripheral nervous system; primordium; nervous system; embryonic nervous system; late extended germ band embryo; ventral nerve cord primordium; abdominal ventral denticle belt. It has 2 annotated transcripts and 2 annotated polypeptides.

Saccharomyces SGS1
SGS1 encodes a helicase with similarity to E. coli RecQ and human BLM and WRN helicases (6). Mutations in BLM are implicated in the cancer-prone Bloom's Syndrome, and mutations in WRN cause the premature-aging Werner's Syndrome (6). SGS1 was identified in a screen for suppressors of the slow growth phenotype of top3 mutants, and Sgs1p has been shown to interact with the topoisomerase Top3p (7). Sgs1p appears to be involved in the maintenance of genome stability and the suppression of illegitimate recombination; sgs1 null mutants show mitotic hyperrecombination and elevated levels of chromosome missegregation (8, 9, 10). Sgs1p has been localized to the nucleolus, and is needed to maintain the integrity of rDNA repeats (2).
Sgs1p shows ATPase activity and unwinds duplex DNA; it preferentially binds to branched DNA substrates and has a 3' to 5' polarity of unwinding (11, 12).

www.pharmgkb.org
Knowledge Base of Pharmacogenetic and Pharmacogenomic Information

Imaging expression patterns during Drosophila embryogenesis. Tomancak et al. Genome Biology 2002 3:research0088.1

The Gene Wiki article about Cyclin-dependent kinase. Good B M et al. Nucl. Acids Res. 2011;nar.gkr925

Trust distributions of Gene Wiki revisions versus general (non-Gene Wiki) Wikipedia revisions. Good B M et al. Nucl. Acids Res. 2011;nar.gkr925

Directed Acyclic Graph

How can ontologies help organize information?
• Describe material entities of nature
• Represent events, actions, procedures & relationships as immaterial entities
• Specify connections between entities
• Standardize nomenclature
• Enhance computability of information
• Provide a framework to communicate information

An ontology is a set of terms…
… with different types of relationships to each other.
All relationships must be true, because inferences can be made based on these relationships.
[Diagram, built up over four slides: the terms cell, nucleus, chromosome, mitochondrion and mitochondrial chromosome drawn as a graph. cell is the parent term of mitochondrion, chromosome and nucleus (each part_of cell); mitochondrial chromosome, a child term, is linked upward by part_of and is_a edges. Later slides add further part_of edges.]
www.geneontology.org/GO.ontology.relations.shtml

Example Gene Ontology Term
id: GO:0000016
name: lactase activity
namespace: molecular_function
def: "Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose." [EC:3.2.1.108]
synonym: "lactase-phlorizin hydrolase activity" BROAD [EC:3.2.1.108]
synonym: "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref: EC:3.2.1.108
xref: MetaCyc:LACTASE-RXN
xref: Reactome:20536
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds

Subset of Evidence Code Ontology
[Term]
id: ECO:00000067
name: inferred from electronic annotation
def: "Used for annotations that depend directly on computation or automated transfer of annotations from a database, particularly when the analysis is performed internally and not published. A key feature that distinguishes this evidence code from others is that it is not made by a curator; use IEA when no curator has checked the specific annotation to verify its accuracy. The actual method used (BLAST search, Swiss-Prot keyword mapping, etc.) doesn't matter." [GO:IEA]
synonym: "IEA" RELATED []
is_a: ECO:0000043 ! inferred from in-silico analysis

[Term]
id: ECO:0000007
name: inferred from immunofluorescence
def: "Used when an annotation is made based on methods that detect the presence of macromolecules, proteins, and compounds by the use of a fluorescent-labeled antibody." [TAIR:TED]
synonym: "IDA: immunofluorescence" RELATED []
is_a: ECO:0000040 ! inferred from immunological assay

Evidence Codes for Gene Product Annotations
Direct assay (IDA). Enzyme assays; in vitro reconstitution (e.g. transcription); immunofluorescence (for cellular component).
Expression pattern (IEP). Transcript levels (e.g. Northerns, microarray data).
Genetic interaction (IGI). "Traditional" genetic interactions such as suppressors, synthetic lethals, etc.
Mutant phenotype (IMP). Any gene mutation/knockout.
Physical interaction (IPI). 2-hybrid interactions; co-purification.
Sequence or structural similarity (ISS).
Sequence similarity (homologue of/most closely related to); recognized domains.
Sequence Alignment (ISA). Pairwise or multiple sequence alignment to experimentally characterized proteins.
Sequence Orthology (ISO). Orthologous genes share a common ancestor and have arisen due to a speciation event. Phylogenetic analysis with maximum likelihood or nearest neighbor joining.
Sequence Model (ISM). A statistical modeling tool determined protein membership in a functional family. HMM, tRNAscan and TMHMM are examples of this type of algorithm.
Genomic context (IGC). Annotations based on synteny; in these cases sequence similarity alone is not enough. For example, presence within an operon in bacterial systems.

[Chart: Number of Annotated Genes per Organism by Evidence Type, December 2013 (using gene name as unique ID). Y-axis: 0 to 30,000 genes. X-axis: EcoCyc, PomBase, SGD, dictyBase, WB, FB, Chicken, Cow, ZFIN, Pig, Dog, RGD, MGI, TAIR, Human, each with three bars labeled P, F and C (the GO aspects biological process, molecular function and cellular component).]

[Figure: Biological process GO enrichment of 88 human peroxisome proteins before the focused manual peroxisome protein annotation effort. Mutowo-Meullenet P et al. Database 2013;2013:bas062. © The Author(s) 2013. Published by Oxford University Press.]

[Figure: Biological process GO enrichment of 88 human peroxisome proteins after the focused manual peroxisome protein annotation effort. Mutowo-Meullenet P et al. Database 2013;2013:bas062.]

Annotations are unconnected
[Diagram: the gene products pap1 and sty1 shown with three separate GO terms:
• positive regulation of transcription from pol II promoter in response to oxidative stress [GO:0036091]
• protein localization to nucleus [GO:0034504]
• cellular response to oxidative stress [GO:0034599]]

DB        Object   Term         Ev    Ref             ..
PomBase   sty1     GO:0034504   IMP   PMID:9585505    ..
..        ..       ..
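
The ontology slides state that all relationships must be true because inferences can be made from them. A minimal Python sketch of that idea follows. It is not the Gene Ontology Consortium's software: the edge list is one reading of the flattened slide diagram (an assumption), the function names `compose` and `inferred_relations` are hypothetical, and the composition rule used (a chain made only of is_a links stays is_a, any chain containing part_of yields part_of) is the usual convention for these two relations.

```python
from collections import deque

# Direct relationships as read off the slide diagram (an assumption):
# (child term, relation, parent term).
EDGES = [
    ("mitochondrion", "part_of", "cell"),
    ("chromosome", "part_of", "cell"),
    ("nucleus", "part_of", "cell"),
    ("mitochondrial chromosome", "is_a", "chromosome"),
    ("mitochondrial chromosome", "part_of", "mitochondrion"),
]


def compose(first, second):
    """Relation implied by following `first` and then `second`.

    is_a followed by is_a stays is_a; any chain that contains a
    part_of link yields part_of.
    """
    return "is_a" if first == second == "is_a" else "part_of"


def inferred_relations(term, edges=EDGES):
    """Return every (relation, ancestor) pair that holds for `term`."""
    # Index the direct parents of each term.
    parents = {}
    for child, relation, parent in edges:
        parents.setdefault(child, []).append((relation, parent))

    found = set()
    queue = deque(parents.get(term, []))
    while queue:
        relation, ancestor = queue.popleft()
        if (relation, ancestor) in found:
            continue  # already reached with the same relation
        found.add((relation, ancestor))
        # Walk one level further up and compose the relations.
        for next_relation, next_parent in parents.get(ancestor, []):
            queue.append((compose(relation, next_relation), next_parent))
    return found
```

Under these assumed edges, `inferred_relations("mitochondrial chromosome")` yields is_a chromosome, part_of mitochondrion and the inferred part_of cell. A single false edge would propagate to every descendant in the same way, which is why each asserted relationship has to hold in all cases.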