Biocuration 2016 - Posters
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Original Article Text Mining in the Biocuration Workflow: Applications for Literature Curation at Wormbase, Dictybase and TAIR
Database, Vol. 2012, Article ID bas040, doi:10.1093/database/bas040 ............................................................................................................................................................................................................................................................................................. Original article Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR Kimberly Van Auken1,*, Petra Fey2, Tanya Z. Berardini3, Robert Dodson2, Laurel Cooper4, Donghui Li3, Juancarlos Chan1, Yuling Li1, Siddhartha Basu2, Hans-Michael Muller1, Downloaded from Rex Chisholm2, Eva Huala3, Paul W. Sternberg1,5 and the WormBase Consortium 1Division of Biology, California Institute of Technology, 1200 E. California Boulevard, Pasadena, CA 91125, 2Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, 420 E. Superior Street, Chicago, IL 60611, 3Department of Plant Biology, Carnegie Institution, 260 Panama Street, Stanford, CA 94305, 4Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331 and 5Howard Hughes Medical Institute, California Institute of Technology, 1200 E. California Boulevard, Pasadena, CA 91125, USA http://database.oxfordjournals.org/ *Corresponding author: Tel: +1 609 937 1635; Fax: +1 626 568 8012; Email: [email protected] Submitted 18 June 2012; Revised 30 September 2012; Accepted 2 October 2012 ............................................................................................................................................................................................................................................................................................ -
The Rat Genome Database at 20: a Multi-Species Knowledgebase and Analysis Platform Jennifer R
Published online 12 November 2019 Nucleic Acids Research, 2020, Vol. 48, Database issue D731–D742 doi: 10.1093/nar/gkz1041 The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform Jennifer R. Smith 1,*, G. Thomas Hayman1, Shur-Jen Wang1, Stanley J.F. Laulederkind 1, Matthew J. Hoffman1,2, Mary L. Kaldunski 1, Monika Tutaj1, Jyothi Thota1,Harika S. Nalabolu1, Santoshi L.R. Ellanki1, Marek A. Tutaj1, Jeffrey L. De Pons1, Anne E. Kwitek2, Melinda R. Dwinell2 and Mary E. Shimoyama1 1Rat Genome Database, Department of Biomedical Engineering, Medical College of Wisconsin, Milwaukee, WI 53226, USA and 2Genomic Sciences and Precision Medicine Center and Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA Received September 15, 2019; Revised October 21, 2019; Editorial Decision October 22, 2019; Accepted October 24, 2019 ABSTRACT In contrast to the current wealth of data and tools, the ‘bare bones’ first iteration (Figure 1,Table1) focused on just a few Formed in late 1999, the Rat Genome Database (RGD, data types––most notably a handful of known genes, plus https://rgd.mcw.edu) will be 20 in 2020, the Year of the EST and SSLP markers––localized only on genetic and RH Rat. Because the laboratory rat, Rattus norvegicus, maps, and offered a minimal number of tools for searching has been used as a model for complex human dis- and analyzing that data. eases such as cardiovascular disease, diabetes, can- Even before the first public release of the rat reference cer, neurological disorders and arthritis, among oth- genome in 2002 (1), RGD was focused on mapping disease- ers, for >150 years, RGD has always been disease- related regions and on comparative genomics between rat, focused and committed to providing data and tools human and mouse to support the use of rat as a model for for researchers doing comparative genomics and human disease. -
PLATFORM ABSTRACTS Abstract Abstract Numbers Numbers Tuesday, November 6 41
American Society of Human Genetics 62nd Annual Meeting November 6–10, 2012 San Francisco, California PLATFORM ABSTRACTS Abstract Abstract Numbers Numbers Tuesday, November 6 41. Genes Underlying Neurological Disease Room 134 #196–#204 2. 4:30–6:30pm: Plenary Abstract 42. Cancer Genetics III: Common Presentations Hall D #1–#6 Variants Ballroom 104 #205–#213 43. Genetics of Craniofacial and Wednesday, November 7 Musculoskeletal Disorders Room 124 #214–#222 10:30am–12:45 pm: Concurrent Platform Session A (11–19): 44. Tools for Phenotype Analysis Room 132 #223–#231 11. Genetics of Autism Spectrum 45. Therapy of Genetic Disorders Room 130 #232–#240 Disorders Hall D #7–#15 46. Pharmacogenetics: From Discovery 12. New Methods for Big Data Ballroom 103 #16–#24 to Implementation Room 123 #241–#249 13. Cancer Genetics I: Rare Variants Room 135 #25–#33 14. Quantitation and Measurement of Friday, November 9 Regulatory Oversight by the Cell Room 134 #34–#42 8:00am–10:15am: Concurrent Platform Session D (47–55): 15. New Loci for Obesity, Diabetes, and 47. Structural and Regulatory Genomic Related Traits Ballroom 104 #43–#51 Variation Hall D #250–#258 16. Neuromuscular Disease and 48. Neuropsychiatric Disorders Ballroom 103 #259–#267 Deafness Room 124 #52–#60 49. Common Variants, Rare Variants, 17. Chromosomes and Disease Room 132 #61–#69 and Everything in-Between Room 135 #268–#276 18. Prenatal and Perinatal Genetics Room 130 #70–#78 50. Population Genetics Genome-Wide Room 134 #277–#285 19. Vascular and Congenital Heart 51. Endless Forms Most Beautiful: Disease Room 123 #79–#87 Variant Discovery in Genomic Data Ballroom 104 #286–#294 52. -
Proquest Dissertations
Automated learning of protein involvement in pathogenesis using integrated queries Eithon Cadag A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Washington 2009 Program Authorized to Offer Degree: Department of Medical Education and Biomedical Informatics UMI Number: 3394276 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. UMI Dissertation Publishing UMI 3394276 Copyright 2010 by ProQuest LLC. All rights reserved. This edition of the work is protected against unauthorized copying under Title 17, United States Code. uest ProQuest LLC 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 48106-1346 University of Washington Graduate School This is to certify that I have examined this copy of a doctoral dissertation by Eithon Cadag and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made. Chair of the Supervisory Committee: Reading Committee: (SjLt KJ. £U*t~ Peter Tgffczy-Hornoch In presenting this dissertation in partial fulfillment of the requirements for the doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with "fair use" as prescribed in the U.S. -
Molecular Biology for Computer Scientists
CHAPTER 1 Molecular Biology for Computer Scientists Lawrence Hunter “Computers are to biology what mathematics is to physics.” — Harold Morowitz One of the major challenges for computer scientists who wish to work in the domain of molecular biology is becoming conversant with the daunting intri- cacies of existing biological knowledge and its extensive technical vocabu- lary. Questions about the origin, function, and structure of living systems have been pursued by nearly all cultures throughout history, and the work of the last two generations has been particularly fruitful. The knowledge of liv- ing systems resulting from this research is far too detailed and complex for any one human to comprehend. An entire scientific career can be based in the study of a single biomolecule. Nevertheless, in the following pages, I attempt to provide enough background for a computer scientist to understand much of the biology discussed in this book. This chapter provides the briefest of overviews; I can only begin to convey the depth, variety, complexity and stunning beauty of the universe of living things. Much of what follows is not about molecular biology per se. In order to 2ARTIFICIAL INTELLIGENCE & MOLECULAR BIOLOGY explain what the molecules are doing, it is often necessary to use concepts involving, for example, cells, embryological development, or evolution. Bi- ology is frustratingly holistic. Events at one level can effect and be affected by events at very different levels of scale or time. Digesting a survey of the basic background material is a prerequisite for understanding the significance of the molecular biology that is described elsewhere in the book. -
The ELIXIR Core Data Resources: Fundamental Infrastructure for The
Supplementary Data: The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences The “Supporting Material” referred to within this Supplementary Data can be found in the Supporting.Material.CDR.infrastructure file, DOI: 10.5281/zenodo.2625247 (https://zenodo.org/record/2625247). Figure 1. Scale of the Core Data Resources Table S1. Data from which Figure 1 is derived: Year 2013 2014 2015 2016 2017 Data entries 765881651 997794559 1726529931 1853429002 2715599247 Monthly user/IP addresses 1700660 2109586 2413724 2502617 2867265 FTEs 270 292.65 295.65 289.7 311.2 Figure 1 includes data from the following Core Data Resources: ArrayExpress, BRENDA, CATH, ChEBI, ChEMBL, EGA, ENA, Ensembl, Ensembl Genomes, EuropePMC, HPA, IntAct /MINT , InterPro, PDBe, PRIDE, SILVA, STRING, UniProt ● Note that Ensembl’s compute infrastructure physically relocated in 2016, so “Users/IP address” data are not available for that year. In this case, the 2015 numbers were rolled forward to 2016. ● Note that STRING makes only minor releases in 2014 and 2016, in that the interactions are re-computed, but the number of “Data entries” remains unchanged. The major releases that change the number of “Data entries” happened in 2013 and 2015. So, for “Data entries” , the number for 2013 was rolled forward to 2014, and the number for 2015 was rolled forward to 2016. The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences 1 Figure 2: Usage of Core Data Resources in research The following steps were taken: 1. API calls were run on open access full text articles in Europe PMC to identify articles that mention Core Data Resource by name or include specific data record accession numbers. -
Genome Informatics 4–8 September 2002, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
Comparative and Functional Genomics Comp Funct Genom 2003; 4: 509–514. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.300 Feature Meeting Highlights: Genome Informatics 4–8 September 2002, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK Jo Wixon1* and Jennifer Ashurst2 1MRC UK HGMP-RC, Hinxton, Cambridge CB10 1SB, UK 2The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK *Correspondence to: Abstract Jo Wixon, MRC UK HGMP-RC, Hinxton, Cambridge CB10 We bring you the highlights of the second Joint Cold Spring Harbor Laboratory 1SB, UK. and Wellcome Trust ‘Genome Informatics’ Conference, organized by Ewan Birney, E-mail: [email protected] Suzanna Lewis and Lincoln Stein. There were sessions on in silico data discovery, comparative genomics, annotation pipelines, functional genomics and integrative biology. The conference included a keynote address by Sydney Brenner, who was awarded the 2002 Nobel Prize in Physiology or Medicine (jointly with John Sulston and H. Robert Horvitz) a month later. Copyright 2003 John Wiley & Sons, Ltd. In silico data discovery background set was genes which had a log ratio of ∼0 in liver. Their approach found 17 of 17 known In the first of two sessions on this topic, Naoya promoters with a specificity of 17/28. None of the Hata (Cold Spring Harbor Laboratory, USA) sites they identified was located downstream of a spoke about motif searching for tissue specific TSS and all showed an excess in the foreground promoters. The first step in the process is to sample compared to the background sample. -
Pi-Starvation Induced Transcriptional Changes in Barley Revealed by a Comprehensive RNA-Seq and Degradome Analyses
Pi-starvation induced transcriptional changes in barley revealed by a comprehensive RNA-Seq and degradome analyses Pawel Sega Uniwersytet im Adama Mickiewicza w Poznaniu Wydzial Biologii https://orcid.org/0000-0001-5920- 7514 Katarzyna Kruszka Uniwersytet im Adama Mickiewicza w Poznaniu Wydzial Biologii Dawid Bielewicz Uniwersytet im Adama Mickiewicza w Poznaniu Wydzial Biologii Wojciech Karlowski Uniwersytet im Adama Mickiewicza w Poznaniu Wydzial Biologii Przemyslaw Nuc Uniwersytet im Adama Mickiewicza w Poznaniu Wydzial Biologii Zoa Szweykowska-Kulinska Uniwersytet im Adama Mickiewicza w Poznaniu Wydzial Biologii Andrzej Pacak ( [email protected] ) https://orcid.org/0000-0002-6968-9307 Research article Keywords: Phosphate regulatory network, Barley, Small RNAs, Degradome, RNA-Seq Posted Date: February 27th, 2020 DOI: https://doi.org/10.21203/rs.2.24665/v1 License: This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License Version of Record: A version of this preprint was published on March 9th, 2021. See the published version at https://doi.org/10.1186/s12864-021-07481-w. Page 1/40 Abstract Background: Small RNAs (sRNAs) are 18–24 nt regulatory elements which are responsible for plant development regulation and participate in many plant stress responses. Insucient inorganic phosphate (Pi) concentration triggers plant responses to balance the internal Pi level. Results: In this study, we describe Pi-starvation-responsive small RNAs and transcriptome changes in barley (Hordeum vulgare L.) using Next-Generation Sequencing (NGS) data derived from three different types of NGS libraries: (i) small RNAs, (ii) degraded RNAs, and (iii) functional mRNAs. We nd that differentially and signicantly expressed miRNAs (DEMs, p-value < 0.05) are represented by 162 (44.88 % of total differentially expressed small RNAs) molecules in shoot and 138 (7.14 %) in root; mainly various miR399 and miR827 isomiRs. -
What Remains to Be Discovered in the Eukaryotic Proteome?
bioRxiv preprint doi: https://doi.org/10.1101/469569; this version posted November 16, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Hidden in plain sight: What remains to be discovered in the eukaryotic proteome? Valerie Wood∗1,2, Antonia Lock3, Midori A. Harris1,2, Kim Rutherford1,2, J¨urgB¨ahler3, and Stephen G. Oliver1,2 1Cambridge Systems Biology Centre, University of Cambridge, Cambridge, UK 2Department of Biochemistry, University of Cambridge, Cambridge, UK 3Department of Genetics, Evolution and Environment, University College London, London, UK November 12, 2018 Abstract The first decade of genome sequencing stimulated an explosion in the charac- terization of unknown proteins. More recently, the pace of functional discovery has slowed, leaving around 20% of the proteins even in well-studied model or- ganisms without informative descriptions of their biological roles. Remarkably, many uncharacterized proteins are conserved from yeasts to human, suggesting that they contribute to fundamental biological processes. To fully understand biological systems in health and disease, we need to account for every part of the system. Unstudied proteins thus represent a collective blind spot that limits the progress of both basic and applied biosciences. We use a simple yet powerful metric based on Gene Ontology (GO) bio- logical process terms to define characterized and uncharacterized proteins for human, budding yeast, and fission yeast. We then identify a set of conserved but unstudied proteins in S. -
A Beginner's Guide to Eukaryotic Genome Annotation
REVIEWS STUDY DESIGNS A beginner’s guide to eukaryotic genome annotation Mark Yandell and Daniel Ence Abstract | The falling cost of genome sequencing is having a marked impact on the research community with respect to which genomes are sequenced and how and where they are annotated. Genome annotation projects have generally become small-scale affairs that are often carried out by an individual laboratory. Although annotating a eukaryotic genome assembly is now within the reach of non-experts, it remains a challenging task. Here we provide an overview of the genome annotation process and the available tools and describe some best-practice approaches. Genome annotation Sequencing costs have fallen so dramatically that a sin- with some basic UNIX skills, ‘do-it-yourself’ genome A term used to describe two gle laboratory can now afford to sequence large, even annotation projects are quite feasible using present- distinct processes. ‘Structural’ human-sized, genomes. Ironically, although sequencing day tools. Here we provide an overview of the eukary- genome annotation is the has become easy, in many ways, genome annotation has otic genome annotation process, describe the available process of identifying genes and their intron–exon become more challenging. Several factors are respon- toolsets and outline some best-practice approaches. structures. ‘Functional’ genome sible for this. First, the shorter read lengths of second- annotation is the process of generation sequencing platforms mean that current Assembly and annotation: an overview attaching meta-data such as genome assemblies rarely attain the contiguity of the Assembly. The first step towards the successful annota- gene ontology terms to classic shotgun assemblies of the Drosophila mela- tion of any genome is determining whether its assem- structural annotations. -
Ontology-Based Methods for Analyzing Life Science Data
Habilitation a` Diriger des Recherches pr´esent´ee par Olivier Dameron Ontology-based methods for analyzing life science data Soutenue publiquement le 11 janvier 2016 devant le jury compos´ede Anita Burgun Professeur, Universit´eRen´eDescartes Paris Examinatrice Marie-Dominique Devignes Charg´eede recherches CNRS, LORIA Nancy Examinatrice Michel Dumontier Associate professor, Stanford University USA Rapporteur Christine Froidevaux Professeur, Universit´eParis Sud Rapporteure Fabien Gandon Directeur de recherches, Inria Sophia-Antipolis Rapporteur Anne Siegel Directrice de recherches CNRS, IRISA Rennes Examinatrice Alexandre Termier Professeur, Universit´ede Rennes 1 Examinateur 2 Contents 1 Introduction 9 1.1 Context ......................................... 10 1.2 Challenges . 11 1.3 Summary of the contributions . 14 1.4 Organization of the manuscript . 18 2 Reasoning based on hierarchies 21 2.1 Principle......................................... 21 2.1.1 RDF for describing data . 21 2.1.2 RDFS for describing types . 24 2.1.3 RDFS entailments . 26 2.1.4 Typical uses of RDFS entailments in life science . 26 2.1.5 Synthesis . 30 2.2 Case study: integrating diseases and pathways . 31 2.2.1 Context . 31 2.2.2 Objective . 32 2.2.3 Linking pathways and diseases using GO, KO and SNOMED-CT . 32 2.2.4 Querying associated diseases and pathways . 33 2.3 Methodology: Web services composition . 39 2.3.1 Context . 39 2.3.2 Objective . 40 2.3.3 Semantic compatibility of services parameters . 40 2.3.4 Algorithm for pairing services parameters . 40 2.4 Application: ontology-based query expansion with GO2PUB . 43 2.4.1 Context . 43 2.4.2 Objective . -
Supplementary Materials
Lists of figures Figure S1: A-B: Principal Component Analysis (PCA) was applied to 3 pairs of SCEC tissues (red) and matched adjacent normal tissues (blue) that were characterized by the gene expression of all probes on Affymetrix HG U133 Plus 2.0 Array. C: Box plot of SCEC group. D: Pearson’s correlation matrix of SCEC group. 17 / 25 Figure S2: MvA plot of SCEC group. Figure S3: Volcano plots of probe sets differing between SCEC and matched normal tissues. Fold change (X axis) is plotted against statistical significance (Y axis) for each probe sets. Genes altered with a fold change ≥2 and FDR <0.01 are depicted in red. Grey represents genes in the arrays that were not found to differ significantly between cancerous samples and matched normal samples. Figure S4: Gene regulatory network plotted by the top 120 DEGs (ranked by FDR) of SCEC groups. 18 / 25 Figure S5: DNA copy number change profiles in 3 pairs of SCEC samples. The CNVs frequency of the whole genome was analyzed by aCGH. Gains were marked in red and losses in bule. Lists of tables Table S1. Primers used in qRT-PCR for microarray gene expression validation Gene Forward Primer (5’-3’) Reverse Primer (5’-3’) Product β-actin AAGGTGACAGCAGTCGGTT TGTGTGGACTTGGGAGAGG 195bp INSM1 GTATTCGCTGTGTTCATGGTC CGCTACATACATAGAGAGCAGAG 79bp ASCL1 AACTCCCATCACCTCTAACA TGAGACGAAAGACACCAACT 120bp NRCAM GATGGCGAAGAATGAAGTT ACAGTGAGGGATAAGGTGTG 141bp NUF2 ATGATGCCAGTGAACTCTGAA GACTTGTCCGTTTTGCTTTTG 160bp 19 / 25 SNAP25 CCTGGATATGGGCAATGAGAT ACACGGGTGGGCACACTTA 146bp PTP4A3 GCTTCCTCATCACCCACAA CCGTACTTCTTCAGGTCCTCA