ENA, Ensembl & Ensembl Genomes

Exploring Sequences and Browsing Genomes: ENA, Ensembl & Ensembl Genomes
EBI Resources Introductory Course, Zaragoza, Spain, 17-18 September 2013

Bert Overduin, Ph.D.
Vertebrate Genomics Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

EBI Bioinformatics Services
[Figure not preserved in the extracted text]

Outline
• European Nucleotide Archive
  Introduction
  1: Exploring an ENA record
• Ensembl & Ensembl Genomes
  Introduction
  2: Browser basics
  3: Visualising your own data
  4: Variant Effect Predictor (VEP)
  5: BioMart

Goal
To provide a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.

History
• 1980: EMBL Data Library (EMBL Heidelberg, Germany), the world’s first public database of nucleotide sequences
• 1995: EMBL-Bank (EBI Hinxton, UK)
• 2003: Trace Archive, for capillary sequencing reads
• 2008: Sequence (formerly: Short) Read Archive (SRA), for next-generation sequencing reads

INSDC
• International Nucleotide Sequence Database Collaboration
• ENA, NCBI GenBank and DNA Data Bank of Japan (DDBJ)
• Data are submitted to one of the databases
• Databases are synchronized on a daily basis
• http://www.insdc.org

Three-tiered data architecture
• EMBL-Bank
• Sequence Read Archive
• Trace Archive

Content
[Figure not preserved in the extracted text]

Submitting data
• Many journals and funders require authors to submit their sequences to an INSDC database prior to publication
• Only submit to one INSDC database (ENA, GenBank or DDBJ)
• Unique accession numbers are assigned to all submitted data
• Submitted data can be made public immediately or kept private until the associated work has been published
• Once public, data will be exchanged with NCBI and DDBJ
• Data belong to the submitter and can only be updated with submitter consent
• Preferred: Webin interactive web submission system
• Other tools are available, e.g. for genome projects and large sequencing centers
• http://www.ebi.ac.uk/ena/about/submit_and_update

Retrieving data
• ENA Browser
• Free text search: ENA homepage, EB-eye
• Sequence similarity search: ENA homepage, ENA Sequence Search
• Programmatic data access using REST URLs
• Formats: FASTA, FASTQ, flat file, HTML, XML
• Bulk data download: using FTP or Aspera
• http://www.ebi.ac.uk/ena/about/search_and_browse

Demo 1 - Exploring an ENA record
Background: [image not preserved, © Mo Hassan]
Task: Retrieve and browse the mitochondrial genome of the cave bear (Ursus spelaeus)

Help
• Data submissions, helpdesk, enquiries: [email protected]
• Updates, publication notifications: [email protected]
• EBI Train Online: http://www.ebi.ac.uk/training/online/

Acknowledgements
Guy Cochrane, Blaise Alako, Clara Amid, Ana Cerdeño-Tárraga, Iain Cleland, Richard Gibson, Neil Goodgame, Simon Kay, Rasko Leinonen, Xin Liu, Arnaud Oisel, Nima Pakseresht, Sheila Plaister, Rajesh Radhakrishnan, Kethi Reddy, Stephane Riviere, Marc Rossello, Alexander Senf, Nicole Silvester, Petra Ten Hoopen, Dmitriy Smirnov, Ana Toribio, Daniel Vaughan, Vadim Zalunin

[Slide showing a block of unannotated nucleotide sequence, beginning CTAAAGTTCTGAAAGACCTGTTGC; the extracted text ends partway through this block]
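The "Retrieving data" slide lists programmatic access using REST URLs and FASTA among the available formats. The sketch below shows roughly how that could look in Python. It is not taken from the slides: the base URL is an assumption based on the present-day ENA Browser API pattern (format, then accession), which differs from the 2013-era addresses on the slides, and the helper names and the accession AB123456 are hypothetical placeholders.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: base address of the ENA Browser API (not stated in the slides).
ENA_API_BASE = "https://www.ebi.ac.uk/ena/browser/api"

# Subset of record formats this sketch knows how to request (assumed path names).
SUPPORTED_FORMATS = {"fasta", "embl", "xml"}


def build_ena_url(accession, fmt="fasta"):
    """Build a REST URL for one record: <base>/<format>/<accession>."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Percent-encode the accession so it is safe to place in a URL path.
    return f"{ENA_API_BASE}/{fmt}/{quote(accession, safe='')}"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_fasta(accession):
    """Download one record as FASTA and parse it (requires network access)."""
    with urlopen(build_ena_url(accession, "fasta")) as response:
        return parse_fasta(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "AB123456" is a hypothetical placeholder, not a real record reference.
    print(build_ena_url("AB123456"))
    print(parse_fasta(">demo record\nACGT\nTTGA\n"))
```

For the Demo 1 task, one would first look up the accession of the cave bear mitochondrial genome in the ENA Browser and then pass it to fetch_fasta. URL building and parsing are kept separate from the download so that they can be checked without a network connection.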