Genome Resources at the EBI - Ensembl and Ensembl Genomes
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Ensembl Genomes: Extending Ensembl Across the Taxonomic Space P
Published online 1 November 2009 Nucleic Acids Research, 2010, Vol. 38, Database issue D563–D569 doi:10.1093/nar/gkp871 Ensembl Genomes: Extending Ensembl across the taxonomic space P. J. Kersey*, D. Lawson, E. Birney, P. S. Derwent, M. Haimel, J. Herrero, S. Keenan, A. Kerhornou, G. Koscielny, A. Ka¨ ha¨ ri, R. J. Kinsella, E. Kulesha, U. Maheswari, K. Megy, M. Nuhn, G. Proctor, D. Staines, F. Valentin, A. J. Vilella and A. Yates EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK Received August 14, 2009; Revised September 28, 2009; Accepted September 29, 2009 ABSTRACT nucleotide archives; numerous other genomes exist in states of partial assembly and annotation; thousands of Ensembl Genomes (http://www.ensemblgenomes viral genomes sequences have also been generated. .org) is a new portal offering integrated access to Moreover, the increasing use of high-throughput genome-scale data from non-vertebrate species sequencing technologies is rapidly reducing the cost of of scientific interest, developed using the Ensembl genome sequencing, leading to an accelerating rate of genome annotation and visualisation platform. data production. This not only makes it likely that in Ensembl Genomes consists of five sub-portals (for the near future, the genomes of all species of scientific bacteria, protists, fungi, plants and invertebrate interest will be sequenced; but also the genomes of many metazoa) designed to complement the availability individuals, with the possibility of providing accurate and of vertebrate genomes in Ensembl. Many of the sophisticated annotation through the similarly low-cost databases supporting the portal have been built in application of functional assays. -
Abstracts Genome 10K & Genome Science 29 Aug - 1 Sept 2017 Norwich Research Park, Norwich, Uk
Genome 10K c ABSTRACTS GENOME 10K & GENOME SCIENCE 29 AUG - 1 SEPT 2017 NORWICH RESEARCH PARK, NORWICH, UK Genome 10K c 48 KEYNOTE SPEAKERS ............................................................................................................................... 1 Dr Adam Phillippy: Towards the gapless assembly of complete vertebrate genomes .................... 1 Prof Kathy Belov: Saving the Tasmanian devil from extinction ......................................................... 1 Prof Peter Holland: Homeobox genes and animal evolution: from duplication to divergence ........ 2 Dr Hilary Burton: Genomics in healthcare: the challenges of complexity .......................................... 2 INVITED SPEAKERS ................................................................................................................................. 3 Vertebrate Genomics ........................................................................................................................... 3 Alex Cagan: Comparative genomics of animal domestication .......................................................... 3 Plant Genomics .................................................................................................................................... 4 Ksenia Krasileva: Evolution of plant Immune receptors ..................................................................... 4 Andrea Harper: Using Associative Transcriptomics to predict tolerance to ash dieback disease in European ash trees ............................................................................................................ -
A Comprehensive Workflow for Variant Calling Pipeline Comparison and Analysis Using R Programming
www.ijcrt.org © 2020 IJCRT | Volume 8, Issue 8 August 2020 | ISSN: 2320-2882 A COMPREHENSIVE WORKFLOW FOR VARIANT CALLING PIPELINE COMPARISON AND ANALYSIS USING R PROGRAMMING 1Mansi Ujjainwal, 2Preeti Chaudhary 1MSc Bioinformatics, 2Mtech Bioinformatics, 1Amity Institute of Biotechnology 1Amity University, Noida, India Abstract: The aim of the article is to provide variant calling workflow and analysis protocol for comparing results of the two using two variant calling platforms. Variant calling pipelines used here are predominantly used for calling variants in human whole exome data and whole genome data. The result of a variant calling pipeline is a set of variants( SNPS, insertions, deletions etc) present in the sequencing data. Each pipeline is capable of calling its certain intersecting and certain unique variants. The intersecting and unique variants can further be distinguished on the basis of their reference SNP ID and grouped on the basis of its annotation. The number of variants called can be humongous depending upon the size and complexity of the data. R programming packages and ubuntu command shell can be used to differentiate and analyse the variants called by each type of pipeline. Index Terms – Whole Exome Sequencing, Variant Calling, Sequencing data, R programming I. INTRODUCTION The Human Genome Project started in 1990, makes up the single most significant project in the field of biomedical sciences and biology. The project was set out to change how we see biology and medicine. The project was set out to sequence complete genome of Homo sapiens as well as several microorganisms including Escherichia coli, Saccharomyces cerevisiae, and metazoans such as Caenorhabtidis elegans. -
The ELIXIR Core Data Resources: Fundamental Infrastructure for The
Supplementary Data: The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences The “Supporting Material” referred to within this Supplementary Data can be found in the Supporting.Material.CDR.infrastructure file, DOI: 10.5281/zenodo.2625247 (https://zenodo.org/record/2625247). Figure 1. Scale of the Core Data Resources Table S1. Data from which Figure 1 is derived: Year 2013 2014 2015 2016 2017 Data entries 765881651 997794559 1726529931 1853429002 2715599247 Monthly user/IP addresses 1700660 2109586 2413724 2502617 2867265 FTEs 270 292.65 295.65 289.7 311.2 Figure 1 includes data from the following Core Data Resources: ArrayExpress, BRENDA, CATH, ChEBI, ChEMBL, EGA, ENA, Ensembl, Ensembl Genomes, EuropePMC, HPA, IntAct /MINT , InterPro, PDBe, PRIDE, SILVA, STRING, UniProt ● Note that Ensembl’s compute infrastructure physically relocated in 2016, so “Users/IP address” data are not available for that year. In this case, the 2015 numbers were rolled forward to 2016. ● Note that STRING makes only minor releases in 2014 and 2016, in that the interactions are re-computed, but the number of “Data entries” remains unchanged. The major releases that change the number of “Data entries” happened in 2013 and 2015. So, for “Data entries” , the number for 2013 was rolled forward to 2014, and the number for 2015 was rolled forward to 2016. The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences 1 Figure 2: Usage of Core Data Resources in research The following steps were taken: 1. API calls were run on open access full text articles in Europe PMC to identify articles that mention Core Data Resource by name or include specific data record accession numbers. -
A Beginner's Guide to Eukaryotic Genome Annotation
REVIEWS STUDY DESIGNS A beginner’s guide to eukaryotic genome annotation Mark Yandell and Daniel Ence Abstract | The falling cost of genome sequencing is having a marked impact on the research community with respect to which genomes are sequenced and how and where they are annotated. Genome annotation projects have generally become small-scale affairs that are often carried out by an individual laboratory. Although annotating a eukaryotic genome assembly is now within the reach of non-experts, it remains a challenging task. Here we provide an overview of the genome annotation process and the available tools and describe some best-practice approaches. Genome annotation Sequencing costs have fallen so dramatically that a sin- with some basic UNIX skills, ‘do-it-yourself’ genome A term used to describe two gle laboratory can now afford to sequence large, even annotation projects are quite feasible using present- distinct processes. ‘Structural’ human-sized, genomes. Ironically, although sequencing day tools. Here we provide an overview of the eukary- genome annotation is the has become easy, in many ways, genome annotation has otic genome annotation process, describe the available process of identifying genes and their intron–exon become more challenging. Several factors are respon- toolsets and outline some best-practice approaches. structures. ‘Functional’ genome sible for this. First, the shorter read lengths of second- annotation is the process of generation sequencing platforms mean that current Assembly and annotation: an overview attaching meta-data such as genome assemblies rarely attain the contiguity of the Assembly. The first step towards the successful annota- gene ontology terms to classic shotgun assemblies of the Drosophila mela- tion of any genome is determining whether its assem- structural annotations. -
Property Graph Vs RDF Triple Store: a Comparison on Glycan Substructure Search
RESEARCH ARTICLE Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search Davide Alocci1,2, Julien Mariethoz1, Oliver Horlacher1,2, Jerven T. Bolleman3, Matthew P. Campbell4, Frederique Lisacek1,2* 1 Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland, 2 Computer Science Department, University of Geneva, Geneva, 1227, Switzerland, 3 Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland, 4 Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia * [email protected] Abstract Resource description framework (RDF) and Property Graph databases are emerging tech- nologies that are used for storing graph-structured data. We compare these technologies OPEN ACCESS through a molecular biology use case: glycan substructure search. Glycans are branched Citation: Alocci D, Mariethoz J, Horlacher O, tree-like molecules composed of building blocks linked together by chemical bonds. The Bolleman JT, Campbell MP, Lisacek F (2015) molecular structure of a glycan can be encoded into a direct acyclic graph where each node Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search. PLoS ONE 10(12): represents a building block and each edge serves as a chemical linkage between two build- e0144578. doi:10.1371/journal.pone.0144578 ing blocks. In this context, Graph databases are possible software solutions for storing gly- Editor: Manuela Helmer-Citterich, University of can structures and Graph query languages, such as SPARQL and Cypher, can be used to Rome Tor Vergata, ITALY perform a substructure search. Glycan substructure searching is an important feature for Received: July 16, 2015 querying structure and experimental glycan databases and retrieving biologically meaning- ful data. -
Genomic Alignment (Mapping) and SNP / Polymorphism Calling
GenomicGenomic alignmentalignment (mapping)(mapping) andand SNPSNP // polymorphismpolymorphism callingcalling Jérôme Mariette & Christophe Klopp http://bioinfo.genotoul.fr/ Bioinfo Genotoul platform – Since 2008 ● 1 Roche 454 ● 1 MiSeq ● 2 HiSeq – Providing ● Data processing for quality control ● Secure data access to end users http://bioinfo.genotoul.fr/ http://ng6.toulouse.inra.fr/ 2 Bioinfo Genotoul : Services – High speed computing facility access – Application and web-server hosting – Training – Support – Project partnership 3 Genetic variation http://en.wikipedia.org/wiki/Genetic_variation Genetic variation, variations in alleles of genes, occurs both within and in populations. Genetic variation is important because it provides the “raw material” for natural selection. http://studentreader.com/genotypes-phenotypes/ 4 Types of variations ● SNP : Single nucleotide polymorphism ● CNV : copy number variation ● Chromosomal rearrangement ● Chromosomal duplication http://en.wikipedia.org/wiki/Copy-number_variation http://en.wikipedia.org/wiki/Human_genetic_variation 5 The variation transmission ● Mutation : In molecular biology and genetics, mutations are changes in a genomic sequence: the DNA sequence of a cell's genome or the DNA or RNA sequence of a virus (http://en.wikipedia.org/wiki/Mutation). ● Mutations are transmitted if they are not lethal. ● Mutations can impact the phenotype. 6 Genetic markers and genotyping ● A set of SNPs is selected along the genome. ● The phenotypes are collected for individuals. ● The SNPs are genotyped -
Downloaded from the National Center for Cide Resistance Mechanisms
Zhou et al. Parasites & Vectors (2018) 11:32 DOI 10.1186/s13071-017-2584-8 RESEARCH Open Access ASGDB: a specialised genomic resource for interpreting Anopheles sinensis insecticide resistance Dan Zhou, Yang Xu, Cheng Zhang, Meng-Xue Hu, Yun Huang, Yan Sun, Lei Ma, Bo Shen* and Chang-Liang Zhu Abstract Background: Anopheles sinensis is an important malaria vector in Southeast Asia. The widespread emergence of insecticide resistance in this mosquito species poses a serious threat to the efficacy of malaria control measures, particularly in China. Recently, the whole-genome sequencing and de novo assembly of An. sinensis (China strain) has been finished. A series of insecticide-resistant studies in An. sinensis have also been reported. There is a growing need to integrate these valuable data to provide a comprehensive database for further studies on insecticide-resistant management of An. sinensis. Results: A bioinformatics database named An. sinensis genome database (ASGDB) was built. In addition to being a searchable database of published An. sinensis genome sequences and annotation, ASGDB provides in-depth analytical platforms for further understanding of the genomic and genetic data, including visualization of genomic data, orthologous relationship analysis, GO analysis, pathway analysis, expression analysis and resistance-related gene analysis. Moreover, ASGDB provides a panoramic view of insecticide resistance studies in An. sinensis in China. In total, 551 insecticide-resistant phenotypic and genotypic reports on An. sinensis distributed in Chinese malaria- endemic areas since the mid-1980s have been collected, manually edited in the same format and integrated into OpenLayers map-based interface, which allows the international community to assess and exploit the high volume of scattered data much easier. -
Genetics 211 - 2018 Lecture 3
Genetics 211 - 2018 Lecture 3 High Throughput Sequencing Pt II Gavin Sherlock [email protected] January 23rd 2018 Long “Synthetic Reads” aka Moleculo Genomic DNA Fragment Size Select (10kb) Polish, ligate amplification adaptors ~10 kb DNA Dilute to 500 molecules per well Amplify, fragment, add sequencing adaptors Pool Sequence Separate, based on well barcode Remove barcodes, assemble 10kb fragments Assemble genome from 10kb fragments Synthetic Read Characteristics 10x Genomics • Similar in concept to CPT-Seq from last week’s paper • Idea is to uniquely barcode reads that derive from a long molecule - ~50-100kb • 10x Chromium system automates much of the process for you ~10 HMW gDNA molecules per GEM 10x Barcoded Beads HMW gDNA Oil Benefits of 10x • Correct placement in difficult to align regions: Paralog A Paralog B Benefits of 10x • Correct placement in difficult to align regions: Paralog A Paralog B Benefits of 10x • Correct placement in difficult to align regions: Paralog A Paralog B Benefits of 10x • Correct placement in difficult to align regions: Paralog A Paralog B Benefits of 10x • Correct placement in difficult to align regions: Paralog A Paralog B Benefits of 10x • Haplotype phasing: Benefits of 10x • Haplotype phasing: Benefits of 10x • Haplotype phasing: Benefits of 10x • Haplotype phasing: Using Hi-C data to aid assemblies • Hi-C is a proximity ligation method, aimed at reconstructing the 3 dimensional structure of a genome • Originally developed with the idea of looking at how the genome of an organism for which a good reference exists is physically organized • But, probability of intrachromosomal contacts is much higher than that of interchromosomal contacts. -
Annual Scientific Report 2013 on the Cover Structure 3Fof in the Protein Data Bank, Determined by Laponogov, I
EMBL-European Bioinformatics Institute Annual Scientific Report 2013 On the cover Structure 3fof in the Protein Data Bank, determined by Laponogov, I. et al. (2009) Structural insight into the quinolone-DNA cleavage complex of type IIA topoisomerases. Nature Structural & Molecular Biology 16, 667-669. © 2014 European Molecular Biology Laboratory This publication was produced by the External Relations team at the European Bioinformatics Institute (EMBL-EBI) A digital version of the brochure can be found at www.ebi.ac.uk/about/brochures For more information about EMBL-EBI please contact: [email protected] Contents Introduction & overview 3 Services 8 Genes, genomes and variation 8 Molecular atlas 12 Proteins and protein families 14 Molecular and cellular structures 18 Chemical biology 20 Molecular systems 22 Cross-domain tools and resources 24 Research 26 Support 32 ELIXIR 36 Facts and figures 38 Funding & resource allocation 38 Growth of core resources 40 Collaborations 42 Our staff in 2013 44 Scientific advisory committees 46 Major database collaborations 50 Publications 52 Organisation of EMBL-EBI leadership 61 2013 EMBL-EBI Annual Scientific Report 1 Foreword Welcome to EMBL-EBI’s 2013 Annual Scientific Report. Here we look back on our major achievements during the year, reflecting on the delivery of our world-class services, research, training, industry collaboration and European coordination of life-science data. The past year has been one full of exciting changes, both scientifically and organisationally. We unveiled a new website that helps users explore our resources more seamlessly, saw the publication of ground-breaking work in data storage and synthetic biology, joined the global alliance for global health, built important new relationships with our partners in industry and celebrated the launch of ELIXIR. -
An Integrated Mosquito Small RNA Genomics Resource Reveals
bioRxiv preprint doi: https://doi.org/10.1101/2020.04.25.061598; this version posted April 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 1 2 3 An integrated mosquito small RNA genomics resource reveals 4 dynamic evolution and host responses to viruses and transposons. 5 6 7 Qicheng Ma1† Satyam P. Srivastav1†, Stephanie Gamez2†, Fabiana Feitosa-Suntheimer3, 8 Edward I. Patterson4, Rebecca M. Johnson5, Erik R. Matson1, Alexander S. Gold3, Douglas E. 9 Brackney6, John H. Connor3, Tonya M. Colpitts3, Grant L. Hughes4, Jason L. Rasgon5, Tony 10 Nolan4, Omar S. Akbari2, and Nelson C. Lau1,7* 11 1. Boston University School of Medicine, Department of Biochemistry 12 2. University of California San Diego, Division of Biological Sciences, Section of Cell and 13 Developmental Biology, La Jolla, CA 92093-0335, USA. 14 3. Boston University School of Medicine, Department of Microbiology and the National 15 Emerging Infectious Disease Laboratory 16 4. Departments of Vector Biology and Tropical Disease Biology, Centre for Neglected Tropical 17 Diseases, Liverpool School of Tropical Medicine, Liverpool L3 5QA, UK 18 5. Pennsylvania State University, Department of Entomology, Center for Infectious Disease 19 Dynamics, and the Huck Institutes for the Life Sciences 20 6. Department of Environmental Sciences, The Connecticut Agricultural Experiment Station 21 7. Boston University Genome Science Institute 22 23 * Corresponding author: NCL: [email protected] 24 † These authors contributed equally to this study. 25 26 27 28 Running title: Mosquito small RNA genomics 29 30 Keywords: mosquitoes, small RNAs, piRNAs, viruses, transposons microRNAs, siRNAs 31 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.04.25.061598; this version posted April 27, 2020. -
Supplementary Material
SUPPLEMENTARY MATERIAL Transcriptomics supports local sensory regulation in the antennae of the kissing bug Rhodnius prolixus Jose Manuel Latorre-Estivalis; Marcos Sterkel; Sheila Ons; and Marcelo Gustavo Lorenzo DATABASES Database S1 – Protein sequences of all target genes in fasta format. Database S2 – Edited Generic Feature Format (GFF) file of the R. prolixus genome used for read mapping and gene expression analysis. Database S3 - FPKM values of target genes in the three libraries. Database S4 – Fasta sequences from different insects used in the CT/DH – CRF/DH and nuclear receptor phylogenetic analyses. FIGURES Figure S1 - Molecular phylogenetic analyses of calcitonin diuretic (CT) and corticotropin-releasing factor- related (CRF) like diuretic hormone (DH) receptors of R. prolixus and other insects. The evolutionary history of R. prolixus CT/DH and CRF/DH receptors was inferred by using the Maximum Likelihood method in PhyML v3.0. The support values on the bipartitions correspond to SH-like P values, which were calculated by means of aLRT SH-like test. The CT/DH receptor 3 clade was highlighted in red. The CT/DH and CRF/DH R. prolixus receptors were displayed in blue. The LG substitution amino-acid model was used. Species abbreviations: Dmel, Drosophila melanogaster; Aaeg, Aedes aegytpi; Agam, Anopheles gambiae; Clec, Cimex lecturiaus; Hhal, Halomorpha halys; Rpro, Rhodnius prolixus; Amel, Apis mellifera, Acyrthosiphon pisum; and Tcas, Tribolium castaneum. The glutamate receptor sequence from the D. melanogaster (FlyBase Acc. N° GC11144) was used as an out-group. The sequences used from other insects are in Supplementary Database S4). Figure S2 – Molecular phylogenetic analysis of nuclear receptor genes of R.