EMBL-EBI Powerpoint Presentation


EBI Patent Sequence Services
PIUG 2012 Biotechnology Workshop • February 6th, 2012 • Boston
Jennifer McDowall
EBI is an Outstation of the European Molecular Biology Laboratory.

Overview
1) Know the data
   European Nucleotide Archive
   UniProt
   Non-redundant patent sequence DB
2) The Toolbox
   EBI search
   SRS advanced text search
   Sequence searching

Sequence Searching Tools

1) Know the Data

Know the data
• Many databases, each getting bigger
• Efficient searching requires knowledge of what data is stored in a database
   Don't assume annotation can be transferred because of a good match
• Databases can contain errors
• Data can change
   Deletions, sequence modifications
   Daily updates, identifier changes…

Major sequence databases
European Nucleotide Archive
• >170 million sequences (~42 million non-redundant)
• release every 3 months, daily updates
UniProt
• >30.1 million non-redundant sequences
• monthly release, daily updates

Additional sequence data
Specialized databases
• Immunoglobulins: IMGT/HLA, IMGT/LIGM
• Immunopolymorphisms: IPD-KIR, IPD-MHC
• Variation: HGVBase, dbSNP
• Alternative splicing: ASTD
• Completed genomes: Ensembl, Integr8
• Structure: PDB, Structural Genomics targets
• Interaction: IntAct

Patent Sequences
Patent sequences can be found in
the following databases:
ENA
• Patent nucleotides
UniProt Archive
• Patent proteins
NR patent sequences
• Patent nucleotides and proteins

Which database do you use? Let's take a look…
European nucleotide archive
UniProt
Non-redundant patent sequence databases

Primary sequence databases
Primary data submitted to databases
INSDC
• GenBank (U.S.A.)
• DDBJ (Japan)
• ENA (Europe)
+ SRA
INSDC agreement:
• Free unrestricted access
• All data exchanged daily
How do they differ?
• organization of data
• tools and database links

ENA has a 3-tiered structure
1) EMBL-Bank
2) Sequence Read Archive (Next Gen sequencing)
3) Trace Archive (Capillary sequencing)
Information held: feature annotation, assembly information, sequencing & sampling information
http://www.ebi.ac.uk/ena/

How is the data organised?
Data in EMBL-Bank is divided in 2 ways:
1) Data classes
• Type of data or methodology used to obtain data
• Each entry belongs to one data class
2) Taxonomic Divisions
• Each entry belongs to one taxonomic division

EMBL-Bank data classes
CON   Constructed from sequence assemblies
EST   Expressed Sequence Tag (cDNA)
GSS   Genome Survey Sequence (high-throughput short sequence)
HTC   High-Throughput cDNA (unfinished)
HTG   High-Throughput Genome sequencing (unfinished)
MGA   Mass Genome Annotation
PAT   Patent sequences
SRA   Sequence Read Archive (both databank and data class)
STS   Sequence Tagged Site (short unique genomic sequences)
STD   Standard (high quality annotated sequence)
TSA   Transcriptome Shotgun Assembly (computational assembly)
WGS   Whole Genome Shotgun

EMBL-Bank data classes
Data is always changing
• Assembly of sequences into larger fragments
• Suppression of obsolete entries (i.e. once assembled)
• Sequence modifications
• Daily updates
• Identifier changes
• Corrections (databases can contain errors)
• etc…

EMBL-Bank data classes
Data assembly can affect entries
Example:
WGS (Shotgun)
• Fragments in separate entries
• Join to make new CON entries
CON (Constructed)
• Old WGS entries archived
• Join into large STD entry (e.g.
  completed genome)
• Add annotation
STD (Standard)
• Old CON entries archived

ENA taxonomy
All INSDC databases use NCBI taxonomy
Only sequenced organisms are represented
Divisions
HUM   Human
MUS   Mouse
ROD   Rodent
MAM   Mammal
VRT   Vertebrate
FUN   Fungi
INV   Invertebrate
PLN   Plant
PRO   Prokaryote
PHG   Phage
VIR   Viral
Other:
ENV   Environmental
SYN   Synthetic
TGN   Transgenic
UNC   Unclassified

ENA taxonomy
Some species EXCLUDED from certain taxonomic ranges
ROD   Rodent excludes mouse
MAM   Mammal excludes human, mouse, rodent
VRT   Vertebrate excludes human, mouse, rodent, mammal
Applies to ftp files and sequence search tools, but not to the ENA browser

ENA taxonomy
Sometimes there is no taxonomic data
Environmental
• Genus species = 'uncultivated bacterium' or 'unspecified'
Synthetic
• Genus species = 'synthetic construct'
Transgenic
• Taxonomy for recipient and donor organisms
Patent
• Exempt from requiring Genus species

Database structure
EMBL-Bank: ENA Database

Database structure
EMBL-Bank: Data classes
1st: Data split into classes

Database structure
EMBL-Bank: Data classes
Taxonomic Divisions: HUM, MUS, ROD, MAM, VRT, FUN, INV, ...
Reduces search set
1st: Data split into classes
2nd: Data split into intersecting slices by taxonomy

Database structure
EMBL-Bank: Data classes
'Mouse' + 'EST' intersection
Taxonomic Divisions: HUM, MUS, ROD, MAM, VRT, FUN, INV, ...
Reduces search set

European Nucleotide Archive
ENA is accessible from the EBI homepage
http://www.ebi.ac.uk/

ENA homepage
• Text search
• Sequence search
• Programmatic access
http://www.ebi.ac.uk/ena

Patent sequence record in EMBL-Bank
[Annotated screenshot; callouts:]
• Sequence version
• Download data
• Dates (first public and last updated)
• Navigate to related data, e.g. Version archive
• Graphical viewer
• DNA source
• Navigate to external data sources, e.g.
  UniProt
• Patent reference
• Sequence

Non-patent entry in EMBL-Bank
[Annotated screenshot; callouts:]
• General information
• More detailed graphical view
• Additional information
• Genome annotation
• Assembly information

ENA graphical viewer
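
Returning to the 'ENA taxonomy' and 'Database structure' slides: the exclusion rules (ROD excludes mouse, MAM excludes human, mouse and rodent, and so on) and the 'Mouse' + 'EST' intersection can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The function names, the `Entry` record and the accession strings are hypothetical and are not part of any ENA tool or API.

```python
from dataclasses import dataclass

# Narrowest division first. Per the slides, ROD excludes mouse, MAM excludes
# human/mouse/rodent, and VRT excludes human/mouse/rodent/mammal.
VERTEBRATE_DIVISIONS = ("HUM", "MUS", "ROD", "MAM", "VRT")


def assign_division(groups):
    """Return the narrowest vertebrate division an organism falls into.

    `groups` is the set of division-level groups the organism belongs to
    (hypothetical input; a real lookup would come from NCBI taxonomy).
    Returns None if none of the five vertebrate divisions applies.
    """
    for division in VERTEBRATE_DIVISIONS:
        if division in groups:
            return division
    return None


@dataclass(frozen=True)
class Entry:
    accession: str   # made-up identifiers, for illustration only
    data_class: str  # e.g. "EST", "PAT", "STD"
    division: str    # e.g. "MUS", "HUM", "ROD"


def select(entries, data_class, division):
    """Keep entries that sit in one data class AND one taxonomic division."""
    return [e for e in entries
            if e.data_class == data_class and e.division == division]


entries = [
    Entry("X0001", "EST", assign_division({"VRT", "MAM", "ROD", "MUS"})),  # mouse
    Entry("X0002", "EST", assign_division({"VRT", "MAM", "ROD"})),         # rat
    Entry("X0003", "PAT", assign_division({"VRT", "MAM", "ROD", "MUS"})),  # mouse
    Entry("X0004", "EST", assign_division({"VRT", "MAM", "HUM"})),         # human
]

# The 'Mouse' + 'EST' intersection from the slide.
mouse_ests = select(entries, "EST", "MUS")
print([e.accession for e in mouse_ests])
```

In this sketch a search restricted to ROD would not return the mouse entries, which mirrors the slide's point that the exclusions apply to ftp files and sequence search tools.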