What's That Gene

Total Page:16

File Type:pdf, Size:1020Kb

What's That Gene What’s that gene (or protein)? Online resources for exploring functions of genes, transcripts, and proteins James Hutchins To cite this version: James Hutchins. What’s that gene (or protein)? Online resources for exploring functions of genes, transcripts, and proteins. Molecular Biology of the Cell, American Society for Cell Biology, 2014, 25 (8), pp.1187-1201. 10.1091/mbc.E13-10-0602. hal-02354657 HAL Id: hal-02354657 https://hal.archives-ouvertes.fr/hal-02354657 Submitted on 7 Nov 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. M BoC | TECHNICAL PERSPECTIVE What’s that gene (or protein)? Online resources for exploring functions of genes, transcripts, and proteins James R. A. Hutchins Institute of Human Genetics, Centre National de la Recherche Scientifique (CNRS), 34396 Montpellier, France ABSTRACT The genomic era has enabled research projects that use approaches including Monitoring Editor genome-scale screens, microarray analysis, next-generation sequencing, and mass spectrom- Doug Kellogg etry–based proteomics to discover genes and proteins involved in biological processes. Such University of California, Santa Cruz methods generate data sets of gene, transcript, or protein hits that researchers wish to ex- plore to understand their properties and functions and thus their possible roles in biological Received: Oct 21, 2013 systems of interest. Recent years have seen a profusion of Internet-based resources to aid this Revised: Feb 13, 2014 process. This review takes the viewpoint of the curious biologist wishing to explore the prop- Accepted: Feb 14, 2014 erties of protein-coding genes and their products, identified using genome-based technolo- gies. Ten key questions are asked about each hit, addressing functions, phenotypes, expres- sion, evolutionary conservation, disease association, protein structure, interactors, posttranslational modifications, and inhibitors. Answers are provided by presenting the latest publicly available resources, together with methods for hit-specific and data set–wide infor- mation retrieval, suited to any genome-based analytical technique and experimental species. The utility of these resources is demonstrated for 20 factors regulating cell proliferation. Re- sults obtained using some of these are discussed in more depth using the p53 tumor suppres- sor as an example. This flexible and universally applicable approach for characterizing ex- perimental hits helps researchers to maximize the potential of their projects for biological discovery. DOI: 10.1091/mbc.E13-10-0602 INTRODUCTION Address correspondence to: James R. A. Hutchins ([email protected]). The past decade has witnessed huge advances in the power and Periodic updates to the Supplemental Materials for this article will be made avail- able on the author’s website: http://www.jrahutchins.net. scope of analytical technologies based on genomic data. These The author declares that no conflict of interest exists. methods, which include the functional identification of genes using Abbreviations used: CDD, Conserved Domain Database; CNRS, Centre National traditional genetic and RNA interference (RNAi)-based knockdown de la Recherche Scientifique; DDBJ, DNA Data Bank of Japan; EBI, European screens (Forsburg, 2001; Boutros and Ahringer, 2008), the identifica- Bioinformatics Institute; ELM, eukaryotic linear motif; ENA, European Nucleotide Archive; GDSC, Genomics of Drug Sensitivity in Cancer; GEO, Gene Expression tion of DNA and RNA populations by microarray analysis or next- Omnibus; GI, GenInfo Identifier; GO, Gene Ontology; GSV, genomic structural generation sequencing (Capaldi, 2010; Niedringhaus et al., 2011; variation; ID, identifier code; IMEx, International Molecular Exchange; IPI, Interna- Ozsolak and Milos, 2011), and the identification of proteins, com- tional Protein Index; KEGG, Kyoto Encyclopedia of Genes and Genomes; NCBI, National Center for Biotechnology Information; nr, nonredundant; OMIM, Online plexes, and their modifications using mass spectrometry–based Mendelian Inheritance in Man; PDB, Protein Data Bank; PTM, posttranslational proteomics (Walther and Mann, 2010), have transformed biological modification; RCSB, Research Collaboratory for Structural Bioinformatics; Ro5, Rule of Five; SLiM, short linear motif; SNP, single-nucleotide polymorphism. research. Many researchers can now turn to these techniques to ad- © 2014 Hutchins. This article is distributed by The American Society for Cell Biol- dress specific biological questions or, by performing larger-scale or ogy under license from the author(s). Two months after publication it is available high-throughput experiments, discover genes, transcripts, and pro- to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0). teins involved in their system of interest. “ASCB®,” “The American Society for Cell Biology®,” and “Molecular Biology of Although these technologies differ greatly in their principles the Cell®” are registered trademarks of The American Society of Cell Biology. and mechanistics, their general approaches share a similar form. A Supplemental Material can be found at: Volume 25 April 15, 2014 http://www.molbiolcell.org/content/suppl/2014/04/03/25.8.1187.DC1.html 1187 tionally, resulting in the identification of Biological multiple gene, transcript, or protein hits, material that is, entries from public sequence data- Isolation of DNA, RNA or proteins. bases, each described using a unique iden- tifier code (ID; in some cases known as an accession code), plus other data pertinent Sample quality control Sample for analysis to the experiment in question, such as a (yield, purity) Sample processing confidence score or intensity measurement. The resulting hits table may typically un- Analysis of processed sample: data acquisition dergo further bioinformatic analyses, includ- (e.g. sequencing, microarray analysis, or mass spectrometry). ing statistical validation, ranking, and, in Raw data file(s) some cases, identification and removal of known contaminant entries. Public databases Unfortunately, this hits table is often nucleic acid or Data analysis: identification of “hits” (genes, protein sequences transcripts, or proteins), with confidence scores. where platform-driven analysis stops, leav- ing the research biologist with a list of often Bioinformatic analysis: statistical validation, ranking, unfamiliar gene, transcript, or protein contaminant filtering. Sequence & Functions, pathways names, abbreviations, and IDs, of which he genomic origins Results table & systems or she has the task of making sense. Re- Hits (genes, transcripts or proteins), searchers faced with such a list will naturally each with a unique identifier code be curious to find out more about each of (ID) and analysis-specific data. the hits, to determine whether they are in- Question 1 Question 2 Supplemental Tables S1, S2 & S3 Supplemental Tables S4, S5, S6 & S7 teresting and worthy of investing time and resources for follow-up studies. They may wish to know whether the hit was previously Homologues Gene expression & evolution reported to have an involvement in their bi- ological system of interest or whether it is novel and what is known about its functions, structure, interactions, and so on. Question 3 Question 4 Fortunately, help is at hand. The past de- Supplemental Table S8 Supplemental Table S9 cade has also seen the emergence of a plethora of high-quality database resources Protein structure Protein-protein Post-translational providing information about the functions of interactions modifications genes, transcripts, and proteins for many or- ganisms. These provide multiple gateways for the biologist, allowing access to informa- Question 5 Question 6 Question 7 tion relating to nucleotide or amino acid se- Supplemental Tables S10 & S11 Supplemental Table S12 Supplemental Table S13 quences, genomic origins, evolutionary conservation, expression in cells and tissues, and association with disease processes. Fur- Genetic variants & Drugs & inhibitors Summary & overview ther protein-related information centers on disease association enzymatic or other functions, biological pro- cesses in which they are involved, domain and three-dimensional structures, interac- Question 8 Question 9 Question 10 tion partners, posttranslational modifica- Supplemental Tables S14 & S15 Supplemental Table S16 Supplemental Tables S17 & S18 tions, and the possibility of modulating their activities using small-molecule inhibitors. FIGURE 1: Generalized workflow for the analysis of DNA, RNA, or protein samples and Database resources providing such in- questions about the hits identified. Nucleic acid or protein samples isolated from the biological formation are necessarily based on gene, material of interest are processed, then analyzed by various methods. Raw analytical data are (less commonly) transcript, or protein IDs. then matched to entries in public databases, generating a results table listing the genes, Analytical
Recommended publications
  • The ELIXIR Core Data Resources: ​Fundamental Infrastructure for The
    Supplementary Data: The ELIXIR Core Data Resources: fundamental infrastructure ​ for the life sciences The “Supporting Material” referred to within this Supplementary Data can be found in the Supporting.Material.CDR.infrastructure file, DOI: 10.5281/zenodo.2625247 (https://zenodo.org/record/2625247). ​ ​ Figure 1. Scale of the Core Data Resources Table S1. Data from which Figure 1 is derived: Year 2013 2014 2015 2016 2017 Data entries 765881651 997794559 1726529931 1853429002 2715599247 Monthly user/IP addresses 1700660 2109586 2413724 2502617 2867265 FTEs 270 292.65 295.65 289.7 311.2 Figure 1 includes data from the following Core Data Resources: ArrayExpress, BRENDA, CATH, ChEBI, ChEMBL, EGA, ENA, Ensembl, Ensembl Genomes, EuropePMC, HPA, IntAct /MINT , InterPro, PDBe, PRIDE, SILVA, STRING, UniProt ● Note that Ensembl’s compute infrastructure physically relocated in 2016, so “Users/IP address” data are not available for that year. In this case, the 2015 numbers were rolled forward to 2016. ● Note that STRING makes only minor releases in 2014 and 2016, in that the interactions are re-computed, but the number of “Data entries” remains unchanged. The major releases that change the number of “Data entries” happened in 2013 and 2015. So, for “Data entries” , the number for 2013 was rolled forward to 2014, and the number for 2015 was rolled forward to 2016. The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences ​ 1 Figure 2: Usage of Core Data Resources in research The following steps were taken: 1. API calls were run on open access full text articles in Europe PMC to identify articles that ​ ​ mention Core Data Resource by name or include specific data record accession numbers.
    [Show full text]
  • A Semantic Standard for Describing the Location of Nucleotide and Protein Feature Annotation Jerven T
    Bolleman et al. Journal of Biomedical Semantics (2016) 7:39 DOI 10.1186/s13326-016-0067-z RESEARCH Open Access FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation Jerven T. Bolleman1*, Christopher J. Mungall2, Francesco Strozzi3, Joachim Baran4, Michel Dumontier5, Raoul J. P. Bonnal6, Robert Buels7, Robert Hoehndorf8, Takatomo Fujisawa9, Toshiaki Katayama10 and Peter J. A. Cock11 Abstract Background: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Conclusions: Our ontology allows
    [Show full text]
  • Webnetcoffee
    Hu et al. BMC Bioinformatics (2018) 19:422 https://doi.org/10.1186/s12859-018-2443-4 SOFTWARE Open Access WebNetCoffee: a web-based application to identify functionally conserved proteins from Multiple PPI networks Jialu Hu1,2, Yiqun Gao1, Junhao He1, Yan Zheng1 and Xuequn Shang1* Abstract Background: The discovery of functionally conserved proteins is a tough and important task in system biology. Global network alignment provides a systematic framework to search for these proteins from multiple protein-protein interaction (PPI) networks. Although there exist many web servers for network alignment, no one allows to perform global multiple network alignment tasks on users’ test datasets. Results: Here, we developed a web server WebNetcoffee based on the algorithm of NetCoffee to search for a global network alignment from multiple networks. To build a series of online test datasets, we manually collected 218,339 proteins, 4,009,541 interactions and many other associated protein annotations from several public databases. All these datasets and alignment results are available for download, which can support users to perform algorithm comparison and downstream analyses. Conclusion: WebNetCoffee provides a versatile, interactive and user-friendly interface for easily running alignment tasks on both online datasets and users’ test datasets, managing submitted jobs and visualizing the alignment results through a web browser. Additionally, our web server also facilitates graphical visualization of induced subnetworks for a given protein and its neighborhood. To the best of our knowledge, it is the first web server that facilitates the performing of global alignment for multiple PPI networks. Availability: http://www.nwpu-bioinformatics.com/WebNetCoffee Keywords: Multiple network alignment, Webserver, PPI networks, Protein databases, Gene ontology Background tools [7–10] have been developed to understand molec- Proteins are involved in almost all life processes.
    [Show full text]
  • HTG Data Input: Annotation File Output in DDBJ Flat File
    HTG data Input: Annotation file Output in DDBJ flat file LOCUS ############ 123456 bp DNA linear HTG 30-SEP-2017 Entry Feature Location Qualifier Value DEFINITION Arabidopsis thaliana DNA, BAC clone: CIC5D1, chromosome 1, *** COMMON DATE hold_date 20191130 SEQUENCING IN PROGRESS *** SUBMITTER contact Hanako Mishima ACCESSION ############ ab_name Mishima,H. VERSION ############.1 ab_name Yamada,T. DBLINK ab_name Park,C.S. KEYWORDS HTG; HTGS_PHASE1. ab_name Liu,G.Q. SOURCE Arabidopsis thaliana (thale cress) email [email protected] ORGANISM Arabidopsis thaliana phone 81-55-981-6853 Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; fax 81-55-981-6849 Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; institute National Institute of Genetics rosids; malvids; Brassicales; Brassicaceae; Camelineae; department DNA Data Bank of Japan Arabidopsis. country Japan REFERENCE 1 (bases 1 to 123456) state Shizuoka AUTHORS Mishima,H., ada,T., Park,C.S. and Liu,G.Q. city Mishima TITLE Direct Submission street Yata 1111 JOURNAL Submitted (dd-mmm-yyyy) to the DDBJ/EMBL/GenBank databases. zip 411-8540 Contact:Hanako Mishima REFERENCE ab_name Mishima,H. National Institute of Genetics, DNA Data Bank of Japan; Yata 1111, ab_name Yamada,T. Mishima, Shizuoka 411-8540, Japan ab_name Park,C.S. REFERENCE 2 ab_name Liu,G.Q. AUTHORS Mishima,H., ada,T., Park,C.S. and Liu,G.Q. title Arabidopsis thaliana DNA TITLE Arabidopsis thaliana Sequencing year 2017 JOURNAL Unpublished (2017) status Unpublished COMMENT Please visit our web site
    [Show full text]
  • The Biogrid Interaction Database
    D470–D478 Nucleic Acids Research, 2015, Vol. 43, Database issue Published online 26 November 2014 doi: 10.1093/nar/gku1204 The BioGRID interaction database: 2015 update Andrew Chatr-aryamontri1, Bobby-Joe Breitkreutz2, Rose Oughtred3, Lorrie Boucher2, Sven Heinicke3, Daici Chen1, Chris Stark2, Ashton Breitkreutz2, Nadine Kolas2, Lara O’Donnell2, Teresa Reguly2, Julie Nixon4, Lindsay Ramage4, Andrew Winter4, Adnane Sellam5, Christie Chang3, Jodi Hirschman3, Chandra Theesfeld3, Jennifer Rust3, Michael S. Livstone3, Kara Dolinski3 and Mike Tyers1,2,4,* 1Institute for Research in Immunology and Cancer, Universite´ de Montreal,´ Montreal,´ Quebec H3C 3J7, Canada, 2The Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario M5G 1X5, Canada, 3Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA, 4School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JR, UK and 5Centre Hospitalier de l’UniversiteLaval´ (CHUL), Quebec,´ Quebec´ G1V 4G2, Canada Received September 26, 2014; Revised November 4, 2014; Accepted November 5, 2014 ABSTRACT semi-automated text-mining approaches, and to en- hance curation quality control. The Biological General Repository for Interaction Datasets (BioGRID: http://thebiogrid.org) is an open access database that houses genetic and protein in- INTRODUCTION teractions curated from the primary biomedical lit- Massive increases in high-throughput DNA sequencing erature for all major model organism species and technologies (1) have enabled an unprecedented level of humans. As of September 2014, the BioGRID con- genome annotation for many hundreds of species (2–6), tains 749 912 interactions as drawn from 43 149 pub- which has led to tremendous progress in the understand- lications that represent 30 model organisms.
    [Show full text]
  • Nucleic Acid Databases on the Web Richard Peters and Robert S
    © 1996 Nature Publishing Group http://www.nature.com/naturebiotechnology RESOURCES • WWWGUIDE Nucleic acid databases on the Web Richard Peters and Robert S. Sikorski For many researchers, searchable sequence data­ is The National Center for Biotechnology the Washington University (St. Louis, MO) bases on the Internet are becoming as indis­ Information (NCBI) site. Some of the things sequencing project. dbSTS is a database of pensable as pipettes. Especially in the fields of you can now find at their site include BLAST, genomic sequence tagged sites that serve as molecular and structural biology, many excel­ Genbank, Entrez, dbEST, dbSTS, OMIM, mapping landmarks in the human genome. lent online databases have cropped up in just and UniGene. A brief description of each of OMIM (Online Mendelian Inheritance in the past year. Despite this wealth of informa­ these tools demonstrates what we mean. Man) is a unique database that catalogs cur­ tion, it still requires considerable effort to sort BLAST is the standard program used for rent research and clinical information about through these myriad databases to find ones querying your DNA or protein sequence individual human genes. UniGene is a subset that are useful. Keeping this in mind, we started against all known sequences housed in data­ of GenBank that defines a collection of DNA by using our Net search tools to see if we could bases. You can even have the output sent to sequences likely to represent the transcrip­ winnow out the best of the Net. This month, we you by e-mail. Genbank is an annotated col­ tion products of distinct genes.
    [Show full text]
  • Biological Databases
    Biological Databases Biology Is A Data Science ● Hundreds of thousand of species ● Million of articles in scientific literature ● Genetic Information ○ Gene names ○ Phenotype of mutants ○ Location of genes/mutations on chromosomes ○ Linkage (relationships between genes) 2 Data and Metadata ● Data are “concrete” objects ○ e.g. number, tweet, nucleotide sequence ● Metadata describes properties of data ○ e.g. object is a number, each tweet has an author ● Database structure may contain metadata ○ Type of object (integer, float, string, etc) ○ Size of object (strings at most 4 characters long) ○ Relationships between data (chromosomes have zero or more genes) 3 What is a Database? ● A data collection that needs to be : ○ Organized ○ Searchable ○ Up-to-date ● Challenge: ○ change “meaningless” data into useful, accessible information 4 A spreadsheet can be a Database ● Rectangular data ● Structured ● No metadata ● Search tools: ○ Excel ○ grep ○ python/R 5 A filesystem can be a Database ● Hierarchical data ● Some metadata ○ File, symlink, etc ● Unstructured ● Search tools: ○ ls ○ find ○ locate 6 Organization and Types of Databases ● Every database has tools that: ○ Store ○ Extract ○ Modify ● Flat file databases (flat DBMS) ○ Simple, restrictive, table ● Hierarchical databases ○ Simple, restrictive, tables ● Relational databases (RDBMS) ○ Complex, versatile, tables ● Object-oriented databases (ODBMS) ● Data warehouses and distributed databases ● Unstructured databases (object store DBs) 7 Where do the data come from ? 8 Types of Biological Data
    [Show full text]
  • PINOT: an Intuitive Resource for Integrating Protein-Protein Interactions James E
    Tomkins et al. Cell Communication and Signaling (2020) 18:92 https://doi.org/10.1186/s12964-020-00554-5 METHODOLOGY Open Access PINOT: an intuitive resource for integrating protein-protein interactions James E. Tomkins1, Raffaele Ferrari2, Nikoleta Vavouraki1, John Hardy2,3,4,5,6, Ruth C. Lovering7, Patrick A. Lewis1,2,8, Liam J. McGuffin9* and Claudia Manzoni1,10* Abstract Background: The past decade has seen the rise of omics data for the understanding of biological systems in health and disease. This wealth of information includes protein-protein interaction (PPI) data derived from both low- and high-throughput assays, which are curated into multiple databases that capture the extent of available information from the peer-reviewed literature. Although these curation efforts are extremely useful, reliably downloading and integrating PPI data from the variety of available repositories is challenging and time consuming. Methods: We here present a novel user-friendly web-resource called PINOT (Protein Interaction Network Online Tool; available at http://www.reading.ac.uk/bioinf/PINOT/PINOT_form.html) to optimise the collection and processing of PPI data from IMEx consortium associated repositories (members and observers) and WormBase, for constructing, respectively, human and Caenorhabditis elegans PPI networks. Results: Users submit a query containing a list of proteins of interest for which PINOT extracts data describing PPIs. At every query submission PPI data are downloaded, merged and quality assessed. Then each PPI is confidence scored based on the number of distinct methods used for interaction detection and the number of publications that report the specific interaction. Examples of how PINOT can be applied are provided to highlight the performance, ease of use and potential utility of this tool.
    [Show full text]
  • Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases
    Genomics Proteomics Bioinformatics 18 (2020) 91–103 Genomics Proteomics Bioinformatics www.elsevier.com/locate/gpb www.sciencedirect.com PERSPECTIVE Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases Qingyu Chen 1,*, Ramona Britto 2, Ivan Erill 3, Constance J. Jeffery 4, Arthur Liberzon 5, Michele Magrane 2, Jun-ichi Onami 6,7, Marc Robinson-Rechavi 8,9, Jana Sponarova 10, Justin Zobel 1,*, Karin Verspoor 1,* 1 School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3010, Australia 2 European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK 3 Department of Biological Sciences, University of Maryland, Baltimore, MD 21250, USA 4 Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA 5 Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA 6 Japan Science and Technology Agency, National Bioscience Database Center, Tokyo 102-8666, Japan 7 National Institute of Health Sciences, Tokyo 158-8501, Japan 8 Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland 9 Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland 10 Nebion AG, 8048 Zurich, Switzerland Received 8 December 2017; revised 24 October 2018; accepted 14 December 2018 Available online 9 July 2020 Handled by Zhang Zhang Introduction assembled, annotated, and ultimately submitted to primary nucleotide databases such as GenBank [2], European Nucleo- tide Archive (ENA) [3], and DNA Data Bank of Japan Biological databases represent an extraordinary collective vol- (DDBJ) [4] (collectively known as the International Nucleotide ume of work.
    [Show full text]
  • 2021.03.02.433662V1.Full.Pdf
    bioRxiv preprint doi: https://doi.org/10.1101/2021.03.02.433662; this version posted March 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 Title: Exploring bacterial diversity via a curated and searchable snapshot of archived DNA 2 sequences 3 4 Authors: Grace A. Blackwell1,2*, Martin Hunt1,3, Kerri M. Malone1, Leandro Lima1, Gal Horesh2#, 5 Blaise T.F. Alako1, Nicholas R Thomson2,4†* and Zamin Iqbal1†* 6 1EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United 7 Kingdom 8 2Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, 9 United Kingdom 10 3Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7LF, United Kingdom 11 4London School of Hygiene & Tropical Medicine, London, United Kingdom † 12 Joint Authors 13 * To whom correspondence should be addressed: [email protected]; [email protected]; 14 [email protected] 15 16 #Current address: Chesterford Research Park, Cambridge, CB10 1XL 17 18 ORCIDs: 19 G.A.B.: 0000-0003-3921-3516 20 M.H.: 0000-0002-8060-4335 21 L.L.: 0000-0001-8976-2762 22 K.M.M.: 0000-0002-3974-0810 23 G.H.: 0000-0003-0342-0185 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.02.433662; this version posted March 3, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.
    [Show full text]
  • DDBJ Progress Report
    D44–D49 Nucleic Acids Research, 2014, Vol. 42, Database issue Published online 4 November 2013 doi:10.1093/nar/gkt1066 DDBJ progress report: a new submission system for leading to a correct annotation Takehide Kosuge1,*, Jun Mashima1, Yuichi Kodama1, Takatomo Fujisawa1, Eli Kaminuma1, Osamu Ogasawara1, Kousaku Okubo1, Toshihisa Takagi1,2 and Yasukazu Nakamura1,* 1DDBJ Center, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan and 2National Bioscience Database Center, Japan Science and Technology Agency, Tokyo 102-8666, Japan Received September 13, 2013; Revised October 11, 2013; Accepted October 14, 2013 ABSTRACT We have already reported that our previous supercom- The DNA Data Bank of Japan (DDBJ; http://www. puter system has been totally replaced by a new commod- ddbj.nig.ac.jp) maintains and provides archival, re- ity-cluster-based system (1). Our services including Web trieval and analytical resources for biological infor- services, submission system, BLAST, CLUSTALW, WebAPI and databases are now running on a new super- mation. This database content is shared with the US computer system. In addition to a traditional assembled National Center for Biotechnology Information sequence archive, DDBJ provides the DDBJ Sequence (NCBI) and the European Bioinformatics Institute Read Archive (DRA) (5) and the DDBJ BioProject (6) (EBI) within the framework of the International for receiving short reads from next-generation sequencing Nucleotide Sequence Database Collaboration (NGS) machines and organizing the corresponding data (INSDC). DDBJ launched a new nucleotide obtained from research projects. In 2013, DDBJ has sequence submission system for receiving trad- started the BioSample database in collaboration with itional nucleotide sequence.
    [Show full text]
  • Ferguson, Alison (2012) Identification and Characterisation of Arabidopsis
    IDENTIFICATION AND CHARACTERISATION OF ARABIDOPSIS ER ACCESSORY PROTEINS Alison Ferguson, BSc Thesis submitted to the University of Nottingham for the degree of Doctor of Philosophy February 2012 i ABSTRACT ER accessory proteins are a novel class of endoplasmic reticulum (ER) proteins that facilitate the exit of polytopic membrane proteins from the ER. They are important for the correct targeting of their cognate polytopic membrane proteins to the plasma membrane (PM) and their absence leads to abnormal accumulation of their target in the ER. Until recently, it was not clear if such proteins exist in plants. However, work by Dharmasiri et al (2006) and Gonzales et al (2005) suggest that such proteins exists in plants too. Polytopic membrane proteins such as nutrient transporters, hormone transporters and sugar transporters are a very important class of proteins as they regulate many important physiological and biochemical processes. Better understanding of the targeting of these proteins to the PM is of considerable agronomic interest due to the importance of efficient use of resources in sustainable agriculture. One of the projects aims is to identify novel ER accessory proteins in Arabidopsis. Using a bioinformatics approach, 40 novel ER resident proteins were identified from a protein localisation database (LOPIT) generated by Dunkley et al (2006) as potential candidates for ER accessory proteins. Genetic, phenotypic and molecular approaches have been used to assess their role as potential ER accessory proteins. A few promising candidates have been identified, one of which AtBPL1 and related family. The AtBPL1 family has similarity to mammalian BAP31 which has been shown to function as an ER accessory protein (Ladasky et al, 2006).
    [Show full text]