Viroinformatics Investigation of B-Cell Epitope Conserved Region in SARS

Total Page:16

File Type:pdf, Size:1020Kb

Viroinformatics Investigation of B-Cell Epitope Conserved Region in SARS © 2021 Journal of Pharmacy & Pharmacognosy Research, 9 (6), 766-779, 2021 ISSN 0719-4250 http://jppres.com/jppres Original Article Viroinformatics investigation of B-cell epitope conserved region in SARS- CoV-2 lineage B.1.1.7 isolates originated from Indonesia to develop vaccine candidate against COVID-19 [Investigación viroinformática de la región conservada del epítopo de células B en el linaje SARS-CoV-2 B.1.1.7 aislamientos originados en Indonesia para desarrollar una vacuna candidata contra COVID-19] Arif N. M. Ansori1,2#, Reviany V. Nidom1,3*#, Muhammad K. J. Kusala1,2, Setyarina Indrasari1,3, Irine Normalina1,4, Astria N. Nidom1,3, Balqis Afifah1,3, Kartika B. Sari1,5, Nor L. Ramadhaniyah1,5, Mohammad Y. Alamudi1,3, Umi Cahyaningsih6, Kuncoro P. Santoso1,2, Heri Kuswanto5, Chairul A. Nidom1,2,3* 1Coronavirus and Vaccine Formulation Research Group, Professor Nidom Foundation, Surabaya, Indonesia. 2Faculty of Veterinary Medicine, Universitas Airlangga, Surabaya, Indonesia. 3Riset AIRC Indonesia, Surabaya, Indonesia. 4Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia. 5Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia. 6Faculty of Veterinary Medicine, IPB University, Bogor, Indonesia. #Both authors contributed equally. *E-mail: [email protected], [email protected], [email protected] Abstract Resumen Context: SARS-CoV-2, a member of family Coronaviridae and the Contexto: SARS-CoV-2, un miembro de la familia Coronaviridae y el causative agent of COVID-19, is a virus which is transmitted to human agente causante de COVID-19, es un virus que se transmite a humanos y and other mammals. otros mamíferos. Aims: To analyze the B-cell epitope conserved region and Objetivos: Analizar la región conservada del epítopo de células B y el viroinformatics-based study of the SARS-CoV-2 lineage from estudio basado en viroinformática del linaje SARS-CoV-2 de los aislados Indonesian B.1.1.7 isolates to invent a vaccine nominee for overcoming B.1.1.7 de Indonesia para inventar una vacuna candidata para superar COVID-19. COVID-19. Methods: The sequences of seven Indonesian B.1.1.7 isolates, Wuhan-Hu- Métodos: Las secuencias de siete aislamientos B.1.1.7 indonesios, el 1 isolate, and WIV04 isolate were extracted from the GISAID EpiCoV aislado Wuhan-Hu-1 y el aislado WIV04 se extrajeron de GISAID and GenBank, NCBI. MEGA X was employed to understand the EpiCoV y GenBank, NCBI. Se empleó MEGA X para comprender las transformations of amino acid in the S protein and to develop a transformaciones de aminoácidos en la proteína S y para desarrollar un molecular phylogenetic tree. The IEDB was implemented to reveal the árbol filogenético molecular. El IEDB se implementó para revelar los linear B-cell epitopes. In addition, PEP-FOLD3 web server was utilized epítopos de células B lineales. Además, se utilizó el servidor web PEP- to perform peptide modeling, while docking was performed using FOLD3 para realizar el modelado de péptidos, mientras que el PatchDock, FireDock, and the PyMOL software. Moreover, in silico acoplamiento se realizó mediante PatchDock, FireDock y el software cloning was developed by using SnapGene v.3.2.1 software. PyMOL. Además, la clonación in silico se desarrolló utilizando el Results: In this study, the changes of amino acid in all seven Indonesian software SnapGene v.3.2.1. B.1.1.7 isolates were uncovered. Furthermore, various peptides based on Resultados: En este estudio, se descubrieron los cambios de aminoácidos the B-cell epitope prediction, allergenicity prediction, toxicity prediction en los siete aislamientos de B.1.1.7 de Indonesia. Además, se from S protein to generate a vaccine contrary to SARS-CoV-2 were identificaron varios péptidos basados en la predicción del epítopo de identified. Furthermore, the development of in silico cloning using pET células B, la predicción de la alergenicidad, la predicción de la toxicidad plasmid was successfully achieved. de la proteína S para generar una vacuna contraria al SARS-CoV-2. Conclusions: This study exhibits the transformations of amino acid in Además, se logró con éxito el desarrollo de la clonación in silico Indonesian B.1.1.7 isolates, and proposes four peptides utilizando el plásmido pET. (“LTPGDSSSGWTAG”, “VRQIAPGQTGKIAD”, “ILPDPSKPSKRS”, and Conclusiones: Este estudio exhibe las transformaciones de aminoácidos “KNHTSPDVDLG”) from S protein as the candidate for a peptide-based en aislados B.1.1.7 de Indonesiay propone cuatro péptidos vaccine. However, further advance trials such as in vitro and in vivo (“LTPGDSSSGWTAG”, “VRQIAPGQTGKIAD”, “ILPDPSKPSKRS” y testing are involved for validation. “KNHTSPDVDLG”) de la proteína S como candidato para una vacuna basada en péptidos. Sin embargo, para la validación se requieren más ensayos in vitro e in vivo. Keywords: COVID-19; SARS-CoV-2; vaccine design; viroinformatics. Palabras Clave: COVID-19; SARS-CoV-2; vaccine design; viroinformatics. ARTICLE INFO Received: May 10, 2021. Received in revised form: June 3, 2021. Accepted: June 5, 2021. Available Online: June 13, 2021. _____________________________________ Ansori et al. B-cell epitope of SARS-CoV-2 and COVID-19 vaccine candidate INTRODUCTION larity shared with the genome of previous variant, SARS-CoV, is quite high, around 79.5% shared SARS-CoV-2 ranked as the seventh for the similarity (Hu et al., 2020). There are four structur- coronavirus (CoV) which has transversed the bar- al proteins which conceal this genome, they are rier between species and been able to infect hu- spike (S), nucleocapsid (N), membrane (M), and man. The virus was firstly identified in China in envelope (E) (Li et al., 2020; Ou et al., 2020; Zeng et 2019 and appeared sporadically all over China and al., 2020). In addition, several researches have in- many other nations worldwide (Harapan et al., formed that SARS-CoV-2 ties up ACE2 receptor on 2020; Rodriguez-Morales et al., 2020; Wu et al., the host cell. Furthermore, the receptor contributes 2020). As the virus spreads massively and quickly, greatly in viral entry system and closely related to it was confirmed and revealed publicly by WHO pathogenicity (Harapan et al., 2020). in March 2020 that this contagious widespread has become a pandemic. The global health and econ- Various investigation is demanded to acquire omy have suffered by this COVID-19 abrupt con- molecular epidemiological data of the novel virus tagion and rapid deployment (Astuti and Ysrafil, in Indonesia (Nidom et al., 2021). Research on mo- 2020; Huang et al., 2020). This global issue has re- lecular epidemiology is a vital instrument for ob- quired revolutionary scientific progression for serving new emerging viruses. There is an urgency deep investigations on SARS-CoV-2 concerning its to establish further studies in molecular epidemi- features, mechanism of transmission, and clinical ology to comprehend the probable consequences aspects for the sole purpose of ending this pan- of the disease (Nidom et al., 2020a; 2021). Current- demic’s catastrophic impacts (Awadasseid et al., ly, a novel SARS-CoV-2 lineage was detected on 2021; Ganesh et al., 2021). Sadly, this virus has September 2020, called B.1.1.7. A variant of this costed about 170 million people getting infected virus was initially discovered in England among a globally with more than 3.5 million global deaths. large number of new cases. Then, this variant be- To be specific, Indonesia has about 1.8 million cas- comes dominant in December 2020 in Southeast- es and around 50 thousand deaths. John Hopkins ern England. B.1.1.7 variant spreads globally and University has initiated the global data pooling rapidly including in Indonesia and as predicted in with actual time updated online to pursue the real the early of March 2021 (Duong, 2021). This virus time cases of COVID-19 in numbers and geograph- variant raises concerns recognized by neutralizing ical disbursement in which these above data are the antibodies, which may be affected leading to derived from (Dong et al., 2020). non-maximum vaccine efficacy. The coronavirus family is composed of four dif- In addition, our previous study reported the ferent genera: Alpha-, Gamma-, Beta-, and Del- four structural proteins as vaccine candidate’s de- tacoronavirus. Alphacoronavirus and Betacoronavirus velopment and revealed the antibody-dependent infect animals and humans, whereas Deltacorona- enhancement (ADE) phenomenon based on the S virus and Gammacoronavirus infect only animals protein gene sequences (Ansori et al., 2020; Nidom (Raoult et al., 2020). Similar to SARS-CoV-1 and et al., 2020b; Normalina et al., 2020). The efforts to MERS-CoV, SARS-CoV-2 also is part of Coronaviri- produce vaccines globally in resistance to SARS- dae (Betacoronavirus), a family that has formerly CoV-2 by scientists with the vaccines derived from driven the previous two epidemics (Shereen et al., its proteins turned out to be one of the most pro- 2020). Referenced in some data pools, the genome gressive kinds and the driving force of this study of SARS-CoV-2 has been recognized with around is the private sectors (Callaway, 2020). Although 30 kb nucleotides in a single-stranded positive- various countries already ongoing for the COVID- sense RNA (NC_045512.2 - GenBank, NCBI and 19 vaccination program (Nidom et al., 2021), cur- EPI_ISL_402124 - GISAID EpiCoV) and the simi- rently there are no approved drugs against the virus (Harisna et al., 2021). A recent study revea- http://jppres.com/jppres J Pharm Pharmacogn Res
Recommended publications
  • Zebrafish Disease Models to Study the Pathogenesis of Inherited Manganese Transporter Defects and Provide A
    Zebrafish disease models to study the pathogenesis of inherited manganese transporter defects and provide a route for drug discovery Dr Karin Tuschl University College London PhD Supervisors: Dr Philippa Mills & Prof Stephen Wilson A thesis submitted for the degree of Doctor of Philosophy University College London August 2016 Declaration I, Karin Tuschl, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis. Part of the work of this thesis has been published in the following articles for which copyright clearance has been obtained (see Appendix): - Tuschl K, et al. Manganese and the brain. Int Rev Neurobiol. 2013. 110:277- 312. - Tuschl K, et al. Mutations in SLC39A14 disrupt manganese homeostasis and cause childhood-onset parkinsonism-dystonia. Nat Comms. 2016. 7:11601. I confirm that these publications were written by me and may therefore partly overlap with my thesis. 2 Abstract Although manganese is required as an essential trace element excessive amounts are neurotoxic and lead to manganism, an extrapyramidal movement disorder associated with deposition of manganese in the basal ganglia. Recently, we have identified the first inborn error of manganese metabolism caused by mutations in SLC30A10, encoding a manganese transporter facilitating biliary manganese excretion. Treatment is limited to chelation therapy with intravenous disodium calcium edetate which is burdensome due to its route of administration and associated with high socioeconomic costs. Whole exome sequencing in patients with inherited hypermanganesaemia and early- onset parkinsonism-dystonia but absent SLC30A10 mutations identified SLC39A14 as a novel disease gene associated with manganese dyshomeostasis.
    [Show full text]
  • Annual Report 2014 - 2015 Our Mission
    ANNUAL REPORT 2014 - 2015 Our Mission By integrating the fields of medicine, science, engineering and technology into translational knowledge and making the resulting biomedical innovations accessible to public health, to improve the health of the most disadvantaged people in India and throughout the world. Our Vision As a networked organization linking many centers of excellence, THSTI is envisaged as a collective of scientists, engineers and physicians that will effectively enhance the quality of human life through integrating a culture of shared excellence in research, education and translational knowledge with the developing cohorts and studying the pathogenesis and the molecular mechanisms of disease to generate knowledge to complement the processes of designing interventions and technology development. CONTENTS 3 THE ORGANIZATION Society Governing Body Leadership From the Executive Director’s Desk 11 RESEARCH PROGRAMS Vaccine and Infectious Disease Research Centre Pediatric Biology Centre Centre for Biodesign and Diagnostics Policy Center for Biomedical Research Drug Discovery Research Centre Centre for Human Microbial Ecology Population Science Partnership Centre Clinical Development Services Agency 178 ACADEMIA 184 ADMINISTRATION Organization INTRAMURAL CENTRES Vaccine & Infectious Disease Research Centre (VIDRC) Pediatric Biology Centre (PBC) Centre for Biodesign & Diagnostics (CBD) Centre for Human Microbial Ecology (CHME) Policy Centre for Biomedical Research (PCBR) Drug Discovery Research Centre (DDRC) PARTNERSHIP CENTRE Population Science Partnership Centre (PSPC) EXTRAMURAL CENTRE Clinical Development Services Agency (CDSA) THSTI Society 1 2 3 4 5 6 7 8 9 10 11 12 1. Dr. G. Padmanaban 5. Dr. T.S. Rao 9. Dr. J. Gowrishankar Distinguished Professor, Nodal Officer, THSTI, Sr. Advisor, Director, IISc Bangalore Department of Biotechnology, Centre for DNA Fingerprinting President New Delhi & Diagnostics, Member Ex-officio Hyderabad 2.
    [Show full text]
  • Glycomics Goes Visual and Interactive
    Glycomics & Lipidomics Extended Abstract Glycomics goes visual and interactive Alessandra Gastaldello structures attached to each of these sites. Mass spectrometry Abstract (MS) and microarray are high-throughput technologies that are commonly used in glycomics and glycoproteomics, which often result in the generation of large experimental datasets. Glycomics@ExPASy the glycomics tab of the Swiss Institute of Bioinformatics approaches play an essential role in automated Bioinformatics server (www.expasy.org/glycomics) was created analysis and interpretation of such data. This unit describes in 2016 to centralise web-based glycoinformatics resources and discusses the computational tools currently available for developed within an international network of glycoscientists. these analyses, and their glycomics and glycoproteomics The philosophy of this toolbox is to be {glycoscientist AND applications. protein scientist}???friendly with the aim of popularising (a) the use of bioinformatics in glycobiology and (b) the relation A key point in achieving accurate intact glycopeptide between glycobiology and protein-oriented bioinformatics identification is the definition of the glycan composition file resources. The scarcity of bridging data led us to design tools that is used to match experimental with theoretical masses by a as interactive as possible based on database connectivity in glycoproteomics search engine. At present, these files are order to facilitate data exploration and support hypothesis mainly built from searching the literature and/or querying building. The current set of resources is mostly built on top of data sources focused on posttranslational modifications. Most curated or experimental data relative to glycan structures, glycoproteomics search engines include a default composition glycoproteins, host-pathogen interactions and mass file that is readily used when processing MS data.
    [Show full text]
  • The ELIXIR Core Data Resources: ​Fundamental Infrastructure for The
    Supplementary Data: The ELIXIR Core Data Resources: fundamental infrastructure ​ for the life sciences The “Supporting Material” referred to within this Supplementary Data can be found in the Supporting.Material.CDR.infrastructure file, DOI: 10.5281/zenodo.2625247 (https://zenodo.org/record/2625247). ​ ​ Figure 1. Scale of the Core Data Resources Table S1. Data from which Figure 1 is derived: Year 2013 2014 2015 2016 2017 Data entries 765881651 997794559 1726529931 1853429002 2715599247 Monthly user/IP addresses 1700660 2109586 2413724 2502617 2867265 FTEs 270 292.65 295.65 289.7 311.2 Figure 1 includes data from the following Core Data Resources: ArrayExpress, BRENDA, CATH, ChEBI, ChEMBL, EGA, ENA, Ensembl, Ensembl Genomes, EuropePMC, HPA, IntAct /MINT , InterPro, PDBe, PRIDE, SILVA, STRING, UniProt ● Note that Ensembl’s compute infrastructure physically relocated in 2016, so “Users/IP address” data are not available for that year. In this case, the 2015 numbers were rolled forward to 2016. ● Note that STRING makes only minor releases in 2014 and 2016, in that the interactions are re-computed, but the number of “Data entries” remains unchanged. The major releases that change the number of “Data entries” happened in 2013 and 2015. So, for “Data entries” , the number for 2013 was rolled forward to 2014, and the number for 2015 was rolled forward to 2016. The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences ​ 1 Figure 2: Usage of Core Data Resources in research The following steps were taken: 1. API calls were run on open access full text articles in Europe PMC to identify articles that ​ ​ mention Core Data Resource by name or include specific data record accession numbers.
    [Show full text]
  • Pathogenicity and Selective Constraint on Variation Near Splice Sites
    Downloaded from genome.cshlp.org on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press 1 Pathogenicity and selective constraint on variation near 2 splice sites 3 AUTHORS 4 Jenny Lord1, Giuseppe Gallone1, Patrick J. Short1, Jeremy F. McRae1, Holly Ironfield1, Elizabeth H. 5 Wynn1, Sebastian S. Gerety1, Liu He1, Bronwyn Kerr2,3, Diana S. Johnson4, Emma McCann5, Esther 6 Kinning6, Frances Flinter7, I. Karen Temple8,9 , Jill Clayton-Smith2,3, Meriel McEntagart10, Sally Ann 7 Lynch11, Shelagh Joss12, Sofia Douzgou2,3, Tabib Dabir13, Virginia Clowes14, Vivienne P. M. 8 McConnell13, Wayne Lam15, Caroline F. Wright16, David R. FitzPatrick1,15, Helen V. Firth1,17, Jeffrey 9 C. Barrett1, Matthew E. Hurles1, on behalf of the Deciphering Developmental Disorders study 10 AFFILIATIONS 11 1 Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK 12 2Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University Hospitals NHS 13 Foundation Trust Manchester Academic Health Sciences Centre 14 3Division of Evolution and Genomic Sciences School of Biological Sciences University of Manchester 15 4Sheffield Clinical Genetics Service, Sheffield Children's Hospital, OPD2, Northern General Hospital, 16 Herries Road, Sheffield, S5 7AU 17 5Liverpool Women’s Hospital Foundation Trust, Crown Street, Liverpool, L8 7SS 18 6West of Scotland Regional Genetics Service, NHS Greater Glasgow and Clyde, Institute of Medical 19 Genetics, Yorkhill Hospital, Glasgow G3 8SJ, UK 20 7South East Thames Regional Genetics
    [Show full text]
  • A New Tool for Efficient Genome Assembly of Known Eukaryotic Viruses from Metagenomes Hsin-Hung Lin and Yu-Chieh Liao∗
    GigaScience, 6, 2017, 1–10 doi: 10.1093/gigascience/gix003 Advance Access Publication Date: 20 January 2017 Research RESEARCH drVM: a new tool for efficient genome assembly of known eukaryotic viruses from metagenomes Hsin-Hung Lin and Yu-Chieh Liao∗ Institute of Population Health Sciences, National Health Research Institutes, No. 35, Keyan Road, Zhunan Town, Miaoli County, 35053, Taiwan ∗Correspondence: [email protected] Abstract Background: Virus discovery using high-throughput next-generation sequencing has become more commonplace. However, although analysis of deep next-generation sequencing data allows us to identity potential pathogens, the entire analytical procedure requires competency in the bioinformatics domain, which includes implementing proper software packages and preparing prerequisite databases. Simple and user-friendly bioinformatics pipelines are urgently required to obtain complete viral genome sequences from metagenomic data. Results: This manuscript presents a pipeline, drVM (detect and reconstruct known viral genomes from metagenomes), for rapid viral read identification, genus-level read partition, read normalization, de novo assembly, sequence annotation, and coverage profiling. The first two procedures and sequence annotation rely on known viral genomes as a reference database. drVM was validated via the analysis of over 300 sequencing runs generated by Illumina and Ion Torrent platforms to provide complete viral genome assemblies for a variety of virus types including DNA viruses, RNA viruses, and retroviruses. drVM is available for free download at: https://sourceforge.net/projects/sb2nhri/files/drVM/ andisalsoassembledasa Docker container, an Amazon machine image, and a virtual machine to facilitate seamless deployment. Conclusions: drVM was compared with other viral detection tools to demonstrate its merits in terms of viral genome completeness and reduced computation time.
    [Show full text]
  • Bioinformatics Exercises: Bovine Lactate Dehydrogenase (LDH)
    CH/BI 421/621/527 F15 Bioinformatics Worksheet for LDH Bioinformatics Exercises: Bovine Lactate Dehydrogenase (LDH) BACKGROUND: Often primary structure (amino acid sequence) is the first piece of experimental information a biochemist wants to have about a protein s/he is interested in studying since it can be used to make several predictions about the properties and possible behavior of the protein such as: • Protein molecular weight by adding up the masses of the individual amino acid residues. • Isoelectric point. The isoelectric point is where the protein has no charge. Because of ionizable functional groups on amino acids, protein charge changes as a function of pH depending on whether or not these groups are protonated. By knowing the sequence, we know how many of each ionizable group our protein contains. If we know the pH range where these groups become protonated or deprotonated, we can estimate the charge of the whole protein as a function of pH. This will be discussed in more detail below. • Molar extinction coefficient. Tryptophan, Tyrosine and Cysteine residues absorb ultraviolet light at 280 nm. By knowing how many of these amino acids are found in our protein’s sequence, we can calculate how much we expect a solution of our protein to absorb 280 nm light as a function of its concentration. I say “expect” instead of “determine” because the amount of light absorbed by these amino acids is dependent on their local environment within the protein especially on whether they are on the surface and exposed to the solution or buried inside the protein.
    [Show full text]
  • Molecular Characterization of the Alfalfa Mosaic Virus Infecting
    plants Article Molecular Characterization of the Alfalfa mosaic virus Infecting Solanum melongena in Egypt and the Control of Its Deleterious Effects with Melatonin and Salicylic Acid Ahmed R. Sofy 1,* , Mahmoud R. Sofy 1,* , Ahmed A. Hmed 1, Rehab A. Dawoud 2,3 , Ehab E. Refaey 1, Heba I. Mohamed 4 and Noha K. El-Dougdoug 5 1 Botany and Microbiology Department, Faculty of Science, Al-Azhar University, Cairo 11884, Egypt; [email protected] (A.A.H.); [email protected] (E.E.R.) 2 Virus and Phytoplasma Research Department, Plant Pathology Research Institute, Agricultural Research Center (ARC), Giza 12619, Egypt; [email protected] 3 Department of Biology, Faculty of Science, Jazan University, P.O. Box 114, Jazan 45142, Saudi Arabia 4 Department of Biological and Geological Sciences, Faculty of Education, Ain Shams University, Cairo 11566, Egypt; [email protected] 5 Botany and Microbiology Department, Faculty of Science, Benha University, Benha 13518, Egypt; [email protected] * Correspondence: [email protected] (A.R.S.); [email protected] (M.R.S.) Abstract: During the spring of 2019, distinct virus-like symptoms were observed in the Kafr El-Sheikh Governorate in Egypt in naturally infected eggplants. Leaves of affected plants showed interveinal Citation: Sofy, A.R.; Sofy, M.R.; leaf chlorosis, net yellow, chlorotic sectors, mottling, blisters, vein enation, necrotic intervention, and Hmed, A.A.; Dawoud, R.A.; Refaey, narrowing symptoms. The Alfalfa mosaic virus (AMV) was suspected of to be involved in this disease. E.E.; Mohamed, H.I.; El-Dougdoug, Forty plant samples from symptomatic eggplants and 10 leaf samples with no symptoms were N.K.
    [Show full text]
  • Viroinformatics-Based Analysis of SARS-Cov-2 Core Proteins for Potential Therapeutic Targets
    antibodies Article Viroinformatics-Based Analysis of SARS-CoV-2 Core Proteins for Potential Therapeutic Targets Lokesh Agrawal 1,2,*,†, Thanasis Poullikkas 3,4,†, Scott Eisenhower 3,5, Carlo Monsanto 6, Ranjith Kumar Bakku 7,8 , Min-Hua Chen 9 and Rajkumar Singh Kalra 10,* 1 Universidad Integral del Caribe y América Latina, Kaminda Cas Grandi #79, Willemstad, Curacao 2 Graduate School of Comprehensive Human Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba 305-8577, Japan 3 Human Biology, School of Integrative and Global Majors, University of Tsukuba, 1-1-1 Tennodai, Tsukuba 305-0006, Japan; [email protected] (T.P.); [email protected] (S.E.) 4 Department of Experimental Pathology, Faculty of Medicine, University of Tsukuba, 2-1-1 Tennodai, Tsukuba 305-8576, Japan 5 Department of Infection Biology, Faculty of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba 305-8575, Japan 6 Research Workgroup, Ronin Institute, 127 Haddon Place, Montclair, NJ 07043-2314, USA; [email protected] 7 Department of Computer Science, Faculty of Engineering Information and Systems, University of Tsukuba, 1-1-1 Tennodai, Tsukuba 305-8572, Japan; [email protected] 8 Tsukuba Life Science Innovation Program (TLSI), University of Tsukuba, 1-1-1 Tennodai, Tsukuba 305-8572, Japan 9 Department of Biomedical Engineering, Chung Yuan Christian University, 200, Chung Pei Road, Taoyuan City 32023, Taiwan; [email protected] 10 AIST-INDIA DAILAB, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba 305-8565, Japan * Correspondence: [email protected] (L.A.); [email protected] (R.S.K.); Tel.: +81-80-9442-7502 (L.A.); +81-80-6670-6681 (R.S.K.) † These authors contributed equally to this work.
    [Show full text]
  • Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases
    Genomics Proteomics Bioinformatics 18 (2020) 91–103 Genomics Proteomics Bioinformatics www.elsevier.com/locate/gpb www.sciencedirect.com PERSPECTIVE Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases Qingyu Chen 1,*, Ramona Britto 2, Ivan Erill 3, Constance J. Jeffery 4, Arthur Liberzon 5, Michele Magrane 2, Jun-ichi Onami 6,7, Marc Robinson-Rechavi 8,9, Jana Sponarova 10, Justin Zobel 1,*, Karin Verspoor 1,* 1 School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3010, Australia 2 European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK 3 Department of Biological Sciences, University of Maryland, Baltimore, MD 21250, USA 4 Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL 60607, USA 5 Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA 6 Japan Science and Technology Agency, National Bioscience Database Center, Tokyo 102-8666, Japan 7 National Institute of Health Sciences, Tokyo 158-8501, Japan 8 Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland 9 Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland 10 Nebion AG, 8048 Zurich, Switzerland Received 8 December 2017; revised 24 October 2018; accepted 14 December 2018 Available online 9 July 2020 Handled by Zhang Zhang Introduction assembled, annotated, and ultimately submitted to primary nucleotide databases such as GenBank [2], European Nucleo- tide Archive (ENA) [3], and DNA Data Bank of Japan Biological databases represent an extraordinary collective vol- (DDBJ) [4] (collectively known as the International Nucleotide ume of work.
    [Show full text]
  • Plant Protein Annotation in the Uniprot Knowledgebase
    Article Plant protein annotation in the UniProt Knowledgebase SCHNEIDER, Michel, et al. Abstract The Swiss-Prot, TrEMBL, Protein Information Resource (PIR), and DNA Data Bank of Japan (DDBJ) protein database activities have united to form the Universal Protein Resource (UniProt) Consortium. UniProt presents three database layers: the UniProt Archive, the UniProt Knowledgebase (UniProtKB), and the UniProt Reference Clusters. The UniProtKB consists of two sections: UniProtKB/Swiss-Prot (fully manually curated entries) and UniProtKB/TrEMBL (automated annotation, classification and extensive cross-references). New releases are published fortnightly. A specific Plant Proteome Annotation Program (http://www.expasy.org/sprot/ppap/) was initiated to cope with the increasing amount of data produced by the complete sequencing of plant genomes. Through UniProt, our aim is to provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information that will allow the plant community to fully explore and utilize the wealth of information available for both plant and non-plant model organisms. Reference SCHNEIDER, Michel, et al. Plant protein annotation in the UniProt Knowledgebase. Plant Physiology, 2005, vol. 138, no. 1, p. 59-66 DOI : 10.1104/pp.104.058933 PMID : 15888679 Available at: http://archive-ouverte.unige.ch/unige:38249 Disclaimer: layout of this document may differ from the published version. 1 / 1 Bioinformatics Plant Protein Annotation in the UniProt Knowledgebase1 Michel Schneider*,
    [Show full text]
  • Europe PMC Funders Group Author Manuscript Nature
    Europe PMC Funders Group Author Manuscript Nature. Author manuscript; available in PMC 2009 July 31. Published in final edited form as: Nature. 2007 June 7; 447(7145): 661–678. doi:10.1038/nature05911. Europe PMC Funders Author Manuscripts Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls The Wellcome Trust Case Control Consortium* Abstract There is increasing evidence that genome-wide association (GWA) studies represent a powerful approach to the identification of genes involved in common human diseases. We describe a joint GWA study (using the Affymetrix GeneChip 500K Mapping Array Set) undertaken in the British population, which has examined ~2,000 individuals for each of 7 major diseases and a shared set of ~3,000 controls. Case-control comparisons identified 24 independent association signals at P<5×10-7: 1 in bipolar disorder, 1 in coronary artery disease, 9 in Crohn’s disease, 3 in rheumatoid arthritis, 7 in type 1 diabetes and 3 in type 2 diabetes. On the basis of prior findings and replication studies thus-far completed, almost all of these signals reflect genuine susceptibility effects. We observed association at many previously identified loci, and found compelling evidence that some loci confer risk for more than one of the diseases studied. Across all diseases, we identified a large number of further signals (including 58 loci with single-point P values between 10-5 and 5×10-7) likely to yield additional susceptibility loci. The importance of appropriately large samples was confirmed by the modest effect sizes observed at most loci identified.
    [Show full text]