Uniprot Genomic Mapping, Supplemental Methods and Results
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Whole-Genome Sequencing for Tracing the Genetic Diversity of Brucella Abortus and Brucella Melitensis Isolated from Livestock in Egypt
pathogens Article Whole-Genome Sequencing for Tracing the Genetic Diversity of Brucella abortus and Brucella melitensis Isolated from Livestock in Egypt Aman Ullah Khan 1,2,3 , Falk Melzer 1, Ashraf E. Sayour 4, Waleed S. Shell 5, Jörg Linde 1, Mostafa Abdel-Glil 1,6 , Sherif A. G. E. El-Soally 7, Mandy C. Elschner 1, Hossam E. M. Sayour 8 , Eman Shawkat Ramadan 9, Shereen Aziz Mohamed 10, Ashraf Hendam 11 , Rania I. Ismail 4, Lubna F. Farahat 10, Uwe Roesler 2, Heinrich Neubauer 1 and Hosny El-Adawy 1,12,* 1 Institute of Bacterial Infections and Zoonoses, Friedrich-Loeffler-Institut, 07743 Jena, Germany; AmanUllah.Khan@fli.de (A.U.K.); falk.melzer@fli.de (F.M.); Joerg.Linde@fli.de (J.L.); Mostafa.AbdelGlil@fli.de (M.A.-G.); mandy.elschner@fli.de (M.C.E.); Heinrich.neubauer@fli.de (H.N.) 2 Institute for Animal Hygiene and Environmental Health, Free University of Berlin, 14163 Berlin, Germany; [email protected] 3 Department of Pathobiology, University of Veterinary and Animal Sciences (Jhang Campus), Lahore 54000, Pakistan 4 Department of Brucellosis, Animal Health Research Institute, Agricultural Research Center, Dokki, Giza 12618, Egypt; [email protected] (A.E.S.); [email protected] (R.I.I.) 5 Central Laboratory for Evaluation of Veterinary Biologics, Agricultural Research Center, Abbassia, Citation: Khan, A.U.; Melzer, F.; Cairo 11517, Egypt; [email protected] 6 Sayour, A.E.; Shell, W.S.; Linde, J.; Department of Pathology, Faculty of Veterinary Medicine, Zagazig University, Elzera’a Square, Abdel-Glil, M.; El-Soally, S.A.G.E.; Zagazig 44519, Egypt 7 Veterinary Service Department, Armed Forces Logistics Authority, Egyptian Armed Forces, Nasr City, Elschner, M.C.; Sayour, H.E.M.; Cairo 11765, Egypt; [email protected] Ramadan, E.S.; et al. -
De Novo Genomic Analyses for Non-Model Organisms: an Evaluation of Methods Across a Multi-Species Data Set
De novo genomic analyses for non-model organisms: an evaluation of methods across a multi-species data set Sonal Singhal [email protected] Museum of Vertebrate Zoology University of California, Berkeley 3101 Valley Life Sciences Building Berkeley, California 94720-3160 Department of Integrative Biology University of California, Berkeley 1005 Valley Life Sciences Building Berkeley, California 94720-3140 June 5, 2018 Abstract High-throughput sequencing (HTS) is revolutionizing biological research by enabling sci- entists to quickly and cheaply query variation at a genomic scale. Despite the increasing ease of obtaining such data, using these data effectively still poses notable challenges, especially for those working with organisms without a high-quality reference genome. For every stage of analysis – from assembly to annotation to variant discovery – researchers have to distinguish technical artifacts from the biological realities of their data before they can make inference. In this work, I explore these challenges by generating a large de novo comparative transcriptomic dataset data for a clade of lizards and constructing a pipeline to analyze these data. Then, using a combination of novel metrics and an externally validated variant data set, I test the efficacy of my approach, identify areas of improvement, and propose ways to minimize these errors. I arXiv:1211.1737v1 [q-bio.GN] 8 Nov 2012 find that with careful data curation, HTS can be a powerful tool for generating genomic data for non-model organisms. Keywords: de novo assembly, transcriptomes, suture zones, variant discovery, annotation Running Title: De novo genomic analyses for non-model organisms 1 Introduction High-throughput sequencing (HTS) is poised to revolutionize the field of evolutionary genetics by enabling researchers to assay thousands of loci for organisms across the tree of life. -
CERENKOV: Computational Elucidation of the Regulatory Noncoding Variome
CERENKOV: Computational Elucidation of the Regulatory Noncoding Variome Yao Yao∗ Zheng Liu∗ Satpreet Singh Oregon State University Oregon State University Oregon State University EECS EECS EECS [email protected] [email protected] [email protected] Qi Wei Stephen A. Ramsey Oregon State University Oregon State University EECS Department of Biomedical Sciences [email protected] Corvallis, Oregon 97330 [email protected] ABSTRACT (hAVGRANKi = 3.877). The source code for CERENKOV We describe a novel computational approach, CERENKOV is available on GitHub and the SNP feature data files are (Computational Elucidation of the REgulatory NonKOd- available for download via the CERENKOV website. ing Variome), for discriminating regulatory single nucleotide polymorphisms (rSNPs) from non-regulatory SNPs within CCS CONCEPTS noncoding genetic loci. CERENKOV is specifically designed • Computing methodologies → Machine learning; • for recognizing rSNPs in the context of a post-analysis of a Applied computing → Computational biology; Bioin- genome-wide association study (GWAS); it includes a novel formatics; accuracy scoring metric (which we call average rank, or AV- GRANK) and a novel cross-validation strategy (locus-based KEYWORDS sampling) that both correctly account for the \sparse positive SNP, GWAS, noncoding, rSNP, SNV, machine learning bag" nature of the GWAS post-analysis rSNP recognition problem. We trained and validated CERENKOV using a reference set of 15,331 SNPs (the OSU17 SNP set) whose composition is based -
MODBASE: a Database of Annotated Comparative Protein Structure Models and Associated Resources Ursula Pieper1,2, Narayanan Eswar1,2, Fred P
Nucleic Acids Research, 2006, Vol. 34, Database issue D291–D295 doi:10.1093/nar/gkj059 MODBASE: a database of annotated comparative protein structure models and associated resources Ursula Pieper1,2, Narayanan Eswar1,2, Fred P. Davis1,2, Hannes Braberg1,2, M. S. Madhusudhan1,2, Andrea Rossi1,2, Marc Marti-Renom1,2, Rachel Karchin1,2, Ben M. Webb1,2, David Eramian1,2,4, Min-Yi Shen1,2, Libusha Kelly1,2,5, Francisco Melo3 and Andrej Sali1,2,* 1Department of Biopharmaceutical Sciences and 2Department Pharmaceutical Chemistry, California Institute for Quantitative Biomedical Research, QB3 at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, USA, 3Departamento de Gene´tica Molecular y Microbiologı´a, Facultad de Ciencias Biolo´gicas, Pontificia Universidad Cato´lica de Chile, Alameda 340, Santiago, Chile, 4Graduate Group in Biophysics and 5Graduate Group in Biological and Medical Informatics, University of California, San Francisco, CA, USA Received September 14, 2005; Accepted October 5, 2005 ABSTRACT sites, interactions between yeast proteins, and func- MODBASE (http://salilab.org/modbase) is a database tional consequences of human nsSNPs (LS-SNP, of annotated comparative protein structure models http://salilab.org/LS-SNP). for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated INTRODUCTION modeling pipeline that relies on MODELLER for fold The genome sequencing efforts are providing us with com- assignment, sequence–structure alignment, model plete genetic blueprints for hundreds of organisms, including building and model assessment (http:/salilab.org/ humans. We are now faced with the challenge of assigning, modeller). -
MATCH-G Program
MATCH-G Program The MATCH-G toolset MATCH-G (Mutational Analysis Toolset Comparing wHole Genomes) Download the Toolset The toolset is prepared for use with pombe genomes and a sample genome from Sanger is included. However, it is written in such a way as to allow it to utilize any set or number of chromosomes. Simply follow the naming convention "chromosomeX.contig.embl" where X=1,2,3 etc. For use in terminal windows in Mac OSX or Unix-like environments where Perl is present by default. Toolset Description Version History 2.0 – Bug fixes related to scaling to other genomes. First version of GUI interface. 1.0 – Initial Release. Includes build, alignment, snp, copy, and gap resolution routines in original form. Please note this project in under development and will undergo significant changes. Project Description The MATCH-G toolset has been developed to facilitate the evaluation of whole genome sequencing data from yeasts. Currently, the toolset is written for pombe strains, however the toolset is easily scalable to other yeasts or other sequenced genomes. The included tools assist in the identification of SNP mutations, short (and long) insertion/deletions, and changes in gene/region copy number between a parent strain and mutant or revertant strain. Currently, the toolset is run from the command line in a unix or similar terminal and requires a few additional programs to run noted below. Installation It is suggested that a separate folder be generated for the toolset and generated files. Free disk space proportional to the original read datasets is recommended. The toolset utilizes and requires a few additional free programs: Bowtie rapid alignment software: http://bowtie-bio.sourceforge.net/index.shtml (Langmead B, Trapnell C, Pop M, Salzberg SL. -
Spontaneous Puberty in 46,XX Subjects with Congenital Lipoid Adrenal Hyperplasia
Spontaneous puberty in 46,XX subjects with congenital lipoid adrenal hyperplasia. Ovarian steroidogenesis is spared to some extent despite inactivating mutations in the steroidogenic acute regulatory protein (StAR) gene. K Fujieda, … , T Sugawara, J F Strauss 3rd J Clin Invest. 1997;99(6):1265-1271. https://doi.org/10.1172/JCI119284. Research Article Congenital lipoid adrenal hyperplasia (lipoid CAH) is the most severe form of CAH in which the synthesis of all gonadal and adrenal cortical steroids is markedly impaired. We report here the clinical, endocrinological, and molecular analyses of two unrelated Japanese kindreds of 46,XX subjects affected with lipoid CAH who manifested spontaneous puberty. Phenotypic female infants with 46,XX karyotypes were diagnosed with lipoid CAH as newborns based on a clinical history of failure to thrive, hyperpigmentation, hyponatremia, hyperkalemia, and low basal values of serum cortisol and urinary 17-hydroxycorticosteroid and 17-ketosteroid. These patients responded to treatment with glucocorticoid and 9alpha- fludrocortisone. Spontaneous thelarche occurred in association with increased serum estradiol levels at the age of 10 and 11 yr, respectively. Pubic hair developed at the age of 12 yr 11 mo in one subject and menarche was at the age of 12 yr in both cases. Both subjects reported periodic menstrual bleeding and subsequently developed polycystic ovaries. To investigate the molecular basis of the steroidogenic lesion in these patients, the StAR gene was characterized by PCR and direct DNA sequence -
Large Scale Genomic Rearrangements in Selected Arabidopsis Thaliana T
bioRxiv preprint doi: https://doi.org/10.1101/2021.03.03.433755; this version posted March 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 1 Large scale genomic rearrangements in selected 2 Arabidopsis thaliana T-DNA lines are caused by T-DNA 3 insertion mutagenesis 4 5 Boas Pucker1,2+, Nils Kleinbölting3+, and Bernd Weisshaar1* 6 1 Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec), Bielefeld University, 7 Sequenz 1, 33615 Bielefeld, Germany 8 2 Evolution and Diversity, Department of Plant Sciences, University of Cambridge, Cambridge, 9 United Kingdom 10 3 Bioinformatics Resource Facility, Center for Biotechnology (CeBiTec, Bielefeld University, 11 Sequenz 1, 33615 Bielefeld, Germany 12 + authors contributed equally 13 * corresponding author: Bernd Weisshaar 14 15 BP: [email protected], ORCID: 0000-0002-3321-7471 16 NK: [email protected], ORCID: 0000-0001-9124-5203 17 BW: [email protected], ORCID: 0000-0002-7635-3473 18 19 20 21 22 23 24 25 26 27 28 29 30 page 1 bioRxiv preprint doi: https://doi.org/10.1101/2021.03.03.433755; this version posted March 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. -
An Open-Sourced Bioinformatic Pipeline for the Processing of Next-Generation Sequencing Derived Nucleotide Reads
bioRxiv preprint doi: https://doi.org/10.1101/2020.04.20.050369; this version posted May 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. An open-sourced bioinformatic pipeline for the processing of Next-Generation Sequencing derived nucleotide reads: Identification and authentication of ancient metagenomic DNA Thomas C. Collin1, *, Konstantina Drosou2, 3, Jeremiah Daniel O’Riordan4, Tengiz Meshveliani5, Ron Pinhasi6, and Robin N. M. Feeney1 1School of Medicine, University College Dublin, Ireland 2Division of Cell Matrix Biology Regenerative Medicine, University of Manchester, United Kingdom 3Manchester Institute of Biotechnology, School of Earth and Environmental Sciences, University of Manchester, United Kingdom [email protected] 5Institute of Paleobiology and Paleoanthropology, National Museum of Georgia, Tbilisi, Georgia 6Department of Evolutionary Anthropology, University of Vienna, Austria *Corresponding Author Abstract The emerging field of ancient metagenomics adds to these Bioinformatic pipelines optimised for the processing and as- processing complexities with the need for additional steps sessment of metagenomic ancient DNA (aDNA) are needed in the separation and authentication of ancient sequences from modern sequences. Currently, there are few pipelines for studies that do not make use of high yielding DNA cap- available for the analysis of ancient metagenomic DNA ture techniques. These bioinformatic pipelines are tradition- 1 4 ally optimised for broad aDNA purposes, are contingent on (aDNA) ≠ The limited number of bioinformatic pipelines selection biases and are associated with high costs. -
Sequence Alignment/Map Format Specification
Sequence Alignment/Map Format Specification The SAM/BAM Format Specification Working Group 3 Jun 2021 The master version of this document can be found at https://github.com/samtools/hts-specs. This printing is version 53752fa from that repository, last modified on the date shown above. 1 The SAM Format Specification SAM stands for Sequence Alignment/Map format. It is a TAB-delimited text format consisting of a header section, which is optional, and an alignment section. If present, the header must be prior to the alignments. Header lines start with `@', while alignment lines do not. Each alignment line has 11 mandatory fields for essential alignment information such as mapping position, and variable number of optional fields for flexible or aligner specific information. This specification is for version 1.6 of the SAM and BAM formats. Each SAM and BAMfilemay optionally specify the version being used via the @HD VN tag. For full version history see Appendix B. Unless explicitly specified elsewhere, all fields are encoded using 7-bit US-ASCII 1 in using the POSIX / C locale. Regular expressions listed use the POSIX / IEEE Std 1003.1 extended syntax. 1.1 An example Suppose we have the following alignment with bases in lowercase clipped from the alignment. Read r001/1 and r001/2 constitute a read pair; r003 is a chimeric read; r004 represents a split alignment. Coor 12345678901234 5678901234567890123456789012345 ref AGCATGTTAGATAA**GATAGCTGTGCTAGTAGGCAGTCAGCGCCAT +r001/1 TTAGATAAAGGATA*CTG +r002 aaaAGATAA*GGATA +r003 gcctaAGCTAA +r004 ATAGCT..............TCAGC -r003 ttagctTAGGC -r001/2 CAGCGGCAT The corresponding SAM format is:2 1Charset ANSI X3.4-1968 as defined in RFC1345. -
Assembly Exercise
Assembly Exercise Turning reads into genomes Where we are • 13:30-14:00 – Primer Design to Amplify Microbial Genomes for Sequencing • 14:00-14:15 – Primer Design Exercise • 14:15-14:45 – Molecular Barcoding to Allow Multiplexed NGS • 14:45-15:15 – Processing NGS Data – de novo and mapping assembly • 15:15-15:30 – Break • 15:30-15:45 – Assembly Exercise • 15:45-16:15 – Annotation • 16:15-16:30 – Annotation Exercise • 16:30-17:00 – Submitting Data to GenBank Log onto ILRI cluster • Log in to HPC using ILRI instructions • NOTE: All the commands here are also in the file - assembly_hands_on_steps.txt • If you are like me, it may be easier to cut and paste Linux commands from this file instead of typing them in from the slides Start an interactive session on larger servers • The interactive command will start a session on a server better equipped to do genome assembly $ interactive • Switch to csh (I use some csh features) $ csh • Set up Newbler software that will be used $ module load 454 A norovirus sample sequenced on both 454 and Illumina • The vendors use different file formats unknown_norovirus_454.GACT.sff unknown_norovirus_illumina.fastq • I have converted these files to additional formats for use with the assembly tools unknown_norovirus_454_convert.fasta unknown_norovirus_454_convert.fastq unknown_norovirus_illumina_convert.fasta Set up and run the Newbler de novo assembler • Create a new de novo assembly project $ newAssembly de_novo_assembly • Add read data to the project $ addRun de_novo_assembly unknown_norovirus_454.GACT.sff -
(From GWAS/WGS/WES) to Mechanistic Disease Insight?
How do we go from genetic discoveries (from GWAS/WGS/WES) to mechanistic disease insight? Part I – Functional annotation in risk loci 2021 Online International Statistical Genetics Workshop Danielle Posthuma | [email protected] | @dposthu | https://ctg.cncr.nl What have we learned so far? Theory underlying genetic association Setting up a genome-wide association study Quality control for genetic datasets and analysis Conducting genetic association Several post-gwas analyses including SNP h2, causal modeling, gSEM Primary outcome of a genome-wide genetic association: - Manhattan plot - Summary statistics that include an effect estimate and significance of association per variant GWAS McCarthy et al. Nat. Rev. Gent. (2008) GWAS HOW? McCarthy et al. Nat. Rev. Gent. (2008) How to gain mechanistic insight from genetic discoveries Mendelian or monogenic disorders (influenced by one mutation in one gene) • Segregation analysis (1970s – onwards) detected several genes co- segregating with disease • For each disease a mutation in one gene is sufficient to express that disease • Functional experimentation on these genes involved e.g. knock-out models to investigate that gene’s function • This has been successful for e.g. PKU, Huntington’s disease, breast cancer. • Any mechanistic insight guides treatment development How to gain mechanistic insight from genetic discoveries Polygenic disorders (influenced by 100’s of variants each of small effect) • GWAS (2006s – onwards) detected several genetic loci associated with diseases that are polygenic • For each disease a single genetic variant is not sufficient to express that disease, instead 100’s of variants cumulatively increase risk for disease • Detected loci contain 100’s of variants, sometimes no genes are implicated • Functional experimentation on these variants is not straightforward, mechanistic insight is not easily obtained for polygenic traits GWAS hits for polygenic traits often not directly useful for functional follow-up 4 issues: 1. -
Nonclassic Congenital Lipoid Adrenal Hyperplasia Diagnosed at 17 Months in a Korean Boy with Normal Male Genitalia: Emphasis on Pigmentation As a Diagnostic Clue
Case report https://doi.org/10.6065/apem.2020.25.1.46 Ann Pediatr Endocrinol Metab 2020;25:4651 Nonclassic congenital lipoid adrenal hyperplasia diagnosed at 17 months in a Korean boy with normal male genitalia: emphasis on pigmentation as a diagnostic clue Hosun Bae, MD1, Congenital lipoid adrenal hyperplasia (CLAH) is one of the most fatal conditions Min-Sun Kim, MD1, caused by an abnormality of adrenal and gonadal steroidogenesis. CLAH results Hyojung Park, MD1, from loss-of-function mutations of the steroidogenic acute regulatory (STAR) Ja-Hyun Jang, MD, PhD2, gene; the disease manifests with electrolyte imbalances and hyperpigmentation Jong-Moon Choi, MD, PhD2, in neonates or young infants due to adrenocortical hormone deficiencies, and 46, Sae-Mi Lee, MD, PhD2, XY genetic male CLAH patients can be phenotypically female. Meanwhile, some Sung Yoon Cho, MD, PhD1, patients with STAR mutations develop hyperpigmentation and mild signs of adrenal insufficiency, such as hypoglycemia, after infancy. These patients are classified as Dong-Kyu Jin, MD, PhD1 having nonclassic CLAH (NCCLAH) caused by STAR mutations that retain partial 1Department of Pediatrics, Samsung activity of STAR. We present the case of a Korean boy with normal genitalia who Medical Center, Sungkyunkwan Univer was diagnosed with NCCLAH. He presented with whole-body hyperpigmentation sity School of Medicine, Seoul, Korea and electrolyte abnormalities, which were noted at the age of 17 months after 2GC Genome, Yongin, Korea an episode of sepsis with peritonitis. The compound heterozygous mutations p.Gly221Ser and c.653C>T in STAR were identified by targeted gene-panel sequencing. Skin hyperpigmentation should be considered an important clue for diagnosing NCCLAH.