A Combined RAD-Seq and WGS Approach Reveals the Genomic
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Variant Call Format Specification Vcfv4.3 and Bcfv2.2
The Variant Call Format Specification VCFv4.3 and BCFv2.2 27 Jul 2021 The master version of this document can be found at https://github.com/samtools/hts-specs. This printing is version 1715701 from that repository, last modified on the date shown above. 1 Contents 1 The VCF specification 4 1.1 An example . .4 1.2 Character encoding, non-printable characters and characters with special meaning . .4 1.3 Data types . .4 1.4 Meta-information lines . .5 1.4.1 File format . .5 1.4.2 Information field format . .5 1.4.3 Filter field format . .5 1.4.4 Individual format field format . .6 1.4.5 Alternative allele field format . .6 1.4.6 Assembly field format . .6 1.4.7 Contig field format . .6 1.4.8 Sample field format . .7 1.4.9 Pedigree field format . .7 1.5 Header line syntax . .7 1.6 Data lines . .7 1.6.1 Fixed fields . .7 1.6.2 Genotype fields . .9 2 Understanding the VCF format and the haplotype representation 11 2.1 VCF tag naming conventions . 12 3 INFO keys used for structural variants 12 4 FORMAT keys used for structural variants 13 5 Representing variation in VCF records 13 5.1 Creating VCF entries for SNPs and small indels . 13 5.1.1 Example 1 . 13 5.1.2 Example 2 . 14 5.1.3 Example 3 . 14 5.2 Decoding VCF entries for SNPs and small indels . 14 5.2.1 SNP VCF record . 14 5.2.2 Insertion VCF record . -
Temporal Flexibility of Gene Regulatory Network Underlies a Novel Wing Pattern in Flies
Temporal flexibility of gene regulatory network underlies a novel wing pattern in flies Héloïse D. Dufoura,b, Shigeyuki Koshikawa (越川滋行)a,b,1,2,3, and Cédric Fineta,b,3,4 aHoward Hughes Medical Institute, University of Wisconsin, Madison, WI 53706; and bLaboratory of Molecular Biology, University of Wisconsin, Madison, WI 53706 Edited by Denis Duboule, University of Geneva, Geneva 4, Switzerland, and approved April 6, 2020 (received for review February 3, 2020) Organisms have evolved endless morphological, physiological, and Nevertheless, it does not explain how the new expression of behavioral novel traits during the course of evolution. Novel traits the coopted toolkit genes does not interfere with the develop- were proposed to evolve mainly by orchestration of preexisting ment of the tissue. Some authors have suggested that the reuse genes. Over the past two decades, biologists have shown that of toolkit genes might only happen during late development after cooption of gene regulatory networks (GRNs) indeed underlies completion of the early function of the redeployed genes (21, 22, numerous evolutionary novelties. However, very little is known 29). However, little is known about the properties of a GRN that about the actual GRN properties that allow such redeployment. allow the cooption of one or several of its components/genes Here we have investigated the generation and evolution of the without impairing the development of the tissue. complex wing pattern of the fly Samoaia leonensis. We show that In this study, we use the complex wing pigmentation pattern of the transcription factor Engrailed is recruited independently from the fly species Samoaia leonensis as a model to address how the the other players of the anterior–posterior specification network to generate a new wing pattern. -
A Semantic Standard for Describing the Location of Nucleotide and Protein Feature Annotation Jerven T
Bolleman et al. Journal of Biomedical Semantics (2016) 7:39 DOI 10.1186/s13326-016-0067-z RESEARCH Open Access FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation Jerven T. Bolleman1*, Christopher J. Mungall2, Francesco Strozzi3, Joachim Baran4, Michel Dumontier5, Raoul J. P. Bonnal6, Robert Buels7, Robert Hoehndorf8, Takatomo Fujisawa9, Toshiaki Katayama10 and Peter J. A. Cock11 Abstract Background: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Conclusions: Our ontology allows -
Functional Genetic Approaches to Provide Evidence for the Role of Toolkit Genes in the Evolution of Complex Color Patterns in Drosophila Guttifera
Michigan Technological University Digital Commons @ Michigan Tech Dissertations, Master's Theses and Master's Reports 2021 FUNCTIONAL GENETIC APPROACHES TO PROVIDE EVIDENCE FOR THE ROLE OF TOOLKIT GENES IN THE EVOLUTION OF COMPLEX COLOR PATTERNS IN DROSOPHILA GUTTIFERA Mujeeb Olushola Shittu Michigan Technological University, [email protected] Copyright 2021 Mujeeb Olushola Shittu Recommended Citation Shittu, Mujeeb Olushola, "FUNCTIONAL GENETIC APPROACHES TO PROVIDE EVIDENCE FOR THE ROLE OF TOOLKIT GENES IN THE EVOLUTION OF COMPLEX COLOR PATTERNS IN DROSOPHILA GUTTIFERA", Open Access Dissertation, Michigan Technological University, 2021. https://doi.org/10.37099/mtu.dc.etdr/1174 Follow this and additional works at: https://digitalcommons.mtu.edu/etdr Part of the Biology Commons, Developmental Biology Commons, Evolution Commons, Molecular Genetics Commons, and the Other Cell and Developmental Biology Commons FUNCTIONAL GENETIC APPROACHES TO PROVIDE EVIDENCE FOR THE ROLE OF TOOLKIT GENES IN THE EVOLUTION OF COMPLEX COLOR PATTERNS IN DROSOPHILA GUTTIFERA By Mujeeb Olushola Shittu A DISSERTATION Submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY In Biochemistry and Molecular Biology MICHIGAN TECHNOLOGICAL UNIVERSITY 2021 ©2021 Mujeeb Olushola Shittu This dissertation has been approved in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY in Biochemistry and Molecular Biology. Department of Biological Sciences Dissertation Advisor: Dr. Thomas Werner Committee Member: Dr. Chandrashekhar -
VCF/Plotein: a Web Application to Facilitate the Clinical Interpretation Of
bioRxiv preprint doi: https://doi.org/10.1101/466490; this version posted November 14, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Manuscript (All Manuscript Text Pages, including Title Page, References and Figure Legends) VCF/Plotein: A web application to facilitate the clinical interpretation of genetic and genomic variants from exome sequencing projects Raul Ossio MD1, Diego Said Anaya-Mancilla BSc1, O. Isaac Garcia-Salinas BSc1, Jair S. Garcia-Sotelo MSc1, Luis A. Aguilar MSc1, David J. Adams PhD2 and Carla Daniela Robles-Espinoza PhD1,2* 1Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Querétaro, Qro., 76230, Mexico 2Experimental Cancer Genetics, Wellcome Sanger Institute, Hinxton Cambridge, CB10 1SA, UK * To whom correspondence should be addressed. Carla Daniela Robles-Espinoza Tel: +52 55 56 23 43 31 ext. 259 Email: [email protected] 1 bioRxiv preprint doi: https://doi.org/10.1101/466490; this version posted November 14, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. CONFLICT OF INTEREST No conflicts of interest. 2 bioRxiv preprint doi: https://doi.org/10.1101/466490; this version posted November 14, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. ABSTRACT Purpose To create a user-friendly web application that allows researchers, medical professionals and patients to easily and securely view, filter and interact with human exome sequencing data in the Variant Call Format (VCF). -
Infravec2 Open Research Data Management Plan
INFRAVEC2 OPEN RESEARCH DATA MANAGEMENT PLAN Authors: Andrea Crisanti, Gareth Maslen, Andy Yates, Paul Kersey, Alain Kohl, Clelia Supparo, Ken Vernick Date: 10th July 2020 Version: 3.0 Overview Infravec2 will align to Open Research Data, as follows: Data Types and Standards Infravec2 will generate a variety of data types, including molecular data types: genome sequence and assembly, structural annotation (gene models, repeats, other functional regions) and functional annotation (protein function assignment), variation data, and transcriptome data; arbovirus and malaria experimental infection data, linked to archived samples; and microbiome data (Operational Taxonomic Units), including natural virome composition. All data will be released according to the appropriate standards and formats for each data type. For example, DNA sequence will be released in FASTA format; variant calls in Variant Call Format; sequence alignments in BAM (Binary Alignment Map) and CRAM (Compressed Read Alignment Map) formats, etc. We will strongly encourage the organisation of linked data sets as Track Hubs, a mechanism for publishing a set of linked genomic data that aids data discovery, sharing, and selection for subsequent analysis. We will develop internal standards within the consortium to define minimal metadata that will accompany all data sets, following the template of the Minimal Information Standards for Biological and Biomedical Investigations (http://www.dcc.ac.uk/resources/metadata-standards/mibbi-minimum-information-biological- and-biomedical-investigations). Data Exploitation, Accessibility, Curation and Preservation All molecular data for which existing public data repositories exist will be submitted to such repositories on or before the publication of written manuscripts, with early release of data (i.e. -
Semantics of Data for Life Sciences and Reproducible Research[Version 1
F1000Research 2020, 9:136 Last updated: 16 AUG 2021 OPINION ARTICLE BioHackathon 2015: Semantics of data for life sciences and reproducible research [version 1; peer review: 2 approved] Rutger A. Vos 1,2, Toshiaki Katayama 3, Hiroyuki Mishima4, Shin Kawano 3, Shuichi Kawashima3, Jin-Dong Kim3, Yuki Moriya3, Toshiaki Tokimatsu5, Atsuko Yamaguchi 3, Yasunori Yamamoto3, Hongyan Wu6, Peter Amstutz7, Erick Antezana 8, Nobuyuki P. Aoki9, Kazuharu Arakawa10, Jerven T. Bolleman 11, Evan Bolton12, Raoul J. P. Bonnal13, Hidemasa Bono 3, Kees Burger14, Hirokazu Chiba15, Kevin B. Cohen16,17, Eric W. Deutsch18, Jesualdo T. Fernández-Breis19, Gang Fu12, Takatomo Fujisawa20, Atsushi Fukushima 21, Alexander García22, Naohisa Goto23, Tudor Groza24,25, Colin Hercus26, Robert Hoehndorf27, Kotone Itaya10, Nick Juty28, Takeshi Kawashima20, Jee-Hyub Kim28, Akira R. Kinjo29, Masaaki Kotera30, Kouji Kozaki 31, Sadahiro Kumagai32, Tatsuya Kushida 33, Thomas Lütteke 34,35, Masaaki Matsubara 36, Joe Miyamoto37, Attayeb Mohsen 38, Hiroshi Mori39, Yuki Naito3, Takeru Nakazato3, Jeremy Nguyen-Xuan40, Kozo Nishida41, Naoki Nishida42, Hiroyo Nishide15, Soichi Ogishima43, Tazro Ohta3, Shujiro Okuda44, Benedict Paten45, Jean-Luc Perret46, Philip Prathipati38, Pjotr Prins47,48, Núria Queralt-Rosinach 49, Daisuke Shinmachi9, Shinya Suzuki 30, Tsuyosi Tabata50, Terue Takatsuki51, Kieron Taylor 28, Mark Thompson52, Ikuo Uchiyama 15, Bruno Vieira53, Chih-Hsuan Wei12, Mark Wilkinson 54, Issaku Yamada 36, Ryota Yamanaka55, Kazutoshi Yoshitake56, Akiyasu C. Yoshizawa50, Michel -
Amanitin Resistance in Drosophila Melanogaster: a Genome-Wide Association Approach
RESEARCH ARTICLE α-amanitin resistance in Drosophila melanogaster: A genome-wide association approach Chelsea L. Mitchell1, Catrina E. Latuszek1, Kara R. Vogel2, Ian M. Greenlund1, Rebecca E. Hobmeier1, Olivia K. Ingram1, Shannon R. Dufek1, Jared L. Pecore1, Felicia R. Nip3, Zachary J. Johnson4, Xiaohui Ji5, Hairong Wei5, Oliver Gailing5, Thomas Werner1* 1 Department of Biological Sciences, Michigan Technological University, 1400 Townsend Dr., Houghton, MI, United States of America, 2 Department of Neurology, University of Wisconsin School of Medicine and Public Health, 1300 University Ave., Madison, WI, United States of America, 3 College of Human Medicine, a1111111111 Michigan State University, Clinical Center, East Lansing, MI, United States of America, 4 U.S. Forest Service, a1111111111 Salt Lake Ranger District 6944 S, 3000 E, Salt Lake City, UT, United States of America, 5 School of Forest a1111111111 Resources and Environmental Sciences, Michigan Technological University, 1400 Townsend Dr., Houghton, MI, United States of America a1111111111 a1111111111 * [email protected] Abstract OPEN ACCESS We investigated the mechanisms of mushroom toxin resistance in the Drosophila Genetic Citation: Mitchell CL, Latuszek CE, Vogel KR, Reference Panel (DGRP) fly lines, using genome-wide association studies (GWAS). While Greenlund IM, Hobmeier RE, Ingram OK, et al. (2017) α-amanitin resistance in Drosophila Drosophila melanogaster avoids mushrooms in nature, some lines are surprisingly resistant melanogaster: A genome-wide association to α-amanitinÐa toxin found solely in mushrooms. This resistance may represent a pre- approach. PLoS ONE 12(2): e0173162. adaptation, which might enable this species to invade the mushroom niche in the future. doi:10.1371/journal.pone.0173162 Although our previous microarray study had strongly suggested that pesticide-metabolizing Editor: Gregg Roman, University of Mississippi, detoxification genes confer α-amanitin resistance in a Taiwanese D. -
Practical Guideline for Whole Genome Sequencing Disclosure
Practical Guideline for Whole Genome Sequencing Disclosure Kwangsik Nho Assistant Professor Center for Neuroimaging Department of Radiology and Imaging Sciences Center for Computational Biology and Bioinformatics Indiana University School of Medicine • Kwangsik Nho discloses that he has no relationships with commercial interests. What You Will Learn Today • Basic File Formats in WGS • Practical WGS Analysis Pipeline • WGS Association Analysis Methods Whole Genome Sequencing File Formats WGS Sequencer Base calling FASTQ: raw NGS reads Aligning SAM: aligned NGS reads BAM Variant Calling VCF: Genomic Variation How have BIG data problems been solved in next generation sequencing? gkno.me Whole Genome Sequencing File Formats • FASTQ: text-based format for storing both a DNA sequence and its corresponding quality scores (File sizes are huge (raw text) ~300GB per sample) @HS2000-306_201:6:1204:19922:79127/1 ACGTCTGGCCTAAAGCACTTTTTCTGAATTCCACCCCAGTCTGCCCTTCCTGAGTGCCTGGGCAGGGCCCTTGGGGAGCTGCTGGTGGGGCTCTGAATGT + BC@DFDFFHHHHHJJJIJJJJJJJJJJJJJJJJJJJJJHGGIGHJJJJJIJEGJHGIIJJIGGIIJHEHFBECEC@@D@BDDDDD<<?DB@BDDCDCDDC @HS2000-306_201:6:1204:19922:79127/2 GGGTAAAAGGTGTCCTCAGCTAATTCTCATTTCCTGGCTCTTGGCTAATTCTCATTCCCTGGGGGCTGGCAGAAGCCCCTCAAGGAAGATGGCTGGGGTC + BCCDFDFFHGHGHIJJJJJJJJJJJGJJJIIJCHIIJJIJIJJJJIJCHEHHIJJJJJJIHGBGIIHHHFFFDEEDED?B>BCCDDCBBDACDD8<?B09 @HS2000-306_201:4:1106:6297:92330/1 CACCATCCCGCGGAGCTGCCAGATTCTCGAGGTCACGGCTTACACTGCGGAGGGCCGGCAACCCCGCCTTTAATCTGCCTACCCAGGGAAGGAAAGCCTC + CCCFFFFFHGHHHJIJJJJJJJJIJJIIIGIJHIHHIJIGHHHHGEFFFDDDDDDDDDDD@ADBDDDDDDDDDDDDEDDDDDABCDDB?BDDBDCDDDDD -
Assembly, Annotation, and Polymorphic Characterization of the Erysiphe Necator Transcriptome
Rochester Institute of Technology RIT Scholar Works Theses 8-10-2012 Assembly, annotation, and polymorphic characterization of the Erysiphe necator transcriptome Jason Myers Follow this and additional works at: https://scholarworks.rit.edu/theses Recommended Citation Myers, Jason, "Assembly, annotation, and polymorphic characterization of the Erysiphe necator transcriptome" (2012). Thesis. Rochester Institute of Technology. Accessed from This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected]. Assembly, Annotation, and Polymorphic Characterization of the Erysiphe necator Transcriptome by Jason Myers Submitted in partial fulfillment of the requirements for the Master of Science degree in Bioinformatics at Rochester Institute of Technology. Department of Biological Sciences School of Life Sciences Rochester Institute of Technology Rochester, NY August 10, 2012 Committee: Dr. Gary Skuse Associate Head of the School Life Sciences/Professor/Committee Member Dr. Michael Osier Bioinformatics Program Head/Program Advisor/Associate Professor/Committee Member Dr. Dina Newman Thesis Advisor/Associate Professor Dr. Lance Cadle-Davidson Associate Thesis Advisor/Plant Pathologist USDA-ARS Dr. Angela Baldo Thesis Project Advisor/Computational Biologist USDA-ARS II Abstract: The objectives of this study were to develop a transcriptomic reference resource and to characterize polymorphism between isolates of Erysiphe necator (syn. Uncinula necator), grape powdery mildew. The wine and fresh fruit markets are economically vital to many countries worldwide, and E. necator infection can cause severe crop damage and subsequent financial loss. Most of the publicly available sequence data for Erysiphales are from research done on Blumeria graminis f. -
Gain of Cis-Regulatory Activities Underlies Novel Domains of Wingless Gene Expression in Drosophila
Gain of cis-regulatory activities underlies novel domains of wingless gene expression in Drosophila Shigeyuki Koshikawaa,b,c, Matt W. Giorgiannia,b, Kathy Vaccaroa,b, Victoria A. Kassnera,b, John H. Yoderd, Thomas Wernere, and Sean B. Carrolla,b,1 aLaboratory of Molecular Biology, University of Wisconsin, Madison, WI 53706; bHoward Hughes Medical Institute, University of Wisconsin, Madison, WI 53706; cThe Hakubi Center for Advanced Research and Graduate School of Science, Kyoto University, Kyoto 606-8502, Japan; dDepartment of Biological Sciences, University of Alabama, Tuscaloosa, AL 35487; and eDepartment of Biological Sciences, Michigan Technological University, Houghton, MI 49931 Contributed by Sean B. Carroll, May 10, 2015 (sent for review April 2, 2015; reviewed by Michael Eisen and Gregory A. Wray) Changes in gene expression during animal development are largely In this case, the Dll protein is said to have been co-opted in the responsible for the evolution of morphological diversity. However, evolution of a new morphological trait. the genetic and molecular mechanisms responsible for the origins However, the mechanism underlying the co-option of Dll is of new gene-expression domains have been difficult to elucidate. not known in this case, nor for any other instances of the co- Here, we sought to identify molecular events underlying the origins option of regulatory genes. It is not known, for example, whether of three novel features of wingless (wg) gene expression that new features of gene expression evolve via the de novo origin of are associated with distinct pigmentation patterns in Drosophila enhancers or through the transposition or modification of existing guttifera. -
Biomolecule and Bioentity Interaction Databases in Systems Biology: a Comprehensive Review
biomolecules Review Biomolecule and Bioentity Interaction Databases in Systems Biology: A Comprehensive Review Fotis A. Baltoumas 1,* , Sofia Zafeiropoulou 1, Evangelos Karatzas 1 , Mikaela Koutrouli 1,2, Foteini Thanati 1, Kleanthi Voutsadaki 1 , Maria Gkonta 1, Joana Hotova 1, Ioannis Kasionis 1, Pantelis Hatzis 1,3 and Georgios A. Pavlopoulos 1,3,* 1 Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece; zafeiropoulou@fleming.gr (S.Z.); karatzas@fleming.gr (E.K.); [email protected] (M.K.); [email protected] (F.T.); voutsadaki@fleming.gr (K.V.); [email protected] (M.G.); hotova@fleming.gr (J.H.); [email protected] (I.K.); hatzis@fleming.gr (P.H.) 2 Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark 3 Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 11527 Athens, Greece * Correspondence: baltoumas@fleming.gr (F.A.B.); pavlopoulos@fleming.gr (G.A.P.); Tel.: +30-210-965-6310 (G.A.P.) Abstract: Technological advances in high-throughput techniques have resulted in tremendous growth Citation: Baltoumas, F.A.; of complex biological datasets providing evidence regarding various biomolecular interactions. Zafeiropoulou, S.; Karatzas, E.; To cope with this data flood, computational approaches, web services, and databases have been Koutrouli, M.; Thanati, F.; Voutsadaki, implemented to deal with issues such as data integration, visualization, exploration, organization, K.; Gkonta, M.; Hotova, J.; Kasionis, scalability, and complexity. Nevertheless, as the number of such sets increases, it is becoming more I.; Hatzis, P.; et al. Biomolecule and and more difficult for an end user to know what the scope and focus of each repository is and how Bioentity Interaction Databases in redundant the information between them is.