Biological Sequence Database: NCBI


Subject: Bioinformatics
Lesson: Biological sequence database: National Center for Biotechnology Information (NCBI)
Lesson Developer: Sandip Das
College/Department: Department of Botany, University of Delhi

Table of Contents

Chapter: Biological sequence database: National Center for Biotechnology Information (NCBI)
Introduction
Databases at NCBI
Literature
Bookshelf
PubMed
Nucleic Acid
dbEST
dbGSS
PopSet
dbGaP
dbVar
o Genome
o Taxonomy
o PubChem
o Expression analysis
o Protein
Summary
Exercise/ Practice
Glossary
References/ Bibliography/ Further Reading

National Center for Biotechnology Information (NCBI)

NCBI has emerged as the primary free-to-access source of data and analysis tools in the field of computational biology. This free access is possible because funding and publication policies in most countries require researchers to deposit data generated with public funds in a free-to-access central repository. In return, the repository (such as NCBI or EMBL) assigns the data a unique identification number, usually called an accession number, which can also be used to identify the depositor and several other features of the record.

The following section introduces a variety of databases covering a wide range of disciplines. Please note that although the data are organized separately for the sake of simplicity and clarity, in reality all the databases are inter-linked and can be navigated from one to the other. The databases are also associated with their appropriate analysis tools. The following section lists some of the databases that have been created at NCBI.
For the sake of simplicity, the databases in this lesson are divided into three sections: section I deals with publications, literature and small-scale DNA/RNA sequencing projects; section II deals with whole genomes, epigenomes, genome maps, taxonomy and chemical structures; and section III deals with the RNA and protein resources required for “functional genomics”. These sections, marked I, II and III, are dealt with in their respective chapters.

Databases-I:
Literature (PubMed, PubMed Central; NCBI Bookshelf):
DNA and RNA (RefSeq, nucleotide, EST, GSS, WGS, PopSet, trace archive, SRA):

Databases-II:
Genomes (Map Viewer, Genome Workbench, Plant Genome Central, Genome Reference Consortium, Epigenomics, Genomic Structural Variation):
Maps:
Taxonomy:
PubChem Substance:

Databases-III:
Expression analysis-GEO
Proteins (Reference sequences, GenPept, UniProt/SwissProt, PRF, PDB, Protein clusters, Structure, UniGene, CDD):

Entrez is the single-point database search and retrieval system that allows a user to search and retrieve records from “all” databases or from a “specific” database in an interlinked manner.

Figure: Various databases at NCBI can be accessed through the Entrez portal
Source: http://www.ncbi.nlm.nih.gov/sites/gquery

The National Center for Biotechnology Information (NCBI) site is conveniently organized into four major, interlinked domains:
1. Databases
2. Tools
3. Data submission
4. Education

The site map figure depicts the interlinked nature of these domains. The site map page can be reached as follows:
1. Open the NCBI page by typing www.ncbi.nlm.nih.gov in the web browser.
2. Click the “search” button on the home page without entering any keyword.
3. In the top left-hand corner of the webpage, click on “site map” to reach the page.
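An Entrez search against a specific database, as described above, can also be expressed as a web request to NCBI's E-utilities service rather than through the browser. The following is a minimal sketch, assuming the public esearch endpoint and its db, term, retmax and retmode parameters; the helper name build_esearch_url is hypothetical and is not part of the lesson or of any NCBI library. The code only constructs the request URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Address of the E-utilities esearch endpoint (assumed here; the lesson
# itself only uses the Entrez web pages).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an esearch URL for one Entrez database (hypothetical helper).

    db     -- Entrez database name, e.g. "pubmed" or "sra"
    term   -- keyword or query string to search for
    retmax -- maximum number of record IDs to request
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode escapes spaces and special characters in the query term.
    return f"{ESEARCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Restrict the search to a "specific" database by changing db.
    print(build_esearch_url("pubmed", "cholesterol"))
```

Changing the db argument plays the role of the database drop-down menu on the Entrez search page, while term corresponds to the keyword typed into the search box.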
Figure: Various databases are organized into four major domains and are interlinked
Source: http://www.ncbi.nlm.nih.gov/guide/sitemap/

Databases of NCBI

The following section introduces you to some of the databases at NCBI.

Databases-I:
Literature (PubMed, PubMed Central; NCBI Bookshelf):
DNA and RNA (RefSeq, nucleotide, EST, GSS, WGS, PopSet, trace archive, SRA):

Literature:
Bookshelf provides free access to a wealth of information in life sciences and healthcare, which users can browse and retrieve. The information may be in the form of books, documents and policy information from various government agencies and publishers. The Bookshelf titles are organized by subject, by type or by publisher in a searchable or browsable format.

Figure: Bookshelf database at NCBI
Source: http://www.ncbi.nlm.nih.gov/books

PubMed: The second source of literature is PubMed, which comprises millions of peer-reviewed research and review articles, and online books, in the life sciences and allied disciplines. The articles and book chapters also provide web links to related literature and information. Closely linked to PubMed is PubMed Central (PMC), which provides free full-text access to research articles from the biomedical and life sciences and other related subjects.

Figure: PubMed and PubMed Central (PMC) are the key databases at NCBI that provide access to research articles, reviews and books
Source: http://www.ncbi.nlm.nih.gov/pubmed

Nucleotides: The nucleotide resources are divided into several sub-classes based on the genomic source or type.
dbEST: The database of Expressed Sequence Tags (dbEST) catalogues single-pass sequence reads of transcripts from a range of organisms. These reads are used to evaluate the spatio-temporal expression of transcripts and for gene and genome annotation. The majority of EST sequences are short, ranging between 300 and 500 nucleotides, and are generated in large numbers by the several EST projects in progress; ESTs are also derived from projects that use differential display or RACE (Rapid Amplification of cDNA Ends). The expressed sequences in the database can be used to study the global expression profile of an organism at various stages of development and adaptation.

dbGSS: A parallel database, the database of Genome Survey Sequences (dbGSS), hosts random, short, single-pass sequences from the genomes of various organisms. As with dbEST, an analysis of dbGSS can reveal a snapshot of the genomic landscape and composition of an organism, and may thus provide valuable information before embarking on a full-scale genome sequencing project. Both dbEST and dbGSS accept sequences generated through Sanger's di-deoxy chain termination chemistry and are part of the Trace Archive at NCBI.

SRA: EST and GSS data generated through Next Generation Sequencing (NGS) platforms such as Applied Biosystems SOLiD, Roche 454, Illumina 1G and Helicos BioSciences HeliScope are deposited in the Sequence Read Archive (SRA) database, formerly known as the Short Read Archive. Indeed, SRA is emerging as the primary repository for all forms of high-throughput data from EST, GSS and Whole Genome Sequencing projects and other high-throughput genomics studies.

Figure: dbEST contains short single-pass sequence information from cDNA
Source: http://www.ncbi.nlm.nih.gov/nucest

Figure: Sequence Read Archive (SRA)
Source: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?
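A keyword search of SRA returns a list of matching records, from which individual entries are then selected. The following is a minimal sketch of how such a result list might be read programmatically, assuming the search is issued through the E-utilities esearch service with retmode=json, whose reply carries an "esearchresult" object with "count" and "idlist" fields. The sample payload, its count and its IDs are made-up placeholders, and summarize_search is a hypothetical helper name.

```python
import json
from typing import List, Tuple

# Illustrative payload only: the count and IDs are invented placeholders,
# shaped like the "esearchresult" object of an esearch JSON reply.
SAMPLE_RESPONSE = """
{"esearchresult": {"count": "3", "retmax": "3", "retstart": "0",
                   "idlist": ["1001", "1002", "1003"]}}
"""


def summarize_search(raw_json: str) -> Tuple[int, List[str]]:
    """Return (total hit count, list of record IDs) from an esearch reply."""
    result = json.loads(raw_json)["esearchresult"]
    # esearch reports the count as a string, so convert it to an integer.
    return int(result["count"]), list(result["idlist"])


if __name__ == "__main__":
    total, ids = summarize_search(SAMPLE_RESPONSE)
    print(f"{total} records found; first ID: {ids[0]}")
```

The returned IDs stand in for the result list shown on the SRA web page; any one of them could then be used to retrieve a single record for further analysis.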
Figure: SRA showing NGS data
Source: http://www.ncbi.nlm.nih.gov/sra/?term=Cholesterol, http://www.ncbi.nlm.nih.gov/sra/SRX188623

Information in the Sequence Read Archive (SRA) can be accessed by taking the following steps:
1. Go to NCBI by typing www.ncbi.nlm.nih.gov
2. Type any “keyword”, such as “cholesterol”, in the search box
3. Select the “SRA” database using the drop-down menu in the search box
4. Click “search”
5. A list of SRA results for NGS data matching “cholesterol” will appear
6. Select any record for further analysis

PopSet: The PopSet database holds data from studies that compare sequences from the same or different species or taxa for ecosystem-based analysis, phylogenetic analysis, and analysis of genetic variation or mutations. Each set consists of comparable DNA sequences derived from a single locus or gene across a group of organisms or taxa.

Figure: PopSet with a set of sequences deposited as part of molecular phylogenetic studies employing ribosomal DNA gene sequences
Source: http://www.ncbi.nlm.nih.gov/sites/gquery

dbGaP: Studies that use genome-wide association studies (GWAS), medical resequencing and molecular diagnostics to establish and analyse relationships between genotype and phenotype are archived in the database of Genotypes and Phenotypes (dbGaP).

Figure: The dbGaP database contains information on the relationship between genotype and phenotype
Source: http://www.ncbi.nlm.nih.gov/gap

dbVar: Data on large-scale genomic variation, such as insertions and deletions, and on the relationship between such variation and phenotype are held in the database of Genomic Structural Variation (dbVar).
Figure: dbVar at NCBI is the database of genomic structural variation for various genomes
Source: http://www.ncbi.nlm.nih.gov/dbvar

This section deals with databases that are specific to genome sequencing and analysis tools, maps, taxonomy and chemical substances, and has been grouped under Databases-II.

Databases-II:
Genomes (Map Viewer, Genome Workbench, Plant Genome Central,