Identify Audiences, Especially in the Fast-Moving Field of Bioinformatics

An Introduction to NCBI's Bioinformatics Resources

Dr. Medha Devare [email protected]
Life Sciences/Bioinformatics Specialist
Albert R. Mann Library
Cornell University, Ithaca, NY 14853

USAIN 2006: Delivering Information for the New Life Sciences
October 7, 2006

Part I: Introduction to DNA Sequencing
Part II: Data Mining in Bioinformatics

CENTRAL DOGMA OF BIOLOGY
Courtesy: National Human Genome Research Institute

NUCLEOTIDES
Nucleotide = phosphate + pentose sugar + base
http://www.web-books.com/MoBio/Free/Ch3A.htm

PENTOSE SUGARS
http://www.web-books.com/MoBio/Free/Ch3A.htm

NITROGENOUS BASES
Purines: Adenine, Guanine
Pyrimidines: Cytosine, Thymine, Uracil (RNA only)
http://dl.clackamas.cc.or.us/ch106-09/nucleoti.htm

STRUCTURE OF DNA
Courtesy: National Human Genome Research Institute

DNA REPLICATION
http://www.ncc.gmu.edu/dna/repanim.htm

DNA SEQUENCING
http://www.dnalc.org/ddnalc/resources/cycseq.html

CLONING – PLASMID VECTOR
http://www.accessexcellence.org/RC/VL/GG/inserting.html

CLONING – identifying transformed cells
[Figure labels: DNA insert; AmpR; origin of replication]

VECTORS
Vector | Form | Host | Carrying Capacity | Major Uses
Plasmid | Double-stranded circular DNA | E. coli | Up to 15 kb | cDNA libraries; subcloning
Bacteriophage lambda | Virus – linear DNA | E. coli | Up to 25 kb | Genomic and cDNA libraries
Cosmid | Double-stranded circular DNA | E. coli | 30 – 45 kb | Genomic libraries
Bacteriophage P1 | Virus – circular DNA | E. coli | 70 – 90 kb | Genomic libraries
BAC | Bacterial artificial chromosome | E. coli | 100 – 500 kb | Genomic libraries
YAC | Yeast artificial chromosome | Yeast | 250 – 2000 kb | Genomic libraries

GENOME SEQUENCING
Genome sequencing: http://www.pbs.org/wgbh/nova/genome/sequencer.html#
Whole genome shotgun sequencing: http://smcg.cifn.unam.mx/enp-unam/03-EstructuraDelGenoma/animaciones/humanShot.swf

What is bioinformatics?
Research, development or application of computational tools and approaches to expand the use, acquisition, visualization, analysis, organization and archiving of biological, medical, behavioral or health data. [Bioinformatics at the NIH, 2001]
http://grants.nih.gov/grants/bistic/bistic.cfm

Important databases in the public domain
• National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov)
• European Bioinformatics Institute (http://www.ebi.ac.uk/)
• European Molecular Biology Laboratory (http://www.embl.org)
• DNA Data Bank of Japan (http://www.ddbj.nig.ac.jp/Welcome.html)
• TIGR (http://www.tigr.org)

The National Center for Biotechnology Information (NCBI)
Created in 1988 (National Library of Medicine at NIH, Bethesda)
– Establish public databases
– Conduct research in computational biology
– Develop software tools for sequence analysis
– Disseminate biomedical information
From NCBI FieldGuide

NCBI database types – Bibliographic
• Citations for biomedical articles: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
• Free archive of life science journals: http://www.pubmedcentral.nih.gov/
• Books that can be searched online: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books&itool=toolbar
• Human genes/genetic disorders: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
From NCBI FieldGuide

NCBI database types (http://www.ncbi.nlm.nih.gov)
– Sequence (nucleotide; protein)
– Taxonomy
– Genome
– Gene
– Expression
– Structure
From NCBI FieldGuide

Types of Sequence Databases: Primary Databases
– Contain raw and redundant data: original experimental sequences, submitted and "owned" by experimentalists
– Database staff review and organize the data: they don't add, modify or update the records
  Examples: GenBank, SNP, GEO
From NCBI FieldGuide

Types of Sequence Databases: Derivative Databases
– Human-curated (data compilation and correction)
  Examples: LocusLink, OMIM and literature databases
– Computationally derived (auto-partitioning GenBank seqs)
  Example: UniGene
– Combination
  Examples: RefSeq, Genome Assembly
From NCBI FieldGuide

1° Sequence Database: GenBank
• Nucleotide-only sequence database
• Archival (>292,000 organisms)
Submission of GenBank data to NCBI:
– Direct submissions of individual records via Web (BankIt, Sequin)
– Batch submissions of bulk sequences via e-mail (EST, dbGSS, dbSTS)
– FTP accounts for sequencing centers
From NCBI FieldGuide

The International Sequence Database Collaboration
[Diagram: three partner databases, each receiving submissions and updates and sharing data with the others: GenBank (NCBI, NIH; searched through Entrez), EMBL (EBI; searched through SRS) and DDBJ (CIB, NIG; searched through getentry)]
From NCBI FieldGuide

Check for cross-functionality of accession numbers
Accession no. AB062786
EBI: http://www.ebi.ac.uk
DDBJ: http://www.ddbj.nig.ac.jp/

Organization of GenBank: GenBank Divisions (gbdiv)
Records are divided into 18 divisions:
– 1 Patent
– 5 High Throughput
– 12 Traditional

High Throughput (bulk) divisions:
EST | Expressed Sequence Tag
GSS | Genome Survey Sequence
HTG | High Throughput Genomic
STS | Sequence Tagged Site
HTC | High Throughput cDNA
• Batch submission (email and FTP)
• Inaccurate
• Poorly characterized

Traditional divisions:
PRI | Primate
PLN | Plant and Fungal
BCT | Bacterial and Archaeal
INV | Invertebrate
ROD | Rodent
VRL | Viral
VRT | Other Vertebrate
MAM | Mammalian (ex. ROD and PRI)
PHG | Phage
SYN | Synthetic (cloning vectors)
UNA | Unannotated
ENV | Environmental
• Direct submissions (Sequin and BankIt)
• Accurate
• Well characterized
From NCBI FieldGuide

[Figure: annotated GenBank record. Labels: Length; mRNA = cDNA, DNA = genomic; Division; Accession Number; Accession.Version; NCBI's Taxonomy; Feature Table; GenPept Protein ID]

Database searching: http://www.ncbi.nlm.nih.gov/
e.g., pharmacogenetics
• Identifying novel targets for new drugs
  – mapping and identifying genes associated with disease
  – characterizing protein targets for new drugs
• Identifying genetic variants associated with adverse drug reactions
  – e.g., cytochrome P450s = multigene family of enzymes (liver)
  – genetically variable expression = variation in drug efficacy
Adapted from: Wolf et al., British Medical J., 320: 987-990

Potential consequences of polymorphic drug metabolism
• Extended pharmacological effect
• Adverse drug reactions
• Lack of pro-drug activation (e.g., codeine)
• Drug toxicity
• Increased effective dose
• Metabolism by alternate, deleterious pathways
• Exacerbated drug – drug interactions
Adapted from: Wolf et al., British Medical J., 320: 987-990

Common pharmacogenetic polymorphisms in human drug metabolizing enzymes (Weber, W.W. Pharmacogenetics. Oxford, 1997)
Gene | Metaboliser phenotype | Frequency | # of drugs | Examples
CYP2D6 | Poor | White 6%, African American 2% | >100 | codeine, dextromethorphan
CYP2D6 | Ultra-rapid | Ethiopian 20%, Spanish 7% | |
CYP2C9 | Reduced | | >60 | ibuprofen, warfarin
TPMT | Poor | low in all populations | <10 | 6-mercaptopurine, 6-thioguanine

Example: Cytochrome P450 gene – CYP2D6
• CYP2D6 is highly polymorphic (inactive in ~6% of Caucasians)
  – codes for debrisoquine hydroxylase
Adapted from: Wolf et al., British Medical J., 320: 987-990
http://www.ncbi.nlm.nih.gov/

Sequence/structure searching tools
[Diagram: tools arranged along an axis running from sequence to structure; the three search tools each return results]
• Simple sequence search (BLAST)
• Profile-sequence search (HMMER)
• Structure-sequence search (threading)
• Homology modeling (MODELLER)
• Structure-structure search (CE)
Slide courtesy of Pillardy, Ripoll, and Sun (CBSU)

Tool comparison
 | BLAST | HMM | Threading
Sensitivity | Least sensitive | | Most sensitive
Speed | Seconds | Minutes | Hours
DB size | 1 x 10^6 | 1 x 10^6 | 18000 (PDB)
Result interpretation | Relatively easy | | Some expertise required
Slide courtesy of Pillardy, Ripoll, and Sun (CBSU)

Sequence similarity searching
Why do it?
• identify and annotate sequences with no, incomplete, or incorrect annotations (GenBank)
• infer functionality for genes/proteins
• find conserved domains
• assemble genomes; clean up sequences (e.g., suspected cloning vector sequences)
• explore evolutionary relationships
NOTE: Similar sequences may
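
To make the idea of sequence similarity concrete, here is a minimal sketch of one simple measure: percent identity between two sequences that are assumed to be already aligned and of equal length. This is not how BLAST works (BLAST uses heuristic local alignment with statistical scoring); the function name and the two example sequences are made up for illustration and do not come from the slides.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions carrying the same residue.

    Assumption for this sketch: the two sequences are already aligned
    and have the same length; gaps are not treated specially.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (pre-aligned)")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Compare position by position, ignoring upper/lower case.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical 15-base sequences that differ at a single position.
query = "ATGGCGTACGTTAGC"
subject = "ATGGCGTTCGTTAGC"
print(f"{percent_identity(query, subject):.1f}% identity")  # 14 of 15 match: 93.3%
```

A high score from a measure like this only says the sequences look alike; what that implies about function or ancestry needs separate evidence.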