Genome Size What Is Genome Size? Remember What Is a Genome?
LECTURE 5: GENOME DIVERSITY: SIZE
19/20 JANUARY 2015
Smith

Size doesn’t matter

Recap: the BIG tree of life; endosymbiosis; genetic mergers; genetic mosaics; genetic compartments; small genomes.

What genetic compartments?
Bacteria, Archaea, nuclei, mitochondria, chloroplasts, viruses.
Their genomes: size, structure, content.

Genome size: what is genome size?
Remember, what is a genome? A set of genetic instructions within a biological compartment.
Genome size is the length of those instructions.

Units of genome size (length)
1 base pair (bp), e.g. A–T
1,000 base pairs (kb)
1,000,000 base pairs (Mb)
1,000,000,000 base pairs (Gb)

Units of genome size (mass)
DNA mass is given in picograms (pg): 1 pg ≈ 1,000,000,000 bp.
Haploid human nuclear genome: mass ~3 pg; length ~3 billion bp (~3 Gb).

How to measure genome size? Sequence and assemble
(More on genome sequencing later in the course.)
[Figure: Polytomella example. Labels: nuclear genome (nucleus DNA); mitochondrial genome (mitochondrion DNA); chloroplast genome loss (X); total DNA; commercial sequencing; millions of reads assemble into contigs; size.]
Not good for big genomes: repeats confuse the computer algorithm; gaps; too much data, too little computer; too few reads, low coverage. (pg. 632-636)

How to measure genome size? Staining and imaging
Cell nucleus DNA is stained with a DNA stain (Schiff reagent) and examined by high-powered digital image analysis microscopy.
Stained nuclei: pixel intensity is related to genome size (Feulgen Image Analysis Densitometry; T. Ryan Gregory).
? How would being haploid vs diploid change your interpretation of this?

How to measure genome size? Other techniques…
For BIG and small genomes: gel electrophoresis of DNA; flow cytometry. (pg. 453-455; LINK)

Exploring genome size
Expectation: genome size rises with complexity: viruses (tiny), bacteria (small), unicellular eukaryotes (medium), multicellular eukaryotes (big).
Observation (X): eukaryote genomes range from small to medium to massive.
Discordance between complexity & genome size: the C-value paradox (Hewson Swift).

Protopterus aethiopicus: 130 billion nucleotides
Pratylenchus coffeae: 20 million nucleotides
Paris japonica: 150 billion nucleotides
Genlisea margaretae: 60 million nucleotides
Polychaos dubium: 600 billion nucleotides
Comparison: 150 billion nucleotides vs 3 billion nucleotides

REVIEWS
Box 1 | Extensive variation in genome size within and among the main groups of life
[Figure: genome size by group. Groups: Mammals, Birds, Reptiles, Frogs, Salamanders, Lungfishes, Teleost fishes, Chondrostean fishes, Cartilaginous fishes, Jawless fishes, Non-vertebrate chordates, Crustaceans, Insects, Arachnids, Myriapods, Molluscs, Annelids, Echinoderms, Water bears (Tardigrada), Flatworms (Platyhelminthes), Rotifers, Red algae (Rhodophyta), Green algae (Chlorophyta), Brown algae (Phaeophyta), Flowering plants (Angiosperms), Non-flowering seed plants (Gymnosperms), Ferns (Monilophytes), Club mosses (Lycophytes), Mosses and kin (Bryophytes), Roundworms (Nematoda), Cnidarians, Sponges (Porifera). Slide annotation: Microsporidia, smallest nuclear genome (2 Mb).]
Ever since the first general surveys of nuclear DNA content were carried out in the early 1950s it has been apparent that eukaryotic genome sizes vary enormously and that this is unrelated to intuitive ideas of morphological complexity (ref. 2). This discrepancy between genome size and complexity remains clear more than half a century later, with genome sizes now available for nearly 9,000 species of animals and plants (refs 10,11). In prokaryotes, genome size and gene number are strongly correlated (ref. 86), but in eukaryotes the vast majority of nuclear DNA is non-coding (FIG. 1; BOX 3).
Nevertheless, there is some overlap in genome size between the largest bacteria and the smallest parasitic protists. The figure illustrates the means and overall ranges of genome size that have been observed so far in the main groups of living organisms, and are loosely arranged according to common ideas of complexity to further emphasize the disparity between this parameter and genome size. Some commonly cited extreme values for amoebae (700,000 Mb) have been omitted, as there is considerable uncertainty about the accuracy of these measurements and the ploidy level of the species involved (refs 10,87).
[Figure, continued. Groups: Fungi, Protozoa, Bacteria, Archaea. x-axis: Log10 C-value (Mb), −1 to 6. Slide overlay: eukaryotic microbes, Bacteria, Archaea, Mitochondria, Chloroplasts, Viruses; x-axis: Genome size (bp), 10^3 to 10^12.]

… C-value enigma will require the integration of insights derived from various disciplines including cytogenetics, cell biology, morphology, developmental biology, physiology, evolutionary theory, phylogenetics, ecology (BOX 2) and, as argued here, complete genome sequencing.

A detailed review of either genome sequencing or genome size is neither the intent nor within the scope of this discussion (for this, see REFS 10–12). Instead, the following sections outline some crucial new insights into the study of genome size that have been derived from complete sequences, and the importance of genome size in the generation and interpretation of genome sequences. The key message throughout this article is that considerable benefits are to be had by bridging the current divide between sequence and size.

Using sequences to understand sizes
Most previous work on genome-size evolution has involved carrying out interspecific comparisons of total DNA content, mostly to the exclusion of gene-level analyses. In particular, the primary focus has been on correlating variation in DNA content with a range of parameters, from the sizes of individual chromosomes to the geographical distribution of species (refs 10,11,13–15; BOX 2). Phenotypic associations such as these have had an important role in shaping discussions of genome-size evolution, but the obvious problem is that they deal only with the subset of the C-value enigma that relates to the implications of DNA-content variation. The equally important components of the puzzle that involve the sub-genomic processes and specific sequences that generate variation in genome size have received less attention. For the most part, this is because these issues can only be examined in detail through large-scale comparisons of DNA sequences, an approach that has become possible only relatively recently.

Fortunately, interest in the molecular bases of genome-size change has been increasing steadily over the past 10 years. This has included not only rudimentary analyses of the sequences and processes that add to genomic bulk, but also of previously overlooked mechanisms for genome shrinkage. The net result has been a recognition that genome sizes can change, in either direction, by various processes that operate at many physical and temporal scales, from individual replication events within genomes to filtering at the level of populations and higher-order lineages (refs 10,15; BOX 2). Some specific contributions of large-scale sequencing to this new understanding of genome-size change are highlighted in the following sections. A few warnings are also provided in an effort to prevent an overextension of these valuable, but still limited, genome-sequence data.

700 | SEPTEMBER 2005 | VOLUME 6 | www.nature.com/reviews/genetics | © 2005 Nature Publishing Group

Genome size is:
Hugely variable within and among lineages.
This is true for all types of genome (e.g. chloroplasts, nuclei).

What do big genomes have that little genomes don’t have?
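To make the units and the size range concrete, here is a minimal Python sketch. It assumes the slides' rule of thumb (1 pg ≈ 1,000,000,000 bp) and uses three genome sizes quoted earlier in the lecture; the function names (pg_to_bp, format_bp, fold_difference) are illustrative only and do not come from any library or from the lecture itself.

```python
import math

# Rule of thumb from the slides (an approximation): 1 pg of DNA ≈ 1,000,000,000 bp.
BP_PER_PG = 1_000_000_000

# Length units from the slides, largest first.
UNITS = [("Gb", 1_000_000_000), ("Mb", 1_000_000), ("kb", 1_000), ("bp", 1)]


def pg_to_bp(mass_pg):
    """Convert a DNA mass in picograms to an approximate length in base pairs."""
    return mass_pg * BP_PER_PG


def format_bp(n_bp):
    """Express a length in bp using the largest unit that fits (bp, kb, Mb, Gb)."""
    for name, size in UNITS:
        if n_bp >= size:
            return f"{n_bp / size:g} {name}"
    return f"{n_bp:g} bp"


def fold_difference(size_a_bp, size_b_bp):
    """Return how many times longer the bigger genome is than the smaller one."""
    return max(size_a_bp, size_b_bp) / min(size_a_bp, size_b_bp)


# Genome sizes quoted in the slides, in nucleotides.
lungfish = 130_000_000_000  # Protopterus aethiopicus
nematode = 20_000_000       # Pratylenchus coffeae
human = pg_to_bp(3)         # haploid human nuclear genome, ~3 pg

print("human:", format_bp(human))
print("lungfish:", format_bp(lungfish), "| nematode:", format_bp(nematode))

fold = fold_difference(lungfish, nematode)
print("fold difference:", fold, "| orders of magnitude:", round(math.log10(fold), 2))
```

With these numbers the lungfish genome is 6,500 times longer than the nematode genome, which is why the Box 1 figure plots genome size on a log10 axis.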
Sequence some genomes & find out

Whole-genome sequencing
5 kb (1977)
16 kb (1981, mitochondrion)
156 kb (1986, chloroplast)
2 Mb (1995)
12 Mb (1996)
100 Mb (1998)
3 Gb (2001)
[Plot: % non-coding (0 to 100) against Genome Size (tiny to massive).]

What do big genomes have that little genomes don’t have?
The answer to the C-value paradox: non-coding DNA.

What is non-coding DNA?
DNA that does not encode proteins or functional RNAs.
Coding DNA encodes messenger RNA (translated into protein), transfer RNA and ribosomal RNA.

Two types of non-coding DNA
1. The DNA between genes: “intergenic DNA”
   gene A | non-coding DNA | gene B | non-coding DNA | gene C
2. The DNA between exons: “intronic DNA”
   gene A: exon 1 | intron 1 | exon 2 | intron 2 | exon 3

130 billion nucleotides: 99.9% non-coding
Microsporidian parasites: smallest nuclear genomes, >90% coding DNA

Another example: mitochondrion DNA
REVIEWS
Box 1 | Extensive variation in genome size within and among the main groups of life
[Figure repeated from above: genome size by group, Mammals through Club mosses (Lycophytes).]
Ever since the first general surveys of nuclear DNA content were carried out in the early 1950s it has been apparent that eukaryotic genome sizes vary enormously and that this is unrelated to intuitive ideas of morphological complexity (ref. 2). This discrepancy between genome size and complexity remains clear more than half a century later, with genome sizes now available for nearly 9,000 species of animals and plants (refs 10,11). In prokaryotes, genome size and gene number are strongly correlated (ref. 86), but in eukaryotes the vast majority