Please Consider the Environment Before Printing This Thesis. This Document Is Optimized for Duplex Printing

Total Page:16

File Type:pdf, Size:1020Kb

Please Consider the Environment Before Printing This Thesis. This Document Is Optimized for Duplex Printing Please consider the environment before printing this thesis. This document is optimized for duplex printing. Telomere length of kakapo and other New Zealand birds - assessment of methods and applications A thesis submitted in partial fulfilment of the requirements for the degree of PhD in Cellular and Molecular Biology at the University of Canterbury by Thorsten Horn University of Canterbury 2008 Abstract Abstract The age structure of populations is an important and often unresolved factor in ecology and wildlife management. Parameters like onset of reproduction and senescence, reproductive success and survival rate are tightly correlated with age. Unfortunately, age information of wild animals is not easy to obtain, especially for birds, where few anatomical markers of age exist. Longitudinal age data from birds banded as chicks are rare, particularly in long lived species. Age estimation in such species would be extremely useful as their long life span typically indicates slow population growth and potentially the need for protection and conservation. Telomere length change has been suggested as a universal marker for ageing vertebrates and potentially other animals. This method, termed molecular ageing, is based on a shortening of telomeres with each cell division. In birds, the telomere length of erythrocytes has been reported to decline with age, as the founder cells (haematopoietic stem cells) divide to renew circulating red blood cells. I measured telomere length in kakapo, the world largest parrot and four other bird species (Buller’s albatross, kea, New Zealand robin and saddleback) using telomere restriction fragment analysis (TRF) to assess the potential for molecular ageing in these species. After providing an overview of methods to measure telomere length, I describe how one of them (TRF) measures telomere length by quantifying the size distribution of terminal restriction fragments using southern blot of in-gel hybridization (Chapter 2). Although TRF is currently the ‘gold standard’ to measure telomere length, it suffers from various technical problems that can compromise precision and accuracy of telomere length estimation. In addition, there are many variations of the protocol, complicating comparisons between publications. I focused on TRF analysis using a non-radioactive probe, because it does not require special precautions associated with handling and disposing of radioactive material and therefore is more suitable for ecology laboratories that typically do not have a strong molecular biology infrastructure. However, most of my findings can be applied to both, radioactive and non- radioactive TRF variants. I tested how sample storage, choice of restriction enzyme, gel I Abstract electrophoresis and choice of hybridization buffer can influence the results. Finally, I show how image analysis (e.g. background correction, gel calibration, formula to calculate telomere length and the analysis window) can not only change the magnitude of estimated telomere length, but also their correlation to each other. Based on these findings, I present and discuss an extensive list of methodological difficulties associated with TRF and present a protocol to obtain reliable and reproducible results. Using this optimized protocol, I then measured telomere length of 68 kakapo (Chapter 3). Almost half of the current kakapo population consists of birds that were captured as adults, hence only their minimum age is known (i.e. time from when they were found +5 years to reach adulthood). Although molecular ageing might not be able to predict chronological age accurately, as calibrated with minimum age of some birds, it should be able to compare relative age between birds. Recently, the oldest kakapo (Richard Henry) was found to show signs of reproductive senescence. The age (or telomere length) difference to Richard Henry could have been used to approximate the remaining reproductive time span for other birds. Unfortunately, there was no change of telomere length with age in cross sectional and longitudinal samples. Analysis of fitness data available for kakapo yielded correlations between telomere length and fledging success, but they were weak and disappeared when the most influential bird was excluded from analysis. The heavy management and small numbers of kakapo make conclusions about fitness and telomere length difficult and highly speculative. However, telomere length of mothers and their chicks were significantly correlated, a phenomena not previously observed in any bird. To test if the lack of telomere loss with age is specific to kakapo, I measured telomere length of one of its closest relatives: the kea (Chapter 4). Like kakapo, telomere length did not show any correlation with age. I then further assessed the usefulness of molecular ageing in birds using only chicks and very old birds to estimate the maximum TL range in an additional long lived (Buller’s albatross) and two shorter lived species (NZ robin and saddleback). In these II Abstract species, telomere length was on average higher in chicks than in adults. However, age matched individuals showed high variations in telomere length, such that age dependent and independent telomere length could not be distinguished. These data and published results from other bird species, coupled with the limitations of methodology I have identified (Chapter 2), indicate that molecular ageing does not work in most (if not all) birds in its current suggested form. Another way to measure telomere length is telomere Q-PCR, a real-time PCR based method. Measurement of the same kakapo samples with TRF and Q-PCR did not result in comparable results (Chapter 4). Through experimentation I found that differences in amplification efficiency between samples lead to unreliable estimation of telomere length using telomere Q-PCR. These differences were caused by inhibitors present in the samples. The problem of differential amplification efficiency in Q-PCR, while known, is largely ignored by the scientific community. Although some methods have been suggested to correct for differing efficiency, most of these introduce more error than they eliminate. I developed and applied an assay based on internal standard oligonucleotides that was able to corrected EDTA induced quantification errors of up to 70% with high precision and accuracy (Chapter 5). The method, however, failed when tested with other inhibitors commonly found in DNA samples extracted from blood (i.e. SDS, heparin, urea and FeCl3). PCR inhibition was highly selective in the probe-polymerase system I used, inhibiting amplification of genomic DNA, but not amplification of internal oligonucleotide or plasmid standards in the same reaction. Internal standards are a key feature of most diagnostic PCR assays to identify false negatives arising from amplification inhibition. The differential response to inhibition I identified greatly compromises the accuracy of these assays. Consequently, I strongly recommend that researchers using PCR assays with internal standards should verify that the target DNA and internal standard actually respond similarly to common inhibitors. III Acknowledgement Acknowledgement I would like to thank Professor Neil J. Gemmell for giving me the opportunity to conduct this thesis under his supervision and for providing funding for my money-consuming southern blots. I am especially grateful to Dr. Bruce C. (FS) Robertson for supervision (you google the best) and advice in the laboratory (to reduce the costs for southern blots). Also all the best to his family in their new home. Probably the biggest achievement of my thesis period is that I met Iris Jentzsch. I am deeply grateful for your love and support during this work (when Southern blots went wrong). I am looking forward to our shared future. I am indebted to Jonci Wolff for constructive smoke breaks (quit smoking, man !), being a good neighbour in the lab and for his contagious relaxedness, and to Margee Will for late night fries (Signature Range, RIP) and her contagious unrelaxedness. Many thanks also to Wiebke Müller for overwhelming discussions and statistical advice and Andrew Bagshaw for late night encounters and nasty plasmid preps. I would like to acknowledge our lab technicians Linda Morris, Joanne Burke for keeping the lab going and especially Maggie Tisch and her husband for adopting our fish. A special thanks to the New Zealand Department of Conservation (DOC) for support of the kakapo work, especially Daryl Eason, Graeme Elliott and the KPOs at Codfish Island. Additional bird samples were provided by the following: saddleback and NZ robin by Dr. Tania King and Dr. Ian Jamieson (University of Otago, NZ); zebra finch by Dr. Mark Hauber (University of Auckland, NZ); Buller’s albatross by Paul Sagar (NIWA, NZ) and Aaron Russ (University of Canterbury, NZ) and kea by Dr. Gyula Gajdon (Universität Wien, Austria). European sea bass data used for TRF optimization were collected at the IFREMER aquaculture station, Palavas-Les-Flots, France and supported by the European Community through the ‘Access to research infrastructures action of the improving human potential IV Acknowledgement programme HPRI-CT-2001 00146’ and through Bluefin Tuna in Captivity (REPRODOTT) Contract Nr.Q5RS 2002-01355. Many thanks to Professor C. R. Bridges (Heinrich-Heine- Universität Düsseldorf, Germany) for giving me the opportunity for this project. I thank the University of Canterbury and Landcare Research Sustaining and Restoring Biodiversity OBI for financial support of this
Recommended publications
  • Absent from DNA and Protein: Genomic Characterization Of
    Georgakopoulos-Soares et al. Genome Biology (2021) 22:245 https://doi.org/10.1186/s13059-021-02459-z RESEARCH Open Access Absent from DNA and protein: genomic characterization of nullomers and nullpeptides across functional categories and evolution Ilias Georgakopoulos-Soares1,2†, Ofer Yizhar-Barnea1,2†, Ioannis Mouratidis3, Martin Hemberg4,5* and Nadav Ahituv1,2* * Correspondence: mhemberg@ bwh.harvard.edu; nadav.ahituv@ Abstract: Nullomers and nullpeptides are short DNA or amino acid sequences that ucsf.edu are absent from a genome or proteome, respectively. One potential cause for their † Ilias Georgakopoulos-Soares and absence could be their having a detrimental impact on an organism. Ofer Yizhar Barnea contributed equally to this work. Results: Here, we identify all possible nullomers and nullpeptides in the genomes 4Evergrande Center for and proteomes of thirty eukaryotes and demonstrate that a significant proportion of Immunologic Diseases, Harvard Medical School and Brigham and these sequences are under negative selection. We also identify nullomers that are Women’s Hospital, Boston, MA, USA unique to specific functional categories: coding sequences, exons, introns, 5′UTR, 3′ 1Department of Bioengineering and UTR, promoters, and show that coding sequence and promoter nullomers are most Therapeutic Sciences, University of California San Francisco, San likely to be selected against. By analyzing all protein sequences across the tree of life, Francisco, CA, USA we further identify 36,081 peptides up to six amino acids in length that do not exist Full list of author information is in any known organism, termed primes. We next characterize all possible single base available at the end of the article pair mutations that can lead to the appearance of a nullomer in the human genome, observing a significantly higher number of mutations than expected by chance for specific nullomer sequences in transposable elements, likely due to their suppression.
    [Show full text]
  • COVIPENDIUM May12.Pdf
    COVIPENDIUM Information available to support the development of medical countermeasures and interventions against COVID-19 Cite as: Martine DENIS, Valerie VANDEWEERD, Rein VERBEEKE, Anne LAUDISOIT, & Diane VAN DER VLIET. (2020). COVIPENDIUM: information available to support the development of medical countermeasures and interventions against COVID-19 (Version 2020-12-05). Transdisciplinary Insights. This document is conceived as a living document, updated on a weekly basis. You can find its latest version at: https://rega.kuleuven.be/if/corona_covid-19. The COVIPENDIUM is based on open-access publications (scientific journals and preprint databases, communications by WHO and OIE, health authorities and companies) in English language. Please note that the present version has not been submitted to any peer-review process. Any comment / addition that can help improve the contents of this review will be most welcome. For navigation through the various sections, please click on the headings of the table of contents and follow the links marked in blue in the document. Authors: Martine Denis, Valerie Vandeweerd, Rein Verbeke, Anne Laudisoit, Laure Wynants, Diane Van der Vliet COVIPENDIUM version: 12 MAY 2020 Transdisciplinary Insights - Living Paper | 1 Contents List of abbreviations .......................................................................................................................................................... 8 Introduction .....................................................................................................................................................................
    [Show full text]
  • Absent Sequences: Nullomers and Primes
    Pacific Symposium on Biocomputing 12:355-366(2007) ABSENT SEQUENCES: NULLOMERS AND PRIMES GREG HAMPIKIAN Biology, Boise State University, 1910 N University Drive Boise, Idaho 83725, USA TIM ANDERSEN Computer Science, Boise State University, 1910 N University Drive Boise, Idaho 83725, USA We describe a new publicly available algorithm for identifying absent sequences, and demonstrate its use by listing the smallest oligomers not found in the human genome (human “nullomers”), and those not found in any reported genome or GenBank sequence (“primes”). These absent sequences define the maximum set of potentially lethal oligomers. They also provide a rational basis for choosing artificial DNA sequences for molecular barcodes, show promise for species identification and environmental characterization based on absence, and identify potential targets for therapeutic intervention and suicide markers. 1. Introduction As large scale DNA sequencing becomes routine, the universal questions that can be addressed become more interesting. Our work focuses on identifying and characterizing absent sequences in publicly available databases. Through this we are attempting to discover the constraints on natural DNA and protein sequences, and to develop new tools for identification and analysis of populations. We term the short sequences that do not occur in a particular species “nullomers,” and those that have not been found in nature at all “primes.” The primes are the smallest members of the potential artificial DNA lexicon. This paper reports the results of our initial efforts to determine and map sets of nullomer and prime sequences in order to demonstrate the algorithm, and explore the utility of absent sequence analysis. It is well known that the number of possible DNA sequences is an exponentially increasing function of sequence length, and is equal to 4n, where n is the sequence length.
    [Show full text]
  • Efficient Computation of Absent Words in Genomic Sequences
    BMC Bioinformatics This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted PDF and full text (HTML) versions will be made available soon. Efficient computation of absent words in genomic sequences BMC Bioinformatics 2008, 9:167 doi:10.1186/1471-2105-9-167 Julia Herold ([email protected]) Stefan Kurtz ([email protected]) Robert Giegerich ([email protected]) ISSN 1471-2105 Article type Methodology article Submission date 8 November 2007 Acceptance date 26 March 2008 Publication date 26 March 2008 Article URL http://www.biomedcentral.com/1471-2105/9/167 Like all articles in BMC journals, this peer-reviewed article was published immediately upon acceptance. It can be downloaded, printed and distributed freely for any purposes (see copyright notice below). Articles in BMC journals are listed in PubMed and archived at PubMed Central. For information about publishing your research in BMC journals or any BioMed Central journal, go to http://www.biomedcentral.com/info/authors/ © 2008 Herold et al., licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Efficient computation of absent words in genomic sequences Julia Herold1, Stefan Kurtz2 and Robert Giegerich∗1 1Center of Biotechnology, Bielefeld University, Postfach 10 01 31, 33501 Bielefeld, Germany 2Center for Bioinformatics, University of Hamburg, Bundesstra¨se 43, 20146 Hamburg, Germany Email: Julia Herold – [email protected]; Stefan Kurtz – [email protected]; Robert Giegerich∗– [email protected]; ∗Corresponding author Abstract Background: Analysis of sequence composition is a routine task in genome research.
    [Show full text]
  • Hierarchical Clustering of DNA K-Mer Counts in Rnaseq Fastq Files Identifies Sample Heterogeneities
    International Journal of Molecular Sciences Article Hierarchical Clustering of DNA k-mer Counts in RNAseq Fastq Files Identifies Sample Heterogeneities Wolfgang Kaisers 1,2,* , Holger Schwender 3 and Heiner Schaal 2 1 Department of Anaesthesiology, HELIOS University Hospital Wuppertal, University of Witten/Herdecke, Heusnerstr. 40, 42283 Wuppertal, Germany 2 Institut fur Virologie, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, 40225 Düsseldorf, Germany; [email protected] 3 Mathematisches Institut, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany; [email protected] * Correspondence: [email protected]; Tel.: +49-202-896-1641 Received: 5 November 2018; Accepted: 15 November 2018; Published: 21 November 2018 Abstract: We apply hierarchical clustering (HC) of DNA k-mer counts on multiple Fastq files. The tree structures produced by HC may reflect experimental groups and thereby indicate experimental effects, but clustering of preparation groups indicates the presence of batch effects. Hence, HC of DNA k-mer counts may serve as a diagnostic device. In order to provide a simple applicable tool we implemented sequential analysis of Fastq reads with low memory usage in an R package (seqTools) available on Bioconductor. The approach is validated by analysis of Fastq file batches containing RNAseq data. Analysis of three Fastq batches downloaded from ArrayExpress indicated experimental effects. Analysis of RNAseq data from two cell types (dermal fibroblasts and Jurkat cells) sequenced in our facility indicate presence of batch effects. The observed batch effects were also present in reads mapped to the human genome and also in reads filtered for high quality (Phred > 30). We propose, that hierarchical clustering of DNA k-mer counts provides an unspecific diagnostic tool for RNAseq experiments.
    [Show full text]
  • Nullomers and High Order Nullomers in Genomic Sequences
    RESEARCH ARTICLE Nullomers and High Order Nullomers in Genomic Sequences Davide Vergni1, Daniele Santoni2* 1 Istituto per le Applicazioni del Calcolo ªMauro Piconeº - CNR, Via dei Taurini 19, 00185, Rome, Italy, 2 Istituto di Analisi dei Sistemi ed Informatica ªAntonio Rubertiº - CNR, Via dei Taurini 19, 00185, Rome, Italy * [email protected] a11111 Abstract A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applica- tions, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers, whether their absence in genomes has just a statistical explanation or it is a peculiar feature of genomic sequences. We introduced an extension of OPEN ACCESS the notion of nullomer, namely high order nullomers, which are nullomers whose mutated Citation: Vergni D, Santoni D (2016) Nullomers sequences are still nullomers. We studied different aspects of them: comparison with nullo- and High Order Nullomers in Genomic Sequences. mers of random sequences, CpG distribution and mean helical rise. In agreement with previ- PLoS ONE 11(12): e0164540. doi:10.1371/journal. ous results we found that the number of nullomers in the human genome is much larger than pone.0164540 expected by chance. Nevertheless antithetical results were found when considering a ran- Editor: JuÈrgen Brosius, University of MuÈnster, dom DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies GERMANY in nullomers and high order nullomers revealed, as expected, a high CpG content but it also Received: January 25, 2016 highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggest- Accepted: September 27, 2016 ing that nullomers have their own peculiar structure and are not simply sequences whose Published: December 1, 2016 CpG frequency is biased.
    [Show full text]
  • Mutation Rate Variations in the Human Genome Are Encoded in DNA Shape
    bioRxiv preprint doi: https://doi.org/10.1101/2021.01.15.426837; this version posted June 14, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Mutation Rate Variations in the Human Genome are Encoded in DNA Shape Authors List Zian Liu1. [email protected] Md. Abul Hassan Samee1. [email protected]* 1: Department of Molecular Physiology and Biophysics, Baylor College of Medicine, Houston, TX, 77030, USA bioRxiv preprint doi: https://doi.org/10.1101/2021.01.15.426837; this version posted June 14, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Abstract Single nucleotide mutation rates have critical implications for human evolution and genetic diseases. Accurate modeling of these mutation rates has long remained an open problem since the rates vary substantially across the human genome. A recent model, however, explained much of the variation by considering higher order nucleotide interactions in the local (7-mer) sequence context around mutated nucleotides. Despite this model’s predictive value, we still lack a biophysically-grounded understanding of genome-wide mutation rate variations. DNA shape features are geometric measurements of DNA structural properties, such as helical twist and tilt, and are known to capture information on interactions between neighboring nucleotides within a local context.
    [Show full text]
  • Amino Acid - Wikipedia, the Free Encyclopedia
    1/29/2016 Amino acid - Wikipedia, the free encyclopedia Amino acid From Wikipedia, the free encyclopedia This article is about the class of chemicals. For the structures and properties of the standard proteinogenic amino acids, see Proteinogenic amino acid. Amino acidsarebiologically important organic compoundscontaining amine (-NH2) and carboxylic acid(- COOH) functional groups, usually along with a side-chain specific to each aminoacid.[1][2][3] The key elements of an amino acid are carbon, hydrogen, oxygen, andnitrogen, though other elements are found in the side-chains of certain amino acids. About 500 amino acids are known and can be classified in many ways.[4] They can be classified according to the core structural functional groups' locations as alpha- (α-), beta- (β-), gamma- (γ-) or delta- (δ-) amino acids; other categories relate to polarity, pHlevel, and side-chain group type (aliphatic,acyclic, aromatic, containing hydroxyl orsulfur, etc.). In the form of proteins, amino acids comprise the second-largest component (water is the largest) of humanmuscles, cells and other tissues.[5] Outside proteins, The structure of an alpha amino amino acids perform critical roles in processes such as neurotransmittertransport and biosynthesis. acid in its un-ionized form In biochemistry, amino acids having both the amine and the carboxylic acid groups attached to the first (alpha-) carbon atom have particular importance. They are known as 2-, alpha-, or α-amino [6] acids (genericformula H2NCHRCOOH in most cases, where R is an organic substituent
    [Show full text]
  • COVID Review May5.Pdf
    COVIPENDIUM Information available to support the development of medical countermeasures and interventions against COVID-19 Cite as: Martine DENIS, Valerie VANDEWEERD, Rein VERBEEKE, Anne LAUDISOIT, & Diane Van der Vliet. (2020). COVIPENDIUM: information available to support the development of medical countermeasures and interventions against COVID-19 (Version 2020-05-05). Transdisciplinary Insights. This document is conceived as a living document, updated on a weekly basis. You can find its latest version at: https://rega.kuleuven.be/if/corona_covid-19. The COVIPENDIUM is based on open-access publications (scientific journals and preprint databases, communications by WHO and OIE, health authorities and companies) in English language. Please note that the present version has not been submitted to any peer-review process. Any comment / addition that can help improve the contents of this review will be most welcome. For navigation through the various sections, please click on the headings of the table of contents and follow the links marked in blue in the document. Authors: Martine Denis, Valerie Vandeweerd, Rein Verbeke, Anne Laudisoit, Laure Wynants, Diane Van der Vliet COVIPENDIUM version: 05 MAY 2020 Transdisciplinary Insights - Living Paper | 1 Contents List of abbreviations .......................................................................................................................................................... 8 Introduction .....................................................................................................................................................................
    [Show full text]
  • SBIA1201 – Sequence Analysis
    SCHOOL OF BIO AND CHEMICAL ENGINEERING DEPARTMENT OF BIOINFORMATICS UNIT – 1- SBIA1201 – Sequence Analysis Sequence analysis In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. Methodologies used include sequence alignment, searches against biological databases, and others. Since the development of methods of high-throughput production of gene and protein sequences, the rate of addition of new sequences to the databases increased exponentially. Such a collection of sequences does not, by itself, increase the scientist's understanding of the biology of organisms. However, comparing these new sequences to those with known functions is a key way of understanding the biology of an organism from which the new sequence comes. Thus, sequence analysis can be used to assign function to genes and proteins by the study of the similarities between the compared sequences. Nowadays, there are many tools and techniques that provide the sequence comparisons (sequence alignment) and analyze the alignment product to understand its biology. Sequence analysis in molecular biology includes a very wide range of relevant topics: The comparison of sequences in order to find similarity, often to infer if they are related (homologous) Identification of intrinsic features of the sequence such as active sites, post translational modification sites, gene-structures, reading frames, distributions of introns and exons and regulatory elements Identification of sequence differences and variations such as point mutations and single nucleotide polymorphism (SNP) in order to get the genetic marker. Revealing the evolution and genetic diversity of sequences and organisms Identification of molecular structure from sequence alone In chemistry, sequence analysis comprises techniques used to determine the sequence of a polymer formed of several monomers (see Sequence analysis of synthetic polymers).
    [Show full text]
  • Bibliography
    Cover Page The handle http://hdl.handle.net/1887/87513 holds various files of this Leiden University dissertation. Author: Khachatryan, L. Title: Metagenomics : beyond the horizon of current implementations and methods Issue Date: 2020-04-28 Bibliography [1] WB Whitman, DC Coleman, and ica et Biophysica Acta (BBA)-Bioenergetics, WJ Wiebe. Prokaryotes: the unseen 1707(2-3):231–253, 2005. majority. Proceedings of the National [8] PG Falkowski and LV Godfrey. Elec- Academy of Sciences, 95(12):6578–6583, trons, life and the evolution of earth’s 1998. oxygen cycle. Philosophical Transactions [2] AA Juwarkar, SK Yadav, PR Thawale, of the Royal Society of London B: Biological P Kumar, SK Singh, and T Chakrabarti. Sciences, 363(1504):2705–2716, 2008. Developmental strategies for sustain- [9] C Gougoulias, JM Clark, and LJ Shaw. able ecosystem on mine spoil dumps: a The role of soil microbes in the case of study. Environmental monitoring global carbon cycle: tracking the below- and assessment, 157(1-4):471–481, 2009. ground microbial processing of plant- [3] C Pedros-Alio. Dipping into the rare derived carbon for manipulating car- biosphere. Sciense, 5809:192, 2007. bon dynamics in agricultural systems. Journal of the Science of Food and Agricul- [4] C Pedros-Alio. The rare bacterial bio- ture, 94(12):2362–2371, 2014. sphere. Annual review of marine science, [10] MG Klotz, DA Bryant, and TE Hanson. 4:449–466, 2012. The microbial sulfur cycle. Frontiers in [5] GT Pecl, MB Araujo, JD Bell, J Blan- microbiology, 2:241, 2011. chard, TC Bonebrake, I-C Chen, [11] J Chen, Y Li, Y Tian, C Huang, D Li, TD Clark, RK Colwell, F Danielsen, Q Zhong, and X Ma.
    [Show full text]
  • Some Unexpected Results on the Distribution of Genomic DNA K-Mers
    TEL-AVIV UNIVERSITY RAYMOND AND BEVERLY SACKLER FACULTY OF EXACT SCIENCES Common, Rare, and Surprising Words: Some Unexpected Results on the Distribution of Genomic DNA k-mers Thesis submitted in partial fulfillment of the requirements for the degree of M.Sc. in Tel -Aviv University The Department of Computer Science by Yaron Levy The research work for this thesis has been carried out at Tel-Aviv University under the supervision of Prof. Benny Chor and Prof. David Horn January 2008 i Abstract The probability distribution of DNA k-mers in whole genome sequences provides an inter- esting perspective of the complexity of these systems. Whereas previous research concen- trated on missing k-mers, we study the overall k-mer distribution of more than 100 species from archaea, bacteria, and eukarya (including mammals). We focus on low order Markov models, which capture short range correlations between nucleotides in the DNA sequence. In particular, they enable us to decide if rare, missing, and common k-mers are surprising or not. We show that various local and global properties of DNA k-mers can be modeled fairly well by low order chains. While exploring these empirical k-mer distributions, we discovered that a few species, in- cluding all mammals, have multi-modal histograms, while most species exhibit unimodal distributions. From an evolutionary perspective, these multi-modal distributions are exactly the tetrapods. These distributions are characterized by specific values of C+G contents and CpG dinucleotide suppression, but not by any one of these factors alone. Again, we provide an explanation for this phenomenon, using low order Markov models.
    [Show full text]