ABSTRACT

A GENOME-WIDE ANALYSIS OF PERFECT INVERTED REPEATS IN ARABIDOPSIS THALIANA

by Sutharzan Sreeskandarajan

Perfect inverted repeats play a wide variety of roles in genomes. In this study, we conduct a genome-wide analysis of perfect inverted repeats in Arabidopsis thaliana and explore the biological significance of the observed inverted repeat distribution. Chapter 1 reviews the roles of palindromic sequences. Chapter 2 describes a tool developed to detect perfect inverted repeats using a novel prime number-based algorithm. Chapter 3 presents the genome-wide analysis, illustrating the observed non-random distribution of perfect inverted repeats in different genic and intergenic regions of the Arabidopsis genome.

A GENOME-WIDE ANALYSIS OF PERFECT INVERTED REPEATS IN ARABIDOPSIS THALIANA

A Thesis Submitted to the Faculty of Miami University in partial fulfillment of the requirements for the degree of Master of Science

Department of Botany

by Sutharzan Sreeskandarajan

Miami University
Oxford, Ohio
2013

Advisor: Dr. Chun Liang
Reader: Dr. Daniel K. Gladish
Reader: Dr. John E. Karro

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
THESIS ORGANISATION
CHAPTER 1: INTRODUCTION
    1.1 DNA Palindromes
    1.2 Non β-helix DNA Structures
    1.3 Biological importance of Non β-helix DNA Structures
    1.4 Binding of Proteins to Palindromes
    1.5 Transposons and Palindromes
    Figures
CHAPTER 2: A MATLAB-BASED TOOL FOR ACCURATE DETECTION OF PERFECT OVERLAPPING AND NESTED INVERTED REPEATS IN DNA SEQUENCES
    Abstract
    Introduction
    Algorithm
    Implementation and evaluation
    Conclusion
    Figures
CHAPTER 3: A GENOME-WIDE ANALYSIS OF PERFECT INVERTED REPEATS IN ARABIDOPSIS THALIANA
    Abstract
    Introduction
    Materials and Methods
        1. Genome-wide detection of perfect IRs in Arabidopsis thaliana
        2. The distribution of perfect IRs in the near upstream intergenic region
            2.1 Breakdown of the 200 base unit upstream of the 5’ UTR
            2.2 The distribution of IRs in the near upstream intergenic regions
            2.3 Detection of Cis-acting Elements containing perfect IRs
            2.4 The distribution of perfect IRs which are part of Cis-acting Element sequences in the near intergenic regions upstream of 5’ UTR
            2.5 The distribution of TATATATA sequences in the near intergenic region upstream of 5’ UTR
    Results
        1. Genome-wide detection of IRs in Arabidopsis thaliana
        2. The analysis of perfect IRs in the near intergenic regions upstream of 5’ UTR
            2.1 The distribution of perfect IRs in the near intergenic regions upstream of 5’ UTR
            2.2 The distribution of IRs which are part of Cis-acting Element sequences in the near intergenic regions upstream of 5’ UTR
            2.3 The distribution of TATATATA sequences in the near intergenic regions upstream of 5’ UTR
    Discussion
    Conclusion
    Tables
    Figures
LITERATURE CITED
SUPPLEMENTARY MATERIALS

LIST OF TABLES

Table 1: Nested and overlapping gene statistics of Arabidopsis thaliana nuclear chromosomes

LIST OF FIGURES

Figure 1: Examples of different types of palindromes
Figure 2: The major steps of the cumulative prime-number scoring system algorithm for the detection of overlapping IRs
Figure 3: The schematic representation of intragenic and intergenic regions for perfect inverted repeat analysis
Figure 4: Distribution of intergenic distances in Arabidopsis thaliana genes
Figure 5: Chromosome lengths of Arabidopsis thaliana
Figure 6: Abundance of perfect IRs in chromosomes of Arabidopsis thaliana
Figure 7: Length distribution of perfect IRs in the Arabidopsis thaliana genome
Figure 8: The distribution of total bases among various genomic regions
Figure 9: The distribution of total count of perfect IR bases among various genomic regions
Figure 10: The percentage of perfect IR bases in various genomic regions
Figure 11: The distribution of perfect IR bases in the considered windows of the 5’ UTR upstream region
Figure 12: The distribution of perfect IR bases which are part of cis-acting elements in the considered windows of the near intergenic regions upstream of 5’ UTR
Figure 13: The distribution of bases which are part of TATATATA sequences in the considered windows of the near intergenic regions upstream of 5’ UTR
Figure 14: The distribution of bases which are part of cis-acting elements, including TATATATA, in the considered windows of the near intergenic regions upstream of 5’ UTR

THESIS ORGANISATION

This thesis consists of two chapters in two different journal formats. Chapter 2 is in the format of the journal Bioinformatics, and is co-authored by Michelle M Flowers, John E Karro, and Chun Liang. Chapter 3 is formatted according to the format specifications of the journal G3, and is co-authored by John E Karro and Chun Liang.

CHAPTER 1: INTRODUCTION

1.1 DNA Palindromes

A palindrome