
ABSTRACT

A GENOME-WIDE ANALYSIS OF PERFECT INVERTED REPEATS IN ARABIDOPSIS THALIANA

by Sutharzan Sreeskandarajan

Perfect inverted repeats play a wide variety of roles in genomes. In this study, we conduct a genome-wide analysis of perfect inverted repeats in Arabidopsis thaliana and explore the biological significance of the observed inverted repeat distribution. The roles of palindromic sequences are reviewed in Chapter 1. Chapter 2 describes a tool developed to detect perfect inverted repeats using a novel prime number-based algorithm. Chapter 3 presents the genome-wide analysis, illustrating the non-random distribution of perfect inverted repeats observed across different genic and intergenic regions of the Arabidopsis genome.

A GENOME-WIDE ANALYSIS OF PERFECT INVERTED REPEATS IN ARABIDOPSIS THALIANA

A Thesis

Submitted to the Faculty of Miami University in partial fulfillment of the requirements for the degree of Master of Science

Department of Botany

by

Sutharzan Sreeskandarajan

Miami University
Oxford, Ohio
2013

Advisor: Dr. Chun Liang
Reader: Dr. Daniel K. Gladish
Reader: Dr. John E. Karro

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
THESIS ORGANISATION
CHAPTER 1: INTRODUCTION
  1.1 DNA Palindromes
  1.2 Non β-helix DNA Structures
  1.3 Biological importance of Non β-helix DNA Structures
  1.4 Binding of Proteins to Palindromes
  1.5 Transposons and Palindromes
  Figures
CHAPTER 2: A MATLAB-BASED TOOL FOR ACCURATE DETECTION OF PERFECT OVERLAPPING AND NESTED INVERTED REPEATS IN DNA SEQUENCES
  Abstract
  Introduction
  Algorithm
  Implementation and evaluation
  Conclusion
  Figures
CHAPTER 3: A GENOME-WIDE ANALYSIS OF PERFECT INVERTED REPEATS IN ARABIDOPSIS THALIANA
  Abstract
  Introduction
  Materials and Methods
    1. Genome-wide detection of perfect IRs in Arabidopsis thaliana
    2. The distribution of perfect IRs in the near upstream intergenic region
      2.1 Breakdown of the 200 base unit upstream of the 5’ UTR
      2.2 The distribution of IRs in the near upstream intergenic regions
      2.3 Detection of Cis-acting Elements containing perfect IRs
      2.4 The distribution of perfect IRs which are part of Cis-acting Element sequences in the near intergenic regions upstream of 5’ UTR
      2.5 The distribution of TATATATA sequences in the near intergenic region upstream of 5’ UTR
  Results
    1. Genome-wide detection of IRs in Arabidopsis thaliana
    2. The analysis of perfect IRs in the near intergenic regions upstream of 5’ UTR
      2.1 The distribution of perfect IRs in the near intergenic regions upstream of 5’ UTR
      2.2 The distribution of IRs which are part of Cis-acting Element sequences in the near intergenic regions upstream of 5’ UTR
      2.3 The distribution of TATATATA sequences in the near intergenic regions upstream of 5’ UTR
  Discussion
  Conclusion
  Tables
  Figures
LITERATURE CITED
SUPPLEMENTARY MATERIALS

LIST OF TABLES

Table 1: Nested and overlapping gene statistics of Arabidopsis thaliana nuclear chromosomes

LIST OF FIGURES

Figure 1: Examples of different types of palindromes
Figure 2: The major steps of the cumulative prime-number scoring system algorithm for the detection of overlapping IRs
Figure 3: The schematic representation of intragenic and intergenic regions for perfect inverted repeat analysis
Figure 4: Distribution of intergenic distances in Arabidopsis thaliana genes
Figure 5: Chromosome lengths of Arabidopsis thaliana
Figure 6: Abundance of perfect IRs in chromosomes of Arabidopsis thaliana
Figure 7: Length distribution of perfect IRs in the Arabidopsis thaliana genome
Figure 8: The distribution of total bases among various genomic regions
Figure 9: The distribution of total count of perfect IR bases among various genomic regions
Figure 10: The percentage of perfect IR bases in various genomic regions
Figure 11: The distribution of perfect IR bases in the considered windows of the 5’ UTR upstream region
Figure 12: The distribution of perfect IR bases which are part of cis-acting elements in the considered windows of the near intergenic regions upstream of 5’ UTR
Figure 13: The distribution of bases which are part of TATATATA sequences in the considered windows of the near intergenic regions upstream of 5’ UTR
Figure 14: The distribution of bases which are part of cis-acting elements, including TATATATA, in the considered windows of the near intergenic regions upstream of 5’ UTR

THESIS ORGANISATION

Chapters 2 and 3 of this thesis are written in two different journal formats. Chapter 2 follows the format of the journal Bioinformatics and is co-authored by Michelle M Flowers, John E Karro, and Chun Liang. Chapter 3 follows the format specifications of the journal G3 and is co-authored by John E Karro and Chun Liang.

CHAPTER 1: INTRODUCTION

1.1 DNA Palindromes

A palindrome
