The Presence and Localization of G-Quadruplex Forming Sequences in the Domain of Bacteria
Total Page:16
File Type:pdf, Size:1020Kb
molecules Article The Presence and Localization of G-Quadruplex Forming Sequences in the Domain of Bacteria 1, 2, 2,3 4 4,5 Martin Bartas y , Michaela Cutovˇ á y,Václav Brázda , Patrik Kaura , Jiˇrí Št’astný , Jan Kolomazník 5, Jan Coufal 3, Pratik Goswami 3, Jiˇrí Cerveˇ ˇn 1 and Petr Peˇcinka 1,* 1 Department of Biology and Ecology/Institute of Environmental Technologies, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic; [email protected] (M.B.); [email protected] (J.C.)ˇ 2 Faculty of Chemistry, Brno University of Technology, Purkyˇnova118, 612 00 Brno, Czech Republic; [email protected] (M.C.);ˇ [email protected] (V.B.) 3 Institute of Biophysics, Academy of Sciences of the Czech Republic v.v.i., Královopolská 135, 612 65 Brno, Czech Republic; [email protected] (J.C.); [email protected] (P.G.) 4 Faculty of Mechanical Engineering, Brno University of Technology, Technicka 2896/2, 616 69 Brno, Czech Republic; [email protected] (P.K.); [email protected] (J.Š.) 5 Department of Informatics, Mendel University in Brno, Zemedelska 1665/1, 61300 Brno, Czech Republic; [email protected] * Correspondence: [email protected]; Tel.: +420-553-46-2318 These authors contributed equally to this work. y Received: 16 April 2019; Accepted: 1 May 2019; Published: 2 May 2019 Abstract: The role of local DNA structures in the regulation of basic cellular processes is an emerging field of research. Amongst local non-B DNA structures, the significance of G-quadruplexes was demonstrated in the last decade, and their presence and functional relevance has been demonstrated in many genomes, including humans. In this study, we analyzed the presence and locations of G-quadruplex-forming sequences by G4Hunter in all complete bacterial genomes available in the NCBI database. G-quadruplex-forming sequences were identified in all species, however the frequency differed significantly across evolutionary groups. The highest frequency of G-quadruplex forming sequences was detected in the subgroup Deinococcus-Thermus, and the lowest frequency in Thermotogae. G-quadruplex forming sequences are non-randomly distributed and are favored in various evolutionary groups. G-quadruplex-forming sequences are enriched in ncRNA segments followed by mRNAs. Analyses of surrounding sequences showed G-quadruplex-forming sequences around tRNA and regulatory sequences. These data point to the unique and non-random localization of G-quadruplex-forming sequences in bacterial genomes. Keywords: G-quadruplex; bacteria; bioinformatics; deinococcus; G4Hunter 1. Introduction The discovery of the B-DNA structure by Crick and Watson started a rapid growth in genetic and molecular biology research [1]. However, it is now clear that, apart from this well-known double helical DNA structure, other forms of secondary structure participate in various basic processes [2]. The presence of various local DNA structures including cruciforms [3], quadruplexes [4] and triplexes [5] has been demonstrated by various methodological approaches. For example, G-quadruplexes (G4) were studied by crystallography as far back as 1962 [6]. G4s are secondary structures formed by guanine rich sequences which are widespread in DNA and RNA [4]. The building block for a G4 is a guanine quartet formed by G:G Hoogsteen base pairs (Figure1). G4 formation requires the presence of monovalent cations such as Na+ and K+ [7]. Formation of this structure regulates various processes including gene expression [8], protein translation [9] and proteolysis pathways [10] in both prokaryotes Molecules 2019, 24, 1711; doi:10.3390/molecules24091711 www.mdpi.com/journal/molecules Molecules 2019, 24, 1711 2 of 13 and eukaryotes. In human, G4s are formed in various regions including sub-telomeres, gene bodies and Molecules 2019, 24,xy 2 of 13 gene regulatory regions [11] and in telomere regions to suppress degradation and maintain genomic stabilitymaintain [12 ].genomic Formation stability of G4s [12]. in thisFormation region of decreases G4s in this telomerase region decreases activity andtelomerase decreases activity the chances and ofdecreases cancer development the chances of [13 cancer,14]. Indevelopment addition, the[13,14]. proto-oncogene In addition, theMYC proto-oncogeneis bound by MYC nucleolin is bound in its hypersensitiveby nucleolin regionin its IIIhypersensitive and enhances region G4 folding III and and enhances suppresses G4MYC foldingtranscription and suppresses [15]. Therefore, MYC it hastranscription been suggested [15]. Therefore, that anticancer it has therapybeen suggested will be possiblethat anticancer by targeting therapy G4s will [16 be–18 possible]. Moreover, by it hastargeting been demonstratedG4s [16–18]. Moreover, that G4-stabilizing it has been demon ligandsstrated modulate that G4-stabilizing gene transcription ligands [modulate11]. It is alreadygene knowntranscription that clusters [11]. It of is G4-forming already known sequences that clusters induce of G4-forming gene expression sequences and induce that they gene are expression distributed nearand promoters that they and are 5distributed0UTRs. Replication-dependent near promoters and DNA 5′UTRs. damage Replication-dependent evidenced by G4 ligandsDNA damage have also beenevidenced discovered by G4 in tumorligands suppressor have also been genes discov and oncogenesered in tumor [19 suppressor]. Another potentialgenes and therapeutic oncogenes [19]. option is toAnother target potential G4-binding therapeutic proteins. option Many is proteins to target are G4-binding known to proteins. bind to Many G4s, including proteins are some known proteins to importantbind to inG4s, cancer including [20,21 some]. Moreover, proteins novel important G4 binding in cancer proteins [20,21]. have Moreover, been suggested, novel G4 sharing binding the NIQIproteins amino have acid been motif suggested, (RGRGR sharing GRGGG the SGGSG NIQI amino GRGRG) acid motif [22]. (RGRGR GRGGG SGGSG GRGRG) [22]. Figure 1. A Figure 1.G-quadruplexes: G-quadruplexes: ((A)) guanineguanine tetradtetrad stabilized by by Hoog Hoogstensten base base pairing pairing and and positively positively chargedcharged central central ion; ion; (B) schematic(B) schematic drawing drawing of intramolecular of intramolecular G4 structure G4 structure arising arising from double from double stranded DNA;stranded (C) G4Hunter, DNA; (C) G4Hunter, a new user-friendly a new user-friendly web server web for server high for throughput high throughput analyses analyses of G4-forming of G4- sequencesforming insequences DNA; and in DNA; (D) 3D and model (D) 3D of intramolecularmodel of intramolecular antiparallel antiparallel G4 formed G4 fromformed the from sequence the (50-GGGGTGTGGGGTGTsequence (5′-GGGGTGTGGGGTGT GGGGTGTGGGGTGT-3 GGGGTGTGGGGTGT-30) found in Microcystis′) found in aeruginosaMicrocystisbuilt aeruginosa using3D-NuS built webserverusing 3D-NuS [23]. webserver [23]. DueDue to to the the roles roles of of G4s G4s in in regulating regulating basicbasic cellular processes, it it is is essential essential to to identify identify the the location location ofof G4s G4s in in genomes. genomes. SeveralSeveral algorithms for for detecti detectingng expected expected matching matching patterns patterns for G4 for formation G4 formation are arealready already described. described. The The first first algorithm algorithm [Gn [GNmnGNnmNGoGnNnNopGGnn]N waspGn ]created was created by Balasubramanian by Balasubramanian and andcolleagues colleagues [24] [24 and] and the the second second algorithm algorithm co consideringnsidering occurrence occurrence of ofrepeating repeating unit unit Gn G (nn (n≥ 2) 2)was was ≥ createdcreated by by the the group group of of Maizels Maizels [ 25[25].]. Nevertheless,Nevertheless, these these algorithms algorithms only only produce produce binary binary (yes/no; (yes/no; matchmatch/no/no match) match) results, results, rather rather than than the the quantitativequantitative analyses that that are are mandatory mandatory for for correlation correlation with with quadruplexquadruplex strength strength metrics. metrics. G4HunterG4Hunter was devel developedoped to to overcome overcome this this limitation, limitation, in inwhich which G4 G4 propensitypropensity is calculatedis calculated depending depending on on G G richness richness andand G skewness [26]. [26]. Bacterial genetic material is stored mostly in circular chromosomes and plasmids [27]. It was demonstrated that secondary structures in bacterial genomes are responsible for genomic stability Molecules 2019, 24, 1711 3 of 13 Bacterial genetic material is stored mostly in circular chromosomes and plasmids [27]. It was demonstrated that secondary structures in bacterial genomes are responsible for genomic stability [28]. In addition, G4 structures are more stable than double stranded DNA due to slower unfolding kinetics [29]. Nevertheless, fewer studies on role of G4 in bacterial survival and virulence have been carried out [30]. A comparative functional analysis by Pooja et al. revealed that open reading frame (ORF) formulated amino acids biosynthesis and signal transduction are restrained by/controlled by G4 DNA in prokaryotes [31]. There are have been many reports on the role of G4s in eukaryotes over many years [32], although advances in prokaryotic G4s are not fully elucidated [33]. The formation of an intramolecular G4 requires the presence of a loop sequence between the G-tracts [34] and the density of G4 therefore broadly correlates with GC content. The GC content in bacterial genomes varies