University of Zagreb Faculty of Science Department of Biology

Berislav Lisnić

Recombinogenicity and Distribution of in the Yeast Saccharomyces cerevisiae

Doctoral Thesis

submitted to the Department of Biology Faculty of Science University of Zagreb

for the academic degree of Doctor of Natural Sciences (Biology)

Zagreb, 2011. declaration of autorship and supervision

The research described in this thesis was performed by the author in the Laboratory of Biology and Microbial at the Faculty of Food Technology and Biotechnology (Zagreb, Croatia), under supervision of dr. sc. Ivan Krešimir Svetec and dr. sc. Zoran Zgaga, as part of the postgraduate program in biology at the Department of Biology, Faculty of Science. ii | Declaration of autorship and superivion acknowledgements

Like all scientific endeavours, the work presented here was a collaborative effort that included many people. Preparing, organizing and writing this thesis has been a difficult, long, frustrating - but more than anything else - an exciting and intellectually rewarding adventure. Those who have, directly or indirectly, joined and helped me on this journey have earned my deepest respect and appreciation.

First of all, I would like to extend my gratitude to my mentors, Professors Zoran Zgaga and Ivan Krešimir Svetec for their counsel, guidance, patience and encouragement during my undergraduate and graduate years. This thesis would not have seen the light of day without their support. I also record special thanks to my committee members, Professors Višnja Besendorfer and Kristian Vlahoviček for their valuable support and suggestions. Furthermore, I gratefully acknowledge the financial support from the Croatian Ministry of Science, Education and Sports, which made the research described in this thesis possible.

It has been a privilege to spend several years at the Laboratory of Biology and Microbial Genetics at the Faculty of Food Technology and Biotechnology, where I have met many people with whom I shared some of the best and some of the worst moments of my PhD adventure. I will always hold our group and lab members, whether former or current, in my memory - particularly Krešimir Gjuračić, Sandra Gregorić, Petar Mitrikeski and Marina Miklenić, who were always ready to lend a helping hand and ear whenever it was needed.

The staff of the Ligament polyclinic, Diana, Dora, Hrvoje, Mario, Marinko, Marko, Slavica, Zdravko, and most notably "Il Dottore Medico" - dr. Stjepko Bućan, deserve my sincerest thanks. The knowledge, assistance and friendship Stjepko extended to me over the last couple of years mean to me more than I could ever express. I owe him my heartfelt appreciation.

My friends, especially Zvonka Zrnčević, Anita Marjanović, Lucija Barbarić and Igor Obleščuk were never-failing sources of joy, happiness and support when it was most needed. I am very happy to have you around me, and I hope that our friendship will last for many more years to come.

Finally, I dedicate this work to the closest members of my family - my mother Jasna, who left us too soon, my wife Vanda and my mother-in-law, Jasna. It is they who, quite often, suffered more from my PhD than myself. Their unconditional love, encouragement, understanding and endless patience were always my driving force and inspiration that allowed me to successfully finish this journey. I owe you everything and I hope that this work will make you proud.

Acknowledgements | iii contents

Title page i Declaration of autorship and supervision ii Acknowledgements iii Contents iv Basic documentation card vi Temeljna dokumentacijska kartica vii List of papers viii Abbreviations ix Summary x

INTRODUCTION 1

1 | Introduction to palindromic dna 2 1.1. Nomenclature and definitions 2 1.2. Palindromes are important, but dangerous DNA motifs 4 2 | Mechanisms of -driven genetic instability 7 2.1. Hairpin-mediated genetic instability 7 2.1.1. Replication slippage 8 2.1.2. Invasion of the 3' end into ectopic homology 11 2.1.3. Nucleolytic processing of the hairpin structure 13 2.1.4. Stalled replication fork reversal 15 2.2. Cruciform-mediated genetic instability 17 2.2.1. Mechanisms of cruciform extrusion 17 2.2.2. Cruciform resolution 20 2.2.3. Center-break palindrome revision 23 3 | Mechanisms of palindrome formation in vivo 25 3.1. Palindromes are formed in vivo as a consequence of DNA metabolism 25 3.1.1. Replication slippage 26 3.1.2. Template switching 27 3.1.3. Fusion of sister chromatids 29 3.1.4. Replication of hairpin-capped DNA molecules 31 4 | Palindromes and human health 34 4.1. Palindromes can cause genetic diseases 34 4.1.1. Palindrome-mediated β-thalassemia 34 4.1.2. Palindrome-mediated translocations and associated syndromes 36 4.1.3. Palindromic amplifications and cancer 38

iv | Contents contents

DISCUSSION 41

5 | Factors affecting palindrome-stimulated recombination 42 5.1. A 110-bp palindromic insert stimulates recombination regardless of its genomic position 42 5.2. The role of CRP1, MUS81, SAE2 and SPO11 in palindrome-induced pop-out recombination 46 5.3. A threshold size for palindrome-induced recombination 47 6 | Palindromic sequences in the yeast reference genome sequence 49 6.1. A model for the evolutionary dynamics of palindromic sequences in the yeast genome 51

CONCLUSIONS 53 REFERENCES 54 CURRICULUM VITAE 64 INDIVIDUAL PAPERS 65

Contents | v basic documentation card

University of Zagreb Doctoral Thesis Faculty of Science Department of Biology

Recombinogenicity and Distribution of Palindromes in the Yeast Saccharomyces cerevisiae Genome

Berislav Lisnić Faculty of Food Technology and Biotechnology Department of biochemical engineering Laboratory of biology and microbial genetics Pierottijeva 6, 10 000 Zagreb, Croatia

Genomes of many organisms, from to mammals, are known to contain significant proportion of palindromic sequences. Although frequently located in cis-acting genetic elements, where they play an important role in the regulation of various cellular processes, palindromes have been recognized as recombinogenic motifs that can provoke various types of genome rearrangements and cause genetic diseases in humans. Employing yeast Saccharomyces cerevisiae as a model organism, this study provides systematic analysis of factors affecting palindrome-stimulated recombination, together with computational tools for distribution analysis of palindromes in DNA sequences. This work establishes for the first time that palindromic sequences become recombinogenic only after they attain a critical length of approximately 70 bp. In addition, the obtained results not only indicate that palindromes conform to specific rules of evolutionary dynamics, but also suggest the existence of a mechanism that suppresses the recombinogenicity of numerous palindromes longer than 70 bp present in the human genome.

(64 pages, 18 figures, 1 table, 192 references, original in English)

Thesis deposited in National and University Library, Hrvatske bratske zajednice 4, 10 000 Zagreb, Croatia.

Keywords: cruciform/genome stability/palindrome/recombination/yeast

Supervisor: dr. sc. Ivan Krešimir Svetec

Reviewers: dr. sc. Višnja Besendorfer dr. sc. Ivan Krešimir Svetec dr. sc. Kristian Vlahoviček

Thesis accepted: July 6th, 2011. vi | Basic documentation card temeljna dokumentacijska kartica

Sveučilište u Zagrebu Doktorska disertacija Prirodoslovno-matematički fakultet Biološki odsjek

Rekombinogenost i Distribucija Palindroma u Genomu Kvasca Saccharomyces cerevisiae

Berislav Lisnić Prehrambeno-Biotehnološki Fakultet Zavod za biokemijsko inženjerstvo Laboratorij za biologiju i genetiku mikroorganizama Pierottijeva 6, 10 000 Zagreb, Hrvatska

Palindromske sekvencije u značajnom su udjelu zastupljene u genomima velikog broja organizama, od bakterija do sisavaca. Iako se često nalaze u raznim cis-djelujućim genetičkim elementima, gdje igraju važnu ulogu u regulaciji raznih staničnih procesa, palindromi su prepoznati kao rekombinogeni motivi koji mogu uzrokovati kako raznovrsne rearanžmane genetičkog materijala, tako i genetičke bolesti kod ljudi. Koristeći kvasac Saccharomyces cerevisiae, provedena je sistematska karakterizacija čimbenika koji utječu na rekombinaciju potaknutu palindromom, te su razvijeni računalni alati za analizu distribucije palindroma u sekvencijama DNA. K tome, u ovom je radu po prvi puta utvrđeno da palindromske sekvencije postaju rekombinogene tek kada dosegnu kritičnu duljinu od približno 70 bp, a ukupni rezultati ne samo da pokazuju kako se palindromske sekvencije pokoravaju specifičnim pravilima evolucijske dinamike, već upućuju i na postojanje mehanizma koji suprimira rekombinogenost brojnih palindroma duljih od 70 bp, koji su prisutni u humanom genomu.

(64 stranice, 18 slika, 1 tablica, 192 literaturna navoda, jezik izvornika: engleski)

Rad je pohranjen u Nacionalnoj i sveučilišnoj knjižnici, Hrvatske bratske zajednice 4, 10 000 Zagreb, Hrvatska.

Ključne riječi: kruciform/stabilnost genoma/palindrom/rekombinacija/kvasac

Mentor: dr. sc. Ivan Krešimir Svetec

Ocjenjivači: dr. sc. Višnja Besendorfer dr. sc. Ivan Krešimir Svetec dr. sc. Kristian Vlahoviček

Rad prihvaćen: 6. srpnja 2011.

Temeljna dokumentacijska kartica | vii list of papers

This thesis consists of the following papers: o Lisnić B, Svetec IK, Štafa A, Zgaga Z (2009) Size-dependent palindrome-induced intrachromosomal recombination in yeast. DNA Repair 8: 383-389. o Lisnić B, Svetec IK, Šarić H, Nikolić I, Zgaga Z (2005) Palindrome content of the yeast Saccharomyces cerevisiae genome. Current Genetics 47: 289-297. o Svetec IK, Lisnić B, Zgaga Z (2002) A 110 bp palindrome stimulates plasmid integration in yeast. Periodicum Biologorum 104: 421-424.

viii | List of papers abbreviations

abbreviations for commonly used terms A, C, G and T , , and Alu-IR composed of two Alu elements bp DNA Deoxyribonucleic acid DSB Double-strand break dsDNA, ssDNA Double-stranded DNA, single-stranded DNA FS2 Fragile site 2 HSV-1 Herpes simplex virus 1 IR Inverted repeat kb Kilobase MMV Mouse minute virus MRX Mre11/Rad50/Xrs2 protein complex NHEJ Non-homologous end-joining PATRR Palindromic AT-rich repeat RNA Ribonucleic acid RMS Restriction-modification system mRNA Messenger RNA rRNA Ribosomal RNA SSA Single-strand annealing SSB Single-strand binding protein(s) Ty1 Transposon yeast 1 UV Ultraviolet

description of yeast names standard name name description CRP1 Cruciform DNA-recognizing protein 1 CYC1 Cytochrome c 1 DNA2 DNA synthesis defective 2 EXO1 Exonuclease 1 LYS2 Lysine requiring 2 MRE11 Meiotic recombination 11 MSH1 MutS homolog 1 MUS81 Ethyl-methane sulfonate, UV-sensitive clone 81 PSO2 Psoralen derivative sensitive 2 RAD1 - RAD52 Radiation sensitive 1-52 SAE2 Sporulation in the absence of SPO eleven 2 SGS1 Slow growth suppresor 1 SLX4 Synthetic lethal of unknown (X) function 4 SPO11 Sporulation deficient mutant 1-1 URA3 requiring 3 XRS2 X-ray sensitive 2 YEN1 Yeast endonuclease 1

Abbreviations | ix summary

Numerous DNA sequencing projects have demonstrated that of many organisms contain significant proportion of palindromes - repetitive DNA motifs characterized by an internal two-fold axis of rotational symmetry. Palindromes are frequently found in different regulatory regions of the genome, such as promoters, operators and origins of replication, where they are involved in the regulation of fundamental cellular processes. However, longer palindromes have also been recognized as highly recombinogenic motifs, provoking different types of potentially dangerous genetic alterations which can have harmful impact on human health. The ability of palindromes to undermine the stability of genetic material and promote DNA rearrangements has been linked to their potential to form secondary structures in DNA known as hairpins and cruciforms. Once formed, these secondary structures can be processed to recombinogenic double-strand breaks that can lead to genome rearrangements and development of serious genetic diseases, such as β-thalassemia, Emanuel syndrome, type-1 neurofibromatosis and cancer. However, in spite of their negative influence on genetic stability and human health, factors affecting palindrome- mediated DNA breakage and recombination still remain poorly characterized. In this study, using yeast Saccharomyces cerevisiae as a model organism, the effect of several physical and genetic parameters, primarily length and position, but also the deletion of MUS81, SPO11, CRP1 and SAE2 genes on palindrome-stimulated recombination was systematically investigated. In addition, we developed a computer program which can identify, locate and count palindromes in a given sequence in a strictly defined way, and we used this program to analyze the palindrome content of the entire yeast genome. The obtained results demonstrate that, regardless of their position in the genome, palindromic sequences become recombinogenic only after they attain a critical length of approximately 70 bp, whereas the deletion of the SAE2 gene completely abolishes the high recombination rate in palindrome-containing constructs. Moreover, computational analyses enabled us to propose a general model for the evolution of palindromic sequences in yeast, while the combined results suggest that suppression of palindrome-induced recombination may be crucial for the genetic stability of genomes containing large numbers of very long palindromes, such as the human genome. x | Summary Individual Papers paper 1

DNA Repair 8 (2009) 383–389

Contents lists available at ScienceDirect

DNA Repair

journal homepage: www.elsevier.com/locate/dnarepair

Size-dependent palindrome-induced intrachromosomal recombination in yeast

Berislav Lisnic,´ Ivan-Kresimirˇ Svetec, Anamarija Stafa,ˇ Zoran Zgaga ∗

Laboratory of Biology and Microbial Genetics, Department of Biochemical Engineering, Faculty of Food Technology and Biotechnology, Pierottijeva 6, 10 000 Zagreb, Croatia

article info abstract

Article history: Palindromic and quasi-palindromic sequences are important DNA motifs found in various cis-acting Received 3 September 2008 genetic elements, but are also known to provoke different types of genetic alterations. The instability of Received in revised form 15 October 2008 such motifs is clearly size-related and depends on their potential to adopt secondary structures known as Accepted 25 November 2008 hairpins and cruciforms. Here we studied the influence of palindrome size on recombination between two Available online 4 January 2009 directly repeated copies of the yeast CYC1 gene leading to the loss of the intervening sequence (“pop-out” recombination). We show that palindromes inserted either within one copy or between the two copies Keywords: of the CYC1 gene become recombinogenic only when they attain a certain critical size and we estimate Palindrome SAE2 this critical size to be about 70 bp. With the longest palindrome used in this study (150 bp) we observed Pop-out a more than 20-fold increase in the pop-out recombination. In the sae2/com1 mutant the palindrome- Genome stability stimulated recombination was completely abolished. Suppression of palindrome recombinogenicity may Recombination be crucial for the maintenance of genetic stability in organisms containing a significant number of large palindromes in their genomes, like humans. © 2008 Elsevier B.V. All rights reserved.

1. Introduction yeast vegetative cells where the inverted dimer of the URA3 gene (1.02 kb) introduced in the genome stimulated recombination Palindromes, defined as inverted repeats without spacer DNA, in adjacent region up to 17,000-fold [10]. However, palindromic are present in the genetic material of all living organisms, in viral sequences present on plasmid molecules that could not be prop- genomes and plasmids. Their significance was first appreciated in agated in E. coli without rearrangements were maintained in the the context of the bacterial type II restriction–modification sys- yeast genome and it was suggested that the yeast S. cerevisiae should tems and a number of regulatory proteins were found later to bind be used for cloning and propagation of long palindromes [11]. The palindromic sequences as homo- or hetero-dimers. Another impor- potential of palindromic and near-palindromic sequences to induce tant property of palindromes and other types of closely spaced genetic rearrangements was also observed in mammalian cells. inverted repeats is their intrinsic potential to form intra-strand For example, Cunningham et al. [12] established the murine cell hydrogen bonds giving rise to secondary structures known as hair- line bearing an artificially introduced 15.4 kb palindrome that was pins and cruciforms. Both protein binding and secondary structure rearranged and subsequently stabilized by “center-break mecha- formation make palindromic sequences important constituents nism” at the rate of about 0.5% per population doubling. In humans, of different cis-acting genetic elements like promoters, operators, long AT-rich near-palindromic sequences were identified as break attenuators, viral and plasmid origins of replication [1–5]. points for meiotic translocations [13,14], while the formation of Another reason to study palindromes is based on observa- large palindromes observed in human cancers was associated with tions obtained in different experimental systems indicating that genomic amplifications [15]. Moreover, it has been proposed that they induce genetic instability. In , it is very diffi- the nonrandom formation of large palindromes in human can- cult to propagate plasmids with palindromes longer than 160–200 cers may be triggered by the presence of shorter palindromic and such instability can only partially be suppressed or near-palindromic sequences that can adopt hairpin and cruci- in sbcC/sbcD mutants [6,7]. A 160 bp palindrome induced genetic form structures [16]. To account for these and other examples of recombination in vegetative cells of the yeast Schizosaccharomyces palindrome-induced genetic instability, two non-exclusive types pombe [8], while a 140 bp palindrome presented a hot spot for of models have been proposed: first, which depend on template meiosis-specific double-strand breaks in S. cerevisiae [9]. Strong, misalignment during replication and second, which require an palindrome-induced genetic instability was also observed in the enzymatic activity that can transform secondary structures formed in vivo into double-strand breaks (reviews: [17–20]). The members of the polyfunctional MRX complex (Mre11-Rad50-Xrs2), together with the product of the SAE2 gene, were identified in yeast as pro- ∗ Corresponding author. Tel.: +385 1 4836013; fax: +385 1 4836016. E-mail address: [email protected] (Z. Zgaga). teins involved in processing of the hairpin-capped break needed

1568-7864/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.dnarep.2008.11.017

66 | Size-dependent palindrome-induced intrachromosomal recombination in yeast paper 1

384 B. Lisni´c et al. / DNA Repair 8 (2009) 383–389 for generation of recombinogenic DNA ends [21,22] and recently is flanked by G:C and C:G base pairs. Plasmids with shortened it was shown that the plasmid–borne palindromic sequences are palindromic sequences present either in the CYC1 region (pAB7- transformed to DSBs in a Mus81-dependent manner [23]. 110, pAB7-98, pAB7-74, pAB7-50 and pAB7-20) or in the plasmid Important genetic elements, but also potentially destabilizing backbone (pAB9-114, pAB9-102, pAB9-78, pAB9-54 and pAB9-24) motifs, palindromic sequences are attractive DNA patterns for sys- were constructed by digestion of corresponding parental plasmids tematic study at genome level. Short palindrome avoidance was (pAB7-146 and pAB-150, respectively) with the appropriate restric- first observed in the genomes of some bacteriophages and bac- tion enzyme (HindIII, SphI, SalI, BamHI or SacI; see Fig. 2c for exact teria and this bias was explained in the light of the activity of location of these sites) and subsequent religation. The overall size bacterial restriction–modification systems [24]. However, short of a palindromic sequence formed in each plasmid is indicated in palindromes were also under-represented in the yeast S. cerevisiae, its name. However, it should be noted that the restriction sites but at the same time palindromes longer than 12 bp were shown EcoRI and PvuII present in the plasmid pAB218-2 (no palindromic to be highly over-represented [25]. Long, AT-rich palindromes insert), together with their flanking bases, are also 8 bp palindromic were also over-represented in the III and X of the sequences. Standard procedures were used for all nematode Caenorhabditis elegans [26]. In the human genome palin- DNA manipulations [31]. dromic sequences are over-represented in the introns and upstream regions. More than 30 palindromes longer than 100 bp were scored, 2.3. Yeast transformation including several palindromes attending up to 300–400 bp [27]. Moreover, it has been proposed that longer palindromes may be We used standard lithium acetate transformation procedure even more frequent in eukaryotic genomes but remain undetected [34] for construction of the sae2::kanMX strains, but without carrier due to their instability during cloning in the E. coli cells [28]. DNA added. For the construction of all other yeast strains a modified An excess of large palindromes could clearly present a threat to spheroplast procedure, described previously, was used [35]. genome integrity and we decided here to investigate in more detail the influence of palindrome size on palindrome-induced genetic 2.4. Yeast strains instability. For this purpose palindromic sequences of increasing lengths were introduced in the yeast genome in order to monitor All yeast strains used in this study were isogenic to the parental whether they would stimulate intrachromosomal recombination strain FF18-52 (MATa, ade5, leu2-3,119, trp1-289, ura3-52, canR). between flanking homologous sequences. Our results indicated that The strain FF-Ura+, in which the ura3-52 allele was replaced by palindrome-stimulated pop-out recombination is size-dependent, the functional URA3 gene, was isolated after the transformation of but only after a critical size of approximately 70 bp is reached. the FF18-52 strain with circular plasmid pAB218-2 and Southern blot analysis of the Ura+ transformants obtained. The yeast strains 2. Materials and methods used to study the influence of palindromic sequences on pop-out recombination were constructed by transformation of FF18-52 with 2.1. Media and growth conditions appropriate plasmids linearized with XhoI in order to target plas- mid integration into the yeast CYC1 region present on

Yeast cells were grown at 28 ◦C either in the standard YPD or in X [32]. The control strain F-OOO, with no inserted palindromic synthetic complete (SC) medium, Ura+ transformants were selected sequences, was constructed by integration of the plasmid pAB218- on synthetic SC medium lacking uracil (SC-ura), and geneticin 2 in the CYC1 region. As explained before, this construct contained resistant transformants were selected on YPD plates containing the endogenous 8 bp palindromes in positions a–c as presented 200 ␮g/mL G418 [29]. Pop-out recombinants were selected on SC in Fig. 1. The yeast strains F-OPO-150, F-OPO-114, F-OPO-102, F- plates supplemented with 1 g/L 5-fluoroorotic acid (5-FOA) and OPO-78, F-OPO-54 and F-OPO-24 were constructed by targeted 33.6 ␮g/mL uracil [30]. E. coli DH5␣ was grown at 37 ◦C in LB medium supplemented with 50 ␮g/mL ampicillin, when necessary [31].

2.2. Plasmids

Plasmids used in this study for introduction of palindromic sequences into the yeast genome were based on the integrative vec- tor pAB218-2 (6504 bp) containing the yeast CYC1 region (1691 bp) and the URA3 gene [32]. Synthetic palindromic inserts were intro- duced either in the plasmid backbone (PvuII site) or within the yeast CYC1 region (EcoRI site). The longest, 136 bp EcoRI–EcoRI palin- dromic insert presented in Fig. 2c was constructed by self-ligation and subsequent digestion with EcoRI of the 68 bp EcoRI–BlnI frag- ment from the multiple cloning site (MCS) present in the plasmid pAB218-8 [33]. This palindrome was then either ligated directly into the EcoRI site of pAB218-2 to give rise to the plasmid pAB7-146, or end-filled using the Klenow fragment and then ligated into the Fig. 1. Pop-out assay for palindrome-induced intrachromosomal recombination. Recombination between directly repeated CYC1 regions, associated with loss of PvuII site, giving rise to the plasmid pAB9-150. Although the same the intervening sequence, generated uracil auxotrophs that were selected on the palindromic insert was used for construction of these two plas- plates containing 5-FOA. Palindromic sequences (head-to-head arrows, not drawn mids, the overall size of palindromic sequence in pAB7-146 was to scale) were inserted in one of the following locations: EcoRI site present in the 146 bp since the EcoRI site in pAB218-2 is flanked by T:A and A:T centromere-proximal (a) or centromere-distal (c) copy of the CYC1 gene (strains F- base pairs. Similarly, the overall size of the palindromic sequence POO and F-OOP, respectively) or in the PvuII site found in the plasmid backbone (strains F-OPO, position b). A–F indicate locations of primers used for amplifica- in pAB9-150 is 150 bp, since the 136 bp palindromic fragment was tion of palindrome-containing regions in the yeast genome. The centromere on end-filled prior to into the PvuII site in pAB218-2, which chromosome X is indicated by a circle.

Size-dependent palindrome-induced intrachromosomal recombination in yeast | 67 paper 1

B. Lisni´c et al. / DNA Repair 8 (2009) 383–389 385

Fig. 2. Analysis of palindromes introduced in the yeast genome. (a) Amplification of the target loci with primers E and F. The expected sizes of the amplified E–F fragments are as follows: 553 bp, 565 bp, 595 bp, 619 bp, 643 bp, 655 bp and 691 bp in the strains F-OOO, F-OPO-24, F-OPO-54, F-OPO-78, F-OPO-102, F-OPO-114 and F-OPO-150, respectively. DNA fragments were separated by electrophoresis in the 1% agarose gel. Size standard was from Fermentas. (b) The E and F fragment amplified from the strain F-OPO-150 (691 bp) was further digested with all restriction enzymes that have their recognition sites present in the palindrome. Indicated are the sizes of the fragments generated after cleavage with BlnI deduced from the known DNA sequence. The third DNA fragment produced by cleavage, containing the central part of the palindrome, is not visible on the figure. Digestion products were separated by electrophoresis in 8% polyacrylamide gel. Analogous analysis was performed with all strains containing palindromic insertions. (c) sequence of the 150 bp palindrome present in the strain F-OPO-150. The restriction enzymes used in this analysis, together with their cleavage sites, are indicated. The sequence of the shorter palindromic inserts can be deduced from this figure by joining symmetrically placed restriction enzyme sites written in bold. For example, joining of the two HindIII sites creates the 114 bp palindrome found in the strain F-OPO-114.

integration of the plasmids pAB9-150, pAB9-114, pAB9-102, pAB9- BY4742: sae2 [37] using primers 302 (5 -GCT-GAA-TGA-AGC-AAC- 78, pAB9-54 and pAB9-24 into the CYC1 region and they contained TCC-TG-3 and 303 (5 -CGT-TCC-CGT-GGT-AGA-AAT-GC-3 ). The palindromic sequences of decreasing sizes (150, 114, 102, 78, 54 accuracy of deletion of the SAE2 gene in the strains F-OOO and and 24 bp, respectively) between the two copies of the CYC1 region F-OPO-114 was verified by Southern blotting using DIG-labelled (Fig. 1, position b). On the other side, targeted integration of the kanMX as a probe and additionally by PCR with primers 302, plasmids pAB7-146, pAB7-110, pAB7-98, pAB7-74, pAB7-50 and 303, K1 (5 -TAC-AAT-CGA-TAG-ATT-GTC-GCA-C-3 ) and K2 (5 -AAT- pAB7-20 could give rise to two types of transformants: those con- CAC-CAT-GAG-TGA-CGA-CT-3 ). The list of relevant yeast strains taining palindromic insertion in the left or in the right copy of the used here is presented in Table 1. Note that the position, type CYC1 region (positions a and c in Fig. 1). Southern blot was used and size of the insert are indicated in the strain name (for to distinguish between these two possibilities and to isolate the example, F-OON-110 indicates the strain with non-palindromic yeast strains F-POO-146, F-POO-110, F-POO-98, F-POO-74, F-POO- 50, and F-POO-20 containing palindromic sequences of decreasing sizes (146, 110, 98, 74, 50 and 20 bp, respectively) in the centromere- Table 1 proximal (left) copy of the CYC1 region (Fig. 1, position a). The Yeast strains used to study palindrome-induced recombination. same strategy was used to construct the strain F-OOP-110 con- Straina Size and location of a palindromic or taining the 110 bp palindrome in the centromere-distal copy of non-palindromic insertionb the CYC1 region and two strains with non-palindromic inserts in F-OOOc – the CYC1 region, F-OON-110 and F-NOO-110, where the plasmid F-POO-20 20 bp palindrome in a pAB218-8 cut with XhoI was used for transformation. The pres- F-POO-50 50 bp palindrome in a ence of palindromic (or non-palindromic) inserts in our constructs F-POO-74 74 bp palindrome in a F-POO-98 98 bp palindrome in a was confirmed by Southern blotting and restriction analysis of the F-POO-110 110 bp palindrome in a inserts amplified by PCR, as described in Fig. 2. We did not encounter F-POO-146 146 bp palindrome in a any difficulties in amplifying palindromic sequences under stan- F-OPO-24 24 bp palindrome in b dard conditions [30] using GoTaq® Polymerase (Promega, Cat. No. F-OPO-54 54 bp palindrome in b F-OPO-78 78 bp palindrome in b M3178). Left copy of the CYC1 region was amplified with primers F-OPO-102 102 bp palindrome in b C (5 -GAG-CGG-ATA-CAT-ATT-TGA-AT-3 ) and D (5 -CTT-CCA-ACT- F-OPO-114 114 bp palindrome in b TCT-TAC-CAG-TG-3 ), inserts placed in the plasmid backbone with F-OPO-150 150 bp palindrome in b F-OOP-110 110 bp palindrome in c primers E (5 -GGA-ATT-CCC-CCT-TAC-ACG-GAG-3 ) and F (5 -GGA- ATT-CGC-CTT-TGA-GTG-AGC-3 ), and right copy of the CYC1 region F-OON-110 110 bp non-palindrome in c F-NOO-110 110 bp non-palindrome in a with primers A (5 -AAC-GGA-AGC-GAT-GAT-GAG-AG-3 ) and B (5 - F-OOO/sae2 – GGC-ACC-TGT-CCT-ACG-AGT-TG-3 ). Positions of all primers used F-OPO-114/sae2 114 bp palindrome in b SAE2 are indicated in Fig. 1. Finally, deletion of the gene in the a All strains are derived from the strain FF-18-52 MATa, ade5, leu2-3,119, trp1-289, strains F-OOO and F-OPO-114 was carried out by standard one- ura3-52, canR. step gene replacement procedure [36]. The disruption cassette used b Locations a–c are indicated in Fig. 1. for inactivation of the SAE2 gene was amplified from the strain c The strain F-OOO contained endogenous 8 bp palindromes in locations a–c.

68 | Size-dependent palindrome-induced intrachromosomal recombination in yeast paper 1

386 B. Lisni´c et al. / DNA Repair 8 (2009) 383–389

110 bp insertion present in the centromere-distal copy of the CYC1 region)

2.5. Estimation of recombination rates

Pop-out recombination rate was determined by fluctuation analysis using 15 independent cultures per strain. For each strain, a preculture grown to stationary phase in SC-ura medium ( 108 cells/mL) was diluted 10,000 times with SC medium and ∼ 25 ␮L aliquots were further distributed into each of the 20 culture tubes. The tubes were then incubated on orbital shaker at 175 rpm until stationary phase was reached. At the end of growth, the num- ber of Ura− recombinants was determined by plating the content of each of the 15 cultures on plates containing 5-FOA. The total number of cells was determined by plating the appropriate dilu- tion of the remaining five cultures on YPD plates. Colonies were counted after 48 h and the most probable number of recombination events, together with respective 99% confidence intervals for each strain, was determined in WR Mathematica v5.2 by the maximum- likelihood method described by Lea and Coulson [38] using the statistical fluctuation analysis package SALVADOR [39]. Recombina- tion rates were expressed as the ratio of the most probable number of recombination events and the number of cell divisions, and were considered significantly different if there was no overlap between 99% confidence intervals. Fig. 3. Palindrome-stimulated pop-out recombination is SAE2-dependent. Error bars indicate 99% confidence intervals. 2.6. Gel electrophoresis 3.2. Stimulation of pop-out recombination by the presence of a Agarose (1.0%) and polyacrylamide (8.0%) gels were run in 1 110 bp palindromic insert is abolished in SAE2 mutant × TBE [31]. Gel images were taken through a red/orange filter with Sony DSC-W90 digital camera, converted to grayscale and cropped In the control strain F-OOO, that contained no palindromic to show only relevant portions of the gels. Figures were drawn in sequences inserted, the uracil auxotrophs occurred at the rate of 2.82 10 5. This was more than two orders of magnitude higher Adobe Illustrator CS3. × − than the mutation rate to uracil auxotrophy determined in the strain FF-Ura+ (1.60 10 8) indicating that the pop-out recombination × − 3. Results between the two copies of the CYC1 gene is dominant mecha- nism for generation of Ura− colonies in our strains, as presented 3.1. Experimental system in Fig. 1. Insertion of a 110 bp palindrome either within or between the two copies of the CYC1 gene further stimulated the pop-out In order to study the influence of palindrome size on its poten- rate, while the presence of a non-palindromic insert of the same tial to induce recombination in yeast, we constructed two series size in either copy of the CYC1 resulted in a small, but significant of plasmids containing palindromic sequences of increasing sizes decrease in the pop-out rate. Finally, in the sae2 background, no inserted either in the yeast CYC1 gene or in the plasmid backbone. difference in the pop-out rate was observed between the strain Targeted integrations of these plasmids into the yeast CYC1 locus containing palindromic insertion (F-OPO-114/sae2) and the control resulted in genetic structures depicted in Fig. 1. Palindromes were strain (F-OOO/sae2) so we conclude that the palindrome-induced present either within one of the CYC1 sequences (strains F-POO stimulation of pop-out recombination was completely dependent and F-OOP) or between the two CYC1 sequences (strains F-OPO). on the functional SAE2 gene (Fig. 3). We also note that even in the We also constructed yeast strains that contained a non-palindromic strain without palindromic insertion the sae2 mutation exhibited 110 bp insertion in the CYC1 gene (strains F-NOO-110 and F-OON- a small, but significant influence on the pop-out rate which was 110, Table 1). Recombination between the two CYC1 sequences decreased from 2.82 10 5 in the strain F-OOO to 1.31 10 5 in × − × − associated with deletion of the intervening sequence (pop-out) the strain F-OOO/sae2. gives rise to uracil auxotrophs that can be selected on the synthetic medium containing 5-FOA and we wondered whether the presence 3.3. Palindrome-stimulated pop-out recombination depends on of palindromic sequences would stimulate this type of homologous palindrome size and location intrachromosomal recombination (Fig. 1). Having in mind the instability of palindromes observed both Pop-out rates were further determined in the strains with palin- in the E. coli and in yeast cells, we first checked for the pres- dromic sequences of different sizes inserted either within the ence and integrity of palindromes in our constructs (Fig. 2). PCR left copy of the CYC1 gene (strains F-POO) or between the two amplifications of the target sequences from the genomic DNA gave CYC1 genes (Strains F-OPO). Similar, but not identical results were fragments of expected sizes (Fig. 2a). The amplified fragments were observed with these two series of strains (Fig. 4). For the F-OPO further digested with different restriction enzymes that recognize strains the presence of 24 and 54 bp palindromes did not change sequences contained in the palindromes (Fig. 2b and c). This anal- the pop-out rate, while both 20 and 50 bp palindromes inserted in ysis demonstrated that all restriction enzyme sites were conserved one copy of the CYC1 gene resulted in a small decrease, rather than in our constructs indicating that the palindromic sequences were increase in the pop-out rate. On the other side, although longer successfully introduced and maintained in the yeast genome. palindromes were recombinogenic in both series of strains, the

Size-dependent palindrome-induced intrachromosomal recombination in yeast | 69 paper 1

B. Lisni´c et al. / DNA Repair 8 (2009) 383–389 387

increased pop-out rate. Suppression of the pop-out recombination by the presence of point mutations in recombining homologies was already reported [41] and our results indicate that the presence of deletions/insertions could have the same effect (see also below). However, the palindrome-induced recombination was completely abolished in the sae2 background where the pop-out rates were similar in the F-OPO-114 and F-OOO strains (Fig. 3). First isolated as a gene required for meiotic DSB repair [42,43] the SAE2 gene was also shown to be involved in DNA repair in yeast vegetative cells [44] and recombination between inverted [22,45] and direct repeats [46,47]. Formation of chromosome inverted duplications in the sae2 cells has been attributed to their inability to pro- cess hairpin structures formed between inverted repeats [22] and recently it was shown that the Sae2 protein operates as an endonu- clease, together with the Mre11-Rad50-Xrs2 (MRX) complex, to cleave hairpin structures in vitro [21]. The 20-fold reduction in the pop-out rate in the F-OPO-114/sae2 strain strongly suggests that the palindrome-induced recombination observed in our assay also requires the formation and processing of hairpin-capped breaks [22,23]. On the other side, a small but significant reduction in the Fig. 4. Palindrome-stimulated recombination is size-dependent. Pop-out recombi- pop-out rate in the strain without palindromic insertion (F-OOO) nation rates in the strains with palindromic insertions in two different locations are observed in the sae2 background is consistent with results of presented (strains F-OPO and F-POO). First data point relates to the control strain Clerici et al. [46] who demonstrated that the DSB repair by the SSA F-OOO, that has no palindromic insert. Error bars indicate 99% confidence inter- vals (please note that in some cases the confidence intervals are very narrow and pathway is delayed and less efficient in the absence of the Sae2 obscured by the data points). protein.

4.2. The influence of palindrome location more pronounced increase was observed with the F-OPO strains where the presence of a 150 bp palindrome resulted in 22-fold The same 110 bp palindromic insertion stimulated pop-out stimulation of the pop-out rate. recombination in all three locations tested, but not to the same extent: while the pop-out rates in the strains F-POO-110 and F-OOP- 4. Discussion 110 were comparable (1.06 10 4 and 0.98 10 4 respectively), × − × − twice as much uracil auxotrophs per cell division (2.37 10 4) were × − While short palindromes are important DNA motifs frequently generated in the strain F-OPO-114 (Fig. 3). Apart from the fact that found in different cis-acting genetic elements longer palindromic this palindrome is 4 bp longer, an increased pop-out rate may also sequences are highly unstable in various organisms. Since recom- reflect different possibilities and requirements for repair of a DSB binogenicity of a palindrome is clearly size-related we decided to introduced in the palindromic sequence present either between investigate how the step-by-step increase in the size of a palin- the two copies of the CYC1 gene or within one copy. The results drome introduced in the yeast genome will affect its genetic presented in Fig. 4 also indicate higher stimulation of the pop-out stability. The pop-out assay developed here to measure genetic rate in the F-OPO strains. While in both cases the SSA pathway recombination was convenient for introduction of different palin- could give rise to uracil auxotrophs, the conversion pathway, which dromic sequences present on bacterial plasmids directly into the is rarely associated with reciprocal recombination in yeast, may yeast genome, but also demonstrated a high sensitivity required for compete for repair of the breaks induced within the CYC1 gene. detection of slight differences in genetic stability. In our previous Moreover, the break introduced within the palindromic insertion in work we analyzed by Southern blotting 80 5-FOA resistant colonies the CYC1 gene would produce heterologous termini that can impair derived from the strain F-POO-110 and 60 colonies derived from efficient repair by the conversion-type recombination [48]. In other the strain F-NOO-110. They have all lost the intervening sequence words, although the formation of recombinogenic structures could indicating that they were indeed produced by pop-out recombi- be equally frequent in all locations, the breaks introduced between nation [33]. Two different mechanisms could be responsible for the two copies of the CYC1 could be more productive in generating the generation of uracil auxotrophs in our pop-out assay: single- uracil auxotrophs. However, it is also possible that some other fac- strand annealing (SSA) involving repeated copies of the CYC1 gene, tors, like local chromosome structure/supercoiling or proximity to or intrachromosomal reciprocal recombination between these two some cis-acting elements like origins of replication [10], could influ- homologies [40]. As stated before, the spontaneous mutation rate ence extrusion of the hairpin/cruciform and its transformation to to uracil auxotrophy was more than two orders of magnitude infe- the recombinogenic break. rior to the pop-out rate reported here and could not significantly influence our results. 4.3. A critical size for palindrome-induced recombination in yeast

4.1. The role of SAE2 gene in palindrome-stimulated pop-out In order to get a better insight in the size-dependence of recombination palindrome-induced recombination we decided to insert palin- dromes of different sizes in two different locations, within one The results presented in Fig. 3 clearly indicate that the insertion copy of the CYC1 gene (strains F-POO), and between the two copies of a 110 palindromic sequence either within or between directly (strains F-OPO). In spite of some differences that were observed repeated copies of the CYC1 gene stimulated pop-out recombi- between these two series of yeast strains and that were already nation. A non-palindromic insertion of the same size and placed discussed, they revealed the same basic response (Fig. 4). Namely, in the same location in either copy of the CYC1 gene (strains the increase in the pop-out rate observed was not gradual and F-NOO-110 and F-OON-110) resulted in decreased, rather than no stimulation of intrachromosomal recombination was observed

70 | Size-dependent palindrome-induced intrachromosomal recombination in yeast paper 1

388 B. Lisni´c et al. / DNA Repair 8 (2009) 383–389 with palindromes up to 50/54 bp. A significant increase with the F- Acknowledgements OPO strains was observed when the palindrome size reached 78 bp. In the F-POO strains, the insertion of 20 and 50 bp palindromes We gratefully acknowledge the assistance of our students Mar- resulted in decreased, rather than increased pop-out rate. As dis- ija Kovac,ˇ Vanda Juranic-Lisni´ c´ and Lucija Barbaric´ in different cussed before, these short palindromes may act as heterologous stages of this project. We also thank Natasaˇ Tomaseviˇ c´ for technical insertions within one copy of the CYC1 gene reducing the pop-out assistance. This work was funded by Croatian Ministry of Science, rate, but when 50 and 74 bp palindromes were compared, a sig- Education and Sports [grant number 2258]. nificant increase in the pop-out rate was observed. The increase in palindrome length for the next 14 bp resulted in more than References twofold increase in the pop-out rate and was followed by very strong further stimulation of genetic instability. Based on this anal- [1] J.T. Andersen, P. Poulsen, K.F. Jensen, Attenuation in the rph-pyrE operon of ysis we conclude that palindromes become recombinogenic only Escherichia coli and processing of the dicistronic mRNA, Eur. J. Biochem. 206 when they attain a certain critical size and in our case we can esti- (1992) 381–390. [2] K. Hiratsu, S. Mochizuki, H. Kinashi, Cloning and analysis of the replication origin mate this critical size to be about 70 bp. Since recombinogenicity and the telomeres of the large linear plasmid pSLA2-L in Streptomyces rochei, of palindromic sequences is related to their potential to form sec- Mol. Gen. Genet. 263 (2000) 1015–1021. ondary structures it may be that smaller palindromes either do not [3] R.R. Sinden, S.S. Broyles, D.E. Pettijohn, Perfect palindromic lac operator DNA sequence exists as a stable cruciform structure in supercoiled DNA in vitro but form such structures in vivo or that they are very unstable and not in vivo, Proc. Natl. Acad. Sci. U.S.A. 80 (1983) 1797–1801. short-lived. Conversely, with increasing size of a palindrome the [4] S.K. Thukral, A. Eisen, E.T. Young, Two monomers of yeast transcription factor cruciform/hairpin structure may become more stable and readily ADR1 bind a palindromic sequence symmetrically to activate ADH2 expression, Mol. Cell Biol. 11 (1991) 1566–1577. transformed to a DSB. Local melting of DNA duplex and adoption [5] S.K. Weller, A. Spadaro, J.E. Schaffer, A.W. Murray, A.M. Maxam, P.A. Schaffer, of cruciform structure are also temperature-dependent [49] which Cloning, sequencing, and functional analysis of oriL, a herpes simplex virus type may lead to higher palindrome instability at elevated tempera- 1 origin of DNA synthesis, Mol. Cell Biol. 5 (1985) 930–942. [6] A.F. Chalker, D.R. Leach, R.G. Lloyd, Escherichia coli sbcC mutants permit stable tures (data not shown). However, we cannot exclude the possibility propagation of DNA replicons containing a long palindrome, Gene 71 (1988) that even below the estimated critical size palindromes may form 201–205. hairpins/cruciforms in vivo, but that they are not subjected to the [7] J.C. Connelly, D.R. Leach, The sbcC and sbcD genes of Escherichia coli encode a cleavage by secondary structure-specific nucleases. This possibil- nuclease involved in palindrome inviability and genetic recombination, Genes Cells 1 (1996) 285–291. ity is supported by the observation that only hairpins longer than [8] J.A. Farah, E. Hartsuiker, K. Mizuno, K. Ohta, G.R. Smith, A 160-bp palindrome 70 bp are recognized and repaired during yeast meiosis in a pathway is a Rad50.Rad32-dependent mitotic recombination hotspot in Schizosaccha- independent from the Msh1 and Rad1 proteins [9]. romyces pombe, Genetics 161 (2002) 461–468. [9] F. Nasar, C. Jankowski, D.K. Nag, Long palindromic sequences induce double- strand breaks during meiosis in yeast, Mol. Cell Biol. 20 (2000) 3449–3458. [10] K.S. Lobachev, B.M. Shor, H.T. Tran, W. Taylor, J.D. Keen, M.A. Resnick, D.A. 4.4. Suppression of palindrome-induced recombination may be Gordenin, Factors affecting inverted repeat stimulation of recombination and crucial for the genetic stability of genomes containing long deletion in Saccharomyces cerevisiae, Genetics 148 (1998) 1507–1524. [11] A.J. Rattray, A method for cloning and sequencing long palindromic DNA junc- palindromes tions, Nucleic Acids Res. 32 (2004) e155. [12] L.A. Cunningham, A.G. Cote, C. Cam-Ozdemir, S.M. Lewis, Rapid, stabilizing To summarize, we show that even though palindromes longer palindrome rearrangements in somatic cells by the center-break mechanism, Mol. Cell Biol. 23 (2003) 8740–8750. than approximately 70 bp may be propagated and maintained in [13] H. Kurahashi, B.S. Emanuel, Long AT-rich palindromes and the constitutional yeast, they present a source of genetic rearrangements and will t(11;22) breakpoint, Hum. Mol. Genet. 10 (2001) 2605–2617. eventually be eliminated from the genome in the absence of pos- [14] H. Kurahashi, T. Shaikh, M. Takata, T. Toda, B.S. Emanuel, The constitutional t(17;22): another translocation mediated by palindromic AT-rich repeats, Am. itive selection. However, can our results be exploited to anticipate J. Hum. Genet. 72 (2003) 733–738. the stability of palindromic sequences present in the genomes [15] H. Tanaka, D.A. Bergstrom, M.C. Yao, S.J. Tapscott, Widespread and nonrandom of other organisms, notably, in humans? The available informa- distribution of DNA palindromes in cancer cells provides a structural platform for subsequent gene amplification, Nat. Genet. 37 (2005) 320–327. tion about the presence of longer palindromes in the genomes is [16] V. Narayanan, K.S. Lobachev, Intrachromosomal gene amplification triggered by rather sporadic, and only a few systematic studies have been per- hairpin-capped breaks requires homologous recombination and is independent formed. While the longest palindrome in the yeast genome has of nonhomologous end-joining, Cell Cycle 6 (2007) 1814–1818. [17] D.A. Gordenin, M.A. Resnick, Yeast ARMs (DNA at-risk motifs) can reveal sources 44 bp [25] and only a few palindromes longer than 60 bp were of genome instability, Mutat. Res. 400 (1998) 45–58. found on chromosomes III and X in C. elegans [26], palindromes [18] D.R. Leach, Long DNA palindromes, cruciform structures, genetic instability and exceeding 100 bp were found in some bacterial genomes (Lisnic´ et secondary structure repair, Bioessays 16 (1994) 893–900. al., unpublished) and more than 30 palindromes longer than 100 bp [19] S.M. Lewis, A.G. Cote, Palindromes and genomic stress fractures: bracing and repairing the damage, DNA Repair (Amst) 5 (2006) 1146–1160. were scored in the haploid human genome [27]. These results sug- [20] K.S. Lobachev, A. Rattray, V. Narayanan, Hairpin- and cruciform-mediated chro- gest that their recombinogenicity may be controlled at the cellular mosome breakage: causes and consequences in eukaryotic cells, Front Biosci. level. Kurahashi et al. [50] have demonstrated a very high incidence 12 (2007) 4208–4220. [21] B.M. Lengsfeld, A.J. Rattray, V. Bhaskara, R. Ghirlando, T.T. Paull, Sae2 of near-palindrome mediated t(11;22) chromosomal translocations is an endonuclease that processes hairpin DNA cooperatively with the in male meiosis and they proposed that the factor(s) required for Mre11/Rad50/Xrs2 complex, Mol. Cell 28 (2007) 638–651. transformation of palindromic sequences to DSBs are present only [22] K.S. Lobachev, D.A. Gordenin, M.A. Resnick, The Mre11 complex is required for repair of hairpin-capped double-strand breaks and prevention of chromosome in meiotic cells. Alternatively, it is also possible that the recombino- rearrangements, Cell 108 (2002) 183–193. genicity of palindromes is in fact actively suppressed in somatic [23] A.G. Coté, S.M. Lewis, Mus81-dependent double-strand DNA breaks at in vivo- cells in order to maintain genome stability. The inactivation of generated cruciform structures in S. cerevisiae, Mol. Cell 32 (2008) 800–812. [24] M.S. Gelfand, E.V. Koonin, Avoidance of palindromic words in bacterial and a function involved in suppression of palindrome instability may archaeal genomes: a close connection with restriction enzymes, Nucleic Acids trigger recombination processes giving rise to different genetic Res. 25 (1997) 2430–2439. rearrangements described in detail in yeast [51], but also observed [25] B. Lisnic,´ I.K. Svetec, H. Saric,ˇ I. Nikolic,´ Z. Zgaga, Palindrome content of the yeast Saccharomyces cerevisiae genome, Curr. Genet. 47 (2005) 289–297. in human cancer cells [15]. [26] M.D. LeBlanc, G. Aspeslagh, N.P. Buggia, B.D. Dyer, An annotated catalog of inverted repeats of Caenorhabditis elegans chromosomes III and X, with observa- tions concerning odd/even biases and conserved motifs, Genome Res. 10 (2000) Conflicts of interest 1381–1392. [27] L. Lu, H. Jia, P. Droge, J. Li, The human genome-wide distribution of DNA palin- No conflicts of interest. dromes, Funct. Integr. Genomics 7 (2007) 221–227.

Size-dependent palindrome-induced intrachromosomal recombination in yeast | 71 paper 1

B. Lisni´c et al. / DNA Repair 8 (2009) 383–389 389

[28] S.M. Lewis, S. Chen, J.N. Strathern, A.J. Rattray, New approaches to the analysis [40] F. Pâques, J.E. Haber, Multiple pathways of recombination induced by double- of palindromic sequences from the human genome: evolution and polymor- strand breaks in Saccharomyces cerevisiae, Microbiol. Mol. Biol. Rev. 63 (1999) phism of an intronic site at the NF1 locus, Nucleic Acids Res. 33 (2005) 349–404. e186. [41] A. Datta, M. Hendrix, M. Lipsitch, S. Jinks-Robertson, Dual roles for DNA [29] F. Sherman, Getting started with yeast, Methods Enzymol. 350 (2002) 3–41. sequence identity and the mismatch repair system in the regulation of mitotic [30] J.D. Boeke, F. LaCroute, G.R. Fink, A positive selection for mutants lacking crossing-over in yeast, Proc. Natl. Acad. Sci. U.S.A. 94 (1997) 9757–9762. orotidine-5 -phosphate decarboxylase activity in yeast: 5-fluoro-orotic acid [42] A.H. McKee, N. Kleckner, A general method for identifying recessive diploid- resistance, Mol. Gen. Genet. 197 (1984) 345–346. specific mutations in Saccharomyces cerevisiae, its application to the isolation [31] J. Sambrook, D.W. Russell, Molecular Cloning: A Laboratory Manual, Cold Spring of mutants blocked at intermediate stages of meiotic prophase and characteri- Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2001. zation of a new gene SAE2, Genetics 146 (1997) 797–816. [32] I.K. Svetec, D. Stjepandic,´ V. Boric,´ P.T. Mitrikeski, Z. Zgaga, The influence of a [43] S. Prinz, A. Amon, F. Klein, Isolation of COM1, a new gene required to com- palindromic insertion on plasmid integration in yeast, Food Technol. Biotechnol. plete meiotic double-strand break-induced recombination in Saccharomyces 39 (2001) 169–173. cerevisiae, Genetics 146 (1997) 781–795. [33] I.K. Svetec, B. Lisnic,´ Z. Zgaga, A 110 bp palindrome stimulates plasmid integra- [44] E. Baroni, V. Viscardi, H. Cartagena-Lirola, G. Lucchini, M.P. Longhese, The func- tion in yeast, Period. Biol. 104 (2002) 421–424. tions of budding yeast Sae2 in the DNA damage response require Mec1- and [34] R.D. Gietz, R.H. Schiestl, High-efficiency yeast transformation using the LiAc/SS Tel1-dependent phosphorylation, Mol. Cell. Biol. 24 (2004) 4151–4165. carrier DNA/PEG method, Nat. Protoc. 2 (2007) 31–34. [45] A.J. Rattray, C.B. McGill, B.K. Shafer, J.N. Strathern, Fidelity of mitotic double- [35] K. Gjuraciˇ c,´ Z. Zgaga, Illegitimate integration of single-stranded DNA in Saccha- strand-break repair in Saccharomyces cerevisiae: a role for SAE2/COM1, Genetics romyces cerevisiae, Mol. Gen. Genet. 253 (1996) 173–181. 158 (2001) 109–122. [36] A. Wach, A. Brachat, R. Pohlmann, P. Philippsen, New heterologous modules for [46] M. Clerici, D. Mantiero, G. Lucchini, M.P. Longhese, The Saccharomyces cerevisiae classical or PCR-based gene disruptions in Saccharomyces cerevisiae, Yeast 10 Sae2 protein promotes resection and bridging of double strand break ends, J. (1994) 1793–1808. Biol. Chem. 280 (2005) 38631–38638. [37] E.A. Winzeler, D.D. Shoemaker, A. Astromoff, H. Liang, K. Anderson, B. Andre, [47] K. Lee, S.E. Lee, Saccharomyces cerevisiae Sae2- and Tel1-dependent single- R. Bangham, R. Benito, J.D. Boeke, H. Bussey, A.M. Chu, C. Connelly, K. Davis, F. strand DNA formation at DNA break promotes microhomology-mediated end Dietrich, S.W. Dow, M. El Bakkoury, F. Foury, S.H. Friend, E. Gentalen, G. Giaever, joining, Genetics 176 (2007) 2003–2014. J.H. Hegemann, T. Jones, M. Laub, H. Liao, N. Liebundguth, D.J. Lockhart, A. Lucau- [48] I.K. Svetec, A. Stafa,ˇ Z. Zgaga, Genetic side effects accompanying gene targeting Danila, M. Lussier, N. M’Rabet, P. Menard, M. Mittmann, C. Pai, C. Rebischung, in yeast: the influence of short heterologous termini, Yeast 24 (2007) 637–652. J.L. Revuelta, L. Riles, C.J. Roberts, P. Ross-MacDonald, B. Scherens, M. Snyder, S. [49] V. Viglasky, P. Danko, J. Adamcik, F. Valle, G. Dietler, Detection of cruciform Sookhai-Mahadeo, R.K. Storms, S. Veronneau, M. Voet, G. Volckaert, T.R. Ward, extrusion in DNA by temperature-gradient gel electrophoresis, Anal. Biochem. R. Wysocki, G.S. Yen, K. Yu, K. Zimmermann, P. Philippsen, M. Johnston, R.W. 343 (2005) 308–312. Davis, Functional characterization of the S. cerevisiae genome by gene deletion [50] H. Kurahashi, H. Inagaki, T. Ohye, H. Kogo, T. Kato, B.S. Emanuel, Palindrome- and parallel analysis, Science 285 (1999) 901–906. mediated chromosomal translocations in humans, DNA Repair (Amst) 5 (2006) [38] D. Lea, C.A. Coulson, The distribution of the number of mutants in bacterial 1136–1145. populations, J. Genet. 49 (1949) 264–285. [51] V. Narayanan, P.A. Mieczkowski, H.M. Kim, T.D. Petes, K.S. Lobachev, The pattern [39] Q. Zheng, New algorithms for Luria-Delbrück fluctuation analysis, Math. Biosci. of gene amplification is determined by the chromosomal location of hairpin- 196 (2005) 198–214. capped breaks, Cell 125 (2006) 1283–1296.

72 | Size-dependent palindrome-induced intrachromosomal recombination in yeast paper 2

Curr Genet (2005) 47: 289–297 DOI 10.1007/s00294-005-0573-5

RESEARCH ARTICLE

Berislav Lisnic´ Æ Ivan-Kresˇ imir Svetec Hrvoje Sˇ aric´ Æ Ivan Nikolic´ Æ Zoran Zgaga Palindrome content of the yeast Saccharomyces cerevisiae genome

Received: 10 February 2005 / Accepted: 20 February 2005 / Published online: 18 March 2005 Ó Springer-Verlag 2005

Abstract Palindromic sequences are important DNA Keywords Palindrome Æ Inverted repeat Æ motifs involved in the regulation of different cellular Dinucleotide Æ Saccharomyces cerevisiae Æ Sequence processes, but are also a potential source of genetic analysis instability. In order to initiate a systematic study of palindromes at the whole genome level, we developed a computer program that can identify, locate and count Introduction palindromes in a given sequence in a strictly defined way. All palindromes, defined as identical inverted re- Closely spaced inverted repeats (IRs), palindromes and peats without spacer DNA, can be analyzed and sorted quasipalindromes can be found in the DNA of natural according to their size, frequency, GC content or plasmids, viral and bacterial genomes and eukaryotic alphabetically. This program was then used to prepare a chromosomes and organelles. In prokaryotes, they may catalog of all palindromes present in the chromosomal serve as binding sites for regulatory proteins, while short DNA of the yeast Saccharomyces cerevisiae. For each perfect palindromes are known as recognition sites for palindrome size, the observed palindrome counts were type II restriction-modification systems (RMSs) that play significantly different from those in the randomly gen- a significant role in bacterial ecology and evolution erated equivalents of the yeast genome. However, while (Gelfand and Koonin 1997; Rocha et al. 2001). Another the short palindromes (2–12 bp) were under-repre- important property of such motifs is their potential to sented, the palindromes longer than 12 bp were over- form intra-strand hydrogen bonds within DNA mole- represented, AT-rich and preferentially located in the cules or in corresponding RNA transcripts. Therefore, intergenic regions. The 44-bp palindrome found between they are contained in genes encoding functional RNA the genes CDC53 and LYS21 on chromosome IV was molecules, the structure of which depends on the for- the longest palindrome identified and contained only mation of proper intra-strand bonding, and in different two C-G base pairs. Avoidance of coding regions was cis-acting genetic elements, like terminators, attenuators, also observed for palindromes of 4–12 bp, but was less plasmid and viral origins of replication. Protein binding pronounced. Dinucleotide analysis indicated a strong and secondary structure formation are also modes of bias against palindromic dinucleotides that could ex- action for IRs and related motifs in eukaryotic cells. For plain the observed short palindrome avoidance. We example, palindromes with a spacer of one nucleotide discuss some possible mechanisms that may influence were identified in yeast sequences regulating cellular re- the evolutionary dynamics of palindromic sequences in sponse to the accumulation of unfolded proteins in the the yeast genome. endoplasmic reticulum (Mori et al. 1998) and a hetero- dimeric complex was isolated that binds two palindromic sequences in the promoter region of the human erbB-2 Communicated by S. Hohmann gene (Chen and Gill 1996). In mouse B lymphoma cells, B. Lisnic´ Æ I.-K. Svetec Æ Z. Zgaga (&) palindromic and potential stem-loop motifs were iden- Faculty of Food Technology and Biotechnology, tified as break-points during class switch recombination University of Zagreb, Pierottijeva 6, (Tashiro et al. 2001); and the formation of intra-strand 10000 Zagreb, Croatia E-mail: [email protected] secondary structures is essential in the process of im- Tel.: +385-1-4836013 munoglobuline gene rearrangement known as V(D)J- Fax: +385-1-4836016 joining (Cuomo et al. 1996). H. Sˇ aric´ Æ I. Nikolic´ However, in spite of their importance and functional Sail Company Croatia Ltd., Ilica 412, 10000 Zagreb, Croatia versatility, longer palindromes and IRs were shown to be

Palindrome content of the yeast Saccharomyces cerevisiae genome | 71 paper 2

290

very unstable in different organisms, from bacteria to 001136.5, NC_001137.2, NC_001138.4, NC_001139.4, mammalian cells. The recombinogenicity of such motifs NC_001140.3, NC_001141.1, NC_001142.5, NC_ is attributed to their potential to form secondary struc- 001143.4, NC_001144.3, NC_001145.2, NC_001146.2, tures known as hairpins and cruciforms and the molec- NC_001147.4, NC_001148.2) were downloaded from ular models proposed to explain palindrome-induced the NCBI (ftp://ftp.ncbi.nih.gov/genomes/Saccharomy- genomic instability can be divided into two classes: first, ces_cerevisiae/) on 3 April, 2004. The total length of based on template switching or slippage of the DNA downloaded yeast genomic DNA sequences (excluding polymerase during replication of DNA adopting sec- mitochondrial DNA) was 12,070,766 bp. ondary structures and, second, requiring an enzymatic The file containing the coding regions of the entire activity that transforms cruciforms and hairpins to re- yeast genome (that is, ORF coding sequences only, combinogenic lesions, like double-strand breaks (DSBs; without 5¢UTR, 3¢UTR, intron sequences, or bases Leach 1994). Both types of models are supported by not translated due to translational frameshifting) experimental data; and several human genetic disorders was downloaded on 12 July, 2004 from the SGD (ftp:// can be explained either by errors occurring during the genome-ftp.stanford. edu/pub/yeast/data_download/se- replication of palindromic and quasipalindromic se- quence/genomic_sequence/orf_dna/orf_coding.fasta.gz). quences (Bissler 1998; Gordenin and Resnick 1998) or by This file includes all ORFs except dubious ORFs, and IR-induced illegitimate end-joining (Repping et al. 2002). prior to examination of the palindrome and dinucleotide Given the importance of palindromic sequences in the content in the coding regions, the sequences of mito- regulation of different cellular processes on the one side chondrial DNA were removed from the file. The total and their influence on genetic stability on the other side, length of the coding DNA sequences in this shortened an analysis of the incidence of palindromes at the geno- file was 8,755,368 bp. mic level seems particularly interesting. LeBlanc et al. Ten random genomes (12,070,766 bp each) were (2000) prepared a catalog of palindromes of 4–60 bp generated with respect to the frequency of the four bases present on chromosomes III and X of the nematode present in the yeast genome (A=30.90%, C=19.17%, Caenorhabditis elegans. Palindromes were classified as G=19.13%, T=30.81%), so that the average propor- AT-rich, non-AT-rich or GC-rich and this analysis tions of the bases in the random genomes were: indicated that the long, AT-rich palindromes are much A=30.90±0.01%, C=19.18±0.01%, G=19.13± more frequent in the actual chromosomes than in the 0.01% and T=30.81±0.01%. randomly generated sequences. The IRs separated by a single base pair were also included in their study and such ‘‘odd palindromes’’ were more frequent than perfect IRs Palindrome count without a spacer. Avoidance of short palindromic se- quences was observed in the genomes of some bacterio- The Spinnaker program described in this work phages (Sharp 1986), but also in the genomes of their is available upon request. bacterial hosts (Karlin et al. 1997; Gelfand and Koonin 1997; Rocha et al. 2001). The highest bias was frequently observed for the recognition sites of the type II restriction Statistics enzymes, supporting the view that RMSs influence gen- ome evolution in prokaryotes. Consistent with this The numbers of dinucleotides and palindromes deter- hypothesis, Fuglsang (2004) demonstrated that the suc- mined in the chromosomal DNA were called the ob- cession of codons disfavoring palindrome formation is served numbers, while the numbers of dinucleotides and more pronounced for the bacterial species that have palindromes determined in the random genomes were RMSs. In order to initiate a systematic study of palin- called the expected numbers. To test whether the medi- dromes present in different genomes, we decided first to ans of the expected numbers of palindromes and dinu- develop a computer program that can score and count all cleotides differed significantly from the observed palindromes present in a given sequence in a strictly de- numbers, we employed the Wilcoxon signed rank test fined way. This program was then used to analyze the with a confidence interval of 99%, using MINITAB palindrome content of the entire genome of the yeast ver. 14.1. Saccharomyces cerevisiae and the results of this analysis clearly indicate that the palindromic sequences conform to the specific rules of evolutionary dynamics. Results

Materials and methods Palindrome scoring

DNA sequences Systematic study of palindromic sequences requires convenient software that will recognize and map all The DNA sequence files for all yeast chromosomes palindromes in a predetermined size-range at the geno- (NC_001133.4, NC_001134.5, NC_001135.6, NC_ mic level. This is not a trivial problem, since individual

74 | Palindrome content of the yeast Saccharomyces cerevisiae genome paper 2

291 palindromes found within a given sequence can be sents a combination of these two modes of palindrome scored in different ways. First, we may decide to score as scoring where only non-nested embedded palindromes individual palindromes only those palindromes that do are presented. not share common base pairs, but this approach can Our computer program, named Spinnaker, was de- result in an underestimation of the actual number of signed to score palindromes as indicated in Fig. 1c–e. palindromes, since some palindromes may remain Let us define first the frame as an array of nucleotides of undetected and different results can be obtained for the even length. The location of the frame within a given same sequence (compare Fig. 1a,b). Therefore, we DNA sequence is identified by position of its left decided to count also the palindromes that do share boundary (L) and by its length. The maximum length of common base pairs and in this case we have the possi- the frame (p) is propounded as a search parameter that bility either to score only the longest palindromes that corresponds to the length of the potentially largest pal- may overlap, or to include also the shorter palindromes indrome. The position of the right boundary of the embedded within longer palindromes. There are two frame is given by R=L + p 1. Additionally, we define À classes of such palindromes, those sharing the same P1 as the right boundary of the last previously found center of symmetry (nested) and those having a different palindrome. In the beginning, we set L=0 and P1=0. center of symmetry. This is shown in Fig. 1c, where all The algorithm for the search for overlapping palin- palindromes present in the test sequence are indicated, dromes can be described recursively as follows. Step 1: including both nested and non-nested palindromes en- we set the frame size to p and move it one position to the tirely contained within longer palindromes. Embedded right (L:=L + 1). Step 2: within the frame, we check for palindromes are omitted in Fig. 1d, while Fig. 1e pre- complementarity in the Lth and Rth base pair, then the (L + 1)th and the (R 1)th base pair and so on. If it is found that all base pairsÀ are complementary, the frame is defined as a palindrome and its properties, including P1, are recorded. However, if there is at least one non- complemantary base pair and if the current size of the frame is >2 and R >P1, we reduce the frame size by decreasing the position of its right boundary by two and then repeat step 2. Otherwise, we repeat step 1. The search for embedded palindromes (with or without un- ique symmetry axis) follows the same general procedure, ignoring the P1 parameter and introducing a few addi- tional internal structures needed for the affiliation of a palindrome to a given category. The maximal size of a palindrome can be set to 500 nt and the results of sequence analysis can be sorted by size or frequency, alphabetically, or by GC content (per- centage or absolute number of GC base pairs). Addi- tionally, Spinnaker can graphically display the distribution of palindromes, where the analyzed se- quence is presented as a horizontal line and palindromes of a given size as vertical lines. The results of a search can be saved as two separate files, one that includes the number and exact chromosomal locations of all palin- dromes and the other that includes the summarized re- sults of a search. The program is user-friendly and works under Windows. We compared Spinnaker with two programs designed for the identification of IRs: Reputer (Kurtz and Schleiermacher 1999) and Palindrome (Rice et al. 2000). Both programs recognize palindromic sequences, as Fig. 1 Palindrome scoring. Different numbers of palindromes can shown in Fig. 1e, so that some of the palindromes be counted in the same sequence depending on the scoring criterion. a, b Non-overlapping palindromes do not share common contained within a longer palindrome remain unde- base pairs. c Embedded palindromes include both non-overlapping tected. For example, they detected only three 6-nt pal- and palindromes that share common base pairs as well as indromes in the test sequence presented in Fig. 1, while all shorter palindromes contained within a longer palindrome. Spinnaker recorded six such palindromes when embed- d Overlapping palindromes include both non-overlapping and palindromes that share common base pairs but not shorter ded palindromes were counted. Reputer also recorded palindromes contained within a longer palindrome. e Embedded some non-palindromic sequences, so that 29 instead of palindromes with nested palindromes excluded 16 palindromic dinucleotides were found in our test se-

Palindrome content of the yeast Saccharomyces cerevisiae genome | 75 paper 2

292

Table 1 Comparison of the numbers of palindromes in the Palindrome size Overlapping palindromes Embedded palindromes yeast genome and in random genomes Yeast Random Yeast Random genome genomesa genome genomesa

2 1,732,917b 1,904,950.4 2,770,675b 3,183,592.3 4 479,972b 580,521.2 705,275b 839,462.8 6 133,411b 159,564.1 194,874b 221,347.3 8 38,234b 42,738.6 57,086b 58,424.4 10 10,246b 11,256.8 17,514c 15,382.8 12 2,809b 2,995.9 6,495c 4,095.1 14 939c 807 3,159c 1,096.3 16 282c 212 1,813c 289.1 18 96c 55.2 1,207c 77.1 20 59c 16.9 857c 21.9 22 65c 3.8 606c 5.0 24 24c 1.2 413c 1.2 26 29c 0 289c 0 28 17c 0 194c 0 30 9c 0 125c 0 c c a 32 9 0 80 0 The average number from ten 34 8c 0 49c 0 random genomes c c b 36 4 0 25 0 Under-represented palin- 38 3c 0 13c 0 dromes (P=0.006), also repre- 40 3c 06c 0 sented in italics c c c 42 2 030 Over-represented palindromes 44 1c 01c 0 (P=0.006)

quence; and the minimal palindrome size for Palindrome even the 10-bp palindromes were over-represented. This is 4 nt. is not unexpected, since in this case, all the 10-bp pal- indromes present in longer palindromes were also added to the sum. It is important to note that, even when Palindromic sequences in the yeast genome embedded palindromes were counted, palindromes shorter than 10 bp were under-represented in compari- We initiated our study of yeast palindromic sequences son with randomly generated genomes. by identifying all palindromes present in individual chromosomes (data not shown). We did not observe any particular pattern in the distribution of palindromic se- Long palindromes are AT-rich and are placed quences along different chromosomes, although this in intergenic regions feature was beyond the scope of this study and was not systematically analyzed. The complete catalog of palin- Each palindrome identified by our program can be lo- dromes present in all yeast chromosomes, determined cated on the physical map of the yeast genome (http:// both as overlapped and embedded palindrome counts, is www.yeastgenome.org), as shown for the six longest presented in Table 1. As expected, a sharp size-depen- palindromes found in the yeast chromosomal DNA dent decrease in the numbers of palindromes was ob- (Fig. 2). They were all placed in intergenic regions and served, but only for shorter palindromes. For example, were AT-rich, so we decided to analyze these features the number of palindromes containing 22 bp was 65, more systematically. We calculated the A+T content for while the number of 20-bp palindromes was 59. Simi- all palindromes found in the yeast genome and for the larly, there were more 26-bp than 24-bp palindromes palindromes detected in ten randomly generated se- identified. The palindrome content determined in the ten quences. The A+T content of palindromes found in randomly generated genomes is also included in Table 1 artificial sequences was around 70%, while slightly lower and can be compared with the palindrome content of the values were observed in genomic palindromes up to actual genome. For each palindrome size, the values 12 nt (Fig. 3). However, a high increase in A+T content observed in the yeast genome were significantly different was observed with longer palindromes that were almost from those obtained in artificial genomes (P=0.006). exclusively composed of A-T base pairs (Fig. 2). The However, palindromes up to 12 bp were under-repre- remarkable exception was the presence of a GC-rich sented, while longer palindromes were over-represented. (52.4%) 42-nt palindrome found between the RFX1 gene The longest palindromes found in ten randomly gener- and YLR177W on chromosome XII (Fig. 2). ated sequences contained 24 bp, while 85 palindromes In order to find out the position of palindromic se- longer than 24 bp were found in the yeast genome. quences with respect to annotated coding regions, we Similar results were obtained when embedded palin- determined the proportion of palindromes that were dromes were included in the palindrome count, but here placed in intergenic (non-coding) regions. The sequence

76 | Palindrome content of the yeast Saccharomyces cerevisiae genome paper 2

293

expected from random distribution between the coding and non-coding part of the genome. A preferential location in intergenic regions was particularly pro- nounced for palindromes longer than 10 bp, so that more than 97% of palindromes longer than 18 bp were identified outside of the coding regions (Fig. 3).

Palindromic dinucleotides in the yeast genome

Four dinucleotides, AT, TA, GC and CG, may be considered as the shortest palindromic sequences, but they are also found in the center of any longer palin- drome. Our analysis indicated that the short palin- dromes, including palindromic dinucleotides, were less frequent in the yeast genome than in the random ge- nomes (Table 1) and we decided to determine relative frequencies for all 16 dinucleotides (Fig. 4). Indeed, the two most under-represented dinucleotides were palin- dromic dinucleotides containing purine on the 3¢ end, TA (0.77) and CG (0.80), while AT (0.94) was moder- ately under-represented and GC (1.02) was only slightly over-represented. For non-palindromic dinucleotides, very close values were observed for complementary di- Fig. 2 Sequences and chromosomal locations of the six longest nucleotides and only the AC and GT couple was under- palindromes found in the yeast genome. Numbers indicate the start represented (0.89). The ratio between non-palindromic and end locations for ORFs and palindromes and palindromic dinucleotides composed either of A-T or G-C base pairs illustrates well the avoidance of pal- containing only the yeast coding regions represented indromic dinucleotides in the yeast genome. TT+AA 72.73% of the total yeast genome (see Materials and dinucleotides are 1.33 times more frequent than methods), but even the short palindromes were prefer- TA+AT dinucleotides, while CC+GG dinucleotides entially (30–35%) placed in intergenic regions. Only the are 1.17 times more frequent than CG+GC dinucleo- proportion of palindromic dinucleotides found in the tides (Table 2). The same analysis was also done for intergenic regions (27.65%) was very close to that dinucleotides present in all palindromes longer than

Fig. 3 Analysis of the A+T content and location of palindromes with respect to the non-coding regions in the yeast genome. Palindromes were scored as presented in Fig. 1d (overlapping palindromes). Error bars indicate one standard deviation

Palindrome content of the yeast Saccharomyces cerevisiae genome | 77 paper 2

294

Fig. 4 Occurrence of dinucleotides in the yeast genome. For each dinucleotide, the ratio between the number determined in the yeast genome (observed) and the average number obtained in ten random genomes (expected) is presented

14 bp and the results obtained were quite different from involved in various cellular processes. In the present those observed with the entire yeast genome, indicating work, we focused on palindromes, defined as perfect IRs that the long palindromes are specifically enriched in without spacer DNA. In order to make possible a sys- palindromic dinucleotides. However, the bias was much tematic study of these important DNA motifs in differ- more pronounced for the AT and TA dinucleotides, ent genomes, we first developed a computer program which were 5.6 times more frequent than the TT+AA that can identify, locate and count palindromes in a gi- dinucleotides (Table 2). ven nucleotide sequence in a strictly defined way. This The data presented in Fig. 4 also indicated a strong program was then used to prepare a catalog of all pal- bias in the frequency of palindromic dinucleotides indromic sequences present in the S. cerevisiae genome composed of identical base pairs, so that the AT/TA and in randomly generated equivalents of the yeast ratio was 1.22 and the GC/CG ratio was 1.28. This genome. Comparison between actual and random ge- analysis was also done for the coding complement of the nomes revealed several important differences that are yeast genome and for palindromes longer than 14 bp discussed below. (Table 2). The AT/TA ratio rose to 1.31 in coding DNA, while in long palindromes it was 1.036, indicating that the two dinucleotides are almost equally frequent. This Identification of palindromic sequences can be explained by the presence of long runs of alter- nating A and T nucleotides frequently found in the long As discussed before, palindrome scoring can be done in palindromes (Fig. 2; data not shown). In contrast, the different ways and we decided to consider both sepa- GC/CG ratio in the coding complement was very close rated (non-overlapping) and overlapping palindromes, to that observed for the entire genome and in the long together with all smaller palindromes contained within palindromes it even increased to 1.33. longer palindromes, as individual palindromes. Such an approach ensures that no palindromic motif that may be hidden within a longer palindrome can be omitted; and Discussion this is very important for the detection of specific pal- indromic sites, like restriction enzyme sites or regulatory Palindromes, quasipalindromes and closely spaced IRs units. In addition to the embedded palindrome count, can be found in the genomes of all organisms and are we also used a simplified way of palindrome scoring where partial overlapping of individual palindromes is Table 2 Relative frequencies of palindromic dinucleotides in the allowed, but all smaller palindromes completely con- whole genome, coding DNA and palindromes longer than 14 bp. tained within longer palindromes are ignored. This way ND Not determined of palindrome scoring (overlapped palindromes) indi- cates individual palindromic loci within a given sequence Ratio Whole Coding Palindromes genome DNA >14 bp and each palindrome is counted only once. In this way, a more realistic picture of the numbers and distribution of (AA+TT)/(AT+TA) 1.328 ND 0.178 palindromic loci is obtained, especially when we con- (GG+CC)/(GC+CG) 1.166 ND 0.800 sider the long runs of alternating complementary dinu- AT/TA 1.220 1.308 1.036 cleotides frequently present in a genome. For example, GC/CG 1.277 1.291 1.330 ten AT dinucleotides will be scored as a single 20 nt

78 | Palindrome content of the yeast Saccharomyces cerevisiae genome paper 2

295 palindrome, while in the same sequence 100 embedded formed, the occurrence of all palindromic sequences may palindromes can be found. For these reasons, we deci- also be expected to be less frequent. Palindromic dinu- ded to use both embedded and overlapping palindrome cleotides composed of A-T base pairs were particularly counts in the analysis of palindrome content (Fig. 1c,d). disfavored, consistent with our finding that the A+T As illustrated by our analysis of the yeast genome, this content of short palindromes present in the yeast gen- distinction is important. For example, both 10-nt and ome is decreased in comparison with the random se- 12-nt palindromes are under-represented when counted quence. as overlapped palindromes, but are over-represented The observed differences in the occurrence of dinu- when scored as embedded palindromes (Table 1). Pal- cleotides are not specific for the yeast genome. Early indromic dinucleotides were also analyzed for two rea- sequence analyses indicated strong biases in dinucleotide sons. First, there is no clear reason to include frequencies in prokaryotes, eukaryotes, mitochondrial tetranucleotides and to exclude dinucleotides from the and viral genomes (Nussinov 1984). Karlin et al. (1997) palindrome count and, second, each palindrome con- also found that TA and CG are among the most under- tains one of the four palindromic dinucleotides in the represented dinucleotides in different microbial ge- center and the analysis of their incidence could be useful nomes; and they extensively discussed the mechanisms for the general study of palindromic sequences. This was that could create a genomic signature. For example, one demonstrated by our analysis of the yeast genome, possible explanation for the observed biases in the where the palindromic dinucleotides are particularly dinucleotide frequencies could reflect the yeast codon under-represented compared with non-palindromic di- bias (Sharp and Cowe 1991), since a high proportion of nucleotides with the same base pair composition. In the the yeast genome consists of protein-coding sequences. C. elegans genome, the IRs separated by one nucleotide However, our data indicated that 72.4% of all palin- are more frequent than the repeats without a spacer dromic dinucleotides were located within coding regions (LeBlanc et al. 2000). One possible explanation for this that represent 72.7% of the entire genome. This result interesting observation could be that the palindromic suggests that the high avoidance of palindromic dinu- dinucleotides are also avoided in the C. elegans genome, cleotides in the yeast genome (Table 1) is not limited but this type of analysis has not been performed. only to the coding DNA. We also observed that the dinucleotide GC-CG ratio was 1.28 for the entire gen- ome and 1.29 for the protein-coding part of the genome, Palindromes in the yeast genome while in palindromes longer than 14 bp it was even in- creased to 1.33. These results indicate that the observed The palindrome content of the yeast genome was sorted dinucleotide bias is not a consequence of the bias in by size and compared with the palindrome content of the codon usage but could rather reflect the intrinsic bias in randomly generated equivalents of the genome. Further spontaneous mutagenesis or a bias in replication/repair, analysis indicated several interesting numerical trends as proposed by Karlin et al. (1997). For example, the that allowed us to make a clear distinction between fact that the TT dinucleotide is 1.47 times more frequent ‘‘short’’ and ‘‘long’’ palindromes, with the breaking than the TA dinucleotide in the yeast genome could be point at 10–12 bp. While positive selection could ac- due to the biased incorporation of non-complementary count for the relative abundance of long palindromes, nucleotides during DNA replication and/or repair syn- under-representation of short palindromes seems more thesis, to the bias in the mismatch repair process, or intriguing. The avoidance of short palindromes was al- because the probability for TA dinucleotides to mutate ready observed in viral and bacterial genomes and was is slightly higher than that for TT dinucleotides. Such attributed either to the activity of the restriction enzymes biases could be beyond the level of detection in any present in the cell, or introduced by horizontal gene biochemical assay, but could leave an imprint in the transfer (Sharp 1986; Gelfand and Koonin 1997; Rocha genomic DNA sequences during evolutionary time, like et al. 2001). Obviously, such an interpretation could not the genome-wide under-representation of short palin- explain the paucity of short palindromes in the yeast dromes described here. Moreover, it is possible that a genome, where no restriction/modification system has similar mutagenic bias, rather than the activity of been detected. Since palindromic dinucleotides were also restriction enzymes, resulted in the short palindrome/ under-represented in the yeast genome, we decided to restriction site avoidance observed in bacterial genomes. perform a systematic analysis of the dinucleotide content A systematic study of their palindrome contents, like the of the yeast genome. This analysis indicated a different one described here, may contribute to a better under- bias for each palindromic dinucleotide in the following standing of the role of these two processes in the evo- order: TA > CG > AT > GC; and only the GC lution of bacterial genomes. dinucleotide was slightly over-represented. The strong A high preference for non-coding regions was ob- bias observed for palindromic dinucleotides can also served for long palindromes, but even the short (4– explain the under-representation of other short palin- 10 bp) palindromes showed size-dependent avoidance of dromes, since each palindrome contains a palindromic the protein-coding regions. Therefore, although biased dinucleotide as a center of symmetry. If, for some rea- dinucleotide composition could be the main reason for son, such centers of symmetry are less frequently the decreased frequency of short palindromes at the level

Palindrome content of the yeast Saccharomyces cerevisiae genome | 79 paper 2

296

of the whole genome, there could be some other mech- Different mechanisms may regulate palindrome anism(s) regulating their incidence in the coding regions. incidence Avoidance of the 6-bp palindromes was also observed in the genes of several bacterial species, including Buchnera The data presented here, together with the results of sp. and Wigglesworthia sp., which apparently have no other studies, may suggest an integrated view on the RMSs (Fuglsang 2004). It is tempting to speculate that evolutionary dynamics of palindromic sequences in the presence of palindromes in the genes is avoided be- yeast (Fig. 5). The occurrence of short palindromes in cause they could affect in some way the metabolism of the entire genome could be disfavored due to a slight mRNAs. It should be noted that, in addition to the bias in mutagenic DNA replication and/or the increased formation of intramolecular hairpins, palindromic se- probability for palindromic dinucleotides TA, CG and quences can also promote associations between two AT to mutate, while additional mechanism(s) could identical mRNA molecules in opposite orientations. The contribute to palindrome avoidance within coding re- RNA/RNA duplex created between two palindromic gions. AT-rich palindromes that acquire the critical size sequences may interfere with subsequent step(s) in pro- of 10–12 nt may become unstable and increase their size, tein synthesis. For example, longer palindromes present mainly due to the insertion of AT or TA dinucleotides in mRNAs may produce an effect similar to that of the by slippage during DNA replication (Kruglyak et al. interfering RNA molecules (Novina and Sharp 2004), 1988; Toth et al. 2000). Since insertions in the coding stimulating mRNA degradation. regions lead to gene inactivation, they are tolerated only Our data clearly indicate that the number of large in the intergenic regions. Long, AT-rich palindromes palindromes present in the yeast genome is much higher could acquire novel functions, but could also present a than expected from the analysis of random sequences. starting point for the generation of other palindromic Starting from 10–12 bp, palindromes tend to be not only sequences, imperfect palindromes and IRs, enlarging the more frequent than expected, but also more AT-rich and repertoire of possible cis-acting genetic elements. The preferentially located in the non-coding regions. Palin- upper size of a palindrome may be regulated by the dromes longer than 18 bp are usually built mainly of potential of palindromic sequences to form cruciform A-T base pairs and are almost exclusively located in the structures in vivo. As mentioned before, such structures intergenic regions, comprising only 27.7% of the entire can be processed to DSBs (Lobachev et al. 2002) and are genome. The longest palindrome detected in the yeast known to induce different types of recombination events genome (12.1 Mb) has 44 bp and contains only two C-G (Gordenin et al. 1993; Leach 1994). The longest palin- base pairs. An even more pronounced bias for long, AT- drome found in the yeast genome contains 44 bp, but rich palindromes was observed in C. elegans chromo- the critical size of a palindrome that could act as an somes III and X (LeBlanc et al. 2000). Four 60-bp initiator of recombination still needs to be experimen- palindromes were found on chromosome III (11 Mb) tally determined. and 17 on chromosome X (16 Mb) and they were built exclusively of A-T base pairs. Over-representation of Concluding remarks large palindromes is consistent with their presumed role in different cellular processes, like the regulation of gene The new research tool for palindrome analysis described expression or initiation of chromosomal replication; and here may contribute to a deeper insight into the evolu- the high A+T content may facilitate local DNA melting tion and functions of these important DNA motifs. Our and adoption of secondary structures. Systematic dele- analysis of the first complete catalog of palindromic tions of the longest palindromes detected in the yeast sequences present in the genome of a cellular organism genome, followed by phenotype analysis of the corre- revealed several interesting findings. For example, we sponding strains, could shed more light on their possible show that the under-representation of short palindromes biological functions. is not limited to the bacterial genomes but can also be observed in yeast, while size-dependent palindrome avoidance in the coding regions seems particularly intriguing. We also point out the relevance of dinucle- Fig. 5 Evolutionary dynamics of palindromic sequences in yeast. otide bias analysis for the study of palindromes and we The solid vertical line indicates the size limit between short and long palindromes as determined in this work. The critical size leading to believe our computer program will also be useful for palindrome loss/destruction (dashed vertical line) needs to be more specific tasks, like the identification of potential experimentally determined restriction sites or regulatory elements.

80 | Palindrome content of the yeast Saccharomyces cerevisiae genome paper 2

297

Acknowledgements We are grateful to Ana Vukelic´ for help in LeBlanc MD, Aspeslagh G, Buggia NP, Dyer BD (2000) An statistical analysis. This work was supported by grant 0058014 annotated catalog of inverted repeats of Caenorhabditis elegans from the Croatian Ministry of Science, Education and Sports. chromosomes III and X, with observations concerning odd/ even biases and conserved motifs. Genome Res 10:1381–1392 Lobachev KS, Gordenin DA, Resnick MA (2002) The Mre11 complex is required for repair of hairpin-capped double-strand References breaks and prevention of chromosome rearrangements. Cell 103:83–193 Bissler JJ (1998) DNA inverted repeats and human disease. Front Mori K, Ogawa N, Kawahara T, Yanagi H, Yura T (1998) Pal- Biosci 3:408–418 indrome with spacer of one nucleotide is characteristic of the Chen Y, Gill GN (1996) A heteromeric nuclear protein complex cis-acting unfolded protein response element in Saccharomyces binds two palindromic sequences in the proximal enhancer of cerevisiae. J Biol Chem 273:9912–9929 the human erbB-2 gene. J Biol Chem 271:5183–5188 Novina CD, Sharp PA (2004) The RNAi revolution. Nature Cuomo AC, Mundy CL, Oettinger MA (1996) DNA sequence and 430:161–164 structure requirements for cleavage of V(D)J recombination Nussinov R (1984) Doublet frequencies in evolutionary distinct signal sequences. Mol Cell Biol 16:5683–5690 groups. Nucleic Acids Res 12:1749–1763 Fuglsang A (2004) The relationship between palindrome avoidance Repping S, Skaletsky H, Lange J, Silber S, Veen F van der, Oates and intergenic codon usage variations: a Monte Carlo study. RD, Page DC, Rozen S (2002) Recombination between palin- Biochem Biophys Res Commun 316:755–762 dromes P5 and P1 on the human causes massive Gelfand MS, Koonin EV (1997) Avoidance of palindromic words deletions and spermatogenic failure. Am J Hum Genet 71:906– in bacterial and archaeal genomes: a close connection with 922 restriction enzymes. Nucleic Acids Res 25:2430–2439 Rice P, Longden I, Bleasby A (2000) EMBOSS—the european Gordenin DA, Resnick MA (1998) Yeast ARMs (DNA at-risk molecular biology open software suite. Trends Genet 15:276– motifs) can reveal sources of genome instability. Mutat Res 278 400:45–58 Rocha EPC, Danchin A, Viari A (2001) Evolutionary role of Gordenin DA, Lobachev KS, Degtyareva NP, Malkova AL, Per- restriction/modification systems as revealed by comparative kins E, Resnick MA (1993) Inverted DNA repeats: a source of genome analysis. Genome Res 11:946–958 eukaryotic genomic instability. Mol Cell Biol 13:5315–5322 Sharp PM (1986) Molecular evolution of bacteriophages: evidence Karlin S, Mrazek J, Campbell AM (1997) Compositional biases of of selection against the recognition sites of host restriction en- bacterial genomes and evolutionary implications. J Bacteriol zymes. Mol Biol Evol 3:75–83 179:1363–1370 Sharp PM, Cowe E (1991) Synonymous codon usage in Saccha- Kruglyak S, Durret RT, Schug MD, Aquadro CE (1998) Equilib- romyces cerevisiae. Yeast 7:657–678 rium distribution of microsatellite repeat length resulting from a Tashiro J, Kinoshita K, Honjo T (2001) Palindromic but not G- balance between slippage events and point mutations. Proc Natl rich sequences are targets of class switch recombination. Int Acad Sci USA 95:10774–10778 Immunol 13:495–505 Kurtz S, Schleiermacher C (1999) REPuter: fast computation of Toth G, Gaspari Z, Jurka J (2000) Microsatellites in different maximal repeats in complete genomes. Bioinformatics 15:426– eukaryotic genomes: survey and analysis. Genome Res 10:967– 427 981 Leach DRF (1994) Long DNA palindromes, cruciform structures, genetic instability and secundary structure repair. Bioessays 16:893–898

Palindrome content of the yeast Saccharomyces cerevisiae genome | 81 paper 3 PERIODICUM BIOLOGORUM UDC 57:61 VOL. 104, No 4, 421-424, 2002 CODEN PDBIAD ISSN 0031-5362

Original scientific paper

A 110 bp palindrome stimulates plasmid integration in yeast

IVAN-KREŠIMIR SVETEC Abstract BERISLAV LISNIĆ ZORAN ZGAGA Background and Purpose: Perfect inverted repeats, or palin- dromes, are known as potent inducers of genetic recombination. In Laboratory of Biology and Microbial our previous work we investigated the influence of a 110 bp palindro- Genetics mic insertion present on a non-replicative plasmid on integration into Faculty of Food Technology and the yeast genome (19). The palindrome influenced the final outcome of Biotechnology recombination process, but no stimulation or targeting of plasmid in- University of Zagreb tegration was observed. In this study we investigated whether plasmid Pierottijeva 6 integration would be stimulated when the palindrome was present in P.O.B. 625 both the chromosomal and the plasmid copy of the CYC1 gene. HR-10 001 Zagreb Croatia Materials and Methods: Yeast strains and plasmids containing palindromic sequence in the CYC1 gene were constructed using con- Corrrespondence: ventional methods. Strain construction was followed by Southern blot Ivan-Krešimir Svetec analysis. Laboratory of Biology and Microbial Genetics Results: When the palindromic sequence was present in both the Faculty of Food Technology and chromosomal and the plasmid copy of the CYC1 gene, the relative ef- Biotechnology ficiency of transformation was increased more than four-fold. During University of Zagreb intrachromosomal (»pop-out«) recombination, the palindrome present Pierottijeva 6 in only one copy of the CYC1 gene was lost with very high frequency. P.O.B. 625 HR-10 001 Zagreb Conclusions: Our study may encourage the use of inverted repeats Croatia present in the genomes of different organisms for development of pal- indrome-based integrative vectors. Key Words: palindrome, recombination, plasmid integration, yeast

Abbreviation: INTRODUCTION DSB - double strand break enomic DNA may contain significant proportion of repeated Gsequences that may be dispersed throughout the genome or or- ganized as direct repeats. Inverted repeats present a special class of repeated sequences due to their ability to form secondary structures known as hairpins or cruciforms. Palindromes are inverted repeats without spacer DNA and are frequently found in different cis-acting genetic elements like promoters, operators or origins of replication. However, in spite of their obvious biological importance, such motifs are unstable in different organisms, from bacteria to mammalian cells. Two types of models have been proposed to explain the instability of palindromes. The first were based on the existence of specific endo- nucleases that transform cruciforms into double-strand breaks (DSBs), while other propose that the recombinogenic structure is formed only when the progression of DNA replication is blocked by the formation of Received September 26, 2002. a secondary structure (1, 6, 8).

421

82 | A 110 bp palindrome stimulates plasmid integration in yeast paper 3 I.K. Svetec et al. Palindrome-induced plasmid integration

In the yeast S. cerevisiae, recombinogenic potential from the EcoR I site. Standard procedures were used for of inverted repeats and palindromes has been demon- all DNA manipulations (77) and the yeast strain used was strated for both mitochondrial and chromosomal DNA. FF18-52 (MATa, leu2-3,112, ura3-52, ade5, trpl-289, can1). In mitochondria, inverted repeats are responsible for the The strains containing desired insertions in their CYC1 re- formation of amphimeric genomes in petite mutants (16), gions were constructed as follows. First, the spheroplasts and mitochondrial endonucleases that cut the cruciform were transformed with the plasmid DNA that was cut with structures in vitro have been isolated in S. cerevisiae (3) Xho I in order to target integrations to the CYC1 region and and Schizosaccharomyces pombe (13). Palindromes in the structure of the CYC1 locus was analyzed by South- chromosomal DNA are recognized by the meiosis-spe- ern blotting (see below). Two Ura+ transformants (one for cific endonuclease Spo11 (12), but shorter palindromes each plasmid) were further grown to the stationary phase were shown to be refractory to the mismatch repair during in 4 ml of the minimal supplemented medium without ura- meiotic recombination (77). In mitotic cells, stimulation of cil. Aliquots were plated on the agar plates containing 0.75 recombination induced by inverted repeats was directly mg/ml of 5-fluoroorotic acid for selection of the cells that related to the size of repeats and inversely related to the had lost the URA3 gene due to »pop-out« recombination size of intervening spacers, so that an inverted dimer of event between directly repeated CYC1 regions (2). These the URA3 gene (1,0 kb) stimulated recombination in the were further analyzed by Southern blotting in order to de- adjacent region up to 17,000-fold (70). Recombination was tect the colonies that contained a single copy of the CYC1 also stimulated by the pol3-t mutation in the gene encod- region carrying the insertion formerly present on the plas- ing the DNA polymerase d, suggesting the involvement mid. of DNA replication in the formation of a recombinogenic structure. Recently, Lobachev et al. (9) demonstrated the Yeast transformation and analysis of transformants formation of DSBs at hairpin structures formed in vivo be- tween inverted repeats, but the responsible nuclease has Yeast transformation by a modified spheroplast proce- not been identified. dure and Southern-blot analysis of transformants were de- scribed previously (4, 7). For strain construction, genomic In our previous work we studied the influence of a 110 DNA of Ura- colonies was digested with the restriction en- bp palindromic insertion on plasmid integration in the zyme Sac I that cuts the plasmids pAB218-7 and 218-8 only homologous region present in the yeast genome (19). We within insertion in the CYC1 region. Relative efficiency of found that the palindromic insertion did not stimulate plas- transformation was determined by comparing the efficien- mid integration, but influenced some subsequent step(s) cies of transformation obtained with circular plasmid and of the recombination process. In this study we decided to the plasmid cut with Xho I in the CYC1 region. In each ex- introduce the palindromic insertion in the chromosomal periment the two samples were transformed with gapped copy of the CYC1 gene to see whether the efficiency of plasmid (0.4 µg) and ten samples with circular DNA (2 µg) transformation would be increased when the palindromic and the efficiency of transformation was expressed as the sequence was present both on the plasmid molecule and number of transformants per µg, of DNA. in the homologous region in the yeast genome.

RESULTS AND DISCUSSION MATERIALS AND METHODS Strain construction Plasmid and strain construction The lack of stimulation of plasmid integration in the pres- Construction of the plasmids containing 102 bp palin- ence of a palindromic insertion, observed in our previous dromic (pAB218-5) or non-palindromic (pAB216) inser- study, could be due to different reasons. For example, if tions in the EcoR I site of the 1685 bp yeast CYC1 region has the creation of a recombinogenic structure depends on been described (79). These two insertions were partially DNA replication (5, 8), a palindrome present on a non-rep- homologous, since the monomer used to create inverted licative plasmid would not stimulate plasmid integration. repeats was also present in non-palindromic insertion, Alternatively, if the palindromic insertion was recombino- and the overall size of the palindromic sequence (insert genic even in the absence of DNA replication, recombina- together with flanking regions) was 110 bp (79). These two tion could be blocked by the lack of homology to the pal- plasmids also contained two additional mutations in that re- indrome in the genomic copy of the CYC1 region. In order gion created by inactivation of the restriction enzyme sites to overcome possible barriers for palindrome-stimulated Kpn I and Sph I. In this work we used the plasmids pAB218- transformation, we decided to transfer the palindromic 7 (palindromic insertion) and pAB218-8 (non-palindromic and non-palindromic insertions (used as a control) from insertion) that were the same as the plasmids pAB218-5 the plasmids pAB218-7 and pAB218-8 in the CYC1 region and pAB218-6 but did not contain these two mutations. The present in the yeast genome. We first transformed the CYC1 region, also contained two sites for the restriction yeast strain FF18-52 with the plasmid DNA cut with Xho enzyme Xho I, so that the digestion of the plasmid DNA I and a sample of transformants was further analyzed by with this enzyme created a 434 bp gap 256 bp upstream Southern blotting. For both plasmids we observed that the

422 Period biol, Vol 104, No 4, 2002.

A 110 bp palindrome stimulates plasmid integration in yeast | 83 paper 3 I.K. Svetec et al. Palindrome-induced plasmid integration

heterologous insertion present on the plasmid molecule interpretation since palindromic sequences have been was lost in the majority of transformants (Figure 1). This is shown to be refractory to the mismatch repair, at least in in accordance with current models for homologous recom- meiotic cells (11). bination predicting that the unbroken DNA molecule acts as a donor of genetic information (15). A single colony from Yeast transformation each strain containing the insertion in the left copy of the CYC1 region (as presented in Figure 1) was further used In order to find out whether the palindromic sequence to isolate spontaneous Ura+ colonies that were analyzed would promote integrative transformation we compared by Southern blotting. This analysis revealed a striking dif- experimental systems in which both the chromosomal ference between the strains containing palindromic and and the plasmid copy of the CYC1 region contained either non-palindromic insertions in the duplicated copy of the palindromic or non-palindromic insertions. The insertions CYC1 region. In the strain containing the non-palindromic were of the same size and placed in the same location with- insertion 18/60 auxothrophs retained the insertion, while in the CYC1 gene. Transformation was performed with cir- only 1/80 was detected in the strain that contained the cular plasmids and also with the plasmids linearized with palindromic insertion in the CYC1 region (Figure 2). Pref- restriction endonuclease Xho I. For each experiment, the erential loss of the palindromic sequence was previously relative efficiency of transformation was determined as the detected during integration of the circular plasmid (19) ratio between transformation efficiencies obtained with and could be explained in two ways. One possibility is circular and linearized molecules, in order to compare that the hairpin structure formed in the DNA heteroduplex the results obtained with different strains and different was preferentially excised and another is that the recom- plasmid preparations. In all experiments performed the bination between two copies of the CYC1 gene was in fact relative efficiency of transformation was higher when the initiated by a DSB created in, or close to the palindrome. palindromic sequences were present in the CYC1 region In this case, heterologous termini containing palindromic (Figure 3). Higher fluctuations were observed with palin- sequence should be degraded prior to the completion dromic sequences (2.26 ± 1.13 x 102) than with non-palin- of the recombination process (18). We prefer the latter dromic sequences (0.47 ± 0.22 x 102). This could be due, for example, to different degrees of super-coiling present in different plasmid preparations used for transformation, since negative supercoiling was shown to stimulate cru- ciform extrusion in vivo (8). The average increase in the relative efficiency of transformation due to the presence of the palindrome was 4.8 fold. This effect is rather modest in relation to the DSB-stimulated plasmid integration (14) and is comparable to the single base-pair mismatch-stimulat- ed plasmid integration (20). However, based on the work of Lobachev et al. (10), we could expect highly increased stimulation of recombination with increasing size of a palindrome. The palindromes longer than approximately 150 bp, that cannot be propagated in wild-type Escheri- chia coli, can be stably maintained in sbcCD mutants (8) and used for the study of integrative transformation

FIGURE 1. Targeted plasmid integrations in the CYC1 region. )

Plasmid DNA was treated with the restriction endonuclease Xho 2 I before transformation. Different types of integration events were revealed by Southern blot analysis of transformants. relative efficiency of transformation (x 10 relative A B

FIGURE 3. The influence of the palindrome on the transforma- tion efficiency. Both plasmid and chromosomal copy of theCYC1 FIGURE 2. Recombination-associated loss of the insert. The gene contained either palindromic (A) or non-palindromic (B) presence of the insert in the CYC1 region among uracil auxo- insertions. For each strain, the relative transformation efficien- trophs was revealed by Southern blot analysis. Symbols are as cies obtained in five independent experiments and the average in Figure 1. values are indicated.

423 Period biol, Vol 104, No 4, 2002. 84 | A 110 bp palindrome stimulates plasmid integration in yeast paper 3 I.K. Svetec et al. Palindrome-induced plasmid integration

in organisms other than yeast, where DSBs fail to stimu- 10. LOBACHEV K S, SHOR B M, TRAN H T, TAYLOR W, late and target plasmid integration. KEEN J D, RESNICK M A, GORDENIN D A 1998 Fac- tors affecting inverted repeat stimulation of recombi- Acknowledgements: This work was supported by the nation and deletion in Saccharomyces cerevisiae. Ge- grants from the Ministry of Science and Technology of the netics 745:1507-1524 Republic of Croatia Nos. 0058405 and 0058014. 11. NAG D K, WHITE M A, PETES T D 1989 Palindromic sequences in heteroduplex DNA inhibit mismatch re- REFERENCES pair in yeast. Nature 340: 318-320 1. BISSLER J J 1998 DNA inverted repeats and human dis- 12. NASAR F, JANOWSKI C, NAG D K 2000 Long palin- ease. Frontiers in Bioscience 3:408-418 dromic sequences induce double-strand breaks dur- 2. BOEKE J, LACRUTE F, FINK G R 1984 A positive selec- ing meiosis in yeast. Mol Cell Biol 20: 3449-3458 tion for mutants lacking orotidine-5’-phosphate decar- 13. ORAM M, KEELEY M, TSANEVA A 1998 Holliday boxylase activity in yeast: 5-fluoro-orotic acid resistance. junction resolvase in Schizosaccharomyces pombe Mol Gen Genet 197: 345-346 has identical endonuclease activity to the CCE1 3. EZEKIEL U R, ZASSENHAUS H P 1993 Localization of homologue YDC2. Nucl Acids Res 26: 591-601 a cruciform cutting endonuclease to yeast mitochon- 14. ORR-WEAVER T L, SZOSTAK J W, ROTHSTEIN R J 1981 dria. Mol Gen Genet 240: 414-418 Yeast transformation: a model system for the study of 4. GJURAČIĆ K, ZGAGA Z 1996 Illegitimate integration recombination. Proc Natl Acad Sci USA 78: 6354-6358 of single-stranded DNA in Saccharomyces cerevisiae. 15. PAQUES F, HABER J E 1999 Multiple pathways of re- Mol Gen Genet 253: 173-181 combination induced by double-strand breaks in Sac- 5. GORDENIN D A, LOBACHEV K S, DEGTYAREVA N P, charomyces cerevisiae. Microbiol Mol Biol Rev 63: MALKOVA A L, PERKINS E 1993 Inverted DNA re- 349-404 peats: a source of eukaryotic genomic instability. Mol 16. RAYKO E, GOURSOT R 1999 Amphimeric mitochond- Cell Biol 13: 5315-5322 rial genomes of petite mutants of yeast. III. Genera- 6. GORDENIN D A, RESNICK M A 1998 Yeast ARMs tion by linking two secondary-structure-dependent il- (DNA at-risk motifs) can reveal sources of genome in- legitimate recombination events. Curr Genet 35: 14- stability. Mutat Res 400: 45-58 22 7. KOREN P, SVETEC I K, MITRIKESKI P T, ZGAGA Z 2000 17. SAMBROOK J, FRITCH E F, MANIATIS T 1989 Mo- nd Influence of homology size and polymorphism on plas- lecular cloning: a laboratory manual, 2 ed., Cold mid integration in the yeast CYC1 DNA region. Curr Spring Harbor Laboratory Press, Cold Spring Harbor, Genet 37: 292-297 New York 8. LEACH D R F 1994 Long DNA palindromes, cruciform 18. SVETEC I K 2000 The influence of short heterologous structures, genetic instability and secondary structure sequences on plasmid integration in the yeast Sac- repair. BioEssays 16: 893-898 charomyces cerevisiae genome. Master’s Thesis, 9. LOBACHEV K S, GORDENIN D A, RESNICK M A University of Zagreb 2002 The Mre11 complex is required for repair of 19. SVETEC I, STJEPANDIĆ D, BORIĆ V, MITRIKESKI P hairpin-capped double-strand breaks and preven- T, ZGAGA Z 2001 The influence of a palindromic in- tion of chromosome rearrangements. Cell 103: 183-193 sertion on plasmid integration in yeast. Food Technol Biotechnol 39: 169-173 20. ZGAGA Z, CHANET R, RADMAN M, FABRE F 1991 Mismatch-stimulated plasmid integration in yeast. Curr Genet 19: 329-332

424 Period biol, Vol 104, No 4, 2002. A 110 bp palindrome stimulates plasmid integration in yeast | 85