Birth and death of genes linked to chromosomal inversion

Yoshikazu Furutaa,b,1, Mikihiko Kawaia,b,c,1, Koji Yaharad,e, Noriko Takahashia,b, Naofumi Handaa,b, Takeshi Tsurub,f, Kenshiro Oshimag, Masaru Yoshidah, Takeshi Azumah, Masahira Hattorig, Ikuo Uchiyamac, and Ichizo Kobayashia,b,f,2

aDepartment of Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Minato-ku, Tokyo 108-8639, Japan; bInstitute of Medical Science, University of Tokyo, Minato-ku, Tokyo 108-8639, Japan; cLaboratory of Genome Informatics, National Institute for Basic Biology, National Institutes of Natural Sciences, Okazaki, Aichi 444-8585, Japan; dGraduate School of Medicine, Kurume University, Kurume, Fukuoka 830-0011, Japan; eFujitsu Kyushu Systems Ltd., Fukuoka, Fukuoka 814-8589, Japan; fDepartment of Biophysics and Biochemistry, Graduate School of Science, University of Tokyo, Minato-ku, Tokyo 108-8639, Japan; gDepartment of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba 277-8561, Japan; and hDepartment of Gastroenterology, Graduate School of Medicine, Kobe University, Chuou-ku, Kobe, Hyogo 650-0017, Japan

Edited by Patrick Sung, Yale University, New Haven, CT, and accepted by the Editorial Board December 7, 2010 (received for review August 24, 2010) The birth and death of genes is central to adaptive evolution, yet H. pylori show high plasticity through homologous re- the underlying genome dynamics remain elusive. The availability combination and have experienced geographical differentiation (8). of closely related complete genome sequences helps to follow Detailed comparison of these 10 complete genome sequences changes in gene contents and clarify their relationship to overall led to discovery of a unique mechanism of DNA duplication that genome organization. Helicobacter pylori, bacteria in our stomach, is linked to chromosomal inversion. We inferred the mechanisms are known for their extreme genome plasticity through of all of the large inversions seen and reconstructed genome and recombination and will make a good target for such an anal- synteny evolution through inversion. ysis. In comparing their complete genome sequences, we found that gain and loss of genes (loci) for outer membrane proteins, Results which mediate host interaction, occurred at breakpoints of chro- Linkage of Changes in Number of Paralogous Gene Loci and Chro- mosomal inversions. Sequence comparison there revealed a unique mosomal Inversion. In this work, we compared genome organiza-

mechanism of DNA duplication: DNA duplication associated with tion of 10 H. pylori strains: five Western strains [four Europe- EVOLUTION inversion. In this process, a DNA segment at one chromosomal an strains (26695, HPAG1, G27, and P12) and one West African locus is copied and inserted, in an inverted orientation, into a dis- strain (J99)] and five Eastern strains [four East Asian strains tant locus on the same , while the entire region be- (F16, F30, F32, and F57) and one Amerind strain (Shi470)]. tween these two loci is also inverted. Recognition of this and three H. pylori genes involved in host interaction, such as those more inversion modes, which occur through reciprocal recombina- coding for outer membrane proteins (OMPs), show large diver- tion between long or short sequence similarity or adjacent to a mo- gence between the Western strains and the Eastern strains (9). bile element, allowed reconstruction of synteny evolution through By comparing the complete genome sequences of Eastern strains inversion events in this species. These results will guide the inter- with those of Western strains, we found that differences in the pretation of extensive DNA sequencing results for understanding number of loci for two OMP families, oipA (one locus in the long- and short-term genome evolution in various organisms and West vs. two loci in the East) and hopMN (two loci in the West in cancer cells. vs. one locus in the East), are both linked to a large chromo- somal inversion (Fig. 1A; for details, see Genome Comparison in genome comparison | gene duplication | genome rearrangement | Materials and Methods and SI Materials and Methods). These | replicative inversion changes may have contributed to host adaptation because OMPs in general, and these OMP families in particular, are important hanges in the gene repertoire of a genome are central to in host interaction and pathogenicity (9–11). Cadaptive evolution. For example, microorganisms change host-interacting proteins on their surface, evading the immune DNA Duplication Associated with Inversion. Further comparative response and presenting alternative modes of adhesion, and sequence analysis of the inversion breakpoints led to our proposal human cancer cells show amplification of oncogenes. In addition for a unique process of DNA duplication, which we designated DNA duplication associated with inversion (DDAI), illustrated to horizontal transfer (1), various mechanisms for generation schematically in Fig. 1B. The DDAI concept explains the ob- and loss of genes have been recognized (2). A gene may move to served linkage of the copy number variations to an inversion as a distant locus through a variety of recombination mechanisms. follows: a DNA region in one locus (the blue wavy arrow at a-A in Replication error and illegitimate recombination may generate Fig. 1Bi) generates a copy at a distant locus (C-c) on the same fi a tandem duplication, which may lead to further ampli cation chromosome in a process that also results in the inversion of the through homologous interaction (3). These mechanistic analyses, however, are not usually connected to genome-wide dynamics. Recent innovations in DNA sequencing are providing a com- Author contributions: I.K. designed research; Y.F., M.K., K.Y., N.T., N.H., T.T., K.O., M.Y., plementary approach to the study of genome-wide changes. T.A., M.H., and I.U. performed research; Y.F., M.K., K.Y., T.T., I.U., and I.K. analyzed data; Comparing complete genome sequences between multiple Y.F. and I.K. wrote the paper; N.T. and N.H. contributed to preparation of genomic DNA; K.O. and M.H. contributed to sequencing and assembly; and M.Y. and T.A. provided the closely related organisms may reveal the dynamics of genome strains (for detail, see SI Author Contributions in Detail). evolution in detail (4). We can assess by The authors declare no conflict of interest. measuring the exact number of homologous genes in a genome This article is a PNAS Direct Submission. P.S. is a guest editor invited by the Editorial and trace both subtle and global genome rearrange- Board. ments. In bacteria, this approach has turned out to be especially Freely available online through the PNAS open access option. powerful (5, 6). 1Y.F. and M.K. contributed equally to this work. In this study, we compared 10 genome sequences of Helicobacter 2To whom correspondence should be addressed. E-mail: [email protected]. pylori, a bacterial species that is present in half the human pop- This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. ulation and is responsible for stomach cancer and other diseases (7). 1073/pnas.1012579108/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1012579108 PNAS Early Edition | 1of6 Downloaded by guest on October 1, 2021 chromosomal region between the two loci. Schematic nucleotide sequence alignments around the inversion breakpoints are illus- trated in Fig. 1Bii: sequence B consists of half of A and half of complement of C, whereas b consists of half of a and half of complement of c. One of the two sequence alignments in this particular scheme, A-B-C (complement), shows some overlap (boxed in Fig. 1Bii). The process seems to be always associated with / of a region at least at one break point. The molecular mechanism of DDAI might be similar to replicative inversion mediated by phage Mu and replicative transposons (12, 13). A simple stepwise model for replicative inversion is shown in Fig. 1Biii. An additional deletion/insertion step is required to explain the sequence alignment patterns in Fig. 1Bii in DDAI. The duplication of oipA (= hopH; Fig. 1A) can be fully explained by a DDAI event that duplicates a DNA region encompassing the oipA ORF (blue wavy arrow in Fig. 1C). The alignment A-B-C has an overlap, whereas a-b-c contains a gap. The DDAI event was followed by inversion through homologous recombination between the duplicated regions (Fig. 1Ci, third and fourth lines) in two genomes (inversions B2 and D2; Fig. 2). The loss of one of the hopN/hopM loci (Fig. 1A) is explained by a DDAI event that resulted in a gene fusion as follows: a DNA region covering the 3′ end of the dpnA gene (encoding a DNA methyltransferase; blue wavy arrow at a-A in Fig. 1Di) was duplicated into a site within the hopN gene (at C-c in Fig. 1Di), generating a hopN′-′dpnA fusion gene (at b in Fig. 1Di). This was likely inactive and seems to have acquired a secondary deletion within hopN′. Here the alignment a-b-c shows an overlap, whereas A-B-C contains a gap. The DDAI event was again followed by inversion through homologous recombination

interpretation of large-scale orientation, arrowheads on the DNA schemat- ically indicate global orientation, not 3′ ends. A pair of staggered single- strand breaks is made flanking a region (blue wavy arrows) to be duplicated. A double-strand DNA break is made at the target site. Single-strand ends of the region to be duplicated are ligated to the target. This strand transfer results in inversion between the two sites when the gaps are closed by replication, which may be primed from the unligated termini that remain. (C) Concrete example of gene duplication resulting from an inferred DDAI event. (Ci) Hypothetical steps for oipA (= hopH) duplication (inversion A) followed by inversion through homologous recombination (inversions B2 and D2; the Amerind strain Shi470 has a secondary deletion in the right oipA locus, not illustrated here). (Cii) Sequence alignments of the labeled boxes. A nonhomologous substitution, not accounted for by the above simple model (Biii), is found near the junction when the first and second lines in Ci were compared. (D) A second concrete example of DDAI, which was followed by gene decay. (Di) Hypothetical steps for loss of a hopN gene. A region cov- ering the dpnA C terminus was duplicated and fused to a part of hopN (inversion C2). The fusion gene underwent further deletion by a mechanism independent of this model, followed by further inversion through homol- ogous recombination (inversion E). (Dii) Sequence alignments. Substitution of nonhomologous material (138 bp at box B added, and part of the C terminus of hopN lost, comparing the top line with the hypothetical in- termediate) is not accounted for by the above simple model (Biii). P12, J99, and Shi470 are not illustrated here because P12 and J99 had a different hopM allele at the hopN locus, and Shi470 had lost the gene at the hopN Fig. 1. Gene duplication/decay through the DDAI process. (A) Linkage of locus by simple deletion. In all of the panels, gray and pink shading between oipA gene duplication and hopN gene decay with chromosomal inversion rows indicates similarity at the DNA level. Gray shading indicates syntenic between schematic Western and East Asian genomes. The upper side cor- alignment, whereby two carry the same sequences in the responds to P12 genome and the lower side to one of the two possible same order; pink shading indicates inverted alignment, whereby two chro- ancestral structures of the four Japanese genomes (Fig. 5B). (B) DDAI. (Bi) mosomes carry the same sequences but in inverted orientation. Blue wavy Genomic region in strain 1, represented as a blue wavy arrow, is duplicated arrows represent the region subject to duplication. Straight arrows repre- in inverted orientation in an event associated with a chromosomal inversion sent global orientation of a DNA region to be inverted. Boxed regions are in strain 2. The blue wavy segment originally present is inserted in inverted the breakpoints of the indicated alignments; capital letters indicate one set orientation at the end of the inverted region. (Bii) Conceptual nucleotide containing similar sequences; lowercase letters indicate a second set. In C sequence alignments around the inversion break points (boxed regions in and D, arrowed boxes (with wings) represent ORFs, with homologous seg- Bi). Only one of the two sequence alignment sets shows an overlap. (Biii) ments carrying the same pattern; a pentagon (arrow box with no wing) Hypothetical stepwise model for replicative inversion (modified from ref. indicates a relevant intergenic region, with homologous regions labeled by 12), which may be applicable to DDAI with modification. For convenient the same pattern.

2of6 | www.pnas.org/cgi/doi/10.1073/pnas.1012579108 Furuta et al. Downloaded by guest on October 1, 2021 Using the sequence analysis detailed below, we classified the inversions into four types (Table 1 and Table S1): (i) DDAI, (ii) homologous recombination between two long regions of se- quence similarity in inverted orientation, (iii) recombination involving two short regions of sequence similarity in inverted orientation, and (iv) inversion adjacent to the insertion of a mobile element. More DDAIs. We identified a third example of the DDAI type inversion (inversion J in Fig. 2 and Table 1), this time in a Eu- ropean strain. As illustrated in Fig. S2i, part of a hypothetical gene (HPAG1_1222) was duplicated with inversion (14). An overlap is seen in alignment A-B-C (Fig. S2ii). Occurrence of this inversion was noted earlier (14). A DDAI event may have also occurred in an inversion in- Fig. 2. Inversions in the genome evolution of H. pylori. Large inversions volving a smaller region. A region of approximately 1 kb in the mapped on an H. pylori genome. The outer circle indicates the genome of cag pathogenicity island that includes cagQ was reported to be P12, a European strain, whereas the inner circle indicates its origin and inverted in some Japanese strains (15). Our detailed sequence coordinates. An arc outside the outer circle indicates an inversion in the comparison confirmed a similar inversion in Shi470 and in three East Asian strains and Amerind strain, whereas an inside arc indicates an inversion in European and/or West African strains. A triangle indicates out of the four Japanese genomes and, furthermore, strongly a region generated by a DDAI event: brown, oipA (Fig. 1C); yellow, dpnA suggested that these inversion events likely occurred by two steps (Fig. 1D); orange, a hypothetical gene (Fig. S2). Inversion labels correspond of DDAI with generation of inverted repeat pairs of 87 bp and 82 to those in Fig. 5A. bp (Fig. S3). Homologous recombination. The second mechanism of inversion, homologous recombination (Table 1 and Fig. S4ii), has long between the duplicated regions (Fig. 1Di, third and fourth lines) been recognized in various organisms (16). Notably, most events in one genome (inversion E). (five of six) seem to have taken place between duplicated regions generated in the first place by DDAI (inversions B2, D2, E, K, EVOLUTION Mechanism-Based Classification of Inversions. We then systemati- and the short inversion covering cagQ; Fig. 1 C and D and Figs. cally searched for apparent large (>5 kb; Materials and Methods) S2 and S3). This association can be explained by the dependence chromosomal inversions among the 10 complete genomes to of homologous recombination on near-identity of substrate learn how general DDAI events are. Those inversions were sequences (17). Immediately after the duplication, the regions mapped on the genome (Fig. 2) and on dotplots (Fig. S1). share perfect identity and are prone to engage in homologous

Table 1. Inversions between complete genomes Inverted repeats* † Label Figure [identity]/[length](gene) Effect on gene Inserted element RM linkage

DNA duplication associated with inversion (DDAI) A1C 1007/1035 (oipA) Duplication (oipA) C2 1D 325/330 (dpnA) Decay (hopN); partial duplication (dpnA)+ J S2 244/255 (hypothetical) Partial duplication (hypothetical) Homologous recombination ‡ B1 S6 (Repeat 8 , deleted) TnPZ B2 1C 462/522 (oipA) D2 1C 984/1020 (oipA) E1D 331/331 (dpnA) + ‡ F 1399/1407 (repeat 8 ) K S2 244/255 (hypothetical) Recombination at a short sequence similarity C1 3 8/10 (endonuclease) Splitting (endonuclease) + D1 S7 5/5 (hypothetical, TnPZ) Splitting (hypothetical, TnPZ) Inversion adjacent to a mobile element insertion G S8A IS605 H1§ 4 Type II RM + H2§ S8B RM ? I§ S8C Type II RM + L§ S8D Type I RM + M1§ S8F IS605 + M2§ S8F Splitting (RM specificity subunit) Type I RM + M3§ S8F TnPZ + M4§ S8G IS605

Large (>5 kb) inversions were analyzed (Materials and Methods). For details, see Table S1. RM, restriction–modification genes; TnPZ, conjugative trans- posable element (20). *Sequence present in multiple strains and present as an inverted repeat in at least one strain. † Linkage to RM genes and endonuclease genes not further than five ORFs. ‡ Repeat 8, a repetitive element (19). §Previously reported (14, 18).

Furuta et al. PNAS Early Edition | 3of6 Downloaded by guest on October 1, 2021 recombination, resulting in another inversion event (Fig. S4 i and ii). Nonreciprocal homologous interaction such as gene conver- sion would further tend to homogenize their sequences (Fig. S4iv). Indeed the duplicated oipA alleles in the East Asian strains are very similar to each other in their sequence and are more similar within, than between, genomes (Fig. S5). When sequence divergence reaches a critical point, homologous recombination and, therefore, further inversion will occur less readily (Fig. S4 iii–v). Among other pairs of known inverted repeats in the H. pylori genomes (18, 19), the repeat 8 pair showed inversion [inversions B1 (Shi470) and F (F16 and F32); Table S1]. Repeat 8 seems to have been lost after inversion B1 through insertion of con- jugative transposonable element TnPZ (20) and secondary de- letion events in Shi470 (Fig. S6). Recombination at short sequence similarity. The third type of in- version (Table 1) occurred through reciprocal recombination involving a short (5–10 bp) region of sequence similarity. The example in Fig. 3 (inversion C1) involved a sequence within a gene for a putative endonuclease and a distant intergenic se- quence and resulted in the splitting of this gene into two parts at two very distant loci (Fig. 2). In another example (inversion D1; Fig. 2 and Table 1), re- combination between a 5-bp sequence within a gene on a copy of TnPZ and a 5-bp sequence in a distant gene resulted in splitting of this conjugative transposable element (Fig. S7). Fig. 4. Inversion adjacent to an inserted mobile element (inversion H1). (i) Inversion adjacent to a mobile element. The remaining examples of Insertion of a restriction-modification (RM) system accompanied by dupli- inversion (Table 1) all carried a mobile element around one of cation of an ≈400-bp-long target sequence (black triangle). Recombination the inversion break points. The insert was either IS605 (21), at the right repeat led to inversion. (ii) Sequence alignments. TnPZ, or a restriction–modification system (Fig. 4 and Fig. S8). Inversions H1, H2, I, L, M1, M2, M3, and M4 were noticed earlier (14, 18), but the mechanisms that generated them quence, and IS605 seems to have inserted into this TnPZ copy. remained unknown. Our multiple sequence comparisons led us This seems to have been followed by inversion at one end of the to hypothesize a two-step mechanism of inversion adjacent to an copy of the insertion sequence (IS). The IS end and the inversion inserted mobile element. break point are very close (Fig. S8Aiii). fi The third line of evidence is that the overlap of the end of the The rst line of evidence is our discovery of an intermediate – fi form for inversion H1 with a restriction–modification system insert (either a restriction modi cation system or an IS) and the insertion (Fig. 4i). The restriction–modification system seems to inversion break point was observed for inversion I, L, M1, M2, have inserted into the genome of HPAG1 with an ≈400-bp target and M4 (Fig. S8 C, D, E, and G). duplication, a mode already seen with several groups of re- Reconstruction of Inversion History in H. pylori. On the basis of these striction–modification systems (22, 23). A single copy of the re- findings and interpretations, we reconstructed the evolutionary peat was observed at the empty site in strain G27. A segment history of genome synteny involving large inversion events in adjacent to the restriction–modification system has been inverted H. pylori (Fig. 5). in strain J99. This inversion involved a site approximately 100 bp The phylogenetic tree simply based on the number of in- away from the duplicated copy of the repeat in HPAG1 (Fig. 4i). version events (Fig. 5A) shows the likely order of the events, but The second line of evidence is from inversion G (Fig. S8A). A its shape is different from a tree based on gene sequences (8). TnPZ seems to have inserted with duplication of a target se- One reason likely comes from the assumption that all of the inversion events occur at a constant rate during evolution (24). According to our observations, this assumption cannot be justi- fied at least for this species, because of the variety in the mo- lecular mechanisms of inversion and their mutual relationships. Inversion by homologous recombination would occur more fre- quently than DDAI or inversion by short-homology recombi- nation. Homology generated by a DDAI event would promote later frequent homologous recombination. The last hypothesis is supported by the observation that there are multiple inversions at oipA and hopN′-′dpnA by homologous recombination in the Japanese lineage after the DDAI event in the ancestor (Fig. 5 A and B). The difference between the two trees may become smaller when we learn the frequency of the four types of inversion and assign a unique branch length to each of the four. Fig. 3. Inversion through recombination involving a short sequence simi- Discussion larity (inversion C1). (i) Recombination between an intergenic sequence and an intragenic sequence split an endonuclease gene. (ii) Nucleotide sequence DDAI is similar to the replicative inversion process used by alignment at the inversion break points revealing a 10-bp region of se- specific DNA transposons (12, 25, 26) but differs in two impor- quence similarity. tant aspects: the nature of DNA duplicated and the sequences at

4of6 | www.pnas.org/cgi/doi/10.1073/pnas.1012579108 Furuta et al. Downloaded by guest on October 1, 2021 Fig. 5. Reconstructed genome synteny evolution through inversion events in H. pylori strains. (A) Phylogenetic tree inferred from all of the inversions. Label on the branch indicates an inversion event. Branch length is proportional to the number of inversion events. (B) Representation of inversion history based on the genome map. Triangles indicate regions duplicated through DDAI, that is, regions explained in Fig. 1, Fig. 2, and Fig. S2.

the breakpoints. Only one of the two sequence alignment sets The tight association of inversion and gene births and deaths showed an overlap (Fig. 1 Bii, Cii,andDii and Fig. S2ii), which demonstrated in this work will contribute to our understanding implies deletion/insertion at one of the breakpoints in the frame- of the dynamics of genome evolution in various organisms, both work of the model in Fig. 1Biii. DNA degradation from the initial in germlines and in somatic lines, and in normal and cancerous double-strand break may have resulted in the deletion. The in- cells. Reconstruction of genome synteny evolution must account duction of natural competence by DNA damage recently reported for the variety in inversion mechanisms and the wealth of bi-

(27) could be related to the deletion/insertion. Instead, the two EVOLUTION distinct processes, inversion and deletion/insertion, may have been ological information embedded in the genomes. evolutionarily related. The short nucleotide overlap at an inversion Materials and Methods breakpoint also reminds us of the other replicative models, such as Genome Sequences. Sequence data were obtained from National Center for the microhomology-mediated break-induced replication model Biotechnology Information Genome for the circularized chromosomes of the (28). Although the possibility of secondary rearrangements can following H. pylori strains: European strains, 26695 (NC_000915.1), HPAG1 never be excluded in the genome comparison approach, we prefer (NC_008086.1), G27 (NC_011333.1), P12 (NC_011498.1); West African strain, J99 the DDAI concept of a single event generating duplication and (NC_000921.1); Amerind strain, Shi470 (NC_010698.2); and East Asian strains, inversion at the same time, or at almost the same time, as an at- F16 (AP011940.1), F30 (AP011941.1), F32 (AP011943.1), and F57 (AP011945.1). tractive and productive working hypothesis for further genome comparison and experimental analyses. Genome Comparison. Genome alignments and dotplots were obtained with We do not yet know what catalyzes DDAI reaction. Enzymes ’ Comparative Genome Analysis Tool (31) and used for evolutionary inference (6, proposed to be involved in transposons replicative inversion, 25, 32, 33). Detection of inversion breakpoints and extraction of genome blocks such as transposases, DNA polymerases, DNA ligases, and top- were helped by Mauve (34). Genome blocks smaller than 5 kb were ignored. oisomerases (12), are all available in H. pylori according to ge- History of genome inversion events was reconstructed using the Multiple Ge- nome annotation (19). In addition, the initiating incisions for – fi nome Rearrangement program (35) with manual curation (SI Materials and DDAI may have been catalyzed by restriction modi cation sys- Methods). In Fig. 1A, the other inversions were ignored for clarity. The two tems in place of the transposases in replicative inversion. The schematic structures correspond to that of strain P12 and one of the two following considerations favor this possibility: (i) restriction– fi equally possible hypothetical ancestral structures of the four Japanese strains modi cation genes are abundant in H. pylori genomes (19); (ii) (the intermediate structure labeled with inversions C1 and C2 in Fig. 5B). some restriction–modification enzymes show a nicking activity (29); (iii) some restriction enzymes are similar to transposases in ACKNOWLEDGMENTS. We thank Shigeru Iida for comments on an early structure and function (30); and (iv) some restriction–modification version and Elisabeth Raleigh for expert help in improving the manuscript units are similar to DNA transposons in organization (23). during revision. This work was supported by the Institute for Bioinformatics Research and Development, the Japan Science and Technology Agency; by A related issue is the mechanism of inversion adjacent to mobile “ – fi a grant (to I.K.) from the global COE (Center of Excellence) project Genome element insertion (type iv). The four restriction modi cation Information Big Bang” from Ministry of Education, Culture, Sports, Science systems as well as the IS605 and TnPZ copies near the inversions and Technology-Japan (MEXT), Suzuken Memorial Foundation, Urakami may have caused them. Indeed, in one of the inversions adjacent Foundation; by Grant-in-Aid for Scientific Research 20310125 from the Japan to a restriction–modification system, several recognition sites of Society for the Promotion of Science (to I.U.); by a grant (to N.H.) from – fi MEXT, Takeda Foundation, Sumitomo Foundation, Kato Memorial Biosci- the linked restriction modi cation system were found at the ence Foundation, and Naito Foundation; and by Grants-in-Aid for Scientific breakpoints (inversion I; Fig. S8C). Research on Priority Areas “Comprehensive Genomics” from MEXT (to M.H.).

1. Novick RP, Christie GE, Penadés JR (2010) The phage-related chromosomal islands of 6. Tsuru T, Kobayashi I (2008) Multiple genome comparison within a bacterial species Gram-positive bacteria. Nat Rev Microbiol 8:541–551. reveals a unit of evolution spanning two adjacent genes in a tandem paralog cluster. 2. Ochman H, Davalos LM (2006) The nature and dynamics of bacterial genomes. Science Mol Biol Evol 25:2457–2473. 311:1730–1733. 7. Suerbaum S, Michetti P (2002) Helicobacter pylori infection. N Engl J Med 347:1175–1186. 3. Hastings PJ, Lupski JR, Rosenberg SM, Ira G (2009) Mechanisms of change in gene copy 8. Suerbaum S, Josenhans C (2007) Helicobacter pylori evolution and phenotypic number. Nat Rev Genet 10:551–564. diversification in a changing host. Nat Rev Microbiol 5:441–452. 4. Koonin EV, Wolf YI (2008) Genomics of bacteria and archaea: The emerging dynamic 9. Yamaoka Y, Alm RA (2008) Helicobacter pylori: Molecular and Cellular Biology, view of the prokaryotic world. Nucleic Acids Res 36:6688–6719. ed Yamaoka Y (Caister Academic, Hethersett. Norwich, UK), pp 37–60. 5. Kuroda M, et al. (2001) Whole genome sequencing of meticillin-resistant Staphylococcus 10. Yamaoka Y, Kwon DH, Graham DY (2000) A M(r) 34,000 proinflammatory outer aureus. Lancet 357:1225–1240. membrane protein (oipA) of Helicobacter pylori. Proc Natl Acad Sci USA 97:7533–7538.

Furuta et al. PNAS Early Edition | 5of6 Downloaded by guest on October 1, 2021 11. Alm RA, et al. (2000) Comparative genomics of Helicobacter pylori: Analysis of the 24. Blanchette M, Kunisawa T, Sankoff D (1999) Gene order breakpoint evidence in outer membrane protein families. Infect Immun 68:4155–4168. animal mitochondrial phylogeny. J Mol Evol 49:193–203. 12. Shapiro JA (1979) Molecular model for the transposition and replication of 25. Kawai M, Nakao K, Uchiyama I, Kobayashi I (2006) How genomes rearrange: genome bacteriophage Mu and other transposable elements. Proc Natl Acad Sci USA 76: comparison within bacteria Neisseria suggests roles for mobile elements in formation – 1933 1937. of complex genome polymorphisms. Gene 383:52–63. 13. Kleckner N (1981) Transposable elements in prokaryotes. Annu Rev Genet 15: 26. Iida S, Meyer J, Arber W (1981) Genesis and natural history of IS-mediated 341–404. transposons. Cold Spring Harb Symp Quant Biol 45:27–43. 14. Oh JD, et al. (2006) The complete genome sequence of a chronic atrophic gastritis 27. Dorer MS, Fero J, Salama NR (2010) DNA damage triggers genetic exchange in Helicobacter pylori strain: Evolution during disease progression. Proc Natl Acad Sci Helicobacter pylori. PLoS Pathog 6:e1001026. USA 103:9999–10004. 28. Hastings PJ, Ira G, Lupski JR (2009) A microhomology-mediated break-induced 15. Azuma T, et al. (2004) Distinct diversity of the cag pathogenicity island among Helicobacter pylori strains in Japan. J Clin Microbiol 42:2508–2517. replication model for the origin of human copy number variation. PLoS Genet 5: 16. Schmid MB, Roth JR (1983) Genetic methods for analysis and manipulation of e1000327. inversion mutations in bacteria. Genetics 105:517–537. 29. Roberts RJ, et al. (2003) A nomenclature for restriction enzymes, DNA methyltrans- 17. Fujitani Y, Kobayashi I (1999) Effect of DNA sequence divergence on homologous ferases, homing endonucleases and their genes. Nucleic Acids Res 31:1805–1812. recombination as analyzed by a random-walk model. Genetics 153:1973–1988. 30. Hickman AB, et al. (2000) Unexpected structural diversity in DNA recombination: the 18. Alm RA, et al. (1999) Genomic-sequence comparison of two unrelated isolates of the restriction endonuclease connection. Mol Cell 5:1025–1034. human gastric pathogen Helicobacter pylori. Nature 397:176–180. 31. Uchiyama I, Higuchi T, Kobayashi I (2006) CGAT: a comparative genome analysis tool 19. Tomb JF, et al. (1997) The complete genome sequence of the gastric pathogen for visualizing alignments in the analysis of complex evolutionary changes between – Helicobacter pylori. Nature 388:539 547. closely related genomes. BMC Bioinformatics 7:472. ’ 20. Kersulyte D, et al. (2009) Helicobacter pylori s plasticity zones are novel transposable 32. Kawai M, Uchiyama I, Kobayashi I (2005) Genome comparison in silico in Neisseria elements. PLoS ONE 4:e6859. suggests integration of filamentous bacteriophages by their own transposase. DNA 21. Kersulyte D, Akopyants NS, Clifton SW, Roe BA, Berg DE (1998) Novel sequence Res 12:389–401. organization and insertion specificity of IS605 and IS606: chimaeric transposable 33. Tsuru T, Kawai M, Mizutani-Ui Y, Uchiyama I, Kobayashi I (2006) Evolution of elements of Helicobacter pylori. Gene 223:175–186. 22. Nobusato A, Uchiyama I, Ohashi S, Kobayashi I (2000) Insertion with long target paralogous genes: Reconstruction of genome rearrangements through comparison of – duplication: A mechanism for gene mobility suggested from comparison of two multiple genomes within Staphylococcus aureus. Mol Biol Evol 23:1269 1285. related bacterial genomes. Gene 259:99–108. 34. Darling AC, Mau B, Blattner FR, Perna NT (2004) Mauve: Multiple alignment of 23. Furuta Y, Abe K, Kobayashi I (2010) Genome comparison and context analysis reveals conserved genomic sequence with rearrangements. Genome Res 14:1394–1403. putative mobile forms of restriction-modification systems and related rearrangements. 35. Bourque G, Pevzner PA (2002) Genome-scale evolution: reconstructing gene orders in Nucleic Acids Res 38:2428–2443. the ancestral species. Genome Res 12:26–36.

6of6 | www.pnas.org/cgi/doi/10.1073/pnas.1012579108 Furuta et al. Downloaded by guest on October 1, 2021