Sequence Anomalies in the Cag7 Gene of the Helicobacter pylori Pathogenicity Island

Proc. Natl. Acad. Sci. USA
Vol. 96, pp. 7011–7016, June 1999
Microbiology

GUOYING LIU†, TIMOTHY K. MCDANIEL‡, STANLEY FALKOW‡, AND SAMUEL KARLIN†§

†Department of Mathematics, Stanford University, Stanford, CA 94305-2125; and ‡Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305-5124

Contributed by Samuel Karlin, April 5, 1999

Abbreviation: HP, Helicobacter pylori.
§To whom reprint requests should be addressed. e-mail: fd.zgg@forsythe.stanford.edu.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
PNAS is available online at www.pnas.org.

ABSTRACT

The severity of Helicobacter pylori-related disease is correlated with a pathogenicity island (the Cag region of about 26 genes) whose presence is associated with the up-regulation of an IL-8 cytokine inflammatory response in gastric epithelial cells. Statistical analysis of the Cag gene sequences calculated from the complete genome of strain 26695 revealed several unusual features. The Cag7 sequence (1,927 aa) has two repeat regions. Repeat region I runs 317 aa in a form of AA*A** proximal to the protein N terminal; repeat region II extends 907 aa in the middle of the protein sequence consisting of 74 contiguous segments composed from selections among six consensus sequences and includes 58 regularly distributed cysteine residues with consecutive cysteines mostly 12, 18, or 24 aa apart. This "regular" cysteine arrangement may provide a scaffolding of linker elements stabilized by disulfide bridges. When Cag7 homologues from different strains are compared, differences were found almost exclusively in the repeat regions, resulting from deletion and/or insertion of repeating units. These observations suggest that the anomalous repetitive structure of the sequence plays an important role in the conformation of Cag7 gene product and potentially in the function of the pathogenicity island. Other facets of the Cag7 sequence show significant charge clusters, high multiplet count, and extremes of amino acid usage.

Helicobacter pylori (HP) is a Gram-negative spiral-shaped bacterium that colonizes the human stomach. About 50% of humans are infected by HP but only 10% exhibit clinical disease, including chronic gastritis, gastric carcinoma, and peptic ulcer (1). The more severe forms of disease are associated with infection by specific strains called type I. Two type I HP strains have been sequenced in their entirety [strains 26695 (2) and J99 (3)]. Virulent HPs differ from less virulent strains (type II) by the presence of a ∼40-kb block of genes called the Cag pathogenicity island (abbreviated Cag PAI or CagA region; ref. 4). No specific function is established for any gene from the Cag island. However, Cag-positive, but not Cag-negative, strains cause cultured gastric epithelial cells to secrete the proinflammatory cytokine IL-8 (4, 5), and this ability is abolished by specific mutation of many of the 26 ORFs found in the Cag island (4–6). Several of these genes are modestly similar to genes of other pathogens that encode subunits of specialized type IV secretory systems that directly deliver bacterial virulence factors to the surface and possibly into host cells. Control of bacterial virulence often is mediated by changes at the DNA sequence level that affect gene regulation or expression (7).

Three Cag PAI now have been sequenced from the complete genomes of strains 26695 and J99 and the sequenced cosmid 36 from strain NCTC11638. All three contain an unusual ORF (annotated Cag7 or HP527 in strain 26695), which is significantly variable among HP pathogenic strains, but no mechanisms for this variation have been proposed and no features of the Cag7 sequence have been noted to account for the origin of this variation.

We present here a rigorous statistical analysis of the Cag7 protein (1,927 aa) from strain 26695. Of particular interest, we underscore several sequence features of this protein, including distinctive repeat patterns, a remarkable cysteine residue distribution, a statistically significantly high multiplet count (defined below), a pronounced charge residue cluster (8, 9), extremes of lysine and glutamate amino acid usage, and identification of hydrophobic potential transmembrane segments. Expansion or contraction of the repeats could account for the size variations seen in the ORF of Cag7.

RESULTS

Unusual Sequence Features of Cag7. The SAPS (Statistical Analysis of Protein Sequences) program (8) was applied to all the putative proteins encoded from the CagA region of strain 26695. This analysis reveals several unusual sequence features especially for the Cag7 protein, which was found to contain two impressive regions composed of contiguous repeated amino acid sequences.

Repeat I. Repeat I (Fig. 1), covering amino acid positions 9-325 inclusive, in the pattern AA*A**, has A (130 aa) aligned with A* (130 aa), showing only three mismatches and A** (57 aa), a truncated copy of A*, which matches perfectly over their common 57 aa and, more impressively, in perfect DNA agreement. Remarkably the A and A* differ at only three DNA positions, which all occur in codon site 1. There are no synonymous (silent site) substitutions. The almost perfect DNA identities comparing A to A* or A** strongly suggest a recent origin to these repeats.

FIG. 1. Alignment of amino acid sequences of A, A*, and A** in repeat I. Matching residues are indicated by dots. A and A* differ at three residues; A** is 73 aa shorter than A or A*. A** matches exactly with A* over their common 57 residues. The numbers to the right of the sequences give their coordinates within the Cag7 protein. DNA conservation with respect to A and A* differ only at codon site 1 of the altered aa. A* and A** are identical at the DNA level in their common sequence.

Repeat II. Repeat II (Figs. 2 and 3) consists of 74 contiguous segments composed from selections among six different consensus sequences, which we call α, β, λ, μ, δ, ε, stretching over amino acid positions 477-1383. The underline signifies perfect conservation of the amino acids at that position among the ensemble of sequences of α, of β, etc.

α = CEKLLTPEA(K/R)KLLE (14 aa length). Some α have one or two appended aa, generally E, EE, or QE;

β = CLKDLPKDLQKKVL (14 aa length);

λ = CLKNAKT(D/E)EERK(K/R) (13 aa length);

μ = CVSQA(R/K)(N/T)E(A/K)EKKE (13 aa length).

In repeat II, a λ sequence is always followed by a β sequence and μ by α:

δ = AKES(V/L)KAYLD (10 aa length),

ε = QQ(A/V/Y)LD (5 aa length).

The explicit order of the subsequences of repeat II is displayed next:

(λ)-(τ1-ε-λ)-(τ2-ε-λ)-
(α-ε-λ)-(β-δ-μ)-(α-ε-λ)-(β-δ-μ)-(α-δ-μ)-
(α-δ-μ)-(α-ε-λ)-(β-δ-μ)-(α-δ-μ)-(α-δ-μ)-
(α-δ-μ)-(α-ε-λ)-(β-δ-μ)-(α-δ-μ)-(α-δ-μ)-
(α-ε-λ)-(β-δ-μ)-(α-δ-μ)-(α-δ-μ)-(α-ε-λ)-
(β-δ-μ)-(α-δ-μ)-(α).

The sequences τ1 and τ2 in the above pattern each begin with a cysteine but significantly differ from the consensus sequences α and β, respectively. Each specific α unit aligns substantially with the consensus α, each β unit aligns substantially with the consensus β, etc. The main repeat units occur as triplet groups of sequences of the form

{α or β} - {δ or ε} - {λ or μ}.

The δ sequences invariably are followed by a μ sequence, ε sequences are followed invariably by a λ sequence, β sequences are followed by δ sequences, whereas α sequences are followed by either ε or δ sequences. It is worth emphasis that DNA conservation in these repeats among the α, β, δ, etc.

[...] of disulphide bridges. This cysteine arrangement differs from other classical cysteine arrangements, including kringle patterns, epidermal growth factor domains, fibronectin structures, and zinc fingers.

Comparisons of the Cag7 Sequence Among Different HP Strains. There are three HP strains from which the CagA region is wholly sequenced. These are available from the complete genome strain 26695, complete genome strain J99, and cosmid 36 strain NCTC 11638 (6). The alignments of the Cag7 protein from the three sources are represented in Fig. 4. Cag7 matches excellently the two genes ORF14 and ORF13 of cosmid 36 when encoded together with their intervening sequence. The correspondence with ORF14 possesses a deletion of 130 successive residues near the N terminal of Cag7, whereas ORF13 aligns almost perfectly with the C-terminal quarter of Cag7. Notably, the initial A of repeat I in Cag7 is the 130-residue segment missing from ORF14. The sequence intervening ORF13 and ORF14 is replete with nonsense codons. However, in introducing a frame shift (skip a guanine at nucleotide position 24186) relative to the cosmid 36 sequence, the amino acid sequence resulting from this translation aligns almost perfectly with the middle part of Cag7, but for an absent block of 69 aa. The missing part is equivalent to the two repeat triplet groups of repeat II in Cag7, those of sequences (α-ε-λ)-(β-δ-μ) corresponding to amino acid positions 1114-1182 of Cag7. These alterations suggest that the number of repeat units may be part of the mechanism regulating the expression, conformation, and/or function of the protein. We also guess that the +1 frame shift serves to regulate the expression of Cag7-like genes among different strains, which conceivably also controls the virulence of the
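The segment order of repeat II given in the Results can be checked mechanically. The following is a minimal sketch, not the authors' SAPS analysis: it transcribes the displayed order with ASCII stand-ins for the Greek unit names (alp, bet, lam, mu, del, eps, tau1, tau2 are labels chosen here for illustration), counts the segments, and tabulates which unit follows which.

```python
import re
from collections import defaultdict

# Repeat II order as displayed in the text, with ASCII stand-ins (assumed
# labels, not from the paper): alp=alpha, bet=beta, lam=lambda, mu=mu,
# del=delta, eps=epsilon, tau1/tau2 = the two divergent leading units.
ORDER = """
(lam)-(tau1-eps-lam)-(tau2-eps-lam)-
(alp-eps-lam)-(bet-del-mu)-(alp-eps-lam)-(bet-del-mu)-(alp-del-mu)-
(alp-del-mu)-(alp-eps-lam)-(bet-del-mu)-(alp-del-mu)-(alp-del-mu)-
(alp-del-mu)-(alp-eps-lam)-(bet-del-mu)-(alp-del-mu)-(alp-del-mu)-
(alp-eps-lam)-(bet-del-mu)-(alp-del-mu)-(alp-del-mu)-(alp-eps-lam)-
(bet-del-mu)-(alp-del-mu)-(alp)
"""

# Flatten the bracketed groups into one linear list of unit names.
segments = re.findall(r"[a-z]+\d?", ORDER)


def successors(units):
    """Map each unit name to the set of unit names that directly follow it."""
    table = defaultdict(set)
    for current, following in zip(units, units[1:]):
        table[current].add(following)
    return dict(table)


follow = successors(segments)

if __name__ == "__main__":
    print("number of segments:", len(segments))
    for name in sorted(follow):
        print(name, "->", sorted(follow[name]))
```

In this transcription the total is 74 segments, and the stated rules for δ, ε, β, μ, and α hold throughout. The λ-to-β rule holds from the first α unit onward; the three leading λ units are followed by τ1, τ2, and α, respectively.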
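The lengths and coordinates quoted for the two repeat regions can be cross-checked with simple arithmetic. The sketch below is illustrative only: the helper names (`to_regex`, `consensus_length`, `span`) are hypothetical, the consensus strings are copied from the text with typesetting spaces removed, and the last check assumes that the deleted units have exactly their consensus lengths, which the text does not state.

```python
import re

# Consensus sequences of repeat II as given in the text; "(K/R)" marks a
# position where either residue occurs.
CONSENSUS = {
    "alpha": "CEKLLTPEA(K/R)KLLE",
    "beta": "CLKDLPKDLQKKVL",
    "lambda": "CLKNAKT(D/E)EERK(K/R)",
    "mu": "CVSQA(R/K)(N/T)E(A/K)EKKE",
    "delta": "AKES(V/L)KAYLD",
    "epsilon": "QQ(A/V/Y)LD",
}


def to_regex(consensus):
    """Turn the paper's notation into a regular expression: (K/R) -> [KR]."""
    return re.sub(
        r"\(([A-Z/]+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )


def consensus_length(consensus):
    """Number of residues, counting each (X/Y) alternative as one position."""
    return len(re.sub(r"\([A-Z/]+\)", "X", consensus))


def span(first, last):
    """Length of an inclusive coordinate range."""
    return last - first + 1


LENGTHS = {name: consensus_length(seq) for name, seq in CONSENSUS.items()}

# Two triplet groups (alpha-epsilon-lambda)-(beta-delta-mu), assuming each
# unit has exactly its consensus length (an assumption made here).
TWO_TRIPLETS = sum(
    LENGTHS[name]
    for name in ("alpha", "epsilon", "lambda", "beta", "delta", "mu")
)

if __name__ == "__main__":
    print("consensus lengths:", LENGTHS)
    print("repeat I span:", span(9, 325), "sum of units:", 130 + 130 + 57)
    print("repeat II span:", span(477, 1383))
    print("missing block span:", span(1114, 1182), "two triplets:", TWO_TRIPLETS)
```

Under these assumptions the numbers agree with the text: positions 9-325 span 317 aa (130 + 130 + 57), positions 477-1383 span 907 aa, and positions 1114-1182 span 69 aa, which equals the summed consensus lengths of one (α-ε-λ) and one (β-δ-μ) group.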
