Hindawi Publishing Corporation Volume 2012, Article ID 873589, 10 pages doi:10.1155/2012/873589

Research Article Identification of the Major Expressed S-Layer and Surface-Layer-Related Proteins in the Model Methanogenic Archaea: barkeri Fusaro and Methanosarcina acetivorans C2A

Lars Rohlin,1, 2 Deborah R. Leon,3, 4 Unmi Kim,1, 5 Joseph A. Loo,2, 3, 6 Rachel R. Ogorzalek Loo,2, 6 and Robert P. Gunsalus1, 2

1 Department of Microbiology, Immunology, and Molecular Genetics, University of California, Los Angeles, CA 90095, USA 2 UCLA-DOE Institute for Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA 3 Department of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095, USA 4 Mass Spectrometry Resource, Boston University School of Medicine, 670 Albany Street, Rm 511, Boston, MA 02118, USA 5 OPX Biotechnologies, Inc. Research Division, 2425 55th Street, Boulder, CO 80301, USA 6 Department of Biological Chemistry, University of California, Los Angeles, CA 90095, USA

Correspondence should be addressed to Rachel R. Ogorzalek Loo, [email protected] and Robert P. Gunsalus, [email protected]

Received 28 December 2011; Accepted 2 February 2012

Academic Editor: Jerry Eichler

Copyright © 2012 Lars Rohlin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Many archaeal cell envelopes contain a protein coat or sheath composed of one or more surface exposed proteins. These surface layer (S-layer) proteins contribute structural integrity and protect the lipid membrane from environmental challenges. To explore the diversity of these layers in the , the major S-layer protein in strain Fusaro was identified using proteomics. The Mbar A1758 gene product was present in multiple forms with apparent sizes of 130, 120, and 100 kDa, consistent with post-translational modifications including signal peptide excision and protein glycosylation. A protein with features related to the surface layer proteins found in Methanosarcina acetivorans C2A and Methanosarcina mazei Goel was identified in the M. barkeri . These data reveal a distinct conserved protein signature with features and implied cell surface architecture in the Methanosarcinaceae that is absent in other archaea. Paralogous gene expression patterns in two Methanosarcina species revealed abundant expression of a single S-layer paralog in each strain. Respective promoter elements were identified and shown to be conserved in mRNA coding and upstream untranslated regions. Prior M. acetivorans genome annotations assigned S-layer or surface layer associated roles of eighty genes: however, of 68 examined none was significantly expressed relative to the experimentally determined S-layer gene.

1. Introduction in resisting osmotic stress and dehydration while allowing transit of small molecular weight nutrients and waste prod- Like cell envelopes of other archaeal species as well as gram- ucts [4]. However, relatively little is known about the cell positive and gram-negative bacteria, the envelopes of meth- envelopes of the Methanosarcinaceae, which include highly anogenic archaea have essential roles in protecting the cell studied model organisms Methanosarcina acetivorans C2A, from environmental challenges [1–3]. For example, envel- Methanosarcina mazei Goe1, and Methanosarcina barkeri opes resist attacks directed at the cytoplasmic membrane Fusaro. Prior electron microscopy studies reveal the presence by extracellular enzymes, small lipophilic or chaotrophic of a typical S-layer surrounding the cytoplasmic membrane molecules, and other toxic agents. The envelopes also aid [5, 6]. Bioinformatic studies have predicted surface-layer and 2 Archaea surface-layer-related proteins for these methanogenic strains. v/v) or with an 80 : 20 atmosphere of : carbon For example, the genome annotations of M. acetivorans list dioxide in the vessel headspace. 81 ORFs with these assigned functions [7], while over 14 and 52 ORFs were annotated in the M. mazei, and M. barkeri gen- 2.2. RNA Purification. For RNA isolations, cultures of M. omes to code related surface layer proteins, respectively acetivorans or M. barkeri were grown on the indicated sub- [8, 9]. In another study using a comparative bioinformatics strates with serial transfer for a minimum of three times approach, M. mazei, M. barkeri,andM. acetivorans were to midexponential phase prior to cell harvest. Total RNA predicted to possess 12, 12, and 3 putative S-layer proteins, was purified from 10 mL of cell samples using the RNAwiz respectively [10]. There was little overlap of these gene pre- (Ambion Austin, TX) following the manufacturer’s instruc- dictions with the above annotations. tions and as described [12]. The purified RNA was treated Little data exist that experimentally addresse the above with DNase I as described [14, 15]andstoredat−70◦C until predictions except for recent proteomic reports that iden- used. tified major surface layer proteins in two strains, M. mazei Goe1 and M. acetivorans C2A [11]. The Methanosarcina 2.3. Quantitative RT-PCR. Real-time reverse transcription studies revealed a protein in each species with a similar (RT-PCR) reactions were performed using Superscript II predicted amino acid sequence (i.e., MM1976 and MA0829), (Invitrogen) as previously described [12]. To remove com- but differing in apparent size as revealed by SDS-PAGE. The plementary RNA, 1 μL RNase H was added to mixture and M. mazei S-layer displayed three species of approximately ◦ incubated for 20 min at 37 C. The real-time PCR reactions 131, 119, and 101 kDa in size, each possessing glycan modi- were conducted on a Biorad iCycler (Biorad, Hercules, CA) fications of unknown composition. M. acetivorans displayed using a four-step program consisting of denaturing, anneal- major S-layer protein forms 134, 119, and 114 kDa in appar- ing, extension, and acquisition steps. The RT-PCR primers ent size [11]. Interestingly, these proteins were previously were created by a modified version of MyPROBES [15]. The annotated as hypothetical proteins in the M. mazei and PCR product lengths were 100–200 bp where the melting M. acetivorans in contrast to the numerous other ◦ temperature was in the range of 55–66 C. The GC content proteins annotated as surface layer or surface-related [7– was 55–65%, and the primer length was 17–22 bases (see 9]. Based on protein homology searches to MM1976 and the supplementary Material available online at doi:10.1155/ MA0829, the M. mazei and M. acetivorans genomes con- 2012/873589, Table S1). Each primer pair was calibrated tained four to seven related ORFs [11]. The roles and expres- using genomic DNA [12]. Gene expression values were sion of these related ORFs plus those previously annotated determined in triplicate and compared to a reference gene as surface associated in these model Methanosarcina strains that showed no significant variation in expression in the M. remain unclear. acetivorans microarray experiments (i.e., MA3998), or with To address the above questions, combined proteomic, genomic DNA for the M. barkeri experiments. Experiments bioinformatic, and gene expression studies were performed were performed in triplicate, and the standard deviations to explore the diversity of surface layers in two model were less than 5%. Error bars are indicated in the appropriate Methanosarcina strains. The major S-layer protein in M. figures. Abundance units (AUs) are expressed in copy barkeri was identified (Mbar A1758) and its sequence was number per 5 ng RNA [12]. used to define a family of paralogous and orthologous proteins in the Methanosarcinaceae. Transcript levels of 2.4. Primer Extension Analysis. Primer extension reactions abundantly expressed ORFs from two model strains were  examined and paralogous genes were identified. Finally, the were performed to determine the mRNA 5 ends using-gene expression of many M. acetivorans genes previously anno- specific primers which were located 60 bases downstream of mba tated as S-layer and surface-layer-associated proteins was the ATG start codon of the slmA1 gene (Mbar A1758) mac examined: none were found to be significantly expressed. and 100 bases for slmA1 (MA0829) (Table S1 listing each Together, these M. acetivorans, M. mazei,andM. barkeri primer). Total RNA was isolated as described above. A total studies reveal the presence of a distinct family of S-layer of 30 μgofRNAwasusedineachprimerextensionreaction genes/proteins that support a bioinformatics-based reassess- and carried out as describe in [12], but the Sequitherm Excel ment of Methanosarcinaceae cell surface layers. II Kit (Epicentre Madison, WI) was used to create ladder from sequencing reactions of unrelated DNA. 2. Methods and Materials 2.5. SDS-Polyacrylamide Gel Electrophoresis and Protein Vis- 2.1. Cell Culture. M. acetivorans C2A (DSM 2834) and M. ualization. Proteins were resolved on NuPAGE 4–12% Bis- barkeri Fusaro (DSM 804) were cultivated on a mineral salts- Tris gels using MES Running Buffer (Invitrogen) as pre- based medium in their single cell forms as described previ- viously described [11]. Images of stained proteins using ously [12] with an atmosphere (80 : 20) of nitrogen and car- SYPRO Ruby (Bio-Rad) were captured using a Molecular bon dioxide in the vessel headspace. Following sterilization, Imager FX scanner and PDQuest Image Analysis Software the medium was supplemented with filter-sterilized 0.1 mL (Bio-Rad). Glycosylated proteins were revealed by Pro-Q 50% or 0.2 mL 5 M per 10 mL medium as Emerald 300 Glycoprotein Stain following the manufac- previously described for M. acetivorans [13]. For M. barkeri turer’s protocol (Molecular Probes) as previously described cell growth, cultures were grown either with methanol (0.5% [11]. Protein molecular weight standards were obtained from Archaea 3

(kDa) GL (kDa) GLMkDa 200 180 1 180 2 116 3 97 97 97 82 66 82 66 66 4

42 5 42 45 6 29 29 31

18 22 18 14 14 14

(a) (b)

Figure 1: Fractionation of M. barkeri cell proteins by SDS-PAGE. (a) Pro-Q Emerald glycoprotein-stained gel. Lanes: G: glycoprotein stan- dards; L: whole-cell lysate. Gel bands analyzed by LC-MS/MS are numbered. (b) SYPRO-Ruby-stained gel. Lanes: G: glycoprotein standards; L: whole-cell lysate; M: protein standard set 2. Protein sizes are indicated in kDa. The indicated bands 1–6 were excised for LC-MS/MS analysis.

Molecular Probes (Candy Cane glycoprotein standards) and performed with splitTree4 [18]. Upstream DNA regions were Bio-Rad (biotinylated broad range protein standards). searched for palindromic and repeated motifs using simple Perl script software written in house as were searches for conserved elements in the UTR regions [12]. Potential Rho- 2.6. Trypsin Proteolysis and Nano-HPLC-MS/MS. Protein independent terminator sequences were predicted using bands to be identified were excised from SDS-PAGE gels by a TransTermHP software [19]. The signal peptides were pre- spot-excision robot (Proteome Works, Bio-Rad) or by hand. dicted using signalP 3.0 [20] and the transmembrane do- The gel-embedded proteins were reduced, iodoacetamide- mains were predicted using TMHMM [21]. The RNA sec- alkylated, and trypsin-digested (Promega, sequencing grade ondary structures were predicted using RNAfold webserver modified trypsin). Product peptides were extracted from [22]. the polyacrylamide matrix in 50% acetonitrile/0.1% triflu- oroacetic acid in water and dried by vacuum centrifugation. Peptides were dissolved in 10 μLof0.1%formicacid(FA) 3. Results solution and analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) with electrospray ioniza- 3.1. Identification of the Major M. barkeri Cell Surface Protein. tion (ESI) on an Applied BioSystems QSTAR Pulsar XL We recently identified the major S-layer proteins in M. (QqTOF) mass spectrometer as previously described [11]. acetivorans C2A and M. mazei Goe1 to be the MA0829 and Tandem mass spectra were recorded automatically during the MM1976 gene products, respectively [11]. However, the liquid chromatography run by information-dependent anal- identity and size(s) of the major M. barkeri S-layer protein(s) ysis (IDA) on the mass spectrometer with collision energies are currently unknown due to the presence of multiple par- selected by the software to yield maximum fragmentation alogous genes related to the secreted and modified major efficiency [11]. Database searches were performed on the surface proteins of M. acetivorans and M. mazei (described MS/MS data utilizing Mascot (Matrix Science) and the com- below). To address these questions, a proteomic analysis was plete MSDB database. Protein sequence searches employed performed using M. barkeri cells grown to midexponential one missed cleavage and a mass tolerance of 0.3 Da for both phase with methanol as the sole source of carbon and energy precursor and product ions. Protein hits were accepted based (Section 2). Cell-extracted proteins were separated by SDS- on ≥2 ascribed peptides, at least one of which possessed a PAGE and were visualized by fluorescence staining, employ- MOWSE score ≥ 50 (P ≤ 0.05). Correspondences between ing SYPRO Ruby to reveal the total protein content (Figure MS/MS spectra and ascribed sequences were also verified 1(b)) and Pro-Q Emerald (Figure 1(a)) to reveal glycosy- manually. lated proteins. Three prominent glycosylated species with apparent sizes of 130, 120, and 100 kDa were revealed by the Pro-Q Emerald stain (Figure 1(a), bands 1–3). The 130 kDa 2.7. Informatics Analysis and Data Visualization. Protein species appeared predominant by the SYPRO Ruby total similarities were determined using BLAST [16] whereas the protein stain (Figure 1(b)). In addition, several bands smaller alignment and the phylogenetic tree of proteins were per- in size also contained glycoproteins (Figure 1(a),markedas formed with clustalw [17]. The visualization of the trees was bands 4–6). 4 Archaea

Table 1: Summary of the DUF1608-containg proteins in the methanogenic archaea.

ORFa SignalPb TMb DUF1608 domainsb Mass kDaa AAa MA0068 + 1 2 127.0 1167 MA0653 − (+) 0 (1) 2 74.2 760 MA0829 + 1 2 74.4 671 MA0884 + 0 2 80.2 717 MA0876 + 0 1 65.2 585 MA0957 + 0 2 131.3 1196 MA3556 + 1 2 94.7 868 MA3598 + 0 1 52.4 475 MA3639 + 0 2 83.2 744 MA4531 −(+) 1 1 34.2 317 Mbar A1034 + 1 2 125.5 1161 Mbar A1577 + 0 2 82.4 744 Mbar A1758 + 1 2 73.5 668 Mbar A1775 + 0 1 65.2 586 Mbar A1815 + 0 2 68.5 617 Mbar A1816 + 0 2 55.6 505 Mbar A2011 + 0 2 74.3 681 Mbar A2016 + 1 2 94.4 868 Mbar A3145 + 1 3 121.4 1096 MM0467 + 0 2 94.6 868 MM1364 + 1 2 126.2 1164 MM1816 + 0 1 46.5 425 MM1976 + 1 2 74.1 669 MM2002 − 0 1 63.2 564 Mbur 0268 + 1 2 95.7 867 Mbur 1089 + 1 1 43.6 396 Mbur 1690 + 0 2 73.9 677 Mbur 2129 + 1 2 139.4 1304 Mthe 0149 + 1 2 118.9 1113 Mthe 0677 + 0 2 79.9 720 Mthe 1177 + 1 2 96.8 874 a Gene and protein properties are derived from the original genome annotation files [7–9]. bThe signalP (SP), transmembrane (TM), and DUF1608 domains were predicted as noted in Section 2.

The major glycan-stained M. barkeri protein bands are predicted to possess signal peptides that suggest protein (Figure 1(a), marked 1–3) were excised from the SDS-PAGE export, plus one or two Pfam domains of unknown function gels and subjected to LC-MS/MS analysis (Table S2). Highly called DUF1608. However, the proteins vary in apparent abundant in each band was the protein encoded by size and in the presence of a C-terminal hydrophobic trans- Mbar A1758. This gene is predicted to encode a hypothetical membrane element. protein of 73,535 Da in size ([9] IMG JGI) but because the To determine which paralogs were abundantly expressed gel exhibited species with 130, 120, and 100 kDa, it appears in M. barkeri, unique primer pairs were designed for each that the protein is posttranslationally modified, as are the S- gene and employed in a quantitative PCR-based gene trans- layer proteins of M. acetivorans and M. mazei ([11]; discussed cription assay (Figure 2(a); Section 2): RNA was isolated below). By tandem mass spectrometry, the mature N-termi- from cells grown to mid-exponential phase with either meth- nus (ADSVEIR) was determined to begin with residue 25 anol or hydrogen/ as sole source of car- (Table S3). bon and energy. Only one of the M. barkeri ORFs (i.e., Mbar A1758) was abundantly expressed. Of the remaining eight genes, transcripts for only two, Mba A1034 and 3.2. The M. barkeri Genome Possesses Nine Mbar A1758-Like Mbar A2016, were detected (ca., at 1.7 and 1.2% of the Proteins. A bioinformatic search for Mbar A1758-like pro- Mbar A1758 transcripts). Based on abundant detection of teins encoded in the M. barkeri Fusaro genome (Section 2) the Mbar A1758 protein in M. barkeri cell extracts and in the revealed nine highly conserved ORFs (Table 1). All paralogs uniquely high level of its transcript, it appears to constitute Archaea 5

2 4 1.8 3.5 1.5 3 2.5 1.3

2 Au 1 Au 1.5 0.8 1 0.5 0.5 0.3 0 0 A1034 A1577 A1758 A1775 A1815 A1816 A2011 A2016 A3145 MA0068 MA0653 MA0829 MA0876 MA0884 MA0957 MA3556 MA3598 MA3639 MA4531 MA3998 Gene Mbar Mbar Mbar Mbar Mbar Mbar Mbar Mbar Mbar Gene MeoH Acetate MeOH H2 + CO2 (a) (b)

Figure 2: Expression of M. acetivorans and M. barkeri paralogous S-layer genes. (a) The Mbar A1758-related genes in M. barkeri. (b) The MA0829-related genes in M. acetivorans. Abundance values (AU) are expressed in copy number [12]. Cells were grown with methanol (MeOH), acetate, or hydrogen and carbon dioxide (H2 +CO2) as the sole carbon and energy supply as described in Section 2.

the major envelope S-layer protein. For subsequent gene 3.5. Identification of the MA0829 mRNA 5 End. The M. ace- identification and description, we designate this protein/gene tivorans slmA1 transcript’s 5 end was determined (Section as SlmA1mba/slmA1mba. 2). A single 5 terminus was observed corresponding to a position located 159 nucleotides upstream of the slmA1 3.3. The M. acetivorans Genome Possesses Multiple MA0829- translational start (Figure 3(a), bent arrow). Analysis of the Like Genes. A prior Pandit phylogeny search of the M. acetiv- DNA coding region upstream of the mRNA initiation site orans genome [11] revealed four proteins related to MA0829 contained a recognizable TATA box needed for TBP recogni- (i.e., MA0068, MA0844, MA3556, and MA3639). All of these tion/recruitment and DNA binding [12]. Three short regions were predicted to encode proteins that contained domains of of dyad symmetry of 14 to 21 nucleotides in length were also unknown function, designated as DUF1608. In an expanded seen in this region (Figure 3(b), thin arrows). Finally, inspec- bioinformatic search of the M. acetivorans genome (Section tion of the 5 mRNA untranslated region (UTR) revealed 2) six additional ORFs were identified of varying sizes several potential RNA secondary structures (Section 2: from 34 to 131 kDa (Table 1). Interestingly, all have been Figure S1). annotated as hypothetical proteins of unknown function The M. acetivorans slmA1 gene is located upstream of ([7]; discussed below) and possess one or two DUF1608 two genes that encode hypothetical proteins (i.e., MA0830- domains. MA0831) and downstream of a tRNA gene (Figure 3). To determine if the MA0830 ORF is cotranscribed with slmA1, 3.4. The M. acetivorans MA0829 Gene is Abundantly Ex- quantitative PCR (qPCR) experiments were performed pressed. To determine which of these ten MA0829-like genes (Figure 4). MA0830 expression was about 2% (acetate) to 7% in M. acetivorans genome were expressed, unique DNA (methanol) of the level seen for slmA1, suggesting that little primer pairs were designed for each as described above for to no transcription read-through occurs from the upstream M. barkeri (Section 2, Table S1) and cells were grown to mid- MA0829 promoter element. There is a rho-independent-like exponential phase with either methanol or acetate as the sole terminator sequence after MA0829 predicted using TransTer- carbon and energy supply. Total cellular RNA was isolated mHP [19]. The proteomic studies provided no evidence for and employed in a quantitative PCR-based gene expression significant amounts of either MA0830 or MA0831. assay (Figure 2(b)). With the one exception (ORF MA0829, To compare slmA1 gene expression levels relative to designated slmA1mac to distinguish it from the related ORFs) several highly abundant ORFs that function in M. acetivo- all transcripts were present in low to nearly undetectable rans , specific primer pairs (Table S1) were abundance. Relative to slmA1mac, the next most highly designed for the mcrA and pta genes. Among the most highly expressed ORF was MA3556 (slmA2mac) followed by expressed in the cell, the mcrA gene encodes the A subunit MA0068 (slmA3mac) at levels of approximately 3.5 and 1.8%, of the methyl reductase enzyme of the central respectively. Interestingly, our prior proteomic study, which pathway in formation, while pta encodes phos- enriched for surface proteins, detected the MA0068 and photransacetylase required for acetate activation and utiliza- MA3556 proteins from Concanavalin A eluates, suggesting tion. The resulting qPCR analysis demonstrated that slmA1 that the proteins are either glycosylated or interact with expression was significantly higher (ca. by 2 to 10-fold) than glycans. They were not detected without employing Con- for either mcrA or pta (Figure 4). canavalin A enrichment [11]. Potential roles of these paral- The M. barkeri slmA1 mRNA 5 end was also identified ogous slmA1-like genes are discussed below. (Section 2, Figure 3(b)) and it corresponds to a position 6 Archaea

(a) 159 nt MA0829 MA0830 MA0831 MA0828

540 nt 173 nt 286 nt

(b) −150 −100 M. acetivorans M. mazei M. barkeri

TATABOX +1 +50

+100 +159 MKRFAA

Figure 3: The M. acetivorans MA0829 gene locus and mRNA 5 end. (a) The MA0829-MA0831 gene region encodes the S-layer protein (MA0829), plus two hypothetical proteins of unknown function (MA0830 and MA0831). The Genebank name (MA number) is shown above each gene. A predicted hairpin loop is indicated between MA0829 and MA0830. Intergenic distances are in nucleotides. (b) Alignment of the upstream and untranslated leader regions of genes encoding the Methanosarcina S-layer protein. The alignment of the upstream DNA sequences is relative to the start of transcription (+1 position for M. acetivorans slmA1, see text) while the ATG position of the initiation codon is indicated by the M. Numbering is relative to the transcription start. Identity of the sequences is indicated by asterisks. The putative TATA-box sequences are indicated by the bracket. The mRNA 5 end positions for M. acetivorans and M. barkeri were determined with a ubiquitous ladder (data not shown). The six first N-terminal amino acids for the unprocessed M. acetivorans MA0829 preprotein are indicated by the single amino acid code.

8 M. barkeri slmA1 has a UTR sequence highly conserved in 7 slmA relation to that of M. acetivorans (>91% identity). Further 6 upstream, the DNA also contained a conserved TATA box. 5 Interestingly, an annotated hypothetical protein coded di- 4 rectly downstream of slmA1, Mbar A1759, was identified in mcrA 3 this proteomic study (data not shown). However, that gene 2 ack was only weakly transcribed in methanol grown cells relative 1 unk Control to Mbar A1758, and it does not appear to be significantly 0 accumulated by M. barkeri. MA0829 MA0830 MA4546 MA3606 MA3998 Genes 3.6. The M. mazei Genome Possesses Five Mbar A1758-Like MeOH Acetate Proteins. AsearchoftheM. mazei Goe1 genome for M. ace- tivorans MA0829-like and M. barkeri Mbar A1758-like pro- Figure 4: Expression levels of the MA0829 and MA0830 genes in M. teins (Section 2) revealed five highly conserved ORFs (Table 1, acetivorans relative to methyl-Coenzyme M transferase (mcrA)and described below). One of these, MM1976 (slmA1mm), was acetate kinase (ack) genes. RT-PCR expression data for the indicated previously identified to be the major S-layer protein on the genes in cells grown on methanol versus acetate as the carbon supply. Abundance values (AU) are expressed as copy number [12]. M. mazei cell surface [11].

3.7. Are the Previously Annotated M. acetivorans S-Layer and analogous to that found for M. acetivorans slmA1.Located Surface-Related Proteins Abundantly Expressed? The anno- 154 nucleotides (nt) upstream of the translational start site, tation of the M. acetivorans genome lists approximately Archaea 7 eighty ORFs with assigned functions as S-layer proteins or start site (Figure 3(b)). The high conservation among the surface-related proteins [7] (Table S4). Todetermine if any of UTRs and promoter elements for the predominant slmA1 these ORFs were significantly expressed, we analyzed two M. genes from M. acetivorans, M. mazei,andM. barkeri suggests acetivorans microarray data sets from cells grown under the that all three organisms control transcription and translation conditions with acetate or with methanol as the sole supply of their S-layer proteins in a similar way. The role of the UTR of carbon and energy (Section 2). The compiled pixel data and putative secondary mRNA structures (Figure S1) in gene sets were then normalized to the mcrA gene that encodes expression is currently unknown. methyl coenzyme M reductase (MA4546). Strikingly, none of While the Mbar A1758 and MA0829 proteins are the the sixty eight previously annotated ORFs for which we had most abundantly expressed S-layer proteins in M. barkeri data were significantly expressed relative to mcrA (ca.,below and M. acetivorans, two additional proteins, MA0068 and 3-4%). As noted above, slmA1mac (MA0829) expression was MA3556, were detected by proteomic methods, albeit at 2–4-fold above that observed for the mcrA gene (Figure 4). much lower levels [11]. Correspondingly, their genes were Lastly, the twelve surface layer proteins predicted for M. transcribed at far lower levels relative to MA0829 (i.e., at less acetivorans by Saleh et al. [10] constituted a subset of those than 0.04 and 0.02 of the level, resp., Figure 2(b)). Whether shown in Table S3: they were not significantly expressed they represent minor components of the S-layer somehow relative to MA0829. These experimental findings indicate needed for envelope porosity and/or proper assembly is limitations of the computational tools used to predict archae- unknown. Alternatively, they may be more highly expressed al S-layer proteins and surface associated proteins for this under different cell growth conditions. Likewise, for the group of . Mbar A1034 and Mbar A2016 gene products detected in M. barkeri. Alignment of the primary amino acid sequences of the 4. Discussion three major S-layer proteins from M. barkeri, M. acetivorans, and M. mazei (Figure 6) reveals high conservation (65% 4.1. Identification and Properties of the M. barkeri Surface identity and 88% similarity). Each protein has a similar Layer Protein. Based on the prior M. acetivorans and M. signal peptide sequence targeting secretion followed by tan- mazei S-layer proteomic and bioinformatic data, it was not dem DUF1608 domains and a short 60 amino acid linker possible to predict which of the three to five closely related region. The latter element would presumably tether the S- ORFs in the M. barkeri genome encoded the major S-layer layer protein to the cytoplasmic membrane by a C-terminal protein(s) of this microbe. Glycoprotein staining of SDS- and hydrophobic transmembrane helix. A more detailed PAGE separated M. barkeri cell extracts revealed three dis- assessment of these proteins and their respective paralogs is tinct bands containing the Mbar A1758 polypeptide (see in progress. Pro-Q Emerald glycoprotein stained lanes 1–3, Figure 1(a)). This M. barkeri protein is analogous to the surface exposed M. acetivorans and M. mazei S-layer proteins revealed by 4.3. Mbar A1758-Like Proteins in Other Microorganisms. biotin-tagging studies [11]. It is posttranslationally processed Mbar A1758-like proteins identified in the sequenced gen- by removal of the N-terminal signal sequence and further omes of Methanosarcina species and other microorganisms modified by addition of unknown sugar moieties. The (Section 2, Table 1) are displayed in the phylogenetic tree mature N-terminus (ADSVEIR) was determined to begin shown in Figure 5. Four major conclusions may be drawn. with residue 25 (Table S4) and corresponds to those deter- First, these proteins group into six deeply branched clusters mined for MA0829 and MM1976 [11]. Lectin blotting with the M. acetivorans MA0829 (slmA1mac) and the M. revealed interactions between Mbar A1758 and the lectins mazei MM1976 (slmA1mm) proteins next to one another, Con A, Galanthus nivalis lectin (GNL), and Pisum sativum and adjacent to three M. barkeri ORFs (Mbar A1758, agglutinin (PSA). Con A, preferentially binding α-mannose Mbar A2011, and Mbar A3145). An additional two M. and α-glucose, has previously been shown by us to bind the barkeri ORFs (Mbar A1815, Mbar A1816) are the next most M. mazei and M. acetivorans S-layer proteins. GNL binds closely related proteins. Thus, it was not possible to predict (α-1,3) mannose residues preferentially and, unlike most in advance which of these five M. barkeri genes might encode mannose-binding lectins, does not bind α-glucose. PSA pre- the major S-layer protein for this organism. From the gene fers to bind α-mannose-containing oligosaccharides with N- expression studies (Figure 3(b)), Mbar A1758 is the logical acetylchitobiose-linked α-fucose residues. These lectin re- candidate due to its high level of transcription relative to sults suggest future approaches to enrich Mbar A1758 from all of the other related genes. Moreover, proteomic studies whole-cell lysates. show that the Mbar A1758 gene product is highly abundant, proteolytically processed, and further modified by unknown 4.2. The Highly Expressed Methanasarcina slmA1 Genes. The glycan additions. M. acetivorans MA0829 gene (slmA1mac) was among the Second, while the number of MA0829 homologs in M. most highly expressed in the cell (Figures 2 and 4). Interest- acetivorans and M. barkeri is similar (i.e., ten and nine pro- ingly, the downstream gene, MA0830, was not significantly teins, resp.), M. mazei only possesses five (Table 1). Inter- transcribed and does not appear to be part of an operon-like estingly, all of these appear to be closely related to an M. structure. The M. barkeri slmA1 gene (Mbar A1758) was also acetivorans and M. barkeri ORF (Figure 5), where only one highly transcribed from an identical upstream transcription Methanosarcina cluster lacks a M. mazei member. Four of 8 Archaea

0.1 NP1800A rrnAC0971 MA0957 MA0876

MM2002 Mbur 1690 MA3598 Mbar A1775 Mbar A1815 Mbar A1816 MA0653 MA3639 MM1816 Mbar A3145 Mbar A1577 Mbar A2011 MA4531 Mbar A1758 MM1976 MA0884 MA0829 Mbur 1089

Mthe 1177 Mthe 0677 Mthe 0149

Mbur 0268 MM1364 Mbur 2129 MA0068 Mbar A2016 Mbar A1034 MA3556

MM0467 Figure 5: Phylogenetic tree of all DUF1608-containing genes in sequenced genomes of Methanosarcina and in other archaeal species. The organisms are: M. acetivorans,MA;M. mazei, MM; M. barkeri,MbarA; M. burtonii,Mbur; Methanosaeta thermophila PT., Mthe ; Haloarcula marismortui,rrnAC;Natronomonas pharaonis DSM2160, NP. The indicated major (•) and minor (◦) expressed S-layer proteins are described in the text.

MA0829 150 MM1976 150 Mbar_A1758 145 Ruler

MA0829 300 MM1976 298 Mbar_A1758 292 Ruler

MA0829 449 MM1976 445 Mbar_A1758 439 Ruler

MA0829 599 MM1976 593 Mbar_A1758 587 Ruler

MA0829 671 MM1976 669 Mbar_A1758 668 Ruler

Figure 6: Primary amino acid alignment of the three major Methanosarcina S-layer proteins, Mbar A1758, MA0829, and MM1976. Identity of the sequences is indicated by asterisks. The experimentally determined mature N-terminus for Mbar A1758 (Section 3) is indicated by the arrow. Archaea 9 the five M. mazei ORFs have been detected by their binding RPG, by the UCLA-DOE Institute of Genomics and Pro- to Con A (Leon et al., in preparation) and only MM2002 has teomics to JL and RPG, by the U.S. National Institute of not been observed. M. acetivorans proteins MA3556 and Health (GM085402) to JL and RROL, and by the Ruth L. MA0068 were readily detected from Con A elutes, while pro- Kirschstein NRSA fellowship (NIH 5F31AI61886-02) to D. teins MA0653 and MA4531 were observed less consistently. R.L.TheUCLAMassSpectrometryandProteomicsTech- Noteworthy, the major ORFs detected in our Methanosarcina nology Center was established and equipped by a generous studies fall into three distinct groups on the phylogenetic gift from the W. M. Keck Foundation. The authors thank tree (Figure 5). One contains the most abundantly detected Patricia A. Denny and Paul C. Denny (University of Southern proteins while the other two are less abundant. California School of Dentistry) for performing the lectin Third, only two other methanogenic archaea possess blotting. MA0829-like genes/proteins. These are burtonii (four homologs: Mbur 0268, Mbur 1089, Mbur 1690, and Mbur 2129) and Methanosaeta thermophila PT References (three homologs: Mthe 0148, Mthe 0677, and Mthe 1177). [1] J. Eichler et al., “The cell envelopes of : staying in None of these putative proteins appear to be closely related shape in a world of salt,” in Prokaryotic Compounds: to any of the abundantly identified Methanosarcina S-layer Structure and Biochemistry,H.Konig,¨ H. Claus, and A. Varma, ORFs, but rather cluster together near the more distant and Eds., pp. 253–270, Springer, Berlin, Germany, 2010. weakly expressed Methanosarcina genes (Figure 5). Inter- [2] H. Konig¨ and H. Claus, “Cell envelopes of methanogens,” in estingly, the Mbur 1690 ORF was detected in the cell-free Prokaryotic Cell Wall Compounds: Structure and Biochemistry, supernatant of M. burtonii cultures [23], and this protein is H. Konig,¨ H. Claus, and A. Varma, Eds., pp. 231–252, Springer, most similar to the S-layer proteins detected in M. acetivorans Berlin, Germany, 2010. and M. mazei [11]. [3] R. Rachel, “Cell envelopes of and nanoar- Fourth, only two nonmethanogenic archaea possess chaeota,” in Prokaryotic Cell Wall Compounds: Structure and MA0829-like genes/proteins (Figure 5). These include Biochemistry,H.Konig,¨ H. Claus, and A. Varma, Eds., pp. 271– 294, Springer, Berlin, Germany, 2010. one each in Haloarcula marismortui (rrnAC0971, UniProt [4]A.F.Ellen,B.Zolghadr,A.M.J.Driessen,andS.V.Albers, Q5V3G1) and in Natronomonas pharaonis DSM2160 “Shaping the archaeal cell envelope,” Archaea, vol. 2010, (NP1800A, UniProt Q3IS97). No related ORFs were identi- Article ID 608243, 13 pages, 2010. fied in either the Bacteria or the Eukarya. The MA0829 pro- [5]K.R.Sowers,J.E.Boone,andR.P.Gunsalus,“Disaggregation tein appears to be relatively rare in a phylogenetic context, of Methanosarcina spp.andgrowthassinglecellsatelevated but at the same time, it is one of the major proteins in these osmolarity,” Applied and Environmental Microbiology, vol. 59, few organisms known to possess it. no. 11, pp. 3832–3839, 1993. [6] K. R. Sowers and R. P. Gunsalus, “Adaptation for growth at various saline concentrations by the archaebacterium Metha- 4.4. Previously Annotated Surface Layer and Surface-Layer- nosarcina thermophila,” Journal of Bacteriology, vol. 170, no. 2, Associated Proteins. While multiple S-layer and surface- pp. 998–1002, 1988. layer-associated genes were annotated by the Methanosarcina [7] J. E. Galagan, C. Nusbaum, A. Roy et al., “The genome of M. genome sequencing projects [7–9], it is now evident that ex- acetivorans reveals extensive metabolic and physiological di- perimental data were too limited to accurately predict the S- versity,” Genome Research, vol. 12, no. 4, pp. 532–542, 2002. layer proteins present in these archaeal species. Interestingly, [8] U. Deppenmeier, A. Johann, T. Hartsch et al., “The genome the bioinformatic study of Saleh et al. [10] predicted yet dif- of Methanosarcina mazei: evidence for lateral gene transfer ferent S-layer proteins with little overlap to other predictions. between bacteria and archaea,” Journal of Molecular Microbi- However, an analysis of several microarray data sets for M. ology and Biotechnology, vol. 4, no. 4, pp. 453–461, 2002. acetivorans failed to support the abundant expression of the [9] D. L. Maeder, I. Anderson, T. S. Brettin et al., “The Methano- above-mentioned candidates (Table S4). sarcina barkeri genome: comparative analysis with Methano- In conclusion, this study establishes the presence of a sarcina acetivorans and Methanosarcina mazei reveals extensive rearrangement within methanosarcinal genomes,” Journal of single major S-layer protein in each Methanosarcina strain Bacteriology, vol. 188, no. 22, pp. 7922–7931, 2006. examined. Its signature is well conserved within the Metha- [10] M. Saleh, C. Song, S. Nasserulla, and L. G. Leduc, “Indicators nosarcinaceae but nearly absent in any other types of organ- from archaeal secretomes,” Microbiological Research, vol. 165, isms. The current study revises criteria for predictions of no. 1, pp. 1–10, 2010. newly sequenced genomes and metagenomic data sets. In [11] D. R. Francoleon, P. Boontheung, Y. Yang et al., “S-layer, fact, it will be interesting should future studies assign newly surface-accessible, and concanavalin a binding proteins of revealed slmA1 genes/proteins to Methanosarcinaceae only. Methanosarcina acetivorans and Methanosarcina mazei,” Jour- If so, the DUF1608 motif is restricted to a relatively narrow nal of Proteome Research, vol. 8, no. 4, pp. 1972–1982, 2009. range of species. [12] L. Rohlin and R. P. Gunsalus, “Carbon-dependent control of electron transfer and central carbon pathway genes for methane biosynthesis in the Archaean, Methanosarcina ace- Acknowledgments tivorans strain C2A,” BMC Microbiology, vol. 10, article 62, 2010. This research was supported by the Department of Energy [13] K. R. Sowers, D. E. Robertson, D. Noll, R. P. Gunsalus, and Biosciences Division grant award DE-FG02-08ER64689 to M. F. Roberts, “N epsilon-acetyl-beta-lysine: an osmolyte 10 Archaea

synthesized by methanogenic archaebacteria,” Proceedings of the National Academy of Sciences of the United States of America, vol. 87, no. 23, pp. 9083–9087, 1990. [14]M.K.Oh,L.Rohlin,K.C.Kao,andJ.C.Liao,“Globalex- pression profiling of acetate-grown Escherichia coli,” Journal of Biological Chemistry, vol. 277, no. 15, pp. 13175–13183, 2002. [15] L. Rohlin, J. D. Trent, K. Salmon, U. Kim, R. P. Gunsalus, and J. C. Liao, “Heat shock response of Archaeoglobus fulgidus,” Journal of Bacteriology, vol. 187, no. 17, pp. 6046–6057, 2005. [16] S. F. Altschul, T. L. Madden, A. A. Scha¨ffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997. [17] M. A. Larkin, G. Blackshields, N. P. Brown et al., “Clustal W and clustal X version 2.0,” Bioinformatics, vol. 23, no. 21, pp. 2947–2948, 2007. [18] D. H. Huson and D. Bryant, “Application of phylogenetic networks in evolutionary studies,” Molecular Biology and Evolution, vol. 23, no. 2, pp. 254–267, 2006. [19] C. L. Kingsford, K. Ayanbule, and S. L. Salzberg, “Rapid, accurate, computational discovery of Rho-independent tran- scription terminators illuminates their relationship to DNA uptake,” Genome Biology, vol. 8, no. 2, article R22, 2007. [20] O. Emanuelsson, S. Brunak, G. von Heijne, and H. Nielsen, “Locating proteins in the cell using TargetP, SignalP and related tools,” Nature Protocols, vol. 2, no. 4, pp. 953–971, 2007. [21] A. Krogh, B. Larsson, G. Von Heijne, and E. L. L. Sonnham- mer, “Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes,” Journal of Molecular Biology, vol. 305, no. 3, pp. 567–580, 2001. [22] D. H. Mathews, J. Sabina, M. Zuker, and D. H. Turner, “Expanded sequence dependence of thermodynamic parame- ters improves prediction of RNA secondary structure,” Journal of Molecular Biology, vol. 288, no. 5, pp. 911–940, 1999. [23] N. F. W. Saunders, C. Ng, M. Raftery, M. Guilhaus, A. Good- child, and R. Cavicchioli, “Proteomic and computational anal- ysis of secreted proteins with type I signal peptides from the antarctic archaeon Methanococcoides burtonii,” JournalofPro- teome Research, vol. 5, no. 9, pp. 2457–2464, 2006. International Journal of Peptides

Advances in BioMed Stem Cells International Journal of Research International International Genomics Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 Virolog y http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Journal of Nucleic Acids

Zoology International Journal of Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Submit your manuscripts at http://www.hindawi.com

Journal of The Scientific Signal Transduction World Journal Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Genetics Anatomy International Journal of Biochemistry Advances in Research International Research International Microbiology Research International Bioinformatics Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Enzyme International Journal of Molecular Biology Journal of Archaea Research Evolutionary Biology International Marine Biology Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014