<<

Genome and proteome of long-chain alkane degrading Geobacillus thermodenitrificans NG80-2 isolated from a deep-subsurface oil reservoir

Lu Feng*†‡, Wei Wang*†‡, Jiansong Cheng*, Yi Ren*, Guang Zhao*, Chunxu Gao*, Yun Tang*, Xueqian Liu*, Weiqing Han*, Xia Peng*†‡, Rulin Liu†§, and Lei Wang*†‡§¶

*TEDA School of Biological Sciences and Biotechnology and †Tianjin Key Laboratory of Microbial Functional Genomics, Nankai University, 23 Hongda Street, Tianjin Economic-Technological Development Area (TEDA), Tianjin 300457, China; ‡Tianjin Research Center for Functional Genomics and Biochip, 23 Hongda Street, Tianjin Economic-Technological Development Area (TEDA), Tianjin 300457, China; and §College of Life Sciences, Nankai University, 94 Weijin Road, Tianjin 300071, China

Edited by David M. Karl, University of Hawaii, Honolulu, HI, and approved February 6, 2007 (received for review November 2, 2006) The complete genome sequence of Geobacillus thermodenitrifi- scribed thermophilic bacterial strain that degrades long-chain cans NG80-2, a thermophilic bacillus isolated from a deep oil alkanes up to at least C36 (6). reservoir in Northern China, consists of a 3,550,319-bp chromo- Here, we report the full genome sequence of NG80-2 and its some and a 57,693-bp plasmid. The genome reveals that NG80-2 is partial proteome. General and specific metabolic features are well equipped for adaptation into a wide variety of environmental described, and mechanisms for adaptation into oil reservoirs are niches, including oil reservoirs, by possessing genes for utilization discussed. We identified in silico the genes for the degradation of a broad range of energy sources, genes encoding various pathway of long-chain alkanes, and confirmed in vitro their encoded transporters for efficient nutrient uptake and detoxification, and by proteomic analysis. The key of this pathway, a genes for a flexible respiration system including an aerobic branch long-chain alkane monooxygenase, was fully characterized and comprising five terminal oxidases and an anaerobic branch com- shows particular potential to be used for the treatment of environ- prising a complete denitrification pathway for quick response to mental oil pollutions and in other biocatalytic processes. dissolved oxygen fluctuation. The identification of a nitrous oxide reductase gene has not been previously described in Gram-positive Results bacteria. The proteome further reveals the presence of a long-chain General Genome Features. The genome of G. thermodenitrificans alkane degradation pathway; and the function of the key enzyme NG80-2 is composed of a 3,550,319-bp chromosome and a in the pathway, the long-chain alkane monooxygenase LadA, is 57,693-bp plasmid, designated pLW1071, with mean G ϩ C con- confirmed by in vivo and in vitro experiments. The thermophilic tents of 49.0% and 39.8%, respectively (Table 1). Both the size and soluble monomeric LadA is an ideal candidate for treatment of en- structure of the NG80-2 genome are similar to that of G. kausto- vironmental oil pollutions and biosynthesis of complex molecules. philus HTA426, the only other sequenced Geobacillus genome, which contains a 3,544,776-bp chromosome and a 47,890-bp plas- adaptation ͉ degradation ͉ monooxygenase mid (4). There are 3,499 predicted ORFs, 11 rRNA operons and 87 tRNA genes for all 20 aa, covering 86% of the genome (Table 1 and Fig. 1). Putative functions were assigned to 2,479 ORFs [70.9%; eobacillus is a phenotypically and phylogenetically coherent supporting information (SI) Table 2]. Of the remainder, 757 genus of thermophilic bacilli with a high 16S rRNA se- G (21.6%) showed similarity to hypothetical proteins, and 263 (7.5%) quence similarity (98.5–99.2%), and was recently separated from had no detectable homologs in the public databases (e-value the genus Bacillus (1). Members of Geobacillus have been Ͻ1 ϫ 10Ϫ10). There are 68 putative transposase genes in intact or isolated from various terrestrial and marine environments, not mutated forms. The genome contains 30 phage-related genes, 22 of only in geothermal areas, but also in temperate regions and which constitute a defective prophage located 2,915 kb from oriC. permanently cold habitats (2), demonstrating great capabilities The likely origin of replication (oriC) on the chromosome was for adaptation to a wide variety of environmental niches. assigned to the intergenic region upstream dnaA (GT0001) based Geobacillus spp. have attracted industrial interest for their on GC skew analysis, and four DnaA box-like sequences were also potential applications in biotechnological processes as sources of found in this region. various thermostable (2). pLW1071 shows no overall similarity to other sequenced plas- There are 11 validly described Geobacillus species (3), and only mids. Genes involved in plasmid replication and conjugation were the whole genome sequence of Geobacillus kaustophilus HTA426, a deep-sea sediment isolate, has been reported (4). G. thermodenitrificans are Gram-positive, facultative soil bacteria Author contributions: L.F. and W.W. contributed equally to this work; L.F. and L.W. with distinctive property of denitrification (5). G. thermodeni- designed research; W.W., G.Z., C.G., Y.T., X.L., and W.H. performed research; L.F., W.W., J.C., trificans NG80-2 was isolated from a deep-subsurface oil reser- Y.R., X.P., and R.L. contributed new reagents/analytic tools; L.F., W.W., J.C., Y.R., G.Z., and C.G. analyzed data; and L.F., W.W., J.C., and L.W. wrote the paper. voir in Dagang oilfield, Northern China. It grows between 45°C Conflict of interest statement: L.F., W.W., Y.T., and L.W. have a financial conflict of interest and 73°C (optimum 65°C) and can use long-chain (C15-C36) resulting from a patent application for the DNA sequence of the ladA gene. alkanes as a sole carbon source (6). This article is a PNAS Direct Submission. Alkanes are the major components of crude oils and are Abbreviation: COG, clusters of orthologous groups. commonly found in oil contaminated environments. Biotechno- Data deposition: The complete genome sequence of G. thermodenitrificans NG80-2 re- logical applications for microbial degradation of those pollutants ported in this paper has been deposited in the GenBank database [accession nos. CP000557 are of long-standing interest. Although long-chain alkanes are (chromosome) and CP000558 (plasmid)]. more persistent in the environment than shorter-chains, only ¶To whom correspondence should be addressed. E-mail: [email protected]. genes involved in the degradation of alkanes up to C16 have been This article contains supporting information online at www.pnas.org/cgi/content/full/ well studied (7–11), and those for longer alkanes have not yet 0609650104/DC1. been reported. G. thermodenitrificans NG80-2 is the only de- © 2007 by The National Academy of Sciences of the USA

5602–5607 ͉ PNAS ͉ March 27, 2007 ͉ vol. 104 ͉ no. 13 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0609650104 Downloaded by guest on September 23, 2021 Table 1. General features of G. thermodenitrificans NG80-2 genome Chromosome Plasmid Total size, bp 3,550,319 57,693 G ϩ C content, % 49.01 39.75 Coding density, % 84.5 83.5 ORFs: with assigned 2,445 34 function conserved 749 8 hypothetical with no database 250 13 match total 3,444 55 Average ORF length, bp 871 876 rRNA operons 11 0 tRNAs 87 0 Transposases 55 13 Prophage-related genes 30 0

found, and 22 more genes could be assigned functions including a long-chain alkane monooxygenase gene (ladA, see below). Involve- ment of pLW1071 in alkane degradation was confirmed by the fact that a derivative of NG80-2 with the plasmid cured failed to degrade hexadecane (data not shown).

Comparative Genomics. Most predicted proteins (75.6%) have clos- est homologs in Geobacillus. Of the remainder, 11.4% are in Bacillus, 3.0% in other Gram-positive bacteria, and only 2.7% in other organisms. There are 2,578 (74.9%) NG80-2 genes having orthologs with an average protein identity of 83.2% in G. kausto- phillus HTA426, and genome-wide synteny for the orthologs be-

tween the two strains could be detected (SI Fig. 4). Also, between MICROBIOLOGY 53.1% and 58.7% of the NG80-2 genes have orthologs in each of 13 Bacillus strains sequenced (www.ncbi.nih.gov/genomes/lproks.cgi), and the average identity levels are in the range of 53.5–58.1%. No orthologs were found between the plasmids of NG80-2 and HTA426, indicating different origins. There are 1,179 orthologs shared among NG80-2, HTA426 and all 13 sequenced Bacillus genomes, of which, 1,099 (93.2%) are Fig. 1. Circular maps of the G. thermodenitrificans NG80-2 chromosome and located in the 0–1.3 Mb and 2.2–3.55 Mb regions of the NG80-2 plasmid pWL1071. (a) The chromosome map from the outside inward: the first genome (SI Table 3). The NG80-2 genes involved in nucleotide and second circles show predicted protein-coding regions on the plus and metabolism and translation were found most conserved, with minus strands, respectively (colors were assigned according to the color code 79.4% and 76.7%, respectively, of the total related genes having of the COG functional classes; see key); the third circle shows transposase genes and insertion sequence elements in red; the fourth and fifth circles show orthologs in others, in contrast to 26.8% of the genes involved in tRNAs and rRNAs in dark slate blue and dark golden red, respectively; the sixth carbohydrate metabolism shared (SI Fig. 5). and seventh circles show percentage G ϩ C in relation to the mean G ϩ C and G. thermodenitrificans NG80-2 and G. kaustophillus HTA426 GC skew, respectively. (b) The plasmid map shows ORFs color-coded according share 385 orthologs which are absent in the 13 Bacillus genomes, of to their assigned functions (see key). which, 200 could be assigned putative functions (SI Table 4). 14 genes involved in fatty acid synthesis and degradation are shared by N (GT1734-GT1731), in line with the distinctive denitrification NG80-2 and HTA426, and coincidentally, all Geobacillus species 2 property of G. thermodenitrificans. have a distinctive fatty acid profile with high proportion of iso- branched saturated acids (1). Other shared genes include a gene General Metabolism and Adaptation to Oil Reservoirs. Genes re- cluster (GT1516-GT1518) with genes encoding a heme O oxygen- quired for the synthesis of purine and pyrimidine nucleotides, fatty ase and subunits of b(o/a)3 cytochrome c oxidase involved in acids, and all 20 aa except for serB of the serine pathway were aerobic respiration, and a gene cluster (GT1698-GT1681) for the identified. serB was not found in any of the sequenced Bacillus synthesis of cobalamin. The ability to synthesize cobalamin was also genomes, suggesting the presence of nonorthologous genes. Com- reported for G. stearothermophilus, but not for Bacillus spp. with the plete sets of genes for the synthesis of all vitamins and cofactors are exception of B. megaterium in which cobalamin synthesis genes are present except for biotin. However, a putative biotin transporter located on a plasmid (12). gene (bioY, GT2896) was found, and the requirement of exogenous Four hundred ninety-eight of the NG80-2 genes are absent in biotin for growth of NG80-2 was confirmed. HTA426 and the 13 Bacillus genomes, of which, 253 are orphan All central metabolic pathways for carbohydrates except for the genes and 202 could be assigned putative functions (SI Table 5). Entner-Doudoroff pathway are present. NG80-2 utilizes a large They include those encoding proteins for the reduction of N2Oto variety of carbohydrates such as glycerol, cellobiose, trehalose and

Feng et al. PNAS ͉ March 27, 2007 ͉ vol. 104 ͉ no. 13 ͉ 5603 Downloaded by guest on September 23, 2021 starch as indicated by the fermentation tests using API 50 CHB/E genase activity, and no homologs of the FpoF subunit [containing test strips (BioMe´rieux,Marcy l’Etoile, France), and corresponding the motif to bind F420H2 in M. mazei (18)] were found despite the genes including genes encoding different glycolytic activities were presence of genes encoding F420H2 related proteins. Therefore, the found (SI Table 6). Genome analysis further revealed the presence nature of the electron donors for the Complex I-like enzymes is not of a gene cluster (GT1801-GT1756) for utilization of plant hemi- clear. The NG80-2 genome also encodes two type II NADH cellulose xylans, and the ability of NG80-2 to use xylans as a sole dehydrogenases (GT2904, GT2909), a succinate dehydrogenase carbon source was confirmed. A gene cluster for the synthesis of complex (GT2601-GT2599), a cytochrome b6c1 reductase complex carbon storage compound glycogen (GT2782-GT2778) was also (GT2126-GT2124), and five terminal oxidases of caa3-type found. In line with its capacity to use a broad range of carbohy- (GT0947-GT0950), b(o/a)3-type (GT1395-GT1394, GT1517- drates, NG80-2 has large number of genes involved in uptake of GT1518), aa3-type (GT3391-GT3388), and bd-type (GT0518- sugars (SI Table 7). At least four complete PTS systems with GT0519). The bd-type quinol oxidase has high affinity to O2 for predicted specificity to glucose (GT0878), fructose (GT1726), operating under low O2 concentrations (19). mannitol (GT1857) and cellulose (GT1748-GT1750), as well as a NG80-2 has genes needed for the reduction of nitrate to N2. glycerol facilitator (GT1215) are present. In addition, 16 ABC-type Typical narGHJI operon encoding membrane-bound nitrate reduc- sugar transporter systems were found, 5 of which are clustered with tase is present in two copies (GT0653-GT0656, GT1712-GT1715), the genes involved in utilization of starch or xylans. and the identity level between orthologous genes ranges from 38% In oil reservoirs, organic acids, with acetate being the most to 68%, indicating different origins. The GT0653-GT0656 set is abundant, are commonly detected and thought to be important for clustered with nirK (GT0650) encoding a copper-containing nitrite bacterial survival (13). Acetate may be assimilated through the reductase, norZ (GT0643) encoding a quinol oxidizing nitric oxide reversible Pta-AckA (GT3361, GT2688) pathway and Glyoxylate reductase, narK (GT0649) for nitrate or nitrite uptake, fnr bypass. The presence of genes encoding butyrate kinase (GT2309) (GT0657, GT0668) encoding the global anaerobic regulator, dnrN and formate dehydrogenases (GT0464, GT1579) indicates that (GT0660) encoding the nitric oxide-dependent regulator, and butyrate and formate may also be used. A variety of aromatic genes involved in synthesis of molybdenum (GT0645, compounds are present in reservoirs, and NG80-2 encodes four GT0658-GT0659, GT0662-GT0665), which is required for the distinct aerobic ring cleavage pathways for degradation of benzoate activity of nitrate reductase. Fnr is the anaerobic activator of (CoA activated, GT1899-GT1888), phenylacetate (CoA activated, narGHJI in E. coli and B. subtilis, and both Fnr and other Fnr-like GT1930-GT1919), 4-hydroxyphenylacetate (GT2993-GT2973), proteins such as Dnr are also involved in regulation of different and 3-hydroxyanthranilate (GT3163-GT3150). Fatty acids are the denitrifying reductase genes (20). major products of alkane degradation (see below) and NG80-2 A gene cluster (GT1735-GT1729) containing homologs of contains multiple candidate genes for enzymes of ␤-oxidation. nosZDYF for the reduction of N2OtoN2 as the last step of Genes encoding the two subunits of methylmalonyl-CoA mutase denitrification was found. GT1734 shares 41% identity with NosZ (B12-dependent) required for degradation of odd-carbon number (nitrous oxide reductase) of Wolinella succinogenes (21). The pu- fatty acids are also present (GT2300-GT2299, GT3336). rified product of GT1734 showed nitrous oxide reductase activity, Ammonia, which is the primary nitrogen source in oil reservoirs and the gene was confirmed to be a nosZ (unpublished data). The (14), may be taken up by a specific Amt type transporter (GT1313), presence of a nos gene cluster in Gram-positive bacteria has not and assimilated via glutamate dehydrogenase (GT2171, GT3148), been previously described. or the glutamine synthetase (GT1182, GT1483)-glutamate synthase Fermentation is an important mechanism for anaerobic growth (GT1292-GT1293) pathway. The latter route has a high affinity for of bacteria in the absence of alternative electron acceptors (22), and ammonia and is more efficient in N-limited conditions (15). Typical several genes of typical pyruvate fermentation pathways were found nasAB genes encoding assimilatory nitrate and nitrite reductases such as pta (GT3361), ackA (GT2688), ldh (GT0487), and buk were not found, indicating presence of nonorthologous genes in (GT2309). The por genes (GT1718-GT1717), which encode pyru- NG80-2, which can use nitrate as a nitrogen source. G. thermod- vate-ferredoxin and are commonly present in an- enitrificans does not produce urease (5), and no genes encoding aerobic bacteria (23), were also found. urease activity were found in NG80-2. Instead, we found genes encoding allophanate (GT1351-GT1352) of the urea Nutrient Uptake and Detoxification. Petroleum reservoirs represent amidolysis reaction (16), and utilization of urea by NG80-2 was an oligotrophic environment in which such elements as N, P, S, and confirmed. NG80-2 contains genes for degradation of most amino Fe are limited (14). In another hand, growth and survival of acids, and at least 60 genes encoding various proteases and pepti- microorganisms may be affected by the toxicity of crude oil dases were also found (SI Table 6). NG80-2 has a large number of hydrocarbons, various antimicrobial metabolites and heavy metals. transporter genes for uptake of exogenous amino acids and peptides Efficient transporter systems for uptake of nutrient elements and including 13 sets of ABC-type systems with predicted specificity for detoxification play a crucial role for the survival of bacteria under oligopeptides (4 sets), branched-chain amino acids (3 sets), polar reservoir conditions. We identified 368 (10.5%) genes encoding amino acids (2 sets), spermidine/putrescine (2 sets) and methionine transporter-related proteins by searching the Transport Protein (2 sets) (SI Table 7), indicating that exogenous proteins, peptides Database (www.tcdb.org) (SI Table 7). The largest number (193) of and amino acids can serve as nitrogen sources. these proteins are primary active transporters (class 3) including 186 members of the ATP-binding cassette transporter superfamily, and Respiration and Fermentation. In oil reservoirs, oxygen is only the second largest number (128) are electrochemical potential- transiently available during water flushing used for oil production driven transporters (class 2). (17). Therefore, a flexible respiration system for quick response to Phosphorus is considered to be the major rate limiting element changed O2 concentration is very important for the survival of for biological activity in oil reservoirs (14). NG80-2 contains a bacteria in reservoirs. G. thermodenitrificans is a facultative aerobe, typical ABC-type Pst system (GT2397-GT2394), which is under capable of oxygen and nitrate respiration (5). For aerobic respira- control of the phosphate uptake regulator PhoU (GT2393). Phos- tion, NG80-2 contains a gene cluster encoding Complex I-like phate may also be taken up by a Naϩ/phosphate symporter enzymes, comprising 11 (GT3302-GT3292) of the 14 subunits (GT2434) and two members of the PiT (inorganic phosphate encoded by the E. coli nuo operon, adjacent to the ATP synthase transporter) family (GT0388, GT2897). GT0388 is also likely to act complex genes (GT3311-GT3303). As in some thermophilic bac- as a sulfate transporter, as it is clustered with other sulfate assim- teria such as G. stearothermophilus (18), NG80-2 lacks three sub- ilation genes (GT0389-GT0386). Typical sulfate ABC-type trans- units of the nuo operon (nuoEFG) that code for NADH dehydro- porter and sulfate permease (SulP) were not found. Iron uptake

5604 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0609650104 Feng et al. Downloaded by guest on September 23, 2021 seems to be very important for NG80-2, as 4 sets of iron chelate ABC-type systems (GT2199-GT2197, GT1317-GT1320, GT1272- GT1275, GT0175-GT0172) and a set of FeoAB proteins (GT1740- GT1739) were found. NG80-2 also has specific uptake transporters for Zn2ϩ (ZnuABC, GT2407-GT2409), Co2ϩ (CbiMNOQ, GT1695-GT1698), Mg2ϩ/Co2ϩ (CorA, GT0899, GT1415, GT1433), Zn2ϩ/Fe2ϩ (ZIP, GT0845), Mn2ϩ/Fe2ϩ (Nramp, GT1354), and Mg2ϩ (MgtC, GT2865). Organic acids including benzoate, propi- onate and butyrate are commonly detected in reservoirs (13), and putative genes for their uptake were found (GT1615, GT1850, GT3161). We also found a putative transporter gene (GT2686) for fatty acids, the common degradation products of n-alkanes. NG80-2 utilizes both primary and secondary transporters for export of various drugs and heavy metals. At least 7 ABC-type drugs efflux systems are recognized, and another 27 electrochem- ical gradient driven systems including 14 members of the Drug:Hϩ antiporter, 4 of the cation diffusion facilitator (CDF), 3 of the resistance-nodulation-cell division (RND), 2 of the small multidrug resistance (SMR), and 2 of the multi antimicrobial extrusion (MATE) families; as well as two arsenite efflux pumps (GT1547, GT1599) were found. In addition to be expelled, arsenite may also be detoxified by arsenate reductase (GT1600, GT2957). The P- ATPase exporting systems for Cu2ϩ (GT1534-GT1535), and Zn2ϩ/ Cd2ϩ/Pb2ϩ (GT0637) were also found.

Motility, Chemotaxis, and Signal Transduction. Similar to those found in B. subtilis, NG80-2 possesses 50 genes involved in assembly and function of flagella (GT1069-GT1100, GT2142, GT2466-GT2467, GT3052-GT3069, GT3273-GT3274) including a cluster of 31 genes (GT1069-GT1100) corresponding to the fla/che operon of B. sub- Fig. 2. Proteomic characterization of pathways involved in hexadecane tilis. However, motPS encoding the second set of Mot proteins in B. metabolism in G. thermodenitrificans NG80-2. Differentially expressed pro- subtilis was not found in NG80-2. Therefore, the flagella system in teins in hexadecane-grown cells, in comparison with sucrose-grown cells, were investigated by 2-D electrophoresis (2-DE)/MALDI-TOF MS analysis. The pre- NG80-2 seems to be powered solely by MotAB complex. In dicted metabolic pathways for sucrose and hexadecane are shown. The en- addition, five intact and one interrupted methyl-accepting chemo- zymes of the pathways and corresponding genes in NG80-2 (as gene ID taxis proteins (MCPs) genes (GT345, GT661, GT884, GT1165- numbers) are noted. Proteins detected by 2-DE/MALDI-TOF MS are high- GT1166, GT2002, GT3317) were found. Inactivation of the MCP lighted. Blue, induced; red up-regulated; yellow, down-regulated; green, MICROBIOLOGY may be resulted from changed environment in oil reservoirs. unchanged. Proteins with pI values outside the range of pH 4 to 7 were not Most of the Gram-positive bacteria genomes contain relative investigated. Analogs are in brackets. P, phosphate; DH, dehydrogenase. smaller number of genes involved in signal transduction (24). NG80-2 has genes encoding 27 histidine kinases and 29 response regulators including 18 pairs of histidine kinases-response regula- sequenced Bacillus genomes. Polyamines such as norspermine and tors, 5 Ser/Thr protein kinases (GT0059, GT0497, GT1530, norspermidine are commonly found in hyperthermophilic bacteria GT3032, GT3493), a Tyr protein kinase (GT3268) and 10 proteins and are necessary for the hyperthermophilic phenotype of those with EAL (GT0376, GT0905, GT1555), HD-GYP (GT0007, bacteria (27). Spermine is the major type of polyamine in Geoba- GT1736), GGDEF (GT1560, GT2703, GT3401), EAL:GGDEF cillus species, and has been implicated in thermophily in G. kaus- (GT0605) and GAF (GT2702) domains. tophilus based on the presence of the ‘‘unique’’ genes encoding spermine/spermidine synthase and polyamine ABC transporters Thermotolerance. As a thermophilic bacterium, NG80-2 can easily (4). Those unique genes were also found in NG80-2 (GT1637, adapt to geothermal oil reservoirs when other survival require- GT0623-GT0626). Takami et al. (4) reported possible association of ments are met. Mesophilic bacteria response to heat induced asymmetric amino acid substitutions (Arg, Ala, Gly, Val, and Pro stresses by induction of heat shock proteins, which remove or refold are more frequently used in HTA426 than in mesophilic Bacillus damaged proteins (25). NG80-2 contains a wide range of genes strains) with thermoadaptation of G. kaustophilus HTA426. Similar encoding molecular chaperones including dnaK operon comprised amino acid substitution pattern was also observed in NG80-2 when of genes encoding DnaJ-DnaK-GrpE and the HrcA repressor compared with 7 sequenced Bacillus strains including 4 used to (GT2444-GT2441), genes encoding GroEL-GroES (GT0223- compare with HTA426 (SI Table 8), supporting the previous GT0224), a disulfide bond chaperone of HSP33 family (GT0065), suggestion. and small heat shock proteins of IbpA family (GT0237, GT2085, GT2097). Genes encoding ATP-dependent heat shock-responsive Long-Chain Alkane Degradation System. Alkanes are the major proteases such as HslVU (GT1067-GT1068), Clp (GT0079, components of crude oil, and potentially the most abundant and GT0678, GT0856), and Lon (GT2582, GT2583) are also present. available carbon and energy sources in reservoirs. NG80-2 can use Makarova et al. (26) listed 58 COG (clusters of orthologous groups) long-chain alkanes as its sole carbon and energy source under families, whose members are frequently detected in archaea and aerobic conditions (6). Here, we describe the pathway and func- thermophilic bacteria and predicted to be associated with the tional genes (Fig. 2). (hyper)thermophilic phenotype. NG80-2 has three genes encoding Alkane molecules are chemically inert and must be activated to proteins of those COG families: GT0279 (COG1583), GT0445 allow further metabolic steps. An aerobic degradation pathway for (COG3044), and GT3202 (COG2152). Homologs of GT0445 and short to medium-chain alkanes (C5-C16) has been well studied and GT3202 are also present in HTA426, and GT0279 is ‘‘unique’’ to a membrane-bound monooxygenase AlkB is responsible for the NG80-2. None of the three genes were found in any of the crucial initial activation of alkanes to the corresponding primary

Feng et al. PNAS ͉ March 27, 2007 ͉ vol. 104 ͉ no. 13 ͉ 5605 Downloaded by guest on September 23, 2021 whereas enzymes of the glycolysis pathway were down-regulated (Fig. 2; see also SI Fig. 8). Alcohol dehydrogenase is the second enzyme of the terminal oxidation pathway and three putative alcohol dehydrogenases (GT1287, GT1754, GT2878) were de- tected at similar levels under both conditions, suggesting their involvement in multiple metabolic functions (Fig. 2; see also SI Fig. 8). Expression of the TCA cycle enzymes was also unchanged as expected for this central metabolic pathway. However, glyoxylate bypass which allows growth on acetate as the sole carbon source was found up-regulated. The proteomic analysis confirmed the pres- ence of a terminal oxidation pathway for degradation of long-chain alkanes, which is initiated by LadA, and fatty acids produced are further degraded to acetyl-CoA via ␤-oxidation before entering the TCA cycle or glyoxylate bypass when alkanes are present as the sole carbon source for energy production (Fig. 2). Thermophilic enzymes offer major biotechnological advantages over mesophilic enzymes. have been used for industrial synthesis of many complex molecules (31), and LadA has great potential to be used in this area, particularly using petroleum compounds as substrates, and also in treatment of environmental oil pollution. Industrial applications of currently known alkane oxygenases are restricted because of their complex biochemistry and process requirements, e.g., AlkB is a membrane protein and requires rubredoxin and a rubredoxin reductase for activity (31). LadA acts on long-chain alkanes, is single-component with no Fig. 3. Characterization of long-chain alkane monooxygenase LadA. (a) coenzyme requirement, soluble (extracellular), and easily expressed Induced expression of LadA as an extracellular protein in hexadecane-grown NG80-2 cells. Sucrose-grown cells were used as the reference state. Sections of and purified in E. coli. 2-D PAGE gels stained with colloidal CBB G-250 are shown. The spot in circle was identified as LadA by MALDI-TOF MS analysis. (b) Expression and purifi- Discussion cation of LadA in E. coli as shown by SDS/PAGE. Lane 1, crude extract of E. coli G. thermodenitrificans NG80-2 has versatile metabolic pathways, BL21 (pET-ladA) before IPTG induction; lane 2, crude extract induced with flexible respiration systems and robust transport systems for its ϩ IPTG; lane 3, fractions eluted from Ni2 column; lane 4, molecular size markers. survival under various environmental conditions. The ability to (c) Hydroxylation of hexadecane by purified LadA. GC chromatographs show degrade long-chain alkanes and carry out denitrification pro- conversion of hexadecane to 1-hexadecanol. IS, internal standard (squalane). vides major survival advantages to NG80-2 in the current oil (d) Specific activity of purified LadA on alkanes with different chain lengths. reservoir, in which water had been used as the driving force for oil exploitation for 11 years before NG80-2 was isolated (un- alcohols, which are further oxidized by alcohol and aldehyde published data). When oxygen is available through water flush- dehydrogenases to fatty acids before entering ␤-oxidation (28). ing, alkanes can be used. Upon exhaustion of oxygen, NG80-2 NG80-2 does not have any AlkB homologs. Real-time RT-PCR can use nitrate, which may be carried in by the flushing waters, showed a 120-fold increase in transcription of a plasmid-borne as an alternative electron acceptor and various organic acids as putative monooxygenase gene (GT3499) when crude oil was used electron donors for survival. as a sole carbon source instead of sucrose (SI Fig. 6). GT3499 shares Presence of genes involved in utilization of xylans and degrada- 38% identity with DBT-5,5Ј-dioxide monooxygenase of Paeniba- tion of oligopeptides confirms that NG80-2 originated from a soil cillus sp. A11-2 (29) but has no detectable similarity with any known environment. We postulate that NG80-2 gained the capacity for alkane monooxygenases. alkane oxidization using plasmid pLW1071 in an oil contaminated The purified product of GT3499 converted alkanes ranging from soil before invading the oil reservoir. Gene ladA is flanked at both C15 to C36 to corresponding primary alcohols as determined by GC ends (14,207 bp upstream and 267 bp downstream) by two insertion and GC-MS (Fig. 3 b–d). In vivo experiments showed that a plasmid sequences of the IS21 family. IS21 elements are mainly found in carrying GT3499 (pCOM8-ladA) partially restored an alkB knock- Bacillus, Pseudomonas and Yersinia strains (32), indicating that ladA out mutant of P. fluorescens CHA0 (KOB2‚1) the ability to grow originated outside of G. thermodenitrificans and moved into its on alkanes by allowing growth on C15-C17, but not C6-C14. Comple- current position on the plasmid mediated by the two flanking IS21 mentation on longer chain alkanes could not be assessed as the elements. The upstream IS21 is intact, indicating that the move- mutant retains the ability to grow on C18-C28 because of presence ment of the ladA gene and its flanking DNA was a recent event. The of a second unidentified alkane hydroxylase system (30). Thus downstream IS21 is also intact except for the insertion of an intact GT3499 was identified as an alkane monooxygenase gene, and IS982 in the istB gene. The IS982 element is also present in the designated as ladA (long-chain alkane degradation). The results NG80-2 chromosome in eight places, and these nine elements share indicate that NG80-2 also utilizes a terminal oxidation pathway for high level DNA identity ranging from 90% to 100%. This fact the degradation of long-chain alkanes. LadA and AlkB, which indicates that IS982 inserted into the plasmid in NG80-2. catalyze the initial attack on alkanes, determine the size range of NG80-2 can facilitate oil recovery when added to oil reser- alkanes to be degraded. voirs, and this property may be partially attributed to its ability By using sucrose-grown cells as the reference state, proteins to selectively degrade long-chain alkanes, which leads to reduced differentially expressed in hexadecane-grown cells were investi- viscosity of crude oils (data not shown). We have demonstrated gated by 2-D electrophoresis and MALDI-TOF MS analysis. a combined genomic and proteomic approach to predict in silico Expression of LadA was induced and mainly found in extracellular and confirm in vitro a long-chain alkane degrading pathway. The fraction (Figs. 2 and 3a; see also SI Fig. 7), although this protein has NG80-2 genome provides an excellent platform for further no leader peptides. A putative aldehyde dehydrogenase (GT3117) improvement of this organism for oil bioremediation and other and enzymes of the ␤-oxidation pathway were also up-regulated, biotechnological applications.

5606 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0609650104 Feng et al. Downloaded by guest on September 23, 2021 Methods tRNAscan-SE (38). Ribosomal were identified by searching Genome Sequencing and Assembly. Genomic DNA, isolated from G. the genomic sequences against a set of known rRNAs with thermodenitrificans NG80-2, was sequenced by using a conventional BLASTN and verified by multiple alignments with other known whole genome random shotgun strategy (33). Plasmid libraries of rRNA sequences. The software Artemis (39) was used for whole- small inserts (2–3 kb) and large inserts (8–10 kb) generated by genome visualization. Genome annotation was performed by com- mechanical shearing of genomic DNA were constructed in pUC118 paring protein sequences with those in the NCBI protein database (Takara, Dalian, China). Plasmid DNA, for use as template for (www.ncbi.nlm.nih.gov) and the clusters of orthologous groups sequencing, was extracted by the alkaline lysis method from cells (COG) database (www.ncbi.nlm.nih.gov/COG). All annotations grown in 96-well plates, and purified by using Whatman Unifilter were inspected manually through searches against PFAM (40), plates (Whatman Inc., Clifton, NJ). Double-ended plasmid se- SMART (41), and PROSITE (42) databases. TMHMM 2.0 (43) quencing reactions were carried out by using ABI BigDye Termi- was used to identify transmembrane domains, and SignalP 3.0 (44) nator V3.1 Cycle Sequencing Kit, and the sequencing ladders were was used to predict signal peptide regions. Orthologous gene sets resolved on an ABI 3730 Automated DNA Analyzer (Applied were identified by BLASTP Reciprocal Best Hits (RBH) (45). Biosystems, Foster City, CA). Approximately 52,000 reads with an Putative metabolic pathways were analyzed by Metacyc (46) and average length of 629 bp were generated, providing a 9.1-fold the KEGG database (47). genome coverage. These sequences were assembled into contigs by using the PHRED࿝PHRAP࿝CONSED software package (34). Other Methods. Proteomic analysis, and methods used for in vivo Sequence gaps were closed by primer walking on gap-spanning and in vitro characterization of ladA are presented as SI Materials library clones identified based on linking information from forward and Methods. and reverse reads. The remaining 229 physical gaps were closed by using an improved Multiplex PCR method (35). The sequence was We thank Dr. Jan B. van Beilen (Institute of Biotechnology, Zurich, edited manually, and additional PCR and sequencing reactions Switzerland) for kindly providing Pseudomonas fluorescens KOB⌬1 and were done to improve coverage and resolve sequence ambiguities. plasmid pCom8 and for advice on in vivo functional analysis of long-chain All repeated DNA regions were verified by PCR amplification alkane monooxygenase genes. The following persons are acknowledged across the repeat and sequencing of the product. The final genome for their contributions to genome sequencing: Yuan Sui, Yue Li, Chun is based on 55,727 reads. Zhang, Gang Yuan, Likun Lu, Tao Tang, Hao Yu, Jianxiong Peng, Zhifeng Yang, and Chengtai Liao. This work was supported by Tianjin Municipal Special Fund for Science and Technology Innovation Genome Analysis. ORFs were identified by using CRITICA (36) and Grant 05FZZDSH00800, National 863 Program of China Grants GLIMMER (37), followed by BLASTX searches of the remaining 2002AA226011 and 2006AA020703, and funds from the Tianjin Mu- intergenic regions. RBSFINDER (www.tigr.org/software) was used nicipal Science and Technology Committee (Grants 033105811 and to locate the start codons. Transfer RNAs were predicted by using 06YFJZJC02200).

1. Nazina TN, Tourova TP, Poltaraus AB, Novikova EV, Grigoryan AA, Ivanova 25. Yura T, Nakahigashi K (1999) Curr Opin Microbiol 2:153–158. AE, Lysenko AM, Petrunyaka VV, Osipov GA, Belyaev SS, et al. (2001) Int J Syst 26. Makarova KS, Koonin EV (2003) Genome Biol 4:115. Evol Microbiol 51:433–446. 27. Daniel RM, Cowan DA (2000) Cell Mol Life Sci 57:250–264. 2. McMullan G, Christie JM, Rahman TJ, Banat IM, Ternan NG, Marchant R (2004) 28. van Beilen JB, Li Z, Duetz WA, Smits THM, Witholt B (2003) Oil Gas Sci Technol MICROBIOLOGY Biochem Soc Trans 32:214–217. 58:427–440. 3. Rahman TJ, Marchant R, Banat IM (2004) Biochem Soc Trans 32:209–213. 29. Ishii Y, Konishi J, Okada H, Hirasawa K, Onaka T, Suzuki M (2000) Biochem 4. Takami H, Takaki Y, Chee GJ, Nishi S, Shimamura S, Suzuki H, Matsui S, Biophys Res Commun 270:81–88. Uchiyama I (2004) Nucleic Acids Res 32:6292–6303. 30. Smits TH, Balada SB, Witholt B, van Beilen JB (2002) J Bacteriol 184:1733–1742. 5. Manachini PL, Mora D, Nicastro G, Parini C, Stackebrandt E, Pukall R, Fortina 31. van Beilen JB, Funhoff EG (2005) Curr Opin Biotechnol 16:308–314. MG (2000) Int J Syst Evol Microbiol 50:1331–1337. 32. Mahillon J, Chandler M (1998) Microbiol Mol Biol Rev 62:725–774. 6. Wang L, Tang Y, Wang S, Liu RL, Liu MZ, Zhang Y, Liang FL, Feng L (2006) 33. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage Extremophiles 10:347–356. AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al. (1995) Science 7. van Beilen JB, Panke S, Lucchini S, Franchini AG, Rothlisberger M, Witholt B 269:496–512. (2001) Microbiology 147:1621–1630. 34. Gordon D, Abajian C, Green P (1998) Genome Res 8:195–202. 8. Tani A, Ishige T, Sakai Y, Kato N (2001) J Bacteriol 183:1819–1823. 35. Tettelin H, Radune D, Kasif S, Khouri H, Salzberg SL (1999) Genomics 62:500– 9. van Beilen JB, Smits TH, Whyte LG, Schorcht S, Rothlisberger M, Plaggemeier 507. T, Engesser KH, Witholt B (2002) Environ Microbiol 4:676–682. 36. Badger JH, Olsen GJ (1999) Mol Biol Evol 16:512–524. 10. Whyte LG, Smits TH, Labbe D, Witholt B, Greer CW, van Beilen JB (2002) Appl 37. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999) Nucleic Acids Res Environ Microbiol 68:5933–5942. 27:4636–4641. 11. Yadav JS, Loper JC (1999) Gene 226:139–146. 38. Lowe TM, Eddy SR (1997) Nucleic Acids Res 25:955–964. 12. Rodionov DA, Vitreschak AG, Mironov AA, Gelfand MS (2003) J Biol Chem 39. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell 278:41148–41159. B (2000) Bioinformatics 16:944–945. 13. Barth T (1991) Appl Geochem 6:1–15. 40. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, 14. Head IM, Jones DM, Larter SR (2003) Nature 426:344–352. Moxon S, Marshall M, Khanna A, Durbin R, et al. (2006) Nucleic Acids Res 15. Merrick MJ, Edwards RA (1995) Microbiol Rev 59:604–622. 34:D247–251. 16. Kanamori T, Kanou N, Atomi H, Imanaka T (2004) J Bacteriol 186:2532–2539. 41. Letunic I, Copley RR, Pils B, Pinkert S, Schultz J, Bork P (2006) Nucleic Acids Res 17. Whelan JK, Kennicutt MC, Brooks JM, Schumacher D, Eglington LB (1994) Org 34:D257–260. Geochem 22:587–615. 42. Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, 18. Pereira MM, Bandeiras TM, Fernandes AS, Lemos RS, Melo AM, Teixeira M Pagni M, Sigrist CJ (2006) Nucleic Acids Res 34:D227–230. (2004) J Bionenerg Biomembr 36:93–105. 43. Krogh A, Larsson B, von Heijne G, Sonnhammer EL (2001) J Mol Biol 305:567– 19. D’Mello R, Hill S, Poole RK (1996) Microbiology 142:755–763. 580. 20. Philippot L (2002) Biochim Biophys Acta 1577:355–376. 44. Bendtsen JD, Nielsen H, von Heijne G, Brunak S (2004) J Mol Biol 340:783–795. 21. Simon J, Einsle O, Kroneck PM, Zumft WG (2004) FEBS Lett 569:7–12. 45. Hirsh AE, Fraser HB (2001) Nature 411:1046–1049. 22. Eschbach M, Schreiber K, Trunk K, Buer J, Jahn D, Schobert M (2004) J Bacteriol 46. Caspi R, Foerster H, Fulcher CA, Hopkinson R, Ingraham J, Kaipa P, Krum- 186:4596–4604. menacker M, Paley S, Pick J, Rhee SY, et al. (2006) Nucleic Acids Res 34:D511–516. 23. Kletzin A, Adams MW (1996) J Bacteriol 178:248–257. 47. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M (2004) Nucleic Acids Res 24. Galperin MY (2004) Environ Microbiol 6:552–567. 32:D277–280.

Feng et al. PNAS ͉ March 27, 2007 ͉ vol. 104 ͉ no. 13 ͉ 5607 Downloaded by guest on September 23, 2021