
Genomewide Transcriptional Changes Associated with Genetic Alterations and Nutritional Supplementation Affecting Tryptophan Metabolism in Bacillus Subtilis


Randy M. Berka*, Xianju Cui*, and Charles Yanofsky†‡

*Novozymes Biotech, Inc., Davis, CA 95616; and †Department of Biological Sciences, Stanford University, Stanford, CA 94305

Contributed by Charles Yanofsky, March 19, 2003

DNA microarrays comprising ≈95% of the Bacillus subtilis annotated coding ORFs were deployed to generate a series of snapshots of genomewide transcriptional changes that occur when cells are grown under various conditions that are expected to increase or decrease transcription of the trp operon segment of the aromatic supraoperon. Comparisons of global expression patterns were made between cells grown in the presence of indole acrylic acid, a specific inhibitor of tRNA^Trp charging; cells deficient in expression of the mtrB gene, which encodes the tryptophan-activated negative regulatory protein, TRAP; WT cells grown in the presence or absence of two or three of the aromatic amino acids; and cells harboring a tryptophanyl tRNA synthetase mutation conferring temperature-sensitive tryptophan-dependent growth. Our findings validate expected responses of the tryptophan biosynthetic genes and presumed regulatory interrelationships between genes in the different aromatic pathways and the biosynthetic pathway. Using a combination of supervised and unsupervised statistical methods we identified ≈100 genes whose expression profiles were closely correlated with those of the genes in the trp operon. This finding suggests that expression of these genes is influenced directly or indirectly by regulatory events that affect or are a consequence of altered tryptophan metabolism.

Homologous protein domains are used by Bacillus subtilis and Escherichia coli to catalyze the same reactions in the biosynthesis of the aromatic amino acids (1). Despite this similarity, very different regulatory proteins and mechanisms are used by these organisms to regulate aromatic amino acid biosynthesis. These differences must be partly caused by the different evolutionary histories and experiences of these microorganisms. Operon organization also is somewhat different in the two species, reflecting regulatory interrelationships, described as cross-pathway control, that exist between genes for different pathways in B. subtilis but are not evident in E. coli. Thus, the six-gene trp operon of B. subtilis resides within an aromatic supraoperon that contains six additional genes, three upstream and three downstream, concerned with the common aromatic pathway and with phenylalanine, tyrosine, and histidine biosynthesis (Fig. 1). The seventh trp gene, trpG (pabA), is in the folate operon. This gene specifies a protein that functions both in tryptophan and folate biosynthesis; presumably because of this, it is subject to regulation by both metabolites. In E. coli the five-gene trp operon encodes all seven protein domains needed for tryptophan biosynthesis; two of these genes encode fused protein domains that engender bifunctional polypeptides (2).

The B. subtilis aromatic supraoperon has three promoters as shown in Fig. 1 (1, 3). One promoter is located before the three genes upstream of the six-gene trp operon. A second promoter immediately precedes the trp operon segment. These two promoters provide trp operon transcripts. The third supraoperon promoter is within trpA, the last trp gene; it provides transcripts derived from the last three genes of the supraoperon. A major regulatory difference between E. coli and B. subtilis is that E. coli uses a DNA-binding protein to control transcription initiation at the trp promoter/operator. Transcription initiation is not known to be regulated at the trp operon of B. subtilis. Transcription of the structural genes in the trp operon of both organisms is regulated by transcription attenuation, but by different mechanisms. In B. subtilis a tryptophan-activated RNA-binding protein (TRAP), the product of the mtrB gene, regulates attenuation. The mtrB coding sequence resides in a two-gene operon with mtrA, which specifies GTP cyclohydrolase I, the enzyme catalyzing the first step in pterin formation in folic acid biosynthesis. In addition to these aromatic-folate cross-pathway features, at least one additional operon, rtpA-ycbK, appears to play an important regulatory role in trp operon expression in B. subtilis (4, 5). Transcription of rtpA-ycbK is regulated by the T box antitermination mechanism in response to a deficiency of charged tRNA^Trp (4, 6). Expression of the rtpA-ycbK operon leads to the synthesis of the anti-TRAP regulatory protein AT, the rtpA gene product (5). AT can inactivate TRAP; when it does, this allows trp operon transcription and trpG translation. AT production is also regulated translationally, in response to the accumulation of uncharged tRNA^Trp (G. Chen and C.Y., unpublished data). Many of the known B. subtilis genes that are presumed to play a role in aromatic amino acid metabolism and folic acid synthesis and regulation are presented in Fig. 1.

B. subtilis also lacks a structural homolog of TyrR, the major regulatory protein of E. coli that controls expression of the common aromatic pathway genes and the genes required for phenylalanine and tyrosine synthesis. However, some of these B. subtilis genes are regulated in response to tyrosine or phenylalanine accumulation. In addition, in B. subtilis, a single gene, aroA, specifies 3-deoxy-D-arabino-heptulosonate-7-phosphate (DAHP) synthase, the enzyme that catalyzes the first step in the common aromatic pathway. Synthesis of this enzyme is regulated only indirectly by aromatic amino acids (1, 3). E. coli produces three nearly identical DAHP synthases that catalyze this reaction, and each is subject to transcriptional regulation principally by a different aromatic amino acid.

The present study parallels a similar investigation of E. coli genes that respond transcriptionally to culture conditions and genetic alterations that influence tryptophan metabolism (7). Herein we describe the application of B. subtilis DNA microarrays as an initial step toward our goal of determining the global effects on gene expression of varying tryptophan and charged tRNA^Trp availability in B. subtilis.

Fig. 1. Known relationships among the genes of the aromatic supraoperon. Colored lines connect the genes that are directly responsible for synthesis of tryptophan (red), folate (blue), histidine (green), phenylalanine (gray), chorismate (violet), and tyrosine (brown). Promoters are denoted by P′. Orange rectangles represent leader regulatory regions controlled by tRNA-mediated antitermination, and gray boxes define regions at which TRAP (mtrB gene product) binds and regulates translation. The promoter of the trp operon itself (trpEDCFBA) is denoted as a violet box because TRAP regulates transcription at this site. Anti-TRAP (rtpA gene product) forms a complex with TRAP and inhibits its activity and is noted as AT.

Abbreviations: TRAP, tryptophan-activated RNA-binding protein; PC, principal component; PCA, PC analysis.
‡To whom correspondence should be addressed. E-mail: [email protected].
5682–5687 | PNAS | May 13, 2003 | vol. 100 | no. 10    www.pnas.org/cgi/doi/10.1073/pnas.1031606100

Materials and Methods

B. subtilis Strains. The following B. subtilis strains were used in these studies: CYBS400 (WT) (4), CYBS222 (mtrB−) [a frameshift mutation located near the end of the mtrB coding region yields a TRAP protein with reduced activity (3)], and BS1A353 (trpS1), a mutation in the tryptophanyl-tRNA synthetase structural gene resulting in temperature-sensitive tryptophan-dependent growth (8).

Table 1. B. subtilis cells and growth conditions used as sources of RNA for microarray studies

Experiment  B. subtilis strains and culture conditions                        Replicates*
1           WT in minimal medium versus WT in minimal + 50 µg/ml tryptophan   8
2           WT grown in minimal medium + 30 µg/ml indole acrylic acid versus  6
            WT grown in minimal medium
3           mtrB-deficient mutant grown in minimal medium + 50 µg/ml          8
            tryptophan versus WT grown in minimal medium + 50 µg/ml
            tryptophan
4           WT grown in minimal medium with 50 µg/ml each phenylalanine and   5
            tyrosine versus WT grown in minimal medium
5           WT grown in minimal medium versus WT grown in minimal medium      7
            with 50 µg/ml each tryptophan, phenylalanine, and tyrosine
6           WT grown in minimal medium with 50 µg/ml each phenylalanine and   7
            tyrosine versus WT grown in minimal medium with 50 µg/ml each
            tryptophan, phenylalanine, and tyrosine
7           trpS1^ts mutant grown at 38°C in minimal medium + 0.2% acid       8
            hydrolyzed casein + 50 µg/ml tryptophan versus WT grown in the
            same medium

To minimize intensity biases that are sometimes observed with the use of Cy3/Cy5 dyes we used a dye-swapping strategy for some of the experiments listed. Consequently, the fluorescence intensity ratios were calculated as the intensity derived from one cDNA probe (growth condition or strain) divided by the other. The inducing condition, indicated in boldface type, corresponds to the cells indicated by * in Table 2.
*The term “replicates” as used here refers to the number of times the B. subtilis genome was queried with fluorescently labeled cDNA probes derived from the culture conditions listed.

Culture Conditions and Isolation of RNA. Cultures (50 ml) were grown to midlog phase with shaking in minimal medium (9) plus trace elements, plus and minus various supplements, at 37°C. Where indicated, the following supplements were included: 30 µg/ml indole acrylic acid, 50 µg/ml phenylalanine, 50 µg/ml tryptophan, 50 µg/ml tyrosine, or 0.2% acid hydrolyzed casein. Harvested cultures were chilled rapidly on ice, 1 ml of 2 M sodium azide was added to each culture, and the cells were collected by centrifugation at 4°C. The pelleted cells were resuspended in minimal medium plus azide, recentrifuged, and frozen. For RNA isolation the pellets were resuspended in 2.5 ml of ice-cold sterile water. RNA was prepared by using the Bio101 FastRNA Blue Kit (Qbiogene, Carlsbad, CA), according to the manufacturer’s instructions, with minor modifications (see Supporting Text, which is published as supporting information on the PNAS web site, www.pnas.org).

DNA Microarrays. For the first six experiments listed in Table 1, we constructed DNA microarrays consisting of PCR-amplified B. subtilis 168 ORFs by using a complete set of ORF-specific PCR primers as described (10). The primers were designed to amplify each of the ≈4,100 protein-coding ORFs listed in the SubtiList database (http://genolist.pasteur.fr/SubtiList). For the seventh experiment we used microarrays that were prepared by spotting ORF-specific oligonucleotides (60 mers, selected by using protein-coding ORFs listed in the SubtiList database) purchased from Compugen (Jamesburg, NJ).

Synthesis of cDNA Probes and Hybridization Conditions. Fluorescent probes were prepared by reverse transcription of 25 µg of total RNA from B. subtilis to incorporate aminoallyl-dUTP into first-strand cDNA. The amino-cDNA products were subsequently labeled by direct coupling to either Cy3 or Cy5 monofunctional reactive dyes (Amersham Pharmacia). The details of this protocol can be found at http://cmgm.stanford.edu/pbrown/protocols. Cy3- and Cy5-labeled probes were combined, denatured, and applied to a microarray slide under a cover glass, placed in a humidified chamber, and incubated overnight (15–16 h) in a water bath at 63°C (11). Before scanning, the arrays were washed consecutively in 1× SSC with 0.03% SDS, 0.2× SSC, and 0.05× SSC and centrifuged for 2 min at 500 rpm to remove excess liquid. The hybridization and washing conditions were the same for both PCR-amplified ORFs and oligonucleotide-based microarrays. Lastly, the slides were imaged by using an Axon 4000B scanner (Axon Instruments, Union City, CA).

Treatment of Microarray Data. The intensity values for microarray spots were quantified (including background subtraction), and the resulting figures were normalized by using the Lowess function provided in GENESPRING software (Silicon Genetics, Redwood City, CA). Genes whose expression was significantly up- or down-regulated in each experiment were identified by using SAM (significance analysis for microarrays) software (12). Those genes for which the fluorescence intensity ratios were not significantly altered (up or down) in any of the experiments were excluded from further analysis. Principal component analysis (PCA), hierarchical clustering, and K-means cluster analysis of the data were done by using the algorithms included in GENESPRING. The PATHWAY TOOLS suite of software (13, 14) was used to superimpose the gene expression data onto a predicted metabolic network for B. subtilis that was generated based on the complete genome annotations extracted from GenBank (www.ncbi.nlm.nih.gov).
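The ratio convention described in the note to Table 1 (inducing condition divided by the other condition, regardless of which dye labeled which probe) can be illustrated with a minimal sketch. The field names (`cy3`, `cy5`, `inducing_dye`) and the intensity values are hypothetical, and averaging on the log scale (a geometric mean) is an assumption made here for illustration; the text states only that mean ratios were reported, not how they were averaged.

```python
from math import exp, log

def mean_fold_change(arrays):
    """Average inducing/reference ratio for one gene across replicate arrays.

    Each array is a dict holding normalized 'cy3' and 'cy5' intensities and
    'inducing_dye', the dye that labeled the inducing condition on that slide
    (hypothetical field names). Ratios from dye-swapped slides are oriented
    the same way before averaging.
    """
    log_ratios = []
    for array in arrays:
        if array["inducing_dye"] == "cy5":
            ratio = array["cy5"] / array["cy3"]
        else:  # dye-swapped slide: inducing condition was labeled with Cy3
            ratio = array["cy3"] / array["cy5"]
        log_ratios.append(log(ratio))
    # Assumption: average on the log scale (geometric mean of the ratios).
    return exp(sum(log_ratios) / len(log_ratios))

# Made-up intensities: one standard slide and one dye-swapped slide.
replicates = [
    {"cy3": 500.0, "cy5": 1000.0, "inducing_dye": "cy5"},
    {"cy3": 1000.0, "cy5": 500.0, "inducing_dye": "cy3"},
]
fold_change = mean_fold_change(replicates)
```

Orienting every ratio the same way before averaging is what lets a dye swap cancel a dye-specific intensity bias instead of cancelling the biological signal.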

Results and Discussion

Experimental Design, Data Collection, and Significance Testing. Microarray-based gene expression comparisons were performed with mutant and/or WT B. subtilis cultures grown in parallel to log phase under the conditions listed in Table 1. Although the cultures were harvested at a single time point, we analyzed mRNA profiles from seven different culture conditions and/or mutant strains to provide informative comparisons of global transcription profiles. Background-subtracted intensity data collected from these experiments were normalized by using the Lowess method (GENESPRING software). Genes for which transcript levels were significantly up- or down-regulated in each experiment were identified by using SAM software (12). SAM does not rely on the magnitude of the fold-change in intensity ratios, but rather assigns a score to each gene based on the change in gene expression relative to the standard deviation of replicate microarray measurements. The “q value” derived for each gene corresponds to the lowest false discovery rate (FDR) at which the gene is called significant. It is analogous to the well-known statistical P value, but modified for multiple testing circumstances. Several published reports have noted that the use of an arbitrary fold-change value as the criterion for determining up- or down-regulation produces inappropriately high FDR values (12, 15, 16). Although the number of genes that SAM deemed significant varied among the seven data sets, we selected fairly stringent statistical limits so that the median FDR was acceptably low (1.04–2.52%). In addition to genes known to be concerned with aromatic amino acid metabolism, many other genes were observed to be up-regulated or down-regulated under each of the seven experimental comparisons. There were 544 genes for which transcript levels were not significantly altered in any of the seven experiments; these were excluded from subsequent analyses.

Changes in Relative mRNA Levels for Genes of the trp, dhb, and his Operons. Table 2 lists the mean fluorescence intensity ratios (i.e., fold-change) for 45 genes whose transcript levels were previously reported to be altered by perturbations of aromatic amino acid metabolism. For those transcript levels that were significantly altered according to SAM, the q value is also given. Nearly all of the genes that SAM classified as significant yielded q values substantially lower than the false discovery rates (1.04–2.52%) that we selected to filter the data. In fact, the mean q value for the up- and down-regulated genes listed in Table 2 suggests that on average these genes would be recognized as statistically significant at a false discovery rate of 0.3%.

In general, and as we expected, the genes of the trp operon itself and the distal genes of the supraoperon (hisC-tyrA-aroE) responded somewhat similarly (increased expression) to conditions that reduced tryptophan or charged tRNA^Trp availability. Experiment 1 detected those genes that were up-regulated or down-regulated in cultures grown without tryptophan versus with excess tryptophan. In the second experiment, the addition of indole acrylic acid increased not only expression of the trp operon, which responds to tryptophan deficiency, but also expression of other genes such as rtpA-ycbK and trpS that are known to respond to charged tRNA^Trp deficiency (4, 5). In contrast, as expected, transcription of these three genes was not increased in the mtrB mutant (experiment 3), which overproduces tryptophan and presumably charged tRNA^Trp. However, expression of the trp operon, the distal genes of the supraoperon, and the genes involved in folate biosynthesis was increased in this comparison. Overexpression of the trp operon presumably also leads to a partial depletion of phenylalanine and tyrosine and their charged tRNAs, because transcription of pheS, pheT, and tyrS was activated by trp operon overexpression (experiment 3). Interestingly, increased transcription of the genes in the his operon and hisS was also observed in the mtrB mutant (experiment 3) as well as in experiments 4, 5, and 6. Apparently growth without the mtrB product (the TRAP regulatory protein) and/or tryptophan overproduction deprives the cell of the proteins or intermediates needed for histidine biosynthesis, promoting increased expression of the histidine biosynthetic genes and hisS. An apparent scheme of cross-pathway regulation involving the his operon and genes of aromatic amino acid biosynthesis has been recognized (17, 18). Mutants that were derepressed for both aromatic amino acid biosynthesis and histidine biosynthesis were isolated and mapped to a single locus (19, 20); however, the biochemical basis for this cross-pathway control is unknown.

Additionally, by overlaying our expression data onto the predicted biochemical pathways of B. subtilis (13, 14) we observed that genes encoding histidine utilization activities (hut operon) were down-regulated in the mtrB mutant and in comparison 4 (not shown). This finding suggests that expression of these genes may be induced by excess histidine. The results of experiments 4, 5, and 6 suggest that the presence of excess phenylalanine and tyrosine leads to partial tryptophan starvation, resulting in increased expression of genes required for tryptophan biosynthesis. It seems reasonable that histidine utilization activities would be down-regulated under conditions in which histidine biosynthesis is up-regulated. As anticipated, the results of experiments 4, 5, and 6 suggest that expression of the genes involved in the biosynthesis of phenylalanine and tyrosine is decreased by addition of the corresponding amino acids. However, the data from experiment 4 indicate that the presence of phenylalanine and tyrosine in minimal medium partially starves the cells of tryptophan, leading to elevated expression of the trp operon and hisC, tyrA, and aroE, the three downstream genes of the aromatic supraoperon, over the level observed in minimal medium alone. Furthermore, it appears that the genes of the his operon are up-regulated to a greater extent in the presence of added phenylalanine, tyrosine, and tryptophan compared with minimal medium (experiments 4 and 5). Likewise the data derived from experiment 6 suggest that the apparent cross-activation of the his operon is more efficient in medium supplemented with all three aromatic amino acids compared with minimal medium supplemented with phenylalanine and tyrosine only. Nester and coworkers (18, 21) previously observed that tyrosine (or phenylalanine plus tyrosine) represses the synthesis of the bifunctional deoxy-D-arabino-heptulosonate-7-phosphate synthase-chorismate mutase (aroA gene product). The microarray data from experiments 4 and 5 support these earlier studies; however, our data do not confirm the moderate repression of aroH (monofunctional chorismate mutase) and aroI (shikimate kinase) under the same conditions.

In experiment 7, which compares gene expression in a trpS mutant and WT B. subtilis cells grown at 38°C in the presence of all of the amino acids, we observed that transcript levels for the trp operon as well as for the trpS and rtpA-ycbK operons were elevated in the trpS1 mutant. This finding implies that under these conditions the cell is deficient in charged tRNA^Trp, and thus overall protein synthesis is somewhat adversely affected. In contrast, we observed that the genes responsible for synthesis of 2,3-dihydroxybenzoate (dhb operon) were decidedly down-regulated in the trpS mutant growing at 38°C (not shown). This operon was also significantly down-regulated by tryptophan starvation, indole acrylic acid induction, and loss of mtrB gene function (experiments 1, 2, and 3, respectively). These observations are of interest because 2,3-dihydroxybenzoate is produced from chorismate, the central precursor for aromatic amino acid biosynthesis (22). Thus, it appears there may be specific conditions that not only induce expression of the trp operon, but also reduce transcription of the dhb operon, thereby directing chorismate into tryptophan biosynthesis.
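The idea that SAM scores each gene by its change in expression relative to the variability of replicate measurements, rather than by fold-change alone, can be illustrated with a simplified one-class relative-difference statistic. This is a hedged sketch and not the SAM software: the function name, the fudge constant `s0=0.1`, and the log-ratio values are assumptions, and the permutation step that SAM uses to estimate false discovery rates and q values is omitted.

```python
from math import sqrt
from statistics import mean, stdev

def relative_difference(log_ratios, s0=0.1):
    """Mean log ratio divided by (standard error + s0) for one gene.

    s0 is a small positive constant (value assumed here) that keeps genes
    with very small variance from receiving extreme scores.
    """
    standard_error = stdev(log_ratios) / sqrt(len(log_ratios))
    return mean(log_ratios) / (standard_error + s0)

# Two made-up genes with the same mean log2 ratio (1.0, i.e., 2-fold up)
# but very different reproducibility across four replicate arrays.
consistent_gene = [1.0, 1.1, 0.9, 1.0]
noisy_gene = [3.0, -1.0, 2.5, -0.5]

score_consistent = relative_difference(consistent_gene)
score_noisy = relative_difference(noisy_gene)
```

Both genes would pass the same arbitrary fold-change cutoff, yet the reproducible gene receives a much larger score, which is why a variance-aware statistic tends to produce fewer false calls.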

Table 2. Mean fluorescence intensity ratios (i.e., fold-change) for known genes involved in aromatic amino acid metabolism

Gene  Exp. 1          Exp. 2          Exp. 3          Exp. 4          Exp. 5          Exp. 6          Exp. 7
aroA  0.99            1.18            1.24            0.40 (0.0007)†  2.67 (0.0020)*  0.98            1.02
aroB  0.92            1.06            1.37 (0.0003)*  0.95 (0.0012)†  1.31 (0.0166)*  0.96            0.99
aroC  1.04            1.25            0.99            1.02            0.84            1.11 (0.0093)*  1.01
aroD  0.92            1.09            1.83 (0.0003)*  0.99            1.07            0.73 (0.0001)†  0.96
aroE  1.24 (0.0066)*  1.27 (0.0124)*  2.43 (0.0003)*  0.62 (0.0007)†  1.98 (0.0020)*  1.61 (0.0012)*  0.98
aroF  1.16 (0.0106)*  1.11            1.11            0.77            1.25 (0.0068)*  1.23 (0.0012)*  1.01
aroH  1.00            1.05            1.09            0.96            1.18            0.93            0.97
aroK  1.02            0.94            0.99            0.73            1.21 (0.0068)*  1.59 (0.0012)*  0.88 (0.0252)†
folB  1.15            1.03            1.49 (0.0003)*  1.29            0.93            1.03            0.96
folC  1.31 (0.0046)*  1.09            2.08 (0.0003)*  0.85 (0.0007)†  1.79 (0.0020)*  0.67 (0.0001)†  0.95
folD  1.06            0.95            0.89            1.17            0.82            0.98            0.67 (0.0022)†
folK  1.34 (0.0031)*  1.02            1.37            1.02            0.93            1.03            0.98
hisA  1.01            0.85            2.54 (0.0003)*  2.25 (0.0007)*  0.50 (0.0020)†  0.83 (0.0001)†  0.99
hisB  1.19            0.79            2.33 (0.0003)*  1.38 (0.0007)*  0.65 (0.0027)†  0.94            1.07
hisC  1.17            1.36 (0.0033)*  3.70 (0.0003)*  0.83 (0.0007)*  1.73 (0.0020)*  1.56 (0.0001)*  1.09
hisD  1.00            0.83            2.56 (0.0003)*  1.49 (0.0007)*  0.57 (0.0020)†  0.95            0.92
hisF  1.09            0.91            3.06 (0.0003)*  1.91 (0.0007)*  0.56 (0.0020)†  0.71 (0.0001)†  1.02
hisG  1.11            0.83            2.49 (0.0003)*  2.17 (0.0007)*  0.47 (0.0020)†  1.11 (0.0020)*  0.80 (0.0022)†
hisH  0.97            0.89            1.87 (0.0003)*  1.62 (0.0007)*  0.57 (0.0020)†  0.91 (0.0093)†  1.00
hisI  1.14            0.89            3.01 (0.0003)*  2.15 (0.0007)*  0.60 (0.0020)†  0.75 (0.0001)†  1.11
hisJ  0.82 (0.0046)†  1.30 (0.0108)*  1.18 (0.0003)*  0.93            0.92            0.91            1.00
hisS  1.21            1.18            1.97 (0.0003)*  0.79 (0.0012)†  1.15            0.58 (0.0001)†  0.89
hisZ  1.01            1.00            2.04 (0.0003)*  2.26 (0.0007)*  0.44 (0.0020)†  1.01            0.72 (0.0038)†
mtrA  0.99            1.43            1.76 (0.0003)*  0.77 (0.0007)†  1.65 (0.0020)*  0.56 (0.0001)†  1.09
mtrB  0.99            1.36            1.15            0.90 (0.0028)†  1.06            0.54 (0.0001)†  1.16 (0.0127)*
pabA  1.10            1.10            2.64 (0.0003)*  1.02            1.15            0.98            0.95
pabB  1.02            1.09            1.05            1.39 (0.0081)*  0.59 (0.0020)†  ND              0.99
pabC  1.29 (0.0031)*  1.25            3.66 (0.0003)*  1.03            1.15            0.80 (0.0001)†  0.97
pheA  0.91            1.14            1.31 (0.0003)*  0.98 (0.0025)†  1.14            0.80 (0.0004)†  1.12
pheB  1.00            0.98            1.49 (0.0003)*  1.17 (0.0104)*  1.00            1.11 (0.0001)*  1.17
pheS  0.92            2.00 (0.0033)*  2.05 (0.0003)*  0.79 (0.0007)†  1.27 (0.0106)*  0.99            1.29 (0.0022)*
pheT  0.93            1.17            1.66 (0.0003)*  ND              1.00            1.19            0.91
sul   1.12            1.12            1.91 (0.0003)*  1.30 (0.0007)*  0.98            1.17 (0.0019)*  1.01
trpA  1.49 (0.0031)*  1.42            2.13 (0.0003)*  0.95            1.49 (0.0020)*  1.54 (0.0001)*  1.06
trpB  2.34 (0.0031)*  1.91 (0.0033)*  2.24 (0.0003)*  1.39 (0.0012)*  4.40 (0.0020)*  7.94 (0.0001)*  1.50 (0.0022)*
trpC  1.16            1.38 (0.0124)*  1.86 (0.0003)*  1.32 (0.0035)*  1.56 (0.0020)*  2.56 (0.0001)*  1.17 (0.0252)*
trpD  1.29 (0.0031)*  3.56 (0.0033)*  8.71 (0.0003)*  1.38 (0.0104)*  2.45 (0.0020)*  4.68 (0.0001)*  1.38 (0.0022)*
trpE  0.92            5.11 (0.0033)*  6.63 (0.0003)*  1.81 (0.0007)*  1.75 (0.0020)*  3.15 (0.0001)*  1.75 (0.0022)*
trpF  1.38 (0.0031)*  2.42 (0.0033)*  5.13 (0.0003)*  1.20            4.20 (0.0020)*  5.60 (0.0001)*  1.10 (0.0070)*
trpS  1.36 (0.0031)*  2.61 (0.0033)*  0.91            1.33 (0.0007)*  1.13            1.80 (0.0001)*  3.03 (0.0022)*
tyrA  1.09            1.27            2.00 (0.0003)*  0.96 (0.0081)†  1.24            1.13 (0.0008)*  1.56 (0.0022)*
tyrS  0.88            1.42 (0.0124)*  1.31 (0.0021)*  0.94 (0.0035)†  1.08            0.75 (0.0001)†  1.19 (0.0158)*
tyrZ  1.00            0.88            0.80 (0.0007)†  0.89            0.79            1.42 (0.0001)*  1.14
ycbK  0.95            2.87 (0.0033)*  0.61 (0.0003)†  0.9             0.83            1.45 (0.0001)*  3.05 (0.0022)*
rtpA  1.12            3.46 (0.0033)*  0.87            1.00            0.91            ND              1.89 (0.0022)*

The ratios were calculated based on the culture conditions or strain comparisons shown in Table 1. Numbers in parentheses represent the q values determined by SAM (similar to P values; refer to text). ND, not determined.
*Transcripts that are significantly up-regulated on the basis of SAM.
†Transcripts that are significantly down-regulated.
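For readers who want to work with Table 2 programmatically, the cell notation defined in the footnotes (ratio, optional q value in parentheses, * for up-regulated, † for down-regulated, ND for not determined) can be decoded with a small parser. This is an illustrative sketch only; the function name `parse_cell` and the returned field names are hypothetical.

```python
import re

# A cell is either "ND" or a ratio, optionally followed by "(q)" and a
# significance mark: * = significantly up, † = significantly down.
CELL_PATTERN = re.compile(r"^(ND|\d+(?:\.\d+)?)(?:\s*\((\d*\.\d+)\)([*†]))?$")

def parse_cell(text):
    """Turn one Table 2 cell into a dict with ratio, q value, and SAM call."""
    match = CELL_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {text!r}")
    value, q_value, mark = match.groups()
    if value == "ND":
        return {"ratio": None, "q": None, "call": "not determined"}
    call = {"*": "up", "†": "down", None: "not significant"}[mark]
    return {
        "ratio": float(value),
        "q": float(q_value) if q_value is not None else None,
        "call": call,
    }

# Cells copied from the trpD (Exp. 3), aroA (Exp. 4), aroH (Exp. 1),
# and rtpA (Exp. 6) entries of Table 2.
trpD_exp3 = parse_cell("8.71 (0.0003)*")
aroA_exp4 = parse_cell("0.40 (0.0007)†")
aroH_exp1 = parse_cell("1.00")
rtpA_exp6 = parse_cell("ND")
```

Keeping the SAM call separate from the ratio matters because a few entries (for example hisC in Exp. 4) carry a mark that the ratio alone would not predict.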

Clustering of Genes Based on Expression Patterns. To further exam- clusters. Interestingly, one of these clusters contained most of the ine the relationships among the gene expression profiles derived genes that were previously known to be involved in aromatic from microarray data in this study, we used several statistical amino acid metabolism (see Table 3, which is published as tools that enable clustering of genes with similar expression supporting information on the PNAS web site). In total, this

patterns. K-means cluster analysis is one such method often cluster contained 164 genes that shared similar expression BIOCHEMISTRY applied to the sorting of microarray data (23). This technique profiles to those of the genes from the aromatic supraoperon, partitions the data into a predetermined number of clusters suggesting that they are coordinately up-regulated in response to based on the similarity of their expression patterns across a series depletion of tryptophan, phenylalanine, tyrosine, or availability of experiments. This is accomplished by iterative reallocation of of charged tRNATrp. the cluster members to minimize intracluster scattering. First, we As an alternative method for identifying genes that shared selected the number of clusters to be 15 by using a ‘‘best similar transcription profiles with those of the trp operon, we K-means’’ script within the GENESPRING software package. With applied a form of hierarchical clustering known as average- this algorithm Ϸ97% of the 3,552 genes that SAM deemed linkage cluster analysis (24), a method that is familiar to most significant in at least one experiment were placed into K-means biologists through its application in DNA sequencing and phy-

Berka et al. PNAS ͉ May 13, 2003 ͉ vol. 100 ͉ no. 10 ͉ 5685 Downloaded by guest on September 24, 2021 logenetic analysis. With this method, relationships among gene expression patterns are represented by a tree whose branch lengths reflect the degree of similarity between the gene profiles as assessed by a pairwise similarity function. Hierarchical clus- tering of the data from the seven experiments in this study showed that the genes of the trp operon were contained mostly in one branch of the gene tree (see Fig. 3, which is published as supporting information on the PNAS web site) comprising 174 genes. Ninety-five of these were in common with the 164 up-regulated genes from K-means cluster that harbored the genes of the aromatic supraoperon (58% overlap). The genes of the trp operon itself (trpEDCFBA, hisC, and aroE) were con- tained in one node of this branch in addition to proS, the gene encoding prolyl-tRNA synthetase. In the B. subtilis genome proS resides in a cluster with dxr and yluC, and the direction of transcription is the same for all three of these genes. Interest- ingly, dxr and yluC are clustered in the node that includes folB, pheB, and sul (genes known to be regulated by aromatic amino metabolism) immediately adjacent to the node containing the trp operon. The proS, dxr, and yluC genes are strongly up-regulated in experiments 3 and 6. Previous studies have shown a strong tendency for genes involved in common cellular functions, pathways, and processes to cluster together in this type of Fig. 2. PCA of the gene expression data from all seven experiments listed in analysis (24, 25).§ Consequently, it is tempting to speculate that Table 1 (3,552 genes filtered for significance by using SAM) showing the proS, dxr, and yluC might reflect a peripheral segment of the rotated and dimensionally reduced gene expression data. 
cross-pathway control system; however, further experimentation will be required to support this hypothesis.

It is particularly noteworthy that nearly half (81/174) of the genes that clustered with those of the trp operon were so-called "y genes" with unknown or putatively assigned functions. However, several of these unknowns (e.g., yqeK, ytpP, ytpQ) were previously classified as genes involved in aromatic amino acid metabolism (3), including the yhaG (trpP) gene that is believed to encode a transmembrane protein involved in tryptophan transport (26).

The hierarchical cluster of genes coordinately induced with those of the trp operon includes several genes that are associated with competence development, DNA uptake, or recombination (e.g., comP, comQ, mreB, mreC, mreD, mutSB, and radC) (10). It is possible that the apparent induction of these transcripts is a nonspecific consequence of the limiting nitrogen conditions that occur when cells are grown in minimal medium without added amino acids (27).

Hierarchical cluster analysis also yielded possible relationships among the genes that are down-regulated by tryptophan deficiency. For example, genes in the dhb operon that were observed to be down-regulated in experiments 1, 2, 3, and 7 form a single node that includes the following: dhbACEF, ydbL, yuiI, yoeB, yplQ, yukLM, and yvbA. Although ydbL, yuiI, yoeB, yplQ, and yvbA have unknown functions, yukL and yukM are actually part of the dhbF ORF that was recently extended because of sequencing errors (28).

Several additional operons also show expression profiles that are inversely correlated to that of the trp operon. Most obvious is a constellation of operons and gene clusters that encode numerous components implicated in energy metabolism and electron transport functions. For example, the atpABDEFGH, cydABCD, hemABCDE, hemX, hemL, narGH, and qoxBCD genes/operons lie in the same down-regulated K-means cluster as the dhb operon described above (not shown). In addition, genes involved in purine (purDFHMN) and pyrimidine (pyrABCDFK) biosynthesis, glycolysis (eno, pgk, pgm, and tpi), and several ribosomal proteins lie in this cluster. However, in contrast to the dhb operon, which has a clear connection to aromatic amino acid biosynthesis, we believe that these transcriptional variations are likely to be indirect effects representing an overall slowing of cellular metabolism precipitated by depletion of an essential amino acid such as tryptophan.

Lastly, we performed a PCA on the microarray data (Fig. 2) as a third method to cluster the gene expression patterns and compare the results to those derived by K-means and hierarchical clustering algorithms. PCA is a statistical technique that can be used to simplify the analysis and visualization of multidimensional data sets, and it is particularly well suited for microarray data in which the expression levels for thousands of genes are measured across multiple conditions (29). This tool allows the key variables in a data set to be identified, and each resulting component defines a linear combination of experimental parameters that can be used to distinguish the genes parsimoniously. Fig. 2 shows a scatter plot generated from PCA of the gene expression data from 3,552 genes filtered for significance with SAM across the seven experiments listed in Table 1. The 100 genes in common between the K-means and hierarchical clusters are indicated as red points in Fig. 2. Clearly, their proximity to each other on the PCA plot suggests that their expression patterns are not random.

Fig. 4, which is published as supporting information on the PNAS web site, illustrates that most of the variance in the microarray data (≈73%) can be summarized in just three components. Thus, even though there were seven individual experiments in this study, there were only three major independent features for each gene. Half of the variance is captured in the first PC, a weighted average that distinguishes genes on the basis of their expression. Genes with highly positive values along this component, such as those marked in blue (Fig. 2), are up-regulated under conditions that simulate tryptophan deficiency. The second component represents the change in gene expression across each of the different experimental conditions, and the third component measures concavity of the data. Collectively, these results demonstrate that three distinct clustering methods identify a subset of ≈100 genes that respond in a coordinated fashion with the genes of the trp operon.

Fig. 2 (legend, continued). The first PC is plotted on the x axis, PC2 is plotted on the y axis, and PC3 is plotted on the z axis. Data points are color-coded by expression (color bar at right denotes normalized intensity ratios), but those points in red correspond to the 95 genes in common between the K-means and hierarchical clusters including the genes of the trp operon.

§Talaat, A. M., Lyons, R. & Johnston, S. A. (2001) Abstr. Ann. Meeting Am. Soc. Microbiol. 101, 711.

5686 | www.pnas.org/cgi/doi/10.1073/pnas.1031606100 Berka et al.

Conclusions

By analysis of DNA microarray experiments in which we queried the known protein coding ORFs of B. subtilis, we identified ≈100 genes whose transcription patterns closely parallel those of the aromatic supraoperon under conditions that simulate mild starvation for aromatic amino acids and/or depletion of tRNATrp. By deploying established statistical tools (K-means, hierarchical clustering, and PCA) to cluster the resulting gene expression profiles, we found that the genes of the trp operon itself as well as a number of additional ORFs appeared to represent a coherent subset of coordinately regulated transcription units. The microarray data presented herein are confirmatory and consistent with previous observations regarding transcriptional and [...] of the supraoperon, and they extend previous interpretations by uncovering associations to other genes and operons that are either up- or down-regulated in response to experimental perturbations of aromatic amino acid metabolism (e.g., his operon, dhb operon, and dxr-yluC-proS cluster/operon). The underlying molecular mechanisms that control these manifestations of cross-pathway gene regulation are unknown at present, although future studies and comparisons to other model organisms such as E. coli may provide valuable clues. In addition, such comparisons could yield insights into the evolution of diverse mechanisms that govern aromatic amino acid synthesis and metabolism among various bacterial genera.

We gratefully acknowledge Michael Rey for assistance with establishing a computer database for our microarray data. These studies were supported in part by funds provided to C.Y. from the National Science Foundation (MCB-0093023).
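As an illustration of the PCA step described in the text (a minimal sketch only, not the software or data used in this study), the following Python code projects a genes-by-experiments matrix of expression ratios onto its leading principal components and reports the fraction of variance each component captures. The function name pca_scores, the synthetic 200 x 7 matrix, and the choice of an SVD-based calculation with NumPy are all assumptions made for the example.

```python
import numpy as np


def pca_scores(expr, n_components=3):
    """Project genes onto the leading principal components.

    expr: array of shape (n_genes, n_experiments) holding log expression ratios.
    Returns (scores, explained): scores has shape (n_genes, n_components);
    explained holds the fraction of total variance captured by every component.
    """
    expr = np.asarray(expr, dtype=float)
    # Center each experiment (column) so components describe variation about the mean.
    centered = expr - expr.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the principal axes,
    # and singular values come back sorted from largest to smallest.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = centered @ vt[:n_components].T
    return scores, explained


# Hypothetical data: 200 genes x 7 experiments built from 3 latent patterns plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3)) * np.array([3.0, 1.5, 1.0])
loadings = rng.normal(size=(3, 7))
expr = latent @ loadings + 0.1 * rng.normal(size=(200, 7))

scores, explained = pca_scores(expr)
print("fraction of variance in PC1-PC3:", explained[:3].sum())
```

Plotting the three columns of scores against one another would give a scatter plot of the kind shown in Fig. 2, in which genes with similar expression patterns across the experiments fall close together.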

1. Henner, D. & Yanofsky, C. (1993) in Bacillus subtilis and Other Gram Positive Bacteria: Biochemistry, Physiology, and Molecular Genetics, ed. Losick, R. (Am. Soc. Microbiol., Washington, DC), pp. 269–280.
2. Yanofsky, C., Miles, E. W., Kirschner, K. & Bauerle, R. (1999) Encyclopedia Mol. Biol. 4, 2676–2689.
3. Gollnick, P., Babitzke, P., Merino, E. & Yanofsky, C. (2002) in Bacillus subtilis and Its Closest Relatives: From Genes to Cells, eds. Sonenshein, A. L., Hoch, J. A. & Losick, R. (Am. Soc. Microbiol., Washington, DC), pp. 233–244.
4. Sarsero, J. P., Merino, E. & Yanofsky, C. (2000) Proc. Natl. Acad. Sci. USA 97, 2656–2661.
5. Valbuzzi, A. & Yanofsky, C. (2001) Science 293, 2057–2059.
6. Henkin, T. M. (2000) Curr. Opin. Microbiol. 3, 149–153.
7. Khodursky, A. B., Peter, B. J., Cozzarelli, N. R., Botstein, D., Brown, P. O. & Yanofsky, C. (2000) Proc. Natl. Acad. Sci. USA 97, 12170–12175.
8. Steinberg, W. (1974) J. Bacteriol. 117, 1023–1034.
9. Vogel, H. J. & Bonner, D. M. (1956) J. Biol. Chem. 218, 97–106.
10. Berka, R. M., Hahn, J., Albano, M., Draskovic, I., Persuh, M., Cui, X., Sloma, A., Widner, W. & Dubnau, D. (2002) Mol. Microbiol. 43, 1331–1345.
11. Eisen, M. B. & Brown, P. O. (1999) Methods Enzymol. 303, 179–205.
12. Tusher, V. G., Tibshirani, R. & Chu, G. (2001) Proc. Natl. Acad. Sci. USA 98, 5116–5121.
13. Karp, P., Krummenacker, M., Paley, S. & Wagg, J. (1999) Trends Biotechnol. 17, 275–281.
14. Karp, P., Paley, S. & Romero, P. (2002) Bioinformatics 18, S1–S8.
15. Mills, J. C. & Gordon, J. I. (2001) Nucleic Acids Res. 29, e72.
16. Bilban, M., Buehler, L. K., Head, S., Desoye, G. & Quaranta, V. (2002) BMC Genomics 3, 19.
17. Nester, E. W., Schafer, M. & Lederberg, J. (1963) Genetics 48, 529–551.
18. Nester, E. W. (1968) J. Bacteriol. 96, 1649–1657.
19. Chapman, L. F. & Nester, E. W. (1968) J. Bacteriol. 96, 1658–1663.
20. Nester, E. W., Dale, B., Montoya, A. & Vold, B. (1974) Biochim. Biophys. Acta 361, 59–72.
21. Nester, E. W., Jensen, R. A. & Nasser, D. S. (1969) J. Bacteriol. 97, 83–90.
22. Stachelhaus, T., Mootz, H. D. & Marahiel, M. A. (2002) in Bacillus subtilis and Its Closest Relatives: From Genes to Cells, eds. Sonenshein, A. L., Hoch, J. A. & Losick, R. (Am. Soc. Microbiol., Washington, DC), pp. 415–435.
23. Knudsen, S. (2002) A Biologist's Guide to Analysis of DNA Microarray Data (Wiley, New York), pp. 43–44.
24. Eisen, M. B., Spellman, P. T., Brown, P. O. & Botstein, D. (1998) Proc. Natl. Acad. Sci. USA 95, 14863–14868.
25. Sepulveda, A. R., Tao, H., Carloni, E., Sepulveda, J., Graham, D. Y. & Peterson, L. E. (2002) Aliment. Pharmacol. Ther. 16, 145–157.
26. Sarsero, J. P., Merino, E. & Yanofsky, C. (2000) J. Bacteriol. 182, 2329–2331.
27. Jarmer, H., Berka, R., Knudsen, S. & Saxild, H. H. (2002) FEMS Microbiol. Lett. 206, 197–200.
28. May, J. J., Wendrich, T. M. & Marahiel, M. A. (2001) J. Biol. Chem. 276, 7209–7217.
29. Raychaudhuri, S., Stuart, J. M. & Altman, R. B. (2000) Pacific Symp. Biocomput. 5, 455–466.

Berka et al. PNAS | May 13, 2003 | vol. 100 | no. 10 | 5687