DNA RESEARCH 1, 1-14 (1994)

Systematic Sequencing of the 180 Kilobase Region of the Bacillus Subtilis Chromosome Containing the Replication Origin

Naotake OGASAWARA,1* Sumiko NAKAI,2 and Hiroshi YOSHIKAWA1,2

Graduate School of Biological Sciences, Nara Advanced Institute of Science and Technology, 8916-5, Takayama-cho, Ikoma, Nara 630-01, Japan1 and Department of Genetics, Osaka University Medical School, 2-2, Yamadaoka, Suita, Osaka 565, Japan2

(Received 10 November 1993)

Abstract
We have determined a 180 kb contiguous sequence in the replication origin region of the Bacillus subtilis chromosome. Open reading frames (ORFs) in this region were unambiguously identified from the determined sequence, using criteria characteristic of B. subtilis gene structure, i.e., starting with an ATG, GTG or TTG codon preceded by a sequence complementary to the 3' end of the 16S rRNA. Four rRNA gene sets, 7 individual tRNA genes and 1 scRNA gene were identified, occupying 20 kb in total. In the remaining 160 kb region, 158 ORFs were identified, suggesting that the B. subtilis genome encodes, on average, 1 ORF per kb of DNA. Among the 158 ORFs, the functions of 48 ORFs were assigned and those of 11 ORFs were suggested through significant similarities to known proteins in data banks. However, the functions of more than half of the ORFs (63%) remain to be determined.

Key words: Bacillus subtilis, genome sequencing project, systematic sequencing, open reading frame

1. Introduction

Since the discovery of conjugation in Escherichia coli and transformation in Bacillus subtilis in the 1950s, the genetic systems of these two bacteria have often been the targets of molecular genetic studies. As a result, about 1400 genes in E. coli1 and 750 genes in B. subtilis2 have been identified and mapped on each circular chromosome. Analysis of the structure and function of these genes revealed that the genes involved in the metabolism of cellular constituents, energy production, and the biosynthesis of proteins and nucleic acids are common to the two bacteria, although the two are believed to have diverged approximately 1.2 billion years ago.3 On the other hand, the physiological properties of the two bacteria are drastically different. E. coli is a Gram-negative bacterium and a representative of continuously growing cells. In contrast, B. subtilis is a Gram-positive soil bacterium that can either grow as vegetative cells when nutrients are available or rest as dormant spores when starved. Consequently, B. subtilis has developed genes for sporulation and germination as well as for many extracellular enzymes that digest macromolecules available in soil. Comparative studies on the molecular genetics of these two bacteria have provided important knowledge for understanding the universality and diversity of genetic systems in prokaryotes.

Since cloning and sequencing techniques became available, the number of sequenced genes has increased exponentially, which has led to the idea that the whole genome could be sequenced by the combined efforts of the many laboratories working on the molecular genetics of E. coli or B. subtilis. The first systematic sequencing project of the B. subtilis chromosome was initiated in the European Community (EC),4 and has since expanded to become an international cooperative project between the EC and Japan.

In our laboratory, we have been studying the replication origin of the B. subtilis chromosome and determined a sequence of about 23 kb near the replication origin region.5,6 To our surprise, the origin region was conserved in five different bacteria: three Gram-positive and two Gram-negative bacteria. In addition, it was possible to propose a phylogenetic dendrogram of the evolution of the replication origin of bacterial chromosomes.7 This finding led us to study the genomic structure near the replication origin region more extensively. In the last 2 years, we have extended the sequence determination in both directions, 50 kb to the left and 107 kb to the right. Thus, we have determined a 180 kb contiguous sequence in the origin region of the B. subtilis chromosome.

* To whom correspondence should be addressed. Tel. 81-(0)7437-2-5131, Fax. 81-(0)7437-2-5134, E-mail [email protected]

This work was part of the international cooperative project between the European Community (EC) and Japan to sequence the entire B. subtilis genome.

Sequence of the 180 kb Region of the B. Subtilis Chromosome [Vol. 1.

[Fig. 1 flowchart: (A) insert in M13 → PCR amplification of the insert from plaque or colony; (B) chromosomal DNA → PCR amplification of the target segment; for both, purification of the PCR product using Centricon 100 → cycle sequencing reaction using dye-primer and Taq DNA polymerase]

Fig. 1. Preparation of templates for the sequencing reaction by polymerase chain reaction (PCR). Steps to prepare templates for the sequencing reaction from M13 phages (A) or chromosomal DNA (B) by PCR are shown schematically. Red and green arrows indicate the universal primers for sequencing. Blue arrows indicate the region-specific primers for PCR amplification.

Here we report the cloning and sequencing of the 180 kb region, and an initial analysis of the encoded genetic information.

2. Materials and Methods

2.1. Bacterial Strains, Phages and Plasmids
B. subtilis strain 168, originally isolated by C. Anagnostopoulos, was used as the standard strain for sequencing. Itaya and Tanaka have constructed a map of NotI and SfiI restriction sites of the entire circular genome and thus determined the total size of the genome to be 4.188 Mb.8 We used a derivative of B. subtilis 168 from our stock, shown to give the same NotI and SfiI digestion patterns as the standard strain, as a source of chromosomal DNA to construct lambda libraries and as a template for polymerase chain reaction (PCR) amplification for direct sequencing. Lambda DASHII, EMBL3 and E. coli P2392 were used to prepare libraries of B. subtilis chromosomal DNA after Sau3AI partial digestion. Original recombinant phages, prepared using an in vitro packaging system from Stratagene (CA, USA), were used for screening without further amplification. M13 phage vectors and E. coli XL1-Blue were used to prepare randomly overlapping libraries for shotgun sequencing.

2.2. Sequence Determination
All sequences were determined using the Dye-Primer Cycle Sequencing Kit and the 373A sequencer from Applied Biosystems (CA, USA). We determined the sequence of both DNA strands, except for regions whose sequences were already registered in data banks. Randomly overlapping libraries for shotgun sequencing were prepared according to the method of Maniatis et al. after DNase I treatment of target fragments.9 Inserts of M13 phages were amplified by PCR from a suspension of phages in plaques, using primers identical in sequence to the -21M13 and M13RP1 sequencing primers from Applied Biosystems. Amplified products were used as templates for sequencing reactions after removal of primers and dNTPs using a Centricon-100 from Amicon (MA, USA) (Fig. 1A). We determined the concentration of the purified products by measuring UV absorbance, and used them for the sequencing reaction after dilution to 8 ng/kb/µl. Preparing templates by PCR is superior to the classical method of preparing single-stranded DNA in terms of both speed and reproducibility. Sequence information from random inserts corresponding to 3-4 times the length of the starting fragment was enough to cover most of the fragment. The remaining gap regions were filled using region-specific primers as described below. We also used universal dye-primers for the primer walking method (Fig. 1B). For this purpose, we added the sequences of the universal sequencing primers to the 5' end of the region-specific primers used for walking. PCR products amplified from the chromosome or from inserts in lambda phages were purified and sequenced as described above. Primer DNAs were synthesized with a Model 474 synthesizer and 40 nmol columns from Applied Biosystems, and the dried products were dissolved in water and used without purification.

2.3. Data Handling and Computer Analysis
DNA sequences from the ABI sequencer were compiled using the ATSQ program from Software Development Co., Ltd. (Tokyo, Japan). To avoid errors in the number of bases, we inspected every chart and used sequence outputs only from regions, usually up to about 400 bp, where base peaks were well separated. The compiled sequence was further analyzed for the locations of possible open reading frames (ORFs) using the GeneWorks program from IntelliGenetics Inc. (CA, USA). The amino acid sequences of the identified ORFs were searched for similarity to previously reported sequences in the non-redundant protein sequence data bank using the FASTA and BLAST network services of the Supercomputer Laboratory, Institute for Chemical Research, Kyoto University. The 180 kb sequence reported here has been deposited at DDBJ under accession number D26185.

No. 1] N. Ogasawara, S. Nakai, and H. Yoshikawa

[Fig. 2 panels: (A) Marker (NotI and SfiI sites, oriC region, abrB, tms); (B) Cloning (cloned in lambda phage, PCR amplification, inverse PCR); (C) Sequencing (sequence reported, sequenced by shotgun method, sequenced by primer walking method, sequencing in progress, sequence determined: 180 kbp)]

Fig. 2. Cloning and sequencing of the region assigned to our group (gnt-spo0H). (A) Locations of probe DNAs used for the screening of the lambda phage library are shown. The scale is according to Itaya and Tanaka.8 (B) Regions cloned using lambda phage, M13 and pBR322 vectors are shown, together with those obtained directly from the chromosome by PCR. Lambda phages containing the regions indicated in green were reported previously.5,6,10 (C) Regions sequenced by the shotgun method (red) or the primer walking method (yellow) and reported here are shown. Green arrows indicate regions whose sequences had been reported by us.5,6,13

3. Results and Discussion

3.1. Cloning of a Region from gnt to spo0H on the B. subtilis Chromosome
In the B. subtilis genome sequencing project, our group is responsible for the region from the gnt operon to the spo0H gene, including the replication origin (oriC) region of the chromosome. We have previously reported the cloning of approximately 33 kb around oriC3,10 and approximately 18 kb around rrnA.10 In addition, our region contained several genes that had already been sequenced2 and two linking clones containing NotI sites.8 The distance between any two of these landmark probes was estimated to be less than 50 kb (Fig. 2A). We therefore expected that the whole region could easily be cloned by chromosome walking, using a suitable lambda phage library of the B. subtilis genome. It turned out, however, that many regions were difficult to clone into the lambda phage vector (Fig. 2B). No clones extending into our assigned region were obtained using gnt and spo0H sequences as probes. We could clone separate regions using the type II membrane binding sequence, the E4 fragment of the ch-rrnA phage, and the tms gene sequence as probes. However, extensive chromosome walking was unsuccessful from any of these regions.

Relatively small gap regions (less than 4 kb) missing from the lambda phage library were filled in two steps. We first prepared a library of short genomic fragments (less than 3 kb) in M13 phage, and screened it to fill some of the gaps. The remaining gaps were then amplified by PCR directly from the chromosome, and the amplified products were sequenced directly. For the large unclonable regions at both ends of our assigned region, we first tried successive rounds of inverse PCR followed by direct sequencing; regions of about 10 kb were amplified by this inverse PCR walking. Then, using the amplified DNA as probes, overlapping clones were isolated from the lambda phage library for the left end region. In the right end region, a tandem rRNA operon (rrnJW) was identified in the middle of the gap from the sequence of the inverse PCR products.

3.2. Systematic DNA Sequencing of the Assigned Region
We used two methods, the shotgun method and the primer walking method, for systematic DNA sequencing (Fig. 2C). Fragments 6-12 kb in size, having the restriction enzyme sites necessary to purify them from lambda phage DNA, were sequenced by the shotgun method using randomly overlapping M13 libraries of DNase I-treated products. The three sequential steps of shotgun sequencing, i.e., preparation of a random library, sequencing of randomly selected inserts, and filling of the gaps with region-specific primers, each required roughly 2 weeks when performed by one person. Thus, one person could determine the sequence of a fragment about 10 kb in size, starting from lambda clones, in 6 weeks. The primer walking method was used for regions where no appropriate restriction enzyme sites were available for purification and where cloning in E. coli was difficult. We determined the sequence of both DNA strands in our assigned region, except for regions whose sequences were already registered in data banks (Table 1) and were identical to our sequence data; where discrepancies were found, we determined both strands to confirm our result. A total of 5 nucleotides remain undetermined in the 4 rRNA genes, although we carefully inspected these regions in both directions.

We have previously reported a 23,967 bp sequence around oriC5,6 and a 3,741 bp sequence containing recR13 (Fig. 2C). In this report we have newly determined a 152,428 bp sequence, of which 30,386 bp had previously been registered in data banks (Table 1). In addition, a 3,335 bp sequence of the spo0H gene and the genes downstream of it is available in data banks, and this sequence was confirmed to be contiguous with ours. Thus, a contiguous stretch of 183,318 bp has been determined, currently the longest known stretch of the B. subtilis genome.

The accuracy of sequences is critical in genome sequencing projects, in which subsequent confirmation by genetic or biochemical analysis is not always carried out. To avoid errors in the number of bases, we inspected every chart of raw data from the sequencer and used sequence outputs only from regions where base peaks were well separated. We found quite a few discrepancies between our sequences and those in data banks. We re-examined our sequences and found no ambiguity suggesting errors in our sequence. Mutations introduced during PCR amplification can cause errors in the sequence data. To avoid such errors, we did not use "cloned" PCR products as templates for sequence determination. The one exception was the regions upstream and downstream of rrnJW, which were determined using random DNA libraries of PCR products. In these cases, we observed one discrepancy among multiple raw outputs of the same region, probably due to a PCR error.

3.3. Assignment of Coding Regions in the 180 kb Sequence
We have confirmed the presence of 4 rRNA gene sets with associated tRNA genes (rrnO, rrnA and rrnJW) and the scRNA gene in this region.2 An operon encoding 4 tRNA genes reported previously (trnY)14 was found at the left end. In addition, three individual tRNA genes were newly identified. Thus, a total of 20,191 bp was found to code for structural RNA genes (Fig. 3). The remaining 160 kb was translated in all six frames and surveyed for protein coding sequences (ORFs) that start with an ATG, GTG or TTG codon preceded by a sequence complementary to the 3' end of the 16S rRNA (the Shine-Dalgarno (SD) sequence), 5'-AAAGGAGGTGA-3'. Based on these criteria, 153 putative ORFs were identified, as listed in Table 2. In addition, five ORFs coding for proteins longer than 150 amino acids (aa) were identified without associated putative SD sequences. We included them in further analysis, since sequence similarities to E. coli proteins were found for three of them (Table 2). Thus, we identified 158 ORFs in the 160 kb sequence, suggesting that the entire B. subtilis genome contains some 4,000 ORFs. This estimate agrees well with that of the Pasteur Institute group,17 who found 94 ORFs in a 97 kb sequence, and with those from the E. coli genome sequencing project.11,18-20 The initiation codon was ATG in 119 cases, TTG in 18 cases and GTG in 16 cases (Table 2). ORFs of 100-300 aa are the most abundant, as shown in Fig. 4. Small overlaps between ORFs were found in 22 regions (Table 2). A 4 bp overlap including a termination codon is predominant (16 cases); the remaining overlaps are of 8, 11 and 19 bases in 3, 2 and 1 cases, respectively. In total, 137,318 bp of our sequence was found to code for protein.
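The ORF search criteria described above (six-frame translation, an ATG, GTG or TTG start codon, and an upstream match to 5'-AAAGGAGGTGA-3') can be illustrated with a minimal Python sketch. It is not the GeneWorks-based procedure used in this work: the 20-nt upstream window, the 5-nt minimum SD match, the 50-codon minimum ORF length, and the rule that keeps ORFs longer than 150 codons even without an SD match (loosely mirroring the five SD-less ORFs above) are all assumptions chosen for illustration.

```python
"""Minimal six-frame ORF scan with a Shine-Dalgarno (SD) check.

Illustrative sketch only: window sizes and thresholds are assumptions,
not parameters taken from the original analysis.
"""

STARTS = {"ATG", "GTG", "TTG"}
STOPS = {"TAA", "TAG", "TGA"}
# Sequence complementary to the 3' end of B. subtilis 16S rRNA (from the text).
SD_CONSENSUS = "AAAGGAGGTGA"


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_sd_match(upstream: str, consensus: str = SD_CONSENSUS) -> int:
    """Length of the longest substring of `consensus` occurring in `upstream`."""
    best = 0
    for i in range(len(consensus)):
        # Only try substrings longer than the current best; if consensus[i:j]
        # is absent, every longer substring starting at i is absent too.
        for j in range(i + best + 1, len(consensus) + 1):
            if consensus[i:j] in upstream:
                best = j - i
            else:
                break
    return best


def find_orfs(seq, min_codons=50, sd_window=20, min_sd=5, no_sd_codons=150):
    """Return (strand, start, end, codons, start_codon, sd_len) tuples.

    `start`/`end` are 0-based, half-open positions on the scanned strand
    (the reverse complement for "-"); `end` includes the stop codon and
    `codons` counts sense codons, i.e. the protein length in aa.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] in STARTS:
                    j = i
                    while j + 3 <= len(s) and s[j:j + 3] not in STOPS:
                        j += 3
                    if j + 3 > len(s):
                        break  # no in-frame stop left in this frame
                    codons = (j - i) // 3
                    sd_len = longest_sd_match(s[max(0, i - sd_window):i])
                    if codons >= min_codons and (
                        sd_len >= min_sd or codons > no_sd_codons
                    ):
                        hits.append((strand, i, j + 3, codons, s[i:i + 3], sd_len))
                        i = j + 3  # resume scanning after this ORF's stop codon
                        continue
                i += 3
    return hits
```

For a minus-strand hit, position p on the reverse complement corresponds to position len(seq) - 1 - p on the input strand. When several in-frame start codons precede the same stop, this sketch simply keeps the first one that passes both the length and SD tests; a real analysis would need a more careful choice of start site.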

Table 1. GenBank entries (Rel. 77) in the 180 kb sequence.

From     To       Strand  Entry        Gene(s) in the entry
-28376   -29115   3'      BSTRNA1      trnY (tRNA genes for Lys, Glu, Asp, and Phe)
-26151   -28357   3'      BACADESYN    purA
-16733   -16078   5'      BACSPRCT     cotF
-10287   -7639    5'      BACMBR
-7151    -8473    3'      BACPURA
5523     3953     3'      BACTET       tetB
31690    18086    3'      BACORIGS     spo0J, orf253, orf283, gidB, gidA, thdF(a), spoIIIJ(b), rnpA
22571    21619    3'      BACSPOJ      spo0J
29667    41782    5'      BACORIC      rpmH, dnaA, dnaN, recF, gyrB, gyrA
31292    31283    3'      BACRNPASPO   rnpA, spoIIIJ
31821    32320    5'      BACDNAA
33911    33920    5'      BACDNAAN
41296    41782    5'      BACSARX
41296    41782    5'      BACRRNO      rrnO
41368    41720    5'      BACRRNOL     trnO (tRNA genes for Ile and Ala)
43091    43790    5'      BACTGRG16
46510    46828    5'      BACRRNO5S
47618    49242    5'      BACIMPDE     guaB
49564    50763    5'      BACDACA      dacA
57615    58720    5'      BACORF17     orf161, scRNA
58015    58720    5'      BACSCRNA
58307    58577    5'      BACSSCR
58307    58577    5'      BACSCRNA1
58445    62186    5'      BACRECM      dnaH, orf107, recR
58445    59971    5'      BACDNAZX
61894    62188    5'      BSRRNAL      rrnA
66978    67371    5'      BSRRNA5S
67242    68937    5'      BACXPAC      xpaC
77211    76652    3'      BSABRB       abrB
77140    76670    3'      BSARBB
77061    76670    3'      BACABRBA
84551    84733    5'      BACVEGPRO    veg
84899    85339    5'      BACSPOR      sspF
87625    88305    5'      BSSPOVG      spoVG
87626    87842    5'      BACSPOVGPO
87653    87737    5'      BACSPOVGA
87655    87764    5'      BACSPOVGP
88024    91232    5'      BACTMSPRS    prs, tms26, ctc
88154    88304    5'      BACTMSPRO
90591    90718    5'      BACCTCPRO
90591    90718    5'      BACCTCPR
102220   102477   5'      BACSPOIIE    spoIIE
114442   119321   5'      BACRFOLA     pab, trpG, pabC, sul
127171   128290   5'      BACTG9168    trnJ (tRNA genes for Val, Thr, Lys, Leu, Gly, Leu, Arg, Pro, and Ala)
141230   147588   5'      BACGLUSYN    gltX, cysA, cysS
142937   144388   5'      BACGLTXA

(a) Renamed according to Burland et al.,11 formerly named 50K.
(b) Renamed according to Errington et al.,12 formerly named spo0J97.

[Fig. 3 map: protein and RNA coding regions along the 180 kb sequence; scale from -30 kb to 150 kb]

Fig. 3. Protein and RNA coding regions in the 180 kb sequence. Coding regions that will be transcribed from left to right are shown above the scale, and those transcribed in the opposite direction below the scale. RNA coding regions are indicated by blue boxes. The remaining boxes indicate protein coding regions (ORFs) and are colored according to function: red, identified; green, suggested; yellow, unknown but conserved in other bacteria; white, unknown and unique.

Table 2. Open reading frames (ORFs) in the 180 kb sequence.

From     To       Strand  Size (aa)  SD sequence and initiation codon  Gene/product
-29312   -30016   3'      235        AGGAGGaaaatcaaaatgATG
-26989   -28278   3'      430        GGAGGTGcacggacATG  purA (a)
-26364   -26780   3'      139        AAAGGAGaTGgtaacaatATG
-24879   -26240   3'      454        AGGAcGGTGcttagcATG  dnaC (b)
-24709   -24512   5'      66         AAgGGAGGaatgaaagATG
-24263   -24000   5'      88         AAGGAtaacggtgcGTG
-23924   -22719   5'      402        AAAaGAGGagtaactgctTTG
-22611   -20557   5'      685        AAaGAGGTGAaaaacgATG
-20068   -20514   3'      149        GGAGGcgtacagagATG  rplI (b)
-18092   -20068   3'      659        GGAGTGAtagaaATG
-17126   -18052   3'      309        GAGGTGActaaGTG
-16604   -16125   5'      160        AAAGGAGaattttcacATG  cotF (a)
-15715   -16089   3'      125        GGAGGgtcaagaATG
-15510   -14584   5'      309        AtAGGAGctatcATG
-14102   -14545   3'      148        AGGgGGcTGAagATG
-13669   -12365   5'      435        AGGAGatcatgctATG

Table 2. Continued

From     To       Strand  Size (aa)  SD sequence and initiation codon  Gene/product
-10600   -10166   5'      145        AAAGGAttgtaaaattATG
-10049   -9297    5'      251        GAGTGAgaacATG
-9304    -8597    5'      236        AtGAGGcaaaatacaATG
-8597    -7845    5'      251        AAGGAttataaaaaATG
-7845    -7192    5'      218        GAGGgcttacaATG
-6026    -6811    3'      262        GGaAGGTGAcacaaTTG
-5569    -5955    3'      129        GGgGGaattcatTTG
-5423    -4587    5'      279        AAGG-GGTGAacattaTTG
-3337    -4548    3'      404        GAGG-GAccgataaagtcccgATG
-3108    -2275    5'      278        ATG  LysR family of transcription regulator (?) (b)
-2258    -1818    5'      147        AGGAaGGattcaagcATG
-1732    -1256    5'      159        AgAGaAGGaatgattgtATG
-416     -1075    3'      220        AGGAGaTGgtttgcGTG
184      -266     3'      150        AGG-GGttttaggaATG
304      747      5'      148        AAAGGA-tTGAttcataATG
747      1349     5'      201        ATG
1968     1450     3'      173        GGcGGaGAttgatATG
2379     2732     5'      118        AAAGGAG-TGATtgagtATG
2895     3458     5'      188        AAAGGAtGaatagttATG
5344     3971     3'      458        AAtGGAGGgGAaattGTG  tetB (a)
5693     5929     5'      79         AAGaAGGTGAtcccATG
6083     6496     5'      138        GAGGgatgATG  DNA binding ? (b)
6496     7410     5'      305        AGGAaGTGgaaaacaaATG
7485     9551     5'      689        AAGaAGGTGAtcccATG  homologue in Corynebacterium sp. (b)
10450    9554     3'      299        AAAGGAaGaGAcatagaaggaATG
10676    12028    5'      451        AAAGGgGGTtttttATG
12619    12068    3'      184        GGAG-TGActtaaATG  CysE_LacA_NodL family (acetyltransferase) (b)
13017    12640    3'      126        AAGGAGGaggaagaaATG
14008    13076    3'      311        AAAGGAGaaagagATG  LacI family (b) (transcriptional repressor ?)
14825    14070    3'      252        GGAGGaTGAaacATG  exoA (b)
15135    14893    3'      81         AAAGGAGG-GAaATG  rpsR (b)
15691    15176    3'      172        AAAGGtGGtctttcttATG  ssb (b)
16019    15735    3'      95         AAAGGAGGTGcaaacagATG  rpsF (b)
17230    16133    3'      366        AgAGGAGaTGAaagaATG
19360    17360    3'      667        AAGGAGaaGAcggcgATG
20727    19714    3'      338        GAGGTGAtaGTG
21187    21801    5'      205        AAGGAaGTGAaacccATG
22691    21846    3'      282        AAAGGAaGTGgctgcgaATG  spo0J (a)
23445    22687    3'      253        AAAGtAGGTGAcatcGTG  regulation of Spo0J and Orf283 ? (a)
23693    24130    5'      146        AGGAtGGTGAgaaaATG
25035    24187    3'      283        AAAGGtGGTGtaggtacATG  DNA binding ? (a)
25876    25160    3'      239        AAAGGAtGacggcATG  gidB (function unknown), homologue in E. coli (a)
27776    25893    3'      628        AAAGGAGGaactagaatcATG  gidA (function unknown), homologue in E. coli (a)
29176    27800    3'      459        AAGaGAGGTGAacaacATG  50K (function unknown), homologue in E. coli (a)
30113    29490    3'      208        GGAGGaaagaaaaaGTG
30895    30113    3'      261        AGGAGGaaATG  spoIIIJ (a)

Table 2. Continued

From     To       Strand  Size (aa)  SD sequence and initiation codon  Gene/product
31390    31043    3'      116        GGAG-TGAgtcatTTG  rnpA (a)
31676    31545    3'      44         GGAGGTGtcataaATG  rpmH (a)
32303    33640    5'      446        GGAcGTGccggaagATG  dnaA (a)
33832    34965    5'      378        AGGAGGataaaaATG  dnaN (a)
35099    35311    5'      71         AAaGAGGTcgatataATG
35330    36439    5'      370        AAAGcgGGTGacactgaTTG  recF (a)
36460    36615    5'      52         AAtGAGGTGAgcaaTTG
36759    38672    5'      638        GtAGGTGAatgacgtggctATG  gyrB (a)
38886    41348    5'      821        GGAGGTtttttaATG  gyrA (a)
47685    46741    3'      315        AAAGGAttaagaaatatacATG
47806    49269    5'      488        AGGgGGatttactaATG  guaB (a)
49425    50753    5'      443        GGAGGTcgtacgaTTG  dacA (a)
50953    51834    5'      294        AGGgGGaccaagaaATG  homologue in M. vannielii (b)
51859    52446    5'      196        AGGAGcGcTGctgacATG
52771    54045    5'      425        AAAGGAG-TGtttcgcATG  serS (b)
55040    54390    3'      217        AAAGGAGcttatcgtATG
55660    55040    3'      207        AAGGAGGattccgATG
57042    55762    3'      427        AAAGGAGGcgtttttcattcaaatttatGTG
57657    57115    3'      181        AAAGGAGactgtcgaTTG
57743    58225    5'      161        ATG  homologue in E. coli (a)
58705    60393    5'      563        AGGAGGgcaaacccGTG  dnaH (a)
60420    60740    5'      107        AAaGAGaGTGAatgctATG  homologue in E. coli (a)
60758    61351    5'      198        AGGgGGataaaagaacATG  recR (a)
61372    61593    5'      74         AAGGAGGaaaaagcgatccATG
61663    61923    5'      87         AAGGAGaTGAgaagattcATG  bofA (c)
67422    67613    5'      64         GGAGGTGgagaagATG
67736    68347    5'      204        AGGAG-TcGAttatctcATG  xpaC (a)
68369    69526    5'      386        AAGagGGTaagagcgATG  homologue in RK2 (b)
69611    71050    5'      480        AAAGGAttgttttcATG  similar to B. subtilis DclY (b)
71050    71685    5'      212        AaGAGGaGAaatcATG
71762    72088    5'      109        AAAGGAGGccttcacagATG
72104    72541    5'      146        AAGGAtctgaaactGTG
72556    73542    5'      329        GaGaGTGAtacaATG  similar to B. subtilis DnaH (b)
73548    74372    5'      275        AGGAGGgataagcTTG
74390    74746    5'      119        GAGGTGtggaaccTTG
74808    75548    5'      247        AAAGGAtaagaagATG
75812    76687    5'      292        AAGGAGGctgtatATG
77029    76742    3'      96         GGAGGagaATG  abrB (a)
77524    79515    5'      664        AGGAGGTTTCAAGATG  metS (b)
79597    80361    5'      255        AAAGGAGtttttcacttATG
80520    81830    5'      437        AA-GAGGacactgagcttTTG
81978    82535    5'      186        GGAGGaaataATG
82531    83406    5'      292        GGAGGaacagaATG  ksgA (b)
83571    84440    5'      290        GGA-GTGAggaacGTG
84654    84911    5'      86         GAGGTGgatgcaATG  veg (function unknown) (a)
85074    85256    5'      61         AAAGGAGtTGtttcgtTTG  sspF (a)
85407    86273    5'      289        AAGtAGGTGAaagctATG
86332    87186    5'      285        GGgGGTGAgttcATG
87186    87560    5'      125        AAtGGAGagacagaatcATG  homologue in A. vinelandii (b)
87757    88047    5'      97         AAAGGtGGTGAactactGTG  spoVG (a)
88243    89610    5'      456        GGAGGccaataaATG  tms26 (a)
89636    90586    5'      317        GGAGGTttatccATG  prs (a)
90674    91285    5'      204        AGGAtGGTGctgaatATG  ctc (function unknown) (a)
91395    91958    5'      188        AAgGGAGGattcgccATG  spoVC (d)

Table 2. Continued

From     To       Strand  Size (aa)  SD sequence and initiation codon  Gene/product
92021    92248    5'      76         AGGAGGacccttgtATG
92321    95851    5'      1177       AgAGGAGGggccatATG  mfd (b)
95990    96523    5'      178        AAAGaGAGGcaccagagATG  similar to B. subtilis abrB (b)
96708    98303    5'      532        AAGGAGcttttggATG  similar to B. subtilis spoVB (b)
98296    99762    5'      489        AAGGAGaGaacaaaATG
99768    100025   5'      86         GAGGaGAtcatagatATG
100107   100406   5'      100        GGgGGcTGAatagATG
100406   101038   5'      211        ATG
101059   101433   5'      125        AAAGGAGGaccgtctggtTTG
101517   101900   5'      128        AAGGAGGagcactttttttATG
102429   104909   5'      827        GGAGaTGAgaggaATG  spoIIE (a)
104997   105731   5'      245        AGGAGGaatgaatGTG  homologue in B. megaterium (b)
105823   106713   5'      297        AAAGG-GGccaatggaattGTG
106778   108235   5'      486        GGAGGggttgaaagtgttaggatATG
108235   108774   5'      180        AGGgGGCAAGCAAAATCATG  hprt (b)
108875   110785   5'      637        AGGAGGTaaggaATG  ftsH (b)
110983   111681   5'      233        AAAaGtGGTGAtagaggTTG
111770   112642   5'      291        AGGAGGTttagtaATG
112692   113582   5'      297        AAAGG-GGcTGAaagcaggtTTG
113661   114584   5'      308        GAGGTGtcgagaATG  cysK (b)
114754   116163   5'      470        AGGAaTGAtacaaATG  pab (a)
116180   116761   5'      194        GAGGTGAgcggagaaATG  trpG (a)
116764   117642   5'      293        AAGGAagttattgcgtgATG  pabC (a)
117627   118481   5'      285        GGAGGaagagcATG  sul (a)
118477   118836   5'      120        GGAGGggtgcaccATG  homologue in E. coli (b)
118836   119336   5'      167        ATG  folK (b)
119524   120522   5'      333        AAAGGAGGaGAaaaaTTG  homologue in E. coli (b)
120617   122113   5'      499        AAtGGA-GTGAtaacaATG  lysS (b)
133339   133800   5'      154        AAAGGAGGgggttgaGTG
133817   134371   5'      185        AAGcG-GGTGAaaagaTTG
134374   135462   5'      363        AGGAGGaacaggagtaATG
135462   137891   5'      810        AGGAGGaTGAatcgatATG  clpA/clpB family (b)
137986   139359   5'      458        AAgGGAGaGgtcttacactatatATG  homologue in E. coli (b)
139366   140445   5'      360        AGGAGGataatagATG
140564   141661   5'      366        AAAGGAGGTGggggtATG  homologue in T. aquaticus (b)
141679   142374   5'      232        AAgGGAGaaGAaacaATG  homologue in R. capsulatus (b)
142370   142843   5'      158        AAAGtgGGaataaacATG  homologue in R. capsulatus (b)
142937   144385   5'      483        AAAGGAaGTatttgaaaATG  gltX (a)
144690   145340   5'      217        GGgGG-GAagcatGTG  cysA (a)
145340   146737   5'      466        AAAGGAtcaatcaaaaATG  cysS (a)
147159   147905   5'      249        GGAGGaaaacaaATG
147915   148424   5'      170        GGAGaataaagacccATG

a, characterized previously, as listed in Table 1; b, newly identified in this work, as listed in Table 3; c, according to Ricca et al.15; d, according to Igo et al.16
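The ORF sizes in Table 2 and the overlap lengths quoted in Section 3.3 can be recomputed from the Table 2 coordinates. The Python sketch below relies on conventions that are inferred from the table values rather than stated in the text, so treat them as assumptions: From is the first base of the initiation codon, To is the last base of the last sense codon (the stop codon lies just beyond To), and the coordinates skip position 0 (the row from 184 to -266 gives 150 aa only under this reading).

```python
"""Recompute ORF sizes and overlaps from Table 2 coordinates.

Assumed conventions (inferred from the table, not stated in the text):
From = first base of the initiation codon, To = last base of the last sense
codon (stop codon excluded), and coordinates run ..., -2, -1, 1, 2, ...
with no position 0.
"""


def _index(x: int) -> int:
    """Map a coordinate onto a gap-free integer axis by closing the hole at 0."""
    if x == 0:
        raise ValueError("the map has no position 0")
    return x - 1 if x > 0 else x


def orf_size_aa(frm: int, to: int) -> int:
    """Protein length in aa for an ORF spanning `frm`..`to` (either strand)."""
    n_bases = abs(_index(to) - _index(frm)) + 1
    if n_bases % 3:
        raise ValueError(f"{frm}..{to} is not a whole number of codons")
    return n_bases // 3


def overlap_bp(upstream, downstream, strand):
    """Bases shared by two same-strand ORFs, counting the upstream stop codon.

    `upstream` and `downstream` are (From, To) pairs in reading order;
    `strand` is "5'" (coordinates increase) or "3'" (coordinates decrease).
    Returns 0 when the two ORFs do not overlap.
    """
    up_to = _index(upstream[1])
    down_from = _index(downstream[0])
    if strand == "5'":
        shared = (up_to + 3) - down_from + 1  # stop codon ends 3 bases past To
    elif strand == "3'":
        shared = down_from - (up_to - 3) + 1  # stop codon ends 3 bases below To
    else:
        raise ValueError(f"unknown strand {strand!r}")
    return max(shared, 0)
```

With these conventions, the dnaA row (32303 to 33640) gives 446 aa, the adjacent 5'-strand rows -9304 to -8597 and -8597 to -7845 share 4 bp (the predominant overlap class), and the rows 116764 to 117642 and 117627 to 118481 share 19 bp.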

The remaining 12,276 bp (13%) seems to be non-coding. The largest non-coding stretch, 1,764 bp (from -12,364 to -10,601), had no apparent structural features. The second largest non-coding region, 627 bp in length, lies between rpmH and dnaA and contains multiple DnaA-box repeats and two divergent promoters for the adjacent genes. This region functions as part of the oriC of the chromosome.21 In 15 non-coding regions into which transcription is expected to extend from both ends, a putative transcription termination signal (a long inverted repeat followed by a stretch of T residues) was identified in all but one (data not shown). We have not yet performed a systematic analysis of the non-coding sequences.

Locations and directions of ORFs and structural RNA genes are illustrated in Fig. 3 together with their functional assignments, as discussed in the following section. Interestingly, for the majority of genes in this region, the orientation of transcription is the same as the direction of replication fork movement. However, in a region of roughly 28 kb starting from cotF and ending near exoA, this correlation is reversed and the direction of transcription changes frequently (Fig. 3). This region has a clearly higher AT content (60%) than the average (55%) (data not shown), suggesting that frequent gene rearrangement, including horizontal transfer from foreign species, has occurred during evolution.

[Fig. 4 plot; axis label: Size of ORF (aa)]

Fig. 4. Size distribution of 158 ORFs in the 180 kb sequence.

3.4. Functional Assignment of ORFs in the 180 kb Sequence
Among the 158 ORFs identified, sequences of 33 ORFs were reported previously and characterized genetically and/or biochemically (Table 1). Comparison of the predicted protein sequences of the ORFs against the non-redundant protein sequence data bank using the FASTA and BLAST programs revealed the function of an additional 15 ORFs, based on a high degree of similarity to known protein sequences of other bacteria (Table 3). Among them, five genes, dnaC, rplI, rpsF, ksgA, and lysS, have been

Table 3. Similarity of predicted ORF products to other proteins (newly identified ORFs, presented in map order)

ORF      Similarity to  aa identity  Length
orf454   sp:DNAB_ECOLI DNAB PROTEIN  45.4%  447 aa
         sp:DNAB_SALTY DNAB PROTEIN  44.7%  447 aa
orf149   sp:RL9_BACST 50S RIBOSOMAL PROTEIN L9 (BL17)  72.3%  148 aa
         sp:RL9_SYNEN 50S RIBOSOMAL PROTEIN L9  35.6%  149 aa
         sp:RL9_ECOLI 50S RIBOSOMAL PROTEIN L9  33.3%  150 aa
orf278   sp:GLTC_BACSU REGULATORY PROTEIN GLTC  35.1%  279 aa
         sp:CYNR_ECOLI CYNR ACTIVATORY PROTEIN  26.9%  264 aa
         sp:OXYR_ECOLI OXYR ACTIVATORY PROTEIN  28.2%  277 aa
         sp:LYSR_ECOLI LYSA ACTIVATORY PROTEIN  27.4%  230 aa
orf138   sp:MERR_THIFE MERCURIC RESISTANCE OPERON REGULATOR  30.0%  120 aa
         sp:NOLA_BRAJA NODULATION PROTEIN NOLA  27.2%  103 aa
orf689   prf:1111332B ORF (partial) - Corynebacterium sp.  44.3%  221 aa
orf184   sp:NODL_RHIME NODULATION PROTEIN L  52.5%  181 aa
         sp:THGA_ECOLI GALACTOSIDE ACETYLTRANSFERASE  40.0%  185 aa
orf311   sp:PURR_ECOLI PURINE NUCLEOTIDE SYNTHESIS REPRESSOR  26.2%  309 aa
         sp:RBSR_ECOLI RIBOSE OPERON REPRESSOR  24.7%  320 aa
         sp:CCPA_BACSU CCPA PROTEIN  28.5%  302 aa
orf252   sp:RRP1_DROME RECOMBINATION REPAIR PROTEIN 1  47.8%  255 aa
         sp:AP1_MOUSE DNA-(APURINIC OR APYRIMIDINIC SITE) LYASE  50.4%  258 aa
         sp:EXOA_STRPN EXODEOXYRIBONUCLEASE  39.4%  249 aa
         sp:EX3_ECOLI EXODEOXYRIBONUCLEASE III  29.8%  258 aa
orf81    sp:RS18_BACST 30S RIBOSOMAL PROTEIN S18 (BS21)  87.0%  77 aa
         sp:RS18_ECOLI 30S RIBOSOMAL PROTEIN S18  60.0%  65 aa
orf172   sp:SSB_ECOLI SINGLE-STRAND BINDING PROTEIN (SSB)  38.3%  175 aa
         sp:SSB_SERMA SINGLE-STRAND BINDING PROTEIN (SSB)  36.7%  177 aa
         sp:SSB_PROMI SINGLE-STRAND BINDING PROTEIN (SSB)  32.8%  174 aa

Table 3. Continued

ORF Similarity to aa identity Length orf95 sp:RS6J3COLI 30S RIBOSOMAL PROTEIN S6 29.5% 95 aa sp:RS6.BACSU 30S RIBOSOMAL PROTEIN S6 (BS9) (FRAGMENT) 100.0% 16 aa orf294 pir:S28731 hypothetical protein - Methanococcus vannielii 64.1% 220 aa orf425 sp:SYS-ECOLI SERYL-TRNA SYNTHETASE 52.1% 428 aa sp:SYSC.YEAST SERYL-TRNA SYNTHETASE, CYTOPLASMIC 34.8% 396 aa orf386 pir:B38178 telA protein - plasmid RK2 30.6% 340 aa orf480 sp:DCLY.BACSU LYSINE DECARBOXYLASE 33.5% 486 aa sp:DCLY_ECOLI LYSINE DECARBOXYLASE 22.1% 312 aa orf329 sp:DP3X_BACSU DNA POLYMERASE III SUBUNITS 24.1% 319 aa sp:HOLB-ECOLI DNA POLYMERASE III, DELTA' SUBUNIT 22.7% 326 aa orf664 sp:SYM.BACST METHIONYL-TRNA SYNTHETASE 71.6% 662 aa sp:SYM.ECOLI METHIONYL-TRNA SYNTHETASE 30.3% 524 aa orf292 pir:B42473 rRNA methylase ermK - Bacillus licheniformis 27.4% 219 aa sp:MLSB_ECOLI RRNA ADENINE N-6-METHYLTRANSFERASE 28.8% 191 aa orfl25 pir:B44514 Hypothetical protein 1 (vnfA 5' region) - A. vinelandii 49.2% 126 aa orfll77 sp:MFD-ECOLI TRANSCRIPTION-REPAIR COUPLING FACTOR 35.6% 1148 aa sp:RECG_ECOLI PROBABLE ATP-DEPENDENT DNA- RECG 32.6% 485 aa orfl78 sp:ABRB_BACSU TRANSCRIPTION REGULATION ABRB PROTEIN 68.6% 51 aa orf532 sp:SP5B.BACSU STAGE V SPORULATION PROTEIN B 23.7% 527 aa orf245 pir:S19193 Hypothetical protein 3/2 - Bacillus megaterium 65.9% 246 aa orfl80 sp:HPRT.VIBHA HYPOXANTHINE-GUANINE PHOSPHORIBOSYLTRANSFERASE 50.9% 175 aa sp:HPRT.MOUSE HYPOXANTHINE-GUANINE PHOSPHORIBOSYLTRANSFERASE 37.2% 180 aa orf637 sp:FTSH_ECOLI CELL DIVISION PROTEIN FTSH 51.7% 601 aa pir:S28533 tma protein - Lactococcus lactis 57.5% 562 aa gp:YSCAYME_l putative ATPase (Saccharomyces cerevisiae) 39.7% 521 aa orf308 sp:CYSK.ECOLI CYSTEINE SYNTHASE A 54.4% 305 aa sp:CYSK_SALTY CYSTEINE SYNTHASE A 53.1% 305 aa sp:CYSK.SPIOL CYSTEINE SYNTHASE A 56.5% 306 aa orfl20 sp:YGIG.ECOLI HYPOTHETICAL 12.5 KD PROTEIN IN BACA 5'REGION 29.0% 107 aa orfl67 sp:HPPK.ECOLI 2-AMINO-4-HYDROXY-6-HYDROXYMETHYLDIHYDRO 47.8% 134 aa orf333 sp:YHDG_ECOLI 
HYPOTHETICAL 35.9 KD PROTEIN IN FIS 5'RE 40.4% 319 aa gp:RCNIFR3.2 ni£R3 gene product (Rhodobacter capsulatus) 36.8% 315 aa orf499 sp:SYKl_ECOLI LYSYL-TRNA SYNTHETASE 52.6% 498 aa sp:SYK2_ECOLI LYSYL-TRNA SYNTHETASE, HEAT INDUCIBLE 53.0% 500 aa sp:SYKC.YEAST LYSYL-TRNA SYNTHETASE, CYTOPLASMIC 42.5% 506 aa orf810 sp:YATP.MYCLE HYPOTHETICAL PROTEIN OF THE ATP OPERON 64.0% 647 aa sp:CLPB_ECOLI CLPB PROTEIN (HEAT SHOCK PROTEIN F84.1) 31.6% 807 aa sp:CD4B_LYCES CD4B PROTEIN PRECURSOR 59.5% 810 aa sp:CLPB_BACNO CLPB HOMOLOG PROTEIN 33.8% 684 aa orf458 sp:SMS_ECOLI SMS PROTEIN 46.5% 456 aa orf366 sp:YSCl_THEFL HYPOTHETICAL PROTEIN IN SCSB 5'REGION 45.0% 80 aa orfl58 gp:RCNIFR3.1 R.capsulatus nifR3 DNA (Rhodobacter capsulatus) 49.0% 151 aa

mapped genetically in this region. However we have not synthesis and maintenance of DNA were found in the 180 yet confirmed experimentally the identify of the assigned kb sequence, 3 dna genes, 2 rec genes, 2 genes of DNA ORF with the mapped gene. A data bank search also gyrase subunits, exoA, ssb and mfd. Genes involved in suggested the function of 11 ORFs through significant protein synthesis were also abundant: 4 ribosomal pro- similarities to known protein sequences (Table 3). The tein genes, 5 aminoacyl-tRNA synthetase genes, 20 tRNA functions of remaining 99 ORFs are unknown, of which genes and 4 rRNA gene sets. These kinds of genes are not 15 are structurally conserved (more than 30% identical found in the 97 kb region sequenced by the Pasteur In- 17 amino acids in the whole sequence) in other bacteria (Ta- stitute group, suggesting unequal distribution of genes ble 3). It should be noted that many genes related to the on the genome as to their functions. 12 Sequence of the 180 kb Region of the B. Subtilis Chromosome [Vol. 1, mentally using cloned ORFs. Therefore it is essential Table 4. Function of ORFs in the 180 kb sequence. to develop analytical means to deduce functions directly Function Conservation Number from sequence data in parallel with the experimental ap- in other bacteria proach. Such information as functional domain and ac- known yes 28 tive centers, etc., will accelerate the speed of elucidation no 5 of function of unknown ORFs by genetic and biochemical newly identified yes 15 methods. total 48 (30%) suggested yes 8 3.5. Comparison with E. coli Genome no 3 total 11 (7%) Among the 158 ORFs discussed in this report, coun- unknown yes 15 terparts on the E. coli genome have been identified for 43 no 84 ORFs (27%), as listed in Table 5. This number will in- total 99 (63%) crease as genome sequences become known for both bac- TOTAL 158 (100%) teria. We have found that genes and their organization were conserved in B. subtilis and E. 
coli around oriC.7 Operonic structure of dnaH - recR was also the same in both bacteria.13 Among the newly identified genes, an Analysis of the function of ORFs is summarized in Ta- operonic structure containing rpsF and rpsR was found ble 4. The percentage of ORFs with unknown function to be similar in the two bacteria. In E. coli, rpsF, orfl04 (about half) is the same as that of 200 ORFs of Sac- : 22 rpsR and rpll constitute an operon in this order, while charomyces cerevisiae chromosme III and that of 287 11 19 20 ssb was found in place of orfl04, and rpll was located ORFs in a 325 kb stretch of E. coli genome. ' - The some 35 kb apart in B. subtilis. However, the locations number of unidentified ORFs reflects our limited knowl- of these conserved regions relative to oriC are completely edge on structure and function of even bacterial genomes. different. Other counterparts in E. coli also appear to We are planning a systematic genetic and biochemical be distributed randomly throughout the whole E. coli analysis of unidentified ORFs. However it is not an easy genome (Table 5). task to determine functions of some 100 genes experi-
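One way to make the operon comparison concrete is to ask which genes two clusters share in the same relative order. The sketch below does this with a longest-common-subsequence (LCS) computation. LCS is our illustrative choice, not a method used in this study, and `conserved_order` is a hypothetical helper. The gene lists are transcribed from the text above. rplI is left out of the B. subtilis list because it lies some 35 kb away from the rpsF-ssb-rpsR cluster.

```python
from typing import List, Sequence


def conserved_order(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return one longest common subsequence of two gene orders.

    Genes present in only one list (e.g. orf104 vs ssb) are skipped,
    so the result is the set of shared genes kept in the same order.
    (Hypothetical helper for illustration.)
    """
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forwards to recover one LCS.
    out: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


# Gene orders as described in the text (assumed transcription).
ecoli_operon = ["rpsF", "orf104", "rpsR", "rplI"]
bsub_cluster = ["rpsF", "ssb", "rpsR"]
print(conserved_order(ecoli_operon, bsub_cluster))  # ['rpsF', 'rpsR']
```

LCS ignores genes that occur in only one list, so replacing orf104 with ssb does not break the shared rpsF-rpsR order. It cannot, however, detect that rplI has moved elsewhere; such relocations would have to be recorded separately.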

Table 5. Genes in the 180 kb sequence conserved in Escherichia coli.
Product | B. subtilis gene (a) | E. coli gene | Location (b) | Similarity (identical aa)
Adenylosuccinate synthetase | purA | purA | 95.03 | 47%
Replicative DNA helicase | dnaC | dnaB | 91.99 | 42%
Ribosomal protein L9 | rplI | rplI | 95.48 | 33%
3'-exo-deoxyribonuclease | exoA* | xthA | 39.11 | 26%
Ribosomal protein S18 | rpsR* | rpsR | 95.48 | 49%
Single strand DNA binding protein | ssb* | ssb | 92.25 | 34%
Ribosomal protein S6 | rpsF | rpsF | 95.46 | 23%
unknown | gidB* | gidB | 84.80 | 29%
unknown | gidA* | gidA | 84.81 | 52%
Thiophen and furan oxidation | thdF* | thdF | 83.97 | 28%
unknown | spoIIIJ | 60K | 83.8* | 31%
Protein component of ribonuclease P | rnpA* | rnpA | 83.87 | 26%
Ribosomal protein L34 | rpmH* | rpmH | 83.87 | 63%
Initiation protein of DNA replication | dnaA | dnaA | 83.82 | 42%
β subunit of DNA polymerase III | dnaG | dnaN | 83.80 | 24%
Recombination protein | recF | recF | 83.78 | 22%
B subunit of DNA gyrase | gyrB | gyrB | 83.72 | 47%
A subunit of DNA gyrase | gyrA | gyrA | 49.72 | 51%
IMP dehydrogenase | guaB | guaB | 55.80 | 56%
D-alanine carboxypeptidase (PBP-5) | dacA | dacA | 14.39 | 24%
Seryl-tRNA synthetase | serS* | serS | 20.19 | 51%
unknown | orf161* | orf178 | 57.1* | 40%
Subunit of DNA polymerase III | dnaH | dnaZX | 10.68 | 29%
unknown | orf107* | orf109 | 10.7* | 39%
Recombination protein | recR | recR | 10.73 | 45%
Methionyl-tRNA synthetase | metG* | metG | 46.67 | 28%
S-adenosylmethionine-6-N',N'-adenosyl dimethyltransferase | ksgA | ksgA | 1.10 | 28%
Phosphoribosyl pyrophosphate synthetase | prs | prs | 27.03 | 50%
Temperature-sensitive cell division | tms26 | ecourf (c) | 84.6* | 42%
Transcription-repair coupling factor | mfd* | mfd | nd | 35%
Cell division protein | ftsH* | ftsH | 69.3* | 47%
Cysteine synthetase | cysK* | cysK | 54.00 | 48%
p-Aminobenzoate synthetase component I | pab | pabB | 40.58 | 31%
p-Aminobenzoate synthetase component II | trpG | pabA | 75.46 | 58%
Dihydropteroate pyrophosphorylase | sul | folP | nd | 38%
unknown | orf120* | orf111 | nd | 26%
7,8-dihydro-6-hydroxymethylpterin pyrophosphokinase | folK* | folK | nd | 40%
unknown | orf333* | orf321 | 73.7* | 36%
Lysyl-tRNA synthetase | lysS | lysS | 64.59 | 52%
unknown | orf458* | orf460 | 99.68 | 46%
Glutamyl-tRNA synthetase | gltX | gltX | 53.66 | 34%
Serine acetyltransferase | cysA | cysE | 81.64 | 28%
Cysteinyl-tRNA synthetase | cysS | cysS | 12.02 | 44%

(a) Genes marked with * were identified on the basis of sequence similarity.
(b) Map position in min, according to Medigue et al.23 Positions marked with * are approximate; nd, not determined.
(c) Burland et al.11 reported that this ORF is split into two ORFs by a single-base addition.
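To see how scattered the E. coli counterparts in Table 5 are, one can list their distances from a reference point on the circular 100-minute E. coli map. The sketch below makes several assumptions. It uses only a handful of Table 5 rows. dnaA (83.82 min) serves as a rough stand-in for the origin region. `circular_distance` is a hypothetical helper. The sketch only tabulates distances and is not a statistical test of randomness.

```python
from typing import Dict

# E. coli map positions (min) of a few counterparts, copied from Table 5.
ECOLI_MAP_MIN: Dict[str, float] = {
    "dnaA": 83.82,
    "gyrA": 49.72,
    "ksgA": 1.10,
    "serS": 20.19,
    "lysS": 64.59,
    "cysS": 12.02,
}

MAP_LENGTH_MIN = 100.0  # the E. coli linkage map is scaled to 100 min


def circular_distance(a: float, b: float, length: float = MAP_LENGTH_MIN) -> float:
    """Shortest distance between two positions on a circular map (hypothetical helper)."""
    d = abs(a - b) % length
    return min(d, length - d)


# Assumption: dnaA's map position is used only as a proxy for the origin region.
ref = ECOLI_MAP_MIN["dnaA"]
for gene, pos in sorted(ECOLI_MAP_MIN.items(), key=lambda kv: kv[1]):
    print(f"{gene:5s} {pos:6.2f} min  {circular_distance(ref, pos):5.2f} min from dnaA")
```

Taking the shorter arc matters on a circular map. For example, ksgA at 1.10 min is about 17 min from dnaA across the 0/100 min junction, not about 83 min.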

Acknowledgments: We thank Drs. Antoine Danchin, Rivka Rudner and Juan C. Alonso for sharing data before publication, and Nobuyuki Fujita for help with the sequence analysis. This work was supported by a Grant-in-Aid for Creative Research (Human Genome Program) and for Special Project Research (Genome Informatics) from the Ministry of Education, Science and Culture, Japan.

References

1. Bachmann, B. J. 1990, Linkage map of Escherichia coli K-12, edition 8, Microbiol. Rev., 54, 130-197.
2. Anagnostopoulos, C., Piggot, P. J., and Hoch, J. A. 1993, The genetic map of Bacillus subtilis. In: Sonenshein, A. L., Hoch, J. A., Losick, R. (eds.) Bacillus subtilis and other Gram-positive bacteria. American Society for Microbiology, Washington, D.C., pp. 425-461.
3. Ogasawara, N. and Kobayashi, Y. 1991, Genes and proteins for cell growth and division in Bacillus subtilis. In: Ishihama, A., Yoshikawa, H. (eds.) Control of cell growth and division. Japan Scientific Societies Press, Tokyo, pp. 225-234.
4. Kunst, F. and Devine, K. 1991, The project of sequencing the entire Bacillus subtilis genome, Res. Microbiol., 142, 905-912.
5. Moriya, S., Ogasawara, N., and Yoshikawa, H. 1985, Structure and function of the region of the replication origin of the Bacillus subtilis chromosome. III. Nucleotide sequence of some 10,000 base pairs in the origin region, Nucleic Acids Res., 13, 2251-2265.
6. Ogasawara, N. and Yoshikawa, H. 1992, Genes and their organization in the replication origin region of the bacterial chromosome, Mol. Microbiol., 6, 629-634.
7. Yoshikawa, H. and Ogasawara, N. 1991, Structure and function of DnaA and the DnaA-box in eubacteria: evolutionary relationships of bacterial replication origins, Mol. Microbiol., 5, 2589-2597.
8. Itaya, M. and Tanaka, T. 1991, Complete physical map of the Bacillus subtilis 168 chromosome constructed by a gene-directed mutagenesis method, J. Mol. Biol., 220, 631-648.
9. Sambrook, J., Fritsch, E. F., and Maniatis, T. 1989, Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Laboratory, New York, pp. 13.21-13.33.
10. Ogasawara, N., Moriya, S., and Yoshikawa, H. 1983, Structure and organization of rRNA operons in the region of the replication origin of the Bacillus subtilis chromosome, Nucleic Acids Res., 11, 6301-6318.
11. Burland, V., Plunkett, G., Daniels, D. L., and Blattner, F. R. 1993, DNA sequence and analysis of 136 kilobases of the Escherichia coli genome: organizational symmetry around the , Genomics, 16, 551-561.
12. Errington, J., Appleby, L., Daniel, A., Goodfellow, H., Partridge, S. R., and Yudkin, M. D. 1992, Structure and function of the spoIIIJ gene of Bacillus subtilis: a vegetatively expressed gene that is essential for sigma-G activity at an intermediate stage of sporulation, J. Gen. Microbiol., 138, 2609-2618.
13. Alonso, J. C., Shirahige, K., and Ogasawara, N. 1990, Molecular cloning, genetic characterization and DNA sequence analysis of the recM region of Bacillus subtilis, Nucleic Acids Res., 18, 6771-6777.
14. Yamada, Y., Ohki, M., and Ishikura, H. 1983, The nucleotide sequence of Bacillus subtilis tRNA genes, Nucleic Acids Res., 11, 3037-3045.
15. Ricca, E., Cutting, S., and Losick, R. 1992, Characterization of bofA, a gene involved in intercompartmental regulation of pro-σK processing during sporulation in Bacillus subtilis, J. Bacteriol., 174, 3177-3184.
16. Igo, M., Lampe, M., and Losick, R. 1988, Structure and regulation of a Bacillus subtilis gene that is transcribed by the EσB form of RNA polymerase holoenzyme. In: Ganesan, A. T., Hoch, J. A. (eds.) Genetics and Biotechnology of Bacilli, Academic Press, San Diego, pp. 151-156.
17. Glaser, P., Kunst, F., Arnaud, M. et al. 1993, Bacillus subtilis genome project: cloning and sequence of the 97 kilobase region from 325° to 333°, Mol. Microbiol., 10, 371-384.
18. Yura, T., Mori, H., Nagai, H. et al. 1992, Systematic sequencing of the Escherichia coli genome: analysis of the 0-2.4 min region, Nucleic Acids Res., 20, 3305-3308.
19. Daniels, D. L., Plunkett, G., Burland, V., and Blattner, F. R. 1992, Analysis of the Escherichia coli genome: DNA sequence of the region from 84.5 to 86.5 minutes, Science, 257, 771-778.
20. Plunkett, G., Burland, V., Daniels, D. L., and Blattner, F. R. 1993, Analysis of the Escherichia coli genome. 3. DNA sequence of the region from 87.2 to 89.2 minutes, Nucleic Acids Res., 21, 3391-3398.
21. Moriya, S., Atlung, T., Hansen, F. G., Yoshikawa, H., and Ogasawara, N. 1992, Cloning of an autonomously replicating sequence (ars) from the Bacillus subtilis chromosome, Mol. Microbiol., 6, 309-315.
22. Oliver, S. G., van der Aart, Q. J. M., Agostoni-Carbone, M. L. et al. 1992, The complete sequence of yeast chromosome III, Nature, 357, 38-46.
23. Medigue, C., Viari, A., Henaut, A., and Danchin, A. 1993, Colibri: a functional data base for the Escherichia coli genome, Microbiol. Rev., 57, 623-654.