Synthesis of a Contiguous 32-Kb Polyketide Synthase Gene Cluster
BIOCHEMISTRY

Total synthesis of long DNA sequences: Synthesis of a contiguous 32-kb polyketide synthase gene cluster

Sarah J. Kodumal, Kedar G. Patel, Ralph Reid, Hugo G. Menzella, Mark Welch, and Daniel V. Santi*

Kosan Biosciences, Inc., 3832 Bay Center Place, Hayward, CA 94545

Communicated by Robert M. Stroud, University of California, San Francisco, CA, September 17, 2004 (received for review July 19, 2004)

To exploit the huge potential of whole-genome sequence information, the ability to efficiently synthesize long, accurate DNA sequences is becoming increasingly important. An approach proposed toward this end involves the synthesis of ≈5-kb segments of DNA, followed by their assembly into longer sequences by conventional cloning methods [Smith, H. O., Hutchinson, C. A., III, Pfannkoch, C. & Venter, J. C. (2003) Proc. Natl. Acad. Sci. USA 100, 15440–15445]. The major current impediment to the success of this tactic is the difficulty of building the ≈5-kb components accurately, efficiently, and rapidly from short synthetic oligonucleotide building blocks. We have developed and implemented a strategy for the high-throughput synthesis of long, accurate DNA sequences. Unpurified 40-base synthetic oligonucleotides are built into 500- to 800-bp “synthons” with low error frequency by automated PCR-based gene synthesis. By parallel processing, these synthons are efficiently joined into multisynthon ≈5-kb segments by using only three endonucleases and “ligation by selection.” These large segments can subsequently be assembled into very long sequences by conventional cloning. We validated the approach by building a synthetic 31,656-bp polyketide synthase gene cluster whose functionality was demonstrated by its ability to produce the megaenzyme and its polyketide product in Escherichia coli.

The chemical synthesis of genes and genomes has received considerable attention for several decades and is becoming increasingly important in the exploitation of whole-genome sequence information. The field was pioneered by Khorana and coworkers with the then-heroic total synthesis of tRNA structural genes (1, 2) and by Itakura et al. (3) with the synthesis and expression of the somatostatin gene. Since then, DNA synthesis methodology has made steady progress, with current approaches relying on the enzyme-catalyzed assembly of short, chemically synthesized oligonucleotides. Of the various methods, polymerase cycling assembly (PCA) (4) is the most widely used because of its inherent simplicity. Overlapping, complementary oligonucleotides are annealed and recursively elongated with a heat-stable DNA polymerase to ultimately yield a full-length sequence, which is then amplified by conventional PCR. PCA, first reported for synthesis of the 303-bp HIV-2 Rev gene (5), has since evolved (6–8) into a widely used general method for synthesis of genes of up to ≈1 kb. The 1-kb size barrier was broken in 1990 by Mandecki et al. (9), who synthesized a 2.1-kb plasmid by ligation of 30 fragments, and again in 1995 when Stemmer et al. (7) reported the one-step PCA synthesis of a 2.7-kb plasmid that was purified by antibiotic selection. Smith et al. (4) assembled the 5,386-bp ΦX174 bacteriophage genome from a single pool of chemically synthesized oligonucleotides by using a combination of ligation and PCA methods, but purification of the product again required biological selection. In 2002, Cello et al. (10) described a stepwise synthesis of a 7,558-bp poliovirus cDNA by ligation and PCA. This sequence appears to be the longest synthetic DNA reported to date. Visionaries have even projected application of DNA synthesis technology to build synthetic, minimal genomes (11). If such goals are to be realized, methods will be needed to prepare long, contiguous, and perfect sequences of DNA without requiring biological selection for purification.

Our efforts to this end stemmed from a desire to develop heterologous expression of large polyketide synthase (PKS) genes in Escherichia coli. Type I modular PKS genes encode the giant enzymes (among the largest proteins known) that synthesize polyketide natural products such as erythromycin, epothilone, and tacrolimus (12). These genes reside within the high-G+C genomes of the actinomycete and myxobacterial groups of prokaryotes and encode proteins with multiple sets, or modules, of active sites (domains). Each module catalyzes the assembly of a specific two-carbon-unit component of the polyketide product. We sought to recreate PKS genes with the twin objectives of optimizing their codon composition for efficient expression in E. coli and introducing common restriction sites flanking modules and domains that would permit facile interchangeability, thus exploiting the full potential of combinatorial biosynthesis of “unnatural natural products” (12).

Smith et al. (4) proposed building very long DNA sequences by synthesis of ≈5-kb segments of DNA from short synthetic oligonucleotides, followed by their assembly into longer sequences by conventional methods. However, methodologies for preparing segments of ≈5 kb were not sufficiently accurate or facile to enable implementation of the approach. For the large number of sequences to be synthesized in our project, we therefore excluded the possibility of one-step synthesis of the ≈5-kb segments, because this would have required time-consuming, manual correction of numerous errors (4). Instead, we developed methods to build them in two error-free steps. First, we constructed multiple perfect sequences ≈500 bp in length called “synthons”† (13); then we used a facile method, dubbed ligation by selection (LBS), to connect them into multisynthon segments of ≈5,000 bp. These segments, in turn, were readily assembled into larger sequences by conventional cloning strategies, as illustrated by our construction of a contiguous, synthetic 31.7-kb PKS gene cluster.

Materials and Methods

Enzymes were obtained from New England Biolabs, unless otherwise noted, and used as recommended. Standard molecular biology protocols were used (14). The pUC18-derived plasmids pKOS239-172-2 and pKOS293-172-A76 were reported in ref. 15. E. coli DH5α was made chemically competent with a kit from Zymogen Research (Orange, CA). Oligonucleotides were from Qiagen/Operon Technologies (Alameda, CA). NTPs were PCR-grade from Roche Applied Sciences. DNA sequencing was performed on an ABI 3730 DNA analyzer (Applied Biosystems) according to the manufacturer’s recommended protocol.

Abbreviations: DEBS, 6-deoxyerythronolide B synthase; EF, error frequency; LBS, ligation by selection; LIC, ligation-independent cloning; PCA, polymerase cycling assembly; PKS, polyketide synthase; TU, transcription unit; UDG/LIC, uracil DNA glycosidase/ligation-independent cloning.

Data deposition: The sequences in this paper have been deposited in the GenBank database (accession nos. AY661566 and AY771999).

*To whom correspondence should be addressed. E-mail: [email protected].

†In synthetic chemistry, synthons are defined as “structural units within a molecule which are related to possible synthetic operations” (13).

© 2004 by The National Academy of Sciences of the USA

www.pnas.org/cgi/doi/10.1073/pnas.0406911101 PNAS | November 2, 2004 | vol. 101 | no. 44 | 15573–15578

For uracil DNA glycosidase/ligation-independent cloning (UDG/LIC), the forward primer was 5′-GCUAUAUCGCUAUCGAUGAGCUGCCACTGAGCACCAACTACG, and the reverse primer was 5′-GCUAGUGAUCGAUGCAUUGAGCUGGCACTTCGCTCACTACACC.

to provide cohesive 4-nt 5′ overhangs for ligation. Oligonucleotides of 40 bases were synthesized that collectively encoded both strands of the insert, each having 20-nt overlaps with 40-mer oligos from the opposite strand. The single-stranded 5′ overhangs were allowed to
vary in size and filled during assembly.

Gene synthesis and sequencing were assisted by an integrated automation system consisting of a BioMek FX, a robotic ORCA arm, and a tip lift (Beckman Coulter), a plate sealer and piercer (Velocity11, Palo Alto, CA), two tetrad thermal cyclers (MJ Research, Cambridge, MA), and a cytomat hotel (Kendro, Asheville, NC).

Vector Construction. The BsaI sites in the apramycin-resistance (ApR) genes of pKOS293-172-2 and pKOS293-172-A76 were changed to GAGATC by PCR-based site-directed mutagenesis (SDM) (16) to give pKOS309-52 [ApR and chloramphenicol-resistance (CmR)] and pKOS309-53 [ApR and kanamycin-resistance (KmR)], respectively. The tetracycline-resistance gene (TetR), obtained from pACYC184 by PCR, was introduced into the EcoRV site of pKOS309-52 with the 5′ end of the gene adjacent to the ApR gene to generate pKOS399-16-78 (ApR, CmR, and TetR). By using PCR-based site-directed mutagenesis, the BbsI site in the TetR gene of pKOS399-16-78 was changed to GTATTC to give pKOS399-21-1, and the EcoRI site in the CmR gene was changed to GAGTTC to give pKOS399-51-1.

Synthon Synthesis. Oligonucleotide consolidation and assembly. To each well of a microtiter plate was added 5 µl of a 50 µM solution (250 pmol) of each of the oligonucleotide components of a synthon, and sufficient water was added to double the volume. For synthons up to 1 kb, each well of the “assembly” microtiter plate was loaded with 48 µl of a stock solution containing 0.5 µl of Expand High Fidelity polymerase (5 units/µl, Roche), 1.0 µl of 10 mM dNTPs, 5.0 µl of 10× PCR buffer, 3.0 µl of 25 mM MgCl2, and 38.5 µl of water. To separate wells of the assembly plate, 2.0 µl of each oligonucleotide mixture was added. For synthons >1 kb, additional oligonucleotide mixture was added to keep the final concentration of individual oligonucleotides at 1 µM. Thermal cycling began with a 5-min denaturing step at 95°C and continued with 25 cycles of 95°C for 30 s, 50°C for 30 s, and 72°C for 90 s.

Amplification. Each well of the “amplification” microtiter plate was loaded with 48.75 µl of a stock solution containing 0.5 µl of Expand High Fidelity polymerase (5 units/µl, Roche), 1.0 µl of 10 mM dNTPs, 5.0 µl of 10× PCR buffer, 3.0 µl of 25 mM MgCl2, 39.25 µl