Proc. Natl. Acad. Sci. USA Vol. 75, No. 12, pp. 5765-5769, December 1978

Chemical synthesis of genes for human insulin
(synthetic genes/oligonucleotide synthesis/phosphotriester method/high-performance liquid chromatography)
ROBERTO CREA*, ADAM KRASZEWSKI, TADAAKI HIROSE, AND KEIICHI ITAKURA
Division of Biology, City of Hope National Medical Center, Duarte, California 91010
Communicated by Ernest Beutler, October 2, 1978

ABSTRACT A rapid chemical procedure has been developed and used for the synthesis of 29 oligodeoxyribonucleotides to build synthetic genes for human insulin. The gene for insulin B chain, 104 base pairs, and the one for A chain, 77 base pairs, were designed from the amino acid sequence of human polypeptides. They bear single-stranded cohesive termini for the EcoRI and BamHI restriction endonucleases and are designed to be inserted separately into a pBR322 plasmid. The synthetic fragments, deca- to pentadecanucleotides, were synthesized by a block phosphotriester method with trinucleotides as building blocks. Final purification was by high-performance liquid chromatography. All 29 oligonucleotides were pure and had the correct sequences.

In a previous paper we reported production of a functional peptide hormone, somatostatin, in Escherichia coli from a gene of chemically synthesized origin (1). This result demonstrated that a gene, designed from an amino acid sequence, can be synthesized and expressed to produce the peptide in E. coli. However, extension of this to production of biomedically valuable polypeptides such as human insulin and growth hormone seemed to be limited by the difficulty of synthesizing longer genes in a reasonable time.

Recent improvements in the synthesis of oligodeoxyribonucleotides by the phosphotriester method, such as the rapid synthesis of trimers (2) and the block synthesis of oligonucleotides (3), together with the extensive use of high-performance liquid chromatography for analysis and purification of the DNA fragments, have dramatically reduced the time necessary for construction of DNA fragments. In this paper we report the synthesis of 29 oligodeoxyribonucleotides of defined sequence that can be assembled to form genes for human insulin B and A chains.

The overall design of the gene for its incorporation into the plasmid pBR322 was similar to that used for the somatostatin gene (1). Fig. 1 shows the amino acid and nucleotide sequences for human insulin A and B chains. These genes will be fused to the E. coli β-galactosidase gene on plasmid pBR322. Transformation of E. coli with the chimeric plasmid DNA will lead to the synthesis of hybrid polypeptides, including the sequence of amino acids corresponding to human insulin A and B chains. Because human insulin has no methionine residues in the amino acid sequence, the A and B chains can be obtained by cyanogen bromide cleavage of the precursors. After the separate production and purification of the A and B chains, active human insulin will be generated by in vitro formation of the correct disulfide bonds between A and B chains (4, 5).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.
* Present address: Division of [...], Genentech, Inc., 460 Point San Bruno Boulevard, South San Francisco, CA 94080.

MATERIALS AND METHODS

Chemical synthesis of oligodeoxyribonucleotides

Most of the materials and methods used for the synthesis of oligodeoxyribonucleotides were described earlier (6, 7) except for the following modifications.

(i) The fully protected mononucleotides, 5'-O-dimethoxytrityl-3'-p-chlorophenyl-β-cyanoethyl phosphates, were synthesized from the nucleoside derivatives with a new monofunctional phosphorylating agent, p-chlorophenyl-β-cyanoethyl phosphorochloridate (1.5 molar equivalents) (unpublished data), in acetonitrile in the presence of 1-methylimidazole (8). The products were isolated in large scale (100-30 g) by preparative liquid chromatography (Prep 500 LC, Waters Assoc., Milford, MA).

(ii) By using the solvent extraction method (2), we synthesized 32 bifunctional trimers (see Table 1) at 5-10 mmol and 13 trimers, 3 tetramers, and 4 dimers as the 3'-terminus blocks at about 1 mmol. The homogeneity of the fully protected trimers was checked by thin-layer chromatography on silica gel in two methanol/chloroform solvent systems: solvent a, 5% (vol/vol), and solvent b, 10% (vol/vol) (see Table 1). Starting from these compounds, we synthesized 29 oligodeoxyribonucleotides of defined sequence (Table 2), 18 for the B-chain gene and 11 for the A-chain gene. The scheme and the reaction conditions for the condensation of trinucleotides to build dodecanucleotides are shown in Fig. 2.

Purification and identification of synthetic oligodeoxyribonucleotides

High-performance liquid chromatography was used extensively during synthesis of the oligonucleotides for analysis of each trimer and tetramer block, analysis of the intermediate fragments (hexamers, nonamers, and decamers), analysis of the last coupling reaction, and purification of the final products (see Fig. 3). The chromatography was performed with a Spectra-Physics 3500B liquid chromatograph. After removal of all protecting groups by concentrated NH4OH at 50°C (6 hr) and 80% AcOH at room temperature (15 min), the compounds were analyzed on a Permaphase AAX (Du Pont) (9) column (1 m × 2 mm) with a linear gradient of buffer B (0.05 M KH2PO4/1.0 M KCl, pH 4.5) in buffer A (0.01 M KH2PO4, pH 4.5). The gradient was formed by starting with buffer A and applying 3% of buffer B per min. The elution was at 60°C, with a flow rate of 2 ml/min. The yield of each oligonucleotide, calculated by high-performance liquid chromatographic analysis, and the retention time of the pure fragments are reported in Table 2. The 29 final oligonucleotides were also purified on Permaphase AAX under the same conditions reported above. The material in the desired peak was pooled, desalted by dialysis, and lyophilized. After the 5' termini were labeled with [γ-32P]ATP by using T4 polynucleotide kinase (10), the homogeneity of each oligonucleotide was checked by electrophoresis on a 20% polyacrylamide gel. Most of the sequences were confirmed by two-dimensional sequence analysis (10) of their partial venom phosphodiesterase digestion products (Fig. 4).

Table 1. Synthesis of trimer building blocks

No.  Compound*  Yield,† %  RF (a)  RF (b)  Purity,‡ %  Present in
 1   AAG        47         0.15    0.40    93          B5, B6
 2   AAT        49         0.25    0.52    95          H1, A1, A6
 3   AAC        52         0.28    0.55    93          H5, B6, A2, A8
 4   ACT        43         0.27    0.53    91          B4, B5, A6
 5   ACC        56         0.33    0.30    96          B7
 6   ACG        39         0.18    0.45    90          H5, B7
 7   AGG        45         0.10    0.26    89          H6, H7, B9
 8   AGT        33         0.14    0.40    96          B9, A2, A11
 9   AGC        50         0.19    0.48    92          H8, B1, A5, A10
10   AGA        48         0.24    0.50    91          A9
11   TTC        44         0.26    0.52    95          B4, B7, A3
12   TTG        49         0.11    0.31    94          H3, H5, A2, A3, A5
13   TCT        58         0.24    0.49    96          A4
14   TCA        45         0.28    0.53    92          H1, H2, H4, A1
15   TCG        39         0.12    0.34    91          A2
16   TGG        32         0.10    0.28    87          H3, A1, A10
17   TGC        51         0.18    0.47    93          H6, B2, A4, A7, A8
18   TGA        46         0.12    0.37    94          H7
19   TAC        61         0.22    0.50    90          B4, A11
20   TAA        55         0.17    0.44    95          B5, A10
21   CCT        53         0.30    0.55    97          H3, H4, B10
22   CAC        47         0.25    0.51    92          A3
23   CAA        58         0.25    0.51    93          H2, H6, H8, A7
24   CTT        41         0.28    0.54    92          B2, B9, A4
25   CGA        40         0.27    0.52    93          A7
26   CGT        75         0.25    0.50    89          H2, H4, B3, B1
27   GGT        35         0.09    0.26    90          B3
28   GTT        46         0.18    0.45    93          B2
29   GTA        38         0.25    0.50    95          B6, B8, A6
30   GAA        39         0.15    0.39    88          H7, B3, B8, A5
31   GAT        52         0.22    0.49    89          B10, A9
32   GCA        42         0.14    0.39    93          A9

RF values are for thin-layer chromatography in solvent a and solvent b.
* Fully protected trideoxynucleotides: 5'-O-dimethoxytrityl-3'-p-chlorophenyl-2-cyanoethyl phosphate.
† Overall yield calculated from the 5'-hydroxymonomers.
‡ Based on high-performance liquid chromatographic analysis.

RESULTS AND DISCUSSION

Our approach to the construction of polynucleotides is the block-coupling method rather than the step-by-step approach (Fig. 2). Trinucleotides were used mainly as the starting building blocks (trimers). For synthesis of the trimers, we have reported the solvent extraction method, which simplifies the purification steps and increases the speed and yields of the trimer synthesis (3). The 45 different trimers necessary for construction of the insulin genes were synthesized in 3 months with yields in the range of 30-60% (see Table 1).

In the construction of the fully protected polynucleotides from the blocks, separation of the products from the starting material is more difficult as the length of the chain increases and depends on the base composition of the polynucleotides. This problem was overcome by using an excess (1.5 molar equivalents) of 3'-phosphodiester component 3 (Fig. 2) to drive the coupling reaction almost to completion with the aid of a coupling reagent, 2,4,6-triisopropylbenzenesulfonyl tetrazolide (11), and by passing the reaction mixture through a short silica gel column, which traps only the unreacted charged component (Figs. 2 and 3).

B-chain gene (EcoRI terminus at left, HindIII site in the middle, BamHI terminus at right; residues 1-30 preceded by Met)

Met Phe Val Asn Gln His Leu Cys Gly Ser His Leu Val Glu Ala Leu Tyr Leu Val Cys Gly Glu Arg Gly Phe Phe Tyr Thr Pro Lys Thr stop stop

5' AATTCATGTTCGTCAATCAGCACCTTTGTGGTTCTCACCTCGTTGAAGCTTTGTACCTTGTTTGCGGTGAACGTGGTTTCTTCTACACTCCTAAGACTTAATAG 3'
3'     GTACAAGCAGTTAGTCGTGGAAACACCAAGAGTGGAGCAACTTCGAAACATGGAACAAACGCCACTTGCACCAAAGAAGATGTGAGGATTCTGAATTATCCTAG 5'

A-chain gene (EcoRI terminus at left, BamHI terminus at right; residues 1-21 preceded by Met)

Met Gly Ile Val Glu Gln Cys Cys Thr Ser Ile Cys Ser Leu Tyr Gln Leu Glu Asn Tyr Cys Asn stop stop

5' AATTCATGGGCATCGTTGAACAGTGTTGCACTTCTATCTGCTCTCTTTACCAGCTTGAGAACTACTGTAACTAATAG 3'
3'     GTACCCGTAGCAACTTGTCACAACGTGAAGATAGACGAGAGAAATGGTCGAACTCTTGATGACATTGATTATCCTAG 5'

FIG. 1. Design and synthesis of human insulin genes. The genes for human insulin, B chain and A chain, were designed from the amino acid sequences of the human polypeptides. The choice of the codon for each amino acid has been discussed (1). The 5' ends of each gene have single-stranded cohesive termini for the EcoRI and BamHI restriction endonucleases for correct insertion of each gene into plasmid pBR322. A HindIII endonuclease recognition site was incorporated into the middle of the B-chain gene for the amino acid sequence Glu-Ala to allow amplification and verification of each half of the gene separately before construction of the whole B-chain gene. The B-chain and the A-chain genes were designed to be built from 29 different oligodeoxyribonucleotides, varying from decamers to pentadecamers. Each arrow in the original figure indicates a fragment synthesized by the improved phosphotriester method: H1-H8 and B1-B10 for the B-chain gene, and A1-A11 for the A-chain gene. The genes are designed to be expressed in E. coli separately; human insulin is then generated by disulfide-bond formation between B-chain and A-chain peptides in vitro.
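The design summarized in Fig. 1 can be checked mechanically. The following is a minimal sketch, not part of the original work: it assumes the upper strands as printed in Fig. 1, uses the standard genetic code, and the names `translate`, `B_CHAIN_TOP`, and `A_CHAIN_TOP` are illustrative only. It confirms the stated gene lengths (104 and 77 nucleotides per strand), the encoded peptides, the two stop codons, and the single methionine that makes cyanogen bromide cleavage usable.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Upper strands from Fig. 1, written 5'->3'. Spaces only mark codon boundaries.
B_CHAIN_TOP = (
    "AATTC ATG TTC GTC AAT CAG CAC CTT TGT GGT TCT CAC CTC GTT GAA GCT "
    "TTG TAC CTT GTT TGC GGT GAA CGT GGT TTC TTC TAC ACT CCT AAG ACT "
    "TAA TAG"
).replace(" ", "")
A_CHAIN_TOP = (
    "AATTC ATG GGC ATC GTT GAA CAG TGT TGC ACT TCT ATC TGC TCT CTT TAC "
    "CAG CTT GAG AAC TAC TGT AAC TAA TAG"
).replace(" ", "")


def translate(strand):
    """Translate from the first ATG to the end of the strand ('*' marks a stop)."""
    start = strand.find("ATG")
    codons = (strand[i:i + 3] for i in range(start, len(strand) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


for name, strand in (("B chain", B_CHAIN_TOP), ("A chain", A_CHAIN_TOP)):
    peptide = translate(strand)
    print(name, len(strand), "nt", peptide)
    # Only the initiator Met should be present, so CNBr cleaves once.
    print("  methionine count:", peptide.count("M"))
    # The EcoRI cohesive terminus is the 5'-AATT overhang of the upper strand.
    print("  EcoRI overhang:", strand[:4])

# The HindIII recognition sequence (AAGCTT) spans the Glu-Ala codons of the B chain.
print("HindIII sites in B-chain gene:", B_CHAIN_TOP.count("AAGCTT"))
```

The one-letter peptide strings correspond to the three-letter sequences shown in Fig. 1, with `*` standing for each stop codon.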
Without further purification steps, block couplings were repeated in the same way until the desired length was constructed.

Starting from the trimers, in 3 months we synthesized 29 different oligodeoxyribonucleotides, ranging from decamer to pentadecamer, to build genes for human insulin. In some cases, for construction of the desired length of oligonucleotides, trimers were coupled with monomers and dimers to prepare tetramer and pentamer blocks, respectively. The fully protected oligonucleotides were deblocked completely and the unmasked oligomer was purified by high-performance liquid chromatography. The purity of each fragment appeared to be more than 95%, judged by electrophoresis on acrylamide gel and by two-dimensional sequence analysis (Figs. 4 A and B).

The chemically synthesized oligonucleotides have been ligated and the separate insulin genes cloned and expressed in plasmid pBR322. The details of this work will be published in a subsequent paper (12).

Sequence-specific polynucleotides have proven to be useful biological tools (13, 14). The major drawbacks in the synthesis of polynucleotides were the slowness of each step, the low yields of coupling reactions, and the purity of the final products. Significant improvements at several stages of the synthesis and the extensive use of high-performance liquid chromatography for analysis and purification of the products have overcome these problems. The results described above show that the chemical synthesis of polynucleotides may no longer be the rate-limiting step in an overall project to produce biologically interesting genes. With these improvements the chemical synthesis of polynucleotides, together with recombinant DNA methods, will provide a powerful tool for production of medically important polypeptide hormones and even enzymes.

[Reaction scheme for Fig. 2 not reproduced. Compounds are numbered 1-5 as in the caption. Reagent sequences shown in the scheme: (1) TPSTe, (2) silica gel, (3) Et3N; (1) TPSTe, (2) H+, (3) silica gel; (1) TPSTe, (2) silica gel, (3) NH4OH, (4) AcOH. Abbreviations: R = p-chlorophenyl; B = protected and nonprotected bases; An = anisoyl; DMT = dimethoxytrityl; TPSTe = 2,4,6-triisopropylbenzenesulfonyl tetrazolide; CE = β-cyanoethyl.]

FIG. 2. Improved method for chemical synthesis of oligonucleotides. The basic units used to construct polynucleotides are two types of trimer block, the bifunctional trimer 1 and the 3'-terminus trimer 2. The bifunctional trimer 1 was hydrolyzed to 3'-phosphodiester component 3 with pyridine/triethylamine/water (3:1:1 vol/vol) (unpublished data) and also to the 5'-hydroxyl component 4 with 2% benzenesulfonic acid. The 3'-terminus block 2, protected by an anisoyl group at 3'-hydroxy, was treated with 2% benzenesulfonic acid to give the 5'-hydroxyl block 5. The coupling reaction of an excess of the 3'-phosphodiester trimer 3 (1.5 molar equivalents) with the 5'-hydroxyl component 4 or 5 (1 molar equivalent) in the presence of 2,4,6-triisopropylbenzenesulfonyl tetrazolide (TPSTe, 3-4 equivalents) went almost to completion in 3 hr. To remove the excess of the 3'-phosphodiester block 3, we passed the reaction mixture through a short silica gel column set up on a sintered glass filter. The column was washed, first with CHCl3 to elute some side products and the coupling reagent, and then with CHCl3/MeOH (95:5 vol/vol), in which almost all of the fully protected oligomer was eluted. Under these conditions, the charged compound 3 remained in the column. Similarly, block couplings were repeated until the desired length was constructed.

Table 2. Yield and retention time of synthetic oligonucleotides

Compound   Sequence          Yield,* %  Rt,† min
H1         AATTCATGTT        51         12.2
H2         CGTCAATCAGCA      53         13.8
H3         CCTTTGTGGTTC      49         13.0
H4         TCACCTCGTTGA      58         13.8
H5         TTGACGAACATG      37         13.0
H6         CAAAGGTGCTGA      33         12.8
H7         AGGTGAGAACCA      45         15.0
H8         AGCTTCAACG        54         12.5
B1         AGCTTTGTAC        42         12.8
B2         CTTGTTTGCGGT      48         14.0
B3         GAACGTGGTTTC      54         14.0
B4         TTCTACACTCCT      22         13.6
B5         AAGACTTAATAG      35         15.1
B6         AACAAGGTACAA      37         15.8
B7         ACGTTCACCGCA      48         14.8
B8         GTAGAAGAAACC      55         14.0
B9         AGTCTTAGGAGT      52         14.5
B10 (A12)  GATCCTATTA        37         12.5
A1         AATTCATGGGC       26         13.5
A2         ATCGTTGAACAGTG    27         16.1
A3         TTGCACTTCTAT      52         13.9
A4         CTGCTCTCTTTACC    41         14.2
A5         AGCTTGAGAACT      53         13.9
A6         ACTGTAACTAATAG    50         15.8
A7         CAACGATGCCCATG    34         15.2
A8         TGCAACACTGTT      45         13.2
A9         AGAGCAGATAGAAG    35         15.0
A10        AAGCTGGTAAAG      52         14.4
A11        GTTACAGTAGTTCTC   48         15.7

* Based on high-performance liquid chromatographic analysis of the final products and estimated on the 3'-terminus blocks.
† Retention time of the pure fragment.
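The fragment sequences in Table 2 can be checked against the gene design of Fig. 1 by assembling them in silico. The sketch below is illustrative only. The order of fragments along each strand is an assumption inferred from the sequences themselves rather than read from the figure, and the names `FRAGMENTS`, `assemble`, and `is_cohesive_duplex` are hypothetical. It tests that the upper-strand fragments and lower-strand fragments of each gene form a fully complementary duplex with 5'-AATT (EcoRI) and 5'-GATC (BamHI) overhangs.

```python
# Fragment sequences (5'->3') as listed in Table 2.
FRAGMENTS = {
    "H1": "AATTCATGTT", "H2": "CGTCAATCAGCA", "H3": "CCTTTGTGGTTC",
    "H4": "TCACCTCGTTGA", "H5": "TTGACGAACATG", "H6": "CAAAGGTGCTGA",
    "H7": "AGGTGAGAACCA", "H8": "AGCTTCAACG",
    "B1": "AGCTTTGTAC", "B2": "CTTGTTTGCGGT", "B3": "GAACGTGGTTTC",
    "B4": "TTCTACACTCCT", "B5": "AAGACTTAATAG", "B6": "AACAAGGTACAA",
    "B7": "ACGTTCACCGCA", "B8": "GTAGAAGAAACC", "B9": "AGTCTTAGGAGT",
    "B10": "GATCCTATTA",  # also used as A12
    "A1": "AATTCATGGGC", "A2": "ATCGTTGAACAGTG", "A3": "TTGCACTTCTAT",
    "A4": "CTGCTCTCTTTACC", "A5": "AGCTTGAGAACT", "A6": "ACTGTAACTAATAG",
    "A7": "CAACGATGCCCATG", "A8": "TGCAACACTGTT", "A9": "AGAGCAGATAGAAG",
    "A10": "AAGCTGGTAAAG", "A11": "GTTACAGTAGTTCTC",
}

# Assumed 5'->3' order of fragments along each strand (inferred, not from the figure).
B_TOP = ["H1", "H2", "H3", "H4", "B1", "B2", "B3", "B4", "B5"]
B_BOTTOM = ["B10", "B9", "B8", "B7", "B6", "H8", "H7", "H6", "H5"]
A_TOP = ["A1", "A2", "A3", "A4", "A5", "A6"]
A_BOTTOM = ["B10", "A11", "A10", "A9", "A8", "A7"]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def assemble(order):
    """Join fragments head to tail in the given 5'->3' order."""
    return "".join(FRAGMENTS[name] for name in order)


def is_cohesive_duplex(top, bottom):
    """True if the strands pair perfectly apart from 4-nt AATT and GATC overhangs."""
    return (
        top.startswith("AATT")
        and bottom.startswith("GATC")
        and bottom[4:] == reverse_complement(top[4:])
    )


for gene, top_order, bottom_order in (
    ("B chain", B_TOP, B_BOTTOM),
    ("A chain", A_TOP, A_BOTTOM),
):
    top, bottom = assemble(top_order), assemble(bottom_order)
    print(gene, len(top), len(bottom), is_cohesive_duplex(top, bottom))
```

Because the fragment junctions on the two strands are staggered, each fragment overlaps two fragments of the opposite strand, which is what allows the set to be joined into a duplex by ligation.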

[Chromatograms for Fig. 3, panels A-F, not reproduced; horizontal axis: time, min.]

FIG. 3. High-performance liquid chromatographic analysis of the synthetic route for the deoxypentadecamer GTTACAGTAGTTCTC (A11). The reaction between the 3'-phosphodiester trimer (AGT) (0.16 mmol, Rt = 7.5 min) and the 5'-OH, 3'-anisoyl heptamer (AGTTCTC) (0.10 mmol, Rt = 9.5 min) (A, before addition of the coupling reagent) led to the synthesis of the fully protected decamer (AGTAGTTCTC) (Rt = 12.5 min) in almost quantitative yield after 2 hr (B). After purification of the products through a short silica gel filter, the decamer was recovered as a solid in 92% yield with a purity of 75% (C). This latter compound was treated with 2% benzenesulfonic acid (to remove the 5'-protecting group) and coupled with the 3'-phosphodiester pentamer (GTTAC) (0.11 mmol, Rt = 10.5 min) (D). After 4 hr (E), the reaction was stopped and the desired fully protected pentadecamer (GTTACAGTAGTTCTC) (Rt = 15.7 min) was purified on the silica gel column and isolated as a solid (yield of pure product, 48%, based on chromatographic analysis). Finally, this material (5-10 mg) was deblocked and passed through the high-performance liquid chromatograph. The material in the major peak was pooled, evaporated, desalted by dialysis, and lyophilized. The purity of the unblocked pentadecamer (GTTACAGTAGTTCTC) (A11) was reconfirmed by chromatographic analysis (F).

We thank Leonor Directo, Lillian Shih, Eddie Huang, and Parkash Jhurani for their assistance in various aspects of the project. This research was supported by contracts from Genentech, Inc., to the City of Hope National Medical Center.

1. Itakura, K., Hirose, T., Crea, R., Riggs, A. D., Heyneker, H. L., Bolivar, F. & Boyer, H. W. (1977) Science 198, 1056-1063.
2. Hirose, T., Crea, R. & Itakura, K. (1978) Tetrahedron Lett. 28, 2449-2452.
3. Crea, R., Hirose, T. & Itakura, K. (1978) Tetrahedron Lett., in press.
4. Lubke, K. & Klostermeyer, H. (1970) Adv. Enzymol. 33, 445-525.
5. Du, Y.-C., Zhang, Y.-C., Lu, Z.-X. & Tsou, C.-E. (1961) Scientia Sinica 10, 84-95.
6. Itakura, K., Katagiri, N., Narang, S. A., Bahl, C. P., Marians, K. J. & Wu, R. (1975) J. Biol. Chem. 250, 4592-4600.
7. Itakura, K., Katagiri, N., Bahl, C. P., Wightman, R. H. & Narang, S. A. (1975) J. Am. Chem. Soc. 97, 7327-7332.
8. Van Boom, J. H., Burgess, P. M. T., Crea, R., Luyten, W. C. M. M. & Reese, C. B. (1975) Tetrahedron 31, 2953-2959.

[Autoradiograms for Fig. 4 not reproduced. Panel A: gel lanes A1-A11, with the origin marked. Panel B: two-dimensional sequence pattern.]

FIG. 4. Polyacrylamide gel electrophoresis (A) and sequence analysis (B) of synthetic polynucleotide fragments. Each oligonucleotide purified by high-performance liquid chromatography was phosphorylated at the 5' terminus with [γ-32P]ATP by T4 polynucleotide kinase. The radioactive oligonucleotides were analyzed by electrophoresis on 20% polyacrylamide gels under denaturing conditions (7 M urea) and autoradiographed (A). The autoradiogram of the synthetic fragments of the A series (A1-A11) is shown. Similar results were obtained with the H and B series (not shown). The two-dimensional sequence analysis of most of the synthetic oligonucleotides was performed by the published procedure (10), and the expected sequence pattern was obtained. (B) Sequence of the dodecamer, AGCTTGAGAACT (A5).
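The logic behind the sequence analysis of Fig. 4B can be illustrated with a small model. This is a sketch under stated assumptions, not the published procedure (10): it assumes that venom phosphodiesterase removes nucleotides one at a time from the 3' end, so that partial digestion of a 5'-labeled oligonucleotide leaves a nested set of labeled products, and it identifies each removed nucleotide by direct comparison of neighboring products rather than by the mobility shifts used in the real two-dimensional separation. The function names are hypothetical.

```python
def partial_digest_products(oligo):
    """Labeled products left after removing 0, 1, 2, ... nucleotides from the 3' end."""
    return [oligo[:length] for length in range(len(oligo), 0, -1)]


def read_sequence(products):
    """Rebuild the 5'->3' sequence from a nested set of 5'-labeled products.

    Each product must be exactly one nucleotide longer than the previous one;
    the extra 3' nucleotide is the one "read" at that step.
    """
    ordered = sorted(products, key=len)
    sequence = ordered[0]
    for shorter, longer in zip(ordered, ordered[1:]):
        if len(longer) != len(shorter) + 1 or not longer.startswith(shorter):
            raise ValueError("products do not form a nested one-nucleotide ladder")
        sequence += longer[-1]
    return sequence


# Dodecamer A5 from Table 2, the fragment shown in Fig. 4B.
A5 = "AGCTTGAGAACT"
ladder = partial_digest_products(A5)
for product in ladder:
    print(f"{len(product):2d}  {product}")
print("sequence read from ladder:", read_sequence(ladder))
```

A missing or extra product in the ladder would raise an error in this model, which mirrors why an uninterrupted series of spots is taken as evidence for a homogeneous fragment of the expected sequence.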

9. Van Boom, J. H. & DeRooy, J. F. M. (1977) J. Chromatog. 131, 169-177.
10. Jay, E., Bambara, R., Padmanabhan, P. & Wu, R. (1974) Nucleic Acids Res. 1, 331-353.
11. Stawinski, J., Hozumi, T., Narang, S. A., Bahl, C. P. & Wu, R. (1977) Nucleic Acids Res. 4, 353-372.
12. Goeddel, D. V., Kleid, D. G., Bolivar, F., Heyneker, H. L., Yansura, D. G., Crea, R., Hirose, T., Kraszewski, A., Itakura, K. & Riggs, A. D. (1979) Proc. Natl. Acad. Sci. USA 76, in press.
13. Heyneker, H. L., Shine, J., Goodman, H. M., Boyer, H. W., Rosenberg, J., Dickerson, R. E., Narang, S. A., Itakura, K., Lin, S. & Riggs, A. D. (1976) Nature (London) 263, 748-752.
14. Scheller, R. H., Dickerson, R. E., Boyer, H. W., Riggs, A. D. & Itakura, K. (1977) Science 196, 177-180.