
Downloaded by guest on September 29, 2021
PNAS Latest Articles | www.pnas.org/cgi/doi/10.1073/pnas.1818259116
BIOPHYSICS AND COMPUTATIONAL BIOLOGY; SYSTEMS BIOLOGY

Chemical synthesis rewriting of a bacterial genome to achieve design flexibility and biological functionality

Jonathan E. Venetz (a), Luca Del Medico (a), Alexander Wölfle (a), Philipp Schächle (a), Yves Bucher (a), Donat Appert (a), Flavia Tschan (a), Carlos E. Flores-Tinoco (a), Mariëlle van Kooten (a), Rym Guennoun (a), Samuel Deutsch (b), Matthias Christen (a,1), and Beat Christen (a,1)

(a) Institute of Molecular Systems Biology, Eidgenössische Technische Hochschule Zürich, CH-8093 Zürich, Switzerland; and (b) Department of Energy Joint Genome Institute, Walnut Creek, CA 94598

Edited by David Baker, University of Washington, Seattle, WA, and approved March 6, 2019 (received for review October 29, 2018)

Understanding how to program biological functions into artificial DNA sequences remains a key challenge in synthetic genomics. Here, we report the chemical synthesis and testing of Caulobacter ethensis-2.0 (C. eth-2.0), a rewritten bacterial genome composed of the most fundamental functions of a bacterial cell. We rebuilt the essential genome of Caulobacter crescentus through the process of chemical synthesis rewriting and studied the genetic information content at the level of its essential genes. Within the 785,701-bp genome, we used sequence rewriting to reduce the number of encoded genetic features from 6,290 to 799. Overall, we introduced 133,313 base substitutions, resulting in the rewriting of 123,562 codons. We tested the biological functionality of the genome design in C. crescentus by transposon mutagenesis. Our analysis revealed that 432 essential genes of C. eth-2.0, corresponding to 81.5% of the design, are equal in functionality to natural genes. These findings suggest that neither changing mRNA structure nor changing the codon context have significant influence on biological functionality of synthetic genomes. Discovery of 98 genes that lost their function identified essential genes with incorrect annotation, including a limited set of 27 genes where we uncovered noncoding control features embedded within protein-coding sequences. In sum, our results highlight the promise of chemical synthesis rewriting to decode fundamental genome functions and its utility toward the design of improved organisms for industrial purposes and health benefits.

chemical genome synthesis | de novo DNA synthesis | Caulobacter crescentus | genome rewriting | synonymous recoding

Significance

The fundamental biological functions of a living cell are stored within the DNA sequence of its genome. Classical genetic approaches dissect the functioning of biological systems by analyzing individual genes, yet uncovering the essential gene set has remained very challenging. It is argued that the rewriting of entire genomes through the process of chemical synthesis provides a powerful and complementary research concept to understand how essential functions are programed into genomes.

Author contributions: M.C. and B.C. designed research; J.E.V., L.D.M., A.W., P.S., Y.B., D.A., F.T., C.E.F.-T., M.v.K., R.G., and S.D. performed research; J.E.V., L.D.M., M.C., and B.C. analyzed data; and J.E.V., M.C., and B.C. wrote the paper.
Conflict of interest statement: Eidgenössische Technische Hochschule holds a patent application (WO2017085249A1) with M.C. and B.C. as inventors that covers functional testing of synthetic genomes. M.C. and B.C. hold shares from Gigabases Switzerland AG.
This article is a PNAS Direct Submission.
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
Data deposition: The sequence of the C. eth-2.0 genome reported in this paper has been deposited in the National Center for Biotechnology Information database (GenBank accession no. CP035535).
1 To whom correspondence may be addressed. Email: [email protected] or [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1818259116/-/DCSupplemental.

In the early 2000s, the template-independent chemical synthesis of the 7.4-kb polio virus (1) and 5.4-kb bacteriophage phiX174 genomes (2) using oligonucleotides ushered in the field of synthetic genomics. The initial progress on moderately sized viral genomes has spurred whole-genome synthesis of more complex organisms. In 2008 and 2010, the Craig Venter Institute reported the chemical synthesis of genome replicas from Mycoplasma genitalium (583 kb) and Mycoplasma mycoides (1.1 Mb) (3, 4), respectively. These efforts expanded the scale of chemical synthesis to megabases and improved in vitro DNA assembly strategies and genome transplantation methods. However, the work also highlighted the challenges of whole-genome synthesis, as a single missense mutation within the dnaA gene initially prevented the genome from booting up. To gain insights into a minimal gene set for cellular life, the teams of Craig Venter built a 473-gene reduced version of the M. mycoides genome (5).

Along with these accomplishments, the concept of whole-genome synthesis and genome minimization has been expanded toward the rebuilding of all 16 chromosomes of Saccharomyces cerevisiae, driven by an international consortium composed of 21 institutions. In 2014, the consortium reported synthesis of the artificial yeast chromosome synIII (273 kb) (6). Subsequently, five additional chromosomes (7–11) were generated, and as of 2018, roughly 40% of the entire yeast genome has been covered. The redesigned chromosomes removed repetitive sequences (tRNA genes, introns, and transposons) to increase targeting fidelity during stepwise homologous replacement and included seeding of loxP sites to permit progressive genome reduction on completion of the yeast chromosomes. At the beginning of the yeast 2.0 synthesis project, CRISPR had not yet entered the stage, but today, it offers an alternative approach for iterative genome reduction.

The redundancy of the genetic code, which defines the same amino acid by multiple synonymous codons, offers the possibility to erase and reassign codons throughout an entire genome. Such rewriting efforts are used to engineer organisms with altered genetic codes and to free up codons for incorporation of artificial amino acids, which do not occur within natural organisms. To date, genome-wide rewriting efforts have been reported for a few viral genomes (12–14) and are primarily focused on the rewriting of microbial genomes of Escherichia coli, Salmonella, and S. cerevisiae. Using oligo-mediated recombineering (15), all 321 instances of the TAG stop codon were altered to TAA, demonstrating the dispensability of a stop codon within the genetic code of E. coli (16). In an extension of this approach, genome-wide rewriting of 13 sense codons across a set of ribosomal genes (17) and rewriting of 123 instances of the rare arginine codons AGA and AGG (18) were accomplished in E. coli. These studies unearthed unexpected recalcitrant synonymous rewriting
events that occurred primarily in the vicinity of 5′ and 3′ termini of protein-coding sequences (18, 19). Recently, to investigate the impact of more complex rewriting schemes, de novo DNA synthesis methods have been used for the rewriting of gene cassettes in conjunction with genomic replacement strategies (15, 20). Ongoing de novo synthesis toward a 57-codon E. coli genome was reported (21), with the complete genome synthesis underway.

Despite this progress, the underlying rewriting design principles have remained ill defined, and debugging has remained challenging (17, 19). It has been speculated that the presence of embedded transcriptional and translational control signals at the termini of coding sequences (CDSs) as well as imprecise genome annotations are the underlying cause. We hypothesized that massive synonymous rewriting in conjunction with a systematic investigation of error causes will shed light onto the general sequence design principles of how biological functions are programed into genomes. However, while some progress has been made to study recoding schemes using individual genes and gene clusters (21), the field currently lacks a broadly applicable high-throughput error diagnosis approach to probe the rewriting of entire genomes.

Here, we report the chemical synthesis of Caulobacter ethensis-2.0 (C. eth-2.0), a bacterial minimized genome composed of the most fundamental functions of a bacterial cell. We present a broadly applicable design–build–test approach to program the most fundamental functions of a cell into a customized genome sequence. By rebuilding the essential genome of Caulobacter crescentus (Caulobacter hereafter) through the process of chemical synthesis rewriting, we studied the genetic information content at the level of its essential genes.

Results

Essential Part List to Build C. eth-1.0. We conceived a bacterial genome design encoding the entire set of essential DNA sequences from the freshwater bacterium Caulobacter (Fig. 1A). Caulobacter is recognized as an exquisite cell cycle model organism (22–25) for which multidimensional omics (26) and transcriptome- (27) and ribosome-profiling measurements have been integrated into a well-annotated genome model (28, 29). We computationally generated the entire list of essential DNA parts from a previously published high-resolution transposon sequencing dataset (30) that identified the precise coordinates of essential genes, including endogenous promoter sequences. DNA parts were extracted from the native Caulobacter NA1000 genome sequence [National Center for Biotechnology Information (NCBI) accession no. NC_011916.1] according to predefined design rules and concatenated into a digital genome design preserving gene organization and orientation (Fig. 1A and SI Appendix) (31). The resulting 785,701-bp genome design, termed Caulobacter ethensis-1.0 (C. eth-1.0), encodes the most fundamental functions of a bacterial cell. Cumulatively, C. eth-1.0 consists of 1,761 DNA parts, including 676 protein-coding, 54 noncoding, and 1,015 intergenic sequences. To select for faithful assembly and permit stable maintenance in S. cerevisiae, auxotrophic marker genes (TRP1, HIS3, MET14, LEU2, ADE2) and a set of 10 autonomous replicating sequences (ARSs) were seeded across the genome design (Table 1 and SI Appendix). Furthermore, the pMR10Y (31) shuttle vector sequence, permitting stringent low copy replication in S. cerevisiae, E. coli, and Caulobacter, was inserted at the native location of the Caulobacter origin of replication.

Fig. 1. Part design, compilation, and chemical synthesis rewriting of the C. eth-1.0 genome. (A) Schematic representation of the digital design process; 1,745 DNA parts were extracted from the native Caulobacter NA1000 genome (gray) and reorganized into a rewritten genome design (blue) comprising the entire list of essential genes required to run the basic operating system of a bacterial cell. Lines (blue) connect positions of DNA parts between native and rewritten genomes. (B) Workflow of the part identification and chemical synthesis rewriting process. Transposon sequencing was used to identify the entire set of essential DNA parts of Caulobacter at a resolution of a few base pairs. Absence of transposon insertions [transposon (Tn) hits are plotted as gray lines] pinpoints the nondisruptable DNA regions within the native Caulobacter genome. Such essential DNA parts may encode for putative alternative ORFs, TSSs, or ribosome binding sites (RBSs) that are not required for functionality of the essential DNA part itself. Computational sequence rewriting (Materials and Methods) was used to erase putative sequence features that have not been assigned to a specific biologic function. The resulting rewritten DNA parts are fully defined and only encode for their desired function.

Sequence Rewriting of C. eth-1.0 to Enable de Novo Genome Synthesis. We were unable to obtain 3- to 4-kb DNA building blocks of C. eth-1.0 from commercial DNA suppliers due to a multitude of synthesis constraints. Synthesis constraints are a common problem of natural genome sequences, which have evolved to maintain biological information rather than facilitate
chemical synthesis. Recent bioinformatics work (31) showed that more than three-quarters of all deposited bacterial genome sequences are not amenable for low-cost synthesis. We hypothesized that computational synonymous rewriting would facilitate chemical synthesis of an easy to synthesize 785-kb genome while maintaining the encoded biological functions (SI Appendix, Fig. S1A). We used our previously reported computational DNA design algorithms (31, 32) and generated a synthesis-optimized genome design termed C. eth-2.0. Cumulatively, we introduced 10,172 base substitutions and removed 5,668 synthesis constraints (Table 2). These are composed of 1,233 repeats, 93 homopolymeric stretches, and 4,342 regions of high guanine-cytosine (GC) content (Table 2), known to hinder chemical DNA synthesis. Moreover, we erased an additional 1,045 endonuclease restriction sites to facilitate the standardized assembly of DNA building blocks into the 785-kb chromosome C. eth-2.0.

Sequence Rewriting to Minimize the Number of Genetic Features. We reasoned that chemical synthesis rewriting offers a powerful experimental approach to probe the accuracy of existing genome annotations and to study where additional layers of information encoded beyond the primary amino acid code exist. Furthermore, the fundamental functions within the essential genomes can be identified. In addition to the base substitutions introduced for synthesis streamlining, we used computational sequence design algorithms (31, 32) to deliberately add 123,141 base substitutions within protein-coding sequences to yield the C. eth-2.0 design (Table 2) (33). In C. eth-2.0, we replaced 56.1% of all codons by synonymous versions. While the amino acid sequence of the 676 annotated genes was maintained, rewriting enabled us to minimize the number of genetic elements present within protein-coding sequences. These hypothetical elements include alternative ORFs, predicted gene internal transcriptional start sites (TSSs), and sequence motifs (predicted or cryptic) that may fine tune translation rates (Fig. 1B and Materials and Methods). Overall, we removed 87.4% (2,822 of 3,229) of all alternative ORFs, 95.3% (1,648 of 1,730) of all predicted internal TSSs (SI Appendix, Fig. S2C), and 76.7% (1,021 of 1,331) of all predicted putative ribosome stalling motifs (Table 2). Testing whether rewritten genes remain functional will identify genes in which additional information beyond the amino acid code is necessary for proper functioning. Achieving functional genes, however, will provide fully defined artificial genes composed of a minimized number of genetic elements. The precise knowledge of which genes remain functional and subsequent repair of the nonfunctional genes will ultimately lead to a fully defined artificial cell.

Chemical Synthesis of C. eth-2.0. We computationally devised a four-tier DNA assembly strategy starting from 3- to 4-kb assembly blocks (32) to build the complete C. eth-2.0 chromosome in yeast (Fig. 2A). Demonstrating the ease of synthesis on sequence rewriting, 235 of 236 blocks were successfully manufactured (SI Appendix), and only a single DNA block required custom synthesis. We progressively assembled these initial 236 DNA blocks further into 37 chromosome segments (19–22 kb in size) and 16 megasegments (38–65 kb in size) (Fig. 2A and SI Appendix) using yeast transformation. To select for the complete chromosome assembly, we applied a click marker strategy by introducing five auxotrophic yeast genes (TRP1, HIS3, MET14, LEU2, and ADE2) split between adjacent megasegments. On correct chromosome assembly in an engineered yeast strain (YJV04) lacking all auxotrophic marker genes, click markers will form functional genes and reconstitute prototrophy (Fig. 2B). Initial attempts to assemble C. eth-2.0 from the 16 megasegments were not successful. Sequencing of yeast clones with partial assemblies identified two defective ARS elements (ARS416 and ARS1213), which prevented replication of the full-length chromosome. We corrected these

Table 1. Part list used to build the C. eth-1.0 genome design

DNA part category                  Quantity   Size (bp)   Fraction (%)
Protein-coding sequences           676        660,789     83.9
  Essential                        462        471,072     59.8
  Semiessential                    113        114,270     14.5
  Redundant                        15         14,970      1.9
  Nonessential                     86         60,477      7.7
Noncoding sequences                54         9,726       1.2
  tRNA                             44         3,455       0.4
  rRNA                             3          4,387       0.6
  ncRNA                            7          1,884       0.2
Intergenic sequences               1,015      96,043      12.2
Genome replication and assembly    16         19,143      2.5
  Click markers*                   5          6,352       0.8
  ARS                              10         2,121       0.3
  pMR10Y†                          1          10,670      1.4
Total no. of DNA parts             1,761      785,701

ncRNA, noncoding RNA.
*Auxotrophic selection markers that were used to direct the assembly and maintenance of the genome design in yeast.
†The pMR10Y shuttle vector contains a broad host-range RK2-based low copy replicon (GenBank accession no. AJ606312.1), a kanamycin selection marker, and oriT function for conjugational transfer from E. coli to Caulobacter as well as URA3 marker, ARS, and centromere (CEN) elements for selection and replication in yeast.

Table 2. Sequence rewriting of C. eth-1.0 into C. eth-2.0 leads to massive reduction of genetic features

Type                               C. eth-1.0   C. eth-2.0   Fraction (%)
Sequence rewriting
  Base substitutions               None         133,313      17.0
  Rewritten codons*                None         123,562      56.1
Codons
  TTG                              1,154        0            100
  TTA                              173          10           94.2
  TAG                              46           0            100
DNA synthesis constraints
  Synthesis constraints**          7,014        301
  Direct repeats ≥8 bp             880          113          87.2
  Hairpins ≥8 bp                   606          140          76.9
  Homopolymers                     139          46           66.9
  Restriction sites                1,047        2            99.8
  High GC regions                  4,342        0            100
Alternative genetic features
  Remaining genetic features       6,290        799
  ORFs                             3,229        407          87.4
  TSS                              1,730        82           95.3
  RBS                              1,331        310          76.7

*Number of synonymous codon substitutions introduced on sequence rewriting.
**Number of DNA synthesis constraints.
High GC regions: regions of high GC content > 0.8 within a 100-bp window.
Restriction sites: total number of type IIS restriction sites that were removed (AarI, BsaI, BspQI, PacI, PmeI, I-CeuI, I-SceI). Note that two unique PmeI and PacI sites remained within the pMR10Y backbone to facilitate linearization of the final assembled chromosome for subsequent analysis by pulsed field gel electrophoresis.
Remaining genetic features: number of remaining genetic features within CDS of C. eth-2.0.
ORFs: number of alternative ORFs residing within the 676 CDSs of C. eth-2.0.
TSS: number of TSSs internal to CDSs.
RBS: number of ribosome binding sites (RBSs) internal to CDSs.
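The synthesis constraints counted in Table 2 can be illustrated with a small scanner. The following is a minimal sketch, not the design algorithm used by the authors (refs. 31, 32): it flags homopolymeric runs and sliding windows whose GC fraction exceeds 0.8 within 100 bp, the threshold given in the Table 2 notes. The function names and the homopolymer cutoff of 8 bp are assumptions made for illustration only.

```python
from typing import List, Tuple


def homopolymer_runs(seq: str, min_run: int = 8) -> List[Tuple[int, int, str]]:
    """Return (start, length, base) for each run of a single base >= min_run.

    min_run = 8 is an assumed cutoff; the source does not state one.
    """
    seq = seq.upper()
    runs = []
    i, n = 0, len(seq)
    while i < n:
        j = i
        while j < n and seq[j] == seq[i]:
            j += 1
        if j - i >= min_run:
            runs.append((i, j - i, seq[i]))
        i = j
    return runs


def high_gc_windows(seq: str, window: int = 100, threshold: float = 0.8) -> List[int]:
    """Return start positions of windows whose GC fraction is > threshold."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    # GC count of the first window, then updated incrementally while sliding.
    gc = sum(1 for base in seq[:window] if base in "GC")
    hits = []
    for start in range(len(seq) - window + 1):
        if gc / window > threshold:
            hits.append(start)
        if start + window < len(seq):
            gc += (seq[start + window] in "GC") - (seq[start] in "GC")
    return hits


if __name__ == "__main__":
    # Toy sequence (hypothetical, not taken from C. eth-1.0).
    toy = "AT" * 60 + "G" * 100 + "AT" * 20
    print(homopolymer_runs(toy))
    print(len(high_gc_windows(toy)), "windows above the GC threshold")
```

Overlapping windows are reported individually here; merging adjacent hits into regions, as a count such as "4,342 regions" implies, would be a further step that this sketch leaves out.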

[Figure 2 panel labels: (A) markers URA3, TRP1, ADE2, HIS3, LEU2, MET14; ARS209, ARS1323, ARS_Max2, ARS4, ARS727, ARS416, ARS_HI, ARS1018, ARS516, ARS1113, ARS1213; restriction sites PmeI, PacI; C. eth-2.0, 785,701 bp; 236 blocks, 37 segments, 16 megasegments. (B, C) YJV04 and YJV04 + C. eth-2.0 on selective medium (-Ura, -Trp, -His, -Met, -Leu, -Ade) and SD medium. (D) Lanes: marker, YJV04 digest, C. eth-2.0 digest; size marks 945, 825, 771, 750, 680, and 565 kb. (E) Sequencing coverage [log10] vs. genome coordinates [kb] at segment level in E. coli, megasegment level in E. coli, and C. eth-2.0.]

Fig. 2. Assembly of C. eth-2.0 in S. cerevisiae. (A) Schematic representation of the circular 785,701-bp C. eth-2.0 chromosome with six auxotrophic selection markers (red), 11 ARSs (black), and the restriction sites for PmeI and PacI (blue); 236 DNA blocks (green boxes) were assembled into 37 genome segments (blue boxes) and 16 megasegments (orange boxes) and further assembled into the complete C. eth-2.0 genome (outermost gray track). (B) The complete C. eth-2.0 chromosome was assembled in a single reaction from 16 megasegments by yeast spheroplast transformation and subsequent growth selection for auxotrophic TRP1 and LEU2 markers. (C) Growth selection on medium lacking Ura, Trp, His, Met, Leu, and Ade identified yeast clone 2 (C. eth-2.0) positive for all auxotrophic markers, while the parental strain (YJV04) fails to grow and requires synthetic defined (SD) medium. (D) Size validation of the 785-kb C. eth-2.0 chromosome by pulsed field gel electrophoresis. Digestion with PmeI and PacI releases a 771-kb portion of the C. eth-2.0 chromosome (arrow) from the shuttle vector pMR10Y. Undigested (marker) and PmeI- and PacI-digested yeast chromosomes (YJV04 digest) serve as controls. (E) DNA sequencing coverage at segment level (Top) and megasegment level (Middle) and the complete chromosome assembly (Bottom) are shown.
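As a quick plausibility check on the assembly hierarchy described in the caption, the mean part size at each tier can be computed from the numbers given in the text. This is a back-of-the-envelope sketch only: it assumes parts tile the 785,701-bp chromosome without overlap, which the source does not state (real assembly parts typically share overlapping ends), and the function and variable names are illustrative.

```python
GENOME_BP = 785_701  # size of the C. eth-2.0 design, from the text

# Number of parts at each assembly tier, from the text and Fig. 2A.
TIERS = {"blocks": 236, "segments": 37, "megasegments": 16}


def mean_part_size_kb(genome_bp: int, n_parts: int) -> float:
    """Mean part size in kb, assuming non-overlapping tiling (an assumption)."""
    return genome_bp / n_parts / 1000


def fan_in(n_children: int, n_parents: int) -> float:
    """Average number of lower-tier parts joined into one higher-tier part."""
    return n_children / n_parents


if __name__ == "__main__":
    for name, count in TIERS.items():
        print(f"{name:13s} n={count:3d}  mean size ~ "
              f"{mean_part_size_kb(GENOME_BP, count):.1f} kb")
    print(f"blocks per segment       ~ {fan_in(236, 37):.1f}")
    print(f"segments per megasegment ~ {fan_in(37, 16):.1f}")
```

Under these assumptions the means come out at about 3.3 kb, 21.2 kb, and 49.1 kb, which fall inside the 3- to 4-kb, 19- to 22-kb, and 38- to 65-kb ranges reported for blocks, segments, and megasegments.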

design errors and added five additional ARS sequences to promote efficient replication of the GC-rich C. eth-2.0 chromosome in yeast.

One-step transformation of the 16 corrected megasegments into yeast spheroplasts yielded two clones, one of which restored prototrophy for all six auxotrophic click markers, indicating complete assembly of C. eth-2.0 (Fig. 2C). We subsequently confirmed the presence of C. eth-2.0 as a single circular chromosome by pulsed field gel electrophoresis (Fig. 2D), diagnostic PCR (SI Appendix, Fig. S3), and whole-genome sequencing (Fig. 2E). C. eth-2.0 has a high GC content exceeding 57%, while previous chemically synthesized chromosomes (5, 8, 34) exhibit low GC contents closely matching the native yeast genome. So far, attempts to clone high GC sequences in yeast have proven to be difficult (35).

To assess whether C. eth-2.0 is stably maintained in yeast, we performed whole-genome sequencing on prolonged cultivation. After propagation for over 60 generations, we found no occurrences of adaptive mutations or chromosomal rearrangements within C. eth-2.0, indicating stable maintenance in YJV04 (SI Appendix, Fig. S4A). In agreement with this observation, electron micrographs showed normal yeast cell morphologies for parental cells and YJV04 bearing the C. eth-2.0 chromosome (SI Appendix, Fig. S4B).

We sequence verified C. eth-2.0 at each assembly level to assess the performance of the genome synthesis process (Fig. 2E). Across the 785-kb genome design, we detected a total of 21 nonsynonymous mutations (SI Appendix, Table S3). Thereof, 17 emanated from nonsequence perfect DNA blocks that were provided by one of two commercial suppliers. Only four additional missense mutations within the genes argS (arginyl-tRNA synthetase) and fabI (acyl-carrier protein) and the ribosomal genes S7P and L12P were introduced during segment and megasegment assembly in yeast and E. coli, respectively. No additional mutations occurred in the final assembly of the C. eth-2.0 chromosome, indicating a high sequence fidelity in the genome build process.

Mapping of Toxic Genes. It was previously reported in clone-based genome sequencing studies that natural microbial genomes contain genes encoding for toxic and dosage-sensitive expression products (36, 37). We speculated that toxic genes residing on C. eth-2.0 would prevent chromosomal maintenance in Caulobacter. Therefore, we tested the design in the form of the 37 individual C. eth-2.0 chromosome segments for the presence of toxic genes. Quantification of conjugational transfer from E. coli to Caulobacter in conjunction with sequencing demonstrated that toxic genes were absent in 25 of 37 chromosome segments (SI Appendix, Fig. S5 and Table S4). However, we observed a drastic reduction in transfer efficiency for 12 segments, suggesting presence of toxic genes that collectively cover 18.9 ± 3.6 kb in sequence (SI Appendix, Table S4). We carried out genetic suppressor analysis and identified evolved strains that tolerated formerly toxic genome segments (Materials and Methods). Whole-genome sequencing of suppressor strains led to the identification of 14 toxic genetic loci (SI Appendix, Table S5) that bear mutations alleviating toxicity. An additional three chromosome segments acquired small deletions on selection for fast growth (SI Appendix, Table S6). Among the toxic genes, we found three chromosome replication genes (dnaQ, dnaB, rarA), six genes involved in LPS and fatty acid biosynthesis (fabB, lptD, lpxD, accC, murU, waaF), two genes encoding interacting RNA polymerase components (rpoC, topA), the S10-spc-alpha ribosomal protein gene cluster (CETH_01304-01323), and the sodium-proton antiporter nhaA. Several of the identified toxic genes encode for interacting proteins that form complexes, suggesting an imbalance in subunit dosage as a likely cause for toxicity. Overall, the observed fraction of 1.9% (14 of 730) toxic
genes found in C. eth-2.0 is well in agreement with the previously reported average of 2.15% of toxic genes identified among seven E. coli strains (36). We concluded that computational sequence rewriting as part of the chemical synthesis rewriting process does not induce additional gene toxicity. In agreement with this hypothesis, 6 genes among the identified 14 toxic rewritten genes have been previously identified as "unclonable genes" in E. coli (37). Furthermore, misbalanced expression of rpoC, rarA, dnaB, and accC has previously been reported to elicit toxicity due to imbalance in protein complex subunit stoichiometry (38–41). Given the precedence of toxicity for ectopically expressed wild-type genes, we argue that the toxicity of these genes is likely attributed to a general property and not to the rewriting process, which maintains identical proteins.

Genome-Wide Functionality Assessment of C. eth-2.0. Because C. eth-2.0 is maintained in heterologous hosts throughout the build process, we next investigated whether rewritten genes resume their anticipated function on introduction into Caulobacter. Fault diagnosis and error isolation across large-scale DNA constructs are major challenges for bioengineering of synthetic genomes. To permit parallel functionality assessment of rewritten genes, we developed a transposon-based testing approach. This approach assesses the functionality of rewritten genes in merodiploid test strains, which harbor episomal copies of C. eth-2.0 segments in addition to the native chromosome. The testing approach measures the functional equivalence between native and rewritten genes. In the presence of functional C. eth-2.0 genes, previously essential native genes become dispensable and acquire disruptive transposon insertions (Fig. 3A). In contrast, in the presence of nonfunctional rewritten genes, native genes remain essential and do not tolerate disruptive transposon insertions (Fig. 3A). While functional gene variants prove that rewritten genes are functionally equivalent to native Caulobacter genes, such an analysis will also identify specific genes where sequence rewriting erases additional genetic control elements that are important for proper gene functioning.

We asked whether rewritten genes are functionally equivalent to native genes despite the massive level of sequence modification introduced. We subjected merodiploid test strains bearing episomal copies of the 37 C. eth-2.0 chromosome segments to transposon mutagenesis to test gene functionality. We compared transposon insertion patterns obtained between complementing and noncomplementing conditions along the native chromosome and assessed the functionality of C. eth-2.0 genes (Materials and Methods and Dataset S1). The sequence optimization and the substitutions introduced on rewriting of C. eth-2.0 allowed us to unambiguously assign transposon insertions to the native Caulobacter genome and the C. eth-2.0 genome. Cumulatively, we found 81.5% (432 of 530) of all essential and semiessential C. eth-2.0 genes to be functional (Fig. 3B and Table 3).

Functional rewritten C. eth-2.0 genes encompass a drastic reduction in the number of genetic features (annotated, cryptic, or predicted) compared with the wild-type genes. Maintenance of functionality within the rewritten genes suggests dispensability of these genetic features and hence, will lead to refinement of the current genome annotation. During the design process of C. eth-2.0, we have reduced the number of genetic features within CDS from 6,290 to 799 (Table 2). The high functionality level of 81.5% observed suggests that the large majority of the 6,290 previously annotated and predicted genetic features do not adopt essential function. Among the genetic features found to be dispensable were the three formerly assigned antisense transcripts (sRNAs) CCNA_R0109, R0151, and R0194 internal to rpoC, sufB, and atpD that acquired 16, 17, and 62 base substitutions during the rewriting process, respectively (Fig. 4A). Dispensability of the formerly assigned antisense transcripts suggests that the majority of chromosomally encoded sRNAs identified by transcriptome analysis (42, 43) do not elicit an essential function.

Sequence Design Flexibility Beyond Protein-Coding Sequences. The large majority of the 133,313 base substitutions were introduced within protein-coding sequences of the C. eth-2.0 genome. However, a significant number of nonsynonymous substitutions

[Figure 3 panel labels: (A) native Caulobacter chromosome; C. eth-2.0 chromosome segments 1-37; merodiploid strain; TnSeq; presence of Tn insertions, functional; absence of Tn insertions, nonfunctional. (B) C. eth-2.0 chromosome map with genome coordinates from 20K to 680K; functional genes, nonfunctional genes, nonessential genes.]

Fig. 3. Fault diagnosis and error isolation across the C. eth-2.0 chromosome. (A) Functionality assessment of the C. eth-2.0 chromosome. Merodiploid strains bearing episomal C. eth-2.0 chromosome segments (orange and blue circle) are subjected to transposon sequencing (TnSeq). Presence of transposon insertions (blue marks) in a previously essential chromosomal gene (gray arrows) indicates functionality of the homologous C. eth-2.0 gene (blue arrow), while absence of insertions indicates a nonfunctional C. eth-2.0 gene (orange arrow). (B) Functionality map of the C. eth-2.0 chromosome with functional genes (blue arrows), nonfunctional genes (orange arrows), and nonessential control genes (gray arrows).

were inserted within intergenic and noncoding regions, such as tRNA genes, to facilitate the de novo DNA synthesis process. We found that base substitutions within noncoding sequences were frequently tolerated and did not impair gene functionality. For example, the two tRNA genes tRNATrp and tRNATyr remained functional despite base changes that were introduced within the anticodon arm to erase DNA synthesis constraints present within the wild-type sequences (Fig. 4B). In the case of tRNATrp, we removed a Type IIS restriction site, and in tRNATyr, a homopolymeric sequence pattern hindering DNA synthesis was removed. Both rewritten tRNA genes retained their function as revealed by our transposon-based complementation measurements (Fig. 4B). These findings suggest that, even apart from protein-coding sequences, a high level of sequence design flexibility exists to imprint biological functions into DNA.

Table 3. Functionality of C. eth-2.0 genes according to cellular processes

Category                   Functional C. eth-2.0 genes*   P value†
Translation‡               73.6% (81/110)                 5.49E-03
  Ribosomal proteins‡      60.6% (20/33)                  2.01E-03
  tRNA synthetases         81.8% (18/22)                  6.14E-01
  tRNAs                    67.8% (19/28)                  4.73E-02
  Translation factors      88.9% (24/27)                  2.22E-01
…                          86.7% (13/15)                  4.51E-01
DNA replication            83.9% (26/31)                  4.69E-01
Cellular processes         87.2% (123/141)                1.12E-02
  Cell cycle               87.5% (28/32)                  2.52E-01
  Cell envelope            86.9% (73/84)                  8.48E-02
  Protein turnover         88.0% (22/25)                  2.81E-01
Energy production          73.9% (34/46)                  1.03E-01
Metabolism§                90.3% (121/134)                2.60E-04
Hypothetical proteins‡     64.2% (34/53)                  7.45E-04
Total                      81.5% (432/530)

*Fraction of functional C. eth-2.0 genes as assessed by transposon sequencing. Numbers of functional genes vs. total gene numbers per class are shown in parentheses.
†P values for functionality enrichment and deenrichment of different gene categories.
‡Categories of genes that display a significant decrease in functionality.
§Categories of genes that display a significant increase in functionality.

Level of Gene Functionality Among Cellular Processes. We reasoned that the analysis of gene functionality among different cellular processes would permit identification of gene classes harboring high levels of transcriptional and translational control elements within CDS. Assignment of gene functionality among cellular processes revealed that metabolic genes were enriched with over 90.3% functionality (P value of 2.60E-4) (Table 3). This supports the idea that metabolic genes contain a low level of transcriptional and translational control elements embedded within their CDS. This finding correlates with the observation that regulation of bacterial metabolism mainly occurs at the enzymatic level (44, 45). However, hypothetical and ribosomal genes were underrepresented, with 64.2 and 60.6% functional C. eth-2.0 genes, respectively (P values of 5.45E-4 and 2.01E-3, respectively) (Table 3). Based on these findings, we estimated that one-third of the hypothetical essential genes encode for genetic features other than the annotated protein-coding sequence. Likewise, close to 40% of all ribosomal genes likely contain additional regulatory


Fig. 4. Sequence design flexibility within rewritten C. eth-2.0 genes. (A) Dispensability of antisense RNAs. Schematic depicting dispensable antisense transcripts embedded within CDSs of genes rpoC, sufB, and atpD (blue arrows). On synonymous rewriting, antisense transcripts CCNA_R0109, CCNA_R0151, and CCNA_R0194 (dotted arrows) internal to rpoC, sufB, and atpD acquired 16, 17, and 62 base substitutions, respectively. Essential chromosomal genes rpoC, sufB, and atpD carry disruptive transposon insertions (blue marks) in the presence of complementing C. eth-2.0 chromosome segments (blue marks) compared with the transposon insertion pattern of the wild-type control strain (green marks), indicating that antisense transcripts are nonessential. (B) Schematic depiction of the secondary structure of the rewritten tRNATrp and tRNATyr. Type IIS restriction sites (red letters; Left) and homopolymeric sequences (red letters; Right) hindering chemical synthesis of tRNA genes were erased by introducing base substitutions (blue) in the anticodon arms while maintaining the anticodons (gray box). Transposon testing reveals functionality of C. eth-2.0 tRNA genes. (C) Functionality testing of C. eth-2.0 operons. On complementation with C. eth-2.0 operons, chromosomal genes tolerate disruptive transposon insertions (blue marks) throughout the native operon, leading to simultaneous inactivation of multiple native genes.
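The per-category percentages in Table 3 follow directly from the counts given in parentheses. The Python sketch below recomputes a few of them and, purely as an illustration, scores a category against the overall 432/530 background with one-sided hypergeometric tails. The text does not state which statistical test produced the published P values, so the hypergeometric model, the function names, and the "metabolism" label for the 121/134 row are assumptions; the tail probabilities computed here need not match the published figures.

```python
from math import comb

# Counts (functional, total) copied from Table 3. The "metabolism" label is an
# assumption based on the running text; the table row name itself is not legible.
TABLE3 = {
    "translation": (81, 110),
    "ribosomal proteins": (20, 33),
    "metabolism (assumed label)": (121, 134),
    "hypothetical proteins": (34, 53),
}
TOTAL_FUNCTIONAL, TOTAL_GENES = 432, 530


def percent_functional(functional, total):
    """Percentage of functional genes, rounded to one decimal as in Table 3."""
    return round(100.0 * functional / total, 1)


def hypergeom_pmf(k, M, n, N):
    """P(X = k) when N genes are drawn from M genes of which n are functional."""
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)


def tail_probabilities(functional, total, n=TOTAL_FUNCTIONAL, M=TOTAL_GENES):
    """Return (lower, upper) one-sided tails: P(X <= functional), P(X >= functional).

    Assumed model (not taken from the paper): the category is a random draw of
    `total` genes from the 530 tested genes, 432 of which are functional.
    """
    lo = max(0, total - (M - n))
    hi = min(total, n)
    lower = sum(hypergeom_pmf(k, M, n, total) for k in range(lo, functional + 1))
    upper = sum(hypergeom_pmf(k, M, n, total) for k in range(functional, hi + 1))
    return lower, upper


if __name__ == "__main__":
    for name, (functional, total) in TABLE3.items():
        lower, upper = tail_probabilities(functional, total)
        print(f"{name}: {percent_functional(functional, total)}% "
              f"(lower tail {lower:.2e}, upper tail {upper:.2e})")
```

A small lower tail indicates a category with fewer functional genes than expected from the background rate, and a small upper tail indicates more than expected.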

elements embedded within their protein-coding sequence. From a gene regulatory perspective, this is not surprising, as protein synthesis is the major consumer of cellular energy (46). Furthermore, the biogenesis of functional ribosome complexes depends on the concerted transcriptional control of many ribosomal operon genes (47). In sum, our analysis suggests that a low level of additional essential regulatory elements is embedded within the protein-coding sequences of metabolic genes. However, a high number of regulatory elements are embedded within ribosomal genes and other coexpressed multigene modules of the bacterial core cell.

Rewritten C. eth-2.0 Operons Encompass Fully Functional Biological Modules. Although a significant amount of chemical synthesis rewritten C. eth-2.0 genes are functional on an individual basis, we hypothesized that additive fitness effects might arise when multiple synthetic genes were combined. We thus searched for chromosomal transposon insertions within essential operons leading to simultaneous inactivation of multiple essential gene products due to truncation of a polycistronic mRNA transcript. We observed such transposon insertions within 41 formerly essential Caulobacter operons (Dataset S3), suggesting that the chemical synthesis rewritten C. eth-2.0 counterparts may indeed encompass fully functional biological modules complementing the function of their native counterparts. One example includes the mreBCD-rodA operon, which is involved in the coordination of the cell wall peptidoglycan biosynthesis machinery. This complex is critical for the generation and maintenance of bacterial cell shape (48). We found transposon insertions disrupting the chromosomal polycistronic mRNA, suggesting that the function of the mreBCD-rodA operon is complemented by C. eth-2.0 mreBCD-rodA operon genes (Fig. 4C). Similar patterns of transposon insertions were obtained for the protein chaperone operon groEL-ES (49) and the membrane operon yidC-yidA (50). Both tolerated disruptive transposon insertions throughout the native operon sequence, leading to simultaneous inactivation of multiple genes. These findings support the idea that additive fitness effects do not likely arise when multiple synthetic genes are combined into functional modules, which will ultimately simplify the build process of artificial chromosomes by using chemical synthesis rewritten DNA.

Discovery of Essential Genetic Features Within CDS. We reasoned that the process of chemical synthesis rewriting offers a powerful experimental approach to map hitherto unknown regulatory elements encoded within protein-coding sequence and validate the genetic annotation accuracy of an organism's genome. Discovery of genes that lost their function on rewriting suggested the presence of additional essential genetic features, which have evaded previous genome annotation efforts. Error cause classification of the 98 nonfunctional C. eth-2.0 genes (Materials and Methods) pinpointed to 52 instances of imprecise annotation of the ancestral Caulobacter genome including misannotated promoter regions and incorrect TSS predictions. This implies that a significant number of protein-coding genes remain misannotated within curated genomes. We found evidence for 27 genes in which transcriptional and translational control signals embedded within protein-coding sequences were erased due to sequence rewriting. This finding suggests that internal transcriptional and translational control elements often occur within CDS. In only 13 instances, we detected nonfunctional genes due to base substitutions introduced outside of protein-coding sequences to optimize synthesis. Furthermore, six genes acquired deleterious mutations during the build and boot-up process (Dataset S2). These findings suggest that inaccurate annotation is the main cause for losing functionality on synonymous rewriting of protein-coding sequences.

Fig. 5. Fault diagnosis and repair across the C. eth-2.0 cell division gene cluster. (A) Fault diagnosis across the C. eth-2.0 chromosome. Transposon insertions in the wild-type control (green marks) and on complementation with C. eth-2.0 cell division genes (blue marks) are shown. With the exception of the four nonfunctional genes murG, murC, ftsQ, and ftsZ (orange arrows), the large majority of rewritten genes are functional (blue arrows). (B) Chemical synthesis rewriting reveals genetic control elements present within the cell division gene cluster, including translational coupling signals (murG), internal ribosome binding sites (RBSs; murC, ftsQ), extended promoter regions (…), and attenuator sequences upstream of ftsZ. (C) Insertion of the wild-type sequence elements upstream of nonfunctional cell division genes restores gene expression as measured by β-galactosidase assays using lacZ reporter gene fusions. [Panel labels recovered from the figure: gene cluster mraW, ftsL, ftsI, murE, murF, mraY, murD, ftsW, murG, murC, murB, ddlB, ftsQ, ftsA, ftsZ; scale bar, 1 kb; element labels "Translational coupling," "RBS," "Promoter," and "Attenuator"; y axis, LacZ activity for non-functional vs. repaired genes.]

Genetic Control Features Within the Cell Division Genes. We next investigated the presence of additional genetic features within the cell division genes murG, murC, ftsQ, and ftsZ, in which complementation with the corresponding C. eth-2.0 counterparts failed (Fig. 5A). We hypothesized that computational sequence rewriting likely erased critical control elements needed for proper gene expression. Indeed, we found that sequence rewriting of an overlapping CDS upstream of … erased an extended promoter region. Similarly, rewriting of … erased an internal ribosome binding site necessary for translation of the downstream gene … (Fig. 5B). The rewriting of … corrupted the associated embedded short transcript (ftsWs) necessary for translation (29) (Fig. 5B). Finally, we found a short annotated CDS (CETH_03971) (29) upstream of the nonfunctional ftsZ gene. However, sequence analysis revealed that the wild-type sequence contains a hairpin secondary structure, which resembles a transcriptional attenuation element. This may control ftsZ expression depending on metabolic conditions (Fig. 5B). While additional studies are needed to unravel the exact molecular functions of these genetic

control elements within the cell division gene cluster, we found that repair of the sequence upstream of the ftsZ gene restored the ftsZ expression levels (Fig. 5C). Similarly, insertion of the wild-type sequence elements into the C. eth-2.0 genes murC, ftsQ, and murG also restored gene expression (Fig. 5C). This suggests that, after missing essential genetic elements are identified, error causes can rapidly be deduced to allow for rational repair of genome designs. Furthermore, identification and error diagnosis of noncomplementing genes will provide a formidable opportunity to uncover DNA design principles that will further improve our capabilities in programing biological functions into synthetic chromosomes.

Discussion

C. crescentus has emerged as an important model organism for understanding the regulation of the bacterial cell cycle (25, 51, 52). A notable feature of Caulobacter is that the regulatory events that control polar differentiation and cell cycle progression are highly integrated and occur in a temporally restricted order (53). The advent of genomic technologies has enabled global analyses that have revolutionized our understanding of Caulobacter genetic core networks that control the lifecycle (26–29). In recent years, many components of the regulatory circuit have been identified, and simulation of the circuitry has been reported (25, 54). More recent experimental work using transposon sequencing has shown that 12% of the Caulobacter genome is essential for survival under laboratory conditions (30). The identified set of essential sequences included not only protein-coding sequences but also regulatory regions and noncoding elements that collectively store the genetic information necessary to run a living cell. Of the individual DNA regions identified as essential, 91 were noncoding regions of unknown function, and 49 were genes presumably coding for hypothetical proteins with function that is unknown.

Although classical genetic approaches dissect the functioning of biological systems by analyzing individual native genes, uncovering the function of essential genes has remained very challenging. Herein, we show that the rewriting of entire genomes through the process of chemical synthesis provides a powerful and complementary research concept to understand how essential functions are programed into genomes. Contemporary synthetic genome projects (3, 5, 8) have largely maintained natural genome sequences, implementing only modest design changes to increase the likelihood of functionality. However, conservative genome design misses a key opportunity of chemical DNA synthesis: the rewriting of DNA to advance our understanding of how fundamental biological functions are encoded within genomes. Indeed, synthetic autonomous bacteria, such as M. mycoides strain JCVI-syn3.0 made up of 473 genes within a 531-kb genome (5), resulted in the creation of a replicative cell. However, it also encompasses 149 genes with unknown functions (84 labeled as "generic," and 65 labeled as "unknowns") (55). This corresponds to over one-third of its gene set. While these studies were highly valuable to experimentally determine the core set of genes for an independently replicating cell, they did not probe the genetic information content of its essential genes.

By rebuilding the essential genome of Caulobacter through the process of chemical synthesis rewriting, we assessed the essential genetic information content of a bacterial cell on the level of its protein-coding sequences. Within the 785,701-bp genome of C. eth-2.0, we used sequence rewriting to reduce the number of genetic features present within protein-coding sequences from 6,290 to 799. Overall, we introduced 133,313 base substitutions, resulting in the synonymous rewriting of 123,562 codons. We speculated that synonymous rewriting of protein-coding sequences maintains the encoded sequences but likely erases additional genetic information layers. These include alternative reading frames as well as hidden control elements embedded within protein-coding sequences of essential genes.

Rewriting of 56% of all codons resulted in complete rewriting of the essential Caulobacter transcriptome. Despite incorporating such drastic changes at the level of mRNA, our functionality analysis revealed that over 432 of the transcribed essential genes of C. eth-2.0 corresponding to 81.5% of all rewritten essential genes are equal in functionality to natural counterparts to support viability. This result suggests that, in most essential genes, the primary mRNA sequence, the secondary structure, or the codon context has no significant influence on biological functionality. This finding is surprising given the fact that previous studies on individual genes reported that codon translation in vivo is controlled by many factors, including codon context (56). Furthermore, our findings suggest that the vast majority of the probed ORFs encode exclusively for proteins and that other layers of genetic control do not seem to play a significant role. Among the 134 enzyme-encoding genes that make up the metabolic core network of C. eth-2.0, the level of functional genes is even over 90%, suggesting that rewritten biosynthetic pathways retain their functionality in most cases. A possible explanation for the high proportion of functional metabolic genes might be the fact that regulation of essential metabolic functions occurs rather by allosteric interactions at the level of enzymes than at the level of gene expression.

In addition to 432 functional rewritten genes, our study precisely mapped 98 genes that lost functionality on synonymous rewriting as detected by our transposon-based functionality assessment. Since retaining solely the protein-coding sequences of these genes is not sufficient for their functionality, it is reasonable to conclude that these genes are misannotated or contain hitherto unknown essential genetic elements embedded within their CDS. Alternatively, it is also possible that a subset of these genes encode for RNA rather than protein-coding functions. Taken together, our genome rewriting approach can be used to experimentally validate the annotation fidelity of entire genomes.

Altogether, the identified set of 98 nonfunctional genes corresponds to less than 20% of the essential genome of C. eth-2.0 and precisely revealed where we currently have gaps in our knowledge that persisted despite previous omics-informed genome reannotation efforts. In the future, it will be interesting to unravel why rewriting renders particular genes nonfunctional. These studies will shed light onto hitherto unknown transcriptional and translational control layers embedded within protein-coding sequences that are of fundamental importance for proper gene functioning. Targeted repair of identified nonfunctional C. eth-2.0 genes, as exemplified within the subset of the four faulty cell division genes murG, murC, ftsQ, and ftsZ, will lead to the discovery of genetic features, such as the essential attenuator element identified upstream of the ftsZ gene, the function of which is currently unknown. We acknowledge that the 98 identified nonfunctional genes are still poorly understood, yet our findings on C. eth-2.0 serve as an excellent starting point to close current knowledge gaps in essential genome functions toward rational construction of a synthetic organism with a fully defined genetic blueprint.

On the level of de novo DNA synthesis, we herein demonstrate how chemical synthesis rewriting facilitates the genome synthesis process. To simplify the entire genome build process, we used sequence design algorithms (31, 32) and collectively introduced 10,172 base substitutions to remove 5,668 DNA synthesis constraints, including 1,233 repeats, 93 homopolymeric stretches, and 4,342 regions of high GC content. Successful low-cost synthesis and subsequent higher-order assembly of C. eth-2.0 into the complete chromosome exemplify the utility of our approach to rapidly produce designer genomes.
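The three classes of synthesis constraints named above (repeats, homopolymeric stretches, and regions of high GC content) can be located with simple sequence scans. The following Python sketch is illustrative only: the thresholds (`min_len`, `window`, `max_gc`, `k`) and the function names are assumptions chosen for the example, not the parameters or the implementation of the published design algorithms (31, 32).

```python
import re
from collections import defaultdict


def homopolymer_runs(seq, min_len=8):
    """Return (start, end, base) for every run of a single base >= min_len long."""
    # min_len is a hypothetical threshold, not a value taken from the paper.
    pattern = re.compile(r"A{%d,}|C{%d,}|G{%d,}|T{%d,}" % ((min_len,) * 4))
    return [(m.start(), m.end(), m.group()[0]) for m in pattern.finditer(seq.upper())]


def high_gc_windows(seq, window=20, max_gc=0.8):
    """Return start positions of sliding windows whose GC fraction exceeds max_gc."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1):
        chunk = seq[start:start + window]
        gc_fraction = (chunk.count("G") + chunk.count("C")) / window
        if gc_fraction > max_gc:
            hits.append(start)
    return hits


def repeated_kmers(seq, k=12):
    """Map every k-mer that occurs more than once to its list of start positions."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


if __name__ == "__main__":
    demo = "ACGT" + "A" * 9 + "GC" * 12 + "ACGTTGCA"
    print(homopolymer_runs(demo))
    print(high_gc_windows(demo))
    print(repeated_kmers(demo, k=8))
```

In a rewriting workflow, each reported position would mark a candidate site for a synonymous base substitution; a real pipeline would also need to check that the substitution preserves the encoded protein sequence.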

Our results highlight the promise of chemical synthesis rewriting of entire genomes to understand how the most fundamental functions of a cell are programed into DNA. On the systems engineering level, our design–build–test approach enables us to harness massive design flexibility to produce rewritten genomes that are customized in sequence while maintaining their biological functionality. On the level of genome synthesis, our findings also highlight how chemical synthesis facilitates rewriting of biological information into DNA sequences that can be physically manufactured in a highly reliable manner, thereby reducing costs and increasing effectiveness of the genome build process. In sum, our results highlight the promise of chemical synthesis rewriting to decode fundamental genome functions and its utility toward design of improved organisms for industrial purposes and health benefits.

Materials and Methods

Detailed materials and methods are in SI Appendix. The sequence of the C. eth-2.0 genome has been deposited in the NCBI database (GenBank accession no. CP035535).

Design of the C. eth-1.0 Genome and Sequence Rewriting into C. eth-2.0. To streamline the C. eth-1.0 design (30) for DNA synthesis, the previously reported Genome Calligrapher algorithm and sequence design pipeline (31) were applied at a codon recoding probability of 0.56. The C. eth-2.0 design contains a low amount of both synthesis constraints and unnecessary genetic features. To enable a streamlined retrosynthetic assembly route, the C. eth-2.0 genome was partitioned into 3- to 4-kb DNA blocks using the previously published Genome Partitioner algorithm (32).

Synthesis and Hierarchical Assembly of the C. eth-2.0 Genome. The partitioned 3- to 4-kb DNA blocks for the hierarchical assembly of the C. eth-2.0 genome were ordered from two commercial suppliers of low-cost de novo DNA synthesis. The blocks were assembled into 20-kb segments and subsequently, into 40- to 60-kb megasegments using yeast homologous gap repair. To verify the assemblies, a junction-amplifying PCR was conducted. To assemble the megasegments into the 785-kb C. eth-2.0 chromosome, homologous gap repair was done by the newly generated S. cerevisiae strain YJV04. To transform the segments into the yeast cells, a spheroplast procedure was applied. The assembly was verified by a junction-amplifying PCR. The correct size of the construct was verified using pulse field agarose gel electrophoresis by lysing the yeast cells inside an agarose plug. The sequence of the C. eth-2.0 construct was verified using the Illumina NextSeq and iSeq systems.

Construction of Merosynthetic C. eth-2.0 Test Strains. Sequence-confirmed C. eth-2.0 segments were conjugated from E. coli S17-1 into Caulobacter NA1000 to generate a panel of 37 merosynthetic test strains. The occurrence of toxic genes within the different segments was measured by the conjugation frequency. To pinpoint the toxic genes, the segments were sequenced on an Illumina system after the boot up in Caulobacter. Using the sequencing data, the evolved mutations within the toxic C. eth-2.0 segments were analyzed, yielding the precise coordinates of the toxic genes.

Fault Diagnosis of C. eth-2.0 by Transposon Sequencing. To benchmark the functionality of the C. eth-2.0 genes, transposon sequencing was applied (30). The analysis was conducted using hypersaturated transposon libraries and an Illumina system. The transposon sequencing data were mapped onto the original Caulobacter genome sequence, resulting in a set of all functional C. eth-2.0 genes. After analyzing the nonfunctional genes, a repair of the sequence was done using standard cloning techniques. To test the repaired C. eth-2.0 genes, a β-galactosidase reporter assay was conducted.

ACKNOWLEDGMENTS. We thank R. Schlapbach and L. Poveda from Zürich Functional Genomics Center (ZFGC) for sequencing support; B. Maier and members from ScopeM for electron microscopy support; F. Rudolf for assistance with yeast marker design; S. Nath from the Joint Genome Institute (JGI) for DNA synthesis and sequencing support; H. Christen for conception of computational algorithms; and Samuel I. Miller, Markus Aebi, and Uwe Sauer for critical comments. This work received institutional support from Community Science Program (CSP) DNA Synthesis Award Grants JGI CSP-1593 (to M.C. and B.C.) and CSP-2840 (to M.C. and B.C.) from the US Department of Energy Joint Genome Institute, Swiss Federal Institute of Technology (ETH) Zürich, ETH Research Grant ETH-08 16-1 (to B.C.), and Swiss National Science Foundation Grant 31003A_166476 (to B.C.). The work conducted by the US Department of Energy Joint Genome Institute, a US Department of Energy Office of Science User Facility, is supported by Office of Science of the US Department of Energy Contract DE-AC02-05CH11231.

1. Cello J, Paul AV, Wimmer E (2002) Chemical synthesis of poliovirus cDNA: Generation of infectious virus in the absence of natural template. Science 297:1016–1018.
2. Smith HO, Hutchison CA, Pfannkoch C, Venter JC (2003) Generating a synthetic genome by whole genome assembly: phiX174 bacteriophage from synthetic oligonucleotides. Proc Natl Acad Sci USA 100:15440–15445.
3. Gibson DG, et al. (2008) Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science 319:1215–1220.
4. Gibson DG, et al. (2010) Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329:52–56.
5. Hutchison CA, et al. (2016) Design and synthesis of a minimal bacterial genome. Science 351:aad6253.
6. Annaluru N, et al. (2014) Total synthesis of a functional designer eukaryotic chromosome. Science 344:55–58.
7. Mitchell LA, et al. (2017) Synthesis, debugging, and effects of synthetic chromosome consolidation: synVI and beyond. Science 355:eaaf4831.
8. Richardson SM, et al. (2017) Design of a synthetic yeast genome. Science 355:1040–1044.
9. Shen Y, et al. (2017) Deep functional analysis of synII, a 770-kilobase synthetic yeast chromosome. Science 355:eaaf4791.
10. Wu Y, et al. (2017) Bug mapping and fitness testing of chemically synthesized chromosome X. Science 355:eaaf4706.
11. Xie ZX, et al. (2017) "Perfect" designer chromosome V and behavior of a ring derivative. Science 355:eaaf4704.
12. Coleman JR, et al. (2008) Virus attenuation by genome-scale changes in codon pair bias. Science 320:1784–1787.
13. Martínez MA, Jordan-Paiz A, Franco S, Nevot M (2016) Synonymous virus genome recoding as a tool to impact viral fitness. Trends Microbiol 24:134–147.
14. Mueller S, et al. (2010) Live attenuated influenza virus vaccines by computer-aided rational design. Nat Biotechnol 28:723–726.
15. Wang HH, et al. (2009) Programming cells by multiplex genome engineering and accelerated evolution. Nature 460:894–898.
16. Isaacs FJ, et al. (2011) Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science 333:348–353.
17. Lajoie MJ, et al. (2013) Probing the limits of genetic recoding in essential genes. Science 342:361–363.
18. Napolitano MG, et al. (2016) Emergent rules for codon choice elucidated by editing rare arginine codons in Escherichia coli. Proc Natl Acad Sci USA 113:E5588–E5597.
19. Wang K, et al. (2016) Defining synonymous codon compression schemes by genome recoding. Nature 539:59–64.
20. Lau YH, et al. (2017) Large-scale recoding of a bacterial genome by iterative recombineering of synthetic DNA. Nucleic Acids Res 45:6971–6980.
21. Ostrov N, et al. (2016) Design, synthesis, and testing toward a 57-codon genome. Science 353:819–822.
22. Holtzendorff J, et al. (2004) Oscillating global regulators control the genetic circuit driving a bacterial cell cycle. Science 304:983–987.
23. McGrath PT, et al. (2007) High-throughput identification of transcription start sites, conserved promoter motifs and predicted regulons. Nat Biotechnol 25:584–592.
24. Christen M, et al. (2010) Asymmetrical distribution of the second messenger c-di-GMP upon bacterial cell division. Science 328:1295–1297.
25. Shen X, et al. (2008) Architecture and inherent robustness of a bacterial cell-cycle control system. Proc Natl Acad Sci USA 105:11340–11345.
26. Schrader JM, et al. (2014) The coding and noncoding architecture of the Caulobacter crescentus genome. PLoS Genet 10:e1004463.
27. Zhou B, et al. (2015) The global regulatory architecture of transcription during the Caulobacter cell cycle. PLoS Genet 11:e1004831.
28. Nierman WC, et al. (2001) Complete genome sequence of Caulobacter crescentus. Proc Natl Acad Sci USA 98:4136–4141.
29. Schrader JM, et al. (2016) Dynamic translation regulation in Caulobacter cell cycle control. Proc Natl Acad Sci USA 113:E6859–E6867.
30. Christen B, et al. (2011) The essential genome of a bacterium. Mol Syst Biol 7:528.
31. Christen M, Deutsch S, Christen B (2015) Genome calligrapher: A web tool for refactoring bacterial genome sequences for de novo DNA synthesis. ACS Synth Biol 4:927–934.
32. Christen M, Del Medico L, Christen H, Christen B (2017) Genome partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications. PLoS One 12:e0177234.
33. Christen M, Christen B (2019) Caulobacter ethensis CETH2.0 genome sequence. GenBank. Available at https://www.ncbi.nlm.nih.gov/search/all/?term=CP035535. Deposited January 31, 2019.

34. Gibson DG, et al. (2010) Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329:52–56.
35. Noskov VN, et al. (2012) Assembly of large, high G+C bacterial DNA fragments in yeast. ACS Synth Biol 1:267–273.
36. Kimelman A, et al. (2012) A vast collection of microbial genes that are toxic to bacteria. Genome Res 22:802–809.
37. Sorek R, et al. (2007) Genome-wide experimental determination of barriers to horizontal gene transfer. Science 318:1449–1452.
38. Izard J, et al. (2015) A synthetic growth switch based on controlled expression of RNA polymerase. Mol Syst Biol 11:840.
39. Shibata T, et al. (2005) Functional overlap between RecA and MgsA (RarA) in the rescue of stalled replication forks in Escherichia coli. Genes Cells 10:181–191.
40. Allen G, Kornberg A (1991) Fine balance in the regulation of DnaB helicase by DnaC protein in replication in Escherichia coli. J Biol Chem 266:22096–22101.
41. Choi-Rhee E, Cronan JE (2003) The biotin carboxylase-biotin carboxyl carrier protein complex of Escherichia coli acetyl-CoA carboxylase. J Biol Chem 278:30806–30812.
42. Rutherford ST, Bassler BL (2012) Bacterial quorum sensing: Its role in virulence and possibilities for its control. Cold Spring Harbor Perspect Med 2:a012427.
43. Storz G, Vogel J, Wassarman KM (2011) Regulation by small RNAs in bacteria: Expanding frontiers. Mol Cell 43:880–891.
44. Monod J, Changeux JP, Jacob F (1963) Allosteric proteins and cellular control systems. J Mol Biol 6:306–329.
45. Chubukov V, Gerosa L, Kochanowski K, Sauer U (2014) Coordination of microbial metabolism. Nat Rev Microbiol 12:327–340.
46. Li GW, Burkhardt D, Gross C, Weissman JS (2014) Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157:624–635.
47. Paul BJ, Ross W, Gaal T, Gourse RL (2004) rRNA transcription in Escherichia coli. Annu Rev Genet 38:749–770.
48. Dye NA, Pincus Z, Theriot JA, Shapiro L, Gitai Z (2005) Two independent spiral structures control cell shape in Caulobacter. Proc Natl Acad Sci USA 102:18608–18613.
49. Susin MF, Baldini RL, Gueiros-Filho F, Gomes SL (2006) GroES/GroEL and DnaK/DnaJ have distinct roles in stress responses and during cell cycle progression in Caulobacter crescentus. J Bacteriol 188:8044–8053.
50. Kiefer D, Kuhn A (2007) YidC as an essential and multifunctional component in membrane protein assembly. Int Rev Cytol 259:113–138.
51. McAdams HH, Shapiro L (2003) A bacterial cell-cycle regulatory network operating in time and space. Science 301:1874–1877.
52. McAdams HH, Shapiro L (2009) System-level design of bacterial cell cycle control. FEBS Lett 583:3991.
53. Lasker K, Mann TH, Shapiro L (2016) An intracellular compass spatially coordinates cell cycle modules in Caulobacter crescentus. Curr Opin Microbiol 33:131–139.
54. Skerker JM, Laub MT (2004) Cell-cycle progression and the generation of asymmetry in Caulobacter crescentus. Nat Rev Microbiol 2:325–337.
55. Danchin A, Fang G (2016) Unknown unknowns: Essential genes in quest for function. Microb Biotechnol 9:530–540.
56. Chevance FFV, Hughes KT (2017) Case for the genetic code as a triplet of triplets. Proc Natl Acad Sci USA 114:4745–4750.
