View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Elsevier - Publisher Connector

Available online at www.sciencedirect.com

ScienceDirect

Critical reflections on synthetic gene design for

recombinant protein expression

1 2 1

Annabel HA Parret , Hu¨ seyin Besir and Rob Meijers

Gene synthesis enables the exploitation of the degeneracy of expression using large-scale fermentation [2]. However,

the genetic code to boost expression of recombinant protein large-scale fermentation often requires optimization to

targets for structural studies. This has created new address problems of plasmid loss, oxygen deprivation and

opportunities to obtain structural information on proteins that pH fluctuation [3].

are normally present in low abundance. Unfortunately,

synthetic gene expression occasionally leads to insoluble or A more cost-effective approach is the optimization of the

misfolded proteins. This could be remedied by recent insights protein source, either by opting for a protein species of

in the effect of codon usage on translation initiation and high natural abundance, or by using an efficient recom-

elongation. In this review, we discuss the interplay between binant expression system. Traditionally, selecting protein

optimal gene and vector design to enhance expression in a homologues from species that can be purified from natural

particular host and highlight the benefits and potential pitfalls source has been rather successful [4–8]. Nevertheless,

associated with protein expression from synthetic genes. there is an increasing need to obtain the structure of a

protein from a particular species, in order to assess spe-

Addresses

1 cies-specific functional aspects. Structures of human or

European Laboratory, Hamburg Unit, Notkestrasse

85, 22607 Hamburg, Germany murine proteins involved in the immune or nervous

2

Protein Expression and Purification Core Facility, European Molecular system can help to relate the effect of specific point

Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany

mutants or posttranslational modifications. Similarly,

structural analysis of virulent proteins from bacterial

Corresponding author: Meijers, Rob ([email protected])

pathogens can reveal species-specific details important

for drug design. In this review, we focus on the oppor-

Current Opinion in Structural Biology 2016, 38:155–162 tunities that gene synthesis provides to optimize recom-

This review comes from a themed issue on New constructs and binant protein expression of specific protein targets.

expression of proteins

Edited by Rob Meijers and Anastassis Perrakis

Application of synthetic genes for protein

For a complete overview see the Issue and the Editorial production

Available online 21st July 2016 The use of synthetic genes for the production of proteins

http://dx.doi.org/10.1016/j.sbi.2016.07.004 for structural characterization is widespread. In general,

gene synthesis may expedite the production of a series of

0959-440X/# 2016 The Authors. Published by Elsevier Ltd. This is an

constructs with engineered restriction sites. It is often

open access article under the CC BY-NC-ND license (http://creative-

commons.org/licenses/by-nc-nd/4.0/). preferred to the use of cDNA, which can contain several

isoforms or splice variants. For target proteins identified

from metagenomes or extremophiles, gene synthesis may

provide the only route to protein expression, because

cDNA is not available. Protein expression from codon-

optimized genes has been especially successful for pro-

Introduction teins involved in cell signalling whose native codon

The efficient overexpression of proteins is a prerequisite sequence can be strongly regulated, such as Irisin [9],

for successful structure determination. Structural geno- Parkin [10] and Netrin [11]. Codon optimization has also

mics studies on a wide range of soluble protein targets uncovered hidden levels of gene regulation that are

over more than a decade have shown that more than 40% encoded in codon bias. Codon optimization of the

of the constructs failed due to limited expression FRQ protein involved in the regulation of circadian

(Figure 1). The high failure rate is a result of the use rhythms resulted in a loss of function, although expres-

of standardized approaches, and can serve as a baseline for sion levels were increased [12 ]. Codon optimization also

a structure determination project on a specific target. To revealed a hidden quality control mechanism encoded in

alleviate these ominous statistics, a number of approaches the gene of the cystic fibrosis transmembrane conduc-

have been adopted which address recombinant protein tance regulator (CFTR), an anion channel membrane

expression. The use of parallel cloning strategies employ- protein [13]. Similarly, optimization of the CTP1L endo-

ing different construct lengths and purification tags can lysin gene led to the discovery of a secondary translation

increase success rate to a certain extent [1]. Alternatively, site that produced a truncated protein. The truncated

larger amounts of protein can be obtained by scaling up protein forms a 1:1 complex with the full length protein,

www.sciencedirect.com Current Opinion in Structural Biology 2016, 38:155–162

156 New constructs and expression of proteins

Figure 1

ed features that have been identified to hamper expres-

sion (Table 2). Gene design tools enable researchers to

40000 optimize the sequence of their gene of interest (GOI)

2004 [20]. A common strategy to manipulate codon bias to

35000 100%

100% 2016 maximize protein translation is based on the ‘codon

30000 adaptation index’ (CAI) [21]. CAI is a species-specific

index for codon frequency based on a set of highly

25000

expressing genes. Individual genes can then be analyzed,

60%

20000 comparing the actual codons used to a fully optimized

51% gene that contains only the most frequently used codons.

15000

It should be noted that CAI maximization does not

No. of targets

10000 necessarily correlate with high protein yield [22–25]. In

9% 23%

1 general, the best results using CAI optimization are 17%

5000

* obtained when a subset of less frequent codons is

4% 3% 3%

0 replaced, after which a secondary list of criteria should

Cloned Expressed Purified Experiment PDB deposit be optimized involving DNA and RNA sequence ele-

Milestones ments that can negatively influence expression of the

Current Opinion in Structural Biology GOI (Table 2). It appears some features of codon bias are

universal for all expression systems, whereas others are

specific for prokaryotic or eukaryotic expression.

A comparison of the total number of targets achieved per milestone

along a structure genomics pipeline as reported by TargetDB in

2004 [104] and as of 15th of March 2016 (http://targetdb.rcsb.org/

Codon optimization algorithms will generally allow ad-

metrics/). The recent data are compiled from soluble protein targets

justment of several of these undesirable sequence prop-

from three consortia (JCSG, NESG and NYSGRC). The attrition rate

erties simultaneously [20,26–28], although the weight

from clone to purified target has changed little over a decade and

hovers around 75%, indicating that the early stages of a structure given to each sequence parameter varies between differ-

determination project are most crucial for successful completion. *The ent algorithms. It is important to note that proprietary

definition of the milestone ‘Experiment’ has changed from ‘crystals’ to

CAI-based codon optimization algorithms adjust a subset

‘X-ray/NMR/EM studies performed for structural studies’. As a result,

of rare codons in a random manner. As a result, gene

there is a large discrepancy between crystals obtained (4% in 2004)

optimization is ambiguous, since the replacement of rare

and experiments done on the sample (17%), irrespective of the

outcome. codons is not based on empirical data. Species-specific

gene optimization routinely involves the use of the spe-

cies-relevant CAI index, taking into account the codon

and lysis activity is substantially reduced when the sec-

frequency of the particular species. Other factors that

ondary translation site is removed by codon optimization

affect for example eukaryotic protein expression are most

[14]. Toll-like receptors (TLRs) contain leucine-rich

often not considered, because the influence on expression

domains, and it was shown through targeted leucine

levels is poorly understood.

codon optimization that transcription of TLRs is regulat-

ed by leucine codons to balance the abundance of certain

Recent insights into the relationship between

receptors [15]. It is clear that gene synthesis may provide

mRNA, protein expression and protein folding

even unexpected benefits, but there are also numerous

It is obvious that codon-mediated translational control is

reports were it leads to reduced protein yield [16,17].

not yet well understood, and further insight will benefit

gene design for protein expression. Recent studies indicate

Adapting codon usage of target genes using that rare codons may play different roles, such as safe-

codon optimization algorithms guarding mRNA structure [29], depending on their posi-

Exploiting gene synthesis for transgene expression tioning within the gene. A large scale study on bacterial

enables researchers to influence the multitude of vari- genes expression in Escherichia coli under a common pro-

ables controlling protein yield. The biased usage of motor system revealed that mRNA stability correlates with

synonymous codons has a strong effect on protein expression levels [30 ]. However, there is a difference in

expression [18 ]. Species-specific differences in this the effect between the ‘head’ of the gene (covering ap-

so-called codon bias can result in tRNA depletion, proximately 16 codons) and the rest of the gene. In the

which slows down protein translation. This effect can ‘head’, the mRNA structure is more important, whereas the

be in part compensated by using an expression host rest of the gene is rather affected by codon-mediated

that expresses rare tRNAs [16], however this is often translation elongation efficiency. This phenomenon is

insufficient [19]. likely a universal property for both prokaryotic and eukary-

otic protein expression [31]. Cell-free expression studies

An extensive collection of tools (Table 1) have shown that universal translation initation tags can be

are available to analyze codon usage and identify unwant- introduced upstream of the GOI that boost expression both

Current Opinion in Structural Biology 2016, 38:155–162 www.sciencedirect.com

The use of synthetic genes for protein expression Parret, Besir and Meijers 157

Table 1

Various tools to evaluate and optimize DNA sequences

Application Name Web URL Reference

Codon usage database http://www.kazusa.or.jp/codon/ [68]

Codon bias database CBDB http://homepages.luc.edu/cputonti/cbdb/ [69]

Analysis of codon usage RaCC http://nihserver.mbi.ucla.edu/RACC/

CAIcaI http://genomes.urv.cat/CAIcal/ [70]

CAI calculator http://www.umbc.edu/codon/cai/cais.php [71]

CodonO http://sysbio.cvm.msstate.edu/CodonO/ [72]

codonW http://codonw.sourceforge.net/

GCUA http://gcua.schoedl.de/ [73]

INCA http://bioinfo.hr/research/inca/ [74]

http://www.codons.org/index.html [38] http://www.cmbl.uga.edu/software/codon_usage.html

http://www.bioinformatics.org/sms2/codon_usage.html [75]

a b

Gene design tools CodonOpt https://eu.idtdna.com/CodonOpt b

COOL http://cool.syncti.org/ [76] b

DNAWorks https://hpcwebapps.cit.nih.gov/dnaworks/ [77] c

D-Tailor https://sourceforge.net/projects/dtailor/ [78] c

EuGene http://bioinformatics.ua.pt/eugene/ [79] b

GeneDesign http://www.genedesign.org [80]

c

Gene Designer https://www.dna20.com/resources/genedesigner [81] GeneOptimizerb https://www.thermofisher.com/de/de/home/life- [82] science/cloning/gene-synthesis/geneart-gene- synthesis/geneoptimizer.html b

JCat http://www.jcat.de/ [83]

c

mRNA Optimizer http://bioinformatics.ua.pt/software/mRNA-optimiser [84] b

OPTIMIZER http://genomes.urv.es/OPTIMIZER/ [85]

OptimumGeneb http://www.genscript.com/cgi-bin/tools/rare_codon_analysis

Synthetic Gene http://userpages.umbc.edu/wug1/codon/sgd/ [86]

Designerb

Visual Gene http://visualgenedeveloper.net/index.html [87] Developerc

a

A very useful and comprehensive comparison of functionality and easiness of use of gene design tools for codon optimization is described by Gould

et al. [26]. b

Web-based tool. c

Stand-alone software.

in bacterial and eukaryotic systems [32,33]. To boost Bioinformatic analyses indicate that rare codon clusters

0

protein expression, it may be sufficient to modify the codon are found in a wide variety of genes, especially in the 5

0

usage at the ‘head’ of the gene in order to optimize the and 3 termini [38–40]. Such clusters are thought to

mRNA structure for efficient translation initiation, and slowdown the ribosome during translation elongation in

leave the codon sequence of the rest of the gene intact. order to allow co-translational protein folding [41–43].

Slow elongation is especially important in case of mem-

In prokaryotes, translation of a transcript is initiated before brane protein targeting [44 ,45,46]. As a consequence,

the transcription process is complete, due to the recruit- modifying the codons involved in the regulation of trans-

ment of ribosomes to the newly synthesized mRNA. In lation speed can culminate in protein misfolding. In one

genes which are not highly expressed, the translation particular case, the replacement of frequently used

efficiency is not necessarily correlated with protein syn- codons by rare codons at domain boundaries resulted

thesis rates, as ribosome profiling studies have shown [34– in an improved solubility of an enzyme produced in E.

36]. These studies point to another level of regulation, coli due to slower translation rate at these sites [47]. To

where codon usage affects protein folding and post-trans- circumvent deleterious effects of codon usage during co-

lational modifications. A bioinformatics analysis in E. coli translational protein folding, Angov et al. proposed a so-

revealed that putative translational attenuation sites are called codon harmonization strategy where rare codons

widespread, and were identified in about 60% of protein are positioned where domain boundaries were predicted

coding sequences [37]. In this study, Li et al. proposed that [48,49]. When compared to conventional codon optimi-

these internal Shine–Dalgarno sequences are the main zation, this strategy resulted in improved expression level

determinants of translation elongation rates in bacteria and soluble protein yield when overexpressing optimized

allowing for translational pausing where necessary. genes from the eukaryotic parasite Plasmodium falciparum

www.sciencedirect.com Current Opinion in Structural Biology 2016, 38:155–162

158 New constructs and expression of proteins

Table 2

Codon sequence features potentially influencing protein expression levels

Sequence feature Possible effect References

General features

G + C content [15,22,88,89]

Very high G + C content (>70%) mRNA secondary structure formation; can slow down or inhibit translation

Very low G + C content (<30%) Can slow down transcription elongation

mRNA secondary structure [31,53,56 ,90–92]

Global Can promote mRNA stability

Near RBS Reduction of translational initiation

Repetitive sequences Inhibition of translation through formation of mRNA stem–loops [16]

Rare codons

Global Can result in translational pausing, tRNA depletion resulting in low [27,39,93,94]

protein yield, mRNA degradation

At protein domain boundaries Promotes accurate co-translational folding [40–42,48]

Alternative start codons Shorter protein (if in frame); mixture of two proteins; [14]

incorrect protein if out of frame

Bacterial features

Chi-site stretches Recombination hotspots; DNA instability [95]

Alternative transcriptional initiation sites Leaderless transcripts; shorter or incorrect protein [96–98]

RNase E sites Can effect mRNA stability [31,99]

SD-like RBS sequences Can cause translational pausing (ribosome stalling) [37]

Eukaryotic features

High CpG content Can result in incorrect transcription initiation [88,100]

Internal TATA-boxes Can result in incorrect transcription initiation

Cryptic splice sites Missplicing of mRNA; incorrect protein [101]

Potential polyA sites Can induce premature termination of translation [102,103]

in E. coli. All these studies indicate that in gene synthesis, recombinant protein expression involves three intersect-

it may be important to conserve codon usage in some parts ing areas. First, the choice of expression host, second, the

of the gene, to maintain the regulatory elements that selection of the [50] and third, the GOI

affect folding and post-translational modifications. (Figure 2). Much research has been done to optimize

expression vectors adapted to the expression host. How-

The interface between the synthetic gene and ever, the adaptation of the GOI towards the expression

the expression vector vector is often neglected. Most bacterial vector systems

It is important to keep in mind that the mRNA produced are used as plug-and-play, where the GOI is inserted

to translate into protein is often not limited to the downstream of a vector-encoded ribosome binding site

0 0

synthetized gene, but contains fragments derived from (RBS). As part of the 5 untranslated region (5 UTR), the

the expression vector. In fact, the challenge to optimize RBS controls the rate of ribosome recruitment as well as

Figure 2

GOI vector insert

RNA pool

Protein start of transcript yield RBS translationalaffinity start tag protease GOIsite

AUG * 5’ UTR Host Vector 10 bp protein synthesis integration

Current Opinion in Structural Biology

Optimization triangle illustrating the key determinants in recombinant protein expression. The inset illustrates the anatomy of a typical bacterial

overexpression vector belonging to the popular pET-series in the region surrounding the insertion site of the GOI. The asterisk indicates a potential

scar resulting from the cloning strategy.

Current Opinion in Structural Biology 2016, 38:155–162 www.sciencedirect.com

The use of synthetic genes for protein expression Parret, Besir and Meijers 159

the efficiency of translation initiation [51]. It is generally improve recombinant protein expression. Hence, spe-

0

assumed that the mRNA sequence of the 5 UTR of an cies-specific gene optimization using proprietary algo-

expression vector is optimized in order to maximize the rithms can increase protein yield. Unfortunately,

translational efficiency of the GOI within a particular predicting the output of the different optimization algo-

host. However, the mRNA structure of the initial 40 rithms is still a major challenge, especially for membrane

coding nucleotides can be crucial to obtain high protein and secreted proteins, due to the many parameters influ-

expression levels, given that in general strong secondary encing codon usage. It would be a great advance if

structure (due to e.g. high GC content) inhibits efficient commonly used codon optimization algorithms were im-

translation initiation [22,24,52–55,56 ,57]. An additional proved in the future to incorporate the new insights

global observation is the relatively higher abundance of obtained in recent years with respect to codon usage.

rare codons in the initial 30–50 codons [58]. The corre- This will probably involve segmentation of the gene,

lation between the presence of rare codons and weak where each segment has different requirements for codon

0

mRNA structure in the 5 end of coding sequences is still optimization, including segments for translation initia-

under debate [55]. tion, domain boundaries and sites for post-translational modifications.

0

In typical bacterial vectors, the 5 end of the coding

sequence (excluding the GOI) often encodes secretion Optimized protein translation initiation could be

signals, affinity tags, fusion proteins or linker remnants addressed by vector design, rather than codon optimiza-

resulting from cloning, but the mRNA structure in this tion of the gene of interest. Indeed, synonymous ex-

region is rarely scrutinized (Figure 2). This may explain change of a small number of codons or introducing

0

why the expression of some proteins is strongly affected non-coding secondary structure elements at the 5 -end

by the placement of a tag on either the N-terminus or C- of a gene can already be sufficient for a significant increase

terminus or by the length and sequence of the linker in expression levels [65,66]. This can be combined with

region between tag and first codon of the transgene. N- gene fragment synthesis, consisting of linear DNA frag-

terminal tags may have evolved towards optimal mRNA ments that can be used for direct cloning into expression

structure, thus increasing expression of some proteins vectors or as template for PCR amplification. In this

with a peculiar codon sequence at the start of the gene. respect, the Biobricks movement (http://biobricks.org/)

For example, it has been shown that addition of a leader is a promising initiative to deliver direct access to tools to

sequence can overcome the notoriously low expression of make proteins from genetically optimized building blocks

AT-rich genes in E. coli [59]. On the basis of the recent [67].

evidence on the importance of codon usage at the trans-

lation initiation site, the choice of any expression vector The expectation is that broad studies on the relationship

should include a review of this region. If the protein between codon usage and tRNA abundance as well as

expression is suboptimal, it could be worthwhile to opti- mRNA structure and stability will lead to a set of robust

mize elements involved in translation initiation upstream rules for gene design. This will further enable efficient

of the GOI and include these in gene synthesis. protein expression that is affordable for the larger molec-

ular biology community.

In eukaryotes, translation initiation is an intricate stepwise

process orchestrated by multiple components. Extensive Conflict of interest statement

studies in Saccharomyces cerevisiae in particular revealed that Nothing declared.

key regulatory elements, i.e. translation initiation motifs

0

and secondary structures in the 5 -UTR, modulate protein Acknowledgements

abundance [58,60–62]. One particularly successful ap- The authors are very grateful to Xia Sheng (Genscript) and Michel Koch for

providing valuable input into the manuscript. We would like to apologize to

proach to manipulate protein yield in mammalian cells

those authors whose work could not be cited here due to space restrictions.

has been the incorporation of viral introns and optimized

Kozak sequences into the translation initiation sequence of

References and recommended reading

the corresponding expression vectors [63]. Translation Papers of particular interest, published within the period of review,

have been highlighted as:

termination can also be promoted by the introduction of

viral introns such as the post-transcriptional regulatory

of special interest

element derived from woodchuck hepatitis virus [64]. It of outstanding interest

is expected that further large-scale investigations into the

1. Vincentelli R, Cimino A, Geerlof A, Kubo A, Satou Y, Cambillau C:

regulation of eukaryotic protein translation will yield more

High-throughput protein expression screening and

elements that can boost protein expression. purification in Escherichia coli. Methods Struct Proteom 2011,

55:65-72.

Concluding remarks 2. Makowska-Grzyska M, Kim Y, Maltseva N, Li H, Zhou M,

Joachimiak G, Babnigg G, Joachimiak A: Protein production for

Economically attractive access to gene synthesis has

structural genomics using E. coli expression. Methods Mol Biol

enabled optimization of gene sequences in order to 2014, 1140:89-105.

www.sciencedirect.com Current Opinion in Structural Biology 2016, 38:155–162

160 New constructs and expression of proteins

3. Assenberg R, Wan PT, Geisse S, Mayr LM: Advances in 21. Sharp PM, Li WH: The codon Adaptation Index — a measure of

recombinant protein expression for use in pharmaceutical directional synonymous , and its potential

research. Curr Opin Struct Biol 2013, 23:393-402. applications. Nucleic Acids Res 1987, 15:1281-1295.

4. Kendrew JC, Dickerson RE, Strandberg BE, Hart RG, Davies DR, 22. Kudla G, Murray AW, Tollervey D, Plotkin JB: Coding-sequence

Phillips DC, Shore VC: Structure of myoglobin: a three- determinants of gene expression in Escherichia coli. Science

˚

dimensional Fourier synthesis at 2 A resolution. Nature 1960, 2009, 324:255-258.

185:422-427.

23. Welch M, Govindarajan S, Ness JE, Villalobos A, Gurney A,

5. Morth JP, Pedersen BP, Toustrup-Jensen MS, Sørensen TL-M, Minshull J, Gustafsson C: Design parameters to control

Petersen J, Andersen JP, Vilsen B, Nissen P: Crystal structure of synthetic gene expression in Escherichia coli. PLoS One 2009,

the sodium–potassium pump. Nature 2007, 450:1043-1049. 4:e7002.

24. Allert M, Cox JC, Hellinga HW: Multifactorial determinants of

6. Yusupova G, Yusupov M: Ribosome biochemistry in crystal

protein expression in prokaryotic open reading frames. J Mol

structure determination. RNA 2015, 21:771-773.

Biol 2010, 402:905-918.

7. Celie PHN, van Rossum-Fikkert SE, van Dijk WJ, Brejc K, Smit AB,

25. Menzella HG: Comparison of two codon optimization

Sixma TK: Nicotine and carbamylcholine binding to nicotinic

strategies to enhance recombinant protein production in

acetylcholine receptors as studied in AChBP crystal

Escherichia coli. Microb Cell Fact 2011, 10:15.

structures. Neuron 2004, 41:907-914.

26. Gould N, Hendy O, Papamichail D: Computational tools and

8. Locher KP, Lee AT, Rees DC: The E. coli BtuCD structure: a

algorithms for designing customized synthetic genes. Front

framework for ABC transporter architecture and mechanism.

Bioeng Biotechnol 2014, 2:41.

Science 2002, 296:1091-1098.

27. Gustafsson C, Minshull J, Govindarajan S, Ness J, Villalobos A,

9. Bostro¨ m P, Wu J, Jedrychowski MP, Korde A, Ye L, Lo JC,

Welch M: Engineering genes for predictable protein

Rasbach KA, Bostro¨ m EA, Choi JH, Long JZ et al.: A PGC1a-

expression. Protein Expr Purif 2012, 83:37-46.

dependent myokine that drives browning of white fat and

thermogenesis. Nature 2012, 481:463-468.

28. Chung B, Lee D-Y: Computational codon optimization of

synthetic gene for protein expression. BMC Syst Biol 2012,

10. Trempe J-F, Sauve´ V, Grenier K, Seirafi M, Tang MY, Me´ nade M, 6:134.

Al-Abdul-Wahid S, Krett J, Wong K, Kozlov G et al.: Structure of

parkin reveals mechanisms for ubiquitin ligase activation. 29. Presnyak V, Alhusaini N, Chen YH, Martin S, Morris N, Kline N,

Science 2013, 340:1451-1455. Olson S, Weinberg D, Baker KE, Graveley BR et al.: Codon

optimality is a major determinant of mRNA stability. Cell 2015,

11. Finci LI, Kru¨ ger N, Sun X, Zhang J, Chegkazi M, Wu Y, Schenk G, 160:1111-1124.

Mertens HDT, Svergun DI, Zhang Y et al.: The crystal structure of

netrin-1 in complex with DCC reveals the bifunctionality of 30. Boe¨ l G, Letso R, Neely H, Price WN, Wong K-H, Su M, Luff JD,

netrin-1 as a guidance cue. Neuron 2014, 83:839-849. Valecha M, Everett JK, Acton TB et al.: Codon influence on

protein expression in E. coli correlates with mRNA levels.

12. Zhou M, Guo J, Cha J, Chae M, Chen S, Barral JM, Sachs MS, Nature 2016, 529:358-363.

Liu Y: Non-optimal codon usage affects expression, structure Large-scale protein expression study combined with biochemical ana-

and function of clock protein FRQ. Nature 2013, 495:111-115. lysis of optimized synthetic genes. This extensive analysis in E. coli

This paper describes how modification of codon usage in a circadian strongly suggests that codon content directly modulates two competing

protein revealed how the use of specific codons regulates phosphoryla- processes, mRNA stability and protein elongation.

tion and structure stability.

31. Del Campo C, Bartholoma¨ us A, Fedyunin I, Ignatova Z: Secondary

13. Shah K, Cheng Y, Hahn B, Bridges R, Bradbury NA, Mueller DM: structure across the bacterial transcriptome reveals versatile

Synonymous codon usage affects the expression of wild type roles in mRNA regulation and function. PLoS Genet 2015,

and F508del CFTR. J Mol Biol 2015, 427:1464-1479. 11:e1005613.

14. Dunne M, Leicht S, Krichel B, Mertens HDT, Thompson A, 32. Haberstock S, Roos C, Hoevels Y, Do¨ tsch V, Schnapp G,

Krijgsveld J, Svergun DI, Go´ mez-Torres N, Garde S, Uetrecht C Pautsch A, Bernhard F: A systematic approach to increase the

et al.: Crystal structure of the CTP1L endolysin reveals how its efficiency of membrane protein production in cell-free

activity is regulated by a secondary translation product. J Biol expression systems. Protein Expr Purif 2012, 82:308-316.

Chem 2016, 291:4882-4893.

33. Mureev S, Kovtun O, Nguyen UTT, Alexandrov K: Species-

15. Newman ZR, Young JM, Ingolia NT, Barton GM: Differences in independent translational leaders facilitate cell-free

codon bias and GC content contribute to the balanced expression. Nat Biotechnol 2009, 27:747-752.

expression of TLR7 and TLR9. Proc Natl Acad Sci U S A 2016,

113:E1362-E1371. 34. Brar GA, Weissman JS: Ribosome profiling reveals the what,

when, where and how of protein synthesis. Nat Rev Mol Cell Biol

16. Gustafsson C, Govindarajan S, Minshull J: Codon bias and 2015, 16:651-664.

heterologous protein expression. Trends Biotechnol 2004,

22:346-353. 35. Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS:

Genome-wide analysis in vivo of translation with nucleotide

resolution using ribosome profiling. Science 2009,

17. Wu G, Zheng Y, Qureshi I, Zin HT, Beck T, Bulka B, Freeland SJ:

324:218-223.

SGDB: a database of synthetic genes re-designed for

optimizing protein over-expression. Nucleic Acids Res 2007,

36. Ingolia NT: Ribosome profiling: new views of translation, from

35:D76-D79.

single codons to genome scale. Nat Rev Genet 2014,

15:205-213.

18. Quax TE, Claassens NJ, Soll D, van der Oost J: Codon bias as a

means to fine-tune gene expression. Mol Cell 2015, 59:149-161.

37. Li G-W, Oh E, Weissman JS: The anti-Shine–Dalgarno sequence

Excellent review illustrating the different types of codon bias and its

drives translational pausing and codon choice in bacteria.

underlying principles. The authors describe the potential of codon bias to

Nature 2012, 484:538-541.

modulate gene expression and explore the challenges associated with

exploiting codon bias as a tool for synthetic biology. 38. Clarke TF, Clark PL: Rare codons cluster. PLoS One 2008,

3:e3412.

19. Rosano GL, Ceccarelli EA: Rare codon content affects the

solubility of recombinant proteins in a codon bias-adjusted 39. Clarke TF, Clark PL: Increased incidence of rare codon clusters

0 0

Escherichia coli strain. Microb Cell Fact 2009, 8:41. at 5 and 3 gene termini: implications for function. BMC

Genomics 2010, 11:118.

20. Welch M, Villalobos A, Gustafsson C, Minshull J: Designing genes

for successful protein expression. Methods Enzym 2011, 40. Chartier M, Gaudreault F, Najmanovich R: Large-scale analysis

498:43-66. of conserved rare codon clusters suggests an involvement in

Current Opinion in Structural Biology 2016, 38:155–162 www.sciencedirect.com

The use of synthetic genes for protein expression Parret, Besir and Meijers 161

co-translational molecular recognition events. Bioinforma Oxf archaeon Pyrococcus furiosus. FEMS Microbiol Lett 2002,

Engl 2012, 28:1438-1445. 216:179-183.

41. Komar AA: A pause for thought along the co-translational 60. Dvir S, Velten L, Sharon E, Zeevi D, Carey LB, Weinberger A,

0

folding pathway. Trends Biochem Sci 2009, 34:16-24. Segal E: Deciphering the rules by which 5 -UTR sequences

affect protein expression in yeast. Proc Natl Acad Sci U S A

42. Zhang G, Hubalewska M, Ignatova Z: Transient ribosomal

2013, 110:E2792-E2801.

attenuation coordinates protein synthesis and co-

translational folding. Nat Struct Mol Biol 2009, 16:274-280. 61. Crook NC, Schmitz AC, Alper HS: Optimization of a yeast RNA

interference system for controlling gene expression and

43. Pechmann S, Frydman J: Evolutionary conservation of codon

enabling rapid metabolic engineering. ACS Synth Biol 2014,

optimality reveals hidden signatures of cotranslational 3:307-313.

folding. Nat Struct Mol Biol 2013, 20:237-243.

62. Redden H, Morse N, Alper HS: The synthetic biology toolbox for

44. Fluman N, Navon S, Bibi E, Pilpel Y: mRNA-programmed

tuning gene expression in yeast. FEMS Yeast Res 2014 http://

translation pauses in the targeting of E. coli membrane

dx.doi.org/10.1111/1567-1364.12188.

proteins. eLife 2014:3.

Relevant study underlying the importance of slow protein elongation for

63. Backliwal G, Hildinger M, Chenuet S, Wulhfard S, De Jesus M,

membrane protein targeting in E. coli.

Wurm FM: Rational vector design and multi-pathway

modulation of HEK 293E cells yield recombinant antibody

45. Pechmann S, Chartron JW, Frydman J: Local slowdown of

titers exceeding 1 g/l by transient transfection under serum-

translation by nonoptimal codons promotes nascent-chain

free conditions. Nucleic Acids Res 2008, 36:e96.

recognition by SRP in vivo. Nat Struct Mol Biol 2014,

21:1100-1105.

64. Zufferey R, Donello JE, Trono D, Hope TJ: Woodchuck hepatitis

virus posttranscriptional regulatory element enhances

46. Nørholm MHH, Light S, Virkki MTI, Elofsson A, von Heijne G,

expression of transgenes delivered by retroviral vectors. J

Daley DO: Manipulating the genetic code for membrane

Virol 1999, 73:2886-2892.

protein production: what have we learnt so far? Biochim

Biophys Acta BBA Biomembr 2012, 1818:1091-1096.

65. Karimi Z, Nezafat N, Negahdaripour M, Berenjian A, Hemmati S,

47. Hess A-K, Saffert P, Liebeton K, Ignatova Z: Optimization of Ghasemi Y: The effect of rare codons following the ATG start

translation profiles enhances protein expression and codon on expression of human granulocyte-colony

solubility. PLOS ONE 2015, 10:e0127039. stimulating factor in Escherichia coli. Protein Expr Purif 2015,

114:108-114.

48. Angov E, Hillier CJ, Kincaid RL, Lyon JA: Heterologous protein

expression is enhanced by harmonizing the codon usage 66. Cheong D-E, Ko K-C, Han Y, Jeon H-G, Sung BH, Kim G-J,

frequencies of the target gene with those of the expression Choi JH, Song JJ: Enhancing functional expression of

host. PLoS One 2008, 3:e2189. heterologous proteins through random substitution of genetic

0

codes in the 5 coding region. Biotechnol Bioeng 2015,

49. Angov E, Legler PM, Mease RM: Adjustment of codon usage 112:822-826.

frequencies by codon harmonization improves protein

expression and folding. In Heterologous Gene Expression in E. 67. Sleight SC, Sauro HM: Randomized BioBrick assembly: a novel

coli. Edited by Evans , Xu M-QQ.. Humana Press; 2011:1-13. DNA assembly method for randomizing and optimizing

genetic circuits and metabolic pathways. ACS Synth Biol 2013,

50. Sørensen HP, Mortensen KK: Advanced genetic strategies for 2:506-518.

recombinant protein expression in Escherichia coli.

J Biotechnol 2005, 115:113-128. 68. Nakamura Y, Gojobori T, Ikemura T: Codon usage tabulated

from international DNA sequence databases: status for the

51. Salis HM, Mirsky EA, Voigt CA: Automated design of synthetic

year 2000. Nucleic Acids Res 2000, 28:292.

ribosome binding sites to control protein expression. Nat

Biotechnol 2009, 27:946-950.

69. Hilterbrand A, Saelens J, Putonti C: CBDB: the codon bias

database. BMC Bioinformatics 2012, 13:62.

52. Araujo PR, Yoon K, Ko D, Smith AD, Qiao M, Suresh U, Burns SC,

Penalva LOF: Before it gets started: regulating translation at

` CAIcal: a combined set of

0 70. Puigbo P, Bravo IG, Garcia-Vallve S:

the 5 UTR. Comp Funct Genomics 2012, 2012:1-8.

tools to assess codon usage adaptation. Biol Direct 2008, 3:38.

53. Kozak M: Regulation of translation via mRNA structure in

71. Wu G, Culley DE, Zhang W: Predicted highly expressed genes in

prokaryotes and eukaryotes. Gene 2005, 361:13-37.

the genomes of Streptomyces coelicolor and Streptomyces

avermitilis and the implications for their metabolism. Microbiol

54. Gu W, Zhou T, Wilke CO: A universal trend of reduced mRNA

Read Engl 2005, 151:2175-2187.

stability near the translation-initiation site in prokaryotes and

eukaryotes. PLoS Comput Biol 2010, 6:e1000664.

72. Angellotti MC, Bhuiyan SB, Chen G, Wan X-F: CodonO: codon

0 usage bias analysis within and across genomes. Nucleic Acids

55. Tuller T, Zur H: Multiple roles of the coding sequence 5 end in

Res 2007, 35:W132-W136.

gene expression regulation. Nucleic Acids Res 2015, 43:13-28.

56. Goodman DB, Church GM, Kosuri S: Causes and effects of 73. Fuhrmann M, Hausherr A, Ferbitz L, Scho¨ dl T, Heitzer M,

N-terminal codon bias in bacterial genes. Science 2013, Hegemann P: Monitoring dynamic expression of nuclear genes

342:475-479. in Chlamydomonas reinhardtii by using a synthetic luciferase

This paper reports expression data of an extensive library of synthetic reporter gene. Plant Mol Biol 2004, 55:869-881.

reporter constructs consisting of 14,000 combinations of promoters,

74. Supek F, Vlahovicek K: INCA: synonymous codon usage

ribosome binding sites and 11 N-terminal codons fused to GFP. The

analysis and clustering by means of self-organizing map.

magnitude of this study allows for detailed dissection of how individual N-

Bioinforma Oxf Engl 2004, 20:2329-2330.

terminal codons effect expression yield of GFP fusions. The authors

report an up to 14-fold increase in expression when using N-terminal

75. Stothard P: The sequence manipulation suite: JavaScript

rare codons.

programs for analyzing and formatting protein and DNA

57. Shah P, Ding Y, Niemczyk M, Kudla G, Plotkin JB: Rate-limiting sequences. BioTechniques 2000, 28:1104.

steps in yeast protein translation. Cell 2013, 153:1589-1601.

76. Chin JX, Chung BK-S, Lee D-Y: Codon Optimization OnLine

58. Tuller T, Carmi A, Vestsigian K, Navon S, Dorfan Y, Zaborske J, (COOL): a web-based multi-objective optimization platform

Pan T, Dahan O, Furman I, Pilpel Y: An evolutionarily conserved for synthetic gene design. Bioinforma Oxf Engl 2014,

mechanism for controlling the efficiency of protein 30:2210-2212.

translation. Cell 2010, 141:344-354.

77. Hoover DM: DNAWorks: an automated method for designing

59. Ishida M, Oshima T, Yutani K: Overexpression in Escherichia coli oligonucleotides for PCR-based gene synthesis. Nucleic Acids

of the AT-rich trpA and trpB genes from the hyperthermophilic Res 2002, 30:43e.

www.sciencedirect.com Current Opinion in Structural Biology 2016, 38:155–162

162 New constructs and expression of proteins

78. Guimaraes JC, Rocha M, Arkin AP, Cambray G: D-Tailor: 92. Pop C, Rouskin S, Ingolia NT, Han L, Phizicky EM, Weissman JS,

automated analysis and design of DNA sequences. Bioinforma. Koller D: Causal signals between codon bias, mRNA structure,

Oxf. Engl. 2014 http://dx.doi.org/10.1093/bioinformatics/btt742. and the efficiency of translation and elongation. Mol Syst Biol

2014, 10:770.

79. Gaspar P, Oliveira JL, Frommlet J, Santos MAS, Moura G:

EuGene: maximizing synthetic gene design for heterologous 93. Kane JF: Effects of rare codon clusters on high-level

expression. Bioinforma Oxf Engl 2012, 28:2683-2684. expression of heterologous proteins in Escherichia coli. Curr

Opin Biotechnol 1995, 6:494-500.

80. Richardson SM, Wheelan SJ, Yarrington RM, Boeke JD:

GeneDesign: rapid, automated design of multikilobase 94. Drummond DA, Wilke CO: Mistranslation-induced protein

synthetic genes. Genome Res 2006, 16:550-556. misfolding as a dominant constraint on coding-sequence

evolution. Cell 2008, 134:341-352.

81. Villalobos A, Ness JE, Gustafsson C, Minshull J, Govindarajan S:

Gene Designer: a synthetic biology tool for constructing 95. Taylor AF, Amundsen SK, Smith GR: Unexpected DNA context-

artificial DNA segments. BMC Bioinformatics 2006, 7:285. dependence identifies a new determinant of Chi

recombination hotspots. Nucleic Acids Res 2016 http://

82. Fath S, Bauer AP, Liss M, Spriestersbach A, Maertens B, Hahn P, dx.doi.org/10.1093/nar/gkw541.

Ludwig C, Scha¨ fer F, Graf M, Wagner R: Multiparameter RNA

and codon optimization: a standardized tool to assess and

96. Chenge J, Kavanagh ME, Driscoll MD, McLean KJ, Young DB,

enhance autologous mammalian gene expression. PLoS ONE

Cortes T, Matak-Vinkovic D, Levy CW, Rigby SEJ, Leys D et al.:

2011, 6:e17596.

Structural characterization of CYP144A1 — a cytochrome

P450 enzyme expressed from alternative transcripts in

83. Grote A, Hiller K, Scheer M, Munch R, Nortemann B, Hempel DC,

Mycobacterium tuberculosis. Sci Rep 2016, 6:26628.

Jahn D: JCat: a novel tool to adapt codon usage of a target

gene to its potential expression host. Nucleic Acids Res 2005,

97. Shell SS, Wang J, Lapierre P, Mir M, Chase MR, Pyle MM,

33:W526-W531.

Gawande R, Ahmad R, Sarracino DA, Ioerger TR et al.: Leaderless

transcripts and small proteins are common features of the

84. Gaspar P, Moura G, Santos MAS, Oliveira JL: mRNA secondary

mycobacterial translational landscape. PLoS Genet 2015,

structure optimization using a correlated stem–loop

11:e1005641.

prediction. Nucleic Acids Res 2013, 41:e73.

98. Srivastava A, Gogoi P, Deka B, Goswami S, Kanaujia SP: In silico

85. Puigbo P, Guzman E, Romeu A, Garcia-Vallve S: OPTIMIZER: a 0

analysis of 5 -UTRs highlights the prevalence of Shine–

web server for optimizing the codon usage of DNA sequences.

Dalgarno and leaderless-dependent mechanisms of

Nucleic Acids Res 2007, 35:W126-W131.

translation initiation in bacteria and archaea, respectively. J

86. Wu G, Bashir-Bello N, Freeland SJ: The Synthetic Gene Theor Biol 2016, 402:54-61.

Designer: a flexible web platform to explore sequence

99. Mackie GA: RNase E: at the interface of bacterial RNA

manipulation for heterologous expression. Protein Expr Purif

processing and decay. Nat Rev Microbiol 2012, 11:45-57.

2006, 47:441-445.

100. Jonkers I, Kwak H, Lis JT: Genome-wide dynamics of Pol II

87. Jung S-K, McDonald K: Visual gene developer: a fully

elongation and its interplay with promoter proximal pausing,

programmable bioinformatics software for synthetic gene

chromatin, and exons. eLife 2014, 3:e02407.

optimization. BMC Bioinformatics 2011, 12:340.

88. Bauer AP, Leikam D, Krinner S, Notka F, Ludwig C, Langst G, 101. Bukovac SW, Bagshaw RD, Rigat BA, Callahan JW, Clarke JTR,

Wagner R: The impact of intragenic CpG content on gene Mahuran DJ: Cryptic splice site in the complementary DNA of

expression. Nucleic Acids Res 2010, 38:3891-3908. glucocerebrosidase causes inefficient expression. Anal

Biochem 2008, 381:276-278.

89. Kudla G, Lipinski L, Caffin F, Helwak A, Zylicz M: High guanine

and cytosine content increases mRNA levels in mammalian 102. Proudfoot NJ: Ending the message: poly(A) signals then and

cells. PLoS Biol 2006, 4:e180. now. Genes Dev 2011, 25:1770-1782.

90. Kertesz M, Wan Y, Mazor E, Rinn JL, Nutter RC, Chang HY, 103. Pfarr DS, Rieser LA, Woychik RP, Rottman FM, Rosenberg M,

Segal E: Genome-wide measurement of RNA secondary Reff ME: Differential effects of polyadenylation regions on

structure in yeast. Nature 2010, 467:103-107. gene expression in mammalian cells. DNA Mary Ann Liebert Inc

1986, 5:115-122.

91. Bentele K, Saffert P, Rauscher R, Ignatova Z, Blu¨ thgen N: Efficient

translation initiation dictates codon usage at gene start. Mol 104. Chayen NE: Turning protein crystallisation from an art into a

Syst Biol 2013, 9:675. science. Curr Opin Struct Biol 2004, 14:577-583.

Current Opinion in Structural Biology 2016, 38:155–162 www.sciencedirect.com