Published online 29 April 2014
Nucleic Acids Research, 2014, Vol. 42, Web Server issue W395–W400
doi: 10.1093/nar/gku336

GeneGenie: optimized oligomer design for directed evolution

Neil Swainston1,2,*, Andrew Currin1,3, Philip J. Day1,4 and Douglas B. Kell1,3

1Manchester Institute of Biotechnology, The University of Manchester, Manchester M1 7DN, UK, 2School of Computer Science, The University of Manchester, Manchester M13 9PL, UK, 3School of Chemistry, The University of Manchester, Manchester M13 9PL, UK and 4Faculty of Medical and Human Sciences, The University of Manchester, Manchester M13 9PT, UK

Received January 30, 2014; Revised April 2, 2014; Accepted April 9, 2014

*To whom correspondence should be addressed. Tel: +44 161 306 5146; Fax: +44 (0)161 306 5201.

© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

GeneGenie, a new online tool available at http://www.gene-genie.org, is introduced to support the design and self-assembly of synthetic genes and constructs. GeneGenie allows for the design of oligonucleotide cohorts encoding the gene sequence optimized for expression in any suitable host through an intuitive, easy-to-use web interface. The tool ensures consistent oligomer overlapping melting temperatures, minimizes the likelihood of misannealing, optimizes codon usage for expression in a selected host, allows for specification of forward and reverse cloning sequences (for downstream ligation) and also provides support for mutagenesis or directed evolution studies. Directed evolution studies are enabled through the construction of variant libraries via the optional specification of ‘variant codons’, containing mixtures of bases, at any position. For example, specifying the variant codon TNT (where N is any nucleotide) will generate an equimolar mixture of the codons TAT, TCT, TGT and TTT at that position, encoding a mixture of the amino acids Tyr, Ser, Cys and Phe. This facility is demonstrated through the use of GeneGenie to develop and synthesize a library of enhanced green fluorescent protein variants.

INTRODUCTION

The de novo synthesis of genes is becoming increasingly established in synthetic biology and biotechnology as a means of controlling the specific assembly of amino acids producing active proteins. Current approaches involve the synthesis (or purchase) of a number of short oligonucleotides (typically ∼60 bases in length), which can be assembled to form genes and expressed in a host system of interest.

Recent review papers (1,2) discuss existing software for gene optimization, including Gene Designer (3), GeneDesign (4) and DNAWorks (5). Each of these tools has its advantages: Gene Designer, for example, provides a comprehensive application for designing larger synthetic systems whilst GeneDesign has recently been updated to allow for the construction of entire chromosomes (6). However, none of these packages supports the generation of variant libraries to enable directed evolution studies.

Consequently, GeneGenie, a new online tool available at http://www.gene-genie.org, is introduced to support the design of variant libraries of synthetic genes and constructs. GeneGenie shares many features of existing optimization software, allowing for the design of oligonucleotides encoding the gene sequence responsible for the desired protein sequence and optimized for expression in any suitable host through an intuitive, easy-to-use web interface. The tool ensures consistent oligomer overlap melting temperatures, minimizes the likelihood of misannealing and optimizes codon usage for expression in a selected host.

Output oligomers can be assembled using polymerase chain reaction (PCR)-based methods (7) and are fully compatible with our own optimized gene synthesis protocol developed alongside GeneGenie (A. Currin et al., manuscript in preparation). These methods provide highly efficient assembly, permitting expression and functional analysis of genes up to 2 kb in length before sequence verification. This represents a significant improvement over currently established direct gene synthesis methods. Using this integrated wet- and dry-lab approach, the successful synthesis and direct assay of enhanced green fluorescent protein (EGFP) (8) is demonstrated.

Novelties of GeneGenie include the specification of forward and reverse cloning sequences, facilitating the ligation of the designed gene into a vector and its subsequent expression, and the optional specification of ‘variant codons’ at given positions. These variant codons can include both ‘pure’ (A, C, G and T) and mixed bases. Specification of codons including mixed bases allows for variant sequences to be constructed, supporting mutagenesis studies through the generation of variant libraries. For example, specifying the variant codon TNT (where N is any nucleotide) will generate an equimolar mixture of the codons TAT, TCT, TGT and TTT at that position, encoding a mixture of the amino acids Tyr, Ser, Cys and Phe.

The web server is driven by a simple web interface and runs an efficient simulated annealing algorithm to optimize oligomer design from any supplied protein sequence. The web interface links to both the UniProt protein sequence database (9) and the Codon Usage Database (10), is fully documented with help files and requires no user setup.

MATERIALS AND METHODS

System architecture

GeneGenie is a two-tiered web application, developed with the Google Web Toolkit (GWT) and written in Java 7, CSS and HTML. The web interface is accessible through a web browser that supports GWT (Firefox, Internet Explorer 6 and above, Safari 5 and above, Chromium and Google Chrome and Opera latest version) and provides the facility for submitting jobs and viewing results. The web server provides an implementation of a novel simulated annealing algorithm for optimizing gene design. Source code is freely available at http://svn.code.sf.net/p/mcisb/code/mcisb-mercedes/.

Algorithm description

A novel simulated annealing algorithm (11,12) was developed to optimize gene design. This is described in depth below.

Initialization. The job is initialized through the following steps.

(i). Back translation of the ‘protein sequence’, using codons selected randomly following a Monte Carlo approach according to their frequency in the codon usage table for the selected host organism. The sequence is checked to ensure that it adheres to the specified ‘maximum number of repeating nucleotides’. If the sequence contains more than the specified ‘maximum number of repeating nucleotides’, this process is repeated up to 1000 times until an acceptable initial deoxyribonucleic acid (DNA) sequence is generated.

(ii). If ‘variant codons’ have been selected, these are substituted into the initial DNA sequence, and the replaced codon, encoding the original amino acid at that position, is retained.

(iii). Oligomers are generated, each with a length of the supplied ‘maximum oligo length’ minus a fixed value (currently 5 bp), which provides scope for oligomer lengths to subsequently both be increased and decreased during the optimization [...]

[...] codon that encoded the original amino acid in that position in step (i).

(v). The initial solution is scored, according to ‘Scoring’, below, and the initial score for each objective [initial Codon Adaptation Index (CAI) score (CAI_init), initial overlap melting temperature score (Tm_init), initial misanneal score (mis_init) and initial fixed codon viability score (fixed_init)] is retained.

Scoring. A solution is scored according to the following criteria. Three objectives are scored for each job (CAI, overlap melting temperatures and misanneals), and a fourth, fixed codon viability, is considered if ‘variant codons’ have been selected.

CAI score, CAI_s, is simply defined as 1 − CAI. (See below for the definition of CAI). Overlap melting temperature score, Tm_s, is calculated as the coefficient of variation of the overlap melting temperatures, Tm_i, from the target melting temperature, Tm_t. Melting temperatures are calculated as described below:

\[ \mathrm{Tm}_s = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{Tm}_i - \mathrm{Tm}_t\right)^{2}}}{\mathrm{Tm}_t} \]

The misanneals score, mis_s, is calculated as 1 − Zscore of the melting temperatures of the set of positive annealing sequences (that is, those of the oligo overlaps) and the melting temperatures of the set of negative, misannealing sequences:

\[ \mathrm{mis}_s = \frac{3\left(\sigma_p + \sigma_n\right)}{\mu_p - \mu_n} \]

The positive set is simply the calculated melting temperatures of the overlapping sequences, as described previously. The negative set is generated by calculating melting temperatures between all segments of the gene sequence in both the forward and reverse directions and retaining those within 25 °C of the target melting temperature, Tm_t.

The fixed codon viability score, fixed_s, is simply a count of the number of unviable variant codons, that is, requested variant codons that fall in overlapping regions of the sequence.

The overall score, score_s, is the mean of the score of each objective scaled by its corresponding initial score:

\[ \mathrm{score}_s = \frac{1}{4}\left(\frac{\mathrm{CAI}_s}{\mathrm{CAI}_{init}} + \frac{\mathrm{Tm}_s}{\mathrm{Tm}_{init}} + \frac{\mathrm{mis}_s}{\mathrm{mis}_{init}} + \frac{\mathrm{fixed}_s}{\mathrm{fixed}_{init}}\right) \]
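
The scoring criteria given under ‘Scoring’ can be written down directly. The following Python sketch is illustrative only and is not taken from the GeneGenie source code: the function names are hypothetical, the population standard deviation is assumed for σ_p and σ_n (the text does not say which estimator is used), and the overall score is taken as the mean over whichever objectives are supplied, so that three or four objectives can be passed.

```python
import math
from statistics import mean, pstdev


def cai_score(cai):
    """CAI_s = 1 - CAI."""
    return 1.0 - cai


def tm_score(overlap_tms, target_tm):
    """Tm_s: root-mean-square deviation of the overlap melting
    temperatures from the target, divided by the target."""
    n = len(overlap_tms)
    rms = math.sqrt(sum((tm - target_tm) ** 2 for tm in overlap_tms) / n)
    return rms / target_tm


def misanneal_score(positive_tms, negative_tms):
    """mis_s = 3 * (sigma_p + sigma_n) / (mu_p - mu_n)."""
    # Assumption: population standard deviation (pstdev).
    spread = pstdev(positive_tms) + pstdev(negative_tms)
    return 3.0 * spread / (mean(positive_tms) - mean(negative_tms))


def overall_score(current, initial):
    """Mean of each objective's score divided by its initial score.

    Both arguments are dicts keyed by objective name. A zero initial
    score raises ZeroDivisionError; the text does not describe how
    that case is handled.
    """
    ratios = [current[name] / initial[name] for name in current]
    return sum(ratios) / len(ratios)
```

For example, overlaps melting at 58, 60 and 62 °C against a 60 °C target give Tm_s = sqrt(8/3)/60, and a solution whose four objective scores have each halved relative to their initial values has an overall score of 0.5.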
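
The TNT example from the Introduction can be reproduced by expanding each mixed base into the pure bases it stands for. This is a minimal sketch, assuming the standard IUPAC ambiguity codes and the standard genetic code; the helper name `expand_variant_codon` is hypothetical and does not come from GeneGenie.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, one-letter amino acids, codons in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_variant_codon(codon):
    """Return every pure-base codon encoded by a codon with mixed bases."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in codon))]


codons = expand_variant_codon("TNT")
amino_acids = [CODON_TABLE[c] for c in codons]
print(codons)       # ['TAT', 'TCT', 'TGT', 'TTT']
print(amino_acids)  # ['Y', 'S', 'C', 'F']
```

The output matches the mixture stated in the text: Tyr (Y), Ser (S), Cys (C) and Phe (F).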
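
Initialization step (i) can be illustrated with a short sketch of weighted random back translation. The code below is an assumption-laden illustration rather than the GeneGenie implementation: the function names are hypothetical, the codon frequencies in `TOY_USAGE` are invented for the example and belong to no real organism, and raising an error after 1000 failed attempts is a guess, since the text does not say what happens in that case.

```python
import random

# Hypothetical codon usage table: amino acid -> {codon: relative frequency}.
TOY_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "F": {"TTT": 0.6, "TTC": 0.4},
}


def longest_run(seq):
    """Length of the longest run of a single repeated nucleotide."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def back_translate(protein, usage, max_repeats, rng, max_attempts=1000):
    """Sample one codon per residue in proportion to its usage frequency,
    retrying until no nucleotide repeats more than max_repeats times."""
    for _ in range(max_attempts):
        codons = []
        for aa in protein:
            choices, weights = zip(*usage[aa].items())
            codons.append(rng.choices(choices, weights=weights)[0])
        dna = "".join(codons)
        if longest_run(dna) <= max_repeats:
            return dna
    # Assumption: fail loudly; the source does not describe this case.
    raise ValueError("no acceptable sequence found")


rng = random.Random(0)  # seeded so that runs are repeatable
print(back_translate("MKF", TOY_USAGE, max_repeats=3, rng=rng))
```

Passing the random number generator in explicitly keeps the sampling reproducible, which is convenient when comparing optimization runs.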