(12) Patent Application Publication (10) Pub. No.: US 2006/0068395 A1, Wood et al.
(19) United States
(12) Patent Application Publication, Wood et al.
(10) Pub. No.: US 2006/0068395 A1
(43) Pub. Date: Mar. 30, 2006

(54) SYNTHETIC NUCLEIC ACID MOLECULE COMPOSITIONS AND METHODS OF PREPARATION

(76) Inventors: Keith V. Wood, Mt. Horeb, WI (US); Monika G. Wood, Mt. Horeb, WI (US); Brian Almond, Fitchburg, WI (US); Aileen Paguio, Madison, WI (US); Frank Fan, Madison, WI (US)

Correspondence Address: SCHWEGMAN, LUNDBERG, WOESSNER & KLUTH, 1600 TCF TOWER, 121 SOUTH EIGHT STREET, MINNEAPOLIS, MN 55402 (US)

(21) Appl. No.: 10/943,508
(22) Filed: Sep. 17, 2004

Publication Classification
(51) Int. Cl.: C12Q 1/68 (2006.01); C07H 21/04 (2006.01)
(52) U.S. Cl.: 435/6; 435/320.1; 536/23.1

(57) ABSTRACT
A method to prepare synthetic nucleic acid molecules having reduced inappropriate or unintended transcriptional characteristics when expressed in a particular host cell.

Figure 1 (Sheet 1 of 2)

Amino Acid   Codon
Phe          UUU, UUC
Ser          UCU, UCC, UCA, UCG, AGU, AGC
Tyr          UAU, UAC
Cys          UGU, UGC
Leu          UUA, UUG, CUU, CUC, CUA, CUG
Trp          UGG
Pro          CCU, CCC, CCA, CCG
His          CAU, CAC
Arg          CGU, CGC, CGA, CGG, AGA, AGG
Gln          CAA, CAG
Ile          AUU, AUC, AUA
Thr          ACU, ACC, ACA, ACG
Asn          AAU, AAC
Lys          AAA, AAG
Met          AUG
Val          GUU, GUC, GUA, GUG
Ala          GCU, GCC, GCA, GCG
Asp          GAU, GAC
Gly          GGU, GGC, GGA, GGG
Glu          GAA, GAG

Sheet 2 of 2: drawing not reproduced. Legible labels: pGL4B-4NN3 sequence, SpeI-NcoI-Ver2, MCS-4 sequence, bla-5.

SYNTHETIC NUCLEIC ACID MOLECULE COMPOSITIONS AND METHODS OF PREPARATION

BACKGROUND

[0001] Transcription, the synthesis of an RNA molecule from a sequence of DNA is the first step in gene expression. Sequences which regulate DNA transcription include promoter sequences, polyadenylation signals, transcription factor binding sites and enhancer elements. A promoter is a DNA sequence capable of specific initiation of transcription and consists of three general regions. The core promoter is the sequence where the RNA polymerase and its cofactors bind to the DNA. Immediately upstream of the core promoter is the proximal promoter which contains several transcription factor binding sites that are responsible for the assembly of an activation complex that in turn recruits the polymerase complex. The distal promoter, located further upstream of the proximal promoter also contains transcription factor binding sites. Transcription termination and polyadenylation, like transcription initiation, are site specific and encoded by defined sequences. Enhancers are regulatory regions, containing multiple transcription factor binding sites, that can significantly increase the level of transcription from a responsive promoter regardless of the enhancer's orientation and distance with respect to the promoter as long as the enhancer and promoter are located within the same DNA molecule. The amount of transcript produced from a gene may also be regulated by a post-transcriptional mechanism, the most important being RNA splicing that removes intervening sequences (introns) from a primary transcript between splice donor and splice acceptor sequences.

[0002] Natural selection is the hypothesis that genotype-environment interactions occurring at the phenotypic level lead to differential reproductive success of individuals and therefore to modification of the gene pool of a population. Some properties of nucleic acid molecules that are acted upon by natural selection include codon usage frequency, RNA secondary structure, the efficiency of intron splicing, and interactions with transcription factors or other nucleic acid binding proteins. Because of the degenerate nature of the genetic code, these properties can be optimized by natural selection without altering the corresponding amino acid sequence.

[0003] Under some conditions, it is useful to synthetically alter the natural nucleotide sequence encoding a polypeptide to better adapt the polypeptide for alternative applications. A common example is to alter the codon usage frequency of a gene when it is expressed in a foreign host cell. Although redundancy in the genetic code allows amino acids to be encoded by multiple codons, different organisms favor some codons over others. It has been found that the efficiency of protein translation in a non-native host cell can be substantially increased by adjusting the codon usage frequency but maintaining the same gene product (U.S. Pat. Nos. 5,096,825, 5,670,356, and 5,874,304).

[0004] However, altering codon usage may, in turn, result in the unintentional introduction into a synthetic nucleic acid molecule of inappropriate transcription regulatory sequences. This may adversely effect transcription, resulting in anomalous expression of the synthetic DNA. Anomalous expression is defined as departure from normal or expected levels of expression. For example, transcription factor binding sites located downstream from a promoter have been demonstrated to effect promoter activity (Michael et al., 1990; Lamb et al., 1998; Johnson et al., 1998; Jones et al., 1997). Additionally, it is not uncommon for an enhancer element to exert activity and result in elevated levels of DNA transcription in the absence of a promoter sequence or for the presence of transcription regulatory sequences to increase the basal levels of gene expression in the absence of a promoter sequence.

[0005] Thus, what is needed is a method for making synthetic nucleic acid molecules with altered codon usage without also introducing inappropriate or unintended transcription regulatory sequences for expression in a particular host cell.

SUMMARY OF THE INVENTION

[0006] The invention provides an isolated nucleic acid molecule (a polynucleotide) comprising a synthetic nucleotide sequence having reduced, for instance, 90% or less, e.g., 80%, 78%, 75%, or 70% or less, nucleic acid sequence identity relative to a parent nucleic acid sequence, e.g., a wild-type nucleic acid sequence, and having fewer regulatory sequences such as transcription regulatory sequences. In one embodiment, the synthetic nucleotide sequence has fewer regulatory sequences than would result if the sequence differences between the synthetic nucleotide sequence and the parent nucleic acid sequence, e.g., optionally the result of differing codons, were randomly selected. In one embodiment, the synthetic nucleotide sequence encodes a polypeptide that has an amino acid sequence that is at least 85%, 90%, 95%, or 99%, or 100%, identical to the amino acid sequence of a naturally-occurring (native or wild-type) corresponding polypeptide (protein). Thus, it is recognized that some specific amino acid changes may also be desirable to alter a particular phenotypic characteristic of a polypeptide encoded by the synthetic nucleotide sequence. Preferably, the amino acid sequence identity is over at least 100 contiguous amino acid residues. In one embodiment of the invention, the codons in the synthetic nucleotide sequence that differ preferably encode the same amino acids as the corresponding codons in the parent nucleic acid sequence.

[0007] Hence, in one embodiment, the invention provides an isolated nucleic acid molecule comprising a synthetic nucleotide sequence having a coding region for a selectable or screenable polypeptide, wherein the synthetic nucleotide sequence has 90%, e.g., 80%, or less nucleic acid sequence identity to a parent nucleic acid sequence encoding a corresponding selectable or screenable polypeptide, and wherein the synthetic nucleotide sequence encodes a selectable or screenable polypeptide with at least 85% amino acid sequence identity to the corresponding selectable or screenable polypeptide encoded by the parent nucleic acid sequence. The decreased nucleotide sequence identity may be a result of different codons in the synthetic nucleotide sequence relative to the codons in the parent nucleic acid sequence. The synthetic nucleotide sequence of the invention has a reduced number of regulatory sequences relative to the parent nucleic acid sequence, for example, relative to the average number of regulatory sequences resulting from random selections of codons or nucleotides at the sequences which differ between the synthetic nucleotide sequence and the parent nucleic acid sequence. In one embodiment, a nucleic acid molecule may include a synthetic nucleotide sequence which together with other sequences encodes a selectable or screenable polypeptide. For instance, a synthetic nucleotide sequence which forms part of an open reading frame for a selectable or screenable polypeptide may include at least 100, 150, 200, 250, 300 or more nucleotides of the open reading, which nucleotides have reduced nucleic acid sequence identity relative to corresponding sequences in a parent nucleic acid sequence. In one embodiment, the parent nucleic acid sequence is SEQ ID NO:1, SEQ ID

[...] e.g., differing codons, were randomly selected. Preferably, the synthetic nucleotide sequence encodes a polypeptide that has an amino acid sequence that is at least 85%, preferably 90%, and most preferably 95% or 99% identical to the amino acid sequence of a naturally-occurring or parent polypeptide. Thus, it is recognized that some specific amino acid changes may be desirable to alter a particular phenotypic characteristic of the luciferase encoded by the synthetic nucleotide sequence. Preferably, the amino acid sequence identity is over at least