Viewed in a Web Browser (Figure 3)
Total Page:16
File Type:pdf, Size:1020Kb
Microbial Cell Factories BioMed Central Research Open Access Structural genomics of human proteins – target selection and generation of a public catalogue of expression clones Konrad Büssow*1,2, Christoph Scheich1,2, Volker Sievert1,2, Ulrich Harttig1,4,5, Jörg Schultz3,8, Bernd Simon3, Peer Bork3, Hans Lehrach1,2 and Udo Heinemann1,6,7 Address: 1Protein Structure Factory, Heubnerweg 6, 14059 Berlin, Germany, 2Max-Planck-Institut für Molekulare Genetik, Ihnestr. 73, 14195 Berlin, Germany, 3EMBL Heidelberg, Meyerhofstr. 1, 69117 Heidelberg, Germany, 4RZPD German Resource Center for Genome Research GmbH, Heubnerweg 6, 14059 Berlin, Germany, 5DIFE, Arthur-Scheunert-Allee 114–116, 14558 Nuthetal, Germany, 6Max-Delbrück-Centrum für Molekulare Medizin, Robert-Rössle-Str. 10, 13092 Berlin, Germany, 7Institut für Chemie/Kristallographie, Freie Universität, Takustr. 6, 14195 Berlin, Germany and 8Department of Bioinformatics, University of Würzburg, Biocenter, Am Hubland, 97074 Würzburg, Germany Email: Konrad Büssow* - [email protected]; Christoph Scheich - [email protected]; Volker Sievert - [email protected]; Ulrich Harttig - [email protected]; Jörg Schultz - [email protected]; Bernd Simon - [email protected]; Peer Bork - [email protected]; Hans Lehrach - [email protected]; Udo Heinemann - [email protected] * Corresponding author Published: 05 July 2005 Received: 23 May 2005 Accepted: 05 July 2005 Microbial Cell Factories 2005, 4:21 doi:10.1186/1475-2859-4-21 This article is available from: http://www.microbialcellfactories.com/content/4/1/21 © 2005 Büssow et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background: The availability of suitable recombinant protein is still a major bottleneck in protein structure analysis. The Protein Structure Factory, part of the international structural genomics initiative, targets human proteins for structure determination. It has implemented high throughput procedures for all steps from cloning to structure calculation. This article describes the selection of human target proteins for structure analysis, our high throughput cloning strategy, and the expression of human proteins in Escherichia coli host cells. Results and Conclusion: Protein expression and sequence data of 1414 E. coli expression clones representing 537 different proteins are presented. 139 human proteins (18%) could be expressed and purified in soluble form and with the expected size. All E. coli expression clones are publicly available to facilitate further functional characterisation of this set of human proteins. Background tein expression in small and large scale, biophysical pro- The Protein Structure Factory tein characterisation, crystallisation, X-ray diffraction and The Protein Structure Factory (PSF) is a joint endeavour of structure calculation. universities, research institutes and companies from the Berlin area [1,2]. It takes part in the international struc- It is known that eukaryotic proteins are often difficult to tural genomics initiative [3,4] and aims at the determina- express in Escherichia coli [5]. Only a certain fraction of tion of human protein structures by X-ray diffraction these proteins can be overproduced in E. coli in sufficient methods and NMR spectroscopy using standardised high- yield without formation of inclusion body aggregates or throughput procedures. A complete pipeline has been proteolytic degradation. Alternative expression systems established for this purpose that comprises cloning, pro- include cell cultures of various eukaryotic organisms and Page 1 of 13 (page number not for citation purposes) Microbial Cell Factories 2005, 4:21 http://www.microbialcellfactories.com/content/4/1/21 Pt5 (1-87) cell-free, in vitro protein expression. These systems have Olac (30-48) Olac (60-87) been greatly improved since 1999, when the PSF project XhoI (1) EcoRI (88) RBS (101-107) was initiated. In the meantime, E. coli [5-7] and wheat MRGSH6 (115-144) beta-lactamase BamHI (145) germ [8]in vitro protein synthesis is routinely used by (2434-3292) KpnI (167) SalI (173) structural genomics projects. At the PSF, yeast expression BglII (184) NotI (191) hosts, Saccharomyces cerevisiae and Pichia pastoris, were suc- Strep-II tag (199-222) pQStrep2 HindIII (224) cessfully established as alternative systems to E. coli, as 3499 bp λ tO terminator (245-330) BglI described in detail previously [9-11]. We will focus here (2627) on the results obtained with the E. coli expression system. rrnB T1 terminator (1102-1199) E. coli strains and vectors XbaI (1201) pMB1/ColE1 ori The T7 RNA polymerase-dependent E. coli expression vec- (1673) BamHI Kpn I tor system (pET-vectors) is a universal system to generate TAACTATGAGAGGATCGCATCACCATCACCATCACGGATCCGCATGCGAGCTCGGTACCC |||||| 110 120 130 140 150 160 recombinant protein for structural analysis [12]. pET vec- MRGSHHHHHHGSACELGTP tors are usually combined with the E. coli B strain BL21 His -tag 6 and derivatives that are engineered to carry the T7 RNA SmaI | SalI Bgl II Not I Hin dIII CGGGTCGACCTGCAAGATCTGCGGCCGCTTGGAGCCACCCGCAGTTCGAAAAATAAGCTT polymerase gene. These strains, however, have limitations |||||| 170 180 190 200 210 220 GRPARSAAAWSHPQFEK* in cloning and stable propagation of the expression con- Strep-tag II structs. Expression vectors which are regulated by the lac operator are independent of the host strain. Recombina- Pt5 (1-87) Olac (30-48) tion-deficient E. coli K-12 strains are suitable for cloning Olac (60-87) EcoRI (88) because of their high transformation rates and because XhoI (1) RBS (101-107) Met-Arg-His7 (115-141) they allow for stable propagation of recombinant con- TEV-site (166-189) beta-lactamase - + BamHI (184) structs. The strain SCS1 (Stratagene; hsdR17(rK mK ) recA1 (3738-4596) SalI (192) BglII (205) endA1 gyrA96 thi-1 relA1 supE44) was found to perform KpnI (216) BglI (3931) NotI (219) well at the PSF in cloning experiments. It grows relatively HindIII (239) pQTEV λ tO terminator (259-354) fast and allows for robust protein expression. 4803 bp rrnB T1 terminator (1116-1213) Affinity tags allow for standardised protein purification XbaI (1215) Q procedures. The first vector that was used routinely in the lacI pMB1/ColE1 ori (1301-2258) PSF, pQStrep2 (GenBank AY028642, Figure 1), is based (2977) XbaI (2505) on pQE-30 (Qiagen) and adds an N-terminal His-tag [13] for metal chelate affinity chromatography (IMAC) and a EcoRI GAATTCATTAAAGAGGAGAAATTAACTATGAAACATCACCATCACCATCACCATAGCGATTACGACATCCCCACTACT C-terminal Strep-tag II [14,15] to the expression product. |||||||| 90 100 110 120 130 140 150 160 pQStrep2 allows for an efficient two-step affinity purifica- MKHHHHHHHSDYDIPTT tion of the encoded protein, as demonstrated in a study of His7-tag BamHI Sal I Bgl II Kpn I Not I Hin dIII an SH3 domain [16]. The eluate of the initial IMAC is GAGAATCTTTATTTTCAGGGATCCGGGTCGACTGTTGATAGATCTCGGTACCGCGGCCGCTCGACCTGCAGCCAA |||||||| 170 180 190 200 210 220 230 240 directly loaded onto a Streptactin column. Thereby, only ENLYFQGSGSTVDRSRYRGRSTCSQ full-length expression products are purified and degrada- TEV protease cleavage site tion products are removed. However, the two tags, which are flexible unfolded peptides, remain on the protein and may interfere with protein crystallisation, although we PvuI (155) PstI (281) could show that crystal growth may be possible in their HincII (666) presence even for small proteins [16]. To exclude any neg- argU kan ative influence by the affinity tags, another vector, pQTEV PstI (773) (GenBank AY243506, Figure 1), was constructed. pQTEV pSE111 allows for expression of N-terminal His-tag fusion pro- 5.1 kb teins that contain a recognition site of the tobacco etch p15A ori virus (TEV) protease for proteolytic removal of the tag. EcoRI Q (~3400) lacI EcoRI (2232) Codon usage has a major influence on protein expression levels in E. coli [17], and eukaryotic sequences often con- VectorFigure maps1 tain codons that are rare in E. coli. Especially the arginine Vector maps. Vector maps of pQStrep2, pQTEV and pSE111 codons AGA and AGG lead to low protein yield [18]. This can be alleviated by introducing genes for overexpression Page 2 of 13 (page number not for citation purposes) Microbial Cell Factories 2005, 4:21 http://www.microbialcellfactories.com/content/4/1/21 of the corresponding tRNAs into the E. coli host cells. We structures which can usually only be crystallised as have used the plasmid pSE111 (Figure 1) carrying the domains, i.e. expression constructs lacking other domains argU gene for this purpose. pSE111 is compatible with have to be prepared. To identify coiled-coil proteins, the pQTEV and other common expression vectors. It carries program COILS was used [26-28]. the lacIQ gene for overexpression of the Lac repressor, which is required when using promoters regulated by lac • The cellular localisation of target proteins was assigned operators. pSE111 was used at the PSF in combination with the Meta_A(nnotator) [29,30]. Target proteins anno- with the expression vectors pQStrep2 and pGEX-6P-1. tated to be localised in the extracellular space, endoplas- Strains for overexpression of rare tRNAs are available from mic reticulum, Golgi stack, peroxisome or mitochondria Invitrogen (BL21