
Journal of Cell Science 112, 4207-4211 (1999)
Printed in Great Britain © The Company of Biologists Limited 1999
JCS0771

Motif Trap: a rapid method to clone motifs that can target to defined subcellular localisations

Luis A. Bejarano and Cayetano González*
Cell Biology and Biophysics Programme, European Molecular Biology Laboratory, Heidelberg, Germany
*Author for correspondence (e-mail: [email protected])

Accepted 27 September; published on WWW 17 November 1999

SUMMARY

We have developed a rapid procedure termed Motif Trap (MT) to identify motifs that are able to target proteins to a distinct subcellular localisation in eukaryotic cells. By expressing random DNA fragments fused to green fluorescent protein (GFP), individual cells with the GFP localisation of interest are readily isolated, allowing for the expressed DNA fragments to be cloned by RT-PCR. These can then be used to identify the corresponding full-length cDNAs. Using MT, we have identified patterns of GFP localisation which correspond to every major organelle and compartment. We have shown that MT is useful to identify new sequences that determine subcellular localisation as well as known targeting motifs.

Key words: GFP, Protein targeting motif, Subcellular localisation, Visual screen

INTRODUCTION

One of the most conspicuous features of the eukaryotic cell is its compartmentalisation. Numerous organelles and domains subdivide the anatomy of the cell. Moreover, most of these can be further subdivided into well differentiated regions or structures. The physiological relevance of such compartmentalisation is paramount. Every major cellular activity can be assigned to one or more subcellular compartments. Furthermore, the intricate regulatory networks which operate within eukaryotic cells greatly rely on the differential compartmentalisation of their components. The close relationship between subcellular localisation and function is such that determining the preferential localisation of a protein is often an essential step towards determining its function. Typically, the identification of proteins that are localised in a given compartment requires a very lengthy procedure that relies on the preparative isolation of such a compartment through cell fractionation techniques. This approach presents several limitations. Firstly, it is totally dependent upon the yield and enrichment that can be achieved by these cell fractionation techniques. Secondly, cell fractionation can lead to the loss of peripheral, yet functionally relevant components. Finally, and more importantly, it can only be applied to those compartments that are amenable to cell fractionation. The relative amount and the biochemical characteristics of every single protein can impose an additional burden.

The genomic revolution is providing an ever increasing number of sequences with unknown functions. To match this wealth of sequence data, high throughput methods to generate functional information, of which subcellular localisation is an essential part, are needed. An attempt to this end was provided by the method developed to identify nuclear markers in fission yeast (Sawin and Nurse, 1996). However, with the sequencing of the genomes of higher eukaryotes, a similar technique is needed for high efficiency determination of subcellular localisation in these organisms. To identify protein motifs that are able to target proteins to a distinct localisation in cultured cell lines derived from higher eukaryotes we have developed a rapid procedure termed Motif Trap (MT). The method is based on libraries made of DNA fragments fused to GFP. Following transfection with a MT library, the cells that display a GFP localisation of interest are isolated and the corresponding cDNA fragment that encodes the targeting motif is cloned by RT-PCR.

MATERIALS AND METHODS

Construction of the MT vector (MT#1)

Using primers A (CATGTTGGCGGCCGCGGTACCGTCGA) and B (GCCCGGGCGTGAGCAAGGGCGAG) we modified pEGFP-N1 (Clontech) by PCR to introduce an SrfI site between nucleotides three and four of the GFP coding sequence. This insertion shifts the initial ATG codon of the GFP out of frame with the rest of the coding sequence. This ensures that only insert-containing plasmids will express GFP. PCR was carried out with the Expand High Fidelity PCR System (Boehringer). Oligo A also introduced a NotI site 10 nucleotides upstream of the GFP CDS. The PCR product was purified using the QIAquick PCR Purification System (Qiagen), ligated with Rapid Ligation Kit (Boehringer) and used to transform Epicurian Coli XL1-Blue (Stratagene) by heat-shock. Transformed cells were plated out in LB-Agar supplemented with 30 µg/ml kanamycin and incubated for 16 hours at 37°C. The modified vector was then isolated by minipreps and the NotI fragment subcloned into a pQE31 vector
previously modified to introduce a NotI site between the BamHI and KpnI sites with an adaptor made with oligos Not1-1b (GATCGCGGCCGCGTAC) and Not1-8 (GCGGCCGC). The resulting colonies were checked under a transilluminator to test the expression of GFP and the NotI fragment was then isolated from one of the colonies and subcloned into pEGFP-N1-Not, a modified version of pEGFP-N1 that carries an additional NotI site inserted in position 635-642.

[Fig. 1 diagram labels: DNA fragments; MT vector (ATG-SrfI-GFP); Clone fragments into MT vector; MT Library; Transfect and identify cells with localised GFP; Subclone; RT-PCR; Isolated DNA fragments encoding subcellular localisation signals]

Fig. 1. The Motif Trap approach. Random DNA fragments cloned into the MT vector will produce fusion proteins between the polypeptides encoded by these inserts and GFP. The MT vector contains a high efficiency cloning site immediately after the initial ATG of the GFP which shifts this codon out of frame with the rest of the coding sequence. Thus, GFP can only be expressed from vectors carrying an insert that restores the reading frame. Upon transfection with the MT library, a fraction of the cells can be observed to express GFP which in some cases will be localised in a particular compartment or organelle. These cells can be cloned and the insert that encodes the relevant subcellular localisation signal can be isolated by RT-PCR.

Construction of the MT libraries

To construct the MmcDNA-MT library, cDNA was obtained from NIH/3T3 cells by random priming, purified with QIAEX II Gel Extraction Kit, cloned into the SrfI site of the MT#1 vector using the Rapid Ligation Kit (Boehringer) and transformed into E. coli XL1-Blue MR (Stratagene). Plating out a small aliquot of these cells, we estimated that the library contained about 420,000 clones of which 1.6% had no insert. The complete library was then plated out onto a sterile Nylon filter laid on a 24×24 cm plate containing LB supplemented with kanamycin, incubated at 30°C for 24 hours, replicated into another filter, and reincubated for 4 hours at 37°C. The DNA from the library was then purified with Plasmid Maxi kit (QIAGEN). The DmgDNA-MT library was prepared in the same way except that the inserts in this case were Drosophila genomic DNA fragments between 100 and 150 nt in length.

Fig. 2. Patterns of GFP localisation generated by transfection with a MT library. Low (A) and high (B to I) magnification views of HEK293 cells transfected with the MmcDNA-MT library. GFP is shown in green. These cells were counterstained for DNA using propidium iodide (red). (B) Mitochondria; (C) centrosomal region (arrow); (D) cytokinesis furrow (arrow); (E) the chromosomes (arrows); (F) the mitotic spindle. We have not yet determined the subcellular localisation of GFP in the cells shown in G,H,I. Bars, 15 µm.

Transfection of HEK293 cells with the MT libraries and cloning of cells displaying localised GFP

HEK293 cells were transfected with the MT libraries following the method described by Chen and Okayama (1987). The cells were observed twelve
to sixteen hours after transfection to check for localised GFP using an inverted LEICA DMI-RBE microscope with a long distance 63× Fluotar objective. The position of the cells of interest was labelled with a diamond pen, and the cells were then cloned by a combination of manual cloning and serial dilutions, as described by Harlow and Lane (1988). In some cases, the cells were first cloned using a fluorescence-activated cell sorter (FACS) and the resulting clones were later analysed to determine the presence of localised GFP.

Cloning of the DNA fragments encoding subcellular localisation sequences

The DNA fragments of interest were isolated from cloned cells, obtained as described before, by RT-nested PCR. To this aim we used a set of nested oligos named Fir (AGCTTCGAATTCGCGGCCGCCAACATG), Sec (TATGATCTAGAGTCGCGGCCGCTTTAC), Thi (TAGCGCTACCGGACTCAGATCTCGAGC) and Fou (AAAACCTCTACAAATGTGGTATGGCTG) which flank the SrfI site of the MT#1 vector. mRNA was isolated from the cloned cells using the mRNA Capture Kit (Boehringer). The reverse transcriptase reaction and the first round of PCR were carried out using the Titan One Tube RT-PCR Kit with the Expand High Fidelity PCR System (Boehringer). Oligo Fou was used to prime the RT reaction. The first round of PCR was carried out with oligos Thi and Fou as primers. The second round of PCR was primed with oligos Fir and Sec. The PCR product was run on an agarose gel and isolated with QIAEX II Gel Extraction Kit (QIAGEN). The isolated fragment was then digested with NotI and subcloned into the MT#1 vector. This vector was then used to transfect HEK293 cells to check if the localisation of the GFP fused to the isolated fragment reproduced the localisation of GFP in the original clone, in which case the insert was sequenced.

Fig. 3. Sequencing the inserts that target GFP localisation. (A) The GFP fusion in clone 02/11#22 shows a strong nucleolar localisation with a faint homogeneous nuclear background. (B) The insert from this clone contains a well defined bipartite NLS (red) and meets the consensus of a nucleolar localisation signal. (C) In clone 09/07#18 GFP colocalised with the ER as shown by counterstaining with an antibody against α-calnexin (not shown). (D) The insert from this cell line encodes a peptide of 35 amino acids that contains a predicted trans-membrane motif (PMSIFIQLIYFLLFLFLGVIC). This sequence does not have a match in the sequence databases.

RESULTS AND DISCUSSION

The construction of a Motif Trap library and its use to identify protein motifs that can target proteins to different cellular compartments are summarised in Fig. 1. The source DNA, either genomic or cDNA, is fragmented and cloned into the MT vector. Upon transfection with a MT library, the cells that have taken up a plasmid carrying an insert that encodes a targeting signal can readily be identified by the localisation of GFP. These cells are then isolated by a combination of manual and serial dilution cloning and the corresponding cDNA fragments cloned by RT-PCR.

Between eight and ten hours after transfection of HEK293 cells with the MmcDNA-MT library, some cells start to express GFP and the first localisation patterns are recognisable (Fig. 2A). The percentage of cells transfected with a MT library that are expected to express GFP depends exclusively on the proportion of clones that carry a stop codon in frame with the GFP. This, in turn, depends on the size and base composition of the insert. Thus, the proportion of positive clones is expected to be high in libraries made with short inserts and should be minimum in MT libraries made of full-length cDNAs. In a typical transfection experiment with the MmcDNA-MT library, about 50% of the cells express GFP of which 20% display a distinct localisation of this reporter. Fig. 2B to I shows some of the GFP localisation patterns that we observed. Fig. 2B shows GFP specifically localised in the
mitochondria as confirmed by counterstaining with the mitochondria-specific marker mitotracker (not shown). In the cell shown in Fig. 2C, GFP displays a fairly uniform distribution in the cytoplasm, but is significantly concentrated in a small area near the nucleus that corresponds to the centrosome (arrow) as revealed by counterstaining with a human autoimmune anti-centrosome antibody (not shown). Fig. 2D,E and F show mitotic cells from different GFP expressing lines. GFP can be seen to localise in the cytokinesis furrow (arrow; Fig. 2D), the chromosomes (Fig. 2E) and the mitotic spindle (Fig. 2F). GFP does not appear to be localised during interphase in these two cell lines. We have not yet determined the precise subcellular localisation of GFP in the cells shown in Fig. 2G,H and I. These are a few examples of the patterns of GFP localisation that we have observed. Using MT we have been able to identify cells with GFP localised in every major organelle and compartment.

To demonstrate the use of MT to isolate protein targeting motifs we have cloned and sequenced the DNA inserts from some of these cells. As expected, we have found sequences that correspond to known proteins and contain targeting signals which are consistent with the observed localisation of the GFP fusion. One of these is clone 02/11#22 (Fig. 3A,B). The GFP fusion in this cell line shows a distinct nucleolar localisation with a weak nuclear background. The insert from this line is identical to the fragment that spans amino acids 62 to 131 of the mouse homologue of the HTLV-I tax responsive element binding protein TAXREB107 (Nacken et al., 1995). This fragment contains a well defined bipartite nuclear localisation signal (KRKYSAAKTKVEKKKKKE) and meets the consensus of a nucleolar localisation signal. We have also found inserts that are new sequences which do not have a match in the databases. This is the case of clone 09/07#18 (Fig. 3C,D). These cells contain GFP that is tightly localised to the endoplasmic reticulum (ER), as shown by counterstaining with an antibody against the ER marker α-calnexin (not shown; Cannon and Helenius, 1999). The insert from this cell line encodes a peptide that is 35 amino acids long and contains a predicted trans-membrane motif (PMSIFIQLIYFLLFLFLGVIC) that may account for the ER localisation shown by the fusion protein (von Heijne, 1992). This does not necessarily mean that the identified motif contains an active retention signal, since the ER accumulation of the fusion protein could result from its failure to progress further into the secretory pathway. These observations demonstrate the use of MT to identify and clone protein fragments bearing specific targeting signals.

The main feature of MT is that it is not biased by the particular features of a given protein. Polypeptides containing a localisation signal are identified solely on the basis of their capability to drive GFP to a particular subcellular compartment. The abundance, physiological role, and possible regulation of the full-length protein are totally irrelevant. Moreover, MT is not limited to compartments that are amenable to cell fractionation. In principle, proteins belonging to any compartment that can be visualised by fluorescence microscopy could be identified with this method.

Nevertheless, several limitations apply to the Motif Trap approach. Since the MT vector generates N-terminal fusions with the reporter, it cannot be used to identify localisation signals that reside in the C-terminal end of the molecule such as CAAX motifs (Hancock et al., 1990) or proteins whose compartmentalisation relies on the subcellular localisation of their transcripts, which usually depends on the 3′UTR region (reviewed by Kislauskis and Singer, 1992). These limitations could be circumvented by a different MT vector designed to produce C-terminal fusions, but in this case the reporter gene would be expressed regardless of the presence of an insert and whether or not the insert carries an open reading frame. Consequently, a great many more cells would have to be scored to find clones that display a particular localisation of the reporter.

An additional limitation of the MT approach is due to the very nature of the subcellular localisation motifs. Given the short size and low complexity of many of these, stretches of amino acids that match a consensus but do not act as targeting sequences are not uncommon. For instance, it has been estimated that a significant fraction of the proteins that carry a nuclear localisation signal are never found in the nucleus (Dingwall and Laskey, 1991). Mitochondrial signal sequences (or pre-sequences) also fall into this category. Mitochondrial presequences have been defined as N-terminal peptides that are generally basic, 15-30 amino acids long, rich in hydroxylated amino acids, deficient in acidic amino acids, and can form an amphiphilic α-helix or β-sheet (Roise and Schatz, 1988). To get an estimate of the incidence of cryptic mitochondrial signals and how they could affect the identification of mitochondrial proteins using MT, we made a library with small inserts of Drosophila genomic DNA that was then tested in human HEK293 cells. We used genomic DNA from Drosophila as a source of high complexity DNA which, when cut in small inserts, would render a nearly random collection of sequences. The proportion of clones from this library that displayed a mitochondrial localisation was very high, around 15% of the GFP expressing cells. Cloning and sequencing of seven of these inserts showed that none of them corresponded to known mitochondrial proteins. In fact, most of them corresponded to very short ORFs present in non-coding sequences like satellite DNA and rDNA (Fig. 4). Interestingly, despite the nearly random origin of these inserts, all of them fit very precisely the defined consensus for a mitochondrial pre-sequence: short, with a net positive charge, containing some hydroxylated amino acids and predicted to fold as an amphiphilic α-helix, thus illustrating the power of the MT approach to define consensus localisation signals using nearly-random DNA.

Fig. 4. Cryptic mitochondrial localisation sequences. The mitochondrial localisation of these GFP fusions was determined by colocalisation experiments between mitotracker (A) and the GFP fusion (B). (C) A merged view of the two figures. (D) Sequence of 7 inserts that drive GFP to mitochondria. Most of these correspond to small ORFs present in non-coding genomic DNA.

Short coding sequences, just like non-coding sequences, are therefore expected to contain numerous cryptic subcellular localisation motifs that are normally silent for different reasons. Probably the most common one is their position within the folded protein, which renders them inaccessible to the other components of the subcellular localisation machinery (Roise and Schatz, 1988). In addition, some potential localisation signals may fail to target because of the presence of other targeting signals that act in a dominant manner and recruit the protein to a different compartment (Dingwall and Laskey, 1991). These cryptic targeting signals can be revealed by taking them out of their natural context in a fusion with a reporter protein. Moreover, short DNA fragments can also contain non-coding ORFs that may encode artefactual targeting sequences. Although non-naturally occurring sequences capable of targeting may have very interesting applications, they can produce a bothersome background when naturally occurring targeted proteins are to be identified. The incidence of all these artefactual results can be considerably reduced by the use of MT libraries made with large cDNA inserts so that nearly full-length proteins are fused to the reporter. These protein fusions have a higher chance of folding, thus inactivating cryptic targeting signals. They also have more chances of carrying the more dominant targeting motif, the one that determines the normal localisation of the protein. Moreover, longer cDNAs will increase the frequency of stop codons in any of the incorrect ORFs, thus decreasing their artefactual expression.

Finally, the MT approach is constrained by the limitations that apply to artificial protein expression in eukaryotic cells. The main source of artefactual results is overexpression. Overexpression can lead to the ectopic accumulation of a protein in one of the compartments it normally goes through to reach its final destination, to the production of aggregates, and, when overexpression results in protein levels which exceed the capability of the machinery, to the recruitment of the unfolded protein in the aggresome (Johnston et al., 1998).

Despite these potential limitations, which can be reasonably controlled by choosing the right vector, insert size and expression conditions, MT provides a rapid approach to identify localised proteins. There are several potential applications of MT that we envisage will be developed in the future. MT may offer a powerful tool in the field of functional genomics as a high throughput method to determine the subcellular localisation of the fast growing number of sequences which are being generated by ongoing genome projects. Another application of MT is the identification of proteins that are differentially sorted in differentiating cells like epithelial cells that are induced to polarise or primary cultures of differentiating neurons (Dotti and Simons, 1990). Similarly, MT could be used to detect proteins which are specifically relocated under different stress conditions, following heat-shock for instance, or after infection with pathogens which result in a dramatic rearrangement of the architecture of the cell (Cudmore et al., 1997). Another interesting application of MT will be the cloning of interacting partners of a given protein by transfecting cells which contain the protein labelled with a fluorochrome that produces FRET (Cubitt et al., 1995) with the library's reporter. Finally, systematic screenings with a MT library could be used to identify new domains within known organelles and compartments. An additional spin off of MT is the establishment of a large collection of cell lines that carry GFP at different locations.

While this paper was being reviewed, a related manuscript was published describing a visual screen of a GFP-fusion library (Rolls et al., 1999). Unlike the method described here, this screening was carried out by examining the patterns of GFP localisation in cells transfected with pools of clones generated by subdividing the library. Further subdivision into smaller pools and rescreening was used to isolate the clones responsible for a particular pattern of interest. Another difference with the MT method is that in the library used by Rolls et al. (1999) the cDNAs were inserted after the 3′ end of the GFP coding sequence. Technical differences aside, the visual screen described by these authors and the MT approach are conceptually identical and represent the first two of what may soon become a large number of variations on the same theme.

We are indebted to M. Way, C. Dotti, S. Cohen, E. Karsenti, T. Nilsson, T. Hyman, R. Pepperkok and G. Griffiths for their encouragement and support as well as for their critical reading of this manuscript. Some of the initial experiments that led to the development of MT were carried out by L.A.B. in the laboratory of J. Jimenez. G. Alonso suggested the use of the HEK293 cells. L.A.B. is recipient of a postdoctoral fellowship from the Spanish 'Ministerio de Educacion'. Work in our laboratory is partially financed by EU grant CT960005.

REFERENCES

Cannon, K. S. and Helenius, A. (1999). Trimming and readdition of glucose to N-linked oligosaccharides determines calnexin association of a substrate glycoprotein in living cells. J. Biol. Chem. 274, 7537-7544.
Chen, C. and Okayama, H. (1987). High-efficiency transformation of mammalian cells by plasmid DNA. Mol. Cell Biol. 7, 2745-2752.
Cubitt, A. B., Heim, R., Adams, S. R., Boyd, A. E., Gross, L. A. and Tsien, R. Y. (1995). Understanding, improving and using green fluorescent proteins. Trends Biochem. Sci. 20, 448-455.
Cudmore, S., Reckmann, I. and Way, M. (1997). Viral manipulations of the actin cytoskeleton. Trends Microbiol. 5, 142-147.
Dingwall, C. and Laskey, R. A. (1991). Nuclear targeting sequences - a consensus? Trends Biochem. Sci. 16, 478-481.
Dotti, C. G. and Simons, K. (1990). Polarized sorting of viral glycoproteins to the axon and dendrites of hippocampal neurons in culture. Cell 62, 63-72.
Hancock, J. F., Paterson, H. and Marshall, C. J. (1990). A polybasic domain or palmitoylation is required in addition to the CAAX motif to localize p21ras to the plasma membrane. Cell 63, 133-139.
Harlow, E. and Lane, D. (1988). Antibodies: a Laboratory Manual. Cold Spring Harbor Laboratory Press, NY.
Johnston, J. A., Ward, C. L. and Kopito, R. R. (1998). Aggresomes: a cellular response to misfolded proteins. J. Cell Biol. 143, 1883-1898.
Kislauskis, E. H. and Singer, R. H. (1992). Determinants of mRNA localization. Curr. Opin. Cell Biol. 4, 975-978.
Nacken, W., Klempt, M. and Sorg, C. (1995). The mouse homologue of the HTLV-I tax responsive element binding protein TAXREB107 is a highly conserved gene which may regulate some basal cellular functions. Biochim. Biophys. Acta 1261, 432-434.
Roise, D. and Schatz, G. (1988). Mitochondrial presequences. J. Biol. Chem. 263, 4509-4511.
Rolls, M. M., Stein, P. A., Taylor, S. S., Ha, E., McKeon, F. and Rapoport, T. A. (1999). A visual screen of a GFP-fusion library identifies a new type of nuclear envelope membrane protein. J. Cell Biol. 146, 29-43.
Sawin, K. E. and Nurse, P. (1996). Identification of fission yeast nuclear markers using random polypeptide fusions with green fluorescent protein. Proc. Nat. Acad. Sci. USA 93, 15146-15151.
von Heijne, G. (1992). Membrane protein structure prediction, hydrophobicity analysis and the positive-inside rule. J. Mol. Biol. 225, 487-494.
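
Illustrative note on the reading-frame argument

The Results and Discussion state that GFP is expressed only from MT vectors whose insert restores the reading frame, and that the fraction of expressing clones depends on in-frame stop codons and therefore on insert size. The Python sketch below illustrates that reasoning; it is not the authors' code. It assumes that the SrfI site is the 8-nt GCCCGGGC at the 5′ end of primer B, placed directly after the ATG, and that SrfI cuts it bluntly in the middle (GCCC/GGGC). The function names are hypothetical, and the estimate in `expected_open_fraction` assumes uniform codon usage, which real cDNA or genomic DNA will not follow exactly.

```python
# Illustrative sketch only: flanking sequences are reconstructed from primer B
# and the SrfI cut position; function names are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}

LEFT_FLANK = "ATG" + "GCCC"   # GFP start codon + 5' half of the SrfI site
RIGHT_FLANK = "GGGC"          # 3' half of the SrfI site, followed by GFP codon 2


def expresses_gfp(insert: str) -> bool:
    """Return True if GFP stays in frame with the ATG and no stop precedes it."""
    leader = LEFT_FLANK + insert.upper() + RIGHT_FLANK
    if len(leader) % 3 != 0:
        # The reading frame of GFP codon 2 onwards is not restored.
        return False
    codons = (leader[i:i + 3] for i in range(0, len(leader), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def expected_open_fraction(n_codons: int, stop_freq: float = 3 / 64) -> float:
    """Chance that n random codons contain no stop (uniform codon usage assumed)."""
    return (1.0 - stop_freq) ** n_codons
```

Under these assumptions the empty vector has an 11-nt leader and stays out of frame, while inserts whose length leaves a remainder of 1 when divided by 3 restore the frame. Comparing `expected_open_fraction` for a 40-codon and a 400-codon insert shows why short-insert libraries are expected to give more GFP-positive clones than libraries of full-length cDNAs.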