Analysis of a High-Throughput Yeast Two-Hybrid System and Its Use to Predict the Function of Intracellular Proteins Encoded Within the Human MHC Class III Region
Genomics 83 (2004) 153–167
www.elsevier.com/locate/ygeno

Ben Lehner,1 Jennifer I. Semple,1 Stephanie E. Brown, Damian Counsell, R. Duncan Campbell,2 and Christopher M. Sanderson*,2

Functional Genomics Group, MRC Rosalind Franklin Centre for Genomics Research,3 Hinxton, Cambridge CB10 1SB, United Kingdom

* Corresponding author.
1 Both authors contributed equally toward the publication of this manuscript.
2 Inquiries can be addressed to either of these authors.
3 Formerly the MRC UK HGMP Resource Centre.
0888-7543/$ - see front matter. doi:10.1016/S0888-7543(03)00235-0

Received 8 April 2003; accepted 15 July 2003

Abstract

High-throughput (HTP) protein-interaction assays, such as the yeast two-hybrid (Y2H) system, are enormously useful in predicting the functions of novel gene products. HTP-Y2H screens typically do not include all of the reconfirmation and specificity tests used in small-scale studies, but the effects of omitting these steps have not been assessed. We performed HTP-Y2H screens that included all standard controls, using the predicted intracellular proteins expressed from the human MHC class III region, a region of the genome associated with many autoimmune diseases. The 91 novel interactions identified provide insight into the potential functions of many MHC genes, including C6orf47, LSM2, NELF-E (RDBP), DOM3Z, STK19, PBX2, RNF5, UAP56 (BAT1), ATP6G2, LST1/f, BAT2, Scythe (BAT3), CSNK2B, BAT5, and CLIC1. Surprisingly, our results predict that 1/3 of the proteins may have a role in mRNA processing, which suggests clustering of functionally related genes within the human genome. Most importantly, our analysis shows that omitting standard controls in HTP-Y2H screens could significantly compromise data quality.
© 2003 Elsevier Inc. All rights reserved.

Keywords: Protein–protein interaction; Yeast two-hybrid; Human MHC class III region

The yeast two-hybrid (Y2H) system has been used to analyze large numbers of protein–protein interactions in yeast [1–3], bacteria [4], viruses [5], and Caenorhabditis elegans [6]. In addition, an adaptation of the Gal4 two-hybrid assay was used to identify 145 interactions occurring between 3500 mouse proteins in mammalian cells [7]. There is only a small degree of overlap [1] between theoretically comparable data sets from some of the larger studies [1,2]. This has inevitably raised questions about whether Y2H technology is a suitable strategy for building protein interaction maps. In particular, there has been speculation as to whether coprecipitation methods may be more useful. In a recent report, von Mering et al. [8] compared the accuracy, coverage, and biases of the large-scale Y2H and complex isolation studies. Importantly, von Mering's data show that the accuracy of the raw (unfiltered) Y2H data is in fact comparable to that obtained by the TAP complex isolation approach [9] (38.1 and 40.5%, respectively, for baits from a reference set of known interactions). Moreover, compared to purification methods, Y2H data appear less biased toward highly expressed proteins, proteins of particular subcellular locations, or phylogenetically conserved proteins [8]. As such, the emerging consensus is that the two approaches are complementary and that by combining data from both approaches it will be possible to increase the accuracy and coverage of predicted protein interaction networks.

Given the undoubted contribution that the Y2H approach has made to the identification of novel protein–protein interactions, both in the small-scale study of human proteins and in larger-scale studies of model organisms, it is clear that the Y2H technique should now be used on a larger scale to explore human protein interaction networks. However, although some important pioneering large-scale Y2H studies have been performed [1–3,6], high-throughput (HTP) Y2H screening is still in its infancy. As such, there is still a need for improvement and debate about the way HTP screens are performed. In particular, large-scale analysis of human protein–protein interaction networks presents strategic problems that were not encountered in previous large-scale Y2H studies. It is therefore highly likely that different experimental strategies and HTP adaptations will need to be applied when analyzing the human proteome.

To date, several different strategies have been used to adapt Y2H screens into HTP formats. For smaller numbers of genes (~200), a robotically assisted array strategy has been used [2]. This approach provides a valuable tool for proteomic analysis, not least because it enables nonspecific interactions to be identified with relative ease by repetitive screening. However, due to the potential size of the human proteome and the lack of a complete set of human open reading frames (ORFs), it is not possible to apply this methodology directly to large-scale human protein interaction studies. Consequently, the screening of high-complexity cDNA libraries remains the best way of identifying novel interactions between human proteins.

As library screening is an inherently labor-intensive procedure, a range of adaptations has been used to increase throughput. These include the use of non-sequence-verified baits [6], the pooling of baits [1,10], the analysis of only small numbers of interaction partners [2], and the omission of secondary procedures, which are commonly used to test the specificity of putative protein–protein interactions in small-scale Y2H screens. These procedures include the reconfirmation of interactions in fresh yeast and the use of nonspecific baits to assess prey specificity [11].

As an alternative to experimental specificity checks, some large-scale studies have implemented data-filtering methods to reduce the false-positive rate [1]. This involves reporting only those preys that are isolated three or more times in a single screen. This may be a valid strategy for assessing the reliability of interactions isolated from genomic fragment libraries or libraries of pooled prey clones, in which each gene should be present in approximately equal abundance [1,4]. However, the same criteria cannot be applied when nonnormalized cDNA libraries are being screened. In this case the number of clones isolated per gene will depend far more on the relative level of gene expression than on the “trueness” of the observed interaction. Also, it is possible that multiple positive colonies may arise from a single mutated yeast or vector (D. Markie, personal communication). Therefore, obtaining multiple isolates of a single clone, even in repeat screens, does not prove that the interaction is a true positive. Equally, singletons may well be true positives, and excluding them may increase the false-negative rate of the screen.

Due to the lack of primary data from many published Y2H studies, or quantitative studies that address the consequences of omitting established experimental specificity …

To address these issues we have performed a medium-scale pilot study using an HTP strategy that incorporates all classical secondary specificity checks, adapted into an HTP format. For this study we chose to analyze a set of intracellular proteins encoded within the human MHC class III region [12]. This 0.8-Mb region of chromosome 6p21.3 is the most gene-dense domain of the human genome. It has genetic associations with a range of human diseases [13], including insulin-dependent diabetes mellitus [14,15], rheumatoid arthritis [16], multiple sclerosis [17], ankylosing spondylitis [18], and IgA deficiency [19].

In addition to its medical significance, the MHC class III region provides an ideal set of functionally diverse proteins that can be used to assess the consequences of performing different classical specificity tests. Importantly, some proteins included in this study (NELF-E, CSNK2B, ATP6G2, LSM2, UAP56) have well-defined functions and known protein interaction partners. Inclusion of these proteins allows us to assess the potential of the HTP Y2H approach to predict gene function accurately. It also provides important internal controls for the efficiency and fidelity of secondary specificity tests.

Despite convincing genetic evidence showing the importance of the class III region in health and disease, neither the genes responsible nor the underlying mechanism of pathology is known for many of the associated conditions. Meaningful interpretation of this genetic evidence would be greatly enhanced if the molecular functions of each gene were understood.

By using a high-stringency Y2H assay to identify interaction partners, we hoped to be able to predict the functions of many of the currently uncharacterized gene products with increased confidence. In addition, by cataloguing the potential functions of a collection of genes from a single genomic locus extending over 800 kb, it was possible to assess whether clustering of functionally related genes has occurred in this important region of the human genome.

Results

In this pilot study we have used a stringent HTP Y2H assay to analyze the protein–protein interaction profiles of intracellular proteins encoded by genes in the human MHC class III region on chromosome 6p21.3. In this region there are 60 expressed genes, approximately half of which encode intracellular proteins. For the purpose of this study we chose not to study genes encoding secreted or cell surface proteins, as they are less likely to perform well in the Y2H assay. Members of the HSP70 family were also excluded because they are known to bind nonspecifically to hydrophobic peptides [20]. A list of the remaining 27 genes, along with a summary of available information regarding their domain structures, functions, and known interaction partners, is presented in Table 1.
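The introduction objects to filtering preys by isolate count when screening a nonnormalized cDNA library. A toy calculation makes that objection concrete.

The Python sketch below is not the authors' method or data. Everything in it is a hypothetical assumption chosen only to illustrate the argument:

- the `Prey` fields;
- the expected-isolate model (clones screened × library share × reporter activation rate);
- the prey names and every number.

It contrasts two approaches. The first is the "three or more isolates" filter attributed to [1]. The second is the experimental checks described in the text: reconfirmation in fresh yeast and testing against nonspecific control baits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Prey:
    """A hypothetical prey from a nonnormalized cDNA library screen (toy model only)."""
    name: str
    library_fraction: float   # assumed share of library clones encoding this prey
    activation_rate: float    # assumed chance that a screened clone activates the reporters
    reconfirmed: bool         # interaction reproduced after retransformation into fresh yeast
    hits_control_baits: bool  # prey also activates reporters with nonspecific control baits


def expected_isolates(preys: list[Prey], clones_screened: int) -> dict[str, float]:
    """Toy expected-value model: clones screened x library share x activation rate."""
    return {p.name: clones_screened * p.library_fraction * p.activation_rate for p in preys}


def isolate_count_filter(preys: list[Prey], clones_screened: int, min_isolates: int = 3) -> set[str]:
    """Count-based filter: keep preys expected to be isolated at least `min_isolates` times."""
    counts = expected_isolates(preys, clones_screened)
    return {name for name, n in counts.items() if n >= min_isolates}


def specificity_filter(preys: list[Prey]) -> set[str]:
    """Experimental checks: keep preys that reconfirm and do not react with control baits."""
    return {p.name for p in preys if p.reconfirmed and not p.hits_control_baits}


# Hypothetical library; names and values are illustrative, not measured data.
LIBRARY = [
    # Highly expressed, weakly and nonspecifically activating prey.
    Prey("abundant_sticky", 1e-3, 0.005, reconfirmed=True, hits_control_baits=True),
    # Low-abundance transcript encoding a genuine, specific partner.
    Prey("rare_specific", 2e-6, 0.5, reconfirmed=True, hits_control_baits=False),
    # Moderately abundant transcript encoding a genuine, specific partner.
    Prey("moderate_specific", 1e-5, 0.5, reconfirmed=True, hits_control_baits=False),
]

if __name__ == "__main__":
    clones = 1_000_000  # hypothetical number of library clones screened
    print(sorted(isolate_count_filter(LIBRARY, clones)))  # ['abundant_sticky', 'moderate_specific']
    print(sorted(specificity_filter(LIBRARY)))            # ['moderate_specific', 'rare_specific']
```

The two filters disagree under these assumed values:

- The count filter keeps the abundant nonspecific prey (about 5 expected isolates). It drops the rare specific one (about 1 expected isolate).
- The specificity checks keep both specific preys and reject the nonspecific one.

An expected-value model like this ignores the stochasticity of real screens. It also misses artifacts such as a single mutated yeast or vector giving rise to many colonies.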