–Protein technology

There are two main approaches for detecting ape interacting : techniques that measure OSC direct physical interactions between protein /Cyt O pairs — binary approaches — and those that measure interactions among groups of proteins that may not form physical contacts — co-

/UC San Dieg San /UC complex methods (see ‘Tools for the search’). O The most frequently used binary method is K. On K. the two-hybrid (Y2H) system2. It has vari- ations involving different reagents, and has been adapted to high-throughput screening. The strategy interrogates two proteins, called bait and prey, coupled to two halves of a transcrip- tion factor and expressed in yeast. If the proteins make contact, they reconstitute a transcription factor that activates a reporter gene. Another method for identifying binary interactions is luminescence-based mamma- lian mapping (LUMIER), a high- throughput approach developed by Jeff Wrana at the Samuel Lunefeld Research Institute in Toronto, Canada. This strategy fuses Renilla luciferaze (RL) , which catalyses light- emitting reactions, to a bait protein, which is expressed in a mammalian along with can- didate protein partners tagged with a polypep- The interactome contains more than 100,000 protein interactions, only a fraction of which are known. tide called Flag. Researchers use a Flag antibody to immunoprecipitate all proteins with the Flag tag, along with any that interact with them. Interactions between the RL-fused bait and the Flag-tagged prey are detected when light Interactome under is emitted. Other binary methods include the mammalian protein–protein trap and techniques based on chips. The most common co-complex method is construction co-immunoprecipitation (coIP) coupled with (MS). In this approach, a protein bait is tagged with a molecular marker. Developing techniques are helping researchers to build the Several types of tags are commercially available; protein interaction networks that underlie all cell functions. each requires a distinct biochemical technique to recognize the tag and fish the bait protein out of the cell lysate, bringing with it any interacting by Laura bonetta open-source software for network visualiza- proteins. These are then identified by MS. tion. “That effort identified 30,000 genes, but In addition to these empirical methods, n old adage says: “Show me your friends, that is not the end goal. How the genes work in researchers have used computational techniques and I’ll know who you are.” In the same pathways and how these pathways function in to predict interactions on the basis of factors way, finding interaction partners for states and development is the end goal. such as amino-acid sequence and structural Aa protein can reveal its function. To that end, To accomplish this we will need to systemati- information. “People ask ‘Why are you predict- researchers are now building entire networks of cally map gene and protein interactions.” ing interactions when you can just do the exper- protein–protein interactions. Unlike biological Unlike the , the interactome — the iment?’” says Gary Bader, a bioinformatician at pathways, which represent a sequence of molec- set of protein-to-protein interactions that the University of Toronto. “But experimental ular interactions leading to a final result — for occurs in a cell — is dynamic. Many inter- techniques fail for some proteins.” example, a signalling cascade — networks are actions are transient, and others occur only interlinked. Represented as starbursts of pro- in certain cellular contexts or at particular False Readings tein ‘nodes’ linked by interaction ‘edges’ to form times in development. The interactome may Every step of a procedure to detect protein– intricate constellations, they provide insight into be tougher to solve than the genome, but the protein interactions — from the reagents used the mechanisms of cell functions. Furthermore, information, researchers say, is crucial for a to the cell types and experimental conditions — placing proteins encoded by disease genes into complete understanding of . influences the proteins that are identified. Two these networks will let researchers determine studies this year used similar methods to iden- the best candidates for assessing disease risk The RighT PaRTneRs tify interacting proteins in transcription factors and targeting with therapies. At any time, a human cell may contain about in embryonic stem cells3,4; there was incomplete “This is the next step after the Human 130,000 binary interactions between proteins1. overlap between the resulting data sets. “If you Genome Project,” says Trey Ideker, a systems So far, a mere 33,943 unique human protein– use the same protocol you will get reproducible biologist at the University of California, San protein interactions are listed on BioGRID lists of proteins. But different labs use different Diego, and principal investigator at the National (http://thebiogrid.org), a database that stores protocols, which affects the end result,” says Resource for Network Biology, which provides interaction data. Clearly, there is work to do. Raymond Poot, a cell biologist at Erasmus MC

9 December 2010 | VOL 468 | NATUre | 851 © 2010 Macmillan Publishers Limited. All rights reserved technology Protein–Protein interactions

hospital in Rotterdam, the Netherlands, and Calling out false positives — reported inter- detect the interactions. But the definition of a lead author of one of the studies. actions that don’t actually occur — and false ‘real’ interaction depends on the context. “Does In his protocol, Poot pulled interacting pro- negatives — interactions that do occur but are a real interaction mean that two proteins inter- teins from cells using nuclear extracts express- not picked up by the experimental protocol or act if they are placed next to each other in a ing different Flag-tagged transcription factors. are discarded — is one of the main challenges test tube, or that they must interact in a cell? Or He added a nuclease to his reactions to remove in the field. “Normally when you do a coIP fol- does real mean that the interaction should have DNA and eliminate possible artefacts caused lowed by MS you will get hundreds of protein a biological function?” asks Ideker. Researchers by proteins binding to it. “Transcription fac- candidates interacting with any one bait,” says can home in on functional interactions by com- tors bind to DNA so you are likely to pull out Wade Harper, a cell biologist at Harvard Medi- bining data on interactions with other types of DNA-binding factors that are not directly inter- cal School in Boston, Massachusetts. “When biological information, such as genetic interac- acting,” he explains. Purifying many different you weed out all the stochastic and non-specific tions, protein localizations or . transcription factors with the same protocol interactions you end up with many fewer pro- For instance, proteins whose genes are co- also enabled the researchers to determine which teins. Some proteins in large complexes might expressed are likely to interact with each other interactions were most likely to be specific. For have 30–50 partners, others only 4–5.” or to be part of the same complex or pathway. example, proteins that consistently co-purified One way in which researchers increase the Many tools are available on the web for inte- with all transcription factors would be treated accuracy of their results is to use more than one grating different types of information about a as unlikely to indicate a genuine interaction. method (for example, Y2H plus LUMIER) to given protein or gene. One is GeneMANIA, developed by Bader’s group in collaboration with Quaid Morris, a computational biolo- gist also at the University of Toronto. A user enters the gene names into GeneMANIA; tools for the search the program provides a list of genes that are functionally similar or have shared properties, CS such as similar expression or localization, and

rigeni then displays a proposed interaction network, B

Hy showing relationships among the genes and the type of data used to gather that informa- tion. The user can click on any node to obtain information about the gene and on any link to obtain information about their relationship (such as citations for any published studies or other sources of data). “It’s like a Google for genetic and protein information,” says Bader. Other web-based interfaces that predict gene functions include STRING (http://string- db.org) developed at the European Molecular Methods such as the yeast two-hybrid system allow scientists to work out which proteins interact. Biology Laboratory in Heidelberg, Germany. It hunts for protein interactions on the basis of the two main methods for finding companies such as Hybrigenics in paris genomic context, high-throughput experiments, protein–protein interactions are the and Dualsystems Biotech of Schlieren, co-expression and data from the literature. yeast two-hybrid (y2H) system and Switzerland, offer y2H-based screening. co-immunoprecipitation followed by mass “We have complex libraries with ten times KeePing scoRe spectrometry. Several companies sell more independent clones than most other To select real protein–protein interactions, reagents for both approaches. invitrogen of libraries, which we screen to saturation. Harper and some members of his lab, Matt Sowa Carlsbad, California, sells the proQuest two- and rather than screening full-length and Eric Bennett, developed a software platform Hybrid System with gateway technology. proteins, we screen for interactions with called CompPASS to assign confidence scores this is based on y2H, with modifications domains,” says etienne Formstecher, to an interaction detected by MS5. CompPASS to decrease false-positive results and allow director of scientific projects and sales at takes data sets of interacting proteins (including rapid characterization, says the company. Hybrigenics. “Full-length proteins can have those identified in experiments) and measures Other firms provide vectors used to produce some domains buried and not available to frequency, abundance and reproducibility of proteins with affinity tags, which can easily interact, at least in yeast where you may interactions to calculate the score. be immunoprecipitated along with other not have signals to unlock a closed protein This year, Harper used CompPASS to iden- interacting proteins. a polypeptide tag called conformation.” a customer is given a list tify interactions among proteins involved in Flag is popular among researchers, and of proteins that interact with the protein autophagy, the process by which cellular pro- Sigma aldrich of St Louis, Missouri, provides of interest; it indicates which domains are teins and organelles are engulfed into vesicles several Flag-genes for purchase. promega making contact and provides a confidence and delivered to the lysosome to be degraded. in Madison, Wisconsin, has the Halotag score for each interaction. Starting with 32 proteins known to have a role technology, in which a protein of interest innoprot in Derio, Spain, provides in autophagy, they identified 2,553 interacting is expressed in fusion with a tag protein an interaction service using tag-based proteins using coIP–MS. CompPASS then nar- engineered from a bacterial enzyme. this purification designed for high-throughput rowed the list down to 409 high-confidence tag can be used to purify the protein, and analysis. and invitrogen’s protoarray interacting proteins with 751 interactions6. any interacting with it, by binding to a resin. protein–protein interaction Service uses Ideker’s group used a different approach the tag is cleaved off using a protease. microarrays containing more than 9,000 to map interactions among human mitogen- For researchers who don’t have the time human proteins to identify proteins that activated protein kinases (MAPKs), which or infrastructure to do the experiments, interact with any protein of interest. L.b. respond to external stimuli and regulate cell function. Having used Y2H to identify more

852 | NATUre | VOL 468 | 9 December 2010 © 2010 Macmillan Publishers Limited. All rights reserved Protein–Protein interactions technology

ape For proteins not yet assigned to a portion

OSC of the human interaction network, Cravatt’s

/Cyt group developed a technology for assigning O protein functions by exploiting an interac- tion between and chemical reagents dubbed activity-based probes. These probes /UC San Dieg San /UC

O consist of a reactive group that binds the active sites of many members of an enzyme family, K. On K. and a reporter tag that is used for the detec- tion and identification of the probe-labelled enzymes8. Because these probes bind only to enzymes that are active, they can give insights into the enzymes’ functions. For example, if a probe binds to a set of enzymes in a cancer cell but not in a normal cell, it means that these enzymes become more active in the cancer cell and so may have a role in cell growth. The activity probes can also serve as assays for the discov- From a full network, researchers can zoom in on specific interactions that might be functionally relevant. ery of inhibitors for a particular enzyme, which may help researchers to understand the role than 2,000 interactions among known MAPKs, available through a common language called of that enzyme. “You can develop an inhibitor Ideker used evidence including conservation PSICQUIC, to maximize access. for an enzyme before ever knowing what the of interactions among different species to win- Other types of data that can be combined with actual substrate is,” says Cravatt. now that down to a core network of 641 high- protein–protein interactions include informa- This year, he developed another strategy that confidence interactions7. tion on gene expression, cellular co-localization not only determines differences in enzyme For some of the proteins there was no pre- of proteins (based on microscopy), genetic activities in different cells, but also pinpoints vious evidence of interactions with MAPKs. information, metabolic and signalling pathways, where in the protein these differences occur, Ideker and his colleagues knocked down the and data from high-throughput assays. providing a more quantitative measure of the expression of these proteins using RNA inter- “One challenge computationally is integrat- differences9. The activities of many families ference, then looked for the effect of the knock- ing heterogeneous data sets to build a network of enzymes are regulated or fine-tuned by downs on proteins known to be activated by model,” says Ilya Shmulevich, a professor at the cysteine modifications. By looking specifically MAPKs. This allowed them to confirm that Institute for in Seattle, Wash- for changes in cysteine modifications across about one-third of their interactions had a role ington. The second challenge is to decide on a the proteome, he found in MAPK signalling. modelling approach. “It will depend on what ‘hyper-reactive’ cysteine These methods are helping to weed out false kind of data you have available and how you residues in several pro- positives and provide associated confidence will be using the model,” says Shmulevich. teins of unknown func- scores, but the problem of false negatives per- Several bioinformatic tools have been devel- tion, which suggests that sists. “With these assays we try to get false posi- oped to model and represent networks. The they probably have roles tives down to zero. The hit you take is on false most widely used ones are associated with in signalling pathways. negatives. So now you can be highly confident Cytoscape (www.cytoscape.org), an open- One challenge in of your data but you are probably probing only source program for visualizing networks and defining protein–protein about 20% of the interactome,” says Ideker. “We for integrating them networks with other types interaction networks is would like to get every interaction but we do of data. Several Cytoscape plug-ins allow users “One that interactions vary not get even close with current technologies.” to download and explore databases. challenge is depending on the type New methods may become available to iden- Commercial packages with similar functions integrating of cell and the cellular tify interactions that escape detection by cur- include MetaCore from GeneGO in St Joseph, heterogeneous environment. For exam- rent techniques (see ‘Real-time analysis’). In Michigan; from Ingenu- data sets.” ple, Wrana mapped the the meantime, one way to address the problem ity Systems in Redwood City, California; and ilya Shmulevich protein–protein interac- is to combine procedures for detecting inter- Pathway Studio from Ariadne in tion network for TGF-β, actions, each sampling a different portion of Rockville, Maryland. These can access public a growth factor that regulates cell functions, the interactome. The interaction data obtained sources of data as well as the company’s propri- and found that two proteins that pass on the in an experiment can also be combined with etary databases. “One of the unique features of signals from the factor inside the cell — Smad2 that available in public databases, thus provid- Pathway Studio is the openness of our system and Smad4 — interact with one another only ing a more complete picture, says Bader. and the ability to integrate many different kinds when the cells are stimulated with TGF-β. If of data,” says David Denny, director of market- the cells are not stimulated, these two proteins FRom daTa To neTwoRKs ing and product management at Ariadne. don’t come into contact10. Protein–protein interactions are only the raw Bennett, Harper and Steven Gygi, a cell biol- material for networks. To build a network, guilT by associaTion ogist also at Harvard Medical School, devel- researchers typically combine interaction data One reason for developing networks is to help oped a platform centred around a sets with other sources of data. Primary data- assign functions to proteins through guilt by technology called multiplex absolute quanti- bases that contain protein–protein interactions association. But “a huge slice of the proteome fication (AQUA) to look at dynamic changes include DIP (http://dip.doe-mbi.ucla.edu), consists of proteins that no one knows what in protein interaction networks. AQUA uses BioGRID, IntAct (www.ebi.ac.uk/intact) and they do or interact with”, says Benjamin Cravatt, synthetic peptides that contain stable isotopes MINT (http://mint.bio.uniroma2.it). These a chemical physiologist at the Scripps Research as internal standards for the native peptides databases have committed to making records Institute in San Diego, California. that are produced when proteins from a cell

9 December 2010 | VOL 468 | NATUre | 853 © 2010 Macmillan Publishers Limited. All rights reserved technology Protein–Protein interactions lysate are digested. Using tandem develop the method12. MS, researchers can compare the lev- According to Gygi, the biggest appli- els of native and synthetic peptides cation of KAYAK might be in tumour in a cell to obtain a measure of the classification. “Biopsies or excised tis- amount of native proteins present. sues can be profiled for kinase activities Synthetic peptides can also be pre- with pinpoint accuracy. These patterns pared with modifications, such as could contribute towards personalized extra phosphate groups, to measure drug treatments based on dysregulated the number of post-translationally kinase pathways,” he says. modified proteins. “We are pursuing The combination of different types the dynamics of protein networks by of data and technologies should con- quantifying changes in the amount tinue to fill in the empty spaces of the of proteins present in specific protein current human inter actome map. The complexes,” says Harper. “Techniques picture may never be complete, but it such as AQUA provide an accurate will continue to provide insights into and sensitive measure of how the Web tools such as GeneMANIA integrate data on a protein or gene. cellular mechanisms of health and stoichiometry of components within disease. “I think that the network complexes that make up a network are altered certain hub proteins were in a different category we have is dense enough for us to start doing in response to a stimulus.” in breast-cancer patients with a good prognosis studies to classify disease states,” says Wrana. The team used the approach to describe than in those with a poor prognosis. “As the networks become better and coverage the rearrangements that occur in the protein Thus, by overlaying the expression pattern improves, the accuracy of diagnosis will also network of cullin-RING ubiquitin ligases, of a cancer cell from an individual patient onto improve.” ■ enzymes that regulate protein turnover, under the network, Wrana could various cellular conditions11. predict a patient’s prognosis. “We found that Laura Bonetta is a freelance science writer the detection of global changes in network based in Garrett Park, Maryland. develoPmenTs in diagnosTics organization is more predictive of outcome Changes in protein–protein interaction than is gene expression alone,” says Wrana. 1. Venkatesan, K. et al. Nature Meth. 6, 83–90 (2009). 2. Fields, S. & Song, O.-K. Nature 340, 245–246 (1989). networks may provide information about the “We have now applied this method to other 3. van den Berg, D. L. C. et al. Cell Stem Cell 6, mechanisms of disease. Last year, Wrana applied tumour models and obtained similar results.” 369–381 (2010). the network approach to the diagnosis of KAYAK (kinase activity assay for protein 4. pardo, M. et al. Cell Stem Cell 6, 382–395 (2010). breast cancer. He used microarrays to measure profiling) is another approach to developing 5. Sowa, M. e., Bennett, e. J., gygi, S. p. & Harper, J. W. Cell 138, 389–403 (2009). genome-wide protein expression in the tumours diagnostic tools for cancer on the basis of the 6. Behrends, C., Sowa, M. e., gygi, S. p. & Harper, J. W. of people with breast cancer, and then overlaid functional consequences of the interaction Nature 466, 68–76 (2010). the expression data on the network diagram of between a protein, in this case a kinase, and 7. Bandyopadhyay, S. et al. Nature Meth. 7, 801–805 the human interactome. its substrate. In this method, up to 90 peptide (2010). 8. nomura, D. K., Dix, M. M. & Cravatt, B. F. Nature Rev. Wrana had noted that ‘hub’ proteins, defined substrates for kinases are used to simulta- Cancer 10, 630–638 (2010). as those that interact with more than four neously measure the addition of phosphate 9. Weerapana, e. et al. Nature 468, 790–795 (2010). others, can be grouped into two categories groups to proteins in a cell lysate — in essence 10. Barrios-rodiles, M. et al. Science 307, 1621–1625 depending on whether they are expressed at providing a ‘phosphorylation signature’ for (2005). 11. Bennett, e. J. et al. Cell (in the press). the same time as the proteins with which they that particular cell. “The readout is so sensi- 12. yonghao, y. et al. Proc. Natl Acad. Sci. USA 106, interact. When they looked at breast-cancer tive and so quantitative that even small differ- 11606–11611 (2009). samples, Wrana and his colleagues found that ences are teased out,” says Gygi, who helped to 13. Uemura, S. et al. Nature 464, 1012–1017 (2010). S e C real-time analysis ien OSC Bi C i F i

in november, pacific Biosciences of Menlo technology can detect relatively C a park, California, commercially released weak interactions,” says Jonas p its third-generation Dna-sequencing Korlach, a scientific fellow at platform, based on its single-molecule, pacific Biosciences, adding that real-time (SMrt) technology. a single Dna it could pick out interactions polymerase bound to a Dna template is that happen so quickly that they attached to a tiny chamber illuminated can’t be identified by current by lasers, and nucleotides labelled with methods. coloured fluorophores are introduced to as a step towards such applications, it. as the polymerase incorporates them, Joseph puglisi, a structural biologist at each base is held for a few microseconds, Stanford University School of in while the fluorophore emits coloured light California, and his group, with scientists at Future SMRT systems could reveal interactions. corresponding to the base identity. SMrt pacific Biosciences, observed transfer rnas technology could also be used to analyse binding to single ribosomes in real time13. in protein factors to determine how the biomolecules other than Dna, and could an unpublished follow-up, puglisi’s group has machinery synthesizes proteins. become a common tool for detecting protein used SMrt technology to watch interactions “We have just seen the tip of the iceberg in interactions, with some unique features. “this between transfer rnas, ribosomes and terms of applications,” says Korlach. L.b.

854 | NATUre | VOL 468 | 9 December 2010 © 2010 Macmillan Publishers Limited. All rights reserved