ROSTLAB

Protein-protein interaction

Professor Burkhard Rost and Tobias Hamp discuss their research into protein-protein interaction, investigating whether or not it is essential for bodily function

What fascinates you about the relationship between protein/DNA sequence, structure and function?

TH: Computers have always been my favourite hobby. In particular, I wanted them to help people, not just entertain them. When I learned that the blueprint of life is a sequence of letters and offers itself to be modelled by software, computational biology was an easy choice. However, it is both fascinating and shocking to see how much there is still left to uncover, even in the human body. Only a tiny fraction of cellular processes are understood at the level of proteins/DNA. I believe that sequences hold the key to closing this gap in both the fastest and the most significant way.

BR: At the European Molecular Biology Laboratory, I became hooked by the beauty of protein structures, and by the surprising discovery of how much simple molecular machines resemble the ingenuity that human engineering has been investing in designing simple machines for centuries. This initial fascination, driven by the colourful world of structural biology, expanded into a wider fascination about the principles that underlie molecular biology and life, and about biology’s only ‘theory’ and its most comprehensive commonality: evolution.

You use protein and DNA sequences, alongside evolutionary information, to predict aspects of protein function relevant to biomedical research. Could you provide some successful examples of this?

The ultimate value of the evolutionary comparison is what we learn from such comparisons about structure and function. The first time that evolutionary information was married to machine learning was the big leap in the usefulness of protein secondary structure predictions from 1993 onwards. Another example of the importance of evolutionary information is the prediction of the residues involved in protein-protein interactions (PPIs). Using sequence alone, we can identify some signal. However, we need to include evolutionary information to reliably predict interaction hotspots, ie. those sites that are most important for an interaction. These are only two extremes from a spectrum of methods.

You found that over a third of the differences in interactions between pairs of homologous proteins are also observed between identical proteins. What does this suggest about the importance of the molecular details of PPIs for function?

PPIs allow for alternative solutions, and this has nothing to do with sequence changes. Why do we need alternative interfaces? We might speculate with many possible explanations, but the fact is that we did not anticipate this finding. Therefore, none of our attempts at rationalising a result after its discovery appear convincing.

The reason why we explicitly focused on the particular statement that the variation is independent of sequence is ultimately of a technical nature; it is rooted in the unclear definition of a protein. Two slightly different protein sequences can still map to the same gene and hence be referred to as the ‘same’ protein. We wanted to be absolutely clear that the differences in the interfaces are not due to mutations in the sequence. This raises the question of why they are different. In many cases, their individuality was crucial for function, eg. to change enzymatic conversion rates or enable larger complexes to build.

This is in stark contrast to the notion that we only need to know if two proteins interact at all in order to interpolate how they function. The molecular details appear to be more important than we might have anticipated.

To what extent do you agree with the notion that programming is becoming a fundamental skill for biomedical scientists in the 21st Century?

BR: Drawing on my time at Columbia University in the city of New York and the Technical University of Munich, I have not witnessed any more than 20 per cent of postgraduate students having sufficient programming experience to focus directly on the scientific questions that must be addressed in order to complete their subsequent theses. Clearly, this number has increased substantially over the course of my teaching, but 20 per cent is still fairly small, and I do not believe this proportion will rise much more. It is important that an increasing fraction of PhD-level students in all fields of biology have a substantial understanding of the methods that govern the advance of computational biology.

What are your hopes for the future of your research, and more broadly for the fields of bioinformatics and computational biology?

TH: I am currently working on predicting PPIs from sequence, and hope to publish new methods soon. They will also integrate the results about the alternative interfaces. As for the field in general, I know it will continue to grow and rapidly gain importance. My hope as a developer is that it will do this as sustainably and with as much quality as possible, so that the tools we create can celebrate a fifth or even a 10th anniversary.

WWW.RESEARCHMEDIA.EU

Frequent exceptions

A study being conducted by Rostlab at TU Munich Informatics has been merging computers with biology to investigate proteins at the molecular level. Using algorithms developed over the past 20 years, the research team is answering some of biology’s most profound questions

UNDERSTANDING INTERACTIONS BETWEEN molecules in living systems is an essential biological study. First, it helps us to understand a protein’s function and behaviour. Second, it helps to predict how a protein with an unknown function behaves. And finally, it helps to characterise protein complexes and pathways.

Rostlab, a laboratory based at TU Munich Informatics, is advancing this biological understanding through computational biology and bioinformatics. It does this by developing novel algorithms and methods that predict and characterise protein function at the molecular level.

The laboratory has pioneered this combination of ‘evolutionary information’ and ‘machine learning’ for over 20 years, and subscribes to a straightforward premise: the information lies in the sequence. The resources that are developed through Rostlab’s research are typically used to refine and quicken the design of experiments, providing validation of results.

USING ALGORITHMS TO SOLVE BIOLOGICAL PROBLEMS

Rostlab uses different algorithms to answer some of biology’s most profound questions. Burkhard Rost and Tobias Hamp are spearheading the project ‘Alternative Protein-Protein Interfaces are Frequent Exceptions’. Rost states: “The development of methods that predict which amino acid changes affect protein function and structure and which do not has evolved as the major focus of our lab. An essential way in which biological machines such as proteins differ from human machines is in their extreme robustness against changes and errors. However, this does not imply you can change any residue you want. Rather, the opposite is true: knowing which amino acids can be exchanged is extremely informative about the details of structure and function”.

Rost and Hamp know this to be true, because proteins can be observed as three-dimensional objects, meaning the changes that take place between natural proteins can provide essential information about structure and function. This idea embodies Rost’s fundamental proposition that ‘evolution teaches structure prediction’, which has formed the basis of his work for many years.

The fundamental question explored by Rostlab has significance to us all. It is not just a matter of discovering how sequence variations determine who we are, but also the impact mutations have on us as a species. We cannot understand life without better understanding interactions.

Some protein residue mutations are more important than others, and these are invariably conserved through evolution. With the increase in deep sequencing and genotyping, it is essential that a distinction is made between effect and neutral variants. One hypothesis supposes that all mutations of conserved residues have an effect, though this approach is not particularly adequate.

Computational tools, however, are optimised to predict the effect of mutations, and offer more detail. Indeed, the range goes from single variants to whole mutability landscapes. This works by supplanting every protein residue with 19 foreign amino acids. Extensive mutagenesis experiments such as these not only help us to further understand protein function and genotype associations, but also life as we know it.

FREQUENT EXCEPTIONS

Rostlab’s most recent project revealed that alternative protein-protein interactions (PPIs) are frequent exceptions. At the start of the study, the team planned only to collate data on known structures of PPIs, typically a close-up of protein atoms at the moment they come into contact. Such data is stored in the Protein Data Bank (PDB), which provides information on all proteins of known structure. However, the quality of the data posed a problem: interactions in the PDB may not fully represent those of natural living cells. “Filtering out all that noise was the biggest challenge. Afterwards, we could validate common (mis)conceptions about PPIs,” Hamp explains. “One of them was the view that the same two interacting proteins always touch each other at the same place on their surface. Over the last few years, various counter-examples had accumulated, but nobody had brought them together in a bigger picture.”

Studying the intricate molecular details of PPIs is important for a number of reasons. PPIs are a gateway to a vast range of fascinating biological discoveries, providing vital insight into evolution and a better grasp on the functions of the human body. However, no study of function is complete without observing interaction: proteins function in the context of other proteins. Thus, no biological process can be properly understood without considering interaction. It is this contextual understanding that informs the development of medicine and bioengineering.

At a more fundamental level, all biological processes involve the collaboration of cells, and then proteins. So, if interaction lies at the core of function, then no process of life occurs without protein interaction. Some of these interactions are understood and can be manipulated, but in the majority of cases, they are not. Therefore, more information is needed to discover methods of control, enabling scientists to, for example, remedy defective function.

DATASET CHALLENGES

The study has not been without its challenges. Indeed, computational biology and bioinformatics are different from any other area of scientific study, much of which focuses on proteins and processes. Due to PPIs’ uniqueness, there are many obstacles to overcome when dealing with large-scale datasets. Rost elucidates: “Sometimes the numbers expand to as much as tens of biomolecules that are studied in detail. Each typically follows its unique principles. In contrast, most studies in physics focus on phenomena that generically describe millions and millions-of-millions of objects that are treated alike and evolve features”. The world of traditional, hypothesis-driven experimental biology, therefore, is one in which we can look at the details of every component involved. That of physics is one in which we just need to monitor generic descriptors. Rostlab’s specialism lies between the two: the numbers are too large to be analysed in detail and the proteins are too individual to be homogenised.

Furthermore, the dataset for alternative interfaces is on too grand a scale in terms of the number of protein structures and pairwise interface comparisons. However, the team had enough computational resources and was fortunate in that it could choose the correct programming languages and hardware. Another essential factor is good communication with system administrators. This is because obstacles (such as network file systems) are prone to arise when multiple central processing units are simultaneously undertaking the same task.

FINDINGS AND THE FUTURE

The results from Rostlab’s study have raised some intriguing questions as to whether or not the molecular factors of PPIs are essential for function. If molecular and protein details must be identical to ensure function, different experiments would find the same interfaces. The team has tested interface similarity in many different ways, but all the results showed unexpected variety. In fact, 11-37 per cent of observations had significant differences, and up to 10 per cent were completely different. These results may conflict with the hypothesis that maintaining molecular detail is required for function, but they also suggest there are alternative solutions for maintaining molecular details.

Computers and biology are two rapidly evolving fields of knowledge. However, in recent years, biological data has been outrunning the advancement of computers: “Every year the challenges become a little harder to manage with a little less money,” explains Rost. “One particular challenge for computational biology comes from the difference in cultures. To put this in the most dire economic figures: on the one hand, biologists are cheap, material scientists are more expensive, computer scientists are most expensive. On the other, the greater say a team member has, the more the pay. Both those simplifications define the reality of computational biology.”

The past two decades have seen computational biology become a major player in science. But, in order to further research, more progression is needed. To delve deeper into the study of PPIs, laboratories such as Rostlab need grants, which would enable them to operate at the departmental level of a university. Until then, Rostlab will continue to explore how evolution teaches structure prediction.

Rostlab at the ISMB/ECCB in Vienna (2011).

Three examples of the same two proteins interacting via different interfaces. Different colours indicate different proteins. In A), the small image shows the two interface areas of the green protein. In C), the small image is a side view of the complex with all orange chains and all but one blue chain removed.

INTELLIGENCE

ROSTLAB

OBJECTIVES

Rostlab develops computational methods that aid in the annotation of genomes by predicting aspects of protein function and structure, with a specific focus on the combination of evolutionary information with machine learning.

KEY COLLABORATORS

• Nir Ben-Tal, Tel Aviv University, Israel
• Assistant Professor Yana Bromberg, Rutgers University, USA
• Professor Søren Brunak, TU Lyngby, Denmark
• Dr Wayne Hendrickson, Columbia University, USA
• Professor , Hebrew University, Israel
• Dr Michael Nilges, Institut Pasteur, France
• Dr Yanay Ofran, Bar-Ilan University, Israel
• Professor Christine Orengo, UCL, UK
• Dr Marco Punta, EMBL-EBI, UK
• Dr , Sloan-Kettering Institute, USA
• Assistant Professor Avner Schlessinger, Mount Sinai Hospital, USA
• Dr Reinhard Schneider, University of Luxembourg, Luxembourg
• Professor Torsten Schwede, Biozentrum, Switzerland

PARTNERS

• New York Consortium on Membrane Protein Structure (NYCOMPS)
• New York Structural Biology Center (NYSBC)

FUNDING

• Alexander von Humboldt Foundation through the German Ministry for Research and Education (BMBF)
• National Institutes of Health

CONTACT

Professor Tobias Hamp
Lab Member of ROSTLAB
TU Munich Informatics / i12 Bioinformatics
Boltzmannstr. 3
85748 Garching b. München, Germany
T +49 89 289 17837
E [email protected]
www.rostlab.org

PROFESSOR TOBIAS HAMP studied computational biology at the Technical University of Munich and the Ludwig-Maximilians-University in Munich until April 2009. After a short-term employment as Project Manager at the biotech company Eurofins Medigenomix, he joined Rostlab in November 2009 to pursue a PhD. He plans to finish his studies in early 2014.

PROFESSOR BURKHARD ROST currently heads the Unit for Computational Biology & Bioinformatics at the Department of Informatics of the Technical University of Munich. He has given 175 invited talks in 27 countries and to date has authored over 220 scientific publications.

INTERNATIONAL INNOVATION
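
The article describes a mutability landscape as the result of replacing every residue of a protein with each of the 19 other amino acids. As a rough illustration only, that enumeration can be sketched in a few lines of Python. The function names and the scoring rule here are assumptions made for this sketch: `toy_effect_score` is a placeholder that merely flags swaps between hydrophobic and non-hydrophobic residues, and it is not Rostlab’s predictor or any published method.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Crude hydrophobic class, used only by the placeholder scorer below.
HYDROPHOBIC = set("AVILMFWC")


def single_substitutions(sequence):
    """Yield (position, wild_type, mutant, mutated_sequence) for every
    single-residue substitution; positions are 1-based."""
    for index, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                mutated = sequence[:index] + mutant + sequence[index + 1:]
                yield index + 1, wild_type, mutant, mutated


def toy_effect_score(wild_type, mutant):
    """Placeholder scorer (an assumption, not a real predictor): 1.0 if the
    swap crosses the hydrophobic/non-hydrophobic divide, otherwise 0.0."""
    return float((wild_type in HYDROPHOBIC) != (mutant in HYDROPHOBIC))


def mutability_landscape(sequence, score=toy_effect_score):
    """Return {position: {mutant: score}} covering all 19 substitutions
    at every position of the sequence."""
    landscape = {}
    for position, wild_type, mutant, _ in single_substitutions(sequence):
        landscape.setdefault(position, {})[mutant] = score(wild_type, mutant)
    return landscape


if __name__ == "__main__":
    landscape = mutability_landscape("MKV")
    variants = sum(len(row) for row in landscape.values())
    print(len(landscape), "positions,", variants, "variants")
```

A protein of length n yields 19 × n variants, so the landscape grows linearly with sequence length. In practice the placeholder scorer would be swapped for a trained effect predictor.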