Rosetta @ Home and the Foldit Game


Observations of a distributed computing project, for the Interconnect & Neuroscience course by prof.dr.ir. R.H.J.M. Otten

By Frank Razenberg – [email protected] – student id. 0636007

Contents

1 Introduction
2 Distributed Computing
  2.1 Concept
  2.2 Applications
3 The BOINC Platform
  3.1 Design
  3.2 Processing power
  3.3 Interesting projects
    3.3.1 SETI@home
    3.3.2 Chess960@home
    3.3.3 PrimeGrid
    3.3.4 SHA-1 Collision Search Graz
4 Rosetta@home
  4.1 The Rosetta algorithm
    4.1.1 Overview
    4.1.2 Secondary structure prediction
    4.1.3 Decoy generation
    4.1.4 Structure ranking
    4.1.5 Fragment insertion
5 Foldit game
  5.1 Origin
  5.2 Elements of the game
  5.3 Results
6 Bibliography

1 Introduction

In this essay we examine the distributed computing project Rosetta@home.
Rosetta@home is a non-profit project that attempts to determine the three-dimensional shapes of proteins from their amino acid sequences. Success would have broad implications for human health, ranging from the development of a vaccine for HIV to the eradication of malaria. Many human diseases are caused by mutations in proteins that affect their three-dimensional structures and functions, so if we could reliably predict protein structures, we could understand how mutations cause disease and from there perhaps go on to develop therapies. One of Rosetta@home's goals is an example: designing immunogens that elicit antibodies against HIV, which would be a critical part of a vaccine (1).

Until recently, reliably predicting the structure of a protein from its sequence was thought to be nearly impossible, even though the 3D structure is known to be determined solely by the amino acid sequence. Protein structures are currently determined by time-consuming, expensive experiments, which are applicable to only a small subset of proteins. If we could instead predict protein structures reliably and accurately, it would revolutionize much of molecular biology.

Structure prediction is typically treated as an energy minimization problem. Proteins tend to form structures that keep hydrophobic parts buried internally, away from the water they are dissolved in. They also form bridges between neighboring sections through hydrogen bonds and charge interactions. Maximizing these interactions minimizes the energy of the structure.

The Rosetta algorithm was developed by the Baker Laboratory under the guidance of biochemist David Baker. It was then implemented in the distributed computing application Rosetta@home, and the computer game Foldit later arose from the Rosetta@home project.
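
The energy-minimization view of structure prediction can be made concrete with a toy model. The following Python sketch is not the Rosetta algorithm. It assumes the simplified two-dimensional "HP" lattice model, in which each residue is either hydrophobic (H) or polar (P) and every contact between two hydrophobic residues that are not chain neighbours lowers the energy by one. The function names and the plain random-search strategy are illustrative choices made for this example only.

```python
import random

# Unit steps on a 2D square lattice (a toy stand-in for real 3D geometry).
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def random_conformation(n, rng):
    """Place n residues as a random self-avoiding walk; return None if the walk gets stuck."""
    pos = [(0, 0)]
    for _ in range(n - 1):
        x, y = pos[-1]
        free = [(x + dx, y + dy) for dx, dy in MOVES if (x + dx, y + dy) not in pos]
        if not free:
            return None
        pos.append(rng.choice(free))
    return pos


def energy(sequence, pos):
    """Return -1 per pair of 'H' residues on adjacent lattice sites that are not chain neighbours."""
    e = 0
    for i in range(len(pos)):
        for j in range(i + 2, len(pos)):
            if sequence[i] == "H" and sequence[j] == "H":
                if abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1]) == 1:
                    e -= 1
    return e


def minimize(sequence, trials=2000, seed=0):
    """Sample random conformations and keep the one with the lowest energy found."""
    rng = random.Random(seed)
    best_pos, best_e = None, None
    for _ in range(trials):
        pos = random_conformation(len(sequence), rng)
        if pos is None:
            continue  # the walk trapped itself; discard this sample
        e = energy(sequence, pos)
        if best_e is None or e < best_e:
            best_pos, best_e = pos, e
    return best_pos, best_e


if __name__ == "__main__":
    best_pos, best_e = minimize("HPHPPHHPHH")
    print("lowest energy found:", best_e)
    print("conformation:", best_pos)
```

Conformations that bring hydrophobic residues into contact score lower, which mirrors the "buried hydrophobic parts" argument above. Independent random sampling scales poorly with chain length, which hints at why real structure prediction needs both smarter search and large amounts of computing power.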
The University of Washington manages Rosetta@home, which runs on the Berkeley Open Infrastructure for Network Computing (hereafter BOINC).

The first part of this essay describes the concept of distributed computing. We then look at the BOINC platform, the distributed computing platform on which Rosetta@home and several similar projects run. Next we focus on Rosetta@home by looking at the problem it tries to solve and the project's significance so far. Finally, we take an in-depth look at the Foldit game.

2 Distributed Computing

The following sections briefly describe the concept and advantages of using distributed computing to solve large computational problems.

2.1 Concept

Many problems can be solved by computation, a task for which personal computers are well suited. Large problems, such as weather prediction, are often too complex for a consumer-grade personal computer to take on. Instead, supercomputers with possibly over 100 times the processing power of an average consumer PC are used to work on these problems. These supercomputers are generally built by placing multiple processing units in a cluster or grid. If the problem can be divided into subproblems, each subproblem can be solved by a different CPU. The power of such a supercomputer thus stems from its ability to perform many calculations simultaneously.

With the advent of the Internet and broadband connections, it became possible to create a gigantic computing cluster. Millions of users can be persuaded to take part in a project. The project manager assigns a job to each participant, and the participant works on that job. When the calculations are finished, the participant returns the results to the project manager and may then accept a new job.

Typically, a distributed system can tolerate failures of individual nodes. Nodes need to know only part of the total input and may not be aware of other nodes in the system.
If a system is designed with this in mind, it scales well, meaning that having more participants results in (near) linearly more work getting done.

Although no clear definitions exist for parallel and distributed computing, the difference is generally considered to be that in parallel computing different processes share the same memory, while in distributed computing each processor has its own memory. Parallel computing might thus be considered a more tightly coupled form of distributed computing.

2.2 Applications

Algorithms have been designed to tackle various problems through distributed computing. Mathematical applications include searching for unknown prime numbers and testing cryptographic techniques. Other major applications include medical research to cure diseases, studying global warming, discovering pulsars, and many other types of scientific research. A few such projects are discussed in Section 3.3.

3 The BOINC Platform

The Berkeley Open Infrastructure for Network Computing (BOINC) is an open-source platform for distributed applications, developed at the University of California, Berkeley. It serves as a platform for various scientific research projects that require grid computing. The research fields include biology and medicine, earth sciences, physics and astronomy, mathematics, artificial intelligence and many others.

3.1 Design

The BOINC platform emerged from a rewrite of the distributed computing client for the Search for Extra-Terrestrial Intelligence (SETI). SETI's purpose was twofold: to do useful scientific work by supporting an observational analysis to detect intelligent life outside Earth, which is done by analyzing radio signals, and to prove the viability and practicality of the 'volunteer computing' concept. Thus far the first goal has not been met. The SETI client, initiated in 1999, also at Berkeley, was only the second large distributed computing project.
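
The cycle of handing out jobs, collecting results and tolerating vanished nodes described in Section 2.1 can be sketched in a few lines of Python. This is a single-process simulation under simplifying assumptions, not BOINC's actual interface: ProjectServer, run and flaky_volunteer are hypothetical names invented for this example, and counting primes stands in for a real scientific workload.

```python
from collections import deque


def count_primes(lo, hi):
    """Stand-in workload: count the primes in [lo, hi) by trial division."""
    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True
    return sum(1 for n in range(lo, hi) if is_prime(n))


class ProjectServer:
    """Hypothetical coordinator that hands out work units and collects results."""

    def __init__(self, limit, unit_size):
        # Split the range [0, limit) into independent work units.
        self.pending = deque(
            (lo, min(lo + unit_size, limit)) for lo in range(0, limit, unit_size)
        )
        self.results = {}

    def request_job(self):
        return self.pending.popleft() if self.pending else None

    def report(self, job, result):
        self.results[job] = result

    def report_failure(self, job):
        # The volunteer vanished: requeue the unit so another node can take it.
        self.pending.append(job)


def flaky_volunteer():
    """Return a volunteer that goes offline on every odd-numbered job it receives."""
    calls = {"n": 0}

    def work(lo, hi):
        calls["n"] += 1
        if calls["n"] % 2 == 1:
            raise ConnectionError("volunteer went offline")
        return count_primes(lo, hi)

    return work


def run(server, volunteers):
    """Round-robin simulation: each volunteer is a function (lo, hi) -> result that may fail."""
    while server.pending:
        for work in volunteers:
            job = server.request_job()
            if job is None:
                break
            try:
                server.report(job, work(*job))
            except ConnectionError:
                server.report_failure(job)
    return sum(server.results.values())


server = ProjectServer(limit=100, unit_size=10)
total = run(server, [count_primes, flaky_volunteer()])
print(total)  # 25, the number of primes below 100
```

No volunteer needs to know about the others or about the full input, and a failed work unit simply returns to the queue. A real volunteer computing platform additionally has to deal with network transport, deadlines and validation of returned results, which this sketch leaves out.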