Analysis and Predictions of DNA Sequence Transformations on Grids

A Thesis Submitted for the Degree of Master of Science (Engineering) in the Faculty of Engineering

By Yadnyesh R. Joshi

Supercomputer Education and Research Centre
INDIAN INSTITUTE OF SCIENCE
BANGALORE – 560 012, INDIA
August 2007

Acknowledgments

First of all I would like to extend my sincere thanks to my research supervisor Dr. Sathish Vadhiyar for his constant guidance and support during the entire period of my post-graduation at IISc. He was always approachable, supportive and ready to help with any sort of problem. I am very thankful to him for being extremely patient and understanding about the silly mistakes that I made. Under his guidance I learned to approach problems in an organized manner and set realistic goals for my research. I thank him for his extreme patience and excellent technical guidance in writing and presenting research. Finally, he was and continues to be my role model for his hard work and passion for research.

I am also thankful to Dr. Nagasuma Chandra and Dr. Debnath Pal from S.E.R.C. and Dr. Narendra Dixit from the Chemical Engineering department for their very useful and interesting insights into the biological domain of our research. I am also thankful to all the faculty of S.E.R.C. for always inspiring us with their motivational talks.

I would like to mention the names of my colleagues Sandip, Sanjay, Rakhi, Sundari, Antoine and Roshan for their technical and emotional support. Special thanks to the vatyaa kya group members for the adventures and the routines inside and outside the institute. I would also like to thank the Marathi Mandal for making the institute a homely place.

Back home, I would like to thank my parents for being my pillars of strength. I would also like to thank Yamini tai and Dhanashree, my sisters, for supporting and guiding me in making important decisions. I would like to thank my friends Vijay, Vishwanath, Pushkaraj, Prashant, Akshay, Sunder, Anusha and Neha for always being there for me. Last but not least, I am very thankful to my grandfather Laxman Rao for being the strongest motivator all the time.

Abstract

Phylogenetics is the study of the evolution of organisms. Evolution occurs due to mutations of DNA sequences. The reasons behind these seemingly random mutations are largely unknown. There are many algorithms that build phylogenetic trees from DNA sequences. However, there are certain uncertainties associated with these phylogenetic trees. Fine-level analysis of these phylogenetic trees is both important and interesting for evolutionary biologists. In this thesis, we try to model the evolution of DNA sequences using Cellular Automata and resolve the uncertainties associated with the phylogenetic trees. In particular, we determine the effect of neighboring DNA base-pairs on the mutation of a base-pair.

A Cellular Automaton can be viewed as an array of cells which modifies itself in discrete time-steps according to a governing rule. The state of a cell at the next time-step depends on its current state and the states of its neighbors. We have used cellular automata rules for analysis and predictions of DNA sequence transformations on computational grids.

In the first part of the thesis, DNA sequence evolution is modeled as a cellular automaton with each cell having one of four possible states, corresponding to the four bases. Phylogenetic trees are explored in order to find the cellular automata rules that may have guided the evolutions. A master-client paradigm is used to exploit the parallelism in the sequence transformation analysis. Load balancing and fault-tolerance techniques are developed to enable the execution of the explorations on grid resources.
The analysis of the sequence transformations is used to resolve uncertainties associated with the phylogenetic trees, namely the intermediate sequences in the phylogenetic tree and the exact number of time-steps required for the evolution of a branch. The model is further used to find various statistics, such as the most popular rules at a particular time-step in the evolution history of a branch in a phylogenetic tree. We have observed some interesting statistics regarding the unknown base pairs in the intermediate sequences of the phylogenetic tree and the most popular rules used for sequence transformations.

The next part of the thesis deals with predictions of future sequences using the previous sequences. First, we try to find the preserved sequences so that cellular automata rules can be applied selectively. Then, random strategies are developed as base benchmarks. A roulette wheel strategy is used for predicting future DNA sequences. Though the prediction strategies are able to better the random benchmarks in most of the cases, the average performance improvement over the random strategies is not significant. The possible reasons are discussed.

Contents

List of Figures
List of Tables
1 Introduction
  1.1 Cellular Automata
  1.2 DNA and Cellular Automata
  1.3 Phylogenetics
  1.4 Grid Computing
  1.5 Motivation and Problem Formulation
2 Related Work
  2.1 DNA Sequence Evolution
  2.2 Grid Computing Applications
    2.2.1 Applications in Mathematics and Earth Sciences
    2.2.2 Applications in Astronomy, Physics and Chemistry
    2.2.3 Applications in Biology and Bioinformatics
3 Sequence Transformation on Grids
  3.1 Sequence Transformation on a Branch
    3.1.1 Naive Approach
    3.1.2 Selective Application of Cellular Automata Rules
    3.1.3 Dynamic Formation of Cellular Automata Rules
  3.2 Sequence Transformer
  3.3 Pseudo Molecular Clock Assumption
  3.4 Design
    3.4.1 Master-Worker Paradigm
    3.4.2 Phases of Execution
  3.5 Grid Computing Techniques
    3.5.1 Load Balancing
    3.5.2 Fault Tolerance
  3.6 Database Design
  3.7 Statistics Collection
    3.7.1 Timesteps
    3.7.2 Unknown base-pairs
    3.7.3 Rules
    3.7.4 Differential rule analysis
    3.7.5 Popularity of transitions
4 Experiments and Results
  4.1 Grid Infrastructure
  4.2 Timesteps
  4.3 Popular Rules
  4.4 Base Pairs Corresponding to Unknown Positions
  4.5 Potential of Grid Computing
5 Predictions in phylogenetic trees
  5.1 Determining the Preserved Segments
    5.1.1 Calculation of PSSM
    5.1.2 Strategies for Determining Preserved Sequences
    5.1.3 Evaluation of Strategies
    5.1.4 Determination of Threshold Values for Flexible Strategies
  5.2 Analysis of Random Strategies
  5.3 Methods Used for Prediction
    5.3.1 Roulette Wheel Method
    5.3.2 Roulette Wheel Method with Random Component
    5.3.3 History Sizes
    5.3.4 Experiments and Results
  5.4 Analysis
6 Conclusions and Future work
  6.1 Conclusions
  6.2 Future Work
References

List of Figures

1.1 Evolution of Cellular Automata through time steps
1.2 Rule that governs the evolution of cellular automata shown in Figure 1.1
1.3 Double helix structure of DNA (Courtesy: U.S. National Library of Medicine)
1.4 Example Phylogenetic Tree with Gag Sequences
3.1 Application of Random Cellular Automata Rules
3.2 Selective Application of Cellular Automata Rules
3.3 Example: Dynamic Formation of Cellular Automata Rules
3.4 Dynamic Formation and Selective Application of Cellular Automata Rules
3.5 Illustration of the Greedy Algorithm
3.6 The Master-Worker Design
3.7 Phase I in Master
3.8 Phase II in Master
5.1 Analysis of threshold values: Flexible-1
5.2 Analysis of threshold values: Flexible-2
5.3 Analysis of random strategies

List of Algorithms

1 Algorithm for Sequence Transformer
2 Greedy Algorithm for Chain Formation
3 Calculation of Position Specific Scoring Matrix

List of Tables

1.1 Left-Hand Sides of 64 Transitions of Cellular Automata with Neighborhood Size of 1
3.1 strands
3.2 working strand
3.3 branches
3.4 ruletable
3.5 chains
4.1 The Distributed Infrastructure
4.2 Summary of time step information for Gag sequences
4.3 Summary of time step information for GagPol sequences
4.4 Summary of time step information for env sequences
4.5 Differential Rule Analysis for Gag Sequences
4.6 Differential Rule Analysis for GagPol Sequences
4.7 Differential Rule Analysis for env Sequences
4.8 Popular Rules for a Branch for Gag Sequences
4.9 Popular Rules for a Branch for GagPol Sequences
4.10 Popular Rules for a Branch for env Sequences
4.11 Resolution of Unknown Positions for Gag Sequences
4.12 Resolution of Unknown Positions for GagPol Sequences
4.13 Resolution of Unknown Positions for env Sequences
4.14 Usefulness of Large Number of Runs

Chapter 1
Introduction

In this chapter, we give a brief background on cellular automata, the relationship between cellular automata and DNA evolution, and the concept of phylogenetic trees.

1.1 Cellular Automata

A cellular automaton is a regular array of identical finite state automata where the next states of the array elements are determined solely by their current states and the states of their neighbors. A one-dimensional cellular automaton consists of a line of cells, each having a particular state. The state of each of these cells changes over discrete time-steps.
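
The one-dimensional cellular automaton described above can be illustrated with a minimal Python sketch. It assumes four cell states (the bases A, C, G, T) and a neighborhood size of 1, so a rule table has 4 x 4 x 4 = 64 left-hand sides, matching the count given for Table 1.1. The function names, the example transition ACG -> T, and the choice to leave the two end cells unchanged are illustrative assumptions, not details taken from the thesis.

```python
from itertools import product

BASES = "ACGT"  # the four cell states, one per DNA base


def all_left_hand_sides():
    """Enumerate every (left, centre, right) neighbourhood: 4**3 = 64."""
    return ["".join(triple) for triple in product(BASES, repeat=3)]


def identity_rule():
    """Rule table that leaves every cell unchanged (illustrative baseline)."""
    return {lhs: lhs[1] for lhs in all_left_hand_sides()}


def step(sequence, rule):
    """Apply one synchronous time-step to a DNA string.

    Assumption: the two end cells lack a full neighbourhood, so they are
    copied unchanged. Boundary handling is a choice made for this sketch.
    """
    if len(sequence) < 3:
        return sequence
    middle = [rule[sequence[i - 1:i + 2]] for i in range(1, len(sequence) - 1)]
    return sequence[0] + "".join(middle) + sequence[-1]


# Hypothetical rule: a C flanked by A and G mutates to T; all else unchanged.
rule = identity_rule()
rule["ACG"] = "T"

print(len(all_left_hand_sides()))  # 64
print(step("TACGA", rule))         # TATGA
```

Every cell reads its neighbors from the old sequence before any cell is updated, which is what makes the update a single discrete time-step rather than a left-to-right sweep.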