22S:295 GEEMaP seminar

Volunteer computing

Session 1 Aug. 29, 2011

Kate Cowles 374 SH, 335-0727 [email protected]

1 What is ? • opportunity for any computer owner to contribute to research in energy, cli- mate change, medicine, astronomy, math- ematics and more • distributed system: multiple autonomous computers at different locations com- municating via network • distributed computing: use of a dis- tributed system to solve a computa- tional problem – must be possible to divide the com- putational problem into many small tasks – tasks must be able to run indepen- dently on separate computers • volunteer computing: form of distributed computing in which individual computer owners donate unused computing power of their own computers to research projects

2 How does it work? • requires a computer (Windows, Mac, or ) with an connection available at least part of the time • individual user downloads and installs middleware software program • within the software, user chooses in which project(s) to participate • software downloads work units from the project’s server, runs them in the back- ground while the computer is turned on, and uploads completed results to the project’s server • work units from (almost all) projects save intermediate results as they go along, so computer can be turned off and back on without losing everything

3 Berkeley Open Infrastructure for Network Computing (BOINC)

http://boinc.berkeley.edu/ • history: earliest volunteer computing projects wrote their own programs that combined the scientific computations and the communications management – Great Internet Mersenne Prime Search, 1996 – SETI@home and Folding@home, 1999 • now: BOINC is middleware system that many projects use to drive their appli- cations • developed by an NSF-funded research project located at the UC Berkeley Space Sciences Laboratory • open-source

4 • client software for Windows, Mac, Linux

5 World Community Grid worldcommunitygrid.org • an umbrella organization that hosts vol- unteer computing projects with clear potential benefit to humanity • scientists propose project ideas to WCG • WCG technicians develop the applica- tions to run under BOINC • WCG servers send out work units, re- ceive results, and pass them on to sci- entists • run by IBM • WCG volunteers have contributed over 500,000 years of run time since 2004 • lots of information under “Help” on WCG web page

6 Current World Community Grid sciences • Computing for Clean Water (C4CW) – to guide future development of effi- cient, low-cost water filters – to use simulaticon-based large-scale molecular dynamics calculations to improve understanding of water flow through carbon nanotube filters – runs ok on any computer ∗ low demands on system memory ∗ small files to download and upload ∗ WUs of a constant size (1+ hrs on very fast machines up to 16 hrs on an Atom with hyperthreading)

7 • Clean Energy Project Phase 2 (CEP2) – to find new materials for the next generation of solar cells and later, energy storage devices – to calculate the electronic properties of hundreds of thousands of organic materials and dermine which candi- dates are most promising for devel- oping affordable solar energy tech- nology – opt-in project because of demands on computer ∗ 1 Gb of memory per running WU ∗ large files to download and upload ∗ infrequent checkpointing ∗ lots of hard disk writing

8 • Help Conquer – fine for any computer – project will be completed in approx- imately 3 months • Help Cure Muscular Dystrophy – fine for any computer – shortage of WUs right now because scientists’ server is full! • Discovering Dengue Drugs Together – Phase 2 – intermittent project – some extremely long-running WUs • sciences with fairly long-running WUs but no other restrictive demands on com- puters – Help Fight – Human Proteome Folding – Phase 2 – Fight Aids at Home

9 Upcoming WCG projects http://www.greenbiz.com/news/2010/09/07/ibm-steps-water-management-projects-around-globe • Watershed Sus- tainability project: model the effects of agricultural, commercial and industrial actions on Chesapeake Bay • Inforium Bioinformatics in collabora- tion with FIOCRUZ-Minas (): to develop a cure for (river blindness) – “The World Community Grid tech- nology will allow researchers to screen 13 million possible drug compounds against 180 structures that are associated with the schistosomi- asis parasite.”

10 Controlling how WCG projects run on your PC • under “My Grid,” “Device Manager” on WCG web page • defaults work well for most people for unobtrusive use • you probably will have to change your Windows settings for power manage- ment if you want BOINC WUs to keep running when computer is on and you’re not using it • separate opt-in for beta testing

11 Other BOINC projects in environmental science • Climate Prediction http://climateprediction.net/ – very long-running work units (100 to 500 hours on 2.5 GHz computer • Quake Catcher Network – Seismic Mon- itoring http://qcn.stanford.edu/sensor/ – requires either a laptop with an ac- celerometer or purchase of an exter- nal motion sensor • Virtual Prairie http://vcsc.cs.uh.edu/virtual-prairie/

12 Finding other BOINC projects • listing at Berkeley BOINC wiki http://boinc.berkeley.edu/wiki/ Project_list • some enable computing on graphics cards • all projects that run under BOINC have similar user control features – just dif- ferent look on website

13 The human side • points • teams • badges • forums

14 But does it actually accomplish anything? • Clean Energy project (WCG) http://www.nature.com/news/2011/ 110816/full/news.2011.481.html • Computing for Clean Water (WCG) http://www.worldcommunitygrid. org/about_us/viewNewsArticle.do? articleId=161 • Help Fight Childhood Cancer (WCG) http://www.m.chiba-u.ac.jp/class/ bioinfor/wcg/e/hfcc_e/news.html • Quake Catcher Network http://www.washingtonpost.com/national/health-science/earthquake-brought-seismometer-of-citizen-scientist-to-life-in-charlottesville/ 2011/08/25/gIQAgcqrnJ_story.html?hpid=z4

15 Validation: How do researchers know returned results are correct? • redundant computing – 2 copies of each WU sent to different computers • single validation (type 1) – after a com- puter has returned valid WUs under re- dundant computing, no validation ex- cept for occasional spot checks • single validation (type 2) – for projects that do simulation, re- turned results from different comput- ers for the same WU won’t be iden- tical – many copies (e.g. 19 for HPF2) of each WU sent out – results must agree to within pre-specified tolerance

16 How are work units for new projects tested? • alpha testing by development team • beta testing by users – users opt in – beta tester is expected to truly test and give feedback to project ∗ suspend/resume ∗ turn computer off and back on ∗ check whether they are respond- ing to BOINC control ∗ test running concurrently with other software ∗ check CPU temperature

17 What I recommend if you want to participate • on Windows, download BOINC from World Community Grid website • on Ubuntu Linux, install BOINC from the package repository • use WCG default settings unless you want to get deeply involved in technical side • check temperature (especially important on laptops) – install CoreTemp or TThrottle on Windows – install lmsensors on Linux – reduce % of processor usage or num- ber of cores used if necessary • participate in beta testing only if you have time to keep a close eye on run- ning beta test units • stick to well-established projects

18 • participate in GPU BOINC projects only if you have considerable technical ex- pertise

19 Relationship to GEEMaP • GEEMaP WCG team?? • one aim of my research group is to de- velop a BOINC application for statisti- cal analysis of environmental data – we will need beta testers • longer term goals – develop Virtual Campus Supercom- puting Center at University of Iowa? http://boinc.berkeley.edu/trac/ wiki/VirtualCampusSupercomputerCenter – propose a GEEMaP application to World Community Grid?

20