Volunteer Computing
David P. Anderson
Space Sciences Lab, U.C. Berkeley
May 2, 2007

Outline
- Goals of volunteer computing
- How BOINC works
- Some volunteer computing projects
- Some research directions
- Non-technical problems

Goal: use all the computers in the world, all the time, to do worthwhile things
- What do we mean by "computers"?
- Who owns the computers?
  - Individuals (60% and rising)
  - Organizations
- What does "worthwhile" mean?

BOINC (Berkeley Open Infrastructure for Network Computing)
- Middleware for volunteer computing
- Open source (LGPL)
- Application-driven
- [Diagram: a PC's accounts attach to projects, each with a resource share, e.g. 60% / 40%]

The volunteer computing game
- [Diagram: volunteers and projects connected through the Internet]
- Do more science
- Involve the public in science

Volunteer computing != Grid computing

                              Volunteer computing            Grid computing
  Resource owners             anonymous, unaccountable;      identified, accountable
                              results must be checked
  Managed systems?            no - need plug & play          yes - software stack
                              software                       requirements OK
  Clients behind firewall?    yes - pull model               no - push model
  ISP bill?                   yes                            no

...nor is it "peer-to-peer computing"

Types of utility
- Compute cycles (usually FLOPS)
  - with latency bound
  - with RAM
  - with storage
  - guaranteed availability
- Storage
  - network bandwidth
  - availability
- Network bandwidth at a specific time
- Wide/diverse deployment
- Human steering of computation

What is resource share?
- Given: a set of hosts and attachments
- A "schedule": which projects use which resources, when
- Ideally:
  - Weighted Pareto optimality of utility?
  - Across multiple hosts
- In BOINC:
  - determines usage of bottleneck resources on a single host, e.g.:

      Project   Resource share   Demand   Usage
      A         40%              15%      15%
      B         40%              large    50%
      C         20%              large    25%

Credit: approximation of utility
- Currently:
  - projects can grant credit however they want
  - default: FLOPS
    - benchmarks * CPU time
    - application provides a FLOP count
    - application does an FP benchmark
  - cheat-proofing
- To do:
  - cheat-proof measurement of other resources
  - projects publish "credit per day" functions
    - normalization rule
    - accounting rule

How BOINC works: server
- [Diagram: server database with tables for applications, platforms, app versions, jobs, job instances, accounts, and hosts]

Job replication
- Problem: can't trust volunteers
  - computational results
  - claimed credit
- Application-specific checks, no replication
- Replicated computing
  - do N copies, require that M of them agree
  - not bulletproof (collusion)
- [Timeline: a job is created and instances are created and sent over time; one instance returns an error and a replacement instance is issued; when enough instances succeed and agree, the job is validated and assimilated]

How to compare results?
- Problem: numerical discrepancies
- Stable problems: fuzzy comparison
- Unstable problems:
  - eliminate discrepancies (compiler/flags/libraries)
  - homogeneous replication: send instances only to numerically equivalent hosts (equivalence may depend on the app)

Work flow
- work generator: creates a stream or batches of jobs
- validator: compares replicas, selects the "correct" result
- assimilator: handles the correct result

BOINC: volunteer's view
- 1-click install, zero configuration
- All platforms
- Invisible, autonomic

BOINC client structure
- [Diagram: the core client talks to schedulers and data servers; applications and the screensaver link the BOINC library and runtime system and talk to the core client over local TCP; a GUI gives the user preferences and control]

Communication: "pull" model
- Client to scheduler: "I can run Win32 and Win64; 512 MB RAM; 20 GB free disk; 2.5 GFLOPS CPU" plus a description of current work
- Scheduler to client: "Here are three jobs. Job 1 has application files A, B, C, input files C, D, E, and output file F..."

Example: ClimatePrediction.net
- Application: UK Met Office Unified Model
- State-of-the-art global climate model
  - 1 million lines of FORTRAN
- High-dimensional search space
  - model parameters
  - boundary conditions
  - perturbed initial conditions

ClimatePrediction.net
- Using supercomputers:
  - 1 day per run
  - 10-20 total runs
- Using BOINC:
  - 6 months per run
  - 50,000 active hosts
  - 171,343 runs completed
  - Nature papers
  - 60-fold savings

Some other BOINC-based projects
- Einstein@home: LIGO; gravitational wave astronomy
- Rosetta@home: U. Washington; protein study
- SETI@home: U.C. Berkeley; SETI
- LHC@home: CERN; accelerator simulation
- Africa@home: STI, U. of Geneva; malaria epidemiology
- IBM World Community Grid: several biomedical applications
- ...and about 30 others

Computing power
- Folding@home: 650 TFLOPS
  - 200 from PCs; 50 from GPUs; 400 from PS3s
- BOINC-based projects: [Charts: in 2000, SETI@home vs. the Earth Simulator (TFLOPS scale to 70); in 2006, BOINC vs. Blue Gene/L (TFLOPS scale to 600)]

A sampling of research problems
- Data-intensive computing
- Low-latency computing
- Background utility compatibility
- Credit mechanism
- Efficient validation
- Game consoles and graphics chips
- Simulation

Data-intensive computing: client limits
- Q = network transfer per GFLOPS-hr of computing
- SETI@home: Q = 0.1 MB
- but a wider range is OK

Server-side limits
- [Diagram: project servers pay for bandwidth across the commodity Internet to reach clients]

Using free networks
- [Diagram: servers placed on Internet2 reach clients without commodity-Internet charges]

Using more free networks
- [Diagram: data servers on the clients' own LAN offload transfers further]

Low-latency computing
- [Diagram: job submission with 2-minute and 4-minute deadlines]
- VC usually minimizes connection frequency
- What if you want to do 10,000 one-minute jobs in 6 minutes of wall time?
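The low-latency question above invites a back-of-the-envelope calculation. This is my sketch, not from the talk: the function name and the fixed per-job overhead parameter are illustrative assumptions.

```python
import math

# Rough sizing for the slide's question: 10,000 one-minute jobs
# within 6 minutes of wall time.
def hosts_needed(n_jobs, job_minutes, deadline_minutes, overhead_minutes=0.0):
    """Minimum number of concurrent hosts, assuming each host runs jobs
    back to back and pays a fixed per-job overhead (scheduler RPC plus
    file transfer) -- an assumed, simplified cost model."""
    jobs_per_host = math.floor(deadline_minutes / (job_minutes + overhead_minutes))
    return math.ceil(n_jobs / jobs_per_host)

print(hosts_needed(10_000, 1, 6))                        # -> 1667
print(hosts_needed(10_000, 1, 6, overhead_minutes=0.5))  # -> 2500
```

Even in the ideal case this needs well over a thousand hosts connecting within minutes, which is why the usual VC strategy of minimizing connection frequency breaks down for low-latency work.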
Background utility compatibility
- Background utilities:
  - disk defrag
  - disk indexing
  - virus scanning
  - web pre-fetch
  - disk backup
- Most run only when the computer is idle
  - volunteer computing ==> they never run
- A) ignore zero-priority CPU activity
- B) background manager: make intelligent decisions about when to run the various activities

Credit mechanism
- Already described

Efficient validation
- How to validate provably and efficiently with a replication factor approaching 1?

Game consoles and graphics chips
- NVIDIA, ATI, Cell
  - 10X CPU and gaining?
- Folding@home:
  - ATI version (50 GFLOPS)
  - Sony PS3 version (100 GFLOPS)
- BOINC and Einstein@home on PS3
- How to make this available to other projects?

Simulating volunteer computing
- Ad-hoc development of scheduling policies
  - slow, noisy
  - jeopardizes running projects
- Simulation-based R&D
  - client simulator: client scheduling policies
  - project simulator: server scheduling policies
  - global simulator: study data-intensive, low-latency, etc.

The hard non-technical problems
- How to increase the number of volunteers?
  - currently 1 in 1,000 PC owners
- How to increase the number of projects?
  - currently stuck at about 50
- How to get volunteers to diversify?

How to attract and retain volunteers?
- [Chart: active hosts over time]
- Retention
  - reminder emails
  - frequent science updates
- Recruitment
  - viral: "email a friend", referral rewards
  - organizational: World Community Grid "partner" program
  - media coverage: need more discoveries
  - bundling

Why aren't there more projects?
- Lack of PR among scientists
- IT antipathy
- Creating a BOINC project is expensive; the research group must cover all of:

      Science              Software/IT         Communications
      App development      Port/debug apps     Web site development
      Experiment design    Workflow tools      Message board admin
      Paper writing        Server admin        Public relations

Meta-projects
- Virtual Campus Supercomputing Center
  - deployment and publicity: PC labs, staff/faculty desktops, students, alumni, the public
  - Berkeley@home: research groups supply the science; existing UCB staff handle the software/IT and communications work (porting/debugging apps, web site development, workflow tools, message board admin, server admin, public relations)
- IBM World Community Grid

Encouraging change
- Cross-project credit system
  - encourage competition in total credit, not per-project credit
- Account managers
  - make it easier to discover/attach/detach projects
  - GridRepublic, BAM!
- Science stock market?
  - encourage participation in new high-potential projects
- Scientific mutual funds?
  - e.g. an American Cancer Society BOINC "portfolio"

Conclusion
- Volunteer computing: a new paradigm
  - distinct research problems, software requirements
  - big accomplishments, big potential
- Social impacts
- Contact me about:
  - using BOINC
  - research based on BOINC
- [email protected]
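The M-of-N replication with fuzzy comparison described on the earlier slides can be sketched as follows. This is a hypothetical illustration of the idea, not BOINC's actual validator framework: the function names, the result representation (a list of floats), and the tolerance are all my assumptions.

```python
# Sketch of quorum-based replica validation with fuzzy numerical comparison.
def fuzzy_equal(a, b, rel_tol=1e-5):
    """Treat two result vectors as equivalent if every component agrees
    within a relative tolerance (absorbs benign FP discrepancies)."""
    return len(a) == len(b) and all(
        abs(x - y) <= rel_tol * max(abs(x), abs(y), 1.0) for x, y in zip(a, b)
    )

def validate(results, quorum):
    """Return a canonical result if at least `quorum` replicas agree,
    else None (the server would then issue more instances)."""
    for candidate in results:
        matches = [r for r in results if fuzzy_equal(candidate, r)]
        if len(matches) >= quorum:
            return candidate  # any of the agreeing replicas will do
    return None

# Three replicas: one exact, one with FP round-off, one wildly wrong.
replicas = [[1.0, 2.0], [1.0000001, 2.0000002], [9.9, 9.9]]
print(validate(replicas, quorum=2))  # -> [1.0, 2.0]
```

As the slides note, this is not bulletproof: colluding volunteers who return the same wrong answer still reach quorum, which is one reason a replication factor approaching 1 with provable validation remains a research problem.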