Procedia Computer Science

Procedia Computer Science 101, 2016, Pages 369–378

YSC 2016. 5th International Young Scientist Conference on Computational Science

BOINC forks, issues and directions of development*

Ilya Kurochkin1,2† and Anatoliy Saevskiy2‡
1 Kharkevich Institute for Information Transmission Problems of the Russian Academy of Sciences, Russia
2 The National University of Science and Technology MISiS, Russia

Abstract
This article is based on the experience of running BOINC projects. We interviewed developers of projects on the BOINC platform in order to learn from their experience: the issues they faced, how they solved them, what changes they made to BOINC, and what, in their opinion, should be improved in the platform. We then studied published material on the experience of using the BOINC platform and on BOINC issues. Finally, we drew conclusions about the actions to be taken for the development of BOINC: increase the number of crunchers; rewrite the platform using modern architectural solutions and the latest technologies; and initiate the creation of services that provide access to the computing resources of crunchers.

Keywords: BOINC, fork, volunteer distributed computing

1 Introduction
BOINC is the leading platform for organizing volunteer computing. Volunteer computing is a type of distributed computing in which computer owners or users donate their computing resources (such as processing power and storage) to one or more "projects". There are two roles in the volunteer computing process:
• cruncher: a user who participates in a project;
• developer: a person who develops and maintains a project.
Volunteer computing can be used for scientific or for commercial computation. In the commercial case, the companies for which the computations are performed pay crunchers according to the amount of work they have computed. In the scientific case, users donate their computing power for free because they support the idea of a project. To motivate crunchers to donate more, some projects publish ratings that list the users who have donated the most computing power.
The Berkeley Open Infrastructure for Network Computing (BOINC), an open-source middleware system, supports volunteer and grid computing. Originally developed to support the SETI@home project, it became generalized as a platform for other distributed applications in areas as diverse as mathematics, linguistics, medicine, molecular biology, climatology, environmental science, and

* This work was funded by Russian Science Foundation (№16-11-10352)
† Masterminded EasyChair and created the first stable version of this document
‡ Created the first draft of this document

Peer-review under responsibility of organizing committee of the scientific committee of the 5th International Young Scientist Conference on Computational Science
© 2016 The Authors. Published by Elsevier B.V.
doi: 10.1016/j.procs.2016.11.043

astrophysics, among others. BOINC aims to enable researchers to tap into the enormous processing resources of multiple personal computers around the world. BOINC development originated with a team based at the Space Sciences Laboratory (SSL) at the University of California, Berkeley and led by David Anderson, who also leads SETI@home. As a high-performance distributed computing platform, BOINC brings together about 311,742 active participants and 834,343 active computers (hosts) worldwide, processing on average 11.747 PetaFLOPS as of 24 March 2016. As of January 2016, 58 BOINC projects are active. The National Science Foundation (NSF) funds BOINC through awards SCI/0221529, SCI/0438443 and SCI/0721124. Guinness World Records ranks BOINC as the largest computing grid in the world. BOINC code runs on various operating systems, including Microsoft Windows, Mac OS X, Android, Linux and FreeBSD. BOINC is free software released under the terms of the GNU Lesser General Public License (LGPL).
The BOINC platform consists of two components: the BOINC server, which runs on the server side, and the BOINC client, which is installed on the cruncher's device (Anderson, 2004). BOINC holds its leading position among volunteer computing platforms because of the large number of installed clients, the potentially large computing power, and the comparatively small effort required to start a project.
Given the popularity of BOINC, it is logical to assume that there are many BOINC forks and projects that use BOINC in their architecture. A fork of BOINC is a project based on the source code of the BOINC platform. Developers create forks because they hold different opinions about architectural solutions, necessary functionality, preferred technologies, and so on.
The aim of this work is to collect and organize information about the changes that different developers have introduced into the BOINC platform, to determine the directions of further development of BOINC, and to help researchers who have decided to start a project on the BOINC platform understand the platform's main restrictions and the changes that need to be made to it before the project starts.

2 Materials and methods
In this article, we study the experience of running applications on the BOINC platform in order to determine the main issues faced by developers of BOINC applications and the main requirements they impose on the platform, and thereby to determine possible directions for future BOINC development. To do this, we sent requests for the necessary information to official representatives of the projects. The purpose of the correspondence was to find out the following about each project:
• Were any changes made to the BOINC source code, what are these changes, and what was their purpose? Are the project's source files freely available?
• What is the deployment scheme of the project?
• What challenges has the project team faced while managing the project and making changes to the BOINC source code?
• Which technologies used by BOINC are outdated?
• Would a platform-independent application for the BOINC client be useful for the project? (Diogo Ferreira, 2010)
Table 1 lists the people we interviewed, whose information was extremely useful for this article. In addition, we studied scientific articles about BOINC issues and analyzed statistical information from boincstats.com.


Project                 Contact person                                     University/Organization
GridUNAM                Yevgeniy Kolokoltsev                               UNAM
Cosmology@home          Marius Millea                                      Institut Lagrange de Paris
SETI@home               Matt Lebofsky                                      UC Berkeley
Asteroids@home          Josef Durech, Radim Vančo                          Charles University in Prague
Citizen Science Grid    Travis Desell                                      University of North Dakota - Dept. of Computer Science
ATLAS@home              Claire Adam Bourdarios, David Cameron, Laurence    Université de Paris Saclay, experiment outreach and computing groups, CERN
DENIS@home              Jesús Carro Fernández, Joel Castro Mur             Universidad San Jorge, Escuela Politécnica Superior, Campus Universitario de Villanueva de Gállego
Folding@home            Joseph Coffland
MilkyWay@Home           Rom Walton                                         Rensselaer Polytechnic Institute
Weather@home            Andy Bowery                                        University of Oxford
Table 1: Project information

3 Review of projects features
This section describes the experience of using BOINC in different projects. It summarizes information from our communication with project developers, a literature review, and open statistical data.
3.1 SZTAKI
SZTAKI BOINC is a fork of the BOINC server application. It had about 39 000 users and 112 00 hosts as of 26.09.2016, according to boincstats.com. Its authors consider that the main issue of the BOINC platform is that the potential computing power of the cruncher network is not fully used, because the available computing power does not match the volume of available tasks (A.Cs. Marosi, 2010). To solve this issue, the SZTAKI BOINC team decided to create a global network of crunchers based on a distributed server solution. The developers of SZTAKI BOINC implemented interaction between the servers themselves in order to provide distributed-server functionality (Kacsuk, 2009) (P. Kacsuk, 2006). This makes it possible to build a multi-level hierarchy of servers, such as a global/regional grid, in which the tasks of one project can be performed on the computers of another project's crunchers if there is spare capacity. A server that has free resources can connect to another server as a client and perform calculations for that project on the devices of its own crunchers (Kacsuk, 2009).
In addition, the developers of this fork replaced the job wrapper with an extended one that supports POSIX-like syntax (Atisu, 2009). This makes it possible to run applications that were not written for the BOINC client without rewriting them against the BOINC API, which is very useful, for example, when migrating from another volunteer distributed computing system (GenWrapper: A more general BOINC wrapper, 2016).
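The hierarchical idea described for SZTAKI BOINC can be illustrated with a small toy model. This is only a sketch under our own assumptions: the `Server` class, its `capacity` field and the `run_round` method are hypothetical names and do not correspond to SZTAKI or BOINC code. It shows a lower-level server that runs its own tasks first and, when it has spare capacity, fetches work from its parent as if it were an ordinary client.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    """Toy model of one desktop-grid server in a hierarchy (hypothetical names)."""
    name: str
    capacity: int                      # tasks its own crunchers can run per round
    parent: Optional["Server"] = None  # upper-level server it may connect to as a client
    queue: deque = field(default_factory=deque)

    def submit(self, *tasks: str) -> None:
        """Add tasks of this server's own project to its queue."""
        self.queue.extend(tasks)

    def fetch(self, n: int) -> list:
        """Hand out up to n queued tasks, as a server would to any client."""
        return [self.queue.popleft() for _ in range(min(n, len(self.queue)))]

    def run_round(self) -> list:
        """Run local tasks first, then fill spare capacity with tasks from the parent."""
        batch = [(self.name, task) for task in self.fetch(self.capacity)]
        spare = self.capacity - len(batch)
        if spare > 0 and self.parent is not None:
            batch += [(self.parent.name, task) for task in self.parent.fetch(spare)]
        return batch


global_grid = Server("global", capacity=0)
regional = Server("regional", capacity=3, parent=global_grid)
global_grid.submit("g1", "g2", "g3")
regional.submit("r1")

executed = regional.run_round()
print(executed)  # (project, task) pairs run by the regional server's crunchers
```

In this model the regional server has one task of its own and capacity for three, so two tasks of the global project are executed on the regional crunchers and one remains queued at the upper level.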


3.2 MilkyWay@home
The project had 205 000 users and 463 000 hosts in its cruncher network as of 26.09.2016, according to boincstats.com. Rom Walton, a developer in the project, said that the project's repository on github.com is a clone of the official BOINC svn repository, http://boinc.berkeley.edu/svn/. At the moment, there is no activity in the repository.
3.3 Folding@home
The Folding@home team tried to support the BOINC client so that users would have a choice of clients, but the attempt failed and the work was frozen (FAH on BOINC, 2013). Joseph Coffland, a developer on the Folding@home team and a contributor to the Folding@home repository on github.com, said that the project does not use BOINC code and never did. Some people think otherwise because in the past it was possible to connect to the project using the BOINC client. According to Joseph, the project team decided to give Folding@home crunchers a choice between different clients, but the feature was then frozen because of numerous problems and the loss of the people who had been working on it.
3.4 Asteroids@home
The project had 86 000 users and 148 000 hosts in its cruncher network as of 26.09.2016, according to boincstats.com. According to Radim Vančo, the project researchers faced the following issues:
• An inaccurate validator and outdated documentation;
• Writing and supporting the client application for different platforms takes a huge amount of time and other resources;
• The need to allocate a separate server for the database at the start of the project.

3.5 Citizen Science Grid
The project had 7 000 users and 21 000 hosts in its cruncher network as of 26.09.2016, according to boincstats.com. According to Travis Desell, the project researchers changed the BOINC code to integrate their own website with the BOINC forum. Apart from that, they try to keep the code up to date and update it regularly, so everything else is vanilla BOINC.
3.6 DENIS@home
The project had 4 000 users and 32 000 hosts in its cruncher network as of 26.09.2016, according to boincstats.com. According to Joel Castro, the project researchers faced the following issues:
• Outdated documentation;
• The BOINC server code, written in C, may behave differently in different environments, for example because of different versions of the gcc compiler.
The following parts of BOINC also had to be changed in this project:
• The client version of the site was moved to WordPress CMS;
• Validator;


• Assimilator.
3.7 Weather@home
The project had 291 000 users and 618 000 hosts in its cruncher network as of 26.09.2016, according to boincstats.com. According to Andy Bowery, the project researchers faced only minor difficulties and inconveniences in managing their BOINC-based project. All of them were solved with simple scripts, for example for deploying new versions of the application for the BOINC client. The changes to the BOINC source code consisted of adding some information pages to the web site. The project is also moving toward a cross-platform client application that uses a virtual machine. Finally, Andy said that the forum integrated into BOINC is outdated and that it would be good to update it.
3.8 ATLAS@home
The project had 11 000 users and 15 000 hosts in its cruncher network as of 26.09.2016, according to boincstats.com. According to David Cameron, the project researchers had some issues running the BOINC server on a virtual machine, but these were fixed in later versions of the BOINC platform. According to Laurence, the BOINC platform should be more intuitive to deploy and manage. Laurence also pointed to the need to migrate to a new technology stack because the current one is outdated, for example to migrate the site from PHP to Django. He said that he did not know of any BOINC forks, but that he was aware of a different approach to volunteer distributed computing, the WebAPI, which allows virtualized applications to be run via a Web browser. ATLAS@home does not use a load balancer because there is no need for one at present.
3.9 GridUNAM
GridUNAM is a service that gives researchers the opportunity to use a volunteer computing network built on the BOINC client to perform calculations. As of 14.05.2016 the project was quite small and ran on a single server, according to Yevgeniy Kolokoltsev.
The project is a logical continuation of BOINC as a tool with which researchers can perform calculations. Its purpose is to simplify the calculation process for researchers. According to Yevgeniy Kolokoltsev, a GridUNAM developer, the GridUNAM backend uses three BOINC daemons: Scheduler, Transitioner and Feeder. The Scheduler implements the internal XML protocol by which clients communicate with the server (RPC). The Transitioner and the Feeder work with the original MySQL database and are responsible for planning the distribution of subtasks. However, the project plans to migrate to its own task scheduler in order to improve the quality of task planning, which is difficult to do with the BOINC architecture. The main problem of BOINC, according to Yevgeniy, is the need to maintain one's own server and to attract volunteers to the project; these factors greatly complicate the lives of researchers and act as a barrier to entry for many researchers and research groups. As for the BOINC technology stack, Yevgeniy thinks that all of it is outdated:
• The current scheduler cannot make a decision at the time a request arrives. It can only be extended by prioritizing subtasks, and the priorities must then be updated in the feeder queue every time, which costs performance and requires external planning procedures. Developing the scheduler within BOINC is a dead end: it will


require changes in all three core modules and in the database, and much of the code is closely interwoven.
• The MySQL database has scaling issues and is inconvenient to administer;
• The web service is built on the PHP-Apache combination and does not use a microservice architecture;
• Local storage in BOINC, when scaled, leads to a cluster and NetFS; today, however, keeping a cluster for a single project is not profitable: computing resources are easier and cheaper to rent, and local files can significantly raise the rental price;
• BOINC daemons do not support multi-threading and communicate with each other through shared memory, so scaling them involves great difficulties; synchronizing their work through NetFS is also a bottleneck.
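The first point of the list above, a scheduler that cannot decide at the time a request arrives, can be illustrated with a toy model. The sketch below is not BOINC code: the job tuples, the function names and the GPU requirement are hypothetical and serve only to contrast a scheduler that is limited to a pre-filled, priority-ordered cache with one that inspects all jobs when the request arrives.

```python
import heapq
from typing import List, Optional, Tuple

Job = Tuple[int, str, bool]  # (priority, name, needs_gpu); hypothetical job record


def feeder_fill(db_jobs: List[Job], slots: int) -> List[Job]:
    """Feeder-style step: copy the `slots` highest-priority jobs into a cache."""
    return heapq.nlargest(slots, db_jobs)


def cached_scheduler(cache: List[Job], host_has_gpu: bool) -> Optional[str]:
    """Scheduler restricted to the cache: returns the first job the host can run."""
    for job in cache:
        _, name, needs_gpu = job
        if host_has_gpu or not needs_gpu:
            cache.remove(job)
            return name
    return None


def request_time_scheduler(db_jobs: List[Job], host_has_gpu: bool) -> Optional[str]:
    """Alternative: choose among all runnable jobs when the request arrives."""
    runnable = [job for job in db_jobs if host_has_gpu or not job[2]]
    if not runnable:
        return None
    best = max(runnable)
    db_jobs.remove(best)
    return best[1]


db_jobs = [(9, "gpu_a", True), (8, "gpu_b", True), (1, "cpu_c", False)]
cache = feeder_fill(db_jobs, slots=2)

# A CPU-only host asks for work.
from_cache = cached_scheduler(cache, host_has_gpu=False)
at_request = request_time_scheduler(db_jobs, host_has_gpu=False)
print(from_cache, at_request)  # prints: None cpu_c
```

In this model the cache was filled by priority alone before the request was known, so the CPU-only host receives nothing even though a suitable job exists; working around this by adjusting priorities would mean updating the cache contents every time, which is the overhead described above.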

Summing up, BOINC is outdated both in its implementation and in its architecture, and it is easier to write everything from scratch than to refactor all of the source code. Despite this, the experience concentrated in the BOINC code is priceless, and the ideas inherent in its business logic should be used and developed.
3.10 Cosmology@home
The project had 67 000 users and 157 000 hosts in its cruncher network as of 26.09.2016, according to boincstats.com. The contact person is Marius Millea, a member of the Cosmology@home development team who is also involved in the development of the BOINC platform and contributes to the official BOINC repository. According to him, the project team launched a BOINC server using Docker and published its image in open access so that anyone can repeat the experiment if desired. The image contains all project data except user data. The project has one server; scalability is not needed. Marius said that the main issues of the BOINC platform are the complexity of deployment and the complexity of writing applications for the BOINC client. In his words, this creates huge difficulties for individual researchers and small groups who want to use the computing power of volunteer computing but cannot hire specialized staff and lack the expertise to quickly launch a project on the BOINC platform and write a computing application for the BOINC client. However, Marius pointed out that the BOINC2Docker project, developed with his participation, is freely available on github and is used for writing cross-platform applications for the BOINC client. In addition, Marius said that the blog built into the BOINC platform has little functionality and that work is ongoing to integrate BOINC with Drupal, which has its own forums.
3.11 SETI@home
The biggest project on BOINC had 1 648 000 users and 4 059 000 hosts in its cruncher network as of 26.09.2016, according to boincstats.com.
According to Matt, a team member engaged in supporting the project, SETI@home runs on multiple servers, and scaling is done manually, both horizontally and vertically. Replication is used to scale the database. The database responsible for user interaction is about 70 GB in size and thus fits entirely in RAM, which keeps it fast. The science database, about 5 TB in size and spread over about 30 drives, handles millions of inserts per day as well as analysis, but lately the team has had to resort to cloning this database onto other servers/clusters to do science. Matt also noted that, for their goals, using Python or Java in client applications is unacceptable because of low performance.


4 Scalability and fault tolerance of BOINC project
To learn the rules that help to run a fault-tolerant BOINC project, we need to consider the risks that lead to the failure of a project. We then estimate the probability of their occurrence and the damage that these risks would cause. Finally, we identify the system components in which these risks can arise (Table 2).

Risk                                      Amount of damage    Causes
Inability to get a job                    5                   Apache server failure; PHP interpreter failure; scheduler failure; feeder daemon failure; database failure
Inability to upload a completed task      3                   Apache server failure; PHP interpreter failure; assimilator failure; validator failure; database failure
Inaccessibility of the site and forum     7                   Apache server failure; problems with the PHP interpreter; database failure
Loss of jobs and client data              9                   Database failure; hard disk failure
Loss of research results                  10                  Hard disk failure
Table 2: Risks of failure
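Table 2 can also be read per component rather than per risk. The following sketch encodes the table as data and sums, for every component, the damage scores of the risks it can cause. Summing the scores is our own illustrative assumption, not a method used in this article, and because the table gives no probabilities the result is only a rough indication of where redundancy matters most. The short component names are ours.

```python
from collections import defaultdict

# Table 2 as data: risk -> (damage score, components whose failure causes it).
RISKS = {
    "cannot get a job": (5, ["apache", "php", "scheduler", "feeder", "database"]),
    "cannot upload a completed task": (3, ["apache", "php", "assimilator", "validator", "database"]),
    "site and forum unavailable": (7, ["apache", "php", "database"]),
    "loss of jobs and client data": (9, ["database", "disk"]),
    "loss of research results": (10, ["disk"]),
}


def exposure_by_component(risks):
    """Sum the damage scores of all risks that each component can cause."""
    totals = defaultdict(int)
    for damage, components in risks.values():
        for component in components:
            totals[component] += damage
    # Highest total first; ties are broken alphabetically for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


ranking = exposure_by_component(RISKS)
for component, total in ranking:
    print(f"{component:12s} {total}")
```

Under this simple weighting the database and the hard disk come out on top, which matches the table: they are the only causes of the two most damaging risks.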

5 Discussion
The results were contradictory. Some developers of projects on the BOINC platform complain about the architecture, the lack of performance and the difficulty of scaling. Others complain about an inconvenient admin panel, a blog with little functionality, and scripts written in C. The reason for this is the difference in project load. Six of the 10 surveyed projects are small: they have fewer than 70,000 users and 150,000 hosts (Figure 1) and run on a single server. Developers of such projects are mostly satisfied with BOINC as a platform and have only minor issues and wishes regarding its functionality.
Figure 1: Distribution of users in projects
Also, some researchers place different components of the BOINC platform on different servers, which extends vertical scaling and, in today's reality, is enough


for the majority of projects on the BOINC platform. By the end of May 2016, 7 projects had 80% of all crunchers, and the largest, SETI@home, had 31%; the average number of participants per project was 93 000, and the median was 10 000. The reason is that there are only 3.6 million active users, which by today's standards is not many (Jones, 2014). Therefore, building the base of crunchers is a key stage in the development of BOINC.
As for the performance of the BOINC server, it scales with great difficulty and is not designed for high loads because of the following issues:
• Difficulties in data exchange between BOINC daemons on different servers;
• Scaling of one large relational database;
• Storage of research results on the server.
The weak performance and poor scalability of BOINC stem from its architecture. However, most BOINC projects are run by individual researchers, who care less about performance than about support, documentation and ease of use.
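The gap between the average and the median quoted in this section is typical of a strongly skewed distribution. As a minimal illustration, the sketch below applies the same statistics to the user counts reported in Section 3 for the nine surveyed projects that have one. This subset is not the full set of BOINC projects, so the numbers differ from the platform-wide figures above; only the pattern is the same.

```python
from statistics import mean, median

# User counts as of 26.09.2016, taken from Section 3 (boincstats.com figures).
users = {
    "SZTAKI": 39_000,
    "MilkyWay@home": 205_000,
    "Asteroids@home": 86_000,
    "Citizen Science Grid": 7_000,
    "DENIS@home": 4_000,
    "Weather@home": 291_000,
    "ATLAS@home": 11_000,
    "Cosmology@home": 67_000,
    "SETI@home": 1_648_000,
}

total = sum(users.values())
largest = max(users, key=users.get)
largest_share = users[largest] / total

print(f"mean   = {mean(users.values()):,.0f}")
print(f"median = {median(users.values()):,.0f}")
print(f"{largest} share = {largest_share:.0%}")
```

For this subset the mean is almost four times the median, because a single large project dominates the total.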

6 Best BOINC's fork
Native BOINC is the best solution in most cases, because it provides the most flexibility and is actively developed by enthusiasts. However, if time is not a critical factor or the amount of work is quite small, it is better to use GridUNAM, which saves a lot of the time otherwise spent on running and modifying a BOINC server and on attracting participants to the project.

7 Conclusion
BOINC has several major issues:
• performance;
• scalability;
• a small number of crunchers;
• the difficulty of running a project.
For the majority of research projects on the BOINC platform, the number of crunchers is the key issue today, followed by the complexity of running a project. However, as the number of crunchers and the average load of a BOINC project grow, the requirements on the performance and scalability of the platform will grow as well. In addition, the number of crunchers is growing more slowly than expected (Antonio Amorim, 2004).
For companies that use the BOINC platform to organize distributed calculations, the main disadvantages of BOINC are scalability and performance: companies need to be able to expand their computing resources easily, which is extremely difficult to achieve with the BOINC platform.
To solve the issues that researchers face when trying to run a project, we propose the following approach: create cloud services similar to GridUNAM that take over all tasks associated with job distribution, validation and storage of results, and that may be integrated with big-data analysis tools. Such products will be of interest not only to the scientific community but also to small private companies, for which the cost of resources matters more than the computation time. This will allow the product to earn money for its support and further development.
Today, if the main goal of a researcher is to perform calculations with minimum effort on their part, the most appropriate solution is to use the GridUNAM platform for the


organization of their calculations (Taufer, 2004) (Franck Cappello, 2005). However, if the researchers constantly have new computing tasks in fairly large amounts, it is better to choose the original BOINC as the platform for organizing a volunteer computing project and to work actively on attracting participants to it.
The main causes of the low performance of the BOINC platform as a framework for organizing distributed computing are:
• Outdated architectural approaches;
• Outdated technologies, such as the use of blocking operations and of Apache, which puts a heavy load on a server with a large number of connections (Wolski, 2005).
The poor scalability of the BOINC platform is caused by the outdated architectural approaches used in its development. To eliminate these disadvantages, the BOINC platform needs to be migrated to a microservice architecture using modern technologies (for example, RabbitMQ and Redis) (Balaton, 2008). These steps will help to attract developers from private companies to the BOINC developer community, supporting the long life and fast growth of the BOINC platform.
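The kind of decoupling that a message broker provides can be sketched in a few lines. The example below is an illustration under our own assumptions, not a proposed design: Python's in-process `queue.Queue` stands in for a broker such as RabbitMQ, the stage names are borrowed from the BOINC validator and assimilator, and the validity rule and result format are hypothetical. Each stage only reads from one queue and writes to the next, so stages share no memory and could in principle run as separate services.

```python
import queue
import threading


def validator(results_in: queue.Queue, valid_out: queue.Queue) -> None:
    """Stage 1: forward only results that pass a (hypothetical) validity rule."""
    while True:
        item = results_in.get()
        if item is None:            # sentinel: pass it on and stop
            valid_out.put(None)
            break
        job_id, value = item
        if value >= 0:              # hypothetical rule: negative values are invalid
            valid_out.put((job_id, value))


def assimilator(valid_in: queue.Queue, store: dict) -> None:
    """Stage 2: record validated results in a store."""
    while True:
        item = valid_in.get()
        if item is None:
            break
        job_id, value = item
        store[job_id] = value


def run_pipeline(results):
    """Connect the two stages with queues and push the results through them."""
    results_q, valid_q, store = queue.Queue(), queue.Queue(), {}
    workers = [
        threading.Thread(target=validator, args=(results_q, valid_q)),
        threading.Thread(target=assimilator, args=(valid_q, store)),
    ]
    for worker in workers:
        worker.start()
    for result in results:
        results_q.put(result)
    results_q.put(None)             # signal the end of the input
    for worker in workers:
        worker.join()
    return store


stored = run_pipeline([("wu1", 3.5), ("wu2", -1.0), ("wu3", 0.0)])
print(stored)
```

With a real broker the queues would live outside the processes, which is what allows each stage to be scaled or restarted independently.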

References
A.Cs. Marosi, Z. B. (2010). SZTAKI Desktop Grid: Adapting Clusters for Desktop Grids. Remote Instrumentation and Virtual Laboratories, 133-144.
Anderson, D. (2004). Proc. of the 5th IEEE/ACM International Workshop on Grid Computing. BOINC: a system for public-resource computing and storage, (pp. 4-10).
Antonio Amorim, J. V. (2004). HEP@Home - A distributed computing system based on BOINC. Cornell university library.
Atisu. (2009, September 09). Gen Wrapper: About. Retrieved August 1, 2016, from A generic BOINC wrapper for legacy applications: http://genwrapper.sourceforge.net/
Balaton, Z. F.-M. (2008). CoreGRID Integration Workshop. EDGeS: The common boundary between service and desktop grids. Hersonissos-Crete, Greece.
Diogo Ferreira, F. A. (2010). 4th Iberian Grid Infrastructure Conference. Custom execution environments in the BOINC middleware. Braga, Portugal.
FAH on BOINC. (2013, January 24). Retrieved August 1, 2016, from Folding@home: https://folding.stanford.edu/home/faq/faq-high-performance/#ntoc15
Franck Cappello, S. D. (2005). Computing on large-scale distributed systems: XtremWeb architecture, programming models, security, tests and convergence with grid. Future Generation Computer Systems, 21(3).
GenWrapper: A more general BOINC wrapper. (2016, May 18). Retrieved August 1, 2016, from BOINC Berkeley wiki: http://boinc.berkeley.edu/trac/wiki/WrapperApp
Jones, N. (2014, 02 04). Computer sharing loses momentum. Retrieved July 28, 2016, from Nature Journal: http://www.nature.com/news/computer-sharing-loses-momentum-1.14666
Kacsuk, P. (2009). SZTAKI Desktop Grid (SZDG): A Flexible and Scalable Desktop Grid System. Journal of Grid Computing.
P. Kacsuk, A. M. (2006). SZTAKI Desktop Grid – a Hierarchical Desktop. Retrieved July 29, 2016, from researchgate.com: https://www.researchgate.net/publication/250765217_SZTAKI_Desktop_Grid_-_a_Hierarchical_Desktop_Grid_System
Taufer, M. A. (2004). Predictor@home: a protein structure prediction based on global computing. IEEE Transactions on Parallel and Distributed Systems (17(8)), 786–796.
Wolski, R. N. (2005). 22nd International Parallel and Distributed Processing Symposium (IPDPS). Models and modeling infrastructures for global computational platforms, (pp. 73-80).
