Scaling Reliably: Improving the Scalability of the Erlang Distributed Actor Platform

Phil Trinder, Natalia Chechina, Nikolaos Papaspyrou, Konstantinos Sagonas, Simon Thompson, Stephen Adams, Stavros Aronis, Robert Baker, Eva Bihari, Olivier Boudeville, Francesco Cesarini, Maurizio Di Stefano, Sverker Eriksson, Viktoria Fordos, Amir Ghaffari, Aggelos Giantsios, Rickard Green, Csaba Hoch, David Klaftenegger, Huiqing Li, Kenneth Lundin, Kenneth MacKenzie, Katerina Roukounaki, Yiannis Tsiouris, Kjell Winblad

RELEASE EU FP7 STREP (287510) project, www.release-project.eu/

April 25, 2017

Abstract

Distributed actor languages are an effective means of constructing scalable reliable systems, and Erlang has a well-established and influential model. While the Erlang model conceptually provides reliable scalability, it has some inherent scalability limits and these force developers to depart from the model at scale. This article establishes the scalability limits of Erlang systems, and reports the work of the EU RELEASE project to improve the scalability and understandability of the Erlang reliable distributed actor model. We systematically study the scalability limits of Erlang, and then address the issues at the virtual machine, language and tool levels. More specifically: (1) We have evolved the Erlang virtual machine so that it can work effectively in large scale single-host multicore and NUMA architectures. We have made important changes and architectural improvements to the widely used Erlang/OTP release. (2) We have designed and implemented Scalable Distributed (SD) Erlang libraries to address language-level scalability issues, and provided and validated a set of semantics for the new language constructs. (3) To make large Erlang systems easier to deploy, monitor, and debug we have developed and made open source releases of five complementary tools, some specific to SD Erlang. Throughout the article we use two case studies to investigate the capabilities of our new technologies and tools: a distributed hash table based Orbit calculation and Ant Colony Optimisation (ACO). Chaos Monkey experiments show that two versions of ACO survive random process failure and hence that SD Erlang preserves the Erlang reliability model. While we report measurements on a range of NUMA and cluster architectures, the key scalability experiments are conducted on the Athos cluster with 256 hosts (6144 cores). Even for programs with no global recovery data to maintain, SD Erlang partitions the network to reduce network traffic and hence improves performance of the Orbit and ACO benchmarks above 80 hosts. ACO measurements show that maintaining global recovery data dramatically limits scalability; however scalability is recovered by partitioning the recovery data. We exceed the established scalability limits of distributed Erlang, and do not reach the limits of SD Erlang for these benchmarks at this scale (256 hosts, 6144 cores).

I. Introduction

Distributed programming languages and frameworks are central to engineering large scale systems, where key properties include scalability and reliability. By scalability we mean that performance increases as hosts and cores are added, and by large scale we mean architectures with hundreds of hosts and tens of thousands of cores. Experience with high performance and data centre computing shows that reliability is critical at these scales, e.g. host failures alone account for around one failure per hour on commodity servers with approximately 10^5 cores [11]. To be usable, programming languages employed on them must be supported by a suite of deployment, monitoring, refactoring and testing tools that work at scale.

Controlling shared state is the only way to build reliable scalable systems. State shared by multiple units of computation limits scalability due to high synchronisation and communication costs. Moreover shared state is a threat for reliability as failures corrupting or permanently locking shared state may poison the entire system.

Actor languages avoid shared state: actors or processes have entirely local state, and only interact with each other by sending messages [2]. Recovery is facilitated in this model, since actors, like processes, can fail independently without affecting the state of other actors. Moreover an actor can supervise other actors, detecting failures and taking remedial action, e.g. restarting the failed actor.

Erlang [6, 19] is a beacon language for reliable scalable computing with a widely emulated distributed actor model. It has influenced the design of numerous programming languages like Clojure [43] and F# [74], and many languages have Erlang-inspired actor frameworks, e.g. Kilim for Java [73], Cloud Haskell [31], and Akka for C#, F# and Scala [64]. Erlang is widely used for building reliable scalable servers, e.g. Ericsson's AXD301 telephone exchange (switch) [79], the Facebook chat server, and the Whatsapp instant messaging server [77].

In Erlang, the actors are termed processes and are managed by a sophisticated Virtual Machine on a single multicore or NUMA host, while distributed Erlang provides relatively transparent distribution over networks of VMs on multiple hosts. Erlang is supported by the Open Telecom Platform (OTP) libraries that capture common patterns of reliable distributed computation, such as the client-server pattern and process supervision. Any large-scale system needs scalable persistent storage and, following the CAP theorem [39], Erlang uses and indeed implements Dynamo-style NoSQL DBMS like Riak [49] and Cassandra [50].

While the Erlang distributed actor model conceptually provides reliable scalability, it has some inherent scalability limits, and indeed large-scale distributed Erlang systems must depart from the distributed Erlang paradigm in order to scale, e.g. not maintaining a fully connected graph of hosts. The EU FP7 RELEASE project set out to establish and address the scalability limits of the Erlang reliable distributed actor model [67].

After outlining related work (Section II) and the benchmarks used throughout the article (Section III), we investigate the scalability limits of Erlang/OTP, seeking to identify specific issues at the virtual machine, language and persistent storage levels (Section IV). We then report the RELEASE project work to address these issues, working at the following three levels.

1. We have designed and implemented a set of Scalable Distributed (SD) Erlang libraries to address language-level reliability and scalability issues. An operational semantics is provided for the key new s_group construct, and the implementation is validated against the semantics (Section V).


2. We have evolved the Erlang virtual machine so that it can work effectively in large-scale single-host multicore and NUMA architectures. We have improved the shared ETS tables, time management, and load balancing between schedulers. Most of these improvements are now included in the Erlang/OTP release, currently downloaded approximately 50K times each month (Section VI).

3. To facilitate the development of scalable Erlang systems, and to make them maintainable, we have developed three new tools: Devo, SDMon and WombatOAM, and enhanced two others: the visualisation tool Percept, and the refactorer Wrangler. The tools support refactoring programs to make them more scalable, easier to deploy at large scale (hundreds of hosts), and easier to monitor and visualise their behaviour. Most of these tools are freely available under open source licences; the WombatOAM deployment and monitoring tool is a commercial product (Section VII).

Throughout the article we use two benchmarks to investigate the capabilities of our new technologies and tools. These are a computation in symbolic algebra, more specifically an algebraic 'orbit' calculation that exploits a non-replicated distributed hash table, and an Ant Colony Optimisation (ACO) parallel search program (Section III).

We report on the reliability and scalability implications of our new technologies using Orbit, ACO, and other benchmarks. We use a Chaos Monkey instance [14] that randomly kills processes in the running system to demonstrate the reliability of the benchmarks and to show that SD Erlang preserves the Erlang language-level reliability model. While we report measurements on a range of NUMA and cluster architectures as specified in Appendix A, the key scalability experiments are conducted on the Athos cluster with 256 hosts and 6144 cores. Having established scientifically the folklore limitations of around 60 connected hosts/nodes for distributed Erlang systems in Section IV, a key result is to show that the SD Erlang benchmarks exceed this limit and do not reach their limits on the Athos cluster (Section VIII).

Contributions. This article is the first systematic presentation of the coherent set of technologies for engineering scalable reliable Erlang systems developed in the RELEASE project.

Section IV presents the first scalability study covering Erlang VM, language, and storage scalability. Indeed we believe it is the first comprehensive study of any distributed actor language at this scale (100s of hosts, and around 10K cores). Individual scalability studies, e.g. into Erlang VM scaling [8], or language and storage scaling, have appeared before [38, 37].

At the language level the design, implementation and validation of the new libraries (Section V) have been reported piecemeal [21, 60], and are included here for completeness. While some of the improvements made to the Erlang Virtual Machine (Section i) have been thoroughly reported in conference publications [66, 46, 47, 69, 70, 71], others are reported here for the first time (Sections iii, ii). In Section VII, the WombatOAM and SDMon tools are described for the first time, as is the revised Devo system and visualisation. The other tools for profiling, debugging and refactoring developed in the project have previously been published piecemeal [52, 53, 55, 10], but this is their first unified presentation.

All of the performance results in Section VIII are entirely new, although a comprehensive study of SD Erlang performance is now available in a recent article by [22].


II. Context

i. Scalable Reliable Programming Models

There is a plethora of shared memory concurrent programming models like PThreads or Java threads, and some models, like OpenMP [20], are simple and high level. However synchronisation costs mean that these models generally do not scale well, often struggling to exploit even 100 cores. Moreover, reliability mechanisms are greatly hampered by shared state: for example, a lock becomes permanently unavailable if the thread holding it fails.

The High Performance Computing (HPC) community build large-scale (10^6 core) distributed memory systems using the de facto standard MPI communication libraries [72]. Increasingly these are hybrid applications that combine MPI with OpenMP. Unfortunately, MPI is not suitable for producing general purpose concurrent software as it is too low level with explicit message passing. Moreover, the most widely used MPI implementations offer no fault recovery:¹ if any part of the computation fails, the entire computation fails. Currently the issue is addressed by using what is hoped to be highly reliable computational and networking hardware, but there is intense research interest in introducing reliability into HPC applications [33].

Server farms use commodity computational and networking hardware, and often scale to around 10^5 cores, where host failures are routine. They typically perform rather constrained computations, e.g. Big Data Analytics, using reliable frameworks like Google MapReduce [26] or Hadoop [78]. The idempotent nature of the analytical queries makes it relatively easy for the frameworks to provide implicit reliability: queries are monitored and failed queries are simply re-run. In contrast, actor languages like Erlang are used to engineer reliable general purpose computation, often recovering failed stateful computations.

¹ Some fault tolerance is provided in less widely used MPI implementations like [27].

ii. Actor Languages

The actor model of concurrency consists of independent processes communicating by means of messages sent asynchronously between processes. A process can send a message to any other process for which it has the address (in Erlang the "process identifier" or pid), and the remote process may reside on a different host. While the notion of actors originated in AI [42], it has been used widely as a general metaphor for concurrency, as well as being incorporated into a number of niche programming languages in the 1970s and 80s. More recently it has come back to prominence through the rise of not only multicore chips but also larger-scale distributed programming in data centres and the cloud.

With built-in concurrency and data isolation, actors are a natural paradigm for engineering reliable scalable general-purpose systems [1, 41]. The model has two main concepts: actors that are the unit of computation, and messages that are the unit of communication. Each actor has an address-book that contains the addresses of all the other actors it is aware of. These addresses can be either locations in memory, direct physical attachments, or network addresses. In a pure actor language, messages are the only way for actors to communicate. After receiving a message an actor can do the following: (i) send messages to another actor from its address-book, (ii) create new actors, or (iii) designate a behaviour to handle the next message it receives. The model does not impose any restrictions in the order in which these actions must be taken. Similarly, two messages sent concurrently can be received in any order. These features enable actor based systems to support indeterminacy and quasi-commutativity, while providing locality, modularity, reliability and scalability [41].

Actors are just one message-based paradigm, and other languages and libraries have related message passing paradigms. Recent example languages include Go [28] and Rust [62] that provide explicit channels, similar to actor mailboxes. Probably the most famous message passing library is MPI [72], with APIs for many languages and widely used on clusters and High Performance Computers. It is, however, arguable that the most important contribution of the actor model is the one-way asynchronous communication [41]. Messages are not coupled with the sender, and neither are they transferred synchronously to a temporary container where transmission takes place, e.g. a buffer, a queue, or a mailbox. Once a message is sent, the receiver is the only entity responsible for that message.

Erlang [6, 19] is the pre-eminent programming language based on the actor model, having a history of use in production systems, initially with its developer Ericsson and then more widely through open source adoption. There are now actor frameworks for many other languages; these include Akka for C#, F# and Scala [64], CAF² for C++, Pykka³, Cloud Haskell [31], PARLEY [51] for Python, and Termite Scheme [35], and each of these is currently under active use and development. Moreover, the recently defined Rust language [62] has a version of the actor model built in, albeit in an imperative context.

² http://actor-framework.org
³ http://pykka.readthedocs.org/en/latest/

iii. Erlang's Support for Concurrency

In Erlang, actors are termed processes, and virtual machines are termed nodes. The key elements of the actor model are: fast process creation and destruction; lightweight processes, e.g. enabling 10^6 concurrent processes on a single host with 8GB RAM; fast asynchronous message passing with copying semantics; process monitoring; strong dynamic typing; and selective message reception.

By default Erlang processes are addressed by their process identifier (pid), e.g.

    Pong_PID = spawn(fun some_module:pong/0)

spawns a process to execute the anonymous function given as argument to the spawn primitive, and binds Pong_PID to the new process identifier. Here the new process will execute the pong/0 function which is defined in some_module. A subsequent call

    Pong_PID ! finish

sends the message finish to the process identified by Pong_PID. Alternatively, processes can be given names using a call of the form:

    register(my_funky_name, Pong_PID)

which registers this process name in the node's process name table if not already present. Subsequently, these names can be used to refer to or communicate with the corresponding processes (e.g. send them a message):

    my_funky_process ! hello.

A distributed Erlang system executes on multiple nodes, and the nodes can be freely deployed across hosts, e.g. they can be located on the same or different hosts. To help make distribution transparent to the programmer, when any two nodes connect they do so transitively, sharing their sets of connections. Without considerable care this quickly leads to a fully connected graph of nodes. A process may be spawned on an explicitly identified node, e.g.

    Remote_Pong_PID = spawn(some_node, fun some_module:pong/0).

After this, the remote process can be addressed just as if it were local. It is a significant burden on the programmer to identify the remote nodes in large systems, and we will return to this in Sections ii and i.2.
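These fragments assemble into a complete example. The sketch below is ours rather than the article's (only the pong/0 function in some_module is assumed above); it shows a minimal server loop that answers pings until told to finish:

    -module(some_module).
    -export([pong/0]).

    %% Minimal server loop: reply to each ping, stop on finish.
    pong() ->
        receive
            {ping, From} ->
                From ! pong,   % reply to the sending process
                pong();        % loop to await the next message
            finish ->
                ok             % terminate normally
        end.

A client spawns it with Pong_PID = spawn(fun some_module:pong/0), sends Pong_PID ! {ping, self()}, and eventually Pong_PID ! finish.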


iv. Scalability and Reliability in Erlang Systems

Erlang was designed to solve a particular set of problems, namely those in building telecommunications infrastructure, where systems need to be scalable to accommodate hundreds of thousands of calls concurrently, in soft real-time. These systems need to be highly available and reliable: i.e. to be robust in the case of failure, which can come from software or hardware faults. Given the inevitability of the latter, Erlang adopts the "let it fail" philosophy for error handling. That is, encourage programmers to embrace the fact that a process may fail at any point, and have them rely on the supervision mechanism, discussed shortly, to handle the failures.

Figure 1: Conceptual view of Erlang's concurrency, multicore support and distribution.

Figure 1 illustrates Erlang's support for concurrency, multicores and distribution. Each Erlang node is represented by a yellow shape, and each rectangle represents a host with an IP address. Each red arc represents a connection between Erlang nodes. Each node can run on multiple cores, and exploit the inherent concurrency provided. This is done automatically by the VM, with no user intervention needed. Typically each core has an associated scheduler that schedules processes; a new process will be spawned on the same core as the process that spawns it, but work can be moved to a different scheduler through a work-stealing allocation algorithm. Each scheduler allows a process that is ready to compute at most a fixed number of computation steps before switching to another. Erlang built-in functions or BIFs are implemented in C, and at the start of the project were run to completion once scheduled, causing performance and responsiveness problems if the BIF had a long execution time.

Scaling in Erlang is provided in two different ways. It is possible to scale within a single node by means of the multicore virtual machine exploiting the concurrency provided by the multiple cores or NUMA nodes. It is also possible to scale across multiple hosts using multiple distributed Erlang nodes.

Reliability in Erlang is multi-faceted. As in all actor languages each process has private state, preventing a failed or failing process from corrupting the state of other processes. Messages enable stateful interaction, and contain a deep copy of the value to be shared, with no references (e.g. pointers) to the senders' internal state. Moreover Erlang avoids type errors by enforcing strong typing, albeit dynamically [7]. Connected nodes check liveness with heartbeats, and can be monitored from outside Erlang, e.g. by an operating system process.

However, the most important way to achieve reliability is supervision, which allows a process to monitor the status of a child process and react to any failure, for example by spawning a substitute process to replace a failed process. Supervised processes can in turn supervise other processes, leading to a supervision tree. The supervising and supervised processes may be in different nodes, and on different hosts, and hence the supervision tree may span multiple hosts or nodes.

To provide reliable distributed service registration, a global namespace is maintained on every node, which maps process names to pids. It is this that we mean when we talk about a 'reliable' system: it is one in which a named process in a distributed system can be restarted without requiring the client processes also to be restarted (because the name can still be used for communication).
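In OTP, supervision trees are built with the supervisor behaviour. The following minimal sketch is ours, assuming a hypothetical pong_server module with a start_link/0 function:

    -module(pong_sup).
    -behaviour(supervisor).
    -export([start_link/0, init/1]).

    start_link() ->
        supervisor:start_link({local, pong_sup}, ?MODULE, []).

    %% one_for_one: restart only the failed child, at most 5 times
    %% in 10 seconds before the supervisor itself fails upwards
    %% in the supervision tree.
    init([]) ->
        {ok, {{one_for_one, 5, 10},
              [{pong_server,                    % child id
                {pong_server, start_link, []},  % {Module, Function, Args}
                permanent, 5000, worker, [pong_server]}]}}.

With this strategy only the failed child is restarted; because clients address the server by a registered name rather than its pid, they are unaffected by the restart, which is exactly the reliability idiom discussed next.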

To see global registration in action, consider a pong server process:

    global:register_name(pong_server, Remote_Pong_PID).

Clients of the server can send messages to the registered name, e.g.

    global:whereis_name(pong_server) ! finish.

If the server fails the supervisor can spawn a replacement server process with a new pid and register it with the same name (pong_server). Thereafter client messages to the pong_server will be delivered to the new server process. We return to discuss the scalability limitations of maintaining a global namespace in Section V.

v. ETS: Erlang Term Storage

Erlang is a pragmatic language and the actor model it supports is not pure. Erlang processes, besides communicating via asynchronous message passing, can also share data in public memory areas called ETS tables.

The Erlang Term Storage (ETS) mechanism is a central component of Erlang's implementation. It is used internally by many libraries and underlies the in-memory Mnesia database. ETS tables are key-value stores: they store Erlang tuples where one of the positions in the tuple serves as the lookup key. An ETS table has a type that may be either set, bag or duplicate_bag, implemented as a hash table, or ordered_set, which is currently implemented as an AVL tree. The main operations that ETS supports are table creation, insertion of individual entries and atomic bulk insertion of multiple entries in a table, deletion and lookup of an entry based on some key, and destructive update. The operations are implemented as C built-in functions in the Erlang VM.

The code snippet below creates a set ETS table keyed by the first element of the entry; atomically inserts two elements with keys some_key and 42; updates the value associated with the table entry with key 42; and then looks up this entry.

    Table = ets:new(my_table, [set, public, {keypos, 1}]),
    ets:insert(Table, [{some_key, an_atom_value},
                       {42, {a,tuple,value}}]),
    ets:update_element(Table, 42, [{2, {another,tuple,value}}]),
    [{Key, Value}] = ets:lookup(Table, 42).

ETS tables are heavily used in many Erlang applications. This is partly due to the convenience of sharing data for some programming tasks, but also partly due to their fast implementation. As a shared resource, however, ETS tables induce contention and become a scalability bottleneck, as we shall see in Section i.
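That contention is easy to reproduce. The micro-benchmark below is our own sketch, not from the article: N writer processes hammer a single public table with atomic counter updates, so adding schedulers mainly adds contention on the one shared table.

    -module(ets_stress).
    -export([run/2]).

    %% Spawn N writers that each increment a shared counter Iters times.
    run(N, Iters) ->
        Tab = ets:new(counter, [set, public]),
        ets:insert(Tab, {hits, 0}),
        Parent = self(),
        Pids = [spawn(fun() -> loop(Tab, Iters), Parent ! {self(), done} end)
                || _ <- lists:seq(1, N)],
        [receive {Pid, done} -> ok end || Pid <- Pids],
        ets:lookup(Tab, hits).

    loop(_Tab, 0) -> ok;
    loop(Tab, I) ->
        ets:update_counter(Tab, hits, 1),  % atomic read-modify-write on shared state
        loop(Tab, I - 1).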


III. Benchmarks for scalability and reliability

The two benchmarks that we use throughout this article are Orbit, that measures scalability without looking at reliability, and Ant Colony Optimisation (ACO), that allows us to measure the impact on scalability of adding global namespaces to ensure reliability. The source code for the benchmarks, together with more detailed documentation, is available at https://github.com/release-project/benchmarks/. The RELEASE project team also worked to improve the reliability and scalability of other Erlang programs, including a substantial (approximately 150K lines of Erlang code) Sim-Diasca simulator [16] and an Instant Messenger that is more typical of Erlang applications [23], but we do not cover these systematically here.

i. Orbit

Orbit is a computation in symbolic algebra, which generalises a transitive closure computation [57]. To compute the orbit for a given space [0..X], a list of generators g1, g2, ..., gn are applied on an initial vertex x0 ∈ [0..X]. This creates new values (x1, ..., xn) ∈ [0..X], where xi = gi(x0). The generator functions are applied on the new values until no new value is generated.
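Stripped of distribution, the computation is a worklist fixpoint over a set. The following sequential sketch is ours (the benchmark itself partitions the set over a distributed hash table):

    -module(orbit_seq).
    -export([orbit/2]).

    %% orbit(Gens, Xs0): repeatedly apply every generator to the
    %% frontier of newly discovered values until no new value appears.
    orbit(Gens, Xs0) ->
        worklist(Gens, Xs0, sets:from_list(Xs0)).

    worklist(_Gens, [], Seen) ->
        sets:to_list(Seen);                      % fixpoint reached
    worklist(Gens, Frontier, Seen) ->
        Candidates = [G(X) || G <- Gens, X <- Frontier],
        Fresh = lists:usort([Y || Y <- Candidates,
                                  not sets:is_element(Y, Seen)]),
        worklist(Gens, Fresh, sets:union(Seen, sets:from_list(Fresh))).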

Orbit is a suitable benchmark because it has a number of aspects that characterise a class of real applications. The core data structure it maintains is a set and, in distributed environments, is implemented as a distributed hash table (DHT), similar to the DHTs used in replicated form in NoSQL database management systems. Also, in distributed mode, it uses standard peer-to-peer (P2P) techniques like a credit-based termination algorithm [61]. By choosing the orbit size, the benchmark can be parameterised to specify smaller or larger computations that are suitable to run on a single machine (Section i) or on many nodes (Section i). Moreover it is only a few hundred lines of code.

Figure 2: Distributed Erlang Orbit (D-Orbit) architecture: workers are mapped to nodes.

As shown in Figure 2, the computation is initiated by a master which creates a number of workers. In the single node scenario of the benchmark, workers correspond to processes, but these workers can also spawn other processes to apply the generator functions on a subset of their input values, thus creating intra-worker parallelism. In the distributed version of the benchmark, processes are spawned by the master node to worker nodes, each maintaining a DHT fragment. A newly spawned process gets a share of the parent's credit, and returns this on completion. The computation is finished when the master node collects all credit, i.e. all workers have completed.

ii. Ant Colony Optimisation (ACO)

Ant Colony Optimisation [29] is a meta-heuristic which has been applied to a large number of combinatorial optimisation problems. For the purpose of this article, we have applied it to an NP-hard scheduling problem known as the Single Machine Total Weighted Tardiness Problem (SMTWTP) [63], where a number of jobs of given lengths have to be arranged in a single linear schedule. The goal is to minimise the cost of the schedule, as determined by certain constraints.

The ACO method is attractive from the point of view of distributed computing because it can benefit from having multiple cooperating colonies, each running on a separate compute node and consisting of multiple "ants". Ants are simple computational agents which concurrently compute possible solutions to the input problem guided by shared information about good paths through the search space; there is also a certain amount of stochastic variation which allows the ants to explore new directions. Having multiple colonies increases the number of ants, thus increasing the probability of finding a good solution.

Figure 3: Distributed Erlang Two-level Ant Colony Optimisation (TL-ACO) architecture.

We implement four distributed coordination patterns for the same multi-colony ACO computation as follows.


Figure 4: Distributed Erlang Multi-level Ant Colony Optimisation (ML-ACO) architecture.

In each implementation, the individual colonies perform some number of local iterations (i.e. generations of ants) and then report their best solutions; the globally-best solution is then selected and is reported to the colonies, which use it to update their pheromone matrices. This process is repeated for some number of global iterations.

Two-level ACO (TL-ACO) has a single master node that collects the colonies' best solutions and distributes the overall best solution back to the colonies. Figure 3 depicts the process and node placements of the TL-ACO in a cluster with NC nodes. The master process spawns NC colony processes on available nodes. In the next step, each colony process spawns NA ant processes on the local node. Each ant iterates IA times, returning its result to the colony master. Each colony iterates IM times, reporting their best solution to, and receiving the globally-best solution from, the master process. We validated the implementation by applying the TL-ACO to a number of standard SMTWTP instances [25, 13, 34], obtaining good results in all cases, and confirmed that the number of perfect solutions increases as we increase the number of colonies.

Multi-level ACO (ML-ACO). In TL-ACO the master node receives messages from all of the colonies, and thus could become a bottleneck. ML-ACO addresses this by having a tree of submasters (Figure 4), with each node in the bottom level collecting results from a small number of colonies. These are then fed up through the tree, with nodes at higher levels selecting the best solutions from their children.

Globally Reliable ACO (GR-ACO). In ML-ACO if a single colony fails to report back the system will wait indefinitely. GR-ACO adds fault tolerance, supervising colonies so that a faulty colony can be detected and restarted, allowing the system to continue execution.

Scalable Reliable ACO (SR-ACO) also adds fault-tolerance, but using supervision within our new s_groups from Section i.1, and the architecture of SR-ACO is discussed in detail there.
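The collect-and-broadcast step shared by all four patterns can be sketched as a master loop. This is our illustration; the {best, ...} and {global_best, ...} message shapes are assumptions, not taken from the benchmark sources:

    %% One global iteration of a TL-ACO-style master: ask each colony
    %% to run its local iterations, gather their best solutions,
    %% select the global best, and broadcast it back.
    master_iterate(Colonies) ->
        [C ! {run_local_iterations, self()} || C <- Colonies],
        Results = [receive {best, C, Cost, Sol} -> {Cost, Sol} end
                   || C <- Colonies],
        {BestCost, BestSol} = lists:min(Results),
        [C ! {global_best, BestCost, BestSol} || C <- Colonies],
        {BestCost, BestSol}.

GR-ACO guards the blocking receive here by supervising the colonies, so a crashed colony is restarted rather than leaving the master waiting forever.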


IV. Erlang Scalability Limits

This section investigates the scalability of Erlang at the VM, language, and persistent storage levels. An aspect we choose not to explore is the security of large scale systems where, for example, one might imagine providing enhanced security for systems with multiple clusters or cloud instances connected by a Wide Area Network. We assume that existing security mechanisms are used, e.g. a Virtual Private Network.

i. Scaling Erlang on a Single Host

To investigate Erlang scalability we built BenchErl, an extensible open source benchmark suite with a web interface.⁴ BenchErl shows how an application's performance changes when resources, like cores or schedulers, are added; or when options that control these resources change:

• the number of nodes, i.e. the number of Erlang VMs used, typically on multiple hosts;
• the number of cores per node;
• the number of schedulers, i.e. the OS threads that execute Erlang processes in parallel, and their binding to the topology of the cores of the underlying computer node;
• the Erlang/OTP release and flavor; and
• the command-line arguments used to start the Erlang nodes.

⁴ Information about BenchErl is available at http://release.softlab.ntua.gr/bencherl/.

Figure 5: Runtime and speedup of two configurations of the Orbit benchmark using Erlang/OTP R15B01.

Using BenchErl, we investigated the scalability of an initial set of twelve benchmarks and two substantial Erlang applications using a single Erlang node (VM) on machines with up to 64 cores, including the Bulldozer machine specified in Appendix A. This set of experiments, reported by [8], confirmed that some programs scaled well in the most recent Erlang/OTP release of the time (R15B01) but also revealed VM and language level scalability bottlenecks.

Figure 5 shows runtime and speedup curves for the Orbit benchmark where master and workers run on a single Erlang node in configurations with and without intra-worker parallelism. In both configurations the program scales. Runtime continuously decreases as we add more schedulers to exploit more cores. The speedup of the benchmark without intra-worker parallelism, i.e. without spawning additional processes for the computation (green curve), is almost linear up to 32 cores but


increases less rapidly from that point on; we see a similar but more clearly visible pattern for the other configuration (red curve) where there is no performance improvement beyond 32 schedulers. This is due to the asymmetric characteristics of the machine's micro-architecture, which consists of modules that couple two conventional x86 out-of-order cores that share the early pipeline stages, the floating point unit, and the L2 cache with the rest of the module [3].

Figure 6: Runtime and speedup of the ets_test BenchErl benchmark using Erlang/OTP R15B01.

Some other benchmarks, however, did not scale well or experienced significant slowdowns when run in many VM schedulers (threads). For example the ets_test benchmark has multiple processes accessing a shared ETS table. Figure 6 shows runtime and speedup curves for ets_test on a 16-core (eight cores with hyperthreading) Intel Xeon-based machine. It shows that runtime increases beyond two schedulers, and that the program exhibits a slowdown instead of a speedup.

For many benchmarks there are obvious reasons for poor scaling like limited parallelism in the application, or contention for shared resources. The reasons for poor scaling are less obvious for other benchmarks, and it is exactly these we have chosen to study in detail in subsequent work [8, 46].

A simple example is the parallel BenchErl benchmark, that spawns some n processes, each of which creates a list of m timestamps and, after it checks that each timestamp in the list is strictly greater than the previous one, sends the result to its parent. Figure 7 shows that up to eight cores each additional core leads to a slowdown, thereafter a small speedup is obtained up to 32 cores, and then again a slowdown. A small aspect of the benchmark, easily overlooked, explains the poor scalability. The benchmark creates timestamps using the erlang:now/0 built-in function, whose implementation acquires a global lock in order to return a unique timestamp. That is, two calls to erlang:now/0, even from different processes, are guaranteed to produce monotonically increasing values. This lock is precisely the bottleneck in the VM that limits the scalability of this benchmark. We describe our work to address VM timing issues in Section iii.

Figure 7: Runtime and speedup of the BenchErl benchmark called parallel using Erlang/OTP R15B01.
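The kernel of parallel can be reconstructed as follows (a sketch, not BenchErl's exact code). Every erlang:now/0 call contends for the same global lock; OTP 18 later introduced scalable time primitives such as erlang:monotonic_time/0 that avoid this bottleneck:

    %% n processes each build a list of m timestamps and check that
    %% the list is strictly increasing.
    run(N, M) ->
        Parent = self(),
        Pids = [spawn(fun() ->
                          Ts = [erlang:now() || _ <- lists:seq(1, M)],
                          true = strictly_increasing(Ts),
                          Parent ! {self(), ok}
                      end) || _ <- lists:seq(1, N)],
        [receive {Pid, ok} -> ok end || Pid <- Pids].

    strictly_increasing([A, B | Rest]) ->
        A < B andalso strictly_increasing([B | Rest]);
    strictly_increasing(_) -> true.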


Discussion Our investigations identified contention for shared ETS tables, and for commonly-used shared resources like timers, as the key VM-level scalability issues. Section VI outlines how we addressed these issues in recent Erlang/OTP releases.

ii. Distributed Erlang Scalability

Network Connectivity Costs When any normal distributed Erlang nodes communicate, they share their connection sets and this typically leads to a fully connected graph of nodes. So a system with n nodes will maintain O(n²) connections, and these are relatively expensive TCP connections with continual maintenance traffic. This design aids transparent distribution as there is no need to discover nodes, and the design works well for small numbers of nodes. However at emergent server architecture scales, i.e. hundreds of nodes, this design becomes very expensive and system architects must switch from the default Erlang model, e.g. they need to start using hidden nodes that do not share connection sets.

We have investigated the scalability limits imposed by network connectivity costs using several Orbit calculations on two large clusters: Kalkyl and Athos as specified in Appendix A. The Kalkyl results are discussed by [21], and Figure 29 in Section i shows representative results for distributed Erlang computing orbits with 2M and 5M elements on Athos. In all cases performance degrades beyond some scale (40 nodes for the 5M orbit, and 140 nodes for the 2M orbit). Figure 32 illustrates the additional network traffic induced by the fully connected network. It allows a comparison between the number of packets sent in a fully connected network (ML-ACO) with those sent in a network partitioned using our new s_groups (SR-ACO).

Global Information Costs Maintaining global information is known to limit the scalability of distributed systems, and crucially the process namespaces used for reliability are global. To investigate the scalability limits imposed on distributed Erlang by such global information we have designed and implemented DE-Bench, an open source, parameterisable and scalable peer-to-peer benchmarking framework [37, 36]. DE-Bench measures the throughput and latency of distributed Erlang commands on a cluster of Erlang nodes, and the design is influenced by the Basho Bench benchmarking tool for Riak [12]. Each DE-Bench instance acts as a peer, providing scalability and reliability by eliminating central coordination and any single point of failure.

To evaluate the scalability of distributed Erlang, we measure how adding more hosts increases the throughput, i.e. the total number of successfully executed distributed Erlang commands per experiment. Figure 8 shows the parameterisable internal workflow of DE-Bench. There are three classes of commands in DE-Bench: (i) Point-to-Point (P2P) commands, where a function with tunable argument size and computation time is run on a remote node, include spawn, rpc, and synchronous calls to server processes, i.e. gen_server or gen_fsm. (ii) Global commands, which entail synchronisation across all connected nodes, such as global:register_name and global:unregister_name. (iii) Local commands, which are executed independently by a single node, e.g. whereis_name, a look up in the local name table.
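The three classes correspond to calls of roughly the following shape (our illustration, not DE-Bench's actual code):

    %% (i) P2P: run a function on an explicitly named remote node.
    p2p_cmd(Node) ->
        rpc:call(Node, erlang, length, [[1, 2, 3]]).

    %% (ii) Global: registration synchronises every connected node.
    global_cmd(Name, Pid) ->
        yes = global:register_name(Name, Pid).

    %% (iii) Local: name lookup reads the local replica of the table.
    local_cmd(Name) ->
        global:whereis_name(Name).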


Figure 8: DE-Bench's internal workflow.

The benchmarking is conducted on 10 to 100 host configurations of the Kalkyl cluster (in steps of 10) and measures the throughput of successful commands per second over 5 minutes. There is one Erlang VM on each host and one DE-Bench instance on each VM. The full paper [37] investigates the impact of data size and computation time in P2P calls, both independently and in combination, and the scaling properties of the common Erlang/OTP generic server processes gen_server and gen_fsm. Here we focus on the impact of different proportions of global commands, mixing global with P2P and local commands. Figure 9 shows that even a low proportion of global commands limits the scalability of distributed Erlang, e.g. just 0.01% global commands limits scalability to around 60 nodes. Figure 10 reports the latency of all commands and shows that, while the latencies for P2P and local commands are stable at scale, the latency of the global commands increases dramatically with scale. Both results illustrate that the impact of global operations on throughput and latency in a distributed Erlang system is severe.

Explicit Placement While network connectivity and global information impact performance at scale, our investigations also identified explicit process placement as a programming issue at scale. Recall from Section iii that distributed Erlang requires the programmer to identify an explicit Erlang node (VM) when spawning a process. Identifying an appropriate node becomes a significant burden for large and dynamic systems. The problem is exacerbated in large distributed systems where (1) the hosts may not be identical, having different hardware capabilities or different software installed; and (2) communication times may be non-uniform: it may be fast to send a message between VMs on the same host, and slow if the VMs are on different hosts in a large distributed system.

These factors make it difficult to deploy applications, especially in a scalable and portable manner. Moreover while the programmer may be able to use platform-specific knowledge to decide where to spawn processes to enable an application to run efficiently, if the application is then deployed on a different platform, or if the platform changes as hosts fail or are added, this becomes outdated.


Figure 9: Scalability vs. percentage of global commands in Distributed Erlang.

Discussion Our investigations confirm three language-level scalability limitations of Erlang from developer folklore. (1) Maintaining a fully connected network of Erlang nodes limits scalability, for example Orbit is typically limited to just 40 nodes. (2) Global operations, and crucially the global operations required for reliability, i.e. to maintain a global namespace, seriously limit the scalability of distributed Erlang systems. (3) Explicit process placement makes it hard to build performance portable applications for large architectures. These issues cause designers of reliable large scale systems in Erlang to depart from the standard Erlang model, e.g. using techniques like hidden nodes and storing pids in data structures. In Section V we develop language technologies to address these issues.

iii. Persistent Storage

Any large scale system needs reliable scalable persistent storage, and we have studied the scalability limits of Erlang persistent storage alternatives [38]. We envisage a typical large server having around 10^5 cores on around 100 hosts. We have reviewed the requirements for scalable and available persistent storage and evaluated four popular Erlang DBMS against these requirements. For a target scale of around 100 hosts, Mnesia and CouchDB are, unsurprisingly, not suitable. However, Dynamo-style NoSQL DBMS like Cassandra and Riak have the potential to be.

We have investigated the current scalability limits of the Riak NoSQL DBMS using the Basho Bench benchmarking framework on a cluster with up to 100 nodes and independent disks. We found that the scalability limit of Riak version 1.1.1 is 60 nodes on the Kalkyl cluster. The study placed into the public scientific domain what was previously well-evidenced, but anecdotal, developer experience.

We have also shown that resources like memory, disk, and network do not limit the scalability of Riak. By instrumenting the global and gen_server OTP libraries we identified a specific Riak remote procedure call that fails to scale. We outline how later releases of Riak are refactored to eliminate the scalability bottlenecks.


Figure 10: Latency of commands as the number of Erlang nodes increases.

Discussion We conclude that Dynamo-like NoSQL DBMSs have the potential to deliver reliable persistent storage for Erlang at our target scale of approximately 100 hosts. Specifically an Erlang Cassandra interface is available and Riak 1.1.1 already provides scalable and available persistent storage on 60 nodes. Moreover the scalability of Riak is much improved in subsequent versions.

V. Improving Language Scalability

This section outlines the Scalable Distributed (SD) Erlang libraries [21] we have designed and implemented to address the distributed Erlang scalability issues identified in Section ii. SD Erlang introduces two concepts to improve scalability. S_groups partition the set of nodes in an Erlang system to reduce network connectivity and partition global data (Section i.1). Semi-explicit placement alleviates the issues of explicit process placement in large heterogeneous networks (Section i.2). The two features are independent and can be used separately or in combination. We overview SD Erlang in Section i, and outline s_group semantics and validation in Sections ii and iii respectively.

i. SD Erlang Design

i.1 S_groups

S_groups reduce both the number of connections a node maintains and the size of name spaces, i.e. they minimise global information. Specifically names are registered on, and synchronised between, only the nodes within the s_group. An s_group has the following parameters: a name, a list of nodes, and a list of registered names. A node can belong to many s_groups or to none. If a node belongs to no s_group it behaves as a usual distributed Erlang node.

The s_group library defines the functions shown in Table 1. Some of these functions manipulate s_groups and provide information about them, such as creating s_groups and providing a list of nodes from a given s_group. The remaining functions manipulate names registered in s_groups and provide information about these names. For example, to register a process, Pid, with name Name in s_group SGroupName we use the following function; the name will only be registered if the process is being executed on a node that belongs to the given s_group, and neither Name nor Pid are already registered in that group.

    s_group:register_name(SGroupName, Name, Pid) -> yes | no


Table 1: Summary of s_group functions.

new_s_group(SGroupName, Nodes): creates a new s_group consisting of some nodes.
delete_s_group(SGroupName): deletes an s_group.
add_nodes(SGroupName, Nodes): adds a list of nodes to an s_group.
remove_nodes(SGroupName, Nodes): removes a list of nodes from an s_group.
s_groups(): returns a list of all s_groups known to the node.
own_s_groups(): returns a list of s_group tuples the node belongs to.
own_nodes(): returns a list of nodes from all s_groups the node belongs to.
own_nodes(SGroupName): returns a list of nodes from the given s_group.
info(): returns s_group state information.
register_name(SGroupName, Name, Pid): registers a name in the given s_group.
re_register_name(SGroupName, Name, Pid): re-registers a name (changes a registration) in a given s_group.
unregister_name(SGroupName, Name): unregisters a name in the given s_group.
registered_names({node, Node}): returns a list of all registered names on the given node.
registered_names({s_group, SGroupName}): returns a list of all registered names in the given s_group.
whereis_name(SGroupName, Name), whereis_name(Node, SGroupName, Name): return the pid of a name registered in the given s_group.
send(SGroupName, Name, Msg), send(Node, SGroupName, Name, Msg): send a message to a name registered in the given s_group.

To illustrate the impact of s_groups on scalability we repeat the global operations experiment from Section ii (Figure 9). In the SD Erlang experiment we partition the set of nodes into s_groups each containing ten nodes, and hence the names are replicated and synchronised on just ten nodes, and not on all nodes as in distributed Erlang. The results in Figure 11 show that with 0.01% of global operations the throughput of distributed Erlang stops growing at 40 nodes while the throughput of SD Erlang continues to grow linearly.
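Putting Table 1 to work, a deployment function might create a group and register a server as follows. This is a sketch: the node names are placeholders, the nodes are assumed to be already running, and the return shapes follow Table 1 and the create_s_group/3 example later in this section.

    %% Create an s_group of two nodes and register a server in it.
    {ok, group1, _Nodes} =
        s_group:new_s_group(group1, ['node1@host1', 'node2@host2']),
    yes = s_group:register_name(group1, pong_server, Pong_PID),

    %% The name is synchronised only within group1; a client on a
    %% group1 node resolves and uses it with:
    Pid = s_group:whereis_name(group1, pong_server),
    Pid ! finish.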



Figure 11: Global operations in Distributed Erlang vs. SD Erlang.

The connection topology of s_groups is extremely flexible: they may be organised into a hierarchy of arbitrary depth or branching, e.g. there could be multiple levels in the tree of s_groups; see Figure 12. Moreover it is not necessary to create a hierarchy of s_groups; for example, we have constructed an Orbit implementation using a ring of s_groups.

Given such a flexible way of organising distributed systems, key questions in the design of an SD Erlang system are the following. How should s_groups be structured? Depending on the reason the nodes are grouped (reducing the number of connections, or reducing the namespace, or both) s_groups can be freely structured as a tree, ring, or some other topology. How large should the s_groups be? Smaller s_groups mean more inter-group communication, but the synchronisation of the s_group state between the s_group nodes constrains the maximum size of s_groups. We have not found this constraint to be a serious restriction. For example many s_groups are either relatively small, e.g. 10-node, internal or terminal elements in some topology, e.g. the leaves and nodes of a tree. How do nodes from different s_groups communicate? While any two nodes can communicate in an SD Erlang system, to minimise the number of connections communication between nodes from different s_groups is typically routed via gateway nodes that belong to both s_groups. How do we avoid single points of failure? For reliability, and to minimise communication load, multiple gateway nodes and processes may be required.

Information to make these design choices is provided by the tools in Section VII and by benchmarking. A further challenge is how to systematically refactor a distributed Erlang application into SD Erlang, and this is outlined in Section i. A detailed discussion of distributed system design and refactoring in SD Erlang is provided in a recent article [22].

We illustrate typical SD Erlang system designs by showing refactorings of the Orbit and ACO benchmarks from Section III. In both distributed and SD Erlang the computation starts on the Master node and the actual computation is done on the Worker nodes. In the distributed Erlang version all nodes are interconnected, and messages are transferred directly from the sending node to the receiving node (Figure 2). In contrast, in the SD Erlang version nodes are grouped into s_groups, and messages are transferred between different s_groups via Submaster nodes (Figure 13).



Figure 12: SD Erlang ACO (SR-ACO) architecture.

A fragment of code that creates an s_group on a Sub-master node is as follows:

    create_s_group(Master, GroupName, Nodes0) ->
        case s_group:new_s_group(GroupName, Nodes0) of
            {ok, GroupName, Nodes} ->
                Master ! {GroupName, Nodes};
            _ ->
                io:format("exception: message")
        end.

Similarly, we introduce s_groups in the GR-ACO benchmark from Section ii to create Scalable Reliable ACO (SR-ACO); see Figure 12. Here, apart from reducing the number of connections, s_groups also reduce the global namespace information. That is, instead of registering the name of a pid globally, i.e. with all nodes, the name is registered with s_group:register_name/3 only on the nodes of the s_group.

A comparative performance evaluation of distributed Erlang and SD Erlang Orbit and ACO is presented in Section VIII.

i.2 Semi-Explicit Placement

Recall from Section iii that distributed Erlang spawns a process onto an explicitly named Erlang node, e.g.

    spawn(some_node, fun some_module:pong/0).

and also recall the portability and programming effort issues associated with such explicit placement in large scale systems discussed in Section ii.

To address these issues we have developed a semi-explicit placement library that enables the programmer to select nodes on which to spawn processes based on run-time information about the properties of the nodes. For example, if a process performs a lot of computation one would like to spawn it on a node with considerable computation power, or if two processes are likely to communicate frequently then it would be desirable to spawn them on the same node, or nodes with a fast interconnect.

We have implemented two Erlang libraries to support semi-explicit placement [60]. The first deals with node attributes, and describes properties of individual Erlang VMs and associated hosts, such as total and currently available RAM, installed software, hardware configuration, etc. The second deals with a notion of communication distances which models the communication times between nodes in a distributed system. Therefore, instead of specifying a node we can use the attr:choose_node/1 function to define the target node, i.e.

    spawn(attr:choose_node(Params), fun some_module:pong/0).
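The idea can be illustrated with a toy selector of our own devising; this is not the API of the libraries of [60], which obtain attribute and distance information automatically:

    %% Toy semi-explicit placement: AttrFun maps a node to a property
    %% map; choose the first connected node satisfying Pred, falling
    %% back to the local node.
    choose_node(Pred, AttrFun) ->
        case [N || N <- [node() | nodes()], Pred(AttrFun(N))] of
            [Node | _] -> Node;
            []         -> node()
        end.

    %% Example: place a compute-heavy process on a node advertising
    %% at least eight cores (the #{cores := _} attribute is assumed):
    %% spawn(choose_node(fun(#{cores := C}) -> C >= 8 end, AttrFun),
    %%       fun some_module:pong/0).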


Figure 13: SD Erlang (SD-Orbit) architecture.

[60] report an investigation into the communication latencies on a range of NUMA and cluster architectures, and demonstrate the effectiveness of the placement libraries using the ML-ACO benchmark on the Athos cluster.

ii. S_group Semantics

For precise specification, and as a basis for validation, we provide a small-step operational semantics of the s_group operations [21]. Figure 14 defines the state of an SD Erlang system and associated abstract syntax variables. The abstract syntax variables on the left are defined as members of sets, denoted {}, and these in turn may contain tuples, denoted (), or further sets. In particular nm is a process name, p a pid, ni a node_id, and nis a set of node_ids.

The state of a system is modelled as a four tuple comprising a set of s_groups, a set of free_groups, a set of free_hidden_groups, and a set of nodes. Each type of group is associated with nodes and has a namespace. An s_group additionally has a name, whereas a free_hidden_group consists of only one node, i.e. a hidden node simultaneously acts as a node and as a group, because as a group it has a namespace but does not share it with any other node. Free normal and hidden groups have no names, and are uniquely defined by the nodes associated with them. Therefore, group names, gr_names, are either NoGroup or a set of s_group_names. A namespace is a set of name and process id, pid, pairs and is replicated on all nodes of the associated group.

A node has the following four parameters: node_id identifier, node_type that can be either hidden or normal, connections, and group_names, i.e. names of groups the node belongs to. The node can belong to either a list of s_groups or one of the free groups. The type of the free group is defined by the node type. Connections are a set of node_ids.

Transitions in the semantics have the form (state, command, ni) −→ (state′, value), meaning that executing command on node ni in state returns value and transitions to state′.

The semantics is presented in more detail by [21], but we illustrate it here with the s_group:registered_names/1 function from Section i.1. The function returns a list of names registered in s_group s if node ni belongs to the s_group, and an empty list otherwise (Figure 15).


(grs, fgs, fhs, nds) ∈ {state} ≡ ({s_group}, {free_group}, {free_hidden_group}, {node})
gr ∈ grs ≡ {s_group} ≡ (s_group_name, {node_id}, namespace)
fg ∈ fgs ≡ {free_group} ≡ ({node_id}, namespace)
fh ∈ fhs ≡ {free_hidden_group} ≡ (node_id, namespace)
nd ∈ nds ≡ {node} ≡ (node_id, node_type, connections, gr_names)
gs ∈ {gr_names} ≡ NoGroup, {s_group_name}
ns ∈ {namespace} ≡ {(name, pid)}
cs ∈ {connections} ≡ {node_id}
nt ∈ {node_type} ≡ {Normal, Hidden}
s ∈ {NoGroup, s_group_name}

Figure 14: SD Erlang state [21].

Here ⊕ denotes disjoint set union; IsSGroupNode returns true if node ni is a member of some s_group s, and false otherwise; and OutputNms returns the set of process names registered in the namespace ns of s_group s.

iii. Semantics Validation

As the semantics is concrete it can readily be made executable in Erlang, with lists replacing sets throughout. Having an executable semantics allows users to engage with it, and to understand how the semantics behaves vis-à-vis the library, giving them an opportunity to assess the correctness of the library against the semantics.

Better still, we can automatically assess how the system behaves in comparison with the (executable) semantics by executing them in lockstep, guided by the constraints of which operations are possible at each point. We do that by building an abstract state machine model of the library. We can then generate random sequences (or traces) through the model, with appropriate library data generated too. This random generation is supported by the QuickCheck property-based testing system [24, 9].

The architecture of the testing framework is shown in Figure 16. First an abstract state machine, embedded as an "eqc_statem" module, is derived from the executable semantic specification. The state machine defines the abstract state representation and the transition from one state to another when an operation is applied.

Figure 16: Testing s_groups using QuickCheck.
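Returning to the executable semantics: a minimal sketch of how the registered_names rule of Figure 15 (below) can be rendered as an Erlang function, with lists standing in for sets. The representation is ours for illustration, not the project's actual specification module.

    %% Executable rendering of s_group:registered_names/1 over the
    %% state four-tuple of Figure 14; sets are represented as lists.
    registered_names({SGroups, _FGs, _FHs, _Nds} = State, S, Ni) ->
        case lists:keyfind(S, 1, SGroups) of
            {S, NodeIds, Ns} ->
                case lists:member(Ni, NodeIds) of
                    true  -> {State, [{S, Nm} || {Nm, _Pid} <- Ns]};
                    false -> {State, []}
                end;
            false ->
                {State, []}
        end.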


    ((grs, fgs, fhs, nds), s_group:registered_names(s), ni)
        −→ ((grs, fgs, fhs, nds), nms)    if IsSGroupNode(ni, s, grs)
        −→ ((grs, fgs, fhs, nds), {})     otherwise

    where (s, {ni} ⊕ nis, ns) ⊕ grs′ ≡ grs
          nms ≡ OutputNms(s, ns)

    IsSGroupNode(ni, s, grs) = ∃ nis, ns, grs′ . (s, {ni} ⊕ nis, ns) ⊕ grs′ ≡ grs

    OutputNms(s, ns) = {(s, nm) | (nm, p) ∈ ns}

Figure 15: SD Erlang semantics of the s_group:registered_names/1 function.

Test case and data generators are then defined to control the test case generation; this includes the automatic generation of eligible s_group operations and the input data to those operations. Test oracles are encoded as the postcondition for s_group operations.

During testing, each test command is applied both to the abstract model and to the s_group library. The application of the test command to the abstract model takes the abstract model from its current state to a new state, as described by the transition functions; whereas the application of the test command to the library leads the system to a new actual state. The actual state information is collected from each node in the distributed system, then merged and normalised to the same format as the abstract state representation. For a test to be successful, after the execution of a test command the test oracles specified for that command should be satisfied. Various test oracles can be defined for s_group operations; for instance, one generic constraint that applies to all the s_group operations is that after each s_group operation the normalised system state should be equivalent to the abstract state.

Thousands of tests were run, and three kinds of errors, all subsequently corrected, were found. Some errors in the library implementation were found, including one error due to the synchronisation between nodes, and another related to the remove_nodes operation, which erroneously raised an exception. We also found a couple of trivial errors in the semantic specification itself, which had been missed by manual examination. Finally, we found some situations where there were inconsistencies between the semantics and the library implementation, despite their states being equivalent: an example of this was in the particular values returned by functions on certain errors. Overall, the automation of testing boosted our confidence in the correctness of the library implementation and the semantic specification. This work is reported in more detail by [54].

VI. Improving VM Scalability

This section reports the primary VM and library improvements we have designed and implemented to address the scalability and reliability issues identified in Section i.


i. Improvements to Erlang Term Storage

Because ETS tables are so heavily used in Erlang systems, they are a focus for scalability improvements. We start by describing their redesign, including some improvements that pre-date our RELEASE project work, i.e. those prior to Erlang/OTP R15B03. These historical improvements are very relevant for a scalability study and form the basis for our subsequent changes and improvements. At the point when Erlang/OTP got support for multiple cores (in release R11B), there was a single reader-writer lock for each ETS table. Optional fine-grained locking of hash-based ETS tables (i.e. set, bag or duplicate_bag tables) was introduced in Erlang/OTP R13B02-1, adding 16 reader-writer locks for the hash buckets. Reader groups to minimise read synchronisation overheads were introduced in Erlang/OTP R14B. The key observation is that a single count of the multiple readers must be synchronised across many cache lines, potentially far away in a NUMA system. Maintaining reader counts in multiple (local) caches makes reads fast, although writes must now check every reader count. In Erlang/OTP R16B the number of bucket locks, and the default number of reader groups, were both increased from 16 to 64.

Figure 17: Runtime of 17M ETS operations in different Erlang/OTP releases: 10% updates (left) and 1% updates (right). The y-axis shows time in seconds (lower is better) and the x-axis is the number of OS threads (VM schedulers).

We illustrate the scaling properties of the ETS concurrency options using the ets_bench BenchErl benchmark on the Intel NUMA machine with 32 hyperthreaded cores specified in Appendix A. The ets_bench benchmark inserts 1M items into the table, then records the time to perform 17M operations, where an operation is either a lookup, an insert, or a delete. The experiments are conducted on a hash-based (set) ETS table with different percentages of update operations, i.e. insertions or deletions. Figure 17 shows the runtimes in seconds of 17M operations in different Erlang/OTP versions, varying the number of schedulers (x-axis), reflecting how the scalability of ETS tables has improved in more recent Erlang/OTP releases. Figure 18 shows the runtimes in seconds of 17M operations on an ETS table with different numbers of reader groups, again varying the number of schedulers. We see that one reader group is not sufficient with 10% updates, nor are two with 1% updates. Beyond that, different numbers of reader groups have little impact on the benchmark performance, except that using 64 groups with 10% updates slightly degrades performance.

We have explored four other extensions or redesigns of the ETS implementation for better scalability: (1) allowing more programmer control over the number of bucket locks in hash-based tables, so the programmer can reflect the number of schedulers and the expected access pattern; (2) using contention-adapting trees to get better scalability for ordered_set ETS tables, as described by [69]; (3) using queue delegation locking to improve scalability [47]; and (4) adopting schemes for completely eliminating the locks in the meta table. A more complete discussion of our work on ETS can be found in the papers by [69] and [47].
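As a concrete illustration of the concurrency options discussed above, the following sketch creates a hash-based set table with the (real) read_concurrency and write_concurrency options, and runs the kind of mixed lookup/insert/delete workload that ets_bench measures. The module structure and operation mix are our illustrative assumptions, not the benchmark source.

    %% A set table with fine-grained bucket locking (write_concurrency)
    %% and reader groups (read_concurrency) enabled.
    bench() ->
        T = ets:new(bench_table, [set, public,
                                  {read_concurrency, true},
                                  {write_concurrency, true}]),
        [true = ets:insert(T, {K, K}) || K <- lists:seq(1, 1000000)],
        %% ~10% updates: 1 insert and 1 delete per 18 lookups.
        do_ops(T, 17000000).

    do_ops(_T, 0) -> ok;
    do_ops(T, N) ->
        K = rand:uniform(1000000),
        case N rem 20 of
            0 -> ets:insert(T, {K, N});
            1 -> ets:delete(T, K);
            _ -> ets:lookup(T, K)
        end,
        do_ops(T, N - 1).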


Figure 18: Runtime of 17M ETS operations with varying numbers of reader groups (RG): 10% updates (left) and 1% updates (right). The y-axis shows runtime in seconds (lower is better) and the x-axis is the number of schedulers.

Figure 19: Throughput of CA tree variants, set and ordered_set (operations per microsecond vs. number of threads): 10% updates (left) and 1% updates (right).

Here we outline only our work on contention-adapting (CA) trees. A CA tree monitors contention in different parts of a tree-shaped data structure, introducing routing nodes with locks in response to high contention, and removing them in response to low contention. For experimental purposes two variants of the CA tree have been implemented to represent ordered_sets in the virtual machine of Erlang/OTP 17.0. One extends the existing AVL trees in the Erlang VM, and the other uses a Treap data structure [5]. Figure 19 compares the throughput of the CA tree variants with that of ordered_set and set as the number of schedulers increases. It is unsurprising that the CA trees scale so much better than an ordered_set, which is protected by a single readers-writer lock. It is more surprising that they also scale better than set. This is because hash tables use fine-grained locking at a fixed granularity, while CA trees can adapt the number of locks to the current contention level, and also to the parts of the key range where contention is occurring.
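A note on how this line of work later surfaced in the mainstream API: from Erlang/OTP 22 (to the best of our knowledge; this post-dates the releases discussed here) an ordered_set table with write_concurrency enabled is backed by a contention-adapting tree, so the adaptive locking described above is obtained with a one-line table option.

    %% ordered_set + write_concurrency selects the CA-tree-backed
    %% implementation in recent Erlang/OTP releases (hedged: OTP 22+).
    ca_table() ->
        T = ets:new(ca_demo, [ordered_set, public,
                              {write_concurrency, true}]),
        true = ets:insert(T, {some_key, 1}),
        T.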

ii. Improvements to Schedulers

In the Erlang VM, a scheduler is responsible for executing multiple processes concurrently, in a timely and fair fashion, making optimal use of hardware resources. The VM implements preemptive multitasking with soft real-time guarantees. Erlang processes are normally scheduled on a reduction count basis, where one reduction is roughly equivalent to a function call. Each process is allowed to execute until it either blocks waiting for input, typically a message from some other process, or until it has executed its quota of reductions.

The Erlang VM is usually started with one scheduler per logical core (SMT-thread) available on the host machine, and schedulers are implemented as OS threads. When an Erlang process is spawned it is placed in the run queue of the scheduler of its parent, and it waits on that queue until the scheduler allocates it a slice of core time. Work stealing is used to balance load between cores: an idle scheduler may migrate a process from another run queue. Scheduler run queues are visualised in Figure iii.2.

The default load management mechanism is load compaction, which aims to keep as many scheduler threads as possible fully loaded with work, i.e. it attempts to ensure that scheduler threads do not run out of work. We have developed a new optional scheduler utilisation balancing mechanism that is available from Erlang/OTP 17.0. The new mechanism aims to balance scheduler utilisation between schedulers; that is, it strives for equal scheduler utilisation on all schedulers.

The scheduler utilisation balancing mechanism has no performance impact on the system when not enabled. On the other hand, when enabled, it results in changed timing in the system; normally there is a small overhead due to measuring utilisation and calculating balancing information, which depends on the underlying primitives provided by the operating system.

The new balancing mechanism results in a better distribution of processes to schedulers, reducing the probability of core contention. Together with other VM improvements, such as interruptable BIFs and garbage collection, it results in lower latency and improved responsiveness, and hence reliability, for soft real-time applications.

iii. Improvements to Time Management

Soon after the start of the RELEASE project, time management in the Erlang VM became a scalability bottleneck for many applications, as illustrated by the parallel benchmark in Figure i. The issue came to prominence as other, more severe, bottlenecks were eliminated. This subsection motivates and outlines the improvements to time management that we made; these were incorporated into Erlang/OTP 18.x as a new API for time and time warping. The old API is still supported at the time of writing, but its use is deprecated.

The original time API provides the erlang:now/0 built-in that returns "Erlang system time", or time since Epoch, with microsecond resolution. This time is the basis for all time internally in the Erlang VM.

Many of the scalability problems of erlang:now/0 stem from its specification, written at a time when the Erlang VM was not multi-threaded, i.e. SMT-enabled. The documentation promises that values returned by it are strictly increasing, and many applications ended up relying on this. For example, applications often employ erlang:now/0 to generate unique integers.

Erlang system time should align with the operating system's view of time since Epoch, or "OS system time". However, while OS system time can be freely changed both forwards and backwards, Erlang system time cannot, without invalidating the strictly increasing value guarantee.

The Erlang VM therefore contains a mechanism that slowly adjusts Erlang system time towards OS system time if they do not align.

One problem with time adjustment is that the VM deliberately presents time with an inaccurate frequency; this is required to align Erlang system time with OS system time smoothly when the two have deviated, e.g. in the case of clock shifts when leap seconds are inserted or deleted. Another problem is that Erlang system time and OS system time can differ for very long periods of time. In the new API we resolve this using a common OS technique [59], i.e. a monotonic time that has its zero point at some unspecified point in time. Monotonic time is not allowed to make leaps forwards or backwards, while system time is allowed to do this. Erlang system time is thus just a dynamically varying offset from Erlang monotonic time.

Time Retrieval
Retrieval of Erlang system time was previously protected by a global mutex, which made the operation thread safe, but scaled poorly. Erlang system time and Erlang monotonic time need to run at the same frequency, otherwise the time offset between them would not be constant. In the common case, monotonic time delivered by the operating system is solely based on the machine's local clock and cannot be changed, while the system time is adjusted using the Network Time Protocol (NTP). That is, they will run at different frequencies. Linux is an exception, with a monotonic clock that is NTP adjusted and runs at the same frequency as system time [76]. To align the frequencies of Erlang monotonic time and Erlang system time, we adjust the frequency of the Erlang monotonic clock. This is done by comparing monotonic time and system time delivered by the OS, and calculating an adjustment. To achieve this scalably, one VM thread calculates the time adjustment to use at least once a minute. If the adjustment needs to be changed, new adjustment information is published and used to calculate Erlang monotonic time in the future.

When a thread needs to retrieve time, it reads the monotonic time delivered by the OS and the time adjustment information previously published, and calculates Erlang monotonic time. To preserve monotonicity it is important that all threads that read the same OS monotonic time map it to exactly the same Erlang monotonic time. This requires synchronisation on updates to the adjustment information using a readers-writer (RW) lock. This RW lock is write-locked only when the adjustment information is changed. This means that in the vast majority of cases the RW lock will be read-locked, which allows multiple readers to run concurrently. To prevent bouncing the lock cache-line, we use a bespoke reader-optimised RW lock implementation where reader threads notify about their presence in counters on separate cache-lines. The concept is similar to the reader indicator algorithm described by [48, Fig. 11]; alternatives include the ingress-egress counter used by [18] and the SNZI algorithm of [30].

Timer Wheel and BIF Timer
The timer wheel contains all timers set by Erlang processes. The original implementation was protected by a global mutex and scaled poorly. To increase concurrency, each scheduler thread has been assigned its own timer wheel that is used by processes executing on the scheduler.

The implementation of timers in Erlang/OTP uses a built-in function (BIF), as most low-level operations do. Until Erlang/OTP 17.4 this BIF was also protected by a global mutex. Besides inserting timers into the timer wheel, the BIF timer implementation also maps timer references to a timer in the timer wheel. To improve concurrency, from Erlang/OTP 18 we provide scheduler-specific BIF timer servers as Erlang processes. These keep information about timers in private ETS tables and only insert one timer at a time into the timer wheel.
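The new API can be used directly; the following minimal sketch uses only BIFs that exist in Erlang/OTP 18 and later.

    %% Erlang system time (in native units) is monotonic time plus the
    %% dynamically varying time offset.
    system_time() ->
        erlang:monotonic_time() + erlang:time_offset().

    %% A scalable replacement for the old idiom of calling erlang:now/0
    %% to obtain unique, strictly increasing integers.
    unique_id() ->
        erlang:unique_integer([monotonic, positive]).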


Benchmarks
We have measured several benchmarks on a 16-core Bulldozer machine with eight dual CPU AMD Opteron 4376 HEs (see §2.5.4 of the RELEASE project Deliverable 2.4: http://release-project.eu/documents/D2.4.pdf). We present three of them here.

The first micro benchmark compares the execution time of an Erlang receive with that of a receive after that specifies a timeout and provides a default value; a sketch of this pattern appears after Figure 20. The receive after sets a timer when the process blocks in the receive, and cancels it when a message arrives. In Erlang/OTP 17.4 the total execution time with standard timers is 62% longer than without timers. Using the improved implementation in Erlang/OTP 18.0, the total execution time with the optimised timers is only 5% longer than without timers.

The second micro benchmark repeatedly checks the system time, calling the built-in erlang:now/0 in Erlang/OTP 17.4, and calling both erlang:monotonic_time/0 and erlang:time_offset/0 and adding the results in Erlang/OTP 18.0. On this machine, where the VM uses 16 schedulers by default, the 18.0 release is more than 69 times faster than the 17.4 release.

The third benchmark is the parallel BenchErl benchmark from Section i. Figure 20 shows the results of executing the original version of this benchmark, which uses erlang:now/0 to create monotonically increasing unique values, using three Erlang/OTP releases: R15B01, 17.4, and 18.1. We also measure a version of the benchmark in Erlang/OTP 18.1 where the call to erlang:now/0 has been substituted with a call to erlang:monotonic_time/0. The graph on the left shows that: 1. the performance of time management remained roughly unchanged between Erlang/OTP releases prior to 18.0; 2. the improved time management in Erlang/OTP 18.x makes time management less likely to be a scalability bottleneck even when using erlang:now/0; and 3. the new time API (using erlang:monotonic_time/0 and friends) provides a scalable solution. The graph on the right side of Figure 20 shows the speedup that the modified version of the parallel benchmark achieves in Erlang/OTP 18.1.

Figure 20: The parallel BenchErl benchmark using erlang:monotonic_time/0 or erlang:now/0 in different Erlang/OTP releases: runtimes (left) and speedup obtained using erlang:monotonic_time/0 (right).
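For reference, the code pattern the first micro benchmark times looks like the following; the function is our illustration, not the benchmark source.

    %% A receive with a timeout: blocking here sets a timer, and the
    %% arrival of a message cancels it. Without the 'after' clause no
    %% timer is involved.
    wait_for_msg(Timeout) ->
        receive
            Msg -> {ok, Msg}
        after Timeout ->
            timeout
        end.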

VII. Scalable Tools

This section outlines five tools developed in the RELEASE project to support scalable Erlang systems. Some tools were developed from scratch, like Devo, SD-Mon and WombatOAM, while others extend existing tools, like Percept and Wrangler. These include tooling to transform programs to make them more scalable, to deploy them for scalability, and to monitor and visualise them. Most of the tools are freely available under open source licences (Devo, Percept2, SD-Mon, Wrangler), while WombatOAM is a commercial product. The tools have been used for profiling and refactoring the ACO and Orbit benchmarks from Section III.

The Erlang tool "ecosystem" consists of small stand-alone tools for tracing, profiling and debugging Erlang systems that can be used separately or together as appropriate for solving the problem at hand, rather than as a single, monolithic super-tool. The tools presented here have been designed to be used as part of that ecosystem, and to complement already available functionality rather than to duplicate it. The Erlang runtime system has built-in support for tracing many types of events, and this infrastructure forms the basis of a number of tools for tracing and profiling. Typically the tools build on or specialise the services offered by the Erlang virtual machine, through a number of built-in functions. Most recently, and since the RELEASE project was planned, the Observer application (http://www.erlang.org/doc/apps/observer/) gives a comprehensive overview of many of these data on a node-by-node basis.

As actor frameworks and languages (see Section ii) have only recently become widely adopted commercially, their tool support remains relatively immature and generic in nature. That is, the tools support the language itself, rather than its distinctively concurrent aspects. Given the widespread use of Erlang, tools developed for it point the way for tools in other actor languages and frameworks. For example, just as many Erlang tools use tracing support provided by the Erlang VM, so can other actor frameworks, e.g. Akka can use the Kamon JVM monitoring system (http://kamon.io). Similarly, tools for other actor languages or frameworks could use data derived through probes of the OS-level tracing frameworks DTrace (http://dtrace.org/blogs/about/) and SystemTap (https://sourceware.org/systemtap/wiki), as we show in this section for Erlang, provided that the host language has tracing hooks into the appropriate infrastructure.

i. Refactoring for Scalability

Refactoring [65, 32, 75] is the process of changing how a program works without changing what it does. This can be done for readability, for testability, to prepare a program for modification or extension, or, as is the case here, in order to improve its scalability. Because refactoring involves the transformation of source code, it is typically performed using machine support in a refactoring tool. There are a number of tools that support refactoring in Erlang: in the RELEASE project we chose to extend Wrangler (http://www.cs.kent.ac.uk/projects/wrangler/) [56]; other tools include Tidier [68] and RefactorErl [44].

Supporting API Migration
The SD Erlang libraries modify Erlang's global_group library, becoming the new s_group library; as a result, Erlang programs using global_group will have to be refactored to use s_group. This kind of API migration problem is not uncommon, as software evolves and this often changes the API of a library. Rather than simply extend Wrangler with a refactoring to perform this particular operation, we instead added a framework for the automatic generation of API migration refactorings from a user-defined adaptor module.

Our approach to automatic API migration works in this way: when an API function's interface is changed, the author of this API function implements an adaptor function, defining calls to the old API in terms of the new. From this definition we automatically generate the refactoring that transforms the client code to use the new API; this refactoring can also be supplied by the API writer to clients on library upgrade, allowing users to upgrade their code automatically.

The refactoring works by generating a set of rules that "fold in" the adaptation to the client code, so that the resulting code works directly with the new API. More details of the design choices underlying the work and the technicalities of the implementation can be found in a paper by [52].

Support for Introducing Parallelism
We have introduced support for parallelising explicit list operations (map and foreach), for process introduction to complete a computationally intensive task in parallel, for introducing a worker process to deal with call handling in an Erlang "generic server", and for parallelising a tail recursive function. We discuss these in turn now; more details and practical examples of the refactorings appear in a conference paper describing that work [55].

Uses of map and foreach in list processing are among the most obvious places where parallelism can be introduced. We have added a small library to Wrangler, called para_lib, which provides parallel implementations of map and foreach (a sketch of such a parallel map appears at the end of this subsection). The transformation from an explicit use of sequential map/foreach to the use of their parallel counterparts is very straightforward; even manual refactoring would not be a problem. However, a map/foreach operation could also be implemented differently, using recursive functions, list comprehensions, etc.; identifying this kind of implicit map/foreach usage can be done using Wrangler's code inspection facility, and a refactoring that turns an implicit map/foreach into an explicit map/foreach can also be specified using Wrangler's rule-based transformation API.

If the computations of two non-trivial tasks do not depend on each other, then they can be executed in parallel. The Introduce a New Process refactoring implemented in Wrangler can be used to spawn a new process to execute a task in parallel with its parent process. The result of the new process is sent back to the parent process, which will then consume it when needed. In order not to block other computations that do not depend on the result returned by the new process, the receive expression is placed immediately before the point where the result is needed.

While some tail-recursive list processing functions can be refactored to an explicit map operation, many cannot, due to data dependencies. For instance, an example might perform a recursion over a list while accumulating results in an accumulator variable. In such a situation it is possible to "float out" some of the computations into parallel computations. This can only be done when certain dependency constraints are satisfied, and these are checked by program slicing, which is discussed below.

Support for Program Slicing
Program slicing is a general technique of program analysis for extracting the part of a program, also called the slice, that influences or is influenced by a given point of interest, i.e. the slicing criterion. Static program slicing is generally based on program dependency, including both control dependency and data dependency. Backward intra-function slicing is used by some of the refactorings described above; it is also useful in general, and is made available to end-users under Wrangler's Inspector menu [55].

Our work can be compared with that in PaRTE (http://paraphrase-enlarged.elte.hu/downloads/D4-3_user_manual.pdf) [17], a tool developed in another EU project that also re-uses the Wrangler front end. This work concentrates on skeleton introduction, as does some of our work, but we go further in using static analysis and slicing in transforming programs to make them suitable for the introduction of parallel structures.



Figure 21: The architecture of WombatOAM.

ii. Scalable Deployment

We have developed the WombatOAM tool (a commercial product available from Erlang Solutions Ltd.: https://www.erlang-solutions.com/products/wombat-oam.html) to provide a deployment, operations and maintenance framework for large-scale Erlang distributed systems. These systems typically consist of a number of Erlang nodes executing on different hosts. These hosts may have different hardware or operating systems, be physical or virtual, or run different versions of Erlang/OTP.

Prior to the development of WombatOAM, deployment of systems would use scripting in Erlang and the shell, and this is the state of the art for other actor frameworks; it would be possible to adapt the WombatOAM approach to these frameworks in a straightforward way.

Architecture
The architecture of WombatOAM is summarised in Figure 21. Originally the system had problems addressing full scalability because of the role played by the central Master node; in its current version an additional layer of Middle Managers was introduced to allow the system to scale easily to thousands of deployed nodes.

As the diagram shows, the "northbound" interfaces to the web dashboard and the command line are provided through RESTful connections to the Master. The operations of the Master are delegated to the Middle Managers, which engage directly with the managed nodes. Each managed node runs a collection of services that collect metrics, raise alarms and so forth; we describe those now.

Services
WombatOAM is designed to collect, store and display various kinds of information and events from running Erlang systems, and these data are accessed and managed through an AJAX-based Web Dashboard; they include the following.

Metrics
WombatOAM supports the collection of some hundred metrics on a regular basis from the Erlang VMs running on each of the hosts, including, for instance, the number of processes on a node and the message traffic in and out of the node. It can also collect metrics defined by users within other metrics collection frameworks, such as Folsom (which collects and publishes metrics through an Erlang API: https://github.com/boundary/folsom), and interface with other tools, such as Graphite (https://graphiteapp.org), which can log and display such information. The metrics can be displayed as histograms covering different windows, such as the last fifteen minutes, hour, day, week or month.

Notifications
As well as metrics, WombatOAM supports event-based monitoring through the collection of notifications from running nodes. Notifications, which are one-time events, can be generated using the Erlang System Architecture Support Libraries (SASL), which are part of the standard distribution, or the lager logging framework (https://github.com/basho/lager), and are displayed and logged as they occur.

Alarms
Alarms are more complex entities. Alarms have a state: they can be raised, and once dealt with they can be cleared. They also have identities, so that the same alarm may be raised on the same node multiple times, and each instance will need to be dealt with separately. Alarms are generated by SASL or lager, just as for notifications.

Topology
The Topology service handles adding, deleting and discovering nodes. It also monitors whether they are accessible; if not, it notifies the other services and periodically tries to reconnect. When the nodes are available again, it also notifies the other services. It doesn't have a middle manager part, because it doesn't talk to the nodes directly: instead it asks the Node manager service to do so.

Node manager
This service maintains the connection to all managed nodes via the Erlang distribution protocol. If it loses the connection towards a node, it periodically tries to reconnect. It also maintains the states of the nodes in the database (e.g. if the connection towards a node is lost, the Node manager changes the node state to DOWN and raises an alarm). The Node manager doesn't have a REST API, since the node states are provided via the Topology service's REST API.

Orchestration
This service can deploy new Erlang nodes on already running machines. It can also provision new virtual machine instances using several cloud providers, and deploy Erlang nodes on those instances. For communicating with the cloud providers, the Orchestration service uses an external library called Libcloud (the unified cloud API [4]: https://libcloud.apache.org), for which Erlang Solutions has written an open source Erlang wrapper called elibcloud, to make Libcloud easier to use from WombatOAM.


Note that WombatOAM Orchestration doesn't provide a platform for writing Erlang applications: it provides infrastructure for deploying them.

Deployment
The deployment mechanism consists of the following five steps.

1. Registering a provider. WombatOAM provides the same interface for different cloud providers that support the OpenStack standard or the Amazon EC2 API. WombatOAM also provides the same interface for using a fixed set of machines. In WombatOAM's backend this has been implemented as two driver modules: the elibcloud driver module, which uses the elibcloud and Libcloud libraries to communicate with the cloud providers, and the SSH driver module, which keeps track of a fixed set of machines.

2. Uploading a release. The release can be either a proper Erlang release archive or a set of Erlang modules. The only important aspect from WombatOAM's point of view is that start and stop commands should be explicitly specified; this enables WombatOAM to start and stop nodes when needed.

3. Defining a node family. The next step is creating the node family, which is the entity that refers to a certain release, contains deployment domains that refer to certain providers, and contains other information necessary to deploy a node.

4. Defining a deployment domain. At this step a deployment domain is created that specifies (i) which providers should be used for provisioning machines, and (ii) the username that will be used when WombatOAM connects to the hosts using SSH.

5. Node deployment. To deploy nodes, a WombatOAM user needs only to specify the number of nodes and the node family these nodes belong to. Nodes can be dynamically added to, or removed from, the system depending on the needs of the application. The nodes are started, and WombatOAM is ready to initiate and run the application.

iii. Monitoring and Visualisation

A key aim in designing new monitoring and visualisation tools, and adapting existing ones, was to provide support for systems running on parallel and distributed hardware. Specifically, in order to run modern Erlang systems, and in particular SD Erlang systems, it is necessary to understand both their single-host ("multicore") and multi-host ("distributed") nature. That is, we need to be able to understand how systems run on the Erlang multicore virtual machine, where the scheduler associated with a core manages its own run queue and processes migrate between the queues through a work stealing algorithm; at the same time, we have to understand the dynamics of a distributed Erlang program, where the user explicitly spawns processes to nodes. (This is in contrast with the multicore VM, where programmers have no control over where processes are spawned; however, they still need to gain insight into the behaviour of their programs to tune performance.)

iii.1 Percept2

Percept2 (https://github.com/RefactoringTools/percept2) builds on the existing Percept tool to provide post hoc offline analysis and visualisation of Erlang systems.


Figure 22: Offline profiling of a distributed Erlang application using Percept2.

Percept2 is designed to allow users to visualise and tune the parallel performance of Erlang systems on a single node on a single manycore host. It visualises Erlang application-level concurrency and identifies concurrency bottlenecks. Percept2 uses Erlang built-in tracing and profiling to monitor process states, i.e. waiting, running, runnable, free and exiting. A waiting or suspended process is considered inactive, and a running or runnable process is considered active. As a program runs with Percept, events are collected and stored to a file. The file is then analysed, with the results stored in a RAM database, and this data is viewed through a web-based interface. The process of offline profiling for a distributed Erlang application using Percept2 is shown in Figure 22.

Percept generates an application-level zoomable concurrency graph, showing the number of active processes at each point during profiling; dips in the graph represent low concurrency. A lifetime bar for each process is also included, showing the points during its lifetime when the process was active, as well as other per-process information.

Percept2 extends Percept in a number of ways, as detailed by [53], including most importantly:

• Distinguishing between running and runnable time for each process: this is apparent in the process runnability comparison as shown in Figure 23, where orange represents runnable and green represents running. This shows very clearly where potential concurrency is not being exploited.

• Showing scheduler activity: the number of active schedulers at any time during the profiling.

• Recording more information during execution, including: the migration history of a process between run queues; statistics about message passing between processes (the number of messages and the average message size sent/received by a process); and the accumulated runtime per process (the accumulated time when a process is in a running state).

• Presenting the process tree: the hierarchy structure indicating the parent-child relationships between processes.


Figure 23: Percept2: showing processes running and runnable (with 4 schedulers/run queues).

• Recording the dynamic function call graph/count/time: the hierarchy structure showing the calling relationships between functions during the program run, and the amount of time spent in a function.

• Tracing of s_group activities in a distributed system. Unlike global_group, s_group allows dynamic changes to the s_group structure of a distributed Erlang system. In order to support SD Erlang, we have extended Percept2 to allow the profiling of s_group related activities, so that the dynamic changes to the s_group structure of a distributed Erlang system can be captured.

We have also improved on Percept as follows.

• Enabling finer-grained control of what is profiled. The profiling of port activities, scheduler activities, message passing, process migration, garbage collection and s_group activities can be enabled or disabled, while the profiling of process runnability (indicated by the "proc" flag) is always enabled.

• Selective function profiling of processes. In Percept2 we have built a version of fprof which does not measure a function's own execution time, but measures everything else that fprof measures. Eliminating measurement of a function's own execution time gives users the freedom of not profiling all the function calls invoked during the program execution. For example, they can choose to profile only functions defined in their own applications' code, and not those in libraries.

• Improved dynamic function callgraph. With the dynamic function callgraph, a user is able to understand the causes of certain events, such as heavy calls of a particular function, by examining the region around the node for the function, including the path to the root of the graph. Each edge in the callgraph is annotated with the number of times the target function is called by the source function, as well as further information.

Finally, we have also improved the scalability of Percept in three ways. First, we have parallelised the processing of trace files so that multiple data files can be processed at the same time. We have also compressed the representation of call graphs, and cached the history of generated web pages. Together these make the system more responsive and more scalable.
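Percept2's data collection rests on the VM's built-in tracing. A minimal sketch of that underlying mechanism, using the standard dbg module (this is our illustration, not Percept2's actual collection code):

    %% Trace scheduling ('running'), process lifecycle ('procs') and
    %% message-passing events for all processes to a trace file.
    start_trace(File) ->
        {ok, _} = dbg:tracer(port, dbg:trace_port(file, File)),
        dbg:p(all, [running, procs, send, 'receive']).

    stop_trace() ->
        dbg:stop().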


iii.2 DTrace/SystemTap tracing

DTrace provides dynamic tracing support for various flavours of Unix, including BSD and Mac OS X, and SystemTap does the same for Linux; both allow the monitoring of live, running systems with minimal overhead. They can be used by administrators, language developers and application developers alike to examine the behaviour of applications, language implementations and the operating system during development, or even on live production systems. In comparison to other similar tools and instrumentation frameworks they are relatively lightweight, requiring neither specially recompiled versions of the software to be examined nor special post-processing tools to create meaningful information from the gathered data. Using these probes it is possible to identify bottlenecks both in the VM itself and in applications.

VM bottlenecks are identified using a large number of probes inserted into Erlang/OTP's VM, for example to explore scheduler run-queue lengths. These probes can be used to measure the number of processes per scheduler, the number of processes moved during work stealing, the number of attempts to gain a run-queue lock, how many of these succeed immediately, etc. Figure 24 visualises the results of such monitoring; it shows how the sizes of the run queues vary during execution of the bang BenchErl benchmark on a VM with 16 schedulers.

Figure 24: DTrace: run-queue visualisation of the bang benchmark with 16 schedulers (time on the x-axis).

Application bottlenecks are identified by an alternative back-end for Percept2, based on DTrace/SystemTap instead of the Erlang built-in tracing mechanism. The implementation re-uses the existing Percept2 infrastructure as far as possible: it uses a different mechanism for collecting information about Erlang programs, and a different format for the trace files, but the same storage infrastructure and presentation facilities.

iii.3 Devo

Devo (https://github.com/RefactoringTools/devo) [10] is designed to provide real-time online visualisation of both the low-level (single node, multiple cores) and high-level (multiple nodes, grouped into s_groups) aspects of Erlang and SD Erlang systems. Visualisation is within a browser, with web sockets providing the connections between the JavaScript visualisations and the running Erlang systems, instrumented through the trace tool builder (ttb).

Figure 25 shows visualisations from Devo in both modes. On the left-hand side a single compute node is shown. This consists of two physical chips (the upper and lower halves of the diagram) with six cores each; with hyperthreading this gives twelve virtual cores per chip, and hence 24 run queues in total. The size of these run queues is shown by both the colour and the height of each column, and process migrations are illustrated by a (fading) arc between the queues within the circle. A green arc shows migration on the same physical core, a grey one on the same chip, and blue shows migrations between the two chips.


On the right-hand side of Figure 25 is a visualisation of SD-Orbit in action. Each node in the graph represents an Erlang node, and colours (red, green, blue and orange) are used to represent the s_groups to which a node belongs. As is evident, the three nodes in the central triangle belong to multiple groups, and act as routing nodes between the other nodes. The colour of the arc joining two nodes represents the current intensity of communication between the nodes (green quiescent; red busiest).

Figure 25: Devo: low- and high-level visualisations of SD-Orbit.

iii.4 SD-Mon

SD-Mon is a tool specifically designed for monitoring SD Erlang systems. This purpose is accomplished by means of a shadow network of agents that collect data from a running system. An example deployment is shown in Figure 26, where blue dots represent nodes in the target system and the other nodes make up the SD-Mon infrastructure. The network is deployed on the basis of a configuration file describing the network architecture in terms of hosts, Erlang nodes, global group and s_group partitions. Tracing to be performed on monitored nodes is also specified within the configuration file.

An agent is started by a master SD-Mon node for each s_group and for each free node. Configured tracing is applied on every monitored node, and traces are stored in binary format in the agent file system. The shadow network follows system changes so that agents are started and stopped at runtime as required, as shown in Figure 27. Such changes are persistently stored so that the last configuration can be reproduced after a restart. Of course, the shadow network can always be updated via the User Interface.

Each agent takes care of an s_group or of a free node. At start-up it tries to get in contact with its nodes and apply the tracing to them as stated by the master. Binary files are stored in the host file system. Tracing is internally used in order to track s_group operations happening at runtime. An asynchronous message is sent to the master whenever one of these changes occurs. Since each process can only be traced by a single process at a time, each node (including those belonging to more than one s_group) is controlled by only one agent. When a node is removed from a group or when the group is deleted, another agent takes over, as shown in Figure 27.


Figure 26: SD-Mon architecture.

When an agent is stopped, all traces on the controlled nodes are switched off. The monitoring network is also supervised in order to take account of network fragility: when an agent node goes down, another node is deployed to play its role. There are also periodic consistency checks for the system as a whole, and when an inconsistency is detected that part of the system can be restarted.

SD-Mon does more than monitor activities one node at a time. In particular, inter-node and inter-group messages are displayed at runtime. As soon as an agent is stopped, the related tracing files are fetched across the network by the master and made available in a readable format in the master file system. SD-Mon provides facilities for online visualisation of this data, as well as post hoc offline analysis. Figure 28 shows, in real time, messages that are sent between s_groups. This data can also be used as input to the animated Devo visualisation, as illustrated in the right-hand side of Figure 25.

Figure 27: SD-Mon evolution, before (top) and after (bottom) eliminating s_group 2.


Figure 28: SD-Mon: online monitoring.

VIII. Systemic Evaluation

Preceding sections have investigated improvements to individual aspects of an Erlang system, e.g. ETS tables in Section i. This section analyses the impact of the new tools and technologies from Sections V, VI and VII in concert. We do so by investigating the deployment, reliability, and scalability of the Orbit and ACO benchmarks from Section III. The experiments reported here are representative: similar experiments show consistent results for a range of micro-benchmarks, several benchmarks, and the very substantial (approximately 150K lines of Erlang) Sim-Diasca case study [16], on several state-of-the-art NUMA architectures and the four clusters specified in Appendix A. A coherent presentation of many of these results is available in an article by [22] and in a RELEASE project deliverable (Deliverable 6.2: http://www.release-project.eu/documents/D6.2.pdf). The bulk of the experiments reported here are conducted on the Athos cluster using Erlang/OTP 17.4 and the associated SD Erlang libraries.

The experiments cover two measures of scalability. As Orbit does a fixed-size computation, the scaling measure is relative speedup (or strong scaling), i.e. speedup relative to execution time on a single core. As the work in ACO increases with compute resources, weak scaling is the appropriate measure. The benchmarks also evaluate different aspects of s_groups: Orbit evaluates the scalability impacts of network connections, while ACO evaluates the impact of both network connections and the global namespace required for reliability.
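Stated symbolically (our formulation of the standard definitions, which the text gives informally), with t(n) the runtime on n cores:

    RelativeSpeedup(n) = t(1) / t(n)    (strong scaling: fixed total work)
    t(n) ≈ t(1)                          (ideal weak scaling: work grows with n)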



Figure 29: Speedup of distributed Erlang vs. SD Erlang Orbit (2M and 5M elements) [strong scaling].

i. Orbit

Figure 29 shows the speedup of the D-Orbit and SD-Orbit benchmarks from Section i.1. The measurements are repeated seven times, and we plot the standard deviation. The results show that D-Orbit performs better on a small number of nodes, as communication is direct rather than via a gateway node. As the number of nodes grows, however, SD-Orbit delivers better speedups: beyond 80 nodes in the case of 2M orbit elements, and beyond 100 nodes in the case of 5M orbit elements. When we increase the size of Orbit beyond 5M, the D-Orbit version fails because some VMs exceed the available RAM of 64GB. In contrast, SD-Orbit experiments run successfully even for an orbit with 60M elements.

ii. Ant Colony Optimisation (ACO)

Deployment
The deployment and monitoring of ACO, and of the large (150K lines of Erlang) Sim-Diasca simulation, using WombatOAM (Section ii) on the Athos cluster is detailed in [22].

An example experiment deploys 10,000 Erlang nodes without enabling monitoring, and hence allocates three nodes per core (i.e. 72 nodes on each of 139 24-core Athos hosts). Figure 30 shows that WombatOAM deploys the nodes in 212s, which is approximately 47 nodes per second. It is more common to have at least one core per Erlang node, and in a related experiment 5000 nodes are deployed, one per core (i.e. 24 nodes on each of 209 24-core Athos hosts), in 101s, i.e. 50 nodes per second. Crucially, in both cases the deployment time is linear in the number of nodes. The deployment time could be reduced to be logarithmic in the number of nodes using standard divide-and-conquer techniques; however, there hasn't been demand to do so, as most Erlang systems are long-running servers.



Figure 30: WombatOAM deployment time: experimental data and best fit 0.0125N + 83 seconds for N nodes.

The measurement data shows two important facts: WombatOAM scales well (up to a deployment base of 10,000 Erlang nodes), and WombatOAM is non-intrusive, because its overhead on a monitored node is typically less than 1.5% of the effort on the node.

We conclude that WombatOAM is capable of deploying and monitoring substantial distributed Erlang and SD Erlang programs. The experiments in the remainder of this section use standard distributed Erlang configuration file deployment.

Reliability
SD Erlang changes the organisation of processes and their recovery data at the language level, so we seek to show that these changes have not disrupted Erlang's world-class reliability mechanisms at this level. As we haven't changed them, we don't exercise Erlang's other reliability mechanisms, e.g. those for managing node failures, network congestion, etc. A more detailed study of SD Erlang reliability, including the use of replicated databases for recovering Instant Messenger chat sessions, finds similar results [23].

We evaluate the reliability of two ACO versions using a Chaos Monkey service that kills processes in the running system at random [14]. Recall that GR-ACO provides reliability by registering the names of critical processes globally, while SR-ACO registers them only within an s_group (Section ii).

For both GR-ACO and SR-ACO a Chaos Monkey runs on every Erlang node, i.e. master, submasters, and colony nodes, killing a random Erlang process every second. Both ACO versions run to completion. Recovery, at this failure frequency, has no measurable impact on runtime. This is because processes are recovered within the virtual machine using (globally synchronised) local recovery information. For example, on a common x86/Ubuntu platform typical Erlang process recovery times are around 0.3ms, around 1000x less than the Unix process recovery time on the same platform [58]. We have conducted more detailed experiments on an Instant Messenger benchmark and obtained similar results [23].

We conclude that both GR-ACO and SR-ACO are reliable, and that SD Erlang preserves the distributed Erlang reliability model.
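A hedged sketch of the per-node Chaos Monkey service described above: every second it kills one randomly chosen process on the local node. The code is our illustration; the experiments used the Chaos Monkey service of [14].

    %% Kill a random local process once per second, forever. A real
    %% service would also avoid killing vital system processes.
    chaos_loop() ->
        Pids = erlang:processes() -- [self()],
        Victim = lists:nth(rand:uniform(length(Pids)), Pids),
        exit(Victim, kill),
        timer:sleep(1000),
        chaos_loop().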



Figure 31: ACO execution times, Erlang/OTP 17.4 (RELEASE) [weak scaling].

The remainder of the section outlines the impact on scalability of maintaining the recovery information required for reliability.

Scalability
Figure 31 compares the runtimes of the ML, GR, and SR versions of ACO (Section ii) on Erlang/OTP 17.4 (RELEASE). As outlined in Section i.1, GR-ACO not only maintains a fully connected graph of nodes, it registers process names for reliability, and hence scales significantly worse than the unreliable ML-ACO. We conclude that providing reliability with standard distributed Erlang process registration dramatically limits scalability.

While ML-ACO does not provide reliability, and hence doesn't register process names, it maintains a fully connected graph of nodes, which limits scalability. SR-ACO, which maintains connections and registers process names only within s_groups, scales best of all. Figure 31 illustrates how maintaining the process namespace and a fully connected network impacts performance. This reinforces the evidence from the Orbit benchmarks, and others, that partitioning the network of Erlang nodes significantly improves performance at large scale.

To investigate the impact of SD Erlang on network traffic, we measure the number of sent and received packets on the GPG cluster for three versions of ACO: ML-ACO, GR-ACO, and SR-ACO. Figure 32 shows the total number of sent packets. The highest traffic (the red line) belongs to GR-ACO, and the lowest traffic to SR-ACO (the dark blue line). This shows that SD Erlang significantly reduces the network traffic between Erlang nodes: even with the s_group name registration, SR-ACO has less network traffic than ML-ACO, which has no global name registration.

iii. Evaluation Summary

We have shown that WombatOAM is capable of deploying and monitoring substantial distributed Erlang and SD Erlang programs like ACO and Sim-Diasca. The Chaos Monkey experiments with GR-ACO and SR-ACO show that both are reliable, and hence that SD Erlang preserves the distributed Erlang language-level reliability model.


Figure 32: Number of sent packets in ML-ACO, GR-ACO, and SR-ACO.

iii. Evaluation Summary

We have shown that WombatOAM is capable of deploying and monitoring substantial distributed Erlang and SD Erlang programs like ACO and Sim-Diasca. The Chaos Monkey experiments with GR-ACO and SR-ACO show that both are reliable, and hence that SD Erlang preserves the distributed Erlang language-level reliability model.

As SD Orbit scales better than D-Orbit, SR-ACO scales better than ML-ACO, and SR-ACO has significantly less network traffic, we conclude that, even when global recovery data is not maintained, partitioning the fully-connected network into s_groups reduces network traffic and improves performance. While the distributed Orbit instances (W=2M) and (W=5M) reach scalability limits at around 40 and 60 nodes respectively, Orbit scales to 150 nodes on SD Erlang (limited by input size), and SR-ACO is still scaling well on 256 nodes (6144 cores). Hence not only have we exceeded the 60 node scaling limits of distributed Erlang identified in Section ii, we have not reached the scaling limits of SD Erlang on this architecture.

Comparing the GR-ACO and ML-ACO scalability curves shows that maintaining global recovery data, i.e. a process name space, dramatically limits scalability. Comparing the GR-ACO and SR-ACO scalability curves shows that scalability can be recovered by partitioning the nodes into appropriately-sized s_groups, and hence maintaining the recovery data only within a relatively small group of nodes. These results are consistent with other experiments.

IX. Discussion

Distributed actor platforms like Erlang, or Scala with Akka, are a common choice for internet-scale system architects, as the model, with its automatic, VM-supported reliability mechanisms, makes it extremely easy to engineer scalable reliable systems. Targeting emergent server architectures with hundreds of hosts and tens of thousands of cores, we report a systematic effort to improve the scalability of a leading distributed actor language while preserving reliability. The work is a vade mecum for addressing scalability in reliable actor languages and frameworks. It is also high impact, with downloads of our improved Erlang/OTP running at 50K a month.

We have undertaken the first systematic study of scalability in a distributed actor language, covering the VM, language, and persistent storage levels, and have developed the BenchErl and DE-Bench tools for this purpose. Key VM-level scalability issues we identify include contention for shared ETS tables and for commonly-used shared resources like timers. Key language-level scalability issues are the costs of maintaining a fully-connected network, maintaining global recovery information, and explicit process placement. Unsurprisingly, the scaling issues for this distributed actor language are common to other distributed or parallel languages and frameworks with other paradigms, like CHARM++ [45], Cilk [15], or Legion [40]. We establish scientifically the folklore limitation of around 60 connected nodes for distributed Erlang (Section IV).

The actor model is no panacea: there can still be scalability problems in the algorithms that we write, either within a single actor or in the way that we structure communicating actors. A range of pragmatic issues also impact the performance and scalability of actor systems, including the memory occupied by processes (even when quiescent), mailboxes filling up, etc. Identifying and resolving these problems is where tools like Percept2 and WombatOAM are needed. However, many of the scalability issues arise where Erlang departs from the private state principle of the actor model, e.g. in maintaining shared state in ETS tables, or a shared global process namespace for recovery.

We have designed and implemented a set of Scalable Distributed (SD) Erlang libraries to address language-level scalability issues. The key constructs are s_groups for partitioning the network and the global process namespace, and semi-explicit process placement for deploying distributed Erlang applications on large heterogeneous architectures in a portable way. We have provided a state transition operational semantics for the new s_groups, and validated the library implementation against the semantics using QuickCheck (Section V).
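As an indication of what such validation looks like, here is a QuickCheck-style property of the kind one might write against the s_group semantics: a name registered in an s_group should be resolvable within it. This is our own illustrative reconstruction, assuming the Quviq eqc library [9, 24] and the s_group calls sketched earlier; it is not a property taken from the project test suite.

-module(s_group_props).
-include_lib("eqc/include/eqc.hrl").
-export([prop_register_then_whereis/0]).

%% Toy generator for names to register; a real suite would generate
%% whole command sequences against the operational semantics.
name_gen() ->
    elements([ant_master, ant_sub, ant_colony]).

%% Registering a name in an s_group makes it resolvable in that s_group.
prop_register_then_whereis() ->
    ?FORALL(Name, name_gen(),
            begin
                Pid = spawn(fun() -> receive stop -> ok end end),
                yes = s_group:register_name(test_group, Name, Pid),
                Found = s_group:whereis_name(test_group, Name),
                s_group:unregister_name(test_group, Name),   % clean up between tests
                Pid ! stop,
                Found =:= Pid
            end).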


To improve the scalability of the Erlang VM and libraries we have improved the implementation of shared ETS tables, time management, and load balancing between schedulers. Following a systematic analysis of ETS tables, the numbers of fine-grained (bucket) locks and of reader groups have been increased. We have developed and evaluated four new techniques for improving ETS scalability: (i) programmer control of the number of bucket locks; (ii) a contention-adapting tree data structure for ordered_sets; (iii) queue delegation locking; and (iv) eliminating the locks in the meta table. We have introduced a new scheduler utilisation balancing mechanism to spread work to multiple schedulers (and hence cores), and new synchronisation mechanisms to reduce contention on the widely-used time management mechanisms. By June 2015, with Erlang/OTP 18.0, the majority of these changes had been included in the primary releases. In any scalable actor language implementation such thoughtful design and engineering will be required to schedule large numbers of actors on hosts with many cores, and to minimise contention on shared VM resources (Section VI).
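At the application level, the relevant ETS tuning is exposed through standard table options. The fragment below shows the options that engage these concurrency mechanisms; which OTP release backs an ordered_set with the contention-adapting tree is stated only approximately, so treat the comments as assumptions and consult the documentation for a given release.

%% Standard ets options that reduce lock contention on shared tables.
create_tables() ->
    %% Hash-based table: finer-grained bucket locking and reader groups
    %% are engaged via the two concurrency options.
    Stats = ets:new(stats, [set, public,
                            {read_concurrency, true},
                            {write_concurrency, true}]),
    %% Ordered table: in OTP releases that include this work's
    %% contention-adapting tree, write_concurrency selects it.
    Index = ets:new(index, [ordered_set, public,
                            {write_concurrency, true}]),
    {Stats, Index}.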
To facilitate the development of large Erlang systems, and to make them understandable, we have developed a range of tools. The proprietary WombatOAM tool deploys and monitors large distributed Erlang systems over multiple, and possibly heterogeneous, clusters or clouds. We have made open source releases of four concurrency tools: Percept2 now detects concurrency bad smells; Wrangler provides enhanced concurrency refactoring; the Devo tool is enhanced to provide interactive visualisation of SD Erlang systems; and the new SD-Mon tool monitors SD Erlang systems. We anticipate that these tools will guide the design of tools for other large scale distributed actor languages and frameworks (Section VII).

We report on the reliability and scalability implications of our new technologies using a range of benchmarks, and consistently use the Orbit and ACO benchmarks throughout the article. While we report measurements on a range of NUMA and cluster architectures, the key scalability experiments are conducted on the Athos cluster with 256 hosts (6144 cores). Even when global recovery data is not maintained, partitioning the network into s_groups reduces network traffic and improves the performance of the Orbit and ACO benchmarks above 80 hosts. Crucially, we exceed the 60 node limit for distributed Erlang and do not reach the scalability limits of SD Erlang with 256 nodes/VMs and 6144 cores. Chaos Monkey experiments show that two versions of ACO are reliable, and hence that SD Erlang preserves the Erlang reliability model. However, the ACO results show that maintaining global recovery data, i.e. a global process name space, dramatically limits scalability in distributed Erlang. Scalability can, however, be recovered by maintaining recovery data only within appropriately sized s_groups. These results are consistent with experiments with other benchmarks and on other architectures (Section VIII).

In future work we plan to incorporate the RELEASE technologies, along with other technologies, in a generic framework for building performant large scale servers. In addition, preliminary investigations suggest that some SD Erlang ideas could improve the scalability of other actor languages: for example, the Akka framework for Scala could benefit from semi-explicit placement, and Cloud Haskell from partitioning the network (see RELEASE Deliverable 6.7, http://www.release-project.eu/documents/D6.7.pdf, available online).
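Semi-explicit placement, which we suggest other actor frameworks could adopt, lets a program pick target nodes by attribute rather than by hostname. The sketch below is modelled on the description in [60]; choose_nodes/1 and its attribute tuple are hypothetical names for illustration, not a verbatim SD Erlang API.

%% Semi-explicit placement sketch: ask for candidate nodes by attribute,
%% then spawn on one of them, keeping the code portable across clusters.
spawn_on_suitable_node(SGroup, Fun) ->
    %% choose_nodes/1 and the {s_group, ...} attribute are assumptions.
    Candidates = s_group:choose_nodes([{s_group, SGroup}]),
    Node = lists:nth(rand:uniform(length(Candidates)), Candidates),
    spawn(Node, Fun).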


Appendix A: Architecture Specifications

The specifications of the clusters used for measurement are summarised in Table 2. We also use the following NUMA machines. (1) An AMD Bulldozer with 16M L2/16M L3 cache, 128GB RAM, and four AMD Opteron 6276 processors at 2.3GHz, each with 16 "Bulldozer" cores, giving a total of 64 cores. (2) An Intel NUMA machine with 128GB RAM and four Intel Xeon E5-4650 processors at 2.70GHz, each with eight hyperthreaded cores, giving a total of 64 cores.

Table 2: Cluster Specifications (RAM per host in GB).

Name   | Hosts | Cores/host | Cores (total) | Cores (max avail.) | Processor                               | RAM    | Interconnection
GPG    | 20    | 16         | 320           | 320                | Intel Xeon E5-2640v2 8C, 2GHz           | 64     | 10GB Ethernet
Kalkyl | 348   | 8          | 2,784         | 1,408              | Intel Xeon 5520v2 4C, 2.26GHz           | 24–72  | InfiniBand 20Gb/s
TinTin | 160   | 16         | 2,560         | 2,240              | AMD Opteron 6220v2 Bulldozer 8C, 3.0GHz | 64–128 | 2:1 oversubscribed QDR InfiniBand
Athos  | 776   | 24         | 18,624        | 6,144              | Intel Xeon E5-2697v2 12C, 2.7GHz        | 64     | InfiniBand FDR14


Acknowledgements

We would like to thank the entire RELEASE project team for technical insights and administrative support. Roberto Aloi and Enrique Fernandez Casado contributed to the development and measurement of WombatOAM. This work has been supported by the European Union grant RII3-CT-2005-026133 "SCIEnce: Symbolic Computing Infrastructure in Europe", IST-2011-287510 "RELEASE: A High-Level Paradigm for Reliable Large-scale Server Software", and by the UK's Engineering and Physical Sciences Research Council grant EP/G055181/1 "HPC-GAP: High Performance Computational Algebra and Discrete Mathematics".

References

[1] Gul Agha. ACTORS: A Model of Concurrent Computation in Distributed Systems. PhD thesis, MIT, 1985.
[2] Gul Agha. An overview of Actor languages. SIGPLAN Not., 21(10):58–67, 1986.
[3] Bulldozer (microarchitecture), 2015. https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture).
[4] Apache SF. Libcloud, 2016. https://libcloud.apache.org/.
[5] C. R. Aragon and R. G. Seidel. Randomized search trees. In Proceedings of the 30th Annual Symposium on Foundations of Computer Science, pages 540–545, October 1989.
[6] Joe Armstrong. Programming Erlang: Software for a Concurrent World. Pragmatic Bookshelf, 2007.
[7] Joe Armstrong. Erlang. Commun. ACM, 53:68–75, 2010.
[8] Stavros Aronis, Nikolaos Papaspyrou, Katerina Roukounaki, Konstantinos Sagonas, Yiannis Tsiouris, and Ioannis E. Venetis. A scalability benchmark suite for Erlang/OTP. In Torben Hoffman and John Hughes, editors, Proceedings of the Eleventh ACM SIGPLAN Workshop on Erlang, pages 33–42, New York, NY, USA, September 2012. ACM.
[9] Thomas Arts, John Hughes, Joakim Johansson, and Ulf Wiger. Testing telecoms software with Quviq QuickCheck. In Proceedings of the 2006 ACM SIGPLAN Workshop on Erlang, pages 2–10, New York, NY, USA, 2006. ACM.
[10] Robert Baker, Peter Rodgers, Simon Thompson, and Huiqing Li. Multi-level visualization of concurrent and distributed computation in Erlang. In Visual Languages and Computing (VLC): The 19th International Conference on Distributed Multimedia Systems (DMS 2013), 2013.
[11] Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle. The Datacenter as a Computer. Morgan and Claypool, 2nd edition, 2013.
[12] Basho Technologies. Riakdocs: Basho Bench, 2014. http://docs.basho.com/riak/latest/ops/building/benchmarking/.
[13] J. E. Beasley. OR-Library: Distributing test problems by electronic mail. The Journal of the Operational Research Society, 41(11):1069–1072, 1990. Datasets available at http://people.brunel.ac.uk/~mastjjb/jeb/orlib/wtinfo.html.
[14] Cory Bennett and Ariel Tseitlin. Chaos Monkey released into the wild. Netflix Blog, 2012.
[15] Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 207–216. ACM, 1995.
[16] Olivier Boudeville. Technical manual of the Sim-Diasca simulation engine. EDF R&D, 2012.
[17] István Bozó, Viktória Fördős, Dániel Horpácsi, Zoltán Horváth, Tamás Kozsik, Judit Kőszegi, and Melinda Tóth. Refactorings to enable parallelization. In Jurriaan Hage and Jay McCarthy, editors, Trends in Functional Programming: 15th International Symposium, TFP 2014, Revised Selected Papers, volume 8843 of LNCS, pages 104–121. Springer, 2015.
[18] Irina Calciu, Dave Dice, Yossi Lev, Victor Luchangco, Virendra J. Marathe, and Nir Shavit. NUMA-aware reader-writer locks. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 157–166, New York, NY, USA, 2013. ACM.
[19] Francesco Cesarini and Simon Thompson. Erlang Programming: A Concurrent Approach to Software Development. O'Reilly Media, 1st edition, 2009.
[20] Rohit Chandra, Leonardo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, and Ramesh Menon. Parallel Programming in OpenMP. Morgan Kaufmann, San Francisco, CA, USA, 2001.
[21] Natalia Chechina, Huiqing Li, Amir Ghaffari, Simon Thompson, and Phil Trinder. Improving the network scalability of Erlang. J. Parallel Distrib. Comput., 90(C):22–34, 2016.
[22] Natalia Chechina, Kenneth MacKenzie, Simon Thompson, Phil Trinder, Olivier Boudeville, Viktória Fördős, Csaba Hoch, Amir Ghaffari, and Mario Moro Hernandez. Evaluating scalable distributed Erlang for scalability and reliability. IEEE Transactions on Parallel and Distributed Systems, 2017.
[23] Natalia Chechina, Mario Moro Hernandez, and Phil Trinder. A scalable reliable instant messenger using the SD Erlang libraries. In Erlang'16, pages 33–41, New York, NY, USA, 2016. ACM.
[24] Koen Claessen and John Hughes. QuickCheck: a lightweight tool for random testing of Haskell programs. In Proceedings of the 5th ACM SIGPLAN International Conference on Functional Programming, pages 268–279, New York, NY, USA, 2000. ACM.
[25] H. A. J. Crauwels, C. N. Potts, and L. N. van Wassenhove. Local search heuristics for the single machine total weighted tardiness scheduling problem. INFORMS Journal on Computing, 10(3):341–350, 1998. The datasets from this paper are included in Beasley's ORLIB.
[26] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107–113, 2008.
[27] David Dewolfs, Jan Broeckhove, Vaidy Sunderam, and Graham E. Fagg. FT-MPI, fault-tolerant metacomputing and generic name services: A case study. In EuroPVM/MPI'06, pages 133–140, Berlin, Heidelberg, 2006. Springer-Verlag.
[28] Alan A. A. Donovan and Brian W. Kernighan. The Go Programming Language. Addison-Wesley Professional, 2015.
[29] Marco Dorigo and Thomas Stützle. Ant Colony Optimization. Bradford Company, Scituate, MA, USA, 2004.
[30] Faith Ellen, Yossi Lev, Victor Luchangco, and Mark Moir. SNZI: scalable NonZero indicators. In Proceedings of the 26th Annual ACM Symposium on Principles of Distributed Computing, pages 13–22, New York, NY, USA, 2007. ACM.
[31] Jeff Epstein, Andrew P. Black, and Simon Peyton-Jones. Towards Haskell in the Cloud. In Haskell '11, pages 118–129. ACM, 2011.
[32] Martin Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1999.
[33] Ana Gainaru and Franck Cappello. Errors and faults. In Fault-Tolerance Techniques for High-Performance Computing, pages 89–144. Springer International Publishing, 2015.
[34] Martin Josef Geiger. New instances for the Single Machine Total Weighted Tardiness Problem. Research Report 10-03-01, March 2010. Datasets available at http://logistik.hsu-hh.de/SMTWTP.
[35] Guillaume Germain. Concurrency oriented programming in Termite Scheme. In Proceedings of the 2006 ACM SIGPLAN Workshop on Erlang, pages 20–20, New York, NY, USA, 2006. ACM.
[36] Amir Ghaffari. DE-Bench, a benchmark tool for distributed Erlang, 2014. https://github.com/amirghaffari/DEbench.
[37] Amir Ghaffari. Investigating the scalability limits of distributed Erlang. In Proceedings of the 13th ACM SIGPLAN Workshop on Erlang, pages 43–49. ACM, 2014.
[38] Amir Ghaffari, Natalia Chechina, Phil Trinder, and Jon Meredith. Scalable persistent storage for Erlang: Theory and practice. In Proceedings of the 12th ACM SIGPLAN Workshop on Erlang, pages 73–74, New York, NY, USA, 2013. ACM.
[39] Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, 33:51–59, 2002.
[40] Andrew S. Grimshaw, Wm. A. Wulf, and the Legion team. The Legion vision of a worldwide virtual computer. Communications of the ACM, 40(1):39–45, 1997.
[41] Carl Hewitt. Actor model for discretionary, adaptive concurrency. CoRR, abs/1008.1459, 2010.
[42] Carl Hewitt, Peter Bishop, and Richard Steiger. A universal modular ACTOR formalism for artificial intelligence. In IJCAI'73, pages 235–245, San Francisco, CA, USA, 1973. Morgan Kaufmann.
[43] Rich Hickey. The Clojure programming language. In DLS'08, pages 1:1–1:1, New York, NY, USA, 2008. ACM.
[44] Zoltán Horváth, László Lövei, Tamás Kozsik, Róbert Kitlei, Anikó Nagyné Víg, Tamás Nagy, Melinda Tóth, and Roland Király. Building a refactoring tool for Erlang. In Workshop on Advanced Software Development Tools and Techniques, WASDETT 2008, 2008.
[45] Laxmikant V. Kale and Sanjeev Krishnan. Charm++: a portable concurrent object oriented system based on C++. In ACM SIGPLAN Notices, volume 28, pages 91–108. ACM, 1993.
[46] David Klaftenegger, Konstantinos Sagonas, and Kjell Winblad. On the scalability of the Erlang term storage. In Proceedings of the Twelfth ACM SIGPLAN Workshop on Erlang, pages 15–26, New York, NY, USA, September 2013. ACM.
[47] David Klaftenegger, Konstantinos Sagonas, and Kjell Winblad. Delegation locking libraries for improved performance of multithreaded programs. In Euro-Par 2014 Parallel Processing, volume 8632 of LNCS, pages 572–583. Springer, 2014.
[48] David Klaftenegger, Konstantinos Sagonas, and Kjell Winblad. Queue delegation locking. IEEE Transactions on Parallel and Distributed Systems, 2017. To appear.
[49] Rusty Klophaus. Riak Core: Building distributed applications without shared state. In ACM SIGPLAN Commercial Users of Functional Programming, CUFP '10, pages 14:1–14:1, New York, NY, USA, 2010. ACM.
[50] Avinash Lakshman and Prashant Malik. Cassandra: A decentralized structured storage system. SIGOPS Oper. Syst. Rev., 44(2):35–40, April 2010.
[51] J. Lee et al. Python actor runtime library, 2010. osl.cs.uiui.edu/parley/.
[52] Huiqing Li and Simon Thompson. Automated API migration in a user-extensible refactoring tool for Erlang programs. In Tim Menzies and Motoshi Saeki, editors, Automated Software Engineering, ASE'12. IEEE Computer Society, 2012.
[53] Huiqing Li and Simon Thompson. Multicore profiling for Erlang programs using Percept2. In Proceedings of the Twelfth ACM SIGPLAN Workshop on Erlang, 2013.
[54] Huiqing Li and Simon Thompson. Improved semantics and implementation through property-based testing with QuickCheck. In 9th International Workshop on Automation of Software Test, 2014.
[55] Huiqing Li and Simon Thompson. Safe concurrency introduction through slicing. In PEPM '15. ACM SIGPLAN, January 2015.
[56] Huiqing Li, Simon Thompson, György Orosz, and Melinda Tóth. Refactoring with Wrangler, updated. In ACM SIGPLAN Erlang Workshop, volume 2008, 2008.
[57] Frank Lübeck and Max Neunhöffer. Enumerating large orbits and direct condensation. Experimental Mathematics, 10(2):197–205, 2001.
[58] Andreea Lutac, Natalia Chechina, Gerardo Aragon-Camarasa, and Phil Trinder. Towards reliable and scalable robot communication. In Proceedings of the 15th International Workshop on Erlang, Erlang 2016, pages 12–23, New York, NY, USA, 2016. ACM.
[59] LWN.net. The high-resolution timer API, January 2006. https://lwn.net/Articles/167897/.
[60] Kenneth MacKenzie, Natalia Chechina, and Phil Trinder. Performance portability through semi-explicit placement in distributed Erlang. In Proceedings of the 14th ACM SIGPLAN Workshop on Erlang, pages 27–38. ACM, 2015.
[61] Jeff Matocha and Tracy Camp. A taxonomy of distributed termination detection algorithms. Journal of Systems and Software, 43(221):207–221, 1998.
[62] Nicholas D. Matsakis and Felix S. Klock II. The Rust language. In ACM SIGAda Ada Letters, volume 34, pages 103–104. ACM, 2014.
[63] Robert McNaughton. Scheduling with deadlines and loss functions. Management Science, 6(1):1–12, 1959.
[64] Martin Odersky et al. The Scala programming language, 2012. www.scala-lang.org.
[65] William F. Opdyke. Refactoring Object-Oriented Frameworks. PhD thesis, University of Illinois at Urbana-Champaign, 1992.
[66] Nikolaos Papaspyrou and Konstantinos Sagonas. On preserving term sharing in the Erlang virtual machine. In Torben Hoffman and John Hughes, editors, Proceedings of the 11th ACM SIGPLAN Erlang Workshop, pages 11–20. ACM, 2012.
[67] RELEASE Project Team. EU Framework 7 Project 287510 (2011–2015), 2015. http://www.release-project.eu.
[68] Konstantinos Sagonas and Thanassis Avgerinos. Automatic refactoring of Erlang programs. In Principles and Practice of Declarative Programming (PPDP'09), pages 13–24. ACM, 2009.
[69] Konstantinos Sagonas and Kjell Winblad. More scalable ordered set for ETS using adaptation. In Proceedings of the ACM SIGPLAN Workshop on Erlang, pages 3–11. ACM, September 2014.
[70] Konstantinos Sagonas and Kjell Winblad. Contention adapting search trees. In Proceedings of the 14th International Symposium on Parallel and Distributed Computing, pages 215–224. IEEE Computer Society, 2015.
[71] Konstantinos Sagonas and Kjell Winblad. Efficient support for range queries and range updates using contention adapting search trees. In Languages and Compilers for Parallel Computing: 28th International Workshop, volume 9519 of LNCS, pages 37–53. Springer, 2016.
[72] Marc Snir, Steve W. Otto, D. W. Walker, Jack Dongarra, and Steven Huss-Lederman. MPI: The Complete Reference. MIT Press, Cambridge, MA, USA, 1995.
[73] Sriram Srinivasan and Alan Mycroft. Kilim: Isolation-typed actors for Java. In ECOOP'08, pages 104–128, Berlin, Heidelberg, 2008. Springer-Verlag.
[74] Don Syme, Adam Granicz, and Antonio Cisternino. Expert F# 4.0. Springer, 2015.
[75] Simon Thompson and Huiqing Li. Refactoring tools for functional languages. Journal of Functional Programming, 23(3), 2013.
[76] Marcus Völker. Linux timers, May 2014.
[77] WhatsApp, 2015. https://www.whatsapp.com/.
[78] Tom White. Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale. O'Reilly, 3rd edition, revised and updated, 2012.
[79] Ulf Wiger. Industrial-strength functional programming: Experiences with the Ericsson AXD301 project. In Implementing Functional Languages (IFL'00), Aachen, Germany, September 2000. Springer-Verlag.