Improving the Scalability of the Erlang Distributed Actor Platform
Total Page:16
File Type:pdf, Size:1020Kb
P. Trinder, N. Chechina, N. Papaspyrou, K. Sagonas, S. Thompson, et al. • Scaling Reliably • 2017 Scaling Reliably: Improving the Scalability of the Erlang Distributed Actor Platform Phil Trinder,Natalia Chechina,Nikolaos Papaspyrou, Konstantinos Sagonas,Simon Thompson,Stephen Adams,Stavros Aronis, Robert Baker,Eva Bihari,Olivier Boudeville,Francesco Cesarini, Maurizio Di Stefano,Sverker Eriksson,Viktoria Fordos,Amir Ghaffari, Aggelos Giantsios,Rockard Green,Csaba Hoch,David Klaftenegger, Huiqing Li,Kenneth Lundin,Kenneth MacKenzie,Katerina Roukounaki, Yiannis Tsiouris,Kjell Winblad RELEASE EU FP7 STREP (287510) project www.release-project.eu/ April 25, 2017 Abstract Distributed actor languages are an effective means of constructing scalable reliable systems, and the Erlang programming language has a well-established and influential model. While the Erlang model conceptually provides reliable scalability, it has some inherent scalability limits and these force developers to depart from the model at scale. This article establishes the scalability limits of Erlang systems, and reports the work of the EU RELEASE project to improve the scalability and understandability of the Erlang reliable distributed actor model. We systematically study the scalability limits of Erlang, and then address the issues at the virtual machine, language and tool levels. More specifically: (1) We have evolved the Erlang virtual machine so that it can work effectively in large scale single-host multicore and NUMA architectures. We have made important changes and architectural improvements to the widely used Erlang/OTP release. (2) We have designed and implemented Scalable Distributed (SD) Erlang libraries to address language-level scalability issues, and provided and validated a set of semantics for the new language constructs. (3) To make large Erlang systems easier to deploy, monitor, and debug we have developed and made open source releases of five complementary tools, some specific to SD Erlang. Throughout the article we use two case studies to investigate the capabilities of our new technologies and arXiv:1704.07234v2 [cs.PL] 8 May 2017 tools: a distributed hash table based Orbit calculation and Ant Colony Optimisation (ACO). Chaos Monkey experiments show that two versions of ACO survive random process failure and hence that SD Erlang preserves the Erlang reliability model. While we report measurements on a range of NUMA and cluster architectures, the key scalability experiments are conducted on the Athos cluster with 256 hosts (6144 cores). Even for programs with no global recovery data to maintain, SD Erlang partitions the network to reduce network traffic and hence improves performance of the Orbit and ACO benchmarks above 80 hosts. ACO measurements show that maintaining global recovery data dramatically limits scalability; however scalability is recovered by partitioning the recovery data. We exceed the established scalability limits of distributed Erlang, and do not reach the limits of SD Erlang for these benchmarks at this scale (256 hosts, 6144 cores). 1 I. Introduction stant messaging server [77]. In Erlang, the actors are termed processes Distributed programming languages and and are managed by a sophisticated Virtual frameworks are central to engineering large Machine on a single multicore or NUMA scale systems, where key properties include host, while distributed Erlang provides rela- scalability and reliability. By scalability we tively transparent distribution over networks mean that performance increases as hosts and of VMs on multiple hosts. Erlang is supported cores are added, and by large scale we mean by the Open Telecom Platform (OTP) libraries architectures with hundreds of hosts and tens that capture common patterns of reliable dis- of thousands of cores. Experience with high tributed computation, such as the client-server performance and data centre computing shows pattern and process supervision. Any large- that reliability is critical at these scales, e.g. scale system needs scalable persistent stor- host failures alone account for around one fail- age and, following the CAP theorem [39], ure per hour on commodity servers with ap- Erlang uses and indeed implements Dynamo- proximately 105 cores [11]. To be usable, pro- style NoSQL DBMS like Riak [49] and Cassan- gramming languages employed on them must dra [50]. be supported by a suite of deployment, moni- While the Erlang distributed actor model toring, refactoring and testing tools that work conceptually provides reliable scalability, it has at scale. some inherent scalability limits, and indeed Controlling shared state is the only way to large-scale distributed Erlang systems must de- build reliable scalable systems. State shared by part from the distributed Erlang paradigm in multiple units of computation limits scalability order to scale, e.g. not maintaining a fully con- due to high synchronisation and communica- nected graph of hosts. The EU FP7 RELEASE tion costs. Moreover shared state is a threat project set out to establish and address the scal- for reliability as failures corrupting or perma- ability limits of the Erlang reliable distributed nently locking shared state may poison the actor model [67]. entire system. After outlining related work (Section II) and Actor languages avoid shared state: actors or the benchmarks used throughout the article processes have entirely local state, and only in- (Section III) we investigate the scalability lim- teract with each other by sending messages [2]. its of Erlang/OTP, seeking to identify specific Recovery is facilitated in this model, since ac- issues at the virtual machine, language and tors, like operating system processes, can fail persistent storage levels (Section IV). independently without affecting the state of We then report the RELEASE project work to other actors. Moreover an actor can supervise address these issues, working at the following other actors, detecting failures and taking re- three levels. medial action, e.g. restarting the failed actor. 1. We have designed and implemented a Erlang [6, 19] is a beacon language for re- set of Scalable Distributed (SD) Erlang li- liable scalable computing with a widely em- braries to address language-level reliabil- ulated distributed actor model. It has in- ity and scalability issues. An operational fluenced the design of numerous program- semantics is provided for the key new ming languages like Clojure [43] and F# [74], s_group construct, and the implementa- and many languages have Erlang-inspired tion is validated against the semantics (Sec- actor frameworks, e.g. Kilim for Java [73], tion V). Cloud Haskell [31], and Akka for C#, F# and Scala [64]. Erlang is widely used for build- 2. We have evolved the Erlang virtual machine ing reliable scalable servers, e.g. Ericsson’s so that it can work effectively in large-scale AXD301 telephone exchange (switch) [79], the single-host multicore and NUMA archi- Facebook chat server, and the Whatsapp in- tectures. We have improved the shared 2 P. Trinder, N. Chechina, N. Papaspyrou, K. Sagonas, S. Thompson, et al. • Scaling Reliably • 2017 ETS tables, time management, and load ter (Section VIII). balancing between schedulers. Most of Contributions. This article is the first sys- these improvements are now included in tematic presentation of the coherent set of the Erlang/OTP release, currently down- technologies for engineering scalable reliable loaded approximately 50K times each Erlang systems developed in the RELEASE month (Section VI). project. Section IV presents the first scalability study 3. To facilitate the development of scalable covering Erlang VM, language, and storage Erlang systems, and to make them main- scalability. Indeed we believe it is the first tainable, we have developed three new comprehensive study of any distributed ac- tools: Devo, SDMon and WombatOAM, tor language at this scale (100s of hosts, and and enhanced two others: the visualisa- around 10K cores). Individual scalability stud- tion tool Percept, and the refactorer Wran- ies, e.g. into Erlang VM scaling [8], or lan- gler. The tools support refactoring pro- guage and storage scaling have appeared be- grams to make them more scalable, eas- fore [38, 37]. ier to deploy at large scale (hundreds of At the language level the design, implemen- hosts), easier to monitor and visualise their tation and validation of the new libraries (Sec- behaviour. Most of these tools are freely tion V) have been reported piecemeal [21, 60], available under open source licences; the and are included here for completeness. WombatOAM deployment and monitor- While some of the improvements made to ing tool is a commercial product (Sec- the Erlang Virtual Machine (Section i) have tion VII). been thoroughly reported in conference pub- Throughout the article we use two bench- lications [66, 46, 47, 69, 70, 71], others are re- marks to investigate the capabilities of our new ported here for the first time (Sections iii, ii). technologies and tools. These are a compu- In Section VII, the WombatOAM and SD- tation in symbolic algebra, more specifically Mon tools are described for the first time, as an algebraic ‘orbit’ calculation that exploits is the revised Devo system and visualisation. a non-replicated distributed hash table, and The other tools for profiling, debugging and an Ant Colony Optimisation (ACO) parallel refactoring developed in the project have previ- search program (Section III). ously been published piecemeal [52, 53, 55, 10], We report on the reliability and scalability