Compiling for Concise Code and Efficient I/O

Sebastian Ertel, Andrés Goens, Justus Adam, and Jeronimo Castrillon
Chair for Compiler Construction, Technische Universität Dresden, Dresden, Germany

Abstract
Large infrastructures of Internet companies, such as Facebook and Twitter, are composed of several layers of microservices. While this modularity provides scalability to the system, the I/O associated with each service request strongly impacts its performance. In this context, writing concise programs which execute I/O efficiently is especially challenging. In this paper, we introduce Ÿauhau, a novel compile-time solution. Ÿauhau reduces the number of I/O calls through rewrites on a simple expression language. To execute I/O concurrently, it lowers the expression language to a dataflow representation. Our approach can be used alongside an existing programming language, permitting the use of legacy code. We describe an implementation in the JVM and use it to evaluate our approach. Experiments show that Ÿauhau can significantly improve I/O, both in terms of the number of I/O calls and concurrent execution. Ÿauhau outperforms state-of-the-art approaches with similar goals.

CCS Concepts • Software and its engineering → Functional languages; Data flow languages; Concurrent programming structures;

Keywords I/O, dataflow, concurrency

ACM Reference Format:
Sebastian Ertel, Andrés Goens, Justus Adam, and Jeronimo Castrillon. 2018. Compiling for Concise Code and Efficient I/O. In Proceedings of the 27th International Conference on Compiler Construction (CC'18), February 24–25, 2018, Vienna, Austria. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3178372.3179505

© 2018 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery. ACM ISBN 978-1-4503-5644-2/18/02.

1 Introduction
In today's Internet applications, I/O is a dominating factor for performance, an aspect that has received little attention in compiler research. The server-side infrastructures of Internet companies such as Facebook [1], Twitter [11], and Amazon [10] serve millions of complex web pages. They run hundreds of small programs, called microservices, that communicate with each other via the network. Figure 1 shows the interactions of the microservices at Amazon. In these infrastructures, reducing network I/O to satisfy sub-millisecond latencies is of paramount importance [7].

Microservice code bases are massive and are deployed across clusters of machines. These systems run 24/7 and need to be highly flexible and highly available. They must accommodate live (code) updates and tolerate machine and network failures without noticeable downtime. A service-oriented architecture [24] primarily addresses updates, while the extension to microservices deals with failures [8]. The idea is to expose functions as a service that runs as a stand-alone server application and responds to HTTP requests. It is this loose coupling of services via the network that enables both flexibility and reliability for the executing software. But this comes at the cost of code conciseness and I/O efficiency. Code that calls another service over the network instead of invoking a normal function loses conciseness. The boilerplate code for the I/O call obfuscates the functionality of the program and makes it harder to maintain. The additional latency to receive the response adds to the latency of the calling service.

1.1 Code Conciseness versus I/O Efficiency
Programmers commonly use frameworks, like Thrift [3], to hide the boilerplate network code behind normal function calls. This improves code conciseness but not I/O efficiency. There are essentially two ways to improve I/O latency: by reducing the number of I/O calls and through concurrency.
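To get a feel for these two levers, consider a toy cost model in which every request pays a fixed round-trip overhead plus a small per-item transfer cost. The constants and function names below are invented for illustration; they are not measurements from the paper:

```python
# Hypothetical cost model (illustration only, not data from the paper):
# each request pays a fixed round-trip overhead plus a per-item cost.
ROUND_TRIP_MS = 10.0
PER_ITEM_MS = 0.5

def sequential_latency(n_calls: int) -> float:
    # One round trip per I/O call, executed one after the other.
    return n_calls * (ROUND_TRIP_MS + PER_ITEM_MS)

def batched_latency(n_calls: int) -> float:
    # A single round trip carrying all n requests at once.
    return ROUND_TRIP_MS + n_calls * PER_ITEM_MS

def concurrent_latency(n_calls: int) -> float:
    # All calls issued concurrently: latency is that of the slowest call.
    return ROUND_TRIP_MS + PER_ITEM_MS

# e.g. sequential_latency(19) -> 199.5, batched_latency(19) -> 19.5
```

Even in this crude model, batching turns the per-call round-trip overhead into a one-time cost, which is the effect the blog experiment in Figure 2 measures at much larger scale.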


[Figure 1. Microservices at Amazon. Source: I Love APIs 2015 by Chris Munns, licensed under CC BY 4.0, available at: http://bit.ly/2zboHTK]

[Figure 2. Latency reduction via batching in a simple microservice for a blog. The plot compares batched and sequential retrieval latency (in ms, 0–2500) over the number of blog posts (0–20).]

To reduce the number of I/O calls in a program, one can remove redundant calls and batch multiple calls to the same service into a single one. Redundant calls occur naturally in concise code because such a code style fosters a modular structure in which I/O calls can happen inside different functions. The programmer should not have to, e.g., introduce a global cache to reuse the results of previous I/O calls. This would add the optimization aspect directly into the algorithm of the application/service, making the code harder to maintain and extend, especially in combination with concurrency. Batching, on the other hand, does not necessarily reduce the amount of data transferred, but it does decrease resource contention. A batched request uses only a single HTTP connection, a single socket on the server machine and a single connection to the database. If all services batch I/O, this can substantially decrease contention in the whole infrastructure. Network performance benefits dramatically when multiple requests are batched into a single one [22].

As a simple example, consider a web blog that displays information on two panes. The main pane shows the most recent posts, and the left pane presents meta information such as the headings of the most frequently viewed posts and a list of topics, each tagged with the number of associated posts. The blog service is responsible for generating the HTML code. To this end, it fetches post contents and associated meta data via I/O from other (database) services at various places in the code. We implemented the database service for our blog using Thrift and introduced a single call to retrieve the contents for a list of posts. Figure 2 compares the latency of this batch call with a sequential retrieval of the blog posts. The batched call benefits from various optimizations, such that retrieving 19 (4 kB) blog posts becomes almost 18× faster. Note that batched retrievals, whose latency is much smaller, cannot be distinguished in the plot because of the scale.

A concurrent program can be executed in parallel to decrease latency. When the concurrent program contains I/O calls, these can be performed in parallel even if the computation is executed sequentially. A concurrent execution of the program can fully unblock computation from I/O, but is at odds with a reduction of I/O calls. To reduce I/O calls, they need to be collected first, which removes the concurrency between the calls. Not all I/O calls can be grouped into a single one, for example if they interface with different services. Such calls would now execute sequentially. In order to benefit from concurrency, a batching optimization must preserve the concurrent program structure, not only between the resulting I/O calls but also between said calls and the rest of the computation.

To introduce concurrency, the developer would have to split up the program into different parts and place them onto threads or event handlers. Threads use locks, which can introduce deadlocks [18], and event-based programming suffers from stack ripping [2]. There is a long debate in the systems community about which of the two approaches provides the best performance [23, 30]. Programming with futures, on the other hand, needs the concept of a monad, which most developers in imperative languages such as Java, or even functional languages such as Clojure, struggle with [12]. All of these programming models for concurrency introduce new abstractions and clutter the code dramatically. Once more, they lift optimization aspects into the application code, resulting in unconcise code.

We argue that a compiler-based approach is needed to help provide efficient I/O from concise code. State-of-the-art approaches such as Haxl [20] and Muse [17] do not fully succeed in providing both code conciseness and I/O efficiency. They require the programmer to adopt a certain programming style to perform efficient I/O. This is because both are runtime frameworks and require some form of concurrent programming solely to batch I/O calls. Both frameworks fall short of exposing the concurrency of the application to the system they execute on.

1.2 Contribution
In this paper, we present a compiler-based solution that satisfies both requirements: concise code and efficient I/O. Our approach, depicted in Figure 3, builds on top of two intermediate representations: an expression language and a dataflow representation [6]. The expression language is based on the


lambda calculus, which we extend with a special combinator to execute I/O calls. The dataflow representation of an expression is a directed graph where nodes represent operations of the calculus and edges represent the data transfer between them. The compiler translates a program, i.e., an expression, into a dataflow graph. To reduce the number of I/O calls, we define a set of transformations in the expression language which are provably semantic-preserving. This batching transformation introduces dependencies which block computation from making progress while I/O is performed. To remove these dependencies, we define transformations on the dataflow graph. To validate our approach in practice, we implemented it as a compiler and runtime system for a domain-specific language called Ÿauhau, which can be embedded into Clojure and supports arbitrary JVM code. For our simple blog example, our transformations improve latency by a factor of 4×. To evaluate Ÿauhau in the context of more complex microservices, we use a tool that builds programs along the characteristics of real service implementations. For the generated services in our experiments, our transformations are superior to Haxl and Muse in terms of latency.

The rest of the paper is structured as follows. Section 2 introduces the expression IR and the transformations to reduce the number of I/O calls. In Section 3, we define the dataflow IR and the transformations to support a fully concurrent execution. We review related work in Section 4. Section 5 briefly describes our Ÿauhau implementation and evaluates it before we conclude in Section 6.

2 Reducing I/O on an Expression IR
We now present our expression IR. It is based on the call-by-need lambda calculus [4, 5], which prevents duplicated calculations. We want this for two reasons. First, our expressions can contain side-effects and duplicating them would alter the semantics of the program. Second, our intention is to minimize the computation, i.e., the I/O calls in the program expression. Figure 4 defines our IR as an expression language. In addition to the terms of the call-by-need lambda calculus for variables, abstraction, application and lexical scoping, we define combinators for conditionals, foreign function application and I/O calls. The combinator ff_f applies a function f that is not defined in the calculus to an arbitrary number of values. This addition allows integrating code written in other languages, such as Java. It allows integrating legacy code and makes our approach practical. For this reason, our values may either be a value in V_ff, the value domain of the integrated language, an abstraction or a list of values. Instead of recursion, we define the well-known higher-order function map to express an independent computation on the items in a list. In Section 2.2, we argue that map is sufficient to perform I/O optimizations. The nth function retrieves individual items from a list. We facilitate I/O calls in the language with the io combinator that takes a request as input and returns a response as its result. Both values, the request and the response, are defined in V_ff. As such, io can be seen as a foreign function call and the following semantic equivalence holds: io ≡ ff_io. A request is a tuple that contains the (HTTP) reference to the service to be interfaced with and the data to be transferred. A response is a single value. To denote the I/O call that fetches all identifiers of the existing posts, we write io_postIds. As an example, here is the expression for our blog service:

let x_mainPane =
      ff_render(                              -- 5. produce HTML
        map(λx.io(ff_reqPostContent(x))       -- 4. fetch post contents
          ff_last-n(10                        -- 3. filter last 10 posts
            map(λx.io(ff_reqPostInfo(x))      -- 2. fetch meta data
              io(ff_reqPostIds()))))) in      -- 1. fetch ids of all posts
let x_topics =
      ff_render(                              -- 4. produce HTML
        ff_tag(                               -- 3. tag topics
          map(λx.io(ff_reqPostInfo(x))        -- 2. fetch meta data
            io(ff_reqPostIds())))) in         -- 1. fetch ids of all posts
let x_popularPosts = t_popularPosts in        -- (omitted for brevity)
let x_leftPane = ff_render(ff_compose(x_topics x_popularPosts)) in
ff_render(ff_compose(x_mainPane x_leftPane))

[Figure 3. Overview of Ÿauhau: an unoptimized program is translated into the (λ-calculus-based) expression IR, where the I/O reduction rewrites are applied; the result is lowered to the dataflow IR, where the I/O concurrency transformations yield the optimized (concurrent) program.]

Terms:
t ::= x                      variable
    | λx.t                   abstraction
    | t t                    application
    | let x = t in t         lexical scope (variable binding)
    | if(t t t)              conditionals
    | ff_f(x_1 ... x_n)      apply foreign function f to x_1 ... x_n with n ≥ 0
    | io(x)                  apply I/O call to x

Values:
v ::= o ∈ V_ff               value in host language
    | λx.t                   abstraction
    | [v_1 ... v_n]          list of n values

Predefined Functions:
map(λx.t [v_1 ... v_n]) ≡ [(λx.t) v_1 ... (λx.t) v_n]
nth(n [v_1 ... v_n ... v_p]) ≡ v_n

Figure 4. Language definition of the expression IR.
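The terms of the expression IR can be mocked up as a small interpreter to make the definitions above concrete. The sketch below is our own illustration, not part of the Ÿauhau implementation: the dataclass names follow Figure 4, while the fake SERVICE table stands in for a remote service and models one network round trip per io lookup.

```python
from dataclasses import dataclass
from typing import Any

# Toy encoding of the expression IR of Figure 4 (illustration only).
@dataclass
class Var:            # variable x
    name: str

@dataclass
class Let:            # let x = t in t
    name: str
    bound: Any
    body: Any

@dataclass
class FF:             # foreign function application ff_f(x_1 ... x_n)
    fn: Any
    args: tuple = ()

@dataclass
class IO:             # io combinator: request -> response
    request: Any

@dataclass
class Map:            # map(λx.t [v_1 ... v_n])
    param: str
    body: Any
    lst: Any

# Pretend remote service: one lookup models one network round trip.
SERVICE = {
    ("posts", "ids"): [1, 2],
    ("posts", "info", 1): "intro",
    ("posts", "info", 2): "outro",
}

def evaluate(term, env=None):
    env = env or {}
    if isinstance(term, Var):
        return env[term.name]
    if isinstance(term, Let):
        return evaluate(term.body, {**env, term.name: evaluate(term.bound, env)})
    if isinstance(term, FF):
        return term.fn(*[evaluate(a, env) for a in term.args])
    if isinstance(term, IO):
        return SERVICE[evaluate(term.request, env)]   # one I/O call
    if isinstance(term, Map):
        return [evaluate(term.body, {**env, term.param: v})
                for v in evaluate(term.lst, env)]
    return term   # anything else is a literal host-language value

# The "topics" fetches of the blog expression:
# map(λx.io(ff_reqPostInfo(x)) io(ff_reqPostIds())).
topics = Let("ids", IO(("posts", "ids")),
             Map("x", IO(FF(lambda x: ("posts", "info", x), (Var("x"),))),
                 Var("ids")))
```

Evaluating `topics` performs one io call for the ids and one per post for the meta data, which is exactly the redundancy the transformations of Section 2 remove.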


Two requests query the same service if their references are equal. Since we define this equality over the values instead of the types, our transformations need to produce code that finds out at runtime which I/O calls can and cannot be batched. In the remainder of this section, if not stated otherwise, we transform (⇝) expressions only by applying the standard axioms of the call-by-need lambda calculus [4]. That is, all transformations preserve the semantics of the expression. We first introduce the transformations that can reduce the number of I/O calls whose evaluation is not influenced by control flow. Afterwards, we show how even I/O calls in expressions passed to conditionals and map can participate in these transformations.

2.1 I/O Reduction
Intuitively, we can only reduce the number of I/O calls (to the same remote service) inside a group of I/O calls that are data independent of each other. This means that there exist no direct nor transitive data dependencies between the calls in this group. This limits the applicability of our approach. However, we believe it is enough to cover a great number of use-cases. In future work, we plan to investigate how to batch I/O calls with dependencies, effectively sending part of the computation to a different service, and not only a simple I/O request.

In order to find such a group of independent calls, we first transform the expression into a form that makes all data dependencies explicit.

Definition 2.1. An expression is in explicit data dependency (EDD) form if and only if it does not contain applications and each result of a combinator or a (predefined) function is bound to a variable.

Thus, the data dependencies in the expression are explicitly defined by the bound variables x_1 ... x_n:

let x_1 = t_1 in let x_2 = t_2 in ... let x_n = t_n in x_n

where each "let x_i = t_i in" is a binding. Each expression t_i is a term with a combinator (io, ff, if) or a predefined function, e.g., map, nth. For example, the construction of the main pane of the web blog in EDD form is as follows:

let x1 = ff_reqPostIds() in
let x2 = io(x1) in
let f = λx_postId. let y1 = ff_reqPostInfo(x_postId) in
                   let y2 = io(y1) in y2 in
let x3 = map(f x2) in
let x4 = ff_last-n(10 x3) in
let g = λx_postInfo. let z1 = ff_reqPostContent(x_postInfo) in
                     let z2 = io(z1) in z2 in
let x5 = map(g x4) in
let x6 = ff_render(x5) in x6                                   (1)

In order to find independent groups of I/O calls in the EDD expression, we use a well-known technique called let-floating that is also used in the Haskell compiler [25]. Let-floating allows moving bindings either inwards or outwards for as long as the requirements for variable access are preserved, i.e., no new free variables are created. As such, let-floating is completely semantic-preserving. In our case, we float each I/O binding as far inward as possible, i.e., we delay the evaluation of I/O calls until no further progress can be made without performing I/O.

let x1 = ff_reqPostIds() in      -- 1. mainPane
let x2 = ff_reqPostIds() in      -- 1. topics
let x3 = io(x1) in               -- 2. mainPane  ⎫ I/O group
let x4 = io(x2) in               -- 2. topics    ⎭
e

For the sake of brevity, we use e to denote the rest of the blog computation. At this point, we perform the final step and replace the group of I/O calls with a single call to the function bio which performs the batching and executes the I/O. As such, the following semantic equivalence holds:

bio(i_1 ... i_n) ≡ [io(i_1) ... io(i_n)]                       (2)

The function is defined using two new (foreign) functions, batch and unbatch:

bio(x_1 ... x_n) ::= let y_{1...m} = batch(x_1 ... x_n) in
                     let z_{1...m} = map(λx.io(x) y_{1...m}) in
                     let v_{1...n} = unbatch(z_{1...m}) in v_{1...n}

We say that a variable y_{1...m} stores the list of values that would otherwise be bound to the variables y_1, ..., y_m. Since we do not know at compile-time which calls can be batched, batch takes n I/O requests and returns m batched I/O requests with n ≥ m. Each batched I/O request interfaces with a different service. A batched request contains a list of references to the original requests it satisfies, i.e., the positions in the argument list of bio. We assume that this information is preserved across the I/O call such that unbatch can re-associate the response with the position in the result list. That is, it preserves the direct mapping between the inputs of batch and the outputs of unbatch to preserve the equivalence defined in Equation 2. Finally, this list needs to be destructured again to re-establish the bindings:

let v_1 = io(x_1) in ... let v_n = io(x_n) in e
  ⇝ let v_{1...n} = bio(x_1 ... x_n) in
    let v_1 = nth(1 v_{1...n}) in ... let v_n = nth(n v_{1...n}) in e

Note that the order of I/O calls inside an I/O group is normally non-deterministic. This includes calls with and without side-effects to the same service. But I/O calls that are located in different functions may nevertheless assume an implicit order, i.e., on the side-effects occurring to the service. This fosters a modular program design by allowing new functionality to be added without making these dependencies explicit.
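The bio function and its batch/unbatch helpers can be sketched in plain code. For illustration only, we assume a request is a (service, payload) tuple and that each service offers a hypothetical batched endpoint call_service; none of these names come from the Ÿauhau implementation.

```python
from collections import defaultdict

def call_service(service, payloads):
    # Stand-in for one real batched I/O call: one response per payload.
    return [f"{service}:{p}" for p in payloads]

def batch(requests):
    # Group n requests into m <= n batched requests, one per distinct
    # service, remembering each request's position in bio's argument list.
    groups = defaultdict(list)
    for pos, (service, payload) in enumerate(requests):
        groups[service].append((pos, payload))
    return list(groups.items())

def unbatch(batched_responses, n):
    # Re-associate each response with the position of its original request,
    # preserving the direct input/output mapping required by Equation 2.
    results = [None] * n
    for entries, responses in batched_responses:
        for (pos, _), resp in zip(entries, responses):
            results[pos] = resp
    return results

def bio(*requests):
    batched = batch(requests)
    # One I/O call per distinct service (executed concurrently in Ÿauhau).
    responses = [(entries, call_service(service, [p for _, p in entries]))
                 for service, entries in batched]
    return unbatch(responses, len(requests))
```

With this sketch, `bio(("posts", 1), ("users", 7), ("posts", 2))` issues two batched calls instead of three individual ones, yet returns the responses in the original argument order, as Equation 2 demands.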

In order to preserve the consistency of the program, we require that the service receiving the batch defines the execution order of the contained requests. For example, a service may define to always execute all side-effect-free requests before side-effecting ones.

2.2 I/O Lifting
The lambda calculus does not directly incorporate control flow. For that reason, most programming languages define at least two additional forms: conditionals and recursion. Instead of relying on the more general concept of recursion, we base our I/O lifting on the higher-order map function. The reasoning behind this decision is as follows: we cannot optimize I/O across recursive calls that strictly depend on each other. From the perspective of an I/O call that is located in a recursive function, we have to distinguish whether the recursion is strictly sequential or not. The higher-order function scan¹ is a synonym for a strictly sequential recursion, while map does not define an order on the computation of the results. For the rest of this paper, we assume that the abstraction passed to scan is strictly sequential and define scan ≡ ff_scan.

The if combinator and the map function are special because they control the evaluation of the expression(s) passed to them as arguments. The if combinator may evaluate only one of the two expressions passed to it. The map function applies the expression to each value in the list. Although we transform the argument expressions into EDD form, I/O calls located inside of them cannot directly participate in the rest of the I/O-reducing transformations. To enable this, we need to extract, i.e., lift, I/O calls out of these argument expressions while still preserving the evaluation semantics. To this end, we use the following two simple and semantic-preserving rewrites:

if(c io(x) io(x)) ≡ io(x)                                      (3)
map(λx.io(x) [v_1 ... v_n]) ≡ [io(v_1) ... io(v_n)]           (4)

In general, instead of the expressions io(x) and λx.io(x) on the left-hand side of these rules, we could find arbitrary expressions. The challenge is to find a chain of transformations that creates a form where the expression passed to an if or a map is only an I/O call, as in Rules 3 and 4.

We proceed in two steps. First, we explain how to transform an expression into a form with three individual parts, where one of them contains solely an I/O call. Afterwards, we use this new form to extract I/O calls out of an if and a map using the rewrite rules defined above.

We start with an arbitrary expression in EDD form and use a concept called lambda lifting [15]. Figure 5 visualizes the transformations. Figure 5a shows an EDD expression in a form where the right-hand side of the jth binding applies an I/O call to the variable x_{j-1} bound in the (j-1)th binding. Without loss of generality, we assume that t_j = io(x_{j-1}) because we can find this form via let-floating. In Figures 5b, 5c and 5d, we apply the lambda lifting transformations in order to devise an expression with three individual functions: the computation of the request (f), the I/O call (g), and the continuation of the computation (h). To lambda lift the expression t_j that executes the I/O call in Figure 5b, we (1) create an abstraction for t_j that defines all its free variables, in this case x_{j-1}, as arguments and bind this abstraction to g,

io(x_{j-1})  →  g = λx_{j-1}.io(x_{j-1})

(2) apply g in place of t_j,

t_j  →  g x_{j-1}

and (3) reconstruct the lexical scope for the rest of the computation e_{j+1...n}²:

let x_j = g x_{j-1} in e_{j+1...n}

In Figure 5c, we lift the computation of the request e_{1...j-1} and in Figure 5d the continuation of the computation on the result of the I/O call e_{j+1...n}. To cover the general case, we assume that e_{j+1...n} does not only contain applications to the I/O result x_j but also to all other variables x_1, ..., x_{j-1} in its scope. To provide them in the lexical scope for the application to h, the lambda lifting of e_{1...j-1} returns them from f and rebinds them after the application. The analysis to reduce this set of free variables to the ones actually used in e_{j+1...n} is straightforward and therefore left out. With this process, we effectively created an expression that captures the three parts as individual functions.

In the final step, we apply this approach to expressions passed to if and map such that we can extract the application of g, i.e., the I/O call. We abstract over the specific combinator or function with a combinator c that takes a single expression as its argument. We focus solely on the applications and leave out the abstractions and the reconstruction of the lexical scope x_1 ... x_{j-1} from x_{1...j-1}. For c, we define the following semantic equivalence:

c(let x_{1...j-1} = f x_0 in       let x_{1...j-1} = c(f x_0) in
  let x_j = g x_{j-1} in        ≡  let x_j = c(g x_{j-1}) in
  let x_n = h x_1 ... x_j          let x_n = c(h x_1 ... x_j)
  in x_n)                          in x_n

That is, passing a single expression to c is semantically equivalent to evaluating c on each individual part of it. This holds under the premise that the individual applications of c preserve the operational semantics of the single application. When c is a conditional if(t_cond t_true t_false), this requires that

¹ scan is a version of fold/reduce that emits not only the result of the last recursion step but the results of all recursion steps.
² We use the notation e_{1...n} to refer to the part of the expression defined in Figure 5a that binds the results of the terms t_1 ... t_n to variables x_1 ... x_n.
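The distinction drawn in footnote 1 can be made concrete with a small sketch (our own illustration): scan threads an accumulator through the list, so step i depends on step i-1 and I/O inside the stepper cannot be batched across steps, while map imposes no order between the per-item computations.

```python
def scan(f, init, xs):
    # A fold that emits the result of every recursion step: step i
    # consumes the result of step i-1, i.e. strictly sequential recursion.
    acc, out = init, []
    for x in xs:
        acc = f(acc, x)   # data dependency on the previous step
        out.append(acc)
    return out

# In contrast, map has no cross-item dependency, so per-item I/O calls
# form an independent group that the bio rewrite may batch:
# [f(x) for x in xs] imposes no order between the f(x).
```

For example, a running sum scan(lambda a, x: a + x, 0, [1, 2, 3]) yields [1, 3, 6]: each entry needs its predecessor, which is exactly why I/O lifting is based on map rather than on general recursion.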


[Figure 5 presents four expression listings: (a) Isolated I/O form, (b) Lifting I/O, (c) Lifting Request Computation, (d) Lifting Continuation.]

Figure 5. We extract an I/O call out of an expression (Fig. 5a) using lambda lifting. In this process, we split the expression into three parts: the I/O call g (Fig. 5b), the computation of the request f (Fig. 5c), and the continuation of the computation with the response h (Fig. 5d). To extract one of the parts, we create an abstraction, apply it individually and recreate (destructure) the original lexical scope.
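The split that Figure 5 performs on the expression IR can be mimicked in plain code. The sketch below (our own names and request encoding, not the paper's) splits a per-post computation into the request computation f, the I/O call g and the continuation h; a map over the fused computation then becomes three maps, and the middle one is exactly a batchable group of independent I/O calls:

```python
def f(post_id):
    # Compute the request (no I/O). The tuple encoding is our assumption.
    return ("postInfo", post_id)

def g(request):
    # The isolated I/O call io(x); a stand-in for the network round trip.
    service, payload = request
    return f"{service}:{payload}"

def h(post_id, response):
    # Continuation of the computation on the response.
    return {"id": post_id, "info": response}

def fused(post_id):
    # The original, unsplit expression: request, I/O and continuation.
    return h(post_id, g(f(post_id)))

def lifted(ids):
    # map(fused, ids) rewritten as three maps, per the map equivalence:
    requests = [f(i) for i in ids]            # map of f
    responses = [g(r) for r in requests]      # map of g: batchable I/O group
    return [h(i, r) for i, r in zip(ids, responses)]  # map of h
```

The `responses` line is where a bio-style batcher would replace the per-item calls with one batched request, while `lifted` remains observably equivalent to mapping `fused` over the list.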

the individual if applications use the same condition result bound to x_cond:

let x_{1...j-1} = if(x_cond (f_true x_0) (f_false x_0)) in
let x_j = if(x_cond (g_true x_{j-1}) (g_false x_{j-1})) in
let x_n = if(x_cond λ.(h_true x_1 ... x_j) λ.(h_false x_1 ... x_j))
in x_n

The very same concept applies to the map function. Since map returns a list instead of a single value, the variable x_{[j]_m} denotes the list of m results where each would have been bound to the variable x_j:

let x_{[1...j-1]_m} = map(λx_0.(f x_0) [v_1 ... v_m]) in
let x_{[j]_m} = map(λx_{j-1}.(g x_{j-1}) x_{[j-1]_m}) in
let x_{[n]_m} = map(λx_1...x_j.(h x_1 ... x_j) x_{[1...j]_m})
in x_{[n]_m}

For both if and map, the application of g reduces via beta reduction to a single I/O call. That is, it matches the left-hand side of Equations 3 and 4 such that we can lift the I/O call out of the expressions. Our I/O lifting works for a single c at a time, but we can apply it again to the resulting expression to handle deeply nested combinations of if and map. In the case of conditionals, we assumed that t_true and t_false both contain an I/O call. If either expression does not perform an I/O call, we simply add one to it that executes an empty request, i.e., does not perform real I/O, and filter it out in bio. This allows optimizing the true I/O call on the other branch. Note the importance of the map function: since the same expression is applied to all m values in the list, it has the potential to reduce m I/O calls to one. Further note that our concept of lifting expressions is not restricted to I/O calls. It is more general and can lift other foreign function calls or even expressions, for as long as there exists a semantic equivalence in the spirit of Equations 3 and 4.

3 A Dataflow IR for Implicit Concurrency
Our expression IR has no explicit constructs for concurrency such as threads, tasks or futures. This is on purpose: our intention is to hide concurrency issues from the developer and let the compiler perform the optimization. The expression IR builds upon the lambda calculus, which is well-suited for concurrency. The very essence of the Church-Rosser property, also referred to as the diamond property, is that there may exist multiple reductions that all lead to the same result [4]. During reduction, a step may occur where more than a single redex (reducible expression) exists. This can only be the case when these redexes are (data) independent of each other. To capture this essential property, we lower our expression IR to a dataflow IR.

3.1 From Implicit to Explicit Concurrency
Dataflow is an inherently concurrent representation of a program as a directed graph where nodes are computations and edges represent data transfer between them. As such, we


Data￿ow Elements: d ::= 1-1 node Combinators: ff edge ff foreign function call | 7! port io io I/O call | 7! 1-N node Terms: | x variable N-1 node 7! | t term 7! Data￿ow Functions (/Nodes): let x = t in t x lexical scope ctrl data to control signal 7! | x fff (x) ff apply ￿-call f to x not negation 7! | io(x) x apply I/O call to x sel selection 7! io | Control Flow: ds det. split | ctrl dm det. merge if(ttt) sel | 7! not ctrl

Prede￿ned Value Functions: Prede￿ned Functions: len-[] length of list len-[] … | [v1…vn] … map(λx.t [1 ...n]) []~> ds … dm ~>[] []~> list to stream 7! | ~>[] stream to list |

Figure 6. Definition of the dataflow IR and the translation from the expression IR to the dataflow IR.

present our dataflow IR in an abstract fashion, as an execution model for exposing concurrency. For a concrete concurrent or even parallel implementation, a compiler back-end can map this graph either to threads and queues [19] or to processors and interconnects, for example on an FPGA [29]. For this paper, we implemented the threads-and-queues version. The details are beyond the scope of this paper.

The Dataflow IR. We define our dataflow IR on the left-hand side of Figure 6. The basic dataflow elements are nodes, edges and ports. A dataflow node receives data via its input ports to perform a computation and emits the results via its output ports. Such a computation can correspond to foreign function or I/O calls, as well as to one of the special dataflow or predefined value functions. Data travels through the graph via the edges, where an edge originates at an output port and terminates at an input port. An input port can only be connected to a single edge, while an output port can have multiple outgoing edges. An output port replicates each emitted value and sends it via its outgoing edges to other nodes.

Typical dataflow nodes dequeue one value either from one or all of their input ports, perform a computation and emit a result. That is, they have a 1-1 correspondence from the input to the output, very much like a function. But dataflow nodes are free to define their own correspondence. For example, a node that retrieves a value from one or all of its input ports but emits N values before retrieving the next input value has correspondence 1-N. The opposite N-1 node retrieves N values from one or all of its input ports and only then emits a single result value. This correspondence is depicted by the color of the node in our figures. In order to support this concept, a node is allowed to have state, i.e., its computation may have side effects.

We define a list of 1-1 dataflow nodes that allow control flow and enhance concurrency. The ctrl node takes a boolean and turns it into a control signal that states whether the next node shall be executed or not. If the control signal is false, then the node is not allowed to perform its computation and must discard the data from its input ports. The not node implements the negation of a boolean. The sel (select) node receives a choice on its bottom input port that identifies the input port from which to forward the next data item. The ds (deterministic split) node may have any number of output ports and forwards arriving packets in a round-robin fashion. The dm (deterministic merge) node may have any number of input ports and performs the inverse operation. Additionally, we define three nodes to operate on lists. The len-[] node computes the size of a list. The []~> node is a 1-N node that receives a list and emits its values one at a time. In order to perform the inverse N-1 operation, the ~>[] node first receives the number N on its input port on the top before it can construct the result list.

Lowering Expressions to Dataflow. From an expression in EDD form we can easily derive the corresponding dataflow graph, as shown on the right-hand side of Figure 6.
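To make the node model concrete, the following is our own minimal Python sketch of the threads-and-queues execution model described above; the class and method names (Node, connect, run) are ours for illustration and are not part of Ÿauhau. It shows a 1-1 node that dequeues one value from every input port, computes, and replicates the result to all outgoing edges.

```python
import queue
import threading

class Node:
    """A 1-1 dataflow node: dequeues one value from every input port,
    applies its function, and replicates the result to every outgoing
    edge of its output port."""

    def __init__(self, fn, n_inputs):
        self.fn = fn
        self.inputs = [queue.Queue() for _ in range(n_inputs)]
        self.out_edges = []  # one output port may fan out to many edges

    def connect(self, downstream, port):
        self.out_edges.append(downstream.inputs[port])

    def run(self, n_items):
        for _ in range(n_items):
            args = [q.get() for q in self.inputs]  # one value per input port
            result = self.fn(*args)
            for edge in self.out_edges:            # replicate to every edge
                edge.put(result)

# Wire a tiny two-node pipeline: double -> increment -> sink.
double = Node(lambda x: 2 * x, 1)
inc = Node(lambda x: x + 1, 1)
sink = Node(lambda x: x, 1)  # we only read its input queue
double.connect(inc, 0)
inc.connect(sink, 0)

for v in [1, 2, 3]:
    double.inputs[0].put(v)

threads = [threading.Thread(target=n.run, args=(3,)) for n in (double, inc)]
for t in threads:
    t.start()
for t in threads:
    t.join()

results = [sink.inputs[0].get() for _ in range(3)]
print(results)  # [3, 5, 7]
```

Because each node runs in its own thread and the queues buffer values, independent nodes of the graph naturally execute concurrently; this is only a sketch of the execution model, not the Ÿauhau back-end itself.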

CC’18, February 24–25, 2018, Vienna, Austria · S. Ertel, A. Goens, J. Adam, and J. Castrillon


Each term on the right-hand side of a binding form translates to a node. In EDD form this can only be an application of a combinator, a conditional, or a call to one of the predefined functions. The definition of dataflow nodes perfectly matches the semantics of our ff and io combinators because both may have side effects. Since both require only a single argument, the corresponding nodes define one input port and one output port with a 1-1 correspondence. Each variable translates into an arc. More specifically, each application to a variable creates a new edge at the same output port. An unlabeled ellipse denotes an abstract term which translates to a subgraph of the final dataflow graph. In order to translate control flow into dataflow, we turn the result of the first term into a control signal and send it to the subgraph from the second term. The subgraph for the third term receives the negated result as a signal. Therefore, the subgraph for the second term computes a result only if the subgraph of the first term emitted a true value. Otherwise, the subgraph for the third term performs the computation. The translation of the map function first streams the input list and then dispatches the values to the replicas of the subgraph derived from t. The subgraph results are again merged and then turned back into a list that is of the same length as the input list. Computation among the replicas of the subgraph of t can happen concurrently. To achieve maximum concurrency, the number of subgraphs needs to be equal to the size of the input list. This is impossible to achieve at compile time because the size of the list is runtime information. So we need to make a choice for the number of replicas at compile time and route the length of the input list to the node that constructs the output list. For both terms, the conditionals and the map function, we assume that the terms passed as arguments do not contain any free variables. The approaches to do so can be found elsewhere [29]. In this paper, however, we focus on concurrency transformations. As an example, Figure 8 shows the dataflow graph of the EDD expression (1) that computes the main pane.

Figure 8. The dataflow graph for EDD expression (1) from Section 2.1 that computes the main pane.

3.2 Concurrent I/O

We introduce concurrency for the bio function in two steps. First, we translate it into a dataflow graph with concurrent I/O calls. Then we define a translation that also unblocks the rest of the program.

The translation exploits the fact that a dataflow node can have multiple output ports, while a function/combinator in our expression IR can only return a single value. Most programming languages support syntactic sugar for multiple return values. The concept is referred to as destructuring and allows binding the values of a list directly to variables v1 … vn such that

    let v1...n = bio(x1 … xn) in
    let v1 = nth(1, v1...n) in … let vn = nth(n, v1...n) in e
      ::=  let v1 … vn = bio(x1 … xn) in e

In languages such as Haskell and Clojure, destructuring desugars into calls to nth just as in our expression IR. But the destructuring version captures very well the concept of multiple output ports for the bio node in the dataflow graph, one for each value in the list. This makes the resulting dataflow graph more concise.

Since the definition of bio uses map, we can directly translate it into dataflow as shown in Figure 7. The resulting graph expresses the concurrency of the I/O calls, but the ~>[] node requires each I/O call to finish before it emits a result. That is, computation that depends on I/O calls that finished early is blocked waiting for slower I/O calls that it does not depend upon.

Figure 7. The dataflow graph for concurrent I/O.

In order to get rid of this dependency, we remove the ~>[] node and change the unbatch node to receive individual I/O responses instead of the full list. Since the unbatch node does not require all I/O responses to perform this operation, it can forward results as soon as it receives them.

4 Related Work

As of this writing, two solutions exist in the public domain for optimizing I/O in microservice-based systems. Facebook's Haxl [20] is an embedded domain-specific language (EDSL) written in Haskell. Haxl uses the concept of applicative functors to introduce concurrency in programs and to batch I/O requests at runtime. Another existing framework is the closed-source and unpublished system Stitch, which is Twitter's solution written in Scala and influenced by Haxl. Inspired by Stitch and Haxl, the second framework in the

public domain is Muse [16, 17], written in Clojure. Muse implements a so-called free monad [27] to construct an abstract syntax tree (AST) of the program and interpret it at run-time. To enable the AST construction, a developer using Muse has to switch to a style of programming using functors and monads. While these are fundamental concepts in Haskell, this is not the case for Clojure. Hence, the compiler cannot help the developer to ensure these concepts are used correctly, making Muse programs hard to write.

Both Haxl and Muse significantly improve the I/O of systems with language-level abstractions. However, in contrast to Ÿauhau, these abstractions still have to be used explicitly. Furthermore, Haxl and Muse are limited in terms of concurrency. Neither allows a concurrent execution of computation and I/O by unblocking computation when some I/O calls have returned.

On the side of dataflow, there exists a myriad of systems, including prominent examples such as StreamIt [28], Flume [9] or StreamFlex [26]. Programs written for these engines do not perform extensive I/O. They usually comprise two I/O calls, one for retrieving the input data and another one for emitting the results. Compiler research hence focuses primarily on optimizing the dataflow graph to improve parallel execution [14]. In contrast, our focus lies in enabling efficient I/O for programs that contain a substantial amount of I/O calls spread across the dataflow graph. Our approach is intertwined with the expression IR, which contains additional semantic information. Leveraging this additional information allows us to apply provably semantics-preserving transformations in the presence of nested control-flow structures. To the best of our knowledge, this is unique to Ÿauhau.

5 Evaluation

To evaluate Ÿauhau in terms of I/O efficiency and compare it against Muse and Haxl, we implemented Ÿauhau by defining a macro in Clojure. The Ÿauhau macro takes code as its input that adheres to the basic forms of the Clojure language, which is also expression-based. Figure 9 gives a brief overview of the language constructs. For example, the function that builds the main pane of the blog would be written as follows:

(fn mainPane []
  (render (map #(io (reqPostContent % src1))
               (last-n 10
                       (map #(io (reqPostInfo % src2))
                            (io (reqPostIds src3)))))))

Figure 9. Mapping the terms of the Clojure-based Ÿauhau language to our expression IR: x ↦ x; (fn [x] t) and #(t) ↦ λx.t; (t t) ↦ t t; (let [x t] t) ↦ let x = t in t; (if t t t) ↦ if(t, t, t); (f x1 … xn) ↦ ff_f(x1 … xn); (io x) ↦ io(x).

In order to abstract over the types of the requests, we prefix (foreign) functions with req that create a certain type of request. For example, reqPostInfo takes the identifier of a post as an argument and the reference to a source (src2) to generate the corresponding request.

As a first evaluation, we implemented the blog example using Clojure and ported it to Ÿauhau³. Table 1 shows the averages and estimated standard deviations over 20 runs of this example in four variants: the baseline sequential Clojure version (seq), a Ÿauhau version without applying any transformation (base), only with the batching transformation (batch), and with both batching and concurrency (full).

Table 1. Execution times of the Blog Example

    Version    seq        base       batch     full
    Time [ms]  275 ± 25   292 ± 19   79 ± 21   66.5 ± 5.2

We see that the overhead introduced through Ÿauhau is within the standard deviation of the experiment, while the transformations significantly improved I/O efficiency. These results affirm the benefits of batching shown in Figure 2.

Although the simple blog example already shows the benefits from batching, we expect Ÿauhau's rewrites to shine in real-world applications where dozens of microservices interact in complex ways. Writing such microservice-based systems and obtaining real-world data is a difficult undertaking on its own, into which large companies invest person-decades. Thus, the individual services are usually intellectual property and not in the public domain. This is a general problem that extends beyond this work and hinders research in this area.

³https://tud-ccc.github.io/yauhau-doc/

5.1 Microservice-like Benchmarks

Due to missing public-domain benchmarks, we use a framework for generating synthetic microservices [13]. It generates programs that aim to resemble the typical structure of service-based applications. To build such benchmark programs, the tool generates source code out of random graphs. In this way, it can serialize the graph into comparable code in different languages.

The random graphs are based on the concept of Level Graphs, which are directed acyclic graphs where every node is annotated with an integer level. These graphs are useful to capture the structure of computation at a high level of abstraction. We use l(v) to denote the level of the node v. A level graph can only contain an edge (v, w) if l(v) > l(w). Levels aim to capture the essence of data locality in a program. The basic concept behind the random level graph generation, thus, lies in having non-uniform, independent probabilities of obtaining an edge in the graph.
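The level-graph rule can be made concrete with a small sketch. The generator below is our own illustration (the function name and parameters are not those of the tool from [13]); it assumes the edge probability 2^(l(w) − l(v)) used in our benchmarks, which makes edges between nearby levels more likely than edges spanning many levels.

```python
import random

def random_level_graph(n_nodes, n_levels, seed=0):
    """Assign each node an integer level, then include an edge (v, w)
    only if l(v) > l(w), with probability 2**(l(w) - l(v)) < 1."""
    rng = random.Random(seed)
    level = {v: rng.randrange(n_levels) for v in range(n_nodes)}
    edges = [(v, w)
             for v in range(n_nodes)
             for w in range(n_nodes)
             if level[v] > level[w]
             and rng.random() < 2 ** (level[w] - level[v])]
    return level, edges

level, edges = random_level_graph(n_nodes=8, n_levels=4, seed=42)
# Every edge respects the level ordering, so the graph is acyclic.
assert all(level[v] > level[w] for v, w in edges)
```

Since every edge points from a higher level to a strictly lower one, acyclicity holds by construction, which is what makes these graphs serializable into straight-line benchmark code.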


In our benchmarks, a random level graph has an edge (v, w) with probability 2^(l(w) − l(v)) < 1.

To generate code, nodes are annotated with a label for the type of code they will represent. These labels are also assigned independently at random. We use four different annotation types: compute (an ff compute call), req->io (to simulate io(ff_req* (...))), subfunction and map. The first two generate code that simulates external function calls, involving computation or I/O and foreign function calls. Subfunction and map nodes generate a call to, or a map over, a local function, for which a new (small) random graph is generated in turn.

Figure 10. An example of a Level Graph. [Four levels 0–3 with nodes x1–x7, annotated as subfunction, req->io and compute.]

In order to generate code from a code-annotated Level Graph, we only need to traverse the nodes and generate functions with dependencies in accordance with the graph. If there are subfunctions labeled as such in the graph, we need to do this recursively for all corresponding subgraphs as well. The example of Figure 10 can be converted into the following Clojure code, omitting the code generated for subfun-3.

(let [x4 (req->io source 100) x5 (compute 100)
      x6 (req->io source 100) x7 (compute 100)]
  (let [x3 (subfun-3 x4 x5 x7)]
    (let [x1 (compute 100 x4)
          x2 (req->io source 100 x3)]
      (req->io source 100 x1 x2 x6))))

Generating Haskell code for Haxl is in fact very similar.

5.2 Experimental Setup

We designed three experiments to measure different aspects of Ÿauhau and compare it to state-of-the-art frameworks. The size of the generated code and the included number of I/O calls is on par with numbers from Facebook published in [20]. We performed every measurement 20 times with different (fixed) random seeds and report the averages.

The first experiment, baseline, shows a general comparison of the batching properties of Ÿauhau, Haxl and Muse, compared to an unoptimized sequential execution (seq). For this, as the independent variable (indep. var.) we change the number of levels in a graph (# lvl.) between 1 and 20, and for each number of levels we look at the average number of I/O calls (# I/O calls) as the dependent variable (dep. var.).

The second experiment, concurrent I/O, is similar to the baseline comparison, with a crucial difference in the structure of the graphs. We add an independent io operation that directly connects to the root of our level graphs. This is meant to simulate an I/O call that has a significant latency (of 1000 ms), several times (at least 3x) slower than the other I/O calls in the program. In a production system, the cause can be the retrieval of a large data volume, a request that gets blocked at an overloaded data source, or communication that suffers network delays. To measure the effects of this latency-intensive fetch, instead of the total number of I/O calls we report the total latency for each execution.

Finally, the third experiment, modular, aims to measure the effect of having function calls in the execution. We fix the number of levels (20). Then, we generate the same graph by using the same seed, while increasing the probability for a node to become a function call when being annotated for code generation (pr. subf.). This isolates the effect of subfunction calls on batching efficiency. We do this for 10 different random seeds and report the averages.

Table 2 summarizes these experiments. Our experiments ran on a machine with an Intel i7-3770K quad-core CPU, running Ubuntu 16.04, GHC 8.0.1, Java 1.8 and Clojure 1.8.

Table 2. Experimental Setup

    exp.       indep. var.  dep. var.    # lvl.  pr. subf.
    baseline   # lvl.       # I/O calls  1-20    0
    conc. I/O  # lvl.       latency      1-20    0
    modular    pr. subf.    # I/O calls  20      0-0.4

5.3 Results

Figure 11 shows the baseline comparison (see Table 2).

Figure 11. Results: Baseline. [Average # I/O calls over # graph levels (0–20) for haxl, muse, seq and yauhau.]

The plot shows that batching in Ÿauhau is superior to Haxl and Muse because our rewrites manage to batch across levels in the code. In this set of experiments, Ÿauhau achieves an average performance improvement of 21% over the other frameworks for programs with more than a single level. To maximize batching, Muse and Haxl require the developer to optimally reorder the code, ensuring the io calls that can be batched in one round are in the same applicative-functor block. This essentially contradicts the very goal of Haxl and Muse, namely relieving the developer from having to worry about efficiency and instead focus on functionality.

With Version 8, the Haskell compiler GHC introduced a feature called applicative-do [21], which allows developers


Figure 12. Results: Concurrent I/O. [Service latency [ms] over # graph levels (0–20) for haxl (fork), yauhau and yauhau (conc I/O).]

Figure 13. Results: Modular. [Average # I/O calls over the probability of function/algorithm calls (0.1–0.4) for haxl, muse and yauhau.]

to write applicative functions in do-notation. This should, in principle, provide an ideal execution order for usage of applicative-functor code. We tested different variants of code for Haxl. As expected, applicative-do code produced the exact same results as code written with explicit applicative functors, and better results than the variants using monadic do-notation. For this reason, we report the results only as "haxl", ignoring the worse results obtained from monadic-do code.

Figure 12 compares Haxl to Ÿauhau with and without support for concurrent I/O, showing how it can indeed be very beneficial. To make the comparison fair, we add asynchronous I/O support to Haxl using the async package. It internally uses forkIO to fork threads and execute the requests in a round concurrently, effectively putting the fork back into Haxl. After retrieving all responses, the program continues its execution. On the other hand, the non-strict unbatch dataflow node from Ÿauhau's concurrent I/O executes the slow io operation in parallel to the computation in the rest of the graph. In particular, at level 7 the plot starts to depict the full 1000 ms latency of the slow service as the difference between Ÿauhau and Ÿauhau (conc I/O). This is the case when there are enough I/O rounds to displace the slow data source as the latency bottleneck.

Finally, the results of the modular experiment can be seen in Figure 13. It clearly shows that subfunctions have an effect on efficiency in the other frameworks, but none in Ÿauhau. The plot shows that the more modular the program, the less efficient the other frameworks become in terms of batching I/O requests. This is unfortunate because introducing a more modular structure into the program is one way to simplify complex applicative expressions. This is not only true for the explicitly applicative program versions but also for the implicit applicative-do desugaring in GHC 8. Since Ÿauhau flattens the graph, including the graphs for subfunctions, it avoids these problems through the dataflow execution model. Thus, it is also efficient with programs written in a modular fashion using subfunctions. Note that we did not investigate the scalability of this flattening for very large programs. With the moderate size of microservices, we do not expect this to be a problem. In case it is, devising a partial flattening structure favoring I/O calls would be straightforward.

6 Conclusion and Future Work

Today's internet infrastructures use a microservice-based design to gain maximum flexibility at the cost of heavy I/O communication. In this paper, we argued that a compiler can provide the necessary I/O optimizations while still allowing the developer to write concise code. To support this claim, we presented our compiler framework Ÿauhau, which uses two intermediate representations. The first is an expression IR that allows us to define semantics-preserving transformations to reduce the number of I/O calls. The second is a dataflow IR which is concurrent by design and allows us to unblock computation from I/O. We implemented Ÿauhau on the JVM in Clojure and show that it can speed up the latency of even simple services, e.g., for constructing a web blog, by almost 4x. To compare against state-of-the-art approaches, we used a microservice benchmarking tool to generate more complex code. For the microservices that we generated in the benchmark, Ÿauhau performs 21% less I/O than runtime approaches such as Haxl and Muse.

Ÿauhau prioritizes batching over concurrency. This might be the best default solution, but it is not clear that it is always ideal. In future work we plan to address this trade-off. Furthermore, our technique for batching I/O is only sound for independent calls to the same data source. We plan to investigate the possibility of batching multiple calls to the same data source, i.e., effectively sending part of the computation to a different service, and not only a simple I/O request.

Acknowledgments

The authors thank the anonymous reviewers and the shepherd Yun Liang for their help to improve the paper. This work was supported in part by the German Research Foundation (DFG) within the Collaborative Research Center HAEC and the Center for Advancing Electronics Dresden (cfaed).

References

[1] Lior Abraham, John Allen, Oleksandr Barykin, Vinayak Borkar, Bhuwan Chopra, Ciprian Gerea, Daniel Merl, Josh Metzler, David Reiss, Subbu Subramanian, Janet L. Wiener, and Okay Zed. 2013. Scuba: Diving into Data at Facebook. Proc. VLDB Endow. 6, 11 (2013), 1057–1067. https://doi.org/10.14778/2536222.2536231


[2] Atul Adya, Jon Howell, Marvin Theimer, William J. Bolosky, and John R. Douceur. 2002. Cooperative Task Management Without Manual Stack Management. In Proceedings of the General Track of the Annual Conference on USENIX Annual Technical Conference (ATEC '02). USENIX Association, Berkeley, CA, USA, 289–302. http://dl.acm.org/citation.cfm?id=647057.713851
[3] Aditya Agarwal, Mark Slee, and Marc Kwiatkowski. 2007. Thrift: Scalable Cross-Language Services Implementation. Technical Report. Facebook. http://thrift.apache.org/static/files/thrift-20070401.pdf
[4] Zena M. Ariola and Matthias Felleisen. 1997. The Call-by-need Lambda Calculus. J. Funct. Program. 7, 3 (May 1997), 265–301. https://doi.org/10.1017/S0956796897002724
[5] Zena M. Ariola, John Maraist, Martin Odersky, Matthias Felleisen, and Philip Wadler. 1995. A Call-by-need Lambda Calculus. In Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '95). ACM, New York, NY, USA, 233–246. https://doi.org/10.1145/199448.199507
[6] Arvind and David E. Culler. 1986. Annual Review of Computer Science Vol. 1, 1986. Annual Reviews Inc., Palo Alto, CA, USA, Chapter: Dataflow architectures. http://dl.acm.org/citation.cfm?id=17814.17824
[7] Luiz Barroso, Mike Marty, David Patterson, and Parthasarathy Ranganathan. 2017. Attack of the Killer Microseconds. Commun. ACM 60, 4 (March 2017), 48–54. https://doi.org/10.1145/3015146
[8] George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, and Armando Fox. 2004. Microreboot — A Technique for Cheap Recovery. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6 (OSDI '04). USENIX Association, Berkeley, CA, USA, 3–3. http://dl.acm.org/citation.cfm?id=1251254.1251257
[9] Craig Chambers, Ashish Raniwala, Frances Perry, Stephen Adams, Robert R. Henry, Robert Bradshaw, and Nathan Weizenbaum. 2010. FlumeJava: Easy, Efficient Data-parallel Pipelines. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '10). ACM, New York, NY, USA, 363–375. https://doi.org/10.1145/1806596.1806638
[10] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: Amazon's Highly Available Key-value Store. In Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles (SOSP '07). ACM, New York, NY, USA, 205–220. https://doi.org/10.1145/1294261.1294281
[11] Jake Donham. 2014. Introducing Stitch. Technical Report. https://www.youtube.com/watch?v=VVpmMfT8aYw [Online; accessed 4-May-2017].
[12] Marius Eriksen. 2013. Your Server As a Function. In Proceedings of the Seventh Workshop on Programming Languages and Operating Systems (PLOS '13). ACM, New York, NY, USA, Article 5, 7 pages. https://doi.org/10.1145/2525528.2525538
[13] Andrés Goens, Sebastian Ertel, Justus Adam, and Jeronimo Castrillon. 2018. Level Graphs: Generating Benchmarks for Concurrency Optimizations in Compilers. In Proceedings of the 11th International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG 2018), co-located with the 13th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC).
[14] Martin Hirzel, Robert Soulé, Scott Schneider, Buğra Gedik, and Robert Grimm. 2014. A Catalog of Stream Processing Optimizations. ACM Comput. Surv. 46, 4, Article 46 (March 2014), 34 pages. https://doi.org/10.1145/2528412
[15] Thomas Johnsson. 1985. Lambda Lifting: Transforming Programs to Recursive Equations. In Proc. of a Conference on Functional Programming Languages and Computer Architecture. Springer-Verlag New York, Inc., New York, NY, USA, 190–203. http://dl.acm.org/citation.cfm?id=5280.5292
[16] Alexey Kachayev. 2015. Muse. Technical Report. https://github.com/kachayev/muse [Online; accessed 4-May-2017].
[17] Alexey Kachayev. 2015. Reinventing Haxl: Efficient, Concurrent and Concise Data Access. Technical Report. https://www.youtube.com/watch?v=T-oekV8Pwv8 [Online; accessed 4-May-2017].
[18] Edward A. Lee. 2006. The Problem with Threads. Computer 39, 5 (May 2006), 33–42. https://doi.org/10.1109/MC.2006.180
[19] Feng Li, Antoniu Pop, and Albert Cohen. 2012. Automatic Extraction of Coarse-Grained Data-Flow Threads from Imperative Programs. IEEE Micro 32, 4 (July 2012), 19–31. https://doi.org/10.1109/MM.2012.49
[20] Simon Marlow, Louis Brandy, Jonathan Coens, and Jon Purdy. 2014. There is No Fork: An Abstraction for Efficient, Concurrent, and Concise Data Access. In Proceedings of the 19th ACM SIGPLAN International Conference on Functional Programming (ICFP '14). ACM, New York, NY, USA, 325–337. https://doi.org/10.1145/2628136.2628144
[21] Simon Marlow, Simon Peyton Jones, Edward Kmett, and Andrey Mokhov. 2016. Desugaring Haskell's Do-notation into Applicative Operations. In Proceedings of the 9th International Symposium on Haskell (Haskell 2016). ACM, New York, NY, USA, 92–104. https://doi.org/10.1145/2976002.2976007
[22] Henrik Frystyk Nielsen, James Gettys, Anselm Baird-Smith, Eric Prud'hommeaux, Håkon Wium Lie, and Chris Lilley. 1997. Network Performance Effects of HTTP/1.1, CSS1, and PNG. In Proceedings of the ACM SIGCOMM '97 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM '97). ACM, New York, NY, USA, 155–166. https://doi.org/10.1145/263105.263157
[23] J. K. Ousterhout. 1996. Why Threads Are A Bad Idea (for most purposes). Presentation given at the 1996 USENIX Annual Technical Conference. (Jan. 1996).
[24] Mike P. Papazoglou and Willem-Jan Heuvel. 2007. Service Oriented Architectures: Approaches, Technologies and Research Issues. The VLDB Journal 16, 3 (July 2007), 389–415. https://doi.org/10.1007/s00778-007-0044-3
[25] Simon Peyton Jones, Will Partain, and André Santos. 1996. Let-floating: Moving Bindings to Give Faster Programs. In Proceedings of the First ACM SIGPLAN International Conference on Functional Programming (ICFP '96). ACM, New York, NY, USA, 1–12. https://doi.org/10.1145/232627.232630
[26] Jesper H. Spring, Jean Privat, Rachid Guerraoui, and Jan Vitek. 2007. StreamFlex: High-throughput Stream Programming in Java. In Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-oriented Programming Systems and Applications (OOPSLA '07). ACM, New York, NY, USA, 211–228. https://doi.org/10.1145/1297027.1297043
[27] Wouter Swierstra. 2008. Data Types à La Carte. J. Funct. Program. 18, 4 (July 2008), 423–436. https://doi.org/10.1017/S0956796808006758
[28] William Thies, Michal Karczmarek, and Saman P. Amarasinghe. 2002. StreamIt: A Language for Streaming Applications. In Proceedings of the 11th International Conference on Compiler Construction (CC '02). Springer-Verlag, London, UK, 179–196. http://dl.acm.org/citation.cfm?id=647478.727935
[29] Richard Townsend, Martha A. Kim, and Stephen A. Edwards. 2017. From Functional Programs to Pipelined Dataflow Circuits. In Proceedings of the 26th International Conference on Compiler Construction (CC 2017). ACM, New York, NY, USA, 76–86. https://doi.org/10.1145/3033019.3033027
[30] Rob von Behren, Jeremy Condit, and Eric Brewer. 2003. Why Events Are a Bad Idea (for High-concurrency Servers). In Proceedings of the 9th Conference on Hot Topics in Operating Systems - Volume 9 (HOTOS '03). USENIX Association, Berkeley, CA, USA, 4–4. http://dl.acm.org/citation.cfm?id=1251054.1251058
