University of California Santa Cruz Extending Ceph

UNIVERSITY OF CALIFORNIA SANTA CRUZ

EXTENDING CEPH OBJECTS TO SUPPORT WEBASSEMBLY EXECUTABLES

A thesis submitted in partial satisfaction of the requirements for the degree of

MASTER OF SCIENCE

COMPUTER SCIENCE

Saloni Rane

June 2020

The Thesis of Saloni Rane is approved:

Professor Jeff LeFevre, Chair

Professor Carlos Maltzahn

Professor Peter Alvaro

Professor Scott Brandt

Saloni Rane

2020

Table of Contents

List of Figures

List of Tables

Abstract

1 Introduction

2 Background
2.1 Programmable Storage
2.2 Object Classes in Ceph
2.3 Ceph extension: The Skyhook Data Management project

3 WebAssembly
3.1 Support
3.2 Outside the Web
3.3 Specification of a Wasm environment
3.4 Emscripten
3.5 Compiling a C / C++ Module to WebAssembly

4 Methodology
4.1 Embedding WebAssembly Runtime as an Object Class in Ceph

5 Comparative Analysis of different approaches to Embed WebAssembly in a non-web Environment
5.1 Choosing a WebAssembly Implementation
5.1.1 WAVM
5.1.2 wasmer

6 Measuring the overheads
6.0.1 Systems
6.0.2 Experimental scenarios

7 Implementation

8 Discussion
8.1 Current limitations of the WebAssembly runtime
8.2 WebAssembly beyond cls_tabular

9 Related Work

10 Future Work

11 Conclusion

Bibliography

A Building Wasm for the Web using Emscripten

B Building Wasm using Clang

List of Figures

2.1 Process for invoking ioctx.exec function in Ceph
2.2 Increase in the Number of Object Storage interfaces in Ceph since 2010. Figure copied from [27] without permission.

3.1 Compiling and Deploying a C program to WebAssembly for the Web Browser. Figure copied from [19].

4.1 Compiling, Deploying and Executing a WebAssembly Module in a Non-Web Environment. Figure includes elements copied from [19].

6.1 Ceph Cluster with one OSD
6.2 Comparison of Overhead for a no-op function
6.3 Comparison of Overhead for a Function that performs computation

7.1 Performance Comparison of Queries in Tabular to that of WebAssembly

A.1 Running a wasm binary in the browser. Figure copied from [20] without permission.

List of Tables

6.1 Execution Time of functions in Native and with Wasm Runtime Environments
6.2 Size of binary compiled files for each function executed in the Experiments

Abstract

Extending Ceph objects to support WebAssembly executables

Saloni Rane

Programmable storage provides a means by which existing services in the storage system can be generalized, exposed, extended, combined and reused to support applications through the creation of domain-specific interfaces for use by external storage clients. Current work on programmable storage has shown how to embed user-defined functions that perform data management tasks into an object storage system. However, these functions are closely tied to the storage software code base: for instance, they require an SDK and must be compiled against specific storage software versions, making them less portable and less future-proof. We propose extending this capability by creating a dynamic object interface with WebAssembly, a portable binary format that facilitates generic code execution, and by leveraging WebAssembly's high-level goals to enable clients to add user-defined functionality to the object storage daemon (OSD) to support the needs of their applications. This thesis explores the design space of interfaces in a programmable storage system and introduces a method to embed a WebAssembly runtime environment, enabling dynamic injection of generic user-defined functions into a running storage system without requiring much knowledge of the internals of the storage layer.

Chapter 1

Introduction

With the intent to extend software-defined storage, [23] introduced Programmable Storage, which uses existing internal storage system abstractions to create specific interfaces that support applications. The aim of programmable storage is to make components reusable by exposing subsystems, which eliminates redundancy and allows a storage system to provide application-specific functionality.

Ceph [29] is an open source, distributed storage system that distributes data across a reliable object store, RADOS [30]. This object store allows Ceph to extend the object interface with remote execution targets that perform operations on object data. Applications can use this ability to reduce network round trips and data movement, exploit remote resources, and simplify otherwise complex interfaces. Ceph can be extended via class interfaces using its “cls” mechanism, which existing Ceph services such as CephFS and the RADOS Gateway already use. This mechanism makes Ceph flexible and permits users to add their own object classes and methods. Malacology [27] is a programmable storage system built on top of Ceph that enables the programmability of internal abstractions in Ceph.

SkyhookDM is another extension to Ceph's object classes. It leverages programmable storage to offload data management tasks directly onto objects in the storage layer. Tabular data is partitioned and stored in objects, and various SQL operations can be applied to this data for processing. The functions in SkyhookDM are implemented using the object class mechanism and applied to object data by the OSD itself. SkyhookDM also supports physical database design operations such as indexing and transforming data between row and columnar formats.

WebAssembly (Wasm) is an open standard that defines a portable binary code format for executables. Wasm has been used for computationally intensive tasks [18][8]. Briefly, code written in any supported language (e.g., C, C++, Rust) can be compiled to a WebAssembly executable. That executable can then run in any environment that supports Wasm execution, such as a browser or a standalone Wasm virtual machine (VM). Wasm is therefore portable, avoids lock-in to a single language, and can be easily bundled and distributed.

This thesis explores the benefits and challenges of incorporating and adapting to a new storage interface within Ceph using WebAssembly (wasm). It provides a mechanism to implement this interface and shows how the interface can be leveraged for existing data management tasks. Specifically, we provide a way to implement data management functionality similar to SkyhookDM's, but in a more generic way using WebAssembly, with the goal of dynamically injecting user code.

Increasingly complex and resource-demanding services and applications motivated making code flexible, sandboxed, and easily bundled for sharing across platforms. To let code written in any language execute on object data in Ceph, we embed an engine that processes WebAssembly code dynamically inside the OSD process. A wasm object class thus contains an injected function that a client may invoke remotely. A client can execute any registered function in this framework, provide input parameters for it, and receive the expected output. For security reasons, the registration process may need to be curated. This is outside the scope of this thesis, and we discuss it further in the Discussion chapter (Chapter 8).

The data management tasks described and implemented in this work are only examples that demonstrate the usefulness of embedding WebAssembly capability within Ceph's cls mechanism.

The main goal of this work is to enable system-independent, generic functions to be dynamically injected at runtime, with wasm providing the generic compile and execution framework. This allows the storage system to rapidly evolve its capabilities to support generic user-defined tasks directly within the storage layer.

• Contribution 1: Investigated how WebAssembly can work in a non-web environment and compared the performance of WebAssembly binary functions to natively compiled gcc functions (Chapter 5)

• Contribution 2: Implemented WebAssembly binary module execution in Ceph and evaluated this module in Ceph on tabular data to demonstrate data processing with WebAssembly in Ceph (Chapter 7)

Chapter 2

Background

2.1 Programmable Storage

Moving computation close to the data brings important benefits. Moving all the data from the storage layer to the compute layer consumes substantial network bandwidth, while resources in the storage layer can remain underutilized. Active Storage [26] runs computation tasks where the data resides. This leverages the storage nodes' underutilized resources and reduces data movement between the compute and storage layers. Programmable Storage and Active Storage differ in scope. The former supports greater configuration possibilities and the injection and execution of code within any component of the storage system. The latter supports code injection only at the data access level. Malacology [27] introduces “programmable storage” as an approach to expose, augment, and/or compose existing services within the storage system into new services, and presents a prototype within Ceph. This thesis focuses specifically on data processing in the context of a programmable storage approach that injects processing functionality into the storage system at the Ceph OSD (server) level.

One example of an active storage framework is OpenStack Storlets, which allows running computations (storlets) in the object store. Storlets provide an extension mechanism for OpenStack Swift that runs computations close to the data: developers write and deploy code as an object, and then invoke this code on data objects. Because of a middleware integration, requests are intercepted both at the proxy and on the object servers [25].

Ceph implements user extensions through the object class mechanism. Object classes extend Ceph by loading custom code directly into the OSDs, where a librados application can then execute it. However, RADOS objects differ from the objects provided by Amazon S3 or OpenStack Swift. The OpenStack framework has a middleware layer that runs a compiled jar in a container above the data access layer. In Ceph, by contrast, the object class mechanism applies processing code directly in the OSD through augmented read and write functions.

2.2 Object Classes in Ceph

Ceph is an open source software storage platform that provides highly scalable object, block, and file-based storage. It is designed to run on commodity, readily available hardware and to avoid any single point of failure [5]. Ceph's foundation is the Reliable Autonomic Distributed Object Store (RADOS), which provides object storage services for the file system, S3, and block storage interfaces.

One of Ceph's important features is that it can be extended by creating object classes that define functions in addition to read and write. Clients call these additional functions through librados, the user-level library they use to communicate with RADOS.

The object class plugin framework provides the mechanism to add functions to objects. Clients access objects managed by the OSDs directly instead of going through intermediate services. Each added function is compiled into a shared library that can be loaded into the OSD process at runtime. These added functions, written in C++, are bundled as “object class” definitions, usually one class per storage interface.

Each directory in src/cls corresponds to a separate object class:

cephfs, journal, lock, log, lua, numops, rbd, refcount, replica_log, rgw, sdk, statelog, tabular, timeindex, user, version, wasm

Object Classes work in the following way:

Figure 2.1: Process for invoking ioctx.exec function in Ceph

After the client connects to the cluster and opens an I/O context for a pool, the object data is read into the heap, the specified operation is performed, and the result is stored in the out bufferlist.

The CPU resources on the OSD perform all of the user-defined work before the result is returned to the calling client or written to the storage media. Ceph handles failover and networking to ensure that the output arrives back at the client.

Fig. 2.1 shows the read process, which stores its output in a bufferlist. The write path applies processing before writing data to the storage media.
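The following C sketch makes the read path above concrete by simulating the dispatch that an exec call performs. A class name and a method name select a function. That function receives the object's data as input and fills an output buffer. Everything here is hypothetical, including sim_exec(), the registry and the "demo" class, and it runs without Ceph. Real object classes are written in C++ and register their methods through Ceph's objclass API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical method signature: read the object's data (in) and write a
 * result into out, mirroring the in/out bufferlists of a cls method. */
typedef int (*cls_method_fn)(const char *in, size_t in_len,
                             char *out, size_t out_cap, size_t *out_len);

struct cls_method {
    const char *cls;     /* object class name */
    const char *method;  /* method name within the class */
    cls_method_fn fn;
};

/* Example method: return the object's data converted to upper case. */
static int demo_upper(const char *in, size_t in_len,
                      char *out, size_t out_cap, size_t *out_len)
{
    if (in_len > out_cap)
        return -1;                       /* output buffer too small */
    for (size_t i = 0; i < in_len; i++) {
        char c = in[i];
        out[i] = (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : c;
    }
    *out_len = in_len;
    return 0;
}

/* Methods "loaded" into this simulated OSD. */
static const struct cls_method registry[] = {
    { "demo", "upper", demo_upper },
};

/* Route a call to class.method and run it on the object's data.
 * Returns the method's result, or -2 if no such method is registered. */
int sim_exec(const char *cls, const char *method,
             const char *obj_data, size_t obj_len,
             char *out, size_t out_cap, size_t *out_len)
{
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        if (strcmp(registry[i].cls, cls) == 0 &&
            strcmp(registry[i].method, method) == 0)
            return registry[i].fn(obj_data, obj_len, out, out_cap, out_len);
    }
    return -2;
}
```

In real Ceph, the client side of this exchange is the ioctx.exec call named in Figure 2.1. The OSD performs the lookup and runs the method next to the data, so only the output bufferlist travels back.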

Examples of object classes include those for the RADOS Gateway (rgw), CephFS, RBD (rbd), and Lua (lua).

Refcount is an existing object class that implements reference counting and allows users to set attributes from within cls. Internally, objects manage data on top of the “Object Store”, which includes extended key-value data and byte extents. RocksDB is the OSD-local key-value store. Users can define object class methods that store their data in RocksDB, in the local blob store, or in both. Lua is another object class. It lets a client remotely execute any registered function from a Lua script, provide an arbitrary input, and receive an arbitrary output. The Lua script can be sent to the OSD along with the client request. Chapter 8 discusses Lua further.

Figure 2.2: Increase in the Number of Object Storage interfaces in Ceph since 2010. Figure copied from [27] without permission.

Fig. 2.2 shows the increase in the number of object classes over time, which indicates that users do create their own object classes. This motivated us to extend the existing programmability feature so that users can create and deploy application-specific functions in Ceph by writing code in a language of their choice.

2.3 Ceph extension: The Skyhook Data Management project

The SkyhookDM project provides data management functionality through custom cls extensions. These extensions perform data processing and data conversion, and they use the local key-value store as a data indexing mechanism. SkyhookDM is built on top of Ceph and provides storage-layer extensions to Ceph object classes, i.e., it enables in-storage execution via data access libraries and their APIs.

The tabular object class allows SkyhookDM to offload some data management tasks to the scalable, reliable storage layer, potentially benefiting a higher-layer client such as a database. These tasks include physical design tasks such as storage-local indexing and view materialization. They also include storage-local data processing tasks such as select, project, and some forms of aggregation, sorting, and joins.

SkyhookDM partitions relational table data by row or column, and each partition is mapped to a corresponding object in Ceph. Since the SkyhookDM interfaces are generic, higher-level applications such as a database can invoke them through an external table interface (e.g., a foreign data wrapper) or a Python client library. Client applications see a collection of objects (data partitions) representing a table. They can invoke the SkyhookDM cls interfaces on each object to perform table processing such as select and project. Most operations, such as predicate evaluation (e.g., the WHERE clause of an SQL statement), can be pushed down into the storage system to leverage Ceph's scale-out properties, which can significantly improve query execution performance.
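The following C sketch illustrates the two ideas in this paragraph under simplifying assumptions. First, rows are mapped to objects using a fixed partition size. Second, a WHERE-style predicate is evaluated next to the data, so only matching rows are returned. The row layout, the object naming scheme ("table.N") and the function names are hypothetical. They are not SkyhookDM's API.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical row layout for a small table partition. */
struct row { int id; double price; };

/* Map a row index to the partition (object) that stores it, assuming a
 * fixed number of rows per object. */
size_t partition_of(size_t row_index, size_t rows_per_object)
{
    return row_index / rows_per_object;
}

/* Build a hypothetical object name such as "lineitem.3".
 * Returns the snprintf result (length of the full name). */
int object_name(char *buf, size_t cap, const char *table, size_t part)
{
    return snprintf(buf, cap, "%s.%zu", table, part);
}

/* Predicate evaluated inside the storage layer ("WHERE price > min"):
 * copy only matching rows to out and return how many matched, so only
 * those rows would be sent back to the client. */
size_t select_where_price_gt(const struct row *rows, size_t n,
                             double min, struct row *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (rows[i].price > min)
            out[k++] = rows[i];
    return k;
}
```

A client would invoke the filter on every object of the table and concatenate the results, so only rows that pass the predicate cross the network.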

Chapter 3

WebAssembly

WebAssembly (Wasm) is a binary instruction format for a stack-based virtual machine. WebAssembly is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications [23].

At a high level, WebAssembly is:

• A portable, size- and load-time-efficient binary instruction format

• A language (a human-readable text format that can be compiled to the binary format [4])

• A compilation target

• A complement to JavaScript

Features of WebAssembly:

• Fast: WebAssembly aims to execute at native speed

• Secure: Describes a memory-safe, sandboxed execution environment

• Language-independent

• Platform-independent

WebAssembly was primarily designed to run on the web. It is the fourth language to run in the browser, alongside HTML, CSS, and JavaScript. Its purpose was to provide a language that runs at near-native speed in the web browser. Since WebAssembly is inherently a compilation target, multiple languages can be used to generate a wasm executable. Although WebAssembly is a language in its own right, its goal is to let languages such as C, C++, and Rust compile directly to the WebAssembly binary format. This gives these low-level languages a way to execute on the web.

3.1 Support

The WebAssembly Community Group was started in April 2015, and by March 2017 [21] WebAssembly had shipped in all four major browser engines. Around 38 programming languages [1] are known to support compilation to WebAssembly, including C, C++, and Rust. Emscripten [7] is a popular toolchain for compiling C and C++ to WebAssembly using the LLVM backend; Section 3.4 provides additional details. Clang [2] also supports a wasm target. Clang 8 or later is required to compile C and C++ to WebAssembly. Instructions for building a WebAssembly target with Clang are provided in Appendix B.

3.2 Outside the Web

WebAssembly was initially designed to run on the web, enabling near-native execution speed for programs in the browser. There, wasm adds performance to the flexibility of JavaScript. The general standards provide specifications for JavaScript and web embedding. However, it has been considered valuable [10] for WebAssembly to execute in other environments as well.

WebAssembly provides isolation for running untrusted code, and each WebAssembly module is sandboxed by default. Recently, there has been considerable work [12] to extend this security to make untrusted code safe to use on any platform. This includes servers in the cloud or data centers, mobile and desktop computers, IoT devices, and larger host programs. Non-web environments include runtime environments, libraries, and VMs.

3.3 Specification of a Wasm environment

In order to embed WebAssembly into different environments, there are general standards that provide core specifications for these embeddings:

1. Wasm program: the wasm module, containing definitions of wasm-defined values and program types, presented in either binary or textual format.

2. Virtual machine: the wasm program is intended to run on a portable virtual stack machine (VM). The VM uses virtual memory to execute wasm code within memory bounds.

3. Runtime environment: a stand-alone WebAssembly runtime that can be used as a CLI tool or embedded into other systems. Examples include wasmtime [15] and wasmer [14].

4. Instruction set: WebAssembly is a virtual ISA at its core. Its instructions include standard memory load/store instructions, numeric, parametric, and control-flow instruction types, and wasm-specific variable instructions [17].

5. Code representation: the WebAssembly Community Group reached consensus on the initial (MVP) binary format. It defines a WebAssembly binary format (.wasm), which is not designed to be read by humans. It also defines a human-readable WebAssembly text format (.wat) that resembles a cross between S-expressions and traditional assembly languages [23].
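The sketch below interprets a tiny subset of WebAssembly in C to show what a binary instruction format for a stack-based virtual machine means. The opcode values are the real MVP encodings: i32.const = 0x41 followed by a signed LEB128 immediate, i32.add = 0x6A, i32.sub = 0x6B, i32.mul = 0x6C and end = 0x0B. Everything else is a deliberate simplification: there are no modules, no validation and no stack bounds checks. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Real WebAssembly MVP opcode values for the few instructions handled. */
enum {
    OP_END       = 0x0B,
    OP_I32_CONST = 0x41,  /* followed by a signed LEB128 immediate */
    OP_I32_ADD   = 0x6A,
    OP_I32_SUB   = 0x6B,
    OP_I32_MUL   = 0x6C
};

/* Decode a signed LEB128 immediate starting at code[*pc]. */
static int32_t read_sleb128(const uint8_t *code, size_t *pc)
{
    uint32_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = code[(*pc)++];
        if (shift < 32)
            result |= (uint32_t)(byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 32 && (byte & 0x40))
        result |= UINT32_MAX << shift;   /* sign-extend negative values */
    return (int32_t)result;
}

/* i32 arithmetic wraps modulo 2^32, so it is done on uint32_t. */
static int32_t to_i32(uint32_t v) { return (int32_t)v; }

/* Execute until `end`; return the value left on top of the stack.
 * INT32_MIN is used as an error sentinel in this sketch. */
int32_t run_wasm_subset(const uint8_t *code)
{
    int32_t stack[16];                  /* no overflow check: sketch only */
    size_t sp = 0, pc = 0;
    for (;;) {
        uint8_t op = code[pc++];
        if (op == OP_END)
            return sp > 0 ? stack[sp - 1] : 0;
        if (op == OP_I32_CONST) {
            stack[sp++] = read_sleb128(code, &pc);
            continue;
        }
        if (sp < 2)
            return INT32_MIN;           /* underflow or unknown opcode */
        /* binary ops pop c2 (top) and c1, then push c1 op c2 */
        uint32_t c2 = (uint32_t)stack[--sp];
        uint32_t c1 = (uint32_t)stack[sp - 1];
        switch (op) {
        case OP_I32_ADD: stack[sp - 1] = to_i32(c1 + c2); break;
        case OP_I32_SUB: stack[sp - 1] = to_i32(c1 - c2); break;
        case OP_I32_MUL: stack[sp - 1] = to_i32(c1 * c2); break;
        default:         return INT32_MIN;  /* unsupported opcode */
        }
    }
}
```

For example, the bytes 41 02 41 03 6A 41 04 6C 0B correspond to the text-format expression (i32.mul (i32.add (i32.const 2) (i32.const 3)) (i32.const 4)), and run_wasm_subset() returns 20 for them.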

3.4 Emscripten

Emscripten is an LLVM-to-JavaScript compiler. It has always focused first and foremost on compiling to the web and other JavaScript environments such as Node.js. However, more use cases now require WebAssembly without JavaScript. Ongoing work therefore also supports emitting standalone Wasm files from Emscripten that do not depend on the Emscripten JavaScript runtime. Appendix A shows an example of how Emscripten generates WebAssembly code from a simple C program.

3.5 Compiling a C / C++ Module to WebAssembly

Figure 3.1: Compiling and Deploying a C program to WebAssembly for the Web Browser. Figure copied from [19].

The primary use case of WebAssembly was to take an existing C program and run it in the browser. Emscripten provides most of the features needed to compile a C program to WebAssembly. When targeting the browser, Emscripten outputs an HTML page along with the wasm module and the JavaScript glue code. The glue code compiles and instantiates the wasm module so that it can execute in the browser.

We use the Emscripten compiler throughout the course of the experiments to generate the necessary wasm binaries from the C/C++ source code.

Chapter 4

Methodology

4.1 Embedding WebAssembly Runtime as an Object Class in Ceph

To embed WebAssembly in Ceph, we leverage the capabilities of programmable storage and WebAssembly's non-web runtime environments. This entails implementing Wasm as an object class in Ceph, which allows us to build application-specific functionality. These functions are compiled to WebAssembly and deployed in binary format by injecting them into the OSD at runtime.

A more straightforward way to incorporate user-defined processing code might be to load code dynamically with dlopen(). Ceph uses this approach to load cls classes such as libcls_tabular. However, it has the same drawbacks as libcls_tabular that we try to avoid with wasm. The dlopen() approach is less flexible and raises several real-world deployment concerns. Even though it can deliver results quickly, it entails hard-coding routines, which can complicate reloading code. Other concerns involve compatibility with the host architecture and libraries. WebAssembly, in contrast, uses language- and platform-independent binary code and so avoids the problems that can arise with dlopen().

Updating object classes means adding new libraries and restarting OSDs. This can be disruptive to users, since a restart takes a currently active OSD in a running cluster offline. The current version of Ceph allows us to copy new libraries (e.g., libcls_tabular.so) into a running Ceph cluster without restarting the OSD. However, this is not a reliable, or likely intentional, feature of Ceph for us to depend upon. It also requires the code to be compiled against a specific Ceph version, as previously mentioned, which limits the ability to inject generic user code at runtime. We therefore provide a framework that executes a compiled wasm binary on the OSD. The binary is injected into the system dynamically by copying it onto each OSD so that it is locally available. Users can replace these wasm files at any time without disrupting the cluster, since the files exist outside Ceph in a local directory. The performance of WebAssembly is competitive with native code, with many benchmarks performing within 10% of native code [23].

To invoke a method, the binary files are deployed to the local file system of the OSD in advance and are looked up at run time.
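Here is a minimal sketch of the lookup step. It assumes each module is stored as <name>.wasm in a local directory on the OSD. The directory path, the WASM_MODULE_DIR macro and wasm_module_path() are hypothetical, since the thesis only states that the binaries live in a local directory outside Ceph. Because the client supplies the module name, the sketch accepts only letters, digits, '_' and '-', so a name cannot point outside that directory.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-OSD directory holding deployed .wasm files. */
#define WASM_MODULE_DIR "/var/lib/ceph/wasm"

static const char allowed[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";

/* Accept only non-empty names made of [A-Za-z0-9_-], which rules out
 * '/', "..", and hidden files. */
static int module_name_ok(const char *name)
{
    size_t len = strlen(name);
    return len > 0 && strspn(name, allowed) == len;
}

/* Build "<dir>/<name>.wasm" into buf.
 * Returns 0 on success, -1 for an invalid name, -2 if buf is too small. */
int wasm_module_path(char *buf, size_t cap, const char *name)
{
    if (!module_name_ok(name))
        return -1;
    int n = snprintf(buf, cap, "%s/%s.wasm", WASM_MODULE_DIR, name);
    return (n < 0 || (size_t)n >= cap) ? -2 : 0;
}
```

In this sketch the path is resolved and the file read on every call. Replacing a .wasm file on disk therefore changes what the next invocation executes, without restarting the OSD.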

Figure 4.1: Compiling, Deploying and Executing a WebAssembly Module in a Non-Web Environment. Figure includes elements copied from [19].

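At its core, the wasm object class initializes the runtime, loads a module, and instantiates it, as shown in the following code snippet:

char error_buf[128];
uint32_t size, stack_size = 8192, heap_size = 8192; /* example sizes */
/* initialize the runtime with the default configuration and allocator */
wasm_runtime_init();
/* read the .wasm file into a memory buffer (user-supplied helper) */
uint8_t *buffer = read_wasm_binary_to_buffer(filename, &size);
/* parse the buffer and create a wasm module */
wasm_module_t module = wasm_runtime_load(buffer, size, error_buf, sizeof(error_buf));
/* instantiate the module, which allocates its linear memory */
wasm_module_inst_t module_inst = wasm_runtime_instantiate(module, stack_size, heap_size, error_buf, sizeof(error_buf));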

wasm_runtime_init() initializes the wasm runtime with the default configuration and uses the default memory allocator for runtime memory management.

read_wasm_binary_to_buffer() reads the WASM file into a memory buffer, and wasm_runtime_load() then parses that buffer to create a WASM module. The module is instantiated with wasm_runtime_instantiate(), which allocates the instance's linear block of memory.
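The snippet above relies on a read_wasm_binary_to_buffer() helper whose body is not shown. One possible implementation in standard C follows. It returns the bytes as uint8_t rather than char, together with a cheap sanity check. Every binary wasm module begins with the 4-byte magic "\0asm" followed by the version number 1 encoded as 4 little-endian bytes. This is an illustrative sketch. is_wasm_binary() is a hypothetical name, and the runtime's own loader performs full validation.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a malloc'd buffer and store its length in *size.
 * Returns NULL on any error; the caller frees the buffer. */
uint8_t *read_wasm_binary_to_buffer(const char *filename, uint32_t *size)
{
    FILE *f = fopen(filename, "rb");
    if (!f)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long len = ftell(f);
    if (len <= 0 || (unsigned long)len > UINT32_MAX ||
        fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    uint8_t *buf = malloc((size_t)len);
    if (buf && fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf)
        *size = (uint32_t)len;
    return buf;
}

/* Check the 8-byte module header: magic "\0asm" plus version 1. */
int is_wasm_binary(const uint8_t *buf, uint32_t size)
{
    static const uint8_t header[8] = {0x00, 0x61, 0x73, 0x6D,
                                      0x01, 0x00, 0x00, 0x00};
    return size >= sizeof header && memcmp(buf, header, sizeof header) == 0;
}
```

Rejecting a file whose header does not match gives the OSD an early, cheap error before the buffer reaches wasm_runtime_load().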

After a module is instantiated, the native runtime looks up WASM functions by name and creates an execution environment in which they can be called. The runtime ensures that the behavior of wasm applications does not affect other processes running on the same system.

The parameters to a wasm binary method are passed as an array of 32-bit elements, i.e., each parameter of 4 bytes or fewer occupies a single array element. A 64-bit type such as double occupies two array elements. The return value of the wasm binary function is sent back to the calling method in the first few elements of the same array. Currently, only primitive data types can be returned.
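The following sketch shows the argument-array convention described above. Each 32-bit value occupies one uint32_t cell, and a double occupies two adjacent cells. The helper names are hypothetical. In the runtime, the filled array and its cell count are handed to the call function, for example WAMR's wasm_runtime_call_wasm(), which also writes the return value back into the first cells.

```c
#include <stdint.h>
#include <string.h>

/* Append a 32-bit integer; returns the index of the next free cell. */
uint32_t push_i32(uint32_t *argv, uint32_t idx, int32_t v)
{
    memcpy(&argv[idx], &v, sizeof v);
    return idx + 1;
}

/* Append a double, which occupies two consecutive 32-bit cells. */
uint32_t push_f64(uint32_t *argv, uint32_t idx, double v)
{
    memcpy(&argv[idx], &v, sizeof v);
    return idx + 2;
}

/* Read a double back from two cells, e.g. a return value in argv[0..1]. */
double read_f64(const uint32_t *argv, uint32_t idx)
{
    double v;
    memcpy(&v, &argv[idx], sizeof v);
    return v;
}
```

For a function taking (int, double, int), the cell count is 4 rather than 3. Miscounting it is an easy way to pass garbage to the wasm side.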

Transferring buffers is tricky because sandboxing prevents the wasm code from accessing native memory. We allocate a buffer from the Wasm instance's memory and pass the buffer's address to the wasm module. Currently, structured data or class objects cannot be passed through pointers from a caller function to a wasm binary, because the sandboxed wasm memory cannot reference them. Since our use case entails passing pointers, we declare a memory buffer in wasm and then copy the data the pointer refers to into the wasm memory.
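The sketch below simulates why a host pointer cannot be passed directly. Wasm code addresses only offsets into its own linear memory. The host must therefore allocate space inside that memory, copy the data in, and pass the offset as an i32 argument. The linear_memory struct and its bump allocator are hypothetical stand-ins for the runtime's allocator; WAMR, for instance, provides wasm_runtime_module_malloc() for this purpose.

```c
#include <stdint.h>
#include <string.h>

/* Simulated linear memory: one 64 KiB wasm page plus an allocator cursor. */
struct linear_memory {
    uint8_t bytes[65536];
    uint32_t next;
};

/* Reserve len bytes inside linear memory and return the wasm-side offset,
 * or 0 on failure (offset 0 is kept unused to act as "null" here). */
uint32_t lm_alloc(struct linear_memory *m, uint32_t len)
{
    if (m->next == 0)
        m->next = 8;
    if (len > sizeof m->bytes - m->next)
        return 0;
    uint32_t off = m->next;
    m->next += (len + 7u) & ~7u;          /* keep 8-byte alignment */
    return off;
}

/* Copy host data into linear memory; the returned offset is what would be
 * passed to the wasm function in place of the host pointer. */
uint32_t copy_into_wasm(struct linear_memory *m, const void *src, uint32_t len)
{
    uint32_t off = lm_alloc(m, len);
    if (off != 0)
        memcpy(&m->bytes[off], src, len);
    return off;
}
```

Structured data such as a row must first be serialized into a flat byte layout that both sides agree on, because the wasm side cannot follow host pointers embedded inside it.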

Chapter 5

Comparative Analysis of different approaches to Embed WebAssembly in a non-web Environment

5.1 Choosing a WebAssembly Implementation

To decide which runtime environment was best suited to our needs and performed well, we investigated several runtime environments. Some were developed more rapidly than others. Combined with limited documentation, this made investigating and choosing the right environment challenging:

• Because the codebases evolved rapidly, it was difficult to keep our code in sync with each runtime's current implementation. Core functions were sometimes modified, which changed their APIs.

• Even though most environments stated that they supported C and C++, the level of support for each language varied widely. Most runtime environments supported C, but there was very little support for C++.

• Security is one of the most important goals of WebAssembly. Each runtime environment provided sandboxing. However, our experiments showed that some runtimes allowed only primitive data types as function parameters, while others also permitted passing function addresses.

• Performance was another important consideration, since we wanted the runtime embedded in Ceph to add minimal overhead.

Out of the several implementations available, we selected two alternatives, both under active development: the WebAssembly Virtual Machine (WAVM) [16] and the wasmer [14] library.

5.1.1 WAVM

WAVM is a WebAssembly Virtual Machine designed to be used in non-web applications.

Prerequisites: WAVM uses LLVM to compile WebAssembly code.

Features of WAVM:

• Fast: WAVM can beat native code performance in some cases, since it generates machine code tuned for the exact CPU that is running the code.

• Safe: WAVM prevents WebAssembly code from accessing state outside of the WebAssembly virtual machine or calling native code that is not explicitly linked with the WebAssembly module.

• WAVM supports WebAssembly 1.0.

• WAVM is being actively developed to support the latest version of Emscripten (a compiler that converts C/C++ code into WebAssembly).

The WebAssembly Virtual Machine can be built from source on GitHub using LLVM and CMake. After building, the WAVM binary is approximately 2 MB.

5.1.2 wasmer

Wasmer is an open source runtime for executing WebAssembly on the server. The new Wasmer C bindings make it possible to run WebAssembly code from a variety of programming languages that support calling C APIs, including C, C++, Python, and PHP. Wasmer is designed to run both standalone and embedded.

Features of wasmer:

• Supports an older version of Emscripten (sdk-1.38.21-64bit; the current version at the time of writing was 1.39.4)

• No external build tools required

• Can be integrated with the source code of Ceph (using wasmer.hh)

wasmer.io provides an API to embed wasmer into native C/C++ code to run WebAssembly anywhere. After building, the wasmer library is approximately 18 kB.

Chapter 6

Measuring the overheads

Experimental Design and Methodology:

The objective of the experiments was to embed the WebAssembly runtime environment in Ceph and measure its overhead. To do this, the runtime environment (e.g., WAVM, wasmer) had to be linked with the Ceph source code. We compiled the required methods to wasm and deployed the binaries to the Ceph OSDs. We compiled two C functions to wasm. The first is a no-op function that returns 0, which lets us compute the base overhead of the runtime environment itself. The second performs some computation (counting to 1 billion) before returning a value. We measure the time each function takes to execute in Ceph and compare it with two baselines:

1. The time each wasm binary function takes to execute in the runtime environment, but outside Ceph

2. The time each natively compiled binary (.o file obtained by compiling with gcc) takes to run

Figure 6.1: Ceph Cluster with one OSD
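The thesis does not list the source of the two benchmark functions. The C sketch below is a hypothetical reconstruction of what they could look like, plus a simple timer for the native (gcc) baseline. The volatile counter stops an optimizing compiler from replacing the loop with its final value. clock() measures processor time, which suits a CPU-bound loop.

```c
#include <stdint.h>
#include <time.h>

/* No-op benchmark: isolates the fixed cost of invoking a function. */
int noop(void)
{
    return 0;
}

/* Computation benchmark: count to n (the experiments used 1 billion).
 * volatile forces every increment to actually happen. */
uint64_t count_to(uint64_t n)
{
    volatile uint64_t counter = 0;
    while (counter < n)
        counter++;
    return counter;
}

/* Processor time, in seconds, spent in fn(arg). */
double time_call(uint64_t (*fn)(uint64_t), uint64_t arg)
{
    clock_t start = clock();
    fn(arg);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For the native baseline, the measured call would be time_call(count_to, 1000000000).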

Ceph's object class interface can be used to create custom methods. After the wasm binary is placed on the OSD and Ceph is deployed, the client node issues the data processing (cls read) request. The OSDs then execute the wasm binary using each of the two alternatives and return the result to the client. The time measured on Ceph spans from when the client issues the driver program's CLI command until the client receives the result from the OSD. The CLI arguments are first unpacked in the CLS framework (currently, this code is in tabular). These arguments tell the OSD which wasm program to execute.

6.0.1 Systems

All experiments were performed on 2X 10-core Intel Xeon Silver 4114 CPUs with 192GB ECC DDR4-2666 Memory running Ubuntu 18.04.1 with Linux kernel v4.15.0-70-generic.

All times in seconds. "Native" columns run outside Ceph; "Ceph" columns run inside Ceph (external syscalls).

No  Function               Native gcc  Native WAVM  Native wasmer  Ceph WAVM  Ceph wasmer
1   Returns 0              0.0042      2.9764       0.0323         2.9997     0.0387
2   Counter to 1 billion   2.3022      4.8459       3.3400         4.8612     3.3534

Table 6.1: Execution Time of functions in Native and with Wasm Runtime Environments

No  Function               Native gcc  WAVM     wasmer
1   Returns 0              8168 B      8863 B   133 B
2   Counter to 1 billion   8176 B      9006 B   220 B

Table 6.2: Size of binary compiled files for each function executed in the Experiments

6.0.2 Experimental scenarios

• Locally without Ceph and without a wasm runtime (functions compiled using gcc to create a binary .o file, then executed natively on the machine)

• Locally without Ceph in WAVM (functions compiled using Emscripten to create a .wasm binary, then executed via WAVM)

• Locally without Ceph in the wasmer runtime (functions compiled using Emscripten to create a .wasm binary, then executed via wasmer)

• In Ceph with WAVM (functions compiled using Emscripten to create a .wasm binary, then executed via WAVM on the Ceph OSD)

• In Ceph with wasmer (functions compiled using Emscripten to create a .wasm binary, then executed via wasmer on the Ceph OSD)

Figure 6.2: Comparison of Overhead for a no-op function

Figure 6.3: Comparison of Overhead for a Function that performs computation

The height of each bar in the graph is the average of 5 runs. Both the graphs show

28 that running the functions natively is the fastest. Fig. 6.2 shows that WAVM has a high overhead but wasmer has a much lower overhead compared to native execution.

Fig. 6.3 shows a similar trend, but the overhead is higher. In the Ceph experiments, both engines assume that the function code already exists locally on the OSDs. The execution time in Ceph also includes the network time, i.e., the round-trip time from the client.

Fig. 6.2 and Fig. 6.3 both show that embedding a runtime library (wasmer) gave a lower execution time than the WebAssembly Virtual Machine (WAVM). WAVM takes longer because it runs in its own sandboxed environment, which prevents it from accessing any state outside the VM. It did not appear possible to reduce this VM overhead. The additional overhead of running inside Ceph was negligible. After this analysis, we therefore decided to use a wasm runtime library to execute WebAssembly code in Ceph.
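The relative overheads behind this decision can be read directly off Table 6.1. The sketch below computes two quantities for each function. The first is the slowdown of each local runtime relative to native gcc. The second is the extra time added by running the same runtime inside Ceph. The timings are copied from Table 6.1. The struct and helper names are ours.

```c
#include <stdio.h>

/* Execution times in seconds, copied from Table 6.1. */
struct timings {
    const char *function;
    double native_gcc;
    double local_wavm, local_wasmer;
    double ceph_wavm, ceph_wasmer;
};

static const struct timings table_6_1[] = {
    {"Returns 0",            0.0042, 2.9764, 0.0323, 2.9997, 0.0387},
    {"Counter to 1 billion", 2.3022, 4.8459, 3.3400, 4.8612, 3.3534},
};

/* Slowdown of a runtime relative to native execution (1.0 = no overhead). */
double slowdown(double runtime_s, double native_s) {
    return runtime_s / native_s;
}

/* Extra seconds added by invoking the same runtime through Ceph. */
double ceph_added(double ceph_s, double local_s) {
    return ceph_s - local_s;
}

void print_overheads(void) {
    for (size_t i = 0; i < sizeof table_6_1 / sizeof table_6_1[0]; i++) {
        const struct timings *t = &table_6_1[i];
        printf("%-22s WAVM %7.1fx  wasmer %5.2fx  Ceph adds %.4fs / %.4fs\n",
               t->function,
               slowdown(t->local_wavm, t->native_gcc),
               slowdown(t->local_wasmer, t->native_gcc),
               ceph_added(t->ceph_wavm, t->local_wavm),
               ceph_added(t->ceph_wasmer, t->local_wasmer));
    }
}
```

For the no-op function, WAVM is roughly 709x slower than native and wasmer about 7.7x slower. For the counter, the ratios are about 2.1x and 1.45x, so the relative slowdown is smaller for the compute-bound function. In every case, running inside Ceph adds only between about 0.006 s and 0.023 s.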

Chapter 7

Implementation

The functions used in the wasm runtime experiments were basic functions that did not do any meaningful work. To estimate the performance of WebAssembly in a real-world scenario, we translated the SkyhookDM data processing function for the Flatbuffers data format (row-oriented) to WebAssembly. We refer to this function as f (i.e., processSkyFb in SkyhookDM).

The query engine calls the processing function f after the required pre-processing, which depends on the input parameters. This function processes tabular data in Flatbuffer format, and a similar function processes tabular data in Arrow format. The need to process different data formats motivated a common interface that allows processing functions for new data formats to be injected dynamically.

This function was chosen in particular because it involves processing different data types within WebAssembly. It iterates over the rows and calls the filter function to apply predicates to the Flatbuffer data. The rows that pass are added to the result Flatbuffer builder, which is then sent to the client.
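The structure of f described above can be sketched in plain C: iterate over the rows, apply the predicate through a filter function, and append the passing rows to the result. This sketch makes several hypothetical simplifications:

- The row layout, the single-integer predicate, and the names row, predicate, and process_rows are ours.
- The output array stands in for the result builder.

The real processSkyFb works on Flatbuffer rows and a Flatbuffer builder.

```c
#include <stddef.h>

/* Hypothetical, simplified row: the real f reads Flatbuffer rows. */
struct row {
    int id;
    int value;
};

/* Hypothetical predicate of the form "value <op> constant". */
enum cmp_op { OP_LT, OP_EQ, OP_GT };

struct predicate {
    enum cmp_op op;
    int constant;
};

/* The filter step: returns 1 if the row satisfies the predicate. */
static int passes(const struct row *r, const struct predicate *p) {
    switch (p->op) {
    case OP_LT: return r->value < p->constant;
    case OP_EQ: return r->value == p->constant;
    case OP_GT: return r->value > p->constant;
    }
    return 0;
}

/* Iterate over the rows and copy those that pass into out, which plays
 * the role of the result builder. Returns the number of rows kept. */
size_t process_rows(const struct row *rows, size_t n_rows,
                    const struct predicate *pred,
                    struct row *out, size_t out_cap) {
    size_t kept = 0;
    for (size_t i = 0; i < n_rows && kept < out_cap; i++) {
        if (passes(&rows[i], pred)) {
            out[kept++] = rows[i];
        }
    }
    return kept;
}
```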

Currently, the wasm runtime library is linked with the wasm and tabular cls object classes into a single .so file, which can then be deployed into Ceph like any other object class file.

Once the cls_wasm library is deployed, we have a framework to execute any wasm binary method that conforms to the signature of f. The binary for f is then deployed onto the OSDs. As mentioned previously, having f as a wasm binary means that f can be updated at any time by deploying a new wasm binary to the OSD. We therefore do not need to update our cls_wasm class or disrupt or restart OSDs.
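Conforming to the signature of f is what lets the OSD treat every deployed binary uniformly. The cls method looks up an exported function by name and calls it with the same kind of arguments. The plain-C sketch below mimics that lookup with a function-pointer table. The signature (input pointer and length, output buffer and capacity) and the names wasm_entry, echo, and lookup_export are hypothetical. They stand in for what a wasm runtime's export lookup would provide.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical uniform signature every processing function must follow:
 * read-only input bytes in, result bytes written to out, returns the
 * number of bytes written or -1 on error. */
typedef int (*wasm_entry)(const char *in, int in_len, char *out, int out_cap);

/* Example conforming function: copies its input to the output. */
static int echo(const char *in, int in_len, char *out, int out_cap) {
    if (in_len < 0 || in_len > out_cap) return -1;
    memcpy(out, in, (size_t)in_len);
    return in_len;
}

struct export_entry {
    const char *name;
    wasm_entry fn;
};

/* Stand-in for the export table of a loaded module. */
static const struct export_entry exports[] = {
    {"echo", echo},
};

/* Resolve an exported function by name; NULL if it is not exported. */
wasm_entry lookup_export(const char *name) {
    for (size_t i = 0; i < sizeof exports / sizeof exports[0]; i++) {
        if (strcmp(exports[i].name, name) == 0) {
            return exports[i].fn;
        }
    }
    return NULL;
}
```

Because every entry shares one signature, replacing a binary only changes what the lookup returns. The calling code stays the same.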

The call to f in SkyhookDM is made through:

ioctx.aio_exec(oid, s->c, "tabular", "exec_query_op", inbl, &s->bl);

Here, the client runs the librados aio_exec method. This call remotely invokes the exec_query_op function, defined in the tabular object class, on the object oid. exec_query_op is a registered function within the cls library, not the wasm binary f. Any required input parameters, such as query predicates, are packed (serialized) into a bufferlist inbl before being passed to exec_query_op. The exec_query_op method then uses the wasm runtime library to create and instantiate a wasm environment.
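Packing the query parameters into inbl is ordinary serialization. The client encodes the values into a byte buffer, and exec_query_op decodes them on the OSD before calling f. Ceph uses its own bufferlist encoding for this. The fixed 12-byte little-endian layout and the names query_params, pack_params, and unpack_params below are a hypothetical illustration of the round trip only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical query parameters sent from the client to the OSD. */
struct query_params {
    uint32_t column;   /* index of the column to filter on */
    uint32_t op;       /* comparison operator code */
    int32_t constant;  /* value to compare against */
};

static void put_u32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v);
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

static uint32_t get_u32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Client side: encode the parameters into buf (12 bytes, little-endian).
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t pack_params(const struct query_params *q, unsigned char *buf, size_t cap) {
    if (cap < 12) return 0;
    put_u32(buf, q->column);
    put_u32(buf + 4, q->op);
    put_u32(buf + 8, (uint32_t)q->constant);
    return 12;
}

/* OSD side: decode the parameters; returns 0 on success, -1 if too short. */
int unpack_params(const unsigned char *buf, size_t len, struct query_params *q) {
    if (len < 12) return -1;
    q->column = get_u32(buf);
    q->op = get_u32(buf + 4);
    q->constant = (int32_t)get_u32(buf + 8);
    return 0;
}
```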

exec_query_op then calls the function f through the wasm binary. In the following experiment, we call f either natively from within cls_tabular (C++ in Ceph) or as the wasm binary, passing all the required parameters to it. s->bl stores the bufferlist output by the WebAssembly binary, and this result is then sent back to the client.

This experiment was performed on a dataset containing 2 objects with 20 rows, and the results show the average of 5 runs.

Figure 7.1: Performance Comparison of Queries in Tabular to that of WebAssembly

This graph compares the execution time for a query q run through our cls_tabular exec_query_op method. The method uses either its C++ implementation of f or the wasm binary for f to process the object data. The dataset was chosen to check correctness, and both versions of f (C++ and wasm) return the same results.

We selected different queries for each run such that the result produced had:

1. Exactly one row

2. A selectivity of 10% of rows

3. A selectivity of 100% of rows

The height of each bar is the average of five runs. As the graph shows, calling the C++ implementation of the processing function f is faster than calling the wasm binary, but the standard deviation is small. This small-scale experiment verified the correctness of the results. It also showed that the performance of wasm is within 0.3% of the C++ implementation. The earlier experiments evaluating the wasm engines suggested that increasing the data did not increase the overhead. Future work therefore involves running more experiments with larger object sizes to verify this.

Chapter 8

Discussion

8.1 Current limitations of the WebAssembly runtime

The wasm implementation supports C/C++ primitive data types, but it does not support more complex data types such as vectors or Flatbuffer objects. As a workaround, we first cast any non-primitive types to primitive data types (char *, int, int *) and then pass these buffers to the WebAssembly code. The data processing function f is executed, and its result is written to a buffer that is sent back to the client.
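As an illustration of this workaround, a list of strings (a non-primitive type) can be reduced to the three primitive types mentioned above:

- a char * holding all characters back to back,
- an int count,
- an int * of start offsets.

The function names and this particular layout are our assumptions, not the thesis's code.

```c
#include <string.h>

/* Flatten n strings into data (concatenated, no separators) and record
 * where each one starts in offsets[0..n]; offsets must have n + 1 slots,
 * and offsets[n] holds the total length.
 * Returns the total number of bytes used, or -1 if data is too small. */
int flatten_strings(const char *const *strs, int n,
                    char *data, int data_cap, int *offsets) {
    int pos = 0;
    for (int i = 0; i < n; i++) {
        int len = (int)strlen(strs[i]);
        if (pos + len > data_cap) return -1;
        offsets[i] = pos;
        memcpy(data + pos, strs[i], (size_t)len);
        pos += len;
    }
    offsets[n] = pos;
    return pos;
}

/* On the wasm side, string i occupies data[offsets[i] .. offsets[i+1]). */
int string_length(const int *offsets, int i) {
    return offsets[i + 1] - offsets[i];
}
```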

Security is one of the most important goals of WebAssembly. When the wasm binary executable is created using Emscripten, we allocate a fixed amount of memory to this process. The wasm code runs inside a sandbox, i.e., function calls to arbitrary addresses are not permitted. No other process can write to the memory allocated for wasm, i.e., each process executes independently of the others, and escape is only possible through appropriate APIs.

Given this goal, it was difficult to share an address space that both the calling function and the wasm binary function f could write to. Our use case required giving the wasm binary access to the object data (in Flatbuffer format in this case). Initially, we made several attempts to declare a shared memory buffer, but these violated the sandbox boundaries. The current implementation allows passing an address that is read-only for the wasm process. To get past the limitation on sharing memory, we declared the result buffer's memory location as global in cls_tabular, so that both the calling function and the wasm process could access it.
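The resulting pattern has two parts: read-only input, and a result written to a buffer at a fixed, globally known location. It can be sketched in plain C. In the real system, the two sides are separated by the wasm sandbox and the runtime's memory API. Here both are ordinary functions. The names result_buf, result_len, wasm_process, and host_collect are hypothetical, and keeping only the digits is an arbitrary stand-in for real processing.

```c
#include <string.h>

/* Globally declared result buffer that both sides agree on. */
#define RESULT_CAP 256
static char result_buf[RESULT_CAP];
static int result_len = 0;

/* Processing side: reads its input through a const (read-only) pointer and
 * writes only into the global result buffer. Here it keeps the digits. */
int wasm_process(const char *input, int len) {
    result_len = 0;
    for (int i = 0; i < len && result_len < RESULT_CAP; i++) {
        if (input[i] >= '0' && input[i] <= '9') {
            result_buf[result_len++] = input[i];
        }
    }
    return result_len;
}

/* Calling side: after the call returns, copy the result out of the global
 * buffer (in Ceph this would become the output bufferlist). */
int host_collect(char *out, int out_cap) {
    int n = result_len < out_cap ? result_len : out_cap;
    memcpy(out, result_buf, (size_t)n);
    return n;
}
```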

To build the wasm binary using Emscripten, we run the following command:

emcc -O3 -s WASM=1 -s SIDE_MODULE=1 -s TOTAL_MEMORY=64KB -s TOTAL_STACK=30KB -s "EXPORTED_FUNCTIONS=['_main', '_process']" -o test.wasm test.c

• emcc runs the Emscripten compiler.

• The TOTAL_MEMORY flag specifies the amount of memory allocated to the wasm process.

This command requires specifying the total amount of memory initially allocated to the WebAssembly process, which fixes the memory allocation at compile time instead of at runtime.
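In Emscripten's EXPORTED_FUNCTIONS list, C symbols carry a leading underscore, so '_process' refers to a C function named process. The thesis does not list the body or signature of process. The version below is a hypothetical sketch that sticks to the primitive types (int, int *) the runtime can pass across the boundary. Because the command above also exports '_main', the real test.c would define main as well. It is omitted here.

```c
/* Hypothetical exported function: sums n 32-bit integers that the host
 * has placed in the module's linear memory. Only int and int * cross the
 * boundary, matching the primitive-type restriction described earlier. */
int process(const int *values, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += values[i];
    }
    return sum;
}
```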

None of the wasm runtimes currently available for non-web environments (WAVM, wasmer) provides a way to share a memory buffer. However, there has been some progress toward this for wasm processes running inside a JS engine in the web browser. The roadmap for most of these runtime environments includes adding such a provision for non-web environments while conforming to the security policies of WebAssembly.

8.2 WebAssembly beyond cls tabular

The wasm object class depends on the wasm runtime library. Since these dependencies are declared in the CMake files, Ceph still builds with the existing build routines, unchanged.

The current implementation of cls_wasm [13] takes as input the function name, the name of the runtime environment (WAVM or wasmer), and the function arguments. It assumes that the wasm binary file is present on the OSD server, and in our experiments we had to copy this file manually to the OSDs. A topic of future work is a dynamic deployment method, possibly backed by a wasm binary function repository, so that Ceph OSDs can retrieve these binaries on demand. Using the 'run-wasm' command, the client calls the specified wasm function on the object's data, passing the function name and the required parameters. The OSD executes the function and returns the result to the client. Simple examples provided in the example code use wasm to perform operations on the OSD.
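The three inputs that cls_wasm receives (function name, runtime name, and arguments) map naturally onto a small request record. The OSD can validate this record before instantiating any runtime. In the sketch below, the struct, the enum, parse_runtime, and validate_request are hypothetical. Only the two runtime names come from the text.

```c
#include <ctype.h>
#include <stddef.h>

enum wasm_runtime { RUNTIME_UNKNOWN, RUNTIME_WAVM, RUNTIME_WASMER };

/* Hypothetical shape of a run-wasm request after unpacking on the OSD. */
struct run_wasm_request {
    const char *function;        /* exported wasm function to call */
    enum wasm_runtime runtime;   /* which embedded runtime executes it */
    const char *const *args;     /* function arguments, still as strings */
    int n_args;
};

/* Portable case-insensitive string comparison (1 if equal). */
static int equals_ignore_case(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Map the runtime name sent by the client to an enum value. */
enum wasm_runtime parse_runtime(const char *name) {
    if (equals_ignore_case(name, "wavm")) return RUNTIME_WAVM;
    if (equals_ignore_case(name, "wasmer")) return RUNTIME_WASMER;
    return RUNTIME_UNKNOWN;
}

/* Reject requests that cannot be executed before touching any runtime. */
int validate_request(const struct run_wasm_request *req) {
    if (req->function == NULL || req->function[0] == '\0') return -1;
    if (req->runtime == RUNTIME_UNKNOWN) return -1;
    if (req->n_args < 0 || (req->n_args > 0 && req->args == NULL)) return -1;
    return 0;
}
```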

Our implementation of wasm within cls follows the blueprint of Lua in cls [9]. Previous extensions to Ceph with Lua [9] showed how a method can be executed dynamically using interpreted Lua programs only [6], by passing the Lua source text as a parameter to the Lua interface. This Lua approach to dynamic object classes is platform independent but not language independent. It also offers only script execution, not binary execution.

The Python scientific stack can be compiled to WebAssembly [11]. Many data analysts (in physics, ML, etc.) use Python and pyarrow for Arrow data, and SkyhookDM already supports the Arrow file format. An interesting use case is therefore to embed this stack in Ceph so that it can perform non-SQL-style analysis on Arrow data.

Chapter 9

Related Work

[23] demonstrated that a program written in C and compiled to WebAssembly binary executables, instead of JavaScript, could run 34% faster in Chrome. The paper also showed that the performance of WebAssembly is competitive with native code.

[24] presented an evaluation of WebAssembly that compared how Unix applications compiled to WebAssembly perform in the browser. It showed that these applications run slower than native code by approximately 45% to 55%, depending on the browser. Only part of this slowdown was attributed to the WebAssembly platform; the rest was due to code generation issues. Nevertheless, both the original paper and this paper show that WebAssembly meets its goal of being faster than JavaScript.

The OpenStack Swift storlets described in [25] introduced a way to execute user-defined functions in object storage. However, this approach required containers in a middleware layer to execute the user-defined functions. It also supported only a fixed set of functions (e.g., select and project) plus a few user-defined functions. Our wasm binary approach does not require any extra middleware. The binaries are deployed into OSDs and can apply data transformations directly to the object data before passing it up the software stack to another layer.

[31] presented JAFAR, a Near-Data Processing (NDP) hardware accelerator for pushing selects down to memory in modern column-stores. [22] packages processing and storage inside the SSD ("Smart SSDs") for query processing, and identifies the changes needed in the SSD device to boost the use of Smart SSDs. Offloading list intersection to Smart SSDs was proposed in [28]. However, these approaches are hardware solutions that embed data processing functions into storage devices, and they modify the manufacturing process to use the full potential of the devices. Since they typically require specialized hardware or customized firmware for specific tasks, they are less flexible in general.

Chapter 10

Future Work

The current implementation involves manual deployment of the binary file to the OSD, so the system assumes that the wasm binary is already present in the local directory of the OSD server. As mentioned earlier, one possible solution is a registry that curates the wasm binaries. The very first time a wasm function is called, the OSD would pull the file, possibly from the metadata server (MDS), and cache it on the server. Various security policies could be enforced before the file is stored on the MDS. We leave it as future work to investigate a reliable, scalable, and secure way to deploy the wasm binary file across all OSDs in Ceph.

This work compares the performance of WebAssembly to that of C++. But Ceph also has a provision to extend Object Interfaces dynamically with Lua. Thus, it would be interesting to see how WebAssembly performs compared to Lua.

A very small dataset was used to check correctness. The current and previous experiments show that the overhead of executing wasm in Ceph was negligible, and that the overhead of wasm execution is reasonable compared to native execution. However, similar experiments should be performed with larger datasets to ensure that the overhead does not increase as the object data grows.

Chapter 11

Conclusion

This thesis investigated, introduced, and evaluated a method to dynamically inject user-defined methods into Ceph's existing extensible storage mechanism (cls) as binaries created in a language-independent way, using WebAssembly. Unlike most previous approaches, we propose a software solution to support this mechanism. The design and implementation of this work are mostly proofs of concept. The wasm object class facility is not yet in the mainline Ceph tree. The feature is located in the wasm branch, which can be checked out from GitHub: https://github.com/uccross/skyhookdm-ceph/tree/wasm

We investigated embedding WebAssembly using WAVM, wasmer, and wasmtime. Based on performance and more mature implementation support for C++, we decided to embed the runtime library rather than the virtual machine. We then implemented a wasm cls class that can execute a wasm binary. The cls class must be compiled and deployed as a Ceph library, which is disruptive but a one-time effort. The wasm binary, in contrast, can be deployed dynamically into Ceph OSDs at any time.

Related work and the experiments presented in this thesis provide evidence that various data processing functions can be written in a generic way using any supported language (C, C++, Rust) and compiled to WebAssembly. This code can then be executed in Ceph to support the goal of dynamically injecting user code at runtime without disrupting the Ceph cluster or restarting OSDs. The result is a flexible framework for extending data processing capability within Ceph.

Bibliography

[1] Awesome wasm. https://github.com/mbasso/awesome-wasm.

[2] Clang: a c language family frontend for llvm. https://clang.llvm.org/.

[3] Compiling c to webassembly and running it without emscripten. https://depth-first.com/articles/2019/10/16/compiling-c-to-webassembly-and-running-it-without-emscripten/.

[4] Converting webassembly text format to wasm. https://developer.mozilla.org/en-US/docs/WebAssembly/Text_format_to_wasm.

[5] Crush maps. https://docs.ceph.com/docs/master/rados/operations/crush-map/.

[6] Dynamic object interfaces with lua. https://ceph.io/geen-categorie/dynamic-object-interfaces-with-lua/.

[7] Emscripten. https://emscripten.org/.

[8] How we used webassembly to speed up our web app by 20x (case study). https://www.smashingmagazine.com/2019/04/webassembly-speed-web-app/.

[9] Lua in ceph. https://github.com/ceph/ceph/tree/master/src/cls/lua.

[10] Non-web embeddings. https://webassembly.org/docs/non-web/.

[11] The python scientific stack, compiled to webassembly. https://github.com/iodide-project/pyodide.

[12] Wasi - the webassembly system interface. https://wasi.dev/.

[13] Wasm in ceph. https://github.com/uccross/skyhookdm-ceph/tree/wasm/src/cls/wasm.

[14] Wasmer. https://wasmer.io/.

[15] Wasmtime - a small and efficient runtime for webassembly wasi. https://wasmtime.dev/.

[16] Wavm - webassembly virtual machine. https://github.com/WAVM/WAVM.

[17] Webassembly. https://en.wikipedia.org/wiki/WebAssembly.

[18] Webassembly at ebay: A real-world use case. https://tech.ebayinc.com/engineering/webassembly-at-ebay-a-real-world-use-case/.

[19] Webassembly format. https://research.mozilla.org/webassembly/.

[20] Webassembly lesson 1: Hello world. https://www.jamesfmackenzie.com/2019/11/30/whats-is-webassembly-hello-world/.

[21] Webassembly roadmap. https://webassembly.org/roadmap/.

[22] Jaeyoung Do, Yang-Suk Kee, Jignesh M. Patel, Chanik Park, Kwanghyun Park, and David J. DeWitt. Query processing on smart ssds: opportunities and challenges. In SIGMOD ’13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 1221–1230, June 2013.

[23] Andreas Haas, Andreas Rossberg, Derek L. Schuff, Ben L. Titzer, Michael Holman, Dan Gohman, Luke Wagner, Alon Zakai, and JF Bastien. Bringing the web up to speed with webassembly. In PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 185–200, June 2017.

[24] Abhinav Jangda, Bobby Powers, Emery D. Berger, and Arjun Guha. Not so fast: Analyzing the performance of webassembly vs. native code. In Proceedings of the 2019 USENIX Annual Technical Conference, July 2019.

[25] Y. Moatti, E. Rom, R. Gracia-Tinedo, D. Naor, D. Chen, J. Sampe, M. Sanchez-Artigas, P. Garcia-Lopez, F. Gluszak, E. Deschdt, F. Pace, D. Venzano, and P. Michiardi. Too big to eat: Boosting analytics data ingestion from object stores with scoop. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pages 309–320, 2017.

[26] E. Riedel, G. A. Gibson, and C. Faloutsos. Active storage for large-scale data mining and multimedia. In Proceedings of the 24th International Conference on Very Large Databases, VLDB ’98, 1998.

[27] Michael A. Sevilla, Noah Watkins, Ivo Jimenez, Peter Alvaro, Shel Finkelstein, Jeff LeFevre, and Carlos Maltzahn. Malacology: A programmable storage system. In EuroSys ’17: Proceedings of the Twelfth European Conference on Computer Systems, pages 175–190, April 2017.

[28] Jianguo Wang, Dongchul Park, Yang-Suk Kee, and Yannis Papakonstantinou. Ssd in-storage computing for list intersection. In DaMoN ’16: Proceedings of the 12th International Workshop on Data Management on New Hardware, pages 1–7, June 2016.

[29] Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, and Carlos Maltzahn. Ceph: a scalable, high-performance distributed file system. In OSDI ’06: Proceedings of the 7th symposium on Operating systems design and implementation, pages 307–320, November 2006.

[30] Sage A. Weil, Andrew W. Leung, Scott A. Brandt, and Carlos Maltzahn. Rados: A scalable, reliable storage service for petabyte-scale storage clusters. In Proceedings of the 2nd international Petascale Data Storage Workshop (PDSW ’07), November 2007.

[31] Sam Likun Xi, Oreoluwa Babarinsa, Manos Athanassoulis, and Stratos Idreos. Beyond the wall: Near-data processing for databases. In DaMoN ’15: Proceedings of the 11th International Workshop on Data Management on New Hardware, pages 1–10, May 2015.

Appendix A

Building Wasm for the Web using Emscripten

1. To run a simple C program in the browser, write the following code snippet and save it in a file called hello.c on your local drive.

2. #include <stdio.h>

int main(int argc, char** argv) {
    printf("Hello World\n");
    return 0;
}

3. Navigate to the directory containing hello.c and run the following Emscripten command:

emcc hello.c -s WASM=1 -o hello.html

4. -s WASM=1: specifies that we want wasm output. In older Emscripten releases, omitting this flag produced asm.js output by default; current releases emit wasm by default.

Figure A.1: Running a wasm binary in the browser. Figure copied from [20] without permission.

5. -o hello.html: specifies that we want Emscripten to generate an HTML page to run our code in. Emscripten also generates the wasm module and the JavaScript "glue" code that compiles and instantiates the wasm so it can be used in the web environment.

Appendix B

Building Wasm using Clang

To compile a C program to WebAssembly using Clang, run the following command [3]:

clang --target=wasm32 --stdlib=libc++ -Wl,--export-all -o test.wasm test.c

The parameters include:

• --target=wasm32 - Specifies the build target for Clang

• --stdlib - Specifies which standard library to use

• -Wl,--export-all - Passes --export-all to the linker, which exports all symbols from the WebAssembly module so that the runtime engine can call them

• -o - Specifies the output file name
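With --target=wasm32 and no WASI sysroot, Clang typically has no C library headers or libraries available. A file compiled this way is therefore usually freestanding, with no #include <stdio.h> and no calls into libc. A minimal test.c of that kind is sketched below, and the function names are hypothetical. Depending on the Clang version and setup, the command may also need flags such as -nostdlib and -Wl,--no-entry when the file has no main. We have not verified this for the exact command above.

```c
/* Freestanding functions: no headers and no libc calls, so the file can be
 * compiled for wasm32 even without a C library for WebAssembly. With
 * -Wl,--export-all, both functions become callable exports. */
int add(int a, int b) {
    return a + b;
}

/* Iterative factorial; overflows int for n > 12. */
int factorial(int n) {
    int result = 1;
    for (int i = 2; i <= n; i++) {
        result *= i;
    }
    return result;
}
```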