A Reactive Middleware Framework for Secure Data Stream Processing

SecureStreams: A Reactive Middleware Framework for Secure Data Stream Processing

Aurélien Havet, University of Neuchâtel, Neuchâtel, Switzerland
Rafael Pires, University of Neuchâtel, Neuchâtel, Switzerland
Pascal Felber, University of Neuchâtel, Neuchâtel, Switzerland
Marcelo Pasin, University of Neuchâtel, Neuchâtel, Switzerland
Romain Rouvoy, Univ. Lille / Inria / IUF, Lille, France
Valerio Schiavoni, University of Neuchâtel, Neuchâtel, Switzerland

ABSTRACT

The growing adoption of distributed data processing frameworks in a wide diversity of application domains challenges end-to-end integration of properties like security, in particular when considering deployments in the context of large-scale clusters or multi-tenant Cloud infrastructures.

This paper therefore introduces SecureStreams, a reactive middleware framework to deploy and process secure streams at scale. Its design combines the high-level reactive dataflow programming paradigm with Intel®'s low-level software guard extensions (SGX) in order to guarantee privacy and integrity of the processed data. The experimental results of SecureStreams are promising: while offering a fluent scripting language based on Lua, our middleware delivers high processing throughput, thus enabling developers to implement secure processing pipelines in just a few lines of code.

KEYWORDS

Middleware, security, SGX, stream processing

ACM Reference format:
Aurélien Havet, Rafael Pires, Pascal Felber, Marcelo Pasin, Romain Rouvoy, and Valerio Schiavoni. 2017. SecureStreams: A Reactive Middleware Framework for Secure Data Stream Processing. In Proceedings of DEBS '17, Barcelona, Spain, June 19-23, 2017, 10 pages.
DOI: http://dx.doi.org/10.1145/3093742.3093927

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DEBS '17, Barcelona, Spain
© 2017 ACM. 978-1-4503-5065-5/17/06...$15.00
DOI: http://dx.doi.org/10.1145/3093742.3093927

arXiv:1805.01752v1 [cs.DC] 4 May 2018

1 INTRODUCTION

The data deluge imposed by a world of ever-connected devices, whose most emblematic example is the Internet of Things (IoT), has fostered the emergence of novel data analytics and processing technologies to cope with the ever increasing volume, velocity, and variety of information that characterize the big data era. In particular, to support the continuous flow of information gathered by millions of IoT devices, data streams have emerged as a suitable paradigm to process flows of data at scale. However, as some of these data streams may convey sensitive information, stream processing requires support for end-to-end security guarantees in order to prevent third parties from accessing restricted data.

This paper therefore introduces SecureStreams, our initial work on a middleware framework for developing and deploying secure stream processing on untrusted distributed environments. SecureStreams supports the implementation, deployment, and execution of stream processing tasks in distributed settings, from large-scale clusters to multi-tenant Cloud infrastructures. More specifically, SecureStreams adopts a message-oriented middleware [28], which integrates with the SSL protocol [30] for data communication and the current version of Intel®'s software guard extensions (SGX) [27] to deliver end-to-end security guarantees along data stream processing stages. SecureStreams can scale vertically and horizontally by adding or removing processing nodes at any stage of the pipeline, for example to dynamically adjust to the current workload. The design of the SecureStreams system is inspired by the dataflow programming paradigm [48]: the developer combines several independent processing components (e.g., mappers, reducers, sinks, shufflers, joiners) to compose specific processing pipes. Regarding packaging and deployment, SecureStreams smoothly integrates with industrial-grade lightweight virtualization technologies like Docker [9].

In this paper, we propose the following contributions: (i) we describe the design of SecureStreams, (ii) we provide details of our reference implementation, in particular on how to smoothly integrate our runtime inside an SGX enclave, and (iii) we perform an extensive evaluation with micro-benchmarks, as well as with a real-world dataset.

The remainder of the paper is organized as follows. To better understand the design of SecureStreams, Section 2 delivers a brief introduction to today's SGX operating mechanisms. The architecture of SecureStreams is then introduced in Section 3. Our implementation choices and an example of a SecureStreams program are reported in Section 4. Section 5 discusses our extensive evaluation, presenting a detailed analysis of micro-benchmark performance, as well as more comprehensive macro-benchmarks with real-world datasets. Related work is gathered in Section 6. Finally, Section 7 briefly describes our future work and concludes.

2 SGX LIGHTNING TOUR

The design of SecureStreams revolves around the availability of SGX features in the host machines. SGX is a trusted execution environment (TEE) recently introduced into Intel® SkyLake, similar in spirit to ARM TrustZone [2] but much more powerful. Applications create secure enclaves to protect the integrity and the confidentiality of the data and the code being executed.

The SGX mechanism, as depicted in Figure 1, allows applications to access confidential data from inside the enclave. The architecture guarantees that an attacker with physical access to a machine will not be able to tamper with the application data without being noticed. The CPU package represents the security boundary. Moreover, data belonging to an enclave is automatically encrypted and authenticated when stored in main memory. A memory dump on a victim's machine will produce encrypted data. A remote attestation protocol allows one to verify that an enclave runs on a genuine Intel® processor with SGX. An application using enclaves must ship a signed (not encrypted) shared library (a shared object file in Linux) that can possibly be inspected by malicious attackers.

[Figure 1: SGX core operating principles. The diagram shows untrusted code (create enclave, call trusted function) and trusted code inside the enclave (call gate, trusted function, execute, return), with steps numbered ➊ to ➐.]

In the current version of SGX, the enclave page cache (EPC) is a 128 MB area of memory predefined at boot to store enclaved code and data. At most around 90 MB can be used by the application's memory pages, while the remaining area is used to maintain SGX metadata. Any access to an enclave page that does not reside in the EPC triggers a page fault. The SGX driver interacts with the CPU to choose which pages to evict. The traffic between the CPU and the system memory is kept confidential by the memory encryption engine (MEE) [31], also in charge of tamper resistance and replay [...]

[...] to bring the execution flow inside the enclave (➍). Once the trusted function is executed by one of the enclave's threads (➎), its result is encrypted and sent back (➏) before giving back the control to the main processing thread (➐).

3 ARCHITECTURE

The architecture of SecureStreams comprises a combination of two different types of base components: worker and router. A worker component continuously listens for incoming data by means of non-blocking I/O. As soon as data flows in, an application-dependent business logic is applied. A typical use-case is the deployment of a classic filter/map/reduce pattern from the functional programming paradigm [24]. In such a case, worker nodes execute only one function, namely map, filter, or reduce. A router component acts as a message broker between workers in the pipeline and transfers data between them according to a given dispatching policy. Figure 2 depicts a possible implementation of this dataflow pattern using the SecureStreams middleware.

SecureStreams is designed to support the processing of sensitive data inside SGX enclaves. As explained in the previous section, the enclave page cache (EPC) is currently limited to 128 MB. To overcome this limitation, we settled on a lightweight yet efficient embeddable runtime, based on the Lua virtual machine (LuaVM) [32] and the corresponding multi-paradigm scripting language [15]. The Lua runtime requires only a few kilobytes of memory and is designed to be embeddable; as such, it represents an ideal candidate to execute in the limited space allowed by the EPC. Moreover, the application-specific functions can be quickly prototyped in Lua, and even complex algorithms can be implemented with an almost 1:1 mapping from pseudo-code [35]. We provide further implementation details of the embedding of the LuaVM inside an SGX enclave in Section 4.

Each component is wrapped inside a lightweight Linux container (in our case, the de facto industrial standard Docker [9]). Each container embeds all the required dependencies, while guaranteeing the correctness of their configuration, within an isolated and reproducible execution environment. By doing so, a SecureStreams processing pipeline can be easily deployed on different public or private infrastructures without changing the source code. For instance, this will allow developers to deploy SecureStreams to Amazon EC2 container service [1], where SkyLake-enabled instances will soon be made available [4], or similarly to Google compute engine [12]. The deployment of the containers can be transparently executed on a single machine or a cluster, using a Docker network and the Docker Swarm scheduler [11].
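
The worker/router split described in this section can be illustrated with a small in-process sketch. It is written in Python purely for illustration and is not the SecureStreams API, which is Lua-based and presented in Section 4. The names `Worker`, `Router`, `run_pipeline`, and `demo` are hypothetical, the round-robin dispatching policy is an assumption (the text only says "a given dispatching policy"), and the sketch deliberately leaves out SGX enclaves, SSL transport, non-blocking I/O, and Docker packaging.

```python
from itertools import cycle


class Worker:
    """Hypothetical worker: applies exactly one function (map, filter, or reduce)."""

    def __init__(self, kind, fn, initial=None):
        if kind not in ("map", "filter", "reduce"):
            raise ValueError(f"unsupported worker kind: {kind}")
        self.kind = kind
        self.fn = fn
        self.state = initial  # accumulator, only meaningful for reduce workers
        self.seen = 0         # number of items this worker has received

    def process(self, item):
        """Return the list of items to forward downstream (possibly empty)."""
        self.seen += 1
        if self.kind == "map":
            return [self.fn(item)]
        if self.kind == "filter":
            return [item] if self.fn(item) else []
        # reduce: fold the item into the local state and forward nothing
        self.state = self.fn(self.state, item)
        return []


class Router:
    """Hypothetical router: hands each item to the next worker of one stage.

    Round-robin is an assumed dispatching policy chosen for simplicity.
    """

    def __init__(self, workers):
        self.workers = list(workers)
        self._turn = cycle(self.workers)

    def dispatch(self, items):
        forwarded = []
        for item in items:
            forwarded.extend(next(self._turn).process(item))
        return forwarded


def run_pipeline(source, stages):
    """Push items one at a time through the stages, with one router per stage.

    Returns whatever the last stage forwards (empty if it is a reduce stage).
    """
    routers = [Router(workers) for workers in stages]
    emitted = []
    for item in source:
        items = [item]
        for router in routers:
            items = router.dispatch(items)
        emitted.extend(items)
    return emitted


def demo():
    """Square the integers 1..6, keep the even squares, and sum them."""
    mappers = [Worker("map", lambda x: x * x) for _ in range(2)]
    filters = [Worker("filter", lambda x: x % 2 == 0)]
    reducer = Worker("reduce", lambda acc, x: acc + x, initial=0)
    run_pipeline(range(1, 7), [mappers, filters, [reducer]])
    return reducer.state
```

Calling `demo()` returns 56 (4 + 16 + 36). Adding or removing `Worker` instances in a stage changes only the list handed to that stage's `Router`, which loosely mirrors how the text describes scaling by adding or removing processing nodes at any stage of the pipeline.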
