
Manisha Luthra, Sebastian Hennig, Kamran Razavi, Lin Wang, Boris Koldehofe: Operator as a Service: Stateful Serverless Complex Event Processing. In the Proceedings of IEEE International Conference on Big Data, December, 2020.

arXiv:2012.04982v3 [cs.NI] 28 Jun 2021

978-1-7281-6251-5/20/$31.00 ©2020 IEEE

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.

Operator as a Service: Stateful Serverless Complex Event Processing

Manisha Luthra∗, Sebastian Hennig∗, Kamran Razavi∗, Lin Wang∗†, Boris Koldehofe∗‡
∗Technical University of Darmstadt, Germany
†Vrije Universiteit Amsterdam, Netherlands
‡University of Groningen, Netherlands

Abstract: Complex Event Processing (CEP) is a powerful paradigm for scalable data management that is employed in many real-world scenarios such as detecting credit card fraud in banks. The so-called complex events are expressed using a specification language that is typically implemented and executed on a specific runtime system. While the tight coupling of these two components has been regarded as the key for supporting CEP at high performance, such dependencies pose several inherent challenges. (1) Application development atop a CEP system requires extensive knowledge of how the runtime system operates, which is typically highly complex in nature. (2) The dependence on a specification language requires domain experts and further restricts and steepens the learning curve for application developers.

In this paper, we propose CEPLESS, a scalable data management system that decouples the specification from the runtime system by building on the principles of serverless computing. CEPLESS provides “operator as a service” and offers flexibility by enabling the development of CEP applications in any specification language while abstracting away the complexity of the CEP runtime system. As part of CEPLESS, we designed and evaluated novel mechanisms for in-memory processing and batching that enable the stateful processing of CEP operators even under high rates of ingested events. Our evaluation demonstrates that CEPLESS can be easily integrated into existing CEP systems like Apache Flink while attaining similar throughput under a high scale of events (up to 100K events per second) and dynamic operator update in ~238 ms.

Index Terms: Complex Event Processing; Serverless computing; Function as a Service; Internet of Things

I. INTRODUCTION

Complex event processing (CEP) is a data management paradigm used in a wide range of applications to efficiently detect interesting event patterns in event streams. Such event patterns, often named complex events, allow applications upon their detection to adapt to situational changes, such as the detection of fraud in credit card payments [2] and deriving tweet trends in Twitter [20]. The strengths of CEP reside in the simple specification of complex events by means of a query language and the support for efficient and distributed execution of the event detection logic.

Therefore, almost every CEP system provides two key components: (i) the specification language used to define event patterns and (ii) the runtime system to execute the event detection logic. Typically, these two components are highly intertwined. The constructs that describe the event detection logic are mapped to specific, at times infrastructure-dependent, operator implementations, e.g., the algorithms for detecting sequences of events in a time-based window. Driven by the preferences of programmers and the underlying systems infrastructure, many distinct CEP systems have been proposed [20], [8], [3], [11], each offering very specific features with specific programming models and infrastructures in mind.

For example, classic CEP programming models such as CQL [6] and SASE [29] are based on SQL-like semantics, and hence, they also share many limitations of SQL. In particular, it is very difficult to express complex business logic using these programming models. Rather, in current systems such as Google Dataflow [4], MillWheel [3], and Flink [8], object-oriented languages are often used to express complex business logic in the form of user-defined functions (UDFs). With UDFs, an operator can encapsulate any business logic and hence can be customized as per user needs.

While the expressiveness is significantly improved with UDFs, existing CEP systems still fall short in several aspects. One of them is the lack of runtime independence. Although within each CEP system queries can be specified in a highly composable manner or even altered, there is little support to benefit from reusing development effort from one CEP application to another. Simply rewriting the query specification from one system to another is difficult, since the way CEP queries are written has specific execution semantics in mind, which are known to diverge between multiple systems [10]. Furthermore, the support for dynamic updates to the UDFs is weak. Thus, changes to the implementation of specific operators or the definition of new functionality often require a restart and new deployment of the operators [8], which is problematic in many applications like fraud detection where system availability is critically important.

In this paper, we aim for a better understanding of how one can benefit from the diversity of different CEP systems and enhance their applicability to a wide spectrum of different infrastructures. By proposing a data management system that builds on the principles of serverless computing architecture [15], we aim to enhance the reuse and integration of CEP operators. Furthermore, by proposing methods for adding new functionalities and altering the operator logic at run-time, CEP systems can adapt their processing logic dependent on the application context as well as the features of the underlying infrastructure. This way, the diversity of operator implementations is no longer an obstacle in the development cycle, but a feature that allows CEP systems to evolve to new requirements and adapt to contextual changes at run-time.

While existing serverless computing platforms [27], [12] provide important concepts for the scalable execution of operator implementations, the extension of CEP systems to a serverless platform imposes many challenges. First, CEP operators often perform stateful processing such as detection of correlated events within a time-based window of the data stream [10]. Stateful processing is not easily possible in a serverless model because conventionally each function or operator execution is required to be isolated and ephemeral in current platforms. For example, in AWS Lambda, the functions have a limited lifetime of up to 15 minutes, which is the maximum among several serverless platforms. For the functions or operators, it is assumed that the state is not recoverable across invocations. This stands in contradiction to the lifetime of a CEP operator, which is required to be running as long as the query is executed [14]. Second, the execution mechanisms in current CEP systems perform many optimizations such as flow control, backpressure, and in-memory network buffers to guarantee low latency and high throughput. Achieving equal performance in an existing serverless platform is currently not possible, especially due to the aforementioned missing optimizations and slow communication through storage. For example, in AWS Lambda [27] it is only possible to communicate between two lambdas using S3, which is extremely slow and [...] paper makes the following contributions:

1) We propose mechanisms for in-memory queue management and batching that enable stateful processing and ensure correctness and fast delivery of events, which are extremely important for CEP systems.
2) We introduce a unified user-defined operator interface that allows the integration of highly diverse CEP runtime systems into the CEPLESS system and allows runtime system independence.
3) We implement and evaluate CEPLESS on two state-of-the-art CEP systems, Apache Flink [8] and TCEP [22], using an open and anonymous credit card transaction dataset [23]. Results show that CEPLESS enables runtime-system-independent updates of user-defined operators while attaining equal throughput and preserving low latency overhead (~1.9 ms) under high event rates of up to 100K events per second.

The rest of the paper is structured as follows. Section II presents a motivational example. Section III presents the system model, introducing the important system entities. Section IV describes the CEPLESS system design. Section V shows the evaluation. Section VI discusses the related approaches and Section VII concludes the paper.

II. PROBLEM STATEMENT USING FRAUD DETECTION EXAMPLE

A financial institution wants to detect payment frauds in [...]