A Distributed Logic Reactive Programming Model and Its Application to Monitoring Security

A Distributed Logic Reactive Programming Model and its Application to Monitoring Security Thierry Renaux Dissertation submitted in fulfillment of the requirement for the degree of Doctor of Sciences March 28, 2019 Jury: Prof. Dr. Wolfgang De Meuter, Vrĳe Universiteit Brussel, Belgium (promotor) Prof. Dr. Joeri De Koster, Vrĳe Universiteit Brussel, Belgium (promotor) Prof. Dr. Viviane Jonckers, Vrĳe Universiteit Brussel, Belgium (chair) Prof. Dr. Katrien Beuls, Vrĳe Universiteit Brussel, Belgium (secretary) Prof. Dr. Tias Guns, Vrĳe Universiteit Brussel, Belgium Prof. Dr. Coen De Roover, Vrĳe Universiteit Brussel, Belgium Prof. Dr. Sebastian Erdweg, TU Delft, The Netherlands Prof. Dr. Robert Hirschfeld, Hasso Plattner Institute, Germany Vrĳe Universiteit Brussel Faculty of Sciences and Bio-engineering Sciences Department of Computer Science Software Languages Lab © 2019 Thierry Renaux Printed by Crazy Copy Center Productions VUB Pleinlaan 2, 1050 Brussel Tel / fax : +32 2 629 33 44 [email protected] www.crazycopy.be ISBN 9789493079205 NUR 989 All rights reserved. No part of this publication may be produced in any form by print, photoprint, microfilm, electronic or any other means without permission from the author. Abstract Processes in the world are increasingly managed by software systems that are centered around the notion of events. For instance, banking software responds to the occurrence of financial transaction events and parcel services respond to the arrival event of a package at a depot. To, e.g., identify fraud in a series of financial transactions, multiple transaction events must be correlated. How to efficiently extract patterns from a stream of events — in an online fashion — is an active research topic. The state of the art enables extracting complex patterns from large streams, or performing simple processing with predictable latency. There are, however, no techniques which combine expressive event correlation mechanisms with guaranteed upper bounds on resource usage. Lacking such guarantees, event monitoring systems risk falling behind on the live data, or even crashing due to memory exhaustion. Preventing this is of utmost importance for modern systems that are online 24/7. In this dissertation, we propose a novel programming paradigm for large-scale online event correlation: Logic Reactive Programming. LRP overcomes the issues in the state of the art by imposing maximum lifetimes on stored event data. Through temporal reasoning, LRP then guarantees that a fixed upper limit exists on the number of events whose data needs to be retained. Stale data is automatically discarded, thus enabling LRP systems to operate in a fixed resource budget. We introduce PARTElang, the first Logic Reactive Programming language. Its formal foundations are defined by means of an event algebra. We define Featherweight PARTE, an operational semantics for PARTElang, which guarantees that every operation evaluates in constant time, using constant space. We prove correctness of Featherweight PARTE, i.e., that steps defined by the formal model ensure that a PARTElang program is translated to a number of concurrent units of computation which jointly arrive at the results prescribed by the event algebra. We further validate PARTE- lang by building a security monitoring application on top of a prototypical implementation of the Featherweight PARTE model. i Samenvatting Steeds vaker worden activiteiten in de buitenwereld beheerd door softwaresys- temen die werken op basis van events. Zo reageren banksystemen bĳvoorbeeld op het plaatsvinden van een transactie-event. De systemen van pakjesdiensten voeren acties uit wanneer een aankomst-event van een pakje zich bĳ een depot voordoet. Dergelĳke events bevatten op zich reeds nuttige informatie, maar er kan nog meer informatie onttrokken worden uit een correlatie van meerdere events, bv. door fraude te detecteren in een reeks financiële transacties. Dit alles zorgt ervoor dat er nood is aan middelen om grote stromen van events te monitoren op het voorkomen van patronen van events. Omdat het vaak nuttig is om snel te reageren, vindt deze monitoring best live plaats — terwĳl de events plaatsvinden. De ontwikkeling van technieken om grootschalige eventstromen live te analyseren, is de focus van meerdere onderzoeksdomei- nen. Er ontbreekt echter een oplossing die kan garanderen dat de grootschalige live analyse uitgevoerd kan worden met een op voorhand bepaalde hoeveelheid middelen. Zonder die garantie riskeren eventmonitoringsystemen om achterop te geraken ten opzichte van de live data, of zelfs om hun taak te moeten staken bĳ gebrek aan werkgeheugen. Voor systemen die continu beschikbaar moeten blĳven, is dit onaanvaardbaar. In deze doctoraatsverhandeling stellen we een nieuw programmeerparadigma voor voor grootschalige live eventcorrelatie: Logisch-Reactief Programmeren. LRP overkomt de problemen in huidige oplossingen door een garandeerbare bovengrens te vereisen op de levensduur van eventdata. Aan de hand van temporal reasoning garandeert LRP dat een vaste bovengrens bestaat op het aantal events dat bĳgehouden moet worden. Oude events worden automatisch verwĳderd. Dit laat LRP systemen toe om binnen een eindig tĳds- en geheugenbudget te werken. We introduceren PARTElang, de eerste LRP taal. De formele onderbouwing van LRP berust op een event algebra. We definiëren een operationele semantiek voor PARTElang. Deze garandeert dat iedere operatie in constante tĳd uitgevoerd wordt, gebruikmakend van een constante hoeveelheid geheugen. We bewĳzen dat het evaluatiemodel het correcte gedrag voorschrĳft, en demonstreren dit aan de hand van een prototype. iii Acknowledgments First of all, I would like to thank the members of my jury (Viviane Jonckers, Katrien Beuls, Tias Guns, Coen De Roover, Sebastian Erdweg, and Robert Hirschfeld) for their insightful comments and feedback. I would like to thank my promotors, Wolf and Joeri, for their guidance and feedback. Their advice, encouragement, and occasional constructive criticism were essential throughout my PhD. Of course, the rest of the Software Languages Lab was also a great bonus. Thank you all, both for offering a good environment to do work in, and for being great colleagues. I won’t list you all, but know that I am thankful for the work environment you helped create. So, thanks to the Rete-and-event-based- things people: Kennedy, Humberto, Sam, etc. Thanks to those of you who arrived at SOFT around the same time as me, and therefore were often going through the same thing at the same time as me: Janwillem, Nathalie, Simon, etc. Thanks to those who at some point were doing the day-to-day guidance for the part of SOFT I was in at that time: Stefan, Lode, Yves, etc. Outside of SOFT, I’d also like to thank my family and friends. Thanks Ansje, for being there for me through all these years. Thanks, mom and Evelyne too, for all the support you provided. Thanks, Raf, for your friendship and for proofreading this entire dissertation, even though you had no obligation to do so at all. Finally, thanks, reader, for reading this. Why are you reading this? Are you waiting for my public defense to start? (Hello! Wave!) Or are you reading it during the presentation? Aww, I spent all that effort making it interesting... Or are you looking for inspiration while writing your own Acknowledgments section? Did you pass your private defense yet? Congratulations! Or are you just that well-prepared that you’re writing it proactively? Wow, keep up the good work! In any case, thanks, and I hope the work described in this dissertation is useful to some of you. This work is partially funded by a PhD scholarship of the Agency for Innovation by Science and Technology in Flanders (IWT). v Contents 1 Introduction 1 1.1 Problem Statement . .4 1.2 Research Goal . .4 1.3 Approach . .5 1.4 Contributions . .8 1.5 Supporting publications . .8 1.6 Outline of the Dissertation . 10 2 State of the Art in Distributed Big Data and Stream Processing 11 2.1 Driver Scenarios . 12 2.1.1 Setting . 12 2.1.2 Scenarios . 13 2.1.3 Conclusion . 14 2.2 History of Big Data Stream Processing . 14 2.2.1 Origins . 14 2.2.2 Early Stream Processing: Tribeca . 16 2.2.3 Later Stream Processing Systems . 17 2.2.4 Early Big Data Processing: MapReduce . 19 2.2.5 Data-Parallel Pipelines . 20 2.2.6 Towards Streaming Big Data Processing . 21 2.2.7 Summary . 23 2.3 Research Trends . 23 2.3.1 Restricting Selections by Time: Windowing . 23 2.3.2 Distribution and Fault-Tolerance . 27 2.3.3 Expressivity and Language Paradigm . 29 2.3.4 Summary . 31 2.4 Discussion: Shortcomings in the State of the Art . 31 2.4.1 Event Correlation in Big Data Stream Processing Frameworks . 32 2.4.2 Data Processing Guarantees of Data in Streaming Databases . 33 2.5 Conclusion . 34 vii 3 State of the Art in Event Handling 35 3.1 Traditional Approaches to Event Handling . 35 3.2 Averting the Callback Hell: Reactive Programming . 37 3.2.1 The Functional Reactive Programming Paradigm . 37 3.2.2 Beyond Functional Reactive Programming . 38 3.2.3 Active Research in Reactive Programming . 38 3.3 Detecting Event Patterns: Complex Event Processing . 38 3.3.1 Differentiating CEP from Streaming Databases . 39 3.3.2 A Baseline for Modern CEP: SASE . 41 3.3.3 Aggregation and Monitoring Multiple Streams: Cayuga . 42 3.3.4 Expressive and Efficient CEP with Distributed Event Sources: TESLA 44 3.3.5 A Formal Foundation for Modern CEP: EVA . 46 3.3.6 Complex Event Patterns and Reaction Logic as Declarative Rules . 46 3.4 A Taxonomy of Event Handling . 47 3.4.1 Event Consumption Policies . 47 3.4.2 Semantics of “followed by” . 48 3.4.3 Support for Temporal Constraints in Event Handling Languages . 50 3.4.4 Event Detection Models . 51 3.4.5 Summary . 56 3.5 Discussion: Shortcomings in the State of the Art . 56 3.5.1 Shortcomings of Functional Reactive Programming .

Load more