From Complex Event Processing to Simple Event Processing

Sylvain Hallé
Laboratoire d’informatique formelle, Université du Québec à Chicoutimi, Canada

arXiv:1702.08051v1 [cs.DB] 26 Feb 2017
Preprint submitted to acmjacm February 28, 2017

Abstract

Many problems in Computer Science can be framed as the computation of queries over sequences, or “streams”, of data units called events. The field of Complex Event Processing (CEP) relates to the techniques and tools developed to efficiently process these queries. However, most CEP systems developed so far have concentrated on relatively narrow types of queries, which consist of sliding windows, aggregation functions, and simple sequential patterns computed over events that have a fixed tuple structure. Many of them boast throughput, but in counterpart, they are difficult to set up and cumbersome to extend with user-defined elements. This paper describes a variety of use cases taken from real-world scenarios that present features seldom considered in classical CEP problems. It also provides a broad review of current solutions, which includes tools and techniques going beyond typical surveys on CEP. From a critical analysis of these solutions, design principles for a new type of event stream processing system are exposed. The paper proposes a simple, generic and extensible framework for the processing of event streams of diverse types; it describes in detail a stream processing engine, called BeepBeep, that implements these principles. BeepBeep’s modular architecture, which borrows concepts from many other systems, is complemented with an extensible query language, called eSQL. The end result is an open, versatile, and reasonably efficient query engine that can be used in situations that go beyond the capabilities of existing systems.

Keywords: event processing, software testing, query languages, runtime verification

1. Introduction

Event streams have become an important part of the mass of data produced by computing systems. They can be generated by a myriad of sources such as sensors [1–3], business process logs [4], instrumented software [5–7], financial transactions [8], healthcare systems [9], and network packet captures [10]. The ability to collect and process these event streams can be put to good use in fields as diverse as software testing, data mining, and compliance auditing.

Event stream processing typically involves computations that go beyond the evaluation of simple functions on individual events. Of prime importance is the possibility to perform correlations between events, either at multiple moments in time within a single stream, or even between events taken from different event streams. The term Complex Event Processing (CEP) has been coined to refer to computations of this nature. One of the goals of CEP is to create aggregated (i.e. “complex”) events using data fetched from one or more lower-level events [11]. This computation can be executed in cascade, with the output streams of one process becoming the input streams of the next, leading to events of increasingly higher levels of abstraction. Section 2 starts this paper by presenting a wide range of examples taken from domains as varied as bug detection in video games and network intrusion detection.

This potent concept has spawned an impressive amount of work in the past twenty years. As we will see in Section 3, there exist literally dozens of competing systems claiming the CEP label, ranging from academic proofs-of-concept to commercial cloud frameworks such as Apache Spark or Microsoft Azure. These systems are based on a commensurate number of research papers, technical reports and buzzword-laden whitepapers introducing a plethora of incompatible formalizations of the problem. This reveals that CEP has never been a single problem, but rather a family of related problems sharing a relatively blurry common ground.

Many of these systems and frameworks, however, have in common the fact that they deserve the epithet “complex”. They often rely on an intricate definition of seemingly simple concepts; some of them don’t even state them formally, making their available implementation a de facto specification. Many of their built-in query languages are quirky outgrowths of SQL, whose syntax seldom preserves backward-compatibility for operations identical to those found in relational databases. Almost none of them provides comprehensive means for extending their language syntax with user-defined constructs. Finally, some suffer from high setup costs, requiring hours if not days of arcane configuration editing and boilerplate code to run even the smallest example.

While it is obvious that convincing use cases motivate the existence of these systems, the current state of things leaves a potential user between two uncomfortable extremes: embrace a complex Event Processing system, with all its aforementioned shortcomings, or do without and fall back to low-level scripting languages, such as Perl or Python, to write menial trace-crunching tasks. What seems to be missing is a “Simple Event Processing” engine, in the same way that a spreadsheet application like Microsoft Excel is often a satisfactory middle ground between a pocket calculator and a full-blown accounting system. Such a system should provide higher abstraction than hand-written scripts, an easy-to-understand computational model, zero-configuration operation and reasonable performance for light- to medium-duty tasks.

This paper presents a detailed description of such a Simple Event Processing engine, called BeepBeep. In Section 4, we first describe the fundamental design principles behind the development of this system. Some of these principles purposefully distance themselves from trends followed by current CEP solutions, and are aimed at making the intended system both simpler and more versatile. Section 5 then formally describes BeepBeep’s computational model. This simple formalization completely describes the system’s semantics, making it possible for alternate implementations to be independently developed.

One of the key features of BeepBeep is its associated query language, called eSQL, which is described in Section 6. Substantial effort has been put into making eSQL simple and coherent; in accordance with BeepBeep’s design principles, it strives towards relational transparency, meaning that queries that perform computations similar to relational database operations are written in a syntax that is backwards-compatible with SQL. Section 7 then describes the various means of extending BeepBeep’s built-in functionalities. A user can easily develop new event processing units in a handful of lines of code, and most importantly, define arbitrary grammatical extensions to eSQL to use these custom elements inside queries. Extensions can be bundled in dynamically-loaded packages called palettes; we describe a few of the available palettes, allowing BeepBeep to manipulate network captures, tuples, plots, and temporal logic operators, among others.

Equipped with these constructs, Section 8 then proceeds to showcase BeepBeep’s functionalities. An experimental comparison of BeepBeep’s performance with respect to a selection of other CEP engines is detailed in Section 9. To the best of our knowledge, this is the first published account of such an empirical benchmark of CEP engines on the same input data. These experiments reveal that, on […]

2. Use Cases for Event Stream Processing

Complex Event Processing (CEP) can loosely be defined as the task of analyzing and aggregating data produced by event-driven information systems [11]. A key feature of CEP is the possibility to correlate events from multiple sources, occurring at multiple moments in time. Information extracted from these events can be processed, and lead to the creation of new, “complex” events made of that computed data. This stream of complex events can itself be used as the source of another process, and be aggregated and correlated with other events.

Event processing distinguishes between two modes of operation. In online (or “streaming”) mode, input events are consumed by the system as they are produced, and output events are progressively computed and made available. It is generally assumed that the output stream is monotonic: once an output event is produced, it cannot be “taken back” at a later time. In contrast, in offline (or “batch”) mode, the contents of the input streams are completely known in advance (for example, by being stored on disk or in a database). Whether a system operates online or offline sometimes matters: for example, offline computation may take advantage of the fact that events from the input streams may be indexed, rewound or fast-forwarded on demand. Recently, the hybrid concept of “micro-batching” has been introduced in systems like Apache Spark Streaming (cf. Section 3.1.7). It is a special case of batch processing with very small batch sizes.

Guarantees on the delivery of events in a CEP system can also vary. “At most once” delivery entails that every event may be sent to its intended recipient, but may also be lost. “At least once” delivery ensures reception of the event, but at the potential cost of duplication, which must then be handled by the receiver. In between is perfect event delivery, where reception of each event is guaranteed without duplication. These concepts generally matter only for distributed event processing systems, where communication links between nodes may involve loss and latency.

In the following, we proceed to describe a few scenarios where event streams are produced and processed.

2.1. Stock Ticker

A recurring scenario used in CEP to illustrate the performance of various tools is taken from the stock market [12]. One considers a stream of stock quotes, where each event contains attributes such as a stock symbol, the price of the stock at various moments (such as its minimum price and closing price), as well as a timestamp. A typical stream of events of this nature is shown in Figure 1. This figure shows that events are structured as tuples, with a fixed set of attributes, each of which takes a scalar value.
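
As a rough illustration of the tuple structure just described, the following Python sketch models a stock quote as a record with a fixed set of scalar attributes, and computes a running average of closing prices in the online style of Section 2 (one output per matching input, never retracted). It is not BeepBeep or eSQL code: the names `StockQuote`, `min_price`, `closing_price` and `running_average_close`, as well as the sample quotes, are assumptions made up for this example.

```python
from typing import Iterable, Iterator, NamedTuple


class StockQuote(NamedTuple):
    """One event: a fixed set of attributes, each holding a scalar value."""
    timestamp: int        # hypothetical time step of the quote
    symbol: str           # stock symbol
    min_price: float      # minimum price (hypothetical attribute name)
    closing_price: float  # closing price (hypothetical attribute name)


def running_average_close(quotes: Iterable[StockQuote],
                          symbol: str) -> Iterator[float]:
    """Yield the average closing price of `symbol` after each matching event.

    Events are consumed one at a time, and each output is emitted as soon
    as it can be computed; an emitted value is never revised afterwards.
    """
    total = 0.0
    count = 0
    for quote in quotes:
        if quote.symbol != symbol:
            continue  # events about other stocks produce no output
        total += quote.closing_price
        count += 1
        yield total / count


# Made-up sample stream, for illustration only.
SAMPLE_QUOTES = [
    StockQuote(1, "AAA", 9.0, 10.0),
    StockQuote(1, "BBB", 4.0, 5.0),
    StockQuote(2, "AAA", 11.0, 12.0),
    StockQuote(2, "BBB", 5.0, 7.0),
]

averages = list(running_average_close(SAMPLE_QUOTES, "AAA"))
```

Because the function is a generator, the same code applies whether the quotes come from a stored list (offline mode) or from a source that produces them progressively (online mode).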