ICT, STREP FERARI ICT-FP7-619491 Flexible Event pRocessing for big dAta aRchItectures Collaborative Project

D4.1

Requirements and state of the art overview of flexible event processing

01.02.2013 – 31.01.2014 (preparation period)

Contractual Date of Delivery: 31.01.2015
Actual Date of Delivery: 31.01.2015
Author(s): Fabiana Fournier and Inna Skarbovsky
Institution: IBM
Workpackage: Flexible Event Processing
Security: PU
Nature: R
Total number of pages: 48

Project funded by the European Community under the Information and Communication Technologies Programme Contract ICT-FP7-619491

Project coordinator name: Michael Mock
Project coordinator organisation name: Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)
Revision: 1

Schloss Birlinghoven, 53754 Sankt Augustin, Germany

URL: http://www.iais.fraunhofer.de

Abstract
The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project is to pave the way for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling business users to express complex analytics tasks through a high-level declarative language that supports distributed complex event processing as an integral part of the system architecture. Work package 4 “Flexible Event Processing” deals with all the developments around event processing technologies in order to achieve this goal.

In order to be flexible, event processing engines need to tackle the following two requirements in a satisfactory way:

• Easy adaptability to non-functional requirements, especially the way the tool copes with scalability issues in a distributed environment.
• Easy definition and maintenance of the event-driven logic.

The task of work package 4 is to provide a model and methodology to cope with these limitations. The proposed approach addresses both the functional and non-functional properties of event processing applications by supporting non-technical users with a declarative language expressed in tabular form. The resulting model can then be automatically translated into event-driven definitions and eventually into a running application in the proposed FERARI architecture.

D4.1 Requirements and state of the art overview on flexible event processing

Revision history

Project acronym: FERARI
ID: ICT-FP7-619491
Document identifier: D4.1 Requirements and state of the art overview of flexible event processing (01.02.2013 – 31.01.2014)
Leading Partner: IBM
Report version: 1
Report preparation date: 31.01.2014
Classification: PU
Nature: REPORT
Author(s) and contributors: Fabiana Fournier and Inna Skarbovsky
Status: Submitted

Copyright
This report is © FERARI Consortium 2014. Its duplication is restricted to personal use within the consortium and the European Commission. www.ferari-project.eu


Document History

Version  Date        Author                  Change Description
0.1      15/11/2014  Fabiana Fournier (IBM)  First draft
0.2      1/12/2014   Fabiana Fournier (IBM)  Second draft including sections 3 and 4
0.3      15/12/2014  Fabiana Fournier (IBM)  First complete version
0.4      15/12/2014  Fabiana Fournier (IBM)  Inclusion of abstract
0.5      15/12/2014  Fabiana Fournier (IBM)  Updates per internal review
1.0      30/12/2014  Fabiana Fournier (IBM)  Final fixes and cleanup


Table of Contents

1 Introduction
1.1 Purpose and scope of the document
1.2 Relationship with other documents
2 Complex event processing – The motivation
3 Complex event processing – The business case
4 State of the art in complex event processing tools
4.1 Commercial tools
4.1.1 InfoSphere Streams (IBM)
4.1.2 Informatica Platform for streaming analytics (Informatica)
4.1.3 Event Stream Processor (ESP) (SAP)
4.1.4 Apama (Software AG)
4.1.5 StreamBase (Tibco)
4.2 Open source engines
4.2.1 Esper (EsperTech Inc)
4.2.2 IBM Proactive Technology Online (PROTON)
4.2.3 Open source event processing running on distributed stream computing platforms
4.3 Research tools
4.4 Limitations of contemporary event processing tools
5 Complex event processing background
5.1 Event types
5.2 Event attributes
5.3 Context
5.4 Event Processing Network (EPN)
5.5 Event Processing Agent (EPA)
5.6 Pattern policies
5.7 Context initiator policies
5.8 PROTON definitions
6 Requirements for flexible event processing


6.1 Non-functional requirements of event processing applications
6.1.1 Scalability
6.1.2 Availability
6.1.3 Security
6.1.4 Performance objectives
6.1.5 Usability
6.2 Requirements for the mobile fraud use case
6.2.1 Description of the mobile fraud use case
6.2.2 Event types
6.2.3 Event processing agents
6.2.4 Mobile phone fraud use case functional requirements summary
6.3 Introduction to the event model
6.4 Summary of the requirements for flexible event processing in FERARI
7 Summary and future steps
8 References

List of Tables

Table 1: Initial EPN for the mobile phone fraud use case


List of Figures

Figure 1: Illustration of an event processing network
Figure 2: Event recognition process in an EPA
Figure 3: Mobile fraud use case initial EPN
Figure 4: Event recognition process for Filtering EPA
Figure 5: Context for Filter EPA
Figure 6: Event recognition process for FrequentLongCallsAtNight EPA
Figure 7: Context for FrequentLongCallsAtNight EPA
Figure 8: Event recognition process for FrequentLongCalls EPA
Figure 9: Context for FrequentLongCalls EPA
Figure 10: Event recognition process for FrequentEachLongCall EPA
Figure 11: Context for FrequentEachLongCall EPA
Figure 12: Event recognition process for ExpensiveCalls EPA
Figure 13: Context for ExpensiveCall EPA


Acronyms

ASF    Apache Software Foundation
BAM    Business Activity Monitoring
CEP    Complex Event Processing
DBMS   Database Management System
DEBS   Distributed Event-Based Systems
DSCP   Distributed Stream Computing Platform
DSMS   Data Stream Management System
EAI    Enterprise Application Integration
ECA    Event-Condition-Action
EPA    Event Processing Agent
EPL    Event Processing Language
EPN    Event Processing Network
EPTS   Event Processing Technical Society
ESP    Event Stream Processing
FERARI Flexible Event pRocessing for big dAta aRchItectures
JSON   JavaScript Object Notation
IP     Intellectual Property
SaaS   Software as a Service
SCADA  Supervisory Control And Data Acquisition
SIEM   Security Information and Event Management
TDM    The Decision Model
TEM    The Event Model
WP     Work Package


1 Introduction 1.1 Purpose and scope of the document The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project is to pave the way for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling business users to express complex analytics tasks through a high-level declarative language that supports distributed complex event processing as an integral part of the system architecture. Work package 4 (WP4) “Flexible Event Processing” deals with all the developments around event processing technologies in order to achieve this goal.

This report surveys the state of the art in event processing systems, including products and research assets, trends, and limitations of current offerings, and outlines how these limitations can be addressed within the scope of the project. The report also describes non-functional and functional requirements of event processing engines in relation to the mobile fraud use case of the project.

Note that we use the terms complex event processing and event processing, as well as tool, engine, and system, interchangeably throughout this report.

This report is structured as follows: Section 2 gives the background for the appearance of complex event-driven systems from the technical point of view, whilst Section 3 adds the business incentive. Section 4 surveys the main commercial, open source, and research event processing tools. Section 5 provides some necessary background on the semantics used in the FERARI project. Section 6 describes the requirements for flexible event processing, including details on the mobile fraud use case. We conclude the report with a summary and future steps in Section 7.

1.2 Relationship with other documents
FERARI stands for Flexible Event pRocessing for big dAta aRchItectures; there is therefore a tight connection between the event processing components and the rest of the components that form the FERARI architecture. Specifically, this deliverable is strongly related to D2.1 - Architecture definition in WP2. The requirements for the event processing engine are dictated by the use cases in the project; thus, this report is also strongly related to D1.1 - Application Scenario Description and Requirement Analysis in WP1.

2 Complex event processing – The motivation
In the past decade, there has been an increasing demand to process continuously flowing data from external sources, arriving at unpredictable rates, in order to obtain timely responses to complex queries. Traditional Database Management Systems (DBMSs) require data to be (persistently) stored and indexed before it can be processed, and they process data only when explicitly asked by the users, that is, asynchronously with respect to its arrival. These requirements led to the development of a number of systems specifically designed to process information as a flow according to a set of pre-deployed processing rules. Two models have emerged [6]: the data stream processing model [22] and the complex event processing model [23].

Data Stream Management Systems (DSMSs) differ from conventional Database Management Systems (DBMSs) in several ways: (a) as opposed to tables, streams are usually unbounded; (b) no assumption can be made on data arrival order; and (c) size and time constraints make it difficult to store and process data stream elements after their arrival, and therefore one-time processing is the typical mechanism used to deal with streams. Users of a DSMS install standing (or continuous) queries, i.e., queries that are deployed once and continue to produce results until removed. Standing queries can be executed periodically or continuously, as new stream items arrive. As opposed to DBMSs, users of DSMSs do not have to explicitly ask for updated information; rather, the system actively notifies them according to the installed queries. DSMSs focus on producing query results that are continuously updated in accordance with the constantly changing contents of their input data. Detection and notification of complex patterns of elements involving sequences and ordering relations are usually out of the scope of DSMSs. DSMSs mainly focus on flowing data and data transformation; only a few allow the easy capture of sequences of data involving complex ordering relationships, not to mention the possibility of performing filtering, correlation, and aggregation of data directly in-network, as streams flow from sources to sinks.
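The notion of a standing query can be sketched in a few lines of Python; the `StandingQuery` class and the moving-average aggregate below are our own illustrative choices, not the API of any particular DSMS:

```python
# Minimal sketch of a "standing query": installed once, it keeps producing
# updated results as new stream items arrive, without the user re-issuing it.

from collections import deque

class StandingQuery:
    """Continuously maintains an aggregate over a sliding window."""

    def __init__(self, window_size):
        # A bounded buffer over an unbounded stream; old items fall out.
        self.window = deque(maxlen=window_size)

    def on_item(self, value):
        """Called for every arriving stream item; returns the updated result."""
        self.window.append(value)
        return sum(self.window) / len(self.window)  # e.g. a moving average

# The consumer is notified on every arrival -- it never has to poll.
query = StandingQuery(window_size=3)
results = [query.on_item(v) for v in [10, 20, 30, 40]]
# results → [10.0, 15.0, 20.0, 30.0]
```

Note how the query object, not the user, drives result production: this is the push-based interaction that distinguishes DSMSs from request-response DBMSs.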

Complex Event Processing (CEP) systems adopt the opposite approach. They associate a precise semantics with the information items being processed: these are notifications of events that happened in the external world and were observed by sources, also called event producers. The CEP engine is responsible for filtering and combining such notifications to understand what is happening in terms of higher-level events (a.k.a. complex events, composite events, or situations) to be notified to sinks, called event consumers. CEP systems put the emphasis on what represents the main limitation of DSMSs, that is, the ability to detect complex patterns of incoming items involving sequencing and ordering relationships. An example of a situation is a Suspicious account, which is detected whenever there are at least three large cash deposits within 10 days for the same account. Event processing is in essence a paradigm of reactive computing: a system observes the world and reacts to events as they occur. It is an evolutionary step from the paradigm of responsive computing, in which a system responds only to explicit service requests. Event processing has evolved in recent years, departing from traditional computing architectures, which employ synchronous, request-response interactions between clients and servers, towards reactive applications, in which decisions are driven by events.
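The Suspicious account situation above can be sketched as a small pattern-detection loop. The event fields, the deposit threshold, and the function name below are illustrative assumptions, not definitions used elsewhere in this report:

```python
# Illustrative sketch of the "Suspicious account" situation: emit a complex
# event when at least three large deposits occur within 10 days for the
# same account.

from collections import defaultdict

LARGE_DEPOSIT = 10_000   # assumed threshold for a "large" cash deposit
WINDOW_DAYS = 10

def detect_suspicious(events):
    """events: iterable of (account, day, amount); yields (account, day) alerts."""
    history = defaultdict(list)  # per-account days of large deposits
    for account, day, amount in events:
        if amount < LARGE_DEPOSIT:
            continue                      # filtering: irrelevant raw event
        ts = history[account]
        ts.append(day)
        # keep only deposits inside the 10-day window
        history[account] = ts = [t for t in ts if day - t < WINDOW_DAYS]
        if len(ts) >= 3:                  # pattern matched: derive the situation
            yield (account, day)

stream = [("A", 1, 12_000), ("B", 2, 500),
          ("A", 4, 15_000), ("A", 9, 11_000), ("A", 30, 20_000)]
alerts = list(detect_suspicious(stream))  # → [("A", 9)]
```

Real CEP engines express such patterns declaratively and partition state by a context key (here, the account), rather than through hand-written loops.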


CEP [20] is a technique in which incoming data about what is happening (event data) is processed more or less as it arrives to generate higher-level, more useful, summary information (complex events). Event processing platforms have built-in capabilities for filtering incoming data, storing windows of event data, computing aggregates, and detecting patterns. In more formal terminology, CEP software is any computer program that can generate, read, discard, and perform calculations on events. A complex event is an abstraction of one or more raw or input events. Complex events may signify threats or opportunities that require a response from the business. One complex event may be the result of calculations performed on a few, or on millions, of events from one or more event sources. A situation may be triggered by the observation of a single raw event, but is more typically obtained by detecting a pattern over the flow of events. Many of these patterns are temporal in nature [11], but they can also be spatial, spatio-temporal, or modal [7]. Event processing deals with the following functions: get events from sources (event producers), route these events, filter them, normalize or otherwise transform them, aggregate them, detect patterns over multiple events, and transfer them as alerts to a human or as triggers to an autonomous adaptation system (event consumers). An application or a complete definition set made up of these functions is also known as an Event Processing Network (EPN).
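These functions can be illustrated as a toy event processing network in Python; the agent names, the event shape, and the two-event pattern threshold are our own illustrative assumptions:

```python
# Minimal sketch of an Event Processing Network: a chain of agents
# between producers and consumers.

def filter_agent(events, predicate):
    """Drop events that are irrelevant for downstream agents."""
    return (e for e in events if predicate(e))

def transform_agent(events, fn):
    """Normalize or otherwise transform each event."""
    return (fn(e) for e in events)

def pattern_agent(events, threshold):
    """Emit a derived 'complex' event once enough raw events are seen."""
    seen = []
    for e in events:
        seen.append(e)
        if len(seen) >= threshold:
            yield {"type": "ComplexEvent", "members": list(seen)}
            seen.clear()

# Producer -> filter -> transform -> pattern detection -> consumer
producer = [{"v": 1}, {"v": -2}, {"v": 3}, {"v": 4}]
pipeline = pattern_agent(
    transform_agent(
        filter_agent(producer, lambda e: e["v"] > 0),
        lambda e: e["v"] * 10),
    threshold=2)
complex_events = list(pipeline)  # the consumer
# complex_events → [{"type": "ComplexEvent", "members": [10, 30]}]
```

Each stage corresponds to one kind of agent in an EPN; in a real engine the stages would be distributed and connected by event channels rather than composed as Python generators.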

As aforementioned, the goal of a CEP engine is to notify its users immediately upon the detection of a pattern of interest. Data flows are seen as streams of events, some of which may be irrelevant for the user's purposes. Therefore, the main focus is on efficiently filtering out irrelevant data and processing the relevant data. Obviously, for such systems to be acceptable, they have to satisfy certain efficiency, fault tolerance, and accuracy constraints, such as low latency and robustness.

CEP platforms require a new type of architecture. Conventional architectures are not fast or efficient enough for some applications because they use a "save-and-process" paradigm, in which incoming data is stored in databases in memory or on disk, and then queries are applied. When fast responses are critical, or the volume of incoming information is very high, application architects instead use a "process-first" CEP paradigm, in which logic is applied continuously and immediately to the "data in motion" as it arrives. CEP is more efficient because it computes incrementally, in contrast to conventional architectures that reprocess large datasets, often repeating the same retrievals and calculations as each new query is submitted.
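The contrast between the two paradigms can be shown in a minimal sketch (the class and the values below are illustrative):

```python
# "Save-and-process" stores everything and recomputes on demand;
# the incremental "process-first" style updates a running aggregate in
# O(1) per event without storing the events at all.

class IncrementalAverage:
    """Maintains a mean over 'data in motion' without storing the events."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def on_event(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count  # result is always up to date

saved = []          # save-and-process: store all data, recompute per query
inc = IncrementalAverage()
for v in [4, 8, 6]:
    saved.append(v)
    latest = inc.on_event(v)

# Both styles agree on the answer; only the cost per query differs.
assert latest == sum(saved) / len(saved) == 6.0
```

For large windows or complex patterns, the incremental style is what makes continuous, low-latency answers feasible.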

CEP has already been successfully applied to several domains: sensor networks for environmental monitoring [13], payment analysis for fraud detection [39], financial applications for trend discovery [40], and RFID-based inventory management for anomaly detection [41]. According to Gartner [46], over a third of spending on event processing technologies comes from the financial services vertical. More generally, as observed in [23], the information system of every company could and should be organized around an event-based core that acts as a nervous system to guide and control the other subsystems.

CEP has already built up significant momentum, manifested in a steady research community and a variety of commercial as well as open source products [6]. Today, a large variety of commercial and open source event processing tools is available to architects and developers who are building event processing applications (see Section 4). These are sometimes called event processing platforms, streaming analytics platforms, complex-event processing systems, event stream processing (ESP) systems, or distributed stream computing platforms (DSCPs). For example, Forrester [19] defines a streaming analytics platform as: “Software that can filter, aggregate, enrich, and analyze a high throughput of data from multiple disparate live data sources and in any data format to identify simple and complex patterns to visualize business in real-time, detect urgent situations, and automate immediate actions”. In their definition, streaming analytics platforms include both development tools to create streaming applications and a run-time platform.

However, we distinguish between platforms that can detect complex patterns over events and platforms that can only perform filtering on events while offering the possibility to add the pattern logic. (Complex) event processing systems are general-purpose development and runtime tools that are used by developers to build custom event processing applications without having to re-implement the core algorithms for handling event streams, as they provide the necessary building blocks for event-driven applications. DSCPs, on the other hand, are general-purpose platforms without full native CEP analytic functions and associated accessories, but they are highly scalable and extensible and usually offer an open programming model, so developers can add the logic to address many kinds of stream processing applications, including some CEP solutions. Therefore, they are not considered “real” complex event processing platforms. As we will see in Section 4.2.3, today there are already some implementations that combine the pattern recognition capability of CEP systems with the scalability that DSCPs offer, resulting in a holistic architecture. The FERARI architecture is one example of this new approach.

3 Complex event processing – The business case
CEP usage is growing rapidly because CEP, in a technical sense, is the only way to get information from event streams in real time or near-real time [32]. The system has to process the event data more or less as it arrives so that the appropriate action can be taken quickly. Note that we use the term “real time” loosely, to include “near-real time” or “business real time”.

Event processing has a marked impact on the technical and business aspects of an enterprise [1]. From a technical perspective, CEP enables loose coupling among the components of an enterprise system or end-to-end process, which makes the system or process highly adaptable while enabling service reuse. For executives, CEP enables a performance-driven enterprise, allowing immediate, tactical, proactive decision-making driven by deep knowledge of the context of each decision, while comprehensive, near-term operational data can inform tactical and strategic decisions.

More specifically, Gartner ([2], [3], [4], and [5]) identifies three business impacts of CEP:


• Improves the quality of decision making by presenting information that would otherwise be overlooked.
• Enables faster response to threats and opportunities.
• Helps shield business people from data overload by eliminating irrelevant information and presenting only alerts and distilled versions of the most important information.

CEP also adds real-time intelligence to operational technology and business IT applications.

Moreover, the same Gartner reports ([2], [3], [4], and [5]) state that companies should use CEP to enhance their situation awareness and to build "sense-and-respond" behavior into their systems. Situation awareness means “understanding what is going on, so that you can decide what to do”. According to these reports, CEP should be used in operational activities that run continuously and need ongoing monitoring. This can apply to fraud detection, real-time precision marketing (cross-sell and upsell), factory floor systems, website monitoring, customer contact center management, trading systems for capital markets, transportation operation management (for airlines, trains, shipping, and trucking), and other applications. In a utility context, CEP can be used to process a combination of supervisory control and data acquisition (SCADA) events and "last gasp" notifications from smart meters to determine the location and severity of a network fault, and then to trigger appropriate remedial actions.

In Gartner’s Hype Cycle reports from 2014 ([2], [3], [4], and [5]), CEP remains positioned as transformational, meaning that it enables new ways of doing business across industries that will result in major shifts in industry dynamics, because “it is the only way to get information from event streams in real time”. According to these Gartner reports, “CEP will inevitably be adopted in multiple places within virtually every company. However, companies were initially slow to adopt CEP because it is so different from conventional architecture, and many developers are still unfamiliar with it. CEP has moved slightly further past the Peak of Inflated Expectations, but it may take up to 10 more years for it to reach its potential on the Plateau of Productivity”. According to a recent market guide for event stream processing [46], “it may take up to 10 years for CEP to reach widespread usage in mainstream companies. However, it is taken for granted in financial services today. We estimate that it is also in use in more than 100 smart grid projects and a total of several thousand production deployments worldwide in a range of industries”. In fact, Forrester [19] estimates a 66% increase in firms’ use of streaming analytics over the past two years.

As these reports also note, CEP has the potential to influence all sectors: “CEP has already transformed financial markets… and it is also essential to earthquake detection, radiation hazard screening, smart electrical grids and real-time location-based marketing”. Furthermore, “CEP is also essential to future Internet of Things applications where streams of sensor data must be processed in real time”. According to Gartner, CEP should be used in operational activities that run continuously and need ongoing monitoring. This can apply to fraud detection, real-time precision marketing (cross-sell and upsell), factory floor systems, website monitoring, customer contact center management, trading systems for capital markets, transportation operation management (for airlines, trains, shipping, and trucking), and other applications. Note that our two business cases fall squarely within the list Gartner mentions as natural and essential candidates for CEP.

Forrester [19] states that “Streaming analytics platforms can help firms detect insights in high velocity streams of data and act on them in real time”. Moreover, “Business won’t wait. That is truer today than ever before because of the white-water flow of data from innumerable real-time data sources. Market data, clickstream, mobile devices, sensors, and even good old-fashioned transactions may contain valuable, but perishable insights. Perishable because the insights are only valuable if you can detect and act on them right now. That’s where streaming analytics platforms can help”.

In summary, from the business point of view, we can conclude, as stated in [32], that “Companies that understand CEP have more and better real-time intelligence than those that don’t understand it. The use of CEP will expand further as the pace of business accelerates, more data becomes available in real time, and business people demand better situation awareness”. The analysts’ reports presented here only emphasize this statement. Accordingly, the CEP market is forecast to grow rapidly; in fact, it is forecast to reach US$4.7bn by 2019.

4 State of the art in complex event processing tools
Today there exists a wide variety of commercial, open source, and research event processing tools. According to Gartner ([2], [3], [4], and [5]), companies should acquire CEP functionality by using an off-the-shelf application or SaaS (Software as a Service) offering that has embedded CEP under the covers, if a product that addresses their particular business requirements is available. Companies should consider building their own event-driven applications in one of the following three cases:

• When an appropriate off-the-shelf application or SaaS offering is not available, companies should consider building their own CEP-enabled application on an operational intelligence platform that has embedded CEP capabilities.
• For demanding, high-throughput, low-latency applications — or where the event processing logic is primary to the business problem — companies should build their own CEP-enabled applications on commercial or open source CEP platforms (see examples of vendors below) or DSCPs.
• In rare cases, when none of the other tactics are practical, developers should write custom CEP logic into their applications using a standard programming language, without the use of a commercial or open source CEP or DSCP product.

1 http://www.companiesandmarkets.com/Market/Information-Technology/Market-Research/Complex-Event-Processing-CEP-Market-by-Algorithmic-Trading-Global-forecast-to-2019/RPT127618


Two forms of stream processing software have emerged in the past 15 years ([2], [3], [4], [5], and [46]). The first were CEP platforms, which have built-in analytic functions such as filtering, storing windows of event data, computing aggregates, and detecting patterns. Modern commercial CEP platform products include adapters to integrate with event sources, development and testing tools, dashboard and alerting tools, and administration tools. More recently the second form — distributed stream computing platforms (DSCPs) such as Amazon Web Services Kinesis and open source offerings including Apache Samza, Spark, and Storm — was developed. As previously mentioned, DSCPs are general-purpose platforms without full native CEP analytic functions and associated accessories, but they are highly scalable and extensible, so developers can add the logic to address many kinds of stream processing applications, including some CEP solutions. In particular, the Apache open source projects (Storm, Spark, and recently Samza) have gained a fair amount of attention and interest ([46], [21]), and these may well mature into commercial offerings in the future and/or get embedded in existing commercial product sets.

Gartner is now tracking 20 vendors that offer pure-play CEP platforms and six that offer DSCPs [46] (note that this is not an exhaustive list).

CEP platforms or tools:

• Codehaus/EsperTech's Esper, NEsper
• Feedzai Pulse
• IBM InfoSphere Streams
• IBM Operational Decision Manager
• Informatica RulePoint
• Fujitsu Interstage Big Data Complex Event Processing
• Hitachi uCosminexus Stream Data Platform
• LG CNS' EventPro
• Microsoft StreamInsight
• OneMarketData OneTick CEP
• Oracle Event Processing
• Red Hat Drools Fusion/JBoss Enterprise BRMS
• SAP Event Stream Processor
• SAS DataFlux
• SQLstream s-Server
• Software AG Apama Event Processing Platform
• Tibco BusinessEvents
• Tibco StreamBase
• Vitria Operational Intelligence Analytic Server

2 http://aws.amazon.com/kinesis/
3 http://samza.incubator.apache.org/
4 http://spark.apache.org/streaming/
5 https://storm.apache.org/


• WSO2 CEP Server

DSCPs:

• Google Cloud Dataflow
• Apache S4 (open source software, originated at Yahoo)
• Apache Samza (open source software, originated at LinkedIn)
• Apache Storm (originated at Twitter)
• DataTorrent RTS

This document focuses on event processing platforms; we will address DSCPs only in the context of open source tooling offerings that already combine a DSCP with event processing capabilities, as these are relevant to FERARI. DSCPs are discussed extensively in D2.1 and are out of the scope of this document.

In the following sections we will address the most popular tools in three categories: commercial, open source, and research.

4.1 Commercial tools
Most CEP tools are obtained as part of a larger product. Companies acquire a packaged application or subscribe to a SaaS service that has embedded CEP under the covers. The company is buying a solution that happens to require event processing, and it may not realize that CEP is being used. For example, supply chain visibility products; security information and event management (SIEM) products; some kinds of fraud detection systems; governance, risk, and compliance products; system and network monitoring systems; business activity monitoring (BAM) tools; and many other categories of software implement some greater or lesser amount of CEP logic. In a few cases, the developers of these products or SaaS offerings have leveraged the general-purpose event processing platforms listed above to reduce the amount of code they have to write. But in most cases, the developers implement a specialized subset of event processing algorithms in new code to suit their application purposes.

Forrester’s evaluation of general-purpose big data streaming analytics platforms from Q3 2014 reveals five leading vendors in the event processing niche [19]: IBM, Informatica, SAP, Software AG, and Tibco Software. To assess the state of the big data streaming analytics market and see how the vendors and their platforms stack up against each other, Forrester evaluated the strengths and weaknesses of the top commercial big data streaming analytics platform vendors against 50 criteria, grouped into three high-level buckets: current offering, strategy, and market presence. The leaders have high scores in all the key evaluation areas: architecture, development tools, and stream processing. Next we briefly describe the five leading products.


4.1.1 InfoSphere Streams (IBM) [18] [19]
InfoSphere Streams is a dedicated stream processing system in which the processing of events is distributed among a dedicated cluster of machines. Depending on the hardware infrastructure and use case, millions of events can be processed per second. IBM’s InfoSphere Streams supports high-volume, structured and unstructured streaming data sources such as images, audio, voice, VoIP, video, TV, financial news, radio, police scanners, web traffic, email, chat, GPS data, financial transaction data, satellite data, sensors, and badge swipes. InfoSphere Streams emerged from IBM Research in 2009 and continues to benefit from IBM’s significant investments in research. Its customers include organizations in healthcare, financial services, telecommunications, government, energy and utilities, manufacturing, and transportation. Note that IBM’s InfoSphere Streams can be classified as either a DSCP or a CEP platform, depending on the context and the author’s point of view [20].

4.1.2 Informatica Platform for streaming analytics (Informatica)7,8

At the core of Informatica’s event detection and response products is RulePoint, a Java-based software product that acts as an enterprise event service, detecting complex business events as they occur and automatically initiating responses as required. RulePoint detects complex events across disparate information sources, including sensors, enterprise application integration (EAI) systems, enterprise applications, databases, text documents, and more. In 2011, RulePoint was refactored to include streaming capabilities, which enable developers to author streaming applications using both business rules and streaming operator constructs built into the platform. Example applications include a geospatial tracking solution that monitors high-risk vessels before they enter ports or as they pass through shipping areas predetermined to be high-risk locations, and a phishing attack management solution for banks, credit unions, online brokerages, and e-commerce companies.

4.1.3 Event Stream Processor (ESP) (SAP)9 [18][19]

SAP’s ESP (formerly Sybase Aleri) [18] is a complex event processing system designed for analyzing large amounts of varied data in real time. It offers the ability to filter, combine, and normalize incoming data and can be used to detect important patterns, changed conditions, security problems, and much more. It can be used to raise alerts when events occur or to react to events. The product provides a wide range of integrated tools to improve productivity. With Studio 3, developers can create and manage their applications and the event processing flow. A wide range of built-in adapters provides interfaces to JDBC, ODBC, JMS, etc. Data models can be defined with the XML-based AleriML language, while the SPLASH scripting language helps to develop applications too complex to express in standard relational programming languages. SAP’s ESP has a broad base of customers in financial services, telecommunications, manufacturing, energy, retail, transportation and logistics, and the public sector.

6 http://www-03.ibm.com/software/products/en/infosphere-streams
7 http://www.informatica.com/us/products/complex-event-processing/#fbid=ghA_Zem5ovE
8 http://www.complexevents.com/wp-content/uploads/2010/10/7107_EventDetectionAndResponse_web.pdf
9 http://www.sybase.com/products/financialservicessolutions/complex-event-processing


4.1.4 Apama10 (Software AG) [18][19]

The Apama Event Processing Platform is a complete CEP-based tool acquired from Progress Software in 2013. The CEP engine can handle inbound events with sub-second latency, find defined patterns, and alert or respond with actions. With Apama Event Modeler, developers can create applications via a graphical user interface and present them with Apama Research Studio. Apama Dashboard Studio provides a set of tools to develop visually rich user interfaces. Via Apama dashboards, users can start/stop, parameterize, and monitor event operations from both client and browser desktops. The Apama package includes many major adapters to handle communication with other components and applications. Apama has a long and strong history as a complex event processing platform for algorithmic trading applications and market monitoring, dating back to its origins in 2001. It is also used by telecommunication firms and credit card companies to provide real-time, location-based, and customer-preference-based offers to consumers. Other industries include retail banking, telecommunications, retail, gaming, logistics and supply chain, government, energy and utilities, and manufacturing.

4.1.5 StreamBase (Tibco)11 [18][19]

Tibco Software has been a force in the high-frequency trading market for more than fifteen years, and its acquisition of StreamBase in 2013 has given it the tools it needs to meet the needs of the wider streaming analytics market. StreamBase is a high-performance event stream processing platform that provides an efficient way to build powerful applications for almost any usage area. It supports fast development via a graphical event-flow language, and supports StreamSQL to provide ease of use, flexibility, and extensibility for developers. This widely used software provides solutions for telecommunications, capital markets, intelligence and military, e-commerce, and multiplayer online gaming. In telecommunications, it provides services such as network monitoring and protection, bandwidth and quality-of-service monitoring, fraud detection, location-based services, and more.

4.2 Open source engines

Open source is also an option when selecting a CEP engine, with developers acquiring a basic open-source event stream processing engine and then using common, general-purpose programming tools to build the rest of the application.

In this section we briefly present two open source engines: Esper, today the most popular open source engine (as stated by Gartner, “open source CEP products, particularly Esper, have been embedded in several thousand applications and commercial software products” [46]), and PROTON from partner IBM, which is the complex event processing engine in the FERARI project. Other common open source

10 http://www.softwareag.com/corporate/products/apama_webmethods/analytics/products/default.asp
11 http://www.tibco.com/products/event-processing/complex-event-processing/streambase-complex-event-processing

engines include Triceps12 and the WSO2 Complex Event Processing Server13 (which uses the Siddhi14 engine that started as a research project initiated at the University of Moratuwa, Sri Lanka).

4.2.1 Esper (EsperTech Inc)15

The Esper system [8][18], which relies on a SQL-based language and Java, has already been the target of previous benchmark studies [9]. Esper is integrated into the Java and .NET languages (as NEsper) and can be used in CEP applications as a library. For ease of understanding, one could conceptualize the Esper engine as a database turned upside-down. Traditional database systems work by storing incoming data on disk according to a predefined relational schema; they hold an exact history of previous insertions, and updates are usually rare events. User queries are not known beforehand, and there are no strict constraints on their latency. The Esper engine, on the other hand, lets users define from the very start the queries they are interested in, which act as filters for the streams of incoming data. Events satisfying the filtering criteria are detected in “real time” and may be pushed further down the chain of filters for additional processing or published to their respective listeners/subscribers [10].

Esper provides a rich set of constructs by which events and event patterns can be expressed. One way to achieve event representation and handling is through expression-based pattern matching. Patterns incorporate several operators, some of which may be time-based, and are applied to sequences of events. A new event matches the pattern expression whenever it satisfies its filtering criteria. Another method to process events is through event processing language (EPL) queries, whose syntax resembles that of the well-known SQL. The most common SQL constructs may also be used in EPL statements. However, the defined queries are not applied to tables but to views, which can be understood as basic structures for holding events according to certain user demands, e.g. the need for grouping based on certain keys or for applying queries to events up to a certain point in the past [10].
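The “database turned upside-down” idea can be illustrated with a minimal Python sketch. This is not the Esper API — the class and method names here are hypothetical — it only shows the inversion: queries are registered first, and the data then flows past them.

```python
# Illustrative sketch (hypothetical, NOT the Esper API): queries are
# registered up front and act as standing filters over incoming events.
class TinyEngine:
    def __init__(self):
        self.statements = []            # (predicate, listener) pairs

    def register(self, predicate, listener):
        self.statements.append((predicate, listener))

    def send(self, event):
        # Unlike a database, the data flows past the stored queries.
        for predicate, listener in self.statements:
            if predicate(event):
                listener(event)

engine = TinyEngine()
matches = []
engine.register(lambda e: e["type"] == "Deposit" and e["amount"] > 10_000,
                matches.append)
engine.send({"type": "Deposit", "amount": 500})       # filtered out
engine.send({"type": "Deposit", "amount": 50_000})    # pushed to the listener
```

Only the second event reaches the listener, since only it satisfies the standing filter.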

4.2.2 IBM Proactive Technology Online (PROTON)

In the FERARI project the complex event processing component is built on and extends the IBM Proactive Technology Online (PROTON) research asset. This asset became open source16 in the scope of the FI-WARE FI-PPP project17 (PROTON being the CEP Generic Enabler in the FI-WARE platform18). Technical documentation regarding PROTON can be found in [24], [25], and [26].

PROTON comprises an authoring tool, a run-time engine, and producer and consumer adapters. Specifically, it includes an integrated run-time platform to develop, deploy, and maintain event-driven

12 http://triceps.sourceforge.net/ 13 http://wso2.com/products/complex-event-processor 14 http://siddhi-cep.blogspot.co.il/ 15 http://www.espertech.com/ 16 Link to the open source: https://github.com/ishkin/Proton 17 http://www.fi-ware.org/ 18 https://forge.fi-ware.org/plugins/mediawiki/wiki/fiware/index.php/FI-WARE_Architecture

applications using a single programming model. The specific architecture of PROTON and its implementation in the scope of the FERARI project are described in D2.1 – Architecture definition.

4.2.3 Open source event processing running on distributed stream computing platforms

As previously mentioned, many vendor products that claim streaming analytics functionality are actually frameworks to ingest and route data; they lack streaming operators, which developers must code themselves. We therefore survey three recent attempts at integrating open source event processing tools with open source DSCP platforms.

4.2.3.1 Streaming-cep-engine

Streaming-cep-engine19 is a complex event processing platform built on Spark Streaming. It combines the power of Spark Streaming as a continuous computing framework with the Siddhi CEP engine for complex event processing (Siddhi is the core engine of the WSO2 open source tool). It was first introduced at Spark Summit 201420.

4.2.3.2 Esper on top of Storm

The storm-esper21 library provides a bolt that allows running Esper queries on Storm data streams (for Storm building blocks refer to D2.1). Storm’s tuples are quite similar to Esper’s map event types: the tuple field names map naturally to map keys and the field values to the values for those keys. The tuple fields are not typed when they are defined and are considered by Esper to be of type Object. In addition, the fact that tuples have to be defined before a topology is running makes it relatively easy to define the map event type in the setup phase.

The Esper bolt itself is generic. It receives Esper statements and the names of the output fields that will be generated by those statements.

The bolt code consists of three pieces. The setup part constructs a map event type for each input stream and registers it with Esper. The second part is the transfer of data from Storm to Esper. The execute(Tuple tuple) method is called by Storm whenever a tuple from any of the connected streams is sent to the bolt. The Esper bolt code first has to find the event type name corresponding to the tuple. Then it iterates over the fields in the tuple and puts the values into a map, using the field names as the keys. Finally, it passes that map to Esper. At this point, Esper routes the map (the event) through the statements, which in turn might produce new data that needs to be handed back to Storm. For this purpose, the bolt registers itself as a listener for data emitted from any of the statements configured during setup. Esper then calls back the bolt’s update method if one of the statements generated data. The update method basically performs the reverse of the execute method and converts the event data back into a tuple.
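The tuple-to-map conversion described above can be sketched in a few lines of Python. The helper names and field list here are hypothetical illustrations, not the storm-esper API: declared field names become map keys on the way in (the execute path), and the update path reverses the mapping.

```python
# Sketch of the field-name mapping described above (hypothetical names,
# not the storm-esper API).
FIELDS = ["account", "amount"]          # declared before the topology runs

def tuple_to_map_event(values):
    # execute() direction: field names become map keys, tuple values become map values
    return dict(zip(FIELDS, values))

def map_event_to_tuple(event):
    # update() direction: the reverse operation, emitting values in field order
    return [event[name] for name in FIELDS]

event = tuple_to_map_event(["acc-42", 900])
```

Round-tripping an event through both helpers recovers the original tuple values in declaration order.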

19 http://stratio.github.io/streaming-cep-engine/
20 http://spark-summit.org/2014/talk/stratio-streaming-a-new-approach-to-spark-streaming
21 https://github.com/tomdz/storm-esper


4.2.3.3 PROTON on top of Storm

Storm is an incubation project at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. It is utilized by well-known companies with significant volumes of streaming data, such as The Weather Channel, Spotify, Twitter, and Rocket Fuel (refer to D2.1).

In the scope of the FERARI project, PROTON has been implemented on top of Storm, thus making it a distributed and scalable CEP engine. For details refer to D2.1 – Architecture definition.

4.3 Research tools

There are also many research tools developed in the last decade. They include:

• Amit (IBM Haifa Research Lab) [12]
• Aurora (Brandeis University, Brown University and MIT)22
• Borealis (Brandeis University, Brown University and MIT)23
• Cayuga (Cornell University)24
• ETALIS (Forschungszentrum Informatik Karlsruhe and Stony Brook University) [47]
• NiagaraST (Portland State University)25
• STREAM (Stanford University)26
• Telegraph (UC Berkeley)27
• epZilla (University of Moratuwa)28

We briefly describe Amit and ETALIS, as these are the only two that appear in the latest CEP Tooling Market Survey (2014) [21].

4.3.1.1 Amit [12]

IBM Research in Haifa developed a fully functional event processing research asset [12], capable of processing raw event streams from different sources, identifying specific patterns of interest, and forwarding derived events to subscribers. Amit is no longer supported and has been replaced by the open source PROTON (see 4.2.2).

4.3.1.2 Etalis [47]

The ETALIS system provides an expressive logic-based language for specifying and combining complex events. Both a syntax and a formal declarative semantics are provided for this language. The language enables efficient run-time event recognition and supports deductive reasoning. Its execution model is based on compilation into Prolog.

22 http://cs.brown.edu/research/aurora/
23 http://cs.brown.edu/research/borealis/public/
24 http://www.cs.cornell.edu/bigreddata/cayuga/
25 http://datalab.cs.pdx.edu/niagaraST/
26 http://infolab.stanford.edu/stream/
27 http://telegraph.cs.berkeley.edu/
28 http://www.epzilla.org/


4.4 Limitations of contemporary event processing tools

As presented above, there is a large variety of research prototypes as well as commercial products and platforms. Still, despite the positive outlook and the maturity of the tools, CEP tools are not widely used. In fact, most applications that implement CEP logic don’t use dedicated event processing tools [1]. Some user companies have written custom applications with CEP logic rather than leveraging an off-the-shelf event processing platform. This was especially common in the 1990s and early 2000s, before the products were widely available, and some developers still choose to write their own CEP logic for performance or cost reasons. For example, large banks and related financial services companies have built front-office systems for capital markets trading with their own embedded CEP logic [32]. Gartner analyst Roy Schulte estimated in July 2012 that around 95% of event processing applications are built using ad-hoc programming and do not use existing frameworks [33]. Two of the main reasons [2][31] are the difficulty of thinking in terms of event-driven architectures, which are asynchronous in nature, and the relative complexity of existing tools, which makes them impractical and inaccessible for business users. In practice, the design of event-driven applications is either done with current dedicated event processing tools by skilled IT developers who are familiar with the event processing engine and the particular ways to bypass the engine’s limitations, or in hand-coded fashion. As pointed out by Forrester [19]: “The streaming application programming model is unfamiliar to most application developers. It’s a different paradigm from normal programming where code execution controls data. In streaming applications, the incoming data controls the code”.

In addition, current tools also often lack the ability to process large volumes of distributed (complex) events which become increasingly important in modern automated business decision processes.

In other words, current event processing tools are not flexible enough: they require IT expert skills, do not easily scale, and cannot always run in distributed environments, limiting their usability and widespread adoption in the Big Data era.

Before discussing the requirements for flexible event processing systems in detail, we next describe some basic terms necessary for gaining a common understanding.

5 Complex event processing background

Since no widely accepted standard exists for the concepts of event processing, several synonyms appear in the literature, and several attempts have been made in recent years towards homogeneity.

The Event Processing Technical Society (EPTS) is an inclusive group of organizations and individuals aiming to increase awareness of event processing, foster topics for future standardization, and establish event processing as a separate academic discipline. The goal of the EPTS is the development of a shared understanding of event processing terminology. The society believes that by communicating the shared understanding developed within the group, it would become a catalyst for the emergence of effective interoperation standards, foster academic research, and support the creation of training curricula. In turn, it

would lead to the establishment of event processing as a discipline in its own right. The EPTS members hope that by combining academic research, vendor experience, and customer data they will be able to develop a unified glossary, language, and architecture that would homogenize event processing. The society started as an informal group in 2005/2006 and was formally launched as a consortium in June 2008. Membership of the consortium is based on a formal agreement defining intellectual property (IP) ownership terms and rules of engagement. The society is governed by a Steering Committee consisting of founding members of the organization, representatives of major vendors, and scientists. It is a partner of the major scientific event processing conference, Distributed Event Based Systems (DEBS), and of the major scientific rules conference, the International Web Rule Symposium (RuleML), and it launched two Dagstuhl seminars on event processing (May 2007 and 2010). It has also published an event processing glossary [17]. However, the EPTS is largely inactive nowadays.

Other recent efforts, such as the Real-time Business Insight Event Processing in Practice and the Event Processing Online Magazine, have also stopped their activities.

As a result, each complex event processing engine uses its own terminology and semantics. We follow the semantics presented in Etzion and Niblett’s book [7], as applied in PROTON.

In what follows we briefly present, for the sake of clarity, the main concepts and building blocks of our terminology. For further details refer to [7].

5.1 Event types

Generally speaking, an event is an occurrence within a particular system or domain; it is something that has happened, or is contemplated as having happened, in that domain ([7][23]). The word “event” is also used to mean a programming entity that represents such an occurrence in a computing system. In the latter sense, an event is an object of an event type: events are actual instances of event types and have specific values. For example, the event “today at 10 PM a customer named John Doe made a new deposit to his bank account” is an instance of the Transaction event type. An event type specifies the information that is contained in its event instances by defining a set of attributes. The event attributes are grouped into the header or metadata (e.g., the occurrence time of the event instance) and the body or payload (specific information about the event, e.g., the customer name).

We relate to the following event types:

A raw event is an event that is introduced into an event processing system by an event producer (an entity at the edge of an event processing system that introduces events to the system). An example of a raw event is a Cash deposit into a bank account.

A derived event is an event that is generated as a result of event processing that takes place inside the event processing system. An example is that a Large cash deposit has been made into a bank account.


A situation is a derived event that is emitted outside the event processing system and consumed by at least one consumer (an entity at the edge of an event processing system that receives events from the system). An example is a Suspicious bank account.

5.2 Event attributes

Every event instance has a set of built-in attributes (metadata) and a set of payload attributes. PROTON employs the following attributes in the event type's metadata:

• Name – the name of the event type.

• OccurrenceTime – a timestamp attribute, which we expect the event source to fill in as the occurrence time of the event. If left empty, it defaults to the DetectionTime attribute value.

• DetectionTime – a timestamp attribute that records the time the CEP engine detected the event. The time is measured in milliseconds, specifying the difference between the machine time at the moment of event detection and midnight, January 1, 1970 UTC.

• EventId – a unique string identifier of the event, which can be set by the event source to identify the event instance.

• EventSource – holds the source of the event (usually the name of the event producer).

The built-in attributes above can be used in expressions in the same manner as user-defined attributes. User-defined attributes can be added to the event type by specifying their names and object types. If an attribute is an array, its dimension should be specified.
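The metadata attributes can be sketched as a plain Python constructor. The attribute names come from the list above; the function itself and its payload handling are illustrative, not the PROTON API. Note how an empty OccurrenceTime defaults to DetectionTime, measured in milliseconds since January 1, 1970 UTC.

```python
import time

# Illustrative constructor for a PROTON-style event instance (hypothetical
# helper, not PROTON code; only the attribute names come from the text).
def make_event(name, payload, event_source, event_id, occurrence_time=None):
    detection_time = int(time.time() * 1000)   # millis since 1970-01-01 UTC
    return {
        "Name": name,
        # if the source left OccurrenceTime empty, it equals DetectionTime
        "OccurrenceTime": occurrence_time if occurrence_time is not None
                          else detection_time,
        "DetectionTime": detection_time,
        "EventId": event_id,
        "EventSource": event_source,
        **payload,                              # user-defined (payload) attributes
    }

event = make_event("Transaction", {"customer": "John Doe"}, "branch-1", "tx-001")
```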

5.3 Context

Context is a named specification of conditions that groups event instances so they can be processed in a related way. While several context dimensions exist, in this report we employ the two most commonly used ones (in the future we might enlarge the set of context types, depending on the scenario requirements): temporal and segmentation-oriented. A temporal context consists of one or more time intervals, possibly overlapping. Each time interval corresponds to a context partition, which contains the events that occur during that interval. A segmentation-oriented context groups event instances into context partitions based on the value of an attribute or collection of attributes in the instances themselves. As a simple example, consider a single stream of input events in which each event contains a customer identifier attribute. The value of this attribute can be used to group events so that there is a separate context partition for each customer. Each context partition contains only events related to that customer, so the behaviour of each customer can be tracked independently of the others. A composite context is a context composed of two or more contexts, known as its members. The set of context partitions for the composite context is the Cartesian product of the partition sets of the member contexts.
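The segmentation and composite contexts can be sketched in a few lines of Python (the names and data here are illustrative, not PROTON semantics): segmentation groups events by an attribute value, and a composite context takes the Cartesian product of its members' partition sets.

```python
from itertools import product

# Sketch: a segmentation context partitions events by a key attribute;
# a composite context crosses the partition sets of its members.
def segment(events, key):
    partitions = {}
    for e in events:
        partitions.setdefault(e[key], []).append(e)
    return partitions

events = [{"customer": "c1", "amount": 10},
          {"customer": "c2", "amount": 20},
          {"customer": "c1", "amount": 30}]
by_customer = segment(events, "customer")      # one partition per customer

time_windows = ["w1", "w2"]                    # partitions of a temporal context
composite = list(product(by_customer, time_windows))
# 2 customers x 2 windows = 4 composite context partitions
```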


5.4 Event Processing Network (EPN)

An Event Processing Network (EPN) is a conceptual model describing the event processing flow execution. An EPN comprises a collection of event processing agents (EPAs), event producers, events, and event consumers (Figure 1). The network describes the flow of events originating at event producers and flowing through various event processing agents to eventually reach event consumers. For example, in Figure 1, events from Producer 1 are processed by Agent 1. Events derived by Agent 1 are of interest to Consumer 1 but are also processed by Agent 3 together with events derived by Agent 2. Note that the intermediary processing between producers and consumers in every installation is made up of several functions, and often the same function is applied to different events for different purposes at different stages of the processing.

Figure 1: Illustration of an event processing network

The application definitions, i.e. the EPN, are written by the application developer at build-time. In PROTON, the definitions output, in JSON (JavaScript Object Notation) format, is provided as configuration to the CEP run-time engine.

5.5 Event Processing Agent (EPA)

An Event Processing Agent (EPA) is a component that, given a set of input/incoming events within a context, applies some logic to generate a set of output/derived events. An EPA can apply different event patterns to detect specific relations among the input events.

An EPA performs three logical steps, together known as the pattern matching process or event recognition (see Figure 2). Note that each of the three steps is optional, but at least one must be performed inside an EPA.

• The filtering step, in which relevant events are selected from the input events for processing, according to the filter conditions. The output of this step is the set of participant events.

• The matching step, which takes all events that passed the filtering and looks for matches between them, using an event processing pattern or some other kind of matching criterion. The output of this step is the matching set.

• The derivation step, which takes the output of the matching step and uses it to derive the output events by applying derivation formulae.
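The three steps above can be sketched as a plain Python pipeline. This is an illustrative toy, not PROTON code; the example wires in a count-style matching criterion and a derivation formula as lambdas.

```python
# The three EPA steps as plain functions: filtering -> matching -> derivation.
def run_epa(incoming, filter_cond, match, derive):
    participants = [e for e in incoming if filter_cond(e)]   # filtering step
    matching_set = match(participants)                       # matching step
    return [derive(matching_set)] if matching_set else []    # derivation step

# Example: derive an alert when at least 3 deposits above 1000 arrive.
events = [{"amount": 500}, {"amount": 2_000}, {"amount": 3_000}, {"amount": 4_000}]
alerts = run_epa(
    events,
    filter_cond=lambda e: e["amount"] > 1_000,
    match=lambda p: p if len(p) >= 3 else [],
    derive=lambda m: {"type": "Alert", "total": sum(e["amount"] for e in m)},
)
```

With these inputs, three events survive the filter, the count criterion is met, and one derived alert is produced.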


Figure 2: Event recognition process in an EPA

An event pattern is a template specifying one or more combinations of events. Given any collection of events, if it is possible to find one or more subsets of those events that match a particular pattern, such a subset is said to satisfy the pattern. Some common examples of patterns are:

• Filter means that each event is evaluated against an expression; the event is filtered in only if it meets the expression conditions, and is otherwise filtered out.

• Sequence means that at least one instance of each participating event type must arrive, in a specified order, for the pattern to be matched.

• Count means that the number of instances in the participant event set satisfies the pattern’s number assertion.

• All means that at least one instance of each participating event type must arrive for the pattern to be matched; the arrival order in this case is immaterial.

• Trend means that events need to satisfy a specific change (increase or decrease) over time of some observed value; this refers to the value of a specific attribute or attributes.

• Absence means that a specified event (or events) must not occur within a predefined time window. The matching set in this case is empty.

• Sum means that the value of a specific attribute, summed up over all participant events, satisfies the sum threshold assertion.
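Two of the patterns listed above can be sketched as simple Python checks over a participant event set. These are illustrative toys, not PROTON operator implementations: the sequence check tests order-sensitive arrival, and the trend check tests a strictly increasing observed value.

```python
# Illustrative pattern checks (not PROTON operators).
def sequence_matched(events, type_order):
    # at least one instance of every type, arriving in the given order;
    # the shared iterator makes the check order-sensitive
    it = iter(events)
    return all(any(e["type"] == t for e in it) for t in type_order)

def trend_increasing(events, attr):
    # the observed attribute strictly increases over the event sequence
    values = [e[attr] for e in events]
    return all(a < b for a, b in zip(values, values[1:]))

stream = [{"type": "Login", "risk": 1},
          {"type": "Transfer", "risk": 3},
          {"type": "Logout", "risk": 5}]
```

For this stream, Login-then-Logout matches, Logout-then-Login does not, and the risk value shows an increasing trend.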

5.6 Pattern policies

A pattern policy is a named parameter that disambiguates the semantics of the pattern and the pattern matching process. Pattern policies fine-tune the way the pattern detection process works. PROTON supports five types of policies:


Evaluation policy – when are the matching sets produced? The EPA can generate output either incrementally (in which case the evaluation policy is called immediate) or at the end of the temporal context (deferred).

Cardinality policy – how many matching sets are produced within a single context partition? The cardinality policy helps limit the number of matching sets generated, and thus the number of derived events produced. The policy can be single, meaning only one matching set is generated, or unrestricted, meaning there is no restriction on the number of matching sets generated.

Repeated/instance selection policy – what happens if the matching step encounters multiple events of the same type? The override policy means that whenever a new event instance is encountered and the participant set already contains the required number of instances of that type, the new instance replaces the oldest previous instance of that type. The every policy means that every instance is kept, so all possible matching sets can be produced. First means that only the earliest instance of each type is used for matching; last means that only the latest instance of each type is used.

Consumption policy – what happens to a particular event after it has been included in a matching set? Possible consumption policies are consume, meaning each event instance can be used in only one matching set, and reuse, meaning an event instance can participate in an unrestricted number of matching sets.

Policy relevance can be dictated by the event pattern. For example, the evaluation policy for an absence pattern is always deferred (as we are testing for the existence of an event instance over a specified temporal context). Also, not all policy combinations are meaningful. For example, the choice of consumption policy is irrelevant if the cardinality policy is single, because the matching step then runs only once.
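The effect of the consumption policy can be illustrated with a toy pairing pattern in Python. The function and event data are hypothetical and greatly simplified relative to real PROTON semantics; the point is only the difference between consume (an instance is used once) and reuse (it participates in further matching sets).

```python
# Sketch of the consumption policy for a simple "pair each A with a B"
# pattern (illustrative only, not PROTON semantics in full detail).
def pair_a_with_b(events, consumption):
    bs = [e for e in events if e["type"] == "B"]
    matches = []
    for a in (e for e in events if e["type"] == "A"):
        if not bs:
            break                      # no B instance left to match with
        matches.append((a["id"], bs[0]["id"]))
        if consumption == "consume":
            bs.pop(0)                  # each instance used in one matching set only
        # under "reuse", the B instance stays available for further matching sets
    return matches

evs = [{"type": "A", "id": 1}, {"type": "A", "id": 2}, {"type": "B", "id": 9}]
consumed = pair_a_with_b(evs, "consume")   # the single B is used once
reused = pair_a_with_b(evs, "reuse")       # the same B participates twice
```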

5.7 Context initiator policies

A temporal context starts with an initiator and ends with a terminator. An initiator can be an event, system startup, or an absolute time. A terminator can be an event, a relative expiration time, an absolute expiration time, or “never ends”, i.e. the temporal context remains open until engine shutdown.

A context initiator policy tunes the semantics of temporal contexts whose initiator is determined by an event. It defines the behaviour required when a window has been opened and a subsequent initiator event is detected. The options are add, meaning a new window is opened alongside the existing one, or ignore, meaning the original window is preserved and the new initiator is ignored.
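The add/ignore distinction can be sketched in a few lines of Python (hypothetical helper, not PROTON code): with add a second window opens alongside the first; with ignore the subsequent initiator leaves the original window untouched.

```python
# Sketch of the context initiator policy: behaviour when an initiator
# event arrives while a window is already open (illustrative names).
def on_initiator(open_windows, start_time, policy):
    if policy == "add" or not open_windows:
        return open_windows + [start_time]   # open a new window alongside
    return open_windows                      # "ignore": keep the original window

windows = on_initiator([], 10, "ignore")       # first initiator always opens a window
added = on_initiator(windows, 20, "add")       # second window opened alongside
ignored = on_initiator(windows, 20, "ignore")  # original window preserved
```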


5.8 PROTON definitions

In PROTON, the JSON CEP application definitions file can be created in three ways:

1. Build-time user interface – the application developer creates the building blocks of the application definitions by filling in forms, without the need to write any code. The generated file is exported in JSON format to the CEP run-time engine.

2. Programming – The JSON definitions file can alternatively be generated programmatically by an external application and fed into the CEP run-time engine.

3. Manually – The JSON file is created manually and fed into the CEP run-time engine.

The created JSON file comprises the following definitions:

• Event types – the events that are expected to be received as input or generated as derived events. An event type definition includes the event name and a list of its attributes.

• Producers – the event sources and the way PROTON gets events from those sources.

• Consumers – the event consumers and the way they get derived events from PROTON.

• Temporal contexts – time-window contexts in which event processing agents are active.

• Segmentation contexts – semantic contexts used to group several events to be used by the EPAs.

• Composite contexts – groupings of several different contexts.

• Event processing agents – patterns of incoming events, in a specific context, that detect situations and generate derived events. An EPA includes most of the following general characteristics:

o Unique name o EPA type (operator). For each operator, different sets of properties and operands are applicable. o Context o Other properties such as condition o Participating events o Segmentation contexts o Derived events

The JSON file that is created at build-time contains all EPN definitions, including definitions for event types, EPAs, contexts, producers, and consumers. At execution time, the standalone run-time engine accesses the metadata file, loads and parses all the definitions, creates a thread for each input and output adapter, and starts listening for events arriving from the input adapters (representing producers) and forwarding events to output adapters (representing consumers).
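The load-and-dispatch sequence can be sketched roughly as follows. This is illustrative only: the function names and the JSON layout (`producers`/`consumers` keys) are hypothetical and do not reflect PROTON's actual schema.

```python
import json
import queue
import threading

def listen(producer, inbox):
    pass  # would poll the producer's source and push incoming events to the engine

def forward(consumer, inbox):
    pass  # would pull derived events and deliver them to the consumer

def start_engine(definitions_path):
    """Parse a definitions file and start one worker thread per adapter (sketch)."""
    with open(definitions_path) as f:
        defs = json.load(f)

    inbox = queue.Queue()
    threads = []
    # one listener thread per producer (input adapter), one per consumer (output adapter)
    for producer in defs.get("producers", []):
        threads.append(threading.Thread(target=listen, args=(producer, inbox), daemon=True))
    for consumer in defs.get("consumers", []):
        threads.append(threading.Thread(target=forward, args=(consumer, inbox), daemon=True))
    for t in threads:
        t.start()
    return defs, threads
```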

For the distributed implementation on top of STORM, an input Bolt serves the same function as an input adapter, and the derived events are passed as STORM tuples further along the processing chain in STORM (for full integration details refer to D2.1).

6 Requirements for flexible event processing

In essence, for an event processing system to be flexible it has to fulfil two main requirements: it can easily adapt to distributed scalable architectures, and it is simple enough that non-IT experts can define the event logic of an application. With regard to CEP, the FERARI project addresses exactly these gaps.

The envisaged FERARI architecture provides a distributed scalable platform in which PROTON is already implemented; refer to D2.1 – Architecture definition for details on the FERARI architecture. WP4 mainly addresses the second requirement.

In this section we briefly describe the main non-functional requirements of event processing systems, followed by a first cut of the event processing application design for the mobile fraud use case. We also introduce The Event Model (TEM). In the summary of this section we describe how we will tackle the flexibility issue in the project.

6.1 Non-functional requirements of event processing applications

The design of event processing applications covers functional as well as non-functional properties. Non-functional requirements are concerned not with what a system does, but with how well it does it. It is often the non-functional properties that make or break a specific application [7]. In the following subsections we briefly describe the main aspects of non-functional requirements of event-driven systems. A survey of the state of the art in this area can be found in [13].

The design of both functional and non-functional requirements is implementation specific. It is either done with current dedicated event processing tools by skilled IT developers who are familiar with the event processing engine and the particular ways to work around the engine's limitations, or in a hand-coded fashion. As aforementioned, in both cases it is rather complex and the actual design is not accessible to business users. With regard to non-functional requirements, the tuning is done according to the capabilities of the tool, and it is often not possible to optimize for multiple goals, such as the trade-off between throughput and latency (see [14] for such optimization methods).


6.1.1 Scalability

Scalability is the capability of a system to adapt readily to a greater or lesser intensity of use, volume, or demand while still meeting its business objectives. Scalability has several dimensions. The dimensions relevant to event processing are the number of producers and consumers, the number of input events, the number of event processing agent types, the processing complexity, the number of derived events, the number of concurrent runtime instances, and the number of concurrent runtime contexts [13], [7].

In the event processing world there are two common approaches to scalability: scaling out and scaling up. Scaling out, or “horizontal scalability”, adds logical units or nodes to increase processing power, while on the surface making them work as a single unit; examples of this approach are clustering of processing nodes and load balancing of the incoming data stream between nodes. Scaling up, or “vertical scalability”, adds resources within the same logical unit (node) to increase processing capacity; an example is adding memory to a physical node.

Not all applications can be scaled using these techniques; to be candidates for scale-up and scale-out they need to satisfy certain constraints, such as supporting partitioning of state and load balancing.

Both approaches involve trade-offs. The scale-up approach has a simple management model and no network communication overhead, but its growth potential is finite and there is no redundancy. The scale-out approach, on the other hand, gains performance, redundancy, and fault tolerance, but at the cost of increased management complexity, a more complex programming model, and communication overheads between nodes that need to be taken into account.

Event processing applications use load-shedding and load-balancing approaches to ensure the desired performance with the limited resources provided to the application. For each application the options should be examined carefully to determine the appropriate solution.
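A common building block for scale-out is to partition the incoming stream by a segmentation key, so that all events sharing a key land on the same node. A minimal sketch (the function and field names are illustrative assumptions):

```python
import hashlib

def partition(event_key: str, num_nodes: int) -> int:
    """Assign an event to a node by hashing its segmentation key.

    Events sharing a key (e.g. the same calling number) always land on the
    same node, so a stateful EPA can keep its state local to one partition.
    """
    digest = hashlib.md5(event_key.encode()).hexdigest()
    return int(digest, 16) % num_nodes

# all events of one caller go to one node; overall load spreads across nodes
events = [{"calling_number": f"+38591{i % 7:03d}"} for i in range(100)]
assignment = {e["calling_number"]: partition(e["calling_number"], 4) for e in events}
print(sorted(set(assignment.values())))  # some subset of [0, 1, 2, 3]
```

This is exactly the kind of state partitioning that makes an application a candidate for horizontal scaling.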

6.1.2 Availability

The availability of a system is the percentage of the time its users perceive it to be functioning. Event processing systems can use standard high-availability practices such as logging, failover, and disaster recovery. The designer of an event processing system must, however, make decisions related to high availability, in particular whether it is cost effective to employ high-availability practices: they have an associated cost and may not be fully required in some applications. An example of such a consideration is recoverability, that is, the ability to restore the state of a system to its exact value before a failure occurred.

Some event processing agents (such as those that perform aggregation, composition, and pattern detection) are stateful; that is, the internal state of such an agent has to be kept as long as the particular EPA instance is active, meaning as long as its context partition is valid. For example, a sequence pattern detection EPA running with the reuse policy over a 24-hour window might need to retain all the participant events that occurred during that period. In some applications recoverability is a must. If the event processing is part of a mission-critical application, and decisions are made using the results of this processing, losing some of the system's state may have critical implications.

However, there are also event processing applications where high availability is not required: applications where events are symptoms of some underlying problem that will occur again even if an event is lost, or systems looking for statistical trends based on sampling. In such applications the cost of applying a high-availability solution may well outweigh the benefits that can be reaped from it.
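One standard way to obtain recoverability for a stateful EPA is periodic checkpointing of its window state, so that a restarted instance can resume from the last checkpoint. A toy sketch (the class and file layout are illustrative assumptions, not a PROTON mechanism):

```python
import json
import os
import tempfile

class StatefulEpa:
    """Toy stateful agent whose window state can be checkpointed and restored."""
    def __init__(self):
        self.window = []  # events retained while the context partition is valid

    def on_event(self, event):
        self.window.append(event)

    def checkpoint(self, path):
        with open(path, "w") as f:
            json.dump(self.window, f)  # persist state so a restart can resume

    @classmethod
    def restore(cls, path):
        epa = cls()
        if os.path.exists(path):
            with open(path) as f:
                epa.window = json.load(f)
        return epa

path = os.path.join(tempfile.mkdtemp(), "epa.ckpt")
epa = StatefulEpa()
epa.on_event({"type": "Call", "duration": 45})
epa.checkpoint(path)
recovered = StatefulEpa.restore(path)  # simulated restart after a failure
print(len(recovered.window))  # 1
```

The checkpoint frequency is precisely the cost/benefit knob discussed above: frequent checkpoints reduce lost state but add I/O overhead.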

6.1.3 Security

Security requirements relate both to ensuring that operations are only performed by authorized parties and to meeting privacy considerations. Specifically, this means the following functions:

• Ensuring that only authorized parties are allowed to be event producers or event consumers.
• Ensuring that incoming events are filtered so that authorized producers can't introduce invalid events or events that they are not entitled to publish.
• Ensuring that consumers only receive information to which they are entitled. In some cases a consumer might be entitled to see some of the attributes of an event but not others.
• Ensuring that unauthorized parties can't add new event processing agents to the system, or make modifications to the EPN itself (in systems where dynamic EPN modification is supported).
• Keeping auditable logs of events received and processed, or of other activities performed by the system.
• Ensuring that all databases and data communication links used by the system are secure.

6.1.4 Performance objectives

Some non-functional requirements can be translated into performance objectives, which can then be the subject of various optimization approaches. The major performance objectives for event processing relate to throughput, latency, and time-constraint objectives.

All these objectives are intended to address scaling issues, but each addresses them under different assumptions and may be served by different optimizations. In addition, each objective may apply to an entire system or to any part of it. In some systems there is a single performance objective for all processing, for example, latency levelling for each event type. In other systems there may be a mix of performance objectives: some events may have real-time constraints associated with them, whereas others may have another metric. Performance objectives may also be composed of several separate metrics.

One of the major ways to achieve various performance metrics is parallel processing. There are three levels of parallelism: first, parallelism inside a single core using multithreading; second, parallelism by partitioning the work within a multicore machine where the threads running in different cores have access to shared memory; and third, partitioning the work to multiple machines within a cluster.


An additional optimization method involves moving the processing close to the producers and consumers where applicable. Consider an example where there are multiple sensors within the same location, and the event processing involves aggregation of events that are emitted by these sensors. Placing the aggregation EPA close to the sensors can eliminate a substantial amount of network traffic. Likewise, if the EPN contains an EPA that creates many events that are all consumed by a certain consumer, or a set of consumers that are located in a certain location, it might be useful to locate this EPA close to the consumer or consumers. This optimization approach can also complement the parallel processing approach. If the parallel event processing is executed over a grid of machines within various geographic locations (instead of being on a physical cluster or co-located set of multicore machines) it might be sensible to co-locate a group of agents if there’s a substantial amount of communication between them.

In the research community, several attempts have been made to optimize the distribution and scheduling of event processing networks. In [15], a stratification algorithm is used to reveal dependencies among functions in an event processing network and to co-locate independent functions in layers or strata, allowing for horizontal partitioning. The work in [15] then elaborates on a profiling-based technique for placing event processing agents on execution nodes, allowing for vertical partitioning. For example, if a sequential pattern is segmented by an identifier carried in the payload of the events, the execution could be vertically partitioned by that identifier.

6.1.5 Usability

As already mentioned, there are no standards for event processing programming languages, although there are various programming styles and approaches. In this section we look at two styles: the stream-oriented style and the rule-oriented style [34], [7].

6.1.5.1 Stream-oriented programming style

The stream-oriented programming style is rooted in data flow programming. In essence, a data flow graph is a directed graph consisting of nodes and edges: the nodes represent processing elements, and the edges represent data flowing between them. The paradigm is one of continuous queries, sometimes called operators, that run constantly in the nodes while their results flow through the edges of the data flow graph. The languages used to describe the queries are inspired by SQL and relational algebra, though not all of them are based on SQL. When a data flow graph is used for event processing, the data flowing in the streams are event instances with the appropriate event semantics. These event instances are represented as records and are often referred to as tuples, following the relational model's terminology. A stream is a continuous flow of events, in most cases all of the same event type, which are considered to be tuples of the same relation. The stream may be unbounded and active forever. This means that, unlike the conventional relational model where a query is executed against an entire table of data, in the continuous query model a query can execute only against a bounded subset of the stream. The stream is therefore broken up into a sequence of windows, and the query is performed successively against each window. This style is very common in existing tools, e.g., InfoSphere Streams, TIBCO StreamBase, SAP ESP, Esper, Oracle Event Processing, and Microsoft StreamInsight.
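The windowed continuous-query model can be illustrated with a tumbling-window aggregate: the unbounded stream is cut into fixed-size windows and the query (here an average) runs once per window. This is a generic sketch, not the syntax of any particular tool:

```python
from itertools import islice

def tumbling_window_avg(stream, window_size):
    """Continuous-query sketch: average the tuples in each successive window."""
    it = iter(stream)
    while True:
        window = list(islice(it, window_size))  # bounded subset of the stream
        if not window:
            return
        yield sum(window) / len(window)         # one query result per window

readings = [10, 20, 30, 40, 50, 60]
print(list(tumbling_window_avg(readings, 3)))  # [20.0, 50.0]
```

In a real engine the window would typically be time-based rather than count-based, and results would be emitted continuously as the stream flows.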

6.1.5.2 Rule-oriented languages

The other dominant style of event processing languages is the rule-oriented style. There are several distinct types of rules: production rules, active (event-condition-action) rules, and rules based on logic programming. We briefly present each of these styles below.

6.1.5.2.1 Production rules

Production rules are rules of the type “if <condition> then <action>”. They operate in a forward-chaining way: when the condition is satisfied, the action is performed. Production rules are rooted in expert systems; the operational processing of production rules may be either declarative or procedural:

• Declarative production rule execution is typically based on a variation of the Rete algorithm [35], which matches facts against the patterns contained in the rules to determine which rule conditions are satisfied. Information about the antecedents (conditions) of each rule is stored in an internal state, and in every execution cycle changes to these states are evaluated.
• Procedural production rule execution is based on sequential execution of compiled rules.

Production rules are based on state changes and not on events; however, some event processing languages extend Rete-based production rules to support event processing. This is done by making events an explicit part of the model, so that event occurrences can be used as part of the conditions for invoking an inference rule. Thus the event processing is done through an inference process.

6.1.5.2.2 Active rules

Active rules, also known as event-condition-action (ECA) rules, descend from work on active databases. Active rules operate according to the following execution pattern: when an event occurs, evaluate the conditions and, if they are satisfied, trigger an action. The event may be primitive or composite. The action can be one that derives an additional event, in which case an active rule maps directly onto an EPA in our model. In cases where the action performs some external activity, such as invoking an external service, the rule maps to the combination of an EPA and an event consumer. Examples of tools that apply the ECA style are Apama and RulePoint.
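The ECA execution pattern can be sketched in a few lines; the rule below (a long call deriving a new event) is a hypothetical example, not taken from any of the tools named above:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class EcaRule:
    """Event-condition-action rule: on event, check condition, run action."""
    event_type: str
    condition: Callable[[dict], bool]
    action: Callable[[dict], dict]

def dispatch(rule: EcaRule, event: dict) -> Optional[dict]:
    if event.get("type") == rule.event_type and rule.condition(event):
        return rule.action(event)  # here the action derives a new event
    return None

# hypothetical rule: a call longer than 40 minutes derives a LongCall event
rule = EcaRule(
    event_type="Call",
    condition=lambda e: e["duration"] > 40,
    action=lambda e: {"type": "LongCall", "caller": e["caller"]},
)
print(dispatch(rule, {"type": "Call", "duration": 45, "caller": "A"}))  # derived event
print(dispatch(rule, {"type": "Call", "duration": 10, "caller": "B"}))  # None
```

When the action derives an event, the rule corresponds to an EPA; when it calls out to an external service, it corresponds to an EPA plus a consumer.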

6.1.5.2.3 Logic programming rules

Logic programming is a programming style based on logical assertions; the best-known logic programming language is Prolog. The application of the logic programming style to event processing is rooted in work done in the deductive database area. Commercial tools seldom apply this style, though it can be found in TIBCO BusinessEvents. It is more common in research projects, for example in the following languages: ETALIS [42]; RTEC [37], [43]; SAGE [38]; and T-Rex [44].


6.1.5.3 Build-time interfaces

Event processing tools are composed of design-time (or build-time) and run-time components. The design component serves for the definition of the event-driven application, while the run-time component is the engine that, according to the event definitions, processes the events in real time in order to detect and derive the desired situations. We can identify four types of build-time interfaces [13]:

• Text-based programming languages (e.g., Apama)
• Visual languages (e.g., StreamBase)
• Form-based languages (e.g., PROTON)
• Natural languages (e.g., ODM Advanced29)

These types are not mutually exclusive, as development environments can consist of a mixture of graphical and text oriented tools. The various environments reflect different assumptions about developers’ preferences. In some cases developers prefer a more familiar text-based interface, whereas others prefer a more visual style of development.

The task of creating the event definitions can be tedious and hard even for experts. To alleviate this task, in some engines the event definitions can be learnt in an automated way using machine learning techniques. However, this aspect has received little attention so far; some research on machine learning techniques for defining event patterns can be found in [36] and [37].

Most existing CEP engines have limitations on the addition or modification of rules: rules are configured once initially and are not expected to change later. In other words, once the rules are defined and configured, the system freezes and rules cannot be added dynamically at run-time. However, rules might change over time due to the dynamic nature of the application. In Esper [8], the on-demand query facility provides ad-hoc execution of an EPL expression, but it has some limitations. Drools30 uses a polling mechanism to support dynamic rules/queries at runtime; however, this approach is not very efficient, as the system is not notified when the rule base needs updating but instead polls the resources repeatedly. The research tool proposed in [45] applies a push-based, or event-driven, approach to incorporating dynamism in CEP engines; it has been implemented as an extension of the Drools CEP engine. Note that in the scope of FERARI we intend to extend PROTON to cover some functionality with regard to dynamic updates.
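The difference between polling and the push-based approach can be sketched as follows: in the push model, whoever changes the rule base notifies subscribed components immediately, so no repeated polling is needed. This is a generic sketch, not the mechanism of Drools or of the tool in [45]:

```python
class RuleBase:
    """Push-based rule base: callers notify subscribers directly when rules
    change, instead of the engine repeatedly polling a definitions resource."""
    def __init__(self):
        self.rules = {}
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)

    def add_rule(self, name, rule):
        self.rules[name] = rule
        for cb in self.listeners:  # the engine is notified immediately
            cb(name, rule)

updates = []
rb = RuleBase()
rb.subscribe(lambda name, rule: updates.append(name))  # the engine's hook
rb.add_rule("LongCallAtNight", {"filter": "duration > 40"})  # hypothetical rule
print(updates)  # ['LongCallAtNight']
```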

6.2 Requirements for the mobile fraud use case

The use of the system will be shown in two application scenarios from telecommunications, where end users will test the architecture for mobile phone fraud detection and for cloud health monitoring. For now we focus only on mobile fraud.

29 https://www-01.ibm.com/support/knowledgecenter/SSQP76_8.7.0/com.ibm.odm.itoa.overview/topics/odm_itoa_overview.html?lang=en-us
30 http://docs.jboss.org/drools/release/6.2.0.CR3/drools-docs/pdf/drools-docs.pdf


6.2.1 Description of the mobile fraud use case

The overarching aim of the CEP component in this use case is to detect potential mobile fraud incidents. To this end, a first EPN has been created in collaboration with the use case owner, with the goal of having something meaningful and representative, yet achievable within the first year of the project. The outcome is an EPN consisting of five EPAs, shown in Figure 3 and detailed in the following sections. For the sake of simplicity we only show the EPAs and the event flow in the network. The PROTON JSON definitions file that comprises this EPN is currently being implemented.

In the current EPN we want to fire situations in the following cases (for detailed descriptions of each EPA see Sections 6.2.2.1–6.2.2.5):

• A long call to a premium location is made during night hours (EPA1, LongCallAtNight).
• As before, but this time we are looking for at least three of these long calls per calling number (EPA2, FrequentLongCallsAtNight).
• Multiple long distance calls per calling number whose total duration exceeds a certain threshold (EPA3, FrequentLongCalls).
• Same as before, but each individual call exceeds the duration threshold (EPA4, FrequentEachLongCall).
• High usage of a line for long distance calls (EPA5, ExpensiveCalls).

In the current process, potential fraud situations are (automatically) marked and inspected afterwards by a human operator who decides whether or not they constitute fraud. Therefore, the situations described above and depicted in Figure 3 will be marked as potential indications of fraud incidents and will be checked by humans afterwards.

[Diagram: the Calls input stream feeds five EPAs – LongCallAtNight (EPA1), FrequentLongCallsAtNight (EPA2), FrequentLongCalls (EPA3), FrequentEachLongCall (EPA4), and ExpensiveCalls (EPA5) – whose derived events form the output Situations]

Figure 3: Mobile fraud use case initial EPN

Note the following:


• Due to privacy issues, the values chosen for specific variables and thresholds are not the real ones. In reality, the EPN will be implemented with the correct values; this does not alter the logic of the rules, only the assignment of the different variable and threshold values.
• “Premium location services” is a closed list of potential far locations/destinations for which the rules are relevant. We have opted for “Maldives” as a code name for these locations. In practice, the same pattern will be duplicated for each of the locations in this list.
• In this use case, night hours are considered to be between 19:00 and 7:00, and a 24-hour day runs from 24:00 to 23:59 the day after.
• We are only interested in outgoing calls (incoming calls are not relevant to fraud detection), indicated whenever the call_direction field equals 1 (refer to Table 1).

6.2.1 Event types

Five event types have been defined so far, comprising the input events, output/derived events, and situations, as shown in Table 1. For the sake of simplicity we only show the user-defined attributes (the event payload) and not the metadata (Section 5.2).

Although the names of concepts in a PROTON application can be chosen freely by the application designer, we use some naming conventions for the sake of clarity: event type names are capitalized; built-in/metadata attributes start with a capital letter, as do payload attributes that hold operator values; all other payload attributes start with a lowercase letter.

Note that the Call raw event includes more fields (attributes); we defined only those required for pattern detection in the current EPN implementation. When running in the FERARI architecture, PROTON will ignore event attributes not specified in its JSON definitions.

Table 1: Event types of the initial EPN for the mobile phone fraud use case

Event name: Call
Payload: object_id; billed_msisdn; call_start_date; calling_number; called_number; other_party_tel_number; call_direction; tap_related; conversation_duration; total_call_charge_amount

Event name: LongCallAtNight
Payload: calling_number; conversation_duration; other_party_tel_number

Event name: FrequentLongCallsAtNight
Payload: calling_number; other_party_tel_number; CallsCount

Event name: FrequentLongCalls
Payload: calling_number; other_party_tel_number; CallsCount; CallsLengthSum

Event name: FrequentEachLongCall
Payload: calling_number; other_party_tel_number; CallsCount

Event name: ExpensiveCalls
Payload: calling_number; other_party_tel_number; CallsCostSum


6.2.2 Event processing agents

In the following we describe the EPAs in this order: event name; motivation; event recognition process (following Figure 2); contexts along with the temporal context policy; and pattern policies.

In the event recognition process we only show the steps that take place in (i.e., are relevant to) the specific EPA, while the others are greyed out. For the filtering step we show the filtering expression; for the matching step we denote the pattern variables; and for the derivation step we denote the value assignments and calculations. Please note that for the sake of simplicity we only show assignments that are not copies of values (all other derived event attribute values are copied from the input events). For attributes, we just denote their names without the ‘attribute_name.’ prefix.

6.2.2.1 EPA1: LongCallAtNight

Motivation: Check for “long” calls (defined as longer than 40 minutes) to premium locations during night hours (from 19:00 to 7:00).

Event recognition process

[Diagram: only the filtering step of the event recognition process is active; filtering expression over Call events: other_party_tel_number = “Maldives” AND call_direction = 1 AND (call_start_date > 19:00 OR call_start_date < 7:00) AND conversation_duration > 40 minutes; output: LongCallAtNight]

Figure 4: Event recognition process for the Filtering EPA

Note that Filter agents are used to eliminate uninteresting events. A Filter agent takes an incoming event object and applies a test to decide whether to discard it or to pass it on for processing by subsequent agents. The Filter agent's test is stateless, in other words, based solely on the content of the event instance. Therefore, neither pattern nor context policies are applicable to this type of EPA.
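A stateless filter like EPA1's can be written as a pure predicate over a single event. The sketch below is a simplification: the field names mirror Table 1, but `hour` is an assumed stand-in for the hour extracted from call_start_date.

```python
def long_call_at_night(event: dict) -> bool:
    """Stateless filter test for EPA1: depends only on the event's content."""
    return (
        event["other_party_tel_number"] == "Maldives"
        and event["call_direction"] == 1
        and (event["hour"] > 19 or event["hour"] < 7)   # night hours
        and event["conversation_duration"] > 40          # "long" call, minutes
    )

call = {"other_party_tel_number": "Maldives", "call_direction": 1,
        "hour": 22, "conversation_duration": 55}
print(long_call_at_night(call))  # True
```

Because the test needs no memory of earlier events, it can be evaluated anywhere in a distributed pipeline without state migration.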

Pattern policies

Evaluation: IMMEDIATE; Cardinality: UNRESTRICTED; Repeated: FIRST; Consumption: REUSE


Context

Segmentation: Not applicable.

Temporal window: ALWAYS

Initiator policy: IGNORE

Meaning: The temporal window will open with the first Call and will not close.

[Diagram: a single always-open window over the Call stream]

Figure 5: Context for the Filter EPA

6.2.2.2 EPA2: FrequentLongCallsAtNight

Motivation: Same as before, but now we are looking for at least 3 calls to premium locations during night hours, each lasting longer than 40 minutes, per calling number.

Event recognition process

[Diagram: matching step: COUNT over LongCallAtNight events with the assertion count > 2; derivation step: CallsCount := count; output: FrequentLongCallsAtNight]

Figure 6: Event recognition process for the FrequentLongCallsAtNight EPA

Note that the COUNT pattern counts the number of input event occurrences, while count is the assertion value for the COUNT pattern. Note also that the input event for this EPA is the LongCallAtNight event derived by EPA1 (see Figure 3).
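The combination of a COUNT pattern with segmentation by calling_number can be sketched as follows (an illustrative toy, not PROTON code; window opening and closing are omitted):

```python
from collections import defaultdict
from typing import Optional

class CountEpa:
    """Sketch of EPA2: count LongCallAtNight events per calling number
    (segmentation context) and derive an event whenever count > 2."""
    def __init__(self, threshold: int = 2):
        self.threshold = threshold
        self.counts = defaultdict(int)

    def on_event(self, event: dict) -> Optional[dict]:
        key = event["calling_number"]          # segmentation by calling_number
        self.counts[key] += 1
        if self.counts[key] > self.threshold:  # assertion: count > 2
            return {"type": "FrequentLongCallsAtNight",
                    "calling_number": key, "CallsCount": self.counts[key]}
        return None

epa = CountEpa()
derived = [epa.on_event({"calling_number": "A"}) for _ in range(3)]
print(derived[-1])  # fired on the third event for caller "A"
```

With the UNRESTRICTED/IMMEDIATE policies of this EPA, a derived event would keep firing for every further satisfying event in the same daily window, which matches the behaviour described for Figure 7 below.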


Pattern policies

Evaluation: IMMEDIATE; Cardinality: UNRESTRICTED; Repeated: FIRST; Consumption: REUSE

Context

Segmentation: by calling_number

Temporal window: DAILY (fixed non-overlapping interval)

• Initiator: 24:00
• Terminator: 23:59

Initiator policy: IGNORE

Meaning: The temporal window opens at 24:00 and closes at 23:59 per calling number, so we group calls made during one day. The filter step ensures that only calls made at night are considered in the counting. In Figure 7, the fourth call does not pass the filter assertion, and therefore no derived event is emitted at that point (per the policies used, we fire a derived event each time the pattern is satisfied).

[Diagram: a daily window from 24:00 to 23:59 over the Call stream; two FrequentLongCallsAtNight events are derived]

Figure 7: Context for the FrequentLongCallsAtNight EPA

6.2.2.3 EPA3: FrequentLongCalls

Motivation: We are interested in detecting a situation in which at least 10 calls are made to a premium location in a day, summing up to at least 60 minutes in length.

Event recognition process


[Diagram: filtering expression over Call events: other_party_tel_number = “Maldives” AND call_direction = 1; matching step: SUM with the assertions count > 9 AND countSum(conversation_duration) > 60 minutes; derivation step: CallsCount := count, CallsLengthSum := countSum; output: FrequentLongCalls]

Figure 8: Event recognition process for the FrequentLongCalls EPA

Note that the pattern SUM has two assertions, namely count (the number of occurrences to be satisfied) and countSum (the value to be exceeded).

Pattern policies

Evaluation: IMMEDIATE; Cardinality: SINGLE; Repeated: FIRST; Consumption: REUSE

Context

Segmentation: by calling_number

Temporal window: DAILY (fixed non-overlapping windows)

• Initiator: 24:00
• Terminator: 23:59

Initiator policy: IGNORE

Meaning: The temporal window opens at 24:00 and closes at 23:59 per calling number, so we group calls made during one day. In Figure 9, only one derived event is fired, as the pattern is satisfied on the 10th Call. Note that the pattern might also be satisfied by subsequent calls, but according to the policy we only notify once when the pattern is detected.


[Diagram: a daily window from 24:00 to 23:59 over the Call stream; a single FrequentLongCalls event is derived]

Figure 9: Context for the FrequentLongCalls EPA

6.2.2.4 EPA4: FrequentEachLongCall

Motivation: A variation of the previous pattern. In this case, we are interested in detecting a situation in which at least 10 long calls (lasting at least 60 minutes each) are made to a premium location in a day.

Event recognition process

[Diagram: filtering expression over Call events: other_party_tel_number = “Maldives” AND call_direction = 1 AND conversation_duration > 60 minutes; matching step: COUNT with the assertion count > 9; derivation step: CallsCount := count; output: FrequentEachLongCall]

Figure 10: Event recognition process for the FrequentEachLongCall EPA

Pattern policies

Evaluation: IMMEDIATE; Cardinality: SINGLE; Repeated: FIRST; Consumption: REUSE

Context

Segmentation: by calling_number

Temporal window: DAILY (non-overlapping windows)

• Initiator: 24:00
• Terminator: 23:59


Initiator policy: IGNORE

Meaning: The temporal window opens at 24:00 and closes at 23:59 per calling number, so we group calls made during one day. In Figure 11, one derived event is fired, as the pattern is satisfied on the 12th call.

[Diagram: a daily window from 24:00 to 23:59 over the Call stream; a single FrequentEachLongCall event is derived]

Figure 11: Context for the FrequentEachLongCall EPA

6.2.2.5 EPA5: ExpensiveCalls

Motivation: Within every six-hour window, we notify when the calls dialled to premium locations sum up to more than a pre-defined cost (e.g., 100 kn) per calling number.

Event recognition process

[Diagram: filtering expression over Call events: other_party_tel_number = “Maldives” AND call_direction = 1; matching step: SUM with the assertion countSum(total_call_charge_amount) > 100 kn; derivation step: CallsCostSum := countSum; output: ExpensiveCalls]

Figure 12: Event recognition process for the ExpensiveCalls EPA

Pattern policies

Evaluation: IMMEDIATE; Cardinality: SINGLE; Repeated: FIRST; Consumption: REUSE

Context

Segmentation: by calling_number


Temporal window: sliding/overlapping window

• Initiator: first Call
• Terminator: + 6 hours

Initiator policy: ADD

Meaning: The first window opens with the first Call event and closes after 6 hours. The second event again opens a six-hour window, and so forth. Figure 13 shows the different windows in different colors. As can be seen, each event might belong to more than one window. The derived event is emitted only once (the cardinality policy is SINGLE) when the pattern is detected (the evaluation policy is IMMEDIATE).

[Diagram: overlapping six-hour windows over the Call stream, each opened by a Call event; two ExpensiveCalls events are derived]

Figure 13: Context for the ExpensiveCalls EPA
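The overlapping windows produced by the ADD initiator policy in EPA5 can be sketched as follows (a toy illustration, not PROTON code; times are plain hour numbers and segmentation is omitted for brevity):

```python
class SlidingWindows:
    """Each Call opens its own six-hour window (initiator policy ADD),
    so windows overlap and one event can fall into several of them."""
    def __init__(self, length: int = 6):
        self.length = length
        self.windows = []  # list of (open_time, [costs of calls in the window])

    def on_call(self, time: float, cost: float):
        # drop expired windows, open a new one, add the call to all open windows
        self.windows = [(t0, ev) for t0, ev in self.windows
                        if time < t0 + self.length]
        self.windows.append((time, []))
        for _, ev in self.windows:
            ev.append(cost)

ctx = SlidingWindows()
for time, cost in [(0, 40), (2, 50), (5, 30)]:
    ctx.on_call(time, cost)

# the call at t=5 belongs to all three open windows
print([sum(ev) for _, ev in ctx.windows])  # [120, 80, 30]
```

In the real EPA, a window whose summed total_call_charge_amount exceeds the threshold would derive one ExpensiveCalls event (cardinality SINGLE) and the remaining windows would continue independently.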

6.2.3 Mobile phone fraud use case functional requirements summary

The first EPN for the fraud detection use case (see Figure 3) includes five EPAs (of types FILTER, COUNT, and SUM), one raw event, and five situations. Our design and implementation rely on PROTON's building blocks and capabilities, and the same application might look different when implemented in another CEP engine that uses different building blocks. The implementation of this EPN in FERARI's architecture is currently work in progress and uses real data that has been anonymized due to privacy issues. Further refinements of this initial EPN will include more event rules.

6.3 Introduction to the event model The Event Model (TEM) provides a new way to model, develop, validate, maintain, and implement event-driven applications. In TEM, the event derivation logic is expressed through a high-level declarative language as a collection of normalized tables (in a spreadsheet-like fashion). These tables can be automatically validated and transformed into an EPN and eventually into a running application. This idea has already been successfully proven in the domain of business rules by The Decision Model (TDM) [16]. TDM groups the rules into natural logical groups to create a structure that makes the model relatively simple to understand, communicate, and manage. TEM is based on a set of well-defined principles and building blocks and does not require substantial programming skills; it is therefore targeted at non-technical people.
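The tabular idea can be illustrated with a small Python sketch. The column names below are invented for illustration and do not follow the actual TEM table definitions (see [27]-[30]); the point is only that spreadsheet-like rows can be machine-validated before any code is generated from them.

```python
# Hypothetical column set; the real TEM tables define their own structure.
REQUIRED_COLUMNS = {"derived_event", "pattern", "input_events", "context"}

logic_table = [
    {"derived_event": "ExpensiveCalls", "pattern": "SUM",
     "input_events": "Call",
     "context": "6h sliding window segmented by calling_number"},
]

def validate(table):
    """Return a list of (row_index, problem_columns) pairs; an empty list
    means every row has all required columns filled in."""
    problems = []
    for i, row in enumerate(table):
        missing = REQUIRED_COLUMNS - set(row)
        empty = {k for k in REQUIRED_COLUMNS & set(row) if not row[k]}
        if missing or empty:
            problems.append((i, missing | empty))
    return problems
```

A validator of this kind is what allows non-technical authors to get immediate feedback on incomplete rows before the tables are translated into an EPN.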


The current version of the model ([27], [28], [29], [30]) covers part of the functional requirements of event-driven applications. In the scope of the FERARI project we plan to extend today's basic model to cover all aspects of functional requirements as well as non-functional requirements, which are still a missing piece in the model. The resulting tables will be converted into an EPN, which can thereafter be converted into a JSON definition file and run in PROTON.
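The final translation step can be sketched as follows. The JSON field names here are hypothetical and do not reproduce PROTON's actual definition schema (documented in the Proton user and programmer guide [24]); the sketch only shows how a validated table row can be serialized mechanically into a definition file.

```python
import json

def row_to_json(row):
    """Serialize one logic-table row into an illustrative EPN definition.
    Keys like "epn", "epas", and "epaType" are invented for this sketch."""
    definition = {
        "epn": {
            "epas": [{
                "name": row["derived_event"],
                "epaType": row["pattern"],
                "inputEvents": row["input_events"].split(","),
                "context": row["context"],
            }]
        }
    }
    return json.dumps(definition, indent=2)
```

Because the translation is purely mechanical, the tables remain the single source of truth and the generated definition never needs to be edited by hand.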

6.4 Summary of the requirements for flexible event processing in FERARI In this section we surveyed the main non-functional requirements of a (complex) event processing system and the main functional requirements addressed in the scope of the project's mobile fraud use case.

In order to be flexible, event processing engines need to tackle the two following requirements in a satisfactory way:

• The easy adaptability to non-functional requirements, especially the way the tool copes with scalability issues in a distributed environment.
• The easy definition and maintenance of the event-driven logic.

Regarding the first requirement, in FERARI, the proposed architecture is a scalable distributed environment that combines event processing capabilities (PROTON) on top of a streaming platform (Storm). Regarding the second requirement, we propose to develop TEM, which enables the definition and maintenance of event-driven applications by non-technical people.

7 Summary and future steps Our goal in FERARI is to bring event processing much closer to the business world by extending simple stream processing of numeric or textual data to the much more powerful realm of complex event processing, in a way that is both consumable by business users and a seamless part of Big Data applications.

CEP has already built up significant momentum, manifested in a steady research community and a variety of commercial as well as open source products. Capitalizing on this work, our approach is to provide a model for constructing event processing applications: a goal-driven declarative approach is used to define the requirements for event processing applications and to generate complete, implementable designs from these requirements. The requirements will include both functional requirements, such as event filtering, event aggregations, and event patterns, as well as non-functional requirements, such as scalability and fault tolerance.

By applying TEM, flexibility is achieved using an implementation-independent meta-model based on a table representation that can be presented in a spreadsheet-like fashion, together with a set of diagrams. Both are easily consumable by business users, who are used to working with spreadsheets, and expressive enough to directly generate code. Note that control over both the functional and non-functional specifications will stay in the hands of the business users throughout the whole life cycle; the generated code itself is never edited by hand.

During the second year of the project we plan to extend the current TEM tables and diagrams to cope with FERARI's requirements in the mobile fraud detection use case, along with implementing, in PROTON on Storm, the event processing network for the use cases presented in this report.


8 References

[1]. Altman R., Schulte W. R., Natis Y. V., Pezzini M., Driver M., Blanton C. E., Wilson N., and Van Huizen G. 2014. Agenda Overview for Application Architecture. Gartner report G00261571. Published: 10 January 2014.
[2]. Linden A. 2014. Hype Cycle for Advanced Analytics and Data Science. Gartner report G00262076. Published: 30 July 2014.
[3]. LeHong H., Fenn J., and Toit R. L-du. 2014. Hype Cycle for Emerging Technologies. Gartner report G00264126. Published: 28 July 2014.
[4]. Steenstrup K. 2014. Hype Cycle for Operational Technology. Gartner report G00263170. Published: 23 July 2014.
[5]. LeHong H. and Velosa A. 2014. Hype Cycle for the Internet of Things. Gartner report G00264127. Published: 21 July 2014.
[6]. Cugola G. and Margara A. 2012. Processing Flows of Information: From Data Stream to Complex Event Processing. ACM Comput. Surv. 44(3).
[7]. Etzion O. and Niblett P. 2010. Event Processing in Action. Manning Publications Company.
[8]. Esper reference documentation. [Online]. At: http://esper.codehaus.org/esper-4.10.0/doc/reference/en-US/html/index.html.
[9]. Mendes M. R., Bizarro P., and Marques P. 2009. A performance study of event processing systems. In Performance Evaluation and Benchmarking, 221-236. Springer.
[10]. Alevizos E. and Artikis A. 2014. Being Logical or Going with the Flow? A Comparison of Complex Event Processing Systems. 8th Hellenic Conference on Artificial Intelligence.
[11]. Etzion O. 2010. Temporal aspects of event processing. Handbook of distributed event based systems.
[12]. Adi A. and Etzion O. 2004. Amit – the situation manager. VLDB Journal 13(2), 177-203.
[13]. Etzion O., Rabinovich E., and Skarbovsky I. 2011. Non-functional properties of event processing. In Proceedings of the Fifth ACM International Conference on Distributed Event-Based Systems (DEBS 2011), 365-366.
[14]. Rabinovich E., Etzion O., and Gal A. 2011. Pattern rewriting framework for event processing optimization. In Proceedings of the Fifth ACM International Conference on Distributed Event-Based Systems (DEBS 2011), 101-112.
[15]. Lakshmanan G., Rabinovich Y., and Etzion O. 2009. A stratified approach for supporting high throughput event processing applications. In Proceedings of the Third ACM International Conference on Distributed Event-Based Systems (DEBS 2009).
[16]. von Halle B. and Goldberg L. 2010. The Decision Model. CRC Press.
[17]. Luckham D. and Schulte R. 2011. EPTS Event Processing Glossary v2.0. Technical report. [Online]. At: http://www.complexevents.com/2011/08/31/epts-event-processing-glossary-updated-to-version-2-0/.


[18]. Fülöp L. J., Tóth G., Rácz R., Pánczél J., Gergely T., Beszédes A., and Farkas L. 2010. Survey on complex event processing and predictive analytics. In Proceedings of the Fifth Balkan Conference in Informatics, 26-31.
[19]. Gualtieri M. and Curran R. 2014. The Forrester Wave™: Big Data Streaming Analytics Platforms, Q3 2014. July 17, 2014. [Online]. At: http://forms2.tibco.com/rs/tibcoinfra/images/Forrester%20Wave%20Big%20Data%20Streaming%20Analytics%207.17.14.pdf.
[20]. Schulte R. 2014. An Overview of Event Processing Software. August 25, 2014. [Online]. At: http://www.complexevents.com/.
[21]. Vincent P. 2014. CEP tooling market survey 2014. December 3, 2014. [Online]. At: http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/.
[22]. Babcock B., Babu S., Datar M., Motwani R., and Widom J. 2002. Models and issues in data stream systems. In Proceedings of the 21st ACM SIGMOD/PODS Symposium on Principles of Database Systems (PODS'02). ACM, New York, NY, 1-16.
[23]. Luckham D. 2001. The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Addison-Wesley Longman Publishing Co., Inc.
[24]. Proton user guide and programmer guide. [Online]. At: https://forge.fiware.org/plugins/mediawiki/wiki/fiware/index.php/CEP_GE_IBM_Proactive_TechnologyOnline_User_and_Programmer_Guide.
[25]. Complex Event Processing open specification (REST API). [Online]. At: http://forge.fi-ware.org/plugins/mediawiki/wiki/fiware/index.php/Complex_Event_Processing_Open_RESTful_API_Specification.
[26]. Proton installation and administration guide. [Online]. At: https://forge.fi-ware.org/plugins/mediawiki/wiki/fiware/index.php/CEP_GE_-_IBM_Proactive_Technology_Online_Installation_and_Administration_Guide.
[27]. Etzion O. and von Halle B. 2013. The Event Model. [Online]. At: http://www.slideshare.net/opher.etzion/er-2013-tutorial-modeling-the-event-driven-world.
[28]. Fournier F. and Limonad L. 2014. The BE2 model: When Business Events meet Business Entities. DAB14 Workshop.
[29]. von Halle B. and Fournier F. 2014. Introducing the Next Horizon: The Event Model (TEM), Part 1 – A Paradigm for Processing Complex Events in Real Time. [Online]. At: http://www.modernanalyst.com/Resources/Articles/tabid/115/ID/3036/Introducing-the-Next-Horizon-The-Event-Model-TEM.aspx.
[30]. Fournier F. and von Halle B. 2014. Introducing the Next Horizon: The Event Model (TEM), Part 2 – The Event Processing in Action. [Online]. At: http://www.modernanalyst.com/Resources/Articles/tabid/115/ID/3059/The-Event-Model-TEM-in-Action.aspx.
[31]. Etzion O. and Adkins J.M. 2013. Tutorial: Why is event-driven thinking different from traditional thinking about computing? In Proceedings of the Seventh ACM International Conference on Distributed Event-Based Systems (DEBS 2013), 269-270.


[32]. Schulte W.R. and Luckham D. 2013. Introduction to Real-Time Intelligence. September 2013. [Online]. At: http://www.complexevents.com/2013/09/17/understanding-real-time-intelligence/.
[33]. Schulte W.R. 2012. Does anyone care about event processing? July 2012. [Online]. At: http://www.complexevents.com/2012/07/25/does-anyone-care-about-event-processing/.
[34]. Bry F., Eckert M., Etzion O., Paschke A., and Riecke J. 2009. Event Processing Language Tutorial. [Online]. At: http://www.slideshare.net/opher.etzion/debs2009-event-processing-languages-tutorial.
[35]. Forgy C. 1982. Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem. Artificial Intelligence 19(1), 17-37.
[36]. Margara A., Cugola G., and Tamburrelli G. 2014. Learning from the past: automated rule generation for complex event processing. In Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems (DEBS 2014), 47-58.
[37]. Artikis A., Sergot M., and Paliouras G. 2014. An Event Calculus for Event Recognition. IEEE Transactions on Knowledge and Data Engineering (TKDE).
[38]. Broda K., Clark R. M., and Russo A. 2009. Sage: A logical agent-based environment monitoring and control system. In AmI, 112-117.
[39]. Schultz-Moller N. P., Migliavacca M., and Pietzuch P. 2009. Distributed complex event processing with query rewriting. In Proceedings of the Third ACM International Conference on Distributed Event-Based Systems (DEBS 2009), 1-12.
[40]. Demers A. J., Gehrke J., Hong M., Riedewald M., and White W. M. 2006. Towards expressive publish/subscribe systems. In Proceedings of the International Conference on Extending Database Technology (EDBT), 627-644.
[41]. Wang F. and Liu P. 2005. Temporal management of RFID data. In Proceedings of the 31st VLDB Conference, 1128-1139.
[42]. Anicic D., Fodor P., Rudolph S., Stühmer R., Stojanovic N., and Studer R. 2011. ETALIS: Rule-based reasoning in event processing. In Reasoning in Event-Based Distributed Systems, 99-124.
[43]. Artikis A., Paliouras G., Portet F., and Skarlatidis A. 2010. Logic-based representation, reasoning and machine learning for event recognition. In Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems (DEBS 2010), 282-293.
[44]. Cugola G. and Margara A. 2012. Complex event processing with T-Rex. Journal of Systems and Software 85(8), 1709-1728.
[45]. Bhargavi R., Pathak R., and Vaidehi V. 2013. Dynamic Complex Event Processing – Adaptive Rule Engine. In International Conference on Recent Trends in Information Technology (ICRTIT), 189-194.
[46]. Biscotti F., Schulte W.R., Iijima K., and Heudecker N. 2014. Market Guide for Event Stream Processing. Gartner report G00263080. Published: 14 August 2014.
[47]. Anicic D., Rudolph S., Fodor P., and Stojanovic N. 2012. Real-Time Complex Event Recognition and Reasoning – A Logic Programming Approach. Applied Artificial Intelligence 26, Special Issue on Event Recognition (January 2012).
