Procedia Computer Science

Procedia Computer Science 101, 2016, Pages 323–332

YSC 2016. 5th International Young Scientist Conference on Computational Science

Quality-based workload scaling for real-time streaming systems

Pavel A. Smirnov, Denis Nasonov
ITMO University, St. Petersburg, Russia
{smirnp, denis_nasonov}@niuitmo.ru

Abstract

In this paper we propose the idea of scaling workload via the elastic quality of the solution provided by particular streaming applications. The contribution of this paper consists of a quality-based workload scaling model, implementation details of a quality assessment mechanism implemented on top of Apache Storm, and an experimental evaluation of the proposed model on synthetic and real-world (medical) examples.

Keywords: scaling model, data streaming, elastic workload, quality of service, apache storm, big data

1 Introduction

Nowadays huge amounts of data are produced in real time. Autonomous vehicles, physical sensors, social-network activities and stock-exchange markets are typical producers of streaming data, which require immediate real-time processing because the data remain relevant only for a short period of time. Performance optimization for real-time streaming applications is a complex, non-trivial problem. The goal of the profiling and optimization process is to find an optimal tradeoff between two states of an application: underutilization and overloading. Non-optimal performance results in queues of tuples behind the overloaded operators. Open-source streaming engines like Apache Storm* are smart enough to handle overloads by pausing emission if downstream operators cannot serve it (this feature is called “backpressure”). But if the amount of data from third-party sources remains unchanged, an overloaded application will accumulate ever-growing queues of data, which may be lost as a result of a crash. While there are at least 5 ways to scale application performance in general (Ahluwalia, 2007), only two of them have become popular in the field of data streaming: increasing operators' parallelism and adding extra hardware resources. A literature overview in the field of scaling streaming applications shows that these two patterns are usually applied simultaneously: increased parallelism is achieved by allocating new operator instances on newly added hardware. From an economic point of view, new hardware resources cause additional costs, so it makes sense to pursue more effective operator placement (called “scheduling”), which also affects the application's service rate.
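The queue accumulation described above can be illustrated with a minimal fluid sketch, assuming constant arrival and service rates (tuples per second), no backpressure and no tuple loss. The function name `queue_length` is hypothetical; it only shows that the backlog stays empty while the arrival rate is below the service rate and grows linearly once it exceeds it.

```python
def queue_length(arrival_rate: float, service_rate: float, seconds: int) -> float:
    """Tuples waiting in front of an operator after `seconds` seconds.

    Fluid approximation: constant rates, no backpressure, no tuple loss.
    """
    queue = 0.0
    for _ in range(seconds):
        # New tuples join the queue; the operator drains as many as it can serve.
        queue = max(0.0, queue + arrival_rate - service_rate)
    return queue


if __name__ == "__main__":
    print(queue_length(arrival_rate=8, service_rate=10, seconds=60))   # 0.0
    print(queue_length(arrival_rate=12, service_rate=10, seconds=60))  # 120.0
```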

* http://storm.apache.org/

Peer-review under responsibility of organizing committee of the scientific committee of the 5th International Young Scientist Conference on Computational Science
© 2016 The Authors. Published by Elsevier B.V.
doi: 10.1016/j.procs.2016.11.038

In contrast to scheduling and resource scaling, in this paper we propose to scale workload via the elastic quality of the solution that a streaming application provides. The contribution of this paper consists of: the quality-based workload scaling model for streaming applications; the implementation details of a quality assessment mechanism implemented on top of Apache Storm; an experimental evaluation of the proposed model on synthetic and real-world (medical) applications. The remainder of the paper is organized as follows: Section 2 presents the literature study; the formal definition of the proposed model is presented in Section 3; the implementation details of the quality assessment mechanism are described in Section 4; Section 5 presents the results of the experimental evaluation; Section 6 contains the conclusion and plans for future work with potential use cases regarding quality-based mass-serving applications.

2 Related works

Workload scaling for online streaming applications is currently an active state-of-the-art research problem. Widespread open-source streaming engines like Apache Storm†, Samza‡, Flink§, Twitter Heron** and others cannot scale workload automatically, but can do so on explicit signals from the user. This has led to several parallel research efforts devoted to reactive and proactive workload scaling techniques. The paper (Xu, Peng, & Gupta, 2016) describes a system called Stela, which provides active workload scaling: it regularly calculates a congestion metric, which the authors call ETP (effective throughput percentage), for all operators and increases the parallelism of the most congested ones by allocating extra instances on newly added hardware. Stela is implemented on top of Apache Storm and automatically captures performance statistics from the Storm API. It periodically refreshes operators' ETPs and automatically scales Storm topologies in and out. The research (Heinze et al., 2015) proposes an online optimization approach, which automatically detects changes in the workload pattern, then chooses and applies a scaling strategy that minimizes the number of hosts used for the current workload characteristics. Changes of workload patterns are detected by an adaptive window approach. The approach is implemented on top of FUGU (Heinze et al., 2015), an elastic data engine developed by the authors in previous research. For overloaded hosts, the system automatically decides which operators should stay in place and which should be moved to newly added resources. The decisions are made according to local and global threshold-based rules (Heinze, Pappalardo, Jerzak, & Fetzer, 2014), which operate on operators' performance metrics. The authors also propose a QoS-based approach to workload scaling (Heinze, Jerzak, Hackenbroich, & Fetzer, 2014), where latency is the main QoS constraint, usually strictly declared in a service level agreement.
In (De Matteis & Mencagli, 2016) the authors propose latency-aware and energy-efficient scaling strategies with predictive capabilities. The strategies adapt Model Predictive Control, a technique for searching for optimal application configurations along a limited prediction horizon. The paper presents models that describe the dependency between QoS variables (latency, energy) and the configuration of the system (parallelism, CPU frequency, etc.). The result of the optimization is a reconfiguration trajectory within a strictly limited prediction horizon. The paper (Hidalgo, Wladdimiro, & Rosas, 2016) is devoted to reactive and predictive resource scaling according to workload demands. The authors apply an optimization approach called fission to increase the number of replicas of fully loaded operators. Two algorithms are proposed to determine an operator's state: a short-term one for peak detection and a mid-term one for pattern detection. The short-term algorithm classifies operators' states according to predefined lower and upper thresholds. The mid-term algorithm uses a Markov chain model to calculate operators' transition probabilities. The solution was implemented on top of the S4 platform according to the MAPE†† model.

All the aforementioned papers provide either reactive or predictive workload scaling via modification of the resource set (adding or removing resource nodes). The major difference between them and our research is what is being scaled: we propose scaling via tuning of an application's quality-sensitive parameters instead of scaling the resource set. The proposed quality-based scaling provides control over both the performance improvement and the quality loss. To the best of our knowledge, there is no prior research on elastic quality of processing results.

† http://storm.apache.org/ ‡ http://samza.apache.org/ § http://flink.apache.org/ ** http://twitter.github.io/heron/

3 Quality-based workload scaling

In this section we present a formal definition of the proposed quality-based workload scaling model. The model is based on existing models and extends them with novel formulations.

3.1 Model definition

For the model we reuse some elements from a third-party streaming application model based on queuing theory (Beard & Chamberlain, 2013): $G = (V, E)$ – an application graph defined by a set of vertices $V$, which represent operators, and a set of edges $E$, which represent data flows; $\lambda(V_i)$ – mean operator's data arrival rate; $\mu(V_i)$ – mean operator's service rate. Here we introduce the set of an operator's instances, defined as $W(V_i) = \{w_1, \dots, w_j\}$. To deal with an operator's performance and the quality of its results we reuse the performance model from the quality-based approach for workflows (Butakov, Nasonov, Svitenkov, & Radice, 2016): $T_{V_i} = f(d, p, r)$ – a performance model (calculation time) for operator $V_i$, which depends on: $d = d_{V_i}$ – characteristics of the input data (format, size, precision) arriving at the operator; $p \in P_{V_i}$ – one of the available parameter configurations; $r = r_{w_{ij}} \in R$ – a particular resource node where operator's instance $w_{ij}$ will be launched (note that $p$ and $d$ are the same for all instances $w_{ij}$). We also reuse the model of the quality function, which for operator $V_i$ is defined as $q_{V_i} = Q(Q_{V_i}, P_{V_i})$, where $Q_{V_i}$ are the quality parameters to measure and $P_{V_i} = \{p_1, \dots, p_n\}$ are the parameter values which impact $Q_{V_i}$. To declare an operator's allocation on a particular resource we introduce a schedule function $Sc = F(G, R, st)$, where $G$ is an application graph, $R$ is a resource pool and $st$ is a scheduling strategy. The aggregated notation is presented in Table 1.

| Formal definition | Description |
|---|---|
| $G = (V, E)$ | application graph determined by the set of operators $V$ and the set of data flows $E$ |
| $\lambda_{V_i}$ | mean operator's data arrival rate (tuples/s) |
| $\mu_{V_i}$ | mean operator's service rate (tuples/s) |
| $W(V_i) = \{w_1, \dots, w_j\}$ | set of operator's instances |
| $T_{V_i} = f(d, p, r)$ | performance model for a single task performed by operator $V_i$ |
| $d_{V_i}$ | characteristics of input data (format, size, precision) arriving at operator $V_i$ |
| $p \in P_{V_i} = \{p_1, \dots, p_n\}$ | parameter set for a single task of a particular operator $V_i$ |
| $r = r_{w_{ij}} \in R$ | resource node from pool $R$ where instance $w_{ij}$ is launched |
| $q_{V_i} = Q(Q_{V_i}, P_{V_i})$ | quality function for operator $V_i$; $Q_{V_i}$ – quality parameters to measure |
| $Sc = F(G, R, st)$ | schedule defined by graph, resource pool and scheduling strategy |

Table 1: Elements used to describe a quality-based scaling model

†† MAPE – Monitor, Analyze, Plan, Execute

Assuming that during the data streaming process operators are already placed on particular nodes ($Sc = \mathrm{const}$, $r_{w_{ij}} = \mathrm{const}$) and tuples are homogeneous ($d_{V_i} = \mathrm{const}$), the operator's performance $T_{V_i}$ depends only on the parameter configuration $P_{V_i}$ in use at the moment. The performance and quality functions of a particular application may have undetermined behavior, so we propose to determine them empirically. With such performance statistics gathered on a particular resource, it becomes possible to calculate the dependency between performance and quality, $T_{V_i}(Q_{V_i})$, from empirical measurements. Just as frequency is inversely proportional to period, the service rate of an operator's instance is inversely proportional to the duration of a single operation, $\mu_{V_i} = 1 / T_{V_i}$, which in turn depends on the quality function $Q_{V_i}$. The total operator's service rate may be defined as follows:

$$\mu_{V_i}(Q_{V_i}) = \sum_{j=1}^{e} \frac{1}{T_{V_i}(Q_{V_i})},$$

where $e = \mathrm{const}$ is the number of parallel operator's instances, assumed to be fixed during an already launched streaming process. The service rate of the whole application graph $G$ is limited by the slowest operator's rate and is defined as follows:

$$\mu_G = \min\{\mu_{V_1}, \dots, \mu_{V_i}\}.$$

As a result, the proposed model makes it possible to calculate the acceptable range of data arrival rates $\lambda$ that can be served by the application with a given quality: queues do not build up as long as $\lambda \le \mu_G$. The model makes it possible to answer questions like “how will the quality change if the data arrival rate increases or decreases?”. The model may be used for either predictive or reactive workload scaling in already running applications.

3.2 Model assumptions

The proposed model does not cover deep application- and infrastructure-specific factors which impact performance. So the model makes the following assumptions:
1. The application graph is considered to be a black box: elastic quality of upstream operators does not impact the quality of downstream ones. In other words, quality changes do not have a recursive behavior.
2. The application is deployed on a fixed resource pool with a fixed schedule, $Sc = \mathrm{const}$, $R = \mathrm{const}$, and will not be rescheduled. Otherwise it becomes non-trivial to empirically gather the quality and performance statistics in the case of random operator placement on heterogeneous shared nodes.
3. Input data has a fixed structure and an unlimited arrival rate: $d_{V_i} = \mathrm{const}$, $\lambda \to \infty$. The model is considered to achieve maximum service rates on unlimited homogeneous data and does not cover cases when tuples have different sizes or fixed/random intervals between emissions.

4 Implementation

The proposed model was implemented as the quality manager (QM) component, an external component on top of Apache Storm. The component is intended for accumulating empirical statistics and controlling launched topologies according to those statistics. Although the model does not deal with scheduling, it is implemented as a Storm ICustomScheduler, which provides a good opportunity to influence the execution process from outside Storm's binaries. Quality-elastic topologies should contain operators (spouts and bolts) based on the IQualitySpout and IQualityBolt interfaces, which provide the QM with information about quality-sensitive parameters and their value ranges. The QM iterates over the parameter values and measures performance during an evaluation phase. The quality-sensitive operators communicate with the QM via a built-in REST API. The interactions between the QM and other Storm components are presented in Fig. 1.
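To illustrate the kind of contract a quality-elastic operator exposes to the QM, the sketch below defines a hypothetical Python analogue. The paper does not list the methods of IQualitySpout or IQualityBolt, so `quality_parameters` and `set_parameter`, as well as the `ResamplingBolt` example, are assumptions chosen for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class QualityOperator(ABC):
    """Hypothetical Python analogue of a quality-elastic spout/bolt.

    The paper's IQualitySpout/IQualityBolt are Java interfaces whose exact
    methods are not listed; these two methods are illustrative assumptions.
    """

    @abstractmethod
    def quality_parameters(self) -> Dict[str, List[Any]]:
        """Map each quality-sensitive parameter to the values the QM may try."""

    @abstractmethod
    def set_parameter(self, name: str, value: Any) -> None:
        """Apply a value chosen by the quality manager (e.g. received over REST)."""


class ResamplingBolt(QualityOperator):
    """Toy operator whose single quality knob is a sampling frequency in Hz."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {"frequency": 200}

    def quality_parameters(self) -> Dict[str, List[Any]]:
        return {"frequency": [50, 100, 200]}

    def set_parameter(self, name: str, value: Any) -> None:
        # Reject values the operator did not declare to the QM.
        if value not in self.quality_parameters()[name]:
            raise ValueError(f"{value!r} is not a declared value of {name!r}")
        self.params[name] = value
```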

Figure 1: Quality manager’s interaction diagram

4.1 Evaluation phase

For a newly launched and already scheduled topology, the QM periodically iterates over the operators' parameter values within the topology-specified range and captures the operators' service rates. The length of the evaluation period depends linearly on the number of quality-sensitive operators, the number of parameter values per operator and the control period for each configuration (30 seconds by default). For operator placement the QM uses one of the schedulers built into Storm (default, multitenant or Resource-Aware (Peng et al., 2015)). Although the evaluation process may take up to several hours, the procedure is fully automatic and produces reusable statistical knowledge, which is applicable to the particular scheduled topology. If the schedule is changed in the future, additional statistics will be gathered automatically during an ad-hoc evaluation process.
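A minimal sketch of the evaluation sweep, assuming the QM varies one operator's parameter at a time while the others stay fixed; this is the reading under which the duration is linear in the number of operators. `sweep`, `profile`, the operator names and the `measure` callback are hypothetical; in a real deployment `measure` would apply the value through the operator and read the service rate from Storm's statistics.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple


def sweep(operators: Dict[str, List[Any]]) -> Iterator[Tuple[str, Any]]:
    """One-at-a-time sweep: try every value of one operator, then the next."""
    for op_name, values in operators.items():
        for value in values:
            yield op_name, value


def evaluation_duration_s(operators: Dict[str, List[Any]],
                          control_period_s: float = 30.0) -> float:
    # Linear in the number of operators, the values per operator and the period.
    return control_period_s * sum(len(values) for values in operators.values())


def profile(operators: Dict[str, List[Any]],
            measure: Callable[[str, Any, float], float],
            control_period_s: float = 30.0) -> Dict[Tuple[str, Any], float]:
    """Build the (operator, value) -> observed service rate table.

    `measure(op_name, value, seconds)` is a hypothetical callback that applies
    the value and returns the service rate observed during one control period.
    """
    return {(op, v): measure(op, v, control_period_s) for op, v in sweep(operators)}


if __name__ == "__main__":
    ops = {"filterBolt": [1, 2, 3, 4], "classifyBolt": [50, 100, 200]}
    print(evaluation_duration_s(ops))  # 210.0 seconds for 7 configurations
```

A full Cartesian sweep over all operators would instead multiply the value counts (here 4 × 3 = 12 configurations rather than 7), so its duration would grow exponentially with the number of operators; the one-at-a-time assumption keeps it linear.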

4.2 Parameters tuning

The QM analyses the data arrival rate (i.e. the spouts' service rates) and compares it with the service rates of the operators in the topology. Because Storm has a backpressure feature (it pauses data emission when downstream components' queues are overloaded), the QM also analyses dropdowns in emission performance. If the ratio of arrival rate to service rate exceeds 100%, the QM modifies the quality-sensitive parameters to increase the operator's performance. If there are no dropdowns and the ratio is around or below 100% (i.e. there are no waiting queues), the QM tries to increase the operator's quality until queues start to occur. The QM also keeps a history of parameter changes, which may be used for further improvements of the algorithm.
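The tuning rule above can be written as a single control-period decision. This is a hedged sketch: the discrete quality levels, the one-step moves and the function name `tune_step` are assumptions, not details given by the paper.

```python
def tune_step(level: int, n_levels: int, arrival_rate: float,
              service_rate: float, emission_dropdown: bool) -> int:
    """One control-period decision of a hypothetical QM tuning loop.

    Levels 0..n_levels-1 are ordered from fastest/lowest quality to
    slowest/highest quality; the return value is the level for the next period.
    """
    load = arrival_rate / service_rate  # > 1.0 means a queue is building up
    if load > 1.0 or emission_dropdown:
        # Overload, or backpressure has throttled the spout: trade quality for speed.
        return max(level - 1, 0)
    # No queue and no dropdown: probe one step towards higher quality.
    return min(level + 1, n_levels - 1)


if __name__ == "__main__":
    level = 3  # start at the highest of four quality levels
    for arrival, service, dropdown in [(12, 10, False), (12, 11, True), (8, 14, False)]:
        level = tune_step(level, 4, arrival, service, dropdown)
        print(level)  # 2, then 1, then 2
```

Because the rule always probes one level higher when the operator keeps up, it will oscillate around the highest sustainable level; a practical controller might add hysteresis, for example stepping up only after several consecutive calm periods.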

Figure 2: a) quality (Q) and time dependencies on parameter (P); b) time and predicted service rate dependencies on quality (Q)

5 Experimental study

For the experimental study of the proposed approach we demonstrate the following applications: a simple synthetic workload and a complex medical data monitoring application.

5.1. Synthetic load application

For a simple demo application we implemented a topology with only two operators, the spout and the processBolt, each with a single instance. Let us assume that the quality function in our synthetic model is determined by the power function $q(p) = \sqrt{p}$ and that the computation time grows linearly, $t(p) = p + 50$, where $p$ is some quality-sensitive numerical parameter. Since the processing operator in our experiment is represented by a single instance, the overall application service rate $\mu$ depends only on the value of the parameter $p$ specified for the instance at the moment. Dependencies of quality and computation time on parameter $p$ are presented in Fig. 2a. In this particular experiment the values are calculated according to the known functions, but in the general case these dependencies are unknown and will be gathered automatically by iterating over all available parameter values. The dependency of computation time and predicted service rate $\mu$ on quality is presented in Fig. 2b. Results of an experimental topology launch on a real cluster are presented in Fig. 3a,b. The two experiments demonstrate standard and elastic behavior of the topology with different data arrival rates: 1, 5 and 10 tuples per second. The lower quality threshold was specified by the user at 50%. Results of the second experiment (Fig. 3b) demonstrate that the bolt (green line) dynamically adapts to changes of the data arrival rate (i.e. the spout's service rate, red line) with a corresponding loss of quality of results (Fig. 3c). It can be seen that dynamic workload scaling makes it possible to boost performance by up to 250% with a predictable quality loss (up to 45%). The boost of the service rate keeps Storm from pausing the spout's emission, which otherwise happens because of backpressure (dropdown during 37–41 min in Fig. 3a).
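The synthetic model can be reproduced numerically. The paper gives only the shapes $q(p) = \sqrt{p}$ and $t(p) = p + 50$; the sketch below additionally assumes that $t$ is measured in milliseconds, that $p$ is an integer from 1 to 400 and that quality is normalized to 1.0 at the largest $p$. These numbers are illustrative and are not meant to reproduce Fig. 2 or Fig. 3. `choose_p` is a hypothetical helper that applies the 50% lower quality threshold used in the experiment.

```python
import math

# Assumed parameter range and units; the paper specifies neither.
P_VALUES = range(1, 401)
P_MAX = max(P_VALUES)


def quality(p: int) -> float:
    """q(p) = sqrt(p), normalised so that the largest p has quality 1.0."""
    return math.sqrt(p) / math.sqrt(P_MAX)


def task_time_ms(p: int) -> float:
    """t(p) = p + 50, read here as milliseconds per tuple."""
    return p + 50.0


def service_rate(p: int) -> float:
    """Single bolt instance: mu = 1 / t(p), converted to tuples per second."""
    return 1000.0 / task_time_ms(p)


def choose_p(arrival_rate: float, min_quality: float = 0.5) -> int:
    """Highest p that keeps up with arrival_rate without dropping below min_quality.

    If no such p exists, return the fastest p that still meets min_quality;
    the bolt then falls behind and Storm's backpressure throttles the spout.
    """
    allowed = [p for p in P_VALUES if quality(p) >= min_quality]
    fast_enough = [p for p in allowed if service_rate(p) >= arrival_rate]
    return max(fast_enough) if fast_enough else min(allowed)


if __name__ == "__main__":
    for rate in (1, 5, 10):  # arrival rates used in the experiment, tuples/s
        p = choose_p(rate)
        print(rate, p, round(quality(p), 2), round(service_rate(p), 2))
```

Under these assumptions, `choose_p` returns 400 for 1 tuple/s and 150 for 5 tuples/s; for 10 tuples/s no value at or above the quality floor is fast enough, so it returns p = 100 (quality 0.5, about 6.7 tuples/s) and the remaining overload would be handled by backpressure.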

Figure 3: Non-scaled (A) and quality-based reactively-scaled (B) operator’s service rates; Quality elasticity (C) during automatic scaling

5.2. Medical data monitoring application

The second example of quality-based workload scaling is an electroencephalogram (EEG) monitoring topology, which plays the main role in an experimental prototype of a medical mass-serving system. The idea of the application is to classify current data from neural interface sensors and to predict the probability of epileptic seizures according to a previously trained model. The problem definition, datasets and classifier algorithms were taken from Kaggle‡‡ and GitHub§§. The Python source code was wrapped into Storm shell components to be launched as a multilanguage topology. The original source code was extended with elastic frequency and windowing capabilities, which significantly impact tuple volumes and processing time. The probability of epileptic seizures predicted from the original (200 Hz) and reduced-frequency data is presented in Fig. 4. The figure shows that the 50 and 100 Hz results deviate from the original by only 5% and 2% respectively, while the 150 Hz result demonstrates a rather strong quality dropdown (12%) and would be ineffective for scaling. Therefore, the set of tunable parameter values is defined by the user instead of iterating over parameter values within fixed thresholds, so ineffective parameter values are ignored by the system during the scaling process to avoid useless operations.
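The reason for a user-defined value set rather than a contiguous range can be shown with the deviations reported for Fig. 4: quality is not monotonic in frequency (150 Hz deviates more than 100 Hz or 50 Hz), so iterating between two thresholds would include a poor value. The sketch below filters candidate frequencies by a maximum acceptable deviation; the 10% cutoff and the function name `tunable_values` are assumptions for illustration, since in the paper the user lists the allowed values directly.

```python
from typing import Dict, List

# Deviation (percent) of seizure-probability predictions from the original
# 200 Hz signal, as reported for Fig. 4; 200 Hz is the reference itself.
DEVIATION_PCT: Dict[int, float] = {200: 0.0, 150: 12.0, 100: 2.0, 50: 5.0}


def tunable_values(deviation_pct: Dict[int, float], max_deviation_pct: float) -> List[int]:
    """Frequencies whose quality loss is acceptable, ordered best quality first."""
    kept = [f for f, d in deviation_pct.items() if d <= max_deviation_pct]
    return sorted(kept, key=lambda f: deviation_pct[f])


if __name__ == "__main__":
    print(tunable_values(DEVIATION_PCT, max_deviation_pct=10.0))  # [200, 100, 50]
```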

Figure 4: Accuracy of prediction of epileptic seizures for original (200 Hz) and frequency-reduced data

The dependencies between time and quality are presented in Fig. 5. The comparison between the auto-scaled and the static topology, together with the quality behavior during the experiment, is presented in Fig. 6. The figures show that the real service rate (40–200 tuples/min in Fig. 6b) closely matches the predicted one (40–190 tuples/min in Fig. 5b). The experiment demonstrates up to 400% better performance with up to 10% quality loss. For a mass-serving system with online patient monitoring, this elasticity makes it possible to launch extra risk-assessment computations for preventive treatment and hospitalization of patients with highly probable epileptic seizures.

‡‡ https://www.kaggle.com/c/seizure-prediction §§ https://github.com/MichaelHills/seizure-prediction
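The “up to 400%” figure follows from the endpoints of the measured range in Fig. 6b: raising the service rate from 40 to 200 tuples/min is a fivefold rate, i.e. an improvement of (200 − 40) / 40 = 400%. A one-line check, with a hypothetical helper name:

```python
def boost_percent(base_rate: float, scaled_rate: float) -> float:
    """Relative service-rate improvement, in percent."""
    return (scaled_rate - base_rate) / base_rate * 100.0


if __name__ == "__main__":
    print(boost_percent(40, 200))  # measured range in Fig. 6b: 400.0
    print(boost_percent(40, 190))  # predicted range in Fig. 5b: 375.0
```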

Figure 5: a) Time and quality dependency on parameter value (frequency). b) Time and predicted service rate dependency on quality

Figure 6: Non-scaled (a) and quality-based reactively-scaled (b) operator’s service rates for medical monitoring application; quality elasticity (c) during automatic scaling

6 Conclusion & future works

In this paper we have presented a quality-based workload scaling model for real-time streaming applications. The model has been successfully implemented as an external component for the Apache Storm engine. Experimental results demonstrating adaptive workload scaling for synthetic and real-world applications are also presented. The key feature of the proposed model is its ability to significantly boost an application's service rate with a controlled quality loss. Workload scaling is performed on a fixed set of resources, which makes mass-serving systems more flexible under unpredictable input loads. Quality-based algorithms are highly relevant in the field of mass-serving systems, especially for machine-learning (ML) tasks. A large body of state-of-the-art research is devoted to applying ML techniques to online stream processing. Our ongoing research will be devoted to the elaboration of more complex quality dependencies and of recursive quality impact: how the quality of upstream operators in a topology affects the quality of downstream ones. Variable quality levels make it possible to organize elastic mass-service systems with workload scaling on a fixed set of resources.

Acknowledgement: This paper is financially supported by the Ministry of Education and Science of the Russian Federation, Agreement #14.578.21.0077 (24.11.2014) RFMEFI57814X0077.

References

Ahluwalia, K. (2007). Design patterns. Proceedings of the 14th Conference on Pattern Languages of Programs, ACM, p. 2.
Beard, J. C., & Chamberlain, R. D. (2013). Simple analytic performance models for streaming data applications deployed on diverse architectures.
Butakov, N., et al. (2016). Quality-based approach to urgent workflows scheduling. Procedia Computer Science, 80, 2074–2085.
De Matteis, T., & Mencagli, G. (2016). Keep calm and react with foresight: strategies for low-latency and energy-efficient elastic data stream processing. Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ACM.
Heinze, T., et al. (2014). Latency-aware elastic scaling for distributed data stream processing systems. Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems, ACM, 13–22.
Heinze, T., et al. (2014). Auto-scaling techniques for elastic data stream processing. Data Engineering Workshops (ICDEW), 296–302.
Heinze, T., et al. (2015). FUGU: Elastic data stream processing with latency constraints. Data Engineering, 73.
Heinze, T., et al. (2015). Online parameter optimization for elastic data stream processing. Proceedings of the Sixth ACM Symposium on ..., ACM, 276–287.
Hidalgo, N., Wladdimiro, D., & Rosas, E. (2016). Self-adaptive processing graph with operator fission for elastic stream processing. Journal of Systems and Software.
Peng, B., et al. (2015). R-Storm: Resource-aware scheduling in Storm. Proceedings of the 16th Annual Middleware Conference, ACM, 149–161.
Xu, L., Peng, B., & Gupta, I. (2016). Stela: Enabling stream processing systems to scale-in and scale-out on-demand. IEEE International Conference on Cloud Engineering (IC2E).
