Quality-Based Workload Scaling for Real-Time Streaming Systems

Procedia Computer Science Procedia Computer Science 101 , 2016 , Pages 323 – 332 YSC 2016. 5th International Young Scientist Conference on Computational Science,Science Quality-based workload scaling for real-time streaming systems Pavel A. Smirnov, Denis Nasonov ITMO University, St.Petersburg, Russia {smirnp, denis_nasonov}@niuitmo.ru Abstract In this paper we propose an idea to scale workload via elastic quality of solution provided by the particular streaming applications. The contribution of this paper consists of quality-based workload scaling model, implementation details for quality assessment mechanism implemented at the top of Apache Storm and experimental evaluation of the proposed model on a synthetic and real-world (medical) examples. Keywords: scaling model, data streaming, elastic workload, quality of service, apache storm, big data 1 Introduction Nowadays huge amounts of data are produced in real time manner. Autonomous vehicles, physical sensors, social-network activities, stock-exchange markets are typical producers of streaming data, which require immediate real-time processing to be actual in a short period of time. Performance optimization for real-time streaming applications is a complex non-trivial process. The goal of profiliration and optimization process is to find an optimal tradeoff between two states of an application: underutilization and overloading. A result of non-optimal performance occurs queues of tuples behind the overloaded operators. Open-source streaming engines like Apache Storm* are smart enough to handle overloads by pausing an emission if downside operators cannot serve it (this feature is called “backpressure’). But if amounts of data from third-party sources remain unchangeable, an overloaded application will accumulate an endless queues of data, which may be lost as a result of crash. While there are at least 5 ways to scale application performance in general (Ahluwalia, 2007), only two of them become popular in field of data streaming: to increase operator’s parallelism and to add extra hardware resources. The literature overview in field of scaling streaming applications outlined that these two patterns are usually applied simultaneously: parallelism increase is reached via allocation of new operators on a newly-added hardware. From economic point of view new hardware resources cause additional costs, so it makes sense for more effective operators’ placement (called “scheduling”), which also impacts on application’s service rate. * http://storm.apache.org/ Peer-review under responsibility of organizing committee of the scientific committee of the 323 5th International Young Scientist Conference on Computational Science © 2016 The Authors. Published by Elsevier B.V. doi: 10.1016/j.procs.2016.11.038 Quality-based workload scaling for real-time streaming systems Pavel A. Smirnov and Denis Nasonov In contrast with scheduling and resource scaling in this paper we propose the idea to scale workload via elastic quality of solution, which a streaming application provides. The contribution of this paper consists of: the quality-based workload scaling model for streaming applications; the implementation details of quality assessment mechanism implemented at the top of Apache Storm; experimental evaluation of the proposed model on the synthetic and real-world (medical) applications. The remainder of the paper is organized as follows: section II presents the literature study; the formal definition of the proposed model is presented in section III; the implementation details of quality assessment mechanism are described in section IV; section V presents the results of experimental evaluation; section VI contains conclusion and plans for the future work with potential use-cases regarding quality-based mass-serving applications. 2 Related works Today a workload scaling patterns for online streaming applications is the actual state-of-art problem. The widely spread open-source streaming engines like Apache Storm†, Samza‡, Flink§, Twitter Heron** and others cannot scale workload automatically, but may do it according to some signals from user. This lead to appearance of several simultaneous research efforts devoted to active and proactive workload scaling techniques. Paper (Xu, Peng, & Gupta, 2016) describes system called Stela, which provides active workload scaling via regular calculation of service rates (authors call it a congestion or ETP-metric) for all operators and increase it for the most ETP-slowest operators by allocating extra instances on newly allocated hardware. The Stela is implemented on top of the Apache Storm platform and automatically captures performance statistics from Storm API. Stela periodically refreshes operators’ ETPs and automatically scales-in and scales-out topologies in Apache Storm. The research (Heinze et al., 2015) proposes online optimization approach, which automatically detects changes in workload pattern, chooses and applies a scaling strategy to minimize number of hosts used for current workload characteristics. Changes of workload patterns are detected by an adaptive windows approach. The approach is implemented on top of FUGU (Heinze et al., n.d.), which is elastic data stream processing engine developed by the authors during previous research efforts. For overloaded hosts the system automatically decides which operators should be unchanged and which should be moved to a newly added resources. The decisions are made according to local and global threshold- based rules (Heinze, Pappalardo, Jerzak, & Fetzer, 2014), which deal with operators’ performance metrics. Authors also propose a QoS-based (Heinze, Jerzak, Hackenbroich, & Fetzer, 2014) approach for workload scaling, where latency is the main QoS constraint, usually strictly declared in service level agreement. In (De Matteis & Mencagli, 2016) authors propose latency latency-aware and energy-efficient scaling strategies with predictive capabilities. The strategies are made via adaptation of the Model Predictive Control – a technique for searching optimal applications’ configurations along a limited prediction horizon. The paper presents models, which describe dependency between QoS variables (latency, energy) and a configuration of the system (parallelism, CPU frequency and etc.). The result of optimization is a reconfiguration trajectory within a strictly-limited prediction horizon. Paper (Hidalgo, Wladdimiro, & Rosas, 2016) is devoted to reactive and predictive resource scaling according to workload demands. Authors apply optimization approach called fission to increase the amount of replicas for fully-loaded operators. The two algorithms are proposed to determine an operator’s state: a short-term and a mid-term state for peaks and patterns detection respectively. The † http://storm.apache.org/ ‡ http://samza.apache.org/ § http://flink.apache.org/ ** http://twitter.github.io/heron/ 324 Quality-based workload scaling for real-time streaming systems Pavel A. Smirnov and Denis Nasonov short-term algorithm classifies operator’s states according to predefined lower and upper thresholds. The mid-term algorithm uses a Markov Chain Model to calculate operators’ transition probabilities. The solution was implemented on top of S4 platform according to MAPE†† model. All the aforementioned papers provide either reactive or predictive workload scaling patterns via modification of a resource set (to add or to remove resource nodes). The major difference between aforementioned papers and our research is in subject for scale: we propose scaling via tuning of quality- sensitive application’s parameters instead of scaling via a resource set. The proposed quality-based scaling provides control for both: performance improvements and a quality loss. To the best of our knowledge there is no research regarding elastic quality for processing results. 3 Quality-based workload scaling In this section we present a formal definition for the proposed quality-based workload scaling model. The model is based on the existing models and extends them with novel formulations. 3.1 Model definition For the model we reuse some elements from third-party streaming application model based queuing theory (Beard & Chamberlain, 2013): ൌሺܸǡܧሻ – an application graph defined by set of vertex ܸ, which define operators and set of edges ܧ, which define flows of data; ɉሺܸ୧ሻ – mean operator’s data arrival rate; Ɋሺܸ୧ሻ – mean operator’s service rate; Here we introduce a matrix of operator’s instances defined as follows: ܹሺܸ୧ሻ=ሼݓଵ ǥݓ௝ሽ. To deal with operator’s performance and quality of it’s results we offer to reuse the performance model from quality-based approach for workflows (Butakov, Nasonov, Svitenkov, & Radice, 2016): ܶ௏௜ ൌ ݂ሺ݀ǡ݌ǡݎሻ – a performance model (calculation time) for operator ܸ௜, which depends on: ݀ = ݀௏௜ – characteristics of an input data (format, size, precision) arrived ݎ a particular resource – ܴא ൌ ௪೔ೕݎ ,௏௜ - one of available parameters’ configurationܲאto operator, ݌ node, where operator’s instance ݓ௜௝ will be launched (note, that ݌ and ݀ are the same for all instances :ݓ௜௝). Also we offer to reuse model of quality function and define for operator ܸ௜ is defined as follows ௏௜ ݍൌܳ ሺܳ௏௜ǡܲ௏௜ሻ, where ܳ௏௜ are quality parameters to measure, ܲ௏௜ ൌሼ݌ଵǡǤǤǤǡ݌௡ሽ parameter’s values, which impact on ܳ௏௜. To declare the operator’s allocation on a particular resource we introduce a schedule function: ܵܿ ൌ ݐ is a scheduling strategy. Theݏ ,is an application graph, ܴ is a resource pool ܩ ݐሻ, whereݏǡܴǡܩሺܨ aggregated elements notation is presented

Load more