Large-Scale Stream Processing in the Hadoop Ecosystem

Gyula Fóra
Márton Balassi

2015-09-28 Apache: Big Data Europe

This talk
§ Stream processing by example
§ Open source stream processors
§ Runtime architecture and programming model
§ Counting words…
§ Fault tolerance and stateful processing
§ Closing

Stream processing by example

Streaming applications
ETL style operations
• Filter incoming data, Log analysis
• High throughput, connectors, at-least-once processing
Window aggregations
• Trending tweets, User sessions, Stream joins
• Window abstractions
[Diagram labels: Input, Process/Enrich]

Streaming applications
Machine learning
• Fitting trends to the evolving stream, Stream clustering
• Model state, cyclic flows
Pattern recognition
• Fraud detection, Triggering signals based on activity
• Exactly-once processing

Open source stream processors

Apache Streaming landscape

Apache Storm
§ Started in 2010, development driven by BackType, then Twitter
§ Pioneer in large scale stream processing
§ Distributed dataflow abstraction (spouts & bolts)

Apache Flink
§ Started in 2008 as a research project (Stratosphere) at European universities
§ Unique combination of low latency streaming and high throughput batch analysis
§ Flexible operator states and windowing
Stream data: Kafka, RabbitMQ, ...
Batch data: HDFS, JDBC, ...

Apache Spark
§ Started in 2009 at UC Berkeley, Apache since 2013
§ Very strong community, wide adoption
§ Unified batch and stream processing over a batch runtime
§ Good integration with batch programs

Apache Samza
§ Developed at LinkedIn, open sourced in 2013
§ Builds heavily on Kafka's log based philosophy
§ Pluggable messaging system and execution backend

System comparison
                 Storm            Spark             Samza               Flink
Streaming model  Native           Micro-batching    Native              Native
API              Compositional    Declarative       Compositional       Declarative
Fault tolerance  Record ACKs      RDD-based         Log-based           Checkpoints
Guarantee        At-least-once    Exactly-once      At-least-once       Exactly-once
State            Only in Trident  State as DStream  Stateful operators  Stateful operators
Windowing        Not built-in     Time based        Not built-in        Policy based
Latency          Very-Low         Medium            Low                 Low
Throughput       Medium           High              High                High

Runtime and programming model

Native Streaming

Distributed dataflow runtime
§ Storm, Samza and Flink
§ General properties
• Long standing operators
• Pipelined execution
• Usually possible to create cyclic flows
Pros
• Full expressivity
• Low-latency execution
• Stateful operators
Cons
• Fault-tolerance is hard
• Throughput may suffer
• Load balancing is an issue

Distributed dataflow runtime
§ Storm
• Dynamic typing + Kryo
• Dynamic topology rebalancing
§ Samza
• Almost every component pluggable
• Full task isolation, no backpressure (buffering handled by the messaging layer)
§ Flink
• Strongly typed streams + custom serializers
• Flow control mechanism
• Memory management

Micro-batching

Micro-batch runtime
§ Implemented by Apache Spark
§ General properties
• Computation broken down to time intervals
• Load aware scheduling
• Easy interaction with batch
Pros
• Easy to reason about
• High-throughput
• FT comes for “free”
• Dynamic load balancing
Cons
• Latency depends on batch size
• Limited expressivity
• Stateless by nature

Programming model
Compositional
§ Offer basic building blocks for composing custom operators and topologies
§ Advanced behavior such as windowing is often missing
§ Topology needs to be hand-optimized
Declarative
§ Expose a high-level API
§ Operators are higher order functions on abstract data stream types
§ Advanced behavior such as windowing is supported
§ Query optimization

Programming model
DStream, DataStream
§ Transformations abstract operator details
§ Suitable for engineers and data analysts
Spout, Consumer, Bolt, Task, Topology
§ Direct access to the execution graph / topology
• Suitable for engineers

Counting words…

WordCount
Input:
storm budapest flink
apache storm spark
streaming samza storm
flink apache flink
bigdata storm
flink streaming
Output:
(storm, 4)
(budapest, 1)
(flink, 4)
(apache, 2)
(spark, 1)
(streaming, 2)
(samza, 1)
(bigdata, 1)

Storm
Assembling the topology

    // Imports omitted: backtype.storm.* in Storm 0.x, org.apache.storm.* from 1.0 on.
    TopologyBuilder builder = new TopologyBuilder();
    // The last argument of setSpout/setBolt is the parallelism hint.
    builder.setSpout("spout", new SentenceSpout(), 5);
    builder.setBolt("split", new Splitter(), 8).shuffleGrouping("spout");
    // fieldsGrouping sends every occurrence of a word to the same Counter task.
    builder.setBolt("count", new Counter(), 12)
           .fieldsGrouping("split", new Fields("word"));

Rolling word count bolt

    public class Counter extends BaseBasicBolt {
        // Task-local counts; this map is lost if the task fails.
        Map<String, Integer> counts = new HashMap<String, Integer>();

        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String word = tuple.getString(0);
            Integer count = counts.containsKey(word) ? counts.get(word) + 1 : 1;
            counts.put(word, count);
            collector.emit(new Values(word, count));
        }

        // Required by BaseBasicBolt; not shown on the slide.
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word", "count"));
        }
    }

Samza
Rolling word count task

    public class WordCountTask implements StreamTask {
        // Initialization of the store (for example in InitableTask.init) is not shown on the slide.
        private KeyValueStore<String, Integer> store;

        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector,
                            TaskCoordinator coordinator) {
            // getMessage() returns Object, so the word has to be cast.
            String word = (String) envelope.getMessage();
            Integer count = store.get(word);
            if (count == null) { count = 0; }
            count = count + 1;
            store.put(word, count);
            // Emit the updated count; Tuple2 is the pair type used on the slide.
            collector.send(new OutgoingMessageEnvelope(
                new SystemStream("kafka", "wc"), Tuple2.of(word, count)));
        }
    }

Flink

    case class Word (word: String, frequency: Int)

Rolling word count

    val lines: DataStream[String] = env.fromSocketStream(...)
    lines.flatMap {line => line.split(" ")
           .map(word => Word(word,1))}
         .groupBy("word").sum("frequency")
         .print()

Window word count

    val lines: DataStream[String] = env.fromSocketStream(...)
    lines.flatMap {line => line.split(" ")
           .map(word => Word(word,1))}
         .window(Time.of(5,SECONDS)).every(Time.of(1,SECONDS))
         .groupBy("word").sum("frequency")
         .print()

Spark
Window word count
Rolling word count (kind of)
[Code for both examples was not captured in this transcript]

Fault tolerance and stateful processing

Fault tolerance intro
§ Fault-tolerance in streaming systems is inherently harder than in batch
• Can’t just restart computation
• State is a problem
• Fast recovery is crucial
• Streaming topologies run 24/7 for a long period
§ Fault-tolerance is a complex issue
• No single point of failure is allowed
• Guaranteeing input processing
• Consistent operator state
• Fast recovery
• At-least-once vs Exactly-once semantics

Storm record acknowledgements
§ Track the lineage of tuples as they are processed (anchors and acks)
§ Special “acker” bolts track each lineage DAG (efficient xor based algorithm)
§ Replay the root of failed (or timed out) tuples

Samza offset tracking
§ Exploits the properties of a durable, offset based messaging layer
§ Each task maintains its current offset, which moves forward as it processes elements
§ The offset is checkpointed and restored on failure (some messages might be repeated)

Flink checkpointing
§ Based on consistent global snapshots
§ Algorithm designed for stateful dataflows (minimal runtime overhead)
§ Exactly-once semantics

Spark RDD recomputation
§ Immutable data model with repeatable computation
§ Failed RDDs are recomputed using their lineage
§ Checkpoint RDDs to reduce lineage length
§ Parallel recovery of failed RDDs
§ Exactly-once semantics

State in streaming programs
§ Almost all non-trivial streaming programs are stateful
§ Stateful operators (in essence): f: (in, state) ⟶ (out, state')
§ State hangs around and can be read and modified as the stream evolves
§ Goal: Get as close as possible to this model while maintaining scalability and fault-tolerance

Storm
§ States available only in Trident API
§ Dedicated operators for state updates and queries
§ State access methods
• stateQuery(…)
• partitionPersist(…)
• persistentAggregate(…)
§ Exactly-once guarantee
§ It’s very difficult to implement transactional states

Spark
§ Stateless runtime by design
• No continuous operators
• UDFs are assumed to be stateless
§ State can be generated as a separate stream of RDDs: updateStateByKey(…)
  f: (Seq[in], state) ⟶ state'
§ f is scoped to a specific key
§ Exactly-once semantics

Samza
§ Stateful dataflow operators (Any task can hold state)
§ State changes are stored as a log by Kafka
§ Custom storage engines can be plugged in to the log
§ f is scoped to a specific task
§ At-least-once processing semantics

Flink
§ Stateful dataflow operators (conceptually similar to Samza)
§ Two state access patterns
• Local (Task) state
• Partitioned (Key) state
§ Proper API integration
• Java: OperatorState interface
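
The partitioned (key) state pattern can be illustrated with a framework-free Java sketch. It assumes the view that a stateful operator takes an input plus the current state and returns an output plus the updated state, with one state value kept per key. Every name in it (KeyedStateSketch, KeyedOperator, Step, wordCounter, snapshot, restore) is hypothetical and is not part of the Flink, Samza, Spark, or Storm APIs; snapshot and restore only hint at what a checkpoint would save.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Framework-free sketch; every name below is hypothetical, not a real streaming API.
final class KeyedStateSketch {

    /** Output and updated state produced by one call of f. */
    static final class Step<O, S> {
        final O output;
        final S state;
        Step(O output, S state) { this.output = output; this.state = state; }
    }

    /** Keeps one state value per key and applies f: (in, state) -> (out, state'). */
    static final class KeyedOperator<K, I, O, S> {
        private final Map<K, S> stateByKey = new HashMap<>();
        private final S initialState;
        private final BiFunction<I, S, Step<O, S>> f;

        KeyedOperator(S initialState, BiFunction<I, S, Step<O, S>> f) {
            this.initialState = initialState;
            this.f = f;
        }

        /** Looks up the state for the key, applies f, stores the new state, returns the output. */
        O process(K key, I input) {
            S current = stateByKey.getOrDefault(key, initialState);
            Step<O, S> step = f.apply(input, current);
            stateByKey.put(key, step.state);
            return step.output;
        }

        /** Copy of all per-key state, roughly what a checkpoint would persist. */
        Map<K, S> snapshot() { return new HashMap<>(stateByKey); }

        /** Replaces the current state with a previously taken snapshot. */
        void restore(Map<K, S> snapshot) {
            stateByKey.clear();
            stateByKey.putAll(snapshot);
        }
    }

    /** Rolling word count operator: the state is the count seen so far for the word. */
    static KeyedOperator<String, String, String, Integer> wordCounter() {
        return new KeyedOperator<String, String, String, Integer>(0,
            (word, count) -> new Step<String, Integer>(
                "(" + word + ", " + (count + 1) + ")", count + 1));
    }

    /** Splits each line on spaces and emits one "(word, count)" string per word. */
    static List<String> rollingWordCount(List<String> lines) {
        KeyedOperator<String, String, String, Integer> counter = wordCounter();
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                out.add(counter.process(word, word));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rollingWordCount(
            Arrays.asList("storm budapest flink", "apache storm spark")));
    }
}
```

Restoring a snapshot and replaying the inputs that arrived after it yields the same outputs as the original run, which is the property the checkpoint-based systems rely on for exactly-once state updates.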

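
The xor based tracking mentioned for Storm's record acknowledgements can be sketched in a few lines of Java. This is a simplified illustration, not Storm's implementation: XorAckerSketch and its methods are hypothetical names, tuple ids are picked by hand instead of being random 64-bit values, and timeouts and replay are left out. Each tuple id is XORed into a per-root value once when the tuple is emitted and once when it is acked, so the value returns to zero once every tuple in the tree has been acked (with random 64-bit ids an accidental zero is very unlikely).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of xor-based lineage tracking; not Storm's acker code.
final class XorAckerSketch {
    // rootId -> XOR of all tuple ids emitted or acked so far in that tuple tree
    private final Map<Long, Long> pending = new HashMap<>();

    /** Called when a spout emits a root tuple: start tracking its tree. */
    void track(long rootId, long rootTupleId) {
        pending.put(rootId, rootTupleId);
    }

    /**
     * Called when a bolt acks ackedId after emitting childIds anchored to it.
     * Returns true when the whole tree rooted at rootId is fully processed.
     */
    boolean ack(long rootId, long ackedId, long... childIds) {
        Long current = pending.get(rootId);
        if (current == null) {
            throw new IllegalStateException("unknown root " + rootId);
        }
        long value = current ^ ackedId;      // the acked id cancels its earlier emit
        for (long child : childIds) {
            value ^= child;                  // new children stay pending until acked
        }
        if (value == 0L) {
            pending.remove(rootId);
            return true;
        }
        pending.put(rootId, value);
        return false;
    }

    /** True while some tuple of the tree has not been acked yet. */
    boolean isPending(long rootId) {
        return pending.containsKey(rootId);
    }

    public static void main(String[] args) {
        XorAckerSketch acker = new XorAckerSketch();
        acker.track(1L, 5L);                           // sentence tuple with id 5
        System.out.println(acker.ack(1L, 5L, 9L, 12L)); // split into two word tuples
        System.out.println(acker.ack(1L, 9L));          // first word counted
        System.out.println(acker.ack(1L, 12L));         // second word counted
    }
}
```

The acker keeps a single long per root tuple regardless of how large the tuple tree grows, which is why the approach is described as efficient.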