Large-Scale Stream Processing in the Hadoop Ecosystem
Gyula Fóra, Márton Balassi
This talk
§ Stream processing by example
§ Open source stream processors
§ Runtime architecture and programming model
§ Counting words…
§ Fault tolerance and stateful processing
§ Closing
2015-09-28 Apache: Big Data Europe
Stream processing by example
Streaming applications
§ ETL-style operations
  • Filter incoming data, log analysis
  • High throughput, connectors, at-least-once processing
§ Window aggregations
  • Trending tweets, user sessions, stream joins
  • Window abstractions
Streaming applications
§ Machine learning
  • Fitting trends to the evolving stream, stream clustering
  • Model state, cyclic flows
§ Pattern recognition
  • Fraud detection, triggering signals based on activity
  • Exactly-once processing
Open source stream processors
Apache Streaming landscape
Apache Storm
§ Started in 2010, development driven by BackType, then Twitter
§ Pioneer in large-scale stream processing
§ Distributed dataflow abstraction (spouts & bolts)
Apache Flink
§ Started in 2008 as a research project (Stratosphere) at European universities
§ Unique combination of low-latency streaming and high-throughput batch analysis
§ Flexible operator states and windowing
[Figure: Stream data: Kafka, RabbitMQ, ... / Batch data: HDFS, JDBC, ...]
Apache Spark
§ Started in 2009 at UC Berkeley, Apache project since 2013
§ Very strong community, wide adoption
§ Unified batch and stream processing over a batch runtime
§ Good integration with batch programs
Apache Samza
§ Developed at LinkedIn, open-sourced in 2013
§ Builds heavily on Kafka’s log-based philosophy
§ Pluggable messaging system and execution backend
System comparison
Aspect            | Storm           | Spark            | Samza              | Flink
Streaming model   | Native          | Micro-batching   | Native             | Native
API               | Compositional   | Declarative      | Compositional      | Declarative
Fault tolerance   | Record ACKs     | RDD-based        | Log-based          | Checkpoints
Guarantee         | At-least-once   | Exactly-once     | At-least-once      | Exactly-once
State             | Only in Trident | State as DStream | Stateful operators | Stateful operators
Windowing         | Not built-in    | Time-based       | Not built-in       | Policy-based
Latency           | Very low        | Medium           | Low                | Low
Throughput        | Medium          | High             | High               | High
Runtime and programming model
Native Streaming
Distributed dataflow runtime
§ Storm, Samza and Flink
§ General properties
  • Long-running operators
  • Pipelined execution
  • Usually possible to create cyclic flows
§ Pros
  • Full expressivity
  • Low-latency execution
  • Stateful operators
§ Cons
  • Fault tolerance is hard
  • Throughput may suffer
  • Load balancing is an issue
Distributed dataflow runtime
§ Storm
  • Dynamic typing + Kryo serialization
  • Dynamic topology rebalancing
§ Samza
  • Almost every component is pluggable
  • Full task isolation, no backpressure (buffering is handled by the messaging layer)
§ Flink
  • Strongly typed streams + custom serializers
  • Flow control mechanism
  • Memory management
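The "pipelined execution" and "flow control" points can be illustrated with a minimal single-JVM sketch, assuming each operator is a long-running thread and operators exchange records through small bounded buffers: when a buffer is full, `put()` blocks, so a fast upstream operator is slowed down instead of buffering without limit. This only mimics the idea of backpressure; it is not how Flink's network stack is implemented, and `PipelineSketch` and its end-of-stream marker are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PipelineSketch {
    // Hypothetical end-of-stream marker so the demo pipeline can terminate
    private static final String END = "<<end-of-stream>>";

    /** Runs source -> uppercase map -> sink as three long-running threads. */
    static List<String> run(List<String> input, int bufferSize) {
        // Bounded buffers: put() blocks when a buffer is full, throttling the upstream operator
        BlockingQueue<String> sourceToMap = new ArrayBlockingQueue<>(bufferSize);
        BlockingQueue<String> mapToSink = new ArrayBlockingQueue<>(bufferSize);
        List<String> output = new ArrayList<>();

        Thread source = new Thread(() -> {
            try {
                for (String record : input) sourceToMap.put(record);
                sourceToMap.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread map = new Thread(() -> {
            try {
                // Each record is forwarded as soon as it is processed (pipelined, no batching)
                for (String r = sourceToMap.take(); !r.equals(END); r = sourceToMap.take()) {
                    mapToSink.put(r.toUpperCase());
                }
                mapToSink.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread sink = new Thread(() -> {
            try {
                for (String r = mapToSink.take(); !r.equals(END); r = mapToSink.take()) {
                    output.add(r); // only the sink thread writes to this list
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        map.start();
        sink.start();
        try {
            source.join();
            map.join();
            sink.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("storm", "samza", "flink"), 1)); // [STORM, SAMZA, FLINK]
    }
}
```

With `bufferSize` set to 1 the source can never run more than one record ahead of the map operator; a larger buffer trades memory for smoother throughput.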
Micro-batching
Micro-batch runtime
§ Implemented by Apache Spark
§ General properties
  • Computation broken down into time intervals
  • Load-aware scheduling
  • Easy interaction with batch
§ Pros
  • Easy to reason about
  • High throughput
  • Fault tolerance comes for “free”
  • Dynamic load balancing
§ Cons
  • Latency depends on batch size
  • Limited expressivity
  • Stateless by nature
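A minimal sketch of the micro-batch model, assuming records carry a timestamp and are bucketed into fixed intervals by that field (Spark Streaming groups records by arrival time instead). Each interval is handed to an ordinary, stateless batch job; the names `MicroBatchSketch`, `Event` and `wordCount` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class MicroBatchSketch {
    /** A hypothetical input record with a timestamp in milliseconds. */
    record Event(long timestampMs, String word) {}

    /** Cuts the stream into fixed intervals and runs a batch job on each one. */
    static <R> Map<Long, R> run(List<Event> events, long batchIntervalMs,
                                Function<List<Event>, R> batchJob) {
        // 1. Assign every record to the interval [start, start + batchIntervalMs)
        Map<Long, List<Event>> batches = new TreeMap<>();
        for (Event e : events) {
            long start = e.timestampMs() - Math.floorMod(e.timestampMs(), batchIntervalMs);
            batches.computeIfAbsent(start, k -> new ArrayList<>()).add(e);
        }
        // 2. Run the same stateless batch job on each interval, independently
        Map<Long, R> results = new TreeMap<>();
        batches.forEach((start, batch) -> results.put(start, batchJob.apply(batch)));
        return results;
    }

    /** The per-batch job: an ordinary batch word count. */
    static Map<String, Integer> wordCount(List<Event> batch) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Event e : batch) counts.merge(e.word(), 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new Event(100, "storm"), new Event(900, "flink"),
                new Event(1200, "flink"), new Event(1900, "spark"));
        System.out.println(run(events, 1000, MicroBatchSketch::wordCount));
        // {0={flink=1, storm=1}, 1000={flink=1, spark=1}}
    }
}
```

A record that arrives at the start of an interval is not reported until the interval closes, which is why latency depends on the batch size.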
Programming model
Compositional
§ Offer basic building blocks for composing custom operators and topologies
§ Advanced behavior such as windowing is often missing
§ Topology needs to be hand-optimized
Declarative
§ Expose a high-level API
§ Operators are higher-order functions on abstract data stream types
§ Advanced behavior such as windowing is supported
§ Query optimization
Programming model
Declarative (DStream, DataStream)
§ Transformations abstract away operator details
§ Suitable for engineers and data analysts
Compositional (Spout, Consumer, Bolt, Task, Topology)
§ Direct access to the execution graph / topology
§ Suitable for engineers
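To contrast the two styles, the sketch below computes the same result twice: once by hand-wiring two operators through a hypothetical bolt-like `Operator` interface (compositional), and once with higher-order functions over an abstract stream type, here `java.util.stream` standing in for DStream/DataStream (declarative). Neither part uses the actual Storm, Spark or Flink APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

class ProgrammingModelSketch {
    /** Compositional style: a hypothetical, hand-wired operator interface (loosely bolt-like). */
    interface Operator {
        void process(String record, Consumer<String> emit);
    }

    static List<String> compositional(List<String> lines) {
        Operator splitter = (line, emit) -> Arrays.stream(line.split(" ")).forEach(emit);
        Operator filter = (word, emit) -> {
            if (!word.equals("apache")) emit.accept(word);
        };
        List<String> sink = new ArrayList<>();
        // The developer wires the topology explicitly: splitter -> filter -> sink
        for (String line : lines) {
            splitter.process(line, word -> filter.process(word, sink::add));
        }
        return sink;
    }

    /** Declarative style: higher-order functions on an abstract stream type. */
    static List<String> declarative(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .filter(word -> !word.equals("apache"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of("apache storm spark", "flink apache flink");
        System.out.println(compositional(lines)); // [storm, spark, flink, flink]
        System.out.println(declarative(lines));   // [storm, spark, flink, flink]
    }
}
```

In the compositional version the developer owns the wiring (and any optimization of it); in the declarative version the library decides how the chained transformations are executed.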
Counting words…
WordCount
Input:
storm budapest flink
apache storm spark
streaming samza storm
flink apache flink
bigdata storm
flink streaming
Output:
(storm, 4) (budapest, 1) (flink, 4) (apache, 2) (spark, 1) (streaming, 2) (samza, 1) (bigdata, 1)
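As a sanity check on the expected output, a plain batch word count over the same six input lines (a sketch using `java.util.stream`, not any of the streaming APIs; `WordCountReference` is a hypothetical name) yields the same totals. The streaming versions emit a running count per word, and the last count emitted for each word matches these totals.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordCountReference {
    /** Batch word count over all lines, sorted by word for readable output. */
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "storm budapest flink", "apache storm spark", "streaming samza storm",
                "flink apache flink", "bigdata storm", "flink streaming");
        System.out.println(count(lines));
        // {apache=2, bigdata=1, budapest=1, flink=4, samza=1, spark=1, storm=4, streaming=2}
    }
}
```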
Storm
Assembling the topology
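// Storm 1.x+ package names (the 2015-era releases used backtype.storm.*)
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new SentenceSpout(), 5);
builder.setBolt("split", new Splitter(), 8).shuffleGrouping("spout");
// fieldsGrouping sends every occurrence of a word to the same Counter task
builder.setBolt("count", new Counter(), 12)
       .fieldsGrouping("split", new Fields("word"));
Rolling word count bolt
public class Counter extends BaseBasicBolt {
    // Per-task running counts
    private final Map<String, Integer> counts = new HashMap<>();
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.containsKey(word) ? counts.get(word) + 1 : 1;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }
    // Required by Storm: names the fields of the emitted tuples
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}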
Samza
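Rolling word count task
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;
public class WordCountTask implements StreamTask {
    private static final SystemStream OUTPUT = new SystemStream("kafka", "wc");
    // Local key-value store; obtaining it from the task context (e.g. in InitableTask.init) is omitted
    private KeyValueStore<String, Integer> store;
    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        String word = (String) envelope.getMessage();
        Integer count = store.get(word);
        if (count == null) { count = 0; }
        count = count + 1;
        store.put(word, count);
        // Emit the updated count, keyed by the word
        collector.send(new OutgoingMessageEnvelope(OUTPUT, word, count));
    }
}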
Flink
// Flink 0.9-era DataStream API (later releases use keyBy and different window calls)
case class Word(word: String, frequency: Int)
Rolling word count
val lines: DataStream[String] = env.socketTextStream(...)
lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency")
  .print()
Window word count
val lines: DataStream[String] = env.socketTextStream(...)
lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .window(Time.of(5, SECONDS)).every(Time.of(1, SECONDS))
  .groupBy("word").sum("frequency")
  .print()
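The Flink window word count evaluates a 5-second window every second. A minimal sketch of those sliding-window semantics, assuming each record carries a timestamp and window ends are aligned to multiples of the slide; Flink's actual window assignment and triggering differ in detail, and `SlidingWindowSketch` is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SlidingWindowSketch {
    /** A hypothetical timestamped word (milliseconds). */
    record Event(long timestampMs, String word) {}

    /**
     * Counts words per window [end - sizeMs, end), where window ends are multiples
     * of slideMs. Windows overlap, so one event is counted in several windows.
     */
    static Map<Long, Map<String, Integer>> count(List<Event> events, long sizeMs, long slideMs) {
        Map<Long, Map<String, Integer>> windows = new TreeMap<>();
        for (Event e : events) {
            long t = e.timestampMs();
            // Smallest window end that is strictly after the event
            long end = (Math.floorDiv(t, slideMs) + 1) * slideMs;
            // Keep adding the event while the window still starts at or before it
            for (; end - sizeMs <= t; end += slideMs) {
                windows.computeIfAbsent(end, k -> new TreeMap<>()).merge(e.word(), 1, Integer::sum);
            }
        }
        return windows;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new Event(500, "flink"), new Event(1500, "flink"),
                new Event(6200, "storm"));
        Map<Long, Map<String, Integer>> windows = count(events, 5000, 1000);
        System.out.println(windows.get(2000L)); // {flink=2}
        System.out.println(windows.get(6000L)); // {flink=1}
    }
}
```

Because the windows overlap, the event at t = 1500 ms is counted in the five windows ending at 2000, 3000, 4000, 5000 and 6000 ms.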
Spark
Window word count
Rolling word count (kind of)
Fault tolerance and stateful processing
Fault tolerance intro
§ Fault tolerance in streaming systems is inherently harder than in batch
  • Can’t just restart the computation
  • State is a problem
  • Fast recovery is crucial
  • Streaming topologies run 24/7 for long periods
§ Fault tolerance is a complex issue
  • No single point of failure is allowed
  • Guaranteeing input processing
  • Consistent operator state
  • Fast recovery
  • At-least-once vs. exactly-once semantics
Storm record acknowledgements
§ Track the lineage of tuples as they are processed (anchors and acks)
§ Special “acker” bolts track each lineage DAG (efficient XOR-based algorithm)
§ Replay the root of failed (or timed-out) tuples
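The XOR trick behind the acker can be sketched in a few lines, assuming every tuple gets a random 64-bit id: the acker keeps one value per root tuple, XORs in each id once when the tuple is created and once when it is acked, and declares the tree fully processed when the value returns to zero. This is a simplified model of the idea, not Storm's acker implementation; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

class AckerSketch {
    // root tuple id -> XOR of the ids of all tuples in its tree that are not yet acked
    private final Map<Long, Long> pending = new HashMap<>();

    /** Random, non-zero 64-bit tuple id (a zero id would be invisible to XOR). */
    static long newId() {
        long id;
        do {
            id = ThreadLocalRandom.current().nextLong();
        } while (id == 0);
        return id;
    }

    /** The spout emitted a root tuple: its id enters the XOR once. */
    void rootEmitted(long rootId) {
        pending.put(rootId, rootId);
    }

    /**
     * A bolt acked inputId and emitted childIds anchored to it. The acker receives a
     * single value, inputId ^ child1 ^ child2 ..., so every id ends up XORed in twice
     * (once when created, once when acked). Returns true once the whole tree is done.
     */
    boolean ack(long rootId, long inputId, long... childIds) {
        long update = inputId;
        for (long child : childIds) update ^= child;
        long value = pending.merge(rootId, update, (a, b) -> a ^ b);
        if (value == 0) {
            pending.remove(rootId); // fully processed: the root no longer needs tracking
            return true;
        }
        return false;
    }

    /** Roots still pending when a timeout fires are the ones the spout would replay. */
    boolean isPending(long rootId) {
        return pending.containsKey(rootId);
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();
        long sentence = newId(), word1 = newId(), word2 = newId();
        acker.rootEmitted(sentence);
        acker.ack(sentence, sentence, word1, word2); // split bolt acks the sentence, emits two words
        acker.ack(sentence, word1);                  // counter bolt acks the first word
        System.out.println(acker.ack(sentence, word2)); // last ack completes the tree
    }
}
```

Memory per root tuple stays constant regardless of the size of its tree. A random collision could make the value reach zero early, but with 64-bit ids that probability is negligible.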
Samza offset tracking
§ Exploits the properties of a durable, offset-based messaging layer
§ Each task maintains its current offset, which moves forward as it processes elements
§ The offset is checkpointed and restored on failure (some messages might be repeated)
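A minimal sketch of offset checkpointing, assuming a replayable in-memory list stands in for a durable log partition and the offset is checkpointed every `checkpointInterval` messages. After a simulated crash the task resumes from the last checkpointed offset, so messages processed after that checkpoint are seen twice: the at-least-once behavior described above. The names are hypothetical, not Samza's API.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetCheckpointSketch {
    private final List<String> log;            // durable, replayable input (stands in for a log partition)
    private final int checkpointInterval;
    int checkpointedOffset = 0;                // survives crashes (kept durably in a real system)
    final List<String> processed = new ArrayList<>(); // every processing attempt, duplicates included

    OffsetCheckpointSketch(List<String> log, int checkpointInterval) {
        this.log = log;
        this.checkpointInterval = checkpointInterval;
    }

    /** Runs from the last checkpoint; simulates a crash after crashAfter messages (-1 = never). */
    boolean run(int crashAfter) {
        int offset = checkpointedOffset;       // restore the position on (re)start
        int handled = 0;
        while (offset < log.size()) {
            processed.add(log.get(offset));    // process the message
            offset++;                          // the current offset moves forward
            handled++;
            if (offset % checkpointInterval == 0) {
                checkpointedOffset = offset;   // periodic checkpoint of the offset only
            }
            if (handled == crashAfter) {
                return false;                  // crash: progress since the checkpoint is lost
            }
        }
        checkpointedOffset = offset;
        return true;
    }

    public static void main(String[] args) {
        OffsetCheckpointSketch task = new OffsetCheckpointSketch(List.of("a", "b", "c", "d", "e"), 2);
        task.run(3);   // crashes after a, b, c; the last checkpoint is offset 2
        task.run(-1);  // resumes at offset 2, so "c" is processed again
        System.out.println(task.processed); // [a, b, c, c, d, e]
    }
}
```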
Flink checkpointing
§ Based on consistent global snapshots
§ Algorithm designed for stateful dataflows (minimal runtime overhead)
§ Exactly-once semantics
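The core guarantee of snapshot-based recovery can be shown in the degenerate case of one source and one stateful operator on a single channel, assuming the barrier travels in-line with the records: the source offset and the operator state are captured at the same logical point in the stream, and recovery restores both together. Real Flink checkpoints also align barriers across multiple inputs and write snapshots asynchronously; `BarrierSnapshotSketch` is a hypothetical illustration, not Flink code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BarrierSnapshotSketch {
    /** A completed checkpoint: the source offset and a copy of the operator state, taken together. */
    record Checkpoint(int sourceOffset, Map<String, Integer> operatorState) {}

    private final List<String> log;            // replayable source
    private final int barrierInterval;         // the source injects a barrier every N records
    Checkpoint lastCompleted = new Checkpoint(0, Map.of());

    BarrierSnapshotSketch(List<String> log, int barrierInterval) {
        this.log = log;
        this.barrierInterval = barrierInterval;
    }

    /** Runs source -> counting operator; returns the final state, or null on a simulated crash. */
    Map<String, Integer> run(int crashAfter) {
        // Recovery: rewind the source AND restore the operator state from the same checkpoint
        int offset = lastCompleted.sourceOffset();
        Map<String, Integer> state = new HashMap<>(lastCompleted.operatorState());
        int handled = 0;
        while (offset < log.size()) {
            state.merge(log.get(offset), 1, Integer::sum); // stateful operator: count words
            offset++;
            handled++;
            if (offset % barrierInterval == 0) {
                // The barrier follows exactly the records before it, so the offset and the
                // state snapshot describe the same point in the stream
                lastCompleted = new Checkpoint(offset, Map.copyOf(state));
            }
            if (handled == crashAfter) {
                return null;                               // crash: live state is lost
            }
        }
        return state;
    }

    public static void main(String[] args) {
        BarrierSnapshotSketch job = new BarrierSnapshotSketch(List.of("a", "b", "a", "c", "a"), 2);
        job.run(3);                                    // crashes after a, b, a; checkpoint at offset 2
        System.out.println(new TreeMap<>(job.run(-1))); // {a=3, b=1, c=1}
    }
}
```

After the simulated crash, the replayed records are applied to the restored state rather than to the lost one, so each record is reflected in the final counts exactly once.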
Spark RDD recomputation
§ Immutable data model with repeatable computation
§ Failed RDDs are recomputed using their lineage
§ Checkpoint RDDs to reduce lineage length
§ Parallel recovery of failed RDDs
§ Exactly-once semantics
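A minimal sketch of lineage-based recovery, assuming deterministic transformations: each dataset remembers its parent and the function that produced it, so a lost partition is rebuilt by re-running that function on the parent partition (ultimately re-reading reliable storage). `Dataset` here is a hypothetical stand-in, not Spark's RDD class.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class LineageSketch {
    /** A hypothetical immutable, partitioned dataset that remembers how it was derived. */
    static final class Dataset {
        private final List<List<Integer>> storage;     // set only for datasets read from storage
        private final Dataset parent;                  // lineage: the dataset this one came from
        private final Function<Integer, Integer> fn;   // deterministic per-element transformation
        final Map<Integer, List<Integer>> cache = new HashMap<>(); // materialized partitions
        int recomputations = 0;

        private Dataset(List<List<Integer>> storage, Dataset parent, Function<Integer, Integer> fn) {
            this.storage = storage;
            this.parent = parent;
            this.fn = fn;
        }

        static Dataset fromStorage(List<List<Integer>> partitions) {
            return new Dataset(partitions, null, null);
        }

        /** Lazily derives a new dataset; nothing is computed yet. */
        Dataset map(Function<Integer, Integer> f) {
            return new Dataset(null, this, f);
        }

        /** Simulates losing a partition, e.g. because the machine holding it failed. */
        void losePartition(int i) {
            cache.remove(i);
        }

        /** Returns partition i, recomputing it from the lineage if it is not materialized. */
        List<Integer> partition(int i) {
            List<Integer> cached = cache.get(i);
            if (cached != null) {
                return cached;
            }
            List<Integer> result = (parent == null)
                    ? storage.get(i)                                   // re-read reliable input
                    : parent.partition(i).stream().map(fn).collect(Collectors.toList());
            recomputations++;
            cache.put(i, result);
            return result;
        }
    }

    public static void main(String[] args) {
        Dataset base = Dataset.fromStorage(List.of(List.of(1, 2), List.of(3, 4)));
        Dataset doubled = base.map(x -> x * 2);
        System.out.println(doubled.partition(1)); // [6, 8]
        doubled.losePartition(1);
        System.out.println(doubled.partition(1)); // [6, 8], rebuilt from the lineage
    }
}
```

Since partitions are independent, several lost partitions could be rebuilt in parallel; checkpointing a dataset would replace its lineage pointer with stored data and cut the recomputation chain short.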
State in streaming programs
§ Almost all non-trivial streaming programs are stateful
§ Stateful operators (in essence): $f: (\mathit{in}, \mathit{state}) \longrightarrow (\mathit{out}, \mathit{state}')$
§ State hangs around and can be read and modified as the stream evolves
§ Goal: get as close as possible to this model while maintaining scalability and fault tolerance
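The stateful operator signature f: (in, state) ⟶ (out, state') can be written down directly. The sketch below (hypothetical names, plain Java) threads the state from one element to the next; it is exactly this state that a streaming system has to partition for scalability and persist for fault tolerance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

class StatefulOperatorSketch {
    /** Result of one invocation: the emitted output and the updated state. */
    record Result<O, S>(O out, S state) {}

    /** Drives f: (in, state) -> (out, state') over a stream, threading the state through. */
    static <I, O, S> List<O> run(List<I> inputs, S initial, BiFunction<I, S, Result<O, S>> f) {
        List<O> outputs = new ArrayList<>();
        S state = initial;
        for (I in : inputs) {
            Result<O, S> r = f.apply(in, state);
            outputs.add(r.out());
            state = r.state(); // the state "hangs around" for the next element
        }
        return outputs;
    }

    public static void main(String[] args) {
        // Running sum: out = state' = state + in
        List<Integer> sums = run(List.of(3, 1, 4), 0,
                (in, sum) -> new Result<>(sum + in, sum + in));
        System.out.println(sums); // [3, 4, 8]
    }
}
```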
Storm
§ State is available only in the Trident API
§ Dedicated operators for state updates and queries
§ State access methods
  • stateQuery(…)
  • partitionPersist(…)
  • persistentAggregate(…)
§ Exactly-once guarantee
§ It’s very difficult to implement transactional states
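One reason transactional state is hard is that replayed batches must not be applied twice. The idea behind Trident's transactional state can be sketched as follows, assuming batches carry strictly increasing transaction ids, are applied in order, and a replayed batch always has the same id and contents: store the last transaction id next to each value and skip an update whose id is already stored. This is a simplified illustration, not the Trident `State` API; the names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class TransactionalStateSketch {
    /** Stored value plus the id of the batch that last updated it. */
    record Entry(long count, long txid) {}

    private final Map<String, Entry> store = new HashMap<>();

    /** Applies a batch's partial count for one key, skipping batches already applied. */
    void apply(long txid, String key, long delta) {
        Entry current = store.get(key);
        if (current != null && current.txid() == txid) {
            return; // replayed batch: this exact update is already reflected in the value
        }
        long base = (current == null) ? 0 : current.count();
        store.put(key, new Entry(base + delta, txid)); // value and txid are written together
    }

    long get(String key) {
        Entry e = store.get(key);
        return (e == null) ? 0 : e.count();
    }

    public static void main(String[] args) {
        TransactionalStateSketch state = new TransactionalStateSketch();
        state.apply(1, "storm", 2);
        state.apply(2, "storm", 1);
        state.apply(2, "storm", 1); // batch 2 is replayed after a failure
        System.out.println(state.get("storm")); // 3
    }
}
```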
Spark
§ Stateless runtime by design
  • No continuous operators
  • UDFs are assumed to be stateless
§ State can be generated as a separate stream of RDDs: updateStateByKey(…)
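A minimal sketch of the semantics of `updateStateByKey`, assuming the state of one batch is represented as a plain map: the next state is derived from the previous state and the current batch by an update function of the shape (values, old state) ⟶ new state, mirroring Spark's `(Seq[V], Option[S]) => Option[S]`. The function is called for every key that has old state or new values, and returning empty drops the key. `UpdateStateByKeySketch` and `step` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.BiFunction;

class UpdateStateByKeySketch {
    /** Derives the next state from the previous state and one batch of (key, value) pairs. */
    static <K, V, S> Map<K, S> step(Map<K, S> previous, List<Map.Entry<K, V>> batch,
                                    BiFunction<List<V>, Optional<S>, Optional<S>> update) {
        // Group the batch by key
        Map<K, List<V>> grouped = new HashMap<>();
        for (Map.Entry<K, V> kv : batch) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // Every key with old state or new values is updated, even if it got no new values
        Set<K> keys = new HashSet<>(previous.keySet());
        keys.addAll(grouped.keySet());
        Map<K, S> next = new HashMap<>(); // a fresh map: the previous state is never modified
        for (K key : keys) {
            update.apply(grouped.getOrDefault(key, List.of()), Optional.ofNullable(previous.get(key)))
                  .ifPresent(s -> next.put(key, s)); // an empty result drops the key
        }
        return next;
    }

    public static void main(String[] args) {
        BiFunction<List<Integer>, Optional<Integer>, Optional<Integer>> runningCount =
                (values, old) -> Optional.of(old.orElse(0) + values.stream().mapToInt(Integer::intValue).sum());
        Map<String, Integer> s1 = step(Map.of(),
                List.of(Map.entry("storm", 1), Map.entry("flink", 1), Map.entry("storm", 1)), runningCount);
        Map<String, Integer> s2 = step(s1, List.of(Map.entry("flink", 1)), runningCount);
        System.out.println(new TreeMap<>(s2)); // {flink=2, storm=2}
    }
}
```

Because the update function runs for every key that has state, in every batch, its cost grows with the total state size and not just with the size of the batch.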