Introduction to Stateful Stream Processing with Apache Flink

Introduction to Stateful Stream Processing with Apache Flink Robert Metzger @rmetzger_ [email protected] 2 Ververica Platform Original creators of Open Source Apache Flink Apache Flink® + Application Manager 3 Apache Flink 101 4 Apache Flink § Apache Flink is an open source stream processing framework • Low latency • High throughput • Stateful • Distributed 5 What is Apache Flink? Data Stream Processing Batch Event-driven Processing realtime results from data streams Applications process static and data-driven actions historic data and services Stateful Computations Over Data Streams 6 Use Case: Streaming ETL ● Periodic ETL is the traditional Periodic ETL approach ○ External tool periodically triggers ETL batch job Data Pipeline / Real-time ETL ● Data pipelines continuously move data ○ Ingestion with low latency ○ No artificial data boundaries 7 Use Case: Data Analytics ● Batch analytics is great for ad-hoc Batch Analytics queries ○ Queries change faster than data ○ Interactive analytics / prototyping ● Stream analytics continuously Stream Analytics processes data ○ Data changes faster than queries ○ Live / low latency results ○ No Lambda architecture required! Use Case: Event-driven applications ● Traditional application design Transactional Application ○ Compute & data tier architecture ○ React to and process events ○ State is stored in (remote) database ● Event-driven application ○ State is maintained locally Event-driven Application ○ Guaranteed consistency by periodic state checkpoints ○ Tight coupling of logic and data (microservice architecture) ○ Highly scalable design 9 Hardened at scale Details about their use cases and more users are listed on Flink’s website at https://flink.apache.org/poweredby.html 10 Case Study: Single’s Day ● Chinese Shopping Festival ● Very high peak load ○ 100s millions records per second ○ 100s thousands payments per second ○ 10 TBs of managed state ○ 10s thousands of cores ● Flink used in various areas in the process incl. payment, shipping, realtime recommendations and the giant dashboard References ● https://www.ververica.com/blog/singles-day-2018-data-in-a-flink-of-an-eye ● https://medium.com/@alitech_2017/how-to-cope-with-peak-data-traffic-on-11-11-the-secrets-of- alibaba-stream-computing-17d5e807980c 11 Building blocks 12 The Core Building Blocks Event Streams State (Event) Time Snapshots consistency with forking / real-time and complex out-of-order data versioning / hindsight business logic and late data time-travel 13 Stateful Event & Stream Processing val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer(…)) Source val events: DataStream[Event] = lines.map((line) => parse(line)) Transformation val stats: DataStream[Statistic] = stream .keyBy("sensor") .timeWindow(Time.seconds(5)) Transformation .sum(new MyAggregationFunction()) stats.addSink(new RollingSink(path)) Sink Streaming Dataflow Source Transform Window Sink (state read/write) 14 Stateful Event & Stream Processing Filter / State Source Sink Transform read/write 15 Stateful Event & Stream Processing Scalable embedded state Access at memory speed & scales with parallel operators 16 Stateful Event & Stream Processing Rolling back computation Re-processing Re-load state Reset positions in input streams 17 Time: Different Notions of Time Flink Flink Event Producer Message Queue Data Source Window Operator partition 1 partition 2 Broker Event Ingestion Window Time Time Time Processing Time 18 Time: Event Time Example Event Time Episode Episode Episode Episode Episode Episode Episode IV V VI I II III VII 1977 1980 1983 1999 2002 2005 2015 Processing Time 19 Recap: The Core Building Blocks Event Streams State (Event) Time Snapshots consistency with forking / real-time and complex out-of-order data versioning / hindsight business logic and late data time-travel 20 APIs 21 The APIs Analytics Stream SQL Stream- & Batch Processing Table API (dynamic tables) 22 Stateful DataStream API (streams, windows) Event-Driven Applications Process Function (events, state, time) Process Function class MyFunction extends ProcessFunction[MyEvent, Result] { // declare state to use in the program lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…) def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = { // work with event and state (event, state.value) match { … } out.collect(…) // emit events state.update(…) // modify state // schedule a timer callback ctx.timerService.registerEventTimeTimer(event.timestamp + 500) } def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = { // handle callback when event-/processing- time instant is reached } } 23 Data Stream API val lines: DataStream[String] = env.addSource( new FlinkKafkaConsumer<>(…)) val events: DataStream[Event] = lines.map((line) => parse(line)) val stats: DataStream[Statistic] = stream .keyBy("sensor") .timeWindow(Time.seconds(5)) .sum(new MyAggregationFunction()) stats.addSink(new RollingSink(path)) 24 Table API & Stream SQL 25 Deployment & Integrations 26 Deployment 27 Integrations ● Event logs: ○ Kafka, Kinesis, Pulsar* ● File systems: ○ S3, HDFS, NFS, MapR FS, ... ● Encodings: ○ Avro, JSON, CSV, ORC, Parquet ● Databases: ○ JDBC, HCatalog ● Key-Value Stores ○ Cassandra, Elasticsearch, Redis* 28 Concluding… 29 Early Bird ticket sales ends July 15th Get in touch via eMail: Get in touch via Twitter: Q & A [email protected] @rmetzger_ [email protected] @ApacheFlink 31 What’s happening in Flink these days … • INSERT INTO flink_sql SELECT * FROM blink_sql DataStream Table / SQL • Turning Table API into an API unified across batch DataSet and streaming (FLINK-11439) (deprecated) Stream Operator & DAG API • Integration with Hive ecosystem (FLINK-10556) Runtime • Batch runtime improvements: Fine-grained recovery (FLINK-4256), more schedulers (FLINK-10429), pluggable shuffle service (FLINK-10653) • Machine Learning Pipelines on Table API (FLIP-39) • Table API: Caching of intermediate results (.cache() API) (FLINK-11199) • Table API: Python support (FLINK-12308) Implementation: State Checkpointing 34 State, Snapshots, Recovery Coordination via markers, injected into the streams 35 36 37 38 39 Implementation: Stream SQL 41 Stream SQL: Intuition Stream / Table Duality Table without Primary Key Table with Primary Key 42 Stream SQL: Implementation Differential Computation Query Compilation (add/mod/del) 43 Implementation: Rescaling 44 Rescaling State / Elasticity ▪ Similar to consistent hashing Key space ▪ Split key space into key groups Key group #1 Key group #2 ▪ Assign key groups to tasks Key group #4 Key group #3 45 Rescaling State / Elasticity ▪ Rescaling changes key group assignment ▪ Maximum parallelism defined by #key groups ▪ Rescaling happens through restoring a savepoint using the new parallism 46 Implementation: Time-handling 47 Time: Different Notions of Time Flink Flink Event Producer Message Queue Data Source Window Operator partition 1 partition 2 Broker Event Ingestion Window Time Time Time Processing Time 48 Time: Watermarks Stream (in order) 23 21 20 19 18 17 15 14 11 10 9 9 7 W(20) W(11) Event Watermark Event timestamp Stream (out of order) 21 19 20 17 22 12 17 14 12 9 15 11 7 W(17) W(11) Event Watermark Event timestamp 49 Time: Watermarks in Parallel Event Watermark [id|timestamp] 29 14 29 Q|44 N|39 M|39 Source K|35 map B|31 A|30 window (1) (1) (1) W(33) C|30 14 Watermark D|15 Event Time at input streams Generation W(17) E|30 29 17 14 Source map window R|37 O|23 L|22 H|20 G|18 F|15 (2) (2) (2) 14 W(17) Event Time at the operator 50 Implementation: Queryable State 51 Queryable State Traditional Queryable State 52 Queryable State: Implementation Query: /job/operation/state-name/key Query Client (1) Get location of "key-partition" (4) Query for "operator" of" job" state-name and key (3) Respond location State Location Server (2) Look up State State register location Registry Registry ExecutionGraph deploy window( window( local state status )/ )/ sum() sum() Job Manager Task Manager Task Manager 53 State, Snapshots, Recovery Events flow without replication or synchronous writes State index (Hash Table or RocksDB) Events are persistent and ordered (per partition / key) in the log (e.g., Apache Kafka) source / stateful transform operation 54 State, Snapshots, Recovery Trigger checkpoint Inject checkpoint barrier source / stateful transform operation 55 State, Snapshots, Recovery Take state snapshot RocksDB: Trigger state copy-on-write source / stateful transform operation 56 State, Snapshots, Recovery Persist state snapshots Processing pipeline continues Durably persist snapshots asynchronously source / stateful transform operation 57 Powerful Abstractions Layered abstractions to navigate simple to complex use cases High-level Analytics API Stream SQL / Tables (dynamic tables) val stats = stream Stream- & Batch .keyBy("sensor") DataStream API (streams, windows) .timeWindow(Time.seconds(5)) Data Processing .sum((a, b) -> a.add(b)) Stateful Event- Process Function (events, state, time) Driven Applications def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = { // work with event and state (event, state.value) match { … } out.collect(…) // emit events state.update(…) // modify state // schedule a timer callback ctx.timerService.registerEventTimeTimer(event.timestamp + 500) } 58 Event Sourcing + Memory Image periodically snapshot the memory main memory event / command event log update local variables/structures persists events (temporarily) Process 59 Event Sourcing + Memory Image Recovery: Restore snapshot and replay events since snapshot event log persists events (temporarily) Process 60 Distributed Memory Image Distributed application, many memory images. Snapshots are all consistent together. 61.

Load more