Introduction to Stateful Stream Processing with Apache Flink
Robert Metzger @rmetzger_ [email protected]
2 Ververica
Original creators of Apache Flink®
Ververica Platform: open source Apache Flink + Application Manager
3 Apache Flink 101
4 Apache Flink
Apache Flink is an open source stream processing framework
• Low latency
• High throughput
• Stateful
• Distributed
5 What is Apache Flink?
Data Stream Processing: realtime results from data streams
Batch Processing: process static and historic data
Event-driven Applications: data-driven actions and services
Stateful Computations Over Data Streams
6 Use Case: Streaming ETL
Periodic ETL
● Periodic ETL is the traditional approach
  ○ External tool periodically triggers ETL batch job
Data Pipeline / Real-time ETL
● Data pipelines continuously move data
  ○ Ingestion with low latency
  ○ No artificial data boundaries
7 Use Case: Data Analytics
Batch Analytics
● Batch analytics is great for ad-hoc queries
  ○ Queries change faster than data
  ○ Interactive analytics / prototyping
Stream Analytics
● Stream analytics continuously processes data
  ○ Data changes faster than queries
  ○ Live / low latency results
  ○ No Lambda architecture required!
Use Case: Event-driven applications
Transactional Application
● Traditional application design
  ○ Compute & data tier architecture
  ○ React to and process events
  ○ State is stored in (remote) database
Event-driven Application
● Event-driven application
  ○ State is maintained locally
  ○ Guaranteed consistency by periodic state checkpoints
  ○ Tight coupling of logic and data (microservice architecture)
  ○ Highly scalable design
9 Hardened at scale
Details about their use cases and more users are listed on Flink’s website at https://flink.apache.org/poweredby.html
10 Case Study: Singles’ Day
● Chinese shopping festival
● Very high peak load
  ○ 100s of millions of records per second
  ○ 100s of thousands of payments per second
  ○ 10 TBs of managed state
  ○ 10s of thousands of cores
● Flink is used in various areas of the process, incl. payment, shipping, realtime recommendations and the giant dashboard
References
● https://www.ververica.com/blog/singles-day-2018-data-in-a-flink-of-an-eye
● https://medium.com/@alitech_2017/how-to-cope-with-peak-data-traffic-on-11-11-the-secrets-of-alibaba-stream-computing-17d5e807980c
11 Building blocks
12 The Core Building Blocks
Event Streams: real-time and hindsight
State: complex business logic
(Event) Time: consistency with out-of-order data and late data
Snapshots: forking / versioning / time-travel
13 Stateful Event & Stream Processing
// Source
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer(…))

// Transformation
val events: DataStream[Event] = lines.map((line) => parse(line))

// Transformation (keyed window, reads and writes state)
val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .aggregate(new MyAggregationFunction())

// Sink
stats.addSink(new RollingSink(path))

Streaming Dataflow: Source → Transform → Window (state read/write) → Sink
14 Stateful Event & Stream Processing
Source → Filter / Transform (state read/write) → Sink
15 Stateful Event & Stream Processing
Scalable embedded state
Access at memory speed & scales with parallel operators
16 Stateful Event & Stream Processing
Rolling back computation / re-processing
● Re-load state
● Reset positions in input streams
17 Time: Different Notions of Time
Figure: Event Producer → Message Queue (broker with partition 1 and partition 2) → Flink Data Source → Flink Window Operator.
Time labels along the pipeline: Event Time, Broker Time, Ingestion Time, Window Processing Time.
18 Time: Event Time Example
Episode (Event Time):           IV    V     VI    I     II    III   VII
Release year (Processing Time): 1977  1980  1983  1999  2002  2005  2015
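The film example can be restated as a small sketch: records arrive in processing-time order (release year), while the event time is carried inside each record (episode number). The `Release` type and the object name below are hypothetical and only serve this illustration; nothing here is Flink API.

```scala
object EventTimeExample {
  // Hypothetical record type: the episode number is the event time,
  // the release year is the time at which the record was "processed".
  final case class Release(episode: Int, year: Int)

  // Order of arrival (processing-time order), as listed on the slide.
  val byProcessingTime: List[Release] = List(
    Release(4, 1977), Release(5, 1980), Release(6, 1983),
    Release(1, 1999), Release(2, 2002), Release(3, 2005), Release(7, 2015)
  )

  // Order implied by the timestamp stored in the data (event-time order).
  val byEventTime: List[Release] = byProcessingTime.sortBy(_.episode)
}
```

Sorting by the embedded timestamp is only possible here because the input is finite; on an unbounded stream, the watermarks covered later in the deck take over that role.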
19 Recap: The Core Building Blocks
Event Streams: real-time and hindsight
State: complex business logic
(Event) Time: consistency with out-of-order data and late data
Snapshots: forking / versioning / time-travel
20 APIs
21 The APIs
Analytics: Stream SQL, Table API (dynamic tables)
Stream- & Batch Processing: DataStream API (streams, windows)
Stateful Event-Driven Applications: Process Function (events, state, time)
22 Process Function
class MyFunction extends ProcessFunction[MyEvent, Result] {

  // declare state to use in the program
  lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…)

  override def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = {
    // work with event and state
    (event, state.value) match { … }

    out.collect(…)   // emit events
    state.update(…)  // modify state

    // schedule a timer callback 500 ms (in event time) after this event
    ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
  }

  override def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = {
    // handle callback when the event-/processing-time instant is reached
  }
}
23 Data Stream API
// read raw lines from Kafka
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer[String](…))

// parse each line into an event
val events: DataStream[Event] = lines.map((line) => parse(line))

// aggregate per sensor over 5-second windows
val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .aggregate(new MyAggregationFunction())

// write the results out
stats.addSink(new RollingSink(path))
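To make the `keyBy` / `timeWindow` / aggregate step more concrete, the following is a minimal sketch in plain Scala collections, without any Flink dependency. It assumes non-negative millisecond timestamps and a simple sum as the aggregation; `Reading`, `windowStart` and `sumPerWindow` are hypothetical names introduced for this illustration.

```scala
object TumblingWindowSketch {
  // Hypothetical input record: one sensor reading with an event timestamp.
  final case class Reading(sensor: String, timestampMs: Long, value: Double)

  // Start of the tumbling window that contains the given timestamp
  // (assumes timestampMs >= 0 and sizeMs > 0).
  def windowStart(timestampMs: Long, sizeMs: Long): Long =
    timestampMs - (timestampMs % sizeMs)

  // Groups by (key, window) and sums the values of each group.
  def sumPerWindow(readings: Seq[Reading], sizeMs: Long): Map[(String, Long), Double] =
    readings
      .groupBy(r => (r.sensor, windowStart(r.timestampMs, sizeMs)))
      .map { case (keyAndWindow, rs) => keyAndWindow -> rs.map(_.value).sum }
}
```

Unlike this batch-style sketch, a streaming engine keeps one running aggregate per key and window as state and emits it when the window closes.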
24 Table API & Stream SQL
25 Deployment & Integrations
26 Deployment
27 Integrations
● Event logs:
  ○ Kafka, Kinesis, Pulsar*
● File systems:
  ○ S3, HDFS, NFS, MapR FS, ...
● Encodings:
  ○ Avro, JSON, CSV, ORC, Parquet
● Databases:
  ○ JDBC, HCatalog
● Key-Value Stores:
  ○ Cassandra, Elasticsearch, Redis*
28 Concluding…
29 Q & A
Get in touch via eMail: [email protected], [email protected]
Get in touch via Twitter: @rmetzger_, @ApacheFlink
31 What’s happening in Flink these days …
• INSERT INTO flink_sql SELECT * FROM blink_sql
• Turning Table API into an API unified across batch and streaming (FLINK-11439)
• Integration with Hive ecosystem (FLINK-10556)
• Batch runtime improvements: fine-grained recovery (FLINK-4256), more schedulers (FLINK-10429), pluggable shuffle service (FLINK-10653)
• Machine Learning Pipelines on Table API (FLIP-39)
• Table API: Caching of intermediate results (.cache() API) (FLINK-11199)
• Table API: Python support (FLINK-12308)
Figure: API stack with DataStream, Table / SQL and DataSet (deprecated) on top of the Stream Operator & DAG API and the Runtime
Implementation: State Checkpointing
34 State, Snapshots, Recovery
Coordination via markers, injected into the streams
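The marker idea can be illustrated with a minimal single-input sketch: barriers travel in the stream together with the records, and the operator copies its state whenever a barrier passes. The types `DataRecord` and `Barrier` and the per-key counter are assumptions made for this illustration; barrier alignment across several inputs and asynchronous persistence are deliberately left out.

```scala
object BarrierSketch {
  sealed trait StreamElement
  // Hypothetical data record carrying only a key.
  final case class DataRecord(key: String) extends StreamElement
  // Marker injected into the stream to trigger a snapshot.
  final case class Barrier(checkpointId: Long) extends StreamElement

  // Counts records per key; on each barrier, stores a copy of the counts.
  // Returns (final counts, snapshot per checkpoint id).
  def run(input: Seq[StreamElement]): (Map[String, Long], Map[Long, Map[String, Long]]) =
    input.foldLeft((Map.empty[String, Long], Map.empty[Long, Map[String, Long]])) {
      case ((counts, snapshots), DataRecord(key)) =>
        (counts.updated(key, counts.getOrElse(key, 0L) + 1L), snapshots)
      case ((counts, snapshots), Barrier(id)) =>
        (counts, snapshots.updated(id, counts))
    }
}
```

Because the barrier is ordered relative to the records, each snapshot reflects exactly the records that came before its barrier and none that came after.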
39 Implementation: Stream SQL
41 Stream SQL: Intuition
Stream / Table Duality
Table without Primary Key Table with Primary Key
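A rough way to picture the duality: a stream of changes can be folded into a table, and a table can be read as the result of applying a changelog. The sketch below assumes a hypothetical changelog encoding with add, modify and delete records; it is not the encoding used by Flink's SQL runtime.

```scala
object DualitySketch {
  sealed trait Change
  final case class Add(key: String, value: Int) extends Change
  final case class Mod(key: String, value: Int) extends Change
  final case class Del(key: String) extends Change

  // Table with primary key: every change updates or removes the row of its key.
  def materialize(changelog: Seq[Change]): Map[String, Int] =
    changelog.foldLeft(Map.empty[String, Int]) {
      case (table, Add(k, v)) => table.updated(k, v)
      case (table, Mod(k, v)) => table.updated(k, v)
      case (table, Del(k))    => table - k
    }

  // Table without primary key: the stream is append-only, each record is a row.
  def append(stream: Seq[(String, Int)]): Vector[(String, Int)] = stream.toVector
}
```

For example, the changelog `Add("a", 1), Add("b", 2), Mod("a", 5), Del("b")` materializes to a table with the single row `a -> 5`.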
42 Stream SQL: Implementation
Query Compilation
Differential Computation (add/mod/del)
43 Implementation: Rescaling
44 Rescaling State / Elasticity
▪ Similar to consistent hashing
▪ Split key space into key groups
▪ Assign key groups to tasks
Figure: key space split into key groups #1 to #4
45 Rescaling State / Elasticity
▪ Rescaling changes key group assignment
▪ Maximum parallelism defined by #key groups
▪ Rescaling happens through restoring a savepoint using the new parallelism
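The key group scheme can be sketched in a few lines. The version below is a simplification under stated assumptions: it uses the plain `hashCode` of the key (a real implementation would typically apply a further hash function), and it assigns contiguous ranges of key groups to tasks. The names `keyGroupFor` and `taskFor` are hypothetical.

```scala
object KeyGroupSketch {
  // Key group of a key. It depends only on the key and on the number of
  // key groups (maxParallelism), never on the current parallelism.
  def keyGroupFor(key: Any, maxParallelism: Int): Int =
    java.lang.Math.floorMod(key.hashCode, maxParallelism)

  // Task index that owns a key group: contiguous ranges of key groups
  // are mapped onto the currently running parallel tasks.
  def taskFor(keyGroup: Int, maxParallelism: Int, parallelism: Int): Int =
    keyGroup * parallelism / maxParallelism
}
```

With 4 key groups, a parallelism of 2 maps key groups 0 to 3 onto tasks 0, 0, 1, 1, and a parallelism of 4 maps them onto tasks 0, 1, 2, 3. Rescaling therefore moves whole key groups between tasks, and the number of key groups bounds the usable parallelism.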
46 Implementation: Time-handling
47 Time: Different Notions of Time
Figure: Event Producer → Message Queue (broker with partition 1 and partition 2) → Flink Data Source → Flink Window Operator.
Time labels along the pipeline: Event Time, Broker Time, Ingestion Time, Window Processing Time.
48 Time: Watermarks
Stream (in order)
  Event timestamps: 23 21 20 19 18 17 15 14 11 10 9 9 7
  Watermarks: W(20), W(11)
Stream (out of order)
  Event timestamps: 21 19 20 17 22 12 17 14 12 9 15 11 7
  Watermarks: W(17), W(11)
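One common way to produce watermarks for an out-of-order stream is to trail the highest timestamp seen so far by a fixed bound. The sketch below assumes that strategy and treats a watermark W(t) as the claim that no further events with timestamp ≤ t are expected; `trace` and its parameters are hypothetical names, and the code does not use Flink's watermark API.

```scala
object WatermarkSketch {
  // For each arriving timestamp, returns
  // (timestamp, watermark after this event, whether the event arrived late).
  // Assumes maxOutOfOrderness >= 0.
  def trace(timestamps: Seq[Long], maxOutOfOrderness: Long): List[(Long, Long, Boolean)] = {
    // Start low enough that the first watermark is Long.MinValue (no underflow).
    var maxSeen = Long.MinValue + maxOutOfOrderness
    timestamps.toList.map { ts =>
      val watermarkBefore = maxSeen - maxOutOfOrderness
      val late = ts <= watermarkBefore // violates the earlier watermark
      maxSeen = math.max(maxSeen, ts)
      (ts, maxSeen - maxOutOfOrderness, late)
    }
  }
}
```

For the arrival order 7, 11, 15, 9, 12 and a bound of 4, the watermark advances to 3, 7, 11 and then stays at 11; the event with timestamp 9 arrives after W(11) and is therefore late, while 12 is still on time.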
49 Time: Watermarks in Parallel
Figure: two parallel pipelines, Source (1) → map (1) → window (1) and Source (2) → map (2) → window (2). Events are labeled [id|timestamp] (e.g. Q|44, B|31, R|37, F|15) and watermarks W(33) and W(17) flow between the operators. Watermarks are generated at the sources; each operator shows the event time at its input streams and the event time at the operator.
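With several input streams, an operator can only advance its event time as far as its slowest input. The sketch below assumes the usual rule that the operator's event time is the minimum of the latest watermarks of its inputs; `OperatorClock` and its methods are hypothetical names for this illustration, and the map of inputs is assumed to be non-empty.

```scala
object ParallelWatermarkSketch {
  // Latest watermark per input channel of a single operator.
  final case class OperatorClock(inputWatermarks: Map[Int, Long]) {
    // Event time at the operator: minimum over all input channels
    // (assumes at least one channel is present).
    def eventTime: Long = inputWatermarks.values.min

    // A watermark moves its channel forward, never backward.
    def onWatermark(channel: Int, watermark: Long): OperatorClock = {
      val current = inputWatermarks.getOrElse(channel, Long.MinValue)
      copy(inputWatermarks = inputWatermarks.updated(channel, math.max(current, watermark)))
    }
  }
}
```

With inputs at 29 and 14, the operator's event time is 14; once W(17) arrives on the second input, it advances to 17.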
50 Implementation: Queryable State
51 Queryable State
Traditional
Queryable State
52 Queryable State: Implementation
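Query: /job/operation/state-name/key
(1) Query Client asks the State Location Server for the location of the "key-partition" for "operator" of "job"
(2) State Location Server looks up the location
(3) State Location Server responds with the location
(4) Query Client queries a State Registry for state-name and key
Figure: the Job Manager hosts the ExecutionGraph and the State Location Server and exchanges deploy / status messages with two Task Managers; each Task Manager runs window()/sum() with local state and registers it in its State Registry.
53 State, Snapshots, Recovery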
Events flow without replication or synchronous writes
State index (Hash Table or RocksDB)
Events are persistent and ordered (per partition / key) in the log (e.g., Apache Kafka)
Figure: source / transform → stateful operation
54 State, Snapshots, Recovery
Trigger checkpoint
Inject checkpoint barrier
55 State, Snapshots, Recovery
Take state snapshot
RocksDB: Trigger state copy-on-write
56 State, Snapshots, Recovery
Persist state snapshots
Durably persist snapshots asynchronously
Processing pipeline continues
57 Powerful Abstractions
Layered abstractions to navigate simple to complex use cases
High-level Analytics API: Stream SQL / Tables (dynamic tables)

Stream- & Batch Data Processing: DataStream API (streams, windows)

val stats = stream
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .reduce((a, b) => a.add(b))

Stateful Event-Driven Applications: Process Function (events, state, time)

def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {
  // work with event and state
  (event, state.value) match { … }

  out.collect(…)   // emit events
  state.update(…)  // modify state

  // schedule a timer callback
  ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}
58 Event Sourcing + Memory Image
● event / command → Process: update local variables/structures (main memory)
● Event log persists events (temporarily)
● Periodically snapshot the memory
59 Event Sourcing + Memory Image
Recovery: Restore snapshot and replay events since snapshot
event log
persists events (temporarily) Process
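The recovery rule on this slide (restore the snapshot, then replay the events logged since) fits in a short sketch. It assumes an in-memory log, a per-key counter as the "memory image", and a snapshot that remembers the log offset it was taken at; all names are hypothetical.

```scala
object MemoryImageSketch {
  type State = Map[String, Long]

  // A snapshot of the memory image plus the log offset it corresponds to.
  final case class Snapshot(state: State, offset: Int)

  // Applying one event updates the local state (here: a counter per key).
  def applyEvent(state: State, key: String): State =
    state.updated(key, state.getOrElse(key, 0L) + 1L)

  // Replays all events after the snapshot's offset and returns the new image.
  def replay(log: Vector[String], from: Snapshot): Snapshot =
    Snapshot(log.drop(from.offset).foldLeft(from.state)(applyEvent), log.length)
}
```

Replaying from a snapshot yields the same state as processing the whole log from the start, which is why the log only needs to retain events since the last snapshot.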
60 Distributed Memory Image
Distributed application, many memory images. Snapshots are all consistent together.