Introduction to Stateful Stream Processing with Apache Flink

Robert Metzger @rmetzger_

2 Ververica

● Original creators of Apache Flink®
● Ververica Platform: Open Source Apache Flink + Application Manager

3 Apache Flink 101

4 Apache Flink

▪ Apache Flink is an open source stream processing framework
  • Low latency
  • High throughput
  • Stateful
  • Distributed

5 What is Apache Flink?

● Data Stream Processing: realtime results from data streams
● Batch Processing: process static and historic data
● Event-driven Applications: data-driven actions and services

Stateful Computations Over Data Streams

6 Use Case: Streaming ETL

Periodic ETL
● Periodic ETL is the traditional approach
  ○ External tool periodically triggers ETL batch job

Data Pipeline / Real-time ETL
● Data pipelines continuously move data
  ○ Ingestion with low latency
  ○ No artificial data boundaries

7 Use Case: Data Analytics

Batch Analytics
● Batch analytics is great for ad-hoc queries
  ○ Queries change faster than data
  ○ Interactive analytics / prototyping

Stream Analytics
● Stream analytics continuously processes data
  ○ Data changes faster than queries
  ○ Live / low latency results
  ○ No Lambda architecture required!

Use Case: Event-driven applications

Transactional Application
● Traditional application design
  ○ Compute & data tier architecture
  ○ React to and process events
  ○ State is stored in (remote) database

Event-driven Application
● Event-driven application
  ○ State is maintained locally
  ○ Guaranteed consistency by periodic state checkpoints
  ○ Tight coupling of logic and data (microservice architecture)
  ○ Highly scalable design

9 Hardened at scale

Details about users' use cases, and a list of more users, are on Flink's website at https://flink.apache.org/poweredby.html

10 Case Study: Singles' Day

● Chinese shopping festival
● Very high peak load
  ○ 100s of millions of records per second
  ○ 100s of thousands of payments per second
  ○ 10 TBs of managed state
  ○ 10s of thousands of cores
● Flink is used in various areas of the process, incl. payment, shipping, realtime recommendations, and the giant dashboard

References
● https://www.ververica.com/blog/singles-day-2018-data-in-a-flink-of-an-eye
● https://medium.com/@alitech_2017/how-to-cope-with-peak-data-traffic-on-11-11-the-secrets-of-alibaba-stream-computing-17d5e807980c

11 Building blocks

12 The Core Building Blocks

● Event Streams: real-time and hindsight
● State: complex business logic
● (Event) Time: consistency with out-of-order data and late data
● Snapshots: forking / versioning / time-travel

13 Stateful Event & Stream Processing

// Source
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer(…))

// Transformation
val events: DataStream[Event] = lines.map((line) => parse(line))

// Transformation: keyed 5-second windows, aggregated per sensor
val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .aggregate(new MyAggregationFunction())

// Sink
stats.addSink(new RollingSink(path))

Streaming Dataflow: Source → Transform → Window (state read/write) → Sink

14 Stateful Event & Stream Processing

[Figure: Source → Filter / Transform (state read/write) → Sink]

15 Stateful Event & Stream Processing

Scalable embedded state

Access at memory speed & scales with parallel operators

16 Stateful Event & Stream Processing

Rolling back computation / Re-processing

● Re-load state
● Reset positions in input streams

17 Time: Different Notions of Time

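[Figure: Event Producer → Message Queue (broker; partition 1, partition 2) → Flink Data Source → Flink Window Operator. Timestamps along the path: Event Time (producer), Broker Time (message queue), Ingestion Time (Flink data source), Window Processing Time (Flink window operator).]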

18 Time: Event Time Example

Event Time (episode):            IV    V     VI    I     II    III   VII
Processing Time (release year):  1977  1980  1983  1999  2002  2005  2015
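The two orderings in this example can be reproduced in a few lines of plain Scala. This is a minimal sketch without Flink; `Film`, `episode`, and `released` are illustrative names that are not part of the slides, with the episode number standing in for event time and the release year for processing time.

```scala
object EventTimeSketch {
  // episode = event time (order within the story)
  // released = processing time (order in which the audience received it)
  final case class Film(episode: Int, released: Int)

  val films: List[Film] = List(
    Film(4, 1977), Film(5, 1980), Film(6, 1983),
    Film(1, 1999), Film(2, 2002), Film(3, 2005), Film(7, 2015)
  )

  // Episodes in the order they arrived.
  def byProcessingTime(fs: List[Film]): List[Int] =
    fs.sortBy(_.released).map(_.episode)

  // Episodes in the order the events happened.
  def byEventTime(fs: List[Film]): List[Int] =
    fs.sortBy(_.episode).map(_.episode)

  def main(args: Array[String]): Unit = {
    println(byProcessingTime(films).mkString(" ")) // 4 5 6 1 2 3 7
    println(byEventTime(films).mkString(" "))      // 1 2 3 4 5 6 7
  }
}
```

An event-time window over episodes 1 to 3 would have to wait for data that arrives decades after episodes 4 to 6, which is the situation watermarks are designed to handle.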

19 Recap: The Core Building Blocks

● Event Streams: real-time and hindsight
● State: complex business logic
● (Event) Time: consistency with out-of-order data and late data
● Snapshots: forking / versioning / time-travel

20 APIs

21 The APIs

● Analytics: Stream SQL / Table API (dynamic tables)
● Stream- & Batch Processing: DataStream API (streams, windows)
● Stateful Event-Driven Applications: Process Function (events, state, time)

22 Process Function

class MyFunction extends ProcessFunction[MyEvent, Result] {

  // declare state to use in the program
  lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…)

  override def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = {
    // work with event and state
    (event, state.value) match { … }

    out.collect(…)   // emit events
    state.update(…)  // modify state

    // schedule a timer callback
    ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
  }

  override def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = {
    // handle callback when event-/processing-time instant is reached
  }
}

23 Data Stream API

val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer[String](…))

val events: DataStream[Event] = lines.map((line) => parse(line))

val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .aggregate(new MyAggregationFunction())

stats.addSink(new RollingSink(path))
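To make the keyed window in this pipeline concrete, here is a minimal plain-Scala sketch of what a per-sensor 5-second tumbling window sum computes. It does not use Flink: `Reading`, `windowStart`, and `sumPerWindow` are hypothetical names, timestamps are assumed to be non-negative milliseconds, and the input is a finite collection, whereas Flink would compute the same result incrementally over an unbounded stream.

```scala
object WindowSketch {
  final case class Reading(sensor: String, timestampMs: Long, value: Double)

  // Start of the tumbling window that contains the given timestamp.
  def windowStart(timestampMs: Long, sizeMs: Long): Long =
    timestampMs - (timestampMs % sizeMs)

  // Groups readings by (sensor, window start) and sums the values in each group.
  def sumPerWindow(readings: Seq[Reading], sizeMs: Long): Map[(String, Long), Double] =
    readings
      .groupBy(r => (r.sensor, windowStart(r.timestampMs, sizeMs)))
      .map { case (key, rs) => key -> rs.map(_.value).sum }

  def main(args: Array[String]): Unit = {
    val readings = Seq(
      Reading("a", 1000L, 1.0), Reading("a", 4000L, 2.0),
      Reading("a", 6000L, 5.0), Reading("b", 2000L, 10.0)
    )
    // One line per (sensor, window) pair with its sum.
    sumPerWindow(readings, 5000L).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The grouping key plays the role of `keyBy("sensor")` combined with the window assigner; the per-group sum plays the role of the aggregation function.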

24 Table API & Stream SQL

25 Deployment & Integrations

26 Deployment

27 Integrations

● Event logs:
  ○ Kafka, Kinesis, Pulsar*
● File systems:
  ○ S3, HDFS, NFS, MapR FS, ...
● Encodings:
  ○ Avro, JSON, CSV, ORC, Parquet
● Databases:
  ○ JDBC, HCatalog
● Key-Value Stores:
  ○ Cassandra, , Redis*

28 Concluding…

29 Early Bird ticket sales end July 15th

Q & A

Get in touch via Twitter: @rmetzger_, @ApacheFlink

31 What’s happening in Flink these days …

• INSERT INTO flink_sql SELECT * FROM blink_sql
• Turning Table API into an API unified across batch and streaming (FLINK-11439)
• Integration with Hive ecosystem (FLINK-10556)

[Figure: API stack with DataStream, DataSet (deprecated), and Table / SQL on top of the Stream Operator & DAG API, which sits on the Runtime]

• Batch runtime improvements: Fine-grained recovery (FLINK-4256), more schedulers (FLINK-10429), pluggable shuffle service (FLINK-10653)
• Pipelines on Table API (FLIP-39)
• Table API: Caching of intermediate results (.cache() API) (FLINK-11199)
• Table API: Python support (FLINK-12308)

Implementation: State Checkpointing

34 State, Snapshots, Recovery

Coordination via markers, injected into the streams
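The idea of markers travelling inside the stream can be sketched in plain Scala. This is a simplified illustration under stated assumptions, not Flink's implementation: there is a single input channel (so no barrier alignment is needed), the operator state is a running sum, and `Record`, `Barrier`, and `OperatorState` are hypothetical names.

```scala
object BarrierSketch {
  // Stream elements are either data records or checkpoint barriers (markers).
  sealed trait Element
  final case class Record(value: Int) extends Element
  final case class Barrier(checkpointId: Long) extends Element

  // Running sum plus the snapshots taken so far (checkpoint id -> sum at the barrier).
  final case class OperatorState(sum: Int, snapshots: Map[Long, Int])

  // Records update the state; a barrier captures the state as of that point in the stream.
  def process(stream: Seq[Element]): OperatorState =
    stream.foldLeft(OperatorState(0, Map.empty)) {
      case (st, Record(v))   => st.copy(sum = st.sum + v)
      case (st, Barrier(id)) => st.copy(snapshots = st.snapshots + (id -> st.sum))
    }

  def main(args: Array[String]): Unit = {
    val stream = Seq(Record(1), Record(2), Barrier(1L), Record(3), Barrier(2L), Record(4))
    println(process(stream)) // OperatorState(10,Map(1 -> 3, 2 -> 6))
  }
}
```

Each snapshot reflects exactly the records that preceded its barrier, which is why the barrier's position in the stream, rather than a wall-clock instant, defines the checkpoint.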

Implementation: Stream SQL

41 Stream SQL: Intuition

Stream / Table Duality

Table without Primary Key Table with Primary Key
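One way to read the duality is that the same stream materializes into different tables depending on whether a primary key is defined. The following plain-Scala sketch illustrates that reading; `Update`, `appendTable`, and `upsertTable` are hypothetical names, and this is not how Flink's Table API is implemented.

```scala
object DualitySketch {
  final case class Update(key: String, value: Int)

  // Without a primary key the table is append-only: one row per stream element.
  def appendTable(stream: Seq[Update]): Vector[Update] =
    stream.toVector

  // With a primary key each element is an upsert: the latest value per key wins.
  def upsertTable(stream: Seq[Update]): Map[String, Int] =
    stream.foldLeft(Map.empty[String, Int])((table, u) => table + (u.key -> u.value))

  def main(args: Array[String]): Unit = {
    val stream = Seq(Update("a", 1), Update("b", 2), Update("a", 3))
    println(appendTable(stream).size) // 3 rows
    println(upsertTable(stream))      // Map(a -> 3, b -> 2)
  }
}
```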

42 Stream SQL: Implementation

Query Compilation
Differential Computation (add/mod/del)

43 Implementation: Rescaling

44 Rescaling State / Elasticity

▪ Similar to consistent hashing
▪ Split key space into key groups
▪ Assign key groups to tasks

[Figure: key space split into key groups #1 to #4]

45 Rescaling State / Elasticity

▪ Rescaling changes key group assignment
▪ Maximum parallelism defined by #key groups
▪ Rescaling happens through restoring a savepoint using the new parallelism
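The key-group scheme can be illustrated with a small plain-Scala sketch. It rests on assumptions: `keyGroupFor` uses the key's `hashCode` as a simplified stand-in for Flink's actual hash function, and `taskFor` assigns contiguous ranges of key groups to tasks, which is modeled on my understanding of Flink's behavior rather than taken from the slides. All names are hypothetical.

```scala
object KeyGroupSketch {
  // Assumption: simplified hash. A key always maps to the same key group,
  // independent of the current parallelism.
  def keyGroupFor(key: Any, maxParallelism: Int): Int =
    java.lang.Math.floorMod(key.hashCode, maxParallelism)

  // Contiguous ranges of key groups are assigned to each parallel task.
  def taskFor(keyGroup: Int, maxParallelism: Int, parallelism: Int): Int =
    keyGroup * parallelism / maxParallelism

  // Task index -> key groups owned by that task.
  def assignment(maxParallelism: Int, parallelism: Int): Map[Int, Seq[Int]] =
    (0 until maxParallelism).groupBy(kg => taskFor(kg, maxParallelism, parallelism))

  def main(args: Array[String]): Unit = {
    // Four key groups, as in the figure: rescaling from 2 to 4 tasks only
    // moves whole key groups; keys never change their key group.
    println(assignment(4, 2).toSeq.sortBy(_._1))
    println(assignment(4, 4).toSeq.sortBy(_._1))
  }
}
```

Because a task must own at least one key group to have any work, the number of key groups bounds the useful parallelism, which matches the "maximum parallelism" bullet.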

46 Implementation: Time-handling

47 Time: Different Notions of Time

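[Figure: Event Producer → Message Queue (broker; partition 1, partition 2) → Flink Data Source → Flink Window Operator. Timestamps along the path: Event Time (producer), Broker Time (message queue), Ingestion Time (Flink data source), Window Processing Time (Flink window operator).]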

48 Time: Watermarks

Stream (in order)

Event timestamps: 23 21 20 19 18 17 15 14 11 10 9 9 7
Watermarks: W(20), W(11)

Stream (out of order)

Event timestamps: 21 19 20 17 22 12 17 14 12 9 15 11 7
Watermarks: W(17), W(11)
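A watermark W(t) asserts that no more events with a timestamp of t or lower are expected. One common way to generate watermarks is to trail the highest timestamp seen so far by a fixed bound; the plain-Scala sketch below uses that strategy, which is an assumption and not necessarily the one the figure depicts. `markLate` and `maxOutOfOrderness` are hypothetical names, and the example input assumes the rightmost element of the out-of-order stream arrives first.

```scala
object WatermarkSketch {
  // For each event, in arrival order, returns (timestamp, isLate).
  // An event is late if its timestamp is not greater than the watermark
  // that was in effect when it arrived.
  def markLate(arrivals: Seq[Long], maxOutOfOrderness: Long): Seq[(Long, Boolean)] = {
    val start = (Long.MinValue, Vector.empty[(Long, Boolean)])
    val (_, result) = arrivals.foldLeft(start) {
      case ((watermark, acc), ts) =>
        val late = ts <= watermark
        // The watermark trails the highest timestamp seen and never moves backwards.
        val nextWatermark = math.max(watermark, ts - maxOutOfOrderness)
        (nextWatermark, acc :+ ((ts, late)))
    }
    result
  }

  def main(args: Array[String]): Unit = {
    val arrivals = Seq[Long](7, 11, 15, 9, 12, 14, 17, 12, 22, 17, 20, 19, 21)
    val late = markLate(arrivals, 4L).collect { case (ts, true) => ts }
    println(late.mkString(" ")) // 9 12 17
  }
}
```

A larger bound tolerates more disorder at the cost of latency: with a bound of 10 none of these events is late, but every window result is delayed accordingly.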

49 Time: Watermarks in Parallel

[Figure: two parallel pipelines, Source (1) → map (1) → window (1) and Source (2) → map (2) → window (2). Events are labeled id|timestamp (A|30 through R|37). Watermarks such as W(33) and W(17) are generated at the sources; each operator is annotated with the event time at its input streams and the resulting event time at the operator.]
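With several input streams, an operator can only advance its event time as far as its slowest input allows. The plain-Scala sketch below models that rule as the minimum over per-channel watermarks; `OperatorClock`, `onWatermark`, and `initial` are hypothetical names, and channels are identified by integer index as a simplification.

```scala
object ParallelWatermarkSketch {
  // Tracks the latest watermark per input channel.
  final case class OperatorClock(inputWatermarks: Map[Int, Long]) {

    // Records a watermark from one channel; watermarks never move backwards.
    def onWatermark(channel: Int, watermark: Long): OperatorClock = {
      val current = inputWatermarks.getOrElse(channel, Long.MinValue)
      copy(inputWatermarks = inputWatermarks.updated(channel, math.max(current, watermark)))
    }

    // The operator's event time is the minimum over all input channels.
    def eventTime: Long = inputWatermarks.values.min
  }

  // All channels start at the lowest possible watermark (requires channels >= 1).
  def initial(channels: Int): OperatorClock =
    OperatorClock((0 until channels).map(c => c -> Long.MinValue).toMap)

  def main(args: Array[String]): Unit = {
    val clock = initial(2).onWatermark(0, 33L).onWatermark(1, 17L)
    println(clock.eventTime) // 17
  }
}
```

Using the watermark values from the figure, an operator that has received W(33) on one input and W(17) on the other is at event time 17 until the second input advances.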

50 Implementation: Queryable State

51 Queryable State

Traditional

Queryable State

52 Queryable State: Implementation

Query: /job/operation/state-name/key

(1) Query Client → Job Manager: get location of "key-partition" for "operator" of "job"
(2) State Location Server: look up location
(3) Job Manager → Query Client: respond location
(4) Query Client → Task Manager: query state-name and key

[Figure: the Job Manager holds the ExecutionGraph and the State Location Server; it deploys tasks and receives status. Each Task Manager runs window() / sum() operators with local state and a State Registry that registers its state.]

53 State, Snapshots, Recovery

Events flow without replication or synchronous writes

State index (Hash Table or RocksDB)

Events are persistent and ordered (per partition / key) in the log

[Figure: source / transform → stateful operation]

54 State, Snapshots, Recovery

Trigger checkpoint
Inject checkpoint barrier

55 State, Snapshots, Recovery

Take state snapshot
RocksDB: Trigger state copy-on-write

56 State, Snapshots, Recovery

Persist state snapshots
Processing pipeline continues
Durably persist snapshots asynchronously

57 Powerful Abstractions

Layered abstractions to navigate simple to complex use cases

High-level Analytics API: Stream SQL / Tables (dynamic tables)

Stream- & Batch Data Processing: DataStream API (streams, windows)

val stats = stream
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .reduce((a, b) => a.add(b))

Stateful Event-Driven Applications: Process Function (events, state, time)

def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {
  // work with event and state
  (event, state.value) match { … }

  out.collect(…)   // emit events
  state.update(…)  // modify state

  // schedule a timer callback
  ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}

58 Event Sourcing + Memory Image

[Figure: event / command → event log (persists events, temporarily) → Process, which updates local variables/structures in main memory; the memory is periodically snapshotted]

59 Event Sourcing + Memory Image

Recovery: Restore snapshot and replay events since snapshot

[Figure: event log (persists events, temporarily) → Process]
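The recovery rule, restore the snapshot and replay the events logged since, can be sketched in plain Scala. This is an illustration under assumptions, not Flink code: the state is a single counter, each logged event carries its log offset, and `Event`, `Snapshot`, and `recover` are hypothetical names.

```scala
object MemoryImageSketch {
  // A logged event: its position in the log and the amount to add to the counter.
  final case class Event(offset: Long, delta: Int)

  // Memory image: the state plus the log offset of the last event it reflects.
  final case class Snapshot(lastOffset: Long, total: Int)

  def applyEvent(s: Snapshot, e: Event): Snapshot =
    Snapshot(e.offset, s.total + e.delta)

  // Recovery: start from the snapshot and replay only the events logged after it.
  def recover(snapshot: Snapshot, log: Seq[Event]): Snapshot =
    log.filter(_.offset > snapshot.lastOffset).foldLeft(snapshot)(applyEvent)

  def main(args: Array[String]): Unit = {
    val log = Seq(Event(0L, 1), Event(1L, 2), Event(2L, 3), Event(3L, 4), Event(4L, 5))
    val snapshot = Snapshot(2L, 6) // taken after the first three events
    println(recover(snapshot, log)) // Snapshot(4,15)
  }
}
```

Recovering from the snapshot yields the same state as replaying the whole log from the beginning, which is why the log only needs to retain events since the latest snapshot.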

60 Distributed Memory Image

Distributed application, many memory images. Snapshots are all consistent together.
