Portable stateful big data processing in

Kenneth Knowles

Apache Beam PMC Software Engineer @ https://s.apache.org/ffsf-2017-beam-state [email protected] / @KennKnowles Flink Forward San Francisco 2017 Agenda

1. What is Apache Beam? 2. State 3. Timers 4. Example & Little Demo What is Apache Beam? TL;DR

(Flink draws it more like this) 4 DAGs, DAGs, DAGs Apache Beam Apache Cloud Hadoop Apache Apache Dataflow Spark Samza MapReduce Apache Apache Apache (paper) Storm Gearpump Apex (incubating) FlumeJava (paper) Heron MillWheel (paper) Dataflow Model (paper)

2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 Apache Flink local, on-prem, The Beam Vision cloud

Cloud Dataflow: Java fully managed input.apply( Sum.integersPerKey()) local, on-prem, cloud

Sum Per Key Python local, on-prem, cloud input | Sum.PerKey() Apache Gearpump (incubating) ⋮ ⋮ 6 Apache Flink local, on-prem, The Beam Vision cloud

Cloud Dataflow: Python fully managed input | KakaIO.read() Apache Spark local, on-prem, cloud

KafkaIO Apache Apex ⋮ local, on-prem, cloud

Apache Java Gearpump (incubating) class KafkaIO extends UnboundedSource { … }

⋮ 7 The Beam Model

PTransform Pipeline

PCollection (bounded or unbounded)

8 The Beam Model

What are you computing? (read, map, reduce)

Where in event time? (event time windowing)

When in processing time are results produced? (triggers)

How do refinements relate? (accumulation mode)

9 What are you computing?

Read ParDo Grouping Composite Parallel connectors to Per element Group by key, Encapsulated external systems "Map" Combine per key, subgraph "Reduce"

10 State What are you computing?

Read ParDo Grouping Composite Parallel connectors to Per element Group by key, Encapsulated external systems "Map" Combine per key, subgraph "Reduce"

12 State for ParDo

ParDo(DoFn)

13 ParDo.of(new DoFn<...>() { "Example" // declare some state

@ProcessElement public void process(...) { // update quantiles // output if needed } }) Per Key Quantiles

14 Partitioned

Quantiles Quantiles Quantiles

15 Parallelism! new DoFn>() { @StateId("quantiles") private final StateSpec<...> quantilesState = StateSpecs.combining();

… }

Quantiles Quantiles Quantiles DoFn DoFn DoFn

16 Windowed

Window into Fixed windows of one hour

Quantiles Quantiles Quantiles DoFn DoFn DoFn

Expected result: Quantiles for each hour

17 Windowed

Window into windows of 30 min sliding by 10 min

Quantiles Quantiles Quantiles DoFn DoFn DoFn

Expected result: Quantiles for 30 minutes sliding by 10 min

18 State is per key and window

... 1 2 3

"x" 3 7 15

"y" "fizz" "7" "fizzbuzz"

...

Bonus: automatically garbage collected when a window expires (vs manual clearing of keyed state) Windowed

Window into Fixed windows of one hour

Quantiles Quantiles Quantiles DoFn DoFn DoFn

Expected result: Quantiles for each hour

20 What about Combine?

Window into Fixed windows of one hour

Quantiles Quantiles Quantiles Combiner Combiner Combiner

Expected result: Quantiles for each hour

21 Combine vs State (naive conceptualization)

Window hourly Window hourly

Quantiles Quantiles Combiner DoFn

Expected result: Expected result: Quantiles for each hour Quantiles for each hour

22 Combine vs State (likely execution plan)

Quantiles Quantiles non-associative* Combiner Combiner non-commutative* Shuffle Elements multi-output Shuffle side outputs Accumulators (user is in control)

Quantiles Quantiles associative Combiner DoFn commutative single-output enables optimizations (engine is in control) Combine vs State

Quantiles Combiner Quantiles DoFn

output governed by trigger

(data/computation unaware) "output only when there's an interesting change"

(data/computation aware) Example: Arbitrary indices

E

B

A

C

D non-associative

Assign indices non-commutative

non-deterministic E,4

B,3 and totally fine!

A,2

C,1

D,0 Kinds of state

● Value - just a mutable cell for a value ● Bag - supports "blind writes" ● Combining - has a CombineFn built in; can support blind writes and lazy accumulation ● Set - membership checking ● Map - lookups and partial writes Timers new DoFn<...>() { Timers for ParDo @TimerId("timeout") private final TimerSpec timeoutTimer = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

@OnTimer("timout") public void timeout(...) { // access state, set new timers, output } … }

Stateful ParDo

28 Timers in processing time

buffer request

"call me back in 5 On Element seconds"

On Timer

batched RPC

output based only output return on incoming value of batched element RPC Timers in event time

"call me back when store speculative result the watermark hits the end of the On Element window"

On Timer

output output replacement speculative result value only if immediately changed More example uses for state & timers

● Per-key arbitrary numbering ● Output only when result changes ● Tighter "side input" management for slowly changing dimension ● Streaming join-matrix / join-biclique ● Fine-grained combine aggregation and output control ● Per-key "workflows" like user sign up flow w/ expiration ● Low-latency deduplication (let the first through, squash the rest) Performance considerations (cross-runner)

● Shuffle to colocate keys ● Linear processing of elements for key+window ● Window merging ● Storage of state and timers ● GC of state

32 Demo

33 Summary

State and Timers in Beam...

● … unlock new uses cases

● … they "just work" with event time windowing

● … are portable across runners (implementation ongoing) Thank you for listening! This talk: ● Me - @KennKnowles ● These Slides - https://s.apache.org/ffsf-2017-beam-state

Go Deeper ● Design doc - https://s.apache.org/beam-state ● Blog post - https://beam.apache.org/blog/2017/02/13/stateful-processing.html

Join the Beam community: ● User discussions - [email protected] ● Development discussions - [email protected] ● Follow @ApacheBeam on Twitter

You can contribute to Beam + Flink ● New types of state ● Easy launch of Beam job on Flink-on-YARN ● Integration tests at scale ● Fit and finish: polish, polish, polish! ● … and lots more!

https://beam.apache.org