Streaming inference with Apache Beam and TFX

Robert Crowe, Developer Engineer
Reza Rokni, Google Developer Advocate

Agenda

● Robert & Reza
● High level overview of TFX
● Using the hermetic seal between training and inference

In addition to training an amazing model ...

Modeling Code

… a production solution requires so much more

● Configuration
● Data Collection
● Data Verification
● Machine Resource Management
● Serving Infrastructure
● Monitoring
● ML Code
● Analysis Tools
● Feature Extraction
● Process Management Tools

Tales From The Trenches

https://twitter.com/ginablaber/status/971450218095943681

Production Machine Learning

Machine Learning Development
● Labeled data
● Feature space coverage
● Minimal dimensionality
● Maximum predictive data
● Fairness
● Rare conditions
● Data lifecycle management

+

Modern Software Development
● Scalability
● Extensibility
● Configuration
● Consistency & Reproducibility
● Modularity
● Best Practices
● Testability
● Monitoring
● Safety & Security

Leading ML best practices

● Continuous Training for Production ML in the TFX Platform. OpML (2019).
● Slice Finder: Automated Data Slicing for Model Validation. ICDE (2019).
● Data Validation for Machine Learning. SysML (2019).
● TFX: A TensorFlow-Based Production-Scale Machine Learning Platform. KDD (2017).
● Data Management Challenges in Production Machine Learning. SIGMOD (2017).
● Rules of Machine Learning: Best Practices for ML Engineering. Google AI Web (2017).
● Machine Learning: The High Interest Credit Card of Technical Debt. NeurIPS (2015).
● Hidden Technical Debt in Machine Learning Systems. NIPS (2015).

What is MLOps?

“MLOps is a practice for collaboration and communication between data scientists and operations professionals to help manage production ML lifecycle.”

“Similar to the DevOps or DataOps approaches, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements.”

Production ML Infrastructure

CD Foundation MLOps reference architecture

https://cd.foundation/blog

TensorFlow Extended (TFX)

Powers Alphabet’s most important bets and products

TFX Powers our partners too

“... we have re-tooled our machine learning platform to use TensorFlow. This yielded significant productivity gains while positioning ourselves to take advantage of the latest industry research.”
Ranking Tweets with TensorFlow - Twitter
https://goo.gle/tf-twitter-rank

Partners: Spotify, SAP Concur, Etsy, Airbus, Twitter, PayPal, Tencent, NetEase, Yahoo Japan, JD.Com

TFX Production Components

Tasks: Data Ingestion, Data Validation, Feature Engineering, Train Model, Validate Model, Push If Good, Serve Model

Libraries

Components: Tuner, InfraValidator, Bulk Inference

Component: ExampleGen

Inputs and Outputs

Configuration

Inputs: Raw Data (CSV, TF Record)
Outputs: Split TF Record Data (Training, Eval)

    example_gen = CsvExampleGen(input_base=external_input(data_root))

ExampleGen

Standard Formats
● CSV
● tf.Record
● BigQuery

Custom Formats
● Avro
● Parquet
● Presto

Formats Supported Using Beam
● S3
● GCS
● Hadoop
● Kafka
● PubSub
● BigQuery
● BigTable
● Datastore
● Mongo
● Flink

Component: Transform

Inputs and Outputs

Configuration

Inputs: ExampleGen (Data), SchemaGen (Schema), Code
Outputs: Transform Graph, Transformed Data (to Trainer)

    transform = Transform(
        input_data=example_gen.outputs.examples,
        schema=infer_schema.outputs.output,
        module_file=taxi_module_file)

Code

    for key in _DENSE_FLOAT_FEATURE_KEYS:
      outputs[_transformed_name(key)] = transform.scale_to_z_score(
          _fill_in_missing(inputs[key]))
    # ...

    outputs[_transformed_name(_LABEL_KEY)] = tf.where(
        tf.is_nan(taxi_fare),
        tf.cast(tf.zeros_like(taxi_fare), tf.int64),
        # Test if the tip was > 20% of the fare.
        tf.cast(
            tf.greater(tips, tf.multiply(taxi_fare, tf.constant(0.2))), tf.int64))
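To make the two transformations in the Transform code concrete, here is a minimal plain-Python sketch of what they compute for a handful of values. It is not the TensorFlow Transform API: `z_scores` and `tip_label` are hypothetical helper names, the fare and tip values are invented, and dividing by the population standard deviation is an assumption about how the z-score is normalised.

```python
import math
import statistics


def z_scores(values):
    """Scale values to zero mean and unit variance (population std assumed)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def tip_label(taxi_fare, tips):
    """Return 1 if the tip is more than 20% of the fare, otherwise 0.

    A missing (NaN) fare yields 0, mirroring the first branch of tf.where.
    """
    if math.isnan(taxi_fare):
        return 0
    return int(tips > taxi_fare * 0.2)


# Invented sample values, for illustration only.
fares = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = z_scores(fares)

labels = [
    tip_label(10.0, 3.0),          # tip is 30% of the fare
    tip_label(10.0, 2.0),          # tip is exactly 20%, so not "greater than"
    tip_label(float("nan"), 5.0),  # missing fare
]
```

In the pipeline itself the mean and standard deviation are computed over the whole dataset during the Transform step, which is why the same transform graph can then be reused at inference time.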

    # ...

Component: BulkInferrer

Inputs and Outputs

Configuration

Inputs: Trainer (Model), Evaluator (Validation Outcome), ExampleGen (Unlabelled examples)
Output: Inference Result

    bulk_inferrer = BulkInferrer(
        examples=inference_example_gen.outputs['examples'],
        model_export=trainer.outputs['output'],
        model_blessing=evaluator.outputs['blessing'],
        data_spec=bulk_inferrer_pb2.DataSpec(
            example_splits=['unlabelled']),
        model_spec=bulk_inferrer_pb2.ModelSpec())

BulkInferrer

Configuration Options
● Block batch inference until the model has been validated successfully.
● Choose the inference examples from ExampleGen's output.
● Choose the signatures and tags of the inference model.

Inference Result
● Contains features and predictions.

Autoencoder
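As a concrete illustration of scoring anomalies by reconstruction error, the sketch below compares an input with its reconstruction and flags the input when the error exceeds a threshold. Several things here are assumptions: mean squared error is only one common way to measure the difference, the threshold is arbitrary, and the reconstructions are hard-coded values standing in for the output of a trained autoencoder.

```python
def reconstruction_error(original, reconstruction):
    """Mean squared difference between an input and its reconstruction."""
    if len(original) != len(reconstruction):
        raise ValueError("original and reconstruction must have the same length")
    squared = [(o - r) ** 2 for o, r in zip(original, reconstruction)]
    return sum(squared) / len(squared)


def is_anomaly(score, threshold):
    """Flag an input whose reconstruction error exceeds the threshold."""
    return score > threshold


# Hypothetical values: a real pipeline would obtain the reconstructions
# from a trained autoencoder rather than hard-coding them.
normal_input = [0.0, 0.5, 1.0]
normal_reconstruction = [0.1, 0.4, 1.0]

odd_input = [0.0, 5.0, 1.0]
odd_reconstruction = [0.0, 0.5, 1.0]

THRESHOLD = 0.1  # arbitrary, for illustration only

normal_score = reconstruction_error(normal_input, normal_reconstruction)
odd_score = reconstruction_error(odd_input, odd_reconstruction)
flags = [is_anomaly(normal_score, THRESHOLD), is_anomaly(odd_score, THRESHOLD)]
```

The model reproduces inputs that resemble its training data closely, so their score stays small; an input unlike anything seen in training is reconstructed poorly and its score rises above the threshold.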

● Learns an efficient encoding in an unsupervised manner
● Tries to reconstruct a representation as close as possible to its original input
● “Reconstruction error”: difference between the original data and reconstruction
● Used as an anomaly score to detect anomalies

Apache Beam

Open-source, unified model and set of SDKs for defining and executing data processing

Sum Per Key
● Java: input.apply(Sum.integersPerKey())
● Python: input | Sum.PerKey()
● SQL: SELECT key, SUM(value) FROM input GROUP BY key
● Go: stats.Sum(s, input)
● ... Others

Cloud Dataflow

PCollections, PTransforms, Pipelines...
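The sum-per-key example from the Apache Beam slide can be modelled in plain Python to show how the three terms fit together. This is a sketch of the semantics only and does not use the Beam SDK: the list of pairs stands in for a PCollection, `sum_per_key` for a PTransform, and `run_pipeline` for a Pipeline; all three names are hypothetical.

```python
from collections import defaultdict


def sum_per_key(pairs):
    """Group (key, value) pairs by key and sum the values in each group."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_pipeline(source):
    """Read the input, apply one transformation, and return the output."""
    collection = list(source)        # stands in for an input PCollection
    return sum_per_key(collection)   # stands in for applying a PTransform


result = run_pipeline([("a", 1), ("b", 2), ("a", 3), ("b", 4), ("c", 5)])
```

In the Beam Python SDK the same aggregation is usually written as `beam.CombinePerKey(sum)`, and the runner decides how to distribute the work; the in-memory dictionary used here only works because the whole input fits on one machine.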

● PCollection: Represents an unordered set of data, e.g. for input and output
● PTransform: Data processing operation that operates over one or more PCollections
● Pipeline: Entire set of operations being performed, including reading input, applying transformations, writing output, and the execution engine to be used.

Architecture

[Figure, built up over three slides. Elements: Raw ticks, Timeseries Data, Apache Beam, Processed Aggregations, Aggregations as TF.Example, Google Cloud Storage, TFX, Signal; the final build adds Inference and a second Apache Beam stage.]

Processing - Timeseries

[Figure, shown on two consecutive slides: five time series, TS-1 to TS-5, over the intervals t0→t1, t1→t2, t2→t3, t3→t4, and t4→t5. Legend: Data Point.]

Processing - Timeseries

Two types of computation
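A minimal plain-Python sketch of fixed and sliding windows may help when reading this slide. It rests on several assumptions: windows are defined by element count rather than by event time as they would be in Beam, the sliding window is applied to the per-window maxima, and "absolute gain" is taken to mean the last value minus the first value in a window. The tick values are chosen so that the fixed windows match the [3,1]..[4,1]..[4,0] groups shown on the slide.

```python
def fixed_windows(values, size):
    """Split values into consecutive, non-overlapping windows of `size`."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def sliding_windows(values, size, step=1):
    """Return overlapping windows of `size`, advancing by `step` each time."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, step)]


def absolute_gain(window):
    """Hypothetical definition: last value minus first value in the window."""
    return window[-1] - window[0]


ticks = [3, 1, 4, 1, 4, 0]

# Aggregate inside each fixed window.
windows = fixed_windows(ticks, size=2)
maxima = [max(w) for w in windows]

# Compute over a sliding (rolling) window of the previous results.
rolling = sliding_windows(maxima, size=2)
gains = [absolute_gain(w) for w in rolling]
```

Fixed windows never overlap, so each tick contributes to exactly one maximum, while neighbouring sliding windows share elements, which is what lets a rolling metric compare one period with the next.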

[Figure: Type 1 and Type 2 computations over a time series. Type 1 shows the windows [3,1]..[4,1]..[4,0]. Labels: Data Point, Max, Fixed Window, Sliding Window (rolling window), Absolute Gain.]

Demo

Demo with sin wave
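The slides do not say how the demo data was generated, so the following is only a sketch of one way to build comparable synthetic data: a clean sine wave for training and a copy with a few injected outliers for testing. The function names, the period, the anomaly positions, and the offset are all hypothetical.

```python
import math


def sine_wave(n_points, period):
    """Return n_points samples of a sine wave with the given period."""
    return [math.sin(2 * math.pi * i / period) for i in range(n_points)]


def inject_anomalies(series, positions, offset):
    """Return a copy of series with `offset` added at each listed index."""
    corrupted = list(series)
    for position in positions:
        corrupted[position] += offset
    return corrupted


# Clean data for training; the same wave with two spikes for testing.
train = sine_wave(n_points=200, period=50)
test = inject_anomalies(train, positions=[60, 140], offset=3.0)

# Indices where the test series differs from the clean series.
changed = [i for i, (a, b) in enumerate(zip(train, test)) if a != b]
```

A model trained only on the clean series learns the regular shape of the wave, so the two spiked points are the ones it should reconstruct worst.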

[Figures: Synthetic - Train data; Synthetic - Data with anomalies]

Feedback

Your feedback is important to us. Don’t forget to rate and review the sessions.