Streaming inference with Apache Beam and TFX
Robert Crowe, Google Developer Engineer
Reza Rokni, Google Developer Advocate

Agenda
● Robert & Reza
● High level overview of TFX
● Using the hermetic seal between training and inference

In addition to training an amazing model ...
Modeling Code
… a production solution requires so much more
[Diagram boxes: Configuration, Data Collection, Data Verification, Feature Extraction, ML Code, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, Monitoring]

Tales From The Trenches
https://twitter.com/ginablaber/status/971450218095943681

Production Machine Learning

Machine Learning Development
● Labeled data
● Feature space coverage
● Minimal dimensionality
● Maximum predictive data
● Fairness
● Rare conditions
● Data lifecycle management

+

Modern Software Development
● Scalability
● Extensibility
● Configuration
● Consistency & Reproducibility
● Modularity
● Best Practices
● Testability
● Monitoring
● Safety & Security

Leading ML best practices
● Continuous Training for Production ML in the TFX Platform. OpML (2019).
● Slice Finder: Automated Data Slicing for Model Validation. ICDE (2019).
● Data Validation for Machine Learning. SysML (2019).
● TFX: A TensorFlow-Based Production-Scale Machine Learning Platform. KDD (2017).
● Data Management Challenges in Production Machine Learning. SIGMOD (2017).
● Rules of Machine Learning: Best Practices for ML Engineering. Google AI Web (2017).
● Machine Learning: The High Interest Credit Card of Technical Debt. NeurIPS (2015).
● Hidden Technical Debt in Machine Learning Systems. NIPS (2015).

What is MLOps?
“MLOps is a practice for collaboration and communication between data scientists and operations professionals to help manage production ML lifecycle.”

“Similar to the DevOps or DataOps approaches, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements.”

Production ML Infrastructure

CD Foundation MLOps reference architecture
https://cd.foundation/blog

TensorFlow Extended (TFX)
Powers Alphabet’s most important bets and products

TFX Powers our partners too
“... we have re-tooled our machine learning platform to use TensorFlow. This yielded significant productivity gains while positioning ourselves to take advantage of the latest industry research.”
Ranking Tweets with TensorFlow - Twitter
https://goo.gle/tf-twitter-rank

Partners: Spotify, SAP Concur, Etsy, Airbus, Twitter, PayPal, Tencent, NetEase, Yahoo Japan, JD.Com

TFX Production Components
Tasks: Data Ingestion → Data Validation → Feature Engineering → Train Model → Validate Model → Push If Good → Serve Model
Libraries
Components
Tuner, InfraValidator, Bulk Inference

Component: ExampleGen
Inputs and Outputs
Configuration
[Diagram: Raw Data (CSV, TF Record) → ExampleGen → Split TF Record Data (Training, Eval)]

    example_gen = CsvExampleGen(input_base=external_input(data_root))

Standard Formats
● CSV
● tf.Record
● BigQuery

Custom Formats
● Avro
● Parquet
● Presto
● Mongo

Formats Supported Using Beam
● S3
● GCS
● Hadoop
● Kafka
● PubSub
● BigQuery
● BigTable
● Datastore
● Flink

Component: Transform
Inputs and Outputs
Configuration
[Diagram: ExampleGen (Data), SchemaGen (Schema), and Code feed Transform, which outputs the Transform Graph and Transformed Data to Trainer]

    transform = Transform(
        input_data=example_gen.outputs.examples,
        schema=infer_schema.outputs.output,
        module_file=taxi_module_file)

Transform Code

    for key in _DENSE_FLOAT_FEATURE_KEYS:
        outputs[_transformed_name(key)] = transform.scale_to_z_score(
            _fill_in_missing(inputs[key]))
    # ...
    outputs[_transformed_name(_LABEL_KEY)] = tf.where(
        tf.is_nan(taxi_fare),
        tf.cast(tf.zeros_like(taxi_fare), tf.int64),
        # Test if the tip was > 20% of the fare.
        tf.cast(
            tf.greater(tips, tf.multiply(taxi_fare, tf.constant(0.2))), tf.int64))
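To make the two transformations concrete, here is a plain-Python sketch of what they compute on a small in-memory batch. It is not the TensorFlow Transform implementation: the names `z_score` and `tip_label` and the sample values are hypothetical, and the use of the population standard deviation is an assumption made for the illustration.

```python
import math

def z_score(values):
    """Scale values to zero mean and unit variance (population std, an assumption)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    if std == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def tip_label(fare, tips):
    """Return 1 if the tip exceeds 20% of the fare, else 0; a NaN fare maps to 0."""
    if math.isnan(fare):
        return 0
    return int(tips > fare * 0.2)

fares = [10.0, 20.0, 30.0]  # hypothetical sample values
scaled = z_score(fares)
labels = [tip_label(10.0, 3.0), tip_label(10.0, 1.0), tip_label(float("nan"), 5.0)]
```

The label function mirrors the `tf.where` expression: the NaN check comes first, and only non-NaN fares reach the 20% comparison.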
    # ...

Component: BulkInferrer
Inputs and Outputs
Configuration
[Diagram: Trainer (Model), Evaluator (Validation Outcome), and ExampleGen (Unlabelled examples) feed BulkInferrer, which outputs the Inference Result]

    bulk_inferrer = BulkInferrer(
        examples=inference_example_gen.outputs['examples'],
        model_export=trainer.outputs['output'],
        model_blessing=evaluator.outputs['blessing'],
        data_spec=bulk_inferrer_pb2.DataSpec(
            example_splits=['unlabelled']),
        model_spec=bulk_inferrer_pb2.ModelSpec())

Configuration Options
● Block batch inference on a successful model validation.
● Choose the inference examples from example gen's output.
● Choose the signatures and tags of inference model.

Inference Result
Contains features and predictions.

Autoencoder
● Learns an efficient encoding in an unsupervised manner
● Tries to reconstruct a representation as close as possible to its original input
● “Reconstruction error”: difference between the original data and reconstruction
● Used as an anomaly score to detect anomalies

Apache Beam
Open-source, unified model and set of SDKs for defining and executing data processing

Runners: Apache Spark, Apache Flink, Cloud Dataflow, Apache Samza

Sum Per Key
● Java: input.apply(Sum.integersPerKey())
● Python: input | Sum.PerKey()
● SQL: SELECT key, SUM(value) FROM input GROUP BY key
● Go: stats.Sum(s, input)
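Each of the Sum Per Key snippets expresses the same operation as the SQL form: group the input by key and sum the values. A minimal plain-Python sketch of those semantics, on hypothetical in-memory data and without any Beam runner, might look like this; `sum_per_key` is an illustrative name, not a Beam API.

```python
from collections import defaultdict

def sum_per_key(pairs):
    """Group (key, value) pairs by key and sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Hypothetical input. The result must not depend on element order,
# because the collection being aggregated is unordered.
pairs = [("a", 1), ("b", 5), ("a", 2), ("b", -1), ("c", 7)]
result = sum_per_key(pairs)
```

In the Beam Python SDK, the same aggregation over a keyed collection is typically written with `beam.CombinePerKey(sum)`.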
... Others

PCollections, PTransforms, Pipelines...
● PCollection: represents an unordered set of data, e.g. for input and output
● PTransform: data processing operation that operates over one or more PCollections
● Pipeline: entire set of operations being performed, including reading input, applying transformations, writing output, and the execution engine to be used

Architecture
[Architecture diagram, built up over three slides]
● Build 1: Raw ticks (Timeseries Data), Apache Beam, TFX, Signal
● Build 2: adds Processed Aggregations, Aggregations as TF.Example, Google Cloud Storage
● Build 3: adds Inference and a second Apache Beam box

Processing - Timeseries
[Figure, shown on two consecutive slides: labels TS-1 to TS-5 over the intervals t0→t1, t1→t2, t2→t3, t3→t4, t4→t5; legend: Data Point]

Processing - Timeseries
Two types of computation
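One possible reading of the two computation types is sketched below. This is an assumption, since the slide only shows a figure: Type 1 is taken to be an aggregate that needs only the data points inside one fixed window (Max in the legend), and Type 2 a metric computed over a sliding window of those per-window aggregates (Absolute Gain in the legend, here defined hypothetically as the sum of the positive changes between consecutive values). The function names and the tick data are illustrative only.

```python
def fixed_window_max(points, size):
    """Type 1 (assumed): max of the values in each fixed window [k*size, (k+1)*size)."""
    windows = {}
    for ts, value in points:
        start = (ts // size) * size
        windows[start] = max(value, windows.get(start, value))
    # Windows with no data points are simply absent (no gap filling here).
    return [windows[start] for start in sorted(windows)]

def sliding_gain(values, width):
    """Type 2 (assumed): sum of positive steps inside each sliding window of `width` values."""
    gains = []
    for end in range(width, len(values) + 1):
        window = values[end - width:end]
        gains.append(sum(max(b - a, 0) for a, b in zip(window, window[1:])))
    return gains

# Hypothetical (timestamp, value) ticks; timestamps are integer seconds.
ticks = [(0, 3), (1, 1), (5, 4), (6, 1), (10, 4), (11, 0), (15, 2), (17, 6)]
maxima = fixed_window_max(ticks, size=5)
gains = sliding_gain(maxima, width=3)
```

The design point the sketch tries to show is the dependency: the sliding-window metric consumes the ordered output of the fixed-window step rather than the raw ticks.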
[Figure: Type 1 and Type 2 computations, with Type 1 labeled [3,1]..[4,1]..[4,0]; legend: Data Point, Max, Fixed Window, Sliding Window (rolling window), Absolute Gain]

Demo

Demo with sin wave
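As a rough stand-in for the sine-wave demo, the sketch below scores synthetic data by reconstruction error. It does not train an autoencoder: a fixed two-coefficient sine/cosine projection plays the role of encoder and decoder, and the period, spike size, and threshold are hypothetical values chosen for the illustration.

```python
import math

PERIOD = 20  # samples per sine period (hypothetical)

def encode(window):
    """Compress one period of samples to two coefficients (sin and cos amplitude)."""
    n = len(window)
    a = 2 / n * sum(x * math.sin(2 * math.pi * i / n) for i, x in enumerate(window))
    b = 2 / n * sum(x * math.cos(2 * math.pi * i / n) for i, x in enumerate(window))
    return a, b

def decode(code, n):
    """Rebuild n samples from the two coefficients."""
    a, b = code
    return [a * math.sin(2 * math.pi * i / n) + b * math.cos(2 * math.pi * i / n)
            for i in range(n)]

def reconstruction_error(window):
    """Mean squared difference between the window and its reconstruction."""
    rebuilt = decode(encode(window), len(window))
    return sum((x - r) ** 2 for x, r in zip(window, rebuilt)) / len(window)

# Synthetic data: four clean periods, with a spike injected into the third.
series = [math.sin(2 * math.pi * i / PERIOD) for i in range(4 * PERIOD)]
series[2 * PERIOD + 7] += 3.0

# Score each period-aligned window and flag those above the cut-off.
windows = [series[k:k + PERIOD] for k in range(0, len(series), PERIOD)]
scores = [reconstruction_error(w) for w in windows]
threshold = 0.05  # hypothetical cut-off
anomalous = [i for i, s in enumerate(scores) if s > threshold]
```

Clean windows reconstruct almost exactly, so their score is near zero, while the window containing the spike cannot be represented by two coefficients and scores well above the threshold.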
[Plots: Synthetic - Train data; Synthetic - Data with anomalies]

Demo with sin wave

Feedback
Your feedback is important to us. Don’t forget to rate and review the sessions.