Using with Intel BigDL for optimized personalized card linked offer

November 29, 2017 Suqiang Song , Lead Enterprise Architect Shared Components and Security Solutions Me Find me: Previously: Enterprise Lead Architect Suqiang Song ( Jack) https Twitter@JackSsq1981 LinkedIn : [email protected] Lead Enterprise Enterprise Lead Architect ://www.linkedin.com/in/suqiang @ @ MasterCard Huawei - song - 72041716 /

©2016 Mastercard. Proprietary and Confidential. Our Explore & Evaluation Journey H https://hackernoon.com/the ierarchy of needs of AI - ai - hierarchy - of - needs - 18f111fcc007?gi=b66e6ae1b71d

©2016 Mastercard. Proprietary and Confidential. ………….. Deep Learning only works for Deep Learning requires GPU Deep Learning = Tensor Flow Deep Learning = Alpha Go Some gossips of DeepLearning Image interpretation

©2016 Mastercard. Proprietary and Confidential. numerical numerical A set of techniques that use layers What is Deep Learning ? techniques, techniques, and hardware Recently Popular • • • Arbitrary Regression Classification in the 80’s as Neural Networks came came back thanks to advances in data inputs: mapping collection, collection, computation that transform that transform

©2016 Mastercard. Proprietary and Confidential. data arestored A Enterprise Requirements for DeepLearning …… managementand enterprise monitoring traditional ML than separatedetc..) ratherresources ,workloads allocation be managedand monitored with Leverage machine learning workflows ratherthan rebuild all of them. Add data. nalyze deeplearning a large amount of dataa large thesame amount on BigData existing (HDFS Big DataBig capabilities existing to Analytic Applicationsand/or , HBase,Hive,etc clusters other workloads other workloads and deep learning workloads and deep workloads learning should .) ratherthan moveor .) duplicate (ETL clusters , , data warehouse, warehouse, where where the

©2016 Mastercard. Proprietary and Confidential. “Super Stars”…. Challenges and limitations to Production considering some • • • • • • ………….. Flow fromGoogleCloud The concernsof Enterprise Maturity and InfoSec( Tedious and fragile to duplicate data pipelines and data copythe to capacity and performance data pipelines, technologiesand sets.tool And it is also a big challenge tomake ( Not well integrated with other enterprise Low level with steep learning at STAGE Success requires many engineer hardware infrastructure ( Claimed that the GPU computingare better than CPU which requires new couldn't leverage the existing data ETL, warehousing and other relevant analytic Maybenot your story , but we have...... ) ...) distribute ) very very long timeline normally - computations( hours hours ( curve ( curve Impossible toInstall a Tensor Flow Cluster toolsand need data movements Where is yourPHD degree ? less monitoring use ) GPU cluster with Tensor ) ) .)

©2016 Mastercard. Proprietary and Confidential. • TensorFrames • Paddle • mxnet • • • • libraries Integrations with existing DL TensorFlow CNTK ( ( Databricks) (from Pipelines Learning Deep What does Spark offer? ( CaffeOnSpark Elephas mmlspark ( TensorFlow ) ) ) ) on Spark, on Spark, • • • • • Implementations SparkNet SparkCL DeepDist BigDL of DL on Spark

©2016 Mastercard. Proprietary and Confidential. HDFS files as job input and output). in a very coarse (2) with the JVM threading in Spark (resulting in lower performance). interconnectedby InfiniBand), and the Open MP implementations in Tensor Flow/Caffe conflicts (1) and prediction) are performed outside ofSpark (across multiple Tensor Flow orCaffe instances). Flow/Caffe instances in the cluster; however, the distributed deep learning (e.g., training, tuning Tensor Flow

Statistics Statistics collected on Nov 17, 2017 Need more break down Needmore break down …..

BigDL TensorflowOnSpark databricks/tensorframes databricks/spark learning As a results, Tensor Flow/Caffe still runs specializedon HW(such as GPU servers In addition,in this case Tensor Flow/Caffe canonly interact the rest of pipelines the analytics

- on - - Spark (orCaffe grained fashion (running as standalone jobs outside ofthe pipeline, and using

-

deep

-

Programming interface Scala & Python Python Python Python - on

- Spark) uses Spark executors (tasks) tolaunch Tensor

Contributors

43 8 9 8

Commits

1963 214 185 36

©2016 Mastercard. Proprietary and Confidential. BIGDL BigDL: Implemented as a standalone BigDL:a Implemented libraryon Spark as

SQL

SparkR BIG DL – (Spark package)(Spark Data Frame Spark Core

Streaming MLlib ML Pipeline GraphX BigDL with with Feature Parity Designed for Intel®for Xeon®Designed * Caffe http:// * and and * software.intel.com/bigdl improved ease of use Lower TCO and infrastructure existing with Deep Deep Enabling Data Data Platform, Scale Learning on on Big Learning Efficient - Out

©2016 Mastercard. Proprietary and Confidential. Use Case and POC

©2016 Mastercard. Proprietary and Confidential. Business Use Case probability. ourcasesuseas runningexample users on focusand predicting merchantswithinseveral days. We takePCLO CardOffers) (Personal Lined thepropensity estimatingcouldinvolve of toconsumerstargetshop orlikeawarenesscreating re Therearearetargeted a campaignsvariety of thatadvertising goals drivenBusiness : targetingoffer . linkingand matching improvethe quality , performanceandaccuracy of offer design,andcampaigns algorithms implement : Goal newUsers base on Intel base BigDLframework tohelp - ItemsPropensity learningModels with deep - targeting consumers. For targetingconsumers. example,latterthe our Loyaltytosolution - merchants offering,

©2016 Mastercard. Proprietary and Confidential. 1 Model correlation Prebuilt Issues with Traditional ML : Model pipeline limitations Engineering

Merchant Merchant2 Merchant Feature … n 1 2 2 2 Training n Training Training 1 Training Training 2 Training Model Model n Model 2 Model 1 3 3 3 Evaluation 3 Evaluation Evaluation 2 Evaluation Evaluation 1 Evaluation Merge 4 Merge all the prediction results prediction the allMerge couldn’t scale to more couldn’tscale   whenresources executed cluster  processand testing ,training engineeringfeature   merchant eachforone model generate  Limitations It was fine to support up to 200 merchants but upto200merchantsto support fineItwas fewtransactions with merchantsfor Badmetrics All the pipelines separated by merchants and bymerchantsseparated pipelines the All Run merchants in parallel and occupied most most of occupiedand in parallel merchants Run at computations and duplications of redundantLots pre Haveto - calculate the correlation correlationthe calculate 14

©2016 Mastercard. Proprietary and Confidential. • • • • • • ba27b721.pdf d/39e938c84a867ddf2a8cabc575ff https://pdfs.semanticscholar.org/aa9 systems. forand generalization recommender combine the benefits of memorization models and deep neural networks Scenario: jointly WideDeep & gnan/papers/ncf.pdf https://www.comp.nus.edu.sg/~xian NCF past past history activities active users) according to customers’ customers (priority is to recommend to Filtering ,recommend Scenario Approach: Neural : Neural Collaborative learning trained trained linear wide . products products to --- to Recommender

Embedding Layers MF User EmbeddingUser MF LookupTable (MFUser) Select User indexUser ConcatTable CMul MF Item EmbeddingItem MF LookupTable (MFItem) Select User Item Pair MF Item IndexItem Sigmoid Linear3 Concat LookupTable MLP User Embedding User MLP (MLP User)(MLP Select User indexUser Linear1 Linear2 Conca ReLU ReLU LookupTable (MLP Item) (MLP MLP Item Embedding ItemMLP Select MLP Item IndexItem 15

©2016 Mastercard. Proprietary and Confidential. pipelines Issues with Traditional ML : in a in very coarse - grained fashion Multiple technologies, separated 16

©2016 Mastercard. Proprietary and Confidential. With BigDL on Spark : in one and All standard ML pipeline &DL libs 17

©2016 Mastercard. Proprietary and Confidential. deep deep learning projects based on different business problems, domain models and data sets.   approach  major The   T Benchmark approaches     follows: benchmark, each For ry two kinds of approaches to predict users predict to approaches of kinds two TOP 20 Precision & Precision Recall (https://en.wikipedia.org/wiki/Precision_and_recall) Recall area Precision under curve (PR AUC) CharacteristicReceiver Operating area under curve (ROC AUC) Traditional NCF Abstract Abstract After Learn compared with traditional machine learning. & WAD learning with automated selected couple couple rounds of benchmarks, verify and validate the differences, enhancements of deep learning the details the details Intel of BigDL, including APIs and mainstreammodels, master how to use it to handle the goals of this POC including: this of goals and and purify the general data pipelines for similar kinds of modeling and data science engineering machine learning with handcrafted features ( on Spark Based Mlib ALS) evaluate evaluate and compare the regular performance metrics of ML of as metrics models performance regular the and compare - features merchants (offers) propensities model. propensities (offers) merchants 18

©2016 Mastercard. Proprietary and Confidential.   ML Libs      Stage) Like( Prod Cluster We    sets data Select Environments Target Intel Intel BigDL (0.3 ) Spark Mlib ALS ( 2.2 ) Jdk Spark Hadoop 24 nodes 6 cluster (3 nodes, HMN3 HDP nodes), for single each node is physical box Distinct Distinct We Stage selected a Hadoop cluster which samehas restrictions PROD:as We selected years3 choose choose 12 ~24 hyper cores, 384 memory,G 21 T disk 1.8 version: version: 2.2 Merchants Merchants (Offers or Campaigns) for benchmark : distribution: distribution: CDH5.12.1 known transactions qualified consumer months for training and credit credit transactions : 675 675 : k :1.4 billions ( 53 G for raw data) (from a Mid Bank), 1~2 1~2 months for back test. 2000 with with those characteristics: 19

©2016 Mastercard. Proprietary and Confidential. Load Parquet Load Training Training Data Negative samples Months 10~12 Raw Txns Implementation : runBigDL & AIS overSpark onHadoop + Engineering Feature Simple Parquet partition sampled partition sampled Files partition sampled Selections Feature Spark Data Frames features Spark ML Pipeline Spark MLStages Pipeline Train Estimator Train NCF NCF Model ( BigDL) Train Multiple Wide and Deep Model ( BigDL) Models Train Spark

AIS Model ( Mlib … M Transformer Transformer odels Using Using Neural Recommender Mlib BigDL NCF/ Wide And Deep Wide And NCF/ BigDL ) candidate model Tune & Fine Evaluation Model models …. User User User User models models - - - - Merchant Geo Category Merchant Processing Processing Post - Geo Benchmark

Spark Pipeline Pre

Predictions Test Test Data Months Ensemble Model Selection Feature Inference

Test - processing 1~2 … 20

©2016 Mastercard. Proprietary and Confidential. Alpha( Rank( RegParam MaxIter : Parameters Mlib Mlib AIS Benchmark results> 100 rounds) ( 20precision: precision: D recall: AUPRCs: AUROC: A 200 0.01 ( ) ( 100 0.01 ) B ) ) E precision: D recall AUPRCs: AUROC: 20precision: batchSize mOutput uOutput learningRateDecay learningRate MaxEpoch : Parameters BigDL NCF : C+18% A+23% B+31% ( ( 200 1 ( +47% (1.6 M) (1.6 00 10 E+51% ) ) ) ( 3e - 2 (3e ) - 7) 20precision: precision: recall AUPRCs: AUROC: BigDL WAD batchSize mOutput uOutput learningRateDecay learningRate MaxEpoch : Parameters : C+12% (4 % down) A+20% (3 (3 % A+20%down) B+30%(1% down) D+49%(2 % up) ( ( 200 1 ( E+54% (3% up) (0.6 M) (0.6 00 10 ) ) ) ( 1 e - 2 (1e ) - 7) 21

©2016 Mastercard. Proprietary and Confidential. T ips for BigDL

©2016 Mastercard. Proprietary and Confidential.   know should limitations Some     Tuning Performance      Avoid OOM Tips Tune GClot a , god bless you Tune shuffle a lot , spark.shuffle.manager=tungstensuch as Turn on spark.rdd.compress BigDL 0.2 does not support Spark 2.x , use so 0.3 least at Only Random sampling VS Invert sampling , another tradeoff Separate data preparing ,training and Inference to steps multiple spark jobs Plan your epoch and iterations , tradeoff between performance and precision Batch Size matters Sparse Sensor , not Sensor Dense BigDL broadcasts model to each CPUcore , not executor ! support support JavaSerializer JavaSerializer with ,issue KyroSerializer  - sort 23

©2016 Mastercard. Proprietary and Confidential. Thanks Q & A

©2016 Mastercard. Proprietary and Confidential.