Using BigDL for Optimized Personalized Card-Linked Offers

March 5, 2018. Suqiang Song, Director, Chapter Leader of Data Engineering & AI, Shared Components and Security Solutions

LinkedIn: https://www.linkedin.com/in/suqiang-song-72041716/

Mastercard Big Data & AI Expertise

Differentiation starts with consumer insights from a massive worldwide payments network and our experience in data cleansing, analytics and modeling. What can it mean to you?

- MULTI-SOURCED: 38MM+ merchant locations; 22,000 issuers; 2.4 billion global cards; 56 billion transactions/year
- CLEANSED, AGGREGATED, ANONYMOUS, AUGMENTED: 1.5MM automated rules; continuously tested
- WAREHOUSED: 10 petabytes; 5+ year historic global view; rapid retrieval; above-and-beyond privacy protection and security
- TRANSFORMED INTO ACTIONABLE INSIGHTS: reports, indexes and benchmarks; behavioral variables; models, scores and forecasting; econometrics

Mastercard enhanced this capability with the acquisitions of Applied Predictive Technologies (2015) and Brighterion (2017).

"The AI Hierarchy of Needs": https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007?gi=b66e6ae1b71d

©2016 Mastercard. Proprietary and Confidential.

Our Explore & Evaluation Journey

Enterprise Requirements for Deep Learning:
- Analyze a large amount of data on the same Big Data clusters where the data are stored (HDFS, HBase, Hive, etc.) rather than move or duplicate data.
- Add deep learning capabilities to existing analytic applications and/or machine learning workflows rather than rebuild all of them.
- Leverage existing Big Data clusters; deep learning workloads should be managed and monitored with other workloads (ETL, data warehouse, traditional ML, etc.) rather than require separate deployment, resource allocation, workload management and enterprise monitoring.

Challenges and limitations to Production, considering some "Super Stars" (e.g., TensorFlow from Google Cloud). Maybe not your story, but we have:
- The concerns of enterprise maturity and InfoSec (impossible to install a TensorFlow cluster in the STAGE environment...)
- The claim that GPU computing is better than CPU requires new hardware infrastructure (a GPU cluster with TensorFlow, fewer monitoring tools...)
- Tedious and fragile to duplicate data pipelines and copy data to the GPU cluster (capacity and performance concerns)
- Couldn't leverage the existing data ETL, warehousing and other relevant analytic technologies and tool sets; needs extra data movements
- Not well integrated with other enterprise tools; low-level APIs with a steep learning curve (where is your PhD degree?)
- Success requires many engineers and normally a very long timeline; it is also a big challenge to make distributed computations work (hours...)

What does Spark offer?

Integrations with existing DL libraries:
- Deep Learning Pipelines (from Databricks)
- TensorFrames (Databricks)
- TensorFlow on Spark
- CaffeOnSpark
- CNTK on Spark (mmlspark)
- Elephas
- Paddle
- mxnet

Implementations of DL on Spark:
- BigDL
- SparkNet
- SparkCL
- DeepDist

TensorFlow-on-Spark (or Caffe-on-Spark) uses Spark executors (tasks) to launch TensorFlow/Caffe instances in the cluster; however, the distributed deep learning (e.g., training, tuning and prediction) is performed outside of Spark (across multiple TensorFlow or Caffe instances).

(1) As a result, TensorFlow/Caffe still runs on specialized HW (such as GPU servers interconnected by InfiniBand), and the OpenMP implementations in TensorFlow/Caffe conflict with the JVM threading in Spark (resulting in lower performance).

(2) In addition, in this case TensorFlow/Caffe can only interact with the rest of the analytics pipelines in a very coarse-grained fashion (running as standalone jobs outside of the pipeline, and using HDFS files as job input and output).

Need more break down...

Statistics collected on Mar 5th, 2018:

                          BigDL            TensorflowOnSpark   Databricks/tensorframes   Databricks/spark-deep-learning
  Programming interface   Scala & Python   Python              Python                    Python
  Contributors            50               9                   9                         8
  Commits                 2221             257                 185                       51

Use Case and POC

Business Use Case

Business driven: there are a variety of advertising campaigns with targeted goals, like creating awareness or re-targeting consumers. The latter could involve estimating the propensity of consumers to shop at target merchants within several days. We take PCLO (Personalized Card-Linked Offers), our Loyalty solution for merchant offerings, as our running example use case, focusing on offer targeting and predicting users' propensity.

Goal: implement new Users-Items Propensity Models with deep learning, based on the Intel BigDL framework, to help improve the quality, performance and accuracy of offer design, targeting, linking and matching campaigns and algorithms.

Issues with Traditional ML: Bottlenecks

- The computation time for LTV features took 70% of the data processing time for the whole lifecycle and occupied lots of resources, which had a huge impact on other critical workloads.
- Need to pre-calculate 126 Long Term Variables (LTV) for each user, such as total spends/visits for a merchants list and category list, divided by weeks, months and years.
- Miss the feature-selection optimizations which could save a lot of the data engineering effort.

(Pipeline diagram: AUTH DETAIL → FILTERED TRANSACTIONS → SUMMED BY USER → LTV DATA FOR THIS WEEK; combined with LTV DATA from last week, AGED BY USER, into AGED LTV DATA; plus ITEM LEVEL DATA by MERCHANT, CATEGORY, GEO.)
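As a toy illustration of the kind of pre-calculation this slide describes (all names and numbers are hypothetical, not Mastercard's actual pipeline), a long-term-variable roll-up of total spend and visit counts per user and merchant might look like:

```python
from collections import defaultdict

# Hypothetical transactions: (user, merchant, week, amount).
txns = [
    ("u1", "m1", 1, 25.0), ("u1", "m1", 2, 40.0),
    ("u1", "m2", 1, 10.0), ("u2", "m1", 2, 55.0),
]

def ltv_rollup(transactions):
    """Pre-calculate two long-term variables per (user, merchant):
    total spend and visit count across all weeks."""
    spend = defaultdict(float)
    visits = defaultdict(int)
    for user, merchant, _week, amount in transactions:
        spend[(user, merchant)] += amount
        visits[(user, merchant)] += 1
    return dict(spend), dict(visits)

spend, visits = ltv_rollup(txns)
# spend[("u1", "m1")] is 65.0; visits[("u1", "m1")] is 2
```

The real pipeline computes 126 such variables over sliding weekly/monthly/yearly windows, which is where the 70% of processing time goes.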

With Deep Learning: remove lots of LTV feature engineering workloads and simplify the validation

Improvements with BigDL (with neural networks):
- The feature engineering can be much simpler than pre-constructed templates or SQL: removing the LTV pre-calculations and complex ETL jobs saved 1~2 hours of time and a lot of resources.
- When building the model, only focus on pre-defined sliding features (few custom overlap works); only need to identify the column names from the SQL source.
- The internal feature selections and optimizations are done automatically when it does cross validation at the training stage.

Issues with Traditional ML: Model pipeline limitations

(Prebuilt per-merchant pipelines: Merchant 1...n → (1) Feature Engineering → (2) Training 1...n → (3) Model 1...n → Evaluation 1...n → (4) Merge the prediction results from all merchants; model correlation.)

Limitations:
- Have to pre-calculate the feature engineering computations for the training and testing processes.
- All the pipelines are separated by merchant and generate one model for each merchant.
- Running merchants in parallel occupied most of the cluster resources when executed.
- Lots of duplications and redundant computations.
- It was fine to support up to 200 merchants but couldn't scale to more.
- Bad metrics for merchants with few transactions.

Recommender approaches:

- NCF. Scenario: Neural Collaborative Filtering, recommend products to customers (priority is to recommend to active users) according to customers' past history activities. https://www.comp.nus.edu.sg/~xiangnan/papers/ncf.pdf
- Wide & Deep. Scenario: jointly trained wide linear models and deep neural networks, to combine the benefits of memorization and generalization for recommender systems. https://pdfs.semanticscholar.org/aa9d/39e938c84a867ddf2a8cabc575ffba27b721.pdf

(NCF network diagram: a user index and item index each feed two pairs of embedding LookupTables. MF path: MF User Embedding and MF Item Embedding combined by element-wise multiplication (CMul). MLP path: MLP User Embedding and MLP Item Embedding concatenated, then Linear1 → ReLU → Linear2 → ReLU. Both paths are concatenated and passed through Linear3 → Sigmoid.)
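A minimal plain-Python sketch of this NeuMF-style forward pass (embedding sizes and weights are toy placeholders, not the POC's trained model, and the Linear layers are collapsed to keep the sketch tiny):

```python
import math
import random

random.seed(0)
DIM = 4
N_USERS, N_ITEMS = 10, 10

def embedding_table(rows, dim):
    # Stand-in for a trained LookupTable; weights are random placeholders.
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

mf_user, mf_item = embedding_table(N_USERS, DIM), embedding_table(N_ITEMS, DIM)
mlp_user, mlp_item = embedding_table(N_USERS, DIM), embedding_table(N_ITEMS, DIM)

def ncf_score(u, i):
    # MF path: element-wise product (the CMul node in the diagram).
    mf = [a * b for a, b in zip(mf_user[u], mf_item[i])]
    # MLP path: concat user/item embeddings, then a ReLU
    # (the Linear layers are reduced to identity for brevity).
    mlp = [max(0.0, x) for x in mlp_user[u] + mlp_item[i]]
    # Concat both paths; the final Linear is summarized as a plain sum.
    z = sum(mf) + sum(mlp)
    return 1.0 / (1.0 + math.exp(-z))  # Sigmoid: propensity in (0, 1)

p = ncf_score(0, 3)
```

In the real model the two paths keep separate trained Linear weights; the point here is only the shape of the computation.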

Benchmark approaches

Abstract: learn the details of Intel BigDL, including APIs and mainstream models, and master how to use it to handle the goals of this POC. After a couple of rounds of benchmarks, verify and validate the differences and enhancements of deep learning compared with traditional machine learning, and purify the general data pipelines for similar kinds of modeling and data-science-engineering deep learning projects based on different business problems, domain models and data sets.

The major goals of this POC include:
- Try two kinds of approaches to predict the users-merchants (offers) propensities model:
  - Traditional machine learning with handcrafted features (ALS based on Spark MLlib)
  - NCF & WAD deep learning with automatically selected features
- For each benchmark, evaluate and compare the regular performance metrics of ML models as follows:
  - Top-20 Precision & Recall (https://en.wikipedia.org/wiki/Precision_and_recall)
  - Area under the Precision-Recall curve (PR AUC)
  - Area under the Receiver Operating Characteristic curve (ROC AUC)
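The top-k precision and recall metrics above can be sketched in a few lines (merchant names are made up for illustration):

```python
def precision_at_k(ranked, relevant, k=20):
    """Top-k precision: fraction of the first k recommendations that hit."""
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / len(top)

def recall_at_k(ranked, relevant, k=20):
    """Top-k recall: fraction of all relevant items recovered in the top k."""
    top = ranked[:k]
    return sum(1 for item in top if item in relevant) / len(relevant)

ranked = ["m1", "m2", "m3", "m4"]       # model's ranking for one user
relevant = {"m1", "m3", "m9"}           # merchants actually visited
p20 = precision_at_k(ranked, relevant)  # 2 hits / 4 ranked = 0.5
r20 = recall_at_k(ranked, relevant)     # 2 hits / 3 relevant
```

PR AUC and ROC AUC then summarize these tradeoffs across all score thresholds rather than at a single cutoff.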

Target Environments

- Prod-Like (Stage) cluster: we selected a Stage Hadoop cluster which has the same restrictions as PROD.
  - 24-node Hadoop cluster (3 management nodes); each node is a physical box with 12~24 hyper cores, 384 G memory, 21 T disk
  - Hadoop distribution: CDH 5.12.1; JDK version: 1.8; Spark version: 2.2
- ML Libs:
  - Intel BigDL (0.3)
  - Spark MLlib ALS (2.2)
- Select data sets: we selected 3 years of credit transactions (from a Mid Bank) with these characteristics:
  - Credit transactions: 1.4 billion (53 G for raw data)
  - Distinct qualified consumers: 675 k
  - Known merchants (Offers or Campaigns) for benchmark: 2000
  - Months for training, and 1~2 months for back test

Implementation: run BigDL & ALS over Spark on Hadoop

(Pipeline diagram:
- Training data: load Parquet files (raw txns, months 10~12) → sampled partitions + negative samples
- Spark pre-processing: simple feature engineering + feature selections → Spark DataFrames
- Spark ML Pipeline stages:
  - Estimator: train multiple models, i.e., an NCF model (BigDL), a Wide And Deep model (BigDL), and an ALS model (MLlib)
  - Transformer: predictions using the Neural Recommender (BigDL NCF / Wide And Deep) and MLlib
- Multiple models: User-Merchant, User-Geo, User-Category, ...
- Fine Tune & Model Evaluation → candidate models
- Test data (months 1~2) → feature inference → model selection → ensemble → predictions → post-processing → benchmark.)
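The "negative samples" step in the pipeline above can be sketched as random sampling of unobserved user-item pairs (a common recipe for implicit-feedback training; the POC's exact sampler and ratio are not specified, so the values here are illustrative):

```python
import random

def sample_negatives(positives, n_items, ratio=4, seed=42):
    """For each observed (user, item) pair, draw `ratio` random items the
    user never interacted with and label them 0; positives are labeled 1.
    Assumes each user has seen far fewer than n_items items."""
    rng = random.Random(seed)
    seen = {}
    for u, i in positives:
        seen.setdefault(u, set()).add(i)
    samples = [(u, i, 1) for u, i in positives]
    for u, _ in positives:
        for _ in range(ratio):
            j = rng.randrange(n_items)
            while j in seen[u]:           # reject items the user has seen
                j = rng.randrange(n_items)
            samples.append((u, j, 0))
    return samples

data = sample_negatives([(0, 1), (0, 2), (1, 5)], n_items=100)
# 3 positives + 3 * 4 negatives = 15 labeled samples
```

In the actual job this would run as a Spark transformation over the sampled partitions rather than in-memory Python.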

Benchmark results (over 100 rounds)

MLlib ALS
- Parameters: MaxIter (100), RegParam (0.01), Rank (200), Alpha (0.01)
- Results (baseline): precision: A; top-20 precision: B; recall: C; AUPRC: D; AUROC: E

BigDL NCF
- Parameters: MaxEpoch (10), learningRate (3e-2), learningRateDecay (3e-7), uOutput (100), mOutput (200), batchSize (1.6 M)
- Results: precision: A+23%; top-20 precision: B+31%; recall: C+18%; AUPRC: D+47%; AUROC: E+51%

BigDL WAD
- Parameters: MaxEpoch (10), learningRate (1e-2), learningRateDecay (1e-7), uOutput (100), mOutput (200), batchSize (0.6 M)
- Results: precision: A+20% (3% down vs. NCF); top-20 precision: B+30% (1% down); recall: C+12% (4% down); AUPRC: D+49% (2% up); AUROC: E+54% (3% up)

PROD on-boarding journey and Tips for BigDL

Additional PROD on-boarding efforts: Continuous Learning (re-training with a batch time-series algorithm)

- Fine Tuning: among all the users, items and merchants, the business wants enhanced deep learning on subsets of them (50 selected merchants out of 6000) to improve the prediction metrics:
  - Prepare training and testing datasets for the pre-selected items.
  - Run the global training first; then, based on the phase-1 model, run fine tuning with enhanced parameters on the relevant train and test datasets from the pre-selected items.
  - Update the model and generate separate evaluation matrixes.
  - Benchmark the history model against the updated model and on-board the better one.
- Periodic re-training: only re-run the whole pipeline with incrementally changed datasets, such as daily changed transactions and models:
  - Refresh the dimensional datasets (such as adding new users, items...).
  - Load the history model into the context and update incremental parts of the model based on the incremental data sets.
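Schematically, the fine-tuning step is "continue training from the phase-1 weights on the subset only". The sketch below uses plain SGD on a toy one-weight squared-error model; the learning_rate / learning_rate_decay / max_epoch names mirror the benchmark parameters, but the model itself is a stand-in, not BigDL:

```python
def fine_tune(weight, subset, learning_rate=1e-2, learning_rate_decay=1e-7,
              max_epoch=10):
    """Continue SGD from an existing ('phase 1') weight on a data subset,
    with a per-step learning-rate decay."""
    lr = learning_rate
    for _ in range(max_epoch):
        for x, y in subset:
            grad = 2 * (weight * x - y) * x   # d/dw of (w*x - y)^2
            weight -= lr * grad
            lr /= 1.0 + learning_rate_decay   # decay per step
    return weight

# Starting from the "global" weight 0.0, fit the subset y = 2x.
w = fine_tune(0.0, [(1.0, 2.0), (2.0, 4.0)], max_epoch=100)
```

The benchmarking step then compares the history model and the fine-tuned model on the same held-out data before on-boarding.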

Additional PROD on-boarding efforts: Model Serving

- Build the model serving capability by exporting the model to scoring/prediction/recommendation services and APIs; integration points: connect to existing business pipelines (offline, streaming and real-time).
- Integrate the model serving services inside the business pipelines, such as embedding them into Map/Reduce jobs for offline, Spark Streaming jobs for streaming, and the real-time "dialogue" with Kafka messaging...
- Microservice implementation and automation deployment strategy (Containerization & Container Orchestration): deploy to on-premise and off-premise cloud-native container environments.
- Cloud-Native API/Pipeline enablement: APIs to create an autonomous deployment pipeline.
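One simple way to sketch the "export the model for serving" idea (the file name, embedding structure and scoring function here are hypothetical; a real deployment would use the framework's own save/serve formats):

```python
import json

# Hypothetical trained parameters: per-user and per-item embeddings.
model = {
    "user_embeddings": {"u1": [0.1, -0.2], "u2": [0.0, 0.3]},
    "item_embeddings": {"m1": [0.4, 0.1]},
}

def export_model(model, path):
    # Serialize to JSON so a lightweight scoring microservice can load it.
    with open(path, "w") as f:
        json.dump(model, f)

def load_model(path):
    with open(path) as f:
        return json.load(f)

def score(served, user, item):
    # Dot product of embeddings: a minimal scoring-endpoint body.
    u = served["user_embeddings"][user]
    v = served["item_embeddings"][item]
    return sum(a * b for a, b in zip(u, v))

export_model(model, "model.json")
served = load_model("model.json")
```

Decoupling the exported artifact from the training cluster is what lets the same model back offline Map/Reduce jobs, streaming jobs and real-time services.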

Other PROD enablement Tips

Performance Tuning:
- Batch size matters.
- Plan your epochs and iterations; a tradeoff between performance and precision.
- Separate data preparation, training and inference steps into multiple Spark jobs.
- Random sampling vs. invert sampling: another tradeoff.
- Avoid OOM:
  - Tune GC a lot (god bless you).
  - Tune shuffle a lot, such as spark.shuffle.manager=tungsten-sort.
  - Turn on spark.rdd.compress.

Some limitations you should know:
- BigDL 0.2 does not support Spark 2.x, so use at least 0.3.
- Only supports JavaSerializer; issue with KryoSerializer.
- BigDL broadcasts the model to each CPU core, not each executor!
- Sparse Tensor, not Dense Tensor.
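The shuffle and serializer tips above would translate into submit-time configuration roughly as follows. This is a sketch only: the property names come from the tips, the values and the script name are placeholders, and note that on Spark 1.6+ the tungsten-sort shuffle manager was folded into the default sort implementation, so that line mainly matters on older clusters:

```shell
# Placeholder job script; property values are examples, not the POC's.
spark-submit \
  --conf spark.shuffle.manager=tungsten-sort \
  --conf spark.rdd.compress=true \
  --conf spark.serializer=org.apache.spark.serializer.JavaSerializer \
  my_bigdl_job.py
```

Keeping these in the submit command (rather than hard-coded in the job) makes it easier to tune per cluster.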

Thanks. Q & A
