Transfer Learning
Changrong Ji, A3.AI 02/2019
1 Topics
• Transfer Learning: What and Why
• Computer Vision
  • Example: Train + inference with MobileNet using JavaScript in the browser
• NLP
  • Transformer & Attention
  • Example code
• Tabular/Structured Data
  • Challenges and opportunities
• Applying Transfer Learning to Healthcare
  • Claims and clinical concept embeddings
  • Deep Learning Model Factory
2 About Me
• Changrong Ji
• Founder of A3.AI, a nonprofit Applied AI R+D organization
• Leads AI/ML explorations at CareFirst BlueCross BlueShield
• Investor, entrepreneur, learner, maker
[email protected] https://www.linkedin.com/in/changrongji/
3 What is Transfer Learning
End-to-End Modeling
Transfer Learning
e.g. Google BERT for NLP
Store knowledge gained solving one problem and apply it to a different but related problem:
1. Train a model on a large source dataset
2. Use the pretrained model to jump-start training for related target task(s)

4 Benefits
• Fewer labelled training examples
• Faster convergence
• More robust models
• Privacy preservation
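The two-step recipe above can be run end-to-end on a toy problem. Everything below is illustrative: the tasks, sizes, seeds, and the plain-numpy logistic regression are stand-ins, not the models from these slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "source" task: plenty of labels drawn from y = sign(w_true . x).
w_true = rng.normal(size=8)
X_src = rng.normal(size=(2000, 8))
y_src = (X_src @ w_true > 0).astype(float)

def train_logreg(X, y, w=None, steps=300, lr=0.5):
    """Full-batch gradient-descent logistic regression; w may be pre-initialized."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# 1. Train a model on the large source dataset.
w_src = train_logreg(X_src, y_src)

# Toy "target" task: a related problem (slightly perturbed weights),
# but with only a handful of labels.
X_tgt = rng.normal(size=(20, 8))
y_tgt = (X_tgt @ (w_true + 0.1 * rng.normal(size=8)) > 0).astype(float)

# End-to-end baseline: train the target model from scratch on 20 examples.
w_scratch = train_logreg(X_tgt, y_tgt, steps=50)

# 2. Transfer: start from the pretrained weights and fine-tune briefly.
w_transfer = train_logreg(X_tgt, y_tgt, w=w_src.copy(), steps=50)

# Evaluate both on held-out target data.
X_test = rng.normal(size=(1000, 8))
y_test = (X_test @ w_true > 0).astype(float)
acc = lambda w: ((X_test @ w > 0) == y_test).mean()
print(f"from scratch: {acc(w_scratch):.2f}  transferred: {acc(w_transfer):.2f}")
```

With only 20 target labels, the transferred model typically keeps most of the accuracy learned on the source task, which is the "fewer labelled examples, faster convergence" benefit in miniature.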
• Example code (login required)
Text Classification Example. Source: ULMFiT paper
5 Transfer Learning: Machine Learning’s Next Frontier
• Before 2018: Popularized in computer vision, audio
• 2018: Breakthroughs in NLP
• 2019+: Advances in Structured Data?

6 “Transfer learning is the key to artificial general intelligence.”
Demis Hassabis, DeepMind
7 History of Transfer Learning
• Long history of psychological study of “Transfer of Learning” in human cognition
• Theoretical foundations of Transfer Learning in Machine Learning established in the 1990s
• Popularized in computer vision since the mid-2010s
  • Example: Train and inference with MobileNet using TensorFlow.js in the browser
• Recent breakthroughs in Natural Language Processing are propelling NLP into a “Golden Era”
  • Before 2018: word embeddings: Word2Vec, GloVe
  • 2018: ULMFiT, GPT/GPT-2, Google BERT
  • 2019: MT-DNN, XLNet, RoBERTa, ERNIE 2.0, etc.
  • 2020: T5, Reformer
  • Leaderboard
• Limited adoption in structured data to-date
• Challenges
• Intriguing opportunities

8 Under the Hood
• A Comprehensive Hands-on Guide
• What is transferable
  • Feature-representation *
  • Parameters *
  • Instance
  • Relational-knowledge
*currently more commonly used in Healthcare use cases
9 Attention & Transformer
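The core operation named in this slide's title can be written in a few lines. This is a minimal numpy sketch of scaled dot-product attention as defined in Attention is All You Need (single head, no masking; the references below cover the full multi-head Transformer):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 16))   # 4 query positions, d_k = 16
K = rng.normal(size=(6, 16))   # 6 key/value positions
V = rng.normal(size=(6, 32))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)      # each query position gets a weighted mix of all 6 values
```

Because every query attends to every key simultaneously, the whole sequence is processed in parallel and each output is context-sensitive in both directions, which is exactly the property the slide highlights.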
• Attention: http://jalammar.github.io/illustrated-transformer
• Attention is All You Need
• The Annotated Transformer
• Context-sensitive, bi-directional, parallel processing

10 Related Concepts
• Representation Learning
• Domain Adaptation
• Zero/Few-Shot Learning
• Multi-task Learning
  • Multiple tasks solved at the same time
  • Exploit commonalities and differences across tasks
  • Examples
    • Microsoft MT-DNN
    • Text-to-Text Transfer Transformer
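The shared-representation idea behind MT-DNN-style multi-task learning can be sketched in numpy. The two regression tasks, shapes, and learning rate below are all made up for illustration; the point is that both tasks' gradients update one shared layer while each task keeps its own head:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two correlated regression tasks on the same inputs.
X = rng.normal(size=(500, 10))
w_a = rng.normal(size=10)
w_b = w_a + 0.2 * rng.normal(size=10)      # related, but not identical, task
y_a, y_b = X @ w_a, X @ w_b

W = 0.3 * rng.normal(size=(10, 4))         # shared representation layer
h_a, h_b = np.zeros(4), np.zeros(4)        # task-specific heads

lr = 0.05
for _ in range(3000):
    Z = X @ W                              # shared features for both tasks
    e_a, e_b = Z @ h_a - y_a, Z @ h_b - y_b
    h_a -= lr * Z.T @ e_a / len(X)         # each head sees only its own error...
    h_b -= lr * Z.T @ e_b / len(X)
    # ...but both tasks' errors flow back into the shared layer:
    W -= lr * (np.outer(X.T @ e_a, h_a) + np.outer(X.T @ e_b, h_b)) / len(X)

mse_a = float(((X @ W @ h_a - y_a) ** 2).mean())
mse_b = float(((X @ W @ h_b - y_b) ** 2).mean())
print(round(mse_a, 3), round(mse_b, 3))
```

Because the shared layer must serve both tasks, it is pushed toward features that exploit their commonalities, which is the stated motivation for multi-task learning above.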
11 T5: Text-to-Text Transfer Transformer
• 11 billion parameters
• Code
12 NLP Examples with Code
• Domain-specific fine-tuning
  • Language Model Example (login required)
  • ClinicalBERT, Code
• Training from Scratch
13 Transfer Learning for Unstructured Data is Relatively Easy
Source: Giles Strong
15 Transfer Learning for Structured Data is Harder
Features vary in:
• Number
• Meaning
• Continuous vs. categorical
Applying a pretrained model requires the same input features
A common model can’t be trained for all possible structured data
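A concrete view of that input-feature constraint, as a hypothetical sketch: `TRAINED_SCHEMA` and the alignment helper below are illustrative, and the workaround they implement (dropping unknown columns, zero-filling missing ones) shows exactly what is lost when schemas differ.

```python
import pandas as pd

# Columns a hypothetical pretrained tabular model expects, in order.
TRAINED_SCHEMA = ["age", "num_claims", "total_paid", "is_chronic"]

def align_to_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Force a frame to match the training schema.

    This only papers over the mismatch: a genuinely new feature
    ("region" below) is silently discarded, and missing features are
    imputed with 0 -- which is why tabular transfer is hard.
    """
    return df.reindex(columns=TRAINED_SCHEMA, fill_value=0)

new_data = pd.DataFrame({"age": [40], "total_paid": [1200.0], "region": ["MD"]})
aligned = align_to_schema(new_data)
print(list(aligned.columns))
```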
Example: Health Insurance Claims

16 Approaches for Transfer Learning on Structured Data
1. Turn structured data into images or text representations
2. Create models and standard data representations specific to domains
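The first approach can be sketched as follows. This is a simplified stand-in for SuperTML, which renders the feature values as text into an image for a pretrained CNN; here each feature just fills one fixed patch of the image with its normalized value.

```python
import numpy as np

def row_to_image(row, size=32):
    """Map a tabular feature vector onto a fixed 2D grid.

    Each feature gets a fixed square patch whose intensity is the
    min-max-normalized value (SuperTML proper draws the value as text
    at a fixed position instead).
    """
    img = np.zeros((size, size))
    cols = int(np.ceil(np.sqrt(len(row))))
    cell = size // cols
    lo, hi = min(row), max(row)
    for i, v in enumerate(row):
        r, c = divmod(i, cols)
        level = (v - lo) / (hi - lo) if hi > lo else 0.0
        img[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = level
    return img

# Hypothetical member row: age, visit count, total paid, chronic flag.
img = row_to_image([63.0, 4.0, 12850.0, 1.0])
print(img.shape, img.max())
```

Once every row is an image of the same size, any pretrained vision backbone can be fine-tuned on it, sidestepping the schema-mismatch problem at the cost of an artificial encoding.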
Example: SuperTML. Credit: Mitch Quinn, BlueCross BlueShield NC

17 Example: Predictive Modeling using Claims
End-to-End Modeling:
• 500K members’ claims labelled for High Cost Claimant → End-to-End Model for HCC
• 500K members’ claims labelled for Customer Churn → End-to-End Model for Churn
• 500K members’ claims labelled for Hospital Re-admission → End-to-End Model for Re-admit
Transfer Learning:
• Millions of members’ un-labelled claims → self-supervised pre-training of a DNN (with tuned hyper-parameters)
• The DNN learns vector representations (concept embeddings) of diagnoses, procedures, drug codes, etc. over time
• Supervised fine-tuning on task-specific labelled data (single or multi-task training) yields the HCC, Churn, and Re-admit models
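The pre-training path on this slide can be sketched with a co-occurrence factorization, a small linear-algebra stand-in for the self-supervised embedding training described above. The claim codes and member histories below are fabricated for illustration.

```python
import numpy as np

# Fabricated member claim histories (ICD/NDC-style codes).
members = [
    ["E11.9", "N18.3", "metformin"],   # diabetes, CKD, drug
    ["E11.9", "metformin", "I10"],     # diabetes, drug, hypertension
    ["I10", "C50.9", "chemo"],         # hypertension, cancer, chemo
    ["C50.9", "chemo", "N18.3"],
]
vocab = sorted({c for m in members for c in m})
idx = {c: i for i, c in enumerate(vocab)}

# Code x code co-occurrence counts within each member's history --
# the self-supervision signal (no labels needed).
C = np.zeros((len(vocab), len(vocab)))
for m in members:
    for a in m:
        for b in m:
            if a != b:
                C[idx[a], idx[b]] += 1

# Low-rank factorization gives dense code embeddings, standing in for
# word2vec-style training on real claims sequences.
U, S, _ = np.linalg.svd(C)
dim = 3
code_emb = U[:, :dim] * S[:dim]

# Member-level features: average the member's code embeddings; these
# feed the supervised fine-tuning stage for HCC/Churn/Re-admit.
member_emb = np.stack(
    [code_emb[[idx[c] for c in m]].mean(axis=0) for m in members]
)
print(member_emb.shape)
```

The unlabelled pre-training step is done once over all members; each downstream task then starts from the same embeddings rather than relearning claims structure from its own 500K labelled examples.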
18 Transfer Learning with Claims and Clinical Data
• Base DNN trained on millions of members’ claims spanning multiple years
  • Claims data are sparse, noisy and high-dimensional
  • Tens of thousands of ICD diagnosis, procedure and NDC drug codes
  • The DNN learns latent relationships between features and projects them into lower-dimensional vectors (60–1000 dimensions)
• Clinical data (notes, labs, semi-structured EMR) can be processed similarly
  • Concatenate claims and clinical features
• Fine-tune for:
  • Specific data distributions (new populations, regions, etc.)
  • Specific downstream tasks, resulting in task-specific models (classification, regression, etc.)
• Alternatively:
  • Use the base model as a feature extraction tool
  • Build separate task-specific models (Random Forest, Boosted Trees, etc.) decoupled from the base model
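The feature-extraction alternative, as a minimal sketch: the frozen encoder, data, and labels below are hypothetical, and a dependency-free nearest-centroid rule stands in for the Random Forest or Boosted Trees a real pipeline would use.

```python
import numpy as np

rng = np.random.default_rng(2)

def frozen_encoder(x):
    """Stand-in for the pretrained base DNN used purely as a feature
    extractor: the projection is fixed, we only run it forward."""
    W = np.linspace(-1, 1, 8 * 3).reshape(8, 3)   # frozen weights
    return np.tanh(x @ W)

# Hypothetical labelled downstream data: 8 raw claim features per member.
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

Z = frozen_encoder(X)   # extract features once; base model is never updated

# Any classical model can sit on top of the frozen features; here a
# nearest-centroid classifier keeps the sketch self-contained.
centroids = np.stack([Z[y == k].mean(axis=0) for k in (0, 1)])
pred = np.argmin(((Z[:, None, :] - centroids) ** 2).sum(-1), axis=1)
print((pred == y).mean())
```

Decoupling the downstream model from the base DNN means the embedding model can be versioned, validated, and served once, while task teams iterate on lightweight models of their own.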
19 Deep Learning AI Model Factory (Design Phase)
Changrong Ji - 2019

Train → Deploy → Integrate → Predict, Explain, Prescribe → Validate → Feedback, with transfer learning and multi-task learning in the Train stage.

• 1A. Pre-train: contextualized concept embedding DNN on claims, plus SDOH and EHR
• 1B. Train: task-specific models (High Cost, Re-admit, Knee Surgery, Customer Churn, many more) via task-specific training
• 2. Deploy: Model-as-a-Service API
• 3A. Input for inference: claims history, other data
• 3B. Output: prediction, probability, feature importance, interventions
• 4. Workflow engine integration: inform, validate, intervene; impact analysis
• 5. Feedback
• 6. Monitor model accuracy
• 7. Request new/upgraded models
• 8. Train, retrain, tune; validate; leverage