Transfer Learning
Changrong Ji, A3.AI 02/2019
1 Topics
• Transfer Learning: What and Why
• Computer Vision
  • Example: Train + inference with MobileNet using JavaScript in the browser
• NLP
  • Transformer & Attention
  • Example code
• Tabular/Structured Data
  • Challenges and opportunities
• Applying Transfer Learning to Healthcare
  • Claims and clinical concept embeddings
  • Deep Learning Model Factory
2 About Me
• Changrong Ji
• Founder of A3.AI, a nonprofit Applied AI R+D organization
• Leads AI/ML explorations at CareFirst BlueCross BlueShield
• Investor, entrepreneur, learner, maker
[email protected] https://www.linkedin.com/in/changrongji/
3 What is Transfer Learning
End-to-End Modeling
Transfer Learning
e.g. Google BERT for NLP
Store knowledge gained solving one problem and apply it to a different but related problem:
1. Train a model on a large source dataset
2. Use the pretrained model to jump-start training for related target task(s)

4 Benefits
• Fewer labelled training examples
• Faster convergence
• More robust models
• Privacy preservation
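The two-step recipe above can be run end-to-end on a toy problem. Everything below is illustrative: the tasks, sizes, seeds, and the plain-numpy logistic regression are stand-ins, not the models from these slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "source" task: plenty of labels drawn from y = sign(w_true . x).
w_true = rng.normal(size=8)
X_src = rng.normal(size=(2000, 8))
y_src = (X_src @ w_true > 0).astype(float)

def train_logreg(X, y, w=None, steps=300, lr=0.5):
    """Full-batch gradient-descent logistic regression; w may be pre-initialized."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# 1. Train a model on the large source dataset.
w_src = train_logreg(X_src, y_src)

# Toy "target" task: a related problem (slightly perturbed weights),
# but with only a handful of labels.
X_tgt = rng.normal(size=(20, 8))
y_tgt = (X_tgt @ (w_true + 0.1 * rng.normal(size=8)) > 0).astype(float)

# End-to-end baseline: train the target model from scratch on 20 examples.
w_scratch = train_logreg(X_tgt, y_tgt, steps=50)

# 2. Transfer: start from the pretrained weights and fine-tune briefly.
w_transfer = train_logreg(X_tgt, y_tgt, w=w_src.copy(), steps=50)

# Evaluate both on held-out target data.
X_test = rng.normal(size=(1000, 8))
y_test = (X_test @ w_true > 0).astype(float)
acc = lambda w: ((X_test @ w > 0) == y_test).mean()
print(f"from scratch: {acc(w_scratch):.2f}  transferred: {acc(w_transfer):.2f}")
```

With only 20 target labels, the transferred model typically keeps most of the accuracy learned on the source task, which is the "fewer labelled examples, faster convergence" benefit in miniature.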
• Example code (login required)
Text Classification Example. Source: ULMFiT paper
5 Transfer Learning: Machine Learning’s Next Frontier
• Before 2018: Popularized in computer vision, audio
• 2018: Breakthroughs in NLP
• 2019+: Advances in Structured Data?

6 “Transfer learning is the key to artificial general intelligence.”
Demis Hassabis, DeepMind
7 History of Transfer Learning
• Long history of psychological study of “Transfer of Learning” in human cognition
• Theoretical foundations of Transfer Learning in Machine Learning established in the 1990s
• Popularized in computer vision since the mid-2010s
  • Example: Train and inference with MobileNet using TensorFlow.js in the browser
• Recent breakthroughs in Natural Language Processing are propelling NLP into a “Golden Era”
  • Before 2018: word embeddings: Word2Vec, GloVe
  • 2018: ULMFiT, GPT/GPT-2, Google BERT
  • 2019: MT-DNN, XLNet, RoBERTa, ERNIE 2.0, etc.
  • 2020: T5, Reformer
  • Leaderboard
• Limited adoption in structured data to-date
• Challenges
• Intriguing opportunities

8 Under the Hood
• A Comprehensive Hands-on Guide
• What is transferable
  • Feature-representation *
  • Parameters *
  • Instance
  • Relational-knowledge
*currently more commonly used in Healthcare use cases
9 Attention & Transformer
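The core operation named in this slide's title can be written in a few lines. This is a minimal numpy sketch of scaled dot-product attention as defined in Attention is All You Need (single head, no masking; the references below cover the full multi-head Transformer):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 16))   # 4 query positions, d_k = 16
K = rng.normal(size=(6, 16))   # 6 key/value positions
V = rng.normal(size=(6, 32))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)      # each query position gets a weighted mix of all 6 values
```

Because every query attends to every key simultaneously, the whole sequence is processed in parallel and each output is context-sensitive in both directions, which is exactly the property the slide highlights.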
• Attention: http://jalammar.github.io/illustrated-transformer
• Attention is All You Need
• The Annotated Transformer
• Context-sensitive, bi-directional, parallel processing

10 Related Concepts
• Representation Learning
• Domain Adaptation
• Zero/Few-Shot Learning
• Multi-task Learning
  • Multiple tasks solved at the same time
  • Exploit commonalities and differences across tasks
  • Examples
    • Microsoft MT-DNN
    • Text-to-Text Transfer Transformer
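The shared-representation idea behind MT-DNN-style multi-task learning can be sketched in numpy. The two regression tasks, shapes, and learning rate below are all made up for illustration; the point is that both tasks' gradients update one shared layer while each task keeps its own head:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two correlated regression tasks on the same inputs.
X = rng.normal(size=(500, 10))
w_a = rng.normal(size=10)
w_b = w_a + 0.2 * rng.normal(size=10)      # related, but not identical, task
y_a, y_b = X @ w_a, X @ w_b

W = 0.3 * rng.normal(size=(10, 4))         # shared representation layer
h_a, h_b = np.zeros(4), np.zeros(4)        # task-specific heads

lr = 0.05
for _ in range(3000):
    Z = X @ W                              # shared features for both tasks
    e_a, e_b = Z @ h_a - y_a, Z @ h_b - y_b
    h_a -= lr * Z.T @ e_a / len(X)         # each head sees only its own error...
    h_b -= lr * Z.T @ e_b / len(X)
    # ...but both tasks' errors flow back into the shared layer:
    W -= lr * (np.outer(X.T @ e_a, h_a) + np.outer(X.T @ e_b, h_b)) / len(X)

mse_a = float(((X @ W @ h_a - y_a) ** 2).mean())
mse_b = float(((X @ W @ h_b - y_b) ** 2).mean())
print(round(mse_a, 3), round(mse_b, 3))
```

Because the shared layer must serve both tasks, it is pushed toward features that exploit their commonalities, which is the stated motivation for multi-task learning above.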
11 T5: Text-to-Text Transfer Transformer
• 11 billion parameters
• Code
12 NLP Examples with Code
• Domain-specific fine-tuning
  • Language Model Example (login required)
  • ClinicalBERT, Code
• Training from Scratch
13 Transfer Learning for Unstructured Data is Relatively Easy
Source: Giles Strong
15 Transfer Learning for Structured Data is Harder
Features vary in:
• Number
• Meaning
• Continuous vs. categorical
Applying a pretrained model requires the same input features
A common model can’t be trained for all possible structured data
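A concrete view of that input-feature constraint, as a hypothetical sketch: `TRAINED_SCHEMA` and the alignment helper below are illustrative, and the workaround they implement (dropping unknown columns, zero-filling missing ones) shows exactly what is lost when schemas differ.

```python
import pandas as pd

# Columns a hypothetical pretrained tabular model expects, in order.
TRAINED_SCHEMA = ["age", "num_claims", "total_paid", "is_chronic"]

def align_to_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Force a frame to match the training schema.

    This only papers over the mismatch: a genuinely new feature
    ("region" below) is silently discarded, and missing features are
    imputed with 0 -- which is why tabular transfer is hard.
    """
    return df.reindex(columns=TRAINED_SCHEMA, fill_value=0)

new_data = pd.DataFrame({"age": [40], "total_paid": [1200.0], "region": ["MD"]})
aligned = align_to_schema(new_data)
print(list(aligned.columns))
```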
Example: Health Insurance Claims

16 Approaches for Transfer Learning on Structured Data
1. Turn structured data into images or text representations
2. Create models and standard data representations specific to domains
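The first approach can be sketched as follows. This is a simplified stand-in for SuperTML, which renders the feature values as text into an image for a pretrained CNN; here each feature just fills one fixed patch of the image with its normalized value.

```python
import numpy as np

def row_to_image(row, size=32):
    """Map a tabular feature vector onto a fixed 2D grid.

    Each feature gets a fixed square patch whose intensity is the
    min-max-normalized value (SuperTML proper draws the value as text
    at a fixed position instead).
    """
    img = np.zeros((size, size))
    cols = int(np.ceil(np.sqrt(len(row))))
    cell = size // cols
    lo, hi = min(row), max(row)
    for i, v in enumerate(row):
        r, c = divmod(i, cols)
        level = (v - lo) / (hi - lo) if hi > lo else 0.0
        img[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = level
    return img

# Hypothetical member row: age, visit count, total paid, chronic flag.
img = row_to_image([63.0, 4.0, 12850.0, 1.0])
print(img.shape, img.max())
```

Once every row is an image of the same size, any pretrained vision backbone can be fine-tuned on it, sidestepping the schema-mismatch problem at the cost of an artificial encoding.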
Example: SuperTML. Credit: Mitch Quinn, BlueCross BlueShield NC

17 Example: Predictive Modeling using Claims
End-to-End Modeling:
• 500K members’ claims labelled for High Cost Claimant → End-to-End Model for HCC
• 500K members’ claims labelled for Customer Churn → End-to-End Model for Churn
• 500K members’ claims labelled for Hospital Re-admission → End-to-End Model for Re-admit
Transfer Learning:
• Millions of members’ un-labelled claims → self-supervised pre-training of a DNN (with tuned hyper-parameters)
• The DNN learns vector representations (concept embeddings) of diagnoses, procedures, drug codes, etc. over time
• Supervised fine-tuning on task-specific labelled data (single or multi-task training) yields the HCC, Churn, and Re-admit models
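The pre-training path on this slide can be sketched with a co-occurrence factorization, a small linear-algebra stand-in for the self-supervised embedding training described above. The claim codes and member histories below are fabricated for illustration.

```python
import numpy as np

# Fabricated member claim histories (ICD/NDC-style codes).
members = [
    ["E11.9", "N18.3", "metformin"],   # diabetes, CKD, drug
    ["E11.9", "metformin", "I10"],     # diabetes, drug, hypertension
    ["I10", "C50.9", "chemo"],         # hypertension, cancer, chemo
    ["C50.9", "chemo", "N18.3"],
]
vocab = sorted({c for m in members for c in m})
idx = {c: i for i, c in enumerate(vocab)}

# Code x code co-occurrence counts within each member's history --
# the self-supervision signal (no labels needed).
C = np.zeros((len(vocab), len(vocab)))
for m in members:
    for a in m:
        for b in m:
            if a != b:
                C[idx[a], idx[b]] += 1

# Low-rank factorization gives dense code embeddings, standing in for
# word2vec-style training on real claims sequences.
U, S, _ = np.linalg.svd(C)
dim = 3
code_emb = U[:, :dim] * S[:dim]

# Member-level features: average the member's code embeddings; these
# feed the supervised fine-tuning stage for HCC/Churn/Re-admit.
member_emb = np.stack(
    [code_emb[[idx[c] for c in m]].mean(axis=0) for m in members]
)
print(member_emb.shape)
```

The unlabelled pre-training step is done once over all members; each downstream task then starts from the same embeddings rather than relearning claims structure from its own 500K labelled examples.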
18 Transfer Learning with Claims and Clinical Data
• Base DNN trained on millions of members’ claims spanning multiple years
  • Claims data are sparse, noisy and high-dimensional
  • Tens of thousands of ICD diagnosis, procedure and NDC drug codes
  • The DNN learns latent relationships between features and projects them into lower-dimensional vectors (60–1000 dimensions)
• Clinical data (notes, labs, semi-structured EMR) can be processed similarly
  • Concatenate claims and clinical features
• Fine-tune for:
  • Specific data distributions (new populations, regions, etc.)
  • Specific downstream tasks, resulting in task-specific models (classification, regression, etc.)
• Alternatively:
  • Use the base model as a feature extraction tool
  • Build separate task-specific models (Random Forest, Boosted Trees, etc.) decoupled from the base model
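The feature-extraction alternative, as a minimal sketch: the frozen encoder, data, and labels below are hypothetical, and a dependency-free nearest-centroid rule stands in for the Random Forest or Boosted Trees a real pipeline would use.

```python
import numpy as np

rng = np.random.default_rng(2)

def frozen_encoder(x):
    """Stand-in for the pretrained base DNN used purely as a feature
    extractor: the projection is fixed, we only run it forward."""
    W = np.linspace(-1, 1, 8 * 3).reshape(8, 3)   # frozen weights
    return np.tanh(x @ W)

# Hypothetical labelled downstream data: 8 raw claim features per member.
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

Z = frozen_encoder(X)   # extract features once; base model is never updated

# Any classical model can sit on top of the frozen features; here a
# nearest-centroid classifier keeps the sketch self-contained.
centroids = np.stack([Z[y == k].mean(axis=0) for k in (0, 1)])
pred = np.argmin(((Z[:, None, :] - centroids) ** 2).sum(-1), axis=1)
print((pred == y).mean())
```

Decoupling the downstream model from the base DNN means the embedding model can be versioned, validated, and served once, while task teams iterate on lightweight models of their own.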
19 Deep Learning AI Model Factory (Design Phase)
Changrong Ji - 2019

Train → Deploy → Integrate → Predict, Explain, Prescribe → Validate → Feedback, with transfer learning and multi-task learning in the Train stage.

• 1A. Pre-train: contextualized concept embedding DNN on claims, plus SDOH and EHR
• 1B. Train: task-specific models (High Cost, Re-admit, Knee Surgery, Customer Churn, many more) via task-specific training
• 2. Deploy: Model-as-a-Service API
• 3A. Input for inference: claims history, other data
• 3B. Output: prediction, probability, feature importance, interventions
• 4. Workflow engine integration: inform, validate, intervene; impact analysis
• 5. Feedback
• 6. Monitor model accuracy
• 7. Request new/upgraded models
• 8. Train, retrain, tune; validate; leverage