AI, Machine Learning and Language Technologies: What’s Next?

Army Mad Scientists Program March 7, 2017

Jaime Carbonell and colleagues
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~jgc

AI Touches Virtually All Areas of Computer Science

[Diagram labels: Machine Learning; Systems + Theory; Language Tech; Entertainment Tech; Fine Arts; Comp Bio; Sciences; Human-Computer Interaction; Robotics; Computer Science; Humanities; Engineering]

3/6/2017 Jaime G. Carbonell, Language Technologies Institute

AI is Becoming Central to the World Economy (Davos 2016)

“The fourth Industrial Revolution” is characterized by:
  • “Ubiquitous and mobile internet”,
  • “Smaller and more powerful sensors”,
  • “Artificial Intelligence”, and
  • “Machine Learning”

-- Prof. Klaus Schwab, founder of the Davos World Economic Forum, 2016

Key Components of AI

o Automated Perception
  • Vision, sonar, lidar, haptics, …
o Robotic Action
  • Locomotion, manipulation, …
o Deep Reasoning
  • Planning, goal-oriented behavior, projection, …
o Language Technologies
  • Language, speech, dialog, social nets, …
o Machine Learning ← Today’s main focus
  • Adaptation, reflection, knowledge acquisition, … (my research)
o Big Data

How Big is Big?
Dimensions of Big Data Analytics

LARGE-SCALE: TERABYTES → PETABYTES → EXABYTES

Billions++ of entries: Terabytes/Petabytes of data

HIGH-COMPLEXITY Trillions of potential relations among entries (graphs)

HIGH-DIMENSIONAL Millions of attributes per entry (but typically sparse encoding)

The Big-Data “Stack”

[Diagram: layers of the stack, top to bottom]
Analytics Algorithms -- Machine Learning -- Artificial Intelligence (outputs: Alerts, Visualization)
Big-Data Architecture -- Hadoop/H-Table -- Asynch/Pegasus
Big-Data “Plumbing” -- Cloud/Storage -- Resource Allocator
Inputs: Sensors, Knowledge base, Historical & Normative Data

Trends in Machine Learning
o “Deep” Learning (DNNs): vision, speech, NLP
o Reinforcement Learning: robotics
o Large-margin methods (SVM): classification
o Graphical models: strong priors, domain knowledge

How to cope with knowledge sparsity?
o (Pro)Active learning: optimizing external help
o Transfer/Multitask learning: related new domains
o Explainable AI (to engage SMEs, users)
o (Pro)Active teaching: …coming next?

Machine Learning in A Nutshell
o Training data:
  • Special case:
o Functional space:
o Fitness Criterion:
  • a.k.a. loss function
o Active Learning Sampling Strategy:
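A minimal sketch of how the pieces named on this slide (training data, a function space, a loss, and a sampling strategy) can fit together in pool-based active learning. It assumes a logistic-regression model, plain gradient descent, and uncertainty sampling as the strategy; none of these choices come from the slide, and `fit_logistic`, `uncertainty_sample`, and `active_learn` are hypothetical names used only for illustration.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=500, l2=1e-3):
    """Fit weights by gradient descent on an L2-regularized logistic loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
    return w

def uncertainty_sample(w, X_pool):
    """Index of the pool row whose predicted probability is closest to 0.5."""
    p = 1.0 / (1.0 + np.exp(-(X_pool @ w)))
    return int(np.argmin(np.abs(p - 0.5)))

def active_learn(X, y_oracle, seed_idx, budget):
    """Grow the labeled set one query at a time (assumed: one perfect oracle)."""
    labeled = list(seed_idx)
    for _ in range(budget):
        pool = [i for i in range(len(X)) if i not in labeled]
        if not pool:
            break
        w = fit_logistic(X[labeled], y_oracle[labeled])
        labeled.append(pool[uncertainty_sample(w, X[pool])])
    return fit_logistic(X[labeled], y_oracle[labeled]), labeled
```

Here `y_oracle` plays the role of the labeler: a label is only read for instances that have been selected, which is what makes the labeling budget explicit.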

Why is Active Learning Important?

o Labeled data volumes ≪ unlabeled data volumes
  • 1.2% of all proteins have known structures
  • < .01% of all galaxies in the Sloan Sky Survey have consensus type labels
  • < .0001% of all web pages have topic labels
  • << 10^-10 % of all internet sessions are labeled as to fraudulence (malware, etc.)
  • < .0001% of all financial transactions are investigated w.r.t. fraudulence
o If labeling is costly, or limited, select the instances with maximal impact for learning

Strategy Selection: A Surprise
There is No Universal Optimum

• Optimal operating range for AL sampling strategies differs • How to get the best of both worlds? • (Hint: ensemble methods)

How does DUAL do better?

o Runs DWUS (density-weighted uncertainty sampling) until it estimates a cross-over

o Monitors the change in expected error at each iteration to detect when it is stuck in local minima

o DUAL uses a mixture model after the cross-over (saturation) point

o Our goal should be to minimize the expected future error
  • If we knew the future error of Uncertainty Sampling (US) to be zero, then we’d force
  • But in practice, we do not know it
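A rough sketch of the switching behaviour described above, under stated assumptions. The slide's own formulas are not in the text, so everything here is illustrative: the cross-over test (change in estimated error below a threshold `delta`), the mixture form (uncertainty blended with density, weighted by an estimate of US error), and the names `reached_crossover` and `select_query` are all hypothetical.

```python
import numpy as np

def reached_crossover(expected_errors, delta=1e-3):
    """Assumed test: the estimated error has stopped changing between iterations."""
    if len(expected_errors) < 2:
        return False
    return abs(expected_errors[-1] - expected_errors[-2]) < delta

def select_query(uncertainty, density, expected_errors, est_us_error, delta=1e-3):
    """Pick the next instance to label.

    Before the cross-over: density-weighted uncertainty (DWUS-style product).
    After the cross-over: a mixture in which a low estimated US error puts
    more weight on plain uncertainty (an assumed weighting, not the slide's).
    """
    uncertainty = np.asarray(uncertainty, dtype=float)
    density = np.asarray(density, dtype=float)
    if not reached_crossover(expected_errors, delta):
        scores = uncertainty * density
    else:
        pi = 1.0 - float(np.clip(est_us_error, 0.0, 1.0))
        scores = pi * uncertainty + (1.0 - pi) * density
    return int(np.argmax(scores))
```

Since the true future error of US is unknown, `est_us_error` has to be an estimate computed from the data labeled so far; how that estimate is obtained is left open here.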

Cost varies non-uniformly

statistically significant (p<0.01)

Active Learning is Awesome, but … is it Enough?

[Diagram: going beyond traditional active learning (single perfect source, fixed labeling cost) to proactive learning (multiple sources, varying-cost model, differing expertise, answer reluctance, task difficulty, expertise level, labeling noise, ambiguity; fixed over time vs. time-varying). Citations shown: CIKM ‘08, KDD ‘09, JMLR ‘09, SDM_sub ‘10]

Active vs Proactive Learning

Number of Oracles
  Active Learning: Individual (only one)
  Proactive Learning: Multiple, with different capabilities, costs and areas of expertise

Reliability
  Active Learning: Infallible (100% right)
  Proactive Learning: Variable across oracles and queries, depending on difficulty, expertise, …

Reluctance
  Active Learning: Indefatigable (always answers)
  Proactive Learning: Variable across oracles and queries, depending on workload, certainty, …

Cost per query
  Active Learning: Invariant (free or constant)
  Proactive Learning: Variable across oracles and queries, depending on workload, difficulty, …

Note: “Oracle” ∈ {expert, experiment, computation, …}
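One way to read the contrast above is as a joint choice of instance and oracle. The sketch below assumes a simple expected-net-gain rule (probability of answering times probability of being right times the value of the label, minus the cost); this rule, the `Oracle` fields, and the numbers in the usage note are hypothetical illustrations, not the method from the cited papers.

```python
from dataclasses import dataclass

@dataclass
class Oracle:
    name: str
    p_answer: float   # probability the oracle responds (1 - reluctance)
    p_correct: float  # probability that a given answer is right
    cost: float       # cost charged per query

def expected_gain(value, oracle):
    """Expected net gain of asking `oracle` about an instance worth `value`.

    Simplifying assumptions: a missing or wrong answer is worth nothing,
    and the cost is paid whether or not the oracle answers.
    """
    return oracle.p_answer * oracle.p_correct * value - oracle.cost

def choose(values, oracles):
    """Return (instance_index, oracle_name) with the highest expected net gain."""
    candidates = [
        (expected_gain(v, o), i, o.name)
        for i, v in enumerate(values)
        for o in oracles
    ]
    _, i, name = max(candidates, key=lambda c: c[0])
    return i, name
```

With a reliable but expensive `Oracle("expert", 0.9, 0.95, 0.5)` and a cheap but noisy `Oracle("crowd", 1.0, 0.6, 0.1)`, a low-value instance goes to the crowd and a high-value one to the expert, which is the kind of trade-off the table describes.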

SDM ‘10
Does Tracking Predictor Accuracy Actually Help in Proactive Learning?

Active Learning for MT

[Diagram components: Source Language Corpus; Active Learner; Sampled corpus (S); Expert Translator; Parallel corpus (S,T); Trainer; Model; MT System]

Active Crowd Translation

[Diagram components of the ACT Framework: Source Language Corpus; Sentence Selection (S); crowd translations (S,T_1), (S,T_2), …, (S,T_n); Translation Selection; Trainer; Model; MT System]

Active Learning Strategy: Diminishing Density Weighted Diversity Sampling

Experiments:
  Language Pair: Spanish-English
  Iterations: 20
  Batch Size: 1000 sentences each
  Translation: Moses Phrase SMT
  Development Set: 343 sentences
  Test Set: 506 sentences
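The slide names the strategy but the text does not give its formulas. The sketch below is one plausible reading, offered as an assumption: a sentence scores high when its n-grams are frequent in the unlabeled pool (density, discounted exponentially by how often each n-gram is already labeled) and when many of its n-grams are still unseen in the labeled data (diversity), with the two combined by an F-measure-style mean. The names `ngrams`, `count_ngrams`, and `dwds_score`, and the parameters `decay` and `beta`, are hypothetical.

```python
import math
from collections import Counter

def ngrams(sentence, n_max=2):
    """All word n-grams of the sentence up to length n_max."""
    toks = sentence.split()
    return [tuple(toks[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(toks) - n + 1)]

def count_ngrams(sentences, n_max=2):
    """N-gram counts over a list of sentences."""
    counts = Counter()
    for s in sentences:
        counts.update(ngrams(s, n_max))
    return counts

def dwds_score(sentence, pool_counts, labeled_counts, decay=1.0, beta=1.0):
    """Assumed score: F-measure-style mean of diminishing density and diversity."""
    grams = ngrams(sentence)
    total = sum(pool_counts.values())
    if not grams or total == 0:
        return 0.0
    density = sum((pool_counts[g] / total) * math.exp(-decay * labeled_counts[g])
                  for g in grams) / len(grams)
    diversity = sum(1 for g in grams if labeled_counts[g] == 0) / len(grams)
    if density == 0.0 or diversity == 0.0:
        return 0.0
    return (1 + beta ** 2) * density * diversity / (beta ** 2 * density + diversity)
```

In a batch setting like the one listed above, the top-scoring sentences would be sent for translation and `labeled_counts` updated before the next iteration, so that already-covered n-grams contribute less each round.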

Graph: X: Performance (BLEU), Y: Data (thousand words)

Translation Selection from AMT

o Crowds beat experts

• Translator Reliability

• Translation Selection:

MT via LSTM (DNNs + Sequence)

I'd like a beer STOP

→ I'd like a beer

Attention history:

Used Deep Learning (LDSTA) model trained on Yahoo! Answers data to match questions with answer-bearing sentences

Transfer/Multi-Task Learning
o Basic Idea: Map invariant properties from similar, previously learned tasks
o Challenges: What to retain? How to modify?
o History:
  • Transformation/Derivational Analogy (1980s)
  • Case-Based Reasoning (1980s-1990s)
  • “Modern” Transfer Multi-Task (2000s)
o New focus: beyond transferring priors & features
  • Regularizers to maximize transfer
  • Structural biases

Host-pathogen interactions: The Multitask Landscape

[Diagram: hosts and pathogens related through homologous proteins due to common ancestors. Bacteria: Firmicutes (B. anthracis), Enterobacteria (Y. pestis, S. typhi). Vertebrates: H. sapiens, M. musculus. Plants: A. thaliana. Protists.]

Common Biological Pathways

The “Glucose Transport Pathway”

Multi-task Objective

For m tasks with parameters
1. Minimize empirical error (empirical loss)
2. Enforce commonality hypothesis (pathway regularizer)
3. Prevent overfitting (L2 regularizer)
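The three terms can be sketched as a single function. The slide's actual formula is not in the text, so this is only an illustration under assumptions: logistic loss stands in for the empirical loss, and a penalty on each task's deviation from the mean weight vector stands in for the pathway regularizer, whose real form is not given here. `multitask_objective`, `lam_shared`, and `lam_l2` are hypothetical names.

```python
import numpy as np

def multitask_objective(W, Xs, ys, lam_shared=1.0, lam_l2=0.1):
    """Three-term multi-task objective (illustrative stand-in).

    W  : array of shape (m, d), one weight vector per task
    Xs : list of m feature matrices, Xs[t] of shape (n_t, d)
    ys : list of m label vectors with entries in {0, 1}
    """
    # 1. Empirical loss: mean logistic loss per task, summed over tasks.
    loss = 0.0
    for w, X, y in zip(W, Xs, ys):
        z = X @ w
        loss += np.mean(np.logaddexp(0.0, z) - y * z)
    # 2. Commonality: penalize each task's deviation from the mean weights
    #    (assumed stand-in for the pathway regularizer).
    shared = np.sum((W - W.mean(axis=0)) ** 2)
    # 3. Overfitting control: ordinary L2 penalty on all weights.
    l2 = np.sum(W ** 2)
    return loss + lam_shared * shared + lam_l2 * l2
```

Setting `lam_shared` to zero decouples the tasks completely, which makes it easy to check how much the commonality term contributes on a given dataset.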

Boeing-CMU Aerospace ML/Analytics Lab
Just after Takeoff

Dreamliner Maiden Flight 15-December-2009

F/A-18 Maintenance Decision Support
Past: Reactive, Improve flight readiness

§ Computer-assisted diagnoses of F/A-18 troubles
  • Statistical learning problem
§ Computer-assisted expert finding
  • Statistical recommendation (“collaborative filtering”) problem
§ Computer-assisted resolution recommendation
  • Information retrieval problem
§ From research prototypes to operational systems
  • Software engineering problem (not done by CMU)

Information flow: Aircraft trouble reports

Classifiers

Recom. System

Search Engine

What’s Next for AI/ML?
o Reflection
  • Agent that analyzes its own failures
  • Knows what it does not know but needs to know
  • Learns trust (teachers, sources, observations)
o Curiosity
  • Never idles, but dreams “what if”
  • Runs internal experiments when inactive
o Teamwork
  • Shares knowledge proactively
  • Changes roles as needed

LTI COG (aka Agent) Architecture

What’s Next (take 2)
o Safe AI vs Wild AI
  • Safe = guarantees, constraints, transparency
  • Wild = adaptability, curiosity, exploration
o Safety
  • Universal “undo” button for autonomy
  • Explain why it is recommending decisions
  • Best for non-time-critical tasks
o AI in the wild
  • Full autonomy → no ironclad guarantees
  • Remotely instructable (“reset” button)

Some Predictions for 10+ years
o Ubiquitous autonomous vehicles
  • That drive, talk, entertain, maintain themselves, …
o Proactive intelligence gathering and analysis
  • Optimally (re)deploy intelligence assets
  • Generate hypotheses with probabilities
o AI will finally take over healthcare
  • Evidence/outcomes-based medicine
  • Fully personalized
o Ubiquitous personalized agents (“Cogs”)
  • 1 user, many agents (vs Siri: one agent, many users)
  • Proactive agents (alerts, cognitive-impairment, …)

THANK YOU!
