Active Learning
AI, Machine Learning and Language Technologies: What’s Next?
Army Mad Scientists Program, March 7, 2017
Jaime Carbonell and colleagues
School of Computer Science, Carnegie Mellon University
www.cs.cmu.edu/~jgc

AI Touches Virtually All Areas of Computer Science
[Figure labels: Artificial Intelligence, Machine Learning, Systems + Theory, Lang Tech, Entertain Tech, Fine Arts, Comp Bio, Sciences, Human-Comp Interaction, Robotics, Computer Science, Humanities, Engineering]

3/6/2017 Jaime G. Carbonell, Language Technologies Institute

AI is Becoming Central to the World Economy (Davos 2016)
“The fourth Industrial Revolution” is characterized by:
  - “Ubiquitous and mobile internet”,
  - “Smaller and more powerful sensors”,
  - “Artificial Intelligence”, and
  - “Machine Learning”
-- Prof. Klaus Schwab, founder of the Davos World Economic Forum, 2016

Key Components of AI
• Automated Perception
  - Vision, sonar, lidar, haptics, …
• Robotic Action
  - Locomotion, manipulation, …
• Deep Reasoning
  - Planning, goal-oriented behavior, projection, …
• Language Technologies
  - Language, speech, dialog, social nets, …
• Machine Learning (annotated: “Today’s main focus”)
  - Adaptation, reflection, knowledge acquisition, … (annotated: “My research”)
• Big Data

How Big is Big?
Dimensions of Big Data Analytics
• LARGE-SCALE (terabytes, petabytes, exabytes): billions++ of entries; terabytes/petabytes of data
• HIGH-COMPLEXITY: trillions of potential relations among entries (graphs)
• HIGH-DIMENSIONAL: millions of attributes per entry (but typically sparse encoding)

The Big-Data “Stack”
• Analytics Algorithms
  - Machine Learning
  - Artificial Intelligence
• Big-Data Architecture
  - Hadoop/H-Table
  - Asynch/Pegasus
• Big-Data “Plumbing”
  - Cloud/Storage
  - Resource Allocator
[Figure labels around the stack: Alerts, Visualization; Sensors; Knowledge base; Historical & Normative Data]

Trends in Machine Learning
• “Deep” Learning (DNNs): vision, speech, NLP
• Reinforcement Learning: robotics
• Large-margin methods (SVM): classification
• Graphical models: strong priors, domain knowledge
How to cope with knowledge sparsity?
• (Pro)Active learning: optimizing external help
• Transfer/Multitask learning: related new domains
• Explainable AI (to engage SMEs, users)
• (Pro)Active teaching: …coming next?

Machine Learning in A Nutshell
• Training data: [formula not recovered]
  - Special case: [formula not recovered]
• Functional space: [formula not recovered]
• Fitness Criterion: [formula not recovered]
  - a.k.a. loss function
• Active Learning Sampling Strategy: [formula not recovered]

Why is Active Learning Important?
• Labeled data volumes << unlabeled data volumes
  - 1.2% of all proteins have known structures
  - < .01% of all galaxies in the Sloan Sky Survey have consensus type labels
  - < .0001% of all web pages have topic labels
  - << E-10% of all internet sessions are labeled as to fraudulence (malware, etc.)
  - < .0001% of all financial transactions are investigated w.r.t. fraudulence
• If labeling is costly, or limited, select the instances with maximal impact for learning
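
One common way to approximate "instances with maximal impact" is uncertainty sampling, which the deck refers to later as US. The following is a minimal sketch, not the presenter's implementation: it assumes a classifier has already produced class probabilities for each unlabeled instance, and it ranks instances by predictive entropy. The function names and the sample probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs: List[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` unlabeled instances with the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical predicted class probabilities for four unlabeled instances.
pool = [
    [0.98, 0.02],  # model is confident
    [0.55, 0.45],  # close to the decision boundary
    [0.80, 0.20],
    [0.50, 0.50],  # maximally uncertain
]
chosen = select_queries(pool, budget=2)
print(chosen)
```

With a labeling budget of two, the sketch picks the two instances nearest the decision boundary and leaves the confidently classified ones unlabeled.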
Strategy Selection: A Surprise
There is No Universal Optimum
• Optimal operating range for AL sampling strategies differs
• How to get the best of both worlds?
• (Hint: ensemble methods)

How does DUAL do better?
• Runs DWUS until it estimates a cross-over
• Monitors the change in expected error at each iteration to detect when it is stuck in local minima
• DUAL uses a mixture model after the cross-over (saturation) point
• Our goal should be to minimize the expected future error
  - If we knew the future error of Uncertainty Sampling (US) to be zero, then we’d force [formula not recovered]
  - But in practice, we do not know it

[Figure: results when cost varies non-uniformly; statistically significant (p<0.01)]

Active Learning is Awesome, but … is it Enough?
[Figure: Traditional Active Learning (single perfect source, fixed labeling cost) versus going beyond it with Proactive Learning (multiple sources, varying-cost model). Further labels: Differing Expertise, Answer Reluctance, Task Difficulty, Expertise Level, Labeling Noise, Ambiguity, Fixed over time, Time-varying. Venue tags on the figure: CIKM ‘08, SDM_sub ‘10, JMLR ‘09, KDD ‘09]

Active vs Proactive Learning

                  | Active Learning                | Proactive Learning
Number of Oracles | Individual (only one)          | Multiple, with different capabilities, costs and areas of expertise
Reliability       | Infallible (100% right)        | Variable across oracles and queries, depending on difficulty, expertise, …
Reluctance        | Indefatigable (always answers) | Variable across oracles and queries, depending on workload, certainty, …
Cost per query    | Invariant (free or constant)   | Variable across oracles and queries, depending on workload, difficulty, …

Note: “Oracle” ∈ {expert, experiment, computation, …}

Does Tracking Predictor Accuracy Actually Help in Proactive Learning? (SDM ‘10)
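
The contrast in the Active vs Proactive Learning table can be made concrete with a small sketch. This is an illustration under stated assumptions, not the method from the cited papers: each oracle is assumed to have a known cost, a known probability of answering, and a known probability of being correct, and each instance is assumed to have a known information value. The `Oracle` class, `expected_utility`, `best_query`, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Oracle:
    name: str
    cost: float       # cost charged per query
    p_answer: float   # probability the oracle answers at all (reluctance)
    p_correct: float  # probability an answer is correct (reliability)


def expected_utility(value: float, oracle: Oracle) -> float:
    """Expected benefit of a correct answer minus the query cost.

    Assumes the cost is paid whether or not the oracle answers.
    """
    return oracle.p_answer * oracle.p_correct * value - oracle.cost


def best_query(values: List[float], oracles: List[Oracle]) -> Tuple[int, str, float]:
    """Pick the (instance, oracle) pair with the highest expected utility."""
    candidates = [
        (expected_utility(v, o), i, o.name)
        for i, v in enumerate(values)
        for o in oracles
    ]
    utility, idx, name = max(candidates)
    return idx, name, utility


# Hypothetical information values for two unlabeled instances.
values = [1.0, 4.0]
oracles = [
    Oracle("expert", cost=2.0, p_answer=0.9, p_correct=1.0),
    Oracle("crowd", cost=0.5, p_answer=0.6, p_correct=0.7),
]
choice = best_query(values, oracles)
print(choice)
```

In this toy setting the expensive but reliable oracle is worth paying only for the high-value instance; for the low-value instance every query has negative expected utility. With a single free, infallible oracle the same rule reduces to ordinary active learning, since only the instance value matters.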
Active Learning for MT
[Figure labels: Expert Translator; Parallel corpus (S,T); Trainer; Model; MT System; Source Language Corpus (S); Sampled corpus; Active Learner]

Active Crowd Translation
[Figure labels: crowd translations (S,T)1, (S,T)2, …, (S,T)n; Translation Selection; Trainer; Model; MT System; Sentence Selection; Source Language Corpus (S); ACT Framework]

Active Learning Strategy: Diminishing Density Weighted Diversity Sampling
Experiments:
  Language Pair: Spanish-English
  Iterations: 20
  Batch Size: 1000 sentences each
  Translation: Moses Phrase SMT
  Development Set: 343 sentences
  Test Set: 506 sentences
Graph: X: Performance (BLEU), Y: Data (Thousand words)

Translation Selection from AMT
• Crowds beat experts
• Translator Reliability
• Translation Selection: [formula not recovered]

MT via LSTM (DNNs + Sequence)
[Figure: example with input “I'd like a beer STOP” → output “I'd like a beer”, with attention history]

Used Deep Learning (LDSTA) model trained on Yahoo! Answers data to match questions with answer-bearing sentences

Transfer/Multi-Task Learning
• Basic Idea: Map invariant properties from similar, previously learned tasks
• Challenges: What to retain? How to modify?
• History:
  - Transformation/Derivational Analogy (1980s)
  - Case-Based Reasoning (1980s-1990s)
  - “Modern” Transfer Multi-Task (2000s)
• New focus: beyond transferring priors & features
  - Regularizers to maximize transfer
  - Structural biases

Host-pathogen interactions: The Multitask Landscape
[Figure: homologous proteins due to common ancestors. Labels: Bacteria, with Firmicutes (B. anthracis) and Enterobacteria (Y. pestis, S. typhi); Vertebrates (H. sapiens, M. musculus); Protists; Plants (A. thaliana)]

Common Biological Pathways
[Figure: The “Glucose Transport Pathway”]

Multi-task Objective
For m tasks with parameters [formula not recovered]:
1. Minimize empirical error
2. Enforce commonality hypothesis
3. Prevent overfitting
Terms of the objective: empirical loss, pathway regularizer, L2 regularizer

Boeing-CMU Aerospace ML/Analytics Lab
[Photo: Dreamliner maiden flight, just after takeoff, 15-December-2009]

F/A-18 Maintenance Decision Support
Past: Reactive, Improve flight readiness
• Computer-assisted diagnoses of F/A-18 troubles
  - Statistical learning problem
• Computer-assisted expert finding
  - Statistical recommendation (“collaborative filtering”) problem
• Computer-assisted resolution recommendation
  - Information retrieval problem
• From research prototypes to operational systems
  - Software engineering problem (not done by CMU)

[Figure: information flow. Labels: Aircraft trouble reports; Classifiers; Recom. System; Search Engine]

What’s Next for AI/ML?
• Reflection
  - Agent that analyzes its own failures
  - Knows what it does not know but needs to know
  - Learns trust (teachers, sources, observations)
• Curiosity
  - Never idles, but dreams “what if”
  - Runs internal experiments when inactive
• Teamwork
  - Shares knowledge proactively
  - Changes roles as needed
LTI COG (aka Agent) Architecture
[Figure: architecture diagram]

What’s Next (take 2)
• Safe AI vs Wild AI
  - Safe = guarantees, constraints, transparency
  - Wild = adaptability, curiosity, exploration
• Safety
  - Universal “undo” button for autonomy
  - Explain why it is recommending decisions
  - Best for non-time critical tasks
• AI in the wild
  - Full autonomy no ironclad