AI for Complex Situations: Beyond Uniform Problem Solving
Michael Witbrock
DRSM, Learning to Reason
Cognitive Computing Research
[email protected]
© 2017 International Business Machines Corporation

Cognitive Systems
Systems that Reason, Learn and Understand

Beyond Data, Beyond Programs
[Chart: machine reasoning targets range from Uniform ("Can be programmed. Worth programming.") to Diverse ("Can be done. Worth doing."); many, many problems lie beyond the uniform, programmable region.]

Goal: Professional Level Competence

Professional Competence in Organizations
• Warn and explain at time of compliance risk
• Contextual identification of relevant co-workers during meeting planning
• Personalized medicine based on recent research
• Corporate forms that only ask for new, knowable information
• Compliant re-engineering based on supply chain
• Validity maintenance for documentation and processes
• Code analysis and synthesis based on intent

Kinds of Thought
Neural Networks:
• Humans and animals
• Reactive
• Trained from data
• Hard to explain
• Particular skills
• Ubiquitous in enterprise
• Data- and processing-driven advances
Deliberate Reasoning:
• Humans only (almost)
• Supervisory
• Trainable from data or language
• Explainable and correctable
• Portable skills
• Ripe for rapid progress

Early, symbolic, small AI systems were impressive
• AARON, the first artificial-intelligence creative artist (Harold Cohen, UCSD, 1973–2016): the Aaron system composes and physically paints novel artwork. It is a rule-based expert system using a declarative language.
http://www.viewingspace.com/genetics_culture/pages_genetics_culture/gc_w05/cohen_h.htm
• Carnegie Learning's Algebra Tutor (1999–present): this tutor encodes knowledge about algebra as production rules, infers models of students' knowledge, and provides them with personalized instruction.
http://www.carnegielearning.com
• SHRDLU (Terry Winograd, MIT): a program for understanding natural language, written in 1968–70, that carried on a simple dialog with a user about a small world of objects on a display screen.
http://hci.stanford.edu/~winograd/shrdlu/

IBM has developed landmark game-playing systems
• Playing checkers on the 701: on February 24, 1956, Arthur Samuel's checkers program, developed for play on the IBM 701, was demonstrated to the public on television. It is considered a milestone for artificial intelligence, and in the early 1960s it offered the public an example of the capabilities of an electronic computer.
• TD-Gammon: IBM researcher Gerald Tesauro (1994) developed a self-teaching backgammon program called TD-Gammon. Starting from a random initial strategy, and learning its strategy almost entirely from self-play, TD-Gammon achieved a human world-champion level of performance.
• Deep Blue: on May 11, 1997, IBM's Deep Blue (operated by co-creator Murray Campbell) beat the world chess champion Garry Kasparov after a six-game match: two wins for IBM, one for the champion, and three draws.

… and, of course, Watson for Jeopardy

Rise of Machine Learning
93.4%
"Machine learning models are machines for creating entanglement and making the isolation of improvements effectively impossible"
Machine Learning: The High-Interest Credit Card of Technical Debt (Sculley et al., via Doug Beeferman, Sift Science)
BUT:
• Image classification: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
• Audio generation: https://deepmind.com/blog/wavenet-generative-model-raw-audio/
• IBM Switchboard speech recognition: https://arxiv.org/pdf/1604.08242v2.pdf
• Parsing: https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html
• Image synthesis: http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf

NNs in Finance
• 1985: modern neural nets invented
• 1986: HNC founded
• 1990s: HNC uses neural networks for credit scoring
• 1991: neural networks in banking
• 1994: IBM Neural Network Utility released
• 2002: Fair Isaac Company buys HNC (FICO score)
• 2017: almost no major applications

Hypothesis
Easy NN learnability depends on a simple underlying causal structure overlaid with hard-to-describe variation of limited complexity.

Speech
[Diagram: a textual utterance decomposes into words and prosody; words into phonemes and intonation; phonemes into phones; phones vary in pitch, rate, and volume.]

Shallow NLP
I am not a computer: PRP VBP RB DET NN
Je ne suis pas un ordinateur: PRP RB VBP RB DET NN
• Variation from:
‒ large vocabulary
‒ subtle interaction effects
• Perhaps: similar shallow semantic combination structure (in both languages, NP and VP constituents combine into S in much the same way)

Go, Backgammon, …, '80s Video Games
• Backgammon rules (US Backgammon Federation): ~3 pages
http://usbgf.org/learn-backgammon/backgammon-rules-and-terms/rules-of-backgammon/
• How to Play Go (British Go Association): 12 pages of instructions
https://www.britgo.org/files/pubs/playgo.pdf

Object Recognition
• More or less rigid
• Slowly changing
• Recursive structure
• Surface properties
• Illumination
• Optics

Hypothesis
Easy NN learnability depends on a simple underlying structure overlaid with hard-to-describe variation of limited complexity: a theory of the world plus a statistical model of variation.

How do Humans Build Large Models of the World?
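The hypothesis above, a simple underlying structure overlaid with statistical variation of limited complexity, can be made concrete with a toy sketch. Nothing below comes from the deck itself: the data-generating rule, the feature counts, and the nearest-centroid learner are all illustrative choices.

```python
import numpy as np

# Toy illustration of the hypothesis: labels come from one simple underlying
# rule (the sign of feature 0), overlaid with statistical variation (19
# irrelevant noise features). Even a very simple learner recovers the rule.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))   # 19 of the 20 features are pure variation
y = (X[:, 0] > 0).astype(int)        # the simple underlying structure

# Nearest-centroid classifier: one mean vector per class, predict the closer one.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((X[:, None, :] - centroids) ** 2).sum(axis=-1), axis=1)
accuracy = (pred == y).mean()
print("train accuracy:", accuracy)
```

Because the class centroids differ almost entirely along feature 0, the learner effectively rediscovers the simple rule despite the overlaid variation; with a complex underlying structure, this kind of easy learnability would not be expected.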
• Explicit, symbolic, compositional knowledge (mostly learned): theories of the world
• Implicit, tacit, statistical knowledge (learned): models of variation

Statistical ML + Symbolic Reasoning
• Statistical deep learning has become the engine of machine learning
• Rich knowledge graphs and KBs have become the foundation for symbolic reasoning
• Causal inference is rapidly becoming more practical and well-founded
• Continuous, real-time, online knowledge fusion and learning
e.g. Sara Magliacane, Tom Claassen, Joris M. Mooij, "Ancestral Causal Inference", in Proceedings of Advances in Neural Information Processing Systems 29 (NIPS 2016)

Methods for Symbolic Evidence Assembly
[Diagram: a knowledge source of persistent, minimally inconsistent, general-purpose knowledge is combined with task context to produce task-relevant knowledge and data, via methods including:]
• Inductive logic programming
• Analogical mapping
• Program synthesis
• NL text synthesis
• Textual entailment
• Logical inference
• Probabilistic inference

Collaborative Cognition
Cognitive agents that collectively learn and leverage sophisticated models of users, engaging with us via adaptive multi-modal interfaces.
[Diagram: human-to-human, cog-to-human, and cog-to-cog interaction, with components including a sequential Markov decision process, Watson sentiment analysis, sensitivity analysis, rule elicitation, an influence-diagram constructor, objective identification, lighting, critical sites, a personal avatar, a consequence table, smart swaps, and a fact checker.]

Beyond Data, Beyond Programs, Beyond Narrow Tasks
[Chart: machine learning and reasoning targets range from Uniform ("Can be programmed. Worth programming.") to Diverse ("Humans want to and can do": most problems), combining an explicit, symbolic, composable theory of the world with an implicit, tacit, statistical model of variation.]

Rich, Compositional Knowledge Representations: Rethink Logic
• Logic
• Probabilistic logics
• Trainable programs
• Distributed representations
• Trained neural-network reasoners
• Logics over continuous mathematical structures
• Differentiable logics to allow reliable approximate reasoning

Rethinking Computation with Reusable, Compositional Learning
[Diagram: automated knowledge-base creation (facts, rules, entities, relations) feeds symbolic and trainable logic and explainable AI; reusable (learned) representations map language to representation and representation to language; shallow structured knowledge extraction and document understanding operate over massive unstructured and semi-structured sources; integration, testing, deployment, and experimentation tie the stack together.]

Whole-Stack Optimization
Approximations at the hardware level can tremendously improve the computational efficiency of machine-learning systems, which are inherently resilient to these approximations.
• Key insight from machine-learning / deep-learning workloads: machine-learning algorithms are inherently tolerant of approximations at every level of the stack, from the algorithm down to the hardware implementation.
• Workload characteristics: non-parametric learning (deep neural networks, kernel-based methods); limited-supervision learning (active learning, multi-modal learning); high-dimensional learning (low-rank structure, dimensionality reduction); few, expensive iterations vs. many, cheap iterations.
• Algorithms: approximate numerical optimization and stochastic optimization methods; fast numerical linear algebra via randomized algorithms (SVD, eigendecomposition, matrix multiplication, etc.).
• Programming interface: language extensions, probabilistic programming languages.
• Approximations at the hardware level can be embedded in the architecture: reduced precision for computation (8-bit vs. 64-bit); relaxed synchronization in a distributed computing model (across nodes or cores) and between threads; native devices and circuits that can add "noise" to computations and help "regularize" parameters; hardware acceleration via approximate computing (low-precision arithmetic, stochastic computing circuits).
• Devices: sub-10 nm Si-CMOS with relaxed constraints on device variability; beyond-Si-CMOS and emerging device technology (carbon-based logic, resistive RAM, PCM, etc.).
• A 10–100× speedup over commodity CPU-GPU clusters can be targeted in the foreseeable time frame.
• Significant hardware speedup for symbolic reasoning is also likely attainable.

Cognitive
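The reduced-precision idea above (8-bit vs. 64-bit computation) can be sketched with a minimal per-tensor quantization scheme. The function names and the symmetric int8 scheme are illustrative assumptions, not the deck's method; the point is only that the round-trip error stays small relative to the values.

```python
import numpy as np

# Minimal sketch of reduced-precision computation: quantize float64 values
# to 8-bit integers with a single per-tensor scale, then dequantize.
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                      # map max value to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float64) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000)                            # stand-in "weights"
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"max abs round-trip error: {err:.4f}")
```

The worst-case error is half a quantization step (scale / 2), which is exactly the kind of bounded approximation the slide argues machine-learning workloads tolerate well; the storage cost drops from 8 bytes to 1 byte per value.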