Alternating Recurrent Dialog Model with Large-Scale Pre-Trained Language Models


Qingyang Wu¹, Yichi Zhang², Yu Li¹, Zhou Yu¹
¹University of California, Davis, ²Tsinghua University
fwilwu, yooli, [email protected], [email protected]

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, pages 1292–1301, April 19 - 23, 2021. ©2021 Association for Computational Linguistics

Abstract

Existing dialog system models require extensive human annotations and are difficult to generalize to different tasks. The recent success of large pre-trained language models has suggested the effectiveness of incorporating language priors in down-stream NLP tasks. However, how much pre-trained language models can help dialog response generation is still under exploration. In this paper, we propose a simple, general, and effective framework: Alternating Recurrent Dialog Model (ARDM)¹. ARDM models each speaker separately and takes advantage of large pre-trained language models. It requires no supervision from human annotations such as belief states or dialog acts to achieve effective conversations. ARDM outperforms or is on par with the state-of-the-art methods on two popular task-oriented dialog datasets: CamRest676 and MultiWOZ. Moreover, we can generalize ARDM to more challenging, non-collaborative tasks such as persuasion. In the PersuasionForGood task, ARDM is capable of generating human-like responses to persuade people to donate to a charity.

¹ https://github.com/qywu/ARDM

1 Introduction

It has been a long-standing ambition for artificial intelligence researchers to create an intelligent conversational agent that can generate human-like responses. Recently, data-driven dialog models have become increasingly popular. However, most current state-of-the-art approaches still rely heavily on extensive human annotations such as belief states and dialog acts (Lei et al., 2018). Dialog content can also vary considerably across dialog tasks. Having a different intent or dialog act annotation scheme for each task is costly and even impossible for tasks such as open-domain social chat. Thus, it is difficult to utilize these methods on challenging dialog tasks where dialog states and acts are difficult to annotate, such as persuasion and negotiation.

Eric and Manning (2017) proposed a simple sequence-to-sequence architecture that requires no explicit annotations. The model learns to extract information from dialog history with attention and a copy mechanism. However, due to the limited language modeling capability of that model, Sequicity (Lei et al., 2018), which uses belief states as inputs for supervision, significantly outperforms Eric and Manning (2017)'s method on recent dialog datasets. With the success of large pre-trained language models such as BERT (Devlin et al., 2019) and GPT-2 (Radford et al., 2019), we investigate how large-scale pre-trained language models can help dialog tasks.

Previous sequence-to-sequence models are used to tackle documents with only one narrator. However, in dialogs, two speakers have different roles; therefore, their language model distributions are very different from each other. To address this issue, we propose ARDM, a dialog model that encodes and decodes different speaker utterances in alternating order. This structure makes the model more flexible and efficient than traditional sequence-to-sequence models in processing various dialogs. We evaluate our model on three different task-oriented dialog datasets: CamRest676, MultiWOZ, and PersuasionForGood. The first two datasets are traditional information request dialog datasets with well-defined automatic evaluation metrics on task completion. By contrast, PersuasionForGood is a new dataset that focuses on persuading people to donate to a charity. Due to the complexity of dialog content, there is no explicit dialog state defined in this task.

We observe that ARDM is capable of improving performance on task-oriented dialog tasks over the previous state-of-the-art methods without incorporating any explicit supervision from belief states or dialog acts. Also, because of ARDM's simplicity and generality, one can rapidly build a dialog prototype on different types of applications using only conversations without additional human annotations. We also found that ARDM works well on complex dialogs, such as persuasion. The model generates dialog responses that successfully persuade people to donate to a charity, suggesting the potential of ARDM being used in wide-scale real-world settings.

2 Related Work

Traditional dialog systems consist of a dialog manager to maintain dialog states and control the conversation flow. However, a dialog manager requires extensive manual annotations for training sub-modules such as the dialog state tracker or the policy decision-maker. An alternative is to model dialog without explicitly modeling belief states. Specifically, Eric and Manning (2017) proposed a sequence-to-sequence model that uses a copy mechanism to copy history information directly from raw dialog history. This method achieved state-of-the-art results on DSTC2 (Henderson et al., 2014), which is a simple restaurant booking dialog task with abundant data. However, this method did not perform well on the more complex dialog datasets CamRest676 (Wen et al., 2017) and KVRET (Eric et al., 2017). Sequicity (Lei et al., 2018) attributed the poor performance of Eric and Manning (2017)'s method to the omission of a belief tracker. They introduced the concept of a belief span, added the belief tracker back to the model, and achieved state-of-the-art performance.

Compared to Sequicity, Eric and Manning (2017)'s method provides a more general framework that reduces manual dialog state, user intent, and dialog act labeling by bypassing any symbolic annotations. Such a model can apply to datasets with no or partial annotations of belief states. In a real-world setting, if the dialog task introduces new slot values in belief states (e.g., a new type of food), Sequicity will suffer from belief span decoder errors in response generation. Thus, Eric and Manning (2017)'s method may be potentially more robust than Sequicity in this situation. Besides, if the task requires belief states for database search, we can treat belief tracking as a separate task. We can train a good belief tracker with only a small amount of annotated data, which reduces the annotation required and makes errors easier to fix. Also, since belief states are a set of important entities condensed from dialog history (i.e., often exact words from utterances), they do not introduce extra information to the model. Therefore, a dialog model with powerful representation learning should learn a form of belief state information automatically, without human annotations as the scaffold.

The recent success of BERT (Devlin et al., 2019) and GPT-2 (Radford et al., 2019) suggests the possibility of applying large pre-trained language models to dialog systems. There are some studies applying large pre-trained language models to dialog generation. TransferTransfo (Wolf et al., 2019) fine-tuned the pre-trained language model GPT (Radford et al., 2018) on the Persona-Chat dataset (Zhang et al., 2018) and obtained significant improvements on chitchat response generation, suggesting the potential of fine-tuning large pre-trained language models on other dialog response generation tasks. A more recent work (Budzianowski and Vulic, 2019) adopted the framework of TransferTransfo. They made the first attempt to leverage the large pre-trained language models GPT and GPT-2 for task-oriented dialog generation, but their approach included belief state modeling as the input and did not achieve better results than the baseline. We propose to model dialogs without any annotation, relying instead on pre-trained large-scale language models that alternate.

3 Alternating Recurrent Dialog Model

We propose Alternating Recurrent Dialog Model (ARDM) by composing two separate pre-trained language models in alternating order to learn the user and system utterance distributions through memory recurrence. Figure 1 shows an overview of ARDM.

Figure 1: Alternating Recurrent Dialog Model (ARDM) Overview. (a) shows how we feed the entire dialog to ARDM. (b) shows the recurrence mechanism we used to preserve memory.

3.1 Recurrent Modeling for User and System

We aim to model both user and system utterance distributions recurrently. Given a multi-turn dialog ($d$) between a user ($u$) and a system ($s$), we can represent $d$ as a series of utterances $\{u_1, s_1, u_2, s_2, \ldots, u_T, s_T\}$, where $T$ denotes the total number of turns. We decompose the probability distributions over the utterances in $d$ into two language models for the user and system respectively, denoted as $p_u$ and $p_s$. Then we define a dialog model $p(d)$ with the equation:

$$p(d) = \prod_{t=1}^{T} p_u(u_t \mid u_{<t}, s_{<t})\, p_s(s_t \mid u_{\le t}, s_{<t}) \tag{1}$$

$p_u$ and $p_s$ are standard language models where the task is to predict the next token given the preceding context. For an utterance $u_t$ or $s_t$ with $m$ tokens $\{w_1, \ldots, w_m\}$, the joint probability of an utterance is as follows:

$$p_u(u_t \mid u_{<t}, s_{<t}) = \prod_{i=1}^{m_{u_t}} P(w_i \mid w_{<i}, u_{<t}, s_{<t}) \tag{2}$$

$$p_s(s_t \mid u_{\le t}, s_{<t}) = \prod_{i=1}^{m_{s_t}} P(w_i \mid w_{<i}, u_{\le t}, s_{<t}) \tag{3}$$

Finally, we train the dialog model by maximizing the likelihood over Equation 1.

[…] features denoted as $Q$, $K$, $V$, where the equation is defined as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^{\top})V \tag{4}$$

For simplicity, we assume there is only one layer in the Transformer, and $h_t$ is the hidden states, which consist of $N$ vectors for the current $N$ input tokens in the utterance at time $t$. Then a recurrence relation for $h_t$ is defined by computing $Q_t$, $K_{\le t}$, $V_{\le t}$ from $h_{\le t-1}$ and the current utterance. In practice, we reuse $K_{\le t-1}$ and $V_{\le t-1}$ (i.e., history keys and values) as $M_{t-1}$ instead of $h_{t-1}$ to avoid recomputing history information. Therefore, the final $h_t$ is computed as:

$$M_{t-1} = [K_{\le t-1}, V_{\le t-1}] \tag{5}$$

$$K_{\le t}, V_{\le t} = [K_{\le t-1}; K_t], [V_{\le t-1}; V_t] \tag{6}$$

$$h_t = \mathrm{Attention}(Q_t, K_{\le t}, V_{\le t}) \tag{7}$$

One can use $h_t$ (consisting of vectors for each token) to get each token's probability to calculate the language model cross entropy loss to maximize […]
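The memory recurrence in Equations 5 to 7 can be illustrated with a minimal pure-Python sketch. It is only a toy under several assumptions that are not taken from the paper: each speaker is reduced to three random projection matrices (the hypothetical helper `make_speaker`), the names `step` and `run_dialog` are invented for illustration, both speakers share a single key/value memory, Equation 4 is read as softmax(QK^T)V with one head, and no scaling or causal mask is applied. It is not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Eq. 4 read as softmax(Q K^T) V; no scaling and no causal mask (assumption)."""
    scores = matmul(q, [list(col) for col in zip(*k)])
    return matmul([softmax(r) for r in scores], v)


def random_matrix(rows, cols, rng):
    """Toy stand-in for learned weights or token embeddings."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def make_speaker(dim, rng):
    """Hypothetical per-speaker projections standing in for p_u or p_s."""
    return {name: random_matrix(dim, dim, rng) for name in ("wq", "wk", "wv")}


def step(speaker, utterance, memory):
    """One recurrence step: extend the memory (Eq. 6), then attend (Eq. 7)."""
    keys, values = memory  # M_{t-1} = [K_{<=t-1}, V_{<=t-1}] (Eq. 5)
    q_t = matmul(utterance, speaker["wq"])
    keys = keys + matmul(utterance, speaker["wk"])      # K_{<=t}
    values = values + matmul(utterance, speaker["wv"])  # V_{<=t}
    h_t = attention(q_t, keys, values)
    return h_t, (keys, values)


def run_dialog(utterances, user, system):
    """Alternate the user and system models over the turns, sharing one memory."""
    memory = ([], [])
    hidden = []
    for index, utterance in enumerate(utterances):
        speaker = user if index % 2 == 0 else system
        h_t, memory = step(speaker, utterance, memory)
        hidden.append(h_t)
    return hidden, memory


rng = random.Random(0)
DIM = 4
user, system = make_speaker(DIM, rng), make_speaker(DIM, rng)
# Toy dialog u1, s1, u2 with 3, 2 and 4 token embeddings of width DIM.
dialog = [random_matrix(n, DIM, rng) for n in (3, 2, 4)]
hidden, (keys, values) = run_dialog(dialog, user, system)
print([len(h) for h in hidden], len(keys), len(values))  # → [3, 2, 4] 9 9
```

Each call to `step` projects only the current utterance, so the cost of a turn depends on the new tokens plus one attention pass over the stored keys and values rather than on re-encoding the whole history. In a real model the memory would hold keys and values for every layer and head, and the current utterance would be masked causally; both are left out here to keep the recurrence itself visible.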