AAAI Presidential Address: Steps Toward Robust Artificial Intelligence
Thomas G. Dietterich

Recent advances in artificial intelligence are encouraging governments and corporations to deploy AI in high-stakes settings including driving cars autonomously, managing the power grid, trading on stock exchanges, and controlling autonomous weapons systems. Such applications require AI methods to be robust to both the known unknowns (those uncertain aspects of the world about which the computer can reason explicitly) and the unknown unknowns (those aspects of the world that are not captured by the system's models). This article discusses recent progress in AI and then describes eight ideas related to robustness that are being pursued within the AI research community. While these ideas are a start, we need to devote more attention to the challenges of dealing with the known and unknown unknowns. These issues are fascinating, because they touch on the fundamental question of how finite systems can survive and thrive in a complex and dangerous world.

Let me begin by acknowledging the recent death of Marvin Minsky. Professor Minsky was, of course, one of the four authors of the original Dartmouth Summer School proposal to develop artificial intelligence (McCarthy et al. 1955). In addition to his many contributions to the intellectual foundations of artificial intelligence, I remember him most for his iconoclastic and playful attitude to research ideas. No established idea could long withstand his critical assaults, and up to his death, he continually urged us all to be more ambitious, to think more deeply, and to keep our eyes focused on the fundamental questions.

In 1959, Minsky wrote an influential essay titled Steps Toward Artificial Intelligence (Minsky 1961), in which he summarized the state of AI research and sketched a path forward. In his honor, I have extended his title to incorporate the topic that I want to discuss today: How can we make artificial intelligence systems that are robust in the face of lack of knowledge about the world?

Minsky shared this concern.
Copyright © 2017, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602. FALL 2017.

In his book, Society of Mind (Minsky 1988), and in many interviews, he often pointed out the contrast between the robustness of the human intellect and the brittleness of existing AI systems. In an interview with John Brockman, he said

almost any error will completely paralyze a typical computer program, whereas a person whose brain has failed at some attempt will find some other way to proceed. We rarely depend upon any one method. We usually know several different ways to do something, so that if one of them fails, there's always another. (Brockman [1996] p. 156)

In this article, I wish to address this question: As a field, what are our current ideas about how to achieve robustness in AI systems? I will begin by arguing that we need methods for robust AI because of emerging applications of AI in high-stakes settings where human lives are at risk. I will then discuss the two main settings in which to consider robustness: robustness to the known unknowns (that is, robustness to aspects of the world for which we have models) and robustness to the unknown unknowns (that is, robustness to unmodeled aspects of the world). I will then describe four approaches to robustness within each of these settings.

Why We Need Robust AI

Recent advances in AI are encouraging and enabling new high-stakes AI applications. My argument is that AI technology is not yet sufficiently robust to support these applications. Let me first review some of the recent progress and then discuss these emerging applications.

The past decade has witnessed exciting advances in artificial intelligence research and applications. Figure 1 shows that the word error rate of the Google speech engine declined dramatically from 23 percent in 2013 to 8 percent in 2015 (Fernando Pereira and Matthew Firestone, personal communication).

Figure 1. Google Speech Word Error Rate. (Reproduced with permission from Fernando Pereira and Matthew Firestone, Google)

Figure 2 shows similar progress in computer vision for the task of determining whether an image contains an instance of an object class (for 1000 possible classes). The top-5 error rate dropped from 28.2 percent in 2010 to 6.7 percent in 2014 (Russakovsky et al. 2015). There have been similar advances in other computer vision tasks such as object localization, recognizing text in images (for example, signage), and image captioning.

Figure 2. ImageNet Top 5 Classification Error Rate.

Turning to language processing, the progress on natural language translation has been substantial. A standard automated metric for translation quality is the bilingual evaluation understudy (BLEU) score. A typical BLEU score in 2007 was 14.6. In 2014, scores in the range of 23.6–24.7 were attained by several groups (Sennrich 2016). Another way to assess translation accuracy is to have bilingual human judges assign a score from 0 (nonsense) to 6 (perfect translation) to a pair of sentences. Figure 3 from Google compares such scores for various language pairs and translation methods: (a) phrase-based translation, (b) neural machine translation, and (c) human translation (Wu et al. 2016).

Figure 3. Human Evaluation of Google Translate Quality. (Compares phrase-based, neural, and human translation for English -> Spanish, English -> French, English -> Chinese, Spanish -> English, French -> English, and Chinese -> English.)
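For readers unfamiliar with the metric, BLEU combines clipped n-gram precision with a brevity penalty that punishes overly short outputs. The following is a minimal, single-reference, sentence-level sketch for illustration only; the scores reported above are corpus-level and use smoothing, and all function and variable names here are my own:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of the token list, with multiplicities.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU of one candidate against one reference.

    Geometric mean of clipped n-gram precisions (n = 1..max_n),
    scaled by a brevity penalty when the candidate is shorter
    than the reference.
    """
    precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0  # candidate too short to have any n-grams
        # Clip each n-gram's count by its count in the reference.
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # real implementations smooth instead of zeroing out
    log_mean = sum(math.log(p) for p in precisions) / max_n
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(log_mean)

cand = "the cat sat on the mat".split()
print(bleu(cand, cand))                     # identical sentences score 1.0
print(bleu("the cat".split(), cand) < 1.0)  # partial match scores lower
```

Published BLEU scores such as the 14.6 and 23.6–24.7 figures above are this quantity (times 100) computed over an entire test corpus.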
We see that neural machine translation has greatly reduced the gap between AI methods and human translators, although a substantial gap in quality remains. In all of these cases, the advances are due to improvements in deep neural networks for machine learning.

The separate advances in vision, speech, and translation can be combined in exciting ways. For example, by combining speech recognition and translation, Microsoft now offers Skype Translator, which provides real-time speech-to-speech translation in Skype. By integrating computer vision (for recognizing text) with translation, Google offers live image translation, as shown in figure 4.

Figure 4. Google Real-Time Image Translation. (Courtesy, BBC.com)

In addition to progress enabled by machine learning, many improvements in AI systems have resulted from advances in reasoning methods. Figure 5 shows how the size of satisfiability problems that can be solved in a fixed amount of time improved 1000-fold from 1998 to 2010. Note in particular that these advances resulted from many different algorithm innovations. Satisfiability solvers (especially solvers for satisfiability modulo theories, SMT) are now being widely deployed for model checking in hardware and software verification. In addition, areas of AI such as automated planning and scheduling have made progress by formulating their tasks as satisfiability problems and solving them with these improved algorithms.

Figure 5. Progress on Satisfiability Algorithms. (Courtesy, Vijay Ganesh. Tractable problem sizes grew from roughly 1,000 constraints in 1998 to 1,000,000 by 2010, enabling bounded model checking, program analysis, concolic testing, equivalence checking, automatic configuration, and solver-based debuggers, type systems, concurrency bug-finding, and program synthesis.)

Game playing is another area where AI continues to make progress. Figure 6 plots the performance of computer chess and Go programs as a function of time (Schaeffer, Müller, and Kishimoto 2014). Performance on chess has continued to improve beyond the level achieved by IBM's Deep Blue when it defeated Garry Kasparov in 1997. Progress in the game of Go was very slow until the development of the UCT algorithm for Monte Carlo tree search (Kocsis and Szepesvári 2006). This ushered in a period of rapid progress, but it still did not lead beyond master-level play. In 2015–16, researchers at Google DeepMind developed AlphaGo, which combined deep neural network methods with Monte Carlo tree search. The neural networks were apparently able to solve the pattern perception problem, which was long believed to be a key to human skill in the game. By combining this with search, AlphaGo was able to defeat Lee Sedol, one of the top players in the world.

Beyond games of perfect information, such as Go and chess, AI has made steady progress in games of imperfect information, such as poker. In such games, one measure of difficulty is the number of possible information sets. In poker, an information set is the information about the cards that are face up. Figure 7 reports the number of information sets that can be correctly processed as a function of time. There has been an increase of eight orders of magnitude between 2003 and 2014, although three of these can be credited to improvements in computer hardware (Bowling et al. 2015). In 2017, a program named Libratus from Tuomas Sandholm's lab at Carnegie Mellon University convincingly defeated four of the world's top 10 players in Heads-Up No Limit Texas Hold'em.

These advances are enabling many new applications of AI. Many of these, such as personal assistants (Siri, Google Now, Alexa, and others), advertisement placement, and recommendation engines, operate in very low-stakes settings where errors are easily managed. However, several emerging applications involve potential loss of human lives or catastrophic financial and infrastructure disruption. Let us list some of these applications.

The application that has received the most attention in the media is the development of self-driving cars.
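The UCT algorithm credited above for the breakthrough in Go treats move selection at each node of the search tree as a multi-armed bandit problem: it descends to the child maximizing the UCB1 value, the average reward from past simulations plus an exploration bonus. A minimal sketch of that selection rule follows; the data layout and the exploration constant c are illustrative choices of mine, not AlphaGo's actual implementation:

```python
import math

def ucb1_select(children, total_visits, c=1.4):
    """Return the index of the child with the highest UCB1 score.

    children: list of (wins, visits) statistics for each move.
    total_visits: number of simulations through the parent node.
    Unvisited moves get infinite priority so each is tried once.
    """
    def score(stats):
        wins, visits = stats
        if visits == 0:
            return float("inf")
        exploit = wins / visits                              # mean reward
        explore = c * math.sqrt(math.log(total_visits) / visits)
        return exploit + explore

    return max(range(len(children)), key=lambda i: score(children[i]))

# Three candidate moves with (wins, visits) from previous simulations.
stats = [(6, 10), (4, 5), (0, 0)]
print(ucb1_select(stats, total_visits=15))  # prints 2: unvisited move first
```

The exploration term shrinks as a move accumulates visits, so the search gradually concentrates simulations on the most promising lines while still sampling neglected ones; this balance is what made Monte Carlo tree search effective in Go.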