<<

Escaping the Local Minimum

Kenneth Friedman

May 9, 2016

The field of artificial intelligence performs gradient descent. The field discovers a path of success, and then travels that path until progress stops (when a local minimum is reached). Then the field resets and chooses a new path, repeating the process. If this trend continues, AI should soon reach another local minimum, causing the next AI winter. However, recent methods provide an opportunity to escape the local minimum. To continue the recent success, it is necessary to compare current progress to all prior progress in AI.

Contents

Introduction
Gradient Descent
1960s
The First AI Winter
1980s
The Second AI Winter
Modern Day
The Next AI Winter
Deep Learning Checklist
Logic Systems
Minsky's Multiplicity
Story Understanding
Contributions

Overview

I begin this paper by pointing out a concerning pattern in the field of AI and describing how it can be useful to model the field's behavior. The paper is then divided into two main sections. In the first section, I argue that the field of artificial intelligence has itself been performing gradient descent. I catalog a repeating trend in the field: a string of successes, followed by a sudden crash, followed by a change in direction. In the second section, I describe steps that should be taken to prevent the current trend from ending in a local minimum. I present a number of tasks from the past that deep learning techniques are currently unable to accomplish. Finally, I summarize my findings and conclude by reiterating the usefulness of the gradient descent model.

Introduction

Imagine training a pattern matching system (say, a neural network) on the following series of data: 10 data points of the value "good", followed by 5 data points of "bad", another 10 data points of "good", and another 5 data points of "bad". Next, you test the system with 10 data points of "good". What will the algorithm predict the next data points to be? Obviously, any sufficient pattern matching algorithm would predict an incoming set of "bad" points.

Unfortunately, this pattern is the trend of the first 60 years of the AI field.[1] When the field began, there was a period of success followed by a period of collapse. Then another period of success emerged, but failure soon followed. Currently, we are in a period of great progress. So should we be predicting failure any day now? Is this pattern doomed to continue, or can we understand why it happens and take steps to prevent it?

I believe we can model the field of AI using a technique from within the field: gradient descent. By modeling the field's behavior with this well understood technique, we can pinpoint the problem with the technique and take steps to ameliorate its downside.

[1] Of course, this trend is a simplification. A field of academia is not binary in its progress. However, it is a useful simplification for comparing periods of significant improvement to periods of stagnation.
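To make the opening thought experiment concrete, the sketch below applies a trivially simple predictor to the good/bad sequence described above. The fixed period length and the look-one-period-back rule are illustrative assumptions, not part of the original discussion.

```python
# Toy illustration of the opening thought experiment: a naive
# period-detecting "pattern matcher" applied to the good/bad sequence.
# The fixed cycle of 10 "good" then 5 "bad" is assumed for illustration.

def build_sequence():
    return (["good"] * 10 + ["bad"] * 5) * 2  # two full cycles of training data

def predict_next(history, period=15):
    """Predict the next value by looking one full period back."""
    return history[len(history) - period]

training = build_sequence()
observed = training + ["good"] * 10          # test: ten more "good" points arrive
prediction = predict_next(observed)
print(prediction)                            # -> "bad": the pattern implies a downturn
```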

Gradient Descent

Gradient descent is an optimization algorithm that attempts to find the minimum of a function. In gradient descent, a series of steps is taken in the direction of the steepest negative slope. These steps repeat along that path until the direction of the slope changes, which indicates a local minimum. The problem with gradient descent is clear: a local minimum is not necessarily the global minimum, since a local minimum is based only on a small neighborhood of the function. In fact, a given local minimum might be worse than a previously discovered minimum (see Figure 1).

Figure 1: The fallacy of gradient descent can easily be seen here. Starting at (0, 0), gradient descent will end at the local minimum near (0, -2); however, a far smaller value could be reached by following a path to (-3, 1). (Plot of z = -y^3(y - 2)(y + 2) + x(x + 1)(x + 3) - x^2 y, created with WolframAlpha.)

The field of artificial intelligence can be modeled with this technique. In this case, the function is loosely understood as the distance between "computer intelligence" and "human intelligence", over time. That is, as the function approaches zero, computer intelligence approaches human-level intelligence. Conceptually, a decreasing slope signifies a progressing state of the art in AI, while an increasing slope signifies a lack of progress.

Here, the meaning of a local minimum is that the field has chosen a particular path that appeared successful when compared to the recent past. However, the path ran out of areas to improve. Moreover, the path could be worse than a previous minimum: an improvement over the recent past, but a long-term dead end.

In the remainder of this section, I catalog the history of AI falling into local minimum traps.
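To ground the analogy, here is a minimal gradient descent sketch in Python. The quadratic objective, step size, and stopping rule are illustrative assumptions rather than anything specified in this paper.

```python
# Minimal gradient descent sketch: repeatedly step against the slope
# until progress stalls, at which point a (possibly local) minimum is reached.

def gradient_descent(grad, x0, learning_rate=0.1, tolerance=1e-6, max_steps=10_000):
    x = x0
    for _ in range(max_steps):
        step = learning_rate * grad(x)
        x -= step
        if abs(step) < tolerance:   # slope has flattened out: a minimum (maybe only local)
            break
    return x

# Example objective f(x) = (x - 3)^2 with gradient f'(x) = 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # converges near x = 3, the minimum reachable from this start
```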

1960s

Early progress in artificial intelligence was swift and far-reaching. In the years after the Dartmouth Conferences,[2] there was a series of successes that led to an extremely optimistic timeline. These successes came largely from attacking a number of challenges using search heuristics. One of the earliest examples of a search based system was SAINT (Symbolic Automatic Integrator),[3] a program that could solve integration problems at the level of a college freshman.

SAINT was considered the most advanced problem solving system of its time. In the 108-page paper's concluding chapter, Slagle explains insights that he discovered in both natural and artificial intelligence. While Slagle concedes that the program has a limited domain compared to the human mind, he explains that his integration system could solve problems more advanced than any prior system could handle.

Another early success was Newell and Simon's General Problem Solver (GPS).[4] This program was able to solve elementary symbolic logic and elementary algebra problems. The most impressive demonstration of GPS was its ability to solve the Tower of Hanoi problem (see Figure 2). Instead of attempting problems that were advanced (such as integral calculus), GPS sought to be as general as possible. The report on GPS discusses the system's focus on two techniques: means-end analysis (MEA) and planning (discussed later). MEA was the general term Newell et al. used to describe the search heuristics in the program. Specifically, the MEA algorithms in GPS attempted to minimize the difference between the system's current state and its goal state.

Figure 2: The goal of the Tower of Hanoi is to move the disks from one peg to another peg, maintaining the smallest-to-largest order of the original. (Tower of Hanoi, Wikipedia)

Other successes include Thomas Evans' ANALOGY program and Daniel Bobrow's STUDENT program. ANALOGY[5] could solve geometric, IQ-test style analogy problems such as "shape A is to shape B as shape C is to which of the following shapes," as seen in Figure 3.

[2] The Dartmouth Conferences took place at Dartmouth College in 1956. The complete name of the event was the Dartmouth Summer Research Project on Artificial Intelligence. It spanned 2 months and is widely regarded as the founding conference of the field. McCarthy et al. [2006]
[3] Slagle, A Heuristic Program that Solves Symbolic Integration Problems in Freshman Calculus, Symbolic Automatic Integrator (SAINT), 1961.
[4] Newell, Shaw, and Simon, Report on a General Problem-Solving Program, 1959.
[5] Evans, A Heuristic Program to Solve Geometric Analogy Problems, 1962.
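The following sketch illustrates the flavor of means-end analysis as described above for GPS: repeatedly pick whichever available operator most reduces the difference between the current state and the goal. The numeric state, the operator set, and the difference measure are invented for illustration and are not taken from GPS itself.

```python
# Toy means-end analysis: repeatedly apply the operator that most reduces
# the difference between the current state and the goal state.
# The integer state and operators below are illustrative only.

def difference(state, goal):
    return abs(goal - state)

def means_end_analysis(start, goal, operators, max_steps=100):
    state = start
    for _ in range(max_steps):
        if difference(state, goal) == 0:
            return state
        # Choose the operator whose result is closest to the goal.
        state = min((op(state) for op in operators),
                    key=lambda nxt: difference(nxt, goal))
    return state

operators = [lambda s: s + 1, lambda s: s - 1, lambda s: s * 2]
print(means_end_analysis(start=3, goal=14, operators=operators))  # reaches 14
```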

Figure 3: The pictures shown here relate to the type of geometric analogy question that ANALOGY was able to solve. The question could be asked: shape A is to shape B as shape C is to which of the following shapes? ANALOGY was able to determine the answer: shape 2.

Meanwhile, STUDENT[6] was able to solve word-algebra problems at a high school level. These early successes, many of which were at MIT, along with a large ARPA grant, led Marvin Minsky and John McCarthy to found the MIT AI Lab in the late 1960s.

All of these examples (and more) showed the field of artificial intelligence to be progressing well. This progress led to widespread optimism. In 1967, Minsky reportedly said "within a generation ... the problem of creating 'artificial intelligence' will substantially be solved."[7]

It might seem likely that Minsky quickly loosened his prediction. In fact, the opposite occurred. In 1970, three years and many small successes later, Minsky quickened and expanded his prediction, saying:

[6] Bobrow, Natural Language Input for a Computer Problem Solving System, PhD thesis, MIT, 1964.
[7] Crevier, AI: The Tumultuous History of the Search for Artificial Intelligence, 1993.

In from three to eight years we will have a machine with the general intelligence of an average human being. I mean a machine that will be able to read Shakespeare, grease a car, play office politics, tell a joke, have a fight. At that point the machine will begin to educate itself with fantastic speed. In a few months it will be at genius level and a few months after that its powers will be incalculable.[8]

The 1960s marked a period of highly successful gradient descent. Of course, gradient descent inevitably leads to a local minimum. In this first golden age of AI, that local minimum hit the field in the mid 1970s and has since been known as the First AI Winter.

[8] Darrach, Meet Shaky, the First Electronic Person, 1970.

The First AI Winter

Both progress and funding slowed in the mid 1970s. Here, I provide a few examples of the decreased level of success and the reduction in funding that led this period to be known as the AI Winter. I argue that while there were a number of contributing factors to this winter, lack of diversity in research paths was a leading reason. Simply put, the field hit a local minimum: no further progress could be made without changing course.

The Lighthill report[9] was the first clear indicator of the AI Winter. The report, released in 1974, conveyed a largely negative outlook on the first quarter century of the field. Lighthill states, "Most workers in AI research and in related fields confess to a pronounced feeling of disappointment in what has been achieved in the past 25 years." He goes on to chronicle examples of, and reasons for, this disappointment. A fascinating recording of a debate between Lighthill and John McCarthy concisely summarizes many of these reasons.[10]

The overarching theme of Lighthill's pessimism is the failure of researchers to account for combinatorial explosion. That is, problems that can be solved within a narrow domain do not scale up to real-world problems, due to the branching factor of the search trees (see Figure 4).

Figure 4: This figure shows the branching factor of tic-tac-toe. It is clear that even though tic-tac-toe is a very simple game, the exponential growth causes massive increases over time. This game has a maximum branching factor of 8. The game Go has a branching factor of 250. The branching factor for real-world decisions can be in the thousands. Clearly, simple search quickly breaks down. (Game Playing, Charles Dyer)

Another report, written by the Automatic Language Processing Advisory Committee,[11] is also known to have caused a dramatic decrease in funding. Early in the report, the recommendations suggest that the National Science Foundation fund four to five centers at $650,000 each. However, readers of the report focused on its conclusion that human language translation was far more cost effective than machine translation, which halted AI funding for machine translation.

Fortunately, the winter did not last for an extended period. By the early 1980s, the field had regrouped and sought a new path towards progress.

[9] Lighthill, Artificial Intelligence: A General Survey, 1974.
[10] Lighthill and McCarthy, The Lighthill Debate, 1973.
[11] Language and Machines: Computers in Translation and Linguistics (the ALPAC report).
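To make the combinatorial explosion discussed above concrete, here is a minimal sketch that computes how a search tree grows with its branching factor. The depth of five moves is an illustrative assumption; the branching factors are the ones quoted in Figure 4.

```python
# Illustration of combinatorial explosion: the number of leaves in a search
# tree grows as branching_factor ** depth. The depth is chosen for illustration.

def tree_size(branching_factor, depth):
    return branching_factor ** depth

for name, b in [("tic-tac-toe", 8), ("Go", 250), ("real-world decision", 1000)]:
    print(f"{name}: about {tree_size(b, 5):,} positions after only 5 moves")
```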

1980s

A focus on expert systems and knowledge bases sparked a resurgence of AI in the early 1980s. These systems were commercially viable, a first for AI systems, which led to a large influx of AI funding from private corporations.

Programs known as expert systems used specific domain knowledge to answer questions and solve problems. Two of the most well known expert systems are XCON and Cyc.

XCON stands for eXpert CONfigurer, but the system was commonly known as R1. R1 was built at Carnegie Mellon and used by Digital Equipment Corporation. This AI reportedly saved the company $40 million annually. While the system saved millions, R1 was far from optimal, and its limitations became increasingly obvious. A 1984 report in AI Magazine concluded with a poignant observation: "Even revolutionary products and technologies do not solve every problem."[12]

In the same year, Douglas Lenat created Cyc. The goal of Cyc was to achieve human-like reasoning by maintaining an enormous knowledge base. Lenat's plan involved decades of development, and the project is still ongoing.

Other successes from this time period include Deep Thought[13] (a chess-playing system that won the 1988 North American Computer Chess Championship) and Soar[14] (a cognitive architecture with the goal of performing the "full range of cognitive tasks").

Once again, the successes of the time did not last. Researchers and institutions soon discovered the fragility of early knowledge based systems. The field ran out of progress on the expert systems path; it reached a local minimum.

[12] Polit, R1 and Beyond: AI Technology Transfer at Digital Equipment Corporation, AI Magazine, 1984.
[13] Berliner, Deep Thought Wins Fredkin Intermediate Prize, AI Magazine, 1989.
[14] Laird, Newell, and Rosenbloom, Soar: An Architecture for General Intelligence, Artificial Intelligence, 1987.

The Second AI Winter

Figure 5: This article, in the March 4, 1988 edition of The New York Times, explains a number of commercial failures of expert systems. The table shows substantial losses and layoffs at Teknowledge, Intellicorp, Carnegie Group, and Inference, all of which are expert-system based companies. The PDF version is behind a paywall here: http://timesmachine.nytimes.com/timesmachine/1988/03/04/issue.html. It is available in text form publicly here: http://www.nytimes.com/1988/03/04/business/setbacks-for-artificial-intelligence.html?pagewanted=all

The second AI winter felt very similar to the first. Overoptimistic views, a reduction in investment, and a lack of new directions all led to a period (from approximately 1987 to 1993) known as the second AI winter.

Overoptimism was the first major reason for this winter. One clear example can be seen in the Fifth Generation Computer Systems (FGCS) project. FGCS[15] was a 1982 initiative developed by Japan's Ministry of International Trade and Industry. The goals of this initiative were far reaching. It sought to produce a supercomputer that could translate languages, interpret pictures, and reason like human beings. Ten years and $400 million later, the project had failed to meet its goals and was canceled.

Both public and private institutions reduced investment after a series of failures. As seen in Figure 5, three different groups of AI focused companies all had dramatic setbacks. First, AI development tool companies suffered many quarters of losses, which resulted in laying off 10% to 20% of their workforces. Second, Lisp machine manufacturers, such as Symbolics[16] and Xerox, pivoted their goals and reorganized their companies to reduce the focus on Lisp machines. Finally, AI application companies suffered declines in sales and announced layoffs.

This lack of progress, lack of funding, and commercial failure can easily be explained by comparing the overly optimistic goals of these systems to the implementation details. After domain specific tasks had been mastered, many companies attempted to use expert systems and large knowledge bases to create flexible, general AI systems. However, the fragility and domain specific nature of these methods was not obvious. When more ambitious goals were attempted, they failed again and again.

Eventually, the field exited the winter and reset. The period of time after this winter can be characterized as the modern day boom of artificial intelligence.

[15] Pollack, 'Fifth Generation' Became Japan's Lost Generation, New York Times, 1992.
[16] Abell, March 15, 1985: Dot-Com Revolution Starts with a Whimper, Wired.

Modern Day

There is constant news of AI successes in modern times. In 1997, Deep Blue[17] (a chess AI system) defeated the reigning world chess champion. In 2005, a Stanford robot drove autonomously for 131 miles, winning the DARPA Grand Challenge.[18] In 2011, IBM's Watson[19] system competed on the TV game show Jeopardy! and defeated both human opponents.

Consumer electronics companies have recently popularized voice based AI assistants. Apple released Siri on iOS in 2011. Google released Google Now on Android in 2012. And Amazon released the Echo, a voice assistant hardware product which operates hands free, in 2015.

[17] Campbell, Hoane, and Hsu, Deep Blue, Artificial Intelligence, 2002.
[18] The DARPA Grand Challenge 2005 was a competition hosted by DARPA in order to promote the development of fully autonomous ground vehicles.
[19] AI Magazine, The AI Behind Watson: The Technical Article, 2010.

The most recently publicized breakthrough was Google DeepMind's AlphaGo.[20] AlphaGo is a system that plays the game Go, and it defeated one of the highest ranking human Go players in 2016. AlphaGo mastered Go 5 to 10 years before many people predicted an AI would be capable of such a high level of play.

This string of successes is incredible. However, if the historical pattern continues, an AI winter should be imminent.

[20] Silver et al., Mastering the Game of Go with Deep Neural Networks and Tree Search, Nature, 2016.

The Next AI Winter

Is the next AI winter rapidly approaching? It seems improbable given all of the recent successes. However, it also seemed improbable in 1968 and in 1985, and the winters came nevertheless. There are many optimistic predictions about the future of the field, as there have always been. Predictions for 2025 include:[21]

• widespread use of self-driving cars

• food will be raised, made, and delivered robotically (Stowe Boyd)

• "AI and robotics will be close to 100% in many areas" (Marc Prensky)

[21] Smith and Anderson, AI, Robotics, and the Future of Jobs, Pew Research Center, 2014.

It’s possible to imagine the headlines of the next AI winter. Pre- tend the year is 2019, and the following stories appear on a news site:

• After self-driving cars fail to work in many states, tech companies cancel self driving car plans

• Humans still prove to be much more efficient than robots

• Virtual assistants fall short on understanding complex demands

This would not be a shock, although it would be a sign that the field was once again in an AI winter. However, the next winter is not inevitable. The field can prevent itself from falling into the local minimum trap by ensuring it can do everything that has been possible in the past two AI booms.

In this section, I have shown that gradient descent and local minima form a realistic model of the previous 50 years of artificial intelligence. I have cataloged the successes of the 1960s and 1980s, as well as the failures of the two AI winters. I have provided examples of recent success, and suggested that a future winter is possible but not inevitable. In the next section, I suggest steps that should be taken to prevent the field from falling into another local minimum.

Deep Learning Checklist

This modern day success is largely the result of advances in deep learning. Hinton[22] astonished the world with deep neural networks in 2012. Since then, neural networks (NNs) have taken over as the dominant technique for state-of-the-art AI systems. Unfortunately, there are still many domains in which neural networks have great difficulty.

Deep learning's incredible advantage is its superiority in perception: NNs are the ultimate pattern matching device. In many cases, NNs can already find patterns in data at a superhuman level. In the previous section I explained that the AI field can become trapped in a local minimum if it focuses too heavily on short term progress by only exploring the strengths of a given technique.

So, if the AI field is doomed to repeat its mistakes, then the research focus is clear. Deep learning researchers should keep working to make object detection in images nearly perfect. Hinton should work on going from a 15% error rate on ImageNet data to a 0.5% error rate. NNs should be developed to find patterns in the stock market to get rich, and to optimize engagement on social networking websites.

Of course, the field would then be trapped in a local minimum. Fortunately, the statistical, neural network approach[23] can avoid this pitfall. Because NNs are biologically inspired systems, they are architecturally more similar to human thinking than any previous technique. Combining this human-like architecture with the understanding that NNs are Turing complete, it is possible that NNs can be used as a substrate for all AI systems. That is, all future systems can be built on top of a statistical network base layer.

To prove that state-of-the-art deep learning techniques are in fact superior to all prior systems, and to show that NNs can be used as a substrate for future programs, it is necessary to apply NNs to previously developed or proposed tasks.

In the remainder of this section, I describe three different categories of AI research that currently do not implement a statistical or NN approach. For each example, I first explain the work that has previously been done on the topic. Then, I describe the value that is to be gained from a successful implementation. Finally, I inventory research that appears to have taken steps, or at least shows promise in taking steps, towards the given topic.

The three categories I describe are: logic systems, Minsky's multiplicity, and story understanding. Each is presented in depth below.

[22] Krizhevsky, Sutskever, and Hinton, ImageNet Classification with Deep Convolutional Neural Networks, 2012.
[23] In this section, my use of the term "statistical" is in a neural network sense. Also, my use of the term "neural networks" encompasses all subcategories and tangential architectures, such as convolutional neural networks, recurrent neural networks, deep belief networks, and the like.

Logic Systems

The first category to investigate is generic symbol manipulation. Symbolic systems are based on high level representations of information with explicit rules. An in-depth look at these early programs, such as SAINT, appears in the 1960s section above. Figure 6 shows a list gathered by Marvin Minsky of these early successes.

Figure 6: A screenshot of a slide from Marvin Minsky's talk Artificial Intelligence and the Future of the Human Mind. The slide shows a list of AI successes from the 1960s and 1970s. Nearly all of the results on the list are rule-based logic systems. A recording of this 2007 talk can be seen here: https://www.youtube.com/watch?v=g1DcIJgInF8.

Fifty years after Slagle's SAINT program, state-of-the-art calculus systems essentially use the same methodology. The program understands explicit rules[24] that were explicitly programmed by the developer. The system then searches for rules it can apply to the current state in an effort to simplify the integral. Because the knowledge representation is at a symbolic level, and the number of rules is limited, there is no need for an integration system to learn rules over time.

However, it would be useful to have a neural network that is able to perform symbolic algebra. There are two clear reasons for this desire. First, this hypothetical system would demonstrate that neural networks can be used as a substrate for previously-achieved AI systems. Second, a neural network that could perform symbolic algebra would, by definition, be able to manipulate symbols. This would show that high level knowledge representations can be grounded[25] in statistical models.

[24] For example, the power rule, which states that ∫ x^n dx = x^(n+1)/(n+1) + C when n ≠ -1.
[25] Harnad, The Symbol Grounding Problem, 1990.
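As a concrete illustration of the rule-plus-search methodology described above, here is a minimal sketch of a rule-based integrator. The tiny rule set and the term representation are invented for illustration and are far simpler than SAINT.

```python
# Minimal sketch of a rule-based symbolic integrator in the spirit of the
# explicit-rule systems described above. Terms are (coefficient, exponent)
# pairs representing c * x**n; only the power rule is implemented.

from fractions import Fraction

def integrate_term(coefficient, exponent):
    """Apply the power rule: ∫ c*x^n dx = c*x^(n+1)/(n+1), for n != -1."""
    if exponent == -1:
        raise ValueError("power rule does not apply when n = -1")
    new_exponent = exponent + 1
    return Fraction(coefficient, new_exponent), new_exponent

def integrate_polynomial(terms):
    """Integrate a sum of terms by applying the rule to each one."""
    return [integrate_term(c, n) for c, n in terms]

# ∫ (3x^2 + 4x) dx = x^3 + 2x^2 + C
print(integrate_polynomial([(3, 2), (4, 1)]))  # [(Fraction(1, 1), 3), (Fraction(2, 1), 2)]
```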

Two research findings show promise toward this goal. First, a paper titled "Neural-Symbolic Learning and Reasoning: Contributions and Challenges"[26] outlines the goals and directions of this space. This paper describes deep learning systems that have been developed to learn first-order logic rules within a NN. This is done by encoding symbols and logic terms as vectors of real numbers.

Second, Alex Graves' Neural Turing Machine[27] is a neural network system that can use external memory to hold additional information during computation. Early results have shown that the system can learn simple algorithms such as copying, sorting, and associative recall. These operations can be used as building blocks for future symbol manipulation, which could enable NN systems to become extremely general symbolic manipulators.

These systems show progress, but they are still quite far from the symbolic manipulators of the 1960s. A heavy research focus should be placed on demonstrating that deep learning can succeed in this area.

[26] Garcez et al., Neural-Symbolic Learning and Reasoning: Contributions and Challenges, 2015.
[27] Graves, Wayne, and Danihelka, Neural Turing Machines, 2014.
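The following sketch illustrates the general idea mentioned above of encoding symbols and logic terms as vectors of real numbers. The random embeddings and the averaging scheme for composing a term are illustrative assumptions, not the method of any particular paper.

```python
# Toy illustration of representing symbols and logic terms as real-valued
# vectors. Each symbol gets a random embedding; a compound term is encoded
# here by averaging its parts (an illustrative choice, not a published method).

import numpy as np

rng = np.random.default_rng(0)
DIM = 8
symbols = ["parent", "alice", "bob"]
embedding = {name: rng.normal(size=DIM) for name in symbols}

def encode_term(functor, *args):
    """Encode a term like parent(alice, bob) as the mean of its symbol vectors."""
    vectors = [embedding[functor]] + [embedding[a] for a in args]
    return np.mean(vectors, axis=0)

term_vector = encode_term("parent", "alice", "bob")
print(term_vector.shape)  # (8,) -- a fixed-size numeric representation of a symbolic term
```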

Minsky’s Multiplicity

Marvin Minsky explored a broad range of topics over the course of his career. On two distinct occasions, Minsky provided interesting categorization systems. The first is a list of five areas of heuristic problem solving. The second is a list of six layers of thinking. I will refer to these two classifications as Minsky's Multiplicity,[28] though I will address each individually.

The first category to discuss is the list of areas of heuristic problem solving. Minsky first explains these five areas in his groundbreaking 1960 paper, "Steps Toward Artificial Intelligence".[29] In this paper, he explains that search, pattern recognition, learning, planning, and induction are five distinct but crucial areas of problem solving.

The strongest advantage of neural networks is in the area of pattern recognition. In the pattern recognition section of Minsky's 1960 paper, he suggests that Bayesian nets could be a useful tool in mastering pattern recognition. Today, it is clear that neural networks are dominant in pattern recognition and learning. NNs are able to find otherwise imperceptible patterns in data. Also, NNs do not require hard coded rules; the programs are able to learn from a vast amount of training data through backpropagation. While it is obvious that neural networks will continue to become better at pattern recognition and learning, it is worth considering the other three areas.

[28] Minsky's Multiplicity is not a widely used term for these two topics. In fact, I coined this term specifically for this paper, as no paper that I am aware of has discussed both of these classifications together. The term is a play on words, as Minsky provides multiple multiplicities.
[29] Minsky, Steps Toward Artificial Intelligence, Proceedings of the IRE, 1961.

Figure 7: This diagram is an early Bayesian network architecture from Minsky's seminal 1960 paper.

In terms of search, the state of NN-based search is dependent on the definition of search. Minsky's definition was not rigid: "[if] we have a means for checking a proposed solution, then we can solve the problem by testing all possible answers. But this always takes much too long to be of practical interest." There are a number of NN architectures that show promise in the area of search. James Atwood and Don Towsley recently presented search-convolutional neural networks (SCNNs).[30] This architecture defines features by traversing a local graph with breadth-first search and random walks. Another paper focuses on the more traditional use of the term search: "Using Neural Networks for Evaluation in Heuristic Search Algorithm"[31] describes a NN system that develops its own search heuristics instead of relying on hard-coded search algorithms such as A*.

The areas of planning and induction are often considered to be at a higher level of abstraction. Planning breaks down complex problems into subproblems, while induction is the process of using previous knowledge to come to a conclusion about a related but different scenario. Both areas are often discussed at the symbolic level. Therefore, to progress in these areas of intelligence, NNs must achieve the symbolic manipulation abilities discussed in the previous section.

[30] Atwood and Towsley, Search-Convolutional Neural Networks, 2015.
[31] Chen and Wei, Using Neural Networks for Evaluation in Heuristic Search Algorithm, AAAI, 2011.
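To make the idea of a learned evaluation function concrete, here is a small best-first search sketch in which the heuristic is supplied as an arbitrary callable; a trained network's value estimate could be plugged in where the hand-written heuristic stands in. The grid world and the heuristic itself are invented for illustration.

```python
# Best-first (greedy) search on a small grid, where the heuristic is any
# callable. A trained neural network's evaluation could replace the
# hand-written Manhattan-distance stand-in below.

import heapq

def best_first_search(start, goal, neighbors, heuristic):
    frontier = [(heuristic(start, goal), start)]
    came_from = {start: None}
    while frontier:
        _, current = heapq.heappop(frontier)
        if current == goal:
            return came_from  # goal reached; the path can be rebuilt from came_from
        for nxt in neighbors(current):
            if nxt not in came_from:
                came_from[nxt] = current
                heapq.heappush(frontier, (heuristic(nxt, goal), nxt))
    return None

def grid_neighbors(cell, size=5):
    x, y = cell
    steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in steps if 0 <= a < size and 0 <= b < size]

manhattan = lambda cell, goal: abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
result = best_first_search((0, 0), (4, 4), grid_neighbors, manhattan)
print((4, 4) in result)  # True: the goal was reached under this heuristic
```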

Figure 8: This diagram displays the six layers of Minsky's model. Innate and instinctive urges and drives are realized in the bottom layers, whereas values, censors, ideals, and taboos are understood at the upper layers.

The second categorization that Minsky developed comprises six layers of thinking, seen in Figure 8. These six layers were initially defined and described in The Emotion Machine.[32]

Cognitive architectures, such as the MicroPsi project[33] and Leabra,[34] attempt to model all of cognition. These systems strive to reach all six layers. However, neural networks might be able to develop these levels without the need for an entire static architecture.

Both Minsky's layers and neural networks are hierarchical. Therefore, it would make logical sense that basic NNs can most easily implement the lowest of Minsky's layers. Conversely, NNs would struggle most with the highest layers, as the system would need a successful implementation of all layers below the goal layer. This logic is accurate.

Modern deep learning systems all "think" at the reactive (base) layer. The neural networks have automatic responses based on perceived information about the world. It is possible to argue that more advanced deep learning programs perform deliberative thinking. However, it would be hard to argue that any NN reaches reflective thinking. No modern system is able to "reflect on and manage deliberative activity".[35]

This suggests that neural networks can, so far, only perform at the two lowest levels of thinking. However, reflective thinking is within reach. If neural networks were able to report what features had been detected in the hidden layers of the net, then the NN would be able to explain the reason for its output. This would enable deep learning to reach at least the third layer of cognition.

Minsky's multiplicities provide a straightforward benchmark for comparing neural network systems to other techniques. For deep learning to succeed, it must operate on all six layers of cognition and be able to perform all five types of problem solving heuristics.

[32] Minsky, The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind, 2006.
[33] Bach, MicroPsi, 2009.
[34] O'Reilly, The Leabra Cognitive Architecture.
[35] There does exist a system that claims to operate on all 6 layers, which I discuss further in the next section.
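As a small illustration of the idea above of reporting which hidden features were active alongside the output, here is a toy two-layer network that returns its hidden activations with its prediction. The network, weights, and the "feature" interpretation are invented for illustration.

```python
# Toy two-layer network that reports its hidden-layer activations alongside
# its output, so a caller can inspect which internal features were active.
# Weights and the feature interpretation are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))   # input (4 dims) -> hidden (3 units)
W2 = rng.normal(size=(3, 1))   # hidden -> output

def forward_with_report(x):
    hidden = np.maximum(0, x @ W1)              # ReLU hidden activations
    logit = float(hidden @ W2[:, 0])            # scalar pre-activation
    output = 1 / (1 + np.exp(-logit))           # sigmoid output
    report = {f"hidden_unit_{i}": float(h) for i, h in enumerate(hidden)}
    return output, report

prediction, explanation = forward_with_report(np.array([1.0, 0.5, -0.2, 0.3]))
print(prediction)
print(explanation)  # which hidden units fired, as a crude "reason" for the output
```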

Story Understanding

Understanding language is a critical task for any sufficiently advanced system. Claims of success in language understanding first started in 1970, when Terry Winograd developed SHRDLU.[36] This program was able to carry on a dialog about a blocks-world environment with a user. However, critics of this system argued that the system did not truly understand language, that it was not actually "thinking." This debate was later formalized in a 1980 paper by John Searle,[37] in which Searle defined the Chinese Room Argument.

[36] Winograd, Procedures as a Representation for Data in a Computer Program for Understanding Natural Language, MIT, 1971.
[37] Searle, Minds, Brains, and Programs, Behavioral and Brain Sciences, 1980.

Since that time, two prominent professors have attempted to understand computational language understanding more formally. First, Noam Chomsky, the linguist and cognitive scientist, has described the "merge" operation,[38] which he believes is crucial to human language. The merge operation is the ability, unique to humans, to take two concepts and form a new concept, indefinitely. In his recent book, Why Only Us,[39] Chomsky explains the evolutionary plausibility of this uniquely-human capability.

In work that pairs well with Chomsky's merge operation, Patrick Winston has spent decades working on computational story understanding. Winston's work largely centers around the Strong Story Hypothesis,[40] which states that storytelling and story understanding have a central role in human intelligence.

Winston's implementation of his ideas is a program known as Genesis.[41] Genesis is able to understand stories and answer questions based on a very small amount of training information. For example, when given the story of Macbeth (by William Shakespeare), Genesis is able to answer the question "why did Macduff kill Macbeth?" and respond based on perspective. From a personality perspective, Genesis can respond "Macduff kills Macbeth because he is vicious." From a commonsense perspective, Genesis answers "... Macduff kills Macbeth, probably because Macduff wants to kill Macbeth and Macduff is vicious." From a concept level, the response is "Macduff kills Macbeth is part of acts of Revenge, Answered prayer, Mistake because harmed, Pyrrhic victory, and Tragic greed."

With these question answering capabilities, Winston makes the argument that Genesis understands stories at a human level. The argument is also made that Genesis operates on all levels of Minsky's six layer model (described in the previous section).

No deep learning system has come close to this level of cognition. However, there are introductory steps that show promise. Recent statistical modeling, namely word2vec,[42] is able to take symbolic language and convert words into a vector space, allowing for computational manipulation. This vectorization of information enables compositionality of concepts. For example, the vector transformation Madrid - Spain + France results in the vector for Paris. That is, the knowledge representation understands Madrid - Spain to be the concept of a country's capital. Then the capital concept is combined with France to produce the capital of France, which is Paris (see Figure 9).

Figure 9: This figure shows a two dimensional view of countries and their capitals. The projection necessary to translate from a country to a capital is clear. (This figure is referenced from Mikolov et al.)

This aligns well with Chomsky's merge operation. Multiple concepts are formed together to produce a new concept. Since the new concept remains a distributed, vector representation, the new concept can be merged with other concepts indefinitely.

Recurrent neural network architectures also show promise in the attempt to computationally model language. Bengio[43] has described the potential for sequential understanding, and recent commercial successes have progressed the field. However, major hurdles still lie ahead.

To show that neural networks are a plausible fundamental substrate, two major accomplishments need to be made. First, neural network natural language processing must increase its accuracy when training data is extremely scarce. Genesis is able to learn stories in under 100 sentences, from a collection of under 25 stories. Today's best systems still use databases of millions of entries. Second, these systems must be able to operate at the upper levels of Minsky's six layers. Once the reflective and self-reflective levels are reached, the argument for deep learning models as a base layer will become much stronger. Further work on both of these areas is much needed.

[38] Precisely, Chomsky describes this operation as "an indispensable operation of a recursive system [...] which takes two syntactic objects A and B and forms the new object G = {A, B}".
[39] Berwick and Chomsky, Why Only Us, The MIT Press, 2016.
[40] Winston, The Strong Story Hypothesis and the Directed Perception Hypothesis, 2011.
[41] Winston, The Genesis Story Understanding and Story Telling System: A 21st Century Step Toward Artificial Intelligence, 2014.
[42] Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality, 2013.
[43] Hochreiter, Bengio, Frasconi, and Schmidhuber, Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies, 2001.
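To illustrate the vector arithmetic described above, here is a short sketch using the gensim library. The path to a pretrained word2vec file is an assumption, and the exact neighbors returned depend on the vectors used.

```python
# Sketch of the Madrid - Spain + France analogy using pretrained word vectors.
# Assumes a word2vec-format vector file is available at the path below.

from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# vector("Madrid") - vector("Spain") + vector("France") should land near "Paris".
result = vectors.most_similar(positive=["Madrid", "France"], negative=["Spain"], topn=1)
print(result)  # e.g. [('Paris', ...)] with the vectors assumed here
```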

Contributions

In this paper, I have argued that the field of artificial intelligence performs a type of gradient descent. I have outlined the periods of success, followed by periods of realization that the current technique has major limitations.

I have explained that deep neural networks are a computationally plausible solution for a substrate that can enable all future AI systems. I have described three areas of research in artificial intelligence that deep learning has not solved. For each area, I have described the current state of the art, the reason that area is critical to AI, and which deep learning methods show promise for future success.

After many decades of research, the field has the tools to develop a substrate that can power all AI systems. However, if the field only focuses on the easy and obvious problems, the next AI winter will certainly ensue. To continue progress towards general artificial intelligence, and to escape the local minimum, the deep learning community must focus on the tasks that are far from being reached, but necessary for long term success.

References

Language and machines: Computers in translation and linguistics. URL http://www.nap.edu/read/9547/chapter/1.

Tower of hanoi, 2016. URL https://en.wikipedia.org/wiki/ Tower_of_Hanoi.

John Abell. March 15, 1985: Dot-com revolution starts with a whimper, 1985. URL http://www.wired.com/2010/03/0315-symbolics-first-dotcom/.

James Atwood and Don Towsley. Search-convolutional neural net- works. CoRR, abs/1511.02136, 2015. URL http://arxiv.org/abs/ 1511.02136.

Joscha bach. Micropsi, 2009. URL http://cognitive-ai.com.

Hans Berliner. Deep thought wins fredkin intermediate prize. AI Magazine, 10(2), 1989. URL http://www.aaai.org/ojs/index.php/ aimagazine/article/viewFile/753/671.

Robert C. Berwick and Noam Chomsky. Why Only Us. The MIT Press, 2016.

Daniel G Bobrow. Natural Language Input for a Computer Problem Solving System. PhD thesis, Massachusetts Institute of Technology, 1964. URL http://hdl.handle.net/1721.1/5922.

Murray Campbell, A Joseph Hoane, and Feng-hsiung Hsu. Deep blue. Artificial intelligence, 134(1):57–83, 2002. URL http://www.sciencedirect.com/science/article/pii/ S0004370201001291.

Hung-Che Chen and Jyh-Da Wei. Using neural networks for evalua- tion in heuristic search algorithm. In Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011. URL http://www.aaai.org/ocs/ index.php/AAAI/AAAI11/paper/view/3476.

Daniel Crevier. AI: The Tumultuous History of the Search for Artificial Intelligence. Basic Books, Inc., New York, NY, USA, 1993. ISBN 0-465-02997-3.

Brad Darrach. Meet shaky, the first electronic person, 1970. URL https://books.google.com/books?id=2FMEAAAAMBAJ&lpg= PA57&pg=PA58#v=onepage&q&f=true.

Charles R. Dyer. Game playing. URL http://pages.cs.wisc.edu/ ~dyer/cs540/notes/06_gamePlaying.pdf.

T.G Evans. A heuristic program to solve geometric analogy problems, 1962. URL http://hdl.handle.net/1721.1/6101.

Artur d'Avila Garcez, Tarek R Besold, Luc de Raedt, Peter Földiak, Pascal Hitzler, Thomas Icard, Kai-Uwe Kühnberger, Luis C Lamb, Risto Miikkulainen, and Daniel L Silver. Neural-symbolic learning and reasoning: contributions and challenges. 2015. URL http://daselab.cs.wright.edu/pub/nesy2015.pdf.

Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014. URL http: //arxiv.org/abs/1410.5401.

Stevan Harnad. The symbol grounding problem, 1990. URL http://users.ecs.soton.ac.uk/harnad/Papers/Harnad/ harnad90.sgproblem.html.

Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmid- huber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, 2001. URL http://www.bioinf.jku.at/ publications/older/ch7.pdf.

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012. URL https://papers.nips.cc/paper/4824-imagenet-classification- with-deep-convolutional-neural-networks.pdf.

John E. Laird, Allen Newell, and Paul S. Rosenbloom. Soar: An ar- chitecture for general intelligence. Artificial Intelligence, 33, 1987. URL http://www.sciencedirect.com/science/article/pii/ 0004370287900506.

Sir James Lighthill. Artificial intelligence: A general survey, 1974. URL http://www.chilton-computing.org.uk/inf/literature/ reports/lighthill_report/p001.htm.

Sir James Lighthill and John McCarthy. The lighthill debate, 1973. URL https://www.youtube.com/watch?v=yReDbeY7ZMU.

AI Magazine. The ai behind watson—the technical article. AI Magazine, 2010. URL http://www.aaai.org/Magazine/Watson/ watson.php.

John McCarthy, Marvin L Minsky, Nathaniel Rochester, and Claude E Shannon. A proposal for the dartmouth summer research project on artificial intelligence, 2006. URL http://www.aaai.org/ojs/ index.php/aimagazine/article/view/1904/1802.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119, 2013. URL https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf.

Marvin Minsky. Steps toward artificial intelligence. Proceedings of the IRE, 49(1):8–30, 1961. URL http://web.media.mit.edu/~minsky/ papers/steps.html.

Marvin Minsky. The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind. Simon & Schuster, 2006. ISBN 0743276639. URL http://web.media.mit.edu/~minsky/ Introduction.html.

Marvin Minsky. Artificial intelligence and the future of the hu- man mind, 2007. URL https://www.youtube.com/watch?v= g1DcIJgInF8.

Allen Newell, John C Shaw, and Herbert A Simon. Report on a gen- eral problem-solving program, 1959. URL http://129.69.211.95/ pdf/rand/ipl/P-1584_Report_On_A_General_Problem- Solving_Program_Feb59.pdf.

Randall C. O’Reilly. The leabra cognitive architecture. URL http: //psych.colorado.edu/~oreilly/papers/oreillyhazyherdip.pdf.

Stephen Polit. R1 and beyond: Ai technology transfer at digital equipment corporation. AI Magazine, 5(4):76, 1984. URL http:// www.aaai.org/ojs/index.php/aimagazine/article/view/460/396.

Andrew Pollack. Setbacks for artificial intelligence, 1988. URL http://www.nytimes.com/1988/03/04/business/setbacks-for- artificial-intelligence.html?pagewanted=all.

Andrew Pollack. ’fifth generation’ became japan’s lost genera- tion. New York Times, 1992. URL http://www.nytimes.com/ 1992/06/05/business/fifth-generation-became-japan-s-lost- generation.html.

John R Searle. Minds, brains, and programs. Behavioral and brain sciences, 3(03):417–424, 1980. URL http://cogprints.org/7150/1/ 10.1.1.83.5248.pdf.

David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016. URL http://www.nature.com/nature/ journal/v529/n7587/full/nature16961.html.

Push Singh. Em one: An architecture for reflective commonsense thinking, 2005. URL http://web.media.mit.edu/%7Epush/push-thesis.html.

James Robert Slagle. A heuristic program that solves symbolic integration problems in freshman calculus, symbolic automatic integrator (saint), 1961. URL http://hdl.handle.net/1721.1/11997.

Aaron Smith and Janna Anderson. Ai, robotics, and the future of jobs, 2014. URL http://www.pewinternet.org/files/2014/08/ Future-of-AI-Robotics-and-Jobs.pdf.

Terry Winograd. Procedures as a representation for data in a com- puter program for understanding natural language. Techni- cal report, Massachusetts Institute of Technology, 1971. URL http://hci.stanford.edu/~winograd/shrdlu/AITR-235.pdf.

Patrick Henry Winston. The strong story hypothesis and the directed perception hypothesis. Proceedings of the AAAI Fall Symposium on Advances in Cognitive Systems, 2011. URL http://hdl.handle.net/ 1721.1/67693.

Patrick Henry Winston. The genesis story understanding and story telling system: A 21st century step toward artificial intelli- gence. Memo 019, Center for Brains Minds and Machines, MIT, 2014. URL http://groups.csail.mit.edu/genesis/papers/ StoryWhitePaper.pdf.