The MacGyver Test - A Framework for Evaluating Machine Resourcefulness and Creative Problem Solving

Vasanth Sarathy, Matthias Scheutz
Tufts University
Medford, MA, USA

arXiv:1704.08350v1 [cs.AI] 26 Apr 2017

Abstract

Current measures of machine intelligence are either difficult to evaluate or lack the ability to test a robot's problem-solving capacity in open worlds. We propose a novel evaluation framework based on the formal notion of a MacGyver Test, which provides a practical way of assessing the resilience and resourcefulness of artificial agents.

1 Introduction

Consider a situation in which your only suit is covered in lint and you do not own a lint remover. Being resourceful, you reason that a roll of duct tape might be a good substitute. You then solve the problem of lint removal by peeling a full turn's worth of tape and re-attaching it backwards onto the roll to expose the sticky side all around the roll. By rolling it over your suit, you can now pick up all the lint. This type of everyday creativity and resourcefulness is a hallmark of human intelligence and is best embodied in the 1980s television series MacGyver, which featured a clever secret service agent who used common objects around him, like paper clips and rubber bands, in inventive ways to escape difficult life-or-death situations.[1]

Yet, current proposals for tests of machine intelligence do not measure abilities like resourcefulness or creativity, even though this is exactly what is needed for artificial agents such as space-exploration robots, search-and-rescue agents, or even home and elder-care helpers to be more robust, resilient, and ultimately autonomous.

In this paper we thus propose an evaluation framework for machine intelligence and capability consisting of practical tests for inventiveness, resourcefulness, and resilience. Specifically, we introduce the notion of a MacGyver Test (MT) as a practical alternative to the Turing Test intended to advance research.

[1] As a society, we place a high value on our human ability to solve novel problems and remain resilient while doing so. Beyond the media, our patent system and peer-reviewed publication systems are additional examples of us rewarding creative problem solving and elegance of solution.

2 Background: Turing Test and its Progeny

Alan Turing asked whether machines could produce observable behavior (e.g., natural language) that we (humans) would say required thought in people [Turing, 1950]. He suggested that if an interrogator was unable to tell, after having a long free-flowing conversation with a machine, whether she was dealing with a machine or a person, then we can conclude that the machine was "thinking". Turing did not intend for this to be a test, but rather a prediction of sorts [Cooper and Van Leeuwen, 2013]. Nevertheless, since Turing, others have developed tests for machine intelligence that were variations of the so-called Turing Test to address a common criticism that it was easy to deceive the interrogator. Levesque et al. designed a reading comprehension test, entitled the Winograd Schema Challenge, in which the agent is presented with a question having some ambiguity in the referent of a pronoun or possessive adjective. The question asks the agent to determine the referent of this ambiguous pronoun or possessive adjective by selecting one of two choices [Levesque et al., 2012]. Feigenbaum proposed a variation of the Turing Test in which a machine can be tested against a team of subject matter specialists through natural language conversation [Feigenbaum, 2003]. Other tests attempted to study a machine's ability to produce creative artifacts and solve novel problems [Boden, 2010; Bringsjord et al., 2001; Bringsjord and Sen, 2016; Riedl, 2014].

Extending capabilities beyond linguistic and creative, Harnad's Total Turing Test (T3) suggested that the range of capabilities must be expanded to a full set of robotic capacities found in embodied systems [Harnad, 1991]. Schweizer extended the T3 to incorporate species evolution and development over time and proposed the Truly Total Turing Test (T4) to test not only individual cognitive systems but whether, as a species, the candidate cognitive architecture in question is capable of long-term evolutionary achievement [Schweizer, 2012].

Finding that the Turing Test and its above-mentioned variants were not helping guide research and development, many proposed a task-based approach. Specific task-based goals were designed and couched as toy problems that were representative of a real-world task [Cohen, 2005]. The research communities benefited greatly from this approach and focused their efforts towards specific machine capabilities like object recognition, automatic scheduling and planning, scene understanding, localization and mapping, and even game-playing. Many public competitions and challenges emerged that tested the machine's performance in applying these capabilities, from image recognition contests to machine learning contests. Some of these competitions even tested embodiment and robotic capacities, while combining multiple tasks. For example, the DARPA Robotics Challenge tested a robot's ability to conduct tasks relevant to remote operation, including turning valves, using a tool to break through a concrete panel, opening doors, and removing debris blocking entryways.

Unfortunately, the Turing Test variants as well as the task-based challenges are not sufficient as true measures of autonomy in the real world. Autonomy requires a multi-modal ability and an integrated embodied system to interact with the environment and achieve goals while solving open-world problems with the limited resources available. None of these tests is aimed at measuring this sort of intelligence and capability, the sort that is most relevant from a practical standpoint.

3 The MacGyver Evaluation Framework

The proposed evaluation framework, based on the idea of MacGyver-esque creativity, is intended to answer the question of whether embodied machines can generate, execute and learn strategies for identifying and solving seemingly-unsolvable real-world problems. The idea is to present an agent with a problem that is unsolvable with the agent's initial knowledge and to observe the agent's problem-solving processes to estimate the probability that the agent is being creative: if the agent can think outside of its current context, take some exploratory actions, and incorporate relevant environmental cues and learned knowledge to make the problem tractable (or at least computable), then the agent has the general ability to solve open-world problems more effectively.[2]

This type of problem-solving framework is typically used in the area of automated planning for describing various sorts of problems and solution plans and is naturally suited for defining a MacGyver-esque problem and a creative solution strategy. We are now ready to formalize various notions of the MacGyver evaluation framework.

3.1 Preliminaries - Classical Planning

We define $L$ to be a first order language with predicates $p(t_1, \ldots, t_n)$ and their negations $\neg p(t_1, \ldots, t_n)$, where $t_i$ represents terms that can be variables or constants. A predicate is grounded if and only if all of its terms are constants. We will use classical planning notions of a planning domain in $L$ that can be represented as $\Sigma = (S, A, \gamma)$, where $S$ represents the set of states, $A$ is the set of actions and $\gamma$ are the transition functions.

[...] consider the notion of state reachability and the set of all successor states $\hat{\Gamma}(s)$, which defines the set of states reachable from $s$.

3.2 A MacGyver Problem

To formalize a MacGyver Problem (MGP), we define a universe and then a world within this universe. The world describes the full set of abilities of an agent and includes those abilities that the agent knows about and those of which it is unaware. We can then define an agent subdomain as representing a proper subset of the world that is within the awareness of the agent. An MGP then becomes a planning problem defined in the world, but outside the agent's current subdomain.

Definition 1 (Universe). We first define a Universe $U = (S, A, \gamma)$ as a classical planning domain representing all aspects of the physical world perceivable and actionable by any and all agents, regardless of capabilities. This includes all the allowable states, actions and transitions in the physical universe.

Definition 2 (World). We define a world $W^t = (S^t, A^t, \gamma^t)$ as a portion of the Universe $U$ corresponding to those aspects that are perceivable and actionable by a particular species $t$ of agent. Each agent species $t \in T$ has a particular set of sensors and actuators allowing agents in that species to perceive a proper subset of states, actions or transition functions. Thus, a world can be defined as follows:

$$W^t = \{(S^t, A^t, \gamma^t) \mid ((S^t \subseteq S) \lor (A^t \subseteq A) \lor (\gamma^t \subseteq \gamma)) \land \neg((S^t = S) \land (A^t = A) \land (\gamma^t = \gamma))\}$$

Definition 3 (Agent Subdomain). We next define an agent $\Sigma_i^t = (S_i^t, A_i^t, \gamma_i^t)$ of type $t$ as a planning subdomain corresponding to the agent's perception and action within its world $W^t$. In other words, the agent is not fully aware of all of its capabilities at all times, and the agent domain $\Sigma_i^t$ corresponds to the portion of the world that the agent is perceiving and acting in at time $i$.

$$\Sigma_i^t = \{(S_i^t, A_i^t, \gamma_i^t) \mid ((S_i^t \subset S^t) \lor (A_i^t \subset A^t) \lor (\gamma_i^t \subset \gamma^t)) \land \neg((S_i^t = S^t) \land (A_i^t = A^t) \land (\gamma_i^t = \gamma^t))\}$$

Definition 4 (MacGyver Problem). We define a MacGyver Problem (MGP) with respect to an agent $t$ as a planning problem in the agent's world $W^t$ that has a goal state $g$ that is currently unreachable by the agent. Formally, an MGP $P_M = (W^t, s_0, g)$, where:

• $s_0 \in S_i^t$ is the initial state of the agent