From: AAAI-96 Proceedings. Copyright © 1996, AAAI (www.aaai.org). All rights reserved.

Challenge Problems for Artificial Intelligence (panel statement)

Bart Selman (moderator), AT&T
Rodney A. Brooks, MIT
Thomas Dean, Brown University
Eric Horvitz
Tom M. Mitchell, CMU
Nils J. Nilsson

Introduction

AI textbooks and papers often discuss the big questions, such as "how to reason with uncertainty", "how to reason efficiently", or "how to improve performance through learning." It is more difficult, however, to find descriptions of concrete problems or challenges that are still ambitious and interesting, yet not so open-ended. The goal of this panel is to formulate a set of such challenge problems for the field. Each panelist was asked to formulate one or more challenges. The emphasis is on problems for which there is a good chance that they will be resolved within the next five to ten years.

A good example of the potential benefit of a concrete AI challenge problem is the recent success of Deep Blue. Deep Blue is the result of a research effort focused on a single problem: develop a program to defeat the world chess champion. Although Deep Blue has not yet quite achieved this goal, it played a remarkably strong game against Kasparov in the recent ACM Chess Challenge Match.

A key lesson we learn from Deep Blue's strength is that efficient brute-force search can be much more effective than sophisticated, heuristically guided search. In fact, brute-force was so successful that it led Kasparov to exclaim "I could feel - I could smell - a new kind of intelligence across the table." (Kasparov 1996)

The experience with Deep Blue shows that a good challenge problem can focus research, lead to concrete progress, and bring us important new insights. Many AI researchers may not like the particular lesson about the value of brute-force over more "intelligent" forms of search, but, nevertheless, it is a very tangible result. In fact, the issue of general-purpose ultra-fast search procedures versus heuristically guided domain-dependent methods is currently being revisited in the search and reasoning community.

Finally, as a meta-issue, we will consider how to measure progress in the field. Determining whether a particular piece of work in AI actually brings us any closer to the ultimate goals of AI has proven to be quite difficult. By introducing a set of well-defined challenge problems, we hope that this panel will help provide some benchmarks against which we can measure research progress.

Eight Challenges for Artificial Intelligence: Rodney Brooks

There are two very different sorts of challenges that I see for Artificial Intelligence - first, our systems are pathetic compared to biological systems, along many dimensions, and secondly, moderately good performance from some approaches has sociologically led to winner-take-all trends in research where other promising lines of research have been snuffed out too soon.

If we compare either software systems or robotic systems to biological systems, we find that our creations are incredibly fragile by comparison. Below I pose some challenges that are aimed at narrowing this distance.

The challenges themselves are general in nature - they do not solve particular problems, but the acts of meeting these challenges will force the creation of new general-purpose techniques and tools which will allow us to solve more particular problems.

Challenge 1. Biological systems can adapt to new environments - not perfectly, they die in some environments, but often they can adapt. Currently our programs are very brittle, and certainly a program compiled for one architecture cannot run on another architecture. Can we build a program which can install itself and run itself on an unknown architecture? This sounds very difficult. How about a program which can probe an unknown architecture from a known machine and reconfigure a version of itself to run on the unknown machine? Still rather difficult, so perhaps we have to work up to this by making some "blocks worlds" artificial architectures where we can do this. This might lead to some considerations of how future architectures might be designed so that software is self-configurable, and then even perhaps self-optimizing.

Challenge 2. Minsky (1967) was foundational in establishing the theory of computation, but after Hartmanis (1971) there has been a fixation with asymptotic complexity. In reality lots of problems we face in building real AI systems do not get out of hand in terms of the size of problems for individual modules - in particular with behavior-based systems most of the submodules need only deal with bounded-size problems. There are other ways theory could have gone. For instance one might try to come up with a theory of computation based on how much divergence there might be in programs given a one-bit error in either the program or data representation. If theory were based on this fundamental concern we might start to understand how to make programs more robust.

Challenge 3. Recent work with evolutionary systems has produced some tantalizing, spectacular results, e.g., Sims (1994). But it is hard to know how to take things from successes and apply them to new problems. We do not have the equivalent of the Perceptron book (Minsky and Papert 1969) for evolutionary systems. (Note that these evolutionary systems are much more than straight genetic algorithms, as there is both a variable-length genotype and a morphogenesis phase that produces a distinctly different phenotype.) We need such a new book of mathematics so that we understand the strengths and weaknesses of this exciting new approach.

Challenge 4. We have been living with the basic formalizations made by McCulloch and Pitts (1943) for over fifty years now. Their formalization included that the activity of the neuron is an "all-or-none" process, that a certain fixed number of synapses must be excited within the period of latent addition in order to excite a neuron at any time, and this number is independent of the synapses' previous activity and position on the neuron, that the only significant delay within the nervous system is synaptic delay, that the activity of any inhibitory synapse absolutely prevents excitation of the neuron at that time, and that the structure of the net does not change with time. With the addition of changing synaptic weights by Hebb (1949) we pretty much have the modern computational model of neurons used by most researchers. With 50 years of additional neuroscience, we now know that there is much more to real neurons. Can newer models provide us with new computational tools, and will they lead to new insights to challenge the learning capabilities that we see in biological learning?
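To make the model referred to above concrete, here is a minimal sketch (my own illustration, not from the panel text) of a McCulloch-Pitts-style all-or-none unit with a simple Hebbian weight update; the function names, threshold, and learning rate are illustrative assumptions.

```python
# Minimal sketch of a McCulloch-Pitts-style unit with a Hebbian update.
# Names, threshold, and learning rate are illustrative assumptions.

def mcp_unit(inputs, excitatory_weights, inhibitory_inputs, threshold):
    """All-or-none unit: fires (returns 1) only if enough excitation arrives
    and no inhibitory synapse is active (absolute inhibition)."""
    if any(inhibitory_inputs):          # any active inhibitory synapse vetoes firing
        return 0
    excitation = sum(w * x for w, x in zip(excitatory_weights, inputs))
    return 1 if excitation >= threshold else 0

def hebbian_update(weights, inputs, output, rate=0.1):
    """Strengthen synapses whose presynaptic input coincided with firing (Hebb 1949)."""
    return [w + rate * x * output for w, x in zip(weights, inputs)]

# Example: with threshold 2 and unit weights, the unit fires only when at
# least two excitatory inputs are active and no inhibition is present.
w = [1.0, 1.0, 1.0]
x = [1, 1, 0]
y = mcp_unit(x, w, inhibitory_inputs=[0], threshold=2)   # y == 1
w = hebbian_update(w, x, y)                               # w == [1.1, 1.1, 1.0]
```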
Over time we become trapped in our shared visions of appropriate ways to tackle problems, and even more trapped by our funding sources where we must constantly justify ourselves by making incremental progress. Sometimes it is worthwhile stepping back and taking an entirely new (or perhaps very old) look at some problems and to think about solving them in new ways. This takes courage as we may be leading ourselves into different sorts of solutions that will for many years have poorer performance than existing solutions. With years of perseverance we may be able to overcome initial problems with the new approaches and eventually leapfrog to better performance. Or we may turn out to be totally wrong. That is where the courage comes in.

Challenge 5. Despite some early misgivings (Selfridge 1956) back when chess-playing programs had search trees only two deep (Newell et al. 1958), our modern chess programs completely rely on deep search trees and play chess not at all like humans. Can we build a program that plays chess in the way that a human plays? If we could, then perhaps we could prove how good it was by getting it to play Go - tree search just cannot cut it with Go.

Challenge 6. All of the competitive speech understanding systems today use hidden Markov models. While trainable, these systems have some unfortunate properties. They have much higher error rates than we might desire, they require some restriction in domain, and they are often inordinately sensitive to the choice of microphone. It seems doubtful that people use HMMs internally (even if one doesn't believe that generative grammars are the right approach either). Can we build a speech understanding system that is based on very different principles?

Challenge 7. We live in an environment where we make extensive use of non-speech sound cues. There has been very little work on noise understanding. Can we build interesting noise understanding systems?

Challenge 8. Can we build a system by evolution that is better at a non-trivial task than anything that has been built by hand?

Integrating Theory and Practice in Planning: Thomas Dean

I issue a challenge to theorists, experimentalists, and practitioners alike to raise the level of expectation for collaborative scientific research in planning. Niels Bohr was first and foremost a theoretical physicist. Ernest Rutherford was first and foremost an experimentalist. It can be argued that neither Bohr nor Rutherford would have made as significant contributions to nuclear physics without an appreciation and understanding of one another's results. By analogy, I believe that a deeper understanding of planning problems is possible only through the concerted efforts of theorists, experimentalists, and practitioners. The current interplay between these groups (perhaps factions is the appropriate word) is minimal.

The pendulum of popular opinion swings back and forth between theory and practice. At different times, experimentalists have had to pepper their papers with equations and theorists have had to build systems and run experiments in order to get published. With the exception of the rare person gifted as mathematician and hacker, the requirement of both theory and practice in every result is a difficult one to satisfy; difficult and, I believe, unnecessary. This requirement implies that every publishable result must put forth a theoretical argument and then verify it both analytically and experimentally. The requirement tends to downplay the need for a community effort to weave together a rich tapestry of ideas and results.

A scientific field can nurture those who lean heavily toward theory or practice as long as the individuals direct their research to contribute to problems of common interest. Of course, the field must identify the problems that it considers worthy of emphasis and marshal its forces accordingly. I suggest planning as such a problem and the study of propositional STRIPS planning as a good starting point. By analogy to physics in the 1930's, I recommend continued study of STRIPS planning (the hydrogen atom of planning) and its stochastic counterparts (Markov decision problems) in parallel with investigations into a wide range of more expressive languages for specifying planning problems (the whole of the periodic table).

In terms of concrete proposals, I mention approaches from theoretical computer science that provide alternatives to the standard measures of performance, specifically asymptotic worst-case analysis. As an alternative to worst-case analysis relying on all-powerful, all-knowing adversaries, I discuss average-case performance measures and the properties of distributions governing the generation of problem instances. Regarding asymptotic arguments, I consider sharp-threshold functions, the relevance of phase-transition phenomena, and the statistical properties of graphs of small order. The resulting perspective emphasizes particular problem instances and specific algorithms rather than problem classes and complexity results that pertain to all algorithms of a given order of growth. I also point out the embarrassing lack of 'real' planning problems, posit the reason for such a deficit, and suggest how recent progress in learning theory might provide a rich source of planning problems. (Additional details can be found at ftp://www.cs.brown.edu/u/tld/postscript/DeanetalAAAI96.ps.)
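To make the "hydrogen atom" concrete, here is a minimal sketch (my own illustration, not from Dean's statement) of a propositional STRIPS problem and a naive forward-search planner; the tiny door/room domain, proposition names, and operators are invented for this example.

```python
# Minimal sketch of propositional STRIPS planning (illustrative domain only).
from collections import deque

# Each action: (name, preconditions, add list, delete list) over propositions.
ACTIONS = [
    ("go-to-door",   {"in-room-a"},               {"at-door"},   set()),
    ("open-door",    {"at-door", "door-closed"},  {"door-open"}, {"door-closed"}),
    ("enter-room-b", {"at-door", "door-open"},    {"in-room-b"}, {"in-room-a", "at-door"}),
]

def plan(initial, goal):
    """Breadth-first forward search over sets of true propositions."""
    frontier = deque([(frozenset(initial), [])])
    visited = {frozenset(initial)}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:                       # all goal propositions hold
            return path
        for name, pre, add, dele in ACTIONS:
            if pre <= state:                    # preconditions satisfied
                nxt = frozenset((state - dele) | add)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [name]))
    return None

print(plan({"in-room-a", "door-closed"}, {"in-room-b"}))
# -> ['go-to-door', 'open-door', 'enter-room-b']
```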
Decisions, Uncertainty and Intelligence: Eric Horvitz

To be successful in realistic environments, reasoning systems must identify and implement effective actions in the face of inescapable incompleteness in their knowledge about the world. AI investigators have long realized the crucial role that methods for handling incompleteness and uncertainty must play in intelligence. Although we have made significant gains in learning and decision making under uncertainty, difficult challenges remain to be tackled.

Challenge: Creating Situated Autonomous Decision Systems

A key challenge for AI investigators is the development of comprehensive autonomous decision-making systems that are situated in dynamic environments over extended periods of time, and that are entrusted with handling varied, complex tasks. Such robust decision systems need the ability to process streams of events over time, and to continue, over their lifetimes, to pursue actions with the greatest expected utility.

Mounting a response to this broad challenge immediately highlights several difficult subproblems, each of which may be viewed as a critical challenge in itself. Pursuing solutions to these subproblems will bring us closer to being able to field a spectrum of application-specific challenges such as developing robotic systems that are given the run of our homes, tractable medical decision-making associates that span broad areas of medicine, automated apprentices for helping people with scientific exploration, ideal resource management in multimedia systems, and intelligent user interfaces that employ rich models of user intentions and can engage in effective dialogue with people.

To address the broad challenge, we need to consider key phases of automated decision making, including the steps of perceiving states of the world, framing decisions, performing inference to compute beliefs about the world, making observations, and, most importantly, identifying a best set of actions. Under limited resources, we also need to carefully guide the allocation of resources to the different phases of analysis, and to extend decision making to the realm of monitoring and control of the entire decision-making process. I will dive into several problems associated with these components of decision making.

Subproblem: Automated Framing of Decision Problems

Faced with a challenge, a decision-making system must rely on some rules or, more generally, a model that expresses relationships among observations, states of the world, and system actions. Several methods have been studied for dynamically building representations of the world that are custom-tailored to perceived challenges. Framing a decision problem refers to identifying a set of relevant distinctions and relationships, at the appropriate level of detail, and weaving together a decision model. Framing a decision problem has resisted formalization. Nevertheless, strides have been made with techniques typically relying on the use of logical or decision-theoretic procedures to piece together or prune away distinctions, as a function of the state of the world, yielding manageable focused models. We have a long way to go in our understanding of principles for tractably determining what distinctions and dependencies will be relevant given a situation.

Subproblem: Handling Time, Synchronicity, and Streams of Events

Autonomous systems must make decisions in an evolving environment that may change dramatically over time, partly in response to actions that a system has or will take. Most research on action under uncertainty has focused on models and inference procedures that are fundamentally atemporal, or that encode temporal distinctions as static variables. We must endow systems with the ability to represent and reason about the time-dependent dynamics of belief and action, including such critical notions as the persistence and dynamics of world states. We also need to develop better means of synchronizing an agent's perceptions, inference, and actions with important events in the world.

Subproblem: Modeling Preferences and Utility

The axioms of utility give us the fundamental principle of maximum expected utility: an agent should take actions that maximize its expected (or average) measure of reward. Although it is easy to state the principle, we are forced in practice to wrestle with several difficult problems. Where does information about the utility of states come from? Whose utility is being maximized? How can we derive utilities associated with solving subproblems from assertions about high-level goals (e.g., "survive for as long as possible!") or from utilities on goal states? What is the most reasonable utility model for evaluating a finite sequence of actions an agent might take over time (e.g., should we assume an infinite number of future actions and discount the value of future rewards, or assume a finite set of steps and compute average reward?)? Different assumptions about the specific structure of the utility model lead to different notions of the "best" behaviors and to different computational efficiencies with evaluating sequences of plans.
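As a textbook restatement (not part of the panel text), the principle and the two sequence-valuation models contrasted above can be written in standard decision-theoretic notation:

```latex
% Maximum expected utility: choose the action with highest expected utility,
% given evidence e and a probability model over outcome states s.
a^* = \arg\max_{a \in \mathcal{A}} \sum_{s} P(s \mid a, e)\, U(s)

% Two common models for valuing a stream of rewards r_0, r_1, \ldots:
V_{\text{discounted}} = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_t\Big], \quad 0 < \gamma < 1
\qquad
V_{\text{average}} = \mathbb{E}\Big[\tfrac{1}{N}\sum_{t=0}^{N-1} r_t\Big]
```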
Subproblem: Mastery of Attention and Architecture

Perceiving, reasoning, and acting all require costly resources. Controlling the allocation of computational resources can be a critical issue in maximizing the value of a situated system's behavior. What aspects of a problem and problem-solving strategy should a system attend to and when? We need to develop richer models of attention. There is promise in continuing work that turns the analytic machinery of decision-theoretic inference onto problem solving itself, and using such measures as the expected value of computation (EVC) to make design-time and run-time decisions about the ideal quantities of computation and memory to allocate to alternative phases of reasoning - including to the control processes themselves. More generally, there is great opportunity in applying these methods in off-line and on-line settings to optimize the overall nature and configuration of a system's architecture, including decisions about the compilation of results.

Subproblem: Learning about Self and Environment

Continual learning about the environment and about the efficacy of problem solving is critical for systems situated in complex, dynamic environments, especially when systems may wander into one of several specialized environmental niches. We need to better understand how we can endow our systems with awareness of having adequate or inadequate knowledge about specific types of problems so that they can allocate appropriate resources for exploration and active learning. There has been research on methods for computing the confidence in results given a model and problem instance. This work highlights opportunities for developing methods that an agent could use to probe for critical gaps in its knowledge about the world. Continuing this research will be valuable for building decision-making systems that can perform active, directed learning.

Subproblem: Living Life Fully - Harnessing Every Second

To date, most of our reasoning systems have no choice but to idle away the precious time between their active problem-solving sessions. Systems immersed in complex environments should always have something to do with their time. We need to develop techniques that allow an agent to continuously partition its time not only among several phases of a pressing analysis but also among a variety of tasks that will help the agent to maximize the expected utility of its behavior over its entire lifetime. Tasks that can benefit from ongoing attention include planning for future challenges, probing and refining a utility model, prefetching information that is likely to be important, compilation of portions of expected forthcoming analyses, experimenting (playing?) with its reasoning and motion control systems, and learning about critical aspects of the external world.

Think Big - AI Needs More Than Incremental Progress: Tom Mitchell

1. Let's build programs that turn the WWWeb into the world's largest knowledge base. Doug Lenat had a good idea that we should have a large AI knowledge base covering much of human knowledge. One problem with building it is that it might take millions of people. And then we'd have to maintain it afterwards. The web is built and growing already, and it's online, and people are already maintaining it. Unfortunately, it's in text and images and sounds, not logic. So the challenge is to build programs that can "read" the web and turn it into, say, a frame-based symbolic representation that mirrors the content of the web. It's hard, but there is no evidence that it's impossible. And even partial solutions will be of incredible value.

2. Apply Machine Learning to learn to understand Natural Language. Natural language has always been considered to be too difficult to do for real. But things have changed over the past three years in an interesting way - for the first time in history we have hundreds of millions of supervised training examples indicating the meaning of sentences and phrases: those hyperlinks in all those web pages. Now agreed, they're not quite the kind of supervised training data we'd ask for if we were to choose training data to learn natural language. But when it comes to training data you take what you can get, especially if it numbers in the millions (when there are less than 100,000 words to begin with). So each hyperlink like "my recent publications" has a meaning that is revealed by the web page you get if you click on it. How can we learn something useful about natural language understanding from this kind of data? (We probably need to use more than just this kind of data to learn language, but once we have some basic ontology defined, this kind of data should be of great use in learning the details. A brief sketch of harvesting such hyperlink examples appears after this list.)

3. Let's build agents that exhibit life-long machine learning, rather than algorithms that learn one thing and then get rebooted. Consider people and consider current ML approaches such as decision tree or neural network learning. People mature, they learn things, then use these things they know to make it easier to learn new things. They use the things they know to choose what to learn next. ML has made good progress on approximating isolated functions from examples of their input/output. But this is just a subroutine for learning, and it's more or less a solved problem at this point. The next question is how can we build agents that exhibit long-term learning that is cumulative in a way more like people learn.
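As an illustration of the weakly supervised data described in point 2, here is a minimal sketch (my own, not from the panel text) that extracts (anchor text, link target) pairs from an HTML page using Python's standard html.parser; in a real system the content of the target page, not just its URL, would serve as the "meaning" label.

```python
# Minimal sketch: harvest (anchor text, link target) pairs from HTML as weak
# supervision for language learning. Illustrative only.
from html.parser import HTMLParser

class LinkHarvester(HTMLParser):
    def __init__(self):
        super().__init__()
        self.examples = []      # list of (anchor_text, href) pairs
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:     # accumulate text inside the current link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.examples.append(("".join(self._text).strip(), self._href))
            self._href = None

page = '<p>See <a href="pubs.html">my recent publications</a> for details.</p>'
h = LinkHarvester()
h.feed(page)
print(h.examples)   # [('my recent publications', 'pubs.html')]
```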
It’s hard, but there turn and errand-runner for a typical office building - is no evidence that it’s impossible. And, even partial an office building that is not specially equipped to ac- solutions will be of incredible value. commodate robots. The kinds of tasks that such a 2. Apply Machine Learning- to learn to understand robot will be able to perform will depend, of course, Natural Language. Natural language has always been on its effecters and sensors as well as on its software. considered to be too difficult to do for real. But things There are plenty of important challenge problems con- have changed over the past three years in an interesting cerned with sensors and effecters, but I leave it to oth- way - for the first time in history we have hundreds of ers to pose those. Instead, my challenge is to AI people millions of supervised training examples indicating the to develop the software for a robot with more-or-less meaning of sentences and phrases: those hyperlinks in state-of-the art range-finding and vision sensors and all those web pages. Now agreed, they’re not quite the locomotion and manipulation effecters. Let’s assume kind of supervised training data we’d ask for if we were that our robot can travel anywhere in the building - to choose training data to learn natural language. But down hallways, into open offices, and up and down el- when it comes to training data you take what you can evators (but perhaps not escalators). Assume that it get, especially if it numbers in the millions (when there can pick up, carry, and put down small parcels, such as are less than 100,000 words to begin with). So each hy- books and packages. For communication with humans, perlink like my recent publications has a meaning suppose it has a speech synthesizer and word or phrase

For communication with humans, suppose it has a speech synthesizer and word or phrase recognizer and a small keyboard/display console. Any such set of sensors and effecters would be sufficient for an extensive list of tasks if only we had the software to perform them.

Here is the specific challenge: The robot must be able to perform (or learn to perform with instruction and training - but without explicit post-factory computer programming) any task that a human might "reasonably" expect it to be able to perform given its effecter/sensor suite. Of course, some tasks will be impossible - it cannot climb ladders, and it doesn't do windows. And, I don't mean the challenge to be one of developing a "Cyc-like" commonsense knowledge base and reasoning system. Our factotum will be allowed some lapses in commonsense so long as it can learn from its mistakes and benefit from instruction. It is not my purpose to set high standards for speech understanding and generation. The human task-givers should be tolerant of and adapt to the current limited abilities of systems to process natural language.

The second part of the challenge is that the robot must stay on-the-job and functioning for a year without being sent back to the factory for re-programming.

What will be required to meet this challenge? First, of course, a major project involving the application and extension of several robotic and AI technologies and architectures for integrating them. I do not think that it will be feasible for the robot's builders to send it to its office building with a suite of programs that anticipate all of the tasks that could be given. I think the robot will need to be able to plan and to learn how to perform some tasks that the building occupants (who know only about its sensors and effecters) might expect it to be able to perform but that its programmers did not happen to anticipate. To plan and to learn efficiently, I think it will need to be able to construct for itself hierarchies of useful action routines and the appropriate associated perceptual processing routines for guiding these actions. Perhaps something like the "twin-tower" architecture of James Albus (1991) would be appropriate for overall control. But the towers will have to grow with instruction and experience. The computational models of developmental learning proposed by Gary Drescher (1991) seem to me to be a good place to start for the tower-building aspect of the problem.

Work on this challenge problem would be good for AI. It would encourage progress on extending and integrating the many disparate components of intelligent systems: reacting, planning, learning, perception, and reasoning. It might also connect the bottom-up and top-down AI approaches - to the benefit of both. (It could also produce a useful factotum.) Good luck!

References

J.S. Albus. Outline for a theory of intelligence. IEEE Transactions on Systems, Man, and Cybernetics, 21(3):473-509, May/June 1991.

G. Drescher. Made-Up Minds: A Constructivist Approach to Artificial Intelligence. MIT Press, Cambridge, Massachusetts, 1991.

E.A. Feigenbaum and J. Feldman. Computers and Thought. McGraw-Hill, New York, New York, 1963.

M.L. Ginsberg. Do computers need common sense? Technical report, CIRL, University of Oregon, 1996.

J. Hartmanis. Computational complexity of random access stored program machines. Mathematical Systems Theory, 5:232-245, 1971.

D.O. Hebb. The Organization of Behavior. John Wiley and Sons, New York, New York, 1949.

G. Kasparov. The day that I sensed a new kind of intelligence. Time, March 25, 1996, p. 55.

W.S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5:115-137, 1943.

Marvin Minsky. Computation: Finite and Infinite Machines. Prentice-Hall, 1967.

Marvin Minsky and Seymour Papert. Perceptrons. MIT Press, Cambridge, Massachusetts, 1969.

Allen Newell, J.C. Shaw, and Herbert Simon. Chess playing programs and the problem of complexity. IBM Journal of Research and Development, 2:320-335, 1958. Also appeared in (Feigenbaum and Feldman 1963).

Oliver G. Selfridge. Pattern recognition and learning. In Colin Cherry, editor, Proceedings of the Third London Symposium on Information Theory. Academic Press, New York, New York, 1956.

Karl Sims. Evolving 3D morphology and behavior by competition. In Rodney A. Brooks and Pattie Maes, editors, Artificial Life IV: Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, pages 28-39. MIT Press, Cambridge, Massachusetts, 1994.
