Review Articles
Total Page:16
File Type:pdf, Size:1020Kb
review articles DOI:10.1145/2701413 AI has seen great advances of many kinds recently, but there is one critical area where progress has been extremely slow: ordinary commonsense. BY ERNEST DAVIS AND GARY MARCUS Commonsense Reasoning and Commonsense Knowledge in Artificial Intelligence WHO IS TALLER, Prince William or his baby son Prince key insights George? Can you make a salad out of a polyester shirt? ˽ To achieve human-level performance in domains such as natural language If you stick a pin into a carrot, does it make a hole processing, vision, and robotics, basic knowledge of the commonsense world— in the carrot or in the pin? These types of questions time, space, physical interactions, people, may seem silly, but many intelligent tasks, such as and so on—will be necessary. understanding texts, computer vision, planning, and ˽ Although a few forms of commonsense reasoning, such as taxonomic reasoning scientific reasoning require the same kinds of real- and temporal reasoning are well world knowledge and reasoning abilities. For instance, understood, progress has been slow. ˽ Extant techniques for implementing if you see a six-foot-tall person holding a two-foot-tall commonsense include logical analysis, handcrafting large knowledge bases, person in his arms, and you are told they are father Web mining, and crowdsourcing. Each of these is valuable, but none by itself is a and son, you do not have to ask which is which. If you full solution. need to make a salad for dinner and are out of lettuce, ˽ Intelligent machines need not replicate you do not waste time considering improvising by human cognition directly, but a better understanding of human commonsense taking a shirt of the closet and cutting might be a good place to start. 92 COMMUNICATIONS OF THE ACM | SEPTEMBER 2015 | VOL. 58 | NO. 9 it up. If you read the text, “I stuck a pin In this article, we argue that common- that are comparatively easy to acquire, a in a carrot; when I pulled the pin out, it sense reasoning is important in many AI substantial fraction can only be resolved had a hole,” you need not consider the tasks, from text understanding to com- using a rich understanding of the world. possibility “it” refers to the pin. puter vision, planning and reasoning, A well-known example from Terry Wino- To take another example, con- and discuss four specific problems grad48 is the pair of sentences “The city sider what happens when we watch where substantial progress has been council refused the demonstrators a per- a movie, putting together infor- made. We consider why the problem mit because they feared violence,” vs.“… mation about the motivations of in its general form is so difficult and because they advocated violence.” To de- fictional characters we have met why progress has been so slow, and termine that “they” in the first sentence only moments before. Anyone survey various techniques that have refers to the council if the verb is “feared,” who has seen the unforgettable been attempted. but refers to the demonstrators if the horse’s head scene in The Godfather verb is “advocated” demands knowledge immediately realizes what is going on. Commonsense in Intelligent Tasks about the characteristic relations of city It is not just it is unusual to see a sev- The importance of real-world knowl- councils and demonstrators to violence; ered horse head, it is clear Tom Hagen edge for natural language processing, no purely linguistic clue suffices.a is sending Jack Woltz a message—if and in particular for disambiguation of I can decapitate your horse, I can de- all kinds, was discussed as early as 1960, 3 a Such pairs of sentences are known as “Wino- capitate you; cooperate, or else. For by Bar-Hillel, in the context of machine grad schemas” after this example; a collection now, such inferences lie far beyond translation. Although some ambigui- of many such examples can be found at http:// 32 ILLUSTRATION BY PETER CROWTHER ASSOCIATES BY ILLUSTRATION anything in artificial intelligence. ties can be resolved using simple rules cs.nyu.edu/faculty/davise/papers/WS.html. SEPTEMBER 2015 | VOL. 58 | NO. 9 | COMMUNICATIONS OF THE ACM 93 review articles Machine translation likewise often tasks can be carried out purely in terms matches the partial view of the chair involves problems of ambiguity that of manipulating individual words or on the near side of the table. can only be resolved by achieving an short phrases, without attempting any The viewer infers the existence of actual understanding of the text—and deeper understanding; commonsense objects that are not in the image at all. bringing real-world knowledge to bear. is evaded, in order to focus on short- There is a table under the yellow table- Google Translate often does a fine job term results, but it is difficult to see cloth. The scissors and other items of resolving ambiguities by using nearby how human-level understanding can hanging on the board in the back are words; for instance, in translating the be achieved without greater attention presumably supported by pegs or two sentences “The electrician is work- to commonsense. hooks. There is presumably also a hot ing” and “The telephone is working” into Watson, the “Jeopardy”-playing water knob for the faucet occluded by German, it correctly translates “work- program, is an exception to the above the dish rack. The viewer also infers how ing” as meaning “laboring,” in the first rule only to a small degree. As de- the objects can be used (sometimes sentence and as meaning “functioning scribed in Kalyanpur,27 commonsense called their “affordances”); for example, correctly” in the second, because in the knowledge and reasoning, particular- the cabinets and shelves can be opened corpus of texts Google has seen, the Ger- ly taxonomic reasoning, geographic by pulling on the handles. (Cabinets, man words for “electrician” and “labor- reasoning, and temporal reasoning, which rotate on joints, have the handle ing” are often found close together, as played some role in Watson’s opera- on one side; shelves, which pull out are the German words for “telephone” tions but only a quite limited one, and straight, have the handle in the center.) and “function correctly.”b However, if they made only a small contribution to Movies would prove even more dif- you give it the sentences “The electrician Watson’s success. The key techniques ficult; few AI programs have even tried. who came to fix the telephone is work- in Watson are mostly of the same fla- The Godfather scene mentioned earlier ing,” and “The telephone on the desk is vor as those used in programs like is one example, but almost any movie working,” interspersing several words Web search engines: there is a large contains dozens or hundreds of mo- between the critical element (for exam- collection of extremely sophisticated ments that cannot be understood ple, between electrician and working), and highly tuned rules for match- simply by matching still images to the translations of the longer sentences ing words and phrases in the ques- memorized templates. Understand- say the electrician is functioning prop- tion with snippets of Web documents ing a movie requires a viewer to make erly and the telephone is laboring (Table such as Wikipedia; for reformulating numerous inferences about the inten- 1). A statistical proxy for commonsense the snippets as an answer in proper tions of characters, the nature of physi- that worked in the simple case fails in form; and for evaluating the quality of cal objects, and so forth. In the current the more complex case. proposed possible answers. There is state of the art, it is not feasible even to Almost without exception, current no evidence that Watson is anything attempt to build a program that will be computer programs to carry out lan- like a general-purpose solution to the able to do this reasoning; the most that guage tasks succeed to the extent the commonsense problem. can be done is to track characters and Computer vision. Similar issues identify basic actions like standing up, arise in computer vision. Consider the sitting down, and opening a door.4 b Google Translate is a moving target; this par- ticular example was carried out on 6/9/2015, photograph of Julia Child’s kitchen Robotic manipulation. The need but translations of individual sentences (Figure 1): Many of the objects that for commonsense reasoning in au- change rapidly—not always for the better on are small or partially seen, such as the tonomous robots working in an un- individual sentences. Indeed, the same query metal bowls in the shelf on the left, controlled environment is self-evident, given minutes apart can give different results. most conspicuously in the need to have Changing the target language, or making the cold water knob for the faucet, the seemingly inconsequential changes to the round metal knobs on the cabinets, the robot react to unanticipated events sentence, can also change how a given ambi- the dishwasher, and the chairs at the appropriately. If a guest asks a waiter- guity is resolved, for no discernible reason. table seen from the side, are only rec- robot for a glass of wine at a party, and Our broader point here is not to dissect Google ognizable in context; the isolated im- the robot sees the glass he is picked up Translate per se, but to note it is unrealistic to expect fully reliable disambiguation in the ab- age would be difficult to identify. The is cracked, or has a dead cockroach at sence of a deep understanding of the text and top of the chair on the far side of the the bottom, the robot should not simply relevant domain knowledge.