
feature

Talking to Computers in Natural Language

Natural language understanding is as old as computing itself, but recent advances in machine learning and the rising demand for natural-language interfaces make it a promising time to once again tackle the long-standing challenge.

By Percy Liang
DOI: 10.1145/2659831

As you read this sentence, the words on the page are somehow absorbed into your mind and transformed into concepts, which then enter into a rich network of previously-acquired concepts. This process of language understanding has so far been the sole privilege of humans. But the universality of computation, as formalized by Alan Turing in the 1930s—which states that any computation could be done on a Turing machine—offers a tantalizing possibility that a computer could understand language as well. Later, Turing went on in his seminal 1950 article, “Computing Machinery and Intelligence,” to propose the now-famous Turing test—a bold

and speculative method to evaluate whether a computer actually understands language (or more broadly, is “intelligent”). While this test has led to the development of amusing chatbots that attempt to fool judges by engaging in light-hearted banter, the grand challenge of developing serious programs that can truly understand language in useful ways remains wide open. This article provides a brief glimpse into the history of language understanding systems, the challenges associated with understanding language, and how semantic parsing is emerging as a major character in the story.

THE EARLY YEARS, RULE-BASED SYSTEMS
The first natural language understanding systems emerged in the early 1960s in Cambridge, MA, a hotbed for artificial intelligence at the time. Daniel Bobrow built a system for his Ph.D. thesis at MIT to solve algebra problems found in high-school algebra books, for example: “If the number of customers Tom gets is twice the square of 20% of the number of advertisements he runs, and the number of advertisements is 45, then what is the number of customers Tom gets?” [1]. Another landmark was the LUNAR system, developed by Bill Woods in the early 1970s at BBN [2]. LUNAR provided a natural-language interface into a database about moon rocks that had been brought back on the recent Apollo 11 mission. Scientists could ask LUNAR to: “list all the rocks that contain chromite and ulvospinel.” Around the same time, Terry Winograd, then a Ph.D. student at MIT, developed another system called SHRDLU [3] that lived in a toy blocks world (see Figure 1). SHRDLU could both answer questions and execute actions, for example: “Find a block that is taller than the one you are holding and put it into the box.” In this case, SHRDLU would first have to understand that the blue block is the referent and then perform the action by moving the small green block out of the way and then lifting the blue block into the brown box. Interpretation and execution are performed jointly, and the key idea was that true language understanding is only sensible when it connects with the world.

For their time, these systems were significant achievements. They were able to handle fairly complex linguistic phenomena and integrate syntax, semantics, and reasoning in an end-to-end application. What is even more impressive is that these systems ran on a modicum of resources by today’s standards. For example, LUNAR was written in LISP for the DEC PDP-10, and all the code and data fit in only 80 KB of memory. In contrast, today, just starting a Python interpreter alone eats up 5 MB, which is 60 times more.

Over the following 20 years, these systems were extended, but soon it became increasingly difficult to make progress. Systems built for rocks and blocks did not automatically generalize to other domains, so adaptation was very burdensome. Also, to handle the never-ending intricacies of natural language, the complexity of these systems spiraled out of control.

18 XRDS • FALL 2014 • VOL.21 • NO.1

READING BETWEEN THE LINES, THE CHALLENGES OF LANGUAGE
Why is it so difficult for a computer to understand natural language? To answer this, it is helpful to contrast natural languages such as English with programming languages such as Python. Python is unambiguous. The expression lambda x: x.split(" ")[0:3] has exactly one denotation (meaning) as dictated by the Python language specification. English, on the other hand, can be ambiguous and vague. Consider the following pair of sentences from Yehoshua Bar-Hillel: “The pen is in the box” and “The box is in the pen.” [4] In the first sentence, “pen” is most likely a writing instrument; in the second, it is most likely an enclosure for animals. Another example from Winograd: “The city councilmen refused the women a permit because they feared violence” and “The ... because they advocated revolution” [3]. Who does “they” refer to in each case? In the first sentence, “the city councilmen” is the likely interpretation, whereas in the latter, it is “the women.”

In both of these examples, it is clear the words alone do not fully specify the meaning. They are only a few impressionistic brushstrokes, leaving the rest to be filled in by the reader. Humans perform this completion based on knowledge about the world that we cultivate throughout our lives. Computers lack this knowledge, and therefore these inferential leaps are currently extremely difficult. This is why systems such as SHRDLU were confined to a microcosm, where such inferences are possible, and why programming languages live solely in the computer world.

Despite differences between English and Python, there are some important similarities. The first is “compositionality”—an idea often

attributed to German logician Gottlob Frege—that the meaning of the whole is derived from the meaning of the parts. Just as a Python interpreter computes (4 − 2) + 3 by understanding numbers, operators, and a few combination rules, humans understand “red house” by understanding the constituent words. This compositionality is what allows us to communicate a dazzling array of different meanings given just a relatively small vocabulary, or as German philosopher Wilhelm von Humboldt put it, “make infinite use of finite means.”

The ambiguous nature of natural language might seem like a flaw, but in fact, it is exactly this flexibility that makes natural language so powerful. Think of language as a (cooperative) game between a speaker and a listener. Game play proceeds as follows: 1. the speaker thinks of a concept, 2. she chooses an utterance to convey that concept, and 3. the listener interprets the utterance. Both players win if the listener’s interpretation matches the speaker’s intention. To play this game well, the speaker should thus choose the simplest utterance that conveys her intended concept—anything the listener can infer can be omitted. For example, “cities that are in United States” can be shortened to “U.S. cities.” How can a computer fill in these gaps, which depend on the breadth of human experience involving knowledge of the world and social interactions? Projects such as Cyc attempted to manually write down this world knowledge (e.g., every tree is a plant), but there is a lot of world out there, and it’s messy too. Cyc fell prey to the same problems that plagued all purely rule-based systems.

THE STATISTICAL REVOLUTION
In the early 1990s, a revolution occurred. Up until then, natural language processing (NLP) research had been rule-based, where one directly writes a program to perform a task. But with the increased availability of more data and more computing power, NLP went statistical. The statistical paradigm requires a different mode of thinking: one (1) first collects examples of the desired input-output behavior of the program, (2) writes a partial program with unknown parameters, and (3) uses a machine learning algorithm to automatically tune these parameters based on the examples.

Statistical techniques have a long history. As far back as 1795, Carl Friedrich Gauss developed the least squares method for fitting a line to a set of points determined from measurement data. An early example within artificial intelligence was Arthur Samuel’s 1959 checkers program that learned to play from its own moves and game outcomes. Today, machine learning plays a vital role in applications spanning spam filtering, speech recognition, advertisement placement, robotics, medical diagnosis, etc.

Machine learning also drives many NLP tasks: part-of-speech tagging (e.g., identifying “London” as a proper noun), named-entity recognition (e.g., identifying “London” as a location), syntactic parsing (e.g., identifying “London” as the direct object of a sentence), and machine translation (e.g., converting “London” to “Londres” in French). One task closely tied to language understanding is question answering (e.g., responding to “What is the largest city in England?” with “London”). A question-answering system uses the question to retrieve relevant Web pages using a search engine, extracts candidate answers from those pages, and then scores and ranks the answers.

A shining achievement of question answering was IBM’s Watson [5], a computer built for the quiz show “Jeopardy!” David Ferrucci’s team of 20 researchers worked tirelessly for four years, and in 2011, Watson took on former “Jeopardy!” champions Brad Rutter and Ken Jennings in a widely publicized match. Watson was victorious. IBM had pulled off a similarly impressive feat in 1997 with Deep Blue, which defeated world chess champion Garry Kasparov. Somehow, “Jeopardy!” hit closer to home, as it dealt with language, something uniquely human, in contrast to the calculated thinking involved in playing chess. It was a soft blow, though: The stylized trivia questions on “Jeopardy!” are more about memorization and pattern-matching than reasoning; human-level natural language understanding remains a distant goal.

Figure 1. SHRDLU was a natural language understanding system that allowed users to interact in a blocks world environment.
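Stepping back to the earlier contrast between English and Python: the expression from the text can be run directly, and its denotation is fixed entirely by the language specification. A minimal demonstration (the sample sentence is my own):

```python
# The expression from the text: a function mapping a string to the
# list of its first three space-separated tokens.
first_three = lambda x: x.split(" ")[0:3]

# Unlike an English phrase, this has exactly one meaning: the same
# input yields the same output on any conforming Python interpreter.
print(first_three("the pen is in the box"))  # ['the', 'pen', 'is']
```

Ambiguity like Bar-Hillel’s “pen” sentences has no analogue here; the interpreter never has to guess.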

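The three-step statistical recipe can be made concrete with the very method Gauss introduced: collect input-output examples, write a partial program y = a*x + b with unknown parameters a and b, and tune the parameters by least squares. A minimal sketch with invented data points:

```python
# Step 1: collect examples of the desired input-output behavior.
# (These data points are invented for illustration.)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]   # roughly y = 2x + 1, plus noise

# Step 2: write a partial program with unknown parameters a and b.
def predict(a, b, x):
    return a * x + b

# Step 3: tune the parameters from the examples; for a line under
# squared error, the optimum has a closed form.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - a * mean_x

print(round(a, 2), round(b, 2))   # 1.94 1.09 (close to the true 2 and 1)
```

Swap the closed-form step for gradient descent on a neural network and the same three-step structure describes much of modern statistical NLP.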
SYNTHESIS
We have climbed to an interesting vantage point over the last 50 years. We have seen early rule-based systems such as LUNAR and SHRDLU perform relatively deep analyses of natural language, but only in narrow domains. We have also seen the blossoming of statistical techniques in a variety of applications,

including machine translation, which operates more broadly and robustly by absorbing huge amounts of data but has limited reasoning capabilities. For the geography question above, a system might answer “London” not because it understands what “largest” means, but because a document contains the text: “London is the capital of England, as well as its largest city, ...”

Fortunately, the rule-based and statistical views are complementary, although a large gulf has historically separated them [6]. Ideally, we would fuse the logically sophisticated representations of rule-based systems with the robustness and automation of statistical techniques. The area of (statistical) semantic parsing provides an initial step toward bridging this gulf by applying machine learning to the problem of parsing sentences into logical forms. Instead of treating “largest city in England” as merely words, semantic parsing, like rule-based methods, uses a logical form: argmax_{x : City(x) ∧ LocatedIn(x, England)} Population(x), not unlike a Python program. Moreover, like statistical methods, the mapping from sentences to logical forms is learned from data, not manually hand-coded.

Ray Mooney [7] and Luke Zettlemoyer [8] were among the first to develop statistical semantic parsers. However, their work was limited by data, which in their case were pairs of questions annotated with their logical forms. Creating such data was burdensome, so the practical result was not much better than building a rule-based system.

In the last five years, there has been another important technical development: new machine learning techniques whose input data consists of answers to questions rather than logical forms (e.g., “London” for “What is the largest city in England?”). The key advantage is that answers, unlike logical forms, do not require specialized expertise to produce and are thus much easier to obtain. Using crowdsourcing, much larger question-answering datasets are now emerging [9].

Figure 2 illustrates a simplified version of how a system would conceivably learn from two question-answer pairs. For each question, the system generates a set of possible logical forms and executes each one to yield an answer. The logical forms that do not yield the correct answer are discarded. Based on the remaining logical forms, the system would find that “breathe his last” occurs more consistently with PlaceOfDeath than the alternatives and choose this mapping. This is the core intuition. In an actual system, thousands of question-answer pairs are used, each generating hundreds of potential logical forms that are more complex. The system also maintains probability distributions over logical forms, reflecting ambiguity in language and uncertainty due to noise in the data. Semantic parsing thus draws strength from both machine learning and logic, two powerful but disparate intellectual traditions.

Figure 2. A sketch of how a system can infer logical forms from question-answer pairs.

Where did Mozart breathe his last?
  PlaceOfBirth(Mozart) ⇒ Salzburg
  PlaceOfDeath(Mozart) ⇒ Vienna
  PlaceOfMarriage(Mozart) ⇒ Vienna
  Answer: Vienna

Where did Hogarth breathe his last?
  PlaceOfBirth(Hogarth) ⇒ London
  PlaceOfDeath(Hogarth) ⇒ London
  PlaceOfMarriage(Hogarth) ⇒ Paddington
  Answer: London

THE FUTURE OF NATURAL LANGUAGE UNDERSTANDING
On the technical front, new semantic parsing techniques are being rapidly developed. But there is another important enabling factor. In 2011, Apple launched Siri, a virtual personal assistant that could understand basic user requests. Google and Microsoft soon followed suit. Home entertainment systems and cars are now being equipped with natural-language interfaces. All these applications require precise language understanding, and this demand will undoubtedly stress current semantic parsers, causing more innovation. At the same time, these applications will also generate large amounts of data from which semantic parsers can learn, thus covering the Achilles heel of statistical methods—having adequate data.

Language is an amazing vehicle for human expression, capable of conveying everything from intense emotions to intricate scientific arguments. Fully answering Turing’s original question of whether computers can match this capability remains, at least for now, more of a philosophical enterprise than an empirical one. But natural language understanding is not solely a scientifically interesting endeavor but also one loaded with practical potential. The confluence of demand, robust machine learning techniques, and expressive logical representations makes the present time a particularly exciting one for natural language understanding.

References

[1] Bobrow, D.G. Natural Language Input for a Computer Problem Solving System. AI Technical Reports (1964–2004). MIT, 1964. http://hdl.handle.net/1721.1/6903

[2] Woods, W.A., Kaplan, R.M., and Webber, B.N. The Lunar Sciences Natural Language Information System: Final Report. Technical report, BBN Report 2378. Bolt Beranek and Newman Inc., 1972.

[3] Winograd, T. Understanding Natural Language. Academic Press, New York, 1972.

[4] Bar-Hillel, Y. Language and Information: Selected Essays on Their Theory and Application. Addison-Wesley/The Jerusalem Academic Press, Jerusalem, 1964.

[5] Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A.A., Lally, A., Murdock, J.W., Nyberg, E., Prager, J., Schlaefer, N., and Welty, C. Building Watson: An overview of the DeepQA project. AI Magazine 31, 3 (2010), 59–79.

[6] Liang, P. and Potts, C. Bringing Machine Learning and Compositional Semantics Together. Annual Review of Linguistics (to appear).

[7] Zelle, J.M. and Mooney, R.J. Learning to Parse Database Queries Using Inductive Logic Programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence - Vol. 2 (Portland, Aug.). AAAI Press, 1996, 1050–1055.

[8] Zettlemoyer, L.S. and Collins, M. Learning to Map Sentences to Logical Form: Structured classification with probabilistic categorial grammars. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (Edinburgh, July). AUAI Press, 2005, 658–666.

[9] Berant, J., Chou, A., Frostig, R., and Liang, P. Semantic Parsing on Freebase from Question-Answer Pairs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (Seattle, Oct.). Association for Computational Linguistics (ACL), Stroudsburg, PA, 2013.

Biography

Percy Liang is an assistant professor of computer science at Stanford University. He received his B.S. from MIT and his Ph.D. from the University of California, Berkeley, and was a post-doc at Google.

Copyright held by Owner(s)/Author(s). Publication rights licensed to ACM. $15.00
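The generate-execute-filter loop that Figure 2 sketches can be written out in a few lines of Python. The facts and predicate names mirror the figure; the function names and data layout are my own scaffolding, a toy illustration rather than code from any actual semantic parser:

```python
# Toy knowledge base mirroring Figure 2: predicate -> entity -> value.
KB = {
    "PlaceOfBirth":    {"Mozart": "Salzburg", "Hogarth": "London"},
    "PlaceOfDeath":    {"Mozart": "Vienna",   "Hogarth": "London"},
    "PlaceOfMarriage": {"Mozart": "Vienna",   "Hogarth": "Paddington"},
}

def execute(pred, entity):
    """Execute the candidate logical form pred(entity) against the KB."""
    return KB[pred][entity]

# Question-answer pairs for questions of the form
# "Where did X breathe his last?"
examples = [("Mozart", "Vienna"), ("Hogarth", "London")]

# Generate one candidate logical form per predicate, execute each,
# and discard any that fails to yield the correct answer on every pair.
# PlaceOfMarriage survives the Mozart pair (Vienna) but is ruled out
# by the Hogarth pair (Paddington, not London).
consistent = [pred for pred in KB
              if all(execute(pred, e) == ans for e, ans in examples)]

print(consistent)   # ['PlaceOfDeath']
```

A real system replaces the exhaustive predicate loop with a learned probability distribution over far richer logical forms, but the filtering intuition is the same.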
