Talking to Computers in Natural Language
Natural language understanding is as old as computing itself, but recent advances in machine learning and the rising demand for natural-language interfaces make it a promising time to once again tackle the long-standing challenge.

By Percy Liang
DOI: 10.1145/2659831

As you read this sentence, the words on the page are somehow absorbed into your brain and transformed into concepts, which then enter into a rich network of previously acquired concepts. This process of language understanding has so far been the sole privilege of humans. But the universality of computation, as formalized by Alan Turing in the 1930s—which states that any computation could be done on a Turing machine—offers a tantalizing possibility that a computer could understand language as well. Later, Turing went on in his seminal 1950 article, “Computing Machinery and Intelligence,” to propose the now-famous Turing test—a bold and speculative method to evaluate whether a computer actually understands language (or more broadly, is “intelligent”). While this test has led to the development of amusing chatbots that attempt to fool human judges by engaging in light-hearted banter, the grand challenge of developing serious programs that can truly understand language in useful ways remains wide open. This article provides a brief glimpse into the history of language understanding systems, the challenges associated with understanding language, and how machine learning is emerging as a major character in the story.

THE EARLY YEARS, RULE-BASED SYSTEMS
The first natural language understanding systems emerged in the early 1960s in Cambridge, MA, a hotbed for artificial intelligence at the time. Daniel Bobrow built a system for his Ph.D. thesis at MIT to solve algebra word problems found in high-school algebra books, for example: “If the number of customers Tom gets is twice the square of 20% of the number of advertisements he runs, and the number of advertisements is 45, then what is the number of customers Tom gets?” [1]. Another landmark was the LUNAR system, developed by Bill Woods in the early 1970s at BBN [2]. LUNAR provided a natural-language interface into a database about moon rocks that had been brought back on the recent Apollo 11 mission. Scientists could ask LUNAR to: “list all the rocks that contain chromite and ulvospinel.” Around the same time, Terry Winograd, then a Ph.D. student at MIT, developed another system called SHRDLU [3] that lived in a toy blocks world (see Figure 1). SHRDLU could both answer questions and execute actions, for example: “Find a block that is taller than the one you are holding and put it into the box.” In this case, SHRDLU would first have to understand that the blue block is the referent and then perform the action by moving the small green block out of the way and then lifting the blue block into the brown box. Interpretation and execution are performed jointly, and the key idea was that true language understanding is only sensible when it connects with the world.

For their time, these systems were significant achievements. They were able to handle fairly complex linguistic phenomena and integrate syntax, semantics, and reasoning in an end-to-end application. What is even more impressive is that these systems ran on a modicum of resources by today’s standards. For example, LUNAR was written in LISP for the DEC PDP-10, and all the code and data fit in only 80 KB of memory. In contrast, today, just starting a Python interpreter alone eats up 5 MB, which is 60 times more.

Over the following 20 years, these systems were extended, but soon it became increasingly difficult to make progress. Systems built for rocks and blocks did not automatically generalize to other domains, so adaptation was very burdensome. Also, to handle the never-ending intricacies of natural language, the complexity of these systems spiraled out of control.

READING BETWEEN THE LINES, THE CHALLENGES OF LANGUAGE
Why is it so difficult for a computer to understand natural language? To answer this, it is helpful to contrast natural languages such as English with programming languages such as Python. Python is unambiguous. The expression lambda x: x.split(" ")[0:3] has exactly one denotation (meaning) as dictated by the Python language specification.
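As a concrete illustration (a minimal sketch; the helper name first_three is our own), the expression can be applied to a sentence that, as we will see next, is ambiguous to a human reader but not to Python:

    # Python's semantics assign this expression exactly one meaning.
    first_three = lambda x: x.split(" ")[0:3]

    # The same input always yields the same output, regardless of context.
    print(first_three("The pen is in the box"))  # ['The', 'pen', 'is']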
English, on the other hand, can be ambiguous and vague. Consider the following pair of sentences from Yehoshua Bar-Hillel: “The pen is in the box” and “The box is in the pen” [4]. In the first sentence, “pen” is most likely a writing instrument; in the second, it is most likely an enclosure for animals. Another example from Winograd: “The city councilmen refused the women a permit because they feared violence” and “The ... because they advocated revolution” [3]. Who does “they” refer to in each case? In the first sentence, “the city councilmen” is the likely interpretation, whereas in the latter, it is “the women.”

In both of these examples, it is clear that the words alone do not fully specify the meaning. They are only a few impressionistic brushstrokes, leaving the rest to be filled in by the reader. Humans perform this completion based on knowledge about the world that we cultivate throughout our lives. Computers lack this knowledge, and therefore these inferential leaps are currently extremely difficult. This is why systems such as SHRDLU were confined to a microcosm, where such inferences are possible, and why programming languages live solely in the computer world.

Despite the differences between English and Python, there are some important similarities. The first is “compositionality”—an idea often attributed to German logician Gottlob Frege—that the meaning of the whole is derived from the meaning of the parts. Just as a Python interpreter computes (4 − 2) + 3 by understanding numbers, operators, and a few combination rules, humans understand “red house” by understanding the constituent words. This compositionality is what allows us to communicate a dazzling array of different meanings given just a relatively small vocabulary, or as German philosopher Wilhelm von Humboldt put it, “make infinite use of finite means.”

Language is an amazing vehicle for human expression, capable of conveying everything from intense emotions to intricate scientific arguments. The ambiguous nature of natural language might seem like a flaw, but in fact, it is exactly this ambiguity that makes natural language so powerful. Think of language as a (cooperative) game between a speaker and a listener. Game play proceeds as follows: 1. the speaker thinks of a concept, 2. she chooses an utterance to convey that concept, and 3. the listener interprets the utterance. Both players win if the listener’s interpretation matches the speaker’s intention. To play this game well, the speaker should thus choose the simplest utterance that conveys her intended concept—anything the listener can infer can be omitted. For example, “cities that are in the United States” can be shortened to “U.S. cities.” How can a computer fill in these gaps, which depend on the breadth of human experience involving perception of the world and social interactions? Projects such as Cyc attempted to manually write down this world knowledge (e.g., every tree is a plant), but there is a lot of world out there, and the effort ultimately fell prey to the same problems that plagued all purely rule-based systems.

THE STATISTICAL REVOLUTION
In the early 1990s, a revolution occurred. Up until then, natural language processing (NLP) research had been rule-based, where one directly writes a program to perform a task. But with the increased availability of more data and more computing power, NLP went statistical. The statistical paradigm requires a different mode of thinking: one (1) first collects examples of the desired input-output behavior of the program, (2) writes a partial program with unknown parameters, and (3) uses a machine learning algorithm to automatically tune these parameters based on the examples.
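To make the three steps concrete, here is a minimal sketch in Python (the data points are invented for illustration): the examples are step (1), a line with unknown slope and intercept is the partial program of step (2), and least squares fitting, whose long history the next paragraph picks up, plays the role of the learning algorithm in step (3):

    import numpy as np

    # (1) Collect examples of the desired input-output behavior (toy data).
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

    # (2) Write a partial program with unknown parameters a and b:
    #     predict(x) = a * x + b.
    # (3) Use a learning algorithm (here, least squares) to tune a and b.
    a, b = np.polyfit(x, y, deg=1)
    predict = lambda x_new: a * x_new + b

    print(a, b)          # roughly 1.96 and 1.10
    print(predict(5.0))  # roughly 10.9, a prediction for an unseen input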
Statistical techniques have a long history. As far back as 1795, Carl Friedrich Gauss developed the least squares method for fitting a line to a set of points determined from measurement data. Another landmark was Arthur Samuel’s 1959 checkers program that learned to play from its own moves and game outcomes. Today, machine learning plays a vital role in applications spanning spam filtering, speech recognition, advertisement placement, robotics, medical diagnosis, etc.

Machine learning also drives many NLP tasks: part-of-speech tagging (e.g., identifying “London” as a proper noun), named-entity recognition (e.g., identifying “London” as a location), syntactic parsing (e.g., identifying “London” as the direct object of a sentence), and machine translation (e.g., converting “London” to “Londres” in French). One task closely tied to language understanding is question answering (e.g., responding to “What is the largest city in England?” with “London”). A question-answering system uses the question to retrieve relevant Web pages using a search engine, extracts candidate answers from those pages, and then scores and ranks the answers.

A shining achievement of question answering was IBM’s Watson [5], a computer built for the quiz show “Jeopardy!” David Ferrucci’s team of 20 researchers worked tirelessly for four years, and in 2011, Watson took on former “Jeopardy!” champions Brad Rutter and Ken Jennings in a widely publicized match. Watson was victorious. IBM had pulled off a similarly impressive feat in 1997 with Deep Blue, which defeated world chess champion Garry Kasparov. Somehow, “Jeopardy!” hit closer to home, as it dealt with language, something uniquely human.
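The retrieve-extract-rank pipeline for question answering described above can also be sketched in miniature. Everything below is a toy stand-in: the three hard-coded documents replace a search engine over the Web, and the capitalization and counting heuristics replace real candidate extraction and scoring:

    import re
    from collections import Counter

    # Toy stand-in for Web pages returned by a search engine.
    DOCUMENTS = [
        "London is the largest city in England.",
        "Birmingham and London are major English cities.",
        "England's capital and largest city is London.",
    ]

    def answer(question):
        # Retrieve: keep documents that share a keyword with the question.
        stopwords = {"what", "is", "the", "in", "a", "of"}
        keywords = set(re.findall(r"\w+", question.lower())) - stopwords
        retrieved = [d for d in DOCUMENTS
                     if keywords & set(re.findall(r"\w+", d.lower()))]
        # Extract candidates: capitalized words not already in the question.
        candidates = Counter()
        for doc in retrieved:
            for word in re.findall(r"\w+", doc):
                if word[0].isupper() and word.lower() not in keywords | stopwords:
                    candidates[word] += 1
        # Score and rank: the most frequent candidate wins.
        return candidates.most_common(1)[0][0] if candidates else None

    print(answer("What is the largest city in England?"))  # London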