Advances in Natural Language Processing Julia Hirschberg and Christopher D

REVIEW

Advances in natural language processing

Julia Hirschberg1* and Christopher D. Manning2,3

Natural language processing employs computational techniques for the purpose of learning, understanding, and producing human language content. Early computational approaches to language research focused on automating the analysis of the linguistic structure of language and developing basic technologies such as machine translation, speech recognition, and speech synthesis. Today’s researchers refine and make use of such tools in real-world applications, creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance, and identifying sentiment and emotion toward products and services. We describe successes and challenges in this rapidly advancing area.

1Department of Computer Science, Columbia University, New York, NY 10027, USA. 2Department of Linguistics, Stanford University, Stanford, CA 94305-2150, USA. 3Department of Computer Science, Stanford University, Stanford, CA 94305-9020, USA.
*Corresponding author. E-mail: [email protected]

SCIENCE sciencemag.org 17 JULY 2015 • VOL 349 ISSUE 6245 261 ARTIFICIAL INTELLIGENCE

Over the past 20 years, computational linguistics has grown into both an exciting area of scientific research and a practical technology that is increasingly being incorporated into consumer products (for example, in applications such as Apple’s Siri and Skype Translator). Four key factors enabled these developments: (i) a vast increase in computing power, (ii) the availability of very large amounts of linguistic data, (iii) the development of highly successful machine learning (ML) methods, and (iv) a much richer understanding of the structure of human language and its deployment in social contexts. In this Review, we describe some current application areas of interest in language research. These efforts illustrate computational approaches to big data, based on current cutting-edge methodologies that combine statistical analysis and ML with knowledge of language.

Computational linguistics, also known as natural language processing (NLP), is the subfield of computer science concerned with using computational techniques to learn, understand, and produce human language content. Computational linguistic systems can have multiple purposes: The goal can be aiding human-human communication, such as in machine translation (MT); aiding human-machine communication, such as with conversational agents; or benefiting both humans and machines by analyzing and learning from the enormous quantity of human language content that is now available online.

During the first several decades of work in computational linguistics, scientists attempted to write down for computers the vocabularies and rules of human languages. This proved a difficult task, owing to the variability, ambiguity, and context-dependent interpretation of human languages. For instance, a star can be either an astronomical object or a person, and “star” can be a noun or a verb. In another example, two interpretations are possible for the headline “Teacher strikes idle kids,” depending on the noun, verb, and adjective assignments of the words in the sentence, as well as grammatical structure. Beginning in the 1980s, but more widely in the 1990s, NLP was transformed by researchers starting to build models over large quantities of empirical language data. Statistical or corpus (“body of words”)-based NLP was one of the first notable successes of the use of big data, long before the power of ML was more generally recognized or the term “big data” even introduced.

A central finding of this statistical approach to NLP has been that simple methods using words, part-of-speech (POS) sequences (such as whether a word is a noun, verb, or preposition), or simple templates can often achieve notable results when trained on large quantities of data. Many text and sentiment classifiers are still based solely on the different sets of words (“bag of words”) that documents contain, without regard to sentence and discourse structure or meaning. Achieving improvements over these simple baselines can be quite difficult. Nevertheless, the best-performing systems now use sophisticated ML approaches and a rich understanding of linguistic structure. High-performance tools that identify syntactic and semantic information as well as information about discourse context are now available. One example is Stanford CoreNLP (1), which provides a standard NLP preprocessing pipeline that includes POS tagging (with tags such as noun, verb, and preposition); identification of named entities, such as people, places, and organizations; parsing of sentences into their grammatical structures; and identifying co-references between noun phrase mentions (Fig. 1).

Historically, two developments enabled the initial transformation of NLP into a big data field. The first was the early availability to researchers of linguistic data in digital form, particularly through the Linguistic Data Consortium (LDC) (2), established in 1992. Today, large amounts of digital text can easily be downloaded from the Web. Available as linguistically annotated data are large speech and text corpora annotated with POS tags, syntactic parses, semantic labels, annotations of named entities (persons, places, organizations), dialogue acts (statement, question, request), emotions and positive or negative sentiment, and discourse structure (topic or rhetorical structure). Second, performance improvements in NLP were spurred on by shared task competitions. Originally, these competitions were largely funded and organized by the U.S. Department of Defense, but they were later organized by the research community itself, such as the CoNLL Shared Tasks (3). These tasks were a precursor of modern ML predictive modeling and analytics competitions, such as on Kaggle (4), in which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models.

A major limitation of NLP today is the fact that most NLP resources and systems are available only for high-resource languages (HRLs), such as English, French, Spanish, German, and Chinese. In contrast, many low-resource languages (LRLs), such as Bengali, Indonesian, Punjabi, Cebuano, and Swahili, spoken and written by millions of people, have no such resources or systems available. A future challenge for the language community is how to develop resources and tools for hundreds or thousands of languages, not just a few.

Machine translation

Proficiency in languages was traditionally a hallmark of a learned person. Although the social standing of this human skill has declined in the modern age of science and machines, translation between human languages remains crucially important, and MT is perhaps the most substantial way in which computers could aid human-human communication. Moreover, the ability of computers to translate between human languages remains a consummate test of machine intelligence: Correct translation requires not only the ability to analyze and generate sentences in human languages but also a humanlike understanding of world knowledge and context, despite the ambiguities of languages. For example, the French word “bordel” straightforwardly means “brothel”; but if someone says “My room is un bordel,” then a translating machine has to know enough to suspect that this person is probably not running a brothel in his or her room but rather is saying “My room is a complete mess.”

Machine translation was one of the first non-numeric applications of computers and was studied intensively starting in the late 1950s. However, the hand-built grammar-based systems of early decades achieved very limited success. The field was transformed in the early 1990s when researchers at IBM acquired a large quantity of English and French sentences that were translations of each other (known as parallel text), produced as the proceedings of the bilingual Canadian Parliament. These data allowed them to collect statistics of word translations and word sequences and to build a probabilistic model of MT (5).

Following a quiet period in the late 1990s, the new millennium brought the potent combination of ample online text, including considerable quantities of parallel text, much more abundant and inexpensive computing, and a new idea for building statistical phrase-based MT systems (6). Rather than translating word by word, the key advance is to notice that small word groups often have distinctive translations. The Japanese “mizu iro” is literally the sequence of two words (“water color”), but this is not the correct meaning (nor does it mean a type of painting); rather, it indicates a light, sky-blue color.

[...] of this form is much more practical with the massive parallel computation that is now economically available via graphics processing units. For translation, research has focused on a particular version of recurrent neural networks, with enhanced “long short-term memory” computational units that can better maintain contextual information [...]

[...] therapy for less-abled persons [e.g., Maja Mataric’s socially assistive robots (17)]. They also enable avatars to tutor people in interview or negotiation strategies or to help with health care decisions (18, 19).

The creation of SDSs, whether between humans or between humans and artificial agents, requires tools for automatic speech [...]
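
As an illustration of the “bag of words” representation mentioned earlier in this Review, the following minimal Python sketch may be helpful. It is not taken from the Review or from any particular system: the function name bag_of_words and the simple regular-expression tokenizer are assumptions made purely for illustration. The sketch shows that two texts with the same words in a different order receive identical representations, which is why classifiers built on this representation ignore sentence structure.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return word counts for a text, discarding word order.

    Hypothetical helper for illustration: tokenization here is just
    lowercasing plus a simple regular expression.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# The ambiguous headline from the Review and a reordered variant.
headline = bag_of_words("Teacher strikes idle kids")
reordered = bag_of_words("Idle kids strikes teacher")

# Word order differs, but the two bags of words are identical.
print(headline == reordered)
```

A classifier that sees only these counts cannot tell the two word orders apart, and it cannot resolve which reading of the headline is intended; that is the sense in which such methods work “without regard to sentence and discourse structure or meaning.”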
