
Deep Learning in Lexical Analysis and Parsing

Wanxiang Che
Harbin Institute of Technology
[email protected]

Yue Zhang
Singapore University of Technology and Design
yue [email protected]

Abstract

Lexical analysis and parsing tasks, which model deeper properties of words and their relationships to each other, typically involve word segmentation, part-of-speech tagging and parsing. A typical characteristic of such tasks is that the outputs are structured. All of them fall into three types of structured prediction problems: sequence segmentation, sequence labeling and parsing.

In this tutorial, we will introduce two state-of-the-art families of methods for solving these structured prediction problems: graph-based and transition-based methods. Traditional graph-based and transition-based methods depend on "feature engineering", which costs a great deal of human labor and may miss many useful features. Deep learning can overcome this feature engineering problem. We will further introduce the deep learning models that have been successfully used for both graph-based and transition-based structured prediction.

1 Tutorial Overview

Lexical analysis and parsing tasks, which model deeper properties of words and their relationships to each other, typically involve word segmentation, part-of-speech tagging and parsing. A typical characteristic of such tasks is that the outputs are structured. All of them fall into three types of structured prediction problems: sequence segmentation, sequence labeling and parsing. Because of the pervasive problem of ambiguity, none of these problems is trivial to solve.

Two different families of methods have been used to solve these structured prediction problems: graph-based methods and transition-based methods. The former differentiate output structures directly, based on their characteristics, while the latter transform the construction of an output into a state transition process and differentiate sequences of transition actions.

Conditional random fields (CRFs) are typical graph-based methods; they aim to maximize the probability of the correct output structure. Graph-based methods can also be applied to dependency parsing, where the aim changes to maximizing the score of the correct output structure (both objectives are written out at the end of this section).

At the beginning, transition-based methods were applied to dependency parsing. Later, it was found that sequence segmentation and labeling can also be modeled as a sequence of state transitions (see the sketch at the end of this section).

Both graph-based and transition-based methods depend on "feature engineering": large sets of hand-crafted features and feature combinations must be designed separately for each task. This usually costs a great deal of human labor. More seriously, many useful features may be missed by human designers, and features extracted from the training data alone generalize poorly.

Neural networks, also known under the fashionable name deep learning, can overcome this feature engineering problem. In theory, they can use non-linear activation functions and multiple layers to find useful features automatically (the minimal forward-pass example at the end of this section illustrates this layered structure). Novel network structures, such as convolutional and recurrent networks, reduce the difficulty further.

These deep learning models have been successfully used for both graph-based and transition-based structured prediction. In this tutorial, we will review each line of work, contrasting it with traditional statistical methods and organizing the material in a consistent order.
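For concreteness, the two graph-based objectives mentioned above can be written out in standard notation (the notation here is ours, not taken from the tutorial). A linear-chain CRF defines the conditional probability of a label sequence $\mathbf{y}$ given an input $\mathbf{x}$, and arc-factored graph-based dependency parsing scores a tree as a sum of per-arc scores:

\[
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
\]

\[
\mathrm{score}(\mathbf{x}, \mathbf{y}) = \sum_{(h,\, m) \in \mathbf{y}} \mathrm{score}(h, m, \mathbf{x})
\]

Training a CRF maximizes $\log p(\mathbf{y} \mid \mathbf{x})$ for the gold label sequence; training an arc-factored parser drives the gold tree to score higher than all other valid trees.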
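The transition-based view can likewise be made concrete with a short sketch of the standard arc-standard system for dependency parsing. The code below is illustrative: the action classifier next_action is left abstract, and the names are ours rather than the tutorial's.

    # Arc-standard transition system (sketch). `next_action` is any
    # classifier that maps a parser state to one of the three actions.
    def parse(words, next_action):
        """Greedily build unlabeled (head, dependent) arcs for `words`."""
        stack = []                        # indices of partially processed words
        buffer = list(range(len(words)))  # indices of unread words
        arcs = []
        while buffer or len(stack) > 1:
            action = next_action(stack, buffer, arcs)
            if action == "SHIFT" and buffer:
                stack.append(buffer.pop(0))      # read the next input word
            elif action == "LEFT-ARC" and len(stack) >= 2:
                dep = stack.pop(-2)              # second-top becomes a dependent
                arcs.append((stack[-1], dep))    # ... of the stack top
            elif action == "RIGHT-ARC" and len(stack) >= 2:
                dep = stack.pop()                # top becomes a dependent
                arcs.append((stack[-1], dep))    # ... of the new stack top
            else:
                break                            # invalid action; stop the sketch
        return arcs

Every complete parse corresponds to exactly one action sequence, so differentiating output structures reduces to scoring action sequences, which is what the greedy and beam-search models in the outline below do.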
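Finally, the "multiple layers with non-linear activations" idea from the overview can be stated in a few lines of NumPy; shapes and names are illustrative, and real models add embeddings, training and task-specific layers on top.

    import numpy as np

    # Two-layer perceptron forward pass: each non-linear layer acts as an
    # automatically learned feature extractor over its input.
    def mlp_scores(x, W1, b1, W2, b2):
        h = np.tanh(W1 @ x + b1)   # hidden layer: learned non-linear features
        return W2 @ h + b2         # output layer: one score per label/action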

2 Outline

1. Typical Lexical Analysis and Parsing Tasks
   • Word segmentation
   • POS tagging
   • Parsing
   • Structured Prediction
     – Sequence Segmentation
     – Sequence Labeling
     – Parsing

2. Deep Learning Background
   • Multilayer Perceptron (MLP)
   • Back-propagation
   • Recurrent Neural Networks (RNNs)
   • Recursive Neural Networks
   • Convolutional Neural Networks (CNNs)

3. State-of-the-art Methods
   • Graph-based Methods
     – Conditional Random Fields
     – Graph-based Dependency Parsing
   • Transition-based Methods
     – Greedy Shift-Reduce Dependency Parsing
     – Greedy Sequence Labeling
     – Beam-search Training and Decoding

4. Neural Graph-based Methods
   • Neural Conditional Random Fields
   • Neural Graph-based Dependency Parsing

5. Neural Transition-based Methods
   • Neural Greedy Shift-Reduce Dependency Parsing
   • Neural Greedy Sequence Labeling
   • Globally Optimized Models

3 Instructors

Wanxiang Che is currently an associate professor in the School of Computer Science and Technology at Harbin Institute of Technology (HIT), and was a visiting associate professor at Stanford University (in the NLP group, 2012). His main research area is Natural Language Processing (NLP). He currently leads a National Natural Science Foundation of China project, a national 973 project and a number of other research projects. He has published more than 40 papers in high-level journals and conferences, and has published two textbooks. He and his team have achieved good results in a number of international evaluations, such as first place at CoNLL 2009 and fourth place at CoNLL 2017. He was an area co-chair of ACL 2016 and publication co-chair of ACL 2015 and EMNLP 2011. The Language Technology Platform (LTP), an open-source Chinese NLP system whose development he leads, has been licensed to more than 600 institutes and individuals, including Baidu and Tencent. He received the outstanding paper award honorable mention at AAAI 2013, the first prize of the technological progress award of Heilongjiang province in 2016, Google focused research awards in 2015 and 2016, the first prize of the Hanwang youth innovation award, and the first prize of the Qian Weichang Chinese information processing science and technology award in 2010.

Yue Zhang is currently an assistant professor at Singapore University of Technology and Design (SUTD). Before joining SUTD in July 2012, he worked as a postdoctoral research associate at the University of Cambridge, UK. He received his DPhil and MSc degrees from the University of Oxford, UK, and his BEng degree from Tsinghua University, China. His research interests include natural language processing, machine learning and artificial intelligence, and he has worked intensively on statistical parsing, text synthesis, machine translation and stock market analysis. He serves as a reviewer for top journals such as Computational Linguistics, Transactions of the Association for Computational Linguistics (standing review committee) and the Journal of Artificial Intelligence Research, and is an associate editor of ACM Transactions on Asian and Low-Resource Language Information Processing. He is also a PC member for conferences such as ACL, COLING, EMNLP, NAACL, EACL, AAAI and IJCAI. He was an area chair of COLING 2014, NAACL 2015, EMNLP 2015, ACL 2017 and EMNLP 2017, and is the TPC chair of IALP 2017.
