Deep Learning for Feature Representation in Natural Language Processing

SCUOLA DI DOTTORATO UNIVERSITÀ DEGLI STUDI DI MILANO-BICOCCA Dipartimento di / Department of Informatica Sistemistica e Comunicazione Dottorato di Ricerca in / PhD program Computer Science Ciclo / Cycle XXX Deep Learning for Feature Representation in Natural Language Processing Cognome / Surname Nozza Nome / Name Debora Matricola / Registration number 717156 Tutore / Tutor: Prof. Giuseppe Vizzari Cotutore / Co-tutor: Dr. Elisabetta Fersini Supervisor: Prof. Enza Messina Coordinatore / Coordinator: Prof. Stefania Bandini ANNO ACCADEMICO / ACADEMIC YEAR 2016/2017 To my parents and grandparents Abstract The huge amount of textual user-generated content on the Web has incredibly grown in the last decade, creating new relevant opportunities for different real-world applications and domains. To overcome the difficulties of dealing with this large volume of unstructured data, the research field of Natural Language Processing has provided efficient solutions developing computational models able to understand and interpret human natural language without any (or almost any) human intervention. This field has gained in further computational efficiency and performance from the advent of the recent Machine Learning research lines concerned with Deep Learn- ing. In particular, this thesis focuses on a specific class of Deep Learning models devoted to learning high-level and meaningful representations of input data in unsupervised settings, by computing multiple non-linear transformations of increasing complexity and abstraction. In- deed, learning expressive representations from the data is a crucial step in Natural Language Processing, because it involves the transformation from discrete symbols (e.g. characters) to a machine-readable representation as real-valued vectors, which should encode semantic and syntactic meanings of the language units. The first research direction of this thesis is aimed at giving evidence that enhancing Natural Lan- guage Processing models with representations obtained by unsupervised Deep Learning models can significantly improve the computational abilities of making sense of large volume of user-generated text. In particular, this thesis addresses tasks that were considered crucial for understanding what the text is talking about, by extracting and disambiguating the named enti- ties (Named Entity Recognition and Linking), and which opinion the user is expressing, dealing also with irony (Sentiment Analysis and Irony Detection). For each task, this thesis proposes a novel Natural Language Processing model enhanced by the data representation obtained by Deep Learning. As second research direction, this thesis investigates the development of a novel Deep Learning model for learning a meaningful textual representation taking into account the relational struc- ture underlying user-generated content. The inferred representation comprises both textual and relational information. Once the data representation is obtained, it could be exploited by off-the- shelf Machine Learning algorithms in order to perform different Natural Language Processing tasks. As conclusion, the experimental investigations reveal that models able to incorporate high-level i features, obtained by Deep Learning, show significant performance and improved generalization abilities. Further improvements can be also achieved by models able to take into account the relational information in addition to the textual content. Publications Journal D. Nozza, E. Fersini, E. Messina (2018). “CAGE: Constrained deep Attributed Graph Embed- dings”. Information Sciences (Submitted). E. Fersini, P. Manchanda, E. Messina, D. Nozza, M. Palmonari (2018). “LearningToAdapt with Word Embeddings: Domain Adaptation of Named Entity Recognition Systems”. Information Processing and Management (Submitted). Conferences F. Bianchi, M. Palmonari and D. Nozza (2018). “Towards Encoding Time in Text-Based Entity Embeddings”. Proceedings of the 17th International Semantic Web Conference. (To appear) E. Fersini, P. Manchanda, E. Messina, D. Nozza, M. Palmonari (2018). “Adapting Named Entity Types to New Ontologies in a Microblogging Environment”. Proceedings of the 31st International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems. D. Nozza, F. Ristagno, M. Palmonari, E. Fersini, P. Manchanda, E. Messina (2017). “TWINE: A real-time system for TWeet analysis via INformation Extraction”. In Proceedings of the Software Demonstrations at the 15th Conference of the European Chapter of the Association for Computational Linguistics (pp. 25-28). Association for Computational Linguistics. iii Contents iv D. Nozza, E. Fersini, E. Messina (2017). “A Multi-View Sentiment Corpus”. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers (Vol. 1, pp. 273-280). Association for Computational Linguistics. P. Manchanda, E. Fersini, D. Nozza, E. Messina, M. Palmonari (2017). “Towards Adaptation of Named Entity Classification”. In Proceedings of the 32nd ACM Symposium on Applied Com- puting (pp. 155-157). ACM. F. M. Cecchini, E. Fersini, P. Manchanda, E. Messina, D. Nozza, M. Palmonari, C. Sas (2016). “UNIMIB@NEEL-IT: Named Entity Recognition and Linking of Italian Tweets”. In Proceed- ings of the 3rd Italian Conference on Computational Linguistics. CEUR-WS. D. Nozza, E. Fersini, E. Messina (2016). “Unsupervised Irony Detection: A Probabilistic Model with Word Embeddings”. In Proceedings of the 8th International Joint Conference on Knowl- edge Discovery, Knowledge Engineering and Knowledge Management (pp. 68-76). SciTePress. (Best Paper Award) D. Nozza, E. Fersini, E. Messina (2016). “Deep Learning and Ensemble Methods for Domain Adaptation”. In Proceedings of the 28th IEEE International Conference on Tools with Artificial Intelligence (pp. 184-189). IEEE. D. Nozza, D. Maccagnola, V. Guigue, E. Messina and P. Gallinari (2014). “A latent representation model for sentiment analysis in heterogeneous social networks”. In Proceedings of the 12th International Conference on Software Engineering and Formal Methods (pp. 201-213). Springer International Publishing. Contents Abstract i Publications iii Contents iv List of Figures viii List of Tablesx 1 Introduction1 1.1 Thesis contribution and organization.......................3 2 Natural Language Processing4 2.1 Challenges.....................................5 2.2 Language Feature Representation.........................8 2.2.1 Term Presence and Frequency......................8 2.2.2 Words and n-grams............................9 2.2.3 Complex Linguistic Descriptors.....................9 2.2.4 Distributional features.......................... 10 3 Deep Learning Background 12 3.1 Challenges motivating Deep Learning...................... 14 3.2 Artificial Neural Networks............................ 16 3.3 Neural Network Architectures.......................... 17 3.3.1 Single-Layer Feedforward Network................... 17 3.3.2 Multi-Layer Feedforward Network.................... 18 3.3.3 Recurrent Neural Network........................ 19 3.4 Activation Functions............................... 20 3.5 Learning method................................. 21 3.5.1 Objective functions............................ 22 3.5.1.1 Loss functions......................... 23 3.5.1.2 Regularization......................... 24 3.5.2 Training algorithm............................ 25 3.5.3 Optimization algorithms......................... 26 3.5.3.1 Gradient descent........................ 26 3.5.3.2 Stochastic gradient descent.................. 27 3.5.3.3 Stochastic gradient descent with Momentum......... 27 v Contents vi 3.5.3.4 Stochastic gradient descent with Nesterov Momentum.... 28 3.5.3.5 AdaGrad............................ 30 3.5.3.6 RMSProp........................... 31 3.5.3.7 Adam............................. 31 3.5.3.8 Adadelta............................ 33 4 Deep Learning Architectures for Textual Feature Representation 34 4.1 Neural Networks Language Model........................ 36 4.1.1 Neural Probabilistic Language Model.................. 38 4.1.2 Collobert and Weston........................... 41 4.1.3 Word2vec................................. 41 4.2 Auto-encoder................................... 43 4.2.1 Stacked Auto-encoder.......................... 45 4.2.2 Regularized Auto-encoder........................ 46 4.2.3 Sparse Auto-encoder........................... 46 4.2.3.1 k-sparse Auto-encoder..................... 47 4.2.4 Denoising Auto-encoder......................... 47 4.2.4.1 Marginalized Stacked Denoising Auto-encoder........ 48 4.2.5 k-Competitive Auto-encoder....................... 49 5 Deep Learning Representation for Making Sense of User-Generated Content 51 5.1 Named Entity Recognition and Classification.................. 54 5.1.1 Word Embeddings Representation for Learning to Adapt Entity Classi- fication.................................. 57 5.1.1.1 Adaptation Model: Learning to Adapt with Word Embeddings 59 5.1.1.2 Experimental Settings..................... 61 5.1.1.3 Experimental Results..................... 66 5.1.1.4 Related Works......................... 73 5.2 Named Entity Linking.............................. 74 5.2.1 Word Embeddings for Named Entity Linking.............. 75 5.2.1.1 Representation and Linking model.............. 77 5.2.1.2 Experimental Settings..................... 78 5.2.1.3 Experimental Results..................... 80 5.2.1.4 Related Works......................... 85 5.3 Sentiment Analysis...............................

Deep Learning for Feature Representation in Natural Language Processing

The Complete Guide to Social Media from the Social Media Guys

E-Learning 3.0 = E-Learning 2.0 + Web 3.0?

Social Tagging in the Life Sciences: Characterizing a New Metadata Resource for Bioinformatics Benjamin M Good*1, Josephttennis2 and Mark D Wilkinson1

The Social Semantic Web

Semantic Web Beyond Science Fiction

The Social Semantic Web John G

The Semantic Web: Lighter, Faster, Easier

Web Squared: Web 2.0 Five Years On

The Social Semantic Web

Technology Forecast Systems Are Not Designed for This On-The-Fly Creation of • Tom Scott of BBC Earth Talks About His Company’S Meaning

Case Study: Twine