Unsupervised Learning of Sentence Embeddings Using Compositional N-Gram Features

Total Pages: 16

File Type: PDF, Size: 1020 KB

Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features

Matteo Pagliardini* (Iprova SA, Switzerland), Prakhar Gupta* (EPFL, Switzerland), Martin Jaggi (EPFL, Switzerland)
[email protected], [email protected], [email protected]
* indicates equal contribution

Proceedings of NAACL-HLT 2018, pages 528–540, New Orleans, Louisiana, June 1 - 6, 2018. © 2018 Association for Computational Linguistics

Abstract

The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question if similar methods could be derived to improve embeddings (i.e. semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective to train distributed representations of sentences. Our method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.

1 Introduction

Improving unsupervised learning is of key importance for advancing machine learning methods, as to unlock access to almost unlimited amounts of data to be used as training resources. The majority of recent success stories of deep learning does not fall into this category but instead relied on supervised training (in particular in the vision domain). A very notable exception comes from the text and natural language processing domain, in the form of semantic word embeddings trained unsupervised (Mikolov et al., 2013b,a; Pennington et al., 2014). Within only a few years from their invention, such word representations – which are based on a simple matrix factorization model as we formalize below – are now routinely trained on very large amounts of raw text data, and have become ubiquitous building blocks of a majority of current state-of-the-art NLP applications.

While very useful semantic representations are available for words, it remains challenging to produce and learn such semantic embeddings for longer pieces of text, such as sentences, paragraphs or entire documents. Even more so, it remains a key goal to learn such general-purpose representations in an unsupervised way.

Currently, two contrary research trends have emerged in text representation learning: On one hand, a strong trend in deep-learning for NLP leads towards increasingly powerful and complex models, such as recurrent neural networks (RNNs), LSTMs, attention models and even Neural Turing Machine architectures. While extremely strong in expressiveness, the increased model complexity makes such models much slower to train on larger datasets. On the other end of the spectrum, simpler "shallow" models such as matrix factorizations (or bilinear models) can benefit from training on much larger sets of data, which can be a key advantage, especially in the unsupervised setting.

Surprisingly, for constructing sentence embeddings, naively using averaged word vectors was shown to outperform LSTMs (see Wieting et al. (2016b) for plain averaging, and Arora et al. (2017) for weighted averaging). This example shows potential in exploiting the trade-off between model complexity and ability to process huge amounts of text using scalable algorithms, towards the simpler side. In view of this trade-off, our work here further advances unsupervised learning of sentence embeddings. Our proposed model can be seen as an extension of the C-BOW (Mikolov et al., 2013b,a) training objective to train sentence instead of word embeddings. We demonstrate that the empirical performance of our resulting general-purpose sentence embeddings very significantly exceeds the state of the art, while keeping the model simplicity as well as training and inference complexity exactly as low as in averaging methods (Wieting et al., 2016b; Arora et al., 2017), thereby also putting the work by (Arora et al., 2017) in perspective.

Contributions. The main contributions in this work can be summarized as follows:

• Model. We propose Sent2Vec¹, a simple unsupervised model allowing to compose sentence embeddings using word vectors along with n-gram embeddings, simultaneously training composition and the embedding vectors themselves.

• Efficiency & Scalability. The computational complexity of our embeddings is only O(1) vector operations per word processed, both during training and inference of the sentence embeddings. This strongly contrasts all neural network based approaches, and allows our model to learn from extremely large datasets, in a streaming fashion, which is a crucial advantage in the unsupervised setting. Fast inference is a key benefit in downstream tasks and industry applications.

• Performance. Our method shows significant performance improvements compared to the current state-of-the-art unsupervised and even semi-supervised models. The resulting general-purpose embeddings show strong robustness when transferred to a wide range of prediction benchmarks.

¹ All our code and pre-trained models will be made publicly available on http://github.com/epfml/sent2vec

2 Model

Our model is inspired by simple matrix factor models (bilinear models) such as recently very successfully used in unsupervised learning of word embeddings (Mikolov et al., 2013b,a; Pennington et al., 2014; Bojanowski et al., 2017) as well as supervised learning of sentence classification (Joulin et al., 2017). More precisely, these models can all be formalized as an optimization problem of the form

\min_{U,V} \sum_{S \in \mathcal{C}} f_S(U V \iota_S)    (1)

for two parameter matrices U \in \mathbb{R}^{k \times h} and V \in \mathbb{R}^{h \times |\mathcal{V}|}, where \mathcal{V} denotes the vocabulary. Here, the columns of the matrix V represent the learnt source word vectors whereas those of U represent the target word vectors. For a given sentence S, which can be of arbitrary length, the indicator vector \iota_S \in \{0,1\}^{|\mathcal{V}|} is a binary vector encoding S (bag of words encoding).

Fixed-length context windows S running over the corpus are used in word embedding methods as in C-BOW (Mikolov et al., 2013b,a) and GloVe (Pennington et al., 2014). Here we have k = |\mathcal{V}| and each cost function f_S : \mathbb{R}^k \to \mathbb{R} only depends on a single row of its input, describing the observed target word for the given fixed-length context S. In contrast, for sentence embeddings which are the focus of our paper here, S will be entire sentences or documents (therefore variable length). This property is shared with the supervised FastText classifier (Joulin et al., 2017), which however uses soft-max with k being the number of class labels.
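To make the bilinear form in (1) concrete, here is a small illustrative sketch (not the authors' code; the toy vocabulary, dimensions and random parameters are made up): multiplying V by the indicator vector ι_S simply sums the source vectors of the words present in S, and U maps that context vector to one score per candidate target word.

```python
import numpy as np

# Toy setup (made up): vocabulary, embedding dimension h, and k = |V| outputs.
vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}
h = 4
k = len(vocab)                         # one output per vocabulary word, as in C-BOW

rng = np.random.default_rng(0)
V = rng.normal(size=(h, len(vocab)))   # columns are source vectors v_w
U = rng.normal(size=(k, h))            # rows are target vectors u_w

sentence = ["the", "cat", "sat"]
iota = np.zeros(len(vocab))            # bag-of-words indicator iota_S
for w in sentence:
    iota[vocab[w]] = 1.0

context = V @ iota                     # sums the source vectors of the words in S
scores = U @ context                   # U V iota_S: one score per candidate target word
print(scores.shape)                    # (4,); f_S acts on the row of the observed target
```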
2.1 Proposed Unsupervised Model

We propose a new unsupervised model, Sent2Vec, for learning universal sentence embeddings. Conceptually, the model can be interpreted as a natural extension of the word-contexts from C-BOW (Mikolov et al., 2013b,a) to a larger sentence context, with the sentence words being specifically optimized towards additive combination over the sentence, by means of the unsupervised objective function.

Formally, we learn a source (or context) embedding v_w and target embedding u_w for each word w in the vocabulary, with embedding dimension h and k = |\mathcal{V}| as in (1). The sentence embedding is defined as the average of the source word embeddings of its constituent words, as in (2). We augment this model furthermore by also learning source embeddings for not only unigrams but also n-grams present in each sentence, and averaging the n-gram embeddings along with the words, i.e., the sentence embedding v_S for S is modeled as

v_S := \frac{1}{|R(S)|} V \iota_{R(S)} = \frac{1}{|R(S)|} \sum_{w \in R(S)} v_w    (2)

where R(S) is the list of n-grams (including unigrams) present in sentence S. In order to predict a missing word from the context, our objective models the softmax output approximated by negative sampling following (Mikolov et al., 2013b). For the large number |\mathcal{V}| of output classes to be predicted, negative sampling is known to significantly improve training efficiency, see also (Goldberg and Levy, 2014). Given the binary logistic loss function \ell : x \mapsto \log(1 + e^{-x}) coupled with negative sampling, our unsupervised training objective is formulated as follows:

\min_{U,V} \sum_{S \in \mathcal{C}} \sum_{w_t \in S} \left[ \ell\left(u_{w_t}^{\top} v_{S \setminus \{w_t\}}\right) + \sum_{w' \in N_{w_t}} \ell\left(-u_{w'}^{\top} v_{S \setminus \{w_t\}}\right) \right]

where S corresponds to the current sentence and N_{w_t} is the set of words sampled negatively for the word w_t \in S. The negatives are sampled following a multinomial distribution where each word w is associated with a probability q_n(w) := \sqrt{f_w} / \sum_{w_i \in \mathcal{V}} \sqrt{f_{w_i}}, where f_w is the normalized frequency of w in the corpus.

To select the possible target unigrams (positives), we use subsampling as in (Joulin et al., 2017; Bojanowski et al., 2017), each word w being discarded with probability 1 - q_p(w) where q_p(w) := \min\{1, \sqrt{t/f_w} + t/f_w\}, where t is the subsampling hyper-parameter.

Due to the simplicity of the model, parallel training is straightforward using parallelized or distributed SGD. Also, in order to store higher-order n-grams efficiently, we use the standard hashing trick, see e.g. (Weinberger et al., 2009), with the same hashing function as used in FastText (Joulin et al., 2017; Bojanowski et al., 2017).

2.3 Comparison to C-BOW

C-BOW (Mikolov et al., 2013b,a) aims to predict a chosen target word given its fixed-size context window, the context being defined by the average of the vectors associated with the words at a distance less than the window size hyper-parameter ws. While our system, when restricted to unigram features, can be seen as an extension of C-BOW where the context window includes the entire sentence, in practice there are a few important differences as C-BOW uses important tricks to facilitate the learning of word embeddings. C-BOW first uses frequent word subsampling on the sentences,
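The averaging in Eq. (2) above, including n-gram vectors stored via the hashing trick, can be sketched as follows. This is an illustration only, not the released sent2vec implementation: the dimension, bucket count, random initialization and Python's built-in hash() are placeholders (FastText and sent2vec use their own hashing function and learned vectors).

```python
import numpy as np

# Illustrative sketch of Eq. (2): the sentence vector is the mean of the source
# vectors of all unigrams and n-grams in R(S).
h = 100                      # embedding dimension (illustrative)
n_buckets = 10_000           # hash buckets for n-grams (real models use millions)

rng = np.random.default_rng(0)
unigram_vecs = {}                                       # word -> source vector v_w
bucket_vecs = rng.normal(size=(n_buckets, h)) * 0.01    # hashed n-gram source vectors

def source_vector(word):
    # In a trained model these vectors are learned; here they are random stand-ins.
    if word not in unigram_vecs:
        unigram_vecs[word] = rng.normal(size=h) * 0.01
    return unigram_vecs[word]

def sentence_embedding(tokens, n=2):
    vecs = [source_vector(w) for w in tokens]              # unigrams
    for i in range(len(tokens) - n + 1):                    # n-grams (bigrams here)
        ngram = " ".join(tokens[i:i + n])
        vecs.append(bucket_vecs[hash(ngram) % n_buckets])   # hashing trick
    return np.mean(vecs, axis=0)                            # average over R(S)

print(sentence_embedding("the cat sat on the mat".split()).shape)   # (100,)
```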
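The two sampling distributions q_n and q_p defined in Section 2.1 depend only on corpus word frequencies; a minimal sketch, assuming a made-up word-count dictionary:

```python
import numpy as np

# Sketch of the negative-sampling and subsampling probabilities defined above,
# computed from raw corpus counts (the counts dictionary is a made-up example).
counts = {"the": 5000, "cat": 120, "sat": 80, "mat": 40}
total = sum(counts.values())
f = {w: c / total for w, c in counts.items()}   # normalized frequencies f_w

# Negative sampling distribution: q_n(w) proportional to sqrt(f_w)
z = sum(np.sqrt(fw) for fw in f.values())
q_n = {w: np.sqrt(fw) / z for w, fw in f.items()}

# Positive (target) subsampling: keep w with probability
# q_p(w) = min(1, sqrt(t / f_w) + t / f_w), discard with probability 1 - q_p(w)
t = 1e-4                                        # subsampling hyper-parameter
q_p = {w: min(1.0, np.sqrt(t / fw) + t / fw) for w, fw in f.items()}

print(q_n["the"], q_p["the"])   # frequent words: likely negatives, often discarded
```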
Recommended publications
  • Classifying Relevant Social Media Posts During Disasters Using Ensemble of Domain-Agnostic and Domain-Specific Word Embeddings
AAAI 2019 Fall Symposium Series: AI for Social Good

Classifying Relevant Social Media Posts During Disasters Using Ensemble of Domain-agnostic and Domain-specific Word Embeddings

Ganesh Nalluru, Rahul Pandey, Hemant Purohit
Volgenau School of Engineering, George Mason University, Fairfax, VA, 22030
fgn, rpandey4, [email protected]

Abstract
The use of social media as a means of communication has significantly increased over recent years. There is a plethora of information flow over the different topics of discussion, which is widespread across different domains. The ease of information sharing has increased noisy data being induced along with the relevant data stream. Finding such relevant data is important, especially when we are dealing with a time-critical domain like disasters. It is also more important to filter the relevant data in a real-time setting to timely process and leverage the information for decision support. However, the short text and sometimes ungrammatical nature of social media data challenge the extraction of contextual information cues, which could help differentiate relevant vs.

… to provide relevant intelligence inputs to the decision-makers (Hughes and Palen 2012). However, due to the burstiness of the incoming stream of social media data during the time of emergencies, it is really hard to filter relevant information given the limited number of emergency service personnel (Castillo 2016). Therefore, there is a need to automatically filter out relevant posts from the pile of noisy data coming from the unconventional information channel of social media in a real-time setting.
Our contribution: We provide a generalizable classification framework to classify relevant social media posts for emergency services. We introduce the framework in the Method section, including the description of domain-agnostic and
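As a rough illustration of the ensemble idea in the title, one can represent each post by concatenating an averaged domain-agnostic vector with an averaged domain-specific vector before classification; the sketch below uses random placeholder embeddings and a generic classifier, not the authors' actual features or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder lookup tables; in practice these would be pre-trained
# domain-agnostic (general-corpus) and domain-specific (crisis-corpus) embeddings.
rng = np.random.default_rng(0)
words = ["flood", "help", "road", "movie"]
generic = {w: rng.normal(size=50) for w in words}
crisis  = {w: rng.normal(size=50) for w in words}

def post_features(tokens):
    g = np.mean([generic[w] for w in tokens if w in generic], axis=0)
    c = np.mean([crisis[w]  for w in tokens if w in crisis],  axis=0)
    return np.concatenate([g, c])        # ensemble by concatenation

X = np.stack([post_features("flood help road".split()),
              post_features("movie help".split())])
y = np.array([1, 0])                      # 1 = relevant to emergency services
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```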
    [Show full text]
  • Self-Discriminative Learning for Unsupervised Document Embedding
Self-Discriminative Learning for Unsupervised Document Embedding

Hong-You Chen*1, Chin-Hua Hu*1, Leila Wehbe2, Shou-De Lin1
1 Department of Computer Science and Information Engineering, National Taiwan University
2 Machine Learning Department, Carnegie Mellon University
fb03902128, [email protected], [email protected], [email protected]

Abstract
Unsupervised document representation learning is an important task providing pre-trained features for NLP applications. Unlike most previous work which learn the embedding based on self-prediction of the surface of text, we explicitly exploit the inter-document information and directly model the relations of documents in embedding space with a discriminative network and a novel objective. Extensive experiments on both small and large public datasets show the competitiveness of the proposed method. In evaluations on standard document classification, our model has errors that are relatively 5 to 13% lower than state-of-the-art unsupervised embedding models. The reduction in error is even more pronounced in

…meaningful a document embedding as they do not consider inter-document relationships. Traditional document representation models such as Bag-of-words (BoW) and TF-IDF show competitive performance in some tasks (Wang and Manning, 2012). However, these models treat words as flat tokens which may neglect other useful information such as word order and semantic distance. This in turn can limit the models' effectiveness on more complex tasks that require deeper level of understanding. Further, BoW models suffer from high dimensionality and sparsity. This is likely to prevent them from being used as input features for downstream NLP tasks. Continuous vector representations for documents are being developed.
    [Show full text]
  • Q-Learning in Continuous State and Action Spaces
Q-Learning in Continuous State and Action Spaces

Chris Gaskett, David Wettergreen, and Alexander Zelinsky
Robotic Systems Laboratory, Department of Systems Engineering, Research School of Information Sciences and Engineering, The Australian National University, Canberra, ACT 0200 Australia
[cg|dsw|alex]@syseng.anu.edu.au

Abstract. Q-learning can be used to learn a control policy that maximises a scalar reward through interaction with the environment. Q-learning is commonly applied to problems with discrete states and actions. We describe a method suitable for control tasks which require continuous actions, in response to continuous states. The system consists of a neural network coupled with a novel interpolator. Simulation results are presented for a non-holonomic control task. Advantage Learning, a variation of Q-learning, is shown to enhance learning speed and reliability for this task.

1 Introduction
Reinforcement learning systems learn by trial-and-error which actions are most valuable in which situations (states) [1]. Feedback is provided in the form of a scalar reward signal which may be delayed. The reward signal is defined in relation to the task to be achieved; reward is given when the system is successfully achieving the task. The value is updated incrementally with experience and is defined as a discounted sum of expected future reward. The learning system's choice of actions in response to states is called its policy. Reinforcement learning lies between the extremes of supervised learning, where the policy is taught by an expert, and unsupervised learning, where no feedback is given and the task is to find structure in data.
    [Show full text]
  • Injection of Automatically Selected Dbpedia Subjects in Electronic
Injection of Automatically Selected DBpedia Subjects in Electronic Medical Records to boost Hospitalization Prediction

Raphaël Gazzotti, Catherine Faron Zucker, Fabien Gandon, Virginie Lacroix-Hugues, David Darmon

To cite this version:
Raphaël Gazzotti, Catherine Faron Zucker, Fabien Gandon, Virginie Lacroix-Hugues, David Darmon. Injection of Automatically Selected DBpedia Subjects in Electronic Medical Records to boost Hospitalization Prediction. SAC 2020 - 35th ACM/SIGAPP Symposium On Applied Computing, Mar 2020, Brno, Czech Republic. 10.1145/3341105.3373932. hal-02389918

HAL Id: hal-02389918
https://hal.archives-ouvertes.fr/hal-02389918
Submitted on 16 Dec 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Injection of Automatically Selected DBpedia Subjects in Electronic Medical Records to boost Hospitalization Prediction
Raphaël Gazzotti (Université Côte d’Azur, Inria, CNRS, I3S, Sophia-Antipolis, France), Catherine Faron-Zucker (Université Côte d’Azur, Inria, CNRS, I3S, Sophia-Antipolis, France), Fabien Gandon (Inria, Université Côte d’Azur, CNRS,
    [Show full text]
  • Information Extraction Based on Named Entity for Tourism Corpus
Information Extraction based on Named Entity for Tourism Corpus

Chantana Chantrapornchai, Aphisit Tunsakul
Dept. of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand
[email protected], [email protected]

Abstract— Tourism information is scattered around nowadays. To search for the information, it is usually time consuming to browse through the results from search engine, select and view the details of each accommodation. In this paper, we present a methodology to extract particular information from full text returned from the search engine to facilitate the users. Then, the users can specifically look to the desired relevant information. The approach can be used for the same task in other domains. The main steps are 1) building training data and 2) building recognition model. First, the tourism data is gathered and the vocabularies are built. The raw corpus is used to train for creating vocabulary embedding. Also, it is used for creating annotated data.

The ontology is extracted based on HTML web structure, and the corpus is based on WordNet. For these approaches, the time consuming process is the annotation which is to annotate the type of name entity. In this paper, we target at the tourism domain, and aim to extract particular information helping for ontology data acquisition. We present the framework for the given named entity extraction. Starting from the web information scraping process, the data are selected based on the HTML tag for corpus building. The data is used for model creation for automatic
    [Show full text]
  • A Second Look at Word Embeddings W4705: Natural Language Processing
A second look at word embeddings
W4705: Natural Language Processing
Fei-Tzin Lee, October 23, 2019

Overview: Last time...
• Distributional representations (SVD)
• word2vec
• Analogy performance

Overview: This time
• Homework-related topics
• Non-homework topics

Outline
1 GloVe
2 How good are our embeddings?
3 The math behind the models?
4 Word embeddings, new and improved

GloVe: A recap of GloVe
Our motivation:
• SVD places too much emphasis on unimportant matrix entries
• word2vec never gets to look at global co-occurrence statistics
Can we create a new model that balances the strengths of both of these to better express linear structure?

GloVe: Setting
We'll use more or less the same setting as in previous models:
• A corpus D
• A word vocabulary V from which both target and context words are drawn
• A co-occurrence matrix M_ij, from which we can calculate conditional probabilities P_ij = M_ij / M_i

GloVe: The GloVe objective, in overview
Idea: we want to capture not individual probabilities of co-occurrence, but ratios of co-occurrence probability between pairs (w_i, w_k) and (w_j, w_k).
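For reference, the objective these slides build toward (not shown in this excerpt) is GloVe's weighted least-squares loss; written in the slides' notation with co-occurrence counts M_ij, it reads

J = \sum_{i,j=1}^{|V|} f(M_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log M_{ij} \right)^2

where w_i and \tilde{w}_j are target and context vectors, b_i and \tilde{b}_j are biases, and f is a weighting function that down-weights rare co-occurrences and caps very frequent ones.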
    [Show full text]
  • Cw2vec: Learning Chinese Word Embeddings with Stroke N-Gram Information
The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18)

cw2vec: Learning Chinese Word Embeddings with Stroke n-gram Information

Shaosheng Cao,1,2 Wei Lu,2 Jun Zhou,1 Xiaolong Li1
1 AI Department, Ant Financial Services Group
2 Singapore University of Technology and Design
{shaosheng.css, jun.zhoujun, xl.li}@antfin.com, [email protected]

Abstract
We propose cw2vec, a novel method for learning Chinese word embeddings. It is based on our observation that exploiting stroke-level information is crucial for improving the learning of Chinese word embeddings. Specifically, we design a minimalist approach to exploit such features, by using stroke n-grams, which capture semantic and morphological level information of Chinese words. Through qualitative analysis, we demonstrate that our model is able to extract semantic information that cannot be captured by existing methods. Empirical results on the word similarity, word analogy, text classification and named entity recognition tasks show that the proposed approach consistently outperforms state-of-the-art approaches such as word-based word2vec and GloVe, character-based CWE, component-based JWE and pixel-based GWE.

[Figure 1: Radical v.s. components v.s. stroke n-gram]

1. Introduction
Word representation learning has recently received a significant amount of attention in the field of natural language processing (NLP).

…Bojanowski et al. 2016; Cao and Lu 2017). While these approaches were shown effective, they largely focused on European languages such as English, Spanish and German that employ the Latin script in their writing system. Therefore the methods developed are not directly applicable to languages such as Chinese that employ a completely different writing system. In Chinese, each word typically consists of less characters than English, where each character conveys fruitful se-
    [Show full text]
  • Effects of Pre-Trained Word Embeddings on Text-Based Deception Detection
2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress

Effects of Pre-trained Word Embeddings on Text-based Deception Detection

David Nam, Jerin Yasmin, Farhana Zulkernine
School of Computing, Queen's University, Kingston, Canada
Email: {david.nam, jerin.yasmin, farhana.zulkernine}@queensu.ca

Abstract—With e-commerce transforming the way in which individuals and businesses conduct trades, online reviews have become a great source of information among consumers. With 93% of shoppers relying on online reviews to make their purchasing decisions, the credibility of reviews should be strongly considered. While detecting deceptive text has proven to be a challenge for humans to detect, it has been shown that machines can be better at distinguishing between truthful and deceptive online information by applying pattern analysis on a large amount of data. In this work, we look at the use of several popular pre-trained word embeddings (Word2Vec, GloVe, fastText) with deep neural network models (CNN,

…be wary of any deception that the reviews might present. Deception can be defined as a dishonest or illegal method in which misleading information is intentionally conveyed to others for a specific gain [2]. Companies or certain individuals may attempt to deceive consumers for various reasons to influence their decisions about purchasing a product or a service. While the motives for deception may be hard to determine, it is clear why businesses may have a vested interest in producing deceptive reviews about their products.
    [Show full text]
  • Reinforcement Learning Data Science Africa 2018 Abuja, Nigeria (12 Nov - 16 Nov 2018)
Reinforcement Learning
Data Science Africa 2018, Abuja, Nigeria (12 Nov - 16 Nov 2018)
Chika Yinka-Banjo, PhD (University of Lagos, Nigeria) and Ayorkor Korsah, PhD (Ashesi University, Ghana)

Outline
• Introduction to Machine learning
• Reinforcement learning definitions
• Example reinforcement learning problems
• The Markov decision process
• The optimal policy
• Value function & Q-value function
• Bellman Equation
• Q-learning
• Building a simple Q-learning agent (coding)
• Recap
• Where to go from here?

Introduction to Machine learning
• Artificial Intelligence (AI) is the study and design of Intelligent agents.
• An Intelligent agent can perceive its environment through sensors and it can act on its environment through actuators.
• E.g. Agent: Humanoid robot. Environment: Earth? Sensors: Camera, tactile sensor etc. Actuators: Motors, grippers etc.
• Machine learning is a subfield of Artificial Intelligence

Branches of AI

Introduction to Machine learning
• Machine learning techniques learn from data without being explicitly programmed to do so.
• Machine learning models enable the agent to learn from its own experience by extracting useful information from feedback from its environment.
• Three types of learning feedback: supervised learning, unsupervised learning, reinforcement learning

Branches of Machine learning

Supervised learning
• Supervised learning: the machine learning model is trained on many labelled examples of input-output pairs, such that when presented with a novel input, the model can estimate accurately what the correct output should be.
• Data (x, y): x is input data, y is label
• Goal: learn a function to map x -> y
• Examples include: classification, regression, object detection, image captioning etc.
(Figure: Supervised learning task in the form of classification)

Unsupervised learning
• Unsupervised learning: here the model extracts useful information from unlabeled and unstructured data.
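A minimal sketch of the tabular Q-learning agent that the outline's "Building a simple Q-learning agent (coding)" step points to; the environment interface (reset()/step() returning next state, reward and a done flag) is an assumed toy interface, not a specific library.

```python
import numpy as np

# Minimal tabular Q-learning sketch.  The environment is assumed to expose a toy
# interface: reset() -> state index, step(action) -> (next_state, reward, done).
def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, max_steps=200):
    Q = np.zeros((n_states, n_actions))              # the Q-value table
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                action = np.random.randint(n_actions)
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done = env.step(action)
            # Bellman-backup style Q-learning update
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
            if done:
                break
    return Q   # greedy policy: np.argmax(Q, axis=1)
```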
    [Show full text]
  • A Review of Unsupervised Artificial Neural Networks with Applications
A REVIEW OF UNSUPERVISED ARTIFICIAL NEURAL NETWORKS WITH APPLICATIONS

Samson Damilola Fabiyi
Department of Electronic and Electrical Engineering, University of Strathclyde
204 George Street, G1 1XW, Glasgow, United Kingdom
[email protected]

ABSTRACT
Artificial Neural Networks (ANNs) are models formulated to mimic the learning capability of human brains. Learning in ANNs can be categorized into supervised, reinforcement and unsupervised learning. Application of supervised ANNs is limited to when the supervisor's knowledge of the environment is sufficient to supply the networks with labelled datasets. Application of unsupervised ANNs becomes imperative in situations where it is very difficult to get labelled datasets. This paper presents the various methods, and applications of unsupervised ANNs. In order to achieve this, several secondary sources of information, including academic journals and conference proceedings, were selected. Autoencoders, self-

…designer) who uses his or her knowledge of the environment to train the network with labelled data sets [7]. Hence, the artificial neural networks learn by receiving input and target pairs of several observations from the labelled data sets, processing the input, comparing the output with the target, computing the error between the output and target, and using the error signal and the concept of backward propagation to adjust the weights interconnecting the network's neurons with the aim of minimising the error and optimising performance [6, 7]. Fine-tuning of the network continues until the set of weights that minimise the discrepancy between the output and the desired output is obtained. Figure 1 shows the block diagram which conceptualizes supervised learning in ANNs.
    [Show full text]
  • Smart Ubiquitous Chatbot for COVID-19 Assistance with Deep Learning Sentiment Analysis Model During and After Quarantine
Smart Ubiquitous Chatbot for COVID-19 Assistance with Deep learning Sentiment Analysis Model during and after quarantine

Nourchène Ouerhani ([email protected]), University of Manouba, National School of Computer Sciences, RIADI Laboratory, 2010, Manouba, Tunisia
Ahmed Maalel, University of Manouba, National School of Computer Sciences, RIADI Laboratory, 2010, Manouba, Tunisia
Henda Ben Ghézala, University of Manouba, National School of Computer Sciences, RIADI Laboratory, 2010, Manouba, Tunisia
Soulaymen Chouri, Vialytics, Lautenschlagerstr

Research Article
Keywords: Chatbot, COVID-19, Natural Language Processing, Deep learning, Mental Health, Ubiquity
Posted Date: June 25th, 2020
DOI: https://doi.org/10.21203/rs.3.rs-33343/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.

Smart Ubiquitous Chatbot for COVID-19 Assistance with Deep learning Sentiment Analysis Model during and after quarantine
Nourchène Ouerhani · Ahmed Maalel · Henda Ben Ghézala · Soulaymen Chouri
Received: date / Accepted: date

Abstract The huge number of deaths caused by the novel pandemic COVID-19, which can affect anyone of any sex, age and socio-demographic status in the world, presents a serious threat for humanity and society. At this point, there are two types of citizens, those oblivious of this contagious disaster's danger that could be one of the causes of its spread, and those who show erratic or even turbulent behavior since fear and

…The proposed method is a ubiquitous healthcare service that is presented by its four interdependent modules: Information Understanding Module (IUM) in which the NLP is done, Data Collector Module (DCM) that collects the user's non-confidential information to be used later by the Action Generator Module (AGM) that generates the chatbot's answers which are managed through its three sub-modules.
    [Show full text]
  • A Comparison of Word Embeddings and N-Gram Models for Dbpedia Type and Invalid Entity Detection †
Information (Article)

A Comparison of Word Embeddings and N-gram Models for DBpedia Type and Invalid Entity Detection †

Hanqing Zhou *, Amal Zouaq and Diana Inkpen
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa ON K1N 6N5, Canada; [email protected] (A.Z.); [email protected] (D.I.)
* Correspondence: [email protected]; Tel.: +1-613-562-5800
† This paper is an extended version of our conference paper: Hanqing Zhou, Amal Zouaq, and Diana Inkpen. DBpedia Entity Type Detection using Entity Embeddings and N-Gram Models. In Proceedings of the International Conference on Knowledge Engineering and Semantic Web (KESW 2017), Szczecin, Poland, 8–10 November 2017, pp. 309–322.

Received: 6 November 2018; Accepted: 20 December 2018; Published: 25 December 2018

Abstract: This article presents and evaluates a method for the detection of DBpedia types and entities that can be used for knowledge base completion and maintenance. This method compares entity embeddings with traditional N-gram models coupled with clustering and classification. We tackle two challenges: (a) the detection of entity types, which can be used to detect invalid DBpedia types and assign DBpedia types for type-less entities; and (b) the detection of invalid entities in the resource description of a DBpedia entity. Our results show that entity embeddings outperform n-gram models for type and entity detection and can contribute to the improvement of DBpedia's quality, maintenance, and evolution.

Keywords: semantic web; DBpedia; entity embedding; n-grams; type identification; entity identification; data mining; machine learning

1. Introduction
The Semantic Web is defined by Berners-Lee et al.
    [Show full text]