Unsupervised dialogue intent detection via hierarchical topic model

Artem Popov (1,2), Victor Bulatov (1), Darya Polyudova (1), Eugenia Veselova (1)
(1) Moscow Institute of Physics and Technology
(2) Lomonosov Moscow State University

Abstract

One of the challenges in task-oriented dialogue system development is the scarce availability of labeled training data. The best way of getting it is to ask assessors to tag each dialogue according to its intent. Unfortunately, performing labeling without any provisional collection structure is difficult, since the very notion of the intent is ill-defined.

In this paper, we propose a hierarchical multimodal regularized topic model to obtain a first approximation of the intent set. Our rationale for using hierarchical models is their ability to take into account several degrees of dialogue relevancy. We attempt to build a model that can distinguish between subject-based (e.g. medicine and transport topics) and action-based (e.g. filing of an application and tracking application status) similarities. In order to achieve this, we divide the set of all features into several groups according to part-of-speech analysis. Various feature groups are treated differently on different hierarchy levels.

1 Introduction

One of the most important goals of task-oriented dialogue systems is to identify the user intention from the user utterances. State-of-the-art solutions like (Chen et al., 2017) require a lot of labeled data: the user's utterances (one or several per dialogue) have to be tagged with the intent of the dialogue.

This is a challenging task for a new dialogue collection because the set of all possible intents is unknown. Giving a provisional hierarchical collection structure to assessors could make the intent labeling challenge easier. The resulting labels will be more consistent and better suited for model training.

Simple intent analysis is based on empirical rules, e.g. the "question" intent contains the phrase "what is # of #" (Yan et al., 2017). More universal and robust dialogue systems should work without any supervision or predefined rules. Such systems can be implemented with automatic extraction of the semantic hierarchy from the query by multi-level clustering, based on different semantic frames (capability, location, characteristics etc.) in sentences (Chen et al., 2015). In our work intents represent a more complex entity which combines all intentions and objectives.

Many previous works take advantage of hierarchical structures in user intention analysis. In (Shepitsen et al., 2008) an automatic approach to document tagging through hierarchical clustering is used. However, this approach does not take advantage of peculiar phrase features, such as syntax or specific word order. Syntactic analysis of intention was applied in (Gupta et al., 2018) to decompose client intent. This hierarchical representation is similar to a constituency syntax tree: it contains intentions and objects as tree elements and demands deep analysis of every sentence. An attempt to extract subintents along with the main intent can be found in (Tang et al., 2018), but as shown below it is not necessary to apply neural networks for precise and efficient retrieval of multiple intents, especially in the unsupervised setting.

We propose a hierarchical multimodal regularized topic model as a simple and efficient solution for accurate approximation of the collection structure. The main contribution of this paper is the construction of a two-level hierarchical topic model using different features on the first and second levels. To the best of our knowledge, this is the first work that investigates that possibility.

We introduce a custom evaluation metric which measures the quality of hierarchical relations between topics and the quality of intent detection.

The hierarchy structure helps to make a provisional clustering more interpretable. Namely, we require first-level topics to describe the dialogue subject and second-level topics to describe the action the user is interested in. We accomplish this by incorporating information about part-of-speech (PoS) tags into the model.

This paper is organized as follows. Section 2 describes popular approaches to unsupervised text classification. Section 3 describes our reasoning behind the choice of model architecture. Section 4 briefly reviews our preprocessing pipeline and introduces several enhancements to existing NLP techniques. We demonstrate the results of our model in Section 5 and conclude our work in Section 6.

2 Text clustering approaches

2.1 Embedding approaches

The simplest way to build a clustering model on a collection of text documents includes two steps. On the first step, each document is mapped to a real-valued vector. On the second step, one of the standard clustering algorithms is applied to the resulting vectors.

There are many methods to build an embedding of a document. The simplest one is the tf-idf representation. Logistic regression on the tf-idf representation is quite a strong algorithm for the text classification problem and remains a respectable baseline even in deep neural network research (Park et al., 2019). However, the direct use of the tf-idf representation leads to poor results in the clustering problem because of the curse of dimensionality. Dimensionality reduction methods could be used to improve clustering quality: PCA or Uniform Manifold Approximation and Projection (UMAP, McInnes et al. (2018)).

Another popular approach makes use of different word embeddings (Esposito et al., 2016). First of all, each word is mapped to a real-valued vector. Then the document representation is derived from the embeddings of its words. The most popular embedding models belong to the word2vec family (Mikolov et al., 2013b): CBOW, Skip-gram and their modifications (Mikolov et al., 2013a). For a correct representation, word2vec models should be trained on a large collection of documents, for example, Wikipedia. Further improvement in the quality of clustering models with embeddings can be achieved through fine-tuning. Similarly to the tf-idf approach, dimensionality reduction is often employed for the clustering problem (Park et al., 2019). Several averaging schemes can be used to aggregate word embeddings into a document vector: mean, where all words contribute equally to the document, or idf-weighted, where rare words have a greater contribution than frequent words.
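To make the averaging schemes concrete, below is a minimal sketch (ours, not taken from the paper's implementation) of idf-weighted averaging of word embeddings into a document vector; the `word_vectors` lookup and `idf` weights are assumed to be precomputed, e.g. from a word2vec model and collection statistics.

```python
import numpy as np

def document_vector(tokens, word_vectors, idf, dim=300):
    """Idf-weighted average of word embeddings for one document."""
    vecs, weights = [], []
    for tok in tokens:
        if tok in word_vectors:
            vecs.append(word_vectors[tok])
            weights.append(idf.get(tok, 1.0))  # rare words contribute more
    if not vecs:                               # no known words: fall back to a zero vector
        return np.zeros(dim)
    return np.average(np.asarray(vecs), axis=0, weights=np.asarray(weights))

# toy usage with 3-dimensional embeddings
wv = {"tariff": np.array([1.0, 0.0, 0.0]), "plan": np.array([0.0, 1.0, 0.0])}
idf = {"tariff": 2.0, "plan": 0.5}
print(document_vector(["tariff", "plan", "unknown"], wv, idf, dim=3))
```

The resulting document vectors can then be reduced with PCA or UMAP and clustered as described above.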
2.2 Topic modeling

Another approach to the text clustering problem is topic modeling. A topic model simultaneously computes word and document embeddings and performs clusterization. It should be noted that in some cases topic-model-based embeddings outperform traditional word embeddings (Potapenko et al., 2017). The probability of the word w in the document d is represented by the formula below:

p(w \mid d) = \sum_{t \in T} p(w \mid t)\, p(t \mid d) = \sum_{t \in T} \phi_{wt} \theta_{td},

where the matrix \Phi contains the probabilities \phi_{wt} of word w in topic t, and the matrix \Theta contains the probabilities \theta_{td} of topic t in document d.

Probabilistic Latent Semantic Analysis (pLSA) (Hofmann, 2000) is the simplest topic model which describes words in documents by a mixture of hidden topics. The \Phi and \Theta distributions are obtained via maximization of the likelihood under probabilistic normalization and non-negativity constraints:

L(\Phi, \Theta) = \sum_{d \in D} \sum_{w \in W} n_{dw} \log p(w \mid d) \to \max_{\Phi, \Theta},

\sum_{w \in W} \phi_{wt} = 1, \quad \phi_{wt} \ge 0,

\sum_{t \in T} \theta_{td} = 1, \quad \theta_{td} \ge 0.

This optimization problem can be effectively solved via the EM-algorithm or its online modifications (Kochedykov et al., 2017).

The Latent Dirichlet Allocation (LDA) model (Blei et al., 2003) is an extension of pLSA with prior estimation of \Phi and \Theta, widely used in topic modelling. However, since the solution of both the pLSA and the LDA optimization problem is not unique, each solution may have different characteristics.
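For illustration, the likelihood above can be maximised with a few lines of dense-matrix EM. The sketch below is ours, not the authors' implementation (which relies on online EM in a topic modeling library), and assumes a small document-word count matrix n_dw.

```python
import numpy as np

def plsa_em(n_dw, num_topics, num_iters=50, seed=0):
    """Plain EM for pLSA on a dense |D| x |W| count matrix n_dw."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = n_dw.shape
    phi = rng.random((n_words, num_topics))          # p(w|t), columns sum to 1
    phi /= phi.sum(axis=0, keepdims=True)
    theta = rng.random((num_topics, n_docs))         # p(t|d), columns sum to 1
    theta /= theta.sum(axis=0, keepdims=True)
    for _ in range(num_iters):
        n_wt = np.zeros_like(phi)
        n_td = np.zeros_like(theta)
        for d in range(n_docs):
            # E-step: p(t|d,w) is proportional to phi_wt * theta_td
            p_tdw = phi * theta[:, d]                # shape (W, T)
            p_tdw /= p_tdw.sum(axis=1, keepdims=True) + 1e-12
            # M-step counts: spread the observed counts n_dw over topics
            counts = n_dw[d][:, None] * p_tdw
            n_wt += counts
            n_td[:, d] = counts.sum(axis=0)
        phi = n_wt / (n_wt.sum(axis=0, keepdims=True) + 1e-12)
        theta = n_td / (n_td.sum(axis=0, keepdims=True) + 1e-12)
    return phi, theta
```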

Additive Regularization of Topic Models (ARTM) (Vorontsov and Potapenko, 2015) is a non-Bayesian extension of the likelihood optimization task, providing robustness of the solution by applying different regularizers. Each regularizer is used to pursue different solution characteristics. For example, many varieties of LDA can be obtained from an ARTM model by using a certain smoothing regularizer; the pLSA model is an ARTM model without regularizers. Furthermore, documents can contain not only words but also terms of other modalities (e.g. authors, classes, n-grams), which allows us to select language features specific to our task. In this case, instead of a single \Phi matrix, we have several \Phi_m matrices, one for each modality m. The resulting functional to be optimized is the sum of the modality likelihoods weighted with coefficients \alpha_m, plus regularization terms:

\sum_{m} \alpha_m L(\Phi_m, \Theta_m) + R(\cup_m \Phi_m, \Theta) \to \max_{\Phi, \Theta}

3 Multilevel clustering

Our goal is to build a topic model with topics corresponding to the users' intents. We use the following operational definition of intent: two dialogues (as represented by the users' utterances) are said to have the same intent if both users would be satisfied with essentially the same reaction by the call centre operator. This definition, while inherently problematic, allows us to highlight several important practical problems:

• A simple bag-of-words (BoW) approach isn't sufficient. Compare: "I want my credit card to be blocked. What should I do?" and "My credit card is blocked, what should I do?".

• In some cases, the intent of a conversation is not robust to a single word change. "I want to make an appointment with a cardiologist" and "I want to make an appointment with a neurologist" are considered to have the same intent since they require the user to perform a virtually identical set of actions. However, "Payment of state duty for a passport" and "Payment of state duty for a vehicle" are vastly different.

To account for the BoW problem we add an n-gram modality and a ptdw smoothing regularizer (Skachkov and Vorontsov, 2018) for all tokens. The ptdw smoothing regularizer respects the sequential nature of text, making the distributions p(t|d,w) more stable for words w belonging to the same local segment. In a way, the p(t|d,w) distribution could be interpreted as the analogue of context embeddings in the topic modeling world. The p(t|d,w) distribution isn't used directly for topic representation, but it is used on the E-step of the EM-algorithm for the recalculation of \phi_{wt} and \theta_{td}.

In order to obtain more control over intent robustness we propose to use a two-level hierarchical topic model. The first level is responsible for coarse-grained similarity, while the second one could take into account less obvious but important differences.

The hierarchical ARTM model consists of two different ARTM models, one for each level, which are linked to each other. The first level of the hierarchical model can be any ARTM model. The second level is built using the regularizer from (Chirkova and Vorontsov, 2016) which ensures that each first-level topic is a convex combination of second-level topics. Various methods could be employed to ensure that each parent topic is connected to only a handful of relevant child topics: one can use either the interlevel sparsing regularizer (Chirkova and Vorontsov, 2016) or remove "bad" edges according to the EmbedSim metric (Belyy, 2018).
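As a rough illustration of how such a two-level model can be assembled, the sketch below uses the hierarchy support of the BigARTM library. This is not the authors' configuration: the file paths, topic counts and regularizer coefficient are placeholders, and the class and argument names (hARTM, add_level, parent_level_weight, HierarchySparsingThetaRegularizer) follow the BigARTM documentation as we recall it and should be checked against the installed version.

```python
import artm

# Placeholder input: a collection in BigARTM's Vowpal Wabbit format.
batches = artm.BatchVectorizer(data_path="dialogues_vw.txt",
                               data_format="vowpal_wabbit",
                               target_folder="batches")

hier = artm.hARTM(dictionary=batches.dictionary)
subject_level = hier.add_level(num_topics=20)              # coarse, subject-based topics
action_level = hier.add_level(num_topics=80,               # finer, action-based topics
                              parent_level_weight=1)
# Keep each parent topic connected to only a handful of child topics.
action_level.regularizers.add(
    artm.HierarchySparsingThetaRegularizer(name="sparse_links", tau=1.0))

hier.fit_offline(batch_vectorizer=batches, num_collection_passes=20)
```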
3.1 Distinct hierarchy levels

Building a two-level clustering model is a difficult task due to the inaccuracy of clustering algorithms. Provided that the documents in the model's first-level clusters are already similar to each other (as they should be), further separation could be complicated (especially if we attempt to subdivide each cluster by the same algorithm). In practice, the second-level clusters tend to repeat first-level clusters at a smaller scale instead of demonstrating some meaningful differences. In order to make our model able to distinguish new dissimilarities within clusters on the second level, we adjust the algorithm at the second level: in broad strokes, we base the second level of the model on different features.

In the context of our problem, we propose a separation based on the functional purpose of the model tokens. We divide all words and n-grams into two groups based on PoS analysis: "thematic" and "functional". The "functional" group consists of the verbs and the n-grams that contain at least one verb. The "thematic" group consists of the nouns and adjectives and the n-grams that contain at least one noun and have no verbs.
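A minimal sketch of this grouping (our illustration, not the authors' code) is shown below. The PoS tags are assumed to come from any morphological tagger for Russian (e.g. pymorphy2), and n-grams are assumed to be lemmas joined with underscores.

```python
def split_into_modalities(tokens, pos_of):
    """Split unigrams and n-grams (lemmas joined with '_') into the
    'thematic', 'functional' and 'common' groups of Section 3.1."""
    groups = {"thematic": [], "functional": [], "common": []}
    for tok in tokens:
        tags = {pos_of.get(w, "OTHER") for w in tok.split("_")}
        if "VERB" in tags:                      # verbs and n-grams containing a verb
            groups["functional"].append(tok)
        elif tags & {"NOUN", "ADJ"}:            # verb-free tokens with a noun or adjective
            groups["thematic"].append(tok)
        else:                                   # remaining tokens connect the two levels
            groups["common"].append(tok)
    return groups

# toy usage; in the model the 'thematic' modality gets a higher weight on the
# first hierarchy level and 'functional' on the second one
pos_of = {"change": "VERB", "tariff": "NOUN", "plan": "NOUN", "how": "OTHER"}
print(split_into_modalities(["how", "change", "tariff_plan", "change_tariff"], pos_of))
```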

Inspired by the multi-level (Tang et al., 2018) and multi-syntactic (Gupta et al., 2018) phrase annotations, along with the hierarchical partition, our approach is essential for the extraction of client goals and subgoals.

The purpose of the first hierarchy level is to determine the conversation subject (the entities the dialogue is about). Hence, at the first level of the hierarchy thematic tokens should have a noticeably higher weight than functional tokens. The purpose of the second level of the hierarchy, by contrast, is to determine the client intent concerning particular objects (e.g. what action the client is trying to perform). Here functional tokens should have a higher impact than thematic ones. The tokens unrelated to these two groups are used on both levels and serve as a connection between the layers.

4 Preprocessing

We use a standard preprocessing pipeline consisting of tokenization, lemmatization, part-of-speech tagging, n-gram extraction, named entity recognition and spell checking. In this section we describe some details of the preprocessing algorithms, since preprocessing is very important for morphologically rich languages such as Russian. The data preprocessing pipeline consists of many parts, therefore each part must be relatively fast. That is why we do not use powerful but heavy approaches such as (Devlin et al., 2018) for NER.

4.1 N-gram extraction

The conventional approach to surpassing the bag-of-words hypothesis of the model is adding n-grams or collocations into the model. To extract n-grams we use the TopMine algorithm (El-Kishky et al., 2014), based on word co-occurrence statistics.

However, we found it beneficial to implement some modifications. The first change alters the gathering and usage of word co-occurrence statistics: TopMine differentiates between the sequences (w1, w2) and (w2, w1), which is not desirable for synthetic languages with a less strict word order compared to English. To make it better suited to the Russian language, we use multisets as containers for collocations instead of sequences. The second change modifies the extraction process: while the original version of TopMine extracts only disjoint collocations and won't detect sub-collocations (e.g. if the n-gram "support vector machines" is extracted, the n-gram "support vector" will not be extracted), our modification will extract every high-scoring collocation at the cost of increased memory usage.
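The order-insensitive counting can be illustrated with a toy sketch (ours; it covers only the counting key, not the full TopMine phrase-merging procedure): candidate bigrams are keyed by the sorted pair of lemmas, so both word orders accumulate into the same counter.

```python
from collections import Counter

def count_unordered_bigrams(documents):
    """Count adjacent word pairs with an order-insensitive (multiset-like) key."""
    counts = Counter()
    for doc in documents:                       # doc is a list of lemmas
        for w, v in zip(doc, doc[1:]):
            counts[tuple(sorted((w, v)))] += 1  # "support vector" == "vector support"
    return counts

docs = [["support", "vector", "machine"], ["vector", "support"]]
print(count_unordered_bigrams(docs))
# Counter({('support', 'vector'): 2, ('machine', 'vector'): 1})
```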

4.2 Named entity recognition

There are a lot of references to the speakers' names, company and product names, streets, and cities in the dialogue collection. It makes sense to take some entities into account in a special way.

For the named entity recognition (NER) problem different methods are commonly used: rule-based, machine-learning-based or neural-network-based. We used the neural network from Arkhipov et al. (2017) pretrained on the PERSONS-1000 collection (Vlasova et al., 2014) in our experiments. We replace all person-related tokens by the <PERSON> tag.

4.3 Spell checking

Errors and typos in client utterances are common in the dialogue collection. The simplest way to deal with this problem is to apply a spell checking algorithm. We use the Jamspell algorithm (available on GitHub) because of its speed.

We make some modifications to adapt the Jamspell model to our case. First of all, the language model used to select the best correction candidates should be trained on the collection being clustered. This modification takes the collection specificity into account, so collection-specific words won't be treated as unknown.

Also, the set of candidates can be extended. According to published Yandex search engine error statistics (in Russian), word merging is one of the most frequent typos in the dialogues. Hence, we add candidates that are obtained by splitting a word in two.
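A toy sketch of this candidate extension (our illustration; in the real pipeline the candidates are re-ranked by the Jamspell language model trained on the collection): for a token, propose every split into two in-vocabulary words, which handles the frequent merged-words typo.

```python
def split_candidates(word, vocabulary):
    """Return all ways to split `word` into two known words."""
    return [(word[:i], word[i:])
            for i in range(1, len(word))
            if word[:i] in vocabulary and word[i:] in vocabulary]

vocab = {"tariff", "plan", "change"}
print(split_candidates("tariffplan", vocab))   # [('tariff', 'plan')]
```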

5 Experiments

We use two dialogue datasets from Russian call centres (∼90K dialogues each) in our experiments. The first dataset is collected from client dialogues with various public services. The second dataset consists of conversation logs of ISP tech support. All dialogues are between a user and a call agent; the mean length of a single dialogue is six utterances. Both datasets are proprietary.

5.1 Scoring metric

There are several approaches to measuring the quality of a topic model, especially its interpretability. The usual procedure involves evaluating the list of the most frequent topic words by human experts. However, this approach suffers from several fundamental limitations (Alekseev, 2018). Therefore we choose to employ a different method.

For each dataset, we collect a set of dialogue pairs to score our model. Following the reasoning outlined in Section 3, we generated a number of (d1, d2) pairs (where di is a dialogue) and asked three human experts to label them. To measure the quality of the model, we compare these labels to the labels predicted by the model.

The following list summarizes our approach to model estimation and the labeling guidelines for the human experts:

• 0: d1 and d2 have nothing in common. Such dialogues should correspond to different first-level topics.

• 1: both d1 and d2 are related to the same subject, but there are significant differences. Such dialogues should correspond to the same first-level topic, but to different second-level topics.

• 2: d1 shares an intent with d2. Such dialogues should correspond to the same first-level and second-level topics.

• ?: it is impossible to determine the intent for at least one of the dialogues.

We select the best model according to the accuracy metric on the given labeled pairs. Three sets of pairs are used for the estimation (∼12K and ∼1.5K for the first dataset, ∼1.5K for the second dataset). All model hyperparameters are tuned according to the accuracy on the 12K set ("1-big"). The two other sets are used to control overfitting ("1-small" and "2-small"). Notably, good performance on the 2-small set implies that the model generalizes beyond the initial training dataset.

The same preprocessing procedures are used for both datasets. All tokens are lemmatized, stop-tokens are deleted, and simple entities (e-mails, websites etc.) are replaced by their tags.
Operator utterances are deleted from the dialogue document (they are not informative in our datasets; for example, there are many cases where the operator fails to reply at all). Finally, each document is a concatenation of the user utterances from a single dialogue.

5.2 Baselines

As one of the baselines, we use the following procedure. First, we convert raw texts into real-valued vectors using pretrained embeddings or tf-idf scores in the way described in Section 2.1. Second, we cluster this dataset via the K-Means algorithm. Third, we treat each cluster as a separate collection and apply the K-Means algorithm again. As a result, we obtain both first-level and second-level clusters.

Other baseline models are a hierarchical topic model without any additional regularizers and a hierarchical topic model with \Phi and \Theta smoothing on both levels. For the K-Means-based algorithms we tune the embedding dimensionality and the number of clusters on both levels. For the topic-modeling-based algorithms we tune the number of topics on both levels. As shown in Table 1, the regularized topic model outperforms the K-Means approaches on two out of the three pair sets.

                     1-big   1-small   2-small
    hKmeans (tf-idf) 0.568   0.593     0.649
    hKmeans (emb.)   0.615   0.638     0.641
    hPLSA            0.603   0.675     0.633
    hARTM            0.636   0.683     0.631

    Table 1: Baselines accuracy

5.3 Proposed model performance

We use several NLP-based techniques described in Section 4 to improve the main model quality. We start with the hPLSA model. For each problem we test a few approaches and choose the best one. We add the main features one by one, e.g. we choose the best method for extracting n-grams and use it on the next step. We conduct all the experiments in the following order:

1. including an additional n-gram modality, choosing between the base and modified n-gram extraction methods, tuning the modality weights and the number of topics;

2. adding ptdw smoothing at the first model level for all tokens, tuning the regularizer coefficient and the number of topics;

3. replacing person-related named entities, choosing between the dictionary-based and RNN-based methods;

4. typo correction, choosing between the base and modified algorithm.
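Before turning to the results, here is how we read the pair-based accuracy from Section 5.1. The sketch below is our interpretation of that procedure, not the authors' code: a model's prediction for a pair is derived from its first- and second-level topic assignments and compared with the expert label, and pairs labelled "?" are skipped; the dictionaries mapping dialogue ids to topic ids are hypothetical.

```python
def predicted_label(first, second, d1, d2):
    """Map topic assignments of two dialogues to a pair label 0/1/2."""
    if first[d1] != first[d2]:
        return 0            # different first-level topics
    if second[d1] != second[d2]:
        return 1            # same subject, different second-level topics
    return 2                # same first- and second-level topic, i.e. same intent

def pair_accuracy(labelled_pairs, first, second):
    scored = [(d1, d2, y) for d1, d2, y in labelled_pairs if y != "?"]
    hits = sum(predicted_label(first, second, d1, d2) == y for d1, d2, y in scored)
    return hits / len(scored)

first = {"a": 0, "b": 0, "c": 1}
second = {"a": 3, "b": 4, "c": 7}
print(pair_accuracy([("a", "b", 1), ("a", "c", 0), ("b", "c", 2)], first, second))  # 0.666...
```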

                     1-big   1-small   2-small
    hPLSA            0.603   0.675     0.633
    + n-grams base   0.612   0.634     0.633
    + n-grams mod.   0.635   0.674     0.655
    + ptdw smooth.   0.640   0.678     0.660
    + NER dict.      0.634   0.661     0.635
    + NER NN         0.640   0.680     0.662
    + Jamspell       0.635   0.674     0.655
    + mod. Jamspell  0.657   0.686     0.663

    Table 2: NLP techniques quality improvement

As Table 2 demonstrates, our n-gram extraction method outperforms the traditional TopMine algorithm in this task. Replacing persons by a tag does not lead to a great improvement in quality. Our analysis of the hPLSA cluster top tokens shows that only 3% of the top tokens are related to persons; after the NER preprocessing the proportion of named entities in the top tokens reduces to 0.3%. At the same time, spellchecking improves the performance on all three pair sets. It should be noted that the standard Jamspell algorithm leads to a quality decrease.

Finally, we apply the feature grouping scheme proposed in Section 3.1. The results (Table 3) turned out to be reassuring: there is a noticeable performance boost on all of the pair sets.

                     1-big   1-small   2-small
    featured hARTM   0.657   0.686     0.663
    + groups         0.667   0.715     0.672

    Table 3: Grouping feature quality improvement

Further, we present some examples of the model performance. All example texts were translated from Russian to English. In Table 4 all subtopics of the topic "Tariff plan" are presented; each subtopic is described by its characteristic question.

    Tariff plan
    How to change the tariff plan?
    When did the tariff change happen?
    How often can I change my tariff plan?
    When will the changes take effect when the tariff is changed?
    Why can't I change the tariff?
    Why was the tariff plan changed without my knowledge?
    Why there are no available tariff plans for the transition?

    Table 4: Subtopics of the topic "Tariff plan"

In Table 5 we demonstrate the top documents corresponding to the subtopic "How do I switch from credit to advance payment?".

    How do I switch from credit to advance payment?
    How do I switch from credit to advance payment?
    Hi. Tell me can we change the credit system of payment to advance? Well thanks!
    I need to change my payment from credit to advance.
    How to disable credit payment system?
    Hello. Change the payment system from credit to advance!
    Good morning. How to change the payment system from credit to normal?
    Disable the credit payment system.

    Table 5: Top documents of the subtopic "How do I switch from credit to advance payment?"

6 Conclusion

In this paper, we report a success in formalizing a clustering process suitable for unsupervised inference of user intents.

The realization that any intent consists of two crucial parts, the entity relevant to the user's request and the action the user wishes to perform, helped us to choose a two-level hierarchical model as our main tool. This led us to design a custom quality metric which takes into account several degrees of dialogue relevancy.

Our next step was to devise a PoS-based feature separation and to leverage n-grams, named entities and spellchecking. This allowed us to construct a hierarchical multimodal regularized topic model which outperforms all baseline models.

Acknowledgments

We thank our colleagues Alexey Goncharov and Konstantin Vorontsov from the Machine Intelligence Laboratory who provided expertise that greatly assisted the research. We thank Evgeny Egorov for comments that greatly improved the manuscript.

References

V.A. Alekseev, V.G. Bulatov, and K.V. Vorontsov. 2018. Intra-text coherence as a measure of topic models' interpretability. In Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference Dialogue, pages 1–13.

Mikhail Y. Arkhipov, Mikhail S. Burtsev, et al. 2017. Application of a hybrid Bi-LSTM-CRF model to the task of Russian named entity recognition. In Conference on Artificial Intelligence and Natural Language, Springer, pages 91–103.

A.V. Belyy, M.S. Seleznova, A.K. Sholokhov, and K.V. Vorontsov. 2018. Quality evaluation and improvement for hierarchical topic modeling. In Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference Dialogue, pages 110–123.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3(Jan):993–1022.

Yun-Nung Chen, William Yang Wang, and Alexander I. Rudnicky. 2015. Learning semantic hierarchy with distributed representations for unsupervised spoken language understanding. In Sixteenth Annual Conference of the International Speech Communication Association.

Zheqian Chen, Rongqin Yang, Zhou Zhao, Deng Cai, and Xiaofei He. 2017. Dialogue act recognition via CRF-attentive structured network. CoRR abs/1711.05568.

Nadezhda Chirkova and Konstantin Vorontsov. 2016. Additive regularization for hierarchical multimodal topic modeling. Journal of Machine Learning and Data Analysis 2(2):187–200.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805.

Ahmed El-Kishky, Yanglei Song, Chi Wang, Clare R. Voss, and Jiawei Han. 2014. Scalable topical phrase mining from text corpora. Proceedings of the VLDB Endowment 8(3):305–316.

Fabrizio Esposito, Anna Corazza, and Francesco Cutugno. 2016. Topic modelling with word embeddings. In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016), pages 129–134.

Sonal Gupta, Rushin Shah, Mrinal Mohit, Anuj Kumar, and Mike Lewis. 2018. Semantic parsing for task oriented dialog using hierarchical representations. In EMNLP.

Thomas Hofmann. 2000. Learning the similarity of documents: An information-geometric approach to document retrieval and categorization. pages 914–920.

Denis Kochedykov, Murat Apishev, Lev Golitsyn, and Konstantin Vorontsov. 2017. Fast and modular regularized topic modelling. In 2017 21st Conference of Open Innovations Association (FRUCT), IEEE, pages 182–193.

Leland McInnes, John Healy, and James Melville. 2018. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Jinuk Park, Chanhee Park, Jeongwoo Kim, Minsoo Cho, and Sanghyun Park. 2019. ADC: Advanced document clustering using contextualized representations. Expert Systems with Applications.

Anna Potapenko, Artem Popov, and Konstantin Vorontsov. 2017. Interpretable probabilistic embeddings: bridging the gap between topic models and neural networks. In Conference on Artificial Intelligence and Natural Language, Springer, pages 167–180.

Andriy Shepitsen, Jonathan Gemmell, Bamshad Mobasher, and Robin Burke. 2008. Personalized recommendation in social tagging systems using hierarchical clustering. In Proceedings of the 2008 ACM Conference on Recommender Systems, ACM, pages 259–266.

Nikolay Skachkov and Konstantin Vorontsov. 2018. Improving topic models with segmental structure of texts. In Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference Dialogue, pages 652–661.

Da Tang, Xiujun Li, Jianfeng Gao, Chong Wang, Lihong Li, and Tony Jebara. 2018. Subgoal discovery for hierarchical dialogue policy learning. In EMNLP.

Nataliya Vlasova, Elena Syleymanova, and Igor Trofimov. 2014. The Russian language collection for the named-entity recognition task. Language Semantics: Models and Technologies, pages 36–40.

Konstantin Vorontsov and Anna Potapenko. 2015. Additive regularization of topic models. Machine Learning 101(1-3):303–323.

Zhao Yan, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li. 2017. Building task-oriented dialogue systems for online shopping. In Thirty-First AAAI Conference on Artificial Intelligence.
