Variational Inference in Pachinko Allocation Machines


Variational Inference in Pachinko Allocation Machines

Akash Srivastava (University of Edinburgh; The Alan Turing Institute) and Charles Sutton (University of Edinburgh; Google Brain)

Abstract

The Pachinko Allocation Machine (PAM) is a deep topic model that allows representing rich correlation structures among topics by a directed acyclic graph over topics. Because of the flexibility of the model, however, approximate inference is very difficult. Perhaps for this reason, only a small number of potential PAM architectures have been explored in the literature. In this paper we present an efficient and flexible amortized variational inference method for PAM, using a deep inference network to parameterize the approximate posterior distribution in a manner similar to the variational autoencoder. Our inference method produces more coherent topics than state-of-the-art inference methods for PAM while being an order of magnitude faster, which allows exploration of a wider range of PAM architectures than have previously been studied.

1 Introduction

Topic models are widely used tools for exploring and visualizing document collections. Simpler topic models, like latent Dirichlet allocation (LDA) (Blei et al., 2003), capture correlations among words but do not capture correlations among topics. This limits the model's ability to discover finer-grained hierarchical latent structure in the data. For example, we expect that very specific topics, such as those pertaining to individual sports teams, are likely to co-occur more often than more general topics, such as a generic "politics" topic with a generic "sports" topic.

A popular extension to LDA that captures topic correlations is the Pachinko Allocation Machine (PAM) (Li and McCallum, 2006). PAM is essentially "deep LDA". It is defined by a directed acyclic graph (DAG) in which each leaf node denotes a word in the vocabulary, and each internal node is associated with a distribution over its children. The document is generated by sampling, for each word, a path from the root of the DAG to a leaf. Thus the internal nodes can represent distributions over topics, so-called "super-topics", and so on, thereby representing correlations among topics.

Unfortunately, PAM introduces many latent variables: for each word in the document, the path in the DAG that generated the word is latent. Therefore, traditional inference methods, such as Gibbs sampling and decoupled mean-field variational inference, become significantly more expensive. This not only affects the scale of data sets that can be considered; more fundamentally, the computational cost of inference makes it difficult to explore the space of possible architectures for PAM. As a result, to date only relatively simple architectures have been studied in the literature (Li and McCallum, 2006; Mimno et al., 2007; Li et al., 2012).

We present what is, to the best of our knowledge, the first variational inference method for PAM, which we call dnPAM. Unlike collapsed Gibbs sampling, dnPAM can be applied generically to any PAM architecture without the need to derive a new inference algorithm, allowing much more rapid exploration of the space of possible model architectures. dnPAM is an amortized inference method following the learning principle of variational autoencoders: the variational distribution is parameterized by a deep network that is trained to perform accurate inference. We find that dnPAM is not only an order of magnitude faster than collapsed Gibbs sampling, but it even returns topics with comparable or greater coherence. The dramatic speedup in inference time comes from amortizing the learning cost: a neural network is trained to produce the posterior parameters instead of learning those parameters directly. This efficiency in inference enables exploration of more complex and deeper PAM models than have previously been possible.

As a demonstration of this, as our second contribution we introduce a mixture of PAMs model, in which each component distribution of the mixture is represented by a PAM. By mixing PAMs with varying numbers of topics, this model captures the latent structure in the data at many different levels of granularity, decoupling general broad topics from more specific ones.

Like other variational autoencoders (VAEs) (Kingma and Welling, 2013; Rezende et al., 2014), our model also suffers from posterior collapse (van den Oord et al., 2017), sometimes also referred to as component collapsing (Dinh and Dumoulin, 2016), and from slow training due to low learning rates. We present an analysis of these issues in the context of topic modeling and propose a normalization-based solution to alleviate them.

2 Latent Dirichlet Allocation

LDA represents each document w in a collection as a mixture of topics. Each topic vector β_k is a distribution over the vocabulary, that is, a vector of probabilities, and β = (β_1 ... β_K) is the matrix of the K topics. Every document is then modeled as an admixture of the topics. The generative process is to first sample a proportion vector θ ∼ Dirichlet(α), then for each word at position n to sample a topic indicator z_n ∈ {1, ..., K} as z_n ∼ Categorical(θ), and finally to sample the word index w_n ∼ Categorical(β_{z_n}).
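For concreteness, here is a minimal sketch of the LDA generative process just described, using NumPy. The corpus sizes and hyperparameter values are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, N = 1000, 20, 50                 # vocabulary size, number of topics, words per document (illustrative)
alpha = np.full(K, 0.1)                # Dirichlet prior over topic proportions
beta = rng.dirichlet(np.full(V, 0.01), size=K)   # K topics, each a distribution over the vocabulary

def generate_document():
    theta = rng.dirichlet(alpha)       # theta ~ Dirichlet(alpha)
    words = []
    for _ in range(N):
        z = rng.choice(K, p=theta)                # z_n ~ Categorical(theta)
        words.append(rng.choice(V, p=beta[z]))    # w_n ~ Categorical(beta_{z_n})
    return words
```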
2.1 Deep LDA: the Pachinko Allocation Machine

PAM is a class of topic models that extends LDA by modeling correlations among topics. A particular instance of a PAM represents the correlation structure among topics by a DAG in which the leaf nodes represent words in the vocabulary and the internal nodes represent topics. Each node s in the DAG is associated with a distribution θ_s over its children, which has a Dirichlet prior. There is no need to differentiate between nodes in the graph and the distributions θ_s, so we simply take {θ_s} to be the node set of the graph. To generate a document in PAM, for each word we sample a path from the root to a leaf and output the word associated with that leaf.

More formally, we present the special case of 4-PAM, in which the DAG is a 4-partite graph; it will be clear how to generalize this discussion to arbitrary DAGs. In 4-PAM, the DAG consists of a root node θ_r, which is connected to children θ_1 ... θ_S called super-topics. Each super-topic θ_s is connected to the same set of children β_1 ... β_K called subtopics, each of which is fully connected to the vocabulary items 1 ... V in the leaves.

A document is generated in 4-PAM as follows. First, a single matrix of subtopics β is generated for the entire corpus as β_k ∼ Dirichlet(α_0). Then, to sample a document w, we sample child distributions for each remaining internal node in the DAG. For the root node, θ_r is drawn from a Dirichlet prior θ_r ∼ Dirichlet(α_r), and similarly, for each super-topic s ∈ {1 ... S}, the super-topic θ_s is drawn as θ_s ∼ Dirichlet(α_s). Finally, for each word w_n, a path is sampled from the root to a leaf: from the root we sample the index of a super-topic z_{n1} ∈ {1 ... S} as z_{n1} ∼ Categorical(θ_r), followed by a subtopic index z_{n2} ∈ {1 ... K} sampled as z_{n2} ∼ Categorical(θ_{z_{n1}}), and finally the word is sampled as w_n ∼ Categorical(β_{z_{n2}}). This process can be written as a density

P(w, z, θ | α, β) = p(θ_r | α_r) ∏_{s=1}^{S} p(θ_s | α_s) × ∏_n p(z_{n1} | θ_r) p(z_{n2} | θ_{z_{n1}}) p(w_n | β_{z_{n2}}).   (1)

It is easy to see how this process can be extended to arbitrary ℓ-partite graphs, yielding the ℓ-PAM model (LDA corresponds exactly to 3-PAM), and also to arbitrary DAGs.
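The 4-PAM generative process in Eq. (1) can be sketched in the same way; again, the sizes S, K, V and the symmetric Dirichlet hyperparameters below are illustrative assumptions rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

V, S, K, N = 1000, 5, 20, 50           # vocabulary, super-topics, subtopics, words per document (illustrative)
alpha_0 = np.full(V, 0.01)             # prior for the subtopics beta_k
alpha_r = np.full(S, 0.1)              # prior for the root distribution theta_r
alpha_s = np.full(K, 0.1)              # prior for each super-topic theta_s

beta = rng.dirichlet(alpha_0, size=K)  # corpus-level subtopics, beta_k ~ Dirichlet(alpha_0)

def generate_document():
    theta_r = rng.dirichlet(alpha_r)             # root distribution over super-topics
    theta = rng.dirichlet(alpha_s, size=S)       # one distribution over subtopics per super-topic
    words = []
    for _ in range(N):
        z1 = rng.choice(S, p=theta_r)            # z_{n1} ~ Categorical(theta_r)
        z2 = rng.choice(K, p=theta[z1])          # z_{n2} ~ Categorical(theta_{z_{n1}})
        words.append(rng.choice(V, p=beta[z2]))  # w_n ~ Categorical(beta_{z_{n2}})
    return words
```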
3 Mixture of PAMs

The main advantage of the inference framework we propose is that it allows easily exploring the design space of possible structures for PAM. As a demonstration of this, we present a word-level mixture of PAMs that allows learning finer-grained topics than a single PAM: some mixture components learn topics that capture the more general, global structure, so that other mixture components can focus on finer-grained topics.

We describe a word-level mixture of M PAMs P_1 ... P_M, each of which can have a different number of topics or even a completely different DAG structure. To generate a document under this model, first we sample an M-dimensional document-level mixing proportion θ_r ∼ Dirichlet(α_r). Then, for each word w_n in the document, we sample a mixture component from Categorical(θ_r) and generate the word from that component's PAM.

In our Omniglot experiments, the resulting topics are sharper, indicating that each individual topic captures more information about the data. The mixture model allows the two LDAs being mixed to focus exclusively on higher-level (for 10 topics) and lower-level (for 50 topics) features while modeling the images. The topics of a vanilla LDA, on the other hand, need to account for all the variability in the dataset using just 10 (or 50) topics and are therefore fuzzier.

Figure 1: Top: A and B show randomly sampled topics from MoLDA(10:50). Bottom: C and D show randomly sampled topics from LDA with 10 topics and 50 topics on Omniglot. Notice that by using a mixture, the MoLDA can decouple the higher-level structure (A) from the lower-level details (B).

4 Inference

Probabilistic inference in topic models is the task of computing posterior distributions p(z | w, α, β) over the topic assignments for words, or the posterior p(θ | w, α, β) over topic proportions for documents. For all practical topic models, this task is intractable. Commonly used methods include Gibbs sampling (Li and McCallum, 2006; Blei et al., 2004), which can be slow to converge, and variational inference methods such as mean-field variational inference.
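The excerpt above does not spell out the dnPAM inference network, so the following is only a rough sketch of what an amortized, VAE-style inference network for a topic model can look like: an MLP maps a document's bag-of-words vector to the parameters of a logistic-normal approximate posterior over topic proportions. The layer sizes and the logistic-normal choice are assumptions made for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class InferenceNetwork(nn.Module):
    """Maps a bag-of-words vector to the mean and log-variance of a
    logistic-normal approximate posterior over topic proportions."""

    def __init__(self, vocab_size, num_topics, hidden=200):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.Softplus(),
            nn.Linear(hidden, hidden), nn.Softplus(),
        )
        self.mu = nn.Linear(hidden, num_topics)
        self.logvar = nn.Linear(hidden, num_topics)

    def forward(self, bow):
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)               # reparameterization trick
        z = mu + eps * torch.exp(0.5 * logvar)
        theta = torch.softmax(z, dim=-1)         # document-topic proportions
        return theta, mu, logvar
```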
Recommended publications
  • Multi-View Learning for Hierarchical Topic Detection on Corpus of Documents
    Multi-view learning for hierarchical topic detection on corpus of documents. Juan Camilo Calero Espinosa, Universidad Nacional de Colombia, Facultad de Ingeniería, Departamento de Ingeniería de Sistemas e Industrial, Bogotá, Colombia, 2021. Thesis presented as a partial requirement for the degree of Magister en Ingeniería de Sistemas y Computación. Advisor: Ph.D. Luis Fernando Niño V. Research line: natural language processing. Research group: Laboratorio de investigación en sistemas inteligentes - LISI.

    To my parents Maria Helena and Jaime. To my aunts Patricia and Rosa. To my grandmothers Lilia and Santos.

    Acknowledgements: To Camilo Alberto Pino, as the original thesis idea was his, and for his invaluable teaching of multi-view learning. To my thesis advisor, Luis Fernando Niño, and the Laboratorio de investigación en sistemas inteligentes - LISI, for constantly allowing me to learn new knowledge, and for their valuable recommendations on the thesis.

    Abstract: Topic detection on a large corpus of documents requires a considerable amount of computational resources, and the number of topics increases the burden as well. However, even a large number of topics might not be as specific as desired, or the topic quality may simply start decreasing after a certain number. To overcome these obstacles, we propose a new methodology for hierarchical topic detection, which uses multi-view clustering to link different topic models extracted from document named entities and part-of-speech tags.
  • Nonstationary Latent Dirichlet Allocation for Speech Recognition
    Nonstationary Latent Dirichlet Allocation for Speech Recognition. Chuang-Hua Chueh and Jen-Tzung Chien, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC. {chchueh,chien}@chien.csie.ncku.edu.tw

    Abstract: Latent Dirichlet allocation (LDA) has been successful for document modeling. LDA extracts the latent topics across documents. Words in a document are generated by the same topic distribution. However, in real-world documents, the usage of words in different paragraphs varies and is accompanied by different writing styles. This study extends LDA and copes with the variations of topic information within a document. We build the nonstationary LDA (NLDA) by incorporating a Markov chain which is used to detect the stylistic segments in a document. Each segment corresponds to a particular style in the composition of a document. This NLDA can exploit the topic information between documents as well as the word variations within a document.

    From the related work: a dynamic topic model was presented for document representation with time evolution, where current parameters served as the prior information to estimate new parameters for the next time period. Furthermore, a continuous-time dynamic topic model [12] was developed by incorporating a Brownian motion in the dynamic topic model, so that continuous-time topic evolution was fulfilled; sparse variational inference was used to reduce the computational complexity. In [6][7], LDA was merged with the hidden Markov model (HMM) as the HMM-LDA model, where the Markov states were used to discover the syntactic structure of a document. Each word was generated either from the topic or the syntactic state, and the syntactic words and content words were modeled separately. This study also considers the superiority of HMM in ...
  • Large-Scale Hierarchical Topic Models
    Large-Scale Hierarchical Topic Models. Jay Pujara, Department of Computer Science, University of Maryland, College Park, MD 20742; Peter Skomoroch, LinkedIn Corporation, 2029 Stierlin Ct., Mountain View, CA 94043.

    Abstract: In the past decade, a number of advances in topic modeling have produced sophisticated models that are capable of generating hierarchies of topics. One challenge for these models is scalability: they are incapable of working at the massive scale of millions of documents and hundreds of thousands of terms. We address this challenge with a technique that learns a hierarchy of topics by iteratively applying topic models and processing subtrees of the hierarchy in parallel. This approach has a number of scalability advantages compared to existing techniques, and shows promising results in experiments assessing runtime and human evaluations of quality. We detail extensions to this approach that may further improve hierarchical topic modeling for large-scale applications.

    1 Motivation: With massive datasets and corresponding computational resources readily available, the Big Data movement aims to provide deep insights into real-world data. Realizing this goal can require new approaches to well-studied problems. Complex models that, for example, incorporate many dependencies between parameters have alluring results for small datasets and single machines but are difficult to adapt to the Big Data paradigm. Topic models are an interesting example of this phenomenon. In the last decade, a number of sophisticated techniques have been developed to model collections of text, from Latent Dirichlet Allocation (LDA) [1] through extensions using statistical machinery such as the nested Chinese Restaurant Process [2][3] and Pachinko Allocation [4].
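    As a rough illustration of the iterative idea described in this excerpt (not the authors' implementation), a hierarchy can be grown by fitting a flat topic model at each node and recursing on the documents assigned to each topic; the `fit_topic_model` and `assign_documents` helpers below are hypothetical placeholders.

```python
def build_topic_hierarchy(documents, depth, num_topics, fit_topic_model, assign_documents):
    """Recursively builds a topic hierarchy by fitting a flat topic model at each
    node and recursing on the documents assigned to each topic.
    `fit_topic_model` and `assign_documents` are hypothetical callables."""
    if depth == 0 or len(documents) < num_topics:
        return {"documents": documents, "children": []}
    model = fit_topic_model(documents, num_topics)      # e.g. plain LDA on this subtree
    groups = assign_documents(model, documents)         # topic index -> list of documents
    children = [
        build_topic_hierarchy(docs, depth - 1, num_topics, fit_topic_model, assign_documents)
        for docs in groups.values()                     # subtrees can be processed in parallel
    ]
    return {"model": model, "children": children}
```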
  • Journal of Applied Sciences Research
    Journal of Applied Sciences Research, ISSN: 1819-544X, EISSN: 1816-157X (American-Eurasian Network for Scientific Information). Home page: http://www.aensiweb.com/JASR. 2015 October; 11(19): pages 50-55. Published online 10 November 2015.

    Research Article: A Survey on Correlation between the Topic and Documents Based on the Pachinko Allocation Model. Dr. C. Sundar (Associate Professor) and V. Sujitha (PG Scholar), Department of Computer Science and Engineering, Christian College of Engineering and Technology, Dindigul, Tamilnadu-624619, India. Received: 23 September 2015; Accepted: 25 October 2015.

    Abstract: Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. In the existing system, a novel information filtering model, the Maximum matched Pattern-based Topic Model (MPBTM), is used. The patterns are generated from the words in the word-based topic representations of a traditional topic model such as the LDA model. This ensures that the patterns can represent the topics well, because these patterns are comprised of the words which are extracted by LDA based on sample occurrence of the words in the documents. The maximum matched patterns, which are the largest patterns in each equivalence class that exist in the received documents, are used to calculate the relevant words that represent topics. However, LDA does not capture correlations between topics and does not find the hidden topics in the document. To deal with this problem, the Pachinko Allocation Model (PAM) is proposed.
  • Incorporating Domain Knowledge in Latent Topic Models
    INCORPORATING DOMAIN KNOWLEDGE IN LATENT TOPIC MODELS, by David Michael Andrzejewski. A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Sciences) at the UNIVERSITY OF WISCONSIN–MADISON, 2010. © Copyright by David Michael Andrzejewski 2010. All Rights Reserved.

    For my parents and my Cho.

    ACKNOWLEDGMENTS: Obviously none of this would have been possible without the diligent advising of Mark Craven and Xiaojin (Jerry) Zhu. Taking a bioinformatics class from Mark as an undergraduate initially got me excited about the power of statistical machine learning to extract insights from seemingly impenetrable datasets. Jerry's enthusiasm for research and relentless pursuit of excellence were a great inspiration for me. On countless occasions, Mark and Jerry have selflessly donated their time and effort to help me develop better research skills and to improve the quality of my work. I would have been lucky to have even a single advisor as excellent as either Mark or Jerry; I have been extremely fortunate to have them both as co-advisors. My committee members have also been indispensable. Jude Shavlik has always brought an emphasis on clear communication and solid experimental technique which will hopefully stay with me for the rest of my career. Michael Newton helped me understand the modeling issues in this research from a statistical perspective. Working with prelim committee member Ben Liblit gave me the exciting opportunity to apply machine learning to a very challenging problem in the debugging work presented in Chapter 4. I also learned a lot about how other computer scientists think from meetings with Ben.
  • WELDA: Enhancing Topic Models by Incorporating Local Word Context
    WELDA: Enhancing Topic Models by Incorporating Local Word Context. Stefan Bunk and Ralf Krestel, Hasso Plattner Institute, Potsdam, Germany.

    ABSTRACT: The distributional hypothesis states that similar words tend to have similar contexts in which they occur. Word embedding models exploit this hypothesis by learning word vectors based on the local context of words. Probabilistic topic models, on the other hand, utilize word co-occurrences across documents to identify topically related words. Due to their complementary nature, these models define different notions of word similarity, which, when combined, can produce better topical representations. In this paper we propose WELDA, a new type of topic model, which combines word embeddings (WE) with latent Dirichlet allocation (LDA) to improve topic quality. We achieve this by estimating topic distributions in the word embedding space and exchanging selected topic words via Gibbs sampling from this space. We present an extensive evaluation showing that WELDA cuts runtime by at least 30% while outperforming other combined approaches with respect to topic coherence and for solving word intrusion tasks.

    Figure 1: Illustration of an LDA topic in the word embedding space. The gray isolines show the learned probability distribution for the topic. Exchanging topic words having low probability in the embedding space (red minuses) with high-probability words (green pluses) leads to a superior LDA topic.

    CCS CONCEPTS: • Information systems → Document topic models; Document collection models; • Applied computing → Document analysis.

    KEYWORDS: topic models, word embeddings, document representations

    … "word by the company it keeps" [11]. In other words, if two words occur in similar contexts, they are similar.
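    To make the excerpt's core idea concrete, here is a loose, illustrative sketch (not the WELDA algorithm itself): fit a Gaussian to a topic's word embeddings, then swap the topic words with the lowest density in embedding space for high-density words from the rest of the vocabulary. The `embeddings` dictionary and the diagonal-Gaussian estimate are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def refine_topic(top_words, embeddings, vocab, num_swaps=3):
    """Loose illustration of exchanging topic words in embedding space
    (not the WELDA algorithm itself). `embeddings` is a hypothetical dict
    mapping each word to its embedding vector."""
    vecs = np.stack([embeddings[w] for w in top_words])
    # Fit a diagonal Gaussian to the topic's word embeddings.
    topic_dist = multivariate_normal(mean=vecs.mean(axis=0),
                                     cov=np.diag(vecs.var(axis=0) + 1e-3))
    score = lambda w: topic_dist.logpdf(embeddings[w])
    drop = sorted(top_words, key=score)[:num_swaps]                # low-probability words ("red minuses")
    candidates = [w for w in vocab if w not in top_words]
    add = sorted(candidates, key=score, reverse=True)[:num_swaps]  # high-probability words ("green pluses")
    return [w for w in top_words if w not in drop] + add
```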
  • Extração De Informação Semântica De Conteúdo Da Web 2.0
    Master's in Informatics Engineering, dissertation, final report: "Extração de Informação Semântica de Conteúdo da Web 2.0" (Extraction of Semantic Information from Web 2.0 Content). Ana Rita Bento Carvalheira. Advisor: Paulo Jorge de Sousa Gomes. Date: 1 July 2014.

    Acknowledgements: I would like to begin by thanking Professor Paulo Gomes for his professionalism and unconditional support, for his sincere friendship, and for the complete availability he showed throughout the year. His support was not only decisive for the preparation of this thesis; it also always motivated me to want to learn more and to do better. To my grandmother Maria and my grandfather Francisco, for always being there when I needed them, for their care and affection, and for all the effort they made so that I never lacked anything. I hope one day to be able to repay, in some way, everything they have done for me. To my parents, for the lessons and values they passed on, for everything they have given me, and for the availability and dedication they constantly offer me. Everything I am, I owe to you. To David, I am grateful for all the help and understanding throughout the year, for all the care and support shown in all my decisions, and for always encouraging me to follow my dreams. I admire you above all for your competence and humility, and for the strength and confidence you give me at every moment.

    Abstract: The massive proliferation of blogs and social networks has made user-generated content, present on platforms such as Twitter or Facebook, extremely valuable because of the amount of information that can be extracted and explored from it.
  • An Unsupervised Topic Segmentation Model Incorporating Word Order
    An Unsupervised Topic Segmentation Model Incorporating Word Order. Shoaib Jameel and Wai Lam, The Chinese University of Hong Kong (SIGIR 2013, Dublin, Ireland).

    One-line summary of the work: we will see how maintaining the document structure, such as paragraphs, sentences, and the word order, helps improve the performance of a topic model.

    Outline: Motivation; Related Work (the probabilistic unigram topic model (LDA), probabilistic n-gram topic models, topic segmentation models); Overview of our model (our n-gram topic segmentation model, NTSeg); Text mining experiments with NTSeg (word-topic and segment-topic correlation graph, topic segmentation experiment, document classification experiment, document likelihood experiment); Conclusions and Future Directions.

    Motivation: Many works in the topic modeling literature assume exchangeability among the words. As a result we see many ambiguous words in topics. For example, consider a few topics obtained from the NIPS collection using the Latent Dirichlet Allocation (LDA) model:

    Topic 1: architecture, recurrent, network, module, modules
    Topic 2: order, first, second, analysis, small
    Topic 3: connectionist, role, binding, structures, distributed
    Topic 4: potential, membrane, current, synaptic, dendritic
    Topic 5: prior, bayesian, data, evidence, experts

    The problem with the LDA model: words in topics are not insightful.
  • A New Method and Application for Studying Political Text in Multiple Languages
    The Pennsylvania State University, The Graduate School. THE RADICAL RIGHT IN PARLIAMENT: A NEW METHOD AND APPLICATION FOR STUDYING POLITICAL TEXT IN MULTIPLE LANGUAGES. A Dissertation in Political Science by Mitchell Goist, submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, May 2020.

    The dissertation of Mitchell Goist was reviewed and approved by the following: Burt L. Monroe, Liberal Arts Professor of Political Science, Dissertation Advisor, Chair of Committee; Bruce Desmarais, Associate Professor of Political Science; Matt Golder, Professor of Political Science; Sarah Rajtmajer, Assistant Professor of Information Science and Technology; Glenn Palmer, Professor of Political Science and Director of Graduate Studies.

    ABSTRACT: Since a new wave of radical right support in the early 1980s, scholars have sought to understand the motivations and programmatic appeals of far-right parties. However, due to their small size and dearth of data, existing methodological approaches did not allow the direct study of these parties' behavior in parliament. Using a collection of parliamentary speeches from the United Kingdom, Germany, Spain, Italy, the Netherlands, Finland, Sweden, and the Czech Republic, Chapter 1 of this dissertation addresses this problem by developing a new model for the study of political text in multiple languages. Using this new method allows the construction of a shared issue space in which each party is embedded regardless of the language spoken in the speech or the country of origin. Chapter 2 builds on this new method by explicating the ideological appeals of radical right parties. It finds that in some instances radical right parties behave similarly to mainstream, center-right parties, but distinguish themselves by a focus on individual crime and an emphasis on negative rhetorical frames.
  • Generative Topic Embedding: a Continuous Representation of Documents
    Generative Topic Embedding: a Continuous Representation of Documents. Shaohua Li (1,2), Tat-Seng Chua (1), Jun Zhu (3), Chunyan Miao (2). 1. School of Computing, National University of Singapore; 2. Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (LILY); 3. Department of Computer Science and Technology, Tsinghua University.

    Abstract: Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative topic embedding model to combine the two types of patterns. In our model, topics are represented by embedding vectors, and are shared across documents.

    … as continuous vectors in a low-dimensional embedding space (Bengio et al., 2003; Mikolov et al., 2013; Pennington et al., 2014; Levy et al., 2015). The learned embedding for a word encodes its semantic/syntactic relatedness with other words, by utilizing local word collocation patterns. In each method, one core component is the embedding link function, which predicts a word's distribution given its context words, parameterized by their embeddings. When it comes to documents, we wish to find a method to encode their overall semantics. Given the embeddings of each word in a document, we can imagine the document as a "bag-of-vectors". Related words in the document point in similar di…
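    As an illustration of the "embedding link function" mentioned in this excerpt, the following sketch predicts a word distribution from the average of the context-word embeddings via a softmax over dot products; it is a generic construction, not the specific link function of the paper.

```python
import numpy as np

def embedding_link(context_words, word_vectors, vocab_vectors):
    """Generic embedding link function (illustrative): predict a distribution
    over the vocabulary from the average of the context-word embeddings,
    via a softmax over dot products. `word_vectors` maps words to vectors;
    `vocab_vectors` is a (V, d) matrix of all word embeddings."""
    context = np.mean([word_vectors[w] for w in context_words], axis=0)
    scores = vocab_vectors @ context          # one score per vocabulary word
    scores -= scores.max()                    # for numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()
```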
  • Linked Data Triples Enhance Document Relevance Classification
    Applied Sciences, Article: Linked Data Triples Enhance Document Relevance Classification. Dinesh Nagumothu (corresponding author), Peter W. Eklund, Bahadorreza Ofoghi and Mohamed Reda Bouadjenek, School of Information Technology, Deakin University, Geelong, VIC 3220, Australia.

    Abstract: Standardized approaches to relevance classification in information retrieval use generative statistical models to identify the presence or absence of certain topics that might make a document relevant to the searcher. These approaches have been used to better predict relevance on the basis of what the document is "about", rather than a simple-minded analysis of the bag of words contained within the document. In more recent times, this idea has been extended by using pre-trained deep learning models and text representations, such as GloVe or BERT. These use an external corpus as a knowledge base that conditions the model to help predict what a document is about. This paper adopts a hybrid approach that leverages the structure of knowledge embedded in a corpus. In particular, the paper reports on experiments where linked data triples (subject-predicate-object), constructed from natural language elements, are derived from deep learning. These are evaluated as additional latent semantic features for a relevant-document classifier in a customized news-feed website. The research is a synthesis of current thinking in deep learning models in NLP and information retrieval and the predicate structure used in semantic web research. Our experiments indicate that linked data triples increased the F-score of the baseline GloVe representations by 6% …
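    As a schematic illustration of the idea in this excerpt (not the authors' pipeline), triple-derived features can simply be concatenated with the baseline document representation before training a relevance classifier; the feature construction below is an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_relevance_classifier(doc_vectors, triple_vectors, labels):
    """Illustrative only: concatenate baseline document features (e.g. averaged
    GloVe vectors) with features derived from linked-data triples, then train
    a simple relevance classifier."""
    features = np.hstack([doc_vectors, triple_vectors])   # (n_docs, d1 + d2)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)                             # labels: relevant / not relevant
    return clf
```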
  • Document and Topic Models: pLSA
    Document and Topic Models: pLSA and LDA. Andrew Levandoski and Jonathan Lobo, CS 3750 Advanced Topics in Machine Learning, 2 October 2018.

    Outline: Topic Models; pLSA (LSA, model, fitting via EM, pHITS: link analysis); LDA (Dirichlet distribution, generative process, model, geometric interpretation, inference).

    Topic Models: Visual Representation (figure: topics, topic proportions and topic assignments across documents).

    Topic Models: Importance. For a given corpus, we learn two things: (1) topics: from the full vocabulary set, we learn important subsets; (2) topic proportions: we learn what each document is about. This can be viewed as a form of dimensionality reduction: from a large vocabulary set, we extract basis vectors (topics) and represent each document in topic space (topic proportions). Dimensionality is reduced from w ∈ Z_V^N (N words drawn from a vocabulary of size V) to θ ∈ R^K. Topic proportions are useful for several applications, including document classification, discovery of semantic structures, sentiment analysis, object localization in images, etc.

    Topic Models: Terminology. Document model: a word is an element in a vocabulary set; a document is a collection of words; a corpus is a collection of documents. Topic model: a topic is a collection of words (a subset of the vocabulary); a document is represented by a (latent) mixture of topics, p(w|d) = Σ_z p(w|z) p(z|d), where z denotes a topic. Note that a document is treated as a collection of words, not a sequence: this is the "bag of words" assumption, which in probability terms is the exchangeability assumption, p(w_1, …, w_N) = p(w_{σ(1)}, …, w_{σ(N)}) for any permutation σ. We represent each document in a vector space; a word is an item from a vocabulary indexed by {1, …, V}, and words are represented using unit-basis vectors.
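    As a small worked example of the mixture decomposition p(w|d) = Σ_z p(w|z) p(z|d) above, with made-up numbers:

```python
import numpy as np

# Illustrative numbers: 3 topics, a 5-word vocabulary, one document.
p_w_given_z = np.array([                  # rows: topics z, columns: words w
    [0.5, 0.2, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.6, 0.1, 0.1],
    [0.2, 0.2, 0.2, 0.2, 0.2],
])
p_z_given_d = np.array([0.7, 0.2, 0.1])   # topic proportions of the document

# p(w|d) = sum_z p(w|z) p(z|d)
p_w_given_d = p_z_given_d @ p_w_given_z
print(p_w_given_d)                        # [0.39 0.18 0.21 0.11 0.11], sums to 1
```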