Probabilistic Topic Modelling with Semantic Graph
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Semantic Memory: a Review of Methods, Models, and Current Challenges
Psychonomic Bulletin & Review https://doi.org/10.3758/s13423-020-01792-x Semantic memory: A review of methods, models, and current challenges Abhilasha A. Kumar1 # The Psychonomic Society, Inc. 2020 Abstract Adult semantic memory has been traditionally conceptualized as a relatively static memory system that consists of knowledge about the world, concepts, and symbols. Considerable work in the past few decades has challenged this static view of semantic memory, and instead proposed a more fluid and flexible system that is sensitive to context, task demands, and perceptual and sensorimotor information from the environment. This paper (1) reviews traditional and modern computational models of seman- tic memory, within the umbrella of network (free association-based), feature (property generation norms-based), and distribu- tional semantic (natural language corpora-based) models, (2) discusses the contribution of these models to important debates in the literature regarding knowledge representation (localist vs. distributed representations) and learning (error-free/Hebbian learning vs. error-driven/predictive learning), and (3) evaluates how modern computational models (neural network, retrieval- based, and topic models) are revisiting the traditional “static” conceptualization of semantic memory and tackling important challenges in semantic modeling such as addressing temporal, contextual, and attentional influences, as well as incorporating grounding and compositionality into semantic representations. The review also identifies new challenges -
Employee Matching Using Machine Learning Methods
Master of Science in Computer Science May 2019 Employee Matching Using Machine Learning Methods Sumeesha Marakani Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full time studies. The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree. Contact Information: Author(s): Sumeesha Marakani E-mail: [email protected] University advisor: Prof. Veselka Boeva Department of Computer Science External advisors: Lars Tornberg [email protected] Daniel Lundgren [email protected] Faculty of Computing Internet : www.bth.se Blekinge Institute of Technology Phone : +46 455 38 50 00 SE–371 79 Karlskrona, Sweden Fax : +46 455 38 50 57 Abstract Background. Expertise retrieval is an information retrieval technique that focuses on techniques to identify the most suitable ’expert’ for a task from a list of individ- uals. Objectives. This master thesis is a collaboration with Volvo Cars to attempt ap- plying this concept and match employees based on information that was extracted from an internal tool of the company. In this tool, the employees describe themselves in free flowing text. This text is extracted from the tool and analyzed using Natural Language Processing (NLP) techniques. -
Matrix Decompositions and Latent Semantic Indexing
Online edition (c)2009 Cambridge UP DRAFT! © April 1, 2009 Cambridge University Press. Feedback welcome. 403 Matrix decompositions and latent 18 semantic indexing On page 123 we introduced the notion of a term-document matrix: an M N matrix C, each of whose rows represents a term and each of whose column× s represents a document in the collection. Even for a collection of modest size, the term-document matrix C is likely to have several tens of thousands of rows and columns. In Section 18.1.1 we first develop a class of operations from linear algebra, known as matrix decomposition. In Section 18.2 we use a special form of matrix decomposition to construct a low-rank approximation to the term-document matrix. In Section 18.3 we examine the application of such low-rank approximations to indexing and retrieving documents, a technique referred to as latent semantic indexing. While latent semantic in- dexing has not been established as a significant force in scoring and ranking for information retrieval, it remains an intriguing approach to clustering in a number of domains including for collections of text documents (Section 16.6, page 372). Understanding its full potential remains an area of active research. Readers who do not require a refresher on linear algebra may skip Sec- tion 18.1, although Example 18.1 is especially recommended as it highlights a property of eigenvalues that we exploit later in the chapter. 18.1 Linear algebra review We briefly review some necessary background in linear algebra. Let C be an M N matrix with real-valued entries; for a term-document matrix, all × RANK entries are in fact non-negative. -
Knowledge Graphs on the Web – an Overview Arxiv:2003.00719V3 [Cs
January 2020 Knowledge Graphs on the Web – an Overview Nicolas HEIST, Sven HERTLING, Daniel RINGLER, and Heiko PAULHEIM Data and Web Science Group, University of Mannheim, Germany Abstract. Knowledge Graphs are an emerging form of knowledge representation. While Google coined the term Knowledge Graph first and promoted it as a means to improve their search results, they are used in many applications today. In a knowl- edge graph, entities in the real world and/or a business domain (e.g., people, places, or events) are represented as nodes, which are connected by edges representing the relations between those entities. While companies such as Google, Microsoft, and Facebook have their own, non-public knowledge graphs, there is also a larger body of publicly available knowledge graphs, such as DBpedia or Wikidata. In this chap- ter, we provide an overview and comparison of those publicly available knowledge graphs, and give insights into their contents, size, coverage, and overlap. Keywords. Knowledge Graph, Linked Data, Semantic Web, Profiling 1. Introduction Knowledge Graphs are increasingly used as means to represent knowledge. Due to their versatile means of representation, they can be used to integrate different heterogeneous data sources, both within as well as across organizations. [8,9] Besides such domain-specific knowledge graphs which are typically developed for specific domains and/or use cases, there are also public, cross-domain knowledge graphs encoding common knowledge, such as DBpedia, Wikidata, or YAGO. [33] Such knowl- edge graphs may be used, e.g., for automatically enriching data with background knowl- arXiv:2003.00719v3 [cs.AI] 12 Mar 2020 edge to be used in knowledge-intensive downstream applications. -
Large Semantic Network Manual Annotation 1 Introduction
Large Semantic Network Manual Annotation V´aclav Nov´ak Institute of Formal and Applied Linguistics Charles University, Prague [email protected] Abstract This abstract describes a project aiming at manual annotation of the content of natural language utterances in a parallel text corpora. The formalism used in this project is MultiNet – Multilayered Ex- tended Semantic Network. The annotation should be incorporated into Prague Dependency Treebank as a new annotation layer. 1 Introduction A formal specification of the semantic content is the aim of numerous semantic approaches such as TIL [6], DRT [9], MultiNet [4], and others. As far as we can tell, there is no large “real life” text corpora manually annotated with such markup. The projects usually work only with automatically generated annotation, if any [1, 6, 3, 2]. We want to create a parallel Czech-English corpora of texts annotated with the corresponding semantic network. 1.1 Prague Dependency Treebank From the linguistic viewpoint there language resources such as Prague Dependency Treebank (PDT) which contain a deep manual analysis of texts [8]. PDT contains annotations of three layers, namely morpho- logical, analytical (shallow dependency syntax) and tectogrammatical (deep dependency syntax). The units of each annotation level are linked with corresponding units on the preceding level. The morpho- logical units are linked directly with the original text. The theoretical basis of the treebank lies in the Functional Gener- ative Description of language system [7]. PDT 2.0 is based on the long-standing Praguian linguistic tradi- tion, adapted for the current computational-linguistics research needs. The corpus itself is embedded into the latest annotation technology. -
A Hierarchical Clustering Approach for Dbpedia Based Contextual Information of Tweets
Journal of Computer Science Original Research Paper A Hierarchical Clustering Approach for DBpedia based Contextual Information of Tweets 1Venkatesha Maravanthe, 2Prasanth Ganesh Rao, 3Anita Kanavalli, 2Deepa Shenoy Punjalkatte and 4Venugopal Kuppanna Rajuk 1Department of Computer Science and Engineering, VTU Research Resource Centre, Belagavi, India 2Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bengaluru, India 3Department of Computer Science and Engineering, Ramaiah Institute of Technology, Bengaluru, India 4Department of Computer Science and Engineering, Bangalore University, Bengaluru, India Article history Abstract: The past decade has seen a tremendous increase in the adoption of Received: 21-01-2020 Social Web leading to the generation of enormous amount of user data every Revised: 12-03-2020 day. The constant stream of tweets with an innate complex sentimental and Accepted: 21-03-2020 contextual nature makes searching for relevant information a herculean task. Multiple applications use Twitter for various domain sensitive and analytical Corresponding Authors: Venkatesha Maravanthe use-cases. This paper proposes a scalable context modeling framework for a Department of Computer set of tweets for finding two forms of metadata termed as primary and Science and Engineering, VTU extended contexts. Further, our work presents a hierarchical clustering Research Resource Centre, approach to find hidden patterns by using generated primary and extended Belagavi, India contexts. Ontologies from DBpedia are used for generating primary contexts Email: [email protected] and subsequently to find relevant extended contexts. DBpedia Spotlight in conjunction with DBpedia Ontology forms the backbone for this proposed model. We consider both twitter trend and stream data to demonstrate the application of these contextual parts of information appropriate in clustering. -
Universal Or Variation? Semantic Networks in English and Chinese
Universal or variation? Semantic networks in English and Chinese Understanding the structures of semantic networks can provide great insights into lexico- semantic knowledge representation. Previous work reveals small-world structure in English, the structure that has the following properties: short average path lengths between words and strong local clustering, with a scale-free distribution in which most nodes have few connections while a small number of nodes have many connections1. However, it is not clear whether such semantic network properties hold across human languages. In this study, we investigate the universal structures and cross-linguistic variations by comparing the semantic networks in English and Chinese. Network description To construct the Chinese and the English semantic networks, we used Chinese Open Wordnet2,3 and English WordNet4. The two wordnets have different word forms in Chinese and English but common word meanings. Word meanings are connected not only to word forms, but also to other word meanings if they form relations such as hypernyms and meronyms (Figure 1). 1. Cross-linguistic comparisons Analysis The large-scale structures of the Chinese and the English networks were measured with two key network metrics, small-worldness5 and scale-free distribution6. Results The two networks have similar size and both exhibit small-worldness (Table 1). However, the small-worldness is much greater in the Chinese network (σ = 213.35) than in the English network (σ = 83.15); this difference is primarily due to the higher average clustering coefficient (ACC) of the Chinese network. The scale-free distributions are similar across the two networks, as indicated by ANCOVA, F (1, 48) = 0.84, p = .37. -
Query-Based Multi-Document Summarization by Clustering of Documents
Query-based Multi-Document Summarization by Clustering of Documents Naveen Gopal K R Prema Nedungadi Dept. of Computer Science and Engineering Amrita CREATE Amrita Vishwa Vidyapeetham Dept. of Computer Science and Engineering Amrita School of Engineering Amrita Vishwa Vidyapeetham Amritapuri, Kollam -690525 Amrita School of Engineering [email protected] Amritapuri, Kollam -690525 [email protected] ABSTRACT based Hierarchical Agglomerative clustering, k-means, Clus- Information Retrieval (IR) systems such as search engines tering Gain retrieve a large set of documents, images and videos in re- sponse to a user query. Computational methods such as Au- 1. INTRODUCTION tomatic Text Summarization (ATS) reduce this information Automatic Text Summarization [12] reduces the volume load enabling users to find information quickly without read- of information by creating a summary from one or more ing the original text. The challenges to ATS include both the text documents. The focus of the summary may be ei- time complexity and the accuracy of summarization. Our ther generic, which captures the important semantic con- proposed Information Retrieval system consists of three dif- cepts of the documents, or query-based, which captures the ferent phases: Retrieval phase, Clustering phase and Sum- sub-concepts of the user query and provides personalized marization phase. In the Clustering phase, we extend the and relevant abstracts based on the matching between input Potential-based Hierarchical Agglomerative (PHA) cluster- query and document collection. Currently, summarization ing method to a hybrid PHA-ClusteringGain-K-Means clus- has gained research interest due to the enormous information tering approach. Our studies using the DUC 2002 dataset load generated particularly on the web including large text, show an increase in both the efficiency and accuracy of clus- audio, and video files etc. -
Structure at Every Scale: a Semantic Network Account of the Similarities Between Very Unrelated Concepts
De Deyne, S., Navarro D. J., Perfors, A. and Storms, G. (2016). Structure at every scale: A semantic network account of the similarities between very unrelated concepts. Journal of Experimental Psychology: General, 145, 1228-1254 https://doi.org/10.1037/xge0000192 Structure at every scale: A semantic network account of the similarities between unrelated concepts Simon De Deyne, Danielle J. Navarro, Amy Perfors University of Adelaide Gert Storms University of Leuven Word Count: 19586 Abstract Similarity plays an important role in organizing the semantic system. However, given that similarity cannot be defined on purely logical grounds, it is impor- tant to understand how people perceive similarities between different entities. Despite this, the vast majority of studies focus on measuring similarity between very closely related items. When considering concepts that are very weakly re- lated, little is known. In this paper we present four experiments showing that there are reliable and systematic patterns in how people evaluate the similari- ties between very dissimilar entities. We present a semantic network account of these similarities showing that a spreading activation mechanism defined over a word association network naturally makes correct predictions about weak sim- ilarities and the time taken to assess them, whereas, though simpler, models based on direct neighbors between word pairs derived using the same network cannot. Keywords: word associations, similarity, semantic networks, random walks. This work was supported by a research grant funded by the Research Foundation - Flanders (FWO), ARC grant DE140101749 awarded to the first author, and by the interdisciplinary research project IDO/07/002 awarded to Dirk Speelman, Dirk Geeraerts, and Gert Storms. -
Latent Semantic Analysis for Text-Based Research
Behavior Research Methods, Instruments, & Computers 1996, 28 (2), 197-202 ANALYSIS OF SEMANTIC AND CLINICAL DATA Chaired by Matthew S. McGlone, Lafayette College Latent semantic analysis for text-based research PETER W. FOLTZ New Mexico State University, Las Cruces, New Mexico Latent semantic analysis (LSA) is a statistical model of word usage that permits comparisons of se mantic similarity between pieces of textual information. This papersummarizes three experiments that illustrate how LSA may be used in text-based research. Two experiments describe methods for ana lyzinga subject's essay for determining from what text a subject learned the information and for grad ing the quality of information cited in the essay. The third experiment describes using LSAto measure the coherence and comprehensibility of texts. One of the primary goals in text-comprehension re A theoretical approach to studying text comprehen search is to understand what factors influence a reader's sion has been to develop cognitive models ofthe reader's ability to extract and retain information from textual ma representation ofthe text (e.g., Kintsch, 1988; van Dijk terial. The typical approach in text-comprehension re & Kintsch, 1983). In such a model, semantic information search is to have subjects read textual material and then from both the text and the reader's summary are repre have them produce some form of summary, such as an sented as sets ofsemantic components called propositions. swering questions or writing an essay. This summary per Typically, each clause in a text is represented by a single mits the experimenter to determine what information the proposition. -
Untangling Semantic Similarity: Modeling Lexical Processing Experiments with Distributional Semantic Models
Untangling Semantic Similarity: Modeling Lexical Processing Experiments with Distributional Semantic Models. Farhan Samir Barend Beekhuizen Suzanne Stevenson Department of Computer Science Department of Language Studies Department of Computer Science University of Toronto University of Toronto, Mississauga University of Toronto ([email protected]) ([email protected]) ([email protected]) Abstract DSMs thought to instantiate one but not the other kind of sim- Distributional semantic models (DSMs) are substantially var- ilarity have been found to explain different priming effects on ied in the types of semantic similarity that they output. Despite an aggregate level (as opposed to an item level). this high variance, the different types of similarity are often The underlying conception of semantic versus associative conflated as a monolithic concept in models of behavioural data. We apply the insight that word2vec’s representations similarity, however, has been disputed (Ettinger & Linzen, can be used for capturing both paradigmatic similarity (sub- 2016; McRae et al., 2012), as has one of the main ways in stitutability) and syntagmatic similarity (co-occurrence) to two which it is operationalized (Gunther¨ et al., 2016). In this pa- sets of experimental findings (semantic priming and the effect of semantic neighbourhood density) that have previously been per, we instead follow another distinction, based on distri- modeled with monolithic conceptions of DSM-based seman- butional properties (e.g. Schutze¨ & Pedersen, 1993), namely, tic similarity. Using paradigmatic and syntagmatic similarity that between syntagmatically related words (words that oc- based on word2vec, we show that for some tasks and types of items the two types of similarity play complementary ex- cur in each other’s near proximity, such as drink–coffee), planatory roles, whereas for others, only syntagmatic similar- and paradigmatically related words (words that can be sub- ity seems to matter. -
How Latent Is Latent Semantic Analysis?
How Latent is Latent Semantic Analysis? Peter Wiemer-Hastings University of Memphis Department of Psychology Campus Box 526400 Memphis TN 38152-6400 [email protected]* Abstract In the late 1980's, a group at Bellcore doing re- search on information retrieval techniques developed a Latent Semantic Analysis (LSA) is a statisti• statistical, corpus-based method for retrieving texts. cal, corpus-based text comparison mechanism Unlike the simple techniques which rely on weighted that was originally developed for the task of matches of keywords in the texts and queries, their information retrieval, but in recent years has method, called Latent Semantic Analysis (LSA), cre• produced remarkably human-like abilities in a ated a high-dimensional, spatial representation of a cor• variety of language tasks. LSA has taken the pus and allowed texts to be compared geometrically. Test of English as a Foreign Language and per• In the last few years, several researchers have applied formed as well as non-native English speakers this technique to a variety of tasks including the syn• who were successful college applicants. It has onym section of the Test of English as a Foreign Lan• shown an ability to learn words at a rate sim• guage [Landauer et al., 1997], general lexical acquisi• ilar to humans. It has even graded papers as tion from text [Landauer and Dumais, 1997], selecting reliably as human graders. We have used LSA texts for students to read [Wolfe et al., 1998], judging as a mechanism for evaluating the quality of the coherence of student essays [Foltz et a/., 1998], and student responses in an intelligent tutoring sys• the evaluation of student contributions in an intelligent tem, and its performance equals that of human tutoring environment [Wiemer-Hastings et al., 1998; raters with intermediate domain knowledge.