Knowledge Graphs and Knowledge Graph Embeddings

Total Page:16

File Type:pdf, Size:1020Kb

Knowledge Graphs and Knowledge Graph Embeddings St. Cloud State University theRepository at St. Cloud State Culminating Projects in Computer Science and Department of Computer Science and Information Technology Information Technology 5-2020 Knowledge Graphs and Knowledge Graph Embeddings Catherine Tschida [email protected] Follow this and additional works at: https://repository.stcloudstate.edu/csit_etds Recommended Citation Tschida, Catherine, "Knowledge Graphs and Knowledge Graph Embeddings" (2020). Culminating Projects in Computer Science and Information Technology. 32. https://repository.stcloudstate.edu/csit_etds/32 This Starred Paper is brought to you for free and open access by the Department of Computer Science and Information Technology at theRepository at St. Cloud State. It has been accepted for inclusion in Culminating Projects in Computer Science and Information Technology by an authorized administrator of theRepository at St. Cloud State. For more information, please contact [email protected]. 1 Knowledge Graphs and Knowledge Graph Embeddings by Catherine Tschida A Starred Paper Submitted to the Graduate Faculty of St. Cloud State University in Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science May, 2020 Starred Paper Committee: Andrew Anda, Chairperson Richard Sundheim Bryant Julstrom 2 Abstract Knowledge graphs provide machines with structured knowledge of the world. Structured, machine-readable knowledge is necessary for a wide variety of artificial intelligence tasks such as search, translation, and recommender systems. These knowledge graphs can be embedded into a dense matrix representation for easier usage and storage. We first discuss knowledge graph components and knowledge base population to provide the necessary background knowledge. We then discuss popular methods of embedding knowledge graphs in chronological order. Lastly, we cover how knowledge graph embeddings improve both knowledge base population and a variety of artificial intelligence tasks. 3 Table of Contents Page List of Tables .......................................................................................................... 6 List of Figures ......................................................................................................... 7 List of Equations ..................................................................................................... 8 Chapter 1. Introduction .................................................................................................. 9 2. Knowledge Graph Components ................................................................... 13 2.1 Entity ................................................................................................... 13 2.2 RDF ..................................................................................................... 15 2.3 Different Knowledge Graphs ............................................................... 17 2.4 Adding Additional Information to the RDF Structure ............................ 20 2.5 Section Summary ................................................................................ 22 3. Knowledge Base Population ........................................................................ 24 3.1 Supervised Learning ............................................................................ 24 3.2 Syntactical Preprocessing Steps ......................................................... 25 3.3 Entity Disambiguation .......................................................................... 26 3.4 Relation Extraction .............................................................................. 27 3.5 Long Short-Term Memory (>STM) ...................................................... 28 3.6 Categorical Data and Keras ................................................................ 31 3.7 Preventing Overfit ................................................................................ 31 3.8 Noise in Extracting Data from Unstructured Text ................................. 32 3.9 Judging Accuracy of Techniques ......................................................... 32 4 Chapter Page 4. Embedding Background Knowledge ............................................................ 34 4.1 Initial Input ........................................................................................... 34 4.2 Relationship Categories ...................................................................... 35 4.3 Negative Sampling .............................................................................. 36 4.4 Loss Functions .................................................................................... 36 5. Embedding Types ........................................................................................ 39 5.1 RESCAL .............................................................................................. 40 5.2 Word2Vec Embeddings ....................................................................... 41 5.3 TransE Embeddings ............................................................................ 42 5.4 TransM Embeddings ........................................................................... 44 5.5 HolE Embeddings ................................................................................ 45 5.6 ComplEx Embeddings ......................................................................... 46 5.7 Spectrally Trained HolE ....................................................................... 48 6. The Use of Embeddings .............................................................................. 50 6.1 Abbreviation Disambiguation ............................................................... 50 6.2 Relation Extraction .............................................................................. 50 6.3 Link Prediction / Reasoning ................................................................. 51 6.4 Classifying Entities as Instances of a Class ........................................ 51 6.5 Language Translation .......................................................................... 51 6.6 Recommender Systems ...................................................................... 52 6.7 Question Answering ............................................................................ 52 7. Conclusion ................................................................................................... 53 5 Chapter Page References ........................................................................................................ 55 6 List of Tables Table Page 2.1 Date and creation method of knowledge graphs ......................................... 17 2.2 A comparison of knowledge graph components .......................................... 18 4.1 Embeddings and their loss functions ........................................................... 38 5.1 Knowledge graph embeddings .................................................................... 39 5.2 Mean hit at 10 for TransE and TransM on freebase15k .............................. 45 5.3 Comparison of TransE, RESCAL, and HolE ................................................ 46 5.4 Comparison of TransE, HolE, and ComplEx ................................................ 48 7 List of Figures Figure Page 2.1 KG showing information on the entity URI: Hamlet ...................................... 14 2.2 Google’s knowledge graph card .................................................................. 15 2.3 DBpedia entry for Michael Schumacher ...................................................... 20 2.4 <Mikołaj of Ściborz, date of death, unknown value> ........ 21 3.1 Information extraction flowchart ................................................................... 25 3.2a LSTM Forget gate .................................................................................... 29 3.2b LSTM Input gate ...................................................................................... 29 3.2c LSTM Output gate ..................................................................................... 29 4.1 Embedding input .......................................................................................... 35 5.1 3CosAdd for a-a'+b≠b' ............................................................................... 42 5.2 TransE embedding with translation vector r ................................................ 43 5.3 Modeling 1:n relations in TransE vs. TransM ............................................... 44 8 List of Equations Equation Page 4.1 Pointwise square error loss ......................................................................... 38 4.2 Pairwise hinge loss ...................................................................................... 38 4.3 Pointwise hinge loss .................................................................................... 38 4.4 Pointwise logistic loss .................................................................................. 38 5.1 Scoring function of RASCAL ........................................................................ 41 5.2 ComplEx scoring function ............................................................................ 47 5.3 The spectrally trained HolE scoring function ............................................... 49 9 Chapter 1: Introduction Knowledge graphs contain knowledge of the world in a format that is usable to computers. A knowledge graph is a directed graph. The nodes of the graph represent named objects such as Abraham Lincoln, concepts such as Gravity, or literal
Recommended publications
  • Ignite Visibility Consulting
    Ignite Visibility Consulting Copyright 2013 – Ignite Visibility Page 1 Introduction .................................................................................................................................................. 3 Your Page ...................................................................................................................................................... 3 Branded URL ............................................................................................................................................. 3 Background and Profile Picture ................................................................................................................ 3 Fill out all Text ........................................................................................................................................... 3 Videos............................................................................................................................................................ 4 Theme your Videos Around One Keyword ............................................................................................... 4 Add a Date, Describe a Location, Link to Social Sites and Links to YouTube Video .................................. 5 Get Video Embeds, +1s, Ratings, Views and Comments Gradually .......................................................... 5 Add Annotations to Videos ........................................................................................................................... 5 Make Videos
    [Show full text]
  • UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL ESCOLA DE ADMINISTRAÇÃO CURSO DE GRADUAÇÃO EM ADMINISTRAÇÃO Eduardo Gessival
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Lume 5.8 0 UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL ESCOLA DE ADMINISTRAÇÃO CURSO DE GRADUAÇÃO EM ADMINISTRAÇÃO Eduardo Gessival Vieira Mendes ESTRATÉGIAS DE SOBREVIVÊNCIA DO PROBLOGGER DIANTE DA SAZONALIDADE DO MARKETING DIGITAL Porto Alegre 2018 1 Eduardo Gessival Vieira Mendes ESTRATÉGIAS DE SOBREVIVÊNCIA DO PROBLOGGER DIANTE DA SAZONALIDADE DO MARKETING DIGITAL Trabalho de conclusão de curso de graduação apresentado ao Departamento de Ciências Administrativas da Universidade Federal do Rio Grande do Sul, como requisito parcial para a obtenção do grau de Bacharel em Administração. Orientador: Prof. Fernando Dias Lopes Porto Alegre 2018 2 AGRADECIMENTOS Em primeiro lugar, agradeço aos meus pais, por terem me transformado na pessoa que hoje eu sou. Agradeço especialmente aos meus filhos, que me deram as forças necessárias para que eu não perdesse o foco dos objetivos, com a motivação de proporcionar a eles uma vida melhor na posteridade, e ao mesmo tempo servir de exemplo para que busquem a formação em nível superior. Agradeço também aos meus colegas e amigos, e principalmente ao meu sócio e grande amigo Luiz Felipe Kessler e Silva, que sempre me incentivou consideravelmente a ir em busca dos meus sonhos, além de compartilharmos da mesma evolução nas nossas vidas pessoais e profissionais. Por fim, agradeço ao professor Fernando Dias Lopes por apostar no meu tema, bem como por todo o seu apoio, dedicação e paciência ao longo da orientação deste Trabalho de Conclusão de Curso. Não esquecendo de todos os demais professores que foram importantíssimos ao longo da minha formação em Administração de Empresas pela Universidade Federal do Rio Grande do Sul.
    [Show full text]
  • Semantic Search
    Semantic Search Philippe Cudre-Mauroux Definition ter grasp the semantics (i.e., meaning) and the context of the user query and/or Semantic Search regroups a set of of the indexed content in order to re- techniques designed to improve tra- trieve more meaningful results. ditional document or knowledge base Semantic Search techniques can search. Semantic Search aims at better be broadly categorized into two main grasping the context and the semantics groups depending on the target content: of the user query and/or of the indexed • techniques improving the relevance content by leveraging Natural Language of classical search engines where the Processing, Semantic Web and Machine query consists of natural language Learning techniques to retrieve more text (e.g., a list of keywords) and relevant results from a search engine. results are a ranked list of documents (e.g., webpages); • techniques retrieving semi-structured Overview data (e.g., entities or RDF triples) from a knowledge base (e.g., a knowledge graph or an ontology) Semantic Search is an umbrella term re- given a user query formulated either grouping various techniques for retriev- as natural language text or using ing more relevant content from a search a declarative query language like engine. Traditional search techniques fo- SPARQL. cus on ranking documents based on a set of keywords appearing both in the user’s Those two groups are described in query and in the indexed content. Se- more detail in the following section. For mantic Search, instead, attempts to bet- each group, a wide variety of techniques 1 2 Philippe Cudre-Mauroux have been proposed, ranging from Natu- matical tags (such as noun, conjunction ral Language Processing (to better grasp or verb) to individual words.
    [Show full text]
  • SEKI@Home, Or Crowdsourcing an Open Knowledge Graph
    SEKI@home, or Crowdsourcing an Open Knowledge Graph Thomas Steiner1? and Stefan Mirea2 1 Universitat Politècnica de Catalunya – Department LSI, Barcelona, Spain [email protected] 2 Computer Science, Jacobs University Bremen, Germany [email protected] Abstract. In May 2012, the Web search engine Google has introduced the so-called Knowledge Graph, a graph that understands real-world en- tities and their relationships to one another. It currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects. Soon after its announce- ment, people started to ask for a programmatic method to access the data in the Knowledge Graph, however, as of today, Google does not provide one. With SEKI@home, which stands for Search for Embedded Knowledge Items,weproposeabrowserextension-basedapproachtocrowdsource the task of populating a data store to build an Open Knowledge Graph. As people with the extension installed search on Google.com, the ex- tension sends extracted anonymous Knowledge Graph facts from Search Engine Results Pages (SERPs) to a centralized, publicly accessible triple store, and thus over time creates a SPARQL-queryable Open Knowledge Graph. We have implemented and made available a prototype browser extension tailored to the Google Knowledge Graph, however, note that the concept of SEKI@home is generalizable for other knowledge bases. 1Introduction 1.1 The Google Knowledge Graph With the introduction of the Knowledge Graph, the search engine Google has made a significant paradigm shift towards “things, not strings” [7], as a post on the official Google blog states. Entities covered by the Knowledge Graph include landmarks, celebrities, cities, sports teams, buildings, movies, celestial objects, works of art, and more.
    [Show full text]
  • Seo Report for Brands July 2019
    VOICE ASSISTANT SEO REPORT FOR BRANDS JULY 2019 GIVING VOICE TO A REVOLUTION Table of Contents About Voicebot About Magic + Co An Introduction to Voice Search Today // 3 Voicebot produces the leading independent Magic + Co. is a next-gen creative services firm research, online publication, newsletter and podcast whose core expertise is in designing, producing, and Consumer Adoption Trends // 10 focused on the voice and AI industries. Thousands executing on conversational strategy, the technology of entrepreneurs, developers, investors, analysts behind it, and associated marketing campaigns. Voice Search Results Analysis // 16 and other industry leaders look to Voicebot each Grading the Experts // 32 week for the latest news, data, analysis and insights Facts defining the trajectory of the next great computing • 25 people, technologists, strategists, and Practical Voice Assistant SEO Strategies // 36 platform. At Voicebot, we give voice to a revolution. creatives with focus on conversational design and technologies. Additional Resources // 39 • 1 Webby Award Nominee and 3 Honorees. • 3 years old and global leader. Methodology • We’ve worked primarily with CPG, but our practice extends into utilities, governments, Consumer survey data was collected online during banking, entertainment, healthcare and is the first week of January 2019 and included 1,038 international in scope. U.S. adults age 18 or older that were representative of U.S. Census demographic averages. Findings were compared to earlier survey data collected in Magic + Co. the first week of September 2018 that included responses from 1,040 U.S. adults. Voice search data results from Amazon Echo Smart Speakers with Alexa, Apple HomePod with Siri, Google Home and a Smartphone with Google Assistant, and Samsung Galaxy S10 with Bixby, were collected in Q2 2019.
    [Show full text]
  • Arxiv:1812.02903V1 [Cs.LG] 7 Dec 2018 Ized Server
    APPLIED FEDERATED LEARNING: IMPROVING GOOGLE KEYBOARD QUERY SUGGESTIONS Timothy Yang*, Galen Andrew*, Hubert Eichner* Haicheng Sun, Wei Li, Nicholas Kong, Daniel Ramage, Franc¸oise Beaufays Google LLC, Mountain View, CA, U.S.A. ftimyang, galenandrew, huberte haicsun, liweithu, kongn, dramage, [email protected] ABSTRACT timely suggestions are necessary in order to maintain rele- vance. On-device inference and training through FL enable Federated learning is a distributed form of machine learn- us to both minimize latency and maximize privacy. ing where both the training data and model training are decen- In this paper, we use FL in a commercial, global-scale tralized. In this paper, we use federated learning in a commer- setting to train and deploy a model to production for infer- cial, global-scale setting to train, evaluate and deploy a model ence – all without access to the underlying user data. Our to improve virtual keyboard search suggestion quality with- use case is search query suggestions [4]: when a user enters out direct access to the underlying user data. We describe our text, Gboard uses a baseline model to determine and possibly observations in federated training, compare metrics to live de- surface search suggestions relevant to the input. For instance, ployments, and present resulting quality increases. In whole, typing “Let’s eat at Charlie’s” may display a web query sug- we demonstrate how federated learning can be applied end- gestion to search for nearby restaurants of that name; other to-end to both improve user experiences and enhance user types of suggestions include GIFs and Stickers. Here, we privacy.
    [Show full text]
  • Knolwedge Acquisi$On from Music Digital Libraries
    Knolwedge Acquision from Music Digital Libraries Sergio Oramas, Mohamed Sordo Mo0vaon • Musicological Knowledge is hidden between the lines • Machines don’t know how to read Why Knowledge Acquisi0on? • Obtain knowledge automacally • Make complex ques0ons • Visualize the informaon • Improve navigaon • Share knowledge 4 Musical Libraries Musical Libraries digital recording, scan Musical Libraries OCR, manual transcrip0on digital recording, scan Musical Libraries Informaon Extrac0on, seman0c annotaon OCR, manual transcrip0on digital recording, scan Musical Libraries Informaon Extrac0on, seman0c annotaon Current DL OCR, manual transcrip0on digital recording, scan Musical Libraries Web search Informaon Extrac0on, seman0c annotaon Current DL OCR, manual transcrip0on digital recording, scan Musical Libraries Web search Informaon Extrac0on, seman0c annotaon Current DL OCR, manual transcrip0on digital recording, scan Seman0c Web • The Semanc Web aims at conver0ng the current web, dominated by unstructured and semi-structured documents into a web of linked data. • Achievements useful for Digital Libraries – Common framework for data representaon and interconnec0on (RDF, ontologies) – Seman0c technologies to annotate texts (En0ty Linking) – Language for complex queries (SPARQL) Wikipedia and DBpedia - Digital Encyclopedia - Knowledge Base - Unstructured - Structured - Keyword search - Query search 13 14 15 DBpedia • Dbpedia example queries – Composers born in Vienna in XVIII Century – American jazz musicians that have wri\en songs recorded by RCA
    [Show full text]
  • Google's Knowledge Graph
    Google’s Knowledge Graph Google has just announced that it will completely reshape the presentation of its search results. Instead of listing websites where people can find the answer to their queries, it will present data, links, pictures, etc. from its own databases, some 500 million “items”. The redesign is said to represent the biggest change in search for the last five years, not just conceptually but also in terms of business impact. The goal, of course, is to make sure that web surfers remain on Google properties instead of clicking away to another site for answers. Internet companies all over the world already complain and regulators can add the move to their list of Google features to investigate from an antitrust perspective. The announcement is also a subtle tactical move, just one day before Facebook’s IPO. It reinforces Google’s positioning as the owner of the “knowledge network” as opposed to the “social network”. It is hard to anticipate what the new ‘product’ will really be but there are a few important pointers. First, it will make connections between the different information elements. At the moment, search provides a list of sources but does not really categorize them. Knowledge Graph will and may even link the sources to one another. People (not least Tim Berners-Lee) have always argued for the, so called ‘semantic web’ to replace the traditional web. Google’s Knowledge Graph is a fine approximation of the idea. Another aspect of the service will, hopefully, have to do with formatting. We may get a more standardized presentation of the information, which is quite important for rapid processing.
    [Show full text]
  • The Rise of KNOWLEDGE GRAPHS
    THOUGHT LEADERSHIP SERIES The Rise of KNOWLEDGE GRAPHS 2019 | MAR THOUGHT LEADERSHIP SERIES | March 2019 2 MAKING NEW CONNECTIONS WITH KNOWLEDGE GRAPHS In today’s business world, time-to-insight and time-to-action are critical competitive differentiators. The demand for quick, easy access to information is growing. However, the challenge of combining data into meaningful information is also mounting—alongside the proliferation of data sources and types. While companies nowadays are investing heavily in initiatives to increase the amount of data at their disposal, most are spending more time finding than analyzing data. Data silos remain a huge problem at many enterprises, and legacy data management technologies and processes are having trouble keeping up with the speed, scalability, and flexibility requirements of new workloads and use cases. Data volumes are exploding in will be driven by specific applications Graph databases, a type of NoSQL organizations and the pressure to exploit and increasing expertise. “NoSQL” data database that employs structures that data for faster time to insight is management approaches, features, and with nodes, edges, and properties to increasing. Along with this store data and places data growth, there is a new a heavy emphasis wave of tools to expand The ability for knowledge graphs to gather on relationships, are what is possible with the growing in use. More proliferation of data and information, relationships, and insights—and recently, the term data types. connect those facts—allows organizations to “knowledge graph” While relational databases in particular has are still both the biggest discern context in data, which is important for gained popularity with source of transactional as well as complying with the introduction of data in most organizations extracting value several high-profile and the greatest source of increasingly stringent data privacy regulations.
    [Show full text]
  • Seo-101-Guide-V7.Pdf
    Copyright 2017 Search Engine Journal. Published by Alpha Brand Media All Rights Reserved. MAKING BUSINESSES VISIBLE Consumer tracking information External link metrics Combine data from your web Uploading backlink data to a crawl analytics. Adding web analytics will also identify non-indexable, redi- data to a crawl will provide you recting, disallowed & broken pages with a detailed gap analysis and being linked to. Do this by uploading enable you to find URLs which backlinks from popular backlinks have generated traffic but A checker tools to track performance of AT aren’t linked to – also known as S D BA the most link to content on your site. C CK orphans. TI L LY IN A K N D A A T B A E W O R G A A T N A I D C E S IL Search Analytics E F Crawler requests A G R O DeepCrawl’s Advanced Google CH L Integrate summary data from any D Search Console Integration ATA log file analyser tool into your allows you to connect technical crawl. Integrating log file data site performance insights with enables you to discover the pages organic search information on your site that are receiving from Google Search Console’s attention from search engine bots Search Analytics report. as well as the frequency of these requests. Monitor site health Improve your UX Migrate your site Support mobile first Unravel your site architecture Store historic data Internationalization Complete competition Analysis [email protected] +44 (0) 207 947 9617 +1 929 294 9420 @deepcrawl Free trail at: https://www.deepcrawl.com/free-trial Table of Contents 9 Chapter 1: 20
    [Show full text]
  • Summarized by © Lakhasly.Com Real-Time Data Feeds Although It
    Real-time Data Feeds Although it doesn’t promote itself as such, Google is actually a collection of data and a set of tools for working with it. It has progressed from an index of web pages to a central hub for real-time data feeds on just about anything that can be measured such as weather reports, travel reports, stock market and shares, shopping suggestions, travel suggestions, and several other things. Sorting Tools Big Data analysis which implies utilizing tools intended to deal with and comprehend this massive data becomes an integral factor whenever users carry out a search query. The Google’s algorithms run complex calculations intended to match the questions that user entered with all the available data. It will try to determine whether the user is searching for news, people, facts or statistics, and retrieve the data from the appropriate feed. Knowledge Graph Pages Google Knowledge Graph is a tool or database which collects all the data and facts about people, places and things along with proper differentiation and relationship between them. It is then later used by Google in solving our queries with useful answers. Google knowledge graph is user-centric and it provides them with useful relevant information quickly and easily. Literal & Semantic search The main aim of the literal search engine is to find the root of your search phrase by looking for a match for some of the word or entire phrase. The root of the phrase is then examined and explored upon to display better search results. While semantic search engine tries to understand the context of the phrase by analyzing the terms and language in knowledge graph database to directly answer a question with specific information.
    [Show full text]
  • Messageontap: a Suggestive Interface to Facilitate Messaging-Related Tasks
    MessageOnTap: A Suggestive Interface to Facilitate Messaging-related Tasks Fanglin Chen Kewei Xia Human-Computer Interaction Institute Electronic Information School Carnegie Mellon University Wuhan University Pittsburgh, PA, USA Wuhan, China [email protected] [email protected] Karan Dhabalia Jason I. Hong Electrical and Computer Engineering Human-Computer Interaction Institute Carnegie Mellon University Carnegie Mellon University Pittsburgh, PA, USA Pittsburgh, PA, USA [email protected] [email protected] ABSTRACT KEYWORDS Text messages are sometimes prompts that lead to infor- Messaging/Communication, Information Seeking & Search; mation related tasks, e.g. checking one’s schedule, creating Contextual Computing; Personal Data/Tracking; User Expe- reminders, or sharing content. We introduce MessageOnTap, rience Design; Text/Speech/Language; Productivity a suggestive interface for smartphones that uses the text in ACM Reference Format: a conversation to suggest task shortcuts that can stream- Fanglin Chen, Kewei Xia, Karan Dhabalia, and Jason I. Hong. 2019. line likely next actions. When activated, MessageOnTap MessageOnTap: A Suggestive Interface to Facilitate Messaging- uses word embeddings to rank relevant external apps, and related Tasks. In CHI Conference on Human Factors in Computing parameterizes associated task shortcuts using key phrases Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland mentioned in the conversation, such as times, persons, or Uk. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/ events. MessageOnTap also tailors the auto-complete dictio- 3290605.3300805 nary based on text in the conversation, to streamline any 1 INTRODUCTION text input. We first conducted a month-long study of mes- Messaging apps are among the most popular apps for smart- saging behaviors (N=22) that informed our design.
    [Show full text]