Aprendizado de Máquina e Inferência em Grafos de Conhecimento Machine Learning and Inference in Knowledge Graphs

Daniel N. R. da Silva Artur Ziviani Fabio Porto dramos,ziviani,[email protected] Laboratório Nacional de Computação Científica (LNCC)

Simpósio Brasileiro de Bancos de Dados (SBBD), Outubro de 2019 Big Data Management Computational Reproducibility Knowledge Bases Machine Learning Network Science Scientific Workflows http://dexl.lncc.br/ Outline

• Introduction and Motivation • Applications • Data Models and Systems • Relational Machine Learning • Knowledge Graph Embeddings • Events, Softwares, and Datasets • Open Research

3 Introduction and Motivation The third (current) rise of AI

Custom Drug Image Board Games Relationship Discovery and Captioning Management Toxicology

Natural Medical Image Medical Online Language Analysis Informatics Advertisement Processing

Recommendation Self-Driving Robotics Web Search Systems Vehicles

Deng, Li. Artificial Intelligence in the Rising Wave of Deep Learning: The Historical Path and Future 5 Outlook [Perspectives]. IEEE Signal Processing Magazine. 2018. Knowledge Representation and Reasoning (KRR) How to symbolically encode human knowledge and reasoning in such a way that this encoded knowledge can be processed by a computer via encoded reasoning to obtain intelligent behavior.

Fragmentary Set of Medium for Medium of Theory of Surrogate Ontological Efficient Human Intelligent Commitments Computation Expression Reasoning

Chein, M. and Mugnier, M-L. Graph-based Knowledge Representation: Computational Foundations of 6 Conceptual Graphs. Springer. 2008. Porphyry's Linked Open Commentaries 1960s Ontology 1998 Data 2012

300s AD Semantic 1980s The Semantic 2006 Knowledge Networks Web Graph

7 https://www.gartner.com/smarterwithgartner/5-trends-appear-on-the-gartner-hype-cycle-for-emerging- 8 technologies-2019/ Graph DBMS

https://db-engines.com/en/ranking_categories 9 Linked Open Data

https://lod-cloud.net/ 10 Knowledge Graph

• Network representation for knowledge; • Heterogeneous data integration; • Constrained by an ontology or data schema; • AI applications.

11 Knowledge Graph

• Terminological Component • C: Concepts • RC:Concept Relations • TC: Relationships between concepts • TC->A: Attributive relationships • A: Attributes • V: Attribute Values • Assertional Component: • E: Entities • RE: Entity Relations • TE: Relationships between entities • TE->C: Instantiation relationships • TE->V: Attributive relationships

12 13 Related Terms

: Set of sentences (assertions) expressed in a language called a knowledge representation language (semantics and syntax); • (Graph) : Storage and data organization; • Ontology: Formal explicit description of concepts in a domain of discourse. It provides sharable and reusable knowledge. • Knowledge Based System: Keeps a knowledge base and performs reasoning.

Noy, N. and McGuinness, D. Ontology development 101: A guide to creating your first ontology. 2001. 14 Ehrlinger, L. and Wöß, W. Towards a definition of knowledge graphs. SEMANTiCS. 2016. Related Fields

Artificial Intelligence

Knowledge Natural Machine Representation Language Learning and Reasoning Processing

Data Management

Data Information Data Modeling Integration Retrieval

15 General-purpose KGs Bio & Medical KGs

NELL

Common-sense KGs & NLP 16 Product Graph & E-commerce KGs Big Techs

• Apple Siri Knowlege Graph • Amazon Product Graph • Facebook Graph • Knowledge Graph • IBM Watson • Microsoft Satori

17 Knowledge Graphs

# Entities # Triples # Concepts # Relations DBpedia 4 298 433 411 885 960 736 2 819 Freebase 49 947 799 3 124 791 156 53 092 70 902 OpenCyc 41 029 2 412 520 116 822 18 028 Wikidata 18 697 897 748 530 833 302 280 1 874 5 130 031 1 001 461 792 569 751 106

Färber, M. and Rettinger, A. Which Knowledge Graph Is Best for Me? ArXiv. 2019. 18 Knowledge Graphs

Accuracy 1 Interlinking Trustworthiness 0.8 0.6 Licensing 0.4 Consistency 0.2 0 Accessibility Relevancy

Interoperability Completeness

Ease of Timeliness understanding

DBpedia Freebase OpenCyc Wikidata YAGO

Färber, M. and Rettinger, A. Which Knowledge Graph Is Best for Me? ArXiv. 2019. 19 Färber, M. et al. quality of , freebase, opencyc, wikidata, and yago. . 2018. DBpedia

• Crowd-sourced community effort to extract structured content from Wikmedia projects; • Served as Linked Data; • Ontology, including persons, places, creative works, organizations, species, and diseases; • 125 languages; • Links to YAGO and Wikipedia.

Lehmann, J et al. DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia.SWJ. 2012. https://dbpedia.org/ Structure in Wikipedia

• Title • Abstract • Infoboxes • Geo-coodinates • Categories • Images • Links: • other language versions • other Wikipedia pages • to the web; • redirections; • disambiguations.

21 YAGO

• YAGO (Yet Another Great Ontology). • Derived from Wikipedia, WordNet, and Geonames. • 95% accuracy on sample facts. • Temporal Dimension: • Before, after, during, and overlaps • Location Dimension: • northOf, eastOf, southOf, westOf, nearby, and locatedIn. • Multilingual facts.

Rebele, T. et al. YAGO - A Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames. ISWC. 2016. https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/ https://gate.d5.mpi-inf.mpg.de/webyago3spotlxComp/SvgBrowser/ 23 Structured Semi-structured Non-Structured Data Data Data Ex.: Relational . Ex.: XML and JSON Ex.: Text, audio, image, documents. and video files.

Knowledge Base Construction Ex.: Entity and relation extraction.

Correctness Completeness Knowledge Graph

Freshness

Refinement

Completion and Correction

Applications 24 Example Text fragment Marie Curie was born on November 7, 1867, in Warsaw, Poland. She was a french naturalized, physicist and chemist who conducted pioneering research in the fields of radioactivity.

Extracted triples

(Marie Curie, date-of-birth, 07/11/1867) (Marie Curie, born-in, Poland) (Marie Curie, nationality, French) (Marie Curie, profession, Physicist) (Marie Curie, profession, Chemist)

(Marie Curie, research-field, Radioactivity) 25 Polish

Chemist born-in Physicist

profession profession Marie Curie date-of-birth research-field

nationality

“07/11/1867” Radioactivity

French 26 Challenges

Coverage/Completeness Have we got the information we need?

Correctness Is information accurate?

Freshness Is information up to date?

Gao, Yuqing. Building a Large-scale, Accurate and Fresh Knowledge Graph. KDD Tutorial 2018. 27 Applications Applications

• Conversational Agents; • Data integration; • Fact checking and fake news detection; • Question Answering; • Recommendation Systems; • Search Engines.

29 Search Engines

30 Recommendation Systems

Cast Tom Away Hanks Drama Back Robert Forrest To The Zemeckis Future Gump User SciFi Saving Steven Jurassic Private Spielberg Ryan Park

Action Blade Harrison Runner Ford starred-in Movies the director-of Recommended user has genre movies. watched.

Wang, H et al. Exploring High-Order User Preference on the Knowledge Graph for Recommender 31 Systems. ACM TOIS. 2019. Conversational Agents

Natural Jonh Doe: @bot How much money Language do I have in my account? Processing

User Query has Generation Account Balance

isA Context JD has Management $ 0,99 Account ChatBot owns Domain Knowledge Jonh Doe

Natural Language Bot: @johndoe You have $0,99 in Generation your bank account. 32 Question Answering

Which films directed by Steven Spielberg won the Oscar Award?

Film directed film Steven won isA Spielberg

dep nsubj directed ? ? ... ? nsubj det

nmod:by won

Oscar Award Which Oscar Award Steven Spielberg

Zheng, W et al. Question Answering Over Knowledge Graphs: Question Understanding Via Template Decomposition. PVLDB. 2019. 33 https://nlp.stanford.edu/software/lex-parser.shtml Fact Checking and Fake News Detection

Barack Obama

Columbia University

Association of American Universities Canada Barack Obama secretly practices Islam. Stephen Harper

Calagary

Naheed Neshi

Islam

Ciampaglia, G. L. Computational Fact Checking from Knowledge Networks. PLOS ONE. 2015. 34 Personalized Medicine

iASiS: Big Data to Support Precision Medicine and Public Health Policy. 2017. 35 Vidal, M-E et al. Semantic Data Integration of Big Biomedical Data for Supporting Personalised 36 Medicine. Springer. 2019. Literome

• Extract genomic knowledge from PubMed articles; • Knowledge pertinent to genomic medicine: • Directed genic interactions such as pathways; • Genotype–phenotype associations.

Poon, H. Literome: PubMed-scale genomic knowledge base in the cloud. Bioinformatics. 2014. 37 Project HANOVER

• Machine reading for accelerating “Curations as a service” (CaaS) in precision medicine: • Molecular Tumor Board: • Cancer is a thousand diseases driven by disparate genetic mutations. • Real-world evidence: • FDA-approved drug takes over a decade and costs more than $2 billion. • Clinical trial matching: • 20% of clinical trials fail due to insufficient patients. https://hanover.azurewebsites.net/ 38 ResearchSpace

• Semantic Web / Linked Data application; • Museums, libraries, archives: knowledge or memory institutions • CIDOC-CRM: • Conceptual Reference Model

Oldman, D. and Tanase, D. Reshaping the Knowledge Graph by Connecting Researchers, Data and Practices in ResearchSpace. ISWC. 2018. 39 https://www.researchspace.org. Oldman, D. and Tanase, D. Reshaping the Knowledge Graph by Connecting Researchers, Data and Practices in ResearchSpace. ISWC. 2018. 40 https://www.researchspace.org. Drug Discovery

New Drug Discovery

Semantic Search for Drug Discovery Intelligent Applications Drug Discovery Knowledge Graphs for DD

Drug Discovery Knowledge Bases

Drug Discovery RDF Drug Discovery Ontologies

Drug Discovery Data H u m a n

C u r

Kanza, S. and Frey, J. G. A new wave of innovation in Semantic web tools for drug discovery. 41a t

Expert Opnion on Drug Discovery. 2019. i o n Data Models and Systems Data Models

• Labeled Property Graph: • Node-oriented; • Resource Description Framework (RDF): • Triple-oriented.

43 Labeled Property Graph

• Graph Databases: • Ex.: Amazon Neptune, Neo4J, JanusGraph, and TigerGraph. • Query languages: • Cypher, GCore, GS, Gremlim, and PGQL

https://aws.amazon.com/neptune/ https://neo4j.com/ https://janusgraph.org/ 44 https://www.tigergraph.com/ Labeled Property Graph

n1: Person n2: Person Nodes - One or more labels firstName = “João” firstName = “Maria” - Set of properties lastName = “Silva lastName = “Santos” Edges - Edge type - Set of properties e1: likes e2: creates date = 2019-09-05 date = 2019-09-04 n3: Tag label = “italian” e3: has n5: Post text = “I love italian food.” lang = “en” e4: has n4: Tag label = “food” 45 RDF Triple Predicate Subject Object

Nodes: Resources, literals or blank nodes Subjects (resource or blank node) Objects (resource, literal or blank node)

Edges: Predicates (resource)

Literal: - Can be interpreted as datatypes - Encoded as strings - Represent data values

46 RDF

• IRI/URI (Internationalized/Uniform Resource Identifier) • Serialization: JSON-LD, N-Triples, RDF/XML, and • Query Languages: SPARQL.

@prefix dbr: . dbr:Albert_Einstein @prefix dbo: . @prefix : . rdf:type dbo:birthDate dbr:Albert_Einstein foaf:Person 1879-03-14 rdf:type foaf:Person; dbo:birthDate "1879-03-14"^^xsd:date. xsd:date 47 Systems for Knowledge Bases

• Grakn • GraphDB • Amazon Neptune

48 Grakn

• Hyper-relational deductive database: • Entity and relationship types reasoning; • Rule-based reasoning. • Core: • Built on top of JanusGraph, Apache TinkerPop, Apache Hadoop, Apache Cassandra, Apache Spark • Data Model based on Entity-Relationship Model and Hypergraphs; • Graql: • OLTP, OLAP

https://grakn.ai/ 49 Concept

Type Rule Thing

EntityType Attribute

RelationType Entity

AttributeType Relation

https://dev.grakn.ai/docs/concept-api/overview 50 01. define 02. name sub attribute, datatype string;03. “A” “C” 03. person sub entity, has name, plays person_; 04. friend sub relation, relates person_, relates person_; 05. is_friend sub rule, 06. when { 07. (person_: $p1, person_: $p2) isa friend; Inferred 08. (person_: $p2, person_: $p3) isa friend; 09. }, then { 10. (person_: $p1, person_: $p2) isa friend; 11. }; 12. 13. insert 14. $p1 isa person, has name "A"; 15. $p2 isa person, has name "B"; 16. $p3 isa person, has name "C"; 17. 18. match 19. $p1 isa person, has name "A"; 20. $p2 isa person, has name "B"; 21. $p4 isa person, has name "C"; 22. insert “B” 23. $r (person_: $p1, person_: $p2) isa friend; 24. $r (person_: $p2, person_: $p3) isa friend; 25. commit 26. @has owner value 27. match person 28. $p1 isa person; $p2 isa person; 29. $r (person_: $p1, person_: $p2) isa friend; person_ person_ name 30. get 31. $r; friend 51 Ontotext GraphDB

: • RDFS, SPARQL, and RDF4J. • Query Optimizer; • Reasoner (TREE Engineer); • Storage: Entity Pool

http://graphdb.ontotext.com 52 http://graphdb.ontotext.com/documentation/free/architecture-components.html 53 Amazon Neptune

• Graph service and database; • Amazon Web Services; • Labeled graph property: • Apache Tinkerpop Gremlim • RDF • SPARQL • Quad (subject, object, predicate, graph);

https://aws.amazon.com/neptune/ 54 https://www.slideshare.net/AmazonWebServices/new-launch-amazon-neptune-overview-and- 55 customer-use-cases-dat319-reinvent-2017 Knowledge Graph Tasks Knowledge Graph Tasks

• Automated Construction • Refinement: • Completion • Correction • Analytics

57 Automated Construction Knowledge Base/Graph Construction • Entity and Relation Extraction: • Named Entity Recognition • Entity Alignment • Systems: • DeepDive • Fonduer • Snorkel

59 Named Entity Recognition

Pedro II of Brazil was the second and last monarch of the Empire of Brazil. He was born in Rio de Janeiro to Emperor Pedro I of Brazil and Empress Maria Leopoldina,who were married at that time.

Person State City

60 Entity Linking

Pedro II of Brazil was the second and last monarch of the Empire of Brazil. He was born in Rio de Janeiro to Emperor Pedro I of Brazil and Empress Maria Leopoldina,who were married at that time.

Q217230 Q156774 Q939 Q84239 Q8678 61 Entity Linking

Abrams, R. Google Thinks I’m Dead (I know otherwise.) 2017. 62 https://www.nytimes.com/2017/12/16/business/google-thinks-im-dead.html?_r=0 Relation Extraction

Pedro II of Brazil was the second and last monarch of the Empire of Brazil. He was born in Rio de Janeiro to Emperor Pedro I of Brazil and Empress Maria Leopoldina, who were married at that time.

spouse emperor-of Q217230 Q939 son

son born-in 63 Q84239 Q8678 Q156774 DeepDive

Error Analysis

Candidate Learning and Mapping and Supervision Inference Feature Extraction Data Input

Spouse Spouse1 Spouse2 Tom Rita Sarah Matthew Output

Ce Zhang, C. et al. DeepDive: Declarative Knowledge Base Construction. Communications of the ACM 2017. http://deepdive.stanford.edu/kbc DeepDive

Feature Extractors: Populates the database using a set of SQL queries and UDFs. By default, one sentence per row using NLP. Candidate Mapping: SQL queries to produce possible mentions, entities and relations.

Supervision Evidence Relation: Handlabeling and Distant Supervision.

Learning and Inference Factor Graph similar to Markov Logic. Gibbs sampling.

Ce Zhang, C. et al. DeepDive: Declarative Knowledge Base Construction. Communications of the ACM 2017. http://deepdive.stanford.edu/kbc Fonduer User Input

Matchers and Labeling Schema Throttlers Functions

Multimodal KBC Candidate Featurization & Supervision & Initialization Generation Multimodal Classification Error LSTM Analysis Data Input Phase 1 Phase 2 Phase 3 Output DateOfBirth Person Date Madonna 16/08/1958 Gisele B. 20/07/1980

Ratner, A. J. et al. Snorkel: rapid training data creation with weak supervision. VLDB Journal. 2019. 66 https://www.snorkel.org/ Fonduer

Candidate Generation (Apply UDFs: Matchers and Throtlers) Matchers: Returns mentions to entities. Throttlers: Decrease the # of relationship candidates.

Multimodal Featurization Associate textual, structural, visual and tabular features to relationship candidates.

Supervision and Classification Labeling Function: Yields a label (-1, 0 or 1) for each candidate. Data Programming: Estimate the true label for each candidate. Multimodal BiLSTM: Estimate the true label for each candiate considering features.

Ratner, A. J. et al. Snorkel: rapid training data creation with weak supervision. VLDB Journal. 2019. 67 https://www.snorkel.org/ Fonduer

Candidate Generation (Apply UDFs: Matchers and Throtlers) Matchers: Returns mentions to entities. Throttlers: Decrease the # of relationship candidates.

- DateMatcher() - DictionaryMatch() - For each si in schema R(s1, ..., sn): - LambdaFunctionMatcher() - Define a set of matchers - LambdaFunctionFigureMatcher() - Define the mention space - LocationMatcher() - Cartesian product to yield candidates - MiscMatcher() - Define throttlers: - NumberMatcher() - Operates on candidates - OrganizationMatcher() - PersonMatcher() - RegexMatchEach() - RegexMatchSpan() https://spacy.io/

Wu, S. et al. Fonduer: Knowledge Base Construction from Richly Formatted Data. SIGMOD. 2018. https://fonduer.readthedocs.io/en/latest/ Data Programming

Labeling Ground Candidates Functions Truth Prob Spouse1 Spouse2 L1 L2 L3 Y c1 Tom Hanks Rita Wilson 1 1 -1 ? .75 c2 Matthew Broderick Sarah J. Parker 0 1 1 ? .80 c3 Brad Pitt Jennifer Aniston 0 1 0 ? .42

Estimate the ground truth based on Labeling Functions agreements and disagreements.

Ratner, A. J. et al. Data programming: Creating large training sets, quickly. NIPS. 2016. Ratner, A. J. et al. Snorkel: rapid training data creation with weak supervision. VLDB Journal. 2019. 69 https://www.snorkel.org/ Data Programming

1 L1 1 Accuracy: 1{Li == Y} 1 Labeling Propensity: 1{Li != 0} 3 2 L2 2 Y Pairwise Correlation: 1{Li == Lj} 2 3 Indicator Function 3 L3

w1 w2 w3 w4 w5 w6 w7 w8 w9

1 2 3 1 2 3 1 2 3

Ratner, A. J. et al. Data programming: Creating large training sets, quickly. NIPS. 2016. Ratner, A. J. et al. Snorkel: rapid training data creation with weak supervision. VLDB Journal. 2019. 70 https://www.snorkel.org/ Snorkel

• Data Labeling: • Labeling functions to heuristically or noisily label some subset of the training examples; • Data augmentation: • Transformation functions to heuristically generate new, modified training examples by transforming existing ones; • Slicing: • Slicing functions to heuristically identify subsets of the data the model should particularly care about.

Ratner, A. J. et al. Snorkel: rapid training data creation with weak supervision. VLDB Journal. 2019. 71 https://www.snorkel.org/ Multimodal BiLSTM

Wu, S. et al. Fonduer: Knowledge Base Construction from Richly Formatted Data. SIGMOD. 2018. 72 https://fonduer.readthedocs.io/en/latest/ Relational Machine Learning Knowledge Graph Completion

Task Query Example Result Example

Triple Classification (Einstein, died-in, USA)? (Yes, 90%)

(1, Blue Hawaii, 3.23), (2, Change of Tail Prediction (Elvis Presley, starred-in, ?) Habit, 3.12), ... (1, Humphrey Bogart, 2.21), (2, Ingrid Head Prediction (?, starred-in, Casablanca) Bergman, 2.01), ...

Relation Prediction (Einstein, ?, Germany) (1, born-in, 5.01), (2, died-in, 1.23),...

L Attribute Prediction (Obama, nationality, ?) (1, american, 2.21), (2, kenian, 1.02), ... i n k

P r (Michael Jackson, isA, ?) (1, singer, 6.20), (2, composer, 5.22),... e Entity Classification d i c t i o n (ranking, answer, score)

74 Relational Machine Learning

• Probabilistic Graphical Models (PGMs) • Graphical Feature Models (GFMs) • Latent Feature Models (LFMs): • Knowledge Graph Embedding.

Model

φθ: E X RE X E → Y Score Parameters Possible Triples

Maximilian, N. A review of relational machine learning for knowledge graphs. IEEE. 2016. 75 Supervised Learning

( , S q u a re ) ( , S q u a re ) Supervised Predictive Triangle ( , ) ( , C ir c le ) Learning Model ( , T ri an g l e) ( , C ir c le )

Square

Predictive Triangle Model

Circle 76 Spouse João Maria

Parent Child (Lucas, child, João)? Lucas

Possible Triples: Not observed + Observed Triples (João, Spouse, João) (Maria, Spouse, João) (Lucas, Spouse, João) (João, Spouse, Maria) (Maria, Spouse, Maria) (Lucas, Spouse, Maria) (João, Spouse, Lucas) (Maria, Spouse, Lucas) (Lucas, Spouse, Lucas) (João, Parent, João) (Maria, Parent, João) (Lucas, Parent, João) (João, Parent, Maria) (Maria, Parent, Maria) (Lucas, Parent, Maria) (João, Parent, Lucas) (Maria, Parent, Lucas) (Lucas, Parent, Lucas) (João, Child, João) (Maria, Child, João) (Lucas, Child, João) (João, Child, Maria) (Maria, Child, Maria) (Lucas, Child, Maria) (João, Child, Lucas) (Maria, Child, Lucas) (Lucas, Child, Luca77s) ( C hi ld ,1) ( S p ou se ,0) Child ( ,1) Supervised Predictive Spouse Learning Model ( ,0) ( S p o us e ,1) ( P a re nt ,0)

Parent Predictive Model 1

78 Tensor Factorization Formulation

R Based on observed relationships, e la tio ns does an unseen relationship exist? E n t i t i e s Entities

79 Relational Machine Learning

• Probabilistic Graphical Models (PGMs): The existence of a triple depends on the existence of other triples. • Graphical Feature Models: The existence of a triple is conditionally independent of the existence of other triples given model parameters and observed features in the graph. • Latent Feature Models: The existence of a triple is conditionally independent of the existence of other triples given model parameters and latent features.

Maximilian, N. A review of relational machine learning for knowledge graphs. IEEE. 2016. 80 PGMs

Knowledge Graph: Dependency Graph: 2 entities, 2 possible relations, 8 nodes (1 for each possible triple) 8 possible triples 27 possible pair-interdepedencies

Y2 Y3

E E 1 2 Y1 Y4

Y5 Y8

Y6 Y7

- Yi is a RV denoting the ith triple existence. - PGM to model the distribution over possible worlds: Prob(Y1,Y2,Y3,Y4,Y5,Y6,Y7, Y8) 81 Markov Logic Networks

• Syntax: Weighted first-order formulas; • Semantics: Templates for Markov Nets; • Inference: Logical and Probabilistic; • Learning: Statistical and Inductive Logical Programming.

82 Markov Logic Networks

• Set of pairs (Fi, wi): • Fi: First-order logic formula; • wi: A real number (weight). • ni: Number of satisfied groundings of Fi in y (possible world). • PGM: • One node for each grounding atom. • Edges between nodes appearing at the same grounding formula.

Richardson, M. and Domingos, P. Markov Logic Networks. Machine Learning. 2006. 83 Markov Logic Networks

Knowledge Base Cancer(A) Cancer(J) Rules x Smokes(x) → Cancer(x) ∀x y Friends(x,y) → (Smokes(x) ↔ Smokes(y)) Smokes(A) Smokes(J) ∀ ∀ Facts Ana(A) João(J) Friends(A, A) Friends(A, J) Friends(J, A) Friends(J, J)

Rule (feature) weights Number of times the rule is satisfied in the world y.

Richardson, M. and Domingos, P. Markov Logic Networks. Machine Learning. 2006. 84 Graphical Feature Models

• Graph observed features • Similarity Indices • Rule mining • Inductive Logic Programming

85 Path Ranking Algorithm

• Random walks of bounded length; • Feature Extraction • Build feature vectors for triples;

• Vectors are based on relation paths Pj(R1,...,Rn). • Training: • Train an off-the-shelf machine learning model.

Lao, N., Mitchell, T. and Cohen, W. W. Random walk inference and learning in a large scale knowledge 86 base. EMNLP. 2011. Relation Paths P1(R1, R2) -1 -1 P2(R2 , R1 ) (A, R1, B), (B, R2,C) -1 P3(R3, R2 ) (A, R3, C) -1 P4(R3 , R1) -1 P5(R2, R3 ) R -1 3 P6(R1 , R3)

R1 R2 A B C -1 -1 R1 R2

-1 R3

Probabilities of reaching C from A by following given relation paths: [ 1/4, 0, 0, 0, 0, 0]

Maximilian, N. A review of relational machine learning for knowledge graphs. IEEE. 2016. 87 Enumeration of 2-length paths:

-1 -1 -1 -1 -1 -1 P1(Sp,Pa) P2(Pa ,Sp ) P3(Pa,Ch) P4(Ch ,Pa ) P5(Ch, Sp ) P6(Sp ,Ch )

Sp Si-1 Ch: Child, João Ana Luísa Sp-1 Si Pa: Parent, Ch Sp: Spouse, -1 Sp Sp-1 Si: Sibling Pa-1 Pa Pa Pa Ch-1 Ch Maria Lucas Rafael Bruna

Ch-1

Feature vector associated with (Luísa, Parent, Bruna)

0 0 0 0 0 1/4

Maximilian, N. A review of relational machine learning for knowledge graphs. IEEE. 2016. 88 Enumeration of 2-length paths:

-1 -1 -1 -1 -1 -1 P1(Sp,Pa) P2(Pa ,Sp ) P3(Pa,Ch) P4(Ch ,Pa ) P5(Ch, Sp ) P6(Sp ,Ch )

Sp Si-1 Ch: Child, João Ana Luísa Sp-1 Si Pa: Parent, Ch Sp: Spouse, -1 Sp Sp-1 Si: Sibling Pa-1 Pa Pa Pa Ch-1 Ch Maria Lucas Rafael Bruna

Ch-1

Feature vector associated with (Luísa, Parent, Bruna)

0 0 0 0 0 1/4

Maximilian, N. A review of relational machine learning for knowledge graphs. IEEE. 2016. 89 Path Ranking Algorithm

• Logistic regression • θ: model parameter vector • σ(x) = 1 / (1 + exp(-x))

• f(s,r,o) : (s,r,o) feature vector • y(s,r,o) : 1 iff (s,r,o) is in the knowledge graph

φθ(s,r,o) := σ(<θ, f(s,r,o)>)

90 n2 n6 Feature Engineering - Engineered features to n1 n4 n5 n8 measure local neighborhoods; n n 3 7 - Graph Kernels; - Summary statistics.

Machine Learning or Node Classification: Model

91 Knowledge Graph Embeddings Word Embeddings in NLP

man

king swimming woman walking queen Brasil swam Italy walked Spain Brasilia Rome Madrid

Bengio, Y. et al. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern 93 Analysis and Machine Intelligence. 2013. Embeddings

Word Word Embedding I I [0.19,...,0.87] like like [0.09,...,0,46] apple apple [0.97,...,0.22]

Entitty/Relation Embedding Apple [0.92,...,0.12] Bill Gates [0.08,...,0.46] Semantic Triple capital-of [0.67,...,0.37] (Steve Jobs, founder, Apple) founder [0.12,...,0.41] (Bill gates,founder, Microsoft) France [0.65,...,0.87] (Paris, capital-of, France) Microsoft [0.78,...,0.21] Paris [0.98,...,0.46] Steve Jobs [0.12,...,0.69] 94 Graph Representation Learning

Espato, A. Innovations in Graph Representation Learning. 2019. 95 https://ai.googleblog.com/2019/06/innovations-in-graph-representation.html. Knowledge Graph Embedding (KGE) Embed components of a Knowledge Graph including entities and relations into continuos vector spaces, so as to simplify the manipulation while preserving the inhrent structure of the KG.

Wang, Q. et al. Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE 96 Transactions on Knowledge and Data Engineering. 2017. KGE Models

Entity acted-in Embedding born-in died-in plays-in

owns born-in Relation died-in Embedding owns

plays-in acted-in 97 KGE Models

• Translational Distance Models • Score functions based on distance. • Models • Score functions based on similarity. • Multiplicative, additive and neural approaches.

98 Model requirements

• Expressiveness: Model KG properties and regularities as much as possible (symmetry, antisymmetry, inversion, composition, etc.) • Space / time complexity: O(NE), O(NR), O(NT) in time and memory.

99 Anatomy of a KGE Model

• Knowledge Graph (KG); • Negative triples generation strategy; • Scoring function for a triple; • Loss function; • Optimization algorithm.

100 Hierarchical TransE structure

translation vs

vr 1-to-1 relation as vo translation (NLP) Brasilia capital-of Brasil

Bordes, A. et al. Translating Embeddings for Modeling Multi-relational Data. NIPS. 2013. 101 TransE

• Constraints on entity embedding: • Prevent learning trivial representations. • Limitations on dealing with 1-N, N-1, N-M relations.

Movies are v”Maryl Streep” + v”starred-in” = v”Death becomes her” very different: v”Maryl Streep” + v”starred-in” = v”Doubt” cast, genre, director, etc.

Bordes, A. et al. Translating Embeddings for Modeling Multi-relational Data. NIPS. 2013. 102 TransH

• Project to relation-specific hyperplanes.

||wr||2 = 1 vs vo wr

vr vs⊥ vo⊥

Wang, Z. et al. Knowledge Graph Embedding by Translating on Hyperplanes. AAAI. 2014. 103 Translational Distance Models

- K2GE; - TransG; - ManifoldE; - TransH; - SE; - TransM; - STransE; - TransR; - TransD; - TranSparse; - TransE; - UM. - TransF;

Wang, Q. et al. Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Transactions on Knowledge and Data Engineering. 2017. Nguyen, D. Q. An overview of embedding models of entities and relationships for knowledge base 104 completion. ArxiV. 2019. RESCAL

Too many parameters!!!

score

w11 w12 w21 w22

vs1 vs2 vo1 vo1

105 RESCAL

• DistMult:

• Wr as diagonal matrix. • Can not deal with asymmetric relations. • Complex: • Introduce complex value embedding; • Score based on the real part of embeddings.

Nickel, M. et al. A three-way model for collective learning on multi-relational data. ICML. 2011. Yang, B. et al. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. ICLR. 2015. Trouillion, T. Knowledge Graph Completion via Complex Tensor Factorization. 2017. Analogy

female Man Woman Helps to predict royal royal

King Queen female

Man is to king what woman is to queen. king - man + woman = queen

Ethayarajh, K. et al. Towards Understanding Linear Word Analogies. ACL. 2019. 107 Liu, H et al. Analogical Inference for Multi-relational Embeddings. PMLR. 2017. Analogy

Normality constraints.

Commutative constraints.

Liu, H et al. Analogical Inference for Multi-relational Embeddings. PMLR. 2017. 108 SimplE

• Canonical Polyadic Decomposition. • Two vectors for each entity: • One as subject and another as object.

Trilinear product Element-wise product

Kazemi, S. M. and Poole, D. SimplE Embedding for Link Prediction in Knowledge Graphs. NIPS. 2018. 109 Shallow and Deep Models

Plausibility of (ei, rk, ej) existence

Embedding Dimension Shallow Models - Representation expressiviness depends on the ei embedding dimension; - Transductiveness. #

E n t i t i e e j s Deep models - Overfitting; Lookup Table - Time and space.

110 ConvE 1D Convolution no padding a1 a2 a3 a4 stride 1 a1 a2 a3 a4 b1 b2 b3 b4 b1 b2 b3 b4 f1*b2 + f2*b3 + f3*b4 f1 f2 f3

2D Convolution padding 1 a1 a2 stride 1 a1 a2 a3 a4 a3 a4 f1 f2 b1 b2 f3 f4 b1 b2 b3 b4 b3 b4 f1*0 + f2*0 + f3*0 + f4*a1

f1*a4 + f2*0 + f3*b2 + f4*0

Dettmers, T. et al. Convolutional 2D Knowledge Graph Embeddings. AAAI. 2018. 111 ConvE Embeddings ”Image” Feature maps Projection Logits Predictions Matrix multiplication .5 Fully Connected with entity Logistic Concatenate X Convolve X projection matrix Sigmoid .3 .9 X X X .1

vs vr Embedding Feature Map Hidden Dropout .25 Dropout .25 Dropout .25

Dettmers, T. et al. Convolutional 2D Knowledge Graph Embeddings. AAAI. 2018. 112 Convolution Neural Network

113 Convolution Neural Network

114 Convolution Neural Network

115 Convolution Neural Network

116 Graph Convolution

Generalize the operation of convolution from grid data to graph data.

Main idea: Learn a representation for a node taking into account its neighbors representations.

Wang, Q. et al. A Comprehensive Survey on Graph Neural Networks. ArXiv. 2019. 117 Differentiable message-passing framework

element-wise activation function

hidden state at l+1 layer hidden states at l layer incoming messages for node u

(l+1) l+1 - hu is a real vector of dimension d ; - Messages are often chosen to be identical to the set of incoming edges; - gm may be Whv.

Kipf, T. N and Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. ICLR. 2017. Schlichtkrull, M. et al. Modeling Relational Data with Graph Convolutional Networks. ESWC. 2018. Relational Graph Convolutional Networks

self-relation specific matrix relation specific matrix

r-neighborhood of entity e relation specific constant

Schlichtkrull, M. et al. Modeling Relational Data with Graph Convolutional Networks. ESWC. 2018. 119 In X h2 h1 h2

Out X h1 + h4 h

In X h3 h3 h4 In X h5 + ReLU h

Out X h6

h6 h5 Self X h

RGCN (Encoder) Decoder

1st 2nd ... nth Edge Input Score Layer Layer Layer Loss

120 Last layer representations taken as latent features. Relational Graph Convolutional Networks • Rapid growth in number of parameters: • Overfitting (rare relations); • Models of very large size. • Weights regularization: • Basis decomposition: linear combination of basis transformations. • Block-diagonal decomposition: direct sum over a set of low-dimensional matrices.

121 Graph Neural Networks

• Convolutional Graph Neural Networks • Graph Autoencoders • Recurrent Graph Neural Networks • Spatial-temporal Graph Neural Networks

Wang, Q. et al. A Comprehensive Survey on Graph Neural Networks. ArXiv. 2019. 122 Attributive Relations

• Literals; • KGs often include: • Numerical attributes (e.g., ages, dates, financial, and geoinformation); • Textual attributes: (e.g., names, descriptions, and titles); • Images (e.g., profile photos, flags, and posters). • Useful for entities with few relationships; • Regard attributes as entities.

Gesese, G. A. and Russa Biswas, H. S. A comprehensive survey of knowledge graph embeddings with 123 literals: Techniques and applications. ArXiv. 2019. MKBE

• Multimodal Knowledge Base Embeddings; • Compositional encoding component: • Different neural encoders for the variety of observed data. • Embedding Multimodal Data: • Structured knowledge, numerical, text, and images.

Pezeshkpour, P. et al. Embedding multimodal relational data for knowledge base completion. EMNLP. 124 2019. Stacked bidirectional GRUs Feed forward layer

Character Level ...... Number ...

Embedding Vector Text

Image Sentence Level Pretrained VGGNet CNN over word 125 embeddings on ImageNet Entity Dense s Layer

Relation Dense Score r Layer Function Embeddings Entity Dense Layer Score for the Short Text GRU triple (s,r,o) o CNN Long Text

VGGNet Image

126 Ontology

• Explicit and implicitly employ ontological knowledge on KGE learning; • Restrictions on embedding space to guarantee expressiveness and consistency.

127 JOIE

• Instance-view, ontology-view, and cross view; • Cross-view association model: • Cross-view grouping (CG); • Cross-view transformation (CT). • Intra-view model: subclass A • Default; ontology-view B • Hierarchy-aware. • Specific Cost Functions. cross-view isA

Instance-view X Y r

Pezeshkpour, P. et al. Universal Representation Learning of Knowledge Bases by 128 Jointly Embedding Instances and Ontological Concepts. KDD. 2019. City Entities

Person Entities Rio

Fortaleza Adele

Pelé City

Person

129 fct City City Entities Entities Person Person Entities Rio Entities Rio Fortaleza Adele Pelé Fortaleza Pelé City

Adele Person fct

130 ”Person” ”Place” Place hierarchy hierarchy Person isA isA State Institution isA isA Artist isA locatedIn isA Scientist works-in isA City University Musician

131 Model Training and Evaluation

• Triple Classification Protocol: • Test the model's ability to discriminate between true and false triples. • Triple (s,r,o) is classified as positive if its score exceeds a relation-specific decision threshold (learned on validation data). • Entity Ranking Protocol: • Assess model performance in terms of ranking answers to certain questions.

Wang, Y. et al. On Evaluating Embedding Models for Knowledge Base Completion. RepL4NLP. 2019. 132 Evaluation Metrics

• Regression, classification and ranking metrics.

(Mean Reciprocal Rank)

133 s p o score rank Test triples: (Neymar, born-in, Mogi) Neymar born-in Mogi 0.80 1 2 (Kaká, born-in, Gama) Neymar born-in Gama 0.70 2

Entities: Neymar born-in Kaká 0.20 3 Kaká, Neymar, Mogi, and Neymar born-in Neymar 0.10 4 Gama Gama born-in Mogi 0.05 3

Kaká born-in Mogi 0.85 1

Mogi born-in Mogi 0.04 4 hits@1 = (1 + 0 + 0 + 1)/4 = 0.5 hits@2 = (1 + 1 + 1 + 1)/4 = 1 Kaká born-in Gama 0.80 2 1

Kaká born-in Kaká 0.10 3 MRR = (1 + 1/2 + 1/2 + 1)/4 = 0.75 Kaká born-in Neymar 0.20 4

Kaká born-in Mogi 0.85 1

Test Triple => Corrupted Triples: Gama born-in Gama 0.04 4 (s, r, o) => (s, r, o') and (s', r, o). Mogi born-in Gama 0.05 3 134 Neymar born-in Gama 0.70 2 Negative Sampling

unusual Perturb a triple:

- Uniformally sample from entities/relation set.

- tph (Average number of tail entities per head) - hpt (Average number of head entities per tail) - Bernoulli with parameters: - tph / (tph + hpt) for replacing the head and hpt / (tph + hpt) for replacing the tail

- Corrupt a position (i.e. head or tail) using only entities that have appeared in that position with the same relation. 135 Closed World P Square Error Loss Assumption o i n t Hinge Loss w i s e Logistic Loss

P a Hinge Loss i r w i s Logistic Loss e

Score given for triple t Label (0 or 1) for triple t 136 1-N Scoring

Probabilities Labels

Probability (s,r,o') being true.

Against all entities (s,r, o') label. (s,r) p y

137 Event, Softwares, and Datasets Conferences

Artificial Intelligence: ECML AAAI ICLR ICML IJCAI NIPS PKDD

Data Management: SIGMOD CIKM ESWC KDD VLDB WSDM WWW PODS

Natural Language Processing:

ACL EMNLP

139 • Automated Knowledge Base Construction; • Knowledge Graph Conference; • International Workshop on Challenges and Experiences from Data Integration to Knowledge Graphs; • Workshop on Knowledge Graph Technology and Applications; • Workshop on Deep Learning for Knowledge Graphs.

140 KGE libraries and systems

• AmpliGraph (Tensorflow, Benchmark Datasets): • https://ampligraph.org/; • DeepGraphLibrary (MXNet/Gluon and PyTorch): • https://www.dgl.ai/ • OpenKE (Tensorflow, Pretrained embeddings): • http://openke.thunlp.org • PyKEEN (PyTorch): • http://pykeen.readthedocs.io • PyTorch-BigGraph (PyTorch): • https://torchbiggraph.readthedocs.io/en/latest

141 Model is implemented by the system. PyTorch AmpliGraph DeepGraphLibrary OpenKE PyKEEN BigGraph ConvE Complex DistMult ERMLP HolE RESCAL R-GCN Structured Embedding TransD TransE TransH TransR TranSparse Unstructured Model PyTorch-BigGraph

• Built on PyTorch; • Models: • ComplEx, DistMult, RESCAL, and TransE. • Features: • Graph Partitioning; • Multi-threaded computation; • Distributed execution accross multiple machines; • Batched negative sampling.

Lerer, A. et al. PyTorch-BigGraph: A Large-scale Graph Embedding System. 2019. 143 https://torchbiggraph.readthedocs.io/en/latest/ PyTorch-BigGraph

Lerer, A. et al. PyTorch-BigGraph: A Large-scale Graph Embedding System. 2019. 144 https://torchbiggraph.readthedocs.io/en/latest/ DeepGraphLibrary

• Built on MXNet/Gluon and PyTorch. • Models: • Graph Neural Networks: GCN, GAN, R-GCN, LGNN, and SSE. • Generative Models: DGMG and JTNN. • Capsule, Transformer, and Universal Transformer architecture.

https://www.dgl.ai/ 145 Bechmark Datasets

• DB15K: • https://github.com/nle-ml/mmkb • FB15K: • https://github.com/nle-ml/mmkb • FB15K-237: • https://github.com/TimDettmers/ConvE • WN18: • https://everest.hds.utc.fr/doku.php?id=en:transe • WN18RR • https://github.com/TimDettmers/ConvE • YAGO3-10: • https://github.com/TimDettmers/ConvE • YAGO15K: • https://github.com/nle-ml/mmkb

146 Open Research Knowledge Graph Embeddings

• KGE and the lack of symbolic structures (e.g., rules and restrictions); • Compare/Combine KGE to/with other approaches (e.g., Probabilistic Soft Logic + rule learning); • KGE interpretability and explainability; • KGE consistency (e.g., embed formal knowledge);

148 Knowledge Graph Embeddings

• KGE sparsity and uncertainty; • Temporal and spatial dynamics; • Other structures: • Paths, motifs, graphlets, etc. • Representation: • Hypergraphs (n-ary relations); • Meta-properties. • Inductive vs. transductive learning.

149 Knowledge base/graph construction • Heterogeneous and multimodal information; • Multi-language knowledge bases; • Construction in specific domains; • Entity disambiguation and managing identity;

150 Knowledge base/graph construction • Managing operations at scale; • Virtual knowledge graph. • Continuously learning and self-correcting systems. • Knowledge graph alignment.

151 Structured Semi-structured Non-Structured Data Data Data Ex.: Relational Databases. Ex.: XML and JSON Ex.: Text, audio, image, documents. and video files.

Knowledge Base Construction Ex.: Entity and relation extraction.

Correctness Completeness Knowledge Graph

Freshness

Refinement

Completion and Correction

Applications 152 http://dexl.lncc.br/ dramos,ziviani,[email protected] Os autores agradecem ao CNPq, à FAPERJ e ao CENPES/Petrobras pelo financiamento.