Machine Learning and Inference in Knowledge Graphs (Aprendizado de Máquina e Inferência em Grafos de Conhecimento)
Daniel N. R. da Silva, Artur Ziviani, Fabio Porto
dramos,ziviani,[email protected]
Laboratório Nacional de Computação Científica (LNCC)
Simpósio Brasileiro de Bancos de Dados (SBBD), October 2019

Big Data Management, Computational Reproducibility, Knowledge Bases, Machine Learning, Network Science, Scientific Workflows
http://dexl.lncc.br/

Outline
• Introduction and Motivation
• Applications
• Data Models and Systems
• Relational Machine Learning
• Knowledge Graph Embeddings
• Events, Software, and Datasets
• Open Research

Introduction and Motivation

The third (current) rise of AI
• Customer Relationship Management
• Drug Discovery and Toxicology
• Image Captioning
• Board Games
• Natural Language Processing
• Medical Image Analysis
• Medical Informatics
• Online Advertisement
• Recommendation Systems
• Self-Driving Vehicles
• Robotics
• Web Search
Deng, Li. Artificial Intelligence in the Rising Wave of Deep Learning: The Historical Path and Future Outlook [Perspectives]. IEEE Signal Processing Magazine. 2018.

Knowledge Representation and Reasoning (KRR)
How to symbolically encode human knowledge and reasoning so that a computer can process the encoded knowledge, via encoded reasoning, to obtain intelligent behavior.
• Surrogate
• Set of Ontological Commitments
• Fragmentary Theory of Intelligent Reasoning
• Medium for Efficient Computation
• Medium of Human Expression
Chein, M. and Mugnier, M-L. Graph-based Knowledge Representation: Computational Foundations of Conceptual Graphs. Springer. 2008.
• 300s AD: Porphyry's Commentaries
• 1960s: Semantic Networks
• 1980s: Ontology
• 1998: The Semantic Web
• 2006: Linked Open Data
• 2012: Knowledge Graph

https://www.gartner.com/smarterwithgartner/5-trends-appear-on-the-gartner-hype-cycle-for-emerging-technologies-2019/

Graph DBMS
https://db-engines.com/en/ranking_categories

Linked Open Data
https://lod-cloud.net/

Knowledge Graph
• Network representation for knowledge;
• Heterogeneous data integration;
• Constrained by an ontology or data schema;
• AI applications.

Knowledge Graph
• Terminological Component:
  • C: Concepts
  • RC: Concept Relations
  • TC: Relationships between concepts
  • TC->A: Attributive relationships
  • A: Attributes
  • V: Attribute Values
• Assertional Component:
  • E: Entities
  • RE: Entity Relations
  • TE: Relationships between entities
  • TE->C: Instantiation relationships
  • TE->V: Attributive relationships

Related Terms
• Knowledge base: Set of sentences (assertions) expressed in a language called a knowledge representation language (semantics and syntax);
• (Graph) Database: Storage and data organization;
• Ontology: Formal explicit description of concepts in a domain of discourse. It provides sharable and reusable knowledge;
• Knowledge-Based System: Keeps a knowledge base and performs reasoning.
Noy, N. and McGuinness, D. Ontology development 101: A guide to creating your first ontology. 2001.
Ehrlinger, L. and Wöß, W. Towards a definition of knowledge graphs. SEMANTiCS. 2016.
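The terminological and assertional components listed on the Knowledge Graph slide can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name, field names, and the two validation rules are hypothetical and are not taken from the tutorial or from any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Hypothetical container mirroring the slide's notation."""
    # Terminological component (schema level)
    concepts: set = field(default_factory=set)            # C
    concept_relations: set = field(default_factory=set)   # RC
    attributes: set = field(default_factory=set)          # A
    concept_triples: set = field(default_factory=set)     # TC: (concept, relation, concept)
    concept_attributes: set = field(default_factory=set)  # TC->A: (concept, attribute)
    # Assertional component (instance level)
    entities: set = field(default_factory=set)            # E
    entity_relations: set = field(default_factory=set)    # RE
    entity_triples: set = field(default_factory=set)      # TE: (entity, relation, entity)
    instantiations: set = field(default_factory=set)      # TE->C: (entity, concept)
    attribute_values: set = field(default_factory=set)    # TE->V: (entity, attribute, value)

    def add_instance(self, entity, concept):
        # Assumed rule: an entity may only instantiate a declared concept.
        if concept not in self.concepts:
            raise ValueError(f"unknown concept: {concept}")
        self.entities.add(entity)
        self.instantiations.add((entity, concept))

    def add_attribute_value(self, entity, attribute, value):
        # Assumed rule: the attribute must be declared for one of the entity's concepts.
        entity_concepts = {c for e, c in self.instantiations if e == entity}
        allowed = {a for c, a in self.concept_attributes if c in entity_concepts}
        if attribute not in allowed:
            raise ValueError(f"attribute {attribute!r} not allowed for {entity!r}")
        self.attribute_values.add((entity, attribute, value))


# Illustrative schema and data (names chosen for this example only).
kg = KnowledgeGraph()
kg.concepts.add("Person")
kg.attributes.add("date-of-birth")
kg.concept_attributes.add(("Person", "date-of-birth"))
kg.add_instance("Marie Curie", "Person")
kg.add_attribute_value("Marie Curie", "date-of-birth", "1867-11-07")
```

Keeping the schema sets separate from the instance sets is what lets the assertional component be checked against the terminological one, which is the sense in which a knowledge graph is constrained by an ontology or data schema.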
Related Fields
• Artificial Intelligence:
  • Knowledge Representation and Reasoning
  • Natural Language Processing
  • Machine Learning
• Data Management:
  • Data Modeling
  • Information Integration
  • Data Retrieval

[Figure: example knowledge graphs grouped by category: General-purpose KGs; Bio & Medical KGs; Common-sense KGs & NLP; Product Graph & E-commerce KGs. NELL appears among the examples.]

Big Tech
• Apple Siri Knowledge Graph
• Amazon Product Graph
• Facebook Graph
• Google Knowledge Graph
• IBM Watson
• Microsoft Satori

Knowledge Graphs
            # Entities    # Triples        # Concepts   # Relations
DBpedia     4 298 433     411 885 960      736          2 819
Freebase    49 947 799    3 124 791 156    53 092       70 902
OpenCyc     41 029        2 412 520        116 822      18 028
Wikidata    18 697 897    748 530 833      302 280      1 874
YAGO        5 130 031     1 001 461 792    569 751      106
Färber, M. and Rettinger, A. Which Knowledge Graph Is Best for Me? ArXiv. 2019.

Knowledge Graphs
[Figure: radar chart (scale 0 to 1) comparing DBpedia, Freebase, OpenCyc, Wikidata, and YAGO on the quality dimensions Accuracy, Trustworthiness, Consistency, Relevancy, Completeness, Timeliness, Ease of understanding, Interoperability, Accessibility, Licensing, and Interlinking.]
Färber, M. and Rettinger, A. Which Knowledge Graph Is Best for Me? ArXiv. 2019.
Färber, M. et al. Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semantic Web. 2018.

DBpedia
• Crowd-sourced community effort to extract structured content from Wikimedia projects;
• Served as Linked Data;
• Ontology, including persons, places, creative works, organizations, species, and diseases;
• 125 languages;
• Links to YAGO and Wikipedia.
Lehmann, J. et al. DBpedia – A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. SWJ. 2012.
https://dbpedia.org/

Structure in Wikipedia
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links:
  • to other language versions
  • to other Wikipedia pages
  • to the web
• Redirections
• Disambiguations

YAGO
• YAGO (Yet Another Great Ontology).
• Derived from Wikipedia, WordNet, and GeoNames.
• 95% accuracy on sample facts.
• Temporal Dimension:
  • before, after, during, and overlaps.
• Location Dimension:
  • northOf, eastOf, southOf, westOf, nearby, and locatedIn.
• Multilingual facts.
Rebele, T. et al. YAGO - A Multilingual Knowledge Base from Wikipedia, Wordnet, and Geonames. ISWC. 2016.
https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
https://gate.d5.mpi-inf.mpg.de/webyago3spotlxComp/SvgBrowser/

[Figure: knowledge graph pipeline. Sources: Structured Data (e.g., relational databases), Semi-structured Data (e.g., XML and JSON documents), and Non-structured Data (e.g., text, audio, image, and video files). These feed Knowledge Base Construction (e.g., entity and relation extraction), which produces the Knowledge Graph. The graph is assessed for Correctness, Completeness, and Freshness, improved through Refinement (Completion and Correction), and used by Applications.]

Example
Text fragment
Marie Curie was born on November 7, 1867, in Warsaw, Poland. She was a naturalized French physicist and chemist who conducted pioneering research in the field of radioactivity.
Extracted triples
(Marie Curie, date-of-birth, 07/11/1867)
(Marie Curie, born-in, Poland)
(Marie Curie, nationality, French)
(Marie Curie, profession, Physicist)
(Marie Curie, profession, Chemist)
(Marie Curie, research-field, Radioactivity)

[Figure: graph of the extracted facts, with Marie Curie as the central node connected by born-in, profession, date-of-birth, research-field, and nationality edges to the nodes Polish, Physicist, Chemist, "07/11/1867", Radioactivity, and French.]

Challenges
• Coverage/Completeness: Do we have the information we need?
• Correctness: Is the information accurate?
• Freshness: Is the information up to date?
Gao, Yuqing. Building a Large-scale, Accurate and Fresh Knowledge Graph. KDD Tutorial. 2018.

Applications

Applications
• Conversational Agents;
• Data integration;
• Fact checking and fake news detection;
• Question Answering;
• Recommendation Systems;
• Search Engines.
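Returning to the Marie Curie example, the extracted triples can be loaded into a simple subject/predicate index and queried. This is a minimal sketch using only the Python standard library; the helper names `build_index` and `objects` are hypothetical and do not refer to any specific triple store API.

```python
from collections import defaultdict

# The (subject, predicate, object) triples from the example slide.
triples = [
    ("Marie Curie", "date-of-birth", "07/11/1867"),
    ("Marie Curie", "born-in", "Poland"),
    ("Marie Curie", "nationality", "French"),
    ("Marie Curie", "profession", "Physicist"),
    ("Marie Curie", "profession", "Chemist"),
    ("Marie Curie", "research-field", "Radioactivity"),
]


def build_index(triples):
    """Group objects by subject and then by predicate."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        index[subject][predicate].add(obj)
    return index


def objects(index, subject, predicate):
    """Return the sorted objects for a (subject, predicate) pair, or an empty list."""
    return sorted(index.get(subject, {}).get(predicate, set()))


index = build_index(triples)
print(objects(index, "Marie Curie", "profession"))   # ['Chemist', 'Physicist']
print(objects(index, "Marie Curie", "spouse"))       # []
```

The empty result for the `spouse` query illustrates the completeness challenge: the fact is absent from the graph, which says nothing about whether it is true.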
Search Engines

Recommendation Systems
[Figure: knowledge graph linking a User to movies (Cast Away, Back To The Future, Forrest Gump, Saving Private Ryan, Jurassic Park, Blade Runner), people (Tom Hanks, Robert Zemeckis, Steven Spielberg, Harrison Ford), and genres (Drama, SciFi, Action) through starred-in, director-of, and genre edges. The legend distinguishes movies the user has watched from recommended movies.]
Wang, H. et al. Exploring High-Order User Preference on the Knowledge Graph for Recommender Systems. ACM TOIS. 2019.

Conversational Agents
John Doe: @bot How much money do I have in my account?
Bot: @johndoe You have $0.99 in your bank account.
[Figure: ChatBot architecture with Natural Language Processing, Query Generation, Context Management, and Natural Language Generation components over a Domain Knowledge graph containing the nodes User, Account, Balance, JD Account, John Doe, and $0.99, connected by has, isA, and owns edges.]

Question Answering
Which films directed by Steven Spielberg won the Oscar Award?
[Figure: dependency parse of the question (labels dep, nsubj, det, nmod:by) alongside a query graph over Film, Steven Spielberg, and Oscar Award, with isA, directed, and won edges and unknown nodes marked "?".]
Zheng, W. et al. Question Answering Over Knowledge Graphs: Question Understanding Via Template Decomposition. PVLDB. 2019.
https://nlp.stanford.edu/software/lex-parser.shtml

Fact Checking and Fake News Detection
Statement: Barack Obama secretly practices Islam.
[Figure: path in a knowledge graph through the nodes Barack Obama, Columbia University, Association of American Universities, Canada, Stephen Harper, Calgary, Naheed Nenshi, and Islam.]
Ciampaglia, G. L. et al. Computational Fact Checking from Knowledge Networks. PLOS ONE. 2015.

Personalized Medicine
iASiS: Big Data to Support Precision Medicine and Public Health Policy. 2017.
Vidal, M-E. et al. Semantic Data Integration of Big Biomedical Data for Supporting Personalised Medicine. Springer. 2019.

Literome
• Extract genomic knowledge from PubMed articles;
• Knowledge pertinent to genomic medicine:
  • Directed genic interactions such as pathways;
  • Genotype–phenotype associations.
Poon, H. Literome: PubMed-scale genomic knowledge base in the cloud. Bioinformatics. 2014.
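To make the scenario from the Recommendation Systems slide concrete, the sketch below scores unseen movies by how many knowledge graph neighbors (actors, directors, genres) they share with movies the user has watched. It is a simple neighbor-overlap heuristic chosen for illustration, not the method of Wang et al. The edge set and the watched list are assumptions for this example and are not transcribed from the slide's figure; the function name `recommend` is hypothetical.

```python
from collections import Counter

# Assumed edges: each movie mapped to the entities it is linked to
# (starred-in, director-of, genre). Illustrative only.
movie_links = {
    "Cast Away": {"Tom Hanks", "Robert Zemeckis", "Drama"},
    "Forrest Gump": {"Tom Hanks", "Robert Zemeckis", "Drama"},
    "Back To The Future": {"Robert Zemeckis", "SciFi"},
    "Saving Private Ryan": {"Tom Hanks", "Steven Spielberg", "Drama"},
    "Jurassic Park": {"Steven Spielberg", "SciFi"},
    "Blade Runner": {"Harrison Ford", "SciFi"},
}


def recommend(watched, movie_links, top_k=3):
    """Rank unwatched movies by overlap with entities linked to watched movies."""
    # Build the user's profile: how often each entity occurs among watched movies.
    profile = Counter()
    for movie in watched:
        profile.update(movie_links[movie])
    # Score each unwatched movie by summing the profile counts of its entities.
    scores = {
        movie: sum(profile[entity] for entity in links)
        for movie, links in movie_links.items()
        if movie not in watched
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [movie for movie, score in ranked if score > 0][:top_k]


watched = {"Cast Away", "Back To The Future"}  # assumed viewing history
recommendations = recommend(watched, movie_links)
print(recommendations)
```

This heuristic only looks one hop away from each watched movie; approaches that exploit high-order preferences follow longer paths in the graph.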
Project HANOVER
• Machine reading for accelerating "Curation as a Service" (CaaS) in precision medicine:
  • Molecular Tumor Board:
    • Cancer is a thousand diseases driven by disparate genetic mutations.
  • Real-world evidence:
    • Developing an FDA-approved drug takes over a decade and costs more than $2 billion.
  • Clinical trial matching:
    • 20% of clinical trials fail due to insufficient patients.
https://hanover.azurewebsites.net/

ResearchSpace
• Semantic Web / Linked Data application;
• Museums, libraries, archives: knowledge or memory institutions;
• CIDOC-CRM:
  • Conceptual Reference Model.
Oldman, D. and Tanase, D. Reshaping the Knowledge Graph by Connecting Researchers, Data and Practices in ResearchSpace. ISWC. 2018.
https://www.researchspace.org