Web open standards and knowledge graphs as enablers of EU digital sovereignty

Fabien Gandon, http://fabien.info

PROFILE
• Graduated Engineer INSA Applied Math, DEA/Master Image & Vision
• PhD & HDR (Habilitation) in computer science
• Research Director / Senior researcher, Inria
• Leader of Wimmics (UCA, Inria, CNRS, I3S) on Campus Sophia Antipolis
• Advisory Committee of W3C
• Responsible for the research convention French Ministry of Culture - Inria
• Vice-head of Science for Inria Sophia Antipolis

WIMMICS TEAM

DR/Professors:
• Fabien GANDON, Inria, AI, KRR, Social Web, K. Graphs
• Nhan LE THANH, UCA, Logics, KR, Emotions, Workflows, K. Graphs
• Peter SANDER, UCA, Web, Emotions
• Andrea TETTAMANZI, UCA, AI, Logics, Evo, Learning, Agents, K. Graphs
• Marco WINCKLER, UCA, Human-Computer Interaction, Web, K. Graphs

CR/Assistant Professors:
• Michel BUFFA, UCA, Web, Social Media, Web Audio, K. Graphs
• Elena CABRIO, UCA, NLP, KR, Linguistics, Q&A, Text Mining, K. Graphs
• Olivier CORBY, Inria, KR, AI, Sem. Web, Programming, K. Graphs
• Catherine FARON-ZUCKER, UCA, KR, AI, Semantic Web, K. Graphs
• Damien GRAUX, Inria, Linked Data, Sem. Web, Querying, K. Graphs
• Serena VILLATA, CNRS, AI, Argumentation, Licenses, Rights, K. Graphs

Research engineer:
• Franck MICHEL, CNRS, Linked Data, Integration, DB, K. Graphs

External:
• Andrei CIORTEA (University of St. Gallen), Agents, WoT, Sem. Web, K. Graphs
• Nicolas DELAFORGE (Mnemotix), Sem. Web, KM, Integration, K. Graphs
• Alain GIBOIN (Retired CR Inria), Interaction Design, KE, User & Task, K. Graphs
• Freddy LECUE (Thales, Montreal), AI, Logics, Mining, Big Data, S. Web, K. Graphs

[Figure: stack of Web standards: URI, IRI, URL, HTTP URI; HTTP, HTML, XML, JSON; RDF, RDF/XML, N-Triples, N3, N-Quads, TriG, JSON-LD, RDFa, CSV-LD, R2RML, GRDDL; RDFS, OWL; SPARQL, SHACL, LDP; Linked Data]

STANDARDS FOR DATA & KNOWLEDGE GRAPHS ON THE WEB

(1/8) Web open standards

World Wide Web Consortium (W3C): an international community leading the Web to its full potential since 1994, i.e. building an open, interoperable Web that works for everyone, by developing freely available and open standards for it.

• In 2016, Tim Berners-Lee received the Turing Award for his invention of the Web
• Over 430 member organizations around the world
• A not-for-profit organization with a staff of 50, supported by membership dues
• Over 12,000 developers worldwide
• 38 working groups + 10 interest groups + 350 Business Groups and Community Groups
• Hundreds of open technologies that power… browsers, smart phones, ebook readers, set-top boxes, automobiles, search engines, social media, trillions of dollars of online commerce, and more than a billion Web sites

[Figure: examples of standards, for instance… skos, rdf, rdfs, owl, xml, wsdl, woff, ttml, svg, png, powder, rif, emma, its, ruby, dom, exi, earl, aria, wcag, uaag, atag, iri, uri, http, url, …]

[Figure: examples of former or current members]

(2/8) Web open standards for… distributed, interoperable hypermedia

A HYPERMEDIA linking everything…

Three components of the Web architecture:

1. identification (URI) & address (URL), e.g. http://www.inria.fr

2. communication / protocol (HTTP), e.g.
   GET /centre/sophia HTTP/1.1
   Host: www.inria.fr

3. representation language (HTML), e.g. a page saying "Fabien works at Inria" with a reference to another address

[Tim Berners-Lee et al., 1994]
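As a minimal sketch of how the first two components fit together, the following Python snippet derives the text of an HTTP GET request from a URL. The helper name `build_get_request` is hypothetical and introduced only for illustration; the snippet builds the request text and does not send anything over the network.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    # An empty path means the root of the site.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The address (host) goes in the Host header, the rest in the request line.
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


request = build_get_request("http://www.inria.fr/centre/sophia")
print(request)
```

The identifier carries both the address of the server (`www.inria.fr`) and the name of the resource on it (`/centre/sophia`), which is what the protocol needs to ask for a representation.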

(3/8) Web open standards for… distributed, interoperable identifiers

Universal Resource Locator / Identifier

[Figure: Web architecture (reference/address, communication: HTTP, representation: HTML), with URL extended to URI]

• URL: identify what exists on the Web, e.g. http://my-site.fr
• URI: identify, on the Web, what exists, e.g. http://animals.org/this-zebra

Examples:
• URI for Paris in DBpedia: http://dbpedia.org/resource/Paris
• URI for the name of Victor Hugo in the Library of Congress: http://id.loc.gov/authorities/names/n79091479
• The MUC18 protein at UniProt: http://www.uniprot.org/uniprot/P43121
• Xavier Dolan in Wikidata: https://www.wikidata.org/wiki/Special:EntityData/Q551861
• The book with doi:10.1007/3-540-45741-0_18: http://dx.doi.org/10.1007/3-540-45741-0_18
• URIs for everything, e.g. identifying 10^25 car configurations [François-Paul Servant et al. ESWC 2012]

(4/8) Web open standards for… distributed, interoperable data

RDF: a Web standard for knowledge graphs

[Figure: same Web architecture (reference/address: URI, communication: HTTP), with HTML complemented by RDF]

A Web approach to data publication:

• an HTTP URI names a resource: « http://fr.dbpedia.org/resource/Paris » ???...
• an HTTP GET on that HTTP URI returns a representation: HTML, … or RDF
• the RDF returned is linked data, e.g. the MUC18 protein at UniProt: http://www.uniprot.org/uniprot/P43121
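The GET step above can be sketched in Python as a request that asks for RDF rather than HTML through the HTTP `Accept` header. This is an illustration under assumptions: the helper name `rdf_request` is hypothetical, and whether a given server honours the `text/turtle` media type for this URI is not something the slides state. The request object is only built here, not sent.

```python
from urllib.request import Request


def rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (without sending) a GET request asking for an RDF representation."""
    # The Accept header tells the server which representation the client prefers.
    return Request(uri, headers={"Accept": media_type}, method="GET")


req = rdf_request("http://fr.dbpedia.org/resource/Paris")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# Sending it would be done with urllib.request.urlopen(req); this is left out
# so that the sketch runs without network access.
```

The same URI can therefore serve people (HTML) and programs (RDF), depending only on what the client asks for.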

[Figure: linked open data(sets) cloud on the Web. Chart: number of linked open datasets on the Web (axis from 0 to 1400), from 5/1/2007 to 1/26/2017]

Smarter Cities' knowledge graphs, IBM Dublin [Lécué et al., 2015] (also for private KGs behind firewalls)

(5/8) Web open standards for… distributed, interoperable access

SPARQL: Get Data, Not Documents
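To make "get data, not documents" concrete, here is a hedged Python sketch that writes a SPARQL query and encodes it as a GET URL with a `query` parameter, as the SPARQL protocol allows. The endpoint address `https://dbpedia.org/sparql`, the property `dbo:populationTotal` and the helper names are assumptions chosen for illustration; nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Illustrative query: ask for one value about a resource identified by its URI.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?population WHERE {
  <http://dbpedia.org/resource/Paris> dbo:populationTotal ?population .
}
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the 'query' parameter of a GET URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


def query_from_url(url: str) -> str:
    """Decode the SPARQL query back from such a URL (used here as a check)."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = sparql_get_url("https://dbpedia.org/sparql", QUERY)
print(url)
```

The answer to such a request is a table of bindings for `?population`, not a page to read, which is the point of the slide title.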

e.g. DBpedia
• 31 185 377 686 RDF triples extracted and mapped
• public dumps, endpoints, interfaces, APIs…

DBPEDIA.FR
• 180 000 000 arcs in an encyclopedic knowledge graph
• number of queries per day: 70 000 on average, 2.5 million max

COVID LINKED DATA [Gandon, Michel, Gazzotti, Mayer, Cabrio, Corby, Menin, Winckler, Villata et al. 2020]
• integrate multiple datasets in heterogeneous formats
• perform information extraction, inferences, validation
• provide a public endpoint and visualization services

(6/8) Web open standards for… distributed, interoperable validation

SHACL is a language for describing and validating pieces (shapes) of RDF knowledge graphs

e.g. every Person must have one and only one name

used for validation, description, interaction, integration, code generation,… [Corby et al., 2019]
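The constraint "every Person must have one and only one name" can be mimicked in plain Python over a small set of triples. This is only a sketch of the idea, not a SHACL engine: in SHACL itself the constraint would be written as a node shape targeting the class with a property shape using `sh:minCount 1` and `sh:maxCount 1`. The `ex:` names and the function `violations` are hypothetical.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# A tiny graph as a set of (subject, predicate, object) triples.
triples = {
    ("ex:fabien", RDF_TYPE, "ex:Person"),
    ("ex:fabien", "ex:name", "Fabien"),
    ("ex:anon", RDF_TYPE, "ex:Person"),      # no name at all
    ("ex:dup", RDF_TYPE, "ex:Person"),
    ("ex:dup", "ex:name", "A"),              # two names
    ("ex:dup", "ex:name", "B"),
}


def violations(graph, target_class, path, min_count=1, max_count=1):
    """Return {node: number of values} for target nodes breaking the cardinality."""
    targets = [s for s, p, o in graph if p == RDF_TYPE and o == target_class]
    counts = Counter(s for s, p, o in graph if p == path)
    return {
        node: counts[node]
        for node in targets
        if not (min_count <= counts[node] <= max_count)
    }


report = violations(triples, "ex:Person", "ex:name")
print(report)
```

A real validator would produce a structured validation report; here the dictionary simply lists the offending nodes with the number of names found.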

ONTOLOGY FOR AI ITSELF (AI4EU)
• ontology and knowledge graph of AI resources
• SHACL to validate these RDF graphs
• online endpoint http://corese.inria.fr
• predefined SPARQL queries, SHACL shapes, display

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement 825619.

(7/8) Web open standards for… distributed, interoperable vocabularies

RDFS to declare the classes of resources and the properties of your knowledge graph and to organize their hierarchy
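What such RDFS hierarchies buy you can be sketched with a few lines of Python that apply two standard RDFS entailments until nothing new is derived: instances of a class are also instances of its superclasses, and a statement made with a property also holds for its super-properties. The example assumes, as a reading of the slide's figure, that Report is a subclass of Document and author a sub-property of creator; the `ex:` names and the function `rdfs_closure` are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
TYPE = "rdf:type"

triples = {
    ("ex:Report", SUBCLASS, "ex:Document"),   # hierarchy of classes
    ("ex:author", SUBPROP, "ex:creator"),     # hierarchy of properties
    ("ex:r1", TYPE, "ex:Report"),
    ("ex:r1", "ex:author", "ex:fabien"),
}


def rdfs_closure(graph):
    """Apply type inheritance and sub-property inheritance up to a fixpoint."""
    graph = set(graph)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                # x type C, C subClassOf D  =>  x type D
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, TYPE, o2))
                # x p y, p subPropertyOf q  =>  x q y
                if p2 == SUBPROP and s2 == p:
                    new.add((s, o2, o))
        if new <= graph:
            return graph
        graph |= new


inferred = rdfs_closure(triples) - triples
print(sorted(inferred))
```

A query for documents and their creators would then also find the report and its author, without the data having stated it explicitly.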

[Figure: RDFS example with the classes Document, Report and Person and the properties creator and author]

OWL in one…: union, disjunction, algebraic properties, intersection, complement, restriction, disjoint properties, cardinality (1..1), qualified cardinality, equivalence, enumeration, individual prop. neg, value restriction ([>18]), chained prop., disjoint union, keys, …

PREDICT HOSPITALIZATION [Gazzotti, Faron et al. 2020]
• Predict hospitalization from physicians' records (classification)
• Augment records data with Web knowledge graphs
• Study impact on prediction

Example of a record:
Sex: M
Date: 25/04/2012
Cause: tetanus vaccination
CISP2: A44
...
History: appendicitis
Observations: EN CP - good general condition - pulmonary auscultation clear; regular heart sounds without murmur - eardrums OK -

PRIMEGE dataset:
Element                      Number
Patients                     55 823
Consultations                364 684
Past medical history         187 290
Biometric data               293 908
Semiotics                    250 669
Diagnosis                    117 442
Row of prescribed drugs      847 422
Symptoms                     23 488
Health care procedures       11 850
Additional examination       871 590
Paramedical prescription     17 222
Observations/notes           56 143

SKOS: thesaurus, lexicon

[Figure: concepts #Mathematics, #Algebra and #LinearAlgebra linked by skos:broader / skos:narrower and by skos:broaderTransitive / skos:narrowerTransitive]
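The relation between skos:broader and skos:broaderTransitive mentioned for the SKOS thesaurus example (#Mathematics, #Algebra, #LinearAlgebra) can be sketched as a reachability computation. The direction of the links (#LinearAlgebra broader #Algebra broader #Mathematics) is an assumption about the figure, and the function names are hypothetical.

```python
# Direct skos:broader links: concept -> its directly broader concepts.
BROADER = {
    "#LinearAlgebra": ["#Algebra"],
    "#Algebra": ["#Mathematics"],
}


def broader_transitive(broader, concept):
    """All concepts reachable by following skos:broader one or more times."""
    seen = set()
    stack = list(broader.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen


def narrower_transitive(broader, concept):
    """Inverse view: all concepts that have `concept` among their ancestors."""
    return {c for c in broader if concept in broader_transitive(broader, c)}


print(broader_transitive(BROADER, "#LinearAlgebra"))
print(narrower_transitive(BROADER, "#Mathematics"))
```

This kind of closure is what lets a query for a general topic also retrieve items indexed with more specific terms of the thesaurus.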

Joconde database from French museums: 350 000 images of artworks, RDF metadata based on external thesauri

MonaLIA [Bobasheva et al. 2020]
(1) reason & query on RDF to build training sets
(2) transfer learning & CNN classifiers on targeted categories (topics, techniques, etc.)
(3) reason & query RDF of results to address silence, noise and explain

Examples of results (image, metadata, score), where cheval = horse:
• Image: C:/Joconde/joconde\0355/m079806_bsa0030101_p.jpg (000SC022652)
  Metadata: figure (saint Eloi de Noyon, évêque, en pied, bénédiction, vêtement liturgique, mitre, attribut, cheval, marteau, outil : ferronnerie)
  Score: cheval: 0.006
• Image: C:\Joconde\joconde\0138\m503501_d0012455-000_p.jpg (50350012455)
  Metadata: portrait
  Score: cheval: 0.999

Web open standards as enablers of interoperable platforms, e.g.

“Solid (…) is a proposed set of conventions and tools for building decentralized Web applications based on Linked Data principles. (…)

It relies as much as possible on existing W3C standards and protocols. (…)

RDF 1.1 (…) The WebID 1.0 (…) The FOAF vocabulary (…) WebID-TLS protocol (…) HTML5 (…) Linked Data Platform (LDP) standard” https://github.com/solid/solid#standards-used

[Cartoon caption: “I’m right there in the room, but no one even acknowledges me.”]

(8/8) Web open standards for… distributed, interoperable Europe

W3C = strategic place to survey and shape Web standards

Personal opinion:
• Important to have a neutral place to build open standards (1 member = 1 vote)
• Important to have public and private members at W3C
• Important to have a large European participation in W3C

Web open standards & world-wide interoperability are key enablers of EU digital sovereignty
• Interoperability is strategic to federate actors/actions (cf. members)
• Web standards are transversal to domains/tasks/… (cf. application examples)
• Importance of knowledge graphs and danger of knowledge silos (cf. data)
• Having established open standards between actors in Europe (public and private) is a key issue for setting up European data spaces
• Active participation in W3C is a key to building EU digital sovereignty

He who controls metadata controls the Web, and through the World Wide Web many things in our world.

WIMMICS: Web-Instrumented Man-Machine Interactions, Communities and Semantics
Fabien Gandon - @fabien_gandon - http://fabien.info
Site: http://wimmics.inria.fr
Overview: http://bit.ly/wimmics-slides
Technical details: http://bit.ly/wimmics-papers

   