Semantic Web Company Workshops & Trainings

Semantic Web & Linked Data in Enterprises

An overview

Andreas Blumauer, MSc IT
CEO, Semantic Web Company

Welcome!

Andreas Blumauer, MSc IT CEO of Semantic Web Company, Vienna

Acknowledged computer expert in the areas of Text Mining, Semantic Web, Knowledge Modelling & Linked Data

• Some initial thoughts on “Information Quality” & “Knowledge Creation”

• What is the Semantic Web? What is Linked Data?
  - The end of documents?
  - Standards & Norms

• Some examples of Linked Data Applications

• Linked Data in the context of information management

About Semantic Web Company (SWC)

SWC founded 2001 in Vienna

More than 25 Linked Data experts

Product: PoolParty Suite (on the market since 2009)

Customers from all sectors

EU- & US-based Partner Network

Our network: Customers & Partners

Finance / Automotive / Publisher / Health Care / Public Administration / Energy / Education

Customers

● Credit Suisse
● Daimler
● Roche
● Wolters Kluwer
● Tieto
● Canadian Broadcasting Corporation (CBC)
● World Bank Group
● The Pokémon Company
● Healthdirect Australia
● Ministry of Finance (A)
● Wood Mackenzie
● Red Bull Media House
● Council of the E.U.
● TC Media
● American Physical Society
● Education Services Australia
● Pearson
● Techtarget
● Norwegian Directorate of Immigration
● REEEP
● European Commission
● Bank of America

Partners

● Cognizant
● EBCONT
● EPAM Systems
● iQuest
● PwC
● DTI AG
● Tenforce
● OpenLink Software
● Ontotext
● MarkLogic
● Gravity Zero
● Altotech
● Wolters Kluwer
● Term Management
● Taxonomy Strategies
● Search explained
● WAND
● Digirati
● Cognistreamer
● Linked Data Factory
● Taxonic
● semweb

Information Quality & Knowledge Creation

“Information Quality”: The Enterprise View

• Information is often treated as a ‘2nd-class citizen’ in enterprises
• Information management lies in the responsibility of the CTO → Information as technical artefact
• Trend towards information silos, no standards
• The value of contextual information and premium metadata is often underestimated
• Business models rarely recognize the benefits of collaborative practices

→ Hypothesis 1: “The information demands of customers are often being neglected.”
→ Hypothesis 2: “Enterprises face increasing competitive pressure due to a lack of informational agility.”

“Information Quality”: A Meta-Perspective

Humans & Information (CIO-View)

Hans Rosling: Growth of the global population

Information increases in value,

• when communicators share a mutual understanding (common sense), and
• when information is designed according to the needs of its recipients (personalisation)

Analog / Digital

→ Hypothesis: “The ability to transfer knowledge (contexts, interdependencies) becomes more important.”

“Information Quality”: A Meta-Perspective

Humans & Information (CTO-View)

Information increases in value,

• the lower its integration costs, and
• the cheaper its reusability in various contexts

→ Hypothesis: “Providing information (content) in various formats as a service via APIs is key to increasing information quality from a technical perspective.”

What is the Semantic Web? What is Linked Data?

Data as Precursor of Knowledge

LOD Cloud

Challenges in Data & Information Management

1. Distributed Data Sources
2. Differing Formats
3. Implicit Semantics
4. Dubious Provenance
5. Missing Licenses
6. Unclear Topicality

The Semantic Web: ‘Things’ not Strings
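The “things, not strings” idea can be sketched in a few lines of Python. This is only an illustration, not how PoolParty or any RDF store works internally: the dictionary layout and the helper names `build_label_index` and `resolve` are assumptions made for this example, while the URIs and labels are taken from the slide's diagram.

```python
# Hypothetical in-memory stand-in for a SKOS taxonomy; a real system would
# load this from an RDF store. URIs and labels follow the slide's example.
CONCEPTS = {
    "http://www.mycom.com/taxonomy/62346723": {
        "prefLabel": "St. Mark's Square",
        "altLabels": ["Piazza San Marco"],
    },
    "http://www.mycom.com/taxonomy/97345854": {
        "prefLabel": "Venice",
        "altLabels": [],
    },
}


def build_label_index(concepts):
    """Map every lower-cased label (preferred or alternative) to its concept URI."""
    index = {}
    for uri, concept in concepts.items():
        for label in [concept["prefLabel"], *concept["altLabels"]]:
            index[label.lower()] = uri
    return index


def resolve(label, index):
    """Return the concept URI for a label, or None if the label is unknown."""
    return index.get(label.strip().lower())


LABEL_INDEX = build_label_index(CONCEPTS)

if __name__ == "__main__":
    for text in ("St. Mark's Square", "piazza san marco", "Venice"):
        print(text, "->", resolve(text, LABEL_INDEX))
```

Two different strings resolve to the same identifier, so anything attached to that identifier (images, broader concepts, links to other graphs) is shared by both names.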

● http://www.mycom.com/taxonomy/62346723: prefLabel “St. Mark’s Square”, altLabel “Piazza San Marco”, image http://www.mycom.com/images/90546089, has broader http://www.mycom.com/taxonomy/4543567
● http://www.mycom.com/taxonomy/97345854: prefLabel “Venice”
● http://www.mycom.com/taxonomy/4543567: prefLabel “Piazza”, altLabel “Square”

The power of knowledge graphs: Agility, flexibility, complexity

Query: “Show me all documents about European countries”

Traditional approach: one doc each for Norway, France, Austria, Canada
Graph-based approach: one doc each for Norway, France, Austria, Canada

The power of knowledge graphs: Agility, flexibility, complexity

Query: “Show me all documents about European countries”

Traditional approach: each doc carries its own tags (Europe, Norway / Europe, France / Europe, Austria / America, Canada)
Graph-based approach: docs are tagged Norway, France, Austria, Canada; a single “Europe” node sits above the countries

The power of knowledge graphs: Agility, flexibility, complexity

Query: “Show me all documents about EU member countries”

Traditional approach: the tag “EU” has to be added to individual docs (Europe, Norway / EU, Europe, France / EU, Europe, Austria / America, Canada)
Graph-based approach: docs stay tagged Norway, France, Austria, Canada; “EU” and “Europe” are nodes in the graph

The power of knowledge graphs: Agility, flexibility, complexity

Query: “French-speaking?”

Traditional approach: yet another tag (“French”) has to be added to individual docs, next to EU, Europe and America
Graph-based approach: docs stay tagged Norway, France, Austria, Canada; “Europe”, “EU” and “French-speaking” are nodes in the graph

The power of knowledge graphs: Agility, flexibility, complexity
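The contrast drawn in these slides can be sketched as follows. This is a minimal sketch under stated assumptions: the document IDs, the `GROUPS` mapping and the function `docs_about` are hypothetical, and which countries count as “French-speaking” is my reading of the slide, not something it states explicitly.

```python
# Documents carry a single tag: the country they are about (hypothetical IDs).
DOCS = {"doc1": "Norway", "doc2": "France", "doc3": "Austria", "doc4": "Canada"}

# Knowledge about the tags lives once in the graph, not on every document.
# Assumption: the French-speaking edges are inferred from the slide's layout.
GROUPS = {
    "Norway": {"Europe"},
    "France": {"Europe", "EU", "French-speaking"},
    "Austria": {"Europe", "EU"},
    "Canada": {"America", "French-speaking"},
}


def docs_about(group, docs=DOCS, groups=GROUPS):
    """Return the IDs of all documents whose country belongs to `group`."""
    return sorted(
        doc_id
        for doc_id, country in docs.items()
        if group in groups.get(country, set())
    )


if __name__ == "__main__":
    for question in ("Europe", "EU", "French-speaking"):
        print(question, "->", docs_about(question))
```

Answering a new kind of question only requires new edges in `GROUPS`; the documents themselves are never re-tagged.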

Queries: “Show me all documents from European countries” and “Show me all documents from EU member countries”

Traditional approach: metadata per document (French, EU, Europe, America are repeated on the docs for Norway, France, Austria, Canada)
Graph-based approach: knowledge about metadata (docs are tagged Norway, France, Austria, Canada; Europe, EU and French-speaking are modelled once in the graph)

Linked Data: Discovering Answers to Complex Questions

To answer the following question,

“Are there interdependencies between the Human Development Index of certain countries and the regional research activities concerning specific types of illnesses?”

the following sources can be consulted and linked:

● MeSH (Medical Subject Headings)
● PubMed
● Geonames
● DBpedia
● UNDP

Interlinking of various Knowledge Graphs & Ontologies is key

Concepts in the company taxonomy:

● http://www.mycom.com/taxonomy/5456544: prefLabel “Venice”
● http://www.mycom.com/taxonomy/62346723: prefLabel “St. Mark’s Square”
● http://www.mycom.com/taxonomy/7835488: Peggy Guggenheim Museum

Properties and types used in the diagram:

● http://schema.org/containedIn
● http://schema.org/location
● http://schema.org/City
● http://schema.org/TouristAttraction
● http://schema.org/ArtGallery

Linked external resources:

● http://www.geonames.org/7302945
● http://www.freebase.com/m/0q9rr
● http://dbpedia.org/resource/Peggy_Guggenheim_Collection
● https://www.youtube.com/VeniceGuggenheim
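A minimal sketch of such interlinking, with several assumptions flagged: the exact edges of the diagram are not recoverable from the slide text, so the triples below are one plausible reading; `skos:exactMatch` is used as the linking property although the slide does not name one; and the helper functions are hypothetical.

```python
SCHEMA = "http://schema.org/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

VENICE = "http://www.mycom.com/taxonomy/5456544"
ST_MARKS = "http://www.mycom.com/taxonomy/62346723"
# Assumption: this URI stands for the Peggy Guggenheim museum concept.
GUGGENHEIM = "http://www.mycom.com/taxonomy/7835488"
DBPEDIA_GUGGENHEIM = "http://dbpedia.org/resource/Peggy_Guggenheim_Collection"

# Assumed edges; the slide only shows that these resources are connected.
TRIPLES = {
    (VENICE, RDF_TYPE, SCHEMA + "City"),
    (ST_MARKS, SCHEMA + "containedIn", VENICE),
    (GUGGENHEIM, SCHEMA + "location", VENICE),
    (GUGGENHEIM, RDF_TYPE, SCHEMA + "ArtGallery"),
    (GUGGENHEIM, SKOS + "exactMatch", DBPEDIA_GUGGENHEIM),
}


def objects(subject, predicate):
    """All objects o for which the triple (subject, predicate, o) exists."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s for which the triple (s, predicate, obj) exists."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def things_in(place):
    """Everything contained in or located at `place`."""
    return subjects(SCHEMA + "containedIn", place) | subjects(SCHEMA + "location", place)


if __name__ == "__main__":
    for thing in sorted(things_in(VENICE)):
        print(thing, "-> external:", sorted(objects(thing, SKOS + "exactMatch")))
```

Because the local concept points to a DBpedia resource, an application can follow that link to pull in facts the company taxonomy never had to model itself.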

Semantic Web - The End of Documents?

The End of Documents?

What is a Document? What should it be?

● Production: A tool to create information?
● Storage: A method to store information?
● Visualization: A convention to visualize and represent information?
● Interface: An access point (API) or container to connect to information and make it findable?
● Craft: The art of telling stories, triggering emotions and/or creating common sense?
● ?

Knowledge workers link and contextualize information!

● Journal article
● Dossier
● Social Web profile
● Health Record
● Blog post
● Product information
● Law
● News article
● Campaign
● Regulation
● Poem
● Contract
● Tweet
● Product specification

“Follow your nose (‘nous’)”

...some more graphs

● Microsoft “Office Graph”
● Google “Knowledge Graph”
● Facebook “Social Graph”

What exactly do knowledge workers interlink?

• Entities, not documents!
• Things, not strings!

PoolParty Tagging Workflow

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut.

“strings” become “things”: sadipscing

Corpus Analysis / Quality Checks

Concept-based Tagging in Enterprise Content Systems
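Concept-based tagging, where “strings” in a text become “things”, can be sketched as a simple label scan. This is an illustrative sketch and not PoolParty's extraction algorithm: the function `tag`, the `LABELS` table and the sample sentence are assumptions; the URIs come from the taxonomy example earlier in the deck.

```python
import re

# Labels (preferred and alternative) mapped to concept URIs from the deck's example.
LABELS = {
    "St. Mark's Square": "http://www.mycom.com/taxonomy/62346723",
    "Piazza San Marco": "http://www.mycom.com/taxonomy/62346723",
    "Venice": "http://www.mycom.com/taxonomy/97345854",
}


def tag(text, labels=LABELS):
    """Return (matched string, concept URI, start offset) for each label found in text."""
    annotations = []
    for label, uri in labels.items():
        # Match the label as a whole phrase, not inside a longer word.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.group(0), uri, match.start()))
    return sorted(annotations, key=lambda a: a[2])


if __name__ == "__main__":
    sample = "Piazza San Marco is the main square of Venice."  # hypothetical text
    for surface, uri, offset in tag(sample):
        print(offset, surface, "->", uri)
```

A content system such as the ones listed next would store the URIs, not the matched strings, so that renaming a concept does not break existing tags.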

Drupal / Confluence / SharePoint 2013

‘Google’s Knowledge Graph’ as an example for semantic information machines

Enterprises have just started to create their own, specific knowledge graphs.

Which new opportunities can be derived from this development for the information management industry?

Mashup from knowledge graphs and API calls!

BBC’s Linked Data Platform:

How many information sources do you see?

Individual CMSs are pretty good at keeping tabs on the content they create but if you wanted to get hold of the 20 most recent pieces of content from across the BBC (and hence across CMSs) on Burkina Faso, or Jarvis Cocker or global warming it would be very tricky.

Oli Bartlett, product manager for the BBC's Linked Data Platform

Clean Energy Data - Country Profiles

Linked Data is a data model based on graphs

● Linked Data is a graph-based data model that is expressive enough to represent and to process a wide spectrum of types of information

→ Being used for Data Integration & Dynamic Semantic Publishing (DSP) in distributed environments (“Semantic Web”)

Semantic Web Standards & Technologies

Resource Description Framework (RDF)
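As a minimal sketch, an RDF statement can be modelled as a (subject, predicate, object) tuple and written out in N-Triples syntax. The `http://example.org/` namespace, the `isLocatedIn` property and the function `to_ntriples` are assumptions for illustration; the two statements are the ones shown on this slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/"  # hypothetical namespace, not from the slides
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

TRIPLES = [
    # "Semantic Web Company is a Organization"
    Triple(EX + "SemanticWebCompany", RDF_TYPE, EX + "Organization"),
    # "Semantic Web Company is located in Vienna" (property name is assumed)
    Triple(EX + "SemanticWebCompany", EX + "isLocatedIn", EX + "Vienna"),
]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines of the form <s> <p> <o> ."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Every part of a statement is a URI here, which is what allows statements from different sources to be merged into one graph.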

Subject → predicate → Object

Semantic Web Company → is a → Organization

Semantic Web Company → is located in → Vienna

Simple Knowledge Organization System (SKOS)

Taxonomies and controlled vocabularies

http://www.w3.org/2004/02/skos/

From Simple SKOS to large knowledge graphs

1. Generate 1st version of SKOS taxonomy
  - Reuse of existing vocabularies
  - Corpus Analysis
  - Excel import
  - XML import
  - Linked data harvester

2. Edit, extend & curate taxonomy
  - Taxonomy Editing
  - Collaborative workflows
  - Free term extraction
  - Tag recommender
  - Quality Checker

3. Extend schema, apply ontologies, use SKOS-XL
  - Reuse existing ontologies
  - Create custom schemes
  - Apply SKOS-XL
  - Apply ontologies on your SKOS taxonomy

4. Link and map between taxonomies and LD graphs
  - Automatic mapping between taxonomies
  - Linked Data frontend
  - Link to other LD graphs, e.g. DBpedia or Geonames

Inputs: your data (e.g. Excel), your docs, your CMS

Linked Vocabularies - Linked Contents

● Wolters Kluwer Working Law Thesaurus
● Eurovoc
● STW Thesaurus
● DBpedia

Linked Data & Linked Vocabularies can be reused with increased efficiency

● Linked Data is based on standards and embedded in a wide data ecosystem

→ Semantic Web based ontologies, thesauri, taxonomies and knowledge graphs can be reused at relatively low cost, at least technically speaking.

Linked Open Data Graphs

● 38.8 million entities
● 3 billion facts / triples
● 125 languages
● 50 million links to other sources

SKOS Thesauri

● Eurovoc (EU)
● UNESCO Thesaurus (UN)
● ESCO (EU)
● New York Times SH (US)
● Jurivoc (SUI)
● RAMEAU subject headings (FR)
● ScoT (AUS)
● TheSoz (DE)
● Agrovoc (UN)
● The General Finnish Thesaurus (FIN)
● MeSH (US)
● NAL Thesaurus (US)
● Getty Vocabularies (US)
● Social Semantic Web Thesaurus (AT)
● GEMET (EEA)
● Courts thesaurus (DE)
● GeoThesaurus (AT)
● SITC-V4 (UN)
● STW Economy (DE)
● Google Product Taxonomy (US)
● Polythematic SH (CZ)
● NAICS 2012 (US)
● Canadian Subject Headings (Can)
● Common Procurement Vocabulary (ES)
● LCSH (US)
● UKAT UK Archival Thesaurus (UK)
● Worldbank Taxonomy (WBG)
● NASA taxonomy (US)
● Labor Law Germany Thesaurus (DE)
● IVOA astronomy vocabularies (UK)
● Reegle Thesaurus (REEEP)
● IPTC News Codes (UK)
● Austrian Tax Law Thesaurus (AT)
● WAND taxonomies (US)

Query language for knowledge graphs: SPARQL

Complex queries with SPARQL
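To make the query on this slide easier to read, here is a minimal sketch of how SPARQL-style triple patterns are evaluated: terms starting with `?` are variables, all other terms must match exactly, and a variable shared by several patterns joins them. This is a toy matcher, not a SPARQL engine; the functions `match_pattern` and `query` and all observation values are hypothetical, and the property names are shortened to plain strings.

```python
# Hypothetical observations; the real dataset behind the slide is not shown.
TRIPLES = [
    ("obs1", "year", "2020"),
    ("obs1", "region", "regionA"),
    ("obs2", "year", "2030"),
    ("obs2", "region", "regionB"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable binds on first use and must agree afterwards.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, triples):
    """Evaluate a basic graph pattern: every pattern must match with shared bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


if __name__ == "__main__":
    # Same shape as the slide: constrain one property, read out another.
    print(query([("?obs", "region", "regionA"), ("?obs", "year", "?year")], TRIPLES))
```

A `UNION`, as used in the slide's query, would correspond to running `query` once per alternative block and concatenating the resulting bindings.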

PREFIX mrv-schema:
PREFIX qb:

SELECT DISTINCT * WHERE {
  GRAPH {
    ?observation mrv-schema:year ?year.
    ?observation mrv-schema:region ?region.
    ?observation mrv-schema:region .
    ?observation mrv-schema:scenario ?scenario.
    ?observation mrv-schema:scenario .
    {
      ?observation mrv-schema:urbanizationType ?urbanizationType.
      ?observation mrv-schema:urbanizationType .
      ?observation mrv-schema:buildingType ?buildingType.
      ?observation mrv-schema:buildingType .
      ?observation mrv-schema:publicBuildingType ?publicBuildingType.
      ?observation mrv-schema:publicBuildingType .
    } UNION {
      ?observation mrv-schema:urbanizationType ?urbanizationType.
      ?observation mrv-schema:urbanizationType .
      ?observation mrv-schema:buildingType ?buildingType.
      ?observation mrv-schema:buildingType .
      ?observation mrv-schema:publicBuildingType ?publicBuildingType.
      ?observation mrv-schema:publicBuildingType .
    } UNION { …….

PoolParty Semantic Integrator: Unified Views on various data sources

Linked Data Applications

Traditional approach for data and information integration

Question: Show me the ‘most influential people in the world’ who were born in countries with an HDI less than 0.5?

Solution: Develop a specific application to integrate the data sources

Person 4711
  Name: Jeff Bezos
  Affiliation: Amazon
  Born in: Albuquerque

Country 4812
  Name: USA
  GDP: $ 15.684 billion
  HDI: 0.937

Linked Data combines the requirements ‘Semantic search’ and ‘Business analytics’

● Linked Data is based on an expressive data model being able to represent a wide spectrum of types of information

→ Excellent capabilities for complex search and analytics applications; combines and links the realms of structured & unstructured information

Question: Show me the ‘most influential people in the world’ who were born in countries with an HDI less than 0.5?

Solution: Use taxonomies and ontologies to link knowledge graphs

● Thesaurus/Taxonomy-Graph: Continents, America, South America, U.S., New Mexico, Albuquerque
● Knowledge-Graph 1: Jeff Bezos, Amazon, Albuquerque
● Knowledge-Graph 2: United States, $ 15.684 billion, 0.937
● Ontology-Graph: Person, Organization, Place; affiliated with, born in; GDP, HDI
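How the linked graphs answer the question can be sketched as a walk from a person's birthplace up the taxonomy to a country, followed by a lookup of that country's HDI. This is a sketch under assumptions: the hierarchy Albuquerque → New Mexico → U.S. → America is my reading of the diagram, “U.S.” and “United States” are treated as the same node, and the function names are hypothetical.

```python
# Knowledge graph 1: people and where they were born (from the slide).
BORN_IN = {"Jeff Bezos": "Albuquerque"}

# Taxonomy graph: narrower place -> broader place (assumed hierarchy).
BROADER = {"Albuquerque": "New Mexico", "New Mexico": "U.S.", "U.S.": "America"}

# Knowledge graph 2: country indicators (value from the slide).
COUNTRY_HDI = {"U.S.": 0.937}


def country_of(place):
    """Walk up the taxonomy until a place with a known HDI (a country) is found."""
    seen = set()
    while place is not None and place not in seen:
        if place in COUNTRY_HDI:
            return place
        seen.add(place)  # guards against cycles in the hierarchy
        place = BROADER.get(place)
    return None


def people_born_where_hdi_below(threshold):
    """Names of people whose country of birth has an HDI below `threshold`."""
    result = []
    for person, birthplace in BORN_IN.items():
        country = country_of(birthplace)
        if country is not None and COUNTRY_HDI[country] < threshold:
            result.append(person)
    return sorted(result)


if __name__ == "__main__":
    print(people_born_where_hdi_below(0.5))
    print(people_born_where_hdi_below(0.95))
```

No application-specific integration code is needed per data source here; the taxonomy supplies the missing step between a city in one graph and a country in the other.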

See how it works: PoolParty components & workflows

● Roles: Content Manager, Developer, Taxonomist/Ontologist
● Connected systems: Confluence, SharePoint, Drupal, WordPress, ...; search engine; database
● Inputs: reference taxonomies, linked data sources, text reference corpora
● Relations shown: is user of, annotate, enrich, analyzes, uses API, works on, basis for

Applications & Online-Demos

● Thesaurus Publishing
● Business Intelligence
● Content Recommendation
● Semantic Expert Finder
● Web Mining
● Semantic Search
● Semantic Tagging in SharePoint
● Symptom Checker

Demo: PowerTagging for Drupal

Interactive data visualization

Matchmaking based on linked data technologies

Quality of meta information and knowledge graphs describing profiles determine,

Information pieces and business objects like products, users, ads are linked dynamically

Matchmaking between users / content

http://faq.poolparty.biz/

http://www.eip-water.eu/

The impact of Linked Data on a new generation of information management

Benefit arguments

Perspectives: Cost effectiveness / The systemic view / Operating efficiency

Basic argument for: IT-Management / Software Architect; Information & Knowledge Management; Business Process Management

● Better reuse of existing information resources helps to save costs
● Better understanding of relations between things increases communication skills
● Unified views on business objects lead to better decisions
● Efficient and agile data model

● Higher information quality
● Increased transparency on inconsistencies and contradictions
● Information flows adapt to the needs of the user
● Efficient handling of metadata

● Improved information retrieval
● Automatic structuring of unstructured data helps to save costs
● Consistent use of controlled vocabularies triggers additional network effects
● BI-like, complex queries become possible

Content value chains based on open data

Summary: Data Silos & Documents open up!

● Graph-based data model
● Standards-based data model
● Expressive data model
● SKOS (Simple Knowledge Organization System) as core element of enterprise knowledge graphs

● Search → Analyze
● Read → Visualise
● Data driven decision making
● Enterprise Linked Data

Get started! Try it out now.

Get your PoolParty Thesaurus Server & Entity Extractor trial:

http://www.poolparty.biz/test-demo/

Conclusion: Let your enterprise knowledge graphs grow in parallel to your staff’s linked data skills!

Let’s get in contact!

Andreas Blumauer, MSc IT [email protected]

http://j.mp/ablvienna