Semantic Web Company Workshops & Trainings
Semantic Web & Linked Data in Enterprises
An overview
Andreas Blumauer, MSc IT, CEO Semantic Web Company
Welcome!
Andreas Blumauer, MSc IT
CEO of Semantic Web Company, Vienna
Acknowledged computer expert in the areas of Text Mining, Semantic Web, Knowledge Modelling & Linked Data
• Some initial thoughts on “Information Quality” & “Knowledge Creation”
• What is the Semantic Web? What is Linked Data?
 - The end of documents?
 - Standards & Norms
• Some examples of Linked Data Applications
• Linked Data in the context of information management
About Semantic Web Company (SWC)
SWC founded 2001 in Vienna
More than 25 Linked Data experts
Product: PoolParty Suite (on the market since 2009)
Customers from all sectors
EU- & US-based Partner Network
Our network: Customers & Partners
Finance / Automotive / Publisher / Health Care / Public Administration / Energy / Education
Customers
● Credit Suisse ● Daimler ● Roche ● Wolters Kluwer ● Tieto ● Canadian Broadcasting Corporation (CBC) ● World Bank Group ● The Pokémon Company ● Healthdirect Australia ● Ministry of Finance (A) ● Wood Mackenzie ● Red Bull Media House ● Council of the E.U. ● TC Media ● American Physical Society ● Education Services Australia ● Pearson ● Techtarget ● Norwegian Directorate of Immigration ● REEEP ● European Commission ● Bank of America
Partners
● Cognizant ● EBCONT ● EPAM Systems ● iQuest ● PwC ● DTI AG ● Tenforce ● OpenLink Software ● Ontotext ● MarkLogic ● Gravity Zero ● Altotech ● Wolters Kluwer ● Term Management ● Taxonomy Strategies ● Search explained ● WAND ● Digirati ● Cognistreamer ● Linked Data Factory ● Taxonic ● semweb
Information Quality & Knowledge Creation
“Information Quality”: The Enterprise View
• Information is often treated as a ‘2nd class citizen’ in enterprises
• Information management lies in the responsibility of the CTO → Information as technical artefact
• Trend towards information silos, no standards
• The value of contextual information and premium metadata is often underestimated
• Business models rarely recognize the benefits of collaborative practices
→ Hypothesis 1: “The information demands of customers are often neglected.”
→ Hypothesis 2: “Enterprises face increasing competitive pressure due to a lack of informational agility.”
“Information Quality”: A Meta-Perspective
Humans & Information (CIO-View)
[Figure: Hans Rosling, growth of the global population; analog vs. digital]
Information increases in value,
• when communicators share a mutual understanding (common sense), and
• when information is designed according to the needs of its recipients (personalisation)
→ Hypothesis: “The ability to transfer knowledge (contexts, interdependencies) becomes more important.”
“Information Quality”: A Meta-Perspective
Humans & Information (CTO-View)
Information increases in value,
• the lower its integration costs, and
• the cheaper its reuse in various contexts
→ Hypothesis: “Providing information (content) in various formats as a service via APIs is key to increasing information quality from a technical perspective.”
What is the Semantic Web? What is Linked Data?
Data as Precursor of Knowledge
LOD Cloud
Challenges in Data & Information Management
1. Distributed Data Sources
2. Differing Formats
3. Implicit Semantics
4. Dubious Provenance
5. Missing Licenses
6. Unclear Topicality
The Semantic Web: ‘Things’ not Strings
[Figure: the concept “St. Mark’s Square” (http://www.mycom.com/taxonomy/62346723) with prefLabel “St. Mark’s Square”, altLabel “Piazza San Marco” and an image (http://www.mycom.com/images/90546089), linked, among others via a “has broader” relation, to the concepts “Venice” and “Square” (also labelled “Piazza”)]
The power of knowledge graphs: Agility, flexibility, complexity
[Figure, built up over several slides: traditional approach vs. graph-based approach for documents about Norway, France, Austria and Canada. Questions: “Show me all documents about European countries”, “Show me all documents about EU member countries”, then “French-speaking?”. Traditional approach: the tags are stored with every document (Norway: Europe; France: Europe, EU, French; Austria: Europe, EU; Canada: America, French), i.e. metadata per document. Graph-based approach: each document is linked only to its country, and the graph links Norway, France and Austria to Europe, Canada to America, France and Austria to EU, and France and Canada to French-speaking, i.e. knowledge about metadata.]
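A minimal sketch of the graph-based approach, assuming a simple in-memory mapping: each document carries only its country, and the group memberships from the slides (Europe, America, EU, French-speaking) are stored once as knowledge about that metadata. The names `COUNTRY_GROUPS`, `DOCUMENTS` and `documents_about` and the document IDs are hypothetical; a real system would hold these facts in an RDF graph and query them with SPARQL.

```python
# Hypothetical knowledge graph: which groups each country concept belongs to.
# The memberships mirror the slides; the data structure is an illustrative assumption.
COUNTRY_GROUPS: dict[str, set[str]] = {
    "Norway": {"Europe"},
    "France": {"Europe", "EU", "French-speaking"},
    "Austria": {"Europe", "EU"},
    "Canada": {"America", "French-speaking"},
}

# Metadata per document: only the country concept is stored (hypothetical IDs).
DOCUMENTS: dict[str, str] = {
    "doc-1": "Norway",
    "doc-2": "France",
    "doc-3": "Austria",
    "doc-4": "Canada",
}


def documents_about(*groups: str) -> list[str]:
    """Return the IDs of documents whose country belongs to every requested group."""
    wanted = set(groups)
    return sorted(
        doc_id
        for doc_id, country in DOCUMENTS.items()
        if wanted <= COUNTRY_GROUPS.get(country, set())
    )


print(documents_about("Europe"))                 # ['doc-1', 'doc-2', 'doc-3']
print(documents_about("EU"))                     # ['doc-2', 'doc-3']
print(documents_about("EU", "French-speaking"))  # ['doc-2']
```

A new question such as “French-speaking EU countries” becomes a new combination of graph facts rather than a new tag on every document.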
Linked Data: Discovering Answers to Complex Questions
To answer the following question,
“Are there interdependencies between the Human Development Index of certain countries and the regional research activities concerning specific types of illnesses?”
the following sources can be consulted and linked: ● MeSH (Medical Subject Headings) ● PubMed ● Geonames ● DBpedia ● UNDP
Interlinking of various Knowledge Graphs & Ontologies is key
[Figure: an interlinked graph around “Venice” (http://www.mycom.com/taxonomy/5456544) and “St. Mark’s Square” (http://www.mycom.com/taxonomy/62346723), connected via http://schema.org/containedIn and http://schema.org/location to a further local concept (http://www.mycom.com/taxonomy/7835488) and to external resources: http://www.geonames.org/7302945, http://www.freebase.com/m/0q9rr, http://dbpedia.org/resource/Peggy_Guggenheim_Collection (Peggy Guggenheim, Museum) and https://www.youtube.com/VeniceGuggenheim; types used: http://schema.org/City, http://schema.org/TouristAttraction, http://schema.org/ArtGallery]
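Interlinking works because identifiers from different sources can be declared to denote the same thing (in RDF typically with `owl:sameAs`). The sketch below, assuming a plain union-find structure and hypothetical prefixed identifiers (`local:`, `geo:`, `wiki:`), shows how facts from three sources collapse onto one entity once such links exist; it illustrates the idea and is not how any particular triple store implements identity.

```python
class IdentityIndex:
    """Union-find over identifiers that are declared to denote the same thing."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, uri: str) -> str:
        """Return the representative identifier of uri's equivalence class."""
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def same_as(self, a: str, b: str) -> None:
        """Record that a and b denote the same resource (like owl:sameAs)."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_facts(triples, index):
    """Group (predicate, object) pairs under each subject's representative identifier."""
    merged: dict[str, set[tuple[str, str]]] = {}
    for subject, predicate, obj in triples:
        merged.setdefault(index.find(subject), set()).add((predicate, obj))
    return merged


index = IdentityIndex()
index.same_as("local:venice", "geo:venice")
index.same_as("geo:venice", "wiki:Venice")
facts = [
    ("local:venice", "rdf:type", "schema:City"),
    ("geo:venice", "rdfs:label", "Venezia"),
    ("wiki:Venice", "rdfs:label", "Venice"),
]
print(merge_facts(facts, index))  # one entity, keyed by "local:venice", with all three facts
```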
Semantic Web - The End of Documents?
What is a Document? What should it be?
● Production: A tool to create information?
● Storage: A method to store information?
● Visualization: A convention to visualize and represent information?
● Interface: An access point (API) or container to connect to information and make it findable?
● Craft: The art of telling stories, triggering emotions and/or creating common sense?
● ?
Knowledge workers link and contextualize information!
[Figure: document types such as journal article, dossier, social web profile, health record, blog post, product information, law, news article, campaign, regulation, poem, contract, tweet, product specification]
“Follow your nose (‘nous’)”
...some more graphs
Microsoft “Office Graph” ● Google “Knowledge Graph” ● Facebook “Social Graph”
What exactly do knowledge workers interlink?
• Entities, not documents!
• Things, not strings!
PoolParty Tagging Workflow
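The step from strings to things can be sketched as dictionary-based tagging: every prefLabel and altLabel of a concept points to the concept's URI, and a text is scanned for those labels. This is a deliberately simplified stand-in, assuming a hypothetical two-concept thesaurus with `example.org` URIs; it does not reproduce the PoolParty extractor.

```python
import re

# Hypothetical mini-thesaurus: concept URI -> prefLabel followed by altLabels.
THESAURUS = {
    "http://example.org/taxonomy/st-marks-square": ["St. Mark's Square", "Piazza San Marco"],
    "http://example.org/taxonomy/venice": ["Venice", "Venezia"],
}


def build_label_index(thesaurus: dict[str, list[str]]) -> dict[str, str]:
    """Map every case-folded label to the URI of its concept."""
    return {
        label.casefold(): uri
        for uri, labels in thesaurus.items()
        for label in labels
    }


def tag(text: str, thesaurus: dict[str, list[str]]) -> list[str]:
    """Return the sorted URIs of all concepts whose labels occur in text."""
    index = build_label_index(thesaurus)
    # Try longer labels first so multi-word labels win over shorter overlaps.
    alternatives = "|".join(re.escape(label) for label in sorted(index, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return sorted({index[match.group(0).casefold()] for match in pattern.finditer(text)})


# Different strings ("Piazza San Marco", "Venezia") resolve to the two concept URIs.
print(tag("Tourists crowd Piazza San Marco in Venezia.", THESAURUS))
```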
[Figure: a sample text (lorem ipsum placeholder) in which the term “sadipscing” is recognized, so “strings” become “things”; supporting steps: Corpus Analysis, Quality Checks]
Concept-based Tagging in Enterprise Content Systems
Drupal ● Confluence ● SharePoint 2013
‘Google’s Knowledge Graph’ as an example of semantic information machines
Enterprises have only just started to create their own, specific knowledge graphs.
Which new opportunities can be derived from this development for the information management industry?
Mashup of knowledge graphs and API calls!
BBC’s Linked Data Platform:
How many information sources do you see?
Individual CMSs are pretty good at keeping tabs on the content they create but if you wanted to get hold of the 20 most recent pieces of content from across the BBC (and hence across CMSs) on Burkina Faso, or Jarvis Cocker or global warming it would be very tricky.
Oli Bartlett, product manager for the BBC's Linked Data Platform
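The problem quoted above becomes straightforward once items from different CMSs are tagged with shared concept URIs. A minimal sketch, assuming each CMS can export its items with publication date and concept tags; `ContentItem`, `most_recent_about`, the item titles and the `ex:` concept identifiers are hypothetical and do not describe the BBC's actual platform.

```python
import heapq
from datetime import date
from typing import Iterable, NamedTuple


class ContentItem(NamedTuple):
    cms: str                  # which system the item comes from
    title: str
    published: date
    concepts: frozenset[str]  # URIs of the concepts the item is tagged with


def most_recent_about(concept: str, sources: Iterable[Iterable[ContentItem]], n: int = 20) -> list[ContentItem]:
    """Merge the items of all CMSs and return the n newest ones tagged with concept."""
    tagged = (item for items in sources for item in items if concept in item.concepts)
    return heapq.nlargest(n, tagged, key=lambda item: item.published)


news = [
    ContentItem("news", "Election coverage", date(2015, 3, 1), frozenset({"ex:BurkinaFaso"})),
    ContentItem("news", "Climate summit", date(2015, 1, 1), frozenset({"ex:GlobalWarming"})),
]
sport = [ContentItem("sport", "Cup qualifier", date(2015, 2, 1), frozenset({"ex:BurkinaFaso"}))]
print([item.title for item in most_recent_about("ex:BurkinaFaso", [news, sport])])
# ['Election coverage', 'Cup qualifier']
```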
Clean Energy Data - Country Profiles
Linked Data is a data model based on graphs
● Linked Data is a graph-based data model that is expressive enough to represent and to process a wide spectrum of types of information
→ Being used for Data Integration & Dynamic Semantic Publishing (DSP) in distributed environments (“Semantic Web”)
Semantic Web Standards & Technologies
Resource Description Framework (RDF)
Subject → predicate → Object
Semantic Web Company → is a → Organization
Semantic Web Company → is located in → Vienna
Simple Knowledge Organization System (SKOS)
Taxonomies and controlled vocabularies
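Statements like the two RDF examples above, and the labels of a SKOS concept, are all subject-predicate-object triples. A minimal sketch, assuming triples are stored as Python tuples of prefixed names; the `ex:` namespace and the `ex:locatedIn` predicate are hypothetical, while `rdf:type`, `skos:Concept`, `skos:prefLabel` and `skos:altLabel` are the standard RDF and SKOS terms.

```python
# A graph as a set of (subject, predicate, object) triples written as prefixed names.
# "ex:" and ex:locatedIn are hypothetical; rdf:type and skos:* are standard terms.
GRAPH = {
    ("ex:SemanticWebCompany", "rdf:type", "ex:Organization"),
    ("ex:SemanticWebCompany", "ex:locatedIn", "ex:Vienna"),
    ("ex:Vienna", "rdf:type", "skos:Concept"),
    ("ex:Vienna", "skos:prefLabel", "Vienna"),
    ("ex:Vienna", "skos:altLabel", "Wien"),
}


def match(graph, s=None, p=None, o=None):
    """Return the sorted triples matching a pattern; None acts as a wildcard."""
    return sorted(
        (subject, predicate, obj)
        for subject, predicate, obj in graph
        if (s is None or subject == s)
        and (p is None or predicate == p)
        and (o is None or obj == o)
    )


print(match(GRAPH, s="ex:SemanticWebCompany"))  # both statements about the company
print(match(GRAPH, p="skos:altLabel"))           # [('ex:Vienna', 'skos:altLabel', 'Wien')]
```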
http://www.w3.org/2004/02/skos/
From Simple SKOS to large knowledge graphs
1. Generate 1st version of SKOS taxonomy: - Reuse of existing taxonomies - Corpus Analysis - Excel import - XML import
2. Edit, extend & curate taxonomy: - Taxonomy Editing - Collaborative workflows - Free term extraction - Tag recommender - Quality Checker
3. Extend schema, apply ontologies, use SKOS-XL: - Reuse existing ontologies - Create custom schemes - Apply SKOS-XL - Apply ontologies on your SKOS taxonomy
4. Link and map between taxonomies and LD graphs: - Automatic mapping between vocabularies - Linked Data frontend - Link to other LD graphs, e.g. DBpedia or Geonames - Linked data harvester
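“Automatic mapping between vocabularies” can be approximated by comparing normalized labels and proposing `skos:exactMatch` links between concepts whose labels coincide. This is a simple heuristic assumed for illustration (the vocabularies and the `a:` / `b:` URIs are hypothetical); candidates produced this way still need review by a taxonomist, since equal labels do not guarantee equal meaning.

```python
import unicodedata


def normalize(label: str) -> str:
    """Strip accents, trim and case-fold a label so that e.g. 'Économie' equals 'economie'."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.strip().casefold()


def propose_exact_matches(vocab_a: dict[str, list[str]], vocab_b: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Propose (concept_a, 'skos:exactMatch', concept_b) candidates for concepts sharing a label."""
    index_b: dict[str, set[str]] = {}
    for uri, labels in vocab_b.items():
        for label in labels:
            index_b.setdefault(normalize(label), set()).add(uri)
    candidates = {
        (uri_a, "skos:exactMatch", uri_b)
        for uri_a, labels in vocab_a.items()
        for label in labels
        for uri_b in index_b.get(normalize(label), set())
    }
    return sorted(candidates)


labour_law = {"a:employment-law": ["Labour law", "Employment law"]}
other_thesaurus = {"b:0815": ["employment law"], "b:0816": ["Tax law"]}
print(propose_exact_matches(labour_law, other_thesaurus))
# [('a:employment-law', 'skos:exactMatch', 'b:0815')]
```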
[Figure: your data, e.g. Excel; your docs; your CMS]
Linked Vocabularies - Linked Contents
[Figure: Wolters Kluwer Working Law Thesaurus, Eurovoc, STW Thesaurus, DBpedia]
Linked Data & Linked Vocabularies can be reused with increased efficiency
● Linked Data is based on standards and embedded in a wide data ecosystem
→ Semantic Web based ontologies, thesauri, taxonomies and knowledge graphs can be reused at relatively low cost, at least technically speaking.
Linked Open Data Graphs
● 38.8 million entities ● 3 billion facts / triples ● 125 languages ● 50 million links to other sources
SKOS Thesauri
● Eurovoc (EU) ● UNESCO Thesaurus (UN) ● ESCO (EU) ● New York Times SH (US) ● Jurivoc (SUI) ● RAMEAU subject headings (FR) ● ScoT (AUS) ● TheSoz (DE) ● Agrovoc (UN) ● The General Finnish Thesaurus (FIN) ● MeSH (US) ● NAL Thesaurus (US) ● Getty Vocabularies (US) ● Social Semantic Web Thesaurus (AT) ● GEMET (EEA) ● Courts thesaurus (DE) ● GeoThesaurus (AT) ● SITC-V4 (UN) ● STW Economy (DE) ● Google Product Taxonomy (US) ● Polythematic SH (CZ) ● NAICS 2012 (US) ● Canadian Subject Headings (Can) ● Common Procurement Vocabulary (ES) ● LCSH (US) ● UKAT UK Archival Thesaurus (UK) ● Worldbank Taxonomy (WBG) ● NASA taxonomy (US) ● Labor Law Germany Thesaurus (DE) ● IVOA astronomy vocabularies (UK) ● Reegle Thesaurus (REEEP) ● IPTC News Codes (UK) ● Austrian Tax Law Thesaurus (AT) ● WAND taxonomies (US)
Query language for knowledge graphs: SPARQL
Complex queries with SPARQL
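PREFIX mrv-schema: <…>   # namespace IRI not shown on the slide
SELECT DISTINCT * WHERE {
  GRAPH <…> {   # graph IRI not shown on the slide
    ?observation mrv-schema:year ?year .
    ?observation mrv-schema:region ?region .
    ?observation mrv-schema:region …   # the query continues beyond the slide
  }
}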
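A SPARQL basic graph pattern such as the `?observation` triple patterns in the query above is answered by finding variable bindings that satisfy all triple patterns at once. The sketch below, assuming an in-memory list of triples and made-up observation data, joins the patterns one at a time; the predicate names follow the slide's `mrv-schema:` prefix, but the data and the `ex:` identifiers are hypothetical, and real SPARQL engines use indexes and query planning rather than this nested loop.

```python
from typing import Iterable, Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_variable(term: str) -> bool:
    return term.startswith("?")


def extend(binding: Binding, pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None on a conflict."""
    result = dict(binding)
    for pattern_term, value in zip(pattern, triple):
        if is_variable(pattern_term):
            # A variable that is already bound must take the same value (this is the join).
            if result.setdefault(pattern_term, value) != value:
                return None
        elif pattern_term != value:
            return None
    return result


def evaluate_bgp(patterns: list[Triple], graph: Iterable[Triple]) -> list[Binding]:
    """Evaluate a basic graph pattern by matching one triple pattern at a time."""
    triples = list(graph)
    bindings: list[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            new
            for binding in bindings
            for triple in triples
            if (new := extend(binding, pattern, triple)) is not None
        ]
    return bindings


# Made-up observations; only the predicate names follow the slide's mrv-schema: prefix.
graph = [
    ("ex:obs1", "mrv-schema:year", "2012"),
    ("ex:obs1", "mrv-schema:region", "ex:Austria"),
    ("ex:obs2", "mrv-schema:year", "2013"),
]
query = [
    ("?observation", "mrv-schema:year", "?year"),
    ("?observation", "mrv-schema:region", "?region"),
]
print(evaluate_bgp(query, graph))
# [{'?observation': 'ex:obs1', '?year': '2012', '?region': 'ex:Austria'}]
```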
based on
Linked Data Applications
Traditional approach for data and information integration
Show me the ‘most influential people in the world’ who were born in countries with an HDI less than 0.5?
Solution: Develop a specific application to integrate the data sources
[Figure: two separate records. Person 4711: Name Jeff Bezos, Affiliation Amazon, Born in Albuquerque. Country 4812: Name USA, GDP $15.684 trillion, HDI 0.937]
Linked Data combines the requirements of ‘Semantic search’ and ‘Business analytics’
● Linked Data is based on an expressive data model that can represent a wide spectrum of types of information
→ Excellent capabilities for complex search and analytics applications; combines and links the realms of structured & unstructured information
Show me the ‘most influential people in the world’ who were born in countries with an HDI less than 0.5?
Solution: Use taxonomies and ontologies to link knowledge graphs
[Figure: a Thesaurus/Taxonomy-Graph (Continents: America, South America; U.S. → New Mexico → Albuquerque) links Knowledge-Graph 1 (Jeff Bezos, affiliated with Amazon, born in Albuquerque) and Knowledge-Graph 2 (United States: GDP $15.684 trillion, HDI 0.937); an Ontology-Graph defines the classes Person, Organization and Place and the relations affiliated with, born in, GDP and HDI]
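The linking step can be sketched as a join across the graphs: the person's birthplace is walked up the place hierarchy of the taxonomy graph to a country, whose HDI comes from the second knowledge graph. A minimal sketch, assuming plain dictionaries in place of RDF graphs and using only the facts shown on the slide; the function names are hypothetical. With that data no one qualifies for HDI < 0.5, while Jeff Bezos qualifies for HDI > 0.9.

```python
from typing import Callable, Optional

# Facts from the slide, held in plain dictionaries instead of RDF graphs (an assumption).
BORN_IN = {"Jeff Bezos": "Albuquerque"}                                  # knowledge graph 1
PART_OF = {"Albuquerque": "New Mexico", "New Mexico": "United States"}  # taxonomy graph
HDI = {"United States": 0.937}                                           # knowledge graph 2


def country_of(place: str) -> Optional[str]:
    """Walk up the place hierarchy until reaching a place that has an HDI value."""
    visited = set()
    while place not in HDI:
        if place in visited or place not in PART_OF:
            return None  # unknown place or a cycle in the hierarchy
        visited.add(place)
        place = PART_OF[place]
    return place


def people_born_where(hdi_condition: Callable[[float], bool]) -> list[str]:
    """Return people whose birthplace lies in a country whose HDI satisfies the condition."""
    result = []
    for person, birthplace in BORN_IN.items():
        country = country_of(birthplace)
        if country is not None and hdi_condition(HDI[country]):
            result.append(person)
    return sorted(result)


print(people_born_where(lambda hdi: hdi < 0.5))  # []
print(people_born_where(lambda hdi: hdi > 0.9))  # ['Jeff Bezos']
```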
See how it works: PoolParty components & workflows
[Figure: the Content Manager enriches and annotates content in Confluence, WordPress, SharePoint, Drupal, ..., which feeds a search engine or database; the Developer uses the API; the Taxonomist/Ontologist works on reference taxonomies, linked data sources and text reference corpora, which are the basis for analysis and enrichment]
Applications & Online Demos
Thesaurus Publishing ● Business Intelligence ● Content Recommendation ● Semantic Expert Finder ● Web Mining ● Semantic Search ● Semantic Tagging in SharePoint ● Symptom Checker
Demo: PowerTagging for Drupal
Interactive data visualization
Matchmaking based on linked data technologies
Quality of meta information and knowledge graphs describing profiles determine,
Information pieces and business objects like products, users and ads are linked dynamically
Matchmaking between users / content
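Matchmaking between users and content can be sketched as concept overlap: both sides are described by concepts, expanded with their broader concepts so that related (not only identical) topics match, and then compared. Jaccard similarity is a swapped-in measure chosen for illustration, and the taxonomy fragment and item IDs are hypothetical; the slide does not state which similarity measure is used.

```python
# Hypothetical taxonomy fragment: concept -> broader concept.
BROADER = {
    "ex:SolarPower": "ex:RenewableEnergy",
    "ex:WindPower": "ex:RenewableEnergy",
    "ex:RenewableEnergy": "ex:Energy",
}


def expand(concepts: set[str]) -> set[str]:
    """Add all broader concepts so that related, not only identical, tags overlap."""
    result = set(concepts)
    for concept in concepts:
        # Stop early if the broader concept is already present (also guards against cycles).
        while concept in BROADER and BROADER[concept] not in result:
            concept = BROADER[concept]
            result.add(concept)
    return result


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a or b else 0.0


def rank_matches(profile: set[str], items: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Rank items by the Jaccard similarity of their expanded concepts to the profile's."""
    expanded_profile = expand(profile)
    scores = {item_id: jaccard(expanded_profile, expand(tags)) for item_id, tags in items.items()}
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


user = {"ex:SolarPower"}
content = {"article-wind": {"ex:WindPower"}, "article-solar": {"ex:SolarPower"}, "ad-cooking": {"ex:Cooking"}}
print(rank_matches(user, content))
# [('article-solar', 1.0), ('article-wind', 0.5), ('ad-cooking', 0.0)]
```

The wind article only scores because both sides share the broader concepts, which is why the quality of the knowledge graph behind the profiles matters for the result.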
http://faq.poolparty.biz/
http://www.eip-water.eu/
The impact of Linked Data on a new generation of information management
Benefit arguments
[Table: basic arguments (operating efficiency, cost effectiveness, the systemic view) from the perspectives of IT-Management / Software Architect, Information & Knowledge Management and Business Process Management]
• Better reuse of existing information resources helps to save costs
• Better understanding of relations between things increases communication skills
• Unified views on business objects lead to better decisions
• Efficient and agile data model
• Higher information quality
• Increased transparency on inconsistencies and contradictions
• Information flows adapt to the needs of the user
• Efficient handling of metadata
• Improved information retrieval
• Automatic structuring of unstructured data helps to save costs
• Consistent use of controlled vocabularies triggers additional network effects
• BI-like, complex queries become possible
Content value chains based on open data
Summary: Data Silos & Documents open up!
● Graph-based data model ● Standards-based data model ● Expressive data model ● SKOS (Simple Knowledge Organization System) as core element of enterprise knowledge graphs
● Search → Analyze ● Read → Visualise ● Data-driven decision making ● Enterprise Linked Data
Get started! Try it out now.
Get your PoolParty Thesaurus Server & Entity Extractor trial:
http://www.poolparty.biz/test-demo/
Conclusion: Let your enterprise knowledge graphs grow in parallel to your staff’s linked data skills!
Let’s get in contact!
Andreas Blumauer, MSc IT
http://j.mp/ablvienna