Semantic Web Technologies for Digital History: an Introduction
Workshop at the UNIL Digital History Summer School
Maud Ehrmann, Digital Humanities Laboratory, EPFL

What is it all about
Today: how to represent, exchange, link and exploit historical data.
Diagram with three levels (SOURCE LEVEL, DATA/INFORMATION LEVEL, KNOWLEDGE LEVEL) and the following labels:
• unstructured data (primary sources): documents, pictures, videos, art works, tweets, blogs, etc.
• information extraction (manual, (semi)-automatic)
• data storage
• data processing
• knowledge representation
• interoperability
• exploitation

6/21/17 Maud Ehrmann - UNIL Digital History Summer School - Workshop 2

Today's objectives
1. Working with digital historical information
2. Introduction to Semantic Approaches:
   • Semantic Web and Linked Data
   • Resource Description Framework
   • Ontologies
   • SPARQL
3. Hands-on with historical data sets published as Linked Data (LD)

How to handle historical information?
Starting point: historical sources (potentially a lot)
In between: ?
Ending point: new historical knowledge, shared among historians and with the public at large
Hands-on: take any historical source of your choice and list the concrete steps you think you will need in order to exploit it, given some historical questions.

Historical information life cycle
Stages: Creation, Enrichment, Editing, Retrieval, Analysis, Presentation
Central aspects: Durability, Usability, Modelling
Labels on the diagram: transcription & annotation; metadata enhancement; physical production; look-up, queries; design inf. structure; visualization interfaces; exhibitions, online DBs, digital editions, etc.; historical research
Source: Past, present and future of historical information science, Boonstra O., Breure L., Doorn P., Historical Social Research Journal, 2004, 29-2, pp.
4-132

Challenges & Requirements
Regarding historical sources (challenge: requirement)
• incomplete: infer missing information
• uncertain: encode uncertainty
• scattered: interoperability

Challenges & Requirements
Regarding historical information
Challenges:
• relative
• messy
Requirements:
• keep source and interpretation separated
• turn back to the original source
• keep provenance
• represent diverse interpretations
• non-destructive source manipulation
• encode spatio-temporal change
• structure and link data

Challenges & Requirements
Regarding historical investigations
Challenges:
• (individual) iterative process
• difficult to define precise questions at the start
Requirements:
• continuous communication
• tool flexibility and modularity

Semantic Web and Linked Data

The Semantic Web
Web of Documents: URLs
Web of Data: U/IRIs

Standard Web Architecture (simplified view)
Diagram: HTML documents connected by untyped links, served from three sources: European parliament DB, Polish newspaper DB, Wikipedia DB.
Slide (adapted) from Michele Pasin

Standard Web Architecture (simplified view)
Analogy: Global file system
Designed for: Human consumption
Primary objects: Documents
Links between: Documents (or sub parts of)
Degree of structure in object: Low
Semantics of content and links: Implicit
Slide (adapted) from Michele Pasin

Semantic Web Architecture (simplified view)
Diagram: Things connected by typed links, drawn from three sources: European parliament DB, Polish newspaper DB, DBpedia.
Slide (adapted) from Michele Pasin

Semantic Web Architecture (simplified view)
A typed link between two Things, each named by a URI (shown with: HTML, European parliament DB):
<subject URI>: http://europeanparliament.eu/entity/JCJuncker (Thing)
<predicate URI>: http://europeanparliament.eu/relation/mentionedIn
<object URI>: http://europeanparliament.eu/entity/Speech237854 (Thing)
Slide (adapted) from Michele Pasin

Semantic Web Architecture (simplified view)
Example statements, one per source (DBpedia, European parliament DB, Polish newspaper DB):
<dbp-pers:Jean_Claude_Juncker> <dbo:almaMater> <dbp:University_Strasbourg>
<ep:Speech87246> <epo:mentions> <ep-pers:JCJuncker>
<pnp:Person87246> <pnpo:AltLabel> <pnpo:Jean_Claudem_Junckerem>
Analogy: Global data base
Designed for: Human and Machines
Primary objects: Things (expressed through URIs)
Links between: Things (expressed through URIs)
Degree of structure in object: High
Semantics of content and links: Explicit
Slide (adapted) from Michele Pasin

The Semantic Web
"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."
(TBL et al., 2001)

The Semantic Web
The same definition, annotated with its building blocks:
• URIs
• (linked) data
• shared vocabularies and ontologies
• query and inference

The Semantic Web
Render data so that people (and machines) can:
• access
• understand
• exploit
Strong focus on interoperability:
• syntactic level (format): machines can read the data
• semantic level (conceptual model): machines can understand the data

Linked Data
A way of publishing data according to Semantic Web standards:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.
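The four principles above can be made concrete with a few lines of Python. The block below is only a minimal sketch using the standard library, not a real RDF toolkit: it stores statements as (subject, predicate, object) tuples of URIs and checks them against principles 2 and 4. The first triple reuses the URIs shown on the slides; the owl:sameAs link to DBpedia and the helper names (is_http_uri, to_ntriples, external_links, describe) are assumptions introduced for illustration. Principle 3 is only imitated offline, since no server is contacted.

```python
from urllib.parse import urlparse

# Base URI taken from the slides' example.
EP = "http://europeanparliament.eu/"

# One statement = (subject URI, predicate URI, object URI).
triples = [
    # Triple shown on the slides.
    (EP + "entity/JCJuncker", EP + "relation/mentionedIn", EP + "entity/Speech237854"),
    # Hypothetical outgoing link to another dataset (illustrates principle 4).
    (EP + "entity/JCJuncker",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://dbpedia.org/resource/Jean-Claude_Juncker"),
]


def is_http_uri(uri):
    """Principle 2: the name is an HTTP(S) URI, so it can be looked up."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def to_ntriples(statements):
    """Serialize URI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


def external_links(statements):
    """Principle 4: statements whose object lives on a different host than the subject."""
    return [t for t in statements if urlparse(t[0]).netloc != urlparse(t[2]).netloc]


def describe(uri, statements):
    """Principle 3 (offline stand-in): the statements a lookup of `uri` could return."""
    return [t for t in statements if t[0] == uri]


if __name__ == "__main__":
    print(to_ntriples(triples))
    print(all(is_http_uri(u) for t in triples for u in t))
    print(external_links(triples))
```

In practice, a library such as rdflib and a live SPARQL endpoint would replace these helpers; the tuples are used here only to show that a Linked Data statement is nothing more than three URIs.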
See: https://www.w3.org/DesignIssues/LinkedData.html

[Figure: Linked Open Data cloud diagram, 01/2007. Source: http://lod-cloud.net/]
[Figure: Linked Open Data cloud diagram, 10/2007. Source: http://lod-cloud.net/]
[Figure: Linked Open Data cloud diagram, 2008. Source: http://lod-cloud.net/]
[Figure: Linked Open Data cloud diagram, 2009 (as of March 2009). Datasets shown include DBpedia, Freebase, Geonames, WordNet, DBLP, UniProt and PubMed. Source: http://lod-cloud.net/]
[Figure: Linked Open Data cloud diagram, 2010]