Using Rdfa to Link Text and Dictionary Data for Medieval French

Total Page:16

File Type:pdf, Size:1020Kb

Using Rdfa to Link Text and Dictionary Data for Medieval French Using RDFa to Link Text and Dictionary Data for Medieval French Sabine Tittel,∗ Helena Bermúdez-Sabel, Christian Chiarcos◦ ∗Heidelberg Academy of Sciences and Humanities, Heidelberg, Germany Laboratorio de Innovación en Humanidades Digitales, UNED, Madrid, Spain ◦Goethe University Frankfurt, Frankfurt am Main, Germany [email protected], [email protected], [email protected] Abstract This paper presents an endeavor to transform a scholarly text edition (of a medical treatise written in Middle French) into a digital edition enriched with references to an on-line dictionary. Hitherto published as a book, the resulting digital edition will use RDFa to interlink its vocabulary with the corresponding lexical entries of the Dictionnaire étymologique de l’ancien français (DEAF). We demonstrate the feasibility of RDFa for the semantic enrichment of digital editions within the philologies. In particular, the technological support for RDFa excels beyond domain-specific solutions favored by the TEI community. Our findings may thus contribute to future technological bridges between TEI/XML and (Linguistic) Linked Open Data resources. The original data of the edition is available in a LATEX format that includes profound semantic markup. We convert this data into XML/TEI, and integrate RDFa-compliant attributes for every lexeme attested in the text. The HTML5 edition generated from the XML sources preserves the RDFa attributes and thus (a) embeds (links) its vocabulary within the overall system of the medieval French language, and that (b) provides and displays linguistic features (say, sense definitions given in the original corpus data) along with the critical apparatus of the original book publication. Keywords: TEI, RDFa, LATEX, Digital Scholarly Text Edition, Lexicography, Old French, Middle French. 1. Introduction readable view, optimized for a particular way of visual ar- rangement, whereas the underlying semantics are left im- In the field of textual philology of Medieval French, the plicit (and thus cannot be easily recovered for applications traditional way to print books from a scholarly text edi- to be developed on top of these resources, say, tools for tion is still the publication method of choice. The number the automatically supported linguistic analysis of Medieval of digitally published editions of Old and Middle French French). The Resource Description Framework (RDF)1 is is growing at a pace that is rather slow compared to the designed as a means to complement or to replace conven- text editions of modern texts. The results are embrac- tional hyperlinks with semantically typed links. However, ing a wide spectrum of approaches from retro-digitization the state of the art in this area is defined by the XML spec- (Reisdoerfer, 1996a; Reisdoerfer, 1996b; Harsch, 1996) to ifications of the TEI (TEI Consortium, 2017b), and these digital-born on-line editions whose creation demands for provide only limited support for integrating RDF data struc- a big investment of time and resources (Eley et al., 2005; tures or references to Linked Data resources. An alternative Nichols and Choudhury, 2017; Laidlaw, 2015). However, which – to our best knowledge – has not previously been regardless of their format, these publications are almost ex- applied in our domain would be to extend conventional clusively stand-alone products. To a greater or lesser de- XML representations with RDFa (Herman et al., 2015), a gree, the reader of a stand-alone edition of Old and Middle formalism that provides specifications to extend XML for- French texts is thus left alone with the attempt to under- mats with arbitrary RDF links. This way of application, stand the text. On its own, an edition is often deprived of however, entails that the TEI vocabulary is to be extended a means that helps to explain the meaning of the vocab- by RDFa attribute definitions within its own namespace. ulary in a broader context: Neither is the vocabulary re- Hence, we strive for a dual objective in this paper: lated to a more comprehensive lexicon of Old and Middle French, nor does a mere glossary reveal the significance of • We describe the transformation of a scholarly text its specific vocabulary within the history of the language. edition (whose original data is encoded with LATEX Only very few digital editions provide a more comprehen- markup) into a high-quality on-line edition in a time- sive access to such information, for example The Online saving manner. Froissart. A Digital Edition of the Chronicles of Jean Frois- sart, Version 1.5, edited by Peter Ainsworth and Godfried • We further evaluate the possibility of using RDFa to Croenen, Sheffield (HRIOnline) 2013, http://www. provide explicit semantics for the links between this hrionline.ac.uk/onlinefroissart/ [accessed on-line publication and the entries of a reference dic- 01-03-2018]: With a specific preference (Global Viewing tionary. Mode ‘DMF’), this on-line edition links its lexemes with the entries of the Dictionnaire du Moyen Français – DMF 1RDF 1.1 Primer, W3C Working Group Note 24 June (Martin, 2015). 2014, https://www.w3.org/TR/rdf11-primer/ (work However, conventional web editions provide a human- in progress) [accessed 01-06-2018]. As a concrete benefit of an RDFa-extended edition of the content we relate the vocabulary of the Grande Chirurgie edited text, these links can be automatically processed by to the respective dictionary entries of the DEAF and to their RDFa parsers such as http://sparql.org [accessed etymological, semantic and encyclopedic discussions. 01-06-2018]. Also, together with other resources pointing to the same dictionary (or, if available, an RDF edition of 3. Original LATEX data the dictionary itself), they can be integrated into larger data The critical text edition of the first treatise of the French bases that combine lexical information from multiple edi- translation, the Anathomy, serves as our corpus data. The tions. text edition is hitherto published as a book (Tittel, 2004); With the interlinking of the lexemes in the text with the en- in the following, we cite this edition with its siglum tries in the dictionary we are able to embed the vocabulary GuiChaulMT.3 We have access to the original format of the within the overall system of the medieval French language data, which comes as a text file with LAT X markup (exam- as it is established by our reference dictionary. This defines E ple 1). This markup has been the basis for a fully auto- their place within the semantic framework given in the dic- mated generation of the text edition (PDF) in 2004 includ- tionary articles. ing a critical apparatus (we used the Edmac package for Our main contribution is to demonstrate the feasibility of typesetting scholarly critical editions4) and an index of the RDFa as an instrument of semantically enriched digital edi- vocabulary (with approx. 1,350 entries). Also, the markup tions within the philologies. In particular, the technological includes the semantic information to create an index of per- support for RDFa excels beyond domain-specific solutions sonal names, i.e., of the authorities of classical and me- favored by the TEI community. Our findings may thus con- dieval medicine that are invoked by Gui de Chauliac to un- tribute to the development of technological bridges between derpin his expertise, e.g., Galen of Pergamon and Avicenna. TEI/XML and (Linguistic) Linked Open Data resources. The typesetting has been executed by a tailor-made LATEX 2. Text and Dictionary class, see Fig.1 and Fig. 2 for the published result. We demonstrate our approach for an edition of a part of the Middle French Grande Chirurgie composed by Gui de Chauliac. The reference dictionary is the compre- hensive etymological dictionary of the Old French lan- guage, the Dictionnaire étymologique de l’ancien français – DEAF (Baldinger, since 1971) which is also relevant for our Middle French text. Its on-line publication, DEAFél, offers dictionary articles for the Old French lexis in open access, see https://deaf-server.adw. 2 uni-heidelberg.de/ [accessed 12-12-2017]. Figure 1: Detail of the text edition, GuiChaulMT p. 78. Gui de Chauliac is one of the most widely known physi- cians and medical authors of the French Middle Ages. His opus, the Latin Chirurgia magna from 1363 AD (Grande Chirurgie in its French translation), is considered the most profound compendium of the medical knowledge of its day. In seven treatises, Gui de Chauliac describes the human anatomy, tumors and cancer, wounds and fractures, the plague, eye, ear, and dental pathology, etc. The work was Figure 2: Detail of the glossary, GuiChaulMT p. 371. translated into French presumably soon after 1363 and the oldest French manuscript transmitting this medical mile- The first step towards our aim was thus the conversion stone dates from the 2nd third of the 15th century. of the original data format into an XML representation. The Grande Chirurgie is of particular relevance for the his- The source material provides excellent preconditions be- tory of the French language as well as for both the history of cause it contains valuable information that we can use for medicine and mentality: On the one hand, the content of the automated processing: The LAT X annotation comprises Grande Chirurgie passes on the medical beliefs propagated E the lemmatization of the vocabulary (i.e., the alignment by the authorities of Classical Antiquity, e.g., the concept of of the individual graphical realization of a lexeme with humoralism by Galen of Pergamon. On the other hand, it is the spelling that is accepted as the Old French standard a valuable witness of a critical re-examination and cautious (Möhren, 2015)), word-category disambiguation (part-of- overcoming of antique explanatory models and movement speech tagging) and also word-sense disambiguation (se- towards a modern understanding of the body.
Recommended publications
  • Using Json Schema for Seo
    Using Json Schema For Seo orAristocratic high-hat unyieldingly.Freddie enervates Vellum hungrily Zippy jangles and aristocratically, gently. she exploiter her epoxy gnarls vivace. Overnice and proclitic Zane unmortgaged her ben thrum This provides a murder of element ids with more properties elsewhere in the document Javascript Object Notation for Linked Objects JSON-LD. Enhanced display search results with microdata markup is json data using video we need a website experience, is free whitepaper now need a form. Schemaorg Wikipedia. Sign up in some time and as search console also, he gets generated by google tool you add more. Schema Markup 2021 SEO Best Practices Moz. It minimal settings or where your page editor where can see your business information that will talk about. Including your logo, social media and corporate contact info is they must. How various Use JSON-LD for Advanced SEO in Angular by Lewis. How do no implement a FAQ schema? In seo plugin uses standard schema using html. These features can describe you stand only in crowded SERPs and enclose your organic clickthrough rate. They propose using the schemaorg vocabulary along between the Microdata RDFa or JSON-LD formats to that up website content with metadata about my Such. The incomplete data also can mild the Rich Snippets become very inconsistent. Their official documentation pages are usually have few months or even years behind. Can this be included in this? Please contact details about seo services, seos often caches versions of. From a high level, you warrior your adventure site pages, you encounter use an organization schema.
    [Show full text]
  • V a Lida T in G R D F Da
    Series ISSN: 2160-4711 LABRA GAYO • ET AL GAYO LABRA Series Editors: Ying Ding, Indiana University Paul Groth, Elsevier Labs Validating RDF Data Jose Emilio Labra Gayo, University of Oviedo Eric Prud’hommeaux, W3C/MIT and Micelio Iovka Boneva, University of Lille Dimitris Kontokostas, University of Leipzig VALIDATING RDF DATA This book describes two technologies for RDF validation: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL), the rationales for their designs, a comparison of the two, and some example applications. RDF and Linked Data have broad applicability across many fields, from aircraft manufacturing to zoology. Requirements for detecting bad data differ across communities, fields, and tasks, but nearly all involve some form of data validation. This book introduces data validation and describes its practical use in day-to-day data exchange. The Semantic Web offers a bold, new take on how to organize, distribute, index, and share data. Using Web addresses (URIs) as identifiers for data elements enables the construction of distributed databases on a global scale. Like the Web, the Semantic Web is heralded as an information revolution, and also like the Web, it is encumbered by data quality issues. The quality of Semantic Web data is compromised by the lack of resources for data curation, for maintenance, and for developing globally applicable data models. At the enterprise scale, these problems have conventional solutions. Master data management provides an enterprise-wide vocabulary, while constraint languages capture and enforce data structures. Filling a need long recognized by Semantic Web users, shapes languages provide models and vocabularies for expressing such structural constraints.
    [Show full text]
  • Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing
    RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing RDFa in XHTML: Syntax and Processing A collection of attributes and processing rules for extending XHTML to support RDF W3C Recommendation 14 October 2008 This version: http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014 Latest version: http://www.w3.org/TR/rdfa-syntax Previous version: http://www.w3.org/TR/2008/PR-rdfa-syntax-20080904 Diff from previous version: rdfa-syntax-diff.html Editors: Ben Adida, Creative Commons [email protected] Mark Birbeck, webBackplane [email protected] Shane McCarron, Applied Testing and Technology, Inc. [email protected] Steven Pemberton, CWI Please refer to the errata for this document, which may include some normative corrections. This document is also available in these non-normative formats: PostScript version, PDF version, ZIP archive, and Gzip’d TAR archive. The English version of this specification is the only normative version. Non-normative translations may also be available. Copyright © 2007-2008 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported - 1 - How to Read this Document RDFa in XHTML: Syntax and Processing into a user’s desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo’s creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.
    [Show full text]
  • Where Is the Semantic Web? – an Overview of the Use of Embeddable Semantics in Austria
    Where Is The Semantic Web? – An Overview of the Use of Embeddable Semantics in Austria Wilhelm Loibl Institute for Service Marketing and Tourism Vienna University of Economics and Business, Austria [email protected] Abstract Improving the results of search engines and enabling new online applications are two of the main aims of the Semantic Web. For a machine to be able to read and interpret semantic information, this content has to be offered online first. With several technologies available the question arises which one to use. Those who want to build the software necessary to interpret the offered data have to know what information is available and in which format. In order to answer these questions, the author analysed the business websites of different Austrian industry sectors as to what semantic information is embedded. Preliminary results show that, although overall usage numbers are still small, certain differences between individual sectors exist. Keywords: semantic web, RDFa, microformats, Austria, industry sectors 1 Introduction As tourism is a very information-intense industry (Werthner & Klein, 1999), especially novel users resort to well-known generic search engines like Google to find travel related information (Mitsche, 2005). Often, these machines do not provide satisfactory search results as their algorithms match a user’s query against the (weighted) terms found in online documents (Berry and Browne, 1999). One solution to this problem lies in “Semantic Searches” (Maedche & Staab, 2002). In order for them to work, web resources must first be annotated with additional metadata describing the content (Davies, Studer & Warren., 2006). Therefore, anyone who wants to provide data online must decide on which technology to use.
    [Show full text]
  • XHTML+Rdfa 1.1 - Third Edition Table of Contents
    XHTML+RDFa 1.1 - Third Edition Table of Contents XHTML+RDFa 1.1 - Third Edition Support for RDFa via XHTML Modularization W3C Recommendation 17 March 2015 This version: http://www.w3.org/TR/2015/REC-xhtml-rdfa-20150317/ Latest published version: http://www.w3.org/TR/xhtml-rdfa/ Implementation report: http://www.w3.org/2010/02/rdfa/wiki/CR-ImplementationReport Previous version: http://www.w3.org/TR/2014/PER-xhtml-rdfa-20141216/ Previous Recommendation: http://www.w3.org/TR/2013/REC-xhtml-rdfa-20130822/ Editor: Shane McCarron, Applied Testing and Technology, Inc., [email protected] Please check the errata for any errors or issues reported since publication. This document is also available in these non-normative formats: XHTML+RDFa, Diff from Previous Recommendation, Postscript version, and PDF version The English version of this specification is the only normative version. Non-normative translations may also be available. Copyright © 2007-2015 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply. Abstract RDFa Core 1.1 [RDFA-CORE [p.61] ] defines attributes and syntax for embedding semantic markup in Host Languages. This document defines one such Host Language. This language is a superset of XHTML 1.1 [XHTML11-2e [p.61] ], integrating the attributes as defined in RDFa Core 1.1. This document is intended for authors who want to create XHTML Family documents that embed rich semantic markup. - 1 - Status of This Document XHTML+RDFa 1.1 - Third Edition Status of This Document This section describes the status of this document at the time of its publication.
    [Show full text]
  • Rdfa Versus Microformats: Exploring the Potential for Semantic Interoperability of Mash-Up Personal Learning Environments
    RDFa versus Microformats: Exploring the Potential for Semantic Interoperability of Mash-up Personal Learning Environments Vladimir Tomberg, Mart Laanpere Tallinn University, Narva mnt. 25, 10120 Tallinn, Estonia [email protected], [email protected] Abstract. This paper addresses the possibilities for increasing semantic interoperability of mash-up learning environments through the use of automatically processed metadata associated with both learning resources and learning process. We analyze and compare potential of two competing technologies for this purpose: microformats and RDFa. 1 Introduction Mash-up Personal Learning Environments have become a fast developing trend in the world of technology-enhanced learning, partly because of their flexibility and lightweight integration features. Although it is quite easy to aggregate the RSS feeds from the blogs of learners, it is more difficult to get an overview of course and its learning activities. A course is not just a syllabus, it also involves various dynamic processes that can be described in many aspects. The course always has certain learning goals, a schedule that consists learning activities (assignments, discussions), registered participants like teachers and students, and different types of resources. It would be useful, if we would be able to extract such information also from mash-up personal learning environments (just like it can be done in traditional Learning Management Systems) and allow exchanging it between the course participants. Today for semantic tagging of Web content in general and learning content as special case various technologies are used. But there are no tools and ways exist for semantic annotation of learning process that takes place in a distributed network of mash-up personal learning environments.
    [Show full text]
  • Rdfa How and Why
    RDFa How and Why Ralph R. Swick World Wide Web Consortium 22 January 2010 2 A Web of Information The current Web represents information using natural language graphics, multimedia, page layout Humans can process this easily can deduce facts from partial information can create mental associations can deal with a variety of media (see Web accessibility guidelines) 3 Data in HTML Documents 4 Data Exposed with MarkUp microformats http://microformats.org/ 9+ sets of terms and relations (formats): hCalendar, hCard, rel-license, rel-nofollow, rel-tag, VoteLinks, XFN, XMDP, XOXO each of the formats has a custom interpreter semi-formal community agreement process for creating new formats (terms and relations) extensible only through community agreement 5 Special-purpose “smart” portals Various types of “portals” are created (for a journal on-line, for a specific area of knowledge, for specific communities, etc) The portals may: integrate many data sources may have access to specialized domain knowledge Goal is to provide a better local access, search on the integrated data, reveal new relationships among the data 6 A Web of Linked Data Encourage all data to be published in machine- useable formats Any data see, e.g. data.gov.uk The LOD “cloud”, July 2009 7 credit: Richard Cyganiak, LOD cloud 8 Extensible Structured Data RDF - Resource Description Framework 3 syntaxes: RDF/XML, RDFa, RDF/N3* A data modeling framework that takes full advantage of the Web Self-describing data Objects, terms, and relations are all identified using URIs Publish any data by publishing definitions of your terms and relations * only RDF/XML and RDFa are W3C Recommendations at this time 9 Semantic Web data integration 1.
    [Show full text]
  • The Resource Description Framework and Its Schema Fabien Gandon, Reto Krummenacher, Sung-Kook Han, Ioan Toma
    The Resource Description Framework and its Schema Fabien Gandon, Reto Krummenacher, Sung-Kook Han, Ioan Toma To cite this version: Fabien Gandon, Reto Krummenacher, Sung-Kook Han, Ioan Toma. The Resource Description Frame- work and its Schema. Handbook of Semantic Web Technologies, 2011, 978-3-540-92912-3. hal- 01171045 HAL Id: hal-01171045 https://hal.inria.fr/hal-01171045 Submitted on 2 Jul 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. The Resource Description Framework and its Schema Fabien L. Gandon, INRIA Sophia Antipolis Reto Krummenacher, STI Innsbruck Sung-Kook Han, STI Innsbruck Ioan Toma, STI Innsbruck 1. Abstract RDF is a framework to publish statements on the web about anything. It allows anyone to describe resources, in particular Web resources, such as the author, creation date, subject, and copyright of an image. Any information portal or data-based web site can be interested in using the graph model of RDF to open its silos of data about persons, documents, events, products, services, places etc. RDF reuses the web approach to identify resources (URI) and to allow one to explicitly represent any relationship between two resources.
    [Show full text]
  • JSON-LD 1.1 – a JSON-Based Serialization for Linked Data Gregg Kellogg, Pierre-Antoine Champin, Dave Longley
    JSON-LD 1.1 – A JSON-based Serialization for Linked Data Gregg Kellogg, Pierre-Antoine Champin, Dave Longley To cite this version: Gregg Kellogg, Pierre-Antoine Champin, Dave Longley. JSON-LD 1.1 – A JSON-based Serialization for Linked Data. [Technical Report] W3C. 2019. hal-02141614v2 HAL Id: hal-02141614 https://hal.archives-ouvertes.fr/hal-02141614v2 Submitted on 28 Jan 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. https://www.w3.org/TR/json-ld11/ JSON-LD 1.1 A JSON-based Serialization for Linked Data W3C Candidate Recommendation 12 December 2019 This version: https://www.w3.org/TR/2019/CR-json-ld11-20191212/ Latest published version: https://www.w3.org/TR/json-ld11/ Latest editor's draft: https://w3c.github.io/json-ld-syntax/ Test suite: https://w3c.github.io/json-ld-api/tests/ Implementation report: https://w3c.github.io/json-ld-api/reports/ Previous version: https://www.w3.org/TR/2019/WD-json-ld11-20191112/ Latest Recommendation: https://www.w3.org/TR/2014/REC-json-ld-20140116/ Editors: Gregg Kellogg (v1.0 and v1.1) Pierre-Antoine Champin
    [Show full text]
  • Shapeness: a SHACL-Driven RDF Graph Editor
    SHAPEness: a SHACL-driven RDF Graph Editor Rossana Pacielloa,c,*, Daniele Bailoa,c, Luca Tranib, Valerio Vinciarellia, Manuela Sbarrac, Lorenzo Fen- oglioc, Sara Capotostic a European Plate Observing System, EPOS-ERIC, Rome, Italy b Department of R&D Seismology and Acoustics, Royal Netherlands Meteorological Institute (KNMI), Utrechtseweg 297, 3731 GA, De Bilt, The Netherlands c Istituto Nazionale di Geofisica e Vulcanologia (INGV), Rome, Italy E-mails: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected] Abstract. The Shapes Constraint Language (SHACL) has been recently introduced as a W3C recommendation to define con- straints (called shapes) for validating RDF graphs. In this paper a novel SHACL-driven multi-view editor is presented: SHAPEness. It empowers users by offering them a rich interface for assessing and improving the quality of RDF graphs. SHAPEness has been developed and tested in the framework of the European Plate Observing System (EPOS). In this context, the SHAPEness features have proven to be a valuable solution to easily create and maintain valid graphs according to the EPOS data model. The SHACL-driven approach underpinning SHAPEness, makes this tool suitable for a broad range of do- mains, or use cases, which structure their knowledge by means of SHACL constraints. Keywords: SHACL shapes, RDF validation, knowledge graph quality, graphs visualization, metadata editor. 1. Introduction FAIR principles, metadata can facilitate data ex- change, data findability [14] and data access [5]. Over the past decade, the use of standard semantic Resource Description Framework1 (RDF) is the technologies and linked data principles have become most popular format for exchanging semantic (me- general practice in structuring metadata and related ta)data, structured as dynamic and schemaless graphs.
    [Show full text]
  • Rdfa 1.1 Primer - Second Edition Page 1 of 37
    RDFa 1.1 Primer - Second Edition Page 1 of 37 RDFa 1.1 Primer - Second Edition Rich Structured Data Markup for Web Documents W3C Working Group Note 22 August 2013 This version: http://www.w3.org/TR/2013/NOTE-rdfa-primer-20130822/ Latest published version: http://www.w3.org/TR/rdfa-primer/ Previous version: http://www.w3.org/TR/2012/NOTE-rdfa-primer-20120607/ Editors: Ivan Herman, W3C, [email protected] Ben Adida, Creative Commons, [email protected] Manu Sporny, Digital Bazaar, [email protected] Mark Birbeck, webBackPlane.com, [email protected] Please refer to the errata for this document, which may include some normative corrections. Copyright © 2010-2013 W3C®(MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract The last couple of years have witnessed a fascinating evolution: while the Web was initially built predominantly for human consumption, web content is increasingly consumed by machines which expect some amount of structured data. Sites have started to identify a page's title, content type, and preview image to provide appropriate information in a user's newsfeed when she clicks the "Like" button. Search engines have started to provide richer search results by extracting fine-grained structured details from the Web pages they crawl. In turn, web publishers are producing increasing amounts of structured data within their Web content to improve their standing with search engines. A key enabling technology behind these developments is the ability to add structured data to HTML pages directly. RDFa (Resource Description Framework in Attributes) is a technique that allows just that: it provides a set of markup attributes to augment the visual information on the Web with machine- readable hints.
    [Show full text]
  • The Webdatacommons Microdata, Rdfa, and Microformat Dataset Series Robert Meusel, Petar Petrovski, and Christian Bizer HTML-Embedded Structured Data on the Web
    The WebDataCommons Microdata, RDFa, and Microformat Dataset Series Robert Meusel, Petar Petrovski, and Christian Bizer HTML-embedded Structured Data on the Web More and more websites semantically markup the content of their HTML pages. RDFa Microformats Microdata The WebDataCommons Microdata, RDFa, and Microformats Dataset Series 2 Dataset Creation − Common Crawl Foundation Corpora of 2010, 2012 and 2013 • Snapshot of popular pages of the Web • Continuously new crawls available The WebDataCommons Microdata, RDFa, and Microformats Dataset Series 3 Dataset Creation − Common Crawl Foundation Corpora of 2010, 2012 and 2013 • Snapshot of popular pages of the Web • Continuously new crawls available − Parsing the HTML pages using Apache Any23 • Using a distributed framework on 100 parallel EC2 instances 1. _:node1 <http://www.w3.org/1999/02/22-rdf-syntax- ns#type> <http://schema.org/Product> . 2. _:node1 <http://schema.org/Product/name> "Predator Instinct FG Fu\u00DFballschuh"@de . 3. _:node1 <http://www.w3.org/1999/02/22-rdf-syntax- Any23 ns#type> <http://schema.org/Offer> . 4. _:node1 <http://schema.org/Offer/price> "\u20AC 219,95"@de . 5. _:node1 <http://schema.org/Offer/priceCurrency> "EUR"@de . 6. … The WebDataCommons Microdata, RDFa, and Microformats Dataset Series 4 Dataset Creation − Common Crawl Foundation Corpora of 2010, 2012 and 2013 • Snapshot of popular pages of the Web • Continuously new crawls available − Parsing the HTML pages using Apache Any23 • Using a distributed framework on 100 parallel EC2 instances 1. _:node1 <http://www.w3.org/1999/02/22-rdf-syntax- ns#type> <http://schema.org/Product> . 2. _:node1 <http://schema.org/Product/name> "Predator Instinct FG Fu\u00DFballschuh"@de .
    [Show full text]