The WebDataCommons Microdata, RDFa, and Microformat Dataset Series
Robert Meusel, Petar Petrovski, and Christian Bizer

HTML-embedded Structured Data on the Web
− More and more websites semantically mark up the content of their HTML pages.
− Markup formats: RDFa, Microformats, Microdata

Dataset Creation
− Common Crawl Foundation corpora of 2010, 2012 and 2013
  • Snapshot of popular pages of the Web
  • New crawls are continuously made available
− Parsing the HTML pages using Apache Any23
  • Using a distributed framework on 100 parallel EC2 instances
  • Example Any23 output:
      _:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> .
      _:node1 <http://schema.org/Product/name> "Predator Instinct FG Fu\u00DFballschuh"@de .
      _:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Offer> .
      _:node1 <http://schema.org/Offer/price> "\u20AC 219,95"@de .
      _:node1 <http://schema.org/Offer/priceCurrency> "EUR"@de .
      …
− The framework is easy to adapt and is publicly available at: http://webdatacommons.org/framework/

Dataset Series Overview
− The series contains three datasets, from 2010, 2012 and 2013
− Altogether over 30 billion RDF quads
− Each dataset is further split into subsets, each containing the quads extracted for a particular markup language

Overview of 2013 Dataset
− Over 1.7 million domains using at least one markup language
− Over 17 billion quads with over 4 billion records (typed entities)
− hCard still most dominant among domains
− Microdata contains the largest number of quads

Divergence in Class and Property Usage in 2013
− A small number of classes and properties is used by a large number of domains
− RDFa: 646k classes and 27k properties, but <1k classes and ~2k properties are used by at least two different domains
− Microdata: 15k classes and 170k properties, but ~1.2k classes and <13k properties are used by at least two different domains
− Classes and properties used by only one domain are mostly typos

RDFa Insights 2013
− Various vocabularies are used to describe information:
  • Strong presence of the Open Graph Protocol (e.g. Facebook)
  • FOAF and SIOC (blog software such as Drupal)
− Largest topics covered are:
  • Articles and Documents (blogs and news portals)
  • Products, Reviews and Ratings
  • BusinessEntities and Organizations

Microdata Insights 2013 and 2012
− Clear growth in comparison to 2012
− Still two vocabularies deployed: data-vocabulary and schema.org
− Largest topical areas:
  • Postal Addresses and Locations
  • Products, Offers and Ratings
  • Organizations and Persons
  • Articles and Blogs
  • Breadcrumb

Focus on Schema.org/Product
− One of the largest publicly available product collections
− Almost 100 million records described with name, offer and image
− 34 million records contain a further description
− 11% of all product records include a brand

Microformats Insights 2013
− Most dominant vocabulary is hCard
− Still a very solid deployment
− Topics are:
  • Persons & Organizations
  • Events
  • Products and reviews
  • Recipes

Opportunities & Challenges
Opportunities
− Vast amounts of free data, created by people all over the world
− Large topical coverage, from broad areas (such as products) to niche ones (such as recipes)
− Highly up-to-date information, as popular pages potentially update their content frequently
Challenges
− Data quality assessment, as the data is created by experts and rookies alike
− Further information extraction, as a flat schema and a rather low number of properties are used
− Identity resolution, as the data hardly contains any identifiers

Possible Application Domains
− Enriching existing knowledge bases
  • E.g. mapping DBpedia classes and properties to the corresponding classes and properties within the available vocabularies, to add missing information and extend entity knowledge
  • As shown by Lehmberg et al. within the Semantic Web Challenge, this data can be used as an additional source (besides others) to gather and return wider search results
− Design and adaptation of algorithms and methods to face the characteristics of such web data
  • Training of data extraction methods to gather data that is not marked up within the HTML pages
  • Further extraction of additional information from the raw data, e.g. extraction of skills, requirements etc. from job posting descriptions
− Starting point for further data discovery
  • The dataset can be used as a starting point for further data crawling, as in most cases not all pages from a domain are included

Thank you! Questions? Feedback?
Data and more statistics can be found at: http://webdatacommons.org/structureddata/index.html
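The per-class domain counts reported under "Divergence in Class and Property Usage in 2013" could be approximated along the following lines. This is only a minimal sketch, not the authors' extraction framework. It assumes that each published quad carries the URL of the source page as its fourth element (the Any23 examples in the slides show only the first three terms), and it uses the URL host as a rough stand-in for a domain. The function names (parse_quad, domains_per_class, shared_classes) and the regular expression are hypothetical and cover only simple N-Quads lines.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# One N-Quads term: IRI, blank node, or literal with optional language tag or datatype.
# Simplified on purpose; a full N-Quads parser handles more cases.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
QUAD = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}(?:\s+{TERM})?\s*\.\s*$')


def parse_quad(line):
    """Return (subject, predicate, object, graph), or None if the line does not match.

    The graph is None when the line has only three terms.
    """
    match = QUAD.match(line)
    return match.groups() if match else None


def strip_iri(term):
    """Remove the angle brackets around an IRI term; other terms are returned unchanged."""
    if term and term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    return term


def domains_per_class(lines):
    """Map each class IRI to the set of hosts whose pages use it in an rdf:type quad."""
    usage = defaultdict(set)
    for line in lines:
        quad = parse_quad(line)
        if quad is None:
            continue
        _subject, predicate, obj, graph = quad
        if strip_iri(predicate) != RDF_TYPE or graph is None:
            continue
        # Assumption: the fourth element is the page URL; its host approximates the domain.
        host = urlparse(strip_iri(graph)).netloc
        usage[strip_iri(obj)].add(host)
    return usage


def shared_classes(usage, min_domains=2):
    """Return the classes used by at least `min_domains` different hosts, sorted."""
    return sorted(cls for cls, hosts in usage.items() if len(hosts) >= min_domains)


if __name__ == "__main__":
    # Hypothetical sample quads; the example.* hosts are placeholders.
    sample = [
        '_:n1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> <http://shop-a.example/p/1> .',
        '_:n2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> <http://shop-b.example/x> .',
        '_:n3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Prodcut> <http://shop-b.example/y> .',
    ]
    print(shared_classes(domains_per_class(sample)))
```

In the sample, the misspelled class appears on a single host only and is therefore filtered out, which mirrors the observation that classes used by only one domain are mostly typos. For real data, a dedicated N-Quads parser and a pay-level-domain lookup would likely be more robust than the regular expression and host name used here.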