Web-Scale Profiling of Semantic Annotations in HTML Pages

Total Page:16

File Type:pdf, Size:1020Kb

Web-Scale Profiling of Semantic Annotations in HTML Pages Web-Scale Profiling of Semantic Annotations in HTML Pages Inauguraldissertation zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften der Universitat¨ Mannheim vorgelegt von Dipl.-Wirtsch.-Inf. Robert Meusel aus Berlin Mannheim, 2016 Dekan: Professor Dr. Heinz Jurgen¨ Muller,¨ Universitat¨ Mannheim Referent: Professor Dr. Christian Bizer, Universitat¨ Mannheim Korreferent: Professor Dr. Wolfgang Nejdl, Leibniz Universitat¨ Hannover Tag der mundlichen¨ Prufung:¨ 10. Marz¨ 2017 Abstract The vision of the Semantic Web was coined by Tim Berners-Lee almost two decades ago. The idea describes an extension of the existing Web in which “information is given well-defined meaning, better enabling computers and people to work in cooperation” [Berners-Lee et al., 2001]. Semantic annotations in HTML pages are one realization of this vision which was adopted by large numbers of web sites in the last years. Semantic annotations are integrated into the code of HTML pages using one of the three markup languages Microformats, RDFa, or Microdata. Major consumers of semantic annotations are the search engine companies Bing, Google, Yahoo!, and Yandex. They use semantic annotations from crawled web pages to enrich the presentation of search results and to complement their knowledge bases. However, outside the large search engine companies, little is known about the deployment of semantic annotations: How many web sites deploy semantic annotations? What are the topics covered by semantic annotations? How detailed are the annotations? Do web sites use semantic annotations correctly? Are semantic annotations useful for others than the search engine companies? And how can semantic annotations be gathered from the Web in that case? The thesis answers these questions by profiling the web-wide deployment of semantic annotations. The topic is approached in three consecutive steps: In the first step, two ap- proaches for extracting semantic annotations from the Web are discussed. The thesis evaluates first the technique of focused crawling for harvesting semantic annotations. Afterward, a framework to extract semantic annotations from existing web crawl corpora is described. The two extraction approaches are then compared for the purpose of analyzing the deployment of semantic annotations in the Web. In the second step, the thesis analyzes the overall and markup language-specific adoption of semantic annotations. This empirical investigation is based on the largest web corpus that is available to the public. Further, the topics covered by deployed semantic annotations and their evolution over time are analyzed. Subsequent studies examine common errors within semantic annotations. In addition, the thesis analyzes the data overlap of the entities that are described by semantic annotations from the same and across different web sites. The third step narrows the focus of the analysis towards use case-specific issues. Based on the requirements of a marketplace, a news aggregator, and a travel portal the thesis empirically examines the utility of semantic annotations for these use cases. Additional experiments analyze the capability of product-related semantic annotations to be integrated into an existing product categorization schema. Especially, the potential of exploiting the diverse category information given by the web sites providing semantic annotations is evaluated. Zusammenfassung Vor mehr als 20 Jahren veroffentlichte¨ Tim Berners-Lee seine Idee des Semantic Webs. Basierend auf seiner Vision, sollte das semantische Web eine Erweiterung des bestehenden Webs sein, in dem die enthaltenen Informationen semantisch definiert sind, wodurch die Kooperation zwischen Mensch und Maschine vereinfacht werden wurde.¨ [Berners-Lee et al., 2001] Semantische Annotationen in HTML Seiten sind eine konkrete Umsetzung dieser Idee, die in den letzten Jahren von sehr vielen Webseitenbetreibern adap- tiert wurden. Semantische Annotationen werden direkt im HTML Quellcode der Webseite mithilfe der drei HTML-Markup-Erweiterungen Microformats, RDFa, und Microdata eingefugt.¨ Hauptsachlich¨ werden so annotierte Informationen von den großen Suchmaschinenfirmen, Bing, Google, Yahoo! oder Yandex verarbeitet. Diese Firmen nutzen semantische Annotationen, die sie in dem HTML Quell- code von gecrawlten Webseiten finden, um die Anzeige von Suchergebnissen zu verbessern oder ihren internen Wissensgraphen zu erweitern. Trotz der starken Nutzung durch die Suchmaschinenfirmen ist wenig uber¨ die Einbindung und Ver- breitung von semantischen Annotationen im Web bekannt: Wie viele Webseiten bieten semantische Annotationen an? Welche Themengebiete werden beschrieben? Wie detailliert sind die annotierten Informationen und nutzen Webseitenbetreiber die Annotationen korrekt? Sind die so angebotenen Informationen nutzlich¨ und wie konnen¨ sie effizient gesammelt werden? Diese Fragen werden in den drei, aufeinanderfolgenden Teilen dieser Disserta- tion im Zuge einer umfassenden Profilierung des Datenraumes, der von semantis- chen Annotationen aufgespannt wird, beantwortet. Im ersten Teil werden zwei Moglichkeiten¨ zur Sammlung von semantischen An- notationen diskutiert. Zuerst evaluiert die Dissertation eine Methodik, die sich an der Idee des fokussierten Crawlens orientiert. Daraufhin wird ein Frame- work vorgestellt, welches semantische Annotationen aus bestehenden Webcrawl- datensatzen¨ extrahieren kann. Beide Vorgehensweisen werden verglichen und mit Bezug auf die Reprasentativit¨ at¨ der gewonnen Daten evaluiert. Im zweiten Teil analysiert die Arbeit empirisch die allgemeine, wie auch markup- spezifische Verbreitung von semantischen Annotationen im Web basierend auf dem großten¨ offentlich¨ zuganglichen¨ Webcrawldatensatzes. Uber¨ die Verbreitung hinaus werden die enthaltenen Themengebiete sowie deren Veranderung¨ uber¨ die Zeit be- trachtet. Nachfolgend untersucht die Arbeit, zu welchem Grad Webseitenbetreiber semantische Annotationen korrekt benutzen. Der abschließende Teil der Arbeit fokussiert sich auf eine anwendungsbezogene Analyse von semantischen Annotationen. Basierend auf den Anforderungen eines Onlineshops, einer Nachrichtenaggregationsseite und eines Reiseportals wird die Nutzlichkeit¨ von semantischen Annotationen evaluiert. Anschließend wird unter- sucht, in wie weit es moglich¨ ist, die seitenspezifischen Produktkategorisierungen zu nutzen um Produktinformationen, auf eine bestehende Produktklassifizierung abzubilden und somit eine feingranulare Themenanalyse zu ermoglichen.¨ Contents 1 Introduction 1 1.1 Motivation . .4 1.2 Problem Description and Contributions . .5 1.3 Thesis Outline . .8 1.4 Published Work . 11 2 Preliminaries 13 2.1 Semantic Markup Languages . 13 2.1.1 Microformats . 14 2.1.2 RDFa . 15 2.1.3 Microdata . 16 2.2 Common Vocabularies . 16 2.2.1 Open Graph Protocol . 17 2.2.2 Schema.org . 18 2.2.3 Other Vocabularies . 19 2.2.4 Namespaces and Abbreviations . 21 2.3 Parsing Semantic Annotations to RDF . 21 2.4 Dataspaces . 23 I Extraction of Semantic Annotations 25 3 Data Extraction from the Web using Focused Crawling 27 3.1 Related Work . 28 3.1.1 Structure of the Web . 28 3.1.2 Crawling . 31 3.1.3 Focused Crawling . 32 3.2 Focused Crawling Methodology . 34 3.2.1 Online Classification . 34 3.2.2 Bandit-Based Selection . 36 3.3 Experimental Setup . 39 3.3.1 System Architecture and Process Flow . 40 3.3.2 Research Data . 41 iii iv CONTENTS 3.3.3 Experiments Description . 42 3.3.4 Evaluation Metrics . 44 3.4 Results . 44 3.4.1 Online Classification Optimization . 44 3.4.2 Offline versus Online Classification . 45 3.4.3 Evaluation of Different Bandit Functions . 46 3.4.4 Adaptability to More Specific Semantic Annotations Crawl- ing Tasks . 49 3.4.5 Evaluation of Runtime . 50 3.5 Conclusion . 52 4 Data Extraction from Web Corpora 55 4.1 Public Web Corpora . 56 4.2 Overall Extraction Workflow . 58 4.3 Extraction of Microformats, RDFa, and Microdata . 59 4.4 Additional Use Cases . 60 4.5 Discussion and Conclusion . 61 5 Comparison of the Extraction Approaches 63 5.1 Representativity . 63 5.2 Sampling Errors . 66 5.3 Conclusion . 68 II Analysis of Semantic Annotations 69 6 Overall Adoption of Semantic Annotations 71 6.1 Introduction to Data Profiling . 72 6.1.1 Different Dimensions of Profiling . 72 6.1.2 Profiling Semantic Annotations in HTML pages . 73 6.2 Profiling of the Adoption of Semantic Annotations . 74 6.2.1 Research Data and Measures . 74 6.2.2 Overall Adoption . 75 6.2.3 Microformats Adoption . 76 6.2.4 RDFa Adoption . 76 6.2.5 Microdata Adoption . 78 6.3 Related Work . 81 6.3.1 Web Data Profiling . 81 6.3.2 Microformats, RDFa and Microdata . 82 6.4 Summary . 83 7 Profiling of schema.org Microdata 85 7.1 Problem Statement . 87 7.1.1 Duplicates . 88 CONTENTS v 7.1.2 Non-compliance to the Schema . 90 7.2 Methodology . 93 7.2.1 Syntactic Duplicate Removal . 93 7.2.2 Heuristics to Correct Schema Violations . 94 7.2.3 Combined Approach . 96 7.2.4 Semantic Identity Resolution . 97 7.3 Empirical Findings . 99 7.3.1 Syntactical Duplicate Removal and Correction of Schema Violations . 99 7.3.2 Semantic Duplicate Detection . 103 7.4 Discussion . 107 7.4.1 Quality of Structural Duplicate Detection . 107 7.4.2 Limitation of Duplicate Detection by RDF Graph Equivalence109 7.4.3 Limitation of Heuristics with High Precision . 109 7.4.4 Selection of (Pseudo-)Key Properties . 110 7.5 Related Work . 110 7.6 Summary . 112 8 Evolution of the Deployment of schema.org Microdata over Time 115 8.1 Research Data . 116 8.2 Research Questions and Methodology . 118 8.2.1 Top-down Processes . 118 8.2.2 Bottom-up Processes . 119 8.2.3 Overall Convergence of Vocabulary Usage . 121 8.2.4 Influence of Data Consumers . 122 8.3 Empirical Findings . 123 8.3.1 Top-down Processes . 123 8.3.2 Bottom-up Processes . 128 8.3.3 Overall Convergence of Vocabulary Usage . 131 8.4 Related Work . 134 8.5 Summary . 135 III Use Case-Specific Profiling 137 9 Use Case-specific Utility Analysis of schema.org Data 139 9.1 Use Cases . 140 9.1.1 Marketplace . 140 9.1.2 News Aggregator . 141 9.1.3 Travel Portal . 141 9.2 Related Work . 141 9.3 Research Data and Methodology . 142 9.3.1 Research Data . 142 9.3.2 Methodology . 143 vi CONTENTS 9.4 Empirical Findings . 144 9.4.1 Use Case-Independent Analysis .
Recommended publications
  • Where Is the Semantic Web? – an Overview of the Use of Embeddable Semantics in Austria
    Where Is The Semantic Web? – An Overview of the Use of Embeddable Semantics in Austria Wilhelm Loibl Institute for Service Marketing and Tourism Vienna University of Economics and Business, Austria [email protected] Abstract Improving the results of search engines and enabling new online applications are two of the main aims of the Semantic Web. For a machine to be able to read and interpret semantic information, this content has to be offered online first. With several technologies available the question arises which one to use. Those who want to build the software necessary to interpret the offered data have to know what information is available and in which format. In order to answer these questions, the author analysed the business websites of different Austrian industry sectors as to what semantic information is embedded. Preliminary results show that, although overall usage numbers are still small, certain differences between individual sectors exist. Keywords: semantic web, RDFa, microformats, Austria, industry sectors 1 Introduction As tourism is a very information-intense industry (Werthner & Klein, 1999), especially novel users resort to well-known generic search engines like Google to find travel related information (Mitsche, 2005). Often, these machines do not provide satisfactory search results as their algorithms match a user’s query against the (weighted) terms found in online documents (Berry and Browne, 1999). One solution to this problem lies in “Semantic Searches” (Maedche & Staab, 2002). In order for them to work, web resources must first be annotated with additional metadata describing the content (Davies, Studer & Warren., 2006). Therefore, anyone who wants to provide data online must decide on which technology to use.
    [Show full text]
  • 978-3-642-23300-5 6 Chapter.Pd
    The Problem of Conceptual Incompatibility Simon Mcginnes To cite this version: Simon Mcginnes. The Problem of Conceptual Incompatibility. 1st Availability, Reliability and Security (CD-ARES), Aug 2011, Vienna, Austria. pp.69-81, 10.1007/978-3-642-23300-5_6. hal-01590394 HAL Id: hal-01590394 https://hal.inria.fr/hal-01590394 Submitted on 19 Sep 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License The Problem of Conceptual Incompatibility Exploring the Potential of Conceptual Data Independence to Ease Data Integration Simon McGinnes Trinity College Dublin, Dublin 2, Ireland [email protected] Abstract. Application interoperability and data exchange are desirable goals, but conventional system design practices make these goals difficult to achieve, since they create heterogeneous, incompatible conceptual structures. This conceptual incompatibility increases system development, maintenance and integration workloads unacceptably. Conceptual data independence (CDI) is proposed as a way of overcoming these problems. Under CDI, data is stored and exchanged in a form which is invariant with respect to conceptual structures; data corresponding to multiple schemas can co-exist within the same application without loss of integrity.
    [Show full text]
  • Tantek Çelik HTML5 Now a Step-By-Step Tutorial for Getting Started Today Tantek Çelik
    <!DOCTYPE html> <meta charset= "utf-8"><title>HTML5 Now</title> <hgroup> <h1> HTML5 </h1> N <h2> OW A STEP-BY-STEP TUtoRIAL FOR GETTING STARTED TODAY </h2> </hgroup> <div class= "vcard"> <a href= "http://tantek.com" class= "url fn"> TANTEK ÇELIK </a> <div> EXTENDED REFERENCE GUIDE HTML5 NOW A STEP-BY-STEP TUtoRIAL FOR GETTING STARTED TODAY TANTEK ÇELIK HTML5 Now A Step-by-Step Tutorial for Getting Started Today Tantek Çelik New Riders 1249 Eighth Street Berkeley, CA 94710 510/524-2178 Fax: 510/524-2221 Find us on the web at www.newriders.com To report errors, please send a note to [email protected] New Riders is an imprint of Peachpit, a division of Pearson Education Copyright © 2011 by Pearson Education, Inc. Senior Editor: Karyn Johnson Production Editor: Hilal Sala Copy Editor: Kelly Anton Technical Editor: Ben Ward Interior Design and Composition: Andreas deDanaan Presentation Graphics and Design: Coley Wopperer Author Photo: Matt Nuzzaco Cover Design: Mimi Heft Cover Production: Andreas deDanaan Video Production and Direction: Mary Sweeney Notice of Rights All rights reserved. No part of this book may be reproduced or transmitted in any form by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. For information on getting permission for reprints and excerpts, contact [email protected]. Notice of Liability The information in this book is distributed on an “As Is” basis without warranty. While every precaution has been taken in the preparation of the book, neither the author nor Peachpit shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the instructions contained in this book or by the computer software and hardware products described in it.
    [Show full text]
  • REST in Practice
    REST in Practice Jim Webber, Savas Parastatidis, and Ian Robinson Beijing · Cambridge · Farnham · Köln · Sebastopol · Tokyo REST in Practice by Jim Webber, Savas Parastatidis, and Ian Robinson Copyright © 2010 Jim Webber, Savas Parastatidis, and Ian Robinson. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or [email protected]. Editor: Simon St.Laurent Indexer: Lucie Haskins Production Editor: Rachel Monaghan Cover Designer: Karen Montgomery Copyeditor: Audrey Doyle Interior Designer: Ron Bilodeau Proofreader: Rachel Monaghan Illustrator: Robert Romano Production Services: Newgen North America Printing History: September 2010: First Edition. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc., REST in Practice, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps. While every precaution has been taken in the preparation of this book, the publisher and au- thors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. ISBN: 978-0-596-80582-1 [M] C o n t e n t s Foreword .
    [Show full text]
  • Foundations of Semantic Web Technologies
    FOUNDATIONS OF SEMANTIC WEB TECHNOLOGIES Linked Data and Applications Sebastian Rudolph Agenda 1 Linked (Open) Data 2 Semantic Web and HTML RDFa Microformats Google Knowledge Graph 3 OWL Applications OWL DL Application EDF Energy OWL Profile Application BBC World Cup Semantic Technologies in the Pharmaceutical Industry 4 Summary TU Dresden Foundations of Semantic Web Technologies slide 2 of 51 Agenda 1 Linked (Open) Data 2 Semantic Web and HTML RDFa Microformats Google Knowledge Graph 3 OWL Applications OWL DL Application EDF Energy OWL Profile Application BBC World Cup Semantic Technologies in the Pharmaceutical Industry 4 Summary TU Dresden Foundations of Semantic Web Technologies slide 3 of 51 Data in the Web • more and more data is available in the Web for programmatic access • often specified using Semantic Web Standards, e.g., the following Linking Open Data (LOD) Initiative http://www.w3.org/wiki/SweoIG/TaskForces/ CommunityProjects/LinkingOpenData • using APIs, e.g. via JSON/REST • Semantic Web technologies simplify the integration of data from different sources • combination of data leads to deeper insights TU Dresden Foundations of Semantic Web Technologies slide 4 of 51 Linked Data in the Web 01.05.2007 Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ TU Dresden Foundations of Semantic Web Technologies slide 5 of 51 Linked Data in the Web 31.03.2008 ECS Sem- South- Web- ampton Central updated Music- Doap- Audio- brainz space Flickr Scrobbler QDOS exporter SIOC profiles BBC BBC Magna- SW Later + John Onto- tune Conference Jamendo Peel world TOTP FOAF Corpus profiles Open- Guides Geo- names Revyu DBpedia US RDF Book Census Mashup World Data NEW! Fact- DBLP book lingvoj riese Berlin NEW! RKB Euro- Explorer stat flickr Gov- Wiki- Open wrappr Track company Cyc DBLP Hannover W3C Project WordNet Guten- berg Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.
    [Show full text]
  • Starting to Cook a Coaching Dialogue System in the Olympus Framework
    Starting to Cook a Coaching Dialogue System in the Olympus framework Joana Paulo Pardal and Nuno J. Mamede Abstract This paper presents the COOKCOACH, a system that guides the user while cooking a recipe during a spoken dialogue coaching session. We describe our expe- rience creating the first version of our coaching dialogue system that helps the user cook a selected dish. We use the CMU dialogue systems framework OLYMPUS. The main challenge we faced was the change of paradigm: instead of the system being driven by the user, the user is guided by the system. The result is a system capable of dictating generic tasks to the user. The methodology allows us to create different systems in diverse domains for similar guidance tasks that support the user while s/he performs some procedural task. The use of ontologies to facilitate the creation of new systems is briefly explored. Key words: Coaching, Ontologies, Cooking, Knowledge Acquisition, Interface Design, Human-Human Dialogue Corpora 1 Introduction Everyone needs to eat. Most of us need to cook. Once in a while, or daily. Whether we like it or not. However, most cooking books scatter the related knowledge across different sections assuming a common level of expertise. This means that an unex- perienced cook needs to navigate the cooking book to find all s/he needs, to cook what s/he has in mind. It also means that an experienced cook will read superfluous information s/he might already know. It is also a scenario where the person needs to stop what s/he is doing to read and handle the book, shifting focus from the task and Joana P.
    [Show full text]
  • Web Standards.Pdf
    BOOKS FOR PROFESSIONALS BY PROFESSIONALS® Sikos, Ph.D. RELATED Web Standards Web Standards: Mastering HTML5, CSS3, and XML gives you a deep understand- ing of how web standards can be applied to improve your website. You will also find solutions to some of the most common website problems. You will learn how to create fully standards-compliant websites and provide search engine-optimized Web documents with faster download times, accurate rendering, lower development costs, and easy maintenance. Web Standards: Mastering HTML5, CSS3, and XML describes how you can make the most of web standards, through technology discussions as well as practical sam- ple code. As a web developer, you’ll have seen problems with inconsistent appearance and behavior of the same site in different browsers. Web standards can and should be used to completely eliminate these problems. With Web Standards, you’ll learn how to: • Hand code valid markup, styles, and news feeds • Provide meaningful semantics and machine-readable metadata • Restrict markup to semantics and provide reliable layout • Achieve full standards compliance Web standardization is not a sacrifice! By using this book, we can create and maintain a better, well-formed Web for everyone. CSS3, and XML CSS3, Mastering HTML5, US $49.99 Shelve in Web Development/General User level: Intermediate–Advanced SOURCE CODE ONLINE www.apress.com www.it-ebooks.info For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to access them. www.it-ebooks.info Contents at a Glance About the Author................................................................................................
    [Show full text]
  • D2.4 Weblog Spider Prototype and Associated Methodology
    SEVENTH FRAMEWORK PROGRAMME FP7-ICT-2009-6 BlogForever Grant agreement no.: 269963 BlogForever: D2.4 Weblog spider prototype and associated methodology Editor: M. Rynning Revision: First version Dissemination Level: Public Author(s): M. Rynning, V. Banos, K. Stepanyan, M. Joy, M. Gulliksen Due date of deliverable: 30th November 2011 Actual submission date: 30th November 2011 Start date of project: 01 March 2011 Duration: 30 months Lead Beneficiary name: CyberWatcher Abstract: This document presents potential solutions and technologies for monitoring, capturing and extracting data from weblogs. Additionally, the selected weblog spider prototype and associated methodologies are analysed. D2.2 Report: Weblog Spider Prototype and Associated Methodology 30 November 2011 Project co-funded by the European Commission within the Seventh Framework Programme (2007-2013) The BlogForever Consortium consists of: Aristotle University of Thessaloniki (AUTH) Greece European Organization for Nuclear Research (CERN) Switzerland University of Glasgow (UG) UK The University of Warwick (UW) UK University of London (UL) UK Technische Universitat Berlin (TUB) Germany Cyberwatcher Norway SRDC Yazilim Arastrirma ve Gelistrirme ve Danismanlik Ticaret Limited Sirketi (SRDC) Turkey Tero Ltd (Tero) Greece Mokono GMBH Germany Phaistos SA (Phaistos) Greece Altec Software Development S.A. (Altec) Greece BlogForever Consortium Page 2 of 71 D2.2 Report: Weblog Spider Prototype and Associated Methodology 30 November 2011 History Version Date Modification reason Modified
    [Show full text]
  • Smart Kitchen Final
    Supporting Cooking Tasks by a Smart Formulation Table Wolfgang Maass1,2 Andreas Filler 1Research Center for Intelligent Media Research Center for Intelligent Media Furtwangen University, Germany Furtwangen University, Germany 2Institute of Technology Management, University of St. Gallen, Switzerland Abstract—Kitchen environments are target of many In this article, we focus on the formulation area and how Ubiquitous Computing prototypes that investigate the it can be enhanced by non-intrusive information services that application of technologies in real-world situations. In this support weighing and mixing tasks. On technological level, paper, we present a design-driven approach that carefully we investigate whether Ubiquitous Computing technologies, analyzes domain theories, identifies requirements for human Semantic Technologies, and Internet of Things (IoT) task support, derives services accordingly, and uses these technologies can be used to positively support preparation descriptions for guiding implementations. We focus on how and concocting situations in a way that does not interfere preparation and concocting tasks can be supported by a with traditional cooking behaviors. formulation table with integrated scaling mechanism and a set of dedicated information services. We present how requirements are translated into system architectures and a prototype. Empirical studies are current tangible user interfaces, intelligent environment, design work. method II. REQUIREMENTS I. INTRODUCTION In our project, we have selected several functional and A kitchen is an environment equipped with different non-functional user requirements related to preparation and tools and services that assist individuals and groups in concocting situations. These requirements represent a different situations, such as cooking, washing the dishes, selection for complete cooking processes encompassing all breakfast preparation, and also communications.
    [Show full text]
  • Microformats and Microdata and SEO
    Microformats and Microdata and SEO Making the Web a Better Place with Rich Snippets Microformats are a way of marking up content with “meta” information. Other means include Microdata and RDFa. In essence, this involves adding some additional HTML around existing content so that the content can be identified by automated web services. These services include search engines and news aggregators. Of the three existing formats, I recommend looking at either Microdata or Microformats. Microformats Microformats is a grass-roots effort and has been around for quite a while now. It is based on tagging items with specific classes. There are many “patterns” for Microformats. Each pattern is intended for a specific type of content and has its own set of classes for marking things up. Some examples of Microformats patterns include: hCard - for people and organizations hCalendar - for events hProduct - for products hReview - for reviews of items hRecipe - for recipes Microdata The Microdata format was proposed by several major search engines within the last year. This is a similar idea to Microformats but the content is marked up in a different way - with new attributes on HTML tags. Since Microdata uses new attributes, it is a bit more technical than simply adding classes to markup. There are similar examples of Microdata patterns: Person - a person (alive, dead, undead, or fictional) Organization - an organization such as a school, NGO, corporation, club, etc. Event - an event happening at a certain time at a certain location Place - entities that have a somewhat fixed, physical extension LocalBusiness - particular physical business or branch of an organization Product - anything that is made available for sale Review - a review of an item What are they Good for? When Google (for example) presents the results of a search, sometimes you get more than just a few lines of text.
    [Show full text]
  • The Webdatacommons Microdata, Rdfa and Microformat Dataset Series
    The WebDataCommons Microdata, RDFa and Microformat Dataset Series Robert Meusel, Petar Petrovski, and Christian Bizer University of Mannheim, Germany Research Group Data and Web Science frobert,petar,[email protected] Abstract. In order to support web applications to understand the con- tent of HTML pages an increasing number of websites have started to annotate structured data within their pages using markup formats such as Microdata, RDFa, Microformats. The annotations are used by Google, Yahoo!, Yandex, Bing and Facebook to enrich search results and to display entity descriptions within their applications. In this paper, we present a series of publicly accessible Microdata, RDFa, Microformats datasets that we have extracted from three large web corpora dating from 2010, 2012 and 2013. Altogether, the datasets consist of almost 30 billion RDF quads. The most recent of the datasets contains amongst other data over 211 million product descriptions, 54 million reviews and 125 million postal addresses originating from thousands of websites. The availability of the datasets lays the foundation for further research on in- tegrating and cleansing the data as well as for exploring its utility within different application contexts. As the dataset series covers four years, it can also be used to analyze the evolution of the adoption of the markup formats. Keywords: Microdata, RDFa, Microformats, Dataset, Web Science 1 Introduction A large number of websites have started to use markup standards to annotate information about products, reviews, blog posts, people, organizations, events, and cooking recipes within their HTML pages. The most prevalent of these standards are Microformats,1 which use style definitions to annotate HTML text with terms from a fixed set of vocabularies; RDFa [1], which is used to embed any kind of RDF data into HTML pages, and Microdata [7], a recent format developed in the context of HTML5.
    [Show full text]
  • Restful Web Apis
    CHAPTER 10 The Hypermedia Zoo There are a lot of hypermedia document formats in active use. Some are designed for very specialized purposes—the people who use them may not even think of them as hypermedia formats. Other hypermedia formats are in such common usage that people don’t really think about them at all. In this chapter, I’ll take you on an educational tour of a “zoo” containing the most popular and most interesting hypermedia formats. I won’t be going into a lot of technical detail. Any one of these formats probably isn’t the one you want to use, and I’ve covered many of them earlier in the book. Many of the formats are still under active development, and their details might change. If you’re interested in one of the zoo’s specimens, the next step is to read its formal specification. My goal is to give you a sense of the many forms hypermedia can take, and to show how many times we’ve tackled the basic problems of representing it. The hypermedia zoo is so full that you probably don’t need to define a brand new media type for your API. You should be able to pick an existing media type and write a profile for it. I’ve organized the hypermedia zoo along the lines of my introduction to hypermedia. There’s a section for domain-specific formats (a la Chapter 5), a section for formats whose primary purpose is to implement the collection pattern (a la Chapter 6), and a section for general hypermedia formats (a la Chapter 7).
    [Show full text]