WYSIWYM – Integrated Visualization, Exploration and Authoring of Semantically Enriched Un-Structured Content
Semantic Web 1 (2013) 1–14
IOS Press

Ali Khalili a, Sören Auer b
a AKSW, Universität Leipzig, Augustusplatz 10, 04109 Leipzig, Germany
b CS/EIS, Universität Bonn, Römerstraße 164, 53117 Bonn, Germany

Abstract. The Semantic Web and Linked Data have gained traction in recent years. However, the majority of information is still contained in unstructured documents. This cannot be expected to change, since text, images and videos are the natural way humans interact with information. Semantic structuring, on the other hand, enables the (semi-)automatic integration, repurposing and rearrangement of information. NLP technologies and formalisms for the integrated representation of unstructured and semantic content (such as RDFa and Microdata) aim at bridging this semantic gap. However, in order for humans to truly benefit from this integration, we need ways to author, visualize and explore unstructured and semantically enriched content in an integrated manner. In this paper, we present the WYSIWYM (What You See Is What You Mean) concept, which addresses this issue and formalizes the binding between semantic representation models and UI elements for authoring, visualization and exploration. With RDFaCE and Pharmer we present and evaluate two complementary showcases implementing the WYSIWYM concept for different application domains.

Keywords: Visualization, Authoring, Exploration, Semantic Web, WYSIWYM, WYSIWYG, Visual Mapping

1570-0844/13/$27.50 © 2013 – IOS Press and the authors. All rights reserved

1. Introduction

The Semantic Web and Linked Data movements, which aim at creating, publishing and interconnecting machine-readable information, have gained traction in recent years. However, the majority of information is still contained in and exchanged using unstructured documents, such as Web pages, text documents, images and videos. This cannot be expected to change, since text, images and videos are the natural way humans interact with information. Semantic structuring, on the other hand, provides a wide range of advantages compared to unstructured information. It facilitates a number of important aspects of information management:

– For search and retrieval, enriching documents with semantic representations helps to create more efficient and effective search interfaces, such as faceted search [29] or question answering [17].
– In information presentation, semantically enriched documents can be used to create more sophisticated ways of flexibly visualizing information, such as by means of semantic overlays as described in [3].
– For information integration, semantically enriched documents can be used to provide unified views on heterogeneous data stored in different applications by creating composite applications such as semantic mashups [2].
– To realize personalization, semantically enriched documents provide customized and context-specific information which better fits user needs and results in customized applications such as personalized semantic portals [25].
– For reusability and interoperability, enriching documents with semantic representations facilitates exchanging content between disparate systems and enables building applications such as executable papers [19].

Natural Language Processing (NLP) technologies (e.g. named entity recognition and relationship extraction) as well as formalisms for the integrated representation of unstructured and semantic content (such as RDFa and Microdata) aim at bridging the semantic gap between unstructured and semantic representation formalisms. However, in order for humans to truly benefit from this integration, we need ways to author, visualize and explore unstructured and semantically enriched content in an integrated manner.

In this paper, we present an approach inspired by the WYSIWYM metaphor (What You See Is What You Mean), which addresses the issue of an integrated visualization, exploration and authoring of semantically enriched unstructured content. Our WYSIWYM concept formalizes the binding between semantic representation models and UI elements for authoring, visualization and exploration. We analyse popular tree, graph and hyper-graph based semantic representation models and elicit a list of semantic representation elements, such as entities, various relationships and attributes. We provide a comprehensive survey of common UI elements for authoring, visualization and exploration, which can be configured and bound to individual semantic representation elements. Our WYSIWYM concept also comprises cross-cutting helper components, which can be employed within a concrete WYSIWYM interface for the purpose of automation, annotation, recommendation, personalization etc.

With RDFaCE and Pharmer we present and evaluate two complementary showcases implementing the WYSIWYM concept for different domains. RDFaCE is a domain-agnostic editor for text content with embedded semantics in the form of RDFa or Microdata. Pharmer is a WYSIWYM interface for the authoring of semantic prescriptions and thus targets the medical domain. Our evaluation of both tools with end users (in the case of RDFaCE) and domain experts (in the case of Pharmer) shows that WYSIWYM interfaces provide good usability while retaining the benefits of a truly semantic representation.

The contributions of this work are in particular:

1. A formalization of the WYSIWYM concept based on definitions for the WYSIWYM model, binding and concrete interfaces.
2. A survey of semantic representation elements of tree, graph and hyper-graph knowledge representation formalisms as well as UI elements for authoring, visualization and exploration of such elements.
3. Two complementary use cases, which evaluate different, concrete WYSIWYM interfaces in a generic as well as a domain-specific context.

The WYSIWYM formalization can be used as a basis for implementations, allows existing user interfaces to be evaluated and classified in a defined way, and provides a terminology for software engineers, user interface and domain experts to communicate efficiently and effectively. With this work we aim to contribute to making Semantic Web applications more user friendly and ultimately to create an ecosystem of flexible UI components, which can be reused, repurposed and choreographed to accommodate the UI needs of dynamically evolving information structures.

The remainder of this article is structured as follows: In Section 2, we describe the background of our work and discuss the related work. Section 3 describes the fundamental WYSIWYM concept proposed in the paper. Subsections of Section 3 present the different components of the WYSIWYM model. In Section 4, we introduce two implemented WYSIWYM interfaces together with their evaluation results. Finally, Section 5 concludes with an outlook on future work.

2. Related Work

WYSIWYG. The term WYSIWYG, an acronym for What-You-See-Is-What-You-Get, is used in computing to describe a system in which content (text and graphics) displayed on-screen during editing appears in a form closely corresponding to its appearance when printed or displayed as a finished product. The first usage of the term goes back to 1974 in the print industry, where it expressed the idea that what the user sees on the screen is what the user gets on the printer. Xerox PARC’s Bravo was the first WYSIWYG editor-formatter [20]. It was designed by Butler Lampson and Charles Simonyi, who had started working on these concepts around 1970 while at Berkeley.
Later on by The contributions of this work are in particular: the emergence of Web and HTML technology, the Khalili et al. / WYSIWYM 3 WYSIWYG concept was also utilized in Web-based representation can be edited directly by the user by ma- text editors. The aim was to reduce the effort required nipulating the feedback text. by users to express the formatting directly as valid The WYSIWYM term as defined and used in this HTML markup. In a WYSIWYG editor users can edit paper targets the novel aspect of integrated visualiza- content in a view which matches the final appear- tion, exploration and authoring of unstructured and se- ance of published content with respect to fonts, head- mantic content. The rationale of our WYSIWYM con- ings, layout, lists, tables, images and structure. Be- cept is to enrich the existing WYSIWYG presenta- cause using a WYSIWYG editor may not require any tional view of the content with UI components reveal- HTML knowledge, they are often easier for an aver- ing the semantics embedded in the content and enable age computer user to get started with. The first pro- the exploration and authoring of semantic content. In- grams for building Web pages with a WYSIWYG in- stead of separating presentation, content and meaning, terface were Netscape Gold, Claris HomePage, and our WYSIWYM approach aims to integrate these as- Adobe PageMill. pects to facilitate the process of Semantic Content Au- WYSIWYG text authoring is meanwhile ubiquitous thoring. Two “You”s in our WYSIWYM concept re- on the Web and part of most content creation and fer to the end user (with no or limited knowledge of management workflows. It is part of content manage- Semantic Web ) who is viewing an unstructured con- ment cystems (CMS), weblogs, wikis, fora, product tent which is semantically enriched by himself. The data management systems and online shops, just to “Mean” refers to the metadata or semantics which