Semantic Web 1 (2013) 1–14, IOS Press

WYSIWYM – Integrated Visualization, Exploration and Authoring of Unstructured and Semantic Content

Ali Khalili∗ and Sören Auer
University of Leipzig, Institute of Computer Science, AKSW Group, Augustusplatz 10, D-04009 Leipzig, Germany
E-mail: {lastname}@informatik.uni-leipzig.de

Abstract. The Semantic Web and Linked Data have gained traction in recent years. However, the majority of information is still contained in unstructured documents. This cannot be expected to change, since text, images and videos are the natural way in which humans interact with information. Semantic structuring, on the other hand, enables the (semi-)automatic integration, repurposing and rearrangement of information. NLP technologies and formalisms for the integrated representation of unstructured and semantic content (such as RDFa and Microdata) aim at bridging this semantic gap. However, in order for humans to truly benefit from this integration, we need ways to author, visualize and explore unstructured and semantic information in a holistic manner. In this paper, we present the WYSIWYM (What You See Is What You Mean) concept, which addresses this issue and formalizes the binding between semantic representation models and UI elements for authoring, visualizing and exploring content. With RDFaCE and Pharmer we present and evaluate two complementary showcases implementing the WYSIWYM concept for different application domains.

Keywords: Visualization, Authoring, Exploration, Semantic Web, WYSIWYM, WYSIWYG, Visual Mapping

1. Introduction

The Semantic Web and Linked Data have gained traction in recent years. However, the majority of information is still contained in and exchanged using unstructured documents, such as Web pages, text documents, images and videos. This cannot be expected to change, since text, images and videos are the natural way in which humans interact with information. Semantic structuring, on the other hand, provides a wide range of advantages compared to unstructured information. It facilitates a number of important aspects of information management:

– For search and retrieval, enriching documents with semantic representations helps to create more efficient and effective search interfaces, such as faceted search [27] or question answering [16].
– In information presentation, semantically enriched documents can be used to create more sophisticated ways of flexibly visualizing information, such as by means of semantic overlays as described in [3].
– For information integration, semantically enriched documents can be used to provide unified views on heterogeneous data stored in different applications by creating composite applications such as semantic mashups [2].
– To realize personalization, semantic documents provide customized and context-specific information which better fits user needs and results in customized applications such as personalized semantic portals [23].
– For reusability and interoperability, enriching documents with semantic representations facilitates exchanging content between disparate systems and enables building applications such as executable papers [18].

∗Corresponding author. E-mail: khalili@informatik.uni-leipzig.de.


Natural Language Processing (NLP) technologies (e.g. named entity recognition and relationship extraction) as well as formalisms for the integrated representation of unstructured and semantic content (such as RDFa and Microdata) aim at bridging the semantic gap between unstructured and semantic representation formalisms. However, in order for humans to truly benefit from this integration, we need ways to author, visualize and explore unstructured and semantic information in a holistic manner.

In this paper, we present the WYSIWYM (What You See Is What You Mean) concept, which addresses the issue of an integrated visualization, exploration and authoring of unstructured and semantic content. Our WYSIWYM concept formalizes the binding between semantic representation models and UI elements for authoring, visualizing and exploration. We analyse popular tree-, graph- and hypergraph-based semantic representation models and elicit a list of semantic representation elements, such as entities, various relationships and attributes. We provide a comprehensive survey of common UI elements for authoring, visualization and exploration, which can be configured and bound to individual semantic representation elements. Our WYSIWYM concept also comprises cross-cutting helper components, which can be employed within a concrete WYSIWYM interface for the purpose of automation, annotation, recommendation, personalization etc.

With RDFaCE and Pharmer we present and evaluate two complementary showcases implementing the WYSIWYM concept for different domains. RDFaCE is a domain-agnostic editor for text content with embedded semantics in the form of RDFa or Microdata. Pharmer is a WYSIWYM interface for the authoring of semantic prescriptions and thus targets the medical domain. Our evaluation of both tools with end-users (in the case of RDFaCE) and domain experts (in the case of Pharmer) shows that WYSIWYM interfaces provide good usability, while retaining the benefits of a truly semantic representation.

The contributions of this work are in particular:

1. A formalization of the WYSIWYM concept based on definitions for the WYSIWYM model, binding and concrete interfaces.
2. A comprehensive survey of semantic representation elements of tree-, graph- and hypergraph-based knowledge representation formalisms as well as UI elements for authoring, visualization and exploration of such elements.
3. Two complementary use cases, which evaluate different, concrete WYSIWYM interfaces in a generic as well as a domain-specific context.

The WYSIWYM formalization can be used as a basis for implementations; it allows to evaluate and classify existing user interfaces in a defined way; and it provides a terminology for software engineers, user interface designers and domain experts to communicate efficiently and effectively. We aim to contribute with this work to making Semantic Web applications more user friendly and ultimately to create an ecosystem of flexible UI components, which can be reused, repurposed and choreographed to accommodate the UI needs of dynamically evolving information structures.

The remainder of this article is structured as follows: In Section 2, we describe the background of our work and discuss related work. Section 3 describes the fundamental WYSIWYM concept proposed in the paper; its subsections present the different components of the WYSIWYM model. In Section 4, we introduce two implemented WYSIWYM interfaces together with their evaluation results. Finally, Section 5 concludes with an outlook on future work.

2. Related Work

WYSIWYG. The term WYSIWYG, as an acronym for What-You-See-Is-What-You-Get, is used in computing to describe a system in which content (text and graphics) displayed on-screen during editing appears in a form closely corresponding to its appearance when printed or displayed as a finished product. The first usage of the term goes back to 1974 in the print industry to express the idea that what the user sees on the screen is what the user gets on the printer. Xerox PARC's Bravo was the first WYSIWYG editor-formatter [19]. It was designed by Butler Lampson and Charles Simonyi, who had started working on these concepts around 1970 while at Berkeley. Later on, with the emergence of Web and HTML technology, the WYSIWYG concept was also utilized in Web-based text editors. The aim was to reduce the effort required by users to express the formatting directly as valid HTML markup. In a WYSIWYG editor users can edit content in a view which matches the final appearance of published content with respect to fonts, headings, layout, lists, tables, images and structure. Because using a WYSIWYG editor may not require any HTML knowledge, they are often easier for an average computer user to get started with. The first programs for building Web pages with a WYSIWYG interface were Netscape Gold, Claris HomePage, and Adobe PageMill.

WYSIWYG text authoring is meanwhile ubiquitous on the Web and part of most content creation and management workflows. It is part of content management systems (CMS), weblogs, wikis, fora, product data management systems and online shops, just to mention a few. However, the WYSIWYG model has been criticized, primarily for its verbosity, poor support of semantics and the low quality of the generated code, and there have been voices advocating a change towards a WYSIWYM (What-You-See-Is-What-You-Mean) model [26,24].

WYSIWYM. The first use of the WYSIWYM term occurred in 1995, aiming to capture the separation of presentation and content when writing a document. The LyX editor (http://www.lyx.org/) was the first WYSIWYM word processor for structure-based content authoring. Instead of focusing on the format or presentation of the document, a WYSIWYM editor preserves the intended meaning of each element. For example, page headers, sections, paragraphs, etc. are labeled as such in the editing program, and displayed appropriately in the browser. Another usage of the WYSIWYM term was by Power et al. [22] in 1998 as a solution for symbolic authoring. In symbolic authoring the author generates language-neutral "symbolic" representations of the content of a document, from which documents in each target language are generated automatically, using Natural Language Generation technology. In this What-You-See-Is-What-You-Meant approach, the language generator was used to drive the user interface (UI) with support for localization and multilinguality. Using the WYSIWYM natural language generation approach, the system generates a feedback text for the user that is based on a semantic representation. This representation can be edited directly by the user by manipulating the feedback text.

The WYSIWYM term as defined and used in this paper targets the novel aspect of integrated visualization, exploration and authoring of unstructured and semantic content. The rationale of our WYSIWYM concept is to enrich the existing WYSIWYG presentational view of the content with UI components revealing the semantics embedded in the content and to enable the exploration and authoring of semantic content. Instead of separating presentation, content and meaning, our WYSIWYM approach aims to integrate these aspects to facilitate the process of Semantic Content Authoring. There are already some approaches (i.e. visual mapping techniques) which go into the direction of integrated visualization and authoring of structured content.

Visual Mapping Techniques. Visual mapping techniques (a.k.a. knowledge representation techniques) are methods to graphically represent knowledge structures. Most of them have been developed as paper-based techniques for brainstorming, learning facilitation, outlining or to elicit knowledge structures. According to their basic topology, most of them can be related to the following fundamentally different primary approaches [6,25]:

– Mind-Maps. Mind-maps are created by drawing one central topic in the middle together with labeled branches and sub-branches emerging from it. Instead of distinct nodes and links, mind-maps only have labeled branches. A mind-map is a connected directed acyclic graph with hierarchy as its only type of relation. Outlines are a similar technique to show hierarchical relationships using a tree structure. Mind-maps and outlines are not suitable for relational structures because they are constrained to the hierarchical model.
– Concept Maps. Concept maps consist of labeled nodes and labeled edges linking all nodes into a connected directed graph. The basic node and link structure of a connected directed labeled graph also forms the basis of many other modeling approaches like Entity-Relationship (ER) diagrams and Semantic Networks. These forms have the same basic structure as concept maps but with more formal types of nodes and links.
– Spatial Hypertext. A spatial hypertext is a set of text nodes that are not explicitly connected but implicitly related through their spatial layout, e.g., through closeness and adjacency, similar to a pin-board. Spatial hypertext can show fuzzily related items. To fuzzily relate two items in a spatial hypertext schema, they are simply placed near to each other, but possibly not quite as near as to a third object. This allows for so-called "constructive ambiguity" and is an intuitive way to deal with vague relations and orders. Spatial hypertext abandons the concept of explicitly interrelating objects. Instead, it uses spatial positioning as the basic structure.

Binding data to UI elements. There are already many approaches and tools which address the binding between data and UI elements for visualizing and exploring semantically structured data. Dadzie and Rowe [4] present the most exhaustive and comprehensive survey to date of these approaches. For example, Fresnel [21] is a display vocabulary for core RDF concepts. Fresnel's two foundational concepts are lenses and formats. Lenses define which properties of an RDF resource, or group of related resources, are displayed and how those properties are ordered. Formats determine how resources and properties are rendered and provide hooks to existing styling languages such as CSS.

Parallax, Tabulator, Explorator, Rhizomer, Sgvizler, Fenfire, RDF-Gravity, IsaViz and i-Disc for Topic Maps are examples of tools available for visualizing and exploring semantically structured data. In these tools the binding between semantics and UI elements is mostly performed implicitly, which limits their versatility. However, an explicit binding as advocated by our WYSIWYM model can potentially be added to some of these tools.

In contrast to structured content, there are also many approaches and tools which allow binding semantic data to UI elements within unstructured content (cf. our comprehensive literature study [10]). As an example, Dido [9] is a data-interactive document which lets end users author semantic content mixed with unstructured content in a web page. Dido inherits data exploration capabilities from the underlying Exhibit framework (http://simile-widgets.org/exhibit/). Loomp, as a proof-of-concept for the One Click Annotation [1] strategy, is another example in this context. Loomp is a WYSIWYG web editor for enriching content with RDFa annotations. It employs a partial mapping between UI elements and data to hide the complexity of creating semantic data.

3. WYSIWYM Concept

In this section we introduce the fundamental WYSIWYM concept and formalize its key elements. Formalizing the WYSIWYM concept has a number of advantages: First, the formalization can be used as a basis for the design and implementation of novel applications for authoring, visualization, and exploration of semantic content (cf. Section 4). The formalization serves the purpose of providing a terminology for software engineers, user interface and domain experts to communicate efficiently and effectively. It provides insights into and an understanding of the requirements as well as corresponding UI solutions for the proper design and implementation of semantic content management applications. Secondly, it allows to evaluate and classify existing user interfaces according to the conceptual model in a defined way. This will highlight the gaps in existing applications dealing with semantic content.

Figure 1 provides a schematic overview of the WYSIWYM concept. The rationale is that elements of a knowledge representation formalism (or data model) are connected to suitable UI elements for visualization, exploration and authoring. Formalizing this conceptual model results in three core definitions: (1) the abstract WYSIWYM model, (2) bindings between UI and representation elements, and (3) a concrete instantiation of the abstract WYSIWYM model, which we call a WYSIWYM interface.

[Figure 1 depicts a WYSIWYM interface consisting of visualization, exploration and authoring techniques with their configurations plus helper components, connected via bindings to semantic representation data models.]
Fig. 1. Schematic view of the WYSIWYM model.

Definition 1 (WYSIWYM model). The WYSIWYM model can be formally defined as a quintuple (D, V, X, T, H) where:

– D is a set of semantic representation data models, where each Di ∈ D has an associated set of data model elements E_Di;
– V is a set of tuples (v, Cv), where v is a visualization technique and Cv a set of possible configurations for the visualization technique v;
– X is a set of tuples (x, Cx), where x is an exploration technique and Cx a set of possible configurations for the exploration technique x;
– T is a set of tuples (t, Ct), where t is an authoring technique and Ct a set of possible configurations for the authoring technique t;
– H is a set of helper components.

The WYSIWYM model represents an abstract concept from which concrete interfaces can be derived by means of bindings between semantic representation model elements and configurations of particular UI elements.

Definition 2 (Binding). A binding b is a function which maps each element e of a semantic representation model (e ∈ E_Di) to a set of tuples (ui, c), where ui is a user interface technique (ui ∈ V ∪ X ∪ T) and c is a configuration (c ∈ C_ui).

Figure 4 gives an overview of all data model elements (columns) and UI elements (rows) and how they can be bound together using a certain configuration (cells). The shades of gray in a certain cell indicate the suitability of a certain binding between a particular UI and data model element. Once a selection of data models and UI elements has been made and both are bound to each other, encoding a certain configuration in a binding, we attain a concrete instantiation of our WYSIWYM model called a WYSIWYM interface.

Definition 3 (WYSIWYM interface). An instantiation of the WYSIWYM model, called a WYSIWYM interface I, is a hextuple (D_I, V_I, X_I, T_I, H_I, b_I), where:

– D_I is a selection of semantic representation data models (D_I ⊂ D);
– V_I is a selection of visualization techniques (V_I ⊂ V);
– X_I is a selection of exploration techniques (X_I ⊂ X);
– T_I is a selection of authoring techniques (T_I ⊂ T);
– H_I is a selection of helper components (H_I ⊂ H);
– b_I is a binding which binds a particular occurrence of a data model element to a visualization, exploration and/or authoring technique.

Note that we limit the definition to one binding, which means that only one semantic representation model is supported in a particular WYSIWYM interface at a time. It would also be possible to support several semantic representation models (e.g. RDFa and Microdata) at the same time. However, this can be confusing to the user, which is why we deliberately excluded this case from our definition. In the remainder of this section we discuss the different parts of the WYSIWYM concept in more detail.
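To make the formalization more tangible, the following TypeScript sketch renders Definitions 1–3 as plain data types. All names (DataModel, UiTechnique, WysiwymInterface, etc.) are our own illustration and not part of the formal model or of any implementation described later.

```typescript
// Sketch of Definitions 1-3 as TypeScript types (illustrative names only).

// A data model D_i, identified by name, with its element set E_Di.
interface DataModel {
  name: string;                 // e.g. "RDFa"
  elements: string[];           // e.g. ["Instance", "Class", "Relationship", "LiteralPropertyValue"]
}

// A UI technique (visualization, exploration or authoring) with its possible configurations C_ui.
interface UiTechnique {
  kind: "visualization" | "exploration" | "authoring";
  name: string;                 // e.g. "Framing"
  configurations: string[];     // e.g. ["C1: border colour per entity type"]
}

// A binding maps a data model element e in E_Di to tuples (ui, c) with c in C_ui (Definition 2).
type Binding = (element: string) => { technique: UiTechnique; configuration: string }[];

// A concrete WYSIWYM interface is a hextuple (D_I, V_I, X_I, T_I, H_I, b_I) (Definition 3).
interface WysiwymInterface {
  dataModels: DataModel[];          // D_I (one model at a time, per the note above)
  visualization: UiTechnique[];     // V_I
  exploration: UiTechnique[];       // X_I
  authoring: UiTechnique[];         // T_I
  helpers: string[];                // H_I, e.g. ["Automation", "Recommendation"]
  binding: Binding;                 // b_I
}
```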

3.1. Semantic Representation Models

Semantic representation models are conceptual data models used to express the meaning of information, thereby enabling the representation and interchange of knowledge. Based on their expressiveness, we can roughly divide popular semantic representation models into the three categories tree-based, graph-based and hypergraph-based (cf. Figure 2). Each semantic representation model comprises a number of representation elements, such as various types of entities and relationships. For visualization, exploration and authoring it is of paramount importance to bind the most suitable UI elements to the respective representation elements. In the sequel we briefly discuss the three different types of representation models.

[Figure 2 arranges tree-based formats (e.g. Microformats, Microdata), graph-based formats (e.g. RDF with serializations such as RDFa, Turtle/N3/N-Triples, RDF/XML, JSON-LD) and hypergraph-based formats (e.g. Topic Maps with XTM, LTM, CTM, AsTMa), together with mind-maps, concept maps and spatial hypertext, along the two axes semantic expressiveness and complexity of visual mapping.]
Fig. 2. Comparison of existing visual mapping techniques in terms of semantic expressiveness and complexity of visual mapping.

Tree-based. This is the simplest semantic representation model, where semantics is encoded in a tree-like structure. It is suited for representing taxonomic knowledge, such as thesauri, classification schemes, subject heading lists, concept hierarchies or mind-maps. It is used extensively in biology and life sciences, for example in the APG III system (Angiosperm Phylogeny Group III system) of flowering plant classification, as part of the Dimensions of XBRL (eXtensible Business Reporting Language), or generically in SKOS (Simple Knowledge Organization System). Elements of tree-based semantic representations usually include:

– E1: Item, e.g. Magnoliidae, the item representing all flowering plants.
– E2: Item type, e.g. biological term for Magnoliidae.
– E3: Item–subitem relationships, e.g. Magnoliidae referring to the subitem magnolias.
– E4: Item property value, e.g. the synonym flowering plant for the item Magnoliidae.
– E5: Related items, e.g. the sibling item Eudicots of Magnoliidae.

Tree-based data can be serialized as Microdata or Microformats.

Graph-based. This semantic representation model adds more expressiveness compared to simple tree-based formalisms. The most prominent representative is the RDF data model, which can be seen as a set of triples consisting of subject, predicate and object, where each component can be a URI, the object can be a literal, and subject as well as object can be blank nodes. The most distinguishing features of RDF compared to a simple tree-based model are the distinction of entities into classes and instances as well as the possibility to express arbitrary relationships between entities. The graph-based model is suited for representing combinatorial schemes such as concept maps. Graph-based models are used in a very broad range of domains, for example in FOAF (Friend of a Friend) for describing people, their interests and interconnections in a social network, in MusicBrainz to publish information about music albums, in the medical domain (e.g. DrugBank, Diseasome, ChEMBL, SIDER) to describe the relations between diseases, drugs and genes, or generically in the SIOC (Semantically-Interlinked Online Communities) vocabulary. Elements of RDF as a typical graph-based data model are:

– E1: Instances, e.g. Warfarin as a drug.
– E2: Classes, e.g. anticoagulant drug for Warfarin.
– E3: Relationships between entities (instances or classes), e.g. the interaction between Aspirin as an antiplatelet drug and Warfarin, which will increase the risk of bleeding.
– E4: Literal property values, e.g. the halflife of Amoxicillin.
  ∗ E41: Value, e.g. 61.3 minutes.
  ∗ E42: Language tag, e.g. en.
  ∗ E43: Datatype, e.g. xsd:float.

RDF-based data can be serialized in various formats, such as RDFa, RDF/XML, JSON-LD or Turtle/N3/N-Triples.

Hypergraph-based. A hypergraph is a generalization of a graph in which an edge can connect any number of vertices. Since hypergraph-based models allow n-ary relationships between an arbitrary number of nodes, they provide a higher level of expressiveness compared to tree-based and graph-based models. The most prominent representative is the Topic Maps data model, developed as an ISO/IEC standard, which consists of topics, associations and occurrences. The semantic expressivity of Topic Maps is, in many ways, equivalent to that of RDF, but the major differences are that Topic Maps (i) provide a higher level of semantic abstraction (providing a template of topics, associations and occurrences, while RDF only provides a template of two arguments linked by one relationship) and (hence) (ii) allow n-ary relationships (hypergraphs) between any number of nodes, while RDF is limited to triples. The hypergraph-based model is suited for representing complex schemes such as spatial hypertext. Hypergraph-based models are used for a variety of applications. Amongst them are musicDNA (http://www.musicdna.info/) as an index of musicians, composers, performers, bands, artists, producers, their music, and the events that link them together, TM4L (Topic Maps for e-Learning), clinical decision support systems and enterprise information integration. Elements of Topic Maps as a typical hypergraph-based data model are:

– E1: Topic name, e.g. University of Leipzig.
– E2: Topic type, e.g. organization for University of Leipzig.
– E3: Topic associations, e.g. member of a project which has other organization partners.
– E4: Topic role in an association, e.g. coordinator.
– E5: Topic occurrences, e.g. address.
  ∗ E51: Value, e.g. Augustusplatz 10, 04109 Leipzig.
  ∗ E52: Datatype, e.g. text.

Topic Maps-based data can be serialized in an XML-based syntax called XTM (XML Topic Map), as well as LTM (Linear Topic Map Notation), CTM (Compact Topic Maps Notation) and AsTMa (Asymptotic Topic Map Notation).
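As an illustration of the graph-based elements E1–E4 listed above, the following sketch encodes the Warfarin example as plain TypeScript data. The prefixed URIs (ex:, rdf:, rdfs:, xsd:) are shortened, hypothetical identifiers used purely for illustration, not the actual DrugBank resources.

```typescript
// Graph-based elements E1-E4 as plain triples (illustrative, shortened URIs).
interface Literal { value: string; language?: string; datatype?: string; }  // E41-E43
type Term = string | Literal;
interface Triple { subject: string; predicate: string; object: Term; }

const triples: Triple[] = [
  // E1 (instance) and E2 (class): Warfarin is an anticoagulant drug.
  { subject: "ex:Warfarin", predicate: "rdf:type", object: "ex:AnticoagulantDrug" },
  // E3 (relationship between entities): Aspirin interacts with Warfarin.
  { subject: "ex:Aspirin", predicate: "ex:interactsWith", object: "ex:Warfarin" },
  // E4 (literal property value) with datatype (E43).
  { subject: "ex:Amoxicillin", predicate: "ex:halflife",
    object: { value: "61.3", datatype: "xsd:float" } },
  // E4 with a language tag (E42).
  { subject: "ex:Warfarin", predicate: "rdfs:label",
    object: { value: "Warfarin", language: "en" } },
];
```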

Fig. 3. Screenshots of user interface techniques for visualization and exploration: 1-framing using borders, 2-framing using backgrounds, 3-video subtitle, 4-line connectors and arrow connectors, 5-bar layouts, 6-text formatting, 7-image color effects, framing and line connectors, 8-expandable callout, 9-marking with icons, 10-callout, 11-faceting.

3.2. Visualization

The primary objectives of visualization are to present, transform, and convert semantic data into a visual representation, so that humans can read, query and edit it efficiently. We divide existing techniques for the visualization of knowledge encoded in text, images and videos into the three categories Highlighting, Associating and Detail view. Highlighting includes UI techniques which are used to distinguish or highlight a part of an object (i.e. text, image or video) from the whole object. Associating deals with techniques that visualize the relation between some parts of an object. Detail view includes techniques which reveal detailed information about a part of an object. For each of the above categories, the related UI techniques are as follows:

- Highlighting.

– V1: Framing and segmentation (borders, overlays and backgrounds). This technique can be applied to text, images and videos: we enclose a semantic entity in a coloured border, background or overlay. Different border styles (colours, widths, types), background styles (colours, patterns) or overlay styles (when applied to images and videos) can be used to distinguish different types of semantic entities (cf. Figure 3 no. 1, 2); a minimal implementation sketch is given after this list. The technique is already employed in social networking websites such as Google Plus and Facebook to tag people within images.
– V2: Text formatting (color, font, size, etc.). In this technique different text styles such as font family, style, weight, size, colour, shadows and other text decoration techniques are used to distinguish semantic entities within a text (cf. Figure 3 no. 6). The problem with this technique is that in an HTML document the applied semantic styles might overlap with existing styles in the document and thereby add ambiguity to recognizing semantic entities.
– V3: Image color effects. This technique is similar to text formatting but applied to images and videos. Different image color effects such as brightness/contrast, shadows, glows and bevel/emboss are used to highlight semantic entities within an image (cf. Figure 3 no. 7). This technique suffers from the problem that the applied effects might overlap with the existing effects in the image, thereby making it hard to distinguish the semantic entities.
– V4: Marking (icons appended to text or image). In this technique, which can be applied to text, images and videos, we append an icon as a marker to the part of the object which includes the semantic entity (cf. Figure 3 no. 9). The most popular use of this technique is currently within maps to indicate specific points of interest. Different types of icons can be used to distinguish different types of semantic or correlated entities.
– V5: Bleeping. A bleep is a single short high-pitched signal in videos. Bleeping can be used to highlight semantic entities within a video. Different types of bleep signals can be defined to distinguish different types of semantic entities.
– V6: Speech (in videos). In this technique a video is augmented by some speech indicating the semantic entities and their types within the video.

- Associating.

– V7: Line connectors. Using line connectors is the simplest way to visualize the relation between semantic entities in text, images and videos (cf. Figure 3 no. 4). If the value of a property is available in the text, line connectors can also reflect the item property values. Problematic is that normal line connectors cannot express the direction of a relation.
– V8: Arrow connectors. Arrow connectors are extended line connectors with arrows to express the direction of a relation in a directed graph.

- Detail view.

– V9: Callouts. A callout is a string of text connected by a line, arrow, or similar graphic to a part of text, image or video, giving information about that part. It is used in conjunction with a cursor, usually a pointer. The user hovers the pointer over an item, without clicking it, and a callout appears (cf. Figure 3 no. 10). Callouts come in different styles and templates such as infotips, tooltips, hints and popups. Different sorts of semantic information can be embedded in a callout to indicate the type of semantic entities, property values and relationships. Another variant of callouts is the status bar, which displays semantic information in a bar appended to the text, image or video container. A problem with dynamic callouts is that they do not appear on mobile devices (by hover), since there is no cursor.
– V10: Video subtitles. Subtitles are textual versions of the dialog or commentary in videos. They are usually displayed at the bottom of the screen and are employed for written translation of a dialog in a foreign language. Video subtitles can be used to reflect detailed semantics embedded in a video scene when watching the video. A problem with subtitles is efficiently scaling the text size and relating text to semantic entities when several semantic entities exist in a scene.
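A minimal browser-side sketch of the framing technique (V1) with configuration C1 follows: spans that carry an RDFa typeof attribute receive a border colour chosen per entity type. The colour map, the assumption that annotations are inline elements with a typeof attribute, and the function name are illustrative choices, not the markup or code of any tool discussed later.

```typescript
// Framing (V1) with configuration C1: one border colour per entity type.
// Assumes annotations are inline elements carrying an RDFa `typeof` attribute.
const borderColours: Record<string, string> = {
  "schema:Person": "#1f77b4",        // illustrative colour scheme
  "schema:Place": "#2ca02c",
  "schema:Organization": "#d62728",
};

function frameAnnotatedEntities(root: HTMLElement): void {
  root.querySelectorAll<HTMLElement>("[typeof]").forEach((el) => {
    const type = el.getAttribute("typeof") ?? "";
    el.style.border = `2px solid ${borderColours[type] ?? "#999"}`;
    el.style.borderRadius = "3px";
    el.title = type;                  // simple V9-style infotip showing the entity type
  });
}

// e.g. frameAnnotatedEntities(document.querySelector("#editor")!);
```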
3.3. Exploration

To increase the effectiveness of visualizations, users need to be able to dynamically explore the visual representation of the semantic data. The dynamic exploration of semantic data results in faster and easier comprehension of the targeted content. Techniques for the exploration of semantics encoded in text, images and videos include:

– X1: Zooming. In a zoomable UI, users can change the scale of the viewed area in order to see more or less detail. Zooming in on a semantic entity can reveal further details such as property values or the entity type. Zooming out can be employed to reveal the relations between semantic entities in a text, image or video. Supporting rich dynamics by configuring different visual representations for semantic objects at different sizes is a requirement for a zoomable UI. The iMapping approach [6], which is implemented in the semantic desktop, is an example of the zooming technique.
– X2: Faceting. Faceted browsing is a technique for accessing information organized according to a faceted classification system, allowing users to explore a collection of information by applying multiple filters (cf. Figure 3 no. 11); a small sketch follows after this list. Defining facets for each component of the predefined semantic models enables users to browse the underlying knowledge space by iteratively narrowing the scope of their quest in a predetermined order. One of the main problems with faceted browsers is the increased number of choices presented to the user at each step of the exploration [5].
– X3: Bar layouts. In the bar layout, each semantic entity within the text is indicated by a vertical bar in the left or right margin (cf. Figure 3 no. 5). The colour of the bar reflects the type of the entity. The bars are ordered by length and order in the text. Nested bars can be used to show hierarchies of entities. Semantic entities in the text are highlighted by a mouse-over on the corresponding bar. This approach is employed in Loomp [17].
– X4: Expandable callouts. Expandable callouts are interactive and dynamic callouts which enable users to explore the semantic data associated with a predefined semantic entity (cf. Figure 3 no. 8). This technique is employed in OntosFeeder [11].
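The following small sketch illustrates faceted exploration (X2) over already-extracted annotations: entities are grouped by type, and selecting a set of types filters the collection. The Annotation shape and the function names are our own simplification for illustration.

```typescript
// Faceting (X2): group annotated entities by type and filter by the selected facets.
interface Annotation { text: string; type: string; }   // simplified, illustrative shape

function facetCounts(annotations: Annotation[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const a of annotations) {
    counts.set(a.type, (counts.get(a.type) ?? 0) + 1);
  }
  return counts;                                        // e.g. Person -> 12, Place -> 4
}

function applyFacets(annotations: Annotation[], selected: Set<string>): Annotation[] {
  // An empty selection means no filter is applied.
  return selected.size === 0 ? annotations : annotations.filter((a) => selected.has(a.type));
}
```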

3.4. Authoring

Semantic authoring aims to add more meaning to digitally published documents. If users do not only publish the content, but at the same time describe what it is they are publishing, then they have to adopt a structured approach to authoring. A semantic authoring UI is a human-accessible interface with capabilities for writing and modifying semantic documents. The following techniques can be used for the authoring of semantics encoded in text, images and videos:

– T1: Form editing. In form editing, a user employs existing form elements such as input/check/radio boxes, drop-down menus, sliders, spinners, buttons, date/color pickers etc. for content authoring.
– T2: Inline edit. Inline editing is the process of editing items directly in the view by performing simple clicks, rather than selecting items and then navigating to an edit form and submitting changes from there. A minimal sketch of this technique is given after this list.
– T3: Drawing. Drawing, as part of informal user interfaces [14], provides a natural human input to annotate an object by augmenting the object with human-understandable sketches. For instance, users can draw a frame around semantic entities, draw a line between related entities etc. Special shapes can be drawn to indicate different entity types or entity roles in a relation.
– T4: Drag and drop. Drag and drop is a pointing device gesture in which the user selects a virtual object by grabbing it and dragging it to a different location or onto another virtual object. In general, it can be used to invoke many kinds of actions, or to create various types of associations between two abstract objects.
– T5: Context menu. A context menu (also called contextual, shortcut, or pop-up menu) is a menu that appears upon user interaction, such as a right mouse click. A context menu offers a limited set of choices that are available in the current state, or context.
– T6: (Floating) ribbon editing. A ribbon is a command bar that organizes functions into a series of tabs or toolbars at the top of the editable content. Ribbon tabs/toolbars are composed of groups, which are labeled sets of closely related commands. A floating ribbon is a ribbon that appears when the user rolls the mouse over a target area. A floating ribbon increases usability by bringing edit functions as close as possible to the user's point of focus. The Aloha WYSIWYG editor (http://aloha-editor.org) is an example of floating-ribbon-based content authoring.
– T7: Voice commands. Voice commands permit the user's hands and eyes to be busy with another task, which is particularly valuable when users are in motion or outside. Users tend to prefer speech for functions like describing objects, sets and subsets of objects [20].
– T8: (Multi-touch) gestures. A gesture (a.k.a. sign language) is a form of non-verbal communication in which visible bodily actions communicate particular messages. Technically, different methods can be used for detecting and identifying gestures. Movement-sensor-based and camera-based approaches are two commonly used methods for the recognition of in-air gestures [15]. Multi-touch gestures are another type of gestures which are defined to interact with multi-touch devices such as modern smartphones and tablets. Users can use gestures to determine semantic entities, their types and the relationships among them. The main problem with gestures is their high level of abstraction, which makes it hard to assert concrete property values.
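A browser-side sketch of inline editing (T2): double-clicking an annotated span makes it editable in place, and the RDFa content attribute is kept in sync with the edited text when focus leaves the span. The attribute layout (property/content on a span) and the function name are assumptions for illustration, not the actual markup of the tools presented in Section 4.

```typescript
// Inline edit (T2): edit an annotated literal value directly in the view.
// Assumes annotations look like <span property="ex:halflife" content="61.3">61.3</span>.
function enableInlineEdit(root: HTMLElement): void {
  root.querySelectorAll<HTMLElement>("[property]").forEach((el) => {
    el.addEventListener("dblclick", () => {
      el.contentEditable = "true";      // switch the span into editing mode in place
      el.focus();
    });
    el.addEventListener("blur", () => {
      el.contentEditable = "false";
      // Keep the machine-readable value in sync with what the user typed.
      el.setAttribute("content", el.textContent?.trim() ?? "");
    });
  });
}
```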

3.5. Bindings

Figure 4 surveys possible bindings between the user interface and semantic representation elements. The bindings were derived based on the following methodology:

1. We first analyzed existing semantic representation models and extracted the corresponding elements for each semantic model.
2. We performed an extensive literature study regarding existing approaches for visual mapping as well as approaches addressing the binding between data and UI elements. If an approach explicitly mentioned a binding composed of UI elements and semantic model elements, we added the binding to our mapping table.
3. We analyzed existing tools and applications which were implicitly addressing the binding between data and UI elements.
4. Finally, we followed a predictive approach. We investigated additional UI elements which are listed in existing HCI glossaries and carefully analyzed their potential to be connected to a semantic model element.

Although we deem the bindings to be fairly complete, new UI elements might be developed or additional data models (or variations of the ones considered) might appear; in this case the bindings can be easily extended. The following binding configurations are available and referred to from the cells of Figure 4:

– Defining a special border or background style (C1), text style (C2), image color effect (C4), beep sound (C5), bar style (C6), sketch (C7), draggable or droppable shape (C8), voice command (C9), gesture (C10) or a related icon (C3) for each type.
– Progressive shading (C11) by defining continuous shades within a specific color scheme to distinguish items in different levels of the hierarchy.
– Hierarchical bars (C12) by defining special styles for nested bars.
– Grouping by similar border or background style (C13), text style (C14), icons (C15) or image color effects (C16).

3.6. Helper Components

In order to facilitate, enhance and customize the WYSIWYM model, we utilize a set of helper components, which implement cross-cutting aspects. A helper component acts as an extension on top of the core functionality of the WYSIWYM model. The following components can be used to improve the quality of a WYSIWYM UI, depending on the requirements defined for a specific application domain:

– H1: Automation means the provision of facilities for the automatic annotation of text, images and videos to reduce the need for human work and thereby facilitate the efficient annotation of large item collections. For example, users can employ existing NLP services (e.g. named entity recognition, relationship extraction) for automatic text annotation.
– H2: Real-time tagging is an extension of automation, which allows to create annotations proactively while the user is authoring a text, image or video. This significantly increases the annotation speed, and users are not distracted since they do not have to interrupt their current authoring task.
– H3: Recommendation means providing users with pre-filled form fields, suggestions (e.g. for URIs, namespaces, properties), default values etc.; a small sketch follows after this list. These facilities simplify the authoring process, as they reduce the number of required user interactions. Moreover, they help to prevent incomplete or empty metadata. In order to leverage other users' annotations as recommendations, approaches like Paragraph Fingerprinting [8] can be implemented.
– H4: Personalization and context-awareness describes the ability of the UI to be configured according to users' contexts, background knowledge and preferences. Instead of being static, a personalized UI dynamically tailors its visualization, exploration and authoring functionalities based on the user profile and context.
– H5: Collaboration and crowdsourcing enables collaborative semantic authoring, where the authoring process can be shared among different authors at different locations. There are vast amounts of amateur and expert users collaborating and contributing on the Social Web. Crowdsourcing harnesses the power of such crowds to significantly enhance and widen the results of semantic content authoring and annotation. Generic approaches for exploiting single-user Web applications for shared editing [7] can be employed in this context.
– H6: Accessibility means providing people with disabilities and special needs with appropriate UIs. The underlying semantic model in a WYSIWYM UI can allow alternatives or conditional content in different modalities to be selected based on the type of the user's disability and information need.
– H7: Multilinguality means supporting multiple languages in a WYSIWYM UI when visualizing, exploring or authoring the content.
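As a sketch of the recommendation helper (H3), candidate URIs can be suggested by prefix-matching user input against a small, locally held label index. The index entries below are invented for illustration; a real deployment would query a vocabulary or knowledge base instead.

```typescript
// Recommendation helper (H3): suggest URIs whose label matches the typed prefix.
const labelIndex: { label: string; uri: string }[] = [
  { label: "Warfarin", uri: "http://example.org/drug/Warfarin" },       // illustrative entries
  { label: "Aspirin", uri: "http://example.org/drug/Aspirin" },
  { label: "Amoxicillin", uri: "http://example.org/drug/Amoxicillin" },
];

function suggest(prefix: string, limit = 5): { label: string; uri: string }[] {
  const p = prefix.toLowerCase();
  return labelIndex
    .filter((e) => e.label.toLowerCase().startsWith(p))
    .slice(0, limit);
}

// e.g. suggest("Wa") -> [{ label: "Warfarin", uri: "http://example.org/drug/Warfarin" }]
```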

[Figure 4 is a table figure. Its columns are the data model elements of tree-based (e.g. taxonomies: item, item type, item–subitem relationships, item property value, related items), graph-based (e.g. RDF: instances, classes, relationships between entities, literal property values with value, language tag and datatype) and hypergraph-based (e.g. Topic Maps: topic name, topic type, topic associations, topic role in association, topic occurrences with value and datatype) models. Its rows are the UI techniques for visualization (highlighting, associating, detail view), exploration and authoring of structure encoded in text, images and videos. Cell shading indicates no, partial or full binding; cell labels refer to the configurations C1–C16; an asterisk marks bindings that only apply if the value is available in the text or subtitle.]

Fig. 4. Possible bindings between user interface and semantic representation model elements.

4. Implementation and Evaluation

In order to evaluate the WYSIWYM model, we implemented the two applications RDFaCE and Pharmer, which we present in the sequel.

RDFaCE. RDFaCE (RDFa Content Editor) is a WYSIWYM interface for semantic content authoring. It is implemented on top of the TinyMCE rich text editor. RDFaCE extends the existing WYSIWYG user interfaces to facilitate semantic authoring within popular CMSs, such as blogs, wikis and discussion forums. The RDFaCE implementation (cf. Figure 5, left) is open-source and available for download together with an explanatory video and online demo at http://aksw.org/Projects/RDFaCE. RDFaCE as a WYSIWYM instantiation can be described using the following hextuple (a typed rendering of it is sketched at the end of this subsection):

– D: RDFa, Microdata (Microdata support is implemented in RDFaCE-Lite, available at http://rdface.aksw.org/lite).
– V: Framing using borders (C: special border color defined for each type), callouts using dynamic tooltips.
– X: Faceting based on the type of entities.
– T: Form editing, context menu, ribbon editing.
– H: Automation, recommendation.
– b: the bindings defined in Figure 4.

RDFaCE comes with a special edition customized for Schema.org vocabularies (http://rdface.aksw.org/new). In this version, different color schemes are defined for the vocabularies at Schema.org. Users are able to create a subset of Schema.org schemas for their intended domain and customize the colors for this subset. In this version, nested forms are dynamically generated from the selected schemas for the authoring and editing of annotations.

In order to evaluate RDFaCE usability, we conducted an experiment with 16 participants of the ISSLOD 2011 summer school (Summer school on Linked Data: http://lod2.eu/Article/ISSLOD2011). The user evaluation comprised the following steps: First, some basic information about semantic content authoring along with a demo showcasing different RDFaCE features was presented to the participants as a 3-minute video. Then, participants were asked to use RDFaCE to annotate three text snippets: a wiki article, a blog post and a news article. For each text snippet, a timeslot of five minutes was available to use different features of RDFaCE for annotating occurrences of persons, locations and organizations with suitable entity references. Subsequently, a survey was presented to the participants in which they were asked questions about their experience while working with RDFaCE. Questions targeted six factors of usability [12], namely Fit for use, Ease of learning, Task efficiency, Ease of remembering, Subjective satisfaction and Understandability. Results of the survey are shown in Table 1. They indicate on average good to excellent usability for RDFaCE. A majority of the users deem RDFaCE fit for use and its functionality easy to remember. Also, ease of learning and subjective satisfaction were rated well by the participants. There was a slightly lower (but still above average) assessment of task efficiency and understandability, which we attribute to the short time participants had for familiarizing themselves with RDFaCE and its quite comprehensive functionality, which includes automatic annotations, recommendations and various WYSIWYM UI elements.

Table 1
Usability evaluation results for RDFaCE.

Usability factor / Grade     Poor     Fair      Neutral   Good      Excellent
Fit for use                  0%       12.50%    31.25%    43.75%    12.50%
Ease of learning             0%       12.50%    50%       31.25%    6.25%
Task efficiency              0%       0%        56.25%    37.50%    6.25%
Ease of remembering          0%       0%        37.50%    50%       12.50%
Subjective satisfaction      0%       18.75%    50%       25%       6.25%
Understandability            6.25%    18.75%    31.25%    37.50%    6.25%
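Read against Definition 3, the RDFaCE hextuple above can be written down directly. The following object is an informal, illustrative rendering using plain strings; it is not RDFaCE's actual configuration code.

```typescript
// RDFaCE as a WYSIWYM interface (D_I, V_I, X_I, T_I, H_I, b_I), informally.
const rdface = {
  dataModels: ["RDFa", "Microdata"],                                  // D_I
  visualization: ["Framing (C1: border colour per type)",
                  "Callouts (dynamic tooltips)"],                     // V_I
  exploration: ["Faceting by entity type"],                           // X_I
  authoring: ["Form editing", "Context menu", "Ribbon editing"],      // T_I
  helpers: ["Automation", "Recommendation"],                          // H_I
  binding: "bindings of Figure 4",                                    // b_I
};
```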

Fig. 5. Screenshots of our two implemented WYSIWYM interfaces. Left: RDFaCE for semantic text authoring (T6 indicates the RDFaCE ribbon, V1 the framing of named entities in the text, V9 a callout showing additional type information, T5 a context menu for revising annotations). Right: Pharmer for authoring of semantic prescriptions (V1 highlighting of drugs through framing, V9 additional information about a drug in a callout, T1/T2 combined form and inline editing of electronic prescriptions).

Pharmer. Pharmer is a WYSIWYM interface for the authoring of semantically enriched electronic prescriptions. It enables physicians to embed drug-related metadata into e-prescriptions, thereby reducing the medical errors occurring in prescriptions and increasing the awareness of patients about the prescribed drugs and drug consumption in general. In contrast to database-oriented e-prescriptions, semantic prescriptions are easily exchangeable among other e-health systems without the need to change their related infrastructure. The Pharmer implementation (cf. Figure 5, right) is open-source and available for download at http://code.google.com/p/pharmer/, together with an explanatory video and an online demo at http://bitili.com/pharmer. It is based on the HTML5 contenteditable element. Pharmer as a WYSIWYM instantiation is defined using the following hextuple:

– D: RDFa.
– V: Framing using borders and background (C: special background color defined for each type), callouts using dynamic popups.
– X: Faceting based on the type of entities.
– T: Form editing, inline edit.
– H: Automation, real-time tagging, recommendation.
– b: the bindings defined in Figure 4.

In order to evaluate the usability of Pharmer, we performed a user study with 13 subjects. Subjects were 3 physicians, 4 pharmacists, 3 pharmaceutical researchers and 3 students. We first showed them a 3-minute tutorial video on using different features of Pharmer and then asked each one to create a semantic prescription with Pharmer. After finishing the task, we asked the participants to fill out a questionnaire. We used the System Usability Scale (SUS) [13] as a standardized, simple, ten-item Likert-scale-based questionnaire to grade the usability of Pharmer. SUS yields a single number in the range of 0 to 100 which represents a composite measure of the overall usability of the system. The results of our survey (cf. Figure 6) showed a mean usability score of 75 for the Pharmer WYSIWYM interface, which indicates a good level of usability. Participants particularly liked the integration of functionality and the ease of learning and use. The confidence in using the system was slightly lower, which we again attribute to the short learning phase and the diverse functionality.

Fig. 6. Usability evaluation results for Pharmer (0: Strongly disagree, 1: Disagree, 2: Neutral, 3: Agree, 4: Strongly agree).
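For reference, the standard SUS scoring can be sketched as follows: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is scaled by 2.5 onto the 0–100 range. The example responses in the comment are invented and do not correspond to our study data.

```typescript
// System Usability Scale (SUS) scoring: ten items, each answered on a 1-5 Likert scale.
function susScore(responses: number[]): number {
  if (responses.length !== 10) throw new Error("SUS expects exactly 10 responses");
  const sum = responses.reduce((acc, r, i) => {
    // Odd-numbered items sit at even indices (0, 2, ...); even-numbered items at odd indices.
    const contribution = i % 2 === 0 ? r - 1 : 5 - r;
    return acc + contribution;
  }, 0);
  return sum * 2.5;                   // composite score in the range 0..100
}

// Invented example: susScore([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]) === 80
```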
5. Conclusions

Bridging the gap between unstructured and semantic content is a crucial aspect for the ultimate success of semantic technologies. With the WYSIWYM concept we presented in this article an approach for the integrated visualization, exploration and authoring of unstructured and semantic content. The WYSIWYM model binds elements of a knowledge representation formalism (or data model) to a set of suitable UI elements for visualization, exploration and authoring. Based on such a declarative binding mechanism, we aim to increase the flexibility, reusability and development efficiency of semantics-rich user interfaces.

We deem this work a first step in a larger research agenda aiming at improving the usability of semantic user interfaces, while retaining semantic richness and expressivity. In future work we envision to adopt a model-driven approach to enable the automatic implementation of WYSIWYM interfaces based on user-defined preferences. This will help to reuse, re-purpose and choreograph WYSIWYM UI elements to accommodate the needs of dynamically evolving information structures and ubiquitous interfaces. We also aim to bootstrap an ecosystem of WYSIWYM instances and UI elements to support structure encoded in different modalities, such as images and videos. Creating live and context-sensitive WYSIWYM interfaces which can be generated on-the-fly based on the ranking of available UI elements is another promising research avenue.

Acknowledgments

We would like to thank our colleagues from the AKSW research group for their helpful comments and inspiring discussions during the development of the WYSIWYM model. This work was partially supported by a grant from the European Union's 7th Framework Programme provided for the project LOD2 (GA no. 257943).

References

[1] One Click Annotation. Volume 699 of CEUR Workshop Proceedings, February 2010.
[2] A. Ankolekar, M. Krötzsch, T. Tran, and D. Vrandecic. The two cultures: mashing up web 2.0 and the semantic web. In WWW 2007, pages 825–834.
[3] G. Burel, A. E. Cano, and V. Lanfranchi. Ozone browser: Augmenting the web with semantic overlays. Volume 449 of CEUR Workshop Proceedings, June 2009.
[4] A.-S. Dadzie and M. Rowe. Approaches to visualising linked data: A survey. Semantic Web, 2(2):89–124, 2011.
[5] L. Deligiannidis, K. J. Kochut, and A. P. Sheth. RDF data exploration and visualization. In CIMS 2007, pages 39–46. ACM, 2007.
[6] H. Haller and A. Abecker. iMapping: a zooming and annotation-based approach for personal and semantic knowledge management. SIGWEB Newsl., pages 4:1–4:10, Sept. 2010.
[7] M. Heinrich, F. Lehmann, T. Springer, and M. Gaedke. Exploiting single-user web applications for shared editing: a generic transformation approach. In WWW 2012, pages 1057–1066. ACM, 2012.
[8] L. Hong and E. H. Chi. Annotate once, appear anywhere: collective foraging for snippets of interest using paragraph fingerprinting. In CHI '09, pages 1791–1794. ACM.
[9] D. R. Karger, S. Ostler, and R. Lee. The web page as a WYSIWYG end-user customizable database-backed information management application. In UIST 2009, pages 257–260. ACM, 2009.
[10] A. Khalili and S. Auer. User interfaces for semantic authoring of textual content: A systematic literature review. 2012.
[11] A. Klebeck, S. Hellmann, C. Ehrlich, and S. Auer. OntosFeeder – a versatile semantic context provider for web content authoring. In ISWC 2011, volume 6644 of LNCS, pages 456–460.
[12] S. Lauesen. User Interface Design: A Software Engineering Perspective. Addison Wesley, Feb. 2005.
[13] J. Lewis and J. Sauro. The factor structure of the System Usability Scale. In Human Centered Design, volume 5619 of LNCS, pages 94–103. 2009.
[14] J. Lin, M. Thomsen, and J. A. Landay. A visual language for sketching large and complex interactive designs. In CHI '02, pages 307–314. ACM.
[15] A. Loecken, T. Hesselmann, M. Pielot, N. Henze, and S. Boll. User-centred process for the definition of free-hand gestures applied to controlling music playback. Multimedia Syst., 18(1):15–31, 2012.
[16] V. Lopez, V. Uren, M. Sabou, and E. Motta. Is question answering fit for the semantic web? A survey. Semantic Web – Interoperability, Usability, Applicability, 2(2):125–155, September 2011.
[17] R. Heese and M. Luczak-Roesch. Linked data authoring for non-experts. In WWW Workshop on Linked Data on the Web (LDOW 2009), 2009.
[18] W. Muller, I. Rojas, A. Eberhart, P. Haase, and M. Schmidt. A-R-E: The author-review-execute environment. Procedia Computer Science, 4:627–636, 2011. ICCS 2011.
[19] B. A. Myers. A brief history of human-computer interaction technology. interactions, 5(2):44–54, 1998.
[20] S. Oviatt, P. Cohen, L. Wu, J. Vergo, L. Duncan, B. Suhm, J. Bers, T. Holzman, T. Winograd, J. Landay, J. Larson, and D. Ferro. Designing the user interface for multimodal speech and pen-based gesture applications: state-of-the-art systems and future research directions. Hum.-Comput. Interact., 15(4):263–322, Dec. 2000.
[21] E. Pietriga, C. Bizer, D. R. Karger, and R. Lee. Fresnel: A browser-independent presentation vocabulary for RDF. In ISWC, LNCS, pages 158–171. Springer, 2006.
[22] R. Power, D. Scott, and R. Evans. What you see is what you meant: direct knowledge editing with natural language feedback, 1998.
[23] M. Sah, W. Hall, N. M. Gibbins, and D. C. D. Roure. SemPort – a personalized semantic portal. In 18th ACM Conf. on Hypertext and Hypermedia, pages 31–32, 2007.
[24] C. Sauer. What you see is wiki – questioning WYSIWYG in the Internet age. In Proceedings of Wikimania 2006, 2006.
[25] B. Shneiderman. Creating creativity: user interfaces for supporting innovation. ACM Trans. Comput.-Hum. Interact., 7(1):114–138, Mar. 2000.
[26] J. Spiesser and L. Kitchen. Optimization of HTML automatically generated by WYSIWYG programs. In WWW 2004, pages 355–364.
[27] D. Tunkelang. Faceted Search (Synthesis Lectures on Information Concepts, Retrieval, and Services). Morgan and Claypool Publishers, June 2009.