Conceptualization and Visualization of Tagging and Folksonomies

Conceptualization and Visualization of Tagging and Folksonomies Von der Fakultät für Ingenieurwissenschaften, Abteilung Informatik und Angewandte Kognitionswissenschaft der Universität Duisburg-Essen zur Erlangung des akademischen Grades Doktor der Ingenieurwissenschaften (Dr.-Ing.) genehmigte Dissertation von Steffen Lohmann aus Hamburg 1. Gutachter: Prof. Dr. Maria Paloma Díaz Pérez 2. Gutachter: Prof. Dr.-Ing. Jürgen Ziegler Tag der mündlichen Prüfung: 27.11.2013 Hinweis: Diese Dissertation ist im Rahmen eines binationalen Promotionsverfahrens (Cotutelle) in Kooperation mit der Universidad Carlos III de Madrid entstanden. Abstract Tagging has become a popular indexing method for interactive systems in the past decade. It offers a simple yet effective way for users to organize an ever increasing amount of digital information for themselves and/or others. The linked user vocabulary resulting from tagging is known as folksonomy and provides a valuable source for the retrieval and exploration of digital resources. Although several models and representations of tagging have been proposed, there is no coherent conceptualization that provides a comprehensive and pre- cise description of the concepts and relationships in the domain. Furthermore, there is little systematic research in the area of folksonomy visualization, and so folksonomies are still mainly depicted as simple tag clouds. Both problems are related, as a well-defined conceptualization is an important prerequisite for the interoperable use and visualization of folksonomies. The thesis addresses these shortcomings by developing a coherent conceptualization of tagging and visualizations for the interactive exploration of folksonomies. It gives an overview and comparison of tagging models and defines key concepts of the domain. After a comprehensive review of existing tagging ontologies, a unified and coherent conceptualization is presented that incorporates the best parts of the reviewed ontologies. The conceptualization is implemented in a standardized ontology language and enables scalable reasoning on folksonomies. Based on the conceptualization, different tag cloud layouts and their ability to support users in information seeking tasks are investigated. Extensions to tag clouds are developed and combined into an analysis system, enhanced by sophisticated natural language processing and intuitive interaction techniques. Furthermore, existing work on visualizing folksonomies as graphs is analyzed and a new type of graph visualization tailored to the tag-based exploration of digital resources is introduced. The work presented in this thesis has been successfully applied in different contexts. The developed conceptualization has been used in knowledge and requirements engineering, the tag cloud extensions have been applied in visual text analysis, and the novel graph visualization has proved useful in image retrieval and requirements analysis. These examples demonstrate the utility of the developed approaches beyond the topic of tagging and open up interesting questions for future research. iii Resumen El etiquetado se ha convertido en la última década en un método popular para la indexación de contenidos en sistemas interactivos. Ofrece una forma sencilla y eficaz para que los usuarios organicen una cantidad cada vez mayor de información para ser usada por ellos mismos y/o por terceros. El vocabulario de los usuarios resultante del etiquetado se conoce como folcsonomía y puede ser muy útil en la búsqueda y exploración de contenidos digitales. Aunque se han propuesto varios modelos y representaciones para el etiquetado, todavía no hay una conceptualización coherente que describa completamente los conceptos y relaciones de la materia. Además, existe poca investigación sistemática en el área de visualización de folcsonomías, por lo que éstas se representan princi- palmente como simples nubes de palabras. Ambos problemas están relacionados, ya que una conceptualización precisa y bien definida es importante para el uso interoperable y la visualización de folcsonomías. La tesis aborda estos dos problemas mediante el desarrollo de una conceptual- ización coherente de etiquetado y de visualizaciones para la exploración interactiva de folcsonomías. En primer lugar, se ofrece una visión general de los modelos de etiquetado y se definen los conceptos clave. Después de un análisis exhaustivo de las ontologías de etiquetado existentes, se presenta una conceptualización unificada que combina las mejores características de las ontologías analizadas. La conceptualización se implementa en un lenguaje de ontologías estandarizada y permite el uso de algoritmos de razonamiento eficientes. Sobre la base de la conceptualización, se investigan diferentes formas de las nubes de etiquetas y las propiedades que ayudan a los usuarios en la búsqueda de información. Se desarrollan extensiones de las nubes de etiquetas y se las integra en un único sistema de análisis que utiliza técnicas avanzadas de interacción y de procesamiento del lenguaje natural. En un segundo paso, se analizan y se clasifican los enfoques de visualización gráfica de folcsonomías y, finalmente, se presenta un nuevo tipo de visualización gráfica que permite la exploración de contenidos digitales basada en etiquetas. Los desarrollos de esta tesis se han aplicado con éxito en diferentes contextos. La conceptualización se ha utilizado tanto en la ingeniería del conocimiento y de requisitos. Las extensiones de las nubes de etiquetas se han integrado en un sistema de análisis del texto. Además, la visualización gráfica se ha aplicado tanto en la recuperación de imágenes como en el análisis de requisitos. Estas aplicaciones muestran los diversos usos de los desarrollos más allá de etiquetado y plantean preguntas interesantes para la investigación futura. v Zusammenfassung Tagging ist im vergangenen Jahrzehnt zu einer beliebten Methode für die Index- ierung von Inhalten in interaktiven Systemen geworden. Es bietet Nutzern eine einfache und effektive Möglichkeit, immer größere Mengen digitaler Informationen für sich selbst und/oder andere zu organisieren. Das durch Tagging entstehende Vokabular wird als Folksonomie bezeichnet und kann die Suche und Erschließung von digitalen Inhalten hilfreich unterstützen. Obwohl mehrere Modelle und Repräsentationen für Tagging vorgeschlagen wurden, gibt es bislang keine umfassende Beschreibung der Konzepte und Beziehungen des Themengebiets. Zudem existiert wenig systematische Forschung im Bereich der Visualisierung von Folksonomien, so dass diese hauptsächlich in Form einfacher Wortwolken dargestellt werden. Beide Probleme bedingen sich gegenseitig, da eine präzise und konsistente Konzeptualisierung wichtige Voraussetzung für die interoperable Nutzung und Visualisierung von Folksonomien ist. Diese Dissertation setzt sich mit den genannten Problemen auseinander, indem sie eine umfassende Konzeptualisierung für Tagging und Visualisierungen für die interaktive Analyse von Folksonomien entwickelt. Zunächst wird ein Überblick über Tagging-Modelle gegeben und werden wichtige Begriffe des Themengebiets definiert. Nach einer sorgfältigen Analyse vorhandener Tagging-Ontologien wird eine konsistente und wohldefinierte Konzeptualisierung präsentiert, die kompatible Teilbereiche der betrachteten Ontologien aufgreift und vereint. Die Konzeptual- isierung ist in einer standardisierten Ontologiebeschreibungssprache implementiert und ermöglicht die Nutzung skalierbarer Reasoning-Verfahren auf Folksonomien. Ausgehend von der Konzeptualisierung untersucht die Dissertation anschließend verschiedene Darstellungsformen für Wortwolken und wie sie Nutzer bei der Informationssuche unterstützen. Es werden visuelle und interaktive Erweiterun- gen für Wortwolken entwickelt und zu einem Analysesystem integriert, das mit maschineller Sprachverarbeitung und intuitiven Interaktionstechniken ausgestat- tet ist. Darüber hinaus werden vorhandene Umsetzungen der Darstellung von Folksonomien als Graphen analysisiert und schließlich eine neuartige Graphvisu- alisierung präsentiert, die die tag-basierte Erschließung von digitalen Inhalten ermöglicht. Die entwickelten Ansätze wurden erfolgreich in verschiedenen Kontexten ange- wandt. Die Konzeptualisierung kam im Wissens- und Anforderungsmanagement zum Einsatz, die erweiterten Wortwolken halfen bei der visuellen Textanalyse und die neuartige Graphvisualisierung fand in der Bildersuche und Anforderungsanal- yse Anwendung. Die Beispiele zeigen die vielfältigen Verwendungsmöglichkeiten der entwickelten Ansätze über das Themengebiet Tagging hinaus und führen zu interessanten Fragestellungen für zukünftige Forschung. vii Acknowledgments This thesis was conducted under a joint supervision (cotutelle) agreement between Universidad Carlos III de Madrid and Universität Duisburg-Essen. I would like to express my deep gratitude to my thesis supervisors Paloma Díaz and Jürgen Ziegler for making this possible and for their support, trust, and patience throughout the whole PhD process. I am also very grateful to Thomas Ertl who gave me the opportunity and time to complete the thesis after starting work at the Institute for Visualization and Interactive Systems (VIS) of the University of Stuttgart. My grateful thanks are extended to Manuel Castro, Fabio Crestani, and Heinz Ulrich Hoppe who participated in the international PhD committee and examined the thesis and defense, as well as to Norbert Fuhr for his efficient support in coordinating the PhD committee and defense. Helpful feedback was also provided by the

Conceptualization and Visualization of Tagging and Folksonomies

Panel: Large Knowledge Bases

Annotea: an Open RDF Infrastructure for Shared Web Annotations

A Critique of Pure Reason’

Spatial Data Infrastructures and Linked Data

Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing

Where Is the Semantic Web? – an Overview of the Use of Embeddable Semantics in Austria

Peiwen Zhu. Web Annotation Systems: a Literature Review and Case Study

Microformats the Next (Small) Thing on the Semantic Web?

Directed Graph

HTML5 Microdata and Schema.Org

Microformats?

Inconsistency Robustness for Logic Programs Carl Hewitt