Linked Open Data Visualization Revisited: a Survey

Semantic Web 0 (0) 1 1 IOS Press

Linked Open Data Visualization Revisited: A Survey

Editor(s): Name Surname, University, Country Solicited review(s): Name Surname, University, Country Open review(s): Name Surname, University, Country

Oscar Peña ∗, Unai Aguilera and Diego López-de-Ipiña Deusto Institute of Technology - DeustoTech, University of Deusto Avda. Universidades 24, 48007, Bilbao, Spain

Mass adoption of the Semantic Web’s vision will not become a reality unless the beneﬁts provided by data published under the Linked Open Data principles are understood by the majority of users. As technical and implementation details are far from being interesting for lay users, the ability of machines and algorithms to understand what the data is about should provide smarter summarisations of the available data. Visualization of Linked Open Data proposes itself as a perfect strategy to ease the access to information by all users, in order to save time learning what the dataset is about and without requiring knowledge on semantics. This article collects previous studies from the Information Visualization and the Exploratory Data Analysis ﬁelds in order to apply the lessons learned to Linked Open Data visualization. Datatype analysis and visualization tasks proposed by Ben Shneiderman are also added in the research to cover different visualization features. Finally, an evaluation of the current approaches is performed based on the dimensions previously exposed. The article ends with some conclusions extracted from the research.

Keywords: Visualization, Visual Representations, Linked Open Data, Analytics, Semantic Web

1. Introduction datasets holds information about a great variety of topics, such as public transport, air and water quality, The need to understand and work with new data has public funding, cultural agendas and the like. These accompanied humanity since its origins. The need to store, analyse and spread information can already be datasets are traditionally found on the official web- observed in the first pictograms drawn by our ancestors sites of public administrations or on Open Data cat- in caves, and those techniques have been constantly alogues, normally under standardised document for- used and improved through centuries and generations. mats: plain text, CSV (Comma Separated Values) and It is now thanks to the widespread adoption of Internet access, that information technologies come to help to spreadsheets are among the most used file formats, but effectively deal with data and avoid overloading. relational database dumps can also be found. Nonethe- The amount of data made publicly available by gov- less, there are ongoing efforts to make all these data ernments, public entities, companies and even citizens available in machine readable formats, following the has seen an exponential increase in the latest years, specially due to Open Government and Open Data Linked Open Data (LOD) principles exposed by Tim encouraging policies. The diversity of the published Berners-Lee [7], making data publishers to embrace a set of guidelines in order to make data consumable by *Corresponding author. E-mail: [email protected]. algorithms through the Internet.

As stated by Tim Berners-Lee, Open Data can fol- 2. Background low a five-star model1 in which each level, from 1 star up to 5, points out an increasingly reusable and pro- As progress stands on the shoulders of giants, it is cessable dataset. To be fully compliant with LOD’s vi- important to compile existing research on this field in sion, 5 star data is required, as lower levels do not en- order to apply it to LOD scenarios. force the linkage of resources over the Internet. In accordance with Information Theory, vision is the sense with the largest bandwidth to send information RDF (Resource Description Framework) [10] is an to the brain [52], and humans ability to quickly under- abstract syntax to make statements about resources, stand complex data through it is reflected on the well and thanks to LOD principles, construct links to avail- known adage “a picture is worth a thousand words”. able external datasets in a simple manner, making them Promoted by John Tukey, Exploratory Data Analysis accessible and queryable through the Internet and pro- (EDA) [49] tries to summarize the main features of a moting data reusability. As the amount of information dataset applying visual methods. This makes the EDA at hand is greater every day, the need to handle it effi- approach a perfect candidate to be followed in LOD ciently becomes a key requirement for anybody inter- visualization. ested in working with it. This situation settles a great As everyday more and more governments, public scenario for the Information Visualization field (or In- entities, organisations, etc. are encouraged (and some- foVis, as it is known by academics and industry), tak- times forced due to transparency policies) to make ing advantage of humans capacity to identify patterns public data accessible to citizens and interested third parties, automatizing the publication of information is and gain insights from visual representations of ab- a common approach among practitioners to easily ex- stract data. InfoVis positions itself in the intersection pose huge amounts of files and documents to public of other data-related fields: Statistics, Analytics, Dis- consumption. The errors caused by automatic parsing semination and so on. and processing, together with the lack of correctly ap- One of the biggest issues concerning mass adoption plying term disambiguation and the selected approach of LOD outside the Semantic Web (SW) community, to deal with missing values, gives birth to LOD in need is the technical and conceptual knowledge required to of a lot of pre-processing to be usable for a data analy- take full advantage of the benefits provided by this type sis task. of data publishing. Utilizing expressive visual repre- Likewise, the diversity of topics that those datasets sentations, most users can employ their visual capaci- deal with, make the automatic visualization of LOD a ties to obtain a clear understanding of the data stored great challenge full of research opportunities. Tables have been largely used to display LOD. When con- within the dataset. Interactive visualizations also offer sulting information about a resource (object), a table is the possibility to play and experiment with the data, generated with as many rows as attribute instances: a allowing to perform exploratory knowledge discovery first column with the property name (or IRI), and a sec- using the “follow your nose” principle [53]. ond column with the value. The table layout has been This article is structured as follows: In section 2 popularised by tools similar to Pubby [24], in charge background knowledge on information visualization is of the generation of the green-ish HTML pages of DB- provided, addressing the best practices to represent ab- pedia articles. stract data in a visual manner to allow a coherent in- Regarding topic diversity, domain specific tools terpretation of them. Section 3 describes current ap- such as FoaF Explorer [13], map4rdf [34], LinkedGeo- proaches that deal with LOD visualization, and are Data browser [48], etc. display a well selected set of later evaluated in Section 4 according to the previously visual representations, as result of being tailored for a defined features in order to solve LOD visualization is- concrete set of ontologies within a well known envi- ronment. sues. Finally, Section 5 discusses the findings of con- As diversity increases, more vocabularies are de- ducting this study and the conclusions drawn from it. signed to reflect the details of a great amount of sub- jects, thus multi-domain or generalist approaches need to be designed in a manner that lets them manage and 1http://5stardata.info/ generate visualizations over different scenarios. O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey 3

2.1. Datatype analysis gation with other data dimensions can create aug- mented visualizations of greater value. Adding Ben Shneiderman proposed seven basic datatypes labels, descriptions, images, etc. the map layout [46] a data fragment could be classified into, stating a is enriched, making geospatial data to be fun to taxonomy which allows to tag a datum with a certain interact with by any user. Furthermore, data in- category, determining how it can be used and which stances can be collected by geographical areas operators are applicable. Following this taxonomy, an (such as countries, states, etc.) and normalised, extended description of each datatype and their con- encoding each region within a pre-established nection to LOD principles is detailed. colour palette resulting in choropleth maps, or – 1 dimensional: linear datatypes including textual distort established borders to proportionally ex- documents, program source code and alphabeti- pose local contrasts over a set of variables using cal lists of names which are all organised in a se- cartograms. quential manner. – 3 dimensional (volumetric): real-world objects Unidimensional data is usually displayed as lists such as molecules, the human body, and buildings of items organised by a single feature (e.g., alpha- having items with volume and some potentially betical order), so it is uncommon to see it visu- complex relationship with other items. alised. Whereas this datatype is one of the pillars of sci- A especial case is when a data dimension has entific visualization, non-trained eyes may find a narrow range of values repeated through the difficult to correctly interpret what 3D graphs and dataset, for example, the names of months or the charts are trying to represent. Traditionally re- department titles of an office. These values are lated to huge datasets, this datatype adds com- known in statistics as categorical data, or factors. plexity to non-trained users, requiring a devel- A simple aggregation of these values can be used oped spatial vision skill in order to understand the for the creation of distribution analyses in a fur- underlying data. ther step. Besides, pleasant rendering of both big data – 2 dimensional (planar): planar or map data in- sources and 3D images on web browsers still cluding geographic maps, floorplans or newspa- comprises a challenge, but server-side prepro- per layouts. cessing techniques together with WebGL’s fea- Geographical features offer an excellent oppor- tures [1] should overcome the technical issues in tunity to help users locate data instances on a the near-future. map. In combination with map templating en- – Multi-dimensional: items from relational and gines, data instances can be placemarked using statistical databases with n attributes becoming different symbols, thus allowing resources to be points in a n-dimensional space. distinguished attending to their class. The ability The easiest manner in which n dimensional data to pinpoint elements on a map may help uncov- can be defined is by taking an object, and provid- ering element distribution patterns in the datasets, ing values for each of its n attributes (with n > 1). letting users identify the areas where resources The descriptions of m objects using those n fea- are either tightly gathered or disperse. Advanced tures will give birth to a n × m matrix, each row projection techniques can also enhance presenta- representing an object instance and each column tion by clustering elements together in associa- collecting all the measurements for a given di- tion to the applied zoom level, or even addressing mension. high-interest areas using heatmaps. Due to its suitability to fit abstract models, map- Planar data can also be found as an array of pings from multi-dimensional data to relational bidimensional features, producing geometrical database schemas, spreadsheets or CSV files are shapes which limit an area within a map. These quite trivial, and so is expressing these data by bounding boxes, when overlayed to a base map, means of ontological class resources being the provide great insights of data which affect a subject of predicate triples with the measured val- greater geographical area, not just a unique, pre- ues as the objects. cise point. The number of dimensions can give clues about Finally, bidimensional data does not only produce which visual representations are more appropriate visual representations on their own, but in aggre- in each case. As an example, a first choice to ex- 4 O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey

hibit a dataset by two of its dimensions would be Hierarchies or tree structures are formed by items to draw a scatter plot, each dimension represented having links to other instances as parents, siblings over an axis and with the dots placing the union and children in a resemblance to a family tree. between both for each instance. If a third dimen- These structures have in common a root node, sion is added to the analysis, encoding it to each from which the rest of instances grow in depth, dot’s area will evolve the scatter plot to a bubble until end nodes are reached (items with no chil- chart. More dimensions can be encoded through dren), also known as leaves. colours, shapes, etc. Trees provide a great understanding of the over- As important as dimensionality, the datatypes of all structure of the data being studied, where each dimension can filter the universe of visual- analysts are able to perform the first two tasks izations to the most suitable ones in each case, as of Ben Shneiderman’s visualization mantra [46]: expounded in [29]. “Overview” and “zoom”, gaining an overview of Advanced visualizations can be developed by the whole structure and then zooming in the items bringing the features of other datatypes together of interest. in a multi-dimensional space: time-based cyclical Common operations performed over trees in- data in polar charts, planar combined with times- clude count of total items (e.g., total number of tamped data in complex timelines, etc. classes in the DBpedia ontology [3]), number of – Temporal: separated from 1-dimensional data, children of a selected node (e.g., child classes the distinction in temporal data is that items have of dbo:Agent) and number of elements defined a start and finish time, which not only covers within a node (e.g., instances of dbo:University). timestamped data (i.e., a precise moment in time), The indented tree visual representation has tradi- but those items spanning through time with a de- tionally been used to navigate through file direc- fined starting and end date (overlaps are allowed). tories in operating systems, or render the struc- Time based data is very useful when arranging el- ture of software packages in programming suites. ements through history in chronological order, for The possibility to collapse a subtree made this ap- example in medical records, project management proach very useful to reach deep nodes within the or historical presentations. structure with minimal visual overload and effi- Additionally, temporal data can have a recur- cient interactive exploration. rent regularity (e.g., weekly, monthly, every four Adjacency diagrams are a space-filling variant years, etc.). All this components make tempo- of the previous representations, where the posi- ral data suitable to be displayed in calendars and tion of a node relative to adjacent items reveals timelines (either in combination with geographi- its place in the hierarchy. IciclePartition layouts cal features or by its own). are similar to dendrograms, with the advantage Nevertheless, as with planar data, time-series data of providing an additional dimension (area) to makes a perfect candidate to be mixed with new display another variable. Sunbursts are a polar- data dimensions, allowing new analyses over data coordinate variant of icicle layouts. that changes over time. Multiple domains such as Substituting adjacency by containment the treemap finance, science, public policy and management concept was introduced [30], displaying struc- (to name a few), take the advantage of temporal ture as a set of nested geometries in a tile lay- data to detect patterns and trends in their datasets. out. Whilst the most widespread geometry used Time series forecasting can also be used to predict in treemaps are rectangles, other shapes can also future values based on the recorded measures in be used generating Voronoi, Jigsaw or Circular our datasets. treemaps. Together with multi-dimensional data, temporal – Network: cases emerge where hierarchical struc- information can be represented with the most di- tures are not enough to capture the essence of verse variety of visualization techniques, relying the relationship among items on a dataset, spe- on the temporal dimension as a principal compo- cially within links among LOD sources. Nodes nent of the chart. have no linkage constraints, being free to connect – Tree (hierarchical): collections of items with to whatever items they want. This freedom allows each having a link to a parent object (except the to combine similar resources within a diverse set root), forming hierarchies or tree-like structures. of features. Both external and internal links create O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey 5

a graph of interlinked items, which need a layout VizWiz) [16] also uses this datatype categorisation in algorithm in order to be displayed due to the lack order to deal with the semi-automatic generation of of hierarchical meaning. visual representations based on LOD. The search for an efficient layout that honestly represents the data, depends on the message an- Table 1 alysts want to highlight. Sometimes analysts will LDVM’s generic visualization datatypes be looking for the shortest paths between two RDF Vocabulary Datatype Visualization Tool items, or how many cliques [38] the community xsd:int, dc:subject,... (count) 1D Histogram is divided in. Taking techniques from the Social wgs84:lat, geo:point,... 2D Map Network Analysis (SNA) field, who are the key visko:3DPointPlot,... 3D 3D Rendering players of the network can also be understood, us- qb:Observation, scovo:Item,... Multidim. Chart ing a wide set of metrics to determine relevance xsd:date, ical:dtstart,... Temporal Timeline, Calendar,... [37]. rdfs:subClassOf, skos:narrower,... Tree Treemap, SunBurst,... The least known network representation is usu- foaf:knows,... Network Graph,... ally the adjacency matrix, a tool often used by mathematicians and computer scientist to relate items in a 2D space. Each cell encodes the value 2.2. Primitive datatypes between the column and row data instances (either showing the number or following a colour The term “datatype” is misleading for those people palette), and matrix re-arrangement allows to with a solid background in Semantics. The usage of quickly detect clusters and bridges. Besides, no this term refers to the abstract structural layout within collisions between links can happen, at the ex- the data. For example, the 1 dimensional datatype can pense of requiring a bigger area to display all the be conceptually thought of as a set, list or array by a information. Filters and selectors can help dimin- computer scientist, or as a vector by mathematicians. ishing the matrix’s width and height. In computer science and programming, a datatype Easier to interpret are node-link diagrams in a defines the manner a value should be interpreted, how graph layout, where nodes represent each data in- it is implemented, encoded and stored within a system, stance, and edges or links between them the at- what operations can be performed over it, its meaning tribute through which items are connected. De- and the value ranges for the observation [43]. pending on the algorithm used to display the In statistics, the term has a slightly different mean- graph, different attributes will be highlighted in ing, clustering groups of individual data points into the analysis. categories with the same semantic context. However, For example, the force-directed layout tries to there is a equivalent mapping between both definitions emulate nodes as being particles or a physical sys- of datatypes, so no deeper understanding is required. tem, each one repelling the others and only be- The following primitive datatypes will be consid- ing pulled together those that share links. Edge ered to take them into account to select the visual rep- weight can be used as a gravity indicator, thus resentations that fit best for a particular analysis (note the stronger the link, the closer particles will that this classification is conceptual and programming stay together, whereas the weaker the link, the language agnostic): more remote nodes will be placed. Bigger graphs – Integer: A finite subset of integer values, such as will populate the visualization with nodes and the height of a person in centimetres, or the num- links, creating giant hairballs with multiple line ber of wheels of a vehicle. It may contain negative crossings. Although there are researchers trying values. to minimise the hairball effect [32], usually high – Float: The representation of a real number (e.g., density networks are not suitable for graph ren- the height of a person in meters). It may contain dering. negative values. The Linked Data Visualization Model (LDVM) – Boolean: A value meaning a logical truth, either [22], mapped these datatypes to visualization tools and true or false. RDF vocabularies, creating the mappings summarised – String: Defined as a sequence of characters, it in Table 1. The Linked Data Visualization Wizard (LD- can contain any of the other datatypes in lit- 6 O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey

eral format. This drawback is solved in RDF but stacked bars and lines are widely used in both notations by appending ˆˆdatatype to the object static and over-time visual analyses. value. Newer approaches such as JSON-LD deal – Distribution: Popular among statisticians during with this situations by using native JavaScript the first analysis stages, provides a layout to dis- datatypes where possible. play how data items correlate. Distribution analy- – Date: Usually encoded as a string, a timestamp is ses can be studied over one, two or three variables defined by either one of the agreed standards such in an understandable way, making them suitable as ISO 8601 [4]. Data correctly formatted as date to quickly get a scent of the information within a instances can significantly improve temporal data dataset. detection in datasets. – Relationship: Tries to expose connections be- – Geographical coordinates: In a similar fashion, tween two or three variables within the dataset. It geographical data is usually encoded as strings or may become of special interest to the LOD com- floats, making it really difficult to detect these fea- munity, merging data from different sources to tures by a simple data pre-processing. Actually observe the relations among them. the best approach is to detect geographical components by the ontology property used to define Based on the works by [14,35,6], and the tour them [16], either as isolated latitude and longi- through some of the most typically used charts on [29], tude coordinates, tuples containing both, or lists it is possible to envisage some common patterns be- of geo-points describing a bounding box. tween visual representations and the analysis type to – IRIs: Internationalised Resource Identifiers are a be performed, in order to establish a first recommenda- standard defined upon the URI scheme [11]. IRIs tion for newcomers to the visualization field. A sum- can contain any Unicode character, thus allow- marization of these studies is depicted in Table 2. ing a true internationalisation of resources over the Internet. Also commonly found as strings, ap- 2.4. Visual information seeking mantra applied to proaches like JSON-LD’s “@id” key allows for a LOD rapid identification of links to other resources, in order to establish relationships among data. Formulated by Ben Shneiderman and popularised as the Information Visualization Mantra: Overview first, 2.3. Analysis message type zoom and filter, then details on demand, actually its author proposed seven different tasks that can be accom- On behalf of the type of analysis to perform, there plished through visualization techniques. Whilst most are different visual representations that fit best each of of the works focus only on the most known version of the messages. An analyst may be interested in draw- the mantra, each task forms part of a strategy designed ing a comparison among fundings coming from differ- to get the best performances on data analytics: ent regional governments for the last decades, whilst – Overview: Its focus is set on getting a gen- a colleague wants to conduct an experiment about the eral feel of the data analysts are working with. article publishing patterns of her research institute. When first approaching an unknown dataset, an The analysis type usually fits one of the following: overview of the data helps figuring out how the – Comparison: This analysis tries to set one group structure looks like. In most works the overview of variables from another for the selected data. stage is made at ontology level, displaying a ta- Data comparison may be performed over time, ble of high-level metadata statistics about the or among items. The former highlights the trends dataset such as number of classes defined, num- and patterns of data (either periodic or episodic), ber of triples, out-degree and in-degree proper- whereas the latter sets the focus on direct compar- ties, owl:sameAs links, etc. W3C’s VoID vocabu- ison among data instances. lary [15] deals with this task by providing means – Composition: Its purpose is to render the com- to publish metadata about the dataset in order to ponents of a whole, based on a singular aspect be queried by third parties. Hierarchical visual- of the data. Compositions can evolve over time, izations are commonly found in this stage as they or be static. The most representative visual repre- supply a visual aid to understand the overall struc- sentation for a static composition is the pie chart, ture in a rapid fashion. O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey 7

Table 2 Popular charts suitability based on analysis type Comparison Composition Distribution Relationship Line chart * Bar chart ** Bullet bar chart * Column chart ** Stacked columns * Area chart * Stacked areas * Pie chart * Scatter plot ** Bubble chart *** Waterfall chart * Histogram *

– Zoom: When a user is interested in a particu- showing all data attributes’ values may not be fea- lar subset of the data, zooming allows to dive sible, selecting the most appealing ones requires deeper in the selection, giving relevance to fea- a certain amount of thought: for example, when tures being covered by the lack of detail caused by only a few features can be displayed on a details a high level focus. Interaction through zooming query, the central geo-point of a city is interesting, in maps (planar data) or highly-populated graphs whilst for a person instance its birthday should fill (network data) seems natural to lay users, as the the gap. navigation patterns of the majority of user inter- – Relate: Usually ignored by the LOD visualiza- faces follow quite similar mechanics. tion tools following “Shneiderman’s mantra”, the – Filter: The bigger the dataset, the more impor- visualization of connections between items is a tant it becomes to get rid of the features and ele- core task to perform over LOD datasets, in or- ments not relevant for the analysis. If the visual- der to exhibit the benefits of its principles to peo- ization is the result of a SPARQL query, FILTER ple outside the SW community. When an item clauses can be included to avoid retrieving non is selected, how it interacts with other instances desired data. Facets navigation [42] also lets users of the dataset or even external resources is vital select desired features, updating the subset of se- for the understanding of the Web of Data con- lected data in near real-time. Selecting the best cept [20] envisaged by Tim Berners-Lee. An ini- features for filtering purposes is a non-trivial task, tial approach consists on presenting a list of re- and an incorrect strategy may lead towards pre- lated items, or even drawing edges to this exter- senting the users a long list of features, thus dam- nal resources as nodes in a graph [41]. However, aging the exploration experience. Projects such as more advanced and smart solutions using tech- 2 Open Refine improve facet filtering by provid- niques from the EDA field are welcomed. ing sliders and regular expressions where possi- – History: The interaction between users and the ble. The former filters numerical data above or data usually does not follow a pre-established pat- under a selected threshold to be kept out of the tern, with a clear goal and the precise knowl- analysis, whilst the latter enables complex filter- edge to directly conduct analyses in order to gain ing for power users. the expected outcome. EDA promotes exploration – Details on demand: Once a subset of items are of uncertain paths, making adventurous guesses selected, users are usually interested in augment- about the data and committing mistakes, without ing the information about the selection. A detailed the objective of getting conclusive results at the view should be able to display further data which end of the exploration. Therefore, keeping a his- could not be accessed at a previous stage. Since torical record will allow users to undo, replay and progressively refine the performed actions from 2http://openrefine.org/ any given point. 8 O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey

– Extract: Finally, after an effective understanding 3. Actual approaches to visualising LOD of the dataset, users (especially technical and expert ones) might be interested in exporting the After summarising the elements that shape visual subset of data eventually obtained, together with representations, the most relevant current approaches the filters, query parameters and tweaks required are examined. In 2011, Aba-Sah Dadzie & Matthew for its production. The need for interoperable Rowe conducted a research on the up-to-date ap- and standardised formats should anybody want proaches on Visualising Linked Data [25]. They di- their research to be reproducible and repeatable vided the analysed browsers between those offering a in other environments, either for replicating the a) text-based presentation and those b) with visualisa- analysis in other machines or to have the pro- tion options. The Semantic Web and its related tech- cess evaluated by colleagues. Whatever the rea- nologies have evolved since then, as well as the tools son, the ability to extract data is a must where analysed in the survey. Some of them are not longer common standard notations such as RDF/XML available as they were an in-lab prototype, whereas [12], RDF/JSON [8] and JSON-LD [5] play a key others have evolved into new concepts. The tools of role. this survey exhibit the actual status of the visualization approaches to LOD. 2.5. Target user categorisation 3.1. CODE Visualization Wizard Finally, a common theme among LOD visualization works is the need for a widespread adoption of the SW, Within the EU-funded research project CODE3, the to avoid its practice by community members solely. Vi- Vis Wizard tool [40,39] envisages a visualization plat- sualization tools should set a baseline for LOD exploration, thus it must take into account each user’s re- form for all the research publications data extracted quirements. As addressed in [25], there are three well due to the project’s efforts. defined target user groups to be taken into account. To publish research data on the LOD cloud, CODE relies on the RDF Data Cube Vocabulary (DCV) [9], – Lay-users: The vast majority of Internet users are a W3C standard developed to represent statistical data not expected to have knowledge beyond browsing as RDF. Document parser’s output is mapped to the the web, search for relevant content and navigate DCV as a collection of observations consisting in a set through links. Their analytical capacities do not of dimensions and measures. need to be developed, neither any domain knowledge should be anticipated, but the hunger for information and being the largest group makes them play a key role in the mass adoption of LOD. – Techies: Those with a computer science background, or in possession of the set of skills to op- erate websites and program algorithms. Knowl- edgeable of conceptual data models, understanding of basic SW concepts should not be an issue if a solid explanation is provided. Members of this group may need to implement SW services or concepts, or develop an interactive visualization using dynamic data from a SPARQL endpoint – Domain experts: Even without a technological background (or a different one from computing), experts are perfect candidates to perform complex Figure 1. CODE Linked Data Vis Wizard, with the map selected as analyses using a diverse set of data sources. In- the representation to display data, among other options depth knowledge usually drives their analyses to a specific goal, with a clear path of the steps to fol- Once data is loaded in the system, Vis Wizard high- low in order to achieve it. However, dealing with lights the visual representations that can fit the selected huge amounts of heterogeneous data should benefit from a good overview set of visualizations. 3http://code-research.eu/ O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey 9 data as shown in Figure 1, depending on its nature. For cabularies, for example, geography ↔ planar data as example, if planar data are found the users will be able seen in Figure 2, temporal data and knowledge infor- to draw a map, disabling the selector otherwise. This mation ↔ hierarchical, others such as Agent/Person recommendations can be improved by implementing or Organization are related to whole ontology class new generators, well-defined interfaces with the abil- instances. This non-datatype based approach loses ity to map data to new visualizations when plugged to some of the advantages provided by best practices and the Vis Wizard. lessons learned from the data visualization field, al- Lay users can interact with the data by adding and though class-category template filling exhibits a more removing dimensions, and changing the visual compo- robust performance if the schema is known before- nents they are referenced to. This opens new opportu- hand. The biggest drawback is the need to adapt and nities to customize the way data is presented. extend the ASK queries to new vocabularies whenever Finally, using mindmeister’s Mind Mapping soft- they are detected. ware4, a history of the performed actions, applied filters and generated visualizations can be observed, in 3.3. LODVizSuite & ResXplorer order to make data analysis processes reproducible. Researchers from the iMinds Digital Research Cen- 3.2. LDVizWiz tre5 designed, implemented and evaluated an interactive visual workflow to explore LOD. The workflow With the goal of providing general purpose visu- uses EDA techniques to guide users through the ex- alizations of any SPARQL endpoint, LDVizWiz [16] inspects the features of the dataset in order to un- ploratory stage, and Exploratory Search [36] concepts derstand the underlying data and detect categories. to ease data querying. Based on the classification provided by Shneiderman, EDA is allowed through narrowing the dataset from LDVizWiz performs ASK queries to categorise data high level group overviews towards their details. Pro- in one of the following classes: Geography, Tempo- viding an overview first, the dataset reveals its underly- ral, Event, Agent/Person, Organization, Statistics and ing structure and the internal connections as the users Knowledge. explore deeper in the content. The tool designed for this purpose is named LODVizSuite [26]. Exploratory Search consists on broadening a coor- dinated view (output of the narrowing phase). This set of actions is highly focused on leading to other datasets through relationships, if they are considered relevant enough. Broadening is provided by the ResX- plorer tool [27].

Figure 2. LDVizWiz’s display or detected GeoData

Even though some of the categories have a direct Figure 3. LODVizSuite rendering of a research co-authorship net- mapping to Shneiderman’s taxonomy and certain vo- work using a force directed layout

4http://mindmeister.com/ 5http://iminds.be/ 10 O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey

For the workflow’s implementation, the Research LODVisualization can draw treemaps, tables and Information Linked Open Data (RILOD) dataset is bar charts with the extracted data from any SPARQL used, an integration of heterogeneous sources related endpoint, without the need of adapting to a certain set with research and investigation within the region of of domains. Still, it does not perform the rest of tasks Flanders. Figure 3 shows the connections between re- defined by Shneiderman. searchers, with a sidebar displaying the amount of topics covered by the selected author in the graph. The 3.5. Payola data contained within the dataset is enriched with Dig- ital Bibliography and Library Project (DBLP)6 using Payola [31] provides a refined implementation of ResXplorer. the LDVM, to generate visualizations for the Czech LOD cloud, a set of public datasets with relevant data 3.4. LODVisualization for Czechoslovakians, such as public inspections and sanctions depicted in Figure 5. Users are able to con- LODVisualization is a tool based on the LDVM nect to a SPARQL endpoint or upload a RDF file, per- (Linked Data Visualization Model) [21] that allows to forming different analysis over the data and visualising connect different data sources in a dynamic way us- them on a web browser, thanks to the implementation ing different visual representations. LODVisualization of LDVM pipelines. Tech-users can also improve the adopts the Data State Reference Model (DRM) [23] as SPARQL queries and add plugins to Payola in order to a conceptual framework, implementing it according to get a more refined outcome of the analysis. the features of LOD as exhibited in Figure 4.

Figure 5. Payola visualization of inspection and sanctions data’s structure using the TreeMap representation

Collaboration between users is encouraged, as visualizations and the operators used in their generation can be shared within the platform. This allows not only to re-run experiments, but to connect new analysis operators and plugins to existing pipelines in order to pro- Figure 4. LDVM’s implementation of Ed Chi’s DRM duce new visuals with enriched or refined data. This feature allows technical and expert users create visu- The tool implements the Overview task proposed by alizations which can later be consumed by lay users, Shneiderman, generating visual representations about taking away the required knowledge to collect and pro- the hierarchy of classes and properties defined in a cess data. SPARQL endpoint graph. Users can also consult the shared properties between two classes, or those in- 3.6. rdf:SynopsViz stances with the highest in/out-degrees, i.e., resources of a given class with the highest number of outgo- The purpose of ref:SynopsViz [19] it to provide hi- ing and incoming links. However, instances referring erarchical charts and visualizations about LOD. The to the same entity are not disambiguated and are dis- tool heavily relies on metadata as a means of under- played iteratively. standing a datasets internal structure. Statistics such as total number of triples and owl:sameAs links are dis- 6http://dblp.uni-trier.de/ played together with the number of properties, objects, O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey 11 classes, languages, etc. A faceted navigation bar lets the user filter the data, which will later be visualised using bar, linea or area charts (see Figure 6). If a area of special interest is zoomed, more fine-grained data will fill the available space.

Figure 7. Visualization example using Sgvizler, showing the HTML mark-up with the SPARQL query, the data obtained from the selection in tabular format, and the visual output on a scatter plot

Figure 6. rdf:SynopsViz’s faceted browsing feature over DBpedia to be CORS-enabled, otherwise they would not return any information to be rendered. Even though its suitability to perform the “Overview However, Sgvizler requires the user to have a pre- first, zoom and filter” data visualization tasks, more vious SW knowledge, specially about the SPARQL advanced visualizations and interactions are missed querying language to write down the SELECT state- in order to get a real feel on the underlying data. A ments to retrieve data from the endpoint. This makes good point though, is the possibility to obtain com- Sgvizler a good tool for expert users with semantic puted statistics about the data being queried, such as: knowledge and with liberty to modify the HTML of mean, variance, minimum and maximum values, etc. the website to include the special mark-up (Figure 7). Lay users, on the other hands, may not be able to gain 3.7. Sgvizler any benefit from the use of Sgvizler rather than the visualization of the final output. Sgvizler [47] is a JavaScript (JS) wrapper to vi- sualise the result of SPARQL queries within the 3.8. Visualbox HTML elements of a website. To accomplish this goal, Sgvizler makes use of HTML5’s data- prefixed ele- Taking a similar approach to Sgvizler, Visualbox ment attributes, where technical users can specify the [28] requires users to have a certain technological SPARQL query to perform together with the endpoint background, some concepts about RDF and knowledge it is addressed to, the type of visualization to generate of the SPARQL language. This tool joins different fea- (e.g., map, treemap, bar chart, etc.), the dimensions of tures in a single platform: SPARQL syntax highlight- the chart and the format of the data. The tool works ing to detect common errors, connection to endpoints excellent with JSON (JavaScript Object Notation) for- to perform queries and control of the visualization rep- mats, as the data sharing with visualization libraries is resentation through templates. Figure 8 shows the vi- trivial (web-browser based visualization libraries are sual editor of Visualbox: the textareas allow to specify developed in JS, whose understanding of JSON is di- the SPARQL query to retrieve data, and how they are rect). Support for Google Charts7 and d3js’8 force di- going to be rendered using a special templating lan- rected graphs are built in. guage. The options on the sidebar filter elements to Sgvizler adds Cross-Origin Resource Sharing (CORS) generate the final visualization, as depicted on the right [2] support, in order to query SPARQL endpoints in side. Visualbox relies on LODSPeaKr9, a framework an external domain. It requires the SPARQL endpoint to create LOD-based applications, setting the focus on visualizing data.

7https://developers.google.com/chart/ 8http://d3js.org/ 9http://alangrafu.github.io/lodspeakr/ 12 O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey

Figure 8. Visualbox’s graphical editor and ﬁnal visualization schema

Visualbox uses a templating engine similar to django’s10 where specially marked elements in the template are Figure 9. Data panels on VizBoard demo substituted by the values returned after performing the SPARQL query. This variables are used in the sup- tion described in Section 3 are evaluated to analyse ported web-based visualization engines to create the the feature compliance of each tool. The tables sum- graphics. marise which tools support the listed features, in order Collaboration between visualizers is encouraged, as to get on ﬁrst sight a good perception of their distinc- all graphs are shareable through a unique URI and the tivenesses. charts can be downloaded as an image to be included in any document. 4.1. Datatype support

3.9. VizBoard Table 3 portrays which datatypes conceived by Shneiderman are manageable by the current approaches VizBoard [51,50] is a SW visualization tool build to LOD visualization. Only Sgvizler supports the visu- on top of the CRUISe platform [44], designed as alization of 1-dimensional data, but as expressed be- a workbench for information visualization purposes. fore, this datatype is not usually visualised, being lists VizBoard acts as a mash-up tool, where users are able of items it most characteristic representation. to combine different dimensions of the data to create It is noteworthy the lack of support of 3D data by all insightful visualizations that allow any user understand the analysed tools. 3 dimensional data is fundamental the whole picture of the dataset. Figure 9 displays for in many scientific fields, and having domain experts as data panels with different representations of the under- a valuable stakeholder, it does not make much sense lying data. Actions performed in each panel update the to avoid complex structures to being drawn in web data on the rest. browsers. This situation may be due to the big amounts Interaction is heavily based on facet navigation, thus of data scientific areas deal with. The most common letting users select the data components most relevant approach is to have data dumps to work with loaded for their analyses, and keep the non-desired data out of in high-capacity computing machines, and launch off- the picture. Through facets understanding the principal line batch processes to analyse them. components the data is categorised in is quite straight. However, the widespread deployment of sensor net- Finally, VizBoard supports all the information about works across cities and real-time data flows (e.g., so- visualization rendering using The Visualization Ontol- cial networks) will create new challenges to tackle ogy (VISO) [45], a multi-model vocabulary which de- with big datasets over the Internet in the forthcoming scribes all the concepts and relations on the graphics years. and visualization fields. 4.2. Visualization task support

4. Evaluation Regarding visualization tasks support, most of the current approaches are compliant with the shortened In accordance to the features defined in Section 2, version of the visualization mantra: “Overview first, the current approaches dealing with LOD visualiza- zoom and filter, then details on demand”, and satisfy the relate task mainly due to the interlinked nature of 10https://djangoproject.com/ the resources published as LOD. O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey 13

Table 3 Tools support of datatypes 1 dimensional 2 dimensional 3 dimensional Multidimensional Temporal Hierarchical Network CODE *** LDVizWiz **** LOD/VizSuite ** LODVisualization ** Payola **** rdf:SynopsViz *** Sgvizler ****** VisualBox ***** VizBoard ***

As Table 4 shows, both history and extract tasks are exploratory data analyses greatly benefit if the almost ignored, but they are vital when analysts wish used tools support multiple data sources in the to share their experiments, letting others re-run the tri- same instance, without the need to open new tabs als, perform further studies on the displayed data or or explore external datasets from scratch. simply reproduce the visualization with a custom set – SPARQL querying: For those users desiring to of parameters. Providing a simple mechanism to track have a high level of control about the input data the applied algorithms and save the data in standard- for the analysis, the ability to design and tweak ised formats keeps alive the spirit of LOD principles: the SPARQL queries is an essential feature. How- as data is discoverable through the Internet, consulting ever, in those cases where the exploratory analy- the details of the data used in the visualization should sis begins writing a SPARQL query, lay users will be democratised also. not be able to use the tool. – Target users: As listed in Section 2.5, three main 4.3. Other features support user groups are envisaged as potential consumers of the tools: lay users [L], techies [T] and domain Although all the tools share the goal of visualising experts [E]. LOD, the means and features they provide in order to – Visualization customization: As every visual- achieve it significantly differ from one another. Next ization consumer has its own preferences, the some common features are detailed, specially those re- ability to customize the visual components of quired by certain users groups in order to fed their in- a representation is appreciated by a subset of formation analysis hunger. Feature compliance is sum- power users. The possibility to change the layout, marised in Table 5. colours and shapes open new ways to improve – Metadata exploration: The structure and inter- finding communication. These features are also esting features about a dataset can be consulted of high importance for people with visual handi- through the VoID description of the dataset (if caps such as Colour Vision Deficiencies (CVD), present) or using custom queries against the data. which affects people’s ability to distinguish cer- These high-level properties are usually presented tain colours, who would benefit from the avail- in tabular format during the overview stage of ability of correction features within the visualiza- exploratory analyses, and allows to compare the tion. dataset with similar data sources. Some tools up- – Visual ontology support: In order to reuse us- grade the metadata analyses adding statistical in- ability patterns and best practices in LOD visual- formation about the dataset contents [17,33], as ization, semantically describing the resulting vi- well as provenance and data quality metrics. sual representations is a must to encourage fur- – Multiple dataset usage: Data discovery and the ther improvements. Some ontologies are being “follow your nose” principle seem trivial when designed to describe visualization elements as vi- surfing the Internet or consuming videos from on- sual components, purpose and features of the im- line platforms, giving birth to serendipitous be- ages for the sake of reproducibility, as differ- haviours among users. Within the LOD context, ent developers can implement visualization tools 14 O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey

Table 4 Tools support of tasks Overview Zoom Filter Details on demand Relate History Extract CODE ***** LDVizWiz *** LOD/VizSuite *** LODVisualization * Payola *** rdf:SynopsViz *** Sgvizler ***** VisualBox ***** VizBoard ****

Table 5 Tools support of features Metadata Multiple SPARQL Target Vis. Vis. Vis. Vis. exploration datasets querying users customization sharing ontology recommendation CODE * * L, T, E * * * LDVizWiz L, E LOD/VizSuite L, E LODVisualization *L* Payola * * * L, T, E * rdf:SynopsViz *E* Sgvizler T, E VisualBox ***T* VizBoard L***

which take the visual representation’s character- 5. Conclusions istics as semantic-annotated inputs. – Visual recommendations: There are some cases In this paper a background analysis of the Informa- tion Visualization field is presented together with its where the output data suits different visual repre- adaptability to the Visualization topic within the Se- sentations, for example, how a government splits mantic Web community, specially for the information the annual budget between different departments published under the Linked Open Data principles. and ministries can be drawn both as a pie chart The motivation beneath this study is to exhibit the (circular segments being the portion of the total current approaches to LOD visualization on the Inter- budget) and as a column chart. The former depicts net. As users regularly consume Internet resources by and image well known for users, whereas the lat- means of a web browser, the survey is conducted over tools which are operable within these environments. ter gives the opportunity to compare amounts in a Moreover, the number of devices used to access the In- clearer manner (humans are not so good in com- ternet grows in range everyday: smartphones, tablets paring circular areas). and laptops are some of the most preferred gadgets, – Visualization sharing: Sometimes visualizations all of them having at least one web browser in com- are the outputs of community efforts to make mon. There are even some Operative Systems such 11 12 insights known to a wider audience. Publishing as Chrome OS , WebOS and those running under Smart TVs which are barely more than just a web these visualizations on the Internet, allowing con- browser with steroids. Thus betting on for solutions tributors to collaborate and share improvements built on top of them is a desirable feature among 11http://chromium.org/chromium-os institutions pushing towards Open Data policies. 12http://openwebosproject.org/ O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey 15 compliant with standard web technologies (platform References independent) and which do not require any installation in the users’ devices is largely encouraged. [1] Chrome experiments - WebGL. http://www. chromeexperiments.com/webgl/. Availability on multiple devices is a fundamental re- [2] Cross-origin resource sharing. http://www.w3.org/TR/ quirement for widespread adoption of semantic tech- cors/. nologies, but is not enough. The Semantic Web com- [3] DBpedia ontology classes. http://mappings. munity has worked for the last decade in implement- dbpedia.org/server/ontology/classes/. [4] ISO 8601. http://www.iso.org/iso/iso8601. ing the Internet envisaged by Tim Berners-Lee in 2001 [5] JSON-LD - JSON for linking data. http://json-ld. [18], and numerous benefits are developed by its mem- org/. bers. Still, mass adoption of the Semantic Web con- [6] Juice labs - chart chooser. http://labs. cept is not a reality, and will not be fully achieved juiceanalytics.com/chartchooser/index. html. unless the vast majority of Internet users enjoy the [7] Linked data - design issues. http://www.w3.org/ advantages it brings. Semantics provide descriptions DesignIssues/LinkedData.html. about resources, that is, machines are able to under- [8] RDF 1.1 JSON alternate serialization (RDF/JSON). http: stand what information they are working with, and the //www.w3.org/TR/rdf-json/. [9] The RDF data cube vocabulary. http://www.w3.org/ possibility to connect to external data sources and en- TR/vocab-data-cube/. rich the contents should result in the automatic gen- [10] RDF schema 1.1. http://www.w3.org/TR/2014/ eration of smart visualizations, meaning visual repre- REC-rdf-schema-20140225/. sentations which are better formed that what could be [11] RFC 3987: Internationalized resource identifiers (IRIs). http://www.ietf.org/rfc/rfc3987.txt. automatically generated (when possible) if no meta- [12] RDF 1.1 XML syntax, 2001. http://www.w3.org/TR/ information about the data was available. rdf-syntax-grammar/. On a similar fashion, application developers need to [13] FoaF explorer, 2002. http://xml.mfd-consult.dk/ appreciate the added value of getting knowledge from foaf/explorer/. [14] A. Abela. Chart suggestions - a thought starter. http: LOD sources over querying Application Programming //www.extremepresentation.com/uploads/ Interfaces (APIs), parsing documents, scraping web- documents/choosing_a_good_chart.pdf. sites or retrieving information from local databases. [15] K. Alexander and M. Hausenblas. Describing linked datasets - This issue can be further exploited in order to stimu- on the design and usage of VoID, the vocabulary of interlinked datasets. In In Linked Data on the Web Workshop (LDOW 09), late LOD principles’ adoption among data publishers, in conjunction with 18th International World Wide Web Con- creating LOD-driven markets worth using outside the ference (WWW 09, 2009. Semantic Web community practices. [16] G. A. Atemezing and R. Troncy. Towards a linked-data based Finally, well established protocols and the know- visualization wizard. 2014. [17] S. Auer, J. Demter, M. Martin, and J. Lehmann. LODStats – how provided by the long trajectory of InfoVis re- an extensible framework for high-performance dataset analyt- search must be used and disseminated through LOD ics. In A. t. Teije, J. Völker, S. Handschuh, H. Stuckenschmidt, visualization practitioners. Backing visualizations with M. d’Acquin, A. Nikolov, N. Aussenac-Gilles, and N. Her- robust vocabularies and procuring a semantic descrip- nandez, editors, Knowledge Engineering and Knowledge Man- agement, number 7603 in Lecture Notes in Computer Science, tion together with the visual representation allow ma- pages 353–362. Springer Berlin Heidelberg, Jan. 2012. chines not only to understand the data within the [18] T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. graphic, but the procedures and components used for Scientific american, 284(5):28–37, 2001. its construction. Nowadays, not many tools meta- [19] N. Bikakis, M. Skourla, and G. Papastefanatos. rdf:SynopsViz - a framework for hierarchical linked data visual exploration information about the visualization using semantic de- and analysis. arXiv:1408.3148 [cs], Aug. 2014. arXiv: scriptions, neither of the processes and applied tech- 1408.3148. niques to generate it. In order to foster data discover- [20] C. Bizer, T. Heath, and T. Berners-Lee. Linked data - the story ability and reusability, the collaboration and process- so far:. International Journal on Semantic Web and Informa- tion Systems, 5(3):1–22, 2009. description features are welcomed in LOD visual ex- [21] J. M. Brunetti, S. Auer, and R. García. The linked data vi- ploration tools. sualization model. In International Semantic Web Conference The authors of this article are expectant to see the (Posters & Demos), 2012. evolution of LOD visualization in the near future in its [22] J. M. Brunetti, S. Auer, R. García, J. Klímek, and M. Necaský.ˇ Formal linked data visualization model. In Proceedings path towards a true acceptance of the Semantic Web in of International Conference on Information Integration and the Internet’s DNA. Web-based Applications & Services, IIWAS ’13, pages 16 O. Peña, U. Aguilera & D. López-de-Ipiña / Linked Open Data Visualization Revisited: A Survey

309:309–309:318, New York, NY, USA, 2013. ACM. [39] B. Mutlu and P. Höfler. Suggesting visualisations for published [23] E. H. Chi. A taxonomy of visualization techniques using the data. 2014. data state reference model. In IEEE Symposium on Information [40] B. Mutlu, P. Höfler, V. Sabol, G. Tschinkel, and M. Gran- Visualization, 2000. InfoVis 2000, pages 69–75, 2000. itzer. Automated visualization support for linked research [24] R. Cyganiak and C. Bizer. Pubby-a linked data frontend for data. In S. Lohmann, editor, I-SEMANTICS (Posters & De- sparql endpoints. Retrieved fro m http://www4. wiwiss. fu- mos), volume 1026 of CEUR Workshop Proceedings, pages berlin. de/pubby/at May, 28:2011, 2008. 40–44. CEUR-WS.org, 2013. [25] A.-S. Dadzie and M. Rowe. Approaches to visualising linked [41] A. G. Nuzzolese, V. Presutti, A. Gangemi, A. Musetti, and data: A survey. Semantic Web, 2(2):89–124, Jan. 2011. P. Ciancarini. Aemoo: Exploring knowledge on the web. In [26] L. De Vocht, A. Dimou, J. Breuer, M. Van Compernolle, Proceedings of the 5th Annual ACM Web Science Confer- R. Verborgh, E. Mannens, P. Mechant, and R. Van de Walle. A ence, WebSci ’13, pages 272–275, New York, NY, USA, 2013. visual exploration workflow as enabler for the exploitation of ACM. linked open data. In Proceedings of the 3rd Workshop Intelli- [42] E. Oren, R. Delbru, and S. Decker. Extending faceted nav- gent Exploration of Semantic Data, 2014. igation for RDF data. In I. Cruz, S. Decker, D. Allemang, [27] L. De Vocht, S. Softic, E. Mannens, R. Van de Walle, C. Preist, D. Schwabe, P. Mika, M. Uschold, and L. M. Aroyo, and M. Ebner. ResXplorer: Interactive search for re- editors, The Semantic Web - ISWC 2006, number 4273 in lationships in research repositories. In Semantic Web Lecture Notes in Computer Science, pages 559–572. Springer Challenge, part of the 12th International Semantic Web Berlin Heidelberg, Jan. 2006. Conference Available at http://challenge. semanticweb. [43] D. L. Parnas, J. E. Shore, and D. Weiss. Abstract types defined org/2013/submissions/swc2013_submission_14. pdf, 2013. as classes of variables. In Proceedings of the 1976 Conference [28] A. Graves. Creation of visualizations based on linked data. In on Data : Abstraction, Definition and Structure, pages 149– Proceedings of the 3rd International Conference on Web Intel- 154, New York, NY, USA, 1976. ACM. ligence, Mining and Semantics, WIMS ’13, pages 41:1–41:12, [44] S. Pietschmann, M. Voigt, A. Rümpel, and K. Meißner. New York, NY, USA, 2013. ACM. CRUISe: Composition of rich user interface services. In [29] J. Heer, M. Bostock, and V. Ogievetsky. A tour through the M. Gaedke, M. Grossniklaus, and O. Díaz, editors, Web Engi- visualization zoo. Communications of the ACM, 53(6):59, June neering, number 5648 in Lecture Notes in Computer Science, 2010. pages 473–476. Springer Berlin Heidelberg, Jan. 2009. [30] B. Johnson and B. Shneiderman. Tree-maps: A space-filling [45] J. Polowinski and M. Voigt. VISO: A shared, formal knowl- approach to the visualization of hierarchical information struc- edge base as a foundation for semi-automatic infovis systems. tures. In Proceedings of the 2Nd Conference on Visualization In CHI ’13 Extended Abstracts on Human Factors in Comput- ’91, VIS ’91, pages 284–291, Los Alamitos, CA, USA, 1991. ing Systems, CHI EA ’13, pages 1791–1796, New York, NY, IEEE Computer Society Press. USA, 2013. ACM. [31] J. Klímek, J. Helmich, and M. Necaský.ˇ Application of the [46] B. Shneiderman. The eyes have it: a task by data type taxon- linked data visualization model on real world data from the omy for information visualizations. In , IEEE Symposium on czech LOD cloud. 2014. Visual Languages, 1996. Proceedings, pages 336–343, Sept. [32] M. Krzywinski, I. Birol, S. J. Jones, and M. A. Marra. Hive 1996. plots—rational approach to visualizing networks. Briefings in [47] M. G. Skj\a eveland. Sgvizler: A javascript wrapper for easy Bioinformatics, 13(5):627–644, Sept. 2012. visualization of sparql result sets. In Extended Semantic Web [33] A. Langegger and W. Woss. Rdfstats-an extensible rdf statis- Conference, 2012. tics generator and library. In Database and Expert Systems [48] C. Stadler, J. Lehmann, K. Höffner, and S. Auer. LinkedGeo- Application, 2009. DEXA’09. 20th International Workshop on, Data: A core for a web of spatial open data. Semantic Web, pages 79–83. IEEE, 2009. 3(4):333–354, Jan. 2012. [34] A. d. Leon, F. Wisniewki, B. Villazón-Terrazas, and O. Corcho. [49] J. W. Tukey. Exploratory Data Analysis. Pearson, Reading, Map4rdf - faceted browser for geospatial datasets. In PMOD Mass, 1 edition edition, 1977. workshop. Facultad de Informática (UPM), 2012. [50] M. Voigt, M. Franke, and K. Mei\s sner. Capturing and reusing [35] S. Madhu Sudhan and J. Chandra. IBA graph selector algo- empirical visualization knowledge. In UMAP Workshops. Cite- rithm for big data visualization using defence dataset. Interna- seer, 2013. tional Journal of Scientific & Engineering Research, 4(3), Mar. [51] M. Voigt, M. Franke, and K. Meißner. Using expert and em- 2013. pirical knowledge for context-aware recommendation of visu- [36] G. Marchionini. Exploratory search: From finding to under- alization components. International Journal On Advances in standing. Commun. ACM, 49(4):41–46, Apr. 2006. Life Sciences, 5(1 and 2):27–41, June 2013. [37] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and [52] C. Ware. Information Visualization, Third Edition: Perception B. Bhattacharjee. Measurement and analysis of online social for Design. Morgan Kaufmann, Waltham, MA, 3 edition edi- networks. In Proceedings of the 7th ACM SIGCOMM Con- tion, June 2012. ference on Internet Measurement, IMC ’07, pages 29–42, New [53] L. Yu. Follow your nose: A basic semantic web agent. In A De- York, NY, USA, 2007. ACM. veloper’s Guide to the Semantic Web, pages 533–557. Springer [38] J. W. Moon and L. Moser. On cliques in graphs. Israel Journal Berlin Heidelberg, Jan. 2011. of Mathematics, 3(1):23–28, Mar. 1965.