Undefined 0 (2010) 1–0 1 IOS Press

Lookup, Explore, Discover: how DBpedia can improve your Web search

Roberto Mirizzi a, Azzurra Ragone a Tommaso Di Noia a and Eugenio Di Sciascio a a Politecnico di Bari, via Re David 200, 7015, Bari – Italy E-mail: [email protected], [email protected], [email protected], [email protected]

Abstract. The power of search is with no doubt one of the main aspects for the success of the Web. Differently from traditional searches in a library, search engines for digital libraries (as the Web is) allow to return results that are considerably effective. Nevertheless, current search engines are no more able to really help the user in all those tasks that go under the umbrella of exploratory search, where the user needs not only to perform a lookup operation but also to discover, understand and learn novel contents on complex topics while searching. Similar problems arise also with search exploiting classical keyword-based tagging techniques. In the paper we present LED, a web based system that aims to improve Web search by enabling users to properly explore knowledge associated to a query/keyword. We rely on DBpedia to explore the of keywords and to guide the user during the exploration of search results.

1. Introduction happens when the user has not a clear idea of what she is looking for and (possibly) she needs some Thanks to the social paradigm of the Web 2.0 suggestions in the exploration process. The above new tools and technologies emerged to allow the mentioned tools enrich the page of search results user to interact with on-line resources. By now, we from classical search engines, e.g., Google, with a are very familiar with the notion of annotating a tag cloud suggesting relevant keywords, so that the web page as well as of tagging system, tag cloud, user may expand her query by adding new terms tag browsing just to cite a few. Surfing the wave of just clicking on tags in the cloud. the social Web, several web search tools have been From the users perspective, a very interesting developed to help users to navigate through search aspect related to the generation of the tag cloud results in a fast, simpler and efficient way combin- in the page of search results is the combination ing traditional search facilities with the tag cloud of lookup search with exploratory browsing. As exploration. Such tools are usually available as pointed out by Marchionini in [8] there are mainly plug-ins (add-ons) for the most common browsers two different ways of search strategies: lookup and and the most relevant are currently: DeeperWeb1, exploratory search. The presentation of contex- Search Cloudlet 2, SenseBot - Search Results Sum- tualized tag clouds combined with search results marizer 3. The main aim of these plug-ins is to makes possible to combine the two search strate- guide users in refining queries with terms which gies. As a matter of fact, after a lookup in a search are somehow related with the ones they are look- engine the user may explore the results space. ing thus exploring the results space. This usually Basically, all the above cited tools exploit text retrieval techniques to suggest new tags looking for relevant keywords in the snippets of search en- 1http://www.deeperweb.com 2http://www.getcloudlet.com gines results. Terms in the tag cloud are the most 3http://www.sensebot.net frequent ones in the result snippets. None of these

0000-0000/10/$00.00 c 2010 – IOS Press and the authors. All rights reserved 2R. Mirizzi, A. Ragone, T. Di Noia, E. Di Sciascio / Lookup, Explore, Discover: how DBpedia can improve your Web search tools take into account the meaning/semantics as- – a hybrid approach to rank resources in a RDF sociated to tags. In this paper we present LED dataset (such as DBpedia): our system ex- (acronym of Lookup Explore Discover), a tool for ploits both the graph nature of a RDF dataset exploratory search exploiting the knowledge en- and text-based techniques used in IR; coded in the Web of data for the tag cloud gener- – a RESTful JSON API that can be accessed ation. Although LED shares with the above men- from any environment that allows HTTP re- tioned tools all the main functionalities such as quest, useful both for building innovative web add/remove tags in order to refine the query, the applications and for comparison with different way it generates the tag cloud is completely dif- algorithms. ferent: The remainder of this paper is structured as fol- – tags are semantically related to the query and lows: in Section 2 we present in detail the LED in- not to the (partial) content of search results; terface. Section 3 is focused on the back-end of the – each generated tag is associated to a RDF re- system. Section 4 shows how LED backend can be source coming from the Web of Data (in our used as a semantic tagging system. In Section 5 we case we mainly rely on DBpedia [2]) with its present the LED RESTful API. A discussion and a associate semantics; comparison of related work is done in Section 6. – tags are semantically related with each other; Conclusion and future work close the paper. Therefore, while for the other tools the main aim is to give to the user a “visual summary” of the 2. LED: Lookup Explore Discover search results, in LED the terms included in the tag cloud are the most relevant w.r.t. the typed In this section we introduce LED, a web-based keyword(s), not the most relevant w.r.t. the re- tool that helps users in query refinement on trieved documents. Also for LED, the main pur- search engines and exploratory knowledge search pose is to suggest keywords similar to the one(s) in DBpedia. The front-end of the system is avail- representing the query in order to allow users able at http://sisinflab.poliba.it/led/ and to refine it and narrow down the search results currently is focused on search in the IT domain. adding/removing new tags from the cloud. In LED In a nutshell, the web-interface is composed by the different dimension of tags in the cloud does three different areas. The first one (marked with not reflect the number of occurrences of the key- (1) in Fig. 1) is mainly a keyword lookup: here the in the search results but the semantic simi- user types in some characters to obtain an auto- larity of each tags w.r.t. the searched keyword(s). complete list of related keywords. The second area LED wants to favour a serendipitous approach [6], (marked with (2)) in Fig. 1 consists of a tag cloud that is the fortuitous discovery of something which (or a set of tag clouds) populated with automati- is not already included in the search results and cally generated tags that are semantically related that the user did not know to exist, but that could with the keyword entered by the user. Each of be as well of interest for her. them corresponds to a DBpedia resource. In fact, The innovative aspects of LED are: the displayed tag is the value of the rdf:label of the corresponding resource. The user may also – a novel approach to exploratory browsing that search for multiple keywords thanks to a tab-based exploits the : in this first ver- structure of the area devoted to search results. As sion of the tool we focus on DBpedia and shown in Fig. 1, one tab for each searched keyword present an intuitive way to explore this knowl- is open (RDFa and , respectively) and edge base; one with all the keywords (RDFa Microformat). – a web application that help users to refine The tag cloud allows the user to discover new queries for search engines with the aid of se- knowledge that is hidden in a traditional search mantically generated suggestions; engine. Thanks to LED it is also possible to explore – a meta-search engine and news collector that knowledge, browsing through suggested tags, and integrates results coming from different exter- refine the query by adding more specific keywords. nal sources in a single platform; Finally, the third areas of the interface (marked R. Mirizzi, A. Ragone, T. Di Noia, E. Di Sciascio / Lookup, Explore, Discover: how DBpedia can improve your Web search3

Fig. 1. The LED interface, available online at http://sisinflab.poliba.it/led/. with (3) in Fig. 1) acts as a meta-search engine: [8]. Therefore, using additional external knowledge it displays the results coming from the three most to refine the queries can be a good way to narrow popular search engines (i.e., Google, Yahoo! and down the search results and improve the user ex- Bing), the and microblogging ser- perience, maximizing the number of possibly rele- vice Twitter, the news aggregator Google News, vant retrieved objects. In [11] the authors describe Flickr and Google Images, according to the query the main ideas behind a simple lookup search on a formulated in the first section and refined in the traditional search engine and a deeper (semantic- second section. guided) exploratory search: a search engine allows to find what you are looking for and already know, 2.1. Lookup while LED (and more generally exploratory search activities) allows to explore and discover what you The entry point to LED is a text input field. Sim- probably did not know to exist. ilarly to classical web search engines, through this When the user starts to type some charac- field the user formulates her query as a set of key- ters in the text search field, the system returns words. a list of DBpedia resources that contain the en- Various studies estimate the average length of a tered string either in their rdf:label or in their search query between 2.4 and 2.7 words [4]. Nev- -owl:abstract. This list is populated by ertheless a query made just of two or three key- querying the DBpedia URI lookup web service4. words can convey only a small amount of infor- For example, if the Java is typed in, the sys- mation. Keeping in mind this assumption, search tem allows you to specify if you are referring ei- engines are tuned toward maximizing precision in the first results (i.e., minimizing the number of possibly irrelevant documents that are retrieved) 4http://lookup.dbpedia.org/api/search.asmx 4R. Mirizzi, A. Ragone, T. Di Noia, E. Di Sciascio / Lookup, Explore, Discover: how DBpedia can improve your Web search ther to the programming language or to the island new knowledge, starting from an initial idea of the (and other possible choices in addition). user. For example, suppose you want to learn more The user can also ignore the suggestions and about microformats5 and related technologies. In look for something that simply does not exist in a traditional lookup search, you would enter the DBpedia. This is allowed by LED: in this case the keyword microformat on a search engine to obtain system will act as a standard meta-search en- a list of documents containing the keyword. Then, gine/aggregator (see Section 2.3). you should navigate through results by selection, navigation and trial-and-error tactics [8] to learn 2.2. Exploratory browsing new information. On the contrary, if you enter microformat in LED, you get information about The most interesting part of LED interface is the what microformat is, and its top-related technolo- tag cloud (marked with (2) in Fig. 1). When the gies (with their description). As you can see from user selects a keyword (i.e., a DBpedia resource) Fig. 1, the suggested keywords are highly relevant from the autocomplete list, the system populates w.r.t. the selected one: in the cloud there is for a cloud containing tags that are semantically re- example HTML, i.e., the markup language used to lated to it and the size of each tag reflects its se- represent in web pages, RDFa and mantic relevance w.r.t. the selected keyword (see GRDDL, that are the main alternatives to mi- Section 3). If a keyword does not exist in DBpedia, croformats to semantically annotate documents. the can not be used to discover As you can see from this example, it is possible and explore new knowledge, so in this case the tag to explore new knowledge with LED, moving from cloud is empty. In order to overcome this issue, we searching to learning. are currently working integrating resources com- Furthermore, LED allows the user to refine her ing from more datasets in the cloud query adding new tags in two different ways, either (e.g., LinkedMDB for movies or DBLP for academic dragging and dropping them in the text input field papers) as well as to exploit user behavior to com- or clicking on the plus icon near the tag. Indeed, pute related concepts. moving the mouse over a keyword in the cloud, When the user moves the mouse pointer over a a small icon representing a plus sign (+) next to keyword in the tag cloud, the dbpedia-owl:abstract the keyword is shown (see (a) in Fig. 1). Click- of the corresponding DBpedia resource is shown ing on it the new keyword is added to the original in a tooltip. The abstract allows the user to bet- query. Once a new tag is added from the cloud, ter understand the meaning of the tag. Since the the tab corresponding to this new keyword is pop- user reads the description of a tag directly into ulated with its semantically related tags. More- , there is no need to leave the application to LED over a new tab is created, containing the most rel- obtain info about the specific tag. This could re- evant tags w.r.t. all the concepts composing the duce trial-and-error tactics in browsing strategies, new query (see (b) in Fig. 1). The relevance value allowing to achieve the goal faster. By clicking of each tag in this new tab is calculated as the on a tag in the cloud, the selected keyword be- sum of the relevance values of the tags belonging comes the new query and the associated results of to each separate keyword in the query. The third the meta-search engine are updated in the meta- column of Table 1 shows the top results according search area, as indicated with (3) in Fig. 1. Once to a query composed by RDFa and Microformat. the user clicks on a tag, the corresponding cloud In the first column we see that the semantic rele- is created in a new tab. Thanks to this feature vance of the pair hRDF a, Semantic W ebi is eval- the user can navigate the knowledge base in an uated as 0.89 while in the second one the relevance intuitive way. Browsing among related tags helps of hMicroformat, Semantic W ebi is 0.87. Hence, the user to obtain a clearer idea about her query. the semantic relevance of hRDF a Microformat, Semantic W ebi When the user is looking for something on a search engine, it may happen that she has just a vague is computed as 0.87 + 0.89. The query refinement idea about how to express her query, and she does is applied also to results from meta-search engine. not know exactly what she is looking for. In this Thanks to this feature the user can be more pro- case LED is more powerful than a traditional search engine, because it allows to explore and discover 5http://microformats.org/about R. Mirizzi, A. Ragone, T. Di Noia, E. Di Sciascio / Lookup, Explore, Discover: how DBpedia can improve your Web search5 ductive, saving time on her search activities6. The sources in order to compute a similarity value for user can also exclude a previously added keyword each pair of resources reached during the explo- from the query simply clicking on the × icon that ration. The RDF graph browsing, and the conse- appears next to a tag that has already been added quent ranking of resources, is performed offline to the query. and, at the end, the result is a weighted graph where the nodes are DBpedia resources and the 2.3. The meta-search engine weights represent the similarity value between any pair of nodes. The graph so obtained will then The lower part of the LED interface displays be used at runtime by LED: (i) to suggest similar results about the query formulated by the user. keywords to users for knowledge exploration and LED acts as a meta-search engine and news ag- query refinement, and (ii) in the document selec- gregator, sending user requests to external sources tion phase, to display documents semantically re- and aggregating them. The idea behind this ac- lated to the query (keywords) posed to the search tivity is that more comprehensive search results engines (see Section 2). can be obtained by combining the results from The similarity value between any pair of re- several sources as Google, Yahoo, Bing, Twitter, sources in the DBpedia graph is computed query- Flickr 7. The first three sources are used to retrieve ing external information sources (search engines a ranked list of documents indexed by search en- and social bookmarking systems) and exploits tex- gines, allowing the user to immediately look at the tual and in DBpedia. For each pair of differences among the results provided by the dif- nodes in the explored graph, we perform a query ferent search engines. Each result set is displayed to each external information source: we search for in a separate tab. Another tab is dedicated to the number of returned web pages containing the show the most recent contents coming from Twit- labels of each nodes individually and then for the ter, the popular microblogging platform. Recent two labels together. Moreover, we look at, respec- news from Google News related to the query for- tively, abstracts in Wikipedia and wikilinks, i.e., mulated by the user are displayed in a separate links between Wikipedia pages. Specifically, given tab. The last tab is dedicated to relevant images two nodes uri1 and uri2, we check if the label of from Google Images and Flickr. node uri1 is contained in the abstract of node uri2, When the user specifies her query, the external and vice versa. The main assumption behind this sources are invoked through their REST APIs. check is that if a DBpedia resource name appears in the abstract of another DBpedia resource it is reasonable to think that the two resources are re- 3. DBpediaRanker: RDF Ranking in DBpedia lated with each other. For the same reason, we also check if the Wikipedia page of resource uri1 has a In this section we will briefly describe our hy- link to the Wikipedia page of resource uri2, and brid ranking algorithm DBpediaRanker, used to vice versa. The architecture of DBpediaRanker is rank resources in DBpedia w.r.t. a given keyword. depicted in Fig. 2. For a more comprehensive description of DBpedi- aRanker and for results of user evaluation the in- terested reader may refer to [10,9]. 4. LED as a service: Not Only Tag (NOT) In a nutshell, DBpediaRanker explores the DBpedia graph and queries external information In this section we show how LED, with slight modifications of its interface and without changes 6Just for lovers of statistics: if only 1% of the World’s in the back-end system, can become a semantic population used LED, and it helped to save each user 10 min- tagging system. The tool grown out of LED is utes per months on their searches, this would save globally NOT, which stands for Not Only Tag (http:// over six billion of working hours per year (this phrase was sisinflab.poliba.it/not-only-tag). NOT (see inspired by http://www.getcloudlet.com/swm.php?page= Fig. 3) is the core of a semantic social tagging donate). 7Exploiting the available APIs exposed by content system for an effective and efficient annotation providers, we are currently expanding the set of information and retrieval of any type of Web resources. It can sources. be used to recommend similar tags to users in 6R. Mirizzi, A. Ragone, T. Di Noia, E. Di Sciascio / Lookup, Explore, Discover: how DBpedia can improve your Web search

# RDFa Microformat RDFa Microformat 1. GRDDL 0.95 HTML 0.94 GRDDL 1.86 2. Resource Description Framework 0.93 Cascading Style Sheet 0.91 HTML 1.85 3. XHTML 0.91 GRDDL 0.91 Semantic Web 1.76 4. HTML 0.91 0.90 Resource Description Framework 0.93 5. Embedded RDF 0.90 Web Technology 0.90 XHTML 0.91 6. Microformat 0.90 RDFa 0.90 Cascading Style Sheet 0.91 7. Semantic Web Services 0.89 XMDP 0.89 Embedded RDF 0.90 8. Semantic Web 0.89 0.88 Web development 0.90 9. XML 0.87 0.88 Web technology 0.90 10. Ontology 0.86 Semantic Web 0.87 Semantic Web Services 0.89 Table 1 The ten most relevant tags for RDFa, Microformat and RDFa Microformat, respectively.

A lot of web sites use tags to annotate and re- trieve content, just to cite a few, ProgrammableWeb8 to tag APIs and mashups; Amazon9 for books, movies, music and games; Flickr 10 for pictures, Delicious11 to store and retrieve favorite book- marks, Youtube12 for videos, and many more. NOT could be exposed as a service (see Section 5) and integrated in one or more of these web sites, in order to make more efficient the tagging process. Indeed, in tagging we have two main issues: – in the annotation phase: tagging is a time- consuming task. In order to make a resource easily retrievable the user should use different tags to match all the possible user queries; – in the retrieval phase: an annotated resource Fig. 2. The functional architecture of the whole system is retrieved only if there is an exact match be- composed by the ranking back-end system DBpediaRanker tween the string representing the annotation and the LED interface. and the searched tag. As a way of example, we can refer to one of the several books tagged on Amazon, e.g., a book about semantic web technologies13, whose tags are: ontology, rdf, semantics, semanticweb, seman- tic web, semantic web, owl. Looking at the tags it is clearly visible the misuse of tags, since the same tag (same meaning) is written three times in three different forms: semantic web, semantic web and semanticweb in order to catch all the possible

8www.programmableweb.com/ Fig. 3. The NOT interface, available online at 9www.amazon.com/ http://sisinflab.poliba.it/not-only-tag/. 10www.flickr.com/ 11www.delicious.com/ 12www.youtube.com 13 the annotation phase in order to make possible In this particular case we refer to the book by John Davies, Rudi Studer, and Paul Warren: Semantic Web a semantic-based retrieval of the resources them- Technologies: Trends and Research in Ontology-based Sys- selves. tems, accessed on Amazon the 26th of July, 2010. R. Mirizzi, A. Ragone, T. Di Noia, E. Di Sciascio / Lookup, Explore, Discover: how DBpedia can improve your Web search7 searches. Then there are more “specific” tags, as (indicated by (3) in Fig. 3). Once the user se- rdf, ontology and owl; furthermore, you can notice lects a tag, the system automatically enriches this that if a book is about rdf, owl and ontology area with concepts related to the dropped tag. it should be automatically inferred by the system For example, in the case of Web Ontology Lan- (without the need to be explicitly specified by the guage, if the user drags and drops in the tag-bag user) that is also about semantic web. and Resource Descrip- Moreover, an effective system should be able to tion Framework (the last is one of the suggested return also approximate results w.r.t. the user’s tags in the tag cloud), the system will automati- query, results ranked based on the similarity of cally add their most related concepts, i.e., Knowl- used tags to the user’s request. Referring to the edge, XML, Ontology, Semantic Web and so on. previous example, it means that a book tagged These new keywords/resources are the most simi- only with owl should be retrieved and suggested lar (best ranked) Wikipedia Categories w.r.t. the as relevant when you are looking for books about resource(s) dropped in the tag-bag, in this case e.g., semantic web, because of its similarity with Web Ontology Language and Resource Description the query, even if the exact searched tag is not Framework. The tags appearing in the personal present in its description. tag-bag area are sized according to their relevance. Another issue strictly coupled with the keyword- Moreover, all the tags automatically added by the based nature of current tagging systems on the system are colored in orange, to distinguish them web is synonymy. Different tags having the same from the ones explicitly added by the user (colored meaning can be used to annotate the same con- in red), and can be deleted if the user do not want tent. to tag a resource with one of them, e.g., she can Here we show how NOT can help the user to select appropriate tags in the annotation phase. delete Data or Data management. Thanks to the The interaction with the system is very simple RDF nature of DBpedia, categories in the tag-bag and intuitive. Keeping the example of the book in (indicated by (3) in Fig. 3) can be easily retrieved Amazon, suppose the user wants to tag the same via nested SPARQL queries. In DBpedia, for each book we referred in the previous example using URI representing Wikipedia category there is a NOT. She will start to type some characters (i.e., RDF triple having the URI as subject, rdf:type as “ontology lan”) in the text input area (marked as property and skos:Concept as object. As an ex- (1) in Fig. 3) and the system will suggest a list ample for the Wikipedia category Semantic Web, of DBpedia URIs whose labels or abstracts con- in DBpedia we have: tain the typed string, in this case: Web Ontol- ogy Language, Ontology language, Ontology (infor- dbpres:Category:Semantic_Web rdf:type skos:Concept mation science), Multimedia Web Ontology lan- guage. At this point, the user may select one of Actually, in order to expand the (semantic) the suggested items. Again, we stress here that keywords in the tag-bag, NOT also exploits the the system does not suggest just a keyword but skos:broader and skos:subject properties within a DBpedia resource identified by a unique URI. DBpedia. These two properties are used to repre- Let us suppose that the choice is for the tag Web sent an ontological classification among Wikipedia Ontology Language, which corresponds to the URI categories: http://dbpedia.org/resource/Web Ontology Language. The system populates a tag cloud (see (2) dbpres:Web_Ontology_Language skos:subject dbpres:Category:Semantic_Web in Fig. 3), where the size of the tags reflects their relative relevance with respect to Web On- dbpres:Category:Semantic_Web skos:broader tology Language in this case. We may see that dbpres:Category:Knowledge_representation the biggest tags are Ontology, , DAML+OIL, XML and Semantic Web Rule Lan- The SPARQL query used to compute the ex- guage. panded cloud related to a given resource is recur- The user can select suggested tags she consid- sively repeated for all the related categories. For ers relevant for the book she is tagging by a drag- example, for the Web Ontology and-drop operation of the tag in her tag-bag area the first query is: 8R. Mirizzi, A. Ragone, T. Di Noia, E. Di Sciascio / Lookup, Explore, Discover: how DBpedia can improve your Web search

SELECT DISTINCT ?hasValue Table 2 shows the details of the request param- WHERE { eters. dbpres:Web_Ontology_Language ?p ?hasValue. A sample response for the method get/tags is ?hasValue a skos:Concept 15 FILTER( isIRI(?hasValue) ). the following : FILTER(?p = skos:subject || ?p = skos:broader) callbackTags({ } "query":"", "results":[ { "label":"PHP", 5. API "uri":"http:\/\/dbpedia.org\/resource\/PHP>" }, { The algorithms behind the tag cloud genera- "label":"PhpBB", "uri":"http:\/\/dbpedia.org\/resource\/PhpBB>" tion discussed in Section 2 and Section 4and are }, also publicly available as a RESTful web service { "label":"PEAR", [12]. The web API is implemented over HTTP "uri":"http:\/\/dbpedia.org\/resource\/PEAR>" }, and is based on the principles of REST architec- { ture. Our aim is to allow innovative web applica- "label":"PhpMyAdmin", "uri":"http:\/\/dbpedia.org\/resource\/PhpMyAdmin>" tions to exploit the results we obtained to built }, smarter tools on the top of our API. Moreover, be- ], "total":10 ing available on the , it can be used by any- }) one for comparison with other metrics for different purposes, such as tag cloud enrichment, ranking Similarly, a sample response for the method of RDF graphs (in particular DBpedia), contextual get/related-tags is16: tagging. callbackRelatedTags({ The LED API consists of two methods on a "records":{"system":42,"users":0}, REST endpoint. To perform an action using the "maxRelevance":4.1037796302354, "main":{ LED API, a request to the endpoint has to be sent "uri":"", specifying a method and its arguments. Currently "label":"RDFa", "description":"RDFa (or Resource Description Framework the response is formatted in JSON, a machine- - in - attributes) is a ...", readable data-interchange format that makes eas- "relevance":1.8909135979731, "normalizedRelevance":5 ier the construction of applications in JavaScript. }, With the LED API it is also possible to bypass the "system":[ { same origin policy requesting the response to be "uri":"", formatted in JSONP14. "label":"GRDDL", "description":"GRDDL (pronounced ’griddle’) is a The callable methods are: markup format ...", "relevance":4.0825179030441, – get/tags - fetch all tags related to a gen- "normalizedRelevance":10 }, eral keyword. This method returns a list of { DBpedia resources that contain the searched "uri":"", "label":"Semantic Web", keyword either in their rdfs:label or in "description":"", theirs dbpedia-owl:abstract. "relevance":2.9482104095809, "normalizedRelevance":7 – get/related-tags - fetch all tags related to the }, selected one. This method returns a list of ... ] DBpedia resources that are related to a given }) resource. The user can get either any resource that is somehow related to the given one or The value of relevance represents the similarity only the resources that are ancestors of the between two resources: the selected one and the given one, i.e., resources that are more general than the selected one. These last resources 15http://sisinflab.poliba.it/led/api/get/tags. correspond to Wikipedia Categories. php?q=php&limit=4&callback=callbackTags 16http://sisinflab.poliba.it/led/api/get/ related-tags?q= 14http://en.wikipedia.org/wiki/JSON#JSONP &callback=callbackRelatedTags R. Mirizzi, A. Ragone, T. Di Noia, E. Di Sciascio / Lookup, Explore, Discover: how DBpedia can improve your Web search9

Method: get/tags Parameter Value Description q string (required) The query to search for. It can be a whole phrase, a keyword or just a part of it. limit integer: default 10 The maximum number of results to return. If not specified it is set to 10. string The name of the callback function to wrap around the JSON data. callback If not specified, the response is plain JSON. Method: get/related-tags Parameter Value Description string The uri of the DBpedia resource to explore. q array of strings If an array of strings is passed, the method returns the most relevant resources (required) w.r.t. all the ones specified in the array (see Table 1 for details). If set to 1, it returns all the related resources, even if they have the same label, showAll integer: default 0 e.g., dbpres:PHP and dbpres:Category:PHP have the same label even if they refer to different DBpedia resources. integer: default 0 If set to 1, it returns a list of the ancestors of the selected resource, parents i.e., DBpedia resources that are more general than the selected one. string The name of the callback function to wrap around the JSON data. callback If not specified, the response is plain JSON. Table 2 Request parameters for the REST queries. suggested one. For example, from the above code, sources in a RDF graph. In [18] the notion of inter- the pair hRDF a, GRDDLi has a relevance score of active semantics is introduced as “an open, self- 4.082. The normalizedRelevance value represents organized and evolving social interactive system the relevance value normalized to the max of the and its semantic image”. From this perspective our returned result set. It is comprised within a range choice of relying on the Linked Data cloud and in of [0,10] and varies at discrete steps. This value is particular on DBpedia well fits with the vision pro- used in LED and in NOT to scale the size of the tags posed by the author. Similarly to LED, in [14,15] an in the generated cloud. approach for query refinement is presented, how- ever, differently from LED, a pure text-based is ex- ploited to enrich Web searching with exploration 6. Related Work services. Peng et al. [5] conduct a study about on how to enhance existing web search paradigms The existence of semantic and re- with tags. In particular they use tags in Delicious lated ontologies in the Web of Data allows to de- to improve Google search function: tags are dis- velop new tools and approaches for data explo- tinguished in search keywords, useful for query re- ration and . Nowadays, RDF datasets finement and exploration keywords, useful for ex- with their formalized knowledge are not only de- ploratory purposes. However, in this case as well, veloped for machine-to-machine interaction but no semantic information is exploited. also for knowledge discovery and navigation. The On the side of collaborative tagging systems whole knowledge available to machines could be (CTS), different approaches have been proposed, exploited to help the user in her searching activi- most of them suggest tags based on their popular- ties, easing the learn process. One of the main is- ity and rely on collaborative filtering algorithms. sues to be faced is how to manipulate these huge In [7] the authors propose an algorithm for tag repositories of information in a “overview first, recommendation in folkosonomies based on graph zoom and filter, then details-on-demand” fashion exploration PageRank-like. Song Yang et al. [16] as pointed out in [13]. In [17] the author proposes adopt a machine learning approach to tag rec- Semantic Link Network (SLN ), a semantic data ommendation: they propose a document-centered model to semantically link resources and derive recommendation opposed to a user-centered one. implicit semantic links according to a set of rela- However all these approaches do not take into ac- tional reasoning rules. Inspired by this model, we count semantics, that is the fact that tags trans- adopt a different approach to elicit hidden knowl- port a meaning and are more than simple key- edge and infer new meaningful links between re- words as they are symbols for concepts. 10R. Mirizzi, A. Ragone, T. Di Noia, E. Di Sciascio / Lookup, Explore, Discover: how DBpedia can improve your Web search

Among semantic-based approaches, there is We also introduced NOT and describe how it can Faviki17, which shares some commonalities with be used as the core module for a semantic-based our tool NOT. Faviki is a tool for social book- tagging system. Thanks to NOT we show how an marking that helps users to tag documents using approach based on the semantics of tags may help DBpedia [2] terms extracted from Wikipedia. Al- the user both in the annotation phase and in the though it is a good starting point to cope with retrieval one. synonymy, it does not solve the problem of ranking The back-end system of both LED and NOT is tags w.r.t. a query. Moreover it does not provide made also available as a RESTful web service. the user with any suggestion of tags related to the Currently, we are mainly investigating how to ones selected during the annotation phase, e.g., if deal with cases where a single knowledge base such the user tags a resource with owl, the tool does as DBpedia is not sufficient to cover the informa- not suggest to tag the page with rdf too. ConTag tive demand. We intend to combine a content- [1] is a semantic tagging recommendation system based recommendation (based also on other avail- which relies on the PIMO ontology, an ad-hoc on- able datasets) and a collaborative-filtering ap- tology built to link tags to concepts. ConTag, as proach, in order to integrate the two stages to our tools, attaches a unique URI to tags, thus al- obtain more accurate and robust results. At the lows to cope with problems typical of current tag- present moment, since LED has just been released, ging systems as , homonyms, acronyms we do not have an active community using it, and different spelling. However, differently from so LED concepts recommendation is only content- our approach the recommendation consists in vi- based. Nevertheless, the interface is ready to sug- sualizing an alignment scheme instead of most re- gest keywords exploiting related searches from lated tags as in our case. TagClusters [3] is a tool other users (as indicated with (c) in Fig. 1) ex- for tag recommendation and tag-based retrieval: ploiting a collaborative filtering approach. We are tags are firstly clustered and then the system sug- also interested in evaluating the usability of the gests tags to users in the same clusters. After de- proposed new interface for exploratory browsing riving the hierarchical structure of tags, the se- with the support of the HCI community, in order mantic similarity is calculated based on the co- to contribute to the design to the future search occurrence. However, the evaluation of our system interfaces for the web. performed in [9] has shown that the co-occurence formula could not be a good choice, e.g., when References a resource is extremely more popular than the other. [1] B. Adrian, L. Sauermann, and T. Roth-berghofer. Contag: A semantic tag recommendation system. In Proceedings of ISemantics 07, pages 297–304. JUCS, 7. Conclusion and future work 2007. [2] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak, and S. Hellmann. Dbpedia - In this paper we presented LED, a system a crystallization point for the web of data. Web Se- for exploratory search and query refinement in mantics: Science, Services and Agents on the World web searches. We motivated our approach in the Wide Web, July 2009. search-engine scenario, showing how exploiting se- [3] Y.-X. Chen, R. Santamar´ıa,A. Butz, and R. Ther´on. Tagclusters: Semantic aggregation of collaborative mantic information in DBpedia it is possible both tags beyond tagclouds. In SG ’09: Proceedings of (i) to help users in the process of query formula- the 10th International Symposium on Smart Graph- tion and refinement, and (ii) to enhance the doc- ics, pages 56–67, Berlin, Heidelberg, 2009. Springer- ument selection process, displaying more specific Verlag. documents related to the keywords entered in the [4] E. Gabrilovich, A. Z. Broder, M. Fontoura, A. Joshi, V. Josifovski, L. Riedel, and T. Zhang. Classifying search engine. We explained how LED can improve search queries using the web as a source of knowledge. the way web searches are done nowadays and how TWEB, 3(2), 2009. user could save time using such tool. [5] P. Han, Z. Wang, Z. Li, B. Kramer, and F. Yang. Sub- stitution or complement: An empirical analysis on the impact of collaborative tagging on web search. In WI 17http://www.faviki.com ’06: Proceedings of the 2006 IEEE/WIC/ACM Inter- R. Mirizzi, A. Ragone, T. Di Noia, E. Di Sciascio / Lookup, Explore, Discover: how DBpedia can improve your Web search11

national Conference on Web Intelligence, pages 757– formation Management (SWIM 2010), Lecture Notes 760, Washington, DC, USA, 2006. IEEE So- in Computer Science. Springer-Verlag, 2010. ciety. [12] L. Richardson and S. Ruby. Restful web services. [6] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. O’Reilly, 2007. Riedl. Evaluating collaborative filtering recommender [13] B. Shneiderman. The Craft of Information Visualiza- systems. ACM Trans. Inf. Syst., 22(1):5–53, 2004. tion: Readings and Reflections, chapter The eyes have [7] R. J¨aschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, it: a task by data type of information vi- and G. Stumme. Tag recommendations in folk- sualizations. Morgan Kaufmann Publishers Inc., San sonomies. In PKDD 2007: Proceedings of the 11th Francisco, CA, USA, 2003. European conference on Principles and Practice of [14] B. Velez, R. Weiss, M. A. Sheldon, and D. K. Gifford. Knowledge Discovery in , pages 506–514, Fast and effective query refinement. In In Proc. of the Berlin, Heidelberg, 2007. Springer-Verlag. 20th Int. ACM SIGIR Conf. on Research and Devel- [8] G. Marchionini. Exploratory search: from finding to opment in , pages 6–15, 1997. understanding. Commun. ACM, 49(4):41–46, 2006. [15] M. L. Wilson, B. Kules, m. c. schraefel, and B. Shnei- [9] R. Mirizzi, A. Ragone, T. Di Noia, and E. Di Sciascio. derman. From keyword search to exploration: Design- Ranking the linked data: the case of dbpedia. In 10th ing future search interfaces for the web. Found. Trends International Conference on (ICWE Web Sci., 2(1):1–97, 2010. 2010), 2010. [16] S. Yang, Z. Lu, and L. Giles. Automatic tag recom- [10] R. Mirizzi, A. Ragone, T. Di Noia, and E. Di Scias- mendation algorithms for social recommender systems cio. Semantic tags generation and retrieval for online - microsoft research. ACM Transactions on Web, 2009. advertising. In 19th ACM International Conference [17] H. Zhuge. Communities and emerging semantics in on Information and (CIKM semantic link network: Discovery and learning. IEEE 2010), 2010. Trans. on Knowl. and Data Eng., 21(6):785–799, 2009. [11] R. Mirizzi, A. Ragone, T. Di Noia, and E. Di Sciascio. [18] H. Zhuge. Interactive semantics. Artificial Intelli- Semantic wonder cloud: exploratory search in dbpedia. gence, 174(2):190–204, 2010. In 2th International Workshop on Semantic Web In-