Incremental End-User Query Construction for the Semantic Desktop

Total Page:16

File Type:pdf, Size:1020Kb

Incremental End-User Query Construction for the Semantic Desktop INCREMENTAL END-USER QUERY CONSTRUCTION FOR THE SEMANTIC DESKTOP Ricardo Kawase, Enrico Minack, Wolfgang Nejdl L3S Research Center, Leibniz Universität Hannover, Appelstrasse 9a, 30167 Hannover, Germany Samur Araújo, Daniel Schwabe Departamento de Informática, Catholic University of Rio de Janeiro Rua M. de S. Vicente, 222 Rio de Janeiro RJ 224530-900, Brazil Keywords: Semantic Desktop, User interface, Visual query system, Graphical query language, Ontology, RDF, SPARQL, Personal information management. Abstract: This paper describes the design and implementation of a user interface that allows end-users to incrementally construct a query over the information in the Personal Information Management (PIM) domain. It allows semantically enriched keyword queries, implemented in the Semantic Desktop of the NEPOMUK Project. The Semantic Desktop user is able to explicitly articulate machine-processable knowledge, as described by its metadata. Therefore, searching this semantic information space can also benefit from the knowledge articulation within the query. Contrary to keyword queries, where it is not possible to provide semantic information, structured query languages as SPARQL enable exploiting this knowledge explicitly. 1 INTRODUCTION same, provided that the appropriate tools are available. Whereas traditional folder-based storage and To allow the personal information management navigation-based retrieval used to be sufficient for in the Desktop we first need to introduce an our personal information management needs, the ontology language that can be used to express personal mental models – the personal information sheer number of items on our Desktop calls for the 1 integration of Desktop search in its interfaces. The model ontology (PIMO) . The use of such an paradigm of Desktop search is similar to Web ontology gives the user a better understanding of her search, but shows some interesting differences: Desktop and her tasks, and a way to express her We typically search for items that we have knowledge about the information items. By adding created, received or stored earlier – which explicit, processable meta-data to the information means that we try to re-locate (re-find) rather items in the Desktop, it becomes a “Semantic than to find (discover) something new. Desktop”, a concept we will elaborate later. We typically have a (most likely incomplete) As it has been identified in previous works picture of how the data on our Desktop is (Teevan, 2004) (Bates, 1989) (Belkin, 1993) a organized (which means that navigation might progressive modelling search activity can provide be more apt than keyword search). better results than a single, static search. Yet, strong methods and interaction mechanisms are still From Teevan et al. (Teevan, 2004) we know that missing for searching the Desktop. users use a combination of search and navigation to In this paper we fill this gap by presenting a relocate information on the Web. For personal methodologically designed interface for the information management this is likely to be the 1 http://www.semanticdesktop.org/ontologies/2007/11/01/pimo/ 270 Kawase R., Minack E., Nejdl W., AraÞjo S. and Schwabe D. INCREMENTAL END-USER QUERY CONSTRUCTION FOR THE SEMANTIC DESKTOP. DOI: 10.5220/0001825902700275 In Proceedings of the Fifth International Conference on Web Information Systems and Technologies (WEBIST 2009), page ISBN: 978-989-8111-81-4 Copyright c 2009 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved INCREMENTAL END-USER QUERY CONSTRUCTION FOR THE SEMANTIC DESKTOP (Semantic) Desktop search that combines and environment that is based on the NEPOMUK exploits current interaction mechanisms. It benefits architecture. Since NEPOMUK still requires some from what the user knows (the vocabulary and the semantic knowledge from the user, user-friendly mental model), respects what the user does not know interfaces are a crucial milestone to achieve the goal (the data structure and query languages), to finally of bringing the Semantic Desktop to the common give her what she wants. user. Walking in a two way path, first we aim at In the remaining of this paper we discuss in more designing interfaces that solve the user’s needs on details searching activities in the Semantic Desktop the Semantic Desktop, and on the other direction, we and our proposed work in Section 2. In Section 3 we design interfaces to show the user the potential of describe our implementation and the architecture. the Semantic Desktop and still hide its complexity. Finally, in Section 4 we draw our conclusions and In both cases, first we take a look at the way people the sketch future work. think and express their mental models, so that we understand how the Semantic Desktop can support this (Sauermann, 2005). 2 SEARCHING THE As we mentioned before, we use the Personal Information Model Ontology (PIMO) to describe SEMANTIC DESKTOP and work within the PIM in the Semantic Desktop. The PIMO forms the basis for all custom, user- In (Sauermann, 2005), the authors argued, that the created types and relations. It defines basic types typical user uses the (information on the) Desktop to such as Document, a Person, a Location, etc. and complete a certain task. For this, documents that are relations such as creator, hasLocation, etc. and is relevant to the user’s current task are retrieved, intended to be extended by the user in any way he or processed and stored. Such documents contain she likes. Note that we use the terms “type” and relevant information that is processed by the user, “relation” throughout the paper as user-friendly allowing the user to generate knowledge. This terms for RDF Class and RDF Property. We also use knowledge is implicitly stored in the documents. these less technology-based terms consistently in our Making this implicit knowledge explicitly user interface and the implementation. expressible and machine-processable, is one of the The user can extend the PIMO ontology and use goals of the Semantic Desktop. Allowing the user to it to articulate arbitrary knowledge in an explicit exploit knowledge for retrieval at the Semantic way. For the task of re-finding information on the 2 Desktop, as implemented in the NEPOMUK Semantic Desktop, NEPOMUK provides two project, is the goal of our proposed user interface. different mechanisms. First, a type and instance The definition given by Sauermann et al. browser that is analogous to most operating system (Sauermann, 2005) depicts, that the Semantic file browsers where users type hierarchy and the Desktop paradigm brings the ideas of the Semantic instances of each type. Alternatively, there’s a full- Web paradigm to the user’s personal Desktop where text keyword search that analyses the extracted the conceptualization of the personal mental model metadata from the instances returning a relevance is described in formal ontologies. The standard data sorted list classified using complex ranking format for a common representation is RDF algorithms. (Resource Description Format). Finally, the different It happens that in both cases the potential Desktop applications are integrated using the same combination of user knowledge and system concept of the Semantic Web, for exchanging data functionality is not fully merged. The browser does and accessing resources. not allow the user to input her knowledge about the The NEPOMUK project integrates research, instances and its relations. Conversely, the keyword industrial and open source community efforts to search does not permit the user to use her knowledge develop a new technical and methodological of her PIMO. platform: the Social Semantic Desktop. This is an Consider the case when the user is looking for an extension of the personal Desktop that aims at email (or a presentation document) that was sent (or collaboration and personal information management. created) by a certain person, containing a certain 3 The NEPOMUK framework PSEW (P2P information (e.g., a telephone number or a quote). In Semantic Eclipse Workbench) is an integrated a pure keyword-based interface, the user should input a query such as “email sent by person 2 http://nepomuk.semanticdesktop.org/ telephone number”. However, this is very unlikely, 3 http://nepomuk-eclipse.semanticdesktop.org/xwiki/bin/view/ since most purely keyword-based interfaces assume Main/PSEW 271 WEBIST 2009 - 5th International Conference on Web Information Systems and Technologies the search will be made exclusively in the contents, the solution. The results are given based on the and it would not be expected to understand the records that match the pattern of this table. semantics of “sent by”, or the type of item (email). Considering the graph properties of the RDF Being able to leverage such semantic information model (Pérez, 2006), our solution was based on greatly reduces the search space, since, for instance, allowing the user to create a graph visually and only a small fraction of items in the Desktop would incrementally, which then is translated to a graph- have a relation “sent by”. matching query language. Analogous to what was A similar argument can be made when the query done by Zloof, we are giving to the user a visual must span several items, e.g., “email sent by the model where she can provides examples of the author of a given presentation”. In such cases, reality it wants to get. The incremental feature of our keyword-based searches would, at best, retrieve solution is that each new node added to the graph, partial answers, that must be combined by the user the user is able to view the intermediate result of her by filling in the semantic information that allows the query. binding between the partial answers returned. Below is an example of how the user can make a The user would now need to know the exact query using a graph structure. Suppose you want to semantic terms that are used in her semantic get all emails sent by John. This query can be made repository.
Recommended publications
  • PDF-Xchange Viewer
    PDF-XChange Viewer © 2001-2011 Tracker Software Products Ltd North/South America, Australia, Asia: Tracker Software Products (Canada) Ltd., PO Box 79 Chemainus, BC V0R 1K0, Canada Sales & Admin Tel: Canada (+00) 1-250-324-1621 Fax: Canada (+00) 1-250-324-1623 European Office: 7 Beech Gardens Crawley Down., RH10 4JB Sussex, United Kingdom Sales Tel: +44 (0) 20 8555 1122 Fax: +001 250-324-1623 http://www.tracker-software.com [email protected] Support: [email protected] Support Forums: http://www.tracker-software.com/forum/ ©2001-2011 TRACKER SOFTWARE PRODUCTS II PDF-XChange Viewer v2.5x Table of Contents INTRODUCTION...................................................................................................... 7 IMPORTANT! FREE vs. PRO version ............................................................................................... 8 What Version Am I Running? ............................................................................................................................. 9 Safety Feature .................................................................................................................................................. 10 Notice! ......................................................................................................................................... 10 Files List ....................................................................................................................................... 10 Latest (available) Release Notes .................................................................................................
    [Show full text]
  • Lookeen Desktop Search
    Lookeen Desktop Search Find your files faster! User Benefits Save time by simultaneously searching for documents on your hard drive, in file servers and the network. Lookeen can also search Outlook archives, the Exchange Search with fast and reliable Server and Public Folders. Advanced filters and wildcard options make search more Lookeen technology powerful. With Lookeen you’ll turn ‘search’ into ‘find’. You’ll be able to manage and organize large amounts of data efficiently. Employees will save valuable time usually Find your information in record spent searching to work on more important tasks. time thanks to real-time indexing Lookeen desktop search can also Search your desktop, Outlook be integrated into Outlook The search tool for Windows files and Exchange folders 10, 8, 7 and Vista simultaneously Ctrl+Ctrl is back: instantly launch Edit and save changes to Lookeen from documents in Lookeen preview anywhere on your desktop Save and re-use favorite queries and access them with short keys View all correspondence with individuals or groups at the push of a button Create one-click summaries of email correspondences Start saving time and money immediately For Companies Features Business Edition Desktop search software compatible with Powerful search in virtual environments like Compatible with standard and virtual Windows 10, 8, 7 and Vista Citrix and VMware desktops like Citrix, VMware and Terminal Servers. Simplified roll out through exten- Optional add-in to Microsoft Outlook 2016, Simple, user friendly interface gives users a sive group directives and ADM files. 2013, 2010, 2007 or 2003 and Office 365 unified view over multiple data sources Automatic indexing of all files on the hard Clear presentation of search results drive, network, file servers, Outlook PST/OST- Enterprise Edition Full fidelity preview option archives, Public Folders and the Exchange Scans additional external indexes.
    [Show full text]
  • Towards the Ontology Web Search Engine
    TOWARDS THE ONTOLOGY WEB SEARCH ENGINE Olegs Verhodubs [email protected] Abstract. The project of the Ontology Web Search Engine is presented in this paper. The main purpose of this paper is to develop such a project that can be easily implemented. Ontology Web Search Engine is software to look for and index ontologies in the Web. OWL (Web Ontology Languages) ontologies are meant, and they are necessary for the functioning of the SWES (Semantic Web Expert System). SWES is an expert system that will use found ontologies from the Web, generating rules from them, and will supplement its knowledge base with these generated rules. It is expected that the SWES will serve as a universal expert system for the average user. Keywords: Ontology Web Search Engine, Search Engine, Crawler, Indexer, Semantic Web I. INTRODUCTION The technological development of the Web during the last few decades has provided us with more information than we can comprehend or manage effectively [1]. Typical uses of the Web involve seeking and making use of information, searching for and getting in touch with other people, reviewing catalogs of online stores and ordering products by filling out forms, and viewing adult material. Keyword-based search engines such as YAHOO, GOOGLE and others are the main tools for using the Web, and they provide with links to relevant pages in the Web. Despite improvements in search engine technology, the difficulties remain essentially the same [2]. Firstly, relevant pages, retrieved by search engines, are useless, if they are distributed among a large number of mildly relevant or irrelevant pages.
    [Show full text]
  • An Activity Based Data Model for Desktop Querying (Extended Abstract)?
    An activity based data model for desktop querying (Extended Abstract)? Sibel Adalı1 and Maria Luisa Sapino2 1 Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA, [email protected], 2 Universit`adi Torino, Corso Svizzera, 185, I-10149 Torino, Italy [email protected] 1 Introduction With the introduction of a variety of desktop search systems by popular search engines as well as the Mac OS operating system, it is now possible to conduct keyword search across many types of documents. However, this type of search only helps the users locate a very specific piece of information that they are looking for. Furthermore, it is possible to locate this information only if the document contains some keywords and the user remembers the appropriate key- words. There are many cases where this may not be true especially for searches involving multimedia documents. However, a personal computer contains a rich set of associations that link files together. We argue that these associations can be used easily to answer more complex queries. For example, most files will have temporal and spatial information. Hence, files created at the same time or place may have relationships to each other. Similarly, files in the same directory or people addressed in the same email may be related to each other in some way. Furthermore, we can define a structure called “activities” that makes use of these associations to help user accomplish more complicated information needs. Intu- itively, we argue that a person uses a personal computer to store information relevant to various activities she or he is involved in.
    [Show full text]
  • Using Context to Enhance File Search
    Connections: Using Context to Enhance File Search Craig A. N. Soules, Gregory R. Ganger Carnegie Mellon University ABSTRACT Attribute-based naming allows users to classify each file Connections is a file system search tool that combines tradi- with multiple attributes [9, 12, 37]. Once in place, these at- tional content-based search with context information gath- tributes provide additional paths to each file, helping users ered from user activity. By tracing file system calls, Con- locate their files. However, it is unrealistic and inappropri- nections can identify temporal relationships between files ate to require users to proactively provide accurate and use- and use them to expand and reorder traditional content ful classifications. To make these systems viable, they must search results. Doing so improves both recall (reducing false- automatically classify the user's files, and, in fact, this re- positives) and precision (reducing false-negatives). For ex- quirement has led most systems to employ search tools over ample, Connections improves the average recall (from 13% hierarchical file systems rather than change their underlying to 22%) and precision (from 23% to 29%) on the first ten methods of organization. results. When averaged across all recall levels, Connections The most prevalent automated classification method to- improves precision from 17% to 28%. Connections provides day is content analysis: examining the contents and path- these benefits with only modest increases in average query names of files to determine attributes that describe them. time (2 seconds), indexing time (23 seconds daily), and in- Systems using attribute-based naming, such as the Seman- dex size (under 1% of the user's data set).
    [Show full text]
  • LNCS 3532, Pp
    Activity Based Metadata for Semantic Desktop Search Paul Alexandru Chirita, Rita Gavriloaie, Stefania Ghita, Wolfgang Nejdl, and Raluca Paiu L3S Research Center / University of Hanover, Deutscher Pavillon, Expo Plaza 1, 30539 Hanover, Germany {chirita, gavriloaie, ghita, nejdl, paiu}@l3s.de Abstract. With increasing storage capacities on current PCs, searching the World Wide Web has ironically become more efficient than searching one’s own per- sonal computer. The recently introduced desktop search engines are a first step towards coping with this problem, but not yet a satisfying solution. The reason for that is that desktop search is actually quite different from its web counter- part. Documents on the desktop are not linked to each other in a way compara- ble to the web, which means that result ranking is poor or even inexistent, be- cause algorithms like PageRank cannot be used for desktop search. On the other hand, desktop search could potentially profit from a lot of implicit and explicit semantic information available in emails, folder hierarchies, browser cache con- texts and others. This paper investigates how to extract and store these activity based context information explicitly as RDF metadata and how to use them, as well as additional background information and ontologies, to enhance desktop search. 1 Introduction The capacity of our hard-disk drives has increased tremendously over the past decade, and so has the number of files we usually store on our computer. It is no wonder that sometimes we cannot find a document any more, even when we know we saved it somewhere. Ironically, in quite a few of these cases nowadays, the document we are looking for can be found faster on the World Wide Web than on our personal com- puter.
    [Show full text]
  • Dtsearch Desktop/Dtsearch Network Manual
    dtSearch Desktop dtSearch Network Version 7 Copyright 1991-2021 dtSearch Corp. www.dtsearch.com SALES 1-800-483-4637 (301) 263-0731 Fax (301) 263-0781 [email protected] TECHNICAL (301) 263-0731 [email protected] 1 Table of Contents 1. Getting Started _____________________________________________________________ 1 Quick Start 1 Installing dtSearch on a Network 7 Automatic deployment of dtSearch on a Network 8 Command-Line Options 10 Keyboard Shortcuts 11 2. Indexes __________________________________________________________________ 13 What is a Document Index? 13 Creating an Index 13 Caching Documents and Text in an Index 14 Indexing Documents 15 Noise Words 17 Scheduling Index Updates 17 3. Indexing Web Sites _________________________________________________________ 19 Using the Spider to Index Web Sites 19 Spider Options 20 Spider Passwords 21 Login Capture 21 4. Sharing Indexes on a Network _________________________________________________ 23 Creating a Shared Index 23 Sharing Option Settings 23 Index Library Manager 24 Searching Using dtSearch Web 25 5. Working with Indexes _______________________________________________________ 27 Index Manager 27 Recognizing an Existing Index 27 Deleting an Index 27 Renaming an Index 27 Compressing an Index 27 Verifying an Index 27 List Index Contents 28 Merging Indexes 28 6. Searching for Documents _____________________________________________________ 29 Using the Search Dialog Box 29 Browse Words 31 More Search Options 32 Search History 33 i Table of Contents Searching for a List of Words 33 7.
    [Show full text]
  • Master Thesis Innovation Dynamics in Open Source Software
    Master thesis Innovation dynamics in open source software Author: Name: Remco Bloemen Student number: 0109150 Email: [email protected] Telephone: +316 11 88 66 71 Supervisors and advisors: Name: prof. dr. Stefan Kuhlmann Email: [email protected] Telephone: +31 53 489 3353 Office: Ravelijn RA 4410 (STEPS) Name: dr. Chintan Amrit Email: [email protected] Telephone: +31 53 489 4064 Office: Ravelijn RA 3410 (IEBIS) Name: dr. Gonzalo Ord´o~nez{Matamoros Email: [email protected] Telephone: +31 53 489 3348 Office: Ravelijn RA 4333 (STEPS) 1 Abstract Open source software development is a major driver of software innovation, yet it has thus far received little attention from innovation research. One of the reasons is that conventional methods such as survey based studies or patent co-citation analysis do not work in the open source communities. In this thesis it will be shown that open source development is very accessible to study, due to its open nature, but it requires special tools. In particular, this thesis introduces the method of dependency graph analysis to study open source software devel- opment on the grandest scale. A proof of concept application of this method is done and has delivered many significant and interesting results. Contents 1 Open source software 6 1.1 The open source licenses . 8 1.2 Commercial involvement in open source . 9 1.3 Opens source development . 10 1.4 The intellectual property debates . 12 1.4.1 The software patent debate . 13 1.4.2 The open source blind spot . 15 1.5 Litterature search on network analysis in software development .
    [Show full text]
  • Comparison of Indexers
    Comparison of indexers Beagle, JIndex, metaTracker, Strigi Michal Pryc, Xusheng Hou Sun Microsystems Ltd., Ireland November, 2006 Updated: December, 2006 Table of Contents 1. Introduction.............................................................................................................................................3 2. Indexers...................................................................................................................................................4 3. Test environment ....................................................................................................................................5 3.1 Machine............................................................................................................................................5 3.2 CPU..................................................................................................................................................5 3.3 RAM.................................................................................................................................................5 3.4 Disk..................................................................................................................................................5 3.5 Kernel...............................................................................................................................................5 3.6 GCC..................................................................................................................................................5
    [Show full text]
  • List of Search Engines
    A blog network is a group of blogs that are connected to each other in a network. A blog network can either be a group of loosely connected blogs, or a group of blogs that are owned by the same company. The purpose of such a network is usually to promote the other blogs in the same network and therefore increase the advertising revenue generated from online advertising on the blogs.[1] List of search engines From Wikipedia, the free encyclopedia For knowing popular web search engines see, see Most popular Internet search engines. This is a list of search engines, including web search engines, selection-based search engines, metasearch engines, desktop search tools, and web portals and vertical market websites that have a search facility for online databases. Contents 1 By content/topic o 1.1 General o 1.2 P2P search engines o 1.3 Metasearch engines o 1.4 Geographically limited scope o 1.5 Semantic o 1.6 Accountancy o 1.7 Business o 1.8 Computers o 1.9 Enterprise o 1.10 Fashion o 1.11 Food/Recipes o 1.12 Genealogy o 1.13 Mobile/Handheld o 1.14 Job o 1.15 Legal o 1.16 Medical o 1.17 News o 1.18 People o 1.19 Real estate / property o 1.20 Television o 1.21 Video Games 2 By information type o 2.1 Forum o 2.2 Blog o 2.3 Multimedia o 2.4 Source code o 2.5 BitTorrent o 2.6 Email o 2.7 Maps o 2.8 Price o 2.9 Question and answer .
    [Show full text]
  • Ileak: a Lightweight System for Detecting Inadvertent Information Leaks
    iLeak: A Lightweight System for Detecting Inadvertent Information Leaks Vasileios P. Kemerlis, Vasilis Pappas, Georgios Portokalidis, and Angelos D. Keromytis Network Security Lab Computer Science Department Columbia University, New York, NY, USA {vpk, vpappas, porto, angelos}@cs.columbia.edu Abstract—Data loss incidents, where data of sensitive nature Most research on detecting and preventing information are exposed to the public, have become too frequent and have leaks has focused on applying strict information flow control caused damages of millions of dollars to companies and other (IFC). Sensitive data are labeled and tracked throughout organizations. Repeatedly, information leaks occur over the Internet, and half of the time they are accidental, caused by an application’s execution to detect their illegal propaga- user negligence, misconfiguration of software, or inadequate tion (e.g., their transmission over the network). Approaches understanding of an application’s functionality. This paper such as Jif [6] and JFlow [7] achieve IFC by introducing presents iLeak, a lightweight, modular system for detecting extensions to the Java programming language, while others inadvertent information leaks. Unlike previous solutions, iLeak enforce it dynamically [8], or propose new operating system builds on components already present in modern computers. In particular, we employ system tracing facilities and data (OS) designs [9]. Fine-grained IFC mechanisms require that indexing services, and combine them in a novel way to detect sensitive data are labeled by the user beforehand, but as data leaks. Our design consists of three components: uaudits are the amount of data stored in desktops continuously grows, responsible for capturing the information that exits the system, locating documents and files containing personal data be- while Inspectors use the indexing service to identify if the comes burdensome.
    [Show full text]
  • Instructions for the Preparation of a Camera-Ready Paper in MS Word
    Review Semantic web; towards Ontology based Web Application Editor(s): Name Surname, University, Country Solicited review(s): Name Surname, University, Country Open review(s): Name Surname, University, Country Zeeshan Ahmed*, Saman Majeed Department of Bioinformatics, Biocenter, University of Wuerzburg, Am Hubland 97074, Wuerzburg Germany Abstract. In this review we address the importance of field i.e. Semantic Web; a mechanism of presenting information over the web in a format so that human being as well as machines can understand the semantic of context. Describing the techno- logical contributions of Semantic Web in detail, we present its main innovation i.e. Ontology, and development supporting technologies i.e. XML (eXtensible Mark-up Language), RDF (Resource Description Framework) and OWL (Web Ontology Language), offering different ways of more explicitly structuring and richly annotating Web pages. Furthermore, in this review paper, we discuss how ontology is contributing to semantic web development with the presentation of some Semantic Web application and conclude with some existing limitations need to be overcome. Keywords: Web, Ontology, Semantic Web 1. Introduction easy to go for scattered extensive information by looking into bookmarked web pages but quite diffi- Targeting the challenge of implementing a web cult to extract a piece of needed information. Al- based system capable of performing semantic based though some search engines and screen scrapers are search to extract desired information from attached invented, search
    [Show full text]