Semantic Desktop 2.0: The Gnowsis Experience

Leo Sauermann¹, Gunnar Aastrand Grimnes¹, Malte Kiesel¹, Christiaan Fluit³, Heiko Maus¹, Dominik Heim², Danish Nadeem⁴, Benjamin Horak², and Andreas Dengel¹,²

¹ Knowledge Management Department, German Research Center for Artificial Intelligence (DFKI GmbH), Kaiserslautern, Germany, {firstname.surname}@dfki.de
² Knowledge-Based Systems Group, Department of Computer Science, University of Kaiserslautern, Germany
³ Aduna BV, Amersfoort, The Netherlands, [email protected]
⁴ University of Osnabrueck, Germany, [email protected]

Abstract. In this paper we present lessons learned from building a Semantic Desktop system, the gnowsis beta. On desktop computers, semantic software has to provide stable services and has to reflect the personal view of the user. Our approach to ontologies, the Personal Information Model (PIMO), allows the creation of tagging services like del.icio.us on the desktop. A semantic wiki allows further annotations. Continuous evaluations of the system helped to improve it. These results were created in the EPOS research project and are available in the open source projects Aperture, kaukoluwiki, and gnowsis, and will be continued in the Nepomuk project. By using these components, other developers can create new desktop applications the web 2.0 way.

1 Introduction

A characteristic of human nature is to collect. In the information age we have moved from the basic collection of food, books and paintings to collecting websites, documents, e-mails, ideas, tasks and sometimes arguments and facts. We gather information and store it on our desktop computers, but once stored, the satisfaction of possessing something is soon distorted by the task of finding information in our personal data swamp [9]. In this paper we present parts of the gnowsis semantic desktop framework, a tool for personal information management (PIM). In addition to providing an interface for managing your personal data, it also provides interfaces through which other applications can access this data, acting as a central hub for semantic information on the desktop. The described gnowsis system is a prototype of a Semantic Desktop [17], aiming to integrate desktop applications and the data managed on desktop computers using semantic web technology. Previous work published about Semantic Desktop applications [5,12,15] showed that this approach is promising for supporting users in finding and reminding information, and in working with information in new ways.

The architecture was improved during the last years, taking inspiration from the current advances and popularity of the web 2.0 (see Section 3). The new architecture and new ontology policies for gnowsis version 0.9 are described in Section 2. In Section 3 we discuss what semantic desktop applications can learn from the web 2.0; in particular, we discuss how semantic wikis provide an ideal lightweight solution for ad-hoc creation of metadata, and how such semantic wikis and tagging were integrated into gnowsis. In Section 4, a short description of the information integration framework Aperture is given. A summary of our evaluation efforts and lessons learned is given in Section 5, indicating best practices and other remarks on practical semantic web engineering.
The system has been deployed in several evaluation settings, gathering user feedback to further improve it, and an active community is now building around the gnowsis semantic desktop. As the system is now part of the EU-funded Integrated Project Nepomuk (http://nepomuk.semanticdesktop.org), which develops a comprehensive solution for a Social Semantic Desktop, we expect that the service oriented approach to the semantic desktop will have more impact in the future, which is the conclusion of this paper.

2 The Gnowsis Approach

Gnowsis is a semantic desktop with a strong focus on extensibility and integration. The goal of gnowsis is to enhance existing desktop applications and the desktop operating system with Semantic Web features. The primary use for such a system is Personal Information Management (PIM), technically realized by representing the user's data in RDF. Although the technology used is the same, communication, collaboration, and the integration with the global semantic web are not addressed by the gnowsis system. The gnowsis project was created in 2003 in Leo Sauermann's diploma thesis [14] and continued in the DFKI research project EPOS (http://www.dfki.uni-kl.de/epos) [7].

Gnowsis can be coarsely split into two parts: gnowsis-server, which does all the data processing, storage and interaction with native applications; and the graphical user interface (GUI) part, currently implemented as a Swing GUI and some web interfaces (see the gnowsis web page, http://www.gnowsis.opendfki.de/, for illustrative screenshots). The interface between the server and the GUI is clearly specified, making it easy to develop alternative interfaces. It is also possible to run gnowsis-server standalone, without a GUI. Gnowsis uses a Service Oriented Architecture (SOA), where each component defines a certain interface; once the server has started a component, that interface is available as an XML-RPC service (http://www.xmlrpc.com/) to be used by other applications. For more detail refer to Section 3.

External applications like Microsoft Outlook or Mozilla Thunderbird are integrated via Aperture data-sources (see Section 4); their data is imported and mirrored in gnowsis. Some new features were also added to these applications using plugins; for example, in Thunderbird users can relate e-mails to concepts within their personal ontology (see Section 3.1).

The whole gnowsis framework is free software, published under a BSD-compatible license. It is implemented in Java to be platform-independent and reuses well-known tools like Jena, Sesame, Servlets, Swing, and XML-RPC. Gnowsis can be downloaded from http://www.gnowsis.org/.

Fig. 1. The Gnowsis Architecture

The Gnowsis Server. The architecture of the gnowsis-server service is shown in Figure 1. Its central component is naturally an RDF storage repository. Gnowsis uses four different stores for this purpose. The PIMO store handles the information in the user's Personal Information Model (see Section 2.1), the resource store handles the data crawled from Aperture data-sources (see Section 4), the configuration store handles the data about available data-sources, log-levels, crawl-intervals, etc., and finally, the service store handles data created by various gnowsis modules, such as user profiling data or metadata for the crawling of data-sources.
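To make the four-store design concrete, the following is a minimal sketch, not actual gnowsis code, of how one on-disk repository per store could be opened with the Sesame 2 native SAIL that gnowsis uses (as described below). The directory layout under ~/.gnowsis and the store names are illustrative assumptions.

import java.io.File;
import org.openrdf.repository.Repository;
import org.openrdf.repository.sail.SailRepository;
import org.openrdf.sail.nativerdf.NativeStore;

public class StoreSetup {
    public static void main(String[] args) throws Exception {
        // One separate native (on-disk) repository per store; the
        // directory names below are hypothetical, not gnowsis defaults.
        String[] stores = {"pimo", "resource", "config", "service"};
        for (String name : stores) {
            File dataDir = new File(System.getProperty("user.home"),
                    ".gnowsis/" + name);
            Repository repo = new SailRepository(new NativeStore(dataDir));
            repo.initialize();  // opens or creates the store on disk
            System.out.println("Opened store: " + name);
            repo.shutDown();
        }
    }
}

Keeping each store in its own repository means, for example, that an inferencing SAIL can be stacked on the PIMO repository alone, leaving the much larger resource store untouched.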
Separating the PIMO store from the resource store was an important decision for the gnowsis architecture, and it was made for several reasons. The resource store is inherently chaotic, since it mirrors the structure of the user's applications (consider your email inbox), whereas the thoughts (e.g., concepts and relations) can be structured separately in the PIMO. Another reason was efficiency: while a user's PIMO may contain a few thousand instances for a very frequent user, it is not uncommon for people to have an archive of 100,000 emails. By separating the two we can save time and resources by performing inference only on the PIMO store. We also note that a similar approach was taken in many other projects, for instance in the topic maps community, where topics and occurrences are separated [13]. A discussion of the cognitive justification for such a split can be found in Section 2.1.

The storage modules in gnowsis are currently based on Sesame 2 [2] and use Sesame's native Storage And Inference Layer (SAIL) to store the data on disk. In previous gnowsis versions we used MySQL in combination with Jena as triple store, but this forced users to install the database server on their desktops, and the performance of fulltext-searching in MySQL/Jena was not satisfactory. By using Sesame 2 with the embedded native SAIL we simplified the installation significantly.

In addition to the raw RDF, the PIMO and resource stores use an additional SAIL layer which utilizes Lucene to index the text of RDF literals, providing extremely fast full-text search capabilities on our RDF stores. Lucene indexing operates on the level of documents. Our LuceneSail has two modes for mapping RDF to logical documents: one is used with Aperture and will index each Aperture data-object (for example files, webpages, emails, etc.) as a document. The other mode does not require Aperture; instead, one Lucene document is created for each named RDF resource in the store. The resulting Lucene index can be accessed either explicitly through Java code, or by using special predicates when querying the RDF store. Figure 2 shows an example SPARQL query for PIMO documents containing the word "rome". The LuceneSail will rewrite this query to access the Lucene index and remove the special predicates from the triple patterns. This method for full-text querying of RDF stores is equivalent to the method used in the Aduna MetaData server and Aduna AutoFocus [6].

SELECT ?X
WHERE {
  ?X rdf:type pimo:Document ;
     lucene:matches ?Q .
  ?Q lucene:query "rome" .
}

Fig. 2. A query using special predicates for full-text searching

2.1 Open Source, Open Standards, Open Minds

A key to success for a future semantic desktop is making it extendable. Partitioning the framework into services and making it possible to add new services is one way to support this. Another is exposing all programming interfaces based on the XML-RPC standard, allowing new applications to use the data and services of gnowsis. Open standards are important to help create interoperable and affordable solutions for everybody. They also promote competition by setting up a technical playing field that is level to all market players.
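To illustrate what such an open interface buys a third-party developer, here is a minimal sketch of an external Java application calling a gnowsis XML-RPC service with the Apache XML-RPC client library. The endpoint URL (port and path) and the method name "search.fulltext" are hypothetical assumptions for illustration; the actual service names are defined by the individual gnowsis components.

import java.net.URL;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

public class GnowsisClient {
    public static void main(String[] args) throws Exception {
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        // Hypothetical local endpoint; the real port and path depend on
        // the gnowsis-server configuration.
        config.setServerURL(new URL("http://localhost:9993/xmlrpc"));
        XmlRpcClient client = new XmlRpcClient();
        client.setConfig(config);
        // Hypothetical service method: full-text search over the stores.
        Object result = client.execute("search.fulltext",
                new Object[]{"rome"});
        System.out.println("Result: " + result);
    }
}

Because XML-RPC clients exist for virtually every language, the same call could just as easily be made from Python, JavaScript, or a shell script, which is precisely the point of exposing the interfaces through an open standard.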