Faculty of Computer Science, Free University of Bozen-Bolzano, Piazza Domenicani 3, 39100 Bolzano, Italy

Tel:+39 04710 16000, Fax:+39 04710 16009, http://www.inf.unibz.it/krdb/

EMCL Project Report

Extending basic Spotlight Technology by using Spotlight APIs

Muhammad Faheem

Abstract

Spotlight is an advanced search technology based on metadata information and integrated with the file system, which enables users to search files in the file system efficiently and quickly. Spotlight is more than a tool for searching documents: it provides APIs that help developers use this metadata information in their own domain. The aim of this project is to explore the Spotlight APIs and extend them so that we can run intelligent queries over them.


Contents

1 Introduction
2 Motivation
3 Preliminaries
4 Spotlight APIs
  4.1 Metadata
  4.2 Spotlight Store
  4.3 Different ways to examine the file's metadata
    4.3.1 High Level Language Program
    4.3.2 Command Line
  4.4 Spotlight Queries
    4.4.1 Finder Tool
    4.4.2 Programming Languages
    4.4.3 mdfind Tool
  4.5 Xcode 3.2
    4.5.1 Creating New Project
    4.5.2 Xcode WorkSpace
    4.5.3 Interface building
    4.5.4 Objective C
    4.5.5 A simple Graphical User Interface application based on Predicate
5 Enriching basic Spotlight Technology
  5.1 Spotlight Schema
  5.2 Beyond Spotlight APIs
  5.3 A Graphical User Application based on Key metadata attribute
  5.4 Use Cases and Evaluation
6 Conclusion and Future Work
References


1 Introduction

Nowadays, new technologies coming from different fields of research may converge to implement new and more sophisticated office automation systems. One of these technologies allows modern operating systems to store in the file system meta information about the content of various types of documents, such as media files, instant messages and office documents. The Internet, the web and electronic mail have revolutionized the way we communicate and collaborate. We are much more connected, and in turn our demands increase. Metadata information about files, contacts and messages is now available; the challenge is to use this information in the right way. Spotlight provides all the facilities we need for searching a file in the file system. Spotlight is tightly integrated with the operating system, which gives it an edge over other search tools. To use the metadata information in other applications, we need to extend the capabilities of Spotlight. We can query the system using the gray box at the top-right of the Mac screen, using the Finder, or using the command prompt. To run more intelligent queries, however, we need to use the Spotlight APIs. With the Spotlight APIs we can run queries based on key attributes, e.g. kMDItemKind.

2 Motivation

Spotlight allows text searching of user emails, computer files, photos, music, chats, web history, etc. We can use the Spotlight APIs to make better use of meta information. The main motivation for exploring the Spotlight APIs is to assist the ongoing research on the Semantic Desktop Tool at the KRDB¹ center. We are trying to assist at the data level of the Semantic Desktop application, and we discuss a more organized approach than the one already in use. To use meta information in the field of the semantic web we need to organize data in a better structure. Spotlight organizes metadata information poorly for this purpose, because its main goal is to search for files quickly over the system. We therefore extend the Spotlight APIs to fill this gap and make the data well structured. The Semantic Desktop Tool uses a methodology to extract a conceptual schema from raw data and raw schema. We believe that the Spotlight APIs can help to extract the conceptual schema and also to populate the ABox based on that conceptual schema. Past work on the Semantic Desktop motivates us to develop this strategy. Put differently, we are trying to provide a search engine (a wrapper developed using Spotlight) for the Semantic Desktop Tool. For example, the Semantic Desktop needs to run queries like:

• Give me the names of all persons who work on Project AAA.
• Tasks associated with the project.
• The role of person X in a company.
• Who replied to an email with subject A, sent by the Manager on 20-09-2010.
• Files written by Faheem that are part of a project.

To run these kinds of queries against the file system we need a fast and efficient tool at the data level. We believe that we cannot run these queries directly on Spotlight; rather, we need to use the APIs provided by Spotlight and extend them so that we can run such complex queries. Although these queries look simple, they require joins between different types of files.

Here one point needs to be clarified: why do we prefer Spotlight over Google Desktop²? There are several reasons, but we only point out those relevant to the requirements of the Semantic Desktop Tool. We prefer Spotlight because it is better integrated with the OS, so it uses fewer resources and can index better and search faster. Due to its integration with the

¹ KRDB Research Center for Knowledge and Data, Free University of Bozen-Bolzano.
² Google Desktop makes searching your computer as easy as searching the web with Google; see [1].


OS, Spotlight updates itself every time the hard drive is written to. In our experience, a file deleted from the file system is sometimes still shown in Google Desktop search results, and trying to open it gives the message "File not found". Spotlight, on the other hand, indexes quickly and updates its database within a few seconds because of its integration with the OS.

3 Preliminaries

No special skills are required, but we assume that the reader has basic programming skills and a good understanding of the Mac environment. We develop our small application using the Xcode IDE [3] [6] and write our code in Objective-C [9]. Readers are recommended to read the tutorials [3] [6] [9] before reading this paper. We start with a brief introduction to Spotlight, but the main purpose of this paper is to extend basic Spotlight technology by using the Spotlight APIs.

4 Spotlight APIs

Spotlight [2] is a fast desktop search technology and a fundamental feature of Mac OS X that allows the user to organize and search files based on metadata information. For years people have talked about making the file system fast and easy to search by using metadata information, but it remained talk, without technical development. Other operating systems have long promised it but have not delivered an application. Third-party add-ons have started to appear that can search across the file system based on metadata information, but still with many limitations. Tiger is the first industrial-strength operating system to provide fully integrated, fast and efficient search across all the files on the system.

Organizing the files on a system so that they are easily accessible is a difficult task, and it is mostly left to end users. Even the most organized user will not find it easy to arrange files in a way that makes metadata information easy to find. As the file system provides only one way of organizing information, users must use some special tool to search for what they want. Most such tools, however, are slow, limited in how they search, and inefficient on complex queries. For instance, a user may want to search for more than a file, e.g. an email sent by John on 13-08-2010.

Spotlight is an advanced search technology based on metadata information and integrated with the file system. It keeps track of the file system and performs the actions needed to keep its Spotlight store up to date, so that every file is easily accessible. Every time a file is created, moved, saved, copied or deleted, the file system automatically ensures that the file is properly cataloged, indexed and ready for whatever search query might be issued. Spotlight is more than just searching for documents. Spotlight importers define metadata information that the Finder can display in its Get Info panel. This information provides richer detail about documents.

Examples of metadata information:

• Video files provide their dimensions, pixel depth and other color-related information.
• Movies provide their duration.
• PDF files provide information about the authors, creation date, dimensions, encoding, and where they originated.
• Contacts provide information about the first name, last name, email id, phone number and instant messaging address.


Spotlight is available not only to end users but also to developers, who can enrich their applications with Spotlight capabilities. Tiger does not impose any restrictions or limits on the use of the Spotlight APIs. Several technologies power Spotlight and give it an advantage over other existing technologies.

Spotlight Technologies:

• A database consisting of a high-performance metadata store and content index that is fully integrated with the file system.
• Programmatic APIs that are part of the CoreServices and Cocoa frameworks and that let the user query the metadata store and content index.
• A set of importer plug-ins that are used to populate the metadata store and content index with information about the files on the file system.
• A plug-in API allowing you to provide metadata and content to be indexed for your application's custom file formats.

Spotlight lets us plug our application into the operating system and work with files in a completely different way. For example, suppose we are developing a video management system and want to offer search. We can add a Spotlight search window to our application that takes criteria from the end user and returns the results of a back-end query. Before discussing the power of Spotlight further, we need to cover some background on how Spotlight works. For Spotlight searching to work, it has to have metadata. Some bits of metadata (modification date, creation date, file type, path name) are easy to gather for a given file, but most of the interesting data is embedded inside the file. To gather this embedded information for a specific file type, you must provide a specific Spotlight importer. So let us start by defining metadata.

4.1 Metadata

Metadata is structured information that describes, explains, locates or otherwise makes it easier to retrieve, use and manage an information resource. It provides a description of the data contained in a file, such as author, title, editor, creation date and last modification date. An example is given in Figure 1.

Figure 1: An example of MetaData Information [2]

Some metadata, such as the modification date, is kept external to the file by the file system and is accessible in different ways. But most of the interesting data is embedded inside the file, such as flash on/off, pixel width, height and encoding. Until now it was almost impossible to retrieve files efficiently based on metadata information, but Spotlight makes it possible and provides a way to query the file system easily and efficiently. Spotlight gathers all of the information about the files on the file system into the Spotlight store.

4.2 Spotlight Store

The Spotlight store is a database over the file system which consists of:

• Metadata store: holds all the metadata attributes of the files.
• Content index: keeps an index of the contents of the files.

Spotlight ensures that both the content index and the metadata store entries for a file are updated after any operation, such as the file being created, copied, updated or deleted. See Figure 2.

Figure 2: Spotlight Server Architecture [2]

The content index is based on the Search Kit technology that was introduced with Mac OS X 10.3 Panther. In Tiger, Search Kit is three times faster at indexing content and up to 20 times faster at incremental searching than in Panther. The metadata store, on the other hand, is a new database designed specifically to handle the needs of metadata. Internally, each file is represented as an MDItem object. Each MDItem contains a dictionary of the metadata attributes of that file. A sample of these keys is listed in Table 1. There are several keys which are used to handle the metadata information. Note that there is one metadata store and one content index per file system, so that the database is kept close to the files it belongs to.

We now have a clear picture of how the Spotlight store keeps useful information about the files on the system. Next we explore how to look into the metadata information of a specific file on the file system.
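The two parts of the store described above can be pictured with a small model. The following is only an illustrative Python sketch, not Spotlight's implementation: the class ToyStore, its method names and the sample paths are hypothetical, and the real store is maintained by the operating system rather than by application code. It keeps one attribute dictionary per file (the role played by MDItem) and a word-to-files content index, and updates both whenever a file changes.

```python
from collections import defaultdict


class ToyStore:
    """Toy model of a per-volume store: a metadata store plus a content index."""

    def __init__(self):
        self.metadata = {}                     # path -> {attribute: value}
        self.content_index = defaultdict(set)  # word -> set of paths

    def update(self, path, attributes, text=""):
        """Called when a file is created, copied, or modified."""
        self.remove(path)                      # drop stale entries first
        self.metadata[path] = dict(attributes)
        for word in text.lower().split():
            self.content_index[word].add(path)

    def remove(self, path):
        """Called when a file is deleted or about to be re-indexed."""
        self.metadata.pop(path, None)
        for paths in self.content_index.values():
            paths.discard(path)

    def find_by_attribute(self, key, value):
        """Return the paths whose metadata attribute `key` equals `value`."""
        return sorted(p for p, attrs in self.metadata.items()
                      if attrs.get(key) == value)

    def find_by_content(self, word):
        """Return the paths whose indexed text contains `word`."""
        return sorted(self.content_index.get(word.lower(), set()))


store = ToyStore()
store.update("/docs/report.pdf",
             {"kMDItemTitle": "Spotlight APIs", "kMDItemAuthors": ["Faheem"]},
             text="Spotlight keeps a metadata store")
store.update("/docs/notes.txt", {"kMDItemTitle": "Notes"}, text="metadata notes")
print(store.find_by_attribute("kMDItemTitle", "Spotlight APIs"))
print(store.find_by_content("metadata"))
store.remove("/docs/notes.txt")
print(store.find_by_content("metadata"))
```

Removing stale entries before every update is what keeps a deleted or rewritten file from showing up in later searches, which is the behaviour the text attributes to Spotlight's integration with the file system.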

4.3 Different ways to examine the file's metadata

There are two broad ways to look into a file's metadata.


MetaData Attributes

Attribute Key          Description
kMDItemTitle           The title of the item
kMDItemAuthors         The author of the content of the file
kMDItemWhereFroms      Describes where the item was obtained from
kMDItemDueDate         Date this item is due
kMDItemCopyright       Copyright owner of the file contents
kMDItemRecipients      Recipients of the item

Table 1: Key MetaData item Example

4.3.1 High Level Language Program

We can write a small high-level language program to extract the metadata information of any file. For this we need the CoreServices framework, which provides the MDItem object. We create the MDItem object from the file's path. MDItem is a wrapper around the file's metadata attributes. The sample code is shown in Listing 1 and its result in Listing 2.

Listing 1: Objective C Program for extracting Meta data Information of Specific File

#import <Foundation/Foundation.h>
#import <CoreServices/CoreServices.h>

// Create the path reference to an existing file on our system
CFStringRef path = CFSTR("/Library/Application Support/Apple/iChat Icons"
                         "/Flags/Pakistan.gif");

// Create the metadata item, which holds the names and values
// of all metadata attributes of the file
MDItemRef item = MDItemCreate(kCFAllocatorDefault, path);

// Copy the names of all metadata attributes of the file
CFArrayRef attributeNames = MDItemCopyAttributeNames(item);

// Load up an NSArray for convenience
NSArray *array = (NSArray *)attributeNames;
NSEnumerator *e = [array objectEnumerator];
id arrayObject;

// Placeholder for the metadata information
NSMutableString *info = [NSMutableString stringWithCapacity:50];
CFTypeRef ref;
while ((arrayObject = [e nextObject])) {
    // Copy the value of this metadata attribute
    ref = MDItemCopyAttribute(item, (CFStringRef)[arrayObject description]);
    // Cast to an NSObject for convenience
    NSObject *tempObject = (NSObject *)ref;
    // Append "name=value" to the mutable string
    [info appendString:[arrayObject description]];
    [info appendString:@"="];
    [info appendString:(tempObject ? [tempObject description] : @"(null)")];
    [info appendString:@"\n"];
    // Release the copied value
    if (ref) CFRelease(ref);
}


Listing 2: Output of the program in Listing 1 and command in Listing 3.

kMDItemDisplayName = Pakistan.gif
kMDItemKind = Graphics Interchange Format (GIF)
kMDItemContentType = com.compuserve.gif
kMDItemPixelCount = 2304
kMDItemOrientation = 0
kMDItemResolutionWidthDPI = 0
kMDItemBitsPerSample = 40
kMDItemResolutionHeightDPI = 0
kMDItemContentCreationDate = 2009-07-29 07:28:14 +0200
kMDItemSupportFileType = ( MDSystemFile )
kMDItemContentModificationDate = 2009-07-29 07:28:14 +0200
kMDItemHasAlphaChannel = 1
kMDItemColorSpace = RGB
kMDItemContentTypeTree = ( "com.compuserve.gif", "public.image", "public.data", "public.item", "public.content" )
kMDItemPixelWidth = 48
kMDItemPixelHeight = 48
kMDItemFSName = Pakistan.gif
kMDItemFSSize = 2625
kMDItemFSCreationDate = 2009-07-29 07:28:14 +0200
kMDItemFSContentChangeDate = 2009-07-29 07:28:14 +0200
kMDItemFSOwnerUserID = 0
kMDItemFSOwnerGroupID = 80
kMDItemFSNodeCount = 0
kMDItemFSInvisible = 0
kMDItemFSTypeCode = 0
kMDItemFSCreatorCode = 0
kMDItemFSFinderFlags = 0
kMDItemFSHasCustomIcon = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery = 0
kMDItemFSLabel = 0

4.3.2 Command Line

There are command-line tools that give access to the metadata information of files and perform queries. As Spotlight exists at the lowest level of the operating system, performing queries from the command line is natural. The mdls command lists the metadata information of a specific file. For example, Listing 3 is the command to extract metadata information, and Listing 2 is the output of that command.

Listing 3: Command Line query for Extracting Metadata Information of specific File.

$ mdls "/Library/Application Support/Apple/iChat Icons/Flags/Pakistan.gif"
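To use the output of mdls from another program, the "name = value" lines can be turned into a dictionary. The following Python sketch is one possible way to do that, under some assumptions: the helper names parse_mdls and read_metadata are hypothetical, all values are kept as strings, and only lists that span several lines inside parentheses are converted to Python lists. The function read_metadata calls the mdls tool and therefore only works on Mac OS X.

```python
import subprocess


def parse_mdls(output):
    """Parse 'name = value' lines; multi-line ( ... ) values become lists."""
    attributes = {}
    lines = iter(output.splitlines())
    for line in lines:
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if value.startswith("(") and not value.endswith(")"):
            # The list items follow on their own lines until a closing ")"
            items = []
            for item_line in lines:
                item = item_line.strip().rstrip(",")
                if item == ")":
                    break
                items.append(item.strip('"'))
            attributes[name] = items
        else:
            attributes[name] = value.strip('"')
    return attributes


def read_metadata(path):
    """Run mdls on a file (Mac OS X only) and return its attributes."""
    result = subprocess.run(["mdls", path], capture_output=True,
                            text=True, check=True)
    return parse_mdls(result.stdout)


# Sample text in the style of Listing 2, used instead of a real mdls call.
sample_output = """kMDItemDisplayName = "Pakistan.gif"
kMDItemContentTypeTree = (
    "com.compuserve.gif",
    "public.image"
)
kMDItemPixelWidth = 48
"""
print(parse_mdls(sample_output))
```

Keeping the parser separate from the subprocess call means the parsing logic can be tried on saved output on any machine.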


So far we have seen how to extract metadata information using the path of a file. Now we create complex queries based on search expressions to retrieve results from the system.

4.4 Spotlight Queries

There are several ways to query the file system. Queries are written as C-like expressions. Let us explore the ways to query the file system.

4.4.1 Finder Tool

We can perform high-level queries through a client application such as the Finder. The application translates the high-level query into an appropriate query expression and also defines the scope of the search. This is one of the easiest ways to create a query. Although we issue a high-level query through the Finder, a complex query is always what actually runs against the Spotlight server. To examine that complex query, search for some file using the Finder and save the search. Then go to the location of the saved search, right-click it and select "Get Info"; there you will see the complex query that was run in response to the high-level query. See Figure 3.

Figure 3: Complex query in response of Finder Tool search expression

4.4.2 Programming Languages

Several languages can use the Spotlight APIs to query the file system. One of those languages is Objective-C. In Section 4.3.1 we saw how to retrieve metadata using the path of a file; in this section we see how to create queries from search expressions. For that purpose we just need a C-like query expression. Listing 4 shows a complex search expression. A query expression can be based on file system attributes, the text content of a file and metadata information. Listing 5 shows the lines of code that create a query from a small expression.


Listing 4: Complex Query.

((kMDItemKind = 'vrge08 contact')) &&
(com_entourage_nickname != '*') &&
(com_microsoft_entourage_spouse = '*') &&
(com_microsoft_entourage_suffix != '*') &&
(com_microsoft_entourage_title != '*') &&
(com_microsoft_entourage_notes != '*') &&
(com_microsoft_entourage_interests != '*')

Listing 5: Lines for creating complex query

MDQueryRef query;
query = MDQueryCreate(kCFAllocatorDefault,
                      CFSTR("kMDItemKeywords == '*Spotlight*'"),
                      NULL, NULL);
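Expressions such as the one in Listing 4 are tedious to type by hand, so it can help to assemble them from smaller pieces. The following Python sketch shows one way to do this; the helper names condition, all_of and any_of are hypothetical, the kind value "PDF Document" is only an example, and the backslash escaping of quotes inside values is an assumption about the query syntax that should be checked against the Spotlight documentation. The resulting string is what would be passed to MDQueryCreate or to a command-line search.

```python
def condition(attribute, value, operator="=="):
    """Format one comparison, e.g. kMDItemTitle == 'Italy'."""
    # Assumption: quotes and backslashes in the value are backslash-escaped.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"{attribute} {operator} '{escaped}'"


def all_of(*conditions):
    """Join conditions with &&, putting each one in parentheses."""
    return " && ".join(f"({c})" for c in conditions)


def any_of(*conditions):
    """Join conditions with ||, putting each one in parentheses."""
    return " || ".join(f"({c})" for c in conditions)


expression = all_of(
    condition("kMDItemKind", "PDF Document"),     # example kind value
    condition("kMDItemKeywords", "*Spotlight*"),  # wildcard match, as in Listing 5
)
print(expression)
```

Putting every condition in parentheses avoids having to reason about operator precedence when && and || groups are nested.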

4.4.3 mdfind Tool

We can also run complex queries from the command line. For that purpose we use the mdfind command. As shown in Listing 6, mdfind finds the files in the file system whose metadata matches the query.

Listing 6: Running Queries from Command Line

$ mdfind "kMDItemTitle == 'Spotlight APIs'"

We have discussed Spotlight and different ways of extracting metadata information from the file system, one of which is to use a programming language. We now build a small graphical user interface application using Xcode 3.2, an application development tool for Mac OS X. Let us see how to develop an application using Xcode [3].

4.5 Xcode 3.2

The Xcode 3.2 IDE is the fastest and easiest way for developers to create applications for Mac OS X. Why do we prefer Xcode over other IDEs? Simply because it provides the best environment for taking advantage of the developer technologies that Apple has put into Mac OS X. Xcode brings together the power of UNIX, high-performance development technologies, and an automated way of using the new Mac technologies. Xcode is well suited to writing code in C, C++ and Objective-C, and also supports migrating code from legacy systems. Xcode 3.2 is the latest version and comes with a set of powerful new visualization tools, a new workspace layout and a new version of GCC. The new workspace layout makes working with code more comfortable, and the new version of the GCC compiler makes applications run faster. In other words, the Xcode IDE provides an elegant and powerful user interface for developing and managing software development projects.

4.5.1 Creating New Project

Follow these steps to create a new project. Figures 4, 5, 6 and 7 illustrate the steps.

• Open Xcode.
• Select "Create a new Xcode project" from the menu (Figure 4).
• Choose a template for your new project: Application, then Cocoa Application (Figure 5).
• Save your project (Figure 6).

Figure 4: Welcome to XCode

Figure 5: New Project

4.5.2 Xcode WorkSpace

The Xcode workspace is made up of the windows we use to develop products with Xcode. These include the text editor window, the project window, the Documentation window, and others. Xcode lets us arrange the components of the project window and specify what documentation the Documentation window shows. When it comes to editing text files, especially source-code files, the Xcode text editor provides many features that make it quick to edit code and access the API reference directly from the editor. For an overview please refer to [10]. See Figure 8.


Figure 6: Save Project

4.5.3 Interface building

Interface Builder [8] is a visual design tool you use to create the user interfaces of your iOS and Mac OS X applications. Using the graphical environment of Interface Builder, you assemble windows, views, controls, menus, and other elements from a library of configurable objects. You arrange these items, set their attributes, establish connections between them, and then save them in a special type of resource file, called a nib file. A nib file stores your objects, including their configuration and layout information, in a format that at runtime can be used to recreate the actual objects. Figures 9, 10, 11 and 12 show the main components of Interface Builder. For an overview of Interface Builder please refer to [8].

4.5.4 Objective C

The Objective-C [9] language is a simple computer language designed to enable sophisticated object-oriented programming. Objective-C is defined as a small but powerful set of extensions to the standard ANSI C language. Its additions to C are mostly based on Smalltalk, one of the first object-oriented programming languages. Objective-C is designed to give C full object-oriented programming capabilities, and to do so in a simple and straightforward way. A good tutorial on Objective-C is [9]. We do our programming in this language, so we need to understand its syntax.

We briefly discuss the two applications that I have developed. The first application searches for a particular file on the basis of a "natural language expression" such as "faheem.jpg", whereas the second application searches on the basis of "key attribute search expressions" such as

"kMDItemKind=pdf && kMDItemTitle=faheem".

We first look at the application based on natural language expressions in the next section; the application based on key attribute expressions is introduced later.


Figure 7: Project Window

4.5.5 A simple Graphical User Interface application based on Predicate

Figure 13 shows the GUI of a small application that extracts the metadata information of specific files matching a natural language search expression, e.g. tiger or italy.jpg. This application performs almost the same function as Spotlight. Here, however, we not only search for files (which is all we can do with Spotlight itself) but also see the meta information associated with them. In this application we cannot perform join queries, because the search query is based on a natural language expression; for that reason we discuss later an application that searches by key attributes. Let us first look at the sample code of the application based on natural language search expressions. We have seen the sample code in Listing 1, and we now see how to turn it into a GUI application. We just need to replace the following line at the top of the program in Listing 1.

Replace the following line of Listing 1

CFStringRef path = CFSTR("/Library/Application Support/Apple/iChat Icons"
                         "/Flags/Pakistan.gif");

with following lines of code and rest of program will be the same.

NSString *expressionString = (NSString *)searchExpression;
// searchExpression is the expression given by the end user
// through some interface

NSUInteger options = (NSCaseInsensitivePredicateOption |
                      NSDiacriticInsensitivePredicateOption);

// Predicate created from the user's search expression
NSPredicate *compPred = [NSComparisonPredicate
    predicateWithLeftExpression:[NSExpression expressionForKeyPath:@"*"]
                rightExpression:[NSExpression expressionForConstantValue:expressionString]
                       modifier:NSDirectPredicateModifier
                           type:NSLikePredicateOperatorType
                        options:options];

// Wrap the predicate in a compound predicate to run against the query
NSPredicate *predicateToRun = [NSCompoundPredicate
    andPredicateWithSubpredicates:[NSArray arrayWithObject:compPred]];

// Set it on the query. If the query is already alive, it updates immediately
[self.query setPredicate:predicateToRun];

// In case the query hasn't started yet, start it
[self.query startQuery];

// self.query is assumed to be an NSMetadataQuery; its results are
// gathered asynchronously
NSUInteger count = [self.query resultCount];

MDItemRef item;
NSString *path;
for (NSUInteger row = 0; row < count; row++) {
    // rest of the code of Listing 1 goes here
}

Figure 8: Project Window Component [10]

Now we just need to bind the event in the GUI so that we can perform some action on it. For example, clicking a button can populate the result of the query. In this code we bind the value of the text field to the search expression, a variable of type NSString. We can then use this expression to search for a file in the system. To gain experience with developing GUI applications, see the official Xcode website [6], which explains how


Figure 9: Interface Builder Layout [8]

we can create a project in Xcode and perform event handling. The application in Figure 13 is very simple. We can develop far more advanced applications (e.g. the Semantic Desktop Tool), as the Spotlight APIs provide a gateway to most of the technologies offered by Mac OS. One point needs to be made clear: we cannot run complex queries with this application, e.g. "select the notes which are associated with some project". As this application is based on natural language search, we cannot search across file formats. This limitation can be overcome by using the Spotlight APIs and extending them with key attributes.

As our motivation is to provide a search engine for the Semantic Desktop application, we need to extend the basic functionality of Spotlight by using and extending the Spotlight APIs. In the next section we discuss the extensions that Spotlight needs.


Figure 10: Library Window [8]

5 Enriching basic Spotlight Technology

Spotlight is an advanced search technology based on metadata information and integrated with the file system. It provides a way to query the file system and makes it possible to extract the desired information. It provides a set of APIs which allow us to personalize our application. In the last section we explored the capabilities of Spotlight and saw that it is more than just a gray box on the screen. We also found some limitations: we cannot search across file formats. For example, "search for a person who works on task A of project B" is not possible with a natural language expression, but with key attribute expressions we can write queries that search across file formats. We can access the file system and execute complex queries by using the Spotlight APIs. More importantly, Tiger does not impose any limitation on the use of the Spotlight APIs. The Spotlight APIs are a strong feature of this technology and could be one of the major steps in the development of the Semantic Desktop Tool. It would not be wrong to say that Spotlight could be a search engine for the Semantic Desktop Tool, as we could use its APIs to extract the information hidden in the file system and use it so that the whole file system is at our fingertips. But can we use the extracted information directly? Of course not. Before we use this metadata information in our proposed tool, we have to enrich the Spotlight APIs. Using Spotlight from the gray box or from the Finder, we cannot execute interesting queries such as joins. If we want to do more than just running


Figure 11: Inspector Window [8]

simple queries, we need to use the Spotlight APIs and enrich their capabilities according to our application's demands.

Spotlight provides several application programming interfaces (APIs) that allow our application to search for files efficiently and quickly based on metadata. The way our application interacts with the search results of a query often depends on the API that we choose. We can use the results according to the needs of our application. The Spotlight APIs can be considered a good framework for the purpose of our project. The possibility of personalizing the indexing process, by extending the metadata information associated with each file or by creating a new index for a new type of file, is a fundamental feature for creating real business applications. In the Semantic Desktop Tool a methodology is defined to extract a conceptual schema from raw data. We believe that with the Spotlight APIs we can retrieve the meta information associated with each file according to this methodology, and then populate a database or ABox from this information. There are two types of search expressions which we can use for querying the file system:

• Natural language search expression: we can query using natural expressions such as person, tiger or italy.jpg.
• Key metadata attribute search expression: we can also query using the metadata attributes associated with the file, such as kMDItemKind='pdf' or kMDItemTitle='Italy'.

By using key attributes we can write more complex queries that search across files, whereas natural language expressions are only useful for queries that do not


Figure 12: Connection Panel [8]

perform any join. In this section we use the Spotlight APIs to extend basic Spotlight technology.

• We have made an application (Section 5.3) for data analysis. It helps to understand the metadata values associated with a specific file, their relationship with other file formats, and so on.
• We cannot run join queries over Spotlight or over the application introduced in Section 4.5.5. The new version of the application (Section 5.3) runs queries using key metadata search expressions such as "kMDItemTitle='Sharif'" rather than natural language search expressions. This application therefore helps us to run join queries over the system.
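To make the idea of a join across file formats concrete, the following Python sketch joins two lists of metadata records on a shared attribute value. It is only an illustration under stated assumptions: the function join, the sample records and their values are hypothetical, the attribute values are simplified to single strings, and in a real application the records would come from Spotlight queries rather than from literals.

```python
def join(left, right, left_key, right_key):
    """Hash join: pair records whose key attributes hold the same value."""
    index = {}
    for record in right:
        if right_key in record:
            index.setdefault(record[right_key], []).append(record)
    return [(l, r) for l in left if left_key in l
            for r in index.get(l[left_key], [])]


# Hypothetical metadata records of two different kinds of files.
emails = [
    {"kMDItemTitle": "Re: Project AAA",
     "kMDItemAuthorEmailAddresses": "anna@example.org"},
    {"kMDItemTitle": "Lunch",
     "kMDItemAuthorEmailAddresses": "bob@example.org"},
]
contacts = [
    {"kMDItemDisplayName": "Anna", "kMDItemEmailAddresses": "anna@example.org"},
    {"kMDItemDisplayName": "Bob", "kMDItemEmailAddresses": "bob@example.org"},
]

# "Who wrote an email about Project AAA?": filter one kind, then join.
project_mails = [m for m in emails if "Project AAA" in m["kMDItemTitle"]]
people = [contact["kMDItemDisplayName"]
          for _, contact in join(project_mails, contacts,
                                 "kMDItemAuthorEmailAddresses",
                                 "kMDItemEmailAddresses")]
print(people)
```

A natural language search can find the email or the contact separately, but only a comparison on key attributes can connect the two, which is why the extended application works with key metadata search expressions.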

In this section we discuss why we enrich the Spotlight APIs according to the needs of the Semantic Desktop Tool. Before discussing the reasons, we need to look at the structure of the data in the schema file, which the Spotlight APIs use to extract meta information according to key attributes, e.g. kMDItemTitle.

5.1 Spotlight Schema

Spotlight organizes the schema of the data in a schema.xml file. This file contains the logical schema that structures the organization of meta information. Tiger provides importers for a variety of common file formats as well as all the important file formats used by Apple's applications, e.g. JPEG, PNG, TIFF and GIF images, PDF, MS Word, email messages, Address Book contacts, MP3, etc.

Sometimes, however, we may need to use a user-defined or unsupported file format in our application. In this case Spotlight requires some help in order to understand the file format and extract information from the file system. To give Spotlight this help, we can provide a metadata importer plug-in [7] with our application that understands the ins and outs of our file formats. See Figure 14. Spotlight importers are associated with the document types to which they belong. When there is a PDF file and we want to extract the metadata associated with it, Spotlight calls the PDF importer, which extracts the information from the system. Note that Spotlight uses the schema file to know which attributes an importer supports.


Figure 13: Simple Application

The Spotlight schema is specified in an XML file called schema.xml [5]. This schema file is provided within the Spotlight importer bundle. It tells Spotlight which attributes the importer supports, lists the attributes that applications should use to provide a preview of the document's metadata, and specifies any custom metadata attributes that our documents require.

The following is the general format of schema.xml [5]:

... ...


Figure 14: Extracting Metadata from files[4]

...


We need to analyze the data before we use it, so we need to understand its structure. The following commands are used to extract the schema on a Mac machine. We need to execute these commands from the terminal.

// The following command prints the schema
$ mdimport -X

// The following command prints the list of importer plug-ins that will be used
$ mdimport -L

// The following command prints all the available attributes
$ mdimport -A

The schema file keeps the structure of the data and defines which key metadata attributes are associated with the importer of a specific document type. We can only extract information according to these attributes. Listing 7 is a short version of the schema.xml file. The whole schema.xml file keeps all the information about the structure of the data for all supported file formats.

Listing 7: small version of Schema.xml

kMDItemParticipants = {
    multivalued = 1;
    name = kMDItemParticipants;
    type = CFString;
};
kMDItemPath = {
    name = kMDItemPath;
    type = CFString;
};
. . . .
"com.microsoft.entourage08.virtual.message" = {
    allattrs = (
        kMDItemTitle,
        "com_microsoft_entourage_recordID",
        kMDItemContentCreationDate,
        "com_microsoft_entourage_messageSent",
        "com_microsoft_entourage_messageReceived",
        "com_microsoft_entourage_priority",
        "com_microsoft_entourage_flag",
        "com_microsoft_entourage_unread",
        kMDItemContentModificationDate,
        kMDItemCoverage,
        kMDItemKeywords,
        kMDItemProjects,
        "com_microsoft_entourage_has_text_content",
        kMDItemTextContent,
        kMDItemAuthors,
        kMDItemRecipients,
        "com_microsoft_entourage_folderID",
        "com_microsoft_entourage_junkLikelihood",
        "com_microsoft_entourage_size",
        "com_microsoft_entourage_newsAccountID",
        "com_microsoft_entourage_accountID",
        "com_microsoft_entourage_repliedTo",
        "com_microsoft_entourage_forwarded",
        "com_microsoft_entourage_redirected",
        "com_microsoft_entourage_ReplyTo",
        "com_microsoft_entourage_ReplyToEmailAddresses",
        "com_microsoft_entourage_toRecipients",
        "com_microsoft_entourage_toEmailAddresses",
        "com_microsoft_entourage_ccRecipients",
        "com_microsoft_entourage_ccEmailAddresses",
        "com_microsoft_entourage_attachments",
        "com_microsoft_entourage_projects",
        "com_microsoft_entourage_categories",
        "com_microsoft_entourage_flagged",
        "com_microsoft_entourage_author_email_addresses",
        "com_microsoft_entourage_recpient_email_addresses",
        "com_microsoft_entourage_isFromMailingList"
    );
    displayattrs = (
        kMDItemContentCreationDate,
        kMDItemAuthors,
        kMDItemRecipients,
        kMDItemCoverage,
        kMDItemContentModificationDate
    );
    name = "com.microsoft.entourage08.virtual.message";
};
"com.microsoft.entourage08.virtual.note" = {
    allattrs = (
        kMDItemTitle,
        "com_microsoft_entourage_recordID",
        kMDItemContentCreationDate,
        kMDItemContentModificationDate,
        "com_microsoft_entourage_has_text_content",
        kMDItemTextContent,
        kMDItemKeywords,
        kMDItemProjects,
        "com_microsoft_entourage_projects",
        "com_microsoft_entourage_categories"
    );
    displayattrs = (
        kMDItemTitle,
        kMDItemContentCreationDate,
        kMDItemContentModificationDate
    );
    name = "com.microsoft.entourage08.virtual.note";
};
. . .

If we analyze the schema file, we find that all the attributes associated with any importer are defined at the start of the file. It also defines whether a specific attribute is single-valued or multivalued, e.g. "kMDItemPath" is single-valued whereas "kMDItemParticipants" is multivalued.
Different importers can share the same key attributes, e.g. "kMDItemPath" is shared among almost all the document types. The schema file also defines the set of attributes associated with a specific importer. So Spotlight needs the schema file with the importer information for querying the file system, and it queries and extracts information according to the structure defined by the importer within the schema.
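To make this structure concrete, the following Python sketch models a fragment of Listing 7 as plain dictionaries and derives the two facts discussed above: whether an attribute is multivalued, and which attributes two document types share. The dictionary layout and the names `ATTRIBUTES`, `TYPES`, `is_multivalued` and `shared_attributes` are assumptions made for illustration and are not part of the Spotlight APIs; only the attribute and type names are taken from Listing 7, and only the kMDItem attributes are included for brevity.

```python
# Illustrative model of a schema fragment; the layout is an assumption,
# the attribute and type names come from Listing 7.
ATTRIBUTES = {
    "kMDItemParticipants": {"type": "CFString", "multivalued": True},
    "kMDItemPath": {"type": "CFString", "multivalued": False},
}

MESSAGE = "com.microsoft.entourage08.virtual.message"
NOTE = "com.microsoft.entourage08.virtual.note"

TYPES = {
    MESSAGE: {
        "allattrs": {
            "kMDItemTitle", "kMDItemContentCreationDate",
            "kMDItemContentModificationDate", "kMDItemCoverage",
            "kMDItemKeywords", "kMDItemProjects", "kMDItemTextContent",
            "kMDItemAuthors", "kMDItemRecipients",
        },
    },
    NOTE: {
        "allattrs": {
            "kMDItemTitle", "kMDItemContentCreationDate",
            "kMDItemContentModificationDate", "kMDItemTextContent",
            "kMDItemKeywords", "kMDItemProjects",
        },
    },
}


def is_multivalued(attribute):
    """Return True if the schema declares the attribute as multivalued."""
    return ATTRIBUTES[attribute]["multivalued"]


def shared_attributes(type_a, type_b):
    """Return the sorted attributes that both document types declare."""
    return sorted(TYPES[type_a]["allattrs"] & TYPES[type_b]["allattrs"])


if __name__ == "__main__":
    print(is_multivalued("kMDItemParticipants"))
    print(shared_attributes(MESSAGE, NOTE))
```

Attributes that appear in the intersection are the natural candidates for queries that span both document types, since both importers populate them.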


Now we know the structure of the attributes and the list of attributes associated with a specific file format. We can use this information to create complex queries. But remember that the data on the file system is in raw form and contains many database anomalies. To make better use of this data we need to enrich it. We need not only simple queries like "select person" but also complex (join) queries like "select persons who work on project A and are associated with task B". We cannot execute join queries directly from the Finder or from Spotlight; we need to use the Spotlight APIs, which give us more power because we can execute complex queries based on key attributes. One issue remains: even though we can run complex queries and get results, those results contain many anomalies that a database designer does not like. We need to bring the database into normal form and remove existing and non-existing nulls. The Semantic Desktop Tool is required to handle these kinds of anomalies, and we propose the Spotlight APIs as a means to handle them at the data level. In conclusion, we can apply the required Semantic Desktop methodology at the data level.

5.2 Beyond Spotlight APIs

For reasons of performance, Spotlight does not organize the data in a more complex way, because the main purpose of the tool is to retrieve the information inside the file system as fast as possible. In our project we want to add more semantics to the data, in order to perform more semantically described queries and to cross information between different sources.

The vision of the semantic desktop can be considered a response to the perceived problems of existing user interfaces. Firstly, computers cannot get a great deal of information about the content of files. For example, suppose one downloads a document by a particular author on a particular subject. Though the document will likely indicate its subject, author, source and possibly copyright information clearly, there is no way for the computer to obtain this information or process it. This means the computer cannot search, filter or otherwise act upon the information as effectively as it otherwise could. This is very much the problem that the Semantic Web is concerned with [11].

Secondly, there is the problem that information stored on a computer can only be accessed or sorted in a way related to its format. For example, on legacy operating systems such as Unix, e-mails are stored separately from files, and both have nothing to do with tasks, notes and planned activities that may be stored in a calendar program, whilst contacts might be stored in yet another program. However, all these forms of information might simultaneously be relevant and necessary for a particular task. Further, even if the data is all stored as part of the file system, it is often accessed with different applications, and even very similar formats may need different programs: for example, PDF, PostScript, Microsoft Word and ASCII files are all opened in different programs despite being essentially the same.
To meet the needs of the semantic desktop, the Spotlight APIs need to be enriched to overcome the incomplete information in the system. We need to fetch data from the system in a way that is complete and follows the methodology defined by the Semantic Desktop team. As the purpose of the Semantic Desktop application is to extract all the metadata from the file system and populate a database and an ontology (ABox), we need to meet the standards of these fields and make our data better organized, quickly accessible and normalized. Take the scenario of an application that, with a single click, populates the ABox and the database after extracting data from the file system. We certainly cannot do this by just catching the results of these queries and populating our database and ABox straight away. We need to capture the data from the result set in such a way that we can create intelligent queries over our ontology or database. This means we need to remove the anomalies and normalize the data; in other words, we need to fetch meta information according to our proposed methodology.

Here we have developed an application that uses the Spotlight APIs to extract information from the file system on the basis of key attributes. It provides us a way to analyze the extracted information, and we can find the relationships among different files. Let us explore the following use case scenario to clarify why we need to query the file system using key attribute expressions rather than natural language expressions.

Use case scenario

Example 1: "Select all the person"
It is possible to run this query over Spotlight, and we do not feel the need for a database or ontology. But what about the following scenario?

Example 2: "Select person who work on task A of project B"
This query looks simple, but it queries across files, which is why we need a join query. It is almost impossible to write a join query as a natural language expression that we could execute directly from Spotlight; instead we need to use key attributes for querying across file formats. We need to enrich the Spotlight APIs to handle data from across file formats, e.g. we need to
• select the persons who work on task A,
• and then refine the result set with a constraint specifying that task A is associated with project B.
What about the following one?

Example 3: "Give me name of all participant of event xx of project yy, who were in the invitation list made by ITS Project manager and replied to his email with the subject 'ABC'"
But wait: are we not using a complex concept here (footnote 3)? Yes, we are. We cannot even run this query over a database straightforwardly, so how could we run it from Spotlight?
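Returning to Example 2, the two-step refinement (select by task, then constrain by project) can be sketched in Python over metadata records that have already been extracted. This is only an illustration under stated assumptions: the records, the field names (`person`, `task_id`, `name`, `project`) and the helper `persons_on_task_of_project` are hypothetical, and a real implementation would obtain such records through the Spotlight APIs.

```python
# Hypothetical records, standing in for metadata extracted from
# two different file formats (assignments and tasks).
ASSIGNMENTS = [
    {"person": "Muhammad", "task_id": 1},
    {"person": "Shahid", "task_id": 2},
    {"person": "Monika", "task_id": 3},
]
TASKS = [
    {"task_id": 1, "name": "A", "project": "B"},
    {"task_id": 2, "name": "A", "project": "C"},
    {"task_id": 3, "name": "D", "project": "B"},
]


def persons_on_task_of_project(assignments, tasks, task_name, project):
    """Join assignments with tasks, then filter by task name and project."""
    tasks_by_id = {task["task_id"]: task for task in tasks}
    # Step 1: select the persons who work on a task with the given name.
    on_task = [
        (a["person"], tasks_by_id[a["task_id"]])
        for a in assignments
        if tasks_by_id[a["task_id"]]["name"] == task_name
    ]
    # Step 2: keep only those whose task belongs to the given project.
    return [person for person, task in on_task if task["project"] == project]


if __name__ == "__main__":
    print(persons_on_task_of_project(ASSIGNMENTS, TASKS, "A", "B"))
```

The join key (`task_id` here) is what a natural language expression cannot express, which is why the query has to be phrased over key attributes.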

Now we have a clear view of why we need to enrich our Spotlight APIs. Moreover, Example 1 still has some limitations, even though we could run this query through Spotlight: the data will not be normalized and well structured, because we are fetching information from raw data. To see why, suppose a person does not have a nickname. He is still a person, which means there will be a null value in the nickname field; the same holds for the spouse field.

Person
PID  Fname     Lname       Birthday    Nickname  Spouse
1    Muhammad  Faheem      20-0-1985   Tiger     Jan
2    Shahid    Ali         12-01-1986  NULL      NULL
3    Monika    Jakubowska  19-04-1984  Moonia    NULL

Table 2: Example of Person table

But we need to remove the null values from the database, as the Semantic Desktop methodology requires this and wants to handle it at a very low level. For that purpose we need to apply a methodology which takes care of extracting the conceptual schema from the raw data and also populates the ABox and the database. In short, we will create a separate table/concept, e.g. person with nickname, which is disjoint from a person without nickname. In this way we can handle the null values.

Footnote 3: For example, Project Manager is a complex concept: a person who is an employee of some company and works on some project with the role of manager. Any person who satisfies these constraints is a Project Manager.

Let us collect the main points of this section. The Spotlight APIs provide us with a means to search our file system through metadata. We can use the Spotlight APIs in our application for extracting data from the system. But before we use them in our Semantic Desktop tool, we need to enrich the Spotlight APIs to handle the limitations of the raw data and to meet the requirements of our application. A methodology is defined by the Semantic Desktop team and can be applied at the data level.
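The null handling described in this section can be illustrated with the rows of Table 2. The Python sketch below is one possible reading, not the Semantic Desktop methodology itself: it moves each optional column into its own side table keyed by PID, so that persons with a nickname are exactly those listed in the nickname table and no NULL has to be stored. The function name `decompose` and the side-table layout are assumptions made for this example.

```python
# Rows of Table 2; None stands for NULL.
PERSONS = [
    {"PID": 1, "Fname": "Muhammad", "Lname": "Faheem",
     "Birthday": "20-0-1985", "Nickname": "Tiger", "Spouse": "Jan"},
    {"PID": 2, "Fname": "Shahid", "Lname": "Ali",
     "Birthday": "12-01-1986", "Nickname": None, "Spouse": None},
    {"PID": 3, "Fname": "Monika", "Lname": "Jakubowska",
     "Birthday": "19-04-1984", "Nickname": "Moonia", "Spouse": None},
]


def decompose(rows, key, optional_columns):
    """Split rows into a base table and one NULL-free table per optional column."""
    # Base table: every row, without the optional columns.
    base = [
        {column: value for column, value in row.items()
         if column not in optional_columns}
        for row in rows
    ]
    # Side tables: only the rows that actually have a value.
    side_tables = {
        column: [
            {key: row[key], column: row[column]}
            for row in rows
            if row[column] is not None
        ]
        for column in optional_columns
    }
    return base, side_tables


if __name__ == "__main__":
    person, extra = decompose(PERSONS, "PID", ("Nickname", "Spouse"))
    print(person)
    print(extra)
```

After the split, a person without a nickname is simply a PID that does not occur in the nickname table, so the two groups are disjoint by construction.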

In the upcoming section we will see a new version of our application, which extracts information from the file system on the basis of key attributes rather than natural-language-like expressions.

5.3 A Graphical User Application based on Key metadata attribute

We have already seen a simple application in Section 4.5.5. In that application we faced limitations when running complex queries such as joins. Figure 15 shows the GUI of an application which extracts the metadata of specific files in response to a complex query composed of key metadata attributes.

For example, when we want to search for a person who has a nickname but no spouse and who works on a project, we write the query using key attributes as follows:

kMDItemKind='vrge08_contact' &&
com_microsoft_entourage_nickname='*' &&
com_microsoft_entourage_spouse!='*' &&
com_microsoft_entourage_project='*'

We cannot run the above query using a natural language expression. In this section we present an application which not only searches for a file in the system but also provides the metadata associated with the file, which helps us in data analysis. After performing data analysis over the extracted metadata, we can see the relationships among different entities. We have seen the sample code in Listing 1. Now we just need to replace the following line at the top of the program in Listing 1.

Replace the following line of Listing 1

CFStringRef path=CFSTR("/Library/Application Support/Apple/iChat\ Icons" "/Flags/Pakistan.gif");

with the following lines of code; the rest of the program stays the same.

MDQueryRef query;

// searchexpression is the expression given by the end user
// through some interface
NSString *querystring = (NSString *)searchexpression;
query = MDQueryCreate(kCFAllocatorDefault, (CFStringRef)querystring, NULL, NULL);

// kMDQuerySynchronous blocks until the initial results are gathered,
// so that the result count read below is meaningful
MDQueryExecute(query, kMDQuerySynchronous);
int count = (int)MDQueryGetResultCount(query);

MDItemRef item;
NSString *path;

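for (int row = 0; row < count; row++) {
    // Assumed loop body: take each result in turn and read its path,
    // then continue with the rest of Listing 1 for this path
    item = (MDItemRef)MDQueryGetResultAtIndex(query, row);
    path = (NSString *)MDItemCopyAttribute(item, kMDItemPath);
}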

Figure 15: Simple Application

5.4 Use Cases and Evaluation

We have seen the application in Section 5.3. Let us evaluate some use cases. The application provides meta information about files in response to a query. We can make use of this information in our application, e.g. for populating a DB or ABox.


Example 1: "Select all the person"
We can run this query from the gray box in the top-right corner of the Mac screen; see Figure 16. We can also run this query from our application; see Figure 17. With our application, however, we not only search for the file but also obtain the metadata associated with it.

Figure 16: Spotlight Query Result

Example 2: "Select all the person who only have nickname but not any other information"
We can run the above query through the new version we have created. The search expression of this query is written as follows:

Search Expression:
kMDItemKind='vrge08_contact' &&
com_microsoft_entourage_nickname='*' &&
com_microsoft_entourage_spouse!='*' &&
com_microsoft_entourage_suffix!='*' &&
com_microsoft_entourage_title!='*' &&
com_microsoft_entourage_notes!='*' &&
com_microsoft_entourage_interests!='*'

We cannot run this query from the gray box, but we can run it from the application; see Figure 18. There are many examples like this. For example, we also cannot run the following complex queries through the Spotlight gray box:
• Tasks associated with a project.
• Persons who are married and have children.
• Persons who work in a company as Manager.

Figure 17: Demo Result with Metadata Information

So, by using the Spotlight APIs, we can execute queries based on key attributes, which helps us to create more interesting queries and get more relevant information. One more point is clear here: we can extract meta information from the system and then use it in our desired application, and it is up to our methodology how we want to use the data and in which form we want to store it. We can apply normalization to this data and do many other things, like removing null values and creating relationships between different classes, such as "person and project classes with the relation person associated with project".
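Search expressions like the one in Example 2 follow a regular pattern, so they can be composed programmatically. The Python sketch below assumes, following the usage in this report, that `attr='*'` selects items where the attribute has some value and `attr!='*'` selects items where it has none. The helper `build_expression` and its parameters are hypothetical names introduced for illustration; they are not part of the Spotlight APIs.

```python
PREFIX = "com_microsoft_entourage_"


def build_expression(kind, present=(), absent=()):
    """Compose a key-attribute search expression.

    present: attributes that must have some value (attr='*')
    absent:  attributes that must have no value   (attr!='*')
    """
    clauses = [f"kMDItemKind='{kind}'"]
    clauses += [f"{attr}='*'" for attr in present]
    clauses += [f"{attr}!='*'" for attr in absent]
    return " && ".join(clauses)


# The expression of Example 2: a contact with a nickname and nothing else.
EXAMPLE_2 = build_expression(
    "vrge08_contact",
    present=[PREFIX + "nickname"],
    absent=[PREFIX + name
            for name in ("spouse", "suffix", "title", "notes", "interests")],
)

if __name__ == "__main__":
    print(EXAMPLE_2)
```

The resulting string is what the application would pass on as the search expression; keeping the attribute lists as data makes it easy to vary which fields must be present or absent.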

6 Conclusion and Future Work

We have seen that Spotlight provides us with a totally new way of working with files. It is tightly integrated with the OS and updates the Spotlight database quickly, which gives it the upper hand over other tools available on the market, such as Google Desktop. Spotlight also provides many application programming interfaces that allow us to search for files based on metadata. We can write queries using key metadata attributes, which gives us a way to extract metadata for files by using join queries. We can then use this metadata to populate a DB or ABox. Research on the Semantic Desktop Tool is ongoing at the KRDB center at the Free University of Bozen-Bolzano. We believe that we can provide a platform (based on a Spotlight wrapper) for extracting information according to the methodology defined by the Semantic Desktop team. Future work requires developing a strategy for extracting a conceptual schema from raw data.


Figure 18: Demo Result with Meta Information

The Semantic Desktop team still has to deal with this in the implementation phase, even though they have defined a methodology for it. We believe that in the future we can develop a way to implement that methodology, which states how to extract a conceptual schema from raw data.


References

[1] 2009 Google. Google desktop. http://desktop.google.com/features.html.
[2] 2010 Apple Inc. Working with spotlight. http://developer.apple.com/macosx/spotlight.html, 2006.
[3] 2010 Apple Inc. Working with xcode. http://developer.apple.com/macosx/xcode2.html, 2006.
[4] 2010 Apple Inc. Extracting metadata from files. http://developer.apple.com/mac/library/documentation/Carbon/Conceptual/MetadataIntro/Concepts/HowDoesItWork.html, 2009.
[5] 2010 Apple Inc. Spotlight importer schema format. http://developer.apple.com/mac/library/documentation/Carbon/Conceptual/MDImporters/Concepts/SchemaRef.html, 2009.
[6] 2010 Apple Inc. A tour of xcode. http://developer.apple.com/mac/library/documentation/DeveloperTools/Conceptual/A_Tour_of_Xcode/000-Introduction/qt_intro.html, 2009.
[7] 2010 Apple Inc. Writing a spotlight importer. http://developer.apple.com/mac/library/documentation/Carbon/Conceptual/MDImporters/Concepts/WritingAnImp.html, 2009.
[8] 2010 Apple Inc. Interface builder user guide. http://developer.apple.com/mac/library/documentation/DeveloperTools/ConceptualIB_UserGuide/Introduction/Introduction.html//apple_ref/doc/uid/TP40006920, 2010.
[9] 2010 Apple Inc. Introduction to objective c programming language. http://developer.apple.com/mac/library/documentation/Cocoa/Conceptual/ObjectiveC/Introduction/introObjectiveC.html, 2010.
[10] 2010 Apple Inc. Xcode workspace guide. http://developer.apple.com/mac/library/documentation/DeveloperTools//Conceptual/XcodeWorkspace/000-Introduction/Introduction.html#//apple_ref/doc/uid/TP40006920, 2010.

[11] 2010. http://en.wikipedia.org/wiki/Semantic_desktop.
