Automatic Bookmark Classification
Total Page:16
File Type:pdf, Size:1020Kb
Automatic Bookmark Classification: A Collaborative Approach Dominik Benz, Karen H. L. Tso, Lars Schmidt-Thieme Computer-based New Media Group (CGNM) Department of Computer Science, University of Freiburg Georges-Kohler-Allee¨ 51, 79110 Freiburg, Germany {dbenz, tso, lst}@informatik.uni-freiburg.de ABSTRACT size of the repository, difficulties in organizing and main- Bookmarks (or Favorites, Hotlists) are a popular strategy taining the hierarchical structure arise, as for example the to relocate interesting websites on the WWW by creating a classification of new bookmarks, i.e. finding or creating an personalized local URL repository. Most current browsers appropriate folder to store them. offer a facility to store and manage bookmarks in a hier- This paper presents a novel approach to automate the archy of folders; though, with growing size, users report- bookmark classification process, aiming at recommending edly have trouble to create and maintain a stable taxon- appropriate folders to a user when filing new bookmarks. omy. This paper presents a novel collaborative approach to There are two basic strategies to solve the problem of how ease bookmark management, especially the “classification” to generate such recommendations: the first one, commonly of new bookmarks into a folder. We propose a methodology referred to as information filtering or content-based filtering to realize the collaborative classification idea of considering [13], draws inferences from the user’s past behaviour. In how similar users have classified a bookmark. A combina- this context, “behaviour” means which bookmarks the user tion of nearest-neighbour-classifiers is used to derive a rec- stored in which folders. ommendation from similar users on where to store a new The second strategy, usually referred to as collaborative bookmark. Additionally, a procedure to generate keyword filtering [13], takes the behaviour of others into account, recommendations is proposed to ease the annotation of new especially of those who displayed similar interests in the bookmarks. A prototype system called CariBo has been im- past. In other words, the basic idea is to find similar users plemented as a plugin of the central bookmark server soft- who have already classified a bookmark, and then to derive ware SiteBar. A case study conducted with real user data recommendations on where the target user could store this supports the validity of the approach. bookmark. The central contribution of this paper is to present a col- laborative classification algorithm for bookmarks. The nov- Categories and Subject Descriptors elty hereby consists of recommending structural information H.3.3 [Information Storage and Retrieval]: Information from similar users. This has scarcely been researched in the Search and Retrieval—Information Filtering; I.2.11 [Arti- context of bookmark classification, where content-based ap- ficial Intelligence]: Distributed Artificial Intelligence—In- proaches prevail. A prototype of a collaborative bookmark telligent agents classification system, CariBo, has been built, and experi- mentation results with this prototype and real user data con- firm that the presented approach can outperform content- Keywords based approaches. WWW, bookmark, classification, collaborative filtering, rec- ommender systems, augmented averaging 2. BOOKMARKS IN GENERAL Studies on web usage in general and bookmark usage sug- 1. INTRODUCTION gest that bookmarks are a very popular method among users The continuing, explosive growth of the WWW strength- to facilitate the access to WWW information; [1] cites a sur- ens its role as a prevalent source of information for scientific vey conducted in 1996 with 6619 web users, where 80% of research as well as everyday work and leisure. Studies on the subjects reported bookmarks as a strategy for locating web usage like [5] reported that revisits are a major part information. 92% of them had a bookmark archive, 37% (58%) of website visits. Bookmarks (or Favorites, Hotlists) had more than 50 bookmarks. are a widely used strategy to relocate sites of interest that al- Cockburn and McKenzie reported an average number of lows the user to create a personalized URL repository, which 184 bookmarks within their 17 subjects, organized in a mean facilitates an easy and fast access to relevant information of 18.1 folders [5]. Both studies confirmed that users tended [1]. Most current browsers support hierarchical bookmark- to have problems with bookmark management, especially ing schemes that enable users to manage their collection when the size of the collection increased. Kanawati and of URLs in a hierarchy of folders. However, with growing Malek categorized the problems into three classes [7]: Copyright is held by the author/owner(s). WWW2006, May 22–26, 2006, Edinburgh, UK. • Resource discovery . The problem of “finding good bookmarks” that match the user’s information needs. This is not a purely basis are [9] (employing a semi-automatic clustering algo- bookmark-specific problem, but corresponds heavily to rithm for reorganization of the bookmark hierarchy) and [8] the problem of “finding interesting websites”. This is (comparing different document classification methods). addressed by a large research community in the area All of the described approaches address different aspects, of recommender systems. but leave out an important source of information, namely to consider the bookmark organization habits and strategies of • Recall similar users. Haase et al. [6] presented a more general The problem of locating an appropriate bookmark at approach how the evolution and management of personal a given time. ontologies can be supported by a collaborative recommen- • Maintenance dation algorithm. The problem of keeping the set of bookmarks up-to- date and well-organized; difficulties hereby arise from 4. COLLABORATIVE APPROACH discovering broken links, modifying the organization Pemberton et al. point out that the basic idea of collab- scheme due to changes in personal interest, and cre- orative filtering is “to recruit others to act as our filtering ating and maintaining the taxonomy implied by the agents on the assumption that they are our peers, i.e. like bookmark folder hierarchy. The classification of new us in tastes and judgements of quality” [12]. For the case of bookmarks also belongs to this category. bookmark management, one could replace “filtering agents” Abrams et al. point out that the crucial tradeoff in book- with “classification agents” or “annotation agents”. Dif- marking is between organization costs and future benefit: ferent groups of people obviously have different needs and “Users must weigh the cost of organizing bookmarks against strategies to organize and annotate bookmarks belonging to the expected gains”[1]. Roughly half of their subjects turned a certain category. A computer scientist for example might out to be “sporadic filers”, i.e. users who occasionally sched- store a bookmark about web development with PHP in a rel- ule reorganization sessions when their bookmark repository atively sophisticated hierarchy like development > web de- became too complex. This task is generally reported to be velopment > languages > PHP. A sales consultant however time-consuming and tedious. Among others, their implica- would probably file the same bookmark in a less differenti- tions for the design of a (possibly shared) bookmark man- ated organization scheme, possibly something like marketing agement system include to “provide users with an immediate > websites. Analogously, the annotations that these both filing mechanism when creating a bookmark”. We argue that persons would use for this website will in all probability dif- using a collaborative classification algorithm for this purpose fer. The computer scientist might annotate the PHP page is a sensible choice. with something like “dynamic, script language, LAMPP”, whereas one could imagine annotations like “advanced web- design, programming, webserver” for the sales consultant. 3. RELATED WORK Consequently, having a look in our peer group, i.e. people As bookmarking is one of the most commonly used fea- who are interested and engaged in similar topics as we are, tures of web browsers, there is a vast number of programs is highly probable to give us valuable information how to and tools with the purpose to alleviate different aspects of classify and annotate our own bookmarks. Invoking a public bookmark management. The majority of them can be as- directory (like Yahoo or the Open Directory Project) for signed to the category “centrally store and browse”, whereby this purpose is an alternative. But those public taxonomies the core benefit is to make bookmarks available when the are apparently either too detailed or not detailed enough. user moves to another physical machine. This concept is ex- For the computer scientist, Yahoo’s categories in the area of tended in some cases by making bookmarks shareable with webdesign might be not fine-grained enough, whereby they other users. Delicious (http://www.del.icio.us) for example are much too detailed for our sales-consultant. Nevertheless, is a popular online service which transfers the usual client public directories constitute a source of information worth side bookmarking mechanism onto a central server to en- to be consulted in the absence of other alternatives. Table able roaming. This has become known as social bookmarking 1 gives an overview of the conditions for different types of and has gained popularity recently. Furthermore, the book- recommendations. marks can be tagged