On Webfeed Aggregators and Social Navigation

Brian M. Dennis
Computer Science Department, McCormick School of Engineering
New Media Program, Medill School of Journalism
Northwestern University
September 22, 2004

On the Web, more and more frequently updating information sources publish changes in a small set of ad hoc, de facto, XML based formats which I’ll collectively term webfeeds. These webfeeds enable applications, webfeed aggregators, to automatically monitor, retrieve, and present the dynamic information. This makes it easy for users to track a large number of web information sources.

Webfeed aggregators, though increasing in popularity, remain simplistic in helping end users make sense of a pool of feeds, being at the level of sophistication of e-mail or USENET news readers. The goal of my NusEye project is to experiment with the application of social network analysis techniques to support social navigation [1] in helping users make sense of a torrent of information.

Paralleling the growth of weblogs as a self-publishing platform has been the rising popularity of attendant content syndication and aggregation tools. At the producer end, most weblog tools support de facto syndication standards (RSS [2], Atom [3]) out of the box. For content consumers, aggregation applications and services ease the task of staying on top of a large number of weblogs. The syndication mechanisms are not strictly weblog based either. Traditional news producers such as The New York Times, Yahoo/AP News, and ABC also participate in this ecology. As well, different forms of dynamic information, such as the results of standing search queries or wiki page changes, are being published as webfeeds. Current trends seem to indicate that these formats are becoming dominant for distributing change information on the Web, and thus that aggregators are becoming central in monitoring and managing dynamic information on the Web.

Webfeed aggregation will become an increasingly interesting venue for CSCW researchers as more users and more content sources join the webfeed ecology. Social network analysis and social navigation techniques will be useful in helping users deal with information overload in this environment.

Applying Social Network Analysis Techniques to Public Blogrolls

Recently, Dave Winer, arguably the father of the current syndication infrastructure, implemented a Web service that collects feed subscription information from self identifying users. Called “Share Your OPML” [4] (SYOPML), after the file format for describing subscriptions, the service has induced hundreds of users to upload lists of their subscribed feeds. A metaindex of the lists that are publicly available is published by the SYOPML service. I have written a simple robot that downloads the SYOPML index, checks for new and updated subscriptions, and maintains a database of these public subscriptions. The results of these crawls will be made available to other researchers.

A key observation is that a large number of syndicated content readers are openly providing subscription information. SYOPML not only has a number of weblog publishers but a significant number of “just plain folks”. This is in contrast to other popular socially based applications such as e-mail, instant messaging, or USENET, where it is extremely difficult to get social relation information without corporate assistance. The effect is not limited to SYOPML. The popular Web hosted aggregator Bloglines allows users to make their blogrolls publicly available and many users have done so. Similarly, the del.icio.us social bookmarking service [5] allows users to subscribe to other users and to tags, which are ad hoc labelings of URLs. del.icio.us subscriptions are publicly available and easily discoverable.

Subjecting such public data to social network analysis techniques can lead to a better understanding of how subscribers, feeds, and content interrelate. Such analysis can also support providing social clues within webfeed aggregators. In contrast to easily calculated statistics (e.g. node degree distributions) that lead to simplistic results (e.g. the graph exhibits a power law of coefficient gamma), network analysis can provide subtler insights. Using the collected subscription data, I have generated corresponding social network data and examined it through a combination of standard network analysis tools (Pajek [6]) and hand written code.

The SYOPML data can be used to construct what is known as an affiliation network. In an affiliation network, nodes are separated into two classes, actors and events. For my purposes, subscribers are the actors and news feeds the events. Under this definition, the network graph is bipartite. Sociology researchers such as Wellman [7], Borgatti and Everett [8], Faust [9], and others have developed a wealth of techniques for interpreting and analyzing such affiliation networks.

Faust describes a number of centrality measures appropriate for analyzing affiliation networks, including degree, eigenvector, closeness, betweenness, and flow betweenness. Besides centrality measures, the network can be examined for particular roles and positions. Roles are patterns of network structure that actors are embedded in. Positions are partitions of the graph nodes based upon similarity of the nodes’ local network structure.

A challenge for this work is that the network analysis software known to me for discovering roles and positions (UCINET, BLOCKS, Pajek) is unusable for graphs on the scale of our network: over 900 subscribers, and over 29000 webfeeds as of this writing. Note that such numbers are relatively small for a reasonably popular Web based service.

To surmount these obstacles, I have taken two approaches to make the analysis feasible. First, I have employed the CLUTO [10] clustering toolkit to find interesting partitions of the network nodes, looking for significant network positions. Each node is assigned a number of network based attributes. Then this data is processed using the CLUTO toolkit. CLUTO uses optimization based clustering in partitional, agglomerative, and graph based schemes. Since CLUTO is designed for high dimensional datasets and large numbers of instances, the toolkit generates partitions on a reasonable time scale.

Second, examining the 1-mode versions of the actor and event networks makes the domination of a small number of feeds apparent. The top twenty feeds, ranked by number of subscribers, reach over ninety percent of the community. Removing these twenty feeds allows tools like Pajek to start revealing hidden clusters within the networks.

Our initial efforts indicate that fine grained social clues can be teased out of the network data. How might these clues be employed for improved aggregation services? This information can be used to augment content based analysis of feeds. I have prototyped some visualizations of content clustering applied to a collection of news feeds. This is done using CLUTO and pycluster [11], a toolkit which implements a technique that lends itself to visual display, Self-Organizing Maps [12]. The clustering results were annotated with information from our network analysis of the SYOPML affiliation network.

Despite the fact that members of this community rarely interact with each other directly, there is a social structure of syndication feeds that can be discerned. This structure can provide social navigation clues to assist users in dealing with information overload. Such clues could be incorporated into feed aggregators and increasingly important “meta-weblog services”, which appear to use rather unsophisticated network analysis techniques.

NusEye

In a recent paper [13], my graduate student Azzari Caillier Jarrett and I presented the application of social network analysis, along with graph visualization and interaction, to navigating syndicated web content, a.k.a. webfeeds. A key approach was to apply network analysis to content item and content source relationships in addition to analysis of traditional human networks. Three social networks were used to generate interactive graphs in particularly useful visual styles. The results of a small initial user study were presented, indicating initial promise for our approach.

One serious criticism of our approach is that network visualization may not be the most effective means by which to present social navigation clues. First, as Herman et al. [14] point out, network visualization is appropriate for certain cognitive tasks, which may not be applicable in this do-

• We know that every webfeed aggregator understands webfeeds. Thus our services will automatically support any aggregator, regardless of platform.
• Services deploy incrementally, upgrade in place and instantaneously propagate to all users.
• We can do content analysis of the subscribed webfeeds alongside the social network analysis.
• We can provide web based collaboration tools, à la del.icio.us, centered around webfeed subscriptions.
• Due to centralization, we can easily track the popularity and usage of various features that are provided to end users.

In short, we are building infrastructure that allows us to run interesting user experiments with social navigation mechanisms in the webfeed ecology. We anticipate network analysis techniques to be a core part of the backend of our services.

The only major limitation of this approach is the very limited “user interface” that webfeed formats provide. While many desktop aggregators
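
The affiliation-network steps described under "Applying Social Network Analysis Techniques to Public Blogrolls" (counting subscribers per feed, measuring how much of the community the top feeds reach, removing them, and taking a 1-mode projection) can be illustrated with a minimal Python sketch. This is not the code used in the project: the toy `subscriptions` data, the feed labels, and all function names are hypothetical, and the projection shown is a plain shared-feed count rather than any particular method from the cited literature.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: subscriber (actor) -> set of subscribed feeds (events).
subscriptions = {
    "alice": {"feed:a", "feed:b", "feed:c"},
    "bob": {"feed:a", "feed:c"},
    "carol": {"feed:a", "feed:d"},
    "dave": {"feed:d", "feed:e"},
}


def feed_degrees(subs):
    """Count subscribers per feed (the degree of each event node)."""
    return Counter(feed for feeds in subs.values() for feed in feeds)


def coverage(subs, feeds):
    """Fraction of subscribers who subscribe to at least one of `feeds`."""
    feeds = set(feeds)
    reached = sum(1 for own in subs.values() if own & feeds)
    return reached / len(subs)


def drop_top_feeds(subs, k):
    """Return a copy of the network without the k most-subscribed feeds."""
    top = {feed for feed, _ in feed_degrees(subs).most_common(k)}
    return {user: feeds - top for user, feeds in subs.items()}


def project_subscribers(subs):
    """1-mode projection: weight each subscriber pair by shared feed count."""
    weights = {}
    for u, v in combinations(sorted(subs), 2):
        shared = len(subs[u] & subs[v])
        if shared:
            weights[(u, v)] = shared
    return weights


top_feed = [feed for feed, _ in feed_degrees(subscriptions).most_common(1)]
reach = coverage(subscriptions, top_feed)          # share of users reached
full = project_subscribers(subscriptions)          # dominated by feed:a
pruned = project_subscribers(drop_top_feeds(subscriptions, 1))
```

In this toy data the single most popular feed ties three of the four subscribers together; once it is dropped, the projection falls apart into two separate pairs, which is the kind of hidden cluster structure the pruning step is meant to expose. The pairwise loop is quadratic in the number of subscribers, which is acceptable at the scale of roughly 900 subscribers but would need rethinking for much larger services.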
