On Webfeed Aggregators and Social Navigation

Brian M. Dennis Computer Science Department, McCormick School of Engineering New Media Program, Medill School of Northwestern University [email protected]

September 22, 2004

On the Web, more and more frequenlty updating mats becoming dominant for distributing change in- information sources publish changes in a small set formation on the Web, and thus aggregators becom- of ad hoc, de facto, XML based formats which I’ll ing central in monitoring and managing dynamic in- collectively term webfeeds. These webfeeds enable formation on the Web. applications, webfeed aggregators, to automatically Webfeed aggregation will become an increasingly monitor, retrieve, and present the dynamic informa- interesting venue for CSCW researchers as more tion. This makes it easy for users to track a large users and more content sources join the webfeed number of web information sources. ecology. Social network analysis and social naviga- Webfeed aggregators, increasing in popularity, are tion techniques will be useful in helping users deal simplistic in helping end users make sense of a pool with information overload in this environment. of feeds, being at the sophistication of e-mail or news readers. The goal of my NusEye Applying Social Network Analysis Tech- project is to experiment with the application of social niques to Public Blogrolls network analysis techniques to support social navi- gation [1] in helping users makes sense of a torrent Recently, , arguably the father of the cur- of information rent syndication infrastructure, implemented a Web Paralleling the growth of weblogs as a self pub- service that collects feed subscription information lishing platform, has been the rising popularity of from self identifying users. Called “Share Your attendant content syndication and aggregation tools. OPML” [4] (SYOPML), after the file format for de- At the producer end, most weblog tools support de scribing subscriptions, the service has induced hun- facto syndication standards (RSS [2], [3]) out dreds of users to upload lists of their subscribed of the box. For content consumers, aggregation ap- feeds. A metaindex of the lists that are publicly avail- plications and services ease the task of staying on able is published by the SYOPML service. I have top of a large number of weblogs. The syndica- written a simple robot that downloads the SYOPML tion mechanisms are not strictly weblog based either. index, checks for new and updated subscriptions, and Traditional news producers such as The New York maintains a database of these public subscriptions. Times, Yahoo/AP News, and ABC also participate The results of these crawls will be made available to in this ecology. As well, different forms of dynamic other researchers. information, such as the results of standing search A key observation is that a large number of syndi- queries or wiki page changes, are being published as cated content readers are openly providing subscrip- webfeeds. Current trends seem indicate these for- tion information. SYOPML not only has a number of

1 weblog publishers but a significant number of “just A challenge for this work is that the network anal- plain folks”. This is in contrast to other popular so- ysis software known to me for discovering roles and cially based applications such as e-mail, instant mes- positions (UCINET, BLOCKS, Pajek) is unusable saging, or USENET, where it is extremely difficult for graphs on the scale of our network: over 900 sub- to get social relation information without corporate scribers, and over 29000 webfeeds as of this writing. assistance. The effect is not limited to SYOPML. Note that such numbers are relatively small for a rea- The popular Web hosted aggregator Bloglines al- sonably popular Web based service. lows users to make their blogrolls publicly avail- To surmount these obstacles, I have taken two able and many users have done so. Similarly, the approaches to make the analysis feasible. First, I del.icio.us service [5] al- have employed the CLUTO [10] clustering toolkit lows users to subscribe to other users and tags, ad to find interesting partitions of the network nodes, hoc labelings of urls. del.icio.us subscriptions looking for significant network positions. Each node are publicly available and easily discoverable. is assigned a number of network based attributes Subjecting such public data to social network such. Then this data is processed using the CLUTO analysis techniques can lead to a better understand- toolkit. CLUTO uses optimization based cluster- ing of how subscribers, feeds, and content interre- ing in partitional, agglomerative, and graph based late. Such analysis can also support providing so- schemes. Since CLUTO is designed for high di- cial clues within webfeed aggregators. In contrast mensional datasets and large numbers of instances, to easily calculated statistics (e.g. node degree dis- the toolkit generates partitions on a reasonable time tributions) that lead to simplistic results (e.g. the scale. graph exhibits a power law of coefficient gamma), Second, examining the 1-mode versions of the ac- network analysis can provide subtler insights. Us- tors and events network indicates the domination of ing the collected subscription data, I have generated a small number of feeds becomes apparent. The top corresponding social network data and examined it twenty feeds, ranked by number of subscribers, reach through a combination of standard network analysis over ninety percent of the community. Removing tools (Pajek [6]) and hand written code. these twenty feeds allows tools like Pajek to start re- The SYOPML data can be used to construct what vealing hidden clusters within the networks. is known as an affiliation network. In an affiliation Our initial efforts indicate that fine grained social network, nodes are separated into two classes, ac- clues can be teased out of the network data. How tors and events. For my purposes, subscribers are might these clues be employed for improved aggre- the actors and news feeds the events. Under this gation services? This information can be used to definition, the network graph is bipartite. Sociol- augment content based analysis of feed. I have pro- ogy researchers such as Wellman [7] , Borgatti and toyped some visualizations of content clustering ap- Everett [8], Faust [9] and others have developed a plied to a collection of news feeds. This is done using wealth of techniques for interpreting and analyzing CLUTO and pycluster [11], a toolkit which imple- such affiliation networks. ments a technique that lends itself to visual display, Faust describes a number of centrality measures Self-Organizing Maps [12] . The clustering results appropriate for analyzing affiliation networks, in- were annotated with information from our network cluding degree, eigenvector, closeness, betweenness, analysis of the SYOPML affiliation network. and flow betweenness. Besides centrality measures, Despite the fact that members of this community the network can be examined for particular roles and rarely interact with each other directly, there is a so- positions. Roles are patterns of network structure cial structure of syndication feeds that can be dis- that actors are embedded in. Positions are partitions cerned. This structure can provide social navigation of the graph nodes based upon similiarity of nodes’ clues to assist users in dealing with information over- local network structure. load. Such clues could be incorporated into feed ag-

2 gregators and increasingly important “meta-weblog • We know that every webfeed aggregator under- services”, which appear to use rather unsophisticated stands webfeeds. Thus our services will auto- network analysis techniques. matically support any aggregator, regardless of platform. NusEye • Services deploy incrementally, upgrade in place and instantaneously propagate to all users. In a recent paper [13], authored with my graduate student Azzari Caillier Jarrett, the application of so- • We can do content analysis of the subscribed cial network analysis, along with graph visualization webfeeds alongside the social network analysis. and interaction, for navigating syndicated web con- • We can provide web based collaboration tools, tent, a.k.a webfeeds, was presented. A key approach ala del.icio.us centered around webfeed was to apply network analysis to content item and subscriptions. content source relationsips in addition to analysis of traditional human networks. Three social networks • Due to centralization, we can easily track the were used to to generate interactive graphs in par- popularity and usage of various features that are ticularly useful visual styles. The results of a small provided to end users. initial user study were presented indicating initial In short, we are building infrastructure that allows us promise for our approach. to run interesting user experiments with social navi- One serious criticism of our approach is that net- gation mechanisms in the webfeed ecology. We an- work visualization may not be the most effective ticipate network analysis techniques to be a core part means by which to present social navigation clues. of the backend of our services. First, as Herman [14] et. al. point out, net- The only major limitation of this approach is the work visualization is appropriate for certain cogni- very limited “user interface” that webfeed formats tive tasks, which may not be applicable in this do- provide. While many desktop aggregators embed a main. Network visualizations can also take up sig- Web browser control, across a broad range of plat- nificant screen real estate, which is an issue in that forms, we can probably only rely on controlled ren- we intended these displays to be secondary informa- dering of text, images and links. In fact, links may be tion displays. Further, for complex networks the vi- our only means of interacting with the user, although sualizations are difficult to construct such that they there is some hope that usage of JavaScript may be are comprehensible. viable. Still, Web applications such as GMail, Flickr We are committed to further experiments to deter- (acknowledging the heavy use of Flash in Flickr), mine the feasibility of using such network displays and del.icio.us indicate that popular, interest- because, despite our minimal efforts, users were fa- ing, and social applications can be constructed out of vorably attracted to such visualizations. Thus, if the those limited elements. above criticisms are invalid or can be overcome, we At worst, we will discover that desktop level in- may see a high rate of adoption. In short, we would teraction is needed to make the use of these so- not have an initial selling hurdle to engage users. cial mechanisms viable. Our suspicion though is In parallel with these efforts, we are building that a small amount of social navigation information, hosted analysis services that actually deliver their straightforwardly displayed can have significant util- results as webfeeds. The proposition is that users ity and impact. provide us with a blogroll, our service analyzes the blogroll individually, and within the social context Conclusion of other blogrolls we have access too. We then pro- vide a custom webfeed for the user to monitor. This Webfeed aggregation will become an increasingly approach provides a number of benefits: interesting venue for CSCW researchers as more

3 users and more content sources join the webfeed [9] K. Faust, “Centrality in affiliation networks,” ecology. Social network analysis and social naviga- Social Networks, vol. 19, pp. 157 – 191, Apr. tion techniques will be useful in helping users deal 1997. with information overload in this environment. My NusEye project has made an initial foray into adding [10] G. Karypis, “Cluto: Software package for social navigation to webfeed aggregation, and we clustering high dimensional data.” http://www- will be continuing to pursue this work. I would be users.cs.umn.edu/ karypis/cluto/, Nov. 2003. glad to join with other CSCW researchers to share [11] M. de Hoon, S. Imoto, and ideas, techniques, and approaches. S. Miyano, “The C clustering library.” http://bonsai.ims.u-tokyo.ac.jp/ mde- References hoon/software/cluster/software.htm, 2004. [12] T. Kohonen, Self Organizing Maps. Spring- [1] K. Ho¨ok,¨ D. Benyon, and A. Munro, eds., De- Verlag, 1997. signing Information Spaces: The Social Navi- gation Approach. Spinger-Verlag, 2003. [13] B. M. Dennis and A. C. Jarrett, “Nuseye: Vi- sualizing network structure to support naviga- [2] D. Winer, “RSS 2.0 specification.” tion of aggregated content,” in Proceedings of http://blogs.law.harvard.edu/tech/rss, HICSS-38, 2005. Dec. 2003. [14] I. Herman, G. Melancon, and M. S. Marshall, [3] M. Nottingham, “The atom syn- “Graph visualization and navigation in infor- dication forma 0.3 (pre-draft).” mation visualization: A survey,” IEEE Trans- http://www.mnot.net/drafts/ actions on Visualization and Computer Graph- draft-nottingham-atom-format-02.html, ics, vol. 6, pp. 24 – 43, jan 2000. Dec. 2003.

[4] D. Winer, ““Share Your OPML!” a com- mons for sharing outlines, feeds, taxonomy.” http://feeds.scripting.com/, Jan. 2004.

[5] J. Schachter, “del.icio.us, a social bookmarking service.” http://del.icio.us/about, 2003.

[6] V. Batagelj and A. Mrvar, “Pajek - program for large network analysis,” Connections, vol. 21, pp. 47 – 57, Jan. 1999.

[7] B. Wellman and S. Berkowitz, eds., Social Structures: A Network Approach, ch. Think- ing Structurally. Cambridge University Press, 1988.

[8] S. Borgatti and M. Everett, “Network analy- sis of 2-mode data,” Social Networks, vol. 19, pp. 243 – 269, Aug. 1997.

4