On Webfeed Aggregators and Social Navigation

Brian M. Dennis
Computer Science Department, McCormick School of Engineering
New Media Program, Medill School of Journalism
Northwestern University
[email protected]
September 22, 2004

On the Web, more and more frequently updating information sources publish changes in a small set of ad hoc, de facto, XML based formats which I'll collectively term webfeeds. These webfeeds enable applications, webfeed aggregators, to automatically monitor, retrieve, and present the dynamic information. This makes it easy for users to track a large number of web information sources.

Webfeed aggregators, although increasing in popularity, remain simplistic in helping end users make sense of a pool of feeds; they are at the level of sophistication of e-mail or USENET news readers. The goal of my NusEye project is to experiment with the application of social network analysis techniques to support social navigation [1] in helping users make sense of a torrent of information.

Paralleling the growth of weblogs as a self publishing platform has been the rising popularity of attendant content syndication and aggregation tools. At the producer end, most weblog tools support de facto syndication standards (RSS [2], Atom [3]) out of the box. For content consumers, aggregation applications and services ease the task of staying on top of a large number of weblogs. The syndication mechanisms are not strictly weblog based either. Traditional news producers such as The New York Times, Yahoo/AP News, and ABC also participate in this ecology. As well, different forms of dynamic information, such as the results of standing search queries or wiki page changes, are being published as webfeeds. Current trends seem to indicate that these formats are becoming dominant for distributing change information on the Web, and thus that aggregators are becoming central in monitoring and managing dynamic information on the Web.

Webfeed aggregation will become an increasingly interesting venue for CSCW researchers as more users and more content sources join the webfeed ecology. Social network analysis and social navigation techniques will be useful in helping users deal with information overload in this environment.

Applying Social Network Analysis Techniques to Public Blogrolls

Recently, Dave Winer, arguably the father of the current syndication infrastructure, implemented a Web service that collects feed subscription information from self identifying users. Called "Share Your OPML" [4] (SYOPML), after the file format for describing subscriptions, the service has induced hundreds of users to upload lists of their subscribed feeds. A metaindex of the lists that are publicly available is published by the SYOPML service. I have written a simple robot that downloads the SYOPML index, checks for new and updated subscriptions, and maintains a database of these public subscriptions. The results of these crawls will be made available to other researchers.

A key observation is that a large number of syndicated content readers are openly providing subscription information. SYOPML has not only a number of weblog publishers but also a significant number of "just plain folks". This is in contrast to other popular socially based applications such as e-mail, instant messaging, or USENET, where it is extremely difficult to get social relation information without corporate assistance. The effect is not limited to SYOPML. The popular Web hosted aggregator Bloglines allows users to make their blogrolls publicly available and many users have done so. Similarly, the del.icio.us social bookmarking service [5] allows users to subscribe to other users and to tags, which are ad hoc labelings of URLs. del.icio.us subscriptions are publicly available and easily discoverable.

Subjecting such public data to social network analysis techniques can lead to a better understanding of how subscribers, feeds, and content interrelate. Such analysis can also support providing social clues within webfeed aggregators. In contrast to easily calculated statistics (e.g. node degree distributions) that lead to simplistic results (e.g. the graph exhibits a power law of coefficient gamma), network analysis can provide subtler insights. Using the collected subscription data, I have generated corresponding social network data and examined it through a combination of standard network analysis tools (Pajek [6]) and hand written code.

The SYOPML data can be used to construct what is known as an affiliation network. In an affiliation network, nodes are separated into two classes, actors and events. For my purposes, subscribers are the actors and news feeds the events. Under this definition, the network graph is bipartite. Sociology researchers such as Wellman [7], Borgatti and Everett [8], Faust [9] and others have developed a wealth of techniques for interpreting and analyzing such affiliation networks.

Faust describes a number of centrality measures appropriate for analyzing affiliation networks, including degree, eigenvector, closeness, betweenness, and flow betweenness. Besides centrality measures, the network can be examined for particular roles and positions. Roles are patterns of network structure that actors are embedded in. Positions are partitions of the graph nodes based upon similarity of the nodes' local network structure.

A challenge for this work is that the network analysis software known to me for discovering roles and positions (UCINET, BLOCKS, Pajek) is unusable for graphs on the scale of our network: over 900 subscribers and over 29000 webfeeds as of this writing. Note that such numbers are relatively small for a reasonably popular Web based service.

To surmount these obstacles, I have taken two approaches to make the analysis feasible. First, I have employed the CLUTO [10] clustering toolkit to find interesting partitions of the network nodes, looking for significant network positions. Each node is assigned a number of network based attributes, and this data is then processed using the CLUTO toolkit. CLUTO uses optimization based clustering in partitional, agglomerative, and graph based schemes. Since CLUTO is designed for high dimensional datasets and large numbers of instances, the toolkit generates partitions on a reasonable time scale.

Second, examining the 1-mode versions of the actors and events network reveals the domination of a small number of feeds. The top twenty feeds, ranked by number of subscribers, reach over ninety percent of the community. Removing these twenty feeds allows tools like Pajek to start revealing hidden clusters within the networks.

Our initial efforts indicate that fine grained social clues can be teased out of the network data. How might these clues be employed for improved aggregation services? This information can be used to augment content based analysis of feeds. I have prototyped some visualizations of content clustering applied to a collection of news feeds. This is done using CLUTO and pycluster [11], a toolkit which implements a technique that lends itself to visual display, Self-Organizing Maps [12]. The clustering results were annotated with information from our network analysis of the SYOPML affiliation network.

Despite the fact that members of this community rarely interact with each other directly, there is a social structure of syndication feeds that can be discerned. This structure can provide social navigation clues to assist users in dealing with information overload. Such clues could be incorporated into feed aggregators and into increasingly important "meta-weblog services", which appear to use rather unsophisticated network analysis techniques.

NusEye

In a recent paper [13], my graduate student Azzari Caillier Jarrett and I presented the application of social network analysis, along with graph visualization and interaction, to navigating syndicated web content, a.k.a. webfeeds. A key approach was to apply network analysis to content item and content source relationships in addition to analysis of traditional human networks. Three social networks were used to generate interactive graphs in particularly useful visual styles. The results of a small initial user study were presented, indicating initial promise for our approach.

One serious criticism of our approach is that network visualization may not be the most effective means by which to present social navigation clues. First, as Herman et al. [14] point out, network visualization is appropriate for certain cognitive tasks, which may not be applicable in this do-

• We know that every webfeed aggregator understands webfeeds. Thus our services will automatically support any aggregator, regardless of platform.
• Services deploy incrementally, upgrade in place, and instantaneously propagate to all users.
• We can do content analysis of the subscribed webfeeds alongside the social network analysis.
• We can provide web based collaboration tools, à la del.icio.us, centered around webfeed subscriptions.
• Due to centralization, we can easily track the popularity and usage of various features that are provided to end users.

In short, we are building infrastructure that allows us to run interesting user experiments with social navigation mechanisms in the webfeed ecology. We anticipate that network analysis techniques will be a core part of the backend of our services.

The only major limitation of this approach is the very limited "user interface" that webfeed formats provide. While many desktop aggregators
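
The affiliation network analysis described in the section on public blogrolls can be illustrated with a small sketch. It is not the robot or the Pajek/CLUTO pipeline used in this work. It assumes a toy set of OPML subscription lists in which each feed is an outline element carrying an xmlUrl attribute, and all subscriber names, feed URLs, and function names are hypothetical. It shows how subscribers (actors) and feeds (events) form a bipartite structure, how feed degree and the reach of the top-ranked feeds might be computed, and how removing a dominant feed changes the 1-mode subscriber network.

```python
from collections import Counter
from itertools import combinations
import xml.etree.ElementTree as ET

# Hypothetical OPML subscription lists, one per subscriber (toy data).
OPML_BY_SUBSCRIBER = {
    "alice": """<opml version="1.0"><body>
        <outline text="News" xmlUrl="http://example.org/news.xml"/>
        <outline text="Wiki" xmlUrl="http://example.org/wiki.xml"/>
    </body></opml>""",
    "bob": """<opml version="1.0"><body>
        <outline text="News" xmlUrl="http://example.org/news.xml"/>
        <outline text="Jazz" xmlUrl="http://example.org/jazz.xml"/>
    </body></opml>""",
    "carol": """<opml version="1.0"><body>
        <outline text="News" xmlUrl="http://example.org/news.xml"/>
        <outline text="Jazz" xmlUrl="http://example.org/jazz.xml"/>
    </body></opml>""",
    "dave": """<opml version="1.0"><body>
        <outline text="Knit" xmlUrl="http://example.org/knit.xml"/>
    </body></opml>""",
}


def feeds_in_opml(opml_text):
    """Return the set of feed URLs (xmlUrl attributes) in an OPML document."""
    root = ET.fromstring(opml_text)
    return {node.get("xmlUrl") for node in root.iter("outline") if node.get("xmlUrl")}


def build_affiliation(opml_by_subscriber):
    """Map each actor (subscriber) to the set of events (feeds) it subscribes to."""
    return {name: feeds_in_opml(text) for name, text in opml_by_subscriber.items()}


def feed_degrees(affiliation):
    """Degree of each feed node: the number of subscribers attached to it."""
    return Counter(feed for feeds in affiliation.values() for feed in feeds)


def reach(affiliation, feeds):
    """Fraction of subscribers who subscribe to at least one of the given feeds."""
    hit = sum(1 for subscribed in affiliation.values() if subscribed & feeds)
    return hit / len(affiliation)


def without_feeds(affiliation, feeds):
    """Copy of the affiliation network with the given feeds removed."""
    return {name: subscribed - feeds for name, subscribed in affiliation.items()}


def subscriber_projection(affiliation):
    """1-mode actor network: edge weight is the number of feeds two subscribers share."""
    weights = {}
    for a, b in combinations(sorted(affiliation), 2):
        shared = len(affiliation[a] & affiliation[b])
        if shared:
            weights[(a, b)] = shared
    return weights


network = build_affiliation(OPML_BY_SUBSCRIBER)
top_feeds = {feed for feed, _ in feed_degrees(network).most_common(1)}

print("top feeds:", top_feeds)
print("reach of top feeds:", reach(network, top_feeds))
print("projection with all feeds:", subscriber_projection(network))
print("projection without top feeds:", subscriber_projection(without_feeds(network, top_feeds)))
```

In this toy data a single widely subscribed feed links most subscribers to each other, and removing it leaves only the pair that share a more specific interest. On real data the number of feeds to remove would be a parameter to choose, and a dedicated clustering or network analysis tool would replace the simple projection.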