Design and Development of a Learning-Augmented RSS Feed Reader Program “The Mindful Reader”

Total Page:16

File Type:pdf, Size:1020Kb

Design and Development of a Learning-Augmented RSS Feed Reader Program “The Mindful Reader” Project Number: DCB-0802 Design and Development of a Learning-Augmented RSS Feed Reader Program “The Mindful Reader” A Major Qualifying Project Report: submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements for the Degree of Bachelor of Science By _________________________________ Chris Drouin Date: April 29, 2009 _________________________________ Professor David C. Brown (CS) Abstract The Mindful Reader Project centers on the design and development of a machine learning- augmented newsfeed aggregation application. It seeks to reduce the time necessary for users to find interesting newsfeed articles, by building a user interest model from implicit and explicit article ratings and applying that model to rank incoming articles based on predicted user interest. The software was developed using code from the RSSOwl project; in tests, the user interest model grew more accurate with time. ii Executive Summary Feed aggregator software and services, such as RSSOwl and Google Reader, can be used to subscribe to websites and obtain streaming updates regarding new articles or items posted to those websites. These existing aggregators follow an email-client-like design that can make it time- consuming and inconvenient to manage high volumes of incoming articles. The Mindful Reader project aims to solve this problem by modeling user interests and using that model to rate and filter incoming articles. As reducing user fatigue is an important goal of this project, the Mindful Reader builds its user interest model in part through observation of normal user behavior, rather than requiring explicit user judgment of every viewed article. It uses metrics established by the earlier Curious Browser projects to measure implicit user interest in article content, while still allowing the user to train the system by providing explicit content ratings. Article ratings and contents are fed into a database of informative terms and term frequencies, which can then be used to evaluate new articles as they arrive. New articles are evaluated with a set of Naïve Bayes Classifiers, and the evaluations are used to rank the articles, so that potentially interesting articles are prominently visible in the article list. The Mindful Reader has not been developed from scratch. To avoid unnecessary re- implementation of critical aggregator features, such as feed parsing and article rendering, it is instead based upon the RSSOwl project. RSSOwl is licensed under the Eclipse Public License; it is an open-source, cross-platform, Java-based feed aggregator that provides excellent functionality. The Mindful Reader was tested and evaluated through a series of two week-long user tests conducted in April of 2009. The first of these tests showed that the implicit interest inference mechanism worked well for many users, and the second test showed that the user interest model improved the accuracy of its predictions with time. iii Table of Contents Abstract ....................................................................................................................................................................................... ii Executive Summary ............................................................................................................................................................... iii List of Figures ........................................................................................................................................................................... vi List of Tables ............................................................................................................................................................................ vii 1 Introduction ..................................................................................................................................................................... 1 2 Background ...................................................................................................................................................................... 3 2.1 Feeds and the RSS Specification ..................................................................................................................... 3 2.1.1 Feed File Contents ...................................................................................................................................... 3 2.1.2 Using Feeds with Aggregators ............................................................................................................... 3 2.1.3 Aggregator Use Amongst Web Users .................................................................................................. 3 2.2 Academic Adaptive News Access Systems ................................................................................................. 4 2.3 User Interest Modeling Techniques .............................................................................................................. 4 2.3.1 Gathering Interest Ratings ...................................................................................................................... 4 2.3.2 Representing Interest Ratings and Content .................................................................................... 5 2.3.3 Predicting User Interest in New Content .......................................................................................... 6 2.4 Existing Software.................................................................................................................................................. 7 2.4.1 RSSOwl ............................................................................................................................................................ 8 2.4.2 Mozilla Thunderbird ................................................................................................................................. 9 2.4.3 RSSBandit ..................................................................................................................................................... 10 2.4.4 BottomFeeder ............................................................................................................................................ 11 2.4.5 Ground-Up Development ...................................................................................................................... 12 2.4.6 Feature Comparison ................................................................................................................................ 13 3 Methodology .................................................................................................................................................................. 14 3.1 Design ...................................................................................................................................................................... 14 3.2 Development ........................................................................................................................................................ 14 3.2.1 Requirements ............................................................................................................................................. 14 3.2.2 Testing and Debugging ........................................................................................................................... 15 3.3 Evaluation .............................................................................................................................................................. 16 3.3.1 Evaluation Methodology ........................................................................................................................ 16 3.3.2 Evaluation Survey ..................................................................................................................................... 16 3.4 Project Management ......................................................................................................................................... 17 3.4.1 Schedule ....................................................................................................................................................... 17 4 Software Design ............................................................................................................................................................ 19 4.1 RSSOwl Architecture ......................................................................................................................................... 19 4.1.1 org.rssowl.core module .......................................................................................................................... 19 4.1.2 org.rssowl.lib.db4o module .................................................................................................................. 19 4.1.3 org.rssowl.httpclient ............................................................................................................................... 19 4.1.4 org.rssowl.lib.jdom................................................................................................................................... 20 4.1.5 org.rssowl.lib.lucene ................................................................................................................................ 20 4.1.6 org.rssowl.ui ............................................................................................................................................... 20 4.2 User Behavior Monitoring and
Recommended publications
  • Omea Pro Printed Documentation
    Omea Pro Printed Documentation Table Of Contents Welcome............................................................................................................................................ 1 Help Navigation Buttons.......................................................................................................... 1 Toolbars ..................................................................................................................................... 1 Adjusting the window and pane size.............................................................................. 1 Navigating topics.................................................................................................................... 1 About Local Video Tutorials ................................................................................................... 2 What’s Next?............................................................................................................................ 2 Introducing Omea Pro.................................................................................................................. 3 New in Omea Pro ........................................................................................................................... 5 General Improvements............................................................................................................ 5 Organizational Features Improvements............................................................................ 5 Browser Integration.................................................................................................................
    [Show full text]
  • Instant Messaging Video Converter, Iphone Converter Application
    Web Browsing Mozilla Firefox The premier free, open-source browser. Tabs, pop-up blocking, themes, and extensions. Considered by many to be the world's best browser. Download Page Video Player, Torrents, Podcasting Miro Beautiful interface. Plays any video type (much more than quicktime). Subscribe to video RSS, download, and watch all in one. Torrent support. Search and download from YouTube and others. Download Page IM - Instant Messaging Adium Connect to multiple IM accounts simultaneously in a single app, including: AOL IM, MSN, and Jabber. Beautiful, themable interface. Download Page Video Converter, iPhone Converter Miro Video Converter Convert any type of video to mp4 or theora. Convert any video for use with iPhone, iPod, Android, etc. Very clean, easy to use interface. Download Page Application Launching Quicksilver Quicksilver lets you start applications (and do just about everything) with a few quick taps of your fingers. Warning: start using Quicksilver and you won't be able to imagine using a Mac without it. Download Page Email Mozilla Thunderbird Powerful spam filtering, solid interface, and all the features you need. Download Page Utilities The Unarchiver Uncompress RAR, 7zip, tar, and bz2 files on your Mac. Many new Mac users will be puzzled the first time they download a RAR file. Do them a favor and download UnRarX for them! Download Page DVD Ripping Handbrake DVD ripper and MPEG-4 / H.264 encoding. Very simple to use. Download Page RSS Vienna Very nice, native RSS client. Download Page RSSOwl Solid cross-platform RSS client. Download Page Peer-to-Peer Filesharing Cabos A simple, easy to use filesharing program.
    [Show full text]
  • Specifications for Implementing Web Feeds in DLXS Kevin S
    Hawkins 1/5/2011 5:01:52 PM Page 1 of 5 * SCHOLARLY PUBLISHING OFFICE WHITEPAPER Specifications for implementing web feeds in DLXS Kevin S. Hawkins Executive Summary SPO uses DLXS to deliver nearly all of its publications online, but this software does not offer any functionality for web feeds (commonly called RSS feeds). The components of web feeds are reviewed, and recommendations are made for implementing Atom 1.0 web feeds in SPO publications using a script run during releasing which determines which content to push to the feed. Motivations Members of the editorial board of the Journal of Electronic Publishing (collid jep) and SPO staff have requested that web feed functionality be provided for SPO publications. This would allow end users to subscribe to feeds of not only serial content (the obvious use) but in theory any other collection to which content is added, especially on a regular basis. A feed could provide a single notice that a collection has been updated, perhaps with a table of contents (henceforth a small feed), or instead contain separate notices for each item added to the collection (a large feed). Components of a web feed A web feed operates with three necessary components: • a website that syndicates (or publishes) a feed • a feed standard being used by the website • the end user’s news aggregator (feed reader or news reader), which retrieves the feed periodically. A feed is an XML document containing individual entries (also items, stories, or articles). Syndicating a feed The content provider’s website often has buttons (icons called chicklets) to be used by the end user to add a feed to his or her news aggregator.
    [Show full text]
  • DVD-Libre 2007-09
    (continuación 2) DOSBox 0.72 - DosZip Commander 1.28 - Doxygen 1.5.3 - DrawPile 0.4.0.1 - Drupal 4.7.7 - Drupal 4.7.X Castellano - Drupal 4.7.X Catalán - Drupal 5.2 - Drupal 5.X Castellano - Drupal 5.X Catalán - DVD Flick 1.2.2.1 - DVDStyler 1.5.1 - DVDx 2.10 - EasyPHP 1.8 - Eclipse 3.2.1 Castellano - Eclipse 3.3 - Eclipse Graphical Editor Framework 3.3 - Eclipse Modeling Framework 2.3.0 - Eclipse UML2 DVD-Libre 2.1.0 - Eclipse Visual Editor 1.2.1 - Ekiga 2.0.9 beta - Elgg 0.8 - EQAlign 1.0.0. - Eraser 5.84 - Exodus 0.9.1.0 - Explore2fs 1.08 beta9 - ez Components 2007.1.1 - eZ Publish 3.9.3 - Fast Floating Fractal Fun cdlibre.org 3.2.3 - FileZilla 3.0.0 - FileZilla Server 0.9.23 - Firebird 2.0.1.12855 - Firefox 2.0.0.6 Castellano - Firefox 2.0.0.6 Català - FLAC 1.2.0a - FMSLogo 6.16.0 - Folder Size 2.3 - FractalForge 2.8.2 - Free Download 2007-09 Manager 2.5.712 - Free Pascal 2.2.0 - Free UCS Outline Fonts 2006.01.26 - Free1x2 0.70.1 - FreeCAD 0.6.476 - FreeDOS 1.0 Disquete de arranque - FreeDOS 1.0 Full CD 2007.05.02 - FreeMind 0.8.0 - FreePCB 1.2.0.0 - FreePCB 1.338 - Fyre 1.0.0 - Gaim 1.5.0 Català - Gambit 0.2007.01.30 - GanttProject DVD-Libre es una recopilación de programas libres para Windows. 2.0.4 - GanttPV 0.7 - GAP 4.4.9 - GAP paquetes 2007.09.08 - Gazpacho 0.7.2 - GCfilms 6.4 - GCompris 8.3.3 - Gencat RSS 1.0 - GenealogyJ 2.4.3 - GeoGebra 3.0.0 RC1 - GeoLabo 1.25 - Geonext 1.71 - GIMP 2.2.17 - GIMP 2.2.8 Català - GIMP Animation package 2.2.0 - GIMPShop 2.2.8 - gmorgan 0.24 - GnuCash En http://www.cdlibre.org puedes conseguir la versión más actual de este 2.2.1 - Gnumeric 1.6.3 - GnuWin32 Indent 2.2.9 - Gparted LiveCD 0.3.4.8 - Gpg4win 1.1.2 - Graph 4.3 - DVD, así como otros CDs y DVDs recopilatorios de programas y fuentes.
    [Show full text]
  • Portable Apps Fact Sheet
    Gerry Kennedy- Fact Sheets Pottable Apps xx/xx/2009 PORTABLE APPS FACT SHEET What is a portable program? A Portable App or program is a piece of software that can be accessed from a portable USB device (USB thumb or pen drive, PDA, iPod or external hard disk) and used on any computer. It could be an email program, a browser, a system recovery tool or a program to achieve a set task – reading, writing, music editing, art and design, magnification, text to speech, OCR or to provide more efficient or equitable access (e.g. Click ‘n’ Type onscreen keyboard). There are hundreds of programs that are available. All of a user’s data and settings are always stored on the thumb drive so when a user unplugs the device, none of the personal data is left behind. It provides a way of accessing and using relevant software – anywhere, anytime. Portable Apps are relatively new. Apps do NOT need to be installed on the ‘host’ computer. No Tech support is required on campus. No permissions are required in order to have the software made available to them. Students can work anywhere - as long as they have access to a computer. The majority of programs are free (Freeware or Open Source). Fact sheets, user guides and support are all available on the web or within the Help File in the various applications. Or – have students write their own! Be sure to encourage all users to BACK UP their files as USB drives can be lost, stolen, misplaced, left behind, erased -due to strong magnetic fields or accidentally compromised (left on a parcel shelf in a car, or left in clothes and washed!).
    [Show full text]
  • Open Sourcing De Imac =------Jan Stedehouder
    Copyright: DOSgg ProgrammaTheek BV SoftwareBus 2007 3 ---------------------------------------------------------------------------= Open Sourcing de iMac =--------------------------------------------------------------------------------------------------------- Jan Stedehouder Inleiding Het kan dus betrekkelijk onschuldig wor- Het zijn er niet veel, maar zo nu en dan den geïnstalleerd (betrekkelijk, want heb ik een uurtje over voor wat extra het kan altijd misgaan, de risico’s zijn speelwerk. Bij voorkeur doe je dan iets voor uzelf). wat zowel leuk als interessant is en in dit geval stond mijn trouwe iMac Blue Fink sluit bij de uitvoering heel sterk (733 MHz, 768 MB RAM en een 20 GB HD) aan bij Debian en maakt gebruik van de te smeken om onder handen genomen te pakketbeheermogelijkheden die worden worden. De afgelopen maanden heeft ’ie worden geboden door dpkg, dselect en namelijk wat te lijden gehad onder (1) apt-get. Fink Commander is een grafi- een gebrek aan tijd en (2) een reeks ex- sche interface die wel iets weg heeft perimenten met Linux voor de PowerPC van Synaptic. De lijst van beschikbare (zie tekstvak). Het uitgangspunt was nu: programma’s is niet gek. Zo kunnen we ‘Hoe ver kan ik gaan met installeren van KDE en Gnome installeren. In vergelij- open source-software met Mac OSX als king met de complete Debian reposito- platform?’ De speurtocht leverde weer ries (softwarelijsten) is het wel beperkt. voldoende inspiratie op voor dit artikel Mijn Fink-installatie geeft aan dat er en op die wijze kunnen we ook onze Mac- ruim 2500 pakketten beschikbaar zijn, liefhebbers van dienst zijn. We beginnen tegen bijna 20.000 voor de Debian- met Fink, een project om Unix-software repositories.
    [Show full text]
  • Copyrighted Material
    Index caching data, 76 A CDNs (Content Delivery Networks), 76–77 Accept-Language HTTP header, 465, 469 controller class in BLL, 74–75 Access database, as data store, 66 DAL (data access layer), 67 accessibility data store selection, 66–67 selling products online and, 377 Design section, 64–83 site design process and, 48 exception handling and, 77–78 use of tables and, 40–41 JavaScript integration, 82–83 account box settings, 134 layered approach to, 64–65 accounts, managing user accounts, 86 LINQ Entity Framework, 72 actions, controller, 31–32, 175 LINQ impact on DAL, 69–71 AddOption action, opinion poll controllers, 273 LINQ-to-SQL and, 71–72 AddShoppingCartItem action, e-commerce multiple data store support, 67–69 controllers, 401–402 MVC framework and, 64 administration output options and, 78 e-commerce store. See storefront administration presentation layer best practices, 80 views Problem section, 63–64 newsletter, 296 search engine optimization, 81–82 administration console, for polls module, 257 securing DAL, 72–74 administration pages, tools for creating, 89 security and, 75 administrators Solution section, 83 managing user accounts, 86 summary, 83 role, 176 transaction management, 78 AdminProductItem.ascx control user interface, 79 ManageProducts.aspx view, 434–435 views and, 80–81 ViewDepartment.aspx views, 440–441 web.config configuration, 78–79 AdminSidebar.ascx control archiving opinion polls, 258 forum views, 350–351 Article class, 166–168 news, articles, and blog views, 222 extending LINQ objects, 187–188 opinion poll views,
    [Show full text]
  • EMERGING TECHNOLOGIES Skype and Podcasting: Disruptive Technologies for Language Learning
    Language Learning & Technology September 2005, Volume 9, Number 3 http://llt.msu.edu/vol9num3/emerging/ pp. 9-12 EMERGING TECHNOLOGIES Skype and Podcasting: Disruptive Technologies for Language Learning Robert Godwin-Jones Virginia Comonwealth University New technologies, or new uses of existing technologies, continue to provide unique opportunities for language learning. This is in particular the case for several new network options for oral language practice. Both Skype and podcasting can be considered "disruptive technologies" in that they allow for new and different ways of doing familiar tasks, and in the process, may threaten traditional industries. Skype, the "people's telephone," is a free, Internet-based alternative to commercial phone service, while podcasting, the "radio for the people," provides a "narrowcasting" version of broadcast media. Both have sparked intense interest and have large numbers of users, although it is too soon to toll the bell for telephone companies and the radio industry. Skype and podcasting have had a political aspect to their embrace by early adopters -- a way of democratizing institutions -- but as they reach the mainstream, that is likely to become less important than the low cost and convenience the technologies offer. Both technologies offer intriguing opportunities for language professionals and learners, as they provide additional channels for oral communication. Skype and Internet Telephony Skype is a software product which provides telephone service through VoIP (Voice over IP), allowing your personal computer to act like a telephone. A microphone attached to the computer is necessary and headphones are desirable (to prevent echoes of the voice of your conversation partner). It is not the only such tool, nor the first, but because it provides good quality (through highly efficient compression) and is free, it has become widely used.
    [Show full text]
  • Web 2.0: Beyond the Concept Practical Ways to Implement RSS, Podcasts, and Wikis
    Web 2.0: Beyond the Concept Practical Ways to Implement RSS, Podcasts, and Wikis By Karen Huffman, National Geographic Society Abstract This article focuses on how the National Geographic Society’s Libraries & Information Services has implemented RSS, podcasts and wikis with suggested practical application for academic libraries. The article is an adaptation from a presentation given during the Special Libraries Association’s Annual Conference, June 2006 (Huffman & Ferry, 2006). The Sparks That Ignited Us…. National Geographic’s Libraries & Information Services (LIS) discovery and implementation of technologies dubbed “Web 2.0” has been an evolutionary, rather than revolutionary process. We watched the trends to evolve our products and services. In 2005 we began seeing online news sources like BBCnews.com offering alternative ways to access their content. New RSS-labeled icons appeared in the margins of websites with links to directories of feeds. These RSS directories generally organized their feeds either topically or by news section and often included frequently asked questions about RSS as well as recommended feed readers. As information organizers (and multi-taskers who streamline their work processes wherever possible), we initially saw RSS feeds as a way to keep current with news sources, blogs, and research sites that were regularly publishing content through XML-based feeds! We saw the arrival of podcasts on the heels of RSS. In January and February, 2005, the Pew Internet and American Life Project conducted a survey, and found 22 million, or 11%, Americans age 18 or older owned an iPod or MP3 player. Apple’s freely-available iTunes library brought podcasting to the mainstream with the launch of its “Podcasts” directory in June 2005.
    [Show full text]
  • Working with Feeds, RSS, and Atom
    CHAPTER 4 Working with Feeds, RSS, and Atom A fundamental enabling technology for mashups is syndication feeds, especially those packaged in XML. Feeds are documents used to transfer frequently updated digital content to users. This chapter introduces feeds, focusing on the specific examples of RSS and Atom. RSS and Atom are arguably the most widely used XML formats in the world. Indeed, there’s a good chance that any given web site provides some RSS or Atom feed—even if there is no XML-based API for the web site. Although RSS and Atom are the dominant feed format, other formats are also used to create feeds: JSON, PHP serialization, and CSV. I will also cover those formats in this chapter. So, why do feeds matter? Feeds give you structured information from applications that is easy to parse and reuse. Not only are feeds readily available, but there are many applications that use those feeds—all requiring no or very little programming effort from you. Indeed, there is an entire ecology of web feeds (the data formats, applications, producers, and consumers) that provides great potential for the remix and mashup of information—some of which is starting to be realized today. This chapter covers the following: * What feeds are and how they are used * The semantics and syntax of feeds, with a focus on RSS 2.0, RSS 1.0, and Atom 1.0 * The extension mechanism of RSS 2.0 and Atom 1.0 * How to get feeds from Flickr and other feed-producing applications and web sites * Feed formats other than RSS and Atom in the context of Flickr feeds * How feed autodiscovery can be used to find feeds * News aggregators for reading feeds and tools for validating and scraping feeds * How to remix and mashup feeds with Feedburner and Yahoo! Pipes Note In this chapter, I assume you have an understanding of the basics of XML, including XML namespaces and XML schemas.
    [Show full text]
  • Feed Your ‘Need to Read’ – an Explanation of RSS
    by Jen Sharp, jensharp.com Feed your ‘Need To Read’ – an explanation of RSS here exists a useful informational tool so powerful that provides feeds. For example, USA Today has separate that the government of China has completely feeds in its different sections. You know that a page has the banned its use since 2007. Yet many people have capabilities for RSS subscriptions by looking for the orange Tnot even heard about it, despite its juxtaposition in chicklet in the toolbar of your browser. Clicking on the logo front of our attention during our experiences on the web. will start you in the process of seeing what the feed for that Recently I was visiting with my dad, who is well-informed page looks like. Then you can choose how to subscribe. and somewhat technically savvy. I asked him if he had ever Choosing a reader can seem overwhelming, as there are heard of RSS feeds. He hadn’t, so when I pointed out the thousands of readers available, most of them for free. To ever-present orange “chicklet,” he remarked, “Oh! I thought start, decide how you what to view the information. RSS that was the logo for Home Depot!” feeds can be displayed in many ways, including: Very simply, RSS feeds are akin to picking up your ■ sent to your email inbox favorite newspaper and scanning the headlines for a story ■ as a personal browser start page like igoogle.com or you are interested in. Items are displayed with a synopsis and my.yahoo.com links to the full page of information.
    [Show full text]
  • Bitassa a Lloure - Blog De Benjamí Villoslada Edit Feed Details… | Delete Feed…
    Analyze :: Feed Subscribers http://www.feedburner.com/fb/a/subscribers?dateRa... My Feeds • My Account Signed in as benjami (Sign Out) • Languages • Forums • Help Bitassa a lloure - Blog de Benjamí Villoslada Edit Feed Details… | Delete Feed… Analyze Optimize Publicize Monetize Troubleshootize My Feeds ↓ VIEW TIP: FeedBurner Email: Not the prettiest name, but a darn pretty service. Send your feed by email. Feed Stats Subscribers Feed Subscribers Show stats for last 30 days Live Hits 366 T T Item Use T T W T F S M W T F S S M W F S M T W F S M S T F S S S Uncommon Uses Export: Excel • CSV 0 Site Stats not Site stats are active for your Friday, November 09 – Saturday, December 08 feed. Dare to be different? subscribers (on average) Headline Animator Stats 346 468x60 - White 152 reach (on average) ↓ SERVICES FeedBurner Stats Feed Readers and Aggregators NAME SUBSCRIBERS Google Feedfetcher 125 Bloglines 55 Akregator 28 Netvibes 19 Rojo 19 Firefox Live Bookmarks 13 1 de 6 09/12/07 21:13 Analyze :: Feed Subscribers http://www.feedburner.com/fb/a/subscribers?dateRa... Liferea 10 Thunderbird 7 (not identified) 4 NewsGator Online 3 Apple-PubSub/59 2 FeedReader 2 Google Desktop 2 Jakarta Commons Generic Client 2 MagpieRSS 2 Apple CFNetwork Generic Client 1 Attensa for Outlook 1 Fastladder 1 FeedShow 1 Firefox Live Bookmarks (version 1) 1 Flock My News 1 Generic curl-based feed client 1 NetNewsWire 1 NewzCrawler 1 Programari Lliure a les Balears +http://mnm.uib.es/ Planet/2.0 1 +http://www.planetplanet.org UniversalFeedParser/4.1 +http://feedparser.org/
    [Show full text]