Design and Development of a Learning-Augmented RSS Feed Reader Program “The Mindful Reader”
Total Page:16
File Type:pdf, Size:1020Kb
Project Number: DCB-0802 Design and Development of a Learning-Augmented RSS Feed Reader Program “The Mindful Reader” A Major Qualifying Project Report: submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements for the Degree of Bachelor of Science By _________________________________ Chris Drouin Date: April 29, 2009 _________________________________ Professor David C. Brown (CS) Abstract The Mindful Reader Project centers on the design and development of a machine learning- augmented newsfeed aggregation application. It seeks to reduce the time necessary for users to find interesting newsfeed articles, by building a user interest model from implicit and explicit article ratings and applying that model to rank incoming articles based on predicted user interest. The software was developed using code from the RSSOwl project; in tests, the user interest model grew more accurate with time. ii Executive Summary Feed aggregator software and services, such as RSSOwl and Google Reader, can be used to subscribe to websites and obtain streaming updates regarding new articles or items posted to those websites. These existing aggregators follow an email-client-like design that can make it time- consuming and inconvenient to manage high volumes of incoming articles. The Mindful Reader project aims to solve this problem by modeling user interests and using that model to rate and filter incoming articles. As reducing user fatigue is an important goal of this project, the Mindful Reader builds its user interest model in part through observation of normal user behavior, rather than requiring explicit user judgment of every viewed article. It uses metrics established by the earlier Curious Browser projects to measure implicit user interest in article content, while still allowing the user to train the system by providing explicit content ratings. Article ratings and contents are fed into a database of informative terms and term frequencies, which can then be used to evaluate new articles as they arrive. New articles are evaluated with a set of Naïve Bayes Classifiers, and the evaluations are used to rank the articles, so that potentially interesting articles are prominently visible in the article list. The Mindful Reader has not been developed from scratch. To avoid unnecessary re- implementation of critical aggregator features, such as feed parsing and article rendering, it is instead based upon the RSSOwl project. RSSOwl is licensed under the Eclipse Public License; it is an open-source, cross-platform, Java-based feed aggregator that provides excellent functionality. The Mindful Reader was tested and evaluated through a series of two week-long user tests conducted in April of 2009. The first of these tests showed that the implicit interest inference mechanism worked well for many users, and the second test showed that the user interest model improved the accuracy of its predictions with time. iii Table of Contents Abstract ....................................................................................................................................................................................... ii Executive Summary ............................................................................................................................................................... iii List of Figures ........................................................................................................................................................................... vi List of Tables ............................................................................................................................................................................ vii 1 Introduction ..................................................................................................................................................................... 1 2 Background ...................................................................................................................................................................... 3 2.1 Feeds and the RSS Specification ..................................................................................................................... 3 2.1.1 Feed File Contents ...................................................................................................................................... 3 2.1.2 Using Feeds with Aggregators ............................................................................................................... 3 2.1.3 Aggregator Use Amongst Web Users .................................................................................................. 3 2.2 Academic Adaptive News Access Systems ................................................................................................. 4 2.3 User Interest Modeling Techniques .............................................................................................................. 4 2.3.1 Gathering Interest Ratings ...................................................................................................................... 4 2.3.2 Representing Interest Ratings and Content .................................................................................... 5 2.3.3 Predicting User Interest in New Content .......................................................................................... 6 2.4 Existing Software.................................................................................................................................................. 7 2.4.1 RSSOwl ............................................................................................................................................................ 8 2.4.2 Mozilla Thunderbird ................................................................................................................................. 9 2.4.3 RSSBandit ..................................................................................................................................................... 10 2.4.4 BottomFeeder ............................................................................................................................................ 11 2.4.5 Ground-Up Development ...................................................................................................................... 12 2.4.6 Feature Comparison ................................................................................................................................ 13 3 Methodology .................................................................................................................................................................. 14 3.1 Design ...................................................................................................................................................................... 14 3.2 Development ........................................................................................................................................................ 14 3.2.1 Requirements ............................................................................................................................................. 14 3.2.2 Testing and Debugging ........................................................................................................................... 15 3.3 Evaluation .............................................................................................................................................................. 16 3.3.1 Evaluation Methodology ........................................................................................................................ 16 3.3.2 Evaluation Survey ..................................................................................................................................... 16 3.4 Project Management ......................................................................................................................................... 17 3.4.1 Schedule ....................................................................................................................................................... 17 4 Software Design ............................................................................................................................................................ 19 4.1 RSSOwl Architecture ......................................................................................................................................... 19 4.1.1 org.rssowl.core module .......................................................................................................................... 19 4.1.2 org.rssowl.lib.db4o module .................................................................................................................. 19 4.1.3 org.rssowl.httpclient ............................................................................................................................... 19 4.1.4 org.rssowl.lib.jdom................................................................................................................................... 20 4.1.5 org.rssowl.lib.lucene ................................................................................................................................ 20 4.1.6 org.rssowl.ui ............................................................................................................................................... 20 4.2 User Behavior Monitoring and