The Buzz: Supporting Extensively Customizable Information Awareness Applications

THE BUZZ: SUPPORTING EXTENSIVELY CUSTOMIZABLE INFORMATION AWARENESS APPLICATIONS A Dissertation Presented to The Academic Faculty by James R. Eagan In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the College of Computing Georgia Institute of Technology December 2008 THE BUZZ: SUPPORTING EXTENSIVELY CUSTOMIZABLE INFORMATION AWARENESS APPLICATIONS Approved by: John T. Stasko, Advisor Beki Grinter College of Computing College of Computing Georgia Institute of Technology Georgia Institute of Technology Mark Guzdial Saul Greenberg College of Computing Department of Computer Science Georgia Institute of Technology University of Calgary Keith Edwards Date Approved: August 2008 College of Computing Georgia Institute of Technology For my father. iii ACKNOWLEDGEMENTS This work is the product of input and help from many sources. Many thanks to my committee for their time, effort, and insight in shaping this work, and especially to my advisor, John Stasko. To my peers who helped me shape and refine my ideas, James Hudson, Jochen Rick, Chris Plaue, Zach Pousman, Erika Shehan, Dugald Hutchings, Brian Dorn, Allison Tew, and the members of the Information Interfaces lab at Georgia Tech. Thanks to the U.S. National Science Foundation (IIS-0414667) for supporting much of this work. iv TABLE OF CONTENTS DEDICATION . iii ACKNOWLEDGEMENTS . iv LIST OF TABLES . viii LIST OF FIGURES . ix I INTRODUCTION . 1 1.1 Motivation . 1 II SUPPORTING MODELS . 9 2.1 User Ecology . 9 2.2 Information Awareness Data Pipeline . 11 III RELATED WORK . 14 3.1 Information Awareness Applications . 14 3.2 Data Gathering . 17 3.2.1 Data Scraping . 17 3.2.2 Web Data Scraping . 19 3.2.3 Data Storage . 21 3.2.4 Data Presentation . 22 3.3 End User Customization . 23 IV THE BUZZ . 25 4.1 Formative Interviews . 25 4.1.1 Interview Observations . 26 4.2 Prototype Design Goals . 28 4.3 Iterative Prototypes . 30 4.4 Running The Buzz . 31 4.4.1 Users of The Buzz . 34 4.5 Channels . 35 4.6 Customizing The Buzz . 35 4.7 Creating a Channel Lineup . 36 4.8 Modifying and Creating Channels . 40 v 4.8.1 The Channel Editor . 41 4.9 A Tale of Three Channels . 45 4.9.1 Webcams . 45 4.9.2 Digg . 48 4.9.3 NSF News . 52 4.9.4 Channel Data Model . 57 4.9.5 Channel Presentation Model . 59 4.9.6 Layout Templates and Regions . 61 4.10 Sharing Customizations . 62 V THE BUZZ ARCHITECTURE . 67 5.1 Database . 67 5.2 Properties . 70 5.3 Harvesters . 71 5.4 Scrapers . 72 5.5 Visualizers . 74 5.6 Dispatch . 75 5.7 Plugins . 77 VI COMPARATIVE ANALYSIS . 79 6.1 Gentle Slope . 79 6.2 Six Approaches . 82 6.2.1 Yahoo Pipes . 82 6.2.2 Marmite . 86 6.2.3 Konfabulator/Dashboard/Google/LiveGadgets . 89 6.2.4 Summaries Framework and Cards . 91 6.2.5 Koala/Coscripter . 92 6.2.6 Notification Engine . 93 6.3 Customization Space . 95 6.3.1 The User . 97 6.3.2 Data Extraction and Transformation . 99 6.3.3 Data Presentation . 104 6.3.4 Beyond the Data Pipeline . 108 vi 6.4 Customization Space Summary . 112 6.5 Dimensions Overview . 114 6.6 Challenges and Limitations . 116 6.6.1 General Limitations . 117 6.6.2 Limitations of The Buzz . 123 VII DEPLOYMENT STUDY . 125 7.1 Desktop Deployment . 125 7.2 Lounge Deployment . 127 7.3 Lounge Deployment Observations . 129 7.4 Deployment Conclusions . 130 VIII CONCLUSION . 131 REFERENCES . 138 vii LIST OF TABLES 1 Built-in harvesters in The Buzz. 72 2 Operators available in Yahoo Pipes . 85 viii LIST OF FIGURES 1 Al Gore working in his office, with three displays on the desktop and another on the wall........................................... 2 2 The awareness data pipeline. 11 3 A microformat to provide semantic information for a phone number in an HTML document. 18 4 Potential channel information sources . 28 5 The original channel selector interface, which was replaced by the channel browser shown in Figure 8. 31 6 The Buzz running on a extra monitor on the desktop (at right). 32 7 The Buzz running in a public lounge. 32 8 Viewing the current channel lineup in The Buzz. 37 9 Browsing available shared channels in The Buzz. 39 10 Downloading a shared channel that already exists in the user’s current channel lineup. 40 11 The channel editor in The Buzz. 42 12 Standard (left) and Advanced (right) options for configuring a Flickr channel. 43 13 The “My Webcams” channel in The Buzz. 46 14 Editing the list of webcams used by the “My Webcams” channel. 46 15 Modifying the presentation of the “My Webcams” channel. 47 16 Viewing the configuration for a region. 48 17 The Digg channel in The Buzz. 49 18 Setting the name and description of a new channel. 49 19 Configuring the Digg data gatherer. 50 20 A Digg article summary page. 51 21 The pattern editor in The Buzz. 52 22 Configuring the CISE News channel to gather CISE-related news articles from the NSF........................................... 54 23 Images extracted from the CISE section of the NSF News website. 55 24 Creating an extraction pattern for NSF News articles. 56 25 The CISE News articles channel, including article summaries. 57 26 The Channel model in The Buzz. 58 ix 27 Configuring the presentation of a channel with potentially stale data. 59 28 Customizing the presentation.

The Buzz: Supporting Extensively Customizable Information Awareness Applications

An Overview of the 50 Most Common Web Scraping Tools

Webscraper.Io a How-To Guide for Scraping DH Projects

1 Running Head: WEB SCRAPING TUTORIAL a Web Scraping

A Study on Web Scraping

CSCI 452 (Data Mining) Dr. Schwartz HTML Web Scraping 150 Pts

Lecture 18: HTML and Web Scraping

CSCI 452 (Data Mining) Basic HTML Web Scraping 75 Pts Overview for This Assignment, You'll Write Several Small Python Programs T

A Comparative Study on Web Scraping

Optimized Template Detection and Extraction Algorithm for Web Scraping of Dynamic Web Pages

Detection of Web API Content Scraping an Empirical Study of Machine Learning Algorithms

Automatic Data Collection on the Internet (Web Scraping) Ingolf Boettcher ([email protected])1 VERSION 18 May 2015

Web Scraping, Applications and Tools