Summarizing Personal Web Browsing Sessions


Mira Dontcheva (1), Steven M. Drucker (3), Geraldine Wade (4), David Salesin (1,2), Michael F. Cohen (3)

(1) Computer Science & Engineering, University of Washington, Seattle, WA 98105-4615, {mirad, salesin}@cs.washington.edu
(2) Adobe Systems, 801 N. 34th Street, Seattle, WA 98103, [email protected]
(3) Microsoft Research and (4) Microsoft, One Microsoft Way, Redmond, WA 98052-6399, {sdrucker, gwade, mcohen}@microsoft.com

ABSTRACT

We describe a system, implemented as a browser extension, that enables users to quickly and easily collect, view, and share personal Web content. Our system employs a novel interaction model, which allows a user to specify webpage extraction patterns by interactively selecting webpage elements and applying these patterns to automatically collect similar content. Further, we present a technique for creating visual summaries of the collected information by combining user labeling with predefined layout templates. These summaries are interactive in nature: depending on the behaviors encoded in their templates, they may respond to mouse events, in addition to providing a visual summary. Finally, the summaries can be saved or sent to others to continue the research at another place or time. Informal evaluation shows that our approach works well for popular websites, and that users can quickly learn this interaction model for collecting content from the Web.

ACM Classification: H5.2 [Information interfaces and presentation]: User Interfaces - Graphical user interfaces.

General Terms: Design, Human Factors, Algorithms

Keywords: Information management, webpage extraction patterns, template-based summarization

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. UIST'06, October 15–18, 2006, Montreux, Switzerland. Copyright 2006 ACM 1-59593-313-1/06/0010 ...$5.00.

INTRODUCTION

The proliferation of Web content and the advances in search technology have fundamentally changed our lives. Today, to plan a vacation, you no longer go to a travel agent; when making purchases, you no longer go to your neighborhood specialty store; and when learning about a new topic, you no longer go to the library. Instead, you search the Web. While finding information is becoming easier and easier, collecting and organizing Web research and coming back to it remains difficult, especially when the research is exploratory. Sharing the results of the browsing session with others can be even more difficult. As found by Sellen et al. [20], research browsing sessions typically last a long time, span several sessions, involve gathering large amounts of heterogeneous content, and can be difficult to organize ahead of time as the categories emerge through the tasks themselves. Current methods for collecting and organizing Web content include saving bookmarks or tabs, collecting content in documents, or storing pages locally. While useful, these methods require a great deal of overhead as pages must be saved manually and organized into folders, which distracts from the real task of analyzing the content and making decisions.

In this paper, we present a new approach to collecting and organizing Web information. Our approach is motivated by Gibson et al.'s analysis [8] of the amount of template content on the Web. Their study shows that 40-50% of the content on the Web is template content and that template material has grown and is continuing to grow at a rate of approximately 6% per year. Studies of users' habits for collecting Web content [19] further show that people are often interested in collecting not only full webpages but also pieces of webpages. These findings motivated us to design an interface that leverages the growing amount of templated information on the Web and allows users to gather only the pieces of content they find relevant to their task.

Our interface includes two parts: user specification of relevant content, and automatic collection of similar content. The user specifies relevant content by interactively selecting webpage elements and labeling each element with a keyword, chosen from a predefined set. The selected content is stored locally and used to create an extraction pattern that can be applied to collect more content from similar pages. Extraction patterns specify the locations of selected elements using the structure of the webpage and, optionally, text that must be matched in any new pages. When the user visits a new page, he can automatically collect all of the content that is analogous to the content previously selected as relevant. Further, we provide an interface for collecting content from multiple pages simultaneously by allowing the user to select hyperlinks that point to pages of interest.
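The paper does not show what an extraction pattern looks like internally. The following is a minimal TypeScript sketch of the idea described above, assuming a pattern is stored as a structural path (here an XPath recorded when the user selects an element) plus an optional text constraint and the user's label. The names ExtractionPattern, patternFromSelection, and applyPattern are illustrative, not the authors' API.

  // Hypothetical sketch (not the authors' implementation): an extraction
  // pattern stores the user's label, a structural path to the selected
  // element, and an optional text constraint.
  interface ExtractionPattern {
    label: string;          // keyword from the predefined set, e.g. "price"
    path: string;           // structural location, recorded here as an XPath
    mustContain?: string;   // optional text that must appear in a match
  }

  // Build a pattern from an element the user selected and labeled, by
  // walking up the DOM and recording the tag and sibling index at each step.
  function patternFromSelection(el: Element, label: string): ExtractionPattern {
    const steps: string[] = [];
    let node: Element | null = el;
    while (node) {
      const parent = node.parentElement;
      if (!parent) {                       // reached the <html> element
        steps.unshift(node.tagName.toLowerCase());
        break;
      }
      const sameTag = Array.from(parent.children)
        .filter(s => s.tagName === node!.tagName);
      steps.unshift(`${node.tagName.toLowerCase()}[${sameTag.indexOf(node) + 1}]`);
      node = parent;
    }
    return { label, path: "/" + steps.join("/") };
  }

  // Apply a stored pattern to a newly visited page and collect the
  // analogous elements, filtering by the optional text constraint.
  function applyPattern(doc: Document, p: ExtractionPattern): Element[] {
    const snapshot = doc.evaluate(
      p.path, doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    const matches: Element[] = [];
    for (let i = 0; i < snapshot.snapshotLength; i++) {
      const found = snapshot.snapshotItem(i) as Element;
      if (!p.mustContain || (found.textContent ?? "").includes(p.mustContain)) {
        matches.push(found);
      }
    }
    return matches;
  }

On a product listing, for example, a pattern recorded from one price element would typically match the structurally identical price elements of the other products, which is what lets similar content be gathered automatically.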
To organize and present the gathered content, we employ layout templates that create visual summaries. A layout template specifies how the content should be organized and formatted and defines a set of user interactions available within a summary. The templates filter and present the content according to the labeling specified by the user during content selection. To demonstrate possible content presentations, we have implemented several layout templates, including a map, calendar, and grid. Our system is not limited to these designs and can incorporate additional layout templates.
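The paper leaves the template format unspecified. As a rough illustration of the filter-and-format role described above, the sketch below (TypeScript, continuing the hypothetical types from the previous sketch) shows a grid-style template that consumes only certain labels and lays the gathered items out one row per source page. GatheredItem, LayoutTemplate, and gridTemplate are illustrative names, and the interactive behaviors the paper mentions are omitted.

  // Hypothetical sketch of a labeled item gathered from a page.
  interface GatheredItem {
    label: string;        // e.g. "name", "image", "price", "address"
    sourceUrl: string;    // page the element was collected from
    html: string;         // serialized markup of the collected element
  }

  // A layout template declares which labels it places and how it renders
  // the gathered items into markup for the summary window.
  interface LayoutTemplate {
    name: string;
    accepts: string[];
    render(items: GatheredItem[]): string;
  }

  // Minimal grid template: one row per source page, one column per label.
  const gridTemplate: LayoutTemplate = {
    name: "grid",
    accepts: ["name", "image", "price"],
    render(items) {
      const pages = [...new Set(items.map(i => i.sourceUrl))];
      const rows = pages.map(url => {
        const cells = gridTemplate.accepts.map(label => {
          const item = items.find(i => i.sourceUrl === url && i.label === label);
          return `<td>${item ? item.html : ""}</td>`;   // empty cell if the label is missing
        });
        return `<tr>${cells.join("")}</tr>`;
      });
      return `<table class="summary-grid">${rows.join("")}</table>`;
    },
  };

A map or calendar template would work the same way, but would key on items labeled, say, "address" or "date" to decide where each piece of content appears.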
Our work has two key contributions: a novel interaction mechanism for collecting Web content semi-automatically, and a presentation technique for the gathered content that uses pre-defined layout templates to construct summaries that are interactive and can be shared with others. We demonstrate our approach on three types of tasks: comparison shopping, travel planning, and researching academic papers. However, our approach can be used for any type of Web research and can be applied to other domains. User feedback shows that people can quickly learn to use our system and find many tasks for which it is useful.

RELATED WORK

Our work has its roots in the area of information workspaces and systems like WebBook [5], Data Mountain [17], and TopicShop [1], which present different mechanisms for presenting and organizing collections of webpages. Our system is most closely related to Hunter Gatherer [19] and Internet Scrapbook [22], which provide the ability to save webpage elements and place them together in a document. Like these two systems, we provide an interface that allows the user to select pieces of webpages; however, we go further by making use of the user selection to create extraction patterns that can then be used to gather content automatically. We also provide a mechanism for organizing the gathered content in richer ways, such as on a map or calendar, and for different purposes, such as for a PDA or printer.

Our use of user-specified labels for relating content from different sources is similar to approaches in Semantic Web applications, such as Piggy Bank [11] and Thresher [10]. Our work differs from these in two ways. First, we create visual summaries of the gathered content, while Piggy Bank and Thresher store the collected data in a database. Second, our interface for labeling the content is more accessible to the average Web user, because the user must simply right-click and select a keyword. Piggy Bank requires that the user write scripts, and Thresher expects that the user has some knowledge [...] In some cases, we override the browser's layout engine, using scripts, to create more aesthetic results.

Bibliographic reference tools, such as RefWorks [16] and EndNote [6], are specifically designed for collecting and managing academic references. These systems have specialized filters, much like our extraction patterns, for electronic references websites and can automatically import all the data for a particular reference into a database. Our system complements such systems by giving the user an interface for selecting personally relevant bibliographic information. Furthermore, our approach is more general and is applicable to other Web content.

USER INTERFACE

We designed the user interface with the goal of making the process of collecting information as unobtrusive as possible and allowing the user to focus on the task at hand rather than worry about organizing and keeping track of content. Our system is implemented as an extension to the Firefox browser and is presented to the user through a toolbar (see Figure 1). The toolbar includes four buttons and a checkbox. The "Start" button opens the "summary" window that displays a visual summary of the content gathered thus far. The "Select" button initiates the selection mode and enables the user to select and label pieces of Web content. The "Add Page" button allows the user to automatically add the content found by an extraction pattern to the summary. This button is enabled only when there is a matching pattern that can be used to extract content automatically. The "Add Linked Page(s)" button initiates a different selection mode in which the user can select any number of hyperlinks to add the content from the linked pages to the summary directly and simultaneously. A checkbox specifies whether to visually outline the elements found by an extraction pattern on a new webpage. If it is checked, the elements found are outlined in purple. The summary window (shown in Figure 2) has buttons for opening and saving summaries and a menu for changing the layout template used to compose the summary.

Sample User Scenario

To help explain the interface, we describe an example user scenario and show the steps taken by the user to create a summary.
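As a final sketch, the toolbar behavior described above (checking stored patterns against each newly loaded page, enabling "Add Page" only when a pattern matches, and outlining matches in purple when the checkbox is on) could look roughly like the following, reusing the hypothetical applyPattern helper from the first sketch. onPageLoad and the button and checkbox handles are assumed names, not the system's actual code.

  // Hypothetical per-page check behind the toolbar: run stored patterns
  // against the newly loaded document, enable "Add Page" when a pattern
  // matches, and outline the matches in purple if the user asked for it.
  function onPageLoad(doc: Document,
                      storedPatterns: ExtractionPattern[],
                      addPageButton: { disabled: boolean },
                      outlineMatches: boolean): Element[] {
    const matches = storedPatterns.flatMap(p => applyPattern(doc, p));
    addPageButton.disabled = matches.length === 0;   // "Add Page" only when a pattern applies
    if (outlineMatches) {
      for (const el of matches) {
        (el as HTMLElement).style.outline = "2px solid purple";
      }
    }
    return matches;
  }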