Unanswered Questions and Policy Challenges
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by The University of Utah: J. Willard Marriott Digital Library Personalizing the Web Using Site Descriptions Vinod Anupam Yuri Breitbart Juliana Freire Bharat Kumar {anupam,yuri,juliana,bharat}@research.bell-labs.com Bell Laboratories, Lucent Technologies Murray Hill, NJ 07974 " A b s tra c t mation when signing up, and the sites can track the interests of the user. The information overload on the Web has created a In this paper we propose a new approach to person great need fo r efficient filtering mechanisms. Many sites alization that allows a user to create personalized pages (e.g., CNN and Quicken) address this problem by allow that contain information from any Web site. The user has ing a user to create personalized pages that contain only full control over which information is retrieved, and thus information that is o f interest to the user. We propose a she need not sign-up for any special service. In addition, new approach fo r personalization that improves on exist personalized pages may include information from multiple ing services in three significant ways: the user can create sites (e.g., one’s favorite news categories from her preferred personalized pages with information from any site (with news source, weather information for her city, and the traf out being restricted to sites that offer personalization); fic report for her afternoon commute). This service can personalized pages may contain information from multiple also be extended to customize change notifications based Web sites (e.g., a user can create a personalized page that on conditions that may span multiple sites. contains not only news categories from her favorite news Our personalization system, MyOwnWeb, views a per sources, but also information about the prices o f all stocks sonalized page as a set of logical queries against Web sites, whose names appear in the headlines o f selected news, and and the user can specify the conditions under which each weather information fo r a particular city); and users have query needs to be re-executed and the page refreshed. Each more privacy since they are not required to sign up fo r the logical query is translated into a query to a specific Web site service. In order to build a personalization service that by translating logical attributes of the personalized system is general and easy to maintain, we make use o f site de into the actual attributes of a specific Web site. Such an scriptions that facilitate access to the data stored in and approach enables user to get information from several Web generated by Web sites. Site descriptions encode informa sites without requiring the user to know specific address tion about the contents, structure, and sennces offered by a able attributes that the site may have. Web site, and they can be created semi-automatically. Since Web languages such as W3QL [11], We- bOQL [14], and WebL [9] provide mechanisms for query ing the Web, a limited version of this service could be built 1. In tro d u ctio n using these languages — either allowing users to write a The information overload that we face today on the set of queries, or by providing a set of pre-defined queries Web has created a great need for efficient filtering mecha for a user to choose from. Neither alternative is satisfac nisms that give better and faster access to data. Some sites tory: whereas the latter is too restrictive (and since Web (e.g., CNN and Quicken) address this problem by allow sites change constantly, significant effort may be required ing a user to create personalized pages that contain only to maintain such queries), the former is likely not to be information that is of interest to the user. Other sites send widely acceptable as these languages can be too complex notifications when the underlying data changes and some for an average Web user. condition is met. However, there are problems with these MyOwnWeb uses site descriptions as a simple and suc approaches to personalization: 1) personalization is limited cinct way to represent the structure and contents of a Web to the information accessible from a single site; 2) person site. These site descriptions are an extension of the naviga alization features are not offered by all the Web sites that tion maps described in [5], and thus they can be generated might be of interest to a user; 3) since sites that provide semi-automatically. In addition users (or map builders) personalization require users to sign up, there are privacy are not required to know specifics of the language that is issues involved — the user needs to divulge personal infor Heb_page[ Declaration of Class WebPage used for site description specification. Given a set of such address^-url: URL of page descriptions, a naive Web user can create queries by sim contents=^string; HTML contents of page actions=££*{action} List of actions found in the page ply selecting from site descriptions the contents of interest extracted-attr=&{output-attr}]', Attributes extractable from page without having to specify how they should be retrieved. A action[ Declaration of Class Action point and click user-interface is provided that lets users se object=^{link,form} Object that action applies to source=Hirl; Page where the action belongs lect attributes of interest from source pages, and prompts t ar get s —»web_page; Where this could lead us the user for required input values. In the absence of the doitd attrValPair^-web page] Method to execute action description for a site, our service provides SmartBook- submit_form : action Form fillout is an action follow_link : action Following a link is an action marks [2], a mechanism that transparently records a se lin k[ Declaration of Class Link quence of browsing actions that can be saved and replayed name=>-string; Name of link later. address=>url | URL of link form[ Declaration of Class Form The rest of the paper is organized as follows. In Sec cgi=Hirl; URL for submission of this form tion 2 we describe how site descriptions are used to model method=>meth; Submit Method of this form mandat or y=££* input _attr; Mandatory attributes of this form Web sites, and how they can be used to build personalized opt i onal=S£- input _attr; Optional attributes of this form pages. An overview of the MyOwnWeb personalization state=£>attrValPair | State of form (name-value pairs) system is given in Section 3. Related work and conclud attrValPair[ Declaration of Class AttrValPair attrName—^string; Name of the attribute part ing remarks are presented in Sections 4 and 5. type—)-widget; Checkbox, select, radio, text etc. default—)• Object; Default value of the attribute value—> Ob j ect] The value part 2. Modeling Web Sources Several data models to represent the Web have been Figure 1. Data Model for Navigation Maps proposed in the literature (e.g., [4, 10, 5]). Site descrip tions based on these data models provide a uniform repre action has a source (the Web page where it appears), a set sentation of sites that is similar to a schema in traditional of possible targets (the Web pages it can lead to), and a databases, and thus simplify querying of described sources. method to execute the action (for example, follow link or They are also focused, and need only reflect a set of pages fill form). Figure 4 shows an instance of a navigation map for www. q u ic k en . com. that contain relevant information and services. Since these models contain not only information about the contents of Even though navigation maps provide a good frame pages, but also information required to navigate through a work for retrieving Web pages, issues related to data ex set of pages, they can be used for handling data retrieval traction are not discussed in [5]. In what follows, we detail and extraction in our personalization service. our extensions to handle data extraction. The site descriptions used in our personalization ser vice are an extension of the navigation maps described in Data Extraction In order to provide filtering in our per [5]. A navigation map is a collection of Web objects that sonalization service, retrieved Web pages are further pro represent the portion of a Web site that is of interest to cessed and attributes of interest extracted. For exam the map builder. It can be represented by a directed graph ple, if one is interested in the prices of a set of stocks, whose nodes correspond to Web pages, and an edge be he/she should be able to select from the page returned by tween nodes corresponds to the action that took the user q u ic k e n . com just the table with the prices (and omit from one page to another (see Figure 3). There are many all ads and other information that is returned together with advantages of using this model in a personalization service: the table). Sometimes though, fine-grained data extraction (1) the user can be presented with a visual representation of may be required. For example, the user may want to see the site and its contents, and once the desired information only the B id and Ask prices for a stock, and omit all the is selected, the actual navigation process to retrieve the se other information. Such an extraction can be done by se lected pages can be generated automatically; (2) since these lecting attributes of interest from the source pages using a maps can be created semi-automatically (by example), one point-and-click interface.