DOM-Based Content Extraction of HTML Documents

Suhit Gupta, Gail Kaiser, David Neistadt, Peter Grimm
Columbia University, Dept. of Comp. Sci., New York, NY 10027, US
001-212-939-7184 / 001-212-939-7081
[email protected], [email protected], [email protected], [email protected]

Abstract

Web pages often contain clutter (such as pop-up ads, unnecessary images, and extraneous links) around the body of an article that distracts the user from the actual content. Extraction of "useful and relevant" content from web pages has many applications, including cell phone and PDA browsing, speech rendering for the visually impaired, and text summarization. Most approaches to removing clutter or making content more readable involve changing font size or removing HTML and data components such as images, which takes away from a webpage's inherent look and feel. Unlike "content reformatting", which aims to reproduce the entire webpage in a more convenient form, our solution directly addresses "content extraction". We have developed a framework that employs an easily extensible set of techniques incorporating the advantages of previous work on content extraction. Our key insight is to work with the Document Object Model tree rather than with raw HTML markup. We have implemented our approach in a publicly available Web proxy that extracts content from HTML web pages.

1. Introduction

Web pages are often cluttered with distracting features around the body of an article that divert the user from the content they are actually interested in. These "features" may include pop-up ads, flashy banner advertisements, unnecessary images, or links scattered around the screen. Automatic extraction of useful and relevant content from web pages has many applications, ranging from enabling end users to access the web more easily over constrained devices like PDAs and cellular phones, to providing better access to the web for the visually impaired.

Most traditional approaches to removing clutter or making content more readable involve increasing font size, removing images, disabling JavaScript, etc., all of which destroy the webpage's inherent look-and-feel. Examples include WPAR [18], Webwiper [19] and JunkBusters [20]. All of these products rely on hardcoded techniques for certain common web page designs, as well as "blacklists" of advertisers, and can therefore produce inaccurate results when they encounter a layout they haven't been programmed to handle. Another approach is content reformatting, which reorganizes the data so that it fits on a PDA; this does not eliminate clutter, however, but merely rearranges it. Opera [21], for example, uses its proprietary Small Screen Rendering technology to reformat web pages to fit within the screen width. We propose a "content extraction" technique that can remove clutter without destroying the webpage layout, making more of a page's content viewable at once.

Content extraction is particularly useful for the visually impaired and blind. A common practice for improving web page accessibility for the visually impaired is to increase font size and decrease screen resolution; this, however, also enlarges the clutter, reducing its effectiveness. Screen readers for the blind, like Hal Screen Reader by Dolphin Computer Access or Microsoft's Narrator, don't usually remove such clutter automatically either, and often read out the full raw HTML. Both groups therefore benefit from content extraction, since less material must be read to obtain the desired result.

Natural Language Processing (NLP) and information retrieval (IR) algorithms can also benefit from content extraction, as they rely on the relevance of content and the reduction of "standard word error rate" to produce accurate results [13]. Content extraction lets these algorithms take only the extracted content as input, as opposed to cluttered data coming directly from the web [14]. Currently, most NLP-based information retrieval applications require specialized extractors to be written for each web domain [14][15]. While generalized content extraction is less accurate than hand-tailored extractors, it is often sufficient [22] and reduces the labor involved in adopting information retrieval systems.

While many algorithms for content extraction already exist, few working implementations can be applied in a general manner. Our solution employs a series of techniques that address the aforementioned problems. To analyze a web page for content extraction, we pass it through an HTML parser that corrects the markup and creates a Document Object Model tree. The Document Object Model (www.w3.org/DOM) is a standard for creating and manipulating in-memory representations of HTML (and XML) content. By parsing a web site's HTML into a DOM tree, we can not only extract information from large logical units similar to Buyukkokten's "Semantic Textual Units" (STUs, see [3][4]), but can also manipulate smaller units, such as specific links, within the structure of the tree. In addition, DOM trees are highly editable and can easily be used to reconstruct a complete web site. Finally, growing support for the Document Object Model makes our solution widely portable.
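To make that last point concrete, here is a minimal sketch (not the paper's proxy implementation) of parsing a page into a tree, editing one specific link node, and serializing the page back out. Python's BeautifulSoup stands in for a W3C DOM parser, and the sample markup and its "/ad" href are invented for the example.

```python
# Illustrative sketch only: BeautifulSoup stands in for a DOM parser,
# and the sample markup is invented for the example.
from bs4 import BeautifulSoup

html = """<html><body>
<h1>Article</h1>
<p>Real content with an <a href="/ad">advertiser link</a> inside.</p>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# Manipulate one small unit of the tree: replace a specific link
# node with its bare text, leaving the rest of the page intact.
ad_link = soup.find("a", href="/ad")
ad_link.replace_with(ad_link.get_text())

# The edited tree serializes straight back to a complete page.
print(soup.prettify())
```

A raw-markup approach would have to make the same change by string surgery; the tree makes the edit local, structure-preserving, and easy to undo.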
2. Related Work

There is a large body of related work in content identification and information retrieval that attempts to solve similar problems using various other techniques. Finn et al. [1] discuss methods for content extraction from "single-article" sources, where the content is presumed to lie in a single body. The algorithm tokenizes a page into either words or tags; the page is then sectioned into three contiguous regions, with boundaries placed so that most tags fall into the two outside regions and most word tokens into the center region. This approach works well for single-body documents, but it destroys the structure of the HTML and doesn't produce good results for multi-body documents, i.e., those where content is segmented into multiple smaller pieces, as is common on Web logs ("blogs") like Slashdot (http://slashdot.org). For the content of multi-body documents to be extracted successfully, the running time of the algorithm would become polynomial with degree equal to the number of separate bodies: extraction from a document containing 8 distinct bodies would run in O(N^8), N being the number of tokens in the document.
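The single-body optimization can be sketched compactly. What follows is an illustrative reading of Finn et al.'s idea rather than their code; it assumes the page has already been tokenized into (kind, text) pairs, and the two nested boundary loops show why even the single-body case is O(N^2) while each additional body adds further nested boundary choices.

```python
# Sketch of the single-body optimization in Finn et al. [1]: pick
# boundaries i <= j maximizing tag tokens outside [i, j) plus word
# tokens inside it. Assumes tokens is a list of (kind, text) pairs
# with kind in {"tag", "word"}; tokenization itself is omitted.
def extract_single_body(tokens):
    n = len(tokens)
    prefix_tags = [0]                 # prefix_tags[k] = tags in tokens[:k]
    for kind, _ in tokens:
        prefix_tags.append(prefix_tags[-1] + (kind == "tag"))
    total_tags = prefix_tags[n]

    best_score, best_i, best_j = -1, 0, n
    for i in range(n + 1):            # two boundaries -> O(N^2);
        for j in range(i, n + 1):     # more bodies -> more nested choices
            tags_in = prefix_tags[j] - prefix_tags[i]
            words_in = (j - i) - tags_in
            score = (total_tags - tags_in) + words_in
            if score > best_score:
                best_score, best_i, best_j = score, i, j

    return " ".join(text for kind, text in tokens[best_i:best_j]
                    if kind == "word")
```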
McKeown et al. [8][9], in the NLP group at Columbia University, detect the largest body of text on a webpage (by counting the number of words) and classify it as the content. This method works well on simple pages, but it produces noisy or inaccurate results on multi-body documents, especially those with randomly placed advertisements and images.

Rahman et al. [2] propose another technique that uses structural analysis, contextual analysis, and summarization. The structure of an HTML document is first analyzed and decomposed into smaller subsections; the content of the individual sections is then extracted and summarized. However, this proposal has yet to be implemented, and while the paper lays out prerequisites for content extraction, it doesn't actually propose methods for performing it.

A variety of approaches have been suggested for formatting web pages to fit on the small screens of cellular phones and PDAs (including the Opera browser [16] and Bitstream ThunderHawk [17]); these only reorganize the content of the webpage to fit on a constrained device, and still require the user to scroll and hunt for the content.

Buyukkokten et al. [3][10] define "accordion summarization" as a strategy whereby a page can be shrunk or expanded much like the instrument. They also describe a method to transform a web page into a hierarchy of individual content units called Semantic Textual Units, or STUs. First, STUs are built by analyzing syntactic features of an HTML document, such as text contained within paragraph (<P>), table cell (<TD>), and frame component (<FRAME>) tags. These features are then arranged into a hierarchy based on the HTML formatting of each STU: STUs that contain HTML header tags (<H1>, <H2>, and <H3>) or bold text (<B>) are given a higher level in the hierarchy than plain text. The resulting hierarchical structure is displayed on PDAs and cellular phones. While Buyukkokten's hierarchy is similar to our DOM tree-based model, DOM trees remain highly editable and can easily be reconstructed back into a complete web site.

Kaasinen et al. [5] discuss methods to divide a web page into individual units likened to cards in a deck. Like STUs, a web page is divided into a series of hierarchical "cards" that are placed into a "deck", and the deck is presented to the user one card at a time for easy browsing. The paper also suggests a simple conversion of HTML content to WML (Wireless Markup Language), removing simple information such as images and bitmaps from the web page so that scrolling is minimized on small displays. While this reduction has advantages, the method shares the problems of STUs: the deck-of-cards model relies on splitting a site into tiny sections that can then be browsed as windows, which leaves it up to the user to determine on which cards the actual content is located.

None of these approaches solves the problem of automatically extracting just the content, although they do provide simpler means by which the content can be found; as such, they limit the analysis of web sites. By parsing a web site into a DOM tree, more control can be achieved while extracting content.

3. Our Approach

Our solution employs multiple extensible techniques that incorporate the advantages of the previous work on content extraction.
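One plausible shape for such an extensible set is a pipeline of independent filters that each edit the tree in place; a sketch follows. This is an assumption-laden illustration, not the authors' implementation: BeautifulSoup (version 4.9 or later, for the decomposed check) stands in for a DOM parser, and both filters and the 0.65 link-density threshold are invented for the example.

```python
# Illustrative sketch of an extensible filter pipeline over a parse
# tree; the filters and threshold are assumptions, not the authors'.
from bs4 import BeautifulSoup

def remove_scripts(soup):
    # Nodes that never carry article text.
    for node in soup.find_all(["script", "style", "iframe"]):
        node.decompose()

def remove_link_lists(soup, threshold=0.65):
    # Prune containers whose text is mostly link text (navigation
    # bars, blogrolls, ad blocks). Collect first, then remove, so
    # nested candidates are handled safely.
    doomed = []
    for cell in soup.find_all(["td", "div", "ul"]):
        text = cell.get_text(" ", strip=True)
        links = " ".join(a.get_text(" ", strip=True)
                         for a in cell.find_all("a"))
        if text and len(links) / len(text) > threshold:
            doomed.append(cell)
    for cell in doomed:
        if not cell.decomposed:       # already removed with an ancestor
            cell.decompose()

FILTERS = [remove_scripts, remove_link_lists]

def extract_content(html):
    soup = BeautifulSoup(html, "html.parser")
    for apply_filter in FILTERS:
        apply_filter(soup)
    return str(soup)
```

Under this shape, extending the extractor amounts to appending one function to FILTERS, which is exactly the extensibility the framework aims for.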