Prevent XPath and CSS Based Scrapers by Using Markup Randomization

The Islamic University of Gaza
Deanship of Research and Graduate Studies
Faculty of Information Technology
Master of Information Technology

Prevent XPath and CSS Based Scrapers by Using Markup Randomization

By Ahmed Mustafa Ibrahim Diab
Supervised by Dr. Tawfiq S. Barhoom, Associate Prof. of Applied Computer Technology

A thesis submitted in partial fulfilment of the requirements for the degree of Master of Information Technology

September 2018

Declaration

I, the undersigned, submitter of the thesis entitled "Prevent XPath and CSS Based Scrapers by Using Markup Randomization", declare that the content of this thesis is the product of my own effort, except where otherwise referenced, and that this thesis as a whole, or any part of it, has not been submitted by others to obtain a degree or an academic or research title at any other educational or research institution.

I understand the nature of plagiarism, and I am aware of the University's policy on this. The work provided in this thesis, unless otherwise referenced, is the researcher's own work, and has not been submitted by others elsewhere for any other degree or qualification.

Student's name: Ahmed Mustafa Ibrahim Diab
Signature:
Date: 28/08/2018

Abstract

Web scraping is a useful technique that can be used ethically, for example in climate research and many other research fields, or unethically, for example to violate content privacy, which amounts to data theft. Several researchers have proposed approaches to this issue, but these solutions address the problem only partially or only in some cases, so the problem still needs further effort.
Consequently, this work introduces a new solution that prevents XPath- and CSS-based web scraping efficiently and is applicable to modern web techniques. The proposed solution is based on Markup Randomization, which renames all CSS classes of a web page and then syncs those changes back to the HTML page. The main advantage of the proposed solution is that it can be applied to any web page. Experiments were carried out on a collected dataset of 30 websites divided into three categories: News, Currency Rates and Weather. The aim of the experiments was to measure similarity, file size and processing time. Visual similarity was tested, and the results showed that no visual change occurred during or after applying the solution: most comparison results were 100%, and a few were above 97% because the page contained unsupported HTML tags, such as tags in a different namespace like Facebook plugins. File size also changed during the process: in some experiments it decreased because unnecessary HTML elements were removed, and in others it increased because of the length of the new CSS class names. The processing time depends on file size: a file with more than 4500 lines takes about 5 minutes on average, while a file of up to 4500 lines takes less than 2 minutes.

Keywords: Anti-Scraper, Anti-Data theft, Web Scrapers.

Abstract (translated from the Arabic)

Web scraping, the automated collection of information from websites, can be used ethically, for example in weather forecasting or even in scientific research. On the other hand, it can be used unethically in ways that promote the violation of content ownership, which is considered data theft. Some researchers have proposed several ways to solve this problem, but these solutions cannot end it completely, because they address it only partially, or only at some run times of the scraping program rather than all of them, or cannot be applied with the latest web standards.
In contrast, this thesis introduces a new method that prevents web scraping adequately and effectively with the latest web standards. The method is built on the principle of markup randomization: it renames all style rules ("CSS Rules") and at the same time changes them in the page's markup ("HTML Markup"), and it can be applied to every page of a site easily and without restrictions. The proposal was tested on a data sample prepared to suit it, consisting of 30 websites with different designs distributed over three categories: news sites, currency sites and weather sites. The aim of the experiments was to examine how similar a page is before and after applying the proposed method, the size of the change in the code files, and finally the total time needed to run the method. Visual similarity was examined using tools that check how similar pages are, and the results showed no visual change in most cases, where the similarity was 100%. In some cases the similarity reached 97% because the original code contained tags that the checking tools do not support and that produce different code each time, such as Facebook plugins. The change in file size was examined and compared with the original. The results showed that file size decreases because of the code optimization that takes place while the proposed method is applied; in some cases there was a natural increase in file size because the original code was already optimized and contained no unnecessary or invisible lines that could be removed. The total time to apply the proposed method depends on the size of the original code files: when the code contains 4500 lines or more, the total time is about 5 minutes, while for fewer than 4500 lines it is less than two minutes.

Keywords: web scraping, preventing data theft, preventing web scraping

Dedication

This research is dedicated to my father Mustafa, my mother Suad, my sister, my brothers, my wife, my sons Ezzuddeen and Yassin, my friends, and everyone who encouraged me to complete my studies.
Acknowledgment

I would first like to thank my thesis advisor, Associate Professor Tawfiq Soliman Barhoom of the Faculty of Information Technology at the Islamic University of Gaza. The door to Prof. Tawfiq's office was always open whenever I ran into a trouble spot or had a question about my research or writing. He consistently allowed this thesis to be my own work, but steered me in the right direction whenever he thought I needed it. Finally, I must express my profound gratitude to my father and mother and to my wife for providing me with unfailing support and constant encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you.

Author
Ahmed Mustafa Ibrahim Diab

Table of Contents

Declaration
Abstract
Abstract (Arabic)
Dedication
Acknowledgment
Table of Contents
List of Tables
List of Figures
List of Formulas
List of Abbreviations
Chapter 1 Introduction
1.1 Statement of the Problem
1.2 Objectives
1.2.2 Main Objectives
1.2.3 Specific Objectives
1.3 Importance of the Research
1.3.1 Motivation
1.4 Scope and Limitation of the Research
1.5 Overview of Thesis
Chapter 2 Theoretical Background
2.1 Introduction
2.2 Web Scraping Techniques
2.2.1 Web Usage Mining
2.2.2 Web Scraping
2.2.3 Semantic Annotations
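
The renaming step that the abstract describes (rename every CSS class, then sync the new names back into the HTML) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `build_mapping`, `rewrite_css` and `rewrite_html`, the regular expressions, and the random-name format are all assumptions made for illustration. The sketch only handles flat CSS rules and double-quoted `class` attributes; nested blocks such as `@media` and class names referenced from JavaScript would need a real CSS and HTML parser.

```python
import re
import secrets

# Hypothetical helpers for illustration only; not taken from the thesis.
CLASS_SELECTOR = re.compile(r"\.(-?[_a-zA-Z][\w-]*)")  # matches ".name" in selector text
CLASS_ATTR = re.compile(r'class="([^"]*)"')            # matches class="a b c" in HTML
BLOCK = re.compile(r"(\{[^}]*\})")                     # one flat declaration block


def build_mapping(css):
    """Assign a fresh random name to every class used in a CSS selector."""
    mapping = {}
    # Even-indexed pieces are selector text; odd-indexed pieces are { ... } blocks.
    for selector_text in BLOCK.split(css)[::2]:
        for name in CLASS_SELECTOR.findall(selector_text):
            if name not in mapping:
                mapping[name] = "c" + secrets.token_hex(4)
    return mapping


def rewrite_css(css, mapping):
    """Rename classes in selectors and leave declaration blocks untouched."""
    parts = BLOCK.split(css)
    for i in range(0, len(parts), 2):
        parts[i] = CLASS_SELECTOR.sub(
            lambda m: "." + mapping.get(m.group(1), m.group(1)), parts[i]
        )
    return "".join(parts)


def rewrite_html(html, mapping):
    """Apply the same renaming to every class attribute in the HTML."""
    def rename(match):
        names = match.group(1).split()
        return 'class="' + " ".join(mapping.get(n, n) for n in names) + '"'
    return CLASS_ATTR.sub(rename, html)


if __name__ == "__main__":
    css = ".price { color: red; } .news-item .title { font-weight: bold; }"
    html = ('<div class="news-item"><span class="title">Rates</span>'
            '<span class="price">3.5</span></div>')
    mapping = build_mapping(css)
    print(rewrite_css(css, mapping))
    print(rewrite_html(html, mapping))
```

After the rewrite, a scraper that relies on a selector such as `.price` or an XPath predicate such as `//span[@class="price"]` no longer finds a match, while the page renders the same because the CSS and the HTML were renamed consistently.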
Recommended publications
  • Bibliography of Erik Wilde
    Erik Wilde's Bibliography References [1] AFIPS Fall Joint Computer Conference, San Francisco, California, December 1968. [2] Seventeenth IEEE Conference on Computer Communication Networks, Washington, D.C., 1978. [3] ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, Los Angeles, California, March 1982. ACM Press. [4] First Conference on Computer-Supported Cooperative Work, 1986. [5] 1987 ACM Conference on Hypertext, Chapel Hill, North Carolina, November 1987. ACM Press. [6] 18th IEEE International Symposium on Fault-Tolerant Computing, Tokyo, Japan, 1988. IEEE Computer Society Press. [7] Conference on Computer-Supported Cooperative Work, Portland, Oregon, 1988. ACM Press. [8] Conference on Office Information Systems, Palo Alto, California, March 1988. [9] 1989 ACM Conference on Hypertext, Pittsburgh, Pennsylvania, November 1989. ACM Press. [10] UNIX - The Legend Evolves. Summer 1990 UKUUG Conference, Buntingford, UK, 1990. UKUUG. [11] Fourth ACM Symposium on User Interface Software and Technology, Hilton Head, South Carolina, November 1991. [12] GLOBECOM'91 Conference, Phoenix, Arizona, 1991. IEEE Computer Society Press. [13] IEEE INFOCOM '91 Conference on Computer Communications, Bal Harbour, Florida, 1991. IEEE Computer Society Press. [14] IEEE International Conference on Communications, Denver, Colorado, June 1991. [15] International Workshop on CSCW, Berlin, Germany, April 1991. [16] Third ACM Conference on Hypertext, San Antonio, Texas, December 1991. ACM Press. [17] 11th Symposium on Reliable Distributed Systems, Houston, Texas, 1992. IEEE Computer Society Press. [18] 3rd Joint European Networking Conference, Innsbruck, Austria, May 1992. [19] Fourth ACM Conference on Hypertext, Milano, Italy, November 1992. ACM Press. [20] GLOBECOM'92 Conference, Orlando, Florida, December 1992. IEEE Computer Society Press. [21] IEEE INFOCOM '92 Conference on Computer Communications, Florence, Italy, 1992. http://github.com/dret/biblio (August 29, 2018)
  • Querying JSON and XML: Performance Evaluation of Querying Tools for Offline-Enabled Web Applications
    QUERYING JSON AND XML Performance evaluation of querying tools for offline-enabled web applications Master Degree Project in Informatics One year Level 30 ECTS Spring term 2012 Adrian Hellström Supervisor: Henrik Gustavsson Examiner: Birgitta Lindström Querying JSON and XML Submitted by Adrian Hellström to the University of Skövde as a final year project towards the degree of M.Sc. in the School of Humanities and Informatics. The project has been supervised by Henrik Gustavsson. 2012-06-03 I hereby certify that all material in this final year project which is not my own work has been identified and that no work is included for which a degree has already been conferred on me. Signature: ___________________________________________ Abstract This article explores the viability of third-party JSON tools as an alternative to XML when an application requires querying and filtering of data, as well as how the application deviates between browsers. We examine and describe the querying alternatives as well as the technologies we worked with and used in the application. The application is built using HTML 5 features such as local storage and canvas, and is benchmarked in Internet Explorer, Chrome and Firefox. The application built is an animated infographical display that uses querying functions in JSON and XML to filter values from a dataset and then display them through the HTML5 canvas technology. The results were in favor of JSON and suggested that using third-party tools did not impact performance compared to native XML functions. In addition, the usage of JSON enabled easier development and cross-browser compatibility. Further research is proposed to examine document-based data filtering as well as investigating why performance deviated between toolsets.
  • Scraping HTML with XPath (Stéphane Ducasse, Peter Kenny)
    Scraping HTML with XPath. Stéphane Ducasse, Peter Kenny. To cite this version: Stéphane Ducasse, Peter Kenny. Scraping HTML with XPath. Published by the authors, pp.26, 2017. hal-01612689. HAL Id: hal-01612689, https://hal.inria.fr/hal-01612689. Submitted on 7 Oct 2017. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. Scraping HTML with XPath, Stéphane Ducasse and Peter Kenny, Square Bracket tutorials, September 28, 2017, master @ a0267b2. Copyright 2017 by Stéphane Ducasse and Peter Kenny. The contents of this book are protected under the Creative Commons Attribution-ShareAlike 3.0 Unported license. You are free: • to Share: to copy, distribute and transmit the work, • to Remix: to adapt the work, under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work.
  • SVG-Based Knowledge Visualization
    MASARYK UNIVERSITY FACULTY OF INFORMATICS SVG-based Knowledge Visualization DIPLOMA THESIS Miloš Kaláb Brno, spring 2012 Declaration Hereby I declare, that this paper is my original authorial work, which I have worked out by my own. All sources, references and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source. Advisor: RNDr. Tomáš Gregar Ph.D. Acknowledgement I would like to thank RNDr. Tomáš Gregar Ph.D. for supervising the thesis. His opinions, comments and advising helped me a lot with accomplishing this work. I would also like to thank to Dr. Daniel Sonntag from DFKI GmbH. Saarbrücken, Germany, for the opportunity to work for him on the Medico project and for his supervising of the thesis during my Erasmus exchange in Germany. Big thanks also to Jochen Setz from Dr. Sonntag’s team who worked on the server background used by my visualization. Last but not least, I would like to thank to my family and friends for being extraordinary supportive. Abstract The aim of this thesis is to analyze the visualization of semantic data and suggest an approach to general visualization into the SVG format. Afterwards, the approach is to be implemented in a visualizer allowing user to customize the visualization according to the nature of the data. The visualizer was integrated as an extension of Fresnel Editor. Keywords Semantic knowledge, SVG, Visualization, JavaScript, Java, XML, Fresnel, XSLT Contents Introduction 1 Brief Introduction to the Related Technologies 1.1 XML – Extensible Markup Language 1.1.1 XSLT – Extensible Stylesheet Lang.
  • XPath in NETCONF and YANG
    XPath in NETCONF and YANG
    Table of Contents: 1. Introduction 2. XPath 1.0 Introduction 3. The Use of XPath in NETCONF 4. The Use of XPath in YANG 5. XPath and ConfD 6. Conclusion 7. Additional Resources
    1. Introduction XPath is a powerful tool used by NETCONF and YANG. This application note will help you to understand and utilize this advanced feature of NETCONF and YANG. This application note gives a brief introduction to XPath, then describes how XPath is used in NETCONF and YANG, and finishes with a discussion of XPath in ConfD. The XPath 1.0 standard was defined by the W3C in 1999. It is a language which is used to address the parts of an XML document and was originally designed to be used by XML Transformations. XPath gets its name from its use of path notation for navigating through the hierarchical structure of an XML document. Since XML serves as the encoding format for NETCONF and a data model defined in YANG is represented in XML, it was natural for NETCONF and YANG to utilize XPath.
    2. XPath 1.0 Introduction XML Path Language, or XPath 1.0, is a W3C recommendation first introduced in 1999. It is a language that is used to address and match parts of an XML document. XPath sees the XML document as a tree containing different kinds of nodes. The types of nodes can be root, element, text, attribute, namespace, processing instruction, and comment nodes.
  • Pearls of XSLT/XPath 3.0 Design
    PEARLS OF XSLT AND XPATH 3.0 DESIGN PREFACE XSLT 3.0 and XPath 3.0 contain a lot of powerful and exciting new capabilities. The purpose of this paper is to highlight the new capabilities. Have you got a pearl that you would like to share? Please send me an email and I will add it to this paper (and credit you). I ask three things: 1. The pearl highlights a capability that is new to XSLT 3.0 or XPath 3.0. 2. Provide a short, complete, working stylesheet with a sample input document. 3. Provide a brief description of the code. This is an evolving paper. As new pearls are found, they will be added. TABLE OF CONTENTS 1. XPath 3.0 is a composable language 2. Higher-order functions 3. Partial functions 4. Function composition 5. Recursion with anonymous functions 6. Closures 7. Binary search trees 8. -- next pearl is? -- CHAPTER 1: XPATH 3.0 IS A COMPOSABLE LANGUAGE The XPath 3.0 specification says this: XPath 3.0 is a composable language What does that mean? It means that every operator and language construct allows any XPath expression to appear as its operand (subject only to operator precedence and data typing constraints). For example, take this expression: 3 + ____ The plus (+) operator has a left-operand, 3. What can the right-operand be? Answer: any XPath expression! Let's use the max() function as the right-operand: 3 + max(___) Now, what can the argument to the max() function be? Answer: any XPath expression! Let's use a for-loop as its argument: 3 + max(for $i in 1 to 10 return ___) Now, what can the return value of the for-loop be? Answer: any XPath expression! Let's use an if-statement: 3 + max(for $i in 1 to 10 return (if ($i gt 5) then ___ else ___)) And so forth.
  • Decentralized Identifier WG F2F Sessions
    Decentralized Identifier WG F2F Sessions Day 1: January 29, 2020 Chairs: Brent Zundel, Dan Burnett Location: Microsoft Schiphol Welcome! ● Logistics ● W3C WG IPR Policy ● Agenda ● IRC and Scribes ● Introductions & Dinner Logistics ● Location: “Spaces”, 6th floor of Microsoft Schiphol ● WiFi: SSID Publiek_theOutlook, pwd Hello2020 ● Dial-in information: +1-617-324-0000, Meeting ID ● Restrooms: End of the hall, turn right ● Meeting time: 8 am - 5 pm, Jan. 29-31 ● Breaks: 10:30-11 am, 12:30-1:30 pm, 2:30-3 pm ● DID WG Agenda: https://tinyurl.com/didwg-ams2020-agenda (HTML) ● Live slides: https://tinyurl.com/didwg-ams2020-slides (Google Slides) ● Dinner Details: See the “Dinner Tonight” slide at the end of each day W3C WG IPR Policy ● This group abides by the W3C patent policy https://www.w3.org/Consortium/Patent-Policy-20040205 ● Only people and companies listed at https://www.w3.org/2004/01/pp-impl/117488/status are allowed to make substantive contributions to the specs ● Code of Conduct https://www.w3.org/Consortium/cepc/ Today’s agenda 8:00 Breakfast 8:30 Welcome, Introductions, and Logistics Chairs 9:00 Level setting Chairs 9:30 Security issues Brent 10:15 DID and IoT Sam Smith 10:45 Break 11:00 Multiple Encodings/Different Syntaxes: what might we want to support Markus 11:30 Different encodings: model incompatibilities Manu 12:00 Abstract data modeling options Dan Burnett 12:30 Lunch (brief “Why Are We Here?” presentation) Christopher Allen 13:30 DID Doc Extensibility via Registries Mike 14:00 DID Doc Extensibility via JSON-LD Manu
  • Session 2: Markup Language Technologies
    XML for Java Developers G22.3033-002 Session 2 - Main Theme Markup Language Technologies (Part II) Dr. Jean-Claude Franchitti New York University Computer Science Department Courant Institute of Mathematical Sciences Agenda Summary of Previous Session / Review New Syllabus XML Applications vs. Applications of XML!? History and Current State of XML Standards Advanced Applications of XML XML’s eXtensible Style Language (XSL) Character Encodings and Text Processing XML and DBMSs Course Approach ... XML Application Development XML References and Class Project Readings Assignment #1a (due today - reminder?) Assignment #1b (due next week) Summary of Previous Session XML Generics Course Logistics, Structure and Objectives History of Meta-Markup Languages XML Applications: Markup Languages XML Information Modeling Applications XML-Based Architectures XML and Java XML Development Tools (XML, DTD and Schema Editors) Summary Class Project Readings Assignment #1a Old History Formatting Markups Rendition notations (e.g., LaTeX, TeX, RTF, MIF) Compatible with standard text editors Processed into presentations (printout, or electronic display) WYSIWYG What You See Is “ALL” You Get Meta-Markup Language GML (Goldfarb, Mosher, Lorie - IBM 1969) Generalized (i.e., indep. of systems, devices, applications) Markups (i.e., information related to struct. & content) Language (i.e., methodology with formal syntax) Validation capabilities (1974) SGML SGML (1978 - 10/15/86) Used by DoD for Continuous Acquisition and Lifecycle Support (CALS) http://www.oasis-open.org/cover/general.html SGML DTD or Schema <!DOCTYPE tutorials [ <!ELEMENT tutorials – (tutorial+)> <!ELEMENT tutorial – (title, intro, chap+)> <!ELEMENT title – O (#PCDATA)> <!ELEMENT intro – O (para+)> <!ELEMENT chap – O (title, para+)> <!ELEMENT para – O (#PCDATA)> ]> SGML Markup <tutorials> <tutorial> <title>XML TUTORIAL <intro> <para>Required first paragraph of intro.
  • Xpath Terminology
    Tutorial 7 – XPath, XQuery. CSC343 - Introduction to Databases, Fall 2008. TA: Lei Jiang
    XPath Terminology
    • Node – document root, element, attribute, text, comment, ...
    • Relationship – parent, child, sibling, ancestor, descendant, …
    • Exercise: Identify nodes and relationships in the following XML document (the document root does not correspond to anything in the document)
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <bookstore> <!-- a bookstore database -->
      <book isbn="111111" cat="fiction"> <!-- a particular book -->
        <title lang="chn">Harry Potter</title>
        <price unit="us">79.99</price>
      </book>
      <book isbn="222222" cat="textbook">
        <title lang="eng">Learning XML</title>
        <price unit="us">69.95</price>
      </book>
      <book isbn="333333" cat="textbook">
        <title lang="eng">Intro. to Databases</title>
        <price unit="usd">39.00</price>
      </book>
    </bookstore>
    Node selector
    Expression    Description
    /             Selects the document root node (absolute path)
    node          Selects the node (relative path)
    //            Selects all descendant nodes of the current node that match the selection
    .             Selects the current node
    ..            Selects the parent of the current node
    @             Selects attribute nodes
    Node selector: exercise
    Result                                  Path Expression
    Selects the document root node          ?    ?
    Selects the bookstore element node      ?    ?
    Selects all book element nodes          ?    ?
    Selects all price element nodes         ?    ?
    Selects all lang attribute nodes        ?
    ?                                       ././.
    ?                                       /bookstore//@lang/../..
    ?                                       ./book/title/@lang
    Node selector: exercise solution
    Result                                  Path Expression
    Selects the document root node          /    /.
    Selects the bookstore element node      /bookstore    ./bookstore
    Selects all book element nodes          /bookstore/book    //book
    Selects all price element nodes         bookstore/book/price    //price
    Selects all lang attribute nodes        //@lang
    Selects the document root node          ././.
  • Automating Navigation Sequences in AJAX Websites
    Automating Navigation Sequences in AJAX Websites Paula Montoto, Alberto Pan, Juan Raposo, Fernando Bellas, and Javier López Department of Information and Communication Technologies, University of A Coruña Facultad de Informática, Campus de Elviña s/n 15071 A Coruña, Spain {pmontoto,apan,jrs,fbellas,jmato}@udc.es Abstract. Web automation applications are widely used for different purposes such as B2B integration, automated testing of web applications or technology and business watch. One crucial part in web automation applications is to allow easily generating and reproducing navigation sequences. Previous proposals in the literature assumed a navigation model today turned obsolete by the new breed of AJAX-based websites. Although some open-source and commercial tools have also addressed the problem, they show significant limitations either in usability or their ability to deal with complex websites. In this paper, we propose a set of new techniques to deal with this problem. Our main contributions are a new method for recording navigation sequences supporting a wider range of events, and a novel method to detect when the effects caused by a user action have finished. We have evaluated our approach with more than 100 web applications, obtaining very good results. Keywords: Web automation, web integration, web wrappers. 1 Introduction Web automation applications are widely used for different purposes such as B2B integration, web mashups, automated testing of web applications or business watch. One crucial part in web automation applications is to allow easily generating and reproducing navigation sequences. We can distinguish two stages in this process: − Generation phase. In this stage, the user specifies the navigation sequence to reproduce.
  • XPath Is a Syntax for Defining Parts of an XML Document. XPath Uses Path Expressions to Navigate in XML Documents
    XML PROGRAMMING: SUB CODE- 24662 QPCODE: -780
    PART-A (Each question carries 1 mark. Answer any FIFTEEN (15) questions.)
    1. Mention any two parts of the XML tree structure. Root node; leaf node.
    2. Write any two uses of XPath. (Any two) XPath is a syntax for defining parts of an XML document. XPath uses path expressions to navigate in XML documents. XPath contains a library of standard functions. XPath is a major element in XSLT and in XQuery. XPath is a W3C recommendation.
    3. Define WML. Wireless Markup Language (WML) is a markup language for wireless devices that adhere to the Wireless Application Protocol (WAP) and have limited processing capability.
    4. What is an absolute location path? A location path specifies the location of a node in an XML document. This path can be absolute or relative. If the location path starts with the root node or with '/', it is an absolute path. Following are a few examples of locating elements using an absolute path.
    5. Write the limitations of the schema language. Poor support for XML namespaces; poor data typing; limited content model description; it supports only the text string data type; limited possibilities to express the cardinality of elements.
    6. Mention any two declarations that can be used in DTDs. (Any two) <!ELEMENT letter (date, address, salutation, body, closing, signature)>: element letter with child elements date, address, salutation, body, closing, signature. <!ELEMENT Name (#PCDATA)>: #PCDATA is parsed character data; the data contains only text. <!ELEMENT Street (#CDATA)>: #CDATA is character data; the data may contain text, numbers and other characters. <!ELEMENT br EMPTY>: EMPTY means the element has no content.
  • Scalable Vector Graphics (SVG) 1.2
    Scalable Vector Graphics (SVG) 1.2 W3C Working Draft 27 October 2004 This version: http://www.w3.org/TR/2004/WD-SVG12-20041027/ Previous version: http://www.w3.org/TR/2004/WD-SVG12-20040510/ Latest version of SVG 1.2: http://www.w3.org/TR/SVG12/ Latest SVG Recommendation: http://www.w3.org/TR/SVG/ Editor: Dean Jackson, W3C, <[email protected]> Authors: See Author List Copyright ©2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract SVG is a modularized XML language for describing two-dimensional graphics with animation and interactivity, and a set of APIs upon which to build graphics-based applications. This document specifies version 1.2 of Scalable Vector Graphics (SVG). Status of this Document This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/. This is a W3C Last Call Working Draft of the Scalable Vector Graphics (SVG) 1.2 specification. The SVG Working Group plans to submit this specification for consideration as a W3C Candidate Recommendation after examining feedback to this draft. Comments for this specification should have a subject starting with the prefix 'SVG 1.2 Comment:'. Please send them to [email protected], the public email list for issues related to vector graphics on the Web.
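
Several of the excerpts listed above describe XPath path expressions, and the "XPath Terminology" tutorial excerpt gives a bookstore document to practise on. The sketch below tries a few of those selectors with Python's standard `xml.etree.ElementTree` module. It is an illustration under stated assumptions, not part of any listed publication: ElementTree implements only a subset of XPath, its paths are evaluated relative to an element rather than the document root, and it cannot return attribute nodes, so `//@lang` is approximated here by reading the attribute from each matching element. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Bookstore document from the tutorial excerpt, with straight quotes.
BOOKSTORE = """<bookstore>
  <book isbn="111111" cat="fiction">
    <title lang="chn">Harry Potter</title>
    <price unit="us">79.99</price>
  </book>
  <book isbn="222222" cat="textbook">
    <title lang="eng">Learning XML</title>
    <price unit="us">69.95</price>
  </book>
  <book isbn="333333" cat="textbook">
    <title lang="eng">Intro. to Databases</title>
    <price unit="usd">39.00</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the <bookstore> element

# Child steps: roughly /bookstore/book/title in full XPath.
all_titles = [t.text for t in root.findall("./book/title")]

# Descendant step: roughly //price.
all_prices = [p.text for p in root.findall(".//price")]

# Attribute predicate: books whose cat attribute equals "textbook".
textbook_isbns = [b.get("isbn") for b in root.findall("./book[@cat='textbook']")]

# Stand-in for //@lang: find elements that carry the attribute, then read it.
langs = [t.get("lang") for t in root.findall(".//title[@lang]")]

print(all_titles)
print(all_prices)
print(textbook_isbns)
print(langs)
```

Selectors like these are what an XPath-based scraper depends on; if the element or attribute values they match are renamed, the expressions return empty lists.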