
http://excel.fit.vutbr.cz Platform for measurement of JavaScript APIs usage on a web Marek Schauer* Abstract The world wide web is a complex environment. Web pages can access many APIs ranging from text formatting to access to nearby Bluetooth devices. While many APIs are used for legitimate purposes, some are misused to track and identify their users without their knowledge. In this paper, we propose a methodology to measure the usage of JavaScript APIs on the public web. The methodology consists of an automated visit of several thousand websites and intercepting JavaScript calls performed by the pages. We also provide a design and architecture of a measurement platform that can be used for an automated visit of a list of websites. The proposed platform is based on OpenWPM. The browser is instrumented by OpenWPM and a customized Web API Manager extension is responsible for capturing JavaScript API calls. Keywords: JavaScript — ECMAScript — Web API — API usage — Web measurement — Browser — OpenWPM Supplementary Material: N/A *xschau00@fit.vutbr.cz, Faculty of Information Technology, Brno University of Technology 1. Introduction environment, which in addition to the base set of EC- MAScript provides many additional APIs specified by Web browsers offer a wide range of possibilities. A reg- other organizations (e.g. W3C or WHATWG). These ular user can say that it is just a place for viewing web standards are often called Web APIs and in this article, pages, but under the hood, web browsers provide an we will call these Web APIs as JavaScript APIs. actual bridge between a viewed page and the host op- erating system. A web browser allows a web page Many JavaScript APIs are used in a manner that to access information like values from sensors, infor- has nothing in common with the original purpose of the mation about battery status, installed fonts, and much API; this paper is concerned mainly with user tracking more. Internet industry often takes advantage of such and fingerprinting. This behavior of websites usually a wide range of information provided by web browsers includes using many distinct APIs, that provide some and there is quite a high percentage of pages that col- kind of specific information [1, 2]. For example Bat- lect a fingerprint from users and try to identify them tery Status API implementation on Mozilla Firefox to show them relevant content [1]. These practices are revealed very precise value allowing the trackers to quite common and a large part of them is implemented identify the user for a period of time [3, 4]. Because using the JavaScript language. the Battery Status API was used heavily for fingerprint- ing, it has been removed from Mozilla Firefox in 2017. JavaScript is a language that can be run in many Other examples of JavaScript APIs that are used often environments, while the most well-known one is prob- for fingerprinting are Canvas API [4], Audio API [2], ably the web browser. Nowadays, there are solutions Permissions API [1] or APIs for working with users that provide JavaScript to the servers too, such as device sensors [5]. NodeJS. All these environments supply the basic set of ECMAScript language specified by the ECMA organi- In our work, we aim to measure the JavaScript zation. In this article, we will focus on the web browser APIs usage by popular websites. In this article, we 130 The main idea of the measurement is to visit sev- 120 Internet Explorer 110 Firefox eral thousands of the most famous pages on the internet 100 Opera 90 and intercept as many JavaScript calls as possible. 80 Google Chrome 70 Microsoft Edge Visiting websites will be performed through the 60 Safari 50 commonly used web browser, specifically Mozilla Fire- 40 30 fox. This web browser will be enriched by an exten- 20 10 0 sion, that can intercept and log the JavaScript calls. To maximize the amount of intercepted JavaScript Amount of implemented APIs Amount implementedof calls we will visit not only the landing page but also the 1.1.2005 1.1.2006 1.1.2007 1.1.2008 1.1.2009 1.1.2010 1.1.2011 1.1.2012 1.1.2013 1.1.2014 1.1.2015 1.1.2016 1.1.2017 1.1.2018 1.1.2019 1.1.2020 1.1.2021 Date of new browser version release subset of subpages of each website. From the landing page, we will extract three links that point to a subpage Figure 1. Progress of Web APIs amount implemented of a given page. From each of these three subpages, in distinct browsers in time. we will get another three subpage links resulting in up to 13 pages of a given website being visited. This present core technologies to accomplish these measure- amount of pages should be high enough to catch the ments. The stack of technologies is based on Open- most of JavaScript calls. WPM1 enriched by a browser extension, that allows To visit several thousands of websites we need us to intercept JavaScript calls of different APIs. This to assemble a list of website addresses, that will be browser extension is based on Proxy objects. Further visited when performing our measurements. In this details about core technologies are given in Section3. area, we will take advantage of previous work of other Our work is based on work of Peter Snyders et researchers and we will use the Tranco list. In the al. [6] in 2016. Since then many new APIs were research conducted by Snyders, the Alexa Top Sites specified and implemented in web browsers. Progress list provided by Amazon was used. Our decision of of APIs amount implemented in several most famous choosing the Tranco instead of the Alexa Top Sites is web browsers is illustrated in Figure1. Figure is based explained in subsection 3.4. on data from Can I use website2. To give a web page a chance to perform all the In our work we will reproduce Snyders measure- JavaScript calls as it would be visited by a real user, we ments with a focus on new JavaScript APIs. More will wait and intercept JavaScript calls for 30 seconds details about results of Snyders research are given in on each page. section 3.1. In this paper, we propose the methodology Results of our measurements should also provide of our own measurements in section2. Further, in information about the JavaScript APIs, that were prob- subsections 3.2, 3.3 and 3.4 we describe technologies ably used in a manner, that is not necessary for a page that will be used for measurements. Finally, in sec- to be working and is very likely used in a way, that tion4 we describe our contribution and explain how the user would not find useful. To achieve this, we we customized and adjusted these formerly mentioned will run our measurements on every page in two dif- technologies to perform measurements. ferent modes. At first, we will visit the page using the common browser, which is not extended by any- thing special except the extension used for intercepting 2. Methodology proposal JavaScript calls and logging them. Then we will visit This section describes the methodology that is planned the same page using a browser complemented by an to be used when the actual measurements will be per- extension for blocking ads. The difference between formed. This methodology is based on the methodol- logged data will show us which APIs were blocked ogy used in previous research conducted by Peter Sny- by the ad blocker and which APIs were left out with- ders [6]. Building a methodology on top of a method- out blocking. As an extension for the ad-blocking, ology used in previous research can be profitable in we picked a generic version of well known Ghostery two ways. First, the methodology is already validated extension for Mozilla Firefox. by the authors of the mentioned study. Second, using a methodology that is very close to the original one 3. Existing research and state of the art can show us a difference in the usage of the JavaScript This section provides key insights into the existing APIs between 2016 and 2021. research in the field of JavaScript usage on the web. 1https://github.com/mozilla/OpenWPM Further, in the following subsections, we describe the 2https://caniuse.com/ state-of-the-art tools and technologies that we plan to use later when the measurements will be actually :Document taken. 3.1 JavaScript usage on the web document.querySelector(selectors) The Snyders study suggests that some of the JavaScript APIs are extremely popular and they are used on more than 90% of measured pages (e.g. a well known Document.createElement method from DOM Element | null API). On the other hand, there are many APIs that are used by a minority of measured pages. That being said, almost 50% of JavaScript APIs implemented in Figure 2. Sequence diagram of querySelector the browser at the time were not used by any of the method called directly measured pages. :Proxy :LogsInterface :Document The study also suggests that there is no direct con- nection between the implementation date of a given JavaScript API in the browser (or by its specification document.querySelector(selectors) date by some of the specifications vendors) and its log() popularity in using by websites. Concretely, there are some old JavaScript APIs, such as XMLHttpRequest, querySelector(selectors) that are still very popular. However, there are also quite Element | null new JavaScript APIs, that are used very frequently (i.e., Element | null Selectors API Level 1). The conducted study also measured the pages in two ways - with the ad blocker and without any ad Figure 3.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages6 Page
-
File Size-