<<

Muzeel : A Dynamic JavaScript Analyzer for Dead Code Elimination in Today’s Web Tofunmi Kupoluyi Moumena Chaqfeh Matteo Varvello New York University Abu Dhabi New York University Abu Dhabi Nokia Bell Labs Abu Dhabi, UAE Abu Dhabi, UAE varvello@.com [email protected] [email protected] Waleed Hashmi Lakshmi Subramanian Yasir Zaki New York University Abu Dhabi New York University New York University Abu Dhabi Abu Dhabi, UAE NY, USA Abu Dhabi, UAE [email protected] [email protected] [email protected] ABSTRACT 1 INTRODUCTION JavaScript contributes to the increasing complexity of to- The reuse of existing JavaScript code is a common web devel- day’s web. To support user interactivity and accelerate the opment practice which speeds up the creation and the amend- development cycle, web developers heavily rely on large ment of web pages, but it requires sending large JavaScript general-purpose third-party JavaScript libraries. This prac- ￿les to web browsers, even when only part of the code is tice increases the size and the processing complexity of a actually required. In this paper, we propose to eliminate web page by bringing additional functions that are not used JavaScript functions that are brought to a given web page, by the page but unnecessarily downloaded and processed but never used. These functions are referred to as dead code. by the browser. In this paper, an analysis of around 40,000 The elimination of dead code is inspired by the fact that web pages shows that 70% of JavaScript functions on the even though an unused function is never executed, it im- median page are unused, and the elimination of these func- pacts the overall performance of the page because it must be tions would contribute to the reduction of the page size processed by the browser. The impact of JavaScript is further by 60%. Motivated by these ￿ndings, we propose Muzeel worsened for users who solely rely on low-end smartphones (which means eliminator in Arabic); a solution for eliminat- to access the web [23]. For example, pages require overall ing JavaScript functions that are not used in a given web triple processing time on mobile devices compared to desk- page (commonly referred to as dead code). Muzeel extracts all tops [12]. While methods like script-steaming (parsing in of the page event listeners upon page load, and emulates user parallel to download) and lazy parsing can reduce JavaScript interactions using a bot that triggers each of these events, processing time, only a maximum of 10% improvement of in order to eliminate the dead code of functions that are not the page load time is reported [19]. called by any of these events. Our evaluation results span- This work is motivated by an analysis of around 40,000 ning several Android mobile phones and browsers show that web pages, which shows that 70% of JavaScript functions on Muzeel speeds up the page load by around 30% on low-end the median page are unused, and the elimination of these phones, and by 25% on high-end phones under 3G network. functions would contribute to the reduction of the page size It also reduces the speed index (which is an important user by 60%. Given that user interactivity is a key feature that experience metric) by 23% and 21% under the same network JavaScript provides in web pages, the dead code cannot be on low-end, and high-end phones, respectively. Additionally, accurately identi￿ed unless all JavaScript functions that are Muzeel reduces the overall download size while maintaining executed when the user interacts with the page are reported. the visual content and interactive functionality of the pages. However, the identi￿cation of dead code remains an open problem to date due to a number of challenges, including the dynamic nature of JavaScript that hinders the static analysis CCS CONCEPTS of the source code, the di￿erent ways in which the user can • Information systems World Wide Web. ! interact with a given page, and the highly dynamic changes in the structure and the state of modern web pages that KEYWORDS occur due to user interactivity, which require handling the amended and the dynamically generated events accordingly. JavaScript, Web, Page Event, User Interactivity Automation, Dead Code Elimination. 1 The aforementioned challenges has led [13] to passively reduction. Our analysis shows that 70% of JavaScript monitor JavaScript usage on web pages without capturing the functions on the median page are unused, and the functions that are triggered from the user interactivity, while elimination of these functions would contribute to the the analysis in [24] ends once the page is loaded, implying reduction of the page size by 60%. that any JavaScript functions called afterward will not be Proposing Muzeel; a novel solution for eliminating un- • considered. Instead of identifying only the dead code for used JavaScript functions in web pages through user potential elimination, [21] removes “less-useful” functions, interactivity automation that comprehensively consid- a process that can drastically impact the functionality of ers the events dependency. interactive pages (more than 40% loss of page functionality Evaluating the impact of Muzeel on the page perfor- • can be witnessed when saving 50% of the memory). mance and quality using several Android mobile phones To mitigate the impact of JavaScript on performance degra- and browsers, showing signi￿cant speedups in page dation of today’s web pages [28], and to address the chal- loads and speed index, and a reduction in page size lenges of JavaScript dead code elimination in these pages, we by around 0.8 Megabyte, while maintaining the visual and implement Muzeel; a novel dead code analyzer content and interactive functionality of most pages. that emulates the user interactivity events in web pages us- ing a browser automation environment. Muzeel dynamically analyzes JavaScript code after the page load to accurately 2 RELATED WORK identify the used JavaScript functions that are called when the user interacts with the page (instead of a static JavaScript 2.1 JavaScript Cost Mitigation analysis that is proven to be ine￿cient [24]). It then elimi- Web developers often utilize ugli￿ers (also known as mini- nates the dead code of functions that are not being called by ￿ers) [9, 11] to reduce the size of JS ￿les before using them any of these events. Muzeel addresses the uncertainty of the in their pages. The reduction is achieved by removing un- user interactions (the various ways in which a user can inter- necessary characters from JavaScript ￿les, such as new lines, act with a given page), by covering potential combinations white spaces, and comments. Despite the enhancement in of the run-time interactions while considering the events transmission e￿ciency (due to the reduced sizes of JavaScript dependency to trigger these events in the appropriate order. ￿les), the browser still has to interpret the entire JavaScript Muzeel can be applied to any web page, without imposing code, where a similar processing time is witnessed as in the constraints at the coding style level. case of the original code. In contrast, we mitigate the cost of We assume a medium CDN provider hosting the 40,000 JavaScript in web pages by completely eliminating unused most popular web pages [5]. We then dedicate a server ma- code. chine to “crawl” these pages and run Muzeel to produce While methods like script-steaming (parsing in parallel to Muzeel-ed pages (with dead code eliminated). We ￿nally download) and lazy parsing can reduce JavaScript processing setup a CDN edge node to serve both original and Muzeel- time, only a maximum of 10% improvement of the page load ed pages. The results show that for most pages the num- time is reported [19]. This is due to parsing one JavaScript ber of eliminated JavaScript functions ranges between 100 resource at a time, while many other JavaScript resource are and 10,000, while the JavaScript size reduction ranges from downloaded in parallel. With lazy parsing, some functions 100KBytes to 5MBytes. Motivated by the aforementioned that are never executed are still being processed, for instance, ￿ndings, we selected a set of 200 pages to evaluate the per- when embedded in other functions that are about to execute. formance of Muzeel across di￿erent Android phones and In [24], the authors proposed to analyze JavaScript code browsers, and the quality of Muzeel-ed pages with respect to eliminate unused functions. This analysis was bounded of the original pages. Results show that Muzeel speeds up by the time it takes a web page to load, which implies that the page load by around 30% on low-end phones, and by 25% any JavaScript functions called afterward, e.g., due to a user on high-end phones under 3G network. It also reduces the interaction or some dynamic JavaScript behavior, is ignored. speed index (which is an important user experience metric) The challenge we aim to address in this work is to identify by 23% and 21% under the same network on low-end, and all potential user events, independently of when the page high-end phones, respectively. Additionally, Muzeel main- loads, and then eliminate unused functions that are not being tains the visual content and interactive functionality of most called when any of these events is triggered. More recently, pages. The contribution of this paper are as follows: the authors of [21] proposed to remove less-useful functions from JavaScript elements used in web pages to minimize Analyzing JavaScript in 40,000 popular web pages, to memory usage in low-end mobile devices. They represent • investigate the number of unused functions and the a given web page as a set of components structured as a potential of eliminating these functions on page size dependency-tree graph and explored cutting in a bottom-up 2 fashion, where more than 40% loss of page functionality can 3.1 The Dynamic Nature of JavaScript be witnessed when saving 50% of the memory. The identi￿cation and the elimination of dead code is chal- lenging and remains an open problem to date. This is basi- cally due to the dynamic nature of JavaScript, which hin- ders the static analysis of the source code. More speci￿cally, language features such as the possibility of dynamically ac- 2.2 Web Complexity Solutions cessing objects properties, and context binding that allows During the last decade, a number of solutions were proposed developers to assign an arbitrary object to "this" keyword, to speed up complex web pages. In [28], it is shown that require a run-time analysis of the code to identify unused JavaScript has a key impact on Page Load Time (PLT) due functions. to its role in blocking the page rendering. To speedup PLT, Shandian [29] restructures the loading process of web pages, whereas Polaris [22] provides more accurate fetch sched- 3.2 Handling User Interactivity ules. However, the browser has to download and process Given that user interactivity is a key feature that JavaScript all JavaScript elements brought by a given page. In a recent provides in web pages, the dead code cannot be accurately solution [10], a proxy server is implemented to o￿er a set identi￿ed unless all JavaScript functions that are executed of simpli￿ed web pages via a proxy server, where the iden- when the user interacts with the page are reported. How- ti￿cation of essential JavaScript elements was based on the ever, di￿erent run-time execution￿ ows are expected due JS access to web pages for reading, writing, or event han- the uncertainty of the user interactions with the page. Con- dling. To prepare lightweight web pages, web developers can sequently, a major challenge to address by a dynamic web- utilize Accelerated Mobile Pages (AMP) [15], which based JavaScript analyzer is to handle all combinations of provides a set of restrictions in a web content creation frame- interactive events in a given web page. This can be achieved work. The impact of AMP pages on the user experience is by providing an accurate representation of the page elements characterized in [20]. While AMP o￿ers the opportunity to along with their associated events, which requires the con- create new lightweight pages, we aim to improve the brows- sideration of events dependencies across the page. ing experience by optimizing JS usage in existing web pages. 3.3 Handling Highly-Dynamic Pages The dynamism of web pages adds another level of complex- ity to the aforementioned challenge. More speci￿cally, ad- 3 MOTIVATION AND CHALLENGES ditional interactive elements can be added dynamically to the page, and existing elements can be modi￿ed as the user While methods like script-steaming (parsing in parallel to interacts with the page. With this in mind, it is required to download) and lazy parsing can reduce JavaScript processing consider changes in the page structure and state to cover time, only a maximum of 10% improvement of the page load potential additional and/or modi￿ed events. This challenge time is reported [19]. This is due to parsing one JavaScript has led [13] to analyze JavaScript usage from on web pages at resource at a time, while many other JavaScript resource are page load, while the functions that are triggered when users downloaded in parallel. With lazy parsing, some functions interact with pages are not captured. Similarly, the analysis that are never executed are still being processed, for instance, of dead code in [24] ends once the page is loaded, implying when embedded in other functions that are about to execute. that any JavaScript functions called afterward will not be The alternative approach proposed by is to com- Muzeel considered, and might be mistakenly eliminated. pletely eliminate the code of the functions that are never executed (dead code), motivated by the high percentage of dead code in modern web pages. Our analysis of 25,000 web 3.4 Real-World Deployment Challenges pages shows that 70% of functions in a median page are un- While a real world deployment would be helpful in identi- used. The elimination of these functions would contribute fying unused functions based on real user interactivity, it to the reduction of the page size by 60%. A survey among requires a representative number of users from di￿erent lo- 9,300 JavaScript developers rated dead code elimination as cations with di￿erent needs and interests, interacting with one of the highest requested features [16]. However, the the page for su￿cient periods of time. Instead of identifying identi￿cation and the elimination of JavaScript dead code only the dead code for potential elimination, [21] removes from modern web pages is challenging and remains an open “less-useful” functions, a process that can drastically impact problem to date. In the remaining of this section, we brie￿y the functionality of pages (more than 40% loss of page func- discuss these challenges. tionality can be witnessed when saving 50% of the memory). 3 With Muzeel, we aim to accurately identify and eliminate unused JavaScript functions from web pages to avoid un- necessary browser processing. To handle the dynamic prop- erties of JavaScript that hinder the identi￿cation of unused

functions, we propose to dynamically analyze web pages by automating all possible user interactivity events in these pages. Since the user interactivity events are associated with JavaScript functions, Muzeel triggers each of these events and monitor the JavaScript functions that are called upon the occurrence of a given event. Any function that is never called by the page is considered as unused function and will be eliminated accordingly. We also propose a potential imple- mentation that obtains and serves the optimized JavaScript ￿les to achieve an improved user experience without impact- ing the pages’ quality.

4 MUZEEL Figure 1: Muzeel’s architecture with the main pro- Muzeel is a dynamic JavaScript analyzer which autonomously cesses. identi￿es dead code in web pages with the goal of improv- ing the browsing experience, e.g., data savings and page speedups. Muzeel utilizes a black-box approach in which the Muzeel eliminates dead code of ￿rst party JavaScript ￿les, unused JavaScript functions in web pages are identi￿ed with- given the fact that the CDN is the authoritative entity respon- out having knowledge of the JavaScript code and its imple- sible for hosting these ￿les. For third-party JavaScript ￿les, mentation details. Unlike existing approaches, Muzeel runs there are di￿erent scenarios o￿ered by Muzeel, depending dead code elimination autonomously without the need for on the license of the JavaScript ￿les, as well as the web page “execution traces” [27] or real user interactions [14]. Muzeel owner’s preferences: addresses the aforementioned challenges by: For copyrighted JavaScript ￿les, Muzeel does not per- • form the dead code elimination process, given that Dynamically analyzing JavaScript code in web pages • these ￿les cannot be hosted by the CDN. after the page load to accurately identify the dead code For open-source/copylefted JavaScript ￿les, Muzeel • to eliminate, instead of a static JavaScript analysis that can perform the dead code elimination if the web page is proven to be ine￿cient [24]. owner agrees to host local versions of these ￿les. Thoroughly emulating users’ interaction with a given • Figure 1 shows Muzeel’s architecture indicating the main web page, by automatically triggering all events present processes: pre-processing, dead code discovery, and dead in the page, thus alleviating the need for a real user code elimination. Each of these processes is discussed further interactivity evaluation – a process that is time costly in this section. and prone to subjectivity. Addressing the uncertainty of the user interactions • 4.1 Pre-Processing (the various ways in which a user can interact with a given page), by covering potential combinations of The ￿rst phase of Muzeel’s pre-processing focuses on creat- the run-time interactions while considering the events ing an internal duplicate version of a given page on which dependency (to trigger them in the appropriate order). the dead code elimination is carried out on. This version is used by Muzeel to modify the JavaScript ￿les used by the Muzeel is envisioned as a service o￿ered by a CDN provider page and dynamically analyze it in an automated browser to help content owners optimize JavaScript code within their environment. To create this duplicate version, Muzeel copies pages. JavaScript code used in today’s web pages can be di- and hosts the web page along with its JavaScript ￿les in an vided broadly into two categories: ￿rst party JavaScript, and internal CDN back-end server. third-party JavaScript. First party JavaScript refers to those In the second phase, a unique ID is assigned to each ￿les that are hosted within the same authoritative domain of JavaScript function in each JavaScript ￿le used by the page. the web page, e.g., by a CDN provider. Third-party JavaScript The ID is represented in a form of the tuple <￿le_name, func- ￿les refer to scripts hosted externally, outside the CDN; pop- tion_startLine, function_endLine>. A special console log call ular examples are Google Tag Manager/Analytics [3]. is added to the ￿rst line of every function which outputs its 4 ID, that is, the ￿le where the function is de￿ned, as well as identi￿er. XPath is an accepted page element identi￿cation the function’s start and end line numbers. Muzeel maintains syntax on several browsers and browser automation tools. a list of all IDs created during this process. These IDs cover Muzeel chooses to use XPath for the following reasons: all functions present in the JavaScript ￿les used by the page. XPath allows for the extraction of the events associated • When a page event triggers a particular JavaScript func- with a speci￿c element. tion, the function will print it’s own ID to the browser’s con- XPath can also be used by browser automation tools • sole log. Function IDs that are not printed to the browser’s to trigger events on a speci￿c page element. console log after triggering all page events are identi￿ed as XPath allows Muzeel to uniquely identify elements • unused functions or dead code. across reloads. This is crucial since Muzeel has to reload the page in some circumstances (for example, 4.2 Dead Code Discovery when redirected out of the page upon triggering a Given the dynamic nature of JavaScript, discovering the used click event). XPath is used as opposed to using the and unused functions by the page is not possible through internal representation of page elements provided by static analysis [24]. As mentioned earlier (see Section 3), ex- the browser automation tools, which may be invalid isting dynamic analysis methods do not consider the user when the page is reloaded. interactions with the page such as clicking a button or nav- An XPath can be constructed using di￿erent strategies [1]: igating a drop-down menu. These interactions are crucial position-based, and attribute-based. The position-based strat- for identifying the JavaScript functions that are necessary egy would fail in situations such as adding/removing an el- for the page interactivity, given that these functions won’t ement to/from the page. Additionally, the attribute-based be under the dynamic analyzer radar unless their associated strategy requires the presence of unique attributes which events are triggered. identify every element, where such attributes might not al- The user interactivity on a given page is facilitated through ways be available. Consequently, Muzeel considers both ap- events, such as hover, click, and focus. Muzeel identi￿es proaches in constructing the XPath, where in the presence the functions required for user interactivity (used functions) of a tag element id, Muzeel uses the attribute-based strategy through event listeners. An event listener monitors the oc- given that these ids are unique. Whereas, in the absence of currence of a certain event to call a JavaScript function or the tag element id, Muzeel reverts back to the position-based a chain of functions required to handle that event. Conse- approach, where the XPath is constructed using position- quently, functions that are not called, either directly by any based indexing from the nearest parent element with an “id” event listener or indirectly by other called-functions, are con- or “class” attribute. sidered as unused functions or dead code. In order to trigger In an HTML document representing a given web page, ele- events, it is required to map these events to their correspond- ments are structured internally using di￿erent tags. Each tag ing page elements. A page element refers to an HTML tag can have a unique “id”, and a reference to a pre-de￿ned “class” that appears in the Document Object Model (DOM) of the of attributes from the accompanying Cascading Styling Sheets page, such as image, button, or navigation element. Muzeel (CSS) ￿les, which set the di￿erent visual attributes for a given loads the page in a real browser environment and leverages tag. To demonstrate this, given a page: the built-in functionality of that environment to identify 1 the events and associate them to their corresponding page 2 elements. Hence, Muzeel relies on the ￿nal DOM structure 3

built by the browser without having to statically analyze the 4 JavaScript code or create the DOM structure from scratch. 5
After identifying and mapping the events to page elements, 6
Muzeel considers the events dependency to automatically 7 trigger these events in the appropriate order through the 8
browser automation environment (see Section 4.2.2). The 9 functions called when an event is triggered are logged to the 10 console. Muzeel monitors the console for log statements and Listing 1: A sample page to demonstrate XPath obtains a list of functions that are called, in order to identify construction in Muzeel the used and unused functions. In the following, we discuss the design considerations in Muzeel’s dead code discovery. The “