Detecting Redundant CSS Rules in HTML5 Applications

rtifact A Comple * t * te n * A * te is W E s A e n C l l L o D C S o * * c P u e m s E u O e e n v R t e O o d t a y * s E a * l u d a e Detecting Redundant CSS Rules in HTML5 t Applications: A Tree Rewriting Approach Matthew Hague Anthony Widjaja Lin C.-H. Luke Ong Royal Holloway, University of Yale-NUS College, Singapore University of Oxford London Abstract 1. Introduction HTML5 applications normally have a large set of CSS (Cas- cading Style Sheets) rules for data display. Each CSS rule HTML5 is the latest revision of the HTML standard of the consists of a node selector and a declaration block (which World Wide Web Consortium (W3C), which has become a assigns values to selected nodes’ display attributes). As web standard markup language of the Internet. HTML5 provides applications evolve, maintaining CSS files can easily be- a uniform framework for designing a web application: (1) come problematic. Some CSS rules will be replaced by new data content is given as a standard HTML tree, (2) rules for ones, but these obsolete (hence redundant) CSS rules often data display are given in Cascading Style Sheets (CSS), and remain in the applications. Not only does this “bloat” the ap- (3) dynamic behaviors are specified through JavaScript. plications – increasing the bandwidth requirement – but it An HTML5 application normally contains a large set of also significantly increases web browsers’ processing time. CSS rules for data display, each consisting of a (node) se- Most works on detecting redundant CSS rules in HTML5 ap- lector given in an XPath-like query language and a decla- plications do not consider the dynamic behaviors of HTML5 ration block which assigns values to selected nodes’ dis- (specified in JavaScript); in fact, the only proposed method play attributes. However, many of these styling rules are that takes these into account is dynamic analysis, which can- often redundant (in the sense of unreachable code), which not soundly prove redundancy of CSS rules. In this paper, “bloat” the application. As a web application evolves, some we introduce an abstraction of HTML5 applications based rules will be replaced by new rules and developers often for- on monotonic tree-rewriting and study its “redundancy prob- get to remove obsolete rules. Another cause of redundant lem”. We establish the precise complexity of the problem styling rules is the common use of HTML5 boilerplate (e.g. and various subproblems of practical importance (ranging WordPress) since they include many rules that the applica- from P to EXP). In particular, our algorithm relies on an effi- tion will not need. A recent case study [46] shows that in cient reduction to an analysis of symbolic pushdown systems several industrial web applications on average 60% of the (for which highly optimised solvers are available), which CSS rules are redundant. These bloated applications are not yields a fast method for checking redundancy in practice. We only harder to maintain, but they also increase the band- implemented our algorithm and demonstrated its efficacy in width requirement of the website and significantly increase detecting redundant CSS rules in HTML5 applications. web browsers’ processing time. In fact, a recent study [47] reports that when web browsers are loading popular pages around 30% of the CPU time is spent on CSS selectors Keywords HTML5, jQuery, CSS, redundancy analysis, (18%) and parsing (11%). [These numbers are calculated static analysis, tree-rewriting, symbolic pushdown systems without even including the extra 31% uncategorised oper- ations of the total CPU time, which could include opera- tions from these two categories.] This suggests the importance of detecting and removing redundant CSS rules in an Permission to make digital or hard copies of all or part of this work for personal or HTML5 application. Indeed, a sound and automatic redun- classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation dancy checker would allow bloated CSS stylesheets to be on the first page. Copyrights for components of this work owned by others than ACM streamlined during development, and generic stylesheets to must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a be minimised before deployment. fee. Request permissions from [email protected]. There has been a lot of work on optimising CSS (e.g. [14, OOPSLA ’15, Month d–d, 20yy, City, ST, Country. Copyright © 20yy ACM 978-1-nnnn-nnnn-n/yy/mm. $15.00. 21, 44, 46, 47]), which include merging “duplicated” CSS http://dx.doi.org/10.1145/nnnnnnn.nnnnnnn rules, refactoring CSS declaration blocks, and simplifying CSS selectors, to name a few. However, most of these works and real-world examples from the benchmarks in Mesbah analyse the set of CSS rules in isolation. In fact, the only and Mirshokraie [46]), we were surprised to learn that one- available methods (e.g. Cilla [46] and UnCSS [58]) that step tree updates used in these applications are extremely take into account the dynamic nature of HTML5 introduced simple, despite the complexity of the JavaScript code from by JavaScript are based on simple dynamic analysis (a.k.a. the point of view of static analysers. That said, we found testing), which cannot soundly prove redundancy of CSS that these updates are not restricted to modifying only cer- rules since such techniques cannot in general test all possible tain regions of the HTML tree. As a result, models such behaviors of the HTML5 application. For example, from as ground tree rewrite systems [38] and their extensions the benchmarks of Mesbah and Mirshokraie [46] there are [22, 25, 36, 39, 43] (where only the bottom part of the tree some non-redundant CSS rules that their tool Cilla falsely may be modified) are not appropriate. However, systems identifies as redundant, e.g., due to the use of JavaScript with rules that may rewrite nodes in any region of a tree are to compensate for browser-specific behavior under certain problematic since they render the simplest problem of reach- HTML5 tags like <input/> (see Section 6 for more details). ability undecidable. Recently, owing to the study of active Removing such rules can distort the presentation of HTML5 XML, some restrictions that admit decidability of verifica- applications, which is undesirable. tion (e.g. [7, 8, 19, 20]) have been obtained. However, these models have very high complexity (ranging from double ex- Static Analysis of JavaScript A different approach to ponential time to nonelementary), which makes practical im- identifying redundant CSS rules by using static analysis plementation difficult. for HTML5. Since JavaScript is a Turing-complete pro- gramming language, the best one can hope for is approxi- Contributions. The main contribution of the paper is to mating the behaviors of HTML5 applications. Static anal- give a simple and clean tree-rewriting model which strikes ysis of JavaScript code is a challenging goal, especially a good balance between: (1) expressivity in capturing the in the presence of libraries like jQuery. The current state dynamics of tree updates commonly performed in HTML5 of the art is well surveyed by Andreasen and Møller [9], applications (esp. insofar as detecting redundant CSS rules with the main tools in the field being WALA [51, 55] and is concerned), and (2) decidability and complexity of rule TAJS [9, 28, 29]. These tools (and others) provide traditional redundancy analysis (i.e. whether a given rewrite rule can static analysis frameworks encompassing features such as ever be fired in a reachable tree). We show that the com- points-to [27, 55] and determinacy analysis [9, 51], type in- plexity of the problem is EXP-complete1, though under var- ference [28, 34] and security properties [23, 24]. The mod- ious practical restrictions the complexity becomes PSPACE elling of the HTML DOM is generally treated as part of the or even P. This is substantially better than the complexity heap abstraction [24, 29] and thus the tree structure is not of the more powerful tree rewriting models studied in the precisely tracked. context of active XML, which is at least double-exponential For the purpose of soundly identifying redundant CSS time. Moreover, our algorithm relies on an efficient reduc- rules, we need a technique for computing a symbolic repre- tion to a reachability analysis in symbolic pushdown systems sentation of an overapproximation of the set of all reachable for which highly optimised solvers (e.g. Bebop [12], Getafix HTML trees that is sufficiently precise for real-world appli- [33], and Moped [48]) are available. cations. Currently there is no clean abstract model that cap- We have implemented our reduction, together with a tures common dynamics of the HTML (DOM) tree caused proof-of-concept translation tool from HTML5 to our tree by the JavaScript component of an HTML5 application and rewriting model. Our translation by no means captures the at the same time is amenable to algorithmic analysis. Such full feature-set of JavaScript and is simply a means of test- a model is not only important from a theoretical viewpoint, ing the underlying model and analysis we introduce2. We but it can also serve as a useful intermediate language for the specifically focus on modelling standard features of jQuery analysis of HTML5 applications which among others can be [30] — a simple JavaScript library that makes HTML docu- used to identify redundant CSS rules. ment traversal, manipulation, event handling, and animation easy from a web application developer’s viewpoint.

Load more