Statically Detecting JavaScript Obfuscation and Minification Techniques in the Wild

Marvin Moog∗†, Markus Demmel∗, Michael Backes†, and Aurore Fass† ∗Saarland University †CISPA Helmholtz Center for Information Security: {backes, aurore.fass}@cispa.de

Abstract—JavaScript is both a popular client-side program- inherently different objectives, these transformations leave ming language and an attack vector. While malware developers different traces in the source code syntax. In particular, transform their JavaScript code to hide its malicious intent the Abstract Syntax Tree (AST) represents the nesting of and impede detection, well-intentioned developers also transform their code to, e.g., optimize performance. In this paper, programming constructs. Therefore, regular (meaning non- we conduct an in-depth study of code transformations in the transformed) JavaScript has a different AST than transformed wild. Specifically, we perform a static analysis of JavaScript files code. Specifically, previous studies leveraged differences in to build their Abstract Syntax Tree (AST), which we extend with the AST to distinguish benign from malicious JavaScript [5], control and data flows. Subsequently, we define two classifiers, [9], [14], [15]. Still, they did not discuss if their detectors benefitting from AST-based features, to detect transformed sam- ples along with specific transformation techniques. confounded transformations with maliciousness or why they Besides malicious samples, we find that transforming code did not. In particular, there are legitimate reasons for well- is increasingly popular on Node.js libraries and client-side intentioned developers to transform their code (e.g., perfor- JavaScript, with, e.g., 90% of Alexa Top 10k containing mance improvement), meaning that code transformations are a transformed script. This way, code transformations are no not an indicator of maliciousness. indicator of maliciousness. Finally, we showcase that benign code transformation techniques and their frequency both differ from In this paper, we conduct an in-depth empirical study the prevalent malicious ones. of code transformations in the wild. Contrary to Skolka et Index Terms—Web Security, JavaScript, Obfuscation, Minifi- cation, Empirical Study al. [44], who performed an analysis of transformed code on the most popular websites, we do not focus solely on client- I.INTRODUCTION side JavaScript, but we also consider library code from and malicious JavaScript, for a comparative analysis. Contrary JavaScript is a scripting language, which has become one to their approach, we do not target the tools leveraged to of the core technologies of the Web platform. In particular, modify the code. Instead, we dive into the specific techniques it is used as a client-side programming language by almost used to transform it to, e.g., highlight any differences in 97% of all websites [47]. Initially, JavaScript was invented terms of techniques or usage frequency between benign vs. to create sophisticated and interactive web pages. However, malicious code transformations, based on the ten techniques as it is executed in users’ browsers, it also provides a basis that we monitor. At the same time, we perform a longitudinal for attacks. On the one-hand, JavaScript offloads the work analysis of code transformations in the wild to study at large- to the client-side, meaning that it should be lightweight and scale the evolution of these techniques, depending on time fast to reduce loading times. Specifically, saving bandwidth or on specific trends. Since these transformation techniques will improve website performance. 
To this end, developers mostly work at code level, e.g., to make it harder to reverse- transform their code to reduce its size, e.g., by shortening the engineer, we seek to directly analyze the patterns that specific length of variables, inlining functions, and various optimiza- transformations leave in the code structure. To directly work tion shortcuts [17], [32]. We use the term minification to refer at code structure level, we chose to perform a static analysis to the transformation techniques aiming at reducing code size. of JavaScript files, also motivated by its speed and high code Overall, developers can also choose to protect code privacy coverage. Specifically, we enhance the traditional AST with and intellectual property. For this purpose, there are some control and data flow information to abstract the source code, additional code transformations, which, e.g., make reverse- without losing its semantics. Next, we extract specific features engineering harder without downgrading the performance too from the AST, which are either typical of regular JavaScript or much. On the other hand, to perform malicious activities, of a given transformation technique. Finally, we leverage two attackers would like to evade detection. For this purpose, they random forest classifiers, first to predict whether a given script transform their code too, to hide its malicious intent or at least is transformed or not, and second–in the case of a transformed make it harder to analyze, e.g., string manipulations, encoding, script–to detect the specific techniques developers used. and logic structure obfuscation. With the term obfuscation, we refer to techniques, which aim at hindering code analysis. In particular, we highlight the fact that code transforma- This way, both benign and malicious JavaScript use code tions are popular in the wild, both for websites (68.60% of transformations but for different purposes. Due to their the extracted scripts), npm packages (8.7% of the extracted scripts), and malicious JavaScript (29-73%). Specifically, web- encoding (such as ASCII, Unicode, or hexadecimal), encryp- sites are getting more transformed over time, meaning that tion and decryption functions hinder a direct understanding web developers seem to be more concerned about bandwidth of the code. We refer to these techniques as string ob- and loading times, i.e., are increasingly working on improving fuscation. Similarly, with integer obfuscation, website performance. On the contrary, malicious JavaScript are a number does not appear in plain text but is, e.g., computed more ephemeral, so that we rather observe trends specific to with arithmetic operators. Further techniques also enable a month. Still, the transformation techniques used by benign to hide data. For example, with obfuscated field developers are not changing, neither over time nor between reference, the bracket notation is privileged over the client-side and library-based JavaScript, whereas malicious dot notation to access an object property (as it enables to JavaScript uses different patterns. Specifically, the most pop- compute an expression, whereas dot notation only considers ular benign transformation techniques relate to minification, identifiers [34]). In addition, data can be fetched from mostly basic techniques, while the prevalent malicious ones a global array (global array) or rewritten, so that it include identifier and string obfuscation and aggressively mini- does not contain any alphanumeric characters anymore [36] fying the code. 
Besides different transformation techniques, we (no alphanumeric). also note that the remaining techniques that we monitor have • Logic structure obfuscation directly targets the code logic, differing usage frequency depending on whether the files are such as manipulating execution paths, e.g., by adding condi- benign or malicious. tional branches. Another technique consists in adding irrele- To sum up, our paper makes the following contributions: vant instructions (dead-code injection) or changing • We abstract JavaScript files through their AST enhanced the program flow, e.g., by moving all basic blocks in a with control and data flows. Based on features we extract single infinite loop, whose flow is controlled by a switch from the AST, we train two classifiers: while the first one statement [23] (control-flow flattening). distinguishes regular from transformed code, the second one • Dynamic code generation leverages the dynamic nature of can recognize the specific transformation techniques used. JavaScript to generate code on the fly, e.g., with eval. • We evaluate our approach on a labeled dataset of regular • Environment interactions is specific to web JavaScript. and transformed JavaScript, also combining different trans- In this case, statements can be scattered across an HTML formation techniques. document using multiple < script > blocks. Similarly, the • We perform a large-scale and comparative analysis of code payload could also be stored within the DOM and extracted transformations in the wild, where we target client-side, subsequently so that it is not directly discernible. library-based, and malicious JavaScript. • Code protection regroups techniques to protect code pri- • Finally, we conduct a longitudinal analysis of code transfor- vacy and intellectual property, e.g., by impeding reverse mations to study their evolution over time. engineering. Specifically, we focus on self-defending, which makes the code resilient against formatting and For reproducibility reasons, our source code is available [31]. variable renaming [24]. In addition, we consider debug II.TRANSFORMING JAVASCRIPT CODE protection, which makes it harder to use features from the Developer Tool, for Chrome and Firefox [24]. This section first provides an overview of JavaScript ob- On the contrary, minification aims at reducing code size, fuscation and minification techniques. Then, we select state- e.g., saving bandwidth improves website performance. Basic of-the-art systems to transform JavaScript code. Finally, we techniques consist of deleting whitespaces and comments present the specific transformation techniques on which we (while obfuscation randomly added them to hinder the anal- focus in this paper. ysis), shortening variable names, and removing dead-code A. Code Transformation Techniques (minification simple). More advanced techniques di- rectly modify the logic structure of the code, e.g., by elimi- Obfuscation makes code harder to understand, both for nating unreachable or redundant code, inlining functions [17], human analysts and automatic tools. Several categories of code or replacing an if statement with the conditional operator obfuscation can be found in the wild [19], [26], [51]: shortcut [32] (minification advanced). • Randomization obfuscation consists in randomly insert- ing or changing elements of a script without altering its B. Code Transformation Tools semantics, e.g., whitespace or comments random- To transform JavaScript code, we focus on several tools: ization. 
Also, variable and function names can be ran- • obfuscator.io [24] is a highly configurable JavaScript ob- domized, e.g., to hinder manual analysis; we refer to this fuscator. For example, it includes several string obfuscation technique as identifier obfuscation. methods, self-defending, and control-flow flattening [23]. • Data obfuscation regroups string and number manipulation • JSFuck [27] is an obfuscator, which rewrites JavaScript techniques. For example, strings can be split, concatenated, code without any alphanumerical characters by only keeping or reversed so that they do not appear in plain text. Similarly, the six following characters: ”[”, ”]”, ”(”, ”)”, ”!”, and ”+”. characters can be substituted, e.g., by running a regular • gnirts [2], which obfuscates strings in JavaScript code expression replace on a string. Also, standard or custom (cf. Section II-A), without using encoding escape.

2 • custom-encoding, our approach to obfuscate strings, which consists in splitting a string into several variables and with encoding. concatenating them later on, will be characterized, e.g., by • JavaScript Minifier [8], which minifies samples, e.g., by an unusually high number of variable declarations and binary shortening variable names and removing whitespaces. expressions (to represent the ”+” operator). • Google closure compiler [16], which minifies JavaScript. We use the AST generation of the open-source JavaScript It can perform more advanced optimizations, such as dead- static analyzer JSTAP [14], which augments the traditional code elimination, functions inlining, and constant folding. AST from Esprima [18] with control and data flows. In We specifically selected (or implemented) these tools be- particular, control flows enable to reason about conditions cause they are configurable, meaning that for a given file, that should be met for a specific execution path to be taken. we could choose the specific code transformation technique(s) As for data flows, they represent the influence of a variable to use. Contrary to Skolka et al. [44], the tool used to on another. We propose several adjustments to the original perform the transformation is not relevant to us. Therefore, implementation. For example, we restrict flows of control to we did not consider two different tools performing the same nodes having an impact on program execution paths, meaning transformation the same way. statement nodes [11], CatchClause, and ConditionalExpres- sion. Similarly, we only consider data flows on Identifier C. Transformation Technique Selection nodes, i.e., there is a data flow between two Identifier nodes if Based on the previous state-of-the-art tools to trans- and only if a variable is defined at the source node and used form JavaScript code, along with their configuration set- at the destination node. We also improve the way to handle tings, we can focus on the detection of the 10 follow- objects and scoping. For performance reasons, we set a two- ing techniques: identifier obfuscation, string minute timeout to generate data flow edges. After that, we obfuscation, global array, no alphanumeric, consider the AST only enhanced with control flows. Finally, dead-code injection, control-flow flatten- we also leverage Esprima to collect lexical units (i.e., tokens). ing, self-defending, debug protection, mini- fication simple, and minification advanced. Since we can leverage the previous tools to transform B. Feature Set JavaScript code using one or several techniques, we are able to build a ground-truth dataset. This way, we can evaluate After building the AST, enhanced with control and data our classifier on the detection of specific techniques.1 Also, flows, and token information, we traverse the graph to extract we can still recognize techniques, which we do not monitor, specific features. We consider two feature categories, namely as transformed, even though we do not name the specific automatically selected features and hand-picked features. technique, e.g., obfuscated field reference. First, we extract 4-gram features from the AST. It has indeed been shown that n-grams are an effective means for modeling III.TRANSFORMED CODE DETECTION reports [14], [15], [28]–[30], [39]. In fact, moving a window of In this section, we present our approach, implemented in length four over the list of syntactic units extracted enables to Python, to detect and analyze transformed code. 
First, we ab- retain information about the code original syntactic structure. stract input code with its AST, enhanced with control and data Second, to determine features typical of specific transfor- flows, and tokens. Second, we leverage the fact that regular and mation techniques, we performed an in-depth study of the transformed code have a different structure (AST). This way, transformation techniques we presented in Section II-A. In the we extract specific features, which are typical of regular code following, we introduce a subset of the features we considered or of specific transformations. Third, we define two multi- (for reproducibility reasons, we made our source code avail- task detectors to a) detect transformed code and b), in the able [31]). To distinguish regular from transformed code, we case of transformed code, predict the specific transformation leverage generic features, such as an AST depth and breadth techniques used. Fourth, we present our experimental approach divided by a script number of lines. We also consider more in- to train our two detectors. Finally, we evaluate the performance depth features, like the ratio of MemberExpression compared of these two detectors on different ground-truth datasets. to the number of unique Identifier nodes, the proportion of A. Source Code Abstraction CallExpression, Literal, and Identifier nodes, the presence or absence of specific built-in functions, or the number of string To detect and analyze code transformations, we chose to operations. Similarly, for the scripts reported as transformed, perform a static analysis of JavaScript files to directly work we focus on the specific transformation techniques used. To at code level. In particular, the AST is a tree abstraction target minification, our feature set includes the average length of the syntactic structure of the code. As it represents the of Identifiers, the average number of characters per line, or the way programmatic and structural constructs are arranged in proportion of ternary operators [32]. Regarding obfuscation a given file, different code transformations have a dissimilar techniques, we consider, e.g., the ratio of dot to bracket AST. For example, the string obfuscation technique, notations to access an object’s properties [34], the average size 1We discuss limitations of our approach, which can recognize only these of arrays/dictionaries, or the proportion of variables fetched known techniques, in Section V-A from these structures (by leveraging data flows).

3 Finally, we construct a vector space such that each feature is files with at least a conditional control flow node,2 a function associated with one consistent dimension, and its correspond- node,3 or a CallExpression node4 in their AST. Finally, we ing value is stored at this position in the vector. As a first manually verified a subset of our collected samples to ensure pre-filtering layer, we consider one vector space to analyze that they are regular: of the 100 analyzed files, 99 are regular. files with the aim of distinguishing regular from transformed Overall, we collected 21,000 scripts. JavaScript code. In the following, we use the term level 1 to 2) Training Set: Next, we built our dataset of transformed refer to this first step. For the samples reported as transformed, samples. For this purpose, we considered the previous 21,000 we then construct a second vector space to detect the specific scripts and transformed them with one of the 10 techniques transformation techniques used on these inputs (similarly, we presented in Section II-C, for all techniques. This way, we refer to this second step as level 2). transformed these 21,000 scripts 10 times. We stored these 10 transformed variants separately so that we do not mix different techniques at this stage. To train our level 1 detector, C. Detector Definition we first removed 5,000 regular samples, which we will use The learning-based level 1 and level 2 detectors complete as a validation set. Then, we randomly selected half of our the design of our approach. In both cases, we define a multi- remaining regular set (i.e., 8,000 scripts), as many minified, task system [6]. In particular, multi-task learning includes and as many obfuscated samples from the previously generated both a multi-class and a multi-label system. With multi- pools. For the minified files, we represent the 2 minification class classification, we consider more than two classes, e.g., techniques equally (i.e., 4,000 files per category), and the for level 2: 10 transformation techniques. Still, each sample same applies to the 8 obfuscation techniques (1,000 files per can only be assigned to one class. On the contrary, with category). The process is similar for level 2, where we selected multi-label classification, several labels can be assigned to a 2,000 samples for each of the 10 transformation techniques. given sample. Formally, a multi-task system with C different 3) Validation Set: To optimize the predictions of our classes is comparable to running C binary classifiers. Either learning-based detectors, we performed an empirical study to the classifiers can be evaluated independently, or they can evaluate several off-the-shelf systems. In particular, we com- be arranged into a chain to account for possible correlations pared two multi-task classifier implementations: a) classifiers between the classes. In the second case, all classifiers use chain [41] and b) classifiers independence assumption [43] the same features, but the binary classifier at position P also (cf. Section III-C). Both for level 1 and level 2, we randomly leverages the predictions of all classifiers with a position in built a new dataset (disjoint from the training set) and, in both 0, P − 1 as an additional feature [38]. For both multi-task cases, the random forest classifier with the classifiers chain classifiers,J K we use the Scikit-learn implementation [42]. approach performed best. In the following, we always refer to this classifier for the learning based-detection. 
First, to distinguish regular from transformed code, we define a multi-task detector with the following classes: regular, E. Detector Accuracy minified, and obfuscated. Since a file can be both obfuscated To evaluate the accuracy of our two detectors, we leverage and minified, or even have a first part regular and a second three different datasets. First, we consider the remaining part transformed (e.g., when a minified jQuery version is samples from Section III-D2 (disjoint from both training and added to a regular sample), level 1 can output several labels validation sets). Second, we analyze 35,000 files, which we for a given input. We consider that a file is transformed if transformed with multiple techniques. Finally, we show that level 1 flagged it as obfuscated and/or minified. Second, for our approach generalizes to samples transformed with a new transformed samples, we detect the specific transformation tool, namely the Daft Logic obfuscator [10], [12]. techniques that developers used. As previously, we define a level 2 multi-task system with the 10 classes from Section II-C. 1) Test Set 1 - Remaining Samples: First, to evaluate the level 1 detector, we consider the remaining 8,000 regular samples from Section III-D2, as many minified, and as many D. Detector Training obfuscated. Overall, we accurately detect 7,892 files as regular (98.65%), 7,985 as obfuscated (99.81%), and 7,977 as minified Next, we train our two detectors with regular files and sam- (99.71%) so that we have a high detection accuracy of 99.41%. ples we transformed using the tools presented in Section II-B. In the remaining sections, we consider files obfuscated or 1) Regular File Collection: To train our detectors, we minified as transformed (accuracy: 99.69%), since we focus first collected a set of regular JavaScript files from popular on the specific transformation techniques at level 2. GitHub projects [46] and popular JavaScript libraries or other For level 2, we randomly selected 2,000 samples from resources [7]. We removed files smaller than 512 bytes to keep Section III-D2, for each transformation technique (disjoint only those with enough features to be representative of a class. For performance reasons, we chose to collect only samples 2 DoWhileStatement, WhileStatement, ForStatement, ForOfStatement, ForInStatement, IfStatement, ConditionalExpression, TryStatement, and smaller than 2 megabytes (even though our static approach SwitchStatement can still analyze them). Also, to remove, e.g., JSON files, or 3 ArrowFunctionExpression, FunctionExpression, and FunctionDeclaration samples containing exclusively comments, we only consider 4 including TaggedTemplateExpression

4 1.0 Wrong labels 1.00 1.000 0.5 Missing labels 0.007 4 0.98 0.999 0.8 0.006 0.96 0.4 0.005 3 0.998 0.6 0.94 0.3 0.004 0.92 0.997 2 0.4 0.003 Accuracy (%) Accuracy (%) Accuracy (%) 0.90 0.2 Number of labels Number of labels Number of labels 0.996 0.002 1 0.88 0.2 0.1 0.001 0.86 0.995

0.0 0 0.84 0.0 0.000 0.994 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 k k k (a) Threshold = 0% (b) Threshold = 10% (c) Threshold = 50% Figure 1: Top-k score accuracy, depending on the threshold from the previous sets). While we built this dataset with the detection accuracy when k increases, as well as the average tools from Section II-B, we used only one configuration per number of wrong and missing labels. From k = 7, we have an file. This way, we limited the number of multiple techniques artificial fast decline in terms of accuracy, as the ground truth used on a single file. Still, some tools always perform a contains at most 7 labels. Still, for k = 8, our detector would specific technique in combination with others so that one output 8 labels, possibly including labels with a very low prob- given sample can have up to three different labels. We first ability of being correct. Therefore, we will only consider the measure the accuracy on this set by considering only the first k labels if they have a probability of being correct over a correct predictions, i.e., both the labels predicted, and their threshold that we will determine. Our threshold should respond number must match the ground truth. We achieve an accuracy to the following challenges: 1) minimize the number of wrong of 86.95%, which we deem to be high, given that our detector labels, 2) maximize the number of detected techniques, and can make 1,023 different predictions.5 Due to the high number 3) maximize the accuracy. Specifically, choosing a very high of possible labels, another way of measuring the accuracy threshold (i.e., considering predictions only if they have a very consists of using the Top-k score [22]. In our case, we consider high confidence) would minimize the number of wrong labels that a Top-k prediction is correct when the k predictions with and maximize the accuracy. In turn, we would only be able the highest probability are part of the expected ground-truth to detect a few transformation techniques. For example, even labels. For example, if we consider a multi-task detector D, with a threshold of 50%, we could only recognize 3 or 4 which can predict the following labels {A, B, C, D}, and a transformation techniques (see Figure 1c), while we would file with the following ground truth [A, B, C]. If D makes the like to recognize most of them. We empirically selected a following predictions: Top-1 = [B], Top-2 = [B, C], Top-3 = threshold of 10%. As represented in Figure 1b, with such a [B, C, D], and Top-4 = [B, C, D, E], then only Top-1 and Top-2 threshold, we have less than 0.32 wrong labels on average and are correct. Specifically, in our code transformation setting, can detect up to 7 transformation techniques with an accuracy Top-1 accuracy is 99.63%, Top-2 90.85%, and Top-3 98.95% still over 89% (and over 99.84% for 1 and 2 techniques). (the remaining Top-k are 0%, as the ground truths have at 3) Test Set 3 - Daft Logic Samples: Finally, we show most 3 labels here). Thus, considering the k most probable that our detectors generalize to samples transformed with predictions of our classifier is an accurate way of detecting another tool. For this purpose, we generated 10,000 new the specific transformation techniques used in given samples. 
samples, which we transformed with a popular obfuscator [44], 2) Test Set 2 - Mixed Samples: Next, we show that our namely the Daft Logic obfuscator [10]–based on Dean Ed- detectors can also analyze files transformed with multiple wards packer [12].6 Specifically, level 1 accurately recognizes techniques (as we envision that samples from the wild mix 99.52% of the samples as transformed. As for level 2, our these techniques). To this end, we generated 35,000 new files, Top-4 metric based on a threshold of 10% reports the follow- which we transformed with the tools from Section II-B by ing techniques: minification advanced and simple, identifier combining different configuration settings. obfuscation, and string obfuscation, which is in line with the For level 1, we accurately detect 99.99% of the transformed transformations performed by the packer. In fact, our approach files, compared to 99.69% in Section III-E1. We assume that generalizes, as it is not specific to a tool but recognizes generic mixing several techniques increases the proportion of features transformation techniques through syntactic patterns. typical of transformed scripts, hence a higher accuracy. IV. LARGE-SCALE ANALYSIS OF TRANSFORMED CODE For level 2, we use the Top-k metric to evaluate the accuracy. Contrary to Section III-E1, the ground truth may contain In this section, we present the results of our large-scale between 1 and 7 labels. Figure 1a presents the evolution of the analysis of code transformations in the wild. We focus, in particular, on benign JavaScript extracted from the most pop- 5The number of predictions from our classifier follows the combination ular websites and the most popular npm packages. Next, we without repetition distribution, knowing that, for a given file, our classifier can predict k ∈ 1, 10 labels. Therefore, the number of possible predictions 6We chose not to consider this tool in Section II-B, as it combines several Í10 10 J K is k=1 k = 1, 023 techniques without providing a way for targeting specific ones at a time

5 Table I: JavaScript dataset description B. Code Transformations in the Wild In this section, we provide an in-depth study of the preva- Source Creation #JS Class Section lence of code transformations in the wild. We first focus on Alexa Top 10k 2020 46,238 Benign Section IV-B1 client-side JavaScript with Alexa Top 10k websites before npm Top 10k 2020 51,053 Benign Section IV-B2 considering library-based code with npm Top 10k packages. DNC 2015-2017 4,514 Malicious Section IV-C Hynek 2015-2017 29,484 Malicious Section IV-C 1) Alexa Top 10k Websites: In our first experiment, we BSI 2017 36,475 Malicious Section IV-C leverage our level 1 detector to classify JavaScript files ex- Alexa Top 2k * 65 2015-2020 327,164 Benign Section IV-D1 tracted from the 10,000 most popular websites in September npm Top 2k * 65 2015-2020 482,834 Benign Section IV-D2 2020. In particular, we found that code transformations (mostly minification) are highly used, with over 89.4% of the websites, which contain at least one transformed script. In fact, minifica- tion is especially used to reduce loading times, thus to improve consider malicious JavaScript samples, which enables us to website performance. A more in-depth study of the specific highlight transformation technique differences depending on scripts included by Alexa Top 10k websites suggests that the intent of the files. To avoid any bias due to our malware 68.60% of them are transformed, with over 68.20% of them collection being older than our benign samples, we also reported as minified and 0.40% as obfuscated. To verify these perform a longitudinal analysis of benign code transformations results, we randomly selected 400 files, which we manually between 2015 and 2020. Finally, we summarize our findings reviewed). In particular, out of the 100 files classified as regarding the proportion of transformed code and specific regular, we confirm that 83 are regular. Regarding transformed techniques used in benign vs. malicious samples. samples, 96 / 100 are minified, 99 / 100 are obfuscated, and 100 / 100 are transformed. We noticed that manually assigning A. System Setup a label to samples is not straightforward because they may have been made harder to understand but without impeding Our large-scale analysis of transformed code in the wild the analysis too much. In this case, we mostly reported them rests on several datasets. For client-side JavaScript, we focus as regular because they were, in general, solely using slight on the most popular Alexa websites [1], which we statically identifier obfuscation. Also, we observed samples that could scraped, also including external scripts. We first focus on be obfuscated, minified, and regular at the same time. In this Alexa Top 10k in September 2020 (Section IV-B1), before case, if one class was significantly prevalent, we assigned considering Alexa Top 2k websites, one crawl per month the label of that class; otherwise, we considered multiple between May 2015 and September 2020 (i.e., on 65 months, labels. Overall, our manual analysis underlines the fact that our Section IV-D1). Given the fact that we statically extracted system can detect transformed samples very accurately, while JavaScript from the start pages of high-profile sites, we assume it slightly over-approximates the number of regular instances. this JavaScript collection to be benign. 
Similarly, for library- As an additional verification step, we classified the dataset of based JavaScript, we consider the most popular npm packages, 150,000 regular samples collected by Raychev et al. [37]. We based on their number of downloads [35] (first, Top 10k in retain an accuracy of 98.65%, which highlights the accuracy Section IV-B2, then, Top 2k over 65 months in Section IV-D2). of our approach to detect regular samples too. As previously, we assume them to be benign. Finally, we Contrary to our system, which detects 68.60% of the scripts also analyze malicious JavaScript from three different sources extracted from Alexa Top 10k as transformed, Skolka et (Section IV-C). In particular, we consider exploit kits provided al. [44] found that 38% of the scripts extracted from the by Kafeine DNC (DNC) [25], the malware collection of Hynek 100,000 most popular websites are transformed. Based on their Petrak (Hynek) [20], as well as JScript-loaders from the Ger- results, we verified whether a website rank could influence the man Federal Office for Information Security (BSI) [4]. Based probability of it containing a transformed script. For Alexa on the classification of DNC, Hynek, and BSI, including, e.g., Top 10k, we arranged the sites by groups of 1,000, ordered anti-virus systems, and a runtime-based analysis, we consider by popularity. We found that, while 72.35% of the scripts that these files are malicious. belonging to Alexa Top 10k, but not to Alexa Top 9k, are As our datasets contain both benign (client-side and library- transformed, almost 80% of the Top 1k are transformed.7 As based) and malicious JavaScript, in the following sections, a further comparison step, we extracted scripts belonging to we will highlight any transformation technique differences Alexa Top 100k, but not to Alexa Top 90k, and reported depending on the intent of the scripts. Also, we will showcase 64.72% as transformed. These observations suggest that there any evolution in the transformed code landscape between 2015 is a link between website popularity and code transformations. and 2020. Table I summarizes the content of our datasets. Also, and contrary to our approach, Skolka et al. limited Similarly to Section III-D1, we consider scripts between 512 their analysis to files smaller than 40 kB (vs. 2 MB for us), bytes and 2 megabytes, containing at least a conditional control flow node, a function node, or a CallExpression node. We 7These results differ slightly from Alexa Top 10k because, for Alexa Top 10k, we consider unique scripts, while in this setting, the scripts are perform the following experiments with the level 1 and level 2 unique per 1,000 website sampling but may appear in the ten samplings, detectors, which we defined and trained in Section III-D. which increases the weight of, e.g., widespread minified libraries

6 tmyntb eesr o p akgs o example, For packages. npm the Node.js, for bandwidth, to with necessary save used be mostly and when are not times samples loading may Alexa transformed reduce 8 it While to almost as Alexa. e.g., is scripts for minified, which than the obfuscated), less 0.25% of times and 10k) 8.7% minified (Top only (8.46% We 136,395 Our reports 2020. before. and September detector and 1) performed total- August (Top packages, between been downloads 332,946,417 npm not downloaded between has most ing knowl- 10,000 study our the of a consider best such the To edge, JavaScript. library-based on focus performance. website improve to as minified, and mostly Therefore, techniques. at other observed and the already obfuscation) for with (identifier 1.94% used 5.72% below even seldom below are probability they usage transfor- the a monitored, remaining by we offered the techniques e.g., Regarding mation (40.24%), compiler. ones closure deletion, advanced Google whitespace more and also shortening but variable techniques minification e.g., loading basic reduce (45.96%), particular, to in websites, observe, popular minification We most necessarily expected, times. the as not on and used is 2, highly are Figure is confidence predictions in their indicated the As of based Since 100%. sum used, score. the the being confidence computed independent, technique detector we given our purpose, a on this of For probability prevalent. average most the are ihour With detected. samples transformed of thus, to libraries; proportion minified us the e.g., increasing be, enabled can choices which files, design bigger as be analyze our to Second, files showed kB. 40 we transformed. more which over popular, more con- more reported size we are First, we that a reasons. websites two that sidered have for assume work them previous than we of transformed reason, (73%) this 87 For that libraries we found popular [48], most and the report of W3Techs versions minified on 120 downloaded Based reasons. performance for 10k Top Alexa from used script being transformed of a probability in technique Transformation 2: Figure )nmTp1kPackages: 10k Top npm 2) et efcso h ape eotda transformed. as reported samples the on focus we Next, node_modules ee 2 level

Technique usage probability (%) 10 20 30 40 0

Identifier obfuscation eetr edtriewihtechniques which determine we detector, ee 1 level Self-defending

String obfuscation h rnfre lx cit are scripts Alexa transformed the , odr sn ewr cesis access network no As folder.

node Dead-code injection

Debug protection sascn xeiet we experiment, second a As

No alphanumeric utm a ietaccess direct has runtime

Control-flow flattening

Global array

Minification advanced

Minification simple ee 1 level 7 natasomdsrp rmnmTp10k Top npm from used script being transformed of a probability in technique Transformation 3: Figure o hsrao,w sueta u eetrhsahigher a has npm. detector for our cases that such assume no we observed reason, we this 100 For while the include of code, also 11 out reviewed, regular particular, manually we In samples combine Alexa code. minified samples transformed Alexa with while regular transformed, tend completely Still, scripts transformed be packages. that to popular observe we npm Alexa, to on contrary transformed is and widespread minification as very that scripts not meaning the Alexa), its overall of for reducing 8.7% 68.60% of to report aim (compared only the with we mostly Still, developers is size. package it npm code, when Simi- their that prediction. transform observe we each Alexa, for to confidence larly scores detector’s these our computed on we advanced based Alexa, more for and As re- (58.34%) (36.57%). are basic techniques techniques both prevalent minification, main most to our two lated up the sums before, 3 As Figure findings. packages. npm on used techniques Alexa . that for than found one less times least we 6 at almost Overall, contain is packages Web. which popular script, packages the transformed most the 10k in when the of loaded or 15.14% runtimes), be older directly on can syntax new transformations use packages), some to to proprietary induces applied (which for compatibility rather (e.g., backward property are intellectual does techniques bandwidth protect Transformation save e.g., apply. to, not code the transforming required, popularity package in on code depending transformed packages of 10k prevalence Top the npm of Evolution 4: Figure et with Next,

ee 2 level Technique Usage Probability (%) Technique usage probability (%) 10 20 30 40 50 10 15 20 0 0 5

1,000 Identifier obfuscation edv noteseictransformation specific the into dive we , 2,000 Self-defending

3,000 String obfuscation

4,000 Dead-code injection

5,000 Debug protection

6,000 No alphanumeric

7,000 Control-flow flattening

8,000 Global array

9,000 Minification advanced

10,000 Minification simple W sueta hs eut ifrars our Hynek across of differ 73.07% results these and that DNC, assume of We 65.94% transformed. as samples, BSI 28.93% only necessarily the classify not we of particular, are In samples transformed. as malicious detected our of most expecting, datasets the JavaScript malicious In our IV-D. our with Section analyze we in section, 2020) library-based two-year current and and a 2015 of client-side over (between study in JavaScript collection longitudinal transformations their comparative compared code a b) malware benign our provide and of we files character time-frame, benign older any the the avoid To and a) to IV-A). to (Section 2015 BSI due malicious between and bias Hynek, popular our collected DNC, download particular, by been 2017 cannot In have we 2020. samples as JavaScript September 10k, for Top malware npm and Alexa transformation a the samples. on As JavaScript malicious focus reasons. by we used performance techniques section, techniques prevalent for this and most in minification, Alexa the comparison, to between cases, similar both related in is are that, used usage and techniques their npm, transformation that specific showed the and on Specifically, focussed transformed. be we can code JavaScript benign based JavaScript Malicious in Transformations Code (37%). C. simpler ones privilege complex significantly more 10k over Top (58%) and techniques 5k respectively), Top 47%, the and while (49% most ad- techniques the and minification basic is use vanced minification equally packages while 1k Top remaining that, transformation technique, prevalent specific find the the we than into used, diving code times techniques By 4.4 transformed packages. and contain 10k 2.4 Top between to 4, are 10 likely Figure packages into in less popular indicated packages most As 10k popularity. 1k our by the ordered split 1k, Similarly we of packages. experiment, groups considered Alexa the the of to rank the on depending (45.96%). Alexa of than (58.34% samples simple) minified minification npm for of prediction the in confidence )Lvl1Detector: 1 Level 1) from different slightly is JavaScript malicious of case The library- and client-side both that showcased we Previously, transformations code of prevalence the on focus we Finally,

Technique usage probability (%) 10 15 20 25 30 0 5 ee 1 level

2015-05 (76) 2015-06 (704) 2015-07 (336)

Tasomto ehiu rbblt fbigue namlcostasomdsample transformed malicious a in used being of probability technique Transformation 5: Figure 2015-08 (148) 2015-09 (261) 2015-10 (227) and 2015-11 (194) 2015-12 (135) 2016-01 (179) Tested month a o DNC For (a) 2016-02 (146)

ee 2 level 2016-03 (111) 2016-04 (129) 2016-05 (83) 2016-06 (233) is,adcnrr owa ewere we what to contrary and First, 2016-07 (129) 2016-08 (245) 2016-09 (263)

detectors. 2016-10 (224) 2016-11 (197) 2016-12 (90) 2017-01 (218) 2017-02 (112) 2017-03 (74) Minification simple Minification advanced Global array Control-flow flattening No alphanumeric Debug protection Dead-code injection String obfuscation Self-defending Identifier obufscation

Technique usage probability (%) 10 20 30 40 0

2015-08 (19) 2015-09 (23) 2015-10 (38) 2015-11 (20)

b o Hynek For (b) 2015-12 (882) 2016-01 (24) 2016-02 (69) 2016-03 (3,669) 8 Tested month 2016-04 (1,365)

rnfre u h aiiu oi tl paet ota we that so apparent) still logic partially are malicious 97 ones the that remaining but 3 stress transformed the we (and reviewed, transformed classified manually indeed samples are we 100 that the transformed of Out as transfor- use. specific they the techniques on now mation focus we transformed, another. as to detected source, from malware different a particular, very even In is and different. month, samples a transformed very of be proportion may a the (as samples wave remaining while a instances), the malicious inside similar way, syntactically results contains This classification wave detection. similar signature-based observe per impede we script e.g., malicious [15], unique to, one instances victim, waves, malicious in broadcast unique, syntacti- are SHA-1 which generate but regular. to identical, obfuscation, as cally identifier code transformation e.g., specific the leverage techniques, can classify regular, actors being (correctly) malicious code Second, would the of system majority the and our to amount Due small larger code. significantly a regular a of combine with payload can to obfuscated authors example, slightly For malware samples transformed. detection, malware completely evade First, necessarily rates. not transformed are open. low the observed rather in the are staying mostly 3 is while logic them, malicious of their 97 as for regular correct are predictions sub- our which they strings, because into fixed flew readable label sequently non-human a we few sample, on a each decide observed for only Specifically, to dynamically. us code (i.e., for generated ones, regular complicated remaining 18 was very the for it rather looks As randomized). are are structure names 57 variable syntactic meaning ones, obfuscation, their remaining identifier on that the rely Of solely but comment). parses transformed Esprima large which a 23, [33], for compilation as open the conditional in use confirm stays 2 we logic and files, malicious split (the regular regular (equally 100 are the transformed 25 Of that as sources). three 100 our and across we regular results, instances as malicious these recognized selected Given randomly 100 JScript-loaders. JavaScript, reviewed vs. malicious manually enti- kits of different exploit kinds three different e.g., by collected provided which were ties, they as datasets, three 2016-05 (3,114) 2016-06 (1,823) )Lvl2Detector: 2 Level 2) for responsible are reasons two that assume we Overall, 2016-07 (568) 2016-08 (3,375) 2016-09 (4,362) 2016-10 (763) 2016-11 (5,317) 2016-12 (3,933) 2017-01 (53) 2017-02 (36) 2017-03 (31) eval

Technique usage probability % 10 20 30 40 0

o h aiiu ape previously samples malicious the For 2017-01 (2,523) eadn h rnfre samples, transformed the Regarding .

2017-02 (4,888)

2017-03 (4,820) c o BSI For (c)

2017-04 (4,860) Tested month

2017-05 (4,919)

2017-06 (4,411)

2017-07 (4,193)

2017-08 (1,695)

2017-09 (568)

2017-10 (3,598) nAeaTp2 estsadnmTp2 akgs which packages, 2020-09. 2k and Top 2015-05 npm between and collected longitudinal websites we focus 2k a We Top samples. Alexa perform benign on on we transformations code 2017-10, of analysis and 2015-05 between Wild the in Transformations Code of Analysis analyze. Longitudinal and D. understand make to to harder code minification, JavaScript aggressive malicious transformation with other favor combined reducing latter by techniques, the contrast, e.g., In performance, times. improve loading to priv- minification former 1%. malicious the ilege mostly techniques, and transformation of benign use both usage files while time, JavaScript benign that the showcased their we of to way, 5-10% This superior between again appear is remaining array which the global for and flat- scripts), control-flow As tening, injection, benign confirms. dead-code for techniques, analysis 45.96% obfuscation manual over our vs. widespread 22% which also of is simple usage be- minification (average for DNC, advanced) For (minification inputs. usage nign 36.5% (string average over 3.3% an below and to obfuscation) have compared 17-21%, both between probability which and obfuscation advanced, string scripts. minification include probability benign techniques for usage popular 6.2% average below Additional to an compared malicious with 25-37%, between three obfuscation, technique our client- identifier transformation benign for is popular most Specifically, for the than code. sources, JavaScript probability library differing and a side differ- have both and are ent techniques transformation between malicious probability 36-59%, a with technique transformation prevalent where 3), (Figure attackers. different same may from the which stem waves, or comparing malicious independent not different be that but are stress time we to over i.e., like malware DNC also ephemeral, for would is collected kits We malware exploit BSI. which e.g., for providers, JScript-loaders files, vs. JavaScript different of to types different due transformation any chose avoid bias we to Also, source technique x-axis. per the malware number in the samples month the separate tested of to indicate each we to number 5 next constant, the files not Figure of As is results. month observations. following each main the collected our in bias up a sums introducing not are iue6 vlto ftepooto ftasomdscripts transformed of proportion the of Evolution 6: Figure sacmaio ihmlcossmlsta eecollected were that samples malicious with comparison a As 10k Top npm and 2) (Figure 10k Top Alexa to Compared

Transformed samples (%) 10 20 30 40 50 60 70

2015-05

ee 2 level 2016-01

2017-01 Tested month eotdmnfiaina h most the as minification reported

2018-01

2019-01

2020-01 npm-2k alexa-2k 9 downloads/point/2018-05-01:2018-05-01/supports-color consecutive two between almost common and are transformed, 5.9%, packages consistent as of the deviation more classified of standard scripts 93% a relative the small observe contrary, a of the we with given 17.95% On a 2019-05, month. with on next trend to average, packages the on 2016-05 on npm that, popular from popular fact still most the are the is month by of data explain 76.7% that we only indicates which the 24.22% out, of of spread 7.4% high deviation very average standard corresponding From on The relative 6. transformed. classified as Figure scripts we in extracted 2016-04, phases to three distinguish 2015-05 time. we over that packages particular, so same In the ephemeral comparing more necessarily not are are packages 2020. we and npm 2015 Alexa, between to month) of each Contrary number of the end on the (based at packages downloads npm popular most 2,000 the websites. in 2k Alexa Top them observations compare the directly our with use cannot with 10k we really Top line though even in not 2, also Figure do from are other results websites average, the These on benign practice. while 2.4%, under that 6.21%, probability meaning to usage a 8.23% have obfuscation techniques from identifier example, decreases For staying are slightly time. are over As and they constant times. techniques, time more loading transformation over and remaining bandwidth the websites save for their to of trying increasingly performance concerned more the become have with developers Therefore, minification web 40%). that while to observe 43.77% simple, we from to minification decreases 2015 slightly for in advanced 38.74% 2020 (from in increase usage 47.02% solely minification transformation is augmentation a specific this to that the due ob- infer into we we 7), diving transformed (Figure 6, By techniques of Figure proportion time. month the in over of each scripts indicated augmentation of As steady the first a 2020. serve collect the and to websites 2015 [45] Alexa between [21], popular Archive most Internet 2,000 the from Machine time over 2k Top Alexa from used script being transformed of a probability in technique Transformation 7: Figure downloads/point/ 8 )JvSrp rmnmPackages: npm From JavaScript 2) Alexa: From JavaScript 1) ecletdtepcae odwla rmhttps://api.npmjs.org/ from download to packages the collected We

Technique usage probability (%) 10 20 30 40 50 0

2015-05 {

period 2016-01

} 2017-01 [/ { Tested month package

2018-01 } is,w eeae h Wayback the leveraged we First, ,freape https://api.npmjs.org/ example, for ], 2019-01

2020-01 iial,w collected we Similarly, Minification simple Minification advanced Global array Control-flow flattening No alphanumeric Debug protection Dead-code injection String obfuscation Self-defending Identifier obfuscation 8 agesv iicto.Wiew lorpre on reported also and not we While abuse but minification. including, they aggressive to, techniques samples, limited transformation JavaScript ac- to network many malicious any apply combine for need not not As web does do cess. it fact, generally save which whereas In packages, to times, files. npm performance loading npm website and minified optimize bandwidth of to choose 8.46 % websites, developers only from and extracted found scripts client-side detected the we we both of While 68.20 % for in minification. minification is particu- technique code prevalent In benign library-based most transformations. the code lar, use JavaScript malicious Transformations Code of Summary trends. over package E. current packages on depends npm stays usage technique of its transformation evolution prevalent minification, landscape specific most the a code While observe time. transformed not do the we in way, mostly probabilities This with 3%. time, ones, below over common consistent less stay and the rather identifier for they 58.62% popular As less (9.71%). of the technique for probability obfuscation as average well as an respectively), for 34.28%, (with previously simple advanced as minification stages namely and techniques, three prevalent same most two the the observe times. we loading 8, or bandwidth Figure as reduce transformations e.g., code to, need abuse not not do that do they rate sense mostly the script in packages transformed IV-B2 npm Section current of from on observation 7-18% the our depend observed of confirms which our typical packages, are Still, npm observed trends. of we phases state three ephemeral increas- rather the We the that though. and minifiers, assume relationships scripts npm any of transformed (more popularity infer of ing/decreasing transformed not number could as the We between classified minified). accurately as regular files a precisely as classified retain 50 of / accurately 47 we files subset where and 50 / a 50 phases), with different reviewed accuracy the high manually (for We predictions 87.48% our packages. and scripts common transformed of of 15.17% a observe of also trend we consistent 2020-09, to 2019-06 from Similarly, months. time over 2k Top npm from used script being transformed of a probability in technique Transformation 8: Figure ntepeiu etos ecnr htbt einand benign both that confirm we sections, previous the In in shown as techniques, transformation specific Regarding

Technique usage probability (%) 10 20 30 40 50 60 0

2015-05

2016-01

2017-01 Tested month

2018-01

2019-01

2020-01 Minification simple Minification advanced Global array Control-flow flattening No alphanumeric Debug protection Dead-code injection String obfuscation Self-defending Identifier obfuscation 10 otetosw osdrdt eeaeortann esbut generic sets recognize to training tailored our is it generate as to tools, limited other considered not to we is generalizes approach tools our the Similarly, monitor. to techniques not use do they we if our even that transformed, Still, as II-A). samples recognize Section existing in additional discussed techniques we (while monitor our we techniques reason, obfuscators. and this minifiers For JavaScript transforma- available transformation ten the of on on a capabilities based focus selected of we to which chose behavior in techniques, we tion the accessible code particular, directly infer In the traces syntax. to in transformation its trying study trace to not any but are existing script leave we an not Still, reproduce does structure. to which JavaScript syntax, malicious benign rewrites H it with transformed as samples detect several cannot dy- used e.g., analyze techniques runtime, cannot transformation tech- we at nor design, transformation code By generated different syntax. namically by files’ the left in traces niques the see directly Limitations A. improvements. potential introducing before have might code. JavaScript probability, malicious vs. code usage differing benign different of for a ma- use with and the techniques, developers motivate benign transformation that Still, both reasons transformed. that different are have fact samples the JavaScript quantified previously.licious we discussed obfuscation, as way, code, string This the and minifying identifier aggressively namely stay and time, techniques over not the transformation that same are malicious note the we we popular Still, that most time. so over three samples ephemeral, same is the malware comparing is trend as perfor- the different, JavaScript, website very malicious with Regarding web time. concerned that the over more suggests mance example, getting websites For are in transformed. minification developers of more library-based increase and get linear client-side to both tend that JavaScript noticed we time, a over have techniques malware. the in of 5-10% half between than probability usage more them whereas using 3%, of we malicious probability below that a for have techniques developers than obfuscation to benign prevalent remaining monitored, transformed the less for be for is to As also this samples. reason harder analysis, can them main its code make impede the benign benign to that While for is understand. fact 3.3% samples the malicious lies than transforming highlights obfuscation less the This string to Similarly, scripts. use compared samples. compared 17-21%, files malicious npm), between malicious our and that for Alexa probability 25-37% (both over scripts to below benign usage average for our an obfuscation, with 6.2% technique identifier this with predicted e.g., detector files, obfuscated benign ecoet efr ttcaayi fJvSrp lsto files JavaScript of analysis static a perform to chose We approach our limitations the examine first we section, this In landscape code transformed the of evolution the Regarding ee 2 level .D V. eval ISCUSSION Smlry we Similarly, layers. unfolding eetri iie oteten the to limited is detector ee 1 level IDE N O eetrcan detector S EEK [13], transformation techniques through syntactic patterns. 
V. DISCUSSION

In this section, we first examine the limitations of our approach before introducing potential improvements.

A. Limitations

We chose to perform a static analysis of JavaScript files to directly see the traces left in the files' syntax by different transformation techniques. By design, we cannot analyze code generated dynamically at runtime, e.g., by unfolding several eval layers. Similarly, we cannot detect samples transformed with techniques that do not leave any trace in the syntax: for example, HIDENOSEEK [13] rewrites malicious JavaScript to reproduce the syntactic structure of existing benign scripts. Still, we are not trying to infer the behavior of a script but to study its transformation traces, which are directly accessible in the code syntax. In particular, we chose to focus on ten transformation techniques, which we selected based on the capabilities of available JavaScript minifiers and obfuscators. For this reason, our level 2 detector is limited to the ten techniques we monitor (while we discussed additional existing techniques in Section II-A). Still, our level 1 detector can recognize samples as transformed, even if they do not use the techniques we monitor. Similarly, our approach generalizes to other tools: it is not tailored to the tools we considered to generate our training sets but recognizes generic transformation techniques through syntactic patterns. This is confirmed in Section III-E3 and with our manual analyses in Sections IV-B1, IV-B2, and IV-C.
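As a minimal illustration of the two points above, the sketch below parses two snippets with Esprima [18] and counts their AST node types with a simple recursive walker; the walker and the node-type counts are illustrative assumptions, not the actual features of our detectors. String obfuscation leaves a visible syntactic trace (extra CallExpression and Literal nodes), whereas code hidden behind eval only appears as a single string literal, so whatever it generates at runtime stays invisible to a static analysis.

// Requires the esprima package (npm install esprima).
const esprima = require("esprima");

// Count AST node types with a small recursive walker.
function countNodeTypes(node, counts = {}) {
  if (node && typeof node.type === "string") {
    counts[node.type] = (counts[node.type] || 0) + 1;
    for (const key of Object.keys(node)) {
      const child = node[key];
      if (Array.isArray(child)) {
        child.forEach((c) => countNodeTypes(c, counts));
      } else if (child && typeof child === "object") {
        countNodeTypes(child, counts);
      }
    }
  }
  return counts;
}

// String obfuscation is visible in the syntax: the rebuilt literal shows up
// as CallExpression, MemberExpression, and numeric Literal nodes.
console.log(countNodeTypes(esprima.parseScript("var s = String.fromCharCode(72, 105);")));

// Dynamically generated code is not: the payload is one string Literal, and
// the code that eval produces at runtime never materializes in this AST.
console.log(countNodeTypes(esprima.parseScript("eval(\"console.log('hidden');\");")));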
B. Potential Improvements

In this paper, we measure the prevalence of JavaScript code transformations in the wild. In particular, we showcase that both benign and malicious JavaScript transform their code, meaning that code transformation is no indicator of maliciousness. In addition, we show that specific techniques and their usage probability differ between benign and malicious scripts. For this reason, we believe that our large-scale analysis and measurement study are useful to the community and will contribute to implementing new malware detection systems. To this end, we could consider patterns that are solely present in malicious files to reduce the number of false positives due to benign obfuscated samples. However, this extension is outside the scope of this paper, and we leave it for future work.

VI. RELATED WORK

In this paper, we perform a large-scale analysis of code transformations in the wild. While obfuscation has been analyzed before, we provide an in-depth study of specific transformation techniques and highlight the fact that code transformations used in malicious code differ from the prevalent benign transformation techniques. We first present works related to transformed JavaScript before focussing on malicious JavaScript and finally on obfuscation beyond JavaScript.

Detecting Transformed JavaScript — In the literature, several approaches have been proposed to recognize transformed, or at least obfuscated, JavaScript code. Specifically, Kaplan et al. [26] introduced NoFus, a Bayesian classifier trained over the AST to distinguish obfuscated from non-obfuscated code. Similarly, with JSOD, Blanc et al. [3] proposed an anomaly-based detection system over the AST to detect obfuscated scripts including readable patterns. Likarish et al. [30] defined specific features based on detecting obfuscation to recognize malicious JavaScript. With JStill, Xu et al. [52] leveraged the fact that malicious JavaScript code has to be deobfuscated before performing its intended behavior; thus, they detected obfuscation given specific deobfuscation functions. Similarly, Sarker et al. [40] rely on the fact that obfuscation aims at hiding a script's behavior, meaning that if the results of a dynamic analysis differ from a static analysis, then the script's behavior is obfuscated. Still, none of these approaches focus on the specific transformation techniques that developers use nor on benign vs. malicious transformation techniques. While Xu et al. [51] analyzed the usage of some obfuscation techniques, they manually reviewed 100 malicious samples. In contrast, our approach is automated and can analyze JavaScript code from the wild at scale. Finally, Skolka et al. [44] performed an empirical study of transformed code on the Web. In particular, they focus on the tools that developers used to obfuscate or minify their code, while we dive into (and detect) the specific transformation techniques. Besides, they analyzed client-side JavaScript, while we also consider library-based code with npm, and malicious samples, to identify any difference in the transformation process, depending on the intent of the JavaScript samples, as well as their evolution over time.

Detecting Malicious JavaScript — Code transformations, more specifically obfuscation, should not be confounded with maliciousness. In the following, we present some tools that statically detect malicious JavaScript inputs. Rieck et al. introduced CUJO [39], which contains a static detector leveraging n-grams upon lexical units. With ZOZZLE, Curtsinger et al. [9] combined a Bayesian classifier with features extracted from the AST to detect malicious JavaScript. Similarly, Fass et al. proposed JAST [15], an n-gram approach based on features from the AST. Later, they introduced JSTAP [14] to go beyond the syntactic structure and consider control and data flows.

Detecting Obfuscation — Finally, code transformations are not limited to JavaScript. Specifically, Wermke et al. [50] focussed on software obfuscation on Android applications, while Wang et al. [49] studied software obfuscation techniques on mobile applications from the Apple App Store.

VII. CONCLUSION

This paper measures the prevalence of JavaScript code transformations in the wild. In particular, both well-intentioned and malware developers use code transformations on JavaScript. Due to the inherently different intent of their scripts, they transform their code for different reasons, e.g., performance improvement vs. analysis impediment. These different expectations from code transformations naturally lead to distinct techniques, which leave different traces in the source code syntax. In this paper, we study how code transformations used in malicious JavaScript code differ from transformations typically used in benign scripts, and how code transformations evolve over time. To this end, we define a learning-based pipeline to 1) distinguish regular from transformed scripts and 2) recognize the specific transformation techniques used.

In practice, code transformations are widespread, both for malicious and benign JavaScript, e.g., 89.4% of Alexa Top 10k websites contain at least one transformed script. In addition, we observe that client-side JavaScript is getting more transformed (minified) over time, meaning that web developers are getting more concerned about reducing loading times and improving website performance. In contrast, the most popular transformation techniques used by malicious JavaScript do not change over time. Finally, we showcase that the transformation techniques used by malicious JavaScript differ from the prevalent benign techniques, which are, for the most part, common to client-side and library-based code.

ACKNOWLEDGMENT

We thank the anonymous reviewers for their valuable feedback. In particular, we thank our shepherd Yinzhi Cao for his guidance and support in preparing the final version of this paper. We also thank the German Federal Office for Information Security (BSI) and Kafeine DNC, which provided us with malicious JavaScript for our experiments. Special thanks also go to Ben Stock and Cris Staicu for their appreciated feedback.

REFERENCES

[1] Alexa Internet, "Alexa Top Sites," http://www.alexa.com/topsites. Accessed on 2020-10-17.
[2] anseki, "Obfuscate string literals in JavaScript code," https://github.com/anseki/gnirts. Accessed on 2020-06-13.
[3] G. Blanc, D. Miyamoto, M. Akiyama, and Y. Kadobayashi, "Characterizing Obfuscated JavaScript Using Abstract Syntax Trees: Experimenting with Malicious Scripts," in International Conference on Advanced Information Networking and Applications Workshops, 2012.
[4] BSI, "German Federal Office for Information Security (BSI)," https://www.bsi.bund.de/EN. Accessed on 2021-03-18.
[5] D. Canali, M. Cova, G. Vigna, and C. Kruegel, "Prophiler: A Fast Filter for the Large-scale Detection of Malicious Web Pages," in International Conference on World Wide Web (WWW), 2011.
[6] R. Caruana, "Multitask Learning," in Machine Learning, 1997.
[7] chencheng, "awesome-javascript," https://github.com/sorrycc/awesome-javascript. Accessed on 2020-04-29.
[8] A. Chilton, "JavaScript Minifier," https://javascript-minifier.com. Accessed on 2020-12-08.
[9] C. Curtsinger, B. Livshits, B. Zorn, and C. Seifert, "ZOZZLE: Fast and Precise In-Browser JavaScript Malware Detection," in USENIX Security, 2011.
[10] Daft Logic, "Daft Logic: Online Javascript Obfuscator," https://www.daftlogic.com/projects-online-javascript-obfuscator.htm. Accessed on 2020-12-08.
[11] Ecma International, "ECMAScript 2020 Language Specification," https://www.ecma-international.org/ecma-262. Accessed on 2020-12-08.
[12] D. Edwards, "dean.edwards.name/packer/," http://dean.edwards.name/packer/. Accessed on 2020-06-15.
[13] A. Fass, M. Backes, and B. Stock, "HIDENOSEEK: Camouflaging Malicious JavaScript in Benign ASTs," in CCS, 2019.
[14] ——, "JSTAP: A Static Pre-Filter for Malicious JavaScript Detection," in ACSAC, 2019, code repository: https://github.com/Aurore54F/JStap.
[15] A. Fass, R. P. Krawczyk, M. Backes, and B. Stock, "JAST: Fully Syntactic Detection of Malicious (Obfuscated) JavaScript," in DIMVA, 2018.
[16] Google Developers, "Closure Compiler," https://developers.google.com/closure/compiler. Accessed on 2020-12-08.
[17] ——, "Closure Compiler: Advanced Compilation," https://developers.google.com/closure/compiler/docs/api-tutorial3. Accessed on 2020-12-08.
[18] A. Hidayat, "ECMAScript Parsing Infrastructure for Multipurpose Analysis," http://esprima.org. Accessed on 2020-12-08.
[19] F. Howard, "Malware with your Mocha? Obfuscation and Anti Emulation Tricks in Malicious JavaScript," Sophos, Tech. Rep., 2010.
[20] Hynek Petrak, "Javascript Malware Collection," https://github.com/HynekPetrak/javascript-malware-collection. Accessed on 2020-06-16.
[21] Internet Archive, "Search the history of over 486 billion web pages on the Internet," https://archive.org. Accessed on 2020-12-08.
[22] K. Järvelin and J. Kekäläinen, "IR Evaluation Methods for Retrieving Highly Relevant Documents," in ACM SIGIR Conference on Research and Development in Information Retrieval, 2000.
[23] Jscrambler, "Control Flow Flattening," https://docs.jscrambler.com/code-integrity/tutorials/control-flow-flattening?utm_source=blog.jscrambler.com&utm_medium=referral&utm_campaign=101-cff. Accessed on 2020-06-07.
[24] T. Kachalov, "JavaScript Obfuscator Tool," https://obfuscator.io. Accessed on 2020-12-08.
[25] Kafeine, "MDNC - Malware don't need coffee," https://malware.dontneedcoffee.com. Accessed on 2020-06-15.
[26] S. Kaplan, B. Livshits, B. Zorn, C. Siefert, and C. Curtsinger, ""NOFUS: Automatically Detecting" + String.fromCharCode(32) + "ObFuSCateD ".toLowerCase() + "JavaScript Code"," in Microsoft Research Technical Report, 2011.
[27] M. Kleppe, "JSFuck," https://github.com/aemkei/jsfuck. Accessed on 2020-12-08.
[28] J. Z. Kolter and M. A. Maloof, "Learning to Detect and Classify Malicious Executables in the Wild," in Journal of Machine Learning Research, 2006.
[29] P. Laskov and N. Šrndić, "Static Detection of Malicious JavaScript-Bearing PDF Documents," in ACSAC, 2011.
[30] P. Likarish, E. Jung, and I. Jo, "Obfuscated Malicious JavaScript Detection Using Classification Techniques," in International Conference on Malicious and Unwanted Software (MALWARE), 2009.
[31] MarM15, "Statically Detecting JavaScript Obfuscation and Minification Techniques in the Wild," https://github.com/MarM15/js-transformations.
[32] Mozilla Developer Network, "Conditional (ternary) operator," https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Conditional_Operator. Accessed on 2020-06-07.
[33] ——, "JavaScript Conditional Compilation: @cc_on," https://developer.mozilla.org/en-US/docs/Web/JavaScript/Microsoft_Extensions/at-cc-on. Accessed on 2020-06-13.
[34] ——, "Property accessors," https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Property_Accessors. Accessed on 2020-06-07.
[35] npm, "npm: download-counts," https://github.com/npm/download-counts. Accessed on 2020-10-17.
[36] P. Palladino, "Non alphanumeric JavaScript," http://patriciopalladino.com/blog/2012/08/09/non-alphanumeric-javascript. Accessed on 2020-12-08.
[37] V. Raychev, P. Bielik, M. Vechev, and A. Krause, "Learning Programs from Noisy Data," in ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2016.
[38] J. Read, B. Pfahringer, G. Holmes, and E. Frank, "Classifier Chains for Multi-Label Classification," in Machine Learning, 2011.
[39] K. Rieck, T. Krueger, and A. Dewald, "CUJO: Efficient Detection and Prevention of Drive-by-Download Attacks," in ACSAC, 2010.
[40] S. Sarker, J. Jueckstock, and A. Kapravelos, "Hiding in Plain Site: Detecting JavaScript Obfuscation through Concealed Browser API Usage," in ACM Internet Measurement Conference (IMC), 2020.
[41] scikit-learn developers, "Scikit-learn: ClassifierChain," https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.ClassifierChain.html. Accessed on 2020-06-08.
[42] ——, "Scikit-learn: Multiclass and Multilabel Algorithms," https://scikit-learn.org/stable/modules/multiclass.html. Accessed on 2020-06-08.
[43] ——, "Scikit-learn: MultiOutputClassifier," https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputClassifier.html. Accessed on 2020-06-08.
[44] P. Skolka, C.-A. Staicu, and M. Pradel, "Anything to Hide? Studying Minified and Obfuscated Code in the Web," in The World Wide Web Conference (WWW), 2019.
[45] B. Stock, M. Johns, M. Steffens, and M. Backes, "How the Web Tangled Itself: Uncovering the History of Client-Side Web (In)Security," in USENIX Security, 2017.
[46] O. Trekhleb, "Top 33 JavaScript Projects on GitHub," https://itnext.io/top-33-javascript-projects-on-github-ad9d1dc822f7. Accessed on 2020-04-29.
[47] W3Techs, "Historical trends in the usage statistics of client-side programming languages for websites," https://w3techs.com/technologies/history_overview/client_side_language/all. Accessed on 2020-10-18.
[48] ——, "Usage statistics of JavaScript libraries for websites," https://w3techs.com/technologies/overview/javascript_library. Accessed on 2020-12-07.
[49] P. Wang, Q. Bao, L. Wang, S. Wang, Z. Chen, T. Wei, and D. Wu, "Software Protection on the Go: A Large-Scale Empirical Study on Mobile App Obfuscation," in International Conference on Software Engineering (ICSE), 2018.
[50] D. Wermke, N. Huaman, Y. Acar, B. Reaves, P. Traynor, and S. Fahl, "A Large Scale Investigation of Obfuscation Use in Google Play," in ACSAC, 2018.
[51] W. Xu, F. Zhang, and S. Zhu, "The Power of Obfuscation Techniques in Malicious JavaScript Code: A Measurement Study," in International Conference on Malicious and Unwanted Software (MALWARE), 2012.
[52] ——, "JStill: Mostly Static Detection of Obfuscated Malicious JavaScript Code," in ACM Conference on Data and Application Security and Privacy (CODASPY), 2013.
