Static Analysis for JavaScript
University of Aarhus
Department of Computer Science

Ph.D. Dissertation

Static Analysis for JavaScript

Simon Holm Jensen

Supervisor: Anders Møller
Submitted: January 28, 2013

Abstract

Web applications present unique challenges to designers of static analysis tools. One of these challenges is JavaScript, the language used for client-side scripting in the browser. JavaScript is a complex language with many pitfalls and poor tool support compared to other languages. This dissertation describes the design and implementation of a static analysis for JavaScript that can assist programmers in finding bugs in code during development.

We describe the design of a static analysis tool for JavaScript, built using the monotone framework. This analysis infers detailed type information about programs. This information can be used to detect bugs such as null pointer dereferences and unintended type coercions. The analysis is sound, enabling it to prove the absence of certain program errors.

JavaScript is usually run within the context of the browser and the DOM API. The major challenges in supporting this environment are modeling the event loop of the browser and the Document Object Model (DOM), which is used to access and modify the HTML displayed in the browser. We address both of these challenges in the design of our analysis.

Dynamic code evaluation is widely used in JavaScript applications. To accommodate this in the analysis, we add the Unevalizer component, which can transform code on the fly to eliminate dynamic code evaluation. By studying the use of dynamic code evaluation in the wild, we have identified several common patterns. Many of these patterns can automatically be transformed into equivalent code without dynamic code evaluation and can then be analyzed further.

Acceptable performance is needed to make an analysis tool useful in practice. To that end we have designed an extension to the analysis called lazy propagation.
Lazy propagation improves the performance of the analysis by reducing the information that the analysis must consider in the program. Experimental validation of lazy propagation indicates a significant performance improvement.

The design of the analysis has been evaluated on a large selection of benchmarks taken from online sources. The results show that the analysis is able to identify bugs in real code in reasonable time.

Resume

Web applications pose many unique challenges for designers of static analysis tools. One of these challenges is the programming language JavaScript, which is used for programming in the browser. JavaScript is a complicated language with many pitfalls, and compared to other languages it lacks good tools to help the programmer. This dissertation describes the design and implementation of a static analysis for JavaScript.

We describe the design of a static analysis tool for JavaScript that is built using the monotone framework. The analysis infers detailed type information about programs. This information can be used to find errors in the code such as null pointer errors and unintended type conversions. The analysis is sound, so it can prove programs free of certain classes of errors.

JavaScript programs are usually run in a browser and use the DOM API. The major challenges involved in supporting this environment are the browser's event loop and the object model used to access the HTML page. We address both of these challenges in the design of our analysis.

Dynamic code evaluation is widespread in JavaScript applications. To handle this in our analysis, we have developed the Unevalizer component, which can transform code that uses dynamic code evaluation into equivalent code without it. Through a study of the use of dynamic code evaluation in real programs, we have identified several recurring patterns. Many of these patterns can automatically be transformed into equivalent code without dynamic code evaluation and can thereby be analyzed further.

An acceptable running time for the analysis is necessary for it to be useful in practice. To achieve this, we have designed an extension to the analysis called lazy propagation. Lazy propagation improves the running time by reducing the amount of information the analysis must process in the program. Experimental results show significant improvements in running time when lazy propagation is used.

The design of the analysis has been evaluated on a large selection of benchmarks found on the Internet. The results show that the analysis is able to find errors in real programs in a reasonable amount of time.

Acknowledgments

I am indebted to my advisor Anders Møller for being a capable and useful mentor. He has been a great advisor both in sickness and in health during my time as a Ph.D. student.

I thank the entire Programming Languages group at Aarhus University for creating a great working environment. I still do not understand what is so great about Foosball though.

A special thanks goes to my office mate Ian Zerny for having a decent taste in music and for not judging me on the days where I did not show up until after lunch.

I also thank Mathias Schwarz for giving a meticulous review of this dissertation, which has greatly improved it.

I am also indebted to both Frank Tip and Satish Chandra, who were both excellent hosts when I visited IBM Research Watson in Hawthorne and Bangalore respectively.

Finally, I would like to thank my mother for supporting and encouraging me.

Simon Holm Jensen
Aarhus, January 27, 2013

Contents

Abstract
Resume
Acknowledgments
Contents

I Overview

1 Introduction
  1.1 Hypothesis
  1.2 Method
    1.2.1 Implementation
    1.2.2 Experimental evaluation
  1.3 Structure
  1.4 Papers

2 JavaScript and Web Development
  2.1 ECMAScript and JavaScript
    2.1.1 Prototypes
  2.2 The Document Object Model
    2.2.1 Events
    2.2.2 AJAX
    2.2.3 An Example of DOM Usage and AJAX
  2.3 Dynamic code evaluation
  2.4 JavaScript application frameworks

3 Static Analysis Background
  3.1 Values
    3.1.1 Objects
  3.2 Control flow
    3.2.1 Flow sensitivity
  3.3 Representing programs
  3.4 Functions
    3.4.1 Interprocedural analysis
    3.4.2 Context sensitivity
  3.5 Computing the fixpoint
  3.6 Alternatives to static analysis
    3.6.1 Type systems
    3.6.2 Dynamic approaches
    3.6.3 Semantics

4 TAJS
  4.1 Design choices
    4.1.1 Whole Program
    4.1.2 Sound Approximation
  4.2 Overview
  4.3 Lattice
    4.3.1 Program state
    4.3.2 Abstract values
  4.4 Transfer functions
  4.5 Recency abstraction
  4.6 Lazy Propagation
    4.6.1 A call graph
    4.6.2 Analysis with lazy propagation
  4.7 Modeling the Browser
    4.7.1 Event Model
  4.8 The Unevalizer
    4.8.1 Measuring eval in practice
    4.8.2 Unevalizer Framework
    4.8.3 Constant strings
    4.8.4 Dynamically created strings
  4.9 Related work
    4.9.1 Static analysis for JavaScript
    4.9.2 DOM modeling
    4.9.3 Dynamic code evaluation

5 Evaluation
  5.1 Research questions
  5.2 Results
  5.3 Threats to validity

6 Conclusion

II Papers

7 Type Analysis for JavaScript
  7.1 Introduction
  7.2 Related Work
  7.3 Flow Graphs for JavaScript
  7.4 The Analysis Lattice and Transfer Functions
    7.4.1 Transfer Functions
    7.4.2 Recency Abstraction
    7.4.3 Interprocedural Analysis
    7.4.4 Termination of the Analysis
  7.5 Experiments
  7.6 Conclusion

8 Lazy Propagation
  8.1 Introduction
  8.2 A Basic Analysis Framework
    8.2.1 Analysis Instances
    8.2.2 Derived Lattices
    8.2.3 Computing the Solution
    8.2.4 An Abstract Data Type for Transfer Functions
    8.2.5 Problems with the Basic Analysis Framework
  8.3 Extending the Framework with Lazy Propagation
    8.3.1 Modifications of the Analysis Lattice
    8.3.2 Modifications of the Abstract Data Type Operations
    8.3.3 Recovering Unknown Field Values
  8.4 Implementation and Experiments
  8.5 Related Work
  8.6 Conclusion
  8.7 Theoretical Properties
    8.7.1 Termination
    8.7.2 Precision
    8.7.3 Soundness

9 DOM Modeling
  9.1 Introduction
  9.2 Challenges
    9.2.1 The JavaScript Language
    9.2.2 The HTML DOM and Browser API
    9.2.3 Application Development Practice
  9.3 The TAJS Analyzer
  9.4 Modeling the HTML DOM and Browser API
    9.4.1 HTML Objects