
MASARYK UNIVERSITY FACULTY OF INFORMATICS

Techniques for JavaScript debugging

BACHELOR THESIS

Jakub Jurových

Brno, 2014

Declaration

I hereby declare that this paper is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during the elaboration of this work are properly cited and listed in complete reference to the due source.

Jakub Jurových

Advisor: RNDr. Radek Ošlejšek, Ph.D.

Acknowledgement

I would like to thank my supervisor RNDr. Radek Ošlejšek, Ph.D. I am grateful for his help and insights that helped me to write this thesis.

Abstract

It is very difficult to understand and debug code, especially if it is written in a dynamically typed language like JavaScript. As a result, programmers spend a lot of time debugging their applications. This thesis presents Recognizer, a tool for faster debugging, along with its JavaScript implementation. Recognizer aims to improve the productivity of programmers by automatically logging all variables and expressions at runtime. The data obtained from the logging are then transferred to a code editor and visualised in real time by semantically highlighting the code. The code’s colour highlighting is based on real data instead of static syntax analysis. Debugging with real-time semantic highlighting is compared to traditional debugging techniques that usually involve step debuggers and log statements.

Keywords

semantic highlighting, code understanding, programming, debugging, code visualisation

Contents

1 Introduction
    1.1 Seeing the Data
    1.2 Information Foraging Theory
    1.3 Recognizer
    1.4 Organisation of This Thesis
2 Current State of Debugging and Related Work
    2.1 Principles of Debugging
    2.2 Step Debuggers
    2.3 Log Statements
    2.4 Always-on Debugging
    2.5 Interactive Programming
3 Recognizer
    3.1 Implementation Decisions
    3.2 Debugging with Recognizer
        3.2.1 Semantic Highlighting
        3.2.2 Variable Inspection
        3.2.3 Counting Function Calls
        3.2.4 Instrumentation Heuristics
    3.3 Case Studies
        3.3.1 Case Study 1: Code Understanding
        3.3.2 Case Study 2: Discovering Bugs
    3.4 Implementation Details
        3.4.1 Brackets
        3.4.2 Remote Debugging Protocol
        3.4.3 Code Instrumentation
            Identifiers
            Member Expressions
            Call Expressions
            Function Entries
        3.4.4 Runtime Tracer Code
        3.4.5 Serving Instrumented Files
        3.4.6 Retrieving Values
    3.5 Current Problems and Limitations
        3.5.1 Browser Support
        3.5.2 Performance
        3.5.3 Languages Compiled to JavaScript
        3.5.4 Error Handling
        3.5.5 Integration with Development Workflows
    3.6 Future Development
4 Summary

1 Introduction

Programming is difficult. We build extremely complex applications which are difficult to navigate and understand. We cannot see how different parts of a program interact with each other, whether a certain function was even evaluated, or what arguments were passed to it. It is too easy to introduce a bug into such a system and too hard to fix it. Code is grouped into modules, classes and objects, but despite all the naming conventions and best practices, we only see the code.

No matter how experienced the programmer is, it takes too much time to understand what the code does. We need to read it line by line, imagine how it is evaluated by a computer, keep the state in our head, and hope we didn’t make any mistake along the way.

I think we can do better. Reading code line by line is a task for a computer. When we read the code, we should see more than that. We should see what the code means and does. We should see the data hidden behind it [1].

Despite the long history of research in debugging, this problem has not yet been solved [2]. The goals of this thesis are (1) to provide an overview of existing approaches, (2) to design new debugging techniques that address this problem, and (3) to develop a working implementation of these techniques.

1.1 Seeing the Data

Bret Victor argues that “The entire purpose of code is to manipulate data, and we never see the data.” [3] We see only the final state, not the intermediate steps.

During development, seeing the intermediate steps is problematic. We cannot see how the code works. For example, the following code tells us nothing about itself.


    person.walk(5);

We can see that the method walk is being called. It might modify the state in some way, but we don’t know how. We need to read its definition line by line. More importantly, it tells us nothing about the data. What has changed? We can capture the state before and after we call person.walk(amount).

    console.log(person); // {name: "Joe", walkingDistance: 5}
    person.walk(5);
    console.log(person); // {name: "Joe", walkingDistance: 10}

The additional log statements helped us understand how the program works. We can see the data and we can see the change. Now it is easy to understand that person.walk increments walkingDistance by a certain amount without looking at its definition. We can see the behaviour.
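The example above leaves out the definition of person. A minimal runnable sketch, assuming a hypothetical person object whose walk method simply adds to walkingDistance (this definition is an assumption made for illustration), might look as follows:

```javascript
// Hypothetical object; only the logged shape comes from the example above.
const person = {
  name: "Joe",
  walkingDistance: 5,
  walk(amount) {
    this.walkingDistance += amount;
  },
};

// Copy the data fields so that later mutations do not alter an earlier snapshot.
// JSON serialisation drops methods, which is acceptable for this illustration.
function snapshot(obj) {
  return JSON.parse(JSON.stringify(obj));
}

const before = snapshot(person);
person.walk(5);
const after = snapshot(person);

console.log(before); // { name: 'Joe', walkingDistance: 5 }
console.log(after);  // { name: 'Joe', walkingDistance: 10 }
```

Taking a snapshot is a design choice of this sketch: it guards against a console that keeps a live reference to the object and would therefore show the later state for both lines.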

1.2 Information Foraging Theory

To properly design and evaluate a debugging interface, we need to understand how programmers think.

During debugging, programmers spend 35% of their time mechanically navigating the code [4]. Besides losing the visual context on every “jump”, this is a considerable efficiency problem. Switching between files is a relatively large investment in terms of both time and mental effort.

Information foraging theory (IFT) describes how people look up relevant information, follow information paths and determine which paths contain the most relevant information, also known as information scent. IFT studies how humans are able to efficiently find information with minimal mental cost.

IFT provides a powerful framework for understanding the behaviour of programmers trying to fix a bug or understand the code. In the context of software debugging, the key terms of IFT can be defined as follows [5, 6]:

• Predator: The programmer who is debugging, looking for the source of the bug.
• Prey: The source of the bug that the programmer wants to fix and all other information needed to achieve that goal.
• Information patch: Regions containing many information sources. For example, the patch can be a single document or a debugger window. The programmer can navigate inside the same patch with little cost.
• Information scent: Perceived value of information and the cost associated with obtaining it. This is inferred from cues, such as a comment in the source code or an icon in a debugger’s user interface.

Based on personal preference or the complexity of the task, programmers use different debugging methods. In hypothesis-based debugging, programmers think about the problem as a whole. On the other hand, in scent-based debugging, programmers try to analyse the problem bottom-up, closely following the information foraging theory [5]. Scent-based debugging involves tracing information, usually by reading the source code or documentation.

According to Lawrance et al. [5], programmers use scent-based techniques about 4 times more often than hypothesis-based techniques. However, the majority of current debugging tools (presented in Chapter 2) weren’t designed to support a debugging process based on information scent.

1.3 Recognizer

I designed a new debugging tool, Recognizer, to explore ideas such as semantic highlighting (see Section 3.2.1). Recognizer’s working implementation targets JavaScript and is released under an open-source license. Recognizer can be installed as an extension for the Brackets code editor (see Section 3.4.1).

Recognizer brings real-time data from the runtime to the programmer in the form of semantic code highlighting (see Section 3.2.1) and inspectable variables (see Section 3.2.2). It aims to solve the common problem of inspecting intermediate values of variables, which is especially troublesome in dynamically typed languages. Additionally, each function call is logged and visualised using a simple counter near the function declaration (see Section 3.2.3). This kind of detailed inspection answers many questions programmers may have during debugging [7, 8].

Thanks to the real-time updating, programmers are able to easily verify their assumptions about the code: “When I click this button, will this function be evaluated?” Patterns inside the code can be easily identified as well: “These functions have been evaluated the same number of times, they are probably somehow related.”

With Recognizer it is much easier to understand new code. To answer the question “Which variable stores the value of this input field?” we just need to (1) write something into the input field and (2) skim the code, looking for the same value. Compare this to the traditional way, where we need to (1) read the code in detail, (2) make assumptions about APIs, (3) make a guess, and finally verify it by (4) setting a breakpoint and (5) re-running the program.

1.4 Organisation of This Thesis

Chapter 2 presents an overview of both state-of-the-art and traditional debugging techniques. I focus on how the discussed tools present runtime data to programmers and how they help them understand the code.

The main contribution of this paper, Recognizer, is described in detail in Chapter 3. This includes both design decisions and technical details (Section 3.4). The benefit of using Recognizer is demonstrated in two case studies (Section 3.3). The problems, limitations, and possible solutions are outlined in Section 3.5. The idea of semantic highlighting is presented in Section 3.2.1.

Finally, Chapter 4 summarises this thesis.

2 Current State of Debugging and Related Work

Applications work with data, but traditional debugging environments rarely show it to the programmer. We write code blindly, without any real-time knowledge of the program’s state. We just imagine what hundreds of lines of code do. Not surprisingly, we spend a lot of time finding bugs and trying to fix them.

Debugging consists of several activities, including observing the program’s runtime state, hypothesising about runtime failures, diagnosing faulty code, and repairing it [7].

We should not limit ourselves to traditional debuggers, such as step debuggers. Modern debuggers should proactively show us real data and inform us about possible problems. They should prevent us from introducing new bugs into the code.

2.1 Principles of Debugging

The analysis of a bug can be extremely difficult in complex or poorly documented applications. We need to reliably determine the source of the problem. We usually start by finding a way to reproduce the bug in order to understand the scope of the problem. This helps us with reasoning about the problem and isolating the problematic code [9].

During a usual debugging session, we perform the following actions:

1. Discover a bug
2. Set up breakpoints or log statements
3. Reproduce the bug
4. Analyse data and fix the bug
5. Remove breakpoints or log statements (undo step 2)

Steps 1 and 4 are the core steps of the debugging process. Steps 2, 3 and 5 present the overhead of current debugging techniques. All breakpoints and log statements from step 2 need to be manually removed in step 5 once the bug is fixed. Step 3 often needs to be repeated multiple times, each time with a different set of breakpoints and log statements.

We can simplify this process by inserting log statements everywhere we might find them useful. Some tools, including Recognizer, have recognised that the overhead steps can be automated. Instead of 5 steps, the programmer needs to perform only 2 steps during a single debugging session.

2.2 Step Debuggers

Perhaps the most advanced tool for debugging JavaScript applications is a step debugger. Step debuggers use programmer-defined breakpoints that halt the program when a certain line is reached. Once the program is halted, we can evaluate the code line by line (i.e. “step” through the code) or navigate through the call stack. To see the data in a step debugger, two steps are needed:

• Set up a breakpoint. First, we have to set up a breakpoint manually. But where? At this point, we don’t yet know where the problem occurs. Several different locations have to be tried.

• Inspect the variables. Once the program is halted on a breakpoint, we can inspect the program’s state. Chrome DevTools allows us to see the values of some variables (Figure 2.1). But these are shown only in the right sidebar, not in the code where we need them. To understand the program’s state, we need to manually switch back and forth between the code and the sidebar. Chrome makes it a bit easier by allowing us to hover over a variable to see its value, but we still have to guess where this works, since the inspectable variables are not highlighted in any way (Figure 2.2).


Figure 2.1: Screenshot of a user interface from Chrome DevTools debugger. The program is paused on a breakpoint.

Figure 2.2: Screenshot of a user interface from Chrome DevTools debugger. While the program is paused on a breakpoint, the programmer can inspect the program’s state by moving the mouse over a variable.

2.3 Log Statements

Perhaps the most popular way of debugging applications across all programming environments is to use log statements. Log statements, also known as print statements, inform us about the program’s state. They show us the data.

Print statements are perhaps the oldest debugging technique still in use. Thanks to their simplicity, they are available in almost all environments, making them the most universal tool for debugging. In 1997, Henry Lieberman pointed out that “many programmers identify ‘inserting print statements’ as their debugging technique of choice” [2].

But they are fundamentally separated from the editor. We need another window, a console, to see these values. Log statements show the data, but we can see only a very small subset of the program’s internal state. The context is entirely missing. We need to link the output back to the code manually. They can answer some of our questions [8], but at a great cost.

Figure 2.3: Screenshot of Chrome DevTools Console with results of console.log and console.error calls.

JavaScript has been developed as a scripting language for web browsers, missing any concept of standard output for printing values. The Console API¹ may be thought of as an equivalent of print with a set of additional features like message formatting. The first implementations of the Console API started appearing in 2009² as a replacement for document.write and window.alert. Even though the Console API is not standardised, it is available in almost all recent browsers, usually through the console.log and console.error functions³.

In Chrome DevTools (Figure 2.3), console.log(message) shows the formatted message and the line from which the console.log function was called. If the value is an object, its properties can be recursively expanded. console.error provides us with a result similar to console.log, but also shows the current stack trace.

Debugging with log statements is a very tedious process. First, they have to be inserted manually into the code. The program then has to be restarted and all steps leading to the bug must be reproduced. Once the log statements are no longer needed, they have to be removed, again manually. This creates an enormous feedback loop and significantly slows down development.

Despite the huge debugging overhead, log statements are sometimes faster than using a step debugger. If programmers already know which variables they are interested in, they don’t have to bring this knowledge into the separate interface of a step debugger.
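As a brief illustration of the two functions discussed above, the following sketch logs a value with console.log, uses message formatting, and reports a problem with console.error. The todo object is a hypothetical value invented for this example; the exact rendering of the output depends on the environment.

```javascript
// Hypothetical data used only for this illustration.
const todo = { title: "Write thesis", done: false, tags: ["school"] };

// Prints the value; in browser developer tools the object can be expanded.
console.log("Current todo:", todo);

// Message formatting: %s substitutes a string, %d a number.
console.log("%s has %d tag(s)", todo.title, todo.tags.length);

// Error-level message; as described above, Chrome DevTools also shows a stack trace.
console.error("Could not save todo:", todo.title);
```

Each of these statements has to be typed in by hand and removed again afterwards, which is exactly the overhead described in Section 2.1.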

2.4 Always-on Debugging

An alternative approach has been taken by Theseus [10, 11, 12]. Theseus implements the idea of always-on retroactive debugging by keeping track of function calls. Theseus visualises function calls in real time as (a) counters next to the code and (b) call graphs in a separate panel (Figure 2.4).

It is worth examining Theseus in further detail, as it had perhaps the greatest influence on the design of Recognizer. Recognizer builds upon the ideas presented in Theseus, such as logging of function calls. Recognizer and Theseus share a similar architecture, which is thoroughly explained in Section 3.4. Theseus has been built as an open-source extension for Brackets. Parts of its source⁴ are used in the current implementation of Recognizer⁵.

1. https://developer.mozilla.org/en-US/docs/Web/API/console
2. http://www.paulirish.com/2009/log-a-lightweight-wrapper-for-consolelog/
3. https://developer.mozilla.org/en-US/docs/Web/API/console

Figure 2.4: Screenshot of Theseus. Number of calls of a render function is displayed inside a “pill”, next to its declaration. Call graph is expanded in the bottom panel.

Theseus integrates itself into the editor and presents all the data in the same window, that is, in the same information patch [6]. The programmer is not forced to remember code while switching from the editor to the debugger. This saves time navigating the code and makes information foraging easier.

Theseus automatically logs function calls and keeps a history of all passed arguments. With this data, the programmer can go back in time to understand what happened, which parts of the code were executed and how.

4. https://github.com/adobe-research/theseus
5. https://github.com/equiet/recognizer

Visually, Theseus consists of two interfaces (Figure 2.4):

• Function hit counts are summarised in “pills” next to the code. This allows for a quick overview of evaluated and erroneous code (displayed as a red pill).
• The full function call graph can be expanded in a separate panel. This graph includes a one-level-deep copy of arguments, which allows the programmer to travel back in time and inspect state almost as if a breakpoint was set inside every function. This is a clear advantage over traditional breakpoints: the program needn’t be halted at any time during the debugging session.

Shortcomings of this design are acknowledged by the author himself: “Theseus’ technique for displaying historical values in a log sacrifices code locality since the information appears in a separate panel.” [12] In other words, we see the data, but not so much the context.

Not surprisingly, a study has found that Theseus required only little training to use and was rated favourably by many of the test subjects [12]. However, no significant difference was found in the ability to complete the programming tasks. This can be attributed to a small sample of test subjects, or, more likely, artificial testing conditions. Programmers need time to embrace new technologies [13]. Retroactive debugging requires a different way of thinking about programming. I believe this cannot be developed instantly.

One of the UX problems the study has discovered was that users had to memorise call counts or reset the debugger to clear the counters. During code exploration, where we tend to group our actions by time, relative counts are more useful than absolute counts.

A more heavyweight approach was taken by several other tools. TraceGL⁶ logs every function call and every expression and displays which code paths have been taken. The resulting visualisation is presented in a separate window. Presenting such an amount of data out of context leads to data overload. Spy-js⁷ improves this experience by integrating itself into an IDE. It allows the programmer to jump back to the source and highlight evaluated expressions, but the actual data is still displayed in a separate panel, out of context.

6. http://www.infoq.com/news/2013/04/tracegl

2.5 Interactive Programming

Step debuggers are highly unsuitable for interactive and complex applications. Events that happen 60 times a second cannot be effectively debugged using breakpoints. Log statements, on the other hand, only flood the console.

Theseus handles this problem well, as it combines the best of traditional debuggers and log statements. The program is never halted and is easily inspectable in real time.

Light Table⁸ extends this concept by providing us with the ability to set up “watches” [14]. These allow programmers to inspect the variables they are interested in in real time. Similarly to Brackets, Light Table is an IDE which communicates with a JavaScript engine.

Watched expressions can be set up only after the program starts. Therefore it makes sense to insert them only in functions which will be called later. Expressions that are evaluated only once right after the program starts cannot be inspected at all. This limits many of the potential usage scenarios, but brings us true real-time interactivity with the program during a debugging session.

The support for live programming in JavaScript is limited. JavaScript debuggers can move only forward in time. Any retroactive debugging needs to be done by tediously copying the state.

We must look to other programming paradigms for new ideas. Elm [15] is a functional language that compiles to JavaScript. It allows the programmer to travel in time and change the history during debugging [16]. This is possible thanks to its architecture based on Functional Reactive Programming (FRP) [17]. Instead of copying the state, we only need to record and replay the events.
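The idea of recording and replaying events instead of copying state can be sketched in plain JavaScript. The following is only an illustration under stated assumptions: it is not Elm and does not reflect how Elm’s debugger is implemented, and the names update, recordedEvents and replay are hypothetical.

```javascript
// Pure update function: the next state depends only on the previous
// state and the event, so replaying the same events gives the same state.
function update(state, event) {
  switch (event.type) {
    case "add":
      return { ...state, todos: [...state.todos, event.title] };
    case "clear":
      return { ...state, todos: [] };
    default:
      return state;
  }
}

const initialState = { todos: [] };

// Only the events are stored, never the intermediate states.
const recordedEvents = [
  { type: "add", title: "Buy milk" },
  { type: "add", title: "Write thesis" },
  { type: "clear" },
];

// Rebuild the state as it was after the first `count` events.
function replay(events, count) {
  return events.slice(0, count).reduce(update, initialState);
}

console.log(replay(recordedEvents, 2).todos); // [ 'Buy milk', 'Write thesis' ]
console.log(replay(recordedEvents, 3).todos); // []
```

Because update has no side effects, any past state can be reconstructed on demand from the event log, which is what makes moving backwards in time cheap in this style of architecture.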

7. http://spy-js.com/
8. http://www.lighttable.com/

Timelapse⁹ brings the record and replay functionality to native JavaScript code.

Perhaps the best way of seeing the behaviour of a program is presented by ZStep [18], a debugging environment for Lisp. ZStep is able to show the results of an evaluation directly in the code. Recognizer’s goal is to replicate this behaviour in an imperative language such as JavaScript.

Other possibilities of interactive programming have been explored in Bret Victor’s essay Learnable Programming [3].

9. http://homes.cs.washington.edu/~burg/projects/timelapse/

3 Recognizer

How can we efficiently understand the dynamic behaviour of an application, if its code is just a static text? How can we design an intuitive debugging interface?

Development environments seldom augment the code itself. The augmentation of the code is limited mostly to syntax highlighting and autocomplete features. Instead, a new panel is introduced for almost every advanced feature.

In Recognizer, the code is treated as a visualisation of a program execution. Recognizer shows the data (real values of variables) in its context (the variable name within the code). To efficiently visualise dynamic behaviour (the changes in data), the code cannot be static. It must be interactive [19].

The goal of Recognizer is to make programming faster, easier and more intuitive. There are many possibilities for improvement in current Integrated Development Environments (IDEs) [20, 21]. In this thesis I explore new ideas and concepts for making programmers more productive during debugging and making the code easier to understand.

Recognizer bridges the gap between the code and the runtime. With Recognizer, programmers are no longer forced to constantly switch between a code editor and a debugger.

Recognizer minimises the cognitive overhead that arises from using a new tool. My goal was to develop a tool that is intuitive, easy to use, and improves the programmer’s productivity almost instantly.

The implementation presented in this thesis targets JavaScript application development.

3.1 Implementation Decisions

I have chosen JavaScript as the implementation language. JavaScript’s simplicity and extensibility make it an ideal language for exploration and fast iteration on feature development. It is widely used in both web development and native applications.

In the future, the wide user base will enable me to easily test the presented concepts in real-world situations.

JavaScript is one of the languages which, I believe, desperately need a new generation of debugging tools. Complex JavaScript applications suffer from poor design decisions [22]. Application development is difficult and error-prone.

However, the concepts developed in this thesis are not limited to JavaScript. Dynamic languages suffer from the same problem of unpredictable behaviour. Once the ideas are evaluated in JavaScript environments, they can be ported to other languages.

3.2 Debugging with Recognizer

Recognizer links source code to the execution environment. In terms of information foraging theory, this means that both program and code are linked together in the same patch, where navigating and information foraging is much easier. Recognizer promotes exploration of the code.

3.2.1 Semantic Highlighting

Over the years, syntax highlighting has proven to be a very useful and requested feature. It has been implemented in practically all code editors.

Syntax highlighting improves the readability of the code, but does not bring any new information to the programmer. The variable is always blue, no matter if it holds a number, a string or is undefined. We see only an abstract variable. We can’t see data.

In this thesis I present semantic highlighting for dynamically-typed languages (Figure 3.1). Semantic highlighting brings a new level of code inspection, where every colour inside the code editor has a meaning. It links code to real data.

Figure 3.1: Screenshot of code semantically highlighted by Recognizer.

Standard syntax highlighting colours the code using the information we already know from the code. With semantic highlighting we actually see new information about the code. We can closely examine the program without any intermediate steps, without re-running the program with additional log statements, and without stepping through the code line by line.

In contrast with existing tools which support some sort of semantic highlighting¹, Recognizer highlights the code according to real data instead of just analysing the syntax. In Recognizer, semantic highlighting is in fact a full inspection of all variables and expressions in the code. Instead of showing all values at once, semantic highlighting provides a high-level overview of all data.

Semantic highlighting brings attention to possible problems in the program. Recognizer highlights each variable type with a different colour (Figure 3.2). This allows the programmer to see any inconsistency or unexpected behaviour right inside the code editor.

1. KDevelop: http://zwabel.wordpress.com/2009/01/08/c-ide-evolution-from-syntax-highlighting-to-semantic-highlighting/, ScalaIDE: http://scala-ide.org/docs/current-user-doc/features/typingviewing/semantic-highlighting/index.html


Colour     Type of the inspected value
#44BD87    object
#8757AD    function
#446FBD    number
#57C0D8    string
#DDCF56    boolean
#EA1717    undefined
#EA1784    NaN
#FB9E0F    null
#7EAD25    unevaluated keyword

Figure 3.2: Colour scheme for semantic highlighting.

Programmers can spot bugs sooner than they would be discovered in the running application.

In dynamically-typed languages, we cannot rely on static analysis and compile-time checking. This makes a misspelled variable a frequent source of bugs in JavaScript applications. Recognizer highlights all undefined variables in red, which makes them easy to find (Figure 3.3).

Figure 3.3: add event triggers an undefined function. Without Recognizer, this bug would be difficult to detect, since this code doesn’t throw any error.
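The mapping from a runtime value to a highlight colour can be sketched as follows. The colours are taken from Figure 3.2; the describeValue helper and the todo object are hypothetical names for this illustration and are not taken from Recognizer’s source. The sketch covers only the rows of Figure 3.2 that correspond to runtime values.

```javascript
// Colours from Figure 3.2, keyed by the type of the inspected value.
const COLOURS = {
  object: "#44BD87",
  function: "#8757AD",
  number: "#446FBD",
  string: "#57C0D8",
  boolean: "#DDCF56",
  undefined: "#EA1717",
  NaN: "#EA1784",
  null: "#FB9E0F",
};

// Hypothetical helper: classify a value and look up its colour.
// Types outside Figure 3.2 (for example symbols) get no colour here.
function describeValue(value) {
  let type = typeof value;
  if (value === null) {
    type = "null"; // typeof null is "object"
  } else if (Number.isNaN(value)) {
    type = "NaN"; // typeof NaN is "number"
  }
  return { type, colour: COLOURS[type] };
}

const todo = { title: "Buy milk" };
console.log(describeValue(todo.title)); // { type: 'string', colour: '#57C0D8' }
console.log(describeValue(todo.tilte)); // { type: 'undefined', colour: '#EA1717' }
```

The second call reads a misspelled property. JavaScript returns undefined without throwing an error, so the red colour is the only visible hint that something is wrong.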

3.2.2 Variable Inspection

I create a probe for each variable; probes are the underlying mechanism for semantic highlighting. In its collapsed state, a probe highlights the corresponding code; in its expanded state, it shows the full value. The detailed process of inserting probes is explained in Section 3.4.3.


Figure 3.4: Object variables can be inspected with their properties inside a tooltip.

Figure 3.5: Tooltip for inspecting a primitive type like number.

Probes do not store a history of all values. Instead, each probe allows us to fully inspect only the last known value. This is not limited to primitive data types (Figure 3.5), but works with deep objects as well (Figure 3.4). We cannot see the history, partly for performance reasons. Keeping every value would result in enormous memory requirements. A user study is required to determine whether the usefulness of probe history would justify its memory requirements.

The inspection of values is based on Chrome DevTools’² code and interface.

Each probe is underlined so that the programmer can easily see which variables are inspectable and which code was evaluated.
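The “last known value” behaviour can be illustrated with a small sketch. It assumes a hypothetical probe function and a lastValues registry; Recognizer’s actual instrumentation is described in Section 3.4.3 and may differ.

```javascript
// Hypothetical probe registry: one slot per probe, overwritten on every hit,
// so memory use does not grow with the number of evaluations.
const lastValues = new Map();

// Record the value and return it unchanged, so the call can wrap any
// expression without altering the program's behaviour.
function probe(id, value) {
  lastValues.set(id, value);
  return value;
}

let total = 0;
for (const price of [3, 4, 5]) {
  total = probe("total", total + probe("price", price));
}

console.log(lastValues.get("price")); // 5
console.log(lastValues.get("total")); // 12
console.log(lastValues.size);         // 2
```

After three loop iterations only two values are stored, one per probe. The intermediate totals 3 and 7 are gone, which is the trade-off described above.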

3.2.3 Counting Function Calls

Probes show only the most recent values. We cannot see the change unless we look at the right place at the right time.

To see the history, we can look at the functions. Recognizer keeps a history of every function call, including the arguments that were passed to it. This allows us to compare the program’s behaviour over time. With history, we can see the change. Recognizer displays a counter next to every function declaration (Figure 3.6).

2. https://developers.google.com/chrome-developer-tools/

Figure 3.6: Function call counter is displayed next to the function declaration.

We can immediately see which functions were evaluated. This helps us understand how the program works, which functions are related, and which functions are redundant. The counter can be expanded to show a history of function calls (Figure 3.7).

Figure 3.7: When a call counter is clicked, it reveals a history of function calls, times of invocation, and passed arguments.

The usefulness of call counters and call history has been explored in Theseus [12, 11].

The difference between Recognizer and Theseus is in the visualisation of information. Recognizer displays the log immediately after the function declaration, while Theseus shows it in a separate panel. Recognizer shows the data within the context, but this method is unsuitable if more than one function is involved, as is the case with Theseus’ call graphs.
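A call counter with a history of invocation times and arguments can be sketched with a simple wrapper. This is a runtime stand-in written for illustration only: the counted helper is hypothetical, and Recognizer itself obtains this data through code instrumentation (Section 3.4.3).

```javascript
// Hypothetical tracer: wrap a function and record every call to it.
function counted(fn) {
  const calls = [];
  function wrapper(...args) {
    calls.push({ time: Date.now(), args });
    return fn.apply(this, args);
  }
  wrapper.calls = calls; // the counter is calls.length, the history is calls
  return wrapper;
}

const addOne = counted(function addOne(todo) {
  return `<li>${todo.title}</li>`;
});

addOne({ title: "Buy milk" });
addOne({ title: "Write thesis" });

console.log(addOne.calls.length);           // 2
console.log(addOne.calls[1].args[0].title); // Write thesis
```

The number shown in the counter corresponds to calls.length, while expanding the counter corresponds to listing the entries of calls with their times and arguments.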


3.2.4 Instrumentation Heuristics

Large parts of codebases come from third-party vendors. It is usually our code, not the vendors’, that we want to debug. The goal is to minimise the number of instrumented files, while satisfying the programmer’s needs.

Recognizer employs a simple heuristic to avoid instrumenting the entire codebase. It instruments only the files which are among Brackets’ Working Files (i.e. files that were “opened” by the programmer inside the editor).
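The heuristic amounts to a set membership test. The sketch below deliberately avoids the Brackets API: filesToInstrument, the file paths and both input lists are hypothetical and only stand in for the list of requested scripts and the editor’s Working Files.

```javascript
// Hypothetical helper: keep only the requested files that are open in the editor.
function filesToInstrument(requestedFiles, workingFiles) {
  const open = new Set(workingFiles);
  return requestedFiles.filter((path) => open.has(path));
}

// Hypothetical project layout with two application files and two vendor files.
const requested = [
  "js/app.js",
  "js/views/todo-view.js",
  "bower_components/backbone/backbone.js",
  "bower_components/jquery/jquery.js",
];
const workingFiles = ["js/views/todo-view.js", "js/app.js"];

console.log(filesToInstrument(requested, workingFiles));
// [ 'js/app.js', 'js/views/todo-view.js' ]
```

Vendor files are skipped simply because the programmer has not opened them, so no list of library names needs to be maintained.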

3.3 Case Studies

In dynamically-typed languages, it is difficult to see variable types. These can be either explicitly declared in the inline documentation (e.g. /* @param string name */), deduced from the code itself (from the methods which are called on them, like name.replace('foo', 'bar')), or guessed solely from the name (it is reasonable to expect name to hold a string value).

Unfortunately, these methods soon become insufficient when objects are used instead of primitive values. The properties of an object can change during runtime, which makes static analysis useless. Listing all expected properties of an object in the documentation is very tedious.

Programmers need to see the data with minimal effort. Recognizer was designed to eliminate the debugging overhead described in Section 2.1.

In this section I use TodoMVC³, a simple JavaScript application, to demonstrate the experience of writing applications with Recognizer.
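The point about object properties changing at runtime can be illustrated with a short sketch. The archive function and its documentation comment are hypothetical and were written only for this example.

```javascript
/**
 * Hypothetical function whose documentation promises a `title` property.
 * @param {{title: string}} todo
 */
function archive(todo) {
  todo.archivedAt = Date.now(); // property added at runtime
  delete todo.title;            // documented property removed at runtime
  return todo;
}

const item = archive({ title: "Buy milk" });

console.log(Object.keys(item)); // [ 'archivedAt' ]
console.log(typeof item.title); // undefined
```

After the call, neither the inline documentation nor the variable name tells us what item actually contains. Only the runtime data does.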

3. A small todo-list application built for demonstration of various JavaScript MVC frameworks. http://todomvc.com/


Figure 3.8: Screenshot of TodoMVC application.

3.3.1 Case Study 1: Code Understanding

Mike, a JavaScript programmer, wants to learn how to write applications using Backbone.js⁴. He downloads the Backbone.js implementation⁵ of the TodoMVC app (Figure 3.8) to see how the framework works. While browsing the code, he is not sure what app.TodoView does and when it is instantiated.

Without Recognizer, he would need to read the code of the whole application, not just todo-view.js. In app-view.js he would find an addOne function (Figure 3.9), which is triggered by the add event on app.todos, an instance of app.Todos. It looks like app.TodoView is instantiated when a new todo item is added, but he is not sure. add is an internal event triggered by Backbone.js, so he would need to look into the documentation to figure this out. Instead, it is much easier to set up a breakpoint or insert a log statement into the initialize function and launch the application from a debugger. It soon becomes clear that app.TodoView is instantiated only when a new item is added to the todo list. However, Mike needs to open another window, a debugger, to get this information.

4. A JavaScript framework for writing applications. http://backbonejs.org/
5. http://todomvc.com/architecture-examples/backbone/

Figure 3.9: addOne function from app-view.js without Recognizer.

Figure 3.10: Screenshot of the code with Recognizer disabled.

Recognizer eliminates the debugging overhead mentioned in Section 2.1 altogether. To understand how app.TodoView works, Mike just needs to launch the application and try adding a few items. The initialize counter (Figure 3.11) increments as soon as Mike adds a new item to his todo list. He can see how the application works right from the code (Figure 3.12).

Figure 3.11: Function call count.


Figure 3.12: Screenshot of the code with Recognizer enabled.

3.3.2 Case Study 2: Discovering Bugs

Mike’s colleague, Lisa, wants to make her own todo list, but the application isn’t working. When she tries to add a new item, nothing shows up, not even an error message. There is a bug.

Since Lisa doesn’t use Recognizer, she needs to read the code. First, she finds the functions which should be evaluated every time a new item is added. Then, she sets up a breakpoint in each one of them to find out which functions were evaluated correctly and which were not. After some time spent switching between the debugger and the application, she finds out that the conditional expression app.todos.lenght in the render function never evaluates to true. She has just found a misspelled property name which prevented the todo items from rendering. She rewrites it to app.todos.length and re-runs the application to make sure the bug is fixed.

With Recognizer, these kinds of errors are very easy to spot. In JavaScript, reading a misspelled property yields undefined, which Recognizer treats as a frequent source of errors and highlights in red (Figure 3.13). The red colour immediately catches Lisa’s attention. She easily inspects the parent object and finds out that length, not lenght, is a valid property of app.todos (Figure 3.14).

Figure 3.13: Part of the code from the TodoMVC application. A misspelled property is automatically highlighted by Recognizer in red (see Section 3.2.1 for details about semantic highlighting).

Figure 3.14: Inspection on a parent provides a list of available properties.
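The silent failure from Case Study 2 can be reproduced in a few lines. The following is a minimal sketch: the todos object is a plain stand-in for app.todos, not the actual Backbone collection, and the rendered strings are invented for the example.

```javascript
// Stand-in for app.todos; the real object is a Backbone collection.
var todos = { length: 3 };

// Reading a misspelled property does not throw; it yields undefined.
var misspelled = todos.lenght;

// undefined is falsy, so the branch that renders the list is skipped silently.
var rendered = misspelled ? "list shown" : "list hidden";

// Inspecting the parent object reveals the properties that do exist.
var validProperties = Object.keys(todos);

console.log(misspelled, rendered, validProperties);
```

Because no exception is raised, nothing appears in the console, which is why highlighting undefined values directly in the editor is useful here.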

3.4 Implementation Details

Recognizer modifies the application’s code to include additional log functions called probes, without changing the functionality of the code (see Section 3.4.3). Probes store all values in a dedicated global object for later retrieval. In the current implementation, the retrieval occurs every 100 milliseconds through a call from Brackets to the V8 JavaScript engine⁶ via the Remote Debugging Protocol (see Section 3.4.2).

3.4.1 Brackets

Recognizer is implemented as an extension for Brackets⁷. Brackets is an open-source code editor built in HTML, CSS and JavaScript, which makes it very flexible and easy to extend. It has a strong user base, which will make user testing easier in later phases of the research. Brackets allows programmers to preview a web application via its Live Preview feature. When the application is opened via Live Preview, Brackets creates and maintains a remote debugging session with the Chrome browser and creates a server for serving files to the browser.

3.4.2 Remote Debugging Protocol

During Live Preview, Brackets starts a browser (the runtime environment) with debug mode enabled. Debug mode allows Brackets to communicate with the runtime via WebSockets⁸. The communication is based on JSON messages described by the Remote Debugging Protocol⁹. Through the Remote Debugging Protocol, Recognizer can remotely evaluate code inside the runtime and retrieve the result.
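To make the JSON exchange more concrete, the sketch below builds a request for the protocol's Runtime.evaluate method and reads the value out of a response. It is an illustration only: the helper names buildEvaluateMessage and readResult are hypothetical, the sample response is hand-written for this example, and the WebSocket transport that Brackets manages is left out.

```javascript
// Each request carries an id so the response can be matched to it.
var nextId = 1;

// Hypothetical helper: serialises a Runtime.evaluate request.
// returnByValue asks the runtime to send the value itself as JSON.
function buildEvaluateMessage(expression) {
    return JSON.stringify({
        id: nextId++,
        method: "Runtime.evaluate",
        params: { expression: expression, returnByValue: true }
    });
}

// Hypothetical helper: extracts the evaluated value from a raw response.
function readResult(rawResponse) {
    var response = JSON.parse(rawResponse);
    return response.result.result.value;
}

var request = buildEvaluateMessage("1 + 1");

// Hand-written sample response, assumed for illustration.
var sampleResponse = '{"id":1,"result":{"result":{"type":"number","value":2}}}';

console.log(request);
console.log(readResult(sampleResponse));
```

In Recognizer's case the evaluated expression would presumably read the global object holding the probe values, but the exact expression is not given in the text.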

3.4.3 Code Instrumentation

A custom depth-first traversal algorithm is used to recursively walk and transform the code’s Abstract Syntax Tree (AST). Upon leaving an instrumentable AST node, the algorithm wraps it inside a probe.

6. https://code.google.com/p/v8/
7. http://brackets.io
8. http://www.websocket.org/
9. https://developers.google.com/chrome-developer-tools/docs/debugger-protocol


Probe is a simple unary function that stores a copy of any passed argument and then returns it without modification. The algorithm follows the paths according to its built-in ruleset. The ruleset determines which child nodes are traversable from a certain type of node. For example, in an assignment expression the left side cannot be instrumented, while the right side can be (Figure 3.15).

    totalTasks = completedTasks + remainingTasks;
    left         right

Figure 3.15: Assignment Expression. The left side cannot be instrumented, since logProbe(left) = right is an invalid operation in JavaScript.
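The traversal and its ruleset can be sketched on a hand-written AST for the statement in Figure 3.15. This is a simplified illustration under several assumptions: the AST is typed in by hand in ESTree style rather than produced by a parser, the ruleset and instrumentable tables are invented for this example, the probe's location argument is omitted, and the small print function stands in for a real code generator.

```javascript
// Hand-written ESTree-style AST for:
//   totalTasks = completedTasks + remainingTasks
var ast = {
    type: "AssignmentExpression",
    operator: "=",
    left: { type: "Identifier", name: "totalTasks" },
    right: {
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Identifier", name: "completedTasks" },
        right: { type: "Identifier", name: "remainingTasks" }
    }
};

// Assumed ruleset: which children may be visited from each node type.
var ruleset = {
    AssignmentExpression: ["right"], // the left side must stay an assignment target
    BinaryExpression: ["left", "right"],
    Identifier: []
};

// Assumed set of node types that get wrapped in a probe.
var instrumentable = { Identifier: true, BinaryExpression: true };

// Depth-first walk: children first, then wrap the node when leaving it.
function instrument(node) {
    (ruleset[node.type] || []).forEach(function (key) {
        node[key] = instrument(node[key]);
    });
    if (!instrumentable[node.type]) {
        return node;
    }
    return {
        type: "CallExpression",
        callee: { type: "Identifier", name: "logProbe" },
        arguments: [node]
    };
}

// Minimal printer for the four node types used above.
function print(node) {
    switch (node.type) {
    case "Identifier":
        return node.name;
    case "CallExpression":
        return print(node.callee) + "(" + node.arguments.map(print).join(", ") + ")";
    default: // AssignmentExpression and BinaryExpression
        return print(node.left) + " " + node.operator + " " + print(node.right);
    }
}

var output = print(instrument(ast));
console.log(output);
// totalTasks = logProbe(logProbe(completedTasks) + logProbe(remainingTasks))
```

The left-hand identifier is never visited because the ruleset does not list left for assignment expressions, so the generated code remains a valid assignment.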

The AST is generated by the open-source Esprima parser¹⁰. Esprima offers full support for ECMAScript 5.1 (ECMA-262) and partial support for ECMAScript 6 [23]. The AST includes information about the position of expressions in the original code, which allows us to uniquely identify probes and map the values back to the code editor. Once the AST is modified to include log statements, the code is regenerated from the AST using the Escodegen¹¹ code generator. An example of fully instrumented code is shown in Figure 3.20. Various instrumentation mechanisms are used depending on the type of the instrumented expression.

Identifiers

Identifiers are the simplest expressions to instrument. They can be instrumented using a single probe function (Figure 3.16). The logProbe function takes two arguments: location and value. The value is evaluated, stored inside an object with its location as a key, and subsequently returned (Figure 3.17). Since objects in JavaScript are mutable, the value needs to be deeply copied in order to preserve it as it was passed to the log function. Without a deep copy, the logged value could be mutated before we had the chance to display it in the editor.

10. http://esprima.org/
11. https://github.com/Constellation/escodegen

    var total = completedWithRemaining;

    var total = logProbe([1, 12, 1, 34], completedWithRemaining);

Figure 3.16: Result of instrumenting a variable declaration. The first argument of the logProbe function (Figure 3.17) is the location of the original node in [startLine, startColumn, endLine, endColumn] format.

    function logProbe(location, value) {
        probes[location] = deepCopy(value); // store a snapshot under the node's location
        return value;                       // pass the original value through unchanged
    }

Figure 3.17: logProbe function, simplified for demonstration purposes.
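The need for the deep copy can be demonstrated with a short sketch. The deepCopy below is a stand-in based on a JSON round trip, which is an assumption made for this example and only works for plain data; the thesis does not specify how Recognizer copies values. logProbeByReference is a hypothetical variant added purely for comparison.

```javascript
var probes = {};

// Stand-in deep copy (assumption): a JSON round trip, valid for plain data only.
function deepCopy(value) {
    return value === undefined ? undefined : JSON.parse(JSON.stringify(value));
}

// Probe as in Figure 3.17: stores a snapshot and returns the original value.
function logProbe(location, value) {
    probes[location] = deepCopy(value);
    return value;
}

// Hypothetical variant that stores a reference instead of a snapshot.
function logProbeByReference(location, value) {
    probes[location] = value;
    return value;
}

var todo = { title: "Write thesis", completed: false };
logProbe([1, 0, 1, 4], todo);
logProbeByReference([2, 0, 2, 4], todo);

// The object is mutated after both probes have run.
todo.completed = true;

// Array locations are converted to string keys such as "1,0,1,4".
console.log(probes["1,0,1,4"].completed); // false: the snapshot kept the old state
console.log(probes["2,0,2,4"].completed); // true: the reference sees the mutation
```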

Member Expressions

In the case of a chained expression, we want to inspect every step. In an expression like window.app.Todo, this means inspecting the following expressions:

• window
• window.app
• window.app.Todo

It is important to evaluate window and window.app only once, since evaluating an object inside a Member Expression could have side effects. The solution is to insert probes inside the single expression:

    logProbe(location,
        logProbe(location,
            logProbe(location,
                window
            ).app
        ).Todo
    )
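The effect of nesting the probes can be checked with a getter that counts how often it is read. This is a sketch under assumptions: root is a stand-in for window, the getter and its counter are invented for the example, and string labels are used in place of the location arrays for readability.

```javascript
var log = [];

// Simplified probe: records the label and passes the value through.
function logProbe(location, value) {
    log.push(location);
    return value;
}

// Stand-in for window, with a getter that counts property reads.
var reads = 0;
var root = {};
Object.defineProperty(root, "app", {
    get: function () {
        reads += 1;
        return { Todo: "TodoModel" };
    }
});

// Nested probes: root.app is read exactly once.
var nested = logProbe("root.app.Todo",
    logProbe("root.app",
        logProbe("root", root).app
    ).Todo
);
var readsNested = reads;

// Naive alternative: three separate probes re-evaluate the chain.
reads = 0;
logProbe("root", root);
logProbe("root.app", root.app);
logProbe("root.app.Todo", root.app.Todo);
var readsNaive = reads;

console.log(nested, readsNested, readsNaive); // TodoModel 1 2
```

The nested form triggers the getter once, exactly as the uninstrumented expression would, while the naive form triggers it twice.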

Call Expressions

If we instrumented item.remove() the same way as a Member Expression, it would not have the same meaning. In JavaScript, the this keyword is not lexically scoped. Its value depends on the context from which the function was called (Figure 3.18).

    var car = {start: function () {
        this.started = true;
        return this;
    }};
    var startFn = car.start;

    car.start(); // returns car and sets car.started to true
    startFn();   // returns window and sets window.started to true

Figure 3.18: Behaviour of this keyword in JavaScript [24].

For this reason, car.start() and logProbe(logProbe(car).start)() produce different results, because the logProbe function does not preserve the this value: start() is called with the default this value¹². We need a different approach to instrument Call Expressions. In the following method, we store the original this and apply it later to the instrumented member function.

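    var tmp = car;                   // keep a reference to the receiver
    var fn = tmp.start;              // look up the member function once
    return fn.apply(tmp, arguments); // call it with the original this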

Both the object and the property are then wrapped inside probes separately. To avoid polluting the current function scope with Recognizer’s internal variables, the instrumented code is also enclosed inside an immediately-invoked function expression [25] bound to the this value of the current scope.

12. In JavaScript, the default this value is the global object (window in browsers). In strict mode, this is undefined [24].

    (function () {
        var tmp = logProbe(location, car);
        var fn = logProbe(location, tmp.start);
        return fn.apply(tmp, arguments);
    }.bind(this)()); // bound to the outer this, then invoked immediately
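The difference between the naive wrapping and the this-preserving wrapping can be observed directly. The sketch below is an illustration with assumed names: the probe is reduced to a pass-through, string labels replace location arrays, and start reports whether it was called with car as its this value instead of mutating any state.

```javascript
// Pass-through probe; storing the value is omitted in this sketch.
function logProbe(location, value) {
    return value;
}

// start reports whether it was invoked with car as its this value.
var car = {
    start: function () {
        return this === car;
    }
};

// Original call: this is car.
var direct = car.start();

// Naive instrumentation: the function is detached from car before the call,
// so this is the default value (the global object, or undefined in strict mode).
var naive = logProbe("car.start", logProbe("car", car).start)();

// Instrumentation that stores the receiver and re-applies it.
var preserved = (function () {
    var tmp = logProbe("car", car);
    var fn = logProbe("car.start", tmp.start);
    return fn.apply(tmp, arguments);
}.bind(this)());

console.log(direct, naive, preserved); // true false true
```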

Function Entries

To track the number of function calls and the history of passed arguments, a logEntry function (Figure 3.19) is inserted at the beginning of each function declaration (Figure 3.20).

    function logEntry(location, args) {
        calls.push({
            position: location,        // location of the function declaration
            args: deepCopy(args),      // snapshot of the passed arguments
            argsCount: args.length,
            time: Date.now()           // time of invocation in milliseconds
        });
    }

Figure 3.19: logEntry function, simplified for demonstration purposes.
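A call counter and a call history can be derived from the recorded entries, as in the following sketch. Several parts are assumptions made for illustration: deepCopy is a JSON round trip, the arguments object is converted to a plain array before copying, and callCount and addTodo are hypothetical names that do not come from Recognizer.

```javascript
var calls = [];

// Stand-in deep copy (assumption): a JSON round trip for plain data.
function deepCopy(value) {
    return JSON.parse(JSON.stringify(value));
}

// Entry probe modelled on Figure 3.19; the arguments object is turned into
// a real array first so that it serialises as a list.
function logEntry(location, args) {
    calls.push({
        position: location,
        args: deepCopy(Array.prototype.slice.call(args)),
        argsCount: args.length,
        time: Date.now()
    });
}

// Hypothetical helper: number of recorded calls for one function location.
function callCount(location) {
    var key = String(location);
    return calls.filter(function (call) {
        return String(call.position) === key;
    }).length;
}

// An instrumented function: logEntry is its first statement.
function addTodo(title) {
    logEntry([1, 9, 1, 16], arguments);
    return { title: title, completed: false };
}

addTodo("Buy milk");
addTodo("Write thesis");

console.log(callCount([1, 9, 1, 16])); // 2
console.log(calls.map(function (call) { return call.args[0]; }));
```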

3.4.4 Runtime Tracer Code

Each instrumented file is prepended with a tracer declaration. The tracer is an object which provides declarations of probe functions and stores all logged values. It serves as a contact point between the runtime and the editor and provides an API for Recognizer to retrieve the logged values. Each tracer is uniquely identified to prevent namespace collisions between multiple tracers in different files. The tracer object is accessible through a global variable __recognizer{UNIQUE_ID}.
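The role of the unique identifier can be illustrated with a small sketch. Everything here is assumed for the example: createTracer, getValues, the identifiers a1 and b2, and the fakeGlobal object that stands in for the browser's global object. Deep copying is omitted for brevity.

```javascript
// Hypothetical factory: creates a tracer and registers it under a unique
// global name of the form __recognizer{UNIQUE_ID}.
function createTracer(uniqueId, globalObject) {
    var probes = {};
    var tracer = {
        // Probe function used by the instrumented code of one file.
        logProbe: function (location, value) {
            probes[location] = value;
            return value;
        },
        // Retrieval API for the editor side.
        getValues: function () {
            return probes;
        }
    };
    globalObject["__recognizer" + uniqueId] = tracer;
    return tracer;
}

// Stand-in for the global object (window in a browser).
var fakeGlobal = {};
var tracerA = createTracer("a1", fakeGlobal);
var tracerB = createTracer("b2", fakeGlobal);

// Two files log a value at the same location without overwriting each other.
tracerA.logProbe([1, 0, 1, 4], "from file A");
tracerB.logProbe([1, 0, 1, 4], "from file B");

console.log(fakeGlobal.__recognizera1.getValues()["1,0,1,4"]);
console.log(fakeGlobal.__recognizerb2.getValues()["1,0,1,4"]);
```

Since locations are only unique within a file, keeping one tracer per file avoids clashes between probes that share the same line and column numbers.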


    function square(val) {
        var square = Math.pow(val, 2);
        return square;
    }

    function square(val) {
        logEntry([1, 9, 1, 15], arguments);

        var square = function () {
            var obj = logProbe([2, 17, 2, 21], Math),
                fn = logProbe([2, 22, 2, 25], obj.pow);
            return fn.apply(obj, arguments);
        }.bind(this)(logProbe([2, 26, 2, 29], val), 2);

        return logProbe([3, 11, 3, 17], square);
    }

Figure 3.20: Example of original and instrumented code, simplified for demonstration purposes.

3.4.5 Serving Instrumented Files

To achieve the transparency of using instrumented code, all files are served to the browser through a Recognizer server. A custom server eliminates the need to request an instrumented file manually by the programmer, usually by rewriting an HTML
