Visualizing Program-Execution Data for Deployed Software∗

Gammatella: Visualizing Program-Execution Data for Deployed Software∗ James A. Jones, Alessandro Orso, and Mary Jean Harrold College of Computing Georgia Institute of Technology {jjones,orso,harrold}@cc.gatech.edu Abstract 1 Introduction Software systems are often released with missing Developing reliable, safe, and efficient software is dif- functionality, errors, or incompatibilities that may ficult. Quality assurance tasks, such as testing, anal- result in failures in the field, inferior performances, ysis, and performance optimization, are often con- or, more generally, user dissatisfaction. In previous strained because of time-to-market pressures and be- work, some of the authors presented the Gamma ap- cause products must function in a number of vari- proach, whose goal is to improve software quality by able configurations. Consequently, released software augmenting software-engineering tasks with dynamic products may exhibit missing functionality, errors, in- information collected from deployed software. The compatibility with the running environment, security Gamma approach enables analyses that (1) rely on holes, or inferior performance and usability. actual field data instead of synthetic in-house data Many of these problems arise only when the soft- and (2) leverage the vast and heterogeneous resources ware runs in the users’ environments and cannot be of an entire user community instead of limited, and easily detected or investigated in-house. We believe, often homogeneous, in-house resources. and our previous research suggests, that software de- When monitoring a large number of deployed in- velopment can greatly benefit from augmenting anal- stances of a software product, however, a significant ysis and measurement tasks performed in-house with amount of data is collected. Such raw data are use- program-execution data—information, such as cover- less in the absence of suitable data-mining and vi- age data, exception-related information, and perfor- sualization techniques that support exploration and mance related data that is collected from deployed understanding of the data. In this paper, we present software while it is used in the field. We call this a new technique for collecting, storing, and visualiz- approach the Gamma approach.1 The Gamma ap- ing program-execution data gathered from deployed proach aims to improve software quality through con- instances of a software product. We also present a tinuous monitoring and analysis of software after de- prototype toolset, Gammatella, that implements ployment [6, 15, 17]. the technique. Finally, we show how the visualiza- Monitoring of a high number of deployed instances tion capabilities of Gammatella facilitate effective of a software product, however, can produce a huge investigation of several kinds of execution-related in- amount of program-execution data. For example, in formation in an interactive fashion, and discuss our a preliminary study that involved only a few users initial experience with a semi-public display of Gam- and only one system, we collected more than 1,000 matella. program-execution data in less than a month. If we multiply that number by a realistic number of users for an average system, it is easy to see that the quan- Keywords tities involved are on the order of millions of program- execution data per system. Gamma technology, software visualization, remote Furthermore, when collecting more and more kinds monitoring of program-execution data from the field, not only ∗An earlier version of the material presented in this paper 1We call the approach Gamma because it can be seen as appeared in the Proceedings of the ACM Symposium on Soft- the next phase after α-testing, performed in-house, and beta- ware Visualization (June, 2003) [16]. testing, performed in the field by a set of selected users. 1 does the size of the data grow, but also their com- tation and data-collection capabilities; and plexity: different kinds of data may have intricate • a case study that shows the feasibility of the ap- relationships and dependences that require such data proach. to be analyzed together to be understood. Such a huge and complex amount of data can- In the next section, we describe the new visual- not be analyzed manually. To be able to extract ization approach that we use to represent program- meaningful information about program behavior from execution data. Section 3 describes the components the raw data and exploit their potential, we need of the Gammatella toolset and discusses our setup suitable data-mining and visualization techniques. for and initial experience with a semi-public display of In particular, visualization techniques can be effec- Gammatella. Section 4 presents two applications of tive in transforming program-execution data into vi- our approach and a feasibility study performed using sual information that can be explored and under- the tool. Section 5 discusses related work. Finally, stood [2, 9, 20, 22, 24]. Section 6 presents some conclusions and discusses fu- In this paper, we present a new visualization ap- ture work. proach that can efficiently represent different kinds of program-execution data and that facilitates investigation of the data to study the behavior of pro- 2 Visualization Technique grams in the field. The approach is defined for a con- text in which a number of instances of a program In this section, we describe the visualization approach are continuously monitored, and has the following that we defined to enable continuous monitoring and characteristics: (1) it provides a hierarchical view exploration of program-execution data collected from of the code, so that the user can navigate the pro- deployed software. gram at different levels of detail while studying the One goal of our work is to provide an interface program-execution data; (2) it is flexible in the kind that can scale to large programs and that can handle of program-execution data it can show for each exe- a number of executions by many users. To achieve cution; and (3) it accounts for a dynamic, constantly this goal, we defined a visualization approach that increasing, and possibly very large number of execu- provides: tions through the use of filters and summarizers. We also present a prototype toolset, Gam- • representation of software systems at different matella, that implements the visualization ap- levels of detail; proach and provides capabilities for instrumenting • use of coloring to represent program-execution the code, collecting program-execution data from the data; field, and storing and retrieving the data locally. • explicit representation and visualization of Finally, we report two possible applications of program-execution data about each execution our visualization technique—exception and profiling together with its properties; and analyses—and a feasibility study. In the study, we • capabilities for filtering and summarizing the Gammatella use , displayed on a semi-public dis- program-execution data in an interactive way. play in our lab, to collect, store, visualize, and investigate program-execution data gathered from in- To minimize the interaction required by a user to stances of a real software system distributed to a set see all dimensions of the display, we chose to focus of users. The study shows how the visualization ca- Gammatella our attention on two-dimensional visualization tech- pabilities of let us effectively investi- niques rather than three-dimensional techniques. For gate several kinds of program-execution data in an example, in our approach, the user is not required to interactive fashion and get meaningful insights into rotate the display to reveal obscured features. Ob- the monitored program’s behavior. Our initial expe- Gammatella viously, a height dimension could be added to our rience with the semi-public display of visualization to let more variables be displayed. is also discussed. The main contributions of the paper are: 2.1 Representation Levels • a new visualization approach that facilitates visualization of various kinds of program-execution To investigate the program-execution data efficiently, data and interactive study of a program’s behav- we must be able to view the data at different levels ior; of detail. In our visualization approach, we represent • a toolset, Gammatella, that implements the software systems at three different levels: statement visualization approach and provides instrumen- level, file level, and system level. 2 Statement level. The lowest level of representa- are especially effective in letting users spot unusual tion in our visualization is the statement level. At patterns in the represented data. Moreover, treemaps this level, the visualization represents source code, can efficiently encode information for large-sized pro- and each line of code is suitably colored (in cases grams: in a 1280x1024 display, treemaps can visual- where the information being represented does involve ize, on average, programs of more than 4,000 files [21]. coloring). Figure 1 shows an example of a colored set of statements in this view. For example, if we are In our development of the system-level view, we visualizing statement-coverage information, each line considered other visualization techniques such as of code is colored based on whether it was covered Stasko and Zhang’s Sunburst visualization [23] or by the considered execution(s) (e.g., gray if not cov- Lamping, Rao, and Pirolli’s Hyberbolic Tree visual- ered and green if covered). The statement level is the ization [12]. These techniques, however, focus more level at which users can get the most detail about on the hierarchical structure of the information they the code. However, directly viewing the code is not represent, and use a considerable amount of screen efficient for programs of non-trivial size. To allevi- space to represent such structure. For our applica- ate this problem, our visualization approach provides tion, the hierarchical structure of the program mod- representations at higher levels of abstraction. ules is less important than representing as much information as possible at each level of the hierarchy. File level.

Visualizing Program-Execution Data for Deployed Software∗

SIGSOFT CFP-ICSE04.Qxd

Improving Web Services Maintenance Through Regression Testing Divya Rohatgi, Prof

FINAL PROGRAM University of California, Irvine [email protected] Program Chair Matthew B

Curriculum Vitae James A

Automated Test Cases Prioritization for Self-Driving Cars in Virtual

Mary Lou Soffa: Curriculum Vitae

Dissertation Towards Model-Based Regression

CRA-W Newletter October 2008 Final

Source-Code Level Regression Test Selection: the Model-Driven Way

Empirical Studies of a Safe Regression Test Selection Technique

ICSE 2001 Advance Program

Trade-Offs in Continuous Integration:Assurance, Security, And