Software Analytics as a Learning Case in Practice: Approaches and Experiences

Dongmei Zhang¹, Yingnong Dang¹, Jian-Guang Lou¹, Shi Han¹, Haidong Zhang¹, Tao Xie²
¹Microsoft Research Asia, Beijing, China
²North Carolina State University, Raleigh, NC, USA
{dongmeiz, yidang, jlou, shihan, haizhang}@microsoft.com, [email protected]

ABSTRACT
Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. In this position paper, we advocate that when applying analytic technologies in the practice of software analytics, one should (1) incorporate a broad spectrum of domain knowledge and expertise, e.g., management, machine learning, large-scale data processing and computing, and information visualization; and (2) investigate how practitioners take actions on the produced information, and provide effective support for such information-based action taking. Our position is based on our experiences of successful technology transfer on software analytics at Microsoft Research Asia.

Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging—Debugging aids, diagnostics; D.2.9 [Software Engineering]: Management—Productivity, Software quality assurance (SQA)

General Terms
Experimentation, Management, Measurement

Keywords
Software analytics, machine learning, technology transfer

1. INTRODUCTION
A huge wealth of various data exists in the software development process, and hidden in the data is information about the quality of software and services as well as the dynamics of software development. With various analytic technologies (e.g., data mining, machine learning, and information visualization), software analytics is to enable software practitioners¹ to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services².

Insightful information is information that conveys meaningful and useful understanding or knowledge towards performing the target task. Typically, insightful information is not easily attainable by directly investigating the raw data without the aid of analytic technologies. Actionable information is information upon which software practitioners can come up with concrete solutions (better than existing solutions, if any) towards completing the target task.

Developing a software analytic project typically goes through iterations of a life cycle of four phases: task definition, data preparation, analytic-technology development, and deployment and feedback gathering. Task definition is to define the target task to be assisted by software analytics. Data preparation is to collect the data to be analyzed. Analytic-technology development is to develop problem formulations, algorithms, and systems to explore, understand, and get insights from the data. Deployment and feedback gathering involves two typical scenarios. One is that, as researchers, we have obtained some insightful information from the data and we would like to ask domain experts to review and verify it. The other is that we ask domain experts to use the analytic tools that we have developed to obtain insights by themselves. Most of the time, it is the second scenario that we want to enable.

Among various analytic technologies, machine learning is a well-recognized technology for learning hidden patterns or predictive models from data, and it plays an important role in software analytics. In this position paper, we argue that when applying analytic technologies in the practice of software analytics, one should

• incorporate a broad spectrum of domain knowledge and expertise, e.g., management, machine learning, large-scale data processing and computing, and information visualization;

• investigate how practitioners take actions on the produced insightful and actionable information, and provide effective support for such information-based action taking.

Our position is based on a number of software analytic projects that have been conducted at the Software Analytics (SA) group³ at Microsoft Research Asia (MSRA) in recent years (in the rest of this paper, we refer to the members of these software analytic projects at MSRA as the SA project teams). These software analytic projects have undergone successful technology transfer within Microsoft, enabling informed decision making and improving the quality of software and services. We expect that our position will provide useful guidelines for other researchers and practitioners in successfully carrying out software analytics research for technology transfer.

¹ Software practitioners typically include software developers, testers, usability engineers, and managers, among others.
² We coin the term software analytics to expand the scope of previous work [1, 6] on analytics for software development and previous work on software intelligence [5].
³ http://research.microsoft.com/groups/sa/

2. PROJECT OBJECTIVES
The main objectives of the MSRA software analytic projects are to advance the state of the art in software analytics and to help improve the quality of Microsoft products as well as the productivity of Microsoft product development. The target customers (in short, customers) are primarily practitioners from Microsoft product teams. Such emphasis also benefits the broader research and industrial communities, because Microsoft product development is an important representative of modern software development along dimensions such as team scale, software complexity, and software types. Some of the MSRA project outcomes could also eventually reach broader communities through their integration with Microsoft development tools. For example, the code-clone detection tool [2] resulting from the first example project discussed in Section 4 has been integrated into Microsoft Visual Studio vNext, benefiting broader communities.

3. CHALLENGES
There are two main categories of challenges to overcome in order to achieve the stated objectives. The first category is rooted in the characteristics of the data being analyzed with analytic technologies.

Data scale. Typical data in software analytics is of large scale, e.g., due to the large scale of the software being developed and the large size of software development teams. Some tasks require analyzing division-wide or even company-wide code bases, which are far beyond the scope of a single code base (e.g., when conducting code-clone detection [2]). Some tasks require analyzing a large quantity of (likely noisy) data samples within or beyond a single code base (e.g., when conducting runtime-trace analysis [3]). Although a lack of data samples may not be an issue in this context of machine learning, the large scale of data poses challenges for data processing and analysis, including learning-algorithm design and system building.

Data complexity. Typical data in software analytics is of high complexity, which is partly due to the high complexity of the software being developed. For example, runtime traces from distributed systems [3] need to be correlated, while traces from multiple threads [7] need to be split. System logs [3] include unstructured textual information. There could be high dependencies across traces and noise among traces. In addition, real-world usage data produced by in-field operations offers substantial opportunities for various tasks such as debugging (e.g., those assisted by the Microsoft Error Reporting system [4]). Beyond its high complexity, such data is typically distributed and often partial (e.g., collected with sampling-based techniques to reduce runtime overhead). All these characteristics pose challenges for analytic technologies such as machine learning.
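To make the kind of trace manipulation mentioned above concrete, the sketch below splits an interleaved runtime trace into per-thread sequences and correlates events across machines by a shared request identifier. It is only an illustrative sketch: the event fields (machine, thread_id, request_id) are assumptions made for the example, not the format of any Microsoft trace infrastructure.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple

class TraceEvent(NamedTuple):
    timestamp: float   # seconds since trace start
    machine: str       # host that emitted the event
    thread_id: int     # OS thread that emitted the event
    request_id: str    # hypothetical correlation key carried across machines
    message: str       # raw, possibly unstructured log text

def split_by_thread(events: Iterable[TraceEvent]) -> dict:
    """Split an interleaved trace into one time-ordered event list per (machine, thread)."""
    per_thread = defaultdict(list)
    for e in events:
        per_thread[(e.machine, e.thread_id)].append(e)
    for seq in per_thread.values():
        seq.sort(key=lambda e: e.timestamp)
    return per_thread

def correlate_by_request(events: Iterable[TraceEvent]) -> dict:
    """Group events from distributed components by a shared request identifier."""
    per_request = defaultdict(list)
    for e in events:
        per_request[e.request_id].append(e)
    for seq in per_request.values():
        seq.sort(key=lambda e: (e.timestamp, e.machine))
    return per_request
```

Real traces add further complications that such direct grouping does not address, e.g., clock skew across machines, missing correlation keys, and sampling gaps.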

The second category is rooted in the characteristics of the tasks being assisted by software analytics.

Focus on ultimate tasks being assisted. Among the tasks assisted by software analytics, some are intermediate tasks and some are ultimate tasks. Usually, intermediate tasks produce information toward solving ultimate tasks. For example, code-clone detection is considered an intermediate task, which produces information towards refactoring and defect detection, which are ultimate tasks. Such focus on ultimate tasks requires the mandatory inclusion of the phase of deployment and feedback gathering in the life cycle of a software analytic project. Unlike most previous research on code-clone detection, we should not stop at measuring the precision and recall of detected clones; rather, we should push further so that the detected clones can effectively help address ultimate tasks such as refactoring and defect detection, and we should measure such benefits in evaluations.

Engagement of customers during the development process of a software analytic project. It is well recognized that engaging customers is a challenging task, especially in the context of software engineering tools. Customers may resist proposed changes (due to analytic-tool adoption) to their existing way of carrying out a task. In addition, due to tight development schedules, they may not be able to invest time in understanding the best and worst scenarios for applying an analytic tool. However, developing a software analytic project typically needs the engagement of customers in iterations of the four phases of the project life cycle, e.g., to gain a better understanding of the tasks and domain knowledge. Among the phases, especially the phase of deployment and feedback gathering, it is crucial for the produced analytic tools to have good usability, e.g., providing effective visualization and manipulation of analysis results.

4. EXPERIENCES
We next discuss our experiences in the different phases of developing two example MSRA analytic tools that have been successfully adopted by various Microsoft product teams. XIAO [2] is a tool for detecting code clones in source-code bases, targeting tasks such as refactoring and defect detection. StackMine is a tool for learning performance bottlenecks from call-stack traces collected from real-world usage, targeting tasks such as performance analysis.

4.1 Task Definition
Task definition is to define the target task to be assisted by software analytics. There are two models (or their mixture) of initiating a software analytic project at MSRA: the pull model and the push model.

StackMine primarily follows the pull model. Before the StackMine project was started, during one of the meetings with the SA team, a member of a Microsoft product team talked about their state of practice in inspecting a single stream of stack traces⁴ for performance analysis, as well as the challenges that they were facing in inspecting a large number of trace streams. The SA project team then started the StackMine project to address the (most urgent) need of the target customers. Typically, projects following this pull model may more easily fit into the workflow of the target product team, facilitating the integration of the analytic tools into the product team's activities.

⁴ One stream of stack traces corresponds to one usage scenario that exhibits performance issues.

XIAO primarily follows the push model. Based on the research literature and initial investigations of some Microsoft code bases, the SA project team gained insights and realized the existence of defects related to code clones, especially near-miss clones, but did not know their extent. The SA project team then developed the XIAO prototype and demonstrated it to various Microsoft product teams to "sell" the solution to them. Through iterations of interactions with product teams, the SA project team concretized the details of the target tasks of refactoring and defect detection.

4.2 Data Preparation
Data preparation is to collect the data to be analyzed. For data preparation, there are two types of infrastructure support: existing ones in industry and in-house ones.

The data preparation of StackMine primarily relies on existing Microsoft infrastructure support. In particular, StackMine relies on Event Tracing for Windows (ETW), which has been included in Microsoft Windows 2000 and later. Often, existing infrastructures are designed to collect data for various purposes. If these infrastructures provide insufficient support for a specific analytic task, it may be difficult, or take a relatively long time, for the infrastructure development team to accommodate feature requests.

The data preparation of XIAO primarily relies on an in-house code-analysis front end. For in-house infrastructure support, since the infrastructure development is under the control of the SA project team, it is relatively easy to improve the infrastructure support to satisfy the needs of software analytic projects. For example, parsers based on abstract syntax trees (ASTs) were initially considered for XIAO's analysis front end. In the end, to parse source code from heterogeneous compilation environments, a token-based parser was developed as XIAO's analysis front end.
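To illustrate what a token-based front end buys over exact text matching, here is a minimal sketch; it is not XIAO's actual implementation, and the token classes, the Jaccard-style similarity, and the default threshold are all simplifying assumptions. Normalizing identifiers and literals lets fragments that differ only in naming still be reported as near-miss clones.

```python
import re

# Very rough token classes; a production front end would use a real lexer.
_TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\"[^\"]*\"|\S")
_KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}

def tokenize(line: str) -> list:
    """Turn a source line into normalized tokens: keywords kept, identifiers -> ID, numbers/strings -> LIT."""
    tokens = []
    for tok in _TOKEN_RE.findall(line):
        if tok in _KEYWORDS:
            tokens.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            tokens.append("ID")          # abstract away identifier names
        elif tok[0].isdigit() or tok.startswith('"'):
            tokens.append("LIT")         # abstract away literal values
        else:
            tokens.append(tok)           # operators and punctuation kept verbatim
    return tokens

def similarity(frag_a: list, frag_b: list) -> float:
    """Jaccard similarity over the normalized token sets of two code fragments (lists of lines)."""
    a = {t for line in frag_a for t in tokenize(line)}
    b = {t for line in frag_b for t in tokenize(line)}
    return len(a & b) / len(a | b) if a | b else 0.0

def is_clone_pair(frag_a: list, frag_b: list, threshold: float = 0.8) -> bool:
    """Report a (near-miss) clone when similarity reaches the task-specific threshold."""
    return similarity(frag_a, frag_b) >= threshold
```

The threshold parameter anticipates the task-specific tuning discussed in Section 4.4: lower values favor recall for defect detection, while higher values limit inspection effort on large code bases.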
4.3 Analytic-Technology Development
Analytic-technology development is to develop problem formulations, algorithms, and systems to explore, understand, and get insights from the data. Due to the large scale of the data being analyzed, analytic technologies such as machine learning techniques need to be scalable. Realizing scalability involves both the design and the implementation of analytic technologies.

The SA project team needs to acquire deep knowledge about the data (including its format and semantics), and this acquisition process is often non-trivial. For example, for StackMine, the SA project team needs to learn the format and semantics of ETW traces by reading the ETW documentation and trace annotations as well as by consulting with the customers.

The SA project team needs to acquire a good understanding of the target tasks. For example, for StackMine, it is important to understand how performance analysts (the customers) currently conduct performance analysis on individual ETW traces. Such understanding helps define what analysis results the technologies need to provide, and acquiring it requires intensive interactions with performance analysts. For XIAO, it is important to understand what types of clones would be better for a specific target task. For example, for refactoring, exact-match clones are preferable over near-miss clones, while for defect detection, near-miss clones are preferable over exact-match clones.

The SA project team needs to acquire domain or task knowledge from the customers, and this acquisition process is often challenging. One main reason is that the customers may not be able to effectively identify, abstract, or articulate the domain or task knowledge required by the SA project team. The customers understand and use such knowledge in their daily work, and they can judge given concrete cases, but they have difficulty articulating the knowledge to others as general rules. For example, for StackMine, it is important for the SA project team to realize that some function calls, such as the system call SendMessage, are typically expensive and need to be filtered out of the analysis results to better assist performance analysis. For XIAO, it is important for the SA project team to realize that clones occurring in debugging statements, such as log-printing and assertion statements, are typically not useful for the target tasks; XIAO needs built-in filtering mechanisms to filter them out of the analysis results to better assist refactoring or defect detection.
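The following sketch illustrates the kind of built-in filtering described above. The block lists (SendMessage as an expensive system call, logging/assert markers for debugging statements) are illustrative assumptions rather than the actual filter lists used in StackMine or XIAO; in practice such lists would be customer-editable configuration (see Section 4.4).

```python
# Hypothetical filter lists; real deployments would load these from customer-editable configuration.
EXPENSIVE_SYSTEM_CALLS = {"SendMessage", "WaitForSingleObject"}
DEBUG_STATEMENT_MARKERS = ("printf", "log", "assert")

def filter_stack_pattern(pattern: list) -> list:
    """Drop well-known expensive system calls from a mined function-call pattern
    so that they do not dominate the reported performance bottlenecks."""
    return [frame for frame in pattern if frame not in EXPENSIVE_SYSTEM_CALLS]

def is_debug_only_clone(clone_lines: list) -> bool:
    """Flag a detected clone whose lines are all debugging statements (logging/asserts),
    which is rarely useful for refactoring or defect detection."""
    def is_debug_line(line: str) -> bool:
        lowered = line.strip().lower()
        return any(marker in lowered for marker in DEBUG_STATEMENT_MARKERS)
    return bool(clone_lines) and all(is_debug_line(line) for line in clone_lines)

def filter_clone_report(clones: list) -> list:
    """Remove debug-only clones (each given as a list of source lines) before reporting."""
    return [clone for clone in clones if not is_debug_only_clone(clone)]
```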
Due to the complexity of the target tasks or the scale of the data being analyzed, there often are no off-the-shelf learning algorithms or implementations that work well with the scale of the data or the target tasks. In contrast to academic projects on software analytics (which often do not include effort for technology transfer), SA project teams may need to develop new learning algorithms or implementations. For example, for StackMine, no single existing learning algorithm is capable of providing the desired analysis results for the target tasks, so the SA project team composed a frequent-sequence-mining algorithm and a clustering algorithm. Even for each of these two algorithms, no existing implementations were available to handle the scale of the data being analyzed, and the SA project team implemented both algorithms with the desired scalability. For XIAO, no existing algorithm is sufficient for the target tasks, and the SA project team designed and implemented home-grown matching algorithms.
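The sketch below shows the shape of such a composition: mine frequent call subsequences across many stack traces, then cluster similar patterns so that analysts inspect one group rather than thousands of raw traces. It is a deliberately small stand-in, assuming contiguous fixed-length subsequences and greedy Jaccard-based clustering; it does not reproduce StackMine's actual algorithms or their scalable implementations.

```python
from collections import Counter

def mine_frequent_subsequences(stacks: list, min_support: int, length: int = 3) -> list:
    """Count contiguous call subsequences of a fixed length across stack traces
    and keep those appearing in at least min_support traces."""
    support = Counter()
    for stack in stacks:
        seen = {tuple(stack[i:i + length]) for i in range(len(stack) - length + 1)}
        support.update(seen)                      # count each pattern at most once per trace
    return [list(p) for p, c in support.items() if c >= min_support]

def cluster_patterns(patterns: list, min_similarity: float = 0.5) -> list:
    """Greedy single-pass clustering: a pattern joins the first cluster whose
    representative shares enough calls with it (Jaccard similarity), else starts a new cluster."""
    clusters = []  # each cluster: {"rep": set_of_calls, "members": [patterns]}
    for pattern in patterns:
        calls = set(pattern)
        for cluster in clusters:
            overlap = len(calls & cluster["rep"]) / len(calls | cluster["rep"])
            if overlap >= min_similarity:
                cluster["members"].append(pattern)
                break
        else:
            clusters.append({"rep": calls, "members": [pattern]})
    return clusters

# Usage: stacks are lists of function names from sampled call-stack traces (illustrative names).
stacks = [["main", "Render", "SendMessage", "WaitForReply"],
          ["main", "Render", "SendMessage", "WaitForReply", "Paint"],
          ["main", "LoadFile", "ParseXml", "Validate"]]
patterns = mine_frequent_subsequences(stacks, min_support=2)
print(cluster_patterns(patterns))
```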
4.4 Deployment and Feedback Gathering
Deployment and feedback gathering involves two typical scenarios. One is that, as researchers, we have obtained some insightful information from the data and we would like to ask domain experts to review and verify it. The other is that we ask domain experts to use the analytic tools that we have developed to obtain insights by themselves. Most of the time, it is the second scenario that we want to enable. These scenarios in the phase of deployment and feedback gathering require that the deployed tools have good usability as well as good data-presentation and interaction mechanisms, powered by information visualization techniques.

As an integrated part of the deployed tools, the SA project team needs to design a mechanism, along with its user interface, that allows customers to integrate their domain knowledge into the tool or to customize the tool based on their specific needs. In addition, the SA project team sometimes needs to educate customers that doing so may be necessary to achieve satisfactory task results (otherwise, some customers may over-optimistically expect the use of the tools to be just one mouse click). For example, the StackMine user interface allows customers to specify filtering scopes for traces, frequent function-call sequences, or sequence clusters. The XIAO user interface allows customers to set different similarity-threshold values for matching depending on their target tasks: lower threshold values for security-defect detection, to reduce the chance of missing important defects, and higher threshold values for larger code bases or less allocated inspection time, to reduce the required clone-inspection effort.

The SA project team also needs to design tool user interfaces that allow customers to incrementally integrate their knowledge into the tools as they use them. In other words, the more the customers use the tools, the "smarter" the tools become. For example, the StackMine user interface allows customers to turn "good" learned frequent function-call patterns into a knowledge base of bottleneck signatures so that new traces can be matched against these signatures; when the matching is successful, interactions with the analysis results on new traces can be avoided, saving inspection cost. The XIAO user interface allows customers to collaboratively tag the analysis results of a code base (e.g., as uninteresting, problematic inconsistencies, or refactoring opportunities) so that interactions with the analysis results for later versions of the code base can be reduced, again saving inspection cost.
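A minimal sketch of this "the more you use it, the smarter it gets" loop for the StackMine scenario, under the assumption that a bottleneck signature is simply an analyst-confirmed set of function names; the class, the coverage-based matching rule, and the threshold are illustrative, not StackMine's actual design.

```python
class SignatureKnowledgeBase:
    """Stores analyst-confirmed bottleneck signatures and pre-labels new traces that match them."""

    def __init__(self, match_threshold: float = 0.8):
        self.signatures = {}              # signature name -> frozenset of function names
        self.match_threshold = match_threshold

    def confirm_pattern(self, name: str, pattern: list) -> None:
        """Called when an analyst tags a mined function-call pattern as a real bottleneck."""
        self.signatures[name] = frozenset(pattern)

    def match(self, stack: list):
        """Return the name of a known signature covered by this stack, or None if it needs manual review."""
        frames = set(stack)
        for name, signature in self.signatures.items():
            if len(signature & frames) / len(signature) >= self.match_threshold:
                return name
        return None

# Illustrative usage with hypothetical function names.
kb = SignatureKnowledgeBase()
kb.confirm_pattern("ui-blocking-send", ["Render", "SendMessage", "WaitForReply"])
print(kb.match(["main", "Render", "SendMessage", "WaitForReply", "Paint"]))  # -> "ui-blocking-send"
print(kb.match(["main", "LoadFile", "ParseXml"]))                            # -> None (inspect manually)
```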
4.5 Domain Knowledge and Expertise
Crosscutting the experiences gained in the four phases of developing a software analytic project, the most important lesson is that various kinds of domain knowledge and expertise are strongly needed to successfully develop a software analytic project for technology transfer. Besides collaboration between researchers and customers, virtual SA project teams can be formed via open and extensive collaboration among researchers in domains such as machine learning, visualization, systems, and software engineering.

Some major types of domain knowledge are listed below.

Specific application domain knowledge. This type of knowledge is typically specific to the software application under analysis; it is difficult for the SA project team to pre-hardcode such knowledge into the analytic tools. Therefore, the customers are the ones to acquire such knowledge, and the SA project team needs to design tools that allow the customers to integrate such knowledge into the tools at tool-usage time (see Section 4.4).

Common application domain knowledge. This type of knowledge is typically common across a family of software applications under analysis (e.g., applications using system APIs from kernel32.dll). Therefore, the SA project team can pre-hardcode such knowledge into the analytic tools. As discussed in Section 4.3, transferring such knowledge from the customers to the SA project team can be challenging.

Data domain knowledge. The SA project team needs to acquire such knowledge to develop analytic tools (e.g., in the phase of data preparation). For some tasks, the customers may also need to acquire such knowledge. For example, when using the StackMine tool, the customers need to inspect raw stack traces guided by StackMine.

Some major types of expertise are listed below.

Task expertise. The customers need the expertise to carry out the target tasks, with the assistance of the analytic tools developed by the SA project team. The SA project team also needs to work with the customers to learn their workflow in carrying out the target tasks, in order to develop analytic tools that can effectively help the customers perform the workflow or even improve it.

Management expertise. The SA project team needs members with good management and communication skills to interact with the customers and manage the team. Successful technology transfer relies heavily on gaining the sustained trust of the customers in the SA project team, soliciting requirements from the customers, and managing software analytic projects to fulfill those requirements.

Machine learning expertise. The SA project team needs the expertise to develop machine learning algorithms and tools, and needs a good understanding of existing machine learning algorithms and their implementations (not just in a black-box way).

Large-scale data processing/computing expertise. The SA project team needs the expertise to design and implement scalable data-processing tools and learning tools. Such expertise is coupled with system-building expertise.

Information visualization expertise. The SA project team needs the expertise to design and implement good user interfaces and visualizations for presenting analysis results and for allowing the customers to manipulate the final analysis results as well as the raw data or intermediate results produced by the analytic tools (see Section 4.4).

5. CONCLUSION
Based on our experiences of successful technology transfer on software analytics at Microsoft Research Asia, in this position paper we have advocated that when applying analytic technologies in the practice of software analytics, one should (1) incorporate a broad spectrum of domain knowledge and expertise, e.g., management, machine learning, large-scale data processing and computing, and information visualization; and (2) investigate how practitioners take actions on the produced insightful and actionable information, and provide effective support for such information-based action taking.

Recently, Zeller et al. [8] discussed some pitfalls in conducting research on empirical software engineering. One of their suggestions, quoted below, supports the second part of what we have advocated when developing a software analytic project: "Get real. ... Far too frequently though... do we rely on data results alone and declare improvements on benchmarks as 'successes'. What is missing is grounding in practice: What do developers think about your result? Is it applicable in their context? How much would it help them in their daily work?" [8] In this position paper, we have provided successful concrete examples of how to "get real" in the domain of software analytics.

Acknowledgment
We thank the members of the Software Analytics group at Microsoft Research Asia for contributing to the example projects described in this position paper and for helping shape the position. We thank Ira Baxter, Judith Bishop, Prem Devanbu, Harald Gall, Mark Harman, David Notkin, Sriram Rajamani, Wolfram Schulte, David Weiss, and Minghui Zhou for their feedback on an earlier version of this paper.
6. REFERENCES
[1] R. P. Buse and T. Zimmermann. Analytics for software development. In Proc. FSE/SDP Workshop on Future of Software Engineering Research (FoSER 2010), pages 77–80, 2010.
[2] Y. Dang, S. Ge, R. Huang, and D. Zhang. Code clone detection experience at Microsoft. In Proc. 5th International Workshop on Software Clones (IWSC 2011), pages 63–64, 2011.
[3] Q. Fu, J.-G. Lou, Y. Wang, and J. Li. Execution anomaly detection in distributed systems through unstructured log analysis. In Proc. 9th IEEE International Conference on Data Mining (ICDM 2009), pages 149–158, 2009.
[4] K. Glerum, K. Kinshumann, S. Greenberg, G. Aul, V. Orgovan, G. Nichols, D. Grant, G. Loihle, and G. Hunt. Debugging in the (very) large: ten years of implementation and experience. In Proc. 22nd ACM SIGOPS Symposium on Operating Systems Principles (SOSP 2009), pages 103–116, 2009.
[5] A. E. Hassan and T. Xie. Software intelligence: Future of mining software engineering data. In Proc. FSE/SDP Workshop on Future of Software Engineering Research (FoSER 2010), pages 161–166, 2010.
[6] K. Hullett, N. Nagappan, E. Schuh, and J. Hopson. Data analytics for game development: NIER track. In Proc. 33rd International Conference on Software Engineering (ICSE 2011), pages 940–943, 2011.
[7] J.-G. Lou, Q. Fu, S. Yang, J. Li, and B. Wu. Mining program workflow from interleaved traces. In Proc. 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2010), pages 613–622, 2010.
[8] A. Zeller, T. Zimmermann, and C. Bird. Failure is a four-letter word: a parody in empirical research. In Proc. 7th International Conference on Predictive Models in Software Engineering (PROMISE 2011), pages 5:1–5:7, 2011.