FOCUS: SOFTWARE ENGINEERING’S 50TH ANNIVERSARY

Software Analytics: What’s Next?

Tim Menzies, North Carolina State University
Thomas Zimmermann, Microsoft Research

// Developers sometimes develop their own ideas about “good” and “bad” software, on the basis of just a few projects. Using software analytics, we can correct those misconceptions. //

Software development is a complex process. Human developers may not always understand all the factors that influence their projects. Software analytics is an excellent choice for discovering, verifying, and monitoring the factors that affect software development.

Software analytics distills large amounts of low-value data into small chunks of very high-value information.1 Those chunks can reveal what factors matter the most for software projects. For example, Figure 1 lists some of the more prominent recent insights learned in this way.

Software analytics lets us “trust, but verify” human intuitions. If someone claims that “this or that” is important for a successful software project, analytics lets us treat that claim as something to be verified (rather than a sacred law that cannot be questioned). Also, once the claim is verified, analytics can act as a monitor to continually check whether “this or that” is now overcome by subsequent developments.

Anthropologists studying software projects warn that developers usually develop their personal ideas about good and bad software on the basis of just a few past projects.2 All too often, these ideas are assumed to apply to all projects, rather than just the few seen lately. This can lead to too much reuse of too many old, and now outdated, ideas. A recent study of 564 software developers found that

a) programmers do indeed have very strong beliefs on certain topics;
b) their beliefs are primarily formed based on personal experience, rather than on findings in empirical research; and
c) beliefs can vary with each project, but do not necessarily correspond with actual evidence in that project.3

The good news is that, using software analytics, we can correct those misconceptions (for examples, see the sidebar “Why We Need Software Analytics”). For example, after a project is examined, certain coding styles could be seen as more bug prone (and to be avoided).

There are many ways to distill that data, including quantitative methods (which often use data-mining algorithms) and qualitative methods (which use extensive human feedback to guide the data exploration). Previously, we have offered extensive surveys of those methods (see Figure 2).4–7 The goal of this article is to update those surveys and offer some notes on current and future directions in software analytics.

Some History

As soon as people started programming, it became apparent that programming was an inherently buggy process. Maurice Wilkes, speaking of his programming experiences in the early 1950s, recalled the following:

It was on one of my journeys between the EDSAC room and the punching equipment that, “hesitating at the angles of stairs,” the realization came over me with full force that a good part of the

2 IEEE SOFTWARE | PUBLISHED BY THE IEEE COMPUTER SOCIETY 0740-7459/18/$33.00 © 2018 IEEE

FIGURE 1. Surprising software analytics findings in the press, from Linux Insider, Nature, Forbes, InfoWorld, The Register, Live Science, and Men’s Fitness.

remainder of my life was going to be spent in finding errors in my own programs.8

WHY WE NEED SOFTWARE ANALYTICS

Prior to the 21st century, researchers often had access to data from only one or two projects. This meant theories of software development were built from limited data. But in the data-rich 21st century, researchers have access to all the data they need to test the truisms of the past. And what they’ve found is most surprising:

• In stark contrast to the assumptions of much prior research, pre- and post-release failures are not connected.25
• Static code analyzers perform no better than simple statistical predictors.26
• The language construct GOTO, as used in contemporary practice, is rarely considered harmful.27
• Strongly typed languages are not associated with successful projects.28
• Developer beliefs are rarely backed by any empirical evidence.3
• Test-driven development is not any better than “test last.”29
• Delayed issues are not exponentially more expensive to fix.30
• Most “bad smells” should not be fixed.31,32

It took several decades to gather the experience required to quantify any kind of size–defect relationship. In 1971, Fumio Akiyama described the first known “size” law, saying that the number of defects D was a function of the number of LOC.9 In 1976, Thomas McCabe argued that the number of LOC was less important than the complexity of that code.10 He argued that code is more likely to be defective when his cyclomatic complexity measure is over 10.

Not only is programming an inherently buggy process, it’s also inherently difficult. On the basis of data from dozens of projects, Barry Boehm proposed in 1981 an estimator for development effort (that was exponential with program size) using a set of effort multipliers Mi inferred from the current project:11

effort = a × KLOC^b × Π Mi,

where 2.4 ≤ a ≤ 3 and 1.05 ≤ b ≤ 1.2.

While the results of Akiyama, McCabe, and Boehm were useful in their home domains, it turns out that those results required extensive modification to work on other projects. One useful feature of the current generation of data-mining algorithms is that it is now relatively fast and simple to learn, for example, defect or effort predictors for other projects, using whatever attributes they have available.

Current Status

Thousands of recent research papers all make the same conclusion: data from software projects contains useful information that can be found by data-mining algorithms. We now know that with modest amounts of data collection, it is possible to

• build powerful recommender systems for software navigation or bug triage, or
• make reasonably accurate predictions about software development effort and defects.

Those predictions can rival those made by much more complex methods (such as static-code-analysis tools, which can be hard to maintain). Various studies have shown the commercial merits of this approach. For example, for one set of projects, software analytics could predict 87 percent of code defects, decrease inspection effort by 72 percent, and hence reduce post-release defects by 44 percent.12 Better yet, when multiple models are combined using ensemble learning, very good estimates can be generated (with very low variance in their predictions), and these predictors can be incrementally updated to handle changing conditions.13 Also, when there is too much data for a manual analysis, it is possible to automatically analyze data from hundreds to thousands of projects, using software analytics.14

FIGURE 2. Researchers have collaborated extensively to record the state of the art in software analytics.

Recent research explains why software analytics has been so successful: artifacts from software projects have an inherent simplicity. As Abram Hindle and his colleagues explained,

Programming languages, in theory, are complex, flexible and powerful, but the programs that real people actually write are mostly simple and rather repetitive, and thus they have usefully predictable statistical properties

that can be captured in statistical language models and leveraged for software engineering tasks.15

For example, here are just some of the many applications in this growing area of research:

• Patterns in the tokens of software can recognize code that is unexpected and bug prone.
• Sentiment analysis tools can gauge the mood of the developers, just by reading their issue comments.16
• Clustering tools can explore complex spaces such as Stack Overflow to automatically detect related questions.17

But What’s Next?

When we reflect over the past decade, several new trends stand out. For example, consider the rise of the data scientist in industry. Many organizations now pay handsomely to hire large teams of data scientists. For example, at the time of this writing, there are more than 1,000 Microsoft employees exploring project data using software analytics. These teams are performing tasks that a decade ago would have been called cutting-edge research. But now we call that work standard operating procedure.

Several new application areas have emerged, such as green engineering. Fancy cellphones become hunks of dead plastic if they run out of power. Hence, taming power consumption is now a primary design concern. Software analytics is a tool for monitoring, and altering, power usage profiles.18

Another emerging area is social patterns in software engineering. Our society divides users into different groups. By adopting the perspectives of different social groupings, developers can build better software.19 Models built from social factors (such as how often someone updates part of the code) can be more effective for predicting code quality than code factors (such as function size or the number of arguments). For example, when you’re studying software built in multiple countries, a good predictor for bugs is the complexity of the organizational chart. (The fewest bugs are introduced when people working on the same functions report to the same manager, even if they are in different countries.20)

Software analytics also studies the interactions of developers, using biometric sensors. Just as we mine software (and the social processes that develop them), so too can we mine data, collected at the millisecond level, from computer programmers. For example, using eye-tracking software or sensors for skin and brain activity, software analytics can determine what code is important to, or most difficult for, developers.21

When trading off different goals, the data-mining algorithms used in conventional software engineering often use hard-wired choices. These choices may be irrelevant or even antithetical to the concerns of the business users who are funding the analysis. For example, one way to maximize accuracy in unbalanced datasets (where, say, most of the examples are not defective) is to obsess on maximizing the true-negative score. This can mean that, by other measures such as precision or recall, the learner fails. What are required are “learners” that guide the reasoning according to a user-specified goal. Such search-based software engineering tools can implement other useful tasks such as automatically tuning the control parameters of a data-mining algorithm22 (which, for most developers, is a black art).

Other important trends show the maturity of software analytics. For decades now, there have been extensive discussions about the reproducibility of software research, much of it having the form, “wouldn’t it be a good idea.” We can report here that, at least for quantitative software analytics based on data-mining algorithms, such reproducibility is common practice. Many conferences in software engineering reward researchers with “badges” if they place their materials online (see tiny.cc/acmbadges). Accordingly, researchers place their scripts and data in GitHub. Some even register those repositories with tools such as Zenodo, in order to obtain unique and eternal digital object identifiers for their artifacts. For an excellent pragmatic discussion on organizing repositories in order to increase reproducibility, see “Good Enough Practices in Scientific Computing.”23

Another challenge is the security and privacy issues raised by software analytics. Legislation being enacted around the world (e.g., the European General Data Protection Regulation, GDPR) means that vendors collecting data will face large fines unless they address privacy concerns. One way to privatize data is to obfuscate (mutate) it to reduce the odds that a malevolent agent can identify who or what generated that data. But this raises a problem: the more we mutate data, the harder it becomes to learn an effective model from it. Recent results suggest that sometimes this problem can be solved,24 but much more work is required on how to share data without compromising confidentiality.

And every innovation also offers new opportunities. There is a flow-on effect from software analytics to other AI tasks outside of software engineering. Software analytics lets software engineers learn about AI techniques, all the while practicing on domains they understand (i.e., their own development practices). Once developers can apply data-mining algorithms to their data, they can build and ship innovative AI tools. While sometimes those tools solve software engineering problems (e.g., recommending what to change in source code), they can also be used on a wider range of problems. That is, we see software analytics as the training ground for the next generation of AI-literate software engineers working on applications such as image recognition, large-scale text mining, and autonomous cars or drones.

Finally, with every innovation come new challenges. In the rush to explore a promising new technology, we should not forget the hard-won lessons of other kinds of research in software engineering. This point is explored more in the sidebar “Issues in Software Analytics.”

ISSUES IN SOFTWARE ANALYTICS

Not everything that happens in software development is visible in a software repository,33 which means that no data miner can learn a model of these subtler factors. Qualitative case studies and ethnographies can give comprehensive insights on particular cases, and surveys can offer industry insights beyond what can be measured from software repositories. Due to such methods’ manual workload, they have issues with scaling up to cover many projects. Hence, we see a future where software is explored in alternating cycles of qualitative and quantitative methods (where each can offer surprises that prompt further work in the other34).

Another issue with software analytics is the problem of context.35 That is, models learned for one project may not apply to another. Much progress has been made in recent years in learning context-specific models36,37 or building operators to transfer data or models learned from one project to another.38 But this is clearly an area that needs more work.

Yet another issue is how actionable our conclusions are. Software analytics aims to obtain actionable insights from software artifacts that help practitioners accomplish tasks related to software development and systems. For software vendors, managers, developers, and users, such comprehensible insights are the core deliverable of software analytics.39 Robert Sawyer commented that actionable insight is the key driver for businesses to invest in data analytics initiatives.40

But not all the models generated via software analytics are actionable; i.e., they cannot be used to make effective changes to a project. For example, software analytics can deliver models that humans find hard to read, understand, or audit. Examples of those kinds of models include anything that uses complex or arcane mathematical internal forms, such as deep-learning neural networks, naive Bayes classifiers, or random forests. Hence, some researchers in this field have explored methods to reexpress complex models in terms of simpler ones.41

We end with this frequently asked question: “What is the most important technology newcomers should learn to make themselves better at data science (in general) and software analytics (in particular)?” To answer this question, we need a workable definition of “science,” which we take to mean a community of people collecting, curating, and critiquing a set of ideas. In this community, everyone does each other the courtesy of trying to disprove this shared pool of ideas.

By this definition, most data science (and much software analytics) is not science. Many developers use software analytics tools to produce conclusions, and that’s the end of the story. Those conclusions are not registered and monitored. There is nothing that checks whether old conclusions are now out of date (e.g., using anomaly detectors). There are no incremental revisions that seek minimal changes when updating old ideas.

If software analytics really wants to be called a science, then it needs to be more than just a way to make conclusions about the present. Any scientist will tell you that all ideas should be checked, rechecked, and incrementally revised. Data science methods such as software analytics should be a tool for assisting in complex discussions about ongoing issues.

Which is a long-winded way of saying that the technology we most need to better understand software analytics and data science is … science.

References
1. D. Zhang et al., “Software Analytics in Practice,” IEEE Software, vol. 30, no. 5, 2013, pp. 30–37.
2. C. Passos et al., “Analyzing the Impact of Beliefs in Software Project Practices,” Proc. 2011 Int’l Symp. Empirical Software Eng. and Measurement (ESEM 11), 2011, pp. 444–452.
3. P. Devanbu, T. Zimmermann, and C. Bird, “Belief & Evidence in Empirical Software Engineering,” Proc. 38th Int’l Conf. Software Eng. (ICSE 16), 2016, pp. 108–119.
4. C. Bird, T. Menzies, and T. Zimmermann, eds., The Art and Science of Analyzing Software Data, Morgan Kaufmann, 2015; goo.gl/ZeyBhb.
5. T. Menzies, L. Williams, and T. Zimmermann, eds., Perspectives on Data Science for Software Engineering, Morgan Kaufmann, 2016; goo.gl/ZeyBhb.
6. T. Menzies et al., Sharing Data and Models in Software Engineering, Morgan Kaufmann, 2014; goo.gl/ZeyBhb.
7. T. Menzies and T. Zimmermann, “Software Analytics: So What?,” IEEE Software, vol. 30, no. 4, 2013, pp. 31–37.
8. M. Wilkes, Memoirs of a Computer Pioneer, MIT Press, 1985.
9. F. Akiyama, “An Example of Software System Debugging,” Information Processing, vol. 71, 1971, pp. 353–359.
10. T. McCabe, “A Complexity Measure,” IEEE Trans. Software Eng., vol. 2, no. 4, 1976, pp. 308–320.
11. B. Boehm, Software Engineering Economics, Prentice-Hall, 1981.
12. A.T. Misirli, A. Bener, and R. Kale, “AI-Based Defect Predictors: Applications and Benefits in a Case Study,” AI Magazine, Summer 2011, pp. 57–68.
13. L.L. Minku and X. Yao, “Ensembles and Locality: Insight on Improving Software Effort Estimation,” Information and Software Technology, vol. 55, no. 8, 2013, pp. 1512–1528.
14. R. Krishna et al., “What Is the Connection between Issues, Bugs, and Enhancements? (Lessons Learned from 800+ Software Projects),” presentation at 40th Int’l Conf. Software Eng. (ICSE 18), 2018; https://arxiv.org/abs/1710.08736.
15. A. Hindle et al., “On the Naturalness of Software,” Proc. 34th Int’l Conf. Software Eng. (ICSE 12), 2012, pp. 837–847.
16. A. Murgia et al., “Do Developers Feel Emotions? An Exploratory Analysis of Emotions in Software Artifacts,” Proc. 11th Working Conf. Mining Software Repositories (MSR 14), 2014, pp. 262–271.
17. B. Xu et al., “Predicting Semantically Linkable Knowledge in Developer Online Forums via Convolutional Neural Network,” Proc. 31st ACM/IEEE Int’l Conf. Automated Software Eng. (ASE 16), 2016, pp. 51–62.
18. A. Hindle et al., “GreenMiner: A Hardware Based Mining Software Repositories Software Energy Consumption Framework,” Proc. 11th Working Conf. Mining Software Repositories (MSR 14), 2014, pp. 12–21.
19. M. Burnett et al., “GenderMag: A Method for Evaluating Software’s Gender Inclusiveness,” Interacting with Computers, vol. 28, no. 6, 2016, pp. 760–787.
20. C. Bird et al., “Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista,” Proc. 31st Int’l Conf. Software Eng. (ICSE 09), 2009, pp. 518–528.
21. T. Fritz et al., “Using Psychophysiological Measures to Assess Task Difficulty in Software Development,” Proc. 36th Int’l Conf. Software Eng. (ICSE 14), 2014, pp. 402–413; doi:10.1145/2568225.2568266.
22. C. Tantithamthavorn et al., “Automated Parameter Optimization of Classification Techniques for Defect Prediction Models,” Proc. IEEE/ACM 38th Int’l Conf. Software Eng. (ICSE 16), 2016, pp. 321–332.
23. G. Wilson et al., “Good Enough Practices in Scientific Computing,” PLoS Computational Biology, vol. 13, no. 6, 2017; https://doi.org/10.1371/journal.pcbi.1005510.
24. Z. Li et al., “On the Multiple Sources and Privacy Preservation Issues for Heterogeneous Defect Prediction,” to be published in IEEE Trans. Software Eng.; https://ieeexplore.ieee.org/document/8168387.
25. N.E. Fenton and N. Ohlsson, “Quantitative Analysis of Faults and Failures in a Complex Software System,” IEEE Trans. Software Eng., vol. 26, no. 8, 2000, pp. 797–814.
26. F. Rahman et al., “Comparing Static Bug Finders and Statistical Prediction,” Proc. 36th Int’l Conf. Software Eng. (ICSE 14), 2014, pp. 424–434.
27. M. Nagappan et al., “An Empirical Study of Goto in C Code from GitHub Repositories,” Proc. 10th Joint Meeting Foundations of Software Eng. (ESEC/FSE 15), 2015, pp. 404–414.
28. B. Ray et al., “A Large-Scale Study of Programming Languages and Code Quality in Github,” Comm. ACM, vol. 60, no. 10, 2017, pp. 91–100.
29. D. Fucci et al., “A Dissection of the Test-Driven Development Process: Does It Really Matter to Test-First or to Test-Last?,” IEEE Trans. Software Eng., vol. 43, no. 7, 2017, pp. 597–614.
30. T. Menzies et al., “Are Delayed Issues Harder to Resolve? Revisiting Cost-to-Fix of Defects throughout the Lifecycle,” Empirical Software Eng., vol. 22, no. 4, 2017, pp. 1903–1935.
31. R. Krishna, T. Menzies, and L. Layman, “Less Is More: Minimizing Code Reorganization Using XTREE,” Information and Software Technology, vol. 88, 2017, pp. 53–66.
32. M. Kim, T. Zimmermann, and N. Nagappan, “A Field Study of Refactoring Challenges and Benefits,” Proc. ACM SIGSOFT 20th Int’l Symp. Foundations of Software Eng. (FSE 12), 2012, article 50.
33. J. Aranda and G. Venolia, “The Secret Life of Bugs: Going past the Errors and Omissions in Software Repositories,” Proc. 31st Int’l Conf. Software Eng. (ICSE 09), 2009, pp. 298–308.
34. D. Chen, K.T. Stolee, and T. Menzies, “Replicating and Scaling Up Qualitative Analysis Using Crowdsourcing: A Github-Based Case Study,” 2018; https://arxiv.org/abs/1702.08571.
35. K. Petersen and C. Wohlin, “Context in Industrial Software Engineering Research,” Proc. 3rd Int’l Symp. Empirical Software Eng. and Measurement (ESEM 09), 2009, pp. 401–404.
36. T. Dybå, D.I.K. Sjøberg, and D.S. Cruzes, “What Works for Whom, Where, When, and Why? On the Role of Context in Empirical Software Engineering,” Proc. 2012 ACM/IEEE Int’l Symp. Empirical Software Eng. and Measurement (ESEM 12), 2012, pp. 19–28.
37. T. Menzies, “Guest Editorial: Best Papers from the 2011 Conference on Predictive Models in Software Engineering (PROMISE),” Information and Software Technology, vol. 55, no. 8, 2013, pp. 1477–1478.
38. R. Krishna and T. Menzies, “Bellwethers: A Baseline Method for Transfer Learning,” 2018; https://arxiv.org/abs/1703.06218.
39. S.-Y. Tan and T. Chan, “Defining and Conceptualizing Actionable Insight: A Conceptual Framework for Decision-Centric Analytics,” presentation at 2015 Australasian Conf. Information Systems (ACIS 15), 2015; https://arxiv.org/abs/1606.03510.
40. R. Sawyer, “BI’s Impact on Analyses and Decision Making Depends on the Development of Less Complex Applications,” Principles and Applications of Business Intelligence Research, IGI Global, 2013, pp. 83–95.
41. H.K. Dam, T. Tran, and A. Ghose, “Explainable Software Analytics,” 2018; https://arxiv.org/abs/1802.00603.

ABOUT THE AUTHORS

TIM MENZIES is a full professor of computer science at North Carolina State University. He works on software engineering, automated software engineering, and the foundations of software science. Menzies received a PhD in computer science from the University of New South Wales. Contact him at [email protected]; http://menzies.us.

THOMAS ZIMMERMANN is a senior researcher at Microsoft. He works on the productivity of Microsoft’s software developers and data scientists. Zimmermann received a PhD in computer science from Saarland University. He is a Distinguished Member of ACM and a Senior Member of IEEE and the IEEE Computer Society. Contact him at tzimmer@microsoft.com; http://thomas-zimmermann.com.

SEPTEMBER/OCTOBER 2018 | IEEE SOFTWARE
