RESEARCHERS AND PRACTITIONERS VOLLEY ABOUT MAKING RESEARCH USEFUL

Dear Practitioners: The research community is actually discovering things you might find useful. Please help us organize this knowledge so that it's actually useful to you. Understand that this isn't absolute truth, but rather the best we can do at the moment. You must be thoughtful about using this knowledge, but it's a lot better than guessing. Sincerely, The Researchers

Dear Researchers: We have a lot of questions, and we suspect you have answers. Unfortunately, the answers are scattered among thousands of papers, and we can't tell fact from fiction. Worse, there are entire topics that no one is studying because they aren't "scientific enough." We have fallen back on getting insights from Hacker News, Stevey's Drunken Blog Rants, and Jeff, who just transferred from Accounting. We're pretty sure Jeff doesn't know anything, but he's the loudest person in our stand-up, and we don't have any evidence to dispute him. We'll take whatever evidence you have. Sincerely, The Practitioners

…imagining a dialogue between researchers and practitioners (see the sidebar).

This pattern is common: engineers often rely on their experience and a priori beliefs1 or turn to coworkers for advice. This is better than guessing or giving up. But what if incompletely validated research outcomes could be distilled into reliable sources, intermediate between validated results and folk wisdom?

To impact practice, SE research results must lead to pragmatic, actionable advice. This involves synthesizing recommendations from results with different assumptions and levels of rigor, assigning appropriate levels of confidence to the recommendations. Here, we examine how these tradeoffs between rigor and pragmatism have been handled in medicine, where risk is often acceptable in the face of urgency. We propose an approach to describing SE research results with varied quality of evidence and synthesizing those results into codified knowledge for practitioners. This approach can both improve practice and increase the pace of research, especially in exploratory topics.

Software Engineering Research Expectations over Time

When the 1968 NATO Conference introduced "software engineering" to our vocabulary,2 research often focused on designing and building programs. There were guidelines for writing programs; the concept of reasoning mathematically about a program had just been introduced. The emphasis was on demonstrated capability—what we might now call feasibility—rather than rigorous validation.

This is visible in a sampling of major results of the period. For example, Carnegie Mellon University identified a set of canonical papers published between 1968 and 2002.3 Several are formal analyses or empirical studies, and a few are case studies. However, the majority are carefully reasoned essays that propose new approaches based on the authors' experience and insight.

The field has historically built on results with varying degrees of certainty. Indeed, Fred Brooks proposed a "certainty-shell" structure for knowledge, to balance "the tension between narrow truths proved convincingly by statistically sound experiments, and broad 'truths,' generally applicable, but supported only by possibly unrepresentative observations."4

Brooks' structure recognizes three nested classes of results: scientifically validated findings, observations, and rules of thumb—with different evaluation criteria for each. By properly identifying each result, we can take advantage of incomplete or partial knowledge. For example, Butler Lampson's "Hints for Computer System Design" is an excellent set of well-thought-out rules of thumb.5
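To make the certainty-shell idea concrete, here is a minimal sketch (ours, not Brooks'): each result carries a tag naming its shell, and that tag determines which evaluation criterion a reader should apply. All class and criterion names in this Python sketch are illustrative assumptions, not terms from the article.

from dataclasses import dataclass
from enum import Enum


class CertaintyClass(Enum):
    # Brooks' three nested classes, outermost shell first; the value is
    # the (assumed) evaluation criterion appropriate to that shell.
    RULE_OF_THUMB = "judged plausible and useful by experienced practitioners"
    OBSERVATION = "reported faithfully from real, possibly unrepresentative, cases"
    VALIDATED_FINDING = "proved convincingly by statistically sound experiments"


@dataclass
class ResearchResult:
    claim: str
    certainty: CertaintyClass

    def evaluation_criterion(self) -> str:
        # Each shell carries its own standard of evaluation, rather than
        # one uniform bar for "truth".
        return self.certainty.value


# Example: one of Lampson's hints sits in the outermost shell.
hint = ResearchResult(
    claim="Cache answers to expensive computations",
    certainty=CertaintyClass.RULE_OF_THUMB,
)
print(f"{hint.claim!r} should be {hint.evaluation_criterion()}")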
Expectations for rigor in SE research have evolved since the NATO conference. Around the turn of the 21st century, the SE research community became concerned about the lack of quantitative, experimental research. A preponderance of papers accepted for the 2002 International Conference on Software Engineering (ICSE 02) defended their results with examples, followed by papers that supported results with formal or controlled experimental techniques and reports on experience in practice.6 Beginning in 2004, the evidence-based software engineering (EBSE) community began calling for synthesizing research results through systematic literature reviews (SLRs).

The field has matured in its awareness of the variety of research methods available. In 2014 and 2015, ICSE asked authors to classify submissions as analytical, empirical, methodological, perspective, or technological, providing criteria for each category. Compared to ICSE 02, ICSE 16 had substantially more empirical reports and much more rigorous validation.7 The strength of validation was the most important factor affecting acceptance, and there were clear alignments between the types of result and the validation techniques.

This evolution is consistent with the way ideas typically evolve in our field—building from key insights and early exploration to products, over several decades. Different research methods are appropriate at different stages of this evolution, as increasing confidence in the work justifies larger-scale, controlled evaluation. However, well-controlled experiments almost inevitably narrow the scope of their results and their immediate practical relevance.

Additionally, certain types of research (like design), and most early-stage research, remain more narrative; setting inappropriate evaluation expectations can deter progress. Even results falling short of current standards are "better than nothing" for practitioners. To help reconcile this tension between the research community's standards of rigor and the pragmatic needs of practitioners, we observe how evidence-based medicine (EBM) connects research results to the needs of clinical medical practice, and vice versa.
Evidence-Based Medicine

EBM is the conscientious, explicit, and judicious use of the current best evidence by medical clinicians to make timely decisions about the care of individual patients. EBM arranges medical evidence into a hierarchy (see Figure 1): case reports and series are toward the bottom, progressing to individual randomized clinical trials and then meta-analysis and systematic reviews.8 In this, EBM emphasizes the synthesis of (possibly weaker) individual results into stronger conclusions. Significantly, EBM then supports practical decision making that traces the level of evidence and confidence in a decision to its source, by assigning strengths to a recommendation based on the level of the evidence that supports it:

    What are we to do when the irresistible force of the need to offer clinical advice meets with the immovable object of flawed evidence? All we can do is our best: give the advice, but alert the advisees to the flaws in the evidence on which it is based.9

Table 1 gives the rules for assigning recommendation grades.9

Today, many diagnoses combine the context of the particular patient case with levels of uncertainty from the analysis model to determine diagnosis certainty and confidence in the recommended treatment. The decision can map to combined levels of certainty, including less certain results from different analysis methods. On the basis of EBM principles, physicians can now consult best medical-practice guidance via a smartphone from a patient's bedside.

EBM favors neither the research nor the practice. It instead represents a systematic approach to clinical problem solving that allows the integration of the best available research evidence with clinical expertise and patient values. EBM can teach SE the value of researchers and practitioners collaborating: researchers make in-development techniques available for testing, and practitioners combine such evidence using their best judgment.

Budgen and his colleagues found that only 37 out of 178 SLRs published between 2010 and 2015 provided recommendations or conclusions of relevance to education or practice.16 A follow-up study found that the SLRs with recommendations on practice were derived predominantly from primary studies conducted in industry. While the focus on what has been done is helpful for researchers, the lack of focus on what has been learned is unhelpful for practitioners.

In one paper that provided actionable insights, the authors synthesized nine papers from the software-testing literature.17 They gave rules of thumb for practice, such as "when you need higher assurance, … a data-flow technique … can be more

Table 2. The hierarchy of evidence for software engineering research.

Type of study | Level | Evidence
Secondary or filtered studies | 0 | Systematic reviews with recommendations for practice; meta-analyses
Primary studies: systematic evidence | 1 | Formal or analytic results with rigorous derivation and proof
Primary studies: systematic evidence | 2 | Quantitative empirical studies with careful experimental design and good statistical control
Primary studies: observational evidence | 3 | Observational results supported by sound qualitative methods, including well-designed case studies
Primary studies: observational evidence | 4 | Surveys with good sampling and good design; field studies; data mining
Primary studies: observational evidence | 5 | Experience from multiple projects, with analysis and cross-project comparison; a tool, a prototype, a notation, a dataset, or another artifact (that has been certified as …
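To make the parallel with EBM concrete, here is a minimal sketch (ours, not the authors') of how Table 2's evidence levels could drive an EBM-style recommendation grade. The article's actual grading rules are in Table 1, which this excerpt does not reproduce, so the grade thresholds below are illustrative assumptions.

# Table 2's levels, paraphrased; a lower level means stronger evidence.
EVIDENCE_LEVELS = {
    0: "Systematic reviews with recommendations for practice; meta-analyses",
    1: "Formal or analytic results with rigorous derivation and proof",
    2: "Quantitative empirical studies with careful design and statistical control",
    3: "Observational results supported by sound qualitative methods",
    4: "Surveys with good sampling and design; field studies; data mining",
    5: "Experience from multiple projects, with cross-project comparison",
}


def recommendation_grade(supporting_levels: list[int]) -> str:
    """Grade a recommendation from the best evidence supporting it.

    Mirrors the EBM pattern of tracing a recommendation's strength to
    the level of its strongest supporting evidence; the thresholds are
    hypothetical, not the rules from Table 1.
    """
    if not supporting_levels:
        return "No recommendation (no evidence)"
    best = min(supporting_levels)  # lower level = stronger evidence
    if best <= 1:
        return "A (strong)"
    if best <= 3:
        return "B (moderate)"
    return "C (weak; flag the flaws in the evidence)"


# Example: a practice backed by one controlled study and two field studies.
print(recommendation_grade([2, 4, 4]))  # -> "B (moderate)"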