Cobold: Gobblin' up COBOL Bugs for Fun and Profit
Total Page:16
File Type:pdf, Size:1020Kb
COBOLd: Gobblin’ Up COBOL Bugs for Fun and Profit squaresLab∗ Mr. squaresLab SpouseMan† Institute for Soware Research e Private Sector School of Computer Science Pisburgh, PA 15213 Carnegie Mellon University Pisburgh, PA 15213 ABSTRACT that failing test cases pass without breaking any of the originally e cost and ubiquity of soware bugs has motivated research in passing test cases. automated program repair (APR), a eld dedicated to the automatic In this work, we automatically x bugs in COBOL programs triage and patching of soware defects. APR has typically focused using COBOLd, which is just GenProg [10, 11, 17, 18] directly ap- on popular languages such as C and Java, but COBOL developers plied to COBOL programs. GenProg uses genetic programming have suered six decades of neglect and deserve a repair tool too. to traverse a space of possible patches to a given buggy program. We present COBOLd, the rst tool for automatically repairing buggy GenProg uses the test results to dene an objective function that COBOL programs. We demonstrate COBOLd’s eectiveness on informs an evolutionary search [4, 8] over the space of candidate a COBOL reimplementation of the infamous Zune leap-year bug. patches. We reuse GenProg because it is well-established and also We also argue that “we got this” and other researchers should stay we know its source code very, very well, having wrien a large pro- away so that we can x all the COBOL bugs ourselves and thus be portion of it. It turns out that, given the appropriate conguration lthy, stinking rich. options, the code we already have can totally x bugs in COBOL programs. Who knew? CCS CONCEPTS •So ware and its engineering →So ware evolution; Main- 2 BACKGROUND taining so ware; Search-based so ware engineering; •COBOL Don’t know COBOL? Don’t worry. is section has all you need. repair →$$$$$$$$$$$$$$$$$$$$$$$$$$$$; 2.1 COBOL: Is it reasonable? KEYWORDS 2.1.1 Yes. It is incredibly dicult to create a program that con- COBOL, SBSE, Soware Evolution, Genetic Programming, $$$$ tains bugs in COBOL. is is because the design of the language makes it incredibly dicult to write any program in COBOL. e ACM Reference format: squaresLab and Mr. squaresLab SpouseMan. 2018. COBOLd: Gobblin’ Up lack of advanced features such as passing parameters to procedures COBOL Bugs for Fun and Prot. In Proceedings of SIGBOVIK, Pisburgh, PA means that programmers are forced to think very carefully before USA, March 2018 (SIGBOVIK’18), 5 pages. authoring COBOL programs. Bugs can only exist if programs exist. DOI: 10.475/123 4 2.1.2 Counterpoint: Absolutely Not. e rst specication of the language (COBOL-60) contained many logical errors and was 1 INTRODUCTION impossible to interpret unambiguously [2]. is is unsurprising: no 1.1 Automatic Program Repair (APR) academic computer scientists were on the design commiee. e Automatic program repair (APR) is all about xing bugs automati- commiee invented a new metalanguage to dene its syntax be- cally because soware maintenance is expensive. APR techniques cause no commiee members had heard of Backus-Naur form [14]. take as input (1) a program with a defect and (2) a mechanism to Dijkstra once declared: “e use of COBOL cripples the mind; its validate correct and incorrect behavior in that program. e val- teaching should, therefore, be regarded as a criminal oence.” [3] Also, the “Hello World” program for COBOL is considered non- idation mechanism is typically a test suite: a bug to be repaired 1 corresponds to one or more tests that is failing; one or more pass- compliant. ing tests guard against the removal of desireable functionality. e goal is to produce a patch that modies the original program such 2.2 (Inexplicably Recent) Prior Work on COBOL ∗Chris Timperley, Deby Katz, Zack Coker, Rijnard van Tonder, Mauricio Soto, Afsoon Afzal, Cody Kinneer, Jeremy Lacomis, and Claire Le Goues Unfortunately for Dijkstra, COBOL is still lurking in academic con- †Adam Brady ferences and college curriculums (ese citations are all real‼! I know. We were also surprised!). Research papers on COBOL in Permission to make digital or hard copies of part or all of this work for personal or st classroom use is granted without fee provided that copies are not made or distributed the 21 century cover aspect oriented programming [9, 15] and a for prot or commercial advantage and that copies bear this notice and the full citation (2017‼!) study of code clones [6]. Professors disagreeing with Dijk- on the rst page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). stra, or openly admiing to torturing their students, have published SIGBOVIK’18, Pisburgh, PA USA © 2018 Copyright held by the owner/author(s). 123-4567-24-567/08/06...$15.00 1e DISPLAY statement (i.e., print) is considered a dangerous paern according to a DOI: 10.475/123 4 COBOL static analyzer: hps://rules.sonarsource.com/cobol/RSPEC-1279 SIGBOVIK’18, March 2018, Pisburgh, PA USA squaresLab and Mr. squaresLab SpouseMan 1 IDENTIFICATIONDIVISION . 1 IDENTIFICATIONDIVISION . 2 PROGRAM-ID .ZUNEBUG. 2 PROGRAM-ID .ZUNEBUG. 3 3 4 DATADIVISION . 4 DATADIVISION . 5 WORKING-STORAGESECTION . 5 WORKING-STORAGESECTION . 6 01 WS-YEAR PIC 9(9) VALUE 1980. 6 01 WS-YEAR PIC 9(9) VALUE 1980. 7 01 WS-DAYS PIC 9(9). 7 01 WS-DAYS PIC 9(9). 8 01 WS-Q PIC 9(9). 8 01 WS-Q PIC 9(9). 9 01 WS-R1 PIC 9(9). 9 01 WS-R1 PIC 9(9). 10 01 WS-R2 PIC 9(9). 10 01 WS-R2 PIC 9(9). 11 01 WS-R3 PIC 9(9). 11 01 WS-R3 PIC 9(9). 12 12 13 PROCEDUREDIVISION . 13 PROCEDUREDIVISION . 14 MAIN-PARAGRAPH. 14 MAIN-PARAGRAPH. 15 DISPLAY 1 UPON ARGUMENT-NUMBER. 15 DISPLAY 1 UPON ARGUMENT-NUMBER. 16 ACCEPT WS-DAYS FROM ARGUMENT-VALUE. 16 ACCEPT WS-DAYS FROM ARGUMENT-VALUE. 17 PERFORMWITHTESTBEFOREUNTIL WS-DAYS <= 365 17 PERFORMWITHTESTBEFOREUNTIL WS-DAYS <= 365 18 *LEAPYEARCOMPUTATION 18 *LEAPYEARCOMPUTATION 19 DIVIDE WS-YEAR BY 4 GIVING WS-Q REMAINDER WS-R1 19 DIVIDE WS-YEAR BY 4 GIVING WS-Q REMAINDER WS-R1 20 DIVIDE WS-YEAR BY 100 GIVING WS-Q REMAINDER WS-R2 20 DIVIDE WS-YEAR BY 100 GIVING WS-Q REMAINDER WS-R2 21 DIVIDE WS-YEAR BY 400 GIVING WS-Q REMAINDER WS-R3 21 DIVIDE WS-YEAR BY 400 GIVING WS-Q REMAINDER WS-R3 22 IF ( WS-R1 = 0 AND WS-R2 NOT = 0) OR WS-R3 = 0 THEN 22 IF ( WS-R1 = 0 AND WS-R2 NOT = 0) OR WS-R3 = 0 THEN 23 IF WS-DAYS ISGREATERTHAN 366 THEN 23 SUBTRACT 366 FROM WS-DAYS 24 SUBTRACT 366 FROM WS-DAYS 24 IF WS-DAYS IS GREATER THAN 366 THEN 25 ADD 1 TO WS-YEAR 25 ADD 1 TO WS-YEAR 26 END-IF 26 END-IF 27 ELSE 27 ELSE 28 SUBTRACT 365 FROM WS-DAYS 28 SUBTRACT 365 FROM WS-DAYS 29 ADD 1 TO WS-YEAR 29 ADD 1 TO WS-YEAR 30 END-IF 30 END-IF 31 END-PERFORM . 31 END-PERFORM . 32 DISPLAY WS-YEAR. 32 DISPLAY WS-YEAR. 33 STOPRUN. 33 STOPRUN. (a) Buggy code: An innite loop happens when WS-YEAR (b) Zune bug, repaired by COBOLd (yes, seriously). Lines 23 and 24 is a leap year, and WS-DAYS is less than 366. are swapped in the original source Figure 1: COBOL implementation of the infamous Zune bug and repaired program generated by COBOLd. First six columns omitted to t on one page. the colorfully titled papers “Teaching a language many CS students Algorithm 1 A simplied G&V algorithm for automated COBOL didn’t want to learn: COBOL” [7] and “Bringing COBOL back to repair. the college IT curriculum” [12]. Some of our favorite excepts from 1: function COBOLd(p) these papers’ abstracts include “As shocking as this may sound to 2: cands GenProg.getInitialPopulation(p) some, there are many reasons beyond great need to bring COBOL back 3: while !oracle(cands) do into the CIS curriculum” [12] and “Over 80 percent of error diagnosis 4: cands GenProg.getNextPopulation(cands) was found to be inaccurate. Such feedback is not optimal for users, 5: end while particularly for the learning user of Cobol.” [13]. 6: ??????????????????????????????? 2 As far as static analysis tools, Sonarbe provides static analy- 7: PROFIT! sis tools for COBOL. However, Sonarbe can’t automatically x 8: end function COBOL programs. But we can. So watch this and don’t blink. 3 APR FOR COBOL program text itself. is helps limit the generation stage to syntac- 3.1 Generate-and-validate. tically valid programs, and allows more advanced reasoning about program structure. ough this works for reasonable programming One of primary classes of approaches for APR is known as heuristic languages, COBOL is not specied with a real grammar (cf. Sec- or generate-and-validate repair (G&V), which uses search-based tion 2.1.2), and the creation of a parser would require for-real work. soware engineering techniques [5, 16] to generate many candidate So, the COBOLd function invokes GenProg [11] to modify the pro- patches for a bug, and then validate them using indicative workloads gram text of p directly. is is particularly fortunate because we or test suites. would hate to implement a parser for a language with 550 reserved Algorithm 1 illustrates our novel G&V approach for xing COBOL words, are you #kidding me?3 programs. e COBOLd function takes as input a textual COBOL e brunt of the work is done by GenProg, which produces program p.