Patrick Cousot
Abstract Interpretation: From Theory to Tools

Patrick Cousot
cims.nyu.edu/~pcousot/
pc12ou)(sot@cims$*.nyu.edu
ICSME 2014, Victoria, BC, Canada, 2014-10-02 © P. Cousot

Bugs everywhere!
• Ariane 5.01 failure (overflow error)
• Patriot failure (float rounding error)
• Mars orbiter loss (unit error)
• Russian Proton-M/DM-03 rocket carrying 3 Glonass-M satellites (unknown programming error :)
• Heartbleed (buffer overrun), in OpenSSL:

unsigned int payload = 18; /* Sequence number + random bytes */
unsigned int padding = 16; /* Use minimum padding */

/* Check if padding is too long, payload and padding
 * must not exceed 2^14 - 3 = 16381 bytes in total.
 */
OPENSSL_assert(payload + padding <= 16381);

/* Create HeartBeat message, we just use a sequence number
 * as payload to distuingish different messages and add
 * some random stuff.
 * - Message Type, 1 byte
 * - Payload Length, 2 bytes (unsigned int)
 * - Payload, the sequence number (2 bytes uint)
 * - Payload, random bytes (16 bytes uint)
 * - Padding
 */
buf = OPENSSL_malloc(1 + 2 + payload + padding);
p = buf;
/* Message Type */
*p++ = TLS1_HB_REQUEST;
/* Payload length (18 bytes here) */
s2n(payload, p);
/* Sequence number */
s2n(s->tlsext_hb_seq, p);
/* 16 random bytes */
RAND_pseudo_bytes(p, 16);
p += 16;
/* Random padding */
RAND_pseudo_bytes(p, padding);

ret = dtls1_write_bytes(s, TLS1_RT_HEARTBEAT, buf, 3 + payload + padding);

• These are great proofs of the presence of bugs!

On the limits of bug finding
• Giant software manufacturers can rely on gentle end-users to find myriads of bugs;
• But what about: can passengers really help?
• Is dynamic/static bug finding always enough?
• Proving the absence of bugs is much better!

Formal Methods
• Mathematical and engineering principles applied to the specification, design, construction, verification, maintenance, and evolution of very high quality software
• Strongly promoted by Harlan D. Mills since the 70's, e.g.:
• Harlan D. Mills: The New Math of Computer Programming. Commun. ACM 18(1): 43-48 (1975)
• Harlan D. Mills: Software Development. IEEE Trans. Software Eng. 2(4): 265-273 (1976)
• Harlan D. Mills: Function Semantics for Sequential Programs. IFIP Congress 1980: 241-250
• ...

Main formal methods for verification
• Objective: prove automatically that a program does satisfy a specification given either explicitly or implicitly (e.g.
absence of runtime errors)
• Deductive methods: use a theorem prover/proof assistant to check a user-provided proof argument
• Enumerative, symbolic, bounded, solver-based (e.g. Z3), interpolation, statistical, etc. model checking: check the specification by enumerating finitely many possibilities
• Abstract interpretation: use approximation ideas to consider infinitely many possibilities

Fundamental limitations
• By Gödel's undecidability, no perfect solution is or ever will be possible:
• Deductive methods: the burden is on the end-user and the proofs are exponential in the size of programs
• Model checking: severe unsolved scalability problem
• Abstract interpretation: may produce false alarms (but no false negatives)
• Unsound methods (Coverity, Klocwork, Purify, etc.): no correctness guarantee at all.

The Evolution of Formal Methods

(Excerpt from Harlan D. Mills's Cleanroom paper, reproduced on the slide:)

... testing are returned to the development team for correction. If quality is not acceptable, the software is removed from testing and returned to the development team for rework and reverification.

The process of Cleanroom development and certification is carried out incrementally. Integration is continuous, and system functionality grows with the addition of successive increments. When the final increment is complete, the system is complete.

A traditional project experiencing, say, five errors/KLOC in function testing may have encountered 25 or more errors per KLOC when measured from first execution in unit testing. Quality comparisons between traditional and Cleanroom software are meaningful when measured from first execution. Experience has shown that there is a qualitative difference in the complexity of errors found in Cleanroom and traditional code.
Errors left behind by Cleanroom correctness verification, if any, tend to be simple mistakes easily found and fixed by statistical testing, not deep design or interface errors. Because at each stage the harmonious operation of future increments at the next level of refinement is predefined by increments already in execution, interface and design errors are rare. Cleanroom errors are not only infrequent, but usually simple as well.

The Cleanroom process is being successfully applied in IBM and other organizations. The technology requires some training and practice, but builds on existing skills and software engineering practices. It is readily applied to both new system development and re-engineering and extension of existing systems. The IBM Cleanroom Software Technology Center (CSTC) [4] provides technology transfer support to Cleanroom teams through education and consultation.

Cleanroom quality results. Table 1 summarizes quality results from Cleanroom projects. Earlier results are reported in [5]. The projects report a "certification testing failure rate"; for example, the rate for the IBM Flight Control project was 2.3 errors per KLOC, and for the IBM COBOL Structuring Facility project, 3.4 errors per KLOC. These numbers represent all errors found in all testing, measured from first-ever execution through test completion. That is, the rates represent residual errors present in the software following correctness verification by development teams.

The projects in Table 1 produced over half a million lines of Cleanroom code with a range of 0 to 5.1 errors per KLOC, for an average of 3.3 errors per KLOC found in all testing, a remarkable quality achievement indeed.

Traditionally developed software does not undergo correctness verification. It goes from development to unit testing and debugging, then more debugging in function and system testing. At entry ...

Highlights of Cleanroom projects reported in Table 1 are described below:

• IBM Flight Control. An HH-60 helicopter avionics component was developed on schedule in three increments comprising 33 KLOC of JOVIAL [5]. A total of 79 corrections were required during statistical certification, for an error rate of 2.3 errors per KLOC for verified software with no prior execution or debugging.

• IBM COBOL Structuring Facility (COBOL/SF). COBOL/SF, IBM's first commercial Cleanroom product, was developed by a six-person team. The product automatically transforms unstructured COBOL programs into functionally equivalent structured form for improved understandability and maintenance. It makes use of proprietary graph-theoretic algorithms, and exhibits a level of complexity on the order of a COBOL compiler. The current version of the 85 KLOC PL/I product required 52 KLOC of new code and 179 corrections during statistical certification of five increments, for a rate of 3.4 errors per KLOC [7]. Several major components completed certification with no errors found. In an early support program at a major aerospace corporation, six months of intensive use resulted in no functional equivalence errors ever found [5]. Productivity, including all specification, design, verification, certification, user publications, and management, averaged 740 LOC per person-month. Challenging schedules defined for competitive reasons were all met.

(Slide annotations overlaid on the excerpt:)

Change of Scale
• 1993: IBM Flight Control — 33 KLOC of JOVIAL certified by Cleanroom verification
• 2013: Astrée checks automatically the absence of any runtime error in the control/command software of the A380 and A400M by abstract interpretation, i.e. > 1000 KLOC of C

Harlan D. Mills: Zero Defect Software: Cleanroom Engineering. Advances in Computers 36: 1-41 (1993)
Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, Xavier Rival: Why does Astrée scale up? Formal Methods in System Design 35(3): 229-264 (2009)
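The Heartbleed excerpt at the start of this deck shows the code that builds a heartbeat message; the overrun itself was in the code that processes a received heartbeat, which trusted the attacker-supplied payload length. A minimal sketch of that pattern and of the bounds check that fixes it (all names here are invented for illustration, not OpenSSL's):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the Heartbleed pattern: the responder echoes back
 * `claimed_len` bytes of the request payload, where `claimed_len`
 * comes from an attacker-controlled length field. */

/* Vulnerable shape: trusts the claimed length, so memcpy may read
 * far past the end of the actual record (a buffer overread). */
unsigned char *echo_unchecked(const unsigned char *record, size_t claimed_len) {
    unsigned char *reply = malloc(claimed_len);
    if (reply == NULL) return NULL;
    memcpy(reply, record, claimed_len);  /* overreads if the record is shorter */
    return reply;
}

/* Fixed shape, in the spirit of the official patch: check the claimed
 * length against the real record length and discard bogus requests. */
unsigned char *echo_checked(const unsigned char *record, size_t record_len,
                            size_t claimed_len, size_t *out_len) {
    if (claimed_len > record_len)
        return NULL;                     /* silently drop the request */
    unsigned char *reply = malloc(claimed_len);
    if (reply == NULL) return NULL;
    memcpy(reply, record, claimed_len);
    *out_len = claimed_len;
    return reply;
}
```

A sound static analyzer tracking buffer bounds can report the unchecked copy, because nothing in the vulnerable version constrains `claimed_len` by the record's actual length.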
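The abstract-interpretation approach discussed above can be made concrete with the classic interval domain: each variable is over-approximated by an interval of possible values, loops are solved by iterating to a fixpoint, and widening forces termination at the price of precision (the source of false alarms). A toy sketch of the idea, assuming nothing about Astrée's actual implementation:

```c
#include <assert.h>
#include <limits.h>

/* Toy interval domain: a sound over-approximation of the set of
 * values a variable may take (a teaching sketch, not Astrée). */
typedef struct { long lo, hi; } itv;

static long min_l(long a, long b) { return a < b ? a : b; }
static long max_l(long a, long b) { return a > b ? a : b; }

/* join: least upper bound of two intervals */
itv itv_join(itv a, itv b) { return (itv){ min_l(a.lo, b.lo), max_l(a.hi, b.hi) }; }

/* widening: give up on bounds that keep growing, to force termination */
itv itv_widen(itv a, itv b) {
    return (itv){ b.lo < a.lo ? LONG_MIN : a.lo,
                  b.hi > a.hi ? LONG_MAX : a.hi };
}

itv itv_add(itv a, itv b) { return (itv){ a.lo + b.lo, a.hi + b.hi }; }

/* Abstractly execute:  i = 0; while (i < 100) i = i + 1;
 * and return the interval proved for i at loop exit. */
itv analyze_loop(void) {
    const itv init = { 0, 0 }, one = { 1, 1 };
    itv i = init;
    for (;;) {                                 /* ascending iterations, widened */
        itv body = i;
        if (body.hi > 99) body.hi = 99;        /* meet with the guard i < 100 */
        itv next = itv_widen(i, itv_join(init, itv_add(body, one)));
        if (next.lo == i.lo && next.hi == i.hi) break;   /* fixpoint reached */
        i = next;
    }
    itv body = i;                              /* one descending (narrowing) step */
    if (body.hi > 99) body.hi = 99;
    i = itv_join(init, itv_add(body, one));
    if (i.lo < 100) i.lo = 100;                /* meet with the exit guard i >= 100 */
    return i;
}
```

Here widening first over-shoots to [0, LONG_MAX], and a single narrowing step recovers the exact exit interval [100, 100], enough to prove the increment cannot overflow; when narrowing cannot recover a bound, the analysis stays sound but may raise a false alarm.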