Special Topics on Binary‐Level Program Analysis
Gang Tan
CSE 597, Spring 2019
Penn State University

Critical Software Systems
• Software is ubiquitous
  – E-commerce
  – E-voting
  – Airplane control software
    • “Fly by wire”
  – …
• However, the media is full of reports of the catastrophic impact of software failures
  – Misbehaving software
  – Vulnerable software being attacked
    • Viruses, internet worms, botnets, rootkits
    • Web site defacement, DDoS
    • Hacked accounts

What Allowed These Failures and Attacks?
• Design flaws
  – E.g., misuse of crypto
• Programming bugs
  – Missing input validation
  – In C/C++, missing array bounds checking
  – …

Example: Knight Capital's $440 Million Loss
• Knight Capital: algorithmic trading
• Stock price
  – Bid price: what buyers are willing to pay
  – Ask price: what sellers are willing to accept
  – Ask price >= bid price
    • The difference is called the spread
• Knight Capital’s misbehaving trading software
  – Bought at the ask price and sold at the bid price
    • Buy high and sell low
  – Did this over and over again
  – Lost $440 million before they realized it
  – Knight Capital was on the brink of bankruptcy and was bought by a different company

Example: NASA Mars Climate Orbiter
• In 1999, NASA’s $125-million Mars Climate Orbiter crashed into Mars
• Two pieces of the orbiter software used different units for calculation
  – One piece calculated results in pound-seconds, which a second piece interpreted as newton-seconds
  – As a result, the orbiter was sent too low and too fast into the Martian atmosphere

Example: Microsoft Zune Crash
• Last day of 2008
  – Thousands of Microsoft Zune music players began freezing about midnight

    year = ORIGINYEAR; /* = 1980 */
    while (days > 365) {
        if (IsLeapYear(year)) {
            if (days > 366) {
                days -= 366;
                year += 1;
            }
            /* bug: when days == 366, nothing changes here */
        } else {
            days -= 365;
            year += 1;
        }
    }

  – The bug surfaces on the last day of a leap year
    • With days == 366 in a leap year, neither days nor year changes, so the loop never terminates

Why Can’t We Get Rid of All Errors from Software?
• Writing programs is not easy
  – Tons of issues to consider
  – Reliability and security are hard for programmers to reason about
  – There is a lack of tools other than testing
• Statistics: 30-85 errors are made per 1000 lines of source code
• Testing helps
  – However, even extensively tested software contains 0.5-3 errors per KLOC

How Big are Software Systems Today?

    Year   Operating System   SLOC (Million)
    1993   Windows NT 3.1     4-5
    1994   Windows NT 3.5     7-8
    1996   Windows NT 4.0     11-12
    2000   Windows 2000       More than 29
    2001   Windows XP         40
    2006   Windows Vista      ~50
           Windows 7          ???
           Windows 8          ???
           Windows 10         ???

• Now multiply this many lines of code by the error rate
  – E.g., Windows XP: 40 million SLOC at 0.5-3 errors per KLOC gives roughly 20,000-120,000 remaining errors

How Can We Possibly Improve the Situation?
• Program analysis
  – Build a model of the program
  – Analyze the model to search for errors
• Examples
  – Show that code doesn’t have security vulnerabilities such as buffer overflows
  – Show that code doesn’t go into infinite loops
• In general, formal methods research
  – Program analysis
  – Model checking
  – Theorem proving
    • Developing formal machine-checked proofs on existing code
    • Or proof by construction

Analyzing Source Code
• Typically, program analysis is performed on source code (or some equivalent intermediate language)
• Benefits
  – Source code is structured
  – Source code has rich information (e.g., types) to help analysis
  – Results on source code are understandable to programmers

Analyzing Binary Code
• However, there are situations when source code is not available
  – Third-party code/libraries
  – Legacy systems where source code is lost
• Further, analyzing binary code means no need to trust the compiler
  – Compiler bugs are not so rare
  – Even if the source code is secure, the compiled binary code may not be

Buggy Compilers
[Diagram: Source Code → Compiler → Executable Code]
• Hundreds of compiler bugs found in recent work
  – [Yang et al. PLDI 2011], [Wang et al. SOSP 2013]

Further, the Compiler May Not Understand Security Requirements

    int *password = (int *)malloc(sizeof(int)*length);
    read_password(password, length);        /* read the password */
    process(password, length);              /* process the password */
    memset(password, 0, sizeof(int)*length); /* wipe out the password (size in bytes) */

• The example
  – Erase the password from memory after it is no longer needed
  – To mitigate certain security attacks
• The compiler, however
  – May remove “memset(password, 0, sizeof(int)*length)” during dead-code elimination
    • Because password isn’t read again after the call to process
  – This was the case for the Microsoft Visual C++ .NET compiler

Analyzing Binary Code is More Challenging
• Binary code is unstructured
  – Uses gotos
• Binary code may not have meta information
  – May not have symbols, types, etc.
  – Program analysis has to use some algorithm to recover (partial) meta information
• Results on binary code are not easily understandable
• Binary code is architecture specific
  – The same source code can be compiled to x86, ARM, SPARC, MIPS, etc.
  – Each is different
• …

COURSE SUMMARY

This Course
• Theme: covers the state of the art of binary-level program-analysis techniques
• Program analysis: static or dynamic
• Static analysis: analyze the code before it runs
  – Build abstractions (approximations) of programs
  – These abstractions allow the identification of programming errors
• Dynamic analysis: analyze the code as it is running
  – Monitor the state of the program during runtime
    • State: instruction, registers, memory, I/O device states, …
  – Pro: more accurate information is available at runtime
  – Con: analyzes only a particular run of the code; poor code coverage

Topics Covered by the Course
• x86 assembly basics
• Static disassembly
• Basics of static analysis using Datalog
  – Dataflow analysis
  – Inter-procedural analysis
  – Points-to analysis
• Dynamic binary analysis
  – Taint tracking
• Binary-level code instrumentation
  – Software-based fault isolation
  – Control-flow integrity
• Topics if time permits
  – Binary-level type inference
  – Data structure reverse engineering

Administrivia
• Canvas (http://canvas.psu.edu/)
  – Homework submission
• Q&A forum on Piazza
• A public course website
  – http://www.cse.psu.edu/~gxt29/teaching/cse597s19/
  – Schedule and homework announcements
  – Slides
• No exams!
• Research-oriented final project
  – Format discussed later
• Research paper presentations and reviews
  – A significant part

Paper Presentation and Reviewing
• Purpose
  – Read some literature
  – Understand how papers are organized
  – Practice presentation skills
  – Practice understanding other people’s talks and asking thought-provoking questions
• Each student
  – Presents one research paper
  – Writes reviews for four research papers
• I will post a list of papers and the time for each paper
• However, we may not have enough time for every student to present a paper
  – Students who haven’t presented a paper by the end of the semester will need to write a paper survey on a topic

Format of Paper Presentation
• Presenter
  – Prepare slides for about 25 mins
  – Teach everyone about the paper
  – Paper critique: strengths and weaknesses of the techniques in the paper
• Q&A: 5-10 mins
• Evaluation
  – Audience 50%; Instructor 50%

Format for Paper Reviewing
• Review
  – A summary of the paper (problem, techniques, and results)
  – A critique of the paper
  – Possible future work
• 2 pages; single-spaced; single column; 12-point font
• Four reviews in total; one for each month (Jan, Feb, Mar, Apr)
  – Each review is due before the corresponding paper is presented

Academic Integrity
• Paper presentation
  – Make your own slides
  – There are likely other slides about the paper on the internet
    • Do not borrow verbatim! Rephrase the slides in your own words or from your own angle
    • Borrowing some figures would be fine, but add attribution!
• Paper reviewing
  – Do not copy sentences verbatim from the paper or other sources
    • Occasional quoting from the paper would be fine, but put such sentences in quotes and add citations
  – We run automatic plagiarism detection tools (turnitin.psu.edu)
• Projects
  – You cannot borrow code from any other source, including the internet or other students
  – We run automatic plagiarism detection tools
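
Follow-up to the Zune example earlier in these slides: the loop hangs because, on day 366 of a leap year, no branch makes progress. One possible repair is sketched below. It is an illustration only, not the actual firmware fix. The function name YearFromDays, the day_of_year output parameter, the body of IsLeapYear (the standard Gregorian rule), and the convention that days is 1-based (day 1 = January 1 of ORIGINYEAR) are all assumptions introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ORIGINYEAR 1980

/* Assumed helper: standard Gregorian leap-year rule. */
bool IsLeapYear(int year) {
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/*
 * Hypothetical repaired conversion. `days` is assumed to be a 1-based
 * day count starting at January 1 of ORIGINYEAR. Returns the calendar
 * year; if day_of_year is non-NULL, stores the 1-based day within it.
 */
int YearFromDays(int days, int *day_of_year) {
    int year = ORIGINYEAR;
    assert(days >= 1);
    for (;;) {
        int days_in_year = IsLeapYear(year) ? 366 : 365;
        /* Stop as soon as the remaining days fit in the current year.
         * This also covers days == 366 in a leap year, the case in
         * which the original loop made no progress. */
        if (days <= days_in_year) {
            break;
        }
        days -= days_in_year;
        year += 1;
    }
    if (day_of_year != NULL) {
        *day_of_year = days;
    }
    return year;
}
```

Under these assumptions, December 31, 2008 is day 10593, and YearFromDays(10593, &d) returns 2008 with d set to 366 instead of looping forever. The design choice is to compute the length of the current year once per iteration and compare against it, so every iteration either terminates or strictly decreases days.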