Build It, Break It, Fix It: Contesting Secure Development
Andrew Ruef, Michael Hicks, James Parker, Dave Levin, Michelle L. Mazurek (University of Maryland); Piotr Mardziel (Carnegie Mellon University)

ABSTRACT

Typical security contests focus on breaking or mitigating the impact of buggy systems. We present the Build-it, Break-it, Fix-it (BIBIFI) contest, which aims to assess the ability to securely build software, not just break it. In BIBIFI, teams build specified software with the goal of maximizing correctness, performance, and security. The latter is tested when teams attempt to break other teams' submissions. Winners are chosen from among the best builders and the best breakers. BIBIFI was designed to be open-ended: teams can use any language, tool, process, etc. that they like. As such, contest outcomes shed light on factors that correlate with successfully building secure software and breaking insecure software. During 2015, we ran three contests involving a total of 116 teams and two different programming problems. Quantitative analysis from these contests found that the most efficient build-it submissions used C/C++, but submissions coded in other statically-typed languages were less likely to have a security flaw; build-it teams with diverse programming-language knowledge also produced more secure code. Shorter programs correlated with better scores. Break-it teams that were also successful build-it teams were significantly better at finding security bugs.

1. INTRODUCTION

Cybersecurity contests [24, 25, 11, 27, 13] are popular proving grounds for cybersecurity talent. Existing contests largely focus on breaking (e.g., exploiting vulnerabilities or misconfigurations) and mitigation (e.g., rapid patching or reconfiguration). They do not, however, test contestants' ability to build (i.e., design and implement) systems that are secure in the first place. Typical programming contests [35, 2, 21] do focus on design and implementation, but generally ignore security. This state of affairs is unfortunate because experts have long advocated that achieving security in a computer system requires treating security as a first-order design goal [32]; it is not something that can be added after the fact. As such, we should not assume that good breakers will necessarily be good builders [23], nor that top coders necessarily produce secure systems.

This paper presents Build-it, Break-it, Fix-it (BIBIFI), a new security contest with a focus on building secure systems. A BIBIFI contest has three phases. The first phase, Build-it, asks small development teams to build software according to a provided specification that includes security goals. The software is scored for being correct, efficient, and feature-ful. The second phase, Break-it, asks teams to find defects in other teams' build-it submissions. Reported defects, proved via test cases vetted by an oracle implementation, benefit a break-it team's score and penalize the build-it team's score; more points are assigned to security-relevant problems. (A team's break-it and build-it scores are independent, with prizes for top scorers in each category.) The final phase, Fix-it, asks builders to fix bugs and thereby get points back if the process discovers that distinct break-it test cases identify the same defect.

BIBIFI's design aims to minimize the manual effort of running a contest, helping it scale. BIBIFI's structure and scoring system also aim to encourage meaningful outcomes, e.g., to ensure that the top-scoring build-it teams really produce secure and efficient software. Behaviors that would thwart such outcomes are discouraged. For example, break-it teams may submit a limited number of bug reports per build-it submission, and will lose points during fix-it for test cases that expose the same underlying defect or a defect also identified by other teams. As such, they are encouraged to look for bugs broadly (in many submissions) and deeply (to uncover hard-to-find bugs).
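To make these scoring incentives concrete, the following sketch models how break-it reports might be tallied once the fix-it phase has grouped test cases by underlying defect. All point values, data shapes, and the credit-splitting rule are invented for illustration; they are not BIBIFI's actual scoring formulas.

# Toy model of break-it scoring with fix-it deduplication. Point values and
# data shapes are hypothetical; they are not BIBIFI's real rules.
from collections import defaultdict

# Hypothetical weights: security-relevant defects count for more than
# ordinary correctness bugs.
POINTS = {"security": 50, "correctness": 10}

# Each report: (breaker_team, builder_team, defect_id, kind). During fix-it,
# reports whose test cases a single fix addresses share the same defect_id.
reports = [
    ("break1", "build7", "d1", "security"),
    ("break2", "build7", "d1", "security"),   # duplicate of d1: shared credit
    ("break1", "build7", "d2", "correctness"),
    ("break3", "build9", "d3", "security"),
]

def score(reports):
    breaker_score = defaultdict(float)
    builder_score = defaultdict(float)
    # Group duplicate reports of the same underlying defect in the same target.
    by_defect = defaultdict(list)
    for breaker, builder, defect_id, kind in reports:
        by_defect[(builder, defect_id, kind)].append(breaker)
    for (builder, _defect_id, kind), breakers in by_defect.items():
        points = POINTS[kind]
        builder_score[builder] -= points              # one penalty per unique defect
        for team in breakers:
            breaker_score[team] += points / len(breakers)  # duplicates split the credit
    return dict(breaker_score), dict(builder_score)

if __name__ == "__main__":
    breakers, builders = score(reports)
    print("break-it scores:", breakers)
    print("build-it penalties:", builders)

In this toy model a builder is penalized once per unique defect no matter how many duplicate test cases exposed it, while breakers who report a defect that others also found share the credit, capturing the incentive to hunt for bugs broadly rather than pile onto well-known ones.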
In addition to providing a novel educational experience, BIBIFI presents an opportunity to study the building and breaking process scientifically. In particular, BIBIFI contests may serve as a quasi-controlled experiment that correlates participation data with final outcome. By examining artifacts and participant surveys, we can study how the choice of build-it programming language, team size and experience, code size, testing technique, etc. can influence a team's (non)success in the build-it or break-it phases. To the extent that contest problems are realistic and contest participants represent the professional developer community, the results of this study may provide useful empirical evidence for practices that help or harm real-world security. Indeed, the contest environment could be used to incubate ideas to improve development security, with the best ideas making their way to practice.

This paper studies the outcomes of three BIBIFI contests that we held during 2015, involving two different programming problems. The first contest asked participants to build a secure, append-only log for adding and querying events generated by a hypothetical art gallery security system. Attackers with direct access to the log, but lacking an "authentication token," should not be able to steal or corrupt the data it contains. The second and third contests were run simultaneously. They asked participants to build a pair of secure, communicating programs, one representing an ATM and the other representing a bank. Attackers acting as a man in the middle (MITM) should neither be able to steal information (e.g., bank account names or balances) nor corrupt it (e.g., stealing from or adding money to accounts).
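As one concrete illustration of the first problem's security goal, the sketch below shows a plausible way a build-it team might make the log tamper-evident: each entry carries an HMAC keyed by a secret derived from the authentication token and chained to the previous entry's tag, so someone lacking the token cannot modify or reorder entries without detection. This is only an illustrative sketch, not the contest's reference solution; a full submission would also need to encrypt entries (to prevent data theft) and to detect truncation of the log's tail, for example by storing the most recent tag separately.

# Simplified sketch of a tamper-evident, append-only log keyed by an
# authentication token. Confidentiality (encrypting entry contents) and
# tail-truncation detection are omitted for brevity.
import hashlib
import hmac
import json

def derive_key(token: str, salt: bytes = b"bibifi-demo-salt") -> bytes:
    # Hypothetical KDF parameters; any standard password-based KDF would do.
    return hashlib.pbkdf2_hmac("sha256", token.encode(), salt, 100_000)

def append_entry(log: list, key: bytes, event: str) -> None:
    prev_tag = log[-1]["tag"] if log else ""
    payload = json.dumps({"event": event, "prev": prev_tag}, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    log.append({"payload": payload, "tag": tag})

def verify_log(log: list, key: bytes) -> bool:
    prev_tag = ""
    for entry in log:
        expected = hmac.new(key, entry["payload"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["tag"]):
            return False  # entry was modified or forged without the token
        if json.loads(entry["payload"])["prev"] != prev_tag:
            return False  # entries were reordered or removed from the middle
        prev_tag = entry["tag"]
    return True

if __name__ == "__main__":
    key = derive_key("secret-token")
    log = []
    append_entry(log, key, "ARRIVE gallery guest=alice")
    append_entry(log, key, "LEAVE gallery guest=alice")
    assert verify_log(log, key)
    log[0]["payload"] = log[0]["payload"].replace("alice", "mallory")
    assert not verify_log(log, key)  # tampering is detected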
Two of the three contests drew participants from a MOOC (Massive Open Online Course) on cybersecurity. These participants (278 total, comprising 109 teams) had an average of 10 years of programming experience and had just completed a four-course sequence including courses on secure software and cryptography. The third contest involved U.S.-based graduate and undergraduate students (23 total, comprising 6 teams) with less experience and training.

BIBIFI's design permitted it to scale reasonably well. For example, one full-time person and two part-time judges ran the first 2015 contest in its entirety. This contest involved 156 participants comprising 68 teams, which submitted more than 20,000 test cases. And yet, organizer effort was limited to judging whether the few hundred submitted fixes addressed only a single conceptual defect; other work was handled automatically or by the participants themselves.

Rigorous quantitative analysis of the contests' outcomes revealed several interesting, statistically significant effects. Considering build-it scores: writing code in C/C++ increased build-it scores initially, but also increased the chances of a security bug being found later. Interestingly, the increased insecurity for C/C++ programs appears to be almost entirely attributable to memory-safety bugs. Teams that had broader programming language knowledge or that wrote less code also produced more secure code.

[...] a quantitative and qualitative analysis of the results. We will be making the BIBIFI code and infrastructure publicly available so that others may run their own competitions; we hope that this opens up a line of research built on empirical experiments with secure programming methodologies. More information, data, and opportunities to participate are available at https://builditbreakit.org.

The rest of this paper is organized as follows. We present the design of BIBIFI in §2 and describe specifics of the contests we ran in §3. We present the quantitative analysis of the data we collected from these contests in §4, and qualitative analysis in §5. We review related work in §6 and conclude in §7.

2. BUILD-IT, BREAK-IT, FIX-IT

This section describes the goals, design, and implementation of the BIBIFI competition. At the highest level, our aim is to create an environment that closely reflects real-world development goals and constraints, and to encourage build-it teams to write the most secure code they can, and break-it teams to perform the most thorough, creative analysis of others' code they can. We achieve this through a careful design of how the competition is run and how various acts are scored (or penalized). We also aim to minimize the manual work required of the organizers (to allow the contest to scale) by employing automation and proper participant incentives.

2.1 Competition phases

We begin by describing the high-level mechanics of what occurs during a BIBIFI competition. BIBIFI may be administered on-line, rather than on-site, so teams may be geographically distributed. The contest comprises three phases, each of which lasts about two weeks for the contests we describe in this paper.

BIBIFI begins with the build-it phase. Registered build-it teams aim to implement the target software system according to a published specification created by the contest organizers.
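Much of the automation mentioned above centers on vetting break-it submissions: a reported defect counts only if a test case shows the targeted implementation deviating from the organizers' oracle implementation. The sketch below illustrates that idea with a hypothetical command-line harness; the executable names, the test input, and the choice to compare exit code and standard output are all assumptions made for illustration, not the actual BIBIFI infrastructure.

# Illustrative sketch (not the real BIBIFI harness): a break-it test case is
# accepted as a defect report only if the target disagrees with the oracle.
import subprocess

def run_test(executable, test_input, timeout=10):
    """Run one implementation on a test case and capture its observable behavior."""
    # Timeouts and crashes are left unhandled here for brevity.
    proc = subprocess.run(
        [executable],
        input=test_input,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return (proc.returncode, proc.stdout)

def vet_defect_report(oracle_exe, target_exe, test_input):
    """Accept a break-it report only if the target's behavior deviates from the oracle's."""
    return run_test(target_exe, test_input) != run_test(oracle_exe, test_input)

if __name__ == "__main__":
    # Hypothetical paths and test input; a real harness would sandbox these runs.
    confirmed = vet_defect_report("./oracle", "./team42/submission", "some test input")
    print("defect confirmed" if confirmed else "report rejected")

Because the oracle defines correct behavior, such a check can run without human judgment, which is what keeps organizer effort focused on the one task that does require judgment: deciding whether each submitted fix addresses a single conceptual defect.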