The Good, The Bad, And The Ugly: Stepping on the Security Scale

Mary Ann Davidson
Oracle Corporation
Redwood Shores, California
[email protected]

Abstract: Metrics are both fashionable and timely: many regulations that affect cybersecurity rely upon metrics – albeit, in many cases, of the checklist variety – to ascertain compliance. However, there are far more effective uses of security metrics than external compliance exercises. The most effective use of security metrics is to manage better, which may include:
• Making a business case for needed change
• Focusing scarce resource on the most pressing problems (those with the biggest payoff for resolution)
• Helping spot problems – or successes – early
• Addressing "outside" concerns or criticisms fairly and objectively

A successful security metric should:
• Motivate good/correct behavior (not promote evasive tactics just to make the numbers look good)
• Prompt additional questions ("Why? How?") to understand what is influencing the numbers
• Answer basic questions of goodness (e.g., "Are we doing better or worse?")
• Be objective and measurable, even if correlation may not equal causality

This paper explores the qualities of good security metrics and their application in security vulnerability handling as well as a software assurance program.

Keywords: security metrics, vulnerability handling, software assurance

I. WHY SECURITY METRICS?

There are many management fads that wax and wane in terms of the rate of adoption and the number of breathless presentations, seminars, and books on the topic. One of the current waves sweeping the information technology (IT) world is metrics: the idea that security can be and should be measurable. In a way, this is not an unusual development. Businesses in general rely on measurement as a key management indicator. For example, what are audited financial statements but measurements of business activity, and what are ratios such as earnings per share (EPS), net present value (NPV), and internal rate of return (IRR) but key comparison metrics? Information technology is a business enabler and should be subject to management oversight (including reasonable and appropriate measurement) just as other lines of business are.

Add to that the fact that information is becoming (either directly or indirectly) more regulated through vehicles such as the Federal Information Security Management Act (FISMA) and Sarbanes-Oxley, and regulation often requires one to attest to business practices to the satisfaction of independent auditors, which usually necessitates measurable proof points (metrics). The US Federal Government is also becoming security metrics-focused, through programs such as the Federal Desktop Core Configuration (FDCC) and through sponsorship of security measurement schemes such as Common Weakness Enumeration (CWE) and Common Vulnerabilities and Exposures (CVE) under the tagline "making security measurable."

Lastly, the practice of information security naturally lends itself to measurement, if for no other reason than the difficulty of determining "How much security is enough?" and "Am I doing the right things?" Anyone tasked with protecting an enterprise's networks must have a basic handle on such indicators as "What are my most critical systems?" and "What is the 'security-readiness state' – e.g., patch level and secure configuration status – of those systems?"

While faddishness is never a good reason to adopt a business practice, there are clearly many good reasons to implement metrics programs around security. One of the most often-cited bromides is "If you can't measure it, you can't manage it." (This isn't necessarily true, by the way, since many managers supervise the people who work for them without explicitly measuring their work. Imagine the revolt if employee compensation were based on such metrics as the number of emails sent and received, the number of hours at work, or the volume of phone calls placed.) It is true, however, that a good metrics program can help you manage better. In fact, the primary justification for implementing a metrics program in security or any other discipline – absent a regulatory requirement – is simply to manage better. Then-Mayor Rudy Giuliani famously and dramatically improved the governability and livability of the city of New York through a vigorous metrics program (including, for example, measuring the number of guerrilla window washers accosting commuters) [1].
"Manage better" may include the following:
• Answering basic questions about operational security practice, such as "What are we doing and not doing?"
• Spotting trends – both positive and negative – in operational practice, such as "Are we doing better or worse since the last measurement period?"
• Raising additional questions (e.g., "Why is X happening?")
• Comparing entities (e.g., "How is group X doing vs. group Y?"), so long as the comparisons are "apples to apples"
• Allocating resources well (e.g., "The volume of X has increased by Z, and we've historically needed Y resource to handle the increment")

Metrics can be either external or internal (as noted, audited financial statements and the financial analysis ratios derived from them are a form of external metrics). The two need not be different; however, metrics used for external purposes will likely be focused on looking good externally, while those used internally can focus more objectively on "managing better." Put differently, internal metrics provide more room for improvement if the numbers are not "good." That's the point, actually. If metrics do not ultimately help an organization be honest with itself, they might as well not exist.

II. FOLLOW THE YELLOW BRICK ROAD

Metrics do not grow on trees: implementing a metrics program requires resourcing just as any other project does, with time, money, and people to support the desired outcome.
To that effect – given that most organizations do not have unlimited time, money, or people – it is important to consider the value of more (or better) information as captured in a metrics program. That is, there may be a number of things you would like to measure. In a perfect world, you could measure how many angels dance on the head of a pin, if that were of interest. However, unless you are already capturing the information you'd like to measure, it will cost you something to obtain the data or measurement. The cost to obtain the metric (such as investment in changing processes or creating repositories to capture information) needs to be weighed against the value of more information. If capturing and analyzing data will not improve your ability to make decisions or manage better, you are better off not capturing that information and using the time and resource to obtain another metric that will lead to better decision-making. This – being explicitly cognizant of the value of more information – is the difference between academic discussions of process- and artifact-focused security metrics and what resource-constrained entities do to manage better.

A good starting place for a metrics program is simply data mining what you already have. For example, many organizations have processes and organizations associated with the IT function that track many aspects of their IT systems (such as help desk tickets, configuration management actions, and the like). Such data stores may already contain information relevant to security that can be incorporated into a metrics program. Adding to that baseline information by creating a metrics plan – a priority-ordered listing of "What else would we like to know?," "How can we obtain that information?," and "How will we use this data?" – is key to creating an implementable metrics program.
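As a minimal illustration of such a plan – the field names, weights, and example entries below are hypothetical, not Oracle's actual plan – each candidate metric can be recorded with its expected decision value and its collection cost, so the list stays priority-ordered by net value:

```python
# Hypothetical metrics plan: each entry answers "what else would we like to
# know?", "how can we obtain it?", and "how will we use it?"
metrics_plan = [
    {"metric": "open security bugs by age bucket",
     "source": "bug database (already captured)",
     "use": "spot backlog skew early",
     "value": 9, "collection_cost": 1},
    {"metric": "fix completeness (reopen rate)",
     "source": "bug status history",
     "use": "incentivize root-cause fixes",
     "value": 8, "collection_cost": 3},
    {"metric": "angels dancing on a pin",
     "source": "none available",
     "use": "none",
     "value": 0, "collection_cost": 10},
]

# Priority order: highest net value (value minus cost) first; metrics whose
# cost exceeds their decision value drop off the bottom of the list.
for entry in sorted(metrics_plan,
                    key=lambda e: e["value"] - e["collection_cost"],
                    reverse=True):
    print(f'{entry["metric"]}: net value {entry["value"] - entry["collection_cost"]}')
```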
For example, at Oracle, the security metrics program associated with software assurance focuses in part upon the security vulnerability handling function (there are also metrics associated with program management of assurance). The foundation of the metrics program was the Oracle bug database, since the development organization already had processes and data stores for logging, tracking, and resolving bugs of all kinds. Furthermore, security bugs were already flagged separately, because Oracle vulnerability handling processes mandate that security bugs are not published to customers and that "need to know" is enforced on security bugs, as implemented by row-level data access in the bug database (so that only developers working on a security bug fix and their management chain can access details about a security bug). The reason security bugs are not published is that Oracle believes it puts customers at risk to publish security vulnerability details where there is no patch available.

The foundation of the security metrics wiki – the internally accessible metrics page used to analyze security vulnerability trends – is likewise the Oracle bug database (since to do otherwise would be to create an entirely separate and possibly less up-to-date data store). Also, the focus on security vulnerabilities is pragmatic: over time, as part of the Oracle software security assurance program, one expects the rate of serious security defects in software to go down. The rate of security vulnerabilities also goes directly to Oracle Corporation's security brand, and represents a cost to Oracle and to its customers. Specifically, Oracle has so many products, running on so many operating systems (e.g., the Oracle database runs on 19 operating systems), and so many versions in support that almost anything you can do to prevent security vulnerabilities – or find and fix them faster – represents significant cost avoidance.

Even in the short run, it is important to ensure that (a sketch of these checks follows the list):
• security vulnerabilities are handled reasonably promptly (that is, the percentage of vulnerabilities by aging bucket is not skewing to older buckets)
• the backlog of unfixed security vulnerabilities is not growing (that is, worst case, there is a steady state of find-and-fix) or is shrinking
• the bulk of security vulnerabilities are found internally and not by customers or third party security researchers
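A minimal sketch of these three health checks, assuming security bug records exported from a bug database with hypothetical open-date, close-date, and reporter fields (an illustration, not Oracle's actual tooling):

```python
from datetime import date
from collections import Counter

# Hypothetical bug record: (opened, closed_or_None, reporter_bucket)
bugs = [
    (date(2009, 1, 5), None, "researcher"),
    (date(2009, 3, 2), date(2009, 4, 1), "internal"),
    (date(2009, 5, 20), None, "internal"),
    (date(2009, 6, 11), None, "customer"),
]

def age_bucket(opened, today):
    days = (today - opened).days
    return "0-30" if days <= 30 else "31-60" if days <= 60 \
        else "61-90" if days <= 90 else "90+"

today = date(2009, 7, 1)
open_bugs = [b for b in bugs if b[1] is None]

# 1. Aging: is the open backlog skewing toward the older buckets?
print(Counter(age_bucket(b[0], today) for b in open_bugs))

# 2. Backlog trend: persist this count each period and compare period over
#    period; growth means find-and-fix is falling behind.
print("open now:", len(open_bugs))

# 3. Find-source ratio: what fraction of all security bugs were found internally?
internal = sum(1 for b in bugs if b[2] == "internal")
print(f"found internally: {internal / len(bugs):.0%}")
```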
Another reason to focus metrics on vulnerabilities is that development already has many metrics around development process and practice (including tracking open bugs in software). The primary goal of creating the security metrics wiki is to give development the tools it needs to manage its own workload. Development at Oracle "owns" its own code and in general does "take care of business." Highlighting security-specific bug trends of interest and exposing those to development is more likely to result in a positive outcome than producing metrics that are used only by the software assurance team (which does not own code or fix defects). Put differently, a self-policing model in which developers have the tools to manage their own work is preferable to a "security police" model in which a small group uses metrics offensively, to punish development infractions.

III. ACCENTUATE THE POSITIVE

A general difficulty in a metrics program is creating measurement that is not only useful but creates incentives for the correct behavior. "Correct behavior" can be demonstrated by a positive trend (e.g., fewer security defects in software) or can be as simple as a "hygiene metric," such as how many people have completed a mandatory security training class and passed it. Note: the metric is "hygienic" in that it cannot necessarily be proved that there is a cause and effect between "do X and Y results," but it is nonetheless considered a positive achievement towards a program objective. Put differently, it's not clear how strongly correlated driver's education training and a reduction in accidents involving teenage drivers are, but few people would turn a fifteen-year-old loose with the car keys without basic driver's education.

A subtlety of "incentivizes the correct behavior" is making sure that the metrics are a fair measurement and not easily "gamed" by the participants (gaming may have unintended and often perverse side effects). That is, a poorly formed metric may create a circumstance in which those being measured change their behavior to optimize the metric instead of optimizing the behavior whose natural outcome leads to an improved metric. For example, one Oracle development organization many years ago ran bug reports for senior management every Friday at noon (reporting the number of bugs still open in the product suite). Product managers would log into the bug database Friday morning and change the status of a number of bugs to "closed" or "not a bug," wait until the reports were run, then reopen the bugs later to continue working on them. It was the easiest way to make the numbers look good at noon on Friday.

Using a "point in time" metric, instead of either analyzing the overall trend or triangulating several data points, resulted in "gaming" behavior. In addition to the fact that the gaming made the metric unusable (the number of open bugs at noon Friday was a largely meaningless number), the activity consumed in changing bug status crowded out more useful work. This is what you don't want if you use metrics as a management tool instead of a blame-assignment tool.

Triangulated metrics, in which several factors are used to arrive at a combined attribute, are more resistant to gaming as well as potentially providing better measurement. (Navigators, for example, require both latitude and longitude to "fix" their position in the ocean: neither latitude nor longitude alone is sufficient. Nor is a GPS system sufficient to fix your position if you drop it overboard.)
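The difference is easy to see in code. A minimal sketch with invented weekly counts: a single Friday-noon reading can be doctored for an hour, while a retained series of snapshots makes a one-off dip stand out instead of defining the number:

```python
# Hypothetical weekly snapshots of open-bug counts, persisted each period
# rather than read live at Friday noon (a live count can be gamed by
# briefly closing bugs before the report runs).
snapshots = [112, 118, 125, 131, 140]  # oldest to newest

point_in_time = snapshots[-1]  # all the Friday-noon report ever showed

# Trend over the whole series: average week-over-week change.
deltas = [b - a for a, b in zip(snapshots, snapshots[1:])]
trend = sum(deltas) / len(deltas)
print(f"open now: {point_in_time}, average weekly growth: {trend:+.1f}")
```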
It isn’t necessarily sequence of characters (Ctrl-X) inputted to an application the total number of bugs that is a problem (after all, pre- causes a core dump (and therefore, anyone sending that sequence of characters to an application could cause a denial of service attack). The real security problem is not that Ctrl- organization can reasonably fix all security bugs of all X causes a core dump; it’s that a failure to validate input severity on all old product versions. Also, it is important to correctly leads to a denial of service attack. Should a customers (as well as efficient resource allocation) to ensure developer only modify the code to handle Ctrl-X gracefully, that at least the most severe issues (that are likely to become the likelihood is that the reporter will log additional bugs in public or that are already public) are fixed for affected and succeeding weeks regarding Ctrl-Y, Ctrl-Z, and so forth. supported platforms. Even if the reporter does not try additional combinations to force a core dump, should the vendor release a security IV. MIRROR MIRROR ON THE WALL, WHICH PRODUCT IS patch, it may well be decompiled (by other researchers) to THE LEAST BUGGIEST OF ALL? figure out “What changed” to determine “How can we Metrics gaming becomes an issue when metrics moves exploit the vulnerability?” The original coding error is now from a management function to a reporting function, or from compounded – and the risk to customers increased – private measurement to public disclosure. The temptation to because: “game” metrics is never greater than in instances where numbers will be used for comparative purposes or • The fix was not complete or correct competitive purposes, most especially when they are public. • A customer may have applied a costly emergency Many vendors routinely take pot shots at one another patch, yet is still vulnerable to a variant of the over a metric that can broadly be categorized as “number of weakness published security vulnerabilities in competing products.” • And may either experience an exploit or have to Absent much transparency or disclosure in how products are apply another expensive emergency patch developed or “how buggy the code base actually is,” “published vulnerabilities” becomes the security metric In short, if you only measure developers on how fast they vendors use against one another (and that customers rely close a security bug, their incentive is to fix the exact issue upon as a pseudo-security quality metric). The metric can be reported and not look at the root cause. You get the behavior compiled in as simple a fashion as counting the number of you measure for, but not what customers actually need. line items in security advisories that a competitor releases As a result of the above three factors that make security during a year in a competing product. The metric as used is vulnerabilities different from non-security vulnerabilities, it inherently flawed for reasons that will become clear below. is critical to capture metrics that measure the desired First of all, a metric “comparison,” to be relevant, needs outcome and incentivize the correct behavior. A more robust to compare apples to apples, not apples to road apples. 
IV. MIRROR MIRROR ON THE WALL, WHICH PRODUCT IS THE LEAST BUGGIEST OF ALL?

Metrics gaming becomes an issue when metrics move from a management function to a reporting function, or from private measurement to public disclosure. The temptation to "game" metrics is never greater than when numbers will be used for comparative or competitive purposes, most especially when they are public.

Many vendors routinely take pot shots at one another over a metric that can broadly be categorized as "number of published security vulnerabilities in competing products." Absent much transparency or disclosure in how products are developed or "how buggy the code base actually is," "published vulnerabilities" becomes the security metric vendors use against one another (and that customers rely upon as a pseudo-security-quality metric). The metric can be compiled in as simple a fashion as counting the number of line items in the security advisories a competitor releases during a year for a competing product. The metric as used is inherently flawed, for reasons that will become clear below.

First of all, a metric "comparison," to be relevant, needs to compare apples to apples, not apples to road apples. There are few equivalent ways to measure security defects in software, and the ones that are equivalent sometimes fail because of lack of transparency and disclosure (as noted earlier, public metrics are likely not only to measure different things but potentially to report different (i.e., "prettied up") numbers, since they are not audited or otherwise independently verified).

An obvious comparative metric for multiple bodies of code is "defects per thousand lines of code (KLOC)." Comparing defects per KLOC in two bodies of code, assuming the sample is large enough, is a fair comparison. However, defects per KLOC is not a comparison of "security" per se; rather, it is a quality metric that may nonetheless have security implications. That is, some percentage of the defects per KLOC may be security vulnerabilities, but absent knowing how many are, it's not a security comparison at all, merely a quality metric.
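For instance (with invented numbers), the arithmetic is simple, and so is the gap it leaves:

```python
# Invented figures for two code bases of very different size.
product_a = {"defects": 1200, "kloc": 4000}   # 0.30 defects/KLOC
product_b = {"defects": 90,   "kloc": 200}    # 0.45 defects/KLOC

for name, p in {"A": product_a, "B": product_b}.items():
    print(f"product {name}: {p['defects'] / p['kloc']:.2f} defects/KLOC")

# B looks "buggier" per KLOC -- but without knowing what fraction of each
# count is security-relevant, this is a quality comparison, not a security one.
```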
Another aspect of comparing security vulnerabilities is that not all security vulnerabilities are of comparable severity. Clearly severity matters when talking about security vulnerabilities, since some defects are far more relevant than others. A low severity, non-exploitable, or not-easily-exploitable security defect is not in the same category as one through which anybody can (remotely) become SYSADMIN, for example. Note: it is true that lower severity security vulnerabilities may be exploited in a so-called combined attack, but it is difficult to anticipate these and, absent a combined-attack exploit in the wild, vendors are better off using an objective (stand-alone) severity ranking for resource allocation purposes.

In fact, security vulnerabilities are inherently non-comparable unless they are "rated" according to a severity ranking such as CVSS. While there is inevitably some judgment involved in assigning CVSS scores, they nonetheless enable some "apples to apples" comparisons between comparable products, to the extent all product vulnerabilities are assigned a CVSS score by vendors.
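A sketch of what a severity-normalized comparison might look like, using invented advisory data scored with CVSS: counts are compared only within the same severity band, never as a single total.

```python
# Invented published-vulnerability CVSS scores for two comparable products.
advisories = {
    "product_x": [9.3, 7.5, 7.5, 4.0, 4.0, 4.0],
    "product_y": [9.3, 9.0, 2.1],
}

def band(cvss):
    return "critical" if cvss >= 9.0 else "high" if cvss >= 7.0 else "low/medium"

for product, scores in advisories.items():
    counts = {}
    for s in scores:
        counts[band(s)] = counts.get(band(s), 0) + 1
    print(product, counts)

# By raw total, product_y (3) "beats" product_x (6); by critical-band count,
# the picture reverses -- which is why unrated totals mislead.
```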
Non-comparability of product factoring or product size is also a consideration. For example, comparing numbers of security vulnerabilities in competing products is not always useful to the extent that the products are significantly different as regards features and functions, the size of the code base, or the way the product is factored (that is, extra-cost or additional product functionality may be bundled into one product where it is factored as a separate product by another vendor). It is, in a way, like comparing termite inspection reports from a 20,000 square foot house and a doghouse. You might know the absolute number of bugs in each structure, but that doesn't tell you whether the McMansion can truly be considered "buggier" than the doghouse, or vice versa.

Another difficulty with comparing numbers of published security vulnerabilities in products is that there is no standard (or objective) way of actually counting security vulnerabilities. If a change to one function addresses the root cause of several reported security bugs (involving, say, failure to validate input correctly), is that to be counted as one security bug or several? A common criticism of many vendors is that they do not, upon addressing externally reported vulnerabilities, address "nearby" vulnerabilities. For example, a reported vulnerability in an Apache mod should lead one to analyze other Apache mods for similar problems. Should a vendor who does so report "related" vulnerabilities in the same mod as part of a security advisory, the disclosure of what else was found (if the vendor discloses it) is likely to be held against the vendor. Ironically, "correct" behavior ("look for other issues and fix them") is likely to rebound more negatively on the vendor (at least in the short run) than if the vendor either doesn't look for related issues, or looks for and fixes additional issues without disclosing what else was found and fixed. (One can argue about "silent fixing" vs. full disclosure, but the bottom line is that finding and fixing more issues earlier is generally in customers' best interests.)

To that point, using published numbers of security vulnerabilities as a competitive comparison is inherently flawed, as it relies to a large extent on self-disclosure by the vendor and thus is "rigged to be gamed." Absent knowing a vendor's vulnerability disclosure policies (for example, do they self-disclose issues they find and fix themselves?), it is not possible to make any meaningful comparison based on "number of publicly disclosed vulnerabilities." One can assume that vendors find some portion of security vulnerabilities themselves: do they disclose all of these? Likely not. Oracle's own metrics substantiate that (taking all product families together) more than 90% of security vulnerabilities are found internally (a percentage that has increased since we began keeping metrics).

Furthermore, as more vendors move to a bundling strategy for security bugs (that is, security patches are released in regular patch bundles monthly, quarterly, or on another published schedule), does it really matter how many vulnerabilities are contained in the bundle? (Isn't that the point, really – to address a number of critical security issues in one patch rather than piecemeal?) Granted, knowing how many security issues are fixed in a patch may be cause for customer concern. Of course, absent knowing how many issues in a bundle are fixed-but-not-public, or whether the trend of "new vulnerabilities introduced in code" vs. "old vulnerabilities found and fixed" is positive (that is, whether newer versions are more secure), it is still not necessarily possible to draw security-worthiness inferences from published security vulnerability information. It's like drawing inferences about the size of an iceberg from the amount above water (except that with icebergs we at least have specific gravity information to work with, whereas with vulnerabilities there is no way to estimate what's below the water line).

The nail in the coffin of using "number of published security vulnerabilities" as a quality metric or for security comparison purposes is that it is not only gameable, but the metric itself invites gaming. In other words, if a vendor thinks that numbers of published vulnerabilities will be used against it competitively, its incentive is to not self-report significant vulnerabilities so as to make the numbers look better. Note: no vendor publicly reports every issue it fixes, for a multiplicity of reasons (e.g., the inability to fix all issues on all old versions argues for some discretion in reporting, to the extent those versions are at a risk that can't be mitigated); on balance, creating positive incentives to protect customers (instead of reasons to "game") is in industry's interests.

In a competitive market, businesses will use what competitive advantage they can to gain leverage over a competitor. Security-worthiness, to the extent that it can verifiably be a differentiator, can be a competitive advantage. However, the number of publicly disclosed security vulnerabilities is not objectively a fair competitive metric.

V. MINE'S BIGGER THAN YOURS

The corollary to the – potentially destructive – aspects of using externalized (public) security metrics competitively is the benefit of harnessing competitive metrics internal to an organization. Specifically, nobody likes to be seen as the person or group doing the worst at something. Peer pressure can be a force for good in a security program.

For example, Oracle has made a number of significant acquisitions in the past several years. As part of post-acquisition integration, we align development organizations from acquired companies with Oracle secure development practices. One of the difficulties of this process is that different acquired entities are at different points on the continuum of secure development practice, in regards to both "where they started" and "how they are aligning with the corporate secure development practices" (the umbrella term for which is Oracle Software Security Assurance (OSSA)). Organizations that, post-acquisition, find themselves under the umbrella of a mature development organization tend to "inherit" a number of development practices as they are integrated into the development group.
On the other hand, some acquired entities – mostly in vertical or industry-specific segments – are incorporated as largely standalone entities into organizations that are not primarily development organizations. There is thus, in these organizations, no long-established development practice to inherit.

For reasons including the disparity of "inherited development practice" as well as the sheer number of acquisitions, Oracle has instituted a more structured governance program around software assurance that is geared to examining the consistency of secure development practice across the entire organization. The primary purpose of the collected metrics – in this case, measurement of which elements of OSSA have been adopted by each organization and to what degree – is simply to answer "How are we doing in the uptake of OSSA software assurance programs?" The security compliance scorecard is reported to the Oracle Security Oversight Committee (OSOC) as well as the Chief Executive Officer (CEO).

The secondary purpose of metrics around OSSA adoption is to identify organizations that are slow in adoption or appear to be running into difficulties in adopting standard Oracle secure development practices. Groups that are slow, or running into difficulties, become targets for concentrated assistance. Specifically, rather than spreading Oracle Global Product Security (GPS) (the group responsible for monitoring OSSA compliance) resources thinly across the entire company, the resources are applied in concentrated form to the teams that are "slow learners" or "late adopters." This may include a product assessment conducted by the ethical hacking team, accelerated uptake of automated tools (e.g., static analysis), and other fast-tracked OSSA requirements.

The third reason for a governance structure is anticipation of greater customer interest in secure development practices and a requirement for transparency – perhaps driven by regulation. Anecdotally, more customer requests for proposals (RFPs) are requesting information on secure development practice, and being able to track, by development group, what is and is not being done enables more transparency and more accurate RFP responses. In at least one case, increased customer concern about the security of a particular product led to a "full court press" to increase the uptake of OSSA activities. This would not have been possible without the baseline reporting that the security compliance scorecard (metrics) enables.

Without making compliance an impossible goal to meet, the intention of the compliance scorecard is that what constitutes "compliance" will change over time. Specifically, as the state of the art in software assurance improves, Oracle will redefine what "compliance" means for a number of cells in the chart (Figure 1). For example, one compliance area is the use of automated tools, which is one of the best and cheapest ways to get avoidable, preventable defects out of one's code base. "Use of tools" as a compliance metric encompasses everything from identification of the appropriate and applicable tools by development group (e.g., static analysis, web vulnerability analysis, fuzzers, etc.) to deployment and degree of code coverage. Over time, we may identify (or develop) additional tools to be adopted, which means "compliance" will come to include use of new tools in the groups to which they are applicable.

An expected benefit of the security compliance scorecard is the – for lack of a better expression – shaming effect it has on groups that have had OSSA adoption issues. Specifically, one of the baseline elements of OSSA is security training – more specifically, completion of an online class on the Oracle Secure Coding Standards (OSCS) that is mandatory for most of development (e.g., developers, quality assurance (QA), release management, product management, and so on). Despite the OSCS class being online and modular (which allows people to take the class at their own pace, completing portions of it a bit at a time), adoption had been slow in some groups.
[Figure 1. Oracle Security Compliance Scorecard: a matrix of development organizations (each with its senior development manager and named security lead) against OSSA compliance areas – security lead identified, SPOCs, M&A checklist, secure coding practices training, product security checklist, third party security checklist, automated test tools, secure configuration install guides, security best practices, and critical patch update process – with each cell marked "In Compliance," "In Progress," or "Non Compliant."]

The single factor in rapidly increasing class adoption was the creation of a bar chart showing, for each organization, its percentage completion. Since the bar chart showed all organizations' training status side by side, it also highlighted who the "worst offenders" were. Reporting numbers by group as a participation percentage had never "grabbed" management attention: the bar chart showed not only who was compliant, but how far out of compliance some groups were relative to others.

Publishing the bar chart that showed who the outliers were had a dramatic and immediate effect upon compliance rates. Specifically, once the word got out that the charts had gone to executive management and the Oracle Security Oversight Committee – and that the report was being rerun and the results re-reported in a month's time – the compliance rate went straight up. "Compliance" here meant not merely that someone had attempted the class but, more specifically, "of the people in a development group required to take the class, who has taken it and passed it?" Ninety-five percent or better is considered "compliant" for tracking purposes.
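A minimal sketch of that computation, with hypothetical roster data (the 95% threshold is the one described above; the organization names and counts are invented for illustration):

```python
# (people required to take the class, people who have taken AND passed it)
training = {
    "Product Division 1": (400, 396),
    "Product Division 2": (250, 180),
    "Acquired Product 3": (120, 30),
}

THRESHOLD = 0.95  # "compliant" means 95% or better have taken and passed

# Sorting worst-first is the textual equivalent of the bar chart: the
# outliers lead the report.
for org, (required, passed) in sorted(training.items(),
                                      key=lambda kv: kv[1][1] / kv[1][0]):
    rate = passed / required
    status = "compliant" if rate >= THRESHOLD else "OUT OF COMPLIANCE"
    print(f"{org}: {rate:.0%} {status}")
```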

To reiterate that this was not intended to be a "gotcha" exercise, Oracle Global Product Security:
• Published the then-current scorecard about two months in advance of the semi-annual Oracle Security Oversight Committee meeting at which the numbers would be reported
• With a notation that the final results would be compiled and reported in two months' time
• With a reminder that the class was mandatory for most of development (individual managers were reminded of the specific names of those who worked for them who had to take the class)
• And with a request to comply with the training mandate as soon as feasible

In other words, this was geared as a reminder – with a long lead time – of the requirement to take the class, and notice that the results would be reported to executive management. One could argue that it should not require "reporting to the boss" to ensure compliance, but the fact remains that compliance rates dramatically increased once the results clearly showed the outliers. The results also showed groups that had made dramatic improvements, and the written analysis credited the security leads of those groups who had used the opportunity to exhort their teams to compliance. In other words, the report taken in toto was geared to show "who has done very well" as well as "who is falling behind."

Figures 2 and 3 show, respectively, compliance rates before the numbers were reported to the Oracle Security Oversight Committee, and compliance rates four months later (in reality, the most dramatic compliance increase occurred within the first two months).

[Figure 2. OSCS Training Completion Status – Pre-OSOC Meeting: "Secure Coding Practices Completion Summary," a bar chart of percent trained by development organization.]

[Figure 3. OSCS Training Completion Status – Four Months Post-OSOC Meeting: the same bar chart of percent trained by development organization, rerun four months later.]

When all else fails, public "naming and shaming" works, and numbers don't lie.

It's not clear that there is a direct correlation between expanded secure coding training and security bug reduction, in particular because automated tools have also become more broadly deployed and thus more security bugs are found earlier. That said, the foundation for OSSA is the Oracle Secure Coding Standards. The training class is intended to familiarize developers with the fact that they do indeed have responsibilities and standards of practice, and to give them pointers to sources of more information. Getting developers to "get it" is the main purpose of the class; other processes, tools, checks and balances, and compliance requirements constitute the framework that makes it easier for developers to write secure code than not.

VI. TASTES GREAT AND LESS FILLING, TOO?

One of the benefits of a thriving metrics program is that in many cases information you collect for one purpose may have a secondary use. Of course, this is what privacy experts worry about vis-à-vis personally identifiable information (PII). Metrics you develop for your own use do not raise the same concerns: in fact, one hopes that a metric developed for a good business reason proves to have secondary benefits. That is not the primary reason to collect data – "I might need that some day" – but a secondary use can help validate the metrics function and in some cases can provide business justification "with teeth" that would not otherwise be available.

For example, one of the challenges for Oracle related to acquisitions is – as noted in the earlier discussion of OSSA – getting acquired entities to adopt standard secure development practices, including Oracle vulnerability handling processes. In many cases, an acquired entity may not have a pre-existing security vulnerability handling team or processes; also, an acquired entity may become a target of the research community after its product becomes an Oracle-branded product, post-acquisition. Therefore, the challenge is both the "how" of integrating the entity with Oracle vulnerability handling processes and determining the correct number of "whos" to do the work.
Even instituting a lot of automation around security vulnerability handling processes, as Oracle has done, does not mean that no additional bodies are needed to handle an increased workload. Fortunately, vulnerability handling is an area in which Oracle's security metrics already include:
• the number of open security bugs being tracked, their aging, and their growth rate
• the number of security bulletins or alerts issued per year
• the volume of email correspondence to the security vulnerability handling team (i.e., from external reporters who are not customers)

Managers made the business case for the number of people needed to resource the security vulnerability handling function (at the yearly budgeting cycle) based upon what was known about the "security bug load" of recently acquired entities. (Similar analysis was used to justify additional headcount for the oversight function.) More specifically, workload items such as the volume of email to the vulnerability group, the number of security bulletins/advisories issued, and the backlog of security vulnerabilities requiring tracking and resolution were used as raw data to justify headcount increases.

At Oracle, the tracking system for security vulnerabilities tracks not only the reporter but also "buckets" the reporters for metrics purposes. Specifically, we differentiate among (a sketch of the resulting metric follows the list):
• Customer-reported bugs
• Security researcher-reported bugs
• Internally found bugs (general)
• Internally found bugs (found by the ethical hacking team)
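A sketch of the Figure 4 computation under those buckets, assuming exported bug records with hypothetical reporter and status fields (the records below are invented, not Oracle's data):

```python
from collections import defaultdict

# (reporter_bucket, is_open) -- hypothetical exported bug records
bugs = [
    ("customer", True), ("customer", False), ("customer", False),
    ("researcher", False), ("researcher", False),
    ("internal", True), ("internal", False),
    ("ethical_hacking", False), ("ethical_hacking", False),
]

totals, open_counts = defaultdict(int), defaultdict(int)
for bucket, is_open in bugs:
    totals[bucket] += 1
    open_counts[bucket] += is_open

# Percentage of reported bugs still open, per reporter bucket (cf. Figure 4).
for bucket in totals:
    print(f"{bucket}: {open_counts[bucket] / totals[bucket]:.1%} open")
```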
One of the main reasons to differentiate among the above groups is that – all things being equal – we believe it is more important to speedily resolve researcher-reported bugs than comparably severe bugs reported by other sources. The reasons include the concern that researchers may become impatient if their bugs are not speedily addressed (and thus may "out" the bug before there is a fix available, increasing risk to customers). Also, researchers whose bugs are not fixed promptly may complain publicly, which – in addition to generating bad PR – generates work to respond to the bad PR and to address the resulting customer concerns. (Note: one of the best ways to avoid negative PR generated by security researchers is to give them nothing to say that is both negative and accurate. To the extent bad PR is based on incorrect information, it can be refuted; to the extent it is accurate, the root cause needs to be addressed by the vendor.)

At Oracle, the purpose of having an ethical hacking team focused on product assessments (rather than, for example, doing penetration tests of production systems) is twofold. The first purpose is to find vulnerabilities in our own product suites before third party researchers or actual hackers do; the second (though, in a way, the first) is to effect knowledge transfer to development (in the form of in-house "hacking tools," secure coding standards, "case law" on hacker-resistant development practices, and the like). It is a general goal of our vulnerability handling processes to address vulnerabilities found by the internal ethical hacking team quickly, since the entire goal of turning the hackers loose on a product is to beat third party researchers or actual hackers to the punch. (There's nothing worse than having a third party researcher report the exact same bug that was found internally, has not been fixed, and is still languishing in development.)

There was a concern that ethical hacking bugs were not being fixed quickly enough. The ramifications would include a higher cost to the company (because, to the extent the fixes for those vulnerabilities are ultimately released on multiple platforms and versions, the longer they languish, the greater the cost to remediate: we may miss a version going out the door that could have contained the fix, necessitating a separate patch later).

An analysis of the metrics of how many issues had been reported to date in each "reporter" category, plus the percentage of issues still open (shown in Figure 4), showed that the "gut feeling" was not accurate. In fact, next to researcher-reported bugs, "ethical hacking bugs" had the highest degree of closure. (Note: the "external, no credit" category captures third party researchers who do not care about receiving credit in security advisories.) A quick review of "the facts" helped alleviate a concern that was, in fact, anecdotal but not accurate. More importantly, it helped Oracle avoid an expensive fire drill to solve a non-existent problem.

Reporter                        | Percentage of Reported Bugs That Are Open
Customer                        | 9.9
External, no credit             | 3.3
Researchers                     | 3.6
Internal                        | 11.7
Internal (ethical hacking team) | 5.2

Figure 4. Percentage of Open Bugs by Reporter

VII. SUMMARY

Metrics are a useful tool in many areas of management, including security and, more particularly, security vulnerability handling and security assurance. Oracle's security metrics program has moved from a gleam in a manager's eye to an active program used to change the way we allocate security resources. Giving developers themselves access to security metrics has helped highlight successes (as well as "needs work" areas) so that development organizations can better manage their own workload. Providing assurance compliance metrics to senior management has helped increase the visibility of the assurance function within Oracle and harnessed peer pressure in pursuit of more secure code.

ACKNOWLEDGMENT

Thanks to Darius Wiles of Oracle Global Product Security for development of the Oracle security metrics wiki and to John Heimann of Oracle Global Product Security for the Oracle Security Compliance Scorecard.

REFERENCES

[1] Rudolph W. Giuliani, Leadership, Miramax, 2005.