Automated Identification of Security Issues from Commit Messages and Bug Reports Yaqin Zhou Asankhaya Sharma SourceClear, Inc. SourceClear, Inc.
[email protected] [email protected] ABSTRACT 1 INTRODUCTION The number of vulnerabilities in open source libraries is increasing To aid in the software development, companies typically adopt rapidly. However, the majority of them do not go through public issue-tracking and source control management systems, such as disclosure. These unidentied vulnerabilities put developers’ prod- GitHub, JIRA, or Bugzilla. In fact, as of April 2017, GitHub reports ucts at risk of being hacked since they are increasingly relying on having almost 20 million users and 57 million repositories. Accord- open source libraries to assemble and build software quickly. To ing to Atlassian, JIRA is used by over 75,000 companies. These nd unidentied vulnerabilities in open source libraries and se- tools are very popular for open source projects, and are essential cure modern software development, we describe an ecient au- to modern software development. Developers work on reported is- tomatic vulnerability identication system geared towards track- sues in these systems, then commit corresponding code changes ing large-scale projects in real time using natural language pro- to GitHub (or other source code hosting platforms, e.g. SVN or Bit- cessing and machine learning techniques. Built upon the latent in- Bucket). Bug xes and new features are frequently merged into a formation underlying commit messages and bug reports in open central repository, which is then automatically built, tested, and source projects using GitHub, JIRA, and Bugzilla, our K-fold stack- prepared for a release to production, as part of the DevOps prac- ing classier achieves promising results on vulnerability identica- tices of continuous integration (CI) and continuous delivery (CD).