Enhancement of a Vulnerability Checker for Software Libraries with Similarity Metrics Based on File-Hashes
Total Page:16
File Type:pdf, Size:1020Kb
Gottfried Wilhelm Leibniz Universität Hannover Faculty of Electrical Engineering and Computer Science Institute of Practical Computer Science Software Engineering Group Enhancement of a Vulnerability Checker for Software Libraries with Similarity Metrics based on File-Hashes Bachelor Thesis in Computer Science by Huu Kim Nguyen First Examiner: Prof. Dr. Kurt Schneider Second Examiner: Dr. Jil Klünder Supervisor: M.Sc. Fabien Patrick Viertel Hannover, March 19, 2020 ii Declaration of Independence I hereby certify that I have written the present bachelor thesis independently and without outside help and that I have not used any sources and aids other than those specified in the work. The work has not yet been submitted to any other examination office in the same or similar form. Hannover, March 19, 2020 _____________________________________ Huu Kim Nguyen iii iv Abstract This bachelor thesis presents a software which checks a software project for libraries that have security vulnerabilities so the user is notified to update them. External software libraries and frameworks are used in websites and other software to provide new functionality. Using outdated and vulnerable libraries poses a large risk to developers and users. Finding vulnerabilities should be part of the software development. Manually finding vulnerable libraries is a time consuming process. The solution presented in this thesis is a vulnerability checker which scans the project library for any libraries that contain security vulnerabilities provided that the software project is written in Java or in JavaScript. It uses hash signatures obtained from these libraries to check against a database that has hash signatures of libraries that are known to have security vulnerabilities. The developer will receive warnings if the software detects vulnerable libraries. For every library in Maven and in the Node Package Manager registry, its metadata is used to find any matching entries in the National Vulnerability Database which indicate that the library has vulnerabilities. If a match is found, the hash signature and the metadata is then saved to the library hash signature database. The hash checker can be integrated as a library into other software, which can then fully automate this process. Once the lengthy process of creating the hash signature database is finished, the hash-based vulnerability checker performs has a slightly better retrieval performance than any existing library vulnerability checker. v vi vii Table of contents 1 Introduction 1 1.1 Problem . 2 1.2 Proposed Solution . 2 1.3 Structure of the Thesis . 3 2 Requirements 5 2.1 Stakeholder . 5 2.2 Functional requirements . 5 2.3 Nonfunctional requirements . 7 2.4 Priority . 7 2.5 Packages . 8 3 Basics 9 3.1 Vulnerability . 9 3.2 National Vulnerability Database (NVD) . 9 3.3 Hash function . 14 3.4 Maven repositories . 16 3.5 Node Package Manager . 16 4 Concept 17 4.1 Definitions . 17 4.2 Generic tasks . 17 4.3 NVD datafeed download . 19 4.4 Library hash creator for Maven libraries . 19 4.4 Library hash creator for Maven libraries . 19 4.5 Library hash creator for npm packages . 25 4.6 Standalone JavaScript libraries . 27 4.7 Library hash checker . 28 4.8 Fundamental drawbacks . 29 viii 5 Implementation 31 5.1 General Architecture . 31 5.2 Database . 32 5.3 Library hash creator application . 34 5.4 Library hash checker application . 36 5.5 Difficulties . 38 6 Related works 39 6.1 OWASP Dependency-Check . 39 6.2 Retire.js . 39 6.3 Snyk.io . 40 6.4 Other related works . 40 7 Evaluation 41 7.1 Testing environment . 41 7.2 Evaluation Terminology and Methodology . 42 7.3 Retrieval performance evaluation . 43 7.4 Time performance evaluation . 50 8 Conclusion 53 8.1 Summary . 53 8.2 Outlook . 54 8.3 Conclusion . 55 A Appendix 57 A.1 Contents of the disc . 57 Bibliography 59 ix List of Figures 1.1 Overall schema of the proposed solution . 2 3.2 CWE ID followed by short description . 11 3.3 Small portion of the large CWE tree . 11 3.4 Example CVSS vector . 12 3.8 Probability of hash collision . 14 4.1 Schematic of the library hash creator . 18 4.3 Tree data structure with three levels . 20 4.6 Version ranges in the NVD entry from the JSON datafeed . 23 4.7 Excerpt from the version information of jackson-databind . 24 4.8 Raw list of files of jackson-databind 2.9.0 . 24 4.9 Schematic of the library hash checker . 29 5.1 General software architecture of the library hash creator and checker . 32 5.2 Database scheme of a Maven table . 34 5.3 GUI of the library hash creator . 35 5.4 GUI of the library hash checker . 37 x List of Tables 3.5 CVSS Exploitability metrics . 12 3.6 CVSS Impact metrics . 13 3.7 CVSS scoring scale . 13 3.9 Probability of a hash collision for a hash size k > 64 bits . 14 3.10 Number of available hashes before a probable collision occurs . 15 4.4 Relevant metadata from the POM file . 22 7.1 Specification of evaluation computer . 41 7.2 Example result table . 41 7.3 Test Java libraries . 44 7.4 Results of evaluation for Java libraries . 45 7.5 Test npm modules . 46 7.6 Results of evaluation for npm modules . 47 7.7 Test standalone JavaScript libraries . 48 7.8 Results of evaluation for standalone JavaScript libraries . 49 7.9 Time required for a single scan . 51 7.10 Time performance of hash functions in Java . 52 xi List of Listings 3.1 CPE Example using as many attributes as possible . ..