A Deep Dive Into Docker Hub's Security Landscape
A Deep Dive into Docker Hub’s Security Landscape
A story of inheritance?

Emilien Socchi
Jonathan Luu

Thesis submitted for the degree of Master in Network and System Administration
30 credits

Department of Informatics
Faculty of Mathematics and Natural Sciences
UNIVERSITY OF OSLO
Spring 2019

© 2019 Emilien Socchi, Jonathan Luu
A Deep Dive into Docker Hub’s Security Landscape
http://www.duo.uio.no/
Printed: Reprosentralen, University of Oslo

Abstract

Docker containers have become a popular virtualization technology for running multiple isolated application services on a single host using minimal resources. That popularity has led to the creation of an online sharing platform known as Docker Hub, hosting images that Docker containers instantiate. In this thesis, a deep dive into Docker Hub’s security landscape is undertaken. First, Python-based software is developed to conduct experiments and collect metadata, parental and vulnerability information about any type of image available on Docker Hub. Secondly, our tool makes it possible to analyze the most recent image found in each Certified, Verified and Official repository, as well as the most recent image found in 500 random Community repositories among the most popular ones. Using our software named Docker imAge analyZER (DAZER), the following discoveries were made: (1) the Certified and Verified repositories introduced by Docker Inc.
in December 2018 do not significantly improve Docker Hub’s overall security landscape; (2) the most influential parent images on Docker Hub are all Official images and, although vulnerabilities in the platform are still heavily inherited, they do not tend to be introduced by the top root parents as suggested by previous studies; (3) the average number of unique vulnerabilities found across all types of repositories is expected to grow at a rate of approximately 105 vulnerabilities per year between 2019 and 2025 if Docker Hub’s security landscape continues evolving the same way. When set in perspective with results from previous studies, our findings demonstrate the deterioration of Docker Hub’s security landscape over the years and the strong need for automated Docker image security updates of a significantly higher quality than what today’s procedures are offering.

Acknowledgements

First and foremost, we would like to express our sincere gratitude and appreciation to our supervisors I. Hassan and V. Tasoulas for their support and enthusiasm throughout the entire thesis. Their constant availability and constructive feedback provided valuable guidance, as well as inspirational encouragement during the entire project. Secondly, we would like to express special thanks to our closest friends and family, who helped us get through the demanding but exciting master’s studies that is the Network and System Administration (NSA) program. Finally, we wish to express our sincere appreciation to Oslo Metropolitan University (OsloMet) and the University of Oslo (UiO) for giving us the opportunity to take part in the NSA program and thank all of our lecturers for their inspiring work and constant dedication.

Oslo, May 2019
Emilien Socchi & Jonathan Luu

Preface

The basis of this research originally stemmed from the master’s topic proposed by V.
Tasoulas regarding the investigation of container security in the world of microservices. Our initial background survey revealed a strong need to examine the security landscape of the biggest container image sharing platform, Docker Hub, as very little research had been conducted on the subject so far. As we were both interested in researching the same topic, we decided to collaborate in order to enhance our productivity and to demonstrate that working as a pair can produce more valuable results and contributions for the research community. Our final contributions in this research are multiple and are not strictly limited to the scope of the problem statement.

Finally, we have tried to make this thesis as easy to read as possible by writing important keywords and concepts in the background chapter in italics. Moreover, important findings are summarized in the result and analysis chapters for better readability and understanding, while all the details are available in their entirety in the appendix. We hope that you enjoy your reading and find our research of interest.

Contents

1 Introduction
  1.1 Motivation
  1.2 Problem statement
  1.3 Thesis outline
2 Background and literature
  2.1 Software vulnerabilities
    2.1.1 What is a software vulnerability?
    2.1.2 Enumerating vulnerabilities
    2.1.3 Classifying vulnerabilities
    2.1.4 Severity levels
  2.2 Software containers
    2.2.1 What is a software container?
    2.2.2 Container vs. Virtual Machine (VM)?
  2.3 Docker
    2.3.1 What is Docker?
    2.3.2 What is a Docker container?
    2.3.3 How are Docker images distributed?
    2.3.4 Docker’s architecture
  2.4 The Docker engine
    2.4.1 What is the Docker engine?
    2.4.2 Managing images
  2.5 Docker Hub
    2.5.1 What is Docker Hub?
    2.5.2 Repository types
    2.5.3 Repository naming convention
    2.5.4 Docker image reusability
    2.5.5 Docker image dependencies
    2.5.6 Have you said API?
  2.6 Docker Hub’s security landscape
    2.6.1 Current knowledge
    2.6.2 Docker Inc.’s response
3 Methodology
  3.1 Objectives
  3.2 Design
    3.2.1 Data set definition
    3.2.2 Preliminary requirements
    3.2.3 Overview
    3.2.4 Result data format definition
    3.2.5 Detailed research questions definition
  3.3 Implementation
    3.3.1 Tools and technologies
    3.3.2 Architecture
    3.3.3 Intended workflow
    3.3.4 Research queries definition
  3.4 Measurements and analysis
  3.5 Expected results
4 Result 1: Design
  4.1 Data set
    4.1.1 Defined data set
    4.1.2 Skipped repositories
  4.2 Preliminary requirements
    4.2.1 Two parent databases
    4.2.2 Manual image checkout
  4.3 Overview
  4.4 Designed result data format
  4.5 Detailed research questions
5 Result 2: Implementation
  5.1 Tools and technologies
  5.2 Retrieving data
    5.2.1 The Docker Hub API: version 1
    5.2.2 The Docker Hub API: version 2
    5.2.3 CIRCL’s CVE API
    5.2.4 The MicroBadger API
    5.2.5 The Red Hat security data API
    5.2.6 Enterprise Linux Security Advisory
  5.3 Implemented architecture
  5.4 Implemented workflow
  5.5 Getting ready for analysis
    5.5.1 Importing result data to MongoDB
    5.5.2 Research queries
6 Result 3: Measurements
  6.1 Describing the results
  6.2 RQ3: Vulnerability distribution across repository types
    6.2.1 Quantitative vulnerability distribution
    6.2.2 Severity distribution
    6.2.3 Vulnerable image distribution
    6.2.4 Potential correlations
  6.3 RQ2: Vulnerabilities and inheritance
  6.4 RQ1: Certified and Verified vs. Official and Community repositories
  6.5 Additional research question
  6.6 Summary
7 Analysis
  7.1 Vulnerability distributions and predictions
    7.1.1 General interpretation
    7.1.2 Interpreting box plots
    7.1.3 Interpreting density plots
    7.1.4 Analyzing potential quantitative vulnerability