DEGREE PROJECT FOR MASTER OF SCIENCE IN ENGINEERING

COMPUTER SECURITY

Vulnerability Analysis of Vagrant Boxes

Andreas Holmqvist | Fredrik Lycke

Blekinge Institute of Technology, Karlskrona, Sweden, 2017

Supervisor: Emiliano Casalicchio, Department of Computer Science, BTH

Abstract

Virtual machines are often considered more secure than regular machines because of the abstraction from the hardware layer. Abstraction does provide some extra security benefits, but many vulnerabilities that exist on a regular machine still exist on virtual machines. Moreover, the sheer number of virtual machines running on many systems makes it difficult to analyse potential vulnerabilities.

Vagrant is a management tool for virtual machines packaged in what are called boxes. There is currently no way to automatically scan these Vagrant boxes for vulnerabilities or insecure configurations to determine whether or not they are secure. Therefore we want to establish a method to detect the vulnerabilities of these boxes automatically, without launching the box or executing code.

There are two main parts to the method used to investigate the boxes. The first is base box scanning. A base box is an image on which the final box is built. The base box is launched, a list of its packages is extracted, and this information is sent to a vulnerability scanner. The second is the analysis of the Vagrantfile, the file that is used to prepare the base box with the needed software and configurations. The Vagrantfile is written in Ruby, and a static code analysis is performed to extract information from it.

The result for each scanned box is a list of all the vulnerabilities present on the base box, together with security configurations, such as SSH settings and shared folders, retrieved from the Vagrantfile. The results are not completely accurate because the base box, rather than the box itself, is scanned. Some of the configurations in the Vagrantfiles could not be retrieved because they required code execution, or support for configurations done by other means, such as bash. The method does, however, provide a good indication of how many vulnerabilities a given box has.

Keywords: Vagrant, Static code analysis, Vulnerability


Sammanfattning

Virtual machines are often considered more secure than regular machines because of the abstraction from the hardware layer. Abstraction provides some additional security benefits, but many vulnerabilities that exist on a regular machine still exist on virtual machines. Moreover, the large number of virtual machines running on many systems makes it difficult to analyse potential vulnerabilities.

Vagrant is a manager for virtual machines packaged in what are called boxes. There is currently no way to automatically scan these Vagrant boxes for vulnerabilities or insecure configurations to determine whether or not they are secure. Therefore we want to create a method to detect vulnerabilities in these boxes automatically, without running the box or executing code.

There are two main parts to the method used to examine the boxes. The first is base box scanning. A base box is an image on which the final box is built. This base box is started, a list of packages is extracted, and the information is then sent to a vulnerability scanner. An analysis of the Vagrantfile is also performed. The Vagrantfile is the file used to configure the base box with the necessary software and configurations. The configuration file is written in Ruby, and a static code analysis is performed to extract information from it.

The result for each scanned box is a list of all vulnerabilities found in the base box, together with security configurations such as SSH settings and shared folders retrieved from the Vagrantfile. The results are not completely accurate because the base box is used for the scan rather than the box itself. Some of the configurations in the Vagrantfile could not be retrieved because they required code execution, or support for configurations done by other means, such as bash. The method does, however, give a good indication of how many vulnerabilities a given box has.

Keywords: Vagrant, Static code analysis, Vulnerability


Preface

This thesis marks the end of five years of study in the Master of Science in Engineering: Computer Security programme at Blekinge Institute of Technology, Karlskrona.

Acknowledgements: We would like to thank John Stock, Martin Jartelius, and Davide Girardi at Outpost24 for providing the opportunity to do this thesis, and for their assistance and guidance in completing it.

We would also like to thank our supervisor Emiliano Casalicchio for continuously providing valuable feedback and suggestions during the thesis work and report writing.


Nomenclature

Acronyms

AST Abstract Syntax Tree.

CVE Common Vulnerabilities and Exposures.

JSON JavaScript Object Notation.


Table of Contents

Abstract
Sammanfattning (Swedish)
Preface
Nomenclature
    Acronyms
Table of Contents
List of Figures
List of Tables
1 Introduction
    1.1 Introduction
    1.2 Background
    1.3 Objectives
    1.4 Delimitations
    1.5 Thesis question and technical problem
2 Theoretical Framework
    2.1 Related work
    2.2 Technologies
3 Method
    3.1 Base box scanning
    3.2 Vagrantfile analysis
    3.3 Architecture
    3.4 Reporting
4 Results
    4.1 Vulnerability scanning
    4.2 Static code analysis
    4.3 Full scan
5 Discussion
    5.1 Vulnerability scanning
    5.2 Static code analysis
    5.3 Full scan
6 Conclusions
7 Recommendations and Future Work
References

List of Figures

2.1 Layered architecture of Vagrant using providers [10]
3.1 Abstract Syntax Tree
3.2 Architecture
4.1 Visualisation of the Vagrantfile JSON. Rendered by [21].
4.2 Visualisation of the full scan JSON. Rendered by [21].

List of Tables

4.1 Vulnerabilities for the first five most downloaded base boxes

1 INTRODUCTION

1.1 Introduction

Virtualization has grown in use over the last several years. Virtualization enables more efficient use of hardware, with multiple isolated platforms running on a single machine. Currently there are two main types of virtualization: containerisation and hypervisor-based virtualization. Hypervisor-based virtualization establishes complete virtual machines at the machine layer on top of the host, each with an entire guest operating system. Containerisation runs at the operating system level and uses the host's kernel to run virtual environments. This means that containers do not need their own individual operating system to run.

Vagrant is a platform that is used to manage virtual machines and containers. It can be used to make sure that the same software with the same configuration is used in an environment by multiple users, regardless of whether Linux, Mac OS, or Windows is used as the host [1]. Vagrant does not provide any virtualization by itself, only the management of machines. Instead, Vagrant relies on virtualization software, such as VMware and VirtualBox, to run and configure the virtual machines. This allows support for new virtualization techniques to be added more easily. The advantage of using Vagrant as a manager is that virtual environments can be reproduced and launched more easily. Vagrant supports VirtualBox, Hyper-V, and Docker machines by default.

Vagrant has some similarities with Docker [2], but Vagrant operates at a higher level of abstraction. While Docker is a container platform, Vagrant is a manager with multiple providers. Because of this higher level of abstraction, Vagrant can even use Docker as a provider. Docker support was available as a plugin for Vagrant 1.4 and later, and built-in support was added in version 1.6 [3].

This thesis examines different ways to perform a vulnerability analysis of a Vagrant box without having to boot it or execute anything on a running machine. A method is chosen, and a system is developed to scan boxes.

1.2 Background

A user can have multiple Vagrant boxes installed and running silently in the background. These boxes can be hard to keep track of and manage. They can also contain security vulnerabilities such as outdated libraries or insecure configurations. The vulnerabilities can be anything a regular machine can have, for example remote code execution, misconfiguration, and insecure running services. Currently there are no publicly available tools to scan boxes for security vulnerabilities. Because of this lack of tools, Outpost24 has requested a way to perform automatic vulnerability analysis of boxes.

This project is developed in collaboration with Outpost24 and because of an agreement with them the source code will not be included in this thesis.

1.3 Objectives

The objective of the project is to create a system that automatically assembles a list of the Vagrant boxes on a computer together with their information. The boxes are then scanned for known vulnerabilities, and the information gathered is put into a report. Such a report makes it easy for a person to quickly establish whether or not a given system uses any Vagrant boxes and what vulnerabilities they contain. The report should be in a format that allows easy parsing for further work.

1.4 Delimitations

Vagrant supports many different virtual environment providers. In order to limit repetitive work, we focus on the most popular provider, VirtualBox. As making a vulnerability scanner is a difficult task in itself, an existing scanner that searches for known vulnerabilities is used.

Vagrant supports multiple operating systems, most of which are Linux distributions. Because these distributions are common, and because a solution for other operating systems would have to be implemented differently, we focus on Linux. Furthermore, the vulnerability scanner used only supports Debian, Ubuntu, CentOS, Red Hat, Oracle Linux, and Fedora. This can be expanded in future work by using one or more other scanners.

The Vagrantfile used to configure Vagrant boxes is written in Ruby. This means that the configuration file is executed as a regular Ruby script on behalf of the user who launches it. Analysing such a file can prove difficult, as at a certain point it might be necessary to execute code in order to determine automatically what the file does. Executing code is not an option, as it might expose the system to vulnerabilities.

1.5 Thesis question and technical problem

A system can have many Vagrant boxes present. These boxes may have vulnerabilities such as old libraries, insecure configurations, or insecure running services. Without automated tools, it is very difficult and time consuming to determine whether or not these boxes contain any security vulnerabilities. As of now, there are no existing tools that can accomplish this for Vagrant boxes.

• How can software vulnerabilities in Vagrant virtual machines be detected without launching boxes?

• How can configurations relevant for a security analysis be extracted from the Vagrantfile without code execution?

2 THEORETICAL FRAMEWORK

2.1 Related work

Steven J. Vaughan-Nichols presents many security concerns regarding virtual machines in his article [4]. He points out that even if virtualization provides some security through abstraction from the hardware layer, it also adds new points of attack. New vulnerabilities can, for example, exist in the hypervisor, which could potentially give an attacker root privileges. It is also important to note that many of the vulnerabilities that exist on a regular machine can also exist on a virtual machine.

Tal Garfinkel and Mendel Rosenblum give insight in their article [5] into aspects that can make virtual machine security harder than security for physical machines. The most notable of these are scaling, transience, and software life cycle. These problems grow when virtual machine managers such as Vagrant are used, as they are likely to increase the number of virtual machines on a network.

Melina Kulenovic and Dzenana Dzonko discuss methods for static code analysis for vulnerabilities in [6]. The paper illustrates some of the advantages and weaknesses of the static approach to analysing code. The advantages of the method are that it is fast, repeatable, and does not require the program to be executed. However, because the program never runs, an entirely accurate result would require analysing the program with all possible inputs and in all possible states, which is infeasible.

Clair [7] is an open-source project for static vulnerability analysis of Docker and appc [8] containers. Clair scans each layer of a container for known security flaws documented in databases such as the Common Vulnerabilities and Exposures (CVE) database. The scanner does, however, only scan containers, not regular virtual machines.

2.2 Technologies

2.2.1 Vagrant

Vagrant is a tool that is used to ease the tasks of building, managing, and distributing virtual machines. The most useful feature of Vagrant is the ability to provision a virtual machine with a single command, vagrant up. The environment only has to be set up once, and can easily be replicated on any number of instances by downloading the box and running vagrant up.

In [1], Hashimoto explains what happens when vagrant up, the command that launches a Vagrant box, is run:

• A virtual machine is created with the image specified in the box.
• Physical properties of the machine are modified.
• Network interface(s) are established for access from a local device, a local network, or remotely.
• Shared folders between host and guest are set up.
• The virtual machine is started.
• The machine's hostname is set.
• The specified software is provisioned on the machine.
• Tweaks for known issues that could arise between host and guest are performed.

The most important parts of Vagrant are the boxes, which consist of a base box and a Vagrantfile, and the providers, which create an interface between Vagrant and different virtualization software.

2.2.1.1 Boxes

Vagrant boxes are packages that contain information about how to set up an environment. Many boxes are available online and can be downloaded for free [9]. These include ready-made servers and development environments. Two main parts are required to create a Vagrant box.

Base box The base box is the image on which the Vagrant box is built. This is a regular virtual machine image that has been modified to contain the bare essentials for running from Vagrant. As the name suggests, it is only a base on which to build the complete box and should not provide any useful service without further configuration.

Vagrantfile A base box is combined with a Vagrantfile to create a project. The Vagrantfile denotes which base box to use and how to configure it. By placing the Vagrantfile in version control, every member of a team can be sure that they have the same development environment. The Vagrantfile is written in Ruby, but knowledge of Ruby is not necessary to edit it. However, because it is pure Ruby code, it is possible to create more complex configurations [1].

Listing 2.1 shows a very simple Vagrantfile, constructed to showcase what such a file may look like. The file makes the following configurations:

• ubuntu/trusty64 is configured to be used as a base box.

• Port 80 on the guest is mapped to port 8080 on the host.

• The standard synchronised folders for Vagrant boxes are disabled.

• Two different ways to configure the box further with bash scripts are used: one where the bash script is written directly in the Ruby code, and one where a bash file is run.

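Listing 2.1: Example Vagrantfile

```ruby
Vagrant.configure(2) do |config|
  # Base box to build on, without checking for box updates
  config.vm.box = "ubuntu/trusty64"
  config.vm.box_check_update = false

  # Guest port 80 is reachable on host port 8080
  config.vm.network "forwarded_port", guest: 80, host: 8080

  # Turn off the default synced folder (project directory -> /vagrant)
  config.vm.synced_folder ".", "/vagrant", disabled: true

  # Provisioning with an inline shell script
  config.vm.provision "shell", inline: <<-SHELL
    sudo apt-get update
    sudo apt-get install -y build-essential zsh wget git vim
  SHELL

  # Provisioning with a shell script file, skipped when --no-provision is given
  if (not ARGV.include?("--no-provision"))
    config.vm.provision "shell", path: "install_packages.sh"
  end
end
```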

2.2.1.2 Providers

Providers refer to the different providers of virtualization software. Vagrant acts as an extra layer of abstraction between the user and the virtualization software, which provides easier management and a uniform way to interact with each of them. Providers allow Vagrant to support many different virtualization techniques, as developers can create their own if needed.

Figure 2.1: Layered architecture of Vagrant using providers [10].

As shown in figure 2.1, Vagrant allows many different providers to be used simultaneously. Each of the providers can in turn run as many virtual machines as the hardware allows.

2.2.2 Virtual machines

Virtual machines are computer systems that run within an operating system. Virtual machines allow many machines with different operating systems to exist on a single system. One of the more useful aspects of a virtual machine is the ability to save and load specific states of the machine, which allows a user to share machines and roll them back.

2.2.2.1 Security problems

Virtual machines are often considered secure due to the abstraction from the hardware. The abstraction layer does add some security, but virtual machines also bring new security problems. Tal Garfinkel and Mendel Rosenblum discuss in [11] the flexibility and availability of virtual machines and the security problems these bring.

Scaling Virtual machines are not as strictly bound by hardware as regular systems. Users can have many different virtual machines on a single system, each designed for a different purpose. The number of virtual machines in a company can grow massively in a short amount of time. These machines can prove hard to maintain and monitor.

Transience Virtual machines are often short-lived, as they are created for specialised purposes. A large number of virtual machines can appear on or disappear from a network in a short amount of time. This means that it is difficult to know what is present on a network at any given time without scanning it.

Software life cycle The state of a virtual machine can shift rapidly when, for example, configurations change, new software is installed, or patches are applied. Multiple instances of a particular virtual machine can exist in different versions. The machines can also be rolled back to a previous version to fix possible problems. This can cause severe maintenance problems, as different versions of a single machine can have different vulnerabilities.

Diversity Organisations generally run one or a few different environments on their hardware, which makes it easier to keep the work environment updated. Virtual machines, however, often run a plethora of different environments, which adds to the difficulty of managing the software.

2.2.3 Vulners

A vulnerability scanner is a tool that is used to determine what vulnerabilities a computer system contains. Vulners [13] is a vulnerability scanner that uses a database of known vulnerabilities. These vulnerabilities can be cross-referenced with the operating system and the software packages installed on a machine in order to provide a list of vulnerabilities for that machine.
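To make the idea of cross-referencing concrete, the following Ruby sketch matches the installed packages of one machine against a toy advisory table. It is not how Vulners works internally: the advisory entries are made up, and Gem::Version is only suitable for simple dotted version strings, whereas real Debian and RPM version strings need their distribution's own comparison rules.

```ruby
require "rubygems" # provides Gem::Version, used here for simple dotted versions

# Toy advisory database with made-up entries: for a distribution release and
# a package, the first version that contains the fix.
Advisory = Struct.new(:id, :distro, :release, :package, :fixed_in)

ADVISORIES = [
  Advisory.new("EXAMPLE-0001", "ubuntu", "14.04", "libexample", "1.2.10"),
  Advisory.new("EXAMPLE-0002", "ubuntu", "16.04", "libexample", "1.3.0")
].freeze

# Cross-reference one machine (distribution, release, installed packages)
# with the database: an advisory applies when distribution and release match
# and the installed version is older than the fixed version.
def matching_advisories(distro, release, installed)
  ADVISORIES.select do |adv|
    adv.distro == distro && adv.release == release &&
      installed.key?(adv.package) &&
      Gem::Version.new(installed[adv.package]) < Gem::Version.new(adv.fixed_in)
  end.map(&:id)
end
```

For example, matching_advisories("ubuntu", "14.04", { "libexample" => "1.2.3" }) reports EXAMPLE-0001, while the same package at version 1.2.10 reports nothing.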

2.2.4 Parsing

Parsing a language means breaking it up into its component parts, often represented as a syntax tree.

An Abstract Syntax Tree (AST) is a tree constructed from source code that consists of a minimised structural representation of the source code in the form of nodes. The tree is abstract because superfluous symbols and nodes are removed, and only the symbols necessary to represent the nodes and the structure remain. An AST is needed in many cases where code analysis is concerned, because many patterns cannot be properly described with regular expressions [12].
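A small Ruby example illustrates the point. It uses Ripper from Ruby's standard library rather than Parser [20], purely so that the sketch has no dependencies; Ripper's node names differ from Parser's. A forwarded-port statement written over three lines defeats a line-based regular expression, while the parser returns a single statement node that contains all of the arguments.

```ruby
require "ripper"

# One Vagrantfile statement written over three physical lines.
SOURCE = <<~'RUBY'
  config.vm.network "forwarded_port",
                    guest: 80,
                    host: 8080
RUBY

# Line-based extraction: the regular expression matches the first line
# only, so the guest and host ports never reach the extractor.
line_hits = SOURCE.lines.grep(/config\.vm\.network/)

# Parsing instead: the program holds a single statement node, and the port
# numbers are leaves somewhere below it.
statements = Ripper.sexp(SOURCE)[1]

puts "regex lines: #{line_hits.size}, port 8080 seen: #{line_hits.join.include?('8080')}"
puts "statements: #{statements.size}, port 8080 seen: #{statements.first.flatten.include?('8080')}"
```

The regular expression finds one line without the port, whereas the tree contains one statement in which the port is present.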

Parser [20] is a well-documented parser for Ruby. It is written in pure Ruby and is fully open source. It is powerful and has many features, but is still easy to use.


3 METHOD

A box contains two main parts: a base box and a Vagrantfile. The base box is a virtual machine with essential software which can be used to build a complete box. The Vagrantfile is where the configuration of the box is specified. To do a vulnerability analysis both of these parts have to be examined. It is also possible to start up the box and examine it as a normal virtual machine.

Booting up a box executes code in the Vagrantfile, which can cause network conflicts on the host and is therefore not preferable. Mounting the image would be the preferable solution in terms of resource usage and security, as no code has to be executed and no resources have to be allocated to a virtual machine. This, however, requires knowledge of the underlying file system and partition table of each box to be scanned. This information is entirely possible to acquire, but supporting all available types would require a considerable amount of work.

With this in mind, the chosen solution is to first scan the base boxes, which should have the same packages as the actual box except for any packages installed via the Vagrantfile. The Vagrantfile is then analysed for interesting security configurations, for example which ports are open, which folders are synchronised with the host, and SSH configurations. The results from the Vagrantfile analysis and the base box scanning are combined into a report. A database of previously scanned base boxes and their scan results is kept so that they do not have to be scanned again. SQLite was chosen because of its fast and easy setup and management. Should the program require a client/server solution, the data from SQLite can be exported to a database such as MySQL.

Below is a broad overview of the process used for analysing a Vagrant box:

1. Popular Vagrant boxes are scanned for packages with vulnerabilities in order to build up a database that saves time when these boxes are encountered in later analyses of Vagrantfiles.
2. The Vagrantfile is parsed with Parser [20], which creates an AST.
3. The AST is traversed in search of nodes that contain interesting configurations, including the base box used.
4. If the base box referred to in the Vagrantfile does not exist in the database of base boxes, it is scanned for packages with vulnerabilities.
5. Results from the base box scanning and the Vagrantfile analysis are combined into a report.
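The five steps can be sketched in Ruby as follows. This is an illustrative outline under stated assumptions, not the system developed in the project: the scanner and the analyser are passed in as hypothetical callables, and a plain Hash stands in for the SQLite database so that the caching behaviour of steps 1 and 4 is visible.

```ruby
# Outline of the five steps. The base box scanner and the Vagrantfile
# analyser are injected as hypothetical callables; a Hash plays the role
# of the SQLite database of already scanned base boxes.
class BoxPipeline
  attr_reader :scans_performed

  def initialize(scan_base_box:, analyse_vagrantfile:)
    @scan_base_box = scan_base_box             # box name -> vulnerabilities
    @analyse_vagrantfile = analyse_vagrantfile # source -> configuration Hash
    @database = {}                             # box name -> cached result
    @scans_performed = 0
  end

  # Step 1: scan popular base boxes ahead of time.
  def prescan(box_names)
    box_names.each { |name| vulnerabilities_for(name) }
  end

  # Steps 2-5 for a single Vagrantfile.
  def report(vagrantfile_source)
    config = @analyse_vagrantfile.call(vagrantfile_source)   # steps 2-3
    vulns = vulnerabilities_for(config.fetch("box"))         # step 4
    { "configuration" => config, "vulnerabilities" => vulns } # step 5
  end

  private

  # Scan a base box only if it is not already in the database.
  def vulnerabilities_for(name)
    return @database[name] if @database.key?(name)

    @scans_performed += 1
    @database[name] = @scan_base_box.call(name)
  end
end
```

For example, after prescan(["ubuntu/trusty64"]), a report for a Vagrantfile that uses ubuntu/trusty64 is served from the cache, and scans_performed stays at 1.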

3.1 Base box scanning

To scan the base boxes for vulnerabilities, a list of the most downloaded base boxes is retrieved from the repository provided by HashiCorp. Each base box on the list is then downloaded. Establishing an SSH connection to some of the boxes fails; this can be resolved by adding a statement to the Vagrantfile that connects a virtual network cable. The virtual machine is then started, and three commands are sent to it. Because the output of the first command sent may include a welcome message, a simple ls command is sent first so that the message does not disrupt the output of the commands that matter. The second command retrieves the name, version, and architecture of all installed packages. The last command gets the distribution name and version. After this, the virtual machine is shut down and destroyed, and the Vagrantfile is removed.
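The exact commands are not listed here, so the following Ruby sketch makes assumptions: a Debian or Ubuntu guest, dpkg-query for the package list, and lsb_release for the distribution. The helper scan_vagrantfile and the constant SCAN_COMMANDS are hypothetical names. The VirtualBox customisation shows one common way to connect the virtual network cable of the first network adapter; how the commands are sent over SSH is left out.

```ruby
# Hypothetical helper: the temporary Vagrantfile written for a downloaded
# base box. The VirtualBox customisation connects the virtual network cable
# of the first adapter, the work-around for failing SSH connections.
def scan_vagrantfile(box_name)
  <<~VAGRANTFILE
    Vagrant.configure("2") do |config|
      config.vm.box = #{box_name.inspect}
      config.vm.provider "virtualbox" do |vb|
        vb.customize ["modifyvm", :id, "--cableconnected1", "on"]
      end
    end
  VAGRANTFILE
end

# The three commands sent over SSH, assuming a Debian/Ubuntu guest.
SCAN_COMMANDS = [
  "ls",                                                           # absorbs a possible welcome message
  %q(dpkg-query -W -f='${Package} ${Version} ${Architecture}\n'), # one package per line
  "lsb_release -si; lsb_release -sr"                              # distribution name, then version
].freeze

Package = Struct.new(:name, :version, :arch)

# Parse the output of the package command into Package records,
# skipping blank lines.
def parse_packages(output)
  output.each_line.map(&:split).reject(&:empty?).map { |fields| Package.new(*fields) }
end
```

On RPM-based guests, rpm -qa with a --queryformat string would play the same role as dpkg-query.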

The package data, distribution name, and version are sent to a vulnerability scanner called Vulners. Vulners returns a JavaScript Object Notation (JSON) object with all vulnerabilities for the packages. Vulners only supports Red Hat, Fedora, Oracle Linux, CentOS, Ubuntu, and Debian. If any other distribution is sent to Vulners, it produces an error. However, if an unsupported version of a supported distribution is provided, it returns no vulnerabilities. An empty result is therefore treated as a failed scan, as it is very unlikely that a box contains no vulnerabilities at all. The package data, vulnerabilities, box name, and box version are inserted into an SQLite database.
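The handling of the reply can be sketched in Ruby. The reply layout used here (a "result" field and a "data" object holding a "vulnerabilities" list) is an assumption made for illustration and may not match the real Vulners response; the point is the decision logic, in which both an error reply and an empty list count as a failed scan.

```ruby
require "json"

# Interpret a scanner reply. The field names ("result", "data",
# "vulnerabilities") are assumed for illustration, not taken from the
# Vulners documentation.
def classify_scan(reply_json)
  reply = JSON.parse(reply_json)
  # Unsupported distributions produce an error reply.
  return [:failed, "error reply"] unless reply["result"] == "OK"

  vulnerabilities = reply.dig("data", "vulnerabilities") || []
  # An empty list most likely means the release is unknown to the scanner,
  # so it is treated as a failed scan rather than as a clean box.
  return [:failed, "no vulnerabilities returned"] if vulnerabilities.empty?

  [:ok, vulnerabilities]
end
```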

To test the vulnerability scanning, a known vulnerability is introduced into a base box. The base box chosen for this is ubuntu/trusty64 with Ubuntu 14.04. The vulnerability used is the latest entry in the Ubuntu Security Notices [15], which at the time of writing is USN-3272-1: Ghostscript vulnerabilities [16]. This vulnerability is found in version 9.10~dfsg-0ubuntu10 of the Ghostscript package for Ubuntu 14.04. First, the base box is scanned to get all of its vulnerabilities. After this, the vulnerable version of the package is installed on the box and another scan is conducted.
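The comparison of the two scans reduces to a set difference. In this Ruby sketch the method names are hypothetical and the identifiers of the first scan are placeholders; only USN-3272-1 comes from the test described above.

```ruby
require "set"

# Advisories present only in the second scan were introduced by installing
# the vulnerable package version.
def introduced_advisories(before, after)
  (Set.new(after) - Set.new(before)).to_a.sort
end

# The test passes when the deliberately introduced advisory shows up.
def injection_detected?(before, after, intended)
  introduced_advisories(before, after).include?(intended)
end
```

With a hypothetical first scan ["EXAMPLE-0001"] and a second scan ["EXAMPLE-0001", "USN-3272-1"], injection_detected? returns true for "USN-3272-1".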

3.2 Vagrantfile analysis

To make the Vagrantfile easier to read, it is converted into another format that contains the configurations most interesting from a security standpoint. The first idea was to simply use regular expressions to extract the configuration, but it was realised early on that this would not work in many cases: it is difficult to know where a statement ends and how many lines to read. A parser is therefore a better choice [18]. The most complete parser found for Ruby code was a parser written in Ruby, called Parser [20]. Parser creates an abstract syntax tree, which can be traversed to get the configuration. A JSON object of the configuration is written to stdout. This JSON object contains the box name, the box URL, whether automatic update is enabled, synchronised folders, provisioning, network, and SSH configurations.
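Turning assignments into a JSON object can be sketched in Ruby. The sketch uses the standard-library Ripper parser instead of Parser [20], handles only config.vm assignments with literal values, and marks every other value with a hypothetical "<unresolved>" placeholder, in the spirit of storing unresolved statements as strings.

```ruby
require "ripper"
require "json"

# Hypothetical selection of `config.vm.<name> = <value>` settings.
WANTED_SETTINGS = %w[box box_url box_check_update].freeze
UNRESOLVED = "<unresolved>" # marker for values that would need code execution

# Convert a literal value node into a Ruby value without executing anything.
def literal_value(node)
  case node[0]
  when :string_literal
    parts = node[1].drop(1) # node[1] is [:string_content, part, ...]
    return UNRESOLVED unless parts.all? { |part| part[0] == :@tstring_content }

    parts.map { |part| part[1] }.join
  when :var_ref
    { "true" => true, "false" => false }.fetch(node[1][1], UNRESOLVED)
  when :@int
    Integer(node[1])
  else
    UNRESOLVED
  end
end

# Walk the whole tree and collect the wanted assignments.
def vm_settings(node, found = {})
  return found unless node.is_a?(Array)

  if node[0] == :assign && node[1].is_a?(Array) && node[1][0] == :field
    receiver, name = node[1][1], node[1].last
    if receiver.is_a?(Array) && receiver[0] == :call && receiver.last.is_a?(Array) &&
       receiver.last[1] == "vm" && WANTED_SETTINGS.include?(name[1])
      found[name[1]] = literal_value(node[2])
    end
  end
  node.each { |child| vm_settings(child, found) }
  found
end
```

Calling puts JSON.generate(vm_settings(Ripper.sexp(source))) prints the collected settings; a box URL read from ENV, for instance, ends up as "<unresolved>".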

The AST is traversed in post-order, because the information in the children is needed to determine what kind of statement a node represents. Listing 3.1 is an example of a synced folder statement. This statement disables the default synchronisation between the folder the Vagrantfile is in and the /vagrant folder in the root of the virtual machine. Figure 3.1 shows the AST for the statement in listing 3.1. From the first two children it is possible to determine that it is a synced folder statement. The third and fourth children contain strings with the folders that should be synchronised. The fifth and last child is not always present, but when it exists it contains additional options.

Listing 3.1: Synced folder statement in a Vagrantfile

```ruby
config.vm.synced_folder ".", "/vagrant", disabled: true
```

Figure 3.1: Abstract Syntax Tree
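A post-order traversal of the statement in listing 3.1 can be sketched in Ruby. Ripper from the standard library replaces Parser [20] here, so the node names (command_call, args_add_block, assoc_new, and so on) are Ripper's rather than those in figure 3.1, and only the call form without parentheses is recognised. Each child is reduced to a plain value before its parent is examined, and nodes of unknown type are still traversed so that statements nested below them are reached.

```ruby
require "ripper"

# Small wrapper types so that reduced values keep their syntactic role.
Ident = Struct.new(:name) # identifier token, e.g. a method name
Label = Struct.new(:name) # hash label such as `disabled:`
Args  = Struct.new(:list) # argument list of a call

# Finds `<receiver>.synced_folder "host", "guest", options` statements by
# walking a Ripper S-expression in post-order: every child is reduced to a
# plain Ruby value before its parent node is inspected.
class SyncedFolderFinder
  attr_reader :folders

  def initialize
    @folders = []
  end

  def visit(node)
    return node unless node.is_a?(Array)
    # Lists of statements or arguments carry no type tag.
    return node.map { |child| visit(child) } unless node.first.is_a?(Symbol)

    type = node.first
    return token(node) if type.to_s.start_with?("@")

    kids = node.drop(1).map { |child| visit(child) } # children first...
    reduce(type, kids)                                # ...then the parent
  end

  private

  # Leaf tokens look like [:@type, "text", [line, column]].
  def token(node)
    type, text = node
    case type
    when :@tstring_content then text
    when :@int then Integer(text)
    when :@ident then Ident.new(text)
    when :@label then Label.new(text.chomp(":"))
    when :@kw then { "true" => true, "false" => false, "nil" => nil }.fetch(text, node)
    else node
    end
  end

  # Unknown node types are kept with their reduced children.
  def reduce(type, kids)
    case type
    when :string_content then kids.join
    when :string_literal, :var_ref then kids.first
    when :args_add_block then Args.new(kids.first)
    when :assoc_new then [kids[0].is_a?(Label) ? kids[0].name : kids[0], kids[1]]
    when :bare_assoc_hash then kids.first.to_h
    when :command_call then record_call(kids)
    else [type, *kids]
    end
  end

  def record_call(kids)
    name_token = kids.grep(Ident).last
    args = kids.grep(Args).last
    if name_token && name_token.name == "synced_folder" && args
      host, guest, options = args.list
      @folders << { "host" => host, "guest" => guest, "options" => options || {} }
    end
    [:command_call, *kids]
  end
end
```

After finder.visit(Ripper.sexp(source)), finder.folders holds one entry per recognised statement, for listing 3.1 the host folder ".", the guest folder "/vagrant", and the option disabled set to true.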

To test the program that creates a JSON object from a Vagrantfile, Vagrantfiles containing a variety of configurations are needed. To build a dataset of Vagrantfiles, Google is used with the search query "vagrantfile site:github.com". This generates many results with projects that use Vagrant. The Vagrantfiles are then parsed and analysed, and any problems that occur are investigated and fixed. Because of the complexity of the Ruby language, a configuration can be written in a vast number of ways. Therefore only the most basic configurations, and some of the more complex ones that were found, are supported. When the program encounters a node it does not recognise, it still traverses the node's children, to make sure that configurations further down the tree are found.

3.3 Architecture

Figure 3.2 shows the architecture of the system. The System Scanner can be used if a full scan of a system is needed. Each part can also be run on its own; for example, it is possible to find all base boxes on a system or to scan just one Vagrantfile.

Figure 3.2: Architecture

System Scanner is the main part of the system and is used to run the other parts. It uses Vagrantfile Finder to find all Vagrantfiles, each of which is then sent to Box Scanner. It also uses Base Box Finder to find all installed base boxes. If a base box has already been scanned, its vulnerabilities are added from Database.

Box Scanner retrieves the configurations found in the Vagrantfile and the vulnerability scan of the base box, and combines them into a JSON object. If the base box has not been scanned, it uses Package Fetcher.

Package Fetcher is the part that launches base boxes, retrieves the installed packages, makes a vulnerability scan and puts the result into Database.

Analyser uses the AST provided by Parser to create a JSON object from a Vagrantfile.
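The two finders can be sketched in Ruby. The sketch assumes Vagrant's default box storage layout, in which an installed box is kept under ~/.vagrant.d/boxes/<name>/<version>/<provider>/ with the slash in the box name written as -VAGRANTSLASH-; the location can be changed with the VAGRANT_HOME environment variable, which the sketch ignores. The function names are hypothetical.

```ruby
require "find"

# Vagrantfile Finder: every regular file named "Vagrantfile" below root.
def find_vagrantfiles(root)
  Find.find(root).select { |path| File.basename(path) == "Vagrantfile" && File.file?(path) }.sort
end

# Base Box Finder. Installed boxes are assumed to live in
#   <boxes_dir>/<name with "/" as "-VAGRANTSLASH-">/<version>/<provider>/
def find_base_boxes(boxes_dir = File.join(Dir.home, ".vagrant.d", "boxes"))
  provider_dirs = Dir.glob(File.join(boxes_dir, "*", "*", "*")).select { |path| File.directory?(path) }
  provider_dirs.sort.map do |path|
    name, version, provider = path.split("/").last(3)
    { "name" => name.gsub("-VAGRANTSLASH-", "/"), "version" => version, "provider" => provider }
  end
end
```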

3.4 Reporting

Information for the report is gathered from the vulnerability scan of the base boxes and from the relevant configurations found. The vulnerability scan of the base box retrieves the distribution and version of the box, as well as the packages installed on the machine and the vulnerabilities these contain. The analysis of the Vagrantfile yields the important security-related configurations present in the file.

This information is then collected into a JSON object on a per-box basis. JSON was chosen as the format because it is lightweight and language independent. Language independence means that the report can easily be used for further work.

4 RESULTS

This chapter describes the result of the methods implemented in chapter 3.

4.1 Vulnerability scanning

Vulnerabilities were collected from each box by scanning its packages with Vulners. This provides a list of vulnerabilities for each box and for each version of the box that is scanned.

Of the 40 most downloaded base boxes, three failed. One of them was no longer available for download, one received no vulnerabilities from Vulners, and the last one was not configured correctly and could therefore not be started.

The number of vulnerabilities found for the five most downloaded base boxes can be seen in table 4.1. The hashicorp/precise64 box is used in the Vagrant documentation, which is most likely the reason it has so many downloads. It has, however, not been updated in three years, which is reflected in its number of vulnerabilities. The most downloaded Ubuntu box, ubuntu/trusty64, was recently updated and does not have as many vulnerabilities.

Table 4.1: Vulnerabilities for the first five most downloaded base boxes.

Base box                  Version         Vulnerabilities
ubuntu/trusty64           20170412.0.0    52
/homestead                2.1.0           30
hashicorp/precise64       1.1.0           214
/7                        1703.01         119
puphpet/ubuntu1404-x64    20161102        62

The scan with the introduced vulnerability found three new vulnerabilities, not just one: USN-3272-1, USN-3148-1, and USN-2697-1. The first two were introduced with Ghostscript, while all three could be found in libgs9, which is a dependency of Ghostscript. The intended vulnerability was among them, so the test was successful.

4.2 Static code analysis

Because Vagrantfiles are pure Ruby, the structure of the code and the coding methods used to create a file can be wildly different. For example, some of the files that were analysed open bash files that run further configurations, or download configurations from git repositories.

Figure 4.1 shows the JSON output for the example Vagrantfile in listing 2.1. As can be seen from the JSON representation, the base box, forwarded ports, and synchronised folders can successfully be extracted from the file. The JSON shows that the base box used to create the box is ubuntu/trusty64, and that port 80 on the guest has been mapped to port 8080 on the host. It also shows that the standard synchronised folder present in Vagrant machines is disabled. It should be noted that the example Vagrantfile is quite simple and that real files can be much more advanced; most of the configurations it contains are, however, relevant from a security perspective.

Figure 4.1: Visualisation of the Vagrantfile JSON. Rendered by [21].

The example file does, however, highlight a few problems with the static analysis method. Some statements cannot be evaluated without actually executing code, which could be a security risk. These unresolved statements are saved as strings in the JSON.

The following example shows one of the problems caused by not being able to execute code:

    if (not ARGV.include?("--no-provision"))
      config.vm.provision "shell", path: "install_packages.sh"
    end

In this case, the if statement checks whether the command-line arguments (ARGV) include the option --no-provision. Even if the program had access to these arguments, evaluating such a statement could be a security risk. Without executing the code, it is therefore hard to determine whether the configuration inside the if statement is applied to the box. In cases like this, configurations that are not certain to run are assumed to run.
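The assumption that uncertain configurations run can be sketched as a traversal that never evaluates an `if` condition and descends into both branches. The sketch is illustrative and assumes the node shape of whitequark/parser [20], where `unless` is also represented as an `if` node with its branches swapped, so it is covered as well. The `conditional` flag is a hypothetical addition in the spirit of the marking of uncertain configurations suggested in Section 5.2, and the example tree mirrors the snippet above.

```ruby
# Stand-in for Parser::AST::Node (only #type and #children are used).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Collect every method call reachable from `node`. The condition of an `if`
# is never evaluated; both branches are visited and assumed to run.
def collect_calls(node, conditional = false, found = [])
  return found if node.nil?

  case node.type
  when :begin
    node.children.each { |child| collect_calls(child, conditional, found) }
  when :if
    _condition, then_branch, else_branch = node.children
    collect_calls(then_branch, true, found)
    collect_calls(else_branch, true, found)
  when :send
    found << { method: node.children[1], conditional: conditional }
  end
  found
end

# config.vm.box = "ubuntu/trusty64"
# if (not ARGV.include?("--no-provision"))
#   config.vm.provision "shell", path: "install_packages.sh"
# end
config_vm = s(:send, s(:lvar, :config), :vm)
body = s(:begin,
         s(:send, config_vm, :box=, s(:str, "ubuntu/trusty64")),
         s(:if,
           s(:begin, s(:send, s(:send, s(:const, nil, :ARGV), :include?,
                                s(:str, "--no-provision")), :!)),
           s(:send, config_vm, :provision, s(:str, "shell"),
             s(:hash, s(:pair, s(:sym, :path), s(:str, "install_packages.sh")))),
           nil))

p collect_calls(body)
```

A complete walker would also descend into nested `block` nodes; they are left out here to keep the sketch short.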

The config.vm.provision statement in this case takes a path to a shell script and runs the configuration in it. If this file is present on the local machine, it can be analysed for more configuration information.
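If the referenced script exists locally, a simple first pass is to look for package-manager commands in it. The sketch below is a hypothetical heuristic, not part of the described analyser: `installed_packages` and `packages_from_provision` are illustrative names, only single-line `apt-get`, `apt` and `yum` install commands are recognised, `#` comments are ignored, and package versions are not determined.

```ruby
# Hypothetical heuristic: matches e.g. "sudo apt-get install -y pkg1 pkg2".
INSTALL_COMMAND = /\b(?:apt-get|apt|yum)\s+(?:-\S+\s+)*install\s+(.+)/

# Return the package names installed by a shell script's install commands.
def installed_packages(script)
  script.each_line.flat_map do |line|
    match = INSTALL_COMMAND.match(line.sub(/#.*/, "")) # drop comments
    next [] unless match

    # Stop at the next chained command, then drop option flags and continuations.
    arguments = match[1].split(/&&|;|\|/).first.to_s.split
    arguments.reject { |word| word.start_with?("-") || word == "\\" }
  end
end

# Analyse the provisioning script only if it is present on the local machine.
def packages_from_provision(path)
  File.file?(path) ? installed_packages(File.read(path)) : []
end

script = <<~SH
  #!/bin/sh
  sudo apt-get update
  sudo apt-get install -y ghostscript libgs9 && echo done
  # apt-get install commented-out
SH
p installed_packages(script)
```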

The final version has support for the following statements and operands:

• Hashes
• Pairs
• Arrays
• Symbols
• Integers
• Strings
• Booleans
• Constants
• Square bracket operator
• Here documents
• Assignment of constants
• Local and global variables
• Plus operator (converts operands to strings)
• Assignment of local and global variables

This does not cover the whole Ruby language but is enough to work with most Vagrantfiles.
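An evaluator covering the constructs listed above could look roughly like the following sketch. It is written under assumptions and is not the thesis code: node types follow whitequark/parser [20] (`dstr` for multi-line here documents, `casgn`, `lvasgn`, `gvasgn`, `const`, `lvar`, `gvar`, `send` for `+` and `[]`, or `index` when the parser is configured to emit it), a `Struct` stands in for the parser's nodes, and anything else, such as `ARGV.include?(...)`, becomes a hypothetical `Unresolved` marker instead of being executed. With real parser nodes, `node.location.expression.source` could supply the original source text to store in the JSON.

```ruby
# Stand-in for Parser::AST::Node (only #type and #children are used).
Node = Struct.new(:type, :children)

def s(type, *children)
  Node.new(type, children)
end

# Marks a value that cannot be known without executing code.
Unresolved = Struct.new(:node_type)

class Evaluator
  def initialize
    @bindings = {} # constants, local and global variables assigned so far
  end

  def value_of(node)
    return nil if node.nil?

    c = node.children
    case node.type
    when :str, :int, :sym then c[0]
    when :true then true
    when :false then false
    when :nil then nil
    when :array then c.map { |child| value_of(child) }
    when :hash, :kwargs then c.to_h { |pair| pair.children.map { |child| value_of(child) } }
    when :begin then c.map { |child| value_of(child) }.last
    when :dstr then concat(c.map { |child| value_of(child) }) # e.g. here documents
    when :lvasgn, :gvasgn then @bindings[c[0]] = value_of(c[1])
    when :casgn then @bindings[c[1]] = value_of(c[2])
    when :lvar, :gvar then @bindings.fetch(c[0]) { Unresolved.new(node.type) }
    when :const then @bindings.fetch(c[1]) { Unresolved.new(node.type) }
    when :index then subscript(value_of(c[0]), value_of(c[1]))
    when :send then call_value(*c)
    else Unresolved.new(node.type)
    end
  end

  private

  def call_value(receiver, method, *args)
    case method
    when :+ then concat([value_of(receiver), value_of(args[0])])
    when :[] then subscript(value_of(receiver), value_of(args[0]))
    else Unresolved.new(:send) # e.g. ARGV.include?(...) would need execution
    end
  end

  # The plus operator converts its operands to strings before joining them;
  # an unresolved operand makes the whole result unresolved.
  def concat(parts)
    parts.find { |part| part.is_a?(Unresolved) } || parts.map(&:to_s).join
  end

  # Square bracket operator on an already resolved array or hash.
  def subscript(collection, key)
    return collection[key] if collection.is_a?(Hash)
    return collection[key] if collection.is_a?(Array) && key.is_a?(Integer)

    Unresolved.new(:index)
  end
end

evaluator = Evaluator.new
evaluator.value_of(s(:casgn, nil, :BOX, s(:str, "ubuntu/trusty64"))) # BOX = "ubuntu/trusty64"
p evaluator.value_of(s(:const, nil, :BOX))
```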

Many of the settings found in the Vagrantfile are not interesting from a security standpoint, so only the relevant settings are extracted. During the analysis, the following settings were found to contain useful information:

• Forwarded ports
• Private networks
• Public networks
• Base box name
• Base box URL
• Automatic update
• Synced folders
• Provisioning
• SSH configuration
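The mapping from Vagrant settings to these categories can be written down as a small lookup. The sketch below assumes the standard Vagrant option names (`config.vm.box`, `box_url`, `box_check_update`, `network` with the types `forwarded_port`, `private_network` and `public_network`, `synced_folder`, `provision`, and the `config.ssh` namespace). Mapping "Automatic update" to `box_check_update` is our assumption, and `category` is a hypothetical helper.

```ruby
# Network types passed as the first argument of config.vm.network.
NETWORK_TYPES = {
  "forwarded_port" => "Forwarded ports",
  "private_network" => "Private networks",
  "public_network" => "Public networks"
}.freeze

# Other config.vm settings of interest (names without a trailing "=").
VM_SETTINGS = {
  box: "Base box name",
  box_url: "Base box URL",
  box_check_update: "Automatic update", # assumed meaning of "Automatic update"
  synced_folder: "Synced folders",
  provision: "Provisioning"
}.freeze

# namespace: :vm or :ssh (config.vm / config.ssh); method: the called setting,
# e.g. :box= or :network; first_arg: the first literal argument, if any.
# Returns the category name, or nil if the setting is not security relevant.
def category(namespace, method, first_arg = nil)
  return "SSH configuration" if namespace == :ssh
  return nil unless namespace == :vm
  return NETWORK_TYPES[first_arg.to_s] if method == :network

  VM_SETTINGS[method.to_s.delete_suffix("=").to_sym]
end

p category(:vm, :network, "forwarded_port")
```

Accepting both strings and symbols for the network type covers Vagrantfiles that write `config.vm.network :private_network`.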

4.3 Full scan

The full scan searches the whole system for Vagrantfiles and installed boxes. Its result shows the Vagrantfiles and base boxes that were found, together with their security configurations and vulnerabilities.

Figure 4.2 shows what the result of a full scan can look like. It combines the output from Box Scanner and the installed base boxes into one JSON document. As can be seen in the installed_boxes section of the figure, the scanned system has three base boxes installed. The vagrantfiles section contains two scanned Vagrantfiles.
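A full scan along these lines can be sketched as below, with several assumptions. Installed boxes are read from Vagrant's box directory (by default `~/.vagrant.d/boxes`, or `$VAGRANT_HOME/boxes`), where each box is stored as `<name with / replaced by -VAGRANTSLASH->/<version>/<provider>`. Vagrantfiles are found with a recursive glob. The Vagrantfile analyser and the per-box vulnerability results are represented by a caller-supplied block rather than the actual Box Scanner output. Only the `installed_boxes` and `vagrantfiles` section names come from Figure 4.2.

```ruby
require "json"

# Vagrant's default box directory, honouring VAGRANT_HOME when it is set.
def default_boxes_dir
  File.join(ENV.fetch("VAGRANT_HOME", File.join(Dir.home, ".vagrant.d")), "boxes")
end

# Each installed box lives in <boxes_dir>/<escaped name>/<version>/<provider>/.
def installed_boxes(boxes_dir = default_boxes_dir)
  provider_dirs = Dir.glob(File.join(boxes_dir, "*", "*", "*")).select { |path| File.directory?(path) }
  boxes = provider_dirs.map do |path|
    name, version, provider = path.split("/").last(3)
    { "name" => name.gsub("-VAGRANTSLASH-", "/"), "version" => version, "provider" => provider }
  end
  boxes.sort_by { |box| [box["name"], box["version"], box["provider"]] }
end

# Combine the installed boxes with the analysis of every Vagrantfile under root.
# The block is a hypothetical hook standing in for the Vagrantfile analyser.
def full_scan(root, boxes_dir = default_boxes_dir)
  vagrantfiles = Dir.glob(File.join(root, "**", "Vagrantfile")).sort.map do |path|
    { "path" => path }.merge(block_given? ? yield(path) : {})
  end
  { "installed_boxes" => installed_boxes(boxes_dir), "vagrantfiles" => vagrantfiles }
end

# Usage: puts JSON.pretty_generate(full_scan(Dir.home))
```

Note that `**` in `Dir.glob` skips hidden directories by default, so Vagrant's own `.vagrant.d` folder is not searched for Vagrantfiles.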

Figure 4.2: Visualisation of the full scan JSON. Rendered by [21].

5 DISCUSSION

5.1 Vulnerability scanning

Of the 40 scanned base boxes, only three failed. One of them failed because Vulners returned no vulnerabilities for it, which is Vulners' standard response when the version is not supported. This could be fixed in the future by adding more scanners to the system. The other two base boxes were misconfigured or unavailable for download, which only the developers of those base boxes can correct.

The number of vulnerabilities in the base boxes mostly depended on how recently they had been updated. The hashicorp/precise64 box had not been updated for three years, which is visible in the large number of vulnerabilities it contains. However, centos/7 also contained a large number of vulnerabilities despite having been updated a month before the scan. This is likely due to a poor selection of installed packages.

Scanning the base boxes provides a good indication of which vulnerabilities exist on a Vagrant box. The scan does not, however, give a completely accurate representation of the vulnerabilities in the box's packages. There can be false positives and false negatives because the base box is scanned instead of the actual box: it is the starting state of the box that is scanned rather than its current state. Packages that are upgraded or removed after launch can produce false positives, while packages added after launch can produce false negatives.
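The direction of these errors can be made concrete with set differences. In this illustrative sketch, packages are written as hypothetical `name=version` strings, and `vulns_by_package` stands in for a vulnerability database; none of the data comes from the scans in this work.

```ruby
# Vulnerability IDs known for a list of packages (written as "name=version").
def vulnerabilities(vulns_by_package, packages)
  packages.flat_map { |pkg| vulns_by_package.fetch(pkg, []) }.uniq
end

# Compare what a base box scan reports with what the provisioned box contains.
def scan_errors(vulns_by_package, base_box_packages, box_packages)
  reported = vulnerabilities(vulns_by_package, base_box_packages)
  actual = vulnerabilities(vulns_by_package, box_packages)
  { false_positives: reported - actual, false_negatives: actual - reported }
end

# Hypothetical data: pkg-a is upgraded and pkg-b installed during provisioning.
vulns_by_package = { "pkg-a=1.0" => ["VULN-1"], "pkg-b=2.0" => ["VULN-2"] }
p scan_errors(vulns_by_package, ["pkg-a=1.0"], ["pkg-a=1.1", "pkg-b=2.0"])
```

Here VULN-1 is reported although the upgrade removed it (a false positive), and VULN-2 is missed because pkg-b was only installed during provisioning (a false negative).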

The inaccuracies of scanning the base box as opposed to the actual box can be partially offset by the analysis of the Vagrantfile. Because the Vagrantfile exists to provide the configuration the box needs to achieve its purpose, most of the installed packages are installed with its help. There are, however, many different ways packages can be installed through the Vagrantfile, such as bash scripts, , or Puppet, all of which would have to be supported to get an accurate representation.

It is also possible to perform a static analysis of the boxes themselves, but this would require significant investment in supporting different file systems and partitions. Unless the box to be scanned has already been launched, this approach would also have to be applied to the base box, because Vagrant performs much of its configuration of a machine the first time the box is launched.

5.2 Static code analysis

The static analysis of the Vagrantfile does not always give an entirely accurate representation of the configurations made. This is because Vagrantfiles sometimes contain statements that cannot be resolved without executing code or checking configurations written in other languages and files. As shown by the result in Figure 4.1, the condition of the if statement is disregarded. This is because it is difficult to know whether the condition is true or false without executing code, and in this case it depends on a command-line argument that does not exist during static analysis. Everything inside the if statement is assumed to run, because it is better to report configurations that may not apply than to give no indication of them at all. In the future, this could be partially addressed by marking configurations that are not certain.

Any configuration the analyser cannot handle is stored as a string, as can be seen for the shell script and shell file in Figure 4.1, where these configurations are saved as the inline shell script and the path to the file that is read. This makes it easy for anyone investigating the report to see which configurations were not resolved and where the problem is.

It is also possible, though not recommended, to configure a box over SSH after it has started. Such configurations would not be detected by a static analysis of the Vagrantfile.

There is also a problem with how base boxes are created. According to Vagrant's guidelines, a base box should be a lightweight box with no use on its own, and the Vagrantfile exists to configure it for the desired purpose. If a base box is not built this way, the configurations made directly on the base box are not detected by the analysis of the Vagrantfile.

Static analysis of the Vagrantfile also has benefits: it gives a more condensed summary of security-related configurations, and because the box is not scanned directly, it does not have to be started. The box does not even have to be present on the system when its vulnerabilities are analysed. The static analysis needs only the Vagrantfile; a complete vulnerability analysis additionally needs either a previous scan of the base box or access to the base box.
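Reusing a previous scan can be sketched as a lookup keyed by base box name and version, optionally refined with the packages that the Vagrantfile analysis reports as provisioned. `ScanDatabase` and `BaseBoxScan` are hypothetical in-memory stand-ins, the example data is invented for illustration, and the storage used by the described tool is not specified here.

```ruby
# Results of one earlier base box scan.
BaseBoxScan = Struct.new(:packages, :vulnerabilities)

# Hypothetical in-memory stand-in for a database of scanned base boxes.
class ScanDatabase
  def initialize
    @scans = {}
  end

  def store(name, version, packages:, vulnerabilities:)
    @scans[[name, version]] = BaseBoxScan.new(packages, vulnerabilities)
  end

  # Estimate a box's vulnerabilities from its base box without launching it.
  # Returns nil if this base box version has never been scanned; otherwise the
  # base box's vulnerabilities plus the provisioned packages the scan did not cover.
  def estimate(name, version, provisioned_packages = [])
    scan = @scans[[name, version]]
    return nil unless scan

    { vulnerabilities: scan.vulnerabilities,
      not_covered: provisioned_packages - scan.packages }
  end
end

db = ScanDatabase.new
db.store("ubuntu/trusty64", "20170412.0.0", packages: ["bash", "openssl"], vulnerabilities: ["VULN-1"])
p db.estimate("ubuntu/trusty64", "20170412.0.0", ["openssl", "ghostscript"])
```

A `nil` result signals that the base box must be launched and scanned once before boxes built on it can be estimated.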

5.3 Full scan

The full scan combines the scan of the base box with the analysis of the Vagrantfile and compiles them into JSON. This makes it easier to see which boxes are installed on a system, get an indication of which vulnerabilities they contain, and find security-related configurations that can be used to investigate the box further.

The proposed method of scanning Vagrant boxes can help in choosing which box to use by giving a quick view of the configurations and vulnerabilities a box contains. Even people with no programming knowledge can judge whether a box is safe to use by only providing the Vagrantfile, without having to start the box. This can reduce the number of vulnerabilities in the boxes people use and thereby increase security for Vagrant users.

6 CONCLUSIONS

The chosen method provides an indirect scan of the box by launching the base box instead. This scan gives a good indication of which, and how many, vulnerabilities a box contains, but it is not completely accurate, because the box might not contain the same vulnerabilities as the base box: packages may have been added or removed since the box was launched. This can be partially addressed by looking at the result of the Vagrantfile analysis, since it contains the provisioning and therefore shows which packages are installed when the box is started.

The Vagrantfile analyser successfully extracts many of the security configurations that are useful in evaluating whether any insecure settings have been applied to the machine. This is done without executing any code from the Vagrantfile. It does, however, mean that some configurations are not properly resolved, as resolving them is sometimes impossible without code execution.

Building a complete static code analyser is very difficult. Such an analyser would have to account for all possible inputs. It would also have to support not only the language the analysed file is written in, but every other language that file can interact with.

Together, these parts can automatically produce a list of the Vagrant boxes on a system, along with the vulnerabilities they contain and the security configurations made in their Vagrantfiles. This is done without launching the boxes or executing any code in the Vagrantfiles.

• “How can software vulnerabilities in Vagrant virtual machines be detected without launching boxes?”

By keeping a database of already scanned base boxes, the vulnerabilities in a box can be estimated. The accuracy can be increased by also analysing the Vagrantfile that configures the box.

• “How can configurations relevant for a security analysis be extracted from the Vagrantfile without code execution?”

By performing a static code analysis of the Vagrantfile, it is possible to extract the relevant configurations. The Vagrantfile is written in Ruby, so this is done by first parsing the Ruby code and then traversing the syntax tree generated by the parser.


7 RECOMMENDATIONS AND FUTURE WORK

The solution of scanning base boxes and analysing the Vagrantfile works well for avoiding launching boxes and executing code. It is, however, still at a proof-of-concept stage and can be improved significantly. At the moment, the program uses only Vulners for the vulnerability scan; support for more scanners would make the results more accurate. Another important improvement would be to use the result of the Vagrantfile analysis to extract which packages are installed at start-up. This would also require a way to determine which versions of those packages are installed.

The parser also requires more work, as it needs support for interpreting configurations made by other means, such as bash scripts, Chef, or . Supporting every possible way of configuring a box is unlikely, but supporting the most popular solutions is entirely possible.

Another way to avoid launching boxes or executing code would be to mount the box image directly. This method would be more accurate but would require considerably more work. It would need support for the different file systems and partitions, as well as more precise knowledge of the structure of the operating system in order to extract information.


REFERENCES

[1] M. Hashimoto, Vagrant: Up and Running, 1st ed. O’Reilly Media, 2013.
[2] "Docker", Docker, 2017. [Online]. Available: https://www.docker.com/. [Accessed: 27- Apr- 2017].
[3] "fgrehm/docker-provider", GitHub, 2017. [Online]. Available: https://github.com/fgrehm/docker-provider. [Accessed: 25- Apr- 2017].
[4] S. Vaughan-Nichols, "Virtualization Sparks Security Concerns", Computer, vol. 41, no. 8, pp. 13-15, 2008.
[5] T. Garfinkel and M. Rosenblum, "When Virtual Is Harder than Real: Security Challenges in Virtual Machine Based Computing Environments", HotOS, 2005.
[6] M. Kulenovic and D. Donko, "A survey of static code analysis methods for security vulnerabilities detection", 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014.
[7] "coreos/clair", GitHub, 2017. [Online]. Available: https://github.com/coreos/clair. [Accessed: 27- Apr- 2017].
[8] "appc/spec", GitHub. [Online]. Available: https://github.com/appc/spec. [Accessed: 25- Apr- 2017].
[9] Atlas.hashicorp.com. [Online]. Available: https://atlas.hashicorp.com/boxes/search. [Accessed: 13- Apr- 2017].
[10] W. Gajda, Pro Vagrant, 1st ed. 2015.
[11] A. Rehman, S. Alqahtani, A. Altameem and T. Saba, "Virtual machine security challenges: case studies", International Journal of Machine Learning and Cybernetics, vol. 5, no. 5, pp. 729-742, 2013.
[12] I. Kochurkin, "Theory and Practice of Source Code Parsing with ANTLR and Roslyn", Blog.ptsecurity.com, 2016. [Online]. Available: http://blog.ptsecurity.com/2016/06/theory-and-practice-of-source-code.htmltheory-of-parsing. [Accessed: 18- Apr- 2017].
[13] "Vulners - Vulnerability Data Base", Vulners.com. [Online]. Available: https://vulners.com/. [Accessed: 18- Apr- 2017].
[14] T. Bui, "Analysis of docker security", arXiv preprint arXiv:1501.02967, 2015.
[15] "Ubuntu security notices | Ubuntu", Ubuntu.com, 2017. [Online]. Available: https://www.ubuntu.com/usn/. [Accessed: 02- May- 2017].
[16] "USN-3272-1: Ghostscript vulnerabilities | Ubuntu", Ubuntu.com, 2017. [Online]. Available: https://www.ubuntu.com/usn/usn-3272-1/. [Accessed: 02- May- 2017].
[17] J. Smith and R. Nair, "The architecture of virtual machines", Computer, vol. 38, no. 5, pp. 32-38, 2005.
[18] A. Karpov, "Static analysis and regular expressions", Viva64.com, 2010. [Online]. Available: https://www.viva64.com/en/b/0087/. [Accessed: 18- Apr- 2017].
[19] "RFC 7159 - The JavaScript Object Notation (JSON) Data Interchange Format", Tools.ietf.org, 2014. [Online]. Available: https://tools.ietf.org/html/rfc7159. [Accessed: 17- Apr- 2017].
[20] "whitequark/parser", GitHub, 2017. [Online]. Available: https://github.com/whitequark/parser. [Accessed: 13- Apr- 2017].
[21] J. Jong, "JSON Editor Online - view, edit and format JSON online", Jsoneditoronline.org. [Online]. Available: http://www.jsoneditoronline.org/. [Accessed: 19- Apr- 2017].
