Provenance Issues in Platform-as-a-Service Model of Cloud

Devdatta Kulkarni [email protected] Rackspace, The University of Texas at Austin

Abstract

In this paper we present provenance issues that arise in building the Platform-as-a-Service (PaaS) model of cloud computing. The issues are related to designing, building, and deploying the platform itself, and to building and deploying applications on the platform. These include tracking of commands for successful software installations, tracking of inter-service dependencies, tracking of configuration parameters for different services, and tracking of application related artifacts. We identify the provenance information needed to address these issues and propose mechanisms to track it.

1 Introduction

The Platform-as-a-Service (PaaS) model of cloud computing simplifies deployment of applications to a cloud infrastructure. Application developers use a declarative model for specifying their application's source code and requirements to the platform. The platform builds and deploys the application by provisioning the required infrastructure resources. There are several such PaaS systems in existence today, such as Heroku [1], Google App Engine [2], OpenShift [3], and Solum [4].

Provenance has emerged as one of the key requirements in building modern systems that support properties such as auditability and data lineage tracking [5]. In our view, provenance is all the information about an entity that contributes towards understanding how that entity got to a particular state. For instance, when a software package is successfully installed on an Ubuntu host, all the packages that had to be installed along the way are part of that software's provenance. Another example is that of a service which depends on other services for its correct functioning. The dependency information, such as the versions of the dependent services, is part of that service's provenance.

In a PaaS system, the need to track provenance information is especially important given that the platform manages the complete life-cycle of an application. In order for application developers to gain confidence in the platform and to gain insights into the application construction process, it is important that the platform collects different kinds of provenance information and makes it available to application developers. Additionally, provenance information about the platform itself is useful for the platform developers and platform operators towards correct design and operation of the platform.

In this paper we identify different kinds of provenance information that need to be collected within a PaaS to provide better insights into the workings of the platform and the applications that it deploys (Section 2). We also propose mechanisms to collect this information (Section 3). We realized the importance of some of these through our experience of building Solum, which is an application life-cycle management system for OpenStack [6]. Application developers use a declarative specification model to specify an application's build and runtime requirements to Solum. Solum builds and deploys the application by creating Docker containers [7] with the application code, henceforth referred to as docker containers, and deploying them on OpenStack infrastructure.

2 Provenance Issues

2.1 Platform development

While developing Solum, we frequently followed a pattern of trial and exploration when it came to using new software and tools, such as the docker registry [8], which we use for storing an application's docker containers. The typical pattern that we followed was as follows. We referred to several on-line documents, installed different software packages, changed file permissions if required, changed directories, compiled/built certain packages, and so on until we were successful.

At the end, when the required software/package was installed successfully, we wanted to consolidate the exact steps that we could follow if we had to install that software on a different host in the future. However, at this point we generally faced a problem. From the entire command history, it was difficult to find out only those commands which were essential towards successful installation of that software. We felt that we needed an automated way which, given a list of commands, will find out the exact commands that are necessary and sufficient for installing that software. Essentially, we needed the provenance for software installs (subsection 3.1).

We observe that the trial and exploration pattern is not unique to the design and development of Solum. It is quite general and can be seen to be followed by developers trying to install any new software.
2.2 Platform building

Solum depends on other OpenStack services such as Keystone [9], Glance [10], Heat [11], Barbican [12], and Tempest [13]. Often these services change their requirements, or undergo a refactoring change, which causes Solum to break. For example, recently the tempest functional testing library re-factored its code to move its REST module from its package to a separate library. This caused all the functional tests in Solum to stop working, as the import of the REST module within Solum's functional tests was no longer correct. Such a breakage results in wasted developer productivity. It also leads to wasted resources in terms of testing failures and backups on OpenStack's continuous integration servers (CI servers). The problem is exacerbated if a service has had several commits. For developers of a service like Solum, it is currently very tedious to find out which commit in the dependent service broke their build. We felt that the severity of this issue could be reduced if we could have information about the last commit of the dependent service that worked for Solum. Then the problem would be constrained to finding the culprit commit between the last known working commit and the latest commit. Essentially, we needed to maintain provenance information related to inter-service dependencies (subsection 3.2).

2.3 Platform deployment

Each OpenStack service has a large number of configuration parameters. It is hard for developers of a service like Solum, which depends on other services, to know the parameters from dependent services that are critical to the operation of their service. We have faced this issue several times in the devstack [14] setup of OpenStack, which supports running all the OpenStack services within a single VM, typically for the purpose of design and development. As an example of how configuration parameters for other services matter, for Solum to function correctly in the devstack environment the following variables need to be set in Nova's [15] configuration file: scheduler default filter, memory, and compute driver. We propose in subsection 3.3 how we can use provenance of configuration parameters towards enabling the choice of appropriate configuration parameters of dependent services for the purpose of design and development of Solum.

2.4 Application provenance

Within Solum, application construction and deployment information includes the commit hash of the application's source code deployed by the platform, the test and run commands used in building a particular version of the application, the version of docker used to build the application runtime environment, and the versions of dependent services used. All this information is part of the application construction and deployment provenance.
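For illustration, the following is a minimal sketch of one way such a record could be represented; the field names are hypothetical and are not part of Solum's object model.

# Hypothetical record grouping the application construction and deployment
# provenance fields described above; not Solum's actual data model.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AppProvenanceRecord:
    source_commit: str                 # commit hash of the deployed application code
    test_command: str                  # test command used for this application version
    run_command: str                   # run command used for this application version
    docker_version: str                # version of docker used to build the runtime
    service_versions: Dict[str, str] = field(default_factory=dict)  # e.g. {"keystone": "..."}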
3 Provenance Mechanisms

3.1 Command list provenance

In order to find the provenance for successfully installing a software package we need to consider the list of commands that eventually led to a successful installation. An entire history of a user's shell interactions contains different kinds of commands, such as package installation commands (apt-get install), navigational commands (cd, pushd, popd), listing/viewing commands (ls, less), editor commands (opening files in vi, emacs), service start-up commands (service docker start), and so on. Below we propose a method which, given a list of commands and the name of the software to install, prunes the command list to the essential commands required for installing the software.

The main issues that need to be addressed in designing this method are: (a) detecting where to start and stop in the user's command history, and (b) determining whether the candidate list of commands is complete. To address the issue of where to start and stop in a long execution history we define the notion of an anchor point (ap) command. An example of an anchor point command on Ubuntu is apt-get update. The reason this command can be treated as an anchor point is that, in our experience, updating packages on the host is a typical usage pattern when trying to install new software on Ubuntu hosts. Anchor point commands may also be specified by users. Moreover, instead of considering the entire command history to build the candidate list, we can use the last command that led to successfully starting up the service as another anchor point. Let us call this the last successful command (lsc). To address the issue of how to find out whether the candidate list of commands is complete, we propose to try out that command list in an automated manner in a contained environment. Our idea for this is to build a docker container consisting of the list of candidate commands and check it with a verification script, provided by the application developer, which checks whether that software was correctly installed on the container or not. These ideas are captured in the following algorithm:

Input: The command history and a verification script which identifies whether the software was correctly installed.
Output: Pruned command list.
Steps:
1) Start with the last successful command (lsc) from the command history.
2) Work backwards in the command history to collect all the commands except the listing commands. This can be done till the first anchor point (ap).
3) Create a Dockerfile consisting of the commands collected in step 2. Add the verification script and set it to be run as the entry point of the container.
4) Build a docker container from this Dockerfile and run it. The docker run will complete successfully only if the verification script exits without an error (this is by design of the verification script). The list of commands used to build the container is the required provenance list for that software.
5) If, on the other hand, docker run has exited with an error, then go back to step 2 till the next anchor point in the command history, and repeat from step 3 onwards with the new list of commands.
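For concreteness, the following is a minimal sketch of this pruning loop in Python, assuming docker is installed on the host and the verification script is supplied by the user; the helper names and the base image are our own choices, and this is not the implementation available at [17].

import os
import subprocess
import tempfile

# Anchor point commands and listing commands; both lists are user-extensible.
ANCHOR_POINTS = {"apt-get update"}
LISTING_COMMANDS = ("ls", "less", "cat")


def is_listing(cmd):
    parts = cmd.split()
    return not parts or parts[0] in LISTING_COMMANDS


def build_and_verify(commands, verify_script):
    """Steps 3 and 4: build a container from the candidate commands and run the
    verification script as its entry point; True if the script exits cleanly."""
    with tempfile.TemporaryDirectory() as ctx:
        with open(os.path.join(ctx, "verify.sh"), "w") as f:
            f.write(verify_script)
        lines = ["FROM ubuntu:14.04"]
        lines += ["RUN %s" % c for c in commands]
        lines += ["COPY verify.sh /verify.sh", 'ENTRYPOINT ["sh", "/verify.sh"]']
        with open(os.path.join(ctx, "Dockerfile"), "w") as f:
            f.write("\n".join(lines) + "\n")
        tag = "cmdlist-provenance-candidate"
        if subprocess.call(["docker", "build", "-t", tag, ctx]) != 0:
            return False
        return subprocess.call(["docker", "run", "--rm", tag]) == 0


def prune(history, lsc_index, verify_script):
    """Steps 1, 2 and 5: starting from the last successful command, extend the
    candidate list back to successive anchor points until verification passes."""
    starts = [i for i in range(lsc_index, -1, -1) if history[i] in ANCHOR_POINTS]
    if not starts or starts[-1] != 0:
        starts.append(0)  # fall back to the whole history if no anchor point helps
    for start in starts:
        candidate = [c for c in history[start:lsc_index + 1] if not is_listing(c)]
        if build_and_verify(candidate, verify_script):
            return candidate  # the required provenance list for this software
    return None

For docker, for instance, the verify_script argument could simply be the single line "docker -v", matching the check described next.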

The verification script defines checks to verify that the software was correctly installed. For instance, when installing docker, the docker -v command executes successfully only if docker is correctly installed. So in this case the verification script could execute this command and check its return status [17]. On the other hand, if we are trying to install tomcat, then one way to verify that the installation was successful is to check if the directory where tomcat deploys web applications has been created at a well-known location [17]. The idea of using a verification script is inspired by Solum's custom language packs feature. In section 4 we present an initial evaluation of this approach.
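As an illustration of such checks (and not the scripts published at [17]), the sketch below expresses the two examples in Python; using it as the container entry point assumes python is available inside the container, and the tomcat directory path is an assumption that depends on the package being installed.

import os
import subprocess
import sys


def verify_docker():
    # docker -v returns a zero exit status only if the docker client is
    # correctly installed.
    return subprocess.call(["docker", "-v"])


def verify_tomcat(webapps_dir="/var/lib/tomcat7/webapps"):
    # One possible check: the directory where tomcat deploys web applications
    # exists at a well-known location (the exact path is package dependent).
    return 0 if os.path.isdir(webapps_dir) else 1


if __name__ == "__main__":
    # e.g. "python verify.py docker" when used as the container entry point
    checks = {"docker": verify_docker, "tomcat": verify_tomcat}
    sys.exit(checks[sys.argv[1]]())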
3.2 Service dependency tracking

As mentioned in subsection 2.2, Solum may break due to changes in dependent services. Presently we realize this when the outstanding Solum patches fail the continuous integration tests on OpenStack's CI servers. We resolve this situation by going through the commit history of the dependent service, finding the commit which is causing Solum to break, and pinning Solum to use a commit before this commit. Below we propose two approaches that can help with this process.

3.2.1 Dependent services' commits tracking

One way to find the commit(s) to pin to is to maintain provenance information about Solum that includes the most recent git commits of all the services that Solum depends on. For instance, we could maintain a tuple recording that Solum's build is known to be working correctly with commit a of Keystone, commit 1 of Tempest, and commit A of Glance. Whenever Solum's CI server runs a Solum patch successfully, we could update this tuple to include the latest commit ids for each of the dependent services. Now consider the situation when the current commits for the dependent services are some later tuple, with the assumption that the commit ids increase according to the natural definition of the range from which they are picked for this example (so, for instance, the commits for Keystone are a, b, c, d, and so on). Suppose that this set of commits is causing Solum to break. Then the question is how to efficiently find the next successful commits tuple.

One approach to find this is to sequentially try each previous commit of each of the dependent services to find out if that commit can be included in the successful commits tuple. If a specific commit of a service cannot be included, we remove it from consideration. When we do so, we need to also remove from consideration the commits of other services that may depend on the commit being removed. For example, Glance's commit B may be dependent on Keystone's commit c, and Tempest's commit 3 may be dependent on Keystone's commit b. So if we remove Keystone's commit c from consideration, then we need to also remove Glance's commit B from consideration. The next candidate commits tuple we can try is then one that excludes both of these commits. Observe that the inter-service dependency information is a directed graph, with each commit representing a node and a directed edge between two nodes if the commit represented by the source node depends on the target. When we remove a commit from consideration, we need to remove all the commits from dependent services that are part of the transitive closure involving the commit being removed, on the graph with the direction of the edges reversed.

The order of choosing a service to try may matter in this approach. For instance, if a service s1 is picked as the first service to try, and if its last commit l is required by a very old commit o of another service s2, then removing l from consideration would lead to removal of all the commits of s2 that were made following and including o. This might be incorrect, as the commit which might really be causing the problem could be a relatively newer commit n of s2. If we pick s2 instead of s1 as the first service to try, we could find this. It remains to be evaluated what would be a good strategy for selecting the order in which to consider different services.
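A small sketch of this removal step follows, using the hypothetical Keystone/Glance/Tempest commits from the example above; the dependency edges here are illustrative, not real commit data.

from collections import defaultdict

# depends_on[x] = commits that commit x depends on (illustrative data mirroring
# the example above: Glance's B depends on Keystone's c, Tempest's 3 on b).
depends_on = {
    ("glance", "B"): {("keystone", "c")},
    ("tempest", "3"): {("keystone", "b")},
}

# Reverse the edges: required_by[y] = commits that depend on commit y.
required_by = defaultdict(set)
for commit, deps in depends_on.items():
    for dep in deps:
        required_by[dep].add(commit)


def remove_with_dependents(candidates, commit):
    """Remove `commit` and the transitive closure of commits that require it
    (i.e. walk the dependency graph with the edge direction reversed)."""
    stack = [commit]
    while stack:
        c = stack.pop()
        if c in candidates:
            candidates.discard(c)
            stack.extend(required_by[c])
    return candidates


candidates = {("keystone", "b"), ("keystone", "c"), ("glance", "B"), ("tempest", "3")}
# Removing Keystone's commit c also removes Glance's commit B from consideration.
print(remove_with_dependents(candidates, ("keystone", "c")))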

3.2.2 Dependent services' merge tracking

Another approach for determining commit(s) to pin to is as follows. In OpenStack there is a system named Zuul [18], which orchestrates execution of continuous integration runs for all the OpenStack projects. It maintains a YAML-based project definition for each project. We propose that this project definition be enhanced by including the list of projects that depend on that project, and a list of trigger events. For example, below we show how the project definition for Barbican might look with this enhancement. The Triggers attribute identifies the list of triggering events and the Used_by attribute identifies the list of projects that depend on Barbican.

Barbican:
  Triggers: OnMergeToMaster
  Used_by:
    Solum, Murano, Mistral

Whenever a patch is merged into the master branch of Barbican, the OnMergeToMaster event will be generated. This event will cause a "recheck no bug" comment to be added to the Used_by projects' outstanding patches. This comment has a special meaning within Zuul. It causes Zuul to trigger a CI run on those patches. If a patch passes, then we know that the change to Barbican is non-breaking. However, if the patch fails, then we would have proactively caught the breaking change.

In this approach, how to manage updates to the dependent service's project configuration can become an issue. Specifically, we cannot expect the dependent service's developers to know about all the other projects that are using it. We propose that the team members of the project that depends on another project propose a patch to modify the dependent service's project configuration in Zuul. The dependent service's team members need to approve this patch before it can be merged, thus changing the dependent service's project configuration.

Some of the outstanding questions in this approach are: what modifications are required to Zuul to enable event generation? If there are no outstanding patches, could we make Zuul generate a new patch, submit it to the Used_by project's review queue, and trigger a CI run on it? What would be the nature of such a patch?

3.3 Infrastructure configuration tracking

Towards maintaining provenance information about the configuration parameters of dependent services, we propose to version control the configuration parameters for different OpenStack services that led to a successful deployment of Solum. For instance, we would version control entries in nova.conf, heat.conf, and tempest.conf corresponding to successful deployments of Solum. Then, when a bug such as [19] is encountered, which stems from not having enough memory on the host, and whose resolution is to set the value of the scheduler default filter configuration parameter in nova.conf to AllHostsFilter, having this parameter and its value tracked in version control would help create provenance for successful future deployments of Solum.
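One lightweight way to realize this version controlling is sketched below, assuming that the deployment scripts can read the services' configuration files and that a git repository has been set aside for the snapshots; the file paths and repository location are illustrative assumptions, not part of Solum.

import os
import shutil
import subprocess

# Illustrative paths; actual configuration file locations depend on how the
# devstack or production environment is set up.
CONFIG_FILES = ["/etc/nova/nova.conf", "/etc/heat/heat.conf", "/etc/tempest/tempest.conf"]
PROVENANCE_REPO = "/opt/solum-config-provenance"  # a pre-initialized git repository


def snapshot_configs(deployment_id):
    """Record the configuration that corresponds to a successful deployment."""
    for path in CONFIG_FILES:
        if os.path.exists(path):
            shutil.copy(path, os.path.join(PROVENANCE_REPO, os.path.basename(path)))
    subprocess.check_call(["git", "-C", PROVENANCE_REPO, "add", "-A"])
    subprocess.check_call(["git", "-C", PROVENANCE_REPO, "commit", "--allow-empty",
                           "-m", "config snapshot for successful deployment %s" % deployment_id])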
3.4 Application provenance API

In subsection 2.4 we identified the provenance requirements from the point of view of an application. For application developers to be able to extract provenance information about their applications, we propose to define an object model and an API in Solum. Through this API it will be possible for application developers to extract information such as the git commit hash used to deploy their application's code to different environments (testing, staging, production), the version of docker used to build the application's docker containers, and the versions of other OpenStack services that are being used.
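As a sketch of what such an API could return, the following shows a hypothetical endpoint handler and response shape; the route, field names, and placeholder values are illustrative and are not Solum's actual API.

import json


def get_app_provenance(app_id, environment):
    """Hypothetical handler for e.g. GET /v1/apps/{app_id}/provenance?environment=staging."""
    # In a real implementation these values would be read from Solum's data
    # store; here they are placeholders only.
    return {
        "app_id": app_id,
        "environment": environment,        # testing, staging, or production
        "source_commit": "<git commit hash deployed to this environment>",
        "docker_version": "<docker version used to build the containers>",
        "service_versions": {              # versions of dependent OpenStack services
            "keystone": "<version>",
            "glance": "<version>",
            "heat": "<version>",
        },
    }


if __name__ == "__main__":
    print(json.dumps(get_app_provenance("example-app", "staging"), indent=2))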

4 Evaluation

We implemented the command list pruning algorithm and used it to find the provenance of installing two different software packages (docker and tomcat). The implementation, the command histories that we used, and the verification scripts that we defined are available at [17]. Aided by some changes in the way we operated on the host, the algorithm was able to successfully generate provenance for both. We performed these experiments on a Ubuntu 14.04 host with docker version 1.6.

Here are our observations from this experimentation. When building a docker container with a subset of commands from the command history, docker build may fail for several reasons. Specifically, we observed that this may happen if (a) the command is not installed on the container, or (b) the command is a non-existent command (a typo, or a valid command but not on the path, or a service name instead of a command). We experienced the first case when we included curl as part of the command list to be built into a docker container (curl was part of our execution history). The docker build failed in this case because curl was not installed on the current container layer or on any of its parent layers. This led to the realization that a command that works on a host may not work on the container unless it is explicitly installed. This led to the conclusion that when including a command c to be tried within a docker build, we may have to first include another command i_c which installs c on the container. We experienced the second case when we included tomcat as part of the command list (executing the command tomcat was part of our execution history). The docker build failed in this case because tomcat is not a valid Ubuntu command. This led to the conclusion that before including a command in the Dockerfile, we should run it on the host and check if the output contains a well-known string, such as "command not found". If so, we should not include it in the Dockerfile.

Certain commands may need one or more flags to be tried on the command line. We observed this when installing docker. One of the commands to install docker is to wget the docker package. This command worked fine on our host but failed in docker build. Upon investigation we realized that we had to use the --no-check-certificate flag for the command to work successfully from within the container. The general case of finding out which flags are missing from a command's execution might be difficult to figure out. But if the command history contains commands with different flags that are clustered together, which may happen if the user is trying out different flags, then we can hope to find out the correct flags as follows. We could maintain a similar-command map, which maps each command to the variants of that command that appear with different flags. We try each variant as part of the docker build and keep the one which works. We also observed that piped commands, such as wget -qO- https://get.docker.com/ | sh, did not seem to work as expected when specified within the RUN command within a Dockerfile. To address this, we ended up breaking such a command into its constituent parts and executing each sub-command separately.

Some other observations and conclusions were as follows. The navigation commands (cd/pushd/popd) need to be combined with their immediate successor (for cd/pushd) or predecessor (for popd) commands and included as part of a single docker RUN command. Otherwise they won't take effect. For commands that modify a file, we can add the modified file from the host into the container at the appropriate location. Another part of provenance is all the packages that were installed on the host when the software was successfully installed. On Ubuntu this information can be found using the dpkg command (dpkg -l).
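The sketch below illustrates two of these adjustments, dropping commands that do not resolve on the host and folding cd/pushd into the RUN line of the command that follows them; it uses a PATH lookup in place of actually executing the command and checking for "command not found", and it is not the code available at [17].

import shutil


def exists_on_host(cmd):
    """Approximate the 'command not found' pre-check described above by testing
    whether the command name resolves on the host's PATH."""
    parts = cmd.split()
    return bool(parts) and shutil.which(parts[0]) is not None


def to_run_lines(commands):
    """Emit Dockerfile RUN lines, chaining cd/pushd with the command that follows
    so that the directory change takes effect (popd chaining with its predecessor
    is omitted here for brevity)."""
    lines, pending = [], []
    for cmd in commands:
        parts = cmd.split()
        if parts and parts[0] in ("cd", "pushd"):
            pending.append(cmd)            # defer until the next command
        elif exists_on_host(cmd):
            lines.append("RUN " + " && ".join(pending + [cmd]))
            pending = []
    return lines


# Example: the navigation command is folded in, and the non-existent command
# "tomcat" is dropped before the Dockerfile is generated.
print("\n".join(to_run_lines(["cd /tmp", "apt-get update", "tomcat",
                              "apt-get install -y tomcat7"])))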
More experimentation is needed to determine the efficacy of this approach. Some theoretically interesting questions in this regard are: whether the algorithm is sound (only correct provenance is generated, if it exists), and whether it is complete (provenance is always generated if a software package was installed successfully). As mentioned by one of the reviewers, the output of the algorithm can be considered a hint towards the steps for successfully installing a software package. Even this could be helpful in determining the steps for installing the software on different hosts.

5 Related Work

Provenance issues in cloud computing have been considered in [5, 20]. In [5] provenance properties are defined and protocols are presented that use cloud-based storage for storing the provenance information. In [20], the importance of linking provenance information across different layers of a cloud infrastructure is stressed. The provenance information related to inter-service dependencies identified here is similar to the provenance property of multi-object causal ordering defined in [5]. Similar to [20], we argue for collecting provenance information that will ease designing, building, and deploying PaaS systems. In [21] a methodology based on tracking system calls is presented to enable repeatable execution of commands on different target platforms. The command list provenance approach presented here complements that by addressing the issue of how to repeat installation of a command on different hosts. Checking the validity of configuration parameters for dependent services is an important issue [22]. Collecting provenance information for such parameters, as presented here, is a first step towards enabling such checks.

6 Conclusion

In this paper we have identified provenance issues in designing, building, and deploying PaaS systems and the applications deployed by them. We have proposed mechanisms to address these issues through collection of provenance information. Our next step is to implement these mechanisms and perform a detailed evaluation of their effectiveness.

7 Acknowledgments

The author would like to thank the anonymous reviewers for providing helpful comments and feedback on the paper.

References

[1] https://www.heroku.com/.
[2] https://cloud.google.com/appengine/.
[3] https://www.openshift.com/.
[4] https://github.com/stackforge/solum.

[5] Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer. Provenance for the cloud. In Proceedings of the 8th USENIX Conference on File and Storage Technologies, FAST'10, pages 15-14, Berkeley, CA, USA, 2010. USENIX Association.
[6] http://www.openstack.org/.
[7] https://www.docker.com/.
[8] https://github.com/docker/docker-registry.
[9] http://docs.openstack.org/developer/keystone/.
[10] http://docs.openstack.org/developer/glance/.
[11] http://docs.openstack.org/developer/heat/.
[12] https://wiki.openstack.org/wiki/Barbican.
[13] http://docs.openstack.org/developer/tempest/.
[14] http://docs.openstack.org/developer/devstack/.
[15] http://docs.openstack.org/developer/nova/.
[16] https://docs.docker.com/reference/builder/.
[17] https://github.com/devdattakulkarni/cmdlist-provenance.
[18] http://ci.openstack.org/zuul/.
[19] https://bugs.launchpad.net/solum/+bug/1390246.
[20] Imad M. Abbadi and John Lyle. Challenges for provenance in cloud computing. In 3rd USENIX Workshop on the Theory and Practice of Provenance, 2011.
[21] Philip J. Guo. CDE: Run any Linux application on-demand without installation. In Proceedings of the 25th International Conference on Large Installation System Administration, LISA'11, Berkeley, CA, USA, 2011. USENIX Association.
[22] Peng Huang, William J. Bolosky, Abhishek Singh, and Yuanyuan Zhou. ConfValley: A systematic configuration validation framework for cloud services. In Proceedings of the European Conference on Computer Systems (EuroSys'15), 2015.
