Configuration and Orchestration Techniques for Federated Cloud Resources

Denis Weerasiri

A thesis in fulfilment of the requirements for the degree of Doctor of Philosophy

THE UNIVERSITY OF NEW SOUTH WALES

SYDNEY · AUSTRALIA

School of Computer Science and Engineering
Faculty of Engineering

A dissertation submitted in fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science and Engineering

Supervisor: Prof. Boualem Benatallah

February 2016

THE UNIVERSITY OF NEW SOUTH WALES Thesis/Dissertation Sheet

Surname or Family name: WEERASIRI

First name: WICKRAMA ARACHCHILLAGE DENIS Other name/s: DHANANJAYA

Abbreviation for degree as given in the University calendar: PhD

School: School of Computer Science and Engineering Faculty: Faculty of Engineering

Title: Configuration and Orchestration Techniques for Federated Cloud Resources

Abstract (350 words maximum):

Cloud resources are central to the operation of modern software-driven organizations. Due to the exponential growth of the number and diversity of cloud resources, organizations are inspired to build and deploy services, processes and applications by intermixing a set of best-of-breed cloud resources. It is estimated that nearly half of all large enterprises will have such intermixed (or, as we call them, federated) cloud resource deployments by the end of 2017.

Federated cloud resources are a special category of composite resources that draw component resources from one or more public clouds and one or more private clouds. In this dissertation, we investigate the problems of configuration and orchestration of federated cloud resources. Addressing this problem is challenging, as the component resources of federated cloud resources are distributed across multiple heterogeneous, autonomous and evolving cloud providers. Moreover, cloud-based applications may possess dynamic resource requirements during different phases of their life-cycle. Consequently, designing interoperable, portable and effective cloud resource configuration and orchestration techniques that cope with both heterogeneous and dynamic environments remains a deeply challenging problem.

To address these challenges, we first propose a taxonomy framework for cloud resource consumers to improve awareness of the fundamental building blocks within the domain of cloud resource orchestration. Our taxonomy framework allows consumers to efficiently explore, understand, compare and contrast, and thereby wisely and rationally evaluate, cloud resource orchestration techniques based on consumers' requirements. We then present model-driven and process-driven techniques to describe, reuse and orchestrate elementary and federated cloud resource configurations. In conjunction, we also propose a pluggable architecture to translate these high-level models into resource descriptions and management rules which can be interpreted by external configuration and orchestration tools such as Juju and Docker. We next propose a rule-based configuration and orchestration knowledge recommender service which empowers incremental acquisition, curation, and recommendation of knowledge based on users' contexts. Finally, we introduce a language for effective comprehension and visualization of cloud resource orchestration concerns. This language allows users to visually represent, monitor and control cloud resource configurations. All the aforementioned proposals have been implemented as tools and experimentally validated based on real-world user scenarios and user studies.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).

Signature ................ Date ................

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.


ORIGINALITY STATEMENT

I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

Signed ................ Date ................

COPYRIGHT STATEMENT

I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied / will apply for a partial restriction of the digital copy of my thesis or dissertation.

Signed ................ Date ................

AUTHENTICITY STATEMENT

I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.

Signed ................ Date ................

ACKNOWLEDGEMENTS

If I have seen further it is by standing on the shoulders of Giants - Sir Isaac Newton

First and foremost, I would like to express my sincere appreciation and deep gratitude to my supervisor, Scientia Professor Boualem Benatallah, for his exceptional support, encouragement and guidance during the last three years and five months. Boualem taught me how to do high-quality research and helped me think creatively. His truly incredible academic excellence and beautiful mind have made him a constant oasis of ideas and passions in science, which has inspired and enriched my growth as a student, a researcher and a scientist. Moreover, I thank him for providing me with the opportunity to work with a talented team of researchers.

I gratefully dedicate this dissertation to my father Alfred, my mother Chandra, my wife Nipuni and other members of my family, for their love, patience, and understanding. They allowed me to spend most of the time on this thesis. They are my source of strength and without their countless support this dissertation would have never been started.

I gratefully thank my co-authors, Dr. Moshe Chai Barukh and Dr. Amin Beheshti for their enjoyable collaborations. I would like to thank Dr. Moshe especially for his help in reviewing and editing our publications.

My sincere thanks go to everyone in the Service-Oriented Computing (SOC) group at UNSW, especially Professor Fethi Rabhi (my co-supervisor), Srikumar Venugopal, Mortada Al-Banna, John Sun, George Ajam, Mohammed Allahbaksh and Helen Paik for their friendship, support and helpful comments. In addition, I would like to thank the Ph.D. review panels and the anonymous reviewers who provided suggestions and helpful feedback on my publications.

Working at the School of Computer Science and Engineering at the University of New South Wales (UNSW) has been a great pleasure and a wonderful privilege. I acknowledge Smart Services CRC, the University of New South Wales, and the Faculty of Engineering at UNSW for providing scholarships to pursue doctoral studies. In addition, I would like to thank the administrative and technical staff members of the School of Computer Science and Engineering at UNSW, who have been kind enough to advise and help in their respective roles.

Denis Weerasiri
Sydney, Australia
February 2016

To my family for their love, patience, and understanding

ABSTRACT

Cloud resources are central to the operation of modern software-driven organizations. Due to the exponential growth of the number and diversity of cloud resources, organizations are inspired to build and deploy services, processes and applications by intermixing a set of best-of-breed cloud resources. It is estimated that nearly half of all large enterprises will have such intermixed (or, as we call them, federated) cloud resource deployments by the end of 2017.

Federated cloud resources are a special category of composite resources that draw component resources from one or more public clouds and one or more private clouds, combined at the behest of their users. In this dissertation, we investigate the problems of configuration and orchestration of federated cloud resources. Addressing this problem is challenging, as the component resources of federated cloud resources are distributed across multiple heterogeneous, autonomous and evolving cloud providers. Moreover, cloud-based applications may possess dynamic resource requirements during different phases of their life-cycle. Consequently, designing interoperable, portable and effective cloud resource configuration and orchestration techniques that cope with both heterogeneous and dynamic environments remains a deeply challenging problem.

To address these challenges, we first propose a taxonomy framework for cloud resource consumers to improve awareness of the fundamental building blocks within the domain of cloud resource orchestration. Our taxonomy framework allows consumers to efficiently explore, understand, compare and contrast, and thereby wisely and rationally evaluate, cloud resource orchestration techniques based on consumers' requirements. We then present model-driven and process-driven techniques to describe, reuse and orchestrate elementary and federated cloud resource configurations. In conjunction, we also propose a pluggable architecture to translate these high-level models into resource descriptions and management rules which can be interpreted by external configuration and orchestration tools such as Juju and Docker. We next propose a rule-based configuration and orchestration knowledge recommender service which empowers incremental acquisition, curation, and recommendation of knowledge based on users' contexts. Finally, we introduce a language for effective comprehension and visualization of cloud resource orchestration concerns. This language allows users to visually represent, monitor and control cloud resource configurations. All the aforementioned proposals have been implemented as tools and experimentally validated based on real-world user scenarios and user studies.

PUBLICATIONS

• Weerasiri D., Benatallah B., and Barukh M.C., “Process-driven Configuration of Federated Cloud Resources”, 20th International Conference on Database Systems for Advanced Applications (DASFAA 2015), volume 9049 of Lecture Notes in Computer Science, pages 334-350. Springer-Verlag Berlin Heidelberg, 2015.

• Weerasiri D., and Benatallah B., “Unified Representation and Reuse of Federated Cloud Resources Configuration Knowledge”, 19th International Conference on Enterprise Distributed Object Computing (EDOC), pages 142-150. IEEE, 2015.

• Weerasiri D., Barukh M.C., Benatallah B., and Jian C., “CloudMap: A Visual Notation for Representing & Managing Cloud Resources”, 28th International Conference on Advanced Information Systems Engineering (CAiSE 2016), volume 9694 of Lecture Notes in Computer Science, pages 427-443. Springer-Verlag Berlin Heidelberg, 2016.

• Weerasiri D., Benatallah B., Barukh M.C., and Jian C., “A Model-Driven Framework for Interoperable Cloud Resources Management”, 14th International Conference on Service Oriented Computing (ICSOC 2016), volume 9936 of Lecture Notes in Computer Science. Springer-Verlag Berlin Heidelberg, 2016.

• Weerasiri D., Barukh M.C., Benatallah B., and Sheng Q.Z., “A Taxonomy and Survey of Cloud Resource Orchestration Techniques”, ACM Computing Surveys (CSUR), 2016 (submitted).

• Weerasiri D., Benatallah B., and Yang J., “Unified Representation and Reuse of Federated Cloud Resources Configuration Knowledge”, unsw-cse-tr-201411, University of New South Wales, 2014.

• Weerasiri D., Benatallah B., Barukh M.C., and Jian C., “A Model-Driven Framework for Interoperable Cloud Resources Management”, unsw-cse-tr-201514, University of New South Wales, 2015.

Contents

1 Introduction ...... 1

1.1 Preliminaries ...... 3

1.1.1 Cloud Resources ...... 3

Federated Cloud Resources ...... 5

1.1.2 Cloud Resource Lifecycle ...... 6

1.2 Key Research Issues ...... 8

1.2.1 Interoperability of Cloud Resource Orchestration Techniques ...... 9

1.2.2 Dynamic Re-configuration of Cloud Resources ...... 10

1.2.3 Knowledge Reuse for Cloud Resource Orchestration ...... 10

1.2.4 Understanding Cloud Resource Orchestration Concerns . . . . 11

1.3 State of the Art ...... 12

1.4 Contributions Overview ...... 12

1.5 Dissertation Organization ...... 15

2 A Taxonomy and Survey of Cloud Resource Orchestration Techniques and Tools 18

2.1 Introduction ...... 18

2.2 Cloud Resource Orchestration ...... 19

2.3 Cloud Resource Orchestration Taxonomy ...... 22


2.4 Resources ...... 25

2.4.1 Resource Types ...... 25

2.4.2 Resource Entity Model ...... 26

Resource Entities ...... 27

Resource Relationships ...... 28

Constraints ...... 31

2.4.3 Resource Access Methods ...... 33

Command Line Interfaces (CLIs) ...... 33

Software Development Kits (SDKs) ...... 34

Application Programming Interfaces (APIs) ...... 34

Graphical User Interfaces (GUIs) ...... 35

2.4.4 Resource Representation Notation ...... 35

Textual notations ...... 36

Visual notations ...... 37

Hybrid notation ...... 37

2.5 Resource Orchestration Capabilities ...... 38

2.5.1 Primitive Actions ...... 38

2.5.2 Orchestration Strategies ...... 42

Script-based Orchestration Strategies ...... 42

Reactive Orchestration Strategies ...... 43

State-based Orchestration Strategies ...... 44

Proactive Orchestration Strategies ...... 45

2.5.3 Language Paradigm ...... 46

Imperative Programming ...... 47

Declarative Programming ...... 47

2.5.4 Theoretical Foundation ...... 49

2.5.5 Cross-cutting Concerns ...... 50

2.6 User Types ...... 50

2.6.1 DevOps ...... 50

2.6.2 Application Developers ...... 51

2.6.3 Domain Experts ...... 51

2.7 Runtime Environment ...... 51

2.7.1 Technique ...... 52

2.7.2 Execution Model ...... 53

2.7.3 Target Environment ...... 54

Public Cloud ...... 54

Private Cloud ...... 55

Federated Cloud ...... 55

2.8 Knowledge Reuse ...... 56

2.8.1 Reuse Artifact ...... 56

Concrete and Template Resource Descriptions ...... 56

Resource Snapshots ...... 57

Miscellaneous ...... 57

2.8.2 Reuse Techniques ...... 58

Search Indexes ...... 58

Recommendations ...... 58

Community-driven Techniques ...... 58

2.9 Applying the Taxonomy: Evaluation of Cloud Resource Orchestration Techniques ...... 60

2.9.1 Selection Process ...... 60

2.9.2 Resources and User Type ...... 61

2.9.3 Resource Orchestration Capabilities ...... 64

2.9.4 Knowledge Reuse ...... 66

2.9.5 Runtime Environment ...... 68

2.10 Conclusion ...... 69

3 A Model-Driven Framework for Interoperable Cloud Resources Management 71

3.1 Introduction ...... 71

3.2 Limitations of Existing C&M Techniques ...... 74

3.3 Cloud Resources Management Architecture: An overview ...... 77

3.4 Extracting Domain-specific Models from Tool-specific Resource Artifacts ...... 80

3.4.1 An Embryonic Cloud Resource Configuration & Management Model ...... 81

3.4.2 Docker-based Domain-specific Model ...... 83

3.4.3 Juju-based Domain-specific Model ...... 89

3.4.4 Representing Federated cloud resources using Domain-specific Models ...... 93

3.4.5 Connectors ...... 96

Using Docker Connector to generate native resource management artifacts from Domain-specific Models: ...... 98

3.5 Implementation ...... 99

3.5.1 Connector Curators ...... 102

3.5.2 DSM Curators ...... 102

3.5.3 Event Management System ...... 105

3.5.4 Rule Processor ...... 107

3.5.5 Use-case scenario ...... 109

3.6 Evaluation ...... 110

3.6.1 Results, Analysis and Discussion ...... 111

Incurred cost of implementing Domain-specific Models and Connectors: ...... 112

3.7 Related Work ...... 113

3.8 Conclusion and Future Work ...... 116

4 Process-driven Configuration of Federated Cloud Resources 117

4.1 Introduction ...... 117

4.2 Modeling Cloud Resource Configuration Tasks ...... 119

4.2.1 Motivating Scenario ...... 120

4.2.2 Cloud Resource Deployment Tasks ...... 121

4.2.3 Cloud Resource Re-configuration Policies ...... 123

4.3 Translating CRD-Task and CRR-Policy into BPMN ...... 123

4.3.1 Translating CRD-Tasks ...... 124

4.3.2 Translating CRR-Policies ...... 124

4.4 Implementation and Evaluation ...... 126

4.4.1 Evaluation ...... 126

4.4.2 Analysis and Discussion ...... 129

4.5 Related Work ...... 131

4.6 Conclusions & Future Work ...... 132

5 A Recommender Service for Knowledge Reuse in Cloud Resource Configurations 134

5.1 Introduction ...... 134

5.2 Knowledge Base for Reuse of Resource Configurations ...... 136

5.2.1 Recommendation Rules ...... 137

Contexts of Rules ...... 137

Conclusions of Rules ...... 138

5.2.2 Reuse of Configuration Knowledge ...... 139

5.2.3 Knowledge Acquisition Process ...... 141

5.3 Implementation and Evaluation ...... 143

5.3.1 Experiment ...... 143

5.3.2 Results and Analysis ...... 145

5.4 Related Work and Discussion ...... 146

5.5 Conclusion and Future Work ...... 148

6 CloudMap: A Visual Notation for Representing & Managing Cloud Resources 149

6.1 Introduction ...... 149

6.2 Motivating Example ...... 151

6.3 CloudMap: Visual Notation for Cloud Resource Management . . . . . 152

6.3.1 Structural Model: Entities ...... 154

1. Container ...... 154

2. Hosting Machine ...... 155

3. Cluster ...... 156

4. Application ...... 157

5. Image ...... 158

1. Hosting Machine Registry ...... 159

2. Application Registry ...... 160

3. Image Registry ...... 161

6.3.2 Navigation Model: Links ...... 162

6.3.3 Badges: Probes and Control-Actions ...... 165

Probes ...... 167

Control Actions ...... 167

6.3.4 Visualization Patterns for Cloud Resource Configurations . . . 168

Image Map...... 168

Application Map...... 173

Hosting-Machine Map...... 176

6.4 Implementation ...... 176

6.4.1 Mind-Map Generation ...... 178

6.4.2 Activity/Control Wall ...... 178

Widgets: ...... 180

Command Line Interface: ...... 181

6.5 Evaluation ...... 181

6.5.1 Experimental Setup ...... 181

6.5.2 Questionnaire ...... 182

6.5.3 Participant Selection & Grouping ...... 182

6.5.4 Experiment Results & Analysis ...... 183

Evaluation of H1 and H2...... 183

Evaluation of H3...... 184

6.5.5 Discussion ...... 184

6.6 Related Work ...... 186

6.7 Conclusion and Future Work ...... 188

7 Conclusions and Future Work 190

7.1 Concluding Remarks ...... 190

7.2 Future Directions ...... 193

Bibliography 197

Appendix 217

A Evaluated Orchestration Tools and Research Initiatives in Chapter 2 218

B List of References Organized by Taxonomy Dimensions in Chapter 2 220

B.1 Resources and User Type ...... 220

B.2 Resource Orchestration Capabilities ...... 228

B.3 Knowledge Reuse ...... 234

B.4 Runtime Environment ...... 236

C Java-based implementation of a Connector 241

D Evaluation Questionnaire in Chapter 6 246

D.1 Background Questions ...... 246

D.2 Functionality Questions ...... 246

D.3 Insight Questions ...... 247

D.4 Improvement Questions ...... 247

List of Figures

1.1 Categorization of Cloud Resources ...... 4

1.2 Cloud Resource Deployment Models ...... 4

1.3 Cloud Resource Configuration of a 2-tier Application ...... 6

1.4 The life-cycle of Cloud resources ...... 8

2.1 Reference Architecture for Cloud Resource Orchestration ...... 20

2.2 A Taxonomy in Cloud Resource Orchestration ...... 24

2.3 Resource Entities and Relationships of a Web application ...... 29

2.4 Communication Relationship between a Web Application and Database 29

2.5 Apache-Tomcat-Server depends on SSL-Library ...... 30

2.6 Inheritance Relationships between Docker Images ...... 30

2.7 Containment Relationships within an OpsWorks Stack ...... 31

2.8 Web application server and Log processor, hosted in one VM . . . . . 31

2.9 CA-Applogic GUI ...... 35

2.10 Visual notation in CA-Applogic for a Web application ...... 37

2.11 State transitions of the cloud resource life cycle ...... 38

2.12 A Composite Resource Infrastructure for a Web application . . . . . 39

2.13 Orchestration Workflow (in black and bold) for Scaling-up an Apache Tomcat Application Engine Cluster ...... 42


2.14 OS-level (left) vs. Container Manager (right) ...... 53

2.15 Sub-dimensions of Knowledge Reuse ...... 56

3.1 Components and relationships of a Node.js Web application stack . . 76

3.2 System Overview ...... 77

3.3 UML Class diagram for Resource Description Model ...... 82

3.4 UML Class diagram for Resource Management Model ...... 83

3.5 Domain-specific Model for Docker ...... 85

3.6 Juju based Domain-specific Model ...... 90

3.7 Technical translation of Domain-specific Models (per each Image instance) ...... 100

3.8 Technical translation of a high-level action into low-level API calls . . 101

3.9 Internal Architecture ...... 101

3.10 Entity-Schema Editor ...... 103

3.11 Relationship-Schema Editor ...... 103

3.12 Container and Image Entity in Docker-based Domain-specific Model . 104

3.13 Instantiation Relationship in Docker-based Domain-specific Model .. 104

3.14 Results (Time, grouped by expertise); t-test Results; and Lines-of-Code ...... 112

4.1 Deployment plan for a web application in an Apache Tomcat cluster . 121

4.2 Modeling application updates within the deployment plan in Figure 4.1 ...... 121

4.3 Modeling application updates in Figure 4.2 using CRR-Policy . . . . 122

4.4 Conceptual model of CRD-Task and CRR-Policy ...... 122

4.5 A CRD-Task and its BPMN generation (within the dotted rectangle) 124

4.6 A CRD-Task with two CRR-Policies and its BPMN generation (within the dotted rectangle) ...... 125

4.7 System Overview ...... 127

4.8 Extended BPMN Editor ...... 128

5.1 Rule based Recommender System Overview ...... 137

5.2 UML class diagram of Recommendation Rules ...... 139

5.3 Example of Recommendation Rule Trees ...... 140

5.4 Internal Architecture ...... 144

5.5 Accuracy of recommendations vs. knowledge-base size ...... 145

6.1 State transitions of the cloud resource life cycle ...... 151

6.2 Resource diagram of the typical 3-tier (BPEL-based) Application . . 152

6.3 CloudMap Visual Notations ...... 153

6.4 CloudMap Syntactical Schema of Constructs ...... 153

6.5 Visual construct of the Container ...... 155

6.6 Visual construct of the Hosting Machine ...... 156

6.7 Visual construct of the Cluster ...... 157

6.8 Visual construct of the Application ...... 158

6.9 Visual construct of the Image ...... 159

6.10 Visual construct of the Hosting Machine Registry ...... 160

6.11 Visual construct of the Application Registry ...... 161

6.12 Visual construct of the Image Registry ...... 162

6.13 Visual construct of the Communication Link ...... 162

6.14 Visual construct of the Containment Link ...... 163

6.15 Visual construct of the Hosting Link ...... 164

6.16 Visual construct of the Dependency Link ...... 164

6.17 Visual construct of the Instantiation Link ...... 165

6.18 Visual representation of a probe ...... 166

6.19 Visual representation of a widget ...... 167

6.20 Image Map ...... 168

6.21 Application Map ...... 175

6.22 Hosting Machine Map ...... 177

6.23 CloudMap System Architecture ...... 179

6.24 Time Results (grouped by expertise) to complete the tasks; and below t-test Results for H1 and H2 ...... 185

6.25 Rate of usability of the main features of CloudMap ...... 186

List of Tables

2.1 The Resource and User Type Dimensions of the Selected Platforms . 63

2.2 The Resource Orchestration Capabilities Dimension of the Selected Platforms ...... 65

2.3 The Knowledge Reuse Dimension of the Selected Platforms ...... 67

2.4 The Runtime Environment Dimension of the Selected Platforms . . . 69

3.1 Heterogeneous configuration and management interfaces ...... 76

3.2 Operations of Connector interface ...... 80

4.1 Results of the experiment ...... 130

B.1 Representative literature references for the Resources and User Types dimensions ...... 221

B.2 Representative literature references for the Resource Orchestration Capabilities dimension ...... 228

B.3 Representative literature references for the Knowledge Reuse dimension ...... 234

B.4 Representative literature references for the Runtime Environment dimension ...... 236

Chapter 1

Introduction

Since the mainframe era, information technologies have evolved through several architectural paradigms: client-server, Web 1.0, Web 2.0 and, most recently, mobile computing, social computing, big data, the Internet of Things and cloud computing. The many benefits of cloud computing include virtualization capabilities and outsourcing strategies. It is estimated that by 2016 the growth in cloud computing will consume the bulk of IT spend, and that nearly half of all large enterprises will have hybrid cloud service deployments by the end of 2017 [83]. Cloud services are now firmly recognised as engines of innovation and online service-enabled business transformation.

Cloud computing is designed upon service-orientation principles which convert application, hardware and software infrastructures into standardized and dynamically scalable resources. These resources are available on demand for consumers as cloud services. Similar to Web services, cloud services provide an abstraction layer that shifts the focus from infrastructure and operations to cloud services and application management [168, 159]. Cloud services, which are accessible as Web services, give cloud consumers the illusion of infinite resource pools and pay-as-needed or pay-per-usage costing schemes instead of requiring upfront expenditures on resources which may never be consumed optimally. Certainly, due to the aforementioned advantages, cloud computing has been rapidly adopted by both government and non-government organizations.


Cloud services can be categorized into three main layers of service offerings: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS), where consumers of cloud services are able to focus on what a service offers them instead of knowing the internal details of the system that delivers the service. The SaaS layer offers software applications (e.g., customer relationship management, project management, social networks, Web conferencing, healthcare), designed for end users and delivered over the Internet. The PaaS layer offers a set of software development and runtime platforms (e.g., databases, middleware, content delivery networks) which make the programming and deployment of SaaS applications efficient. Finally, the IaaS layer supplies processing power, networking, storage and hosting environments which empower both the PaaS and SaaS layers [168, 177, 207].
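The "each layer builds on the one below" relationship described above can be sketched as a small lookup table. The layer names and example offerings come from the text; the table shape and the helper function are illustrative assumptions, not part of the thesis.

```python
# Illustrative sketch of the three cloud service layers. Each entry maps a
# layer to (example offerings, the layer it builds on); None marks the base.
LAYERS = {
    "SaaS": (["CRM", "web conferencing", "healthcare"], "PaaS"),
    "PaaS": (["databases", "middleware", "content delivery networks"], "IaaS"),
    "IaaS": (["processing power", "networking", "storage"], None),
}

def stack_below(layer):
    """Return the chain of layers a given layer depends on, top-down."""
    chain = []
    below = LAYERS[layer][1]
    while below is not None:
        chain.append(below)
        below = LAYERS[below][1]
    return chain

print(stack_below("SaaS"))  # ['PaaS', 'IaaS']
```

A SaaS offering thus transitively depends on both lower layers, while IaaS, the most basic form of cloud resources, depends on none.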

Cloud computing is evolving in the form of both public clouds (deployed by IT organisations) and private clouds (usually deployed behind a company or user group enterprise firewall) (refer to Figure 1.2). Public cloud services typically package resources and lease them to customers. A third option, a hybrid or federated cloud [21, 200], draws computing resources from one or more public clouds and one or more private clouds, combined at the behest of its users.

However, there are fundamental and technical issues to be addressed in order to facilitate the development and management of cloud-based solutions [168, 177, 135, 169, 200]. In particular, heterogeneity among cloud resource management tools and the diversity of cloud resource providers pose technical challenges in implementing and managing interoperable and portable cloud solutions. Existing cloud resource management techniques generally do not provide all the features required to manage cloud resources during each phase (e.g., selection, configuration, deployment, monitoring, controlling) of the life-cycle. End-to-end modeling and management of cloud resources therefore require organizations to utilize multiple management techniques in conjunction. However, due to the aforementioned heterogeneity among them, organizations are forced to implement ad-hoc and fragmented solutions which are neither easily comprehensible, cost-effective nor reliable. Handling variations in resource requirements and environment changes, and maintaining quality of service, security and privacy, make cloud resource management an extremely complex task. Along with the growing scale of cloud services in quantity, quality and diversity, organizations must gain knowledge, skills and understanding of cloud services and their management aspects to devise a successful approach that extracts maximum benefit from cloud-based solutions. To address these challenges, we have positioned this thesis to facilitate the federation of cloud resource configuration and orchestration techniques.

The rest of this chapter is organised as follows: In Section 1.1, a brief preliminary background is presented covering the relevant concepts and dimensions of the research landscape in which this thesis is positioned. In Section 1.2, we outline the key research issues that have been tackled. In Section 1.3, we summarize the related work in the domain of cloud resource orchestration. In Section 1.4, we summarize the main contributions; and finally, in Section 1.5, we present the organisational structure of this thesis.

1.1 Preliminaries

This section gives a brief introduction to the main topics of this dissertation, namely Cloud Resources and Federated Cloud Resources.

1.1.1 Cloud Resources

Cloud computing is commonly understood as the overarching terminology referring to the spectrum of “resources” offered “as-a-service”. As illustrated in Figure 1.1, cloud providers typically offer three different layers of resources: Software (e.g., healthcare, email clients, CRM, telecommunication), Platform (e.g., programming language runtimes, databases, web servers) and Infrastructure (e.g., virtual machines, storage, networking). The Infrastructure layer is the most basic form of cloud resources, and each higher layer abstracts from the details of the lower layer. Infrastructure-layer resources power the platform-layer resources, and Software-layer resources are usually built upon the underlying Platform-layer and Infrastructure-layer resources.

Figure 1.1: Categorization of Cloud Resources

Figure 1.2: Cloud Resource Deployment Models

Based on their structural attributes, cloud resources can be categorized into two main types: (1) Atomic resources and (2) Composite resources.

• Atomic Resource - An Atomic Resource represents a cloud resource that does not rely on any other resource. In other words, an Atomic Resource cannot be divided into component resources. For example, a VM with 4GB RAM and 4GHz processing power can be modeled as an Atomic Resource with two Attributes for the memory and processing power. Atomic Resources act as the primary building blocks of Composite Resources.

• Composite Resource - A Composite Resource is an umbrella structure that brings together other Atomic and Composite Resources to model a configuration that is composed of multiple cloud resources. An example of a Composite Resource would be an E-Learning platform that consists of an artifact management service and a student identity management service. The cloud resources brought together by a Composite Resource are referred to as its component resources.
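The distinction between the two resource types can be sketched as a minimal object model (an illustrative sketch only; the class and field names are ours, not those of any particular orchestration tool):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

@dataclass
class AtomicResource:
    """An indivisible cloud resource, e.g. a single VM."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)

@dataclass
class CompositeResource:
    """An umbrella structure grouping atomic and composite resources."""
    name: str
    components: List[Union[AtomicResource, "CompositeResource"]] = field(default_factory=list)

    def atomic_resources(self) -> List[AtomicResource]:
        """Recursively flatten the composition into its atomic building blocks."""
        flat: List[AtomicResource] = []
        for c in self.components:
            if isinstance(c, AtomicResource):
                flat.append(c)
            else:
                flat.extend(c.atomic_resources())
        return flat

# The VM example from the text: 4GB RAM, 4GHz processing power.
vm = AtomicResource("vm-1", {"memory": "4GB", "cpu": "4GHz"})

# The E-Learning platform example: two component services.
elearning = CompositeResource("e-learning", [
    CompositeResource("artifact-management", [vm]),
    AtomicResource("identity-management"),
])
```

Flattening a Composite Resource into its atomic components is the kind of traversal an orchestration system must perform when it provisions each building block individually.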

Federated Cloud Resources

Cloud resources can be deployed in the form of both public clouds (operated by third-party providers) and private clouds (usually deployed behind a company firewall). Due to no or minimal capital expenditure, public cloud environments offer cost-effective cloud resources compared to private cloud environments. Private cloud environments require higher capital expenditure, which results in cloud resources with higher unit costs. On the other hand, private cloud environments are more secure and provide higher orchestration flexibility than public cloud environments. A third option, a hybrid or federated cloud, is now emerging, where computing resources drawn from a subset of private and public clouds are combined to leverage the advantages of multiple cloud providers.

For example, consider an e-learning service, which consists of an artifact management service and an identity management service to securely maintain assignments and lecture slide decks. The artifact management service can be deployed in a low-cost and scalable public cloud (e.g., AWS S3). On the other hand, the identity management service could be deployed within a private cloud in order to secure the personal information of students and lecturers.

An amalgam of resources across multiple clouds motivates consumers to implement applications with higher scalability and wider resource availability than an individual cloud can offer. Furthermore, consumers employ federated clouds to build interoperable applications that do not depend on a particular cloud provider. Such interoperable applications tend to achieve higher availability and perform better in disaster recovery situations.

1.1.2 Cloud Resource Lifecycle

In much the same way practitioners have abstracted lifecycle models, for example for software engineering artifacts (e.g., waterfall) [124] or Business Process Management (BPM) [71], we propose a similar lifecycle model suited for cloud resources, as elucidated below. This model aims to enable a better understanding of the underlying support systems.

1. Selection. Consumers first select cloud resources that satisfy their requirements. Both functional (e.g. storage capacity, number of CPUs) and nonfunctional (e.g. cost, availability) attributes are considered to determine the optimal configurations. For a 2-tiered application, two VMs are selected as shown in Figure 1.3. A VM with a large number of CPU cores is required for the application tier, while a VM with high storage capacity is required for the database tier.

2. Configuration. This phase involves: (i) specifying the type of required resources; (ii) providing values for resource description attributes; and (iii) establishing relationships with other resources. The resource configurations are then submitted to and deployed on a provider's environment (e.g., AWS). In the example shown in Figure 1.3, consumers may specify a JSON file that describes the two VMs in terms of CPU, memory and storage capacity, as well as their relationship, such as the Application VM depending on the Database VM to store data.

3. Deployment. During deployment, resource providers: (i) interpret the specified resource configurations; (ii) instantiate cloud resources; and (iii) instantiate relationships with dependent resources. For example, a provider such as AWS would interpret the JSON-based configuration request and accordingly create the necessary VMs with the specified resource description attributes. The deployment sequence is based on the relationship between the two VMs. Consumers may further fine-tune resources (e.g., modify resource description attributes) in order to meet their requirements. For example, consumers may edit the firewall configurations of two related VMs to instantiate a secured data communication channel.

Figure 1.3: Cloud Resource Configuration of a 2-tier Application (the Application VM depends-on the Database VM)

4. Monitoring. Consumers may then monitor the Quality of Service (QoS), and verify whether the deployed resources individually and collectively satisfy the Service Level Agreements (SLAs) between the consumer and provider [185, 25]. Monitoring refers to collecting and analyzing events, and is an essential concern for enabling elasticity and handling unexpected behaviors [1]. For instance, by analyzing the CPU utilization and data transfer activity, consumers may determine whether the Application VM has crashed, is overused or underused, or is operating as expected.

5. Control. While monitoring provides the mechanism to obtain the necessary operative information, control enables taking actions when necessary. For instance, if there are SLA violations or requirement variations, consumers may apply control operations (e.g., scale in/out, migrate) to continue meeting the consumers' requirements. Likewise, elasticity and unexpected behaviors are usually handled by invoking control tasks (e.g., increase or restart VMs) that run as a result of events. Control tasks span from simple reconfigurations (e.g., restarting VMs) to complex processes (e.g., migrating and scaling up/down a set of VMs).
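The deployment ordering implied by step 3 — a resource is instantiated only after the resources it depends on — can be sketched as a topological traversal of the depends-on relationships (an illustrative sketch; the function and configuration names are ours):

```python
from typing import Dict, List

def deployment_order(depends_on: Dict[str, List[str]]) -> List[str]:
    """Return an order in which resources can be deployed, dependencies first."""
    order: List[str] = []
    visiting: set = set()

    def visit(res: str) -> None:
        if res in order:
            return
        if res in visiting:
            raise ValueError(f"dependency cycle involving {res}")
        visiting.add(res)
        for dep in depends_on.get(res, []):
            visit(dep)  # deploy each dependency before the resource itself
        visiting.remove(res)
        order.append(res)

    for res in depends_on:
        visit(res)
    return order

# Figure 1.3: the Application VM depends on the Database VM,
# so the Database VM must be deployed first.
config = {"application-vm": ["database-vm"], "database-vm": []}
print(deployment_order(config))  # ['database-vm', 'application-vm']
```

A real provider derives an equivalent ordering when it interprets a configuration such as the JSON file of step 2.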

To manage cloud resources over each phase of the aforementioned lifecycle, various services and processes are used to: select, describe, configure & deploy, monitor and control cloud resources. We use the term Cloud Resource Orchestration to denote such services and processes. From the consumers' perspective, the function of orchestration systems is to bind resources and operations (e.g., deploy, monitor, scale-out), thereby providing an abstraction layer that shifts the focus from the underlying resource infrastructure to the available orchestration services and resource management [205]. Throughout this thesis, we use the term DevOps for the system administrators, software engineers and application developers who are collectively involved in the orchestration of cloud resources.

Figure 1.4: The life-cycle of Cloud resources

1.2 Key Research Issues

In this section, we outline the key research issues tackled in this dissertation. We intend to facilitate the federation of cloud resource orchestration techniques. We therefore separate the research issues into four areas: (1) Interoperability of cloud resource orchestration techniques, (2) Dynamic re-configuration of cloud resources, (3) Knowledge reuse for cloud resource orchestration, and (4) Understanding cloud resource orchestration concerns.

1.2.1 Interoperability of Cloud Resource Orchestration Techniques

The rapid growth of tools facilitating different aspects (i.e., selection, configuration, deployment, monitoring and controlling) of the cloud resource life-cycle encourages DevOps to design end-to-end, automated orchestration tasks that span a selection of the most suitable tools. However, heterogeneities among the resource description models and orchestration capabilities of such tools pose inherent and fundamental limitations when orchestrating complex and dynamic cloud resources.

For example, when an Apache Web server deployed in a private cloud (e.g., OpenStack) reaches its maximum capacity, the excess load can be outsourced to a replica of the Apache Web server deployed with a public cloud resource provider (e.g., Amazon Web Services). In general, when a single cloud resource provider cannot satisfy all application and resource requirements, DevOps are inevitably responsible for describing, deploying and managing the component resources of a federated cloud resource configuration in a segregated fashion. As application development becomes increasingly distributed across multiple, heterogeneous and evolving networks, it becomes increasingly difficult to develop and manage interoperable and portable cloud solutions, because DevOps have to deal with multiple orchestration languages (e.g., AWS OpsWorks (https://aws.amazon.com/opsworks/), Ubuntu Juju (http://www.ubuntu.com/cloud/juju) and OpsCode Chef (https://www.chef.io/chef/)). These languages possess different notations (e.g., JSON, XML, YAML); resource description models (e.g., OpsWorks Stacks (http://docs.aws.amazon.com/opsworks/latest/userguide/workingstacks.html), Juju charms (jujucharms.com/), Chef recipes (https://docs.chef.io/recipes.html)); resource access interfaces (e.g., command-line interfaces, web interfaces); and capabilities (e.g., deployment, scaling, migration, monitoring) [159, 214]. To describe, deploy and manage a federated cloud resource configuration, DevOps need to understand the orchestration languages of all participating cloud resource providers.

It is desirable to have languages that support high-level and tool-independent representation and orchestration of cloud resources. Such languages would greatly simplify the representation and manipulation of heterogeneous cloud resources.
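To make the interoperability gap concrete, the following sketch renders one tool-independent resource description into two different provider-specific notations. The payload formats are simplified, hypothetical stand-ins and do not reproduce any real tool's schema:

```python
import json

# A tool-independent description of a single VM (illustrative schema).
resource = {"type": "vm", "name": "web-server", "memory_gb": 4, "cpus": 2}

def to_json_style(res: dict) -> str:
    """Render the description for a hypothetical JSON-based provider API."""
    return json.dumps({"Resources": {res["name"]: {
        "Type": res["type"].upper(),
        "Properties": {"Memory": f'{res["memory_gb"]}GB', "Cpus": res["cpus"]},
    }}})

def to_yaml_style(res: dict) -> str:
    """Render the same description for a hypothetical YAML-based tool."""
    return (f'{res["name"]}:\n'
            f'  type: {res["type"]}\n'
            f'  memory: {res["memory_gb"]}GB\n'
            f'  cpus: {res["cpus"]}\n')

print(to_json_style(resource))
print(to_yaml_style(resource))
```

A tool-independent language would let DevOps author the `resource` description once, delegating the per-tool rendering to translators such as these.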

1.2.2 Dynamic Re-configuration of Cloud Resources

Implementing and managing elastic cloud services requires the ability to dynamically scale resources up and down to adapt to varying requirements and environment changes. Existing cloud resource providers offer heterogeneous resource deployment and reconfiguration services for implementing elastic cloud resource configurations. Deploying and re-configuring federated cloud resource configurations over such services is challenging due to dynamic application requirements and the complexity of cloud environments. For example, modeling re-configuration tasks (e.g., add storage capacity, restart VM instances) that run as a result of events (e.g., service usage increasing beyond a certain threshold) by directly interacting with heterogeneous deployment services leads to ad-hoc scripts or manual tasks, which hinder the automation of orchestration activities in a dynamic cloud environment.

It is desirable to have languages that specify high-level deployment processes and reconfiguration policies for cloud resources in a unified manner. Such languages would significantly reduce the burden of managing federated cloud resources.

1.2.3 Knowledge Reuse for Cloud Resource Orchestration

DevOps often share valuable knowledge regarding the orchestration of cloud resources (e.g., resource configuration templates, elasticity rule templates). This knowledge may then be utilised, as DevOps may potentially discover knowledge artifacts (e.g., configuration scripts, documentation, forums, binary installers, and portable packages) when defining cloud orchestration processes. Other DevOps may share non-textual packaging formats (e.g., Open Virtualization Format (OVF, http://www.dmtf.org/standards/ovf), Snaps (https://terminal.com/explore) and Docker Images (https://docs.docker.com/userguide/dockerimages/)) as resource artifacts.

Reusing such low-level and cross-layer knowledge artifacts for federated cloud resource orchestration is complex, error-prone and time-consuming for DevOps, as it inevitably leads to an inflexible and costly environment which adds considerable complexity, demands extensive programming effort, requires multiple and continuous patches, and perpetuates closed cloud solutions. Rather, there is a need to effectively represent, organize and manipulate otherwise low-level, complex and cross-layer orchestration knowledge artifacts as meaningful, higher-level segments.

1.2.4 Understanding Cloud Resource Orchestration Concerns

With the vast proliferation of cloud computing technologies, DevOps inevitably face managing large amounts of complex cloud resource configurations [16]. This involves being able to proficiently understand and analyze cloud resource attributes and relationships, and make decisions on demand. However, the majority of cloud tools such as Puppet, Ubuntu Juju, Ansible, Amazon OpsWorks and Chef encode resource descriptions and monitoring and control scripts in tedious textual formats (e.g., executable scripts) [64]. This makes it complex and overwhelming for DevOps to manually read, navigate and iteratively build a mental representation, especially when a large number of cloud resources is involved. For example, simple management tasks generally involve: analyzing resource attributes and the relationships between resources; monitoring events from resources; and invoking control actions to reconfigure those resources, if necessary. Nevertheless, until now, DevOps have been required to manually and iteratively read several low-level resource description files and use command-line tools to extract monitoring information and invoke control actions. In fact, it has been observed and confirmed that DevOps devote the majority of their time to understanding existing configuration and management artifacts rather than creating, updating and/or testing them [161, 17].

We identify the need for developing visual notations to simplify the representation and management of cloud resources. We argue this novel approach will allow DevOps to invest more in creating, configuring and managing cloud resources, instead of the frustration and time spent understanding them.

1.3 State of the Art

Previous work mostly focussed on specific aspects of cloud resource orchestration: existing surveys investigated specific configuration techniques [64]; monitoring techniques [25]; security assurance best practices [13, 100, 174]; energy-efficiency of orchestration techniques [142]; adaptability of orchestration mechanisms [185, 224]; Quality of Service (QoS) and Service Level Agreements (SLAs) of cloud resource orchestration [96]; as well as interoperability concerns among different orchestration techniques [196, 127]. However, existing surveys are mostly fragmented and lack a holistic view of the problem. A taxonomy and analysis is presented in [113], albeit primarily based on industry tools such as Amazon Web Services. Moreover, while some of these dimensions overlap with our work in Chapter 2, our proposed taxonomy includes additional dimensions (and sub-dimensions) which contribute to an in-depth analysis over a mixture of tools from both industry and academia. Clearly, previous efforts by the research and practitioner communities have produced promising and certainly useful results. However, more holistic efforts and comprehensive analysis are vital to understand the fundamentals of cloud resource orchestration and to enable federated cloud resource orchestration techniques.

1.4 Contributions Overview

Our goals are to improve the fundamental understanding of cloud resource orches- tration and facilitate the federation of heterogeneous cloud resource orchestration techniques. More specifically, we propose the following contributions:

(i) A taxonomy framework to assess, comprehend, compare and select cloud resource orchestration techniques: Orchestration of cloud resources for building cloud services and applications, a highly sensible and conceptually critical process, is yet to unleash its full potential. We identify the significance of a unified and comprehensive analysis framework which accelerates the fundamental understanding of cloud resource orchestration in terms of concepts, paradigms, languages, models and tools. There exists a wide range of orchestration techniques covering different aspects (i.e., selection, configuration, deployment, monitoring and controlling) of the cloud resource life-cycle. Moreover, the continuous proliferation of such techniques has encouraged DevOps to adopt a set of best-of-breed techniques to automate cloud resource orchestration processes. However, adopting the right set of cloud resource orchestration techniques within an organization requires a great deal of investment in terms of time and money, as a wrong choice may result in devastating rollback consequences.

We present a taxonomy framework and survey to analyse the state-of-the-art in cloud resource orchestration from a rational and holistic viewpoint. The taxonomy framework and survey aim to efficiently explore, assess, understand, compare and contrast, and thereby wisely select, cloud resource orchestration techniques based on DevOps' requirements. We further provide an analysis over a set of methodically chosen cloud resource orchestration techniques. Subsequently, we derive some future directions based on the technical gaps identified during the analysis.

(ii) High-level models for representing and managing elementary and federated cloud resources: To address the challenges discussed in Section 1.2.1, we investigate how to effectively represent, organize and manipulate otherwise low-level, complex and cross-layer cloud resource descriptions as meaningful, high-level segments. We propose Domain-specific Models for the high-level representation of low-level, technique-specific cloud resource descriptions. Given that we architect Domain-specific Models over existing cloud resource orchestration techniques, this significantly enhances the potential for knowledge reuse, since we can better harness interoperability capabilities. To accommodate the large number and variety of cloud resource orchestration techniques (e.g., procedural, activity-based and declarative), as well as the variety of target environments (i.e., public, private and federated), we propose Connectors which deploy and manage cloud resource configurations described using Domain-specific Models. Ultimately, our contributions facilitate building an ecosystem of shared knowledge around cloud resource orchestration techniques. We implement our system and subsequently conduct a user study to evaluate our work, where we demonstrate the overall effectiveness of our proposed approach.

(iii) Process-based notations for deployment and reconfiguration of federated cloud resources: Cloud resource orchestration tools offer various deployment languages that allow resource consumers (e.g., DevOps) to describe, deploy and re-configure cloud resource configurations. To describe and deploy federated cloud resource configurations, users must understand the deployment service interfaces of all participating resource providers and implement ad-hoc deployment scripts.

We propose a unified, graphical, process-based notation to specify deployment and reconfiguration workflows of federated cloud resources. We provide mechanisms which automatically translate higher-level deployment and reconfiguration workflows into executable BPMN (Business Process Model and Notation) processes. Experiments on a real-life federated cloud resource show significant improvements achieved by our approach compared to traditional techniques.

(iv) Knowledge reuse techniques for cloud resource orchestration artifacts: Knowledge reuse is an invaluable feature for efficient and productive cloud resource configuration. However, different cloud orchestration tools adopt different knowledge reuse methods: parameterized resource description templates (e.g., AWS CloudFormation Templates (https://aws.amazon.com/cloudformation/)), resource snapshots (e.g., VMware snapshots (kb.vmware.com//kb/1015180), Snaps (www.terminal.com/explore)), and scripts (e.g., Dockerfiles (docs.docker.com/reference/builder)) which explain how to configure, deploy and manage cloud resources.

To address this, we propose a recommender system for the discovery and selection of cloud resource orchestration knowledge. The recommender system recommends cloud resource orchestration artifacts based on the user's context (e.g., intended task and deployment scenario). We further propose an incremental orchestration knowledge acquisition approach that gradually builds a knowledge base for cloud resource orchestration with very little human intervention. We conduct experiments on 36 real-life cloud resource configurations which show efficient reuse of cloud resource configuration knowledge by our approach compared to traditional techniques.

(v) Visual notations for representing and managing complex cloud resources: Taking analogies from the software development domain, and based on user studies, we realize that textual cloud resource representations are complex for DevOps, as they have to manually read and understand cloud resource configurations, especially when a large number of inter-dependent cloud resources is involved.

We propose visual notations and semantics for DevOps to represent, monitor and control cloud resource configurations that are managed by external cloud resource orchestration techniques such as Docker, thus facilitating DevOps to understand and manage complex cloud resource configurations efficiently. We define Entities and Links to represent cloud resources and the relationships between them. We propose Badges and Widgets which allow DevOps to visually and seamlessly monitor and control cloud resources. We introduce three mindmap-based, reusable visualization patterns for managing complex cloud resources. We conduct an experiment and evaluation which show significant productivity and usability improvements of our approach for understanding, navigating, monitoring and controlling cloud resources.

1.5 Dissertation Organization

The rest of this dissertation is structured as follows: In Chapter 2, we start with a discussion of the current state of the art in cloud resource orchestration. We derive a taxonomy framework and present a survey on cloud resource orchestration techniques. Moreover, our taxonomy framework improves awareness of the fundamental building blocks (i.e., resources, resource orchestration capabilities, user types, runtime environment, knowledge reuse) within the domain of cloud resource orchestration. We apply our taxonomy to contrast and compare a wide range of cloud resource orchestration techniques, and finally present an analysis from which we derive some future directions to improve state-of-the-art cloud resource orchestration techniques.

In Chapter 3, we present the details of our model-driven framework for managing elementary and federated cloud resources. We first introduce the research issues by illustrating the heterogeneity in cloud resource management languages and tools. We then present the overall system architecture and the interactions of the main components. In particular, we elaborate the concepts of Domain-specific models and Connectors, which facilitate the high-level representation and management of cloud resource configurations over existing cloud resource configuration and management tools. We also explain the process of extracting Domain-specific models from tool-specific cloud resource artifacts and implementing Connectors which translate Domain-specific models into resource descriptions and management scripts that can be interpreted by cloud resource configuration and management tools such as Juju and Docker. We then illustrate the implementation and evaluation of our approach.

In Chapter 4, we present a novel cloud resource deployment and reconfiguration framework which provides a process-based notation (i.e., an extension of BPMN) for users to describe complex deployment and reconfiguration tasks over federated cloud services. We begin by introducing the concepts of Cloud Resource Deployment Task and Cloud Resource Reconfiguration Policy, two high-level process-based abstractions that allow users to describe and deploy their federated cloud resource configurations and to specify reconfiguration policies over them. We illustrate the transformation of the high-level process-based notations into BPMN notations. We then illustrate the proposed system architecture and the implementation of our system.

In Chapter 5, we begin by explaining the research issues related to reusing federated cloud resource configurations. We then explain the overall architecture of our recommender system and the interactions between the main components (i.e., Context Database, Recommendation Rules and Configuration Knowledge Representations). We further elaborate how our Recommendation Rules employ Ripple Down Rules (RDR) [54], a knowledge acquisition and maintenance method, to empower the reusability of federated cloud resource configurations. We then illustrate the implementation and evaluation of our approach.

In Chapter 6, we present a model-driven language to visually represent, monitor and control cloud resource configurations managed by existing cloud resource orchestration tools such as Docker and Juju. We begin by identifying the issues in existing cloud resource configuration representations based on a user study. We then introduce our visual language notations (i.e., Entities, Links, Probes and Widgets) and their semantics. We also propose mindmap-based visualization patterns for managing complex cloud resource configurations. We then explain the implementation, experiment and evaluation of our approach.

Finally, in Chapter 7, we offer concluding remarks on this thesis and discuss possible directions for future work.

Chapter 2

A Taxonomy and Survey of Cloud Resource Orchestration Techniques and Tools

2.1 Introduction

In this chapter, we propose a consolidated and comprehensive analysis framework to better understand the fundamental building blocks of cloud orchestration. We devise a taxonomy to articulate the concepts, models, languages, standards, techniques and tools. This framework is necessary to effectively explore, assess, contrast and compare the variety of resource orchestration techniques. We compare our work with other related works in Section 1.3.

This chapter contributes an extensive survey of the main issues and solutions in the cloud resource orchestration domain. After introducing the necessary background on the core cloud resource orchestration concerns (Section 2.2), we propose our taxonomy and framework for analyzing and comparing cloud resource orchestration techniques (Section 2.3). The taxonomy proposes a set of dimensions (i.e., resources, orchestration capabilities, user types, runtime environment and knowledge reuse), which we progressively discuss in Sections 2.4-2.8. We then apply the taxonomy to analyze a set of methodically chosen cloud resource orchestration tools and research prototypes, and derive several open research issues based on the technical gaps identified during the analysis (Section 2.9). In the last section, we provide concluding remarks and directions for future study.

2.2 Cloud Resource Orchestration

Consumers of cloud resources, human or software, typically have diverse requirements (e.g., storage capacity, access rules, etc.). Moreover, a single cloud resource normally cannot provide all the necessary capabilities. Consider an HTTP server, application runtime and database, composed together to form a typical Web application deployment platform. The composition of dependent resources may require additional and complex configuration changes. For instance, a secured communication channel may be initialized between the application runtime and the database by opening IP ports and enforcing access rules (e.g., firewall rules). Furthermore, deployed resources produce events (e.g., application server started, database server crashed), which need to be monitored so that necessary actions can be taken. To reason about this process, we introduced the notion of the Cloud Resource Lifecycle in Chapter 1, which aims to categorize orchestration tasks over the different phases in the typical lifespan of a cloud resource.
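As a sketch of how such composition-driven configuration changes might be derived, the snippet below maps each connects-to relationship in the Web application platform onto an inbound firewall rule. All names, port numbers and the rule format are illustrative assumptions, not any provider's actual API:

```python
from typing import Dict, List, Tuple

# Default listening port for each component type (illustrative values only).
DEFAULT_PORTS = {"http-server": 80, "app-runtime": 8080, "database": 3306}

def firewall_rules(links: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """For each 'src connects-to dst' link, allow inbound traffic to dst's port from src."""
    rules = []
    for src, dst in links:
        rules.append({"allow_from": src, "to": dst, "port": DEFAULT_PORTS[dst]})
    return rules

# The Web application platform from the text: the HTTP server talks to
# the application runtime, which in turn talks to the database.
links = [("http-server", "app-runtime"), ("app-runtime", "database")]
for rule in firewall_rules(links):
    print(rule)
```

Deriving such rules from declared relationships, rather than writing them by hand for each pair of resources, is one way an orchestration system can automate composition-induced configuration changes.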

To manage cloud resources over the lifecycle phases, various services and processes are used to: select, describe, configure, deploy, monitor and control cloud resources. We use the term Cloud Resource Orchestration to denote such processes and services. From the consumers' perspective, the function of orchestration systems is to bind resources and operations (e.g., deploy, monitor, scale-out), thereby providing an abstraction layer that shifts the focus from the underlying resource infrastructure to the available orchestration services and resource management [205]. Cloud resource orchestration systems implement a service-oriented model, enabling consumers to satisfy their application requirements by utilizing resources from cloud environments. In this manner, the overall goal of cloud resource orchestration is to ensure the successful hosting and delivery of applications by meeting the QoS objectives of consumers.

In Figure 2.1, we devise a reference architecture for cloud resource orchestration systems. In the following, we categorize processes, services and tools, involved in cloud resource orchestration based on their functionalities vis-à-vis this reference model.


Figure 2.1: Reference Architecture for Cloud Resource Orchestration

• Resource Provisioning Layer. Some services and tools merely offer the most basic operations to create, reconfigure and delete cloud resources. Such services and tools are built upon a resource description model – a meta-model that allows consumers to describe resource configurations. Consumers invoke the operations of such services with resource configuration descriptions as inputs. Resource provisioning services then interpret these descriptions and manipulate cloud resources accordingly. For example, the AWS Command Line Interface (CLI) [20] provides a range of provisioning services for every resource that AWS supports. One such service offers operations (e.g., create, start, stop, delete, clone, attach storage volumes) to provision EC2 virtual machines [19].

• Resource Management Layer. It is vital that services are provided to effectively automate the management of cloud resources, as otherwise consumers are forced to manually invoke the basic manipulation tasks offered by the Resource Provisioning Layer. For instance, automating a complex management task such as throughput-based Web application scaling in AWS requires: (i) a monitoring engine (e.g., AWS CloudWatch [52]); (ii) a policy enforcement engine (e.g., AWS Auto Scaling [18]); and (iii) a rule engine (e.g., Opscode Chef [175]). The monitoring engine collects throughput metrics from Web application servers and thereby publishes events to a Policy Enforcement Engine (PEE). Based on the captured metrics, the PEE determines whether to replicate the Web application into multiple instances. The PEE invokes the rule engine to execute orchestration processes (e.g., clone, deploy and notify the HTTP load balancer about new instances). The rule engine coordinates the scaling process by leveraging operations exposed by the Resource Provisioning Layer. Furthermore, there are services (e.g., AWS Marketplace [141]) that allow consumers to discover, create, curate and share knowledge about resource provisioning and management as reusable artifacts.

• Description Layer. The Description Layer refers to languages and models to represent configuration, deployment, monitoring and control tasks of cloud resources. This is typically in the form of: (i) resource descriptions; (ii) orchestration processes (e.g., elasticity rules); and/or (iii) policies (e.g., security policies, load balancing policies). Consumers may model Resource Descriptions from scratch, or alternatively discover existing descriptions and modify them to satisfy their requirements. For example, AWS OpsWorks provides a description language which enables users to specify a collection of Web application components (e.g., database, application engine, HTTP load balancer) and relationships between them via a JSON notation [5, 173].

An Orchestration Rule Description language models orchestration behavior (e.g., based on Event-Action rules, or flow-based languages such as BPMN). For example, AWS OpsWorks provides a language with a set of pre-defined life-cycle events (e.g., setup, configure, deploy, undeploy, shutdown), which enables users to associate orchestration actions with those events.
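Associating orchestration actions with pre-defined life-cycle events amounts to an event–action lookup table. A minimal sketch follows; the event names echo the OpsWorks-style life-cycle events mentioned above, while the handler functions and their effects are hypothetical.

```python
# Sketch of binding orchestration actions to life-cycle events
# (setup, configure, deploy, undeploy, shutdown). Handlers are hypothetical.

actions = {}

def on(event):
    """Register an orchestration action for a life-cycle event."""
    def register(handler):
        actions.setdefault(event, []).append(handler)
        return handler
    return register

@on("setup")
def install_packages(instance):
    return "packages installed on %s" % instance

@on("deploy")
def checkout_application(instance):
    return "application deployed to %s" % instance

def fire(event, instance):
    """Run every action associated with the event, in registration order."""
    return [handler(instance) for handler in actions.get(event, [])]

print(fire("deploy", "web-1"))  # ['application deployed to web-1']
```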

Policy Descriptions define policies that endow resources with dynamic control behaviors. For example, AWS OpsWorks supports defining load-based policies to scale Web applications. Such a policy may specify to instantiate new application engines when the average CPU utilization exceeds 95%, and to stop application engines when their average CPU load falls below 40%.
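A load-based policy of this kind reduces to two thresholds. The following minimal sketch uses the 95%/40% values from the example above; the function and action names are illustrative, not a specific tool's vocabulary.

```python
# Minimal sketch of the load-based scaling policy described above:
# scale out above 95% average CPU, scale in below 40%.

def scaling_decision(avg_cpu_utilization, scale_out_at=95.0, scale_in_at=40.0):
    """Return the orchestration action implied by the policy, if any."""
    if avg_cpu_utilization > scale_out_at:
        return "instantiate-application-engine"
    if avg_cpu_utilization < scale_in_at:
        return "stop-application-engine"
    return None  # within the acceptable band: do nothing

print(scaling_decision(97.5))  # instantiate-application-engine
print(scaling_decision(35.0))  # stop-application-engine
print(scaling_decision(70.0))  # None
```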

• User Layer. The User Layer allows cloud resource consumers (e.g., system/network administrators, application developers) to interact with the Description Layer, Resource Management Layer and Resource Provisioning Layer. Command Line Interfaces (CLIs), Software Development Kits (SDKs), Application Programming Interfaces (APIs) and Integrated Development Environments (IDEs) (e.g., AWS CLI, AWS Java SDK, AWS REST API and VisualOps) expose operations to manipulate cloud resource descriptions, orchestration rules and policies [20, 8, 9, 203]. Dashboards (e.g., Amazon CloudWatch [52]) present captured Monitoring Data in a human-readable format. Dashboards are useful when authoring semi-automated orchestration processes, where people coordinate resource orchestration by analyzing monitoring data. For example, when the CPU utilization of a specific VM exceeds 95%, the administrator may need to discuss with a chief operating officer which orchestration tasks (e.g., scale out) should be executed while keeping financial costs to a minimum.

2.3 Cloud Resource Orchestration Taxonomy

The focus of this chapter is to articulate the concepts, techniques, languages and tools that are targeted towards cloud resource orchestration. In order to promote a focused analysis over a number of key concerns and solutions, we devise and introduce a carefully considered taxonomy, as depicted in Figure 2.2. The intention has been to provide a holistic view over these diverse concerns, while highlighting the main dimensions and options that are available for cloud resource orchestration. The taxonomy is a result of our own research efforts, experiences from industry, extensive literature reviews in related areas, as well as experiments with various services and tools. This enabled us to recognize the common building blocks and provide a consolidated analysis.

Earlier, we discussed what is meant by “cloud resource orchestrations”. With the assistance of our taxonomy, we now turn our attention to discovering how such orchestrations can be described, deployed and provisioned – independent of specific technologies or target solutions. We delve into understanding the challenges and potential for improving the current orchestration methodology. Accordingly, we divide the orchestration problem into five dimensions, which in turn are split into various sub-dimensions. A portion of this analysis also requires figuring out how a specific dimension may affect the others (e.g., which resource access methods are suitable for which users).

1. Resources. This dimension identifies the formalisms that are offered for representing cloud resources so that they can be supported by orchestration services and tools. Given its vital role in cloud resource orchestration, we further analyze what resources are supported, and how resources are modeled, represented and accessed by users (refer to Section 2.4).

2. Orchestration Capabilities. Given a particular orchestration tool, Orchestration Capabilities consist of actions and processes to manage orchestration tasks. We further divide this dimension into sub-dimensions and look at orchestration actions, paradigms, automation strategies, theoretical foundations and cross-cutting concerns (refer to Section 2.5).

3. User Type. This dimension identifies the type of users who are involved in orchestrating cloud resources. In our analysis, we identified three categories of users who have different levels of expertise and expectations from cloud resource orchestration services and tools (refer to Section 2.6).

4. Runtime Environment. Another important dimension is the underlying execution environment of cloud resource management services, which was introduced in Section 2.2. We identify three sub-dimensions of the runtime environment: (i) Virtualization technique; (ii) Execution model; and (iii) Target environment. We believe these sub-dimensions strongly affect the overall runtime performance of cloud resource orchestration processes. The Virtualization technique refers to how physical resources are abstracted to simplify their consumption. The Execution model refers to how cloud resources are deployed, monitored and controlled in an environment. The Target environment sub-dimension identifies different deployment models, such as public, private and federated/hybrid cloud environments (refer to Section 2.7).

Figure 2.2: A Taxonomy in Cloud Resource Orchestration (dimensions: Resources – resource type, entity model, access method, representation notation; Orchestration Capabilities – primitive actions, strategies, language paradigm, theoretical foundation, cross-cutting concerns; User Type – DevOps, application developers, domain experts; Runtime Environment – virtualization technique, execution model, target environment; Knowledge Reuse – reused artifact, reuse technique)

5. Knowledge Reuse. Productivity may be further enhanced through supportive reuse capabilities of existing orchestration knowledge. Users may implement and share orchestration knowledge as reusable software artifacts (e.g., resource descriptions, orchestration rules). While some users may curate these artifacts, others may reuse them. This incremental process of knowledge reuse nurtures the productivity of cloud resource orchestration by reducing development time and human errors. We identify two sub-dimensions of knowledge reuse: (i) Reused Artifact; and (ii) Reuse Technique (refer to Section 2.8).

We use the taxonomy presented in Figure 2.2 to structure the rest of this chapter.

2.4 Resources

2.4.1 Resource Types

Cloud providers enable virtualizing three categories of resources. We further elucidate these types below, namely: Infrastructure, Platform, and Software-as-a-Service [168].

• Infrastructure. Infrastructure resources represent processing, storage, network and hosting environments that power the Platform-layer resources [31, 16, 205, 194, 168]. Cloud-resource orchestration providers that support infrastructure resources include: VMware vSphere, AWS EC2 CLI, Google Cloud Platform, OpenNebula, CohesiveFT, Nectar, CloudStack and Rackspace [133, 19, 162, 151, 164, 65, 67, 50, 39]. However, some orchestration techniques do not support all types of infrastructure resources. For example, Rackspace allows users to describe and create virtual machines (VMs), associate storage volumes with VMs, and create communication channels among VMs. On the other hand, Juju [198] only supports provisioning Ubuntu-based VMs and does not support storage or network resources.

• Platform. Platform resources provide a set of software development tools, middleware and APIs, including programming SDKs and languages, as well as run-time environments such as Content Delivery Networks, mobile application run-times and Big-data platforms, all of which facilitate coding and deploying software resources. Providers include AWS OpsWorks, AWS CloudFormation, Ubuntu Juju, Puppet, Chef, Ansible, Docker, EngineYard, CloudBees and nitrous.io [173, 4, 198, 145, 73, 49, 152, 118]. For example, Heroku provides language runtimes such as Java, Ruby and Node.js to both describe and provision a language runtime environment, as well as deploy and manage software applications on the provisioned runtime environment.

• Software. Software resources are applications (i.e., Web or mobile) designed for end-users [60, 168]. Software resources (e.g., social networking, project or sales management, personal productivity) are usually built upon underlying Platform and Infrastructure resources. For example, salesforce.com provides pay-per-use Customer Relationship Management (CRM), amongst other targeted features. However, orchestration tasks that underlie software resources remain invisible to end-users, as they are implemented behind the scenes. This implies end-users are unable to alter the software by performing orchestration tasks (e.g., increasing memory or integrating new resources), although they may customize the software at the user-interface level for specific requirements. Software resources are the most abundant type of resource compared to Platform or Infrastructure resources [82].

2.4.2 Resource Entity Model

We propose the notion of Resource Entity Model to provide a generic description of the structure of cloud resources, without necessarily taking into consideration the specifics of any particular service or tool. Effectively, this implies a high-level abstraction, which we represent as a graph whose nodes and edges correspond to cloud Resource Entities and their Relationships respectively [45]; as well as any Constraints.

Resource Entities

Resource Entities describe properties of cloud resources via a set of attributes (e.g., key-value pairs), and as such characterize the possible runtime instances of the resource. For example, a VM provided by AWS EC2 [180] has attributes such as the number of CPU cores, storage capacity, memory capacity, and access rules. System administrators specify values for each attribute before deploying; once deployed, an instance may include additional attributes like instance ID, public IP address, and launch time to represent the runtime state.

Resource entities can be further categorized as Elementary or Composite. An elementary resource does not rely on any other resources, and acts as a primary building block of composite resources. A composite resource is an umbrella structure that brings together other elementary and composite resources to describe a larger cloud resource. For example, an E-Learning platform may consist of an artifact management service and a student identity management service to support 100 students.
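The elementary/composite distinction can be sketched as follows. This is an illustrative Python model of the Resource Entity Model, not any tool's data structure; the entity names reuse the e-learning example above, and the attribute names and values are made up.

```python
# Sketch of the Resource Entity Model's elementary/composite distinction.
# Attribute names and values are illustrative only.

class ResourceEntity:
    """A resource described by a set of key-value attributes."""
    def __init__(self, name, **attributes):
        self.name = name
        self.attributes = attributes
        self.components = []   # empty for an elementary resource

    def add_component(self, entity):
        self.components.append(entity)
        return self

    def is_composite(self):
        return bool(self.components)

# Elementary resources: the primary building blocks.
artifacts = ResourceEntity("artifact-management-service", storage_gb=50)
identity = ResourceEntity("student-identity-service", protocol="LDAP")

# A composite resource brings them together under an umbrella structure.
elearning = ResourceEntity("e-learning-platform", capacity_students=100)
elearning.add_component(artifacts).add_component(identity)

print(elearning.is_composite())   # True
print(artifacts.is_composite())   # False
print(len(elearning.components))  # 2
```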

Resource entities may have diverse granularities – depending on the design of the orchestration technique that operates on them. For example, Puppet [111] is designed to orchestrate resources within a single physical or virtual machine. Therefore, its primary resource entities are fine-grained, such as file, sshkey and package [121]. Coarse-grained resources such as application engines (e.g., a Node.js runtime) are composed of fine-grained resources. In contrast, Juju [198] is primarily dedicated to orchestrating resources deployed across multiple machines. Juju provides resource entity types called Charms, which represent high-level services (e.g., Node.js runtimes, Hadoop clusters [184]) as primary resource entities.

Most orchestration techniques only support describing resources of a specific provider [116, 216, 20]. On the other hand, others such as TOSCA, ModaClouds and CloudBase provide cross-provider support, enabling Resource Entities that are portable across different providers [29, 14, 208]. Orchestration techniques that support cross-provider descriptions (e.g., ComputeService in jclouds) are often intended for configuration and management of federated or hybrid cloud resources [78, 72, 201].

Resource Relationships

A Relationship denotes a link between two Resource Entities. The relationship constructs can be further annotated with key-value pairs, in order to describe the properties of the respective relationship.

In circumstances where an orchestration technique does not support explicit descriptions of relationships, composite resources may in fact become inconsistent when orchestrating two related component resources. Consider a Web application such as a LAMP suite (i.e., a software stack including an Apache HTTP server, a MySQL database server and a PHP application engine) [125]. When the associated database server is migrated to a new IP address, the relevant configuration attributes held at the application engine should also be updated, as this is required to maintain successful communication between the application engine and the database server. However, if the orchestration technique does not support explicit relationships, system administrators may need to manually update the relevant attributes (or employ other third-party tools such as shell scripts). These alternatives are error-prone and may also cause unnecessary overheads.
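This consistency problem is what explicit relationship descriptions solve: the orchestrator can propagate an attribute change along each declared relationship instead of leaving it to manual scripts. A hedged sketch follows; the entity and attribute names mirror the LAMP example, and the IP addresses and propagation logic are illustrative, not a specific tool's behavior.

```python
# Sketch of propagating a configuration change along an explicit relationship.
# Entity/attribute names and IPs are illustrative only.

entities = {
    "php-app-engine": {"db-host": "10.0.0.5"},
    "mysql-db-server": {"ip": "10.0.0.5"},
}

# (consumer, attribute, provider, attribute) tuples: the consumer's attribute
# is kept consistent with the provider's attribute.
relationships = [
    ("php-app-engine", "db-host", "mysql-db-server", "ip"),
]

def update_attribute(entity, attribute, value):
    """Set an attribute, then propagate it along declared relationships."""
    entities[entity][attribute] = value
    for consumer, c_attr, provider, p_attr in relationships:
        if provider == entity and p_attr == attribute:
            entities[consumer][c_attr] = value

# Migrating the database server to a new IP updates the app engine too.
update_attribute("mysql-db-server", "ip", "10.0.0.7")
print(entities["php-app-engine"]["db-host"])  # 10.0.0.7
```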

Relationships are established between a provider and a consumer resource entity – where the provider offers some type of capability for the consumer. Figure 2.3 exemplifies relationships within a typical Web application named ESales-Web-App. This Web application is “hosted” in Apache-Tomcat-Server and “communicates” data provided by CustomerDB, which is “hosted” in MySQL-DB-Server-1. Apache-Tomcat-Server and MySQL-DB-Server-1 are “hosted” in AWS-EC2-VM1 and AWS-EC2-VM2 respectively.

We identify the following types of relationships and elucidate their semantics below:

Figure 2.3: Resource Entities and Relationships of a Web application

1. Communication Relationship. Denotes the exchange of data between two resources, such as an application and its associated database. For example, TOSCA 1.0 [153] provides a relationship type called ConnectsTo (refer to Figure 2.4). TOSCA 1.0 interprets description attributes (e.g., communication protocol) of the relationship, and thereby constructs a channel between the relevant resources.

Figure 2.4: Communication Relationship between a Web Application and Database

2. Dependency Relationship. Associates resources with supporting resources that are required for their successful operation. For example, a Web application server depends on a Secure Socket Layer (SSL) library (e.g., OpenSSL) to encrypt and communicate data with other resources, such as a database server (refer to Figure 2.5). TOSCA 1.0 provides a relationship type called DependsOn, which interprets the dependency and enforces that the participant resources are deployed on one VM. In Ubuntu Juju [198], relationships are described as resource attributes that specify whether a given resource either provides or requires a particular capability from another resource (e.g., a MySQL DB provides a data source, whereas a Web application requires a data source). System administrators are able to create these relationships during deployment.

Figure 2.5: Apache-Tomcat-Server depends on SSL-Library

3. Inheritance Relationship. Denotes that the provider's attribute values are inherited by the consumer. However, the consumer resource is permitted to override the inherited attributes to enable customizations. In other words, inheritance relationships are a convenient way of configuring attributes of a resource entity by reusing attribute values of another resource entity. For example, to describe a new Web application which is to be installed on an Apache Web server and the Ubuntu Operating System, an application developer may simply inherit an existing Web Application resource with a similar configuration – all relevant attributes are inherited (refer to Figure 2.6). Similarly, in Figure 2.3, an Inheritance relationship is set up from AWS-EC2-VM1 to AWS-EC2-VM2. This relationship enforces VM2 to include the same version of the operating system described in VM1.

Figure 2.6: Inheritance Relationships between Docker Images

4. Containment Relationship. Denotes a parent-child relationship in which orchestration actions on a parent automatically trigger actions on all children. In practice, containment relationships are used to conveniently orchestrate a set of related resource entities together. For example, AWS OpsWorks [173] provides a resource entity type called Stack. It represents a Web application and may contain a set of child entities that are required to build a Web application, such as an Apache Tomcat server and a MySQL database (refer to Figure 2.7). When the Stack entity is deleted, all children are consequently deleted automatically.

5. Hosting Relationship. Enforces deployment of the consumer within the provider resource. This is useful when multiple component resources need to be deployed

Figure 2.7: Containment Relationships within an OpsWorks Stack

within a single component resource. For example, a log-file processor and an application server may need to be deployed within a single VM, as the log-file processor needs local file system access to read application server logs (see Figures 2.8 and 2.3). Ubuntu Juju [198] enables users to specify the infrastructure resource provider (e.g., AWS, HP-Cloud, Windows Azure) that will be used to deploy platform resources; in this case, the hosting information is specified via resource attributes. Similarly, TOSCA 1.0 [153] supports a hosting relationship called HostedOn, where its deployment engine interprets the relationships and resolves which resources are to be hosted on which resource.

Figure 2.8: Web application server and Log processor, hosted in one VM
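Several of the relationship semantics above are directly executable. For instance, the cascading delete implied by a containment relationship (the Stack example in Figure 2.7) can be sketched as follows; the class and entity names are illustrative, not the OpsWorks data model.

```python
# Sketch of containment semantics: orchestration actions on a parent cascade
# to all children. Names mirror the OpsWorks Stack example; logic is illustrative.

deleted = []

class Entity:
    def __init__(self, name):
        self.name = name
        self.children = []

    def contains(self, child):
        self.children.append(child)
        return self

    def delete(self):
        # Delete children first, then the parent itself.
        for child in self.children:
            child.delete()
        deleted.append(self.name)

stack = (Entity("Stack-1")
         .contains(Entity("Apache-Tomcat-Server"))
         .contains(Entity("MySQL-Database")))
stack.delete()
print(deleted)  # ['Apache-Tomcat-Server', 'MySQL-Database', 'Stack-1']
```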

Constraints

In some circumstances, it may be necessary to restrict the types of resource entities or relationships, due to technical and/or non-technical reasons. For example, from a technical standpoint, AWS does not allow creating EC2 VMs with arbitrary amounts of CPU, memory and storage. Instead, AWS provides a set of VM types (e.g., micro, medium, large), which are optimized for different use cases (e.g., low-traffic Web applications, large databases) [6]. Likewise, users may specify an Operating System (OS), but only from a predefined list of supported OSs. From a non-technical standpoint, having a constrained set of VM types allows providers to maintain simpler billing policies. Similarly, constraints may restrict resource relationships. For example, an AWS EC2 VM must be configured with a 64-bit CPU in order to install a 64-bit OS on that VM.

In effect, constraints are implemented by restricting the possible values of attributes, and this helps increase the robustness of cloud resource orchestration processes. This is straightforward with respect to resource entities; for resource relationships we have identified two sub-categories, namely: cardinality and participants.

1. Cardinality. Defines the potential number of participants in a resource relationship, and can vary from one-to-one to many-to-many. For example, consider a Ubuntu OS installed within an AWS EC2 VM. There is a one-to-one relationship between the two resources, given that the Ubuntu OS cannot exist within more than one VM, nor can multiple OSs be installed within a VM simultaneously. In contrast, a one-to-many relationship may exist between a cluster of HTTP Web servers and their load balancer. Sometimes the cardinality may be arbitrary; for example, Ubuntu Juju allows users to specify the maximum and minimum numbers of consumers (e.g., Web applications) that may create relationships with a provider (e.g., a Web application engine), via the resource attributes of the provider [109].

2. Participants. Denote which resource entities are permitted in a particular relationship. For example, a Hosting relationship may exist between a VM and a database server, but not between a database server and a Web application. As a concrete example, resource entities (i.e., Charms) in Ubuntu Juju can include properties, such as provides and requires, which define their roles [109]. Accordingly, when users create such relationships, Juju ensures that users do not create relationships either between two providers or between two consumers.

We also identify two further orthogonal sub-categories of the Participants relationship, namely: (i) Inter-Vendor; and (ii) Vendor-Specific relationships.

(a) Inter-Vendor relationships. In some cases, relationships are permitted between two participating resources from different orchestration vendors. Such techniques offer higher flexibility to describe relationships amongst different orchestration techniques and various resource types. For example, [90], a Web application development platform, allows users to describe and deploy applications on a Google Compute VM or as a Docker container. DevOps are therefore allowed to associate infrastructure and platform resources across these different vendors.

(b) Vendor-Specific relationships. Some orchestration techniques do not permit relationships between resources provided by other vendors. Relationships are thus restricted to the resources defined by the specific orchestration vendor. Some even restrict the permitted relationships between different resource types (i.e., infrastructure, platform and software). For example, DotCloud [70] only allows describing and deploying a composition of platform resources (e.g., databases, application engines) on top of a specific infrastructure resource (e.g., AWS EC2); DevOps are therefore not allowed to configure or reconfigure the infrastructure itself. On the other hand, CA-AppLogic permits users to associate any platform resources with desired infrastructure resources, in order to specify which platform resources are to be deployed on which infrastructure resources.
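Both constraint sub-categories amount to simple checks performed before a relationship is created: a participant check (the entities' roles must form a valid provider/consumer pair) and a cardinality check (the provider accepts a bounded number of consumers). A sketch follows; the role vocabulary loosely echoes Juju's provides/requires attributes, while the entity names and the limit of one consumer are made up.

```python
# Sketch of validating relationship constraints before creation.
# Roles echo Juju-style provides/requires; names and limits are illustrative.

roles = {
    "web-app-engine": "provides",   # offers a capability (e.g., a runtime)
    "web-app-a": "requires",
    "web-app-b": "requires",
}
max_consumers_per_provider = 1      # a cardinality constraint
links = []                          # accepted (provider, consumer) pairs

def create_relationship(provider, consumer):
    # Participant constraint: exactly one provider and one consumer.
    if roles[provider] != "provides" or roles[consumer] != "requires":
        raise ValueError("participants must be a provider/consumer pair")
    # Cardinality constraint: the provider accepts a bounded number of consumers.
    existing = sum(1 for p, _ in links if p == provider)
    if existing >= max_consumers_per_provider:
        raise ValueError("cardinality limit reached for %s" % provider)
    links.append((provider, consumer))

create_relationship("web-app-engine", "web-app-a")      # accepted
try:
    create_relationship("web-app-engine", "web-app-b")  # rejected: limit is 1
except ValueError as e:
    print(e)
try:
    create_relationship("web-app-a", "web-app-b")       # rejected: two consumers
except ValueError as e:
    print(e)
```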

2.4.3 Resource Access Methods

Cloud orchestration providers expose their functionality by providing software interfaces for users. Over the years, software interfaces in general have evolved, offering various designs to cater for the different capabilities of diverse users. Similarly, in the context of cloud orchestration, building on existing efforts [167], we have identified four types of interfaces:

Command Line Interfaces (CLIs)

CLIs offer a fixed set of commands, each of which includes a specified set of input, output and error parameters. For example, the AWS CLI [20] suite allows users to configure, deploy and control cloud resources such as VMs, data storage and load balancers. As shown in Code 2.1, the run-instances command allows DevOps to deploy a specified number of VMs in the AWS public cloud infrastructure. The input parameters describe the VM configuration and the number of VMs to be launched in terms of key-value pairs. The output of the command execution is a JSON-based description of the resultant deployment.

Code 2.1: AWS CLI command to deploy VMs

aws ec2 run-instances --image-id 1a2b3c4d \
    --count 1 \
    --instance-type t1.micro \
    --key-name MyKeyPair

Software Development Kits (SDKs)

AWS provides SDKs for a wide range of languages (e.g., Java, PHP, .NET and Ruby). For example, DevOps may download the Java-based SDK and thereby write Java applications to configure and deploy cloud resources in the AWS cloud infrastructure (refer to Code 2.2). While CLIs are intended for system administrators with fewer application development skills, SDKs are intended for those with expertise in particular programming languages.

Code 2.2: Java Syntax in AWS SDK to deploy a VM

RunInstancesRequest runInstancesRequest = new RunInstancesRequest();

runInstancesRequest.withImageId("1a2b3c4d")
    .withInstanceType("t1.micro")
    .withMinCount(1)
    .withMaxCount(1)
    .withKeyName("MyKeyPair"); /* specifying attributes of the VM */

Application Programming Interfaces (APIs)

Compared to SDKs, APIs provide language-independent interfaces for orchestration capabilities that can be accessed by software applications, typically over HTTP. There are two main API implementation strategies: (i) Web Service Definition Language (WSDL); and (ii) REpresentational State Transfer (REST). For example, Rackspace provides a RESTful API [165] to configure, deploy and control cloud resources such as VMs, load balancers and databases.

Graphical User Interfaces (GUIs)

GUIs comprise visual constructs to interact with orchestration techniques. For example, StackEngine, Panamax and Shipyard provide Web-based GUIs to configure, deploy, monitor and replicate Docker containers [189, 41, 183]. CA-AppLogic provides a desktop-based GUI to manage software appliances in a private cloud infrastructure (see Figure 2.9) [12]. Some other advanced GUIs, such as the Puppet Enterprise Console and VisualOps, provide dashboards which generate graphical reports such as bar charts and geographic maps (e.g., visualizing the number of failed and running VMs during the past 30 days) [120, 202].

Figure 2.9: CA-Applogic GUI

2.4.4 Resource Representation Notation

Notations for representing resources and their relationships may consist of textual and/or visual constructs, such as characters and icons. We have identified three classes, namely: textual, visual, and hybrid (a mix of textual and visual) notations.

Textual notations

We distinguish between three variations of textual notations:

1. Native resource representations.

Various cloud resource orchestration techniques represent cloud resources and their relationships in a proprietary language. These notations are called Native resource representations; they are usually scripting languages (e.g., Ruby, Python) and are divided into two categories: general-purpose and domain-specific formats. For example, Chef Recipes [44] follow a proprietary scripting language that extends Ruby [53] to represent, for instance, a file and its attributes, such as access permissions and the owner of the file (refer to Code 2.3).

Code 2.3: A Chef Recipe representing the Configuration and Deployment of a File

file "/etc/config.txt" do  # location of the file
  owner 'root'
  group 'root'
  mode '0755'              # access permissions
  action :create
end

2. Key-value. This consists of a set of unique keys (or attributes) that characterize cloud resources. A schema is also provided that defines the range of possible values for particular keys. This type of notation is commonly used amongst providers that offer CLIs. For example, the command for creating VMs in AWS CLI [20] expects DevOps to provide values for keys such as “image-id” and “instance-type” in order to describe the VMs to be created (refer to Code 2.1).

3. Semi-structured. Semi-structured data formats, such as YAML (YAML Ain’t Markup Language), XML (Extensible Markup Language) and JSON (JavaScript Object Notation), offer structuring mechanisms for better organization and clarity of key-value pairs. They define markers to separate and enforce hierarchies among different key-value pairs. Furthermore, in contrast to other notations, these are better suited for representing complex cloud resource configurations. For example, DotCloud follows YAML-based resource descriptions, which include an umbrella structure of basic and composite configuration attributes [70]. Each branch at the root level represents a basic cloud resource configuration (e.g., Java VM, Node.js engine, PHP engine).

Visual notations

Visual programming languages abstract the technical details, and thereby offer visual symbols and graphical notations based on the Entity-Relationship model [48]. For example, the CA AppLogic Cloud Platform [12] provides a notation which includes a catalog of constructs that represent elementary platform resources (e.g., databases, routers), and other visual constructs to describe composite platform resources (e.g., Web applications). Figure 2.10 depicts a Web application composed of an HTTP Gateway (i.e., IN), a Web application Server (i.e., WEB5) and a network-attached storage (i.e., NAS) [193].

Figure 2.10: Visual notation in CA-Applogic for a Web application

Hybrid notation

Finally, hybrid notations are a blend of the aforementioned types of notations. Most providers adopt this approach to leverage the advantages of each notation. For example, the Ubuntu Juju Charms language [198] follows a YAML-based notation to describe configuration attributes of cloud resources. However, Juju also supports scripts that represent orchestration logic, such that users implement deployment, starting-up and shutting-down behaviors as Shell scripts.

2.5 Resource Orchestration Capabilities

Implementing orchestration processes can vary from a simple sequence of primitive actions to complex and proactive processes. In this section, we discuss these concepts, and later introduce the different language paradigms, theoretical foundations and cross-cutting concerns of orchestration languages.

2.5.1 Primitive Actions

We begin by explaining primitive actions using a concrete example. Following the Resource Entity Model presented in Section 2.4.2, Figure 2.12 depicts a composite cloud resource infrastructure for a Web application runtime. An Apache Tomcat application engine cluster is employed, whereby the Web application is deployed at each node. Nginx is a reverse proxy, which distributes the incoming traffic to the Web application deployed at each cluster node. Nagios is a monitoring service, configured to observe the throughput of the cluster nodes. A MySQL database server is deployed to persist data and state information of the Web application. Memcache is configured as a caching service, which improves the performance of database calls. To organize primitive actions into different categories, we refer to the lifecycle model of typical cloud resources identified earlier (see Section 1.1.2). These primitive action categories are depicted as state transitions of the state chart in Figure 2.11 and explained below.

(Figure: [Select] → Selected → [Configure] → Configured → [Deploy] → Deployed, with [Monitor] and [Control] transitions on the Deployed state, and [Delete] leaving it)

Figure 2.11: State transitions of the cloud resource life cycle

1. Select. DevOps typically need to first select cloud resources that satisfy the application requirements. For instance, if we expect to deploy a database, we

(Figure: entities Nginx-Proxy, Apache-Tomcat-Cluster, WebApp, Memcache, MySQL-DB-Server, Database and Nagios, linked by Hosts, Depends-on and Communicates-to relationships, with attributes such as listener-port, routing-table, deployed-web-apps, db-config, memcache-config, nagios-config, ip:open-port, access-rules and table-schema)

Figure 2.12: A Composite Resource Infrastructure for a Web application

would need to evaluate a set of potential database resource providers and select a particular provider based on both functional (e.g., storage capacity and type of the database) and non-functional requirements (e.g., availability and cost per unit). For example, Bitnami [30] provides a cloud resource selection service where consumers search and select cloud resources based on the intended task category (e.g., project management, Web application) and the target deployment environment (e.g., personal desktops, VMware vSphere private cloud, AWS public cloud).

2. Configure. Next, resources are configured by defining their expected properties. DevOps choose a preferred provider and instantiate instances of the Resource Entity Model (refer to Section 2.4.2). This effectively specifies the configuration description that defines what resources are needed and which relationships are required between them. For example, in AWS OpsWorks [173], consumers choose the required resources (e.g., a MySQL DB server) and initialize configuration attributes (e.g., a DB server with 5GB of capacity running on port 3306) to define the expected runtime behavior. These descriptions are then submitted to the Resource Provisioning Layer (refer to Section 2.2) to create the desired cloud resources.

3. Deploy. Deployment involves interpreting the description specified in the Resource Entity Models and bringing the resources into an operational and consumption-ready state. For example, system administrators may use the AWS-RDS API [7] to

provision and start a MySQL DB server where the database in Figure 2.12 is created. System administrators then create and configure the database and its tables manually or via an ad-hoc script (refer to Code 2.4).

Code 2.4: Linux shell commands to deploy a database server in AWS-RDS

    # provisioning and starting the database server with configuration attributes
    rds-create-db-instance mysqlDatabase -s 10 -c db.m1.large -e mysql -u admin -p password

    # creating the database and tables via an SQL script
    mysql -h mysqlDatabase.rds.amazonaws.com -P 3306 -u admin -p < 'dbAndTableCreationScript.sql'

Once the component resources are constructed, relationships between resources must be created. For example, the necessary ports and access rules should be set up between the Apache Tomcat application engine cluster and the MySQL DB server, such that every node in the application engine cluster can send requests to and receive responses from the MySQL DB server. There may also be some additional operations, which make cloud resources operational. For example, starting the Apache Tomcat application engine cluster, the Memcached server and Nginx is essential to allow consumers to access the Web application (see Figure 2.12).

4. Monitor. Once the cloud resources are operational, DevOps must monitor them to check whether those resources continuously operate according to the configured resource attributes. For example, a Tomcat application engine, which is configured to be operational 24x7, should be monitored to check whether it responds to incoming requests continuously. If it is found to be not responding, the service level agreement between the cloud resource provider and consumer is violated. Nagios is a monitoring engine [22] which enables DevOps to specify events to be monitored on specific cloud resources and to get notifications (e.g., emails) when they are triggered. For example, to monitor a Tomcat application engine, DevOps need to specify the location details of the engine via a Host definition and a Service definition (refer to Code 2.5). To specify the exact

events to be monitored, a Command definition must be specified, as shown in Code 2.5. Here we check whether the Tomcat application engine is running or not. DevOps specify the notifications to be sent as a Contact definition in Code 2.5, which sends an email to admin@abc.com when the engine is not running.

Code 2.5: Nagios syntax to monitor a Tomcat application engine

    # Host definition (where the Tomcat application engine is hosted)
    define host{
        use             linux-server
        host_name       AWS-EC2-Host-1
        address         201.168.1.3
        contact_groups  admins
    }
    # Service definition of Tomcat application engine
    define service{
        use                  generic-service
        service_description  Tomcat-Engine-1
        hostgroup_name       AWS-EC2-Host-1
        contact_groups       admins
        check_command        check_Tomcat
    }
    # Command definition for the health check of Tomcat application engine
    define command{
        command_name  check_Tomcat
        command_line  ps -ef | grep tomcat; if [ $? -gt 0 ];
                      then echo "Tomcat Fail"; else echo "Tomcat Pass"; fi
    }
    # Contact definition
    define contact{
        contact_name  admins
        email         admin@abc.com
    }

5. Control. When cloud resources are monitored and found not to be operating according to the configured attributes, DevOps may take the necessary control actions to recover from any issues. For example, Figure 2.13 depicts the orchestration logic that scales an Apache Tomcat application engine cluster when the network throughput becomes less than 95%. In this particular workflow, the Nagios monitoring engine continuously monitors the network throughput (i.e., the percentage of successful message delivery over a network). When the

Figure 2.13: Orchestration Workflow (in black and bold) for Scaling-up an Apache Tomcat Application Engine Cluster

throughput reduces below the threshold, Nagios triggers the rules that deploy a new node in the cluster. Once a new node is included in the cluster, the Web application should be deployed within the node. The Nginx reverse proxy should then be notified about the new node, with its IP address and port, so that Nginx mediates incoming traffic to the new node. Finally, cloud resources are deleted when no longer needed.
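The control loop described above can be sketched as follows. This is a minimal, illustrative sketch only: the Cluster, Node and ReverseProxy classes and their methods are hypothetical stand-ins for provider-specific monitoring and orchestration calls, not part of the Nagios or Nginx APIs.

```python
class Node:
    def __init__(self, ip, port):
        self.ip, self.port = ip, port
        self.app_deployed = False

    def deploy_webapp(self):
        self.app_deployed = True  # stands in for copying and starting the app


class Cluster:
    def __init__(self):
        self.nodes = []

    def deploy_node(self):
        node = Node(f"10.0.0.{len(self.nodes) + 2}", 8080)
        self.nodes.append(node)
        return node


class ReverseProxy:
    def __init__(self):
        self.routing_table = []

    def register(self, ip, port):
        self.routing_table.append((ip, port))


THROUGHPUT_THRESHOLD = 0.95  # scale up below 95% successful delivery

def on_throughput_sample(throughput, cluster, proxy):
    """Triggered by the monitoring engine for each throughput sample."""
    if throughput >= THROUGHPUT_THRESHOLD:
        return None                         # cluster healthy: no control action
    node = cluster.deploy_node()            # 1. add a new node to the cluster
    node.deploy_webapp()                    # 2. deploy the Web application on it
    proxy.register(node.ip, node.port)      # 3. notify Nginx of the new node
    return node
```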

2.5.2 Orchestration Strategies

We classify cloud resource orchestration techniques in accordance with their level of sophistication. Less sophisticated techniques require more human intervention (and vice versa), particularly to orchestrate resources in response to dynamic changes.

Script-based Orchestration Strategies

Script-based orchestration strategies are the most basic and widely used form of implementing cloud orchestration processes. DevOps implement orchestration processes as ad-hoc scripts, which exploit only a set of primitive actions supported by a particular orchestration language. Existing cloud resource orchestration techniques typically rely on script-based orchestration strategies, written in general-purpose or scripting languages [168, 131, 135, 223, 163].

Code 2.6: Dockerfile of a Python Web application

    FROM python:2.7                       # this application is based on Python 2.7
    ADD . /code                           # commands to install the Python application
    WORKDIR /code
    RUN pip install -r requirements.txt
    CMD python app.py                     # command to start the Python application

Docker [197] allows scripts known as Dockerfiles to be written, each of which specifies the configuration parameters of a particular cloud resource (refer to Code 2.6). DevOps may also describe a composition of a set of cloud resources using a file named docker-compose.yml (refer to Code 2.7). Along with those scripts, DevOps specify primitive actions to deploy, start and stop composite and component cloud resources.

Code 2.7: docker-compose.yml of a Web application and database

    web:                  # configuration parameters for the Web application
      build: .
      ports:
        - "5000:5000"
      volumes:
        - .:/code
      links:
        - redis           # setting up the communication link with the database

    redis:                # configuration parameters for the database
      image: redis

However, scaling cloud resources up or down in dynamic environments via script-based orchestration processes leads to an inflexible and costly solution. It adds considerable complexity, demands extensive programming effort, requires multiple and continuous patches, and perpetuates closed cloud solutions.

Reactive Orchestration Strategies

Some providers define a rule-based orchestration language (e.g., Event-Action rules) in addition to primitive actions. This brings reactive capabilities to orchestration processes. DevOps specify Event-Condition-Action (ECA) rules based on pre-defined events and/or event patterns [144]. Flow-based languages such as the Business Process Model and Notation (BPMN) are used to implement deployment tasks and reconfiguration policies [208]. When the event patterns are matched, the specified actions are automatically fired by the orchestration tool.

For example, AWS OpsWorks [173] supports five events (i.e., setup, install, deploy, undeploy and shutdown). Actions are represented via Chef recipes [44]. When events are triggered, the associated recipes are automatically executed. There are several research initiatives which propose reactive orchestration languages [43, 227, 221]. Code 2.8 exemplifies an elasticity rule that dynamically deploys new VMs as the number of jobs awaiting execution increases [43].

Code 2.8: Elasticity Rule to Scale VMs

    // Event
    5000
    // Condition
    (@uk.ucl.condor.schedd.queuesize /
      (@uk.ucl.condor.exec.instances.size + 1) > 4) &&
    (@uk.ucl.condor.exec.instances.size < 16)
    // Action
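Stripped of its rule-language syntax, the decision encoded by this elasticity rule amounts to a simple predicate over the monitored metrics. The following Python restatement is a sketch for clarity only; the function name is ours, not from [43]:

```python
def should_scale_up(queue_size, instance_count):
    """Deploy a new VM when the job queue per instance grows too long,
    up to a hard cap of 16 instances (mirroring the rule in Code 2.8)."""
    return (queue_size / (instance_count + 1) > 4) and (instance_count < 16)
```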

State-based Orchestration Strategies

State-based strategies provide declarative and specification-oriented languages to describe the orchestration of cloud resources. Instead of directly manipulating low-level interfaces and scripting rules over cloud resource configurations, these strategies reason about resource requirement states, a recurring and intuitive abstraction in modern IT resource management processes. States characterize resource requirements (e.g., CPU and storage usage), constraints (e.g., in terms of costs), and other SLAs. Transitions between states are triggered when certain conditions are satisfied (e.g., a temporal event, or the application workload increasing beyond a certain threshold). Transitions also automatically trigger actions to perform the desired resource (re-)configurations to satisfy the requirements and constraints of the target states. AWS OpsWorks supports a state-based language to specify whether to scale applications up or down based on temporal events (e.g., on every Sunday) and load-based events (e.g., average CPU load over 90%) [173].
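As an illustration, a state-based description can be reduced to named states plus guarded transitions that fire (re-)configuration actions. The following toy sketch uses invented state names, metrics and actions rather than any provider's actual language:

```python
class StateMachine:
    """Minimal state-based orchestration sketch: transitions fire actions
    when their condition over the observed metrics holds."""

    def __init__(self, initial):
        self.state = initial
        self.transitions = []  # (from_state, condition, action, to_state)

    def add_transition(self, src, condition, action, dst):
        self.transitions.append((src, condition, action, dst))

    def observe(self, metrics):
        for src, condition, action, dst in self.transitions:
            if src == self.state and condition(metrics):
                action(metrics)        # e.g. (re-)configure resources
                self.state = dst
                return dst
        return self.state              # no transition fired


actions = []                           # records the fired control actions
sm = StateMachine("normal")
sm.add_transition("normal", lambda m: m["cpu"] > 0.9,
                  lambda m: actions.append("scale-up"), "scaled")
sm.add_transition("scaled", lambda m: m["cpu"] < 0.3,
                  lambda m: actions.append("scale-down"), "normal")
```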

Proactive Orchestration Strategies

With the expanding complexity of cloud-based systems, orchestration tasks become too cumbersome to be carried out with human-assisted techniques alone, such as the script-based, reactive and state-based orchestration strategies. Proactive orchestration strategies, the highest level of sophistication, refer to the self-managing features of cloud resources as per their environment's needs, while hiding the intrinsic orchestration complexity of the resources [196, 185]. For instance, requirements for high availability demand that cloud resources are self-adaptive, i.e., dynamically and automatically (re-)configuring, in order to maintain the expected quality of service in the presence of faults, variable environmental conditions, and changes in user requirements. Using the holistic techniques provided by proactive orchestration, we can handle to a large extent different user requirements such as performance, fault tolerance, reliability, security and QoS without manual intervention [46, 219]. For example, a proactive orchestration process automatically scales running applications up or down by analyzing recent resource consumption statistics. This implies that orchestration techniques intelligently make certain decisions when managing cloud resources without taking any instructions from users. Compared to reactive orchestration processes, the advantages of proactive orchestration processes include: (a) reduced deployment and management cost; and (b) increased stability of cloud resources, as less human effort is required for programming, validating and maintaining such orchestrations [160]. We identify three categories of proactive orchestration strategies as follows.

• Autonomic orchestration. CometCloud is an autonomic orchestration engine designed for cloud environments [114]. It supports resizable computing capability, with integrated private and public cloud resources provisioned on demand. Abstractions and mechanisms are provided to support autonomic orchestration, including features such as budget-, deadline-, and workload-based deployment of applications on the cloud.

Self-optimization is one research approach to realizing autonomic orchestration strategies. It refers to the automatic re-configuration of resources to meet dynamic Quality of Service (QoS) requirements [224, 185, 196]. For example, the Fuzzy BPM-aware Auto-Scaler scales VMs up or down based on Key Performance Indicators (KPIs) of the VMs and of the business processes deployed within them [179, 139]. Other research initiatives target analyzing user requirements (e.g., level of SLA); end-user context (e.g., geolocations and device configurations of end-users); and environmental properties (e.g., unit cost per resource, processing speed of VMs) [212, 93, 143, 225, 220, 74].

• Predictive orchestration. Predictive methods, which are based on historical or simulated data sets, have been applied in proactive orchestration. Xu et al., Sadeka et al., ASAP and ORMDSS propose neural-network-based approaches [217, 104, 106, 166]. These are trained against different workload scenarios, and predict an optimal or near-optimal configuration of VMs and software appliances. Antonescu et al. propose a predictive technique to migrate and provision cloud-based mobile services based on the mobility of users [11].

• Heuristic-based orchestration. Various heuristic-based resource allocation and migration algorithms have been proposed in research for proactive orchestration [147, 158, 26, 103]. Some of these algorithms are based on a set of pre-defined policies that determine which type of VMs should be provisioned to which data centers while optimizing the energy consumption [26].
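To make the contrast with purely reactive rules concrete, the following toy sketch extrapolates a recent load trend and acts before the threshold is actually crossed. The window, threshold and linear extrapolation are illustrative assumptions of ours, not a technique from the cited systems:

```python
def proactive_decision(recent_loads, threshold=0.9):
    """Return 'scale-up', 'scale-down' or 'hold' from a CPU-load history,
    by forecasting the next sample with a simple linear extrapolation."""
    if len(recent_loads) < 2:
        return "hold"                                # not enough history
    trend = recent_loads[-1] - recent_loads[0]       # total drift over window
    step = trend / (len(recent_loads) - 1)           # drift per sample
    predicted = recent_loads[-1] + step              # next-sample forecast
    if predicted > threshold:
        return "scale-up"                            # act before overload
    if predicted < threshold / 3:
        return "scale-down"                          # release idle resources
    return "hold"
```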

2.5.3 Language Paradigm

A language paradigm is an approach to programming based on a coherent set of principles and practices, which determines its suitability for solving certain types of problems [199]. We have identified the following language paradigms amongst existing cloud providers (e.g., Puppet, Chef, Juju, Docker, SmartFrog, AWS OpsWorks) and research initiatives [59, 47, 87, 64, 116, 215]:

Imperative Programming

We have identified three sub-categories:

• Script-based. DevOps widely adopt scripting languages (e.g., JavaScript, Python) to implement cloud resource orchestration processes. Providers that use this method include Docker and Vagrant2 [197, 97].

• Flow-based. The primitive constructs of flow-based orchestration languages are data-flow and control-flow connectors. This approach is commonly used in the service composition domain (e.g., BPEL, BPMN), where the primary components are Web services [154, 110]. BPMN4TOSCA [117] (which includes four BPMN extensions) and CloudBase [208] extend BPMN to implement orchestration processes of cloud applications.

• ECA/Rule-based. ECA rules are specified by associating a sequence of configuration, deployment or re-configuration actions with each possible event. For example, Juju [198] supports defining lifecycle events (e.g., installing, starting, upgrading, stopping) to configure, deploy and re-configure cloud resources. Once the rules are deployed, the Juju runtime detects such events, automatically triggers the associated actions, and notifies dependent resources by triggering new lifecycle events. Code 2.8 exemplifies an ECA rule that dynamically deploys new VMs as the number of jobs awaiting execution increases [43].

Declarative Programming

We have identified three sub-categories:

• Markup Languages. Markup languages are intended for annotating documents in a format that is both machine- and human-readable; for example, XML [33] is widely used. Plush is a tool to deploy, monitor and control distributed software applications. It advocates an XML-based language to model and deploy software components [3]. The Plush runtime interprets XML-based component descriptions,

2 https://www.vagrantup.com/

downloads the required artifacts, prepares them for execution, and starts the necessary processes within VMs or physical machines (see Code 2.9).

Code 2.9: Component Description in Plush

    <?xml version="1.0" encoding="utf-8"?>
    <plush>
      <project name="webapp">
        <!-- The resource description -->
        <software name="webapp_software" type="tar">
          <package name="software.tar" type="web">
            <path>http://10.15.10.25/software.tar</path>
            <dest>software.tar</dest>
          </package>
        </software>
        <!-- Deployment specification -->
        <component name="webapp_nodes">
          <rspec>
            <num_hosts>25</num_hosts>
          </rspec>
          <software name="webapp_software"/>
        </component>
      </project>
    </plush>

• Query-based. Query-based orchestration languages model cloud resources as structured data (e.g., tables, graphs, trees) and provide actions (e.g., create, read, update and delete) for processing that structured data. The approach in [132, 131] represents cloud resources as a tree-like data structure and provides declarative actions to create, delete and update cloud resources with well-defined transactional semantics.

• Constraint Programming. Constraint programming enables the automatic generation of cloud resource configurations from declarative constraint specifications [61, 178]. For example, CFEngine provides a constraint-based specification to configure resources (e.g., files) within a physical or virtual machine (refer to Code 2.10). It automatically determines the steps required to create and update resource configurations by analyzing constraint specifications and recent changes within the operating environment [37].

Code 2.10: A Constraint Specification in CFEngine to Represent a File

    files:
      "/home/mark/tmp/test_plain" -> "system-blue-team",
        create      => "true",                    # Constraint-1
        perms       => owner("@(usernames)"),     # Constraint-2
        comment     => "Hello World";             # Constraint-3
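Returning to the query-based paradigm above, the idea can be illustrated with a toy resource tree that offers path-addressed create/read/update/delete actions. This is a sketch of the concept only; it does not reproduce the language of [132, 131] and it omits transactional semantics:

```python
class ResourceTree:
    """Cloud resources modeled as a tree, addressed by '/'-separated paths,
    with the four CRUD actions named in the text."""

    def __init__(self):
        self.root = {}

    def _walk(self, path):
        parts = path.strip("/").split("/")
        node = self.root
        for part in parts[:-1]:           # descend, creating branches as needed
            node = node.setdefault(part, {})
        return node, parts[-1]

    def create(self, path, config):
        parent, name = self._walk(path)
        parent[name] = config

    def read(self, path):
        parent, name = self._walk(path)
        return parent.get(name)

    def update(self, path, **attrs):
        parent, name = self._walk(path)
        parent[name].update(attrs)

    def delete(self, path):
        parent, name = self._walk(path)
        del parent[name]
```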

2.5.4 Theoretical Foundation

Some cloud resource orchestration techniques are devised based on theory: they are either derived in a top-down manner or inferred from experimental data. Others are just rules of thumb, learned empirically without any theoretical foundations. We identify several formal methods, which include mathematically defined abstractions for describing cloud resources and their orchestration behaviors. These models embody formally proven implementations.

Closures encapsulate a structure of orchestration commands in the form of black boxes; this helps reduce management complexity and costs [56, 38]. A closure's behavior can be thought of as the sum of its transactions with the outside world, such that each output from a closure is a function of all inputs received so far. Inputs can take the form of events and streams. Closures are adopted in CFEngine to configure cloud resources [37].
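The "each output is a function of all inputs received so far" property can be illustrated directly with a programming-language closure. A minimal sketch (the usage-monitor framing is our own illustration, not CFEngine's implementation):

```python
def make_usage_monitor():
    """Return a closure whose state (the sample history) is hidden inside;
    each output depends on every input received so far."""
    samples = []                              # state enclosed in the closure

    def observe(cpu_load):
        samples.append(cpu_load)
        return sum(samples) / len(samples)    # output = f(all inputs so far)

    return observe
```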

Promises model the way cloud resources commit to certain behaviors [38, 28]. They allow cloud resources to become more autonomous and self-sufficient in dynamic environments. In CFEngine, Promises are implemented as policies that modify cloud resources, such that resources in non-conforming states are transformed into conforming states [37]. Effectively, this approach aims to immunize cloud resources against potential deterioration by continuously repairing any non-conforming states. Promises are also idempotent, meaning that they will do nothing unless non-conformity is discovered. This technique has further been applied in the domains of verification and knowledge management of cloud resource orchestration [36, 35].
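The idempotent, convergent behavior of a Promise can be sketched as a repair function that acts only when a non-conforming state is found, so repeated runs are harmless. The port-opening promise below is an invented example, not CFEngine syntax:

```python
def promise_port_open(resource, port, log):
    """Converge `resource` toward the promised state: `port` is open.
    Idempotent: a conforming resource is left untouched."""
    if port in resource["open_ports"]:
        return False                      # already conforming: do nothing
    resource["open_ports"].add(port)      # repair the non-conforming state
    log.append(f"opened port {port}")
    return True
```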

Aspects are an abstraction for organizing Promises into distributed bundles and constellations [34]. Aspects are introduced over Promises to describe complex orchestrations which need to be dealt with by multiple Promises simultaneously.

2.5.5 Cross-cutting Concerns

Implementing cloud orchestration processes is met with a range of cross-cutting concerns, such as security; service level agreements and negotiations; portability; interoperability; standardization; resource demand profiling; load balancing policies; resource pricing; profit maximization; and other runtime issues. While we understand the importance of all these concerns, we specifically focus on the orchestration aspects of cloud resources, in the sense of identifying abstractions to manage cloud resources. There are, however, other surveys with an in-depth focus on these related aspects [127, 100, 174, 13, 224, 85, 196, 105, 140].

2.6 User Types

We identify three types of users, typically involved in orchestrating cloud resources:

2.6.1 DevOps

DevOps is an emerging role in software organizations that consolidates application developers and system administrators. With the adoption of agile software development, application developers can implement software updates much faster compared to traditional development methods. To effectively push these updates to the production environment, complex orchestration processes (e.g., setting up an application testing environment, testing application updates, migrating the tested environment to production, scaling the production environment based on usage patterns) need to be carried out. DevOps are responsible for optimizing and automating (where possible) those orchestration processes, which thereby improves the quality of software development and continuous delivery processes.

2.6.2 Application Developers

Application developers implement software and often deploy their software artifacts on platform cloud resources. For example, developers may write Java applications and deploy them on Heroku. In general, they seek simple resource access methods (e.g., GUIs) and services (e.g., nitrous.io3) that deploy language runtimes (PHP, Node.js, Python). Eventually, application developers move on to more complex orchestration techniques (e.g., Heroku) which provide more orchestration capabilities to optimize and scale the deployed software resources.

2.6.3 Domain Experts

A domain expert has expertise in a specific domain (e.g., biologists, teachers). Domain experts usually employ software-based orchestration solutions for day-to-day operations. For example, the lecturers of the Introduction to Computer Science course [138] at Harvard University create and publish a virtual machine named CS50 Appliance 19 [58], which includes all software required by students to develop, test and build source code. The students access the virtual machine to work on their assignments. Compared to other user categories, domain experts have little or no programming expertise in resource orchestration processes. For this reason, it is imperative that domain experts are provided with declarative orchestration languages and simple resource access methods which are specialized for their domain.

2.7 Runtime Environment

The runtime environment for cloud orchestration may depend on three orthogonal concerns: (a) virtualization technique; (b) execution model; and (c) target environment.

3 https://www.nitrous.io/

2.7.1 Virtualization Technique

Virtualization is the key technique that transforms cloud resource descriptions into concrete cloud resources; it does so by provisioning the underlying hardware and software constituents without upfront capital expenditure [43]. Traditionally, the operating system manages the allocation of underlying resources (CPU, memory, storage, network bandwidth) to software applications. However, this conventional approach causes significant interference among running software applications, especially with large numbers of applications, each with different resource requirements, ownerships and Quality of Service (QoS) demands. For example, an erroneous application or a CPU-intensive application could affect the QoS of all other running applications. Virtualization technologies therefore evolved to solve this problem by providing better isolation and scalability abstractions.

Techniques for virtualization generally address three main concerns: (a) performance isolation; (b) data isolation; and (c) execution isolation [94]. Performance interference is the influence of the performance of one resource on another, where both share the same underlying resources. Data interference is the unintended data sharing (e.g., of file systems) across different resources. Execution interference is the effect of the runtime state (e.g., failures) of one resource on another resource.

We discuss two types of virtualization techniques that are commonly adopted by cloud resource orchestration techniques:

• OS-level Hypervisor. This technique runs on top of a host operating system; it creates and manages the execution of one or more VMs, each of which is installed with a guest operating system (refer to Figure 2.14 (left)). To enable this arrangement, it accesses a shared pool of resources (e.g., memory, CPU and system calls) through the host operating system and carefully partitions those resources across the guest operating systems. For example, the AWS EC2 service uses an extended version of Xen as its OS-level hypervisor to provision EC2 virtual machines.

• Environment-level Container Manager. This technique operates on top of the kernel of the host operating system, similar to an OS-level hypervisor. In contrast


Figure 2.14: OS-level hypervisor (left) vs. Container Manager (right)

however, container managers do not virtualize the hardware layer but use features of the operating system kernel to create lightweight virtualized operating system environments, i.e., containers (refer to Figure 2.14 (right)). For example, LXCs (Linux containers) are built by leveraging the cgroups and namespace features of the Linux kernel [130, 172]. Environment-level containers also do not require installing a separate guest operating system in each container; effectively, they share the hardware layer and the host operating system kernel across all containers. This is a resource isolation mechanism with little overhead compared to OS-level hypervisors. Docker is an environment-level container manager that runs on top of the Linux operating system [197].

2.7.2 Execution Model

The execution model refers to how a particular orchestration process distributes and performs tasks. We identify two main types of execution models:

• Centralized Orchestration. In this approach, one manager performs all the tasks of an orchestration process. If tasks are dispersed across a set of machines within a distributed environment, the centralized manager directly issues commands to perform the orchestration. For example, VMware vSphere [133] is a virtual machine management tool that creates and manages VMs on top of a single

host machine. Ansible acts as a central manager, which directly issues orchestration commands via the SSH (Secure Shell) protocol to the remote machines that execute them [148].

• De-centralized Orchestration. In this approach, all participating machines are required to install an agent supplied by the orchestration provider. During execution, tasks are delegated to the agents, which are thereby responsible for performing the actual orchestration tasks. Agents are only aware of their delegated tasks, and not of tasks assigned to other agents. For example, Puppet supports agent-based orchestration in which there is a central server that stores orchestration processes. Agent machines periodically poll the central server for orchestration tasks and perform those tasks. Puppet follows a model based on Promise Theory (refer to Section 2.5.4) to avoid potential inconsistencies in autonomous and de-centralized orchestrations [27]. Kirschnick et al. propose a peer-to-peer architecture, highly scalable and fault-tolerant with no central orchestration server, to automatically deploy software components across a pool of virtual machines [115].
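The pull-based execution model described for Puppet-style agents can be sketched as follows. The in-memory TaskStore stands in for the central server, and all names are illustrative assumptions, not any tool's actual API:

```python
class TaskStore:
    """Stands in for the central server that stores orchestration tasks."""

    def __init__(self):
        self.tasks = {}  # agent name -> list of pending tasks

    def fetch(self, agent_name):
        # Hand over (and clear) only the tasks delegated to this agent.
        return self.tasks.pop(agent_name, [])


class Agent:
    """An agent aware only of its own delegated tasks."""

    def __init__(self, name, store):
        self.name, self.store = name, store
        self.executed = []

    def poll_once(self):
        # One iteration of the periodic polling loop.
        for task in self.store.fetch(self.name):
            self.executed.append(task)   # perform the delegated task locally
```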

2.7.3 Target Environment

Cloud computing is evolving into a diverse range of forms. Public clouds are typically deployed by IT organizations; private clouds are usually deployed behind a company or user-group enterprise firewall. A third option, a hybrid or federated cloud, draws computing resources from one or more public clouds and one or more private clouds, combined at the behest of its users.

Public Cloud

Public cloud providers, such as AWS, provide a range of orchestration techniques (e.g., the AWS Command Line Interface, AWS CloudFormation, AWS OpsWorks), each of which suits a different type of user (e.g., system administrators, DevOps, developers) for configuring, deploying and controlling cloud resources. Alternatively, there are third-party cloud resource orchestration techniques that provide plug-ins to integrate with public cloud providers. For example, VisualOps provides a graphical interface to configure and visualize VMs deployed across different regions (e.g., Europe, Australasia) in the AWS environment [203].

Private Cloud

Private cloud resource providers, such as VMware and OpenStack, offer VMware vSphere and Heat respectively to configure and manage virtual machines within a private network [133, 157]. Additionally, third-party tooling such as Juju, Ansible, Chef and Puppet [198, 148, 175, 118, 111] supports resource configuration, deployment and control in OpenStack-based private cloud deployments.

Federated Cloud

Tools for orchestrating federated cloud resources have been introduced in both research and industry practice [196, 95]. They either: (a) define a unified cloud resource orchestration language to which all participating providers must conform [208, 214]; or (b) provide a pluggable architecture that interprets the different orchestration languages offered by participating cloud resource providers [213, 209].

For example, TOSCA is an open standard for representing and orchestrating cloud resources [153, 29]. It describes a federated cloud resource using a Service Template. This template captures the topology of component resources, and sets a plan for orchestrating those resources.

Techniques for capturing a unified representation, as well as enabling orchestration of cloud resources amongst diverse providers, have been studied in research [149, 188] and implemented as language libraries [79, 195, 77, 80]. On the other hand, Ansible provides a suite of distinct language modules, each of which publishes an orchestration interface for a specific resource type offered by a particular resource provider (e.g., AWS, Rackspace, Azure, VMware) [10]. Users of Ansible are thus able to implement scripts that reuse a set of modules to model and orchestrate federated cloud resources.

2.8 Knowledge Reuse

Knowledge reuse frameworks are based on four main pillars: (a) knowledge representation; (b) knowledge acquisition; (c) knowledge curation; and (d) knowledge discovery (see Figure 2.15). Knowledge representation techniques are presented in Section 2.8.1. We then collectively discuss various methods used for knowledge acquisition, curation and discovery in Section 2.8.2.

[Figure body omitted: Knowledge Reuse decomposes into Knowledge Representation, Knowledge Acquisition, Knowledge Curation and Knowledge Discovery]

Figure 2.15: Sub-dimensions of Knowledge Reuse

2.8.1 Reuse Artifact

We refer to an artifact as a logical entity within cloud resource orchestration. An artifact may be atomic (e.g., a resource description or orchestration rule) or composite, including multiple interrelated elements (e.g., a deployment workflow). Reuse artifacts can further be distinguished as template or concrete. Concrete artifacts are fully-developed solutions for specific problems. Template artifacts are generalized solutions, which need manual adaptations (e.g., initializing configuration parameters) before reuse. Considering the above dimensions, we have identified the following variety of reuse artifacts:
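The template/concrete distinction can be captured in a few lines. The classes and parameter names below are our own illustration, not drawn from any specific tool.

```python
class ConcreteArtifact:
    """A fully-initialized artifact, ready for deployment as-is."""
    def __init__(self, name, parameters):
        self.name = name
        self.parameters = parameters

class TemplateArtifact:
    """A generalized reuse artifact: its parameters must be initialized
    before it becomes a deployable (concrete) artifact."""
    def __init__(self, name, parameters):
        self.name = name
        self.parameters = parameters  # parameter name -> value, or None if unset

    def unset(self):
        return [k for k, v in self.parameters.items() if v is None]

    def instantiate(self, **values):
        params = dict(self.parameters)
        params.update(values)
        missing = [k for k, v in params.items() if v is None]
        if missing:
            raise ValueError(f"uninitialized parameters: {missing}")
        return ConcreteArtifact(self.name, params)

# Illustrative template: the password must be supplied, the port has a default.
db_template = TemplateArtifact("mysql-server", {"root_password": None, "port": 3306})
concrete = db_template.instantiate(root_password="s3cret")  # example value only
```

Instantiation is exactly the "manual adaptation" step described above: once all parameters carry values, the artifact becomes concrete and reusable as-is.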

Concrete and Template Resource Descriptions

Most enterprise-ready cloud orchestration providers support both concrete and template resource description repositories for knowledge reuse. For example, Google Container Engine, Docker and Juju offer the knowledge reuse repositories Google Container Registry, Docker Hub and Juju Charm Store, respectively [89, 68, 40, 181]. For instance, Docker Hub enables sharing and reusing resource descriptions by means of Docker Images, which represent deployment descriptions of cloud resources (e.g., a mongoDB database, an nginx reverse proxy server) together with their required dependencies. Docker Hub may be used to discover, configure and deploy existing Images. Template Images are associated with a set of configuration parameters (e.g., the access credentials of a database server Image), which are initialized by users before deployment; concrete Images, in contrast, have pre-initialized configuration parameters.

Resource Snapshots

A snapshot of a cloud resource includes not just its description but also a specific runtime state (e.g., a deployed and started application server). In contrast to reusing concrete and template resource descriptions, snapshots additionally embed information about the execution of the orchestration process. For example, Snaps in terminal.com and VMware Snapshots provide resource snapshots [101, 204]. Users of terminal.com (e.g., application developers) may specify, deploy and share Snaps with other users (e.g., QA engineers, system administrators), who may test, monitor and control those Snaps.

Miscellaneous

Developing orchestration processes using CLIs and GUIs involves manual tasks, such as entering a command or clicking the "deploy" icon. Inevitably, in these environments knowledge reuse is limited to miscellaneous methods such as user guides. For example, DevOps create and publish Dockerfiles, which are textual resource descriptions of Docker Images; these may then be shared on code repositories such as GitHub. Instructions for how to configure and deploy a specific Dockerfile into a Docker Container may also be shared. However, such instructions can only be interpreted by humans, not machines.

2.8.2 Reuse Techniques

Given an artifact for reuse, it is imperative to identify different techniques that can be applied in practice to enable its reuse. We identify the following three categories:

Search Indexes

Ansible, Puppet and Chef provide search indexes based on resource description attributes (e.g., artifact name, owner, version and creation date) [148, 118, 175]. However, this assumes users know the exact (or nearly exact) attribute values in order to query for potential reuse artifacts. There are more advanced search indexes (e.g., Bitnami) which accept query inputs such as the intended task category (e.g., project management) and target deployment environment (e.g., the AWS EC2 public cloud, a VMWare vSphere private cloud). This helps target more relevant artifacts, although it still remains ineffective in practice when users do not know near-precise attribute values.
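A minimal sketch of such an attribute-based index illustrates why near-exact values are needed. The artifact records are invented examples.

```python
def build_index(artifacts, fields):
    """Exact-match attribute index: each queryable field maps attribute
    values to the artifacts carrying them."""
    index = {f: {} for f in fields}
    for art in artifacts:
        for f in fields:
            index[f].setdefault(art[f], []).append(art["name"])
    return index

def search(index, **criteria):
    """Intersect the exact-match hits for every queried attribute."""
    hits = None
    for field, value in criteria.items():
        found = set(index[field].get(value, []))
        hits = found if hits is None else hits & found
    return sorted(hits or [])

# Fabricated artifact metadata for illustration.
artifacts = [
    {"name": "nginx-cookbook", "owner": "opscode", "version": "1.2"},
    {"name": "mysql-cookbook", "owner": "opscode", "version": "2.0"},
    {"name": "nginx-fork", "owner": "acme", "version": "1.2"},
]
idx = build_index(artifacts, ["owner", "version"])
result = search(idx, owner="opscode", version="1.2")
```

Because matching is exact, a slightly misspelled owner name returns nothing at all, which is precisely the practical limitation noted above.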

Recommendations

Recommendation implies proactively suggesting a set of potential artifacts to facilitate the orchestration process. Compared to search indexes, recommended artifacts are suggested based on user profiles, usage histories and contexts [171, 207, 226]. For example, the AWS Marketplace suggests virtual appliances based on users' ratings and comments. Additionally, when users choose a particular virtual appliance (e.g., an HTTP server), a list of related virtual appliances (e.g., an HTTP load balancer) that can be deployed alongside the chosen appliance is recommended.
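A usage-history-based recommender in this spirit might look as follows. The deployment histories and appliance names are fabricated for illustration; real marketplaces combine such signals with ratings and profiles.

```python
from collections import Counter

def related_appliances(histories, chosen, top_n=2):
    """Recommend the appliances most frequently co-deployed with the
    chosen one, mimicking 'deployed along with' suggestions."""
    co_deployed = Counter()
    for deployment in histories:
        if chosen in deployment:
            for other in deployment:
                if other != chosen:
                    co_deployed[other] += 1
    return [name for name, _ in co_deployed.most_common(top_n)]

# Fabricated usage histories: each set is one past deployment.
histories = [
    {"http-server", "load-balancer", "mysql"},
    {"http-server", "load-balancer"},
    {"http-server", "memcached"},
    {"mysql", "backup-agent"},
]
recs = related_appliances(histories, "http-server")
```

Choosing an HTTP server here surfaces the load balancer first, since it co-occurs most often in past deployments.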

Community-driven Techniques

Leveraging user-expertise to facilitate knowledge-reuse is a popular choice amongst many enterprise-level cloud providers.

• Resource Repositories. Online databases such as Docker Hub and AWS EC2 Container Registry act as Git-like version-control repositories [68, 181]. Users create Docker Images and push them to the repository, while other users can pull Images for reuse or further customization. As the community grows in terms of users and resource artifacts, the quality (e.g., after-sales support), correctness and discoverability of artifacts become critical factors that determine the reliability, safety and efficiency of orchestration processes. Communities like Bitnami [30] allow only authorized developers to register resource artifacts. Some communities (e.g., Ubuntu Juju, Puppet [108, 119]) implement strict curation policies (e.g., licensing, naming conventions, idempotency of orchestration rules) which must be adhered to when sharing resource artifacts. Other communities, such as Docker Hub [68], do not enforce curation policies but instead provide reputation schemes (e.g., user ratings/comments, number of artifact downloads) with which DevOps collectively estimate the quality and correctness of resource artifacts.

• Forums. Forums allow users to post questions and ideas, and receive targeted answers and comments from other users. This proves useful during testing when users may stumble upon issues prior to publishing workable artifacts. For example, Puppet provides a forum for DevOps to post, query, answer and rate questions.

• Blogs. Blogs usually contain information authored by a single user or organization, rather than Q&A, which requires constructive feedback from other users. They are nevertheless very useful for staying up to date. For example, Chef posts blog articles about artifact development best practices, updates to the orchestration language and other related news.

• Wikis. Wikis are community-driven collaborative environments that are particularly useful in small teams to maintain orchestration knowledge. For example, DevOps in a software company can keep track of the list of deployed VMs. DevOps may then update the wiki whenever they make changes (e.g., installing software, operating system updates). Effectively, wikis provide a centralized and constantly updated means for effective and easy-to-share documentation.

2.9 Applying the Taxonomy: Evaluation of Cloud Resource Orchestration Techniques

In consolidation of the foregoing discussion, we compare a wide range of state-of-the-art cloud resource orchestration methods. We organize our analysis by characterizing these tools along the main dimensions of our taxonomy (as presented in Section 2.3). We include well-known enterprise tools, frameworks and research initiatives.

2.9.1 Selection Process

Careful consideration was applied in the selection of relevant tools for our analysis; this entailed several phases of investigation. Initially, 20 orchestration tools (refer to Appendix A) were chosen out of a set heavily advocated by the DevOps community. We experimented with those tools to understand the main dimensions that are common amongst them all. Based on our observations, we derived the initial draft of our taxonomy. Furthermore, we compiled comparison tables that summarize how each tool implements the main dimensions of our initial taxonomy. We then chose a selection of research initiatives from leading, critically reviewed research proceedings (research and demo tracks), magazines, and journal articles relevant to the domain from the year 2004 onwards. In particular, these included the following conferences: Cloud Computing (CLOUD), Cloud Engineering (IC2E), Service-Oriented Computing (ICSOC), Advanced Information Systems Engineering (CAiSE), Large Installation System Administration (LISA), Database Systems for Advanced Applications (DASFAA), Cooperative Information Systems (CoopIS), Cloud Computing and Services Science (CLOSER), and Utility and Cloud Computing (UCC); and the following journals: ACM Computing Surveys (CSUR), ACM Transactions on Internet Technology (TOIT), IEEE Internet Computing, IEEE Transactions on Network and Service Management (TNSM), IEEE Transactions on Cloud Computing (TCC), and the Journal of Systems and Software (JSS). The selection of research initiatives was refined down to 10 candidates (refer to Appendix A) based on our knowledge, expertise, and conversations with academic colleagues and industry experts. We analyzed these initiatives and further revised our taxonomy and comparison tables based on our findings.

Ultimately, 11 different approaches were selected for comparison. This was based on critical analysis using our selection criteria: significance, originality of the approach, impact and relevance. Namely: AWS OpsWorks [173], AWS CloudFormation [4], VMWare vSphere [133], Heroku [145], Puppet [111, 118], Juju [198], Docker [197], OpenTOSCA [29], CFEngine [37], Plush [3], and SmartFrog [88].

We present a concise yet comprehensive analysis divided into four subsections: (a) Resources and User Types; (b) Resource Orchestration Capabilities; (c) Knowledge Reuse; and (d) Runtime Environment. These reflect the various dimensions of our taxonomy, and for each we have evaluated the 11 different approaches.

2.9.2 Resources and User Types

Table 2.1 maps the selected orchestration techniques onto the taxonomy of Resources and User Types described in Sections 2.4 and 2.6. The supported resource types, access methods and representation notations immensely influence the type of users. Therefore, to appreciate this correlation, we present our analysis of these two dimensions together.

Accordingly, by studying the characteristics relative to these two dimensions, we identify the following points:

• 8 out of 11 approaches utilize native (or script-based) representation notations, although there is a variety of other notations providing the same or similar orchestration features (some of which are cited in Section 2.4.4). This underlines the established suitability of script-based notations in this domain. Another historical reason for their prominence is that the traditional system administrators' community relied heavily on scripts to automate their tasks. Most early techniques also focused primarily on system administrators rather than application developers or non-IT people. In contrast, visual notations are quite prominent in modern Web-services composition techniques at the Software-as-a-Service (SaaS) layer [126].

• 10 out of 11 approaches support resource access over CLIs, of which 7 provide APIs. Because Linux- and Unix-based systems are managed via CLIs rather than GUIs, the current DevOps community is heavily equipped with CLI-based system administration skills. Before the advent of public clouds such as AWS, most organizations managed private cloud infrastructures, and seamless integration with other orchestration techniques was thus less important. With the advent of public clouds, organizations have started to incrementally move their applications onto them. To manage applications across public and private clouds, providing APIs and SDKs for programmatically accessing resources has become an important requirement.

• 10 out of 11 approaches support orchestrating platform resources, as well as being targeted towards DevOps. In general, cloud resource orchestration largely remains the prerogative of professional DevOps, although the adoption of end-user-intuitive visual abstractions is emerging. Nonetheless, composing and orchestrating cloud resources still requires specialized expertise, such as system management and software engineering knowledge.

As future directions with respect to user types and resource representation notations, we believe it is vital to overcome the current limitations that make it difficult to cater for (1) end-users and (2) unified and domain-specific resource representation notations. We further elaborate these future directions in Section 7.2.

Table 2.1: The Resource and User Type Dimensions of the Selected Platforms

AWS OpsWorks
  Resource Types: Platform resources
  Representation Notation: Hybrid (visual notation; scripts based on Chef cookbooks)
  Entities: Supports defining composite trees from component entities; component entities are Web-application components
  Relationships: Containment relationships to compose the set of related resources required for a Web application
  Constraints: Allowed in entities as attributes and rules (e.g., auto-scaling rules in a Layer)
  Access Method: Web-based GUI, CLI, SDK, APIs
  User Type: DevOps

AWS CloudFormation
  Resource Types: Infrastructure and Platform resources
  Representation Notation: JSON
  Entities: Supports defining composite graphs from component entities
  Relationships: (1) Dependency relationships between resources; (2) Containment relationships to group all the related resources
  Constraints: (1) Local attributes in resource entities; (2) attributes can be defined to apply to all the resource entities
  Access Method: CLI, APIs
  User Type: DevOps

VMWare vSphere
  Resource Types: Virtual machines
  Representation Notation: Visual
  Entities: The top-level resource is a ServiceInstance (a data center); a ServiceInstance can be modeled as a set of VMs, each of which can be composed of component entities such as network and alarm
  Relationships: (1) Dependency relationships between resources (usually modeled between the component resources within VMs); (2) Containment relationships to group all the related VMs
  Constraints: Attributes in resource entities to configure VMs
  Access Method: Desktop-based GUI, CLI, APIs
  User Type: System administrators

Heroku
  Resource Types: Platform resources
  Representation Notation: Scripts
  Entities: Supports defining composite trees from component entities
  Relationships: (1) Containment relationships to group the set of Dynos belonging to a particular app; (2) Dependency relationships (e.g., pom.xml in Java apps)
  Constraints: (1) Attribute-based constraints in resource entities; (2) policies can be specified on particular entities (e.g., at least one Web Dyno entity should exist in each App entity)
  Access Method: CLI, APIs
  User Type: DevOps

Puppet
  Resource Types: Platform resources
  Representation Notation: Scripts
  Entities: Supports a graph of resource entities; entity types include files and packages, which can be composed to model a machine; the top-level composite entity represents a Machine (physical/virtual)
  Relationships: (1) Dependency relationships, which determine the deployment behavior among resource entities; (2) Hosting relationships to specify which resource entities should be deployed on which machines
  Constraints: Entity-specific constraints are provided as attributes; Puppet also defines a hierarchical structure to categorize resource entities, such that constraints defined in a parent are inherited by its children
  Access Method: CLI, APIs, Web-based GUI
  User Type: DevOps

Juju
  Resource Types: Infrastructure and Platform resources
  Representation Notation: Hybrid (YAML, scripts)
  Entities: Supports a graph of resource entities
  Relationships: (1) Dependency relationships between Charms (e.g., require/provide interfaces); (2) Containment relationships (e.g., between Charms and the Provider); (3) Hosting relationships (e.g., between a service-unit and a Machine/Container)
  Constraints: Entity- and relationship-specific constraints via attributes
  Access Method: CLI, Web-based GUI
  User Type: DevOps

Docker
  Resource Types: Platform resources
  Representation Notation: Scripts
  Entities: Supports a graph of resource entities
  Relationships: (1) Communication relationships; (2) Dependency relationships; (3) Hosting relationships
  Constraints: Entity-specific constraints via attributes
  Access Method: CLI, APIs
  User Type: DevOps

OpenTOSCA
  Resource Types: Infrastructure and Platform resources
  Representation Notation: Hybrid (visual notation, scripts)
  Entities: Supports a graph of resource entities
  Relationships: (1) Communication relationships (e.g., "connect to"); (2) Dependency relationships (e.g., "depend on"); (3) Hosting relationships (e.g., "hosted on")
  Constraints: Entity- and relation-specific constraints via attributes
  Access Method: Web-based GUI
  User Type: DevOps

CFEngine
  Resource Types: Platform resources
  Representation Notation: Scripts
  Entities: Supports a graph of resource entities
  Relationships: (1) Dependency relationships (e.g., depends_on); (2) Containment relationships
  Constraints: Resource-entity-specific constraints are provided as attributes
  Access Method: CLI, APIs, Web-based GUI
  User Type: DevOps

Plush
  Resource Types: Platform resources
  Representation Notation: XML
  Entities: Supports a graph of resource entities
  Relationships: (1) Dependency relationships; (2) Containment relationships to group all the related resources
  Constraints: Entity-specific constraints via attributes
  Access Method: CLI
  User Type: DevOps

SmartFrog
  Resource Types: Platform resources
  Representation Notation: Scripts
  Entities: Supports a graph of resource entities
  Relationships: Inheritance and Containment relationships
  Constraints: Entity-specific constraints via attributes
  Access Method: CLI
  User Type: DevOps

2.9.3 Resource Orchestration Capabilities

Table 2.2 maps the selected orchestration techniques onto the taxonomy of Resource Orchestration Capabilities described in Section 2.5. By studying characteristics relative to this dimension, we identify the following points of significance:

• 7 out of 11 approaches support script-based orchestration strategies – the most basic form of orchestration process pattern. 4 out of 11 support reactive orchestration processes. However, to the best of our knowledge, none fully supports proactive orchestration processes – the most sophisticated pattern. While this is rather expected, it also manifests an important need for continued research on effective, intuitive and proactive orchestration processes that can be easily mastered by more than just professional DevOps.

• Cross-cutting concerns are not addressed, or are addressed only vaguely, amongst research initiatives. On the other hand, cross-cutting concerns are fairly well considered in enterprise-ready orchestration techniques. This is because their use in production environments demands solutions that address issues such as security, portability and/or fault tolerance.

• It is apparent that the various orchestration techniques employ different language paradigms. However, based on our observations, there is not yet any predominant language widely adopted by the majority of cloud orchestration providers.
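To make the reactive (Event-Condition-Action) pattern discussed above concrete, here is a minimal sketch of an ECA rule engine. The event name and scaling action are illustrative, not taken from any particular tool.

```python
class ECARule:
    """Event-Condition-Action rule: the reactive orchestration pattern
    used by ECA-rule-based tools. Names below are illustrative."""
    def __init__(self, event, condition, action):
        self.event, self.condition, self.action = event, condition, action

class RuleEngine:
    def __init__(self, rules):
        self.rules = rules
        self.log = []  # records the actions that fired

    def notify(self, event, payload):
        for rule in self.rules:
            if rule.event == event and rule.condition(payload):
                self.log.append(rule.action(payload))

scale_out = ECARule(
    event="cpu_report",
    condition=lambda p: p["cpu"] > 0.8,          # condition: CPU above 80%
    action=lambda p: f"add VM to {p['layer']}",  # action: scale the layer out
)
engine = RuleEngine([scale_out])
engine.notify("cpu_report", {"cpu": 0.95, "layer": "web"})  # fires
engine.notify("cpu_report", {"cpu": 0.30, "layer": "web"})  # does not fire
```

The engine reacts to monitored events rather than executing a fixed script, which is what distinguishes reactive from script-based strategies; a proactive strategy would additionally predict load before the event occurs.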

In addition, other studies including use-cases involving DevOps and system administrators confirm that there is a need to navigate and understand cloud resource configurations, as well as to monitor and control such resources, in a simplified manner [206].

Accordingly, we identify the following main issues as future directions for Orchestration Capabilities: (1) declarative cloud resource orchestration and management, (2) visual notations for orchestrating cloud resources and (3) proactive cloud resource orchestration. We further elaborate them as general future directions of this thesis (refer to Section 7.2).

Table 2.2: The Resource Orchestration Capabilities Dimension of the Selected Platforms

AWS OpsWorks
  Primitive Actions: Create, Delete, Describe and Update are provided for each resource entity; Clone, Start, Stop and Reboot are offered for some entities (e.g., Stack, Instance); global actions (e.g., SetLoadBasedAutoScaling) are also provided
  Orchestration Strategies: Reactive processes
  Language Paradigm: ECA-rule based
  Cross-cutting Concerns: Security rules (authorization, access protocols); SLAs can be defined via auto-scaling and auto-healing rules

AWS CloudFormation
  Primitive Actions: Create, Delete, Update, Describe and Clone are the main actions provided
  Orchestration Strategies: Script-based processes
  Language Paradigm: Markup language
  Cross-cutting Concerns: Security rules (authorization, access protocols); SLAs can be defined via auto-scaling rules

VMWare vSphere
  Primitive Actions: A large number of actions for each entity type; in general, all can be categorized into create, delete and update
  Orchestration Strategies: Reactive processes
  Language Paradigm: Markup language
  Cross-cutting Concerns: Security rules (authentication, authorization); portable VMs

Heroku
  Primitive Actions: Create, update, scale and delete applications; viewLogs (useful for monitoring)
  Orchestration Strategies: Proactive and script-based processes
  Language Paradigm: Script-based
  Cross-cutting Concerns: Security rules (OAuth authorization)

Puppet
  Primitive Actions: Create, update and delete resources
  Orchestration Strategies: Mainly script-based processes; reactive processes for a few resources
  Language Paradigm: Constraint programming
  Cross-cutting Concerns: Security rules (encryption, authentication, authorization)

Juju
  Primitive Actions: Create and delete (Environment, VMs, Charms, Services, Relationships between Charms); describe Environment; detect Events; update Charm
  Orchestration Strategies: Reactive processes
  Language Paradigm: ECA-rule based
  Cross-cutting Concerns: Security rules (authentication, authorization)

Docker
  Primitive Actions: Create and delete (Image, Container); share (Image); start, stop and restart (Container); update (Container)
  Orchestration Strategies: Script-based processes
  Language Paradigm: Script-based
  Cross-cutting Concerns: Security rules (authorization, access protocols)

OpenTOSCA
  Primitive Actions: Create, update and delete (resources, relationships, attributes)
  Orchestration Strategies: Script-based processes
  Language Paradigm: Flow-based
  Cross-cutting Concerns: Portable resources

CFEngine
  Primitive Actions: Create, update and delete resources
  Orchestration Strategies: Reactive processes
  Language Paradigm: Constraint programming
  Cross-cutting Concerns: Security rules (encryption, authentication, authorization)

Plush
  Primitive Actions: Create (environment and application)
  Orchestration Strategies: Script-based processes
  Language Paradigm: Markup-based and flow-based
  Cross-cutting Concerns: Not addressed

SmartFrog
  Primitive Actions: Deploy, start and terminate
  Orchestration Strategies: Script-based processes
  Language Paradigm: Markup language
  Cross-cutting Concerns: Not addressed

2.9.4 Knowledge Reuse

Table 2.3 maps the selected orchestration techniques onto the taxonomy of Knowledge Reuse as described in Section 2.8. By studying characteristics relative to this dimension, we identify the following points of significance:

• A large majority, 9 out of 11, approaches use proprietary reuse artifacts in concrete and template formats, whereas only 2 leverage open standards (e.g., TOSCA, OVF, OCF) to represent reuse artifacts [153, 57, 155]. The latter approach would in fact significantly assist DevOps in building portable and interoperable configurations across different cloud providers. Unlike in Web-services composition, adopting open standards is not yet prevalent; it is thus deemed a high priority amongst cloud resource orchestration techniques.

• Research initiatives for cloud orchestration techniques generally underestimate the reuse of orchestration knowledge. Comparatively, all of the enterprise-ready approaches we analyzed provide some form of knowledge reuse technique. This observation asserts the utmost practical necessity and importance of knowledge reuse for DevOps to build and orchestrate real-world cloud resources.

• 7 out of 11 approaches employ search indexes – the most prominent knowledge discovery technique. Amongst search methods, keyword-based search is the most widely used, likely due to its simplicity of implementation. Generally speaking, recommendation-based knowledge discovery techniques are promising, albeit most orchestration providers do not adopt this approach due to the complexity of implementing recommendations and maintaining their accuracy.

• Enterprise-ready approaches predominantly support community-driven knowledge archival and curation techniques. This is due to the vast number and diversity of cloud resources that need to be supported. For instance, in the absence of the crowd, providers would have to build and maintain a knowledge artifact repository on their own – which would clearly be infeasible in practice.

Accordingly, we foresee unified representation and reuse mechanisms over heterogeneous cloud resource knowledge as a future direction, which is explained in detail in Section 7.2.

Table 2.3: The Knowledge Reuse Dimension of the Selected Platforms

AWS OpsWorks
  Reused Artifact: Concrete and template resource descriptions
  Reuse Technique: Search index

AWS CloudFormation
  Reused Artifact: Concrete and template resource descriptions
  Reuse Technique: Search index

VMWare vSphere
  Reused Artifact: Portable resource snapshots
  Reuse Technique: Search index, recommendations, community-driven approaches (e.g., blogs)

Heroku
  Reused Artifact: Concrete and template resource descriptions
  Reuse Technique: Not specified

Puppet
  Reused Artifact: Concrete and template resource descriptions
  Reuse Technique: Community-driven search indexes

Juju
  Reused Artifact: Concrete and template resource descriptions, and miscellaneous
  Reuse Technique: Community-driven search indexes

Docker
  Reused Artifact: Concrete and template resource descriptions, and miscellaneous
  Reuse Technique: Community-driven search indexes

OpenTOSCA
  Reused Artifact: Portable concrete and template resource descriptions
  Reuse Technique: Not specified

CFEngine
  Reused Artifact: Concrete and template resource descriptions
  Reuse Technique: Search index

Plush
  Reused Artifact: Concrete and template resource descriptions
  Reuse Technique: Not specified

SmartFrog
  Reused Artifact: Concrete and template resource descriptions
  Reuse Technique: Not specified

2.9.5 Runtime Environment

Table 2.4 maps the selected orchestration techniques onto the taxonomy of Runtime Environment as described in Section 2.7. By studying characteristics relative to this dimension, we identify the following points of significance:

• A large majority, 9 out of 11, approaches adopt a centralized execution model to orchestrate cloud resources. Having observed other distributed-system domains (such as Business Process Management [66]), this design choice is likely due to the flexibility of implementation. In comparison, decentralized agents require an implementation that carefully considers the discovery, synchronization, coordination and security aspects of agents.

• Surprisingly, the value of federated cloud resources is largely underestimated. Most cloud resource orchestration techniques focus on either private or public cloud environments as their target environment; only 1 out of the 11 approaches we studied provides support for federated cloud resources.

• The preferred virtualization technique varies largely based on the types of resources (i.e., Infrastructure, Platform or Software) supported by any given orchestration technique. All of the approaches we analyzed that support Infrastructure resources adopt OS-level hypervisors as their virtualization technique. Other approaches, which support Platform and Software resources, adopt environment-level container managers.

Furthermore, we identify the following future directions in the evolution of Runtime Environments: runtime intelligence for declarative orchestration, and cloud service event analytics. We elaborate them as general future directions of this thesis (refer to Section 7.2).

Table 2.4: The Runtime Environment Dimension of the Selected Platforms

AWS OpsWorks
  Virtualization Technique: OS-level hypervisor
  Execution Model: Centralized
  Target Environment: Public cloud

AWS CloudFormation
  Virtualization Technique: OS-level hypervisor
  Execution Model: Centralized
  Target Environment: Public cloud

VMWare vSphere
  Virtualization Technique: OS-level hypervisor
  Execution Model: Centralized
  Target Environment: Private cloud

Heroku
  Virtualization Technique: Environment-level container manager
  Execution Model: Centralized
  Target Environment: Public cloud

Puppet
  Virtualization Technique: Not relevant (responsible only for configuration management of resources rather than virtualizing them)
  Execution Model: Decentralized
  Target Environment: Public or private cloud

Juju
  Virtualization Technique: OS-level hypervisor
  Execution Model: Centralized
  Target Environment: Public or private cloud

Docker
  Virtualization Technique: Environment-level container manager
  Execution Model: Centralized
  Target Environment: Public or private cloud

OpenTOSCA
  Virtualization Technique: OS-level hypervisor
  Execution Model: Centralized
  Target Environment: Public or private cloud

CFEngine
  Virtualization Technique: Not relevant (responsible only for configuration management of resources rather than virtualizing them)
  Execution Model: Decentralized
  Target Environment: Public or private cloud

Plush
  Virtualization Technique: Not relevant (responsible only for configuration management of resources rather than virtualizing them)
  Execution Model: Centralized
  Target Environment: Private cloud

SmartFrog
  Virtualization Technique: Not relevant (responsible only for configuration management of resources rather than virtualizing them)
  Execution Model: Centralized
  Target Environment: Private cloud

2.10 Conclusion

Cloud resources and orchestration techniques are an effective technology, endowed with immense power to transform traditional infrastructure, platform and software resources into elastic, measurable, on-demand, self-service-based virtual components. In this extensive survey, we have studied a diverse mix of cloud resource orchestration techniques, including languages, services, standards and tools. We presented a novel taxonomy over a broad range of relevant dimensions, which we applied to characterize and compare various orchestration techniques. We contribute a systematic analysis of the most representative cloud resource orchestration techniques by evaluating and classifying them against the presented taxonomy. In subsequent chapters, we further analyse additional research initiatives and resource orchestration techniques that are closely related to those specific chapters. Towards the end of this contribution, we derive key open research issues based on the apparent technical gaps identified during the analysis. Accordingly, we propose a range of future directions as fruitful guidelines for the next generation of cloud orchestration.

Chapter 3

A Model-Driven Framework for Interoperable Cloud Resources Management

3.1 Introduction

Existing cloud resource Configuration and Management (C&M) techniques typically rely on procedural programming (general-purpose or scripting) languages [168, 131, 135, 223, 163]. Modern systems such as Puppet, Juju, Docker and Amazon OpsWorks provide low-level script-based languages and user interfaces (e.g., CLIs, APIs, GUIs) for the configuration and management of resources over cloud services [64]. Moreover, cloud applications may possess varying resource requirements during different phases of their life-cycle, which encourages DevOps (i.e., software engineers and system engineers who are collectively involved in designing, developing, deploying and managing cloud applications) to design end-to-end and automated configuration and management tasks that span a selection of best-of-breed C&M techniques [168, 131]. DevOps are thus forced to understand different and heterogeneous low-level cloud service APIs, command-line syntax, Web interfaces and procedural programming constructs in order to create and maintain complex cloud configuration and management artifacts. Moreover, the

problem intensifies with the increasing variety of cloud services, together with different resource requirements and constraints for each application. This inevitably leads to an inflexible and costly environment which adds considerable complexity, demands extensive programming effort, requires multiple and continuous patches, and perpetuates closed cloud solutions.

Drawing analogies from techniques in the domain of service-oriented architecture, such as the Web Services Description Language (WSDL), we are encouraged to likewise support the abstract representation of cloud resources by devising rich abstractions to describe and manage cloud resource requirements and their constraints. In this chapter we therefore investigate how to effectively represent, organise and manipulate otherwise low-level, complex, cross-layer cloud resource descriptions into meaningful and higher-level segments. We believe this would greatly simplify the representation, manipulation and reuse of heterogeneous cloud resources. To enable this, we propose a methodology to support the automated translation of high-level resource requirements to underlying provider-specific resource and service calls. More specifically, this chapter makes the following main contributions:

Domain-Specific Models for the representation of cloud resource management entities: DevOps represent and share cloud resources in the form of low-level representation objects (e.g., scripts, key-value pairs) and management scripts (e.g., elasticity rules). We propose Domain-specific Models for representing cloud resources and their management strategies as high-level entities. Domain-specific Models, which are based on the Entity-Relationship (ER) model, enable the representation of cloud resources as entities and relationships. The proposed model features: a vocabulary and set of constructs for describing and representing both elementary (e.g., VMs, database services, load balancer services) and federated cloud resources (e.g., packaged virtual appliances); and the relationships amongst resources (e.g., dependencies, configuration parameters, resource constraints). Given that we architect this layer over existing C&M techniques, this significantly enhances the potential for knowledge reuse, since we can better harness interoperability capabilities. Moreover, our model enables cloud resources to be combined to create higher-level virtual entities, called Federated cloud resources, which shield DevOps from the complexity and heterogeneity of underlying cloud services. For instance, by identifying common concepts among different tools which have different granularities of C&M features, we can seamlessly merge those features for end-to-end configuration and management of cloud resources via our system. For example, a VM, which is deployed by a particular tool (e.g., Vagrant), can be modified by another tool (e.g., Puppet) with fine-grained configuration tasks (e.g., installing software within the VM) which are not supported by the initial tool. DevOps, who are usually experts in a certain C&M technique, specify the Domain-specific Model for that particular C&M technique.

Connectors for automated translation of Domain-specific Model objects into native resource description and management artifacts: In order to combat the large number and variety of cloud resource Configuration and Management (C&M) languages (e.g., procedural, activity-based and declarative), as well as the various heterogeneous tools/APIs involved in managing resources in different environments (i.e., public, private and federated), we propose Connectors, which allow DevOps to deploy and reconfigure high-level cloud resource representations based on Domain-specific Models. Basically, a Connector is an Application Programming Interface (API) that exposes a set of high-level management operations (e.g., deploy, reconfigure, delete), which are specific to a particular C&M tool. Behind the scenes, Connectors (a) accept Domain-specific Model-based high-level cloud resource representations and translate them into native resource descriptions and management scripts (e.g., files, shell code snippets); and (b) interpret the exposed management operations and transform them into low-level API calls (e.g., create necessary Images and Containers using the Docker Remote API, http://docs.docker.com/reference/api/docker_remote_api/), which thereby abstracts the complexity of the low-level interfaces (and communication protocols) of the native cloud C&M techniques. In addition, Connectors may include basic events that are to be monitored by periodically querying for data using low-level APIs. DevOps are thus empowered to write automated management processes such as Event-Condition-Action (ECA) rules and workflows over the operations exposed by the Connectors. Connectors are implemented by DevOps who have expertise in programming and knowledge of the relevant C&M techniques.

The rest of this chapter is organised as follows: In Section 3.2 we further elucidate the limitations of existing cloud resource C&M techniques. In Section 3.3 we present the overall system architecture of our proposed platform. In Section 3.4 we demonstrate our methodology via realistic scenarios. In Sections 3.5 and 3.6 we present our implementation and evaluation. We examine related work in Section 3.7, and conclude with a discussion of future work in Section 3.8.

3.2 Limitations of Existing C&M Techniques

As mentioned earlier, existing C&M techniques rely on low-level script-based languages. For example, Ubuntu Juju employs the Charm language2, while Docker employs Dockerfiles3. Charms and Dockerfiles are collections of configuration attributes and executable scripts that configure, install and start an application. Code 3.1 shows a configuration script that describes a simple Node.js4 web application as a Dockerfile. The constructs of this configuration script, which are basic commands (e.g., RUN, COPY, CMD), provide little or no abstraction for DevOps to identify the main attributes and relationships of cloud resources. Some other tools share non-textual packaging formats (e.g., the Open Virtualization Format5 (OVF) and Docker Images6) as resource artifacts.

Code 3.1: Configuration script to deploy a Node.js web application in Docker

FROM centos:centos6

# Enable Extra Packages for Enterprise Linux (EPEL) for CentOS
RUN yum install -y epel-release
# Install Node.js and npm
RUN yum install -y nodejs npm

2 https://jujucharms.com/
3 https://docs.docker.com/reference/builder/
4 https://nodejs.org/en/
5 http://www.dmtf.org/standards/ovf
6 https://docs.docker.com/userguide/dockerimages/

# Install app dependencies
COPY package.json /src/package.json
RUN cd /src; npm install

# Bundle app source
COPY . /src

EXPOSE 8080
CMD ["node", "/src/index.js"]

Consider the scenario of describing a composite cloud resource such as a Node.js Web application stack in Docker. First, DevOps would need to identify the required component resources (e.g., Node.js as application engine and MySQL as database) and their relationships (i.e., the application engine stores data in the database); then, implement or reuse C&M scripts (i.e., Dockerfiles) for each component resource. Docker provides a Command Line Interface7 and a RESTful interface8 which interpret Dockerfiles and allow DevOps to build, deploy, monitor and control the necessary resources, known as Containers, on a given Virtual Machine (VM) (refer to Figure 3.1). As Docker does not support configuring and deploying VMs9, another C&M tool such as the AWS-EC2 CLI or Rackspace CLI must be employed for deploying VMs. This requires DevOps to employ multiple C&M tools to automate end-to-end management tasks (e.g., deploying Docker Containers in VMs which are managed by AWS (refer to Figure 3.1)). As every C&M tool has tool-specific resource description models, management capabilities and interfaces, DevOps are required to implement ad-hoc and low-level orchestration scripts to coordinate C&M tasks among these different tools. For example, Table 3.1 shows the heterogeneous configuration and management interfaces exposed by several C&M tools. Consequently, these ad-hoc scripts introduce hard-coded dependencies among resources that are orchestrated by different tools. Reusing knowledge artifacts which include such ad-hoc scripts is not scalable, as DevOps must manually analyze those knowledge artifacts to realise cross-domain relationships among resources within a composite cloud resource.

7 https://docs.docker.com/reference/commandline/cli/
8 https://docs.docker.com/reference/api/docker_remote_api/
9 This feature was later introduced in Docker after this chapter was completed.

[Figure: Docker-managed Node.js App and MySQL Containers hosted on AWS VMs VM-1 and VM-2.]

Figure 3.1: Components and relationships of a Node.js Web application stack
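To make the hard-coded, tool-spanning dependency concrete, the following Python sketch mimics such an ad-hoc orchestration script. The `provision_vm` and `run_container` functions are hypothetical stand-ins for AWS and Docker CLI/API calls (in a real script they would shell out to the respective tools); the point is that the container deployment cannot be expressed without manually threading the VM's address from one tool's output into the other tool's input.

```python
# Hypothetical stand-ins for heterogeneous tool interfaces (AWS CLI, Docker API).
def provision_vm(provider: str, instance_type: str) -> dict:
    # A real script would shell out to e.g. `aws ec2 run-instances` here.
    return {"provider": provider, "type": instance_type, "ip": "10.0.0.12"}

def run_container(host_ip: str, image: str, ports: str) -> dict:
    # A real script would call the Docker Remote API on the given host here.
    return {"host": host_ip, "image": image, "ports": ports, "state": "running"}

# The orchestration logic hard-codes the dependency between the two tools:
# the VM's IP must be extracted from the AWS result before Docker can act.
vm = provision_vm("aws", "t2.medium")
app = run_container(vm["ip"], "node:6", "8080:80")
print(app["state"])
```

Every such script re-implements this glue by hand, which is precisely the brittleness the Connector layer is designed to remove.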

In the case of federated cloud resource configuration and management, the aforementioned limitations are even more severe for DevOps. For instance, consider that more VMs need to be deployed and managed across two cloud services, such as AWS and Rackspace, to improve reliability and handle the increasing demand of the Web application: DevOps need to implement additional orchestration scripts that monitor the application load and deploy the Web application in either AWS or Rackspace based on a certain load-balancing algorithm. AWS and Rackspace, however, use different formats of access credentials and different management interfaces to deploy VMs.

Table 3.1: Heterogeneous configuration and management interfaces

Tool      | Types of C&M interfaces
AWS       | CLI, SDKs, Web 2.0, REST and SOAP APIs
Rackspace | CLI, SDKs, Web 2.0, REST API
Puppet    | CLI, Web 2.0, REST API
Docker    | CLI, REST API

As explained, this inevitably entails great complexity when exploiting cloud services, and the problems intensify in distributed environments across multiple, heterogeneous, autonomous, and evolving cloud services. More specifically, with existing cloud delivery models, developing a new cloud-based solution generally leads to uncontrollable fragmentation across different C&M languages and tools (e.g., Puppet, Chef, Juju, Docker, SmartFrog, AWS OpsWorks) [47, 64, 87, 116]. This makes it very difficult to develop interoperable and portable cloud solutions. It also degrades performance, as applications cannot be partitioned or migrated easily and arbitrarily to another cloud when demand cycles increase.

[Figure: the Domain-Specific Models layer comprises a Task-specific resource layer (Database, Key-Value-Storage and CRM communities, each registered with the layer below), a Federated resource layer (composed of tool-specific models), and a Tool-specific resource layer (Docker, Ansible, Puppet, Chef and Juju DSMs, where DSM = Domain-specific Model). These are fed into the Connectors layer (Docker-, Ansible-, Chef-, Juju-, Puppet- and Federated-Resource-Connectors), which generates artifacts for the existing resource management communities.]

Figure 3.2: System Overview

3.3 Cloud Resources Management Architecture: An overview

To overcome the limitations described in Section 3.2, we propose a layered architecture that enables: (a) Domain-specific Models (i.e., high-level representation and management models for cloud resources); and (b) Connectors (i.e., automated translations of these high-level Domain-specific Models into low-level resource descriptions and management rules). Figure 3.2 illustrates the system design and the interactions of the main layers in our proposed approach, which are elucidated as follows:

Existing Resource Management Communities represent the tools and APIs available for cloud resource C&M. We discussed cloud resource C&M communities and their limitations extensively in Section 3.2.

The Domain-specific Models layer consists of three sub-layers: (i) the Tool-specific resource layer; (ii) the Federated resource layer; and (iii) the Task-specific resource layer. All sub-layers consist of a collection of Domain-specific Models. Starting from the bottom up, the Tool-specific resource layer includes Domain-specific Models, each of which represents the cloud resource entities (e.g., resource descriptions, management rules) and the relationships among those entities for a particular cloud resource C&M tool. For example, the Docker DSM (refer to Figure 3.2) describes linked entities that are provided specifically by the Docker engine. Tool-specific Domain-specific Models can also be combined to create higher-level DSMs that represent Federated cloud resources, which may be managed by two or more existing cloud resource C&M tools. For example, a customer relationship management application of an organization, which is deployed in a public cloud service (e.g., AWS), may access a client information database server, which is managed within the organization's private cloud infrastructure (e.g., VMware). Finally, the Task-specific resource layer represents "splices" of the fundamental DSMs that are reformulated for specific categories of tasks. For example, DSMs for the Database Community may include models that facilitate key-value storages, relational databases and graph databases. An extended goal of DSMs is also to abstract away unwanted heterogeneous notations in order to simplify matters for the end-developer; DSMs can be customized to further accommodate this.

We further elucidate Domain-specific Models with examples in Section 3.4.

The Connectors layer exposes interfaces which allow DevOps to create cloud resources and invoke management actions by accepting Domain-specific Model-based resource descriptions.

We designed the interface of Connectors (refer to Table 3.2) by analysing the management actions exposed by a range of existing C&M techniques (refer to Chapter 2). The init operation: (i) accepts Domain-specific Model-based resource descriptions; (ii) generates a package of native resource descriptions; and (iii) returns a unique id that represents the generated resource description. DevOps refer to this unique id in subsequent management operations on that particular resource description. Every C&M technique exposes three basic operations: deploy, control (or reconfigure) and undeploy a cloud resource. The deploy operation requires a resource configuration description as input, and returns an identification (id) value that uniquely represents the deployed cloud resource (e.g., the "ImageId" value of an Amazon Machine Image (AMI)10). This id value is referred to when controlling and undeploying the particular resource. The undeploy operation takes the id value of an already deployed resource and returns optional information regarding the success/failure of the operation.

10 docs.aws.amazon.com/cli/latest/reference/ec2/create-image.html

Cloud resources may be dynamically controlled (e.g., restarting a VM instance) to satisfy varying resource requirements. Similarly, a subset of the component resources of a composite or federated cloud resource may be subjected to dynamic re-configurations when certain events (e.g., a VM instance connection failure) occur. DevOps should be able to specify control actions for a cloud resource configuration. The control operation allows DevOps to specify how a cloud resource should be controlled when certain events occur. Depending on the specific control operations exposed by existing C&M techniques, the control operation may have multiple and customized implementations to satisfy each of those control operations. For example, an operation named restart of a virtual machine (VM) is invoked when the C&M technique detects a connection failure to the relevant VM. The restart operation performs native management actions such as requesting to restart the VM through the C&M technique's low-level API. DevOps, who describe cloud resources based on Domain-specific Models, may annotate the resource descriptions with rules. Those rules include events (e.g., connection failure to a VM) and operations (e.g., restart) exposed by the Connector, to enable dynamic execution of control actions based on environment and resource variations. Our initial working assumption is that the operations of Connectors are implemented, verified, tested, reviewed and curated by DevOps based on available knowledge and experience.
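The ECA-style rule annotation just described can be sketched as a minimal dispatcher: each rule pairs an event name and a guard condition with a Connector operation to fire. This is an illustrative sketch only; the `connection-failure` event name and the `restart` handler are assumed examples, not part of any particular C&M tool's API.

```python
# Minimal Event-Condition-Action rule dispatcher (illustrative sketch).
rules = []

def add_rule(event, condition, action):
    rules.append({"event": event, "condition": condition, "action": action})

def fire(event, context):
    """Run every matching rule whose condition holds; collect the results."""
    return [r["action"](context) for r in rules
            if r["event"] == event and r["condition"](context)]

# Hypothetical rule: restart a VM when a connection failure is detected.
add_rule("connection-failure",
         condition=lambda ctx: ctx["resource-type"] == "VM",
         action=lambda ctx: f"restart({ctx['resource-id']})")

print(fire("connection-failure", {"resource-type": "VM", "resource-id": "vm-42"}))
```

In the actual system, the action would invoke a control operation exposed by a Connector rather than return a string.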

Some C&M techniques support pushing basic events, which are generated from cloud resources, to interested consumers. For example, GitHub provides 25 different event types11, such as Push, Issue and Fork. However, other C&M techniques do not support pushing events. For example, DevOps need to periodically query the low-level API of Docker to extract the CPU and memory consumption details of Containers. When implementing Connectors for such C&M techniques, DevOps may additionally specify basic events and their extraction logic along with the operations of the Connectors, such that consumers of the Connectors may register for those events in order to monitor them.
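For such pull-only tools, a Connector can wrap periodic polling behind a push-style event interface. The sketch below assumes a hypothetical `probe` function standing in for a low-level query (e.g., against Docker's stats endpoint); consumers simply register callbacks and receive events when a threshold is breached.

```python
# Turn a pull-style metric probe into push-style events (observer sketch).
class PollingEventSource:
    def __init__(self, probe, threshold_mb):
        self.probe = probe              # hypothetical low-level query function
        self.threshold = threshold_mb
        self.subscribers = []

    def register(self, callback):
        self.subscribers.append(callback)

    def poll_once(self, container_id):
        usage = self.probe(container_id)
        if usage > self.threshold:      # emit a basic event on threshold breach
            for cb in self.subscribers:
                cb({"event": "high-memory", "container": container_id,
                    "usage-mb": usage})

# Stub probe in place of a real Docker Remote API call.
readings = {"web-1": 1900, "db-1": 300}
source = PollingEventSource(lambda cid: readings[cid], threshold_mb=1024)
events = []
source.register(events.append)
source.poll_once("web-1")   # breaches the threshold, so an event is emitted
source.poll_once("db-1")    # below the threshold, so nothing is emitted
print(events)
```

In a deployed Connector, `poll_once` would run on a timer and the probe would issue the actual low-level API call.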

We further exemplify and illustrate the technical details of Connectors in Section 3.4.5, using Docker as an example.

11 https://developer.github.com/v3/activity/events/types/

Table 3.2: Operations of Connector interface

init(resource_meta_data): This operation accepts a Domain-specific Model-based resource description, translates it into native resource descriptions and management scripts (e.g., files, shell code snippets), and returns a unique id.

deploy(resource_meta_data): When a user requests to deploy a cloud resource, the Connector runtime processes the incoming cloud resource description and selects a particular Connector that can satisfy it. The runtime then invokes the deploy operation of that Connector, passing the incoming resource description as resource_meta_data. This operation returns an object that consists of a resource_id and an optional result message.

control(resource_id, action_description): This operation accepts a particular resource identifier and an action description as inputs. The implementation of the operation should specify the configuration behaviour based on the input action description (e.g., increase-CPU, decrease-memory). This operation returns an object that consists of the resource_id and an optional result message.

undeploy(resource_id): When a user requests to undeploy a deployed cloud resource, the Connector runtime extracts the resource id of the deployed cloud resource and determines the specific Connector that was invoked to deploy that resource. The runtime then invokes the undeploy operation of that Connector with the resource_id.
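The operations of Table 3.2 can be captured as an abstract interface. The in-memory `EchoConnector` below is a toy implementation used only to show the id-passing contract (deploy returns a resource_id that control and undeploy then reference); it is a sketch of the Connector contract under our own naming, not the thesis's actual implementation.

```python
import abc
import itertools

class Connector(abc.ABC):
    """Abstract Connector interface mirroring the operations of Table 3.2."""
    @abc.abstractmethod
    def init(self, resource_meta_data): ...
    @abc.abstractmethod
    def deploy(self, resource_meta_data): ...
    @abc.abstractmethod
    def control(self, resource_id, action_description): ...
    @abc.abstractmethod
    def undeploy(self, resource_id): ...

class EchoConnector(Connector):
    """Toy in-memory Connector illustrating the id-passing contract."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._deployed = {}

    def init(self, resource_meta_data):
        # Would translate a DSM object into native descriptions; here, just an id.
        return f"desc-{next(self._ids)}"

    def deploy(self, resource_meta_data):
        rid = f"res-{next(self._ids)}"
        self._deployed[rid] = dict(resource_meta_data)
        return {"resource_id": rid, "message": "deployed"}

    def control(self, resource_id, action_description):
        self._deployed[resource_id]["last-action"] = action_description
        return {"resource_id": resource_id, "message": "ok"}

    def undeploy(self, resource_id):
        self._deployed.pop(resource_id)
        return {"message": "undeployed"}

c = EchoConnector()
result = c.deploy({"name": "ODE-Server-1", "schema": "docker.rest.Container"})
c.control(result["resource_id"], "increase-CPU")
print(c.undeploy(result["resource_id"]))  # → {'message': 'undeployed'}
```

A real Docker Connector would replace the dictionary bookkeeping with calls to the Docker Remote API.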

3.4 Extracting Domain-specific Models from Tool-specific Resource Artifacts

In this section we illustrate our methodology for analysing existing Configuration and Management (C&M) tools to derive Domain-specific Models. Using a real-world example, we demonstrate how we derive their key entities (i.e., resource description entities, management actions and events), which thereby constitute the Domain-specific Models.

We built Domain-specific Models for a diverse range of tools and languages: Docker, Juju and TOSCA. For each, we first analyzed existing knowledge sources (e.g., C&M language specifications, user documentation, forums and resource description repositories) to understand and extract the key entities for describing cloud resources. Next, we extracted the relationships between the entities by understanding how entities are associated when describing composite cloud resources. These entities and relationships constitute the resource description model of the respective Domain-specific Model. Likewise, to model the management capabilities of the Domain-specific Model, we again analyzed knowledge sources and extracted the actions and events provided by these tools, such as for manipulating a given resource. These events and actions allow DevOps to annotate resource descriptions with ECA rules. For example, DevOps may specify ECA rules such as: when event patterns (e.g., a user changes a configuration attribute of a cloud application) are matched and their conditions (e.g., the application is started) are satisfied, the specified C&M actions (e.g., deploy, delete, re-configure, start and stop) are fired. Finally, we integrated the extracted events and actions as two sets of entities into the Domain-specific Model.

3.4.1 An Embryonic Cloud Resource Configuration & Management Model

During this kind of reverse engineering analysis, we need a language that captures the characteristics of Domain-specific Models. We propose an Entity-Relationship (ER)-based model to represent Domain-specific Models. In this manner, ER constructs can capture the high-level design of cloud resources as entities, and likewise explicitly represent relationships with other cloud resources. Additionally, ER-model-based resource descriptions act as documentation that explicitly describes a resource and its relationships with other resources. By contrast, with existing script-based approaches, complex cloud resource configurations are often just documented separately in the form of ad-hoc Wikis that outdate quickly unless continuously maintained. ER-based Domain-specific Models also support a machine-readable syntax, which is consumed by software like Connectors to automatically generate the cloud resource descriptions, deployment and management scripts of C&M tools. Our embryonic data model consists of two aspects.

[Figure: the Resource Description Model relates Entity (name : String, attributes : Array) and Relationship (name : String, attributes : Array); each Relationship has exactly two Entities, and an Entity may participate in any number (0..*) of Relationships.]

Figure 3.3: UML Class diagram for Resource Description Model

1. Resource Description Model: It describes language constructs provided for representing cloud resources in a C&M tool, in terms of relevant entities and relationships (refer to Figure 3.3). Entities and relationships include attributes that characterise them. For example, an entity that represents a VM may include CPU, memory and storage as attributes.

2. Resource Management Model: It expresses the language constructs provided by a C&M tool to configure, deploy, monitor and control cloud resources. The Resource Management Model consists of two sub-models.

(a) Action Model: It specifies the available actions (e.g., deploy, configure, migrate), which manage cloud resources, as a set of entities with relevant attributes that express the required input and output parameters (refer to Figure 3.4).

(b) Event Model: It expresses events related to the life cycle of cloud resources in terms of entities with the necessary attributes to describe those events [168] (refer to Figure 3.4). It should be noted that the issues of event detection, while important, are complementary to the research issues addressed in our work and outside the scope of this chapter.
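The embryonic model above maps naturally onto a handful of record types. The dataclass sketch below mirrors Figures 3.3 and 3.4 (entities, relationships, actions with input/output parameters, and events, each carrying a name and attributes); the field names follow the figures, while the concrete attribute values are purely illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:                 # Resource Description Model (cf. Figure 3.3)
    name: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Relationship:
    name: str
    participants: tuple       # exactly two entities, as in Figure 3.3
    attributes: dict = field(default_factory=dict)

@dataclass
class Action:                 # Resource Management Model (cf. Figure 3.4)
    name: str
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)

@dataclass
class Event:
    name: str
    attributes: dict = field(default_factory=dict)

# Illustrative instance: a VM entity characterised by CPU, memory and storage,
# deployed on a hosting machine.
vm = Entity("VM", {"cpu": 2, "memory-gb": 4, "storage-gb": 40})
deployment = Relationship("Deployment", participants=("Hosting-Machine", "VM"))
print(vm.attributes["cpu"], deployment.name)
```

Tool-specific DSMs (Sections 3.4.2 and 3.4.3) instantiate exactly these four record kinds with tool-specific vocabularies.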

In the following sections we explain how we leverage our embryonic model to define Domain-specific Models, using Docker and Juju as real-world examples. We chose Docker as it is open-source and an emerging industry standard. Moreover, Docker and Juju are widely praised by DevOps communities.

[Figure: the Resource Management Model comprises Action and Event entities (each with name : String and attributes : Array); an Action may additionally have optional (0..1) Input and Output parameter sets, each an array of attributes.]

Figure 3.4: UML Class diagram for Resource Management Model

3.4.2 Docker-based Domain-specific Model

Docker comprises cloud resource management entities which are required for application deployment over software containers. Docker is a container-based virtualization technique, which offers a lightweight and portable resource isolation alternative to VMs. Container-based virtualization techniques have emerged to simplify and accelerate the modeling and deployment of cloud resources. More specifically, for composite cloud resources, which depend on multiple service platforms for their operations, container-based virtualization techniques enable accelerated and efficient modeling and deployment of optimally configured, scalable and lightweight platform instances.

By analyzing the Docker language specifications12,13 we identified six key resource description entity types: (1) Container, (2) Image, (3) Application, (4) Registry, (5) Hosting-Machine and (6) Cluster (refer to Figure 3.5).

The central entity, Container, represents a virtualised software container where DevOps deploy an application or a component of an application (e.g., an Apache Web Server installed on Ubuntu OS with dependent libraries). Deployment knowledge of the application and its dependencies (or application components) is represented via the entity Image. Such knowledge is represented using either one monolithic Image instance or a set of Image instances, each of which represents the deployment knowledge of an application component. In other words, the Image possesses the deployment knowledge required to instantiate a Container. An Application represents a logical entity that includes a collection of related Containers. Each Container constitutes a component of the Application. The entity Registry represents a repository of Images where DevOps organise, curate and share resource deployment knowledge. The entity Hosting-Machine represents the location where a Container is hosted (e.g., a VM or physical machine). A Cluster represents a set of Hosting-Machines; this reduces the overhead of dynamically managing multiple machines. For example, the Cluster may automatically decide which Hosting-Machine will be chosen to deploy a given container based on an optimization algorithm [179].

12 https://docs.docker.com/reference/builder/
13 https://docs.docker.com/compose/reference/

Having identified these main entities, we derive (a) the attributes that characterize each entity and (b) the relationships among entities. For example, a Hosting-Machine in Docker is identified using the FQDN (Fully Qualified Domain Name) of a VM or physical machine. The relationship between a Hosting-Machine and a Container is Deployment. The Containment relationship defines the hierarchical organization of entities. For example, Containment relationships exist between a Container and its related Application, and between a Hosting-Machine and its related Cluster. Likewise, we derive all the relationships available between the entities.

We then extract the actions offered by the tool (refer to Figure 3.5). For example, Docker exposes actions like create, start, stop, pause and delete to manipulate Containers. Docker offers similar actions to manipulate the other entities and relationships.

We then extract the basic events supported by the tool, if any. For example, Docker includes events14 such as @Created, @Started, @Stopped, @Paused, @Running and @Killed to detect the runtime state of Containers. We then specify additional events that are not directly supported by the tool but are required by the Connector for resource management. For example, we may specify a periodic event that includes the memory usage data of a particular Container. In addition, we may specify composite events based on previously extracted events using an existing event-pattern specification language (e.g., Esper EPL). For example, we may specify a composite event which is triggered if the memory usage of a Container exceeds 95% and the Container is then killed, in order to identify Containers that crash due to a shortage of available memory.

14 https://docs.docker.com/engine/reference/commandline/events/

[Figure: the Docker Domain-specific Model, comprising a Resource Description Model (entities Container, Image, Registry, Hosting-Machine, Application and Cluster, linked by relationships such as Base-Image, Link, Volume, Deployment, Containment and Distribution) and a Resource Management Model (actions such as createContainer, startContainer, stopContainer, pauseContainer, deleteContainer, createImage, pullImage, pushImage, deleteImage, createLink/Volume, deleteLink/Volume, createRegistry and deleteRegistry; events such as Container-Created, Container-Started, Container-Stopped, Container-Paused and Container-Running).]

Figure 3.5: Domain-specific Model for Docker
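The composite event just described (memory above 95% followed by a kill) can be sketched without an EPL engine as a small pattern matcher over an event stream. The event names mirror Docker's @Killed event, while the matching logic is our own illustrative simplification of what a language like Esper EPL would express.

```python
# Illustrative composite-event matcher: flag Containers killed after their
# memory usage exceeded 95% (a simplification of an Esper EPL pattern).
def crashed_by_memory(stream, threshold=95.0):
    hot = set()        # containers recently seen above the memory threshold
    crashed = []
    for ev in stream:
        if ev["type"] == "memory-usage" and ev["percent"] > threshold:
            hot.add(ev["container"])
        elif ev["type"] == "@Killed" and ev["container"] in hot:
            crashed.append(ev["container"])
    return crashed

stream = [
    {"type": "memory-usage", "container": "web-1", "percent": 97.2},
    {"type": "memory-usage", "container": "db-1", "percent": 40.0},
    {"type": "@Killed", "container": "web-1"},
    {"type": "@Killed", "container": "db-1"},
]
print(crashed_by_memory(stream))  # → ['web-1']
```

A production rule would additionally bound the time window between the two events, which an EPL engine handles natively.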

Code 3.2 represents a JSON-based cloud resource configuration which is derived from the Docker-based Domain-specific Model in Figure 3.5. The JSON-based configuration represents a 3-tier web application. In this case, business logic is executed using the Business Process Execution Language (BPEL), with state data stored in a MySQL DB. For scaling purposes, we introduce an Nginx Load Balancer that propagates requests to a cluster of Apache Orchestration Director Engine (ODE) Servers. The JSON-based configuration essentially includes arrays of Entities (refer to line 2), Relationships (refer to line 33) and Rules (refer to line 50). Rules are described based on the events and actions exposed by the Docker-based Domain-specific Model. In addition to the events specified in the Domain-specific Model, DevOps may specify complex events in Rules using an existing event-pattern description language. The Connector related to Docker generates native resource descriptions and management scripts in Docker (i.e., Docker Images, Containers, Dockerfiles15 and shell command scripts) by consuming the JSON-based configuration. The generation logic is illustrated in Section 3.4.5.

Code 3.2: Sample object of Docker-based Domain-specific Model

1  {
2    "Entities": [            <- Entity Configurations
3      {
4        "name": "ODE-Server-1",
5        "schema": "docker.rest.Container",
6        "state": "run",
7        "port-binding-rules": "8088:80",
8        "cpu": "2",
9        "memory": "2048",
10       "optional-attributes": {
11         ...
12       }
13     },
14     { "name": "Ubuntu-OS", "schema": "docker.rest.Image", ... },
15     { "name": "MySQL-DB-Server", "schema": "docker.rest.Image", ... },
16     { "name": "Java7-VM", "schema": "docker.rest.Image", ... },
17     { "name": "Tomcat", "schema": "docker.rest.Image", ... },
18     { "name": "Apache-ODE", "schema": "docker.rest.Image", ... },
19     { "name": "Jenkins", "schema": "docker.rest.Image", ... },
20     { "name": "Java8-VM", "schema": "docker.rest.Image", ... },
21     { "name": "Nginx", "schema": "docker.rest.Image", ... },
22     { "name": "IR-01", "schema": "docker.rest.Registry", ... },
23     { "name": "MySQL-DB-Server-1", "schema": "docker.rest.Container", ... },
24     { "name": "ODE-Server-2", "schema": "docker.rest.Container", ... },

15 https://docs.docker.com/engine/reference/builder/

25     { "name": "ODE-Server-3", "schema": "docker.rest.Container", ... },
26     { "name": "Nginx-LB", "schema": "docker.rest.Container", ... },
27     { "name": "HM-1", "schema": "docker.rest.HostingMachine", ... },
28     { "name": "HM-2", "schema": "docker.rest.HostingMachine", ... },
29     { "name": "Cluster-Manager", "schema": "docker.rest.Container", ... },
30     { "name": "HM-0", "schema": "docker.rest.HostingMachine", ... }
31
32   ],
33   "Relationships": [       <- Relationship Configurations
34     {
35       "source-participant": "MySQL-DB-Server",
36       "target-participant": "Ubuntu-OS",
37       "schema": "docker.rest.BaseImageLink"
38     },
39     { "source-participant": "Java7-VM", "target-participant": "Ubuntu-OS", ... },
40     { "source-participant": "Nginx", "target-participant": "Ubuntu-OS", ... },
41     { "source-participant": "IR-01", "target-participant": "Ubuntu-OS", ... },
42     { "source-participant": "Tomcat", "target-participant": "Java7-VM", ... },
43     { "source-participant": "Apache-ODE", "target-participant": "Tomcat", ... },
44     { "source-participant": "Apache-ODE", "target-participant": "ODE-Server-1", ... },
45     { "source-participant": "Apache-ODE", "target-participant": "ODE-Server-2", ... },

46     { "source-participant": "Apache-ODE", "target-participant": "ODE-Server-3", ... },
47     { "source-participant": "Nginx", "target-participant": "Nginx-LB", ... },
48     ...
49   ],
50   "Rules": [               <- Rule Configurations
51     {
52       "listen-to": {
53         "event": "@Stopped",
54         "Entity-type": [ "Container" ]
55       },
56       "trigger": {
57         "action": "sendEmail($recipient, $content, $subject, $host, $port, $credentials)",
58       },
59       "map": {             <- Specifying input parameters of the action
60         "recipient": "[email protected]",
61         "subject": "Stopped Container/Application: $event.resourceID",
62         "content": "$event.data-type: $event.data-value",
63         "host": "smtp.gmail.com",
64         "port": "587",
65         "credentials": "[email protected]/********"
66       }
67     },
68     ...
69   ]
70 }

3.4.3 Juju-based Domain-specific Model

Juju comprises cloud resource management entities which are required for application deployment over Virtual Machines. Juju enables DevOps to model, configure and deploy all the components of an application, and to scale those components on request. By analyzing the Juju language documentation16, we identified six key resource description entities and seven relationships (refer to Figure 3.6).

For clarity, we divide the resource description entities in Juju into three layers: (1) Resource Description Layer, (2) Service Layer and (3) Hosting Layer. The Resource Description Layer includes two entities that model each component of an application. The central entity, Charm, includes the knowledge to configure, deploy and scale an application component (e.g., a MySQL database server, a Java Virtual Machine). Interface allows DevOps to create relationships among Charms. The Service Layer includes two entities which represent the runtime state of application components deployed via Charms. A deployed application component is represented as a Service. Once deployed, a Service may be scaled into multiple instances (e.g., nodes of a MySQL database cluster). Each instance is represented as a Service Unit, which belongs to a particular Service. The Hosting Layer includes two key entities: (1) Provider and (2) Machine. Machine represents a computer system where Service Units are hosted (e.g., a Virtual Machine (VM) or a physical machine). Provider represents a software infrastructure (e.g., AWS, Rackspace) that is capable of provisioning Machines on demand.
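As a concrete illustration of the Resource Description Layer, a Charm declares its identity and the Interfaces it offers or consumes in a metadata.yaml file. The following is a hedged sketch in the Juju 1.x charm metadata format; the charm name and relation names are illustrative, not taken from the thesis.

```yaml
# Illustrative Charm metadata (Juju 1.x format); names are hypothetical.
name: node-app
summary: A Node.js web application component
maintainer: Example Curator <[email protected]>
description: |
  Deploys and scales a Node.js application as a Juju Service.
provides:
  website:
    interface: http        # other Charms may relate to this HTTP endpoint
requires:
  database:
    interface: mongodb     # expects a relation to a MongoDB Charm
```

The provides/requires sections correspond to the Interface entity in the Resource Description Layer: they are the points at which relationships among Charms are established.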

We then derive (a) relationships among entities and (b) attributes that characterize each entity. For example, a Machine in Juju is described using attributes such as processor architecture (e.g., 64-bit, 32-bit), the number of CPU cores, the amount of memory and the operating system (e.g., Ubuntu 14.04). We then extract the events and actions offered by Juju (refer to Figure 3.6). For example, Juju exposes actions such as deploy, expose, update, scale and remove Services. Juju includes events such as config-changed, update-charm and interface-changed to detect the runtime state of Charms. We may specify necessary composite events as well.

16 https://jujucharms.com/docs/1.18/getting-started

[Figure 3.6 (diagram): the Juju-based Domain-specific Model, divided into the Resource Description Layer (Charm, Interface), Service Layer and Hosting Layer (Machine, Provider), with entity attributes, relationships, event types (e.g., config-changed, interface-changed) and action types (e.g., deployService, addUnit); the left portion forms the Resource Description Model and the right portion the Resource Management Model.]

Figure 3.6: Juju-based Domain-specific Model

Code 3.3 represents a JSON-based cloud resource configuration which is derived from the Juju-based Domain-specific Model in Figure 3.6. The JSON-based configuration includes arrays of Entities (refer to line 2), Relationships (refer to line 22) and Rules (refer to line 38). It represents a Node.js17 web application whose application-related data are stored in a MongoDB18 database. The Connector associated with the Juju-based Domain-specific Model generates native resource descriptions and management scripts in Juju (i.e., a collection of YAML configurations and Shell command scripts) by consuming the JSON-based configuration.

17 https://nodejs.org/en/
18 https://www.mongodb.org/
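To suggest what such generated YAML artifacts may look like, the sketch below uses the pre-2.0 Juju bundle format to express the same deployment: two services related over an interface, with units provisioned on the configured Provider. The charm and relation names are assumptions for illustration, not output of the actual Connector.

```yaml
# Illustrative sketch of a generated Juju (1.x) bundle; names are assumed.
envExport:
  services:
    nodejs:
      charm: "cs:trusty/node-app"   # hypothetical charm
      num_units: 1
      expose: true
    mongodb:
      charm: "cs:trusty/mongodb"
      num_units: 1
  relations:
    - ["nodejs:database", "mongodb:database"]
```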

Code 3.3: Sample object of Juju-based Domain-specific Model
 1 {
 2   "Entities":[            <- Entity Configurations
 3     {
 4       "name": "NodeJS",
 5       "schema": "juju.rest.Service",
 6       "exposed": "true",
 7       "optional-attributes": {
 8         ...
 9       }
10     },
11     { "name": "MongoDB", "schema": "juju.rest.Service", ...},
12     { "name": "NodeJS-1", "schema": "juju.rest.ServiceUnit", ...},
13     { "name": "MongoDB-1", "schema": "juju.rest.ServiceUnit", ...},
14     { "name": "NodeJS-Charm", "schema": "juju.rest.Charm", ...},
15     { "name": "MongoDB-Charm", "schema": "juju.rest.Charm", ...},
16     { "name": "http-interface", "schema": "juju.rest.Interface", ...},
17     { "name": "mongodb-interface", "schema": "juju.rest.Interface", ...},
18     { "name": "AWS", "schema": "juju.rest.AWS-Provider", ...},
19     { "name": "VM-1", "schema": "juju.rest.Machine", ...},
20     { "name": "VM-2", "schema": "juju.rest.Machine", ...}
21   ],
22   "Relationships":[       <- Relationship Configurations
23     {
24       "source-participant": "NodeJS-1",
25       "target-participant": "AWS",
26       "schema": "juju.rest.HostingLink"
27     },
28     { "source-participant": "NodeJS-1", "target-participant": "NodeJS", ...},
29
30     { "source-participant": "AWS", "target-participant": "VM-1", ...},
31     { "source-participant": "AWS", "target-participant": "VM-2", ...},
32     { "source-participant": "MongoDB-1", "target-participant": "AWS", ...},
33     { "source-participant": "NodeJS", "target-participant": "NodeJS-Charm", ...},
34     { "source-participant": "MongoDB", "target-participant": "MongoDB-Charm", ...},
35     { "source-participant": "NodeJS-Charm", "target-participant": "http-interface", ...},
36     { "source-participant": "MongoDB-Charm", "target-participant": "mongodb-interface", ...}
37   ],
38   "Rules":[               <- Rule Configurations
39     {
40       "listen-to":{
41         "event": "@Started",
42         "Entity-type":[ "Charm"]
43       },
44       "trigger":{
45         "action": "addUnit($charmName)"
46       },
47       "map":{             <- Specifying input parameters of the action
48         "charmName": "$event.charmName"
49       }
50     },
51     ...
52   ]
53 }

3.4.4 Representing Federated cloud resources using Domain-specific Models

Once tool-specific Domain-specific Models are derived, they can be used in conjunction to specify federated cloud resources. A federated Domain-specific Model includes the necessary relationship types which associate entities represented using Docker with entities represented using Juju, and vice-versa. The Domain-specific Model may specify complex events that combine events from Docker and Juju. The action model may include basic actions and processes. To implement processes (e.g., deployment and reconfiguration workflows), we reuse the language and runtime in Chapter 4. These workflows include tasks that trigger operations in the Connectors related to Docker and Juju for deploying and reconfiguring the required federated cloud resources. To understand the structure of a federated cloud resource configuration, consider a Web application stack which is described in Docker and deployed on the HP public cloud19. Since Docker does not have inbuilt support for provisioning VMs on HP cloud, we need to use another tool such as Juju in conjunction to fully automate the deployment of the Web application.

Code 3.4 represents the JSON-based cloud resource configuration of the aforementioned federated cloud resource. The Entities section includes two objects that represent the web application components named Node-Engine-1 and MySQL-DB-Server-1 (refer to lines 4 and 32), described using the Docker-based Domain-specific Model. The remaining three objects in the Entities section are described using the Juju-based Domain-specific Model. These three objects represent two VMs (refer to lines 15 and 33) and their provider, which is the HP cloud service (refer to line 24). The Relationships section includes four relationship objects. The first two objects (refer to lines 36 and 42) denote relationships between the application components and

19 http://www.hpcloud.com/

their target VMs. The type of those relationships is specified as federated.docker.juju.DeploymentLink, which is defined in the federated Domain-specific Model instead of the Docker-based or Juju-based Domain-specific Models. The remaining two relationship objects (refer to lines 46 and 51) denote that the VMs should be deployed on the HP cloud service. The Rules section (refer to line 57) includes an action that triggers a workflow which is specified in an external file.

Code 3.4: Federated cloud resource configuration
 1 {
 2   "Entities":[
 3     {                     <- Entity from Docker
 4       "name": "Node-Engine-1",
 5       "schema": "docker.rest.Container",
 6       "state": "run",
 7       "port-binding-rules": "8088:80",
 8       "cpu": "2",
 9       "memory": "2048",
10       "optional-attributes":{
11         ...
12       }
13     },
14     {                     <- Entity from Juju
15       "name": "HM-1",
16       "schema": "juju.rest.Machine",
17       "arch": "64bit",
18       "cpu-cores": "4",
19       "mem": "4048",
20       "root-disk": "",
21       "OS": "Ubuntu/14.04"
22     },
23     {
24       "name": "HP-Provider-1",
25       "schema": "juju.rest.Provider",
26       "type": "HP",
27       "tenant-mode": "false",
28       "auth-code": "*******",
29       "access-key": "*******",
30       "secret-key": "*******"
31     },
32     { "name": "MySQL-DB-Server-1", "schema": "docker.rest.Container", ...},
33     { "name": "HM-2", "schema": "juju.rest.Machine", ...}
34   ],
35   "Relationships":[
36     {                     <- A federated relationship
37       "source-participant": "Node-Engine-1",
38       "target-participant": "HM-1",
39       "schema": "federated.docker.juju.DeploymentLink"
40     },
41     {
42       "source-participant": "MySQL-DB-Server-1",
43       "target-participant": "HM-2",
44       "schema": "federated.docker.juju.DeploymentLink"
45     },
46     {
47       "source-participant": "HP-Provider-1",
48       "target-participant": "HM-1",
49       "schema": "juju.rest.ProvisionLink"
50     },
51     {
52       "source-participant": "HP-Provider-1",
53       "target-participant": "HM-2",
54       "schema": "juju.rest.ProvisionLink"
55     }
56   ],
57   "Rules":[               <- Rule Configurations
58     {
59       "listen-to":{
60         "event": null
61       },
62       "trigger":{
63         "action": "invokeWorkflow('file:///processes/deployment-workflow.bpmnx')"
64       },
65       "map":{}
66     },
67     ...
68   ]
69 }

3.4.5 Connectors

Once a Domain-specific Model is described, curators may then implement a Connector that serves to bridge the Domain-specific Model with the interface of the particular Configuration and Management (C&M) tool. In our system, we define a generic programmable interface (refer to Table 3.2) for DevOps to implement Connectors. The proposed interface includes four operations which must be implemented. Additionally, a Connector can have any number of operations that implement the actions specified in the relevant Domain-specific Model. For example, the Connector for Docker has a method called createContainer which: (a) accepts the name of an Image; (b) prepares the Hosting-Machine to deploy a Container; and (c) invokes the docker run command in the Docker CLI20 along with the Image.
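As a minimal sketch, the four mandatory operations of that interface could be declared as follows. The return type is simplified here to String (the framework itself uses JAX-RS Response objects, as Code 3.5 shows), so this illustrates the shape of the interface rather than its exact definition.

```java
// Sketch of the generic programmable interface for Connectors.
// Operation names follow the four mandatory operations discussed above;
// the String return type is a simplification of the real Response type.
public interface Connector {
    String init(String resourceDescription);            // prepare/validate a resource
    String deploy(String resourceDescription);          // create the native resource
    String undeploy(String resourceID);                 // tear the resource down
    String control(String resourceID, String action);   // invoke a runtime action
}
```

A tool-specific Connector then implements these four operations plus any number of action-specific operations.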

Code 3.5 represents a Java-based implementation of the Connector for the Docker-based Domain-specific Model in Figure 3.5. For the sake of brevity and clarity, only the operation signatures are illustrated, as Java syntax is heavily verbose. We include a more detailed implementation of a Connector in Appendix C. The mandatory operations are implemented in lines 13 to 16. The actions defined in the Domain-specific Model are implemented in lines 22 to 40.

20 https://docs.docker.com/reference/commandline/cli/

Code 3.5: Java-based Connector for Docker-based Domain-specific Model
 1 package au.org.unsw.cse.soc.cloudbase.connectors;
 2
 3 import javax.ws.rs.PathParam;
 4 import javax.ws.rs.core.Response;
 5 ...
 6
 7 @namespace("docker.rest")
 8 public class DockerConnector implements Connector {
 9     /**
10      * Implementation of Mandatory operations
11      */
12
13     public Response init(String resourceDescription) {...}
14     public Response deploy(String resourceDescription) {...}
15     public Response undeploy(String resourceID) {...}
16     public Response control(String resourceID, Action action) {...}
17
18     /**
19      * Implementation of actions specified in Domain-specific Model
20      */
21
22     public Response createContainer(String imageName) {...}
23     public Response startContainer(String containerName) {...}
24     public Response stopContainer(String containerName) {...}
25     public Response pauseContainer(String containerName) {...}
26     public Response deleteContainer(String containerName) {...}
27     public Response createRegistry(String registryDescription) {...}
28     public Response deleteRegistry(String repoID) {...}
29     public Response createImage(String dockerFile) {...}
30     public Response pullImage(String imageName) {...}
31     public Response pushImage(String imageName) {...}
32     public Response deleteImage(String imageName) {...}
33     ...
34
35     /**
36      * monitoring operations
37      */
38     public Response increaseMemory(String containerName, String amount) {...}
39     public Response decreaseMemory(String containerName, String amount) {...}
40     ...
41 }

Once both the Domain-specific Model and the Connector are registered in our framework, DevOps are able to create cloud resources and, moreover, implement management processes/rules based on the Domain-specific Model.

Finally, although we demonstrate our methodology for the two specific frameworks chosen above, it could equally be applied to derive the Domain-specific Models and Connectors for any cloud resource C&M tool in our framework.

Using the Docker Connector to generate native resource management artifacts from Domain-specific Models:

The transformation logic that generates native resource descriptions from instances of the Domain-specific Models is tightly coupled to the particular cloud resource C&M tool. Below we describe the approach for transforming instances of the Docker-based Domain-specific Model. Figure 3.7 illustrates the translation process from high-level JSON representations into native, low-level resource descriptions and management scripts.

The bottom portion of Figure 3.7 depicts the two types of files, named Dockerfile and build.sh, that are generated for each Image instance (named Image-Nodejs and Image-MySQL) in the top portion of Figure 3.7. A Dockerfile is a script that includes the low-level configuration commands required to generate a concrete Image in the Docker runtime. DevOps, who implement the Connector, derive these configuration commands from the script attribute of an Image instance and the available relationships with other Image instances (refer to Figure 3.5).

The file build.sh is generated based on a sequence of commands which (1) read the Dockerfile, (2) generate a concrete Image, (3) upload the generated concrete Image to a specified Registry (i.e., Registry-1 in Figure 3.7), and (4) create a concrete Container from the concrete Image in a specified Hosting-Machine (i.e., HostingMachine-1 in Figure 3.7). In addition, build.sh may include commands to instantiate relationships (i.e., Links and Volumes in Figure 3.5) between dependent concrete Containers. The transformation logic extracts the required input data for these commands from attributes specified within the instances of the Docker-based Domain-specific Model.
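To make step (4) concrete, the sketch below shows, in deliberately simplified form (this is not the thesis implementation), how a Connector might assemble a docker run command from a Container entity's attributes. The attribute keys mirror those used in the JSON configurations above, while the image key and the exact flag mapping are assumptions for illustration.

```java
import java.util.Map;

// Illustrative sketch: deriving a `docker run` command string from a
// Container entity's attributes (name, port bindings, memory, image).
public class DockerCommandBuilder {
    public static String runCommand(Map<String, String> entity) {
        StringBuilder cmd = new StringBuilder("docker run -d");
        cmd.append(" --name ").append(entity.get("name"));
        if (entity.containsKey("port-binding-rules")) {
            cmd.append(" -p ").append(entity.get("port-binding-rules"));
        }
        if (entity.containsKey("memory")) {
            // assume the attribute is given in megabytes
            cmd.append(" -m ").append(entity.get("memory")).append("m");
        }
        cmd.append(" ").append(entity.get("image"));
        return cmd.toString();
    }
}
```

A real Connector would additionally emit the ssh, docker build and docker push steps around this command, as build.sh in Figure 3.7 shows.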

When it is required to reuse an existing Image from a Docker Registry, instead of constructing one from scratch, the Image instance is modeled without a script attribute. In such situations, the transformation logic does not generate a Dockerfile, but includes additional commands within build.sh to download the existing concrete Image and generate a concrete Container from it.

In addition to translating high-level resource representations into native resource descriptions and management scripts, Connectors generate low-level API calls when executing management rules. For example, a management rule that creates new Containers to handle increasing load is depicted on the left-hand side of Figure 3.8. The right-hand side depicts a sequence of generated API calls which (a) log in to a particular Hosting-Machine, (b) create an Image (if one does not already exist), and (c) create and start a Container. The issues of event specification and detection for management rules are explained in Section 3.5.

3.5 Implementation

Our system is implemented as a set of software modules that are described below and shown in Figure 3.9. These modules communicate with one another.

High-level JSON representations:

{
  "Entities": [
    { "name": "Container-Nodejs", ...},
    { "name": "Container-MySQL", ...},
    { "name": "Image-Nodejs", ...},
    { "name": "Image-MySQL", ...},
    { "name": "Registry-1", ...},
    { "name": "HostingMachine-1", ...}
  ],
  "Relationships": [
    { "source": "Container-Nodejs", "target": "Container-MySQL"},
    { "source": "Image-Nodejs", "target": "Container-Nodejs"},
    { "source": "Image-MySQL", "target": "Container-MySQL"},
    ...
  ],
  "Rules": [
    { "event": "@Stopped-Container", "action": "sendEmail(...)"}
  ]
}

        |  Translation Logic of the Connector for Docker
        v

Native resource management artifacts:

/(root)
  Docker/
    MySQL/
      Dockerfile
      build.sh
    Nodejs/
      Dockerfile
      build.sh

Generated Dockerfile for Image-Nodejs:

# Set the base image to Ubuntu
FROM ubuntu

# File Author / Maintainer
MAINTAINER Denis Weerasiri

# Install Node.js and other dependencies
RUN apt-get update && \
    apt-get -y install curl && \
    curl -sL https://deb.nodesource.com/setup | sudo bash - && \
    apt-get -y install python build-essential nodejs

# Install nodemon
RUN npm install -g nodemon

# Provides cached layer for node_modules
ADD package.json /tmp/package.json
RUN cd /tmp && npm install
RUN mkdir -p /src && cp -a /tmp/node_modules /src/

# Define working directory
WORKDIR /src
ADD . /src

# Expose port
EXPOSE 8080

# Run app using nodemon
CMD ["nodemon", "/src/index.js"]

Generated build.sh for Image-Nodejs:

#!/bin/bash
clear
ssh [email protected]                 # log-in to the VM
cd /Docker/Nodejs/
sudo service docker start
docker build -t ddwee/Nodejs .              # generate Image
docker login -u ddwee -p ***
docker push ddwee/Nodejs                    # store Image in Registry
docker run -d --name Nodejs-1 -p 8080 --link mysql:mysql ddwee/Nodejs  # run the Container

Figure 3.7: Technical translation of Domain-specific Models (per each Image instance)

[Figure 3.8 (diagram): a management rule on the left-hand side is translated by the Connector into the following sequence of low-level API calls on the right-hand side.]

ssh [email protected]                 # log-in to the VM
if docker history -q nodejs 2>&1 >/dev/null; then
    echo "nodejs image exists"
    docker run -d --name Nodejs-1 -p 8080 --link mysql:mysql ddwee/Nodejs  # run the Container
else
    echo "nodejs image does not exist"
    cd /Docker/Nodejs/
    docker build -t ddwee/Nodejs .          # generate Image
    docker login -u ddwee -p ***
    docker push ddwee/Nodejs                # store Image in Registry
    docker run -d --name Nodejs-1 -p 8080 --link mysql:mysql ddwee/Nodejs  # run the Container
fi

Figure 3.8: Technical translation of a high-level action into low-level API calls

[Figure 3.9 (diagram): DSM Curators and Connector Curators interact with the Front-end Interface (DSM Editor and Command-line tool), which creates/reads/updates/deletes Domain-specific Models (DSMs) in the JSON Object Store and triggers actions. The Event Management System (backed by the Events DB) feeds management events to the ECA Rule Processor (backed by the Rules DB), which subscribes to events and triggers actions on Connectors (Connector-1, Connector-2, Connector-3); Connectors invoke operations on, and receive events from, the underlying Cloud Resource Management Tools.]

Figure 3.9: Internal Architecture

3.5.1 Connector Curators

Connector curators implement Connectors by providing the necessary business logic for each of the operations of the interface defined in Table 3.2. Connectors are deployed as RESTful services which are exposed via the ServiceBus API (based on our previous work [23]). Further details about implementing Connectors are explained in Section 3.4.5. We describe an implementation of a Connector in Appendix C.

3.5.2 DSM Curators

DSM curators create, read, update and delete Domain-specific Models (DSMs). The Front-end Interface includes the DSM Editor, which allows curators to graphically specify the structure of entities, relationships, actions and events of Domain-specific Models. In addition, curators may specify complex or high-level events using Esper EPL21. The main interfaces of our graphical editor for specifying entities and relationships are depicted in Figures 3.10 and 3.11, respectively. In our current implementation, we serialize the specified Entity and Relationship schemas as JSON descriptions which are compliant with the JSON-Schema specification [107]. Accordingly, the DSM Editor produces JSON schemas once curators specify the structure of entities, relationships, actions and events. We reuse a JavaScript library named JSON Schema Based Editor22 for the generation and verification of JSON schemas. The generated JSON schemas result in a serializable Domain-specific Model of a particular C&M tool, as previously illustrated (refer to Figures 3.5 and 3.6).

The Entity-Schema and Relationship-Schema require curators to provide mandatory attributes (i.e., name, version, author, associated tool) for book-keeping and curation tasks of Domain-specific Models. The Entity-Schema allows defining any number of arbitrary attributes under its Properties section. Figure 3.12 depicts the resultant schema objects for Container and Image.
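For illustration, a serialized Entity-Schema along these lines might look as follows. This is a hedged sketch based on the editor fields shown in Figure 3.12; the exact key names used by the framework's serialization are assumptions.

```json
{
  "name": "Container-Entity",
  "version": "1.0",
  "author": "John-Smith",
  "description": "represent Docker Containers",
  "associated-tool": "Docker-CLI-1.3.2",
  "properties": {
    "name":              { "type": "string", "required": true,
                           "description": "container name" },
    "state":             { "type": "string" },
    "port-binding-rule": { "type": "string" },
    "cpu":               { "type": "integer" },
    "memory":            { "type": "integer" }
  }
}
```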

The Relationship-Schema requires curators to specify the two participating entities, the role of each participating entity, and cardinality constraints (i.e., the minimum and maximum number of Entity objects that may join the relationship). For example, the Instantiation relationship in Figure 3.5 is established between a Container and an Image. The roles of Image and Container are instantiate and instantiated-by, respectively. The cardinality constraint of Image is one-to-infinite, as an Image can instantiate any number of Containers. The cardinality constraint of Container is one-to-one, as a given Container can only be instantiated by a particular Image. Figure 3.13 depicts the resultant schema object for the Instantiation relationship. Similarly, curators may collaboratively construct all the required JSON objects of Entity and Relationship schemas for a particular Domain-specific Model.

21 http://esper.codehaus.org
22 https://github.com/jdorn/json-editor

[Figures 3.10 and 3.11 (screenshots): form-based editors with fields for Name, Version, Author, Description and Associated-Tool; the Entity-Schema Editor additionally captures typed Properties (type, default value, required flag, description, minimum value), while the Relationship-Schema Editor captures the participating entities, their roles and cardinalities.]

Figure 3.10: Entity-Schema Editor
Figure 3.11: Relationship-Schema Editor

DevOps, who configure cloud resources, may create, read, update and delete JSON-based resource descriptions (refer to Code 3.2 and 3.3 for examples). In addition, DevOps may invoke management actions (e.g., deploy, start, delete) against the JSON-based resource descriptions. The Front-end Interface of the system includes the Command-line tool, which enables DevOps to invoke the management actions defined in Domain-specific Models. For example, DevOps may invoke the init and deployContainer actions against a particular JSON-based resource description to deploy a Container in the Docker runtime (refer to Code 3.6). The Command-line tool interprets those actions, locates an appropriate Connector and triggers the operation exposed by the Connector with the necessary input parameters. In addition, DevOps may specify Event-Condition-Action (ECA) rules based on the events and actions specified in the Domain-specific Model.

[Figure 3.12 (screenshots): the resultant Container-Entity and Image-Entity schema objects, each with Name, Version, Author, Description and Associated-Tool fields; Container-Entity lists the properties name, state, port-binding-rule, cpu and memory, while Image-Entity lists name, version and script.]

Figure 3.12: Container and Image Entity in Docker-based Domain-specific Model

Figure 3.13: Instantiation Relationship in Docker-based Domain-specific Model

Code 3.6: CLI command to deploy a container in Docker

cd ~/base-git-repo/node-app-1   # location of the JSON-based resource description
cloudbase docker.rest -action=init
cloudbase docker.rest -action=deployContainer \
    -input={"resource":"node-engine.json"}

Our implementation includes the JSON Object Store, a Git23 repository to store and share Domain-specific Models and their objects as JSON files. The DSM Editor and Command-line tool are configured to store JSON files in the JSON Object Store. Related cloud resource descriptions (e.g., a collection of files that represent a web application) are organized into separate folders within the repository. The JSON Object Store keeps multiple versions of JSON files and triggers events when certain modifications occur (e.g., storing, updating or deleting cloud resource descriptions). These features are very useful for dynamically reconfiguring cloud resources and rolling back to a previous stable configuration if an error occurs.
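A minimal sketch of that rollback behaviour, using plain Git commands on a throwaway repository (the file name and its contents are illustrative, not the framework's actual layout):

```shell
# Illustrative only: roll a JSON resource description back to its
# previous committed (stable) version, as the JSON Object Store would.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "[email protected]" && git config user.name "DevOps"
echo '{"state":"run"}' > node-engine.json
git add . && git commit -qm "stable configuration"
echo '{"state":"broken"}' > node-engine.json
git add . && git commit -qm "faulty reconfiguration"
git checkout -q HEAD~1 -- node-engine.json   # roll back the description
cat node-engine.json                         # -> {"state":"run"}
```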

3.5.3 Event Management System

The Event Management System collects and processes lower-level monitoring events (e.g., (re)starting, CPU usage and memory usage events from Containers and VMs) from different cloud services (e.g., Docker, AWS) and generates higher-level events for users. Based on the Event Model specified in a Domain-specific Model, we determine the types of events that can be detected. For example, ODE-Server-1 includes events such as Created, Started, Stopped, Paused and Running.

Two kinds of event collection mechanisms are implemented: polling and pushing.

23 https://git-scm.com/

We use Fluentd24 along with the low-level cloud resource C&M APIs (e.g., the Docker Remote API25 and Docker Compose CLI26) to poll and extract JSON-based events related to state changes of Containers in the Docker-based Domain-specific Model; we assume access credentials are supplied in advance by the user. Code 3.7 represents a sample event received from the Docker Remote API to notify the starting of a Container named "ODE-Server-1". For extracting push-based events we leverage the Apache Camel27 framework. For specifying, processing and generating high-level events we use Esper EPL. For instance, DSM curators may use Esper EPL to define a high-level event that carries only the attributes required from a low-level event. High-level events also enable defining events based on a series of low-level events. For example, Esper may trigger an event named CPULoad-High for a particular Application in Docker if the CPU usage of each Container of the Application is over 95%.
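Such a high-level event could be expressed in Esper EPL roughly as follows; the stream and attribute names are hypothetical, so this is a sketch of the style rather than a statement from our actual rule base.

```
// Hypothetical EPL sketch: raise CPULoadHigh when the minimum CPU usage
// across an Application's Containers over the last 30 seconds exceeds 95%
// (i.e., every Container of the Application is above the threshold).
insert into CPULoadHigh
select applicationId, min(cpuPercent) as minLoad
from ContainerUsage.win:time(30 sec)
group by applicationId
having min(cpuPercent) > 95
```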

Code 3.7: Sample JSON-based event from Docker

{
  "log": "@Started",
  "container_id": "387ee161310fgh58hdfg674fg6c504e3",
  "container_name": "/ODE-Server-1",
  "source": "mysql-events-table-for-docker"
}

Events are archived and indexed in a single MySQL database table, the Events DB. Each table entry includes the ID of the Resource, a timestamp, a data-type (e.g., CPU or memory usage, state change) and a data-value. The Event Management System also implements a Java-based event publishing channel to which interested consumers (e.g., rule engines) can subscribe. Once subscribed, listeners start receiving notifications when there are new table entries in the Events DB.

24 http://www.fluentd.org/
25 http://docs.docker.com/reference/api/docker_remote_api/
26 https://docs.docker.com/compose/reference/
27 http://camel.apache.org/
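The Events DB table described above might be declared as follows; this DDL is a hedged sketch, as the thesis does not give the exact column names or types.

```sql
-- Illustrative MySQL schema for the Events DB (column names are assumptions).
CREATE TABLE events (
    id          BIGINT AUTO_INCREMENT PRIMARY KEY,
    resource_id VARCHAR(255) NOT NULL,   -- ID of the Resource
    ts          TIMESTAMP    NOT NULL,   -- time the event occurred
    data_type   VARCHAR(64)  NOT NULL,   -- e.g., 'cpu-usage', 'state-change'
    data_value  VARCHAR(255) NOT NULL,   -- e.g., '97', '@Stopped'
    INDEX idx_resource_ts (resource_id, ts)
);
```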

3.5.4 Rule Processor

To enable automation, DevOps may also supply simple reactive rules, for example, if @Stopped then #notify, which implies that if a Container or VM stops, some notification action is performed. To greatly simplify the way rules can be defined, we reuse a simple rule-definition language adapted from our previous work [191]. In that previous work, we assumed a "Knowledge-driven" approach, which means the APIs and their constituents (i.e., operations, input/output types) of the orchestration tools are loaded into a knowledge-base. This makes it possible to write high-level rule definitions and translate them into concrete actions.

Code 3.8 illustrates a rule that sends an email notification to the user if a Container or a Hosting-Machine stops. The listen-to section (lines 3-6) specifies which events to subscribe to, and from which Entities. The trigger section (lines 7-9) describes the action to be invoked. The map section (lines 10-17) describes the input parameters required to invoke the action.

Code 3.8: Sample rule to notify state changes of Containers and Applications
 1 "Rules":[
 2   {
 3     "listen-to":{
 4       "event": "@Stopped",
 5       "Entity-type":[ "Container-Entity", "Hosting-Machine-Entity"]
 6     },
 7     "trigger":{
 8       "action": "sendEmail($recipient, $content, $subject, $host, $port, $credentials)"
 9     },
10     "map":{
11       "recipient": "[email protected]",
12       "subject": "Stopped Container/Hosting-Machine: $event.resourceID",
13       "content": "$event.data-type: $event.data-value",
14       "host": "smtp.gmail.com",
15       "port": "587",
16       "credentials": "[email protected]/********"
17     }
18   }
19 ]

The Rule Processor subscribes to the event publishing channel of the Event Management System and begins listening for events which are stored in the Events DB. When a new event is received from the Event Management System, the Rule Processor checks the event against the rules defined within JSON-based resource descriptions in the JSON Object Store. If the event matches any rule, the corresponding action in the rule is invoked, finishing the rule-triggering process. During the execution of the action, the Rule Processor looks up the required input parameters and sends a request to the relevant Connector, which automatically translates the high-level action into native, low-level operations. In the case of federated cloud resources, DevOps may specify BPMN-based workflows instead of the simple actions implemented in Connectors. The language and runtime of those BPMN-based workflows are illustrated in Chapter 4.
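The matching step can be sketched as follows. This is a minimal illustration (not the thesis implementation), assuming an event fires a rule when both the event name and the emitting Entity's type appear in the rule's listen-to section.

```java
import java.util.List;
import java.util.Set;

// Minimal sketch of the Rule Processor's matching step: return the
// action of the first rule whose listen-to section matches the event.
public class RuleMatcher {
    public static class Rule {
        public final String event;            // e.g., "@Stopped"
        public final Set<String> entityTypes; // e.g., {"Container-Entity"}
        public final String action;           // e.g., "sendEmail(...)"
        public Rule(String event, Set<String> entityTypes, String action) {
            this.event = event;
            this.entityTypes = entityTypes;
            this.action = action;
        }
    }

    /** Returns the matched rule's action, or null if no rule matches. */
    public static String match(String event, String entityType, List<Rule> rules) {
        for (Rule r : rules) {
            if (r.event.equals(event) && r.entityTypes.contains(entityType)) {
                return r.action;
            }
        }
        return null;
    }
}
```

In the actual system the returned action would then be resolved against the rule's map section and dispatched to the relevant Connector.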

The Rule Processor supports two types of rules: automated and manual. Automated rules are triggered automatically by the Rule Processor when certain events occur. Automated rules consist of Entity-related events (e.g., Hosting-Machine deployed) and actions (e.g., deploy-Container) to be invoked. Manual rules are only triggered manually by the user. Compared to automated rules, manual rules may additionally include special events and actions. These events and actions are at the disposal of DevOps to take better control over manual actions. For example, those events and actions may be beneficial for implementing and managing User Interface (UI) components associated with manual actions (e.g., an approval button to restart a stopped Container). Similarly, another rule could be defined to listen to events that denote the completion of the manual actions, and trigger certain post-processing actions (e.g., a pop-up message to notify whether the Container was restarted).

3.5.5 Use-case scenario

We demonstrate our implementation based on a use-case scenario. Consider that we would like to model and deploy a software development and distribution platform. This platform is intended for software engineers who want to manage the entire lifecycle of a project. Multiple projects can leverage this platform by simply cloning the deployment multiple times. The platform requires an AWS-EC2 VM in which a Docker Container resides. The Docker Container includes Redmine28, a project management service, and a Git client29. The Redmine service is intended to (1) extract commits from a specified source repository in GitHub via the Git client and (2) link them with relevant bug reports. In addition, an AWS-S3 bucket (i.e., a key-value storage), which acts as a software distribution repository, is required. We employ Docker and Juju for configuring and deploying these resources.

Prior to modeling and deploying the aforementioned cloud resource configuration, Connector Curators implement Connectors for Docker and Juju and deploy them as REST APIs in ServiceBus [23]. DSM Curators need to specify Domain-specific Models for Docker and Juju via the DSM Editor. DSM Curators then store those Domain-specific Models in the JSON Object Store. DSM Curators may also configure the Event Management System such that events are pulled or pushed from low-level cloud resource management tools. In addition, DSM Curators may specify high-level events to be generated using Esper EPL.

Once the required Domain-specific Models and Connectors are in place, DevOps model the aforementioned cloud resource configuration as JSON-based descriptions. DevOps deploy the specified resource descriptions via the Command-line tool. During the deployment, the Command-line tool (1) commits the JSON descriptions into the JSON Object Store, (2) generates a package of native resource descriptions, and (3) returns a unique id back to the DevOps for subsequent management operations of the deployed resource.
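The three deployment steps above can be sketched as a single function. This is a behavioural sketch under assumed names: `object_store` stands in for the JSON Object Store, and the "native package" is simulated by serializing the description.

```python
# Sketch of the Command-line tool's deploy steps (hypothetical names):
# (1) commit the JSON description, (2) generate native descriptions,
# (3) return a unique id for later management operations.
import json
import uuid

object_store = {}                      # stands in for the JSON Object Store

def deploy(description):
    doc_id = str(uuid.uuid4())         # (3) unique id returned to DevOps
    object_store[doc_id] = description # (1) commit into the JSON Object Store
    native = json.dumps(description)   # (2) stand-in for native descriptions
    return doc_id, native

rid, pkg = deploy({"resource": "AWS-EC2-VM", "image": "ubuntu"})
print(rid in object_store)  # True
```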

Once resources are deployed, they generate events. The Event Management System captures, stores and processes those events and notifies the Rule Processor. The Rule Processor triggers the necessary actions in the relevant Connectors if the notified events match any rule specified in the resource descriptions.

28 http://www.redmine.org/
29 http://git-scm.com/

3.6 Evaluation

We evaluate the effectiveness of cloud resource configuration and management using Domain-specific Models instead of low-level, heterogeneous and script-based tools. We then evaluate the cost of implementing Domain-specific Models and Connectors, which are the main contributions of this chapter. We examined the hypothesis H: the Domain-specific Model driven approach is effective for accurately configuring and deploying cloud resources. We measured the time taken to accurately complete a given configuration and deployment task. We measured the accuracy of the modeling tasks by deploying each cloud resource description and checking whether the resultant deployment complied with the initial deployment specification.

Participants with diverse levels of technical expertise were sourced. For the sake of analysis, we classified the total of 14 participants into 2 main groups: (I) Experts (8 participants), with a sophisticated understanding of cloud orchestration tools and 2-8 years of experience; and (II) Generalists (6 participants), who have average knowledge of cloud orchestration tools for day-to-day requirements, with around 1-5 years of experience.

Prior to the experiment, participants attended at least 1 out of 4 individual training sessions. During each session, we explained our tool's usage via a presentation and a hands-on session. In addition, brief descriptions of various use-cases were presented. We then provided each participant with a deployment specification of the software development and distribution platform, which we introduced in the use-case scenario in Section 3.5.5. The deployment specification only illustrated the high-level requirements; participants were expected to understand these high-level requirements and implement them using our tool. Several questions were raised and clarified accordingly.

During the experiments, participants translated the deployment specification into the relevant JSON-based resource descriptions and management rules, and stored them in the JSON Object Store. To deploy and manage the specified resource descriptions, participants invoked commands via the Command-line tool (e.g., init, deploy, control) with the necessary input parameters. For quantitative comparison purposes, we conducted the same experiment against two third-party tools: Docker and Juju. Only eight and five out of the fourteen DevOps participated in the Docker- and Juju-based experiments, respectively, because some DevOps did not have the expertise and confidence to use those tools. In addition, a total of 7 participants implemented the same deployment specification using Shell scripts to estimate an upper bound for the test results.

We allowed participants to follow any order of third-party tools during the second phase of the experiment. Hence, all participants started the second phase of the experiment using the tool with which they were most confident. To avoid any potential carry-over effect between the times taken to accurately complete the experiment using different tools, we allowed sufficient break time for participants at the end of each iteration.

3.6.1 Results, Analysis and Discussion

Evaluation of H. The hypothesis H was evaluated based on the time taken and the number of lines-of-code. Equivalently, we sought to disprove the null hypothesis H0. The hypothesis was examined by conducting a t-test with a probability threshold of 5%, assuming unequal variance.
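An unequal-variance (Welch) two-sample t-test of this kind can be recomputed from summary statistics alone. The sketch below uses only the standard library and plugs in the means, variances and sample sizes reported in Figure 3.14 for our approach and Docker; it is an illustrative re-derivation, not the original analysis script.

```python
# Welch's two-sample t-test from summary statistics (standard library only).
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return the Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    se2 = var1 / n1 + var2 / n2
    t = (mean1 - mean2) / math.sqrt(se2)
    df = se2 ** 2 / ((var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1))
    return t, df

# Our approach (mean 63, var 87.69, n = 14) vs. Docker (mean 93, var 69.42, n = 8)
t, df = welch_t(63, 87.69, 14, 93, 69.42, 8)
print(round(t, 2), round(df, 1))  # t is approximately -7.8, df approximately 16
# |t| far exceeds the two-tailed 5% critical value (~2.1), so H0 is rejected.
```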

As shown in Figure 3.14, it was pleasantly surprising that even generalists demonstrated a significant increase in efficiency (i.e., a reduction in time and lines-of-code). More specifically, the time taken to complete the task was reduced by 31% in comparison to the other approaches. Similarly, the number of lines-of-code was reduced by 37.2%. Participants reported that they strongly preferred an entity-relationship (ER) based abstraction for describing resources, as opposed to the script-based languages provided by otherwise widely adopted cloud management tools such as Docker and JuJu. DevOps confirmed this greatly helped improve their configuration and deployment time.


Time (min), all participants:

              Our Approach   Docker   JuJu    Shell Scripts
Mean          63             93       80      101
Variation     87.69          69.42    203.5   175.77
Observations  14             8        5       7

t-test (Hypothesis H): df 18; p (T ≠ t) 0.0003; t Critical two-tail 26; Reject H0: YES

Lines-of-code (#LOC): Our Approach 82; Docker 126; JuJu 113; Shell Scripts 153

Figure 3.14: Results (Time, grouped by expertise); t-test Results; and Lines-of-Code

On the other hand, our approach assumes that the appropriate Domain-specific Models and Connectors have been defined and registered. This does incur additional implementation costs; however, we argue this is typically a one-off effort for the benefit of many. Once registered, countless DevOps would benefit on many occasions. Moreover, our knowledge-driven approach implies that knowledge (such as high-level representations of cloud resource configurations) can be incrementally shared and collectively reused, which significantly improves the productivity of implementing federated management spanning multiple cloud services.

Due to the vast number of alternative tools and project-based constraints, a more exhaustive comparative experiment was outside the scope of this work. However, given the notable differences in times (a mean of 63 minutes, against 93, 80 and 101 minutes), we postulate that it is unlikely to observe fundamental differences when comparing against other tools similar to Docker or Juju. Accordingly, given our observations, the likelihood of H0 (equal mean modeling time) was well below 5%. Therefore, we could safely reject the null hypothesis and accept H.

Incurred cost of implementing Domain-specific Models and Connectors:

To measure the incurred cost of implementing our contributions, we measured the total number of lines-of-code (LOC) written by DSM Curators and Connector Curators to implement a Domain-specific Model and a Connector for Docker, excluding white space and comments. In our approach, DSM Curators and Connector Curators require 839 LOC on average to implement a Domain-specific Model and a Connector. It also takes 3-5 weeks on average to extract a Domain-specific Model by going through and understanding the relevant language specifications and documentation of a particular C&M tool. This indicates that the incurred cost of implementing our proposed contributions is significant. However, DevOps who configure and manage cloud resources do not bear the effort of developing, registering and maintaining Domain-specific Models and Connectors, since this can be done once and reused multiple times for the benefit of many. The user study thus confirms the overall improved effectiveness of employing the Domain-specific Model methodology.

3.7 Related Work

In this section we briefly explore the technological landscape and survey the resource representation aspect of cloud resource Configuration and Management (C&M) techniques, as well as interoperability concerns amongst cloud resource C&M tools. We compare and contrast our proposed approach with this related work.

Data representation languages such as YAML (YAML Ain't Markup Language), XML (Extensible Markup Language) and JSON (JavaScript Object Notation) are general-purpose languages that offer structuring mechanisms for better organization and clarity of data. These representation languages are already used to represent cloud resource configurations. For example, YAML, XML and JSON are used in Docker Compose, Plush and AWS CloudFormation, respectively [69, 3, 4]. In contrast, we reuse JSON to propose a Domain-specific language that represents concepts that are significant in cloud resource configuration and management.

Cloud resource C&M tools (e.g., Puppet, Chef, Juju, Docker, SmartFrog, AWS OpsWorks) and research initiatives provide domain-specific languages to represent and manage resources in a cloud environment [47, 87, 64, 116, 215]. These languages are either template-based or model-driven [59]. Template-based approaches (e.g., Open Virtualization Format) aggregate resources from a lower level of the cloud stack and expose the package, along with some configurability options, to a higher layer. Model-driven approaches (e.g., TOSCA [153]) define various models of the application at different levels of the cloud stack, and aim to automate the configuration and management of abstract pre-defined composite solutions on cloud infrastructure [149, 168, 131]. Our approach proposes Domain-specific Models, a methodology to extract cloud resource management entities from such model-driven and template-based C&M languages. These Domain-specific Models provide a vocabulary and constructs to build elementary and federated cloud resources, as an abstraction layer over these multiple and diverse languages.

Different cloud resource providers offer resource configuration services with different interfaces (e.g., CLIs, SDKs, REST/SOAP-based interfaces and scripts) for users who have different levels of comfort and experience with those interfaces. Methodologies for the unified representation and invocation of heterogeneous web services have been proposed in several research efforts [23, 24]. Those approaches only focus on application-level services rather than arbitrary services like CLIs, SDKs and scripts, which are quite common among cloud resource configuration and management tools. Wettinger et al. [214] propose a REST-based unified invocation API that abstracts out the different invocation mechanisms, interfaces (e.g., CLIs, scripts) and tools available for cloud resource configuration and orchestration. However, the authors have not focused on modeling orchestration processes and reconfiguration policies on top of that unified invocation API, whereas our research focuses on (1) high-level representation and invocation; and (2) modeling high-level orchestration rules.

To build federated cloud resource management solutions across heterogeneous C&M tools, we need a middleware that either: (a) defines a unified cloud resource C&M language (e.g., TOSCA, MODAClouds [153, 14]) to which every tool conforms; or (b) provides a pluggable architecture that accepts and interprets the different resource C&M models offered by any tool. The former method is not cost-effective, as it would require existing tools to undergo major architectural changes or complex model transformations to conform to a new language provided by the middleware. We thus believe the latter approach provides a more pragmatic and adaptive solution that can be integrated with a set of already existing and prevalent tools.

TOSCA is an open standard for the unified representation and orchestration of cloud resources [153]. Wettinger et al. propose a model transformation technique that generates TOSCA-based resource descriptions from resource descriptions in Chef and Juju [213]. Both Wettinger et al. and our work focus on addressing the drawbacks of heterogeneity among different Configuration and Management (C&M) tools. But our main goal is to build up a knowledge ecosystem by extracting the resource C&M models of tools and representing them in a linked data model (i.e., Domain-specific Models) such that common or related concepts across different tools can be exploited. For example, in Docker, the entity named Hosting-Machine (refer to Figure 3.5) represents a VM where Containers are deployed, although the Docker runtime cannot itself provision VMs. JuJu, on the other hand, focuses on managing a set of VMs, and can thus provision VMs. If the Domain-specific Models for both tools are linked together, we can automate the end-to-end deployment of Docker Containers on VMs provisioned by Juju.

MODAClouds [14] is another unified model-driven approach to design and manage multi-cloud applications. MODAClouds proposes four layers of models that incrementally transform the functional and non-functional requirements of applications into tool-specific resource C&M tasks. However, compared to our approach, the disadvantages of such existing unified model-driven approaches include: (a) such models potentially ignore specific features of C&M tools; and (b) they lack transformation support from high-level models to low-level resource descriptions of various management rules.

In the domain of multi-cloud application development, wrapping heterogeneous cloud resources has been researched [149] and implemented as language libraries (e.g., Apache jclouds30). However, the fact that providers furnish different offerings and change them frequently often complicates these approaches.

Konstantinous et al. [116] present a cloud resource description and deployment model that first models a resource as a provider-independent resource configuration, called the "Virtual Solution Model"; another party can then transform the provider-independent model into a provider-specific model, called the "Virtual Deployment Model". This approach only allows users to compose federated resource configurations from a single provider for a single deployment, in contrast to our approach, which considers resource federation from multiple providers as a first-class citizen. Methodologies for the unified representation and invocation of heterogeneous web services have been proposed in several research efforts [23]. Those approaches only focus on application-level service APIs (i.e., unifying the operations and input/output message schemas) rather than arbitrary cloud resources, which require a unified representation to describe resource configurations and topologies as well.

30 http://jclouds.apache.org

3.8 Conclusion and Future Work

In this chapter, we have presented a Domain-specific Model that allows high-level cloud resource representations over existing, low-level, heterogeneous and script-based C&M tools. We further proposed a pluggable architecture (i.e., Connectors), a programmable interface that allows DevOps to deploy and manage high-level cloud resource representations. Behind the scenes, Connectors translate high-level cloud resource representations and management rules into native resource descriptions and management scripts. To evaluate the effectiveness of our approach, we implemented a proof-of-concept prototype. Our approach yields significantly promising results: a 26.7% reduction in resource configuration and deployment time compared to traditional C&M techniques. We deduce the improved effectiveness of Domain-specific Model based cloud resource C&M. To further demonstrate the effectiveness of our work, we introduce (1) a visual language over Domain-specific Models for the interactive exploration and comprehension of cloud resource configurations; and (2) a cloud resource recommender system based on a C&M knowledge acquisition technique (extending our previous work [210]) in Chapters 6 and 5, respectively. As future work, we plan to provide resource migration support across different Domain-specific Models.

Chapter 4

Process-driven Configuration of Federated Cloud Resources

4.1 Introduction

A key distinguishing feature of cloud services is elasticity, i.e., the power to dynamically scale resources up and down to adapt to varying requirements. Elasticity is usually achieved through the invocation of reconfiguration tasks (e.g., add storage capacity, restart VM instances) that run as a result of events (e.g., service usage increases beyond a certain threshold), allowing the configuration and management tool to dynamically re-configure cloud resources. We argue that a cloud resource configuration and management technique must support dynamic reconfiguration policies that cater for the flexible characterization and planning of varying resource needs over time. For example, a "restart" policy of a Virtual Machine (VM) is invoked whenever the configuration and management runtime detects a connection failure to the relevant VM. The reconfiguration policy may perform reconfiguration operations such as requesting a restart of the VM through the provider's deployment service interface. However, the automated and unified configuration and management of federated services is still in its early stages [168].

In this chapter we address the above limitations by providing high-level abstractions for federated cloud resource configuration and management processes, which replace existing ad-hoc or manual configuration and management processes, as well as low-level and heterogeneous configuration and management services. In this chapter, we reuse the concepts from Chapter 3: (1) the Domain-specific Model to represent federated cloud resource configurations, and (2) Connectors to invoke monitoring and management actions on federated cloud resources. The main contributions of this chapter are:

(1) A BPMN-based process modeling notation for resource orchestration tasks: Deployment and orchestration activities of federated resource configurations can be modeled using a process-based language. But modeling orchestration processes by directly interacting with heterogeneous deployment services leads to ad-hoc scripts or manual tasks, which hinder the automation of orchestration activities in a dynamic cloud environment.

Hence it is desirable to provide productive and user-friendly modeling techniques for users to compose federated resource deployment and orchestration processes. We provide two high-level, process-based abstractions that facilitate users in describing, deploying and specifying reconfiguration policies of their federated cloud resource configurations. For the description of cloud resources, we reuse the Domain-specific Model that we proposed in Chapter 3. Firstly, we propose the concept of Cloud Resource Deployment Tasks to simplify cloud resource configuration and foster independence between applications and cloud services. Secondly, we propose the concept of Cloud Resource Re-configuration Policies to endow resources with dynamic resource re-configurations. We implemented these notations by extending the Business Process Model and Notation (BPMN) [154], an open and graphical process modeling standard, which is already adopted by industry and academia for modeling cloud resource deployment and management tasks [153]. We provide mechanisms that automatically translate high-level deployment tasks and re-configuration policies into the corresponding native BPMN constructs. We decided to extend BPMN rather than use native BPMN to avoid orchestration tasks becoming (1) complex; (2) error-prone; and (3) difficult to verify and manage later. An alternative approach to translating high-level deployment tasks and reconfiguration policies into BPMN-based workflows is to translate the very same high-level tasks and policies into the Event-Condition-Action (ECA) rules that we proposed in Chapter 3.

(2) A prototype implementation and evaluation: We implemented an extended BPMN editor and a translator that interprets and converts extended BPMN constructs into native BPMN constructs. Our extended BPMN editor supports modeling federated cloud resource deployment tasks and dynamic reconfiguration policies as first-class citizens. Our proposed approach thus replaces time-consuming, frequently costly, naturally expert-driven and manual cloud resource configuration and deployment tasks with a model-driven and unified cloud resource orchestration method and techniques. We present an experiment, conducted using a real-life federated cloud resource, that demonstrates the improvements achieved by our proposed contributions.

Together these contributions enable cloud resource consumers to focus on high- level application requirements, instead of low-level details related to dealing with heterogeneous deployment services.

This chapter is structured as follows. Section 4.2 introduces our process-based abstractions for federated cloud resource configuration and management. In Section 4.3, we illustrate the transformation of high-level deployment tasks and reconfiguration policies into native BPMN constructs. Section 4.4 explains the implementation and the evaluation of our solution, followed by related work (Section 4.5) and the conclusion, including future work (Section 4.6).

4.2 Modeling Cloud Resource Configuration Tasks

We introduced the concepts of Domain-specific Models and Connectors to abstract out the heterogeneity of cloud resource configuration and management services in Chapter 3. Users can discover published resource descriptions (i.e., JSON-based representations of cloud resource configurations) and invoke the available operations of the associated configuration and management services (i.e., via Connectors) to deploy, configure and undeploy a resource instance. To automatically deploy and orchestrate a federated resource configuration, users need to model an orchestration process that coordinates the deployment, configuration and undeployment tasks of several resources. We propose two high-level, process-based abstractions over Domain-specific Models and Connectors to model federated cloud resource configuration and management tasks: Deployment Tasks and Reconfiguration Policies. In Section 4.3, we explain how these abstractions are implemented by extending BPMN.

4.2.1 Motivating Scenario

Consider a scenario where a web-application developer needs to deploy a web application in an Apache Tomcat1-based application server cluster. To distribute requests to a set of Tomcat application servers, the resource infrastructure includes an HTTP load balancer (LB) such as nginx2. The web application is deployed in each Tomcat server. When a new Tomcat server is added to the cluster, the web application must also be deployed within the new server. To add the newly deployed Tomcat server to the cluster, the routing table of the LB must be updated with the details (e.g., IP and port) of the new server. Further Tomcat servers can then be deployed in the aforementioned manner until the cluster reaches the expected number of Tomcat servers.
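The scale-out steps of this scenario can be sketched as a simple loop. The function names below are hypothetical; in the actual system each step maps to a deployment or management operation exposed by a Connector.

```python
# Illustrative sketch of the scenario's scale-out steps: deploy a Tomcat
# server, deploy the web app in it, then register it with the LB.

def deploy_tomcat_server(cluster):
    server = {"ip": f"10.0.0.{len(cluster) + 2}", "port": 8080}
    cluster.append(server)
    return server

def deploy_web_app(server, app):
    server["app"] = app                      # the app runs on every server

def update_lb_routing_table(lb, server):
    lb["routing_table"].append((server["ip"], server["port"]))

def scale_cluster(lb, cluster, app, target_size):
    while len(cluster) < target_size:
        server = deploy_tomcat_server(cluster)
        deploy_web_app(server, app)          # deploy the app in the new server
        update_lb_routing_table(lb, server)  # register the server with the LB

lb = {"routing_table": []}
cluster = []
scale_cluster(lb, cluster, app="shop.war", target_size=3)
print(len(cluster), len(lb["routing_table"]))  # 3 3
```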

The deployment tasks of the Tomcat cluster are depicted in Figure 4.1. We excluded the deployment tasks of the LB from Figure 4.1 for a simplified graphical representation. Once the Tomcat cluster is deployed, the application developer may release new versions of the application. Hence, each Tomcat server in the cluster must be updated with the new version of the application. The orchestration process in Figure 4.1 should be updated with additional orchestration tasks for continuously integrating web application updates. See the updated version in Figure 4.2.

The complexity incurred by modeling even a single reconfiguration policy within the initial orchestration process points out the drawbacks of including reconfiguration policies as part of the initial orchestration process.

1 http://tomcat.apache.org/
2 http://nginx.org/

Figure 4.1: Deployment plan for a web application in an Apache Tomcat cluster

Figure 4.2: Modeling application updates within the deployment plan in Figure 4.1

Modeling further reconfiguration policies (e.g., a web application upgrade is rolled back if it cannot happen on every Tomcat server in the cluster) within the initial orchestration process makes the resultant orchestration process (1) complex; (2) error-prone; and (3) difficult to verify and manage later. It is advantageous in such situations to separate the modeling of the corresponding reconfiguration policies from the initial orchestration process and refer to the relevant policies within the initial tasks, as in Figure 4.3. Using our high-level, process-based abstractions, users can now simply and clearly define high-level reconfiguration policies that apply to one or more resources in a federated resource configuration.

4.2.2 Cloud Resource Deployment Tasks

Figure 4.4 depicts a UML class diagram of how we address cloud resource configuration processes. The two entities at the top represent our high-level process-based abstractions. We introduce the "Cloud Resource Deployment Task" (CRD-Task) that

Figure 4.3: Modeling application updates in Figure 4.2 using CRR-Policy


Figure 4.4: Conceptual model of CRD-Task and CRR-Policy

allows users to model the deployment of cloud resource configurations. Every CRD-Task is associated with a high-level representation of a cloud resource configuration and a potential Connector that can deploy the associated resource representation (see Figure 4.5). We implemented a recommender service in Chapter 5 [210] that facilitates users in searching for available resource configurations. During the execution of this task, the "deploy" operation of the associated Connector is triggered with the resource representation as the input. For example, the deployment of an HP-Cloud-Compute3 VM is modeled using a CRD-Task named "HP-Compute-VM", which triggers the "deploy" operation of the Connector named "HP-Deployer" with the resource representation named "desc1".

3 www.hpcloud.com/products-services/compute

4.2.3 Cloud Resource Re-configuration Policies

"Cloud Resource Re-configuration Policy" (CRR-Policy) allows users to specify a high-level, dynamic re-configuration policy for a cloud resource. A CRR-Policy is a pair of an event description and a reconfiguration action exposed by a Connector. Users can add any number of CRR-Policies to a CRD-Task, given that the events and reconfiguration actions are registered in the associated Domain-specific Model and Connector of the CRD-Task. A CRR-Policy is triggered whenever its associated event occurs. The Event Management System, which was proposed in Chapter 3, is responsible for propagating the event to the runtime of the CRR-Policy. For example, the CRD-Task named "HP-Compute-VM" contains two CRR-Policies, called "CRR1" and "CRR2" (see Figure 4.6). The BPMN runtime triggers "CRR1" (i.e., IF {incomingMessage=="restart-desc1"} THEN RUN {"restart-policy"}) whenever a user sends a request to restart the cloud resource. "CRR2" (i.e., IF {getDay()=="sunday"} THEN RUN {"backup-policy"}) is triggered weekly to back up the cloud resource.
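The behaviour of the two example policies can be sketched as condition/action pairs. The real runtime is generated BPMN (Section 4.3), so the following is only a behavioural sketch with illustrative names.

```python
# Illustrative encoding of the example CRR-Policies "CRR1" and "CRR2"
# as condition/action pairs evaluated against an event context.

def crr1_condition(ctx):
    return ctx.get("incomingMessage") == "restart-desc1"

def crr2_condition(ctx):
    return ctx.get("day") == "sunday"   # weekly backup trigger

policies = [
    ("CRR1", crr1_condition, "restart-policy"),
    ("CRR2", crr2_condition, "backup-policy"),
]

def on_event(ctx):
    """Return the reconfiguration actions whose policy condition holds."""
    return [action for _, cond, action in policies if cond(ctx)]

print(on_event({"incomingMessage": "restart-desc1"}))  # ['restart-policy']
print(on_event({"day": "sunday"}))                     # ['backup-policy']
```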

4.3 Translating CRD-Task and CRR-Policy into BPMN

The choice of BPMN as the language for modeling deployment and reconfiguration processes of federated cloud resources is motivated by several reasons. First, BPMN is an open, standardized, task-based service composition language that is heavily used in the application layer. Next, BPMN is suitable for expressing execution dependencies among different deployment tasks. Furthermore, BPMN supports extension points, which are crucial for modeling deployment and reconfiguration workflows, as BPMN does not support modeling cloud resource deployment and reconfiguration tasks out of the box.

An alternative implementation is to translate Cloud Resource Configuration Tasks into ECA rules and reuse the system we proposed in Chapter 3. However, we chose BPMN as the target language as it provides higher-level, flow-based language constructs compared to ECA rules and thereby results in a much more cost-effective implementation.

Figure 4.5: A CRD-Task and its BPMN generation (within the dotted rectangle)

4.3.1 Translating CRD-Tasks

Figure 4.5 depicts a federated cloud resource deployment workflow. The "HP-Compute-VM" is a CRD-Task, which is annotated with a cloud resource representation and a Connector. At runtime, the CRD-Task is transformed into a BPMN sequence flow that includes a Service Task [154], which triggers the "deploy" operation of the Connector with the resource representation as the input parameter.
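The essence of this translation can be sketched as emitting a Service Task element for a CRD-Task. The element and attribute names below are simplified placeholders, not the actual BPMN 2.0 serialization produced by our translator engine.

```python
# Sketch of translating a CRD-Task into a (simplified) BPMN Service Task.
import xml.etree.ElementTree as ET

def crd_task_to_bpmn(task_name, connector, description_id):
    task = ET.Element("serviceTask", {
        "id": task_name,
        "name": task_name,
        "implementation": connector,          # e.g., a REST Connector endpoint
        "operation": f"deploy({description_id})",
    })
    return ET.tostring(task, encoding="unicode")

print(crd_task_to_bpmn("HP-Compute-VM", "HP-Deployer", "desc1"))
```

In the prototype, the generated Service Task invokes the Connector's "deploy" operation with the resource representation as input, exactly as depicted in Figure 4.5.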

4.3.2 Translating CRR-Policies

BPMN allows events and associated tasks to be modeled as part of the business logic. We decided not to reuse native BPMN events to implement CRR-Policies, because modeling several events and associated tasks within the deployment workflow makes the resultant deployment workflow complex. It also forces the workflow designer to know exactly where to inject those events and associated tasks within the workflow. Hence it is advantageous to provide a high-level abstraction to implement re-configuration policies in BPMN. We implemented CRR-Policy as an extension to BPMN to define re-configuration policies, which are linked with the CRD-Tasks while separating the original deployment workflow from the re-configuration policies.

Figure 4.6: A CRD-Task with two CRR-Policies and its BPMN generation (within the dotted rectangle)

Figure 4.6 depicts a federated cloud resource deployment workflow. The "HP-Compute-VM" is a CRD-Task, which is annotated with two CRR-Policies (i.e., event-policy pairs) that define re-configuration policies for the cloud resource configuration. At workflow runtime, each CRR-Policy is transformed into a BPMN sequence flow that includes an Event and a Service Task that triggers the reconfiguration actions. All the sequence flows are initiated from an Event-based Gateway within a Loop Task.
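The structure generated for CRR-Policies can be sketched in the same illustrative style: each event-policy pair becomes one branch of an Event-based Gateway inside a Loop Task. The element names and the two example policies below are hypothetical, not the exact generated code.

```python
# Hedged sketch: each CRR-Policy (event, action) pair becomes a branch of
# an Event-based Gateway nested inside a looping sub-process.

def crr_policies_to_bpmn(task_id, policies):
    """policies: list of (event, reconfiguration_action) pairs."""
    branches = "\n".join(
        f'  <intermediateCatchEvent id="{task_id}-evt-{i}" name="{event}"/>\n'
        f'  <serviceTask id="{task_id}-act-{i}" name="{action}"/>'
        for i, (event, action) in enumerate(policies)
    )
    return (
        f'<subProcess id="{task_id}-loop">  <!-- Loop Task -->\n'
        f'  <eventBasedGateway id="{task_id}-gw"/>\n'
        f'{branches}\n'
        f'</subProcess>'
    )

xml = crr_policies_to_bpmn(
    "HP-Compute-VM",
    [("cpu-load-high", "scale-out"), ("backup-due", "snapshot-volume")],
)
print(xml)
```

The two policies ("cpu-load-high"/"scale-out" and "backup-due"/"snapshot-volume") are invented examples; Figure 4.6 shows the actual pair of CRR-Policies used in the running example.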

Our approach supports the automated generation of BPMN processes that deploy and re-configure the appropriate SaaS, PaaS or IaaS resources with respect to the introduced modeling abstractions, namely CRD-Tasks and CRR-Policies. A detailed description of the generation techniques is outside the scope of this chapter.

4.4 Implementation and Evaluation

We reused our previous work from Chapter 3 (Domain-specific Models, Connectors and the Event Management System) to simplify the interaction with the underlying configuration and management tools, and thereby to deploy and reconfigure federated cloud resources in a unified manner. We built a proof-of-concept (POC) prototype of the proposed approach (github.com/ddweerasiri/Federated-Cloud-Resources-Deployment-Engine): an extended BPMN editor and a translator engine (refer to Figure 4.7). We implemented the extended BPMN editor (refer to Figure 4.8) and the translator engine by extending Activiti (activiti.org), a graphical BPMN editor and engine. Our workflow editor was extended to model deployment and reconfiguration workflows with tasks named CRD-Task and CRR-Policy (see Sections 4.2.2 and 4.2.3). Our translator engine was extended to parse CRD-Tasks and CRR-Policies, generate native BPMN code for them, and execute it. We also implemented several POC prototypes of Connectors. In the current implementation, DevOps implemented and registered the Connectors as RESTful services in our system via the ServiceBus API [23].

4.4.1 Evaluation

To evaluate our approach, we measured the overall productivity gained by three professional software engineers with business process and application development backgrounds. For the experiment, we provided each participant a deployment specification and asked them to model an arbitrary deployment workflow that deploys a software development and distribution platform. This platform was intended for software engineers who want to manage the entire lifecycle of a project. Multiple projects can leverage this platform by simply cloning the deployment multiple times. We required participants to limit their resource selection to a fixed set of cloud resources that were supported by the current implementation of our system. We expect to increase the selection choices over time, as the design of the system inherently supports incrementally contributing resource configuration knowledge in terms of Connectors and Domain-specific Models.

Figure 4.7: System Overview

Figure 4.8: Extended BPMN Editor

In the deployment specification, participants were instructed to deploy an AWS EC2 VM on which Redmine [170], a project management service, and a Git client [86] are installed. Participants were advised to deploy a new source code repository in GitHub and integrate it with the Redmine service such that Redmine automatically extracts the latest commits from the repository via the Git client. Additionally, participants were instructed to deploy an AWS S3 bucket that acts as a software distribution repository.

For evaluation purposes, we implemented the same deployment specification in three languages: (i) Shell scripts, (ii) Docker and (iii) Juju. The main reason for choosing Shell scripts was to estimate an upper bound for the result set. Docker and Juju were selected for their popularity among DevOps and because they are specifically designed for cloud resource configuration and deployment. We measured (i) the total number of lines-of-code (both "actual" lines of code written and lines generated by our approach), excluding white space and comments; (ii) the number of external dependencies/libraries required to describe and deploy each federated cloud resource; and (iii) the time taken to complete the modeling task. We measured the correctness of the modeling tasks by executing each deployment workflow and checking whether the resultant deployment complied with the initial deployment specification. The benefits of our approach are further demonstrated by embracing the knowledge-sharing paradigm (inspired by industry, e.g., jujucharms.com): users in this scenario would not require the effort of developing the deployment and reconfiguration processes, since this could be done once and re-used multiple times for the benefit of many.
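The lines-of-code metric above can be sketched as a small helper. This is an illustrative sketch under an assumed convention (shell-style `#` comments); the thesis does not specify the exact counting tool.

```python
# Minimal sketch of the LOC metric: count non-blank lines, excluding
# full-line comments (here assumed to start with '#').

def count_loc(source: str, comment_prefix: str = "#") -> int:
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count

# Hypothetical fragment of a deployment shell script.
script = """
# install redmine
apt-get update

apt-get install -y redmine   # project management service
"""
print(count_loc(script))  # -> 2
```

Lines that are blank or consist only of a comment are excluded, matching criterion (i) of the experiment.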

4.4.2 Analysis and Discussion

Results of the experiment (see Table 4.1) show that lines-of-code, the number of external dependencies, and the time-to-modeling all improve when using our system over prevalent resource deployment techniques. More specifically, the time-to-modeling is reduced by 15.2%, assuming the required Connectors and Domain-specific Models for all component resources are registered in the system. We argue that high-level representations of cloud resource deployment and reconfiguration processes, together with tool support such as graphical deployment workflow editors, improve the time-to-modeling.

Table 4.1: Results of the experiment

  Parameters                        Shell Scripts   Docker   Juju   Our Approach
  average time-to-modeling (min)    103             95       72     61
  #lines-of-code                    107             116      127    541 (generated)
  #dependencies                     3               1        1      1
  knowledge shareable without
  changing deployment workflows     no              yes      yes    yes

Compared with proprietary languages like Juju, we argue that extending standards-based workflow languages like BPMN further improves the time-to-modeling for users such as business process and application developers. Shell scripts, Docker and Juju required 116 lines-of-code on average to model the deployment plan. Our approach generated 541 lines-of-code, because Connectors were implemented in Java and the BPMN-based resource deployment tasks generated XML and JSON files, which are more verbose than shell-scripting-based approaches like Docker and Juju. We therefore conclude that process-driven federated resource deployment and reconfiguration over a rich layer of high-level cloud resource representations improves productivity.

Instead of following formal specification and validation approaches, we followed a use-case based validation approach. We successfully implemented several real-world use cases to validate the expressiveness of our modeling notation. In addition, we reused our BPMN-based notation to model the deployment and re-configuration of the federated cloud resources from Chapter 3 (refer to Code 3.4). Hence we believe our extended BPMN notation is expressive enough to model any cloud resource deployment and reconfiguration scenario, given that the required Domain-specific Models, Connectors and Event Management System are in place (refer to Figure 4.7).

4.5 Related Work

In this section, we briefly describe two areas of related work: (1) federated cloud resource configuration and management processes; and (2) modeling dynamic orchestration of federated cloud resources. We compare and contrast our proposed approach with these related works.

Various cloud resource description and orchestration frameworks have been proposed in industry and research. Market-leading tools such as AWS OpsWorks (aws.amazon.com/opsworks) and CA AppLogic (ca.com/au/cloud-platform.aspx) allow describing and deploying complete application stacks. These tools offer provider-specific resource representations, while our approach allows modeling federated cloud resource configuration and management processes.

TOSCA [153] is an open standard for the representation and orchestration of cloud resources. TOSCA facilitates describing a federated cloud resource configuration using a "Service Template" that captures (1) the topology of component resources; and (2) plans that orchestrate component resources. However, TOSCA does not define (1) the implementation language of orchestration processes; or (2) how to specify dynamic reconfiguration policies. When developing TOSCA orchestration plans for multi-cloud environments, developers often need to deal with different invocation mechanisms (e.g., SOAP, REST and SSH) even though the resource representation is standardized. Hence our graphical process modeling language and Resource Configuration Service (RCS) can be leveraged for implementing high-level orchestration plans of a TOSCA "Service Template".

Cloud resource orchestration processes are composed of deployment and reconfiguration tasks, which require advanced abstractions beyond general business process modeling languages like BPMN and BPEL. BPMN and BPEL focus primarily on the application layer [154, 62]. However, orchestrating cloud resources requires rich abstractions to describe and manage application resource requirements and constraints, support exception handling, and schedule resources flexibly and efficiently.

Nonetheless, a user can leverage native BPMN or BPEL to model deployment processes [146], albeit with reduced design flexibility and increased modeling size and complexity. Furthermore, modeling dynamic reconfiguration policies (e.g., back up the MySQL database every Sunday) within the same deployment process makes it difficult to verify and manage the workflow later. We propose two BPMN extensions to model deployment tasks and high-level reconfiguration policies while keeping the deployment workflow simple and modular. In the domain of cloud resource management, [146] proposes a unified cloud resource deployment API using Web Service standards (e.g., WSDL and WS-BPEL). However, the authors leverage native WS-BPEL to describe federated resource deployment processes. Comparatively, we propose an extended BPMN-based language rather than using native BPMN, to avoid orchestration processes becoming (1) complex; (2) error-prone; and (3) difficult to verify and manage later.

Extending modeling languages to facilitate domain-specific needs is a common practice. For example, Sungur et al. [192] propose BPMN extensions to program wireless sensor networks. BPMN4TOSCA [117] extends BPMN to implement orchestration plans for resource representations described as TOSCA "Topology Templates" [153]. BPMN4TOSCA includes four BPMN extensions, which facilitate modeling configuration and orchestration tasks associated with a Topology Template. In contrast to our graphical modeling language, BPMN4TOSCA does not model dynamic reconfiguration policies, which are essential for modeling the elasticity of cloud resources. We propose a high-level process-based notation to model dynamic reconfiguration policies, which trigger orchestration tasks when specific events happen, without complicating the initial deployment process model.

4.6 Conclusions & Future Work

In this chapter, we have presented a federated cloud resource configuration and orchestration framework, which leverages the high-level cloud resource representation model proposed in Chapter 3. The framework consists of the design and implementation of an extended BPMN-based process notation to describe and automate the deployment and reconfiguration tasks of complex resource configurations over federated clouds. To evaluate the feasibility and productivity of the proposed framework, we implemented our system as a proof-of-concept prototype. As future work, we plan to enable the orchestration framework to select potential cloud resource providers on arrival of deployment requests, based on a user-defined QoS policy (e.g., provider's availability > 95%).

Chapter 5

A Recommender Service for Knowledge Reuse in Cloud Resource Configurations

5.1 Introduction

Cloud resource configuration and management tools improve the productivity of cloud resource consumers (e.g., DevOps, system administrators, application developers) by providing the necessary services to select, describe, configure, deploy, monitor and control cloud resources through their life cycle (refer to Chapter 1). The productivity of cloud resource configuration and management can be further improved by enabling reuse of existing configuration and management knowledge (refer to Chapter 2). For example, a system administrator specifies a cloud resource description that represents MongoDB (mongodb.org), a NoSQL database, in Docker. The specified resource description (i.e., a Dockerfile) may be shared among other DevOps. Some of the DevOps may curate the resource description to improve its quality (e.g., check for any errors). Others may reuse the resource description to deploy a MongoDB server for their own applications. This knowledge reuse paradigm reduces the development time and human errors incurred in cloud resource configuration and management, and thereby nurtures the productivity of cloud resource consumers.

An important research question is how to leverage existing cloud resource configuration knowledge to support individualized application requirements on a federated cloud in a systematic manner. The challenges involved in solving this research problem are (1) modeling heterogeneous cloud resource configuration management knowledge in a high-level and unified manner; (2) satisfying individualized application and resource requirements with existing configuration knowledge; and (3) capturing, customizing, and reusing existing configuration knowledge.

Solving this research problem is beneficial to any cloud resource consumer because, regardless of how cloud resources differ (IaaS, PaaS and SaaS) or how such resources are deployed (public, private or federated), almost every cloud resource configuration is expected to satisfy individualized application requirements. Furthermore, consolidating and sharing cloud resource configuration management knowledge is advantageous for consumers who lack expertise in the modeling and management of cloud applications, because they can leverage existing configuration management knowledge in a unified and high-level format. Consumers are also shielded from provider-specific, low-level, complex, and heterogeneous configuration management interfaces and technologies. Based on the above analysis, we propose the following contributions.

1. A recommender system for cloud resource configuration knowledge. This system leverages the cloud resource representation model, introduced in Section 3.4 of Chapter 3, to describe cloud resource configuration knowledge. In essence, the recommender system accepts an application requirement context description (e.g., intended task, deployment scenario) from consumers to query a "cloud resources configuration Knowledge Base (KB)", which returns a configuration artifact that satisfies the given context.

2. An incremental knowledge acquisition technique, starting with an empty KB that is gradually built up.

We also discuss a proof-of-concept prototype implementation and verify our proposed approach based on a modeling experiment with 36 real-world cloud resources.

The chapter is structured as follows. Section 5.2 elaborates on the rule-based recommender system and the incremental knowledge acquisition technique. Section 5.3 explains the implementation and evaluation of our solution, followed by related work (Section 5.4) and the conclusion, including future work (Section 5.5).

5.2 Knowledge Base for Reuse of Resource Configurations

Current techniques for reuse usually utilize provider-specific resource configuration descriptions. However, this is not efficient for modeling federated cloud resource configurations. In our research work, we propose a novel rule-based recommender system that reuses existing configuration knowledge when modeling cloud resource configurations. Our rule-based recommender system automatically suggests the configuration knowledge artifacts required by consumers during cloud resource configuration management processes (e.g., deployment and configuration parameter modification). The cloud resource representation model in Section 3.4 inherently supports this type of knowledge. Suggestions are generated based on a consumer-specified context (e.g., intended task and deployment scenario). This context represents an individualized application or resource requirement. The suggested configuration knowledge includes all the information and instructions required to deploy a cloud resource configuration that satisfies the context description. Our system derives suggestions from configuration knowledge artifacts (e.g., executable deployment scripts, packaged virtual appliances) that were created for similar contexts in the past. Consumers can accept or modify recommended configuration knowledge artifacts according to their requirements. Alternatively, consumers can reject the recommendation and create a new configuration knowledge artifact from scratch. Once such modifications are completed, the recommender system translates them into Recommendation Rules with the help of consumers and makes the new recommendations available for future consumer requests.


Figure 5.1: Rule-based Recommender System Overview

Finally, recommended knowledge artifacts are input into the relevant providers' deployment interfaces to provision concrete cloud resource configurations.

In this section we first introduce Recommendation Rules of the recommender system, followed by the construction, origin and evolution of Recommendation Rules.

5.2.1 Recommendation Rules

Our recommender system maintains a cloud resource configuration knowledge base (KB) which stores contexts, configuration knowledge representations, and configuration knowledge artifacts (i.e., packaged virtual appliances and executable scripts) (see Figure 5.1). Recommendation Rules maintain associations between those items in the KB, as shown in Figure 5.2. Recommendation Rules consist of contexts (when the rule applies) and conclusions (what should be recommended when the rule is activated).

Contexts of Rules

The recommender system maintains "Contexts" data (see Figure 5.1) on intended task categories (e.g., Operating systems, eCommerce) and deployment scenarios (e.g., public and private cloud). The "Task Category" and "Deployment Scenario" databases capture meta-data and common information about classes of similar resource and application requirements. These databases allow cloud resource consumers to reuse and customize shared context knowledge. Moreover, associating configuration knowledge representations with relevant contexts effectively segments those representations by the tasks they can satisfy. Indeed, there could be other aspects from which to develop a context, for example: (1) pricing strategy (e.g., free, monthly subscription); or (2) deployment location (e.g., Asia, Europe). We found the intended task category and deployment scenario more than adequate to develop a feasible context description; however, this could be extended as required.

The "Contexts" database is implemented as a hierarchical structure of context entities. Task categories can therefore be organized from more generic (e.g., "Application Development", "Storage Management") to more specialized entities (e.g., "Java based Web Application Development", "Relational Database services"). Similarly, deployment scenarios can be organized from more generic (e.g., "public cloud deployment", "private cloud deployment") to more specialized entities (e.g., "VMWare vSphere Hypervisor 5.5", "Amazon-EC2", "Windows Azure"). This hierarchical structure helps consumers query the "Contexts" data and find either exactly matching or approximately matching (but more generalized) contexts that satisfy their requirements.
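The generalization lookup described above can be sketched as a walk up a parent tree. The tree encoding and the `match` helper below are illustrative assumptions, not the thesis data model; the example entities come from the text.

```python
# Illustrative sketch of the hierarchical "Contexts" lookup: fall back to
# the closest more generalized ancestor when no exact match is known.

# Hypothetical parent links, from specialized to generic entities.
parents = {
    "Java based Web Application Development": "Application Development",
    "Relational Database services": "Storage Management",
    "Amazon-EC2": "public cloud deployment",
    "VMWare vSphere Hypervisor 5.5": "private cloud deployment",
}

def match(requested, known):
    """Return the requested context if known, else its nearest
    more generalized ancestor, else None."""
    node = requested
    while node is not None:
        if node in known:
            return node
        node = parents.get(node)
    return None

known_contexts = {"Application Development", "public cloud deployment"}
print(match("Java based Web Application Development", known_contexts))
# -> Application Development (approximately equal, more generalized match)
```

The same walk applies to both aspects of a context (task category and deployment scenario).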

In Figure 5.2, the left-hand side of a Recommendation Rule depicts the context of the rule. More design-level details on how these rules originate, are processed and evolve are explained later in Section 5.2.

Conclusions of Rules

The right-hand side of the UML class diagram in Figure 5.2 depicts the components that form the conclusion of a Recommendation Rule. Recommendation Rules suggest configuration knowledge representations. Consumers can deploy the configuration knowledge representation via a specific provider's configuration deployment service. In some situations, consumers may need to submit knowledge representations to specific deployment services, generate knowledge artifacts (e.g., packaged virtual appliances, deployment scripts), and then deploy the generated artifacts. For example, the Docker API first generates an Image (a packaged virtual appliance) from


Figure 5.2: UML class diagram of Recommendation Rules

a submitted knowledge representation; consumers then submit the Image to the Docker API to deploy concrete cloud resources (i.e., Containers). Consumers can then manage the deployed cloud resource configuration. We consider the management aspect of federated cloud resource configurations as future work.

In the next sections we explain how Recommendation Rules originate, are processed and evolve.

5.2.2 Reuse of Configuration Knowledge

To facilitate efficient reuse of existing configuration knowledge representations for modeling federated cloud resource configurations, we use a knowledge acquisition and maintenance method called Ripple Down Rules (RDR) [54]. We chose RDR because it empowers the reuse of existing configuration knowledge representations and configuration knowledge artifacts. RDR also enriches knowledge by creating new rules and integrating them into the existing knowledge base. The RDR technique has been successfully applied in many domains (e.g., natural language processing, clinical pathology reports, call centers, database cleansing, UI artifact reuse and soccer simulations). However, to the best of our knowledge, there has been no attempt to adapt RDR to the domain of cloud resource configuration knowledge reuse.

Rule A0 (default): IF T_C = (undefined) AND D_S = (undefined) THEN KR_ID = unknown
  except: Rule A1: IF T_C = "Web Application Development" AND D_S = "Public Cloud" THEN KR_ID = "A001"
    except: Rule A2: IF T_C = "J2EE Based Web Application" AND D_S = "DotCloud" THEN KR_ID = "B001"
    if not: Rule A3: IF T_C = "Storage Management" AND D_S = "Rackspace" THEN KR_ID = "D001"
      if not: Rule A4: IF T_C = "e-Learning platform" AND D_S = "VMWare" THEN KR_ID = "C001"
  (T_C = Task_Category; D_S = Deployment_Scenario; KR_ID = Cloud_Resources_Configuration_Knowledge_Representation_ID)

Figure 5.3: Example of Recommendation Rule Trees

There are different variations of RDR, such as Single-Conclusion RDR (SCRDR), Multiple-Conclusion RDR (MCRDR), and Collaborative RDR. For the evaluation of our recommender system, we implemented the SCRDR technique, which allows only one conclusion for a given context. As future work, we will replace our SCRDR implementation with MCRDR, which allows multiple conclusions and rule modifications by adding exception rules. MCRDR will further enhance the productivity of our recommender system.

Figure 5.3 shows the tree structure of Recommendation Rules in the KB. Rule A0 contains the default conclusion ("unknown"). The recommender system suggests the default conclusion when the input context is not specified. Thus, in Figure 5.3, the inference engine triggers the default conclusion when the task category and deployment scenario are not defined in the input context. The KB contains "except" (true) branches and "if not" (false) branches. When a consumer inputs a context, the inference engine traverses the Recommendation Rule tree. Starting from the root node, the engine checks whether the next rule node evaluates to true or false by comparing the context of each rule node with the consumer-specified context. This is carried out repeatedly until the inference engine cannot find any more true nodes. The conclusion of the last true node is returned to the consumer. This is done for each aspect (i.e., task category and deployment scenario).

For example, a curator of our KB may want to model a cloud resource configuration for a "Java based Web application development runtime" as a public cloud deployment. Assume our KB does not yet contain this task category; that is, "Rule A2" does not exist in Figure 5.3. The inference engine queries the KB and finds a configuration knowledge representation associated with "Task_Category" = "Web Application Development" and "Deployment_Scenario" = "Public Cloud" ("Rule A1"), but the curator cannot find an "except" rule originating from "Rule A1". Therefore the curator first evaluates the configuration knowledge representation associated with "Rule A1" and determines whether it is sufficient to deploy Java based web applications. If not, the curator modifies the suggested configuration knowledge representation so that it describes the configuration with the component resources required for the Java based Web application development environment. The curator then registers an "except" rule ("Rule A2") under "Rule A1", and refers to the modified configuration knowledge representation as the conclusion of "Rule A2".

In another scenario, a cloud resource consumer may need to deploy an E-Learning platform on a "VMWare vSphere Hypervisor" (private cloud). The inference engine checks the Rules-A tree for a rule whose "Task_Category" equals "E-Learning Platform" and "Deployment_Scenario" equals "VMWare". The inference engine follows the "if not" path in the rule tree and finds that the last rule node evaluating to true is "Rule A4". Hence the conclusion of "Rule A4" (e.g., a configuration knowledge representation along with a download link to a pre-built VMWare-supported virtual appliance that includes Moodle, an E-Learning platform) is recommended to the consumer.
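The SCRDR traversal described in these two scenarios can be sketched compactly. The `Rule` class and the tuple-based context test below are illustrative assumptions; the rule IDs and conclusions follow Figure 5.3.

```python
# Compact sketch of SCRDR inference over the rule tree of Figure 5.3.
# Each node has a condition, a conclusion, an "except" child (followed
# when the node fires) and an "if not" child (followed when it does not).

class Rule:
    def __init__(self, cond, conclusion, except_=None, if_not=None):
        self.cond, self.conclusion = cond, conclusion
        self.except_, self.if_not = except_, if_not

def infer(root, ctx):
    """Return the conclusion of the last rule node that evaluates to true."""
    last = root.conclusion        # default conclusion ("unknown")
    node = root.except_
    while node is not None:
        if node.cond(ctx):
            last = node.conclusion
            node = node.except_   # refine via the exception branch
        else:
            node = node.if_not    # try the alternative branch
    return last

# ctx is a simplified (Task_Category, Deployment_Scenario) pair.
a4 = Rule(lambda c: c == ("e-Learning platform", "VMWare"), "C001")
a3 = Rule(lambda c: c == ("Storage Management", "Rackspace"), "D001", if_not=a4)
a2 = Rule(lambda c: c == ("J2EE Based Web Application", "DotCloud"), "B001")
a1 = Rule(lambda c: c == ("Web Application Development", "Public Cloud"),
          "A001", except_=a2, if_not=a3)
a0 = Rule(lambda c: False, "unknown", except_=a1)  # default rule

print(infer(a0, ("e-Learning platform", "VMWare")))  # -> C001
```

For the E-Learning query, Rule A1 fails, the engine follows the "if not" chain through Rule A3 to Rule A4, and returns "C001", matching the scenario above.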

5.2.3 Knowledge Acquisition Process

The KB, empowered by SCRDR, incrementally acquires configuration knowledge in the form of rules. Any change to contexts, configuration knowledge representations, or configuration knowledge artifacts triggers an update to the KB. The following two cases create new Recommendation Rules in our KB.

1. A new configuration knowledge representation or a configuration knowledge artifact is registered on an existing context in the KB

2. A configuration knowledge representation or a configuration knowledge artifact is registered or modified on a non-existing context in the KB

Case 1 - Users (e.g., curators of the KB, cloud resource providers, or configuration knowledge artifact developers) who want to register a configuration knowledge representation model can register rules by specifying an existing "Task_Category" and "Deployment_Scenario" as the context and referring to the new configuration knowledge representation model as the conclusion.

Case 2 - This case is triggered in two scenarios: (1) curators who need to register or modify a configuration knowledge representation; or (2) users (e.g., cloud resource providers, or configuration knowledge artifact developers) who want to register or modify configuration knowledge artifacts. In both scenarios, the expected "Task_Category" or "Deployment_Scenario" does not exist in the "Contexts" database (see Figure 5.1). Therefore users first register the relevant entries in the "Contexts" database. When a new context is a specialization of an existing context, it is positioned accordingly in the hierarchical tree structure. Next, an "except" rule is registered in the Recommendation Rule tree accordingly; alternatively, a new rule is registered on an "if not" branch of the tree.

These processes allow the incremental evolution of our SCRDR-based KB. Our approach produces more productive suggestions once enough rules exist, and it requires little consumer effort to specify the contexts of rules when rules are being created.

In summary, our rule-based recommender system lets consumers focus on their application and resource requirements, while the system shields them from the technical complexity of federated cloud service solutions. We argue that our framework, which decouples a cloud resource requirement specification from the underlying resource and service configuration needs, caters for flexible characterization and planning of resource needs over time.

5.3 Implementation and Evaluation

For evaluation purposes, we implemented a proof-of-concept prototype of our concepts on top of the system we proposed in Chapter 3. Figure 5.4 illustrates the internal architecture of our system. We implemented the Context DB as a tree data structure to store context data (i.e., task categories and deployment scenarios) so that context information can be organized based on its relations. We use a JSON Object Store to store JSON-based descriptions of cloud resource configurations. The Recommendation Rules component is a Java-based recommendation engine that persists and processes SCRDR-based recommendation rules.

5.3.1 Experiment

For the experiment, we simulated a large-scale modeling effort of 36 cloud resources (e.g., application runtimes, database servers, wiki engines). These 36 resources were among the 14% most popular (i.e., deployed over 1000 times) with users (i.e., DevOps and system administrators) of the online Ubuntu Juju tool (jujucharms.com/solutions). We deliberately decided to model these most-demanded resources so we could observe how our knowledge reuse technique makes a productive impact on a large set of real-world resource configuration and deployment tasks.

Before starting the experiment, we initialized the context database with the most general forms of task categories (i.e., infrastructure, platforms and software) and deployment scenarios (i.e., public and private). We then modeled each cloud resource iteratively. For each iteration, we invoked our recommendation engine by specifying the current context. The context was chosen based on the existing values in the context database. Our recommendation engine (1) matches the context against the available rules, (2) selects the best-suited cloud resource representation and (3) returns a YAML string.

2 www.jujucharms.com/solutions

Figure 5.4: Internal Architecture (components: Contexts DB with Task Categories and Deployment Scenarios, Recommendation Rules, JSON Object Store, Extended BPMN Editor, Workflows Translator, BPMN 2.0 Engine, Domain-specific Models (DSMs), Event Management System, ECA Rule Processor, Events DB, Rules DB, and Connectors that invoke operations on and receive events from Cloud Resource Management Tools)


Figure 5.5: Accuracy of recommendations (% correctness) vs. knowledge-base size (cumulative no. of rules), per resource modeling iteration

If needed, we modified the recommended resource representation by changing configuration attributes, management rules or component resource representations. Similarly, we leveraged our recommendation engine to generate recommendations when modeling component resources of Composite Resources. Storing a modified resource representation activates a knowledge acquisition process as described in Section 5.2.3. The execution of the knowledge acquisition process may result in new context entries and rules in our KB.
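The match-and-generalize behavior of the recommendation step can be sketched as follows. This is an illustrative simplification (rule contents, context paths and the parent-fallback strategy are invented for illustration, not taken from the prototype):

```python
# Rules keyed by context path; more specific paths extend more general ones.
rules = {
    ("software",): "resource: generic-package",
    ("software", "wiki-engine"): "resource: mediawiki\nmemory: 1024",
}

def recommend(context):
    """Walk from the given context up to more general ones until a rule matches."""
    ctx = tuple(context)
    while ctx:
        if ctx in rules:
            return rules[ctx]
        ctx = ctx[:-1]          # generalize: drop the most specific element
    return None                 # no recommendation at any generalization level

print(recommend(["software", "wiki-engine"]))   # exact match
print(recommend(["software", "db-server"]))     # falls back to ("software",)
```

An unmatched specific context thus still yields the best available generalized recommendation, which the user can then refine, triggering knowledge acquisition.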

To measure the growth of our KB, we counted the number of context entries and rules created in each iteration. After each iteration, we measured the accuracy of our recommendation engine as the percentage of exactly correct recommendations, counting modified resource representations as partially correct or incorrect recommendations.
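The bookkeeping described above can be sketched as follows (an illustrative simplification; the outcome list is hypothetical data, not experiment results): per iteration we record whether the recommendation was accepted unchanged, and compute cumulative accuracy.

```python
def accuracy_per_iteration(outcomes):
    """outcomes: list of booleans, True if the recommendation was exactly correct.

    Returns the cumulative accuracy percentage after each iteration.
    """
    percentages = []
    correct = 0
    for i, exact in enumerate(outcomes, start=1):
        if exact:
            correct += 1
        percentages.append(100.0 * correct / i)
    return percentages


# Early iterations miss (empty KB); later ones increasingly hit.
print(accuracy_per_iteration([False, False, True, True, True]))
```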

5.3.2 Results and Analysis

In Figure 5.5 we compare the growth of our KB (in terms of the cumulative number of rules) with the accuracy of recommendations. Since the KB did not contain any rules or resource representations in the initial iterations of the experiment, our recommendation engine did not provide any useful results and the knowledge acquisition process incurred an additional overhead on users. In subsequent iterations, our recommendation engine starts to generate more accurate results as the resource representations required for modelling cloud resources become available in our KB. Higher accuracy of recommendations means less modelling effort for users compared to traditional resource configuration techniques. Therefore the accuracy of recommendation results can be interpreted as a quantitative measure of the productivity of cloud resource configuration and deployment tasks.

5.4 Related Work and Discussion

In this section, we briefly describe related work on knowledge reuse for cloud resource configurations, and compare and contrast our proposed approach with it.

Cloud resource configuration tools enhance productivity not only via effective configuration languages, but also via reuse of existing configuration knowledge artifacts. Users of configuration tools model and share configuration knowledge as reusable software artifacts (e.g., resource configuration descriptions, deployment rules/scripts). Other users curate and/or reuse those artifacts. This process of knowledge reuse incrementally improves the quality of cloud resource orchestration by reducing development time and human errors.

We identify two sub-aspects of knowledge reuse: (1) Reuse Artifact and (2) Reuse Technique. A Reuse Artifact is a logical entity in cloud resource configuration. An entity may be formed of a single element, which corresponds to an instance of a resource description model or a deployment rule, or an entity may include multiple interrelated elements (e.g., components of a Composite Resource). Along another dimension, artifacts are distinguished as template and concrete. Concrete artifacts (e.g., VMWare snapshots3, Snaps4) are fully-developed solutions for specific problems. Template artifacts (e.g., Dockerfiles5, Juju Charms6) are generalized solutions,

which may need manual adaptations (e.g., initializing configuration parameters) before reuse. Our unified resource representation model is categorized as a Template artifact, where we allow users to initialize attributes of Resource entities before deployment.

3 kb.vmware.com/kb/1015180  4 www.terminal.com/explore  5 docs.docker.com/reference/builder/  6 juju.ubuntu.com/charms/

Given an artifact to be reused, it is imperative to identify the techniques that can be applied to reuse it in practice. Any knowledge reuse technique must address three concerns: (1) knowledge acquisition, (2) knowledge discovery and (3) knowledge curation. Knowledge acquisition is the strategy of archiving reuse artifacts. Knowledge discovery concerns how to make the archived artifacts available to users. The third concern is how to maintain the quality of the archived reuse artifacts. Most cloud resource configuration tools provide a search index (e.g., Docker Hub7, Bitnami8) for users to discover and reuse resource artifacts. Search indexes (1) accept a set of resource attributes (e.g., resource name, cost per unit), (2) search a pool of artifacts and (3) return artifacts that exactly satisfy the inputs. Comparatively, our recommendation system maintains a tree structure of context data and finds either equal or approximately similar (but more generalized) contexts that satisfy consumers’ requirements. On the other hand, our recommendation system can be introduced as a community-driven knowledge acquisition technique, which leverages the expertise of users (e.g., DevOps, system administrators) to collaboratively and incrementally build a knowledge-base over multiple cloud resource configuration tools.
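The exact-match search index described above can be sketched as a simple attribute filter over a pool of artifacts. This is an illustrative contrast to our generalizing tree lookup; the artifact names and attributes are invented, not drawn from any real registry:

```python
artifacts = [
    {"name": "mysql", "cost-per-unit": "low", "type": "db-server"},
    {"name": "nginx", "cost-per-unit": "low", "type": "load-balancer"},
]

def search(pool, **attrs):
    """Return only artifacts whose attributes exactly satisfy every input."""
    return [a for a in pool if all(a.get(k) == v for k, v in attrs.items())]

print(search(artifacts, type="db-server"))    # exact hit: the mysql artifact
print(search(artifacts, type="wiki-engine"))  # no generalized fallback: []
```

Unlike our context tree, such an index returns nothing when no artifact matches exactly, rather than falling back to a more general match.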

TOSCA [153] is an open standard for the representation and orchestration of cloud resources. TOSCA facilitates modeling a federated cloud resource configuration using a "Service Template" that captures (1) the topology of component resources; and (2) plans that orchestrate component resources. Our recommendation system can be leveraged to facilitate reuse of "Service Templates" in TOSCA. We believe that this is the first work that enables incremental knowledge acquisition for "Service Templates" in TOSCA.

7 registry.hub.docker.com  8 bitnami.com

5.5 Conclusion and Future Work

In this chapter, we have presented a framework that recommends reusable configuration knowledge representations for cloud resources. The framework consists of (1) a declarative and context-aware language to specify consumers’ application and resource requirements in terms of task categories and deployment scenarios; (2) automatic recommendation of configuration knowledge artifacts for a given context; and (3) an incremental configuration knowledge acquisition mechanism based on a Ripple Down Rules base. To evaluate the feasibility and efficiency of the proposed framework, we implemented our system as a proof-of-concept prototype. As future work, we plan to extend our declarative requirement specification language with other dimensions (e.g., the expected uptime period of a cloud resource configuration) to bring time-aware configuration management and orchestration capabilities.

Chapter 6

CloudMap: A Visual Notation for Representing & Managing Cloud Resources

6.1 Introduction

Typical cloud-based organizations are finding it difficult to productively utilize their very large repositories laden with textual cloud resource description and management artifacts [16]. For example, simple management tasks commonly involve: analyzing resource descriptions; understanding the inter-relationships between resources; and aggregating monitoring data. However, until now DevOps (i.e. developers and operations personnel who are collectively involved in designing, developing, deploying and managing cloud applications) have been required to manually and iteratively read several low-level files and use command-line tools to extract monitoring information. In fact, it has been confirmed that DevOps dedicate the majority of their time to understanding existing artifacts instead of creating new ones, updating and/or testing them [17, 161].

To overcome these challenges, we present CloudMap. Leveraging the age-old dictum that “a picture tells a thousand words”, we develop visual notations to simplify representing and managing cloud resources. We argue this novel approach will enable DevOps to invest more in creating, configuring and managing cloud resources, instead of the frustration and time spent understanding them. Since we are at the foundational stage, we have specialized our framework to Docker [197]; albeit in future it can easily be extended to other orchestration tools (e.g. Juju or Ansible). Docker is an open-source and widely-praised industry-standard initiative. Its container-based virtualization technique offers a lightweight and portable resource isolation alternative to Virtual Machines (VMs). This technique emerged to simplify and accelerate the configuration and management of cloud resources. More specifically, for composite service-based cloud resources that depend on multiple service middleware for their operations, container-based virtualization enables accelerated and efficient deployment of optimally configured, scalable and lightweight middleware instances. However, since, to the best of our knowledge, current tools merely leverage textual resource representations, they do not do justice to improving the productivity and efficiency of DevOps. Accordingly, this chapter makes the following main contributions:

Literature Review. Based on our extensive analysis of literature and tools related to cloud resource orchestration in Chapter 2, we discovered gaps and challenges in current solutions. We formed a strong understanding of the requirements, and derived key design decisions for our novel solution.

Visual Notation for Representing & Managing Cloud Resources. We formulate the necessary notational constructs (i.e. Entities and Links) and define the semantics of each. We also propose novel auxiliary features called Probes and Control Actions, which can be “tagged” to entities.

Cloud Visualization Patterns. We identify common visualization patterns for cloud resource configuration. Existing architectural patterns are high-level and mostly suitable for solution architects or IT directors [128]. In contrast, we believe DevOps require more fine-grained visual abstractions for understanding, navigating, monitoring and controlling complex cloud resources. As a result, we present three patterns and describe their benefits via practical scenarios.

The rest of this chapter is organized as follows: In Section 6.2, we present an example scenario and explain how our visual notation is applied within the cloud resource lifecycle. In Section 6.3, we detail our visual notation and its semantics, and present three organizational patterns. We employ a mind-map interface, and illustrate how it can be used over various use-cases which span the selection, configuration, deployment, monitoring and controlling of cloud resources. In Section 6.4, we present our implementation and GUI; evaluation in Section 6.5; then related work, and conclusions in Sections 6.6 and 6.7.

Figure 6.1: State transitions of the cloud resource life cycle (Select, Configure and Deploy, followed by iterative Monitor and Control, and finally Delete)

6.2 Motivating Example

Cloud resource management typically involves: (i) an initial sequential stage consisting of Select, Configure and Deploy; (ii) followed by an iterative phase consisting of Monitor and Control (refer to Figure 6.1).
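The life cycle of Figure 6.1 can be sketched as a small state machine. The transition table below is our reading of the figure (state and action names are illustrative), not an artifact of the CloudMap implementation:

```python
# (state, action) -> next state, following the Select/Configure/Deploy stage
# and the iterative Monitor/Control phase of the resource life cycle.
TRANSITIONS = {
    ("initial", "select"): "selected",
    ("selected", "configure"): "configured",
    ("configured", "deploy"): "deployed",
    ("deployed", "monitor"): "deployed",   # iterative phase
    ("deployed", "control"): "deployed",   # iterative phase
    ("deployed", "delete"): "deleted",
}

def step(state, action):
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot '{action}' while '{state}'")

state = "initial"
for action in ["select", "configure", "deploy", "monitor", "control", "delete"]:
    state = step(state, action)
print(state)  # deleted
```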

As a running example, consider the 3-tier system illustrated in Figure 6.2. We begin by selecting the required resources. In this case, business logic is executed using the Business Process Execution Language (BPEL), with state data stored in a MySQL DB. For scaling purposes, we introduce an Nginx Load Balancer that propagates requests to a cluster of Apache Orchestration Director Engine (ODE) servers. To configure and deploy, DevOps determine the relationships between components and write configuration and deployment scripts that describe the attributes (e.g. no. of BPEL engines, CPU allocation). Subsequently, DevOps also collect and analyze events to monitor, and apply control actions if necessary.


Figure 6.2: Resource diagram of a typical 3-tier (BPEL-based) application (an Nginx Load Balancer, three Apache ODE Servers hosting BPEL processes, and a MySQL DB Server, each in its own Container)

6.3 CloudMap: Visual Notation for Cloud Resource Management

CloudMap offers a refreshing “visual” attempt at simplifying the way DevOps can navigate and understand cloud resource configurations, as well as monitor and control such resources. The concepts of CloudMap are associated with both a visual notation and an underlying textual JSON syntax. The textual syntax provides the context in which visual primitives can be specified and executed.

The constructs of the notation are specified as follows: (i) The Structural Model represents primitive cloud resource entities and their attributes. Attributes have a string name with one or a set of string values. We reuse the high-level cloud resource representation model that we proposed in Chapter 3. (ii) The Navigation Model represents the topology of links between entities. Links are directional with a single string label. The set of valid links is domain-specific (we will explain in Section 6.4 how CloudMap depends upon our previous work in Chapter 3 [211] to determine domain context). (iii) Badges are an auxiliary feature to the fundamental constructs mentioned above. A badge represents a special entity that may be “tagged” to another entity. Visually, a badge is realized as a single widget or set of widgets, which may be used to monitor and/or control tagged entities. The data model of a badge specifies which entity-type it applies to.
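The three constructs can be captured as simple data types. The following is an illustrative sketch only (field names follow the schema of Figure 6.4; the classes themselves are our assumption, not the prototype's implementation):

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """Structural model: a primitive cloud resource and its attributes."""
    name: str
    schema: str
    symbol: str = ""
    optional_attributes: dict = field(default_factory=dict)

@dataclass
class Link:
    """Navigation model: a directed link with a single label between entities."""
    schema: str
    source_participant: str
    target_participant: str
    arrow_style: str = "directional"

@dataclass
class Badge:
    """Auxiliary construct: a probe or control action taggable to entities."""
    name: str
    type: str                      # 'probe' or 'control-action'
    applies_to: list = field(default_factory=list)


ode = Entity(name="ODE-Server-1", schema="docker.rest.Container")
link = Link("docker.rest.CommLink", "ODE-Server-1", "MySQL-DB-Server-1")
print(link.source_participant, "->", link.target_participant)
```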

Figure 6.3 illustrates the graphical notation, while Figure 6.4 is the syntactical schema of the underlying JSON model. Below we explain each of these constructs.

Figure 6.3: CloudMap Visual Notations (Entities: Resource Entities, namely Image, Container, Hosting Machine, Application and Cluster; Index Entities, namely Image Registry, Application Registry and Hosting Machine Registry. Links: Hosting, Containment, Communication, Instantiation and Dependency Links. Badges: Attribute and Monitoring Probes; Migration and Elasticity Control Actions)

(a) Entity:
{
  "name": "Entity",
  "description": "Represent an Entity in CloudMap",
  "properties": {
    "schema": {"type": "string"},
    "name": {"type": "string"},
    "symbol": {"type": "string"},
    "optional-attributes": [...]
  }
}

(b) Link:
{
  "name": "Link",
  "description": "Represent a Link in CloudMap",
  "properties": {
    "schema": {"type": "string"},
    "source-participant": {"type": "string"},
    "target-participant": {"type": "string"},
    "arrow-style": {"type": "enum['bi-directional','directional']"},
    "optional-attributes": [...]
  }
}

(c) Badge:
{
  "name": "Badge",
  "description": "Represent a Badge in CloudMap",
  "properties": {
    "name": {"type": "string"},
    "schema": {"type": "string"},
    "type": {"type": "enum['probe','control-action']"},
    "tool-tip-description": {"type": "string"},
    "icon": {"type": "string"},
    "applies-to": {"type": "array"},
    "related-widgets": {"type": "array"}
  }
}

(d) Widget:
{
  "name": "Widget",
  "description": "Represent a Widget in CloudMap",
  "properties": {
    "display-name": {"type": "string"},
    "schema": {"type": "string"},
    "applies-to": {"type": "string"},
    "dsm-config": {"name": "...", "monitored-events": ["...", "..."]},
    "visualization-config": {"type": "enum['bar-chart','list','pie-chart']"},
    "optional-attributes": [...]
  }
}

Figure 6.4: CloudMap Syntactical Schema of Constructs

6.3.1 Structural Model: Entities

We follow the design of Domain-specific Models in Chapter 3 to describe the Structural Model. However, to make the chapter self-contained, we redefine the concepts in certain sections. An entity is a single cloud resource, referred to as a Resource Entity, or a collection thereof, referred to as an Index Entity. Syntactically, an Entity as shown in Figure 6.4(a) contains a string name, a description and properties. Properties include a set of child attributes (e.g., name, schema, symbol, and optional-attributes) that describe features specific to each Entity type. The attribute name is used for the unique identification of an instance of an Entity among other Entity instances. The attribute symbol is used for the graphical representation of Entities. The attribute schema denotes the overall structure of each Entity type and is in accordance with the Domain-Specific schema1. Further details are exemplified in the remainder of this Section.
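Since the name attribute uniquely identifies an Entity instance, an entity store might enforce that constraint as sketched below. This is an illustrative sketch under our own assumptions, not the prototype's code:

```python
class EntityStore:
    """Keeps Entity descriptions, enforcing unique entity names."""

    def __init__(self):
        self._by_name = {}

    def add(self, entity):
        name = entity["name"]
        if name in self._by_name:
            raise ValueError(f"duplicate entity name: {name}")
        self._by_name[name] = entity

    def get(self, name):
        return self._by_name[name]


store = EntityStore()
store.add({"schema": "docker.rest.Container", "name": "ODE-Server-1"})
print(store.get("ODE-Server-1")["schema"])  # docker.rest.Container
```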

Resource Entities. We have identified five resource entity types:

1. Container

A Container represents a virtualized software container (e.g., Linux Containers2, OpenVZ3, Solaris containers [122]) where DevOps deploy an application or a component of an application (e.g., an Apache ODE Server installed on Ubuntu OS with dependent libraries).

Syntactically, a Container requires a collection of attributes that describes the Container. The required attributes and constraints of the Container description are defined as a JSON schema named “docker.rest.Container” (refer to Listing 6.1). The attributes named cpu, memory, port-binding-rules and state denote configuration details such as the amount of allocated CPU and memory, and port forwarding rules. The value of the attribute image refers to an Image, which represents the deployment description of the Container.

1 See our previous work: ftp://ftp.cse.unsw.edu.au/pub/doc/papers/UNSW/201514.pdf
2 linuxcontainers.org
3 http://openvz.org/

Code 6.1: Sample configuration of a Container
{
  "schema": "docker.rest.Container",
  "name": "ODE-Server-1",
  "symbol": "./container.svg",
  "optional-attributes": {
    "state": "run",
    "port-binding-rules": "8080/80",
    "cpu": 2,
    "memory": 1024,
    "image": "[Image]-bpel/2.3"
  }
}

Figure 6.5 depicts the CloudMap visual notation for representing the Container in Listing 6.1.

ODE-Server-1

Figure 6.5: Visual construct of the Container

2. Hosting Machine

A Hosting Machine represents a computer system where Containers are hosted (e.g., a Virtual Machine (VM) or a physical machine). Syntactically, a Hosting Machine requires several attributes (refer to Listing 6.2). FQDN denotes the fully qualified address of the Hosting Machine. The attribute access-credentials denotes the user name and password required to log in to the Hosting Machine. The values of hm-registry and cluster-config describe the relationships with two Entities (i.e., Hosting Machine Registry and Cluster), which we define later.

Code 6.2: Sample configuration of a Hosting Machine
{
  "schema": "docker.rest.HostingMachine",
  "name": "vm1",
  "symbol": "./hm.svg",
  "optional-attributes": {
    "FQDN": "10.100.56.67",
    "access-credentials": "root/admin123",
    "hm-registry": "HMR-01",
    "cluster-config": {"name": "cluster-1", "tcp-port": "4005"}
  }
}

Figure 6.6 depicts the CloudMap visual notation for representing the Hosting Machine in Listing 6.2.

vm1

Figure 6.6: Visual construct of the Hosting Machine

3. Cluster

A Cluster represents a set of Hosting Machines. This reduces the overhead of dynamically managing multiple machines. For example, the Cluster may automatically decide which Hosting Machine will be chosen to deploy a given container based on an optimization algorithm [179]. The sample Cluster configuration described in Listing 6.3 indicates that the cluster belongs to a Hosting Machine Registry named “HMR-01”, and gives the location of a discovery service.

Code 6.3: Sample configuration of a Cluster
{
  "schema": "docker.rest.Cluster",
  "name": "cluster-1",
  "symbol": "./cluster.svg",
  "optional-attributes": {
    "hm-registry": "HMR-01",
    "discovery-service-config": "token://fy67v96a72cf04dba8c1c4aa79536ec3"
  }
}

Figure 6.7 depicts the CloudMap visual notation for representing the Cluster in Listing 6.3.

cluster-1

Figure 6.7: Visual construct of the Cluster

4. Application

An Application represents a logical entity that includes a collection of related Containers. Each Container constitutes a component of the Application. Syntactically, an Application, described in Listing 6.4, includes the version and a list of Containers which are the components of the Application (e.g., “app2” is composed of Containers named “mysql”, “nginx”, “ODE-Server-1”, “ODE-Server-2” and “ODE-Server-3”).

Code 6.4: Sample configuration of an Application
{
  "schema": "docker.rest.Application",
  "name": "app2",
  "symbol": "./application.svg",
  "optional-attributes": {
    "version": "1.0",
    "containers": [
      {
        "name": "mysql",
        "communication-links": []
      },
      {
        "name": "nginx",
        "communication-links": ["ODE-Server-1", "ODE-Server-2", "ODE-Server-3"]
      },
      {
        "name": "ODE-Server-1",
        "communication-links": ["mysql"]
      },
      {
        "name": "ODE-Server-2",
        "communication-links": ["mysql"]
      },
      {
        "name": "ODE-Server-3",
        "communication-links": ["mysql"]
      }
    ],
    "application-registry": "AR-01"
  }
}

Figure 6.8 depicts the CloudMap visual notation for representing the Application in Listing 6.4.

app2

Figure 6.8: Visual construct of the Application
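The communication-links in an Application description (cf. Listing 6.4) induce a directed graph between its Containers. A sketch of extracting that graph (the traversal function is illustrative, not part of CloudMap):

```python
# Application description abbreviated from Listing 6.4.
app = {
    "containers": [
        {"name": "mysql", "communication-links": []},
        {"name": "nginx", "communication-links": ["ODE-Server-1", "ODE-Server-2"]},
        {"name": "ODE-Server-1", "communication-links": ["mysql"]},
        {"name": "ODE-Server-2", "communication-links": ["mysql"]},
    ]
}

def communication_graph(application):
    """Map each Container to the Containers it communicates with."""
    return {c["name"]: list(c["communication-links"])
            for c in application["containers"]}

graph = communication_graph(app)
print(graph["nginx"])  # ['ODE-Server-1', 'ODE-Server-2']
```

Such a graph is what the Navigation Model renders visually as Communication Links.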

5. Image

An Image represents the deployment description of a Container, which is fed to the runtime of the orchestration tool in order to instantiate the Container. Syntactically, an Image, described in Listing 6.5, includes the version, dependent Images and a script that includes a set of commands to construct the Image. For example, script may represent a sequence of commands to install and configure the Ubuntu operating system with an Apache Tomcat Server and a web application.

Code 6.5: Sample configuration of an Image
{
  "schema": "docker.rest.Image",
  "name": "MySQL-DB-Server",
  "symbol": "./image.svg",
  "optional-attributes": {
    "version": "7.0",
    "depend-on": "ubuntu-OS/10.04",
    "script": "./run.sh",
    "image-registry": "IR-01"
  }
}

Figure 6.9 depicts the CloudMap visual notation for representing the Image in Listing 6.5.

MySQL-DB-Server

Figure 6.9: Visual construct of the Image

Index Entities. We define the following three index/registry entities. As mentioned, they represent collections of elementary resource entities. For instance, a registry of VMs allows users to query and search for VMs.

1. Hosting Machine Registry

A Hosting Machine Registry is a logical entity that contains a set of Hosting Ma- chines and Clusters.

Code 6.6: Sample configuration of a Hosting Machine Registry
{
  "schema": "docker.rest.HostingMachineRegistry",
  "name": "HMR-01",
  "symbol": "./hmr.svg",
  "optional-attributes": {
    "url": "registry.hub.soc.unsw.edu.au",
    "protocol": "http",
    "port": "7001"
  }
}

Figure 6.10 depicts the CloudMap visual notation for representing the Hosting Machine Registry in Listing 6.6.

HMR-01

Figure 6.10: Visual construct of the Hosting Machine Registry

2. Application Registry

An Application Registry represents a repository of Applications. DevOps organize and discover all deployed cloud applications within the Registry. The sample Application Registry configuration described in Listing 6.7 indicates that the Application Registry is accessible using the “http” protocol at “registry.hub.soc.unsw.edu.au” on port “6001”.

Code 6.7: Sample configuration of an Application Registry
{
  "schema": "docker.rest.ApplicationRegistry",
  "name": "AR-01",
  "symbol": "./app-reg.svg",
  "optional-attributes": {
    "url": "registry.hub.soc.unsw.edu.au",
    "protocol": "http",
    "port": "6001"
  }
}

Figure 6.11 depicts the CloudMap visual notation for representing the Applica- tion Registry in Listing 6.7.

AR-01

Figure 6.11: Visual construct of the Application Registry

3. Image Registry

An Image Registry represents a repository of Images where DevOps may organize, curate and share resource deployment knowledge. The sample Image Registry configuration described in Listing 6.8 indicates that the Image Registry is accessible using the “http” protocol at “registry.hub.soc.unsw.edu.au” on port “5001”.

Code 6.8: Sample configuration of an Image Registry
{
  "schema": "docker.rest.ImageRegistry",
  "name": "IR-01",
  "symbol": "./img-reg.svg",
  "optional-attributes": {
    "url": "registry.hub.soc.unsw.edu.au",
    "protocol": "http",
    "port": "5001"
  }
}

Figure 6.12 depicts the CloudMap visual notation for representing the Image Registry in Listing 6.8.

IR-01

Figure 6.12: Visual construct of the Image Registry

6.3.2 Navigation Model: Links

Relationships between Entities are represented as Links, which enable navigation. Syntactically, Links have a string schema (refer to Figure 6.4(b)), which denotes the overall structure of each Link type and is in accordance with the Domain-Specific schema4. They also define the source and target participants. The graphical representation of a Link is denoted by arrow-style. Additional attributes may also be defined. Further details are exemplified in the remainder of this Section.

We have identified five different groups of links, as categorized below.

1. Communication Links are defined between two Containers that interact or exchange data. For example, an Apache ODE (BPEL) server communicates with a MySQL database server about BPEL instances.

Code 6.9: Sample configuration of a Communication Link
{
  "schema": "docker.rest.CommLink",
  "source-participant": "ODE-Server-1",
  "target-participant": "MySQL-DB-Server-1",
  "arrow-style": "directional"
}

Figure 6.13 depicts the CloudMap visual notation for representing the Com- munication Link in Listing 6.9.

communicates-to ODE-Server-1 MySQL-DB-Server-1

Figure 6.13: Visual construct of the Communication Link

4 See our previous work: ftp://ftp.cse.unsw.edu.au/pub/doc/papers/UNSW/201514.pdf

2. Containment Links define the hierarchical organization of entities. In practice, they may also be used to simultaneously control a set of related resources (e.g. control actions on a parent automatically trigger actions on all children). The valid set of containment links is defined in the links’ schema (as also illustrated in Figure 6.3). For example, Containment Links exist between an Application entity and its Container entities.

Code 6.10: Sample configuration of a Containment Link
{
  "schema": "docker.rest.ContainmentLink",
  "source-participant": "BPEL-App",
  "target-participant": "ODE-Server-1",
  "arrow-style": "directional"
}

Figure 6.14 depicts the CloudMap visual notation for representing the Con- tainment Link in Listing 6.10.

contains BPEL-App ODE-Server-1

Figure 6.14: Visual construct of the Containment Link
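The parent-to-children cascading of control actions over Containment Links can be sketched recursively. The containment map and the action name below are illustrative only:

```python
# Parent entity -> directly contained entities (cf. the BPEL-App example).
containment = {
    "BPEL-App": ["ODE-Server-1", "ODE-Server-2", "ODE-Server-3"],
    "ODE-Server-1": [],
    "ODE-Server-2": [],
    "ODE-Server-3": [],
}

def apply_control(entity, action, applied=None):
    """Apply 'action' to 'entity' and, transitively, to every contained entity."""
    if applied is None:
        applied = []
    applied.append((entity, action))
    for child in containment.get(entity, []):
        apply_control(child, action, applied)
    return applied

print(apply_control("BPEL-App", "stop"))
```

A single action on the Application entity thus reaches all of its Containers, which is the practical benefit of Containment Links noted above.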

3. Hosting Links define a relationship between a Hosting Machine and a Container or an Application; for example, between a PHP application engine and the VM where the PHP application engine is deployed.

Code 6.11: Sample configuration of a Hosting Link
{
  "schema": "docker.rest.HostingLink",
  "source-participant": "VM-1",
  "target-participant": "ODE-Server-1",
  "arrow-style": "directional"
}

Figure 6.15 depicts the CloudMap visual notation for representing the Hosting Link in Listing 6.11.

hosts VM-1 ODE-Server-1

Figure 6.15: Visual construct of the Hosting Link

4. Dependency Links define a relationship between two Images, where the attributes of a particular resource depend upon the other resource. To emphasize the practical benefits of this: suppose there exists an Image that describes an Apache Web server installed on an Ubuntu Operating System. Now, to create a new Web Application that is to be installed on the Apache server, the developer would simply need to create a new Image that depends on the existing Image, and subsequently customize it with any additional attributes. This saves the developer from creating a new Image that describes the complete software stack from scratch.

Code 6.12: Sample configuration of a Dependency Link
{
  "schema": "docker.rest.DependencyLink",
  "source-participant": "Tomcat-7",
  "target-participant": "Java-7-VM",
  "arrow-style": "directional"
}

Figure 6.16 depicts the CloudMap visual notation for representing the Depen- dency Link in Listing 6.12.

depends-on Tomcat-7 Java-7-VM

Figure 6.16: Visual construct of the Dependency Link
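Following depends-on links yields the full software stack an Image builds on, so a new Image need only describe its own layer. A sketch of resolving such a chain (the image names are illustrative, echoing the Apache-on-Ubuntu example above):

```python
# Image -> the image it depends on (None terminates the chain).
depends_on = {
    "web-app": "apache-server",
    "apache-server": "ubuntu-os",
    "ubuntu-os": None,
}

def stack(image):
    """List the image and its transitive dependencies, most specific first."""
    chain = []
    while image is not None:
        chain.append(image)
        image = depends_on[image]
    return chain

print(stack("web-app"))  # ['web-app', 'apache-server', 'ubuntu-os']
```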

5. Instantiation Links define a relationship between an Image and a Container, where the latter is produced from the former. For example, a MySQL Container may be instantiated from a MySQL Image.

Code 6.13: Sample configuration of an Instantiation Link
{
  "schema": "docker.rest.InstantiationLink",
  "source-participant": "MySQL-DB-Server-Image",
  "target-participant": "MySQL-DB-Server-1",
  "arrow-style": "directional"
}

Figure 6.17 depicts the CloudMap visual notation for representing the Instan- tiation Link in Listing 6.13.

instantiates MySQL-DB-Server-Image MySQL-DB-Server-1

Figure 6.17: Visual construct of the Instantiation Link

6.3.3 Badges: Probes and Control-Actions

A badge represents an auxiliary feature to the fundamental constructs of Entities and Links. Syntactically, badges define a string name, tool-tip-description and icon, and may only be applied to certain entity types, defined in the applies-to property (refer to Listing 6.14). When a badge is tagged to an entity, it applies a behavioral function, depending on the type of badge: either a probe (used for monitoring) or a control action (used for performing some action). Visually, when a badge is tagged to an entity, it renders a widget; a badge thus also contains a pointer to one or a set of widgets. The sample probe configuration described in Listing 6.14 indicates that the probe applies to all Entity types and renders a widget named “attribute-widget”.

Code 6.14: Sample configuration of a Badge
{
  "name": "Attributes Probe",
  "type": "probe",
  "tool-tip-description": "Cloud Resource Attributes",
  "icon": "../icons/ap.svg",
  "applies-to": ["$all"],                    <- Applicable for all Entity types
  "related-widgets": ["attribute-widget"]    <- Associating with a Widget
}

Figure 6.18 depicts the visual representation of the probe in Listing 6.14.

Figure 6.18: Visual representation of a probe

Syntactically, widgets, as shown in Fig 6.4(d), contain a string name and description, and may only be applied to certain Entity types, defined in the applies-to property. The attribute dsm-config describes the configuration to retrieve the data to be visualized, from our previous work (refer to Chapter 3) [211], to populate the widget. The attribute visualization-config describes how to visualize the retrieved data (e.g., as a bar chart, pie chart, list). Additional attributes (i.e., optional-attributes) may be included to describe features specific to each widget type. The sample widget configuration in Listing 6.15 indicates that the widget applies to all Entity types and renders a list of attributes described in the Domain-specific Model named “cloudbase.docker.v1”.

Code 6.15: Sample configuration of a Widget

    {
      "name": "attribute-widget",
      "display-name": "Attributes Widget",
      "resource-type": ["$all"],
      "dsm-config": {                          <- Configuration to extract data from Cloud resources
        "name": "cloudbase.docker.v1",
        "events": ["$static-attributes"]
      },
      "visualization-config": { "type": "list" }   <- Generates a list
    }

Figure 6.19 depicts the visual generation of the widget once its related probe is attached to a Container named “Nginx-LB”.

Figure 6.19: Visual representation of a widget

Probes

At present, we have developed two probes: (1) the Attribute Probe displays the names and values of attributes for a given entity. For example, a VM in AWS-EC2 contains attributes about the number of CPU cores, storage and memory capacity, OS and access rules. (2) The Monitoring Probe continuously monitors runtime data of deployed cloud resources. For example, if an Application, a Container or a Hosting Machine is tagged by the “monitoring probe”, an appropriate widget becomes available to graphically analyze the underlying resource consumption statistics (e.g., memory usage, network I/O, CPU usage). Monitoring probes may also be attached to several resources at the same time, and the associated widget aggregates, summarizes and provides a visual technique to compare the performance of multiple cloud resources.
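The aggregation behaviour described above can be illustrated with a minimal sketch. The sample readings and the summary function are illustrative, not part of the CloudMap implementation:

```python
# Minimal sketch: when a Monitoring Probe is attached to several resources,
# the associated widget summarizes their statistics for side-by-side
# comparison. Data below is illustrative.
samples = {
    "ODE-Server-1": [41.0, 55.5, 48.1],   # CPU usage (%) over time
    "ODE-Server-2": [12.3, 10.8, 14.0],
}

def summarize(samples):
    """Average each resource's readings so they can be compared."""
    return {name: round(sum(vals) / len(vals), 1)
            for name, vals in samples.items()}

print(summarize(samples))
# -> {'ODE-Server-1': 48.2, 'ODE-Server-2': 12.4}
```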

Control Actions

Actions can be either manual or automated. Manual actions are self-prompted by dragging specific badges onto entities. For example, (1) Elasticity Control applies to a Container or Hosting Machine to scale it up or down (e.g., number of CPUs or memory); and (2) Migration Control migrates a Container across VMs. Automated actions are input as ECA-rules (see Section 6.4), via a dedicated universal badge that renders a rule-input widget.
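The Event-Condition-Action mechanism can be sketched as follows. The rule structure and threshold here are hypothetical illustrations; the actual rule language and processor are described in Section 6.4:

```python
# Hypothetical sketch of an ECA rule of the kind entered via the
# rule-input widget: when a memory-usage event exceeds a threshold,
# trigger an elasticity control action.
rule = {
    "event": "memory-usage",
    "condition": lambda e: e["used-mb"] / e["total-mb"] > 0.9,
    "action": "scale-up",
}

def evaluate(rule, event):
    """Return the action to trigger, or None if the condition fails."""
    if event["type"] == rule["event"] and rule["condition"](event):
        return rule["action"]
    return None

print(evaluate(rule, {"type": "memory-usage",
                      "used-mb": 1900, "total-mb": 2048}))
# -> scale-up
```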

6.3.4 Visualization Patterns for Cloud Resource Configurations

We consolidate the above by identifying three organizational patterns commonly found in cloud resource management. We describe their benefits via practical scenarios and accompanying illustrations. These visualization patterns are inspired by Mind-Maps, a popular information visualization technique. Compared to other available methods (e.g., ER-diagrams, spider diagrams, concept and topic maps), mind-maps in particular have proven more effective and scalable for browsing, querying and understanding information from very large data sets [98].

Image Map.

An Image Map visualizes the recursive dependencies of Images within a Registry. In general, cloud resource descriptions, deployment and/or control scripts typically have inter-relationships among them. Understanding these is essential during deployment; it also helps avoid errors when updating and creating new deployment artifacts. Ordinarily, DevOps would have to manually examine and compare deployment artifacts to extract such inter-relationships.

Figure 6.20: Image Map

Example. Figure 6.20 depicts an Image Map focused on "Ubuntu-OS". Consider we want to extend the "Apache-ODE" Image with an additional feature (e.g. BPEL4People). We are required to know all existing dependencies (e.g. Java-VM version) recursively up until the root Image. Thus the Image Map provides an indispensable visual technique to easily discover whether the "Apache-ODE" Image is dependent upon a particular version of "Java-VM" (e.g., Java7-VM). It is also useful as DevOps can identify, customize and reuse existing resources.
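The recursive dependency walk described above can be sketched in a few lines. This is a minimal illustration, assuming Dependency Links in the shape of Listing 6.16; the link data below is a small excerpt, not the full configuration:

```python
# Minimal sketch: recursively collect the Images an Image depends on,
# walking DependencyLinks from a CloudMap-style configuration.
links = [
    {"schema": "docker.rest.DependencyLink",
     "source-participant": "Apache-ODE", "target-participant": "Tomcat"},
    {"schema": "docker.rest.DependencyLink",
     "source-participant": "Tomcat", "target-participant": "Java7-VM"},
    {"schema": "docker.rest.DependencyLink",
     "source-participant": "Java7-VM", "target-participant": "Ubuntu-OS"},
]

def dependencies(image, links):
    """Return all Images `image` (transitively) depends on, root last."""
    deps = []
    for link in links:
        if (link["schema"] == "docker.rest.DependencyLink"
                and link["source-participant"] == image):
            target = link["target-participant"]
            deps.append(target)
            deps.extend(dependencies(target, links))
    return deps

print(dependencies("Apache-ODE", links))
# -> ['Tomcat', 'Java7-VM', 'Ubuntu-OS']
```

The walk answers exactly the question in the example: whether "Apache-ODE" transitively depends on "Java7-VM", and via which intermediate Images.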

Listing 6.16 shows the JSON-based configuration used to generate the mind-map in Figure 6.20. The configuration essentially comprises arrays of Entities (refer to line 2), Links (refer to line 34), Badges (refer to line 52) and Widgets (refer to line 73). To generate the Image Map, the relevant Entities (i.e., Images, Image Registries and Containers) and their related Links are consumed from the JSON configuration. The same JSON configuration is used to generate the other types of Mind-Maps proposed in our work. For the sake of brevity and clarity, only the most important configuration attributes are included.

Code 6.16: JSON-based configuration for generating Mind Maps

      1  {
      2    "Entities": [                    <- Entity Configurations
      3      {
      4        "name": "Ubuntu-OS",
      5        "schema": "docker.rest.Image",
      6        "symbol": "./container.svg",
      7        "optional-attributes": {
      8          ...
      9        }
     10      },
     11      { "name": "MySQL-DB-Server", "schema": "docker.rest.Image", ... },
     12      { "name": "Java7-VM", "schema": "docker.rest.Image", ... },
     13      { "name": "Tomcat", "schema": "docker.rest.Image", ... },
     14      { "name": "Apache-ODE", "schema": "docker.rest.Image", ... },
     15      { "name": "Jenkins", "schema": "docker.rest.Image", ... },
     16      { "name": "Java8-VM", "schema": "docker.rest.Image", ... },
     17      { "name": "Nginx", "schema": "docker.rest.Image", ... },
     18      { "name": "IR-01", "schema": "docker.rest.ImageRegistry", ... },
     19      { "name": "MySQL-DB-Server-1", "schema": "docker.rest.Container", ... },
     20      { "name": "ODE-Server-1", "schema": "docker.rest.Container", ... },
     21      { "name": "ODE-Server-2", "schema": "docker.rest.Container", ... },
     22      { "name": "ODE-Server-3", "schema": "docker.rest.Container", ... },
     23      { "name": "Nginx-LB", "schema": "docker.rest.Container", ... },
     24      { "name": "BPEL-App", "schema": "docker.rest.Application", ... },
     25      { "name": "HM-1", "schema": "docker.rest.HostingMachine", ... },
     26      { "name": "HM-2", "schema": "docker.rest.HostingMachine", ... },
     27      { "name": "AR-01", "schema": "docker.rest.Application-Registry", ... },
     28      { "name": "Cluster-Manager", "schema": "docker.rest.Container", ... },
     29      { "name": "C-1", "schema": "docker.rest.Cluster", ... },
     30      { "name": "HMR-01", "schema": "docker.rest.Hosting-Machine-Registry", ... },
     31      { "name": "HM-0", "schema": "docker.rest.HostingMachine", ... }
     32
     33    ],
     34    "Links": [                       <- Link Configurations
     35      {
     36        "source-participant": "MySQL-DB-Server",
     37        "target-participant": "Ubuntu-OS",
     38        "schema": "docker.rest.DependencyLink",
     39        "arrow-style": "directional"
     40      },
     41      { "source-participant": "Java7-VM", "target-participant": "Ubuntu-OS", ... },
     42      { "source-participant": "Nginx", "target-participant": "Ubuntu-OS", ... },
     43      { "source-participant": "IR-01", "target-participant": "Ubuntu-OS", ... },
     44      { "source-participant": "Tomcat", "target-participant": "Java7-VM", ... },
     45      { "source-participant": "Apache-ODE", "target-participant": "Tomcat", ... },
     46      { "source-participant": "Apache-ODE", "target-participant": "ODE-Server-1", ... },
     47      { "source-participant": "Apache-ODE", "target-participant": "ODE-Server-2", ... },
     48      { "source-participant": "Apache-ODE", "target-participant": "ODE-Server-3", ... },
     49      { "source-participant": "Nginx", "target-participant": "Nginx-LB", ... },
     50      ...
     51    ],
     52    "Badges": [                      <- Badge Configurations
     53      {
     54        "name": "Attributes Probe",
     55        "type": "probe",
     56        "tool-tip-description": "Cloud Resource Attributes",
     57        "icon": "../icons/ap.svg",
     58        "applies-to": ["$all"],
     59        "related-widgets": ["attribute-widget"]
     60      },
     61      {
     62        "monitoring-info-probe": {
     63          "name": "Monitoring Information Probe",
     64          "type": "probe",
     65          "tool-tip-description": "Runtime Statistics",
     66          "icon": "../icons/mip.svg",
     67          "applies-to": ["Container", "Hosting-Machine"],
     68          "widgets": ["cpu-info-widget", "memory-info-widget"]
     69        }
     70      },
     71      { ... }
     72    ],
     73    "Widgets": [                     <- Widget Configurations
     74      {
     75        "name": "attribute-widget",
     76        "display-name": "Attributes Data",
     77        "resource-type": ["$all"],
     78        "dsm-config": {
     79          "name": "cloudbase.docker.v1",
     80          "events": ["$static-attributes"]
     81        },
     82        "visualization-config": { "type": "list" }
     83      },
     84      {
     85        "name": "cpu-info-widget",
     86        "display-name": "CPU Usage Widget",
     87        "resource-type": ["Container", "Hosting-Machine"],
     88        "dsm-config": [
     89          {
     90            "name": "cloudbase.docker.v1",
     91            "resource-type": "Container",
     92            "events": ["getContainerCPUDetails"]
     93          },
     94          {
     95            "name": "cloudbase.docker.v1",
     96            "resource-type": "Hosting-Machine",
     97            "events": ["getHMCPUDetails"]
     98          }
     99        ],
    100        "visualization-config": { "type": "bar-chart" }
    101      },
    102      {
    103        "name": "memory-info-widget",
    104        "display-name": "Memory Usage Widget",
    105        "resource-type": ["Container", "Hosting-Machine"],
    106        "dsm-config": [
    107          {
    108            "name": "cloudbase.docker.v1",
    109            "resource-type": "Container",
    110            "events": ["getContainerMemoryDetails"]
    111          },
    112          {
    113            "name": "cloudbase.docker.v1",
    114            "resource-type": "Hosting-Machine",
    115            "events": ["getHMMemoryDetails"]
    116          }
    117        ],
    118        "visualization-config": { "type": "bar-chart" }
    119      },
    120      { ... },
    121      { ... }
    122    ]
    123  }
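The map-generation step can be illustrated with a small sketch: from one configuration in the shape of Listing 6.16, each Mind-Map consumes only the Entities, and the Links between them, whose schemas are relevant to that map type. The selection logic below is illustrative, and the "docker.rest.HostingLink" schema name is an assumption introduced only to show a link being filtered out:

```python
import json

# Sketch: derive the Image Map's nodes and edges from a full
# CloudMap-style configuration by filtering on schema types.
IMAGE_MAP_SCHEMAS = {
    "docker.rest.Image",
    "docker.rest.ImageRegistry",
    "docker.rest.Container",
}

config = json.loads("""
{
  "Entities": [
    {"name": "Ubuntu-OS", "schema": "docker.rest.Image"},
    {"name": "Nginx", "schema": "docker.rest.Image"},
    {"name": "IR-01", "schema": "docker.rest.ImageRegistry"},
    {"name": "HM-1", "schema": "docker.rest.HostingMachine"}
  ],
  "Links": [
    {"schema": "docker.rest.DependencyLink",
     "source-participant": "Nginx", "target-participant": "Ubuntu-OS"},
    {"schema": "docker.rest.HostingLink",
     "source-participant": "HM-1", "target-participant": "Nginx"}
  ]
}
""")

def image_map_view(config):
    """Select the Entities and Links rendered on the Image Map."""
    nodes = {e["name"] for e in config["Entities"]
             if e["schema"] in IMAGE_MAP_SCHEMAS}
    edges = [l for l in config["Links"]
             if l["source-participant"] in nodes
             and l["target-participant"] in nodes]
    return nodes, edges

nodes, edges = image_map_view(config)
print(sorted(nodes))   # ['IR-01', 'Nginx', 'Ubuntu-OS']
print(len(edges))      # 1  (the hosting link is filtered out)
```

The same filtering, with a different schema set, would yield the Application Map or Hosting-Machine Map views from the identical configuration file.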

Application Map.

An Application Map visualizes the organization and interaction of the Hosting Machines and/or Containers of an Application. Deployed resources usually depend on other cloud resources to provide and consume services. Understanding their runtime interactions thus proves extremely important, particularly when applying modifications (e.g., reconfiguring, scaling, shutting down); mistakes could otherwise lead to Service-Level-Agreement (SLA) violations, or catastrophic disruptions to the complete resource infrastructure5.

Example. Figure 6.21 depicts an Application Map focused on the "BPEL-App". Consider that DevOps wish to scale up the "Apache-ODE-Server". This requires creating a new "Apache-ODE-Server" and the communication links with any related Containers (e.g., "MySQL-DB-Server" and "Nginx-LB"). DevOps hence need to understand: (a) what the existing Containers of the Application are; (b) how they are related to one another; and (c) what Image is to be used to instantiate the new Container. Using the Application Map in conjunction with the Image Map, DevOps can easily determine which Image is needed, and the related Containers with which to set up the communication links.

Another use-case is understanding the communication links between Containers. Traditionally, this is achieved via command-line tools, which only return details about a single Container. DevOps would thus have to iteratively discover information about each Container to derive a global view. Command-line tools are also only suitable for sophisticated administrators: monitoring details and control actions are represented in textual forms, which are harder to memorize and understand than visual forms.

Example. To optimize the overall performance of an Application, we may minimize data communication across different Hosting Machines. One such technique is to detect inter-communicating Containers and migrate them onto one Hosting Machine to reduce network latencies.
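The co-location check described above can be sketched as follows. The placement and communication data are illustrative, and representing communication as simple container-name pairs is an assumption for the sketch:

```python
# Sketch: flag pairs of inter-communicating Containers placed on
# different Hosting Machines as candidates for migration.
placement = {"ODE-Server-1": "HM-1", "MySQL-DB-Server": "HM-2",
             "Nginx-LB": "HM-1"}
comm_links = [("ODE-Server-1", "MySQL-DB-Server"),
              ("Nginx-LB", "ODE-Server-1")]

def migration_candidates(placement, comm_links):
    """Pairs that communicate across Hosting Machines (latency cost)."""
    return [(a, b) for a, b in comm_links
            if placement[a] != placement[b]]

print(migration_candidates(placement, comm_links))
# -> [('ODE-Server-1', 'MySQL-DB-Server')]
```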

The Application Map in Figure 6.21 is populated from the relevant Entities (i.e., Applications, Containers and Application Registries) and their related Links, described in the JSON description in Listing 6.16 (Lines 2-51). Drag-n-Drop Badges are populated from the Badges section in Listing 6.16 (Lines 52-72). The Memory Usage Widget, CPU Usage Widget, Attributes Widget, Activity Wall and Elasticity Control Widget in Figure 6.21 are populated from the Widgets section in Listing 6.16 (Lines 73-122).

5http://aws.amazon.com/message/65648/

Figure 6.21: Application Map

Hosting-Machine Map.

A Hosting-Machine Map visualizes the organization of Containers and Applications within a specific Hosting Machine. This is useful for DevOps who manage a complete cloud environment, as opposed to the Application Map, which only shows the Containers of a specific Application.

Example. Figure 6.22 shows the set of Containers deployed on the Hosting Machine “HM-1”. DevOps are constantly responsible for checking for optimization strategies: identifying under- or over-used machines. Suppose “HM-1” can host a maximum of five Containers. We can use the map to determine that there are only three running Containers (i.e., ODE-Server 1, 2 and 3). Thus it is possible to deploy two new Containers or migrate two existing ones. On the other hand, DevOps may delete Hosting Machines which are not currently hosting any Containers.
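The capacity check in this example can be sketched in a few lines. The maximum container counts and placements below are illustrative values taken from the scenario, not real CloudMap data:

```python
# Sketch: given a Hosting Machine's maximum container count and its
# currently running Containers, compute free slots and idle machines.
MAX_CONTAINERS = {"HM-1": 5, "HM-2": 5}
running = {"HM-1": ["ODE-Server-1", "ODE-Server-2", "ODE-Server-3"],
           "HM-2": []}

def free_slots(hm):
    """How many new Containers the Hosting Machine can still accept."""
    return MAX_CONTAINERS[hm] - len(running.get(hm, []))

def idle_machines():
    """Hosting Machines with no Containers -- candidates for deletion."""
    return [hm for hm in MAX_CONTAINERS if not running.get(hm)]

print(free_slots("HM-1"))   # 2
print(idle_machines())      # ['HM-2']
```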

Furthermore, in conjunction with the Monitoring Probe, DevOps may observe data such as memory and storage utilization, as well as the existence of any exhausted machines. Actions may then be taken to avoid potential memory overflows and crashes of Containers: for instance, scaling up the Hosting Machines to increase their underlying resources, or notifying the owners to take any necessary actions (e.g., migrating Containers to non-exhausted machines).

The Hosting Machine Map in Figure 6.22 is populated from the relevant Entities (i.e., Hosting Machines, Containers, Hosting Machine Registries and Applications) and their related Links, described in the JSON description in Listing 6.16 (Lines 2-51). Drag-n-Drop Badges are populated from the Badges section in Listing 6.16 (Lines 52-72). The Memory Usage Widget, CPU Usage Widget, Attributes Widget, Activity Wall and Elasticity Control Widget in Figure 6.22 are populated from the Widgets section in Listing 6.16 (Lines 73-122).

6.4 Implementation

We leverage our previous work in Chapter 3 [211] to simplify the interactions between underlying orchestration tools. With our proposed Domain Specific Model (DSM), high-level resource configurations can be supplied, which are then automatically translated into their native language using Connectors. Layered above this, as shown in Figure 6.23, CloudMap implements: (i) an interactive mind-map visualization for navigating cloud resources; (ii) detection and display of events for monitoring; and (iii) support for performing both manual and automated actions.

Figure 6.22: Hosting Machine Map

6.4.1 Mind-Map Generation

The knowledge needed to generate the mind-maps is loaded from a supplied CloudMap JSON file. A detailed CloudMap JSON file can be found in Listing 6.16 in Section 6.3.4. This is typically written by DevOps for a desired cloud resource configuration using the CloudMap notation we presented in Section 6.3. Furthermore, existing description artifacts of cloud resource configurations, stored in the JSON Object Store, can be reused to generate the mind-maps (refer to Figure 6.23). In a CloudMap JSON file, the Entities and Links are the two main sections that simplify the generation of mind-maps. As mentioned, constructs that pertain to some orchestration tool (e.g., Docker) must abide by the Domain-specific Model's schema. For example, the BPEL-App entity abides by the docker.rest.Application schema, as shown in Figure 6.23. When this is the case, behind the scenes, the system we proposed in Chapter 3 is able to interpret complex and heterogeneous configurations and seamlessly connect to the underlying orchestration tool. This means that when a configuration file is written, it is automatically translated into the low-level tool-specific language/API and deployed. The graphical mind-maps are rendered via the JS InfoVis Toolkit (http://philogb.github.io/jit/).
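As an illustration of how the Entities and Links sections drive mind-map generation, consider the following simplified Python analogue of the DSM-to-MindMap step. The entity identifiers and field names are hypothetical (they are not taken from Listing 6.16); the sketch merely shows how a flat Entities/Links description can be folded into the nested node structure expected by tree-visualization libraries such as the JS InfoVis Toolkit.

```python
# Simplified sketch of the DSM-to-MindMap step: a CloudMap-style
# description (Entities + Links) is converted into a nested tree.
# Entity and field names here are illustrative, not from Listing 6.16.

def build_mindmap(description, root_id):
    children = {}
    for link in description["Links"]:
        # e.g. {"from": "HM-1", "to": "BPEL-App", "type": "hosts"}
        children.setdefault(link["from"], []).append(link["to"])
    entities = {e["id"]: e for e in description["Entities"]}

    def node(entity_id):
        entity = entities[entity_id]
        return {
            "id": entity_id,
            "name": entity.get("name", entity_id),
            # schema ties the entity to a DSM, e.g. docker.rest.Application
            "data": {"schema": entity.get("schema")},
            "children": [node(c) for c in children.get(entity_id, [])],
        }

    return node(root_id)

description = {
    "Entities": [
        {"id": "HM-1", "name": "Hosting Machine 1"},
        {"id": "BPEL-App", "schema": "docker.rest.Application"},
    ],
    "Links": [{"from": "HM-1", "to": "BPEL-App", "type": "hosts"}],
}

tree = build_mindmap(description, "HM-1")
```

The resulting nested dictionary can be handed directly to a tree-layout renderer; in CloudMap itself this role is played by the JS InfoVis Toolkit on the client side.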

6.4.2 Activity/Control Wall

To enable interactivity, we have implemented a contextualized dashboard and control wall (see the right-hand side and bottom panels in Figures 6.21 and 6.22) that includes Widgets and a Command-Line Interface.

Figure 6.23: CloudMap System Architecture. (The architecture comprises an Extended BPMN Editor and Translator Engine over a BPMN 2.0 Engine; a Contexts DB with Recommendation Rules keyed by Task Category and Deployment Scenario; a DSM-to-MindMap generator that reads Domain-specific Models (DSMs); an Event Management System and ECA Rule Processor that subscribe to events and trigger actions; MindMap, Widgets, Events and Rules DBs plus a JSON Object Store; and Connectors that invoke operations on, and receive events from, the underlying cloud resource management tools.)

Widgets:

For example, when the user drags-and-drops either a Probe or Control badge onto a mind-map entity, an appropriate widget is displayed. Activity events are also posted on the Activity Wall (refer to Figure 6.21). For each Probe badge, a necessary monitoring task is scheduled. These monitoring tasks consume the JSON descriptions of badges and related widgets (refer to Figure 6.4(c) and (d)) to determine which data should be read from the Events DB. When a Probe badge is detached, the monitoring task is deleted. Badges may also be attached to multiple nodes to formulate an aggregated visualization. For example, Figures 6.21 and 6.22 compare absolute memory consumption statistics of each Container and Hosting Machine. Similarly, control action widgets allow DevOps to "manually" perform actions to modify the resource configurations. Widgets are implemented in HTML/JS. Widgets related to Probe badges leverage the Google Chart Library (http://developers.google.com/chart/) for generating graphical charts. We assume the requisite widgets are pre-built and curated in the Widgets DB. Real-time updates to the widgets are also achieved by triggers on the Events DB that notify the affected widget when new event entries are received in the DB.
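The probe-badge life-cycle described above (attach schedules a monitoring task, detach deletes it, triggers refresh the widget) can be sketched as follows. This is a deliberately simplified Python model, not the actual HTML/JS implementation, and all names and data shapes are invented for illustration.

```python
# Simplified sketch of probe-badge handling: attaching a Probe badge
# schedules a monitoring task; detaching it deletes the task; polling
# reads the relevant entries from the Events DB. All names are
# illustrative, not the actual implementation.

class ProbeScheduler:
    def __init__(self, events_db):
        self.events_db = events_db  # entity_id -> list of event dicts
        self.tasks = {}             # (entity_id, metric) -> monitoring task

    def attach(self, entity_id, badge):
        # The badge's JSON description tells the task which metric to read.
        metric = badge["metric"]    # e.g. "memory_usage"
        self.tasks[(entity_id, metric)] = lambda eid=entity_id, m=metric: [
            e["value"] for e in self.events_db.get(eid, [])
            if e["metric"] == m
        ]

    def detach(self, entity_id, badge):
        self.tasks.pop((entity_id, badge["metric"]), None)

    def poll(self, entity_id, metric):
        # Called by a DB trigger when new events arrive, to refresh the widget.
        return self.tasks[(entity_id, metric)]()

events = {"HM-1": [{"metric": "memory_usage", "value": 71},
                   {"metric": "cpu_usage", "value": 15}]}
scheduler = ProbeScheduler(events)
scheduler.attach("HM-1", {"metric": "memory_usage"})
readings = scheduler.poll("HM-1", "memory_usage")
```

Attaching the same badge to several entities simply registers several tasks, which is the basis of the aggregated visualizations mentioned above.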

The GUI elements (e.g., buttons, text boxes) required to build Control action widgets can be determined based on the metadata (e.g., action name and input parameters) contained in the Connectors which we proposed in Chapter 3. For example, Control action widgets related to Hosting Machine entities may include a drop-down menu to select the number of CPU cores, or a slider to scale memory up or down. Furthermore, the Manual and Automated rules offered by the Rule Processor can dynamically customize the user interfaces of Widgets (e.g., recommend control actions based on the monitored events, or update the Activity Wall to denote the execution of control actions). For instance, they may be beneficial for pre-processing the graphical user interfaces of Widgets to reflect certain manual actions (e.g., an approval button to restart a stopped Container). Similarly, another rule could be defined to listen to events that denote the completion of the manual actions, and trigger certain post-processing actions (e.g., a pop-up message to notify whether the Container was restarted). It is apparent that not only does the Activity/Control Wall help automate certain management tasks, it also automates the notification process, thus making it simpler for users to decide what needs to be done.

Command Line Interface:

To facilitate users in executing manual and ad-hoc tasks against monitored events, we propose a Command-Line Interface (CLI) widget as future work. The CLI widget allows DevOps to remotely trigger control actions on cloud resources. It consists of operations to (re-)configure the attributes of Entities and Links in the CloudMap notation (e.g., increase the CPU allocation of a Container, create new Containers). The Connectors include the knowledge to interpret these operations and transform them into low-level API (e.g., Docker Remote API) calls.
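A Connector's translation of such CLI operations into low-level API calls might look like the following sketch. The high-level operation names are hypothetical, and while the endpoint paths follow the style of the Docker Remote API, the mapping itself is an assumption for illustration rather than the dissertation's actual Connector code.

```python
# Illustrative sketch of a Connector translating high-level CloudMap
# CLI operations into Docker-Remote-API-style requests. Operation
# names are hypothetical; endpoints follow the Docker Remote API style.

def translate(operation, args):
    if operation == "set-cpu":
        # Reconfigure a Container's CPU allocation.
        return ("POST", f"/containers/{args['container']}/update",
                {"NanoCpus": int(args["cpus"] * 1e9)})
    if operation == "create-container":
        return ("POST", "/containers/create", {"Image": args["image"]})
    if operation == "restart":
        return ("POST", f"/containers/{args['container']}/restart", {})
    raise ValueError(f"unknown operation: {operation}")

method, path, body = translate("set-cpu", {"container": "app-1", "cpus": 2})
```

The returned (method, path, body) triple would then be issued as an HTTP request against the target cloud resource management tool.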

6.5 Evaluation

6.5.1 Experimental Setup

We evaluated our tool by conducting a user-study in which the following hypotheses were examined: H1, CloudMap increases the efficiency of accurately understanding and navigating attributes and relationships of deployed cloud resources; H2, CloudMap increases the efficiency of accurately performing monitoring and control actions; and H3, the key features offered are useful and comprehensible. We measured efficiency as the time taken to complete the tasks, while accuracy was determined using a set of questions (see below). The total time to complete a task was measured until all of its questions were answered accurately.

Prior to the experiment, participants attended at least 1 out of 3 individual training sessions. During each session, CloudMap's usage was explained to participants via a presentation, a user-guide and a hands-on session. In addition, a brief description of various use-cases was presented. Participants were provided with a pre-built description of a Web application so that they could quickly test the features without modeling and deploying cloud resources from scratch. Several questions were raised and clarified accordingly.

Participants were then provided with an evaluation task, with only the user-guide available for support. This task consisted of a practical component, where we instructed the participants to utilize our tool to visualize 2 cloud resource configurations. The first was based on the motivating example presented earlier in Section 6.2; the second was a Node.js Web application that included three runtime instances, a Redis database and an Nginx server for load-balancing. Participants were required to understand their interactions and properties, and to monitor and perform control actions. The task also included a written component to record their feedback.

6.5.2 Questionnaire

The participants' survey was divided into three main parts: (a) Participants' Background; (b) Product Functionality; and (c) Insights and Improvements. We include the complete list of questions in Appendix D.

The Background questions sought to discover the participants' familiarity with existing cloud resource orchestration techniques (i.e., Docker).

The Functionality questions provided the necessary instructions for completing the tasks, as well as to determine the accuracy of the completed task. Questions were categorized, each targeting a particular feature of the tool.

Finally, the Insights and Improvements questions sought to determine the usefulness of the visual tool in everyday cloud resource management tasks, as well as to obtain feedback for improvements or free comments.

6.5.3 Participant Selection & Grouping

Participants were sourced from Sydney-based companies with diverse levels of technical expertise. For the purpose of analysis, we classified the total of 12 participants into 3 groups: (I) Experts (7 participants), who rated an average of 4-5 on the background questions, indicating a sophisticated understanding of one or more cloud orchestration tools. Participants in this category were system administrators with 2-8 years of experience. We observed that these participants also provided the most productive feedback for improvements. (II) Generalists (4 participants), who rated an average of 2-3 on the background questions. This indicated they had enough working experience with a particular cloud resource orchestration tool for day-to-day requirements. Participants in this category were general software developers with 1-5 years of experience. (III) Novices (1 participant), who rated 1 or below on the background questions. This indicated they had not worked with cloud orchestration technologies; we nonetheless included this participant for the sake of qualitative feedback from a fresh perspective.

6.5.4 Experiment Results & Analysis

Evaluation of H1 and H2.

The stipulated hypotheses set out to determine whether CloudMap is indeed faster for understanding and navigating attributes and relationships of cloud resources, in the case of H1, and for performing monitoring and control actions, in the case of H2. H1 and H2 were evaluated based on the time taken to perform the tasks and respond to Q1-3 and Q4, respectively.

Alternatively, we sought to prove (or disprove) the null hypotheses H1₀ and H2₀, namely that the completion times are similar. Both hypotheses were examined by conducting a t-test with a probability threshold of 5%, assuming unequal variance.
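As a sanity check of this analysis (not part of the original tooling), the unequal-variance (Welch) t statistic can be reproduced from the summary statistics reported in Figure 6.24 with a few lines of standard-library Python, shown here for H2:

```python
import math

# Welch's (unequal-variance) two-sample t statistic, computed from
# summary statistics rather than raw observations.
def welch_t(mean1, var1, n1, mean2, var2, n2):
    se = math.sqrt(var1 / n1 + var2 / n2)
    return (mean2 - mean1) / se

# H2 summary statistics from Figure 6.24 (CloudMap vs. Shipyard).
t_h2 = welch_t(7.83, 3.606, 12, 10.89, 3.611, 9)
# t_h2 is approximately 3.65, in line with the 3.6479 reported
# (the small difference comes from rounding of the published means).
```

Plugging the H1 summary statistics into the same formula gives a comparably large t value, consistent with rejecting both null hypotheses at the 5% level.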

From the results (see the tables in Figure 6.24), it is evident for both hypotheses that participants with a stronger technical background found it easier to use our tool and solved the tasks more quickly. It was pleasantly surprising that even for generalists, CloudMap demonstrated a significant increase in efficiency (and reduction in time).

The comparative experiment focused on one third-party tool only, Shipyard (https://shipyard-project.com/). Due to the high number of existing cloud management tools, as well as project-based constraints, a more exhaustive comparative experiment was outside the scope. However, given the stark differences in times (means of 15.58 mins against 25.33 mins for H1; and means of 7.83 mins against 10.89 mins for H2), we postulate that it is unlikely we would observe fundamental differences when comparing with any other tools similar to Shipyard. Accordingly, given our observations, the likelihood of both H1₀ and H2₀ (equal mean completion time) was below the 5% threshold. Therefore, we could safely reject these null hypotheses, and accept H1 and H2.

Evaluation of H3.

H3 states that the main features offered by CloudMap are useful and comprehensible. We tested this hypothesis by asking respective questions in the survey (i.e., the Insights section of the survey). In particular, after performing Q4, we asked the participants to rate the usability (1-5) of each feature, provided the participant understood what it did. Basic features such as the Application, Image and Hosting Maps were presented, as well as advanced features such as working with monitoring and control probes and widgets. Furthermore, we examined the usability of monitoring and controlling (M&C) single Applications, and of M&C of Hosting Machines (an entire cloud infrastructure).

We observed that regardless of experience, participants found the Application and Hosting Machine Maps very intuitive and useful. However, several participants rated the Image Map slightly less intuitive and useful. We observed that participants who were experienced with Docker gave higher ratings, and vice versa. We hence believe participants who were not familiar with Docker found the Image Map less intuitive and useful. Nonetheless, overall the mean score for all features in Figure 6.25 is above the neutral value of 3.

6.5.5 Discussion

According to the written feedback, participants found that the Mind-Map visualization was a new but familiar concept. It was thus impressive that CloudMap had a considerably fast learning curve amongst participants. Participants also championed the explicit visualization of cloud resource relationships, as it is very useful for navigating through complex cloud resources. The same held for the probes and widgets that enabled seamless monitoring, analysis and control.

Figure 6.24: Time results (grouped by expertise) to complete the tasks, and t-test results for H1 and H2:

                              H1                       H2
                      CloudMap  Shipyard       CloudMap  Shipyard
Mean                    15.58     25.33          7.83      10.89
Variance                37.17     11.5           3.606      3.611
Observations            12        9              12         9
df                           19                       19
p(T = t)                     0.0004                   0.0017
t Critical two-tail          4.3063                   3.6479
Reject null hypothesis       Yes                      Yes

Figure 6.25: Rate of usability of the main features of CloudMap (Hosting Machine Map, Image Map, Application Map, M&C Applications, M&C Hosting Machines); the mean usability scores ranged from 3.27 to 4.45.

However, participants also suggested potential extensions, such as: (a) a cost visualization widget for cloud resources; (b) sorting and filtering Hosting Machines and Applications based on geographical region and role; (c) a cost comparison widget for a cloud resource on different providers; (d) Control Actions to set up scheduled orchestration tasks; (e) Control Actions to set up push notifications for high CPU and memory usage; and (f) a widget that generates recommendations to recover from error conditions.

6.6 Related Work

In the following we explore cloud resource configuration and management tools, with a focus on their language and representation aspects. We use this to identify visualization concerns over cloud resources, and compare and contrast with our proposed approach. Furthermore, we examine visual notations in other domains, and extrapolate our findings.

Cloud resource representation notations of existing orchestration tools can be textual, visual or hybrid (i.e., a mixture of both textual and visual notations). Textual notations can be key-value pairs (e.g., AWS CLI), YAML (e.g., Docker Compose), XML (e.g., Plush), JSON (e.g., AWS CloudFormation), or proprietary formats such as Chef recipes and Dockerfiles [20, 69, 3, 4, 44]. The visual paradigm often simplifies understanding compared to textual notations. While such visual techniques can be applied over most of the cloud resource lifecycle, we scope this chapter to the navigation, understanding, discovery, monitoring and control concerns of cloud resources.

Discovery, Navigation, Understanding and Selection. Tools and research initiatives such as the AWS Management Console, OpenTOSCA, Puppet Enterprise Console, RightScale and CA AppLogic provide visual features to facilitate discovery, navigation, understanding and selection of cloud resources [182, 29, 120, 12, 2]. However, these tools provide a flat view (e.g., catalogs) of cloud resources with sorting and filtering features. DevOps may select a particular resource to analyze its attributes, albeit these tools do not explicitly visualize relationships, dependencies and memberships between cloud resources. This implies DevOps would need to manually mine relationship details via textual descriptions. On the other hand, Hyperglance [190] visualizes cloud resource attributes and relationships as a graph. In contrast to the above, we contribute an extensible framework (i.e., in future, additional Entities, Links, Badges and/or Widgets can be curated) for visualizing cloud resources via the familiar notion of mind-maps.

Deployment, Monitoring and Controlling. Tools such as Juju GUI, CloudBase, CA AppLogic, OpenTOSCA and VisualOps provide visual abstractions to describe deployment workflows and resource topologies [208, 134, 29, 12, 203]. Cloud resource monitoring tools such as Nagios and CloudFielder allow DevOps to define Service Level Agreements (SLAs), detect anomalies and notify about violations of the SLAs of cloud resources [22, 137, 222]. The AWS Management Console, VisualOps, CA AppLogic, RightScale, Puppet Enterprise Console, Plush and some other cloud resource management tools provide control features such as restarting, scaling and migration [182, 203, 12, 2, 120, 3]. Ordinarily, DevOps would have to switch between multiple tools for different aspects of the cloud resource management lifecycle, which is time-consuming and cumbersome. In contrast, our tool greatly complements this work, as we can integrate these features as pluggable widgets to seamlessly and centrally manage cloud resources.

Visual Notations in Other Domains. Visual notations are adopted in other closely related domains, such as object-oriented programming, Web site modeling, business process management, distributed systems and enterprise service modeling. Existing visual notations for service orchestration, such as the Business Process Modeling Notation (BPMN), focus primarily on the application layer. However, orchestrating cloud resources requires rich abstractions to describe and manage application resource requirements and constraints; support troubleshooting; and enable flexible and efficient scheduling of resources. Architexa and Atlas visualize Java-based source code and execution aspects in terms of hierarchical trees and UML sequence diagrams [186, 150, 55]. WebML introduces a visual notation to model Web sites [42]. OverView introduces a generic visualization of large-scale distributed software [123]. MaramaEML includes a visual language to model and deploy enterprise services [129]. All these visual notations or languages adopt Entity-Relationship (ER) models (e.g., graphs, trees, UML class diagrams), which served as our motivation. Eden [218] is a visual notation for network management that proposed the concept of Badges to associate security and access policies with network devices. Similarly, we were inspired to propose the concept of Badges, which can be attached to cloud resources to enable Probe and Control Action functions.

6.7 Conclusion and Future Work

Visual techniques provide a refreshing approach in contrast with existing, largely text-based solutions. With the vast proliferation of cloud computing and the large amount of complex configurations DevOps are faced with, this work provides a timely contribution. Our design was based on a detailed survey comprising 21 experts, where we aggregated, analyzed and applied our findings to propose a visual notation for cloud resource management. We further proposed the notion of Badges via drag-n-drop to enable monitoring and control features. To support the effectiveness of our approach, we also identified 3 common visualization patterns. We evaluated our work with a user-study of 12 participants, and our approach yielded significantly promising results with 33.29% improved efficiency. We are therefore confident our work provides an innovative approach to a new way of cloud management. As future work, we plan to integrate visual notations to specify cloud resource deployment and reconfiguration workflows, also based on our work in Chapter 4 [208]. Moreover, we endeavor to provide high-level monitoring features such as cost estimation and comparison of cloud-based solutions across multiple providers (e.g., AWS EC2 and Google Cloud).

Chapter 7

Conclusions and Future Work

In this chapter, we summarize the contributions of this dissertation and discuss some future research directions to build on this work.

7.1 Concluding Remarks

While the cloud computing paradigm has become the catalyst for accelerating innovations, creating new business models and fundamentally changing economies, the flexibility to deploy applications, workloads and data by leveraging the right blend of technologies has prompted the need for federated cloud resource configuration and orchestration in the business world. Accordingly, the problem of designing effective and agile cloud resource configuration and orchestration techniques that cope with large-scale heterogeneous cloud environments remains a top priority in industry.

In this dissertation, we focused on configuration and orchestration techniques for federated cloud resources. Below we summarize the most significant contributions of this dissertation:

• A taxonomy framework and survey to efficiently explore, analyse, understand, compare and contrast, and thereby wisely evaluate and select cloud resource orchestration techniques (Chapter 2). As there exists a wide range of orchestration techniques covering different aspects (i.e., selection, configuration, deployment, monitoring and controlling) of the cloud resource life-cycle, we realized that a unified and comprehensive analysis framework is required to accelerate a fundamental understanding of cloud resource orchestration in terms of concepts, paradigms, languages, models and tools. We further provide an analysis over a set of eleven methodically chosen cloud resource orchestration techniques.

We identified several technical gaps which need to be addressed. There is still little support for federated cloud resource configuration and orchestration in enterprise-ready techniques. After analysing knowledge reuse mechanisms for cloud resource orchestration, we realized that the majority of tools adopt proprietary languages to represent reuse artifacts, even though there are several emerging open and industry standards for representing them. Furthermore, we found that employing autonomic process models to orchestrate cloud resources has been minimally explored. Autonomic process models would allow cloud resources to be dynamically and automatically modified in response to changes in the resources and their environment. Subsequently, we derive some future directions based on the technical gaps identified during the analysis.

• Domain-specific Models, a model-driven approach for high-level representations of low-level and technique-specific cloud resource descriptions (Chapter 3). Given that we architect Domain-specific Models over existing cloud resource orchestration techniques, this significantly enhances the potential for knowledge reuse, since we can better harness interoperability capabilities. In order to accommodate the large number and variety of cloud resource orchestration techniques (e.g., procedural, activity-based and declarative), as well as the variety of different target environments (i.e., public, private and federated), we propose a pluggable architecture which enables DevOps to implement and register Connectors capable of deploying and reconfiguring cloud resource configurations described using Domain-specific Models. Ultimately, our contributions help build up a knowledge-sharing community around cloud resource orchestration techniques.

• A unified and graphical process-based notation to specify deployment and reconfiguration workflows of federated cloud resources (Chapter 4). The framework consists of a graphical, process-based notation to describe and automate the deployment and reconfiguration tasks of complex and federated cloud resources. We provide mechanisms which automatically translate higher-level deployment and reconfiguration workflows into executable BPMN (Business Process Model and Notation) processes. Our framework handles the heterogeneity in resource description models, notations and capabilities of different provider-specific configuration languages and services.

• A knowledge base for users to leverage existing cloud resource orchestration knowledge in a unified manner (Chapter 5). To facilitate the discovery and selection of orchestration knowledge, we propose a rule-based recommender system that recommends high-level cloud resource representations based on user contexts (e.g., intended task and deployment scenario). We further propose an incremental orchestration knowledge acquisition technique that gradually builds a knowledge base with very little human intervention. This means that whenever a user creates a new cloud resource configuration or modifies an existing one, the new/updated context triggers a new knowledge-rule to be incrementally added.

• Visual notations and semantics for DevOps to represent, monitor and control cloud resource configurations (Chapter 6). We define Entities and Links to represent cloud resources and the relationships between them. We propose Badges and Widgets which enable DevOps to visually and seamlessly monitor and control cloud resources. We introduce three mind-map based reusable visualization patterns (i.e., Application Map, Hosting Machine Map and Image Map) for managing complex cloud resources. We validate our problem statement and proposed visual notation based on surveys and user studies, which have successfully demonstrated both the comprehensibility and the anticipated benefits of our proposed approach.
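As a concrete illustration of the rule-based recommender and incremental knowledge acquisition summarized in the Chapter 5 contribution above, consider the following minimal sketch. The context fields and rule representation are hypothetical, not the actual knowledge-base schema.

```python
# Simplified sketch of the rule-based recommender: rules map user
# contexts to cloud resource representations, and new rules are added
# incrementally whenever a user creates or modifies a configuration.
# Field names are illustrative, not the actual knowledge-base schema.

class Recommender:
    def __init__(self):
        self.rules = []  # list of (context, representation) pairs

    def learn(self, context, representation):
        # Incremental knowledge acquisition: each new or updated
        # configuration contributes a new knowledge-rule.
        self.rules.append((dict(context), representation))

    def recommend(self, context):
        # A rule fires when all of its context conditions hold.
        return [rep for ctx, rep in self.rules
                if all(context.get(k) == v for k, v in ctx.items())]

kb = Recommender()
kb.learn({"task": "deploy", "scenario": "web-app"}, "nginx+nodejs+redis stack")
kb.learn({"task": "monitor"}, "memory-usage probe")

hits = kb.recommend({"task": "deploy", "scenario": "web-app", "team": "ops"})
```

Extra context attributes supplied by the user (such as the hypothetical "team" field above) simply do not participate in matching, which is what lets coarse rules keep firing as contexts become richer over time.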

7.2 Future Directions

In this dissertation, we have investigated the problem of facilitating the configuration and orchestration of federated cloud resources. We believe that increasing and simplifying support for federated cloud resources is an important research area, which will attract a lot of attention in the research community. In the following, we summarize significant research directions in this area.

• Declarative Cloud Resource Orchestration and Management: A key distinguishing feature of cloud services is elasticity, i.e., the power to dynamically scale resources up/down to adapt to varying requirements. Elasticity is usually achieved through the invocation of actions (e.g., add storage capacity, restart VMs, increase servers, decrease load balancers) that run as a result of events (e.g., usage increases beyond a threshold, invocation failures), allowing a controller to automatically configure or reconfigure the relevant resources. However, according to our analysis, the automated monitoring and control of federated cloud services is still in its early stages [168, 135]. We believe that models and languages for describing cloud resources should be endowed with intuitive and automation-friendly constructs that can be used to specify a range of enforceable and flexible elasticity mechanisms in accordance with high-level policies specified by consumers.

Accordingly, we envision state machines as a novel abstraction to declaratively represent and reason about dynamic, elasticity-aware resource orchestration techniques. Instead of directly manipulating low-level interfaces and scripting orchestration rules over complex cloud services, state machines may reason about resource requirement states. States may also characterize application-specific resource requirements (e.g., CPU and storage usage), constraints in terms of costs, and other SLAs. We anticipate that, based on experience and knowledge sharing, resource consumers will be able to intuitively specify resource requirements and constraints during different phases (e.g., high-, normal- and low-use phases) using state machines. Transitions between states are triggered when certain conditions are satisfied (e.g., a temporal event, or application workload increasing beyond a certain threshold). Transitions thereby automatically trigger controlling actions to perform the desired resource (re-)configurations to satisfy the requirements and constraints of target states.
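The envisioned state-machine abstraction could be sketched as follows. The states, thresholds and controlling actions are invented for illustration; the point is only that transitions fire on monitored conditions and that entering a new state triggers a (re)configuration action.

```python
# Illustrative sketch of an elasticity-aware state machine: states
# capture resource-requirement phases, transitions fire on monitored
# conditions, and entering a new state triggers a controlling action.
# States, thresholds and actions are invented for illustration.

TRANSITIONS = {
    # (current_state, condition) -> next_state
    ("normal-use", "load_high"): "high-use",
    ("high-use", "load_low"): "normal-use",
}

ACTIONS = {
    # entering a state triggers the corresponding controlling action
    "high-use": "add-server",
    "normal-use": "remove-server",
}

def step(state, cpu_load, high=0.8, low=0.3):
    condition = ("load_high" if cpu_load > high
                 else "load_low" if cpu_load < low
                 else None)
    next_state = TRANSITIONS.get((state, condition), state)
    action = ACTIONS[next_state] if next_state != state else None
    return next_state, action

state, action = step("normal-use", cpu_load=0.92)  # workload spike: scale up
```

A richer version would attach cost constraints and SLAs to states, and temporal events (not just load thresholds) to transitions, as discussed above.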

• Visual Notations for Orchestrating Cloud Resources: With the proliferation in demand for cloud computing, DevOps are faced with orchestrating large amounts of complex cloud resource configurations. This involves being able to proficiently understand and analyze cloud resource attributes and relationships, and make orchestration decisions on demand. However, a majority of cloud tools encode resource descriptions and deployment, monitoring and control scripts in tedious textual formats. This presents complex and overwhelming challenges for DevOps, who must manually read and iteratively build a mental representation, especially when a large number of cloud resources is involved. We therefore believe cloud resource orchestration should be empowered with visual techniques to configure, deploy, monitor and control cloud resources. For example, a visual notation may allow DevOps to drag, drop and connect pre-built component cloud resources and deploy composite cloud resources. After deployment, the visual notation may simplify the means to query, monitor and control cloud resources via graphical widgets.

• Autonomic Cloud Resource Orchestration: We believe autonomic orchestration will play a key role in addressing crucial gaps in cloud computing, especially in supporting the fully-automated cloud service-based endeavor [196]. Most existing work only applies orchestration strategies to specific aspects such as configuration [217], deployment [11, 26], and control [179]. We envision an integrated solution that is able to perform a unified range of autonomic orchestration tasks such as self-configuration, self-optimization, self-healing (automatic discovery and repair of errors), and self-protection (automatic security and integrity).

• End-users: We identify end-users as an important and emerging user category for orchestration techniques in the future. Accordingly, end-users should be able to easily and declaratively access, configure, compose, and analyze simple yet powerful composite cloud resources. Currently, even sophisticated DevOps are regularly forced to resort to understanding different low-level resource access methods and procedural language paradigms to create and manage complex cloud resources.

• Unified and Domain-specific Models for Cloud Resource Representation and Management: We believe unified and domain-specific models for representing and managing cloud resources as high-level entities are the key to building interoperable cloud resource orchestration techniques. Accordingly, users are able to specify technique-independent cloud resource configurations and deploy and manage them. However, there are challenges to be addressed when adopting unified resource representations, such as minimal or no support from existing orchestration tools. We believe more platforms will emerge to support unified and domain-specific model-driven cloud resource orchestration.

• Runtime Intelligence for Declarative Orchestration: Although the proliferation of cloud services and orchestration will increase development productivity, there are significant shortcomings in seamlessly integrating orchestration languages and techniques with scalable data processing platforms. For instance, such data platforms are essential for monitoring and enforcing SLAs, which involves capturing and analyzing large amounts of real-time data in big data analytics platforms [184, 63]. We believe the orchestration layer should contain the intelligence responsible for specifying resource orchestration, while the data processing layer should contain the intelligence responsible for data-flow and processing. This could be achieved by leveraging platforms such as Hadoop. DevOps will thus be able to describe resource requirements and constraints using declarative and orchestration-aware abstractions such as state machines (refer to Section 2.9.3). Orchestration runtimes may thus automatically translate these abstractions into efficient and technique-aware execution scripts.

• Cloud Service Event Analytics: The ability of cloud orchestration platforms to gain the requisite intelligence about consumption patterns of deployed resources ensures compliance with cost and SLA constraints, and improves resource orchestration processes in general (e.g., continuously fine-tuning defined policies in dynamic and evolving environments).
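A toy sketch of such intelligence, aggregating a raw consumption metric (API calls per second) into a semantically meaningful load category, might look like the following; the thresholds and category names are invented for illustration.

```python
# Summarize raw per-second API-call counts from one monitoring window
# into a semantic event category usable by orchestration policies.
def summarize(api_calls_per_sec: list) -> str:
    avg = sum(api_calls_per_sec) / len(api_calls_per_sec)
    if avg < 100:
        return "low-load"
    if avg < 1000:
        return "moderate-load"
    return "high-load"

window = [220, 480, 310, 505]       # one window of per-second counts
print(summarize(window))            # aggregated metric -> event category
```

A policy can then be expressed over categories such as "moderate-load" rather than over raw metrics, which is the abstraction step this direction argues for.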

In this context, we therefore believe future work should develop concepts and techniques to model and capture event patterns and abstract them into meaningful concepts (e.g., characterizing the state of an application or a service, the state of a specific application component, or the behavior of users from a specific geolocation) that are suitable for elastic cloud resource orchestration purposes. Accordingly, we believe high-level language constructs to abstract and aggregate temporal and resource-relevant events over federated cloud services at various granularities will provide the key. These can be used to describe event summaries of knowledge about variations in resource requirements, in terms of both aggregated resource consumption metrics (e.g., the number of API calls per second) and semantically meaningful event categories (e.g., moderate application load). Event summaries can be defined at various abstraction levels as a hierarchy to cater for context-based, fine- or coarse-grained analysis of resource requirements and consumption trends. Lower-level event summaries may be concrete (e.g., providing knowledge relevant to a fine-grained analysis of patterns for some specific cloud service such as Amazon DynamoDB). Higher-level event summaries may capture knowledge required for coarse-grained analysis of patterns relevant to a collection of resources (e.g., a cluster or a whole application).

Bibliography

[1] Giuseppe Aceto, Alessio Botta, Walter De Donato, and Antonio Pescapè. Cloud monitoring: A survey. Computer Networks, 57(9):2093–2115, 2013.

[2] Brian Adler. Building scalable applications in the cloud: Reference architecture & best practices, 2011.

[3] Jeannie Albrecht, Christopher Tuttle, Ryan Braud, Darren Dao, Nikolay Topilski, Alex C Snoeren, and Amin Vahdat. Distributed application configuration, management, and visualization with plush. ACM Transactions on Internet Technology (TOIT), 11(2):6, 2011.

[4] AWS Amazon. Aws cloud formation. Online article, 2011.

[5] AWS Amazon. Aws opsworks template snippets. Online article, 2015.

[6] Inc Amazon Web Services. Amazon ec2 instances. Online article, 2015.

[7] Inc. Amazon Web Services. Amazon relational database service - api documentation. Online article, 2015.

[8] Inc. Amazon Web Services. Aws sdk for java. Online article, 2015.

[9] Inc. Amazon Web Services. Rest api for aws s3. Online article, 2015.

[10] Inc. Ansible. Ansible : Cloud modules. Online article, 2015.

[11] Alexandru-Florian Antonescu, Alvaro Gomes, Peter Robinson, and Torsten Braun. Sla-driven predictive orchestration for distributed cloud-based mobile services. In Communications Workshops (ICC), 2013 IEEE International Conference on, pages 738–743. IEEE, 2013.


[12] CA AppLogic. Ca applogic cloud platform. Online article, 2015.

[13] Claudio A. Ardagna, Rasool Asal, Ernesto Damiani, and Quang Hieu Vu. From security to assurance in the cloud: A survey. ACM Comput. Surv., 48(1):2:1–2:50, July 2015.

[14] D. Ardagna et al. Modaclouds: A model-driven approach for the design and execution of applications on multiple clouds. In MISE, 2012 ICSE Workshop on, pages 50–56, June 2012.

[15] D. Ardagna, E. Di Nitto, P. Mohagheghi, S. Mosser, C. Ballagny, F. D’Andria, G. Casale, P. Matthews, C. S. Nechifor, D. Petcu, A. Gericke, and C. Sheridan. Modaclouds: A model-driven approach for the design and execution of applications on multiple clouds. In 2012 4th International Workshop on Modeling in Software Engineering (MISE), pages 50–56, June 2012.

[16] Michael Armbrust et al. A view of cloud computing. Commun. ACM, 53(4):50–58, April 2010.

[17] Lowell Jay Arthur. Software Evolution: The Software Maintenance Challenge. Wiley-Interscience, New York, NY, USA, 1988.

[18] Amazon Auto Scaling. Auto scaling for aws cloud resources. Online article, 2015.

[19] AWS. Available commands for ec2 in aws cli. Online article, 2013.

[20] AWS. Aws cli. Online article, 2013.

[21] Arshdeep Bahga and Vijay K Madisetti. Rapid prototyping of multitier cloud- based services and systems. Computer, 46(11):76–83, 2013.

[22] Wolfgang Barth. Nagios: System and network monitoring. No Starch Press, 2008.

[23] Moshe Chai Barukh and Boualem Benatallah. Servicebase: A programming knowledge-base for service oriented development. In DASFAA, pages 123–138. Springer, 2013.

[24] Moshe Chai Barukh and Boualem Benatallah. A toolkit for simplified web-services programming. In Web Information Systems Engineering–WISE 2013, pages 515–518. Springer, 2013.

[25] Erick Bauman, Gbadebo Ayoade, and Zhiqiang Lin. A survey on hypervisor-based monitoring: Approaches, applications, and evolutions. ACM Comput. Surv., 48(1):10:1–10:33, August 2015.

[26] Anton Beloglazov, Jemal Abawajy, and Rajkumar Buyya. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future generation computer systems, 28(5):755–768, 2012.

[27] Jan A. Bergstra and Mark Burgess. A static theory of promises. CoRR, abs/0810.3294, 2008.

[28] Jan A Bergstra and Mark Burgess. Promises, impositions, and other directionals. arXiv preprint arXiv:1401.3381, 2014.

[29] Tobias Binz, Uwe Breitenbücher, Florian Haupt, Oliver Kopp, Frank Leymann, Alexander Nowak, and Sebastian Wagner. Opentosca–a runtime for tosca-based cloud applications. In Service-Oriented Computing, pages 692–695. Springer, 2013.

[30] Bitnami. Bitnami makes it easy to run your favorite server apps anywhere. Online article, 2015.

[31] Thomas J. Bittman. The road map from virtualization to cloud computing. https://www.gartner.com/doc/1572031, March 2011. Accessed: 24/11/2013.

[32] Paul Borril, Mark Burgess, Todd Craw, and Mike Dvorkin. A promise theory perspective on data networks. arXiv preprint arXiv:1405.2627, 2014.

[33] Tim Bray, Jean Paoli, C Michael Sperberg-McQueen, Eve Maler, and François Yergeau. Extensible markup language (xml). World Wide Web Consortium Recommendation REC-xml-19980210. http://www.w3.org/TR/1998/REC-xml-19980210, page 16, 1998.

[34] Mark Burgess. Promise you a rose garden, 2007.

[35] Mark Burgess. Knowledge management and promises. In Scalability of Net- works and Services, pages 95–107. Springer, 2009.

[36] Mark Burgess. Testable system administration. Communications of the ACM, 54(3):44–49, 2011.

[37] Mark Burgess and Oslo College. Cfengine: a site configuration engine. In USENIX Computing Systems, 1995.

[38] Mark Burgess and Alva L Couch. Modeling next generation configuration management tools. In LISA, pages 131–147, 2006.

[39] Damon Cali. Introducing rumm: a command line tool for the . Online article, 2013.

[40] Canonical. Juju charm store. Online article, 2015.

[41] CenturyLink. Panamax: Docker management for humans. Online article, 2015.

[42] Stefano Ceri, Piero Fraternali, and Aldo Bongio. Web modeling language (webml): a modeling language for designing web sites. Computer Networks, 33(1):137–157, 2000.

[43] Clovis Chapman, Wolfgang Emmerich, Fermín Galán Márquez, Stuart Clayman, and Alex Galis. Software architecture definition for on-demand cloud provisioning. Cluster Computing, 15(2):79–100, 2012.

[44] Chef. About recipes. Online article, 2015.

[45] Peter Pin-Shan Chen. The entity-relationship model—toward a unified view of data. ACM Trans. Database Syst., 1(1):9–36, March 1976.

[46] Shang-Wen Cheng and David Garlan. Stitch: A language for architecture-based self-adaptation. J. Syst. Softw., 85(12):2860–2875, December 2012.

[47] Trieu C Chieu et al. Solution-based deployment of complex application services on a cloud. In SOLI, 2010 IEEE International Conference on, pages 282–287. IEEE, 2010.

[48] Mark Chignell, James Cordy, Joanna Ng, and Yelena Yesha. The smart internet: current research and future applications, volume 6400. Springer Science & Business Media, 2010.

[49] Inc. CloudBees. Cloudbees: The enterprise jenkins company. Online article, 2016.

[50] Apache CloudStack. Apache cloudstack: Open source cloud computing. On- line article, 2016.

[51] AWS CloudTrail. Security at scale: Logging in aws, 2014.

[52] Amazon CloudWatch. Monitoring for aws cloud resources. Online article, 2013.

[53] Ruby Community. Ruby language runtime. Online article, 2015.

[54] Paul Compton et al. Ripple down rules: Turning knowledge acquisition into knowledge maintenance. Artificial Intelligence in Medicine, 4(6):463–475, 1992.

[55] EnSoft Corp. Atlas. Online article, 2015.

[56] Alva L Couch, John Hart, Elizabeth G Idhaw, and Dominic Kallas. Seeking closure in an open world: A behavioral agent approach to configuration management. In LISA, volume 3, pages 125–148, 2003.

[57] S Crosby, R Doyle, M Gering, M Gionfriddo, S Grarup, S Hand, M Hapner, D Hiltgen, et al. Open virtualization format specification. DMTF Specification DSP0243, Distributed Management Task Force, 2009.

[58] CS50. Cs50 appliance 19. Online article, 2015.

[59] Yong Cui, Vojislav B Misic, Rajkumar Buyya, and Dejan Milojicic. Guest editors’ introduction: Special issue on cloud computing. IEEE Transactions on Parallel and Distributed Systems, 24(6):1062–1065, 2013.

[60] Michael Cusumano. Cloud computing and saas as new computing platforms. Communications of the ACM, 53(4):27–29, 2010.

[61] Clemens Danninger. Using constraint solvers to find valid software configurations. 2015.

[62] Buddhika De Alwis, Supun Malinga, Kathiravelu Pradeeban, Denis Weerasiri, Srinath Perera, and Vishaka Nanayakkara. Mooshabaya: mashup generator for xbaya. In Proceedings of the 8th International Workshop on Middleware for Grids, Clouds and e-Science, page 8. ACM, 2010.

[63] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.

[64] Thomas Delaet, Wouter Joosen, and Bart Vanbrabant. A survey of system configuration tools. In Proceedings of the 24th International Conference on LISA, pages 1–8. USENIX Association, 2010.

[65] Hewlett Packard Enterprise Development. Hpe helion eucalyptus: Open source hybrid cloud software for aws users. Online article, 2016.

[66] Remco Dijkman and Marlon Dumas. Service-oriented design: A multi-viewpoint approach. International journal of cooperative information systems, 13(04):337–368, 2004.

[67] Nectar Directorate. Nectar: Australia's fastest growing researcher network. Online article, 2016.

[68] Docker. Docker hub registry. Online article, 2015.

[69] Docker. Overview of docker compose. Online article, 2015.

[70] dotCloud. dotcloud documentation. Online article, 2015.

[71] Marlon Dumas, Marcello La Rosa, Jan Mendling, and Hajo A Reijers. Fundamentals of business process management. Springer, 2013.

[72] Erik Elmroth and Lars Larsson. Interfaces for placement, migration, and monitoring of virtual machines in federated clouds. In GCC’09, pages 253–260. IEEE, 2009.

[73] Engine Yard Inc. Engine yard. Online article, 2016.

[74] Wei Fang, ZhiHui Lu, Jie Wu, and ZhenYin Cao. Rpps: a novel resource prediction and provisioning scheme in cloud. In Services Computing (SCC), 2012 IEEE Ninth International Conference on, pages 609–616. IEEE, 2012.

[75] Finally.io. finally.io. Online article, 2014.

[76] Linux Foundation. Open container project. Online article, 2015.

[77] The Apache Software Foundation. An api that abstracts the differences between clouds. Online article, 2014.

[78] The Apache Software Foundation. Compute guide. Online article, 2014.

[79] The Apache Software Foundation. The java multi-cloud toolkit. Online article, 2014.

[80] The Apache Software Foundation. One interface to rule them all. Online article, 2015.

[81] Joerg Fritsch. Security properties of containers managed by docker. https://www.gartner.com/doc/2956826/security-properties-containers-managed-docker, January 2015. Accessed: 05/06/2015.

[82] Inc. Gartner. Gartner survey reveals that saas deployments are now mission critical. http://www.gartner.com/newsroom/id/2923217, November 2014. Accessed: 14/07/2015.

[83] Gartner says cloud computing will become the bulk of new it spend by 2016. http://www.gartner.com/newsroom/id/2613015. Accessed: 07/12/2014.

[84] Wolfgang Gerlach, Wei Tang, Kevin Keegan, Travis Harrison, Andreas Wilke, Jared Bischof, Mark D’Souza, Scott Devoid, Daniel Murphy-Olson, Narayan Desai, et al. Skyport: container-based execution environment management for multi-cloud scientific workflows. In Proceedings of the 5th International Workshop on Data-Intensive Computing in the Clouds, pages 25–32. IEEE Press, 2014.

[85] Katja Gilly, Carlos Juiz, and Ramon Puigjaner. An up-to-date survey in web load balancing. World Wide Web, 14(2):105–131, March 2011.

[86] Git-distributed-is-the-new-centralized. http://git-scm.com/. Accessed: 28/10/2014.

[87] Patrick Goldsack et al. The smartfrog configuration management framework. ACM SIGOPS Operating Systems Review, 43(1):16–25, 2009.

[88] Patrick Goldsack, Julio Guijarro, Steve Loughran, Alistair N. Coles, Andrew Farrell, Antonio Lain, Paul Murray, and Peter Toft. The smartfrog configuration management framework. Operating Systems Review, 43(1):16–25, 2009.

[89] Google. Container registry: Fast, private docker image storage on google cloud platform. Online article, 2015.

[90] Google. Google app engine: Platform as a service. Online article, 2015.

[91] Google cloud deployment manager. https://cloud.google.com/deployment-manager/. Accessed: 16/01/2016.

[92] Christophe Gravier, Julien Subercaze, Amro Najjar, Frédérique Laforest, Xavier Serpaggi, and Olivier Boissier. Context awareness as a service for cloud resource optimization. 2013.

[93] Christophe Gravier, Julien Subercaze, Amro Najjar, Frederique Laforest, Xavier Serpaggi, and Olivier Boissier. Context awareness as a service for cloud resource optimization. Internet Computing, IEEE, 19(1):28–34, 2015.

[94] Diwaker Gupta, Ludmila Cherkasova, Rob Gardner, and Amin Vahdat. Enforcing performance isolation across virtual machines in xen. In Proceedings of the ACM/IFIP/USENIX 2006 International Conference on Middleware, Middleware ’06, pages 342–362, New York, NY, USA, 2006. Springer-Verlag New York, Inc.

[95] Mohammad Hajjat, Xin Sun, Yu-Wei Eric Sung, David Maltz, Sanjay Rao, Kunwadee Sripanidkulchai, and Mohit Tawarmalani. Cloudward bound: planning for beneficial migration of enterprise applications to the cloud. ACM SIGCOMM Computer Communication Review, 41(4):243–254, 2011.

[96] Ahmad Fadzil M. Hani, Irving Vitra Paputungan, and Mohd Fadzil Hassan. Renegotiation in service level agreement management for a cloud-based system. ACM Comput. Surv., 47(3):51:1–51:21, April 2015.

[97] Mitchell Hashimoto. Vagrant: Up and Running. "O’Reilly Media, Inc.", 2013.

[98] Brian Holland, Lynda Holland, and Jenny Davies. An investigation into the concept of mind mapping and the use of mind mapping software to support and improve student academic performance. 2004.

[99] Ben Hosmer. Getting started with salt stack–the other configuration management system built with python. Linux journal, 2012(223):3, 2012.

[100] Wei Huang, Afshar Ganjali, Beom Heyn Kim, Sukwon Oh, and David Lie. The state of public infrastructure-as-a-service cloud security. ACM Comput. Surv., 47(4):68:1–68:31, June 2015.

[101] Cloudlabs Inc. Public snaps. Online article, 2015.

[102] TIBCO Software Inc. Event processing with state machines. Technical report, May 2014.

[103] Waheed Iqbal, Matthew N Dailey, David Carrera, and Paul Janecek. Adaptive resource provisioning for read intensive multi-tier applications in the cloud. Future Generation Computer Systems, 27(6):871–879, 2011.

[104] Sadeka Islam, Jacky Keung, Kevin Lee, and Anna Liu. Empirical prediction models for adaptive resource provisioning in the cloud. Future Generation Computer Systems, 28(1):155–162, 2012.

[105] Brendan Jennings and Rolf Stadler. Resource management in clouds: Survey and research challenges. Journal of Network and Systems Management, pages 1–53, 2014.

[106] Yexi Jiang, Chang-shing Perng, Tao Li, and Rong Chang. Asap: A self-adaptive prediction system for instant cloud resource demand provisioning. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 1104–1109. IEEE, 2011.

[107] Json schema. http://json-schema.org/latest/json-schema-core.html. Accessed: 6/12/2014.

[108] Ubuntu Juju. Charm store policy. Online article, 2015.

[109] Ubuntu Juju. What is a relation? Online article, 2015.

[110] Matjaz B Juric and Denis Weerasiri. WS-BPEL 2.0 beginner’s guide. Packt Publishing Ltd, 2014.

[111] Luke Kanies. Puppet: Next-generation configuration management. The USENIX Magazine, 31(1):19–25, 2006.

[112] Alexander Keller and Heiko Ludwig. The wsla framework: Specifying and monitoring service level agreements for web services. Journal of Network and Systems Management, 11(1):57–81, 2003.

[113] Alireza Khoshkbarforoushha, Meisong Wang, Lizhe Wang, Leila Alem, Samee U Khan, and Boualem Benatallah. Capability analysis of cloud resource orchestration frameworks. Computer (to appear), 2016.

[114] Hyunjoo Kim and Manish Parashar. Cometcloud: An autonomic cloud engine. Cloud Computing: Principles and Paradigms, pages 275–297, 2011.

[115] Johannes Kirschnick, Jose M. Alcaraz Calero, Patrick Goldsack, Andrew Farrell, Julio Guijarro, Steve Loughran, Nigel Edwards, and Lawrence Wilcock. Towards an architecture for deploying elastic services in the cloud. Softw. Pract. Exper., 42(4):395–408, April 2012.

[116] Alexander V. Konstantinou et al. An architecture for virtual solution composition and deployment in infrastructure clouds. In Proceedings of the 3rd International Workshop on VTDC, pages 9–18. ACM, 2009.

[117] Oliver Kopp et al. Bpmn4tosca: A domain-specific language to model management plans for composite applications. In Business Process Model and Notation, pages 38–52. Springer, 2012.

[118] Puppet Labs. Overview of orchestration topics. Online article, 2015.

[119] Puppet Labs. Publishing modules on the puppet forge. Online article, 2015.

[120] Puppet Labs. Puppet enterprise. Online article, 2015.

[121] Puppet Labs. Type reference. Online article, 2015.

[122] Menno Lageman and Sun Client Solutions. Solaris containers-what they are and how to use them. Sun BluePrints OnLine, pages 819–2679, 2005.

[123] Jason LaPorte, Travis Desell, and Carlos Varela. Overview: Generic visualization of large-scale distributed systems. 2006.

[124] C. Larman and V.R. Basili. Iterative and incremental developments. a brief history. Computer, 36(6):47–56, June 2003.

[125] George Lawton. Lamp lights enterprise development efforts. Computer, 38(9):0018–20, 2005.

[126] Angel Lagares Lemos, Florian Daniel, and Boualem Benatallah. Web service composition: A survey of techniques and tools. ACM Comput. Surv., 48(3):33:1–33:41, December 2015.

[127] Grace Lewis et al. Role of standards in cloud-computing interoperability. In System Sciences (HICSS), 2013 46th Hawaii International Conference on, pages 1652–1661. IEEE, 2013.

[128] Frank Leymann et al. Cloud computing patterns. 2014.

[129] Lei Li, John Grundy, and John Hosking. A visual language and environment for enterprise system modelling and automation. Journal of Visual Languages & Computing, 25(4):253–277, 2014.

[130] LinuxContainers.org. What’s lxc? Online article, 2015.

[131] Changbin Liu, Boon Thau Loo, and Yun Mao. Declarative automated cloud resource orchestration. In Proceedings of the SOCC’11, pages 1–8. ACM, 2011.

[132] Changbin Liu, Yun Mao, Jacobus Van der Merwe, and Mary Fernandez. Cloud resource orchestration: A data-centric approach. In Proceedings of the biennial Conference on Innovative Data Systems Research (CIDR), pages 1–8, 2011.

[133] Scott Lowe. Mastering VMware vSphere 5. John Wiley & Sons, 2011.

[134] Canonical Ltd. Juju admin. Online article, 2015.

[135] Hongbin Lu, M. Shtern, B. Simmons, M. Smit, and M. Litoiu. Pattern-based deployment service for next generation clouds. In Services (SERVICES), 2013 IEEE Ninth World Congress on, pages 464–471, June 2013.

[136] Heiko Ludwig, Alexander Keller, Asit Dan, Richard King, and Richard Franck. A service level agreement language for dynamic electronic services. Electronic Commerce Research, 3(1-2):43–59, 2003.

[137] MadeiraCloud. Cloudfielder: Policy as a service, for your cloud infrastructure. Online article, 2015.

[138] David J. Malan. Cs50. Online article, 2015.

[139] Ebrahim H Mamdani. Application of fuzzy algorithms for control of simple dynamic plant. In Proceedings of the Institution of Electrical Engineers, volume 121, pages 1585–1588. IET, 1974.

[140] Zoltán Ádám Mann. Allocation of virtual machines in cloud data centers—a survey of problem models and optimization algorithms. ACM Comput. Surv., 48(1):11:1–11:34, August 2015.

[141] Amazon Marketplace. Marketplace for aws cloud resources. Online article, 2012.

[142] Toni Mastelic, Ariel Oleksiak, Holger Claussen, Ivona Brandic, Jean-Marc Pierson, and Athanasios V. Vasilakos. Cloud computing: Survey on energy efficiency. ACM Comput. Surv., 47(2):33:1–33:36, December 2014.

[143] Michael Menzel, Rajiv Ranjan, Lizhe Wang, Samee U Khan, and Jinjun Chen. Cloudgenius: a hybrid decision support method for automating the migration of web application clusters to public clouds. Computers, IEEE Transactions on, 64(5):1336–1348, 2015.

[144] Brenda M Michelson. Event-driven architecture overview. Patricia Seybold Group, 2, 2006.

[145] Neil Middleton, Richard Schneeman, et al. Heroku: Up and Running. "O’Reilly Media, Inc.", 2013.

[146] R. Mietzner and F. Leymann. Towards provisioning the cloud: On the usage of multi-granularity flows and services to realize a unified provisioning infrastructure for saas applications. In Services - Part I, 2008. IEEE Congress on, pages 3–10, 2008.

[147] M. Mishra, A. Das, P. Kulkarni, and A. Sahoo. Dynamic resource management using virtual machine migrations. Communications Magazine, IEEE, 50(9):34–40, September 2012.

[148] Madhurranjan Mohaan and Ramesh Raithatha. Learning Ansible. Packt Publishing Ltd, 2014.

[149] Francesco Moscato et al. An analysis of mosaic ontology for cloud resources annotation. In FedCSIS, 2011, pages 973–980. IEEE, 2011.

[150] Elizabeth L. Murnane and Vineet Sinha. Interactive exploration of compacted visualizations for understanding behavior in complex software. In Companion to the 23rd ACM SIGPLAN Conference on Object-oriented Programming Systems Languages and Applications, OOPSLA Companion ’08, pages 729–730, New York, NY, USA, 2008. ACM.

[151] Cohesive Networks. Cohesive networks: Home. Online article, 2016.

[152] Nitrous. nitrous.io. Online article, 2013.

[153] OASIS. Topology and Orchestration Specification for Cloud Applications (TOSCA), Version 1.0, 2013.

[154] OMG. Business Process Model and Notation (BPMN), Version 2.0, 2011.

[155] Open container initiative. https://www.opencontainers.org/. Accessed: 21/08/2015.

[156] OpenStack.org. Open source software for creating private and public clouds. Online article, 2015.

[157] OpenStack.org. Openstack orchestration. Online article, 2015.

[158] Suraj Pandey, Linlin Wu, Siddeswara Mayura Guru, and Rajkumar Buyya. A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments. In Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on, pages 400–407. IEEE, 2010.

[159] Michael P. Papazoglou and Willem-Jan van den Heuvel. Blueprinting the cloud. IEEE Internet Computing, 15(6):74–79, 2011.

[160] Manish Parashar and Salim Hariri. Autonomic computing: An overview. In Unconventional Programming Paradigms, pages 257–269. Springer, 2005.

[161] Thomas M. Pigoski. Practical Software Maintenance: Best Practices for Managing Your Software Investment. John Wiley & Sons, Inc., New York, NY, USA, 1996.

[162] Google Cloud Platform. Cloud sdk. Online article, 2015.

[163] Julien Ponge, Boualem Benatallah, Fabio Casati, and Farouk Toumani. Analysis and applications of timed service protocols. ACM Trans. Softw. Eng. Methodol., 19(4):11:1–11:38, April 2010.

[164] OpenNebula Project. Opennebula | flexible enterprise cloud made simple. Online article, 2016.

[165] Rackspace. Rackspace: Api documentation. Online article, 2015.

[166] Fahimeh Ramezani, Jie Lu, and Faheem Hussain. An online fuzzy decision support system for resource management in cloud environments. In IFSA World Congress and NAFIPS Annual Meeting (IFSA/NAFIPS), 2013 Joint, pages 754–759. IEEE, 2013.

[167] R. Ranjan, B. Benatallah, S. Dustdar, and M.P. Papazoglou. Cloud resource orchestration programming: Overview, issues, and directions. Internet Com- puting, IEEE, 19(5):46–56, Sept 2015.

[168] Rajiv Ranjan and Boualem Benatallah. Programming cloud resource orches- tration framework: Operations and research challenges. CoRR, abs/1204.2204, 2012.

[169] Rajiv Ranjan, Rajkumar Buyya, and Surya Nepal. Editorial: Model-driven provisioning of application services in hybrid computing environments. Future Gener. Comput. Syst., 29(5):1211–1215, July 2013.

[170] Redmine. http://www.redmine.org/. Accessed: 28/10/2014.

[171] Paul Resnick and Hal R Varian. Recommender systems. Communications of the ACM, 40(3):56–58, 1997.

[172] Rami Rosen. Resource management: Linux kernel namespaces and cgroups. Haifux, May, 2013.

[173] Todd Rosner. Learning AWS OpsWorks. Packt Publishing Ltd, 2013.

[174] Arpan Roy, Santonu Sarkar, Rajeshwari Ganesan, and Geetika Goel. Secure the cloud: From the perspective of a service-oriented organization. ACM Comput. Surv., 47(3):41:1–41:30, February 2015.

[175] Navin Sabharwal. Automation Through Chef Opscode. APress, 2014.

[176] H. Sato, A. Kanai, and S. Tanimoto. A cloud trust model in a security aware cloud. In Applications and the Internet (SAINT), 2010 10th IEEE/IPSJ International Symposium on, pages 121–124, July 2010.

[177] Benjamin Satzger et al. Winds of change: From vendor lock-in to the meta cloud. Internet Computing, IEEE, 17(1):69–73, 2013.

[178] Pete Sawyer, Raul Mazo, Daniel Diaz, Camille Salinesi, and Danny Hughes. Using constraint programming to manage configurations in self-adaptive systems. Computer, (10):56–63, 2012.

[179] Stefan Schulte, Christian Janiesch, Srikumar Venugopal, Ingo Weber, and Philipp Hoenisch. Elastic business process management: State of the art and open challenges for bpm in the cloud. Future Generation Computer Systems, 46:36–50, 2015.

[180] Amazon Web Services. Amazon ec2. Online article, 2015.

[181] Amazon Web Services. Amazon ec2 container registry. Online article, 2015.

[182] Amazon Web Services. Aws management console. http://aws.amazon.com/console/, 2015.

[183] shipyard. Shipyard walkthrough. Online article, 2015.

[184] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The hadoop distributed file system. In Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, pages 1–10. IEEE, 2010.

[185] Sukhpal Singh and Inderveer Chana. Qos-aware autonomic resource management in cloud computing: A systematic review. ACM Comput. Surv., 48(3):42:1–42:46, December 2015.

[186] Vineet Sinha et al. Understanding code architectures via interactive exploration and layout of layered diagrams. In Companion to the 23rd ACM SIGPLAN Conference on OOPSLA, OOPSLA Companion ’08, pages 745–746. ACM, 2008.

[187] James Skene, Franco Raimondi, and Wolfgang Emmerich. Service-level agreements for electronic services. Software Engineering, IEEE Transactions on, 36(2):288–304, 2010.

[188] M. Smit, B. Simmons, M. Shtern, and M. Litoiu. Supporting application development with structured queries in the cloud. In Software Engineering (ICSE), 2013 35th International Conference on, pages 1213–1216, May 2013.

[189] StackEngine. Stackengine container application center. Online article, 2015.

[190] Real Status. Hyperglance. http://www.real-status.com/product/, 2015.

[191] Yu-Jen John Sun, Moshe Chai Barukh, Boualem Benatallah, et al. Scalable saas-based process customization with casewalls. In Service-Oriented Computing, pages 218–233. Springer, 2015.

[192] C.T. Sungur et al. Extending bpmn for wireless sensor networks. In Business Informatics (CBI), 2013 IEEE 15th Conference on, pages 109–116, July 2013.

[193] CA Technologies. Insslr2 - redundant http input gateway with ssl support. Online article, 2013.

[194] R.W. Thrash. Building a cloud computing specification: Fundamental engineering for optimizing cloud computing initiatives. http://assets1.csc.com/innovation/downloads/CSC_Papers_2010_Building_a_Cloud_Computing_Specification.pdf, March 2010. Accessed: 25/11/2013.

[195] Doug Tidwell. The simple cloud api: Writing portable, interoperable applications for the cloud. Online article, 2009.

[196] Adel Nadjaran Toosi, Rodrigo N Calheiros, and Rajkumar Buyya. Interconnected cloud computing environments: Challenges, taxonomy, and survey. ACM Computing Surveys (CSUR), 47(1):7, 2014.

[197] James Turnbull. The Docker Book: Containerization is the new virtualization. James Turnbull, 2014.

[198] Ubuntu. Juju. Online article, 2013.

[199] Peter Van Roy et al. Programming paradigms for dummies: What every programmer should know. New computational paradigms for computer music, 104, 2009.

[200] Bharadwaj Veeravalli and Manish Parashar. Guest editors’ introduction: Special issue on cloud of clouds. IEEE Transactions on Computers, 63(1):1–2, 2014.

[201] David Villegas et al. Cloud federation in a layered service model. J. Comput. Syst. Sci., 78(5):1330–1344, September 2012.

[202] VisualOps. Visualops: Global dashboard. Online article, 2014.

[203] VisualOps. Visualops - wysiwyg for your cloud. Online article, 2015.

[204] Inc. VMware. Understanding virtual machine snapshots in vmware esxi and esx (1015180). Online article, 2015.

[205] Lizhe Wang, Rajiv Ranjan, Jinjun Chen, and Boualem Benatallah. Cloud computing: methodology, systems, and applications. CRC Press, 2012.

[206] Denis Weerasiri, Moshe Chai Barukh, Boualem Benatallah, and Cao Jian. Cloudmap: A visual notation for representing and managing cloud resources. In International Conference on Advanced Information Systems Engineering, pages 427–443. Springer International Publishing, 2016.

[207] Denis Weerasiri and Boualem Benatallah. Unified representation and reuse of federated cloud resources configuration knowledge. In Enterprise Distributed Object Computing Conference (EDOC), 2015 IEEE 19th International, pages 142–150, Sept 2015.

[208] Denis Weerasiri, Boualem Benatallah, and Moshe Chai Barukh. Process-driven configuration of federated cloud resources. In Database Systems for Advanced Applications, pages 334–350. Springer, 2015.

[209] Denis Weerasiri, Boualem Benatallah, and Moshe Chai Barukh. Processbase: a hybrid process management platform. Submitted to Service-Oriented Computing, 2015.

[210] Denis Weerasiri, Boualem Benatallah, and Jian Yang. Unified representation and reuse of federated cloud resources configuration knowledge. Technical Report UNSW-CSE-TR-201411, Department of CSE, University of New South Wales, 2014.

[211] Denis Weerasiri et al. A model-driven framework for interoperable cloud resources management. Technical Report UNSW-CSE-TR-201514, UNSW, 2015.

[212] Yi Wei and M Brian Blake. Adaptive service workflow configuration and agent-based virtual resource management in the cloud. In Cloud Engineering (IC2E), 2013 IEEE International Conference on, pages 279–284. IEEE, 2013.

[213] Johannes Wettinger, Uwe Breitenbücher, and Frank Leymann. Standards-based devops automation and integration using tosca. In Proceedings of the 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing, pages 59–68. IEEE Computer Society, 2014.

[214] Johannes Wettinger et al. Unified Invocation of Scripts and Services for Provisioning, Deployment, and Management of Cloud Applications Based on TOSCA. In CLOSER 2014, April 3-5, 2014, pages 559–568. SciTePress, April 2014.

[215] Matthew S. Wilson. Constructing and managing appliances for cloud deployments from repositories of reusable components. In Proceedings of the 2009 Conference on HotCloud’09. USENIX Association, 2009.

[216] Erik Wittern, Alexander Lenk, Sebastian Bartenbach, and Tobias Braeuer. Feature-based configuration of vendor-independent deployments on iaas. In Enterprise Distributed Object Computing Conference (EDOC), 2014 IEEE 18th International, pages 128–135. IEEE, 2014.

[217] Cheng-Zhong Xu, Jia Rao, and Xiangping Bu. URL: A unified reinforcement learning approach for autonomic cloud management. Journal of Parallel and Distributed Computing, 72(2):95–105, 2012.

[218] Jeonghwa Yang, W Keith Edwards, and David Haslem. Eden: supporting home network management through interactive visual tools. In Proceedings of the 23rd annual ACM symposium on User interface software and technology, pages 109–118. ACM, 2010.

[219] Eric Yuan, Naeem Esfahani, and Sam Malek. A systematic survey of self- protecting software systems. ACM Transactions on Autonomous and Adaptive Systems (TAAS), 8(4):17, 2014.

[220] Rostyslav Zabolotnyi, Philipp Leitner, and Schahram Dustdar. Profiling-based task scheduling for factory-worker applications in infrastructure-as-a-service clouds. In Software Engineering and Advanced Applications (SEAA), 2014 40th EUROMICRO Conference on, pages 119–126. IEEE, 2014.

[221] Rostyslav Zabolotnyi, Philipp Leitner, Stefan Schulte, and Schahram Dustdar. Speedl - a declarative event-based language to define the scaling behavior of cloud applications. In Services (SERVICES), 2015 IEEE World Congress on, pages 71–78, June 2015.

[222] Peter Zadrozny and Raghu Kodali. Big Data Analytics Using Splunk: Deriving Operational Intelligence from Social Media, Machine Data, Existing Data Warehouses, and Other Real-Time Streaming Sources. Apress, 2013.

[223] Liangzhao Zeng, B. Benatallah, A.H.H. Ngu, M. Dumas, J. Kalagnanam, and H. Chang. Qos-aware middleware for web services composition. Software Engineering, IEEE Transactions on, 30(5):311–327, May 2004.

[224] Zhi-Hui Zhan, Xiao-Fang Liu, Yue-Jiao Gong, Jun Zhang, Henry Shu-Hung Chung, and Yun Li. Cloud computing resource scheduling and a survey of its evolutionary approaches. ACM Comput. Surv., 47(4):63:1–63:33, July 2015.

[225] Miranda Zhang, Rajiv Ranjan, Anna Haller, Dimitrios Georgakopoulos, and Peter Strazdins. Investigating decision support techniques for automating cloud service selection. In Cloud Computing Technology and Science (CloudCom), 2012 IEEE 4th International Conference on, pages 759–764. IEEE, 2012.

[226] Miranda Zhang, Rajiv Ranjan, Surya Nepal, Michael Menzel, and Armin Haller. A declarative recommender system for cloud infrastructure services selection. In Proceedings of the 9th International Conference on Economics of Grids, Clouds, Systems, and Services, GECON’12, pages 102–113, Berlin, Heidelberg, 2012. Springer-Verlag.

[227] Xinwen Zhang, Anugeetha Kunjithapatham, Sangoh Jeong, and Simon Gibbs. Towards an elastic application model for augmenting the computing capabilities of mobile devices with cloud computing. Mobile Networks and Applications, 16(3):270–284, 2011.

Appendix A

Evaluated Orchestration Tools and Research Initiatives in Chapter 2

We initially evaluated the following 20 cloud resource orchestration tools and 10 research initiatives to understand their main characteristics.

Orchestration Tools

• AWS OpsWorks [173]

• AWS CloudFormation [4]

• Google Cloud Deployment Manager [91]

• VMWare vSphere [133]

• OpenStack [156]

• Rackspace [39]

• Heroku [145]

• nitrous.io [152]

• Puppet [111]

• Chef (https://www.chef.io/)


• Ansible [148]

• Juju [198]

• Docker [197]

• EngineYard [73]

• RightScale [2]

• Finally.io [75]

• Bitnami [30]

• AWS CloudTrail [51]

• CloudFoundry-Bosh (https://bosh.io/)

• Terminal (https://www.terminal.com/)

Research Initiatives

• OpenTOSCA [29]

• CFEngine [37]

• Plush [3]

• SmartFrog [87]

• ModaClouds [15]

• Skyport [84]

• COPE [131]

• Speedl [221]

• [135]

• [212]

Appendix B

List of References Organized by Taxonomy Dimensions in Chapter 2

In this section, we list the name (if any) and reference of the methods, techniques and tools described throughout this work, identifying the characteristic(s) for which they were included.

B.1 Resources and User Type

Table B.1 summarizes the examples used to illustrate the different dimensions of Resources and User Types.


Table B.1: Representative literature references for the Resources and User Types dimensions

Resource Types
• Infrastructure: AWS CloudFormation [4], VMWare vSphere [133], Juju [198], Google Cloud (https://cloud.google.com/), CohesiveFT [151], OpenNebula [164], Eucalyptus [65], Nectar [67], Apache CloudStack [50], OpenTOSCA [29]
• Platform: AWS OpsWorks [173], AWS CloudFormation [4], Heroku [145], Terminal (https://www.terminal.com/), Puppet [111], Juju [198], Docker [197], OpenTOSCA [29], CFEngine [37], Plush [3], SmartFrog [87], nitrous.io [152], Chef (https://www.chef.io/), Ansible [148], RightScale [2], VisualOps [203], Skyport [84], EngineYard [73], CloudBees [49], SaltStack [99]
• Software: Salesforce (http://www.salesforce.com)

Resource Entity Model
• Applicable for any cloud resource orchestration technique

Resource Access Methods
• CLIs: AWS OpsWorks [173], AWS CloudFormation [4], Rackspace [39], VMWare vSphere [133], Heroku [145], Puppet [111], Juju [198], Docker [197], CFEngine [37], Plush [3], SmartFrog [87], Chef (https://www.chef.io/), Ansible [148], SaltStack [99]
• SDKs: AWS OpsWorks [173], AWS Java SDK [8], Rackspace (https://developer.rackspace.com/sdks/), jCloud [79], SimpleCloud [195], DeltaCloud [77], LibCloud [80]
• APIs: AWS OpsWorks [173], AWS CloudFormation [4], AWS REST API for S3 [9], Rackspace (http://docs.rackspace.com/), VMWare vSphere [133], Heroku [145], Puppet [111], Juju [198], Docker [197], Chef (https://www.chef.io/), Ansible [148], SaltStack [99]
• GUIs: AWS OpsWorks [173], VMWare vSphere [133], Puppet [111], Juju-GUI (https://demo.jujucharms.com/), OpenTOSCA [29], VisualOps [203], AWS Management Console [182], Puppet Management Console [120], CA-Applogic [12], StackEngine [189], Panamax [41], Shipyard [183], CFEngine [37]

Resource Representation Notation
• Textual: AWS OpsWorks [173], AWS CloudFormation [4], Puppet [111], Juju [198], Docker [197], OpenTOSCA [29], CFEngine [37], Plush [3], SmartFrog [87], Chef (https://www.chef.io/), Ansible [148], Docker Compose [69], AWS CLI [20], SaltStack [99]
• Visual: Cloud Computing Patterns [128], AWS OpsWorks [173], Puppet [111], Juju [198], OpenTOSCA [29], CFEngine [37], nitrous.io [152], Chef (https://www.chef.io/), Ansible [148], VisualOps [203], Hyperglance [190], RightScale [2]
• Hybrid: AWS OpsWorks [173], Juju [198], OpenTOSCA [29]

User Types
• DevOps: Google Cloud (https://cloud.google.com/), AWS OpsWorks [173], AWS CloudFormation [4], VMWare vSphere [133], Puppet [111], Juju [198], Docker [197], OpenTOSCA [29], CFEngine [37], Plush [3], SmartFrog [87], Chef (https://www.chef.io/), Ansible [148], RightScale [2], VisualOps [203], SaltStack [99]
• Application Developers: Heroku [145], Terminal (https://www.terminal.com/), nitrous.io [152]
• Domain Experts: Skyport [84], CS50 appliance [58]

B.2 Resource Orchestration Capabilities

Table B.2 summarizes the examples used to illustrate the different dimensions of Resource Orchestration Capabilities.

Table B.2: Representative literature references for the Resource Orchestration Capabilities dimension

Primitive Actions
• Select: AWS Marketplace [141], Bitnami [30], Puppet Forge (https://forge.puppetlabs.com/), Docker Hub Registry [68], [207], [226], Juju Charms (https://jujucharms.com/store), Terminal.com (https://www.terminal.com/explore)
• Configure: Puppet [111], Chef (https://www.chef.io/), Ansible [148], CFEngine [37], VisualOps [203], [115], SaltStack [99]
• Deploy: Puppet [111], Juju [198], Docker [197], OpenTOSCA [29], CFEngine [37], Plush [3], SmartFrog [87], Chef (https://www.chef.io/), Ansible [148], [115], AWS OpsWorks [173], AWS CloudFormation [4], VisualOps [203], [208], [214]
• Monitor: Nagios [22], AWS CloudWatch [52], AWS CloudTrail [51], CloudFielder [137], Splunk [222], Finally.io [75]
• Control: AWS OpsWorks [173], Juju [198], [208], Finally.io [75]

Orchestration Strategies
• Script-based: Docker [197], [214], [131], [168], [135], [223], [163]
• Reactive: Juju [198], AWS OpsWorks [173], CloudFielder [137], [43], [227], [221], [208]
• State-based: AWS OpsWorks [173], TIBCO [102]
• Proactive: [179], [212], [224], [185], [143], [225], [93], [217], [166], [104], [11], [74], [106], [158], [103], [26]

Language Paradigm
• Script-based: AWS OpsWorks [173], Juju [198], Docker [197], Plush [3], SmartFrog [87], Chef (https://www.chef.io/), Ansible [148], Puppet [111], SaltStack [99]
• Flow-based: [117], [208]
• ECA rule-based: AWS OpsWorks [173], CloudFielder [137], Juju [198]
• Markup languages: Plush [3], AWS CloudFormation [4], DotCloud [70]
• Query-based: [131], [135]
• Constraint programming: CFEngine [37], [61], [178]

Theoretical Foundation
• Formal methods: [38], [28], [37], [56], [36], [35], [34], [32]

Cross-cutting Concerns
• Security: [81], [176]
• SLAs: [112], [92], [187], [136], [184], [63]
• Load Balancing policies: [85], Seesaw (https://github.com/google/seesaw)
• Portability: [127], [153], [197], [156], [76]

B.3 Knowledge Reuse

Table B.3 summarizes the examples used to illustrate the different dimensions of Knowledge Reuse.

Table B.3: Representative literature references for the Knowledge Reuse dimension

Reuse Artifact
• Concrete and Template resource descriptions: AWS OpsWorks [173], AWS CloudFormation [4], Heroku [145], Puppet [111], Juju [198], Docker [197], OpenTOSCA [29], CFEngine [37], Plush [3], SmartFrog [87], Chef (https://www.chef.io/), Ansible [148], RightScale [2], SaltStack [99]
• Resource snapshots: VMWare vSphere [133, 204], Terminal (https://www.terminal.com/), [101]
• Miscellaneous: Juju [198], Docker [197]

Reuse Technique
• Search index: Bitnami [30], VMWare vSphere [133], AWS OpsWorks [173], AWS CloudFormation [4], Puppet [111], Juju [198], Docker [197], CFEngine [37], Chef (https://www.chef.io/), Ansible [148], VisualOps [203], RightScale [2]
• Recommendations: AWS Marketplace [141], [226], [207]
• Community-based: Puppet [111], Juju [198], Docker [197], Chef (https://www.chef.io/), Ansible [148]

B.4 Runtime Environment

Table B.4 summarizes the examples used to illustrate the different dimensions of Runtime Environment.

Table B.4: Representative literature references for the Runtime Environment dimension

Virtualization Technique
• OS-level hypervisor: VMWare vSphere [133], Google Cloud (https://cloud.google.com/), AWS OpsWorks [173], AWS CloudFormation [4], Juju [198], OpenTOSCA [29]
• Environment-level container manager: Heroku [145], Skyport [84], Docker [197]

Execution Model
• Centralized: VMWare vSphere [133], AWS OpsWorks [173], AWS CloudFormation [4], Heroku [145], Juju [198], Docker [197], OpenTOSCA [29], Plush [3], SmartFrog [87], Chef (https://www.chef.io/), Ansible [148], SaltStack [99]
• Decentralized: Puppet [111], Skyport [84], [115], CFEngine [37]

Target Environment
• Public: Google Cloud (https://cloud.google.com/), Terminal (https://www.terminal.com/), nitrous.io [152], Chef (https://www.chef.io/), Ansible [148], RightScale [2], SaltStack [99], ElasticBox (https://elasticbox.com/), AWS OpsWorks [173], AWS CloudFormation [4], Heroku [145], Puppet [111], Juju [198], Docker [197], OpenTOSCA [29], VisualOps [203], CFEngine [37]
• Private: Chef (https://www.chef.io/), Ansible [148], SaltStack [99], ElasticBox (https://elasticbox.com/), VMWare vSphere [133], Puppet [111], Juju [198], Docker [197], OpenTOSCA [29], CFEngine [37], Plush [3], SmartFrog [87]
• Federated: [208], [214], [213], [209], TOSCA [153], OpenTOSCA [29], [149], [188], jCloud [79], SimpleCloud [195], DeltaCloud [77], Skyport [84], LibCloud [80], CohesiveFT [151], OpenNebula [164], Eucalyptus [65], Cloudward Bound [95], Ansible Cloud Modules [10]

Appendix C

Java-based implementation of a Connector

In this section, we illustrate an implementation of the Connector for the AWS S3¹ service, a public file storage service. This Connector leverages the AWS Java SDK² to invoke the low-level API published by AWS. The Connector is implemented as a RESTful API that includes three operations: deploy, undeploy and control, as defined in Chapter 3. For instance, the deploy operation (1) accepts a JSON-based resource description, (2) creates a client to interact with the AWS S3 low-level API, (3) prepares a request and sends it to the low-level API to create a new S3 bucket (i.e., a file storage), (4) waits for a result from the low-level API, and (5) returns a resource ID and resultant message back to the client. The Connector also includes an event named “BucketUpdatedEvent”, which is triggered whenever a particular bucket is updated. Interested consumers may subscribe to this event by invoking the addListener operation of the Connector. The Connector periodically checks the revision history of buckets (see the operation named getBucketUpdatedEvent); if it finds any modifications, the “BucketUpdatedEvent” event is published to all subscribed listeners.
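To make the deploy operation concrete, the following is a minimal example of the kind of JSON-based resource description it could accept. The bucket-name attribute is taken from the Connector listing; the value itself is hypothetical:

```json
{
  "bucket-name": "my-example-bucket"
}
```

According to the listing, a successful deploy call answers with a JSON object carrying the identifier of the created bucket under the key resourceID.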

Code C.1: Java-based Connector for AWS-S3 based Domain-specific Model

public class AWSS3Connector implements Connector {

    private List<EventListener> listeners = new ArrayList<EventListener>();

    public AWSS3Connector() {
        // Poll for bucket updates every 10 seconds in a background thread
        new Thread(new Runnable() {
            public void run() {
                while (true) {
                    getBucketUpdatedEvent();
                    try {
                        Thread.sleep(10000);
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        }).start();
    }

    // Create an AWS S3 client from the stored credentials
    private AmazonS3Client createClient() throws Exception {
        Properties properties = new Properties();
        properties.load(this.getClass().
            getResourceAsStream("/AwsCredentials.properties"));
        String accessKey = properties.getProperty("accessKey");
        String secretKey = properties.getProperty("secretKey");
        AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
        return new AmazonS3Client(credentials);
    }

    @Path("/deploy")
    @POST
    @Produces(MediaType.APPLICATION_JSON)
    @Consumes(MediaType.APPLICATION_JSON)
    public Response deploy(@QueryParam("description") String resourceDescription) throws Exception {
        AmazonS3Client client = createClient();

        // Extract the bucket name from the JSON-based resource description
        JsonObject description = new JsonParser().parse(resourceDescription).getAsJsonObject();
        String name = description.get("bucket-name").getAsString();
        Bucket bucket = client.createBucket(name);

        JsonObject json = new JsonObject();
        json.addProperty("resourceID", bucket.getName());
        return Response.status(Response.Status.OK).
            entity(json.toString()).build();
    }

    @Path("/undeploy")
    @POST
    @Produces(MediaType.APPLICATION_JSON)
    @Consumes(MediaType.APPLICATION_JSON)
    public Response undeploy(@QueryParam("resourceID") String resourceID) throws Exception {
        AmazonS3Client client = createClient();
        client.deleteBucket(resourceID);

        JsonObject json = new JsonObject();
        json.addProperty("result", "Bucket: " + resourceID + " undeployed.");
        return Response.status(Response.Status.OK).
            entity(json.toString()).build();
    }

    @Path("/control")
    @POST
    @Produces(MediaType.APPLICATION_JSON)
    @Consumes(MediaType.APPLICATION_JSON)
    public Response control(@QueryParam("resourceID") String resourceID, @QueryParam("action") Action action) throws Exception {
        if ("accessControl".equals(action.name)) {
            String accessControlConfig = action.getInput().acl;

            AmazonS3Client client = createClient();
            SetBucketAclRequest setBucketAclRequest =
                new SetBucketAclRequest(resourceID, accessControlConfig);
            client.setBucketAcl(setBucketAclRequest);

            JsonObject json = new JsonObject();
            json.addProperty("result", "Bucket: " + resourceID + " access control changed.");
            return Response.status(Response.Status.OK).
                entity(json.toString()).build();
        } else {
            throw new RuntimeException("Action is not specified!");
        }
    }

    @Path("/subscribeToEvents")
    @POST
    @Produces(MediaType.APPLICATION_JSON)
    @Consumes(MediaType.APPLICATION_JSON)
    public Response addListener(EventListener toAdd) {
        listeners.add(toAdd);
        JsonObject json = new JsonObject();
        json.addProperty("result", "Listener subscribed.");
        return Response.status(Response.Status.OK).
            entity(json.toString()).build();
    }

    // Trigger all the listeners when a bucket update is detected
    public void getBucketUpdatedEvent() {
        // Check the bucket version ID
        String responseID = BucketVersionIDChecker();

        // If the bucket version ID changed, publish the event to the subscribers
        if (isBucketVersionIDChanged(responseID)) {
            // Notify everybody that may be interested
            for (EventListener el : listeners) {
                el.trigger("{\"bucket-id\": \"" + responseID + "\", \"event\": \"bucket-updated\"}");
            }
        }
    }

    // Event Description
    class BucketUpdatedEvent {
        public Response trigger(String json) {
            return Response.ok(json).build();
        }
    }
}

¹ https://aws.amazon.com/s3/
² http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/

Appendix D

Evaluation Questionnaire in Chapter 6

In this section, we present the questionnaire for determining the accuracy of the experiment tasks in Section 6.5.1.

D.1 Background Questions

Q1. How familiar are you with Cloud Resource Orchestration tools?

Q2. How familiar are you with the "Docker" platform?

D.2 Functionality Questions

Q1. Cloud Attributes: (a) What are the open ports of Container named “BPEL-App1”? and (b) What is the amount of memory allocated to Virtual Machine named “VM-1”?

Q2. Cloud relationships: (a) If the “Nginx” Container crashed, what other Containers would be affected? (b) How many Containers are deployed in Virtual Machine named “VM-2”? and (c) What is the required version of Java runtime to deploy the Image “BPEL-App”?


Q3. Solving problems involving both Attributes and Relationships: (a) What is the amount of CPU allocated to the VM with the Container named “Node-App1” deployed? and (b) What is the additional amount of memory required at “VM-2” to be able to deploy two more Containers similar to the Container named “Node-App1”?

Q4. Controlling cloud resources: (a) Double the amount of CPU, memory and storage capacity of “VM-1”, and record the result.

D.3 Insight Questions

Q1. Do you think the Hosting Machine Map is useful for your day-to-day tasks?

Q2. How intuitive do you think the Image Map is?

Q3. Do you think the Application Map is useful for your day-to-day tasks?

Q4. How easy or difficult is it to Monitor and Control (M&C) Applications and Containers?

Q5. How easy or difficult is it to Monitor and Control (M&C) Hosting Machines?

D.4 Improvement Questions

Q1. How would you improve the visualisation tool?

Q2. Any other comments?